Commit Graph

1189 Commits

Author SHA1 Message Date
William Schmidt bccc9aa81c Don't vectorize PHIs in catchswitch blocks
We currently assert in vectorizeTree(TreeEntry*) when processing a PHI
bundle in a block containing a catchswitch.  We attempt to set the
IRBuilder insertion point following the catchswitch, which is invalid.
This is done so that ShuffleBuilder.finalize() knows where to insert
a shuffle if one is needed.

To avoid this occurring, watch out for catchswitch blocks during
buildTree_rec() processing, and avoid adding PHIs in such blocks to
the vectorizable tree.  It is unlikely that constraining vectorization
over an exception path will cause a noticeable performance loss, so
this seems preferable to trying to anticipate when a shuffle will and
will not be required.
2022-07-19 06:10:17 -07:00
Evgeniy Brevnov 8f90edeb55 Additional regression test for a crash during reordering of masked gather nodes 2022-07-19 19:03:53 +07:00
Sanjay Patel 10ebaf7686 [SLP] add test for load combining + shuffling; NFC
issue #38821
2022-07-04 10:55:07 -04:00
David Green 2de05afc19 [SLP] Peek into loads when hitting the RecursionMaxDepth
This patch slightly extends the limit on the RecursionMaxDepth inside
the SLP vectorizer. It does it only when it hits a load (or zext/sext of
a load), which allows it to peek through in the places where it will be
the most valuable, without ballooning out the O(..) by any 2^n factors.
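
The check described above can be sketched as follows. This is a minimal Python model, not the SLPVectorizer code: the `Value` class, the `may_continue` helper and the limit of 12 are assumptions for illustration (12 is assumed to mirror the default of the `slp-recursion-max-depth` option).

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical limit; assumed to mirror the default of -slp-recursion-max-depth.
RECURSION_MAX_DEPTH = 12


@dataclass
class Value:
    """Toy stand-in for an IR value: an opcode plus its operands."""
    opcode: str
    operands: List["Value"] = field(default_factory=list)


def is_load_or_ext_of_load(v: Value) -> bool:
    """True for `load`, or for a zext/sext whose single operand is a `load`."""
    if v.opcode == "load":
        return True
    return (v.opcode in ("zext", "sext")
            and len(v.operands) == 1
            and v.operands[0].opcode == "load")


def may_continue(bundle: List[Value], depth: int,
                 max_depth: int = RECURSION_MAX_DEPTH) -> bool:
    """Decide whether tree building may recurse into `bundle` at `depth`."""
    if depth < max_depth:
        return True
    # Past the limit, only peek through bundles made entirely of loads
    # (or extensions of loads). Loads are leaves, so this adds at most a
    # level or two instead of growing the tree exponentially.
    return all(is_load_or_ext_of_load(v) for v in bundle)
```

Restricting the exception to loads keeps the extra work bounded, since the recursion cannot continue past them.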

Differential Revision: https://reviews.llvm.org/D122148
2022-07-04 14:22:50 +01:00
Alexey Bataev 34073b5538 [SLP][NFC]Rework the test for logical and freeze, need some extra nodes,
NFC.
2022-07-01 12:43:10 -07:00
Alexey Bataev 48aa787ab3 [SLP][NFC]Add a test for logical and operands, requiring extra
freeze, NFC.
2022-07-01 11:53:50 -07:00
Simon Pilgrim e961e05d59 [SLP][X86] Add 32-bit vector stores to help vectorization opportunities
Building on the work on D124284, this patch tags v4i8 and v2i16 vector loads as custom, enabling SLP to try to vectorize these types ending in a partial store (using the SSE MOVD instruction) - we already do something similar for 64-bit vector types.

Differential Revision: https://reviews.llvm.org/D127604
2022-06-30 20:25:50 +01:00
Alexey Bataev bf4dcbd2df [SLP]Fix PR56251: Do not remove the reordering from the root node, being used as an operand.
If the root order itself does not require reordering, we can just
remove its reorder mask safely (e.g., if the root node is a vector of
phis). But if this node is used as an operand in the graph, we cannot
delete the reordering and need to keep it. Otherwise the graph nodes are
not synchronized with the operands, which may cause extra gather
instruction(s) or a compiler crash.
Also, we need to be very careful when selecting the gather nodes for
reordering, since there might be several gather nodes with the same scalars
and we could end up reordering the same node many times instead of
different nodes.

Differential Revision: https://reviews.llvm.org/D128680
2022-06-28 13:42:05 -07:00
Alexey Bataev 2faacf61a5 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-24 09:28:01 -07:00
Nabeel Omer 0d41794335 [SLP] Add cost model for `llvm.powi.*` intrinsics (REAPPLIED)
Patch was reverted in 4c5f10a due to buildbot failures, now being
reapplied with updated AArch64 and RISCV tests.

This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes #53887.
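
For a constant exponent, `powi` can be expanded into multiplications by square-and-multiply, so a cost model can charge roughly one multiply per step (plus a divide for a negative exponent). The Python sketch below shows the multiply count for that textbook expansion; it illustrates the idea and is not a transcription of `BasicTTIImplBase::getIntrinsicInstrCost()`. The helper names are hypothetical.

```python
def powi_mul_count(n: int) -> int:
    """Multiplies needed by square-and-multiply for x**|n|:
    (bit_length - 1) squarings plus (popcount - 1) extra multiplies."""
    m = abs(n)
    if m == 0:
        return 0  # x**0 == 1 needs no multiply
    return (m.bit_length() - 1) + (bin(m).count("1") - 1)


def powi(x: float, n: int):
    """Square-and-multiply; returns (value, number_of_multiplies).
    A negative exponent costs one extra division: 1 / x**|n|."""
    m = abs(n)
    result, base, muls = None, x, 0
    while m:
        if m & 1:
            if result is None:
                result = base      # first set bit: no multiply needed
            else:
                result *= base
                muls += 1
        m >>= 1
        if m:
            base *= base           # square only if more bits remain
            muls += 1
    if result is None:
        result = 1.0
    return (1.0 / result if n < 0 else result), muls
```

For example, `x**8` takes 3 squarings and `x**7` takes 2 squarings plus 2 multiplies.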

Differential Revision: https://reviews.llvm.org/D128172
2022-06-24 10:23:19 +00:00
Alexey Bataev 3b6edef15d [SLP]Fix a crash when reordering masked gather nodes with reused scalars.
If the masked gather nodes must be reordered, we can just reorder the
scalars, just like for gather nodes. But if the node contains reused
scalars, it must be handled the same way as a regular vectorizable node,
since the reuse mask, not the scalars themselves, needs to be reordered.
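
As a rough illustration of why the reuse mask, rather than the scalars, has to be permuted: a node with two unique scalars and a four-lane reuse mask cannot be reordered by applying a four-lane order to a two-element scalar list. The sketch below is a hypothetical Python model; `materialize`, `reorder_reuse_mask` and the order convention (new lane i takes old lane order[i]) are assumptions, not SLPVectorizer APIs.

```python
from typing import List


def materialize(scalars: List[str], reuse_mask: List[int]) -> List[str]:
    """Lane i of the node's vector value is scalars[reuse_mask[i]]."""
    return [scalars[m] for m in reuse_mask]


def reorder_reuse_mask(reuse_mask: List[int], order: List[int]) -> List[int]:
    """Permute the reuse mask so that new lane i takes old lane order[i].

    The unique scalars (and hence the masked gather itself) stay untouched.
    """
    return [reuse_mask[o] for o in order]
```

With scalars `["a", "b"]`, reuse mask `[0, 1, 0, 1]` and order `[1, 0, 3, 2]`, the reordered mask is `[1, 0, 1, 0]`, which yields `["b", "a", "b", "a"]`.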

Differential Revision: https://reviews.llvm.org/D128360
2022-06-23 11:32:30 -07:00
Fangrui Song 1ffd2d99c2 Revert D115462 "[SLP]Improve shuffles cost estimation where possible."
This reverts commit cac60940b7.

Caused -Os -fsanitize=memory -march=haswell miscompile to pytorch/cpuinfo.
See my latest comment (may update) on D115462.
2022-06-22 23:16:31 -07:00
Fangrui Song a411bc11d6 Revert "[SLP]Fix a crash when insert subvector is out of range."
This reverts commit f1ee2738b3.

Revert due to the revert of a dependent commit `[SLP]Improve shuffles cost estimation where possible.`
2022-06-22 23:16:25 -07:00
Vasileios Porpodas 7a9ad25769 Recommit "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6d6268dcbf.

Review: https://reviews.llvm.org/D125712
2022-06-21 18:35:29 -07:00
Vasileios Porpodas 6d6268dcbf Revert "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6f88acf410.
2022-06-21 17:07:21 -07:00
Vasileios Porpodas 6f88acf410 [SLP][X86] Improve reordering to consider alternate instruction bundles
During the reordering transformation we should try to avoid reordering bundles
like fadd,fsub because this may prevent them from being matched into a single
vector instruction on x86.
We do this by checking if a TreeEntry is such a pattern and adding it to the
list of TreeEntries with orders that need to be considered.
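
A minimal Python sketch of the pattern in question, assuming the single x86 instruction meant here is SSE3 `ADDSUBPS`/`ADDSUBPD`, which subtracts in even lanes and adds in odd lanes. The helper names are hypothetical; the point is that a lane permutation applied during reordering can turn a matchable fsub/fadd bundle into one that no longer matches.

```python
from typing import List


def is_x86_addsub_pattern(opcodes: List[str]) -> bool:
    """True if lanes alternate fsub (even lanes) / fadd (odd lanes),
    the lane layout computed by SSE3 ADDSUBPS/ADDSUBPD."""
    return (len(opcodes) >= 2 and len(opcodes) % 2 == 0
            and all(op == ("fsub" if i % 2 == 0 else "fadd")
                    for i, op in enumerate(opcodes)))


def apply_order(opcodes: List[str], order: List[int]) -> List[str]:
    """New lane i holds the old lane order[i]."""
    return [opcodes[o] for o in order]
```

Swapping adjacent lanes (order `[1, 0, 3, 2]`) turns `fsub, fadd, fsub, fadd` into `fadd, fsub, fadd, fsub`, which no longer matches, while an order that only exchanges even lanes keeps the pattern intact.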

Differential Revision: https://reviews.llvm.org/D125712
2022-06-21 16:44:48 -07:00
Vasileios Porpodas 085f59a826 [SLP][NFC] Precommit test for a followup patch that improves reordering for addsubs
Differential Revision: https://reviews.llvm.org/D126091
2022-06-21 14:34:55 -07:00
Nabeel Omer 4c5f10aeeb Revert rGe6ccb57bb3f6b761f2310e97fd6ca99eff42f73e "[SLP] Add cost model for `llvm.powi.*` intrinsics"
This reverts commit e6ccb57bb3.
2022-06-21 15:05:55 +00:00
Nabeel Omer e6ccb57bb3 [SLP] Add cost model for `llvm.powi.*` intrinsics
This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes #53887.

Differential Revision: https://reviews.llvm.org/D128172
2022-06-21 14:40:34 +00:00
Alexey Bataev f1ee2738b3 [SLP]Fix a crash when insert subvector is out of range.
If OffsetBeg + InsertVecSz is greater than VecSz, we need to estimate
the cost as a shuffle of 2 vectors, not as an insert of a subvector. Otherwise,
the inserted subvector is out of range and the compiler may crash.
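
The decision itself is a simple range check. The Python sketch below is hypothetical: the returned strings merely stand in for cost-query kinds, roughly analogous to `TTI::SK_InsertSubvector` versus a two-source permute such as `TTI::SK_PermuteTwoSrc`.

```python
def insert_cost_kind(offset_beg: int, insert_vec_sz: int, vec_sz: int) -> str:
    """Pick how to model inserting `insert_vec_sz` lanes at `offset_beg`
    into a destination vector of `vec_sz` lanes."""
    if offset_beg + insert_vec_sz > vec_sz:
        # The subvector would run past the end of the destination, so an
        # insert-subvector cost query would be out of range.
        return "two-source shuffle"
    return "insert subvector"
```

For an 8-lane destination, inserting 4 lanes at offset 4 still fits, but inserting them at offset 6 does not.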

Differential Revision: https://reviews.llvm.org/D128071
2022-06-21 07:16:35 -07:00
Nabeel Omer cd8870e850 [SLP] Add a test for llvm.powi.*
This patch introduces a test for the issue discovered in #53887.

Differential Revision: https://reviews.llvm.org/D128178
2022-06-20 12:41:37 +00:00
Alexey Bataev 76782a65ee [SLP]Use original vector if need to shuffle truncated root.
If the root scalar is mapped to the smallest bit width, the vector is
truncated and the types of the original buildvector and the extracted value
mismatch. For extracts, we emit sext/zext instructions; for shuffles we
can reuse the original vector instead of the truncated one.

Differential Revision: https://reviews.llvm.org/D127974
2022-06-16 10:41:18 -07:00
Alexey Bataev 7236d49fd5 [SLP]Extend vectorization for scatter vectorize nodes.
Currently scatter vectorize nodes can be emitted only for GEPs with
constant indices. But we can also emit such nodes for GEPs with the same
ptr and non-constant vectorizable/gathered indices, if profitable. This patch
adds support for such nodes and tries to improve handling of GEPs with
non-const indices for such nodes.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
                                                                                              results                   results0 diff
                    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27550.00                  27507.00  -0.2%
                               test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  5395.00                   5380.00  -0.3%
                       test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5389.00                   5374.00  -0.3%
                    test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   961.00                    958.00  -0.3%
                   test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   961.00                    958.00  -0.3%
                               test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  5664.00                   5643.00  -0.4%
                       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 13202.00                  13127.00  -0.6%
                                test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   212.00                    207.00  -2.4%
                                test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   890.00                    850.00  -4.5%
                            test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  1695.00                   1581.00  -6.7%
                                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test  2338.00                   2140.00  -8.5%
                                  test-suite :: SingleSource/UnitTests/matrix-types-spec.test    63.00                     55.00 -12.7%
                             test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   468.00                    356.00 -23.9%
                                                                           Geomean difference                                     -0.3%

All numbers show an increased number of generated vector instructions.

Diff:
SingleSource/Benchmarks/Adobe-C++/loop_unroll - better without LTO, but
needs extra analysis with LTO (with LTO the compiler generates
masked_gather, while before regular loads were emitted because of extra
data available at LTO time).
SingleSource/UnitTests/matrix-types-spec - more vector code.
MultiSource/Applications/JM/lencod/lencod - same.
External/SPEC/CINT2006/464.h264ref/464.h264ref - same.
MultiSource/Benchmarks/7zip/7zip-benchmark - same.
External/SPEC/CINT2006/445.gobmk/445.gobmk - no changes.
External/SPEC/CFP2017rate/510.parest_r/510.parest_r - more vector code.
External/SPEC/CFP2006/447.dealII/447.dealII - same
External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s - same
External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp - same
External/SPEC/CFP2017rate/511.povray_r/511.povray - same
External/SPEC/CFP2006/453.povray/453.povray - same
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - same
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - same
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same

Differential Revision: https://reviews.llvm.org/D127219
2022-06-16 06:05:48 -07:00
Alexey Bataev c60c13f7eb [SLP] Improve reordering in presence of constant only nodes.
We can skip the analysis of the constant nodes; their order should not
affect the ordering of the trees/subtrees.
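
A minimal Python sketch of the idea, assuming a simple "most frequently requested order wins" vote among operand nodes. The helper and its data layout are hypothetical. Constant-only nodes are left out of the vote because a constant vector can be emitted in any lane order without a shuffle.

```python
from collections import Counter
from typing import List, Optional, Tuple

Order = Tuple[int, ...]


def most_used_order(operands: List[Tuple[Order, bool]]) -> Optional[Order]:
    """operands: (order, is_all_constant) for each operand node.

    Constant-only nodes are skipped, so they cannot outvote the nodes
    whose lane order actually matters.
    """
    votes = Counter(order for order, all_const in operands if not all_const)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

With one non-constant operand asking for `(1, 0)` and three constant operands in identity order, the swapped order wins. Without the filter, the constants would outvote it 3 to 1.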

Differential Revision: https://reviews.llvm.org/D127775
2022-06-15 06:17:34 -07:00
Nabeel Omer 245604a96f [X86][SLP] Basic test coverage for llvm.powi
This patch introduces basic test coverage for llvm.powi.* intrinsics.

Differential Revision: https://reviews.llvm.org/D127492
2022-06-15 11:13:54 +01:00
Nuno Lopes eb8cbb3ad7 [NFC] Add 3 more -inseltpoison.ll test variations 2022-06-10 14:06:32 +01:00
Alexey Bataev 3731bbc425 [SLP]Add a test for geps with non-const indices in scatter vectorize
nodes, NFC.
2022-06-07 08:02:14 -07:00
Vasileios Porpodas 6c6ad5143a [SLP][NFC] Precommit test for followup patch that fixes vector phi poison input.
Differential Revision: https://reviews.llvm.org/D126938
2022-06-06 10:00:27 -07:00
Alexey Bataev cac60940b7 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-03 08:06:22 -07:00
Fangrui Song df0f30dc36 Revert "[SLP]Improve shuffles cost estimation where possible."
This reverts commit 9980c99718.

Caused assertion failures: https://reviews.llvm.org/D115462#3555350
2022-06-03 00:30:34 -07:00
Alexey Bataev 9980c99718 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-02 11:18:14 -07:00
Alexey Bataev 73020b4540 Revert "[SLP]Improve shuffles cost estimation where possible."
This reverts commit fd5a6ce9dc to fix
a crash detected by a buildbot
https://lab.llvm.org/buildbot/#/builders/179/builds/3805/steps/11/logs/stdio.
2022-06-01 15:44:51 -07:00
Alexey Bataev fd5a6ce9dc [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-01 11:01:37 -07:00
Alexey Bataev fe4949942d [SLP]Fix PR55796: insert point for extractelements from different basic blocks.
Extractelement instructions may come from different basic blocks; this needs
to be taken into account when looking for the last instruction in the
bundle to prevent a compiler crash.

Differential Revision: https://reviews.llvm.org/D126777
2022-06-01 09:44:53 -07:00
Alexey Bataev 120d52b0ef [SLP]Fix PR55653: emit undefs where required, not poison.
Handle a corner case correctly: if all elements are undef/poison, we
need to emit the actual values, not just poison.

Differential Revision: https://reviews.llvm.org/D126298
2022-05-26 08:38:50 -07:00
Alexey Bataev 9139d484d4 [SLP]Fix crash on reordering of ScatterVectorize nodes.
ScatterVectorize nodes should be handled the same way as gathers in the
reorderBottomToTop function, since we can simply reorder the loads in
this node. Because of that, such nodes need to be included in the list of
gathered nodes to fix the compiler crash.

Differential Revision: https://reviews.llvm.org/D126378
2022-05-26 06:25:58 -07:00
Alexey Bataev 3bf5c2c8ec [SLP]Do not try to generate ScatterVectorize if it will be scalarized.
SLP should build ScatterVectorize nodes only if they actually end up
as a masked gather rather than being scalarized. In the second
scenario it is better to build a gather node.

Differential Revision: https://reviews.llvm.org/D126379
2022-05-25 14:25:07 -07:00
Alexey Bataev 10f41a2147 [SLP]Fix PR55688: Miscompile due to incorrect nuw/nsw handling.
We need to use all ReductionOps when propagating flags for the reduction
ops, otherwise the transformation is not correct. We also need to drop the
nuw/nsw flags.
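
A small worked example of why the flags must go, modelling i8 arithmetic in Python (the helper names are hypothetical): a reduction is free to reassociate the adds, and an intermediate sum that never existed in the scalar code can overflow, which under `nsw` would be poison.

```python
from typing import Optional

I8_MIN, I8_MAX = -128, 127


def add_wrap_i8(a: int, b: int) -> int:
    """Plain i8 `add`: two's-complement wraparound."""
    return (a + b + 128) % 256 - 128


def add_nsw_i8(a: int, b: int) -> Optional[int]:
    """i8 `add nsw`: signed overflow yields poison (modelled as None)."""
    r = a + b
    return r if I8_MIN <= r <= I8_MAX else None
```

With `a, b, c = 100, 50, -100`, the scalar order `(a + c) + b` never overflows and gives 50. A reassociated `(a + b) + c` overflows at `a + b` under `nsw`, while the flag-free wrapping adds still produce 50.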

Differential Revision: https://reviews.llvm.org/D126371
2022-05-25 13:59:06 -07:00
Sanjay Patel d3187dd5f0 [SLP] add minimum test for miscompile (PR55688); NFC 2022-05-25 13:52:47 -04:00
Vasileios Porpodas 9df0568b07 [SLP] Fix crash caused by reorderBottomToTop().
The crash is caused by incorrect order set by reorderBottomToTop(), which
happens when it is reordering a TreeEntry which has a user that has already been
reordered earlier. Please see the detailed description in the lit test.

Differential Revision: https://reviews.llvm.org/D126099
2022-05-24 12:24:19 -07:00
Alexey Bataev 2ac5ebedea [SLP]Do not emit extract elements for insertelements users, replace with shuffles directly.
SLP vectorizer emits extracts for externally used vectorized scalars and
estimates the cost for each such extract. But in many cases these
scalars are inputs for insertelement instructions, forming a buildvector,
and instead of an extractelement/insertelement pair we can estimate the
cost of, and emit, a series of shuffles, which can be
further optimized.

Tested using test-suite (+SPEC2017): the tests passed, SLP was able to
generate/vectorize more instructions in many cases, and it reduced the
number of re-vectorization attempts (where we could try to vectorize
buildvector insertelements again and again).
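
The core idea can be sketched in Python: a chain of `insertelement`s whose operands are `extractelement`s from the same vectorized value is equivalent to a single-source shuffle. The helpers below are hypothetical and ignore costs, multiple source vectors, and non-undef starting vectors.

```python
from typing import List, Optional, Tuple


def simulate_chain(src: List[int], width: int,
                   inserts: List[Tuple[int, int]]) -> List[Optional[int]]:
    """Scalar form: start from an undef vector (None lanes) and, for each
    (dest, lane) pair, do insertelement(acc, extractelement(src, lane), dest)."""
    acc: List[Optional[int]] = [None] * width
    for dest, lane in inserts:
        acc[dest] = src[lane]
    return acc


def chain_to_mask(width: int, inserts: List[Tuple[int, int]]) -> List[int]:
    """Fold the same chain into one shuffle mask (-1 marks an undef lane)."""
    mask = [-1] * width
    for dest, lane in inserts:
        mask[dest] = lane  # a later insert to the same lane wins
    return mask


def shuffle(src: List[int], mask: List[int]) -> List[Optional[int]]:
    """Single-source shufflevector over `src`."""
    return [None if m < 0 else src[m] for m in mask]
```

For `src = [10, 20, 30, 40]` and the chain `[(0, 3), (1, 2), (3, 0)]`, the folded mask is `[3, 2, -1, 0]`, and the shuffle reproduces the scalar chain's result.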

Differential Revision: https://reviews.llvm.org/D107966
2022-05-23 07:06:45 -07:00
Alexey Bataev bea86a2d3f [SLP][NFC]Add a test for extracting scalar from undef result vector,
NFC.
2022-05-23 06:43:37 -07:00
Florian Hahn aeb19817d6 Revert "[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly."
This reverts commit fc9c59c355.

The patch triggers an assertion when building SPEC on X86. Reduced
reproducer shared at D107966.

Also reverts follow-up commit 11a09af76d.
2022-05-21 21:00:01 +01:00
Alexey Bataev fc9c59c355 [SLP]Do not emit extract elements for insertelements users, replace with shuffles directly.
SLP vectorizer emits extracts for externally used vectorized scalars and
estimates the cost for each such extract. But in many cases these
scalars are inputs for insertelement instructions, forming a buildvector,
and instead of an extractelement/insertelement pair we can estimate the
cost of, and emit, a series of shuffles, which can be
further optimized.

Tested using test-suite (+SPEC2017): the tests passed, SLP was able to
generate/vectorize more instructions in many cases, and it reduced the
number of re-vectorization attempts (where we could try to vectorize
buildvector insertelements again and again).

Differential Revision: https://reviews.llvm.org/D107966
2022-05-20 05:58:09 -07:00
William Schmidt d633dbd195 [SLP][NFC] Pre-commit test showing vectorization preventing FMA
When we generate a horizontal reduction of floating adds fed by a vectorized
tree rooted at floating multiplies, we should account for the cost of no
longer being able to generate scalar FMAs.  Similarly, if we vectorize a
list of floating multiplies that each feeds a single floating add, we should
again account for this cost.

The first test was reduced from a case where the vectorizable tree looked
barely profitable (cost -1) with a horizontal reduction, but produced
substantially worse code than allowing the FMAs to be generated.  The second
test was derived from the first: we again generate a horizontal reduction
here, but even if the horizontal reduction is forced to be unprofitable, we
try to vectorize the multiplies.  I have follow-up patches to address these
issues.
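
One way the extra cost could be accounted for is sketched below in Python. This is not the follow-up patch: the fusability rule (every single-use fmul), the unit saving per FMA and the helper names are all illustrative assumptions. It only shows how a tree scored at -1 can flip to unprofitable once lost FMAs are charged.

```python
from typing import List, Tuple


def count_fusable_pairs(scalar_ops: List[Tuple[str, int]]) -> int:
    """scalar_ops: (opcode, num_uses) for each scalar in the tree.
    Hypothetically treat every single-use fmul as fusable into an FMA
    with the fadd that consumes it."""
    return sum(1 for op, uses in scalar_ops if op == "fmul" and uses == 1)


def adjusted_cost(vector_tree_cost: int, fusable_pairs: int,
                  fma_saving_per_pair: int = 1) -> int:
    """Add back the savings the scalar code would have got from FMAs.

    A negative result means vectorization still looks profitable.
    """
    return vector_tree_cost + fusable_pairs * fma_saving_per_pair
```

With four single-use fmuls, a tree cost of -1 becomes +3 and would be rejected, while a clearly profitable tree (say -10) stays profitable.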

Differential Revision: https://reviews.llvm.org/D124867
2022-05-19 06:57:24 -07:00
Alexey Bataev 7d8060bc19 [SLP]Improve reductions vectorization.
The pattern matching and vectorization for reductions was not very
effective. Some of the possible reduction values were marked as
external arguments, SLP could not find some reduction patterns because
of a too-early attempt to vectorize a pair of binop arguments, and the cost of
constant reductions was not correct. This patch addresses these issues and
improves the analysis/cost estimation and vectorization of the
reductions.

The most significant changes in SLP.NumVectorInstructions:

Metric: SLP.NumVectorInstructions

Program                                                                                        results  results0 diff
               test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   920.00  3548.00 285.7%
                test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test    66.00   122.00  84.8%
      test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   100.00   128.00  28.0%
 test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   664.00   810.00  22.0%
                 test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   592.00   687.00  16.0%
  test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   402.00   426.00   6.0%
                   test-suite :: MultiSource/Applications/JM/lencod/lencod.test  1665.00  1745.00   4.8%
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test   135.00   139.00   3.0%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   135.00   139.00   3.0%
                  test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   388.00   397.00   2.3%
                   test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   895.00   914.00   2.1%
    test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   240.00   244.00   1.7%
           test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   240.00   244.00   1.7%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   820.00   832.00   1.5%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   820.00   832.00   1.5%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 14804.00 14914.00   0.7%
                        test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  8125.00  8183.00   0.7%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1330.00  1338.00   0.6%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1330.00  1338.00   0.6%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  9832.00  9880.00   0.5%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5267.00  5291.00   0.5%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  4018.00  4024.00   0.1%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  4018.00  4024.00   0.1%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   426.00   424.00  -0.5%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   426.00   424.00  -0.5%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test   201.00   192.00  -4.5%
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test   201.00   192.00  -4.5%

644.nab_s and 544.nab_r - reduced number of shuffles but increased number
of useful vectorized instructions.

641.leela_s and 541.leela_r - the function
`@_ZN9FastBoard25get_pattern3_augment_specEiib` is no longer inlined,
but its body gets vectorized successfully. Before, the function was
inlined twice and vectorized just after inlining; now this is not
required. The vector code looks very similar to what it was before.

Differential Revision: https://reviews.llvm.org/D111574
2022-05-18 13:22:18 -07:00
Alexey Bataev b0f0313feb [SLP]Add an extra check for select minmax reduction to avoid crash.
We need to check whether the reduction is still a (not)cmp-select pattern
min/max reduction to avoid a compiler crash while building the list of
reduction operations. The cmp-select pattern provides 2 reduction
operations, while intrinsics provide just one.
2022-05-17 06:05:52 -07:00
Alexey Bataev 152072801e [SLP]Check if the root of the buildvector has one use only.
The root of the buildvector can have only one use, otherwise it can be
treated only as a final element of the previous buildvector sequence.
2022-05-16 07:30:36 -07:00
Alexey Bataev 8b8281f354 [SLP]Do not vectorize non-profitable alternate nodes.
If an alternate node has only 2 instructions and the tree is already big
enough, it is better to skip the vectorization of such nodes; they are not
very profitable (the resulting code contains 3 instructions instead of the
original 2 scalars). SLP can try to vectorize the buildvector sequence
in the next attempt, if it is profitable.
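
For a 2-lane bundle such as `{fadd, fsub}`, the alternate-node lowering looks roughly like the Python model below: two full-width vector operations plus a blending shuffle, i.e. 3 instructions replacing 2 scalars. The helper names are hypothetical; only the instruction count and the blend mask matter here.

```python
from typing import List


def two_source_shuffle(a: List[float], b: List[float],
                       mask: List[int]) -> List[float]:
    """shufflevector semantics: index m < len(a) selects a[m],
    otherwise b[m - len(a)]."""
    n = len(a)
    return [a[m] if m < n else b[m - n] for m in mask]


def alternate_fadd_fsub(x: List[float], y: List[float]) -> List[float]:
    """Vectorize [x0+y0, x1-y1, ...] as an alternate node:
    one full-width fadd, one full-width fsub, then a blend."""
    add = [p + q for p, q in zip(x, y)]   # vector instruction 1
    sub = [p - q for p, q in zip(x, y)]   # vector instruction 2
    n = len(x)
    # Even lanes come from `add`, odd lanes from `sub` (offset by n).
    mask = [i if i % 2 == 0 else n + i for i in range(n)]
    return two_source_shuffle(add, sub, mask)  # vector instruction 3
```

For two lanes the blend mask is `<0, 3>`, so half of each full-width result is thrown away; with a large tree, those 3 instructions can easily cost more than the 2 scalars they replace.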

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
                                                                               results                   results0 diff
     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test    72.00                     73.00   1.4%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  1186.00                   1198.00   1.0%
     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   241.00                    242.00   0.4%
                  test-suite :: MultiSource/Applications/JM/lencod/lencod.test  2131.00                   2139.00   0.4%
 test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  6377.00                   6384.00   0.1%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6377.00                   6384.00   0.1%
        test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 12650.00                  12658.00   0.1%
      test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26169.00                  26147.00  -0.1%
          test-suite :: MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des.test    99.00                     86.00 -13.1%

Gains:
526.blender_r - more vectorized trees.
enc-3des - same.

Others:
510.parest_r - no changes.
miniFE - same
623.xalancbmk_s - some (non-profitable) parts of the trees are not
    vectorized.
523.xalancbmk_r - same
lencod - same
timberwolfmc - same
miniAMR - same

Differential Revision: https://reviews.llvm.org/D125571
2022-05-13 14:28:54 -07:00
Alexey Bataev 85f6b15ee5 [SLP]Do not look for buildvector sequence, if the index is reused.
If the insert index was used already or is not constant, we should stop
looking for a unique buildvector sequence; it must be split into
2 different buildvectors.
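
The rule can be sketched in Python as a walk over the insert indices of an `insertelement` chain. The helper is hypothetical, and `None` stands in for a non-constant index.

```python
from typing import List, Optional


def unique_buildvector_prefix(indices: List[Optional[int]]) -> List[int]:
    """Walk an insertelement chain in order; `None` marks a non-constant index.

    Stop at the first non-constant or already-used index: the chain from
    there on has to be treated as a separate buildvector.
    """
    seen = set()
    prefix: List[int] = []
    for idx in indices:
        if idx is None or idx in seen:
            break
        seen.add(idx)
        prefix.append(idx)
    return prefix
```

A chain inserting at lanes `0, 1, 1, 2` yields only the prefix `[0, 1]`; the second insert into lane 1 starts a new buildvector.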
2022-05-13 13:56:02 -07:00