Commit Graph

3124 Commits

Author SHA1 Message Date
Vasileios Porpodas 0950d4060c Recommit "[SLP] Make reordering aware of external vectorizable scalar stores."
This reverts commit c2a7904aba.

Original code review: https://reviews.llvm.org/D125111
2022-05-11 16:47:29 -07:00
Arthur Eubanks c2a7904aba Revert "[SLP] Make reordering aware of external vectorizable scalar stores."
This reverts commit 71bcead98b.

Causes crashes, see comments in D125111.
2022-05-11 15:28:00 -07:00
Alexey Bataev f5d45d70a5 [SLP]Further improvement of the cost model for scalars used in buildvectors.
Further improvement of the cost model for the scalars used in
buildvectors sequences. The main functionality is outlined into
a separate function.
The cost is calculated in the following way:
1. If the Base vector is not undef vector, resizing the very first mask to
have common VF and perform action for 2 input vectors (including non-undef
Base). Other shuffle masks are combined with the resulting after the 1 stage and processed as a shuffle of 2 elements.
2. If the Base is undef vector and have only 1 shuffle mask, perform the
action only for 1 vector with the given mask, if it is not the identity
mask.
3. If > 2 masks are used, perform serie of shuffle actions for 2 vectors,
combing the masks properly between the steps.

The original implementation misses the very first analysis for the Base
vector, so the cost might too optimistic in some cases. But it improves
the cost for the insertelements which are part of the current SLP graph.

Part of D107966.

Differential Revision: https://reviews.llvm.org/D115750
2022-05-11 06:08:55 -07:00
Florian Hahn 635b752211
[VPlan] VPInterleaveRecipe only uses first lane if op not stored.
With opaque pointers, both the stored value and the address can be the
same. Only consider the recipe using the first lane only *if* the
address is not stored.

Fixes #55375.
2022-05-11 11:24:56 +01:00
Vasileios Porpodas 71bcead98b [SLP] Make reordering aware of external vectorizable scalar stores.
The current reordering scheme only checks the ordering of in-tree operands.
There are some cases, however, where we need to adjust the ordering based on
the ordering of a future SLP-tree who's instructions are not part of the
current tree, but are external users.

This patch is a simple implementation of this. We keep track of scalar stores
that are users of TreeEntries and if they look profitable to vectorize, then
we keep track of their ordering. During the reordering step we take this new
index order into account. This can remove some shuffles in cases like in the
lit test.

Differential Revision: https://reviews.llvm.org/D125111
2022-05-10 15:25:35 -07:00
Alexey Bataev 4212ef8a0e Revert "[SLP]Further improvement of the cost model for scalars used in buildvectors."
This reverts commit 99f31acfce and several
others to fix detected crashes, reported in https://reviews.llvm.org/D115750
2022-05-09 13:46:06 -07:00
Alexey Bataev cce80bd8b7 [SLP]Adjust assertion check for scalars in several insertelements.
If the same scalar is inserted several times into the same buildvector,
the mask index can be used already. In this case need to check, that
this scalar is already part of the vectorized buildvector.
2022-05-09 13:07:59 -07:00
Florian Hahn 266ea446ab
Revert "Recommit "[VPlan] Remove uneeded needsVectorIV check.""
This reverts commit 8b48223447.

This triggers an assertion on a test case mentioned in D123720.
Revert while I investigate.
2022-05-09 20:33:14 +01:00
Alexey Bataev 9dc4ced204 [SLP]Try partial store vectorization if supported by target.
We can try to vectorize number of stores less than MinVecRegSize
/ scalar_value_size, if it is allowed by target. Gives an extra
opportunity for the vectorization.

Fixes PR54985.

Differential Revision: https://reviews.llvm.org/D124284
2022-05-09 09:48:15 -07:00
Alexey Bataev 9c3a75eabf [SLP]Fix a crash when preparing a mask for external scalars.
Need to use actual index instead of the tree entry position, since the
insert index may be different than 0. It mean, that we vectorized part
of the buildvector starting from not initial insertelement instruction
beause of some reason.
2022-05-09 07:59:34 -07:00
David Green 6f9e1ea0ef [VectorCombine] Attempt to fold select shuffles from reductions
Given a commutative reduction leading from a shuffle, the order of the
lanes on the shuffle are not important for the result. This means we can
reorder the shuffle to something simpler, which we try shuffling the
first vector lanes first. This was D123494.

The new shuffle may not be profitable though, and if it is not we can
try the folding of select shuffles from D123911. This, with some
adjustment as the output lane ordering is now unimportant, can allow the
final shuffle to simplify given the inputs to the patterns from D123911.
Where as each transformation on their own are not profitable, the
combination is.

We can only support a single shuffle when called from reductions, but we
are able to sort the ReconstructMask, potentially allowing it to
simplify to an identity or concat mask.

Differential Revision: https://reviews.llvm.org/D125086
2022-05-08 10:32:41 +01:00
David Green 802e15c576 [SLP] Cluster ordering for loads
Given a load without a better order, this patch partially sorts the
elements to form clusters of adjacent elements in memory. These clusters
can potentially be loaded in fewer loads, meaning less overall shuffling
(for example loading v4i8 clusters of a v16i8 as a single f32 loads, as
opposed to multiple independent bytes loads and inserts).

Differential Revision: https://reviews.llvm.org/D122145
2022-05-07 14:38:11 +01:00
David Green 100cb9a2ba [VectorCombine] Fold shuffle select pattern
This patch adds a combine to attempt to reduce the costs of certain
select-shuffle patterns. The form of code it attempts to detect is:
  %x = shuffle ...
  %y = shuffle ...
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

A classic select-mask will pick items from each lane of a or b. These
do not always have a great lowering on many architectures. This patch
attempts to pack a and b into the lower elements, creating a differently
ordered shuffle for reconstructing the orignal which may be better than
the select mask. This can be better for performance, especially if less
elements of a and b need to be computed and the input shuffles are
cheaper.

Because select-masks are just one form of shuffle, we generalize to any
mask. So long as the backend has decent costmodel for the shuffles, this
can generally improve things when they come up. For more basic cost
models the folds do not appear to be profitable, not getting past the
cost checks.

Differential Revision: https://reviews.llvm.org/D123911
2022-05-06 08:13:18 +01:00
Florian Hahn f9f7aa30f8
[VPlan] Remove dead code to create VPWidenPHIRecipes (NFCI).
After introducing VPWidenPointerInductionRecipe, VPWidenPHIRecipes
should not be created at this point. Turn check into an assert.
2022-05-05 19:29:02 +01:00
Alexey Bataev 99f31acfce [SLP]Further improvement of the cost model for scalars used in buildvectors.
Further improvement of the cost model for the scalars used in
buildvectors sequences. The main functionality is outlined into
a separate function.
The cost is calculated in the following way:
1. If the Base vector is not undef vector, resizing the very first mask to
have common VF and perform action for 2 input vectors (including non-undef
Base). Other shuffle masks are combined with the resulting after the 1 stage and processed as a shuffle of 2 elements.
2. If the Base is undef vector and have only 1 shuffle mask, perform the
action only for 1 vector with the given mask, if it is not the identity
mask.
3. If > 2 masks are used, perform serie of shuffle actions for 2 vectors,
combing the masks properly between the steps.

The original implementation misses the very first analysis for the Base
vector, so the cost might too optimistic in some cases. But it improves
the cost for the insertelements which are part of the current SLP graph.

Part of D107966.

Differential Revision: https://reviews.llvm.org/D115750
2022-05-05 06:04:25 -07:00
Florian Hahn 8b48223447
Recommit "[VPlan] Remove uneeded needsVectorIV check."
This reverts commit f4e1eaa375.

The patch was originally reverted because it uncovered an issue that has
now been fixed in 0ef8ca6d88.
2022-05-04 10:53:42 +01:00
serge-sans-paille 7030654296 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since fa5a4e1b95 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.org/D124847
2022-05-04 08:32:38 +02:00
Igor Kirillov 4e5e042d9a [LoopVectorize] Support reductions that store intermediary result
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.

Ordered fadd reductions are not yet supported.

Differential Revision: https://reviews.llvm.org/D110235
2022-05-03 10:12:30 +01:00
David Green 6f81903e89 [LV][SLP] Mark fptosi_sat as vectorizable
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.

The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.

Differential Revision: https://reviews.llvm.org/D124358
2022-05-03 09:32:34 +01:00
Alexey Bataev e74a73782f [SLP][NFC]Minor code changes for better readability, NFC. 2022-05-02 12:58:25 -07:00
Alexey Bataev 7ea03f0b4e [SLP]Improve reductions analysis and emission, part 1.
Currently SLP vectorizer walks through the instructions and selects
3 main classes of values: 1) reduction operations - instructions with same
reduction opcode (add, mul, min/max, etc.), which build the reduction,
2) reduced values - instructions with the same opcodes, but different
from the reduction opcode, 3) extra arguments - all other values,
instructions from the different basic block rather than the root node,
instructions with to many/less uses.

This scheme is not very efficient. It excludes some instructions and all
non-instruction values from the reductions (constants, proficient
gathers), to many possibly reduced values are marked as extra arguments.
Patch improves this process by introducing a bit extended analysis
stage. During this stage, we still try to select 3 classes of the
values: 1) reduction operations - same as before, 2) possibly reduced
values - all instructions from the current block/non-instructions, which
may build a vectorization tree, 3) extra arguments - instructions from
the different basic blocks. Additionally, an extra sorting of the
possibly reduced values occurs to build the scalar sequences which
highly likely will bed vectorized, e.g. loads are grouped by the
distance between them, constants are grouped together, cmp instructions
are sorted by their compare types and predicates, extractelement
instructions are sorted by the vector operand, etc. Also, these groups
are reordered by their length so the longest group is the first in the
list of the possibly reduced values.

The vectorization process tries to emit the reductions for all these
groups. These reductions, remaining non-vectorized possible reduced
values and extra arguments are then combined into the final expression
just like it was before.

Differential Revision: https://reviews.llvm.org/D114171
2022-05-02 12:03:58 -07:00
Florian Hahn 0ef8ca6d88
[VPlan] Do not create VPWidenCall recipes for scalar vector factors.
'Widen' recipe are only used when actual vector values are generated.
Fix tryToWidenCall to do not create VPWidenCallRecipes for scalar vector
factors.

This was exposed by D123720, because the widened recipes are considered
vector users.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D124718
2022-05-02 19:40:33 +01:00
Simon Pilgrim 34f97a3709 [VectorCombine] Merge isa<>/cast<> into dyn_cast<>. NFC.
We want to handle the the assert in VectorCombine so avoid the repeated isa/cast code.
2022-05-01 20:09:10 +01:00
Simon Pilgrim 09761ce295 [SLPVectorizer] Remove weird unicode character from comment. NFCI.
Whatever it was, Visual Assist really didn't like it....
2022-05-01 16:37:21 +01:00
Alexey Bataev 484fcb9888 [SLP][NFC]Fix a comment. 2022-04-29 09:27:13 -07:00
Anna Thomas 205246cb64 [CompileTime] [Passes] Avoid computing unnecessary analyses. NFC
Similar to c515b2f39e, If there are no loops in the function as seen
through LI, we should avoid computing the remaining expensive analyses
(such as SCEV, BPI).  Reordered the analyses requests and early return
if there are no loops.

The logic of avoiding expensive analyses is applied to LoopVectorizer,
LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate
on loops.

This is an NFC with compile time improvement.

Differential Revision: https://reviews.llvm.org/D124529
2022-04-29 10:00:06 -04:00
Ricky Zhou 24a133e16f
[LV] Rename CountRoundDown to VectorTripCount (NFC)
The name CountRoundDown is potentially misleading, as the number of
iterations can be rounded up when folding the tail.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119681
2022-04-29 13:50:00 +01:00
Florian Hahn e66127e69b
[VPlan] Simplify & adjust code as suggested in D123005.
Improve code as suggested in D123005. Applied separately, because the
comments where made a diff that has not been rebased to current main.
2022-04-29 13:34:54 +01:00
David Green 7047c47918 [VecCombine] Fix sort comparator logic in foldShuffleFromReductions
I think this sort comparator was overly complex, and the windows
expensive check bot agreed, failing as it was not giving a strict weak
ordering. Change it to use the comparison of the mask values as unsigned
integers. This should sort the undef elements to the end whilst keeping
X<Y otherwise.
2022-04-29 09:30:02 +01:00
Florian Hahn f4e1eaa375
Revert "[VPlan] Remove uneeded needsVectorIV check."
This reverts commit 43842b887e while I
investigate a buildbot failure.

It also reverts the follow-up commit
2883de0514.
2022-04-28 20:16:21 +01:00
David Green ded8187e35 [VectorCombine] Try to reduce shuffle cost for commutative reduction operands
Given a shuffle feeding a commutative reduction, the lane ordering of
the shuffle will not alter the result. This is also true if there are a
number of operations between the reduction and the shuffle, providing
they only operate lane-wise. This patch searches for cases like that in
Vector Combine, allowing us to check the cost of the shuffle vs an
in-order identity shuffle and replace the order if possible. This only
handles a single shuffle at the moment to keep things simple, and is
able to ignore splats that produce results where every result is the
same.

This is a more powerful version of a combine that already happens in
instrcombine, capable of optimizing more cases by looking through more
instructions and being able to cost the shuffle.

Differential Revision: https://reviews.llvm.org/D123494
2022-04-28 19:46:12 +01:00
Alexey Bataev 75e1cf4a6a [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-28 10:04:41 -07:00
Florian Hahn 2883de0514
[VPlan] Fix comment formatting from 43842b887e. 2022-04-28 16:31:48 +01:00
Florian Hahn 43842b887e
[VPlan] Remove uneeded needsVectorIV check.
Remove one of the last remaining uses of ::needsVectorIV, preparing for
its removal. Now that usesScalars is available and based on the
information explicit in VPlan, there is no need to use the pre-computed
needsVectorIV.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123720
2022-04-28 16:27:34 +01:00
Alexey Bataev 9861ca0c23 Revert "[COST]Improve cost model for shuffles in SLP."
This reverts commit 29a470e380 to fix
a crash reported in https://reviews.llvm.org/D100486#3479989.
2022-04-28 08:11:56 -07:00
Alexey Bataev 29a470e380 [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-27 10:56:26 -07:00
Shilei Tian a6b355dd31 [SLP] Fix a typo that causes redundant assertion and potential segment fault
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D124497
2022-04-27 10:07:59 -04:00
Vasileios Porpodas fa8a9fea47 Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20.

Code review: https://reviews.llvm.org/D124202
2022-04-26 14:02:40 -07:00
Vasileios Porpodas 6a9bbd9f20 Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 55ce296d6f.
2022-04-26 11:25:26 -07:00
Vasileios Porpodas 55ce296d6f [SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.

Differential Revision: https://reviews.llvm.org/D124202
2022-04-26 11:11:29 -07:00
Valery N Dmitriev 88b9e46fb5 [SLP] Steer for the best chance in tryToVectorize() when rooting with binary ops.
tryToVectorize() method implements one of searching paths for vectorizable tree roots in SLP vectorizer,
specifically for binary and comparison operations. Order of making probes for various scalar pairs
was defined by its implementation: the instruction operands, then climb over one operand if
the instruction is its sole user and then perform same actions for another operand if previous
attempts failed. Problem with this approach is that among these options we can have more than a
single vectorizable tree candidate and it is not necessarily the one that encountered first.
Trying to build vectorizable tree for each possible combination for just evaluation is expensive.
But we already have lookahead heuristics mechanism which we use for finding best pick among
operands of commutative instructions. It calculates cumulative score for candidates in two
consecutive lanes. This patch introduces use of the heuristics for choosing the best pair among
several combinations. We only try one that looks as most promising for vectorization.
Additional benefit is that we reduce total number of vectorization trees built for probes
because we skip those looking non-profitable early.

Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo)
Differential Revision: https://reviews.llvm.org/D124309
2022-04-25 12:25:33 -07:00
David Green 9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Valery N Dmitriev edf7bed87b [SLP][NFC] Outline lookahead heuristics into a separate helper class.
Minor refactoring to reduce size of functional change D124309:
  look-ahead scoring routines pulled out of VLOperands and formed
  new LookAheadHeuristics helper class.

Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo)
Differential Revision: https://reviews.llvm.org/D124313
2022-04-22 18:59:08 -07:00
Vasileios Porpodas 889588ee97 [SLP] Refactoring isLegalBroadcastLoad() to use `ElementCount`.
Replacing `unsigned` with `ElementCount` in the argument of `isLegalBroadcastLoad()`.
This helps reduce the diff of a future SLP patch for AArch64.
2022-04-21 10:19:00 -07:00
Florian Hahn bea69b232f
[VPlan] Initial modeling of middle block in VPlan.
This patch extends the scope of VPlan to also include the exit (aka
middle) block.

For now, the exit block remains empty, but handling of exit values will
subsequently be moved to VPlan, by adding recipes to model exit values
in the exit block.

As a first step, this will allow fixing #51366.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123457
2022-04-20 19:34:41 +01:00
Vasileios Porpodas 8d4b5e0833 [NFC][SLP] Improved description of getShallowScore() and getScoreAtLevelRec()
Differential Revision: https://reviews.llvm.org/D124027
2022-04-19 12:15:36 -07:00
Florian Hahn 4026b718b8
[VPlan] Remove unused SCEV forward declaration (NFC). 2022-04-19 17:16:17 +02:00
Alexey Bataev 883571928c Revert "[SLP]Improve reductions analysis and emission, part 1."
This reverts commit 0e1f4d4d3c to fix
a crash reported in PR54976
2022-04-19 06:17:03 -07:00
Florian Hahn a65f2730d2
[VPlan] Expand induction step in VPlan pre-header.
This patch moves SCEV expansion of steps used by
VPWidenIntOrFpInductionRecipes to the pre-header using
VPExpandSCEVRecipe. This ensures that those steps are expanded while the
CFG is in a valid state. Previously, SCEV expansion may happen during
vector body code-generation, during which the CFG may be invalid,
causing issues with SCEV expansion.

Depends on D122095.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D122096
2022-04-19 13:06:39 +02:00
Vasileios Porpodas b1333f03d9 Recommit "[SLP] Support internal users of splat loads"
Code review: https://reviews.llvm.org/D121940

This reverts commit 359dbb0d3d.
2022-04-18 15:58:01 -07:00