Commit Graph

2449 Commits

Author SHA1 Message Date
Alexey Bataev af870e11ae [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 09:08:46 -07:00
Alexey Bataev b82344a019 Revert "[SLP] Add detection of shuffled/perfect matching of tree entries."
This reverts commit daf6e18c55 to fix the
compiler crash.
2021-04-20 08:29:32 -07:00
Alexey Bataev daf6e18c55 [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 07:46:49 -07:00
Alexey Bataev cf00cb8bed Revert "[SLP] Add detection of shuffled/perfect matching of tree entries."
This reverts commit b232771aca to fix
buildbots.
2021-04-20 07:16:11 -07:00
Alexey Bataev b232771aca [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 06:55:55 -07:00
Sander de Smalen 86729538bd [LV] Let selectVectorizationFactor reason directly on VectorizationFactor.
Rather than maintaining two separate values, a `float` for the per-lane
cost and a Width for the VF, maintain a single VectorizationFactor which
comprises the two and also removes the need for converting an integer value
to float.

This simplifies the query when asking if one VF is more profitable than
another when we want to extend this for scalable vectors (which may
require additional options to determine if e.g. a scalable VF of the
some cost, is more profitable than a fixed VF of the same cost).

The patch isn't entirely NFC because it also fixes an issue in
selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs
no longer truncates the floating-point cost from `float` to `unsigned` to
then perform the calculation on the truncated cost. It now does
a cost comparison with the correct precision.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100121
2021-04-20 09:54:45 +01:00
Alexey Bataev 8030481065 Revert "[SLP]Add detection of shuffled/perfect matching of tree entries."
This reverts commit d6fde91379 to fix
compiler crashes.
2021-04-19 14:10:04 -07:00
Alexey Bataev d6fde91379 [SLP]Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Differential Revision: https://reviews.llvm.org/D100495
2021-04-19 13:29:30 -07:00
Cullen Rhodes f0bc2782f2 [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
Florian Hahn 49999d4364 [VPlan] Replace a few unnecessary includes with forward decls. 2021-04-15 20:08:31 +01:00
Florian Hahn 6adebe3fd2 [VPlan] Add VPRecipeBase::mayHaveSideEffects.
Add an initial version of a helper to determine whether a recipe may
have side-effects.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D100259
2021-04-15 11:49:40 +01:00
David Sherwood ea14df695e [SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction
There were a few places in widenPHIInstruction where calculations of
offsets were failing to take the runtime calculation of VF into
account for scalable vectors. I've fixed those cases in this patch
as well as adding an assert that we should not be scalarising for
scalable vectors.

Tests are added here:

  Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D99254
2021-04-15 10:51:49 +01:00
David Sherwood 7120f89f7d [NFC][LoopVectorize] Remove unnecessary VF.isScalable asserts
There are a few places in LoopVectorize.cpp where we have been too
cautious in adding VF.isScalable() asserts and it can be confusing.
It also makes it more difficult to see the genuine places where
work needs doing to improve scalable vectorization support.

This patch changes getMemInstScalarizationCost to return an
invalid cost instead of firing an assert for scalable vectors. Also,
vectorizeInterleaveGroup had multiple asserts all for the same
thing. I have removed all but one assert near the start of the
function, and added a new assert that we aren't dealing with masks
for scalable vectors.

Differential Revision: https://reviews.llvm.org/D99727
2021-04-15 09:41:03 +01:00
Simon Pilgrim b49c41afba [SLP] createOp - fix null dereference warning. NFCI.
Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.
2021-04-14 15:24:41 +01:00
Sander de Smalen bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
Sander de Smalen 92d8421f49 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Alexey Bataev ab124bbe2a [SLP]Fix PR49898: Infinite loop in SLP vectorizer.
We should not re-try attempt of finding of the consecutive store chain
if it was tried before.

Differential Revision: https://reviews.llvm.org/D100131
2021-04-08 14:18:06 -07:00
Florian Hahn e4de3cdf3d [LV] Pass VPWidenPHIRecipe to widenPHIInstruction (NFC).
Instead of passing the start value and the defined value to
widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be
used to get both (and more in future patches).
2021-04-08 14:25:10 +01:00
David Green 8675ef100f [LV] Logical and/or select costs
D99674 stopped the folding of certain select operations into and/or, due
to incorrect folding in the presence of poison. D97360 added some costs
to attempt to account for the change, but only worked at the getUserCost
level, not the getCmpSelInstrCost that the vectorizer will use directly.
This adds similar logic into the vectorizer to handle these logical
and/or selects, treating them like and/or directly.

This fixes 60% performance regressions from code like the attached test
case.

Differential Revision: https://reviews.llvm.org/D99884
2021-04-08 10:39:47 +01:00
Alexey Bataev a78e86e6be [SLP]Avoid multiple attempts to vectorize CmpInsts.
No need to lookup through and/or try to vectorize operands of the
CmpInst instructions during attempts to find/vectorize min/max
reductions. Compiler implements postanalysis of the CmpInsts so we can
skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save
compile time.

Differential Revision: https://reviews.llvm.org/D99950
2021-04-07 06:15:42 -07:00
Philip Reames a6d2a8d6f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Kerry McLaughlin 7344f3d39a [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
be an input to the horizontal reduction, e.g:

  %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
  %load = load <8 x float>
  %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)

This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D98435
2021-04-06 14:45:34 +01:00
Kerry McLaughlin 857b8a73da [LoopVectorize] Change the identity element for FAdd
Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.

Reviewed By: dmgreen, spatel

Differential Revision: https://reviews.llvm.org/D98963
2021-04-06 12:13:43 +01:00
Florian Hahn a6b06b785c [VPlan] Print VPValue operands for VPWidenPHI if possible.
For VPWidenPHIRecipes that model all incoming values as VPValue
operands, print those operands instead of printing the original PHI.

D99294 updates recipes of reduction PHIs to use the VPValue for the
incoming value from the loop backedge, making use of this new printing.
2021-04-06 12:11:21 +01:00
Alexey Bataev 00a84f9a7f [SLP]Improve vectorization of the CmpInst instructions.
During vectorization better to postpone the vectorization of the CmpInst
instructions till the end of the basic block. Otherwise we may vectorize
it too early and may miss some vectorization patterns, like reductions.

Reworked part of D57059

Differential Revision: https://reviews.llvm.org/D99796
2021-04-05 06:22:51 -07:00
Fangrui Song 8e5f3d04f2 [SLPVectorizer] Fix divide-by-zero after D99719
Will add a test case later.
2021-04-02 11:13:51 -07:00
Florian Hahn 8867fc69f0 [LV] Hoist mapping of IR operands to VPValues (NFC).
This patch moves mapping of IR operands to VPValues out of
tryToCreateWidenRecipe. This allows using existing VPValue operands when
widening recipes directly, which will be introduced in future patches.
2021-04-02 17:57:20 +01:00
Alexey Bataev 5fcb07a070 [SLP]Fix a bug in min/max reduction, number of condition uses.
The ultimate reduction node may have multiple uses, but if the ultimate
reduction is min/max reduction and based on SelectInstruction, the
condition of this select instruction must have only single use.

Differential Revision: https://reviews.llvm.org/D99753
2021-04-02 07:09:44 -07:00
Florian Hahn 0f3230390b
[SLP] Better estimate cost of no-op extracts on target vectors.
The motivation for this patch is to better estimate the cost of
extracelement instructions in cases were they are going to be free,
because the source vector can be used directly.

A simple example is

    %v1.lane.0 = extractelement <2 x double> %v.1, i32 0
    %v1.lane.1 = extractelement <2 x double> %v.1, i32 1

    %a.lane.0 = fmul double %v1.lane.0, %x
    %a.lane.1 = fmul double %v1.lane.1, %y

Currently we only consider the extracts free, if there are no other
users.

In this particular case, on AArch64 which can fit <2 x double> in a
vector register, the extracts should be free, independently of other
users, because the source vector of the extracts will be in a vector
register directly, so it should be free to use the vector directly.

The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on
certain AArch64 CPUs.

It looks like this does not impact any code in
SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto.

This originally regressed after D80773, so if there's a better
alternative to explore, I'd be more than happy to do that.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99719
2021-04-02 10:40:12 +01:00
Alexey Bataev c03696da5e [SLP]Improve and fix getVectorElementSize.
1. Need to cleanup InstrElementSize map for each new tree, otherwise might
use sizes from the previous run of the vectorization attempt.
2. No need to include into analysis the instructions from the different basic
   blocks to save compile time.

Differential Revision: https://reviews.llvm.org/D99677
2021-04-01 06:51:26 -07:00
Alexey Bataev ce98a0556a [SLP]Remove `else` after `return`, NFC.` 2021-04-01 05:33:01 -07:00
Huihui Zhang fe5c4a06a4 [LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism.
Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this
can help avoid non-determinism caused by iterating over unordered containers.

This bug was found with reverse iteration turning on,
--extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON".
Failing LLVM test consecutive-ptr-uniforms.ll .

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99549
2021-03-31 11:21:07 -07:00
Sander de Smalen 7108b2dec1 [SVE] Fix LoopVectorizer test scalalable-call.ll
This marks FSIN and other operations to EXPAND for scalable
vectors, so that they are not assumed to be legal by the cost-model.

Depends on D97470

Reviewed By: dmgreen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97471
2021-03-31 14:52:49 +01:00
Huihui Zhang d857a81437 [VPlan] Use SetVector for VPExternalDefs to prevent non-determinism.
Use SetVector instead of SmallPtrSet for external definitions created for VPlan.
Doing this can help avoid non-determinism caused by iterating over unordered containers.

This bug was found with reverse iteration turning on,
 --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON".
Failing LLVM-Unit test VPRecipeTest.dump.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99544
2021-03-30 12:10:56 -07:00
David Sherwood a08c7736a7 [LoopVectorize] Add support for scalable vectorization of induction variables
This patch adds support for the vectorization of induction variables when
using scalable vectors, which required the following changes:

1. Removed assert from InnerLoopVectorizer::getStepVector.
2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use
   a runtime determined value for VF and removed an assert.
3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable
   vectors. I did this by calculating the full vector value for each Part
   of the unroll factor (UF) and caching this in the VP state. This means
   that we are always able to extract an arbitrary element from the vector
   if necessary. In addition to this, I also permitted the caching of the
   individual lane values themselves for the known minimum number of elements
   in the same way we do for fixed width vectors. This is a further
   optimisation that improves the code quality since it avoids unnecessary
   extractelement operations when extracting the first lane.
4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while
   testing some code paths I noticed this is currently broken for scalable
   vectors.

Various tests to support different cases have been added here:

  Transforms/LoopVectorize/AArch64/sve-inductions.ll

Differential Revision: https://reviews.llvm.org/D98715
2021-03-30 11:13:31 +01:00
Florian Hahn c773d0f973
Recommit "[LV] Move runtime pointer size check to LVP::plan()."
Re-apply 25fbe803d4, with a small update to emit the right remark
class.

Original message:
    [LV] Move runtime pointer size check to LVP::plan().

    This removes the need for the remaining doesNotMeet check and instead
    directly checks if there are too many runtime checks for vectorization
    in the planner.

    A subsequent patch will adjust the logic used to decide whether to
    vectorize with runtime to consider their cost more accurately.

    Reviewed By: lebedev.ri
2021-03-29 16:14:27 +01:00
Florian Hahn 485c8ce733
Revert "[LV] Move runtime pointer size check to LVP::plan()."
This reverts commit 25fbe803d4.

This breaks a clang test which filters for the wrong remark type.
2021-03-29 14:41:53 +01:00
Sanjay Patel da381cf7ce [SLP] allow matching integer min/max intrinsics as reduction ops
This is a 2nd try of:
3c8473ba53
which was reverted at:
 a26312f9d4
because of crashing.

This version includes extra code and tests to avoid the known
crashing examples as discussed in PR49730.

Original commit message:
As noted in D98152, we need to patch SLP to avoid regressions when
we start canonicalizing to integer min/max intrinsics.
Most of the real work to make this possible was in:
7202f47508

Differential Revision: https://reviews.llvm.org/D98981
2021-03-29 09:38:18 -04:00
Florian Hahn 25fbe803d4
[LV] Move runtime pointer size check to LVP::plan().
This removes the need for the remaining doesNotMeet check and instead
directly checks if there are too many runtime checks for vectorization
in the planner.

A subsequent patch will adjust the logic used to decide whether to
vectorize with runtime to consider their cost more accurately.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98634
2021-03-29 14:12:29 +01:00
Florian Hahn 8c6c357897
[LV] Mark a few more cost-model members as const (NFC). 2021-03-28 14:59:48 +01:00
Florian Hahn d2855eba81
[LV] Fix formatting from 2f9d68c3f1. 2021-03-27 21:29:56 +00:00
Florian Hahn 2f9d68c3f1
[LV] Mark some methods as const (NFC).
Mark a few methods as const, as they do not modify any state.
2021-03-27 21:27:53 +00:00
Sanjay Patel b0797e0c12 [SLP] use dyn_cast instead of isa + cast; NFC 2021-03-26 13:52:31 -04:00
Sanjay Patel a26312f9d4 Revert "[SLP] allow matching integer min/max intrinsics as reduction ops"
This reverts commit 3c8473ba53 and includes test diffs to
maintain testing status.

There's at least 1 place that was not updated with 7202f47508 ,
so we can crash mismatching select and intrinsics as shown in
PR49730.
2021-03-26 09:59:14 -04:00
David Sherwood c39460cc4f Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 240aa96cf2.
2021-03-26 11:36:53 +00:00
David Sherwood 240aa96cf2 [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast with scalar uses, or b) this is an update to an induction variable
that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. For all other cases I have added an assert that none of the
users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

Differential Revision: https://reviews.llvm.org/D98512
2021-03-26 11:27:12 +00:00
Yevgeny Rouban f7ef26ef0b [SLP] Fix crash in reduction for integer min/max
The SCEV commit b46c085d2b [NFCI] SCEVExpander:
    emit intrinsics for integral {u,s}{min,max} SCEV expressions
seems to reveal a new crash in SLPVectorizer.
SLP crashes expecting a SelectInst as an externally used value
but umin() call is found.

The patch relaxes the assumption to make the IR flag propagation safe.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D99328
2021-03-25 21:44:21 +07:00
Alexey Bataev 568c874117 [SLP]Improve and simplify extendSchedulingRegion.
We do not need to scan further if the upper end or lower end of the
basic block is reached already and the instruction is not found. It
means that the instruction is definitely in the lower part of basic
block or in the upper block relatively.
This should improve compile time for the very big basic blocks.

Differential Revision: https://reviews.llvm.org/D99266
2021-03-25 05:31:58 -07:00
Florian Hahn 9d45579279
[LV] Factor out phi type access to variable (NFC).
A slight simplification of the code to reduce future diffs.
2021-03-24 19:25:22 +00:00