Commit Graph

1210 Commits

Author SHA1 Message Date
Caroline Concatto 3c03635d53 [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse
This patch adds support for reverse loop vectorization.
It is possible to vectorize the following loop:
```
  for (int i = n-1; i >= 0; --i)
    a[i] = b[i] + 1.0;
```
with fixed or scalable vector.
The loop-vectorizer will use 'reverse' on the loads/stores to make
sure the lanes themselves are also handled in the right order.
This patch adds support for scalable vector on IRBuilder interface to
create a reverse vector. The IR function
CreateVectorReverse lowers to experimental.vector.reverse for scalable vector
and keedp the original behavior for fixed vector using shuffle reverse.

Differential Revision: https://reviews.llvm.org/D95363
2021-03-16 07:51:59 +00:00
Roman Lebedev 78b8ce40ef
Reland [SCEV] Improve modelling for (null) pointer constants
This reverts commit 329aeb5db4,
and relands commit 61f006ac65.

This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-13 16:05:34 +03:00
Roman Lebedev 329aeb5db4
Temporairly evert "[SCEV] Improve modelling for (null) pointer constants"
This appears to have broken ubsan bot:
https://lab.llvm.org/buildbot/#/builders/85/builds/3062
https://reviews.llvm.org/D98147#2623549

It looks like LSR needs some kind of a change around insertion point handling.
Reverting until i have a fix.

This reverts commit 61f006ac65.
2021-03-13 09:10:28 +03:00
Roman Lebedev 61f006ac65
[SCEV] Improve modelling for (null) pointer constants
This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-12 22:11:58 +03:00
Florian Hahn 8904a82fa7
[LV] Fix name in CHECK pattern after fb3ca7076 2021-03-12 13:31:48 +00:00
Florian Hahn fb3ca70761
[LV] Account IV recipes being uniform in VPTransformState::get().
This patch fixes a crash when trying to get a scalar value using
VPTransformState::get() for uniform induction values or truncated
induction values. IVs and truncated IVs can be uniform and the updated
code accounts for that, fixing the crash.

This should fix
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31981
2021-03-12 13:29:06 +00:00
Mauri Mustonen 0de8aeae72
[VPlan] Support to widen select intructions in VPlan native path
Add support to widen select instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer.

Previously select instructions get handled by the wrong recipe and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97136
2021-03-10 20:59:53 +00:00
David Green fa450e98c5 [ARM] Test for predicated scalar memops. NFC
This test shows a case where we can potentially scalarize the store in a
predicated loop, creating a lot of instructions that would be much
slower than scalar.
2021-03-09 21:57:18 +00:00
Masoud Ataei 820f508b08 [PowerPC] Removing _massv place holder
Since P8 is the oldest machine supported by MASSV pass,
_massv place holder is removed and the oldest version of
MASSV functions is assumed. If the P9 vector specific is
detected in the compilation process, the P8 prefix will
be updated to P9.

Differential Revision: https://reviews.llvm.org/D98064
2021-03-08 21:43:24 +00:00
David Sherwood de3185647d [LoopVectorize][SVE] Add tests for vectorising conditional loads of invariant addresses
For loops of the form:

 void foo(int *a, int *cond, short *inv, long long n) {
   for (long long i=0; i<n; ++i) {
     if (cond[i])
       a[i] = *inv;
   }
 }

we can vectorise for SVE using masked gather loads where the array
of pointers is simply a vector splat of 'inv' and the mask comes
from the condition 'cond[i] != 0'.

This patch simply adds tests upstream to defend this capability.

Differential Revision: https://reviews.llvm.org/D98043
2021-03-08 08:38:31 +00:00
Mauri Mustonen 494b5ba364
[VPlan] Support to widen call intructions in VPlan native path
Add support to widen call instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer.

Previously call instructions got handled by wrong recipes and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139.

Patch by Mauri Mustonen <mauri.mustonen@tuni.fi>

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97278
2021-03-06 21:59:52 +00:00
Roman Lebedev b46c085d2b
[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.

This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.

Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.

Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
2021-03-06 21:52:46 +03:00
David Sherwood fec0a0adac [SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector
There are certain loops like this below:

  for (int i = 0; i < n; i++) {
    a[i] = b[i] + 1;
    *inv = a[i];
  }

that can only be vectorised if we are able to extract the last lane of the
vectorised form of 'a[i]'. For fixed width vectors this already works since
we know at compile time what the final lane is, however for scalable vectors
this is a different story. This patch adds support for extracting the last
lane from a scalable vector using a runtime determined lane value. I have
added support to VPIteration for runtime-determined lanes that still permit
the caching of values. I did this by introducing a new class called VPLane,
which describes the lane we're dealing with and provides interfaces to get
both the compile-time known lane and the runtime determined value. Whilst
doing this work I couldn't find any explicit tests for extracting the last
lane values of fixed width vectors so I added tests for both scalable and
fixed width vectors.

Differential Revision: https://reviews.llvm.org/D95139
2021-03-05 09:57:56 +00:00
Luke d28297ff68 [RISCV] Enable fixed-length vectorization of LoopVectorizer for RISC-V Vector
By implementing the method "unsigned RISCVTTIImpl::getRegisterBitWidth(bool Vector)",
fixed-length vectorization is enabled when possible. Without this method, the
"#pragma clang loop" directive is needed to enable vectorization(or the cost model
may inform LLVM that "Vectorization is possible but not beneficial").

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97549
2021-03-05 10:54:51 +08:00
Sanjay Patel 1bee549737 [LoopVectorize] propagate fast-math-flags from induction instructions
This code assumed that FP math was only permissable if it was
fully "fast", so it hard-coded "fast" when creating new instructions.

The underlying code already allows matching recurrences/reductions
that are only "reassoc", so this change should prevent the potential
miscompile seen in the test diffs (we created "fast" ops even though
none existed in the original code).

I don't know if we need to create the temporary IRBuilder objects
used here, so that could be follow-up clean-up.

There's an open question about whether we should require "nsz" in
addition to "reassoc" here. InstCombine uses that combo for its
reassociative folds, but I think codegen is not as strict.
2021-03-04 17:21:32 -05:00
Sanjay Patel 36a489d194 [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
Similar to b3a33553ae, but this shows a TODO and a potential
miscompile is already present.

We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification may be possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.

The new test shows that there is an existing bug somewhere in
the callers. We assumed that the original code was fully 'fast'
and so we produced IR with 'fast' even though it was just 'reassoc'.
2021-03-04 10:40:26 -05:00
Florian Hahn 0cb9d8acbc
[LV] Add test cases that require a larger number of RT checks.
Precommit tests cases for D75981.
2021-03-02 10:49:38 +00:00
Masoud Ataei 5fe0cab79e [PowerPC] Removing sqrtd2 and sqrtf4 from list of vectorizable function with MASSV
Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call to be inlined.
Inline sqrt is faster than MASSV call on leppc.

Differential Revision: https://reviews.llvm.org/D97487
2021-03-01 15:42:19 +00:00
Florian Hahn 53dacb7b67
[LV] Generate RT checks up-front and remove them if required.
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Reviewed By: lebedev.ri, ebrevnov

Differential Revision: https://reviews.llvm.org/D75980
2021-03-01 10:48:04 +00:00
Florian Hahn 6240f436dd
Recommit "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends."
This reverts the revert commit 437f0bbcd5.

It adds a new toVPRecipeResult, which forces VPRecipeOrVPValueTy to be
constructed with a VPRecipeBase *. This should address ambiguous
constructor issues for recipe sub-types that also inherit from VPValue.
2021-02-24 10:36:02 +00:00
Florian Hahn 437f0bbcd5
Revert "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends."
This reverts commit 4efa097eb4, because
some the compilers used for some bots do not support automatic
conversions to PointerUnion.
2021-02-23 16:57:21 +00:00
Florian Hahn 4efa097eb4
[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends.
Generalize the return value of tryToCreateWidenRecipe to return either a
newly create recipe or an existing VPValue. Use this to avoid creating
unnecessary VPBlendRecipes.

Fixes PR44800.
2021-02-23 16:52:03 +00:00
David Green dd2dbf7ee2 [TTI] Change getOperandsScalarizationOverhead to take Type args
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287
2021-02-23 13:04:59 +00:00
David Green bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
Florian Hahn c7ee57f1dc
[LV] Directly use incoming value for single VPBlendRecipes.
VPBlendRecipes with single incoming (value, mask) pair are no-ops. Use
the incoming value directly.
2021-02-22 16:10:08 +00:00
Florian Hahn 15a74b64df
[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe.
This patch extends VPWidenPHIRecipe to manage pairs of incoming
(VPValue, VPBasicBlock) in the VPlan native path. This is made possible
because we now directly manage defined VPValues for recipes.

By keeping both the incoming value and block in the recipe directly,
code-generation in the VPlan native path becomes independent of the
predecessor ordering when fixing up non-induction phis, which currently
can cause crashes in the VPlan native path.

This fixes PR45958.

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D96773
2021-02-22 09:44:25 +00:00
Sanjay Patel 5b250a27ec [Analysis][LoopVectorize] do not form reductions of pointers
This is a fix for https://llvm.org/PR49215 either before/after
we make a verifier enhancement for vector reductions with D96904.

I'm not sure what the current thinking is for pointer math/logic
in IR. We allow icmp on pointer values. Therefore, we match min/max
patterns, so without this patch, the vectorizer could form a vector
reduction from that sequence.

But the LangRef definitions for min/max and vector reduction
intrinsics do not allow pointer types:
https://llvm.org/docs/LangRef.html#llvm-smax-intrinsic
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-umax-intrinsic

So we would crash/assert at some point - either in IR verification,
in the cost model, or in codegen. If we do want to allow this kind
of transform, we will need to update the LangRef and all of those
parts of the compiler.

Differential Revision: https://reviews.llvm.org/D97047
2021-02-19 14:01:57 -05:00
David Green a1c34a9d6a [ARM] Correct vector predicate type in MVE getCmpSelInstrCost 2021-02-19 14:43:51 +00:00
Florian Hahn edc92a1c42
[LV] Remove VPCallback.
Now that all state for generated instructions is managed directly in
VPTransformState, VPCallBack is no longer needed. This patch updates the
last use of `getOrCreateScalarValue` to instead manage the value
directly in VPTransformState and removes VPCallback.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D95383
2021-02-19 12:50:41 +00:00
David Green 1a6744e3dc [ARM] Add larger than legal ICmp costs
A v8i32 compare will produce a v8i1 predicate, but during codegen the
v8i32 will be split into two v4i32, potentially requiring two v4i1
predicates to be merged into a single v8i1. Because this merging of two
v4i1's into a v8i1 is very expensive, we need to make the cost of the
compare equally high.

This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost.
Because we don't know whether the user of the predicate can be split,
and the cost model is mostly pre-instruction, we may be pessimistic but
that should only be for larger and legal types. This also adds min/max
detection to the costmodel where it can be detected, to keep those in
line with the cost of simple min/max instructions. Otherwise for the
most part, costs that were already expensive have become more expensive.

Differential Revision: https://reviews.llvm.org/D96692
2021-02-18 11:42:17 +00:00
David Green 1fbb3287fc [ARM] MVE ICmp costing tests. NFC 2021-02-18 10:50:34 +00:00
Joseph Huber c3a3d20093 [LV] Add analysis remark for mixed precision conversions
Floating point conversions inside vectorized loops have performance
implications but are very subtle. The user could specify a floating
point constant, or call a function without realizing that it will
force a change in the vector width. An example of this behaviour is
seen in https://godbolt.org/z/M3nT6c . The vectorizer should indicate
when this happens becuase it is most likely unintended behaviour.

This patch adds a simple check for this behaviour by following floating
point stores in the original loop and checking if a floating point
conversion operation occurs.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D95539
2021-02-17 21:37:08 -05:00
Kerry McLaughlin ba1e150d03 [SVE] Add support for scalable vectorization of loops with int/fast FP reductions
This patch enables scalable vectorization of loops with integer/fast reductions, e.g:

```
unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}
```

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.

Reviewed By: david-arm, fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D95245
2021-02-16 13:50:06 +00:00
Florian Hahn 54a14c264a
[VPlan] Manage scalarized values using VPValues.
This patch updates codegen to use VPValues to manage the generated
scalarized instructions.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D92285
2021-02-16 09:04:10 +00:00
Kerry McLaughlin 5fe1593438 [LoopVectorizer] Require no-signed-zeros-fp-math=true for fmin/fmax
Currently, setting the `no-nans-fp-math` attribute to true will allow
loops with fmin/fmax to vectorize, though we should be requiring that
`no-signed-zeros-fp-math` is also set.

This patch adds the check for no-signed-zeros at the function level and includes
tests to make sure we don't vectorize functions with only one of the attributes
associated.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96604
2021-02-15 13:47:05 +00:00
Juneyoung Lee ed253ef772 [LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask
This patch fixes pr48832 by correctly generating the mask when a poison value is involved.

Consider this CFG (which is a part of the input):

```
for.body:                                         ; preds = %for.cond
  br i1 true, label %cond.false, label %land.rhs

land.rhs:                                         ; preds = %for.body
  br i1 poison, label %cond.end, label %cond.false

cond.false:                                       ; preds = %for.body, %land.rhs
  br label %cond.end

cond.end:                                         ; preds = %land.rhs, %cond.false
  %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ]

```

The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead.
The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation.

SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026).

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D95217
2021-02-14 21:12:34 +09:00
Kerry McLaughlin fea06efe7c [SVE][LoopVectorize] Support for vectorization of loops with function calls
Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs
and adds some simple tests for loops containing a function for which
there is a vectorized variant available.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96356
2021-02-12 13:47:43 +00:00
Sanjay Patel 79b1b4a581 [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247

So I think it is safe to now remove this complication from IR.

Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.

I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.

If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.

Differential Revision: https://reviews.llvm.org/D96552
2021-02-12 08:13:50 -05:00
Florian Hahn d5387ec267
[LV] Add tests showing suboptimal vectorization for narrow types.
This patch adds additional test cases showing missing/sub-optimal
vectorization for loops which contain small and wider memory ops on
AArch64.
2021-02-11 17:24:28 +00:00
Craig Topper 18ff7e045a [RISCV] Make the min and max vector width command line options more consistent and check their relationship to each other. 2021-02-09 10:47:23 -08:00
Jinsong Ji 9202806241 Revert "[CostModel] Remove VF from IntrinsicCostAttributes"
This reverts commit 502a67dd7f.

This expose a failure in test-suite build on PowerPC,
revert to unblock buildbot first,
Dave will re-commit in https://reviews.llvm.org/D96287.

Thanks Dave.
2021-02-09 02:14:14 +00:00
Hsiangkai Wang a5b07a221a [RISCV] Initial support of LoopVectorizer for RISC-V Vector.
Define an option -riscv-vector-bits-max to specify the maximum vector
bits for vectorizer. Loop vectorizer will use the value to check if it
is safe to use the whole vector registers to vectorize the loop.

It is not the optimum solution for loop vectorizing for scalable vector.
It assumed the whole vector registers will be used to vectorize the code.
If it is possible, we should configure vl to do vectorize instead of
using whole vector registers.

We only consider LMUL = 1 in this patch.

This patch just an initial work for loop vectorizer for RISC-V Vector.

Differential Revision: https://reviews.llvm.org/D95659
2021-02-09 06:32:18 +08:00
Simon Pilgrim 22302b2be0 [LoopVectorize][ARM] Regenerate mve-gather-scatter-tailpred.ll test
Fix codegen after rG7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c
2021-02-05 12:32:45 +00:00
Adrian Kuegel 7fe41ac3df Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"
This reverts commit 3e5ce49e53.

Tests started failing on PPC, for example:
http://lab.llvm.org:8011/#/builders/105/builds/5569
2021-02-05 12:51:03 +01:00
David Green 502a67dd7f [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-05 09:34:24 +00:00
Philip Reames 4cb7d03481 Add missing test update from 3e5ce49
Sorry for the build break, apparently forgot to build ARM target.
2021-02-04 18:04:24 -08:00
Philip Reames 3e5ce49e53 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute
If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-02-04 17:28:30 -08:00
Sanjay Patel 916c4121c1 [LoopVectorize] add test for fake min/max; NFC
This goes with the dyn_cast fix:
0fa61304d2

That was made after noticing that the assert was over-reaching here:
bbed5f2f8a ( D95690 )
2021-02-03 09:24:57 -05:00
David Sherwood d4d4ceeb8f [SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE
This patch updates IRBuilder::CreateMaskedGather/Scatter to work
with ScalableVectorType and adds isLegalMaskedGather/Scatter functions
to AArch64TargetTransformInfo. In addition I've fixed up
isLegalMaskedLoad/Store to return true for supported scalar types,
since this is what the vectorizer asks for.

In LoopVectorize.cpp I've changed
LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid
cost for scalable vectors, since currently this relies upon using shuffle
vector for reversing vectors. In addition, in
LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed
that the cost of scalarising memory ops is infinitely expensive.

I have added some simple masked load/store and gather/scatter tests,
including cases where we use gathers and scatters for conditional invariant
loads and stores.

Differential Revision: https://reviews.llvm.org/D95350
2021-02-02 09:52:39 +00:00
Gil Rapaport d475030dc2 [SCEV] Apply loop guards to divisibility tests
Extend applyLoopGuards() to take into account conditions/assumes proving some
value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the
loop unroller and the loop vectorizer identify more loops as not requiring
remainder loops.

Differential Revision: https://reviews.llvm.org/D95521
2021-02-02 08:09:39 +02:00