Commit Graph

2918 Commits

Author SHA1 Message Date
Dávid Bolvanský 5f2f611880 Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical 2021-10-03 13:58:10 +02:00
hyeongyu kim cf284f6c5e [LSV] Change the default value of InsertElement to poison
This patch changes the InsertElement's placeholder to poison without changing the LSV's behavior.

Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value.
Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D111005
2021-10-03 17:57:34 +09:00
Kazu Hirata 4f0225f6d2 [Transforms] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-01 09:57:40 -07:00
Kerry McLaughlin c1d46d3461 [SLPVectorizer] Fix crash in isShuffle with scalable vectors
D104809 changed `buildTree_rec` to check for extract element instructions
with scalable types. However, if the extract is extended or truncated,
these changes do not apply and we assert later on in isShuffle(), which
attempts to cast the type of the extract to FixedVectorType.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D110640
2021-10-01 10:56:44 +01:00
Krasimir Georgiev 685f1bfd0a Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns"
It appears to cause stage2 clang build failures, e.g.,
https://lab.llvm.org/buildbot/#/builders/74/builds/7145.

This reverts commit 1fb37334bd.
2021-10-01 11:39:43 +02:00
David Sherwood 1fb37334bd [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.
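
The idea can be sketched in plain C++. This is only an illustration, not
the code LoopVectorize emits: the lane count VF, the function names and
the use of one boolean per lane (instead of a vector holding 'a' or 'b')
are assumptions, and the sketch assumes the trip count is a multiple of
VF so no remainder loop is needed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: the loop from the commit message.
int selectScalar(std::vector<int> &src, int a, int b) {
  int r = a;
  for (std::size_t i = 0; i < src.size(); i++) {
    if (src[i] > 3)
      r = b;
    src[i] += 2;
  }
  return r;
}

constexpr std::size_t VF = 4; // illustrative vectorization factor (assumption)

// Hypothetical lane-wise version: each lane remembers whether it ever
// selected 'b'; the final reduction yields 'b' if any lane did.
int selectLanes(std::vector<int> &src, int a, int b) {
  assert(src.size() % VF == 0 && "sketch assumes no remainder iterations");
  std::array<bool, VF> sawB{}; // every lane starts in state 'a'
  for (std::size_t i = 0; i < src.size(); i += VF) {
    for (std::size_t lane = 0; lane < VF; lane++) {
      sawB[lane] = sawB[lane] || (src[i + lane] > 3);
      src[i + lane] += 2;
    }
  }
  // Final "any lane contains b" reduction.
  bool any = false;
  for (bool laneSawB : sawB)
    any = any || laneSawB;
  return any ? b : a;
}
```

Both functions should return the same value and leave src in the same
state, because the selected values are loop invariant and only the
"was 'b' ever chosen" information has to survive the loop.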

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-01 08:41:03 +01:00
Kazu Hirata f631173d80 [llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-09-30 08:51:21 -07:00
Anna Thomas 452714f8f8 [BPI] Keep BPI available in loop passes through LoopStandardAnalysisResults
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.

This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.

This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire function* every time it
ran in an LPM. This caused a massive compile-time increase in our
downstream LPM invocation, which contained loop predication.

See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.

Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
2021-09-30 10:27:05 -04:00
Alexey Bataev f701505c45 [SLP]Improve vectorization of phi nodes by trying wider vectors.
Try to improve vectorization of the PHI nodes by first trying to vectorize
similar instructions at the size of the widest possible vectors, then
aggregating with compatible type PHIs and trying to vectorize again, and
only if this fails, trying smaller vector factors for
compatible PHI nodes. This restores performance of several benchmarks
after tuning of the fp/int conversion instruction costs.

Differential Revision: https://reviews.llvm.org/D108740
2021-09-28 07:20:36 -07:00
Alexey Bataev 8bacfb9bed [SLP]No need to schedule/check parent for extract{element/value} instruction.
The extractelement/extractvalue instructions do not need to
be scheduled since they only depend on the source vector/aggregate (with
constant indices); the same applies to the parent basic block checks.
Improves compile time and saves scheduling budget.

Differential Revision: https://reviews.llvm.org/D108703
2021-09-28 06:13:55 -07:00
Florian Hahn e2f6290e06 [VectorCombine] Discard ScalarizationResult state in early exit.
ScalarizationResult's destructor makes sure ToFreeze is not ignored if
set. Currently, scalarizeLoadExtract has an early exit if the index is
not safe directly. But when it is SafeWithFreeze, we need to discard the
state first, otherwise we hit the assert in the destructor.
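
A minimal sketch of this "discard before early exit" pattern, using a
hypothetical PendingResult class rather than the real
ScalarizationResult. The class layout, the Status enum and tryScalarize
are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in for a result object that may carry a pending
// "needs freeze" obligation. This is not the real ScalarizationResult.
class PendingResult {
public:
  enum class Status { Unsafe, Safe, SafeWithFreeze };

  explicit PendingResult(Status S) : St(S) {}
  PendingResult(const PendingResult &) = delete;
  PendingResult &operator=(const PendingResult &) = delete;

  // The destructor checks that a pending freeze was not silently dropped.
  ~PendingResult() {
    assert(St != Status::SafeWithFreeze && "pending freeze was ignored");
  }

  bool isSafe() const { return St == Status::Safe; }
  bool isSafeWithFreeze() const { return St == Status::SafeWithFreeze; }

  // Callers that bail out early drop the obligation explicitly.
  void discard() { St = Status::Unsafe; }

  // Callers that proceed mark the freeze as emitted.
  void freeze() {
    assert(isSafeWithFreeze() && "nothing to freeze");
    St = Status::Safe;
  }

private:
  Status St;
};

// Early exit that only handles the directly safe case.
bool tryScalarize(PendingResult::Status S) {
  PendingResult R(S);
  if (!R.isSafe()) {
    R.discard(); // without this, SafeWithFreeze would trip the destructor
    return false;
  }
  return true;
}
```

Calling tryScalarize with SafeWithFreeze returns false without hitting
the destructor assertion, which is the behaviour the fix restores.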

Fixes PR51992.
2021-09-28 12:52:16 +01:00
Jameson Nash e27a6db529 Bad SLPVectorization shufflevector replacement, resulting in write to wrong memory location
We see that it might otherwise do:

  %10 = getelementptr {}**, <2 x {}***> %9, <2 x i32> <i32 10, i32 4>
  %11 = bitcast <2 x {}***> %10 to <2 x i64*>
...
  %27 = extractelement <2 x i64*> %11, i32 0
  %28 = bitcast i64* %27 to <2 x i64>*
  store <2 x i64> %22, <2 x i64>* %28, align 4, !tbaa !2

Which is an out-of-bounds store (the extractelement got offset 10
instead of offset 4 as intended). With the fix, we correctly generate
extractelement for i32 1 and generate correct code.

Differential Revision: https://reviews.llvm.org/D106613
2021-09-27 14:06:13 -04:00
Simon Pilgrim 8a44281f47 [SLP] getReductionCost - use explicit TTI::TCK_RecipThroughput CostKind. NFCI.
Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.
2021-09-22 16:52:22 +01:00
Florian Hahn 300870a95c [VectorCombine] Switch to using a worklist.
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.
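
The worklist idea can be sketched on a toy model. The Node structure and
runWorklist below are hypothetical and unrelated to the actual
VectorCombine data structures; the only point is that a successful
combine re-queues the users it may have enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model: a node can be simplified once its dependency (if any) has
// been simplified. Names and structure are illustrative only.
struct Node {
  int Dep = -1;            // index of the node this one depends on, or -1
  bool Simplified = false; // set once the node has been combined
};

// Returns the number of nodes that were simplified.
std::size_t runWorklist(std::vector<Node> &Nodes) {
  std::deque<std::size_t> Worklist;
  for (std::size_t I = 0; I < Nodes.size(); ++I)
    Worklist.push_back(I);

  std::size_t NumSimplified = 0;
  while (!Worklist.empty()) {
    std::size_t I = Worklist.front();
    Worklist.pop_front();
    Node &N = Nodes[I];
    if (N.Simplified)
      continue;
    if (N.Dep >= 0 && !Nodes[static_cast<std::size_t>(N.Dep)].Simplified)
      continue; // not ready yet; re-queued when the dependency changes
    N.Simplified = true;
    ++NumSimplified;
    // A change may enable combines on users, so revisit them.
    for (std::size_t U = 0; U < Nodes.size(); ++U)
      if (Nodes[U].Dep == static_cast<int>(I))
        Worklist.push_back(U);
  }
  return NumSimplified;
}
```

With a chain where node 0 depends on node 1 and node 1 on node 2, a
single in-order pass would simplify only node 2, while the worklist
reaches all three.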

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D110171
2021-09-22 09:54:58 +01:00
Ayal Zaks ab6a69dfea [LV] Fix crash for reverse interleaved loads with gap under fold-tail.
This patch fixes the crash found by PR51614:
whenever doing tail folding, interleave groups must be considered under mask.

Another fix D108900 follows for targets that support masked loads and stores:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse - which is currently not supported; rather than (only) asserting when
computing cost and generating code.

Differential Revision: https://reviews.llvm.org/D108891
2021-09-21 20:13:32 +03:00
Florian Hahn 5131037ea9 [ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.

The use in VectorCombine is updated to pass through the DT to enable
additional scalarization.

Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110175
2021-09-21 16:54:47 +01:00
Alexey Bataev bc69dd62c0 [SLP]Improve graph reordering.
Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.

The patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in place: it does not require deleting/rebuilding the
tree, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. It then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with the larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.
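
A rough sketch of the "count the most used order, then rotate in place"
step, under heavy simplification. The Order alias, mostUsedOrder and
applyOrder are hypothetical names, and the convention that position I
receives the element previously at O[I] is an assumption made for this
example, not a statement about the SLP implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical representation: a permutation of lane indices, where an
// empty vector stands for "identity, nothing to reorder".
using Order = std::vector<unsigned>;

// Picks the order used by the most nodes. Ties go to the order that
// compares lexicographically smaller (an arbitrary choice for the sketch).
Order mostUsedOrder(const std::vector<Order> &NodeOrders) {
  std::map<Order, unsigned> Counts;
  for (const Order &O : NodeOrders)
    ++Counts[O];
  Order Best;
  unsigned BestCount = 0;
  for (const auto &Entry : Counts) {
    if (Entry.second > BestCount) {
      Best = Entry.first;
      BestCount = Entry.second;
    }
  }
  return Best;
}

// Reorders Scalars so that position I receives the element previously at
// O[I]. An empty order leaves the scalars untouched.
void applyOrder(std::vector<int> &Scalars, const Order &O) {
  if (O.empty())
    return;
  assert(O.size() == Scalars.size() && "order must cover every lane");
  std::vector<int> Tmp(Scalars.size());
  for (std::size_t I = 0; I < O.size(); ++I)
    Tmp[I] = Scalars[O[I]];
  Scalars.swap(Tmp);
}
```

No node is deleted or rebuilt here; only the contents of existing nodes
are permuted, which is the property the in-place approach relies on.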

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and so are their users.
In many cases this removes double shuffles to the same ordering of the
operands by reordering the user operations instead. It also moves
the final shuffles closer to the top of the graph and in many cases
removes an extra shuffle, because the same procedure is repeated
and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-09-20 08:42:19 -07:00
Kazu Hirata 24c8eaec94 [Transforms] Use make_early_inc_range (NFC) 2021-09-15 19:55:24 -07:00
Florian Hahn e90d55e1c9 [VPlan] Support sinking recipes with uniform users outside sink target.
This is a first step towards addressing the last remaining limitation of
the VPlan version of sinkScalarOperands: the legacy version can
partially sink operands. For example, if a GEP has uniform users outside
the sink target block, then the legacy version will sink all scalar
GEPs, other than the one for lane 0.

This patch works towards addressing this case in the VPlan version by
detecting such cases and duplicating the sink candidate. All users
outside of the sink target will be updated to use the uniform clone.

Note that this highlights an issue with VPValue naming. If we duplicate
a replicate recipe, they will share the same underlying IR value and
both VPValues will have the same name ir<%gep>.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104254
2021-09-15 09:21:39 +01:00
Markus Lavin 1ac209ed76 [NPM] Added -print-pipeline-passes print params for a few passes.
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.

Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.

The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).

LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch

Differential Revision: https://reviews.llvm.org/D109310
2021-09-15 08:34:04 +02:00
Florian Hahn 7359450e6a [VPlan] Queue (block, operand) pairs together (NFC).
Instead of discovering the sink-to block for each operand in the main
loop, the sink-to block can instead be directly queued with the
operands.

This simplifies processing in the main loop and is a NFC change split
off from D104254 as suggested there.
2021-09-14 20:02:51 +01:00
Florian Hahn c24fc37e47 [VectorCombine] Support AND/UREM indices that require freezing.
38b098be66 limited scalarization to indices that are known non-poison.
For certain patterns that restrict the range of an index, we can insert
a freeze of the original value, to prevent propagation of poison.
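
The range restriction can be illustrated in plain C++. C++ has no notion
of poison, so this sketch only shows the arithmetic part: once the index
is a fixed (frozen) value, masking with AND or taking UREM keeps it
inside the vector. NumElts and the function names are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t NumElts = 4; // illustrative length, a power of two

// Both forms restrict any 32-bit input to the range [0, NumElts).
std::uint32_t clampWithAnd(std::uint32_t Idx) { return Idx & (NumElts - 1); }
std::uint32_t clampWithURem(std::uint32_t Idx) { return Idx % NumElts; }

// Scalar element access that is in bounds for every possible Idx.
int loadElement(const std::array<int, NumElts> &Vec, std::uint32_t Idx) {
  std::uint32_t Safe = clampWithAnd(Idx);
  assert(Safe < NumElts && "masked index must be in range");
  return Vec[Safe];
}
```

In IR terms, the freeze is what guarantees the masked index is computed
from one concrete value instead of propagating poison.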

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107580
2021-09-13 11:21:45 +01:00
David Sherwood bbada9ff45 [NFC] Replace unsigned VF with ElementCount in EpilogueLoopVectorizationInfo
This patch simply replaces any unsigned VFs with ElementCounts. It's
still NFC because at the moment epilogue vectorisation is disabled
when the main vector loop uses scalable vectors.

Differential Revision: https://reviews.llvm.org/D109364
2021-09-13 10:18:30 +01:00
Florian Hahn 368af7558e [VPlan] Fix crash caused by not updating all users properly.
Users of VPValues are managed in a vector, so we need to be more
careful when iterating over users while updating them. For now, just
copy them.
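
A small sketch of why copying helps, on a toy def-use model. The Value
and User types below are hypothetical and much simpler than the VPlan
classes; they only reproduce the "user list is a vector that shrinks
while we walk it" situation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy def-use model; these types are illustrative, not the VPlan classes.
struct Value;

struct User {
  Value *Operand = nullptr;
  void setOperand(Value *New);
};

struct Value {
  std::vector<User *> Users;
  void addUser(User *U) { Users.push_back(U); }
  void removeUser(User *U) {
    Users.erase(std::remove(Users.begin(), Users.end(), U), Users.end());
  }
  void replaceAllUsesWith(Value *New);
};

void User::setOperand(Value *New) {
  if (Operand)
    Operand->removeUser(this); // mutates the old value's user vector
  Operand = New;
  if (New)
    New->addUser(this);
}

void Value::replaceAllUsesWith(Value *New) {
  // Iterate over a copy: setOperand() erases from Users, which would
  // invalidate iterators into the original vector and skip users.
  std::vector<User *> Snapshot = Users;
  for (User *U : Snapshot)
    U->setOperand(New);
}
```

Iterating over Users directly would erase elements from the vector being
traversed; the snapshot makes sure every user is visited exactly once.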

Fixes 51798.
2021-09-12 18:10:53 +01:00
Kazu Hirata 15e9575fb5 [Vectorize] Fix "unused variable" warnings 2021-09-11 12:06:43 -07:00
Nikita Popov 45c467346a [LAA] Pass access type to getPtrStride()
Pass the access type to getPtrStride(), so it is not determined
from the pointer element type. Many cases still fetch the element
type at a higher level though, so this only partially addresses
the issue.
2021-09-11 19:16:49 +02:00
Rosie Sumpter 9d1bea9c88 [SVE][LoopVectorize] Optimise code generated by widenPHIInstruction
For SVE, when scalarising the PHI instruction the whole vector part is
generated, as opposed to creating instructions for each lane as is done for
fixed-width vectors. However, in some cases the lane values may be needed
later (e.g. for a load instruction), so we still need to calculate
these values to avoid extractelement being called on the vector part.

Differential Revision: https://reviews.llvm.org/D109445
2021-09-10 11:58:04 +01:00
Chris Lattner 735f46715d [APInt] Normalize naming on key constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
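
The "soft deprecation" approach can be sketched on a hypothetical
miniature class. MiniInt is not llvm::APInt: the fixed 64-bit width and
the exact comment wording are assumptions. Only the method names are
taken from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature integer wrapper, not llvm::APInt.
class MiniInt {
public:
  explicit MiniInt(std::uint64_t V) : Val(V) {}

  // New, preferred spellings.
  static MiniInt getZero() { return MiniInt(0); }
  bool isZero() const { return Val == 0; }
  bool isAllOnes() const { return Val == ~std::uint64_t{0}; }

  // NOTE: soft deprecated, prefer getZero(). Kept so existing callers
  // continue to compile.
  static MiniInt getNullValue() { return getZero(); }

  // NOTE: soft deprecated, prefer isAllOnes().
  bool isAllOnesValue() const { return isAllOnes(); }

private:
  std::uint64_t Val;
};
```

Old entry points forward to the new ones, so callers can migrate
gradually and both spellings stay consistent in the meantime.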

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Kazu Hirata 5648f7170e [Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC) 2021-09-07 09:19:33 -07:00
Nikita Popov 02f74eadbe [IVDescriptors] Make pointer inductions compatible with opaque pointers
Store the used element type in the InductionDescriptor. For typed
pointers, it remains the pointer element type. For opaque pointers,
we always use an i8 element type, such that the step is a simple
offset.
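
To illustrate the difference between the two views, here is a plain C++
sketch of a pointer induction expressed once with a typed step and once
with a byte step. The helper names are hypothetical; the byte version
corresponds loosely to using an i8 element type.

```cpp
#include <cassert>
#include <cstddef>

// Typed view: the step counts elements of the pointee type.
const int *advanceTyped(const int *Start, std::ptrdiff_t Iter,
                        std::ptrdiff_t StepInElts) {
  return Start + Iter * StepInElts;
}

// Byte view: the step is a plain offset in bytes, so no element type is
// needed to compute the address.
const int *advanceBytes(const int *Start, std::ptrdiff_t Iter,
                        std::ptrdiff_t StepInBytes) {
  const unsigned char *Raw = reinterpret_cast<const unsigned char *>(Start);
  return reinterpret_cast<const int *>(Raw + Iter * StepInBytes);
}
```

Both helpers reach the same address when StepInBytes equals StepInElts
multiplied by sizeof(int).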

A previous version of this patch instead tried to guess the element
type from an induction GEP, but this is not reliable, as the GEP
may be hidden (see @both in iv_outside_user.ll).

Differential Revision: https://reviews.llvm.org/D104795
2021-09-01 21:02:05 +02:00
Nikita Popov 9d720dcb89 [LoadStoreVectorizer] Make aliasing check more precise
The load store vectorizer currently uses isNoAlias() to determine
whether memory-accessing instructions should prevent vectorization.
However, this only works for loads and stores. Additionally, a
couple of intrinsics like assume are special-cased to be ignored.

Instead use getModRefInfo() to generically determine whether the
instruction accesses/modifies the relevant location. This will
automatically handle all inaccessiblememonly intrinsics correctly
(as well as other calls that don't modref for other reasons).
This requires generalizing the code a bit, as it was previously
only considering loads and stores in particular.

Differential Revision: https://reviews.llvm.org/D109020
2021-09-01 18:10:09 +02:00
Nikita Popov 48ebe427c9 [SLPVectorizer] Make aliasing check more precise
SLPVectorizer currently uses AA::isNoAlias() to determine whether
two locations alias. This does not work if one of the instructions
is a call. Instead, we should check getModRefInfo(), which
determines whether an arbitrary instruction modifies or references
a given location.

Among other things, this prevents @llvm.experimental.noalias.scope.decl()
and other inaccessiblememonly intrinsics from interfering with SLP
vectorization.

Differential Revision: https://reviews.llvm.org/D109012
2021-08-31 22:35:30 +02:00
Anton Afanasyev 077d4cb3ab Revert "[SLP]No need to schedule/check parent for extract{element/value} instruction."
Reverted because it introduced an issue reported here:
https://lists.llvm.org/pipermail/llvm-dev/2021-August/152411.html
Discussed starting from here: https://reviews.llvm.org/D108703#2974289

This reverts commit a36bc873a2.
2021-08-31 15:29:06 +03:00
Mikhail Goncharov 5097b6e352 Revert "[SLP]Improve graph reordering."
This reverts commit 84cbd71c95.

This commit breaks one of the internal tests. As agreed with Alexey I
will provide the reproducer later.
2021-08-30 19:16:44 +02:00
Florian Hahn abd36fe512 [VPlan] Introduce code to limit querying VPValues using IR references.
After applying VPlan-to-VPlan transformations, using IR references to
query VPlan values may be incorrect, as the IR is not in sync with the
VPlan any longer.

To better detect such mis-matches, this patch introduces a new flag to
VPlans to indicate whether it is safe to query VPValues using IR values.

getVPValue is updated to assert if it is called when the flag indicates
it is not safe any longer.

There is an escape hatch via an extra argument, because there are 3
places that need to be fixed first. Those are

1. truncateToMinimalBitwidths
2. clearReductionWrapFlags
3. fixLCSSAPHIs

As a first step, this flag will help prevent new code from violating
this property.

Any suggestions with respect to naming are very welcome!

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D108573
2021-08-30 09:12:09 +02:00
Alexey Bataev 84cbd71c95 [SLP]Improve graph reordering.
Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.

The patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in place: it does not require deleting/rebuilding the
tree, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. It then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with the larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and so are their users.
In many cases this removes double shuffles to the same ordering of the
operands by reordering the user operations instead. It also moves
the final shuffles closer to the top of the graph and in many cases
removes an extra shuffle, because the same procedure is repeated
and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-08-26 12:31:18 -07:00
Alexey Bataev b00f73d8bf Revert "[SLP]Improve graph reordering."
This reverts commit a28234e37a to
investigate a compiler crash caused by the commit.
2021-08-26 09:19:40 -07:00
Alexey Bataev a28234e37a [SLP]Improve graph reordering.
Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.

The patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in place: it does not require deleting/rebuilding the
tree, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. It then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with the larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and so are their users.
In many cases this removes double shuffles to the same ordering of the
operands by reordering the user operations instead. It also moves
the final shuffles closer to the top of the graph and in many cases
removes an extra shuffle, because the same procedure is repeated
and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-08-26 07:19:07 -07:00
Alexey Bataev a36bc873a2 [SLP]No need to schedule/check parent for extract{element/value} instruction.
The extractelement/extractvalue instructions do not need to
be scheduled since they only depend on the source vector/aggregate (with
constant indices); the same applies to the parent basic block checks.
Improves compile time and saves scheduling budget.

Differential Revision: https://reviews.llvm.org/D108703
2021-08-25 09:27:55 -07:00
Florian Hahn d024a01511 Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts the revert ab9296f13b.

The issue causing the revert should be fixed in 9baed023b4.
2021-08-23 11:25:27 +01:00
Florian Hahn 9baed023b4 [LV] Adjust reduction recipes before recurrence handling.
Adjusting the reduction recipes still relies on references to the
original IR, which can become outdated by the first-order recurrence
handling. Until reduction recipe construction does not require IR
references, move it before first-order recurrence handling, to prevent a
crash as exposed by D106653.
2021-08-22 11:02:33 +01:00
Florian Hahn ab9296f13b Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts commit f4122398e7 to
investigate a crash exposed by it.

The patch breaks building the code below with `clang -O2 --target=aarch64-linux`

     int a;
     double b, c;
     void d() {
       for (; a; a++) {
         b += c;
         c = a;
       }
     }
2021-08-20 21:24:28 +01:00
David Sherwood f4122398e7 [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64
I have added a new TTI interface called enableOrderedReductions() that
controls whether or not ordered reductions should be enabled for a
given target. By default this returns false, whereas for AArch64 it
returns true and we rely upon the cost model to make sensible
vectorisation choices. It is still possible to override the new TTI
interface by setting the command line flag:

  -force-ordered-reductions=true|false
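
As background on why ordering matters, here is a plain C++ sketch that
contrasts an ordered (in-order) floating-point reduction with a
reassociated one. It is not the code the vectoriser generates; the lane
count VF and the function names are assumptions, and the input length
is assumed to be a multiple of VF.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 2; // illustrative vectorization factor (assumption)

// Scalar loop: strict left-to-right floating-point accumulation.
double sumScalar(const std::vector<double> &V) {
  double Acc = 0.0;
  for (double X : V)
    Acc += X;
  return Acc;
}

// Ordered reduction: VF elements are consumed per iteration, but lanes
// are folded into the accumulator in their original order.
double sumOrdered(const std::vector<double> &V) {
  assert(V.size() % VF == 0 && "sketch assumes no remainder iterations");
  double Acc = 0.0;
  for (std::size_t I = 0; I < V.size(); I += VF)
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Acc += V[I + Lane];
  return Acc;
}

// Reassociated reduction: one partial sum per lane, combined at the end.
double sumReassociated(const std::vector<double> &V) {
  assert(V.size() % VF == 0 && "sketch assumes no remainder iterations");
  std::array<double, VF> Partial{};
  for (std::size_t I = 0; I < V.size(); I += VF)
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Partial[Lane] += V[I + Lane];
  double Acc = 0.0;
  for (double P : Partial)
    Acc += P;
  return Acc;
}
```

For an input such as {1e16, 1.0, -1e16, 1.0}, the ordered version matches
the scalar loop, while the reassociated version can give a different
result on a typical IEEE-754 double target because the small terms are
no longer absorbed by the large ones. Strict floating-point loops
therefore need the ordered form.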

I have added a new RUN line to show that we use ordered reductions by
default for SVE and Neon:

  Transforms/LoopVectorize/AArch64/strict-fadd.ll
  Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D106653
2021-08-19 09:29:40 +01:00
Dylan Fleming ef198cd99e [SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D106277
2021-08-17 14:42:47 +01:00
Nikita Popov 570c9beb8e [MemorySSA] Remove unnecessary MSSA dependencies
LoopLoadElimination, LoopVersioning and LoopVectorize currently
fetch MemorySSA when constructing LoopAccessAnalysis. However,
LoopAccessAnalysis does not actually use MemorySSA and we can pass
nullptr instead.

This saves one MemorySSA calculation in the default pipeline, and
thus improves compile-time.

Differential Revision: https://reviews.llvm.org/D108074
2021-08-16 20:40:55 +02:00
Paul Walker f7a831daa6 [LoopVectorize] Don't emit remarks about lack of scalable vectors unless they're specifically requested.
Previously we emitted a "does not support scalable vectors"
remark for all targets whenever vectorisation is attempted. This
pollutes the output for architectures that don't support scalable
vectors and is likely confusing to the user.

Instead this patch introduces a debug message that reports when
scalable vectorisation is allowed by the target and only issues
the previous remark when scalable vectorisation is specifically
requested, for example:

  #pragma clang loop vectorize_width(2, scalable)

Differential Revision: https://reviews.llvm.org/D108028
2021-08-15 12:15:52 +01:00
Dorit Nuzman 67278b8a90 [LV] Support Interleaved Store Group With Gaps
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).
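
A scalar emulation of the idea, assuming records with three members of
which only the first two are stored. Factor, VF and the helper names
are hypothetical; a real masked store would be a single vector
instruction rather than the loop used here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t Factor = 3; // interleave factor: members per record
constexpr std::size_t VF = 4;     // illustrative vectorization factor

// Emulates one masked wide store: only lanes whose mask bit is set are
// written, so the gap member of each record keeps its old value.
void maskedStore(int *Dst, const std::array<int, VF * Factor> &Wide,
                 const std::array<bool, VF * Factor> &Mask) {
  for (std::size_t L = 0; L < Wide.size(); ++L)
    if (Mask[L])
      Dst[L] = Wide[L];
}

// Stores X into member 0 and Y into member 1 of VF consecutive records,
// leaving member 2 (the gap) alone.
void storeGroupWithGap(int *Records, const std::array<int, VF> &X,
                       const std::array<int, VF> &Y) {
  std::array<int, VF * Factor> Wide{};
  std::array<bool, VF * Factor> Mask{};
  for (std::size_t I = 0; I < VF; ++I) {
    Wide[I * Factor + 0] = X[I];
    Wide[I * Factor + 1] = Y[I];
    Mask[I * Factor + 0] = true;
    Mask[I * Factor + 1] = true;
    // Mask[I * Factor + 2] stays false: this lane is the gap.
  }
  maskedStore(Records, Wide, Mask);
}
```

The interleaved data is written with one wide access, and the mask
protects the memory that the original scalar loop never touched.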

The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.

Reviewed by: Ayal Zaks

Differential Revision: https://reviews.llvm.org/D104750
2021-08-08 10:32:02 +03:00
Florian Hahn a00aafc30d [VPlan] Iterate over phi recipes to detect reductions to fix.
After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.

This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100102
2021-08-07 14:06:50 +01:00
David Sherwood 3fd96e1b2e [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. The intrinsics 'lifetime.start', 'lifetime.end' and 'assume', which
  are also uniform if their input operands are loop invariant.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intrinsics:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284
2021-08-06 10:13:15 +01:00
David Sherwood 43a5c750d1 Revert "[LoopVectorize] Add support for replication of more intrinsics with scalable vectors"
This reverts commit 95800da914.
2021-08-06 09:48:16 +01:00
Florian Hahn 3e58dd19df [LV] Move reduction PHI node fixup to VPlan::execute (NFC).
All information to fix-up the reduction phi nodes in the vectorized loop
is available in VPlan now. This patch moves the code to do so, to make
this clearer. Fixing up the loop exit value still relies on other
information and remains outside of VPlan for now.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100113
2021-08-06 08:29:20 +01:00
Kazu Hirata 72661f337a [Transforms] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-08-05 08:53:17 -07:00
Alexey Bataev e7c3eaa8ae [SLP]Do not emit extra shuffle for insertelements vectorization.
If the vectorized insertelement instructions form an identity subvector
(the subvector at the beginning of the long vector), it is enough
to extend the vector itself; there is no need to generate an inserting
subvector shuffle.

Differential Revision: https://reviews.llvm.org/D107494
2021-08-05 08:41:24 -07:00
David Sherwood e9177b0958 Fix build issues caused by 95800da914 2021-08-05 16:26:34 +01:00
Sander de Smalen 3e47f009ff [LV] Consider ExtractValue as uniform.
Since all operands to ExtractValue must be loop-invariant when we deem
the loop vectorizable, we can consider ExtractValue to be uniform.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107286
2021-08-05 16:20:50 +01:00
Florian Hahn 38b098be66 [VectorCombine] Limit scalarization known non-poison indices.
We can only trust the range of the index if it is guaranteed
non-poison.

Fixes PR50949.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107364
2021-08-05 15:36:31 +01:00
David Sherwood 95800da914 [LoopVectorize] Add support for replication of more intrinsics with scalable vectors
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
  loop invariant input operands then these are uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intrinsics:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284
2021-08-05 15:17:27 +01:00
Sander de Smalen 8d08a84745 [LV] Remove a change that was added in D106164.
This change wasn't strictly necessary for D106164 and could be removed.
This patch addresses the post-commit comments from @fhahn on D106164, and
also changes sve-widen-gep.ll to use the same IR test as shown in
pointer-induction.ll.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D106878
2021-08-05 14:44:53 +01:00
Alexey Bataev 214f99b27c Revert "[SLP]Do not emit extra shuffle for insertelements vectorization."
This reverts commit 871ea69803 to fix the
problem if the first vector is not just undef.
2021-08-04 11:28:59 -07:00
Alexey Bataev 871ea69803 [SLP]Do not emit extra shuffle for insertelements vectorization.
If the vectorized insertelement instructions form an identity subvector
(the subvector at the beginning of the long vector), it is enough
to extend the vector itself; there is no need to generate an inserting
subvector shuffle.

Differential Revision: https://reviews.llvm.org/D107344
2021-08-03 13:18:41 -07:00
Alexey Bataev 7d9d926a18 Revert "[SLP]Improve graph reordering."
This reverts commit e408d1dfab and
2 other related commits (4b25c11321 and
c2deb2afaf) to fix the problem with the
reordering shuffles.
2021-08-03 12:13:43 -07:00
David Sherwood 0156f91f3b [NFC] Rename enable-strict-reductions to force-ordered-reductions
I'm renaming the flag because a future patch will add a new
enableOrderedReductions() TTI interface and so the meaning of this
flag will change to be one of forcing the target to enable/disable
them. Also, since other places in LoopVectorize.cpp use the word
'Ordered' instead of 'strict' I changed the flag to match.

Differential Revision: https://reviews.llvm.org/D107264
2021-08-03 09:33:01 +01:00
Florian Hahn bb725c9803 [VPlan] Use defined and ops VPValues to print VPInterleaveRecipe.
This patch updates VPInterleaveRecipe::print to print the actual defined
VPValues for load groups and the store VPValue operands for store
groups.

The IR references may become outdated while transforming the VPlan and
the defined and stored VPValues always are up-to-date.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D107223
2021-08-02 18:36:36 +01:00
Alexey Bataev 95e5d401ae [SLP]Improve splats vectorization.
Replace insertelement instructions for splats with just single
insertelement + broadcast shuffle. Also, try to merge these instructions
if they come from the same/shuffled gather node.

Differential Revision: https://reviews.llvm.org/D107104
2021-07-30 10:17:45 -07:00
Alexey Bataev 4b25c11321 [SLP]Fix an assertion for the size of user nodes.
For nodes with reused scalars, the user may be not only of the size
of the final shuffle but also of the size of the scalars themselves, so we
need to check for this. It is safe to just modify the check here, since
the order of the scalars themselves is preserved; only the indices of the
reused scalars are changed. So users with the same size as the
number of scalars in the node will not be affected; they will still get
the operands in the required order.

Reported by @mstorsjo in D105020.

Differential Revision: https://reviews.llvm.org/D107080
2021-07-30 05:46:44 -07:00
Alexey Bataev f4fb854811 [SLP]Do not consider deleted instruction as external users.
If the instruction was previously deleted, it should not be treated as
an external user. This fixes cost estimation and removes dead
extractelement instructions.

Differential Revision: https://reviews.llvm.org/D107106
2021-07-30 05:37:43 -07:00
Alexey Bataev c2deb2afaf [SLP]Fix a crash in gathered loads analysis.
Need to check that the minimum acceptable vector factor is at least 2,
not 0, to avoid compiler crash during gathered loads analysis.

Differential Revision: https://reviews.llvm.org/D107058
2021-07-30 05:19:17 -07:00
Alexey Bataev 3ad6437fcc [SLP]Fix build on MacOS, NFC. 2021-07-28 06:33:13 -07:00
Alexey Bataev e408d1dfab [SLP]Improve graph reordering.
Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.

The patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in-place; it does not require tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then it repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with the larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.
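
As a rough illustration of the counting step (not taken from the patch), the
sketch below picks the most used order among a set of nodes and applies it to
one node's scalars. The names `Order`, `mostUsedOrder` and `applyOrder` are
hypothetical, an empty order is assumed to mean "keep the current order", and
ties are broken arbitrarily by `std::map` ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical representation: Order[i] is the old position of the scalar
// that ends up in lane i. An empty Order means "leave the node as it is".
using Order = std::vector<unsigned>;

// Returns the order requested by the largest number of nodes.
// Ties go to the lexicographically smallest order (std::map iteration order).
Order mostUsedOrder(const std::vector<Order> &nodeOrders) {
  std::map<Order, unsigned> counts;
  for (const Order &o : nodeOrders)
    ++counts[o];
  Order best;
  unsigned bestCount = 0;
  for (const auto &entry : counts) {
    if (entry.second > bestCount) {
      best = entry.first;
      bestCount = entry.second;
    }
  }
  return best;
}

// Rotates the scalars of a single node to the given order without
// rebuilding anything else. An empty order is a no-op.
void applyOrder(std::vector<int> &scalars, const Order &order) {
  if (order.empty())
    return;
  assert(order.size() == scalars.size() && "order must cover every lane");
  const std::vector<int> old(scalars);
  for (std::size_t i = 0; i < order.size(); ++i)
    scalars[i] = old[order[i]];
}
```

A caller would compute `mostUsedOrder` once per vectorization factor and then
call `applyOrder` on every node of that size, skipping the rotation when the
winning order is empty.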

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and their user too.
This allows removing double shuffles to the same ordering of the operands in
many cases and just reordering the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing an extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-07-28 05:49:06 -07:00
Florian Hahn c07dd2b885 [LV] Move recurrence backedge fixup code to VPlan::execute (NFC).
As suggested in D105008, move the code that fixes up the backedge value
for first order recurrences to VPlan::execute.

Now all that remains in fixFirstOrderRecurrences is the code responsible
for creating the exit values in the middle block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D106244
2021-07-28 13:32:40 +01:00
David Green 41cedb1c9a [LV][ARM] Tighten up MLA reduction costing
This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.

 - The Arm implementation of getExtendedAddReductionCost is altered to
   only provide costs for legal or smaller types. Larger than legal types
   need to be split, which currently does not work very well, especially
   for predicated reductions where the predicate may be legal but needs to
   be split. Currently we limit it to legal or smaller input types.
 - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext)))
   is a pattern that can come up, and can be treated the same as
   reduce(mul(ext, ext)) providing the extension types match.
 - And it has been adjusted to not count the ext in reduce(mul(ext, ext))
   as part of a reduce(mul) pattern.

Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.

Differential Revision: https://reviews.llvm.org/D106166
2021-07-28 12:50:58 +01:00
David Sherwood a5dd6c6cf9 [LoopVectorize] Don't interleave scalar ordered reductions for inner loops
Consider the following loop:

  void foo(float *dst, float *src, int N) {
    for (int i = 0; i < N; i++) {
      dst[i] = 0.0;
      for (int j = 0; j < N; j++) {
        dst[i] += src[(i * N) + j];
      }
    }
  }

When we are not building with -Ofast we may attempt to vectorise the
inner loop using ordered reductions instead. In addition we also try
to select an appropriate interleave count for the inner loop. However,
when choosing a VF=1 the inner loop will be scalar and there is existing
code in selectInterleaveCount that limits the interleave count to 2
for reductions due to concerns about increasing the critical path.
For ordered reductions this problem is even worse due to the additional
data dependency, and so I've added code to simply disable interleaving
for scalar ordered reductions for now.

Test added here:

  Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll

Differential Revision: https://reviews.llvm.org/D106646
2021-07-27 17:41:01 +01:00
Sander de Smalen d7dd12aee3 [LV] Disable Scalable VFs when tail folding is enabled b/c of low tripcount.
The loop vectorizer may decide to use tail folding when the trip-count
is low. When that happens, scalable VFs are no longer a candidate,
since tail folding/predication is not yet supported for scalable vectors.

This can be re-enabled in a future patch.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D106657
2021-07-27 11:37:21 +01:00
Sander de Smalen 13ccb09725 [LV] Don't let ForceTargetInstructionCost override Invalid cost.
Invalid costs can be used to avoid vectorization with a given VF, which is
used for scalable vectors to avoid things that the code-generator cannot
handle. If we override the cost using the -force-target-instruction-cost
option of the LV, we would override this mechanism, rendering the flag useless.

This change ensures the cost is only overridden when the original cost that
was calculated is valid. That allows the flag to be used in combination
with the -scalable-vectorization option.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106677
2021-07-26 20:27:49 +01:00
Sander de Smalen b9051ba848 [LV] Remove assert that VF cannot be scalable in setCostBasedWideningDecision.
Scalarization for scalable vectors is not (yet) supported, so the
LV discards a VF when scalarization is chosen as the widening
decision. It should therefore not assert that the VF is not scalable
when it computes the decision to scalarize.

The code can get here when the interleave-cost, gather/scatter cost
and scalarization-cost are all illegal. This may e.g. happen for SVE
when the VF=1, to avoid generating `<vscale x 1 x eltty>` types that
the code-generator cannot yet handle.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106656
2021-07-26 17:11:45 +01:00
Sander de Smalen 981e9dce54 [LV] Don't assume isScalarAfterVectorization if one of the uses needs widening.
This fixes an issue that was found in D105199, where a GEP instruction
is used both as the address of a store, as well as the value of a store.
For the former, the value is scalar after vectorization, but the latter
(as value) requires widening.

Other code in that function seems to prevent similar cases from happening,
but it seems this case was missed.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106164
2021-07-26 16:01:55 +01:00
Florian Hahn 7a1e73f0b9 Recommit "[VPlan] Add recipe for first-order rec phis, make splicing explicit."
This reverts the revert commit b1777b04dc.

The patch originally got reverted due to a crash:
https://bugs.chromium.org/p/chromium/issues/detail?id=1232798#c2

The underlying issue was that we were not using the stored values from
the modified memory recipes, but the out-of-date values directly from
the IR (accessed via the VPlan). This should be fixed in d995d6376. A
reduced version of the reproducer has been added in 93664503be.
2021-07-26 15:50:30 +01:00
Alexey Bataev 6ca48efcf6 [SLP]Fix costs calculations.
Need to fix several cost-related problems. The final type may be defined
incorrectly because of too early a definition (we may end up with the wider
type), the CommonCost should not be redefined in ExtractElements
cost related calculations and the shuffle of the final insertelements
vectors should be calculated as a cost of single vector permutations
+ costs of two vector permutations for other n-1 incoming vectors.

Differential Revision: https://reviews.llvm.org/D106578
2021-07-26 07:14:03 -07:00
Kerry McLaughlin e484e1ae03 [SVE] Fix casts to <FixedVectorType> in truncateToMinimalBitwidths
Fixes more casts to `<FixedVectorType>` for the cases where the
instruction is a Insert/ExtractElementInst.

For fixed-width, this part of truncateToMinimalBitWidths is tested by
AArch64/type-shrinkage-insertelt.ll. I attempted to write a test case for this part
of truncateToMinimalBitWidths which uses scalable vectors, but was unable to add
one. The tests in type-shrinkage-insertelt.ll rely on scalarization to create extract
element instructions for instance, which is not possible for scalable vectors.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106163
2021-07-26 13:44:51 +01:00
Alexey Bataev d7cb2a0796 Revert "[SLP]Fix costs calculations."
This reverts commit a053afed49 to fix
buildbots.
2021-07-26 05:42:34 -07:00
Alexey Bataev a053afed49 [SLP]Fix costs calculations.
Need to fix several cost-related problems. The final type may be defined
incorrectly because of too early a definition (we may end up with the wider
type), the CommonCost should not be redefined in ExtractElements
cost related calculations and the shuffle of the final insertelements
vectors should be calculated as a cost of single vector permutations
+ costs of two vector permutations for other n-1 incoming vectors.

Differential Revision: https://reviews.llvm.org/D106578
2021-07-26 04:37:22 -07:00
Florian Hahn d995d63767 [VPlan] Use stored value from recipes for interleave groups.
Instead of getting the VPValue for the stored IR values through the
current plan, use the stored value of the recipes directly.

This way, the correct VPValues are used if the store recipes have been
modified in the VPlan and the IR value is not correct any longer. This
can happen, e.g. due to D105008.
2021-07-26 12:05:23 +01:00
David Sherwood 0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.
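
The two reduction shapes can be sketched as follows. This is an illustration
only: the function names are hypothetical, and the step counts are a
simplified picture of why the ordered form is costed as a series of scalar
operations, not the formulas used by the cost model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered (strict) reduction: one scalar add per lane, in lane order.
float orderedReduce(float start, const std::vector<float> &lanes) {
  float acc = start;
  for (float v : lanes)
    acc += v;
  return acc;
}

// Tree-wise reduction: fold the upper half onto the lower half until a
// single lane is left. Assumes a non-zero power-of-two lane count.
float treeReduce(std::vector<float> lanes) {
  assert(!lanes.empty() && (lanes.size() & (lanes.size() - 1)) == 0);
  for (std::size_t half = lanes.size() / 2; half >= 1; half /= 2)
    for (std::size_t i = 0; i < half; ++i)
      lanes[i] += lanes[i + half];
  return lanes[0];
}

// Illustrative step counts only: N dependent scalar adds for the ordered
// form versus log2(N) halving steps for the tree-wise form.
std::size_t orderedSteps(std::size_t numLanes) { return numLanes; }

std::size_t treeSteps(std::size_t numLanes) {
  std::size_t steps = 0;
  while (numLanes > 1) {
    numLanes /= 2;
    ++steps;
  }
  return steps;
}
```

For lanes such as {1e8, 1, -1e8, 1} the two functions return different
single-precision results, which is why the tree-wise form is only usable for
floating point when reassociation is allowed.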

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
Nico Weber b1777b04dc Revert "[VPlan] Add recipe for first-order rec phis, make splicing explicit."
Makes clang crash: https://reviews.llvm.org/D105008#2903350
This reverts commit d2a73fb44e.

Also revert a minor formatting follow-up:
This reverts commit 82834a6732.
2021-07-25 17:39:28 -04:00
Caroline Concatto 5a4de84d55 [LoopVectorize] Fix crash for predicated instruction with scalable VF
This patch avoids computing discounts for predicated instructions when the
VF is scalable.
There is no support for vectorization of loops with division because the
vectorizer cannot guarantee that zero divisions will not happen.

This loop no longer uses a scalable VF:

```
for (long long i = 0; i < n; i++)
    if (cond[i])
      a[i] /= b[i];
```

Differential Revision: https://reviews.llvm.org/D101916
2021-07-22 12:48:27 +01:00
David Green 72dc5cab4f [LV] Make use of PatternMatchers in getReductionPatternCost. NFC
Pulled out of D106166, this modifies getReductionPatternCost to use
PatternMatchers, hopefully simplifying the code a little.
2021-07-21 11:34:30 +01:00
David Green 4272e64acd [LV] Change interface of getReductionPatternCost to return Optional
Currently getReductionPatternCost returns an
Invalid cost to specify "did not find the pattern". This changes that to
return an Optional with None specifying not found, allowing Invalid to
mean an infinite cost as is used elsewhere.
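
The distinction between "pattern not found" and "found, but infinitely
expensive" can be sketched roughly as follows. This is only an illustration:
`Cost`, `MaybeCost`, `Pattern` and both functions are hypothetical stand-ins,
the numbers are made up, and a hand-rolled struct is used in place of an
Optional type to keep the example small.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost type: valid == false models an Invalid cost, which is
// read as "infinitely expensive".
struct Cost {
  int64_t value;
  bool valid;
};

// Hypothetical stand-in for an optional cost: found == false models None,
// i.e. "no reduction pattern matched here".
struct MaybeCost {
  bool found;
  Cost cost;
};

enum Pattern { PlainAdd, MulAddReduction, UnsupportedReduction };

// Made-up costs, for illustration only.
MaybeCost reductionPatternCost(Pattern p) {
  switch (p) {
  case MulAddReduction:
    return {true, {2, true}};  // pattern found, cheap
  case UnsupportedReduction:
    return {true, {0, false}}; // pattern found, but the cost is Invalid
  case PlainAdd:
    break;
  }
  return {false, {0, false}};  // None: not part of a reduction pattern
}

// Only a missing pattern falls back to the generic cost; an Invalid
// pattern cost is passed through unchanged.
Cost instructionCost(Pattern p, Cost genericCost) {
  assert(genericCost.valid && "this sketch expects a valid generic cost");
  const MaybeCost m = reductionPatternCost(p);
  return m.found ? m.cost : genericCost;
}
```

With a single Invalid value doing both jobs, the third case above could not be
told apart from the fallback case.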

Differential Revision: https://reviews.llvm.org/D106140
2021-07-20 16:44:50 +01:00
Caroline Concatto cf78995c4a [NFC][LoopVectorizer] Remove VF.isScalable() assertion from collectInstsToScalarize and getInstructionCost
This patch removes the assertion when VF is scalable and replaces
getKnownMinValue() by getFixedValue(), so it still guards the code against
scalable vector types.
The assertions were used to guarantee that getKnownMinValue() was not used for
scalable vectors.
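
A minimal sketch of why the replacement keeps the guard, assuming a
simplified element-count type. `ElemCount` and `scalarizedCost` are
hypothetical and only mirror the two accessors named above; LLVM's real
class has more to it.

```cpp
#include <cassert>

// Hypothetical miniature of an element count: minVal lanes, multiplied by
// an unknown runtime factor when scalable is true.
struct ElemCount {
  unsigned minVal;
  bool scalable;

  // Always answers, even for scalable counts, so a caller can silently
  // treat "at least minVal lanes" as "exactly minVal lanes".
  unsigned getKnownMinValue() const { return minVal; }

  // Only answers for fixed counts; the check lives inside the accessor.
  unsigned getFixedValue() const {
    assert(!scalable && "request for a fixed value on a scalable count");
    return minVal;
  }
};

// Scalarizing costs one scalar operation per lane, which is only
// meaningful when the lane count is known exactly.
unsigned scalarizedCost(ElemCount vf, unsigned perLaneCost) {
  return vf.getFixedValue() * perLaneCost;
}
```

With the accessor carrying the check, a separate up-front assertion in each
caller becomes redundant.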

Differential Revision: https://reviews.llvm.org/D106359
2021-07-20 15:56:30 +01:00
Florian Hahn 82834a6732 [VPlan] Fix formatting glitch from d2a73fb44e. 2021-07-20 16:16:30 +02:00
Florian Hahn d2a73fb44e [VPlan] Add recipe for first-order rec phis, make splicing explicit.
This patch adds a VPFirstOrderRecurrencePHIRecipe, to further untangle
VPWidenPHIRecipe into distinct recipes for distinct use cases/lowering.
See D104989 for a new recipe for reduction phis.

This patch also introduces a new `FirstOrderRecurrenceSplice`
VPInstruction opcode, which is used to make the forming of the vector
recurrence value explicit in VPlan. This more accurately models def-uses
in VPlan and also simplifies code-generation. Now, the vector recurrence
values are created at the right place during VPlan-codegeneration,
rather than during post-VPlan fixups.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D105008
2021-07-20 16:14:17 +02:00
Alexey Bataev d8d8b4574a [SLP]Fix possible crash on unreachable incoming values sorting.
The incoming values for PHI nodes may come from unreachable BasicBlocks,
need to handle this case.

Differential Revision: https://reviews.llvm.org/D106264
2021-07-19 04:54:53 -07:00
Alexey Bataev da3dbfcacf [SLP]Improve calculations of the cost for reused/reordered scalars.
Part of D105020. Also, fixed FIXMEs that need to use wider vector type
when trying to calculate the cost of reused scalars. This may cause
regressions unless D100486 is landed to improve the cost estimations
for long vectors shuffling.

Differential Revision: https://reviews.llvm.org/D106060
2021-07-16 13:40:15 -07:00
Alexey Bataev 1b18e9ab67 [PATCH] D105827: [SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Differential Revision: https://reviews.llvm.org/D105827
2021-07-16 12:59:08 -07:00
Kerry McLaughlin 49d73130ca [LV] Avoid scalable vectorization for loops containing alloca
This patch returns an Invalid cost from getInstructionCost() for alloca
instructions if the VF is scalable, as otherwise loops which contain
these instructions will crash when attempting to scalarize the alloca.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105824
2021-07-16 11:47:13 +01:00
Sander de Smalen 239d01fa88 Reland "[LV] Print remark when loop cannot be vectorized due to invalid costs."
The original patch was:
  https://reviews.llvm.org/D105806

There were some issues with non-deterministic behaviour of the sorting
function, which led to scalable-call.ll passing and/or failing. This
patch fixes the issue by numbering all instructions in the array first,
and using that number as the order, which should provide a consistent
ordering.
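
The idea can be sketched as below, under the assumption that the entries
being sorted are (instruction, VF) pairs. `Inst`, `InvalidCost` and
`sortInvalidCosts` are hypothetical names for this illustration and the code
is not copied from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Inst {
  std::string name; // hypothetical stand-in for an IR instruction
};

using InvalidCost = std::pair<const Inst *, unsigned>; // (instruction, VF)

// Sorts entries by (instruction number, VF). Each instruction is numbered
// by its first appearance in the array, so the result never depends on
// pointer values, which can change from run to run.
void sortInvalidCosts(std::vector<InvalidCost> &entries) {
  std::unordered_map<const Inst *, std::size_t> number;
  for (const InvalidCost &e : entries) {
    assert(e.first != nullptr && "every entry needs an instruction");
    number.emplace(e.first, number.size()); // keeps the first number seen
  }
  std::sort(entries.begin(), entries.end(),
            [&number](const InvalidCost &a, const InvalidCost &b) {
              const std::size_t na = number.at(a.first);
              const std::size_t nb = number.at(b.first);
              if (na != nb)
                return na < nb;
              return a.second < b.second;
            });
}
```

Comparing the pointers directly would also group entries per instruction, but
the order of the groups would then follow allocation addresses.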

This reverts commit a607f64118.
2021-07-16 10:52:01 +01:00
Sanjay Patel 81ce3aa30c [SLP] avoid leaking poison in reduction of safe boolean logic ops
This bug was introduced with D105730 / 25ee55c0ba.

If we are not converting all of the operations of a reduction
into a vector op, we need to preserve the existing select form
of the remaining ops. Otherwise, we are potentially leaking
poison where the original code did not.

Alive2 agrees that the version that freezes some inputs
and then falls back to scalar is correct:
https://alive2.llvm.org/ce/z/erF4K2
2021-07-15 17:33:06 -04:00
Arthur Eubanks 99cb2507f3 Revert "[SLP]Workaround for InsertSubVector cost."
This reverts commit 2eb50baf05.

Causes hangs, see comments on D105827.
2021-07-15 10:19:41 -07:00
Philip Reames 95346ba877 [LV] Enable vectorization of multiple exit loops w/computable exit counts
This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzable and to dominate the latch.

The majority of work to support this was done in a set of previous patches. In particular, 72314466 avoids having multiple edges from the middle block to the exits, and 4b33b2387 added support for non-latch single exit and multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit.

Differential Revision: https://reviews.llvm.org/D105817
2021-07-15 08:53:51 -07:00
Sander de Smalen a607f64118 Revert "[LV] Print remark when loop cannot be vectorized due to invalid costs."
This reverts commit efaf3099c8.
This reverts commit dc7bdc1e71.

Reverting patches due to buildbot failures.
2021-07-15 15:21:57 +01:00
Sander de Smalen dc7bdc1e71 [LV] Fix determinism for failing scalable-call.ll test.
The sort function for emitting an OptRemark was not deterministic,
which caused scalable-call.ll to fail on some buildbots. This patch
fixes that.

This patch also fixes an issue where `Instruction::comesBefore()`
is called when two Instructions are in different basic blocks,
which would otherwise cause an assertion failure.
2021-07-15 13:16:59 +01:00
Alexey Bataev ba2690b17b [SLP][NFC]Fix variables names, NFC. 2021-07-14 12:43:45 -07:00
Simon Pilgrim 4fd0addb68 [SLP] Fix case of variable name. NFCI. 2021-07-14 20:20:04 +01:00
Sander de Smalen efaf3099c8 [LV] Print remark when loop cannot be vectorized due to invalid costs.
This patch emits remarks for instructions that have invalid costs for
a given set of vectorization factors. Some example output:

  t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
      dst[i] = sinf(src[i]);
                    ^
  t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32
      dst[i] = sinf(src[i]);
               ^
  t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
      dst[i] = sinf(src[i]);
             ^

Reviewed By: fhahn, kmclaughlin

Differential Revision: https://reviews.llvm.org/D105806
2021-07-14 17:11:33 +01:00
Alexey Bataev 2eb50baf05 [SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind cost is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Differential Revision: https://reviews.llvm.org/D105827
2021-07-14 07:54:24 -07:00
Sanjay Patel 25ee55c0ba [SLP] match logical and/or as reduction candidates
This has been a work-in-progress for a long time...we finally have all of
the pieces in place to handle vectorization of compare code as shown in:
https://llvm.org/PR41312

To do this (see PhaseOrdering tests), we converted SimplifyCFG and
InstCombine to the poison-safe (select) forms of the logic ops, so now we
need to have SLP recognize those patterns and insert a freeze op to make
a safe reduction:
https://alive2.llvm.org/ce/z/NH54Ah

We get the minimal patterns with this patch, but the PhaseOrdering tests
show that we still need adjustments to get the ideal IR in some or all of
the motivating cases.

Differential Revision: https://reviews.llvm.org/D105730
2021-07-14 09:02:31 -04:00
Sander de Smalen d2e4ccc790 [LV] Ignore candidate VFs with invalid costs.
This follows on from discussion on the mailing-list:
  https://lists.llvm.org/pipermail/llvm-dev/2021-June/151047.html

to interpret an Invalid cost as 'infinitely expensive', as this
simplifies some of the legalization issues with scalable vectors.
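
As a rough sketch of that interpretation: an Invalid cost never wins a
comparison, so a VF whose cost is Invalid simply drops out of the selection.
`Cost`, `Candidate`, `isCheaper` and `pickVF` are hypothetical names, and the
sketch compares raw costs rather than whatever per-lane normalisation the
vectorizer actually applies.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical cost type: valid == false models an Invalid cost.
struct Cost {
  int64_t value;
  bool valid;
};

// True if a is strictly cheaper than b, ranking Invalid as infinitely
// expensive: Invalid never beats anything, and any valid cost beats Invalid.
bool isCheaper(const Cost &a, const Cost &b) {
  if (!a.valid)
    return false;
  if (!b.valid)
    return true;
  return a.value < b.value;
}

struct Candidate {
  unsigned vf;
  Cost cost;
};

// Returns the VF of the cheapest candidate, or 0 when every candidate has
// an Invalid cost.
unsigned pickVF(const std::vector<Candidate> &candidates) {
  Cost bestCost = {0, false}; // start from "infinitely expensive"
  unsigned bestVF = 0;
  for (const Candidate &c : candidates) {
    assert(c.vf > 0 && "VF 0 is reserved for 'no candidate'");
    if (isCheaper(c.cost, bestCost)) {
      bestCost = c.cost;
      bestVF = c.vf;
    }
  }
  return bestVF;
}
```

No special case is needed for the invalid candidates; the comparison itself
filters them out.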

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105473
2021-07-12 09:58:22 +01:00
Florian Hahn c6e4c1fbd8 [VPlan] Remove default arg from getVPValue (NFC).
The const version of VPValue::getVPValue still had a default value for
the value index. Remove the default value and use getVPSingleValue
instead, which is the proper function.
2021-07-11 22:03:09 +02:00
Sander de Smalen 239fcda268 [LV] NFCI: Do cost comparison on InstructionCost directly.
Instead of performing the isMoreProfitable() operation on
InstructionCost::CostTy the operation is performed on InstructionCost
directly, so that it can handle the case where one of the costs is
Invalid.

This patch also changes the CostTy to be int64_t, so that the type is
wide enough to deal with multiplications with e.g. `unsigned MaxTripCount`.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105113
2021-07-10 11:57:16 +01:00
Valery N Dmitriev 8e9216fe87 [SLP] Do not make an attempt to match reduction on already erased instruction.
Differential Revision: https://reviews.llvm.org/D105752
2021-07-09 17:13:15 -07:00
Sanjay Patel c2b7f09d8c [SLP] make invalid operand explicit for extra arg in reduction matching; NFC
This makes it clearer when we have encountered the extra arg.
Also, we may need to adjust the way the operand iteration
works when handling logical and/or.
2021-07-09 15:32:12 -04:00
Sanjay Patel 486992f958 [SLP] improve code comments; NFC
This likely started out only supporting binops,
but now we handle min/max using cmp+sel, and
we may extend to handle bool logic in the form
of select.
2021-07-09 12:49:54 -04:00
Sanjay Patel 544f2711bb [SLP] make checks for cmp+select min/max more explicit
This is NFC-intended currently (so no test diffs). The motivation
is to eventually allow matching for poison-safe logical-and and
logical-or (these are in the form of a select-of-bools).
( https://llvm.org/PR41312 )

Those patterns will not have all of the same constraints as min/max
in the form of cmp+sel. We may also end up removing the cmp+sel
min/max matching entirely (if we canonicalize to intrinsics), so
this will make that step easier.
2021-07-09 12:43:43 -04:00
David Green 38c9a4068d [TTI] Remove IsPairwiseForm from getArithmeticReductionCost
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since the adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between pairwise and
non-pairwise reductions.

This also removes some code that was detecting reduction trees starting
from extract elements inside the costmodel. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.

Differential Revision: https://reviews.llvm.org/D105484
2021-07-09 11:51:16 +01:00
Alexey Bataev c574d2fbac [SLP]Improve vectorization of stores.
Patch tries to improve the vectorization of stores. Originally, we just
check the type and the base pointer of the store.
Patch adds some extra checks to avoid non-profitable vectorization
cases. It includes analysis of the scalar values to be stored and
triggers the vectorization attempt only if the scalar values have
same/alt opcode and are from same basic block, i.e. we don't end up
immediately with the gather node, which is not profitable.
This also improves compile time by filtering out non-profitable cases.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D104122
2021-07-08 12:35:39 -07:00
Alexey Bataev 0d74fd3fdf [SLP][COST][X86]Improve cost model for masked gather.
Revived D101297 in its original form + added some changes in X86
legalization checking for masked gathers.

This solution is the most stable and the most correct one. We have to
check the legality before trying to build the masked gather in SLP.
Without this check we get an incorrect cost (for SLP) if the masked gather
is not legal or is slower than the gather, and we miss some
vectorization opportunities.

This can be fixed in the cost model, but in this case we need to add
special checks for the cost of GEPs for ScatterVectorize node, add
special check for small trees, etc., i.e. there are a lot of corner
cases here and there, which increase the code base and make it harder to
maintain the code.

> Can't we rely on cost model to deal with this? This can be profitable for further vectorization, when we can start from such gather loads as seed.

The question from D101297. Actually, no, it can't. Actually, simple
gather may give us better result, especially after we started
vectorization of insertelements. Plus, like I said before, the cost for
non-legal masked gathers leads to missed vectorization opportunities.

Differential Revision: https://reviews.llvm.org/D105042
2021-07-08 11:53:30 -07:00
Sanjay Patel 97c473ad39 [SLP] rename variable to not be misleading; NFC
The reduction matching was probably only dealing with binops
when it was written, but we have now generalized it to handle
select and intrinsics too, so assert on that too.
2021-07-07 14:40:21 -04:00
Philip Reames 723144665b [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 4)
Resubmit after the following changes:

* Fix a latent bug related to unrolling with required epilogue (see e49d65f). I believe this is the cause of the prior PPC buildbot failure.
* Disable non-latch exits for epilogue vectorization to be safe (9ffa90d)
* Split out assert movement (600624a) to reduce churn if this gets reverted again.

Previous commit message (try 3)

Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

Previous commit message...

This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-07-07 07:44:35 -07:00
Dylan Fleming 7215dcfe36 [SVE] Fix ShuffleVector cast<FixedVectorType> in truncateToMinimalBitwidths
Depends on D104239

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105341
2021-07-07 15:30:10 +01:00
Dylan Fleming 7586b47fb6 [SVE] Fix cast<FixedVectorType> in truncateToMinimalBitwidths
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D104239
2021-07-07 09:58:05 +01:00
Philip Reames 9ffa90d6c2 [LV] Disable epilogue vectorization for non-latch exits
When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed.  Better late (4 months), than never right?

I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not motivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error.  Thus, let's play it safe for now.
2021-07-06 10:57:10 -07:00
Alexey Bataev 4e1a0684f1 [SLP]Fix non-determinism in PHI sorting.
Compare type IDs and DFS numbering for basic block instead of addresses
to fix non-determinism.

Differential Revision: https://reviews.llvm.org/D105031
2021-07-06 08:45:45 -07:00
Florian Hahn ef0d147cdc
Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups.
This reverts commit 706bbfb35b.

The committed version moves the definition of VPReductionPHIRecipe out
of an ifdef only intended for ::print helpers. This should resolve the
build failures that caused the revert.
2021-07-06 14:15:42 +01:00
Kerry McLaughlin a7512401e5 [LV] Prevent vectorization with unsupported element types.
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:

  void foo(__int128_t* ptr, int N) {
    #pragma clang loop vectorize_width(4, scalable)
    for (int i=0; i<N; ++i)
      ptr[i] = ptr[i] + 42;
  }

This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102253
2021-07-06 13:06:21 +01:00
Florian Hahn 706bbfb35b
Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups
This reverts commit 3fed6d443f,
bbcbf21ae6 and
6c3451cd76.

The changes are causing build failures with certain configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio

    lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction*, llvm::ArrayRef<llvm::VPValue*>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]':
    LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe'
    collect2: error: ld returned 1 exit status
2021-07-06 12:10:03 +01:00
Florian Hahn 3fed6d443f
[VPlan] Mark overridden function in VPWidenPHIRecipe as virtual.
VPReductionRecipe overrides those implementations. Mark them as virtual
in the VPWidenPHIRecipe to unbreak build in certain configurations.
2021-07-06 12:00:41 +01:00
Florian Hahn bbcbf21ae6
[VPlan] Add destructor to VPReductionRecipe to unbreak build.
Attempt to unbreak
https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio
2021-07-06 11:41:20 +01:00
Florian Hahn 6c3451cd76
[VPlan] Add VPReductionPHIRecipe (NFC).
This patch is a first step towards splitting up VPWidenPHIRecipe into
separate recipes for the 3 distinct cases they model:

    1. reduction phis,
    2. first-order recurrence phis,
    3. pointer induction phis.

This allows untangling the code generation and allows us to reduce the
reliance on LoopVectorizationCostModel during VPlan code generation.

Discussed/suggested in D100102, D100113, D104197.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104989
2021-07-06 11:25:28 +01:00
Kerry McLaughlin 17b701c43c [LV] Collect a list of all element types found in the loop (NFC)
Splits `getSmallestAndWidestTypes` into two functions, one of which now collects
a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not
have to iterate over all instructions in the loop again in other places, such as in D102253
which disables scalable vectorization of a loop if any of the instructions use invalid types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105437
2021-07-06 10:37:41 +01:00
Caroline Concatto b868a2d2c6 [SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector.
The function vectorizeChainsInBlock does not support scalable vectors,
because functions like canReuseExtract and isCommutative in the code
path assert with scalable vectors.

This patch avoids vectorizing blocks that have extract instructions with scalable
vectors.

Differential Revision: https://reviews.llvm.org/D104809
2021-07-05 12:43:41 +01:00
Nikita Popov a213f735d8 [IR] Deprecate GetElementPtrInst::CreateInBounds without element type
This API is not compatible with opaque pointers, the method
accepting an explicit pointer element type should be used instead.

Thankfully there were few in-tree users. The BPF case still ends
up using the pointer element type for now and needs something like
D105407 to avoid doing so.
2021-07-04 16:49:30 +02:00
Paul Walker 287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Nikita Popov fabc17192e [IRBuilder] Add type argument to CreateMaskedLoad/Gather
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.

Differential Revision: https://reviews.llvm.org/D105395
2021-07-04 12:17:59 +02:00
Alexey Bataev 7f7e4aed21 [SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by
V.Dmitriev.

Reduces number of arguments
2021-07-02 10:30:13 -07:00
Alexey Bataev 28ac873bcb [SLP]Fix gathering of the scalars by not ignoring UndefValues.
The compiler should not ignore UndefValue when gathering the scalars,
otherwise the resulting code may be less defined than the original one.
Also, group the scalars so that they are inserted first, which reduces the
analysis needed in further passes.

Differential Revision: https://reviews.llvm.org/D105275
2021-07-02 04:46:48 -07:00
David Sherwood 51b4ab26ca [NFC] Add new setDebugLocFromInst that uses the class Builder by default
In lots of places we were calling setDebugLocFromInst and passing
in the same Builder member variable found in InnerLoopVectorizer.
I personally found this confusing so I've changed the interface
to take an Optional<IRBuilder<> *> and we can now pass in None
when we want to use the class member variable.

Differential Revision: https://reviews.llvm.org/D105100
2021-07-01 14:23:34 +01:00
David Sherwood 7b7b5b5a26 [NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating an IRBuilder stack variable with the same name as the
class member.
2021-06-30 11:11:49 +01:00
Philip Reames e49d65f36d [LV] Fix bug when unrolling (only) a loop with non-latch exit
If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires an epilogue be generated for correctness, the code generation must actually do so.

The included test case on an unmodified opt will access memory one past the expected bound.  As a result, this patch is fixing a latent miscompile.

Differential Revision: https://reviews.llvm.org/D103700
2021-06-29 08:04:26 -07:00
David Sherwood 9de63367d8 Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable"
This reverts commit 9dde514162.
2021-06-29 15:20:22 +01:00
David Sherwood 9dde514162 [NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating an IRBuilder stack variable with the same name as the
class member.
2021-06-29 14:34:30 +01:00
David Sherwood 8a3365fba2 Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable"
This reverts commit dcfc2c3fac.
2021-06-29 14:04:42 +01:00
Florian Hahn 47215e1c62
[LV] Fix crash when target instruction for sinking is dead.
This patch fixes a crash when the target instruction for sinking is
dead. In that case, no recipe is created and trying to get the recipe
for it results in a crash. To ensure all sink targets are alive, find &
use the first previous alive instruction.

Note that the case where the sink source is dead is already handled.

Found by
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104603
2021-06-29 13:31:22 +01:00
David Sherwood 303b6d5e98 [LoopVectorize] Add support for scalable vectorization of invariant stores
Previously in setCostBasedWideningDecision if we encountered an
invariant store we just assumed that we could scalarize the store
and called getUniformMemOpCost to get the associated cost.
However, for scalable vectors this is not an option because it is
not currently possible to scalarize the store. At the moment we
crash in VPReplicateRecipe::execute when trying to scalarize the
store.

Therefore, I have changed setCostBasedWideningDecision so that if
we are storing a scalable vector out to a uniform address and the
target supports scatter instructions, then we should use those
instead.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-inv-store.ll

Differential Revision: https://reviews.llvm.org/D104624
2021-06-29 11:56:09 +01:00
David Sherwood dcfc2c3fac [NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating an IRBuilder stack variable with the same name as the
class member.
2021-06-29 09:14:35 +01:00
Kerry McLaughlin f99672568f [LoopVectorize] Fix strict reductions where VF = 1
Currently we will allow loops with a fixed width VF of 1 to vectorize
if the -enable-strict-reductions flag is set. However, the loop vectorizer
will not use ordered reductions if `VF.isScalar()` and the resulting
vectorized loop will be out of order.

This patch removes `VF.isVector()` when checking if ordered reductions
should be used. Also, instead of converting the FAdds to reductions if the
VF = 1, operands of the FAdds are changed such that the order is preserved.
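
To illustrate why the order of the FAdds matters even at VF = 1, here is a minimal sketch that is independent of the vectorizer itself. `sumInOrder` and `sumTwoLanes` are hypothetical helpers: the first adds strictly left to right, the second keeps two interleaved partial sums, which is the kind of reordering an out-of-order reduction performs. The input values are chosen for the example and assume arithmetic carried out in single precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the values strictly left to right, as the original scalar loop would.
inline float sumInOrder(const std::vector<float> &Values) {
  float Acc = 0.0f;
  for (float V : Values)
    Acc = Acc + V;
  return Acc;
}

// Sums the values using two interleaved partial accumulators that are
// combined at the end. Even-indexed elements go to Lane0, odd-indexed
// elements go to Lane1, so the additions happen in a different order.
inline float sumTwoLanes(const std::vector<float> &Values) {
  float Lane0 = 0.0f;
  float Lane1 = 0.0f;
  for (std::size_t I = 0; I < Values.size(); ++I) {
    if (I % 2 == 0)
      Lane0 = Lane0 + Values[I];
    else
      Lane1 = Lane1 + Values[I];
  }
  return Lane0 + Lane1;
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` the in-order sum loses the first `1.0f` to rounding, while the two-lane sum keeps both, so the two functions return different results. That difference is what an ordered reduction is meant to avoid.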

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D104533
2021-06-28 11:27:10 +01:00
Florian Hahn 80aa7e147e
[VPlan] Merge predicated-triangle regions, after sinking.
Sinking scalar operands into predicated-triangle regions may allow
merging regions. This patch adds a VPlan-to-VPlan transform that tries
to merge predicate-triangle regions after sinking.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100260
2021-06-28 11:10:38 +01:00
Nikita Popov a9129f8964 [LoadStoreVectorizer] Support opaque pointers
There are remaining redundant bitcasts.
2021-06-27 15:42:16 +02:00
Florian Hahn f1a6430272
[VPlan] Track both incoming values for first-order recurrence phis.
This patch updates VPWidenPHI recipes for first-order recurrences to
also track the incoming value from the back-edge. Similar to D99294,
which did the same for reductions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104197
2021-06-27 14:29:35 +01:00
Florian Hahn 7f36981977
[LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC).
Suggested in D104197, avoids the early exit.
2021-06-26 13:13:25 +01:00
Florian Hahn cc5ee857f9
[LV] Doxygenize VectorizationFactor member comments (NFC).
Minor cleanup for follow-up patch.
2021-06-25 18:35:00 +01:00
Florian Hahn 91053e327c
[LV] Reflow comment for VectorizationCostTy (NFC). 2021-06-25 14:20:06 +01:00
Florian Hahn 833bdbe93c
[LV] Support sinking recipe in replicate region after another region.
This patch handles sinking a replicate region after another replicate
region. In that case, we can connect the sink region after the target
region. This properly handles the case for which an assertion has been
added in 337d765282.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D103514
2021-06-24 13:58:42 +01:00
Nikita Popov 00d3f7cc3c [LAA] Make getPointersDiff() API compatible with opaque pointers
Make getPointersDiff() and sortPtrAccesses() compatible with opaque
pointers by explicitly passing in the element type instead of
determining it from the pointer element type.

The SLPVectorizer result is slightly non-optimal in that unnecessary
pointer bitcasts are added.

Differential Revision: https://reviews.llvm.org/D104784
2021-06-23 18:44:34 +02:00
Alexey Bataev 908b753661 [SLP]Improve vectorization of PHI instructions.
Perform better analysis when trying to vectorize PHIs.
1. Do not try to vectorize vector PHIs.
2. Do deeper analysis for more profitable nodes for the vectorization.

Before we just tried to vectorize the PHIs of the same type. Patch
improves this and tries to vectorize PHIs with incoming values which
come from the same basic block, have the same and/or alternative
opcodes.

This saves compile time and provides better vectorization
results in general.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103638
2021-06-21 12:26:24 -07:00
Roman Lebedev 37dfc467ac
[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg
This really isn't talking about vectors in general,
but only about either fixed or scalable vectors,
and it's pretty confusing to see it state
that there aren't any vectors :)
2021-06-17 21:07:34 +03:00
Florian Hahn 80a403348b
[VPlan] Support PHIs as LastInst when inserting scalars in ::get().
At the moment, we create insertelement instructions directly after
LastInst when inserting scalar values in a vector in
VPTransformState::get.

This results in invalid IR when LastInst is a phi, followed by another
phi. In that case, the new instructions should be inserted just after
the last PHI node in the block.

At the moment, I don't think the problematic case can be triggered, but
it can happen once predicate regions are merged and multiple
VPredInstPHI recipes are in the same block (D100260).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104188
2021-06-17 09:36:44 +01:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seems to have been missing in that patch was to make
sure that the rest of LLVM also handles that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int, clang crashed.
And when emitting libcalls to those rtlib functions (typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Evgeniy Brevnov 96cded5b79 [SLP] Incorrect handling of external scalar values
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103954
2021-06-16 13:27:36 +07:00
Florian Hahn 96ca03493a
[VectorCombine] Limit scalarization to non-poison indices for now.
As Eli mentioned post-commit in D103378, the result of the freeze may
still be out-of-range according to Alive2. So for now, just limit the
transform to indices that are non-poison.
2021-06-14 16:40:14 +01:00
Simon Pilgrim b013c58e82 VPlanSLP.cpp - tidy implicit header dependencies. NFCI.
We don't use std::string and std::vector, but we do use std::pair and std::max.
2021-06-13 12:37:17 +01:00
Valery N Dmitriev 94a07c79cf [SLP][NFC] Fix condition that was supposed to save a bit of compile time.
It was found by chance, revealing a discrepancy between the comment (a few lines above),
the condition, and how re-ordering of instructions is done inside the if statement
it guards. The condition always evaluated to true.

Differential Revision: https://reviews.llvm.org/D104064
2021-06-11 10:08:55 -07:00
Alexey Bataev a010d4230e [SLP]Allow reordering of insertelements.
After we added support for non-ordered insertelements, we can allow
their reordering.

Differential Revision: https://reviews.llvm.org/D104057
2021-06-11 08:47:41 -07:00
Alexey Bataev 74af4bb1f4 [SLP]Remove unnecessary UndefValue in CreateShuffle.
No need to use UndefValue in CreateShuffle call.

Differential Revision: https://reviews.llvm.org/D104113
2021-06-11 08:08:30 -07:00
Roman Lebedev 20542b47d6
[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper
This results in slightly more optimistic alignments in some cases
2021-06-11 12:47:10 +03:00
Roman Lebedev abc0e0125c
[NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function 2021-06-11 12:47:09 +03:00
Simon Pilgrim 5e6bfb661e [Analysis] Pass RecurrenceDescriptor as const reference. NFCI.
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).

Differential Revision: https://reviews.llvm.org/D104029
2021-06-11 10:24:14 +01:00
Qiu Chaofan 2670c7dd5b [VectorCombine] Fix alignment in single element store
This fixes the concern in single element store scalarization that the
alignment of new store may be larger than it should be. It calculates
the largest alignment if index is constant, and a safe one if not.
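
The rule described above can be sketched with plain integers. This is an illustration under stated assumptions, not the code from the patch: `commonAlignmentSketch` and `scalarStoreAlignSketch` are hypothetical names, alignments and sizes are byte counts, and alignments are assumed to be powers of two. A known index yields the alignment implied by its exact byte offset; an unknown index falls back to what holds for every possible offset.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Largest power-of-two alignment guaranteed at byte offset Offset from an
// address aligned to BaseAlign. BaseAlign must be a power of two.
inline uint64_t commonAlignmentSketch(uint64_t BaseAlign, uint64_t Offset) {
  if (Offset == 0)
    return BaseAlign;
  // Isolate the lowest set bit of Offset: the largest power of two dividing it.
  uint64_t LowestSetBit = Offset & (~Offset + 1);
  return LowestSetBit < BaseAlign ? LowestSetBit : BaseAlign;
}

// Alignment for a scalar store that replaces a store of one vector element.
// ConstIndex is the element index when it is a compile-time constant and
// std::nullopt when it is only known at run time.
inline uint64_t scalarStoreAlignSketch(uint64_t VectorAlign,
                                       uint64_t ElementSize,
                                       std::optional<uint64_t> ConstIndex) {
  if (ConstIndex)
    return commonAlignmentSketch(VectorAlign, *ConstIndex * ElementSize);
  // Unknown index: every element offset is a multiple of ElementSize, so
  // only the alignment implied by one element size is safe.
  return commonAlignmentSketch(VectorAlign, ElementSize);
}
```

With a 16-byte aligned vector of 4-byte elements, index 2 sits at byte offset 8 and can keep an alignment of 8, whereas a run-time index can only be assumed to be 4-byte aligned.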

Reviewed By: lebedev.ri, spatel

Differential Revision: https://reviews.llvm.org/D103419
2021-06-11 10:28:15 +08:00
Slava Nikolaev 119965865c LoadStoreVectorizer: support different operand orders in the add sequence match
First we refactor the code that matches non-wrapping add sequences:
we need to allow different operand orders for
the key add instructions involved in the match.

Then we use the refactored code trying 4 variants of matching operands.

Originally the code relied on the fact that the matching operands
of the two last add instructions of memory index calculations
had the same LHS argument. But which operand is the same
in the two instructions is actually not essential, so now we allow
that to be any of LHS or RHS of each of the two instructions.
This increases the chances of vectorization to happen.

Reviewed By: volkan

Differential Revision: https://reviews.llvm.org/D103912
2021-06-10 16:31:35 -07:00
Joachim Meyer 4f01122c3f [LV] Parallel annotated loop does not imply all loads can be hoisted.
As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`) is unexpected. The documentation for this behavior has since been removed from the LangRef, and the behavior can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.

The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.

Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103907
2021-06-10 23:37:57 +02:00
Alexey Bataev a893b44187 [SLP]Disable scheduling of insertelements.
There is no need to schedule insertelement instructions. The compiler
did not schedule them before it started to support their vectorization and
it should not do it after. We pre-schedule them manually when finding
a build vector sequence.
Disabling scheduling of insertelement instructions improves compile
time and vectorization of the very large basic blocks by saving
scheduling budget for other instructions.

Differential Revision: https://reviews.llvm.org/D104026
2021-06-10 10:25:26 -07:00
Keith Smiley 026170d17d Fix range-loop-analysis warning
```
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis]
  for (const auto VF : VFCandidates) {
                  ^
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying
  for (const auto VF : VFCandidates) {
       ^~~~~~~~~~~~~~~
                  &
1 warning generated.
```

Differential Revision: https://reviews.llvm.org/D103970
2021-06-10 08:39:54 -07:00
Alexey Bataev a0086add2e [SLP]Improve gathering of scalar elements.
1. Better sorting of scalars to be gathered. Trying to insert
   constants/arguments/instructions-out-of-loop at first and only then
   the instructions which are inside the loop. It improves hoisting of
   invariant insertelements instructions.
2. Better detection of shuffle candidates in gathering function.
3. The cost of insertelement for constants is 0.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103458
2021-06-09 05:23:21 -07:00
Kerry McLaughlin 14eeccfe9a [LoopVectorize] Don't use strict reductions when reordering is allowed
If the `-enable-strict-reductions` flag is set to true, then currently we will
always choose to vectorize the loop with strict in-order reductions. This is
not necessary where we allow the reordering of FP operations, such as
when loop hints are passed via metadata.

This patch moves useOrderedReductions so that we can also check whether
loop hints allow reordering, in which case we should use the default
behaviour of vectorizing with unordered reductions.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D103814
2021-06-08 10:39:29 +01:00
Florian Hahn 1465e7770b
[VPlan] Print successors of VPRegionBlocks.
The non-DOT printing does not include the successors of VPRegionBlocks.
This patch uses the same style for printing successors as for
VPBasicBlock.

I think the printing of successors could be a bit improved further, as
at the moment it is hard to ensure a check line matches all successors.
But that can be done as follow-up.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D103515
2021-06-07 17:57:21 +01:00
Florian Hahn 23c2f2e6b2
[LV] Mark increment of main vector loop induction variable as NUW.
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.

If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.
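
The argument above can be checked by brute force on a small bit width. The following is a sketch under the stated preconditions, using 8-bit unsigned values in place of the real induction type; `incrementNeverWraps` and `holdsForAll8BitInputs` are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Walks IV = Start, Start + Step, ... and stops once IV + Step == End, the
// exit condition quoted in the text. Returns false if any increment would
// wrap around in 8-bit unsigned arithmetic.
// Preconditions mirrored from the text: Start % Step == 0, End % Step == 0
// and End - Start >= Step.
inline bool incrementNeverWraps(uint8_t Start, uint8_t End, uint8_t Step) {
  assert(Step != 0 && Start % Step == 0 && End % Step == 0);
  assert(End > Start && End - Start >= Step);
  uint8_t IV = Start;
  while (true) {
    // Compute the increment in a wider type so a wrap can be detected.
    unsigned Wide = static_cast<unsigned>(IV) + static_cast<unsigned>(Step);
    if (Wide > UINT8_MAX)
      return false; // The 8-bit increment would have wrapped.
    IV = static_cast<uint8_t>(Wide);
    if (IV == End)
      return true; // Loop exit reached before any wrap.
  }
}

// Exhaustively checks every 8-bit (Start, End, Step) triple that satisfies
// the preconditions.
inline bool holdsForAll8BitInputs() {
  for (unsigned Step = 1; Step <= 255; ++Step)
    for (unsigned Start = 0; Start <= 255; Start += Step)
      for (unsigned End = Start + Step; End <= 255; End += Step)
        if (!incrementNeverWraps(static_cast<uint8_t>(Start),
                                 static_cast<uint8_t>(End),
                                 static_cast<uint8_t>(Step)))
          return false;
  return true;
}
```

Because both bounds are multiples of the step, the induction variable lands exactly on `End` and the loop exits there, so the incremented value never has to exceed `End`.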

This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.

At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.

Note that this could probably be further improved by using information
from the original IV.

Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g

Part of a set of fixes required for PR50412.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D103255
2021-06-07 10:47:52 +01:00
Alexey Bataev 8c48d77cdf [SLP]Improve cost estimation/emission of externally used extractelements.
No need to recalculate the cost of extractelements, just no need to
compensate the cost of all extractelements, need to check before if this
is actually going to be removed at the vectorization. Also, no need to
generate new extractelement instruction, we may just regenerate the
original one. It may improve the final vectorization.

Differential Revision: https://reviews.llvm.org/D102933
2021-06-03 10:26:59 -07:00
Alexey Bataev 89f3bc7698 [SLP]Allow to reorder nodes with >2 scalar values.
tryToVectorizeList function allows to reorder only 2 scalars. Patch
allows to reorder >2 scalars. Also, to avoid possible regressions, it
allows extra vectorization of the remaining parts of the scalars
elements if possible.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103247
2021-06-03 10:01:36 -07:00
Harald van Dijk 5d2b3de284
[SLP] Avoid std::stable_sort(properlyDominates()).
As noticed by NAKAMURA Takumi back in 2017, we cannot use
properlyDominates for std::stable_sort as properlyDominates only
partially orders blocks. That is, for blocks A, B, C, D, where A
dominates B and C dominates D, we have A == C, B == C, but A < B. This
is not a valid comparison function for std::stable_sort and causes
different results between libstdc++ and libc++. This change uses DFS
numbering to give deterministic results for all reachable blocks.
Unreachable blocks are ignored already, so do not need special
consideration.
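
As a rough illustration of the problem described above, the following sketch models a toy dominator tree. The `Node` struct, the hard-coded tree and the helper names are assumptions made for this example only, not LLVM's `DominatorTree` API, and comparing DFS-in numbers is just one plausible way to use DFS numbering.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy dominator-tree node (hypothetical): Parent is the immediate dominator
// (-1 for the root); DFSIn/DFSOut are entry/exit numbers of a DFS over the tree.
struct Node {
  int Parent;
  int DFSIn;
  int DFSOut;
};

// Hypothetical tree: 0 is the entry block; A=1 and C=3 are children of the
// entry, B=2 is a child of A, and D=4 is a child of C.
std::vector<Node> makeTree() {
  return {
      {-1, 0, 9}, // entry
      {0, 1, 4},  // A
      {1, 2, 3},  // B
      {0, 5, 8},  // C
      {3, 6, 7},  // D
  };
}

// X properly dominates Y iff X != Y and Y's DFS interval nests inside X's.
bool properlyDominates(const std::vector<Node> &T, int X, int Y) {
  return X != Y && T[X].DFSIn < T[Y].DFSIn && T[Y].DFSOut < T[X].DFSOut;
}

// "Equivalent" as a sort sees it: neither element compares less than the other.
bool incomparable(const std::vector<Node> &T, int X, int Y) {
  return !properlyDominates(T, X, Y) && !properlyDominates(T, Y, X);
}

// A total order on the blocks: compare DFS-in numbers instead of dominance.
void sortByDFSIn(const std::vector<Node> &T, std::vector<int> &Blocks) {
  assert(!T.empty() && "tree must contain at least the entry block");
  std::stable_sort(Blocks.begin(), Blocks.end(),
                   [&T](int L, int R) { return T[L].DFSIn < T[R].DFSIn; });
}
```

In this toy tree A and C are incomparable, B and C are incomparable, yet A compares less than B, so "equivalence" is not transitive and the dominance relation is not a strict weak ordering. The DFS-in comparison has no such gap.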

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103441
2021-06-03 17:51:52 +01:00
Sander de Smalen d41cb6bb26 [LV] Build and cost VPlans for scalable VFs.
This patch uses the calculated maximum scalable VFs to build VPlans,
cost them and select a suitable scalable VF.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98722
2021-06-02 14:47:47 +01:00
Sander de Smalen 034503e9d2 [LV] NFC: Remove redundant isLegalMasked(Gather|Scatter) functions.
This NFC change follows from conversation in D102437, where it was discussed
to remove these functions as a separate patch.
2021-06-02 14:09:07 +01:00
Sander de Smalen 3472d3fd9d [LV] NFC: Replace custom getMemInstValueType by llvm::getLoadStoreType.
llvm::getLoadStoreType was added recently and has the same implementation
as 'getMemInstValueType' in LoopVectorize.cpp. Since there is no
value in having two implementations, this patch removes the custom LV
implementation in favor of the generic one defined in Instructions.h.
2021-06-02 14:09:06 +01:00
Harald van Dijk f126e8ec28
[SLPVectorizer] Ignore unreachable blocks
As the existing test unreachable.ll shows, we should be doing more
work to avoid entering unreachable blocks: we should not stop
vectorization just because a PHI incoming value from an unreachable
block cannot be vectorized. We know that particular value will never
be used so we can just replace it with poison.
2021-06-01 20:21:04 +01:00
Alexey Bataev 36911971a5 [SLP]Better detection of perfect/shuffles matches for gather nodes.
Implemented a better scheme for perfect/shuffled matches of the gather
nodes, which fixes the performance regressions introduced by
earlier patches. Start detecting matches for broadcast nodes and
extractelement gathering.

Differential Revision: https://reviews.llvm.org/D102920
2021-06-01 07:08:07 -07:00
Florian Hahn d4c070d801
[VectorCombine] Freeze index unless it is known to be non-poison.
If the index itself is already poison, the poison propagates through
instructions clamping the index to a valid range. This still
introduces a load of poison, as flagged by Alive2 and pointed out
at 575e2aff55.

This patch updates the code to freeze the index, unless it is proven to
not be poison.

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D103378
2021-06-01 10:40:57 +01:00
Florian Hahn aa00b1d763
[LV] Try to sink users recursively for first-order recurrences.
Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.

Fixes PR44286 (and another PR or two).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84951
2021-05-31 19:55:33 +01:00
Bardia Mahjour 06eaffa858 [NFC] Remove confusing info about MainLoop VF/UF from debug message 2021-05-28 16:10:04 -04:00
Florian Hahn 007f268c35
[VectorCombine] Check indices for all extracts we scalarize.
We need to make sure that the indices of all extracts we scalarize are
valid.
2021-05-28 18:35:29 +01:00
Florian Hahn 38641ddf3e
[VPlan] Do not sink uniform recipes in sinkScalarOperands.
For uniform ReplicateRecipes, only the first lane should be used, so
sinking them would mean we have to compute the value of the first lane
multiple times. Also, at the moment, sinking them causes a crash because
the value of the first lane is re-used by all users.

Reported post-commit for D100258.
2021-05-27 14:07:48 +01:00
Alexey Bataev 27d3528acf [SLP]Fix vectorization of insertelements with multiple uses.
The SLP vectorizer should not consider insertelements with multiple uses as
part of a high-level build vector; each must be considered a terminating
insertelement in the vector build, otherwise incorrect code may be
produced.

Differential Revision: https://reviews.llvm.org/D103164
2021-05-26 09:42:18 -07:00
Kerry McLaughlin 9f76a85260 [LoopVectorize] Enable strict reductions when allowReordering() returns false
When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:

  bool allowReordering() const {
    // When enabling loop hints are provided we allow the vectorizer to change
    // the order of operations that is given by the scalar loop. This is not
    // enabled by default because can be unsafe or inefficient.

The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.

This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101836
2021-05-26 13:59:12 +01:00
Florian Hahn 8e83ff58c9
[VectorCombine] Remove unneeded InsertPointGuard (NFCI).
All users of the builder should set an insert point before using the
builder. There should be no need for using InsertPointGuard here.
2021-05-25 17:01:05 +01:00
Florian Hahn 575e2aff55
[VectorCombine] Use constant range info for index scalarization legality.
We can only scalarize memory accesses if we know the index is valid.

This patch adjusts canScalarizeAccess to fall back to
computeConstantRange to check if the index is known to be valid.
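
A minimal sketch of the legality idea, assuming the range analysis yields a closed unsigned interval for the index. The `IndexRange` type and `indexKnownValid` are hypothetical stand-ins for illustration, not the actual `ConstantRange` interface.

```cpp
#include <cassert>
#include <cstdint>

// Closed unsigned interval [Lo, Hi] of values the index may take at run time.
// Hypothetical stand-in for the result of a constant-range analysis.
struct IndexRange {
  uint64_t Lo;
  uint64_t Hi;
};

// The access may be scalarized only if every possible index is in bounds,
// i.e. the largest possible index is still below the element count.
bool indexKnownValid(IndexRange R, uint64_t NumElements) {
  assert(R.Lo <= R.Hi && "malformed range");
  return R.Hi < NumElements;
}
```

For a `<4 x i32>` vector, an index known to lie in [0, 3] is accepted, while a range of [0, 4] or an unconstrained index is rejected.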

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D102476
2021-05-25 13:58:42 +01:00
Anton Afanasyev b2cd895011 [SLP] Fix "gathering" of insertelement instructions
In rare exceptional cases a vector tree node (for now, only insertelements)
is marked as `NeedToGather`; this patch handles that case. Follow-up
of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135.

Differential Revision: https://reviews.llvm.org/D102675
2021-05-25 01:35:43 +03:00
Florian Hahn 65d3dd7c88
[VPlan] Add first VPlan version of sinkScalarOperands.
This patch adds a first VPlan-based implementation of sinking of scalar
operands.

The current version traverses a VPlan once and processes all operands of
a predicated REPLICATE recipe. If one of those operands can be sunk,
it is moved to the block containing the predicated REPLICATE recipe.
Processing then continues with the operands of the sunk recipe.

The initial version does not re-process candidates after other recipes
have been sunk. It also cannot partially sink induction increments at
the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the
induction is used for example in a GEP, only the first lane is used and
in the lowered IR the adds for the other lanes can be sunk into the
predicated blocks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100258
2021-05-24 15:29:58 +01:00
Florian Hahn e9d97d7d9d
[VPlan] Add mayReadOrWriteMemory & friends.
This patch adds an initial implementation of mayReadOrWriteMemory,
mayReadFromMemory and mayWriteToMemory to VPRecipeBase.

Used by D100258.
2021-05-24 13:11:32 +01:00
Florian Hahn 4e8c28b6fb
Recommit "[VectorCombine] Scalarize vector load/extract."
This reverts commit 94d54155e2.

This fixes a sanitizer failure by moving scalarizeLoadExtract(I)
before foldSingleElementStore(I), which may remove instructions.
2021-05-24 11:35:07 +01:00
Florian Hahn 94d54155e2
Revert "[VectorCombine] Scalarize vector load/extract."
This reverts commit 86497785d5.

One of the tests causes an ASAN failure.
https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio
2021-05-24 10:11:00 +01:00
Florian Hahn 86497785d5
[VectorCombine] Scalarize vector load/extract.
This patch adds a new combine that tries to scalarize chains of
`extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is
profitable when extracting only a few elements out of a large vector.

At the moment, `store (extractelement (load %ptr), %idx), %ptr`
operations on large vectors result in huge code in the backend.

This can easily be triggered by using the matrix extension, e.g.
https://clang.godbolt.org/z/qsccPdPf4

This should complement D98240.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D100273
2021-05-24 09:29:08 +01:00
Alexey Bataev 8dab25954b [SLP]Improve handling of compensate external uses cost.
External insertelement users can also be represented as the result of a
shuffle of the vectorized element and non-consecutive insertelements. Added
support for handling non-consecutive insertelements.

Differential Revision: https://reviews.llvm.org/D101555
2021-05-21 07:45:31 -07:00
Daniil Fukalov e8e88c3353 [TTI] NFC: Change getRegUsageForType to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102541
2021-05-21 15:17:23 +03:00
Alexey Bataev 182162b616 [SLP]Try to vectorize tiny trees with shuffled gathers of extractelements.
If we gather extract elements and they actually are just shuffles, it
might be profitable to vectorize them even if the tree is tiny.

Differential Revision: https://reviews.llvm.org/D101460
2021-05-20 08:36:16 -07:00
David Sherwood 7e95a563c8 Remove scalable vector assert from InnerLoopVectorizer::setDebugLocFromInst
In InnerLoopVectorizer::setDebugLocFromInst we were previously
asserting that the VF is not scalable. This is because we want to
use the number of elements to create a duplication factor for the
debug profiling data. However, for scalable vectors we only know the
minimum number of elements. I've simply removed the assert for now
and added a FIXME saying that we assume vscale is always 1. When
vscale is not 1 it just means that the profiling data isn't as
accurate, but shouldn't cause any functional problems.
2021-05-19 13:33:10 +01:00
Sander de Smalen 4f86aa650c [LV] Add -scalable-vectorization=<option> flag.
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.

Initially, this suggests three levels to control scalable
vectorization, although other more aggressive options can be added in
the future.

The possible options are:
- Disabled:   Disables vectorization with scalable vectors.
- Enabled:    Vectorize loops using scalable vectors or fixed-width
              vectors, but favors fixed-width vectors when the cost
              is a tie.
- Preferred:  Like 'Enabled', but favoring scalable vectors when the
              cost-model is inconclusive.
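
The three levels can be pictured with a small sketch. The enum, the function name and the idea of comparing two plain integer costs are assumptions for illustration; the real cost model is more involved, and "inconclusive" is modelled here simply as equal costs.

```cpp
#include <cassert>

// Hypothetical model of the three option values described above.
enum class ScalableMode { Disabled, Enabled, Preferred };
enum class Choice { Fixed, Scalable };

// Picks between a fixed-width and a scalable candidate given their costs
// (lower is better). Only the tie-break differs between Enabled and Preferred.
Choice pickVectorKind(ScalableMode Mode, unsigned FixedCost,
                      unsigned ScalableCost) {
  if (Mode == ScalableMode::Disabled)
    return Choice::Fixed; // scalable vectors are never considered
  if (ScalableCost != FixedCost)
    return ScalableCost < FixedCost ? Choice::Scalable : Choice::Fixed;
  // Tie: Enabled favors fixed-width, Preferred favors scalable.
  return Mode == ScalableMode::Preferred ? Choice::Scalable : Choice::Fixed;
}
```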

Reviewed By: paulwalker-arm, vkmr

Differential Revision: https://reviews.llvm.org/D101945
2021-05-19 10:40:56 +01:00
Rong Xu 886629a8c9 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This patch implements the first part of Flow Sensitive SampleFDO (FSAFDO).
It has the following changes:
(1) disable current discriminator encoding scheme,
(2) new hierarchical discriminator for FSAFDO.

For this patch, option "-enable-fs-discriminator=true" turns on the new
functionality. Option "-enable-fs-discriminator=false" (the default)
keeps the current SampleFDO behavior. When the fs-discriminator is
enabled, we insert a flag variable, namely, llvm_fs_discriminator, to
the object. This symbol will be checked by the create_llvm_prof tool, and used
to generate a profile with FS-AFDO discriminators enabled. If this
happens, for an extbinary format profile, create_llvm_prof tool
will add a flag to profile summary section.

Differential Revision: https://reviews.llvm.org/D102246
2021-05-18 16:23:43 -07:00
Arthur Eubanks 6b9524a05b [NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.

SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102032
2021-05-18 13:49:03 -07:00
Florian Hahn cc1a6361d3
[VPlan] Add VPUserID to distinguish between recipes and others.
This allows cast/dyn_cast'ing from VPUser to recipes. This is needed
because there are VPUsers that are not recipes.

Reviewed By: gilr, a.elovikov

Differential Revision: https://reviews.llvm.org/D100257
2021-05-18 09:17:28 +01:00
Sander de Smalen 81fdc73e5d [LV] Return both fixed and scalable Max VF from computeMaxVF.
This patch introduces a new class, MaxVFCandidates, that holds the
maximum vectorization factors that have been computed for both scalable
and fixed-width vectors.

This patch is intended to be NFC for fixed-width vectors, although
considering a scalable max VF (which is disabled by default) pessimises
tail-loop elimination, since it can no longer determine if any chosen VF
(less than fixed/scalable MaxVFs) is guaranteed to handle all vector
iterations if the trip-count is known. This issue will be addressed in
a future patch.

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D98721
2021-05-18 08:03:48 +01:00
Philip Reames ed9d70781b Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)"
This reverts commit 6d3e3ae8a9.

Still seeing PPC build bot failures, and one arm self host bot failing.  I'm officially stumped, and need help from a bot owner to reduce.
2021-05-17 20:53:28 -07:00
Philip Reames 6d3e3ae8a9 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)
Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

Previous commit message...

This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:59:25 -07:00
Philip Reames d16da7343d Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"
This reverts commit c23ce54b36.  I apparently missed some newly added non-x86 tests.
2021-05-17 16:49:32 -07:00
Philip Reames c23ce54b36 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute
This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:33:56 -07:00
Sander de Smalen f82966d19a [LoopVectorizationLegality] NFC: Mark some interfaces as 'const'
This patch marks blockNeedsPredication, isConsecutivePtr, isMaskRequired
and getSymbolicStrides as 'const'.
2021-05-14 11:53:54 +01:00
Anton Afanasyev 207cdd7ed9 [SLP] Fix spill cost computation for insertelement tree node
This is a bug-fix follow-up to D98714.
2021-05-14 13:14:41 +03:00
Sander de Smalen 459c48e04f NFCI: Remove VF argument from isScalarWithPredication
As discussed in D102437, the VF argument to isScalarWithPredication
seems redundant, so this is intended to be a non-functional change. It
seems wrong to query the widening decision at this point. Removing the
operand and code to get the widening decision causes no unit/regression
tests to fail. I've also found no issues running the LLVM test-suite.

This subsequently removes the VF argument from isPredicatedInst as well,
since it is no longer required.
2021-05-14 10:34:40 +01:00
Florian Hahn bdada7546e
[VPlan] Adjust assert in splitBlock to allow splitting at end.
SplitAt should only be dereferenced in the assert if it does not point
to the end of the block. This fixes a crash in the added test case.
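
The pattern behind this fix can be sketched with a standard container. Here `std::list<int>` stands in for a block of recipes, and the checked property (non-negative values) is a made-up placeholder for whatever the real assert verifies; both are assumptions of this example.

```cpp
#include <cassert>
#include <list>

// Moves [SplitAt, Block.end()) into a new list and returns it.
// Hypothetical stand-in for splitting a block at a given iterator.
std::list<int> splitAt(std::list<int> &Block,
                       std::list<int>::iterator SplitAt) {
  // Dereference SplitAt only when it is not the end iterator; dereferencing
  // end() is undefined behavior. The ">= 0" property is purely illustrative.
  assert((SplitAt == Block.end() || *SplitAt >= 0) &&
         "cannot split at this element");
  std::list<int> Tail;
  Tail.splice(Tail.begin(), Block, SplitAt, Block.end());
  return Tail;
}
```

Splitting at `Block.end()` is then well defined and simply yields an empty tail.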
2021-05-13 13:36:35 +01:00
Anton Afanasyev ab2c499d3a [SLP] Add insertelement instructions to vectorizable tree
Add a new type of tree node for an `InsertElementInst` chain forming a vector.
These instructions can be either removed or replaced by shuffles during
vectorization, and adding this node to the cost model lets us estimate their
cost naturally, getting rid of the `CompensateCost` tricks and reducing further
work for InstCombine. This fixes PR40522 and PR35732 in a natural way. This
patch is also the first step towards revectorization of partially vectorized
code (to fix PR42022 completely). After adding inserts to the tree, the next
step is to add vector instructions there (for instance, to merge `store <2 x float>`
and `store <2 x float>` into `store <4 x float>`).

Fixes PR40522 and PR35732.

Differential Revision: https://reviews.llvm.org/D98714
2021-05-13 07:41:45 +03:00
Justin Bogner e7d26aceca Change the context instruction for computeKnownBits in LoadStoreVectorizer pass
This change enables cases for which the index value for the first
load/store instruction in a pair could be a function argument. This
allows using llvm.assume to provide known bits information in such
cases.

Patch by Viacheslav Nikolaev. Thanks!

Differential Revision: https://reviews.llvm.org/D101680
2021-05-12 15:29:29 -07:00
David Sherwood b7a11274f9 [LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors
In InnerLoopVectorizer::widenPHIInstruction there are cases where we have
to scalarise a pointer induction variable after vectorisation. For scalable
vectors we already deal with the case where the pointer induction variable
is uniform, but we currently crash if not uniform. For fixed width vectors
we calculate every lane of the scalarised pointer induction variable for a
given VF, however this cannot work for scalable vectors. In this case I
have added support for caching the whole vector value for each unrolled
part so that we can always extract an arbitrary element. Additionally, we
still continue to cache the known minimum number of lanes too in order
to improve code quality by avoiding an extractelement operation.

I have adapted an existing test `pointer_iv_mixed` from the file:

  Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

and added it here for scalable vectors instead:

  Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D101294
2021-05-12 11:02:11 +01:00
Qiu Chaofan 6d2df18163 [VectorCombine] Restrict single-element-store index to inbounds constant
The vector single-element update optimization landed in 2db4979, but its
scope needs restricting. This patch requires the index to be an inbounds
constant and the vector to be fixed-size. In the future, we may use value
tracking to relax the constant restriction.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D102146
2021-05-12 13:18:20 +08:00
Florian Hahn faebc6bf10
[VPlan] Register recipe for instr if the simplified value is recipe.
If the simplified VPValue is a recipe, we need to register it for Instr,
in case it needs to be recorded. The way this is handled in general may
change soon, following some post-commit comments.

This fixes PR50298.
2021-05-11 14:32:34 +01:00
Sanjay Patel 49950cb1f6 [SLP] restrict matching of load combine candidates
The test example from https://llvm.org/PR50256 (and reduced here)
shows that we can match a load combine candidate even when there
are no "or" instructions. We can avoid that by confirming that we
do see an "or". This doesn't apply when matching an or-reduction
because that match begins from the operands of the reduction.

Differential Revision: https://reviews.llvm.org/D102074
2021-05-11 08:46:40 -04:00
Alexey Bataev 30463bc3f1 [SLP]Do not count perfect diamond matches for gathers several times.
Need to remove the old code for avoiding double counting of the gather
nodes with perfect diamond matches within the tree after we started
detecting perfect/shuffled matching in the previous patch D100495. We
may skip the cost for such nodes completely.

Differential Revision: https://reviews.llvm.org/D102023
2021-05-10 07:08:07 -07:00
Qiu Chaofan 2db4979c0f [VectorCombine] Simplify to scalar store if only one element updated
This patch simplifies load-insertelt-store pattern into
getelementptr-store.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98240
2021-05-08 18:14:51 +08:00
Florian Hahn 75b9997760
[LV] Remove reference of PHI from comment, they are not recorded (NFC).
The comment incorrectly states that the PHI is recorded. That's not
accurate, only the recipe for the incoming value is recorded.

Suggested post-commit for 4ba8720f88.
2021-05-07 21:34:23 +01:00
Florian Hahn 337d765282
[LV] Assert if trying to sink replicate region into another region (NFC)
Currently sinking a replicate region into another replicate region is
not supported. Add an assert, to make the problem more obvious, should
it occur.

Discussed post-commit for ccebf7a109.
2021-05-07 21:25:35 +01:00
Florian Hahn 01c26d4e04
[LV] Rename Region to TargetRegion, similar to SinkRegion (NFC).
Adjust the name to make it clearer this is the region containing the
target recipe, similar to SinkRegion below.

Suggested post-commit for ccebf7a109.
2021-05-07 21:25:35 +01:00
Caroline Concatto cf06c8eee3 [LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction
The function fixReduction used to assert/crash for scalable vector when
a vector reduce could be done with a smaller vector.
This patch removes this assertion as it is safe to use scalable vector for
vector reduce and truncate.

Differential Revision: https://reviews.llvm.org/D101260
2021-05-07 09:37:37 +01:00
Simon Pilgrim 338c1b701f [SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI. 2021-05-06 16:20:19 +01:00
Simon Pilgrim 2dab059021 [SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI. 2021-05-06 16:20:19 +01:00
Simon Pilgrim 1b47489fd0 [SLP] Use empty() instead of size() == 0. NFCI. 2021-05-06 16:20:18 +01:00
David Green 4979c90458 [LV] Account for tripcount when calculating vectorization profitability
The loop vectorizer will currently assume a large trip count when
calculating which of several vectorization factors are more profitable.
That is often not a terrible assumption to make as small trip count
loops will usually have been fully unrolled. There are cases, however,
where we will try to vectorize them, and especially when folding the
tail by masking we can incorrectly choose to vectorize loops where it is
not beneficial, because the folded tail rounds the iteration count up for
the vectorized loop.

The motivating example here has a trip count of 5, so either performs 5
scalar iterations or 2 vector iterations (with VF=4). At a high enough
trip count the vectorization becomes profitable, but the rounding up to
2 vector iterations vs only 5 scalar makes it unprofitable.

This adds an alternative cost calculation when we know the max trip
count and are folding tail by masking, rounding the iteration count up
to the correct number for the vector width. We still do not account for
anything like setup cost or the mixture of vector and scalar loops, but
this is at least an improvement in a few cases that we have had
reported.
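
The arithmetic in the motivating example can be sketched as follows. The function names and the per-iteration cost values are invented for illustration and do not come from the actual cost model.

```cpp
#include <cassert>
#include <cstdint>

// Iterations executed by the vector loop when the tail is folded by masking:
// the trip count is rounded up to a whole number of VF-wide iterations.
uint64_t vectorIterations(uint64_t TripCount, uint64_t VF) {
  assert(VF != 0 && "vectorization factor must be non-zero");
  return (TripCount + VF - 1) / VF;
}

// Compares total loop costs for a known trip count instead of assuming the
// trip count is large. Costs are hypothetical per-iteration units.
bool vectorIsProfitable(uint64_t TripCount, uint64_t VF,
                        uint64_t ScalarIterCost, uint64_t VectorIterCost) {
  return vectorIterations(TripCount, VF) * VectorIterCost <
         TripCount * ScalarIterCost;
}
```

With an assumed scalar cost of 1 and vector cost of 3 per iteration, VF=4 looks cheaper per element (0.75 versus 1), but for a trip count of 5 the vector loop costs 2 * 3 = 6 against 5 for the scalar loop. At a trip count of 8 the same VF costs 6 against 8 and becomes profitable.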

Differential Revision: https://reviews.llvm.org/D101726
2021-05-06 12:36:46 +01:00
Kerry McLaughlin 8c9742bd23 [SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences
Adds support for scalable vectorization of loops containing first-order recurrences, e.g:
```
for(int i = 0; i < n; i++)
  b[i] = a[i] + a[i - 1];
```
This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into
account when inserting into and extracting from the last lane of a vector.
CreateVectorSplice has been added to construct a vector for the recurrence, which
returns a splice intrinsic for scalable types. For fixed-width the behaviour
remains unchanged as CreateVectorSplice will return a shufflevector instead.
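
For intuition, the vector used for the recurrence can be modelled on plain arrays. This is only a sketch of the fixed-width case with `std::vector<int>` standing in for a vector register; `spliceLast` is a hypothetical helper, not the `CreateVectorSplice` interface.

```cpp
#include <cassert>
#include <vector>

// Builds the "value from the previous iteration" vector for a first-order
// recurrence: lane 0 takes the last lane of Prev, and lanes 1..VF-1 take
// lanes 0..VF-2 of Cur.
std::vector<int> spliceLast(const std::vector<int> &Prev,
                            const std::vector<int> &Cur) {
  assert(!Prev.empty() && Prev.size() == Cur.size() &&
         "both vectors must have the same non-zero length");
  std::vector<int> Out;
  Out.reserve(Cur.size());
  Out.push_back(Prev.back());
  Out.insert(Out.end(), Cur.begin(), Cur.end() - 1);
  return Out;
}
```

With `Prev = {1, 2, 3, 4}` and `Cur = {5, 6, 7, 8}`, the result is `{4, 5, 6, 7}`, which lines up each `a[i - 1]` with the matching `a[i]` in `Cur`.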

The tests included here are the same as test/Transforms/LoopVectorize/first-order-recurrence.ll

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D101076
2021-05-06 11:35:39 +01:00
Philip Reames 80e8025083 [LV] Workaround PR49900 (a crash due to analyzing partially mutated IR)
LoopVectorize has a fairly deeply baked in design problem where it will try to query analysis (primarily SCEV, but also ValueTracking) in the midst of mutating IR. In particular, the intermediate IR state does not represent the semantics of the original (or final) program.

Fixing this for real is hard, but all of the cases seen so far share a common symptom. In cases seen to date, the analysis being queried is the computation of the original loop's trip count. We can fix this particular instance of the issue by simply computing the trip count early, and caching it.

I want to be really clear that this is nothing but a workaround. It does nothing to fix the root issue, and at best, delays the time until we have to fix this for real. Florian and I have discussed an eventual solution in the review comments for https://reviews.llvm.org/D100663, but it's a lot of work.

Test taken from https://reviews.llvm.org/D100663.

Differential Revision: https://reviews.llvm.org/D101487
2021-05-05 09:56:28 -07:00
Florian Hahn ccebf7a109
[VPlan] Properly handle sinking of replicate regions.
This patch updates the code that sinks recipes required for first-order
recurrences to properly handle replicate-regions. At the moment, the
code would just move the replicate recipe out of its replicate-region,
producing an invalid VPlan.

When sinking a recipe in a replicate-region, we have to sink the whole
region. To do that, we first need to split the block at the target
recipe and move the region in between.

This patch also adds a splitAt helper to VPBasicBlock to split a
VPBasicBlock at a given iterator.

Fixes PR50009.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100751
2021-05-04 22:36:01 +01:00
Florian Hahn 4ba8720f88
[VPlan] Representing backedge def-use feeding reduction phis.
This patch updates the code handling reduction recipes to also keep
track of the incoming value from the latch in the recipe. This is needed
to model the def-use chains completely in VPlan, so that it is possible
to replace the incoming value with an arbitrary VPValue.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D99294
2021-05-04 16:33:22 +01:00
Sander de Smalen 9931ae645e Reland "[LV] Calculate max feasible scalable VF."
Relands https://reviews.llvm.org/D98509

This reverts commit 51d648c119.
2021-05-04 15:44:41 +01:00
Alexey Bataev 369cd2ae52 Revert "[SLP]Allow masked gathers only if allowed by target."
This reverts commit fd18547e07. Need to
add a check for the size of the vectorization tree to avoid some extra
vectorization.
2021-05-04 04:53:22 -07:00
Alexey Bataev fd18547e07 [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 08:06:20 -07:00
Alexey Bataev 2e4cc9a725 Revert "[SLP]Allow masked gathers only if allowed by target."
This reverts commit b5f64768cf to fix
a compiler crash revealed by buildbots.
2021-05-03 07:20:00 -07:00
Alexey Bataev b5f64768cf [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 06:45:42 -07:00
Florian Hahn 2b7fa7f744 [LV] Iterate over recipes in VPlan to fix PHI (NFC).
As we gradually move more elements of LV to VPlan, we are trying to
reduce the number of places that still have to check IR of the original
loop.

This patch adjusts the code to fix cross iteration phis to get the PHIs
to fix directly from the VPlan that is executed. We still need the
original PHI to check for first-order recurrences, but we can get rid of
that once we model that explicitly in VPlan as well.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D99293
2021-05-03 14:09:46 +01:00
Florian Hahn 942e068d7a [VPlan] Add VPBasicBlock::phis() helper (NFC).
This patch introduces a helper to obtain an iterator range for the
PHI-like recipes in a block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100101
2021-05-02 19:20:13 +01:00
Justin Bogner 9542721085 Add support for llvm.assume intrinsic to the LoadStoreVectorizer pass
Patch by Viacheslav Nikolaev. Thanks!
2021-04-30 13:39:46 -07:00
Alexey Bataev a3fd82c289 [SLP]Fix the crash on cost calculation if non-compatible vectors shuffled.
If the extracts from non-power-of-2 vectors are recognized as shuffles,
some extra checks are needed so that cost calculation does not crash when
trying to get the cost of subvector extracts. In this case we need to check
carefully that we do not go out of bounds of the original vector, otherwise
the TTI cost model will hit an assert.

Differential Revision: https://reviews.llvm.org/D101477
2021-04-30 09:34:20 -07:00
Alexey Bataev 12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00
Alexey Bataev 6e859f3cd4 Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit 9239932221 to fix
a compiler crash on mask checks.
2021-04-29 12:40:33 -07:00
Alexey Bataev 9239932221 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 09:42:56 -07:00
Sander de Smalen 51d648c119 Revert "[LV] Calculate max feasible scalable VF."
Temporarily reverting this patch due to some unexpected issue found
by one of the PPC buildbots.

This reverts commit 584e9b6e4b.
2021-04-29 16:04:37 +01:00
Florian Hahn a0e1313c23 [VPlan] Add getVPSingleValue helper.
As suggested in D99294, this adds a getVPSingleValue helper to use for
recipes that are guaranteed to define a single value. This replaces uses
of getVPValue() which used to default to I = 0.
2021-04-29 13:37:38 +01:00
Bardia Mahjour ddb3b26a12 [LV] Consider Loop Unroll Hints When Making Interleave Decisions
This patch causes the loop vectorizer to not interleave loops that have
nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)).
Note that if a particular interleave count is being requested
(through llvm.loop.interleave_count), it will still be honoured, regardless
of the presence of nounroll hints.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101374
2021-04-28 17:27:52 -04:00
David Sherwood 00e65f3345 [LoopVectorize][SVE] Fix crash when vectorising FP negation
This patch fixes a crash encountered when vectorising the following loop:

 void foo(float *dst, float *src, long long n) {
   for (long long i = 0; i < n; i++)
     dst[i] = -src[i];
 }

using scalable vectors. I've added a test to

 Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

as well as cleaned up the other tests in the same file.

Differential Revision: https://reviews.llvm.org/D98054
2021-04-28 15:22:35 +01:00