Commit Graph

2676 Commits

Author SHA1 Message Date
Alexey Bataev 1b18e9ab67 [PATCH] D105827: [SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind cost is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Differential Revision: https://reviews.llvm.org/D105827
2021-07-16 12:59:08 -07:00
Kerry McLaughlin 49d73130ca [LV] Avoid scalable vectorization for loops containing alloca
This patch returns an Invalid cost from getInstructionCost() for alloca
instructions if the VF is scalable, as otherwise loops which contain
these instructions will crash when attempting to scalarize the alloca.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105824
2021-07-16 11:47:13 +01:00
Sander de Smalen 239d01fa88 Reland "[LV] Print remark when loop cannot be vectorized due to invalid costs."
The original patch was:
  https://reviews.llvm.org/D105806

There were some issues with undeterministic behaviour of the sorting
function, which led to scalable-call.ll passing and/or failing. This
patch fixes the issue by numbering all instructions in the array first,
and using that number as the order, which should provide a consistent
ordering.

This reverts commit a607f64118.
2021-07-16 10:52:01 +01:00
Sanjay Patel 81ce3aa30c [SLP] avoid leaking poison in reduction of safe boolean logic ops
This bug was introduced with D105730 / 25ee55c0ba .

If we are not converting all of the operations of a reduction
into a vector op, we need to preserve the existing select form
of the remaining ops. Otherwise, we are potentially leaking
poison where it did not in the original code.

Alive2 agrees that the version that freezes some inputs
and then falls back to scalar is correct:
https://alive2.llvm.org/ce/z/erF4K2
2021-07-15 17:33:06 -04:00
Arthur Eubanks 99cb2507f3 Revert "[SLP]Workaround for InsertSubVector cost."
This reverts commit 2eb50baf05.

Causes hangs, see comments on D105827.
2021-07-15 10:19:41 -07:00
Philip Reames 95346ba877 [LV] Enable vectorization of multiple exit loops w/computable exit counts
This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzeable and to dominate the latch.

The majority of work to support this was done in a set of previous patches. In particular,, 72314466 avoids having multiple edges from the middle block to the exits, and 4b33b2387 which added support for non-latch single exit and multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit.

Differential Revision: https://reviews.llvm.org/D105817
2021-07-15 08:53:51 -07:00
Sander de Smalen a607f64118 Revert "[LV] Print remark when loop cannot be vectorized due to invalid costs."
This reverts commit efaf3099c8.
This reverts commit dc7bdc1e71.

Reverting patches due to buildbot failures.
2021-07-15 15:21:57 +01:00
Sander de Smalen dc7bdc1e71 [LV] Fix determinism for failing scalable-call.ll test.
The sort function for emitting an OptRemark was not deterministic,
which caused scalable-call.ll to fail on some buildbots. This patch
fixes that.

This patch also fixes an issue where `Instruction::comesBefore()`
is called when two Instructions are in different basic blocks,
which would otherwise cause an assertion failure.
2021-07-15 13:16:59 +01:00
Alexey Bataev ba2690b17b [SLP][NFC]Fix variables names, NFC. 2021-07-14 12:43:45 -07:00
Simon Pilgrim 4fd0addb68 [SLP] Fix case of variable name. NFCI. 2021-07-14 20:20:04 +01:00
Sander de Smalen efaf3099c8 [LV] Print remark when loop cannot be vectorized due to invalid costs.
This patch emits remarks for instructions that have invalid costs for
a given set of vectorization factors. Some example output:

  t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
      dst[i] = sinf(src[i]);
                    ^
  t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32
      dst[i] = sinf(src[i]);
               ^
  t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
      dst[i] = sinf(src[i]);
             ^

Reviewed By: fhahn, kmclaughlin

Differential Revision: https://reviews.llvm.org/D105806
2021-07-14 17:11:33 +01:00
Alexey Bataev 2eb50baf05 [SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind cost is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Differential Revision: https://reviews.llvm.org/D105827
2021-07-14 07:54:24 -07:00
Sanjay Patel 25ee55c0ba [SLP] match logical and/or as reduction candidates
This has been a work-in-progress for a long time...we finally have all of
the pieces in place to handle vectorization of compare code as shown in:
https://llvm.org/PR41312

To do this (see PhaseOrdering tests), we converted SimplifyCFG and
InstCombine to the poison-safe (select) forms of the logic ops, so now we
need to have SLP recognize those patterns and insert a freeze op to make
a safe reduction:
https://alive2.llvm.org/ce/z/NH54Ah

We get the minimal patterns with this patch, but the PhaseOrdering tests
show that we still need adjustments to get the ideal IR in some or all of
the motivating cases.

Differential Revision: https://reviews.llvm.org/D105730
2021-07-14 09:02:31 -04:00
Sander de Smalen d2e4ccc790 [LV] Ignore candidate VFs with invalid costs.
This follows on from discussion on the mailing-list:
  https://lists.llvm.org/pipermail/llvm-dev/2021-June/151047.html

to interpret an Invalid cost as 'infinitely expensive', as this
simplifies some of the legalization issues with scalable vectors.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105473
2021-07-12 09:58:22 +01:00
Florian Hahn c6e4c1fbd8
[VPlan] Remove default arg from getVPValue (NFC).
The const version of VPValue::getVPValue still had a default value for
the value index. Remove the default value and use getVPSingleValue
instead, which is the proper function.
2021-07-11 22:03:09 +02:00
Sander de Smalen 239fcda268 [LV] NFCI: Do cost comparison on InstructionCost directly.
Instead of performing the isMoreProfitable() operation on
InstructionCost::CostTy the operation is performed on InstructionCost
directly, so that it can handle the case where one of the costs is
Invalid.

This patch also changes the CostTy to be int64_t, so that the type is
wide enough to deal with multiplications with e.g. `unsigned MaxTripCount`.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105113
2021-07-10 11:57:16 +01:00
Valery N Dmitriev 8e9216fe87 [SLP] Do not make an attempt to match reduction on already erased instruction.
Differential Revision: https://reviews.llvm.org/D105752
2021-07-09 17:13:15 -07:00
Sanjay Patel c2b7f09d8c [SLP] make invalid operand explicit for extra arg in reduction matching; NFC
This makes it clearer when we have encountered the extra arg.
Also, we may need to adjust the way the operand iteration
works when handling logical and/or.
2021-07-09 15:32:12 -04:00
Sanjay Patel 486992f958 [SLP] improve code comments; NFC
This likely started out only supporint binops,
but now we handle min/max using cmp+sel, and
we may extend to handle bool logic in the form
of select.
2021-07-09 12:49:54 -04:00
Sanjay Patel 544f2711bb [SLP] make checks for cmp+select min/max more explicit
This is NFC-intended currently (so no test diffs). The motivation
is to eventually allow matching for poison-safe logical-and and
logical-or (these are in the form of a select-of-bools).
( https://llvm.org/PR41312 )

Those patterns will not have all of the same constraints as min/max
in the form of cmp+sel. We may also end up removing the cmp+sel
min/max matching entirely (if we canonicalize to intrinsics), so
this will make that step easier.
2021-07-09 12:43:43 -04:00
David Green 38c9a4068d [TTI] Remove IsPairwiseForm from getArithmeticReductionCost
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since the adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between paiwise and
non-pairwise reductions.

This also removes some code that was detecting reductions trees starting
from extract elements inside the costmodel. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.

Differential Revision: https://reviews.llvm.org/D105484
2021-07-09 11:51:16 +01:00
Alexey Bataev c574d2fbac [SLP]Improve vectorization of stores.
Patch tries to improve the vectorization of stores. Originally, we just
check the type and the base pointer of the store.
Patch adds some extra checks to avoid non-profitable vectorization
cases. It includes analysis of the scalar values to be stored and
triggers the vectorization attempt only if the scalar values have
same/alt opcode and are from same basic block, i.e. we don't end up
immediately with the gather node, which is not profitable.
This also improves compile time by filtering out non-profitable cases.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D104122
2021-07-08 12:35:39 -07:00
Alexey Bataev 0d74fd3fdf [SLP][COST][X86]Improve cost model for masked gather.
Revived D101297 in its original form + added some changes in X86
legalization cehcking for masked gathers.

This solution is the most stable and the most correct one. We have to
check the legality before trying to build the masked gather in SLP.
Without this check we have incorrect cost (for SLP) in case if the masked gather
is not legal/slower than the gather. And we're missing some
vectorization opportunities.

This can be fixed in the cost model, but in this case we need to add
special checks for the cost of GEPs for ScatterVectorize node, add
special check for small trees, etc., i.e. there are a lot of corner
cases here and there, which insrease code base and make it harder to
maintain the code.

> Can't we rely on cost model to deal with this? This can be profitable for futher vectorization, when we can start from such gather loads as seed.

The question from D101297. Actually, no, it can't. Actually, simple
gather may give us better result, especially after we started
vectorization of insertelements. Plus, like I said before, the cost for
non-legal masked gathers leads to missed vectorization opportunities.

Differential Revision: https://reviews.llvm.org/D105042
2021-07-08 11:53:30 -07:00
Sanjay Patel 97c473ad39 [SLP] rename variable to not be misleading; NFC
The reduction matching was probably only dealing with binops
when it was written, but we have now generalized it to handle
select and intrinsics too, so assert on that too.
2021-07-07 14:40:21 -04:00
Philip Reames 723144665b [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 4)
Resubmit after the following changes:

* Fix a latent bug related to unrolling with required epilogue (see e49d65f). I believe this is the cause of the prior PPC buildbot failure.
* Disable non-latch exits for epilogue vectorization to be safe (9ffa90d)
* Split out assert movement (600624a) to reduce churn if this gets reverted again.

Previous commit message (try 3)

Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

Previous commit message...

This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-07-07 07:44:35 -07:00
Dylan Fleming 7215dcfe36 [SVE] Fix ShuffleVector cast<FixedVectorType> in truncateToMinimalBitwidths
Depends on D104239

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105341
2021-07-07 15:30:10 +01:00
Dylan Fleming 7586b47fb6 [SVE] Fix cast<FixedVectorType> in truncateToMinimalBitwidths
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D104239
2021-07-07 09:58:05 +01:00
Philip Reames 9ffa90d6c2 [LV] Disable epilogue vectorization for non-latch exits
When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed.  Better late (4 months), than never right?

I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not modivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error.  Thus, let's play it safe for now.
2021-07-06 10:57:10 -07:00
Alexey Bataev 4e1a0684f1 [SLP]Fix non-determinism in PHI sorting.
Compare type IDs and DFS numbering for basic block instead of addresses
to fix non-determinism.

Differential Revision: https://reviews.llvm.org/D105031
2021-07-06 08:45:45 -07:00
Florian Hahn ef0d147cdc
Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups.
This reverts commit 706bbfb35b.

The committed version moves the definition of VPReductionPHIRecipe out
of an ifdef only intended for ::print helpers. This should resolve the
build failures that caused the revert
2021-07-06 14:15:42 +01:00
Kerry McLaughlin a7512401e5 [LV] Prevent vectorization with unsupported element types.
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:

  int foo(__int128_t* ptr, int N)
    #pragma clang loop vectorize_width(4, scalable)
    for (int i=0; i<N; ++i)
      ptr[i] = ptr[i] + 42;

This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102253
2021-07-06 13:06:21 +01:00
Florian Hahn 706bbfb35b
Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups
This reverts commit 3fed6d443f,
bbcbf21ae6 and
6c3451cd76.

The changes causing build failures with certain configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio

    lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction*, llvm::ArrayRef<llvm::VPValue*>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]':
    LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe'
    collect2: error: ld returned 1 exit status
2021-07-06 12:10:03 +01:00
Florian Hahn 3fed6d443f
[VPlan] Mark overriden function in VPWidenPHIRecipe as virtual.
VPReductionRecipe overrides those implementations. Mark them as virtual
in the VPWidenPHIRecipe to unbreak build in certain configurations.
2021-07-06 12:00:41 +01:00
Florian Hahn bbcbf21ae6
[VPlan] Add destructor to VPReductionRecipe to unbreak build.
Attempt to unbreak
https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio
2021-07-06 11:41:20 +01:00
Florian Hahn 6c3451cd76
[VPlan] Add VPReductionPHIRecipe (NFC).
This patch is a first step towards splitting up VPWidenPHIRecipe into
separate recipes for the 3 distinct cases they model:

    1. reduction phis,
    2. first-order recurrence phis,
    3. pointer induction phis.

This allows untangling the code generation and allows us to reduce the
reliance on LoopVectorizationCostModel during VPlan code generation.

Discussed/suggested in D100102, D100113, D104197.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104989
2021-07-06 11:25:28 +01:00
Kerry McLaughlin 17b701c43c [LV] Collect a list of all element types found in the loop (NFC)
Splits `getSmallestAndWidestTypes` into two functions, one of which now collects
a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not
have to iterate over all instructions in the loop again in other places, such as in D102253
which disables scalable vectorization of a loop if any of the instructions use invalid types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105437
2021-07-06 10:37:41 +01:00
Caroline Concatto b868a2d2c6 [SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector.
The function vectorizeChainsInBlock does not support scalable vector,
because function like canReuseExtract and isCommutative in the code
path assert with scalable vectors.

This patch avoids vectorizing blocks that have extract instructions with scalable
vector..

Differential Revision: https://reviews.llvm.org/D104809
2021-07-05 12:43:41 +01:00
Nikita Popov a213f735d8 [IR] Deprecate GetElementPtrInst::CreateInBounds without element type
This API is not compatible with opaque pointers, the method
accepting an explicit pointer element type should be used instead.

Thankfully there were few in-tree users. The BPF case still ends
up using the pointer element type for now and needs something like
D105407 to avoid doing so.
2021-07-04 16:49:30 +02:00
Paul Walker 287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Nikita Popov fabc17192e [IRBuilder] Add type argument to CreateMaskedLoad/Gather
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.

Differential Revision: https://reviews.llvm.org/D105395
2021-07-04 12:17:59 +02:00
Alexey Bataev 7f7e4aed21 [SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by
V.Dmitriev.

Reduces number of arguments
2021-07-02 10:30:13 -07:00
Alexey Bataev 28ac873bcb [SLP]Fix gathering of the scalars by not ignoring UndefValues.
The compiler should not ignore UndefValue when gathering the scalars,
otherwise the resulting code may be less defined than the original one.
Also, grouped scalars to insert them at first to reduce the analysis in
further passes.

Differential Revision: https://reviews.llvm.org/D105275
2021-07-02 04:46:48 -07:00
David Sherwood 51b4ab26ca [NFC] Add new setDebugLocFromInst that uses the class Builder by default
In lots of places we were calling setDebugLocFromInst and passing
in the same Builder member variable found in InnerLoopVectorizer.
I personally found this confusing so I've changed the interface
to take an Optional<IRBuilder<> *> and we can now pass in None
when we want to use the class member variable.

Differential Revision: https://reviews.llvm.org/D105100
2021-07-01 14:23:34 +01:00
David Sherwood 7b7b5b5a26 [NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating a IRBuilder stack variable with the same name as the
class member.
2021-06-30 11:11:49 +01:00
Philip Reames e49d65f36d [LV] Fix bug when unrolling (only) a loop with non-latch exit
If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires a epilogue be generated for correctness, the code generation must actually do so.

The included test case on an unmodified opt will access memory one past the expected bound.  As a result, this patch is fixing a latent miscompile.

Differential Revision: https://reviews.llvm.org/D103700
2021-06-29 08:04:26 -07:00
David Sherwood 9de63367d8 Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable"
This reverts commit 9dde514162.
2021-06-29 15:20:22 +01:00
David Sherwood 9dde514162 [NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating a IRBuilder stack variable with the same name as the
class member.
2021-06-29 14:34:30 +01:00
David Sherwood 8a3365fba2 Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable"
This reverts commit dcfc2c3fac.
2021-06-29 14:04:42 +01:00
Florian Hahn 47215e1c62
[LV] Fix crash when target instruction for sinking is dead.
This patch fixes a crash when the target instruction for sinking is
dead. In that case, no recipe is created and trying to get the recipe
for it results in a crash. To ensure all sink targets are alive, find &
use the first previous alive instruction.

Note that the case where the sink source is dead is already handled.

Found by
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104603
2021-06-29 13:31:22 +01:00
David Sherwood 303b6d5e98 [LoopVectorize] Add support for scalable vectorization of invariant stores
Previously in setCostBasedWideningDecision if we encountered an
invariant store we just assumed that we could scalarize the store
and called getUniformMemOpCost to get the associated cost.
However, for scalable vectors this is not an option because it is
not currently possibly to scalarize the store. At the moment we
crash in VPReplicateRecipe::execute when trying to scalarize the
store.

Therefore, I have changed setCostBasedWideningDecision so that if
we are storing a scalable vector out to a uniform address and the
target supports scatter instructions, then we should use those
instead.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-inv-store.ll

Differential Revision: https://reviews.llvm.org/D104624
2021-06-29 11:56:09 +01:00
David Sherwood dcfc2c3fac [NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating a IRBuilder stack variable with the same name as the
class member.
2021-06-29 09:14:35 +01:00
Kerry McLaughlin f99672568f [LoopVectorize] Fix strict reductions where VF = 1
Currently we will allow loops with a fixed width VF of 1 to vectorize
if the -enable-strict-reductions flag is set. However, the loop vectorizer
will not use ordered reductions if `VF.isScalar()` and the resulting
vectorized loop will be out of order.

This patch removes `VF.isVector()` when checking if ordered reductions
should be used. Also, instead of converting the FAdds to reductions if the
VF = 1, operands of the FAdds are changed such that the order is preserved.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D104533
2021-06-28 11:27:10 +01:00
Florian Hahn 80aa7e147e
[VPlan] Merge predicated-triangle regions, after sinking.
Sinking scalar operands into predicated-triangle regions may allow
merging regions. This patch adds a VPlan-to-VPlan transform that tries
to merge predicate-triangle regions after sinking.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100260
2021-06-28 11:10:38 +01:00
Nikita Popov a9129f8964 [LoadStoreVectorizer] Support opaque pointers
There are remaining redundant bitcasts.
2021-06-27 15:42:16 +02:00
Florian Hahn f1a6430272
[VPlan] Track both incoming values for first-order recurrence phis.
This patch updates VPWidenPHI recipes for first-order recurrences to
also track the incoming value from the back-edge. Similar to D99294,
which did the same for reductions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104197
2021-06-27 14:29:35 +01:00
Florian Hahn 7f36981977
[LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC).
Suggested in D104197, avoids the early exit.
2021-06-26 13:13:25 +01:00
Florian Hahn cc5ee857f9
[LV] Doxygenize VectorizationFactor member comments (NFC).
Minor cleanup for follow-up patch.
2021-06-25 18:35:00 +01:00
Florian Hahn 91053e327c
[LV] Reflow comment for VectorizationCostTy (NFC). 2021-06-25 14:20:06 +01:00
Florian Hahn 833bdbe93c
[LV] Support sinking recipe in replicate region after another region.
This patch handles sinking a replicate region after another replicate
region. In that case, we can connect the sink region after the target
region. This properly handles the case for which an assertion has been
added in 337d765282.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D103514
2021-06-24 13:58:42 +01:00
Nikita Popov 00d3f7cc3c [LAA] Make getPointersDiff() API compatible with opaque pointers
Make getPointersDiff() and sortPtrAccesses() compatible with opaque
pointers by explicitly passing in the element type instead of
determining it from the pointer element type.

The SLPVectorizer result is slightly non-optimal in that unnecessary
pointer bitcasts are added.

Differential Revision: https://reviews.llvm.org/D104784
2021-06-23 18:44:34 +02:00
Alexey Bataev 908b753661 [SLP]Improve vectorization of PHI instructions.
Perform better analysis when trying to vectorize PHIs.
1. Do not try to vectorize vector PHIs.
2. Do deeper analysis for more profitable nodes for the vectorization.

Before we just tried to vectorize the PHIs of the same type. Patch
improves this and tries to vectorize PHIs with incoming values which
come from the same basic block, have the same and/or alternative
opcodes.

It allows to save the compile time and provides better vectorization
results in general.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103638
2021-06-21 12:26:24 -07:00
Roman Lebedev 37dfc467ac
[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg
This really isn't talking about vectors in general,
but only about either fixed or scalable vectors,
and it's pretty confusing to see it state
that there aren't any vectors :)
2021-06-17 21:07:34 +03:00
Florian Hahn 80a403348b
[VPlan] Support PHIs as LastInst when inserting scalars in ::get().
At the moment, we create insertelement instructions directly after
LastInst when inserting scalar values in a vector in
VPTransformState::get.

This results in invalid IR when LastInst is a phi, followed by another
phi. In that case, the new instructions should be inserted just after
the last PHI node in the block.

At the moment, I don't think the problematic case can be triggered, but
it can happen once predicate regions are merged and multiple
VPredInstPHI recipes are in the same block (D100260).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104188
2021-06-17 09:36:44 +01:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Evgeniy Brevnov 96cded5b79 [SLP] Incorrect handling of external scalar values
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103954
2021-06-16 13:27:36 +07:00
Florian Hahn 96ca03493a
[VectorCombine] Limit scalarization to non-poison indices for now.
As Eli mentioned post-commit in D103378, the result of the freeze may
still be out-of-range according to Alive2. So for now, just limit the
transform to indices that are non-poison.
2021-06-14 16:40:14 +01:00
Simon Pilgrim b013c58e82 VPlanSLP.cpp - tidy implicit header dependencies. NFCI.
We don't use std::string and std::vector, but we do use std::pair and std::max.
2021-06-13 12:37:17 +01:00
Valery N Dmitriev 94a07c79cf [SLP][NFC] Fix condition that was supposed to save a bit of compile time.
It was found by chance revealing discrepancy between comment (few lines above),
the condition and how re-ordering of instruction is done inside the if statement
it guards. The condition was always evaluated to true.

Differential Revision: https://reviews.llvm.org/D104064
2021-06-11 10:08:55 -07:00
Alexey Bataev a010d4230e [SLP]Allow reordering of insertelements.
After we added support for non-ordered insertelements, we can allow
their reordering.

Differential Revision: https://reviews.llvm.org/D104057
2021-06-11 08:47:41 -07:00
Alexey Bataev 74af4bb1f4 [SLP]Remove unnecessary UndefValue in CreateShuffle.
No need to use UndefValue in CreateShuffle call.

Differential Revision: https://reviews.llvm.org/D104113
2021-06-11 08:08:30 -07:00
Roman Lebedev 20542b47d6
[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper
This results in slightly more optimistic alignments in some cases
2021-06-11 12:47:10 +03:00
Roman Lebedev abc0e0125c
[NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function 2021-06-11 12:47:09 +03:00
Simon Pilgrim 5e6bfb661e [Analysis] Pass RecurrenceDescriptor as const reference. NFCI.
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).

Differential Revision: https://reviews.llvm.org/D104029
2021-06-11 10:24:14 +01:00
Qiu Chaofan 2670c7dd5b [VectorCombine] Fix alignment in single element store
This fixes the concern in single element store scalarization that the
alignment of new store may be larger than it should be. It calculates
the largest alignment if index is constant, and a safe one if not.

Reviewed By: lebedev.ri, spatel

Differential Revision: https://reviews.llvm.org/D103419
2021-06-11 10:28:15 +08:00
Slava Nikolaev 119965865c LoadStoreVectorizer: support different operand orders in the add sequence match
First we refactor the code which does no wrapping add sequences
match: we need to allow different operand orders for
the key add instructions involved in the match.

Then we use the refactored code trying 4 variants of matching operands.

Originally the code relied on the fact that the matching operands
of the two last add instructions of memory index calculations
had the same LHS argument. But which operand is the same
in the two instructions is actually not essential, so now we allow
that to be any of LHS or RHS of each of the two instructions.
This increases the chances of vectorization to happen.

Reviewed By: volkan

Differential Revision: https://reviews.llvm.org/D103912
2021-06-10 16:31:35 -07:00
Joachim Meyer 4f01122c3f [LV] Parallel annotated loop does not imply all loads can be hoisted.
As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.

The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.

Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103907
2021-06-10 23:37:57 +02:00
Alexey Bataev a893b44187 [SLP]Disable scheduling of insertelements.
There is no need to schedule insertelement instructions. The compiler
did not schedule them before it started support their vectorization and
it should not do it after. We pre-schedule them manually when finding
a build vector sequence.
Disabling scheduling of insertelement instructions improves compile
time and vectorization of the very large basic blocks by saving
scheduling budget for other instructions.

Differential Revision: https://reviews.llvm.org/D104026
2021-06-10 10:25:26 -07:00
Keith Smiley 026170d17d Fix range-loop-analysis warning
```
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis]
  for (const auto VF : VFCandidates) {
                  ^
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying
  for (const auto VF : VFCandidates) {
       ^~~~~~~~~~~~~~~
                  &
1 warning generated.
```

Differential Revision: https://reviews.llvm.org/D103970
2021-06-10 08:39:54 -07:00
Alexey Bataev a0086add2e [SLP]Improve gathering of scalar elements.
1. Better sorting of scalars to be gathered. Trying to insert
   constants/arguments/instructions-out-of-loop at first and only then
   the instructions which are inside the loop. It improves hoisting of
   invariant insertelements instructions.
2. Better detection of shuffle candidates in gathering function.
3. The cost of insertelement for constants is 0.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103458
2021-06-09 05:23:21 -07:00
Kerry McLaughlin 14eeccfe9a [LoopVectorize] Don't use strict reductions when reordering is allowed
If the `-enable-strict-reductions` flag is set to true, then currently we will
always choose to vectorize the loop with strict in-order reductions. This is
not necessary where we allow the reordering of FP operations, such as
when loop hints are passed via metadata.

This patch moves useOrderedReductions so that we can also check whether
loop hints allow reordering, in which case we should use the default
behaviour of vectorizing with unordered reductions.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D103814
2021-06-08 10:39:29 +01:00
Florian Hahn 1465e7770b
[VPlan] Print successors of VPRegionBlocks.
The non-DOT printing does not include the successors of VPregionBlocks.
This patch use the same style for printing successors as for
VPBasicBlock.

I think the printing of successors could be a bit improved further, as
at the moment it is hard to ensure a check line matches all successors.
But that can be done as follow-up.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D103515
2021-06-07 17:57:21 +01:00
Florian Hahn 23c2f2e6b2
[LV] Mark increment of main vector loop induction variable as NUW.
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.

If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.

This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.

At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.

Note that this could probably be further improved by using information
from the original IV.

Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g

Part of a set of fixes required for PR50412.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D103255
2021-06-07 10:47:52 +01:00
Alexey Bataev 8c48d77cdf [SLP]Improve cost estimation/emission of externally used extractelements.
No need to recalculate the cost of extractelements, just no need to
compensate the cost of all extractelements, need to check before if this
is actually going to be removed at the vectorization. Also, no need to
 generate new extractelement instruction, we may just regenerate the
 original one. It may improve the final vectorization.

Differential Revision: https://reviews.llvm.org/D102933
2021-06-03 10:26:59 -07:00
Alexey Bataev 89f3bc7698 [SLP]Allow to reorder nodes with >2 scalar values.
tryToVectorizeList function allows to reorder only 2 scalars. Patch
allows to reorder >2 scalars. Also, to avoid possible regressions, it
allows extra vectorization of the remaining parts of the scalars
elements if possible.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103247
2021-06-03 10:01:36 -07:00
Harald van Dijk 5d2b3de284
[SLP] Avoid std::stable_sort(properlyDominates()).
As noticed by NAKAMURA Takumi back in 2017, we cannot use
properlyDominates for std::stable_sort as properlyDominates only
partially orders blocks. That is, for blocks A, B, C, D, where A
dominates B and C dominates D, we have A == C, B == C, but A < B. This
is not a valid comparison function for std::stable_sort and causes
different results between libstdc++ and libc++. This change uses DFS
numbering to give deterministic results for all reachable blocks.
Unreachable blocks are ignored already, so do not need special
consideration.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103441
2021-06-03 17:51:52 +01:00
Sander de Smalen d41cb6bb26 [LV] Build and cost VPlans for scalable VFs.
This patch uses the calculated maximum scalable VFs to build VPlans,
cost them and select a suitable scalable VF.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98722
2021-06-02 14:47:47 +01:00
Sander de Smalen 034503e9d2 [LV] NFC: Remove redundant isLegalMasked(Gather|Scatter) functions.
This NFC change follows from conversation in D102437, where it was discussed
to remove these functions as a separate patch.
2021-06-02 14:09:07 +01:00
Sander de Smalen 3472d3fd9d [LV] NFC: Replace custom getMemInstValueType by llvm::getLoadStoreType.
llvm::getLoadStoreType was added recently and has the same implementation
as 'getMemInstValueType' in LoopVectorize.cpp. Since there is no
value in having two implementations, this patch removes the custom LV
implementation in favor of the generic one defined in Instructions.h.
2021-06-02 14:09:06 +01:00
Harald van Dijk f126e8ec28
[SLPVectorizer] Ignore unreachable blocks
As the existing test unreachable.ll shows, we should be doing more
work to avoid entering unreachable blocks: we should not stop
vectorization just because a PHI incoming value from an unreachable
block cannot be vectorized. We know that particular value will never
be used so we can just replace it with poison.
2021-06-01 20:21:04 +01:00
Alexey Bataev 36911971a5 [SLP]Better detection of perfect/shuffles matches for gather nodes.
Implemented better scheme for perfect/shuffled matches of the gather
nodes which allows to fix the performance regressions introduced by
earlier patches. Starting detecting matches for broadcast nodes and
extractelement gathering.

Differential Revision: https://reviews.llvm.org/D102920
2021-06-01 07:08:07 -07:00
Florian Hahn d4c070d801
[VectorCombine] Freeze index unless it is known to be non-poison.
If the index itself is already poison, the poison propagates through
instructions clamping the index to a valid range. This still causes
introducing a load of poison, as flagged by Alive2 and pointed out
at 575e2aff55.

This patch updates the code to freeze the index, unless it is proven to
not be poison.

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D103378
2021-06-01 10:40:57 +01:00
Florian Hahn aa00b1d763
[LV] Try to sink users recursively for first-order recurrences.
Update isFirstOrderRecurrence to  explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.

Fixes PR44286 (and another PR or two).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84951
2021-05-31 19:55:33 +01:00
Bardia Mahjour 06eaffa858 [NFC] Remove confusing info about MainLoop VF/UF from debug message 2021-05-28 16:10:04 -04:00
Florian Hahn 007f268c35
[VectorCombine] Check indices for all extracts we scalarize.
We need to make sure that the indices of all extracts we scalarize are
valid.
2021-05-28 18:35:29 +01:00
Florian Hahn 38641ddf3e
[VPlan] Do not sink uniform recipes in sinkScalarOperands.
For uniform ReplicateRecipes, only the first lane should be used, so
sinking them would mean we have to compute the value of the first lane
multiple times. Also, at the moment, sinking them causes a crash because
the value of the first lane is re-used by all users.

Reported post-commit for D100258.
2021-05-27 14:07:48 +01:00
Alexey Bataev 27d3528acf [SLP]Fix vectorization of insertelements with multiple uses.
SLP vectorizer should not consider in sertelements with multiple uses as
a part of high level build vector, it must be considered as
a terminating insertelement in the vector build, otherwise it may
produce incorrect code.

Differential Revision: https://reviews.llvm.org/D103164
2021-05-26 09:42:18 -07:00
Kerry McLaughlin 9f76a85260 [LoopVectorize] Enable strict reductions when allowReordering() returns false
When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:

  bool allowReordering() const {
    // When enabling loop hints are provided we allow the vectorizer to change
    // the order of operations that is given by the scalar loop. This is not
    // enabled by default because can be unsafe or inefficient.

The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.

This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101836
2021-05-26 13:59:12 +01:00
Florian Hahn 8e83ff58c9
[VectorCombine] Remove unneeded InsertPointGuard (NFCI).
All users of the builder should set an insert point before using the
builder. There should be no need for using InsertPointGuard here.
2021-05-25 17:01:05 +01:00
Florian Hahn 575e2aff55
[VectorCombine] Use constant range info for index scalarization legality.
We can only scalarize memory accesses if we know the index is valid.

This patch adjusts canScalarizeAcceess to fall back to
computeConstantRange to check if the index is known to be valid.

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D102476
2021-05-25 13:58:42 +01:00
Anton Afanasyev b2cd895011 [SLP] Fix "gathering" of insertelement instructions
For rare exceptional case vector tree node (insertelements for now only)
is marked as `NeedToGather`, this case is processed by patch. Follow-up
of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135.

Differential Revision: https://reviews.llvm.org/D102675
2021-05-25 01:35:43 +03:00