Commit Graph

1624 Commits

Author SHA1 Message Date
David Green b65267ca7b [LV] Invalidate widening decisions after maximizing vector bandwidth
When MaximizeVectorBandwidth is enabled, we can end up (via calls to
collectUniformsAndScalars/setCostBasedWideningDecision through
calculateRegisterUsage) making widening decisions before we have decided
whether to fold the tail by masking. These decisions will be wrong if we
later decided to fold the tail, for example when the trip count is very
low. It will use incorrect costs for loads that should get masked, using
standard memory operation costs instead.

This still at the moment uses the EmulatedMaskMemRefHack costs (a bit
unfortunately), but the old costs without this change were 1, leading to
too optimistic vectorization.

This slightly changes the way that the MaximizeVectorBandwidth option
works to make it easier to test, always honouring the option if it is
set.

Differential Revision: https://reviews.llvm.org/D120215
2022-03-31 09:19:31 +01:00
Florian Hahn ecb4171dcb
[LV] Handle zero cost loops in selectInterleaveCount.
In some case, like in the added test case, we can reach
selectInterleaveCount with loops that actually have a cost of 0.

Unfortunately a loop cost of 0 is also used to communicate that the cost
has not been computed yet. To resolve the crash, bail out if the cost
remains zero after computing it.

This seems like the best option, as there are multiple code paths that
return a cost of 0 to force a computation in selectInterleaveCount.
Computing the cost at multiple places up front there would unnecessarily
complicate the logic.

Fixes #54413.
2022-03-29 22:52:43 +01:00
Florian Hahn 46432a0088
[VPlan] Add VPWidenPointerInductionRecipe.
This patch moves pointer induction handling from VPWidenPHIRecipe to its
own recipe. In the process, it adds all information required to generate
code for pointer inductions without relying on Legal to access the list
of induction phis.

Alternatively VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121615
2022-03-24 14:58:45 +00:00
Florian Hahn 890fc21742
[LV] Extend checks in debugloc.ll. 2022-03-23 20:21:58 +00:00
Florian Hahn 973183612e
[VPlan] Add test for VPExpandSCEVRecipe printing. 2022-03-20 10:11:40 +00:00
Florian Hahn d5fbcf76fd
[VPlan] Improve pattern in vplan-printing.ll check line.
The existing pattern only matched a single value, which breaks if the
numbering slightly changes.
2022-03-19 16:03:25 +00:00
Andrew Wei 0af3e6a22d [InstCombine] Sink instructions with multiple users in a successor block.
This patch tries to sink instructions when they are only used in a successor block.

This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.

In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
  https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D121585
2022-03-18 11:53:45 +08:00
Florian Hahn 151c144350
[LV] Use usesScalars in widenPHIInstruction.
This uses the existing VPlan helpers to check whether there are scalar
uses of a phi recipe. It remove one of the few remaining dependencies on
the cost model from VPlan code generation.

Depends on D121612.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121613
2022-03-17 13:16:32 +00:00
Malhar Jajoo a36d269658 [VPlan] Avoid collecting scalars for SVE
This patch ensures scalars (except for uniforms) are no
longer collected (prior to LVP planning phase) for
scalable vectorization.

This is to avoid the chances of generating scalarized
instructions later (during LVP execute phase) as they
are not supported for scalable vectorization.

Relevant test has also been added.

Differential Revision: https://reviews.llvm.org/D121452
2022-03-16 16:33:34 +00:00
Florian Hahn 5c4d64eb0d
[LV] Make reduction-order.ll test independent of instruction naming.
Also update test to not use branch on undef.
2022-03-15 11:13:18 +00:00
Florian Hahn 4a0481e981
[LV] Check for users of truncated IVs, add more detailed comment.
Add missing outside user check for truncated IVs. Also hoist the code in
the helper with additional explanations.

Fixes #54370.
2022-03-14 19:39:30 +00:00
Florian Hahn 1c0fc1f074
[VPlan] Ensure each iv user is only visited once in transform.
If a recipe has multiple uses of an IV, we crash. It causes a crash when
building llvm-test-suite.

Exposed by 95f76bff1c.
2022-03-13 21:42:17 +00:00
Florian Hahn 95f76bff1c
[LV] Create & use VPScalarIVSteps for all scalar users.
This patch is a follow-up to D115953. It updates optimizeInductions
to also introduce new VPScalarIVStepsRecipes if an IV has both vector
and scalar uses.

It updates all uses that only need scalar values to use the newly
created recipe for the scalar steps.

This completes untangling of VPWidenIntOrFpInductionRecipe
code-generation. Now the recipe *only* creates the widened vector
values, as it says on the tin.

The code to genereate IR has been moved directly to
VPWidenIntOrFpInductionRecipe::execute.

Note that the recipe has been updated to hold a reference to
ScalarEvolution, which is needed to expand the step, until we can place
the corresponding SCEV expansion in the pre-header.

Depends on D120827.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D120828
2022-03-13 17:15:24 +00:00
Sanjay Patel b48fe158e0 [Analysis] remove bogus smin/smax pattern detection
This is a revert of cfcc42bdc. The analysis is wrong as shown by
the minimal tests for instcombine:
https://alive2.llvm.org/ce/z/y9Dp8A

There may be a way to salvage some of the other tests,
but that can be done as follow-ups. This avoids a miscompile
and fixes #54311.
2022-03-09 17:50:34 -05:00
Florian Hahn a12403cfea
[LV] Do not consider instrs dead if used by phi that's not in plan.
Single value phis won't be modeled in VPlan. If the phi only gets used
outside the loop, the current code misses the fact that the incoming
value is not dead. Update the code to also look through such phis to
check for outside users.

Fixes #54266
2022-03-09 16:04:44 +00:00
Florian Hahn a2979c8399
[IVDescriptors] Bail out instead of asserting that order is expected.
When dealing with multiple phis that depend on each other, the order
might have been changed and may not match the expectation. If that
happens, bail out, rather than asserting.

Fixes https://github.com/llvm/llvm-project/issues/54218
Fixes https://github.com/llvm/llvm-project/issues/54233
Fixes https://github.com/llvm/llvm-project/issues/54254
2022-03-07 19:57:26 +00:00
Florian Hahn f4368487aa
[LV] Add test from PR54227.
Test from https://github.com/llvm/llvm-project/issues/54227.

The underlying issue has already been fixed in de8ac48 with a separate
test.
2022-03-07 17:01:22 +00:00
Roman Lebedev 2f80ea7f4f
[NFC][LV] Use different braces in debug output
The analysis passes output function name encapsulated in `'` braces,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV costmodel test checks.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D121105
2022-03-07 19:32:37 +03:00
Florian Hahn de8ac485e5
[IVDescriptor] Remove SinkCandidate from SinkAfter before re-sinking.
This ensures the right order in the sink-after map is maintained. If we
re-sink an instruction, it must be sunk after all earlier instructions
have been sunk.

Fixes https://github.com/llvm/llvm-project/issues/54223
2022-03-05 19:48:26 +00:00
Florian Hahn 5a60260efe
[IVDescriptor] Use DT to check order of Previous, OtherPrev.
Previous and OhterPrev may not be in the same block. Use DT::dominates
instead of local comesBefore. DT::dominates is already used earlier to
check the order of Previous and SinkCandidate.

Fixes https://github.com/llvm/llvm-project/issues/54195
2022-03-04 11:07:42 +00:00
Florian Hahn 139215af8e
[IVDescriptor] Find original 'Previous' for first-order recurrences.
This patch extends first-order recurrence handling to support cases
where we already sunk an instruction for a different recurrence, but
LastPrev comes before Previous.

To handle those cases correctly, we need to find the earliest entry for
the sink-after chain, because this is references the Previous from the
original recurrence. This is needed to ensure we use the correct
instruction as sink point.

Depends on D118558.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D118642
2022-03-03 16:41:26 +00:00
Florian Hahn 8777cb66a8
[VPlan] Remove reliance on underlying instr for ScalarIVSteps (NFCI).
Instead of relying on underlying instructions, this patch updates
VPScalarIVStepsRecipe to only store the required type information.

This removes access to unrelated information, as well as avoiding issues
with the same underlying instruction being shared by multiple recipes.

This change should only change the debug output and not cause any
codegen changes, hence NFCI.
2022-03-02 16:23:19 +00:00
Florian Hahn 6dc456a375
[LV] Remove redundant check line from recurrence test.
The removed line matches the previous line, modulo the check prefix.
There is no way to disable sinking instructions as required due to
first-order recurrence and removing the line should be safe.
2022-03-02 13:48:46 +00:00
Florian Hahn 83fd2071f0
[LV] Modernize test matching hardcoded induction phi name. 2022-03-02 10:12:38 +00:00
Florian Hahn 470b5c7f0d
[LV] Add test with multiple use of a FOR chained together.
Additional test coverage for D118642.
2022-03-01 14:18:23 +00:00
Nikita Popov 26748bb15a [InstCombine] Slightly relax one-use check in abs canonicalization
Treat the icmp and sub symmetrically, and require that one of them
has one use, not the icmp in particular. This could be further
relaxed in the abs (but not nabs) case to not check one-use at
all.
2022-03-01 15:06:41 +01:00
Nikita Popov 7c080e4649 [LoopVectorize] Regenerate test checks (NFC) 2022-03-01 15:01:14 +01:00
Andrei Elovikov 6e9a8cdcfb [NFC][LoopVectorizer] Simplify LoopVectorize/X86/gather_scatter.ll
The test used to run whole O3 pipeline. Modify it to contain LLVM IR right
before LV and limit passes to "-loop-vectorizer -simplifycfg".

For the RUN line with forced VF force interleave factor as well to simplify
CHECKs as interleaving isn't related to the purpose of the test.

I also tried to add "noalias" to pointer arguments in
@test_gather_not_profitable_pr48429 but LAI seems unable to use them.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119786
2022-02-28 11:12:50 -08:00
Florian Hahn b3e8ace198
Recommit "[VPlan] Introduce recipe to build scalar steps."
This reverts the revert commit ff93260bf6.

The underlying issue causing the PPC bot failures has been fixed in
cbaac14734 and a corresponding test case has been added in
ad2cad1c52.

Original message:

    This patch adds a new VPScalarIVStepsRecipe to handle building scalar
    steps.

    In the first patch, it only handles the case where there is no vector
    induction variable needed.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D115953
2022-02-28 14:12:20 +00:00
Florian Hahn cbaac14734
[LV] Remove induction recipes only used outside vector loop.
Exit values of vector inductions are generated completely independent of
the induction recipes. Consider them for removal, if they are not used
in loop.

This fixes a crash exposed by 49b23f451c.
2022-02-28 11:14:22 +00:00
Florian Hahn 8bbc5e172a
[LV] Add test with dead induction in vector loop used outside.
Add test with a induction phi that is not used in the vector loop, but
by an lcssa phi in the loop exit.
2022-02-28 10:39:08 +00:00
Florian Hahn ad2cad1c52
[LV] Add test with IV that needs scalar steps and user outside of loop.
Also add a run line to check interleaving only. This test covers the PPC
buildbot failures caused by 49b23f451c.
2022-02-28 09:46:18 +00:00
Florian Hahn ff93260bf6
Revert "[VPlan] Introduce recipe to build scalar steps."
This reverts commit 49b23f451c.

This appears to break some PPC build bots. Revert while I investigate.
2022-02-27 17:51:19 +00:00
Florian Hahn 49b23f451c
[VPlan] Introduce recipe to build scalar steps.
This patch adds a new VPScalarIVStepsRecipe to handle building scalar
steps.

In the first patch, it only handles the case where there is no vector
induction variable needed.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115953
2022-02-27 17:32:41 +00:00
Florian Hahn da740492b0
[VPlan] Remove dead header-phi recipes.
This patch adds a new transform to remove dead recipes. For now, it only
removes dead recipes in the header, to keep the number tests that require
updating manageable. Future patches will extend this to remove dead
recipes across the whole plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D118051
2022-02-26 16:26:39 +00:00
Florian Hahn 462cd9270c
[LV] Add test with redundant cast in separate latch block.
Adds another interesting test for D118051.
2022-02-26 14:52:55 +00:00
Nikita Popov a266af7211 [InstCombine] Canonicalize SPF to min/max intrinsics
Now that integer min/max intrinsics have good support in both
InstCombine and other passes, start canonicalizing SPF min/max
to intrinsic min/max.

Once this sticks, we can stop matching SPF min/max in various
places, and can remove hacks we have for preventing infinite loops
and breaking of SPF canonicalization.

Differential Revision: https://reviews.llvm.org/D98152
2022-02-24 09:01:20 +01:00
Malhar Jajoo 9f1c6fbf11 [LAA] Add remarks for unbounded array access
Adds new optimization remarks when loop vectorization fails due to
the compiler being unable to find bound of an array access inside
a loop

Differential Revision: https://reviews.llvm.org/D115873
2022-02-23 15:57:39 +00:00
Kerry McLaughlin 12fb133eba [LoopVectorize] Support conditional in-loop vector reductions
Extends getReductionOpChain to look through Phis which may be part of
the reduction chain. adjustRecipesForReductions will now also create a
CondOp for VPReductionRecipe if the block is predicated and not only if
foldTailByMasking is true.

Changes were required in tryToBlend to ensure that we don't attempt
to convert the reduction Phi into a select by returning a VPBlendRecipe.
The VPReductionRecipe will create a select between the Phi and the reduction.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D117580
2022-02-22 12:04:35 +00:00
Florian Hahn 5c7ae10cec
[LV] Add store to test to make sure the loop is not dead.
Add an extra store to the test, to make sure the operations in the loop
cannot be optimized away after D118051.
2022-02-20 15:05:29 +00:00
zhongyunde b2f5164deb [IVDescriptors] Support FOR where we have multiple sink pointed
Handles the case where Previous doesn't come before LastPrev incorrectly.
Fix https://github.com/llvm/llvm-project/issues/53483

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D118558
2022-02-14 09:30:35 +08:00
Florian Hahn d462e64754
[LV] Drop noalias from check lines from test (NFC).
The noalias metadata checks re not really relevant for the test and
slight changes to metadata numbering can have large knock-on effects
causing large noise in test diff.
2022-02-13 11:36:54 +00:00
Florian Hahn 446e7c64c7
[LV] Add real uses in some tests, to make them more robust.
Add real uses to some tests, to ensure dead instructions cannot be directly
removed.
2022-02-13 09:52:59 +00:00
Florian Hahn 9474c3009e
[LV] Move unrelated tests from first-order-recurrence-chains.ll 2022-02-11 09:15:42 +00:00
Florian Hahn f97795121f
[LV] Add tests with chained first-order recurrences. 2022-02-10 15:55:19 +00:00
Simon Pilgrim 4517488eb7 [LoopVectorize] Regenerate reduction-predselect.ll test checks 2022-02-10 12:03:10 +00:00
David Green b55d4c2ad8 Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`"
This reverts commit 77a0da926c as we've
received multiple reports of this significantly impacting performance,
in ways that don't seem to just be target specific cost models going
wrong. I would offer some reproducers, but the test changes here seem to
be full of them!

Reverting for now and hopefully we can remove the "hack" more carefully
as we go.
2022-02-09 20:02:54 +00:00
David Green b4c6d1bb37 [LoopVectorizer] Don't perform interleaving of predicated scalar loops
The vectorizer will choose at times to "vectorize" loops with a scalar
factor (VF=1) with interleaving (IC > 1). This can occasionally produce
better code than the unroller (notable for reductions where it can
produce independent reduction chains that are combined after the loop).
At times this is not very beneficial though, for example when runtime
checks are needed or when the scalar code requires predication.

This addresses the second point, preventing the vectorizer from
interleaving when the scalar loop will require predication. This
prevents it from making a bit of a mess, that is worse than the original
and better left for the unroller to unroll if beneficial. It helps
reverse some of the regressions from D118090.

Differential Revision: https://reviews.llvm.org/D118566
2022-02-07 19:34:28 +00:00
Florian Hahn 1049735d07
[LV] Adjust accesses in test to ensure full RT checks are generated.
Add an additional access so the full runtime checks are still generated,
even after D119078.
2022-02-07 18:07:19 +00:00
Roman Lebedev 77a0da926c
[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`
D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
What it essentially does is prevents scalarized vectorization of masked memory operations:
```
  // TODO: Cost model for emulated masked load/store is completely
  // broken. This hack guides the cost model to use an artificially
  // high enough value to practically disable vectorization with such
  // operations, except where previously deployed legality hack allowed
  // using very low cost values. This is to avoid regressions coming simply
  // from moving "masked load/store" check from legality to cost model.
  // Masked Load/Gather emulation was previously never allowed.
  // Limited number of Masked Store/Scatter emulation was allowed.
```

While i don't really understand about what specifically `is completely broken`
was talking about, i believe that at least on X86 with AVX2-or-later,
this is no longer true. (or at least, i would like to know what is still broken).
So i would like to follow suit after D111460, and like wise disable that hack for AVX2+.

But since this was added for X86 specifically, let's just instead completely remove this hack.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114779
2022-02-07 16:08:31 +03:00