1851 Commits

Author SHA1 Message Date
Florian Hahn 7c0ff64b0f
[LAA] Change to function analysis for new PM.
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
within a loop pipeline.

To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.

Fixes #50940.

Fixes #51669.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134606
2022-10-01 15:44:27 +01:00
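The manager-object idea in the [LAA] change above (7c0ff64b0f) can be illustrated with a small, language-neutral sketch. This is not LLVM's implementation: the class and method names below (`LoopAccessInfo`, `LoopAccessInfoCache`, `get_info`, `clear`) are hypothetical and chosen only for illustration. The sketch assumes that a function-level object owns one cached result per loop, so that all results can be dropped together when a change in one loop might affect cached expressions in another.

```python
class LoopAccessInfo:
    """Stand-in for a per-loop analysis result (hypothetical, illustration only)."""

    def __init__(self, loop_name):
        self.loop_name = loop_name


class LoopAccessInfoCache:
    """Function-level manager that lazily builds and caches one result per loop."""

    def __init__(self):
        self._results = {}
        self.computations = 0  # counts how often a result was built or rebuilt

    def get_info(self, loop_name):
        # Build the result on first request and reuse it afterwards.
        if loop_name not in self._results:
            self._results[loop_name] = LoopAccessInfo(loop_name)
            self.computations += 1
        return self._results[loop_name]

    def clear(self):
        # Drop the results for every loop at once; the next request rebuilds them.
        self._results.clear()


cache = LoopAccessInfoCache()
first = cache.get_info("loop.a")
again = cache.get_info("loop.a")   # served from the cache
cache.get_info("loop.b")
cache.clear()                      # invalidates loop.a and loop.b together
rebuilt = cache.get_info("loop.a") # recomputed after invalidation
```

Keeping the cache at function scope is what makes whole-function invalidation a single operation, which is the convenience the commit message says was missing for a per-loop analysis.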
Arthur Eubanks e23aee7175 [test] Update some legacy PM tests 2022-09-30 11:31:02 -07:00
Florian Hahn 9933a2e9fd
[SCEVExpander] Move LCSSA fixup to ::expand.
Move LCSSA fixup from ::expandCodeForImpl to ::expand(). This has
the advantage that we directly preserve LCSSA nodes here instead of
relying on doing so in rememberInstruction. It also ensures that we
don't add the non-LCSSA-safe value to InsertedExpressions.

Alternative to D132704.

Fixes #57000.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134739
2022-09-29 20:49:56 +01:00
Igor Kirillov 2d60d7ba1a [LoopVectorize][Fix] Crash when invariant store address is calculated inside loop
Fixes #57572

Generally, the LICM pass is responsible for moving code that calculates
an invariant address out of the loop, as it only needs to be calculated
once. In the rare case where that does not happen, we do not vectorize
the loop.

Differential Revision: https://reviews.llvm.org/D133687
2022-09-28 10:33:50 +01:00
Philip Reames dc7387b587 [LV] Adjust cost model to use uniform store lowering for unpredicated uniform stores
Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated.

The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use.

Differential Revision: https://reviews.llvm.org/D134460
2022-09-27 07:28:40 -07:00
Florian Hahn 2c692d891e
[LV] Update handling of scalable pointer inductions after b73d2c8.
The dependent code has been changed quite a lot since 151c144 which
b73d2c8 effectively reverts. Now we run into a case where lowering
didn't expect/support the behavior pre 151c144 any longer.

Update the code dealing with scalable pointer inductions to also check
for uniformity in combination with isScalarAfterVectorization. This
should ensure scalable pointer inductions are handled properly during
epilogue vectorization.

Fixes #57912.
2022-09-23 18:23:02 +01:00
Florian Hahn 17167005d5
[LV] Add test for #57912.
Add test showing miscompilation during epilogue vectorization with SVE.
2022-09-23 11:49:55 +01:00
Florian Hahn 05b3493819
[LV] Convert sve-epilog-vect.ll to use opaque pointers. 2022-09-23 10:24:19 +01:00
Philip Reames 32dc1151e2 [VPlan] Only generate single instr for unpredicated stores of varying value to invariant address
This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.)

This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.)

Differential Revision: https://reviews.llvm.org/D133580
2022-09-22 08:53:46 -07:00
Simon Pilgrim e030be64d8 [CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates
This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad.

This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-22 11:24:11 +01:00
Simon Pilgrim b2cd8118d0 [CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-22 10:19:23 +01:00
Philip Reames 8c46881a53 [TTI] Recognize fp constants in getOperandInfo
We were recognizing vectors of floats, but not scalars.  That's a tad odd.
2022-09-21 14:34:34 -07:00
Graham Hunter 7b420a4a8b [NFC][LV] Scalarizing test for masked vector calls 2022-09-21 15:43:25 +01:00
Simon Pilgrim 71162ad957 [LoopVectorize] Fix test name - the test is for fshl not cttz intrinsic costs 2022-09-21 15:24:43 +01:00
Sanjay Patel 0f32a5dea0 [InstCombine] don't canonicalize shl+sub to mul+add
This stops Negator from transforming:
`C1 - shl X, C2 --> mul X, (1<<C2) + C1`
...in the general case. There does not seem to be any analysis
benefit to using mul in IR, and there's definitely downside in
codegen (particularly when the multiply has to be expanded).

If `C1` is 0, then there's a stronger argument that the single
mul is a better canonicalization than negate-of-shl, but we may
want to remove that too.

This was noted as a potential conflict for D133667.

Differential Revision: https://reviews.llvm.org/D134310
2022-09-21 08:39:07 -04:00
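The equivalence behind the shl+sub commit above (0f32a5dea0) can be checked exhaustively at a small bit width. The sketch below assumes 8-bit wrapping arithmetic and reads the multiplier as the negated power of two, which is what makes the two forms agree; the helper names are hypothetical and the check says nothing about which form is the better canonicalization.

```python
BITS = 8
MASK = (1 << BITS) - 1  # models wrapping 8-bit integer arithmetic


def sub_of_shl(x, c1, c2):
    # C1 - (X << C2), truncated to BITS bits
    return (c1 - (x << c2)) & MASK


def mul_add(x, c1, c2):
    # X * -(1 << C2) + C1, truncated to BITS bits
    multiplier = (-(1 << c2)) & MASK
    return (x * multiplier + c1) & MASK


def forms_agree():
    # Compare both forms for every 8-bit X and C1 and every shift amount below 8.
    return all(
        sub_of_shl(x, c1, c2) == mul_add(x, c1, c2)
        for x in range(1 << BITS)
        for c1 in range(1 << BITS)
        for c2 in range(BITS)
    )
```

Both forms compute the same value, so the commit is about which one is kept as the canonical IR, not about correctness.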
Simon Pilgrim 09cb9fdef9 [InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635)
Alive2: https://alive2.llvm.org/ce/z/sZ6wwS

As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero.

Differential Revision: https://reviews.llvm.org/D134172
2022-09-20 16:44:41 +01:00
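The precondition in the fold above (09cb9fdef9) can be seen with a brute-force check. This is a sketch at 8 bits, independent of the Alive2 proof linked in the commit; the function names are hypothetical. It compares `ult(add(x, -1), c)` with `ule(x, c)` for all inputs and reports where they differ.

```python
BITS = 8
MASK = (1 << BITS) - 1  # models wrapping 8-bit unsigned arithmetic


def before_fold(x, c):
    # ult(add(x, -1), c): the decrement wraps around when x == 0
    return ((x - 1) & MASK) < c


def after_fold(x, c):
    # ule(x, c)
    return x <= c


def mismatches(xs):
    # Collect every (x, c) pair on which the two comparisons disagree.
    return [
        (x, c)
        for x in xs
        for c in range(1 << BITS)
        if before_fold(x, c) != after_fold(x, c)
    ]


nonzero_mismatches = mismatches(range(1, 1 << BITS))
zero_mismatches = mismatches([0])
```

For non-zero `x` the two comparisons always agree. For `x == 0` the decrement wraps to the maximum value, so the original comparison is false for every `c` while `ule(0, c)` is always true, which is why the fold requires `x` to be known non-zero.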
Vitaly Buka bbef90ace4 [IRBuilder] Use PoisonValue in CreateMasked*
Followup to 72b776168c

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D133967
2022-09-19 11:01:41 -07:00
Florian Hahn 582f8ef19f
[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.

At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.

This can lead to widened phis with incorrect start values being created
in the epilogue vector body.

This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization.

Fixes #57712.
2022-09-19 18:14:35 +01:00
Sebastian Peryt 99c9b37d11 [NFC][1/n] Remove -enable-new-pm=0 flags from lit tests
This is the first patch in a series intended to remove the
-enable-new-pm=0 flag from lit tests. This is part of a bigger
effort to completely remove code related to the legacy pass
manager in favor of the new pass manager, which is now the default.

In this patch, the flag has been removed only from tests where no
significant change was required because the checks had been duplicated
for both PMs.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D134150
2022-09-19 09:57:37 -07:00
Florian Hahn f02ff5348f
[LV] Move new epilog-vectorization-widen-inductions.ll to AArch64 dir.
The test requires the AArch64 backend, so move it to the right subdir.
2022-09-19 17:13:06 +01:00
Florian Hahn 6087b6386e
[LV] Add tests for epilogue vectorization with widened inductions.
Includes a test for the miscompile in #57712.
2022-09-19 17:10:41 +01:00
Simon Pilgrim 393cc6a354 [LoopVectorize] Regenerate runtime-check.ll 2022-09-19 10:25:48 +01:00
Simon Pilgrim 7e626d7a89 [LoopVectorize][X86] Use quotes around the pass list to appease DOS cmd evaluation
DOS can't handle -passes='default<O3>' correctly
2022-09-19 10:24:37 +01:00
Sanjay Patel d6498abc24 [InstCombine] remove multi-use add demanded constant fold
This was originally part of D133788. There are no visible
regressions. All of the diffs show a large unsigned constant
becoming a small negative constant. This should be better
for analysis (and slightly less compile-time) and codegen.
2022-09-18 14:23:43 -04:00
Vitaly Buka ed188b39ab [test] Regenerate few tests 2022-09-15 12:36:32 -07:00
Simon Pilgrim 0ec028fe10 [CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops
Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
2022-09-15 14:05:30 +01:00
jacquesguan ecf327f154 [RISCV] Add cost model for vector insert/extract element.
This patch adds a cost model for vector insert/extract element instructions. In RVV, we can use a vector-scalar move instruction to insert or extract the first element, and use vslide to move it into position. For mask vectors, or i64 vectors on an i32 target, special instructions are needed.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133007
2022-09-14 11:10:18 +08:00
Simon Pilgrim 8ae9cf550b [LoopVectorize][X86] Add uniform shift costs checks for VF=1/2/4 2022-09-13 13:46:52 +01:00
Philip Reames 4e295cb1ce [LV] Autogen a test for ease of update 2022-09-09 08:16:22 -07:00
Philip Reames edb26268ce [VPlan] Only generate single instr for stores uniform across all parts.
Extend the approach taken by D133019 to store instructions.

Differential Revision: https://reviews.llvm.org/D133497
2022-09-09 07:15:12 -07:00
Graham Hunter 1f639d1bd2 [NFC][LV] Convert masked call tests to use update script 2022-09-09 10:07:39 +01:00
Craig Topper 5f3a8b585b [RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133511
2022-09-08 12:34:59 -07:00
Philip Reames 4c4c0d2c06 [LV] Use safe-divisor lowering for fixed vectors if profitable
This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well.

Differential Revision: https://reviews.llvm.org/D132591
2022-09-08 09:15:54 -07:00
Florian Hahn 422cf99161
[VPlan] Only generate single instr for loads uniform across all parts.
VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a
scalar instruction is generated per-part.

This is a potential alternative to D132892. For now the current patch only
catches cases where the address is trivially invariant (defined outside
VPlan), while D132892 catches any address that is considered invariant
by SCEV AFAICT.

It should be possible to hoist fully invariant recipes feeding loads out
of the vector loop region as well, but in practice LICM should do that
already.

This version of the patch artificially limits this to loads to make it
easier to compare, but this restriction should be easily liftable.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133019
2022-09-08 14:27:58 +01:00
Florian Hahn ba3d29f871
[LCSSA] Update unreachable uses with poison.
Users of LCSSA may not expect non-phi uses when checking the uses
outside a loop, which may cause crashes. This is due to the fact that we
do not update uses in unreachable blocks.

To ensure all reachable uses outside the loop are phis, update uses in
unreachable blocks to use poison in dead code.

Fixes #57508.
2022-09-04 22:26:18 +01:00
Florian Hahn a10d42dd45
[LV] Update test to use opaque pointers, regenerate checks.
Modernize the test to make it easier to extend in a follow-up patch.
2022-09-04 22:26:18 +01:00
Florian Hahn fc444ddc77
[VPlan] Add field to track if intrinsic should be used for call. (NFC)
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132585
2022-09-01 13:14:40 +01:00
Florian Hahn faad567589
[LV] Add test case where SCEV is needed to remove vector backedge.
Test case mentioned in the discussion for D115261.
2022-08-31 14:01:42 +01:00
Florian Hahn 1ed555a62b
[LV] Fix test cases where vector loop never executed.
It looks like the vector loops in the modified test cases
unintentionally never get executed. Update the exit condition to ensure
it does to avoid them getting optimized away in upcoming changes.
2022-08-31 13:24:49 +01:00
Philip Reames 4c10646367 [LV] Refresh autogen tests to reflect naming changes [nfc]
Purely so that these can be easily autogened without spurious diffs
2022-08-29 14:16:54 -07:00
Florian Hahn 005d1a8ff5
[LV] Add test where either a libfunc or intrinsic is chosen.
In the newly added test either a libfunc (VF=2) or an intrinsic (VF=4)
can be chosen.

Test coverage for D132585.
2022-08-29 10:51:20 +01:00
Philip Reames b45a262679 [RISCV] Enable fixed length vectors and loop vectorization with same
This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size.

For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware.

The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization.

SLP has been disabled for now, even when fixed vectors are enabled.  See a310637 and associated review.  There are a few additional code quality issues which need to be worked through before turning SLP on would be reasonable.

Differential Revision: https://reviews.llvm.org/D131508
2022-08-26 14:45:23 -07:00
Florian Hahn 9405af1c85
[LAA] Require AddRecs to be in the innermost loop for diff-checks.
The simpler diff-checks require pointers with add-recs from the same
innermost loop, but this property wasn't checked completely. Add the
missing check to ensure both addrecs are in the innermost loop.

Fixes #57315.
2022-08-26 20:39:52 +01:00
Florian Hahn e117137af0
[LV] Add another test for incorrect runtime check generation.
Add a variation of @nested_loop_outer_iv_addrec_invariant_in_inner with
the dependence sink and source swapped to extend test coverage.

Also simplifies the test by removing an unneeded reduction.
2022-08-26 17:28:55 +01:00
Florian Hahn 6e56779e6b
[LV] Add test for incorrect runtime check generation #57315.
Test for PR57315 based on a test provided by @kpdev42.
2022-08-26 16:29:20 +01:00
Florian Hahn 3b135ef446
[LV] Convert runtime diff check test to use opaque pointers.
Modernize the test to make it easier to extend with up-to-date IR.
2022-08-26 16:02:38 +01:00
Philip Reames 86b67a310d [LAA] Prune dependencies with distance larger than access implied by trip count
When we have a dependency with a dependence distance which can only be hit on an iteration beyond the actual trip count of the loop, we can ignore that dependency when analyzing said loop. We already had this code, but had restricted it solely to unknown dependence distances. This change applies it to all dependence distances.

Without this code, we relied on the vectorizer reducing VF such that our infeasible dependence was respected. This usually worked out to about the same result, but not always. For fixed length vectorization, this could mean a smaller VF than optimal being chosen or additional runtime checks. For scalable vectorization - where the bounds on access implied by VF are broader - we could often not find a feasible VF at all.

Differential Revision: https://reviews.llvm.org/D131924
2022-08-25 14:24:13 -07:00
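The pruning rule in the [LAA] commit above (86b67a310d) can be sketched in a simplified model. The sketch assumes dependence distances are measured in whole iterations and the trip count is a known constant; the actual analysis reasons about byte distances and bounds derived from the loop, which is not modeled here. The function name is hypothetical.

```python
def prune_infeasible(distances, trip_count):
    """Keep only dependence distances that can occur within the loop.

    Iterations are numbered 0 .. trip_count - 1, so two iterations are at
    most trip_count - 1 apart. A dependence with a larger distance would
    only be hit on an iteration beyond the trip count and can be ignored.
    """
    return [d for d in distances if abs(d) < trip_count]


# With 16 iterations, distances 16 and 40 can never be reached.
kept = prune_infeasible([3, 40, 16, 15], trip_count=16)
```

In this model a dependence that is pruned no longer constrains the choice of VF, which matches the commit's point that the vectorizer previously had to shrink VF to respect dependences that could never occur.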
Florian Hahn 637da77e66
[LV] Add additional test coverage for SCEVexp and LCSSA interaction.
Also converts the test to use opaque pointers while I am here.
2022-08-25 20:59:47 +01:00
Philip Reames 190cdf51ff [RISCV][LV] Add predicated div/rem test for fixed length vectorization 2022-08-24 11:24:22 -07:00
Philip Reames b20104f644 [LV] Update a test which appears to have been edited without regen [nfc] 2022-08-24 11:05:49 -07:00