Commit Graph

1661 Commits

Author SHA1 Message Date
Florian Hahn 635b752211
[VPlan] VPInterleaveRecipe only uses first lane if op not stored.
With opaque pointers, the stored value and the address can be the same
value. Only treat the recipe as using just the first lane of an operand
*if* that operand (the address) is not also stored.

Fixes #55375.
2022-05-11 11:24:56 +01:00
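The rule in this fix can be illustrated with a toy model. The sketch below is a hypothetical simplification, not LLVM's `VPInterleaveRecipe` interface: operands are plain integer IDs, and `InterleavedStoreGroup` and `onlyFirstLaneUsed` are made-up names. An address operand only needs lane 0, but once the same value also appears among the stored values, all lanes are required.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model, not LLVM's VPInterleaveRecipe: operands are integer
// IDs instead of VPValue pointers.
struct InterleavedStoreGroup {
  int Addr;                      // ID of the address operand
  std::vector<int> StoredValues; // IDs of the values the group stores

  // True if only lane 0 of operand Op is needed.
  bool onlyFirstLaneUsed(int Op) const {
    assert(Op >= 0 && "operand IDs are assumed to be non-negative");
    bool IsStored = std::find(StoredValues.begin(), StoredValues.end(),
                              Op) != StoredValues.end();
    // An address needs only its first lane, unless the same value is also
    // stored (possible with opaque pointers); then all lanes are needed.
    return Op == Addr && !IsStored;
  }
};
```

For instance, `InterleavedStoreGroup Same{1, {1, 3}}` models the opaque-pointer case where the address is itself one of the stored values, and `Same.onlyFirstLaneUsed(1)` returns `false`.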
Florian Hahn e79c1962b9
[LV] Add opaque pointer test for #55375. 2022-05-11 11:24:52 +01:00
Nikita Popov ff20ee32d8 [LoopVectorize] Remove incorrect nuw flag from test (NFC)
nuw does not make sense for reverse iteration.
2022-05-10 12:17:09 +02:00
David Sherwood 45f2e92d97 [NFC][LoopVectorize] Add SVE test for tail-folding combined with interleaving
Differential Revision: https://reviews.llvm.org/D125001
2022-05-09 13:08:25 +01:00
Simon Pilgrim cbfa857346 [CostModel][X86] Adjust 128-bit select costs to account for slow BLENDV op
Based on the script from D103695 - Jaguar, Bulldozer, Silvermont (et al.) and Haswell all have slow BLENDV ops, so adjust the worst-case cost values
2022-05-06 13:07:34 +01:00
Florian Hahn ff8d0b338f
[VPlan] Add test for printing plan with an exit value.
Test for printing plan with additions from D123537.
2022-05-04 17:19:02 +01:00
Igor Kirillov 4e5e042d9a [LoopVectorize] Support reductions that store intermediary result
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.

Ordered fadd reductions are not yet supported.

Differential Revision: https://reviews.llvm.org/D110235
2022-05-03 10:12:30 +01:00
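To make the transformation concrete, here is a hedged C++ sketch of the kind of loop this targets: a sum whose running value is stored to a loop-invariant address `Dst` that might alias the input. The function names, the fixed VF of 4 and the simple pointer-range alias check are assumptions for illustration; the runtime checks LoopVectorize actually emits differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Scalar reference loop: the running sum is stored to *Dst every iteration.
void sumStoreScalar(int *A, int N, int *Dst) {
  int Sum = 0;
  for (int I = 0; I < N; ++I) {
    Sum += A[I];
    *Dst = Sum;
  }
}

// Vectorized-style version with VF = 4. Returns true if the vector path
// ran, false if the runtime alias check forced the scalar fallback.
bool sumStoreVectorized(int *A, int N, int *Dst) {
  assert(N >= 0);
  // Runtime check: Dst must not point into A[0, N).
  auto Lo = reinterpret_cast<std::uintptr_t>(A);
  auto Hi = reinterpret_cast<std::uintptr_t>(A + N);
  auto D = reinterpret_cast<std::uintptr_t>(Dst);
  if (D >= Lo && D < Hi) {
    sumStoreScalar(A, N, Dst);
    return false;
  }
  constexpr int VF = 4;
  int Lanes[VF] = {0, 0, 0, 0}; // one partial sum per vector lane
  int I = 0;
  for (; I + VF <= N; I += VF)
    for (int L = 0; L < VF; ++L)
      Lanes[L] += A[I + L];
  int Sum = Lanes[0] + Lanes[1] + Lanes[2] + Lanes[3];
  for (; I < N; ++I) // scalar remainder
    Sum += A[I];
  // With no aliasing, only the last stored value is observable, so a single
  // store after the loop is equivalent.
  if (N > 0)
    *Dst = Sum;
  return true;
}
```

When `Dst` points into `A`, the sketch falls back to the scalar loop, mirroring how a failed runtime check routes execution to the original loop.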
David Green 6f81903e89 [LV][SLP] Mark fptosi_sat as vectorizable
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized and scalarized.

The signature of an fptosi_sat call requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single one. This patch changes hasVectorInstrinsicOverloadedScalarOpd
into isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as an overloaded (but not scalar) operand.

Differential Revision: https://reviews.llvm.org/D124358
2022-05-03 09:32:34 +01:00
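For reference, the saturating conversion semantics can be written out in plain C++. This is a sketch based on the LangRef description of `llvm.fptosi.sat` (NaN becomes 0, out-of-range values clamp to the integer type's limits, everything else truncates toward zero); the function names are hypothetical. The templated helper mimics the vector form, where both the i32 result type and the f32 source type are overloaded.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Scalar semantics of llvm.fptosi.sat.i32.f32.
std::int32_t fptosiSatI32(float X) {
  if (std::isnan(X))
    return 0;
  if (X >= 2147483648.0f) // 2^31, the smallest float above INT32_MAX
    return std::numeric_limits<std::int32_t>::max();
  if (X < -2147483648.0f) // below -2^31 == INT32_MIN
    return std::numeric_limits<std::int32_t>::min();
  return static_cast<std::int32_t>(X); // in range: truncate toward zero
}

// Lane-wise form, mirroring @llvm.fptosi.sat.v2i32.v2f32: both the result
// element type and the source element type are template-overloaded here.
template <std::size_t N>
std::array<std::int32_t, N> fptosiSatVec(const std::array<float, N> &V) {
  std::array<std::int32_t, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = fptosiSatI32(V[I]);
  return R;
}
```

For example, `fptosiSatI32(1e10f)` saturates to `INT32_MAX`, and `fptosiSatI32(-3.7f)` yields `-3`.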
Florian Hahn 0ef8ca6d88
[VPlan] Do not create VPWidenCall recipes for scalar vector factors.
'Widen' recipes are only used when actual vector values are generated.
Fix tryToWidenCall so that it does not create VPWidenCallRecipes for
scalar vector factors.

This was exposed by D123720, because the widened recipes are considered
vector users.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D124718
2022-05-02 19:40:33 +01:00
David Green c7d39fd61a [LV][SLP] Add tests for vectorizing fptoi_sat intrinsics. NFC 2022-05-02 15:11:44 +01:00
Simon Pilgrim cff0afc184 [LoopVectorize][X86] Regenerate invariant-store-vectorization.ll 2022-05-01 13:04:24 +01:00
Simon Pilgrim c2964746e3 [CostModel][X86] Reduce cost of vector selects on SSE2/AVX1 targets
Based on the script from D103695: we were exaggerating the cost of the OR(AND(X,M),AND(Y,~M)) expansion by using instruction count instead of effective throughput
2022-05-01 09:32:14 +01:00
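The expansion mentioned here is the classic bitwise select used when no blend instruction is available (on SSE2 it maps to PAND, PANDN and POR). The sketch below shows it on four 32-bit lanes; `selectViaAndOr` is a hypothetical name, and it assumes each mask lane is all-ones or all-zeros, as a vector compare would produce. It only demonstrates the semantics, not the throughput-based cost the commit adjusts.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;

// Per-lane select expanded as OR(AND(X, M), AND(Y, ~M)): lanes where M is
// all-ones take X, lanes where M is zero take Y.
V4 selectViaAndOr(const V4 &M, const V4 &X, const V4 &Y) {
  V4 R{};
  for (int I = 0; I < 4; ++I) {
    assert((M[I] == 0u || M[I] == 0xFFFFFFFFu) && "mask lanes must be 0 or ~0");
    R[I] = (X[I] & M[I]) | (Y[I] & ~M[I]);
  }
  return R;
}
```

The two ANDs are independent of each other, so on a superscalar core they can execute in parallel; counting three instructions therefore tends to overstate the cost relative to effective throughput.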
Florian Hahn 841fffa745
[LV] Add test for interleaving multiple iterations with call. 2022-04-30 20:43:22 +01:00
Bjorn Pettersson 2e14900db9 [test][NewPM] Use -passes=loop-vectorize instead of -loop-vectorize
Update a bunch of loop-vectorize regression tests to use the new PM
syntax (opt -passes=loop-vectorize) instead of the deprecated legacy
PM syntax (opt -loop-vectorize).
2022-04-28 16:46:00 +02:00
Florian Hahn bea69b232f
[VPlan] Initial modeling of middle block in VPlan.
This patch extends the scope of VPlan to also include the exit (aka
middle) block.

For now, the exit block remains empty, but handling of exit values will
subsequently be moved to VPlan, by adding recipes to model exit values
in the exit block.

As a first step, this will allow fixing #51366.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123457
2022-04-20 19:34:41 +01:00
Florian Hahn a65f2730d2
[VPlan] Expand induction step in VPlan pre-header.
This patch moves SCEV expansion of steps used by
VPWidenIntOrFpInductionRecipes to the pre-header using
VPExpandSCEVRecipe. This ensures that those steps are expanded while the
CFG is in a valid state. Previously, SCEV expansion could happen during
vector body code-generation, during which the CFG may be invalid,
causing issues with SCEV expansion.

Depends on D122095.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D122096
2022-04-19 13:06:39 +02:00
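As a rough illustration of why the step's placement matters, here is a hedged sketch of a widened integer induction with VF = 4 in plain C++ (not VPlan code). The step `N * 2` is an assumed stand-in for a loop-invariant expression that SCEV would have to expand; the sketch evaluates it once before the loop, which is the role the pre-header plays, and then advances every lane by `VF * Step` per vector iteration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a widened integer induction with VF = 4.
std::vector<std::int64_t> widenedIVValues(std::int64_t Start, std::int64_t N,
                                          int VectorIters) {
  assert(VectorIters >= 0);
  constexpr int VF = 4;
  // "Pre-header": expand the loop-invariant step once (N * 2 stands in for
  // a SCEV expression), then build <Start, Start+Step, ...> and the increment.
  const std::int64_t Step = N * 2;
  std::array<std::int64_t, VF> Vec{};
  for (int L = 0; L < VF; ++L)
    Vec[L] = Start + L * Step;
  const std::int64_t Inc = VF * Step;
  // "Vector body": record the current lanes, then add Inc to every lane.
  std::vector<std::int64_t> Out;
  for (int It = 0; It < VectorIters; ++It) {
    Out.insert(Out.end(), Vec.begin(), Vec.end());
    for (auto &Lane : Vec)
      Lane += Inc;
  }
  return Out;
}
```

For example, `widenedIVValues(5, 3, 2)` uses a step of 6 and yields 5, 11, 17, 23, 29, 35, 41, 47, matching the scalar induction `5 + 6 * i`.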
Craig Topper ac8c720d48 [IR] Allow constant folding (insertelement <vscale x 2 x i32> zeroinitializer, i32 0, i32 0).
Most of insertelement constant folding is blocked if the vector type
is scalable. I believe we can make an exception for inserting null
into an all zeros vector.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D123413
2022-04-15 17:44:32 -07:00
Florian Hahn 73f5d7d0d6
[VPlan] Handle equal address and store ops in onlyFirstLaneDemanded.
With opaque pointers, the stored value and address can be the same.

Previously the code in VPWidenMemoryInstructionRecipe::onlyFirstLaneDemanded
incorrectly considered stores with matching stored-value and pointer
operands as only demanding the first lane, causing a crash.
2022-04-15 22:53:33 +02:00
Muhammad Omair Javaid 42ebfa8269 Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e81.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

https://lab.llvm.org/buildbot/#/builders/176/builds/1515

llvm-tblgen crashes after applying this patch.
2022-04-13 04:53:07 +05:00
Simon Pilgrim 431e93f4f5 [InstCombine] Fold sub(add(x,y),min/max(x,y)) -> max/min(x,y) (PR38280)
As discussed on Issue #37628, we can flip a min/max node if we're subtracting from the sum of the node's operands

Alive2: https://alive2.llvm.org/ce/z/W_KXfy

Differential Revision: https://reviews.llvm.org/D123399
2022-04-11 11:32:56 +01:00
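The fold can also be checked exhaustively for i8 in a few lines of C++, as a lightweight complement to the Alive2 proof. The sketch (hypothetical helper names) works on 8-bit patterns with wrapping add/sub, covers both the signed and unsigned min/max pairs, and also checks the mirrored form `sub(add(x,y), max(x,y)) -> min(x,y)`.

```cpp
#include <algorithm>
#include <cassert>

// Interpret an 8-bit pattern (0..255) as a two's-complement i8 value.
int asSigned8(int V) { return V < 128 ? V : V - 256; }

// Wrapping i8 subtraction on bit patterns (result in 0..255).
int sub8(int A, int B) { return (A - B + 256) & 0xFF; }

// Exhaustively checks sub(add(x,y), min(x,y)) == max(x,y) and the mirrored
// fold for all i8 pairs, for both unsigned and signed min/max.
bool minMaxFoldHolds() {
  for (int X = 0; X < 256; ++X) {
    for (int Y = 0; Y < 256; ++Y) {
      int Add = (X + Y) & 0xFF; // add i8 wraps modulo 2^8
      // Unsigned: umin/umax compare the raw bit patterns.
      int UMin = std::min(X, Y), UMax = std::max(X, Y);
      // Signed: smin/smax compare the patterns as i8 values.
      bool XLess = asSigned8(X) < asSigned8(Y);
      int SMin = XLess ? X : Y, SMax = XLess ? Y : X;
      if (sub8(Add, UMin) != UMax || sub8(Add, UMax) != UMin)
        return false;
      if (sub8(Add, SMin) != SMax || sub8(Add, SMax) != SMin)
        return false;
    }
  }
  return true;
}
```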
Florian Hahn 5f1eb74850
[VPlan] Place VPExpandSCEVRecipe in pre-header.
After D121624 models the pre-header in VPlan, VPExpandSCEVRecipes can be
placed there. This ensures SCEV expansion happens before VPlan execution
starts modifying the CFG, rather than while the CFG is incomplete.

Depends on D121624.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D122095
2022-04-10 10:26:20 +02:00
Florian Hahn 256c6b0ba1
[VPlan] Model pre-header explicitly.
This patch extends the scope of VPlan to also model the pre-header.
The pre-header can be used to place recipes that should be code-gen'd
outside the loop, like SCEV expansion.

Depends on D121623.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121624
2022-04-09 14:19:47 +02:00
Simon Pilgrim 450f0d76b4 [LoopVectorize] Regenerate first-order-recurrence.ll 2022-04-09 10:33:03 +01:00
Stanislav Mekhanoshin fced87d457 [AMDGPU] Fix regression with vectorization limiting
D67148 has removed TTI::getNumberOfRegisters(bool Vector) and
started to call TTI::getNumberOfRegisters(unsigned ClassID) from
LoopVectorize. This resulted in unrestricted vectorization on AMDGPU,
blowing up register pressure.

Differential Revision: https://reviews.llvm.org/D122850
2022-04-08 17:46:49 -07:00
Florian Hahn 467dbcd9f1
[LV] Set debug loc after setting insert point.
This fixes the code to actually use the debug location of the
instruction, if available. Previously, SetInsertPoint would overwrite the
debug location set from the instruction.
2022-04-08 20:34:40 +02:00
Florian Hahn 4c0d5db9c9
[LV] Add test case for wrong debug location with replicate recipe. 2022-04-08 20:34:16 +02:00
Florian Hahn 29fe998eaa
[VPlan] Preserve debug location when creating branch.
Update createEmptyBasicBlock to preserve the debug location of the
previous terminator.
2022-04-08 17:22:53 +02:00
Florian Hahn 547567fe2b
[LV] Add test for missing debug info on branch in vector loop.
Adds a test case where currently no debug location is added to branches
in the vector body.
2022-04-08 17:22:53 +02:00
Florian Hahn 631016a853
[LV] Add test case for PR54427.
Reduced test for #54427.
2022-04-07 23:21:21 +02:00
Jingu Kang 64b6192e81 [AArch64] Set maximum VF with shouldMaximizeVectorBandwidth
Set the maximum VF on AArch64 to 128 / the size of the smallest type in the loop.

Differential Revision: https://reviews.llvm.org/D118979
2022-04-05 13:16:52 +01:00
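The arithmetic behind this change is simple; the sketch below works it through for a loop that mixes i8 and i32 values, assuming a 128-bit NEON register. `computeMaxVF` and `VFBounds` are hypothetical names. By default the vectorizer bounds VF by the widest type in the loop; maximizing vector bandwidth bounds it by the smallest type instead. Note that this commit was later reverted by 42ebfa8269, listed earlier in this log.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct VFBounds {
  unsigned ByWidest;   // conservative: the widest type fills one register
  unsigned BySmallest; // "maximize bandwidth": the narrowest type fills it
};

// Hypothetical helper; RegBits = 128 models a 128-bit NEON register and
// TypeBits lists the element sizes (in bits) used in the loop.
VFBounds computeMaxVF(const std::vector<unsigned> &TypeBits,
                      unsigned RegBits = 128) {
  assert(!TypeBits.empty());
  unsigned Smallest = *std::min_element(TypeBits.begin(), TypeBits.end());
  unsigned Widest = *std::max_element(TypeBits.begin(), TypeBits.end());
  assert(Smallest > 0 && Widest <= RegBits);
  return {RegBits / Widest, RegBits / Smallest};
}
```

For a loop over `i8` and `i32` data, `computeMaxVF({8, 32})` gives a bound of 4 by the widest type and 16 by the smallest.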
Florian Hahn 1ff022e21b
[LV] Add vector.body block to parent loop during skeleton creation.
When creating induction resume values, SCEV queries may rely on
LoopInfo. Make sure vector.body gets added to the loop of the pre-header
during skeleton construction.

%vector.body will be moved to the vector preheader during VPlan
execution.

Fixes #54745.
2022-04-05 11:54:17 +01:00
Florian Hahn 368d35a894
[LV] Add additional tests for pointer difference memory checks.
Additional tests for D119078.
2022-04-04 17:58:48 +01:00
Philip Reames 88de27e3fd [LV] Handle non-integral types when considering interleave widening legality
In general, anywhere we might need to insert a blind bitcast, we need to make sure the types are losslessly convertible.

This fixes pr54634.
2022-04-03 20:16:20 -07:00
Dávid Bolvanský 872f7000fc Revert "[NFCI] Regenerate SROA/LoopVectorize test checks"
This reverts commit 14e3450fb5.
2022-04-04 01:15:30 +02:00
Dávid Bolvanský a113a582b1 [NFCI] Regenerate LoopVectorize test checks 2022-04-03 21:56:24 +02:00
Florian Hahn 95b2aa511e
[VPlan] Set VPlan header block name to vector.body.
This brings the VPlan block naming in line with the naming of the
generated basic blocks.
2022-04-02 19:34:32 +01:00
Florian Hahn a08c90a402
[LV] Re-use TripCount from EPI.TripCount.
During skeleton construction for the epilogue vector loop, generic
helpers use getOrCreateTripCount, which will re-expand the trip count
computation. Instead, re-use the TripCount created during main loop
vectorization.
2022-04-01 13:47:34 +01:00
David Green b65267ca7b [LV] Invalidate widening decisions after maximizing vector bandwidth
When MaximizeVectorBandwidth is enabled, we can end up (via calls to
collectUniformsAndScalars/setCostBasedWideningDecision through
calculateRegisterUsage) making widening decisions before we have decided
whether to fold the tail by masking. These decisions will be wrong if we
later decide to fold the tail, for example when the trip count is very
low: loads that should be masked would then be costed as standard memory
operations.

For now this still uses the EmulatedMaskMemRefHack costs (somewhat
unfortunately), but without this change those loads were costed at 1,
leading to overly optimistic vectorization.

This slightly changes the way that the MaximizeVectorBandwidth option
works to make it easier to test, always honouring the option if it is
set.

Differential Revision: https://reviews.llvm.org/D120215
2022-03-31 09:19:31 +01:00
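Folding the tail by masking means the vector loop covers all iterations, with lanes past the trip count disabled. The hedged sketch below (hypothetical name, VF fixed at 4) spells this out in scalar C++: with a trip count of 3, the single vector iteration has one inactive lane, so every load in the loop must be a masked load. That is why costing those loads as ordinary memory operations, as the commit describes, was too optimistic.

```cpp
#include <cassert>

// Tail folding by masking with VF = 4: the loop runs ceil(N / VF) vector
// iterations, and lanes whose index is >= N are masked off, so no scalar
// epilogue is needed.
int sumTailFolded(const int *A, int N) {
  assert(N >= 0);
  constexpr int VF = 4;
  int Lanes[VF] = {0, 0, 0, 0};
  for (int I = 0; I < N; I += VF) {
    for (int L = 0; L < VF; ++L) {
      bool Active = I + L < N; // the per-lane mask
      if (Active)
        Lanes[L] += A[I + L]; // a masked load only reads active lanes
    }
  }
  return Lanes[0] + Lanes[1] + Lanes[2] + Lanes[3];
}
```

For example, `sumTailFolded` over `{1, 2, 3}` returns 6 after a single masked vector iteration.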
Florian Hahn ecb4171dcb
[LV] Handle zero cost loops in selectInterleaveCount.
In some cases, like the added test case, we can reach
selectInterleaveCount with loops that actually have a cost of 0.

Unfortunately a loop cost of 0 is also used to communicate that the cost
has not been computed yet. To resolve the crash, bail out if the cost
remains zero after computing it.

This seems like the best option, as there are multiple code paths that
return a cost of 0 to force a computation in selectInterleaveCount.
Computing the cost up front at each of those places would unnecessarily
complicate the logic.

Fixes #54413.
2022-03-29 22:52:43 +01:00
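The sentinel problem is easy to reproduce in miniature. The sketch below is hypothetical (it is not `selectInterleaveCount`): `LoopCost == 0` means "not computed yet", the cost is computed on demand, and a cost that is still zero afterwards bails out. Returning an interleave count of 1 on bail-out, the division-based heuristic and the small-loop threshold of 20 are all assumptions made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical sketch: a loop cost of 0 doubles as "not computed yet", but
// a loop can also genuinely cost 0.
unsigned selectICSketch(unsigned LoopCost,
                        const std::function<unsigned()> &ComputeCost,
                        unsigned MaxIC) {
  assert(MaxIC >= 1);
  if (LoopCost == 0) // sentinel: compute the cost now
    LoopCost = ComputeCost();
  if (LoopCost == 0) // still zero: a genuinely free loop, so bail out
    return 1;
  // Assumed heuristic: cheaper loops get a larger interleave count, clamped
  // to [1, MaxIC]. Without the bail-out above, a zero cost would divide by
  // zero here.
  const unsigned SmallLoopCost = 20; // assumed threshold
  return std::max(1u, std::min(MaxIC, SmallLoopCost / LoopCost));
}
```

For instance, with `MaxIC = 8`, a loop whose computed cost is 5 gets an interleave count of min(8, 20 / 5) = 4.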
Florian Hahn 46432a0088
[VPlan] Add VPWidenPointerInductionRecipe.
This patch moves pointer induction handling from VPWidenPHIRecipe to its
own recipe. In the process, it adds all information required to generate
code for pointer inductions without relying on Legal to access the list
of induction phis.

Alternatively, VPWidenPHIRecipe could take an optional pointer to an InductionDescriptor.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121615
2022-03-24 14:58:45 +00:00
Florian Hahn 890fc21742
[LV] Extend checks in debugloc.ll. 2022-03-23 20:21:58 +00:00
Florian Hahn 973183612e
[VPlan] Add test for VPExpandSCEVRecipe printing. 2022-03-20 10:11:40 +00:00
Florian Hahn d5fbcf76fd
[VPlan] Improve pattern in vplan-printing.ll check line.
The existing pattern only matched a single value, which breaks if the
numbering slightly changes.
2022-03-19 16:03:25 +00:00
Andrew Wei 0af3e6a22d [InstCombine] Sink instructions with multiple users in a successor block.
This patch tries to sink instructions when they are only used in a successor block.

This is a further enhancement based on Anna's commit
D109700, which allows sinking an instruction with multiple uses in a single user.

This patch adds support for sinking instructions with multiple users, as long as all of those users are in a single successor block.
This could fix a known issue reported against Rust:
  https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D121585
2022-03-18 11:53:45 +08:00
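A source-level analogy may help, with the caveat that InstCombine works on LLVM IR, not C++. In the hypothetical functions below, the multiply has two users (the `* 2` and the `+`), both in the same successor block, so it can be sunk into that block and the other path no longer computes it. Both versions return the same results.

```cpp
#include <cassert>

// Before sinking: X * Y is computed on every path but only used in the
// "then" block, by two different users.
int beforeSink(int X, int Y, bool Cond) {
  int Prod = X * Y;
  if (Cond)
    return Prod + Prod * 2; // users: the "* 2" and the "+"
  return X;
}

// After sinking: the multiply lives in the only block that uses it.
int afterSink(int X, int Y, bool Cond) {
  if (Cond) {
    int Prod = X * Y;
    return Prod + Prod * 2;
  }
  return X;
}
```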
Florian Hahn 151c144350
[LV] Use usesScalars in widenPHIInstruction.
This uses the existing VPlan helpers to check whether there are scalar
uses of a phi recipe. It removes one of the few remaining dependencies on
the cost model from VPlan code generation.

Depends on D121612.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121613
2022-03-17 13:16:32 +00:00
Malhar Jajoo a36d269658 [VPlan] Avoid collecting scalars for SVE
This patch ensures scalars (except for uniforms) are no
longer collected (prior to the LVP planning phase) for
scalable vectorization.

This is to avoid potentially generating scalarized
instructions later (during the LVP execute phase), as they
are not supported for scalable vectorization.

A relevant test has also been added.

Differential Revision: https://reviews.llvm.org/D121452
2022-03-16 16:33:34 +00:00
Florian Hahn 5c4d64eb0d
[LV] Make reduction-order.ll test independent of instruction naming.
Also update test to not use branch on undef.
2022-03-15 11:13:18 +00:00
Florian Hahn 4a0481e981
[LV] Check for users of truncated IVs, add more detailed comment.
Add missing outside user check for truncated IVs. Also hoist the code in
the helper with additional explanations.

Fixes #54370.
2022-03-14 19:39:30 +00:00
Florian Hahn 1c0fc1f074
[VPlan] Ensure each iv user is only visited once in transform.
If a recipe has multiple uses of an IV, it was visited more than once,
leading to a crash, e.g. when building llvm-test-suite.

Exposed by 95f76bff1c.
2022-03-13 21:42:17 +00:00
Florian Hahn 95f76bff1c
[LV] Create & use VPScalarIVSteps for all scalar users.
This patch is a follow-up to D115953. It updates optimizeInductions
to also introduce new VPScalarIVStepsRecipes if an IV has both vector
and scalar uses.

It updates all uses that only need scalar values to use the newly
created recipe for the scalar steps.

This completes untangling of VPWidenIntOrFpInductionRecipe
code-generation. Now the recipe *only* creates the widened vector
values, as it says on the tin.

The code to generate IR has been moved directly to
VPWidenIntOrFpInductionRecipe::execute.

Note that the recipe has been updated to hold a reference to
ScalarEvolution, which is needed to expand the step, until we can place
the corresponding SCEV expansion in the pre-header.

Depends on D120827.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D120828
2022-03-13 17:15:24 +00:00
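The scalar values that such a recipe provides follow a simple formula. In this hedged sketch (a hypothetical helper, not the VPlan API), the value for unroll part `Part` and lane `Lane` of the vector iteration starting at scalar iteration `BaseIter` is `Start + (BaseIter + Part * VF + Lane) * Step`. Visiting parts and lanes in order reproduces the scalar induction sequence.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scalar IV value for unroll part Part and lane Lane,
// where BaseIter is the first scalar iteration covered by the current
// vector iteration.
std::int64_t scalarIVStep(std::int64_t Start, std::int64_t Step,
                          std::int64_t BaseIter, int VF, int Part, int Lane) {
  assert(Lane >= 0 && Lane < VF && Part >= 0);
  std::int64_t Iter = BaseIter + static_cast<std::int64_t>(Part) * VF + Lane;
  return Start + Iter * Step;
}
```

For example, with `Start = 7`, `Step = 3`, `VF = 4` and `BaseIter = 8`, part 1, lane 2 yields `7 + (8 + 4 + 2) * 3 = 49`.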