Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.
Ordered fadd reductions are not yet supported.
Differential Revision: https://reviews.llvm.org/D110235
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.
The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.
Differential Revision: https://reviews.llvm.org/D124358
'Widen' recipe are only used when actual vector values are generated.
Fix tryToWidenCall to do not create VPWidenCallRecipes for scalar vector
factors.
This was exposed by D123720, because the widened recipes are considered
vector users.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D124718
Similar to c515b2f39e, If there are no loops in the function as seen
through LI, we should avoid computing the remaining expensive analyses
(such as SCEV, BPI). Reordered the analyses requests and early return
if there are no loops.
The logic of avoiding expensive analyses is applied to LoopVectorizer,
LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate
on loops.
This is an NFC with compile time improvement.
Differential Revision: https://reviews.llvm.org/D124529
The name CountRoundDown is potentially misleading, as the number of
iterations can be rounded up when folding the tail.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D119681
This patch extends the scope of VPlan to also include the exit (aka
middle) block.
For now, the exit block remains empty, but handling of exit values will
subsequently be moved to VPlan, by adding recipes to model exit values
in the exit block.
As a first step, this will allow fixing #51366.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D123457
This patch moves SCEV expansion of steps used by
VPWidenIntOrFpInductionRecipes to the pre-header using
VPExpandSCEVRecipe. This ensures that those steps are expanded while the
CFG is in a valid state. Previously, SCEV expansion may happen during
vector body code-generation, during which the CFG may be invalid,
causing issues with SCEV expansion.
Depends on D122095.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D122096
This addresses an existing TODO by keeping a mapping of external IR
Value * definitions wrapped in VPValues for use in a VPlan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D123700
This patch extends the scope of VPlan to also model the pre-header.
The pre-header can be used to place recipes that should be code-gen'd
outside the loop, like SCEV expansion.
Depends on D121623.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121624
This fixes the code to actually use the location of the instruction, if
available. Previously, SetInsertPoint would overwrite the insert point
set from the instruction.
When creating induction resume values, SCEV queries may rely on
LoopInfo. Make sure vector.body gets added to the loop of the pre-header
during skeleton construction.
%vector.body will be moved to the vector preheader during VPlan
execution.
Fixes#54745.
This was exposed by 14e3650f. The recommit of 14e3650f will hit the
problematic code path requiring the workaround.
test case that crashes without the workaround.
During skeleton construction for the epilogue vector loop, generic
helpers use getOrCreateTripCount, which will re-expand the trip count
computation. Instead, re-use the TripCount created during main loop
vectorization.
This reverts the revert commit 2760cdc9c6.
This version pulls in the code to create the vector loop object in VPlan
from D121624.
This is needed because otherwise existing LoopInfo verification will
fail, as a loop block doesn't have in-loop successors now that we
do not replace the branch.
Now that we do not add new loops during skeleton construction, there's
also no need to verify LI there.
The only remaining use was to get the exit block of the loop. Instead of
relying on the loop, use the successor of VectorHeaderBB
(LoopMiddleBlock) directly to set VPTransformState::CFG::ExitB
Depends on D121621.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121623
When MaximizeVectorBandwidth is enabled, we can end up (via calls to
collectUniformsAndScalars/setCostBasedWideningDecision through
calculateRegisterUsage) making widening decisions before we have decided
whether to fold the tail by masking. These decisions will be wrong if we
later decided to fold the tail, for example when the trip count is very
low. It will use incorrect costs for loads that should get masked, using
standard memory operation costs instead.
This still at the moment uses the EmulatedMaskMemRefHack costs (a bit
unfortunately), but the old costs without this change were 1, leading to
too optimistic vectorization.
This slightly changes the way that the MaximizeVectorBandwidth option
works to make it easier to test, always honouring the option if it is
set.
Differential Revision: https://reviews.llvm.org/D120215
In some case, like in the added test case, we can reach
selectInterleaveCount with loops that actually have a cost of 0.
Unfortunately a loop cost of 0 is also used to communicate that the cost
has not been computed yet. To resolve the crash, bail out if the cost
remains zero after computing it.
This seems like the best option, as there are multiple code paths that
return a cost of 0 to force a computation in selectInterleaveCount.
Computing the cost at multiple places up front there would unnecessarily
complicate the logic.
Fixes#54413.
This patch moves the code to set the correct incoming block for the
backedge value to VPlan::execute.
When generating the phi node, the backedge value is temporarily added
using the pre-header as incoming block. The invalid phi node will be
fixed up during VPlan::execute after main VPlan code generation.
At the same time, the backedge value is also moved to the latch.
This change removes the requirement to create the latch block up-front
for VPWidenInductionPHIRecipe::execute, which in turn will enable
modeling the pre-header in VPlan.
Depends on D121617.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121618
This patch moves the code to set the correct incoming block for the
backedge value to VPlan::execute.
When generating the phi node, the backedge value is temporarily added
using the pre-header as incoming block. The invalid phi node will be
fixed up during VPlan::execute after main VPlan code generation.
At the same time, the backedge value is also moved to the latch.
This change removes the requirement to create the latch block up-front
for VPWidenIntOrFpInductionRecipe::execute, which in turn will enable
modeling the pre-header in VPlan.
As an alternative, the increment could be modeled as separate recipe,
but that would require more work and a bit of redundant code, as we need
to create the step-vector during VPWidenIntOrFpInductionRecipe::execute
anyways, to create the values for different parts.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121617
Update all places that currently assume the entry block to the plan is
also the vector loop header to use getVectorLoopRegion instead.
getVectorLoopRegion will keep doing the right thing when the pre-header
is modeled explicitly (and becomes the new entry block in the plan).
This patch moves pointer induction handling from VPWidenPHIRecipe to its
own recipe. In the process, it adds all information required to generate
code for pointer inductions without relying on Legal to access the list
of induction phis.
Alternatively VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121615
createInductionResumeValues only uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
completeLoopSkeleton only uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
Update functions that previously took a loop pointer but only to get the
pre-header. Instead, pass the block directly. This removes the
requirement for the loop object to be created up-front.
This uses the existing VPlan helpers to check whether there are scalar
uses of a phi recipe. It remove one of the few remaining dependencies on
the cost model from VPlan code generation.
Depends on D121612.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121613
This patch ensures scalars (except for uniforms) are no
longer collected (prior to LVP planning phase) for
scalable vectorization.
This is to avoid the chances of generating scalarized
instructions later (during LVP execute phase) as they
are not supported for scalable vectorization.
Relevant test has also been added.
Differential Revision: https://reviews.llvm.org/D121452
Update places still referencing LoopVectorBody to use the vector loop to
get the vector loop header. This is needed to move vector loop
code-generation to VPlan completely, which in turn is needed to model
pre-header & exit blocks in VPlan as well.
The insertion point for the builder used during VPlan code generation is
set during code generation. Setting the insert point here is dead code
and can be removed.
This patch is a follow-up to D115953. It updates optimizeInductions
to also introduce new VPScalarIVStepsRecipes if an IV has both vector
and scalar uses.
It updates all uses that only need scalar values to use the newly
created recipe for the scalar steps.
This completes untangling of VPWidenIntOrFpInductionRecipe
code-generation. Now the recipe *only* creates the widened vector
values, as it says on the tin.
The code to genereate IR has been moved directly to
VPWidenIntOrFpInductionRecipe::execute.
Note that the recipe has been updated to hold a reference to
ScalarEvolution, which is needed to expand the step, until we can place
the corresponding SCEV expansion in the pre-header.
Depends on D120827.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D120828
The analysis passes output function name encapsulated in `'` braces,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV costmodel test checks.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D121105
Instead of relying on underlying instructions, this patch updates
VPScalarIVStepsRecipe to only store the required type information.
This removes access to unrelated information, as well as avoiding issues
with the same underlying instruction being shared by multiple recipes.
This change should only change the debug output and not cause any
codegen changes, hence NFCI.
This reverts the revert commit ff93260bf6.
The underlying issue causing the PPC bot failures has been fixed in
cbaac14734 and a corresponding test case has been added in
ad2cad1c52.
Original message:
This patch adds a new VPScalarIVStepsRecipe to handle building scalar
steps.
In the first patch, it only handles the case where there is no vector
induction variable needed.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D115953