This patch updates VPWidenPHI recipes for first-order recurrences to
also track the incoming value from the back-edge. Similar to D99294,
which did the same for reductions.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D104197
The non-DOT printing does not include the successors of VPregionBlocks.
This patch use the same style for printing successors as for
VPBasicBlock.
I think the printing of successors could be a bit improved further, as
at the moment it is hard to ensure a check line matches all successors.
But that can be done as follow-up.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D103515
This allows cast/dyn_cast'ing from VPUser to recipes. This is needed
because there are VPUsers that are not recipes.
Reviewed By: gilr, a.elovikov
Differential Revision: https://reviews.llvm.org/D100257
This patch updates the code that sinks recipes required for first-order
recurrences to properly handle replicate-regions. At the moment, the
code would just move the replicate recipe out of its replicate-region,
producing an invalid VPlan.
When sinking a recipe in a replicate-region, we have to sink the whole
region. To do that, we first need to split the block at the target
recipe and move the region in between.
This patch also adds a splitAt helper to VPBasicBlock to split a
VPBasicBlock at a given iterator.
Fixes PR50009.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100751
This patch updates the code handling reduction recipes to also keep
track of the incoming value from the latch in the recipe. This is needed
to model the def-use chains completely in VPlan, so that it is possible
to replace the incoming value with an arbitrary VPValue.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D99294
As we gradually move more elements of LV to VPlan, we are trying to
reduce the number of places that still has to check IR of the original
loop.
This patch adjusts the code to fix cross iteration phis to get the PHIs
to fix directly from the VPlan that is executed. We still need the
original PHI to check for first-order recurrences, but we can get rid of
that once we model that explicitly in VPlan as well.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D99293
This patch introduces a helper to obtain an iterator range for the
PHI-like recipes in a block.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100101
As suggested in D99294, this adds a getVPSingleValue helper to use for
recipes that are guaranteed to define a single value. This replaces uses
of getVPValue() which used to default to I = 0.
When iterating over const blocks, the base type in the lambdas needs
to use const VPBlockBase *, otherwise it cannot be used with input
iterators over const VPBlockBase.
Also adjust the type of the input iterator range to const &, as it
does not take ownership of the input range.
This patch adds a blocksOnly helpers which take an iterator range
over VPBlockBase * or const VPBlockBase * and returns an interator
range that only include BlockTy blocks. The accesses are casted to
BlockTy.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D101093
This patch adds a new iterator to traverse through VPRegionBlocks and a
GraphTraits specialization using the iterator to traverse through
VPRegionBlocks.
Because there is already a GraphTraits specialization for VPBlockBase *
and co, a new VPBlockRecursiveTraversalWrapper helper is introduced.
This allows us to provide a new GraphTraits specialization for that
type. Users can use the new recursive traversal by using this wrapper.
The graph trait visits both the entry block of a region, as well as all
its successors. Exit blocks of a region implicitly have their parent
region's successors. This ensures all blocks in a region are visited
before any blocks in a successor region when doing a reverse post-order
traversal of the graph.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D100175
Rather than maintaining two separate values, a `float` for the per-lane
cost and a Width for the VF, maintain a single VectorizationFactor which
comprises the two and also removes the need for converting an integer value
to float.
This simplifies the query when asking if one VF is more profitable than
another when we want to extend this for scalable vectors (which may
require additional options to determine if e.g. a scalable VF of the
some cost, is more profitable than a fixed VF of the same cost).
The patch isn't entirely NFC because it also fixes an issue in
selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs
no longer truncates the floating-point cost from `float` to `unsigned` to
then perform the calculation on the truncated cost. It now does
a cost comparison with the correct precision.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D100121
Add an initial version of a helper to determine whether a recipe may
have side-effects.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D100259
Use SetVector instead of SmallPtrSet for external definitions created for VPlan.
Doing this can help avoid non-determinism caused by iterating over unordered containers.
This bug was found with reverse iteration turning on,
--extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON".
Failing LLVM-Unit test VPRecipeTest.dump.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D99544
I foresee two uses for this:
1) It's easier to use those in debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
LIT test would become too obscure. I can imagine that we'd want to CHECK
against VPlan dumps after multiple transformations instead. That would be
easier with plain text dumps than with DOT format.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96628
This reverts commit 6b053c9867.
The build is broken:
ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a
I foresee two uses for this:
1) It's easier to use those in debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
LIT test would become too obscure. I can imagine that we'd want to CHECK
against VPlan dumps after multiple transformations instead. That would be
easier with plain text dumps than with DOT format.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96628
Instead of maintaining a separate map from predicated instructions to
recipes, we can instead directly look at the VP operands. If the operand
comes from a predicated instruction, the operand will be a
VPPredInstPHIRecipe with a VPReplicateRecipe as its operand.
There are certain loops like this below:
for (int i = 0; i < n; i++) {
a[i] = b[i] + 1;
*inv = a[i];
}
that can only be vectorised if we are able to extract the last lane of the
vectorised form of 'a[i]'. For fixed width vectors this already works since
we know at compile time what the final lane is, however for scalable vectors
this is a different story. This patch adds support for extracting the last
lane from a scalable vector using a runtime determined lane value. I have
added support to VPIteration for runtime-determined lanes that still permit
the caching of values. I did this by introducing a new class called VPLane,
which describes the lane we're dealing with and provides interfaces to get
both the compile-time known lane and the runtime determined value. Whilst
doing this work I couldn't find any explicit tests for extracting the last
lane values of fixed width vectors so I added tests for both scalable and
fixed width vectors.
Differential Revision: https://reviews.llvm.org/D95139
Update the deletion order when destroying VPBasicBlocks. This ensures
recipes that depend on earlier ones in the block are removed first.
Otherwise this may cause issues when recipes have remaining users later
in the block.
This patch extends VPWidenPHIRecipe to manage pairs of incoming
(VPValue, VPBasicBlock) in the VPlan native path. This is made possible
because we now directly manage defined VPValues for recipes.
By keeping both the incoming value and block in the recipe directly,
code-generation in the VPlan native path becomes independent of the
predecessor ordering when fixing up non-induction phis, which currently
can cause crashes in the VPlan native path.
This fixes PR45958.
Reviewed By: sguggill
Differential Revision: https://reviews.llvm.org/D96773
Now that all state for generated instructions is managed directly in
VPTransformState, VPCallBack is no longer needed. This patch updates the
last use of `getOrCreateScalarValue` to instead manage the value
directly in VPTransformState and removes VPCallback.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D95383
This patch updates codegen to use VPValues to manage the generated
scalarized instructions.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D92285
The individual recipes have been updated to manage their operands using
VPUser a while back. Now that the transition is done, we can instead
make VPRecipeBase a VPUser and get rid of the toVPUser helper.
VP blocks keep track of a condition, which is a VPValue. This patch
updates VPBlockBase to manage the value using VPUser, so
replaceAllUsesWith properly updates the condition bit as well.
This is required to enable VP2VP transformations and it helps with
simplifying some of the code required to manage condition bits.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D95382
This patch updates some places where VectorLoopValueMap is accessed
directly to instead go through VPTransformState.
As we move towards managing created values exclusively in VPTransformState,
this ensures the use always can fetch the correct value.
This is in preparation for D92285, which switches to managing scalarized
values through VPValues.
In the future, the various fix* functions should be moved directly into
the VPlan codegen stage.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D95757
This patch updates the induction value creation to use VPValues of
recipes to map the created values. This should bring is one step closer
to being able to optimize induction recipes directly in VPlan.
Currently widenIntOrFpInduction also generates vector values for a cast
of the induction, if it exists. Make this explicit by adding the cast
instruction to the values defined by the recipe.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D92284
This patch adds constructors to VPIteration as a cleaner way of
initialising the struct and replaces existing constructions of
the form:
{Part, Lane}
with
VPIteration(Part, Lane)
I have also added a default constructor, which is used by VPlan.cpp
when deciding whether to replicate a block or not.
This refactoring will be required in a later patch that adds more
members and functions to VPIteration.
Differential Revision: https://reviews.llvm.org/D95676
I am trying to untangle the fast-math-flags propagation logic
in the vectorizers (see a6f022127 for SLP).
The loop vectorizer has a mix of checking FP function attributes,
IR-level FMF, and just wrong assumptions.
I am trying to avoid regressions while fixing this, and I think
the IR-level logic is good enough for that, but it's hard to say
for sure. This would be the 1st step in the clean-up.
The existing test that I changed to include 'fast' actually shows
a miscompile: the function only had the equivalent of nnan, but we
created new instructions that had fast (all FMF set). This is
similar to the example in https://llvm.org/PR35538
Differential Revision: https://reviews.llvm.org/D95452
This patch unifies the way recipes and VPValues are printed after the
transition to VPDef.
VPSlotTracker has been updated to iterate over all recipes and all
their defined values to number those. There is no need to number
values in Value2VPValue.
It also updates a few places that only used slot numbers for
VPInstruction. All recipes now can produce numbered VPValues.
Similar to D92129, update VPWidenPHIRecipe to manage the start value as
VPValue. This allows adjusting the start value as a VPlan transform,
which will be used in a follow-up patch to support reductions during
epilogue vectorization.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D93975
This was suggested to prepare for D93975.
By moving the start value creation to widenPHInstruction, we set the
stage to manage the start value directly in VPWidenPHIRecipe, which be
used subsequently to set the 'resume' value for reductions during
epilogue vectorization.
It also moves RdxDesc to the recipe, so we do not have to rely on Legal
to look it up later.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D94175
The new test case here contains a first order recurrences and an
instruction that is replicated. The first order recurrence forces an
instruction to be sunk _into_, as opposed to after the replication
region. That causes several things to go wrong including registering
vector instructions multiple times and failing to create dominance
relations correctly.
Instead we should be sinking to after the replication region, which is
what this patch makes sure happens.
Differential Revision: https://reviews.llvm.org/D93629
This patch updates VPWidenIntOrFpInductionRecipe to hold the start value
for the induction variable. This makes the start value explicit and
allows for adjusting the start value for a VPlan.
The flexibility will be used in further patches.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D92129
This patch turns updates VPInstruction to manage the value it defines
using VPDef. The VPValue is used during VPlan construction and
codegeneration instead of the plain IR reference where possible.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90565
This patch makes VPRecipeBase a direct subclass of VPDef, moving the
SubclassID to VPDef.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90564
This patch turns updates VPInterleaveRecipe to manage the values it defines
using VPDef. The VPValue is used during VPlan construction and
codegeneration instead of the plain IR reference where possible.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90562
This patch turns updates VPWidenSelectRecipe to manage the value
it defines using VPDef.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90560
This patch turns updates VPWidenGEPRecipe to manage the value it defines
using VPDef. The VPValue is used during VPlan construction and
codegeneration instead of the plain IR reference where possible.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90561
This patch turns updates VPWidenREcipe to manage the value it defines
using VPDef.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90559
This patch updates VPWidenMemoryInstructionRecipe to use VPDef
to manage the value it produces instead of inheriting from VPValue.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90563
* Steps are scaled by `vscale`, a runtime value.
* Changes to circumvent the cost-model for now (temporary)
so that the cost-model can be implemented separately.
This can vectorize the following loop [1]:
void loop(int N, double *a, double *b) {
#pragma clang loop vectorize_width(4, scalable)
for (int i = 0; i < N; i++) {
a[i] = b[i] + 1.0;
}
}
[1] This source-level example is based on the pragma proposed
separately in D89031. This patch only implements the LLVM part.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91077