This patch is part of a series of patches that migrate integer
instruction costs to use InstructionCost. In the function
selectVectorizationFactor I have simply asserted that the cost
is valid and extracted the value as is. In future we expect
to encounter invalid costs, but we should filter out those
vectorization factors that lead to such invalid costs.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential Revision: https://reviews.llvm.org/D92178
A severe compile-time slowdown from this call is noted in:
https://llvm.org/PR48689
My naive fix was to put it under LLVM_DEBUG ( 267ff79 ),
but that's not limiting in the way we want.
This is a quick fix (or we could just remove the call completely
and rely on some later pass to discover potentially wrong IR?).
A bigger/better fix would be to improve/limit verifyFunction()
as noted in:
https://llvm.org/PR47712
Differential Revision: https://reviews.llvm.org/D94328
Similar to D92129, update VPWidenPHIRecipe to manage the start value as
VPValue. This allows adjusting the start value as a VPlan transform,
which will be used in a follow-up patch to support reductions during
epilogue vectorization.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D93975
This was suggested to prepare for D93975.
By moving the start value creation to widenPHInstruction, we set the
stage to manage the start value directly in VPWidenPHIRecipe, which be
used subsequently to set the 'resume' value for reductions during
epilogue vectorization.
It also moves RdxDesc to the recipe, so we do not have to rely on Legal
to look it up later.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D94175
As noted in PR48689, the verifier may have some kind
of exponential behavior that should be addressed
separately. For now, only run it in debug mode to
prevent problems for release+asserts.
That limit is what we had before D80401, and I'm
not sure if there was a reason to change it in that
patch.
In the following loop:
void foo(int *a, int *b, int N) {
for (int i=0; i<N; ++i)
a[i + 4] = a[i] + b[i];
}
The loop dependence constrains the VF to a maximum of (4, fixed), which
would mean using <4 x i32> as the vector type in vectorization.
Extending this to scalable vectorization, a VF of (4, scalable) implies
a vector type of <vscale x 4 x i32>. To determine if this is legal
vscale must be taken into account. For this example, unless
max(vscale)=1, it's unsafe to vectorize.
For SVE, the number of bits in an SVE register is architecturally
defined to be a multiple of 128 bits with a maximum of 2048 bits, thus
the maximum vscale is 16. In the loop above it is therefore unfeasible
to vectorize with SVE. However, in this loop:
void foo(int *a, int *b, int N) {
#pragma clang loop vectorize_width(X, scalable)
for (int i=0; i<N; ++i)
a[i + 32] = a[i] + b[i];
}
As long as max(vscale) multiplied by the number of lanes 'X' doesn't
exceed the dependence distance, it is safe to vectorize. For SVE a VF of
(2, scalable) is within this constraint, since a vector of <16 x 2 x 32>
will have no dependencies between lanes. For any number of lanes larger
than this it would be unsafe to vectorize.
This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs
specified as loop hints, implementing the following behaviour:
* If the backend does not support scalable vectors, ignore the hint.
* If scalable vectorization is unfeasible given the loop
dependence, like in the first example above for SVE, then use a
fixed VF.
* Accept scalable VFs if it's safe to do so.
* Otherwise, clamp scalable VFs that exceed the maximum safe VF.
Reviewed By: sdesmalen, fhahn, david-arm
Differential Revision: https://reviews.llvm.org/D91718
The new test case here contains a first order recurrences and an
instruction that is replicated. The first order recurrence forces an
instruction to be sunk _into_, as opposed to after the replication
region. That causes several things to go wrong including registering
vector instructions multiple times and failing to create dominance
relations correctly.
Instead we should be sinking to after the replication region, which is
what this patch makes sure happens.
Differential Revision: https://reviews.llvm.org/D93629
After merging the shuffles, we cannot rely on the previous shuffle
anymore and need to shrink the final shuffle, if it is required.
Reported in D92668
Differential Revision: https://reviews.llvm.org/D93967
Similar to 5a1d31a28 -
This should be no-functional-change because the reduction kind
opcodes are 1-for-1 mappings to the instructions we are matching
as reductions. But we want to remove the need for the
`OperationData` opcode field because that does not work when
we start matching intrinsics (eg, maxnum) as reduction candidates.
This patch updates VPWidenIntOrFpInductionRecipe to hold the start value
for the induction variable. This makes the start value explicit and
allows for adjusting the start value for a VPlan.
The flexibility will be used in further patches.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D92129
This patch adds a new getLiveInIRValue accessor to VPValue, which
returns the underlying value, if the VPValue is defined outside of
VPlan. This is required to handle scalars in VPTransformState, which
requires dealing with scalars defined outside of VPlan.
We can simply check VPValue::Def to determine if the value is defined
inside a VPlan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D92281
This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef.
This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector.
The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ).
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D94061
This should be no-functional-change because the reduction kind
opcodes are 1-for-1 mappings to the instructions we are matching
as reductions. But we want to remove the need for the
`OperationData` opcode field because that does not work when
we start matching intrinsics (eg, maxnum) as reduction candidates.
SLP tries to model 2 forms of vector reductions: pairwise and splitting.
From the cost model code comments, those are defined using an example as:
/// Pairwise:
/// (v0, v1, v2, v3)
/// ((v0+v1), (v2+v3), undef, undef)
/// Split:
/// (v0, v1, v2, v3)
/// ((v0+v2), (v1+v3), undef, undef)
I don't know the full history of this functionality, but it was partly
added back in D29402. There are apparently no users at this point (no
regression tests change). X86 might have managed to work-around the need
for this through cost model and codegen improvements.
Removing this code makes it easier to continue the work that was started
in D87416 / D88193. The alternative -- if there is some target that is
silently using this option -- is to move this logic into LoopUtils. We
have related/duplicate functionality there via llvm::createTargetReduction().
Differential Revision: https://reviews.llvm.org/D93860
Creating in-loop reductions relies on IR references to map
IR values to VPValues after interleave group creation.
Make sure we re-add the updated member to the plan, so the look-ups
still work as expected
This fixes a crash reported after D90562.
While here, rename the inaccurate getRecurrenceBinOp()
because that was also used to get CmpInst opcodes.
The recurrence/reduction kind should always refer to the
expected opcode for a reduction. SLP appears to be the
only direct caller of createSimpleTargetReduction(), and
that calling code ideally should not be carrying around
both an opcode and a reduction kind.
This should allow us to generalize reduction matching to
use intrinsics instead of only binops.
This is almost all mechanical search-and-replace and
no-functional-change-intended (NFC). Having a single
enum makes it easier to match/reason about the
reduction cases.
The goal is to remove `Opcode` from reduction matching
code in the vectorizers because that makes it harder to
adapt the code to handle intrinsics.
The code in RecurrenceDescriptor::AddReductionVar() is
the only place that required closer inspection. It uses
a RecurrenceDescriptor and a second InstDesc to sometimes
overwrite part of the struct. It seem like we should be
able to simplify that logic, but it's not clear exactly
which cmp+sel patterns that we are trying to handle/avoid.
If DoExtraAnalysis is true (e.g. because remarks are enabled), we
continue with the analysis rather than exiting. Update code to
conditionally check if the ExitBB has phis or not a single predecessor.
Otherwise a nullptr is dereferenced with DoExtraAnalysis.
I don't know if there's some way this changes what the vectorizers
may produce for reductions, but I have added test coverage with
3567908 and 5ced712 to show that both passes already have bugs in
this area. Hopefully this does not make things worse before we can
really fix it.
I'm not sure if the SLP enum was created before the IVDescriptor
RecurrenceDescriptor / RecurrenceKind existed, but the code in
SLP is now redundant with that class, so it just makes things
more complicated to have both. We eventually call LoopUtils
createSimpleTargetReduction() to create reduction ops, so we
might as well standardize on those enum names.
There's still a question of whether we need to use TTI::ReductionFlags
vs. MinMaxRecurrenceKind, but that can be another clean-up step.
Another option would just be to flatten the enums in RecurrenceDescriptor
into a single enum. There isn't much benefit (smaller switches?) to
having a min/max subset.
This reverts commit 4ffcd4fe9a thus restoring e4df6a40da.
The only change from the original patch is to add "llvm::" before the call to empty(iterator_range). This is a speculative fix for the ambiguity reported on some builders.
This patch is a major step towards supporting multiple exit loops in the vectorizer. This patch on it's own extends the loop forms allowed in two ways:
single exit loops which are not bottom tested
multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later)
The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts.
The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration.
The existing code already had the notion of requiring one iteration in the scalar epilogue, this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical).
Differential Revision: https://reviews.llvm.org/D93317
Previously the branch from the middle block to the scalar preheader & exit
was being set-up at the end of skeleton creation in completeLoopSkeleton.
Inserting SCEV or runtime checks may result in LCSSA phis being created,
if they are required. Adjusting branches afterwards may break those
PHIs.
To avoid this, we can instead create the branch from the middle block
to the exit after we created the middle block, so we have the final CFG
before potentially adjusting/creating PHIs.
This fixes a crash for the included test case. For the non-crashing
case, this is almost a NFC with respect to the generated code. The
only change is the order of the predecessors of the involved branch
targets.
Note an assertion was moved from LoopVersioning() to
LoopVersioning::versionLoop. Adjusting the branches means loop-simplify
form may be broken before constructing LoopVersioning. But LV only uses
LoopVersioning to annotate the loop instructions with !noalias metadata,
which does not require loop-simplify form.
This is a fix for an existing issue uncovered by D93317.
I am hoping to extend the reduction matching code, and it is
hard to distinguish "ReductionData" from "ReducedValueData".
So extend the tree/root metaphor to include leaves.
Another problem is that the name "OperationData" does not
provide insight into its purpose. I'm not sure if we can alter
that underlying data structure to make the code clearer.
I think this is NFC currently, but the bug would be exposed
when we allow binary intrinsics (maxnum, etc) as candidates
for reductions.
The code in matchAssociativeReduction() is using
OperationData::getNumberOfOperands() when comparing whether
the "EdgeToVisit" iterator is in-bounds, so this code must
use the same (potentially offset) operand value to set
the "EdgeToVisit".
ScalarEvolution should be able to handle both constant and variable trip
counts using getURemExpr, so we do not have to handle them separately.
This is a small simplification of a56280094e.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D93677
This patch turns updates VPInstruction to manage the value it defines
using VPDef. The VPValue is used during VPlan construction and
codegeneration instead of the plain IR reference where possible.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90565
When the trip-count is provably divisible by the maximal/chosen VF, folding the
loop's tail during vectorization is redundant. This commit extends the existing
test for constant trip-counts to any trip-count known to be divisible by
maximal/selected VF by SCEV.
Differential Revision: https://reviews.llvm.org/D93615
This patch makes VPRecipeBase a direct subclass of VPDef, moving the
SubclassID to VPDef.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90564
This patch turns updates VPInterleaveRecipe to manage the values it defines
using VPDef. The VPValue is used during VPlan construction and
codegeneration instead of the plain IR reference where possible.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90562
An earlier patch introduced asserts that the InstructionCost is
valid because at that time the ReuseShuffleCost variable was an
unsigned. However, now that the variable is an InstructionCost
instance the asserts can be removed.
See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
See this patch for the introduction of the type:
https://reviews.llvm.org/D91174
This is an enhancement motivated by https://llvm.org/PR16739
(see D92858 for another).
We can look through a GEP to find a base pointer that may be
safe to use for a vector load. If so, then we shuffle (shift)
the necessary vector element over to index 0.
Alive2 proof based on 1 of the regression tests:
https://alive2.llvm.org/ce/z/yPJLkh
The vector translation is independent of endian (verify by
changing to leading 'E' in the datalayout string).
Differential Revision: https://reviews.llvm.org/D93229
Here's another minimal step suggested by D93229 / D93397 .
(I'm trying to be extra careful in these changes because
load transforms are easy to get wrong.)
We can optimistically choose the greater alignment of a
load and its pointer operand. As the test diffs show, this
can improve what would have been unaligned vector loads
into aligned loads.
When we enhance with gep offsets, we will need to adjust
the alignment calculation to include that offset.
Differential Revision: https://reviews.llvm.org/D93406
As discussed in D93229, we only need a minimal alignment constraint
when querying whether a hypothetical vector load is safe. We still
pass/use the potentially stronger alignment attribute when checking
costs and creating the new load.
There's already a test that changes with the minimum code change,
so splitting this off as a preliminary commit independent of any
gep/offset enhancements.
Differential Revision: https://reviews.llvm.org/D93397
This patch changes the type of cost variables (for instance: Cost, ExtractCost,
SpillCost) to use InstructionCost.
This patch also changes the type of cost variables to InstructionCost in other
functions that use the result of getTreeCost()
This patch is part of a series of patches to use InstructionCost instead of
unsigned/int for the cost model functions.
See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Depends on D91174
Differential Revision: https://reviews.llvm.org/D93049
Given we haven't yet enabled multiple exiting blocks, this is currently non functional, but it's an obvious extension which cleans up a later patch.
I don't think this is worth review (as it's pretty obvious), if anyone disagrees, feel feel to revert or comment and I will.
This should be purely non-functional. When touching this code for another reason, I found the handling of the PredicateOrDontVectorize piece here very confusing. Let's make it an explicit state (instead of an implicit combination of two variables), and use early return for options/hint processing.
This patch turns updates VPWidenSelectRecipe to manage the value
it defines using VPDef.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90560
This patch turns updates VPWidenGEPRecipe to manage the value it defines
using VPDef. The VPValue is used during VPlan construction and
codegeneration instead of the plain IR reference where possible.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90561
This patch turns updates VPWidenREcipe to manage the value it defines
using VPDef.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90559
As noted in D93229, the transform from scalar load to vector load
potentially leaks poison from the extra vector elements that are
being loaded.
We could use freeze here (and x86 codegen at least appears to be
the same either way), but we already have a shuffle in this logic
to optionally change the vector size, so let's allow that
instruction to serve both purposes.
Differential Revision: https://reviews.llvm.org/D93238
D82227 has added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.
This change reverts PHI handling from D82227 and replaces it with
a more general check in SLPVectorizerPass::tryToVectorizeList().
Moved to tryToVectorizeList() it allows to restart vectorization
if initial chunk fails.
However, this function is more general and handles not only PHI
but everything which SLP handles. If vectorization factor would
be limited to maximum vector register size it would limit much
more vectorization than before leading to further regressions.
Therefore a new TTI callback getMaximumVF() is added with the
default 0 to preserve current behavior and limit nothing. Then
targets can decide what is better for them.
The callback gets ElementSize just like a similar getMinimumVF()
function and the main opcode of the chain. The latter is to avoid
regressions at least on the AMDGPU. We can have loads and stores
up to 128 bit wide, and <2 x 16> bit vector math on some
subtargets, where the rest shall not be vectorized. I.e. we need
to differentiate based on the element size and operation itself.
Differential Revision: https://reviews.llvm.org/D92059
This patch updates VPWidenMemoryInstructionRecipe to use VPDef
to manage the value it produces instead of inheriting from VPValue.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90563
Vector element size could be different for different store chains.
This patch prevents wrong computation of maximum number of elements
for that case.
Differential Revision: https://reviews.llvm.org/D93192
When it comes to the scalar cost of any predicated block, the loop
vectorizer by default regards this predication as a sign that it is
looking at an if-conversion and divides the scalar cost of the block by
2, assuming it would only be executed half the time. This however makes
no sense if the predication has been introduced to tail predicate the
loop.
Original patch by Anna Welker
Differential Revision: https://reviews.llvm.org/D86452
This is the first in a series of patches that attempts to migrate
existing cost instructions to return a new InstructionCost class
in place of a simple integer. This new class is intended to be
as light-weight and simple as possible, with a full range of
arithmetic and comparison operators that largely mirror the same
sets of operations on basic types, such as integers. The main
advantage to using an InstructionCost is that it can encode a
particular cost state in addition to a value. The initial
implementation only has two states - Normal and Invalid - but these
could be expanded over time if necessary. An invalid state can
be used to represent an unknown cost or an instruction that is
prohibitively expensive.
This patch adds the new class and changes the getInstructionCost
interface to return the new class. Other cost functions, such as
getUserCost, etc., will be migrated in future patches as I believe
this to be less disruptive. One benefit of this new class is that
it provides a way to unify many of the magic costs in the codebase
where the cost is set to a deliberately high number to prevent
optimisations taking place, e.g. vectorization. It also provides
a route to represent the extremely high, and unknown, cost of
scalarization of scalable vectors, which is not currently supported.
Differential Revision: https://reviews.llvm.org/D91174
This is an enhancement to load vectorization that is motivated by
a pattern in https://llvm.org/PR16739.
Unfortunately, it's still not enough to make a difference there.
We will have to handle multi-use cases in some better way to avoid
creating multiple overlapping loads.
Differential Revision: https://reviews.llvm.org/D92858
For stores chain vectorization we choose the size of vector
elements to ensure we fit to minimum and maximum vector register
size for the number of elements given. This patch corrects vector
element size choosing the width of value truncated just before
storing instead of the width of value stored.
Fixes PR46983
Differential Revision: https://reviews.llvm.org/D92824
* Steps are scaled by `vscale`, a runtime value.
* Changes to circumvent the cost-model for now (temporary)
so that the cost-model can be implemented separately.
This can vectorize the following loop [1]:
void loop(int N, double *a, double *b) {
#pragma clang loop vectorize_width(4, scalable)
for (int i = 0; i < N; i++) {
a[i] = b[i] + 1.0;
}
}
[1] This source-level example is based on the pragma proposed
separately in D89031. This patch only implements the LLVM part.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91077
This patch removes a number of asserts that VF is not scalable, even though
the code where this assert lives does nothing that prevents VF being scalable.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91060
It is possible to merge reuse and reorder shuffles and reduce the total
cost of the ivectorization tree/number of final instructions.
Differential Revision: https://reviews.llvm.org/D92668
The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2) where N was the number of instructions in a loop, and had restricted operand Value types to reduce the size of use lists.
This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice, once on first scan, once in the use list of *it's* operand. Only instructions within the loop have their uses scanned.)
In the process, we remove a restriction which required the operand of the uniform mem op to itself be an instruction. This allows detection of uniform mem ops involving global addresses.
Differential Revision: https://reviews.llvm.org/D92056
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.
Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D89566
This might be a small improvement in readability, but the
real motivation is to make it easier to adapt the code to
deal with intrinsics like 'maxnum' and/or integer min/max.
There is potentially help in doing that with D92086, but
we might also just add specialized wrappers here to deal
with the expected patterns.
In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
...
!2 = !{!2, !3, !4}
!3 = !{!"llvm.loop.vectorize.width", i32 8}
!4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.
Differential Revision: https://reviews.llvm.org/D88962
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.
Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D89566
In the following loop the dependence distance is 2 and can only be
vectorized if the vector length is no larger than this.
void foo(int *a, int *b, int N) {
#pragma clang loop vectorize(enable) vectorize_width(4)
for (int i=0; i<N; ++i) {
a[i + 2] = a[i] + b[i];
}
}
However, when specifying a VF of 4 via a loop hint this loop is
vectorized. According to [1][2], loop hints are ignored if the
optimization is not safe to apply.
This patch introduces a check to bail of vectorization if the user
specified VF is greater than the maximum feasible VF, unless explicitly
forced with '-force-vector-width=X'.
[1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
[2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
Reviewed By: sdesmalen, fhahn, Meinersbur
Differential Revision: https://reviews.llvm.org/D90687
This patch replaces the attribute `unsigned VF` in the class
IntrinsicCostAttributes by `ElementCount VF`.
This is a non-functional change to help upcoming patches to compute the cost
model for scalable vector inside this class.
Differential Revision: https://reviews.llvm.org/D91532
Instruction ExtractValue wasn't handled in
LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled
as a mul which is not really accurate. Since it is free (most of the times),
this now gets a cost of 0 using getInstructionCost.
This is a follow-up of D92208, that required changing this regression test.
In a follow up I will look at InsertValue which also isn't handled yet.
Differential Revision: https://reviews.llvm.org/D92317
VPPredInstPHIRecipe is one of the recipes that was missed during the
initial conversion. This patch adjusts the recipe to also manage its
operand using VPUser.
Interleave groups also depend on the values they store. Manage the
stored values as VPUser operands. This is currently a NFC, but is
required to allow VPlan transforms and to manage generated vector values
exclusively in VPTransformState.
Update VPReplicateRecipe to inherit from VPValue. This still does not
update scalarizeInstruction to set the result for the VPValue of
VPReplicateRecipe, because this first requires tracking scalar values in
VPTransformState.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D91500
MaxSafeRegisterWidth is a misnomer since it actually returns the maximum
safe vector width. Register suggests it relates directly to a physical
register where it could be a vector spanning one or more physical
registers.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D91727
This is a follow-up to 00a6601136 to make
isa<VPReductionRecipe> work and unifies the VPValue ID names, by making
sure they all consistently start with VPV*.
Similar to other patches, this makes VPWidenRecipe a VPValue. Because of
the way it interacts with the reduction code it also slightly alters the
way that VPValues are registered, removing the up front NeedDef and
using getOrAddVPValue to create them on-demand if needed instead.
Differential Revision: https://reviews.llvm.org/D88447