Cost-modeling decisions are tied to the compute interleave groups
(widening decisions, scalar and uniform values). When invalidating the
interleave groups, those decisions also need to be invalidated.
Otherwise there is a mis-match during VPlan construction.
VPWidenMemoryRecipes created initially are left around w/o converting them
into VPInterleave recipes. Such a conversion indeed should not take place,
and these gather/scatter recipes may in fact be right. The crux is leaving around
obsolete CM_Interleave (and dependent) markings of instructions along with
their costs, instead of recalculating decisions, costs, and recipes.
Alternatively to forcing a complete recompute later on, we could try
to selectively invalidate the decisions connected to the interleave
groups. But we would likely need to run the uniform/scalar value
detection parts again anyways and the extra complexity is probably not
worth it.
Fixes PR45572.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78298
After introducing VPWidenSelectRecipe, the duplicated logic can be
shared.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D77973
Handling LoadInst and StoreInst in tryToWiden seems a bit
counter-intuitive, as there is only an assertion for them and in no
case VPWidenRefipes are created for them.
I think it makes sense to move the assertion to handleReplication, where
the non-widened loads and store are handled.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D77972
Fix an assert introduced in 41ed5d856c1: a phi with a single predecessor and a
mask is a valid case which is already supported by the code.
Differential Revision: https://reviews.llvm.org/D78115
Summary:
Currently, the internal options -vectorize-loops, -vectorize-slp, and
-interleave-loops do not have much practical effect. This is because
they are used to initialize the corresponding flags in the pass
managers, and those flags are then unconditionally overwritten when
compiling via clang or via LTO from the linkers. The only exception was
-vectorize-loops via opt because of some special hackery there.
While vectorization could still be disabled when compiling via clang,
using -fno-[slp-]vectorize, this meant that there was no way to disable
it when compiling in LTO mode via the linkers. This only affected
ThinLTO, since for regular LTO vectorization is done during the compile
step for scalability reasons. For ThinLTO it is invoked in the LTO
backends. See also the discussion on PR45434.
This patch makes it so the internal options can actually be used to
disable these optimizations. Ultimately, the best long term solution is
to mark the loops with metadata (similar to the approach used to fix
-fno-unroll-loops in D77058), but this enables a shorter term
workaround, and actually makes these internal options useful.
I constant propagated the initial values of these internal flags into
the pass manager flags (for some reasons vectorize-loops and
interleave-loops were initialized to true, while vectorize-slp was
initialized to false). As mentioned above, they are overwritten
unconditionally so this doesn't have any real impact, and these initial
values aren't particularly meaningful.
I then changed the passes to check the internl values and return without
performing the associated optimization when false (I changed the default
of -vectorize-slp to true so the options behave similarly). I was able
to remove the hackery in opt used to get -vectorize-loops=false to work,
as well as a special option there used to disable SLP vectorization.
Finally, I changed thinlto-slp-vectorize-pm.c to:
a) Only test SLP (moved the loop vectorization checking to a new test).
b) Use code that is slp vectorized when it is enabled, and check that
instead of whether the pass is enabled.
c) Test the new behavior of -vectorize-slp.
d) Test both pass managers.
The loop vectorization (and associated interleaving) testing I moved to
a new thinlto-loop-vectorize-pm.c test, with several changes:
a) Changed the flags on the interleaving testing so that it will
actually interleave, and check that.
b) Test the new behavior of -vectorize-loops and -interleave-loops.
c) Test both pass managers.
Reviewers: fhahn, wmi
Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, davezarzycki, llvm-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77989
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: rriddle, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77259
Pass from the calling recipe the interleave group itself instead of passing the
group's insertion position and having the function query CM for its interleave
group and making sure that given instruction is the insertion point of.
Differential Revision: https://reviews.llvm.org/D78002
Widening a selects depends on whether the condition is loop invariant or
not. Rather than checking during codegen-time, the information can be
recorded at the VPlan construction time.
This was suggested as part of D76992, to reduce the reliance on
accessing the original underlying IR values.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D77869
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces VPValues for VPBlendRecipe to use as the values to
blend. The recipe is generated with VPValues wrapping the phi's incoming values
of the scalar phi. This reduces ingredient def-use usage by ILV as a step
towards full VPlan-based def-use relations.
Differential Revision: https://reviews.llvm.org/D77539
Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector
induction for use in fold-tail-with-masking, if a primary induction is absent.
The canonical scalar IV having start = 0 and step = VF*UF, created during code
-gen to control the vector loop, is widened into a canonical vector IV having
start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF} and
step = <VF*UF, VF*UF, ..., VF*UF>.
Differential Revision: https://reviews.llvm.org/D77635
When building a VPlan, BasicBlock::instructionsWithoutDebug() is used to
iterate over the instructions in a block. This means that no recipes
should be created for debug info intrinsics already and we can turn the
early exit into an assertion.
Reviewers: Ayal, gilr, rengolin, aprantl
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D77636
This patch adds VPValue versions for the arguments of the call to
VPWidenCallRecipe and uses them during code-generation.
Similar to D76373 this reduces ingredient def-use usage by ILV as
a step towards full VPlan-based def-use relations.
Reviewers: Ayal, gilr, rengolin
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D77655
This patch moves calls to their own recipe, to simplify the transition
to VPUser for operands of VPWidenRecipe, as discussed in D76992.
Subsequently additional information can be added to the recipe rather
than computing it during the execute step.
Reviewers: rengolin, Ayal, gilr, hsaito
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D77467
After 49d00824bb, VPWidenRecipe only stores a single instruction.
tryToWiden can simply return the widen recipe, like other helpers in
VPRecipeBuilder.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken
count is a SCEV add expression, its type is defined by the type of the
last operand of the add expression.
In the test case from PR45259, this last operand happens to be a
pointer, which (according to llvm::Type) does not have a primitive size
in bits. In this case, LoopVectorize fails to truncate the SCEV and
crashes as a result.
Uing ScalarEvolution::getTypeSizeInBits makes the truncation work as expected.
https://bugs.llvm.org/show_bug.cgi?id=45259
Differential Revision: https://reviews.llvm.org/D76669
This patch changes VPWidenRecipe to only store a single original IR
instruction. This is the first required step towards modeling it's
operands as VPValues and also towards breaking it up into a
VPInstruction.
Discussed as part of D74695.
Reviewers: Ayal, gilr, rengolin
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D76988
This untangles the logic in widenIntOrFpInduction in order to make more
explicit and visible how exactly the induction variable is lowered.
Differential Revision: https://reviews.llvm.org/D76686
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces a VPValue for VPWidenMemoryInstructionRecipe to use as
the stored value. The recipe is generated with a VPValue wrapping the stored
value of the scalar store. This reduces ingredient def-use usage by ILV as a
step towards full VPlan-based def-use relations.
Differential Revision: https://reviews.llvm.org/D76373
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.
Differential Revision: https://reviews.llvm.org/D75525
Currently when printing VPValues we use the object address, which makes
it hard to distinguish VPValues as they usually are large numbers with
varying distance between them.
This patch adds a simple slot tracker, similar to the ModuleSlotTracker
used for IR values. In order to dump a VPValue or anything containing a
VPValue, a slot tracker for the enclosing VPlan needs to be created. The
existing VPlanPrinter can take care of that for the existing code. We
assign consecutive numbers to each VPValue we encounter in a reverse
post order traversal of the VPlan.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D73078
This patch adds a getPlan accessor to VPBlockBase, which finds the entry
block of the plan containing the block and returns the plan set for this
block.
VPBlockBase contains a VPlan pointer, but it should only be set for
the entry block of a plan. This allows moving blocks without updating
the pointer for each moved block and in the future we might introduce a
parent relationship between plans and blocks, similar to the one in LLVM IR.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D74445
getReductionVars, getInductionVars and getFirstOrderRecurrences were all
being returned from LoopVectorizationLegality as pointers to lists. This
just changes them to be references, cleaning up the interface slightly.
Differential Revision: https://reviews.llvm.org/D75448
A recent commit
(https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed
the cost for calls to functions that have a vector version for some
vectorization factor. However, no check is performed for whether the
vectorization factor matches the current one being cost modeled. This leads to
attempts to widen call instructions to a vectorization factor for which such a
function does not exist, which in turn leads to an assertion failure.
This patch adds the check for vectorization factor (i.e. not just that the
called function has a vector version for some VF, but that it has a vector
version for this VF).
Differential revision: https://reviews.llvm.org/D74944
ToVectorTy is defined and used in multiple places. Hoist it to
VectorUtils.h to avoid duplication and improve re-usability.
Reviewers: rengolin, hsaito, Ayal, gilr, fpetrogalli
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D74959
Dead instructions do not need to be sunk. Currently we try and record
the recipies for them, but there are no recipes emitted for them and
there's nothing to sink. They can be removed from SinkAfter while
marking them for recording.
Fixes PR44634.
Reviewers: rengolin, hsaito, fhahn, Ayal, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D73423
Currently due to the edge caching, we create wrong predicates for
branches with matching true and false successors. We will cache the
condition for the edge from the true successor, and then lookup the same
edge (src and dst are the same) for the edge to the false successor.
If both successors match, the condition should always be true. At the
moment, we cannot really create constant VPValues, but we can just
create a true condition as X | !X. Later passes will clean that up.
Fixes PR44488.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D73079
Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly.
Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov
Reviewed By: Ayal, DaniilSuchkov
Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67905
Summary:
This commits is a rework of the patch in
https://reviews.llvm.org/D67572.
The rework was requested to prevent out-of-tree performance regression
when vectorizing out-of-tree IR intrinsics. The vectorization of such
intrinsics is enquired via the static function `isTLIScalarize`. For
detail see the discussion in https://reviews.llvm.org/D67572.
Reviewers: uabelho, fhahn, sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72734
The assume intrinsic is intentionally marked as may reading/writing
memory, to avoid passes moving them around. When flattening the CFG
for predicated blocks, we have to drop the assume calls, as they
are control-flow dependent.
There are some cases where we can do better (when control flow is
preserved), but that is follow-up work.
Fixes PR43620.
Reviewers: hsaito, rengolin, dcaballe, Ayal
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D68814
Memory instruction widening recipes use the pointer operand of their load/store
ingredient for generating the needed GEPs, making it difficult to feed these
recipes with pointers based on other ingredients or none at all.
This patch modifies these recipes to use a VPValue for the pointer instead, in
order to reduce ingredient def-use usage by ILV as a step towards full
VPlan-based def-use relations. The recipes are constructed with VPValues bound
to these ingredients, maintaining current behavior.
Differential revision: https://reviews.llvm.org/D70865
This addresses a vectorisation regression for tail-folded loops that are
counting down, e.g. loops as simple as this:
void foo(char *A, char *B, char *C, uint32_t N) {
while (N > 0) {
*C++ = *A++ + *B++;
N--;
}
}
These are loops that can be vectorised, but when tail-folding is requested, it
can't find a primary induction variable which we do need for predicating the
loop. As a result, the loop isn't vectorised at all, which it is able to do
when tail-folding is not attempted. So, this adds a check for the primary
induction variable where we decide how to lower the scalar epilogue. I.e., when
there isn't a primary induction variable, a scalar epilogue loop is allowed
(i.e. don't request tail-folding) so that vectorisation could still be
triggered.
Having this check for the primary induction variable make sense anyway, and in
addition, in a follow-up of this I will look into discovering earlier the
primary induction variable for counting down loops, so that this can also be
tail-folded.
Differential revision: https://reviews.llvm.org/D72324
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537