For the scalar/splat case, this fold is subsumed by
foldLogOpOfMaskedICmps(). However, the conjugated fold for "or"
also supports splats with undef. Make both code paths consistent
by using m_ZeroInt() for the "and" implementation as well.
https://alive2.llvm.org/ce/z/tN63cu
https://alive2.llvm.org/ce/z/ufB_Ue
When folding and/or of icmps, look through add of a constant and
adjust the icmp range instead. Effectively, this decomposes
X + C1 < C2 style range checks back into a normal range. This allows
us to fold comparisons involving two range checks or one range check
and some other condition. We had a fold for a very specific case of
this (an or of a range check and an eq, and only on one side!), while
this handles it in full generality.
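To make the decomposition concrete (constants chosen for illustration only,
not taken from the patch): a check of the form X + C1 <u C2 holds exactly
when X lies in the wrapped range [-C1, C2 - C1). A standalone exhaustive
check over i8:
```
// Exhaustive i8 check: "X + 10 <u 30" is the wrapped range test
// X in [-10, 20), i.e. X in [246, 256) or X in [0, 20).
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned V = 0; V < 256; ++V) {
    uint8_t X = uint8_t(V);
    bool AddForm = uint8_t(X + 10) < 30; // X + C1 <u C2
    bool RangeForm = X >= 246 || X < 20; // decomposed wrapped range
    assert(AddForm == RangeForm);
  }
}
```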
Differential Revision: https://reviews.llvm.org/D113510
Since there is just a single check for the LHS in the ~(A | B) & C | ...
transforms, and there are multiple RHS checks inside (with more coming), I am
removing the m_OneUse checks for the LHS and adding new checks for the RHS.
This is not essential as long as there is a net benefit.
In addition, the checks for the (~(A | B) & C) | (~(A | C) & B) --> (B ^ C) & ~A
transform were overly restrictive; it should be valid without any
additional checks.
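The identity itself is easy to validate exhaustively on i8 (a standalone
sketch, independent of the InstCombine code):
```
// Exhaustive i8 check of (~(A | B) & C) | (~(A | C) & B) == (B ^ C) & ~A.
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (unsigned C = 0; C < 256; ++C) {
        uint8_t Lhs = uint8_t((~(A | B) & C) | (~(A | C) & B));
        uint8_t Rhs = uint8_t((B ^ C) & ~A);
        assert(Lhs == Rhs);
      }
}
```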
Differential Revision: https://reviews.llvm.org/D113141
A problem was noted in the post-commit review for
c36b7e21bd / D113035:
If the source type is not an integer or integer vector,
then we could crash when trying to call ComputeNumSignBits().
Unfortunately, sinking recipes for first-order recurrences relies on
the original position of the recipes. So if a recipe needs to be sunk after
an optimized induction, it needs to stay in its original position until
sinking is done. This is causing PR52460.
To fix the crash, keep the recipes in the original position until
sink-after is done.
Post-commit follow-up to c45045bfd0 to address PR52460.
This reverts commit 7cd273c339.
Several patches with test fixes have been applied:
0cada82f0a "[Test] Remove incorrect test in GVN"
97cb13615d "[Test] Separate IndVars test into AArch64 and X86 parts"
985cc490f1 "[Test] Remove separated test in IndVars",
and test failures caused by 5ec2386 should be resolved now.
Previously, InstCombine detected a pair of llvm.stacksave/stackrestore
instructions that are adjacent modulo debug instructions in order to
eliminate the llvm.stackrestore. This misses situations where
intervening instructions (e.g. loads) prevent the llvm.stacksave and
llvm.stackrestore from becoming adjacent. This commit extends the logic
and allows for eliminating the llvm.stackrestore when the range of
instructions between them does not include any alloca or side-effect
causing instructions.
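A condensed sketch of the extended check, assuming both intrinsics are in
the same block (the function name is illustrative, not the in-tree code):
```
// Sketch only: the real code handles additional corner cases.
static bool canElideStackRestore(Instruction *Save, Instruction *Restore) {
  // Walk the instructions strictly between the llvm.stacksave and the
  // llvm.stackrestore; an alloca or a side-effecting instruction may
  // depend on the stack state, so the restore must then be kept.
  for (Instruction &I : llvm::make_range(std::next(Save->getIterator()),
                                         Restore->getIterator()))
    if (isa<AllocaInst>(I) || I.mayHaveSideEffects())
      return false;
  return true;
}
```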
Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D113105
Hoist the instruction classification logic outside the loop
in preparation for reuse in a future commit.
Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D113464
This patch uses the abstract attributor introduced in D111054 to get the
assumption values instead of the `hasAssumption` function. This also
calls it, so assumption information should propagate through the device
where applicable.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111445
This patch introduces a new abstract attributor instance that propagates
assumption information from functions. Conceptually, if a function is
only called by functions that have certain assumptions, then we can
apply the same assumptions to that function. This problem is similar to
calculating the dominator set, except that assumption sets are merged
instead of nodes.
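Conceptually, the merge looks like this (illustrative pseudo-C++, not the
Attributor API):
```
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using AssumptionSet = std::set<std::string>;

// An assumption propagates to a function only if every caller carries
// it; the function's own assumptions are always kept.
AssumptionSet mergeCallerAssumptions(const std::vector<AssumptionSet> &Callers,
                                     const AssumptionSet &Own) {
  AssumptionSet Result = Own;
  if (Callers.empty())
    return Result; // No callers: keep only the function's own assumptions.
  AssumptionSet Common = Callers.front();
  for (const AssumptionSet &S : Callers) {
    AssumptionSet Tmp;
    std::set_intersection(Common.begin(), Common.end(), S.begin(), S.end(),
                          std::inserter(Tmp, Tmp.begin()));
    Common = std::move(Tmp);
  }
  Result.insert(Common.begin(), Common.end());
  return Result;
}
```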
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111054
add tracing for loads and stores.
The primary goal is to have more options for data-flow-guided fuzzing,
i.e. use data flow insights to perform better mutations or more aggressive corpus expansion.
But the feature is general-purpose and could be used for other things too.
Pipe the flag through clang and the clang driver, same as for the other SanitizerCoverage flags.
While at it, change some plain arrays into std::array.
Tests: clang flags test, LLVM IR test, compiler-rt executable test.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D113447
The implementation for and/or is the same, apart from the choice
of exactIntersectWith() vs exactUnionWith(). Extract a common
function to make future extension easier.
Rather than testing for many specific combinations of predicates
and values, compute the exact icmp regions for both comparisons
and check whether they union/intersect exactly. If they do,
construct the equivalent icmp for the new range. Assuming that the
existing code handled all possible cases, this should be NFC.
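The shape of the shared logic, as a rough sketch (the function name and
surrounding details are illustrative, not the exact in-tree code):
```
static Value *foldAndOrOfICmpsUsingRanges(ICmpInst *Cmp0, ICmpInst *Cmp1,
                                          bool IsAnd, IRBuilder<> &Builder) {
  using namespace llvm::PatternMatch;
  Value *X = Cmp0->getOperand(0);
  const APInt *C0, *C1;
  if (X != Cmp1->getOperand(0) || !match(Cmp0->getOperand(1), m_APInt(C0)) ||
      !match(Cmp1->getOperand(1), m_APInt(C1)))
    return nullptr;

  // Exact regions of values satisfying each comparison.
  ConstantRange CR0 =
      ConstantRange::makeExactICmpRegion(Cmp0->getPredicate(), *C0);
  ConstantRange CR1 =
      ConstantRange::makeExactICmpRegion(Cmp1->getPredicate(), *C1);

  // 'and' needs the exact intersection, 'or' the exact union; bail out
  // if the combined set is not exactly representable as one range.
  Optional<ConstantRange> CR =
      IsAnd ? CR0.exactIntersectWith(CR1) : CR0.exactUnionWith(CR1);
  if (!CR)
    return nullptr;

  // Re-express the combined range as a single icmp, if possible.
  CmpInst::Predicate NewPred;
  APInt NewC;
  if (!CR->getEquivalentICmp(NewPred, NewC))
    return nullptr;
  return Builder.CreateICmp(NewPred, X, ConstantInt::get(X->getType(), NewC));
}
```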
Differential Revision: https://reviews.llvm.org/D113367
To be more consistent with other pass struct names.
There are still more passes that don't end with "Pass", but these are the important ones.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D112935
Op0 - umax(X, Op0) --> 0 - usub.sat(X, Op0)
I'm not sure if this is really an improvement in IR because
we probably have better recognition/analysis for min/max,
but this lines up with the fold we do for the icmp+select
idiom and removes another diff from D98152.
This is similar to the previous fold in the code that was
added with:
83c2fb9f66baa6a85130
https://alive2.llvm.org/ce/z/5MrVB9
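The fold can be spot-checked exhaustively on i8 (a standalone sketch;
uint8_t wrapping arithmetic models the IR semantics of sub, umax and
usub.sat):
```
#include <algorithm>
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Op0 = 0; Op0 < 256; ++Op0) {
      uint8_t Lhs = uint8_t(Op0 - std::max(X, Op0)); // Op0 - umax(X, Op0)
      uint8_t Sat = X > Op0 ? uint8_t(X - Op0) : 0;  // usub.sat(X, Op0)
      assert(Lhs == uint8_t(0 - Sat));               // 0 - usub.sat(X, Op0)
    }
}
```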
Changes VPReplicateRecipe to extract the last lane from an unconditional,
uniform store instruction. collectLoopUniforms will also add stores to
the list of uniform instructions where Legal->isUniformMemOp is true.
setCostBasedWideningDecision now sets the widening decision for
all uniform memory ops to Scalarize, where previously GatherScatter
may have been chosen for scalable stores.
This fixes an assert ("Cannot yet scalarize uniform stores") in
setCostBasedWideningDecision when we have a loop containing a
uniform i1 store and a scalable VF, which we cannot create a scatter for.
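For reference, an assumed example (not from the patch) of a loop containing
a uniform, unconditional store:
```
void f(short *dst, short *src, long n) {
  for (long i = 0; i < n; ++i)
    *dst = src[i]; // uniform store: the address is loop-invariant, so
                   // only the value from the last vector lane survives
}
```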
Reviewed By: sdesmalen, david-arm, fhahn
Differential Revision: https://reviews.llvm.org/D112725
(Cond & C) | (~bitcast(Cond) & D) --> bitcast (select Cond, (bc C), (bc D))
This is part of fixing:
https://llvm.org/PR34047
That report shows a case where a bitcast is sitting between the select condition
candidate and its 'not' value due to current cast canonicalization rules.
There's a bitcast type restriction that might be violated in existing matching,
but I still need to investigate if that is possible.
Alive2 shows we can only do this transform safely when the bitcast is from
narrow to wide vector elements (otherwise poison could leak into elements
that were safe in the original code):
https://alive2.llvm.org/ce/z/Hf66qh
Differential Revision: https://reviews.llvm.org/D113035
attempts
Prevent the selection of IVs that have a SCEV containing an undef. Also
prevent salvaging attempts for values for which a SCEV could not be
created by ScalarEvolution and which have only a SCEVUnknown.
Reviewed by: Orlando
Differential Revision: https://reviews.llvm.org/D111810
Without this patch, passingValueIsAlwaysUndefined will iterate over all
instructions from I to the end of the basic block, even if the use is
outside the block.
This patch adds an early bail-out if the use instruction is outside I's
BB. This can greatly reduce compile-time in cases where very large basic
blocks are involved, with a large number of PHI nodes and incoming
values.
Note that the refactoring also makes the handling of the case where I is
a phi and the use is in a PHI more explicit: for PHI nodes, we can
directly bail out as well. In the existing code, we would iterate until
we reach the end of the block and return false.
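A condensed sketch of the two early exits (illustrative; names are borrowed
and the per-instruction checks are elided):
```
static bool useMayBeReached(Instruction *I, Instruction *Use) {
  // New: the scan below only walks I's own block, so a use in another
  // block can never be found by it.
  if (Use->getParent() != I->getParent())
    return false;
  // New: a PHI "uses" the value on an incoming edge, not at its own
  // position, so we can directly bail out for PHI users as well.
  if (isa<PHINode>(Use))
    return false;
  // Existing behavior (simplified): walk forward from I to the end of
  // the block looking for the use.
  for (Instruction &Cur : llvm::make_range(std::next(I->getIterator()),
                                           I->getParent()->end()))
    if (&Cur == Use)
      return true; // ... plus the original per-instruction checks.
  return false;
}
```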
Based on an earlier patch by Matt Wala.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D113293
This reapplies patch db289340c8.
The test failures in builds with expensive checks caused by the patch happened
because we sorted the loop Phis in replaceCongruentIVs using llvm::sort,
which shuffles the given container when the expensive checks are enabled,
so equivalent Phis in the sorted vector had a different relative order from run
to run. replaceCongruentIVs tries to replace narrow Phis with truncations
of wide ones. In some test cases there were several Phis with the same
width, so if their order differs from run to run, the narrow Phis would
be replaced with a different Phi, depending on the shuffling result.
The patch ae14fae0ff fixed this issue by
replacing llvm::sort with llvm::stable_sort.
This patch adds a function to verify general properties of VPlans. The
first check makes sure that all phi-like recipes are at the beginning of
a block, with no other recipes in between.
Note that this may not hold for VPBlendRecipes at the moment,
as other recipes may be inserted before the VPBlendRecipe during mask
creation.
Note that this patch depends on D111300 and D111301, which fix code that
breaks the checked invariant.
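A minimal sketch of the first check, assuming an isPhi() predicate on
recipes (the in-tree verifier does more):
```
static bool verifyPhiRecipesAreFirst(const VPBasicBlock *VPBB) {
  bool SeenNonPhi = false;
  for (const VPRecipeBase &R : *VPBB) {
    if (R.isPhi() && SeenNonPhi)
      return false; // Phi-like recipe after a non-phi recipe: invalid.
    SeenNonPhi |= !R.isPhi();
  }
  return true;
}
```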
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D111302
This is a fix for test failures in expensive-checks builds caused by db289340c8.
With LLVM_ENABLE_EXPENSIVE_CHECKS enabled, llvm::sort shuffles the given container.
However, the sort is only called when a TTI is passed to replaceCongruentIVs.
In the mentioned patch we pass it the TTI, so the sort happens. But due to the
shuffling, equivalent Phis may appear in a different order from run to run.
With stable_sort instead of sort this is impossible: the relative order of
equivalent Phis is preserved.
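A toy illustration of the difference (not the IndVarSimplify code): with
equal sort keys, stable_sort keeps the original relative order on every run.
```
#include "llvm/ADT/STLExtras.h"
#include <vector>

struct PhiInfo {
  unsigned Width;       // sort key; several Phis may share it
  unsigned OriginalIdx; // tie-breaking order we want to preserve
};

void sortByWidth(std::vector<PhiInfo> &Phis) {
  // Phis with equal Width keep their OriginalIdx order, so the Phi
  // chosen as a replacement is deterministic across runs.
  llvm::stable_sort(Phis, [](const PhiInfo &A, const PhiInfo &B) {
    return A.Width < B.Width;
  });
}
```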
that don't use the inline asm marker
This patch makes the changes to the ARC middle-end passes that are
needed to handle operand bundle "clang.arc.attachedcall" on targets that
don't use the inline asm marker for the retainRV/autoreleaseRV
handshake (e.g., x86-64).
Note that anyone who wants to use the operand bundle on their target has
to teach their backend to handle the operand bundle. The x86-64 backend
already knows about the operand bundle (see
https://reviews.llvm.org/D94597).
Differential Revision: https://reviews.llvm.org/D111334
This is in preparation for only invalidating analyses on changed
functions.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D113303
instruction the key points to is deleted
Use weak value handles for both the key and the value. The entry is
invalid if either value handle is null.
This fixes an assertion failure in BasicAAResult::alias that is caused
by UnderlyingObjCPtrCache returning a wrong value.
I don't have a test case for this patch that fails reliably.
rdar://83984790
- CUDA cannot associate a memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, that breaks the portability between Clang and NVCC.
- This change proposes to infer the address space of a pointer from assumptions built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,
```
void foo(float *p) {
__builtin_assume(__isGlobal(p));
// From there, we could assume p is a global pointer instead of a
// generic one.
}
```
This makes the code portable without introducing implementation-specific features.
Note that NVCC supports __builtin_assume starting from version 11.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D112041
InstCombine converts range tests of the form (X > C1 && X < C2) or
(X < C1 || X > C2) into checks of the form (X + C3 < C4) or
(X + C3 > C4). It is possible to express all range tests in either
of these forms (with different choices of constants), but currently
neither of them is considered canonical. We may have equivalent
range tests using either ult or ugt.
This proposes to canonicalize all range tests to use ult. An
alternative would be to canonicalize to either ult or ugt depending
on the specific constants involved -- e.g. in practice we currently
generate ult for && style ranges and ugt for || style ranges when
going through the insertRangeTest() helper. In fact, the "clamp like"
fold was relying on this, which is why I had to tweak it to not
assume whether inversion is needed based on just the predicate.
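As a concrete instance of the canonicalization (constants chosen for
illustration), (X > 5 && X < 20) becomes X - 6 <u 14, which can be checked
exhaustively on i8:
```
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned V = 0; V < 256; ++V) {
    uint8_t X = uint8_t(V);
    bool AndForm = X > 5 && X < 20;     // (X > C1 && X < C2)
    bool UltForm = uint8_t(X - 6) < 14; // (X + C3) <u C4 with C3 = -6
    assert(AndForm == UltForm);
  }
}
```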
Proof: https://alive2.llvm.org/ce/z/_SP_rQ
Differential Revision: https://reviews.llvm.org/D113366
All phi-like recipes should be at the beginning of a VPBasicBlock with
no other recipes in between. Ensure that the recurrence-splicing recipe
is not added between phi-like recipes, but after them.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D111301
When targeting a specific CPU with scalable vectorization, the knowledge
of that particular CPU's vscale value can be used to tune the cost-model
and make the cost per lane less pessimistic.
If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane
is calculated as:
Cost / (VScaleForTuning * VF.KnownMinLanes)
Otherwise, it assumes a value of 1, meaning that the behavior
is unchanged, and the cost is calculated as:
Cost / VF.KnownMinLanes
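For example (illustrative numbers only): with VF = vscale x 4 on a CPU whose
getVScaleForTuning() returns 2, an instruction cost of 8 amortizes to
8 / (2 * 4) = 1 per lane, instead of the more pessimistic 8 / 4 = 2.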
Reviewed By: kmclaughlin, david-arm
Differential Revision: https://reviews.llvm.org/D113209
An extended value is known to be inside a range smaller than the full one;
for example, a value produced by zext i8 to i32 lies in [0, 255]. Prevent
SCCP from marking such a value as overdefined.
Fixes PR52253
Differential Revision: https://reviews.llvm.org/D112721
The common use case for calling createStepForVF is currently something
like:
Value *Step = createStepForVF(Builder, ConstantInt::get(Ty, UF), VF);
and it makes more sense to reduce overall lines of code and change the
function to let it create the constant instead. With my patch this
becomes:
Value *Step = createStepForVF(Builder, Ty, VF, UF);
and the ConstantInt is now created inside createStepForVF. A side-effect of
this is that the code in createStepForVF also becomes simpler.
As part of this patch I've also replaced some calls to getRuntimeVF
with calls to createStepForVF, i.e.
getRuntimeVF(Builder, Count->getType(), VFactor * UFactor) ->
createStepForVF(Builder, Count->getType(), VFactor, UFactor)
because this feels semantically better.
Differential Revision: https://reviews.llvm.org/D113122
In IndVarSimplify, after simplifying and extending loop IVs, we call 'replaceCongruentIVs'.
This function optionally takes a TTI argument to be able to replace narrow IV uses
with truncates of the widest one.
For some reason the TTI wasn't passed to the function, so it couldn't perform
that transform.
This patch fixes it.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D113024
At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
we bail out if the main vector loop uses a scalable VF. This patch adds
support for generating epilogue vector loops using a fixed-width VF when the
main vector loop uses a scalable VF.
I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor
so that we convert the scalable VF into a fixed-width VF and do profitability
checks on that instead. In addition, since the scalable and fixed-width VFs
live in different VPlans, I had to change the calls to
LVP.hasPlanWithVFs so that we only pass in the fixed-width VF.
New tests added here:
Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll
Differential Revision: https://reviews.llvm.org/D109432
For some optimizations on comparisons it's necessary that the
union/intersect is exact and not a superset. Add methods that
return Optional<ConstantRange> only if the result is exact.
For the sake of simplicity this is implemented by comparing
the subset and superset approximations for now, but it should be
possible to do this more directly, as unionWith() and intersectWith()
already distinguish the cases where the result is imprecise for the
preferred range type functionality.
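One way the comparison of the two approximations can look, as a rough
sketch (the in-tree implementation may differ in details):
```
// intersectWith() yields a superset of the exact intersection; by
// De Morgan, inverting the union of the inverses yields a subset.
// If the two approximations agree, the result is exact.
Optional<ConstantRange> exactIntersectWithSketch(const ConstantRange &LHS,
                                                 const ConstantRange &RHS) {
  ConstantRange Superset = LHS.intersectWith(RHS);
  ConstantRange Subset = LHS.inverse().unionWith(RHS.inverse()).inverse();
  if (Superset == Subset)
    return Superset;
  return None;
}
```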