All of the callers already have an Instruction *. Many of them
from a dyn_cast.
Also update the OperationData constructor to use a Instruction&
to remove a dyn_cast and make it clear that the pointer is non-null.
Differential Revision: https://reviews.llvm.org/D88132
D87691 reordered some checks, which turned out to be unsafe. More
specifically, when examining a store instruction, the check against
getOrCreateResult should be done before attempting to call
isSameMemGeneration. Otherwise a crash in MSSA walker can occur.
This patch restores the order of these calls to what it was originally.
This refactors VPuser to not inherit from VPValue to facilitate
introducing operations that introduce multiple VPValues (e.g.
VPInterleaveRecipe).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D84679
In this patch I've fixed some warnings that arose from the implicit
cast of TypeSize -> uint64_t. I tried writing a variety of different
cases to show how this optimisation might work for scalable vectors
and found:
1. The optimisation does not work for cases where the cast type
is scalable and the allocated type is not. This because we need to
know how many times the cast type fits into the allocated type.
2. If we pass all the various checks for the case when the allocated
type is scalable and the cast type is not, then when creating the
new alloca we have to take vscale into account. This leads to
sub-optimal IR that is worse than the original IR.
3. For the remaining case when both the alloca and cast types are
scalable it is hard to find examples where the optimisation would
kick in, except for simple bitcasts, because we typically fail the
ABI alignment checks.
For now I've changed the code to bail out if only one of the alloca
and cast types is scalable. This means we continue to support the
existing cases where both types are fixed, and also the specific case
when both types are scalable with the same size and alignment, for
example a simple bitcast of an alloca to another type.
I've added tests that show we don't attempt to promote the alloca,
except for simple bitcasts:
Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll
Differential revision: https://reviews.llvm.org/D87378
A conversion from `pow` to `sqrt` shall not call an `errno`-setting
`sqrt` with -//infinity//: the `sqrt` will set `EDOM` where the `pow`
call need not.
This patch avoids the erroneous (pun not intended) transformation by
applying the restrictions discussed in the thread for
https://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html.
The existing tests are updated (depending on emphasis in the checks for
library calls, avoidance of overlap, and overall coverage):
- to add `ninf`, retaining the intended library call,
- to use the intrinsic, retaining the use of `select`, or
- to expect the replacement to not occur.
The following is tested:
- The pow intrinsic folds to a `select` instruction to
handle -//infinity//.
- The pow library call folds, with `ninf`, to `sqrt` without the
`select` instruction associated with handling -//infinity//.
- The pow library call does not fold to `sqrt` without `ninf`.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D87877
As an exhaustive test shows, this logic is fully identical to the old
implementation, with exception of the case where both of the operands
had empty ranges:
```
TEST_F(ConstantRangeTest, CVP_UDiv) {
unsigned Bits = 4;
EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
if(CR0.isEmptySet())
return;
EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
if(CR0.isEmptySet())
return;
unsigned MaxActiveBits = 0;
for (const ConstantRange &CR : {CR0, CR1})
MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());
ConstantRange OperandRange(Bits, /*isFullSet=*/false);
for (const ConstantRange &CR : {CR0, CR1})
OperandRange = OperandRange.unionWith(CR);
unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();
EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
});
});
}
```
This is a continuation of 8d487668d0,
the logic is pretty much identical for SRem:
Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, C1
Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, -C1
Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0
Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0
https://rise4fun.com/Alive/Vd6
Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed.
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
The current code for handling pow(x, y) where y is an integer plus 0.5
is not explicitly guarded against attempting to transform the case where
abs(y) is exactly 0.5.
The latter case is meant to be handled by `replacePowWithSqrt`. Indeed,
if the pow(x, integer+0.5) case proceeds past a certain point, it will
hit an assertion by attempting to form pow(x, 0) using `getPow`.
This patch adds an explicit check to prevent attempting the
pow(x, integer+0.5) transformation on pow(x, +/-0.5) as suggested during
the review of D87877. This has the effect of retaining the shrinking of
`pow` to `powf` when the `sqrt` libcall cannot be formed.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D88066
Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument.
This given the compiler warning of returning the handle as copy.
Differential Revision: https://reviews.llvm.org/D88029
This provides a convenient way to print VPValues and recipes in a
debugger. In particular it saves the user from instantiating
VPSlotTracker to print recipes or values.
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.
Differential Revision: https://reviews.llvm.org/D87457
Extend the handling of memory intrinsics to also include non-
target-specific intrinsics, in particular masked loads and stores.
Invent "isHandledNonTargetIntrinsic" to distinguish between intrin-
sics that should be handled natively from intrinsics that can be
passed to TTI.
Add code that handles masked loads and stores and update the
testcase to reflect the results.
Differential Revision: https://reviews.llvm.org/D87340
SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D87718
1. Store intrinsic ID in ParseMemoryInst instead of a boolean flag
"IsTargetMemInst". This will make it easier to add support for
target-independent intrinsics.
2. Extract the complex multiline conditions from EarlyCSE::processNode
into a new function "getMatchingValue".
Differential Revision: https://reviews.llvm.org/D87691
This pass is like DeadCodeEliminationPass, but only does one pass
through a function instead of iterating on users of eliminated
instructions.
DeadCodeEliminationPass should be used in all cases.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87933
The implementation of gather() should be reduced too,
but this change by itself makes things a little clearer:
we don't try to gather to a different type or
number-of-values than whatever is passed in as the value
list itself.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has less
elements than the root node). In this case we just can not try to reorder
the tree + we may calculate the wrong number of nodes that requre the
same reordering.
For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
leaf, it will be shrink to \<a, b\>. If instructions in this leaf should
be reordered, the best order will be \<1, 0\>. We need to extend this
order for the root node. For the root node this order should look like
\<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes
with the reused instructions.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D45263
LSR claims to MemorySSA, but we also have to make sure it is preserved
when splitting critical edges. This can be done by passing MSSAU to
SplitCriticalEdge.
Fixes PR47557.
We do similar factorization folds in SimplifyUsingDistributiveLaws,
but that drops no-wrap properties. Propagating those optimally may
help solve:
https://llvm.org/PR47430
The propagation is all-or-nothing for these patterns: when all
3 incoming ops have nsw or nuw, the 2 new ops should have the
same no-wrap property:
https://alive2.llvm.org/ce/z/Dv8wsU
This also solves:
https://llvm.org/PR47584
The test (currently crashing) is reduced from the example provided
in the post-commit discussion in D87149.
Differential Revision: https://reviews.llvm.org/D87965
I want to export this function, and the current API was a bit
weird: It took an additional Alignment argument that didn't really
have anything to do with what the function does. Drop it, and
perform a max at the callsite.
Also rename it to tryEnforceAlignment().
Currently SCEVExpander creates inttoptr for non-integral pointers if the
base is a null constant for example. This results in invalid IR.
This patch changes InsertNoopCastOfTo to emit a GEP & bitcast to convert
to a non-integral pointer. First, a GEP of i8* null is generated and the
integral value is used as index. The GEP is then bitcasted to the target
type.
This was exposed by D71539.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87827
When address sanitizing a function, stack unpinsoning code is inserted before each ret instruction. However if the ret instruciton is preceded by a musttail call, such transformation broke the musttail call contract and generates invalid IR.
This patch fixes the issue by moving the insertion point prior to the musttail call if there is one.
Differential Revision: https://reviews.llvm.org/D87777