This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110227
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
This could go either direction since the instruction
count is the same either way, but there are a few
reasons to prefer this:
1. We already do the related transform with 'and'
(see just above the new code).
2. We try (too hard) to compensate for not having this
and possibly other folds in transformZExtICmp(),
and that leads to bugs like https://llvm.org/PR51762 .
3. Codegen looks better across a variety of targets.
https://alive2.llvm.org/ce/z/uEgn4P
We cannot leak any equivalency information by comparing against null
since null never has virtual metadata associated with it (when null is
not a valid dereferenceable pointer).
Instcombine seems to make sure that a null will be on the RHS, so we
don't have to check both operands.
This fixes a missed optimization in llvm-test-suite's MultiSource lambda
benchmark under -fstrict-vtable-pointers.
Reviewed By: Prazek
Differential Revision: https://reviews.llvm.org/D108734
This is a quick fix for a motivating case that looks like this:
https://godbolt.org/z/GeMqzMc38
As noted, we might be able to restore the min/max patterns
with select folds, or we just wait for this to become easier
with canonicalization to min/max intrinsics.
The intrinsics have an extra chunk of known bits logic
compared to the normal cmp+select idiom. That allows
folding the icmp in each case to something better, but
that then opposes the canonical form of min/max that
we try to form for a select.
I'm carving out a narrow exception to preserve all
existing regression tests while avoiding the inf-loop.
It seems unlikely that this is the only bug like this
left, but this should fix:
https://llvm.org/PR51419
There may be some generalizations (see test comments) of these patterns,
but this should handle the cases motivated by:
https://llvm.org/PR51315https://llvm.org/PR51259
The backend may want to transform differently, but at least for
the x86 examples that I looked at, there does not appear to be
any significant perf diff either way.
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support
to specific cases where this optimization works. This patch is the
first one on this line.
Consider the example:
%i = ptrtoint i8* %X to i64
%p = inttoptr i64 %i to i16*
%cmp = icmp eq i8* %load, %p
In this specific case, the inttoptr/ptrtoint optimization is correct
as it only compares the pointer values. In this patch, we fold
inttoptr/ptrtoint to a bitcast (if src and dest types are different).
Differential Revision: https://reviews.llvm.org/D105088
This set of folds was added recently with:
c7b658aeb50c400e895340b752d28d
...and I noted that this wasn't likely to fire in code derived
from C/C++ source because of nsw in particular. But I didn't
notice that I had placed the code above the no-wrap block
of transforms.
This is likely the cause of regressions noted from the previous
commit because -- as shown in the test diffs -- we may have
transformed into a compare with an arbitrary constant rather
than a simpler signbit test.
This is the pattern from the description of:
https://llvm.org/PR50816
There might be a way to generalize this to a smaller or more
generic pattern, but I have not found it yet.
https://alive2.llvm.org/ce/z/ShzJoF
define i1 @src(i8 %x) {
%add = add i8 %x, -1
%xor = xor i8 %x, -1
%and = and i8 %add, %xor
%r = icmp slt i8 %and, 0
ret i1 %r
}
define i1 @tgt(i8 %x) {
%r = icmp eq i8 %x, 0
ret i1 %r
}
This follows up patches for the unsigned siblings:
0c400e8953c7b658aeb5
We are translating an offset signed compare to its
unsigned equivalent when one end of the range is
at the limit (zero or unsigned max).
(X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1)
(X + C2) <s C --> X >u (C ^ SMAX) (if C == C2)
This probably does not show up much in IR derived
from C/C++ source because that would likely have
'nsw', and we have folds for that already.
As with the previous unsigned transforms, the folds
could be generalized to handle non-constant patterns:
https://alive2.llvm.org/ce/z/Y8Xrrm
; sgt
define i1 @src(i8 %a, i8 %c) {
%c2 = add i8 %c, 1
%t = add i8 %a, %c2
%ov = icmp sgt i8 %t, %c
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c) {
%c_off = sub i8 127, %c ; SMAX
%ov = icmp ult i8 %a, %c_off
ret i1 %ov
}
https://alive2.llvm.org/ce/z/c8uhnk
; slt
define i1 @src(i8 %a, i8 %c) {
%t = add i8 %a, %c
%ov = icmp slt i8 %t, %c
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c) {
%c_offnot = xor i8 %c, 127 ; SMAX
%ov = icmp ugt i8 %a, %c_offnot
ret i1 %ov
}
This is one sibling of the fold added with c7b658aeb5 .
(X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN)
I'm still not sure how to describe it best, but we're
translating 2 constants from an unsigned range comparison
to signed because that eliminates the offset (add) op.
This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/K-fMBf
define i1 @src(i8 %a, i8 %c2) {
%t = add i8 %a, %c2
%c = add i8 %c2, 128 ; SMIN
%ov = icmp ult i8 %t, %c
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c2) {
%not_c2 = xor i8 %c2, -1
%ov = icmp sgt i8 %a, %not_c2
ret i1 %ov
}
There must be a better way to describe this pattern in words?
(X + C2) >u C --> X <s -C2 (if C == C2 + SMAX)
This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/rdfNFP
define i1 @src(i8 %a, i8 %c1) {
%t = add i8 %a, %c1
%c2 = add i8 %c1, 127 ; SMAX
%ov = icmp ugt i8 %t, %c2
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c1) {
%neg_c1 = sub i8 0, %c1
%ov = icmp slt i8 %a, %neg_c1
ret i1 %ov
}
The pattern was noticed as a by-product of D104932.
We could use a bigger hammer and bail out on any constant
expression, but there's a regression test that appears to
validly do the transform (although it may not have been
intending to check that optimization).
Rather than relying on pointer type equality (which, for a change,
is silently incorrect with opaque pointers) check that the GEP
source element types match.
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered as Eli say when sext(idx) * ElementSize overflows.
```
// assume that GV is an array of 4-byte elements
GEP = gep GV, 0, Idx // this is accessing Idx * 4
L = load GEP
ICI = icmp eq L, value
=>
ICI = icmp eq Idx, NewIdx
```
The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp.
And there is a problem because Idx * ElementSize can overflow.
Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.
This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx.
```
...
=>
Idx' = and Idx, 0x3F..FF
ICI = icmp eq Idx', NewIdx
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D99481
This reverts commit 4f2fd3818b.
The Linux kernel fails to build after this commit. See
https://reviews.llvm.org/D99481 for a reproducer.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered as Eli say when sext(idx) * ElementSize overflows.
```
// assume that GV is an array of 4-byte elements
GEP = gep GV, 0, Idx // this is accessing Idx * 4
L = load GEP
ICI = icmp eq L, value
=>
ICI = icmp eq Idx, NewIdx
```
The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp.
And there is a problem because Idx * ElementSize can overflow.
Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.
This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx.
```
...
=>
Idx' = and Idx, 0x3F..FF
ICI = icmp eq Idx', NewIdx
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D99481
This was reverted due to performance regressions in ARM benchmarks,
which have since been addressed by D101196 (SCEV analysis improvement)
and D101778 (CGP reverse transform).
-----
The single-use case is handled implicity by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.
https://alive2.llvm.org/ce/z/MSixcmhttps://alive2.llvm.org/ce/z/GwpG0M
This fixes https://llvm.org/PR48900 , but as seen in the
regression tests prevents some optimizations.
There are a few options to restore those (switch to min/max
intrinsics, add larger pattern matching for select with
dominating condition, improve CVP), but we need to prevent
the bug 1st.
This reverts commit 13ec913bdf.
This commit introduces new uses of the overflow checking intrinsics that
depend on implementations in compiler-rt, which Windows users generally
do not link against. I filed an issue (somewhere) to make clang
auto-link the builtins library to resolve this situation, but until that
happens, it isn't reasonable for the optimizer to introduce new link
time dependencies.
Currently, the InstCombineCompare is combining two add operations
into a single add operation which always has a nsw flag, without
checking the conditions to see if this flag should be present
according to the original two add operations or not.
This patch will change the InstCombineCompare to emit the nsw or
nuw only when these flags are allowed to be generated according to
the original add operations and remove the possibility of applying
wrong optimization with passes that will perform on the IR later
in the pipeline.
To confirm that the current results are buggy and the results after
proposed patch are the correct IR the following examples from Alive2
are attached; the same results can be seen in the case of nuw flag
and nsw is just used as an example. The following link shows that
the generated IR with current LLVM is a buggy IR when none of the
original add operations have nsw flag.
https://alive2.llvm.org/ce/z/WGaDrm
The following link proves that the generated IR after the patch in
the former case is the correct IR.
https://alive2.llvm.org/ce/z/wQ7G_e
Differential Revision: https://reviews.llvm.org/D100095
As discussed in:
https://llvm.org/PR49179
...this pattern shows up in library code.
There are several potential generalizations as noted,
but we need to be careful that we get FP special-values
right, and it's not clear how much variation we should
expect to see from this exact idiom.
This is a more basic pattern that we should handle before trying to solve:
https://llvm.org/PR48640
There might be a better way to think about this because the pre-condition
that I came up with (number of sign bits in the compare constant) misses a
potential transform for each of ugt and ult as commented on in the test file.
Tried to model this is in Alive:
https://rise4fun.com/Alive/juX1
...but I couldn't get the ComputeNumSignBits() pre-condition to work as
expected, so replaced with leading 0/1 preconditions instead.
Name: ugt
Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
%a = ashr %x, C1
%r = icmp ugt i8 %a, C2
=>
%r = icmp slt i8 %x, 0
Name: ult
Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
%a = ashr %x, C1
%r = icmp ult i4 %a, C2
=>
%r = icmp sgt i4 %x, -1
Also approximated in Alive2:
https://alive2.llvm.org/ce/z/u5hCczhttps://alive2.llvm.org/ce/z/__szVL
Differential Revision: https://reviews.llvm.org/D94014
Currently make_early_inc_range cannot be used with iterators with
operator* implementations that do not return a reference.
Most notably in the LLVM codebase, this means the User iterator ranges
cannot be used with make_early_inc_range, which slightly simplifies
iterating over ranges while elements are removed.
Instead of directly using BaseT::reference as return type of operator*,
this patch uses decltype to get the actual return type of the operator*
implementation in WrappedIteratorT.
This patch also updates a few places to use make use of
make_early_inc_range.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93992
This patch
- Adds containsPoisonElement that checks existence of poison in constant vector elements,
- Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly
With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved.
Thanks!
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94053
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D89628
Use isKnownXY comparators when one of the operands can be with
scalable vectors or getFixedSize() for all the other cases.
This patch also does bug fixes for getPrimitiveSizeInBits by using
getFixedSize() near the places with the TypeSize comparison.
Differential Revision: https://reviews.llvm.org/D89703
We know V is a IntToPtrInst or PtrToIntInst type so we know its a CastInst - so use cast<> directly.
Prevents clang static analyzer warning that we could deference a null pointer.
We cannot iterate on scalable vector, the number of elements is unknown at compile-time.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87918
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows to move about 3000 lines out from InstCombine to the targets.
Differential Revision: https://reviews.llvm.org/D81728
As noted in PR46561:
https://bugs.llvm.org/show_bug.cgi?id=46561
...it takes something beyond a minimal IR example to trigger
this bug because it relies on matching non-canonical IR.
There are no tests that show the need for matching this
pattern, so I'm just deleting it to fix the miscompile.
Summary:
The actual transform i was going after was:
https://rise4fun.com/Alive/Tp9H
```
Name: zz
Pre: isPowerOf2(C0) && isPowerOf2(C1) && C1 == C0
%t0 = and i8 %x, C0
%r = icmp eq i8 %t0, C1
=>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1
Name: zz
Pre: isPowerOf2(C0)
%t0 = and i8 %x, C0
%r = icmp ne i8 %t0, 0
=>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1
```
but as it can be seen from the current tests, we already canonicalize most of it,
and we are only missing handling multi-use non-canonical icmp predicates.
If we have both `!=0` and `==0`, even though we can CSE them,
we end up being stuck with them. We should canonicalize to the `==0`.
I believe this is one of the cleanup steps i'll need after `-scalarizer`
if i end up proceeding with my WIP alloca promotion helper pass.
Reviewers: spatel, jdoerfert, nikic
Reviewed By: nikic
Subscribers: zzheng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83139
I originally reverted the patch because it was causing performance
issues, but now I think it's just enabling simplify-cfg to do
something that I don't want instead :)
Sorry for the noise.
This reverts commit 3e39760f8e.
We can simplify
```
icmp <pred> phi(C1, C2, ...), C
```
with
```
phi(icmp(C1, C), icmp(C2, C), ...)
```
provided that all comparison of constants are constants themselves.
Differential Revision: https://reviews.llvm.org/D81151
Reviewed By: lebedev.ri
(X | MaskC) == C --> (X & ~MaskC) == C ^ MaskC
(X | MaskC) != C --> (X & ~MaskC) != C ^ MaskC
We have more analyis for 'and' patterns and already lean this way
in the existing code, so this should be neutral or better in IR.
If this does not do as well in codegen, the problem already exists
and we should fix that based on target costs/heuristics.
http://volta.cs.utah.edu:8080/z/oP3ecL
define void @src(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
%or = or i8 %x, %OrC
%eq = icmp eq i8 %or, %C
store i1 %eq, i1* %p0
%ne = icmp ne i8 %or, %C
store i1 %ne, i1* %p1
ret void
}
define void @tgt(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
%NotOrC = xor i8 %OrC, -1
%a = and i8 %x, %NotOrC
%NewC = xor i8 %C, %OrC
%eq = icmp eq i8 %a, %NewC
store i1 %eq, i1* %p0
%ne = icmp ne i8 %a, %NewC
store i1 %ne, i1* %p1
ret void
}
Revision a1c05fe <https://reviews.llvm.org/rGa1c05fe20f3def1f1be9f50d2adefc6b6f1578ad>
removed bitcast from the list of problematic transformations, however:
%97 = fptrunc ppc_fp128 %2 to double // we need to check ppc_fp128 here to prevent the transformation
%98 = bitcast double %97 to i64 // a1c05fe checks ppc_fp128 at here
%99 = icmp slt i64 %98, 0
%100 = zext i1 %99 to i8
store i8 %100, i8* %7, align 1
so this patch does that. I'm also disabling it in the presence of extend just in case.
I verified separately that the hash of -std::infinity and std::infinity don't match now.
Differential Revision: https://reviews.llvm.org/D77911
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sdesmalen, rriddle, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77263
Based on the post-commit comments for rG0f56bbc, there might
be a problem with this transform:
(bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0
...and the ppc_fp128 data type, so conservatively bypass if we
are bitcasting a ppc_fp128.
We might be able to account for endian or other differences to
enable this for PowerPC again if that is useful.
Differential Revision: https://reviews.llvm.org/D77642
These are versions of a function that regressed with:
rGf2fbdf76d8d0
That particular problem occurs with an instcombine-simplifycfg-instcombine
sequence, but we can show that it exists within instcombine only with
other variations of the pattern.
This reverts commit f2fbdf76d8.
As noted in the post-commit thread:
https://reviews.llvm.org/rGf2fbdf76d8d0
...this can obscure a min/max pattern where the components
have extra uses. We can show that the problem is independent
of this change with a slightly modified source example, so
this revert just delays/reduces the need to fix the real
problem.
We need to improve our analysis of negation or -- more
generally -- subtraction using patches like D77230 or D68408.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
InstCombine has a mess of logic that tries to preserve min/max patterns,
but AFAICT, this one is not necessary because we can always narrow the
corresponding select in this sequence to match the narrow compare.
The biggest danger for this patch is inducing infinite looping or
assert from exceeding max iterations. If any bots hit that in the
vicinity of this commit, this is the likely patch to blame.
As we don't return the result of replaceInstUsesWith(), we are
responsible for erasing the instruction.
There is a small subtlety here in that we need to do this after
the other uses of Builder, which uses the original multiply as
the insertion point.
NFC apart from worklist order changes.
Usually when we replaceInstUsesWith() we also return the original
instruction, and InstCombine will take care of erasing it. Here
we don't do that, so we need to manually erase it.
NFC apart from worklist order changes.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Much like with reassociateShiftAmtsOfTwoSameDirectionShifts(),
as input, we have the following pattern:
icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
We want to rewrite that as:
icmp eq/ne (and (x shift (Q+K)), y), 0 iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts. so it may now overflow in smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus.
https://bugs.llvm.org/show_bug.cgi?id=44802
This version fixes a buildbot failure cause by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.
This reverts the revert commit
c7fc0e5da6.
Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.
This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV
This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D74228
Fix for https://bugs.llvm.org/show_bug.cgi?id=44754. We already have
a fold that converts icmp (and (ashr X, C3), C2), C1 into
icmp (and C2'), C1', but it imposed overly strict requirements on the
transform.
Relax this by checking that both C2 and C1 don't shift out bits
(in a signed sense) when forming the new constants.
Alive proofs (https://rise4fun.com/Alive/PTz0):
Name: ashr_legal
Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) == C1
%a = ashr i16 %x, C3
%b = and i16 %a, C2
%c = icmp i16 %b, C1
=>
%d = and i16 %x, C2 << C3
%c = icmp i16 %d, C1 << C3
Name: ashr_shiftout_eq
Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) != C1
%a = ashr i16 %x, C3
%b = and i16 %a, C2
%c = icmp eq i16 %b, C1
=>
%c = false
Note that >> corresponds to ashr here. The case of an equality
comparison has some special handling in this transform, because
it will form to a true/false result if the condition on the comparison
constant it violated.
Differential Revision: https://reviews.llvm.org/D74294
This is a followup to D73803, which uses the replaceOperand()
helper in more places.
This should be NFC apart from changes to worklist order.
Differential Revision: https://reviews.llvm.org/D73919
As discussed on D73919, this replaces a few cases where we were
modifying multiple operands of instructions in-place with the
creation of a new instruction, which we generally prefer nowadays.
This tends to be more readable and less prone to worklist management
bugs.
Test changes are only superficial (instruction naming and order).
Adds a replaceOperand() helper, which is like Instruction.setOperand()
but adds the old operand to the worklist. This reduces the amount of
missing or incorrect worklist management.
This only applies the helper to a relatively small subset of
setOperand() calls in InstCombine, namely those of the pattern
`I.setOperand(); return &I;`, where it is most obviously applicable.
Differential Revision: https://reviews.llvm.org/D73803
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.
As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.
Differential Revision: https://reviews.llvm.org/D73745
In line with current conventions, create new instructions rather
than modify two operands in place and performing manual worklist
management.
This should be NFC apart from possible worklist order changes.
For the
icmp eq (add X, C1), C2 => icmp eq X, C2-C1
icmp eq (sub C1, X), C2 => icmp eq X, C1-C2
folds, this allows C1 to be non-splat and contain undefs.
C2 is still splat, due to the structure of the code.
This is to address the remaining part of the regression in D73411,
where demanded element analysis replaces some elements with undef.
Differential Revision: https://reviews.llvm.org/D73647
cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M
As discussed in PR44588:
https://bugs.llvm.org/show_bug.cgi?id=44588
...we try harder to push shuffles after binops than after compares.
This patch handles the special (but presumably most common case) of
splat shuffles. If both operands are splats, then we can do the
comparison on the non-splat inputs followed by splat of the compare.
That should take care of the regression noted in D73411.
There's another potential fold requested in PR37463 to scalarize the
compare, but that's another patch (and it's not clear if we can do
that without the ability to undo it later):
https://bugs.llvm.org/show_bug.cgi?id=37463
Differential Revision: https://reviews.llvm.org/D73575
This addresses https://bugs.llvm.org/show_bug.cgi?id=42801.
The m_c_ICmp() matcher is changed to provide the swapped predicate
if the operands are swapped.
Existing uses of m_c_ICmp() fall in one of two categories: Working
on equality predicates only, where swapping is irrelevant.
Or performing a manual swap, in which case this patch removes it.
The only exception is the foldICmpWithLowBitMaskedVal() fold, which
does not swap the predicate, and instead reasons about whether
a swap occurred or not for each predicate. Getting the swapped
predicate allows us to merge the logic for pairs of predicates,
instead of duplicating it.
Differential Revision: https://reviews.llvm.org/D72976
As shown in P44383:
https://bugs.llvm.org/show_bug.cgi?id=44383
...we can't safely propagate a vector constant through this icmp fold
if that vector constant contains undefined elements.
We know that each defined element of the constant is safe though, so
find the first of those and replicate it into the formerly undef lanes.
Differential Revision: https://reviews.llvm.org/D72101
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
This fixes the buildbot failures.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
Fix for https://bugs.llvm.org/show_bug.cgi?id=40846.
This adds a combine for cases where a (a + b) < a style overflow
check is performed, but with a + b being the result of
uadd.with.overflow, so the overflow result is also already available
and we can just use it. Subsequently GVN/CSE will deduplicate the extracts.
We can run into this situation if you have both a uadd.with.overflow
and a manual add + overflow check in the same function (on the same
operands), in which case GVN will rewrite the add to the with.overflow
result and leave you with this pattern.
The implementation is a bit ugly because I'm handling the various
canonicalization edge cases.
This does not yet handle the negated version of this pattern.
Differential Revision: https://reviews.llvm.org/D58644
rL341831 moved one-use check higher up, restricting a few folds
that produced a single instruction from two instructions to the case
where the inner instruction would go away.
Original commit message:
> InstCombine: move hasOneUse check to the top of foldICmpAddConstant
>
> There were two combines not covered by the check before now,
> neither of which actually differed from normal in the benefit analysis.
>
> The most recent seems to be because it was just added at the top of the
> function (naturally). The older is from way back in 2008 (r46687)
> when we just didn't put those checks in so routinely, and has been
> diligently maintained since.
From the commit message alone, there doesn't seem to be a
deeper motivation, deeper problem that was trying to solve,
other than 'fixing the wrong one-use check'.
As i have briefly discusses in IRC with Tim, the original motivation
can no longer be recovered, too much time has passed.
However i believe that the original fold was doing the right thing,
we should be performing such a transformation even if the inner `add`
will not go away - that will still unchain the comparison from `add`,
it will no longer need to wait for `add` to compute.
Doing so doesn't seem to break any particular idioms,
as least as far as i can see.
References https://bugs.llvm.org/show_bug.cgi?id=44100
This is a fix for:
https://bugs.llvm.org/show_bug.cgi?id=43730
...and as shown there, we have existing test cases that show potential miscompiles.
We could just bail out for vector constants that contain any undef elements, or we can do as shown here:
allow the transform, but replace the undefs with a safe value.
For most of the tests shown, this results in a full splat constant (no undefs) which is probably a win
for further IR analysis because we conservatively don't match undefs in most cases. Codegen can probably
recover these kinds of undef lanes via demanded elements analysis if that's profitable.
Differential Revision: https://reviews.llvm.org/D69519
This adds folds for comparing uadd.sat/usub.sat with zero:
* uadd.sat(a, b) == 0 => a == 0 && b == 0 => (a | b) == 0
* usub.sat(a, b) == 0 => a <= b
And inverted forms for !=.
Differential Revision: https://reviews.llvm.org/D69224
llvm-svn: 375374
Summary:
This problem consists of several parts:
* Basic sign bit extraction - `trunc? (?shr %x, (bitwidth(x)-1))`.
This is trivial, and easy to do, we have a fold for it.
* Shift amount reassociation - if we have two identical shifts,
and we can simplify-add their shift amounts together,
then we likely can just perform them as a single shift.
But this is finicky, has one-use restrictions,
and shift opcodes must be identical.
But there is a super-pattern where both of these work together.
to produce sign bit test from two shifts + comparison.
We do indeed already handle this in most cases.
But since we get that fold transitively, it has one-use restrictions.
And what's worse, in this case the right-shifts aren't required to be
identical, and we can't handle that transitively:
If the total shift amount is bitwidth-1, only a sign bit will remain
in the output value. But if we look at this from the perspective of
two shifts, we can't fold - we can't possibly know what bit pattern
we'd produce via two shifts, it will be *some* kind of a mask
produced from original sign bit, but we just can't tell it's shape:
https://rise4fun.com/Alive/cM0https://rise4fun.com/Alive/9IN
But it will *only* contain sign bit and zeros.
So from the perspective of sign bit test, we're good:
https://rise4fun.com/Alive/FRzhttps://rise4fun.com/Alive/qBU
Superb!
So the simplest solution is to extend `reassociateShiftAmtsOfTwoSameDirectionShifts()` to also have a
sudo-analysis mode that will ignore extra-uses, and will only check
whether a) those are two right shifts and b) they end up with bitwidth(x)-1
shift amount and return either the original value that we sign-checking,
or null.
This does not have any functionality change for
the existing `reassociateShiftAmtsOfTwoSameDirectionShifts()`.
All that being said, as disscussed in the review, this yet again
increases usage of instsimplify in instcombine as utility.
Some day that may need to be reevaluated.
https://bugs.llvm.org/show_bug.cgi?id=43595
Reviewers: spatel, efriedma, vsk
Reviewed By: spatel
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68930
llvm-svn: 375371
True, no test coverage is being added here. But those non-canonical
predicates that are already handled here already have no test coverage
as far as i can tell. I tried to add tests for them, but all the patterns
already get handled elsewhere.
llvm-svn: 373962
We do indeed already get it right in some cases, but only transitively,
with one-use restrictions. Since we only need to produce a single
comparison, it makes sense to match the pattern directly:
https://rise4fun.com/Alive/kPg
llvm-svn: 373802
Summary:
Removing an assumption (assert) that the CmpInst already has been
simplified in getFlippedStrictnessPredicateAndConstant. Solution is
to simply bail out instead of hitting the assertion. Instead we
assume that any profitable rewrite will happen in the next iteration
of InstCombine.
The reason why we can't assume that the CmpInst already has been
simplified is that the worklist does not guarantee such an ordering.
Solves https://bugs.llvm.org/show_bug.cgi?id=43376
Reviewers: spatel, lebedev.ri
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68022
llvm-svn: 372972
https://rise4fun.com/Alive/KtL
This also shows that the fold added in D67412 / r372257
was too specific, and the new fold allows those test cases
to be handled more generically, therefore i delete now-dead code.
This is yet again motivated by
D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour"
llvm-svn: 372912
This has the potential to uncover missed analysis/folds as shown in the
min/max code comment/test, but fewer restrictions on icmp folds should
be better in general to solve cases like:
https://bugs.llvm.org/show_bug.cgi?id=43310
llvm-svn: 372510
Related folds were added in:
rL125734
...the code comment about register pressure is discussed in
more detail in:
https://bugs.llvm.org/show_bug.cgi?id=2698
But 10 years later, perf testing bzip2 with this change now
shows a slight (0.2% average) improvement on Haswell although
that's probably within test noise.
Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.
This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.
rL371940 and rL371981 are related patches in this series.
llvm-svn: 372007
This fold and several others were added in:
rL125734 <https://reviews.llvm.org/rL125734>
...with no explanation for the one-use checks other than the code
comments about register pressure.
Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.
This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.
rL371940 is a related patch in this series.
llvm-svn: 371981
This blob was written before match() existed, so it
could probably be reduced significantly.
But I suspect it isn't well tested, so tests would have
to be added to reduce risk from logic changes.
llvm-svn: 371978
This fold and several others were added in:
rL125734
...with no explanation for the one-use checks other than the code
comments about register pressure.
Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.
There are similar checks as noted with the TODO comments. I'm
hoping to remove those restrictions too, but if any of these
does cause a regression, it should be easier to correct by making
small, individual commits.
This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.
llvm-svn: 371940
(srem X, pow2C) sgt/slt 0 can be reduced using bit hacks by masking
off the sign bit and the module (low) bits:
https://rise4fun.com/Alive/jSO
A '2' divisor allows slightly more folding:
https://rise4fun.com/Alive/tDBM
Any chance to remove an 'srem' use is probably worthwhile, but this is limited
to the one-use improvement case because doing more may expose other missing
folds. That means it does nothing for PR21929 yet:
https://bugs.llvm.org/show_bug.cgi?id=21929
Differential Revision: https://reviews.llvm.org/D67334
llvm-svn: 371610
A follow-up for r329011.
This may be changed to produce @llvm.sub.with.overflow in a later patch,
but for now just make things more consistent overall.
A few observations stem from this:
* There does not seem to be a similar one-instruction fold for uadd-overflow
* I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`,
so since the `icmp` here no longer refers to `sub`,
reconstructing `usub.with.overflow` will be problematic,
and will likely require standalone pass (similar to DivRemPairs).
https://rise4fun.com/Alive/Zqs
Name: (A - B) u> A --> B u> A
%t0 = sub i8 %A, %B
%r = icmp ugt i8 %t0, %A
=>
%r = icmp ugt i8 %B, %A
Name: (A - B) u<= A --> B u<= A
%t0 = sub i8 %A, %B
%r = icmp ule i8 %t0, %A
=>
%r = icmp ule i8 %B, %A
Name: C u< (C - D) --> C u< D
%t0 = sub i8 %C, %D
%r = icmp ult i8 %C, %t0
=>
%r = icmp ult i8 %C, %D
Name: C u>= (C - D) --> C u>= D
%t0 = sub i8 %C, %D
%r = icmp uge i8 %C, %t0
=>
%r = icmp uge i8 %C, %D
llvm-svn: 371101
Summary:
Finally, the fold i was looking forward to :)
The legality check is muddy, i doubt i've groked the full generalization,
but it handles all the cases i care about, and can come up with:
https://rise4fun.com/Alive/26j
I.e. we can perform the fold if **any** of the following is true:
* The shift amount is either zero or one less than widest bitwidth
* Either of the values being shifted has at most lowest bit set
* The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount;
* The value that is being shifted by `lshr` (which **is** truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one
I strongly suspect there is some better generalization, but i'm not aware of it as of right now.
For now i also avoided using actual `computeKnownBits()`, but restricted it to constants.
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66383
llvm-svn: 370324
Summary:
`matchThreeWayIntCompare()` looks for
```
select i1 (a == b),
i32 Equal,
i32 (select i1 (a < b), i32 Less, i32 Greater)
```
but both of these selects/compares can be in it's commuted form,
so out of 8 variants, only the two most basic ones is handled.
This fixes regression being introduced in D66232.
Reviewers: spatel, nikic, efriedma, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66607
llvm-svn: 369841
Summary:
If we have e.g.:
```
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
```
the constants `65535` and `65536` are suspiciously close.
We could perform a transformation to deduplicate them:
```
Name: ult
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
=>
%t.inv = icmp ugt i32 %x, 65535
%r = select i1 %t.inv, i32 65535, i32 %y
```
https://rise4fun.com/Alive/avb
While this may seem esoteric, this should certainly be good for vectors
(less constant pool usage) and for opt-for-size - need to have only one constant.
But the real fun part here is that it allows further transformation,
in particular it finishes cleaning up the `clamp` folding,
see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`.
We start with e.g.
```
%dont_need_to_clamp_positive = icmp sle i32 %X, 32767
%dont_need_to_clamp_negative = icmp sge i32 %X, -32768
%clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767
%dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative
%R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit
```
without this patch we currently produce
```
%1 = icmp slt i32 %X, 32768
%2 = icmp sgt i32 %X, -32768
%3 = select i1 %2, i32 %X, i32 -32768
%R = select i1 %1, i32 %3, i32 32767
```
which isn't really a `clamp` - both comparisons are performed on the original value,
this patch changes it into
```
%1.inv = icmp sgt i32 %X, 32767
%2 = icmp sgt i32 %X, -32768
%3 = select i1 %2, i32 %X, i32 -32768
%R = select i1 %1.inv, i32 32767, i32 %3
```
and then the magic happens! Some further transform finishes polishing it and we finally get:
```
%t1 = icmp sgt i32 %X, -32768
%t2 = select i1 %t1, i32 %X, i32 -32768
%t3 = icmp slt i32 %t2, 32767
%R = select i1 %t3, i32 %t2, i32 32767
```
which is beautiful and just what we want.
Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization:
https://rise4fun.com/Alive/THl
Proofs for the fold itself: https://rise4fun.com/Alive/THl
Reviewers: spatel, dmgreen, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66232
llvm-svn: 369840
Started implementing the vector case and realized the scalar case hadn't handled the GEP producing a different type than the base correctly. It's entertaining seeing what slips through review when we're focused on the 'hard' parts. :(
Also adding an extra vector test as it happened to be in workspace and wasn't worth separating.
llvm-svn: 369795