InstCombine converts range tests of the form (X > C1 && X < C2) or
(X < C1 || X > C2) into checks of the form (X + C3 < C4) or
(X + C3 > C4). It is possible to express all range tests in either
of these forms (with different choices of constants), but currently
neither of them is considered canonical. We may have equivalent
range tests using either ult or ugt.
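For illustration (constants chosen here for the example, not taken from the
patch), the test "X in [6, 9]" on i8 can be written in both forms:
define i1 @range_ult(i8 %x) {
%add = add i8 %x, -6       ; X - 6
%r = icmp ult i8 %add, 4   ; true iff X is in [6, 9]
ret i1 %r
}
define i1 @range_ugt(i8 %x) {
%add = add i8 %x, -10      ; X - 10
%r = icmp ugt i8 %add, -5  ; also true iff X is in [6, 9]
ret i1 %r
}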
This proposes to canonicalize all range tests to use ult. An
alternative would be to canonicalize to either ult or ugt depending
on the specific constants involved -- e.g. in practice we currently
generate ult for && style ranges and ugt for || style ranges when
going through the insertRangeTest() helper. In fact, the "clamp like"
fold was relying on this, which is why I had to tweak it so it does
not assume whether inversion is needed based on the predicate alone.
Proof: https://alive2.llvm.org/ce/z/_SP_rQ
Differential Revision: https://reviews.llvm.org/D113366
This introduces a new ComputeMinSignedBits method for ValueTracking that
returns BitWidth - SignBits + 1 from ComputeNumSignBits, and represents
the minimum number of bits needed to represent the value as a signed
integer. Similar to the existing APInt::getMinSignedBits method, this can
make some of the reasoning around ComputeNumSignBits more natural.
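For example (an illustrative value, not taken from the patch), consider an
ashr whose sign bits are known:
; ComputeNumSignBits(%x) == 7 (ashr copies the sign bit into the top bits),
; so ComputeMinSignedBits(%x) == 8 - 7 + 1 == 2: %x always fits in 2 bits
; as a signed integer (its range is [-2, 1]).
define i8 @example(i8 %y) {
%x = ashr i8 %y, 6
ret i8 %x
}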
See https://reviews.llvm.org/D112298
Fixes a crash observed by oss-fuzz in 39934. The issue at hand is that the code expects a pattern match on m_Mul to imply the operand is a mul instruction; however, mul constant expressions are also valid here.
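A hypothetical sketch of the kind of input involved (not the exact fuzzer
reproducer): both icmp operands below match m_Mul, but the second is a
constant expression rather than a mul instruction:
@g = global i8 0
define i1 @example(i64 %x) {
%m = mul i64 %x, 2
%r = icmp eq i64 %m, mul (i64 ptrtoint (i8* @g to i64), i64 2)
ret i1 %r
}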
Inspired by D111968, provide an isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.
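For instance (an illustrative constant, not from the patch), -8 is a negated
power of 2 because -(-8) == 8 is a power of 2; such values are the contiguous
high-bit masks:
define i8 @high_mask(i8 %x) {
%m = and i8 %x, -8   ; -8 == 0b11111000; isNegatedPowerOf2() would be true
ret i8 %m
}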
Differential Revision: https://reviews.llvm.org/D111998
There may be some other patterns like this or a generalization,
but this is an example that I noticed would definitely regress
with a planned follow-up to D111410.
https://alive2.llvm.org/ce/z/GVpQDb
The test diffs show that we have better analysis/folds for 'add'
(although we should at least have the simplifications
independently, so we don't have the one-use restriction).
This is related to solving regressions that would appear in
transforms related to D111410, and that is part of a series
of enhancements that may eventually help solve PR34047.
https://alive2.llvm.org/ce/z/3tB9KG
define i1 @src(i8 %x, i8 %C, i8 %C2) {
%sub = sub nuw i8 %C2, %x
%r = icmp slt i8 %sub, %C
ret i1 %r
}
define i1 @tgt(i8 %x, i8 %C, i8 %C2) {
%Cnot = xor i8 %C, -1
%C2not = xor i8 %C2, -1
%add = add nuw i8 %x, %C2not
%r = icmp sgt i8 %add, %Cnot
ret i1 %r
}
There were 2 related but over-specified folds for:
C1 - X == C
One allowed multi-use but was limited to equal constants.
The other allowed different constants but disallowed multi-use.
This combines the 2 folds into a more general match.
The test diffs show the multi-use cases that were falling
through the cracks.
https://alive2.llvm.org/ce/z/4_hEt2
define i1 @src(i8 %x, i8 %subC, i8 %C) {
%s = sub i8 %subC, %x
%r = icmp eq i8 %s, %C
ret i1 %r
}
define i1 @tgt(i8 %x, i8 %subC, i8 %C) {
%newC = sub i8 %subC, %C
%isneg = icmp eq i8 %x, %newC
ret i1 %isneg
}
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.
Differential Revision: https://reviews.llvm.org/D110807
This patch fixes potential shufflevector-related bugs like D93818.
As in D93818, this patch changes shufflevector's default placeholder to poison.
To reduce risk, the work was divided into several patches; this one covers InstCombineCompares and InstructionCombining.
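A sketch of the effect (hypothetical output, not a test from the patch): a
one-input shuffle created by these files now uses poison instead of undef for
the unused second operand:
define <2 x i32> @example(<2 x i32> %v) {
%s = shufflevector <2 x i32> %v, <2 x i32> poison, <2 x i32> <i32 1, i32 0>
ret <2 x i32> %s
}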
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110227
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing from some of the other methods.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
This could go either direction since the instruction
count is the same either way, but there are a few
reasons to prefer this:
1. We already do the related transform with 'and'
(see just above the new code).
2. We try (too hard) to compensate for not having this
and possibly other folds in transformZExtICmp(),
and that leads to bugs like https://llvm.org/PR51762 .
3. Codegen looks better across a variety of targets.
https://alive2.llvm.org/ce/z/uEgn4P
We cannot leak any equivalency information by comparing against null
since null never has virtual metadata associated with it (when null is
not a valid dereferenceable pointer).
InstCombine seems to make sure that a null will be on the RHS, so we
don't have to check both operands.
This fixes a missed optimization in llvm-test-suite's MultiSource lambda
benchmark under -fstrict-vtable-pointers.
Reviewed By: Prazek
Differential Revision: https://reviews.llvm.org/D108734
This is a quick fix for a motivating case that looks like this:
https://godbolt.org/z/GeMqzMc38
As noted, we might be able to restore the min/max patterns
with select folds, or we just wait for this to become easier
with canonicalization to min/max intrinsics.
The intrinsics have an extra chunk of known bits logic
compared to the normal cmp+select idiom. That allows
folding the icmp in each case to something better, but
that then opposes the canonical form of min/max that
we try to form for a select.
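For reference, the two forms in question (a generic smax sketch, not the
exact motivating case):
define i8 @smax_select(i8 %x, i8 %y) {
%c = icmp sgt i8 %x, %y
%m = select i1 %c, i8 %x, i8 %y
ret i8 %m
}
define i8 @smax_intrinsic(i8 %x, i8 %y) {
%m = call i8 @llvm.smax.i8(i8 %x, i8 %y)
ret i8 %m
}
declare i8 @llvm.smax.i8(i8, i8)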
I'm carving out a narrow exception to preserve all
existing regression tests while avoiding the inf-loop.
It seems unlikely that this is the only bug like this
left, but this should fix:
https://llvm.org/PR51419
There may be some generalizations (see test comments) of these patterns,
but this should handle the cases motivated by:
https://llvm.org/PR51315
https://llvm.org/PR51259
The backend may want to transform differently, but at least for
the x86 examples that I looked at, there does not appear to be
any significant perf diff either way.
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support
for the specific cases where it works. This patch is the
first one along this line.
Consider the example:
%i = ptrtoint i8* %X to i64
%p = inttoptr i64 %i to i16*
%cmp = icmp eq i16* %load, %p
In this specific case, the inttoptr/ptrtoint optimization is correct
as it only compares the pointer values. In this patch, we fold
inttoptr/ptrtoint to a bitcast (if src and dest types are different).
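So the example above becomes (a sketch of the folded result):
%p = bitcast i8* %X to i16*
%cmp = icmp eq i16* %load, %p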
Differential Revision: https://reviews.llvm.org/D105088
This set of folds was added recently with:
c7b658aeb5
0c400e8953
40b752d28d
...and I noted that this wasn't likely to fire in code derived
from C/C++ source because of nsw in particular. But I didn't
notice that I had placed the code above the no-wrap block
of transforms.
This is likely the cause of regressions noted from the previous
commit because -- as shown in the test diffs -- we may have
transformed into a compare with an arbitrary constant rather
than a simpler signbit test.
This is the pattern from the description of:
https://llvm.org/PR50816
There might be a way to generalize this to a smaller or more
generic pattern, but I have not found it yet.
https://alive2.llvm.org/ce/z/ShzJoF
define i1 @src(i8 %x) {
%add = add i8 %x, -1
%xor = xor i8 %x, -1
%and = and i8 %add, %xor
%r = icmp slt i8 %and, 0
ret i1 %r
}
define i1 @tgt(i8 %x) {
%r = icmp eq i8 %x, 0
ret i1 %r
}
This follows up patches for the unsigned siblings:
0c400e8953
c7b658aeb5
We are translating an offset signed compare to its
unsigned equivalent when one end of the range is
at the limit (zero or unsigned max).
(X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1)
(X + C2) <s C --> X >u (C ^ SMAX) (if C == C2)
This probably does not show up much in IR derived
from C/C++ source because that would likely have
'nsw', and we have folds for that already.
As with the previous unsigned transforms, the folds
could be generalized to handle non-constant patterns:
https://alive2.llvm.org/ce/z/Y8Xrrm
; sgt
define i1 @src(i8 %a, i8 %c) {
%c2 = add i8 %c, 1
%t = add i8 %a, %c2
%ov = icmp sgt i8 %t, %c
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c) {
%c_off = sub i8 127, %c ; SMAX
%ov = icmp ult i8 %a, %c_off
ret i1 %ov
}
https://alive2.llvm.org/ce/z/c8uhnk
; slt
define i1 @src(i8 %a, i8 %c) {
%t = add i8 %a, %c
%ov = icmp slt i8 %t, %c
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c) {
%c_offnot = xor i8 %c, 127 ; SMAX
%ov = icmp ugt i8 %a, %c_offnot
ret i1 %ov
}
This is one sibling of the fold added with c7b658aeb5 .
(X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN)
I'm still not sure how to describe it best, but we're
translating 2 constants from an unsigned range comparison
to signed because that eliminates the offset (add) op.
This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/K-fMBf
define i1 @src(i8 %a, i8 %c2) {
%t = add i8 %a, %c2
%c = add i8 %c2, 128 ; SMIN
%ov = icmp ult i8 %t, %c
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c2) {
%not_c2 = xor i8 %c2, -1
%ov = icmp sgt i8 %a, %not_c2
ret i1 %ov
}
There must be a better way to describe this pattern in words?
(X + C2) >u C --> X <s -C2 (if C == C2 + SMAX)
This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/rdfNFP
define i1 @src(i8 %a, i8 %c1) {
%t = add i8 %a, %c1
%c2 = add i8 %c1, 127 ; SMAX
%ov = icmp ugt i8 %t, %c2
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c1) {
%neg_c1 = sub i8 0, %c1
%ov = icmp slt i8 %a, %neg_c1
ret i1 %ov
}
The pattern was noticed as a by-product of D104932.
We could use a bigger hammer and bail out on any constant
expression, but there's a regression test that appears to
validly do the transform (although it may not have been
intending to check that optimization).
Rather than relying on pointer type equality (which, for a change,
is silently incorrect with opaque pointers) check that the GEP
source element types match.
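A hypothetical illustration: with opaque pointers, both GEPs below have
operands of the same pointer type, so a pointer type equality check passes
even though the source element types differ:
define void @example(ptr %p) {
%a = getelementptr i32, ptr %p, i64 1   ; source element type is i32
%b = getelementptr i64, ptr %p, i64 1   ; source element type is i64
ret void
}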
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows.
```
// assume that GV is an array of 4-byte elements
GEP = gep GV, 0, Idx // this is accessing Idx * 4
L = load GEP
ICI = icmp eq L, value
=>
ICI = icmp eq Idx, NewIdx
```
The foldCmpLoadFromIndexedGlobal function simplifies the GEP+load operation to an icmp,
and there is a problem because Idx * ElementSize can overflow.
Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.
This problem can be solved by masking off the top (trailing zeros of ElementSize) bits of Idx.
```
...
=>
Idx' = and Idx, 0x3F..FF
ICI = icmp eq Idx', NewIdx
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D99481
This reverts commit 4f2fd3818b.
The Linux kernel fails to build after this commit. See
https://reviews.llvm.org/D99481 for a reproducer.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows.
```
// assume that GV is an array of 4-byte elements
GEP = gep GV, 0, Idx // this is accessing Idx * 4
L = load GEP
ICI = icmp eq L, value
=>
ICI = icmp eq Idx, NewIdx
```
The foldCmpLoadFromIndexedGlobal function simplifies the GEP+load operation to an icmp,
and there is a problem because Idx * ElementSize can overflow.
Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.
This problem can be solved by masking off the top (trailing zeros of ElementSize) bits of Idx.
```
...
=>
Idx' = and Idx, 0x3F..FF
ICI = icmp eq Idx', NewIdx
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D99481
This was reverted due to performance regressions in ARM benchmarks,
which have since been addressed by D101196 (SCEV analysis improvement)
and D101778 (CGP reverse transform).
-----
The single-use case is handled implicitly by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.
https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
This fixes https://llvm.org/PR48900 , but as seen in the
regression tests, it prevents some optimizations.
There are a few options to restore those (switch to min/max
intrinsics, add larger pattern matching for select with
dominating condition, improve CVP), but we need to prevent
the bug first.
This reverts commit 13ec913bdf.
This commit introduces new uses of the overflow checking intrinsics that
depend on implementations in compiler-rt, which Windows users generally
do not link against. I filed an issue (somewhere) to make clang
auto-link the builtins library to resolve this situation, but until that
happens, it isn't reasonable for the optimizer to introduce new link
time dependencies.
Currently, InstCombineCompares combines two add operations
into a single add operation which always has the nsw flag, without
checking the conditions to see whether this flag is justified
by the original two add operations.
This patch changes InstCombineCompares to emit nsw or
nuw only when these flags are allowed to be generated according to
the original add operations, and removes the possibility of a
wrong optimization being applied by passes that run on the IR later
in the pipeline.
To confirm that the current results are buggy and that the results after
the proposed patch are correct, the following examples from Alive2
are attached; the same results can be seen in the case of the nuw flag,
and nsw is just used as an example. The following link shows that
the IR generated by current LLVM is buggy when neither of the
original add operations has the nsw flag:
https://alive2.llvm.org/ce/z/WGaDrm
The following link shows that the IR generated after the patch in
the same case is correct:
https://alive2.llvm.org/ce/z/wQ7G_e
Differential Revision: https://reviews.llvm.org/D100095