Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
decreasing the instruction count.
Note that we could position this transformation as just hoisting
of the `not` (still, iff y is freely negatable), but the test changes
show a number of regressions, so let's not do that.
As Mikael Holmén notes in the post-commit review for the first fix
https://reviews.llvm.org/rGd4ccef38d0bb#967466
not hoisting constantexprs is not enough,
because if the xor originally was a constantexpr (i.e. X is a constantexpr),
`SimplifyAssociativeOrCommutative()` in `visitXor()` will immediately
undo this transform, thus again causing an infinite combine loop.
This transform has resulted in a surprising number of constantexpr failures.
As reported in the post-commit review at
https://reviews.llvm.org/D93857,
this fold (as I expected, but failed to come up with test coverage
despite trying) has issues with constant expressions.
Since we only care about true constants, which constantexprs are not,
don't perform such hoisting for constant expressions.
This is one of the deficiencies that can be observed in
https://godbolt.org/z/YPczsG after D91038 patch set.
This exposed two missing folds, one was fixed by the previous commit,
another one is `(A ^ B) | ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`.
`-early-cse` will catch it: https://godbolt.org/z/4n1T1v,
but it isn't meaningful to fix it in InstCombine,
because we'd need to essentially do our own CSE,
and we can't even rely on `Instruction::isIdenticalTo()`,
because there are no guarantees that the order of operands matches.
So let's just accept it as a loss.
One less instruction, and the use count of the zext is reduced.
As alive2 confirms, we're fine with all the weird combinations of
undef elts in constants, but unless the shift amount was undef
for a lane, we must sanitize undef elements of the mask to zero, since
the sign bits are no longer zeros.
https://rise4fun.com/Alive/d7r
```
----------------------------------------
Optimization: zz
Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2))
%o0 = zext %x
%o1 = shl %o0, C1
%r = and %o1, C2
=>
%n0 = sext %x
%r = and %n0, C2
Done: 2016
Optimization is correct!
```
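For a quick sanity check outside of Alive, here is a standalone C++ illustration (not part of the patch) that exhaustively compares both forms for the i8-to-i16 case, with C1 = 8 and C2 = the i16 sign bit:
```
#include <cassert>
#include <cstdint>

int main() {
  const uint16_t C2 = 0x8000; // isSignBit(C2) for i16
  for (unsigned v = 0; v < 256; ++v) {
    uint8_t x = (uint8_t)v;
    // old form: and(shl(zext(x), 8), C2)
    uint16_t oldForm = (uint16_t)(((uint16_t)x << 8) & C2);
    // new form: and(sext(x), C2)
    uint16_t newForm = (uint16_t)((uint16_t)(int16_t)(int8_t)x & C2);
    assert(oldForm == newForm);
  }
  return 0;
}
```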
m_SpecificInt has the same 'no undef element' behaviour as m_APInt so no change there, and anyway we have test coverage for undef elements in the fold.
Noticed while fixing a -Wshadow warning about shadowed Value *X, *Y variables.
There are 1-2 potential follow-up NFC commits to reduce
this further on the way to generalizing this for vectors.
The operand replacing path should be dead code because demanded
bits handles that more generally (D91415).
I'm not certain InstCombinerImpl::matchBSwapOrBitReverse needs to filter the or(op0(),op1()) ops - there are just too many cases that recognizeBSwapOrBitReverseIdiom/collectBitParts handle now (and quickly).
matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.
Fixes a number of stage2 buildbots that were failing when I generalized the m_ConstantInt() logic - that didn't match for pointer types but m_Zero() does.
Scalar cases were already being handled by foldLogOpOfMaskedICmps (so this was dead code), but refactoring to support non-uniform vectors will take some time, so tweak this fold in the meantime.
m_SpecificInt doesn't accept undef elements in a vector splat value - tweak specific_intval to optionally allow undefs and add the m_SpecificIntAllowUndef variants.
Allows us to remove the m_APIntAllowUndef + comparison hack inside matchFunnelShift
Replace m_SpecificInt with m_APIntAllowUndef to match splats containing undefs, then use ConstantExpr::mergeUndefsWith to merge the undefs together in the result.
The undef funnel shift amounts are getting replaced with zero later on - I'll address this in a later patch, otherwise we lose potential shift by splat value patterns.
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).
Reapplied after the shift canonicalization in rG02295e6d1a15 which removed the need to flip the shift values.
Differential Revision: https://reviews.llvm.org/D88783
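As an illustration of why the `x < bitwidth` requirement makes the or-of-shifts pattern behave like a funnel shift, here is a small standalone C++ sketch (not from the patch); the reference fshl is implemented via 64-bit concatenation:
```
#include <cassert>
#include <cstdint>

// Reference i32 funnel shift left: top 32 bits of (a:b) shifted left by x.
static uint32_t fshl32(uint32_t a, uint32_t b, uint32_t x) {
  uint64_t concat = ((uint64_t)a << 32) | b;
  return (uint32_t)((concat << x) >> 32);
}

int main() {
  const uint32_t bw = 32;
  uint32_t a = 0xDEADBEEF, b = 0x12345678;
  // x must be known to be in (0, bw): for x == 0 the lshr amount would be bw,
  // which is poison in IR, and for x >= bw both shifts would be poison.
  for (uint32_t x = 1; x < bw; ++x)
    assert(((a << x) | (b >> (bw - x))) == fshl32(a, b, x));
  return 0;
}
```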
After rG02295e6d1a15 we no longer need to invert the shift values for fshr - this is just hidden at the moment as funnel shifts only ever match for constant values so never use the fshr "Sub on SHL" path.
Complete basic PR46895 fixes by refactoring D87452/D88402 to allow us to match non-uniform constant values.
We still don't handle non-uniform vectors that contain undef elements, but that can wait until we have a decent generic mechanism for this.
Differential Revision: https://reviews.llvm.org/D88420
First step towards extending the existing rotation support to full funnel shift handling now that the backend legalization support has improved.
This enables us to match the shift by constant cases, which are pretty trivial to expand again if necessary.
D88420 will add non-uniform support for funnel shifts as well once its been finalized.
Differential Revision: https://reviews.llvm.org/D88834
The existing code ignores undef values, which matches m_SpecificInt_ICMP. Although m_SpecificInt_ICMP returns false for an all-undef constant, I've added test coverage at rGfe0197e194a64f9 to show that undef folding should already have dealt with that case.
If we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.
Reapplied with early-out if recognizeBSwapOrBitReverseIdiom collects a source wider than the result type.
Differential Revision: https://reviews.llvm.org/D88578
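For a concrete i32 example of the 'bswap some bytes, zero the remainder' equivalence described above, here is a standalone C++ illustration (not part of the patch; it assumes the GCC/Clang __builtin_bswap32 builtin):
```
#include <cassert>
#include <cstdint>

int main() {
  uint32_t x = 0xAABBCCDD;
  // Move the two low bytes of x, swapped, into the high half, zeroing the rest...
  uint32_t manual = ((x & 0xFF) << 24) | ((x & 0xFF00) << 8);
  // ...which is the same as a full bswap followed by a mask.
  uint32_t folded = __builtin_bswap32(x) & 0xFFFF0000;
  assert(manual == folded); // both are 0xDDCC0000 here
  return 0;
}
```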
Fixes minor bug in D88402 where we were using the original shift constant (with undefs) instead of one with the splat values (re)splatted to all elements.
This patch adds handling of rotation patterns with constant shift amounts - the next bit will be how we want to support non-uniform constant vectors.
Differential Revision: https://reviews.llvm.org/D87452
This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen.
The SCEV motivating case is discussed in:
http://bugs.llvm.org/PR47136
There's an analysis motivating case at:
http://bugs.llvm.org/PR38781
I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps.
Alive proofs:
https://rise4fun.com/Alive/BBV
Name: shift right of 'not'
Pre: C2 == (-1 u>> C1)
%a = lshr i8 %x, C1
%r = xor i8 %a, C2
=>
%n = xor i8 %x, -1
%r = lshr i8 %n, C1
Name: shift left of 'not'
Pre: C2 == (-1 << C1)
%a = shl i8 %x, C1
%r = xor i8 %a, C2
=>
%n = xor i8 %x, -1
%r = shl i8 %n, C1
Name: ashr of 'not'
%a = ashr i8 %x, C1
%r = xor i8 %a, -1
=>
%n = xor i8 %x, -1
%r = ashr i8 %n, C1
Differential Revision: https://reviews.llvm.org/D86243
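The lshr case of the proofs above is easy to double-check exhaustively at i8 width; a standalone C++ illustration (not part of the commit):
```
#include <cassert>
#include <cstdint>

int main() {
  // 'shift right of not': with C2 == (-1 u>> C1),
  // (x u>> C1) ^ C2  ==  (~x) u>> C1, checked for every i8 value.
  for (unsigned c1 = 0; c1 < 8; ++c1) {
    uint8_t C2 = (uint8_t)(0xFFu >> c1);
    for (unsigned v = 0; v < 256; ++v) {
      uint8_t x = (uint8_t)v;
      uint8_t before = (uint8_t)((x >> c1) ^ C2);
      uint8_t after  = (uint8_t)((uint8_t)~x >> c1);
      assert(before == after);
    }
  }
  return 0;
}
```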
The 1st try at this (rG2265d01f2a5b) exposed what looks like
unspecified behavior in C/C++ resulting in test variations.
The arguments to BinaryOperator::CreateAnd() were both IRBuilder
function calls, and the order in which they execute determines
the order of the new instructions in the IR. But the order of
function arg evaluation is not fixed by the rules of C/C++, so
depending on compiler config, the test would fail because the
test expected a single fixed ordering of instructions.
Original commit message:
I tried to use m_Deferred() on this, but didn't find
a clean way to do that.
http://bugs.llvm.org/PR46955
https://alive2.llvm.org/ce/z/2h6QTq
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows moving about 3000 lines out of InstCombine into the targets.
Differential Revision: https://reviews.llvm.org/D81728
I'm not sure if the test is truly minimal, but we need to
induce a situation where a value becomes a constant but is
not immediately folded before getting to the 'or' transform.
We can't leave undef vector element constants as-is;
it is a miscompile, so we need to sanitize them.
We have two vectors (C and ~C):
* We can't replace undef with 0 in both of them
* We can't replace undef with 0 in only one of them
* We could replace undef with -1 in both of them
* We could replace undef with -1 in only one(!) of them
* We could replace undef with -1 in one and 0 in another one of them.
Therefore, it seems best to go with the last option, since otherwise
we'd lose the knowledge that C and ~C have no common bits set,
which seems more important than preserving partial undef knowledge.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45955
Fold or(zext(bitreverse(x)),shl(zext(bitreverse(y)),bw/2)) -> bitreverse(or(zext(y),shl(zext(x),bw/2)))
Practically this is the same as the BSWAP pattern so we might as well handle it.
This adds a general combine that can be used to fold:
or(zext(OP(x)), shl(zext(OP(y)),bw/2))
-->
OP(or(zext(y), shl(zext(x),bw/2)))
(note that the lower and upper sources swap, since OP reverses the bit/byte order).
Allowing us to widen 'concat-able' style or+zext patterns - I've just set this up for BSWAP but we could use this for other similar ops (BITREVERSE for instance).
We already do something similar for bitop(bswap(x),bswap(y)) --> bswap(bitop(x,y))
Fixes PR45715
Reviewed By: @lebedev.ri
Differential Revision: https://reviews.llvm.org/D79041
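To make the lower/upper source swap in the concat fold concrete, here is a standalone C++ illustration of the i16-into-i32 bswap case (not part of the patch; it assumes the GCC/Clang bswap builtins):
```
#include <cassert>
#include <cstdint>

int main() {
  uint16_t x = 0xAABB, y = 0xCCDD;
  // or(zext(bswap(x)), shl(zext(bswap(y)), 16))
  uint32_t before = (uint32_t)__builtin_bswap16(x) |
                    ((uint32_t)__builtin_bswap16(y) << 16);
  // bswap(or(zext(y), shl(zext(x), 16))) -- note x and y swap halves
  uint32_t after = __builtin_bswap32((uint32_t)y | ((uint32_t)x << 16));
  assert(before == after); // both are 0xDDCCBBAA here
  return 0;
}
```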
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sdesmalen, rriddle, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77263
Because this code does not use the IC-aware replaceInstUsesWith()
helper, we need to manually push users to the worklist.
This is NFC-ish, in that it may only change worklist order.
This patch adds a simplification if an OR weakens the overflow condition
for umul.with.overflow by treating any non-zero result as overflow. In that
case, we overflow if both umul.with.overflow operands are != 0, as in that
case the result can only be 0 if the multiplication overflows.
Code like this is generated by code using __builtin_mul_overflow with
negative integer constants, e.g.
bool test(unsigned long long v, unsigned long long *res) {
return __builtin_mul_overflow(v, -4775807LL, res);
}
```
----------------------------------------
Name: D74141
%res = umul_overflow {i8, i1} %a, %b
%mul = extractvalue {i8, i1} %res, 0
%overflow = extractvalue {i8, i1} %res, 1
%cmp = icmp ne %mul, 0
%ret = or i1 %overflow, %cmp
ret i1 %ret
=>
%t0 = icmp ne i8 %a, 0
%t1 = icmp ne i8 %b, 0
%ret = and i1 %t0, %t1
ret i1 %ret
%res = umul_overflow {i8, i1} %a, %b
%mul = extractvalue {i8, i1} %res, 0
%cmp = icmp ne %mul, 0
%overflow = extractvalue {i8, i1} %res, 1
Done: 1
Optimization is correct!
```
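The fold can also be cross-checked exhaustively for i8 with a standalone C++ loop (illustration only; it assumes the GCC/Clang __builtin_mul_overflow builtin):
```
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned ia = 0; ia < 256; ++ia) {
    for (unsigned ib = 0; ib < 256; ++ib) {
      uint8_t a = (uint8_t)ia, b = (uint8_t)ib;
      uint8_t mul;
      bool overflow = __builtin_mul_overflow(a, b, &mul);
      bool before = overflow || (mul != 0); // or(%overflow, %cmp)
      bool after = (a != 0) && (b != 0);    // and(%t0, %t1)
      assert(before == after);
    }
  }
  return 0;
}
```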
Reviewers: nikic, lebedev.ri, spatel, Bigcheese, dexonsmith, aemerson
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D74141
As discussed on D73919, this replaces a few cases where we were
modifying multiple operands of instructions in-place with the
creation of a new instruction, which we generally prefer nowadays.
This tends to be more readable and less prone to worklist management
bugs.
Test changes are only superficial (instruction naming and order).
Adds a replaceOperand() helper, which is like Instruction.setOperand()
but adds the old operand to the worklist. This reduces the amount of
missing or incorrect worklist management.
This only applies the helper to a relatively small subset of
setOperand() calls in InstCombine, namely those of the pattern
`I.setOperand(); return &I;`, where it is most obviously applicable.
Differential Revision: https://reviews.llvm.org/D73803
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.
As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.
Differential Revision: https://reviews.llvm.org/D73745
This addresses https://bugs.llvm.org/show_bug.cgi?id=42801.
The m_c_ICmp() matcher is changed to provide the swapped predicate
if the operands are swapped.
Existing uses of m_c_ICmp() fall into one of two categories: they work
on equality predicates only, where swapping is irrelevant, or they
perform a manual swap, in which case this patch removes it.
The only exception is the foldICmpWithLowBitMaskedVal() fold, which
does not swap the predicate, and instead reasons about whether
a swap occurred or not for each predicate. Getting the swapped
predicate allows us to merge the logic for pairs of predicates,
instead of duplicating it.
Differential Revision: https://reviews.llvm.org/D72976
not (select ?, (cmp TPred, ?, ?), (cmp FPred, ?, ?)) -->
select ?, (cmp TPred', ?, ?), (cmp FPred', ?, ?)
If both sides of the select are cmps, we can remove an instruction.
The case where only one side is a cmp is deferred to a possible
follow-on patch.
We have a more general 'isFreeToInvert' analysis, but I'm not seeing
a way to use that more widely without inducing infinite looping
(opposing transforms).
Here, we flip the compare predicates directly, so we should not have
any danger by creating extra intermediate 'not' ops.
Alive proofs:
https://rise4fun.com/Alive/jKa
Name: both select values are compares - invert predicates
%tcmp = icmp sle i32 %x, %y
%fcmp = icmp ugt i32 %z, %w
%sel = select i1 %cond, i1 %tcmp, i1 %fcmp
%not = xor i1 %sel, true
=>
%tcmp_not = icmp sgt i32 %x, %y
%fcmp_not = icmp ule i32 %z, %w
%not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not
Name: false val is compare - invert/not
%fcmp = icmp ugt i32 %z, %w
%sel = select i1 %cond, i1 %tcmp, i1 %fcmp
%not = xor i1 %sel, true
=>
%tcmp_not = xor i1 %tcmp, -1
%fcmp_not = icmp ule i32 %z, %w
%not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not
Differential Revision: https://reviews.llvm.org/D72007
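The boolean logic behind the fold is easy to spot-check in standalone C++ (illustration only, using plain comparisons in place of icmps): negating the selected compare is the same as selecting the inverted-predicate compares.
```
#include <cassert>

int main() {
  int vals[] = {-2, -1, 0, 1, 2};
  for (int x : vals) for (int y : vals) for (int z : vals) for (int w : vals)
    for (int cond = 0; cond <= 1; ++cond) {
      // not (select cond, (x s<= y), (z u> w)) ...
      bool notSel = !(cond ? (x <= y) : ((unsigned)z > (unsigned)w));
      // ... == select cond, (x s> y), (z u<= w)
      bool selNot = cond ? (x > y) : ((unsigned)z <= (unsigned)w);
      assert(notSel == selNot);
    }
  return 0;
}
```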
In this pattern, all the "magic" bits that we'd `add` are
high sign bits, and in the value we'd be adding to they are all unset
(not unexpectedly), so we can have an `or` there:
https://rise4fun.com/Alive/ups
It is possible that `haveNoCommonBitsSet()` should be taught about this
pattern so that we never have an `add` variant, but the reasoning would
need to be recursive (because of that `select`), so I'm not really sure
that would be worth it just yet.
llvm-svn: 375378
https://rise4fun.com/Alive/KtL
This also shows that the fold added in D67412 / r372257
was too specific, and the new fold allows those test cases
to be handled more generically, therefore I delete the now-dead code.
This is yet again motivated by
D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour"
llvm-svn: 372912
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
For
```
#include <cassert>
char* test(char& base, signed long offset) {
__builtin_assume(offset < 0);
return &base + offset;
}
```
We produce
https://godbolt.org/z/r40U47
and again those two icmp's can be merged:
```
Name: 0
Pre: C != 0
%adjusted = add i8 %base, C
%not_null = icmp ne i8 %adjusted, 0
%no_underflow = icmp ult i8 %adjusted, %base
%r = and i1 %not_null, %no_underflow
=>
%neg_offset = sub i8 0, C
%r = icmp ugt i8 %base, %neg_offset
```
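The merged form can also be verified exhaustively at i8 width; a standalone C++ sketch of the same identity (illustration only):
```
#include <cassert>
#include <cstdint>

int main() {
  // Exhaustive i8 check: with C != 0,
  // (base + C != 0) && (base + C u< base)  ==  base u> (0 - C).
  for (unsigned c = 1; c < 256; ++c) {
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t C = (uint8_t)c, base = (uint8_t)b;
      uint8_t adjusted = (uint8_t)(base + C);
      bool before = (adjusted != 0) && (adjusted < base);
      bool after = base > (uint8_t)(0 - C);
      assert(before == after);
    }
  }
  return 0;
}
```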
https://rise4fun.com/Alive/ALap
https://rise4fun.com/Alive/slnN
There are 3 other variants of this pattern,
I believe they will all go into InstSimplify.
https://bugs.llvm.org/show_bug.cgi?id=43259
Reviewers: spatel, xbolva00, nikic
Reviewed By: spatel
Subscribers: efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67849
llvm-svn: 372768
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
This pattern isn't exactly what we get there
(strict vs. non-strict predicate), but this pattern does not
require known-bits analysis, so it is best to handle it first.
```
Name: 0
%adjusted = add i8 %base, %offset
%not_null = icmp ne i8 %adjusted, 0
%no_underflow = icmp ule i8 %adjusted, %base
%r = and i1 %not_null, %no_underflow
=>
%neg_offset = sub i8 0, %offset
%r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/knp
There are 3 other variants of this pattern,
they all will go into InstSimplify:
https://rise4fun.com/Alive/bIDZ
https://bugs.llvm.org/show_bug.cgi?id=43259
Reviewers: spatel, xbolva00, nikic
Reviewed By: spatel
Subscribers: hiraditya, majnemer, vsk, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67846
llvm-svn: 372767
Summary:
Fold
or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X)
into
X s> Y ? -1 : X
https://rise4fun.com/Alive/d8Ab
clamp255 is a common operator in image processing; it can be implemented
in a shifty way: "(255 - X) >> 31 | X & 255". Folding the shift into the select
enables more optimization, e.g., vmin generation for the ARM target.
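The fold can be sanity-checked exhaustively at i8 width; a standalone C++ illustration (not part of the patch) that honors the subNSW precondition and assumes arithmetic right shift for negative values (guaranteed since C++20):
```
#include <cassert>
#include <cstdint>

int main() {
  // Exhaustive i8 check: or(ashr(subNSW(Y, X), 7), X)  ==  X s> Y ? -1 : X.
  for (int y = -128; y <= 127; ++y) {
    for (int x = -128; x <= 127; ++x) {
      int diff = y - x;
      if (diff < -128 || diff > 127)
        continue;                                 // subNSW: skip signed wrap
      int8_t ashr = (int8_t)((int8_t)diff >> 7);  // all-ones if Y s< X, else 0
      int8_t before = (int8_t)(ashr | x);
      int8_t after = (x > y) ? (int8_t)-1 : (int8_t)x;
      assert(before == after);
    }
  }
  return 0;
}
```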
Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon
Reviewed By: lebedev.ri
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67800
llvm-svn: 372678
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
In this particular case, given
```
char* test(char& base, unsigned long offset) {
return &base - offset;
}
```
it will end up producing something like
https://godbolt.org/z/luGEju
which after optimizations reduces down to roughly
```
declare void @use64(i64)
define i1 @test(i8* dereferenceable(1) %base, i64 %offset) {
%base_int = ptrtoint i8* %base to i64
%adjusted = sub i64 %base_int, %offset
call void @use64(i64 %adjusted)
%not_null = icmp ne i64 %adjusted, 0
%no_underflow = icmp ule i64 %adjusted, %base_int
%no_underflow_and_not_null = and i1 %not_null, %no_underflow
ret i1 %no_underflow_and_not_null
}
```
Without D67122 there was no `%not_null`,
and in this particular case we can "get rid of it" by merging the two checks:
here we are checking `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset`.
Alive proofs:
https://rise4fun.com/Alive/QOs
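The same merge can be verified exhaustively at i8 width with a standalone C++ loop (illustration only):
```
#include <cassert>
#include <cstdint>

int main() {
  // Exhaustive i8 check of the merged check described above:
  // (base - offset != 0) && (base - offset u<= base)  ==  base u> offset.
  for (unsigned b = 0; b < 256; ++b) {
    for (unsigned o = 0; o < 256; ++o) {
      uint8_t base = (uint8_t)b, offset = (uint8_t)o;
      uint8_t adjusted = (uint8_t)(base - offset);
      bool before = (adjusted != 0) && (adjusted <= base);
      bool after = base > offset;
      assert(before == after);
    }
  }
  return 0;
}
```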
The `@llvm.usub.with.overflow` pattern itself is not handled here
because this is the main pattern, which we currently consider canonical.
https://bugs.llvm.org/show_bug.cgi?id=43251
Reviewers: spatel, nikic, xbolva00, majnemer
Reviewed By: xbolva00, majnemer
Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67356
llvm-svn: 372341
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.
This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with a different predicate that requires the offset to be non-zero.
The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns.
Proofs: https://rise4fun.com/Alive/21b
Also, there is something odd with `isKnownNonZero()`: if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267).
With this, I see no other missing folds for
https://bugs.llvm.org/show_bug.cgi?id=43251
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67412
llvm-svn: 372257
Summary:
This is rather unconventional...
As the comment there says, we don't have many folds for xor-of-icmps,
so we try to turn them into an and-of-icmps, for which we have plenty of folds.
But if the ICmp we need to invert is not single-use, we give up.
As discussed in https://reviews.llvm.org/D65148#1603922,
we may have a non-canonical CLAMP pattern, with bit math and
select-of-threshold that we'll potentially clamp.
As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`,
out of all 8 variations of the pattern, only two are **not** canonicalized into
the variant with and+icmp instead of bit math.
The reason is that the ICmp we need to invert is not single-use, so we give up.
We indeed can't perform this fold at will; the general rule is that
we should not increase the instruction count in InstCombine.
But we wouldn't end up increasing the instruction count if we can adapt every other
user to the inverted value. This way the `not` we create **will** get folded,
and in the end the instruction count does not increase.
For that, of course, we need to look at the users of a Value,
which is again rather unconventional for InstCombine :S
Thus I'm proposing to be a little bit more insistent in `foldXorOfICmps()`.
The alternatives would be to not create that `not`, but add duplicate code to
manually invert all users; or to add some even less general combine to handle
some more specific pattern[s].
Reviewers: spatel, nikic, RKSimon, craig.topper
Reviewed By: spatel
Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65530
llvm-svn: 368685
We can treat icmp eq X, MIN_UINT as icmp ule X, MIN_UINT and allow
it to merge with icmp ugt X, C. Similar for the other constants.
We can do similar for icmp ne X, (U)INT_MIN/MAX in foldAndOfICmps, and we already handled UINT_MIN there.
Fixes PR42691.
Differential Revision: https://reviews.llvm.org/D65017
llvm-svn: 366945