icmp is the preferred spelling in IR because icmp analysis is
expected to be better than any other analysis. This should
lead to more follow-on folding potential.
It's difficult to say exactly what we should do in codegen to
compensate. For example on AArch64, which of these is preferred:
sub w8, w0, w1
lsr w0, w8, #31
vs:
cmp w0, w1
cset w0, lt
If there are perf regressions, then we should deal with those in
codegen on a case-by-case basis.
A possible motivating example for better optimization is shown in:
https://llvm.org/PR43198 but that will require other transforms
before anything changes there.
Alive proof:
https://rise4fun.com/Alive/o4E
Name: sign-bit splat
Pre: C1 == (width(%x) - 1)
%s = sub nsw %x, %y
%r = ashr %s, C1
=>
%c = icmp slt %x, %y
%r = sext %c
Name: sign-bit LSB
Pre: C1 == (width(%x) - 1)
%s = sub nsw %x, %y
%r = lshr %s, C1
=>
%c = icmp slt %x, %y
%r = zext %c
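For illustration, here is a hypothetical i32 instance of the sign-bit
splat form above (names and constants invented for this example, not
taken from the in-tree tests):
```
; Before: nsw subtraction followed by an arithmetic shift by bitwidth-1.
define i32 @signbit_splat(i32 %x, i32 %y) {
  %s = sub nsw i32 %x, %y
  %r = ashr i32 %s, 31
  ret i32 %r
}

; After: the canonical icmp + sext spelling of the same value.
define i32 @signbit_splat_canon(i32 %x, i32 %y) {
  %c = icmp slt i32 %x, %y
  %r = sext i1 %c to i32
  ret i32 %r
}
```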
This still only gets used for scalar types but now always uses ConstantExpr in preparation for vector support - it was using APInt methods in some places.
Use m_Specific instead of m_Value followed by an equality check - we already do this for the similar folds above; it looks like an oversight in rG2b459fe7e1e, where the original pattern match code looked a little different.
This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen.
The SCEV motivating case is discussed in:
http://bugs.llvm.org/PR47136
There's an analysis motivating case at:
http://bugs.llvm.org/PR38781
I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps.
Alive proofs:
https://rise4fun.com/Alive/BBV
Name: shift right of 'not'
Pre: C2 == (-1 u>> C1)
%a = lshr i8 %x, C1
%r = xor i8 %a, C2
=>
%n = xor i8 %x, -1
%r = lshr i8 %n, C1
Name: shift left of 'not'
Pre: C2 == (-1 << C1)
%a = shl i8 %x, C1
%r = xor i8 %a, C2
=>
%n = xor i8 %x, -1
%r = shl i8 %n, C1
Name: ashr of 'not'
%a = ashr i8 %x, C1
%r = xor i8 %a, -1
=>
%n = xor i8 %x, -1
%r = ashr i8 %n, C1
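For illustration, a hypothetical i8 instance of the 'shift right of not'
proof with C1 == 3 (so C2 == -1 u>> 3 == 31):
```
; Old canonical form: the 'not' was sunk below the shift and became an
; arbitrary xor-with-constant.
define i8 @not_hidden_in_xor(i8 %x) {
  %a = lshr i8 %x, 3
  %r = xor i8 %a, 31        ; 31 == (-1 u>> 3)
  ret i8 %r
}

; New canonical form: keep the 'not' visible and shift it instead.
define i8 @not_kept(i8 %x) {
  %n = xor i8 %x, -1
  %r = lshr i8 %n, 3
  ret i8 %r
}
```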
Differential Revision: https://reviews.llvm.org/D86243
For a long time, the InstCombine pass handled target-specific
intrinsics. Having target-specific code in general passes has long been
noted as an area for improvement.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target-specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration; therefore, the InstCombine pass resorts to newly
introduced functions in TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows moving about 3000 lines out of InstCombine and into the targets.
Differential Revision: https://reviews.llvm.org/D81728
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector types. This requires ConstantVector::getSplat() to take an
'ElementCount' instead of an 'unsigned' element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.
This is confusing and makes consumer code more complicated.
I would expect ConstantFoldConstant() either to return non-null only
if it actually folded something, or to always return non-null.
I'm going with the latter possibility here, which appears to be more
useful considering existing usage.
Differential Revision: https://reviews.llvm.org/D75543
As input, we have the following pattern:
Sh0 (Sh1 X, Q), K
We want to rewrite that as:
Sh X, (Q+K) iff (Q+K) u< bitwidth(X)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN - 1), we may have looked past extensions of
the shift amounts, so it may now overflow in the smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow happened, the (Q+K) u< bitwidth(X) check would be bogus.
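As a minimal sketch (hypothetical i8 example) of the basic
reassociation when the check succeeds:
```
; 3 + 2 u< 8, so the two same-direction shifts can be merged into one.
define i8 @reassoc_shifts(i8 %x) {
  %sh1 = lshr i8 %x, 3
  %sh0 = lshr i8 %sh1, 2
  ret i8 %sh0              ; can become: lshr i8 %x, 5
}
```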
https://bugs.llvm.org/show_bug.cgi?id=44802
This is a followup to D73803, which uses the replaceOperand()
helper in more places.
This should be NFC apart from changes to worklist order.
Differential Revision: https://reviews.llvm.org/D73919
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.
As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.
Differential Revision: https://reviews.llvm.org/D73745
shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1)
This is an IR translation of an existing SDAG transform added here:
rL370617
So we again have 9 possible patterns with a commuted IR variant of each pattern:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn
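For example, a hypothetical i32 instance of the 'and' variant (see the
proofs above for the general forms):
```
; Before: shift, logic op, shift.
define i32 @shl_and_shl(i32 %x, i32 %y) {
  %s1 = shl i32 %x, 3
  %a  = and i32 %s1, %y
  %r  = shl i32 %a, 2
  ret i32 %r
}

; After: one combined shift of %x, with the logic op pulled outward.
define i32 @shl_and_shl_folded(i32 %x, i32 %y) {
  %s  = shl i32 %x, 5
  %ys = shl i32 %y, 2
  %r  = and i32 %s, %ys
  ret i32 %r
}
```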
Part of the motivation is to allow easier recognition and subsequent
canonicalization of bswap patterns as discussed in PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
We had to delay this transform because it used to allow the SLP vectorizer
to create awful reductions out of simple load-combines.
That problem was fixed with:
rL375025
(we'll bring back load combining in IR someday...)
The backend is also better equipped to deal with these patterns now
using hooks like TLI.getShiftAmountThreshold().
The only remaining potential controversy is that the -reassociate pass
tends to reverse this kind of pattern (to help GVN?). But since -reassociate
doesn't do anything with these specific patterns, there is no conflict currently.
Finally, there's a new pass proposal at D67383 for general tree-height-reduction
reassociation, and it could use a cost model to decide how to optimally rearrange
these kinds of ops for a target. That patch appears to be stalled.
Differential Revision: https://reviews.llvm.org/D69842
Summary:
That fold keeps growing and growing :(
I think this may be one of the last pieces for it.
Since D67677/D67725, the fold knows the general form
of the pattern - where some masking is needed:
https://rise4fun.com/Alive/F5R
https://rise4fun.com/Alive/gslRa
But there is one more huge piece missing - if we are extracting some bits,
the origin may well be wider than the extraction,
i.e. there may be a truncation, and we don't deal with that yet.
But we can, and the generalization remains fully identical:
https://rise4fun.com/Alive/Uar
https://rise4fun.com/Alive/5SW
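For a rough illustration only (a hypothetical example; the in-tree tests
cover the exact pattern shapes), the masked value may live in a wider
type than the final shift:
```
; The mask is applied in i64, but the shift happens after a trunc to i32.
; The wide mask can still be dropped in favor of a constant mask in the
; narrow type: equivalent to (trunc i64 %x to i32) << 8, 'and' 0xFFFF00.
define i32 @masked_trunc_shl(i64 %x) {
  %m = and i64 %x, 65535          ; 65535 == -1 u>> 48
  %t = trunc i64 %m to i32
  %r = shl i32 %t, 8
  ret i32 %r
}
```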
After a preparatory cleanup, I think the diff looks rather clean.
One missing piece is that in some patterns (especially pattern b),
`-1` only needs to be `-1` in the final type, but that is for later.
https://bugs.llvm.org/show_bug.cgi?id=42563
Reviewers: spatel, nikic
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69125
Summary:
This problem consists of several parts:
* Basic sign bit extraction - `trunc? (?shr %x, (bitwidth(x)-1))`.
This is trivial, and easy to do, we have a fold for it.
* Shift amount reassociation - if we have two identical shifts,
and we can simplify-add their shift amounts together,
then we likely can just perform them as a single shift.
But this is finicky, has one-use restrictions,
and shift opcodes must be identical.
But there is a super-pattern where both of these work together
to produce a sign bit test from two shifts + comparison.
We do indeed already handle this in most cases.
But since we get that fold transitively, it has one-use restrictions.
And what's worse, in this case the right-shifts aren't required to be
identical, and we can't handle that transitively:
If the total shift amount is bitwidth-1, only a sign bit will remain
in the output value. But if we look at this from the perspective of
two shifts, we can't fold - we can't possibly know what bit pattern
we'd produce via two shifts; it will be *some* kind of mask
produced from the original sign bit, but we just can't tell its shape:
https://rise4fun.com/Alive/cM0
https://rise4fun.com/Alive/9IN
But it will *only* contain sign bit and zeros.
So from the perspective of sign bit test, we're good:
https://rise4fun.com/Alive/FRz
https://rise4fun.com/Alive/qBU
Superb!
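To make that concrete, here is a hypothetical i32 instance (names
invented; 20 + 11 == 31 == bitwidth - 1):
```
; Whatever mask the two right shifts produce, comparing it against zero
; is still just a sign-bit test on %x.
define i1 @signtest_via_two_shifts(i32 %x) {
  %a = ashr i32 %x, 20
  %b = lshr i32 %a, 11
  %r = icmp ne i32 %b, 0
  ret i1 %r              ; can become: icmp slt i32 %x, 0
}
```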
So the simplest solution is to extend `reassociateShiftAmtsOfTwoSameDirectionShifts()` to also have a
pseudo-analysis mode that ignores extra uses and only checks
whether a) those are two right shifts and b) they end up with a total
shift amount of bitwidth(x)-1, and then returns either the original
value that we are sign-checking, or null.
This does not have any functionality change for
the existing `reassociateShiftAmtsOfTwoSameDirectionShifts()`.
All that being said, as discussed in the review, this yet again
increases usage of instsimplify in instcombine as a utility.
Some day that may need to be reevaluated.
https://bugs.llvm.org/show_bug.cgi?id=43595
Reviewers: spatel, efriedma, vsk
Reviewed By: spatel
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68930
llvm-svn: 375371
The 1st attempt at rL374828 inserted the code
at the wrong position (outside of the constant-shift-amount
block). Trying again with an additional test to verify
const-ness.
For a constant shift amount, add the following fold.
shl (zext (i1 X)), ShAmt --> select (X, 1 << ShAmt, 0)
https://rise4fun.com/Alive/IZ9
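A hypothetical instance with ShAmt == 4:
```
define i32 @shl_zext_i1(i1 %x) {
  %z = zext i1 %x to i32
  %r = shl i32 %z, 4
  ret i32 %r             ; can become: select i1 %x, i32 16, i32 0
}
```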
Fixes PR42257.
Based on original patch by @zvi (Zvi Rackover)
Differential Revision: https://reviews.llvm.org/D63382
llvm-svn: 374886
For a constant shift amount, add the following fold.
shl (zext (i1 X)), ShAmt --> select (X, 1 << ShAmt, 0)
https://rise4fun.com/Alive/IZ9
Fixes PR42257.
Based on original patch by @zvi (Zvi Rackover)
Differential Revision: https://reviews.llvm.org/D63382
llvm-svn: 374828
Summary:
Currently, we pre-check whether we need to produce a mask or not.
This involves some rather magical constants.
I'd like to extend this fold to also handle the situation
when there's also a `trunc` before outer shift.
That will require another set of magical constants.
It's ugly.
Instead, we can just compute the mask, and check
whether mask is a pass-through (all-ones) or not.
This way we don't need to have any magical numbers.
This change is NFC other than the fact that we now compute
the mask and then check whether we need to (and can!) apply it.
Reviewers: spatel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68470
llvm-svn: 373961
Summary:
When we do `ConstantExpr::getZExt()`, that "extends" `undef` to `0`,
which means that for patterns a/b we'd assume that we must not produce
any bits for that channel, while in reality we simply didn't care
about that channel - i.e. we don't need to mask it.
Reviewers: spatel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68239
llvm-svn: 373960
Initially (D65380) I believed that if we have rightshift-trunc-rightshift,
we can't do any folding. But as usually happens, I was wrong.
https://rise4fun.com/Alive/GEw
https://rise4fun.com/Alive/gN2O
In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have
this very sequence, of two right shifts separated by trunc.
And, just so it happens, we apparently can fold the pattern
if the total shift amount is either 0 or equal to the bitwidth
of the innermost widest shift - i.e. if we are left with only the
original sign bit, which is exactly what is wanted there.
llvm-svn: 373801
https://rise4fun.com/Alive/8BY - valid for lshr+trunc+variable sext
https://rise4fun.com/Alive/7jk - the variable sext can be redundant
https://rise4fun.com/Alive/Qslu - 'exact'-ness of first shift can be preserved
https://rise4fun.com/Alive/IF63 - without trunc we could view this as
more general "drop redundant mask before right-shift",
but let's handle it here for now
https://rise4fun.com/Alive/iip - likewise, without trunc, variable sext can be redundant.
There are more patterns for sure - e.g. we can have 'lshr' as the final shift,
but that might be best handled by some more generic transform, e.g.
"drop redundant masking before right-shift" (PR42456)
I'm singling out this sext patch because you can only extract
high bits with `*shr` (unlike abstract bit masking),
and I *know* this fold is wanted by existing code.
I don't believe there is much to review here,
so I'm going to opt into post-review mode here.
https://bugs.llvm.org/show_bug.cgi?id=43523
llvm-svn: 373542
Summary:
This is valid for any `sext` bitwidth pair:
```
Processing /tmp/opt.ll..
----------------------------------------
%signed = sext %y
%r = shl %x, %signed
ret %r
=>
%unsigned = zext %y
%r = shl %x, %unsigned
ret %r
%signed = sext %y
Done: 2016
Optimization is correct!
```
(This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.)
Main motivation is the C++ semantics:
```
int shl(int a, char b) {
return a << b;
}
```
ends up as
```
%3 = sext i8 %1 to i32
%4 = shl i32 %0, %3
```
https://godbolt.org/z/0jgqUq
which is, as this shows, too pessimistic.
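Spelled out as a complete (hypothetical) IR example:
```
; For in-range amounts, sext and zext of %amt agree; for out-of-range
; amounts the shift is poison either way, so the sext can become a zext.
define i32 @shl_sext_amt(i32 %x, i8 %amt) {
  %wide = sext i8 %amt to i32    ; can become: zext i8 %amt to i32
  %r = shl i32 %x, %wide
  ret i32 %r
}
```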
There is another problem here - we can only do the fold
if sext is one-use. But we can trivially have cases
where several shifts have the same sext shift amount.
This should be resolved, later.
Reviewers: spatel, nikic, RKSimon
Reviewed By: spatel
Subscribers: efriedma, hiraditya, nlopes, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68103
llvm-svn: 373106
Summary:
If we have a pattern `(x & (-1 >> maskNbits)) << shiftNbits`,
we already know (have a fold) that will drop the `& (-1 >> maskNbits)`
mask iff `(shiftNbits-maskNbits) s>= 0` (i.e. `shiftNbits u>= maskNbits`).
So even if `(shiftNbits-maskNbits) s< 0`, we can still
fold; we will just need to apply a **constant** mask afterwards:
```
Name: c, normal+mask
%t0 = lshr i32 -1, C1
%t1 = and i32 %t0, %x
%r = shl i32 %t1, C2
=>
%n0 = shl i32 %x, C2
%n1 = i32 ((-(C2-C1))+32)
%n2 = zext i32 %n1 to i64
%n3 = lshr i64 -1, %n2
%n4 = trunc i64 %n3 to i32
%r = and i32 %n0, %n4
```
https://rise4fun.com/Alive/gslRa
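A hypothetical i32 instance with C1 == 8 and C2 == 4 (i.e.
shiftNbits u< maskNbits, so a constant mask remains):
```
; Before: keep the low 24 bits, then shift left by 4.
define i32 @mask_then_shl(i32 %x) {
  %t0 = lshr i32 -1, 8            ; 0x00FFFFFF
  %t1 = and i32 %t0, %x
  %r  = shl i32 %t1, 4
  ret i32 %r
}

; After: shift first, then apply the constant mask from the proof
; (-1 u>> ((-(4-8))+32) == -1 u>> 36 == 0x0FFFFFFF).
define i32 @shl_then_mask(i32 %x) {
  %n0 = shl i32 %x, 4
  %r  = and i32 %n0, 268435455    ; 0x0FFFFFFF
  ret i32 %r
}
```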
Naturally, old `%masked` will have to be one-use.
This is not valid for pattern f - where "masking" is done via `ashr`.
https://bugs.llvm.org/show_bug.cgi?id=42563
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67725
llvm-svn: 372630
Summary:
And this is **finally** the interesting part of that fold!
If we have a pattern `(x & (~(-1 << maskNbits))) << shiftNbits`,
we already know (have a fold) that will drop the `& (~(-1 << maskNbits))`
mask iff `(maskNbits+shiftNbits) u>= bitwidth(x)`.
But that is actually ignorant; there's a more general fold here:
In this pattern, `(maskNbits+shiftNbits)` actually correlates
with the number of low bits that will remain in the final value.
So even if `(maskNbits+shiftNbits) u< bitwidth(x)`, we can still
fold; we will just need to apply a **constant** mask afterwards:
```
Name: a, normal+mask
%onebit = shl i32 -1, C1
%mask = xor i32 %onebit, -1
%masked = and i32 %mask, %x
%r = shl i32 %masked, C2
=>
%n0 = shl i32 %x, C2
%n1 = add i32 C1, C2
%n2 = zext i32 %n1 to i64
%n3 = shl i64 -1, %n2
%n4 = xor i64 %n3, -1
%n5 = trunc i64 %n4 to i32
%r = and i32 %n0, %n5
```
https://rise4fun.com/Alive/F5R
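A hypothetical i32 instance with C1 == 8 and C2 == 4 (so
maskNbits + shiftNbits u< 32, and a constant mask remains):
```
; Before: keep the low 8 bits, then shift left by 4.
define i32 @lowbits_then_shl(i32 %x) {
  %onebit = shl i32 -1, 8
  %mask   = xor i32 %onebit, -1   ; 0x000000FF
  %masked = and i32 %mask, %x
  %r      = shl i32 %masked, 4
  ret i32 %r
}

; After: shift first, then mask to the 12 (== 8 + 4) low bits that can be set.
define i32 @shl_then_lowbits_mask(i32 %x) {
  %n0 = shl i32 %x, 4
  %r  = and i32 %n0, 4095         ; 0xFFF == ~(-1 << 12)
  ret i32 %r
}
```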
Naturally, old `%masked` will have to be one-use.
Similar fold exists for patterns c,d,e, will post patch later.
https://bugs.llvm.org/show_bug.cgi?id=42563
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67677
llvm-svn: 372629
This reverts commit 5dbb90bfe1.
As noted in the post-commit thread for r367891, this can create
a multiply that is lowered to a libcall that may not exist.
We need to improve the backend decomposition for integer multiply
before trying to re-land this (if it's still worthwhile after
doing the backend work).
llvm-svn: 369174
This was initially committed in r368059 but got reverted in r368084
because there was faulty logic in how the shift amount type mismatch
was being handled (it simply wasn't).
I've added an explicit bailout before we call SimplifyAddInst() - I don't think
it's designed in general to handle differently-typed values, even though
the actual problem only comes from ConstantExprs.
I have also changed the common type deduction to not just blindly
look past zext, but to do so in a way that makes the types match in the end.
Differential Revision: https://reviews.llvm.org/D65380
llvm-svn: 368141
This reverts r368059 (git commit 0f95710976)
This caused Clang to assert while self-hosting and compiling
SystemZInstrInfo.cpp. Reduction is running.
llvm-svn: 368084