This is no-functional-change-intended, but it hopefully makes things
slightly clearer and more efficient to have transforms that require
'shl' be called only from visitShl(). Further cleanup is possible.
This is another step towards trying to re-apply D110170
by eliminating conflicting transforms that cause infinite loops.
a47c8e40c7 was a previous patch in this direction.
The diffs here are mostly cosmetic, but intentional:
1. The existing code that would handle this pattern in FoldShiftByConstant()
is limited to 'shl' only now. The formatting change to IsLeftShift shows
that we could move several transforms into visitShl() directly for
efficiency because they are not common shift transforms.
2. The tests are regenerated to show new instruction names to prove that
we are getting (almost) identical logic results.
3. The one case where we differ ("trunc_sandwich_small_shift1") shows that
we now use a narrow 'and' instruction. Previously, we relied on another
transform to do that, but it is limited to legal types. That seems to
be a legacy constraint from when IR analysis and codegen were less robust.
https://alive2.llvm.org/ce/z/JxyGA4
declare void @llvm.assume(i1)
define i8 @src(i32 %x, i32 %c0, i8 %c1) {
; The sum of the shifts must not overflow the source width.
%z1 = zext i8 %c1 to i32
%sum = add i32 %c0, %z1
%ov = icmp ult i32 %sum, 32
call void @llvm.assume(i1 %ov)
%sh1 = lshr i32 %x, %c0
%tr = trunc i32 %sh1 to i8
%sh2 = lshr i8 %tr, %c1
ret i8 %sh2
}
define i8 @tgt(i32 %x, i32 %c0, i8 %c1) {
%z1 = zext i8 %c1 to i32
%sum = add i32 %c0, %z1
%maskc = lshr i8 -1, %c1
%s = lshr i32 %x, %sum
%t = trunc i32 %s to i8
%a = and i8 %t, %maskc
ret i8 %a
}
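As an informal cross-check of the same identity (my own addition, not part
of the patch), a standalone C++ loop over sample values, restricted to
c1 < 8 so the narrow lshr is not poison:

// Spot-check: lshr(trunc(lshr(x, c0)), c1) == and(trunc(lshr(x, c0+c1)), 0xFF >> c1)
#include <cassert>
#include <cstdint>

int main() {
  const uint32_t Samples[] = {0, 1, 0x80u, 0xFFu, 0x12345678u,
                              0xDEADBEEFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t X : Samples)
    for (uint32_t C0 = 0; C0 < 32; ++C0)
      for (uint32_t C1 = 0; C1 < 8 && C0 + C1 < 32; ++C1) {
        uint8_t Src = (uint8_t)((uint8_t)(X >> C0) >> C1);
        uint8_t Tgt = (uint8_t)((uint8_t)(X >> (C0 + C1)) & (0xFFu >> C1));
        assert(Src == Tgt);
      }
  return 0;
}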
Only the multi-use cases are changing here because there's
another fold that catches the simpler patterns.
But that other fold is the source of infinite loops when we
try to add D110170, so removing that is planned as a follow-up.
Attempt to show the general proof in Alive2:
https://alive2.llvm.org/ce/z/Ns1uS2
Note that the overshift fold-to-zero tests are not
currently handled by instsimplify. If they were, we
could assert that the shift amount sum is less than
the source bitwidth.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is already missing from some of the related APIs.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
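For illustration, the before/after spellings look roughly like this (a
sketch; the old names remain available as the soft-deprecated forms
mentioned above):

#include "llvm/ADT/APInt.h"
using namespace llvm;

void spellings() {
  APInt Zero = APInt::getZero(32);   // previously: APInt::getNullValue(32)
  APInt Mask(32, 0xFFFFFFFFu);
  bool AllSet = Mask.isAllOnes();    // previously: Mask.isAllOnesValue()
  (void)Zero;
  (void)AllSet;
}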
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
I'm not sure if there is a better way or if another bug is
still lurking here, but this is enough to avoid the loop from:
https://llvm.org/PR51657
The test requires multiple blocks and datalayout to
trigger the problem path.
This transform should be updated to use better
variable names and code comments. It could
also create the shift-of-shift directly instead
of relying on another combine for that.
This transform is written in a confusing style,
and I suspect it is at fault for a more serious
bug noted in PR51567.
But it's been around forever, so I'm making the
minimal change to fix another bug - it could
increase instructions because it was not checking
uses.
Both patterns are equivalent (https://alive2.llvm.org/ce/z/jfCViF),
so we should have a preference. It seems like mask+negation is better
than two shifts.
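For intuition, one classic case where the two forms compute the same thing
(my own example; the Alive2 link above is the authoritative statement of the
patterns) is splatting the low bit of an i8, checked exhaustively in
standalone C++ (assumes arithmetic right shift of signed values, which C++20
guarantees):

#include <cassert>
#include <cstdint>

int main() {
  for (int V = 0; V < 256; ++V) {
    uint8_t U = (uint8_t)V;
    int8_t Shl = (int8_t)(uint8_t)(U << 7);  // shl i8 x, 7
    int8_t TwoShifts = (int8_t)(Shl >> 7);   // ashr i8 (shl x, 7), 7
    int8_t MaskNeg = (int8_t)-(U & 1);       // sub i8 0, (and x, 1)
    assert(TwoShifts == MaskNeg);
  }
  return 0;
}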
Handle the missing fold reported in PR50816, which is a variant of the existing ashr(sub_nsw(X,Y),bw-1) --> sext(icmp_slt(X,Y)) fold.
We also handle the lshr(or(neg(x),x),bw-1) --> zext(icmp_ne(x,0)) equivalent - https://alive2.llvm.org/ce/z/SnZmSj
We still allow multiple uses of the neg(x) - as this is likely to let us further simplify other uses of the neg - but not multiple uses of the or(), which would increase the instruction count.
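A quick exhaustive i8 check of the second identity (my own addition,
separate from the Alive2 proof linked above):

// lshr(or(neg(x), x), 7) == zext(icmp_ne(x, 0)) for every i8 value.
#include <cassert>
#include <cstdint>

int main() {
  for (int V = 0; V < 256; ++V) {
    uint8_t X = (uint8_t)V;
    uint8_t Neg = (uint8_t)(0u - X);          // neg(x), wrapping like i8
    uint8_t Lhs = (uint8_t)((Neg | X) >> 7);  // lshr(or(neg(x), x), bw-1)
    uint8_t Rhs = (X != 0) ? 1 : 0;           // zext(icmp_ne(x, 0))
    assert(Lhs == Rhs);
  }
  return 0;
}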
Differential Revision: https://reviews.llvm.org/D105764
This is identical to 781d077afb,
but for the other function.
For certain shift amount bit widths, we must first ensure that adding
the shift amounts is safe, i.e. that the sum won't overflow (unsigned wrap).
Fixes https://bugs.llvm.org/show_bug.cgi?id=49778
This is a special-case multiply that replicates bits of
the source operand. We need this fold to avoid regression
if we make canonicalization to `mul` more aggressive for
shl+or patterns.
I did not see a way to make Alive generalize the bitwidth precondition
(the fold applies only to even bitwidths), but an example of
the proof is:
Name: i32
Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2))
%m = mul nuw i32 %x, C1
%t = lshr i32 %m, C2
=>
%t = and i32 %x, C1 - 2
Name: i14
%m = mul nuw i14 %x, 129
%t = lshr i14 %m, 7
=>
%t = and i14 %x, 127
https://rise4fun.com/Alive/e52
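As a non-Alive sanity check (my own addition), the i16 instance of the same
fold (C1 = 257, C2 = 8, so the mask C1 - 2 = 255) can be verified
exhaustively over the values where the 'nuw' precondition holds:

#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t X = 0; X <= 0xFFFFu; ++X) {
    if (X * 257u > 0xFFFFu)
      continue;                          // would violate nuw; fold does not apply
    uint16_t Mul = (uint16_t)(X * 257u); // mul nuw i16 %x, 257
    uint16_t Lhs = (uint16_t)(Mul >> 8); // lshr i16 %m, 8
    uint16_t Rhs = (uint16_t)(X & 255u); // and i16 %x, 255
    assert(Lhs == Rhs);
  }
  return 0;
}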
This is yet another hint that we will eventually need InstCombineInverter,
which would consistently sink inversions, but for that we'll first need
to consistently hoist inversions where possible, so let's do that here.
Example of a proof: https://alive2.llvm.org/ce/z/78SbDq
See https://bugs.llvm.org/show_bug.cgi?id=48995
icmp is the preferred spelling in IR because icmp analysis is
expected to be better than any other analysis. This should
lead to more follow-on folding potential.
It's difficult to say exactly what we should do in codegen to
compensate. For example on AArch64, which of these is preferred:
sub w8, w0, w1
lsr w0, w8, #31
vs:
cmp w0, w1
cset w0, lt
If there are perf regressions, then we should deal with those in
codegen on a case-by-case basis.
A possible motivating example for better optimization is shown in:
https://llvm.org/PR43198 but that will require other transforms
before anything changes there.
Alive proof:
https://rise4fun.com/Alive/o4E
Name: sign-bit splat
Pre: C1 == (width(%x) - 1)
%s = sub nsw %x, %y
%r = ashr %s, C1
=>
%c = icmp slt %x, %y
%r = sext %c
Name: sign-bit LSB
Pre: C1 == (width(%x) - 1)
%s = sub nsw %x, %y
%r = lshr %s, C1
=>
%c = icmp slt %x, %y
%r = zext %c
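The same two identities, spot-checked exhaustively for i8 in standalone C++
(my addition; assumes arithmetic right shift of signed values, which C++20
guarantees):

#include <cassert>
#include <cstdint>

int main() {
  for (int XV = -128; XV <= 127; ++XV)
    for (int YV = -128; YV <= 127; ++YV) {
      int Diff = XV - YV;
      if (Diff < -128 || Diff > 127)
        continue;                                // sub would not be nsw
      int8_t S = (int8_t)Diff;
      int8_t Splat = (int8_t)(S >> 7);           // ashr (sub nsw), bw-1
      int8_t Sext = (int8_t)(XV < YV ? -1 : 0);  // sext (icmp slt x, y)
      assert(Splat == Sext);
      uint8_t Lsb = (uint8_t)((uint8_t)S >> 7);  // lshr (sub nsw), bw-1
      uint8_t Zext = (XV < YV) ? 1 : 0;          // zext (icmp slt x, y)
      assert(Lsb == Zext);
    }
  return 0;
}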
This still only gets used for scalar types but now always uses ConstantExpr in preparation for vector support - it was using APInt methods in some places.
Use m_Specific instead of m_Value followed by an equality check - we already do this for the similar folds above, it looks like an oversight in rG2b459fe7e1e where the original pattern match code looked a little different.
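For reference, a minimal sketch of the two idioms (generic example, not the
exact code touched by this change):

#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// Is Op a 'not' (xor with -1) of the specific value X we already hold?
bool isNotOfX_Before(Value *Op, Value *X) {
  Value *Y;
  // Capture with m_Value, then compare afterwards.
  return match(Op, m_Xor(m_Value(Y), m_AllOnes())) && Y == X;
}

bool isNotOfX_After(Value *Op, Value *X) {
  // m_Specific folds the equality check into the match itself.
  return match(Op, m_Xor(m_Specific(X), m_AllOnes()));
}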
This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen.
The SCEV motivating case is discussed in:
http://bugs.llvm.org/PR47136
There's an analysis motivating case at:
http://bugs.llvm.org/PR38781
I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps.
Alive proofs:
https://rise4fun.com/Alive/BBV
Name: shift right of 'not'
Pre: C2 == (-1 u>> C1)
%a = lshr i8 %x, C1
%r = xor i8 %a, C2
=>
%n = xor i8 %x, -1
%r = lshr i8 %n, C1
Name: shift left of 'not'
Pre: C2 == (-1 << C1)
%a = shl i8 %x, C1
%r = xor i8 %a, C2
=>
%n = xor i8 %x, -1
%r = shl i8 %n, C1
Name: ashr of 'not'
%a = ashr i8 %x, C1
%r = xor i8 %a, -1
=>
%n = xor i8 %x, -1
%r = ashr i8 %n, C1
Differential Revision: https://reviews.llvm.org/D86243
The InstCombine pass has handled target specific intrinsics for a long
time, and having target specific code in general passes has repeatedly
been noted as an area for improvement.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
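A rough sketch of how a target hooks into the first of these (signatures
approximate; 'MyTargetTTIImpl' is a placeholder for a concrete target's TTI
implementation):

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Transforms/InstCombine/InstCombiner.h"
using namespace llvm;

// Called by InstCombine when it reaches an intrinsic it cannot fold
// generically; returning None means "no target-specific combine applies".
Optional<Instruction *>
MyTargetTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
  switch (II.getIntrinsicID()) {
  // case Intrinsic::mytarget_identity:   // hypothetical target intrinsic
  //   return IC.replaceInstUsesWith(II, II.getArgOperand(0));
  default:
    break;
  }
  return None;
}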
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows moving about 3000 lines out of InstCombine and into the targets.
Differential Revision: https://reviews.llvm.org/D81728
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector types; this requires ConstantVector::getSplat() to take an
'ElementCount' instead of an 'unsigned' number of elements.
This change is needed for D73753.
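A short sketch of the updated call-site shape (assuming the current
ElementCount helpers; not code from the patch):

#include "llvm/IR/Constants.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/TypeSize.h"
using namespace llvm;

Constant *makeSplatOfOne(LLVMContext &Ctx, bool Scalable) {
  Constant *One = ConstantInt::get(Type::getInt32Ty(Ctx), 1);
  // getSplat() used to take a plain 'unsigned' element count; ElementCount
  // also carries the scalable flag.
  ElementCount EC = ElementCount::get(/*MinVal=*/4, Scalable);
  return ConstantVector::getSplat(EC, One);
}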
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.
This is confusing and makes consumer code more complicated.
I would expect either that ConstantFoldConstant() returns non-null only
if it actually folded something, or that it always returns non-null.
I'm going with the latter here, which appears to be more
useful considering existing usage.
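The consumer-side effect looks roughly like this (illustrative caller, not
code from the patch):

#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/IR/Constants.h"
using namespace llvm;

Constant *foldOperand(Constant *C, const DataLayout &DL) {
  // Old contract: returned null for non-ConstantExpr/ConstantVector inputs,
  // so callers needed a fallback:
  //   Constant *Folded = ConstantFoldConstant(C, DL);
  //   return Folded ? Folded : C;
  // New contract: always returns a non-null Constant (possibly C itself).
  return ConstantFoldConstant(C, DL);
}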
Differential Revision: https://reviews.llvm.org/D75543
As input, we have the following pattern:
Sh0 (Sh1 X, Q), K
We want to rewrite that as:
Sh X, (Q+K) iff (Q+K) u< bitwidth(X)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN - 1), we may have looked past extensions of
the shift amounts, so the sum may now overflow in the smaller bitwidth.
To ensure that does not happen, we need to ensure that the maximal total
shift amount is still representable in that smaller bitwidth.
If the overflow did happen, the (Q+K) u< bitwidth(X) check would be bogus.
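A sketch of the representability condition (my own illustration of the
check described above, not the patch code):

#include <cstdint>

// The narrowed shift-amount type must be able to represent the worst-case
// sum 2 * (BitWidth - 1); otherwise Q+K itself could wrap and the
// (Q+K) u< bitwidth(X) comparison proves nothing.
bool sumOfShiftAmountsCannotWrap(unsigned BitWidth, unsigned AmtBitWidth) {
  uint64_t MaxSum = 2 * (uint64_t)(BitWidth - 1);
  uint64_t MaxAmt =
      (AmtBitWidth >= 64) ? UINT64_MAX : ((1ULL << AmtBitWidth) - 1);
  return MaxSum <= MaxAmt;
}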
https://bugs.llvm.org/show_bug.cgi?id=44802
This is a followup to D73803, which uses the replaceOperand()
helper in more places.
This should be NFC apart from changes to worklist order.
Differential Revision: https://reviews.llvm.org/D73919
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.
As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.
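After the rename, typical usage looks like this (illustrative only):

#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"
using namespace llvm;

// Client code queueing an instruction for revisiting.
void queueForRevisit(InstCombineWorklist &Worklist, Instruction *I) {
  Worklist.add(I);   // was: Worklist.AddDeferred(I)
}

// InstCombine-internal code that really wants an immediate push.
void pushNow(InstCombineWorklist &Worklist, Instruction *I) {
  Worklist.push(I);  // was: Worklist.Add(I)
}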
Differential Revision: https://reviews.llvm.org/D73745
shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1)
This is an IR translation of an existing SDAG transform added here:
rL370617
So we again have 9 possible patterns with a commuted IR variant of each pattern:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn
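One of the nine cases (shl distributed over 'or'), spot-checked exhaustively
for i8 in standalone C++ (my addition):

// shl(or(shl(x, c0), y), c1) == or(shl(x, c0+c1), shl(y, c1)) for i8,
// whenever c0 + c1 stays within the bitwidth.
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      for (unsigned C0 = 0; C0 < 8; ++C0)
        for (unsigned C1 = 0; C0 + C1 < 8; ++C1) {
          uint8_t Inner = (uint8_t)(X << C0);
          uint8_t Lhs = (uint8_t)((Inner | Y) << C1);
          uint8_t Rhs = (uint8_t)((uint8_t)(X << (C0 + C1)) | (uint8_t)(Y << C1));
          assert(Lhs == Rhs);
        }
  return 0;
}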
Part of the motivation is to allow easier recognition and subsequent
canonicalization of bswap patterns as discussed in PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
We had to delay this transform because it used to allow the SLP vectorizer
to create awful reductions out of simple load-combines.
That problem was fixed with:
rL375025
(we'll bring back load combining in IR someday...)
The backend is also better equipped to deal with these patterns now
using hooks like TLI.getShiftAmountThreshold().
The only remaining potential controversy is that the -reassociate pass
tends to reverse this kind of pattern (to help GVN?). But since -reassociate
doesn't do anything with these specific patterns, there is no conflict currently.
Finally, there's a new pass proposal at D67383 for general tree-height-reduction
reassociation, and it could use a cost model to decide how to optimally rearrange
these kinds of ops for a target. That patch appears to be stalled.
Differential Revision: https://reviews.llvm.org/D69842