The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.
This should allow us to begin resolving PR46896 et al.
Ensure we block poison in a funnel shift value, similar to rG0fe91ad463fea9d08cbcd640a62aa9ca2d8d05e0.
Reapplied with a fix for PR48068: we weren't checking that the shift values could be hoisted from their basic blocks.
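For readers unfamiliar with the distinction, here is a minimal reference model of a 32-bit funnel shift left, with rotation as the special case where both inputs are the same value. The helper names fshl32 and rotl32 are hypothetical and exist only for this illustration; the sketch models the arithmetic, not the fold itself.

```cpp
#include <cstdint>

// Reference model of a 32-bit funnel shift left: concatenate hi:lo into a
// 64-bit value, shift left by (amt mod 32), and keep the upper 32 bits.
inline uint32_t fshl32(uint32_t hi, uint32_t lo, uint32_t amt) {
  uint64_t wide = (static_cast<uint64_t>(hi) << 32) | lo;
  return static_cast<uint32_t>((wide << (amt % 32)) >> 32);
}

// A rotate is the special case where both inputs are the same value.
inline uint32_t rotl32(uint32_t x, uint32_t amt) { return fshl32(x, x, amt); }
```

Taking the shift amount modulo the bit width means a shift amount of 0 (or 32) simply returns the first operand.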
Differential Revision: https://reviews.llvm.org/D90625
This reverts commit 59b22e495c.
That commit broke building for ARM and AArch64, reproducible like this:
$ cat apedec-reduced.c
a;
b(e) {
int c;
unsigned d = f();
c = d >> 32 - e;
return c;
}
g() {
int h = i();
if (a)
h = h << a | b(a);
return h;
}
$ clang -target aarch64-linux-gnu -w -c -O3 apedec-reduced.c
clang: ../lib/Transforms/InstCombine/InstructionCombining.cpp:3656: bool llvm::InstCombinerImpl::run(): Assertion `DT.dominates(BB, UserParent) && "Dominance relation broken?"' failed.
Same thing for e.g. an armv7-linux-gnueabihf target.
The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.
This should allow us to begin resolving PR46896 et al.
Differential Revision: https://reviews.llvm.org/D90625
Replace m_ConstantInt with m_APInt to support uniform vectors (with no undef elements)
Adding non-undef support would involve some refactoring of the MaskOps struct but this might still be worth it.
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
Now that we have funnel shift intrinsics, it should be safe to convert this form of rotate to them.
In the worst case (a target that doesn't have rotate instructions), we will expand this into a
branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very
likely to be a perf improvement over the original code.
The motivating source code pattern for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924
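As a rough illustration, assuming the motivating pattern is a rotate guarded by a zero check on the shift amount (the function names below are hypothetical), the guarded source form and the branch-less neg/and/and/lshr/shl/or expansion can be sketched as:

```cpp
#include <cstdint>

// Guarded rotate-left: the zero check avoids the undefined shift by 32
// that `x >> (32 - amt)` would perform when amt == 0.
// This form assumes 0 <= amt < 32.
inline uint32_t rotl_guarded(uint32_t x, uint32_t amt) {
  if (amt == 0)
    return x;
  return (x << amt) | (x >> (32 - amt));
}

// Branch-free expansion: neg, and, and, lshr, shl, or.
inline uint32_t rotl_branchless(uint32_t x, uint32_t amt) {
  uint32_t left = amt & 31;           // and
  uint32_t right = (0u - amt) & 31;   // neg, and
  return (x << left) | (x >> right);  // shl, lshr, or
}
```

Both functions agree for every shift amount in [0, 31]; the second one needs no control flow because masking keeps both shift amounts in range.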
Background:
I looked at several different options before deciding where to try this - instcombine, simplifycfg,
CGP - because it doesn't fit cleanly anywhere AFAIK.
The backend (CGP, SDAG, GlobalISel?) is too late for what we're trying to accomplish. We want to
have the IR converted before we reach things like vectorization because the reduced code can make a
loop much simpler to transform.
Technically, this could be included in instcombine, but it's a large pattern match that includes
control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch).
Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch.
So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine.
This only runs at -O3, but that seems like a reasonable limitation given that source code has many
options to avoid this pattern (including the recently added clang intrinsics for rotates).
I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine,
instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug
with the new pass manager that's independent of this change (but it will be masked if we canonicalize
harder to funnel shift intrinsics in instcombine).
Differential Revision: https://reviews.llvm.org/D55604
llvm-svn: 349396
This is a follow-up to D45986. As suggested there, we should match the "all-bits-set"
pattern in addition to "any-bits-set".
This was a little more complicated than I thought it would be initially because the
"and 1" instruction can be anywhere in the chain. Hopefully, the code comments make
that logic understandable, but if you see a way to simplify or improve that, it's
most appreciated.
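A small sketch of the "all-bits-set" equivalence, using bit positions 3, 5, and 8 chosen purely for illustration (the function names are hypothetical). The second variant places the "and 1" early in the chain to show that its position does not change the result:

```cpp
#include <cstdint>

// Chain form: AND together the selected bits, each shifted down to bit 0,
// with the "and 1" applied at the end of the chain.
inline bool all_bits_chain(uint32_t x) {
  return (((x >> 3) & (x >> 5) & (x >> 8)) & 1) != 0;
}

// Same chain with the "and 1" applied to the first operand instead.
inline bool all_bits_chain_early_and(uint32_t x) {
  return ((((x >> 3) & 1) & (x >> 5)) & (x >> 8)) != 0;
}

// Folded form: a single mask and compare.
inline bool all_bits_mask(uint32_t x) {
  constexpr uint32_t mask = (1u << 3) | (1u << 5) | (1u << 8);
  return (x & mask) == mask;
}
```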
This transforms patterns that emerge from bitfield tests as seen in PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098
I think it would also help reduce the large test from:
D46336
D46595
but we need something to reassociate that case to the forms we're expecting here first.
Differential Revision: https://reviews.llvm.org/D46649
llvm-svn: 331937
and (or (lshr X, C), ...), 1 --> (X & C') != 0
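To make the fold above concrete, here is a sketch with three illustrative shift constants (3, 5, and 8); the function names are hypothetical. The folded mask C' has one bit set for each shift amount in the chain:

```cpp
#include <cstdint>

// Chain form: OR together the selected bits, each shifted down to bit 0,
// then mask with 1.
inline bool any_bits_chain(uint32_t x) {
  return (((x >> 3) | (x >> 5) | (x >> 8)) & 1) != 0;
}

// Folded form: test all selected bits at once with a single mask.
inline bool any_bits_mask(uint32_t x) {
  constexpr uint32_t mask = (1u << 3) | (1u << 5) | (1u << 8);
  return (x & mask) != 0;
}
```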
I initially thought about implementing the minimal pattern in instcombine as mentioned here:
https://bugs.llvm.org/show_bug.cgi?id=37098#c6
...but we need to do better to catch the more general sequence from the motivating test
(more than 2 bits in the compare). A test-suite run with statistics showed that this
pattern currently occurs only 2 times. It would potentially occur more often if
reassociation worked better (D45842), but it's probably still not too frequent?
This is small enough that I didn't see a need to create a whole new class/file within
AggressiveInstCombine. There are likely other relatively small matchers like what was
discussed in D44266 that would slide under foldUnusualPatterns() (name suggestions welcome).
We could potentially also consolidate matchers for ctpop, bswap, etc under here.
Differential Revision: https://reviews.llvm.org/D45986
llvm-svn: 331311
I'm not sure if this is where we should try to fold these
patterns inspired by:
https://bugs.llvm.org/show_bug.cgi?id=37098
...if this isn't the right place, we can move the tests.
llvm-svn: 330642
This covers the case where TruncInst leaf node is a constant expression.
See PR36121 for more details.
Differential Revision: https://reviews.llvm.org/D42622
llvm-svn: 323926
Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis.
See PR36134 for more details.
Differential Revision: https://reviews.llvm.org/D42683
llvm-svn: 323862
Combine expression patterns to form expressions with fewer, simpler instructions.
This pass does not modify the CFG.
For example, this pass reduces the width of expressions post-dominated by a TruncInst
to a smaller width when applicable.
It differs from the instcombine pass in that it contains pattern optimizations whose
complexity is higher than O(1); thus, it should run fewer times than the
instcombine pass.
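As a source-level analogy for the width reduction (a sketch only; the pass works on IR, and the function names here are hypothetical), an expression computed in 32 bits and then truncated to 8 bits gives the same result as one computed entirely in 8 bits, because addition and multiplication preserve the low bits:

```cpp
#include <cstdint>

// Wide form: extend to 32 bits, compute, then truncate to 8 bits.
inline uint8_t wide_then_trunc(uint8_t a, uint8_t b) {
  uint32_t wa = a, wb = b;
  return static_cast<uint8_t>((wa + wb) * wb);
}

// Narrow form: every intermediate result is reduced to 8 bits, which
// models performing the whole expression at the narrower width.
inline uint8_t narrow(uint8_t a, uint8_t b) {
  uint8_t sum = static_cast<uint8_t>(a + b);
  return static_cast<uint8_t>(sum * b);
}
```

The equivalence does not hold for every operation (a right shift or a division, for instance, can pull high bits into the low result), which is why the reduction applies only "when applicable".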
Differential Revision: https://reviews.llvm.org/D38313
llvm-svn: 323321