Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.
Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.
llvm-svn: 328165
getNumUses is a linear time operation. It traverses the user linked list to the end and counts as it goes. Since we are only interested in small constant counts, we should use hasNUses or hasNUsesMore more that terminate the traversal as soon as it can provide the answer.
There are still two other locations in InstCombine, but changing those would force a rebase of D44266 which if accepted would remove them.
Differential Revision: https://reviews.llvm.org/D44398
llvm-svn: 327315
Also, rename 'foldOpWithConstantIntoOperand' because that's annoyingly
vague. The constant check is redundant in some cases, but it allows
removing duplication for most of the calls.
llvm-svn: 326329
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792
Alive proofs for the cases handled here as well as the bitwise
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97https://rise4fun.com/Alive/Lc5Ehttps://rise4fun.com/Alive/kdf
llvm-svn: 323437
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.
More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs
into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs
into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node
when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a
variety of ways.
a. For #2, the vector path is missing the case for setlt with a '1' constant.
b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel
produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the
shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift
produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific
decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.
define i32 @abs_shifty(i32 %x) {
%signbit = ashr i32 %x, 31
%add = add i32 %signbit, %x
%abs = xor i32 %signbit, %add
ret i32 %abs
}
define i32 @abs_cmpsubsel(i32 %x) {
%cmp = icmp slt i32 %x, zeroinitializer
%sub = sub i32 zeroinitializer, %x
%abs = select i1 %cmp, i32 %sub, i32 %x
ret i32 %abs
}
define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
%signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
%add = add <4 x i32> %signbit, %x
%abs = xor <4 x i32> %signbit, %add
ret <4 x i32> %abs
}
define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
%cmp = icmp slt <4 x i32> %x, zeroinitializer
%sub = sub <4 x i32> zeroinitializer, %x
%abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
ret <4 x i32> %abs
}
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx
> abs_shifty:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_cmpsubsel:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_shifty_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> abs_cmpsubsel_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> cmp w0, #0 // =0
> cneg w0, w0, mi
> ret
>
> abs_cmpsubsel:
> cmp w0, #0 // =0
> cneg w0, w0, mi
> ret
>
> abs_shifty_vec:
> abs v0.4s, v0.4s
> ret
>
> abs_cmpsubsel_vec:
> abs v0.4s, v0.4s
> ret
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le
> abs_shifty:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_cmpsubsel:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_shifty_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>
> abs_cmpsubsel_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>
Differential Revision: https://reviews.llvm.org/D40984
llvm-svn: 320921
This is a preliminary step towards solving the remaining part of PR27145 - IR for isfinite():
https://bugs.llvm.org/show_bug.cgi?id=27145
In order to solve that one more generally, we need to add matching for and/or of fcmp ord/uno
with a constant operand.
But while looking at those patterns, I realized we were missing a canonicalization for nonzero
constants. Rather than limiting to just folds for constants, we're adding a general value
tracking method for this based on an existing DAG helper.
By transforming everything to 0.0, we can simplify the existing code in foldLogicOfFCmps()
and pick up missing vector folds.
Differential Revision: https://reviews.llvm.org/D37427
llvm-svn: 312591
In addition to removing chunks of duplicated code, we don't
want these to diverge. If there's a fold for one, there
should be a fold of the other via DeMorgan's Laws.
llvm-svn: 312420
We had these locals:
Value *Op0RHS = LHS->getOperand(1);
Value *Op1LHS = RHS->getOperand(0);
...so we confusingly transposed the meaning of left/right and op0/op1.
llvm-svn: 312418
This makes it easier to see that they're almost duplicates.
As with the similar icmp functions, there should be identical
folds for both logic ops because those are DeMorganized variants.
llvm-svn: 312415
A future patch will make the code look through truncates feeding the compare. So the compares might be different types but the pretruncated types might be the same.
This should be safe because we still require the same Value* to be used truncated or not in both compares. So that serves to ensure the types are the same.
llvm-svn: 312381
Previously we used the type from the LHS of the compare, but a future patch will change decomposeBitTestICmp to look through truncates so it will return a pretruncated Value* and the type needs to match that.
llvm-svn: 312380
Looks like for 'and' and 'or' we end up performing at least some of the transformations this is bocking in a round about way anyway.
For 'and sext(cmp1), sext(cmp2) we end up later turning it into 'select cmp1, sext(cmp2), 0'. Then we optimize that back to sext (and cmp1, cmp2). This is the same result we would have gotten if shouldOptimizeCast hadn't blocked it. We do something analogous for 'or'.
With this patch we allow that transformation to happen directly in foldCastedBitwiseLogic. And we now support the same thing for 'xor'. This is definitely opening up many other cases, but since we already went around it for some cases hopefully it's ok.
Differential Revision: https://reviews.llvm.org/D36213
llvm-svn: 311508
I don't think there's any reason to have them scattered about and on all 4 operands. We already have an early check that both compares must be the same type. And within a given compare the LHS and RHS must have the same type. Beyond that I don't think there's anyway this function returns anything valid for pointer types. So let's just return early and be done with it.
Differential Revision: https://reviews.llvm.org/D36561
llvm-svn: 311383
This recommits r310869, with the moved files and no extra changes.
Original commit message:
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310889
Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything.
llvm-svn: 310873
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310869
Summary:
These functions were overly complicated. The body of this function was rechecking for an And operation to find the constant, but we already knew we were looking at two Ands ORed together and the pieces are in variables. We already had earlier nearby code that checked for ConstantInts. So just inline the remaining parts into the earlier code.
Next step is to use m_APInt instead of ConstantInt.
Reviewers: spatel, efriedma, davide, majnemer
Reviewed By: spatel
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D36439
llvm-svn: 310806
This also corrects the description to match what was actually implemented. The old comment said X^(C1|C2), but it implemented X^((C1|C2)&~(C1&C2)). I believe ((C1|C2)&~(C1&C2)) is equivalent to (C1^C2).
Differential Revision: https://reviews.llvm.org/D36505
llvm-svn: 310658
We used to try to truncate the constant vector to vXi1, but if it's already i1 this would fail. Instead we now use IRBuilder::getZExtOrTrunc which should check the type and only create a trunc if needed. I believe this should trigger constant folding in the IRBuilder and ultimately do the same thing just with the additional type check.
llvm-svn: 310639
Note the original code I deleted incorrectly listed this as (X | C1) & C2 --> (X & C2^(C1&C2)) | C1 Which is only valid if C1 is a subset of C2. This relied on SimplifyDemandedBits to remove any extra bits from C1 before we got to that code.
My new implementation avoids relying on that behavior so that it can be naively verified with alive.
Differential Revision: https://reviews.llvm.org/D36384
llvm-svn: 310272
Summary:
The (not (sext)) case is really (xor (sext), -1) which should have been simplified to (sext (xor, 1)) before we got here. So we shouldn't need to handle it.
With that taken care of we only need to two cases so don't need the swap anymore. This makes us in sync with the equivalent code in visitOr so inline this to match.
Reviewers: spatel, eli.friedman, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36240
llvm-svn: 310063
As far as I can tell this should be handled by foldCastedBitwiseLogic which is called later in visitXor.
Differential Revision: https://reviews.llvm.org/D36214
llvm-svn: 309882
This adds support for sext in foldLogicCastConstant. This is a prerequisite for D36214.
Differential Revision: https://reviews.llvm.org/D36234
llvm-svn: 309880
Summary:
If one side simplifies to the identity value for inner opcode, we can replace the value with just the operation that can't be simplified.
I've removed a couple now unneeded special cases in visitAnd and visitOr. There are probably other cases I missed.
Reviewers: spatel, majnemer, hfinkel, dberlin
Reviewed By: spatel
Subscribers: grandinj, llvm-commits, spatel
Differential Revision: https://reviews.llvm.org/D35451
llvm-svn: 308111
Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference.
This patch makes it a reference everywhere including the InstCombiner class itself so there is more inconsistency. This a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do.
llvm-svn: 307451
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.
llvm-svn: 307292
Bswap isn't a simple operation so we need to make sure we are really removing a call to it before doing these simplifications.
For the case when both LHS and RHS are bswaps I've allowed it to be moved if either LHS or RHS has a single use since that at least allows us to move it later where it might find another bswap to combine with and it decreases the use count on the other side so maybe the other user can be optimized.
Differential Revision: https://reviews.llvm.org/D34974
llvm-svn: 307273
Summary:
I came across this while thinking about what would happen if one of the operands in this xor pattern was itself a inverted (A & ~B) ^ (~A & B)-> (A^B).
The patterns here assume that the (~a | ~b) will be demorganed to ~(a & b) first. Though I wonder if there's a multiple use case that would prevent the demorgan.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34870
llvm-svn: 306967
There are two conditions ORed here with similar checks and each contain two matches that must be true for the if to succeed. With the commutable match on the first half of the OR then both ifs basically have the same first part and only the second part distinguishs. With this change we move the commutable match to second half and make the first half unique.
This caused some tests to change because we now produce a commuted result, but this shouldn't matter in practice.
llvm-svn: 306800
If the components of the and/or had multiple uses, this transform created an additional instruction.
This patch makes sure we remove one of the components.
Differential Revision: https://reviews.llvm.org/D34498
llvm-svn: 306027
There are 2 parts to this patch made simultaneously to avoid a regression.
We're reversing the canonicalization that moves bitwise vector ops before bitcasts.
We're moving bitwise vector ops *after* bitcasts instead. That's the 1st and 3rd hunks
of the patch. The motivation is that there's only one fold that currently depends on
the existing canonicalization (see next), but there are many folds that would
automatically benefit from the new canonicalization.
PR33138 ( https://bugs.llvm.org/show_bug.cgi?id=33138 ) shows why/how we have these
patterns in IR.
There's an or(and,andn) pattern that requires an adjustment in order to continue matching
to 'select' because the bitcast changes position. This match is unfortunately complicated
because it requires 4 logic ops with optional bitcast and sext ops.
Test diffs:
1. The bitcast.ll and bitcast-bigendian.ll changes show the most basic difference -
bitcast comes before logic.
2. There are also tests with no diffs in bitcast.ll that verify that we're still doing
folds that were enabled by the previous canonicalization.
3. icmp-xor-signbit.ll shows the payoff. We don't need to adjust existing icmp patterns
to look through bitcasts.
4. logical-select.ll contains several tests for the or(and,andn) --> select fold to
verify that we are still handling those cases. The lone diff shows the movement of
the bitcast from the new canonicalization rule.
Differential Revision: https://reviews.llvm.org/D33517
llvm-svn: 306011
We have a large portfolio of folds for and-of-icmps and or-of-icmps in InstSimplify and InstCombine,
but hardly anything for xor-of-icmps. Rather than trying to rethink and translate all of those folds,
we can use the truth table definition of xor:
X ^ Y --> (X | Y) & !(X & Y)
...to see if we can convert the xor to and/or and then use the existing folds.
http://rise4fun.com/Alive/J9v
Differential Revision: https://reviews.llvm.org/D33342
llvm-svn: 305792
Summary:
These 4 patterns have the same one use check repeated twice for each. Once without a cast and one with. But the cast has no effect on what method is called.
For the OR case I believe it is always profitable regardless of the number of uses since we'll never increase the instruction count.
For the AND case I believe it is profitable if the pair of xors has one use such that we'll get rid of it completely. Or if the C value is something freely invertible, in which case the not doesn't cost anything.
Reviewers: spatel, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34308
llvm-svn: 305705
Summary: This is the demorganed version of the case we already handle for the OR of iszero.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34244
llvm-svn: 305548
Currently we expect A to be on the same side in both Ands but nothing guarantees that.
While there also switch to using matchers for some of the code.
Differential Revision: https://reviews.llvm.org/D34230
llvm-svn: 305487
Summary: This matches the behavior we already had for compares and makes us consistent everywhere.
Reviewers: dberlin, hfinkel, spatel
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33604
llvm-svn: 305049
We have wrappers for several other ValueTracking methods that take care of passing all of the analysis and assumption cache parameters. This extends it to isKnownToBeAPowerOfTwo.
llvm-svn: 303924
Also, fix the old-style capitalization of the related functions
and move them to the 'private' section of the class since they
are just helpers of the visit* functions.
As shown in the post-commit comments for D32143, we are missing
folds for xor-of-icmps.
llvm-svn: 303381
// (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2)
This canonicalization was added at:
https://reviews.llvm.org/rL7264
By moving xors out/down, we can more easily combine constants. I'm adding
tests that do not change with this patch, so we can verify that those kinds
of transforms are still happening.
This is no-functional-change-intended because there's a later fold:
// (X^C)|Y -> (X|Y)^C iff Y&C == 0
...and demanded-bits appears to guarantee that any fold that would have
hit the fold we're removing here would be caught by that 2nd fold.
Similar reasoning was used in:
https://reviews.llvm.org/rL299384
The larger motivation for removing this code is that it could interfere with
the fix for PR32706:
https://bugs.llvm.org/show_bug.cgi?id=32706
Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse
a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'.
Differential Revision: https://reviews.llvm.org/D33050
llvm-svn: 302733
The motivation for getting rid of dyn_castNotVal is to allow fixing:
https://bugs.llvm.org/show_bug.cgi?id=32706
So this was supposed to be functional-change-intended for the case
of inverting constants and applying DeMorgan. However, I can't find
any cases where that pattern will actually get to matchDeMorgansLaws()
because we have other folds in visitAnd/visitOr that do the same
thing. So this ends up just being a clean-up patch with slight efficiency
improvement, but no-functional-change-intended.
llvm-svn: 302581
This is another step towards getting rid of dyn_castNotVal,
so we can recommit:
https://reviews.llvm.org/rL300977
As the tests show, we were missing the lshr case for constants
and both ashr/lshr vector splat folds. The ashr case with constant
was being performed inefficiently in 2 steps. It's also possible
there was a latent bug in that case because we can't do that fold
if the constant is positive:
http://rise4fun.com/Alive/Bge
llvm-svn: 302465
We can simplify (or (icmp X, C1), (icmp X, C2)) to 'true' or one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything seems
to check out. Eg, the deleted code in instcombine was completely ignoring predicates with
mismatched signedness.
This is a follow-up to:
https://reviews.llvm.org/rL301260https://reviews.llvm.org/D32143
llvm-svn: 302370
This was originally checked in here:
https://reviews.llvm.org/rL301923
And reverted here:
https://reviews.llvm.org/rL301924
Because there's a clang test that would fail after this. I fixed/removed the
offending CHECK lines in:
https://reviews.llvm.org/rL301928
So let's try this again. Original commit message:
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in https://reviews.llvm.org/D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate https://reviews.llvm.org/D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is harmful to SCEV and codegen because 'not' can often be folded while random xor cannot.
2. It does not transform vector constants. This is actually a good thing, but if you don't believe the above argument, then we shouldn't have excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't match the greedy nature of instcombine. If we DeMorganize a pattern that has an extra 'not' in it: ~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold that pattern too: (~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301929
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is
harmful to SCEV and codegen because 'not' can often be folded while
random xor cannot.
2. It does not transform vector constants. This is actually a good thing,
but if you don't believe the above argument, then we shouldn't have
excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't
match the greedy nature of instcombine. If we DeMorganize a pattern
that has an extra 'not' in it:
~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold
that pattern too:
(~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301923
If we have ~(~X & Y), it only makes sense to transform it to (X | ~Y) when we do not need
the intermediate (~X & Y) value. In that case, we would need an extra instruction to
generate ~Y + 'or' (as shown in the test changes).
It's ok if we have multiple uses of ~X or Y, however. In those cases, we may not reduce the
instruction count or critical path, but we might improve throughput because we can generate
~X and ~Y in parallel. Whether that actually makes perf sense or not for a target is something
we can't answer in IR.
Differential Revision: https://reviews.llvm.org/D32703
llvm-svn: 301848
The matching here wasn't able to handle all the possible commutes. It always assumed the not would be on the left of the xor, but that's not guaranteed.
Differential Revision: https://reviews.llvm.org/D32474
llvm-svn: 301316
We can simplify (and (icmp X, C1), (icmp X, C2)) to one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything
seems to check out. Eg, the code in instcombine was completely ignoring predicates with
mismatched signedness.
Handling or-of-icmps would be a follow-up step.
Differential Revision: https://reviews.llvm.org/D32143
llvm-svn: 301260
This is a straight cut and paste, but there's a bigger problem: if this
fold exists for simplifyOr, there should be a DeMorganized version for
simplifyAnd. But more than that, we have a patchwork of ad hoc logic
optimizations in InstCombine. There should be some structure to ensure
that we're not missing sibling folds across and/or/xor.
llvm-svn: 301213
We handled all of the commuted variants for plain xor already,
although they were scattered around and sometimes folded less
efficiently using distributive laws. We had no folds for not-xor.
Handling all of these patterns consistently is part of trying to
reinstate:
https://reviews.llvm.org/rL300977
llvm-svn: 301144
There's probably some better way to write this that eliminates the
code duplication without hurting readability, but at least this
eliminates the logic holes and is hopefully slightly more efficient
than creating new instructions.
llvm-svn: 301129
The later uses of dyn_castNotVal in this block are either
incomplete (doesn't handle vector constants) or overstepping
(shouldn't handle constants at all), but this first use is
just unnecessary. 'I' is obviously not a constant, and it
can't be a not-of-a-not because that would already be
instsimplified.
llvm-svn: 301088
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.
Differential Revision: https://reviews.llvm.org/D32108
llvm-svn: 300856
So, `cast<Instruction>` is not guaranteed to succeed. Change the
code so that we create a new constant and use it in the newly
created instruction, as it's done in other places in InstCombine.
OK'ed by Sanjay/Craig. Fixes PR32686.
llvm-svn: 300495
...when C1 differs from C2 by one bit and C1 <u C2:
http://rise4fun.com/Alive/Vuo
And move related folds to a helper function. This reduces code duplication and
will make it easier to remove the scalar-only restriction as a follow-up step.
llvm-svn: 300364
This is effectively a retry of:
https://reviews.llvm.org/rL299851
but now we have tests and an assert to make sure the bug
that was exposed with that attempt will not happen again.
I'll fix the code duplication and missing sibling fold next,
but I want to make this change as small as possible to reduce
risk since I messed it up last time.
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32524
llvm-svn: 300236
It's less efficient to produce 'ule' than 'ult' since we know we're going to
canonicalize to 'ult', but we shouldn't have duplicated code for these folds.
As a trade-off, this was a pretty terrible way to make a '2'. :)
if (LHSC == SubOne(RHSC))
AddC = ConstantExpr::getSub(AddOne(RHSC), LHSC);
The next steps are to share the code to fix PR32524 and add the missing 'and'
fold that was left out when PR14708 was fixed:
https://bugs.llvm.org/show_bug.cgi?id=14708
llvm-svn: 300222
One potential way to make InstCombine (very slightly?) faster is to recycle instructions
when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't
consider this an "InstSimplify". We could, however, make a new layer to house transforms
like this if that makes InstCombine more manageable (just throwing out an idea; not sure
how much opportunity is actually here).
Differential Revision: https://reviews.llvm.org/D31863
llvm-svn: 300067
Also, make the same change in and-of-icmps and remove a hack for detecting that case.
Finally, add some FIXME comments because the code duplication here is awful.
This should fix the remaining IR problem noted in:
https://bugs.llvm.org/show_bug.cgi?id=32524
llvm-svn: 299851
"PredicatesFoldable" returns false for signed/unsigned mismatched pairs,
so these cases should never exist. We'll default to 'unreachable' on those
predicate combos instead.
Most of what's left in these switches belongs in InstSimplify (and may
already be there), so there's probably more that can be done to reduce
this code.
llvm-svn: 299829
This combine is fully handled by SimplifyDemandedInstructionBits as of r299658 where I fixed this code to ensure the Add/Sub had only a single user. Otherwise it would fire and create additional instructions. That fix resulted in an improvement to code generated for tsan which is why I committed it before deleting.
Differential Revision: https://reviews.llvm.org/D31543
llvm-svn: 299704
There must be some opportunity to refactor big chunks of nearly duplicated code in FoldOrOfICmps / FoldAndOfICmps.
Also, none of this works with vectors, but it should.
llvm-svn: 299568
Currently we only fold with ConstantInt RHS. This generalizes to any Constant RHS.
Differential Revision: https://reviews.llvm.org/D31610
llvm-svn: 299466
It turns out that SimplifyDemandedInstructionBits will get called earlier and remove bits from C1 first. Effectively doing (X & (C1&C2)) | C2. So by the time it got to this check there could be no common bits.
I think the DAGCombiner has the same check but its check can be executed because it handles demanded bits later. I'll look at it next.
llvm-svn: 299384
1. Improve enum, function, and variable names.
2. Improve comments.
3. Fix variable capitalization.
4. Run clang-format.
As an existing code comment suggests, this should work with vector types / splat constants too,
so making this look right first will reduce the diffs needed for that change.
llvm-svn: 299365
The callers have already performed the necessary cast before calling. This allows us to remove a comment that says the instruction must be a BinaryOperator and make it explicit in the argument type.
Had to add a default case to the switch because BinaryOperator::getOpcode() returns a BinaryOps enum.
llvm-svn: 299339
As far as I can tell this combine is fully handled by SimplifyDemandedInstructionBits.
I was only looking at this because it is the only user of APIntOps::isShiftedMask which is itself broken. As demonstrated by r299187. I was going to fix isShiftedMask and needed to make sure we had coverage for the new cases it would expose to this combine. But looks like we can nuke it instead.
Differential Revision: https://reviews.llvm.org/D31543
llvm-svn: 299337
This removes a parameter from the routine that was responsible for a lot of the issue. It was a bit count that had to be set to the BitWidth of the APInt and would get passed to getLowBitsSet. This guaranteed the call to getLowBitsSet would create an all ones value. This was then compared to (V | (V-1)). So the only shifted masks we detected had to have the MSB set.
The one in tree user is a transform in InstCombine that never fires due to earlier transforms covering the case better. I've submitted a patch to remove it completely, but for now I've just adapted it to the new interface for isShiftedMask.
llvm-svn: 299273
Now that we call ShrinkDemandedConstant on the RHS of sub this should be taken care of. This code doesn't trigger on any in tree regressions, but did before ShrinkDemandedConstant was added to the RHS.
llvm-svn: 298644
Some of the callers are artificially limiting this transform to integer types;
this should make it easier to incrementally remove that restriction.
llvm-svn: 291620
Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296
I made a simple patch for that and noticed some regressions, so added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.
But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.
We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted
patterns because they require adjustments to the match order.
I'm aware of the concern about the potential compile-time performance impact of adding
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.
Differential Revision: https://reviews.llvm.org/D24419
llvm-svn: 290067
A number of new patterns for simplifying and/xor of icmp:
(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.
(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
%s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.
llvm-svn: 289813
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
This is prep work before changing the callers to also use APInt which will
allow folds for splat vectors. Currently, the callers have ConstantInt
guards in place, so no functional change intended with this commit.
llvm-svn: 280282
It's much less code and easier to read if we don't duplicate
everything between the 'Inside' and not 'Inside' cases.
As noted with the FIXME, the goal is to make this vector-friendly
in a follow-up patch.
llvm-svn: 280183
Summary:
InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` such that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now combine this unfolding and the subsequent `zext(icmp)` optimization to be performed together. Since the unfolding doesn't happen separately anymore, we also again enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))` which had been disabled due to its interference with the unfolding transformation.
Tested via `make check` and `lnt`.
Background
==========
For a better understanding on how it came to this change we subsequently summarize its history. In commit r275989 we've already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of this problem is that in `visitZExt()` in InstCombineCasts.cpp there also exists a reverse of the above folding transformation, that unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations which would then possibly be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` would be eliminated the folding from r275989 could kick in and cause InstCombine to endlessly switch back and forth between the folding and the unfolding transformation. This is the reason why we now combine the `zext`-unfolding and the elimination of the exposed `zext(icmp)` to happen at one go because this enables us to still allow the cast-folding in `logic(cast(icmp), cast(icmp))` without entering an endless loop again.
Details on the submitted changes
================================
- In `visitZExt()` we combine the unfolding and optimization of `zext` instructions.
- In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted in a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise cause according assertions to be triggered (in our case this happend, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent usage of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst` accordingly.
- In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions.
- The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes.
- The test cases in zext.ll have been adopted from the reverted commits r275989 and r276105.
Reviewers: grosser, majnemer, spatel
Subscribers: eli.friedman, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22864
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 277635
As noted in https://reviews.llvm.org/D22537 , we can use this functionality in
visitSelectInstWithICmp() and InstSimplify, but currently we have duplicated
code.
llvm-svn: 276140
Summary:
Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one:
> Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.
Apparently, it introduced a transformation that is a reverse of the transformation that is done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions which is mirrored in the responsible check:
`if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`
This check seems to sort out more cases than necessary because:
- the reverse transformation is obviously done for `or` instructions only
- and also not every `zext icmp` pair is necessarily the result of this reverse transformation
Therefore we now remove this check and replace it by a more finegrained one in `shouldOptimizeCast()` that now rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).
As an example, consider the following IR snippet
```
%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4
```
which would now be transformed to
```
%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8
```
This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. Like shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could for example hinder ScalarEvolution and Polly on certain code.
Reviewers: grosser, vtjnash, majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22511
Contributed-by: Matthias Reisinger
llvm-svn: 275989
Summary:
This patch cleans up parts of InstCombine to raise its compliance with the LLVM coding standards and to increase its readability. The changes and according rationale are summarized in the following:
- Rename `ShouldOptimizeCast()` to `shouldOptimizeCast()` since functions should start with a lower case letter.
- Move `shouldOptimizeCast()` from InstCombineCasts.cpp to InstCombineAndOrXor.cpp since it's only used there.
- Simplify interface of `shouldOptimizeCast()`.
- Minor code style adaptions in `shouldOptimizeCast()`.
- Remove the documentation on the function definition of `shouldOptimizeCast()` since it just repeats the documentation on its declaration. Also enhance the documentation on its declaration with more information describing its intended use and make it doxygen-compliant.
- Change a comment in `foldCastedBitwiseLogic()` from `fold (logic (cast A), (cast B)) -> (cast (logic A, B))` to `fold logic(cast(A), cast(B)) -> cast(logic(A, B))` since the surrounding comments use this format.
- Remove comment `Only do this if the casts both really cause code to be generated.` in `foldCastedBitwiseLogic()` since it just repeats parts of the documentation of `shouldOptimizeCast()` and does not help to improve readability.
- Simplify the interface of `isEliminableCastPair()`.
- Removed the documentation on the function definition of `isEliminableCastPair()` which only contained obvious statements about its implementation. Instead added more general doxygen-compliant documentation to its declaration.
- Renamed parameter `DoXform` of `transformZExtIcmp()` to `DoTransform` to make its intention clearer.
- Moved documentation of `transformZExtIcmp()` from its definition to its declaration and made it doxygen-compliant.
Reviewers: vtjnash, grosser
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22449
Contributed-by: Matthias Reisinger
llvm-svn: 275964