// (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2)
This canonicalization was added at:
https://reviews.llvm.org/rL7264
By moving xors out/down, we can more easily combine constants. I'm adding
tests that do not change with this patch, so we can verify that those kinds
of transforms are still happening.
This is no-functional-change-intended because there's a later fold:
// (X^C)|Y -> (X|Y)^C iff Y&C == 0
...and demanded-bits appears to guarantee that any fold that would have
hit the fold we're removing here would be caught by that 2nd fold.
Similar reasoning was used in:
https://reviews.llvm.org/rL299384
The larger motivation for removing this code is that it could interfere with
the fix for PR32706:
https://bugs.llvm.org/show_bug.cgi?id=32706
Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse
a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'.
Differential Revision: https://reviews.llvm.org/D33050
llvm-svn: 302733
The motivation for getting rid of dyn_castNotVal is to allow fixing:
https://bugs.llvm.org/show_bug.cgi?id=32706
So this was supposed to be functional-change-intended for the case
of inverting constants and applying DeMorgan. However, I can't find
any cases where that pattern will actually get to matchDeMorgansLaws()
because we have other folds in visitAnd/visitOr that do the same
thing. So this ends up just being a clean-up patch with slight efficiency
improvement, but no-functional-change-intended.
llvm-svn: 302581
This is another step towards getting rid of dyn_castNotVal,
so we can recommit:
https://reviews.llvm.org/rL300977
As the tests show, we were missing the lshr case for constants
and both ashr/lshr vector splat folds. The ashr case with constant
was being performed inefficiently in 2 steps. It's also possible
there was a latent bug in that case because we can't do that fold
if the constant is positive:
http://rise4fun.com/Alive/Bge
llvm-svn: 302465
We can simplify (or (icmp X, C1), (icmp X, C2)) to 'true' or one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything seems
to check out. Eg, the deleted code in instcombine was completely ignoring predicates with
mismatched signedness.
This is a follow-up to:
https://reviews.llvm.org/rL301260https://reviews.llvm.org/D32143
llvm-svn: 302370
This was originally checked in here:
https://reviews.llvm.org/rL301923
And reverted here:
https://reviews.llvm.org/rL301924
Because there's a clang test that would fail after this. I fixed/removed the
offending CHECK lines in:
https://reviews.llvm.org/rL301928
So let's try this again. Original commit message:
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in https://reviews.llvm.org/D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate https://reviews.llvm.org/D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is harmful to SCEV and codegen because 'not' can often be folded while random xor cannot.
2. It does not transform vector constants. This is actually a good thing, but if you don't believe the above argument, then we shouldn't have excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't match the greedy nature of instcombine. If we DeMorganize a pattern that has an extra 'not' in it: ~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold that pattern too: (~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301929
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is
harmful to SCEV and codegen because 'not' can often be folded while
random xor cannot.
2. It does not transform vector constants. This is actually a good thing,
but if you don't believe the above argument, then we shouldn't have
excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't
match the greedy nature of instcombine. If we DeMorganize a pattern
that has an extra 'not' in it:
~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold
that pattern too:
(~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301923
If we have ~(~X & Y), it only makes sense to transform it to (X | ~Y) when we do not need
the intermediate (~X & Y) value. In that case, we would need an extra instruction to
generate ~Y + 'or' (as shown in the test changes).
It's ok if we have multiple uses of ~X or Y, however. In those cases, we may not reduce the
instruction count or critical path, but we might improve throughput because we can generate
~X and ~Y in parallel. Whether that actually makes perf sense or not for a target is something
we can't answer in IR.
Differential Revision: https://reviews.llvm.org/D32703
llvm-svn: 301848
The matching here wasn't able to handle all the possible commutes. It always assumed the not would be on the left of the xor, but that's not guaranteed.
Differential Revision: https://reviews.llvm.org/D32474
llvm-svn: 301316
We can simplify (and (icmp X, C1), (icmp X, C2)) to one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything
seems to check out. Eg, the code in instcombine was completely ignoring predicates with
mismatched signedness.
Handling or-of-icmps would be a follow-up step.
Differential Revision: https://reviews.llvm.org/D32143
llvm-svn: 301260
This is a straight cut and paste, but there's a bigger problem: if this
fold exists for simplifyOr, there should be a DeMorganized version for
simplifyAnd. But more than that, we have a patchwork of ad hoc logic
optimizations in InstCombine. There should be some structure to ensure
that we're not missing sibling folds across and/or/xor.
llvm-svn: 301213
We handled all of the commuted variants for plain xor already,
although they were scattered around and sometimes folded less
efficiently using distributive laws. We had no folds for not-xor.
Handling all of these patterns consistently is part of trying to
reinstate:
https://reviews.llvm.org/rL300977
llvm-svn: 301144
There's probably some better way to write this that eliminates the
code duplication without hurting readability, but at least this
eliminates the logic holes and is hopefully slightly more efficient
than creating new instructions.
llvm-svn: 301129
The later uses of dyn_castNotVal in this block are either
incomplete (doesn't handle vector constants) or overstepping
(shouldn't handle constants at all), but this first use is
just unnecessary. 'I' is obviously not a constant, and it
can't be a not-of-a-not because that would already be
instsimplified.
llvm-svn: 301088
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.
Differential Revision: https://reviews.llvm.org/D32108
llvm-svn: 300856
So, `cast<Instruction>` is not guaranteed to succeed. Change the
code so that we create a new constant and use it in the newly
created instruction, as it's done in other places in InstCombine.
OK'ed by Sanjay/Craig. Fixes PR32686.
llvm-svn: 300495
...when C1 differs from C2 by one bit and C1 <u C2:
http://rise4fun.com/Alive/Vuo
And move related folds to a helper function. This reduces code duplication and
will make it easier to remove the scalar-only restriction as a follow-up step.
llvm-svn: 300364
This is effectively a retry of:
https://reviews.llvm.org/rL299851
but now we have tests and an assert to make sure the bug
that was exposed with that attempt will not happen again.
I'll fix the code duplication and missing sibling fold next,
but I want to make this change as small as possible to reduce
risk since I messed it up last time.
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32524
llvm-svn: 300236
It's less efficient to produce 'ule' than 'ult' since we know we're going to
canonicalize to 'ult', but we shouldn't have duplicated code for these folds.
As a trade-off, this was a pretty terrible way to make a '2'. :)
if (LHSC == SubOne(RHSC))
AddC = ConstantExpr::getSub(AddOne(RHSC), LHSC);
The next steps are to share the code to fix PR32524 and add the missing 'and'
fold that was left out when PR14708 was fixed:
https://bugs.llvm.org/show_bug.cgi?id=14708
llvm-svn: 300222
One potential way to make InstCombine (very slightly?) faster is to recycle instructions
when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't
consider this an "InstSimplify". We could, however, make a new layer to house transforms
like this if that makes InstCombine more manageable (just throwing out an idea; not sure
how much opportunity is actually here).
Differential Revision: https://reviews.llvm.org/D31863
llvm-svn: 300067
Also, make the same change in and-of-icmps and remove a hack for detecting that case.
Finally, add some FIXME comments because the code duplication here is awful.
This should fix the remaining IR problem noted in:
https://bugs.llvm.org/show_bug.cgi?id=32524
llvm-svn: 299851
"PredicatesFoldable" returns false for signed/unsigned mismatched pairs,
so these cases should never exist. We'll default to 'unreachable' on those
predicate combos instead.
Most of what's left in these switches belongs in InstSimplify (and may
already be there), so there's probably more that can be done to reduce
this code.
llvm-svn: 299829
This combine is fully handled by SimplifyDemandedInstructionBits as of r299658 where I fixed this code to ensure the Add/Sub had only a single user. Otherwise it would fire and create additional instructions. That fix resulted in an improvement to code generated for tsan which is why I committed it before deleting.
Differential Revision: https://reviews.llvm.org/D31543
llvm-svn: 299704
There must be some opportunity to refactor big chunks of nearly duplicated code in FoldOrOfICmps / FoldAndOfICmps.
Also, none of this works with vectors, but it should.
llvm-svn: 299568
Currently we only fold with ConstantInt RHS. This generalizes to any Constant RHS.
Differential Revision: https://reviews.llvm.org/D31610
llvm-svn: 299466
It turns out that SimplifyDemandedInstructionBits will get called earlier and remove bits from C1 first. Effectively doing (X & (C1&C2)) | C2. So by the time it got to this check there could be no common bits.
I think the DAGCombiner has the same check but its check can be executed because it handles demanded bits later. I'll look at it next.
llvm-svn: 299384
1. Improve enum, function, and variable names.
2. Improve comments.
3. Fix variable capitalization.
4. Run clang-format.
As an existing code comment suggests, this should work with vector types / splat constants too,
so making this look right first will reduce the diffs needed for that change.
llvm-svn: 299365
The callers have already performed the necessary cast before calling. This allows us to remove a comment that says the instruction must be a BinaryOperator and make it explicit in the argument type.
Had to add a default case to the switch because BinaryOperator::getOpcode() returns a BinaryOps enum.
llvm-svn: 299339
As far as I can tell this combine is fully handled by SimplifyDemandedInstructionBits.
I was only looking at this because it is the only user of APIntOps::isShiftedMask which is itself broken. As demonstrated by r299187. I was going to fix isShiftedMask and needed to make sure we had coverage for the new cases it would expose to this combine. But looks like we can nuke it instead.
Differential Revision: https://reviews.llvm.org/D31543
llvm-svn: 299337
This removes a parameter from the routine that was responsible for a lot of the issue. It was a bit count that had to be set to the BitWidth of the APInt and would get passed to getLowBitsSet. This guaranteed the call to getLowBitsSet would create an all ones value. This was then compared to (V | (V-1)). So the only shifted masks we detected had to have the MSB set.
The one in tree user is a transform in InstCombine that never fires due to earlier transforms covering the case better. I've submitted a patch to remove it completely, but for now I've just adapted it to the new interface for isShiftedMask.
llvm-svn: 299273
Now that we call ShrinkDemandedConstant on the RHS of sub this should be taken care of. This code doesn't trigger on any in tree regressions, but did before ShrinkDemandedConstant was added to the RHS.
llvm-svn: 298644
Some of the callers are artificially limiting this transform to integer types;
this should make it easier to incrementally remove that restriction.
llvm-svn: 291620
Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296
I made a simple patch for that and noticed some regressions, so added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.
But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.
We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted
patterns because they require adjustments to the match order.
I'm aware of the concern about the potential compile-time performance impact of adding
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.
Differential Revision: https://reviews.llvm.org/D24419
llvm-svn: 290067
A number of new patterns for simplifying and/xor of icmp:
(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.
(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
%s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.
llvm-svn: 289813
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
This is prep work before changing the callers to also use APInt which will
allow folds for splat vectors. Currently, the callers have ConstantInt
guards in place, so no functional change intended with this commit.
llvm-svn: 280282
It's much less code and easier to read if we don't duplicate
everything between the 'Inside' and not 'Inside' cases.
As noted with the FIXME, the goal is to make this vector-friendly
in a follow-up patch.
llvm-svn: 280183
Summary:
InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` such that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now combine this unfolding and the subsequent `zext(icmp)` optimization to be performed together. Since the unfolding doesn't happen separately anymore, we also again enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))` which had been disabled due to its interference with the unfolding transformation.
Tested via `make check` and `lnt`.
Background
==========
For a better understanding on how it came to this change we subsequently summarize its history. In commit r275989 we've already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of this problem is that in `visitZExt()` in InstCombineCasts.cpp there also exists a reverse of the above folding transformation, that unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations which would then possibly be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` would be eliminated the folding from r275989 could kick in and cause InstCombine to endlessly switch back and forth between the folding and the unfolding transformation. This is the reason why we now combine the `zext`-unfolding and the elimination of the exposed `zext(icmp)` to happen at one go because this enables us to still allow the cast-folding in `logic(cast(icmp), cast(icmp))` without entering an endless loop again.
Details on the submitted changes
================================
- In `visitZExt()` we combine the unfolding and optimization of `zext` instructions.
- In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted in a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise cause according assertions to be triggered (in our case this happend, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent usage of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst` accordingly.
- In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions.
- The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes.
- The test cases in zext.ll have been adopted from the reverted commits r275989 and r276105.
Reviewers: grosser, majnemer, spatel
Subscribers: eli.friedman, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22864
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 277635
As noted in https://reviews.llvm.org/D22537 , we can use this functionality in
visitSelectInstWithICmp() and InstSimplify, but currently we have duplicated
code.
llvm-svn: 276140
Summary:
Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one:
> Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.
Apparently, it introduced a transformation that is a reverse of the transformation that is done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions which is mirrored in the responsible check:
`if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`
This check seems to sort out more cases than necessary because:
- the reverse transformation is obviously done for `or` instructions only
- and also not every `zext icmp` pair is necessarily the result of this reverse transformation
Therefore we now remove this check and replace it by a more finegrained one in `shouldOptimizeCast()` that now rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).
As an example, consider the following IR snippet
```
%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4
```
which would now be transformed to
```
%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8
```
This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. Like shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could for example hinder ScalarEvolution and Polly on certain code.
Reviewers: grosser, vtjnash, majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22511
Contributed-by: Matthias Reisinger
llvm-svn: 275989
Summary:
This patch cleans up parts of InstCombine to raise its compliance with the LLVM coding standards and to increase its readability. The changes and according rationale are summarized in the following:
- Rename `ShouldOptimizeCast()` to `shouldOptimizeCast()` since functions should start with a lower case letter.
- Move `shouldOptimizeCast()` from InstCombineCasts.cpp to InstCombineAndOrXor.cpp since it's only used there.
- Simplify interface of `shouldOptimizeCast()`.
- Minor code style adaptions in `shouldOptimizeCast()`.
- Remove the documentation on the function definition of `shouldOptimizeCast()` since it just repeats the documentation on its declaration. Also enhance the documentation on its declaration with more information describing its intended use and make it doxygen-compliant.
- Change a comment in `foldCastedBitwiseLogic()` from `fold (logic (cast A), (cast B)) -> (cast (logic A, B))` to `fold logic(cast(A), cast(B)) -> cast(logic(A, B))` since the surrounding comments use this format.
- Remove comment `Only do this if the casts both really cause code to be generated.` in `foldCastedBitwiseLogic()` since it just repeats parts of the documentation of `shouldOptimizeCast()` and does not help to improve readability.
- Simplify the interface of `isEliminableCastPair()`.
- Removed the documentation on the function definition of `isEliminableCastPair()` which only contained obvious statements about its implementation. Instead added more general doxygen-compliant documentation to its declaration.
- Renamed parameter `DoXform` of `transformZExtIcmp()` to `DoTransform` to make its intention clearer.
- Moved documentation of `transformZExtIcmp()` from its definition to its declaration and made it doxygen-compliant.
Reviewers: vtjnash, grosser
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22449
Contributed-by: Matthias Reisinger
llvm-svn: 275964
In D21740, we discussed trying to make this a more general matcher. However, I didn't see a clean
way to handle the regular m_Not cases and these non-splat vector patterns, so I've opted for the
direct approach here. If there are other potential uses of areInverseVectorBitmasks(), we could
move that helper function to a higher level.
There is an open question as to which is of these forms should be considered the canonical IR:
%sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
%shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
Differential Revision: http://reviews.llvm.org/D22114
llvm-svn: 275289
This isn't a sure thing (are 2 extra bitcasts less expensive than a logic op?),
but we'll try to err on the conservative side by going with the case that has
less IR instructions.
Note: This question came up in http://reviews.llvm.org/D22114 , but this part is
independent of that patch proposal, so I'm making this small change ahead of that
one.
See also:
http://reviews.llvm.org/rL274926
llvm-svn: 274932
By putting all the possible commutations together, we simplify the code.
Note that this is NFCI, but I'm adding tests that actually exercise each
commutation pattern because we don't have this anywhere else.
llvm-svn: 273702
By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question
raised by PR27869:
https://llvm.org/bugs/show_bug.cgi?id=27869
...where InstCombine turns an icmp+zext into a shift causing us to miss the fold.
Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp.
Differential Revision: http://reviews.llvm.org/D21512
llvm-svn: 273200
There was concern that creating bitcasts for the simpler potential select pattern:
define <2 x i64> @vecBitcastOp1(<4 x i1> %cmp, <2 x i64> %a) {
%a2 = add <2 x i64> %a, %a
%sext = sext <4 x i1> %cmp to <4 x i32>
%bc = bitcast <4 x i32> %sext to <2 x i64>
%and = and <2 x i64> %a2, %bc
ret <2 x i64> %and
}
might lead to worse code for some targets, so this patch is matching the larger
patterns seen in the test cases.
The motivating example for this patch is this IR produced via SSE intrinsics in C:
define <2 x i64> @gibson(<2 x i64> %a, <2 x i64> %b) {
%t0 = bitcast <2 x i64> %a to <4 x i32>
%t1 = bitcast <2 x i64> %b to <4 x i32>
%cmp = icmp sgt <4 x i32> %t0, %t1
%sext = sext <4 x i1> %cmp to <4 x i32>
%t2 = bitcast <4 x i32> %sext to <2 x i64>
%and = and <2 x i64> %t2, %a
%neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
%neg2 = bitcast <4 x i32> %neg to <2 x i64>
%and2 = and <2 x i64> %neg2, %b
%or = or <2 x i64> %and, %and2
ret <2 x i64> %or
}
For an AVX target, this is currently:
vpcmpgtd %xmm1, %xmm0, %xmm2
vpand %xmm0, %xmm2, %xmm0
vpandn %xmm1, %xmm2, %xmm1
vpor %xmm1, %xmm0, %xmm0
retq
With this patch, it becomes:
vpmaxsd %xmm1, %xmm0, %xmm0
Differential Revision: http://reviews.llvm.org/D20774
llvm-svn: 271676
This is effectively NFC because we already do this transform after r175380:
http://reviews.llvm.org/rL175380
and also via foldBoolSextMaskToSelect().
This change should just make it a bit more efficient to match the pattern.
The original guard was added in r95058:
http://reviews.llvm.org/rL95058
A sampling of codegen for current in-tree targets shows no problems. This
makes sense given that we're already producing the vector selects via the
other transforms.
llvm-svn: 271554
Also, rename recognizeBitReverseOrBSwapIdiom to recognizeBSwapOrBitReverseIdiom,
so the ordering of the MatchBSwaps and MatchBitReversals arguments are
consistent with the function name.
llvm-svn: 270715
Given that we're not actually reducing the instruction count in the included
regression tests, I think we would call this a canonicalization step.
The motivation comes from the example in PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702
If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable
example of:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
%not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%bc = bitcast <4 x i32> %not to <2 x i64>
%notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
%bc2 = bitcast <2 x i64> %notnot to <4 x i32>
ret <4 x i32> %bc2
}
Simplifies to the expected:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
ret <4 x i32> %lobit
}
Differential Revision: http://reviews.llvm.org/D17583
llvm-svn: 262645
This is part of the payoff for the refactoring in:
http://reviews.llvm.org/rL261649http://reviews.llvm.org/rL261707
In addition to removing a pile of duplicated code, the xor case was
missing the optimization for vector types because it checked
"SrcTy->isIntegerTy()" rather than "SrcTy->isIntOrIntVectorTy()"
like 'and' and 'or' were already doing.
This solves part of:
https://llvm.org/bugs/show_bug.cgi?id=26702
llvm-svn: 261750
Note: The 'and' case in foldCastedBitwiseLogic() is inheriting one extra
check from the nearly identical 'or' case:
if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))
But I'm not sure how to expose that difference in a regression test.
Without that check, the 'or' path will infinite loop on:
test/Transforms/InstCombine/zext-or-icmp.ll
because the zext-or-icmp fold is attempting a reverse transform.
The refactoring should extend to the 'xor' case next to solve part of
PR26702.
llvm-svn: 261707
This is a straight cut and paste of the existing code and is intended to
be the first step in solving part of PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702
We should be able to reuse most of this and delete the nearly identical
existing code in visitOr(). Then, we can enhance visitXor() to use the
same code too.
llvm-svn: 261649
There are several requirements that ended up with this design;
1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
2. Bitreversals and byteswaps are very related in their matching logic.
3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
4. Bswaps are best matched early in InstCombine.
The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.
We can then extend the matching logic in one place only.
llvm-svn: 257875
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.
llvm-svn: 255334
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886
Without this IR transform, the backend (x86 at least) was producing inefficient code.
This patch is making 2 assumptions:
1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.
Differential Revision: http://reviews.llvm.org/D13076
llvm-svn: 249702
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723
My first attempt at this was to change what I thought was the root problem:
xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32
...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!
My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I
abandoned that idea.
I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):
bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb %sil, %dil
xorb $1, %dil
movb %dil, %al
retq
Differential Revision: http://reviews.llvm.org/D12705
llvm-svn: 248634
Summary: We know that -x & 1 is equivalent to x & 1, avoid using negation for testing if a negative integer is even or odd.
Reviewers: majnemer
Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D12156
llvm-svn: 245569
Summary:
This change splits `makeICmpRegion` into `makeAllowedICmpRegion` and
`makeSatisfyingICmpRegion` with slightly different contracts. The first
one is useful for determining what values some expression //may// take,
given that a certain `icmp` evaluates to true. The second one is useful
for determining what values are guaranteed to //satisfy// a given
`icmp`.
Reviewers: nlewycky
Reviewed By: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8345
llvm-svn: 232575
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
This case is interesting because ScalarEvolutionExpander lowers min(a,
b) as ~max(~a,~b). I think the profitability heuristics can be made
more clever/aggressive, but this is a start.
Differential Revision: http://reviews.llvm.org/D7821
llvm-svn: 230285
creating a non-internal header file for the InstCombine pass.
I thought about calling this InstCombiner.h or in some way more clearly
associating it with the InstCombiner clas that it is primarily defining,
but there are several other utility interfaces defined within this for
InstCombine. If, in the course of refactoring, those end up moving
elsewhere or going away, it might make more sense to make this the
combiner's header alone.
Naturally, this is a bikeshed to a certain degree, so feel free to lobby
for a different shade of paint if this name just doesn't suit you.
llvm-svn: 226783
a cache of assumptions for a single function, and an immutable pass that
manages those caches.
The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating an separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.
Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.
For now, this adds boiler plate to the various passes, but this kind of
boiler plate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.
llvm-svn: 225131
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:
OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:
fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))
Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.
Differential Revision: http://reviews.llvm.org/D6407
llvm-svn: 223349
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n
llvm-svn: 223224
We may be in a situation where the icmps might not be near each other in
a tree of or instructions. Try to dig out related compare instructions
and see if they combine.
N.B. This won't fire on deep trees of compares because rewritting the
tree might end up creating a net increase of IR. We may have to resort
to something more sophisticated if this is a real problem.
llvm-svn: 222928
It is impossible for (x & INT_MAX) == 0 && x == INT_MAX to ever be true.
While this sort of reasoning should normally live in InstSimplify,
the machinery that derives this result is not trivial to split out.
llvm-svn: 222230
This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.
As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.
The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.
Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This is to prevent a query using, for
example, the assume(a == b), to recurse on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.
This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).
llvm-svn: 217342
The special case did not work when run under -reassociate and can easily
be expressed by a further generalization of an existing pattern.
llvm-svn: 217227
consider: (and (icmp X, Y), (and Z, (icmp A, B)))
It may be possible to combine (icmp X, Y) with (icmp A, B).
If we successfully combine, create an 'and' instruction with Z.
This fixes PR20814.
N.B. There is room for improvement after this change but I'm not
convinced it's worth chasing yet.
llvm-svn: 216814
(X >> Z) & (Y >> Z) -> (X&Y) >> Z for all shifts.
(X >> Z) | (Y >> Z) -> (X|Y) >> Z for all shifts.
(X >> Z) ^ (Y >> Z) -> (X^Y) >> Z for all shifts.
These patterns were previously handled separately in visitAnd()/visitOr()/visitXor().
Differential Revision: http://reviews.llvm.org/D4951
llvm-svn: 216443
Correctness proof of the transform using CVC3-
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR(A | B, BVXOR(A,B) ) = A & B;
$ cvc3 t.cvc
Valid.
llvm-svn: 215524
While we can already transform A | (A ^ B) into A | B, things get bad
once we have (A ^ B) | (A ^ B ^ Cst) because reassociation will morph
this into (A ^ B) | ((A ^ Cst) ^ B). Our existing patterns fail once
this happens.
To fix this, we add a new pattern which looks through the tree of xor
binary operators to see that, in fact, there exists a redundant xor
operation.
What follows bellow is a correctness proof of the transform using CVC3.
$ cat t.cvc
A, B, C : BITVECTOR(64);
QUERY BVXOR(A, B) | BVXOR(BVXOR(B, C), A) = BVXOR(A, B) | C;
QUERY BVXOR(BVXOR(A, C), B) | BVXOR(A, B) = BVXOR(A, B) | C;
QUERY BVXOR(A, B) & BVXOR(BVXOR(B, C), A) = BVXOR(A, B) & ~C;
QUERY BVXOR(BVXOR(A, C), B) & BVXOR(A, B) = BVXOR(A, B) & ~C;
$ cvc3 < t.cvc
Valid.
Valid.
Valid.
Valid.
llvm-svn: 214342
"((~A & B) | A) -> (A | B)" and "((A & B) | ~A) -> (~A | B)"
Original Patch credit to Ankit Jain !!
Differential Revision: http://reviews.llvm.org/D4591
llvm-svn: 213676