For the scalar/splat case, this fold is subsumed by
foldLogOpOfMaskedICmps(). However, the conjugated fold for "or"
also supports splats with undef. Make both code paths consistent
by using m_ZeroInt() for the "and" implementation as well.
https://alive2.llvm.org/ce/z/tN63cuhttps://alive2.llvm.org/ce/z/ufB_Ue
When folding and/or of icmps, look through add of a constant and
adjust the icmp range instead. Effectively, this decomposes
X + C1 < C2 style range checks back into a normal range. This allows
us to fold comparisons involving two range checks or one range check
and some other condition. We had a fold for a really specific case
of this (or of range check and eq, and only one one side!) while
this handles it in fully generality.
Differential Revision: https://reviews.llvm.org/D113510
Since there is just a single check for LHS in ~(A | B) & C | ...
transforms and multiple RHS checks inside with more coming I am
removing m_OneUse checks for LHS and adding new checks for RHS.
This is non essential as long as there is total benefit.
In addition (~(A | B) & C) | (~(A | C) & B) --> (B ^ C) & ~A
checks were overly restrictive, it should be good without any
additional checks.
Differential Revision: https://reviews.llvm.org/D113141
A problem was noted in the post-commit review for
c36b7e21bd / D113035 :
If the source type is not integer or integer vector,
then we could crash when trying to ComputeNumSignBits().
The implementation for and/or is the same, apart from the choice
of exactIntersectWith() vs exactUnionWith(). Extract a common
function to make future extension easier.
Rather than testing for many specific combinations of predicates
and values, compute the exact icmp regions for both comparisons
and check whether they union/intersect exactly. If they do,
construct the equivalent icmp for the new range. Assuming that the
existing code handled all possible cases, this should be NFC.
Differential Revision: https://reviews.llvm.org/D113367
(Cond & C) | (~bitcast(Cond) & D) --> bitcast (select Cond, (bc C), (bc D))
This is part of fixing:
https://llvm.org/PR34047
That report shows a case where a bitcast is sitting between the select condition
candidate and its 'not' value due to current cast canonicalization rules.
There's a bitcast type restriction that might be violated in existing matching,
but I still need to investigate if that is possible -
Alive2 shows we can only do this transform safely when the bitcast is from
narrow to wide vector elements (otherwise poison could leak into elements
that were safe in the original code):
https://alive2.llvm.org/ce/z/Hf66qh
Differential Revision: https://reviews.llvm.org/D113035
The added test has poison lanes due to the vector shuffle. This can
cause an infinite loop of combines in instcombine where it folds
xor(ashr, -1) -> select (icmp slt 0), -1, 0 -> sext (icmp slt 0) -> xor(ashr, -1).
We usually prevent this by checking that the xor constant is not -1,
but with vectors some of the lanes may be -1, some may be poison. So
this changes the way we detect that from "!C1->isAllOnesValue()" to
"!match(C1, m_AllOnes())", which is more able to detect that some of the
lanes are poison.
Fixes PR52397
The tests diffs are logically equivalent, and so this is
generally NFC, but this makes the code match the code
comment.
It should also be more efficient. If we choose the 'not'
operand (rather than the 'not' instruction) as the select
condition, then we don't have to invert the select
condition/operands as a subsequent transform.
Similar to 54e969cffd (and with cosmetic updates to hopefully
make that easier to read), this fold has been around since early
in LLVM history.
Intermediate folds have been added subsequently, so extra uses
are required to exercise this code.
The test example actually shows an unintended consequence with
extra uses - we end up with an extra instruction compared to what
we started with. But this at least makes scalar/vector consistent.
General proof:
https://alive2.llvm.org/ce/z/tmuBza
This fold was added long ago (part of fixing PR4216),
and it matched scalars only. Intermediate folds have
been added subsequently, so extra uses are required
to exercise this code.
General proof:
https://alive2.llvm.org/ce/z/G6BBhB
One of the specific tests:
https://alive2.llvm.org/ce/z/t0JhEB
The sequence of instructions `xor (ashr X, BW-1), C` (or with a truncation
`xor (trunc (ashr X, BW-1)), C)` takes a value, produces all zeros or all
ones and with it optionally inverts a constant depending on whether the
original input was positive or negative. This is the same as checking if
the value is positive, and selecting between the constant and ~constant.
https://alive2.llvm.org/ce/z/NJ85qY
This is a fairly general version of a fold that helps pull saturating
arithmetic into a canonical form.
Differential Revision: https://reviews.llvm.org/D109151
Fixes non-determinisctic order of XOR instructions created after
5a7a458306. The order of call argument evaluation is not
defined, so create one Value before the call.
This removes an over-specified fold. The more general transform
was added with:
727e642e97
There's a difference on an existing test that shows a potentially
unnecessary use limit on an icmp fold.
That fold is in InstCombinerImpl::foldICmpSubConstant(), and IIRC
there was some back-and-forth on it and similar folds because they
could cause analysis/passes (SCEV, LSR?) to miss optimizations.
Differential Revision: https://reviews.llvm.org/D111410
(iN X s>> (N-1)) & Y --> (X < 0) ? Y : 0
https://alive2.llvm.org/ce/z/qeYhdz
I was looking at a missing abs() transform and found my way to this
generalization of an existing fold that was added with D67799.
As discussed in that review, we want to make sure codegen handles
this difference well, and for all of the targets/types that I
spot-checked, it looks good.
I am leaving the existing fold in place in this commit because
it covers a potentially missing icmp fold, but I plan to remove
that as a follow-up commit as suggested during review.
Differential Revision: https://reviews.llvm.org/D111410
This is NFC-intended for scalar code. There are still unnecessary
m_ConstantInt restrictions in surrounding code, so this is not a
complete fix.
This prevents regressions seen with a planned follow-on to D111410.
This removes repeated calls to m_Not, so hopefully a little
more efficient.
Also, we may need to enhance some of these blocks to allow
logical and/or (select of bools).
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.
Differential Revision: https://reviews.llvm.org/D110807
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
This makes the intrinsic logic match the cmp+select idiom folds
just below. It's not clearly a win either way unless we think
that a 'not' op costs more than min/max.
The cmp+select folds on these patterns are more extensive than
the intrinsics currently and may have some complicated interactions,
so I'm trying to make those line up and bring the optimizations
for intrinsics up to parity.
Currently we only match bswap intrinsics from or(shl(),lshr()) style patterns when we could often match bitreverse intrinsics almost as cheaply.
Differential Revision: https://reviews.llvm.org/D90170
If a logical and/or is used, we need to be careful not to propagate
a potential poison value from the RHS by inserting a freeze
instruction. Otherwise it works the same way as bitwise and/or.
This is intended to address the regression reported at
https://reviews.llvm.org/D101191#2751002.
Differential Revision: https://reviews.llvm.org/D102279
We can not rely on (C+X)-->(X+C) already happening,
because we might not have visited that `add` yet.
The added testcase would get stuck in an endless combine loop.
Remove the requirement that the instruction is a BinaryOperator,
make the predicate check more compact and use slightly more
meaningful naming for the and operands.
Let's say you represent (i32, i32) as an i64 from which the parts
are extracted with lshr/trunc. Then, if you compare two tuples by
parts you get something like A[0] == B[0] && A[1] == B[1], just
that the part extraction happens by lshr/trunc and not a narrow
load or similar.
The fold implemented here reduces such equality comparisons by
converting them into a comparison on a larger part of the integer
(which might be the whole integer). It handles both the "and of eq"
and the conjugated "or of ne" case.
I'm being conservative with one-use for now, though this could be
relaxed if profitable (the base pattern converts 11 instructions
into 5 instructions, but there's quite a few variations on how it
can play out).
Differential Revision: https://reviews.llvm.org/D101232
The swap of the operands can affect later transforms that
are expecting a constant as operand 1. I don't think we
can trigger a bug with the current code, but I hit that
problem while drafting a new transform for min/max intrinsics.
If we have a recurrence of the form <Start, Or, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.
Differential Revision: https://reviews.llvm.org/D97578 (or part)
If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.
Differential Revision: https://reviews.llvm.org/D97578 (and part)
recognizeBSwapOrBitReverseIdiom + collectBitParts have pattern matching to bail out early if a bswap/bitreverse pattern isn't possible - we should be able to rely on this instead without any notable change in compile time.
This is part of a cleanup towards letting matchBSwapOrBitReverse /recognizeBSwapOrBitReverseIdiom use 'root' instructions that aren't ORs (FSHL/FSHRs in particular which can be prematurely created).
Differential Revision: https://reviews.llvm.org/D97056