This is an intrinsic version of the existing fold for binops.
As a first step, I only allowed min/max, but the code is set
up to make adding more intrinsics easy (with more or fewer than
2 arguments).
This and possible follow-ups are discussed in issue #46238.
A generalization like this was suggested in D119754.
This is the inverse direction of D119851,
and we get all of the folds there plus the one that was missed.
There is precedent for this kind of transform in instcombine
with "or" instructions (but strangely only with that one opcode AFAICT).
Similar justification as in the other patch:
The line between instcombine and reassociate for these kinds of folds
is blurry. This doesn't appear to have much cost and gives us the
expected wins from repeated folds as seen in the last set of test diffs.
Differential Revision: https://reviews.llvm.org/D119955
Integer min/max operations are associative:
max (max X, C0), C1 --> max X, (max C0, C1) --> max X, NewC
https://alive2.llvm.org/ce/z/wW5HVM
This would avoid a regression when we canonicalize to min/max intrinsics
(see D98152).
Differential Revision: https://reviews.llvm.org/D119754
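The associativity claim behind the constant fold above can also be checked by brute force outside of Alive2. This is only a minimal sketch: the 5-bit width, the helper names, and modeling values as plain bit patterns are assumptions made for illustration, not part of the patch.

```python
from itertools import product

BITS = 5  # assumed small width so the exhaustive check stays fast
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_signed(v):
    """Interpret a BITS-wide bit pattern as a two's-complement integer."""
    return v - (1 << BITS) if v & SIGN else v


def umax(a, b):
    return max(a, b)


def umin(a, b):
    return min(a, b)


def smax(a, b):
    return a if to_signed(a) >= to_signed(b) else b


def smin(a, b):
    return a if to_signed(a) <= to_signed(b) else b


def fold_is_sound(op):
    """Check op(op(x, c0), c1) == op(x, op(c0, c1)) for every bit pattern."""
    values = range(MASK + 1)
    return all(
        op(op(x, c0), c1) == op(x, op(c0, c1))
        for x, c0, c1 in product(values, repeat=3)
    )
```

Calling fold_is_sound on each of umax, umin, smax and smin exercises all four intrinsics; the inner op(c0, c1) is the NewC that the fold computes at compile time.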
We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.
This patch is modeled on a similar construct that was added with
D59386.
I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).
InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.
Fixes #52884
Differential Revision: https://reviews.llvm.org/D116322
umax(X, Op1) - Op1 --> usub.sat(X, Op1)
https://alive2.llvm.org/ce/z/HpcGiJ
This happens in 2 or more steps with an icmp-select idiom
instead of an intrinsic. This is another step towards
canonicalization of the min/max intrinsics. See:
D98152
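The umax/usub.sat equivalence above is easy to confirm exhaustively at a fixed width. The sketch below assumes 8-bit values and uses hypothetical helper names; it models the wrapping 'sub' with a mask and is not taken from the patch itself.

```python
BITS = 8  # assumed width for the exhaustive check
MASK = (1 << BITS) - 1


def umax(a, b):
    return max(a, b)


def sub_wrapping(a, b):
    """Two's-complement subtraction truncated to BITS bits."""
    return (a - b) & MASK


def usub_sat(a, b):
    """Unsigned saturating subtraction: clamps at zero instead of wrapping."""
    return a - b if a >= b else 0


def umax_sub_matches_usub_sat():
    """Check umax(x, y) - y == usub.sat(x, y) for every pair of values."""
    values = range(MASK + 1)
    return all(
        sub_wrapping(umax(x, y), y) == usub_sat(x, y)
        for x in values
        for y in values
    )
```

The two cases line up directly: when x >= y the umax picks x and both sides are x - y, otherwise the umax picks y and both sides are 0.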
This is another regression noted with the proposal to canonicalize
to the min/max intrinsics in D98152.
Here are Alive2 attempts to show correctness without specifying
exact constants:
https://alive2.llvm.org/ce/z/bvfCwh (smax)
https://alive2.llvm.org/ce/z/of7eqy (smin)
https://alive2.llvm.org/ce/z/2Xtxoh (umax)
https://alive2.llvm.org/ce/z/Rm4Ad8 (umin)
(if you comment out the assume and/or no-wrap, you should see failures)
The different output for the umin test is due to a fold added with
c4fc2cb5b2:
// umin(x, 1) == zext(x != 0)
We probably want to adjust that, so it applies more generally
(umax --> sext or patterns where we can fold to select-of-constants).
Some folds that were ok when starting with cmp+select may increase
instruction count for the equivalent intrinsic, so we have to decide
if it's worth altering a min/max.
Differential Revision: https://reviews.llvm.org/D110038
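For reference, the umin(x, 1) fold quoted in the comment above can be stated as a tiny exhaustive check. The width and function names below are assumptions for illustration only.

```python
def umin(a, b):
    return min(a, b)


def zext_of_nonzero(x):
    """Model zext(x != 0): the i1 compare result widened to 0 or 1."""
    return 1 if x != 0 else 0


def umin_one_matches_zext(bits=8):
    """Check umin(x, 1) == zext(x != 0) for every unsigned value of the width."""
    return all(umin(x, 1) == zext_of_nonzero(x) for x in range(1 << bits))
```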
This is a translation of the existing code to handle the intrinsics
and another step towards D98152.
https://alive2.llvm.org/ce/z/jA7eBC
This pattern is already handled by underlying folds if there are
fewer uses, so the minimal tests in this case have extra uses.
The larger cmyk tests show the motivation - when combined with
other folds, we invert a larger sequence and eliminate 'not' ops.
isFreeToInvert allows min/max with 'not' on both operands,
so easing the argument restriction catches the case where
that operand has one use.
We already handle the sub-patterns when there are fewer uses:
https://alive2.llvm.org/ce/z/8Jatm_
...but this is another step towards parity with the
equivalent icmp+select idioms (D98152).
Differential Revision: https://reviews.llvm.org/D109059
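The reason a min/max with 'not' on both operands is free to invert is the identity ~max(~x, ~y) == min(x, y), which holds because bitwise 'not' reverses both the signed and the unsigned order. A minimal sketch of that identity, assuming 8-bit values and hypothetical helper names (this is not the isFreeToInvert code):

```python
BITS = 8  # assumed width for the exhaustive check
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_signed(v):
    """Interpret a BITS-wide bit pattern as a two's-complement integer."""
    return v - (1 << BITS) if v & SIGN else v


def bit_not(v):
    return ~v & MASK


def umax(a, b):
    return max(a, b)


def umin(a, b):
    return min(a, b)


def smax(a, b):
    return a if to_signed(a) >= to_signed(b) else b


def smin(a, b):
    return a if to_signed(a) <= to_signed(b) else b


def inverted_minmax_holds():
    """Check not(op(not x, not y)) == inverse_op(x, y) for all four intrinsics."""
    pairs = ((umax, umin), (umin, umax), (smax, smin), (smin, smax))
    values = range(MASK + 1)
    return all(
        bit_not(op(bit_not(x), bit_not(y))) == inverse(x, y)
        for op, inverse in pairs
        for x in values
        for y in values
    )
```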
This mimics the code for the corresponding cmp-select idiom.
This also prevents an infinite loop because isFreeToInvert
does not match constant expressions.
So this patch solves the same problem as D108814 and obsoletes
it, but my main motivation is to enhance the pattern matching
to allow more invertible ops. That change will be a follow-up
patch on top of this one.
Differential Revision: https://reviews.llvm.org/D109058
This is a re-try of 3aa009cc87 which was reverted at
9577fac0fd because it caused an infinite loop.
For the extra test case, either re-ordering the transforms
or adding the extra clause to avoid sub-of-sub is enough
to prevent the infinite compile, but I'm doing both to be
safer.
Original commit message:
The motivation was to get min/max intrinsics to parity
with cmp+select idioms, but this unlocks a few more
folds because isFreeToInvert recognizes add/sub with
constants too.
In the min/max example, we have too many extra uses
for smaller folds to improve things, but this fold
is able to eliminate uses even though we can't reduce
the number of instructions.
This makes the intrinsic logic match the cmp+select idiom folds
just below. It's not clearly a win either way unless we think
that a 'not' op costs more than min/max.
The cmp+select folds on these patterns are more extensive than
the intrinsics currently and may have some complicated interactions,
so I'm trying to make those line up and bring the optimizations
for intrinsics up to parity.
If both operands are negated, we can invert the min/max and do
the negation after:
smax (neg nsw X), (neg nsw Y) --> neg nsw (smin X, Y)
smin (neg nsw X), (neg nsw Y) --> neg nsw (smax X, Y)
This is visible as a remaining regression in D98152. I don't see
a way to generalize this for 'unsigned' or adapt Negator to
handle it. This only appears to be safe with 'nsw':
https://alive2.llvm.org/ce/z/GUy1zJ
Differential Revision: https://reviews.llvm.org/D108165
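The dependence on 'nsw' in the negation fold above can be illustrated with a small model. This is a sketch under stated assumptions: 8-bit signed values, None as a stand-in for poison, and hypothetical helper names. It checks that the folded form agrees with the original whenever the original is not poison, and it searches for a mismatch when the negation is allowed to wrap.

```python
BITS = 8  # assumed width for the exhaustive check
INT_MIN = -(1 << (BITS - 1))
INT_MAX = (1 << (BITS - 1)) - 1
POISON = None  # stand-in for an LLVM poison value


def neg_nsw(x):
    """Negation with nsw: negating INT_MIN overflows, so the result is poison."""
    if x is POISON or x == INT_MIN:
        return POISON
    return -x


def neg_wrapping(x):
    """Negation without nsw: INT_MIN wraps back to itself."""
    return INT_MIN if x == INT_MIN else -x


def smax(a, b):
    return POISON if a is POISON or b is POISON else max(a, b)


def smin(a, b):
    return POISON if a is POISON or b is POISON else min(a, b)


def nsw_fold_refines():
    """The folded form must match the original whenever the original is not poison."""
    values = range(INT_MIN, INT_MAX + 1)
    for op, inverse in ((smax, smin), (smin, smax)):
        for x in values:
            for y in values:
                source = op(neg_nsw(x), neg_nsw(y))
                target = neg_nsw(inverse(x, y))
                if source is not POISON and source != target:
                    return False
    return True


def wrapping_counterexample():
    """Return the first (x, y) where the smax fold breaks with wrapping negation."""
    values = range(INT_MIN, INT_MAX + 1)
    for x in values:
        for y in values:
            if max(neg_wrapping(x), neg_wrapping(y)) != neg_wrapping(min(x, y)):
                return (x, y)
    return None
```

In this model the only failures without 'nsw' involve INT_MIN, which is exactly the input that 'nsw' turns into poison.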
This is a re-try of 6de1dbbd09 which was reverted because
it missed a null check. Extra test for that failure added.
Original commit message:
This is an adaptation of D41603 and another step on the way
to canonicalizing to the intrinsic forms of min/max.
See D98152 for status.
This is a direct translation of the select folds added with
D53033 / D53036 and another step towards canonicalization
using the intrinsics (see D98152).
We already implemented this for the select form, but the intrinsic
form was missing. Note that this doesn't change poison behavior as
1 is non-poison, and the optimized form is still poison exactly
when x is.
As suggested in the review thread for 5094e12 and seen in the
motivating example from https://llvm.org/PR49885, it's not
clear if we have a way to create the optimal code without
this heuristic.
This is another step towards parity between existing select
transforms and min/max intrinsics (D98152).
The existing 'not' folds around select are complicated, so
it's likely that we will need to enhance this, but this
should be a safe step.