When the mask is a power-of-2 constant and op0 is a shifted-power-of-2
constant, test if the shift amount equals the offset bit index:
(ShiftC << X) & C --> X == (log2(C) - log2(ShiftC)) ? C : 0
(ShiftC >> X) & C --> X == (log2(ShiftC) - log2(C)) ? C : 0
This is an alternate to D127610 with a more general pattern.
We match only shift+and instead of the trailing xor, so we see a few
more tests diffs. I think we discussed this initially in D126617.
Here are proofs for shifts in both directions:
https://alive2.llvm.org/ce/z/CFrLs4
The test diffs look equal or better for IR, and this makes the
patterns more uniform in IR. The backend can partially invert this
in both cases if that is profitable. It is not trivially reversible,
however, so if we find perf regressions that are not easy to undo,
then we may want to revert this.
Differential Revision: https://reviews.llvm.org/D127801
We really want to push freezes through recurrence phis, so that we
freeze only the start value, rather than the IV value on every
iteration. foldOpIntoPhi() already handles this for the case where
the transfer function doesn't produce poison, e.g.
%iv.next = add %iv, 1. However, this does not work if nowrap flags
are present, e.g. the very common %iv.next = add nuw %iv, 1 case.
This patch adds a fold that pushes freeze instructions to the start
value by checking whether all backedge values will be non-poison
after poison generating flags have been dropped. This allows pushing
freezes out of loops in most cases. I suspect that this also
obsoletes the CanonicalizeFreezeInLoops pass, and we can probably
drop it.
Fixes https://github.com/llvm/llvm-project/issues/56048.
Differential Revision: https://reviews.llvm.org/D127960
If an integer PHI has an illegal type (according to the data layout) and
it is only used by `trunc` or `trunc(lshr)` operations, we split the PHI
into various instructions in its predecessors:
6d1543a167/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp (L1536-L1543)
So this can produce code like the following:
Before:
```
pred:
...
bb:
%p = phi i8 [ %somevalue, %pred ], ...
...
%tobool = trunc i8 %p to i1
use %tobool
...
```
In this code, `%p` has an illegal integer type, `i8`, and its only used
in a `trunc` instruction later. In this case this pass puts extraction
code in its predecessors:
After:
```
pred:
...
%t = and i8 %somevalue, 1
%extract = icmp ne i8 %t, 0
bb:
%p.new = phi i1 [ %extract, %pred ], ...
use %p.new instead of %tobool
```
But this doesn't work if `pred` is a `catchswitch` BB because it cannot
have any non-PHI instructions. This CL ensures we bail out in that case.
Fixes https://github.com/llvm/llvm-project/issues/55803.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D127699
This shows narrowing improvements on the logic tests
(transforms recently added with e247b0e5c9).
This is not a complete fix. That would require adding
folds to visitOr/visitXor. But it enables the expected
transforms for the basic patterns in the affected tests.
When pushing an operation across a phi node, we should avoid doing
so across a loop backedge. This is generally non-profitable, because
it does not reduce the number of times the operation is executed,
and could lead to an infinite combine loop.
The code was already guarding against this, but using an
insufficiently strong condition, which did not cover the case where
the operation was originally outside the loop (in which case the
transform moves the operation from outside the loop into the loop,
which is particularly undesirable).
Differential Revision: https://reviews.llvm.org/D127499
The 1st try ( afa192cfb6 ) was reverted because it could
cause an infinite loop with constant expressions.
A test for that and an extra condition to enable the transform
are added now. I also added code comments to better describe
the transform and the existing, related transform.
Original commit message:
https://alive2.llvm.org/ce/z/hRy3rE
As shown in D123408, we can produce this pattern when moving
casts around, and we already have a related fold for a binop
with a constant operand.
In foldSelectIntoOp we sometimes transform a select of a fadd into a
fadd of a select, where we select between data and an identity value.
For both fadd and fsub the identity is always -0.0, but if the nsz
flag is set on the select instruction we can use +0.0 instead. Doing
so then triggers other optimisations, such as when folding the select
of masked load into a new masked load.
Differential Revision: https://reviews.llvm.org/D126774
https://alive2.llvm.org/ce/z/hRy3rE
As shown in D123408, we can produce this pattern when moving
cast around, and we already have a related fold for a binop
with a constant operand.
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
The patch simplifies some of the patterns as below
(A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
((B & C0) | A) | (B & C1) -> (B & C0|C1) | A
In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.
Additionally this commit fixes the issue reported with the test case.
int f(int a, int b) {
int c = ((unsigned char)(a >> 23) & 925);
if (a)
c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
return c;
}
The previous revision/commit did not check one-use of an intermediate value that this transform re-uses.
When that value has another use, an existing transform will try to invert the transform here.
By adding one-use checks, we avoid the infinite loops seen with the earlier commit.
Differential Revision: https://reviews.llvm.org/D124119
Existing condition for
fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1
missed some boundary. It cause this fold don't work for some cases, and the
reason is due to signed number overflow.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D127188
InstCombine tries to rewrite
%prod = mul nsw i64 %X, Scale
%acc = add nsw i64 %prod, Offset
%0 = alloca i8, i64 %acc, align 4
%1 = bitcast i8* %0 to i32*
Use ( %1 )
into
%prod = mul nsw i64 %X, Scale/4
%acc = add nsw i64 %prod, Offset/4
%0 = alloca i32, i64 %acc, align 4
Use (%0)
But it assumes Scale is unsigned, and performs an unsigned division.
So we should bail out if Scale cannot be interpreted as an unsigned safely.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D126546
If we don't demand low bits and it is valid to pre-shift a constant:
(C2 >> X) << C1 --> (C2 << C1) >> X
https://alive2.llvm.org/ce/z/_UzTMP
This is the reverse-order shift sibling to 82040d414b ( D127122 ).
It seems likely that we would want to add this to the SDAG version of
the code too to keep it on par with IR.
If we don't demand high bits (zeros) and it is valid to pre-shift a constant:
(C2 << X) >> C1 --> (C2 >> C1) << X
https://alive2.llvm.org/ce/z/P3dWDW
There are a variety of related patterns, but I haven't found a single solution
that gets all of the motivating examples - so pulling this piece out of
D126617 along with more tests.
We should also handle the case where we shift-right followed by shift-left,
but I'll make that a follow-on patch assuming this one is ok. It seems likely
that we would want to add this to the SDAG version of the code too to keep it
on par with IR.
Differential Revision: https://reviews.llvm.org/D127122
https://alive2.llvm.org/ce/z/o7rQ5q
This shows an extra instruction in some cases, but that is
caused by an existing canonicalization of trunc -> and+icmp.
Codegen should be better for any target where a multiply is
more costly than the most simple ALU op.
This ends up producing the requested x86 asm from issue #55618,
but it's not the same IR. We are missing a canonicalization
from the negate+mask pattern to the trunc+select created here.
We could go either way on this and several similar matches.
Just matching as a binop is possibly slightly more efficient;
we don't need to re-confirm the opcode of the instruction.
This reverts commit ec4adf1f6c. The commit causes
clang to hang on a certain input:
```
$ cat q.cc
int f(int a, int b) {
int c = ((unsigned char)(a >> 23) & 925);
if (a)
c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
return c;
}
$ time ./clang-15-10515 --target=x86_64--linux-gnu -O1 -c q.cc
^C
real 0m45.072s
user 0m0.025s
sys 0m0.099s
```
X <=u (sext i1 Y) --> (X == 0) | Y
https://alive2.llvm.org/ce/z/W_tZzo
This is the conjugate/sibling pattern suggested with D126171
for a sign-extended bool value.
When reassociating GEPs, we can only keep inbounds if both original
GEPs were inbounds, and their offsets have the same sign. For the
sake of simplicity, I only handle the case where both offsets are
non-negative here.
It would probably be fine to just not preserve inbounds at all here,
but as I don't see a compile-time impact for adding the
isKnownNonNegative() calls I went with this more conservative
approach.
Fixes https://github.com/llvm/llvm-project/issues/44206.
Differential Revision: https://reviews.llvm.org/D126687
Even if the total offset is inbounds, we might represent it by first
performing a large negative offset and then a small positive one.
With inbounds semantics as currently specified, each offset must
be inbounds individually, not just the overall offset of the GEP.
Fix this by checking that the sign of all offsets is the same.
Fixes https://github.com/llvm/llvm-project/issues/55722.
(C2 >> X) >> C1 --> (C2 >> C1) >> X
The shift-left form of this transform has existed since:
16f18ed7b5
...but it applies to matching shift right opcodes too:
https://alive2.llvm.org/ce/z/c5eQms
The restriction goes back to:
16f18ed7b5
...but the fold only replaces a shift with a shift, so that's not necessary.
Generalizing to other opcodes is planned as a follow-up.
If only one of the GEPs is inbounds, then after swapping, there is
no guarantee that one of them will be inbounds as well
(see e.g. https://alive2.llvm.org/ce/z/agaCnp).
This is only a partial fix, because even if both are inbounds, the
result is not necessarily inbounds (if the offsets have different
signs).
As the long explanatory comment attests, performing the modification
in place is pretty tricky. Drop this unnecessary complexity and
always create new instructions.
This should be NFC-ish, but can probably cause difference due to
worklist order.
(ashr i32 X, 31) * C --> (X < 0) ? -C : 0
https://alive2.llvm.org/ce/z/G8u9SS
With a constant operand, this is an improvement in IR
and codegen (where it can be converted to a mask op).
Without a constant operand, we would have to negate
the operand, so that is probably better left to the backend.
This is similar but not the same optimization that is requested
in #55618.
This is effectively NFC (intentionally no test diffs)
because we already have the related fold that converts
the 'and' pattern to select. So this is just an efficiency
improvement.
This extends the fold from D126410 / 3952c905ef
to allow for the only case where it works with signed
division:
https://alive2.llvm.org/ce/z/k7_ypu
(X s/ Y) == SMIN --> (X == SMIN) && (Y == 1)
(X s/ Y) != SMIN --> (X != SMIN) || (Y != 1)
This is another improvement based on #55695.
With large compare constant:
(X u/ Y) == C --> (X == C) && (Y == 1)
(X u/ Y) != C --> (X != C) || (Y != 1)
https://alive2.llvm.org/ce/z/EhKwh6
There are various potential missing icmp (div) transforms shown here:
https://github.com/llvm/llvm-project/issues/55695
This is a generalization for part of the udiv + equality.
I didn't check in detail, but some of those may only make sense as
codegen transforms.
This results in one extra instruction in IR, but it is better for
analysis, and looks much better in codegen on all targets that I tried.
Differential Revision: https://reviews.llvm.org/D126410
This patch break foldBitCastBitwiseLogic limite the destination
must have an integer element type, and eliminate one bitcast by
doing the logic op in the type of the input that has an integer
element type.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D126184
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)
This extends the transform added with 0353c2c996.
If the shuffle reduces vector length, the transform
reduces the width of the cast, so that should be a
win for most codegen (if not, it can be inverted).
Bitcasts were stripped in one case, but not the other. Of course,
this no longer really matters with opaque pointers, but as I went
through the trouble of tracking this down, we may as well remove
one typed vs opaque pointer optimization discrepancy.
Use IRBuilder so that the newly created freeze instructions
automatically gets inserted back into the IC worklist.
The changed worklist processing order leads to some cosmetic
differences in tests.
Fixes https://github.com/llvm/llvm-project/issues/55619.
Most of the folds implemented in this function work fine with
logical operations. We only need to be careful for the cases that
work on non-constant masks, where the RHS operand shouldn't be
poison.
This is a conservative implementation that bails out of illegal
transforms, but we could also change these to insert freeze instead.
X <u (zext i1 Y) --> (X == 0) && Y
https://alive2.llvm.org/ce/z/avQDRY
This is a generalization of 4069cccf3b based on the post-commit suggestion.
This also adds the i1 type check and tests that were missing from the earlier
attempt; that commit caused several bot fails and was reverted.
Differential Revision: https://reviews.llvm.org/D126171
Similarly to a change recently done for fcmps, add a flag that
indicates whether the and/or is logical to foldAndOrOfICmps, and
reuse the function when folding logical and/or.
We were already calling some parts of it, but this gives us a
clearer indication of which parts may need poison-safe variants,
and would also allow to fold combinations of bitwise and logical
and/or.
This change should be close to NFC, because all folds this enables
were either already called previously, or can make use of implied
poison reasoning.
This is the specific pattern seen in #53432, but it can be extended
in multiple ways:
1. The 'zext' could be an 'and'
2. The 'sub' could be some other binop with a similar ==0 property (udiv).
There might be some way to generalize using knownbits, but that
would require checking that the 'bool' value is created with
some instruction that can be replaced with new icmp+logic.
https://alive2.llvm.org/ce/z/-KCfpa
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
Differential Revision: https://reviews.llvm.org/D125557
In this patch we add a function foldICmpInstWithConstantAllowUndef
to fold integer comparisons with a constant operand: icmp Pred X, C
where X is some kind of instruction and C is AllowUndef.
We move this fold to the new function, so that it can solve undef elts in a vector.
Reviewed By: spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D125220
When shifting by a byte-multiple:
bswap (shl X, Y) --> lshr (bswap X), Y
bswap (lshr X, Y) --> shl (bswap X), Y
This was limited to constants as a first step in D122010 / 60820e53ec ,
but issue #55327 shows a source example (and there's a test based on that here)
where a variable shift amount is used in this pattern.
We could do better by inserting a bitcast from scalar int
to vector int or using an insertelement (the alternate test
does not crash because there's an independent fold like that).
But this doesn't seem like a likely pattern, so just bail out
for now.
Fixes issue #55516.
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)
This extends the transform added with 0353c2c996.
If the casts are to a larger element type, the transform
reduces shuffle bit width, so that should be a win for
most codegen (if not, it can be inverted).
The transform was wrong in 3 ways:
1. It created an extra instruction when the source and dest types don't match.
2. It did not account for an extra use of the icmp, so could create 2 extra insts.
3. It favored bit hacks over icmp (icmp generally has better analysis).
This fixes#54692 (modeled by the PhaseOrdering tests).
This is a minimal step to fix the bug, but we should likely invert
this and the sibling transform for the "is negative" pattern too.
The backend should be able to invert this back to a shift if that
leads to better codegen.
This is a reduced try of 3794cc0e99 - that was reverted because
it could cause infinite loops by conflicting with the related
transforms in this block that create shifts.
Checking whether two KnownBits are the same is somewhat common,
mainly in test code.
I don't think there is a lot of room for confusion with "determine
what the KnownBits for an icmp eq would be", as that has a
different result type (this is what the eq() method implements,
which returns Optional<bool>).
Differential Revision: https://reviews.llvm.org/D125692
The existing transform was wrong in 3 ways:
1. It created an extra instruction when the source and dest types don't match.
2. It did not account for an extra use of the icmp, so could create 2 extra insts.
3. It favored bit hacks over icmp (icmp generally has better analysis).
This fixes#54692 (modeled by the PhaseOrdering tests).
This is a minimal step to fix the bug, but we should likely invert
the sibling transform for the "is negative" pattern too.
The backend should be able to invert this back to a shift if that
leads to better codegen.
The patch simplifies some of the patterns as below
(A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
((B & C0) | A) | (B & C1) -> (B & C0|C1) | A
In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.
Differential Revision: https://reviews.llvm.org/D124119
There is a long function foldICmpInstWithConstant,
we can separate a function foldICmpBinOpWithConstant from it.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D125457
We commonly want to create either an inbounds or non-inbounds GEP
based on a boolean value, e.g. when preserving inbounds from
existing GEPs. Directly accept such a boolean in the API, rather
than requiring a ternary between CreateGEP and CreateInBoundsGEP.
This change is not entirely NFC, because we now preserve an
inbounds flag in a constant expression edge-case in InstCombine.
This patch fix bug left in D124503. We should do
sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Z,Y)) instead of
sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Y,Z)).
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D125352
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.
This is similar to recent changes for urem and sdiv:
d428f09b2c99ef341ce9
There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
Presumably, in real cases that are similar to the tests
where a subsequent transform removes the rem, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.
This is similar to a recent change for urem:
d428f09b2c
There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
Presumably, in real cases that are similar to the tests
where a subsequent transform removes the select, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.
There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
If there is a freeze %x, we currently replace all other uses of %x
with freeze %x -- as long as they are dominated by the freeze
instruction. This patch extends this behavior to cases where we
did not originally dominate the use by moving the freeze
instruction directly after the definition of the frozen value.
The motivation can be seen in test @combine_and_after_freezing_uses:
Canonicalizing everything to freeze %x allows folds that are based
on value identity (i.e. same operand occurring in two places) to
trigger. This also covers the case from D125248.
Differential Revision: https://reviews.llvm.org/D125321
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)
This is similar to a recent transform with fneg ( b331a7ebc1 ),
but this is intentionally the most conservative first step to
try to avoid regressions in codegen. There are several
restrictions that could be removed as follow-up enhancements.
Note that a cast with a unary shuffle is currently canonicalized
in the other direction (shuffle after cast - D103038 ). We might
want to invert that to be consistent with this patch.
30a12f3f63 switched the type check
to use the GEP result type rather than the GEP operand type.
However, the GEP result types may match even if the operand types
don't, in case GEPs with scalar/vector base and vector index
are compared.
Fixes https://github.com/llvm/llvm-project/issues/55363.
As shown in https://github.com/llvm/llvm-project/issues/55150 -
the existing fold may be wrong when converting to a signed value.
This is a quick fix to avoid the miscompile.
I added tests/comments for all of the signed/unsigned combinations
at either side of the boundary width, and tried to confirm with Alive2:
https://alive2.llvm.org/ce/z/3p9DSu
There are already some TODO items in the test file that suggest
possible refinements, so the regression with ui->FP->si is probably ok.
It seems unlikely that we'd see these kind of edge cases with
non-byte-width integer types in real code. The potential miscompile
went undetected for several years.
This and 747c6a0c73fixes#55150.
Differential Revision: https://reviews.llvm.org/D124692
If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions executed in runtime. It happened because constrained
intrinsics usually have side effect, it is used to model the interaction
with floating-point environment. In some cases side effect is actually
absent or can be ignored.
This change adds specific treatment of constrained intrinsics so that
their side effect can be removed if it actually absents.
Differential Revision: https://reviews.llvm.org/D118426
Splatting a bit of constant-index across a value:
sext (ashr (trunc iN X to iM), M-1) to iN --> ashr (shl X, N-M), N-1
If the dest type is different, use a cast (adjust use check).
https://alive2.llvm.org/ce/z/acAan3
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D124590
For the unary shuffle pattern, this is opposite to what we try
to do with binops, but it seems better to keep it consistent
with the motivating binary shuffle pattern. On that, it is
clearly better on the usual no-extra uses case.
There is a chance that this will pull an fneg away from some
other binop and cause a regression in codegen, but that should
be invertible in the backend. The transform is birectional:
https://alive2.llvm.org/ce/z/kKaKCUhttps://alive2.llvm.org/ce/z/3DesfwFixes#45631
Try to push an icmp into a select even if the icmp operand isn't
constant - perform a generic SimplifyICmpInst instead.
This doesn't appear to impact compile-time much, and forming
logical and/or is generally profitable, as we have very good
support for them.
D113035 enhanced the matching of bitwise selects from vector types. This
change unfortunately introduced crashes as it tries to cast scalable
vector types to integers.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D124997
If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions executed in runtime. It happened because constrained
intrinsics usually have side effect, it is used to model the interaction
with floating-point environment. In some cases it is correct behavior
but often the side effect is actually absent or can be ignored.
This change adds specific treatment of constrained intrinsics so that
their side effect can be removed if it actually absents.
Differential Revision: https://reviews.llvm.org/D118426
This check is in the related fold for binops,
but it was missed when the code was adapted
for intrinsics in 432c199e84. The new test
would crash when trying to create a new
intrinsic with mismatched types.
This extends 432c199e84 and 9c4770eaab with an intrinsic
cited directly in issue #46238
Eventually, we will want to use llvm::isTriviallyVectorizable()
or create some new API for this list, but for now, I am intentionally
making a minimum change to reduce risk and only affect an intrinsic
with regression tests in place.
https://alive2.llvm.org/ce/z/sD-JVv
This extends 432c199e84 with a 3 arg intrinsic to demonstrate
that the code works with the extra operand.
Eventually, we will want to use llvm::isTriviallyVectorizable()
or create some new API for this list, but for now, I am intentionally
making a minimum change to reduce risk and only affect an intrinsic
with regression tests in place.
This is an intrinsic version of the existing fold for binops.
As a first step, I only allowed min/max, but the code is set
up to make adding more intrinsics easy (with more or less than
2 arguments).
This (and possible follow-ups) are discussed in issue #46238.
libcalls." (was 0f8c626). This reverts commit 14d9390.
The patch previously failed to recognize cases where user had defined a
function alias with an identical name as that of the library
function. Module::getFunction() would then return nullptr which is what the
sanitizer discovered.
In this updated version a new function isLibFuncEmittable() has as well been
introduced which is now used instead of TLI->has() anytime a library function
is to be emitted . It additionally also makes sure there is e.g. no function
alias with the same name in the module.
Reviewed By: Eli Friedman
Differential Revision: https://reviews.llvm.org/D123198
Normally the index type will already be canonicalized here, but
this is not guaranteed depending on visitation order. The code
was already accounting for a potentially needed sext, but a trunc
may also be needed.
Add a ConstantExpr::getSExtOrTrunc() helper method to make this
simpler. This matches the corresponding IRBuilder method in behavior.
Fixes https://github.com/llvm/llvm-project/issues/55228.
This patch removes an old hack in visitSelectInst that was written to avoid miscompilation bugs in loop unswitch.
(Added via https://reviews.llvm.org/D35811)
The legacy loop unswitch pass will be removed after D124376, and the new simple loop unswitch pass correctly uses freeze to avoid introducing UB after D124252.
Since the hack is not necessary anymore, this patch removes it.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D124426
This removes memset with undef char. We already do this for stores
of undef value.
This comes with the caveat that this optimization is not, strictly
speaking, legal for undef values, because we might be overwriting
a poison value. However, our entire load/store model currently still
operates on undef values, so we need to support undef here as well
for internal consistency.
Once https://github.com/llvm/llvm-project/issues/52930 is resolved,
these and related folds can be limited to poison -- I've added
FIXMEs to that effect.
Differential Revision: https://reviews.llvm.org/D124173
This is an edge-case where we don't convert to bitwise and/or based
on implies poison reasoning, so explicitly try to perform the fold
in logical form. The transform itself is poison-safe, as both icmps
are based on the same value and any nowrap flags are discarded as
part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used
example).
This fold handles a special subset of foldAndOrOfICmpsUsingRanges(),
use the more generic implementation instead.
The result can differ if a representation using a range comparison
is possible, in which case that is preferred over masking. There is
a canonicalization opportunity here.
This is the de Morgan conjugated variant of the existing fold for
ors. Implement this by switching the range code to always work
on ors and perform invert operands at the start and end. This makes
reasoning easier and makes the extension more obviosuly correct.
We can express this fold more naturally when working on the constant
range implementation. This change is not entirely NFC, because the
code now also handles cases that don't match the precise pattern
this previously looked for, e.g. we can omit an add on one of the
ranges.
Currently, two GEPs will only be combined if the result element
type of one is the same as the source element type of the other.
However, this means we may miss folding opportunities where the
second GEP could be rewritten using a different element type. This
is especially relevant for opaque pointers, where constant GEPs
often use i8 element type.
Address this by converting GEP indices to offsets, adding them,
and then converting them back to indices. The first (inner) GEP
is allowed to have variable indices as well, in which case only
the constant suffix is converted into an offset.
This should address the regression reported in
https://reviews.llvm.org/D123300#3467615.
Differential Revision: https://reviews.llvm.org/D124459
canonicalizeClampLike canonicalizes the ule/ugt comparisons to ult/uge,
respectively. However, it does not update the variable holding the
comparison predicate type after doing this. Later code fails to handle
the non-canonical predicate type (specifically, the swap of
ThresholdLowIncl and ThresholdHighExcl when Pred0 has been canonicalized
from ugt to uge). This leads to the miscompile reported in PR53252. Fix
this by updating the comparison predicate after canonicalizing.
Fixes#53252
Differential Revision: https://reviews.llvm.org/D119690
We can always replace the undef elements in a vector constant
with regular constants to get rid of the freeze:
https://alive2.llvm.org/ce/z/nfRb4F
The select diffs show that we might do better by adjusting the
logic for a frozen select condition. We may also want to refine
the vector constant replacement to consider forming a splat.
Differential Revision: https://reviews.llvm.org/D123962
We may be able to make the ValueTracking wrapper smarter
in the future (for example, analyze a simple recurrence),
so this will automatically benefit if that happens.
This patch add a function foldSelectWithFCmpToFabs, and do more combine for
fneg-of-fabs.
With 'nsz':
fold (X < +/-0.0) ? X : -X or (X <= +/-0.0) ? X : -X to -fabs(x)
fold (X > +/-0.0) ? X : -X or (X >= +/-0.0) ? X : -X to -fabs(x)
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D123830
Folds are supposed to always be added in conjugated pairs for and
and or. Merge the two functions to make folds for which this is
currently not the case more obvious.
1d90e53044 switch this code to store
the predicates and operands in variables, but retained a
swapOperands() call here. Thus the commuted cases were no longer
folded. Additionally, as the change was not reported, the next
InstCombine iteration would not pick it up either.
The first attempt at this missed a check to make sure the offset
constant was in range and caused many bot failures.
That was missed in the Alive2 proof because on overshift creates
poison rather than the assert from APInt. Here's an alternate
attempt at a proof using count-trailing-zeros:
https://alive2.llvm.org/ce/z/pnXQYR
Original commit message:
This is similar to an existing pre-shift-of-constant fold:
8a9c70fc01
...but in this case, we need no-wrap on the shl and a negative
offset:
https://alive2.llvm.org/ce/z/_RVz99
We're making a recursive call here and everything in the function
assumes we're looking at scalars. This would be violated if we
looked through a bitcast from vectors.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D124015
This is not expected to have a functional difference as discussed in the
post-commit comments for 8a9c70fc01. All of the motivating tests for
the older fold still optimize as expected because other code can infer
the 'nuw'.
With 'nuw' we can convert the increment of the shift amount
into a pre-shift (constant fold) of the shifted constant:
https://alive2.llvm.org/ce/z/FkTyR2
Fixes issue #41976
The description was ambiguous about the behavior
when boths select arms are constant or both arms
are not constant. I don't think there's any
evidence to support either way, but this matches
the code with a more specified description.
We can extend this to deal with vector constants
with undef/poison elements. Currently, those don't
get folded anywhere.
This diff extends foldSelectInstWithICmp to handle the case icmp(X) ? f(X) : C
when f(X) is guaranteed to be equal to C for all X in the exact range of the inverse predicate.
This addresses the issue https://github.com/llvm/llvm-project/issues/54089.
Differential revision: https://reviews.llvm.org/D123159
Test plan: make check-all
The test is already simplified, and I'm not sure how
to write a test to exercise the new clause. But it
protects the 2-bit pattern from miscompiling as noted
in D123453.
https://alive2.llvm.org/ce/z/QPyVfv
(If we managed to fall into the mul transform, it
would wrongly create a zero on this pattern.)
It actually implements support for seeing through loads, using alias analysis to
refine the result.
This is rather limited, but I didn't want to rely on more than available
analysis at that point (to be gentle with compilation time), and it does seem to
catch common scenario, as showcased by the included tests.
Differential Revision: https://reviews.llvm.org/D122431
By adding a parameter to function FoldOpIntoSelect, we can fold more Ops to Select.
For this example, we tend to fold the division instruction,
so we no longer care whether SelectInst is one use.
This patch slove TODO left in InstCombine/div.ll.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122967
Sometimes we can infer an align from an allocalign but the function
already promised it'd be more-aligned than the allocalign and there's an
existing align that we shouldn't reduce. Make sure we handle that
correctly.
Differential Revision: https://reviews.llvm.org/D121642
This is a lshr equivalent to D122340 - if we don't demand any of the additional sign bits introduced by the ashr, the lshr can be treated as an ashr and we can remove the shift entirely if we only demand already known sign bits.
Another step towards PR21929
https://alive2.llvm.org/ce/z/6f3kjq
Differential Revision: https://reviews.llvm.org/D123118
This is a hacky fix for:
https://github.com/llvm/llvm-project/issues/54558
As discussed there, codegen regressed when we opened up this transform
to allow extra uses ( 61580d0949 ), and it's not clear how to
undo the transforms at the later stage of compilation.
As noted in the code comments, there's a set of remaining folds that
are still limited to one-use, so we can try harder to refine and
expand the limitations on these folds, but it's likely to be an
up-and-down battle as we find and overcome similar regressions.
Differential Revision: https://reviews.llvm.org/D122909
This is a retry of 9397bdc67e - that was reverted until
we had a clang warning in place to alert users about a
possible mistake in source. The warning was added with
ab982eace6.
This is noted as a missing clang warning in #54222,
but it is also a missing optimization opportunity.
Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDqhttps://alive2.llvm.org/ce/z/pE6LRt
I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
These two are equivalent,
and i *think* the `and` form is more-ish canonical.
General proof: https://alive2.llvm.org/ce/z/RrF5s6
If constant on the (outer) `xor` is an `undef`,
the whole lane is dead: https://alive2.llvm.org/ce/z/mu4Sh2
However, if the constant on the (inner) `or` is an `undef`,
we must sanitize it first: https://alive2.llvm.org/ce/z/MHYJL7
I guess, producing a zero `and`-mask is optimal in that case.
alive-tv is happy about the entirety of `xor-of-or.ll`.
Before we gave up if a call through bitcast had parameter attributes.
Interestingly, we allowed attributes for the return value already. We
now handle both the same way, namely, we drop the ones that are
incompatible with the new type and keep the rest. This cannot cause
"more UB" than initially present.
Differential Revision: https://reviews.llvm.org/D119967
We already do this for SelectionDAG, but we're missing it here.
Noticed while re-triaging PR21929
Differential Revision: https://reviews.llvm.org/D122340
Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops.
We need a little hack to make the existing logic work because it does not expect to
move constants from op0 to op1, but the code comment hopefully makes that clear.
I don't think there are any other identities like that.
Fixes#54364
Differential Revision: https://reviews.llvm.org/D122390
The first attempt at this missed a validity check.
This version includes a test of the narrow source
type for modulo-16-bits.
Original commit message:
This is the IR counterpart to 370ebc9d9a
which provided a bswap narrowing fix for issue #53867.
Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo
This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.
Differential Revision: https://reviews.llvm.org/D122166
The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers. As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed. This is a bit undesirable for such an "obviously" useless bit of code.
As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocInst) to allow read xor write. Doing that would subsume this code in a more general way, but is also a more involved change. For the moment, I went with the easiest fix.
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C
This is an IR implementation of a transform suggested in D120648.
The "swaps cancel" test models the motivating optimization from
that proposal.
Alive2 checks (as noted in the other review, we could use
knownbits to handle shift-by-variable-amount, but that can be an
enhancement patch):
https://alive2.llvm.org/ce/z/pXUaRfhttps://alive2.llvm.org/ce/z/ZnaMLf
Differential Revision: https://reviews.llvm.org/D122010
This is the IR counterpart to 370ebc9d9a
which provided a bswap narrowing fix for issue #53867.
Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo
This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.
Differential Revision: https://reviews.llvm.org/D122166
This patch tries to sink instructions when they are only used in a successor block.
This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.
In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610
Reviewed By: nikic, reames
Differential Revision: https://reviews.llvm.org/D121585
Reapply with an explicit check for multi-edges, as the expected
behavior of multi-edge dominance is unclear (D120811).
-----
For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
This can be viewed as swapping the select arms:
https://alive2.llvm.org/ce/z/jUvFMJ
...so we don't have the 'nsz' problem with the more general fold.
This unlocks other folds for the motivating fabs example.
This was discussed in issue #38828.
If we have a logical and/or in select form and the true/false operand
is an fcmp with poison generating FMF, we won't be able to fold it
to an and/or instruction. This prevents us from optimizing the case
where it is a logical operation of two fcmps with identical operands.
This patch adds explicit checks for this case that doesn't rely on
converting to and/or to do the optimization. It reuses the existing
foldLogicOfFCmps, but adds a new flag to disable the other combine
that is inside that function.
FMF flags from the two FCmps are intersected using the logic added in
D121243. The FIXME has been updated to indicate that we can only use
a union for the non-select form.
This allows us to optimize cases like this from compare-fp-3.c in the
gcc torture suite with fast math.
void
test1 (float x, float y)
{
if ((x==y) && (x!=y))
link_error0();
}
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D121323
X (any pred) -X --> X (any pred) 0.0
This works with all FP values and preserves FMF.
Alive2 examples:
https://alive2.llvm.org/ce/z/dj6jhp
This can also create one of the patterns that we match as "fabs"
as shown in one of the test diffs.
This reverts commit 9397bdc67e.
This optimization is likely to surprise programmers as seen
in post-commit comments, so we should add a clang warning
first (that is proposed in D121306).
As discussed on Issue #32161 this fold can be generalized a lot more than it currently is, but this patch at least adds vector support.
Differential Revision: https://reviews.llvm.org/D121358
This patch intersects the fast math flags from the two fcmps instead
of dropping them.
I poked at this a bunch with Alive2 for nnan and ninf flags and it seemed
to check out. With the other flags it told me "Couldn't prove the
correctness of the transformation". Not sure if I should just preserve
nnan and ninf?
Reviewed By: spatel, lebedev.ri
Differential Revision: https://reviews.llvm.org/D121243
This is noted as a missing clang warning in #54222
(and we should still make that enhancement).
Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDqhttps://alive2.llvm.org/ce/z/pE6LRt
I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
Prior to this change LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator::new can be inlined but the body of
operator::delete can't, as in this example from jyknight:
#include <stdlib.h>
#include <stdio.h>
int allocs = 0;
void *operator new(size_t n) {
allocs++;
void *mem = malloc(n);
if (!mem) abort();
return mem;
}
__attribute__((noinline)) void operator delete(void *mem) noexcept {
allocs--;
free(mem);
}
void deleteit(int*i) { delete i; }
int main() {
int*i = new int;
deleteit(i);
if (allocs != 0)
printf("MEMORY LEAK! allocs: %d\n", allocs);
}
This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.
Differential Revision: https://reviews.llvm.org/D117356
The similar getICmpCode and getPredForICmpCode are already there.
This moves FP for consistency.
I think InstCombine is currently the only user of both.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120754
To make this actually trigger, we also need to check whether the
function types differ, which is a hidden cast under opaque pointers.
The transform is somewhat less relevant there because it is
primarily about pointer bitcasts, but it can also happen with other
bit- or pointer-castable types.
Byval handling is easier with opaque pointers because there is no
need to adjust the byval type, we only need to make sure that it's
still a pointer.
The logic for handling this was fixed in
8d7f118ab2, but the check for byval
on the callee was retained. This resulted in a weird situation
where the transform would work depending on whether the byval
was only on the call or on both the call and the function.
This is a recommit without changes. I originally reverted this
due to a significant code-size regression on tramp3d-v4, however
further investigation showed that in the tramp3d-v4 case this
change enables additional optimizations (in particular more
jump threading), which happens to reduce the size of a function
just enough to be eligible for inlining at hot callsites, which
results in the code size increase. As such, this was just bad
luck.
-----
This one-use limitation is artificial, we do not increase
instruction count if we perform the fold with multiple uses. The
motivating case is shown in @sub_eq_zero_select, where the one-use
limitation causes us to miss a subsequent select fold.
I believe the backend is pretty good about reusing flag-producing
subs for cmps with same operands, so I think doing this is fine.
Differential Revision: https://reviews.llvm.org/D120337
For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
Instead of passing an InstCmpInt * and a bool just pass the predicate
from the caller.
I'm considering moving the similar FCmp functions from InstCombine
over here and this makes the interface consistent with what is used
for FCmp.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120609
This transform can still be applied if there are more than two
phi inputs, as long as phi inputs with the same value are dominated
by the same idom edge.