Commit Graph

4976 Commits

Author SHA1 Message Date
Sanjay Patel bfde861935 [InstCombine] convert mask and shift of power-of-2 to cmp+select
When the mask is a power-of-2 constant and op0 is a shifted-power-of-2
constant, test if the shift amount equals the offset bit index:

(ShiftC << X) & C --> X == (log2(C) - log2(ShiftC)) ? C : 0
(ShiftC >> X) & C --> X == (log2(ShiftC) - log2(C)) ? C : 0

This is an alternate to D127610 with a more general pattern.
We match only shift+and instead of the trailing xor, so we see a few
more tests diffs. I think we discussed this initially in D126617.

Here are proofs for shifts in both directions:
https://alive2.llvm.org/ce/z/CFrLs4

The test diffs look equal or better for IR, and this makes the
patterns more uniform in IR. The backend can partially invert this
in both cases if that is profitable. It is not trivially reversible,
however, so if we find perf regressions that are not easy to undo,
then we may want to revert this.

Differential Revision: https://reviews.llvm.org/D127801
2022-06-17 10:51:57 -04:00
Nikita Popov c6b88cb918 [InstCombine] Push freeze through recurrence phi
We really want to push freezes through recurrence phis, so that we
freeze only the start value, rather than the IV value on every
iteration. foldOpIntoPhi() already handles this for the case where
the transfer function doesn't produce poison, e.g.
%iv.next = add %iv, 1. However, this does not work if nowrap flags
are present, e.g. the very common %iv.next = add nuw %iv, 1 case.

This patch adds a fold that pushes freeze instructions to the start
value by checking whether all backedge values will be non-poison
after poison generating flags have been dropped. This allows pushing
freezes out of loops in most cases. I suspect that this also
obsoletes the CanonicalizeFreezeInLoops pass, and we can probably
drop it.

Fixes https://github.com/llvm/llvm-project/issues/56048.

Differential Revision: https://reviews.llvm.org/D127960
2022-06-17 15:01:41 +02:00
Heejin Ahn b2f4112f25 [InstCombine] Improve check for catchswitch BBs (NFC)
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D127810
2022-06-15 01:06:13 -07:00
chenglin.bi 286198ff04 [InstCombine] Optimize lshr+shl+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3) >= C2`:
    ((C1 >> X) << C2) & C3 -> X == (Log2(C1)+C2-Log2(C3)) ? C3 : 0
https://alive2.llvm.org/ce/z/zvrkKF

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127469
2022-06-14 11:06:10 +08:00
Heejin Ahn ac4006b0d6 [InstCombine] Don't slice up PHIs when pred BB has catchswitch
If an integer PHI has an illegal type (according to the data layout) and
it is only used by `trunc` or `trunc(lshr)` operations, we split the PHI
into various instructions in its predecessors:
6d1543a167/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp (L1536-L1543)

So this can produce code like the following:
Before:
```
pred:
  ...

bb:
  %p = phi i8 [ %somevalue, %pred ], ...
  ...
  %tobool = trunc i8 %p to i1
  use %tobool
  ...
```
In this code, `%p` has an illegal integer type, `i8`, and its only used
in a `trunc` instruction later. In this case this pass puts extraction
code in its predecessors:

After:
```
pred:
  ...
  %t = and i8 %somevalue, 1
  %extract = icmp ne i8 %t, 0

bb:
  %p.new = phi i1 [ %extract, %pred ], ...
  use %p.new instead of %tobool
```

But this doesn't work if `pred` is a `catchswitch` BB because it cannot
have any non-PHI instructions. This CL ensures we bail out in that case.

Fixes https://github.com/llvm/llvm-project/issues/55803.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D127699
2022-06-13 18:32:09 -07:00
Sanjay Patel 310adb658c [InstCombine] reorder mask folds for efficiency
This shows narrowing improvements on the logic tests
(transforms recently added with e247b0e5c9).

This is not a complete fix. That would require adding
folds to visitOr/visitXor. But it enables the expected
transforms for the basic patterns in the affected tests.
2022-06-13 09:49:57 -04:00
David Sherwood 83251896d7 [NFC][InstCombine] Refactor InstCombinerImpl::foldSelectIntoOp
Introduce a lambda function so that we remove a lot of code
duplication.

Differential Revision: https://reviews.llvm.org/D127493
2022-06-13 10:37:07 +01:00
Nikita Popov 92a9b1c918 [InstCombine] Don't push operation across loop phi
When pushing an operation across a phi node, we should avoid doing
so across a loop backedge. This is generally non-profitable, because
it does not reduce the number of times the operation is executed,
and could lead to an infinite combine loop.

The code was already guarding against this, but using an
insufficiently strong condition, which did not cover the case where
the operation was originally outside the loop (in which case the
transform moves the operation from outside the loop into the loop,
which is particularly undesirable).

Differential Revision: https://reviews.llvm.org/D127499
2022-06-13 10:48:09 +02:00
Nuno Lopes e5c5f92e12 [InstCombine] switch synthetic unreachable to use undef instead of poison (NFC) 2022-06-10 21:54:09 +01:00
Sanjay Patel e247b0e5c9 [InstCombine] add narrowing transform for low-masked binop with zext operand (2nd try)
The 1st try ( afa192cfb6 ) was reverted because it could
cause an infinite loop with constant expressions.

A test for that and an extra condition to enable the transform
are added now. I also added code comments to better describe
the transform and the existing, related transform.

Original commit message:
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
casts around, and we already have a related fold for a binop
with a constant operand.
2022-06-10 12:42:27 -04:00
Guillaume Chatelet dc9c2eac98 [NFC][Alignment] Simplify code 2022-06-10 15:25:28 +00:00
Sanjay Patel 6fedc6a2b4 Revert "[InstCombine] add narrowing transform for low-masked binop with zext operand"
This reverts commit afa192cfb6.
This can cause an infinite loop as shown with an example in the
post-commit thread.
2022-06-10 08:25:10 -04:00
David Sherwood 8daaea206b [InstCombine] Use +0.0 instead of -0.0 as the FP identity for some folds
In foldSelectIntoOp we sometimes transform a select of a fadd into a
fadd of a select, where we select between data and an identity value.
For both fadd and fsub the identity is always -0.0, but if the nsz
flag is set on the select instruction we can use +0.0 instead. Doing
so then triggers other optimisations, such as when folding the select
of masked load into a new masked load.

Differential Revision: https://reviews.llvm.org/D126774
2022-06-10 12:42:34 +01:00
chenglin.bi de7a6ae1ff [InstCombine] Optimize shl+lshr+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3)+C2 < BitWidth`:
    ((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0;

https://alive2.llvm.org/ce/z/Pus5bd

Fix issue https://github.com/llvm/llvm-project/issues/55739

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126617
2022-06-10 09:36:58 +08:00
Sanjay Patel afa192cfb6 [InstCombine] add narrowing transform for low-masked binop with zext operand
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
cast around, and we already have a related fold for a binop
with a constant operand.
2022-06-09 16:59:26 -04:00
Simon Moll b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Biplob Mishra d87bfa9ad0 [InstCombine] Combine instructions of type or/and where AND masks can be combined.
The patch simplifies some of the patterns as below

(A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
((B & C0) | A) | (B & C1) -> (B & C0|C1) | A

In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.

Additionally this commit fixes the issue reported with the test case.
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

The previous revision/commit did not check one-use of an intermediate value that this transform re-uses.
When that value has another use, an existing transform will try to invert the transform here.
By adding one-use checks, we avoid the infinite loops seen with the earlier commit.

Differential Revision: https://reviews.llvm.org/D124119
2022-06-09 10:58:30 +01:00
Chenbing Zheng 38992d2c5e [InstCombine] improve fold for icmp-ugt-ashr
Existing condition for
fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1
missed some boundary. It cause this fold don't work for some cases, and the
reason is due to signed number overflow.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127188
2022-06-09 16:22:12 +08:00
Wael Yehia 0952cf5bbb [InstCombine] decomposeSimpleLinearExpr should bail out on negative operands.
InstCombine tries to rewrite

  %prod = mul nsw i64 %X,   Scale
  %acc = add nsw i64 %prod,   Offset
  %0 = alloca i8, i64 %acc, align 4
  %1 = bitcast i8* %0 to i32*
  Use ( %1 )

into

  %prod = mul nsw i64 %X,   Scale/4
  %acc = add nsw i64 %prod,   Offset/4
  %0 = alloca i32, i64 %acc, align 4
  Use (%0)

But it assumes Scale is unsigned, and performs an unsigned division.
So we should bail out if Scale cannot be interpreted as an unsigned safely.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126546
2022-06-08 00:57:25 +00:00
Sanjay Patel cae993d4c8 [InstCombine] [InstCombine] reduce left-shift-of-right-shifted constant via demanded bits
If we don't demand low bits and it is valid to pre-shift a constant:
(C2 >> X) << C1 --> (C2 << C1) >> X

https://alive2.llvm.org/ce/z/_UzTMP

This is the reverse-order shift sibling to 82040d414b ( D127122 ).
It seems likely that we would want to add this to the SDAG version of
the code too to keep it on par with IR.
2022-06-07 18:43:27 -04:00
Sanjay Patel a4d2c5ecaa [InstCombine] reduce code duplication for accessing type; NFC 2022-06-07 18:43:27 -04:00
Sanjay Patel 82040d414b [InstCombine] reduce right-shift-of-left-shifted constant via demanded bits
If we don't demand high bits (zeros) and it is valid to pre-shift a constant:
(C2 << X) >> C1 --> (C2 >> C1) << X

https://alive2.llvm.org/ce/z/P3dWDW

There are a variety of related patterns, but I haven't found a single solution
that gets all of the motivating examples - so pulling this piece out of
D126617 along with more tests.

We should also handle the case where we shift-right followed by shift-left,
but I'll make that a follow-on patch assuming this one is ok. It seems likely
that we would want to add this to the SDAG version of the code too to keep it
on par with IR.

Differential Revision: https://reviews.llvm.org/D127122
2022-06-07 13:28:18 -04:00
Sanjay Patel 3f33d67d8a [InstCombine] fold mul with masked low bit operand to trunc+select
https://alive2.llvm.org/ce/z/o7rQ5q

This shows an extra instruction in some cases, but that is
caused by an existing canonicalization of trunc -> and+icmp.

Codegen should be better for any target where a multiply is
more costly than the most simple ALU op.

This ends up producing the requested x86 asm from issue #55618,
but it's not the same IR. We are missing a canonicalization
from the negate+mask pattern to the trunc+select created here.
2022-06-05 20:07:18 -04:00
Sanjay Patel 8689463bfb [InstCombine] make pattern matching more consistent; NFC
We could go either way on this and several similar matches.
Just matching as a binop is possibly slightly more efficient;
we don't need to re-confirm the opcode of the instruction.
2022-06-02 16:01:23 -04:00
Alexander Kornienko aa98e7e1eb Revert "[InstCombine] Combine instructions of type or/and where AND masks can be combined."
This reverts commit ec4adf1f6c. The commit causes
clang to hang on a certain input:
```
$ cat q.cc
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

$ time ./clang-15-10515 --target=x86_64--linux-gnu -O1  -c q.cc
^C

real    0m45.072s
user    0m0.025s
sys     0m0.099s
```
2022-06-01 14:20:00 +02:00
Sanjay Patel 2bf6123f22 [InstCombine] fold icmp of sext bool based on limited range
X <=u (sext i1 Y) --> (X == 0) | Y

https://alive2.llvm.org/ce/z/W_tZzo

This is the conjugate/sibling pattern suggested with D126171
for a sign-extended bool value.
2022-05-31 12:37:56 -04:00
Nikita Popov 36cbdaa163 [InstCombine] Fix inbounds preservation when swapping GEPs (PR44206)
When reassociating GEPs, we can only keep inbounds if both original
GEPs were inbounds, and their offsets have the same sign. For the
sake of simplicity, I only handle the case where both offsets are
non-negative here.

It would probably be fine to just not preserve inbounds at all here,
but as I don't see a compile-time impact for adding the
isKnownNonNegative() calls I went with this more conservative
approach.

Fixes https://github.com/llvm/llvm-project/issues/44206.

Differential Revision: https://reviews.llvm.org/D126687
2022-05-31 15:45:02 +02:00
Danila Malyutin 4fb3fd7d82 [InstCombine] Fix const folding of switches with default case
In case phi was in the default block it could lead to multi-edge.
Fixes #55721.

Differential Revision: https://reviews.llvm.org/D126650
2022-05-31 15:13:58 +03:00
Nikita Popov 872d69e5d4 [InstCombine] Fix inbounds preservation when merging GEPs (PR55722)
Even if the total offset is inbounds, we might represent it by first
performing a large negative offset and then a small positive one.
With inbounds semantics as currently specified, each offset must
be inbounds individually, not just the overall offset of the GEP.

Fix this by checking that the sign of all offsets is the same.

Fixes https://github.com/llvm/llvm-project/issues/55722.
2022-05-31 11:54:01 +02:00
Sanjay Patel a0c3c60728 [InstCombine] fold shift-right-by-constant with shift-right-of-constant operand
(C2 >> X) >> C1 --> (C2 >> C1) >> X

The shift-left form of this transform has existed since:
16f18ed7b5

...but it applies to matching shift right opcodes too:
https://alive2.llvm.org/ce/z/c5eQms
2022-05-30 15:30:01 -04:00
Sanjay Patel c5d942a4fb [InstCombine] remove unnecessary one-use check from (C2 << X) << C1 fold
The restriction goes back to:
16f18ed7b5
...but the fold only replaces a shift with a shift, so that's not necessary.
Generalizing to other opcodes is planned as a follow-up.
2022-05-30 15:17:54 -04:00
Nikita Popov a770f534e6 [InstCombine] When swapping GEPs, only keep inbounds if both are
If only one of the GEPs is inbounds, then after swapping, there is
no guarantee that one of them will be inbounds as well
(see e.g. https://alive2.llvm.org/ce/z/agaCnp).

This is only a partial fix, because even if both are inbounds, the
result is not necessarily inbounds (if the offsets have different
signs).
2022-05-30 17:04:42 +02:00
Nikita Popov 2d7bab666f [InstCombine] Always create new GEPs when swapping GEPs
As the long explanatory comment attests, performing the modification
in place is pretty tricky. Drop this unnecessary complexity and
always create new instructions.

This should be NFC-ish, but can probably cause difference due to
worklist order.
2022-05-30 16:48:52 +02:00
zhongyunde 3e6ba89055 [InstCombine] Fold a mul with bool value into and
Fixes https://github.com/llvm/llvm-project/issues/55599
  X * Y --> X & Y, iff X, Y can be only {0, 1}.
https://alive2.llvm.org/ce/z/_RsTKF

Reviewed By: spatel, nikic

Differential Revision: https://reviews.llvm.org/D126040
2022-05-30 21:05:00 +08:00
Chenbing Zheng ef256ed58e [InstCombine] bitcast (extractelement <1 x elt>, dest) -> bitcast(<1 x elt>, dest)
Only solve dest type is vector to avoid inverse transform in visitBitCast.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125951
2022-05-30 10:16:32 +08:00
Sanjay Patel b5b6aa4d53 [InstCombine] fold multiply by signbit-splat to cmp+select
(ashr i32 X, 31) * C --> (X < 0) ? -C : 0
https://alive2.llvm.org/ce/z/G8u9SS

With a constant operand, this is an improvement in IR
and codegen (where it can be converted to a mask op).

Without a constant operand, we would have to negate
the operand, so that is probably better left to the backend.

This is similar but not the same optimization that is requested
in #55618.
2022-05-27 11:54:19 -04:00
Sanjay Patel 5a6e085757 [InstCombine] reduce code duplication; NFC 2022-05-27 11:54:19 -04:00
Sanjay Patel c4c750058f [InstCombine] fold mul of signbit directly to X < 0 ? Y : 0
This is effectively NFC (intentionally no test diffs)
because we already have the related fold that converts
the 'and' pattern to select. So this is just an efficiency
improvement.
2022-05-26 16:19:15 -04:00
Sanjay Patel 49f8b05137 [InstCombine] fold icmp equality with sdiv and SMIN
This extends the fold from D126410 / 3952c905ef
to allow for the only case where it works with signed
division:
https://alive2.llvm.org/ce/z/k7_ypu

(X s/ Y) == SMIN --> (X == SMIN) && (Y == 1)
(X s/ Y) != SMIN --> (X != SMIN) || (Y != 1)

This is another improvement based on #55695.
2022-05-26 16:19:15 -04:00
Sanjay Patel ed5be1523f [InstCombine] reduce code duplication in icmp+div folds; NFC 2022-05-26 16:19:15 -04:00
Sanjay Patel 3952c905ef [InstCombine] fold icmp equality with udiv and large constant
With large compare constant:
(X u/ Y) == C --> (X == C) && (Y == 1)
(X u/ Y) != C --> (X != C) || (Y != 1)

https://alive2.llvm.org/ce/z/EhKwh6

There are various potential missing icmp (div) transforms shown here:
https://github.com/llvm/llvm-project/issues/55695

This is a generalization for part of the udiv + equality.
I didn't check in detail, but some of those may only make sense as
codegen transforms.

This results in one extra instruction in IR, but it is better for
analysis, and looks much better in codegen on all targets that I tried.

Differential Revision: https://reviews.llvm.org/D126410
2022-05-26 09:08:47 -04:00
Chenbing Zheng 1486a9c9fe [InstCombine] [NFC] refector foldXorOfICmps
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126268
2022-05-26 11:07:18 +08:00
Chenbing Zheng 41aab93afc [InstCombine] bitcast(logic(bitcast(X), bitcast(Y))) -> bitcast'(logic(bitcast'(X), Y))
This patch break foldBitCastBitwiseLogic limite the destination
must have an integer element type, and eliminate one bitcast by
doing the logic op in the type of the input that has an integer
element type.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126184
2022-05-26 10:23:44 +08:00
Chenbing Zheng 269e3f7369 [InstCombine] [NFC] Move transforms for truncated shifts into narrowBinOp
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126056
2022-05-25 10:21:39 +08:00
Sanjay Patel 05527b68a0 [InstCombine] fold more shuffles with FP<->Int cast operands
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)

This extends the transform added with 0353c2c996.

If the shuffle reduces vector length, the transform
reduces the width of the cast, so that should be a
win for most codegen (if not, it can be inverted).
2022-05-24 15:11:38 -04:00
Nikita Popov e6e0eb3bc8 [InstCombine] Strip bitcasts in GEP diff fold
Bitcasts were stripped in one case, but not the other. Of course,
this no longer really matters with opaque pointers, but as I went
through the trouble of tracking this down, we may as well remove
one typed vs opaque pointer optimization discrepancy.
2022-05-24 16:12:01 +02:00
Nikita Popov b2a13d3e2d [InstCombine] Use IRBuilder in freeze pushing transform (PR55619)
Use IRBuilder so that the newly created freeze instructions
automatically gets inserted back into the IC worklist.

The changed worklist processing order leads to some cosmetic
differences in tests.

Fixes https://github.com/llvm/llvm-project/issues/55619.
2022-05-24 15:48:28 +02:00
Nikita Popov a7c079aaa2 [InstCombine] Support logical and in masked icmp fold
Most of the folds implemented in this function work fine with
logical operations. We only need to be careful for the cases that
work on non-constant masks, where the RHS operand shouldn't be
poison.

This is a conservative implementation that bails out of illegal
transforms, but we could also change these to insert freeze instead.
2022-05-24 11:16:33 +02:00
Nikita Popov 5abaabed22 [InstCombine] Use m_APInt() in asymmetric masked icmp fold
This is mostly intended as code cleanup, but it does also add
support for splat vectors to this fold.
2022-05-24 10:57:28 +02:00
Nikita Popov c0e06c7448 [InstCombine] Handle logical and/or in recursive and/or of icmps fold
The and/or of icmps fold is also applied in reassociated form.
However, this currently only happens for bitwise and of bitwise
and, but not for bitwise and of logical and (or other combinations,
but this is the one being addressed here).

We can do this for bitwise+logical combinations as well, but need
to be a bit careful about which of the resulting ands are logical:
https://alive2.llvm.org/ce/z/WYSjGh
https://alive2.llvm.org/ce/z/guxYnz
https://alive2.llvm.org/ce/z/S5SYxY
https://alive2.llvm.org/ce/z/2rAWeW
2022-05-24 10:13:10 +02:00