Commit Graph

5134 Commits

Author SHA1 Message Date
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
chenglin.bi 30e49a3794 [InstCombine] Optimise shift+and+boolean conversion pattern to simple comparison
if (`C1` is pow2) & (`(C2 & ~(C1-1)) + C1)` is pow2):
    ((C1 << X) & C2) == 0 -> X >= (Log2(C2+C1) - Log2(C1));
https://alive2.llvm.org/ce/z/EJAl1R
    ((C1 << X) & C2) != 0 -> X  < (Log2(C2+C1) - Log2(C1));
https://alive2.llvm.org/ce/z/3bVRVz

And remove dead code.

Fix: https://github.com/llvm/llvm-project/issues/56124

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126591
2022-06-23 21:53:07 +08:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Chenbing Zheng 0eff6c6ba8 [InstCombine] add vector support for (A >> C) == (B >> C) --> (A^B) u< (1 << C)
Reviewed By: spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D127398
2022-06-20 10:55:47 +08:00
Eric Gullufsen 73202130e5 [InstCombine] Optimize test for same-sign of values
(icmp slt (X & Y), 0) | (icmp sgt (X | Y), -1) -> (icmp sgt (X ^ Y), -1)
(icmp slt (X | Y), 0) & (icmp sgt (X & Y), -1) -> (icmp slt (X ^ Y), 0)

[[ https://alive2.llvm.org/ce/z/qXxEFP | alive2 example ]]
[[ https://godbolt.org/z/aWf9c6j74 | godbolt  ]]

[[ https://godbolt.org/z/5Ydn5TehY | godbolt for inverted form ]]
[[ https://alive2.llvm.org/ce/z/93AODr | alive2 for inverted form ]]
[[ https://github.com/llvm/llvm-project/issues/55988 | issue #55988 ]]

Differential Revision: https://reviews.llvm.org/D127903
2022-06-19 16:18:19 -04:00
Sanjay Patel 0399473de8 [InstCombine] add fold for (ShiftC >> X) <u C
https://alive2.llvm.org/ce/z/RcdzM-

This fixes a regression noted in issue #56046.
2022-06-19 11:03:28 -04:00
Kazu Hirata 129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Sanjay Patel bfde861935 [InstCombine] convert mask and shift of power-of-2 to cmp+select
When the mask is a power-of-2 constant and op0 is a shifted-power-of-2
constant, test if the shift amount equals the offset bit index:

(ShiftC << X) & C --> X == (log2(C) - log2(ShiftC)) ? C : 0
(ShiftC >> X) & C --> X == (log2(ShiftC) - log2(C)) ? C : 0

This is an alternate to D127610 with a more general pattern.
We match only shift+and instead of the trailing xor, so we see a few
more tests diffs. I think we discussed this initially in D126617.

Here are proofs for shifts in both directions:
https://alive2.llvm.org/ce/z/CFrLs4

The test diffs look equal or better for IR, and this makes the
patterns more uniform in IR. The backend can partially invert this
in both cases if that is profitable. It is not trivially reversible,
however, so if we find perf regressions that are not easy to undo,
then we may want to revert this.

Differential Revision: https://reviews.llvm.org/D127801
2022-06-17 10:51:57 -04:00
Nikita Popov c6b88cb918 [InstCombine] Push freeze through recurrence phi
We really want to push freezes through recurrence phis, so that we
freeze only the start value, rather than the IV value on every
iteration. foldOpIntoPhi() already handles this for the case where
the transfer function doesn't produce poison, e.g.
%iv.next = add %iv, 1. However, this does not work if nowrap flags
are present, e.g. the very common %iv.next = add nuw %iv, 1 case.

This patch adds a fold that pushes freeze instructions to the start
value by checking whether all backedge values will be non-poison
after poison generating flags have been dropped. This allows pushing
freezes out of loops in most cases. I suspect that this also
obsoletes the CanonicalizeFreezeInLoops pass, and we can probably
drop it.

Fixes https://github.com/llvm/llvm-project/issues/56048.

Differential Revision: https://reviews.llvm.org/D127960
2022-06-17 15:01:41 +02:00
Heejin Ahn b2f4112f25 [InstCombine] Improve check for catchswitch BBs (NFC)
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D127810
2022-06-15 01:06:13 -07:00
chenglin.bi 286198ff04 [InstCombine] Optimize lshr+shl+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3) >= C2`:
    ((C1 >> X) << C2) & C3 -> X == (Log2(C1)+C2-Log2(C3)) ? C3 : 0
https://alive2.llvm.org/ce/z/zvrkKF

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127469
2022-06-14 11:06:10 +08:00
Heejin Ahn ac4006b0d6 [InstCombine] Don't slice up PHIs when pred BB has catchswitch
If an integer PHI has an illegal type (according to the data layout) and
it is only used by `trunc` or `trunc(lshr)` operations, we split the PHI
into various instructions in its predecessors:
6d1543a167/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp (L1536-L1543)

So this can produce code like the following:
Before:
```
pred:
  ...

bb:
  %p = phi i8 [ %somevalue, %pred ], ...
  ...
  %tobool = trunc i8 %p to i1
  use %tobool
  ...
```
In this code, `%p` has an illegal integer type, `i8`, and its only used
in a `trunc` instruction later. In this case this pass puts extraction
code in its predecessors:

After:
```
pred:
  ...
  %t = and i8 %somevalue, 1
  %extract = icmp ne i8 %t, 0

bb:
  %p.new = phi i1 [ %extract, %pred ], ...
  use %p.new instead of %tobool
```

But this doesn't work if `pred` is a `catchswitch` BB because it cannot
have any non-PHI instructions. This CL ensures we bail out in that case.

Fixes https://github.com/llvm/llvm-project/issues/55803.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D127699
2022-06-13 18:32:09 -07:00
Sanjay Patel 310adb658c [InstCombine] reorder mask folds for efficiency
This shows narrowing improvements on the logic tests
(transforms recently added with e247b0e5c9).

This is not a complete fix. That would require adding
folds to visitOr/visitXor. But it enables the expected
transforms for the basic patterns in the affected tests.
2022-06-13 09:49:57 -04:00
David Sherwood 83251896d7 [NFC][InstCombine] Refactor InstCombinerImpl::foldSelectIntoOp
Introduce a lambda function so that we remove a lot of code
duplication.

Differential Revision: https://reviews.llvm.org/D127493
2022-06-13 10:37:07 +01:00
Nikita Popov 92a9b1c918 [InstCombine] Don't push operation across loop phi
When pushing an operation across a phi node, we should avoid doing
so across a loop backedge. This is generally non-profitable, because
it does not reduce the number of times the operation is executed,
and could lead to an infinite combine loop.

The code was already guarding against this, but using an
insufficiently strong condition, which did not cover the case where
the operation was originally outside the loop (in which case the
transform moves the operation from outside the loop into the loop,
which is particularly undesirable).

Differential Revision: https://reviews.llvm.org/D127499
2022-06-13 10:48:09 +02:00
Nuno Lopes e5c5f92e12 [InstCombine] switch synthetic unreachable to use undef instead of poison (NFC) 2022-06-10 21:54:09 +01:00
Sanjay Patel e247b0e5c9 [InstCombine] add narrowing transform for low-masked binop with zext operand (2nd try)
The 1st try ( afa192cfb6 ) was reverted because it could
cause an infinite loop with constant expressions.

A test for that and an extra condition to enable the transform
are added now. I also added code comments to better describe
the transform and the existing, related transform.

Original commit message:
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
casts around, and we already have a related fold for a binop
with a constant operand.
2022-06-10 12:42:27 -04:00
Guillaume Chatelet dc9c2eac98 [NFC][Alignment] Simplify code 2022-06-10 15:25:28 +00:00
Sanjay Patel 6fedc6a2b4 Revert "[InstCombine] add narrowing transform for low-masked binop with zext operand"
This reverts commit afa192cfb6.
This can cause an infinite loop as shown with an example in the
post-commit thread.
2022-06-10 08:25:10 -04:00
David Sherwood 8daaea206b [InstCombine] Use +0.0 instead of -0.0 as the FP identity for some folds
In foldSelectIntoOp we sometimes transform a select of a fadd into a
fadd of a select, where we select between data and an identity value.
For both fadd and fsub the identity is always -0.0, but if the nsz
flag is set on the select instruction we can use +0.0 instead. Doing
so then triggers other optimisations, such as when folding the select
of masked load into a new masked load.

Differential Revision: https://reviews.llvm.org/D126774
2022-06-10 12:42:34 +01:00
chenglin.bi de7a6ae1ff [InstCombine] Optimize shl+lshr+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3)+C2 < BitWidth`:
    ((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0;

https://alive2.llvm.org/ce/z/Pus5bd

Fix issue https://github.com/llvm/llvm-project/issues/55739

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126617
2022-06-10 09:36:58 +08:00
Sanjay Patel afa192cfb6 [InstCombine] add narrowing transform for low-masked binop with zext operand
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
cast around, and we already have a related fold for a binop
with a constant operand.
2022-06-09 16:59:26 -04:00
Simon Moll b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Biplob Mishra d87bfa9ad0 [InstCombine] Combine instructions of type or/and where AND masks can be combined.
The patch simplifies some of the patterns as below

(A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
((B & C0) | A) | (B & C1) -> (B & C0|C1) | A

In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.

Additionally this commit fixes the issue reported with the test case.
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

The previous revision/commit did not check one-use of an intermediate value that this transform re-uses.
When that value has another use, an existing transform will try to invert the transform here.
By adding one-use checks, we avoid the infinite loops seen with the earlier commit.

Differential Revision: https://reviews.llvm.org/D124119
2022-06-09 10:58:30 +01:00
Chenbing Zheng 38992d2c5e [InstCombine] improve fold for icmp-ugt-ashr
Existing condition for
fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1
missed some boundary. It cause this fold don't work for some cases, and the
reason is due to signed number overflow.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127188
2022-06-09 16:22:12 +08:00
Wael Yehia 0952cf5bbb [InstCombine] decomposeSimpleLinearExpr should bail out on negative operands.
InstCombine tries to rewrite

  %prod = mul nsw i64 %X,   Scale
  %acc = add nsw i64 %prod,   Offset
  %0 = alloca i8, i64 %acc, align 4
  %1 = bitcast i8* %0 to i32*
  Use ( %1 )

into

  %prod = mul nsw i64 %X,   Scale/4
  %acc = add nsw i64 %prod,   Offset/4
  %0 = alloca i32, i64 %acc, align 4
  Use (%0)

But it assumes Scale is unsigned, and performs an unsigned division.
So we should bail out if Scale cannot be interpreted as an unsigned safely.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126546
2022-06-08 00:57:25 +00:00
Sanjay Patel cae993d4c8 [InstCombine] [InstCombine] reduce left-shift-of-right-shifted constant via demanded bits
If we don't demand low bits and it is valid to pre-shift a constant:
(C2 >> X) << C1 --> (C2 << C1) >> X

https://alive2.llvm.org/ce/z/_UzTMP

This is the reverse-order shift sibling to 82040d414b ( D127122 ).
It seems likely that we would want to add this to the SDAG version of
the code too to keep it on par with IR.
2022-06-07 18:43:27 -04:00
Sanjay Patel a4d2c5ecaa [InstCombine] reduce code duplication for accessing type; NFC 2022-06-07 18:43:27 -04:00
Sanjay Patel 82040d414b [InstCombine] reduce right-shift-of-left-shifted constant via demanded bits
If we don't demand high bits (zeros) and it is valid to pre-shift a constant:
(C2 << X) >> C1 --> (C2 >> C1) << X

https://alive2.llvm.org/ce/z/P3dWDW

There are a variety of related patterns, but I haven't found a single solution
that gets all of the motivating examples - so pulling this piece out of
D126617 along with more tests.

We should also handle the case where we shift-right followed by shift-left,
but I'll make that a follow-on patch assuming this one is ok. It seems likely
that we would want to add this to the SDAG version of the code too to keep it
on par with IR.

Differential Revision: https://reviews.llvm.org/D127122
2022-06-07 13:28:18 -04:00
Sanjay Patel 3f33d67d8a [InstCombine] fold mul with masked low bit operand to trunc+select
https://alive2.llvm.org/ce/z/o7rQ5q

This shows an extra instruction in some cases, but that is
caused by an existing canonicalization of trunc -> and+icmp.

Codegen should be better for any target where a multiply is
more costly than the most simple ALU op.

This ends up producing the requested x86 asm from issue #55618,
but it's not the same IR. We are missing a canonicalization
from the negate+mask pattern to the trunc+select created here.
2022-06-05 20:07:18 -04:00
Sanjay Patel 8689463bfb [InstCombine] make pattern matching more consistent; NFC
We could go either way on this and several similar matches.
Just matching as a binop is possibly slightly more efficient;
we don't need to re-confirm the opcode of the instruction.
2022-06-02 16:01:23 -04:00
Alexander Kornienko aa98e7e1eb Revert "[InstCombine] Combine instructions of type or/and where AND masks can be combined."
This reverts commit ec4adf1f6c. The commit causes
clang to hang on a certain input:
```
$ cat q.cc
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

$ time ./clang-15-10515 --target=x86_64--linux-gnu -O1  -c q.cc
^C

real    0m45.072s
user    0m0.025s
sys     0m0.099s
```
2022-06-01 14:20:00 +02:00
Sanjay Patel 2bf6123f22 [InstCombine] fold icmp of sext bool based on limited range
X <=u (sext i1 Y) --> (X == 0) | Y

https://alive2.llvm.org/ce/z/W_tZzo

This is the conjugate/sibling pattern suggested with D126171
for a sign-extended bool value.
2022-05-31 12:37:56 -04:00
Nikita Popov 36cbdaa163 [InstCombine] Fix inbounds preservation when swapping GEPs (PR44206)
When reassociating GEPs, we can only keep inbounds if both original
GEPs were inbounds, and their offsets have the same sign. For the
sake of simplicity, I only handle the case where both offsets are
non-negative here.

It would probably be fine to just not preserve inbounds at all here,
but as I don't see a compile-time impact for adding the
isKnownNonNegative() calls I went with this more conservative
approach.

Fixes https://github.com/llvm/llvm-project/issues/44206.

Differential Revision: https://reviews.llvm.org/D126687
2022-05-31 15:45:02 +02:00
Danila Malyutin 4fb3fd7d82 [InstCombine] Fix const folding of switches with default case
In case phi was in the default block it could lead to multi-edge.
Fixes #55721.

Differential Revision: https://reviews.llvm.org/D126650
2022-05-31 15:13:58 +03:00
Nikita Popov 872d69e5d4 [InstCombine] Fix inbounds preservation when merging GEPs (PR55722)
Even if the total offset is inbounds, we might represent it by first
performing a large negative offset and then a small positive one.
With inbounds semantics as currently specified, each offset must
be inbounds individually, not just the overall offset of the GEP.

Fix this by checking that the sign of all offsets is the same.

Fixes https://github.com/llvm/llvm-project/issues/55722.
2022-05-31 11:54:01 +02:00
Sanjay Patel a0c3c60728 [InstCombine] fold shift-right-by-constant with shift-right-of-constant operand
(C2 >> X) >> C1 --> (C2 >> C1) >> X

The shift-left form of this transform has existed since:
16f18ed7b5

...but it applies to matching shift right opcodes too:
https://alive2.llvm.org/ce/z/c5eQms
2022-05-30 15:30:01 -04:00
Sanjay Patel c5d942a4fb [InstCombine] remove unnecessary one-use check from (C2 << X) << C1 fold
The restriction goes back to:
16f18ed7b5
...but the fold only replaces a shift with a shift, so that's not necessary.
Generalizing to other opcodes is planned as a follow-up.
2022-05-30 15:17:54 -04:00
Nikita Popov a770f534e6 [InstCombine] When swapping GEPs, only keep inbounds if both are
If only one of the GEPs is inbounds, then after swapping, there is
no guarantee that one of them will be inbounds as well
(see e.g. https://alive2.llvm.org/ce/z/agaCnp).

This is only a partial fix, because even if both are inbounds, the
result is not necessarily inbounds (if the offsets have different
signs).
2022-05-30 17:04:42 +02:00
Nikita Popov 2d7bab666f [InstCombine] Always create new GEPs when swapping GEPs
As the long explanatory comment attests, performing the modification
in place is pretty tricky. Drop this unnecessary complexity and
always create new instructions.

This should be NFC-ish, but can probably cause difference due to
worklist order.
2022-05-30 16:48:52 +02:00
zhongyunde 3e6ba89055 [InstCombine] Fold a mul with bool value into and
Fixes https://github.com/llvm/llvm-project/issues/55599
  X * Y --> X & Y, iff X, Y can be only {0, 1}.
https://alive2.llvm.org/ce/z/_RsTKF

Reviewed By: spatel, nikic

Differential Revision: https://reviews.llvm.org/D126040
2022-05-30 21:05:00 +08:00
Chenbing Zheng ef256ed58e [InstCombine] bitcast (extractelement <1 x elt>, dest) -> bitcast(<1 x elt>, dest)
Only solve dest type is vector to avoid inverse transform in visitBitCast.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125951
2022-05-30 10:16:32 +08:00
Sanjay Patel b5b6aa4d53 [InstCombine] fold multiply by signbit-splat to cmp+select
(ashr i32 X, 31) * C --> (X < 0) ? -C : 0
https://alive2.llvm.org/ce/z/G8u9SS

With a constant operand, this is an improvement in IR
and codegen (where it can be converted to a mask op).

Without a constant operand, we would have to negate
the operand, so that is probably better left to the backend.

This is similar but not the same optimization that is requested
in #55618.
2022-05-27 11:54:19 -04:00
Sanjay Patel 5a6e085757 [InstCombine] reduce code duplication; NFC 2022-05-27 11:54:19 -04:00
Sanjay Patel c4c750058f [InstCombine] fold mul of signbit directly to X < 0 ? Y : 0
This is effectively NFC (intentionally no test diffs)
because we already have the related fold that converts
the 'and' pattern to select. So this is just an efficiency
improvement.
2022-05-26 16:19:15 -04:00
Sanjay Patel 49f8b05137 [InstCombine] fold icmp equality with sdiv and SMIN
This extends the fold from D126410 / 3952c905ef
to allow for the only case where it works with signed
division:
https://alive2.llvm.org/ce/z/k7_ypu

(X s/ Y) == SMIN --> (X == SMIN) && (Y == 1)
(X s/ Y) != SMIN --> (X != SMIN) || (Y != 1)

This is another improvement based on #55695.
2022-05-26 16:19:15 -04:00
Sanjay Patel ed5be1523f [InstCombine] reduce code duplication in icmp+div folds; NFC 2022-05-26 16:19:15 -04:00
Sanjay Patel 3952c905ef [InstCombine] fold icmp equality with udiv and large constant
With large compare constant:
(X u/ Y) == C --> (X == C) && (Y == 1)
(X u/ Y) != C --> (X != C) || (Y != 1)

https://alive2.llvm.org/ce/z/EhKwh6

There are various potential missing icmp (div) transforms shown here:
https://github.com/llvm/llvm-project/issues/55695

This is a generalization for part of the udiv + equality.
I didn't check in detail, but some of those may only make sense as
codegen transforms.

This results in one extra instruction in IR, but it is better for
analysis, and looks much better in codegen on all targets that I tried.

Differential Revision: https://reviews.llvm.org/D126410
2022-05-26 09:08:47 -04:00
Chenbing Zheng 1486a9c9fe [InstCombine] [NFC] refector foldXorOfICmps
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126268
2022-05-26 11:07:18 +08:00
Chenbing Zheng 41aab93afc [InstCombine] bitcast(logic(bitcast(X), bitcast(Y))) -> bitcast'(logic(bitcast'(X), Y))
This patch break foldBitCastBitwiseLogic limite the destination
must have an integer element type, and eliminate one bitcast by
doing the logic op in the type of the input that has an integer
element type.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126184
2022-05-26 10:23:44 +08:00
Chenbing Zheng 269e3f7369 [InstCombine] [NFC] Move transforms for truncated shifts into narrowBinOp
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126056
2022-05-25 10:21:39 +08:00
Sanjay Patel 05527b68a0 [InstCombine] fold more shuffles with FP<->Int cast operands
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)

This extends the transform added with 0353c2c996.

If the shuffle reduces vector length, the transform
reduces the width of the cast, so that should be a
win for most codegen (if not, it can be inverted).
2022-05-24 15:11:38 -04:00
Nikita Popov e6e0eb3bc8 [InstCombine] Strip bitcasts in GEP diff fold
Bitcasts were stripped in one case, but not the other. Of course,
this no longer really matters with opaque pointers, but as I went
through the trouble of tracking this down, we may as well remove
one typed vs opaque pointer optimization discrepancy.
2022-05-24 16:12:01 +02:00
Nikita Popov b2a13d3e2d [InstCombine] Use IRBuilder in freeze pushing transform (PR55619)
Use IRBuilder so that the newly created freeze instructions
automatically gets inserted back into the IC worklist.

The changed worklist processing order leads to some cosmetic
differences in tests.

Fixes https://github.com/llvm/llvm-project/issues/55619.
2022-05-24 15:48:28 +02:00
Nikita Popov a7c079aaa2 [InstCombine] Support logical and in masked icmp fold
Most of the folds implemented in this function work fine with
logical operations. We only need to be careful for the cases that
work on non-constant masks, where the RHS operand shouldn't be
poison.

This is a conservative implementation that bails out of illegal
transforms, but we could also change these to insert freeze instead.
2022-05-24 11:16:33 +02:00
Nikita Popov 5abaabed22 [InstCombine] Use m_APInt() in asymmetric masked icmp fold
This is mostly intended as code cleanup, but it does also add
support for splat vectors to this fold.
2022-05-24 10:57:28 +02:00
Nikita Popov c0e06c7448 [InstCombine] Handle logical and/or in recursive and/or of icmps fold
The and/or of icmps fold is also applied in reassociated form.
However, this currently only happens for bitwise and of bitwise
and, but not for bitwise and of logical and (or other combinations,
but this is the one being addressed here).

We can do this for bitwise+logical combinations as well, but need
to be a bit careful about which of the resulting ands are logical:
https://alive2.llvm.org/ce/z/WYSjGh
https://alive2.llvm.org/ce/z/guxYnz
https://alive2.llvm.org/ce/z/S5SYxY
https://alive2.llvm.org/ce/z/2rAWeW
2022-05-24 10:13:10 +02:00
Sanjay Patel e8c20d995b [IR] add and use pattern match specialization for sqrt intrinsic; NFC
This was included in D126190 originally, but it's
independent and a useful change for readability.
2022-05-23 14:16:30 -04:00
Nikita Popov f45c1e436e [InstCombine] Change operand order in recursive and/or of icmps fold
The order obviously doesn't matter for bitwise and/or, but would
matter for logical and/or, so change it to preserve the original
order.
2022-05-23 17:29:33 +02:00
Sanjay Patel 1ebad988b1 [InstCombine] fold icmp of zext bool based on limited range
X <u (zext i1 Y) --> (X == 0) && Y

https://alive2.llvm.org/ce/z/avQDRY

This is a generalization of 4069cccf3b based on the post-commit suggestion.
This also adds the i1 type check and tests that were missing from the earlier
attempt; that commit caused several bot fails and was reverted.

Differential Revision: https://reviews.llvm.org/D126171
2022-05-23 09:59:21 -04:00
Nikita Popov 45226d04f0 [InstCombine] Reuse icmp of and/or folds for logical and/or
Similarly to a change recently done for fcmps, add a flag that
indicates whether the and/or is logical to foldAndOrOfICmps, and
reuse the function when folding logical and/or.

We were already calling some parts of it, but this gives us a
clearer indication of which parts may need poison-safe variants,
and would also allow to fold combinations of bitwise and logical
and/or.

This change should be close to NFC, because all folds this enables
were either already called previously, or can make use of implied
poison reasoning.
2022-05-23 15:37:07 +02:00
Sanjay Patel cba0ebd576 Revert "[InstCombine] fold icmp with sub and bool"
This reverts commit 4069cccf3b.
This causes bot failures, and there's a possibly a better way to get this and other patterns.
2022-05-22 12:13:20 -04:00
Sanjay Patel 4069cccf3b [InstCombine] fold icmp with sub and bool
This is the specific pattern seen in #53432, but it can be extended
in multiple ways:
1. The 'zext' could be an 'and'
2. The 'sub' could be some other binop with a similar ==0 property (udiv).

There might be some way to generalize using knownbits, but that
would require checking that the 'bool' value is created with
some instruction that can be replaced with new icmp+logic.

https://alive2.llvm.org/ce/z/-KCfpa
2022-05-22 11:51:07 -04:00
Sanjay Patel f0071d43e4 [InstCombine] add use check to fold of bitwise logic with cast ops
This was shown as a potential regression in D126040.
2022-05-20 09:08:53 -04:00
Chenbing Zheng cf348f6a2c [InstCombine] [NFC] Use a pattern matcher for ExtractElementInst
Reviewed By: RKSimon, rampitec

Differential Revision: https://reviews.llvm.org/D125857
2022-05-20 10:31:40 +08:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Chenbing Zheng ffaaf2498b [InstCombine] (rot X, ?) == 0/-1 --> X == 0/-1
In this patch we add a function foldICmpInstWithConstantAllowUndef
to fold integer comparisons with a constant operand: icmp Pred X, C
where X is some kind of instruction and C is AllowUndef.

We move this fold to the new function, so that it can solve undef elts in a vector.

Reviewed By: spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D125220
2022-05-19 11:22:26 +08:00
Chenbing Zheng 51df77f36d [InstCombine] Allow undef vectors when foldSelectToCopysign
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125671
2022-05-19 10:57:49 +08:00
Sanjay Patel ebbc37391f [InstCombine] allow variable shift amount in bswap + shift fold
When shifting by a byte-multiple:
bswap (shl X, Y) --> lshr (bswap X), Y
bswap (lshr X, Y) --> shl (bswap X), Y

This was limited to constants as a first step in D122010 / 60820e53ec ,
but issue #55327 shows a source example (and there's a test based on that here)
where a variable shift amount is used in this pattern.
2022-05-18 14:38:16 -04:00
Sanjay Patel 990cc49ca0 [InstCombine] avoid crash on fold of icmp with cast operand
We could do better by inserting a bitcast from scalar int
to vector int or using an insertelement (the alternate test
does not crash because there's an independent fold like that).

But this doesn't seem like a likely pattern, so just bail out
for now.

Fixes issue #55516.
2022-05-18 09:16:30 -04:00
Sanjay Patel be6d7cc93c [InstCombine] reduce code duplication for checking types; NFC 2022-05-18 09:16:30 -04:00
Sanjay Patel dbf3b5f114 [InstCombine] fold more shuffles with FP<->Int cast operands
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)

This extends the transform added with 0353c2c996.

If the casts are to a larger element type, the transform
reduces shuffle bit width, so that should be a win for
most codegen (if not, it can be inverted).
2022-05-17 14:25:11 -04:00
Sanjay Patel f31d39c42c [InstCombine] remove cast-of-signbit to shift transform
The transform was wrong in 3 ways:

1. It created an extra instruction when the source and dest types don't match.
2. It did not account for an extra use of the icmp, so could create 2 extra insts.
3. It favored bit hacks over icmp (icmp generally has better analysis).

This fixes #54692 (modeled by the PhaseOrdering tests).

This is a minimal step to fix the bug, but we should likely invert
this and the sibling transform for the "is negative" pattern too.

The backend should be able to invert this back to a shift if that
leads to better codegen.

This is a reduced try of 3794cc0e99 - that was reverted because
it could cause infinite loops by conflicting with the related
transforms in this block that create shifts.
2022-05-17 11:10:28 -04:00
Nikita Popov a694546f7c [KnownBits] Add operator==
Checking whether two KnownBits are the same is somewhat common,
mainly in test code.

I don't think there is a lot of room for confusion with "determine
what the KnownBits for an icmp eq would be", as that has a
different result type (this is what the eq() method implements,
which returns Optional<bool>).

Differential Revision: https://reviews.llvm.org/D125692
2022-05-17 09:38:13 +02:00
Sanjay Patel 07d549bce9 Revert "[InstCombine] invert canonicalization for cast of signbit test"
This reverts commit 3794cc0e99.
This change is suspected of causing bots to hang at stage 2
compiles, so reverting to confirm and investigate.
2022-05-16 17:47:02 -04:00
Sanjay Patel 3794cc0e99 [InstCombine] invert canonicalization for cast of signbit test
The existing transform was wrong in 3 ways:
1. It created an extra instruction when the source and dest types don't match.
2. It did not account for an extra use of the icmp, so could create 2 extra insts.
3. It favored bit hacks over icmp (icmp generally has better analysis).

This fixes #54692 (modeled by the PhaseOrdering tests).

This is a minimal step to fix the bug, but we should likely invert
the sibling transform for the "is negative" pattern too.

The backend should be able to invert this back to a shift if that
leads to better codegen.
2022-05-16 12:55:52 -04:00
Sanjay Patel be7f09f7b2 [IR] create and use helper functions that test the signbit; NFCI 2022-05-16 11:26:23 -04:00
Biplob Mishra ec4adf1f6c [InstCombine] Combine instructions of type or/and where AND masks can be combined.
The patch simplifies some of the patterns as below

(A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
((B & C0) | A) | (B & C1) -> (B & C0|C1) | A

In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.

Differential Revision: https://reviews.llvm.org/D124119
2022-05-16 12:43:33 +01:00
Chenbing Zheng acbad5086a [InstCombine] [NFC] separate a function foldICmpBinOpWithConstant
There is a long function foldICmpInstWithConstant,
we can separate a function foldICmpBinOpWithConstant from it.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125457
2022-05-14 10:54:15 +08:00
Nikita Popov ed1cb01baf [IRBuilder] Add IsInBounds parameter to CreateGEP()
We commonly want to create either an inbounds or non-inbounds GEP
based on a boolean value, e.g. when preserving inbounds from
existing GEPs. Directly accept such a boolean in the API, rather
than requiring a ternary between CreateGEP and CreateInBoundsGEP.

This change is not entirely NFC, because we now preserve an
inbounds flag in a constant expression edge-case in InstCombine.
2022-05-13 14:30:55 +02:00
Nikita Popov d9ad6a2c8b [InstCombine] Fix unused variable warning (NFC) 2022-05-13 12:43:21 +02:00
Chenbing Zheng 2a0837aab1 [InstCombine] fix sub(add(X,Y),umin(Y,Z)) --> add(X,usub.sat(Y,Z))
This patch fix bug left in D124503. We should do
sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Z,Y)) instead of
sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Y,Z)).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125352
2022-05-13 09:54:10 +08:00
Sanjay Patel 2fa8fc3d0a [InstCombine] freeze operand in div+mul fold
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.

This is similar to recent changes for urem and sdiv:
d428f09b2c
99ef341ce9

There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.

Presumably, in real cases that are similar to the tests
where a subsequent transform removes the rem, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
2022-05-12 13:49:29 -04:00
Sanjay Patel 99ef341ce9 [InstCombine] freeze operand in sdiv expansion
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.

This is similar to a recent change for urem:
d428f09b2c

There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.

Presumably, in real cases that are similar to the tests
where a subsequent transform removes the select, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
2022-05-11 14:01:28 -04:00
Sanjay Patel d428f09b2c [InstCombine] freeze operand in urem expansion
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.

There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
2022-05-11 12:47:26 -04:00
Nikita Popov 6001bfcedc [InstCombine] Freeze other uses of frozen value
If there is a freeze %x, we currently replace all other uses of %x
with freeze %x -- as long as they are dominated by the freeze
instruction. This patch extends this behavior to cases where we
did not originally dominate the use by moving the freeze
instruction directly after the definition of the frozen value.

The motivation can be seen in test @combine_and_after_freezing_uses:
Canonicalizing everything to freeze %x allows folds that are based
on value identity (i.e. same operand occurring in two places) to
trigger. This also covers the case from D125248.

Differential Revision: https://reviews.llvm.org/D125321
2022-05-11 16:47:12 +02:00
Sanjay Patel 0353c2c996 [InstCombine] fold shuffles with FP<->Int cast operands
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)

This is similar to a recent transform with fneg ( b331a7ebc1 ),
but this is intentionally the most conservative first step to
try to avoid regressions in codegen. There are several
restrictions that could be removed as follow-up enhancements.

Note that a cast with a unary shuffle is currently canonicalized
in the other direction (shuffle after cast - D103038 ). We might
want to invert that to be consistent with this patch.
2022-05-10 14:20:43 -04:00
Nikita Popov d222bab672 [InstCombine] Handle GEP scalar/vector base mismatch (PR55363)
30a12f3f63 switched the type check
to use the GEP result type rather than the GEP operand type.
However, the GEP result types may match even if the operand types
don't, in case GEPs with scalar/vector base and vector index
are compared.

Fixes https://github.com/llvm/llvm-project/issues/55363.
2022-05-10 11:26:43 +02:00
Sanjay Patel 8650f05c97 [InstCombine] fix miscompile when casting int->FP->int
As shown in https://github.com/llvm/llvm-project/issues/55150 -
the existing fold may be wrong when converting to a signed value.
This is a quick fix to avoid the miscompile.

I added tests/comments for all of the signed/unsigned combinations
at either side of the boundary width, and tried to confirm with Alive2:
https://alive2.llvm.org/ce/z/3p9DSu

There are already some TODO items in the test file that suggest
possible refinements, so the regression with ui->FP->si is probably ok.
It seems unlikely that we'd see these kind of edge cases with
non-byte-width integer types in real code. The potential miscompile
went undetected for several years.

This and 747c6a0c73 fixes #55150.

Differential Revision: https://reviews.llvm.org/D124692
2022-05-07 08:46:25 -04:00
Serge Pavlov eb28da89a6 [InstCombine] Remove side effect of replaced constrained intrinsics
If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions executed in runtime. It happened because constrained
intrinsics usually have side effect, it is used to model the interaction
with floating-point environment. In some cases side effect is actually
absent or can be ignored.

This change adds specific treatment of constrained intrinsics so that
their side effect can be removed if it actually absents.

Differential Revision: https://reviews.llvm.org/D118426
2022-05-07 19:04:11 +07:00
Chenbing Zheng 394c683d40 [InstCombine] sub(add(X,Y),umin(Y,Z)) --> add(X,usub.sat(Y,Z))
Alive2: https://alive2.llvm.org/ce/z/2UNVbp

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D124503
2022-05-07 17:17:48 +08:00
Chenbing Zheng 8eaa1ef0d8 [InstCombine] add casts from splat-a-bit pattern if necessary
Splatting a bit of constant-index across a value:
sext (ashr (trunc iN X to iM), M-1) to iN --> ashr (shl X, N-M), N-1
If the dest type is different, use a cast (adjust use check).

https://alive2.llvm.org/ce/z/acAan3

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D124590
2022-05-07 15:34:57 +08:00
Sanjay Patel b331a7ebc1 [InstCombine] canonicalize fneg after shuffle
For the unary shuffle pattern, this is opposite to what we try
to do with binops, but it seems better to keep it consistent
with the motivating binary shuffle pattern. On that, it is
clearly better on the usual no-extra uses case.

There is a chance that this will pull an fneg away from some
other binop and cause a regression in codegen, but that should
be invertible in the backend. The transform is birectional:
https://alive2.llvm.org/ce/z/kKaKCU
https://alive2.llvm.org/ce/z/3Desfw

Fixes #45631
2022-05-06 16:30:26 -04:00
Nikita Popov 82190f917a [InstCombine] Fold icmp of select with implied condition
When threading the icmp over the select, check whether the
condition can be folded when taking into account the select
condition.
2022-05-06 17:13:32 +02:00
Nikita Popov 0863abe3ac [InstCombine] Fold icmp of select with non-constant operand
Try to push an icmp into a select even if the icmp operand isn't
constant - perform a generic SimplifyICmpInst instead.

This doesn't appear to impact compile-time much, and forming
logical and/or is generally profitable, as we have very good
support for them.
2022-05-06 16:04:39 +02:00
Nikita Popov b457ac4240 [InstCombine] Extract icmp of select transform (NFC)
To make it either to extend to the case where the other operand
is not a constant.
2022-05-06 14:46:44 +02:00
Fraser Cormack bafab9c09f [InstCombine] Fix scalable-vector bitwise select matching
D113035 enhanced the matching of bitwise selects from vector types. This
change unfortunately introduced crashes as it tries to cast scalable
vector types to integers.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D124997
2022-05-06 12:59:39 +01:00
Chenbing Zheng 4c8c101b49 [InstCombine] try to narrow more shifted bswap-of-zext
Try to narrow more bswap, if the shift amount is less than the zext
(bswap (zext X)) >> C --> (zext (bswap X)) << C'

https://alive2.llvm.org/ce/z/i7ddjn

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124598
2022-05-06 10:45:10 +08:00
Serge Pavlov e1554ac63a Revert "[InstCombine] Remove side effect of replaced constrained intrinsics"
This reverts commit 83914ee96f.
The change caused discussion: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20220502/1034841.html
2022-05-06 01:09:16 +07:00
Serge Pavlov 83914ee96f [InstCombine] Remove side effect of replaced constrained intrinsics
If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions executed in runtime. It happened because constrained
intrinsics usually have side effect, it is used to model the interaction
with floating-point environment. In some cases it is correct behavior
but often the side effect is actually absent or can be ignored.

This change adds specific treatment of constrained intrinsics so that
their side effect can be removed if it actually absents.

Differential Revision: https://reviews.llvm.org/D118426
2022-05-05 12:02:42 +07:00
Alexander Shaposhnikov ec7122f64b [InstCombine] Fold ((A&B)^C)|B
Fold ((A&B)^C)|B into C|B.

https://alive2.llvm.org/ce/z/zSGSor

This addresses the issue https://github.com/llvm/llvm-project/issues/55169

Test plan: ninja check-all

Differential revision: https://reviews.llvm.org/D124710
2022-05-05 00:56:20 +00:00
Sanjay Patel 14f257620c [InstCombine] add type constraint to intrinsic+shuffle fold
This check is in the related fold for binops,
but it was missed when the code was adapted
for intrinsics in 432c199e84. The new test
would crash when trying to create a new
intrinsic with mismatched types.
2022-05-04 13:07:26 -04:00
Sanjay Patel 7e6d318c50 [InstCombine] move shuffle after funnel shift with same-shuffled operands
This extends 432c199e84 and 9c4770eaab with an intrinsic
cited directly in issue #46238

Eventually, we will want to use llvm::isTriviallyVectorizable()
or create some new API for this list, but for now, I am intentionally
making a minimum change to reduce risk and only affect an intrinsic
with regression tests in place.
2022-05-04 13:07:26 -04:00
Sanjay Patel 15042f44a2 [InstCombine] propagate FMF when reordering intrinsics and shuffles
This was missed when extending the fold to allow fma with
9c4770eaab
2022-05-04 12:10:38 -04:00
Sanjay Patel 9c4770eaab [InstCombine] move shuffle after fma with same-shuffled operands
https://alive2.llvm.org/ce/z/sD-JVv

This extends 432c199e84 with a 3 arg intrinsic to demonstrate
that the code works with the extra operand.

Eventually, we will want to use llvm::isTriviallyVectorizable()
or create some new API for this list, but for now, I am intentionally
making a minimum change to reduce risk and only affect an intrinsic
with regression tests in place.
2022-05-04 11:50:38 -04:00
Sanjay Patel 432c199e84 [InstCombine] move shuffle after min/max with same-shuffled operands
This is an intrinsic version of the existing fold for binops.
As a first step, I only allowed min/max, but the code is set
up to make adding more intrinsics easy (with more or less than
2 arguments).

This (and possible follow-ups) are discussed in issue #46238.
2022-05-03 16:23:11 -04:00
Jonas Paulsson 304378fd09 Reapply "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building
libcalls." (was 0f8c626). This reverts commit 14d9390.

The patch previously failed to recognize cases where user had defined a
function alias with an identical name as that of the library
function. Module::getFunction() would then return nullptr which is what the
sanitizer discovered.

In this updated version a new function isLibFuncEmittable() has as well been
introduced which is now used instead of TLI->has() anytime a library function
is to be emitted . It additionally also makes sure there is e.g. no function
alias with the same name in the module.

Reviewed By: Eli Friedman

Differential Revision: https://reviews.llvm.org/D123198
2022-05-02 19:37:00 +02:00
Nikita Popov 95fedfab6c [InstCombine] Handle non-canonical GEP index in indexed compare fold (PR55228)
Normally the index type will already be canonicalized here, but
this is not guaranteed depending on visitation order. The code
was already accounting for a potentially needed sext, but a trunc
may also be needed.

Add a ConstantExpr::getSExtOrTrunc() helper method to make this
simpler. This matches the corresponding IRBuilder method in behavior.

Fixes https://github.com/llvm/llvm-project/issues/55228.
2022-05-02 17:56:01 +02:00
Juneyoung Lee 40a2e35599 [InstCombine] Remove the undef-related workaround code in visitSelectInst
This patch removes an old hack in visitSelectInst that was written to avoid miscompilation bugs in loop unswitch.
(Added via https://reviews.llvm.org/D35811)

The legacy loop unswitch pass will be removed after D124376, and the new simple loop unswitch pass correctly uses freeze to avoid introducing UB after D124252.

Since the hack is not necessary anymore, this patch removes it.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D124426
2022-04-30 20:48:42 +09:00
Nikita Popov 1881711fbb [InstCombine] Remove memset of undef value
This removes memset with undef char. We already do this for stores
of undef value.

This comes with the caveat that this optimization is not, strictly
speaking, legal for undef values, because we might be overwriting
a poison value. However, our entire load/store model currently still
operates on undef values, so we need to support undef here as well
for internal consistency.

Once https://github.com/llvm/llvm-project/issues/52930 is resolved,
these and related folds can be limited to poison -- I've added
FIXMEs to that effect.

Differential Revision: https://reviews.llvm.org/D124173
2022-04-29 14:51:18 +02:00
Nikita Popov 982cbed819 [InstCombine] Fold logical and/or of range icmps with nowrap flags
This is an edge-case where we don't convert to bitwise and/or based
on implies poison reasoning, so explicitly try to perform the fold
in logical form. The transform itself is poison-safe, as both icmps
are based on the same value and any nowrap flags are discarded as
part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used
example).
2022-04-29 14:42:42 +02:00
Nikita Popov 57aaeefc18 [InstCombine] Pass ICmpInsts to foldAndOrOfICmpsUsingRanges() (NFC)
Pass the whole instruction rather than unpacking it. This makes it
easier to reuse the function in another place, as the entire
logic is encapsulated.
2022-04-29 12:46:31 +02:00
Nikita Popov 1f53932a95 [InstCombine] Remove foldAndOrOfEqualityCmpsWithConstants() fold
This fold handles a special subset of foldAndOrOfICmpsUsingRanges(),
use the more generic implementation instead.

The result can differ if a representation using a range comparison
is possible, in which case that is preferred over masking. There is
a canonicalization opportunity here.
2022-04-29 12:23:00 +02:00
Nikita Popov 5515263e44 [InstCombine] Fold and of two ranges differing by mask
This is the de Morgan conjugated variant of the existing fold for
ors. Implement this by switching the range code to always work
on ors and perform invert operands at the start and end. This makes
reasoning easier and makes the extension more obviosuly correct.
2022-04-29 12:01:38 +02:00
Nikita Popov d5ee20fcc9 [InstCombine] Switch an or of icmps fold to use constant ranges
We can express this fold more naturally when working on the constant
range implementation. This change is not entirely NFC, because the
code now also handles cases that don't match the precise pattern
this previously looked for, e.g. we can omit an add on one of the
ranges.
2022-04-29 11:15:54 +02:00
Nikita Popov 90dba831ae [InstCombine] Fold or of icmp ne trunc/and
This adds the de Morgan conjugated variant for the existing
"and eq" style fold.

Proof: https://alive2.llvm.org/ce/z/tkNAcG
2022-04-28 15:07:16 +02:00
Nicolas Abram Lujan f8a574bf4d [InstCombine] C0 >> (X - C1) --> (C0 << C1) >> X
With the right pre-conditions, we can fold the offset
into the shifted constant:
https://alive2.llvm.org/ce/z/drMRBU
https://alive2.llvm.org/ce/z/cUQv-_

Fixes #55016

Differential Revision: https://reviews.llvm.org/D124369
2022-04-27 14:18:30 -04:00
Roman Lebedev ffafa71f64
[InstCombine] 'round up integer': if bias is just right, just reuse instructions
This is only useful if we can't create new instruction
because %x.aligned has other uses and already sticks around.
2022-04-27 17:27:02 +03:00
Roman Lebedev aac0afd1dd
[InstCombine] Fold 'round up integer' pattern (when alignment is a power of two)
But don't deal with non-splats.

The test coverage is sufficiently exhaustive,
and alive is happy about the changes there.

Example with constants: https://alive2.llvm.org/ce/z/EUaJ5- / https://alive2.llvm.org/ce/z/Bkng2X
General proof: https://alive2.llvm.org/ce/z/3RjJ5A
2022-04-27 17:26:55 +03:00
Nikita Popov c103f5e9da [InstCombine] Combine opaque pointer GEPs with mismatching element types
Currently, two GEPs will only be combined if the result element
type of one is the same as the source element type of the other.
However, this means we may miss folding opportunities where the
second GEP could be rewritten using a different element type. This
is especially relevant for opaque pointers, where constant GEPs
often use i8 element type.

Address this by converting GEP indices to offsets, adding them,
and then converting them back to indices. The first (inner) GEP
is allowed to have variable indices as well, in which case only
the constant suffix is converted into an offset.

This should address the regression reported in
https://reviews.llvm.org/D123300#3467615.

Differential Revision: https://reviews.llvm.org/D124459
2022-04-27 09:33:47 +02:00
Ricky Zhou 4041c44853 [InstCombine] Update predicate when canonicalizing comparisons in canonicalizeClampLike.
canonicalizeClampLike canonicalizes the ule/ugt comparisons to ult/uge,
respectively. However, it does not update the variable holding the
comparison predicate type after doing this. Later code fails to handle
the non-canonical predicate type (specifically, the swap of
ThresholdLowIncl and ThresholdHighExcl when Pred0 has been canonicalized
from ugt to uge). This leads to the miscompile reported in PR53252. Fix
this by updating the comparison predicate after canonicalizing.

Fixes #53252

Differential Revision: https://reviews.llvm.org/D119690
2022-04-26 17:35:45 -04:00
Sanjay Patel 903aa5e0f8 [InstCombine] try to fold icmp with mismatched extended operands
If a value is known to be non-negative and zexted,
that's the same thing as sexted.

So for the purpose of looking past the casts with
an icmp, treat it as if it was a sext:
https://alive2.llvm.org/ce/z/_BDsGV

This is necessary, but not enough to solve the
motivating problem:
https://github.com/llvm/llvm-project/issues/55013

Differential Revision: https://reviews.llvm.org/D124419
2022-04-26 14:26:36 -04:00
Sanjay Patel c8ed784ee6 [InstCombine] fold freeze of partial undef/poison vector constants
We can always replace the undef elements in a vector constant
with regular constants to get rid of the freeze:
https://alive2.llvm.org/ce/z/nfRb4F

The select diffs show that we might do better by adjusting the
logic for a frozen select condition. We may also want to refine
the vector constant replacement to consider forming a splat.

Differential Revision: https://reviews.llvm.org/D123962
2022-04-26 14:16:11 -04:00
Sanjay Patel 6631907ad2 [InstCombine] use isKnownNonNegative to reduce code duplication; NFC
We may be able to make the ValueTracking wrapper smarter
in the future (for example, analyze a simple recurrence),
so this will automatically benefit if that happens.
2022-04-25 17:13:29 -04:00
Nikita Popov e8945110d2 [InstCombine] Remove redundant unsigned underflow fold (NFCI)
This is now handled as a combination of two other folds:
(A+B) <= A & (A+B) != 0  -->  (A+B)-1 < A
(A+B)-1 < A  -->  -B < A
2022-04-25 14:22:43 +02:00
Nikita Popov ee50925894 [InstCombine] Fold (X != 0) & (Y u>= X)
This adds the De Morgan conjugated fold for the existing
(X == 0) | (Y u< X) fold.

Proof: https://alive2.llvm.org/ce/z/3Me3JQ
2022-04-25 13:16:47 +02:00
Nikita Popov 2bec8d6d59 [InstCombine] Fold X + Y + C u< X
This is a variation on the X + Y u< X fold with an extra constant.
Proof: https://alive2.llvm.org/ce/z/VNb8pY
2022-04-25 12:53:39 +02:00
Chenbing Zheng 5805cfb901 [InstCombine] Complete folding of fneg-of-fabs
This patch add a function foldSelectWithFCmpToFabs, and do more combine for
fneg-of-fabs.
With 'nsz':
fold (X <  +/-0.0) ? X : -X or (X <= +/-0.0) ? X : -X to -fabs(x)
fold (X >  +/-0.0) ? X : -X or (X >= +/-0.0) ? X : -X to -fabs(x)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D123830
2022-04-25 09:53:36 +08:00
Simon Pilgrim ffe13960b5 [InstCombine] Fold (A & 2^C1) + A => A & (2^C1 - 1) iff bit C1 in A is a sign bit (PR21929)
Alive2: https://alive2.llvm.org/ce/z/Ygq26C

This is the final missing fold to handle the modulo2 simplification: https://github.com/llvm/llvm-project/issues/22303

Fixes #22303

Differential Revision: https://reviews.llvm.org/D123374
2022-04-22 16:59:02 +01:00
Nikita Popov 369ef9bf60 [InstCombine] Extract code for or of icmp eq zero and icmp fold (NFC)
To make it easier to extend this to the congruent and case.
2022-04-22 16:48:59 +02:00
Nikita Popov ba46ae7bd8 [InstCombine] Merge foldAndOfICmps() and foldOrOfICmps() (NFCI)
Folds are supposed to always be added in conjugated pairs for and
and or. Merge the two functions to make folds for which this is
currently not the case more obvious.
2022-04-22 12:48:03 +02:00
Nikita Popov 3e1d2c352c [InstCombine] Fix or of commuted foldable predicates
1d90e53044 switch this code to store
the predicates and operands in variables, but retained a
swapOperands() call here. Thus the commuted cases were no longer
folded. Additionally, as the change was not reported, the next
InstCombine iteration would not pick it up either.
2022-04-22 12:31:26 +02:00
Sanjay Patel 664ae7bbcc [InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X (2nd try)
The first attempt at this missed a check to make sure the offset
constant was in range and caused many bot failures.

That was missed in the Alive2 proof because on overshift creates
poison rather than the assert from APInt. Here's an alternate
attempt at a proof using count-trailing-zeros:
https://alive2.llvm.org/ce/z/pnXQYR

Original commit message:

This is similar to an existing pre-shift-of-constant fold:
8a9c70fc01
...but in this case, we need no-wrap on the shl and a negative
offset:
https://alive2.llvm.org/ce/z/_RVz99
2022-04-21 16:18:46 -04:00
chenglin.bi 25aba1abb5 Revert "[InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1)"
This reverts commit b543d28df7.
2022-04-22 00:56:20 +08:00
chenglin.bi b543d28df7 [InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1)
Follow up D123453, add one-use limitation for
(X * C2) << C1 --> X * (C2 << C1)
to make consistent with
lshr (mul nuw x, MulC), ShAmtC -> mul nuw x, (MulC >> ShAmtC)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D124183
2022-04-22 00:32:36 +08:00
Sanjay Patel 8960ba7491 Revert "[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X"
This reverts commit 5819f4a422.
This caused bots to fail with a crash/assert during the fold,
so some constraint was missed.
2022-04-21 12:15:27 -04:00
Sanjay Patel 5819f4a422 [InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X
This is similar to an existing pre-shift-of-constant fold:
8a9c70fc01
...but in this case, we need no-wrap on the shl and a negative
offset:
https://alive2.llvm.org/ce/z/_RVz99

Fixes #54890
2022-04-21 11:38:27 -04:00
Nikita Popov 46c2b41d02 [InstCombine] Remove dead code (NFC)
This was a leftover condition without code.
2022-04-21 15:53:53 +02:00
Craig Topper e3f6c2d288 [InstCombine] Don't look through bitcast from vector in collectInsertionElements.
We're making a recursive call here and everything in the function
assumes we're looking at scalars. This would be violated if we
looked through a bitcast from vectors.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D124015
2022-04-20 09:15:32 -07:00
chenglin.bi 1fae4b492d [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor
if c is divisible by (1 << ShAmtC), we can fold this pattern:
lshr (mul nuw x, c), ShAmtC -> mul nuw x, (c >> ShAmtC)

https://alive2.llvm.org/ce/z/ox4wAt

Fix https://github.com/llvm/llvm-project/issues/54824

Reviewed By: spatel, lebedev.ri, craig.topper

Differential Revision: https://reviews.llvm.org/D123453
2022-04-21 00:13:36 +08:00
Sanjay Patel bf09a925f2 [InstCombine] remove likely redundant ValueTracking-based folds for shifts
This is not expected to have a functional difference as discussed in the
post-commit comments for 8a9c70fc01. All of the motivating tests for
the older fold still optimize as expected because other code can infer
the 'nuw'.
2022-04-20 11:28:31 -04:00
Sanjay Patel 8a9c70fc01 [InstCombine] C0 shift (X add nuw C) --> (C0 shift C) shift X
With 'nuw' we can convert the increment of the shift amount
into a pre-shift (constant fold) of the shifted constant:
https://alive2.llvm.org/ce/z/FkTyR2

Fixes issue #41976
2022-04-19 15:21:34 -04:00
Sanjay Patel 3a27b51b27 [InstCombine] reduce code for freeze of undef
The description was ambiguous about the behavior
when boths select arms are constant or both arms
are not constant. I don't think there's any
evidence to support either way, but this matches
the code with a more specified description.

We can extend this to deal with vector constants
with undef/poison elements. Currently, those don't
get folded anywhere.
2022-04-18 15:14:02 -04:00
Sanjay Patel 2c2568f39e [InstCombine] canonicalize select with signbit test
This is part of solving issue #54750 - in that example
we have both forms of the compare and do not recognize
the equivalence.
2022-04-14 14:28:47 -04:00
Liqin Weng fa4b4f0fcb [InstCombine] fold more constant remainder to select-of-constants remainder
Reviewed By: xbolva00, spatel, Chenbing.Zheng

Differential Revision: https://reviews.llvm.org/D123486
2022-04-12 09:40:56 +08:00
Alexander Shaposhnikov f6bb156fb1 [InstCombine] Fold icmp(X) ? f(X) : C
This diff extends foldSelectInstWithICmp to handle the case icmp(X) ? f(X) : C
when f(X) is guaranteed to be equal to C for all X in the exact range of the inverse predicate.
This addresses the issue https://github.com/llvm/llvm-project/issues/54089.

Differential revision: https://reviews.llvm.org/D123159

Test plan: make check-all
2022-04-12 01:32:55 +00:00
Sanjay Patel 1206a18d41 [InstCombine] guard against splat-mul corner case
The test is already simplified, and I'm not sure how
to write a test to exercise the new clause. But it
protects the 2-bit pattern from miscompiling as noted
in D123453.

https://alive2.llvm.org/ce/z/QPyVfv
(If we managed to fall into the mul transform, it
would wrongly create a zero on this pattern.)
2022-04-11 15:50:13 -04:00
Sanjay Patel 7783db55af [InstCombine] try to fold low-mask of ashr to lshr
With one-use, we handle this via demanded-bits.
But We need to handle extra uses to improve issue #54750.

https://alive2.llvm.org/ce/z/aDYkPv
2022-04-11 11:56:40 -04:00
Simon Pilgrim 431e93f4f5 [InstCombine] Fold sub(add(x,y),min/max(x,y)) -> max/min(x,y) (PR38280)
As discussed on Issue #37628, we can flip a min/max node if we're subtracting from the sum of the node's operands

Alive2: https://alive2.llvm.org/ce/z/W_KXfy

Differential Revision: https://reviews.llvm.org/D123399
2022-04-11 11:32:56 +01:00
serge-sans-paille aa15ea47e2 [builtin_object_size] Basic support for posix_memalign
It actually implements support for seeing through loads, using alias analysis to
refine the result.

This is rather limited, but I didn't want to rely on more than available
analysis at that point (to be gentle with compilation time), and it does seem to
catch common scenario, as showcased by the included tests.

Differential Revision: https://reviews.llvm.org/D122431
2022-04-08 09:31:11 +02:00
Chenbing Zheng 467cbb6249 [InstCombine] fold more constant divisor to select-of-constants divisor
By adding a parameter to function FoldOpIntoSelect, we can fold more Ops to Select.
For this example, we tend to fold the division instruction,
so we no longer care whether SelectInst is one use.

This patch slove TODO left in InstCombine/div.ll.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122967
2022-04-08 10:19:24 +08:00
Augie Fackler f3c702fbd1 InstCombineCalls: fix annotateAnyAllocCallSite to report changes
Spotted during review of D123052.

Differential Revision: https://reviews.llvm.org/D123232
2022-04-07 13:49:09 -04:00
Augie Fackler f120be6c86 InstCombineCalls: when adding an align attribute, never reduce it
Sometimes we can infer an align from an allocalign but the function
already promised it'd be more-aligned than the allocalign and there's an
existing align that we shouldn't reduce. Make sure we handle that
correctly.

Differential Revision: https://reviews.llvm.org/D121642
2022-04-07 12:38:44 -04:00
Augie Fackler ca051a46fb InstCombineCalls: infer return alignment from allocalign attributes
This exposes a couple of lingering bugs, which will be fixed in
the next two commits.

Differential Revision: https://reviews.llvm.org/D123052
2022-04-07 12:38:44 -04:00
Simon Pilgrim afa1ae9e0c [InstCombine] SimplifyDemandedUseBits - allow and(srem(X,Pow2),C) -> and(X,C) to work on vector types
Replace m_ConstantInt with m_APInt to match uniform (no-undef) vector remainder amounts.
2022-04-07 15:24:45 +01:00
Simon Pilgrim 5909c67883 [InstCombine] SimplifyDemandedUseBits - add TODO to remove shl node if we only demand known sign bits of the shift source
Similar to what we already perform for ashr/lshr
2022-04-07 14:35:11 +01:00
Simon Pilgrim 5e90224839 [InstCombine] SimplifyDemandedUseBits - remove lshr node if we only demand known sign bit
This is a lshr equivalent to D122340 - if we don't demand any of the additional sign bits introduced by the ashr, the lshr can be treated as an ashr and we can remove the shift entirely if we only demand already known sign bits.

Another step towards PR21929

https://alive2.llvm.org/ce/z/6f3kjq

Differential Revision: https://reviews.llvm.org/D123118
2022-04-07 14:33:31 +01:00
Matt Devereau 2c3f66519c [SVE] Extend support for folding select + masked gathers
Extend the work done in D106376 to include masked gathers

Differential Revision: https://reviews.llvm.org/D122896
2022-04-05 16:27:11 +00:00
Alexander Shaposhnikov 6cf10b7e6e [InstCombine] Fold srem(X, PowerOf2) == C into (X & Mask) == C for positive C
This diff extends InstCombinerImpl::foldICmpSRemConstant to handle the cases
srem(X, PowerOf2) == C and
srem(X, PowerOf2) != C
for positive C.
This addresses the issue https://github.com/llvm/llvm-project/issues/54650

Differential revision: https://reviews.llvm.org/D122942

Test plan: make check-all
2022-04-03 03:57:05 +00:00
Sanjay Patel 5f8c2b884d [InstCombine] limit icmp fold with sub if other sub user is a phi
This is a hacky fix for:
https://github.com/llvm/llvm-project/issues/54558

As discussed there, codegen regressed when we opened up this transform
to allow extra uses ( 61580d0949 ), and it's not clear how to
undo the transforms at the later stage of compilation.

As noted in the code comments, there's a set of remaining folds that
are still limited to one-use, so we can try harder to refine and
expand the limitations on these folds, but it's likely to be an
up-and-down battle as we find and overcome similar regressions.

Differential Revision: https://reviews.llvm.org/D122909
2022-04-02 19:23:42 -04:00
Sanjay Patel 97ac0cd6c4 [InstCombine] fold fcmp with lossy casted constant (2nd try)
This is a retry of 9397bdc67e - that was reverted until
we had a clang warning in place to alert users about a
possible mistake in source. The warning was added with
ab982eace6.

This is noted as a missing clang warning in #54222,
but it is also a missing optimization opportunity.

Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDq
https://alive2.llvm.org/ce/z/pE6LRt

I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
2022-04-02 19:23:01 -04:00
Roman Lebedev 308ca349cb
[InstCombine] Fold `(X | C2) ^ C1 --> (X & ~C2) ^ (C1^C2)`
These two are equivalent,
and i *think* the `and` form is more-ish canonical.

General proof: https://alive2.llvm.org/ce/z/RrF5s6

If constant on the (outer) `xor` is an `undef`,
the whole lane is dead: https://alive2.llvm.org/ce/z/mu4Sh2

However, if the constant on the (inner) `or` is an `undef`,
we must sanitize it first: https://alive2.llvm.org/ce/z/MHYJL7
I guess, producing a zero `and`-mask is optimal in that case.

alive-tv is happy about the entirety of `xor-of-or.ll`.
2022-04-03 00:12:56 +03:00
Hirochika Matsumoto a3cffc1150 [InstCombine] Fold (ctpop(X) == 1) | (X == 0) into ctpop(X) < 2
https://alive2.llvm.org/ce/z/94yRMN

Fixes #54177

Differential Revision: https://reviews.llvm.org/D122077
2022-03-29 11:30:06 -04:00
Nikita Popov 682ef39b1a [InstCombine] Remove call to getPointerElementType()
This was erroneously re-introduced as part of
bb0b23174e.
2022-03-29 16:52:29 +02:00
Johannes Doerfert bb0b23174e [InstCombineCalls] Optimize call of bitcast even w/ parameter attributes
Before we gave up if a call through bitcast had parameter attributes.
Interestingly, we allowed attributes for the return value already. We
now handle both the same way, namely, we drop the ones that are
incompatible with the new type and keep the rest. This cannot cause
"more UB" than initially present.

Differential Revision: https://reviews.llvm.org/D119967
2022-03-28 20:57:52 -05:00
chenglin.bi 9a53793ab8 [InstCombine] Fold two select patterns into and-or
select (~a | c), a, b -> and a, (or c, b) https://alive2.llvm.org/ce/z/bnDobs
select (~c & b), a, b -> and b, (or a, c) https://alive2.llvm.org/ce/z/k2jJHJ

Differential Revision: https://reviews.llvm.org/D122152
2022-03-28 16:07:55 -04:00
Simon Pilgrim 6a094a6264 [InstCombine] SimplifyDemandedUseBits - remove ashr node if we only demand known sign bits
We already do this for SelectionDAG, but we're missing it here.

Noticed while re-triaging PR21929

Differential Revision: https://reviews.llvm.org/D122340
2022-03-25 15:39:08 +00:00
Sanjay Patel 5dbb53b1b4 [InstCombine] merge shuffled vector negate and multiply
Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops.

We need a little hack to make the existing logic work because it does not expect to
move constants from op0 to op1, but the code comment hopefully makes that clear.
I don't think there are any other identities like that.

Fixes #54364

Differential Revision: https://reviews.llvm.org/D122390
2022-03-24 10:25:16 -04:00
Dávid Bolvanský 4397504c2d [NFCI] Fix set-but-unused warning in InstCombineAddSub.cpp 2022-03-24 08:33:40 +01:00
chenglin.bi 52f323d0f1 [InstCombine] Fold abs of known negative operand when source is sub
When abs source comes from (x - y), check if a "x > y" dominating
condition exists.

Fixes #54132

Differential Revision: https://reviews.llvm.org/D122013
2022-03-23 15:21:33 -04:00
Sanjay Patel 0fcff69bcb [InstCombine] try to narrow shifted bswap-of-zext (2nd try)
The first attempt at this missed a validity check.
This version includes a test of the narrow source
type for modulo-16-bits.

Original commit message:

This is the IR counterpart to 370ebc9d9a
which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166
2022-03-23 11:28:37 -04:00
Nathan Chancellor 4e0008dcbe
Revert "[InstCombine] try to narrow shifted bswap-of-zext"
This reverts commit 9e9bda2e8f.

This causes a backend error when building the Linux kernel for arm64.
See https://reviews.llvm.org/D122166 for a simplified reproducer.
2022-03-22 17:32:33 -07:00
Philip Reames 7abefc4222 [instcombine] Fold away memset/memmove from otherwise unused alloca
The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers.  As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed.  This is a bit undesirable for such an "obviously" useless bit of code.

As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocInst) to allow read xor write.  Doing that would subsume this code in a more general way, but is also a more involved change.  For the moment, I went with the easiest fix.
2022-03-22 13:48:48 -07:00
Sanjay Patel ccf8c969c2 [InstCombine] reorder code, fix formatting; NFC
The affected code can be updated to solve #54364,
so make some cosmetic diffs before real changes.
2022-03-22 16:33:01 -04:00
Sanjay Patel 60820e53ec [InstCombine] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C

This is an IR implementation of a transform suggested in D120648.
The "swaps cancel" test models the motivating optimization from
that proposal.

Alive2 checks (as noted in the other review, we could use
knownbits to handle shift-by-variable-amount, but that can be an
enhancement patch):
https://alive2.llvm.org/ce/z/pXUaRf
https://alive2.llvm.org/ce/z/ZnaMLf

Differential Revision: https://reviews.llvm.org/D122010
2022-03-22 09:10:55 -04:00
Sanjay Patel 9e9bda2e8f [InstCombine] try to narrow shifted bswap-of-zext
This is the IR counterpart to 370ebc9d9a
which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166
2022-03-22 08:22:30 -04:00
Nikita Popov fc8946fae7 [InstCombine] Remove integer SPF of SPF folds (NFCI)
Now that we canonicalize to intrinsics, these folds should no
longer be needed. Only one fold that also applies to floating-point
min/max is retained.
2022-03-18 10:20:48 +01:00
Andrew Wei 0af3e6a22d [InstCombine] Sink instructions with multiple users in a successor block.
This patch tries to sink instructions when they are only used in a successor block.

This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.

In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
  https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D121585
2022-03-18 11:53:45 +08:00
Nikita Popov 4010a7a5d0 Reapply [InstCombine] Support switch in phi to cond fold
Reapply with an explicit check for multi-edges, as the expected
behavior of multi-edge dominance is unclear (D120811).

-----

For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
2022-03-17 10:03:09 +01:00
Sanjay Patel 598721f866 [InstCombine] try harder to propagate 'nsz' through fneg-of-select
This can be viewed as swapping the select arms:
https://alive2.llvm.org/ce/z/jUvFMJ
...so we don't have the 'nsz' problem with the more general fold.

This unlocks other folds for the motivating fabs example.
This was discussed in issue #38828.
2022-03-15 11:05:29 -04:00
Simon Pilgrim 7e4cf582cf [InstCombine] Add general constant support to eq/ne icmp(add(X,C1),add(Y,C2)) -> icmp(add(X,C1-C2),Y) fold
A further extension for Issue #32161

For eq/ne comparisons - the sign mismatch and bounds constraints are redundant, so if the that fold fails, fallback and just fold the constants directly.

https://alive2.llvm.org/ce/z/cdodNQ

The loop rotation test change looks mostly benign - the backend doesn't seem to suffer? https://gcc.godbolt.org/z/dErMY78To

Differential Revision: https://reviews.llvm.org/D121551
2022-03-15 14:17:38 +00:00
Craig Topper ce78e68261 [InstCombine] Fold select based logic of fcmps with same operands when FMF is present.
If we have a logical and/or in select form and the true/false operand
is an fcmp with poison generating FMF, we won't be able to fold it
to an and/or instruction. This prevents us from optimizing the case
where it is a logical operation of two fcmps with identical operands.

This patch adds explicit checks for this case that doesn't rely on
converting to and/or to do the optimization. It reuses the existing
foldLogicOfFCmps, but adds a new flag to disable the other combine
that is inside that function.

FMF flags from the two FCmps are intersected using the logic added in
D121243. The FIXME has been updated to indicate that we can only use
a union for the non-select form.

This allows us to optimize cases like this from compare-fp-3.c in the
gcc torture suite with fast math.

void
test1 (float x, float y)
{
  if ((x==y) && (x!=y))
    link_error0();
}

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D121323
2022-03-14 14:45:07 -07:00
Sanjay Patel 3491f2f4b0 [InstCombine] replace negated operand in fcmp with 0.0
X (any pred) -X --> X (any pred) 0.0

This works with all FP values and preserves FMF.
Alive2 examples:
https://alive2.llvm.org/ce/z/dj6jhp

This can also create one of the patterns that we match as "fabs"
as shown in one of the test diffs.
2022-03-10 12:53:32 -05:00
Sanjay Patel 9fac110bf7 Revert "[InstCombine] fold fcmp with lossy casted constant"
This reverts commit 9397bdc67e.

This optimization is likely to surprise programmers as seen
in post-commit comments, so we should add a clang warning
first (that is proposed in D121306).
2022-03-10 10:22:22 -05:00
Simon Pilgrim 808d9d260b [InstCombine] Add vector support to icmp(add(X,C1),add(Y,C2)) -> icmp(add(X,C1-C2),Y) fold
As discussed on Issue #32161 this fold can be generalized a lot more than it currently is, but this patch at least adds vector support.

Differential Revision: https://reviews.llvm.org/D121358
2022-03-10 13:30:48 +00:00
Craig Topper f72fe2ef67 [InstCombine] Preserve FMF in foldLogicOfFCmps.
This patch intersects the fast math flags from the two fcmps instead
of dropping them.

I poked at this a bunch with Alive2 for nnan and ninf flags and it seemed
to check out. With the other flags it told me "Couldn't prove the
correctness of the transformation". Not sure if I should just preserve
nnan and ninf?

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D121243
2022-03-09 09:17:09 -08:00
Sanjay Patel 9397bdc67e [InstCombine] fold fcmp with lossy casted constant
This is noted as a missing clang warning in #54222
(and we should still make that enhancement).

Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDq
https://alive2.llvm.org/ce/z/pE6LRt

I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
2022-03-08 12:41:12 -05:00
Arnold Schwaighofer dcdc1f29bb InstCombine: Can't fold a phi arg load into the phi if the load is from a swifterror address
`swifterror` addresses are only allowed as operands to load, store, and
calls.

The following transformation is not allowed. It would create a phi with a
`swifterror` address operand.

```
 %addr = alloca swifterror i8*
 br %cond, label %bb1, label %b22

 bb1:
   %val1 = load i8*, i8** %addr
   br exit

 bb2:
   %val2 = load i8*, i8** %addr
   br exit

 exit:
   %val = phi [%val1, %bb1] [%val2, %bb2]
```

=>

```
 %addr = alloca swifterror i8*
 br %cond, label %bb1, label %b22

 bb1:
   br exit

 bb2:
   br exit

 exit:
   %val_addr = phi [%addr, %bb1] [%addr, %bb2]
   %val2 = load i8*, i8** %val_addr
```

rdar://89865485

Differential Revision: https://reviews.llvm.org/D121217
2022-03-08 09:09:51 -08:00
Augie Fackler 5e4c75db3b InstructionCombining: avoid eliding mismatched alloc/free pairs
Prior to this change LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator::new can be inlined but the body of
operator::delete can't, as in this example from jyknight:

    #include <stdlib.h>
    #include <stdio.h>

    int allocs = 0;

    void *operator new(size_t n) {
        allocs++;
        void *mem = malloc(n);
        if (!mem) abort();
        return mem;
    }

    __attribute__((noinline)) void operator delete(void *mem) noexcept {
        allocs--;
        free(mem);
    }

    void deleteit(int*i) { delete i; }
    int main() {
        int*i = new int;
        deleteit(i);
        if (allocs != 0)
          printf("MEMORY LEAK! allocs: %d\n", allocs);
    }

This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.

Differential Revision: https://reviews.llvm.org/D117356
2022-03-04 10:41:10 -05:00
Craig Topper 608161225e [InstCombine][Analysis] Move getFCmpCode and getPredForFCmpCode to CmpInstAnalysis. NFC
The similar getICmpCode and getPredForICmpCode are already there.
This moves FP for consistency.

I think InstCombine is currently the only user of both.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120754
2022-03-03 09:33:24 -08:00
Nikita Popov c1b9667148 [InstCombine] Support opaque pointers in callee bitcast fold
To make this actually trigger, we also need to check whether the
function types differ, which is a hidden cast under opaque pointers.
The transform is somewhat less relevant there because it is
primarily about pointer bitcasts, but it can also happen with other
bit- or pointer-castable types.

Byval handling is easier with opaque pointers because there is no
need to adjust the byval type, we only need to make sure that it's
still a pointer.
2022-03-03 11:07:39 +01:00
Nikita Popov 6c8adc5054 [InstCombine] Remove unnecessary byval check in callee cast fold
The logic for handling this was fixed in
8d7f118ab2, but the check for byval
on the callee was retained. This resulted in a weird situation
where the transform would work depending on whether the byval
was only on the call or on both the call and the function.
2022-03-03 10:55:14 +01:00
serge-sans-paille 59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output line:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
Nikita Popov 61580d0949 Reapply [InstCombine] Remove one-use limitation from X-Y==0 fold
This is a recommit without changes. I originally reverted this
due to a significant code-size regression on tramp3d-v4, however
further investigation showed that in the tramp3d-v4 case this
change enables additional optimizations (in particular more
jump threading), which happens to reduce the size of a function
just enough to be eligible for inlining at hot callsites, which
results in the code size increase. As such, this was just bad
luck.

-----

This one-use limitation is artificial, we do not increase
instruction count if we perform the fold with multiple uses. The
motivating case is shown in @sub_eq_zero_select, where the one-use
limitation causes us to miss a subsequent select fold.

I believe the backend is pretty good about reusing flag-producing
subs for cmps with same operands, so I think doing this is fine.

Differential Revision: https://reviews.llvm.org/D120337
2022-03-02 16:43:33 +01:00
Nikita Popov 5cf06d10f8 Revert "[InstCombine] Support switch in phi to cond fold"
This reverts commit 0817ce86b5.

Seeing some ppc64le stage2 failures, reverting to investigate.
2022-03-02 12:49:47 +01:00
Nikita Popov 0817ce86b5 [InstCombine] Support switch in phi to cond fold
For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
2022-03-02 12:16:32 +01:00
serge-sans-paille a494ae43be Cleanup includes: TransformsUtils
Estimation on the impact on preprocessor output:
before: 1065307662
after:  1064800684

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
2022-03-01 21:00:07 +01:00
Craig Topper 7bc6667845 [Analysis] Simplify the interface to llvm::getICmpCode. NFC
Instead of passing an InstCmpInt * and a bool just pass the predicate
from the caller.

I'm considering moving the similar FCmp functions from InstCombine
over here and this makes the interface consistent with what is used
for FCmp.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120609
2022-03-01 09:53:27 -08:00
Nikita Popov a1f442b278 [InstCombine] Support phi to cond fold with more than two preds
This transform can still be applied if there are more than two
phi inputs, as long as phi inputs with the same value are dominated
by the same idom edge.
2022-03-01 16:31:49 +01:00