Commit Graph

94 Commits

Author SHA1 Message Date
Sanjay Patel f9f40aa10d [InstCombine] fold negated low-bit-mask to cmp+select
(-(X & 1)) & Y --> (X & 1) == 0 ? 0 : Y
https://alive2.llvm.org/ce/z/rhpH3i

This is noted as a missing IR canonicalization in issue #55618.
We already managed to fix codegen to the expected form.
2022-07-03 12:25:26 -04:00
Sanjay Patel c1c3134ac4 [InstCombine] add tests for and-of-negated-lowbitmask; NFC 2022-07-03 12:25:25 -04:00
Sanjay Patel bfde861935 [InstCombine] convert mask and shift of power-of-2 to cmp+select
When the mask is a power-of-2 constant and op0 is a shifted-power-of-2
constant, test if the shift amount equals the offset bit index:

(ShiftC << X) & C --> X == (log2(C) - log2(ShiftC)) ? C : 0
(ShiftC >> X) & C --> X == (log2(ShiftC) - log2(C)) ? C : 0

This is an alternate to D127610 with a more general pattern.
We match only shift+and instead of the trailing xor, so we see a few
more tests diffs. I think we discussed this initially in D126617.

Here are proofs for shifts in both directions:
https://alive2.llvm.org/ce/z/CFrLs4

The test diffs look equal or better for IR, and this makes the
patterns more uniform in IR. The backend can partially invert this
in both cases if that is profitable. It is not trivially reversible,
however, so if we find perf regressions that are not easy to undo,
then we may want to revert this.

Differential Revision: https://reviews.llvm.org/D127801
2022-06-17 10:51:57 -04:00
chenglin.bi 286198ff04 [InstCombine] Optimize lshr+shl+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3) >= C2`:
    ((C1 >> X) << C2) & C3 -> X == (Log2(C1)+C2-Log2(C3)) ? C3 : 0
https://alive2.llvm.org/ce/z/zvrkKF

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127469
2022-06-14 11:06:10 +08:00
Sanjay Patel 310adb658c [InstCombine] reorder mask folds for efficiency
This shows narrowing improvements on the logic tests
(transforms recently added with e247b0e5c9).

This is not a complete fix. That would require adding
folds to visitOr/visitXor. But it enables the expected
transforms for the basic patterns in the affected tests.
2022-06-13 09:49:57 -04:00
Sanjay Patel e247b0e5c9 [InstCombine] add narrowing transform for low-masked binop with zext operand (2nd try)
The 1st try ( afa192cfb6 ) was reverted because it could
cause an infinite loop with constant expressions.

A test for that and an extra condition to enable the transform
are added now. I also added code comments to better describe
the transform and the existing, related transform.

Original commit message:
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
casts around, and we already have a related fold for a binop
with a constant operand.
2022-06-10 12:42:27 -04:00
Sanjay Patel 6fedc6a2b4 Revert "[InstCombine] add narrowing transform for low-masked binop with zext operand"
This reverts commit afa192cfb6.
This can cause an infinite loop as shown with an example in the
post-commit thread.
2022-06-10 08:25:10 -04:00
chenglin.bi cde377db85 [InstCombine] Add negative vector tests for lshr+shl+and/shl+lshr+and transforms; NFC 2022-06-10 11:36:39 +08:00
chenglin.bi 87b5840b34 [InstCombine] Add baseline tests for lshr+shl+and transforms; NFC 2022-06-10 11:00:41 +08:00
chenglin.bi de7a6ae1ff [InstCombine] Optimize shl+lshr+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3)+C2 < BitWidth`:
    ((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0;

https://alive2.llvm.org/ce/z/Pus5bd

Fix issue https://github.com/llvm/llvm-project/issues/55739

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126617
2022-06-10 09:36:58 +08:00
Sanjay Patel afa192cfb6 [InstCombine] add narrowing transform for low-masked binop with zext operand
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
cast around, and we already have a related fold for a binop
with a constant operand.
2022-06-09 16:59:26 -04:00
Sanjay Patel 48a606d0c7 [InstCombine] add tests for masked binop narrowing; NFC 2022-06-09 16:55:24 -04:00
chenglin.bi 226c564329 [InstCombine] Add vector tests for shl+lshr+and transforms; NFC
D126617
2022-06-09 11:14:26 +08:00
Sanjay Patel 4b2681ffa8 [InstCombine] add/move tests for opposite direction shifts; NFC 2022-06-06 11:35:50 -04:00
chenglin.bi 2e7d4b6619 [InstCombine] Add more tests for shl+lshr transforms; NFC 2022-06-06 10:15:48 +08:00
chenglin.bi cfdd2b1aef [InstCombine] Fix tests const value for shl+lshr transforms; NFC 2022-06-06 10:08:56 +08:00
chenglin.bi 0cbd5d3ded [InstCombine] Add more tests for shl+lshr transforms; NFC 2022-06-06 09:59:41 +08:00
Sanjay Patel a0c3c60728 [InstCombine] fold shift-right-by-constant with shift-right-of-constant operand
(C2 >> X) >> C1 --> (C2 >> C1) >> X

The shift-left form of this transform has existed since:
16f18ed7b5

...but it applies to matching shift right opcodes too:
https://alive2.llvm.org/ce/z/c5eQms
2022-05-30 15:30:01 -04:00
chenglin.bi 9080e21906 [InstCombine] Add baseline tests for shift+and transforms; NFC 2022-05-30 00:30:56 +08:00
Sanjay Patel 7783db55af [InstCombine] try to fold low-mask of ashr to lshr
With one-use, we handle this via demanded-bits.
But We need to handle extra uses to improve issue #54750.

https://alive2.llvm.org/ce/z/aDYkPv
2022-04-11 11:56:40 -04:00
Sanjay Patel 141892d481 [InstCombine] add tests for low-mask of ashr; NFC 2022-04-11 11:56:40 -04:00
Roman Lebedev 308ca349cb
[InstCombine] Fold `(X | C2) ^ C1 --> (X & ~C2) ^ (C1^C2)`
These two are equivalent,
and i *think* the `and` form is more-ish canonical.

General proof: https://alive2.llvm.org/ce/z/RrF5s6

If constant on the (outer) `xor` is an `undef`,
the whole lane is dead: https://alive2.llvm.org/ce/z/mu4Sh2

However, if the constant on the (inner) `or` is an `undef`,
we must sanitize it first: https://alive2.llvm.org/ce/z/MHYJL7
I guess, producing a zero `and`-mask is optimal in that case.

alive-tv is happy about the entirety of `xor-of-or.ll`.
2022-04-03 00:12:56 +03:00
Bjorn Pettersson acdc419c89 [test] Use -passes=instcombine instead of -instcombine in lots of tests. NFC
Another step moving away from the deprecated syntax of specifying
pass pipeline in opt.

Differential Revision: https://reviews.llvm.org/D119081
2022-02-07 14:26:59 +01:00
Sanjay Patel 1a60ae02c6 [InstCombine] fold mask-with-signbit-splat to icmp+select
~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y

https://alive2.llvm.org/ce/z/JKlQ9x

This is similar to D111410 / 727e642e97 ,
but it includes a 'not' of the signbit and so it
saves an instruction in the basic pattern.

DAGCombiner or target-specific folds can expand
this back into bit-hacks.

The diffs in the logical-select tests are not true
regressions - running early-cse and another round
of instcombine is expected in a normal opt pipeline,
and that reduces back to a minimal form as shown
in the duplicated PhaseOrdering test.

I have no understanding of the SystemZ diffs, so
I made the minimal edits suggested by FileCheck to
make that test pass again. That whole test file is
wrong though. It is running the entire optimizer (-O2)
to check IR, and then topping that by even running
codegen and checking asm. It needs to be split up.

Fixes #52631
2021-12-14 16:00:42 -05:00
Sanjay Patel fd5e493874 [PhaseOrdering] add tests for vector select; NFC
The 1st test corresponds to a minimally optimized (mem2reg)
version of the example in:
issue #52631

The 2nd test copies an existing instcombine test with the
same pattern. If we canonicalize differently, we can miss
reducing to minimal form in a single invocation of
-instcombine, but that should not escape the normal opt
pipeline.
2021-12-14 14:35:10 -05:00
Sanjay Patel 8bfcf1ab6c [InstCombine] move/add tests for binops with sext operand; NFC 2021-11-22 14:43:57 -05:00
Sanjay Patel 337948ac6e [InstCombine] add folds for binop with sexted bool and constant operands
This is a generalization/extension of the existing and/or
folds noted with TODO comments. Those have a one-use
constraint that is not necessary.

Potential follow-ups are noted by the TODO comments in
the new function. We can also call this function from
other binop visit* functions, but we need to add tests
first.

This solves:
https://llvm.org/PR52543

https://alive2.llvm.org/ce/z/NWuCR5
2021-11-20 12:33:00 -05:00
Sanjay Patel 1d007d0e5a [InstCombine] add tests for bitwise logic with bool op; NFC 2021-11-20 12:32:55 -05:00
Sanjay Patel 491efa7f31 [InstCombine] add/adjust tests for mask of sext i1; NFC
These are sibling transforms, but the test coverage was
uneven and incomplete.
2021-11-19 16:07:18 -05:00
Nikita Popov 861adaf2ad [InstCombine] Support splat vectors in some and of icmp folds
Replace m_ConstantInt() with m_APInt() to support splat vectors
in addition to scalar integers.
2021-11-10 22:37:54 +01:00
Nikita Popov fa4e9e64e2 [InstCombine] Add vector variants to merge-icmps.ll (NFC)
And regenerate test checks.
2021-11-10 22:36:06 +01:00
Sanjay Patel 727e642e97 [InstCombine] generalize fold for mask-with-signbit-splat
(iN X s>> (N-1)) & Y --> (X < 0) ? Y : 0

https://alive2.llvm.org/ce/z/qeYhdz

I was looking at a missing abs() transform and found my way to this
generalization of an existing fold that was added with D67799.
As discussed in that review, we want to make sure codegen handles
this difference well, and for all of the targets/types that I
spot-checked, it looks good.

I am leaving the existing fold in place in this commit because
it covers a potentially missing icmp fold, but I plan to remove
that as a follow-up commit as suggested during review.

Differential Revision: https://reviews.llvm.org/D111410
2021-10-15 16:25:48 -04:00
Sanjay Patel a35673f4cf [InstCombine] add tests for (i32 X s>> 31) & Y; NFC
Also regenerate some check lines to more accurately show
current transforms via name changes.
2021-10-08 11:07:57 -04:00
Sanjay Patel 41ff7612b3 [InstCombine] allow splat vectors for narrowing masked fold
Mostly cosmetic diffs, but the use of m_APInt matches splat constants.
2021-09-17 11:24:16 -04:00
Sanjay Patel 3a587ed20f [InstCombine] add vector tests for 'and' folds; NFC 2021-09-17 11:24:16 -04:00
Juneyoung Lee 8a156d1c27 [InstCombine] Fully disable select to and/or i1 folding
This is a patch that disables the poison-unsafe select -> and/or i1 folding.

It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.

Note that a few test functions that has `_logical` suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.

I can see that most of these are poison-unsafe; they can be revived by introducing
freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for freeze fix.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101191
2021-05-06 09:29:52 +09:00
Nikita Popov fb063c933f [InstCombine] Duplicate tests for logical and/or (NFC)
This replicates existing and/or tests to also test variants using
select. This should help us get a more accurate view on which
optimizations we're missing if we disable the select -> and/or
fold.
2021-01-12 21:50:41 +01:00
Dávid Bolvanský c1937c2af2 [NFC] Added/adjusted tests for PR48604; second pattern 2020-12-31 14:59:15 +01:00
Dávid Bolvanský 742ea77ca4 [InstCombine] Transform (A + B) - (A | B) to A & B (PR48604)
define i32 @src(i32 %x, i32 %y) {
%0:
  %a = add i32 %x, %y
  %o = or i32 %x, %y
  %r = sub i32 %a, %o
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %b = and i32 %x, %y
  ret i32 %b
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/aQRh2j
2020-12-31 14:03:20 +01:00
Dávid Bolvanský 9b64939463 [NFC] Added tests for PR48604 2020-12-31 14:03:20 +01:00
Bhramar Vatsa fd679107d6
[InstCombine] Optimize away the unnecessary multi-use sign-extend
C.f. https://bugs.llvm.org/show_bug.cgi?id=47765

Added a case for handling the sign-extend (Shl+AShr) for multiple uses,
to optimize it away for an individual use,
when the demanded bits aren't affected by sign-extend.

https://rise4fun.com/Alive/lgf

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D91343
2020-12-01 16:54:00 +03:00
Sanjay Patel dfd2858b7f [InstCombine] add test comments for negative tests; NFC 2020-11-20 07:59:46 -05:00
Sanjay Patel 4a66a1d17a [InstCombine] allow vectors for masked-add -> xor fold
https://rise4fun.com/Alive/I4Ge

  Name: add with pow2 mask
  Pre: isPowerOf2(C2) && (C1 & C2) != 0 && (C1 & (C2-1)) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %n = and i8 %x, C2
  %r = xor i8 %n, C2
2020-11-17 13:36:08 -05:00
Sanjay Patel f791ad7e1e [InstCombine] remove scalar constraint for mask-of-add fold
https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:45 -05:00
Sanjay Patel c15e5bdfb7 [InstCombine] add vector test for mask of add; NFC 2020-11-17 12:13:45 -05:00
Sanjay Patel 433696911a [InstCombine] relax constraints on mask-of-add
There are 2 changes:
1. Remove the unnecessary one-use check.
2. Remove the unnecessary power-of-2 check.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:44 -05:00
Sanjay Patel 1d7d9d6685 [InstCombine] add tests for masked add; NFC 2020-11-17 12:13:44 -05:00
Sanjay Patel 4e68bc0999 Revert "[InstCombine] add multi-use demanded bits fold for add with low-bit mask"
This reverts commit e56103d250.
There is a stage2 msan failure blamed on this commit:
http://lab.llvm.org:8011/#/builders/74/builds/888/steps/9/logs/stdio
2020-11-16 14:48:09 -05:00
Sanjay Patel 6ddc237766 [InstCombine] reduce code for flip of masked bit; NFC
There are 1-2 potential follow-up NFC commits to reduce
this further on the way to generalizing this for vectors.

The operand replacing path should be dead code because demanded
bits handles that more generally (D91415).
2020-11-15 15:43:34 -05:00
Sanjay Patel e56103d250 [InstCombine] add multi-use demanded bits fold for add with low-bit mask
I noticed an add example like the one from D91343, so here's a similar patch.
The logic is based on existing code for the single-use demanded bits fold.
But I only matched a constant instead of using compute known bits on the
operands because that was the motivating patterni that I noticed.

I think this will allow removing a special-case (but incomplete) dedicated
fold within visitAnd(), but I need to untangle the existing code to be sure.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2

Differential Revision: https://reviews.llvm.org/D91415
2020-11-15 15:09:49 -05:00