Commit Graph

333 Commits

Author SHA1 Message Date
Sanjay Patel d5c002bdc7 [InstCombine] fix code comment to match code; NFC 2021-11-09 14:27:29 -05:00
Sanjay Patel 2a88d00cf2 [InstCombine] fold sub-of-umax to 0-usubsat
Op0 - umax(X, Op0) --> 0 - usub.sat(X, Op0)

I'm not sure if this is really an improvement in IR because
we probably have better recognition/analysis for min/max,
but this lines up with the fold we do for the icmp+select
idiom and removes another diff from D98152.

This is similar to the previous fold in the code that was
added with:
83c2fb9f66
baa6a85130

https://alive2.llvm.org/ce/z/5MrVB9
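
A minimal IR sketch of the fold (value names are illustrative, not from
the commit's tests):

```
declare i8 @llvm.umax.i8(i8, i8)
declare i8 @llvm.usub.sat.i8(i8, i8)

define i8 @src(i8 %x, i8 %op0) {
  %m = call i8 @llvm.umax.i8(i8 %x, i8 %op0)   ; umax(X, Op0) >= Op0
  %r = sub i8 %op0, %m                         ; so the result is always <= 0
  ret i8 %r
}
; => %s = call i8 @llvm.usub.sat.i8(i8 %x, i8 %op0)
;    %r = sub i8 0, %s
```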
2021-11-09 12:46:03 -05:00
Sanjay Patel baa6a85130 [InstCombine] allow commute in sub-of-umax fold
This fold was added with:
83c2fb9f66
...but missed the commuted pattern:
https://alive2.llvm.org/ce/z/_tYEGy
2021-11-09 10:50:11 -05:00
Sanjay Patel 83c2fb9f66 [InstCombine] match usub.sat from umax intrinsic
umax(X, Op1) - Op1 --> usub.sat(X, Op1)

https://alive2.llvm.org/ce/z/HpcGiJ

This happens in 2 or more steps with an icmp-select idiom
instead of an intrinsic. This is another step towards
canonicalization of the min/max intrinsics. See:
D98152
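
For illustration, the fold in minimal IR (names hypothetical):

```
declare i8 @llvm.umax.i8(i8, i8)
declare i8 @llvm.usub.sat.i8(i8, i8)

define i8 @src(i8 %x, i8 %op1) {
  %m = call i8 @llvm.umax.i8(i8 %x, i8 %op1)
  %r = sub i8 %m, %op1
  ret i8 %r
}
; => %r = call i8 @llvm.usub.sat.i8(i8 %x, i8 %op1)
```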
2021-11-06 08:32:52 -04:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests, which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Dávid Bolvanský 5f2f611880 Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical 2021-10-03 13:58:10 +02:00
Sanjay Patel 75e8eb2b10 [InstCombine] update code/test comments; NFC
Follow-up for post-commit suggestion on:
28afaed691

The comments were partly copied from the original
code, but not updated to match the new code.
2021-09-11 10:53:53 -04:00
Sanjay Patel 28afaed691 [InstCombine] fold sub of min/max intrinsics with invertible ops
This is a translation of the existing code to handle the intrinsics
and another step towards D98152.

https://alive2.llvm.org/ce/z/jA7eBC

This pattern is already handled by underlying folds if there are
fewer uses, so the minimal tests in this case have extra uses.

The larger cmyk tests show the motivation - when combined with
other folds, we invert a larger sequence and eliminate 'not' ops.
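
As a sketch of the idea (not the commit's actual tests), inverting
through a min/max whose operands are 'not' ops, using the two's
complement identity a - b == ~b - ~a:

```
declare i8 @llvm.smin.i8(i8, i8)
declare i8 @llvm.smax.i8(i8, i8)

define i8 @src(i8 %x, i8 %y, i8 %z) {
  %nx = xor i8 %x, -1
  %ny = xor i8 %y, -1
  %nz = xor i8 %z, -1
  %m = call i8 @llvm.smin.i8(i8 %ny, i8 %nz)
  %r = sub i8 %nx, %m
  ret i8 %r
}
; => the three 'not' (xor -1) ops become dead:
; %m2 = call i8 @llvm.smax.i8(i8 %y, i8 %z)
; %r  = sub i8 %m2, %x
```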
2021-09-11 09:18:46 -04:00
Sanjay Patel cc9c545fb4 [InstCombine] generalize subtract with 'not' operands; 2nd try
This is a re-try of 3aa009cc87 which was reverted at
9577fac0fd because it caused an infinite loop.

For the extra test case, either re-ordering the transforms
or adding the extra clause to avoid sub-of-sub is enough
to prevent the infinite combine loop, but I'm doing both to be
safer.

Original commit message:
The motivation was to get min/max intrinsics to parity
with cmp+select idioms, but this unlocks a few more
folds because isFreeToInvert recognizes add/sub with
constants too.

In the min/max example, we have too many extra uses
for smaller folds to improve things, but this fold
is able to eliminate uses even though we can't reduce
the number of instructions.
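
The base pattern in IR (names hypothetical); isFreeToInvert extends it
to operands that are only implicitly inverted, e.g. add/sub with constants:

```
define i8 @src(i8 %a, i8 %b) {
  %na = xor i8 %a, -1     ; ~a
  %nb = xor i8 %b, -1     ; ~b
  %r = sub i8 %na, %nb    ; ~a - ~b
  ret i8 %r
}
; => %r = sub i8 %b, %a   ; (-a-1) - (-b-1) == b - a
```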
2021-08-23 17:06:51 -04:00
Florian Hahn 9577fac0fd
Revert "[InstCombine] generalize subtract with 'not' operands"
This reverts commit 3aa009cc87.

The reverted commit causes an infinite loop in instcombine. See PR51584.
2021-08-23 15:47:21 +01:00
Sanjay Patel 3aa009cc87 [InstCombine] generalize subtract with 'not' operands
The motivation was to get min/max intrinsics to parity
with cmp+select idioms, but this unlocks a few more
folds because isFreeToInvert recognizes add/sub with
constants too.

In the min/max example, we have too many extra uses
for smaller folds to improve things, but this fold
is able to eliminate uses even though we can't reduce
the number of instructions.
2021-08-22 07:18:31 -04:00
Sanjay Patel 41af8f0ad5 [InstCombine] combine constants by reassociating add/sub/add
This may overlap partially with the reassociate pass,
but it seems simple enough that we should try it here
in InstCombine to enable other folds.

This shows up as an opportunity and potential regression
if we improve a subtract fold with 'not' ops to be more
general.
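
One plausible shape of the reassociation (pattern inferred from the
title; constants are illustrative):

```
define i8 @src(i8 %x, i8 %y) {
  %a1 = add i8 %x, 42
  %s  = sub i8 %a1, %y
  %r  = add i8 %s, 10
  ret i8 %r
}
; => %s2 = sub i8 %x, %y
;    %r  = add i8 %s2, 52   ; the constants 42 and 10 combine
```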
2021-08-21 11:45:43 -04:00
Sanjay Patel 0e15de2d0c [InstCombine] fold reassociative FP add into start value of fadd reduction
This pattern is visible in unrolled and vectorized loops.
Although the backend seems to be able to reassociate to
ideal form in the examples I looked at, we might as well
do that in IR for efficiency.
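
A sketch of the pattern, assuming the fold pulls the reassociative
scalar addend into the reduction's start value:

```
declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)

define float @src(<4 x float> %v, float %x) {
  %rdx = call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %v)
  %r = fadd reassoc float %rdx, %x
  ret float %r
}
; => %r = call reassoc float @llvm.vector.reduce.fadd.v4f32(float %x, <4 x float> %v)
```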
2021-07-18 06:26:20 -04:00
Sanjay Patel d2012d965d [InstCombine] fix nsz (fast-math) propagation from fneg-of-select
As discussed in the post-commit comments for:
3cdd05e519

It seems to be safe to propagate all flags from the final fneg
except for 'nsz' to the new select:
https://alive2.llvm.org/ce/z/J_APDc

nsz has unique FMF semantics: violating it does not create poison; the
sign of a zero is merely "insignificant" in the calculation, per the LangRef.
2021-06-08 17:04:30 -04:00
Sanjay Patel 4675beaa21 [InstCombine] intersect nsz and ninf fast-math-flags (FMF) for fneg(fdiv) fold
https://alive2.llvm.org/ce/z/3KPvih

https://llvm.org/PR49654
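
A sketch of the fneg(fdiv) fold with flag intersection (constant and
flags chosen for illustration):

```
define float @src(float %x) {
  %d = fdiv ninf float %x, 42.0
  %r = fneg nsz ninf float %d
  ret float %r
}
; => the new fdiv gets the intersection of the fneg's and fdiv's
; nsz/ninf flags (here just 'ninf'):
; %r = fdiv ninf float %x, -42.0
```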
2021-06-07 13:22:49 -04:00
Sanjay Patel 519e98cd9a [InstCombine] refactor match clauses; NFC
We need to adjust the FMF propagation on at least
one of these transforms as discussed in:
https://llvm.org/PR49654
...so this should make it easier to intersect flags.
2021-06-07 13:22:49 -04:00
Sanjay Patel 3cdd05e519 [InstCombine] fold fnegs around select
This is one of the folds requested in:
https://llvm.org/PR39480

https://alive2.llvm.org/ce/z/NczU3V

Note - this uses the normal FMF propagation logic
(flags transfer from the final value to new/intermediate ops).
It's not clear if this matches what Alive2 implements,
so we may want to adjust one or the other.
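
One form of the fold, sketched in IR (shape assumed from the title and PR):

```
define float @src(i1 %b, float %x, float %y) {
  %nx = fneg float %x
  %sel = select i1 %b, float %nx, float %y
  %r = fneg float %sel
  ret float %r
}
; => one fneg is eliminated:
; %ny = fneg float %y
; %r  = select i1 %b, float %x, float %ny
```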
2021-05-17 14:53:49 -04:00
Dávid Bolvanský 691badc3d6 [InstCombine] C - ctpop(a) -> ctpop(~a) if C is bitwidth (PR50104)
Proof: https://alive2.llvm.org/ce/z/mncA9K
Solves https://bugs.llvm.org/show_bug.cgi?id=50104
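
In IR, for i32 (so C == 32):

```
declare i32 @llvm.ctpop.i32(i32)

define i32 @src(i32 %a) {
  %c = call i32 @llvm.ctpop.i32(i32 %a)
  %r = sub i32 32, %c
  ret i32 %r
}
; => %na = xor i32 %a, -1
;    %r  = call i32 @llvm.ctpop.i32(i32 %na)
```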

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101257
2021-04-26 15:40:54 +02:00
Dávid Bolvanský d4ec8ea19c [InstCombine] ctpop(X) + ctpop(Y) => ctpop(X | Y) if X and Y have no common bits (PR48999)
For example:

```
int src(unsigned int a, unsigned int b)
{
    return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16);
}

int tgt(unsigned int a, unsigned int b)
{
    return __builtin_popcount((a << 16)  | (b >> 16));
}
```

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101210
2021-04-24 17:52:10 +02:00
Dávid Bolvanský 9aee07abd0 [InstCombine] X - usub.sat(X, Y) => umin(X, Y)
Pattern regressed in LLVM 9 with the introduction of usub.sat.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42178#c2
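
The pattern in IR:

```
declare i32 @llvm.usub.sat.i32(i32, i32)
declare i32 @llvm.umin.i32(i32, i32)

define i32 @src(i32 %x, i32 %y) {
  %s = call i32 @llvm.usub.sat.i32(i32 %x, i32 %y)
  %r = sub i32 %x, %s     ; X - max(X - Y, 0), unsigned
  ret i32 %r
}
; => %r = call i32 @llvm.umin.i32(i32 %x, i32 %y)
```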

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101184
2021-04-23 21:13:07 +02:00
Roman Lebedev a36bb7fd76
[InstCombine] (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the or is actually an add
https://alive2.llvm.org/ce/z/Coc5yf
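
A sketch where the 'or' is provably an 'add' because the operands share
no set bits (setup chosen for illustration):

```
define i8 @src(i8 %x) {
  %t = shl i8 %x, 4      ; low 4 bits of %t are zero
  %o = or i8 %t, 7       ; disjoint bits: this 'or' is an 'add'
  %r = add i8 %o, 16
  ret i8 %r
}
; => %r = add i8 %t, 23   ; 7 + 16
```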
2021-04-11 18:08:08 +03:00
Roman Lebedev 24f67473dd
[InstCombine] foldAddWithConstant(): don't deal with non-immediate constants
All of the code that handles a general constant here (other than the more
restrictive APInt-dealing code) expects it to be an immediate, because
otherwise we won't actually fold the constants and will instead increase
the instruction count. It also isn't obvious why we'd be okay with
increasing the number of constant expressions; those still have to be
evaluated.

But after 2829094a8e
this could also cause endless combine loops.
So actually properly restrict this code to immediates.
2021-04-07 19:50:19 +03:00
Roman Lebedev 2829094a8e
Reland [InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)
This reverts commit a547b4e26b,
relanding commit 31d219d299,
which was reverted because there was a conflicting inverse transform,
which was causing an endless combine loop, which has now been adjusted.

Original commit message:

https://alive2.llvm.org/ce/z/67w-wQ

We prefer `add`s over `sub`, and this particular xform
allows further folds to happen.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49858
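
In IR:

```
define i8 @src(i8 %x, i8 %y, i8 %z) {
  %s1 = sub i8 %x, %y
  %r  = sub i8 %s1, %z
  ret i8 %r
}
; => %a = add i8 %y, %z
;    %r = sub i8 %x, %a
```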
2021-04-07 12:06:25 +03:00
Roman Lebedev 93d1d94b74
[InstCombine] Restrict "C-(X+C2) --> (C-C2)-X" fold to immediate constants
I.e., if any/all of the constants is an expression, don't do it.
Since those constants won't reduce to an immediate,
but would be left as a constant expression, they could cause
endless combine loops after 31d219d299
added an inverse transformation.
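
The fold being restricted, with immediate constants (values illustrative):

```
define i8 @src(i8 %x) {
  %a = add i8 %x, 10      ; X + C2
  %r = sub i8 42, %a      ; C - (X + C2)
  ret i8 %r
}
; => %r = sub i8 32, %x   ; (C - C2) - X
```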
2021-04-07 12:06:24 +03:00
Petr Hosek a547b4e26b Revert "[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)"
This reverts commit 31d219d299 which
causes an infinite loop when compiling the XRay runtime.
2021-04-06 22:30:28 -07:00
Roman Lebedev 31d219d299
[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)
https://alive2.llvm.org/ce/z/67w-wQ

We prefer `add`s over `sub`, and this particular xform
allows further folds to happen.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49858
2021-04-06 15:58:14 +03:00
Dávid Bolvanský ae69fa9b9f [InstCombine] Transform (A + B) - (A & B) to A | B (PR48604)
define i32 @src(i32 %x, i32 %y) {
%0:
  %a = add i32 %x, %y
  %o = and i32 %x, %y
  %r = sub i32 %a, %o
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %b = or i32 %x, %y
  ret i32 %b
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/2fhW6r
2020-12-31 15:04:32 +01:00
Dávid Bolvanský 742ea77ca4 [InstCombine] Transform (A + B) - (A | B) to A & B (PR48604)
define i32 @src(i32 %x, i32 %y) {
%0:
  %a = add i32 %x, %y
  %o = or i32 %x, %y
  %r = sub i32 %a, %o
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %b = and i32 %x, %y
  ret i32 %b
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/aQRh2j
2020-12-31 14:03:20 +01:00
Roman Lebedev b3021a72a6
[IR][InstCombine] Add m_ImmConstant(), that matches on non-ConstantExpr constants, and use it
A pattern to ignore ConstantExprs is quite common, since they frequently
lead to infinite combine loops, so let's make writing it easier.
2020-12-24 21:20:47 +03:00
Reid Kleckner d2ed9d6b7e Revert "ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC"
We determined that the MSVC implementation of std::aligned* isn't suited
to our needs. It doesn't support 16 byte alignment or higher, and it
doesn't really guarantee 8 byte alignment. See
https://github.com/microsoft/STL/issues/1533

Also reverts "ADT: Change AlignedCharArrayUnion to an alias of std::aligned_union_t, NFC"

Also reverts "ADT: Remove AlignedCharArrayUnion, NFC" to bring back
AlignedCharArrayUnion.

This reverts commit 4d8bf870a8.

This reverts commit d10f9863a5.

This reverts commit 4b5dc150b9.
2020-12-14 17:04:06 -08:00
Duncan P. N. Exon Smith d10f9863a5 ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC
Prepare to delete `AlignedCharArrayUnion` by migrating its users over to
`std::aligned_union_t`.

I will delete `AlignedCharArrayUnion` and its tests in a follow-up
commit so that it's easier to revert in isolation in case some
downstream wants to keep using it.

Differential Revision: https://reviews.llvm.org/D92516
2020-12-04 12:34:49 -08:00
Duncan P. N. Exon Smith 5b267fb796 ADT: Stop peeking inside AlignedCharArrayUnion, NFC
Update all the users of `AlignedCharArrayUnion` to stop peeking inside
(to look at `buffer`) so that a follow-up patch can replace it with an
alias to `std::aligned_union_t`.

This was reviewed as part of https://reviews.llvm.org/D92512, but I'm
splitting this bit out to commit first to reduce churn in case the
change to `AlignedCharArrayUnion` needs to be reverted for some
unexpected reason.
2020-12-04 11:07:42 -08:00
Sanjay Patel 678b9c5dde [InstCombine] try difference-of-shifts factorization before negator
We need to preserve wrapping flags to allow better folds.
The cases with geps may be non-intuitive, but that appears to agree with Alive2:
https://alive2.llvm.org/ce/z/JQcqw7
We create 'nsw' ops independently of the original wrapping on the sub.
2020-11-24 13:56:30 -05:00
Sanjay Patel ab29f091eb [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps
This is a retry of 324a53205. I cautiously reverted that at 6aa3fc4
because the rules about gep math were not clear. Since then, we
have added this line to LangRef for gep inbounds:
"The successive addition of offsets (without adding the base address)
does not wrap the pointer index type in a signed sense (nsw)."

See D90708 and post-commit comments on the revert patch for more details.
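
A sketch of the gep-difference fold this affects (typed-pointer syntax
of the era; names illustrative):

```
define i64 @src(i8* %base, i64 %i, i64 %j) {
  %g1 = getelementptr inbounds i8, i8* %base, i64 %i
  %g2 = getelementptr inbounds i8, i8* %base, i64 %j
  %p1 = ptrtoint i8* %g1 to i64
  %p2 = ptrtoint i8* %g2 to i64
  %r  = sub i64 %p1, %p2
  ret i64 %r
}
; => with 'inbounds' on both geps, the difference of offsets gets 'nsw':
; %r = sub nsw i64 %i, %j
```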
2020-11-23 16:50:09 -05:00
Nikita Popov 02dda1c659 [Local] Clean up EmitGEPOffset
Handle the emission of the add in a single place, instead of three
different ones.

Don't emit an unnecessary add with zero to start with. It will get
dropped by InstCombine, but we may as well not create it in the
first place. This also means that InstCombine does not need to
specially handle this extra add.

This is conceptually NFC, but can affect worklist order etc.
2020-11-13 18:30:56 +01:00
Sanjay Patel 0abde4bc92 [InstCombine] fold sub of low-bit masked value from offset of same value
There might be some demanded/known bits way to generalize this,
but I'm not seeing it right now.

This came up as a regression when I was looking at a different
demanded bits improvement.

https://rise4fun.com/Alive/5fl

  Name: general
  Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0
  %a1 = add i8 %x, C1
  %a2 = and i8 %x, C2
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, ~C2

  Name: test 1
  %a1 = add i8 %x, 192
  %a2 = and i8 %x, 10
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -11

  Name: test 2
  %a1 = add i8 %x, -108
  %a2 = and i8 %x, 3
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -4
2020-11-12 20:10:28 -05:00
Roman Lebedev c009d11bda
[InstCombine] Perform C-(X+C2) --> (C-C2)-X transform before using Negator
In particular, this makes the transform fire for C=0, because the Negator
doesn't want to perform that fold, since in general it's not beneficial.
2020-11-03 16:06:52 +03:00
Sanjay Patel 3f3356bdd9 [InstCombine] allow vector splats for add+xor --> shifts 2020-10-11 09:04:24 -04:00
Sanjay Patel f81200ae99 [InstCombine] add one-use check to add+xor transform
As shown in the affected test, we could increase instruction
count without this limitation. There's another test with an extra
use that shows we still convert directly to a real "sext" if
possible.
2020-10-11 09:04:24 -04:00
Sanjay Patel 080e6bc205 [InstCombine] allow vector splats for add+and with high-mask
There might be a better way to specify the pre-conditions,
but this is hopefully clearer than the way it was written:
https://rise4fun.com/Alive/Jhk3

  Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2)
  %a = and %x, C2
  %r = add %a, C1
  =>
  %a2 = add %x, C1
  %r = and %a2, C2
2020-10-09 10:39:11 -04:00
Sanjay Patel f688ae7a0e [InstCombine] allow vector splats for add+xor with low-mask
This can be allowed with undef elements too, but that can be another step:
https://alive2.llvm.org/ce/z/hnC4Z-
2020-10-08 15:53:38 -04:00
Sanjay Patel 5ac89add1e [InstCombine] remove unnecessary one-use check from add-xor transform
Pre-conditions seem to be optimal, but we don't need a use check
because we are only replacing an add with a sub.

https://rise4fun.com/Alive/hzN

  Pre: (~C1 | C2 == -1) && isPowerOf2(C2+1)
  %m = and i8 %x, C1
  %f = xor i8 %m, C2
  %r = add i8 %f, C3
  =>
  %r = sub i8 C2 + C3, %m
2020-10-08 15:08:51 -04:00
Sanjay Patel b57451b011 [InstCombine] allow vector splats for add+xor with signmask 2020-10-08 10:46:34 -04:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html
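
As a hedged illustration of the rename (mangled names assumed from the
current naming scheme):

```
; before:
%r0 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> %v)
; after:
%r1 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)
```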

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Martin Storsjö 2c4c659666 [InstCombine] Add parentheses in assert to silence GCC warning. NFC. 2020-09-23 09:03:01 +03:00
Sanjay Patel 7903ae4720 [InstCombine] factorize left shifts of add/sub
We do similar factorization folds in SimplifyUsingDistributiveLaws,
but that drops no-wrap properties. Propagating those optimally may
help solve:
https://llvm.org/PR47430

The propagation is all-or-nothing for these patterns: when all
3 incoming ops have nsw or nuw, the 2 new ops should have the
same no-wrap property:
https://alive2.llvm.org/ce/z/Dv8wsU

This also solves:
https://llvm.org/PR47584
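
A sketch of the all-or-nothing flag propagation (flags illustrative):

```
define i8 @src(i8 %x, i8 %y) {
  %xs = shl nuw i8 %x, 2
  %ys = shl nuw i8 %y, 2
  %r  = add nuw i8 %xs, %ys
  ret i8 %r
}
; => all 3 incoming ops have 'nuw', so both new ops get 'nuw':
; %s = add nuw i8 %x, %y
; %r = shl nuw i8 %s, 2
```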
2020-09-20 12:55:24 -04:00
Sanjay Patel 6aa3fc4a5b Revert "[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)"
This reverts commit 324a53205a.

On closer examination of at least one of the test diffs,
this does not appear to be correct in all cases. Even the
existing 'nsw' creation may be wrong based on this example:
https://alive2.llvm.org/ce/z/uL4Hw9
https://alive2.llvm.org/ce/z/fJMKQS
2020-09-11 10:54:48 -04:00
Sanjay Patel 324a53205a [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)
There's no signed wrap if both geps have 'inbounds':
https://alive2.llvm.org/ce/z/nZkQTg
https://alive2.llvm.org/ce/z/7qFauh
2020-09-11 10:39:09 -04:00
Sanjay Patel 8b30067919 [InstCombine] improve fold of pointer differences
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.

This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.

This is part of improving flags propagation noticed
with PR47430.
2020-09-07 15:54:32 -04:00
Sanjay Patel 3ca8b9a560 [InstCombine] give a name to an intermediate value for easier tracking; NFC
As noted in PR47430, we probably want to conditionally include 'nsw'
here anyway, so we are going to need to fill out the optional args.
2020-09-07 08:19:42 -04:00