Commit Graph

4497 Commits

Author SHA1 Message Date
Alexey Bataev 04ba80ca4d [Instcombiner]Improve emission of logical or/and reductions.
For logical or/and reductions we emit regular intrinsics @llvm.vector.reduce.or/and.vxi1 calls.
These intrinsics are not effective for the logical or/and reductions,
especially if the optimizer is able to emit short circuit versions of
the scalar or/and instructions and vector code gets less effective than
the scalar version.
Instead, or reduction for i1 can be represented as:
```
%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp ne iReduxWidth %val, 0
```
and reduction for i1 can be represented as:
```
%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp eq iReduxWidth %val, 11111
```
This improves perfromance of the vector code significantly and make it
to outperform short circuit scalar code.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D97406
2021-03-04 08:01:02 -08:00
Serguei Katkov a0ff0f30df [InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase
statepoint intrinsic can be used in invoke context,
so it should be handled in visitCallBase to cover both call and invoke.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97833
2021-03-04 11:00:22 +07:00
Philip Reames 99f5417346 Sink routine for replacing a operand bundle to CallBase [NFC]
We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.
2021-03-03 12:07:55 -08:00
Sanjay Patel 9502061bcc [InstCombine] avoid infinite loop in demanded bits for select
https://llvm.org/PR49205
2021-02-28 10:17:53 -05:00
Stephen Tozer ec7b9b0c18 [InstCombine] Avoid redundant or out-of-order debug value sinking
This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent
redundant debug intrinsics from being produced, and also prevent the intrinsics
from being emitted in an incorrect order. It does this by ensuring that when
this pass sinks an instruction and creates clones of the debug intrinsics that
use that instruction, it inserts those debug intrinsics in their original order,
and only inserts the last debug intrinsic for each variable in the Instruction's
block.

Differential revision: https://reviews.llvm.org/D95463
2021-02-26 13:04:33 +00:00
Sanjay Patel a7cee55762 [InstCombine] fold fdiv with powi divisor (PR49147)
This extends b40fde062c for the especially non-standard
powi pattern. We want to avoid being completely wrong
on the negation-of-int-min corner case, so I'm adding
an extra FMF check for 'ninf' assuming that gives us
the flexibility to handle that possibility.
https://llvm.org/PR49147
2021-02-24 16:44:36 -05:00
Sanjay Patel 868d43fbd6 [InstCombine] add helper for x/pow(); NFC
We at least want to add powi to this list, so
split it off into a switch to reduce code duplication.
2021-02-24 16:44:36 -05:00
Nikita Popov e0615bcd39 [Loads] Add optimized FindAvailableLoadedValue() overload (NFCI)
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.

If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.
2021-02-21 18:42:56 +01:00
Sanjay Patel e772618f1e [InstCombine] fold fdiv with exp/exp2 divisor (PR49147)
Follow-up to:
D96648 / b40fde062
...for the special-case base calls.

From the earlier commit:
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.
2021-02-20 16:02:58 -05:00
Simon Pilgrim 609d0c9772 [InstCombine] matchBSwapOrBitReverse - remove pattern matching early-out. NFCI.
recognizeBSwapOrBitReverseIdiom + collectBitParts have pattern matching to bail out early if a bswap/bitreverse pattern isn't possible - we should be able to rely on this instead without any notable change in compile time.

This is part of a cleanup towards letting matchBSwapOrBitReverse /recognizeBSwapOrBitReverseIdiom use 'root' instructions that aren't ORs (FSHL/FSHRs in particular which can be prematurely created).

Differential Revision: https://reviews.llvm.org/D97056
2021-02-20 13:15:34 +00:00
Nikita Popov 70e3c9a8b6 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.

With this patch we get marginally better results:

  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
Philip Reames 8666463889 [instcombine] Exploit UB implied by nofree attributes
This patch simply implements the documented UB of the current nofree attributes as specified. It doesn't try to be fancy about inference (yet), it just implements the cases already specified and inferred.

Note: When this lands, it may expose miscompiles. If so, please revert and provide a test case. It's likely the bug is in the existing inference code and without a relatively complete test case, it will be hard to debug.

Differential Revision: https://reviews.llvm.org/D96349
2021-02-18 08:34:22 -08:00
Sanjay Patel 85294703a7 [InstCombine] fold fcmp-of-copysign idiom
As discussed in:
https://llvm.org/PR49179
...this pattern shows up in library code.
There are several potential generalizations as noted,
but we need to be careful that we get FP special-values
right, and it's not clear how much variation we should
expect to see from this exact idiom.
2021-02-17 10:32:33 -05:00
Sanjay Patel b40fde062c [InstCombine] fold fdiv with pow divisor (PR49147)
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.

This is part of solving:
https://llvm.org/PR49147
(The powi variant potentially has a different constraint.)

Differential Revision: https://reviews.llvm.org/D96648
2021-02-14 08:07:36 -05:00
Tyker 642e9225c6 reland [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-13 13:03:11 +01:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Sanjay Patel 6e2053983e [InstCombine] fold lshr(mul X, SplatC), C2
This is a special-case multiply that replicates bits of
the source operand. We need this fold to avoid regression
if we make canonicalization to `mul` more aggressive for
shl+or patterns.

I did not see a way to make Alive generalize the bit width
condition for even-number-of-bits only, but an example of
the proof is:
  Name: i32
  Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2))
  %m = mul nuw i32 %x, C1
  %t = lshr i32 %m, C2
  =>
  %t = and i32 %x, C1 - 2

  Name: i14
  %m = mul nuw i14 %x, 129
  %t = lshr i14 %m, 7
  =>
  %t = and i14 %x, 127

https://rise4fun.com/Alive/e52
2021-02-10 15:02:31 -05:00
Tyker 5652e192fc Revert "[InstCombine] convert assumes to operand bundles"
This reverts commit 5eb2e994f9.
2021-02-10 01:32:00 +01:00
Tyker 5eb2e994f9 [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-09 19:33:53 +01:00
Kazu Hirata 302313a264 [Transforms] Use range-based for loops (NFC) 2021-02-08 22:33:53 -08:00
Roman Lebedev 485c4b552b
[InstCombine] Host inversion out of ashr's value operand (PR48995)
This is a yet another hint that we will eventually need InstCombineInverter,
which would consistently sink inversions, but but for that we'll need
to consistently hoist inversions where possible, so let's do that here.

Example of a proof: https://alive2.llvm.org/ce/z/78SbDq

See https://bugs.llvm.org/show_bug.cgi?id=48995
2021-02-02 17:56:43 +03:00
Sanjay Patel 0ce2920f17 [InstCombine] try to narrow min/max intrinsics with constant operand
The constant trunc/ext may not be the optimal pre-condition,
but I think that handles the common cases.

Example of Alive2 proof:
https://alive2.llvm.org/ce/z/sREeLC

This is another step towards canonicalizing to the intrinsics.
Narrowing was identified as source of potential regression for
abs(), so we need to handle this for min/max - see:
https://llvm.org/PR48816

If this is not enough, we could process intrinsics in
the trunc-driven matching in canEvaluateTruncated().
2021-02-01 13:44:13 -05:00
Valery N Dmitriev 716b9dd0d8 [InstCombine] Preserve FMF for powi simplifications.
Differential Revision: https://reviews.llvm.org/D95455
2021-01-26 13:26:06 -08:00
Sanjay Patel 09a136bcc6 [InstCombine] narrow min/max intrinsics with extended inputs
We can sink extends after min/max if they match and would
not change the sign-interpreted compare. The only combo
that doesn't work is zext+smin/smax because the zexts
could change a negative number into positive:
https://alive2.llvm.org/ce/z/D6sz6J

Sext+umax/umin works:

  define i32 @src(i8 %x, i8 %y) {
  %0:
    %sx = sext i8 %x to i32
    %sy = sext i8 %y to i32
    %m = umax i32 %sx, %sy
    ret i32 %m
  }
  =>
  define i32 @tgt(i8 %x, i8 %y) {
  %0:
    %m = umax i8 %x, %y
    %r = sext i8 %m to i32
    ret i32 %r
  }
  Transformation seems to be correct!
2021-01-25 07:52:50 -05:00
Jeroen Dobbelaere dcc7706fcf [InstCombine] Remove unused llvm.experimental.noalias.scope.decl
A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope.
When that is not the case for at least one of the two, the intrinsic call can as well be removed.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95141
2021-01-24 13:55:50 +01:00
Florian Hahn d60b74c28a
[InstCombine] Set MadeIRChange in replaceInstUsesWith.
Some utilities used by InstCombine, like SimplifyLibCalls, may add new
instructions and replace the uses of a call, but return nullptr because
the inserted call produces multiple results.

Previously, the replaced library calls would get removed by
InstCombine's deleter, but after
292077072e this may not happen, if the
willreturn attribute is missing.

As a work-around, update replaceInstUsesWith to set MadeIRChange, if it
replaces any uses. This catches the cases where it is used as replacer
by utilities used by InstCombine and seems useful in general; updating
uses will modify the IR.

This fixes an expensive-check failure when replacing
@__sinpif/@__cospifi with @__sincospif_sret.
2021-01-23 17:52:59 +00:00
Sanjay Patel 411c144e4c [InstCombine] narrow abs with sign-extended input
In the motivating cases from https://llvm.org/PR48816 ,
we have a trailing trunc. But that is not required to
reduce the abs width:
https://alive2.llvm.org/ce/z/ECaz-p
...as long as we clear the int-min-is-poison bit (nsw).

We have some existing tests that are affected, and I'm
not sure what the overall implications are, but in general
we favor narrowing operations over preserving nsw/nuw.

If that causes problems, we could restrict this transform
based on type (shouldChangeType() and/or vector vs. scalar).

Differential Revision: https://reviews.llvm.org/D95235
2021-01-22 13:36:04 -05:00
Roman Lebedev d1a6f92fd5
[InstCombine] Fold `(~x) | y` --> `~(x & (~y))` iff it is free to do so
Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
this decreasing instruction count.

Note that we could position this transformation as just hoisting
of the `not` (still, iff y is freely negatible), but the test changes
show a number of regressions, so let's not do that.
2021-01-22 17:23:54 +03:00
Roman Lebedev 79b0d21ce9
[InstCombine] Fold `(~x) & y` --> `~(x | (~y))` iff it is free to do so
Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
this decreasing instruction count.
2021-01-22 17:23:54 +03:00
Roman Lebedev 4ed0d8f2f0
[NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate()
I'd like to use it in an upcoming fold.
2021-01-22 17:23:53 +03:00
Kazu Hirata e53472de68 [Transforms] Use llvm::append_range (NFC) 2021-01-20 21:35:54 -08:00
Kazu Hirata 8f5da41c4d [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-20 21:35:52 -08:00
Nikita Popov 21443381c0 Reapply [InstCombine] Replace one-use select operand based on condition
Relative to the original change, this adds a check that the
instruction on which we're replacing operands is safe to speculatively
execute, because that's what we're effectively doing. We're executing
the instruction with the replaced operand, which is fine if it's pure,
but not fine if can cause side-effects or UB (aka is not speculatable).

Additionally, we cannot (generally) replace operands in phi nodes,
as these may refer to a different loop iteration. This is also covered
by the speculation check.

-----

InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.

This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.

Among other things, this also subsumes D94860.

Differential Revision: https://reviews.llvm.org/D94862
2021-01-19 20:26:38 +01:00
Hans Wennborg 58bdfcfac0 Revert 5238e7b302 "[InstCombine] Replace one-use select operand based on condition"
This caused a miscompile in Chromium, see comments on the codereview for
discussion and pointer to a reproducer.

> InstCombine already performs a fold where X == Y ? f(X) : Z is
> transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
> if f(X) only has one use, then we can always directly replace the
> use inside the instruction. To actually be profitable, limit it to
> the case where Y is a non-expr constant.
>
> This could be further extended to replace uses further up a one-use
> instruction chain, but for now this only looks one level up.
>
> Among other things, this also subsumes D94860.
>
> Differential Revision: https://reviews.llvm.org/D94862

This also reverts the follow-up
a003f26539cf4db744655e76c41f4c4a8913f116:

> [llvm] Prevent infinite loop in InstCombine of select statements
>
> This fixes an issue where the RHS and LHS the comparison operation
> creating the predicate were swapped back and forth forever.
>
> Differential Revision: https://reviews.llvm.org/D94934
2021-01-19 11:50:56 +01:00
Tres Popp a003f26539 [llvm] Prevent infinite loop in InstCombine of select statements
This fixes an issue where the RHS and LHS the comparison operation
creating the predicate were swapped back and forth forever.

Differential Revision: https://reviews.llvm.org/D94934
2021-01-19 10:31:48 +01:00
Juneyoung Lee 2d89ebd5d1 Address unused variable warning 2021-01-19 09:30:16 +09:00
Juneyoung Lee 0441df94ad [InstCombine,InstSimplify] Optimize select followed by and/or/xor
This patch adds `A & (A && B)` -> `A && B`  (similarly for or + logical or)

Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`.

Alive2 proof:
merge_and: https://alive2.llvm.org/ce/z/teMR97
merge_or: https://alive2.llvm.org/ce/z/b4yZUp
xor_and: https://alive2.llvm.org/ce/z/_-TXHi
xor_or: https://alive2.llvm.org/ce/z/2uYx_a

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94861
2021-01-19 09:14:17 +09:00
Dávid Bolvanský ed396212da [InstCombine] Transform abs pattern using multiplication to abs intrinsic (PR45691)
```
unsigned r(int v)
{
    return (1 | -(v < 0)) * v;
}

`r` is equivalent to `abs(v)`.

```

```
define <4 x i8> @src(<4 x i8> %0) {
%1:
  %2 = ashr <4 x i8> %0, { 31, undef, 31, 31 }
  %3 = or <4 x i8> %2, { 1, 1, 1, undef }
  %4 = mul nsw <4 x i8> %3, %0
  ret <4 x i8> %4
}
=>
define <4 x i8> @tgt(<4 x i8> %0) {
%1:
  %2 = icmp slt <4 x i8> %0, { 0, 0, 0, 0 }
  %3 = sub nsw <4 x i8> { 0, 0, 0, 0 }, %0
  %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %0
  ret <4 x i8> %4
}
Transformation seems to be correct!
```

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94874
2021-01-17 17:06:14 +01:00
Nikita Popov 5238e7b302 [InstCombine] Replace one-use select operand based on condition
InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.

This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.

Among other things, this also subsumes D94860.

Differential Revision: https://reviews.llvm.org/D94862
2021-01-16 23:25:02 +01:00
Nikita Popov 17863614da [InstCombine] Fold select -> and/or using impliesPoison
We can fold a ? b : false to a & b if is_poison(b) implies that
is_poison(a), at which point we're able to reuse all the usual fold
on ands. In particular, this covers the very common case of
icmp X, C && icmp X, C'. The same applies to ors.

This currently only has an effect if the
-instcombine-unsafe-select-transform=0 option is set.

Differential Revision: https://reviews.llvm.org/D94550
2021-01-13 17:45:40 +01:00
Luo, Yuanke 055644cc45 [X86][AMX] Prohibit pointer cast on load.
The load/store instruction will be transformed to amx intrinsics in the
pass of AMX type lowering. Prohibiting the pointer cast make that pass
happy.

Differential Revision: https://reviews.llvm.org/D94372
2021-01-13 09:39:19 +08:00
Nikita Popov 23390e7a13 [InstCombine] Handle logical and/or in assume optimization
assume(a && b) can be converted to assume(a); assume(b) even if
the condition is logical. Same for assume(!(a || b)).
2021-01-12 22:36:40 +01:00
Dávid Bolvanský 0529946b5b [instCombine] Add (A ^ B) | ~(A | B) -> ~(A & B)
define i32 @src(i32 %x, i32 %y) {
%0:
  %xor = xor i32 %y, %x
  %or = or i32 %y, %x
  %neg = xor i32 %or, 4294967295
  %or1 = or i32 %xor, %neg
  ret i32 %or1
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %and = and i32 %x, %y
  %neg = xor i32 %and, 4294967295
  ret i32 %neg
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/Cvca4a
2021-01-12 19:29:17 +01:00
Sanjay Patel 288f3fc5df [InstCombine] reduce icmp(ashr X, C1), C2 to sign-bit test
This is a more basic pattern that we should handle before trying to solve:
https://llvm.org/PR48640

There might be a better way to think about this because the pre-condition
that I came up with (number of sign bits in the compare constant) misses a
potential transform for each of ugt and ult as commented on in the test file.

Tried to model this is in Alive:
https://rise4fun.com/Alive/juX1
...but I couldn't get the ComputeNumSignBits() pre-condition to work as
expected, so replaced with leading 0/1 preconditions instead.

  Name: ugt
  Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
  %a = ashr %x, C1
  %r = icmp ugt i8 %a, C2
    =>
  %r = icmp slt i8 %x, 0

  Name: ult
  Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
  %a = ashr %x, C1
  %r = icmp ult i4 %a, C2
    =>
  %r = icmp sgt i4 %x, -1

Also approximated in Alive2:
https://alive2.llvm.org/ce/z/u5hCcz
https://alive2.llvm.org/ce/z/__szVL

Differential Revision: https://reviews.llvm.org/D94014
2021-01-11 15:53:39 -05:00
Florian Hahn c701f85c45
[STLExtras] Use return type from operator* of the wrapped iter.
Currently make_early_inc_range cannot be used with iterators with
operator* implementations that do not return a reference.

Most notably in the LLVM codebase, this means the User iterator ranges
cannot be used with make_early_inc_range, which slightly simplifies
iterating over ranges while elements are removed.

Instead of directly using BaseT::reference as return type of operator*,
this patch uses decltype to get the actual return type of the operator*
implementation in WrappedIteratorT.

This patch also updates a few places to use make use of
make_early_inc_range.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D93992
2021-01-10 14:41:13 +00:00
Kazu Hirata 33bf1cad75 [llvm] Use *Set::contains (NFC) 2021-01-07 20:29:34 -08:00
Juneyoung Lee 29f8628d1f [Constant] Add containsPoisonElement
This patch

- Adds containsPoisonElement that checks existence of poison in constant vector elements,
- Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly

With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved.

Thanks!

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94053
2021-01-06 12:10:33 +09:00
Simon Pilgrim 313d982df6 [IR] Add ConstantInt::getBool helpers to wrap getTrue/getFalse. 2021-01-05 11:01:10 +00:00
Kazu Hirata 530c5af6a4 [Transforms] Construct SmallVector with iterator ranges (NFC) 2021-01-02 09:24:17 -08:00
Dávid Bolvanský ae69fa9b9f [InstCombine] Transform (A + B) - (A & B) to A | B (PR48604)
define i32 @src(i32 %x, i32 %y) {
%0:
  %a = add i32 %x, %y
  %o = and i32 %x, %y
  %r = sub i32 %a, %o
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %b = or i32 %x, %y
  ret i32 %b
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/2fhW6r
2020-12-31 15:04:32 +01:00
Dávid Bolvanský 742ea77ca4 [InstCombine] Transform (A + B) - (A | B) to A & B (PR48604)
define i32 @src(i32 %x, i32 %y) {
%0:
  %a = add i32 %x, %y
  %o = or i32 %x, %y
  %r = sub i32 %a, %o
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %b = and i32 %x, %y
  ret i32 %b
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/aQRh2j
2020-12-31 14:03:20 +01:00
Juneyoung Lee 9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Luo, Yuanke 981a0bd858 [X86] Add x86_amx type for intel AMX.
The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instruction. So amx intrinsics only operate on type x86_amx.
It can help to separate amx intrinsics from llvm IR instructions (+-*/).
Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981.

Differential Revision: https://reviews.llvm.org/D91927
2020-12-30 13:52:13 +08:00
Roman Lebedev 374ef57f13
[InstCombine] 'hoist xor-by-constant from xor-by-value': completely give up on constant exprs
As Mikael Holmén is noting in the post-commit review for the first fix
https://reviews.llvm.org/rGd4ccef38d0bb#967466
not hoisting constantexprs is not enough,
because if the xor originally was a constantexpr (i.e. X is a constantexpr).
`SimplifyAssociativeOrCommutative()` in `visitXor()` will immediately
undo this transform, thus again causing an infinite combine loop.

This transform has resulted in a surprising number of constantexpr failures.
2020-12-29 16:28:18 +03:00
Nikita Popov 4a16c507cb [InstCombine] Disable unsafe select transform behind a flag
This disables the poison-unsafe select -> and/or transform behind
a flag (we continue to perform the fold by default). This is intended
to simplify evaluation and testing while we teach various passes
to directly recognize the select pattern.

This only disables the main select -> and/or transform. A number of
related ones are instead changed to canonicalize to the a ? b : false
and a ? true : b forms which represent and/or respectively. This
requires a bit of care to avoid infinite loops, as we do not want
!a ? b : false to be converted into a ? false : b.

The basic idea here is the same as D93065, but keeps the change
behind a flag for now.

Differential Revision: https://reviews.llvm.org/D93840
2020-12-28 22:43:52 +01:00
Roman Lebedev d4ccef38d0
[InstCombine] 'hoist xor-by-constant from xor-by-value': ignore constantexprs
As it is being reported (in post-commit review) in
https://reviews.llvm.org/D93857
this fold (as i expected, but failed to come up with test coverage
despite trying) has issues with constant expressions.
Since we only care about true constants, which constantexprs are not,
don't perform such hoisting for constant expressions.
2020-12-28 20:15:20 +03:00
Juneyoung Lee 9d70dbdc2b [InstCombine] use poison as placeholder for undemanded elems
Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement.
However, this is problematic because undef isn’t undefined enough.
Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison.
This makes a few straightforward optimizations incorrect, such as:

```
;  https://bugs.llvm.org/show_bug.cgi?id=44185

define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
  %xv = insertelement <4 x float> %q, float %x, i32 2
  %r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef }
  ret <4 x float> %r ; %r[3] is undef
}
=>
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
  %r = insertelement <4 x float> %y, float %x, i32 1
  ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison
}

Transformation doesn't verify!
ERROR: Target is more poisonous than source
```

I’d like to suggest
1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too)
2. Updating shufflevector’s semantics to return poison element if mask is undef

Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay.
m_Undef() matches PoisonValue as well, so existing optimizations will still fire.

The only concern is hidden miscompilations that will go incorrect when poison constant is given.
A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :(

Instead, I’ll simply locally maintain the tests and run Alive2.
If there is any bug found, I’ll report it.

Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93586
2020-12-28 08:58:15 +09:00
Kazu Hirata 789d250613 [CodeGen, Transforms] Use *Map::lookup (NFC) 2020-12-27 09:57:27 -08:00
Roman Lebedev d9ebaeeb46
[InstCombine] Hoist xor-by-constant from xor-by-value
This is one of the deficiencies that can be observed in
https://godbolt.org/z/YPczsG after D91038 patch set.

This exposed two missing folds, one was fixed by the previous commit,
another one is `(A ^ B) | ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`.

`-early-cse` will catch it: https://godbolt.org/z/4n1T1v,
but isn't meaningful to fix it in InstCombine,
because we'd need to essentially do our own CSE,
and we can't even rely on `Instruction::isIdenticalTo()`,
because there are no guarantees that the order of operands matches.
So let's just accept it as a loss.
2020-12-24 21:20:50 +03:00
Roman Lebedev 5b78303433
[InstCombine] Fold `a & ~(a ^ b)` to `x & y`
```
----------------------------------------
define i32 @and_xor_not_common_op(i32 %a, i32 %b) {
%0:
  %b2 = xor i32 %b, 4294967295
  %t2 = xor i32 %a, %b2
  %t4 = and i32 %t2, %a
  ret i32 %t4
}
=>
define i32 @and_xor_not_common_op(i32 %a, i32 %b) {
%0:
  %t4 = and i32 %a, %b
  ret i32 %t4
}
Transformation seems to be correct!
```
2020-12-24 21:20:49 +03:00
Roman Lebedev b3021a72a6
[IR][InstCombine] Add m_ImmConstant(), that matches on non-ConstantExpr constants, and use it
A pattern to ignore ConstantExpr's is quite common, since they frequently
lead into infinite combine loops, so let's make writing it easier.
2020-12-24 21:20:47 +03:00
Simon Pilgrim 89abe1cf83 [InstCombine] foldICmpUsingKnownBits - use KnownBits signed/unsigned getMin/MaxValue helpers. NFCI.
Replace the local compute*SignedMinMaxValuesFromKnownBits methods with the equivalent KnownBits helpers to determine the min/max value ranges.
2020-12-24 14:22:26 +00:00
Nikita Popov ef2f843347 Revert "[InstCombine] Check inbounds in load/store of gep null transform (PR48577)"
This reverts commit 899faa50f2.

Upon further consideration, this does not fix the right issue.
Doing this fold for non-inbounds GEPs is legal, because the
resulting pointer is still based-on null, which has no associated
address range, and as such and access to it is UB.

https://bugs.llvm.org/show_bug.cgi?id=48577#c3
2020-12-24 12:36:56 +01:00
Nikita Popov 90177912a4 Revert "[InstCombine] Fold gep inbounds of null to null"
This reverts commit eb79fd3c92.

This causes stage2 crashes, possibly due to StringMap being
miscompiled. Reverting for now.
2020-12-24 10:20:31 +01:00
Roman Lebedev f8079355c6
[InstCombine] canonicalizeAbsNabs(): don't propagate NSW flag for NABS patter
As Nuno is noting in post-commit review in
https://reviews.llvm.org/D87188#2467915
it is not correct to keep NSW for negated abs pattern,
so don't do that.
2020-12-24 00:06:09 +03:00
Nikita Popov 759b8c11c3 [InstCombine] Handle different pointer types when folding gep of null
The source pointer type is not necessarily the same as the result
pointer type, so we can't simply return the original null pointer,
it might be a different one.
2020-12-23 21:58:26 +01:00
Nikita Popov eb79fd3c92 [InstCombine] Fold gep inbounds of null to null
Effectively, this is what we were previously already doing when
the GEP was used in conjunction with a load or store, but this
fold can also be applied more generally:

> The only in bounds address for a null pointer in the default
> address-space is the null pointer itself.
2020-12-23 21:41:53 +01:00
Nikita Popov 899faa50f2 [InstCombine] Check inbounds in load/store of gep null transform (PR48577)
If the GEP isn't inbounds, then accessing a GEP of null location
is generally not UB.

While this is a minimal fix, the GEP of null handling should
probably be its own fold.
2020-12-23 21:03:22 +01:00
Congzhe Cao c60a58f8d4 [InstCombine] Add check of i1 types in select-to-zext/sext transformation
When doing select-to-zext/sext transformations, we should
not handle TrueVal and FalseVal of i1 type otherwise it
would result in zext/sext i1 to i1.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93272
2020-12-21 18:46:24 -05:00
Roman Lebedev 897c985e1e
[InstCombine] Canonicalize SPF to abs intrinsic
This patch enables canonicalization of SPF_ABS and SPF_ABS
to the abs intrinsic.

This is a recommit, the original try was
05d4c4ebc2,
but it was reverted due to an apparent miscompile,
which since then has just been fixed by the previous commit.

Differential Revision: https://reviews.llvm.org/D87188
2020-12-18 21:18:14 +03:00
Florian Hahn 01089c876b
[InstCombine] Preserve !annotation on newly created instructions.
If the source instruction has !annotation metadata, all instructions
created during combining should also have it. Tell the builder to
add it.

The !annotation system was discussed on llvm-dev as part of
'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

This patch is based on an earlier patch by Francis Visoiu Mistrih.

Reviewed By: thegameg, lebedev.ri

Differential Revision: https://reviews.llvm.org/D91444
2020-12-17 15:20:23 +00:00
Jun Ma 0138399903 [InstCombine] Remove scalable vector restriction in InstCombineCasts
Differential Revision: https://reviews.llvm.org/D93389
2020-12-17 22:02:33 +08:00
Florian Hahn 29077ae860
[IRBuilder] Generalize debug loc handling for arbitrary metadata.
This patch extends IRBuilder to allow adding/preserving arbitrary
metadata on created instructions.

Instead of using references to specific metadata nodes (like DebugLoc),
IRbuilder now keeps a vector of (metadata kind, MDNode *) pairs, which
are added to each created instruction.

The patch itself is a NFC and only moves the existing debug location
handling over to the new system. In a follow-up patch it will be used to
preserve !annotation metadata besides !dbg.

The current approach requires iterating over MetadataToCopy to avoid
adding duplicates, but given that the number of metadata kinds to
copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2
(!dbg and !annotation)) that should not matter.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D93400
2020-12-17 13:27:43 +00:00
Florian Hahn eba09a2db9
[InstCombine] Preserve !annotation for newly created instructions.
When replacing an instruction with !annotation with a newly created
replacement, add the !annotation metadata to the replacement.

This mostly covers cases where the new instructions are created using
the ::Create helpers. Instructions created by IRBuilder will be handled
by D91444.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D93399
2020-12-17 09:06:51 +00:00
Jun Ma 52a3267ffa [InstCombine] Remove scalable vector restriction in foldVectorBinop
Differential Revision: https://reviews.llvm.org/D93289
2020-12-15 21:14:59 +08:00
Jun Ma ffe84d90e9 [InstCombine][NFC] Change cast of FixedVectorType to dyn_cast. 2020-12-15 20:36:57 +08:00
Jun Ma e12f584578 [InstCombine] Remove scalable vector restriction in InstCombineCompares
Differential Revision: https://reviews.llvm.org/D93269
2020-12-15 20:36:57 +08:00
Jun Ma 2ac58e21a1 [InstCombine] Remove scalable vector restriction when fold SelectInst
Differential Revision: https://reviews.llvm.org/D93083
2020-12-15 20:36:57 +08:00
Reid Kleckner d2ed9d6b7e Revert "ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC"
We determined that the MSVC implementation of std::aligned* isn't suited
to our needs. It doesn't support 16 byte alignment or higher, and it
doesn't really guarantee 8 byte alignment. See
https://github.com/microsoft/STL/issues/1533

Also reverts "ADT: Change AlignedCharArrayUnion to an alias of std::aligned_union_t, NFC"

Also reverts "ADT: Remove AlignedCharArrayUnion, NFC" to bring back
AlignedCharArrayUnion.

This reverts commit 4d8bf870a8.

This reverts commit d10f9863a5.

This reverts commit 4b5dc150b9.
2020-12-14 17:04:06 -08:00
Sanjay Patel 4f051fe374 [InstCombine] avoid crash sinking to unreachable block
The test is reduced from the example in D82005.

Similar to 94f6d365e, the test here would assert in
the DomTree when we tried to convert a select to a
phi with an unreachable block operand.

We may want to add some kind of guard code in DomTree
itself to avoid this sort of problem.
2020-12-10 13:10:26 -05:00
Roman Lebedev e6f2a79d7a
[InstCombine] canonicalizeSaturatedAdd(): last fold is only valid for strict comparison (PR48390)
We could create uadd.sat under incorrect circumstances
if a select with -1 as the false value was canonicalized
by swapping the T/F values. Unlike the other transforms
in the same function, it is not invariant to equality.

Some alive proofs: https://alive2.llvm.org/ce/z/emmKKL

Based on original patch by David Green!

Fixes https://bugs.llvm.org/show_bug.cgi?id=48390

Differential Revision: https://reviews.llvm.org/D92717
2020-12-09 18:19:09 +03:00
Joe Ellis 80c33de2d3 [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics
This commit adds two new intrinsics.

- llvm.experimental.vector.insert: used to insert a vector into another
  vector starting at a given index.

- llvm.experimental.vector.extract: used to extract a subvector from a
  larger vector starting from a given index.

The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D91362
2020-12-09 11:08:41 +00:00
Kazu Hirata ddb002d7c7 [InstCombine] Remove replacePointer (NFC)
The declaration was introduced on Feb 10, 2017 in commit
ba01ed00fe without a corresponding
definition.
2020-12-06 10:24:08 -08:00
Sanjay Patel 94f6d365e4 [InstCombine] avoid crash on phi with unreachable incoming block (PR48369) 2020-12-06 09:31:47 -05:00
Duncan P. N. Exon Smith d10f9863a5 ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC
Prepare to delete `AlignedCharArrayUnion` by migrating its users over to
`std::aligned_union_t`.

I will delete `AlignedCharArrayUnion` and its tests in a follow-up
commit so that it's easier to revert in isolation in case some
downstream wants to keep using it.

Differential Revision: https://reviews.llvm.org/D92516
2020-12-04 12:34:49 -08:00
Duncan P. N. Exon Smith 5b267fb796 ADT: Stop peeking inside AlignedCharArrayUnion, NFC
Update all the users of `AlignedCharArrayUnion` to stop peeking inside
(to look at `buffer`) so that a follow-up patch can replace it with an
alias to `std::aligned_union_t`.

This was reviewed as part of https://reviews.llvm.org/D92512, but I'm
splitting this bit out to commit first to reduce churn in case the
change to `AlignedCharArrayUnion` needs to be reverted for some
unexpected reason.
2020-12-04 11:07:42 -08:00
jasonliu a65d8c5d72 [XCOFF][AIX] Generate LSDA data and compact unwind section on AIX
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
   It doesn't use the GCC personality rountine, because the
   interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D91455
2020-12-02 18:42:44 +00:00
Sanjay Patel 9f60b8b3d2 [InstCombine] canonicalize sign-bit-shift of difference to ext(icmp)
icmp is the preferred spelling in IR because icmp analysis is
expected to be better than any other analysis. This should
lead to more follow-on folding potential.

It's difficult to say exactly what we should do in codegen to
compensate. For example on AArch64, which of these is preferred:
	sub	w8, w0, w1
	lsr	w0, w8, #31

vs:
	cmp	w0, w1
	cset	w0, lt

If there are perf regressions, then we should deal with those in
codegen on a case-by-case basis.

A possible motivating example for better optimization is shown in:
https://llvm.org/PR43198 but that will require other transforms
before anything changes there.

Alive proof:
https://rise4fun.com/Alive/o4E

  Name: sign-bit splat
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = ashr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = sext %c

  Name: sign-bit LSB
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = lshr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = zext %c
2020-12-01 09:58:11 -05:00
Bhramar Vatsa fd679107d6
[InstCombine] Optimize away the unnecessary multi-use sign-extend
C.f. https://bugs.llvm.org/show_bug.cgi?id=47765

Added a case for handling the sign-extend (Shl+AShr) for multiple uses,
to optimize it away for an individual use,
when the demanded bits aren't affected by sign-extend.

https://rise4fun.com/Alive/lgf

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D91343
2020-12-01 16:54:00 +03:00
Roman Lebedev 94ead0190f
[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold, 2
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
2020-12-01 16:54:00 +03:00
Roman Lebedev 52533b52b8
Revert "[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold"
It seems i have missed checklines, temporairly reverting,
will reland momentairly..

This reverts commit aa1aa13509.
2020-12-01 15:47:04 +03:00
Roman Lebedev aa1aa13509
[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
2020-12-01 15:13:08 +03:00
Roman Lebedev 8e29e20e0d
[InstCombine] Evaluate new shift amount for sext(ashr(shl(trunc()))) fold in wide type (PR48343)
It is not correct to compute that new shift amount in it's narrow type
and only then extend it into the wide type:

----------------------------------------
Optimization: PR48343 good
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sext %Y
  %n1 = sub width(%o0), %n0
  %n2 = sub width(%X), %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 2016
Optimization is correct!

----------------------------------------
Optimization: PR48343 bad
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sub width(%o0), %Y
  %n1 = sub width(%X), %n0
  %n2 = sext %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 1
ERROR: Domain of definedness of Target is smaller than Source's for i9 %r

Example:
%X i9 = 0x000 (0)
%Y i4 = 0x3 (3)
%o0 i4 = 0x0 (0)
%o1 i4 = 0x0 (0)
%o2 i4 = 0x0 (0)
%n0 i4 = 0x1 (1)
%n1 i4 = 0x8 (8, -8)
%n2 i9 = 0x1F8 (504, -8)
%n3 i9 = 0x000 (0)
Source value: 0x000 (0)
Target value: undef


I.e. we should be computing it in the wide type from the beginning.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48343
2020-12-01 15:13:07 +03:00
Sanjay Patel 678b9c5dde [InstCombine] try difference-of-shifts factorization before negator
We need to preserve wrapping flags to allow better folds.
The cases with geps may be non-intuitive, but that appears to agree with Alive2:
https://alive2.llvm.org/ce/z/JQcqw7
We create 'nsw' ops independent from the original wrapping on the sub.
2020-11-24 13:56:30 -05:00
Sanjay Patel ab29f091eb [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps
This is a retry of 324a53205. I cautiously reverted that at 6aa3fc4
because the rules about gep math were not clear. Since then, we
have added this line to LangRef for gep inbounds:
"The successive addition of offsets (without adding the base address)
does not wrap the pointer index type in a signed sense (nsw)."

See D90708 and post-commit comments on the revert patch for more details.
2020-11-23 16:50:09 -05:00
Kazu Hirata def7cfb7ff [InstCombine] Use is_contained (NFC) 2020-11-21 15:47:11 -08:00
Roman Lebedev a91e96702a
[InstCombine] Fold `and(shl(zext(x), width(SIGNMASK) - width(%x)), SIGNMASK)` to `and(sext(%x), SIGNMASK)`
One less instruction and reducing use count of zext.
As alive2 confirms, we're fine with all the weird combinations of
undef elts in constants, but unless the shift amount was undef
for a lane, we must sanitize undef mask to zero, since sign bits
are no longer zeros.

https://rise4fun.com/Alive/d7r
```
----------------------------------------
Optimization: zz
Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2))
  %o0 = zext %x
  %o1 = shl %o0, C1
  %r = and %o1, C2
=>
  %n0 = sext %x
  %r = and %n0, C2

Done: 2016
Optimization is correct!
```
2020-11-20 00:31:27 +03:00
Kazu Hirata 43c0e4f665 [Transforms] Use llvm::is_contained (NFC) 2020-11-18 20:42:22 -08:00
Sanjay Patel 4a66a1d17a [InstCombine] allow vectors for masked-add -> xor fold
https://rise4fun.com/Alive/I4Ge

  Name: add with pow2 mask
  Pre: isPowerOf2(C2) && (C1 & C2) != 0 && (C1 & (C2-1)) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %n = and i8 %x, C2
  %r = xor i8 %n, C2
2020-11-17 13:36:08 -05:00
Simon Pilgrim f7ebdec987 [InstCombine] visitAnd - remove unnecessary Value *X, *Y shadow variables. NFCI.
Fixes a number of Wshadow warnings.
2020-11-17 17:59:21 +00:00
Simon Pilgrim abf29d9862 [InstCombine] visitAnd - use m_SpecificInt instead of m_APInt + comparison. NFCI.
m_SpecificInt has the same 'no undef element' behaviour as m_APInt so no change there, and anyway we have test coverage for undef elements in the fold.

Noticed while fixing a Wshadow warning about shadow Value *X, *Y variables.
2020-11-17 17:37:10 +00:00
Sanjay Patel f791ad7e1e [InstCombine] remove scalar constraint for mask-of-add fold
https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:45 -05:00
Sanjay Patel 433696911a [InstCombine] relax constraints on mask-of-add
There are 2 changes:
1. Remove the unnecessary one-use check.
2. Remove the unnecessary power-of-2 check.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:44 -05:00
Sanjay Patel 4e68bc0999 Revert "[InstCombine] add multi-use demanded bits fold for add with low-bit mask"
This reverts commit e56103d250.
There is a stage2 msan failure blamed on this commit:
http://lab.llvm.org:8011/#/builders/74/builds/888/steps/9/logs/stdio
2020-11-16 14:48:09 -05:00
Sanjay Patel 6ddc237766 [InstCombine] reduce code for flip of masked bit; NFC
There are 1-2 potential follow-up NFC commits to reduce
this further on the way to generalizing this for vectors.

The operand replacing path should be dead code because demanded
bits handles that more generally (D91415).
2020-11-15 15:43:34 -05:00
Sanjay Patel e56103d250 [InstCombine] add multi-use demanded bits fold for add with low-bit mask
I noticed an add example like the one from D91343, so here's a similar patch.
The logic is based on existing code for the single-use demanded bits fold.
But I only matched a constant instead of using compute known bits on the
operands because that was the motivating patterni that I noticed.

I think this will allow removing a special-case (but incomplete) dedicated
fold within visitAnd(), but I need to untangle the existing code to be sure.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2

Differential Revision: https://reviews.llvm.org/D91415
2020-11-15 15:09:49 -05:00
Nikita Popov 02dda1c659 [Local] Clean up EmitGEPOffset
Handle the emission of the add in a single place, instead of three
different ones.

Don't emit an unnecessary add with zero to start with. It will get
dropped by InstCombine, but we may as well not create it in the
first place. This also means that InstCombine does not need to
specially handle this extra add.

This is conceptually NFC, but can affect worklist order etc.
2020-11-13 18:30:56 +01:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Sanjay Patel 0abde4bc92 [InstCombine] fold sub of low-bit masked value from offset of same value
There might be some demanded/known bits way to generalize this,
but I'm not seeing it right now.

This came up as a regression when I was looking at a different
demanded bits improvement.

https://rise4fun.com/Alive/5fl

  Name: general
  Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0
  %a1 = add i8 %x, C1
  %a2 = and i8 %x, C2
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, ~C2

  Name: test 1
  %a1 = add i8 %x, 192
  %a2 = and i8 %x, 10
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -11

  Name: test 2
  %a1 = add i8 %x, -108
  %a2 = and i8 %x, 3
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -4
2020-11-12 20:10:28 -05:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
2020-11-11 12:15:54 +00:00
LemonBoy 42732d33cc
[InstCombine] Fix constant-folding of overflowing arithmetic ops on vectors
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D89628
2020-11-09 14:41:07 +03:00
Simon Pilgrim 0fe91ad463 [InstCombine] foldSelectFunnelShift - block poison in funnel shift value
As raised by @nlopes on D90382 - if this is not a rotate then the select was blocking poison from the 'shift-by-zero' non-TVal, but a funnel shift won't - so freeze it.
2020-11-08 12:58:30 +00:00
Roman Lebedev 8d0fdd36a3
[IR] CmpInst: Add getFlippedSignednessPredicate()
And refactor a few places to use it
2020-11-06 11:31:09 +03:00
Simon Pilgrim 4b2be681f4 [InstCombine] Remove orphan InstCombinerImpl method declarations. NFCI. 2020-11-05 10:13:16 +00:00
Roman Lebedev c009d11bda
[InstCombine] Perform C-(X+C2) --> (C-C2)-X transform before using Negator
In particular, it makes it fire for C=0, because negator doesn't want
to perform that fold since in general it's not beneficial.
2020-11-03 16:06:52 +03:00
Roman Lebedev e465f9c303
[InstCombine] Negator: - (C - %x) --> %x - C (PR47997)
This relaxes one-use restriction on that `sub` fold,
since apparently the addition of Negator broke
preexisting `C-(C2-X) --> X+(C-C2)` (with C=0) fold.
2020-11-03 16:06:51 +03:00
Simon Pilgrim 538fdb0189 [InstCombine] foldSelectRotate - generalize to foldSelectFunnelShift
This is the last of the rotate->funnel shift InstCombine generalizations for PR46896

We still have foldGuardedRotateToFunnelShift to deal with in AggressiveInstCombine

Differential Revision: https://reviews.llvm.org/D90382
2020-10-31 12:32:34 +00:00
Simon Pilgrim dcb3dc101d [InstCombine] visitShl - ensure inner shifts have inrange amounts
Noticed when fixing OSS Fuzz #26716
2020-10-29 15:28:15 +00:00
Florian Hahn 53f4c4b2cc [InstCombine] Do not introduce bitcasts for swifterror arguments.
The following constraints hold for swifterror values:

    A swifterror value (either the parameter or the alloca) can only
    be loaded and stored from, or used as a swifterror argument.

This patch updates instcombine to not try to convert a bitcast of a
function into a bitcast of a swifterror argument.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D90258
2020-10-28 21:52:12 +00:00
Luqman Aden 4c0a016927 Rename EHPersonality::MSVC_Win64SEH to EHPersonality::MSVC_TableSEH. NFC.
The types of SEH aren't x86(-32) vs x64 but rather stack-based exception chaining
vs table-based exception handling. x86-32 is the only arch for which Windows
uses the former. 32-bit ARM would use what is called Win64SEH today, which
is a bit confusing so instead let's just rename it to be a bit more clear.

Reviewed By: compnerd, rnk

Differential Revision: https://reviews.llvm.org/D90117
2020-10-27 23:22:13 -07:00
Roman Lebedev 0ac56e8eaa
[InstCombine] Fold `(X >>? C1) << C2` patterns to shift+bitmask (PR37872)
This is essentially finalizes a revert of rL155136,
because nowadays the situation has improved, SCEV can model
all these patterns well, and we canonicalize rotate-like patterns
into a funnel shift intrinsics in InstCombine.
So this should not cause any pessimization.

I've verified the canonicalize-{a,l}shr-shl-to-masking.ll transforms
with alive, which confirms that we can freely preserve exact-ness,
and no-wrap flags.

Profs:
* base: https://rise4fun.com/Alive/gPQ
* exact-ness preservation: https://rise4fun.com/Alive/izi
* nuw preservation: https://rise4fun.com/Alive/DmD
* nsw preservation: https://rise4fun.com/Alive/SLN6N
* nuw nsw preservation: https://rise4fun.com/Alive/Qp7

Refs. https://reviews.llvm.org/D46760
2020-10-27 14:42:53 +03:00
Sanjay Patel 5a6e66ec72 [InstCombine] add folds for icmp+ctpop
https://alive2.llvm.org/ce/z/XjFPQJ

  define void @src(i64 %value) {
    %t0 = call i64 @llvm.ctpop.i64(i64 %value)
    %gt = icmp ugt i64 %t0, 63
    %lt = icmp ult i64 %t0, 64
    call void @use(i1 %gt, i1 %lt)
    ret void
  }

  define void @tgt(i64 %value) {
    %eq = icmp eq i64 %value, -1
    %ne = icmp ne i64 %value, -1
    call void @use(i1 %eq, i1 %ne)
    ret void
  }

  declare i64 @llvm.ctpop.i64(i64) #1
  declare void @use(i1, i1)
2020-10-26 16:48:56 -04:00
Sanjay Patel 437d7551c5 [InstCombine] reduce code duplication in icmp intrinsic folds; NFC 2020-10-26 16:48:56 -04:00
Joe Ellis 0f83505593 [SVE][InstCombine] Fix TypeSize warning in canReplaceGEPIdxWithZero
The warning would fire when calling canReplaceGEPIdxWithZero on a GEP
whose source element type is a scalable vector. The size of scalable
vector types is not known, so this optimization cannot be performed.

This patch fixes the issue by:

- bailing out early in this routine if the GEP instruction's source
  element type is a scalable vector.

- making use of getFixedSize -- this removes the dependency on the
  deprecated interface.

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D89968
2020-10-26 17:40:26 +00:00
Simon Pilgrim 6b2eb31e1e [InstCombine] Add support for zext(and(neg(amt),width-1)) rotate shift amount patterns
Alive2: https://alive2.llvm.org/ce/z/bCvvHd
2020-10-26 11:22:41 +00:00
Simon Pilgrim 3052e474ec [InstCombine] matchBSwapOrBitReversem - recognise or(fshl(),fshl()) bswap patterns.
I'm not certain InstCombinerImpl::matchBSwapOrBitReverse needs to filter the or(op0(),op1()) ops - there are just too many cases that recognizeBSwapOrBitReverseIdiom/collectBitParts handle now (and quickly).
2020-10-25 10:17:45 +00:00
Simon Pilgrim 310f62b4ff [InstCombine] narrowFunnelShift - fold trunc/zext or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) (PR35155)
As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns.

Unlike matchFunnelShift we don't include the computeKnownBits limitation as extracting the pattern from the zext/trunc layers should be a indicator of reasonable funnel shift codegen, in D89139 we demonstrated how to efficiently promote funnel shifts to wider types.

Differential Revision: https://reviews.llvm.org/D89542
2020-10-24 12:42:43 +01:00
Simon Pilgrim 1cab3bf004 [InstCombine] matchBSwapOrBitReverse - expose bswap/bitreverse matching flags.
matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.
2020-10-23 12:35:28 +01:00
Simon Pilgrim 19a13bf538 [InstCombine] Rename InstCombinerImpl::matchBSwap to matchBSwapOrBitReverse. NFCI.
This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.
2020-10-23 12:35:27 +01:00
Caroline Concatto 2415636475 [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms
Use isKnownXY comparators when one of the operands can be with
scalable vectors or getFixedSize() for all the other cases.

This patch also does bug fixes for getPrimitiveSizeInBits by using
getFixedSize() near the places with the TypeSize comparison.

Differential Revision: https://reviews.llvm.org/D89703
2020-10-23 09:15:17 +01:00
Layton Kifer d49911c282 [InstCombine][NFC] Use ConstantExpr::getBinOpIdentity
Delete duplicate implementation getSelectFoldableConstant and
replace with ConstantExpr::getBinOpIdentity.

Differential Revision: https://reviews.llvm.org/D89839
2020-10-22 20:44:57 +02:00
Vedant Kumar 3419252a79 [InstCombine] Remove dbg.values describing contents of dead allocas
When InstCombine removes an alloca, it erases the dbg.{addr,declare}
instructions which refer to the alloca. It would be better to instead
remove all debug intrinsics which describe the contents of the dead
alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s.

This effectively undoes work performed in an InstCombine run earlier in
the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values
before CallInst users of an alloca. The motivating example looks like:

```
  define void @foo(i32 %0) {
    %a = alloca i32              ; This alloca is erased.
    store i32 %0, i32* %a
    dbg.value(i32 %0, "arg0")    ; This dbg.value survives.
    dbg.value(i32* %a, "arg0", DW_OP_deref)
    call void @trivially_inlinable_no_op(i32* %a)
    ret void
  }
```

If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef)
after inlining, making "arg0" unavailable. But we already have dbg.value
descriptions of the alloca's value (from LowerDbgDeclare), so the
DW_OP_deref dbg.value cannot serve its purpose of describing an
initialization of the alloca by some callee. It invalidates other useful
dbg.values, causing large gaps in location coverage, so we should delete
it (even though doing so may cause stale dbg.values to appear, if
there's a dead store to `%a` in @trivially_inlinable_no_op).

OTOH, it wouldn't be correct to delete all dbg.value descriptions of an
alloca. Note that it's possible to describe a variable that takes on
different pointer values, e.g.:

```
  void use(int *);
  void t(int a, int b) {
    int *local = &a;     // dbg.value(i32* %a.addr, "local")
    local = &b;          // dbg.value(i32* undef, "local")
    use(&a);             //           (note: %b.addr is optimized out)
    local = &a;          // dbg.value(i32* %a.addr, "local")
  }
```

In this example, the alloca for "b" is erased, but we need to describe
the value of "local" as <unavailable> before the call to "use". This
prevents "local" from appearing to be equal to "&a" at the callsite.

rdar://66592859

Differential Revision: https://reviews.llvm.org/D85555
2020-10-22 10:00:13 -07:00
Simon Pilgrim 7b4a828452 [InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. 2020-10-21 11:53:45 +01:00
Martin Storsjö 4de215ff18 Revert "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"
Also revert "[InstCombine] foldOrOfICmps - use m_Specific instead of
explicit comparisons. NFCI." to make the primarily intended revert
work.

This reverts commits ce13549761 and
e372a5f86f.

This commit caused failed asserts e.g. like this:

$ cat repro.cpp
bool a(char b) {
  return b >= '0' && b <= '9' || (b | 32) >= 'a' && (b | 32) <= 'z';
$ clang++ -target x86_64-linux-gnu -c -O2 repro.cpp
clang++: ../include/llvm/ADT/APInt.h:1151: bool llvm::APInt::operator==(const
llvm::APInt&) const: Assertion `BitWidth == RHS.BitWidth && "Comparison
requires equal bit widths"' failed.
2020-10-21 09:47:18 +03:00
Simon Pilgrim ec228fbfc0 [InstCombine] SimplifyDemandedUseBits - replace dyn_cast<ConstantInt> with m_ConstantInt. NFCI. 2020-10-20 16:45:16 +01:00
Simon Pilgrim ce13549761 [InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. 2020-10-20 16:26:41 +01:00
Simon Pilgrim e372a5f86f [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support
Reapplied rGa704d8238c86 with a check for integer/integervector types to prevent matching with pointer types
2020-10-20 14:14:26 +01:00
Simon Pilgrim e346ea9905 [InstCombine] SimplifyDemandedUseBits - pass APInt by const reference. NFCI. 2020-10-20 12:13:08 +01:00
Simon Pilgrim adb52e5f9e [InstCombine] foldOrOfICmps - only fold (icmp_eq B, 0) | (icmp_ult/gt A, B) for integer types
Fixes a number of stage2 buildbots that were failing when I generalized the m_ConstantInt() logic - that didn't match for pointer types but m_Zero() does......
2020-10-19 17:05:38 +01:00
Simon Pilgrim 482e6f0041 Revert rGa704d8238c86bac: "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"
This reverts commit a704d8238c.

Causing stage2 build failures on some bots.
2020-10-19 16:03:36 +01:00
Simon Pilgrim de885f1b2a [InstCombine] Add (icmp ne A, 0) | (icmp ne B, 0) --> (icmp ne (A|B), 0) vector support
Scalar cases were already being handled by foldLogOpOfMaskedICmps (so this was dead code), but refactoring to support non-uniform vectors will take some time, so tweak this fold in the meantime.
2020-10-19 15:41:21 +01:00
Simon Pilgrim ecd25086d1 [InstCombine] Add (icmp eq B, 0) | (icmp ult/gt A, B) -> (icmp ule A, B-1) vector support 2020-10-19 15:23:48 +01:00
Simon Pilgrim a704d8238c [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support 2020-10-19 14:55:18 +01:00
Simon Pilgrim 1d90e53044 [InstCombine] foldOrOfICmps - pull out repeated getOperand() calls. NFCI. 2020-10-19 14:28:08 +01:00
Simon Pilgrim 0b7b446a40 [InstCombine] Support vectors-with-undef in and(logicalshift(1,X),1) --> zext(X == 0) fold 2020-10-19 11:10:32 +01:00
Sanjay Patel 53e92b4c0e [InstCombine] (~A & B) ^ A -> A | B
Differential Revision: https://reviews.llvm.org/D86395
2020-10-17 12:20:18 -04:00
Juneyoung Lee 62a0ec1612 Add support for !noundef metatdata on loads
This patch adds metadata !noundef and makes load instructions can optionally have it.
A load with !noundef always return a well-defined value (has no undef bit or isn't poison).
If the loaded value isn't well defined, the behavior is undefined.

This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise.

The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89050
2020-10-17 13:50:10 +09:00
Simon Pilgrim 83ae625f0c [InstCombine] visitAnd - pull out repeated I.getType() calls. NFCI. 2020-10-16 15:43:11 +01:00
Simon Pilgrim 253f24cf4c [InstCombine] Remove custom and(trunc(and(x,c1)),c2) fold
This is more correctly handled by canEvaluateTruncated (one use checks etc.) and covers all the tests cases that were added for this fold.
2020-10-16 15:43:10 +01:00
Simon Pilgrim 981fdf01d5 [InstCombine] foldSelectRotate - canonicalize to OR(SHL,LSHR). NFCI.
Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15
2020-10-16 13:18:53 +01:00
Simon Pilgrim 1cf347e48b [InstCombine] narrowRotate - minor refactoring for funnel shift support. NFC.
Prep work for PR35155 - renamed narrowRotate to narrowFunnelShift, rewrote some comments and adjusted code to collect separate shift values, although we bail if they don't match (still only rotations are only actually folded).

I'm trying to match matchFunnelShift as much as possible in case we finally get to merge these one day.
2020-10-16 11:27:28 +01:00
Simon Pilgrim 55991b44b7 [InstCombine] foldAndOrOfICmpsOfAndWithPow2 - add vector support
Support vector cases for folding:

 (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2)
 (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2)
2020-10-16 10:41:40 +01:00
Simon Pilgrim 23f1616626 [InstCombine] Use m_SpecificInt instead of m_APInt + comparison. NFCI. 2020-10-15 16:06:27 +01:00
Simon Pilgrim b3330ae42c [InstCombine] SimplifyDemandedUseBits - xor - refactor cast<ConstantInt> usage to PatternMatch. NFCI.
First step towards replacing these to add full vector support.
2020-10-15 16:06:23 +01:00
Simon Pilgrim 2b45639ea0 [InstCombine] InstCombineAndOrXor - refactor cast<ConstantInt> usages to PatternMatch. NFCI.
First step towards replacing these to add full vector support.
2020-10-15 16:06:17 +01:00
Simon Pilgrim 09be7623e4 [InstCombine] visitXor - refactor ((X^C1)>>C2)^C3 -> (X>>C2)^((C1>>C2)^C3) fold. NFCI.
This is still ConstantInt-only (scalar) but is refactored to use PatternMatch to make adding vector support in the future relatively trivial.
2020-10-15 14:38:15 +01:00
Simon Pilgrim 60ba9233d1 Revert rG25a97c3a43d7 - "[InstCombine] visitCallInst - retain undefs in vector funnel shift amounts"
This reverts commit 25a97c3a43.

We have other constant folds that fold undef funnel shift amounts to 0 - so we need to be consistent.

If we end up with regressions where we lose a splat shift amount pattern we'll have to investigate other canonicalizations, but matchFunnelShift currently protects us from that.
2020-10-14 18:14:37 +01:00
Matt Arsenault 6a9484f4bf InstCombine: Fix losing load properties in copy-constant-to-alloca
Preserve the alignment and metadata. Atomic loads are skipped for
this, but pass along the properties for consistency.
2020-10-14 12:55:25 -04:00
Matt Arsenault 6da31fa4a6 InstCombine: Fix infinite loop in copy-constant-to-alloca transform
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.

This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
2020-10-14 12:55:25 -04:00
Simon Pilgrim 89657b3a3b [InstCombine] narrowRotate - canonicalize to OR(SHL,LSHR). NFCI.
Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15
2020-10-14 16:45:00 +01:00
Simon Pilgrim 89a2a47870 [InstCombine] Add m_SpecificIntAllowUndef pattern matcher
m_SpecificInt doesn't accept undef elements in a vector splat value - tweak specific_intval to optionally allow undefs and add the m_SpecificIntAllowUndef variants.

Allows us to remove the m_APIntAllowUndef + comparison hack inside matchFunnelShift
2020-10-14 16:15:53 +01:00
Simon Pilgrim 25a97c3a43 [InstCombine] visitCallInst - retain undefs in vector funnel shift amounts
By always performing a modulo on the shift amount constants this was causing undef amounts being replaced with zero, meaning we were losing funnel shift by splat (with undef) patterns.

Tweaked the shift amount bounds check to support (passthrough) undefs, and use Constant::mergeUndefsWith to preserve the undefs after folding.
2020-10-14 14:38:21 +01:00
Juneyoung Lee 9b3c2a72e4 [ValueTracking] Use assume's noundef operand bundle
This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219
2020-10-14 20:16:33 +09:00
Simon Pilgrim 1e4d882f9a [InstCombine] matchFunnelShift - add support for non-uniform vectors containing undefs.
Replace m_SpecificInt with m_APIntAllowUndef to matching splats containing undefs, then use ConstantExpr::mergeUndefsWith to merge the undefs together in the result.

The undef funnel shift amounts are getting replaced with zero later on - I'll address this in a later patch, otherwise we lose potential shift by splat value patterns.
2020-10-14 10:42:27 +01:00
Simon Pilgrim 9c3138bd6d [InstCombine] visitTrunc - pass through undefs for trunc(shift(trunc/ext(x),c)) patterns
Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons.

I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant.

Differential Revision: https://reviews.llvm.org/D88687
2020-10-13 14:35:18 +01:00
Simon Pilgrim 5df61724a1 [InstCombine] Support uniform vector splats in ((((X >> C) & CC) + Y) << C) folds.
Add support for uniform vector splats (no undefs).
2020-10-13 09:28:39 +01:00
Simon Pilgrim 4ff7136268 [InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.
There's no need to create constant vector splats manually - missed this one in rG24dd0cd1edd5
2020-10-12 18:39:45 +01:00
Simon Pilgrim 24dd0cd1ed [InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.
There's no need to create constant vector splats manually.
2020-10-12 18:17:20 +01:00
Simon Pilgrim 2de368f6a7 [InstCombine] FoldShiftByConstant - merge equivalent types. NFCI.
Consistently use the original shift instruction's Type/BitWidth instead of the operands, casted values etc.
2020-10-12 18:17:19 +01:00
Simon Pilgrim bbf3925879 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw (REAPPLIED)
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).

Reapplied after the shift canonicalization in rG02295e6d1a15 which removed the need to flip the shift values.

Differential Revision: https://reviews.llvm.org/D88783
2020-10-12 16:06:41 +01:00
Simon Pilgrim fa56623370 [InstCombine] matchFunnelShift - remove shift value commutation. NFCI.
After rG02295e6d1a15 we no longer need to invert the shift values for fshr - this is just hidden at the moment as funnel shifts only ever match for constant values so never use the fshr "Sub on SHL" path.
2020-10-12 15:55:18 +01:00
Simon Pilgrim 02295e6d1a [InstCombine] matchFunnelShift - canonicalize to OR(SHL,LSHR). NFCI.
Simplify the shift amount matching code by canonicalizing the shift ops first.
2020-10-12 15:10:59 +01:00
Simon Pilgrim 45d785e22b Revert rGb97093e520036f8 - "[InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw"
This reverts commit b97093e520.

Funnel shift argument commutation isn't working correctly
2020-10-12 11:38:52 +01:00
Roman Lebedev 544a6aa267
[InstCombine] combineLoadToOperationType(): don't fold int<->ptr cast into load
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

As we've been establishing (see D88788/D88789), if there is a int<->ptr cast,
it basically must stay as-is, we can't do much with it.

I've looked, and the most source of new such casts being introduces,
as far as i can tell, is this transform, which, ironically,
tries to reduce count of casts..

On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% less `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.

However just on RawSpeed, where i know there are basically
none `IntToPtr` in the original source code,
this results in -99.27% less `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255).
which is again an increase of 14.34% in total.

To me this does seem like the step in the right direction,
we end up with strictly less `IntToPtr`, but strictly more `PtrToInt`,
which seems like a reasonable trade-off.

See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.

(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)

Reviewed By: nlopes, nikic

Differential Revision: https://reviews.llvm.org/D88979
2020-10-11 20:24:28 +03:00
Sanjay Patel 3f3356bdd9 [InstCombine] allow vector splats for add+xor --> shifts 2020-10-11 09:04:24 -04:00
Sanjay Patel f81200ae99 [InstCombine] add one-use check to add+xor transform
As shown in the affected test, we could increase instruction
count without this limitation. There's another test with extra
use that shows we still convert directly to a real "sext" if
possible.
2020-10-11 09:04:24 -04:00
Simon Pilgrim b97093e520 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).

Differential Revision: https://reviews.llvm.org/D88783
2020-10-11 10:37:20 +01:00
Simon Pilgrim b752daa26b [InstCombine] Replace getLogBase2 internal helper with ConstantExpr::getExactLogBase2. NFCI.
This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to.

The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.
2020-10-11 10:31:17 +01:00
Simon Pilgrim 702ccb40e2 [InstCombine] getLogBase2(undef) -> 0.
Move the undef element handling into the getLogBase2 helper instead of pre-empting with replaceUndefsWith.
2020-10-10 20:29:03 +01:00
Simon Pilgrim 3aab3cbd4a [InstCombine] getLogBase2 - no need to specify Type. NFCI.
In all the getLogBase2 uses, the specified Type is always the same as the constant being folded.
2020-10-10 20:09:55 +01:00
Simon Pilgrim 8a836daaa9 [InstCombine] Support lshr(trunc(lshr(x,c1)), c2) -> trunc(lshr(lshr(x,c1),c2)) uniform vector tests
FoldShiftByConstant is hardcoded for scalar/uniform outer shift amounts atm so that needs to be fixed first to support non-uniform cases
2020-10-09 16:54:46 +01:00
Simon Pilgrim 1c040a3e56 [InstCombine] commonShiftTransforms - add support for pow2 nonuniform constant vectors in srem fold
Note: we already fold srem to undef if any denominator vector element is undef.
2020-10-09 15:59:33 +01:00
Sanjay Patel 080e6bc205 [InstCombine] allow vector splats for add+and with high-mask
There might be a better way to specify the pre-conditions,
but this is hopefully clearer than the way it was written:
https://rise4fun.com/Alive/Jhk3

  Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2)
  %a = and %x, C2
  %r = add %a, C1
  =>
  %a2 = add %x, C1
  %r = and %a2, C2
2020-10-09 10:39:11 -04:00
Simon Pilgrim 9e796d5e71 [InstCombine] foldShiftOfShiftedLogic - add support for nonuniform constant vectors 2020-10-09 14:25:12 +01:00
Simon Pilgrim 556316cf72 [InstCombine] foldShiftOfShiftedLogic - replace cast<BinaryOperator> with m_BinOp matcher. NFCI.
Allows us to drop the !isa<ConstantExpr> check.
2020-10-09 14:10:12 +01:00
Simon Pilgrim d9f064dc0b [InstCombine] visitTrunc - trunc(shl(X, C)) --> shl(trunc(X),trunc(C)) vector support
Annoyingly vectors aren't supported by shouldChangeType(), but we have precedents for always performing this on vector types (e.g. narrowBinOp).

Differential Revision: https://reviews.llvm.org/D89067
2020-10-08 22:07:51 +01:00
Sanjay Patel f688ae7a0e [InstCombine] allow vector splats for add+xor with low-mask
This can be allowed with undef elements too, but that can be another step:
https://alive2.llvm.org/ce/z/hnC4Z-
2020-10-08 15:53:38 -04:00
Sanjay Patel 5ac89add1e [InstCombine] remove unnecessary one-use check from add-xor transform
Pre-conditions seem to be optimal, but we don't need a use check
because we are only replacing an add with a sub.

https://rise4fun.com/Alive/hzN

  Pre: (~C1 | C2 == -1) && isPowerOf2(C2+1)
  %m = and i8 %x, C1
  %f = xor i8 %m, C2
  %r = add i8 %f, C3
  =>
  %r = sub i8 C2 + C3, %m
2020-10-08 15:08:51 -04:00
Sanjay Patel b57451b011 [InstCombine] allow vector splats for add+xor with signmask 2020-10-08 10:46:34 -04:00
Simon Pilgrim 5415fef3ab [InstCombine] matchFunnelShift - support non-uniform constant vector shift amounts (PR46895)
Complete basic PR46895 fixes by refactoring D87452/D88402 to allow us to match non-uniform constant values.

We still don't handle non-uniform vectors that contain undef elements, but that can wait until we have a decent generic mechanism for this.

Differential Revision: https://reviews.llvm.org/D88420
2020-10-08 12:56:27 +01:00
Simon Pilgrim e1d4ca0009 [InstCombine] matchRotate - add support for matching general funnel shifts with constant shift amounts (PR46896)
First step towards extending the existing rotation support to full funnel shift handling now that the backend legalization support has improved.

This enables us to match the shift by constant cases, which are pretty trivial to expand again if necessary.

D88420 will add non-uniform support for funnel shifts as well once its been finalized.

Differential Revision: https://reviews.llvm.org/D88834
2020-10-08 11:05:14 +01:00
Simon Pilgrim aa47962cc9 [InstCombine] canNarrowShiftAmt - replace custom Constant matching with m_SpecificInt_ICMP
The existing code ignores undef values which matches m_SpecificInt_ICMP, although m_SpecificInt_ICMP returns false for an all-undef constant, I've added test coverage at rGfe0197e194a64f9 to show that undef folding should already have dealt with that case.
2020-10-08 10:53:32 +01:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Roman Lebedev fed0f890e5
InstCombine: Negator: don't rely on complexity sorting already being performed (PR47752)
In some cases, we can negate instruction if only one of it's operands
negates. Previously, we assumed that constants would have been
canonicalized to RHS already, but that isn't guaranteed to happen,
because of InstCombine worklist visitation order,
as the added test (previously-hanging) shows.

So if we only need to negate a single operand,
we should ensure ourselves that we try constant operand first.
Do that by re-doing the complexity sorting ourselves,
when we actually care about it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47752
2020-10-07 15:09:50 +03:00
Simon Pilgrim 17b9a91ec2 [InstCombine] canRewriteGEPAsOffset - don't dereference a dyn_cast<>. NFCI.
We know V is a IntToPtrInst or PtrToIntInst type so we know its a CastInst - so use cast<> directly.

Prevents clang static analyzer warning that we could deference a null pointer.
2020-10-06 14:48:34 +01:00
Simon Pilgrim 75d33a3a97 [InstCombine] FoldShiftByConstant - consistently use ConstantExpr in logicalshift(trunc(shift(x,c1)),c2) fold. NFCI.
This still only gets used for scalar types but now always uses ConstantExpr in preparation for vector support - it was using APInt methods in some places.
2020-10-06 14:48:34 +01:00
Simon Pilgrim 21100f885d [InstCombine] FoldShiftByConstant - use PatternMatch for logicalshift(trunc(shift(x,c1)),c2) fold. NFCI. 2020-10-06 13:13:08 +01:00
Simon Pilgrim 0b402e985e [InstCombine] FoldShiftByConstant - remove unnecessary cast<>. NFC.
Op1 is already a Constant*
2020-10-06 13:13:08 +01:00
Roman Lebedev e00f189d39
[InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)

This canonicalization seems dubious.

Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.

I think it's pretty obvious that it is an undesirable outcome,
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-op, and are no longer eager to look past them.
Which e.g. means that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` is the same thing.

As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without the https://bugs.llvm.org/show_bug.cgi?id=47592 at least)
And we can't recover the situation post-inlining in instcombine.

So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means, this fold isn't helpful in exposing the passes
that are otherwise unaware of these patterns it produces.

Thusly, i propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so i'm not sure
how this plays out in the larger picture.

On vanilla llvm test-suite + RawSpeed, this results in
increase of asm instructions and final object size by ~+0.05%
decreases final count of bitcasts by -4.79% (-28990),
ptrtoint casts by -15.41% (-3423),
and of inttoptr casts by -25.59% (-6919, *sic*).
Overall, there's -0.04% less IR blocks, -0.39% instructions.

See https://bugs.llvm.org/show_bug.cgi?id=47592

Differential Revision: https://reviews.llvm.org/D88789
2020-10-06 00:00:30 +03:00
Nikita Popov 3641d375f6 [InstCombine] Handle GEP inbounds in select op replacement (PR47730)
When retrying the "simplify with operand replaced" select
optimization without poison flags, also handle inbounds on GEPs.

Of course, this particular example would also be safe to transform
while keeping inbounds, but the underlying machinery does not
know this (yet).
2020-10-05 21:13:02 +02:00
Simon Pilgrim 8fb4645321 [InstCombine] FoldShiftByConstant - use m_Specific. NFCI.
Use m_Specific instead of m_Value followed by an equality check - we already do this for the similar folds above, it looks like an oversight in rG2b459fe7e1e where the original pattern match code looked a little different.
2020-10-05 18:56:10 +01:00
Simon Pilgrim 4ce61144cb [InstCombine] canEvaluateShifted - remove dead (and never used code). NFC.
This was already #if'd out when it was added back in 2010 at rG18d7fc8fc6767 and has never been touched since.
2020-10-05 18:05:13 +01:00
Simon Pilgrim 3aa93f690b [InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191) (Reapplied)
If we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.

Reapplied with early-out if recognizeBSwapOrBitReverseIdiom collects a source wider than the result type.

Differential Revision: https://reviews.llvm.org/D88578
2020-10-03 14:52:42 +01:00
Simon Pilgrim 0364721e3e Revert rG3d14a1e982ad27 - "[InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191)"
This reverts commit 3d14a1e982.

This is breaking on some 2stage clang buildbots
2020-10-02 18:17:14 +01:00
Simon Pilgrim 3d14a1e982 [InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191)
If we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.

Differential Revision: https://reviews.llvm.org/D88578
2020-10-02 17:25:12 +01:00
Nikita Popov 9d1c8c0ba9 [InstCombine] Fix select operand simplification with undef (PR47696)
When replacing X == Y ? f(X) : Z with X == Y ? f(Y) : Z, make sure
that Y cannot be undef. If it may be undef, we might end up picking
a different value for undef in the comparison and the select
operand.
2020-10-01 21:15:48 +02:00
Simon Pilgrim 567049f892 [InstCombine] Use m_FAbs matcher helper. NFCI. 2020-10-01 14:42:34 +01:00
Simon Pilgrim 323d08e50a [InstCombine] Fix bswap(trunc(bswap(x))) -> trunc(lshr(x, c)) vector support
Use getScalarSizeInBits not getPrimitiveSizeInBits to determine the shift value at the element level.
2020-09-30 16:01:08 +01:00
Sanjay Patel 0527c8749b [InstCombine] ease alignment restriction for converting masked load to normal load
I think we initially made this fold conservative to be safer, but we do not
need the alignment attribute/metadata limitation because the masked load
intrinsic itself specifies the alignment. A normal vector load is better for
IR transforms and should be no worse in codegen than the masked alternative.
If it is worse for some target, the backend can reverse this transform.

Differential Revision: https://reviews.llvm.org/D88505
2020-09-29 15:26:22 -04:00
Simon Pilgrim 0cf48a7065 [InstCombine] visitTrunc - trunc (*shr (trunc A), C) --> trunc(*shr A, C)
Attempt to fold trunc (*shr (trunc A), C) --> trunc(*shr A, C) iff the shift amount if small enough that all zero/sign bits created by the shift are removed by the last trunc.

Helps fix the regressions encountered in D88316.

I've tweaked a couple of shift values as suggested by @lebedev.ri to ensure we have coverage of shift values close (above/below) to the max limit.

Differential Revision: https://reviews.llvm.org/D88429
2020-09-29 18:27:42 +01:00
Simon Pilgrim b610d73b3f [InstCombine] visitTrunc - remove dead trunc(lshr (zext A), C) combine. NFCI.
I added additional test coverage at rG7a55989dc4305 - but all are handled independently of this combine and http://lab.llvm.org:8080/coverage/coverage-reports/ indicates the code is never used.

Differential revision: https://reviews.llvm.org/D88492
2020-09-29 17:15:16 +01:00
Simon Pilgrim 89a8a0c910 [InstCombine] Inherit exact flags on extended shifts in trunc (lshr (sext A), C) --> (ashr A, C)
This was missed in D88475
2020-09-29 15:32:09 +01:00
Simon Pilgrim 14ff38e235 [InstCombine] visitTrunc - trunc (lshr (sext A), C) --> (ashr A, C) non-uniform support
This came from @lebedev.ri's suggestion to use m_SpecificInt_ICMP for D88429 - since I was going to change the m_APInt to m_Constant for that patch I thought I would do it for the only other user of the APInt first.

I've added a ConstantExpr::getUMin helper - its trivial to add UMAX/SMIN/SMAX but thought I'd wait until we have use cases.

Differential Revision: https://reviews.llvm.org/D88475
2020-09-29 15:01:16 +01:00
Simon Pilgrim 63ee42a06b [InstCombine] matchRotate - force splat of uniform constant rotation amounts (PR46895)
Fixes minor bug in D88402 where we were using the original shift constant (with undefs) instead of one with the splat values (re)splatted to all elements.
2020-09-28 15:12:41 +01:00
Simon Pilgrim dabb14cadd [InstCombine] matchRotate - allow undef in uniform constant rotation amounts (PR46895)
An extension to D87452, we can safely permit undefs in the uniform/splat detection

https://alive2.llvm.org/ce/z/nT-ptN

Differential Revision: https://reviews.llvm.org/D88402
2020-09-28 13:36:13 +01:00
Benjamin Kramer 7b782062b4 [InstCombine] Simplify code. NFCI. 2020-09-27 19:11:07 +02:00
Simon Pilgrim 9ff9c1d8ee [InstCombine] matchRotate - support (uniform) constant rotation amounts (PR46895)
This patch adds handling of rotation patterns with constant shift amounts - the next bit will be how we want to support non-uniform constant vectors.

Differential Revision: https://reviews.llvm.org/D87452
2020-09-25 22:03:10 +01:00
Simon Pilgrim 91589cf679 Add missing namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
2020-09-23 16:19:25 +01:00
David Sherwood 59c4d5aad0 [SVE] Fix InstCombinerImpl::PromoteCastOfAllocation for scalable vectors
In this patch I've fixed some warnings that arose from the implicit
cast of TypeSize -> uint64_t. I tried writing a variety of different
cases to show how this optimisation might work for scalable vectors
and found:

1. The optimisation does not work for cases where the cast type
is scalable and the allocated type is not. This because we need to
know how many times the cast type fits into the allocated type.
2. If we pass all the various checks for the case when the allocated
type is scalable and the cast type is not, then when creating the
new alloca we have to take vscale into account. This leads to
sub-optimal IR that is worse than the original IR.
3. For the remaining case when both the alloca and cast types are
scalable it is hard to find examples where the optimisation would
kick in, except for simple bitcasts, because we typically fail the
ABI alignment checks.

For now I've changed the code to bail out if only one of the alloca
and cast types is scalable. This means we continue to support the
existing cases where both types are fixed, and also the specific case
when both types are scalable with the same size and alignment, for
example a simple bitcast of an alloca to another type.

I've added tests that show we don't attempt to promote the alloca,
except for simple bitcasts:

  Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll

Differential revision: https://reviews.llvm.org/D87378
2020-09-23 08:43:05 +01:00
Martin Storsjö 2c4c659666 [InstCombine] Add parentheses in assert to silence GCC warning. NFC. 2020-09-23 09:03:01 +03:00
Sanjay Patel 6bad3caeb0 [InstCombine] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 15:34:24 -04:00
Sanjay Patel 7903ae4720 [InstCombine] factorize left shifts of add/sub
We do similar factorization folds in SimplifyUsingDistributiveLaws,
but that drops no-wrap properties. Propagating those optimally may
help solve:
https://llvm.org/PR47430

The propagation is all-or-nothing for these patterns: when all
3 incoming ops have nsw or nuw, the 2 new ops should have the
same no-wrap property:
https://alive2.llvm.org/ce/z/Dv8wsU

This also solves:
https://llvm.org/PR47584
2020-09-20 12:55:24 -04:00
Sanjay Patel cf75e83275 [InstCombine] replace zombie unreachable values with 'undef' before erasing
The test (currently crashing) is reduced from the example provided
in the post-commit discussion in D87149.

Differential Revision: https://reviews.llvm.org/D87965
2020-09-20 12:25:08 -04:00
Huihui Zhang 9ad6049736 [InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant.
We cannot iterate on scalable vector, the number of elements is unknown at compile-time.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87918
2020-09-18 11:26:36 -07:00
Nikita Popov 13e19d2e7c Revert "[InstCombine] Canonicalize SPF_ABS to abs intrinc"
This reverts commit 05d4c4ebc2.

mstorsjo reports a miscompile after this change in
https://reviews.llvm.org/D87188#2281093. Reverting until I can
investigate this.
2020-09-18 09:38:26 +02:00
Nikita Popov 05d4c4ebc2 [InstCombine] Canonicalize SPF_ABS to abs intrinc
Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.

To be conservative, the one-use check on the comparison is retained,
this may be relaxed if all goes well.

It's pretty likely that this will uncover places that missing
handling for the abs() intrinsic. Please report any seen performance
regressions.

Differential Revision: https://reviews.llvm.org/D87188
2020-09-17 22:28:34 +02:00
Nikita Popov 222bf3ffbc Reapply [InstCombine] Simplify select operand based on equality condition
Reapply after fixing SimplifyWithOpReplaced() to never return
the original value, which would lead to an infinite loop in this
transform.

-----

For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.

Differential Revision: https://reviews.llvm.org/D87480
2020-09-16 20:53:58 +02:00
Benjamin Kramer b768546fe0 Revert "[InstCombine] Simplify select operand based on equality condition"
This reverts commit cfff88c03c. Sends
instcombine into an infinite loop.

```
define i1 @foo(i32 %arg, i32 %arg1) {
bb:
  %tmp = udiv i32 %arg, %arg1
  %tmp2 = mul nsw i32 %tmp, %arg1
  %tmp3 = icmp eq i32 %tmp2, %arg
  %tmp4 = select i1 %tmp3, i32 %tmp, i32 undef
  %tmp5 = icmp sgt i32 %tmp4, 255
  ret i1 %tmp5
}
```
2020-09-15 12:22:47 +02:00
Nikita Popov cfff88c03c [InstCombine] Simplify select operand based on equality condition
For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.

Differential Revision: https://reviews.llvm.org/D87480
2020-09-14 20:07:06 +02:00
Tyker 78de7297ab Reland [AssumeBundles] Use operand bundles to encode alignment assumptions
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
2020-09-12 15:36:06 +02:00
Nikita Popov 36e2e2e12e [InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322)
This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.

The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:

 * Export the SimplifyWithOpReplaced API from InstSimplify, with an
   added AllowRefinement flag. For replacements inside the TrueVal
   we don't actually care whether refinement occurs or not, the
   replacement is always legal. This part of the transform is now
   done in InstSimplify only. (It should be noted that the current
   AllowRefinement check is not sufficient -- that's an issue we
   need to address separately.)
 * Change the InstCombine fold to work by temporarily dropping
   poison generating flags, running the fold and then restoring the
   flags if it didn't work out. This will ensure that the InstCombine
   fold is correct as long as the InstSimplify fold is correct.

Differential Revision: https://reviews.llvm.org/D87445
2020-09-12 14:45:06 +02:00
Sanjay Patel 6aa3fc4a5b Revert "[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)"
This reverts commit 324a53205a.

On closer examination of at least one of the test diffs,
this does not appear to be correct in all cases. Even the
existing 'nsw' creation may be wrong based on this example:
https://alive2.llvm.org/ce/z/uL4Hw9
https://alive2.llvm.org/ce/z/fJMKQS
2020-09-11 10:54:48 -04:00
Sanjay Patel 324a53205a [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)
There's no signed wrap if both geps have 'inbounds':
https://alive2.llvm.org/ce/z/nZkQTg
https://alive2.llvm.org/ce/z/7qFauh
2020-09-11 10:39:09 -04:00
Simon Pilgrim 48b510c4bc [NFC] Fix compiler warnings due to integer comparison of different signedness
Fix by directly using INT_MAX and INT32_MAX.

Patch by: @nullptr.cpp (Yang Fan)

Differential Revision: https://reviews.llvm.org/D87347
2020-09-11 15:32:03 +01:00
Christopher Tetreault 7ddfd9b3eb [SVE] Bail from VectorUtils heuristics for scalable vectors
Bail from maskIsAllZeroOrUndef and maskIsAllOneOrUndef prior to iterating over the number of
elements for scalable vectors.

Assert that the mask type is not scalable in possiblyDemandedEltsInMask .

Assert that the types are correct in all three functions.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87424
2020-09-10 12:29:37 -07:00
Nikita Popov 4e413e1621 [InstCombine] Temporarily do not drop volatile stores before unreachable
See discussion in D87149. Dropping volatile stores here is legal
per LLVM semantics, but causes issues for real code and may result
in a change to LLVM volatile semantics. Temporarily treat volatile
stores as "not guaranteed to transfer execution" in just this place,
until this issue has been resolved.
2020-09-10 16:16:44 +02:00
Nikita Popov f6b87da0c7 [InstCombine] Fold comparison of abs with int min
If the abs is poisoning, this is already folded to true/false.
For non-poisoning abs, we can convert this to a comparison with
the operand.
2020-09-08 20:23:03 +02:00
Nikita Popov e97f3b1b43 [InstCombine] Fold abs of known negative operand
If we know that the abs operand is known negative, we can replace
it with a neg.

To avoid computing known bits twice, I've removed the fold for the
non-negative case from InstSimplify. Both the non-negative and the
negative case are handled by InstCombine now, with one known bits call.

Differential Revision: https://reviews.llvm.org/D87196
2020-09-08 20:14:35 +02:00
Sanjay Patel 8b30067919 [InstCombine] improve fold of pointer differences
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.

This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.

This is part of improving flags propagation noticed
with PR47430.
2020-09-07 15:54:32 -04:00
Sanjay Patel 7a6d6f0f70 [InstCombine] improve folds for icmp with multiply operands (PR47432)
Check for no overflow along with an odd constant before
we lose information by converting to bitwise logic.

https://rise4fun.com/Alive/2Xl

  Pre: C1 != 0
  %mx = mul nsw i8 %x, C1
  %my = mul nsw i8 %y, C1
  %r = icmp eq i8 %mx, %my
  =>
  %r = icmp eq i8 %x, %y

  Name: nuw ne
  Pre: C1 != 0
  %mx = mul nuw i8 %x, C1
  %my = mul nuw i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y

  Name: odd ne
  Pre: C1 % 2 != 0
  %mx = mul i8 %x, C1
  %my = mul i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y
2020-09-07 12:40:37 -04:00
Sanjay Patel b22910daab [InstCombine] erase instructions leading up to unreachable
Normal dead code elimination ignores assume intrinsics, so we fail to
delete assumes that are not meaningful (and potentially worse if they
cause conflicts with other assumptions).

The motivating example in https://llvm.org/PR47416 suggests that we
might have problems upstream from here (difference between C and C++),
but this should be a cheap way to make sure we remove more dead code.

Differential Revision: https://reviews.llvm.org/D87149
2020-09-07 10:44:08 -04:00
Sanjay Patel 3ca8b9a560 [InstCombine] give a name to an intermediate value for easier tracking; NFC
As noted in PR47430, we probably want to conditionally include 'nsw'
here anyway, so we are going to need to fill out the optional args.
2020-09-07 08:19:42 -04:00
Nikita Popov 4892d3a198 [InstCombine] Fold abs with dominating condition
Similar to D87168, but for abs. If we have a dominating x >= 0
condition, then we know that abs(x) is x. This fold is in
InstCombine, because we need to create a sub instruction for
the x < 0 case.

Differential Revision: https://reviews.llvm.org/D87184
2020-09-05 16:18:35 +02:00
Nikita Popov ada8a17d94 [InstCombine] Fold abs intrinsic eq zero
Following the same transform for the select version of abs.
2020-09-05 15:11:38 +02:00
Nikita Popov 58b28fa7a2 [InstCombine] Fold mul of abs intrinsic
Same as the existing SPF_ABS fold. We don't need to explicitly
handle NABS, as the negs will get folded away first.
2020-09-05 12:37:45 +02:00
Nikita Popov 10cb23c6ca [InstCombine] Fold cttz of abs intrinsic
Same as the existing fold for SPF_ABS. We don't need to explicitly
handle the NABS variant, as we'll first fold away the neg in that
case.
2020-09-05 12:25:41 +02:00
Sanjay Patel 2391a34f9f [InstCombine] canonicalize all commutative intrinsics with constant arg 2020-09-03 12:42:04 -04:00
Eli Friedman 96ef6998df [InstCombine] Fix a couple crashes with extractelement on a scalable vector.
Differential Revision: https://reviews.llvm.org/D86989
2020-09-02 18:02:07 -07:00
Venkataramanan Kumar 626c3738cd [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).

Differential Revision: https://reviews.llvm.org/D86726
2020-09-02 08:23:48 -04:00
Christopher Tetreault 640f20b0c7 [SVE] Remove calls to VectorType::getNumElements from InstCombine
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D82237
2020-08-31 12:59:10 -07:00
Roman Lebedev c23aefd7c3
[NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement
As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647
this must be sufficient otherwise `EliminateDuplicatePHINodes()`
would have hit issues with it already.
2020-08-31 22:29:39 +03:00
Roman Lebedev bf21ce7b90
[InstCombine] Take 3: Perform trivial PHI CSE
The original take 1 was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.

Then there was take 2 3e69871ab5,
which was InstCombine-specific, but it again showed stage2-stage3 differences,
and reverted in bdaa3f86a0.
This is quite alarming.

Here, let's try to change how we find existing PHI candidate:
due to the worklist order, and the way PHI nodes are inserted
(it may be inserted as the first one, or maybe not), let's look at *all*
PHI nodes in the block.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| asm-printer.EmittedInsts                           | 7942329   | 7942457   |    128 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254312480 |  16848 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18347     |    -65 |   -0.35% |    0.35% |
| early-cse.NumCSE                                   | 2183283   | 2183267   |    -16 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3644419   |   4108 |    0.11% |    0.11% |
| instcombine.NumDeadInst                            | 1778204   | 1783205   |   5001 |    0.28% |    0.28% |
| instcombine.NumPHICSEs                             | 0         | 22490     |  22490 |    0.00% |    0.00% |
| instcombine.NumWorklistIterations                  | 2023272   | 2024400   |   1128 |    0.06% |    0.06% |
| instcount.NumCallInst                              | 1758395   | 1758802   |    407 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330545    |    -12 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077220   |     82 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832606   |    660 |    0.01% |    0.01% |
| simplifycfg.NumHoistCommonCode                     | 24186     | 24187     |      1 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999767    | -20046 |   -1.97% |    1.97% |
```
So it fires 22490 times, which is less than ~24k the take 1 did,
but more than what take 2 did (22228 times)
.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 18:21:24 +03:00
Roman Lebedev bdaa3f86a0
Revert "[InstCombine] Take 2: Perform trivial PHI CSE"
While the original variant with doing this in InstSimplify (rightfully)
caused questions and ultimately was detected to be a culprit
of stage2-stage3 mismatch, it was expected that
InstCombine-based implementation would be fine.

But apparently it's not, as
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio
suggests.

Which suggests that somewhere in InstCombine there is a loop
over nondeterministically sorted container, which causes
different worklist ordering.

This reverts commit 3e69871ab5.
2020-08-29 16:05:02 +03:00
Nikita Popov 6093b14c2c [InstCombine] Return replaceInstUsesWith() result (NFC)
Follow the usual usage pattern for this function and return the
result.
2020-08-29 14:49:57 +02:00
Roman Lebedev 71ac9105cd
[InstCombine] foldAggregateConstructionIntoAggregateReuse(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev e65f213178
[InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev bd12113f57
[NFC][InstCombine] Fix some comments: the code already uses IC::replaceInstUsesWith() 2020-08-29 15:10:14 +03:00
Roman Lebedev 49d223274f
[NFC][InstCombine] Add STATISTIC() for how many iterations we did
As we've established, if it takes more than two iterations
(one to perform folding and one to ensure that no folding opportunities
remain) per function, then there are worklist management issues.
So it may be interesting to keep track of it.
2020-08-29 15:10:13 +03:00
Roman Lebedev 4f4eecf0ec
[InstCombine] visitPHINode(): use InstCombiner::replaceInstUsesWith() instead of RAUW
As noted in post-commit review, we really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:00 +03:00
Roman Lebedev 3e69871ab5
[InstCombine] Take 2: Perform trivial PHI CSE
The original take was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it :
> It appears to cause compilation non-determinism and caused stage3 mismatches.

However InstCombine already does many different optimizations,
so it should be a safe place to do it here.

Note that we still can't just compare incoming values ranges,
because there is no guarantee that these PHI's we'd simplify to
were already re-visited and sorted.
However coming up with a test is problematic.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |      |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs                             | 0         | 22228     |  22228 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942329   | 7942456   |    127 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254313792 |  18160 |    0.01% |    0.01% |
| early-cse.NumCSE                                   | 2183283   | 2183272   |    -11 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3666911   |  26600 |    0.73% |    0.73% |
| instcombine.NumDeadInst                            | 1778204   | 1783318   |   5114 |    0.29% |    0.29% |
| instcount.NumCallInst                              | 1758395   | 1758804   |    409 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330549    |     -8 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077221   |     83 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832611   |    665 |    0.01% |    0.01% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999740    | -20073 |   -1.97% |    1.97% |
```
So it fires ~22k times, which is less than ~24k the take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 13:13:06 +03:00
Nikita Popov 57a26bb7b4 [InstCombine] Fix typo in comment (NFC)
As pointed out in post-commit review of D63060.
2020-08-29 10:17:17 +02:00
Nikita Popov ffe05dd125 [InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178)
Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding
usub.sat(a, b) + b to umax(a, b). The backend will expand umax
back to usubsat if that is profitable.

We may also want to handle uadd.sat(a, b) - b in the future.

Differential Revision: https://reviews.llvm.org/D63060
2020-08-28 21:52:29 +02:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Roman Lebedev 95848ea101
[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks
As FIXME said, they really should be checking for a single user,
not use, so let's do that. It is not *that* unusual to have
the same value as incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.

There isn't a nice way to do that, Value::users() isn't uniqified,
and Value only tracks it's uses, not Users, so the check is
potentially costly since it does indeed potentially involes
traversing the entire use list of a value.
2020-08-26 20:20:41 +03:00
Roman Lebedev 1f90d45b9e
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

This is a reland of the original commit fcb51d8c24,
because originally i forgot to ensure that the base aggregate types match.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:57:50 +03:00
Roman Lebedev c295c6f2c0
Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad"
This reverts commit fcb51d8c24.

As buildbots report, there's apparently some missing check to ensure
that the types of incoming values match the type of PHI.
Let's revert for a moment.
2020-08-26 09:23:22 +03:00
Roman Lebedev fcb51d8c24
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:08:24 +03:00
Sanjay Patel c4f0a0896f [InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try)
The 1st attempt (rG557b890) was reverted because it caused miscompiles.
That bug is avoided here by changing the order of folds and as verified
in the new tests.

Original commit message:
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-25 11:19:36 -04:00
Benjamin Kramer c6fb72de4f Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract"
This reverts commit 557b890ff4. Causing
miscompiles, test case is on llvm-commits.
2020-08-25 11:31:31 +02:00
David Sherwood 7b64765cd1 [SVE] Fix TypeSize related warnings with IR truncates of scalable vectors
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.

I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.

I've added the following new tests:

  Analysis/CostModel/AArch64/sve-trunc.ll
  Transforms/InstCombine/AArch64/sve-trunc.ll

for both of these fixes.

Differential revision: https://reviews.llvm.org/D86432
2020-08-25 09:17:56 +01:00
Roman Lebedev cdd339c568
[InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's
As per statistic, this happens pretty exceedingly rare,
but i have seen it in exactly the situations the
Phi-aware aggregate reconstruction would have handled,
eventually, and allowed invoke -> call fold later on.

So while this might be something that other fold
will have to learn about, i believe we should be
doing this transform in general.

Here, we are okay with adding two PHI's to get both the base aggregate,
and the inserted value. I'm not sure it makes much sense to restrict
it to a single phi (to just the inserted value?), because originally
we'd be receiving the final aggregate already..

llvm test-suite + RawSpeed:
```
| statistic name                             | baseline  | proposed  |    Δ |      % | \|%\| |
|--------------------------------------------|-----------|-----------|-----:|-------:|------:|
| instcombine.NumPHIsOfInsertValues          | 0         | 12        |  12  |  0.00% | 0.00% |
| asm-printer.EmittedInsts                   | 8926643   | 8926595   | -48  |  0.00% | 0.00% |
| instcombine.NumCombined                    | 3846614   | 3846640   |  26  |  0.00% | 0.00% |
| instcombine.NumConstProp                   | 24302     | 24293     |  -9  | -0.04% | 0.04% |
| instcombine.NumDeadInst                    | 1620140   | 1620112   | -28  |  0.00% | 0.00% |
| instcount.NumBrInst                        | 898466    | 898464    |  -2  |  0.00% | 0.00% |
| instcount.NumCallInst                      | 1760819   | 1760875   |  56  |  0.00% | 0.00% |
| instcount.NumExtractValueInst              | 45659     | 45649     | -10  | -0.02% | 0.02% |
| instcount.NumInsertValueInst               | 4991      | 4981      | -10  | -0.20% | 0.20% |
| instcount.NumIntToPtrInst                  | 27084     | 27087     |   3  |  0.01% | 0.01% |
| instcount.NumPHIInst                       | 371435    | 371429    |  -6  |  0.00% | 0.00% |
| instcount.NumStoreInst                     | 906011    | 906019    |   8  |  0.00% | 0.00% |
| instcount.TotalBlocks                      | 1105520   | 1105518   |  -2  |  0.00% | 0.00% |
| instcount.TotalInsts                       | 9795737   | 9795776   |  39  |  0.00% | 0.00% |
| simplifycfg.NumInvokes                     | 2784      | 2786      |   2  |  0.07% | 0.07% |
| simplifycfg.NumSimpl                       | 1001840   | 1001850   |  10  |  0.00% | 0.00% |
| simplifycfg.NumSinkCommonInstrs            | 15174     | 15170     |  -4  | -0.03% | 0.03% |
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86306
2020-08-25 10:38:11 +03:00
Sanjay Patel 557b890ff4 [InstCombine] improve demanded element analysis for vector insert-of-extract
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-24 17:00:16 -04:00
Roman Lebedev 56c529300e
[NFC][InstCombine] Adjust naming for some methods to match coding standards
Requested as preparatory cleanup in https://reviews.llvm.org/D86306#inline-799065
2020-08-24 22:39:34 +03:00
Sanjay Patel 6a44edb8da [InstCombine] fold abs of select with negated op (PR39474)
Similar to the existing transform - peek through a select
to match a value and its negation.

https://alive2.llvm.org/ce/z/MXi5KG

  define i8 @src(i1 %b, i8 %x) {
  %0:
    %neg = sub i8 0, %x
    %sel = select i1 %b, i8 %x, i8 %neg
    %abs = abs i8 %sel, 1
    ret i8 %abs
  }
  =>
  define i8 @tgt(i1 %b, i8 %x) {
  %0:
    %abs = abs i8 %x, 1
    ret i8 %abs
  }
  Transformation seems to be correct!
2020-08-24 07:37:55 -04:00
Roman Lebedev f6decfa36d
[InstCombine] Negator: freeze is freely negatible if it's operand is negatible 2020-08-23 23:28:19 +03:00
Sanjay Patel ec06b38130 [InstCombine] canonicalize 'not' ops before logical shifts
This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen.

The SCEV motivating case is discussed in:
http://bugs.llvm.org/PR47136

There's an analysis motivating case at:
http://bugs.llvm.org/PR38781

I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps.

Alive proofs:
https://rise4fun.com/Alive/BBV

  Name: shift right of 'not'
  Pre: C2 == (-1 u>> C1)
  %a = lshr i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = lshr i8 %n, C1

  Name: shift left of 'not'
  Pre: C2 == (-1 << C1)
  %a = shl i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = shl i8 %n, C1

  Name: ashr of 'not'
  %a = ashr i8 %x, C1
  %r = xor i8 %a, -1
  =>
  %n = xor i8 %x, -1
  %r = ashr i8 %n, C1

Differential Revision: https://reviews.llvm.org/D86243
2020-08-22 09:38:13 -04:00
Serguei Katkov 9e362bb0eb [InstCombine] Remove unused entries in gc-live bundle of statepoint
If some of gc live value are not used in gc.relocate we can remove them
from gc-live bundle of statepoint instruction.

Also the CL removes duplicated Values in gc-live bundle.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
2020-08-22 01:36:22 +07:00
Serguei Katkov 63d9d56a55 [InstCombine] Move handling of gc.relocate in a gc.statepoint
The only def for gc.relocate is a gc.statepoint. But real dependency is not
described by def-use chain. Instead this dependency is encoded by indecies
of operands in gc-live bundle of statepoint as integer constants in gc.relocate.

InstCombine operates by def-use chain. As a result when value in gc-live bundle
is simplified the gc.statepoint itself is not simplified but it might simplify dependent
gc.relocates. To trigger the optimization of gc.relocate we now unconditionally trigger
check of all dependent gc.relocates by adding them to worklist.

This CL handles of gc.relocates as process of gc.statepoint optimization considering
gc.statepoint and related gc.relocate as whole entity.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85954
2020-08-21 23:44:23 +07:00
Sanjay Patel 6f3511a01a [ValueTracking] define/use max recursion depth in header
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191

But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.

Differential Revision: https://reviews.llvm.org/D86113
2020-08-19 16:56:59 -04:00
Sanjay Patel c8d711adae [InstCombine] reduce code duplication; NFC 2020-08-19 12:05:12 -04:00
Benjamin Kramer b98e25b6d7 Make helpers static. NFC. 2020-08-19 16:00:03 +02:00
Roman Lebedev 3d76a133c7
Revert "[InstCombine] Lower infinite combine loop detection thresholds"
And as being reported by Florian Hahn, there's a hit
in MultiSource/Benchmarks/mafft from the test-suite on X86 with -O3 -flto,
so reverting until addressed.

This reverts commit 71e0b82c9f.
2020-08-19 16:53:30 +03:00
Roman Lebedev 71e0b82c9f
[InstCombine] Lower infinite combine loop detection thresholds
It's been a month since 2f3862eb9f,
and no new bug reports about the threshold were filled,
so let's bump it again and wait again.
2020-08-19 14:37:57 +03:00
Roman Lebedev 2f01785857
[NFC][InstCombine] Aggregate reconstruction: use plain map
Now that we no longer require for this map to have stable iteration order,
we no longer need to pay for keeping the iteration order stable,
so switch from `SmallMapVector` to `SmallDenseMap`.
2020-08-19 01:09:25 +03:00
Roman Lebedev 78bd4231bf
[InstCombine] PHI-aware aggregate reconstruction: properly handle duplicate predecessors
While it may seem like we can just "deduplicate" the case where
some basic block happens to be a predecessor more than once,
which happens for e.g. switches, that is not correct thing to do.
We must actually add a PHI operand for each predecessor.

This was initially reported to me by David Major
as a clang crash during gecko build for android.
2020-08-19 01:00:42 +03:00
Sanjay Patel 139da9c4d7 [InstCombine] fold fabs of select with negated operand
This is the FP example shown in:
https://bugs.llvm.org/PR39474
2020-08-18 09:23:07 -04:00
Roman Lebedev 03127f795b
[InstCombine] PHI-aware aggregate reconstruction: correctly detect "use" basic block
While the original implementation added in D85787 / ae7f08812e
is not incorrect, it is known to be suboptimal.

In particular, it is not incorrect to use the basic block
in which the original `insertvalue` instruction is located
as the merge point, that is not necessarily optimal,
as `@test6` shows.

We should look at all the AggElts, and, if they are all defined
in the same basic block, then that is the basic block we should use.

On RawSpeed library, this catches +4% (+50) more cases.
On vanilla LLVM test-suits, this catches +12% (+92) more cases.
2020-08-18 00:45:18 +03:00
Roman Lebedev f4f673e0e3
[NFC][InstCombine] PHI-aware aggregate reconstruction: don't capture UseBB in lambdas, take it as argument
In a following patch, UseBB will be detected later,
so capturing it is potentially error-prone (capture by ref vs by val).

Also, parametrized UseBB will likely be needed
for multiple levels of PHI indirections later on anyways.
2020-08-18 00:45:18 +03:00
Roman Lebedev 4973ca3eac
[NFC][InstCombine] PHI-aware aggregate reconstruction: insert PHI node manually
This is NFC at the moment, because right now we always insert the PHI
into the same basic block in which the original `insertvalue` instruction
is, but that will change.

Also, fixes addition of the suffix to the value names.
2020-08-18 00:45:17 +03:00
Sanjay Patel e6b6787d01 [InstCombine] fold abs(X)/X to cmp+select
The backend can convert the select-of-constants to
bit-hack shift+logic if desirable.

https://alive2.llvm.org/ce/z/pgJT6E

  define i8 @src(i8 %x) {
  %0:
    %a = abs i8 %x, 1
    %d = sdiv i8 %x, %a
    ret i8 %d
  }
  =>
  define i8 @tgt(i8 %x) {
  %0:
    %cond = icmp sgt i8 %x, 255
    %r = select i1 %cond, i8 1, i8 255
    ret i8 %r
  }
  Transformation seems to be correct!
2020-08-17 08:01:28 -04:00
Sanjay Patel 6cd4a6f6b2 [InstCombine] reduce code duplication; NFC 2020-08-17 08:01:27 -04:00
Yonghong Song aa61e43040 [InstCombine] Fix a compilation bug
With gcc 6.3.0, I hit the following compilation bug.
  ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
   };
    ^
  cc1plus: all warnings being treated as errors

The error is introduced by Commit ae7f08812e ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")
2020-08-16 21:56:42 -07:00
Roman Lebedev 0ec1f0f332
[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically 2020-08-16 23:37:36 +03:00
Roman Lebedev ae7f08812e
[InstCombine] Aggregate reconstruction simplification (PR47060)
This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.

The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a  `call`.

Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has
higher cost than it could have on unwind branch :S

This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.

I think, this is a missing fold in InstCombine, so i've implemented it.

I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
   If it was extracted from the aggregate with identical type,
   from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
   then we can just use said original source aggregate directly,
   instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
   we need be PHI-aware - we might have different source aggregate when coming
   from each predecessor.

I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.

I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.

On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.

On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |
```
Roughly similar effect, less instructions and blocks total.

See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.

Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions

And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text

See https://bugs.llvm.org/show_bug.cgi?id=47060

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85787
2020-08-16 23:27:56 +03:00
Sanjay Patel 3ffb751f3d [InstCombine] fold copysign with fabs/fneg operand
We already get this in the backend, but we need to do
it in IR too to consistently get yet more copysign
transforms.
2020-08-16 08:53:47 -04:00
Sanjay Patel 3fed67b7e6 [InstCombine] reduce code duplication; NFC 2020-08-16 08:53:47 -04:00
Serguei Katkov 98ba0a5ffe [InstCombine] Handle gc.relocate(null) in one iteration
InstCombine adds users of transformed instruction to working list to
process on the same iteration. However gc.relocate may have a hidden
user (next gc.relocate) which is connected through gc.statepoint intrinsic and
there is no direct def-use chain between them.

In this case if the next gc.relocation is already processed it will not be added
to worklist and will not be able to be processed on the same iteration.
Let's we have the following case:
A = gc.relocate(null)
B = statepoint(A)
C = gc.relocate(B, hidden(A))
If C is already considered then after replacement of A with null, statepoint B
instruction will be added to the queue but not C.
C can be processed only on the next iteration.

If the chain of relocation is pretty long the many iteration may be required.
This change is to reduce the number of iteration to meet the latest changes
related to reducing infinite loop threshold.

This is a quick (not best) fix. In the follow up patches I plan to move gc relocation
handling into statepoint handler. This should also help to remove unused gc live
entries in statepoint bundle.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D75598
2020-08-13 23:16:27 +07:00
David Stenberg e8ebebb0bd [InstCombine] Fix incorrect Modified status
When removing instructions from unreachable blocks, and only debug info
intrinsics were removed, InstCombine could incorrectly return a false
Modified status.

This is fixed by making removeAllNonTerminatorAndEHPadInstructions()
also return how many debug info intrinsics that were removed, and take
that into account.

This was caught using the check introduced by D80916.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D85839
2020-08-13 15:10:41 +02:00
Sanjay Patel 23bd33c6ac [InstCombine] prefer xor with -1 because 'not' is easier to understand (PR32706)
This is a retry of rL300977 which was reverted because of infinite loops.
We have fixed all of the known places where that would happen, but there's
still a chance that this patch will cause infinite loops.

This matches the demanded bits behavior in the DAG and should fix:
https://bugs.llvm.org/show_bug.cgi?id=32706

Differential Revision: https://reviews.llvm.org/D32255
2020-08-12 15:50:33 -04:00
Roman Lebedev d6f0600c96
[NFC][InstCombine] Add FIXME's for getLogBase2() / visitUDivOperand()
These are not correctness issues.

In visitUDivOperand(), if the (potential) divisor is undef, then udiv is
already UB, so it is not incorrect to keep undef as shift amount.

But, that is suboptimal.
We could instead simply drop that select, picking the other operand.

Afterwards, getLogBase2() could assert that there is no undef in divisor.
2020-08-12 22:06:54 +03:00