Commit Graph

4497 Commits

Author SHA1 Message Date
Fangrui Song d8aba75a76 Internalize some cl::opt global variables or move them under namespace llvm 2021-05-07 11:15:43 -07:00
Juneyoung Lee 8a156d1c27 [InstCombine] Fully disable select to and/or i1 folding
This is a patch that disables the poison-unsafe select -> and/or i1 folding.

It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.

Note that a few test functions that have the `_logical` suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.

I can see that most of these are poison-unsafe; they can be revived by introducing
a freeze instruction. I left comments at the fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for a freeze fix.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101191
2021-05-06 09:29:52 +09:00
Coplin, Jared 6251b2f7f6 Attach metadata to simplified masked loads and stores 2021-05-05 18:01:49 -05:00
Sanjay Patel 0034197874 [InstCombine] improve readability; NFC 2021-05-05 11:05:47 -04:00
Juneyoung Lee 1fef5c88a6 [InstCombine] Fold more select of selects using isImpliedCondition
This is a simple fold that does the following:

```
select x_inv, true, (select y, x, false)
=>
select x_inv, true, y
```
https://alive2.llvm.org/ce/z/-STJ2d

```
select (select y, x, false), true, x_inv
=>
select y, true, x_inv
```
https://alive2.llvm.org/ce/z/6ruYt6

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101807
2021-05-05 13:44:58 +09:00
Sanjay Patel a6f79b5671 [InstCombine] avoid infinite loops with select/icmp transforms
This fixes https://llvm.org/PR48900 , but as seen in the
regression tests prevents some optimizations.

There are a few options to restore those (switch to min/max
intrinsics, add larger pattern matching for select with
dominating condition, improve CVP), but we need to prevent
the bug first.
2021-05-04 11:54:06 -04:00
Dávid Bolvanský 80b897e21b [InstCombine] ctpop(X) ^ ctpop(Y) & 1 --> ctpop(X^Y) & 1 (PR50094)
Original pattern: (__builtin_parity(x) ^ __builtin_parity(y))

LLVM rewrites it as: (__builtin_popcount(x) ^ __builtin_popcount(y)) & 1

Optimized form:  __builtin_popcount(X^Y) & 1

Alive proof: https://alive2.llvm.org/ce/z/-GdWFr
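As an informal cross-check of the same identity: parity(x ^ y) is parity(x) xor parity(y), so the low bits agree. The C sketch below brute-forces every pair of 8-bit values; `popcount8` and `parity_fold_holds_i8` are hypothetical helpers (not LLVM APIs), and the i8 width is an assumption chosen only to keep the search exhaustive.

```c
#include <assert.h>   /* for the usage check: assert(parity_fold_holds_i8()); */
#include <stdbool.h>
#include <stdint.h>

/* Portable stand-in for llvm.ctpop.i8: clear the lowest set bit until none remain. */
static unsigned popcount8(uint8_t v) {
    unsigned n = 0;
    while (v != 0) {
        v &= (uint8_t)(v - 1);
        n++;
    }
    return n;
}

/* Exhaustively checks (ctpop(x) ^ ctpop(y)) & 1 == ctpop(x ^ y) & 1 for all i8 pairs. */
static bool parity_fold_holds_i8(void) {
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            unsigned src = (popcount8((uint8_t)x) ^ popcount8((uint8_t)y)) & 1u;
            unsigned tgt = popcount8((uint8_t)(x ^ y)) & 1u;
            if (src != tgt)
                return false;
        }
    }
    return true;
}
```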

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D101802
2021-05-04 13:16:18 +02:00
Juneyoung Lee 24ce194cfe [InstCombine] generalize select + select/and/or folding using implied conditions
This patch optimizes the remaining possible cases in D101191 by generalizing isImpliedCondition()-based
foldings.

Assume that there is `op a, (select b, _, _)` where op is one of `and i1`, `or i1` or their select forms.

We can do the following optimization based on the result of `isImpliedCondition(a, b)`:

If a = true implies…
- b = true:
    - select a, (select b, A, B), false => select a, A, false : https://alive2.llvm.org/ce/z/WCnZYh
    - and a, (select b, A, B) => select a, A, false : https://alive2.llvm.org/ce/z/uZhcMG
- b = false:
    - select a, (select b, A, B), false => select a, B, false : https://alive2.llvm.org/ce/z/c2hJpV
    - and a, (select b, A, B) => select a, B, false : https://alive2.llvm.org/ce/z/5ggwMM

If a = false implies…
- b = true:
    - select a, true, (select b, A, B) => select a, true, A : https://alive2.llvm.org/ce/z/tidKvH
    - or a, (select b, A, B) =>  select a, true, A : https://alive2.llvm.org/ce/z/cC-uyb
- b = false:
    - select a, true, (select b, A, B) => select a, true, B : https://alive2.llvm.org/ce/z/ZXpJq9
    - or a, (select b, A, B) => select a, true, B : https://alive2.llvm.org/ce/z/hnDrJj

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101720
2021-05-04 09:42:06 +09:00
Dávid Bolvanský 08c08577f9 [InstCombine] cttz(sext(x)) -> cttz(zext(x))
```

----------------------------------------
define i32 @src(i16 %x, i1 %b) {
%0:
  %z = sext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
=>
define i32 @tgt(i16 %x, i1 %b) {
%0:
  %z = zext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
Transformation seems to be correct!
```

https://alive2.llvm.org/ce/z/evomeg
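A minimal C sketch of why this holds, assuming i16 -> i32 as in the proof above: both extensions leave the low 16 bits unchanged, a nonzero i16 has its lowest set bit among them, and zero extends to zero either way. `cttz32` and `cttz_sext_eq_zext_i16` are hypothetical helpers; `cttz32` returns 32 for zero, i.e. it models `%b == false`.

```c
#include <assert.h>   /* for the usage check: assert(cttz_sext_eq_zext_i16()); */
#include <stdbool.h>
#include <stdint.h>

/* cttz on i32; returns 32 for zero (llvm.cttz with the zero-is-poison flag false). */
static unsigned cttz32(uint32_t v) {
    unsigned n = 0;
    while (n < 32 && ((v >> n) & 1u) == 0)
        n++;
    return n;
}

/* Checks cttz(sext i16 x to i32) == cttz(zext i16 x to i32) for every i16 bit pattern. */
static bool cttz_sext_eq_zext_i16(void) {
    for (uint32_t x = 0; x <= 0xFFFFu; x++) {
        uint32_t zext = x;                                      /* upper 16 bits zero */
        uint32_t sext = (x & 0x8000u) ? (x | 0xFFFF0000u) : x;  /* replicate the sign bit */
        if (cttz32(sext) != cttz32(zext))
            return false;
    }
    return true;
}
```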

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101764
2021-05-03 23:59:30 +02:00
Dávid Bolvanský 27b651ca47 [InstCombine] cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true' (PR50172)
Zext doesn't change the number of trailing zeros, so narrow cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true'.

Proofs:
https://alive2.llvm.org/ce/z/o2dnjY
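The role of 'ZeroIsUndef' can be seen with a small C model (hypothetical helper names, i16 -> i32 assumed): for every nonzero x the wide and narrow trailing-zero counts agree, but for x == 0 the wide count is 32 while the narrow count is 16, so the narrowed form is only valid when the zero case is undefined.

```c
#include <assert.h>   /* for the usage check: assert(cttz_zext_narrows_for_nonzero()); */
#include <stdbool.h>
#include <stdint.h>

/* Trailing-zero count limited to `width` bits; returns `width` for zero. */
static unsigned cttz_width(uint32_t v, unsigned width) {
    unsigned n = 0;
    while (n < width && ((v >> n) & 1u) == 0)
        n++;
    return n;
}

/* For every nonzero i16 x: cttz.i32(zext x) == zext(cttz.i16(x)). */
static bool cttz_zext_narrows_for_nonzero(void) {
    for (uint32_t x = 1; x <= 0xFFFFu; x++)
        if (cttz_width(x, 32) != cttz_width(x, 16))
            return false;
    return true;
}
```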

Solves https://bugs.llvm.org/show_bug.cgi?id=50172

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101582
2021-05-03 17:05:12 +02:00
Sanjay Patel 1b24f35f84 [InstCombine] improve demanded bits analysis of left-shifted operand
If we don't demand high bits, then we also don't care about those
high bits of a left-shift operand regardless of shift amount.
I noticed the sext/trunc pattern in a motivating example.
It seems like there should be a low-bits with right-shift sibling,
but I haven't looked at that yet.

https://alive2.llvm.org/ce/z/JuS6jc
https://rise4fun.com/Alive/Trm (not sure how to use 'width' with Alive1)
https://alive2.llvm.org/ce/z/gRadbF
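The underlying fact is that bit i of `x << s` comes from bit i - s of x, so the low k result bits never read x above bit k - 1. A brute-force C sketch over i8 (hypothetical function name; the 8-bit width and the "low k bits demanded" model are assumptions for illustration):

```c
#include <assert.h>   /* for the usage check: assert(shl_ignores_undemanded_high_bits_i8()); */
#include <stdbool.h>

/* For every i8 x, shift amount s, and demanded low-bit count k:
   the low k bits of (x << s) do not depend on the bits of x at or above k. */
static bool shl_ignores_undemanded_high_bits_i8(void) {
    for (unsigned k = 0; k <= 8; k++) {
        unsigned mask = (1u << k) - 1u;                 /* demanded bits: low k */
        for (unsigned s = 0; s < 8; s++) {
            for (unsigned x = 0; x < 256; x++) {
                unsigned full   = ((x << s) & 0xFFu) & mask;
                unsigned masked = (((x & mask) << s) & 0xFFu) & mask;
                if (full != masked)
                    return false;
            }
        }
    }
    return true;
}
```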

Differential Revision: https://reviews.llvm.org/D101489
2021-05-03 08:39:20 -04:00
Juneyoung Lee 39eb2665d9 [InstCombine] Add a few more patterns for folding select of select
This is a patch that folds select of select to salvage some optimizations after select -> and/or folding is disabled.

```
select (select a, true, b), c, false -> select a, c, false
select c, (select a, true, b), false -> select c, a, false
  if c implies that b is false (isImpliedCondition).
```
https://alive2.llvm.org/ce/z/ANatjt, https://alive2.llvm.org/ce/z/rv8zTB

```
sel (sel c, a, false), true, (sel !c, b, false) -> sel c, a, b
sel (sel !c, a, false), true, (sel c, b, false) -> sel c, b, a
```
https://alive2.llvm.org/ce/z/U2kp-t, https://alive2.llvm.org/ce/z/bc88EE

See D101191

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101375
2021-05-02 19:00:42 +09:00
Juneyoung Lee 1977c53b2a [InstCombine] Fold overflow bit of [u|s]mul.with.overflow in a poison-safe way
As discussed in D101191, this patch adds a poison-safe folding of overflow bit check:
```
  %Op0 = icmp ne i4 %X, 0
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = select i1 %Op0, i1 %Op1, i1 false
=>
  %Y.fr = freeze %Y
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y.fr)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = %Op1
```

https://alive2.llvm.org/ce/z/zgPUGT
https://alive2.llvm.org/ce/z/h2gZ_6

Note that there are cases where inserting freeze is not necessary: e.g. %Y is `noundef`.
In this case, LLVM is already good because `%ret` is already successfully folded into `and`,
triggering the pre-existing optimization in InstSimplify: https://godbolt.org/z/v6qena15K
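The arithmetic fact behind dropping the `icmp ne %X, 0` guard is that a zero multiplicand never overflows. The C sketch below checks this for every i4 pair, signed and unsigned; the helper names are hypothetical, and it models only well-defined values, not poison, which is why the real fold still needs `freeze %Y`.

```c
#include <assert.h>   /* for the usage check: assert(nonzero_guard_redundant_i4()); */
#include <stdbool.h>

/* Sign-extends a 4-bit pattern (0..15) to int (-8..7). */
static int sext4(unsigned v) { return (v & 8u) ? (int)v - 16 : (int)v; }

/* Overflow bit of llvm.umul.with.overflow.i4 and llvm.smul.with.overflow.i4. */
static bool umul_ovf4(unsigned x, unsigned y) { return x * y > 15u; }
static bool smul_ovf4(unsigned x, unsigned y) {
    int p = sext4(x) * sext4(y);
    return p < -8 || p > 7;
}

/* (X != 0) && overflow(X * Y) == overflow(X * Y) for all i4 X, Y. */
static bool nonzero_guard_redundant_i4(void) {
    for (unsigned x = 0; x < 16; x++) {
        for (unsigned y = 0; y < 16; y++) {
            if (((x != 0) && umul_ovf4(x, y)) != umul_ovf4(x, y))
                return false;
            if (((x != 0) && smul_ovf4(x, y)) != smul_ovf4(x, y))
                return false;
        }
    }
    return true;
}
```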

Differential Revision: https://reviews.llvm.org/D101423
2021-05-02 11:54:12 +09:00
Sanjay Patel 0f8b6686ac [InstCombine] narrow popcount with zext operand
https://llvm.org/PR50141
2021-04-29 15:07:16 -04:00
Sanjay Patel abd7529625 [InstCombine] relax masking requirement for truncated funnel/rotate match
I was investigating a seemingly unrelated improvement in demanded
bits for shift-left, but that caused regressions on these tests
because we were able to look through/eliminate the mask.

https://alive2.llvm.org/ce/z/Ztdr22

```
define i8 @src(i32 %x, i32 %y, i32 %shift) {
  %and = and i32 %shift, 3
  %conv = and i32 %x, 255
  %shr = lshr i32 %conv, %and
  %sub = sub i32 8, %and
  %shl = shl i32 %y, %sub
  %or = or i32 %shr, %shl
  %conv2 = trunc i32 %or to i8
  ret i8 %conv2
}

define i8 @tgt(i32 %x, i32 %y, i32 %shift) {
  %x8 = trunc i32 %x to i8
  %y8 = trunc i32 %y to i8
  %shift8 = trunc i32 %shift to i8
  %and = and i8 %shift8, 3
  %conv2 = call i8 @llvm.fshr.i8(i8 %y8, i8 %x8, i8 %and)
  ret i8 %conv2
}

declare i8 @llvm.fshr.i8(i8,i8,i8)
```
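The @src/@tgt pair can also be cross-checked in C. In this sketch `fshr8` is a hypothetical model of `llvm.fshr.i8` following its LangRef definition (concatenate the two operands, shift right by the amount modulo 8, keep the low 8 bits); `src_i8` and `tgt_i8` mirror the IR above, and `junk` adds arbitrary high bits that the truncations should discard.

```c
#include <assert.h>   /* for the usage check: assert(funnel_match_holds()); */
#include <stdbool.h>
#include <stdint.h>

/* Model of llvm.fshr.i8(hi, lo, amt): concatenate hi:lo, shift right by amt % 8, keep low 8 bits. */
static uint8_t fshr8(uint8_t hi, uint8_t lo, uint8_t amt) {
    uint16_t concat = (uint16_t)((hi << 8) | lo);
    return (uint8_t)(concat >> (amt % 8u));
}

/* Mirrors @src: masked shift amount, 32-bit shifts, then trunc to i8. */
static uint8_t src_i8(uint32_t x, uint32_t y, uint32_t shift) {
    uint32_t amt = shift & 3u;
    uint32_t shr = (x & 255u) >> amt;
    uint32_t shl = y << (8u - amt);   /* 8 - amt is in [5, 8], so the shift is well defined */
    return (uint8_t)(shr | shl);
}

/* Mirrors @tgt: truncate everything, then a single funnel shift. */
static uint8_t tgt_i8(uint32_t x, uint32_t y, uint32_t shift) {
    return fshr8((uint8_t)y, (uint8_t)x, (uint8_t)((uint8_t)shift & 3u));
}

static bool funnel_match_holds(void) {
    const uint32_t junk = 0xA5000000u;   /* high bits that the truncations must discard */
    for (uint32_t x = 0; x < 256; x++)
        for (uint32_t y = 0; y < 256; y++)
            for (uint32_t s = 0; s < 8; s++)
                if (src_i8(x | junk, y | junk, s | junk) != tgt_i8(x | junk, y | junk, s | junk))
                    return false;
    return true;
}
```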
2021-04-28 16:49:50 -04:00
Sanjay Patel 025bb52903 [InstCombine] fold clamp to 2 values from min/max intrinsics
The "select" versions of these folds are also missing and can
cause infinite loops as shown in:
https://llvm.org/PR48900
...but it seems easier to match these as max/min as a first fix.

https://alive2.llvm.org/ce/z/wv-_dT
2021-04-27 15:35:49 -04:00
Hongtao Yu 30bb5be389 [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2.
As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

The optimizations unblocked are:
- In-block load propagation.
- In-block dead store elimination.
- Memory copy optimization that turns stores to consecutive memory locations into a memset.

These optimizations are local to a block, so they shouldn't affect the profile quality.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100075
2021-04-26 16:52:33 -07:00
Dávid Bolvanský 691badc3d6 [InstCombine] C - ctpop(a) --> ctpop(~a) if C is bitwidth (PR50104)
Proof: https://alive2.llvm.org/ce/z/mncA9K
Solves https://bugs.llvm.org/show_bug.cgi?id=50104
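Every bit position is set in exactly one of a and ~a, so ctpop(a) + ctpop(~a) equals the bit width. A brute-force C sketch for i8 (helper names are hypothetical):

```c
#include <assert.h>   /* for the usage check: assert(width_minus_ctpop_holds_i8()); */
#include <stdbool.h>
#include <stdint.h>

/* Portable stand-in for llvm.ctpop.i8. */
static unsigned popcount8(uint8_t v) {
    unsigned n = 0;
    while (v != 0) {
        v &= (uint8_t)(v - 1);
        n++;
    }
    return n;
}

/* 8 - ctpop(a) == ctpop(~a) for every i8 a. */
static bool width_minus_ctpop_holds_i8(void) {
    for (unsigned a = 0; a < 256; a++)
        if (8u - popcount8((uint8_t)a) != popcount8((uint8_t)~a))
            return false;
    return true;
}
```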

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101257
2021-04-26 15:40:54 +02:00
Dávid Bolvanský 137568e579 [InstCombine] Fixed UB in foldCtpop 2021-04-24 19:44:16 +02:00
Dávid Bolvanský de3fa35cdb [InstCombine] ctpop(rot(X)) -> ctpop(X)
Proof:
https://alive2.llvm.org/ce/z/ss2zyt - rotl
https://alive2.llvm.org/ce/z/ZM7Aue - rotr
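Rotation only permutes bit positions, so the population count cannot change. The C sketch below checks every i8 value and rotate amount; rotr by r is rotl by (8 - r) mod 8, so checking rotl covers both. `rotl8` and `popcount8` are hypothetical helpers.

```c
#include <assert.h>   /* for the usage check: assert(ctpop_rot_holds_i8()); */
#include <stdbool.h>
#include <stdint.h>

static unsigned popcount8(uint8_t v) {
    unsigned n = 0;
    while (v != 0) {
        v &= (uint8_t)(v - 1);
        n++;
    }
    return n;
}

/* Rotate an i8 left by r (taken modulo 8); r == 0 returns x unchanged. */
static uint8_t rotl8(uint8_t x, unsigned r) {
    r &= 7u;
    return (uint8_t)((x << r) | (x >> ((8u - r) & 7u)));
}

/* ctpop(rotl(x, r)) == ctpop(x) for every i8 x and every rotate amount. */
static bool ctpop_rot_holds_i8(void) {
    for (unsigned x = 0; x < 256; x++)
        for (unsigned r = 0; r < 8; r++)
            if (popcount8(rotl8((uint8_t)x, r)) != popcount8((uint8_t)x))
                return false;
    return true;
}
```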

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101235
2021-04-24 18:25:03 +02:00
Dávid Bolvanský d4ec8ea19c [InstCombine] ctpop(X) + ctpop(Y) => ctpop(X | Y) if X and Y have no common bits (PR48999)
For example:

```
int src(unsigned int a, unsigned int b)
{
    return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16);
}

int tgt(unsigned int a, unsigned int b)
{
    return __builtin_popcount((a << 16)  | (b >> 16));
}
```
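The "no common bits" precondition is what makes this valid: when x & y == 0, x | y contains exactly the set bits of x plus those of y. A brute-force C sketch over i8 pairs (hypothetical helper names; the 8-bit width is an assumption to keep the search exhaustive):

```c
#include <assert.h>   /* for the usage check: assert(ctpop_add_disjoint_holds_i8()); */
#include <stdbool.h>
#include <stdint.h>

static unsigned popcount8(uint8_t v) {
    unsigned n = 0;
    while (v != 0) {
        v &= (uint8_t)(v - 1);
        n++;
    }
    return n;
}

/* ctpop(x) + ctpop(y) == ctpop(x | y) for every i8 pair with no common bits. */
static bool ctpop_add_disjoint_holds_i8(void) {
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            if ((x & y) != 0)
                continue;   /* the fold only applies when no bits are shared */
            if (popcount8((uint8_t)x) + popcount8((uint8_t)y) != popcount8((uint8_t)(x | y)))
                return false;
        }
    }
    return true;
}
```

Overlapping operands break it immediately: ctpop(1) + ctpop(1) is 2, but ctpop(1 | 1) is 1.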

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101210
2021-04-24 17:52:10 +02:00
Dávid Bolvanský 9aee07abd0 [InstCombine] X - usub.sat(X, Y) => umin(X, Y)
Pattern regressed in LLVM 9 with the introduction of usub.sat.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42178#c2
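If X > Y, usub.sat(X, Y) is X - Y and the result is Y; otherwise it is 0 and the result is X, which is umin(X, Y) in both cases. A brute-force C check over i8 (hypothetical helper names):

```c
#include <assert.h>   /* for the usage check: assert(sub_usubsat_is_umin_i8()); */
#include <stdbool.h>

/* llvm.usub.sat and llvm.umin on values in 0..255. */
static unsigned usub_sat8(unsigned x, unsigned y) { return x > y ? x - y : 0u; }
static unsigned umin8(unsigned x, unsigned y) { return x < y ? x : y; }

/* X - usub.sat(X, Y) == umin(X, Y); the subtraction never wraps since usub.sat(X, Y) <= X. */
static bool sub_usubsat_is_umin_i8(void) {
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            if (x - usub_sat8(x, y) != umin8(x, y))
                return false;
    return true;
}
```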

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101184
2021-04-23 21:13:07 +02:00
Sanjay Patel e10d7d455d [InstCombine] fold 'not' of ctpop in parity pattern
As discussed in https://llvm.org/PR50096 , we could
convert the 'not' into a 'sub' and see the same
fold. That's because we already have another demanded
bits optimization for 'sub'.

We could add a related transform for
odd-number-of-type-bits, but that seems unlikely
to be practical.

https://alive2.llvm.org/ce/z/TWJZXr
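For a W-bit value, ctpop(~x) == W - ctpop(x), so the parity is preserved exactly when W is even; that is the odd-number-of-type-bits caveat above. The C sketch below (hypothetical helper names, widths from 1 to 16 assumed) confirms the even case for 8 bits and shows the parity flip for 7 bits:

```c
#include <assert.h>   /* for the usage check: assert(not_preserves_parity(8)); */
#include <stdbool.h>

static unsigned popcount_u(unsigned v) {
    unsigned n = 0;
    while (v != 0) {
        v &= v - 1u;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Returns true iff parity(~x) == parity(x) for every `width`-bit x (width in 1..16). */
static bool not_preserves_parity(unsigned width) {
    unsigned mask = (1u << width) - 1u;
    for (unsigned x = 0; x <= mask; x++) {
        unsigned not_x = ~x & mask;
        if ((popcount_u(not_x) & 1u) != (popcount_u(x) & 1u))
            return false;
    }
    return true;
}
```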
2021-04-23 13:23:24 -04:00
Dávid Bolvanský 5f77e7708a [InstCombine] Fixed crash when setting align attr for memalign 2021-04-23 14:04:08 +02:00
Philip Reames 15e19a2599 Revert "[instcombine] Exploit UB implied by nofree attributes"
This change effectively reverts 86664638, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert.

Why revert this now?  Two main reasons:
1) There is continuing discussion around what the semantics of nofree should be.  I am getting increasingly uncomfortable with the seeming possibility that we might redefine nofree in a way incompatible with these changes.
2) There was a reported miscompile triggered by this change (https://github.com/emscripten-core/emscripten/issues/9443).  At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs.  Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree.  In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.
2021-04-22 10:53:17 -07:00
Nikita Popov 24e9fbc1a3 Revert "[InstCombine] Fold multiuse shr eq zero"
This reverts commit 9423f78240.

A performance regression with this patch has been reported at
https://reviews.llvm.org/rG9423f78240a2#990953. Reverting for now.
2021-04-21 21:40:52 +02:00
Reid Kleckner 91f7a4fff7 Revert "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as a signed multiplication overflow check (PR48769)"
This reverts commit 13ec913bdf.

This commit introduces new uses of the overflow checking intrinsics that
depend on implementations in compiler-rt, which Windows users generally
do not link against. I filed an issue (somewhere) to make clang
auto-link the builtins library to resolve this situation, but until that
happens, it isn't reasonable for the optimizer to introduce new link
time dependencies.
2021-04-20 15:53:34 -07:00
Philip Reames 4824d876f0 Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81cc.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Roman Lebedev 5a654bfeab
Revert "[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543)"
I forgot about the case where we sign-extend to width smaller than the original.

This reverts commit 1e6ca23ab8.
2021-04-21 01:11:15 +03:00
Roman Lebedev 1e68d338c1
Revert "[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543)"
I forgot about the case where we sign-extend to width smaller than the original.

This reverts commit 41b71f718b.
2021-04-21 01:11:14 +03:00
Philip Reames d87b9b81cc Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for an intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst an IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use the idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISel.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Roman Lebedev 41b71f718b
[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543)
This is a more convoluted form of the same pattern "sext of NSW trunc",
but in this case the operand of trunc was a right-shift,
and the truncation chops off just the zero bits that were shifted-in.
2021-04-21 00:31:46 +03:00
Roman Lebedev 1e6ca23ab8
[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543)
If we can tell that trunc only chops off sign bits, and not all of them,
then we can simply sign-extend the trunc's source.
2021-04-21 00:31:45 +03:00
Sanjay Patel 1e202e8f39 [InstCombine] fold shift-of-srem-by-2 to mask+shift
There are several potential srem-by-2 folds
because the result is known {-1,0,1}.

https://alive2.llvm.org/ce/z/LuVyeK
2021-04-20 17:10:16 -04:00
Roman Lebedev 13ec913bdf
[InstCombine] Recognize `((x * y) s/ x) !=/== y` as a signed multiplication overflow check (PR48769)
We already had support for its unsigned variant, so simply extend it
to also handle the signed variant.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48769
2021-04-20 21:29:43 +03:00
Philip Reames 3b1474cab2 free(nullptr) does not violate the nofree specification
This fixes a subtle and nasty bug in my 86664638. The problem is that free(nullptr) is well defined (and common).

The specification for the nofree attribute talks about memory objects, and doesn't explicitly address null, but I think it's reasonable to assume that nofree doesn't disallow a call to free(nullptr). If it did, we'd have to prove nonnull on an argument to ever infer nofree, which doesn't seem to be the intent.

This was found by Nuno and Alive2 over in https://reviews.llvm.org/D100141#2697374.

Differential Revision: https://reviews.llvm.org/D100779
2021-04-20 09:08:05 -07:00
Luo, Yuanke bcdaccfe34 [X86][AMX] Verify illegal types or instructions for x86_amx.
This patch is related to https://reviews.llvm.org/D100032, which defines
some illegal types or operations for x86_amx. There are no arguments,
arrays, pointers, vectors or constants of x86_amx.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D100472
2021-04-20 16:14:22 +08:00
Dávid Bolvanský 324d641b75 [InstCombine] Enhance deduction of alignment for aligned_alloc
This patch improves https://reviews.llvm.org/D76971 (Deduce attributes for aligned_alloc in InstCombine) and implements the "TODO" item mentioned in the review of that patch.

> The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.

Currently, we simply bail out if we see a non-constant size - change that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100785
2021-04-20 02:04:18 +02:00
Nikita Popov 9423f78240 [InstCombine] Fold multiuse shr eq zero
The single-use case is handled implicitly by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
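Assuming the shift is a logical shift right, the three forms below are equivalent for an i8 value; the mask form is the kind of check the message refers to. This is a brute-force C sketch with a hypothetical function name, not the code in the patch:

```c
#include <assert.h>   /* for the usage check: assert(lshr_eq_zero_forms_agree_i8()); */
#include <stdbool.h>

/* For i8 x and shift c: (x lshr c) == 0  <=>  (x & (0xFF << c)) == 0  <=>  x u< (1 << c). */
static bool lshr_eq_zero_forms_agree_i8(void) {
    for (unsigned c = 0; c < 8; c++) {
        unsigned high_mask = (0xFFu << c) & 0xFFu;   /* bits that survive `lshr c` */
        for (unsigned x = 0; x < 256; x++) {
            bool shr_is_zero  = (x >> c) == 0;
            bool mask_is_zero = (x & high_mask) == 0;
            bool below_bound  = x < (1u << c);
            if (shr_is_zero != mask_is_zero || shr_is_zero != below_bound)
                return false;
        }
    }
    return true;
}
```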
2021-04-19 22:13:11 +02:00
Juneyoung Lee 1c10201d96 Update InstCombine to use undef matcher instead
This is a patch to use m_Undef() matcher instead of isa<UndefValue>().

As suggested in D100122, this update is separately committed.
2021-04-18 11:05:36 +09:00
Philip Reames ff55d01a8e [nofree] Restrict semantics to memory visible to caller
This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee.

This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute".

The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work.

Differential Revision: https://reviews.llvm.org/D100141
2021-04-16 11:38:55 -07:00
Mehrnoosh Heidarpour 29f189f90d [InstCombine] Conditionally emit nowrap flags when combining two adds
Currently, InstCombineCompare combines two add operations
into a single add operation that always has the nsw flag, without
checking whether the original two add operations allow this flag
to be present.

This patch changes InstCombineCompare to emit nsw or nuw only
when the original add operations allow these flags, which removes
the possibility of later passes in the pipeline applying a wrong
optimization based on them.

To confirm that the current results are buggy and the results after
proposed patch are the correct IR the following examples from Alive2
are attached; the same results can be seen in the case of nuw flag
and nsw is just used as an example. The following link shows that
the generated IR with current LLVM is a buggy IR when none of the
original add operations have nsw flag.
https://alive2.llvm.org/ce/z/WGaDrm
The following link proves that the generated IR after the patch in
the former case is the correct IR.
https://alive2.llvm.org/ce/z/wQ7G_e

Differential Revision: https://reviews.llvm.org/D100095
2021-04-14 20:53:06 +02:00
Benjamin Kramer cf4161673c [Instcombine] Disable memcpy of alloca bypass for instruction sources
This transformation is fundamentally broken when it comes to dominance,
it just happened to work when the source of the memcpy can be moved into
the place of the alloca. The bug shows up a lot more often since
077bff39d4 allows the source to be a
switch.

It would be possible to check dominance of the source and all its
operands, but that seems very heavy for instcombine.
2021-04-14 16:52:09 +02:00
Roman Lebedev 2fea5d5d4a
[InstCombine] tmp alloca bypass: ensure that the replacement dominates all alloca uses
After 077bff39d4,
isDereferenceableForAllocaSize() can recurse into selects,
which is causing a problem for the new test case,
reduced from https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20210412/904154.html
because the replacement (the select) is defined after the first use
of an alloca, so we'd end up with a verifier error.

Now, this new check is too restrictive.
We likely can handle *some* cases, by trying to sink all uses of an alloca
to after the def.
2021-04-14 13:04:12 +03:00
Yuanfang Chen c5fda0e662 Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom""
This reverts commit a3fabc79ae (relands
f4d682d6ce with fix for the compile-time
regression issue).
2021-04-12 14:50:54 -07:00
Nikita Popov a3fabc79ae Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"
This reverts commit f4d682d6ce.

This caused a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions

Possibly this is due to overeager parsing of target triples.
2021-04-12 22:55:59 +02:00
Sanjay Patel 5354a213a0 [InstCombine] fold shift+trunc signbit check
https://alive2.llvm.org/ce/z/6vQvrP

This solves:
https://llvm.org/PR49866
2021-04-12 16:19:43 -04:00
Yuanfang Chen f4d682d6ce [InstCombine] when calling conventions are compatible, don't convert the call to undef idiom
D24453 enabled libcall simplification for ARM PCS. This may cause
caller/callee calling convention mismatches in some situations such as
LTO. This patch makes instcombine aware that differences between
compatible calling conventions are benign (it no longer emits the undef idiom).

Differential Revision: https://reviews.llvm.org/D99773
2021-04-12 09:32:23 -07:00
Roman Lebedev 91248e2db9
[InstCombine] Improve "get low bit mask upto and including bit X" pattern
https://alive2.llvm.org/ce/z/3u-48R
2021-04-11 18:08:08 +03:00
Roman Lebedev a36bb7fd76
[InstCombine] (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the or is actually an add
https://alive2.llvm.org/ce/z/Coc5yf
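"The or is actually an add" means x and Op01C share no set bits, in which case x | c1 == x + c1 and the constants can be reassociated under wrapping arithmetic. A brute-force C check over i8 (hypothetical function name; c1 and c2 stand for Op01C and Op1C):

```c
#include <assert.h>   /* for the usage check: assert(or_as_add_reassoc_holds_i8()); */
#include <stdbool.h>

/* (x | c1) + c2 == x + (c1 + c2) modulo 2^8 whenever x and c1 share no set bits. */
static bool or_as_add_reassoc_holds_i8(void) {
    for (unsigned c1 = 0; c1 < 256; c1++) {
        for (unsigned x = 0; x < 256; x++) {
            if ((x & c1) != 0)
                continue;   /* the or is not an add here, so the fold does not apply */
            for (unsigned c2 = 0; c2 < 256; c2++) {
                unsigned src = ((x | c1) + c2) & 0xFFu;
                unsigned tgt = (x + ((c1 + c2) & 0xFFu)) & 0xFFu;
                if (src != tgt)
                    return false;
            }
        }
    }
    return true;
}
```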
2021-04-11 18:08:08 +03:00
Sanjay Patel 84cdccc9dc [InstCombine] try to eliminate an instruction in min/max -> abs fold
As suggested in the review thread for 5094e12 and seen in the
motivating example from https://llvm.org/PR49885, it's not
clear if we have a way to create the optimal code without
this heuristic.
2021-04-09 10:34:03 -04:00
Sanjay Patel 5094e1279e [InstCombine] fold min/max intrinsic with negated operand to abs
The smax case shows up in https://llvm.org/PR49885 .
The others seem unlikely, but we might as well try
for uniformity (although that could mean an extra
instruction to create "nabs").

smax -- https://alive2.llvm.org/ce/z/8yYaGy
smin -- https://alive2.llvm.org/ce/z/0_7zc_
umax -- https://alive2.llvm.org/ce/z/EcsZWs
umin -- https://alive2.llvm.org/ce/z/Xw6WvB
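For the signed cases, smax(x, -x) is abs(x) and smin(x, -x) is its negation ("nabs"), including the wrapping case x == INT8_MIN. The C sketch below models i8 with a hypothetical `wrap8` helper and checks both; it does not model the umax/umin variants.

```c
#include <assert.h>   /* for the usage check: assert(smax_smin_neg_fold_i8()); */
#include <stdbool.h>

/* Interpret the low 8 bits of v as a two's-complement i8. */
static int wrap8(int v) {
    unsigned u = (unsigned)v & 0xFFu;   /* conversion to unsigned is modular, so this is portable */
    return u >= 128u ? (int)u - 256 : (int)u;
}

static bool smax_smin_neg_fold_i8(void) {
    for (int x = -128; x <= 127; x++) {
        int neg   = wrap8(-x);                 /* sub i8 0, x */
        int abs_x = wrap8(x < 0 ? -x : x);     /* llvm.abs.i8(x, false): abs(-128) == -128 */
        int smax  = x > neg ? x : neg;
        int smin  = x < neg ? x : neg;
        if (smax != abs_x)
            return false;
        if (smin != wrap8(-abs_x))             /* "nabs" */
            return false;
    }
    return true;
}
```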
2021-04-08 14:37:39 -04:00
Sanjay Patel c0bbd0cc35 [InstCombine] fold not ops around min/max intrinsics
This is another step towards parity with the existing
cmp+select folds (see D98152).
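These folds rely on "not" reversing the order of values: ~smax(~x, ~y) == smin(x, y), and likewise for the unsigned pair. A brute-force C sketch over i8 (hypothetical function name; signed "not" is written as -v - 1, its two's-complement value):

```c
#include <assert.h>   /* for the usage check: assert(not_around_minmax_i8()); */
#include <stdbool.h>

static bool not_around_minmax_i8(void) {
    for (int x = -128; x <= 127; x++) {
        for (int y = -128; y <= 127; y++) {
            /* signed: ~v is -v - 1, which stays within the i8 range */
            int nx = -x - 1, ny = -y - 1;
            int smax_not = nx > ny ? nx : ny;
            int smin_xy  = x < y ? x : y;
            if (-smax_not - 1 != smin_xy)
                return false;

            /* unsigned: reinterpret the same bit patterns as u8 */
            unsigned ux = (unsigned)x & 0xFFu, uy = (unsigned)y & 0xFFu;
            unsigned nux = ~ux & 0xFFu, nuy = ~uy & 0xFFu;
            unsigned umax_not = nux > nuy ? nux : nuy;
            unsigned umin_xy  = ux < uy ? ux : uy;
            if ((~umax_not & 0xFFu) != umin_xy)
                return false;
        }
    }
    return true;
}
```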
2021-04-07 17:31:36 -04:00
Roman Lebedev 24f67473dd
[InstCombine] foldAddWithConstant(): don't deal with non-immediate constants
All of the code that handles general constants here (other than the more
restrictive APInt-dealing code) expects them to be immediates,
because otherwise we won't actually fold the constants, and we would increase
the instruction count. And it isn't obvious why we'd be okay with
increasing the number of constant expressions, since
those will still have to be run.

But after 2829094a8e
this could also cause endless combine loops.
So actually properly restrict this code to immediates.
2021-04-07 19:50:19 +03:00
Sanjay Patel 1894c6c59e [InstCombine] avoid infinite loop from partial undef vectors
This fixes the examples from
D99674 and
https://llvm.org/PR49878

The matchers succeed on partial undef/poison vector constants,
but the transform creates a full 'not' (-1) constant, so it
would undo a demanded vector elements change triggered by the
extractelement.

Differential Revision: https://reviews.llvm.org/D100044
2021-04-07 12:18:12 -04:00
Sanjay Patel 0333ed8e0c [InstCombine] move abs transform to helper function; NFC
The swap of the operands can affect later transforms that
are expecting a constant as operand 1. I don't think we
can trigger a bug with the current code, but I hit that
problem while drafting a new transform for min/max intrinsics.
2021-04-07 08:35:07 -04:00
Roman Lebedev 2829094a8e
Reland [InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)
This reverts commit a547b4e26b,
relanding commit 31d219d299,
which was reverted because there was a conflicting inverse transform,
which was causing an endless combine loop, which has now been adjusted.

Original commit message:

https://alive2.llvm.org/ce/z/67w-wQ

We prefer `add`s over `sub`, and this particular xform
allows further folds to happen:

Fixes https://bugs.llvm.org/show_bug.cgi?id=49858
2021-04-07 12:06:25 +03:00
Roman Lebedev 93d1d94b74
[InstCombine] Restrict "C-(X+C2) --> (C-C2)-X" fold to immediate constants
I.e., if any/all of the constants is an expression, don't do it.
Since those constants won't reduce into an immediate,
but would be left as a constant expression, they could cause
endless combine loops after 31d219d299
added an inverse transformation.
2021-04-07 12:06:24 +03:00
Petr Hosek a547b4e26b Revert "[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)"
This reverts commit 31d219d299 which
causes an infinite loop when compiling the XRay runtime.
2021-04-06 22:30:28 -07:00
Philip Reames 4bf8985f4f Replace calls to IntrinsicInst::Create with CallInst::Create [nfc]
There is no IntrinsicInst::Create.  These are binding to the method in the super type.  Be explicit about which method is being called.
2021-04-06 13:23:58 -07:00
Philip Reames 908215b346 Use AssumeInst in a few more places [nfc]
Follow up to a6d2a8d6f5.  These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-04-06 13:18:53 -07:00
Philip Reames 9ef6aa020b Plumb AssumeInst through operand bundle apis [nfc]
Follow up to a6d2a8d6f5.  This covers all the public interfaces of the bundle related code.  I tried to clean up the internals where the changes were obvious, but there's definitely more room for improvement.
2021-04-06 12:53:53 -07:00
Philip Reames a6d2a8d6f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Roman Lebedev 31d219d299
[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)
https://alive2.llvm.org/ce/z/67w-wQ

We prefer `add`s over `sub`, and this particular xform
allows further folds to happen:

Fixes https://bugs.llvm.org/show_bug.cgi?id=49858
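In wrapping (two's-complement) arithmetic the two forms agree for all inputs. A brute-force C check over i8 (hypothetical function name; about 16.7 million cases, which still runs quickly):

```c
#include <assert.h>   /* for the usage check: assert(sub_sub_reassoc_holds_i8()); */
#include <stdbool.h>

/* ((X - Y) - Z) == X - (Y + Z) modulo 2^8 for every i8 X, Y, Z. */
static bool sub_sub_reassoc_holds_i8(void) {
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            for (unsigned z = 0; z < 256; z++) {
                unsigned src = (x - y - z) & 0xFFu;              /* (X - Y) - Z */
                unsigned tgt = (x - ((y + z) & 0xFFu)) & 0xFFu;  /* X - (Y + Z) */
                if (src != tgt)
                    return false;
            }
    return true;
}
```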
2021-04-06 15:58:14 +03:00
Sanjay Patel c590a9880d [InstCombine] fix potential miscompile in select value equivalence
As shown in the example based on:
https://llvm.org/PR49832
...and the existing test, we can't substitute
a vector value because the equality compare
replacement that we are attempting requires
that the comparison is true for the entire
value. Vector select can be partly true/false.
2021-04-05 12:25:40 -04:00
Roman Lebedev 2760a808b9
[InstCombine] dropRedundantMaskingOfLeftShiftInput(): check that adding shift amounts doesn't overflow (PR49778)
This is identical to 781d077afb,
but for the other function.

For certain shift amount bit widths, we must first ensure that adding
shift amounts is safe, that the sum won't have an unsigned overflow.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49778
2021-04-04 23:26:41 +03:00
Roman Lebedev dceb3e5996
[NFC][InstCombine] Extract canTryToConstantAddTwoShiftAmounts() as helper 2021-04-04 23:26:41 +03:00
Sanjay Patel c0645f1324 [InstCombine] fold popcount of exactly one bit to shift
This is discussed in https://llvm.org/PR48999 ,
but it does not solve that request.

The difference in the vector test shows that some
other logic transform is limited to scalar types.
2021-04-04 11:43:49 -04:00
Juneyoung Lee 5207cde5cb [InstCombine] Conditionally fold select i1 into and/or
This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or:

```
select cond, cond2, false
->
and cond, cond2
```

This is not safe if cond2 is poison whereas cond isn’t.
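The problem can be illustrated with a small three-valued model of i1 (false, true, poison). This is a simplification of LLVM's poison semantics and all names are hypothetical: `select` with a false condition never looks at its true arm, while `and` propagates poison from either operand, so replacing the select with `and` can turn a false result into poison.

```c
#include <assert.h>   /* for the usage check: assert(agree_without_poison()); */
#include <stdbool.h>

/* Simplified three-valued i1: a value is false, true, or poison. */
typedef enum { I1_FALSE, I1_TRUE, I1_POISON } i1val;

/* select c, a, b: a poison condition yields poison; otherwise only the chosen arm matters. */
static i1val select_i1(i1val c, i1val a, i1val b) {
    if (c == I1_POISON)
        return I1_POISON;
    return c == I1_TRUE ? a : b;
}

/* and c, a: poison in either operand makes the whole result poison. */
static i1val and_i1(i1val c, i1val a) {
    if (c == I1_POISON || a == I1_POISON)
        return I1_POISON;
    return (c == I1_TRUE && a == I1_TRUE) ? I1_TRUE : I1_FALSE;
}

/* Without poison, `select c, a, false` and `and c, a` agree on every input. */
static bool agree_without_poison(void) {
    for (int c = I1_FALSE; c <= I1_TRUE; c++)
        for (int a = I1_FALSE; a <= I1_TRUE; a++)
            if (select_i1((i1val)c, (i1val)a, I1_FALSE) != and_i1((i1val)c, (i1val)a))
                return false;
    return true;
}
```

With cond = false and cond2 = poison, the select yields false but the `and` yields poison, which is the unsafe direction.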

Unconditionally disabling this transformation affects later passes in the pipeline that depend on and/or i1s.
To minimize its impact, this patch conservatively checks whether cond2 is an instruction that
creates poison or has an operand that creates poison.
This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99674
2021-04-04 14:11:28 +09:00
Sanjay Patel 412fc74140 [InstCombine] fold not+or+neg
~((-X) | Y) --> (X - 1) & (~Y)

We generally prefer 'add' over 'sub', this reduces the
dependency chain, and this looks better for codegen on
x86, ARM, and AArch64 targets.

https://llvm.org/PR45755

https://alive2.llvm.org/ce/z/cxZDSp
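Since ~(-X) == X - 1 in two's complement, De Morgan gives ~((-X) | Y) == ~(-X) & ~Y == (X - 1) & ~Y. A brute-force C check over i8 (hypothetical function name):

```c
#include <assert.h>   /* for the usage check: assert(not_or_neg_fold_i8()); */
#include <stdbool.h>

/* ~((-X) | Y) == (X - 1) & ~Y modulo 2^8 for every i8 X, Y. */
static bool not_or_neg_fold_i8(void) {
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++) {
            unsigned src = ~((0u - x) | y) & 0xFFu;
            unsigned tgt = ((x - 1u) & ~y) & 0xFFu;
            if (src != tgt)
                return false;
        }
    return true;
}
```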
2021-04-02 13:16:36 -04:00
Jeroen Dobbelaere b82b305cf9 [InstCombine] Fix out-of-bounds ashr(shl) optimization
This fixes a crash found by the oss fuzzer and reported by @fhahn.
The suggestion of @RKSimon seems to be the correct fix here. (See D91343).

The oss fuzz report can be found here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32759

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D99792
2021-04-02 13:45:11 +02:00
Sanjay Patel 1462bdf1b9 [InstCombine] fold abs(srem X, 2)
This is a missing optimization based on an example in:
https://llvm.org/PR49763

As noted there and the test here, we could add a more
general fold if that is shown useful.

https://alive2.llvm.org/ce/z/xEHdTv
https://alive2.llvm.org/ce/z/97dcY5
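srem X, 2 is -1, 0, or 1 with the sign of X, so its absolute value is just the low bit of X. The C sketch below checks that for every i8 value; writing the result as `X & 1` is an assumption about the folded form (the linked proofs show what the patch actually emits), and the function name is hypothetical. C's `%` truncates toward zero, matching `srem`.

```c
#include <assert.h>   /* for the usage check: assert(abs_srem2_is_low_bit_i8()); */
#include <stdbool.h>

/* abs(srem x, 2) == x & 1 for every i8 x. */
static bool abs_srem2_is_low_bit_i8(void) {
    for (int x = -128; x <= 127; x++) {
        int r = x % 2;                          /* -1, 0, or 1 */
        int abs_r = r < 0 ? -r : r;
        if (abs_r != (int)((unsigned)x & 1u))   /* unsigned conversion gives the two's-complement bits */
            return false;
    }
    return true;
}
```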
2021-03-31 11:29:20 -04:00
Sanjay Patel c2ebad8d55 [InstCombine] add fold for demand of low bit of abs()
This is one problem shown in https://llvm.org/PR49763

https://alive2.llvm.org/ce/z/cV6-4K
https://alive2.llvm.org/ce/z/9_3g-L
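Negation preserves the lowest bit (x and -x have the same parity), so abs(x) & 1 == x & 1 even when abs wraps at INT8_MIN. A brute-force C check over i8 (hypothetical function name):

```c
#include <assert.h>   /* for the usage check: assert(abs_low_bit_i8()); */
#include <stdbool.h>

/* (abs(x) & 1) == (x & 1) for every i8 x. */
static bool abs_low_bit_i8(void) {
    for (int x = -128; x <= 127; x++) {
        int a = x < 0 ? -x : x;   /* for x == -128 this is 128, i.e. -128 as i8; the low bit is unaffected */
        if (((unsigned)a & 1u) != ((unsigned)x & 1u))
            return false;
    }
    return true;
}
```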
2021-03-30 15:14:37 -04:00
Sanjay Patel 01ae6e5ead [InstCombine] sink min/max intrinsics with common op after select
This is another step towards parity with cmp+select min/max idioms.

See D98152.
2021-03-28 13:13:04 -04:00
Nashe Mncube 5d929794a8 [llvm-opt] Bug fix within combining FP vectors
A bug was found within InstCombineCasts where a called function was
only implemented to work with FixedVectors. This caused a
crash when a ScalableVector was passed to this function.
This commit introduces a regression test which recreates the
failure and a bug fix.

Differential Revision: https://reviews.llvm.org/D98351
2021-03-23 12:13:41 +00:00
Juneyoung Lee 960a767368 Reland "[InstCombine] Add simplification of two logical and/ors"
This relands 07c3b97e18 (D96945) which was reverted by
commit f49354838e.
The two-stage build now passes the tests on my machine.
2021-03-23 16:24:50 +09:00
Roman Lebedev d37fe26a2b
[NFC][IR] Type: add getWithNewType() method
Sometimes you want a type with the same vector element count
as the current type but a different element type,
and there's no QOL wrapper to do that. Add one.
2021-03-23 00:50:58 +03:00
Philip Reames 5698537f81 Update basic deref API to account for possibility of free [NFC]
This patch is plumbing to support work towards the goal outlined in the recent llvm-dev post "[llvm-dev] RFC: Decomposing deref(N) into deref(N) + nofree".

The point of this change is purely to simplify iteration on other pieces on the way to making the switch. Rebuilding with a change to Value.h is slow and painful, so I want to get the API change landed. Once that's done, I plan to more closely audit each caller, add the inference rules in their own patch, then post a patch with the langref changes and test diffs. The value of the command line flag is that we can exercise the inference logic in standalone patches without needing the whole switch ready to go just yet.

Differential Revision: https://reviews.llvm.org/D98908
2021-03-19 11:17:19 -07:00
Stephen Tozer 3bfddc2593 Reapply "[DebugInfo] Handle multiple variable location operands in IR"
Fixed a section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.

This reverts commit 01ac6d1587.
2021-03-17 16:45:25 +00:00
Hans Wennborg 01ac6d1587 Revert "[DebugInfo] Handle multiple variable location operands in IR"
This caused non-deterministic compiler output; see comment on the
code review.

> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232

This reverts commit df69c69427.
2021-03-17 13:36:48 +01:00
Mohammad Hadi Jooybar 302b80abf0 [InstCombine] Avoid Bitcast-GEP fusion for pointers directly from allocation functions
Elimination of bitcasts with void pointer arguments results in GEPs with pure byte indexes. These GEPs do not preserve struct/array information and interrupt phi address translation in later pipeline stages.

Here is the original motivation for this patch:

```
#include <stdio.h>
#include <stdlib.h>

typedef struct __Node{

  double f;
  struct __Node *next;

} Node;

void foo () {
  Node *a = (Node*) malloc (sizeof(Node));
  a->next = NULL;
  a->f = 11.5f;

  Node *ptr = a;
  double sum = 0.0f;
  while (ptr) {
    sum += ptr->f;
    ptr = ptr->next;
  }
  printf("%f\n", sum);
}
```
From the explicit assignment `a->next = NULL`, we can infer that the length of the linked list is `1`, so the while-loop traversal can be eliminated entirely. This elimination is supposed to be performed by GVN/MemoryDependencyAnalysis/PhiTranslation.

The final IR before this patch:

```
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2
  %next = getelementptr inbounds i8, i8* %call, i64 8
  %0 = bitcast i8* %next to %struct.__Node**
  store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2
  %f = bitcast i8* %call to double*
  store double 1.150000e+01, double* %f, align 8, !tbaa !8
  %tobool12 = icmp eq i8* %call, null
  br i1 %tobool12, label %while.end, label %while.body.lr.ph

while.body.lr.ph:                                 ; preds = %entry
  %1 = bitcast i8* %call to %struct.__Node*
  br label %while.body

while.body:                                       ; preds = %while.body.lr.ph, %while.body
  %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ]
  %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ]
  %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0
  %2 = load double, double* %f1, align 8, !tbaa !8
  %add = fadd contract double %sum.014, %2
  %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1
  %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2
  %tobool = icmp eq %struct.__Node* %3, null
  br i1 %tobool, label %while.end, label %while.body

while.end:                                        ; preds = %while.body, %entry
  %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %while.body ]
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa)
  ret void
}
```

Final IR after this patch:
```
; Function Attrs: nofree nounwind
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
while.end:
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.150000e+01)
  ret void
}
```

IR before GVN before this patch:
```
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2
  %next = getelementptr inbounds i8, i8* %call, i64 8
  %0 = bitcast i8* %next to %struct.__Node**
  store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2
  %f = bitcast i8* %call to double*
  store double 1.150000e+01, double* %f, align 8, !tbaa !8
  %tobool12 = icmp eq i8* %call, null
  br i1 %tobool12, label %while.end, label %while.body.lr.ph

while.body.lr.ph:                                 ; preds = %entry
  %1 = bitcast i8* %call to %struct.__Node*
  br label %while.body

while.body:                                       ; preds = %while.body.lr.ph, %while.body
  %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ]
  %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ]
  %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0
  %2 = load double, double* %f1, align 8, !tbaa !8
  %add = fadd contract double %sum.014, %2
  %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1
  %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2
  %tobool = icmp eq %struct.__Node* %3, null
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit:                               ; preds = %while.body
  %add.lcssa = phi double [ %add, %while.body ]
  br label %while.end

while.end:                                        ; preds = %while.end.loopexit, %entry
  %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ]
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa)
  ret void
}
```
IR before GVN after this patch:
```
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2
  %0 = bitcast i8* %call to %struct.__Node*
  %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1
  store %struct.__Node* null, %struct.__Node** %next, align 8, !tbaa !2
  %f = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 0
  store double 1.150000e+01, double* %f, align 8, !tbaa !8
  %tobool12 = icmp eq i8* %call, null
  br i1 %tobool12, label %while.end, label %while.body.preheader

while.body.preheader:                             ; preds = %entry
  br label %while.body

while.body:                                       ; preds = %while.body.preheader, %while.body
  %sum.014 = phi double [ %add, %while.body ], [ 0.000000e+00, %while.body.preheader ]
  %ptr.013 = phi %struct.__Node* [ %2, %while.body ], [ %0, %while.body.preheader ]
  %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0
  %1 = load double, double* %f1, align 8, !tbaa !8
  %add = fadd contract double %sum.014, %1
  %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1
  %2 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2
  %tobool = icmp eq %struct.__Node* %2, null
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit:                               ; preds = %while.body
  %add.lcssa = phi double [ %add, %while.body ]
  br label %while.end

while.end:                                        ; preds = %while.end.loopexit, %entry
  %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ]
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa)
  ret void
}
```

Before this patch, phi translation fails, which prevents GVN from removing the loop. The cause of this failure is in InstCombine, when the instruction combining pass decides to convert:
```
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16)
  %0 = bitcast i8* %call to %struct.__Node*
  %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1
  store %struct.__Node* null, %struct.__Node** %next
```
to
```
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16)
  %next = getelementptr inbounds i8, i8* %call, i64 8
  %0 = bitcast i8* %next to %struct.__Node**
  store %struct.__Node* null, %struct.__Node** %0
```

GEP instructions with pure byte indexes (e.g. `getelementptr inbounds i8, i8* %call, i64 8`) are obstacles for address translation. Address translation looks for structural similarity between GEPs, and these byte-indexed GEPs usually do not match the struct-typed ones because their structure is different.
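A C analogy for the two addressing forms, using the `Node` type from the motivating example (with the reserved `__Node` tag renamed): both compute the same address, but only the struct-typed form records which field is accessed. The helper functions are hypothetical, and `offsetof` stands in for the hard-coded byte offset 8 seen in the IR.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    double f;
    struct Node *next;
} Node;

/* Struct-typed access, analogous to
   getelementptr %struct.__Node, %struct.__Node* %p, i64 0, i32 1 */
static Node **next_field_typed(Node *p) { return &p->next; }

/* Byte-offset access, analogous to
   getelementptr i8, i8* %p, i64 8 followed by a bitcast. */
static Node **next_field_bytes(void *p) {
    return (Node **)((char *)p + offsetof(Node, next));
}
```

The addresses are identical; what the byte form loses is the "field 1 of `Node`" structure that phi translation tries to match.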

This change causes a couple of failures in the LLVM tests. In all cases, however, only the expected result of the test needs to change. I will update those tests as soon as I get the green light on this patch.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96881
2021-03-16 17:05:44 -04:00
Simonas Kazlauskas 7d7001b2cb [InstCombine] Restrict a GEP transform to avoid changing provenance
This is an alternative to D98120. Instead of deleting the transformation entirely, we check
that both underlying objects are the same, so applying the transformation would not
change provenance.

https://alive2.llvm.org/ce/z/SYF_yv

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98588
2021-03-14 16:32:04 +02:00
Luo, Yuanke 66fbf5fafb [X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*.
Load/store instructions will be transformed into AMX intrinsics
in the AMX type lowering pass. Prohibiting the pointer cast
makes that pass happy.

Differential Revision: https://reviews.llvm.org/D98247
2021-03-14 09:24:56 +08:00
Sanjay Patel 4224a36957 [InstCombine] avoid creating an extra instruction in zext fold and possible inf-loop
The structure of this fold is suspect vs. most of instcombine
because it creates instructions and tries to delete them
immediately after.

If we don't have the operand types for the icmps, then we are
not behaving as assumed. And as shown in PR49475, we can inf-loop.
2021-03-13 08:30:51 -05:00
Nikita Popov 42eb658f65 [OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC)
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.

There are still a number of tricky uses left, as well as many
more CreateGEP API variants.
2021-03-12 21:01:16 +01:00
Juneyoung Lee f49354838e Revert "[InstCombine] Add simplification of two logical and/ors"
This reverts commit 07c3b97e18 due to a reported failure in two-stage build.
2021-03-10 05:48:31 +09:00
gbtozers df69c69427 [DebugInfo] Handle multiple variable location operands in IR
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.

Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.

Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 16:44:38 +00:00
Sanjay Patel 2986a9c7e2 [InstCombine] canonicalize 'not' op after min/max intrinsic
This is another step towards parity between existing select
transforms and min/max intrinsics (D98152).

The existing 'not' folds around select are complicated, so
it's likely that we will need to enhance this, but this
should be a safe step.
2021-03-09 11:33:28 -05:00
Sanjay Patel 41b9209a12 [InstCombine] fold min/max intrinsics with not ops
This is a partial translation of the existing select-based
folds. We need to recreate several different transforms to
avoid regressions as noted in D98152.

https://alive2.llvm.org/ce/z/teuZ_J
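These folds rest on the fact that bitwise not reverses both the signed and the unsigned order, so max(~X, ~Y) == ~min(X, Y) and min(~X, ~Y) == ~max(X, Y). A minimal exhaustive C check at 8 bits, assuming this is the identity the translated select-based folds rely on; the helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static int max_i(int a, int b) { return a > b ? a : b; }
static int min_i(int a, int b) { return a < b ? a : b; }

static bool check_minmax_not_i8(void) {
    /* Signed i8 range: ~v == -v - 1 stays within [-128, 127]. */
    for (int x = -128; x <= 127; ++x)
        for (int y = -128; y <= 127; ++y) {
            if (max_i(~x, ~y) != ~min_i(x, y)) return false;
            if (min_i(~x, ~y) != ~max_i(x, y)) return false;
        }
    /* Unsigned i8 range: ~v is 255 - v. */
    for (int x = 0; x <= 255; ++x)
        for (int y = 0; y <= 255; ++y) {
            int nx = 255 - x, ny = 255 - y;
            if (max_i(nx, ny) != 255 - min_i(x, y)) return false;
            if (min_i(nx, ny) != 255 - max_i(x, y)) return false;
        }
    return true;
}
```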
2021-03-09 08:55:48 -05:00
Florian Hahn 92da5b7119
[InstCombine] Simplify phis with incoming pointer-casts.
If the incoming values of a phi are pointer casts of the same original
value, replace the phi with a single cast. Such redundant phis are
somewhat common after loop-rotate and removing them can avoid some
unnecessary code bloat, e.g. because an iteration of a loop is peeled
off to make the phi invariant. It should also simplify further analysis
on its own.

InstCombine already uses stripPointerCasts in a couple of places and
also simplifies phis based on the incoming values, so the patch should
fit in the existing scope.

The patch causes binary changes in 47 out of 237 benchmarks in
MultiSource/SPEC2000/SPEC2006 with -O3 -flto on X86.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98058
2021-03-09 11:40:18 +00:00
Philip Reames ebc61f9d3c [instcombine] Collapse trivial or recurrences
If we have a recurrence of the form <Start, Or, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.
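A C sketch of why the Or recurrence collapses, assuming the step is loop-invariant: once `step`'s bits are OR-ed in, OR-ing them again changes nothing, so every iteration after the first yields `start | step`. The function names are hypothetical; the actual transform rewrites the IR recurrence, not a C loop.

```c
#include <assert.h>
#include <stdint.h>

/* Loop-carried form: x_0 = start, x_{i+1} = x_i | step. */
static uint32_t or_recurrence_loop(uint32_t start, uint32_t step, unsigned iters) {
    uint32_t x = start;
    for (unsigned i = 0; i < iters; ++i)
        x |= step;
    return x;
}

/* Collapsed form: no dependence on the previous iteration's value. */
static uint32_t or_recurrence_collapsed(uint32_t start, uint32_t step, unsigned iters) {
    return iters == 0 ? start : (start | step);
}
```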

Differential Revision: https://reviews.llvm.org/D97578 (or part)
2021-03-08 09:21:38 -08:00
Philip Reames 239a618180 [instcombine] Collapse trivial and recurrences
If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.
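The And case is the dual: after the first iteration clears the bits missing from `step`, further ANDs with the same invariant `step` are no-ops. A hypothetical exhaustive C check over all 8-bit start/step values and a few trip counts:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare the loop-carried recurrence x = x & step with its collapsed
   form start & step for trip counts 1..4. */
static bool check_and_recurrence_i8(void) {
    for (unsigned start = 0; start < 256; ++start)
        for (unsigned step = 0; step < 256; ++step)
            for (unsigned iters = 1; iters <= 4; ++iters) {
                uint8_t x = (uint8_t)start;
                for (unsigned i = 0; i < iters; ++i)
                    x &= (uint8_t)step;            /* loop-carried */
                if (x != (uint8_t)(start & step))  /* collapsed */
                    return false;
            }
    return true;
}
```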

Differential Revision: https://reviews.llvm.org/D97578 (and part)
2021-03-08 09:21:38 -08:00
Sanne Wouda 05a6e2eb9a [InstCombine] Add a combine for a shuffle of similar bitcasts
Some intrinsics wrapper code has the habit of ignoring the type of the
elements in vectors, thinking of vector registers as a "bag of bits". As
a consequence, some operations are shared between vectors of different
types. For example, functions that rearrange elements in a
vector can be shared between vectors of int32 and float.

This can result in bitcasts in awkward places that prevent the backend
from recognizing some instructions. For AArch64 in particular, it
inhibits the selection of dup from a general purpose register (GPR), and
mov from GPR to a vector lane.

This patch adds a pattern in InstCombine to move the bitcasts past the
shufflevector if this is possible. Sometimes this even allows
InstCombine to remove the bitcast entirely, as in the included tests.

Alternatively this could be done with a few extra patterns in the
AArch64 backend, but InstCombine seems like a better place for this.

Differential Revision: https://reviews.llvm.org/D97397
2021-03-08 16:32:30 +00:00
Sanne Wouda 5e963a2441 Rehome an orphaned comment [NFC]
As seen in 35827164c4, the "shuffle x, x, mask" comment has drifted away
from the implementation of the pattern. Put it back.
2021-03-08 16:32:30 +00:00
Stephen Tozer 4343c68fa3 Fix: [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch removed the only use of a lambda capture, triggering an error
on `-Werror -Wunused-lambda-capture` builds.
2021-03-08 14:57:11 +00:00
gbtozers e5d958c456 [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.

Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
2021-03-08 14:36:13 +00:00
Juneyoung Lee 07c3b97e18 [InstCombine] Add simplification of two logical and/ors
This is a patch that adds folding of two logical and/ors that share one variable:

a && (a && b) -> a && b
a && (a & b)  -> a && b
...

This is towards removing the poison-unsafe select optimization (D93065 has more context).
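Why the logical form matters can be shown with a small three-valued model of i1 (false, true, poison). This C sketch assumes the LangRef semantics: `and` yields poison if either operand is poison, while `select a, b, false` (that is, `a && b`) yields `b` only when `a` is true, so poison in `b` does not leak when `a` is false. The enum and helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { BIT_F, BIT_T, BIT_POISON } Bit;

/* Bitwise `and i1 a, b`: poison if either operand is poison. */
static Bit and_bitwise(Bit a, Bit b) {
    if (a == BIT_POISON || b == BIT_POISON) return BIT_POISON;
    return (a == BIT_T && b == BIT_T) ? BIT_T : BIT_F;
}

/* Logical `select i1 a, i1 b, i1 false`: b only matters when a is true. */
static Bit and_logical(Bit a, Bit b) {
    if (a == BIT_POISON) return BIT_POISON;
    return a == BIT_T ? b : BIT_F;
}

/* a && (a && b) -> a && b   and   a && (a & b) -> a && b */
static bool check_logical_and_folds(void) {
    for (int a = BIT_F; a <= BIT_POISON; ++a)
        for (int b = BIT_F; b <= BIT_POISON; ++b) {
            Bit want = and_logical(a, b);
            if (and_logical(a, and_logical(a, b)) != want) return false;
            if (and_logical(a, and_bitwise(a, b)) != want) return false;
        }
    return true;
}
```

The pair `and_bitwise(BIT_F, BIT_POISON)` (poison) versus `and_logical(BIT_F, BIT_POISON)` (false) is exactly why turning a select into a plain `and` is poison-unsafe.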

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96945
2021-03-08 02:38:43 +09:00
Juneyoung Lee d672c81126 [InstCombine] use safe transformation by default
.. since it will be folded into and/or anyway
2021-03-08 02:25:29 +09:00
Juneyoung Lee 33590ed4f2 [InstCombine] fix another poison-unsafe select transformation
This fixes another unsafe select folding by disabling it if
EnableUnsafeSelectTransform is set to false.

EnableUnsafeSelectTransform's default value is true, hence it won't
affect generated code (unless the flag is explicitly set to false).
2021-03-08 02:11:04 +09:00
Roman Lebedev 2ad1f5eb1a
[InstCombine] Don't canonicalize (gep i8* X, -(ptrtoint Y)) as (inttoptr (sub (ptrtoint X), (ptrtoint Y)))
It's just a wrong thing to do.

We introduce inttoptr where there were none, which results in
losing all provenance information because we no longer have a GEP{i,},
and pessimize all future optimizations,
because we are basically not allowed to look past `inttoptr`.

(gep i8* X, -(ptrtoint Y))  *is* the canonical form.
So just drop this fold.

Noticed while reviewing D98120.
2021-03-06 23:00:25 +03:00
Alexey Bataev 04ba80ca4d [Instcombiner]Improve emission of logical or/and reductions.
For logical or/and reductions we emit regular @llvm.vector.reduce.or/and.vxi1 intrinsic calls.
These intrinsics are not effective for logical or/and reductions,
especially if the optimizer is able to emit short-circuit versions of
the scalar or/and instructions, in which case the vector code is less
efficient than the scalar version.
Instead, an or-reduction of i1 values can be represented as:
```
%val = bitcast <ReduxWidth x i1> %vec to iReduxWidth
%res = icmp ne iReduxWidth %val, 0
```
and an and-reduction of i1 values as:
```
%val = bitcast <ReduxWidth x i1> %vec to iReduxWidth
%res = icmp eq iReduxWidth %val, -1
```
This significantly improves the performance of the vector code and makes it
outperform the short-circuit scalar code.
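The bitcast-and-compare idea maps onto ordinary bitmask tests. A minimal C sketch for 8 lanes, where packing the lanes into a `uint8_t` plays the role of the bitcast to an integer; the lane-to-bit order is an assumption, but it does not affect either comparison. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack 8 i1 lanes into an i8, the analogue of bitcast <8 x i1> to i8.
   Lane i becomes bit i. */
static uint8_t pack_lanes(const bool lanes[8]) {
    uint8_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= (uint8_t)(lanes[i] ? 1u << i : 0u);
    return bits;
}

/* or-reduction: any lane set <=> packed value != 0 */
static bool reduce_or(const bool lanes[8]) { return pack_lanes(lanes) != 0; }

/* and-reduction: all lanes set <=> packed value is all-ones (-1 as i8) */
static bool reduce_and(const bool lanes[8]) { return pack_lanes(lanes) == 0xFF; }
```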

Part of D57059.

Differential Revision: https://reviews.llvm.org/D97406
2021-03-04 08:01:02 -08:00
Serguei Katkov a0ff0f30df [InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase
The statepoint intrinsic can be used in an invoke context,
so it should be handled in visitCallBase to cover both call and invoke.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97833
2021-03-04 11:00:22 +07:00
Philip Reames 99f5417346 Sink routine for replacing an operand bundle to CallBase [NFC]
We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.
2021-03-03 12:07:55 -08:00
Sanjay Patel 9502061bcc [InstCombine] avoid infinite loop in demanded bits for select
https://llvm.org/PR49205
2021-02-28 10:17:53 -05:00
Stephen Tozer ec7b9b0c18 [InstCombine] Avoid redundant or out-of-order debug value sinking
This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent
redundant debug intrinsics from being produced, and also prevent the intrinsics
from being emitted in an incorrect order. It does this by ensuring that when
this pass sinks an instruction and creates clones of the debug intrinsics that
use that instruction, it inserts those debug intrinsics in their original order,
and only inserts the last debug intrinsic for each variable in the Instruction's
block.

Differential revision: https://reviews.llvm.org/D95463
2021-02-26 13:04:33 +00:00
Sanjay Patel a7cee55762 [InstCombine] fold fdiv with powi divisor (PR49147)
This extends b40fde062c for the especially non-standard
powi pattern. We want to avoid being completely wrong
on the negation-of-int-min corner case, so I'm adding
an extra FMF check for 'ninf' assuming that gives us
the flexibility to handle that possibility.
https://llvm.org/PR49147
2021-02-24 16:44:36 -05:00
Sanjay Patel 868d43fbd6 [InstCombine] add helper for x/pow(); NFC
We at least want to add powi to this list, so
split it off into a switch to reduce code duplication.
2021-02-24 16:44:36 -05:00
Nikita Popov e0615bcd39 [Loads] Add optimized FindAvailableLoadedValue() overload (NFCI)
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.

If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.
2021-02-21 18:42:56 +01:00
Sanjay Patel e772618f1e [InstCombine] fold fdiv with exp/exp2 divisor (PR49147)
Follow-up to:
D96648 / b40fde062
...for the special-case base calls.

From the earlier commit:
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.
2021-02-20 16:02:58 -05:00
Simon Pilgrim 609d0c9772 [InstCombine] matchBSwapOrBitReverse - remove pattern matching early-out. NFCI.
recognizeBSwapOrBitReverseIdiom + collectBitParts have pattern matching to bail out early if a bswap/bitreverse pattern isn't possible - we should be able to rely on this instead without any notable change in compile time.

This is part of a cleanup towards letting matchBSwapOrBitReverse/recognizeBSwapOrBitReverseIdiom use 'root' instructions that aren't ORs (in particular FSHL/FSHR, which can be created prematurely).

Differential Revision: https://reviews.llvm.org/D97056
2021-02-20 13:15:34 +00:00
Nikita Popov 70e3c9a8b6 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.

With this patch we get marginally better results:

  Statistic | Before | After
  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
Philip Reames 8666463889 [instcombine] Exploit UB implied by nofree attributes
This patch simply implements the documented UB of the current nofree attributes as specified. It doesn't try to be fancy about inference (yet), it just implements the cases already specified and inferred.

Note: When this lands, it may expose miscompiles. If so, please revert and provide a test case. It's likely the bug is in the existing inference code and without a relatively complete test case, it will be hard to debug.

Differential Revision: https://reviews.llvm.org/D96349
2021-02-18 08:34:22 -08:00
Sanjay Patel 85294703a7 [InstCombine] fold fcmp-of-copysign idiom
As discussed in:
https://llvm.org/PR49179
...this pattern shows up in library code.
There are several potential generalizations as noted,
but we need to be careful that we get FP special-values
right, and it's not clear how much variation we should
expect to see from this exact idiom.
2021-02-17 10:32:33 -05:00
Sanjay Patel b40fde062c [InstCombine] fold fdiv with pow divisor (PR49147)
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.

This is part of solving:
https://llvm.org/PR49147
(The powi variant potentially has a different constraint.)

Differential Revision: https://reviews.llvm.org/D96648
2021-02-14 08:07:36 -05:00
Tyker 642e9225c6 reland [InstCombine] convert assumes to operand bundles
InstCombine will convert nonnull and alignment assumptions that use a boolean condition
into assumptions that use operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-13 13:03:11 +01:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e., the opeq transform
4. MIR function argument copy elision
5. IR stack protection (not perf-critical, but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Sanjay Patel 6e2053983e [InstCombine] fold lshr(mul X, SplatC), C2
This is a special-case multiply that replicates bits of
the source operand. We need this fold to avoid regression
if we make canonicalization to `mul` more aggressive for
shl+or patterns.

I did not see a way to make Alive generalize the bit width
condition for even-number-of-bits only, but an example of
the proof is:
  Name: i32
  Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2))
  %m = mul nuw i32 %x, C1
  %t = lshr i32 %m, C2
  =>
  %t = and i32 %x, C1 - 2

  Name: i14
  %m = mul nuw i14 %x, 129
  %t = lshr i14 %m, 7
  =>
  %t = and i14 %x, 127

https://rise4fun.com/Alive/e52
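An exhaustive C check of one concrete instance chosen to satisfy the stated precondition: i16 with C1 = 257 (C1 - 1 = 256 is a power of 2), C2 = 8, and C2 * 2 == 16. Inputs for which `mul nuw` would wrap are skipped, since the source is poison there. The helper is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* lshr(mul nuw X, 257), 8  ==>  and X, 257 - 2, checked for all i16 X. */
static bool check_lshr_mul_splat_i16(void) {
    for (uint32_t x = 0; x <= 0xFFFF; ++x) {
        uint32_t m = x * 257u;
        if (m > 0xFFFF)
            continue; /* `mul nuw` would wrap: precondition fails */
        uint16_t src = (uint16_t)(m >> 8);
        uint16_t tgt = (uint16_t)(x & (257u - 2u));
        if (src != tgt)
            return false;
    }
    return true;
}
```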
2021-02-10 15:02:31 -05:00
Tyker 5652e192fc Revert "[InstCombine] convert assumes to operand bundles"
This reverts commit 5eb2e994f9.
2021-02-10 01:32:00 +01:00
Tyker 5eb2e994f9 [InstCombine] convert assumes to operand bundles
InstCombine will convert nonnull and alignment assumptions that use a boolean condition
into assumptions that use operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-09 19:33:53 +01:00
Kazu Hirata 302313a264 [Transforms] Use range-based for loops (NFC) 2021-02-08 22:33:53 -08:00
Roman Lebedev 485c4b552b
[InstCombine] Hoist inversion out of ashr's value operand (PR48995)
This is yet another hint that we will eventually need InstCombineInverter,
which would consistently sink inversions, but for that we'll need
to consistently hoist inversions where possible, so let's do that here.

Example of a proof: https://alive2.llvm.org/ce/z/78SbDq

See https://bugs.llvm.org/show_bug.cgi?id=48995
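Assuming the transform is ashr(~X, Y) --> ~ashr(X, Y), it can be checked exhaustively at 8 bits. C's `>>` on negative signed values is implementation-defined, so this hypothetical sketch implements the arithmetic shift on unsigned bit patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arithmetic shift right of an i8 bit pattern by s (0..7). */
static uint8_t ashr_i8(uint8_t x, unsigned s) {
    uint8_t shifted = (uint8_t)(x >> s);
    if (s != 0 && (x & 0x80u))
        shifted |= (uint8_t)(0xFFu << (8 - s)); /* replicate the sign bit */
    return shifted;
}

/* ashr(~X, Y) == ~ashr(X, Y) for all i8 X and in-range shift amounts. */
static bool check_ashr_not_hoist_i8(void) {
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned s = 0; s < 8; ++s)
            if (ashr_i8((uint8_t)~x, s) != (uint8_t)~ashr_i8((uint8_t)x, s))
                return false;
    return true;
}
```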
2021-02-02 17:56:43 +03:00
Sanjay Patel 0ce2920f17 [InstCombine] try to narrow min/max intrinsics with constant operand
The constant trunc/ext may not be the optimal pre-condition,
but I think that handles the common cases.

Example of Alive2 proof:
https://alive2.llvm.org/ce/z/sREeLC

This is another step towards canonicalizing to the intrinsics.
Narrowing was identified as source of potential regression for
abs(), so we need to handle this for min/max - see:
https://llvm.org/PR48816

If this is not enough, we could process intrinsics in
the trunc-driven matching in canEvaluateTruncated().
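A sketch of the precondition for one case, assuming the pattern is smax(sext X, C) with an i32 constant C: the narrowed form sext(smax(X, trunc C)) is only equivalent when C survives the trunc-then-sext round trip, i.e. C fits in i8. The exhaustive C check below covers exactly those constants; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int32_t smax32(int32_t a, int32_t b) { return a > b ? a : b; }
static int8_t smax8(int8_t a, int8_t b) { return a > b ? a : b; }

/* smax(sext X, C) == sext(smax(X, trunc C)) for all i8 X and all
   constants C in [-128, 127]. */
static bool check_narrow_smax_const(void) {
    for (int32_t c = INT8_MIN; c <= INT8_MAX; ++c)
        for (int32_t x = INT8_MIN; x <= INT8_MAX; ++x) {
            int32_t wide = smax32(x, c);
            int32_t narrow = (int32_t)smax8((int8_t)x, (int8_t)c);
            if (wide != narrow)
                return false;
        }
    return true;
}
```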
2021-02-01 13:44:13 -05:00
Valery N Dmitriev 716b9dd0d8 [InstCombine] Preserve FMF for powi simplifications.
Differential Revision: https://reviews.llvm.org/D95455
2021-01-26 13:26:06 -08:00
Sanjay Patel 09a136bcc6 [InstCombine] narrow min/max intrinsics with extended inputs
We can sink extends after min/max if they match and would
not change the sign-interpreted compare. The only combo
that doesn't work is zext+smin/smax because the zexts
could change a negative number into positive:
https://alive2.llvm.org/ce/z/D6sz6J

Sext+umax/umin works:

  define i32 @src(i8 %x, i8 %y) {
  %0:
    %sx = sext i8 %x to i32
    %sy = sext i8 %y to i32
    %m = umax i32 %sx, %sy
    ret i32 %m
  }
  =>
  define i32 @tgt(i8 %x, i8 %y) {
  %0:
    %m = umax i8 %x, %y
    %r = sext i8 %m to i32
    ret i32 %r
  }
  Transformation seems to be correct!
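The same sext+umax equivalence, checked exhaustively over all i8 pairs in C. Sign extension preserves the unsigned order of i8 values (0..127 stay small, 128..255 map to the top of the i32 range), which is why the narrowing is sound. The helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* sext i8 -> i32, returned as the unsigned i32 bit pattern. */
static uint32_t sext8(uint8_t v) { return (v & 0x80u) ? (0xFFFFFF00u | v) : v; }

static uint32_t umax32(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint8_t umax8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* umax(sext X, sext Y) == sext(umax(X, Y)) for all i8 X, Y. */
static bool check_umax_sext_i8(void) {
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
            if (umax32(sext8((uint8_t)x), sext8((uint8_t)y)) !=
                sext8(umax8((uint8_t)x, (uint8_t)y)))
                return false;
    return true;
}
```

By contrast, X = 0x80 and Y = 0 give smax(zext X, zext Y) = 128 but zext(smax(X, Y)) = 0, which is the zext+smin/smax combination ruled out above.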
2021-01-25 07:52:50 -05:00
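The same argument can be replayed outside LLVM. This C sketch extends 8-bit inputs to 32 bits and does signed comparisons on bit patterns by flipping the sign bit. It counts mismatches for the sext+umax combo, where there should be none, and for zext+smax, where there should be some because zext turns 0x80 (-128) into +128. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t sext8(uint8_t v) { return (v & 0x80u) ? (0xFFFFFF00u | v) : v; }
static uint32_t zext8(uint8_t v) { return v; }

static uint32_t umax32(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint8_t umax8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Signed max on bit patterns: flipping the sign bit maps signed order
   onto unsigned order. */
static uint32_t smax32(uint32_t a, uint32_t b) {
    return (a ^ 0x80000000u) > (b ^ 0x80000000u) ? a : b;
}
static uint8_t smax8(uint8_t a, uint8_t b) {
    return (uint8_t)(a ^ 0x80u) > (uint8_t)(b ^ 0x80u) ? a : b;
}

/* Pairs where umax(sext x, sext y) != sext(umax(x, y)). */
static unsigned sext_umax_mismatches(void) {
    unsigned n = 0;
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
            n += umax32(sext8((uint8_t)x), sext8((uint8_t)y)) !=
                 sext8(umax8((uint8_t)x, (uint8_t)y));
    return n;
}

/* Pairs where smax(zext x, zext y) != zext(smax(x, y)). */
static unsigned zext_smax_mismatches(void) {
    unsigned n = 0;
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
            n += smax32(zext8((uint8_t)x), zext8((uint8_t)y)) !=
                 zext8(smax8((uint8_t)x, (uint8_t)y));
    return n;
}
```

sext preserves unsigned order, because negative i8 values stay above non-negative ones after extension. zext does not preserve signed order.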
Jeroen Dobbelaere dcc7706fcf [InstCombine] Remove unused llvm.experimental.noalias.scope.decl
A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope.
When at least one of the two is missing, the intrinsic call can be removed as well.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95141
2021-01-24 13:55:50 +01:00
Florian Hahn d60b74c28a
[InstCombine] Set MadeIRChange in replaceInstUsesWith.
Some utilities used by InstCombine, like SimplifyLibCalls, may add new
instructions and replace the uses of a call, but return nullptr because
the inserted call produces multiple results.

Previously, the replaced library calls would get removed by
InstCombine's deleter, but after
292077072e this may not happen, if the
willreturn attribute is missing.

As a work-around, update replaceInstUsesWith to set MadeIRChange, if it
replaces any uses. This catches the cases where it is used as a replacer
by utilities used by InstCombine and seems useful in general; updating
uses will modify the IR.

This fixes an expensive-check failure when replacing
@__sinpif/@__cospifi with @__sincospif_sret.
2021-01-23 17:52:59 +00:00
Sanjay Patel 411c144e4c [InstCombine] narrow abs with sign-extended input
In the motivating cases from https://llvm.org/PR48816 ,
we have a trailing trunc. But that is not required to
reduce the abs width:
https://alive2.llvm.org/ce/z/ECaz-p
...as long as we clear the int-min-is-poison bit (nsw).

We have some existing tests that are affected, and I'm
not sure what the overall implications are, but in general
we favor narrowing operations over preserving nsw/nuw.

If that causes problems, we could restrict this transform
based on type (shouldChangeType() and/or vector vs. scalar).

Differential Revision: https://reviews.llvm.org/D95235
2021-01-22 13:36:04 -05:00
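The key case is x = -128. The narrow abs wraps back to 0x80, and zero-extending that gives +128, which is exactly abs(-128) in i32. That only works if the narrow abs may not treat INT_MIN as poison, which is why nsw is cleared. Below is a C sketch assuming the narrowed form is zext(abs(x)) with that flag cleared. The helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t sext8to32(uint8_t v) { return (v & 0x80u) ? (0xFFFFFF00u | v) : v; }

/* abs on bit patterns with wrapping semantics (int-min-is-poison cleared):
   abs(INT_MIN) returns INT_MIN instead of poison. */
static uint32_t abs32(uint32_t v) { return (v & 0x80000000u) ? 0u - v : v; }
static uint8_t abs8(uint8_t v) { return (v & 0x80u) ? (uint8_t)(0u - v) : v; }

/* abs(sext x) computed in i32 == zext(abs(x)) computed in i8, for all i8 x. */
static bool abs_narrows(void) {
    for (unsigned x = 0; x < 256; ++x)
        if (abs32(sext8to32((uint8_t)x)) != (uint32_t)abs8((uint8_t)x))
            return false;
    return true;
}
```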
Roman Lebedev d1a6f92fd5
[InstCombine] Fold `(~x) | y` --> `~(x & (~y))` iff it is free to do so
Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
thus decreasing instruction count.

Note that we could position this transformation as just hoisting
of the `not` (still, iff y is freely invertible), but the test changes
show a number of regressions, so let's not do that.
2021-01-22 17:23:54 +03:00
Roman Lebedev 79b0d21ce9
[InstCombine] Fold `(~x) & y` --> `~(x | (~y))` iff it is free to do so
Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
thus decreasing instruction count.
2021-01-22 17:23:54 +03:00
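Both of the preceding folds, `(~x) | y` and `(~x) & y`, are De Morgan rewrites. A minimal C sketch checks them exhaustively on 8-bit values. It also shows the case that makes them profitable: when y is itself an inversion (y = ~z), the rewritten form needs one `not` instead of two. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* (~x) | y  ==  ~(x & ~y)   and   (~x) & y  ==  ~(x | ~y) */
static bool de_morgan_folds_hold(void) {
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y) {
            if ((uint8_t)(~x | y) != (uint8_t)~(x & ~y)) return false;
            if ((uint8_t)(~x & y) != (uint8_t)~(x | ~y)) return false;
        }
    return true;
}

/* Profitable case: y = ~z, so ~y == z and the inner inversion vanishes.
   (~x) | (~z) becomes ~(x & z): two `not`s before, one after. */
static uint8_t or_of_nots_rewritten(uint8_t x, uint8_t z) {
    return (uint8_t)~(x & z);
}
```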
Roman Lebedev 4ed0d8f2f0
[NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate()
I'd like to use it in an upcoming fold.
2021-01-22 17:23:53 +03:00
Kazu Hirata e53472de68 [Transforms] Use llvm::append_range (NFC) 2021-01-20 21:35:54 -08:00
Kazu Hirata 8f5da41c4d [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-20 21:35:52 -08:00
Nikita Popov 21443381c0 Reapply [InstCombine] Replace one-use select operand based on condition
Relative to the original change, this adds a check that the
instruction on which we're replacing operands is safe to speculatively
execute, because that's what we're effectively doing. We're executing
the instruction with the replaced operand, which is fine if it's pure,
but not fine if it can cause side effects or UB (i.e. it is not speculatable).

Additionally, we cannot (generally) replace operands in phi nodes,
as these may refer to a different loop iteration. This is also covered
by the speculation check.

-----

InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.

This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.

Among other things, this also subsumes D94860.

Differential Revision: https://reviews.llvm.org/D94862
2021-01-19 20:26:38 +01:00
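The select-operand replacement described above can be illustrated in C with a hypothetical pure function `f` and C = 7, both chosen only for illustration. C's `?:` evaluates only the selected arm, so this sketch cannot show the speculation hazard from the reapply note. In IR, both select operands are computed before the select, which is why the rewritten instruction must be safe to speculate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for f: any pure, speculatable computation. */
static int32_t f(int32_t v) { return v * 3 + 1; }

/* Before: X == C ? f(X) : Z, with C = 7. */
static int32_t before(int32_t x, int32_t z) { return x == 7 ? f(x) : z; }

/* After: in the true arm X is known to be 7, so its use inside f can be
   replaced by the constant, and f(7) then folds to 22. */
static int32_t after(int32_t x, int32_t z) { return x == 7 ? f(7) : z; }

static bool replacement_is_sound(void) {
    for (int32_t x = -1000; x <= 1000; ++x)
        for (int32_t z = -3; z <= 3; ++z)
            if (before(x, z) != after(x, z)) return false;
    return true;
}
```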
Hans Wennborg 58bdfcfac0 Revert 5238e7b302 "[InstCombine] Replace one-use select operand based on condition"
This caused a miscompile in Chromium, see comments on the codereview for
discussion and pointer to a reproducer.

> InstCombine already performs a fold where X == Y ? f(X) : Z is
> transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
> if f(X) only has one use, then we can always directly replace the
> use inside the instruction. To actually be profitable, limit it to
> the case where Y is a non-expr constant.
>
> This could be further extended to replace uses further up a one-use
> instruction chain, but for now this only looks one level up.
>
> Among other things, this also subsumes D94860.
>
> Differential Revision: https://reviews.llvm.org/D94862

This also reverts the follow-up
a003f26539cf4db744655e76c41f4c4a8913f116:

> [llvm] Prevent infinite loop in InstCombine of select statements
>
> This fixes an issue where the RHS and LHS the comparison operation
> creating the predicate were swapped back and forth forever.
>
> Differential Revision: https://reviews.llvm.org/D94934
2021-01-19 11:50:56 +01:00
Tres Popp a003f26539 [llvm] Prevent infinite loop in InstCombine of select statements
This fixes an issue where the RHS and LHS of the comparison operation
creating the predicate were swapped back and forth forever.

Differential Revision: https://reviews.llvm.org/D94934
2021-01-19 10:31:48 +01:00
Juneyoung Lee 2d89ebd5d1 Address unused variable warning 2021-01-19 09:30:16 +09:00
Juneyoung Lee 0441df94ad [InstCombine,InstSimplify] Optimize select followed by and/or/xor
This patch adds `A & (A && B)` -> `A && B`  (similarly for or + logical or)

Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`.

Alive2 proof:
merge_and: https://alive2.llvm.org/ce/z/teMR97
merge_or: https://alive2.llvm.org/ce/z/b4yZUp
xor_and: https://alive2.llvm.org/ce/z/_-TXHi
xor_or: https://alive2.llvm.org/ce/z/2uYx_a

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94861
2021-01-19 09:14:17 +09:00
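Both folds can be checked as truth tables. In this C sketch, logical and/or are modeled the way the message writes them, as selects (`a ? b : false`, `a ? true : b`), and the icmp is `x < y` with its inverse predicate `x >= y`. C booleans have no poison, so this checks only the value-level identities, not the poison reasoning behind the Alive2 proofs. The helper names are hypothetical.

```c
#include <stdbool.h>

/* Logical and/or in select form. */
static bool land(bool a, bool b) { return a ? b : false; }
static bool lor(bool a, bool b) { return a ? true : b; }

/* A & (A && B) == A && B   and   A | (A || B) == A || B */
static bool merge_folds_hold(void) {
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b) {
            if ((a & land(a, b)) != land(a, b)) return false;
            if ((a | lor(a, b)) != lor(a, b)) return false;
        }
    return true;
}

/* ~(select c, (x < y), k)  ==  select c, (x >= y), ~k */
static bool xor_select_fold_holds(void) {
    for (int c = 0; c < 2; ++c)
        for (int k = 0; k < 2; ++k)
            for (int x = -2; x <= 2; ++x)
                for (int y = -2; y <= 2; ++y) {
                    bool lhs = !(c ? (x < y) : (bool)k);
                    bool rhs = c ? (x >= y) : !k;
                    if (lhs != rhs) return false;
                }
    return true;
}
```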
Dávid Bolvanský ed396212da [InstCombine] Transform abs pattern using multiplication to abs intrinsic (PR45691)
```
unsigned r(int v)
{
    return (1 | -(v < 0)) * v;
}
```

`r` is equivalent to `abs(v)`.

```
define <4 x i8> @src(<4 x i8> %0) {
%1:
  %2 = ashr <4 x i8> %0, { 31, undef, 31, 31 }
  %3 = or <4 x i8> %2, { 1, 1, 1, undef }
  %4 = mul nsw <4 x i8> %3, %0
  ret <4 x i8> %4
}
=>
define <4 x i8> @tgt(<4 x i8> %0) {
%1:
  %2 = icmp slt <4 x i8> %0, { 0, 0, 0, 0 }
  %3 = sub nsw <4 x i8> { 0, 0, 0, 0 }, %0
  %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %0
  ret <4 x i8> %4
}
Transformation seems to be correct!
```

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94874
2021-01-17 17:06:14 +01:00
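The C source pattern from this entry can be compared directly with `abs`. Like the `mul nsw` in the IR, the multiplication is undefined for `v == INT_MIN` (and so is `abs(INT_MIN)`), so the sketch only tests values away from it. `r_abs` and `mul_pattern_is_abs` are hypothetical helpers.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Source pattern: multiply v by +1 (v >= 0) or -1 (v < 0).
   UB for v == INT_MIN, mirroring the `mul nsw` in the IR. */
static unsigned r(int v) { return (1 | -(v < 0)) * v; }

/* What the fold produces: a plain absolute value. */
static unsigned r_abs(int v) { return (unsigned)abs(v); }

static bool mul_pattern_is_abs(int lo, int hi) {
    for (int v = lo; v <= hi; ++v)
        if (r(v) != r_abs(v)) return false;
    return true;
}
```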
Nikita Popov 5238e7b302 [InstCombine] Replace one-use select operand based on condition
InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.

This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.

Among other things, this also subsumes D94860.

Differential Revision: https://reviews.llvm.org/D94862
2021-01-16 23:25:02 +01:00
Nikita Popov 17863614da [InstCombine] Fold select -> and/or using impliesPoison
We can fold a ? b : false to a & b if is_poison(b) implies that
is_poison(a), at which point we're able to reuse all the usual folds
on ands. In particular, this covers the very common case of
icmp X, C && icmp X, C'. The same applies to ors.

This currently only has an effect if the
-instcombine-unsafe-select-transform=0 option is set.

Differential Revision: https://reviews.llvm.org/D94550
2021-01-13 17:45:40 +01:00
Luo, Yuanke 055644cc45 [X86][AMX] Prohibit pointer cast on load.
The load/store instructions will be transformed to AMX intrinsics in the
AMX type lowering pass. Prohibiting the pointer cast makes that pass
happy.

Differential Revision: https://reviews.llvm.org/D94372
2021-01-13 09:39:19 +08:00
Nikita Popov 23390e7a13 [InstCombine] Handle logical and/or in assume optimization
assume(a && b) can be converted to assume(a); assume(b) even if
the condition is logical. Same for assume(!(a || b)).
2021-01-12 22:36:40 +01:00
Dávid Bolvanský 0529946b5b [InstCombine] Add (A ^ B) | ~(A | B) -> ~(A & B)
define i32 @src(i32 %x, i32 %y) {
%0:
  %xor = xor i32 %y, %x
  %or = or i32 %y, %x
  %neg = xor i32 %or, 4294967295
  %or1 = or i32 %xor, %neg
  ret i32 %or1
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %and = and i32 %x, %y
  %neg = xor i32 %and, 4294967295
  ret i32 %neg
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/Cvca4a
2021-01-12 19:29:17 +01:00
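The Alive2 proof above is for i32. Each bit position is computed independently, so a C check over every pair of i8 values exercises the same per-bit truth table. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* (A ^ B) | ~(A | B)  ==  ~(A & B), checked for every pair of i8 values. */
static bool xor_or_not_fold_holds(void) {
    for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b) {
            uint8_t src = (uint8_t)((a ^ b) | ~(a | b));
            uint8_t tgt = (uint8_t)~(a & b);
            if (src != tgt) return false;
        }
    return true;
}
```

Per bit: the source is 1 unless both inputs are 1, which is exactly `~(A & B)`.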
Sanjay Patel 288f3fc5df [InstCombine] reduce icmp(ashr X, C1), C2 to sign-bit test
This is a more basic pattern that we should handle before trying to solve:
https://llvm.org/PR48640

There might be a better way to think about this because the pre-condition
that I came up with (number of sign bits in the compare constant) misses a
potential transform for each of ugt and ult as commented on in the test file.

Tried to model this in Alive:
https://rise4fun.com/Alive/juX1
...but I couldn't get the ComputeNumSignBits() pre-condition to work as
expected, so replaced with leading 0/1 preconditions instead.

  Name: ugt
  Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
  %a = ashr %x, C1
  %r = icmp ugt i8 %a, C2
    =>
  %r = icmp slt i8 %x, 0

  Name: ult
  Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
  %a = ashr %x, C1
  %r = icmp ult i4 %a, C2
    =>
  %r = icmp sgt i4 %x, -1

Also approximated in Alive2:
https://alive2.llvm.org/ce/z/u5hCcz
https://alive2.llvm.org/ce/z/__szVL

Differential Revision: https://reviews.llvm.org/D94014
2021-01-11 15:53:39 -05:00
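The two Alive patterns above can also be checked exhaustively in i8, including the leading-zeros/leading-ones pre-condition. In this C sketch, `ashr8`, `clz8` and `clo8` are hypothetical portable helpers. The `ult` case is checked at i8 here, while the Alive listing uses i4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable 8-bit arithmetic shift right (0 <= s < 8). */
static uint8_t ashr8(uint8_t x, unsigned s) {
    uint8_t r = (uint8_t)(x >> s);
    if ((x & 0x80u) && s > 0)
        r |= (uint8_t)(0xFFu << (8 - s));
    return r;
}

static unsigned clz8(uint8_t v) {                  /* count leading zeros */
    unsigned n = 0;
    for (int b = 7; b >= 0 && !((v >> b) & 1u); --b)
        ++n;
    return n;
}
static unsigned clo8(uint8_t v) { return clz8((uint8_t)~v); }   /* leading ones */

/* Under Pre: clz(C2) <= C1 && clo(C2) <= C1,
     (ashr x, C1) u> C2  ==  x s< 0
     (ashr x, C1) u< C2  ==  x s> -1 */
static bool sign_bit_tests_hold(void) {
    for (unsigned c1 = 0; c1 < 8; ++c1)
        for (unsigned c2 = 0; c2 < 256; ++c2) {
            if (clz8((uint8_t)c2) > c1 || clo8((uint8_t)c2) > c1)
                continue;                          /* pre-condition not met */
            for (unsigned x = 0; x < 256; ++x) {
                uint8_t a = ashr8((uint8_t)x, c1);
                bool negative = (x & 0x80u) != 0;
                if ((a > c2) != negative) return false;
                if ((a < c2) != !negative) return false;
            }
        }
    return true;
}
```

The pre-condition keeps C2 strictly between the largest non-negative result of the ashr and the smallest negative one. Any unsigned compare against C2 therefore only sees the sign.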
Florian Hahn c701f85c45
[STLExtras] Use return type from operator* of the wrapped iter.
Currently make_early_inc_range cannot be used with iterators with
operator* implementations that do not return a reference.

Most notably in the LLVM codebase, this means the User iterator ranges
cannot be used with make_early_inc_range, which would slightly simplify
iterating over ranges while elements are removed.

Instead of directly using BaseT::reference as return type of operator*,
this patch uses decltype to get the actual return type of the operator*
implementation in WrappedIteratorT.

This patch also updates a few places to make use of
make_early_inc_range.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D93992
2021-01-10 14:41:13 +00:00
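make_early_inc_range is C++, but the idiom it packages does not depend on the language: read the successor before touching the current element, so removing that element cannot invalidate the loop. Here is a minimal C analogue on a linked list. All names are hypothetical, and this is not the STLExtras API.

```c
#include <stdbool.h>
#include <stddef.h>

struct node { int value; struct node *next; };

/* Unlink every even-valued node. The successor is read *before* the current
   node is unlinked (the "early increment"), so the cursor stays valid. */
static struct node *remove_even(struct node *head) {
    struct node **link = &head;
    struct node *next;
    for (struct node *cur = head; cur != NULL; cur = next) {
        next = cur->next;              /* advance first */
        if (cur->value % 2 == 0)
            *link = next;              /* unlink cur */
        else
            link = &cur->next;
    }
    return head;
}

/* Build 1 -> 2 -> ... -> 6, remove the even nodes, expect 1 -> 3 -> 5. */
static bool remove_even_demo(void) {
    struct node n[6];
    for (int i = 0; i < 6; ++i) {
        n[i].value = i + 1;
        n[i].next = (i < 5) ? &n[i + 1] : NULL;
    }
    struct node *h = remove_even(&n[0]);
    int expected[] = {1, 3, 5};
    for (int i = 0; i < 3; ++i, h = h->next)
        if (h == NULL || h->value != expected[i]) return false;
    return h == NULL;
}
```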
Kazu Hirata 33bf1cad75 [llvm] Use *Set::contains (NFC) 2021-01-07 20:29:34 -08:00
Juneyoung Lee 29f8628d1f [Constant] Add containsPoisonElement
This patch

- Adds containsPoisonElement that checks existence of poison in constant vector elements,
- Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly

With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved.

Thanks!

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94053
2021-01-06 12:10:33 +09:00
Simon Pilgrim 313d982df6 [IR] Add ConstantInt::getBool helpers to wrap getTrue/getFalse. 2021-01-05 11:01:10 +00:00
Kazu Hirata 530c5af6a4 [Transforms] Construct SmallVector with iterator ranges (NFC) 2021-01-02 09:24:17 -08:00
Dávid Bolvanský ae69fa9b9f [InstCombine] Transform (A + B) - (A & B) to A | B (PR48604)
define i32 @src(i32 %x, i32 %y) {
%0:
  %a = add i32 %x, %y
  %o = and i32 %x, %y
  %r = sub i32 %a, %o
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %b = or i32 %x, %y
  ret i32 %b
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/2fhW6r
2020-12-31 15:04:32 +01:00
Dávid Bolvanský 742ea77ca4 [InstCombine] Transform (A + B) - (A | B) to A & B (PR48604)
define i32 @src(i32 %x, i32 %y) {
%0:
  %a = add i32 %x, %y
  %o = or i32 %x, %y
  %r = sub i32 %a, %o
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %b = and i32 %x, %y
  ret i32 %b
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/aQRh2j
2020-12-31 14:03:20 +01:00
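Both PR48604 folds, `(A + B) - (A | B) -> A & B` and the `A | B` variant listed just above, follow from one identity: A + B == (A | B) + (A & B) in wrapping arithmetic. A C sketch checks both over all i8 pairs. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A + B == (A | B) + (A & B): bits set in both inputs are counted twice by
   the sum, once by A | B and once more by A & B. */
static bool pr48604_folds_hold(void) {
    for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b) {
            uint8_t sum = (uint8_t)(a + b);
            if ((uint8_t)(sum - (a & b)) != (uint8_t)(a | b)) return false;
            if ((uint8_t)(sum - (a | b)) != (uint8_t)(a & b)) return false;
        }
    return true;
}
```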
Juneyoung Lee 9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Luo, Yuanke 981a0bd858 [X86] Add x86_amx type for intel AMX.
The x86_amx type is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instructions. So AMX intrinsics only operate on type x86_amx.
It can help to separate AMX intrinsics from LLVM IR instructions (+-*/).
Thanks to Craig for the idea. This patch depends on https://reviews.llvm.org/D87981.

Differential Revision: https://reviews.llvm.org/D91927
2020-12-30 13:52:13 +08:00
Roman Lebedev 374ef57f13
[InstCombine] 'hoist xor-by-constant from xor-by-value': completely give up on constant exprs
As Mikael Holmén is noting in the post-commit review for the first fix
https://reviews.llvm.org/rGd4ccef38d0bb#967466
not hoisting constantexprs is not enough,
because if the xor originally was a constantexpr (i.e. X is a constantexpr),
`SimplifyAssociativeOrCommutative()` in `visitXor()` will immediately
undo this transform, thus again causing an infinite combine loop.

This transform has resulted in a surprising number of constantexpr failures.
2020-12-29 16:28:18 +03:00
Nikita Popov 4a16c507cb [InstCombine] Disable unsafe select transform behind a flag
This disables the poison-unsafe select -> and/or transform behind
a flag (we continue to perform the fold by default). This is intended
to simplify evaluation and testing while we teach various passes
to directly recognize the select pattern.

This only disables the main select -> and/or transform. A number of
related ones are instead changed to canonicalize to the a ? b : false
and a ? true : b forms which represent and/or respectively. This
requires a bit of care to avoid infinite loops, as we do not want
!a ? b : false to be converted into a ? false : b.

The basic idea here is the same as D93065, but keeps the change
behind a flag for now.

Differential Revision: https://reviews.llvm.org/D93840
2020-12-28 22:43:52 +01:00
Roman Lebedev d4ccef38d0
[InstCombine] 'hoist xor-by-constant from xor-by-value': ignore constantexprs
As it is being reported (in post-commit review) in
https://reviews.llvm.org/D93857
this fold (as i expected, but failed to come up with test coverage
despite trying) has issues with constant expressions.
Since we only care about true constants, which constantexprs are not,
don't perform such hoisting for constant expressions.
2020-12-28 20:15:20 +03:00
Juneyoung Lee 9d70dbdc2b [InstCombine] use poison as placeholder for undemanded elems
Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement.
However, this is problematic because undef isn’t undefined enough.
Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison.
This makes a few straightforward optimizations incorrect, such as:

```
;  https://bugs.llvm.org/show_bug.cgi?id=44185

define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
  %xv = insertelement <4 x float> %q, float %x, i32 2
  %r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef }
  ret <4 x float> %r ; %r[3] is undef
}
=>
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
  %r = insertelement <4 x float> %y, float %x, i32 1
  ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison
}

Transformation doesn't verify!
ERROR: Target is more poisonous than source
```

I’d like to suggest
1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too)
2. Updating shufflevector’s semantics to return poison element if mask is undef

Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay.
m_Undef() matches PoisonValue as well, so existing optimizations will still fire.

The only concern is hidden miscompilations that may surface once a poison constant is used.
A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :(

Instead, I’ll simply locally maintain the tests and run Alive2.
If there is any bug found, I’ll report it.

Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93586
2020-12-28 08:58:15 +09:00
Kazu Hirata 789d250613 [CodeGen, Transforms] Use *Map::lookup (NFC) 2020-12-27 09:57:27 -08:00
Roman Lebedev d9ebaeeb46
[InstCombine] Hoist xor-by-constant from xor-by-value
This is one of the deficiencies that can be observed in
https://godbolt.org/z/YPczsG after D91038 patch set.

This exposed two missing folds, one was fixed by the previous commit,
another one is `(A ^ B) | ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`.

`-early-cse` will catch it: https://godbolt.org/z/4n1T1v,
but it isn't meaningful to fix it in InstCombine,
because we'd need to essentially do our own CSE,
and we can't even rely on `Instruction::isIdenticalTo()`,
because there are no guarantees that the order of operands matches.
So let's just accept it as a loss.
2020-12-24 21:20:50 +03:00
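This C sketch assumes the hoist has the shape (X ^ C) ^ Y -> (X ^ Y) ^ C, so the xor-by-constant ends up outermost where other folds can see it. It also checks the two missed folds mentioned above, which are left to EarlyCSE. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* (x ^ C) ^ y  ==  (x ^ y) ^ C  (xor is associative and commutative). */
static bool xor_hoist_holds(void) {
    for (unsigned c = 0; c < 256; ++c)
        for (unsigned x = 0; x < 256; ++x)
            for (unsigned y = 0; y < 256; ++y)
                if (((x ^ c) ^ y) != ((x ^ y) ^ c)) return false;
    return true;
}

/* (A ^ B) | ~(A ^ B) == -1   and   (A ^ B) & ~(A ^ B) == 0 */
static bool missed_folds_hold(void) {
    for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b) {
            uint8_t t = (uint8_t)(a ^ b);
            if ((uint8_t)(t | (uint8_t)~t) != 0xFFu) return false;
            if ((uint8_t)(t & (uint8_t)~t) != 0x00u) return false;
        }
    return true;
}
```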
Roman Lebedev 5b78303433
[InstCombine] Fold `a & ~(a ^ b)` to `a & b`
```
----------------------------------------
define i32 @and_xor_not_common_op(i32 %a, i32 %b) {
%0:
  %b2 = xor i32 %b, 4294967295
  %t2 = xor i32 %a, %b2
  %t4 = and i32 %t2, %a
  ret i32 %t4
}
=>
define i32 @and_xor_not_common_op(i32 %a, i32 %b) {
%0:
  %t4 = and i32 %a, %b
  ret i32 %t4
}
Transformation seems to be correct!
```
2020-12-24 21:20:49 +03:00
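Per bit, `~(a ^ b)` equals `b` wherever `a` is 1, so masking with `a` leaves `a & b`. A minimal C check over all i8 pairs follows. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* a & ~(a ^ b)  ==  a & b */
static bool and_xnor_fold_holds(void) {
    for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b)
            if ((uint8_t)(a & ~(a ^ b)) != (uint8_t)(a & b)) return false;
    return true;
}
```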
Roman Lebedev b3021a72a6
[IR][InstCombine] Add m_ImmConstant(), that matches on non-ConstantExpr constants, and use it
A pattern to ignore ConstantExpr's is quite common, since they frequently
lead into infinite combine loops, so let's make writing it easier.
2020-12-24 21:20:47 +03:00
Simon Pilgrim 89abe1cf83 [InstCombine] foldICmpUsingKnownBits - use KnownBits signed/unsigned getMin/MaxValue helpers. NFCI.
Replace the local compute*SignedMinMaxValuesFromKnownBits methods with the equivalent KnownBits helpers to determine the min/max value ranges.
2020-12-24 14:22:26 +00:00
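This C sketch shows what we understand the KnownBits min/max helpers (getMinValue, getMaxValue, getSignedMinValue, getSignedMaxValue) to compute for an 8-bit value. Unsigned extremes set every unknown bit to 0 or 1. The signed extremes do the same except that an unknown sign bit is set for the minimum and cleared for the maximum. The struct and function names are hypothetical, not LLVM API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits set in `zero` are known 0, bits set in `one` are known 1. */
struct known8 { uint8_t zero, one; };

static uint8_t umin_of(struct known8 k) { return k.one; }             /* unknowns -> 0 */
static uint8_t umax_of(struct known8 k) { return (uint8_t)~k.zero; }  /* unknowns -> 1 */

/* Unknown sign bit: set for the minimum, cleared for the maximum. */
static uint8_t smin_of(struct known8 k) {
    return (k.zero & 0x80u) ? k.one : (uint8_t)(k.one | 0x80u);
}
static uint8_t smax_of(struct known8 k) {
    return (k.one & 0x80u) ? (uint8_t)~k.zero : (uint8_t)(~k.zero & 0x7Fu);
}

/* Signed order on 8-bit patterns: flip the sign bit, compare unsigned. */
static bool sle(uint8_t a, uint8_t b) { return (uint8_t)(a ^ 0x80u) <= (uint8_t)(b ^ 0x80u); }

static bool consistent(struct known8 k, uint8_t v) {
    return (v & k.zero) == 0 && (v & k.one) == k.one;
}

/* Every value consistent with k lies within the bounds, and the bounds are
   themselves consistent with k (so they are tight). */
static bool bounds_are_tight(void) {
    for (unsigned z = 0; z < 256; ++z)
        for (unsigned o = 0; o < 256; ++o) {
            if (z & o) continue;                   /* conflicting facts */
            struct known8 k = { (uint8_t)z, (uint8_t)o };
            if (!consistent(k, umin_of(k)) || !consistent(k, umax_of(k)) ||
                !consistent(k, smin_of(k)) || !consistent(k, smax_of(k)))
                return false;
            for (unsigned v = 0; v < 256; ++v) {
                if (!consistent(k, (uint8_t)v)) continue;
                if (v < umin_of(k) || v > umax_of(k)) return false;
                if (!sle(smin_of(k), (uint8_t)v) || !sle((uint8_t)v, smax_of(k)))
                    return false;
            }
        }
    return true;
}
```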
Nikita Popov ef2f843347 Revert "[InstCombine] Check inbounds in load/store of gep null transform (PR48577)"
This reverts commit 899faa50f2.

Upon further consideration, this does not fix the right issue.
Doing this fold for non-inbounds GEPs is legal, because the
resulting pointer is still based-on null, which has no associated
address range, and as such an access to it is UB.

https://bugs.llvm.org/show_bug.cgi?id=48577#c3
2020-12-24 12:36:56 +01:00
Nikita Popov 90177912a4 Revert "[InstCombine] Fold gep inbounds of null to null"
This reverts commit eb79fd3c92.

This causes stage2 crashes, possibly due to StringMap being
miscompiled. Reverting for now.
2020-12-24 10:20:31 +01:00
Roman Lebedev f8079355c6
[InstCombine] canonicalizeAbsNabs(): don't propagate NSW flag for NABS pattern
As Nuno is noting in post-commit review in
https://reviews.llvm.org/D87188#2467915
it is not correct to keep NSW for negated abs pattern,
so don't do that.
2020-12-24 00:06:09 +03:00
Nikita Popov 759b8c11c3 [InstCombine] Handle different pointer types when folding gep of null
The source pointer type is not necessarily the same as the result
pointer type, so we can't simply return the original null pointer,
it might be a different one.
2020-12-23 21:58:26 +01:00
Nikita Popov eb79fd3c92 [InstCombine] Fold gep inbounds of null to null
Effectively, this is what we were previously already doing when
the GEP was used in conjunction with a load or store, but this
fold can also be applied more generally:

> The only in bounds address for a null pointer in the default
> address-space is the null pointer itself.
2020-12-23 21:41:53 +01:00
Nikita Popov 899faa50f2 [InstCombine] Check inbounds in load/store of gep null transform (PR48577)
If the GEP isn't inbounds, then accessing a GEP of null location
is generally not UB.

While this is a minimal fix, the GEP of null handling should
probably be its own fold.
2020-12-23 21:03:22 +01:00
Congzhe Cao c60a58f8d4 [InstCombine] Add check of i1 types in select-to-zext/sext transformation
When doing select-to-zext/sext transformations, we should
not handle TrueVal and FalseVal of i1 type otherwise it
would result in zext/sext i1 to i1.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93272
2020-12-21 18:46:24 -05:00
Roman Lebedev 897c985e1e
[InstCombine] Canonicalize SPF to abs intrinsic
This patch enables canonicalization of SPF_ABS and SPF_NABS
to the abs intrinsic.

This is a recommit, the original try was
05d4c4ebc2,
but it was reverted due to an apparent miscompile,
which since then has just been fixed by the previous commit.

Differential Revision: https://reviews.llvm.org/D87188
2020-12-18 21:18:14 +03:00
Florian Hahn 01089c876b
[InstCombine] Preserve !annotation on newly created instructions.
If the source instruction has !annotation metadata, all instructions
created during combining should also have it. Tell the builder to
add it.

The !annotation system was discussed on llvm-dev as part of
'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

This patch is based on an earlier patch by Francis Visoiu Mistrih.

Reviewed By: thegameg, lebedev.ri

Differential Revision: https://reviews.llvm.org/D91444
2020-12-17 15:20:23 +00:00
Jun Ma 0138399903 [InstCombine] Remove scalable vector restriction in InstCombineCasts
Differential Revision: https://reviews.llvm.org/D93389
2020-12-17 22:02:33 +08:00
Florian Hahn 29077ae860
[IRBuilder] Generalize debug loc handling for arbitrary metadata.
This patch extends IRBuilder to allow adding/preserving arbitrary
metadata on created instructions.

Instead of using references to specific metadata nodes (like DebugLoc),
IRBuilder now keeps a vector of (metadata kind, MDNode *) pairs, which
are added to each created instruction.

The patch itself is NFC and only moves the existing debug location
handling over to the new system. In a follow-up patch it will be used to
preserve !annotation metadata besides !dbg.

The current approach requires iterating over MetadataToCopy to avoid
adding duplicates, but given that the number of metadata kinds to
copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2
(!dbg and !annotation)) that should not matter.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D93400
2020-12-17 13:27:43 +00:00
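The design note above (a small vector of (kind, MDNode *) pairs, scanned linearly to avoid duplicates) can be modeled in C. Everything here is hypothetical: the real code is C++, and its method names are not reproduced. Nodes are opaque pointers, and a NULL node is assumed to mean "stop copying this kind".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the builder's metadata-to-copy list. */
enum { MAX_MD_KINDS = 4 };
struct md_pair { unsigned kind; const void *node; };
struct md_list { struct md_pair items[MAX_MD_KINDS]; size_t count; };

/* Add or replace the entry for `kind`; a NULL node removes it. The linear
   scan keeps at most one entry per kind and is cheap because only one or
   two kinds (e.g. !dbg and !annotation) are expected. */
static bool md_add_or_remove(struct md_list *l, unsigned kind, const void *node) {
    for (size_t i = 0; i < l->count; ++i) {
        if (l->items[i].kind != kind)
            continue;
        if (node != NULL)
            l->items[i].node = node;               /* replace */
        else
            l->items[i] = l->items[--l->count];    /* remove: move last into slot */
        return true;
    }
    if (node == NULL)
        return true;                               /* nothing to remove */
    if (l->count == MAX_MD_KINDS)
        return false;                              /* list full */
    l->items[l->count].kind = kind;
    l->items[l->count].node = node;
    l->count++;
    return true;
}

static bool md_list_demo(void) {
    int dbg1 = 0, dbg2 = 0, note = 0;              /* stand-ins for MDNodes */
    struct md_list l = { .count = 0 };
    md_add_or_remove(&l, 0, &dbg1);
    md_add_or_remove(&l, 0, &dbg2);                /* same kind: replaced */
    if (l.count != 1 || l.items[0].node != &dbg2) return false;
    md_add_or_remove(&l, 1, &note);
    md_add_or_remove(&l, 0, NULL);                 /* drop kind 0 */
    return l.count == 1 && l.items[0].kind == 1 && l.items[0].node == &note;
}
```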
Florian Hahn eba09a2db9
[InstCombine] Preserve !annotation for newly created instructions.
When replacing an instruction with !annotation with a newly created
replacement, add the !annotation metadata to the replacement.

This mostly covers cases where the new instructions are created using
the ::Create helpers. Instructions created by IRBuilder will be handled
by D91444.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D93399
2020-12-17 09:06:51 +00:00
Jun Ma 52a3267ffa [InstCombine] Remove scalable vector restriction in foldVectorBinop
Differential Revision: https://reviews.llvm.org/D93289
2020-12-15 21:14:59 +08:00
Jun Ma ffe84d90e9 [InstCombine][NFC] Change cast of FixedVectorType to dyn_cast. 2020-12-15 20:36:57 +08:00
Jun Ma e12f584578 [InstCombine] Remove scalable vector restriction in InstCombineCompares
Differential Revision: https://reviews.llvm.org/D93269
2020-12-15 20:36:57 +08:00
Jun Ma 2ac58e21a1 [InstCombine] Remove scalable vector restriction when fold SelectInst
Differential Revision: https://reviews.llvm.org/D93083
2020-12-15 20:36:57 +08:00
Reid Kleckner d2ed9d6b7e Revert "ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC"
We determined that the MSVC implementation of std::aligned* isn't suited
to our needs. It doesn't support 16 byte alignment or higher, and it
doesn't really guarantee 8 byte alignment. See
https://github.com/microsoft/STL/issues/1533

Also reverts "ADT: Change AlignedCharArrayUnion to an alias of std::aligned_union_t, NFC"

Also reverts "ADT: Remove AlignedCharArrayUnion, NFC" to bring back
AlignedCharArrayUnion.

This reverts commit 4d8bf870a8.

This reverts commit d10f9863a5.

This reverts commit 4b5dc150b9.
2020-12-14 17:04:06 -08:00
Sanjay Patel 4f051fe374 [InstCombine] avoid crash sinking to unreachable block
The test is reduced from the example in D82005.

Similar to 94f6d365e, the test here would assert in
the DomTree when we tried to convert a select to a
phi with an unreachable block operand.

We may want to add some kind of guard code in DomTree
itself to avoid this sort of problem.
2020-12-10 13:10:26 -05:00
Roman Lebedev e6f2a79d7a
[InstCombine] canonicalizeSaturatedAdd(): last fold is only valid for strict comparison (PR48390)
We could create uadd.sat under incorrect circumstances
if a select with -1 as the false value was canonicalized
by swapping the T/F values. Unlike the other transforms
in the same function, it is not invariant to equality.

Some alive proofs: https://alive2.llvm.org/ce/z/emmKKL

Based on original patch by David Green!

Fixes https://bugs.llvm.org/show_bug.cgi?id=48390

Differential Revision: https://reviews.llvm.org/D92717
2020-12-09 18:19:09 +03:00
Joe Ellis 80c33de2d3 [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics
This commit adds two new intrinsics.

- llvm.experimental.vector.insert: used to insert a vector into another
  vector starting at a given index.

- llvm.experimental.vector.extract: used to extract a subvector from a
  larger vector starting from a given index.

The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D91362
2020-12-09 11:08:41 +00:00
Kazu Hirata ddb002d7c7 [InstCombine] Remove replacePointer (NFC)
The declaration was introduced on Feb 10, 2017 in commit
ba01ed00fe without a corresponding
definition.
2020-12-06 10:24:08 -08:00
Sanjay Patel 94f6d365e4 [InstCombine] avoid crash on phi with unreachable incoming block (PR48369) 2020-12-06 09:31:47 -05:00
Duncan P. N. Exon Smith d10f9863a5 ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC
Prepare to delete `AlignedCharArrayUnion` by migrating its users over to
`std::aligned_union_t`.

I will delete `AlignedCharArrayUnion` and its tests in a follow-up
commit so that it's easier to revert in isolation in case some
downstream wants to keep using it.

Differential Revision: https://reviews.llvm.org/D92516
2020-12-04 12:34:49 -08:00
Duncan P. N. Exon Smith 5b267fb796 ADT: Stop peeking inside AlignedCharArrayUnion, NFC
Update all the users of `AlignedCharArrayUnion` to stop peeking inside
(to look at `buffer`) so that a follow-up patch can replace it with an
alias to `std::aligned_union_t`.

This was reviewed as part of https://reviews.llvm.org/D92512, but I'm
splitting this bit out to commit first to reduce churn in case the
change to `AlignedCharArrayUnion` needs to be reverted for some
unexpected reason.
2020-12-04 11:07:42 -08:00
jasonliu a65d8c5d72 [XCOFF][AIX] Generate LSDA data and compact unwind section on AIX
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences are:
1. AIX does not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
   It doesn't use the GCC personality routine, because the
   interoperability is not there yet on AIX.
3. AIX does not use eh_frame sections. Instead, it uses an eh_info
   section (compact unwind section) to store the information about
   the personality routine and LSDA data address.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D91455
2020-12-02 18:42:44 +00:00
Sanjay Patel 9f60b8b3d2 [InstCombine] canonicalize sign-bit-shift of difference to ext(icmp)
icmp is the preferred spelling in IR because icmp analysis is
expected to be better than any other analysis. This should
lead to more follow-on folding potential.

It's difficult to say exactly what we should do in codegen to
compensate. For example on AArch64, which of these is preferred:
	sub	w8, w0, w1
	lsr	w0, w8, #31

vs:
	cmp	w0, w1
	cset	w0, lt

If there are perf regressions, then we should deal with those in
codegen on a case-by-case basis.

A possible motivating example for better optimization is shown in:
https://llvm.org/PR43198 but that will require other transforms
before anything changes there.

Alive proof:
https://rise4fun.com/Alive/o4E

  Name: sign-bit splat
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = ashr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = sext %c

  Name: sign-bit LSB
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = lshr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = zext %c
2020-12-01 09:58:11 -05:00
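The two Alive patterns above depend on the `nsw` flag. This C sketch checks them at i8 width, skipping pairs where x - y would overflow, since there `sub nsw` is poison. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* For i8 x, y with no signed overflow in x - y:
     lshr(x - y, 7) == zext(x < y)   and   ashr(x - y, 7) == sext(x < y). */
static bool sign_of_difference_holds(void) {
    for (int x = -128; x <= 127; ++x)
        for (int y = -128; y <= 127; ++y) {
            int diff = x - y;                          /* exact in int */
            if (diff < -128 || diff > 127)
                continue;                              /* sub nsw would be poison */
            uint8_t s = (uint8_t)diff;                 /* i8 bit pattern */
            uint8_t lshr = (uint8_t)(s >> 7);
            uint8_t ashr = (s & 0x80u) ? 0xFFu : 0x00u;   /* ashr by 7 splats the sign */
            bool lt = x < y;
            if (lshr != (uint8_t)lt) return false;
            if (ashr != (lt ? 0xFFu : 0x00u)) return false;
        }
    return true;
}
```

Without `nsw` the rewrite would be wrong. For example, 127 - (-1) wraps to 0x80, whose sign bit is set even though 127 < -1 is false.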
Bhramar Vatsa fd679107d6
[InstCombine] Optimize away the unnecessary multi-use sign-extend
C.f. https://bugs.llvm.org/show_bug.cgi?id=47765

Added a case for handling the sign-extend (Shl+AShr) for multiple uses,
to optimize it away for an individual use,
when the demanded bits aren't affected by sign-extend.

https://rise4fun.com/Alive/lgf

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D91343
2020-12-01 16:54:00 +03:00
Roman Lebedev 94ead0190f
[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold, 2
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
2020-12-01 16:54:00 +03:00
Roman Lebedev 52533b52b8
Revert "[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold"
It seems I have missed check lines; temporarily reverting,
will reland momentarily.

This reverts commit aa1aa13509.
2020-12-01 15:47:04 +03:00
Roman Lebedev aa1aa13509
[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
2020-12-01 15:13:08 +03:00
Roman Lebedev 8e29e20e0d
[InstCombine] Evaluate new shift amount for sext(ashr(shl(trunc()))) fold in wide type (PR48343)
It is not correct to compute that new shift amount in its narrow type
and only then extend it into the wide type:

----------------------------------------
Optimization: PR48343 good
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sext %Y
  %n1 = sub width(%o0), %n0
  %n2 = sub width(%X), %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 2016
Optimization is correct!

----------------------------------------
Optimization: PR48343 bad
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sub width(%o0), %Y
  %n1 = sub width(%X), %n0
  %n2 = sext %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 1
ERROR: Domain of definedness of Target is smaller than Source's for i9 %r

Example:
%X i9 = 0x000 (0)
%Y i4 = 0x3 (3)
%o0 i4 = 0x0 (0)
%o1 i4 = 0x0 (0)
%o2 i4 = 0x0 (0)
%n0 i4 = 0x1 (1)
%n1 i4 = 0x8 (8, -8)
%n2 i9 = 0x1F8 (504, -8)
%n3 i9 = 0x000 (0)
Source value: 0x000 (0)
Target value: undef


I.e. we should be computing it in the wide type from the beginning.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48343
2020-12-01 15:13:07 +03:00
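To see why the amount must be computed in the wide type, the following Python sketch replays both Alive variants with the same widths as the counterexample (i4 narrow, i9 wide). It models poison only as "shift amount >= width". The helper names (`wrap`, `good_amount`, `bad_amount`, `source`, `target`) are hypothetical.

```python
# Replay PR48343 with width(%o0) = 4 and width(%X) = 9, as in the Alive example.
NARROW, WIDE = 4, 9


def wrap(v: int, bits: int) -> int:
    """Two's-complement wraparound to `bits` bits."""
    return v & ((1 << bits) - 1)


def to_signed(v: int, bits: int) -> int:
    v = wrap(v, bits)
    return v - (1 << bits) if v & (1 << (bits - 1)) else v


def sext(v: int, from_bits: int, to_bits: int) -> int:
    return wrap(to_signed(v, from_bits), to_bits)


def ashr(v: int, amt: int, bits: int) -> int:
    return wrap(to_signed(v, bits) >> amt, bits)


def good_amount(y: int) -> int:
    n0 = sext(y, NARROW, WIDE)        # %n0 = sext %Y
    n1 = wrap(NARROW - n0, WIDE)      # %n1 = sub width(%o0), %n0
    return wrap(WIDE - n1, WIDE)      # %n2 = sub width(%X), %n1


def bad_amount(y: int) -> int:
    n0 = wrap(NARROW - y, NARROW)     # %n0 = sub width(%o0), %Y  (in i4)
    n1 = wrap(WIDE - n0, NARROW)      # %n1 = sub width(%X), %n0  (9 wraps in i4)
    return sext(n1, NARROW, WIDE)     # %n2 = sext %n1


def source(x: int, y: int) -> int:
    o0 = wrap(x, NARROW)                             # %o0 = trunc %X
    o1 = wrap(o0 << y, NARROW)                       # %o1 = shl %o0, %Y
    return sext(ashr(o1, y, NARROW), NARROW, WIDE)   # ashr, then sext


def target(x: int, y: int) -> int:
    n2 = good_amount(y)
    return ashr(wrap(x << n2, WIDE), n2, WIDE)       # shl, then ashr by %n2


# %Y must be < width(%o0) for the source to be defined.
for y in range(NARROW):
    for x in range(1 << WIDE):
        assert source(x, y) == target(x, y)

print(hex(bad_amount(3)))  # 0x1f8: a shift amount >= 9, so the target is poison
```

Only Y = 3 goes wrong in the narrow computation. There, 9 - 1 = 8 is -8 in i4, and sign-extending it gives the out-of-range amount 0x1F8 reported by Alive.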
Sanjay Patel 678b9c5dde [InstCombine] try difference-of-shifts factorization before negator
We need to preserve wrapping flags to allow better folds.
The cases with geps may be counter-intuitive, but they appear to agree with Alive2:
https://alive2.llvm.org/ce/z/JQcqw7
We create 'nsw' ops independently of the original wrapping on the sub.
2020-11-24 13:56:30 -05:00
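The commit message does not spell out the factorization. Assuming it is the identity `(X << Z) - (Y << Z) == (X - Y) << Z`, a brute-force Python sketch over i6 confirms that it holds under plain wrapping arithmetic. The sketch deliberately ignores the `nsw`/`nuw` flags the commit is concerned with, and the helper names are hypothetical.

```python
# Exhaustive check (i6) that sub (shl X, Z), (shl Y, Z) == shl (sub X, Y), Z.
W = 6
MASK = (1 << W) - 1


def shl(v: int, z: int) -> int:
    return (v << z) & MASK


def sub(a: int, b: int) -> int:
    return (a - b) & MASK


def factorization_holds() -> bool:
    for z in range(W):  # shift amounts >= W are poison, so they are not checked
        for x in range(1 << W):
            for y in range(1 << W):
                if sub(shl(x, z), shl(y, z)) != shl(sub(x, y), z):
                    return False
    return True


print(factorization_holds())
```

Modulo 2^W, x·2^z - y·2^z equals (x - y)·2^z, which is why the wrapping form always holds. Which wrapping flags survive is the part that needs the separate Alive2 reasoning linked above.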
Sanjay Patel ab29f091eb [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps
This is a retry of 324a53205. I cautiously reverted that in 6aa3fc4
because the rules about gep math were not clear. Since then, we
have added this line to LangRef for gep inbounds:
"The successive addition of offsets (without adding the base address)
does not wrap the pointer index type in a signed sense (nsw)."

See D90708 and post-commit comments on the revert patch for more details.
2020-11-23 16:50:09 -05:00
Kazu Hirata def7cfb7ff [InstCombine] Use is_contained (NFC) 2020-11-21 15:47:11 -08:00
Roman Lebedev a91e96702a
[InstCombine] Fold `and(shl(zext(x), width(SIGNMASK) - width(%x)), SIGNMASK)` to `and(sext(%x), SIGNMASK)`
One less instruction, and the use count of the zext is reduced.
As Alive2 confirms, all of the weird combinations of undef elements in
the constants are fine, but unless the shift amount was undef for a lane,
we must sanitize an undef mask element to zero, since the sign bits
are no longer zeros.

https://rise4fun.com/Alive/d7r
```
----------------------------------------
Optimization: zz
Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2))
  %o0 = zext %x
  %o1 = shl %o0, C1
  %r = and %o1, C2
=>
  %n0 = sext %x
  %r = and %n0, C2

Done: 2016
Optimization is correct!
```
2020-11-20 00:31:27 +03:00
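The scalar part of this fold can be checked exhaustively for every pair of widths up to 16 bits. The following Python sketch assumes `C1 == width(%r) - width(%x)` and that C2 is exactly the sign mask of `%r`, per the precondition above. It does not model the undef vector lanes discussed in the message, and `sext` and `fold_holds` are hypothetical helper names.

```python
# Exhaustive check of: and (shl (zext x), C1), C2  ==  and (sext x), C2
# with C1 == width(%r) - width(%x) and C2 the sign mask of %r.


def sext(v: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide value to to_bits."""
    if v & (1 << (from_bits - 1)):
        v -= 1 << from_bits
    return v & ((1 << to_bits) - 1)


def fold_holds(wx: int, wr: int) -> bool:
    c1 = wr - wx
    c2 = 1 << (wr - 1)                              # isSignBit(C2)
    mask_r = (1 << wr) - 1
    for x in range(1 << wx):
        src = ((x << c1) & mask_r) & c2             # zext leaves the value unchanged
        tgt = sext(x, wx, wr) & c2
        if src != tgt:
            return False
    return True


print(all(fold_holds(wx, wr) for wr in range(2, 17) for wx in range(1, wr)))
```

Both sides isolate bit `width(%x) - 1` of `%x` in the sign position. The shl moves it there, and the sext copies it there.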
Kazu Hirata 43c0e4f665 [Transforms] Use llvm::is_contained (NFC) 2020-11-18 20:42:22 -08:00
Sanjay Patel 4a66a1d17a [InstCombine] allow vectors for masked-add -> xor fold
https://rise4fun.com/Alive/I4Ge

  Name: add with pow2 mask
  Pre: isPowerOf2(C2) && (C1 & C2) != 0 && (C1 & (C2-1)) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %n = and i8 %x, C2
  %r = xor i8 %n, C2
2020-11-17 13:36:08 -05:00
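The Alive precondition above can be turned into an exhaustive i8 check. The following is a minimal Python sketch that enumerates every (C1, C2) pair satisfying the precondition and every `%x`. The vector version applies the same semantics lane by lane, which this scalar model does not separately exercise. `precondition` and `check_masked_add_fold` are hypothetical names.

```python
# Exhaustive i8 check of: and (add x, C1), C2  ==  xor (and x, C2), C2
W = 8
MASK = (1 << W) - 1


def precondition(c1: int, c2: int) -> bool:
    """isPowerOf2(C2) && (C1 & C2) != 0 && (C1 & (C2-1)) == 0"""
    is_pow2 = c2 != 0 and (c2 & (c2 - 1)) == 0
    return is_pow2 and (c1 & c2) != 0 and (c1 & (c2 - 1)) == 0


def check_masked_add_fold() -> int:
    """Return the number of (C1, C2) pairs checked; assert on a counterexample."""
    pairs = 0
    for c2 in range(1, 1 << W):
        for c1 in range(1 << W):
            if not precondition(c1, c2):
                continue
            for x in range(1 << W):
                src = ((x + c1) & MASK) & c2        # %a = add %x, C1; %r = and %a, C2
                tgt = (x & c2) ^ c2                 # %n = and %x, C2; %r = xor %n, C2
                assert src == tgt, (x, c1, c2)
            pairs += 1
    return pairs


print(check_masked_add_fold(), "constant pairs checked")
```

The precondition guarantees that C1 has no bits below C2, so the addition cannot carry into the C2 bit. Because the C2 bit of C1 is set, that bit of `%x` is simply flipped, which is exactly what the and+xor form computes.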
Simon Pilgrim f7ebdec987 [InstCombine] visitAnd - remove unnecessary Value *X, *Y shadow variables. NFCI.
Fixes a number of Wshadow warnings.
2020-11-17 17:59:21 +00:00