Commit Graph

5243 Commits

Author SHA1 Message Date
Nikita Popov 8453fbf088 [ValueTracking] Compute known bits of min/max intrinsics
Implement known bits for the min/max intrinsics based on the
recently added KnownBits primitives.
2020-09-08 21:08:17 +02:00
Nikita Popov 8927c90069 [InstCombine] Add tests for known bits for min/max intrinsics (NFC)
We already have test coverage for the underlying calculation,
this just checked that the folding is wired up...
2020-09-08 21:08:17 +02:00
Nikita Popov f6b87da0c7 [InstCombine] Fold comparison of abs with int min
If the abs is poisoning, this is already folded to true/false.
For non-poisoning abs, we can convert this to a comparison with
the operand.
2020-09-08 20:23:03 +02:00
Nikita Popov 6eef387ddd [InstCombine] Test comparison of abs with int min (NFC) 2020-09-08 20:20:32 +02:00
Nikita Popov e97f3b1b43 [InstCombine] Fold abs of known negative operand
If we know that the abs operand is known negative, we can replace
it with a neg.

To avoid computing known bits twice, I've removed the fold for the
non-negative case from InstSimplify. Both the non-negative and the
negative case are handled by InstCombine now, with one known bits call.

Differential Revision: https://reviews.llvm.org/D87196
2020-09-08 20:14:35 +02:00
Sanjay Patel 4964d75d70 [InstCombine] add bitwise logic fold tests for D86395; NFC 2020-09-08 09:17:42 -04:00
Sanjay Patel 8b30067919 [InstCombine] improve fold of pointer differences
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.

This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.

This is part of improving flags propagation noticed
with PR47430.
2020-09-07 15:54:32 -04:00
Sanjay Patel 70207816e3 [InstCombine] add ptr difference tests; NFC 2020-09-07 15:54:32 -04:00
Sanjay Patel 7a6d6f0f70 [InstCombine] improve folds for icmp with multiply operands (PR47432)
Check for no overflow along with an odd constant before
we lose information by converting to bitwise logic.

https://rise4fun.com/Alive/2Xl

  Pre: C1 != 0
  %mx = mul nsw i8 %x, C1
  %my = mul nsw i8 %y, C1
  %r = icmp eq i8 %mx, %my
  =>
  %r = icmp eq i8 %x, %y

  Name: nuw ne
  Pre: C1 != 0
  %mx = mul nuw i8 %x, C1
  %my = mul nuw i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y

  Name: odd ne
  Pre: C1 % 2 != 0
  %mx = mul i8 %x, C1
  %my = mul i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y
2020-09-07 12:40:37 -04:00
Sanjay Patel 11d8eedfa5 [InstCombine] move/add tests for icmp with mul operands; NFC 2020-09-07 12:40:37 -04:00
Sanjay Patel b22910daab [InstCombine] erase instructions leading up to unreachable
Normal dead code elimination ignores assume intrinsics, so we fail to
delete assumes that are not meaningful (and potentially worse if they
cause conflicts with other assumptions).

The motivating example in https://llvm.org/PR47416 suggests that we
might have problems upstream from here (difference between C and C++),
but this should be a cheap way to make sure we remove more dead code.

Differential Revision: https://reviews.llvm.org/D87149
2020-09-07 10:44:08 -04:00
Sanjay Patel 28aa60aae2 [InstCombine] add test with more unreachable insts; NFC
Goes with D87149
2020-09-07 08:19:43 -04:00
Sanjay Patel 3ca8b9a560 [InstCombine] give a name to an intermediate value for easier tracking; NFC
As noted in PR47430, we probably want to conditionally include 'nsw'
here anyway, so we are going to need to fill out the optional args.
2020-09-07 08:19:42 -04:00
Nikita Popov 5ad6552a83 [InstCombine] Add tests for known negative abs intrinsic (NFC)
And duplicate tests for known non-negative from InstSimplify.
2020-09-05 17:31:04 +02:00
Nikita Popov 4892d3a198 [InstCombine] Fold abs with dominating condition
Similar to D87168, but for abs. If we have a dominating x >= 0
condition, then we know that abs(x) is x. This fold is in
InstCombine, because we need to create a sub instruction for
the x < 0 case.

Differential Revision: https://reviews.llvm.org/D87184
2020-09-05 16:18:35 +02:00
Nikita Popov ada8a17d94 [InstCombine] Fold abs intrinsic eq zero
Following the same transform for the select version of abs.
2020-09-05 15:11:38 +02:00
Nikita Popov 94c71d6aa1 [InstCombine] Add tests for abs intrinsic eq zero (NFC) 2020-09-05 15:11:38 +02:00
Nikita Popov 58b28fa7a2 [InstCombine] Fold mul of abs intrinsic
Same as the existing SPF_ABS fold. We don't need to explicitly
handle NABS, as the negs will get folded away first.
2020-09-05 12:37:45 +02:00
Nikita Popov 3ab13348ba [InstCombine] Add tests for mul of abs intrinsic (NFC) 2020-09-05 12:36:27 +02:00
Nikita Popov 10cb23c6ca [InstCombine] Fold cttz of abs intrinsic
Same as the existing fold for SPF_ABS. We don't need to explicitly
handle the NABS variant, as we'll first fold away the neg in that
case.
2020-09-05 12:25:41 +02:00
Nikita Popov 1903a1afd9 [InstCombine] Add tests for cttz of abs intrinsic (NFC) 2020-09-05 12:22:42 +02:00
Nikita Popov d401e376e4 [InstCombine] Test abs with dominating condition (NFC) 2020-09-05 11:10:01 +02:00
Sanjay Patel 35c6d56c04 [InstCombine] rename tmp values to avoid scripted FileCheck conflicts; NFC 2020-09-04 17:02:18 -04:00
Sanjay Patel c5d6b2b7e5 [InstCombine] add test for assume in block with unreachable (PR47416); NFC 2020-09-04 16:57:35 -04:00
Bryan Chan 3404add468 [EarlyCSE] Verify hash code in regression tests
As discussed in D86843, -earlycse-debug-hash should be used in more regression
tests to catch inconsistency between the hashing and the equivalence check.

Differential Revision: https://reviews.llvm.org/D86863
2020-09-04 10:40:35 -04:00
Sanjay Patel 2391a34f9f [InstCombine] canonicalize all commutative intrinsics with constant arg 2020-09-03 12:42:04 -04:00
Sanjay Patel 9bb3a9eebb [InstCombine] add tests for commutative intrinsics; NFC 2020-09-03 12:42:04 -04:00
Eli Friedman 96ef6998df [InstCombine] Fix a couple crashes with extractelement on a scalable vector.
Differential Revision: https://reviews.llvm.org/D86989
2020-09-02 18:02:07 -07:00
Venkataramanan Kumar 626c3738cd [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).

Differential Revision: https://reviews.llvm.org/D86726
2020-09-02 08:23:48 -04:00
Sanjay Patel d48699e3e8 [InstCombine] adjust recip sqrt tests for better coverage; NFC
Goes with D86726
2020-09-01 09:43:42 -04:00
Roman Lebedev c23aefd7c3
[NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement
As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647
this must be sufficient otherwise `EliminateDuplicatePHINodes()`
would have hit issues with it already.
2020-08-31 22:29:39 +03:00
Venkataramanan Kumar 86163f885b [InstCombine] add extra-use tests for fmul+sqrt; NFC 2020-08-30 18:41:01 -04:00
Roman Lebedev bf21ce7b90
[InstCombine] Take 3: Perform trivial PHI CSE
The original take 1 was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.

Then there was take 2 3e69871ab5,
which was InstCombine-specific, but it again showed stage2-stage3 differences,
and reverted in bdaa3f86a0.
This is quite alarming.

Here, let's try to change how we find existing PHI candidate:
due to the worklist order, and the way PHI nodes are inserted
(it may be inserted as the first one, or maybe not), let's look at *all*
PHI nodes in the block.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| asm-printer.EmittedInsts                           | 7942329   | 7942457   |    128 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254312480 |  16848 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18347     |    -65 |   -0.35% |    0.35% |
| early-cse.NumCSE                                   | 2183283   | 2183267   |    -16 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3644419   |   4108 |    0.11% |    0.11% |
| instcombine.NumDeadInst                            | 1778204   | 1783205   |   5001 |    0.28% |    0.28% |
| instcombine.NumPHICSEs                             | 0         | 22490     |  22490 |    0.00% |    0.00% |
| instcombine.NumWorklistIterations                  | 2023272   | 2024400   |   1128 |    0.06% |    0.06% |
| instcount.NumCallInst                              | 1758395   | 1758802   |    407 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330545    |    -12 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077220   |     82 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832606   |    660 |    0.01% |    0.01% |
| simplifycfg.NumHoistCommonCode                     | 24186     | 24187     |      1 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999767    | -20046 |   -1.97% |    1.97% |
```
So it fires 22490 times, which is less than ~24k the take 1 did,
but more than what take 2 did (22228 times)
.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 18:21:24 +03:00
Roman Lebedev bdaa3f86a0
Revert "[InstCombine] Take 2: Perform trivial PHI CSE"
While the original variant with doing this in InstSimplify (rightfully)
caused questions and ultimately was detected to be a culprit
of stage2-stage3 mismatch, it was expected that
InstCombine-based implementation would be fine.

But apparently it's not, as
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio
suggests.

Which suggests that somewhere in InstCombine there is a loop
over nondeterministically sorted container, which causes
different worklist ordering.

This reverts commit 3e69871ab5.
2020-08-29 16:05:02 +03:00
Roman Lebedev 9c1ca8a3f7
[NFC][InstCombine] select.ll: remove outdated TODO comment
Fixed by 3e69871ab5
2020-08-29 15:10:13 +03:00
Roman Lebedev 3e69871ab5
[InstCombine] Take 2: Perform trivial PHI CSE
The original take was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it :
> It appears to cause compilation non-determinism and caused stage3 mismatches.

However InstCombine already does many different optimizations,
so it should be a safe place to do it here.

Note that we still can't just compare incoming values ranges,
because there is no guarantee that these PHI's we'd simplify to
were already re-visited and sorted.
However coming up with a test is problematic.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |      |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs                             | 0         | 22228     |  22228 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942329   | 7942456   |    127 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254313792 |  18160 |    0.01% |    0.01% |
| early-cse.NumCSE                                   | 2183283   | 2183272   |    -11 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3666911   |  26600 |    0.73% |    0.73% |
| instcombine.NumDeadInst                            | 1778204   | 1783318   |   5114 |    0.29% |    0.29% |
| instcount.NumCallInst                              | 1758395   | 1758804   |    409 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330549    |     -8 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077221   |     83 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832611   |    665 |    0.01% |    0.01% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999740    | -20073 |   -1.97% |    1.97% |
```
So it fires ~22k times, which is less than ~24k the take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 13:13:06 +03:00
Roman Lebedev 3ba83f2d29
[NFC][InstCombine] Add tests for PHI CSE
As discussed in https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824235.html
even though it seems worthwhile doing so in InstSimplify,
we really can't do that there, because the other PHI wouldn't be
def-reachable from the original PHI.

So ignoring whether or not EarlyCSE should also know to do this,
InstCombine is the place.
2020-08-29 13:13:06 +03:00
Owen Anderson ed90f15efb Revert "[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block"
This reverts commit 6102310d81.  It
appears to cause compilation non-determinism and caused stage3
mismatches.
2020-08-28 23:43:42 +00:00
Nikita Popov ffe05dd125 [InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178)
Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding
usub.sat(a, b) + b to umax(a, b). The backend will expand umax
back to usubsat if that is profitable.

We may also want to handle uadd.sat(a, b) - b in the future.

Differential Revision: https://reviews.llvm.org/D63060
2020-08-28 21:52:29 +02:00
Roman Lebedev 6102310d81
[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block
Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify,
nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..

While i could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks, same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what i wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.

So i would think this is pretty uncontroversial.

On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE                             | 0         | 23779     |  23779 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942328   | 7942392   |     64 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 273069192 | 273084704 |  15512 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18539     |    127 |    0.69% |    0.69% |
| early-cse.NumCSE                                   | 2183283   | 2183227   |    -56 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 542090    |  -8015 |   -1.46% |    1.46% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640264   | 3664769   |  24505 |    0.67% |    0.67% |
| instcombine.NumDeadInst                            | 1778193   | 1783183   |   4990 |    0.28% |    0.28% |
| instcount.NumCallInst                              | 1758401   | 1758799   |    398 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330533    |    -24 |   -0.01% |    0.01% |
| instcount.TotalInsts                               | 8831952   | 8832286   |    334 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019808   | 999607    | -20201 |   -1.98% |    1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.

That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG of all places..

I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may false-negatively say that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.

Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.

As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).
2020-08-27 18:47:04 +03:00
Florian Hahn 419c6948df [SimplifyLibCalls] Remove over-eager early return in strlen optzns.
Currently we bail out early for strlen calls with a GEP operand, if none
of the GEP specific optimizations fire. But there could be later
optimizations that still apply,  which we currently miss out on.

An example is that we do not apply the following optimization
   strlen(x) == 0 --> *x == 0

Unless I am missing something, there seems to be no reason for bailing
out early there.

Fixes PR47149.

Reviewed By: lebedev.ri, xbolva00

Differential Revision: https://reviews.llvm.org/D85886
2020-08-27 15:19:45 +01:00
Roman Lebedev 95848ea101
[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks
As FIXME said, they really should be checking for a single user,
not use, so let's do that. It is not *that* unusual to have
the same value as incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.

There isn't a nice way to do that, Value::users() isn't uniqified,
and Value only tracks it's uses, not Users, so the check is
potentially costly since it does indeed potentially involes
traversing the entire use list of a value.
2020-08-26 20:20:41 +03:00
Roman Lebedev 8bfe46dce2
[NFC][InstCombine] Add tests with PHI-of-{insert,extract}value with multiple uses
It is fine if the operation has multiple uses, as long as they are all
in this very PHI node.
2020-08-26 20:20:41 +03:00
Roman Lebedev 1f90d45b9e
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

This is a reland of the original commit fcb51d8c24,
because originally i forgot to ensure that the base aggregate types match.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:57:50 +03:00
Roman Lebedev 451b1bd894
[NFC][InstCombine] Add a PHI-of-insertvalues test with different base aggregate types 2020-08-26 09:57:50 +03:00
Roman Lebedev c295c6f2c0
Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad"
This reverts commit fcb51d8c24.

As buildbots report, there's apparently some missing check to ensure
that the types of incoming values match the type of PHI.
Let's revert for a moment.
2020-08-26 09:23:22 +03:00
Roman Lebedev fcb51d8c24
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:08:24 +03:00
Sanjay Patel c4f0a0896f [InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try)
The 1st attempt (rG557b890) was reverted because it caused miscompiles.
That bug is avoided here by changing the order of folds and as verified
in the new tests.

Original commit message:
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-25 11:19:36 -04:00
Sanjay Patel 11f8d4aa10 [InstCombine] add vector demanded elements tests with shuffles; NFC
The 1st draft of D86460 (reverted) would show miscompiles with these tests
because the undef element tracking went wrong and became visible in the
shuffle masks.
2020-08-25 11:19:35 -04:00
Roman Lebedev 4d5ca22b8a
[NFC][InstCombine] Tests for PHI-of-extractvalues
Much like with it's sibling fold HI-of-insertvalues,
it appears to be much more worthwhile than it would seem.
2020-08-25 13:01:07 +03:00