Commit Graph

4030 Commits

Author SHA1 Message Date
Simon Pilgrim 91589cf679 Add missing namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
2020-09-23 16:19:25 +01:00
David Sherwood 59c4d5aad0 [SVE] Fix InstCombinerImpl::PromoteCastOfAllocation for scalable vectors
In this patch I've fixed some warnings that arose from the implicit
cast of TypeSize -> uint64_t. I tried writing a variety of different
cases to show how this optimisation might work for scalable vectors
and found:

1. The optimisation does not work for cases where the cast type
is scalable and the allocated type is not. This because we need to
know how many times the cast type fits into the allocated type.
2. If we pass all the various checks for the case when the allocated
type is scalable and the cast type is not, then when creating the
new alloca we have to take vscale into account. This leads to
sub-optimal IR that is worse than the original IR.
3. For the remaining case when both the alloca and cast types are
scalable it is hard to find examples where the optimisation would
kick in, except for simple bitcasts, because we typically fail the
ABI alignment checks.

For now I've changed the code to bail out if only one of the alloca
and cast types is scalable. This means we continue to support the
existing cases where both types are fixed, and also the specific case
when both types are scalable with the same size and alignment, for
example a simple bitcast of an alloca to another type.

I've added tests that show we don't attempt to promote the alloca,
except for simple bitcasts:

  Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll

Differential revision: https://reviews.llvm.org/D87378
2020-09-23 08:43:05 +01:00
Martin Storsjö 2c4c659666 [InstCombine] Add parentheses in assert to silence GCC warning. NFC. 2020-09-23 09:03:01 +03:00
Sanjay Patel 6bad3caeb0 [InstCombine] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 15:34:24 -04:00
Sanjay Patel 7903ae4720 [InstCombine] factorize left shifts of add/sub
We do similar factorization folds in SimplifyUsingDistributiveLaws,
but that drops no-wrap properties. Propagating those optimally may
help solve:
https://llvm.org/PR47430

The propagation is all-or-nothing for these patterns: when all
3 incoming ops have nsw or nuw, the 2 new ops should have the
same no-wrap property:
https://alive2.llvm.org/ce/z/Dv8wsU

This also solves:
https://llvm.org/PR47584
2020-09-20 12:55:24 -04:00
Sanjay Patel cf75e83275 [InstCombine] replace zombie unreachable values with 'undef' before erasing
The test (currently crashing) is reduced from the example provided
in the post-commit discussion in D87149.

Differential Revision: https://reviews.llvm.org/D87965
2020-09-20 12:25:08 -04:00
Huihui Zhang 9ad6049736 [InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant.
We cannot iterate on scalable vector, the number of elements is unknown at compile-time.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87918
2020-09-18 11:26:36 -07:00
Nikita Popov 13e19d2e7c Revert "[InstCombine] Canonicalize SPF_ABS to abs intrinc"
This reverts commit 05d4c4ebc2.

mstorsjo reports a miscompile after this change in
https://reviews.llvm.org/D87188#2281093. Reverting until I can
investigate this.
2020-09-18 09:38:26 +02:00
Nikita Popov 05d4c4ebc2 [InstCombine] Canonicalize SPF_ABS to abs intrinc
Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.

To be conservative, the one-use check on the comparison is retained,
this may be relaxed if all goes well.

It's pretty likely that this will uncover places that missing
handling for the abs() intrinsic. Please report any seen performance
regressions.

Differential Revision: https://reviews.llvm.org/D87188
2020-09-17 22:28:34 +02:00
Nikita Popov 222bf3ffbc Reapply [InstCombine] Simplify select operand based on equality condition
Reapply after fixing SimplifyWithOpReplaced() to never return
the original value, which would lead to an infinite loop in this
transform.

-----

For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.

Differential Revision: https://reviews.llvm.org/D87480
2020-09-16 20:53:58 +02:00
Benjamin Kramer b768546fe0 Revert "[InstCombine] Simplify select operand based on equality condition"
This reverts commit cfff88c03c. Sends
instcombine into an infinite loop.

```
define i1 @foo(i32 %arg, i32 %arg1) {
bb:
  %tmp = udiv i32 %arg, %arg1
  %tmp2 = mul nsw i32 %tmp, %arg1
  %tmp3 = icmp eq i32 %tmp2, %arg
  %tmp4 = select i1 %tmp3, i32 %tmp, i32 undef
  %tmp5 = icmp sgt i32 %tmp4, 255
  ret i1 %tmp5
}
```
2020-09-15 12:22:47 +02:00
Nikita Popov cfff88c03c [InstCombine] Simplify select operand based on equality condition
For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.

Differential Revision: https://reviews.llvm.org/D87480
2020-09-14 20:07:06 +02:00
Tyker 78de7297ab Reland [AssumeBundles] Use operand bundles to encode alignment assumptions
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
2020-09-12 15:36:06 +02:00
Nikita Popov 36e2e2e12e [InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322)
This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.

The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:

 * Export the SimplifyWithOpReplaced API from InstSimplify, with an
   added AllowRefinement flag. For replacements inside the TrueVal
   we don't actually care whether refinement occurs or not, the
   replacement is always legal. This part of the transform is now
   done in InstSimplify only. (It should be noted that the current
   AllowRefinement check is not sufficient -- that's an issue we
   need to address separately.)
 * Change the InstCombine fold to work by temporarily dropping
   poison generating flags, running the fold and then restoring the
   flags if it didn't work out. This will ensure that the InstCombine
   fold is correct as long as the InstSimplify fold is correct.

Differential Revision: https://reviews.llvm.org/D87445
2020-09-12 14:45:06 +02:00
Sanjay Patel 6aa3fc4a5b Revert "[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)"
This reverts commit 324a53205a.

On closer examination of at least one of the test diffs,
this does not appear to be correct in all cases. Even the
existing 'nsw' creation may be wrong based on this example:
https://alive2.llvm.org/ce/z/uL4Hw9
https://alive2.llvm.org/ce/z/fJMKQS
2020-09-11 10:54:48 -04:00
Sanjay Patel 324a53205a [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)
There's no signed wrap if both geps have 'inbounds':
https://alive2.llvm.org/ce/z/nZkQTg
https://alive2.llvm.org/ce/z/7qFauh
2020-09-11 10:39:09 -04:00
Simon Pilgrim 48b510c4bc [NFC] Fix compiler warnings due to integer comparison of different signedness
Fix by directly using INT_MAX and INT32_MAX.

Patch by: @nullptr.cpp (Yang Fan)

Differential Revision: https://reviews.llvm.org/D87347
2020-09-11 15:32:03 +01:00
Christopher Tetreault 7ddfd9b3eb [SVE] Bail from VectorUtils heuristics for scalable vectors
Bail from maskIsAllZeroOrUndef and maskIsAllOneOrUndef prior to iterating over the number of
elements for scalable vectors.

Assert that the mask type is not scalable in possiblyDemandedEltsInMask .

Assert that the types are correct in all three functions.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87424
2020-09-10 12:29:37 -07:00
Nikita Popov 4e413e1621 [InstCombine] Temporarily do not drop volatile stores before unreachable
See discussion in D87149. Dropping volatile stores here is legal
per LLVM semantics, but causes issues for real code and may result
in a change to LLVM volatile semantics. Temporarily treat volatile
stores as "not guaranteed to transfer execution" in just this place,
until this issue has been resolved.
2020-09-10 16:16:44 +02:00
Nikita Popov f6b87da0c7 [InstCombine] Fold comparison of abs with int min
If the abs is poisoning, this is already folded to true/false.
For non-poisoning abs, we can convert this to a comparison with
the operand.
2020-09-08 20:23:03 +02:00
Nikita Popov e97f3b1b43 [InstCombine] Fold abs of known negative operand
If we know that the abs operand is known negative, we can replace
it with a neg.

To avoid computing known bits twice, I've removed the fold for the
non-negative case from InstSimplify. Both the non-negative and the
negative case are handled by InstCombine now, with one known bits call.

Differential Revision: https://reviews.llvm.org/D87196
2020-09-08 20:14:35 +02:00
Sanjay Patel 8b30067919 [InstCombine] improve fold of pointer differences
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.

This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.

This is part of improving flags propagation noticed
with PR47430.
2020-09-07 15:54:32 -04:00
Sanjay Patel 7a6d6f0f70 [InstCombine] improve folds for icmp with multiply operands (PR47432)
Check for no overflow along with an odd constant before
we lose information by converting to bitwise logic.

https://rise4fun.com/Alive/2Xl

  Pre: C1 != 0
  %mx = mul nsw i8 %x, C1
  %my = mul nsw i8 %y, C1
  %r = icmp eq i8 %mx, %my
  =>
  %r = icmp eq i8 %x, %y

  Name: nuw ne
  Pre: C1 != 0
  %mx = mul nuw i8 %x, C1
  %my = mul nuw i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y

  Name: odd ne
  Pre: C1 % 2 != 0
  %mx = mul i8 %x, C1
  %my = mul i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y
2020-09-07 12:40:37 -04:00
Sanjay Patel b22910daab [InstCombine] erase instructions leading up to unreachable
Normal dead code elimination ignores assume intrinsics, so we fail to
delete assumes that are not meaningful (and potentially worse if they
cause conflicts with other assumptions).

The motivating example in https://llvm.org/PR47416 suggests that we
might have problems upstream from here (difference between C and C++),
but this should be a cheap way to make sure we remove more dead code.

Differential Revision: https://reviews.llvm.org/D87149
2020-09-07 10:44:08 -04:00
Sanjay Patel 3ca8b9a560 [InstCombine] give a name to an intermediate value for easier tracking; NFC
As noted in PR47430, we probably want to conditionally include 'nsw'
here anyway, so we are going to need to fill out the optional args.
2020-09-07 08:19:42 -04:00
Nikita Popov 4892d3a198 [InstCombine] Fold abs with dominating condition
Similar to D87168, but for abs. If we have a dominating x >= 0
condition, then we know that abs(x) is x. This fold is in
InstCombine, because we need to create a sub instruction for
the x < 0 case.

Differential Revision: https://reviews.llvm.org/D87184
2020-09-05 16:18:35 +02:00
Nikita Popov ada8a17d94 [InstCombine] Fold abs intrinsic eq zero
Following the same transform for the select version of abs.
2020-09-05 15:11:38 +02:00
Nikita Popov 58b28fa7a2 [InstCombine] Fold mul of abs intrinsic
Same as the existing SPF_ABS fold. We don't need to explicitly
handle NABS, as the negs will get folded away first.
2020-09-05 12:37:45 +02:00
Nikita Popov 10cb23c6ca [InstCombine] Fold cttz of abs intrinsic
Same as the existing fold for SPF_ABS. We don't need to explicitly
handle the NABS variant, as we'll first fold away the neg in that
case.
2020-09-05 12:25:41 +02:00
Sanjay Patel 2391a34f9f [InstCombine] canonicalize all commutative intrinsics with constant arg 2020-09-03 12:42:04 -04:00
Eli Friedman 96ef6998df [InstCombine] Fix a couple crashes with extractelement on a scalable vector.
Differential Revision: https://reviews.llvm.org/D86989
2020-09-02 18:02:07 -07:00
Venkataramanan Kumar 626c3738cd [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).

Differential Revision: https://reviews.llvm.org/D86726
2020-09-02 08:23:48 -04:00
Christopher Tetreault 640f20b0c7 [SVE] Remove calls to VectorType::getNumElements from InstCombine
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D82237
2020-08-31 12:59:10 -07:00
Roman Lebedev c23aefd7c3
[NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement
As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647
this must be sufficient otherwise `EliminateDuplicatePHINodes()`
would have hit issues with it already.
2020-08-31 22:29:39 +03:00
Roman Lebedev bf21ce7b90
[InstCombine] Take 3: Perform trivial PHI CSE
The original take 1 was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.

Then there was take 2 3e69871ab5,
which was InstCombine-specific, but it again showed stage2-stage3 differences,
and reverted in bdaa3f86a0.
This is quite alarming.

Here, let's try to change how we find existing PHI candidate:
due to the worklist order, and the way PHI nodes are inserted
(it may be inserted as the first one, or maybe not), let's look at *all*
PHI nodes in the block.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| asm-printer.EmittedInsts                           | 7942329   | 7942457   |    128 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254312480 |  16848 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18347     |    -65 |   -0.35% |    0.35% |
| early-cse.NumCSE                                   | 2183283   | 2183267   |    -16 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3644419   |   4108 |    0.11% |    0.11% |
| instcombine.NumDeadInst                            | 1778204   | 1783205   |   5001 |    0.28% |    0.28% |
| instcombine.NumPHICSEs                             | 0         | 22490     |  22490 |    0.00% |    0.00% |
| instcombine.NumWorklistIterations                  | 2023272   | 2024400   |   1128 |    0.06% |    0.06% |
| instcount.NumCallInst                              | 1758395   | 1758802   |    407 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330545    |    -12 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077220   |     82 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832606   |    660 |    0.01% |    0.01% |
| simplifycfg.NumHoistCommonCode                     | 24186     | 24187     |      1 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999767    | -20046 |   -1.97% |    1.97% |
```
So it fires 22490 times, which is less than ~24k the take 1 did,
but more than what take 2 did (22228 times)
.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 18:21:24 +03:00
Roman Lebedev bdaa3f86a0
Revert "[InstCombine] Take 2: Perform trivial PHI CSE"
While the original variant with doing this in InstSimplify (rightfully)
caused questions and ultimately was detected to be a culprit
of stage2-stage3 mismatch, it was expected that
InstCombine-based implementation would be fine.

But apparently it's not, as
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio
suggests.

Which suggests that somewhere in InstCombine there is a loop
over nondeterministically sorted container, which causes
different worklist ordering.

This reverts commit 3e69871ab5.
2020-08-29 16:05:02 +03:00
Nikita Popov 6093b14c2c [InstCombine] Return replaceInstUsesWith() result (NFC)
Follow the usual usage pattern for this function and return the
result.
2020-08-29 14:49:57 +02:00
Roman Lebedev 71ac9105cd
[InstCombine] foldAggregateConstructionIntoAggregateReuse(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev e65f213178
[InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev bd12113f57
[NFC][InstCombine] Fix some comments: the code already uses IC::replaceInstUsesWith() 2020-08-29 15:10:14 +03:00
Roman Lebedev 49d223274f
[NFC][InstCombine] Add STATISTIC() for how many iterations we did
As we've established, if it takes more than two iterations
(one to perform folding and one to ensure that no folding opportunities
remain) per function, then there are worklist management issues.
So it may be interesting to keep track of it.
2020-08-29 15:10:13 +03:00
Roman Lebedev 4f4eecf0ec
[InstCombine] visitPHINode(): use InstCombiner::replaceInstUsesWith() instead of RAUW
As noted in post-commit review, we really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:00 +03:00
Roman Lebedev 3e69871ab5
[InstCombine] Take 2: Perform trivial PHI CSE
The original take was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it :
> It appears to cause compilation non-determinism and caused stage3 mismatches.

However InstCombine already does many different optimizations,
so it should be a safe place to do it here.

Note that we still can't just compare incoming values ranges,
because there is no guarantee that these PHI's we'd simplify to
were already re-visited and sorted.
However coming up with a test is problematic.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |      |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs                             | 0         | 22228     |  22228 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942329   | 7942456   |    127 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254313792 |  18160 |    0.01% |    0.01% |
| early-cse.NumCSE                                   | 2183283   | 2183272   |    -11 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3666911   |  26600 |    0.73% |    0.73% |
| instcombine.NumDeadInst                            | 1778204   | 1783318   |   5114 |    0.29% |    0.29% |
| instcount.NumCallInst                              | 1758395   | 1758804   |    409 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330549    |     -8 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077221   |     83 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832611   |    665 |    0.01% |    0.01% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999740    | -20073 |   -1.97% |    1.97% |
```
So it fires ~22k times, which is less than ~24k the take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 13:13:06 +03:00
Nikita Popov 57a26bb7b4 [InstCombine] Fix typo in comment (NFC)
As pointed out in post-commit review of D63060.
2020-08-29 10:17:17 +02:00
Nikita Popov ffe05dd125 [InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178)
Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding
usub.sat(a, b) + b to umax(a, b). The backend will expand umax
back to usubsat if that is profitable.

We may also want to handle uadd.sat(a, b) - b in the future.

Differential Revision: https://reviews.llvm.org/D63060
2020-08-28 21:52:29 +02:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Roman Lebedev 95848ea101
[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks
As FIXME said, they really should be checking for a single user,
not use, so let's do that. It is not *that* unusual to have
the same value as incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.

There isn't a nice way to do that, Value::users() isn't uniqified,
and Value only tracks it's uses, not Users, so the check is
potentially costly since it does indeed potentially involes
traversing the entire use list of a value.
2020-08-26 20:20:41 +03:00
Roman Lebedev 1f90d45b9e
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

This is a reland of the original commit fcb51d8c24,
because originally i forgot to ensure that the base aggregate types match.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:57:50 +03:00
Roman Lebedev c295c6f2c0
Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad"
This reverts commit fcb51d8c24.

As buildbots report, there's apparently some missing check to ensure
that the types of incoming values match the type of PHI.
Let's revert for a moment.
2020-08-26 09:23:22 +03:00
Roman Lebedev fcb51d8c24
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:08:24 +03:00