Commit Graph

10367 Commits

Author SHA1 Message Date
Krzysztof Parzyszek 76e8c1899e Break long line accidentally left in the previous commit 2020-09-23 12:24:45 -05:00
Krzysztof Parzyszek e976fb1e54 [EarlyCSE] Fix crash with expensive checks after D87691
D87691 reordered some checks, which turned out to be unsafe. More
specifically, when examining a store instruction, the check against
getOrCreateResult should be done before attempting to call
isSameMemGeneration. Otherwise a crash in the MSSA walker can occur.

This patch restores the order of these calls to what it was originally.
2020-09-23 12:21:34 -05:00
Martin Storsjö b90132399a [CVP] Remove a redundant trailing semicolon, fixing GCC warnings. NFC. 2020-09-23 09:03:01 +03:00
Stefanos Baziotis 89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
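A minimal sketch of the renamed predicates (assuming the usual llvm::Loop interface; the helper is illustrative):

```cpp
// Illustrative only: isInnermost() replaces the old Loop::empty()
// (no sub-loops); isOutermost() is the new counterpart (no parent loop).
#include "llvm/Analysis/LoopInfo.h"

static bool isLeafLoop(const llvm::Loop &L) {
  return L.isInnermost(); // previously spelled L.empty()
}
```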
2020-09-22 23:28:51 +03:00
Roman Lebedev b289dc5306
[CVP] Narrow SDiv/SRem to the smallest power-of-2 that's sufficient to contain its operands
This is practically identical to what we already do for UDiv/URem:
  https://rise4fun.com/Alive/04K

Name: narrow udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow exact udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv exact i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow urem
Pre: C0 u<= 255 && C1 u<= 255
%r = urem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = urem i8 %t0, %t1
%r = zext i8 %t2 to i16

... only here we need to look for 'min signed bits', not 'active bits',
and there's UB to be aware of:
  https://rise4fun.com/Alive/KG86
  https://rise4fun.com/Alive/LwR

Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv exact i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = srem i9 %t0, %t1
%r = sext i9 %t2 to i16


Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv exact i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = srem i8 %t0, %t1
%r = sext i8 %t2 to i16


The ConstantRangeTest.losslessSignedTruncationSignext test sanity-checks
the logic: we can losslessly truncate a ConstantRange to
`getMinSignedBits()` bits and sign-extend it back, and the result will be
identical to the original CR.

On vanilla llvm test-suite + RawSpeed, this fires 1262 times,
while the same fold for UDiv/URem only fires 384 times. Sic!

Additionally, this causes +606.18% (+1079) extra cases of
aggressive-instcombine.NumDAGsReduced, and +473.14% (+1145)
of aggressive-instcombine.NumInstrsReduced folds.
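A hedged C++ sketch of the width computation this fold relies on (helper name is illustrative, not the exact CVP code): take the largest getMinSignedBits() over both operand ranges, add one bit to sidestep the INT_MIN / -1 UB shown above, then round up to a power of two.

```cpp
// Sketch, not the exact CVP implementation.
#include "llvm/IR/ConstantRange.h"
#include "llvm/Support/MathExtras.h"
#include <algorithm>

static unsigned narrowedSignedWidth(const llvm::ConstantRange &LHS,
                                    const llvm::ConstantRange &RHS) {
  // Smallest signed bit-width that losslessly holds both operands.
  unsigned MinSignedBits =
      std::max(LHS.getMinSignedBits(), RHS.getMinSignedBits());
  // One extra bit absorbs the INT_MIN / -1 overflow case (the i9 proofs),
  // then round up to the next power of two ("smallest power-of-2").
  return (unsigned)llvm::PowerOf2Ceil(MinSignedBits + 1);
}
```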
2020-09-22 21:37:30 +03:00
Roman Lebedev 4977eadee5
[NFC][CVP] Give a better name to the STATISTIC() counting udiv i16 -> udiv i8 xforms 2020-09-22 21:37:30 +03:00
Roman Lebedev ba5afe5588
[NFC][CVP] processUDivOrURem(): refactor to use ConstantRange::getActiveBits()
As an exhaustive test shows, this logic is fully identical to the old
implementation, with the exception of the case where both of the operands
had empty ranges:

```
TEST_F(ConstantRangeTest, CVP_UDiv) {
  unsigned Bits = 4;
  EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
    if (CR0.isEmptySet())
      return;
    EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
      if (CR1.isEmptySet())
        return;

      unsigned MaxActiveBits = 0;
      for (const ConstantRange &CR : {CR0, CR1})
        MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());

      ConstantRange OperandRange(Bits, /*isFullSet=*/false);
      for (const ConstantRange &CR : {CR0, CR1})
        OperandRange = OperandRange.unionWith(CR);
      unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();

      EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
    });
  });
}
```
2020-09-22 21:37:29 +03:00
Roman Lebedev 4eeeb356fc
[CVP] Enhance SRem -> URem fold to work not just on non-negative operands
This is a continuation of 8d487668d0,
the logic is pretty much identical for SRem:

Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, C1

Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, -C1

Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0

Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0

https://rise4fun.com/Alive/Vd6

Now, this new logic does not result in any new catches
on the vanilla llvm test-suite + RawSpeed,
but it should be virtually compile-time free,
and it may be important to handle SDiv and SRem consistently:
if we had an sdiv-srem pair and only converted one of them,
-divrempairs would no longer see them as a pair,
and thus would not "merge" them.
2020-09-22 21:37:28 +03:00
Meera Nakrani a3d0dce260 [ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop
Changes the TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to check whether the constant is part of a min(max())/max(min())
pattern that will match SSAT. We can then mark the constant as free to prevent it
from being hoisted, so SSAT can still be generated. This required minor changes in
some non-ARM backends to allow for the optional parameter.

Differential Revision: https://reviews.llvm.org/D87457
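An abridged sketch of how the new parameter is meant to be used (isSSATMinMaxPattern is a hypothetical stand-in for the pattern check described above; this is not the verbatim ARM code):

```cpp
// Sketch: the added Instruction* lets the backend inspect the use site.
int ARMTTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,
                                  const APInt &Imm, Type *Ty,
                                  TTI::TargetCostKind CostKind,
                                  Instruction *Inst) {
  // Hypothetical helper: does Inst sit in a min(max())/max(min()) clamp
  // that will match SSAT, with Imm as one of the clamp bounds?
  if (Inst && isSSATMinMaxPattern(Inst, Imm))
    return 0; // constant is free, so it will not be hoisted out of the loop
  return getIntImmCost(Imm, Ty, CostKind); // generic immediate cost
}
```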
2020-09-22 11:54:10 +00:00
Serguei Katkov 5502cfa091 [LoopUnswitch] Trivial simplification: remove trivial dead condition after unswitch
Non-trivial loop unswitching can leave behind the dead condition instruction.
This CL adds trivial dead-code elimination for the unused condition.

Reviewers: asbirlea, aqjune, fhahn, DaniilSuchkov, reames
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88014
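A minimal sketch of the cleanup idea (illustrative helper, not the exact patch):

```cpp
// Once no branch uses the unswitched condition, it is trivially dead.
#include "llvm/Transforms/Utils/Local.h"

static void deleteDeadCondition(llvm::Instruction *Cond) {
  if (llvm::isInstructionTriviallyDead(Cond))
    Cond->eraseFromParent();
}
```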
2020-09-22 09:04:59 +07:00
Krzysztof Parzyszek ae3f54c1e9 [EarlyCSE] Handle masked loads and stores
Extend the handling of memory intrinsics to also include non-
target-specific intrinsics, in particular masked loads and stores.

Invent "isHandledNonTargetIntrinsic" to distinguish between intrin-
sics that should be handled natively from intrinsics that can be
passed to TTI.

Add code that handles masked loads and stores and update the
testcase to reflect the results.

Differential Revision: https://reviews.llvm.org/D87340
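A sketch of the distinction the commit describes, mirroring the named helper (the exact set of handled intrinsics is per the patch):

```cpp
// Masked loads/stores are non-target intrinsics EarlyCSE handles natively.
#include "llvm/IR/IntrinsicInst.h"

static bool isHandledNonTargetIntrinsic(const llvm::Value *V) {
  const auto *II = llvm::dyn_cast<llvm::IntrinsicInst>(V);
  if (!II)
    return false;
  switch (II->getIntrinsicID()) {
  case llvm::Intrinsic::masked_load:
  case llvm::Intrinsic::masked_store:
    return true;
  default:
    return false;
  }
}
```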
2020-09-21 18:47:10 -05:00
Arthur Eubanks 1747f77764 [SimplifyCFG] Override options in default constructor
SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87718
2020-09-21 16:33:01 -07:00
Krzysztof Parzyszek 2c768c7d6c [EarlyCSE] Small refactoring changes, NFC
1. Store intrinsic ID in ParseMemoryInst instead of a boolean flag
   "IsTargetMemInst". This will make it easier to add support for
   target-independent intrinsics.
2. Extract the complex multiline conditions from EarlyCSE::processNode
   into a new function "getMatchingValue".

Differential Revision: https://reviews.llvm.org/D87691
2020-09-21 16:11:06 -05:00
Arthur Eubanks f4f7df037e [DIE] Remove DeadInstEliminationPass
This pass is like DeadCodeEliminationPass, but only does one pass
through a function instead of iterating on users of eliminated
instructions.

DeadCodeEliminationPass should be used in all cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87933
2020-09-21 12:12:25 -07:00
Florian Hahn 57ae9bb932 [LSR] Preserve MSSA when using SplitCriticalEdge.
LSR claims to preserve MemorySSA, but we also have to make sure it is
preserved when splitting critical edges. This can be done by passing MSSAU
to SplitCriticalEdge.

Fixes PR47557.
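A hedged sketch of the fix's shape (assuming the CriticalEdgeSplittingOptions API; the wrapper name is illustrative):

```cpp
// Hand the MemorySSAUpdater to SplitCriticalEdge so MemorySSA stays valid.
#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"

static llvm::BasicBlock *
splitEdgePreservingMSSA(llvm::Instruction *TI, unsigned SuccNum,
                        llvm::DominatorTree *DT, llvm::LoopInfo *LI,
                        llvm::MemorySSAUpdater *MSSAU) {
  return llvm::SplitCriticalEdge(
      TI, SuccNum, llvm::CriticalEdgeSplittingOptions(DT, LI, MSSAU));
}
```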
2020-09-21 09:51:26 +01:00
Florian Hahn 9d172c8e9c Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.

Notable fixes fc82006331, a0017c2bc2.

This reverts commit 3a59628f3c.
2020-09-18 11:05:00 +01:00
Whitney Tsang 1cee33e9db [LoopUnrollAndJam] Allow unroll and jam of loops forced by the user.
LoopUnrollAndJamPass is still disabled by default in the NPM pipeline,
and can be controlled by -enable-npm-unroll-and-jam.

Reviewed By: Meinersbur, dmgreen

Differential Revision: https://reviews.llvm.org/D87786
2020-09-17 19:40:14 +00:00
Nikita Popov 91ce8e121b [GVN] Use that assume(!X) implies X==false (PR47496)
We already use that assume(X) implies X==true, do the same for
assume(!X) implying X==false. This fixes PR47496.
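A hedged sketch of the added reasoning (illustrative helper, not GVN's actual code): look through a 'not' and record the complementary boolean.

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/PatternMatch.h"

// Returns the constant the (possibly negated) assume condition pins its
// underlying value to; CondOut receives that underlying value.
static llvm::Constant *impliedByAssume(llvm::IntrinsicInst *Assume,
                                       llvm::Value *&CondOut) {
  using namespace llvm::PatternMatch;
  llvm::Value *Cond = Assume->getArgOperand(0), *X;
  if (match(Cond, m_Not(m_Value(X)))) {
    CondOut = X; // assume(!X): X must be false
    return llvm::ConstantInt::getFalse(Assume->getContext());
  }
  CondOut = Cond; // assume(X): X must be true
  return llvm::ConstantInt::getTrue(Assume->getContext());
}
```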
2020-09-17 21:34:44 +02:00
Michael Liao 4e4c89b22c [EarlyCSE] Simplify max/min pattern matching. NFC. 2020-09-16 18:34:46 -04:00
Arthur Eubanks f7aa1563eb [LowerSwitch][NewPM] Port lowerswitch to NPM
Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87726
2020-09-15 18:18:31 -07:00
Wenlei He 2c391a5a14 [LICM] Make Loop ICM profile aware again
D65060 was reverted because it introduced non-determinism by using BFI counts from already freed blocks. The parent of this revision fixes that by using a VH callback on blocks to prevent this from happening and makes sure BFI data is passed correctly in LoopStandardAnalysisResults.

This re-introduces the previous optimization of using BFI data to prevent LICM from hoisting/sinking if the instruction will end up moving to a colder block.

Internally at Facebook this change results in a ~7% win in a CPU related metric in one of our big services by preventing hoisting cold code into a hot pre-header like the added test case demonstrates.

Testing:
ninja check

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87551
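A hedged sketch of the profile check (illustrative helper; the real patch threads BFI through LoopStandardAnalysisResults):

```cpp
// Skip hoisting/sinking when the destination block is colder than the
// instruction's current block.
#include "llvm/Analysis/BlockFrequencyInfo.h"

static bool movesToColderBlock(const llvm::BlockFrequencyInfo &BFI,
                               const llvm::BasicBlock &From,
                               const llvm::BasicBlock &To) {
  return BFI.getBlockFreq(&To) < BFI.getBlockFreq(&From);
}
```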
2020-09-15 17:21:58 -07:00
Wenlei He 2ea4c2c598 [BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults
~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~

~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~

~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~
~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~
~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~

~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~

See D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply, but this change no longer has to be in this diff because it's already upstream 😄.

This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes.

Testing
Ninja check

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D86156
2020-09-15 16:16:24 -07:00
Matt Arsenault 7d6ca2ec57 InferAddressSpaces: Fix assert with unreachable code
Invalid IR in unreachable code is technically valid IR. In this case,
the address space of the value was never inferred, and we tried to
rewrite it with an invalid address space value which would assert.
2020-09-15 15:48:43 -04:00
Florian Hahn 3d42d54955 [ConstraintElimination] Add constraint elimination pass.
This patch is a first draft of a new pass that adds a more flexible way
to eliminate compares based on more complex constraints collected from
dominating conditions.

In particular, it aims at simplifying conditions of the forms below
using a forward propagation approach, rather than instcombine-style
ad-hoc backwards walking of def-use chains.

    if (x < y)
      if (y < z)
        if (x < z) <- simplify

or

    if (x + 2 < y)
        if (x + 1 < y) <- simplify assuming no wraps

The general approach is to collect conditions and blocks, sort them by
dominance and then iterate over the sorted list. Each condition is turned
into a linear inequality, which is added to a system containing the linear
inequalities that hold on entry to the block. For blocks, we check each
compare against the system and see if it is implied by the constraints
in the system.

We also keep a stack of processed conditions and remove conditions from
the stack and the constraint system once they go out-of-scope (= do not
dominate the current block any longer).

Currently there are still at least the following areas for improvement
(a toy demonstration of the underlying reasoning follows the list):

* Currently large unsigned constants cannot be added to the system
  (coefficients must be represented as integers).
* The way constraints are managed is currently not very optimized.
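As a toy, self-contained demonstration of the reasoning (plain C++, not the pass's actual ConstraintSystem): represent each condition as a row cx*x + cy*y + cz*z <= rhs, add the rows for x < y and y < z, and observe that the sum implies x < z.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Row = std::array<int64_t, 4>; // {cx, cy, cz, rhs}

int main() {
  Row XltY = {1, -1, 0, -1}; // x - y <= -1, i.e. x < y
  Row YltZ = {0, 1, -1, -1}; // y - z <= -1, i.e. y < z
  Row Sum;
  for (int I = 0; I < 4; ++I)
    Sum[I] = XltY[I] + YltZ[I]; // x - z <= -2
  Row Query = {1, 0, -1, -1};  // x - z <= -1, i.e. x < z
  // Same coefficients and an rhs at least as tight: the query is implied,
  // so the dominated compare "x < z" can be folded to true.
  assert(Sum[0] == Query[0] && Sum[1] == Query[1] && Sum[2] == Query[2] &&
         Sum[3] <= Query[3]);
  return 0;
}
```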

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D84547
2020-09-15 19:31:11 +01:00
Florian Hahn 3a59628f3c Revert "[DSE] Switch to MemorySSA-backed DSE by default."
This reverts commit fb109c42d9.

Temporarily revert due to a mis-compile pointed out at D87163.
2020-09-15 18:07:56 +01:00
Bjorn Pettersson aa8be5aeea [Scalarizer] Avoid changing name of non-instructions
The "takeName" logic in ScalarizerVisitor::gather did not consider
that the value vector could refer to non-instructions, such as
global variables. This patch makes sure that we avoid changing the
name of a value if it isn't an instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D87685
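A minimal sketch of the guard (names illustrative):

```cpp
// Only instructions give up their name; a global variable feeding the
// gathered vector keeps its own.
#include "llvm/IR/Instruction.h"

static void takeNameIfInstruction(llvm::Value *New, llvm::Value *Old) {
  if (llvm::isa<llvm::Instruction>(New))
    New->takeName(Old);
}
```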
2020-09-15 14:15:50 +02:00
David Sherwood 69cccb3189 [SVE] Fix isLoadInvariantInLoop for scalable vectors
I've amended the isLoadInvariantInLoop function to bail out for
scalable vectors for now since the invariant.start intrinsic is only
ever generated by the clang frontend for thread locals or struct
and class constructors, neither of which support sizeless types.
In addition, the intrinsic itself does not currently support the
concept of a scaled size, which makes it impossible to compare
the sizes of different scalable objects, e.g. <vscale x 32 x i8>
and <vscale x 16 x i8>.

Added new tests here:

  Transforms/LICM/AArch64/sve-load-hoist.ll
  Transforms/LICM/hoisting.ll

Differential Revision: https://reviews.llvm.org/D87227
2020-09-15 08:30:19 +01:00
Florian Hahn f715d81c9d [DSE] Only eliminate candidates that always store the same loc.
AliasAnalysis/MemoryLocation does not account for loops. Two
MemoryLocations can be reported as must-overwriting, even if the first one
writes multiple locations in a loop.

This patch prevents removing such stores, by only considering candidates
that are known to be loop invariant, or executed in the same BB.

Currently the invariant check is quite conservative and only considers
Alloca and Alloca-like instructions and arguments as invariant base pointers.
It also considers GEPs with all constant indices and invariant bases as
invariant.

This can be improved in the future, but the current implementation has
only minor impact on the total number of stores eliminated (25903 vs
26047 for the baseline). There are some 2-10% swings for some individual
benchmarks. In roughly half of the cases, the number of stores removed
actually increases, because we skip early those candidates that are
unlikely to be valid.
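A hedged sketch of the conservative invariance check described above (illustrative helper, not the exact DSE code):

```cpp
#include "llvm/IR/Argument.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Operator.h"

static bool isInvariantBase(const llvm::Value *Ptr) {
  // Allocas and function arguments do not vary across loop iterations...
  if (llvm::isa<llvm::AllocaInst>(Ptr) || llvm::isa<llvm::Argument>(Ptr))
    return true;
  // ...and neither do GEPs with an invariant base and all-constant indices.
  if (const auto *GEP = llvm::dyn_cast<llvm::GEPOperator>(Ptr))
    return GEP->hasAllConstantIndices() &&
           isInvariantBase(GEP->getPointerOperand());
  return false;
}
```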
2020-09-14 12:06:58 +01:00
David Sherwood 816663adb5 [SVE] In LoopIdiomRecognize::isLegalStore bail out for scalable vectors
The function LoopIdiomRecognize::isLegalStore looks for stores in loops
that could be transformed into memset or memcpy. However, the algorithm
currently requires that we know how big the store is at runtime, i.e.
that the store size will not overflow an unsigned integer. For scalable
vectors we cannot guarantee this so I have changed the code to bail out
for now. In addition, even if we add a way to query the maximum value of
vscale in future we will still need to update the algorithm to cope with
non-constant strides. The additional cost associated with calculating
the memset and memcpy arguments will need to be taken into account as
well.

This patch also fixes up an implicit TypeSize -> uint64_t cast,
thereby removing a warning. I've added tests here showing a fixed
width vector loop being transformed into memcpy, and a scalable
vector loop remaining unchanged:

  Transforms/LoopIdiom/memcpy-vectors.ll

Differential Revision: https://reviews.llvm.org/D87439
2020-09-14 11:28:31 +01:00
David Stenberg bfcb824ba5 [JumpThreading] Fix an incorrect Modified status
This fixes PR47297.

When ProcessBlock() was able to constant-fold the terminator's
condition but not do any more transformations, the function would
return false, which would lead to the JumpThreading pass returning an
incorrect modified status. This patch makes it so that ProcessBlock()
returns true in such cases. This will trigger an unnecessary extra
invocation of ProcessBlock(), but such cases should be rare.

This was caught using the check introduced by D80916.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87392
2020-09-14 10:36:13 +02:00
Florian Hahn e082dee2b5 [DSE] Bail out on MemoryPhis when deleting stores at end of function.
When deleting stores at the end of a function, we have to do PHI
translation, otherwise we might miss reads in different iterations of a
loop. See multiblock-loop-carried-dependence.ll for details.

This fixes a mis-compile and surprisingly also increases the number of
eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006
on X86 with -O3 -flto. This is most likely because we save budget by not
exploring through MemoryPhis, which are less likely to result in valid
candidates for elimination.

The issue was reported post-commit for fb109c42d9.
2020-09-12 19:05:59 +01:00
Tyker 78de7297ab Reland [AssumeBundles] Use operand bundles to encode alignment assumptions
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
2020-09-12 15:36:06 +02:00
Krzysztof Parzyszek f92908cc74 [DSE] Make sure that DSE+MSSA can handle masked stores
Differential Revision: https://reviews.llvm.org/D87414
2020-09-11 10:00:21 -05:00
Michael Liao f787fe15d8 [EarlyCSE] Remove unnecessary operand swap.
- As min/max are commutative operators, there is no need to swap the
  operands; doing so breaks the convention used when calculating the hash value.
2020-09-11 02:14:04 -04:00
Michael Liao 41e68f7ee7 [EarlyCSE] Fix and recommit the revised c9826829d7
In addition to calculating the hash consistently by swapping SELECT's
operands, we also need to invert the select pattern flavor to match the
original logic.

[EarlyCSE] Equivalent SELECTs should hash equally

DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
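A hedged sketch of the canonicalization idea (illustrative, not the exact EarlyCSE change): pick a deterministic representative of the predicate/arm pair so both "twins" above hash alike.

```cpp
#include "llvm/IR/InstrTypes.h"
#include <utility>

static void canonicalizeSelectForHash(llvm::CmpInst::Predicate &Pred,
                                      llvm::Value *&TrueV,
                                      llvm::Value *&FalseV) {
  llvm::CmpInst::Predicate Inv = llvm::CmpInst::getInversePredicate(Pred);
  // select(Pred, A, B) == select(Inv, B, A); keep the smaller predicate.
  if (Inv < Pred) {
    Pred = Inv;
    std::swap(TrueV, FalseV);
  }
}
```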
2020-09-10 23:30:56 -04:00
Michael Liao 39dc75f66c Revert "[EarlyCSE] Equivalent SELECTs should hash equally"
This reverts commit c9826829d7 as it
breaks regression tests.
2020-09-10 22:37:35 -04:00
Florian Hahn fb109c42d9 [DSE] Switch to MemorySSA-backed DSE by default.
The tests have been updated and I plan to move them from the MSSA
directory up.

Some end-to-end tests needed small adjustments. One difference to the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.

One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not handling instructions that
may throw correctly in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat its return value as
belonging to the same underlying object as the passed pointer.

There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.

This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

For MultiSource/SPEC2000/SPEC2006, the number of eliminated stores
goes from ~17500 (legacy DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.

Impact on CTMark:
```
                                     Legacy Pass Manager
                        exec instrs    size-text
O3                       + 0.60%        - 0.27%
ReleaseThinLTO           + 1.00%        - 0.42%
ReleaseLTO-g             + 0.77%        - 0.33%
RelThinLTO (link only)   + 0.87%        - 0.42%
RelLTO-g (link only)     + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
                                     New Pass Manager
                       exec instrs    size-text
O3                       + 0.95%       - 0.25%
ReleaseThinLTO           + 1.34%       - 0.41%
ReleaseLTO-g             + 1.71%       - 0.35%
RelThinLTO (link only)   + 0.96%       - 0.41%
RelLTO-g (link only)     + 2.21%       - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

Reviewed By: asbirlea, xbolva00, nikic

Differential Revision: https://reviews.llvm.org/D87163
2020-09-10 22:24:32 +01:00
Bryan Chan c9826829d7 [EarlyCSE] Equivalent SELECTs should hash equally
DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
2020-09-10 16:59:24 -04:00
Krzysztof Parzyszek 8a08740db6 [GVN] Account for masked loads/stores depending on load/store instructions
This is a case where an intrinsic depends on a non-call instruction.

Differential Revision: https://reviews.llvm.org/D87423
2020-09-10 10:57:33 -05:00
Florian Hahn a5ec99da6e [DSE] Support eliminating memcpy.inline.
MemoryLocation has been taught about memcpy.inline, which means we can
get the memory locations read and written by it. This means DSE can
handle memcpy.inline.
2020-09-10 13:19:25 +01:00
Juneyoung Lee 39c1653b3d [JumpThreading] Conditionally freeze its condition when unfolding select
This patch fixes pr45956 (https://bugs.llvm.org/show_bug.cgi?id=45956).
To minimize its impact on the quality of generated code, I suggest enabling
this only for LTO as a start (it has two JumpThreading passes registered).
This patch adds a flag that makes JumpThreading enable it.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84940
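A minimal sketch of the freeze insertion (illustrative helper; IRBuilder::CreateFreeze per the LLVM C++ API):

```cpp
// Freeze the unfolded select's condition so a poison condition cannot
// introduce branch-on-poison UB.
#include "llvm/IR/IRBuilder.h"

static llvm::Value *freezeCondition(llvm::IRBuilder<> &B, llvm::Value *Cond) {
  return B.CreateFreeze(Cond, Cond->getName() + ".fr");
}
```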
2020-09-10 15:49:40 +09:00
Max Kazantsev c413a8a8ec [LoopLoadElim] Filter away candidates that stop being AddRecs after loop versioning. PR47457
The test in PR47457 demonstrates a situation where a candidate load's pointer's
SCEV is no longer a SCEVAddRec after loop versioning. The code there assumes that
it is always a SCEVAddRec and crashes otherwise.

This patch makes sure that we do not consider candidates for which this requirement
is broken after the versioning.

Differential Revision: https://reviews.llvm.org/D87355
Reviewed By: asbirlea
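A minimal sketch of the added guard (illustrative helper):

```cpp
// Re-check after versioning that the candidate pointer is still an add-rec.
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"

static bool isStillAddRec(llvm::ScalarEvolution &SE, llvm::Value *Ptr) {
  return llvm::isa<llvm::SCEVAddRecExpr>(SE.getSCEV(Ptr));
}
```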
2020-09-10 13:30:31 +07:00
Florian Hahn 9969c317ff [DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber.
Atomic stores are modeled as MemoryDef to model the fact that they may
not be reordered, depending on the ordering constraints.

Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.

Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87386
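A hedged sketch of the ordering test (illustrative helper, not the exact DSE code):

```cpp
// Only atomic stores stronger than monotonic are potential read clobbers.
#include "llvm/IR/Instructions.h"
#include "llvm/Support/AtomicOrdering.h"

static bool mayBeReadClobber(const llvm::StoreInst &SI) {
  return SI.isAtomic() && llvm::isStrongerThanMonotonic(SI.getOrdering());
}
```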
2020-09-09 23:01:58 +01:00
Mark de Wever 08196e0b2e Implements [[likely]] and [[unlikely]] in IfStmt.
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.

Differential Revision: https://reviews.llvm.org/D85091
2020-09-09 20:48:37 +02:00
Krzysztof Parzyszek 81ff2d30a9 [DSE] Handle masked stores 2020-09-09 13:31:31 -05:00
Juneyoung Lee 25ce1e0497 [ValueTracking] Add UndefOrPoison/Poison-only version of relevant functions
This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison.

isGuaranteedNotToBePoison will be used in D75808. The latter function is used by isGuaranteedNotToBePoison.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84242
2020-09-09 20:00:26 +09:00
Florian Hahn 2bcc4db761 [EarlyCSE] Explicitly require AAResultsWrapperPass.
The MemorySSAWrapperPass depends on AAResultsWrapperPass and if
MemorySSA is preserved but AAResultsWrapperPass is not, this could lead
to a crash when updating the last user of the MemorySSAWrapperPass.

Alternatively AAResultsWrapperPass could be marked preserved by GVN, but
I am not sure if that would be safe. I am not sure what is required in
order to preserve AAResultsWrapperPass. At the moment, it seems like a
couple of passes that do similar transforms to GVN are preserving it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87137
2020-09-09 09:14:50 +01:00
Max Kazantsev 795e4ee9d2 [NFC] Move function from IndVarSimplify to SCEV
This function can be reused in other places.

Differential Revision: https://reviews.llvm.org/D87274
Reviewed By: fhahn, lebedev.ri
2020-09-09 11:20:59 +07:00
Florian Hahn c7b7c32f4a [DSE,MemorySSA] Increase walker limit a bit.
This slightly bumps the walker limit so that it covers more cases while
not increasing compile-time too much:
http://llvm-compile-time-tracker.com/compare.php?from=0fc1c2b51ba0cfb9145139af35be638333865251&to=91144a50ea4fa82c0c877e77784f60371640b263&stat=instructions
2020-09-08 14:55:46 +01:00
Andrew Wei 78071fb524 [LSR] Canonicalize a formula before insert it into the list
In GenerateConstantOffsetsImpl, we may generate a non-canonical Formula
if the BaseRegs of that Formula are updated to include a recurrent expression
register related to the current loop while its ScaledReg is not.

Patched by: mdchen
Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D86939
2020-09-08 13:14:53 +08:00