Commit Graph

22423 Commits

Author SHA1 Message Date
Sanjay Patel aff5bee35f [InstCombine] fold extract+insert into identity shuffle
This is similar to the existing fold for splats added with:
rL365379

If we can adjust the shuffle mask to include another element
in an identity mask (if it changes vector length, that's an
extract/insert subvector operation in the backend), then that
can eliminate extractelement/insertelement pairs in IR.

All targets are expected to lower shuffles with identity masks
efficiently.

llvm-svn: 371340
2019-09-08 19:03:01 +00:00
Simon Pilgrim 879ed20bde Fix typo. NFCI
llvm-svn: 371317
2019-09-07 18:09:09 +00:00
Roman Lebedev 45ba26599b [SimplifyCFG] SpeculativelyExecuteBB(): It's SpeculatedInstructions, not SpeculationCost
It counts the number of instructions we are ok speculating
(at most 1 there), not their cost, so rename accordingly.

llvm-svn: 371294
2019-09-07 09:06:06 +00:00
Hideto Ueno f2b9dc4758 [Attributor] ValueSimplify Abstract Attribute
Summary:
This patch introduces initial `AAValueSimplify` which simplifies a value in a context.

example
- (for function returned) If all the return values are the same and constant, then we can replace callsite returned with the constant.
- If an internal function takes the same value(constant) as an argument in the callsite, then we can replace the argument with that constant.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66967

llvm-svn: 371291
2019-09-07 07:03:05 +00:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Evandro Menezes 7d677adf2d [InstCombine] Refactor substitution of instruction in the parent BB (NFC)
Add the new method `LibCallSimplifier::substituteInParent()` that calls
`LibCallSimplifier::replaceAllUsesWith()' and
`LibCallSimplifier::eraseFromParent()` back to back, simplifying the
resulting code.

llvm-svn: 371264
2019-09-06 22:07:11 +00:00
Sanjay Patel 4f0e429acc [SimplifyLibCalls] handle pow(x,-0.0) before it can assert (PR43233)
https://bugs.llvm.org/show_bug.cgi?id=43233

llvm-svn: 371221
2019-09-06 16:10:18 +00:00
Matt Arsenault 524a9d5774 InstCombine: Fix crash on icmp of gep with addrspacecasted null
llvm-svn: 371146
2019-09-05 23:39:21 +00:00
Vitaly Buka 9020f11377 [SimplifyCFG] Don't SimplifyBranchOnICmpChain with ExtraCase
Summary:
Here we try to avoid issues with "explicit branch" with SimplifyBranchOnICmpChain
which can check on undef. Msan by design reports branches on uninitialized
memory and undefs, so we have false report here.

In general msan does not like when we convert

```
// If at least one of them is true we can MSAN is ok if another is undefs
if (a || b)
  return;
```
into
```
// If 'a' is undef MSAN will complain even if 'b' is true
if (a)
  return;
if (b)
  return;
```

Example

Before optimization we had something like this:
```
while (true) {
  bool maybe_undef = doStuff();

  while (true) {
    char c = getChar();
    if (c != 10 && c != 13)
     continue
    break;
  }

  // we know that c == 10 || c == 13 if we get here,
  // so msan know that branch is not affected by maybe_undef
  if (maybe_undef || c == 10 || c == 13)
    continue;
  return;
}
```

SimplifyBranchOnICmpChain will convert that into
```
while (true) {
  bool maybe_undef = doStuff();

  while (true) {
    char c = getChar();
    if (c != 10 && c != 13)
      continue;
    break;
  }

  // however msan will complain here:
  if (maybe_undef)
    continue;

  // we know that c == 10 || c == 13, so either way we will get continue
  switch(c) {
    case 10: continue;
    case 13: continue;
  }
  return;
}
```

Reviewers: eugenis, efriedma

Reviewed By: eugenis, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67205

llvm-svn: 371138
2019-09-05 22:49:34 +00:00
Roman Lebedev 8360c42e25 [InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check
A follow-up for r329011.
This may be changed to produce @llvm.sub.with.overflow in a later patch,
but for now just make things more consistent overall.

A few observations stem from this:
* There does not seem to be a similar one-instruction fold for uadd-overflow
* I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`,
  so since the `icmp` here no longer refers to `sub`,
  reconstructing `usub.with.overflow` will be problematic,
  and will likely require standalone pass (similar to DivRemPairs).

https://rise4fun.com/Alive/Zqs

Name: (A - B) u> A --> B u> A
  %t0 = sub i8 %A, %B
  %r = icmp ugt i8 %t0, %A
=>
  %r = icmp ugt i8 %B, %A

Name: (A - B) u<= A --> B u<= A
  %t0 = sub i8 %A, %B
  %r = icmp ule i8 %t0, %A
=>
  %r = icmp ule i8 %B, %A

Name: C u< (C - D) --> C u< D
  %t0 = sub i8 %C, %D
  %r = icmp ult i8 %C, %t0
=>
  %r = icmp ult i8 %C, %D

Name: C u>= (C - D) --> C u>= D
  %t0 = sub i8 %C, %D
  %r = icmp uge i8 %C, %t0
=>
  %r = icmp uge i8 %C, %D

llvm-svn: 371101
2019-09-05 17:41:02 +00:00
Roman Lebedev ecb7ea1ae7 [InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check
A follow-up for r342004.
This will be changed to produce @llvm.add.with.overflow in a later patch,
but for now just make things more consistent overall.

https://rise4fun.com/Alive/qxE

Name: (Op1 + X) u< Op1 --> ~Op1 u< X
  %t0 = add i8 %Op1, %X
  %r = icmp ult i8 %t0, %Op1
=>
  %n = xor i8 %Op1, -1
  %r = icmp ult i8 %n, %X

Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X
  %t0 = add i8 %Op1, %X
  %r = icmp uge i8 %t0, %Op1
=>
  %n = xor i8 %Op1, -1
  %r = icmp uge i8 %n, %X

;-------------------------------------------------------------------------------

Name: Op0 u> (Op0 + X) --> X u> ~Op0
  %t0 = add i8 %Op0, %X
  %r = icmp ugt i8 %Op0, %t0
=>
  %n = xor i8 %Op0, -1
  %r = icmp ugt i8 %X, %n

Name: Op0 u<= (Op0 + X) --> X u<= ~Op0
  %t0 = add i8 %Op0, %X
  %r = icmp ule i8 %Op0, %t0
=>
  %n = xor i8 %Op0, -1
  %r = icmp ule i8 %X, %n

llvm-svn: 371100
2019-09-05 17:40:49 +00:00
Denis Bakhvalov 58f172f05a [MergedLoadStoreMotion] Sink stores to BB with more than 2 predecessors
If we have:

bb5:
  br i1 %arg3, label %bb6, label %bb7

bb6:
  %tmp = getelementptr inbounds i32, i32* %arg1, i64 2
  store i32 3, i32* %tmp, align 4
  br label %bb9

bb7:
  %tmp8 = getelementptr inbounds i32, i32* %arg1, i64 2
  store i32 3, i32* %tmp8, align 4
  br label %bb9

bb9:  ; preds = %bb4, %bb6, %bb7
  ...

We can't sink stores directly into bb9.
This patch creates new BB that is successor of %bb6 and %bb7
and sinks stores into that block.

SplitFooterBB is the parameter to the pass that controls
that behavior.

Change-Id: I7fdf50a772b84633e4b1b860e905bf7e3e29940f
Differential: https://reviews.llvm.org/D66234
llvm-svn: 371089
2019-09-05 17:00:32 +00:00
Alina Sbirlea 2ac69aadb5 [MemorySSA] Verify MSSAUpdater exists.
llvm-svn: 371087
2019-09-05 16:58:15 +00:00
Hiroshi Yamauchi d842f2eec4 [PGO][CHR] Speed up following long, interlinked use-def chains.
Summary:
Avoid visiting an instruction more than once by using a map.

This is similar to https://reviews.llvm.org/rL361416.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67198

llvm-svn: 371086
2019-09-05 16:56:55 +00:00
Alina Sbirlea ae900d3882 [MemorySSA] Update MemorySSA when removing debug.value calls.
llvm-svn: 371084
2019-09-05 16:25:24 +00:00
Guillaume Chatelet 33671ceffa [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

llvm-svn: 371063
2019-09-05 13:09:42 +00:00
Johannes Doerfert 56e9b608ad [Attributor][Stats] Use the right statistics macro
llvm-svn: 370976
2019-09-04 20:34:57 +00:00
Johannes Doerfert 7ab5253704 [Attributor][Fix] Make sure we do not delete live code
Summary: Liveness needs to mark edges, not blocks as dead.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67191

llvm-svn: 370975
2019-09-04 20:34:52 +00:00
Leonard Chan eca01b031d [NewPM][Sancov] Make Sancov a Module Pass instead of 2 Passes
This patch merges the sancov module and funciton passes into one module pass.

The reason for this is because we ran into an out of memory error when
attempting to run asan fuzzer on some protobufs (pc.cc files). I traced the OOM
error to the destructor of SanitizerCoverage where we only call
appendTo[Compiler]Used which calls appendToUsedList. I'm not sure where precisely
in appendToUsedList causes the OOM, but I am able to confirm that it's calling
this function *repeatedly* that causes the OOM. (I hacked sancov a bit such that
I can still create and destroy a new sancov on every function run, but only call
appendToUsedList after all functions in the module have finished. This passes, but
when I make it such that appendToUsedList is called on every sancov destruction,
we hit OOM.)

I don't think the OOM is from just adding to the SmallSet and SmallVector inside
appendToUsedList since in either case for a given module, they'll have the same
max size. I suspect that when the existing llvm.compiler.used global is erased,
the memory behind it isn't freed. I could be wrong on this though.

This patch works around the OOM issue by just calling appendToUsedList at the
end of every module run instead of function run. The same amount of constants
still get added to llvm.compiler.used, abd we make the pass usage and logic
simpler by not having any inter-pass dependencies.

Differential Revision: https://reviews.llvm.org/D66988

llvm-svn: 370971
2019-09-04 20:30:29 +00:00
Alina Sbirlea 6da79ce1fe [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370957
2019-09-04 19:16:04 +00:00
Johannes Doerfert a7a3b3aa43 [Attributor][Fix] Ensure the attribute names are created properly
The names of the attributes were not always created properly which
caused problems with the yaml output.

llvm-svn: 370956
2019-09-04 19:01:08 +00:00
Philip Reames 4228245e41 [NFC] Switch last couple of invariant_load checks to use hasMetadata
llvm-svn: 370948
2019-09-04 18:27:31 +00:00
David Bolvansky 420cbb6190 [InstCombine] sub(xor(x, y), or(x, y)) -> neg(and(x, y))
Summary:
```
Name: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
%or = or i32 %y, %x
%xor = xor i32 %x, %y
%sub = sub i32 %xor, %or
  =>
%sub1 = and i32 %x, %y
%sub = sub i32 0, %sub1

Optimization: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/8OI

Reviewers: lebedev.ri

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67188

llvm-svn: 370945
2019-09-04 18:03:21 +00:00
David Bolvansky 0e07248704 [InstCombine] Fold sub (and A, B) (or A, B)) to neg (xor A, B)
Summary:
```
Name: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
%or = or i32 %y, %x
%and = and i32 %x, %y
%sub = sub i32 %and, %or
  =>
%sub1 = xor i32 %x, %y
%sub = sub i32 0, %sub1

Optimization: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/VI6

Found by @lebedev.ri. Also author of the proof.

Reviewers: lebedev.ri, spatel

Reviewed By: lebedev.ri

Subscribers: llvm-commits, lebedev.ri

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67155

llvm-svn: 370934
2019-09-04 17:30:53 +00:00
Philip Reames 27820f9909 [Instruction] Add hasMetadata(Kind) helper [NFC]
It's a common idiom, so let's add the obvious wrapper for metadata kinds which are basically booleans.

llvm-svn: 370933
2019-09-04 17:28:48 +00:00
Johannes Doerfert 2f6220633c [Attributor] Look at internal functions only on-demand
Summary:
Instead of building attributes for internal functions which we do not
update as long as we assume they are dead, we now do not create
attributes until we assume the internal function to be live. This
improves the number of required iterations, as well as the number of
required updates, in real code. On our tests, the results are mixed.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66914

llvm-svn: 370924
2019-09-04 16:35:20 +00:00
Johannes Doerfert 97fd582b91 [Attributor] Use the white list for attributes consistently
Summary:
We create attributes on-demand so we need to check the white list
on-demand. This also unifies the location at which we create,
initialize, and eventually invalidate new abstract attributes.

The tests show mixed results, a few more call site attributes are
determined which can cause more iterations.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66913

llvm-svn: 370922
2019-09-04 16:26:20 +00:00
Johannes Doerfert b0412e437c [Attributor] Deal more explicit with non-exact definitions
Summary:
Before we tried to rule out non-exact definitions early but that lead to
on-demand attributes created for them anyway. As a consequence we needed
to look at the definition in the initialize of each attribute again.
This patch centralized this lookup and tightens the condition under
which we give up on non-exact definitions.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67115

llvm-svn: 370917
2019-09-04 16:16:13 +00:00
Sanjay Patel 4a2cd7be5a [InstSimplify] guard against unreachable code (PR43218)
This would crash:
https://bugs.llvm.org/show_bug.cgi?id=43218

llvm-svn: 370911
2019-09-04 15:12:55 +00:00
Alexey Lapshin cbf1f3b771 [Debuginfo][SROA] Need to handle dbg.value in SROA pass.
SROA pass processes debug info incorrecly if applied twice.
Specifically, after SROA works first time, instcombine converts dbg.declare
intrinsics into dbg.value. Inlining creates new opportunities for SROA,
so it is called again. This time it does not handle correctly previously
inserted dbg.value intrinsics.

Differential Revision: https://reviews.llvm.org/D64595

llvm-svn: 370906
2019-09-04 14:19:49 +00:00
David Bolvansky 358b80b340 [InstCombine] Fold sub (or A, B) (and A, B) to (xor A, B)
Summary:
```
Name: sub or and to xor
%or = or i32 %y, %x
%and = and i32 %x, %y
%sub = sub i32 %or, %and
  =>
%sub = xor i32 %x, %y

Optimization: sub or and to xor
Done: 1
Optimization is correct!
```
https://rise4fun.com/Alive/eJu

Reviewers: spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67153

llvm-svn: 370883
2019-09-04 12:00:33 +00:00
Philip Reames 30dc2da827 [GVN] Remove a todo introduced w/rL370791
When I dug into this, it turns out to be *much* more involved than I'd realized and doesn't actually simplify anything.  

The general purpose of the leader table is that we want to find the most-dominating definition quickly.  The problem for equivalance folding is slightly different; we want to find the most dominating *value* whose definition block dominates our use quickly.

To make this change, we'd end up having to restructure the leader table (either the sorting thereof, or maybe even introducing multiple leader tables per value) and that complexity is just not worth it.

llvm-svn: 370824
2019-09-03 21:56:17 +00:00
Alina Sbirlea ccb1862bc9 [MemorySSA] Disable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370821
2019-09-03 21:20:46 +00:00
Johannes Doerfert b19cd27b28 [Attributor] Use the delete API for liveness
Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66833

llvm-svn: 370818
2019-09-03 20:42:16 +00:00
Johannes Doerfert 7516a5e045 [Attributor] Deduce "no-capture" argument attribute
Add the no-capture argument attribute deduction to the Attributor
fixpoint framework.

The new string attributed "no-capture-maybe-returned" is introduced to
allow deduction of no-capture through functions that "capture" an
argument but only by "returning" it. It is only used by the Attributor
for testing.

Differential Revision: https://reviews.llvm.org/D59922

llvm-svn: 370817
2019-09-03 20:37:24 +00:00
Alina Sbirlea e331d50534 [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370811
2019-09-03 19:28:37 +00:00
Philip Reames 37e2f5f125 [GVN] Propagate simple equalities from assumes within the tail of the block
This extends the existing logic for propagating constant expressions in an analogous manner for what we do across basic blocks. The core point is that we chose some order of operands, and canonicalize uses towards that one.

The heuristic used is inspired by the one used across blocks; in a follow up change, I'd plan to common them so that the cross block version uses the slightly stronger ordering herein. 

As noted by the TODOs in the code, there's a good amount of room for improving the existing code and making it more powerful.  Some follow up work planned.

Differential Revision: https://reviews.llvm.org/D66977

llvm-svn: 370791
2019-09-03 17:31:19 +00:00
Roman Lebedev bdd65351d3 Revert r370454 "[LoopIdiomRecognize] BCmp loop idiom recognition"
https://bugs.llvm.org/show_bug.cgi?id=43206 was filed,
claiming that there is a miscompilation.
Reverting until i investigate.

This reverts commit r370454

llvm-svn: 370788
2019-09-03 17:14:56 +00:00
Bjorn Pettersson dd18ce4501 [LV] Fix miscompiles by adding non-header PHI nodes to AllowedExit
Summary:
Fold-tail currently supports reduction last-vector-value live-out's,
but has yet to support last-scalar-value live-outs, including
non-header phi's. As it relies on AllowedExit in order to detect
them and bail out we need to add the non-header PHI nodes to
AllowedExit, otherwise we end up with miscompiles.

Solves https://bugs.llvm.org/show_bug.cgi?id=43166

Reviewers: fhahn, Ayal

Reviewed By: fhahn, Ayal

Subscribers: anna, hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67074

llvm-svn: 370721
2019-09-03 09:33:55 +00:00
Sjoerd Meijer 718f909ccd [LV] Tail-folding, runtime scev checks
Now that we allow tail-folding, not only when we optimise for size, make
sure we do not run in this assert.

Differential revision: https://reviews.llvm.org/D66932

llvm-svn: 370711
2019-09-03 08:53:02 +00:00
Sjoerd Meijer 0469b0e4ef [LV] Tail-folding with runtime memory checks
The loop vectorizer was running in an assert when it tried to fold the tail and
had to emit runtime memory disambiguation checks.

Differential revision: https://reviews.llvm.org/D66803

llvm-svn: 370707
2019-09-03 08:38:24 +00:00
Sanjay Patel 561c39994b [InstCombine] recognize bswap disguised as shufflevector
bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N*8} --> bswap (bitcast X to i{N*8})

In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
...we have a more complicated case where SLP is making a mess of bswap. This patch won't
do anything for that currently, but we need to improve bswap recognition in instcombine,
SLP, and/or a standalone pass to avoid that problem.

This is limited using the data-layout so we don't try to do this transform with actual
vector types. The backend does not appear to have folds to convert in either direction,
so we don't want to mess up something that is actually better lowered as a shuffle.

On x86, we're trading something like this:

  vmovd	%edi, %xmm0
  vpshufb	LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u]
  vmovd	%xmm0, %eax

For:

  movl	%edi, %eax
  bswapl	%eax

Differential Revision: https://reviews.llvm.org/D66965

llvm-svn: 370659
2019-09-02 13:33:20 +00:00
David Bolvansky ff0ad3c43d [InstCombine] mempcpy(d,s,n) to memcpy(d,s,n) + n
Summary:
Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization.

GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n.
https://godbolt.org/z/dOCG96

Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert

Reviewed By: efriedma

Subscribers: hjl.tools, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65737

llvm-svn: 370593
2019-08-31 18:19:05 +00:00
Simon Pilgrim 757cc16ab7 Fix cppcheck shadow variable and variable scope warnings. NFCI.
llvm-svn: 370580
2019-08-31 12:30:19 +00:00
Nikita Popov b9e668f2e7 [CVP] Generate simpler code for elided with.overflow intrinsics
Use a { iN undef, i1 false } struct as the base, and only insert
the first operand, instead of using { iN undef, i1 undef } as the
base and inserting both. This is the same as what we do in InstCombine.

Differential Revision: https://reviews.llvm.org/D67034

llvm-svn: 370573
2019-08-31 09:58:37 +00:00
Wei Mi 798e59b81f [SampleFDO] Add profile symbol list section to discriminate function being
cold versus function being newly added.

This is the second half of https://reviews.llvm.org/D66374.

Profile symbol list is the collection of function symbols showing up in
the binary which generates the current profile. It is used to discriminate
function being cold versus function being newly added. Profile symbol list
is only added for profile with ExtBinary format.

During profile use compilation, when profile-sample-accurate is enabled,
a function without profile will be regarded as cold only when it is
contained in that list.

Differential Revision: https://reviews.llvm.org/D66766

llvm-svn: 370563
2019-08-31 02:27:26 +00:00
Wei Mi 5ef5829fb0 [GVN] Verify value equality before doing phi translation for call instruction
This is an updated version of https://reviews.llvm.org/D66909 to fix PR42605.

Basically, current phi translatation translates an old value number to an new
value number for a call instruction based on the literal equality of call
expression, without verifying there is no clobber in between. This is incorrect.

To get a finegrain check, use MachineDependence analysis to do the job. However,
this is still not ideal. Although given a call instruction,
`MemoryDependenceResults::getCallDependencyFrom` returns identical call
instructions without clobber in between using MemDepResult with its DepType to
be `Def`. However, identical is too strict here and we want it to be relaxed a
little to consider phi-translation -- callee is the same, param operands can be
different. That means changing the semantic of `MemDepResult::Def` and I don't
know the potential impact.

So currently the patch is still conservative to only handle
MemDepResult::NonFuncLocal, which means the current call has no function local
clobber. If there is clobber, even if the clobber doesn't stand in between the
current call and the call with the new value, we won't do phi-translate.

Differential Revision: https://reviews.llvm.org/D67013

llvm-svn: 370547
2019-08-30 23:01:22 +00:00
Johannes Doerfert 659a8707d6 [Attributor] Fix: do not pretend to preserve the CFG
llvm-svn: 370485
2019-08-30 16:35:10 +00:00
Johannes Doerfert 3fac668d83 [Attributor] Use existing function information for the call site
Summary:
Instead of recomputing information for call sites we now use the
function information directly. This is always valid and once we have
call site specific information we can improve here.

This patch also bootstraps attributes that are created on-demand through
an initial update call. Information that is known will then directly be
available in the new attribute without causing an iteration delay.

The tests show how this improves the iteration count.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66781

llvm-svn: 370480
2019-08-30 15:24:52 +00:00
Johannes Doerfert 81df452d82 [Attributor] Manifest load/store alignment generally
Summary:
Any pointer could have load/store users not only floating ones so we
move the manifest logic for alignment into the AAAlignImpl class.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66922

llvm-svn: 370479
2019-08-30 15:22:28 +00:00
Piotr Sobczak 67b979466a [InstCombine][AMDGPU] Simplify tbuffer loads
Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66926

llvm-svn: 370475
2019-08-30 14:20:04 +00:00
Haojian Wu ed170c9bf9 Remove an extra ";", NFC.
llvm-svn: 370465
2019-08-30 12:09:31 +00:00
Hideto Ueno 6381b143f6 [Attributor] Implement AANoAliasCallSiteArgument initialization
Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66927

llvm-svn: 370456
2019-08-30 10:00:32 +00:00
Roman Lebedev 5c9f3cfec7 [LoopIdiomRecognize] BCmp loop idiom recognition
Summary:
@mclow.lists brought up this issue up in IRC.
It is a reasonably common problem to compare some two values for equality.
Those may be just some integers, strings or arrays of integers.

In C, there is `memcmp()`, `bcmp()` functions.
In C++, there exists `std::equal()` algorithm.
One can also write that function manually.

libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ

libc++ does not do anything like that, it simply relies on simple C++'s
`operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)

So likely, there exists a certain performance opportunities.
Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
is using `memcmp()` (in this case, compiled with modified compiler). {F8768213}

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
<...>
BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
<...>
BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
<...>
BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
<...>
BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
<...>
BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
<...>
BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
<...>
BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
<...>
BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
<...>
BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
```

What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size,
  maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
  for uint64_t. I think, that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around
  bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
  naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
  eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
  linearly decreases with element size.
  For uint64_t, it's ~4x+ the elements/second.
* The call obvious is more pricey than the loop, with small element count.
  As it can be seen from the full output {F8768210}, the `memcmp()` is almost
  universally worse, independent of the element size (and thus buffer size) when
  element count is less than 8.

So all in all, bcmp idiom does indeed pose untapped performance headroom.
This diff does implement said idiom recognition. I think a reasonable test
coverage is present, but do tell if there is anything obvious missing.

Now, quality. This does succeed to build and pass the test-suite, at least
without any non-bundled elements. {F8768216} {F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp

Program                                         result-new

MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
MultiSource/Applications/d/make_dparser         3.00
SingleSource/UnitTests/vla                      2.00
MultiSource/Applications/Burg/burg              1.00
MultiSourc.../Applications/JM/lencod/lencod     1.00
MultiSource/Applications/lemon/lemon            1.00
MultiSource/Benchmarks/Bullet/bullet            1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
MultiSourc...Prolangs-C/simulator/simulator     1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text

Program                                        result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
Geomean difference                                                   nan%
         result-old    result-new       diff
count  1.000000e+01  10.00000      10.000000
mean   3.152090e+05  311695.40000  0.006749
std    3.790398e+05  372091.42232  0.036605
min    7.530000e+02  833.00000    -0.034981
25%    4.243300e+04  42401.00000  -0.000866
50%    1.197370e+05  119689.00000 -0.000392
75%    6.397050e+05  639705.00000 -0.000005
max    1.001697e+06  966657.00000  0.106242
```

I don't have timings though.

And now to the code. The basic idea is to completely replace the whole loop.
If we can't fully kill it, don't transform.
I have left one or two comments in the code, so hopefully it can be understood.

Also, there is a few TODO's that i have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???

Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet

Reviewed By: courbet

Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61144

llvm-svn: 370454
2019-08-30 09:51:23 +00:00
Sanjay Patel 65f1c04000 [InstCombine] reduce duplicated code; NFC
llvm-svn: 370399
2019-08-29 19:36:18 +00:00
Alina Sbirlea 4b87023bae Revert enabling MemorySSA.
Breaks sanitizers bots.

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370397
2019-08-29 19:01:23 +00:00
Florian Hahn f9cdb98f40 [LoopUnrollAndJam] Use Lazy strategy for DTU.
We can also apply the earlier updates to the lazy DTU, instead of
applying them directly.

Reviewers: kuhar, brzycki, asbirlea, SjoerdMeijer

Reviewed By: brzycki, asbirlea, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D66918

llvm-svn: 370391
2019-08-29 17:47:58 +00:00
Alina Sbirlea 6289ee941d [MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests.
Summary:
I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance.
The feedback obtains here will guide the next steps towards enabling this.

This patch enables the use of MemorySSA in the loop pass manager.

Passes that currently use MemorySSA:
 - EarlyCSE
Passes that use MemorySSA after this patch:
 - EarlyCSE
 - LICM
 - SimpleLoopUnswitch
Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch):
 - LoopInstSimplify
 - LoopSimplifyCFG
 - LoopUnswitch
 - LoopRotate
 - LoopSimplify
 - LCSSA
Loop passes that do *not* update MemorySSA:
 - IndVarSimplify
 - LoopDelete
 - LoopIdiom
 - LoopSink
 - LoopUnroll
 - LoopInterchange
 - LoopUnrollAndJam
 - LoopVectorize
 - LoopReroll
 - IRCE

Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry

Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370384
2019-08-29 17:08:13 +00:00
Michael Liao 001871dee8 [SimplifyCFG] Skip sinking common lifetime markers of `alloca`.
Summary:
- Similar to the workaround in fix of PR30188, skip sinking common
  lifetime markers of `alloca`. They are mostly left there after
  inlining functions in branches.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66950

llvm-svn: 370376
2019-08-29 16:12:05 +00:00
Roman Lebedev 9f35d2b564 [SimplifyCFG] FoldTwoEntryPHINode(): don't bailout on i1 PHI's if we can hoist a 'not' from incoming values
Summary:
As it can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow'
and got rid of potential for division-by-zero, the control flow remains, we still have that branch.

We have this condition:
```
  // Don't fold i1 branches on PHIs which contain binary operators
  // These can often be turned into switches and other things.
  if (PN->getType()->isIntegerTy(1) &&
      (isa<BinaryOperator>(PN->getIncomingValue(0)) ||
       isa<BinaryOperator>(PN->getIncomingValue(1)) ||
       isa<BinaryOperator>(IfCond)))
    return false;
```
which was added back in rL121764 to help with `select` formation i think?

That check prevents us to flatten the CFG here, even though we know
we no longer need that guard and will be able to drop everything
but the '@llvm.umul.with.overflow' + `not`.

As it can be seen from tests, we end here because the `not` is being
sinked into the PHI's incoming values by InstCombine,
so we can't workaround this by hoisting it to after PHI.

Thus i suggest that we relax that check to not bailout if we'd get to hoist the `not`.

Reviewers: craig.topper, spatel, fhahn, nikic

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65147

llvm-svn: 370349
2019-08-29 12:47:34 +00:00
Roman Lebedev 473a063a5e [InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction
Summary:
`((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that
some unsigned multiplication (will not) overflow.
Currently, we don't catch it. We could:
```
$ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll
Processing /home/lebedevri/llvm-patch1.ll..

----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp ne i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %r = extractvalue {i4, i1} %n0, 1
  ret %r

Done: 1
Optimization is correct!

----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp eq i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1
  ret i1 %r

Done: 1
Optimization is correct!

```

Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65144

llvm-svn: 370348
2019-08-29 12:47:20 +00:00
Roman Lebedev fb38b7aab3 [InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction
Summary:
`(-1 u/ %x) u< %y` is one of (3?) common ways to check that
some unsigned multiplication (will not) overflow.
Currently, we don't catch it. We could:
```
----------------------------------------
Name: no overflow
  %o0 = udiv i4 -1, %x
  %r = icmp ult i4 %o0, %y
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %n0, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: no overflow, swapped
  %o0 = udiv i4 -1, %x
  %r = icmp ugt i4 %y, %o0
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %n0, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: overflow
  %o0 = udiv i4 -1, %x
  %r = icmp uge i4 %o0, %y
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1

Done: 1
Optimization is correct!

----------------------------------------
Name: overflow
  %o0 = udiv i4 -1, %x
  %r = icmp ule i4 %y, %o0
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1

Done: 1
Optimization is correct!
```

As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow`
is easy, if we were looking for the inverted answer, then more work needs to be done
to cleanup the now-pointless control-flow that was guarding against division-by-zero.
This is being addressed in follow-up patches.

Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon

Reviewed By: nikic, xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65143

llvm-svn: 370347
2019-08-29 12:47:08 +00:00
Roman Lebedev f13b0e3ed8 [InstCombine] Shift amount reassociation in bittest: trunc-of-lshr (PR42399)
Summary:
Finally, the fold i was looking forward to :)

The legality check is muddy, i doubt  i've groked the full generalization,
but it handles all the cases i care about, and can come up with:
https://rise4fun.com/Alive/26j

I.e. we can perform the fold if **any** of the following is true:
* The shift amount is either zero or one less than widest bitwidth
* Either of the values being shifted has at most lowest bit set
* The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount;
* The value that is being shifted by `lshr` (which **is** truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one

I strongly suspect there is some better generalization, but i'm not aware of it as of right now.
For now i also avoided using actual `computeKnownBits()`, but restricted it to constants.

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66383

llvm-svn: 370324
2019-08-29 10:26:23 +00:00
Simon Pilgrim 920b04011b Fix variable set but no used warnings on NDEBUG builds. NFCI.
llvm-svn: 370319
2019-08-29 10:08:45 +00:00
Simon Pilgrim ef9c6a7077 Fix variable set but no used warning on NDEBUG builds. NFCI.
llvm-svn: 370317
2019-08-29 09:58:47 +00:00
Hideto Ueno cbab334e40 [Attributor] Deduce "noalias" attribute
Summary:
This patch adds very basic deduction for noalias.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Tags: LLVM

Differential Revision: https://reviews.llvm.org/D66207

llvm-svn: 370295
2019-08-29 05:52:00 +00:00
Florian Hahn 3177b92231 [LoopUnroll] Use Lazy strategy for DTU used for MergeBlockIntoPredecessor.
We do not access the DT in the loop, so we do not have to apply updates
eagerly. We can apply them lazyly and flush them after we are done
merging blocks.

As follow-up work, we might be able to use the DTU above as well,
instead of manually updating the DT.

This brings the example from PR43134 from ~100s to ~4s for a relase +
assertions build on my machine.

Reviewers: efriedma, kuhar, asbirlea, brzycki

Reviewed By: kuhar, brzycki

Differential Revision: https://reviews.llvm.org/D66911

llvm-svn: 370292
2019-08-29 04:26:29 +00:00
Johannes Doerfert bf112139ac [Attributor] Improve messages in iteration verify mode
When we now verify the iteration count we will see the actual count
and the expected count before the assertion is triggered.

llvm-svn: 370285
2019-08-29 01:29:44 +00:00
Johannes Doerfert 62a9c1da78 [Attributor][Fix] Indicate change correctly
llvm-svn: 370283
2019-08-29 01:26:58 +00:00
Johannes Doerfert 1aac182f31 [Attributor] Fix typo
llvm-svn: 370282
2019-08-29 01:26:09 +00:00
Artur Pilipenko 925afc1ce7 Fix for "DICompileUnit not listed in llvm.dbg.cu" verification error after ...
...cloning a function from a different module

Currently when a function with debug info is cloned from a different module, the 
cloned function may have hanging DICompileUnits, so that the module with the 
cloned function fails debug info verification.

The proposed fix inserts all DICompileUnits reachable from the cloned function 
to "llvm.dbg.cu" metadata operands of the cloned function module. 

Reviewed By: aprantl, efriedma

Differential Revision: https://reviews.llvm.org/D66510

Patch by Oleg Pliss (Oleg.Pliss@azul.com)

llvm-svn: 370265
2019-08-28 21:27:50 +00:00
Julian Lettner 3ae9b9d5e4 [ASan] Make insertion of version mismatch guard configurable
By default ASan calls a versioned function
`__asan_version_mismatch_check_vXXX` from the ASan module constructor to
check that the compiler ABI version and runtime ABI version are
compatible. This ensures that we get a predictable linker error instead
of hard-to-debug runtime errors.

Sometimes, however, we want to skip this safety guard. This new command
line option allows us to do just that.

rdar://47891956

Reviewed By: kubamracek

Differential Revision: https://reviews.llvm.org/D66826

llvm-svn: 370258
2019-08-28 20:40:55 +00:00
Sanjay Patel 2d4b6777c4 [InstCombine] clean up wrap propagation for reassociated ops; NFCI
Always true/false checks were flagged by static analysis;
https://bugs.llvm.org/show_bug.cgi?id=43143

I have not confirmed the logic difference in propagating nsw vs. nuw,
but presumably we would have noticed a bug by now if that was wrong.

llvm-svn: 370248
2019-08-28 18:58:06 +00:00
Pirama Arumuga Nainar 19205abaaa [ValueMapper] NFC: Remove dead code to pause metadata mapping
Summary:
This functionality was added when Mapper::mapMetadata was recursive.  It
is no longer needed after r265456, which switched it to be iterative.

Reviewers: dexonsmith, srhines

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66860

llvm-svn: 370236
2019-08-28 17:43:14 +00:00
Johannes Doerfert f7ca0fe1c8 [Attributor] Regularly clear dependences to remove spurious ones
As dependences between abstract attributes can become stale, e.g., if
one was sufficient to imply another one at some point but it has since
been wakened to the point it is not usable for the formerly implied one.
To weed out spurious dependences, and thereby eliminate unneeded
updates, we introduce an option to determine how often the dependence
cache is cleared and recomputed during the fixpoint iteration.

Note that the initial value was determined such that we see a positive
result on our tests.

Differential Revision: https://reviews.llvm.org/D63315

llvm-svn: 370230
2019-08-28 16:58:52 +00:00
Simon Pilgrim 1d8a886c59 Reduce scope of variable only used in a local pattern match. NFCI.
llvm-svn: 370224
2019-08-28 16:06:33 +00:00
Craig Topper f79d8a064c [InstCombine] Disable recursion in foldGEPICmp for vector pointer GEPs
Due to missing vector support in this function, recursion can
generate worse code in some cases.

llvm-svn: 370221
2019-08-28 15:40:34 +00:00
Simon Pilgrim b569624049 Fix uninitialized variable warning in cppcheck. NFCI.
InstCombiner::MaxArraySizeForCombine is set outside the constructor so we need to ensure it has a default initialization value.

llvm-svn: 370220
2019-08-28 15:19:49 +00:00
David Bolvansky af118bb6d0 [NFC] Added a comment to avoid possible confusion
llvm-svn: 370217
2019-08-28 15:04:48 +00:00
Simon Pilgrim 316bfb0f48 Remove duplicate 'BitWidth' variable. NFCI.
llvm-svn: 370212
2019-08-28 14:37:44 +00:00
Johannes Doerfert 07a5c129c6 [Attributor] Restrict liveness and return information to functions
Summary:
Until we have proper call-site information we should not recompute
liveness and return information for each call site. This patch directly
uses the function versions and introduces TODOs at the usage sites.

The required iterations to get to the fixpoint are most of the time
reduced by this change and we always avoid work duplication.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66562

llvm-svn: 370208
2019-08-28 14:09:14 +00:00
Simon Pilgrim 284118ce3b InstCombiner::visitSelectInst - rename Pred to MinMaxPred to stop shadow variable warning. NFCI.
We have a lot of Predicate variables, all similarly named....

llvm-svn: 370207
2019-08-28 14:05:38 +00:00
Ayal Zaks d15df0ede5 [LV] Fold tail by masking - handle reductions
Allow vectorizing loops that have reductions when tail is folded by masking.
A select is introduced in VPlan, choosing between the last value carried by the
loop-exit/live-out instruction of the reduction, and the penultimate value
carried by the reduction phi, according to the "i < n" mask of fold-tail.
This select replaces the last value as the live-out value of the loop.

Differential Revision: https://reviews.llvm.org/D66720

llvm-svn: 370173
2019-08-28 09:02:23 +00:00
David Bolvansky 05bda8b4e5 Annotate return values of allocation functions with dereferenceable_or_null
Summary:
Example
define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6
  ret i8* %call
}


Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: aaron.ballman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66651

llvm-svn: 370168
2019-08-28 08:28:20 +00:00
Fangrui Song 6964027315 [LoopFusion] Fix another -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build
llvm-svn: 370156
2019-08-28 03:12:40 +00:00
Craig Topper 5bbb604bb5 [InstCombine] Disable some portions of foldGEPICmp for GEPs that return a vector of pointers. Fix other portions.
llvm-svn: 370114
2019-08-27 21:38:56 +00:00
Philip Reames 2694522f13 [Loads/SROA] Remove blatantly incorrect code and fix a bug revealed in the process
The code we had isSafeToLoadUnconditionally was blatantly wrong. This function takes a "Size" argument which is supposed to describe the span loaded from. Instead, the code use the size of the pointer passed (which may be unrelated!) and only checks that span. For any Size > LoadSize, this can and does lead to miscompiles.

Worse, the generic code just a few lines above correctly handles the cases which *are* valid. So, let's delete said code.

Removing this code revealed two issues:
1) As noted by jdoerfert the removed code incorrectly handled external globals.  The test update in SROA is to stop testing incorrect behavior.
2) SROA was confusing bytes and bits, but this wasn't obvious as the Size parameter was being essentially ignored anyway.  Fixed.

Differential Revision: https://reviews.llvm.org/D66778

llvm-svn: 370102
2019-08-27 19:34:43 +00:00
David Bolvansky 0c2692108c [InstCombine] Fold select with ctlz to cttz
Summary:
Handle pattern [0]:

int ctz(unsigned int a)
{
  int c = __clz(a & -a);
  return a ? 31 - c : c;
}

In reality, the compiler can generate much better code for cttz, so fold away this pattern.

https://godbolt.org/z/c5kPtV

 [0] https://community.arm.com/community-help/f/discussions/2114/count-trailing-zeros

Reviewers: spatel, nikic, lebedev.ri, dmgreen, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66308

llvm-svn: 370037
2019-08-27 10:22:40 +00:00
Johannes Doerfert 39681e733c [Attributor] Introduce an API to delete stuff
Summary:
During the fixpoint iteration, including the manifest stage, we should
not delete stuff as other abstract attributes might have a reference to
the value. Through the API this can now be done safely at the very end.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66779

llvm-svn: 370014
2019-08-27 04:57:54 +00:00
Vitaly Buka aeca56964f msan, codegen, instcombine: Keep more lifetime markers used for msan
Reviewers: eugenis

Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D66695

llvm-svn: 369979
2019-08-26 22:15:50 +00:00
Evgeniy Stepanov ed4fefb0df [hwasan] Fix test failure in r369721.
Try harder to emulate "old runtime" in the test.
To get the old behavior with the new runtime library, we need both
disable personality function wrapping and enable landing pad
instrumentation.

llvm-svn: 369977
2019-08-26 21:44:55 +00:00
Philip Reames cf3b555973 Add a clarify comment for meaning of SafePointes [NFC]
Extracted from D66688 as requested.

llvm-svn: 369962
2019-08-26 20:48:35 +00:00
Philip Reames b92c971099 [InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null for vectors
Extend the transform introduced in https://reviews.llvm.org/D66608 to work for vector geps as well.

Differential Revision: https://reviews.llvm.org/D66671

llvm-svn: 369949
2019-08-26 19:11:49 +00:00
Johannes Doerfert b504eb8bb5 [Attributor] Adjust and test the iteration bound of tests
Summary:
Try to verify how many iterations we need for a fixpoint in our tests.
This patch adjust the way we count to make it easier to follow. It also
adjusts the bounds to actually account for a fixpoint and not only the
minimum number to pass all checks.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66757

llvm-svn: 369945
2019-08-26 18:55:47 +00:00
Johannes Doerfert a4a308cc25 [Attributor] Further cut down on non-determinism
llvm-svn: 369936
2019-08-26 17:51:23 +00:00
Johannes Doerfert 19b0043641 [Attributor] Allow explicit dependence tracking
By default, the Attributor tracks potential dependences between abstract
attributes based on the issued Attributor::getAAFor queries. This
simplifies the development of new abstract attributes but it can also
lead to spurious dependences that might increase compile time and make
internalization harder (D63312). With this patch, abstract attributes
can opt-out of implicit dependence tracking and instead register
dependences explicitly. It is up to the implementation to make sure all
existing dependences are registered.

Differential Revision: https://reviews.llvm.org/D63314

llvm-svn: 369935
2019-08-26 17:48:05 +00:00
Bjorn Pettersson d804bd17de [LoopUnroll] Handle certain PHIs in full unrolling properly
Summary:
When reconstructing the CFG of the loop after unrolling,
LoopUnroll could in some cases remove the phi operands of
loop-carried values instead of preserving them, resulting
in undef phi values after loop unrolling.

When doing this reconstruction, avoid removing incoming
phi values for phis in the successor blocks if the successor
is the block we are jumping to anyway.

Patch-by: ebevhan

Reviewers: fhahn, efriedma

Reviewed By: fhahn

Subscribers: bjope, lebedev.ri, zzheng, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66334

llvm-svn: 369886
2019-08-26 09:29:53 +00:00
Roman Lebedev 9cf08c6de1 [Constant] Add 'isElementWiseEqual()' method
Promoting it from InstCombine's tryToReuseConstantFromSelectInComparison().

Return true if this constant and a constant 'Y' are element-wise equal.
This is identical to just comparing the pointers, with the exception that
for vectors, if only one of the constants has an `undef` element in some
lane, the constants still match.

llvm-svn: 369842
2019-08-24 06:49:51 +00:00
Roman Lebedev de19f749e0 [InstCombine] matchThreeWayIntCompare(): commutativity awareness
Summary:
`matchThreeWayIntCompare()` looks for
```
   select i1 (a == b),
          i32 Equal,
          i32 (select i1 (a < b), i32 Less, i32 Greater)
```
but both of these selects/compares can be in it's commuted form,
so out of 8 variants, only the two most basic ones is handled.
This fixes regression being introduced in D66232.

Reviewers: spatel, nikic, efriedma, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66607

llvm-svn: 369841
2019-08-24 06:49:36 +00:00
Roman Lebedev 2c75fe7f2a [InstCombine] Try to reuse constant from select in leading comparison
Summary:
If we have e.g.:
```
  %t = icmp ult i32 %x, 65536
  %r = select i1 %t, i32 %y, i32 65535
```
the constants `65535` and `65536` are suspiciously close.
We could perform a transformation to deduplicate them:
```
Name: ult
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
  =>
%t.inv = icmp ugt i32 %x, 65535
%r = select i1 %t.inv, i32 65535, i32 %y
```
https://rise4fun.com/Alive/avb

While this may seem esoteric, this should certainly be good for vectors
(less constant pool usage) and for opt-for-size - need to have only one constant.

But the real fun part here is that it allows further transformation,
in particular it finishes cleaning up the `clamp` folding,
see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`.
We start with e.g.
```
  %dont_need_to_clamp_positive = icmp sle i32 %X, 32767
  %dont_need_to_clamp_negative = icmp sge i32 %X, -32768
  %clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767
  %dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative
  %R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit
```
without this patch we currently produce
```
  %1 = icmp slt i32 %X, 32768
  %2 = icmp sgt i32 %X, -32768
  %3 = select i1 %2, i32 %X, i32 -32768
  %R = select i1 %1, i32 %3, i32 32767
```
which isn't really a `clamp` - both comparisons are performed on the original value,
this patch changes it into
```
  %1.inv = icmp sgt i32 %X, 32767
  %2 = icmp sgt i32 %X, -32768
  %3 = select i1 %2, i32 %X, i32 -32768
  %R = select i1 %1.inv, i32 32767, i32 %3
```
and then the magic happens! Some further transform finishes polishing it and we finally get:
```
  %t1 = icmp sgt i32 %X, -32768
  %t2 = select i1 %t1, i32 %X, i32 -32768
  %t3 = icmp slt i32 %t2, 32767
  %R = select i1 %t3, i32 %t2, i32 32767
```
which is beautiful and just what we want.

Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization:
https://rise4fun.com/Alive/THl
Proofs for the fold itself: https://rise4fun.com/Alive/THl

Reviewers: spatel, dmgreen, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66232

llvm-svn: 369840
2019-08-24 06:49:25 +00:00