Commit Graph

17383 Commits

Author SHA1 Message Date
Chad Rosier 95abfa35d6 [Reassociate] Add negated value of negative constant to the Duplicates list.
In OptimizeAdd, we scan the operand list to see if there are any common factors
between operands that can be factored out to reduce the number of multiplies
(e.g., 'A*A+A*B*C+D' -> 'A*(A+B*C)+D'). For each operand of the operand list, we
only consider unique factors (which is tracked by the Duplicate set). Now if we
find a factor that is a negative constant, we add the negated value as a factor
as well, because we can percolate the negate out. However, we mistakenly don't
add this negated constant to the Duplicates set.

Consider the expression A*2*-2 + B. Obviously, nothing to factor.

For the added value A*2*-2 we over count 2 as a factor without this change,
which causes the assert reported in PR30256.  The problem is that this code is
assuming that all the multiply operands of the add are already reassociated.
This change avoids the issue by making OptimizeAdd tolerate multiplies which
haven't been completely optimized; this sort of works, but we're doing wasted
work: we'll end up revisiting the add later anyway.

Another possible approach would be to enforce RPO iteration order more strongly.
If we have RedoInsts, we process them immediately in RPO order, rather than
waiting until we've finished processing the whole function. Intuitively, it
seems like the natural approach: reassociation works on expression trees, so
the optimization only works in one direction. That said, I'm not sure how
practical that is given the current Reassociate; the "optimal" form for an
expression depends on its use list (see all the uses of "user_back()"), so
Reassociate is really an iterative optimization of sorts, so any changes here
would probably get messy.

PR30256

Differential Revision: https://reviews.llvm.org/D30228

llvm-svn: 296003
2017-02-23 18:49:03 +00:00
Dehao Chen 533bc6ea8e Use base discriminator in sample pgo profile matching.
Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching.

Reviewers: dblaikie, davidxl

Reviewed By: dblaikie, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30218

llvm-svn: 295999
2017-02-23 18:27:45 +00:00
Filipe Cabecinhas 33dd486f1d [AddressSanitizer] Add PS4 offset
llvm-svn: 295994
2017-02-23 17:10:28 +00:00
Sanjay Patel 68e4cb3c86 [InstCombine] use loop instead of recursion to peek through FPExt; NFCI
llvm-svn: 295992
2017-02-23 16:39:51 +00:00
Sanjay Patel adf2ab16e4 [InstCombine] use 'match' to reduce code; NFCI
llvm-svn: 295991
2017-02-23 16:26:03 +00:00
Alexey Bataev f77d1656af [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295972
2017-02-23 13:37:09 +00:00
Alexey Bataev 14b370c1bf Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"
This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84.

llvm-svn: 295957
2017-02-23 11:09:35 +00:00
Alexey Bataev 7ae653285d [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295956
2017-02-23 10:57:15 +00:00
Alexey Bataev 7337212e83 Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"
This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d.

llvm-svn: 295951
2017-02-23 09:59:29 +00:00
Alexey Bataev 68f2402c61 [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295949
2017-02-23 09:40:38 +00:00
Matt Arsenault f0a88dbaab LoadStoreVectorizer: Split even sized illegal chains properly
Implement isLegalToVectorizeLoadChain for AMDGPU to avoid
producing private address spaces accesses that will need to be
split up later. This was doing the wrong thing in the case
where the queried chain was an even number of elements.

A possible <4 x i32> store was being split into
store <2 x i32>
store i32
store i32

rather than
store <2 x i32>
store <2 x i32>

when legal.

llvm-svn: 295933
2017-02-23 03:58:53 +00:00
Matt Arsenault d4bca1e9ef AMDGPU: Replace disabled exp inputs with undef
llvm-svn: 295914
2017-02-23 00:44:03 +00:00
Michael Kuperstein 6181c62b95 Revert r295868 because it breaks a different SLP lit test.
llvm-svn: 295906
2017-02-22 23:35:13 +00:00
Matt Arsenault f5262256a1 AMDGPU: Add replacement bfe intrinsics
llvm-svn: 295899
2017-02-22 23:04:58 +00:00
Sanjay Patel 4805ce0b17 [InstCombine] don't try SimplifyDemandedInstructionBits from add/sub because it's slow and unlikely to succeed
Notably, no regression tests change when we remove these calls, and these are expensive calls.

The motivation comes from the general acknowledgement that the compiler is getting slower:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109188.html
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html

And specifically the test case attached to PR32037:
https://bugs.llvm.org//show_bug.cgi?id=32037

Profiling the middle-end (opt) part of the compile:
$ ./opt -O2 row_common.bc -o /dev/null

...visitAdd and visitSub are near the top of the instcombine list, and the calls to SimplifyDemandedInstructionBits()
are high within each of those. Those calls account for 1%+ of the opt time in either debug or release profiles. And 
that's the rough win I see from this patch when testing opt built release from r295864 on an iMac with Haswell 4GHz
(model 4790K).

It seems unlikely that we'd be able to eliminate add/sub or change their operands given that add/sub normally affect
all bits, and the PR32037 example shows no IR difference after this change using -O2.

Also worth noting - the code comment in visitAdd:
// This handles stuff like (X & 254)+1 -> (X&254)|1
...isn't true. That transform is handled later with a call to haveNoCommonBitsSet().

Differential Revision: https://reviews.llvm.org/D30270

llvm-svn: 295898
2017-02-22 23:01:12 +00:00
Daniel Berlin fccbda967a PredicateInfo: Support switch statements
Summary:
Depends on D29606 and D29682

Makes us pass GVN's edge.ll (we also will pass a few other testcases
they just need cleaning up).

Thoughts on the Predicate* hiearchy of classes especially welcome :)
(it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something).

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29747

llvm-svn: 295889
2017-02-22 22:20:58 +00:00
Daniel Berlin 17e8d0eae2 Move updating functions to MemorySSAUpdater.
Add updater to passes that now need it.
Move around code in MemorySSA to expose needed functions.

Summary: Mostly cleanup

Reviewers: george.burgess.iv

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D30221

llvm-svn: 295887
2017-02-22 22:19:55 +00:00
Wei Mi 74d5a90fa6 [LSR] Canonicalize formula and put recursive Reg related with current loop in ScaledReg.
After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs.
Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg.
But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg.
Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant
parts RegA and RegB are not grouped together and cannot be promoted outside of loop.
With this patch, it will ensure lsr.iv to be generated later in the expr:
RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop.

Differential Revision: https://reviews.llvm.org/D26781

llvm-svn: 295884
2017-02-22 21:47:08 +00:00
Alexey Bataev b551a81c28 [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result
Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295868
2017-02-22 20:06:40 +00:00
Karl-Johan Karlsson 6eaed7aceb [LoopVectorize] Added address space check when analysing interleaved accesses
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.

This is fixing pr31900.

Reviewers: HaoLiu, mssimpso, mkuper

Reviewed By: mssimpso, mkuper

Subscribers: llvm-commits, efriedma, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29717

This reverts r295042 (re-applies r295038) with an additional fix for the
buildbot problem.

llvm-svn: 295858
2017-02-22 18:37:36 +00:00
Alexey Bataev 2e0031b371 [SLP] Remove unused initial value from the variable, NFC.
llvm-svn: 295826
2017-02-22 12:57:58 +00:00
Sean Silva 9011aca5f4 Use const-ref in range-loop for to avoid copying pairs of std::string
No reason to create temporaries.

Differential Revision: https://reviews.llvm.org/D29871

Patch by sergio.martins!

llvm-svn: 295807
2017-02-22 06:34:04 +00:00
Matt Arsenault 1f17c66890 AMDGPU: Add cvt.pkrtz intrinsic
Convert llvm.SI.packf16 test uses

llvm-svn: 295797
2017-02-22 00:27:34 +00:00
Michael Kuperstein c2af82b4b7 [LoopUnroll] Enable PGO-based loop peeling by default.
This enables peeling of loops with low dynamic iteration count by default,
when profile information is available.

Differential Revision: https://reviews.llvm.org/D27734

llvm-svn: 295796
2017-02-22 00:27:34 +00:00
Xin Tong ccee0e0c05 Make default value for disable-licm-promotion in licm explicit.
llvm-svn: 295767
2017-02-21 20:53:48 +00:00
Sanjay Patel cb731f1538 [InstCombine] canonicalize non-obivous forms of integer min/max
This is part of trying to clean up our handling of min/max patterns in IR.
By converting these to canonical form, we're more likely to recognize them
because there are various places in InstCombine that don't use 
matchSelectPattern or m_SMax and friends.

The backend fixups referenced in the now deleted TODO comment were added with:
https://reviews.llvm.org/rL291392
https://reviews.llvm.org/rL289738

If there's any codegen fallout from this change, we should be able to address
it in DAGCombiner or target-specific lowering. 

llvm-svn: 295758
2017-02-21 19:33:53 +00:00
Xin Tong ebfe01c121 [LoopSimplify] Simplify how we compute UniqueExit
Summary: Simplify how we compute UniqueExit. Reuse ExitBlockSet.

Reviewers: sanjoy, efriedma, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30182

llvm-svn: 295751
2017-02-21 19:10:58 +00:00
Anna Thomas ec36f3b79a [InstCombine] Do not exercise nested max/min pattern on abs
Summary:
This is a fix for assertion failure in
`getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern.

We should not be invoking the simplification rule for
ABS(MIN(~ x,y))) or ABS(MAX(~x,y)) combinations.

Added a test case which would cause an assertion failure without the patch.

Reviewers: sanjoy, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30051

llvm-svn: 295719
2017-02-21 14:40:28 +00:00
Evgeny Stupachenko 9909872e30 The patch introduces new way of narrowing complex (>UINT16 variants) solutions.
The new method introduced under "-lsr-exp-narrow" option (currenlty set to true).

Summary:

The method is based on registers number mathematical expectation and should be
 generally closer to optimal solution.
Please see details in comments to
 "LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function
 (in lib/Transforms/Scalar/LoopStrengthReduce.cpp).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D29862

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 295704
2017-02-21 07:34:40 +00:00
Sanjoy Das 90208720e3 Add a wrapper around copy_if in STLExtras; NFC
I will add one more use for this in a later change.

llvm-svn: 295685
2017-02-21 00:38:44 +00:00
Sanjoy Das 85cd132068 [IndVars] Add an assert
We've already checked that the loop is in simplify form before, but a
little paranoia never hurt anyone.

llvm-svn: 295680
2017-02-20 23:37:11 +00:00
Daniel Berlin 78cbd28102 MemorySSA: Add support for renaming uses in the updater.
Summary:
This lets one add aliasing stores to the updater.
(i'm next going to move the creation/etc functions to the updater)

Reviewers: george.burgess.iv

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D30154

llvm-svn: 295677
2017-02-20 22:26:03 +00:00
Alexey Bataev f96465b9b8 [SLP] nullptr'ize initial value in `findBuildAggregate()`, NFC.
Initial value of V is sett nullptr, as it is not used.

llvm-svn: 295642
2017-02-20 08:04:11 +00:00
Alexey Bataev 2f6b124e01 [SLP] Rework `findBuildAggregate()` from ercursive form to iterative, NFC.
Reviewers: mkuper

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D30103

llvm-svn: 295641
2017-02-20 07:49:39 +00:00
Simon Pilgrim 5798180947 Removed extra ';'
llvm-svn: 295603
2017-02-19 12:32:44 +00:00
Daniel Berlin a4b5c01dd2 Add a DebugCounter for PredicateInfo renaming, and an associated test
llvm-svn: 295594
2017-02-19 04:29:01 +00:00
Simon Pilgrim dba9011942 Fix unused variable warning when assertions are disabled.
llvm-svn: 295587
2017-02-19 00:33:37 +00:00
Daniel Berlin f7d9580a08 NewGVN: Start making use of predicateinfo pass.
Summary: This begins using the predicateinfo pass in NewGVN.

Reviewers: davide

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D29682

llvm-svn: 295583
2017-02-18 23:06:50 +00:00
Daniel Berlin b355c4ff5f NewGVN: Make ranking prefer undef to constants. Fix direction of
shouldSwapOperands to be correct.

llvm-svn: 295582
2017-02-18 23:06:47 +00:00
Daniel Berlin 588e0be39d PredicateInfo: Clean up predicate info a little, using insertion
helpers, and fixing support for the renaming the comparison.

llvm-svn: 295581
2017-02-18 23:06:38 +00:00
Sanjay Patel 53c5c3d65d [InstCombine] add nsw/nuw X, signbit --> or X, signbit
Changing to 'or' (rather than 'xor' when no wrapping flags are set)
allows icmp simplifies to happen as expected.

Differential Revision: https://reviews.llvm.org/D29729

llvm-svn: 295574
2017-02-18 22:20:09 +00:00
Piotr Padlewski cc5868c186 [MemorySSA] NFC small fixes
Summary:
2 small fixes extracted from
https://reviews.llvm.org/D29064

Reviewers: kuhar, davide, dberlin, george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30109

llvm-svn: 295566
2017-02-18 20:34:36 +00:00
Dehao Chen 7d230325ef Increases full-unroll threshold.
Summary:
The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold

This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge):

Performance:

403	0.11%
433	0.51%
445	0.48%
447	3.50%
453	1.49%
464	0.75%

Code size:

403	0.56%
433	0.96%
445	2.16%
447	2.96%
453	0.94%
464	8.02%

The compiler time overhead is similar with code size.

Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc

Reviewed By: hfinkel, chandlerc

Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D28368

llvm-svn: 295538
2017-02-18 03:46:51 +00:00
Justin Bogner 7bc978b543 OptDiag: Allow constructing DiagnosticLocation from DISubprograms
This avoids creating a DILocation just to represent a line number,
since creating Metadata is expensive. Creating a DiagnosticLocation
directly is much cheaper.

llvm-svn: 295531
2017-02-18 02:00:27 +00:00
Davide Italiano bca05df38b [NewGVN] isOnlyReachableViaThisEdge() is dead now. NFCI.
llvm-svn: 295503
2017-02-17 22:12:30 +00:00
Davide Italiano 6db0ecafd7 [NewGVN] createVariableOrConstant is not required anymore. NFCI.
llvm-svn: 295500
2017-02-17 21:55:47 +00:00
Peter Collingbourne 184773d81f WholeProgramDevirt: For VCP use a 32-bit ConstantInt for the byte offset.
A future change will cause this byte offset to be inttoptr'd and then exported
via an absolute symbol. On the importing end we will expect the symbol to be
in range [0,2^32) so that it will fit into a 32-bit relocation. The problem
is that on 64-bit architectures if the offset is negative it will not be in
the correct range once we inttoptr it.

This change causes us to use a 32-bit integer so that it can be inttoptr'd
(which zero extends) into the correct range.

Differential Revision: https://reviews.llvm.org/D30016

llvm-svn: 295487
2017-02-17 19:43:45 +00:00
Peter Collingbourne 37317f1207 WholeProgramDevirt: Examine the function body when deciding whether functions are readnone.
The goal is to get an analysis result even for de-refineable functions.

Differential Revision: https://reviews.llvm.org/D29803

llvm-svn: 295472
2017-02-17 18:17:04 +00:00
Matthew Simpson f68e183f91 [LV] Remove constant restriction for vector phi creation
We previously only created a vector phi node for an induction variable if its
step had a constant integer type. However, the step actually only needs to be
loop-invariant. We only handle inductions having loop-invariant steps, so this
patch should enable vector phi node creation for all integer induction
variables that will be vectorized.

Differential Revision: https://reviews.llvm.org/D29956

llvm-svn: 295456
2017-02-17 16:09:07 +00:00
Eugene Leviant 958fcd7502 InstCombine: fix extraction when performing vector/array punning
Differential revision: https://reviews.llvm.org/D29491

llvm-svn: 295429
2017-02-17 07:36:03 +00:00
Sanjoy Das 8b859c26ec [JumpThreading] Re-enable JumpThreading for guards
Summary:
JumpThreading for guards feature has been reverted at https://reviews.llvm.org/rL295200
due to the following problem: the feature used the following algorithm for detection of
diamond patters:

1. Find a block with 2 predecessors;
2. Check that these blocks have a common single parent;
3. Check that the parent's terminator is a branch instruction.

The problem is that these checks are insufficient. They may pass for a non-diamond
construction in case if those two predecessors are actually the same block. This may
happen if parent's terminator is a br (either conditional or unconditional) to a block
that ends with "switch" instruction with exactly two branches going to one block.

This patch re-enables the JumpThreading for guards and fixes this issue by adding the
check that those found predecessors are actually different blocks. This guarantees that
parent's terminator is a conditional branch with exactly 2 different successors, which
is now ensured by assertions. It also adds two more tests for this situation (with parent's
terminator being a conditional and an unconditional branch).

Patch by Max Kazantsev!

Reviewers: anna, sanjoy, reames

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30036

llvm-svn: 295410
2017-02-17 04:21:14 +00:00
Matt Arsenault c18b67745b Bug 31948: Fix assertion when bitcasting constantexpr pointers
llvm-svn: 295387
2017-02-17 00:32:19 +00:00
Wei Mi 493fb266ed [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops
In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops
other than current loop. This is good for the case when induction variable
of outerloop being used in expr in innerloop. But it is very bad to allow
such Reg from sibling loop because we may need to add lsr.iv in other sibling
loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase
below, one loop can be inserted with a bunch of lsr.iv because of LSR for
other loops. 

// The induction variable j from a loop in the middle will have initial
// value generated from previous sibling loop and exit value used by its
// next sibling loop.
void goo(long i, long j); 
long cond; 

void foo(long N) { 
long i = 0; 
long j = 0; 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
} 

The fix is to only allow formula with SCEVAddRecExpr type of Reg from current
loop or its parents.

Differential Revision: https://reviews.llvm.org/D30021

llvm-svn: 295378
2017-02-16 21:27:31 +00:00
Matt Arsenault 920576042d InstCombine: Canonicalize fast fmuladd to fmul + fadd
llvm-svn: 295353
2017-02-16 18:46:24 +00:00
Craig Topper 3731f4d173 [AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit.
llvm-svn: 295294
2017-02-16 07:35:23 +00:00
Peter Collingbourne 08eb081ac3 PMB: Add an importing WPD pass to the start of the ThinLTO backend pipeline.
Differential Revision: https://reviews.llvm.org/D30008

llvm-svn: 295260
2017-02-15 23:48:38 +00:00
Peter Collingbourne 50cbd7cc90 Re-apply r295110 and r295144 with a fix for the ASan issue.
llvm-svn: 295241
2017-02-15 21:56:51 +00:00
Sanjay Patel 845ea963aa [InstCombine] improve formatting; NFC
llvm-svn: 295237
2017-02-15 21:31:34 +00:00
Arnold Schwaighofer 8d61e0030a AddressSanitizer: don't track swifterror memory addresses
They are register promoted by ISel and so it makes no sense to treat them as
memory.

Inserting calls to the thread sanitizer would also generate invalid IR.

You would hit:

"swifterror value can only be loaded and stored from, or as a swifterror
argument!"

llvm-svn: 295230
2017-02-15 20:43:43 +00:00
Arnold Schwaighofer 8eb1a48540 ThreadSanitizer: don't track swifterror memory addresses
They are register promoted by ISel and so it makes no sense to treat them as
memory.

Inserting calls to the thread sanitizer would also generate invalid IR.

You would hit:

"swifterror value can only be loaded and stored from, or as a swifterror
argument!"

llvm-svn: 295215
2017-02-15 18:57:06 +00:00
Anna Thomas 94c8d4976c Revert "[JumpThreading] Thread through guards"
This reverts commit r294617.

We fail on an assert while trying to get a condition from an
unconditional branch.

llvm-svn: 295200
2017-02-15 17:08:29 +00:00
Sanjay Patel 288f075f8e [InlineFunction] use getFunction(); NFC
llvm-svn: 295185
2017-02-15 15:22:18 +00:00
Sanjay Patel 32d753cae3 [InlineFunction] use getCaller(); NFCI
llvm-svn: 295181
2017-02-15 15:08:38 +00:00
Sanjay Patel ada717e25b [InlineFunction] use range-for loop; NFCI
llvm-svn: 295179
2017-02-15 14:56:11 +00:00
Daniel Jasper eef9b03395 Revert r295110 and r295144.
This fails under ASAN:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio

llvm-svn: 295162
2017-02-15 09:56:08 +00:00
Peter Collingbourne 0609acc10d SimplifyCFG: Register cloned assume intrinsics with assumption cache when creating critical edge.
Differential Revision: https://reviews.llvm.org/D29976

llvm-svn: 295145
2017-02-15 03:01:11 +00:00
Peter Collingbourne e2367415b6 WholeProgramDevirt: Separate the code that applies optzns from the code that decides whether to apply them. NFCI.
The idea is that the apply* functions will also be called when importing
devirt optimizations.

Differential Revision: https://reviews.llvm.org/D29745

llvm-svn: 295144
2017-02-15 02:13:08 +00:00
Easwaran Raman 5a12f236c6 Fix a bug in caller's BFI update code after inlining.
Multiple blocks in the callee can be mapped to a single cloned block
since we prune the callee as we clone it. The existing code
iterates over the value map and clones the block frequency (and
eventually scales the frequencies of the cloned blocks). Value map's
iteration is not deterministic and so the cloned block might get the
frequency of any of the original blocks. The fix is to set the max of
the original frequencies to the cloned block. The first block in the
sequence must have this max frequency and, in the call context,
subsequent blocks must have its frequency.

Differential Revision: https://reviews.llvm.org/D29696

llvm-svn: 295115
2017-02-14 22:49:28 +00:00
Michael Kuperstein 569162fefe [LV] Rename Induction to PrimaryInduction. NFC.
llvm-svn: 295111
2017-02-14 22:14:01 +00:00
Peter Collingbourne 534c0175b6 WholeProgramDevirt: Change internal vcall data structures to match summary.
Group calls into constant and non-constant arguments up front, and use uint64_t
instead of ConstantInt to represent constant arguments. The goal is to allow
the information from the summary to fit naturally into this data structure in
a future change (specifically, it will be added to CallSiteInfo).

This has two side effects:
- We disallow VCP for constant integer arguments of width >64 bits.
- We remove the restriction that the bitwidth of a vcall's argument and return
  types must match those of the vfunc definitions.
I don't expect either of these to matter in practice. The first case is
uncommon, and the second one will lead to UB (so we can do anything we like).

Differential Revision: https://reviews.llvm.org/D29744

llvm-svn: 295110
2017-02-14 22:12:23 +00:00
Taewook Oh 2e945ebb13 [BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions created in SplitBlockPredecessors
Summary:
When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of

```
  1 typedef struct _node_t {
  2   struct _node_t *next;
  3 } node_t;
  4
  5 extern node_t *root;
  6
  7 int foo() {
  8   node_t *node, *tmp;
  9   int ret = 0;
 10
 11   node = tmp = root->next;
 12   while (node != root) {
 13     while (node) {
 14       tmp = node;
 15       node = node->next;
 16       ret++;
 17     }
 18   }
 19
 20   return ret;
 21 }
```

, below is the basicblock corresponding to line 12 after Reassociate expressions pass:

```
while.cond:                                       ; preds = %while.cond2, %entry
  %node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
  %ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
  tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
  tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
  %cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
  br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
```

As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is

```
!21 = !DILocation(line: 9, column: 7, scope: !6)
```

, which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction.

Reviewers: dblaikie, samsonov, eugenis

Reviewed By: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29867

llvm-svn: 295106
2017-02-14 21:10:40 +00:00
Vedant Kumar 55891fc71e Re-apply "[profiling] Remove dead profile name vars after emitting name data"
This reverts 295092 (re-applies 295084), with a fix for dangling
references from the array of coverage names passed down from frontends.

I missed this in my initial testing because I only checked test/Profile,
and not test/CoverageMapping as well.

Original commit message:

The profile name variables passed to counter increment intrinsics are dead
after we emit the finalized name data in __llvm_prf_nm. However, we neglect to
erase these name variables. This causes huge size increases in the
__TEXT,__const section as well as slowdowns when linker dead stripping is
disabled. Some affected projects are so massive that they fail to link on
Darwin, because only the small code model is supported.

Fix the issue by throwing away the name constants as soon as we're done with
them.

Differential Revision: https://reviews.llvm.org/D29921

llvm-svn: 295099
2017-02-14 20:03:48 +00:00
Vedant Kumar 27ebdf4bcb Revert "[profiling] Remove dead profile name vars after emitting name data"
This reverts commit r295084. There is a test failure on:

http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/

llvm-svn: 295092
2017-02-14 19:08:39 +00:00
Vedant Kumar bb10484662 [profiling] Remove dead profile name vars after emitting name data
The profile name variables passed to counter increment intrinsics are
dead after we emit the finalized name data in __llvm_prf_nm. However, we
neglect to erase these name variables. This causes huge size increases
in the __TEXT,__const section as well as slowdowns when linker dead
stripping is disabled. Some affected projects are so massive that they
fail to link on Darwin, because only the small code model is supported.

Fix the issue by throwing away the name constants as soon as we're done
with them.

Differential Revision: https://reviews.llvm.org/D29921

llvm-svn: 295084
2017-02-14 18:48:48 +00:00
Taewook Oh f22fa72e4a Do not apply redundant LastCallToStaticBonus
Summary:
As written in the comments above, LastCallToStaticBonus is already applied to
the cost if Caller has only one user, so it is redundant to reapply the bonus
here.

If the only user is not a caller, TotalSecondaryCost will not be adjusted
anyway because callerWillBeRemoved is false. If there's no caller at all, we
don't need to care about TotalSecondaryCost because
inliningPreventsSomeOuterInline is false.

Reviewers: chandlerc, eraman

Reviewed By: eraman

Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29169

llvm-svn: 295075
2017-02-14 17:30:05 +00:00
Brian Cain 6dedf65cc9 Correct a typo, s/hosting/hoisting/
llvm-svn: 295066
2017-02-14 16:41:10 +00:00
Matthew Simpson f09d13e5cc Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps"
This reapplies commit r294967 with a fix for the execution time regressions
caught by the clang-cmake-aarch64-quick bot. We now extend the truncate
optimization to non-primary induction variables only if the truncate isn't
already free.

Differential Revision: https://reviews.llvm.org/D29847

llvm-svn: 295063
2017-02-14 16:28:32 +00:00
Alexey Bataev 2a2f35d59c [SLP] Fix for PR31879: vectorize repeated scalar ops that don't get put
back into a vector

Previously the cost of the existing ExtractElement/ExtractValue
instructions was considered as a dead cost only if it was detected that
they have only one use. But these instructions may be considered
dead also if users of the instructions are also going to be vectorized,
like:
```
%x0 = extractelement <2 x float> %x, i32 0
%x1 = extractelement <2 x float> %x, i32 1
%x0x0 = fmul float %x0, %x0
%x1x1 = fmul float %x1, %x1
%add = fadd float %x0x0, %x1x1
```
This can be transformed to
```
%1 = fmul <2 x float> %x, %x
%2 = extractelement <2 x float> %1, i32 0
%3 = extractelement <2 x float> %1, i32 1
%add = fadd float %2, %3
```
because though `%x0` and `%x1` have 2 users each other, these users are
part of the vectorized tree and we can consider these `extractelement`
instructions as dead.

Differential Revision: https://reviews.llvm.org/D29900

llvm-svn: 295056
2017-02-14 15:20:48 +00:00
Karl-Johan Karlsson ec21b769ec Revert "[LoopVectorize] Added address space check when analysing interleaved accesses"
This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed.
I'm reverting to investigate.

llvm-svn: 295042
2017-02-14 10:06:16 +00:00
Karl-Johan Karlsson 2ec409cca2 [LoopVectorize] Added address space check when analysing interleaved accesses
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.

This is fixing pr31900.

Reviewers: HaoLiu, mssimpso, mkuper

Reviewed By: mssimpso, mkuper

Subscribers: llvm-commits, efriedma, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29717

llvm-svn: 295038
2017-02-14 08:14:06 +00:00
Karl-Johan Karlsson 38cbf5869d Test commit permission
Removing whitespace.

llvm-svn: 295037
2017-02-14 07:31:36 +00:00
Mikael Holmen ece84cd10c [LSR] Pointers with different address spaces are considered incompatible.
Summary:
Function isCompatibleIVType is already used as a guard before the call to

 SE.getMinusSCEV(OperExpr, PrevExpr);

in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.

Reviewers: qcolombet, eli.friedman

Reviewed By: qcolombet

Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29885

llvm-svn: 295033
2017-02-14 06:37:42 +00:00
Peter Collingbourne 002c2d5380 ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible functions to merged module.
Differential Revision: https://reviews.llvm.org/D29701

llvm-svn: 295021
2017-02-14 03:42:38 +00:00
Philip Reames b2bca7e309 [LICM] Make store promotion work in the face of unordered atomics
Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled.

Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct.

It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following:
while(b) {
  load i128 ...
  if (can lower i128 atomic)
    store atomic i128 ...
  else
    store i128
}

It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically. 

It's not clear we need to be this conservative - arguably the program above is brocken since it can't be lowered unless the branch is folded - but I didn't want to have to fix any fallout which might result.

Differential Revision: https://reviews.llvm.org/D15592

llvm-svn: 295015
2017-02-14 01:38:31 +00:00
Peter Collingbourne c45f7f3eb4 FunctionAttrs: Factor out a function for querying memory access of a specific copy of a function. NFC.
This will later be used by ThinLTOBitcodeWriter to add copies of readnone
functions to the regular LTO module.

Differential Revision: https://reviews.llvm.org/D29695

llvm-svn: 295008
2017-02-14 00:28:13 +00:00
Sanjay Patel 4f74216da0 [FunctionAttrs] try to extend nonnull-ness of arguments from a callsite back to its parent function
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html
...we should be able to propagate 'nonnull' info from a callsite back to its parent.

The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430),
but this won't solve that problem.

The transform is currently disabled by default while we wait for clang to work-around
potential security problems:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html

Differential Revision: https://reviews.llvm.org/D27855

llvm-svn: 294998
2017-02-13 23:10:51 +00:00
Peter Collingbourne 2b33f65317 IR: Type ID summary extensions for WPD; thread summary into WPD pass.
Make the whole thing testable by adding YAML I/O support for the WPD
summary information and adding some negative tests that exercise the
YAML support.

Differential Revision: https://reviews.llvm.org/D29782

llvm-svn: 294981
2017-02-13 19:26:18 +00:00
Matthew Simpson 659f92e2aa Revert "[LV] Extend trunc optimization to all IVs with constant integer steps"
This reverts commit r294967. This patch caused execution time slowdowns in a
few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot.
I'm reverting to investigate.

llvm-svn: 294973
2017-02-13 18:02:35 +00:00
Matthew Simpson 7b7f40297f [LV] Extend trunc optimization to all IVs with constant integer steps
This patch extends the optimization of truncations whose operand is an
induction variable with a constant integer step. Previously we were only
applying this optimization to the primary induction variable. However, the cost
model assumes the optimization is applied to the truncation of all integer
induction variables (even regardless of step type). The transformation is now
applied to the other induction variables, and I've updated the cost model to
ensure it is better in sync with the transformation we actually perform.

Differential Revision: https://reviews.llvm.org/D29847

llvm-svn: 294967
2017-02-13 16:48:00 +00:00
Alexey Bataev e8b1536e21 [SLP] Fix for PR31690: Allow using of extra values in horizontal
reductions.

Currently, LLVM supports vectorization of horizontal reduction
instructions with initial value set to 0. Patch supports vectorization
of reduction with non-zero initial values. Also, it supports a
vectorization of instructions with some extra arguments, like:
```
float f(float x[], int a, int b) {
  float p = a % b;
  p += x[0] + 3;
  for (int i = 1; i < 32; i++)
    p += x[i];
  return p;
}
```
Patch allows vectorization of this kind of horizontal reductions.

Differential Revision: https://reviews.llvm.org/D29727

llvm-svn: 294934
2017-02-13 08:01:26 +00:00
Daniel Berlin 4d54796f87 NewGVN: Reverse order of congruence class elimination to maximize trivial deadness
llvm-svn: 294926
2017-02-12 23:24:45 +00:00
Daniel Berlin 508a1dec94 NewGVN: Use shouldSwapOperands in one more place
llvm-svn: 294925
2017-02-12 23:24:42 +00:00
Daniel Berlin 31e1b8fe48 Revert accidental commit titled "testing"
This reverts commit r294919

llvm-svn: 294923
2017-02-12 22:40:10 +00:00
Daniel Berlin 86eab15f2b NewGVN: Apply the fast math flags fix in r267113 to NewGVN as well.
llvm-svn: 294922
2017-02-12 22:25:20 +00:00
Daniel Berlin dbe8264c93 PredicateInfo: Handle critical edges
Summary:
This adds support for placing predicateinfo such that it affects critical edges.

This fixes the issues mentioned by Nuno on the mailing list.

Depends on D29519

Reviewers: davide, nlopes

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29606

llvm-svn: 294921
2017-02-12 22:12:20 +00:00
Daniel Berlin eccb8740d1 NewGVN: Fix missed call that should be to shouldSwapOperands
llvm-svn: 294920
2017-02-12 22:02:47 +00:00
Daniel Berlin 3fecad0d3e testing
llvm-svn: 294919
2017-02-12 22:02:20 +00:00
Sanjay Patel 45b7e69fef [InstCombine] fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2)
I found one special case of this transform for 'slt 0', so I removed that and added the general transform.

Alive code to check correctness:

Name: slt_no_overflow
Pre: WillNotOverflowSignedSub(C1, C2)
%a = add nsw i8 %x, C2
%b = icmp slt %a, C1
  =>
%b = icmp slt %x, C1 - C2

Name: sgt_no_overflow
Pre: WillNotOverflowSignedSub(C1, C2)
%a = add nsw i8 %x, C2
%b = icmp sgt %a, C1
  =>
%b = icmp sgt %x, C1 - C2

http://rise4fun.com/Alive/MH

Differential Revision: https://reviews.llvm.org/D29774

llvm-svn: 294898
2017-02-12 16:40:30 +00:00
Daniel Berlin 22a4a01ffa NewGVN: Reverse sense of this test to make it clearer
llvm-svn: 294851
2017-02-11 15:20:15 +00:00
Daniel Berlin 1529bb93c9 NewGVN: Add missing initialization of NumFuncArgs lost due to bad merge.
llvm-svn: 294850
2017-02-11 15:13:49 +00:00
Daniel Berlin 1c08767f88 NewGVN: Rank and order commutative operands consistently.
llvm-svn: 294849
2017-02-11 15:07:01 +00:00
Daniel Berlin b79f53669a NewGVN: Clean up how we handle the INITIAL class so that everything in
it is dead or unreachable, as it should be.
This also makes the leader of INITIAL undef, enabling us to handle
irreducibility properly.

Summary:
This lets us verify, more than we do now, that we didn't screw up
value numbering.

Reviewers: davide

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D29842

llvm-svn: 294844
2017-02-11 12:48:50 +00:00
Benjamin Kramer efcf06f5f2 Move symbols from the global namespace into (anonymous) namespaces. NFC.
llvm-svn: 294837
2017-02-11 11:06:55 +00:00
Evgeny Stupachenko fe6f548d2d Fix PR23384 (under "-lsr-insns-cost" option)
Summary:
The patch adds instructions number generated by a solution
 to LSR cost under "-lsr-insns-cost" option.

Reviewers: qcolombet, hfinkel

Differential Revision: http://reviews.llvm.org/D28307

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 294821
2017-02-11 02:57:43 +00:00
Wei Mi 8f20e63a20 [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
The recommit includes some changes of testcases. No functional change to the patch.

In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.

Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.

Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.

Differential Revision: https://reviews.llvm.org/D26429

llvm-svn: 294814
2017-02-11 00:50:23 +00:00
Chandler Carruth 027340f3b9 [PM] Fix a bug in how I ported LoopDeletion to the new PM.
This was marking the loop for deletion after the loop was deleted. This
almost works, except that when we do any kind of debug logging it starts
reading the name of the loop from deleted memory or otherwise blowing
up. This can fail in a bunch of ways. I recently added a test that
*always* does this, and it started failing on the sanitizer bots.

The fix is to mark the loop as deleted in the loop PM infrastructure
before we remove the loop. We can do this by passing the updater into
the routine. That also lets us simplify a bunch of other interface
components here for a net win.

llvm-svn: 294810
2017-02-11 00:09:30 +00:00
Benjamin Kramer 03ab8a366e [InstCombine] Move class into anonymous namespace. NFC.
This is necessary to avoid warnings from GCC.
InstCombineLoadStoreAlloca.cpp:238:7: error: 'PointerReplacer' declared
with greater visibility than the type of its field 'PointerReplacer::IC'

llvm-svn: 294794
2017-02-10 22:26:35 +00:00
Benjamin Kramer 684c87be4f [InstCombine] Silence unused variable warning in Release builds.
llvm-svn: 294788
2017-02-10 22:04:17 +00:00
Yaxun Liu ba01ed00fe Fix invalid addrspacecast due to combining alloca with global var
For function-scope variables with large initialisation list, FE usually 
generates a global variable to hold the initializer, then generates 
memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst 
identifies such allocas which are accessed only by reading and replaces 
them with the global variable. This is done by casting the global variable 
to the type of the alloca and replacing all references.

However, when the global variable is in a different address space which 
is disjoint with addr space 0 (e.g. for IR generated from OpenCL, 
global variable cannot be in private addr space i.e. addr space 0), casting 
the global variable to addr space 0 results in invalid IR for certain 
targets (e.g. amdgpu).

To fix this issue, when the global variable is not in addr space 0, 
instead of casting it to addr space 0, this patch chases down the uses 
of alloca until reaching the load instructions, then replaces load from 
alloca with load from the global variable. If during the chasing 
bitcast and GEP are encountered, new bitcast and GEP based on the global 
variable are generated and used in the load instructions.

Differential Revision: https://reviews.llvm.org/D27283

llvm-svn: 294786
2017-02-10 21:46:07 +00:00
Dehao Chen fb02f7140a Encode duplication factor from loop vectorization and loop unrolling to discriminator.
Summary:
This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html

When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations.

The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default.

Reviewers: probinson, aprantl, davidxl, hfinkel, echristo

Reviewed By: hfinkel

Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26420

llvm-svn: 294782
2017-02-10 21:09:07 +00:00
Matthew Simpson df124a7569 [LV] Remove type restriction for vector phi creation
We previously only created a vector phi node for an induction variable if its
type matched the type of the canonical induction variable.

Differential Revision: https://reviews.llvm.org/D29776

llvm-svn: 294755
2017-02-10 16:15:26 +00:00
Philip Reames 578dafbd8b [LoopUnswitch] Remove BFI usage (dead code)
Chandler mentioned at the last social that the need for BFI in the new pass manager was causing a slight hiccup for this pass.  Given this code has been checked in, but off for over a year, it makes sense to just remove it for now.

Note that there's nothing wrong with the general idea - it's actually a quite good one - and once we have the infrastructure in place to implement this without the full recompuation on every loop, we absolutely should.

llvm-svn: 294715
2017-02-10 06:12:06 +00:00
Chandler Carruth addcda483e [PM] Port ArgumentPromotion to the new pass manager.
Now that the call graph supports efficient replacement of a function and
spurious reference edges, we can port ArgumentPromotion to the new pass
manager very easily.

The old PM-specific bits are sunk into callbacks that the new PM simply
doesn't use. Unlike the old PM, the new PM simply does argument
promotion and afterward does the update to LCG reflecting the promoted
function.

Differential Revision: https://reviews.llvm.org/D29580

llvm-svn: 294667
2017-02-09 23:46:27 +00:00
Peter Collingbourne 17febdbb25 WholeProgramDevirt: Check that VCP candidate functions are defined before evaluating them.
This was crashing before.

llvm-svn: 294666
2017-02-09 23:46:26 +00:00
Chandler Carruth aaad9f84be [PM/LCG] Teach the LazyCallGraph how to replace a function without
disturbing the graph or having to update edges.

This is motivated by porting argument promotion to the new pass manager.
Because of how LLVM IR Function objects work, in order to change their
signature a new object needs to be created. This is efficient and
straight forward in the IR but previously was very hard to implement in
LCG. We could easily replace the function a node in the graph
represents. The challenging part is how to handle updating the edges in
the graph.

LCG previously used an edge to a raw function to represent a node that
had not yet been scanned for calls and references. This was the core
of its laziness. However, that model causes this kind of update to be
very hard:
1) The keys to lookup an edge need to be `Function*`s that would all
   need to be updated when we update the node.
2) There will be some unknown number of edges that haven't transitioned
   from `Function*` edges to `Node*` edges.

All of this complexity isn't necessary. Instead, we can always build
a node around any function, always pointing edges at it and always using
it as the key to lookup an edge. To maintain the laziness, we need to
sink the *edges* of a node into a secondary object and explicitly model
transitioning a node from empty to populated by scanning the function.
This design seems much cleaner in a number of ways, but importantly
there is now exactly *one* place where the `Function*` has to be
updated!

Some other cleanups that fall out of this include having something to
model the *entry* edges more accurately. Rather than hand rolling parts
of the node in the graph itself, we have an explicit `EdgeSequence`
object that gives us exactly the functionality needed. We also have
a consistent place to define the edge iterators and can use them for
both the entry edges and the internal edges of the graph.

The API used to model the separation between a node and its edges is
intentionally very thin as most clients are expected to deal with nodes
that have populated edges. We model this exactly as an optional does
with an additional method to populate the edges when that is
a reasonable thing for a client to do. This is based on API design
suggestions from Richard Smith and David Blaikie, credit goes to them
for helping pick how to model this without it being either too explicit
or too implicit.

The patch is somewhat noisy due to shifting around iterator types and
new syntax for walking the edges of a node, but most of the
functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.

Differential Revision: https://reviews.llvm.org/D29577

llvm-svn: 294653
2017-02-09 23:24:13 +00:00
Sanjay Patel f38bab73aa [InstCombine] allow (X * C2) << C1 --> X * (C2 << C1) for vectors
This fold already existed for vectors but only when 'C1' was a splat
constant (but 'C2' could be any constant). 

There were no tests for any vector constants, so I'm adding a test
that shows non-splat constants for both operands.  

llvm-svn: 294650
2017-02-09 23:13:04 +00:00
Peter Collingbourne cea1e4e79a De-duplicate some code for creating an AARGetter suitable for the legacy PM.
I'm about to use this in a couple more places.

Differential Revision: https://reviews.llvm.org/D29793

llvm-svn: 294648
2017-02-09 23:11:52 +00:00
Michael J. Spencer 714d9d22ad [LoadCombine] Fix combining of loads which span an aliasing store.
Fixes PR31517

Differential Revision: https://reviews.llvm.org/D28922

llvm-svn: 294632
2017-02-09 21:46:49 +00:00
Peter Collingbourne 857aba4410 Rename LowerTypeTestsSummaryAction to PassSummaryAction. NFCI.
I intend to use the same type with the same semantics in the WholeProgramDevirt
pass.

Differential Revision: https://reviews.llvm.org/D29746

llvm-svn: 294629
2017-02-09 21:45:01 +00:00
Sanjay Patel ae3b43e488 [InstCombine] use m_APInt to allow demanded bits analysis on splat constants
llvm-svn: 294628
2017-02-09 21:43:06 +00:00
Sanjoy Das 74bda4d591 [JumpThreading] Thread through guards
Summary:
This patch allows JumpThreading also thread through guards.
Virtually, guard(cond) is equivalent to the following construction:

  if (cond) { do something } else {deoptimize}

Yet it is not explicitly converted into IFs before lowering.
This patch enables early threading through guards in simple cases.
Currently it covers the following situation:

  if (cond1) {
    // code A
  } else {
    // code B
  }
  // code C
  guard(cond2)
  // code D

If there is implication cond1 => cond2 or !cond1 => cond2, we can transform
this construction into the following:

  if (cond1) {
    // code A
    // code C
  } else {
    // code B
    // code C
    guard(cond2)
  }
  // code D

Thus, removing the guard from one of execution branches.

Patch by Max Kazantsev!

Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29620

llvm-svn: 294617
2017-02-09 19:40:22 +00:00
Mike Aizatsky 4705ae936d [sancov] using comdat only when it is enabled
Differential Revision: https://reviews.llvm.org/D29733

llvm-svn: 294529
2017-02-08 23:12:46 +00:00
Mike Aizatsky 401d369328 [sancov] specifying comdat for sancov constructors
Differential Revision: https://reviews.llvm.org/D29662

llvm-svn: 294517
2017-02-08 21:20:33 +00:00
Peter Collingbourne 28ffd3261f ThinLTOBitcodeWriter: Strip debug info from merged module.
This module will contain nothing but vtable definitions and (soon)
available_externally function definitions, so there is no point in keeping
debug info in the module.

Differential Revision: https://reviews.llvm.org/D28913

llvm-svn: 294511
2017-02-08 20:44:00 +00:00
Elena Demikhovsky 5267edd3e3 [Loop Vectorizer] Cost-based decision for vectorization form of memory instruction.
Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction.
The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution.

This patch includes the following changes:
- Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision.
- Arrays of Uniform and Scalar values are moved from Legality to Cost Model.
- Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form.
- Vectorization of memory instruction is performed according to the CM decision.

Differential Revision: https://reviews.llvm.org/D27919

llvm-svn: 294503
2017-02-08 19:25:23 +00:00
Matt Arsenault 560665250f NVPTX: Extract mem intrinsic expansions into utilities
llvm-svn: 294490
2017-02-08 17:49:52 +00:00
Chad Rosier e22c992ba9 [Reassociate] Remove an unused argument. NFC.
llvm-svn: 294489
2017-02-08 17:45:27 +00:00
Sanjay Patel 6dd2eae76a [InstCombine] add local name for repeated calls; NFC
llvm-svn: 294470
2017-02-08 16:19:36 +00:00
Igor Laevsky a9b6872908 [InstComobineCalls] Fix buildbot failures after r294453.
Some targets don't support uint64_t options. Change type to unsigned.

Differential Revision: https://reviews.llvm.org/D28909

llvm-svn: 294461
2017-02-08 15:21:48 +00:00
Igor Laevsky 900ffa34c8 [InstCombineCalls] Unfold element atomic memcpy instruction
Differential Revision: https://reviews.llvm.org/D28909

llvm-svn: 294453
2017-02-08 14:32:04 +00:00
Igor Laevsky 4b317fa24e [InstCombineCalls] Remove zero length atomic memcpy intrinsics
Differential Revision: https://reviews.llvm.org/D28909

llvm-svn: 294452
2017-02-08 14:23:47 +00:00
Matt Arsenault cb3fa37c7e LSR: Check atomic instruction pointer operands
llvm-svn: 294410
2017-02-08 06:44:58 +00:00
Daniel Berlin 73eb7fe478 Revert "CVP: Make CVP iterate in an order that maximizes reuse of LVI cache"
This reverts commit r294398, it seems to be failing on the bots.

llvm-svn: 294399
2017-02-08 02:48:25 +00:00
Daniel Berlin b83cfb7724 CVP: Make CVP iterate in an order that maximizes reuse of LVI cache
Summary:
After the DFS order change for LVI, i have a few testcases that now
take forever.

The TL;DR - This is mainly due to the overdefined cache, but that
requires predicateinfo to fix[1]

In order to maximize reuse of the LVI cache for now, change the order
we iterate in.

This reduces my testcase from 5 minutes to 4 seconds.

I have verified cases like gmic do not get slower.

I am playing with whether the order should be postorder or idf.

[1] In practice, overdefined anywhere should be overdefined
everywhere, so this cache should be global.  That also fixes this bug.
The problem, however, is that LVI relies on this cache being filled in
per-block because it wants different values in different blocks due to
precisely the naming issue that predicateinfo fixes.  With
predicateinfo, making the cache global works fine on individual
passes, and also resolves this issue.

Reviewers: davide, sanjoy, chandlerc

Subscribers: llvm-commits, djasper

Differential Revision: https://reviews.llvm.org/D29679

llvm-svn: 294398
2017-02-08 02:35:07 +00:00
Sanjoy Das ec892139bd [IRCE] Add a missing invariant check
Currently IRCE relies on the loops it transforms to be (semantically) of
the form:

  for (i = START; i < END; i++)
    ...

or

  for (i = START; i > END; i--)
    ...

However, we were not verifying the presence of the START < END entry
check (i.e. check before the first iteration).  We were only verifying
that the backedge was guarded by (i + 1) < END.

Usually this would work "fine" since (especially in Java) most loops do
actually have the START < END check, but of course that is not
guaranteed.

llvm-svn: 294375
2017-02-07 23:59:07 +00:00
Daniel Berlin c763fd1ab4 PredicateInfo: Some compilers are unhappy with naming Use *'s Use. Change the name.
llvm-svn: 294364
2017-02-07 22:11:43 +00:00
Daniel Berlin 439042b7ad Add PredicateInfo utility and printing pass
Summary:
This patch adds a utility to build extended SSA (see "ABCD: eliminating
array bounds checks on demand"), and an intrinsic to support it. This
is then used to get functionality equivalent to propagateEquality in
GVN, in NewGVN (without having to replace instructions as we go). It
would work similarly in SCCP or other passes. This has been talked
about a few times, so i built a real implementation and tried to
productionize it.

Copies are inserted for operands used in assumes and conditional
branches that are based on comparisons (see below for more)

Every use affected by the predicate is renamed to the appropriate
intrinsic result.

E.g.
%cmp = icmp eq i32 %x, 50
br i1 %cmp, label %true, label %false
true:
ret i32 %x
false:
ret i32 1

will become

%cmp = icmp eq i32, %x, 50
br i1 %cmp, label %true, label %false
true:
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 }
%x.0 = call @llvm.ssa_copy.i32(i32 %x)
ret i32 %x.0
false:
ret i23 1

(you can use -print-predicateinfo to get an annotated-with-predicateinfo dump)

This enables us to easily determine what operations are affected by a
given predicate, and how operations affected by a chain of
predicates.

Reviewers: davide, sanjoy

Subscribers: mgorny, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D29519

Update for review comments

Fix a bug Nuno noticed where we are giving information about and/or on edges where the info is not useful and easy to use wrong

Update for review comments

llvm-svn: 294351
2017-02-07 21:10:46 +00:00
David Blaikie 4c01af203e Fix the -Werror build for some sign-comparisons
llvm-svn: 294331
2017-02-07 18:58:17 +00:00
Davide Italiano 2133bf5562 [InstCombine] Make max size array combine a tunable.
Requested by Sanjoy/Hal a while ago, and forgotten by me
(r283612).

llvm-svn: 294323
2017-02-07 17:56:50 +00:00
Reid Kleckner 79e37d517c Revert "[GVNHoist] Merge DebugLoc metadata on hoisted instructions"
This reverts commit r294250. It caused PR31891.

Add a test case that shows that inlinable calls retain location
information with an accurate scope.

llvm-svn: 294317
2017-02-07 17:31:13 +00:00
Peter Collingbourne 1ea1fd893c LowerTypeTests: Simplify. NFC.
llvm-svn: 294273
2017-02-07 03:20:58 +00:00
Dehao Chen 4a9dd70213 Fix the samplepgo indirect call promotion bug: we should not promote a direct call.
Summary: Checking CS.getCalledFunction() == nullptr does not necessary indicate indirect call. We also need to check if CS.getCalledValue() is not a constant.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29570

llvm-svn: 294260
2017-02-06 23:33:15 +00:00
Paul Robinson 383c5c228f Merge DebugLoc on combined stores; in this case, when combining stores
from the end of two blocks, merge instead of arbitrarily picking one.

Differential Revision: http://reviews.llvm.org/D29504

llvm-svn: 294251
2017-02-06 22:19:04 +00:00
Taewook Oh 44a856f7d5 [GVNHoist] Merge DebugLoc metadata on hoisted instructions
Summary:
When instructions are hoisted, current implementation keeps DebugLoc metadata of the instruction that chosen as Repl (and its GEP operand if Repl is a load or a store). However, DebugLoc metadata should be updated to the 'merged' location across all hoisted instructions. See the following example code:


```
  1:  typedef struct {
  2:    int a[10];
  3:  } S1;
  4: 
  5:  extern S1 *s1[10];
  6: 
  7:  void foo(int x, int y, int i) {
  8:    if (y)
  9:      s1[i]->a[i] = x + y;
 10:    else
 11:      s1[i]->a[i] = x;
 12:  }
```

Below is LLVM IR representation of the program before gvn-hoist:


```
%struct.S1 = type { [10 x i32] }
@s1 = external local_unnamed_addr global [10 x %struct.S1*], align 16

define void @foo(i32 %x, i32 %y, i32 %i) !dbg !4 {
entry:
  %tobool = icmp ne i32 %y, 0, !dbg !8
  br i1 %tobool, label %if.then, label %if.else, !dbg !10

if.then:                                          ; preds = %entry
  %add = add nsw i32 %x, %y, !dbg !11
  %idxprom = sext i32 %i to i64, !dbg !12
  %arrayidx = getelementptr inbounds [10 x %struct.S1*], [10 x %struct.S1*]* @s1, i64 0, i64 %idxprom, !dbg !12
  %0 = load %struct.S1*, %struct.S1** %arrayidx, align 8, !dbg !12, !tbaa !13
  %a = getelementptr inbounds %struct.S1, %struct.S1* %0, i32 0, i32 0, !dbg !17
  br label %if.end, !dbg !12

if.else:                                          ; preds = %entry
  %idxprom3 = sext i32 %i to i64, !dbg !18
  %arrayidx4 = getelementptr inbounds [10 x %struct.S1*], [10 x %struct.S1*]* @s1, i64 0, i64 %idxprom3, !dbg !18
  %1 = load %struct.S1*, %struct.S1** %arrayidx4, align 8, !dbg !18, !tbaa !13
  %a5 = getelementptr inbounds %struct.S1, %struct.S1* %1, i32 0, i32 0, !dbg !19
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %a5.sink = phi [10 x i32]* [ %a5, %if.else ], [ %a, %if.then ]
  %.sink = phi i32 [ %x, %if.else ], [ %add, %if.then ]
  %idxprom6 = sext i32 %i to i64
  %arrayidx7 = getelementptr inbounds [10 x i32], [10 x i32]* %a5.sink, i64 0, i64 %idxprom6
  store i32 %.sink, i32* %arrayidx7, align 4, !tbaa !20
  ret void, !dbg !22
}

```
where


```
!11 = !DILocation(line: 9, column: 18, scope: !9)
!12 = !DILocation(line: 9, column: 5, scope: !9)
!18 = !DILocation(line: 11, column: 5, scope: !9)
!19 = !DILocation(line: 11, column: 9, scope: !9)
```

. And below is after gvn-hoist:


```
define void @foo(i32 %x, i32 %y, i32 %i) !dbg !4 {
entry:
  %tobool = icmp ne i32 %y, 0, !dbg !8
  %idxprom = sext i32 %i to i64, !dbg !10
  %0 = getelementptr inbounds [10 x %struct.S1*], [10 x %struct.S1*]* @s1, i64 0, i64 %idxprom, !dbg !10
  %1 = load %struct.S1*, %struct.S1** %0, align 8, !dbg !10, !tbaa !11
  br i1 %tobool, label %if.then, label %if.else, !dbg !15

if.then:                                          ; preds = %entry
  %add = add nsw i32 %x, %y, !dbg !16
  %arrayidx = getelementptr inbounds [10 x %struct.S1*], [10 x %struct.S1*]* @s1, i64 0, i64 %idxprom, !dbg !10
  %a = getelementptr inbounds %struct.S1, %struct.S1* %1, i32 0, i32 0, !dbg !17
  br label %if.end, !dbg !10

if.else:                                          ; preds = %entry
  %arrayidx4 = getelementptr inbounds [10 x %struct.S1*], [10 x %struct.S1*]* @s1, i64 0, i64 %idxprom, !dbg !18
  %a5 = getelementptr inbounds %struct.S1, %struct.S1* %1, i32 0, i32 0, !dbg !19
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %a5.sink = phi [10 x i32]* [ %a5, %if.else ], [ %a, %if.then ]
  %.sink = phi i32 [ %x, %if.else ], [ %add, %if.then ]
  %arrayidx7 = getelementptr inbounds [10 x i32], [10 x i32]* %a5.sink, i64 0, i64 %idxprom
  store i32 %.sink, i32* %arrayidx7, align 4, !tbaa !20
  ret void, !dbg !22
}

```
As you see, loads and their GEPs have been hosited from if.then/if.else block to entry block. However, DebugLoc metadata of these new instructions are still same as the instructions in if.then block, as they are moved/cloned from if.then block. This may result incorrect stepping and imprecise sample profile result.

Reviewers: majnemer, pcc, sebpop

Reviewed By: sebpop

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29377

llvm-svn: 294250
2017-02-06 22:05:04 +00:00
Michael Kuperstein 7a86bb2589 [SLP] Revert "Allow using of extra values in horizontal reductions."
This breaks when one of the extra values is also a scalar that
participates in the same vectorization tree which we'll end up
reducing.

llvm-svn: 294245
2017-02-06 21:50:59 +00:00
Sanjay Patel 54656ca7db [ValueTracking] emit a remark when we detect a conflicting assumption (PR31809)
This is a follow-up to D29395 where we try to be good citizens and let the user know that
we've probably gone off the rails.

This should allow us to resolve:
https://llvm.org/bugs/show_bug.cgi?id=31809

Differential Revision: https://reviews.llvm.org/D29404

llvm-svn: 294208
2017-02-06 18:26:06 +00:00
Dehao Chen c81483d79c Fix the bug of samplepgo indirect call promption when type casting of the return value is needed.
Summary: When type casting of the return value is needed, promoteIndirectCall will return the type casting instruction instead of the direct call. This patch changed to return the direct call instruction instead.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29569

llvm-svn: 294205
2017-02-06 18:10:36 +00:00
Sanjay Patel cf4c90f3d3 [InstCombine] simplify dyn_cast + isa; NFCI
llvm-svn: 294198
2017-02-06 17:16:16 +00:00
Dehao Chen 448b5790f6 Refactor SampleProfile.cpp to make it cleaner. (NFC)
llvm-svn: 294118
2017-02-05 07:32:17 +00:00
Davide Italiano ec49313b11 [IPCP] Don't propagate return value for naked functions.
This is pretty much the same change made in SCCP.

llvm-svn: 294098
2017-02-04 19:44:14 +00:00