Commit Graph

21216 Commits

Author SHA1 Message Date
Mikael Holmen b69e5b7393 [GlobalOpt] Include padding in debug fragments
Summary:
When creating the debug fragments for a SRA'd variable, use the types'
allocation sizes. This fixes issues where the pass would emit too small
fragments, placed at the wrong offset, for padded types.

An example of this is long double on x86. The type is represented using
x86_fp80, which is 10 bytes, but the value is aligned to 12/16 bytes.
The padding is included in the type's DW_AT_byte_size attribute;
therefore, the fragments should also include that. Newer GCC releases
(I tested 7.2.0) emit 12/16-byte pieces for long double. Earlier
releases, e.g. GCC 5.5.0, behaved as LLVM did, i.e. by emitting a
10-byte piece, followed by an empty 2/6-byte piece for the padding.

Failing to cover all `DW_AT_byte_size' bytes of a value with non-empty
pieces results in the value being printed as <optimized out> by GDB.

Patch by: David Stenberg

Reviewers: aprantl, JDevlieghere

Reviewed By: aprantl, JDevlieghere

Subscribers: llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D42807

llvm-svn: 324066
2018-02-02 10:34:13 +00:00
David Blaikie 42bcdffd24 Add missing includes
llvm-svn: 324040
2018-02-02 00:11:09 +00:00
Sanjay Patel 3343fcef86 [InstCombine] allow multi-use values in canEvaluate* if all uses are in 1 inst
This is the enhancement suggested in D42536 to fix a shortcoming in 
regular InstCombine's canEvaluate* functionality.
When we have multiple uses of a value, but they're all in one instruction, we can 
allow that expression to be narrowed or widened for the same cost as a single-use 
value.

AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified 
away if the operands are the same value; add becomes shl; shifts with a variable shift 
amount aren't handled.

Differential Revision: https://reviews.llvm.org/D42739

llvm-svn: 324014
2018-02-01 21:55:53 +00:00
David Green 184df0c35d Revert commit rL323951
Looks like it's causing timeouts out on at least ppc64le
buildbots.

llvm-svn: 323959
2018-02-01 13:05:25 +00:00
David Green e11f0545db [InstCombine] Allow common type conversions to i8/i16/i32
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.

Differential Revision: https://reviews.llvm.org/D42424

llvm-svn: 323951
2018-02-01 11:06:18 +00:00
Mikael Holmen 6d06976e74 [LSR] Don't force bases of foldable formulae to the final type.
Summary:
Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.

For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.

Patch by: Bevin Hansson

Reviewers: atrick, qcolombet, sanjoy

Reviewed By: qcolombet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42103

llvm-svn: 323946
2018-02-01 06:38:34 +00:00
Amara Emerson 93b0ff20c9 [GlobalOpt] Improve common case efficiency of static global initializer evaluation
For very, very large global initializers which can be statically evaluated, the
code would create vectors of temporary Constants, modifying them in place,
before committing the resulting Constant aggregate to the global's initializer
value. This had effectively O(n^2) complexity in the size of the global
initializer and would cause memory and non-termination issues compiling some
workloads.

This change performs the static initializer evaluation and creation in batches,
once for each global in the evaluated IR memory. The existing code is maintained
as a last resort when the initializers are more complex than simple values in a
large aggregate. This should theoretically by NFC, no test as the example case
is massive. The existing test cases pass with this, as well as the llvm test
suite.

To give an example, consider the following C++ code adapted from the clang
regression tests:
struct S {
 int n = 10;
 int m = 2 * n;
 S(int a) : n(a) {}
};

template<typename T>
struct U {
 T *r = &q;
 T q = 42;
 U *p = this;
};

U<S> e;

The global static constructor for 'e' will need to initialize 'r' and 'p' of
the outer struct, while also initializing the inner 'q' structs 'n' and 'm'
members. This batch algorithm will simply use general CommitValueTo() method
to handle the complex nested S struct initialization of 'q', before
processing the outermost members in a single batch. Using CommitValueTo() to
handle member in the outer struct is inefficient when the struct/array is
very large as we end up creating and destroy constant arrays for each
initialization.
For the above case, we expect the following IR to be generated:

%struct.U = type { %struct.S*, %struct.S, %struct.U* }
%struct.S = type { i32, i32 }
@e = global %struct.U { %struct.S* gep inbounds (%struct.U, %struct.U* @e,
                                                 i64 0, i32 1),
                        %struct.S { i32 42, i32 84 }, %struct.U* @e }
The %struct.S { i32 42, i32 84 } inner initializer is treated as a complex
constant expression, while the other two elements of @e are "simple".

Differential Revision: https://reviews.llvm.org/D42612

llvm-svn: 323933
2018-01-31 23:56:07 +00:00
Matt Arsenault 06dfbb50d7 Utils: Fix DomTree update for entry block
If SplitBlockPredecessors was used on a function entry block,
it wouldn't update the dominator tree.

llvm-svn: 323928
2018-01-31 22:54:37 +00:00
Amjad Aboud b86b771c02 [AggressiveInstCombine] Fixed TruncCombine class to handle TruncInst leaf node correctly.
This covers the case where TruncInst leaf node is a constant expression.
See PR36121 for more details.

Differential Revision: https://reviews.llvm.org/D42622

llvm-svn: 323926
2018-01-31 22:39:05 +00:00
Eli Friedman 79d297abe4 [GlobalOpt] Fix exponential compile-time with selects.
If you have a long chain of select instructions created from something
like `int* p = &g; if (foo()) p += 4; if (foo2()) p += 4;` etc., a naive
recursive visitor will recursively visit each select twice, which is
O(2^N) in the number of select instructions. Use the visited set to cut
off recursion in this case.

(No testcase because this doesn't actually change the behavior, just the
time.)

Differential Revision: https://reviews.llvm.org/D42451

llvm-svn: 323910
2018-01-31 20:42:25 +00:00
Marek Olsak 13e4741275 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908
2018-01-31 20:18:04 +00:00
Marek Olsak 8e7d149a31 [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs
Summary:
!amdgpu.uniform needs to be preserved for AMDGPU, otherwise bad things
happen.

Reviewers: arsenm, nhaehnle, jingyue, broune, majnemer, bjarke.roune, dblaikie

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D42744

llvm-svn: 323907
2018-01-31 20:17:52 +00:00
Sanjay Patel 1b66deea16 [InstCombine] reduce code duplication for canEvaluate* functions; NFCI
We'd have to make the change suggested in D42536 3x otherwise. 

llvm-svn: 323877
2018-01-31 14:55:53 +00:00
Amjad Aboud d895bff5f2 [AggressiveInstCombine] Make TruncCombine class ignore unreachable basic blocks.
Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis.
See PR36134 for more details.

Differential Revision: https://reviews.llvm.org/D42683

llvm-svn: 323862
2018-01-31 10:41:31 +00:00
Peter Collingbourne 7873669be5 LTO: Drop comdats when converting definitions to declarations.
Differential Revision: https://reviews.llvm.org/D42715

llvm-svn: 323844
2018-01-31 02:51:03 +00:00
Teresa Johnson df763188c9 Teach ValueMapper to use ODR uniqued types when available
Summary:
This is exposed during ThinLTO compilation, when we import an alias by
creating a clone of the aliasee. Without this fix the debug type is
unnecessarily cloned and we get a duplicate, undoing the uniquing.

Fixes PR36089.

Reviewers: mehdi_amini, pcc

Subscribers: eraman, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D41669

llvm-svn: 323813
2018-01-30 20:16:32 +00:00
Zaara Syeda 1f59ae311b Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This recommits r322721 reverted due to sanitizer memory leak build bot failures.

Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 323778
2018-01-30 16:17:22 +00:00
Daniel Neilson 594f443b06 [RS4GC] Handle call/invoke instructions as base defining values of vectors
Summary:
 There's an asymmetry in the definitions of findBaseDefiningValueOfVector() and
findBaseDefiningValue() of RS4GC. The later handles call and invoke instructions,
and the former does not. This appears to be simple oversight. This patch remedies
the oversight by adding the call and invoke cases to findBaseDefiningValueOfVector().

Reviewers: DaniilSuchkov, anna

Reviewed By: anna

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42653

llvm-svn: 323764
2018-01-30 14:43:41 +00:00
Sanjay Patel 1aef27f5cd [DSE] make sure memory is not modified before partial store merging (PR36129)
We missed a critical check in D30703. We must make sure that no intermediate 
store is sitting between the stores that we want to merge.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=36129

Differential Revision: https://reviews.llvm.org/D42663

llvm-svn: 323759
2018-01-30 13:53:59 +00:00
Brian M. Rzycki 994e889022 [JumpThreading][NFC] Rename LoadInst variables
Summary:
The JumpThreading pass has several locations where to the variable name LI
refers to a LoadInst type. This is confusing and inhibits the ability to use
LI for LoopInfo as a member of the JumpThreading class. Minor formatting
and comments were also altered to reflect this change.

Reviewers: dberlin, kuba, spop, sebpop

Reviewed by: sebpop

Subscribers: sebpop, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42601

llvm-svn: 323695
2018-01-29 21:29:44 +00:00
Alexey Bataev 9c5c103283 [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323662
2018-01-29 16:08:52 +00:00
George Rimar eaf5172ca6 [ThinLTO] - Stop internalizing and drop non-prevailing symbols.
Implementation marks non-prevailing symbols as not live in the summary.
Then them are dropped in backends.

Fixes https://bugs.llvm.org/show_bug.cgi?id=35938

Differential revision: https://reviews.llvm.org/D42107

llvm-svn: 323633
2018-01-29 08:03:30 +00:00
Davide Italiano 8b797a0fd2 [CVP] Don't Replace incoming values from unreachable blocks with undef.
This pretty much reverts r322006, except that we keep the test,
because we work around the issue exposed in a different way (a
recursion limit in value tracking). There's still probably some
sequence that exposes this problem, and the proper way to fix that
for somebody who has time is outlined in the code review.

llvm-svn: 323630
2018-01-29 05:59:55 +00:00
Hiroshi Inoue c8e9245816 [NFC] fix trivial typos in comments and documents
"to to" -> "to"

llvm-svn: 323628
2018-01-29 05:17:03 +00:00
Alexey Bataev f86be12182 Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323530 to fix possible problems in users code.

llvm-svn: 323581
2018-01-27 02:42:21 +00:00
Alexey Bataev dce1614d75 Revert "[SLP] Removed the warning about unused variable, NFC."
This reverts commit r323533 to fix possible problems in users code.

llvm-svn: 323580
2018-01-27 02:42:17 +00:00
Vedant Kumar cff94627cf [InstrProfiling] Don't exit early when an unused intrinsic is found
This fixes a think-o in r323574.

llvm-svn: 323576
2018-01-27 00:01:04 +00:00
Vedant Kumar 1ee511c19c [InstrProfiling] Improve compile time when there is no work
When there are no uses of profiling intrinsics in a module, and there's
no coverage data to lower, InstrProfiling has no work to do.

llvm-svn: 323574
2018-01-26 23:54:24 +00:00
Vedant Kumar e48597a50e [InstCombine] Preserve debug values for eliminable casts
A cast from A to B is eliminable if its result is casted to C, and if
the pair of casts could just be expressed as a single cast. E.g here,
%c1 is eliminable:

  %c1 = zext i16 %A to i32
  %c2 = sext i32 %c1 to i64

InstCombine optimizes away eliminable casts. This patch teaches it to
insert a dbg.value intrinsic pointing to the final result, so that local
variables pointing to the eliminable result are preserved.

Differential Revision: https://reviews.llvm.org/D42566

llvm-svn: 323570
2018-01-26 22:02:52 +00:00
Alexey Bataev 041ef2dd15 [SLP] Removed the warning about unused variable, NFC.
llvm-svn: 323533
2018-01-26 15:34:44 +00:00
Alexey Bataev 167003df28 [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323530
2018-01-26 14:31:09 +00:00
Daniil Fukalov 6e1dc68117 [AMDGPU] fix LDS f32 intrinsics
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic

Reviewed by: b-sumner

Differential Revision: https://reviews.llvm.org/D42383

llvm-svn: 323516
2018-01-26 11:09:38 +00:00
Florian Hahn 212afb9fd9 [CallSiteSplitting] Fix infinite loop when recording conditions.
Fix infinite loop when recording conditions by correctly marking basic
blocks as visited.

Fixes https://bugs.llvm.org/show_bug.cgi?id=36105

llvm-svn: 323515
2018-01-26 10:36:50 +00:00
Hiroshi Inoue 0909ca132f [NFC] fix trivial typos in comments and documents
"in in" -> "in", "on on" -> "on" etc.

llvm-svn: 323508
2018-01-26 08:15:29 +00:00
Vedant Kumar 6394df9fc4 [Debug] LCSSA: Insert dbg.value at the first available insertion point
Inserting a dbg.value instruction at the start of a basic block with a
landingpad instruction triggers a verifier failure. We should be OK if
we insert the instruction a bit later.

Speculative fix for the bot failure described here:
https://reviews.llvm.org/D42551

llvm-svn: 323482
2018-01-25 23:48:29 +00:00
Easwaran Raman 8410c37465 [SyntheticCounts] Rewrite the code using only graph traits.
Summary:
The intent of this is to allow the code to be used with ThinLTO. In
Thinlink phase, a traditional Callgraph can not be computed even though
all the necessary information (nodes and edges of a call graph) is
available. This is due to the fact that CallGraph class is closely tied
to the IR. This patch first extends GraphTraits to add a CallGraphTraits
graph. This is then used to implement a version of counts propagation
on a generic callgraph.

Reviewers: davidxl

Subscribers: mehdi_amini, tejohnson, llvm-commits

Differential Revision: https://reviews.llvm.org/D42311

llvm-svn: 323475
2018-01-25 22:02:29 +00:00
Vedant Kumar 60f54084bf [Debug] Add dbg.value intrinsics for PHIs created during LCSSA.
This patch is an enhancement to propagate dbg.value information when
Phis are created on behalf of LCSSA.  I noticed a case where a value
carried across a loop was reported as <optimized out>.

Specifically this case:

  int bar(int x, int y) {
    return x + y;
  }

  int foo(int size) {
    int val = 0;
    for (int i = 0; i < size; ++i) {
      val = bar(val, i);  // Both val and i are correct
    }
    return val; // <optimized out>
  }

In the above case, after all of the interesting computation completes
our value is reported as "optimized out." This change will add a
dbg.value to correct this.

This patch also moves the dbg.value insertion routine from
LoopRotation.cpp into Local.cpp, so that we can share it in both places
(LoopRotation and LCSSA).

Patch by Matt Davis!

Differential Revision: https://reviews.llvm.org/D42551

llvm-svn: 323472
2018-01-25 21:37:07 +00:00
Vedant Kumar 6bfc869cf7 [Debug] Add a utility to propagate dbg.value to new PHIs, NFC
This simply moves an existing utility to Utils for reuse.

Split out of: https://reviews.llvm.org/D42551

Patch by Matt Davis!

llvm-svn: 323471
2018-01-25 21:37:05 +00:00
Evgeniy Stepanov 31475a039a [asan] Fix kernel callback naming in instrumentation module.
Right now clang uses "_n" suffix for some user space callbacks and "N" for the matching kernel ones. There's no need for this and it actually breaks kernel build with inline instrumentation. Use the same callback names for user space and the kernel (and also make them consistent with the names GCC uses).

Patch by Andrey Konovalov.

Differential Revision: https://reviews.llvm.org/D42423

llvm-svn: 323470
2018-01-25 21:28:51 +00:00
Easwaran Raman c73cec84c9 Re-land "[ThinLTO] Add call edges' relative block frequency to per-module summary."
It was reverted after buildbot regressions.

Original commit message:

This allows relative block frequency of call edges to be passed
to the thinlink stage where it will be used to compute synthetic
entry counts of functions.

llvm-svn: 323460
2018-01-25 19:27:17 +00:00
Alexey Bataev 102d4b59f9 Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323441 to fix buildbots.

llvm-svn: 323447
2018-01-25 17:28:12 +00:00
Alexey Bataev c8cfa14b6d [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323441
2018-01-25 16:45:18 +00:00
Sanjay Patel 1d68112c4b [InstCombine] narrow masked zexted binops (PR35792)
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792

Alive proofs for the cases handled here as well as the bitwise 
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97
https://rise4fun.com/Alive/Lc5E
https://rise4fun.com/Alive/kdf

llvm-svn: 323437
2018-01-25 16:34:36 +00:00
Alexey Bataev a0b2c78efc Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323430 to fix buildbots.

llvm-svn: 323432
2018-01-25 15:20:29 +00:00
Alexey Bataev ad51fe3644 [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323430
2018-01-25 15:01:36 +00:00
Amjad Aboud f1f57a3137 Another try to commit 323321 (aggressive instruction combine).
llvm-svn: 323416
2018-01-25 12:06:32 +00:00
Mikael Holmen 886edf8f8a [GlobalOpt] Emit fragments using field offsets from struct layout
Summary:
When creating the debug fragments for a SRA'd struct, use the fields'
offsets, taken from the struct layout, as the offsets for the resulting
fragments. This fixes an issue where GlobalOpt would emit fragments with
incorrect offsets for padded fields.

This should solve PR36016.

Patch by David Stenberg.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42489

llvm-svn: 323411
2018-01-25 10:09:26 +00:00
Alexey Bataev 0affccc8d7 Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323348 because of the broken buildbots.

llvm-svn: 323359
2018-01-24 18:36:51 +00:00
Nicolai Haehnle 4afb64e4c6 Revert r321751, "StructurizeCFG: Fix broken backedge detection"
It causes regressions in various OpenGL test suites.

Keep the test cases introduced by r321751 as XFAIL, and add a test case
for the regression.

Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015
llvm-svn: 323355
2018-01-24 18:02:05 +00:00
Alexey Bataev 4bd8e5332f [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323348
2018-01-24 17:50:53 +00:00
Amjad Aboud d53504e379 Reverted 323321.
llvm-svn: 323326
2018-01-24 14:48:49 +00:00
Amjad Aboud e4453233d7 [InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive-instcombine).
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.

For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.

It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.

Differential Revision: https://reviews.llvm.org/D38313

llvm-svn: 323321
2018-01-24 12:42:42 +00:00
Max Kazantsev 0f720e1296 [NFC] Remove overconfident assert from IRCE
This patch removes assert that SCEV is able to prove that a value is
non-negative. In fact, SCEV can sometimes be unable to do this because
its cache does not update properly. This assert will be returned once this
problem is resolved.

llvm-svn: 323309
2018-01-24 07:51:41 +00:00
Volkan Keles ebf34ea316 BlockExtractor: Remove unused variable. NFC.
llvm-svn: 323271
2018-01-23 22:24:34 +00:00
Volkan Keles dc40be75f8 [llvm-extract] Support extracting basic blocks
Summary:
Currently, there is no way to extract a basic block from a function easily. This patch
extends llvm-extract to extract the specified basic block(s).

Reviewers: loladiro, rafael, bogner

Reviewed By: bogner

Subscribers: hintonda, mgorny, qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D41638

llvm-svn: 323266
2018-01-23 21:51:34 +00:00
Alexey Bataev 4f74a31c0e Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323246 because of the broken buildbots.

llvm-svn: 323252
2018-01-23 20:11:27 +00:00
Alexey Bataev 6719e2418c [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323246
2018-01-23 19:30:26 +00:00
Ashutosh Nema 007b425b77 This change add's optimization remark in LoopVersioning LICM pass.
Summary:
This patch is adding remark messages to the LoopVersioning LICM pass, 
which will be useful for optimization remark emitter (ORE) infrastructure.

Patch by: Deepak Porwal

Reviewers: anemet, ashutosh.nema, eastig

Subscribers: eastig, vivekvpandya, fhahn, llvm-commits
llvm-svn: 323183
2018-01-23 09:47:28 +00:00
Dmitry Vyukov 68aab34f2d asan: allow inline instrumentation for the kernel
Currently ASan instrumentation pass forces callback
instrumentation when applied to the kernel.
This patch changes the current behavior to allow
using inline instrumentation in this case.

Authored by andreyknvl. Reviewed in:
https://reviews.llvm.org/D42384

llvm-svn: 323140
2018-01-22 19:07:11 +00:00
Eugene Leviant 28d8a49f42 [ThinLTO] Re-commit of dot dumper after test fix
llvm-svn: 323116
2018-01-22 13:35:40 +00:00
Serguei Katkov f38041dc3e Revert [SCEV] Fix isLoopEntryGuardedByCond usage
It causes buildbot failures. New added assert is fired.
It seems not all usages of isLoopEntryGuardedByCond are fixed.

llvm-svn: 323079
2018-01-22 07:47:02 +00:00
Serguei Katkov 50714a1cbc [SCEV] Fix isLoopEntryGuardedByCond usage
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.

Reviewers: sanjoy, mkazantsev, anna, dorit
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42165

llvm-svn: 323077
2018-01-22 07:31:41 +00:00
Sanjay Patel 9530f18864 [InstCombine] (X << Y) / X -> 1 << Y
...when the shift is known to not overflow with the matching
signed-ness of the division.

This closes an optimization gap caused by canonicalizing mul
by power-of-2 to shl as shown in PR35709:
https://bugs.llvm.org/show_bug.cgi?id=35709

Patch by Anton Bikineev!

Differential Revision: https://reviews.llvm.org/D42032

llvm-svn: 323068
2018-01-21 16:14:51 +00:00
Eugene Leviant 72b9bdb71a Temporarily revert r323062 to investigate buildbot failures
llvm-svn: 323065
2018-01-21 10:22:19 +00:00
Eugene Leviant 453c976a63 [ThinLTO] Implement summary visualizer
Differential revision: https://reviews.llvm.org/D41297

llvm-svn: 323062
2018-01-21 07:27:32 +00:00
Philip Reames f57714c3c7 [DSE] Factor out common code [NFC]
We already had the pointer being stored to in the MemLoc, reuse that code.  In merging cases, it turned out the interface of the getLocForWrite had become inconsitent with other related utilities.  Fix that by making sure the input passes hasAnalyzableWrite as well.

llvm-svn: 323056
2018-01-21 02:10:54 +00:00
Philip Reames 424e7a1174 [DSE] Minor rename for clarity sake [NFC]
llvm-svn: 323055
2018-01-21 01:44:33 +00:00
Akira Hatanaka 73ceb50d85 [ObjCARC] Do not turn a call to @objc_autoreleaseReturnValue into a call
to @objc_autorelease if its operand is a PHI and the PHI has an
equivalent value that is used by a return instruction.

For example, ARC optimizer shouldn't replace the call in the following
example, as doing so breaks the AutoreleaseRV/RetainRV optimization:

  %v1 = bitcast i32* %v0 to i8*
  br label %bb3
bb2:
  %v3 = bitcast i32* %v2 to i8*
  br label %bb3
bb3:
  %p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ]
  %retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p
  %v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p)
  ret i32* %retval

Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's
operand uses with its value so that the call gets tail-called.

rdar://problem/15894705

llvm-svn: 323009
2018-01-19 23:51:13 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Alexey Bataev fa80c47c6a [SLP] Fix vectorization for tree with trunc to minimum required bit width.
Summary:
If the vectorized tree has truncate to minimum required bit width and
the vector type of the cast operation after the truncation is the same
as the vector type of the cast operands, count cost of the vector cast
operation as 0, because this cast will be later removed.
Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner.

Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41948

llvm-svn: 322946
2018-01-19 14:40:13 +00:00
Hiroshi Inoue d24ddcd6c4 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 322934
2018-01-19 10:55:29 +00:00
John Brawn 2867bd72c0 [InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr
Three (or more) operand getelementptrs could plausibly also be handled, but
handling only two-operand fits in easily with the existing BinaryOperator
handling.

Differential Revision: https://reviews.llvm.org/D39958

llvm-svn: 322930
2018-01-19 10:05:15 +00:00
Benjamin Kramer bfc1d976ca [HWAsan] Fix uninitialized variable.
Found by msan.

llvm-svn: 322847
2018-01-18 14:19:04 +00:00
Evgeniy Stepanov 5bd669dc8f [hwasan] LLVM-level flags for linux kernel-compatible hwasan instrumentation.
Summary:
-hwasan-mapping-offset defines the non-zero shadow base address.
-hwasan-kernel disables calls to __hwasan_init in module constructors.
Unlike ASan, -hwasan-kernel does not force callback instrumentation.
This is controlled separately with -hwasan-instrument-with-calls.

Reviewers: kcc

Subscribers: srhines, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42141

llvm-svn: 322785
2018-01-17 23:24:38 +00:00
Easwaran Raman e5b8de2f1f Add a ProfileCount class to represent entry counts.
Summary:
The class wraps a uint64_t and an enum to represent the type of profile
count (real and synthetic) with some helper methods.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41883

llvm-svn: 322771
2018-01-17 22:24:23 +00:00
Javed Absar 1e28194a40 [SCEV] Fix typo. NFC.
Fix confusing typo in comment.

llvm-svn: 322765
2018-01-17 21:58:35 +00:00
Zaara Syeda c9dc7b451b Revert [PowerPC] This reverts commit rL322721
Failing build bots. Revert the commit now.

llvm-svn: 322748
2018-01-17 20:00:15 +00:00
Zaara Syeda 8e951fd2f6 [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 322721
2018-01-17 18:22:55 +00:00
Sanjay Patel aa766efd09 [InstCombine] fix demanded-bits propagation for zext/trunc
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.

llvm-svn: 322662
2018-01-17 14:39:28 +00:00
Daniil Fukalov d5fca554e2 [AMDGPU] add LDS f32 intrinsics
added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D37985

llvm-svn: 322656
2018-01-17 14:05:05 +00:00
Ivan A. Kosarev 4d0ff0c74d [Transforms] Support making mutable versions of new-format TBAA access tags
Differential Revision: https://reviews.llvm.org/D41565

llvm-svn: 322650
2018-01-17 13:29:54 +00:00
Javed Absar 0b05f327d6 [SCEV] fix typo
llvm-svn: 322629
2018-01-17 11:03:06 +00:00
Evgeniy Stepanov c07e0bd533 [hwasan] Rename sized load/store callbacks to be consistent with ASan.
Summary: __hwasan_load is now __hwasan_loadN.

Reviewers: kcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42138

llvm-svn: 322601
2018-01-16 23:15:08 +00:00
Florian Hahn c6c89bffdc [CallSiteSplitting] Pass list of (BB, Conditions) pairs to splitCallSite.
This removes some duplication from splitCallSite and makes it easier to
add additional code dealing with each predecessor. It also allows us to
split for more than 2 predecessors, although that is not enabled for
now.

Reviewers: junbuml, mcrosier, davidxl, davide

Reviewed By: junbuml

Differential Revision: https://reviews.llvm.org/D41858

llvm-svn: 322599
2018-01-16 22:13:15 +00:00
Alexey Bataev 6977dbcc7b [SLP] Fix for PR32164: Improve vectorization of reverse order of extract operations.
Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation.

Reviewers: mkuper, hfinkel, RKSimon, spatel

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D33954

llvm-svn: 322579
2018-01-16 18:17:01 +00:00
Hiroshi Inoue 99a8faa615 [SROA] fix assetion failure
This patch fixes the assertion failure in SROA reported in PR35657.
PR35657 reports the assertion failure due to r319522 (splitting for non-whole-alloca slices), but this problem can happen even without r319522.

The problem exists in a check for reusing an existing alloca when rewriting partitions. As the original comment said, we can reuse the existing alloca if the new alloca has the same type and offset with the existing one. But the code checks only type of the alloca and then check the offset using an assert.
In a corner case with out-of-bounds access (e.g. @PR35657 function added in unit test), it is possible that the two allocas have the same type but different offsets.

This patch makes the check of the offset in the if condition, and re-enables the splitting for non-whole-alloca slices.

Differential Revision: https://reviews.llvm.org/D41981

llvm-svn: 322533
2018-01-16 06:23:05 +00:00
Andrei Elovikov 7457aa0bce [LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from a trunc.
Summary:
This method is supposed to be called for IVs that have casts in their use-def
chains that are completely ignored after vectorization under PSE. However, for
truncates of such IVs the same InductionDescriptor is used during
creation/widening of both original IV based on PHINode and new IV based on
TruncInst.

This leads to unintended second call to recordVectorLoopValueForInductionCast
with a VectorLoopVal set to the newly created IV for a trunc and causes an
assert due to attempt to store new information for already existing entry in the
map. This is wrong and should not be done.

Fixes PR35773.

Reviewers: dorit, Ayal, mssimpso

Reviewed By: dorit

Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D41913

llvm-svn: 322473
2018-01-15 10:56:07 +00:00
Max Kazantsev d0fe502385 [NFC] Fix comment to adjust to reality
llvm-svn: 322468
2018-01-15 05:44:43 +00:00
Evgeniy Stepanov 080e0d40b9 [hwasan] An LLVM flag to disable stack tag randomization.
Summary: Necessary to achieve consistent test results.

Reviewers: kcc, alekseyshl

Subscribers: kubamracek, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D42023

llvm-svn: 322429
2018-01-13 01:32:15 +00:00
Daniel Neilson 2409d24201 [NFC] Change MemIntrinsicInst::setAlignment() to take an unsigned instead of a Constant
Summary:
 In preparation for https://reviews.llvm.org/D41675 this NFC changes this
prototype of MemIntrinsicInst::setAlignment() to accept an unsigned instead
of a Constant.

llvm-svn: 322403
2018-01-12 21:33:37 +00:00
Brian M. Rzycki 9b7ae23256 [JumpThreading] Preservation of DT and LVI across the pass
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.

Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.

LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perform the
preversation was minimally altered and simply marked as
preserved for the PassManager to be informed.

This extends the analysis available to JumpThreading for
future enhancements such as threading across loop headers.

Reviewers: dberlin, kuhar, sebpop

Reviewed By: kuhar, sebpop

Subscribers: mgorny, dmgreen, kuba, rnk, rsmith, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40146

llvm-svn: 322401
2018-01-12 21:06:48 +00:00
Max Kazantsev ef0576000c [IRCE][NFC] Make range check's End a non-null SCEV
Currently, IRC contains `Begin` and `Step` as SCEVs and `End` as value.
Aside from that, `End` can also be `nullptr` which can be later conditionally
converted into a non-null SCEV.

To make this logic more transparent, this patch makes `End` a SCEV and
calculates it early, so that it is never a null.

Differential Revision: https://reviews.llvm.org/D39590

llvm-svn: 322364
2018-01-12 10:00:26 +00:00
Serguei Katkov a757d65cec [LoopDeletion] Handle users in unreachable block
This is a fix for PR35884.

When we want to delete dead loop we must clean uses in unreachable blocks
otherwise we'll get an assert during deletion of instructions from the loop.

Reviewers: anna, davide
Reviewed By: anna
Subscribers: llvm-commits, lebedev.ri
Differential Revision: https://reviews.llvm.org/D41943

llvm-svn: 322357
2018-01-12 07:24:43 +00:00
Evgeniy Stepanov 99fa3e774d [hwasan] Stack instrumentation.
Summary:
Very basic stack instrumentation using tagged pointers.
Tag for N'th alloca in a function is built as XOR of:
 * base tag for the function, which is just some bits of SP (poor
   man's random)
 * small constant which is a function of N.

Allocas are aligned to 16 bytes. On every ReturnInst allocas are
re-tagged to catch use-after-return.

This implementation has a bunch of issues that will be taken care of
later:
1. lifetime intrinsics referring to tagged pointers are not
   recognized in SDAG. This effectively disables stack coloring.
2. Generated code is quite inefficient. There is one extra
   instruction at each memory access that adds the base tag to the
   untagged alloca address. It would be better to keep tagged SP in a
   callee-saved register and address allocas as an offset of that XOR
   retag, but that needs better coordination between hwasan
   instrumentation pass and prologue/epilogue insertion.
3. Lifetime instrinsics are ignored and use-after-scope is not
   implemented. This would be harder to do than in ASan, because we
   need to use a differently tagged pointer depending on which
   lifetime.start / lifetime.end the current instruction is dominated
   / post-dominated.

Reviewers: kcc, alekseyshl

Subscribers: srhines, kubamracek, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41602

llvm-svn: 322324
2018-01-11 22:53:30 +00:00
Rafael Espindola e4b0231c63 Make internal/private GVs implicitly dso_local.
While updating clang tests for having clang set dso_local I noticed
that:

- There are *a lot* of tests to update.
- Many of the updates are redundant.

They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.

llvm-svn: 322317
2018-01-11 22:15:05 +00:00
Fiona Glaser efe6a84e5b [Sink] Really really fix predicate in legality check
LoadInst isn't enough; we need to include intrinsics that perform loads too.

All side-effecting intrinsics and such are already covered by the isSafe
check, so we just need to care about things that read from memory.

D41960, originally from D33179.

llvm-svn: 322311
2018-01-11 21:28:57 +00:00
Benjamin Kramer 738e6e7cb0 [InstCombine] Apply the fix from r322284 for sin / cos -> tan too
llvm-svn: 322285
2018-01-11 15:33:21 +00:00
Benjamin Kramer 44993ede60 [InstCombine] For cos/sin -> tan copy attributes from cos instead of the
parent function

Ideally we should merge the attributes from the functions somehow, but
this is obviously an improvement over taking random attributes from the
caller which will trip up the verifier if they're nonsensical for an
unary intrinsic call.

llvm-svn: 322284
2018-01-11 15:19:02 +00:00
Dmitry Venikov e5fbf591a7 [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x)
Summary: This patch enables folding sin(x) / cos(x) -> tan(x), cos(x) / sin(x) -> 1 / tan(x) under -ffast-math flag

Reviewers: hfinkel, spatel

Reviewed By: spatel

Subscribers: andrew.w.kaylor, efriedma, scanon, llvm-commits

Differential Revision: https://reviews.llvm.org/D41286

llvm-svn: 322255
2018-01-11 06:33:00 +00:00
Marcello Maggioni ddccd50313 [NFC] Commit to mention that r322248 is actually made by AndrewScheidecker
llvm-svn: 322249
2018-01-11 02:06:28 +00:00
Marcello Maggioni 7083423f22 [SimplifyCFG] Add cut-off for InitializeUniqueCases.
The function can take a significant amount of time on some
complicated test cases, but for the currently only use of
the function we can stop the initialization much earlier
when we find out we are going to discard the result anyway
in the caller of the function.

Adding configurable cut-off points so that we avoid wasting time.
NFCI.

llvm-svn: 322248
2018-01-11 02:01:16 +00:00
Justin Lebar 9d3afd3c06 Add explanatory comment to LoadStoreVectorizer.
Reviewers: arsenm

Subscribers: rengolin, sanjoy, wdng, hiraditya, asbirlea

Differential Revision: https://reviews.llvm.org/D41890

llvm-svn: 322157
2018-01-10 03:02:12 +00:00
Vlad Tsyrklevich cdec22ef9a LowerTypeTests: Add limited support for aliases
Summary:
LowerTypeTests moves some function definitions from individual object
files to the merged module, leaving a stub to be called in the merged
module's jump table. If an alias was pointing to such a function
definition LowerTypeTests would fail because the alias would be left
without a definition to point to.

This change 1) emits information about aliases to the ThinLTO summary,
2) replaces aliases pointing to function definitions that are moved to
the merged module with function declarations, and 3) re-emits those
aliases in the merged module pointing to the correct function
definitions.

The patch does not correctly fix all possible mis-uses of aliases in
LowerTypeTests. For example, it does not handle aliases with a different
type from the pointed to function.

The addition of alias data increases the size of Chrome build artifacts
by less than 1%.

Reviewers: pcc

Reviewed By: pcc

Subscribers: mehdi_amini, eraman, mgrang, llvm-commits, eugenis, kcc

Differential Revision: https://reviews.llvm.org/D41741

llvm-svn: 322139
2018-01-10 00:00:51 +00:00
Michael Zolotukhin 1f562176e9 [LoopRotate] Detect loops with indirect branches better (we're giving up on them).
llvm-svn: 322137
2018-01-09 23:54:35 +00:00
Chris Bieneman abdea268c1 [IPSCCP] Remove calls without side effects
Summary:
When performing constant propagation for call instructions we have historically replaced all uses of the return from a call, but not removed the call itself. This is required for correctness if the calls have side effects, however the compiler should be able to safely remove calls that don't have side effects.

This allows the compiler to completely fold away calls to functions that have no side effects if the inputs are constant and the output can be determined at compile time.

Reviewers: davide, sanjoy, bruno, dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38856

llvm-svn: 322125
2018-01-09 21:58:46 +00:00
Daniel Berlin 56cca7437c NewGVN: Fix PR/33367, which was causing us to delete non-copy intrinsics accidentally in some rare cases
llvm-svn: 322115
2018-01-09 20:12:42 +00:00
Easwaran Raman bdf20261d8 Add a pass to generate synthetic function entry counts.
Summary:
This pass synthesizes function entry counts by traversing the callgraph
and using the relative block frequencies of the callsites. The intended
use of these counts is in inlining to determine hot/cold callsites in
the absence of profile information.

The pass is split into two files with the code that propagates the
counts in a callgraph in a Utils file. I plan to add support for
propagation in the thinlto link phase and the propagation code will be
shared and hence this split. I did not add support to the old PM since
hot callsite determination in inlining is not possible in old PM
(although we could use hot callee heuristic with synthetic counts in the
old PM it is not worth the effort tuning it)

Reviewers: davidxl, silvas

Subscribers: mgorny, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D41604

llvm-svn: 322110
2018-01-09 19:39:35 +00:00
Sanjay Patel 6fb1357c35 [InstCombine] weaken assertions for icmp folds (PR35846)
Because of potential UB (known bits conflicts with an llvm.assume),
we have to check rather than assert here because InstSimplify doesn't
kill the compare:
https://bugs.llvm.org/show_bug.cgi?id=35846

llvm-svn: 322104
2018-01-09 18:56:03 +00:00
Petar Jovanovic 1d26c7e4ff [EarlyCSE] Salvage debug info during DCE
EarlyCSE did not try to salvage debug info during erasing of instructions.
This change fixes it.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D41496

llvm-svn: 322083
2018-01-09 15:08:37 +00:00
Simon Pilgrim 5d909be91b [InstCombine] Check for out of range ashr values using APInt before calling getZExtValue
Reduced from oss-fuzz #5032 test case

llvm-svn: 322078
2018-01-09 14:23:46 +00:00
Justin Bogner 6f6846fc9d AlwaysInliner: Alow setting InsertLifetime in the new-style pass
llvm-svn: 322033
2018-01-08 22:07:42 +00:00
Justin Bogner 92fe563b57 ArgPromotion: Allow setting MaxElements in the new-style pass
llvm-svn: 322025
2018-01-08 21:13:35 +00:00
Davide Italiano 9a60d2c157 [CVP] Replace incoming values from unreachable blocks with undef.
This is an attempt of fixing PR35807.
Due to the non-standard definition of dominance in LLVM, where uses in
unreachable blocks are dominated by anything, you can have, in an
unreachable block:

  %patatino = OP1 %patatino, CONSTANT

When `SimplifyInstruction` receives a PHI where an incoming value is of
the aforementioned form, in some cases, loops indefinitely.

What I propose here instead is keeping track of the incoming values
from unreachable blocks, and replacing them with undef. It fixes this
case, and it seems to be good regardless (even if we can't prove that
the value is constant, as it's coming from an unreachable block, we
can ignore it).

Differential Revision:  https://reviews.llvm.org/D41812

llvm-svn: 322006
2018-01-08 16:34:06 +00:00
Sanjay Patel 31b4b76f99 [InstCombine] fold min/max tree with common operand (PR35717)
There is precedence for factorization transforms in instcombine for FP ops with fast-math. 
We also have similar logic in foldSPFofSPF().

It would take more work to add this to reassociate because that's specialized for binops, 
and min/max are not binops (or even single instructions). Also, I don't have evidence that 
larger min/max trees than this exist in real code, but if we find that's true, we might
want to reorganize where/how we do this optimization.

In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have:

int test(int xc, int xm, int xy) {
  int xk;
  if (xc < xm)
    xk = xc < xy ? xc : xy;
  else
    xk = xm < xy ? xm : xy;
  return xk;
}

This patch solves that problem because we recognize more min/max patterns after rL321672

https://rise4fun.com/Alive/Qjne
https://rise4fun.com/Alive/3yg

Differential Revision: https://reviews.llvm.org/D41603

llvm-svn: 321998
2018-01-08 15:05:34 +00:00
Alexey Bataev 5b9a77d4ea [SLP] Fix PR35777: Incorrect handling of aggregate values.
Summary:
Fixes the bug with incorrect handling of InsertValue|InsertElement
instrucions in SLP vectorizer. Currently, we may use incorrect
ExtractElement instructions as the operands of the original
InsertValue|InsertElement instructions.

Reviewers: mkuper, hfinkel, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41767

llvm-svn: 321994
2018-01-08 14:43:06 +00:00
Alexey Bataev 118a0a2c38 [SLP] Fix PR35628: Count external uses on extra reduction arguments.
Summary:
If the vectorized value is marked as extra reduction argument, its users
are not considered as external users. Patch fixes this.

Reviewers: mkuper, hfinkel, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41786

llvm-svn: 321993
2018-01-08 14:33:11 +00:00
Davide Italiano e15bffe9ea Revert "[SCCP] Manually fold branches on undef."
I thought this was responsible for PR35723, but I was
wrong, the issue lies elsewhere. Revert while I debug.

llvm-svn: 321975
2018-01-07 22:09:44 +00:00
Davide Italiano 4c39758a38 [SLPVectorizer] Reintroduce std::stable_sort(properlyDominates()).
The approach was never discussed, I wasn't able to reproduce this
non-determinism, and the original author went AWOL.
After a discussion on the ML, Philip suggested to revert this.

llvm-svn: 321974
2018-01-07 22:06:24 +00:00
Hal Finkel 0f1314c5ee [LV][VPlan] NFC patch to move LoopVectorizationPlanner class out of LoopVectorize.cpp
Another small step forward to move VPlan stuff outside of LoopVectorize.cpp.

VPlanBuilder.h is renamed to LoopVectorizationPlanner.h
LoopVectorizationPlanner class is moved from LoopVectorize.cpp to
LoopVectorizationPlanner.h LoopVectorizationCostModel::VectorizationFactor
class is moved to LoopVectorizationPlanner.h (used by the planner class) ---
this needs further streamlining work in later patches and thus all I did was
take it out of the CostModel class and moved to the header file.  The callback
function had to stay inside LoopVectorize.cpp since it calls an
InnerLoopVectorizer member function declared in it.  Next Steps: Make
InnerLoopVectorizer, LoopVectorizationCostModel, and other classes more modular
and more aligned with VPlan direction, in small increments.

Previous step was: r320900 (https://reviews.llvm.org/D41045)

Patch by Hideki Saito, thanks!

Differential Revision: https://reviews.llvm.org/D41420

llvm-svn: 321962
2018-01-07 16:02:58 +00:00
Florian Hahn 55be37e7d4 [CodeExtractor] Use subset of function attributes for extracted function.
In addition to target-dependent attributes, we can also preserve a
white-listed subset of target independent function attributes. The white-list
excludes problematic attributes, most prominently:

* attributes related to memory accesses, as alloca instructions
  could be moved in/out of the extracted block

* control-flow dependent attributes, like no_return or thunk, as the
  relerelevant instructions might or might not get extracted.

Thanks @efriedma and @aemerson for providing a set of attributes that cannot be
propagated.


Reviewers: efriedma, davidxl, davide, silvas

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D41334

llvm-svn: 321961
2018-01-07 11:22:25 +00:00
Florian Hahn a82eef2363 [InlineFunction] Preserve calling convention when forwarding VarArgs.
Reviewers: efriedma, rnk, davide

Reviewed By: rnk, davide

Differential Revision: https://reviews.llvm.org/D41556

llvm-svn: 321943
2018-01-06 20:56:27 +00:00
Florian Hahn de10e6e064 [InlineFunction] Preserve attributes when forwarding VarArgs.
Reviewers: rnk, efriedma

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D41555

llvm-svn: 321942
2018-01-06 20:46:00 +00:00
Florian Hahn 80788d8088 [InlineFunction] Inline vararg functions that do not access varargs.
If the varargs are not accessed by a function, we can inline the
function.

Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D41335

llvm-svn: 321940
2018-01-06 19:45:40 +00:00
Sanjay Patel 26a6fcde83 [InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b)
In the minimal case, this won't remove instructions, but it still improves
uses of existing values.

In the motivating example from PR35834, it does remove instructions, and
sets that case up to be optimized by something like D41603:
https://reviews.llvm.org/D41603

llvm-svn: 321936
2018-01-06 17:34:22 +00:00
Vedant Kumar b2ec02ba0b [Utils] Simplify salvageDebugInfo, NFCI
Having a single call to findDbgUsers() allows salvageDebugInfo() to
return earlier.

Differential Revision: https://reviews.llvm.org/D41787

llvm-svn: 321915
2018-01-05 23:27:02 +00:00
Sanjay Patel 5b6aacf2c1 [InstCombine] add folds for min(~a, b) --> ~max(a, b)
Besides the bug of omitting the inverse transform of max(~a, ~b) --> ~min(a, b),
the use checking and operand creation were off. We were potentially creating 
repeated identical instructions of existing values. This led to infinite
looping after I added the extra folds.

By using the simpler m_Not matcher and not creating new 'not' ops for a and b,
we avoid that problem. It's possible that not using IsFreeToInvert() here is
more limiting than the simpler matcher, but there are no tests for anything
more exotic. It's also possible that we should relax the use checking further
to handle a case like PR35834:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but we can make that a follow-up if it is needed. 

llvm-svn: 321882
2018-01-05 19:01:17 +00:00
Peter Collingbourne 9110cb456d WholeProgramDevirt: Simplify ORE getter mechanism for old PM. NFCI.
llvm-svn: 321841
2018-01-05 00:27:51 +00:00
Reid Kleckner cd78ddc119 Revert "[JumpThreading] Preservation of DT and LVI across the pass"
This reverts r321825, it causes crashes in Chromium. Reproducer
forthcoming.

llvm-svn: 321832
2018-01-04 23:23:46 +00:00
Brian M. Rzycki cdad6c0b60 [JumpThreading] Preservation of DT and LVI across the pass
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.

Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.

LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perfom the
preversation was minimally altered and was simply marked
as preserved for the PassManager to be informed.

This extends the analysis available to JumpThreading for
future enhancements. One example is loop boundary threading.

Reviewers: dberlin, kuhar, sebpop

Reviewed By: kuhar, sebpop

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40146

llvm-svn: 321825
2018-01-04 21:57:32 +00:00
Anna Thomas 9fca583757 Add assertion on DT availability during LI update in UpdateAnalysisInformation
This came up during discussions in llvm-commits for
rL321653: Check for unreachable preds before updating LI in
UpdateAnalysisInformation

The assert provides hints to passes to require both DT and LI if we plan on
updating LI through this function.

Tests run: make check

llvm-svn: 321805
2018-01-04 17:21:15 +00:00
Sanjay Patel c63f9014d6 [InstCombine] safely create a constant of the right type (PR35794)
llvm-svn: 321801
2018-01-04 14:31:56 +00:00
Aditya Kumar 1f90cae80f [GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load in case of a loop
Reviewers:
    dberlin
    sebpop
    eli.friedman

Differential Revision: https://reviews.llvm.org/D41453

llvm-svn: 321789
2018-01-04 07:47:24 +00:00
Matt Arsenault 8070882b4e StructurizeCFG: Fix broken backedge detection
The work order was changed in r228186 from SCC order
to RPO with an arbitrary sorting function. The sorting
function attempted to move inner loop nodes earlier. This
was was apparently relying on an assumption that every block
in a given loop / the same loop depth would be seen before
visiting another loop. In the broken testcase, a block
outside of the loop was encountered before moving onto
another block in the same loop. The testcase would then
structurize such that one blocks unconditional successor
could never be reached.

Revert to plain RPO for the analysis phase. This fixes
detecting edges as backedges that aren't really.

The processing phase does use another visited set, and
I'm unclear on whether the order there is as important.
An arbitrary order doesn't work, and triggers some infinite
loops. The reversed RPO list seems to work and is closer
to the order that was used before, minus the arbitary
custom sorting.

A few of the changed tests now produce smaller code,
and a few are slightly worse looking.

llvm-svn: 321751
2018-01-03 18:45:37 +00:00
Simon Pilgrim 3bf2d64589 [InstCombine] Check for out of range shift values using APInt before calling getZExtValue
Reduced from oss-fuzz #4871 test case

llvm-svn: 321748
2018-01-03 18:28:20 +00:00
Anna Thomas bdb9430917 [BasicBlockUtils] Check for unreachable preds before updating LI in UpdateAnalysisInformation
Summary:
We are incorrectly updating the LI when loop-simplify generates
dedicated exit blocks for a loop. The issue is that there's an implicit
assumption that the Preds passed into UpdateAnalysisInformation are
reachable. However, this is not true and breaks LI by incorrectly
updating the header of a loop.

One such case is when we generate dedicated exits when the exit block is
a landing pad (through SplitLandingPadPredecessors). There maybe other
cases as well, since we do not guarantee that Preds passed in are
reachable basic blocks.

The added test case shows how loop-simplify breaks LI for the outer loop (and DT in turn)
after we try to generate the LoopSimplifyForm.

Reviewers: davide, chandlerc, sanjoy

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41519

llvm-svn: 321653
2018-01-02 16:25:50 +00:00
Dmitry Venikov a58d8deb3a [InstCombine] Missed optimization in math expression: squashing sqrt functions
Summary: This patch enables folding under -ffast-math flag sqrt(a) * sqrt(b) -> sqrt(a*b)

Reviewers: hfinkel, spatel, davide

Reviewed By: spatel, davide

Subscribers: davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D41322

llvm-svn: 321637
2018-01-02 05:58:11 +00:00
Davide Italiano 86b7949f62 [SimplifyCFG] Return to the pass manager the correct value.
I wanted to commit this with r321603, but I failed to squash
the two commits.

llvm-svn: 321606
2017-12-31 16:54:03 +00:00
Davide Italiano 0512bf5af2 [Utils/Local] Use `auto` when the type is obvious. NFCI.
llvm-svn: 321605
2017-12-31 16:51:50 +00:00
Davide Italiano 5dd1c587e7 [Utils] Remove commented debug message. NFCI.
llvm-svn: 321604
2017-12-31 16:48:44 +00:00
Davide Italiano 9f074fe915 [SimplifyCFG] Stop hoisting musttail calls incorrectly.
PR35774.

llvm-svn: 321603
2017-12-31 16:47:16 +00:00
Benjamin Kramer c7fc81e659 Use phi ranges to simplify code. No functionality change intended.
llvm-svn: 321585
2017-12-30 15:27:33 +00:00
Matt Arsenault 8dcfa137f3 StructurizeCFG: Use phi iterator range
llvm-svn: 321568
2017-12-29 19:25:57 +00:00
Benjamin Kramer 24cb28bb54 Remove superfluous copies in sample profiling.
No functionliaty change intended.

llvm-svn: 321530
2017-12-28 18:10:41 +00:00
Guozhi Wei 29697c13bc Revert r321377, it causes regression to https://reviews.llvm.org/P8055.
llvm-svn: 321528
2017-12-28 17:02:34 +00:00
Benjamin Kramer 3a13ed60ba Avoid int to string conversion in Twine or raw_ostream contexts.
Some output changes from uppercase hex to lowercase hex, no other functionality change intended.

llvm-svn: 321526
2017-12-28 16:58:54 +00:00
Max Kazantsev a13e163a27 [RewriteStatepoints] Fix incorrect assertion
`RewriteStatepointsForGC` iterates over function blocks and their predecessors
in order of declaration. One of outcomes of this is that callsites are placed in
arbitrary order which has nothing to do with travelsar order.

On the other hand, function `recomputeLiveInValues` asserts that bases are
added to `Info.PointerToBase` before their deried pointers are updated. But
if call sites are processed in order different from RPOT, this is not necessarily
true. We cannot guarantee that the base was placed there before every
pointer derived from it. All we can guarantee is that this base was marked as
known base by this point.

This patch replaces the fact that we assert from checking that the base was
added to the map with assert that the base was marked as known base.

Differential Revision: https://reviews.llvm.org/D41593

llvm-svn: 321517
2017-12-28 12:03:12 +00:00
Simon Pilgrim 472689a159 [InstCombine] Check for isa<Instruction> before using cast<>
Protects against casts from constexpr etc.

Reduced from oss-fuzz #4788 test case

llvm-svn: 321515
2017-12-28 09:35:35 +00:00
Reid Kleckner 6d31001cd6 Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks"
This reverts r321138. It seems there are still underlying issues with
memdep. PR35519 seems to still be present if debug info is enabled. We
end up losing a memcpy. Somehow during store to memset merging, we
insert the memset after the memcpy or fail to update the memdep analysis
to account for the newly inserted memset of a pair.

Reduced test case:

  #include <assert.h>
  #include <stdio.h>
  #include <string>
  #include <utility>
  #include <vector>

  void do_push_back(
      std::vector<std::pair<std::string, std::vector<std::string>>>* crls) {
    crls->push_back(std::make_pair(std::string(), std::vector<std::string>()));
  }

  int __attribute__((optnone)) main() {
    // Put some data in the vector and then remove it so we take the push_back
    // fast path.
    std::vector<std::pair<std::string, std::vector<std::string>>> crl_set;
    crl_set.push_back({"asdf", {}});
    crl_set.pop_back();
    printf("first word in vector storage: %p\n", *(void**)crl_set.data());

    // Do the push_back which may fail to initialize the data.
    do_push_back(&crl_set);
    auto* first = &crl_set.back().first;
    printf("first word in vector storage (should be zero): %p\n",
           *(void**)crl_set.data());
    assert(first->empty());
    puts("ok");
  }

Compile with libc++, enable optimizations, and enable debug info:
$ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib

This program will assert with this change.

llvm-svn: 321510
2017-12-28 05:10:33 +00:00
Simon Pilgrim e7d032f1d8 [InstCombine] Gracefully handle out of range extractelement indices
InstSimplify is responsible for handling these, but we shouldn't just assert here.

Reduced from oss-fuzz #4808 test case

llvm-svn: 321489
2017-12-27 12:00:18 +00:00
Philip Reames cd13a66381 [instcombine] add powi(x, 2) -> x * x
llvm-svn: 321468
2017-12-27 01:30:12 +00:00
Philip Reames 5000ba69d7 Sink a couple of transforms from instcombine into instsimplify.
llvm-svn: 321467
2017-12-27 01:14:30 +00:00
Philip Reames 7a6db4fc4f [NFC] Extract out a helper function for SimplifyCall(CS, Q)
This simplifies code, but the real motivation is that it lets me clean up some downstream code.

llvm-svn: 321466
2017-12-27 00:16:12 +00:00
Zhaoshi Zheng 8af1e1cb78 [Unroll][DebugInfo] Propagate loop body's debug location to epilog preheader
NewExit and epilog PreHeader should has the same debug loc as the original loop
body, instead of original loop exit.

llvm-svn: 321465
2017-12-26 23:31:21 +00:00
Sanjay Patel 14adbacd8a [InstCombine] fix miscompile of frem with 0.0 operand (PR34870)
We might want to select NAN here or do this transform with fast-math,
but this should at least fix the miscompile.

llvm-svn: 321461
2017-12-26 22:12:20 +00:00
Benjamin Kramer 802e6255b2 Make helpers static. No functionality change.
llvm-svn: 321425
2017-12-24 12:46:22 +00:00
Florian Hahn 7e9328906b [CallSiteSplitting] Remove isOrHeader restriction.
By following the single predecessors of the predecessors of the call
site, we do not need to restrict the control flow.

Reviewed By: junbuml, davide

Differential Revision: https://reviews.llvm.org/D40729

llvm-svn: 321413
2017-12-23 20:02:26 +00:00
Davide Italiano 55b663431e [SCCP] Manually fold branches on undef.
This code was originally removed and replace with an assertion
because believed unnecessary. It turns out there was simply
no test coverage for this case, and the constant folder doesn't
yet know about patterns like `br undef %label1, %label2`.
Presumably at some point the constant folder might learn about
these patterns, but it's a broader change.
A testcase will be added to make sure this doesn't regress again
in the future.

Fixes PR35723.

llvm-svn: 321402
2017-12-23 15:06:30 +00:00
Guozhi Wei 33250340f4 [SimplifyCFG] Don't do if-conversion if there is a long dependence chain
If after if-conversion, most of the instructions in this new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate, because the control dependence is transformed into data dependence, and control dependence can be speculated, and thus, the second part can execute in parallel with the first part on modern OOO processor.

This patch checks for the long dependence chain, and give up if-conversion if find one.

Differential Revision: https://reviews.llvm.org/D39352

llvm-svn: 321377
2017-12-22 18:54:04 +00:00
Easwaran Raman a17f220590 Add hasProfileData() to check if a function has profile data. NFC.
Summary:
This replaces calls to getEntryCount().hasValue() with hasProfileData
that does the same thing. This refactoring is useful to do before adding
synthetic function entry counts but also a useful cleanup IMO even
otherwise. I have used hasProfileData instead of hasRealProfileData as
David had earlier suggested since I think profile implies "real" and I
use the phrase "synthetic entry count" and not "synthetic profile count"
but I am fine calling it hasRealProfileData if you prefer.

Reviewers: davidxl, silvas

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41461

llvm-svn: 321331
2017-12-22 01:33:52 +00:00
Michael Zolotukhin ad371e0caa [SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking.
If a block has N predecessors, then the current algorithm will try to
sink common code to this block N times (whenever we visit a
predecessor). Every attempt to sink the common code includes going
through all predecessors, so the complexity of the algorithm becomes
O(N^2).
With this patch we try to sink common code only when we visit the block
itself. With this, the complexity goes down to O(N).
As a side effect, the moment the code is sunk is slightly different than
before (the order of simplifications has been changed), that's why I had
to adjust two tests (note that neither of the tests is supposed to test
SimplifyCFG):
* test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic
the changes that previous implementation of SimplifyCFG would do.
* test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common
code sinking by a command line flag.

llvm-svn: 321236
2017-12-21 01:22:13 +00:00
Matthew Simpson cb35c5d5c2 [ICP] Expose unconditional call promotion interface
This patch modifies the indirect call promotion utilities by exposing and using
an unconditional call promotion interface. The unconditional promotion
interface (i.e., call promotion without creating an if-then-else) can be used
if it's known that an indirect call has only one possible callee. The existing
conditional promotion interface uses this unconditional interface to promote an
indirect call after it has been versioned and placed within the "then" block.

A consequence of unconditional promotion is that the fix-up operations for phi
nodes in the normal destination of invoke instructions are changed. This is
necessary because the existing implementation assumed that an invoke had been
versioned, creating a "merge" block where a return value bitcast could be
placed. In the new implementation, the edge between a promoted invoke's parent
block and its normal destination is split if needed to add a bitcast for the
return value. If the invoke is also versioned, the phi node merging the return
value of the promoted and original invoke instructions is placed in the "merge"
block.

Differential Revision: https://reviews.llvm.org/D40751

llvm-svn: 321210
2017-12-20 19:26:37 +00:00
Evgeniy Stepanov 3fd1b1a764 [hwasan] Implement -fsanitize-recover=hwaddress.
Summary: Very similar to AddressSanitizer, with the exception of the error type encoding.

Reviewers: kcc, alekseyshl

Subscribers: cfe-commits, kubamracek, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D41417

llvm-svn: 321203
2017-12-20 19:05:44 +00:00
Florian Hahn 012c8f97b2 [InstCombine] Add debug location to new caller.
Reviewers: rnk, aprantl, majnemer

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D414

llvm-svn: 321191
2017-12-20 17:16:59 +00:00
Mohammad Shahid 3a934d6ab9 Revert r320548:[SLP] Vectorize jumbled memory loads
llvm-svn: 321181
2017-12-20 15:26:59 +00:00
Florian Hahn 467abe3e4f [LV] Remove unnecessary DoExtraAnalysis guard (silent bug)
canVectorize is only checking if the loop has a normalized pre-header if DoExtraAnalysis is true.
This doesn't make sense to me because reporting analysis information shouldn't alter legality
checks. This is probably the result of a last minute minor change before committing (?).

Patch by Diego Caballero.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D40973

llvm-svn: 321172
2017-12-20 13:28:38 +00:00
Dan Gohman aa3922819e [memcpyopt] Teach memcpyopt to optimize across basic blocks
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.

This is r319482 and r319483, along with fixes for PR35519: fix the 
optimization that merges stores into memsets to preserve cached memdep
info, and fix memdep's non-local caching strategy to not assume that larger
queries are always more conservative than smaller ones.

Fixes PR28958 and PR35519.

Differential Revision: https://reviews.llvm.org/D40802

llvm-svn: 321138
2017-12-20 01:36:25 +00:00
Adrian Prantl 0e6694d111 Silence a bunch of implicit fallthrough warnings
llvm-svn: 321114
2017-12-19 22:05:25 +00:00
Haicheng Wu 5b106ef92e [SeparateConstOffsetFromGEP] Fix a typo. NFC.
do CSE for to => do CSE to

llvm-svn: 321098
2017-12-19 18:49:21 +00:00
Max Kazantsev fd95ee0c9a [JumpThreading] Restrict PRE across instructions that don't pass control to successors
PRE in JumpThreading should not be able to hoist copy of non-speculable loads across
instructions that don't always transfer execution to their successors, otherwise they may
introduce an unsafe load which otherwise would not be executed.

The same problem for GVN was fixed as rL316975.

Differential Revision: https://reviews.llvm.org/D40347

llvm-svn: 321063
2017-12-19 09:10:21 +00:00
Teresa Johnson 915897e21b [PGO] Fix handling of cold entry count for instrumented PGO
Summary:
In r277849, getEntryCount was changed to return None when the entry
count was 0, specifically for SamplePGO where it means no samples were
recorded. However, for instrumentation PGO a 0 entry count should be
returned directly, since it does mean that the function was completely
cold. Otherwise we end up treating these functions conservatively
in isFunctionEntryCold() and isColdBB().

Instead, for SamplePGO use -1 when there are no samples, and change
getEntryCount to return None when the value is -1.

Reviewers: danielcdh, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41307

llvm-svn: 321018
2017-12-18 20:02:43 +00:00
Dimitry Andric e4f5d01033 Fix more inconsistent line endings. NFC.
llvm-svn: 321016
2017-12-18 19:46:56 +00:00
Matt Arsenault d89d0b6494 Removed unused DominanceFrontier
llvm-svn: 321001
2017-12-18 18:01:13 +00:00
Xinliang David Li 19fb5b467b [PGO] add MST min edge selection heuristic to ensure non-zero entry count
Differential Revision: http://reviews.llvm.org/D41059

llvm-svn: 320998
2017-12-18 17:56:19 +00:00
Sean Fertile 5fb624a3b8 [Memcpy Loop Lowering] Remove the fixed int8 lowering.
Switch over to the lowering that uses target supplied operand types.

Differential Revision: https://reviews.llvm.org/D41201

llvm-svn: 320989
2017-12-18 15:31:14 +00:00
Eugene Leviant c95b49603e [ThinLTO] Remove unused code
This is a re-commit of r320464, after patch for gold plugin
was landed.

llvm-svn: 320968
2017-12-18 10:53:45 +00:00
Hiroshi Inoue c6faf15459 [SROA] Disable non-whole-alloca splits by default
This patch introduce a switch to control splitting of non-whole-alloca slices with default off.
The switch will be default on again after fixing an issue reported in PR35657.

llvm-svn: 320958
2017-12-18 06:47:37 +00:00
Sean Fertile 68d7f9da76 [Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed.
If the loop operand type is int8 then there will be no residual loop for the
unknown size expansion. Dont create the residual-size and bytes-copied values
when they are not needed.

llvm-svn: 320929
2017-12-16 22:41:39 +00:00
Sanjay Patel 5a0cdac174 [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.

More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs 
   into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs 
   into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
   into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node 
   when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a 
   variety of ways.
   a. For #2, the vector path is missing the case for setlt with a '1' constant.
   b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel 
   produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the 
   shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift 
   produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific 
   decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.

define i32 @abs_shifty(i32 %x) {
  %signbit = ashr i32 %x, 31 
  %add = add i32 %signbit, %x  
  %abs = xor i32 %signbit, %add 
  ret i32 %abs
}

define i32 @abs_cmpsubsel(i32 %x) {
  %cmp = icmp slt i32 %x, zeroinitializer
  %sub = sub i32 zeroinitializer, %x
  %abs = select i1 %cmp, i32 %sub, i32 %x
  ret i32 %abs
}

define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
  %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> 
  %add = add <4 x i32> %signbit, %x  
  %abs = xor <4 x i32> %signbit, %add 
  ret <4 x i32> %abs
}

define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}

> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx 
> abs_shifty:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_cmpsubsel:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_shifty_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> abs_cmpsubsel_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
> 
> abs_cmpsubsel: 
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
>                                        
> abs_shifty_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> abs_cmpsubsel_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le 
> abs_shifty:  
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_cmpsubsel:
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_shifty_vec:   
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
> 
> abs_cmpsubsel_vec: 
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
>

Differential Revision: https://reviews.llvm.org/D40984

llvm-svn: 320921
2017-12-16 16:41:17 +00:00
Hal Finkel 5444f40965 [LV] Extend InstWidening with CM_Widen_Recursive
Changes to the original scalar loop during LV code gen cause the return value
of Legal->isConsecutivePtr() to be inconsistent with the return value during
legal/cost phases (further analysis and information of the bug is in D39346).
This patch is an alternative fix to PR34965 following the CM_Widen approach
proposed by Ayal and Gil in D39346. It extends InstWidening enum with
CM_Widen_Reverse to properly record the widening decision for consecutive
reverse memory accesses and, consequently, get rid of the
Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner
solution to PR34965 than the one in D39346.

Fixes PR34965.

Patch by Diego Caballero, thanks!

Differential Revision: https://reviews.llvm.org/D40742

llvm-svn: 320913
2017-12-16 02:55:24 +00:00
Hal Finkel 2ff24731bb [SimplifyLibCalls] Inline calls to cabs when it's safe to do so
When unsafe algerbra is allowed calls to cabs(r) can be replaced by:

  sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))

Patch by Paul Walker, thanks!

Differential Revision: https://reviews.llvm.org/D40069

llvm-svn: 320901
2017-12-16 01:26:25 +00:00
Hal Finkel 7333aa9f16 [LV] NFC patch for moving VP*Recipe class definitions from LoopVectorize.cpp to VPlan.h
This is a small step forward to move VPlan stuff to where it should belong (i.e., VPlan.*):

  1. VP*Recipe classes in LoopVectorize.cpp are moved to VPlan.h.
  2. Many of VP*Recipe::print() and execute() definitions are still left in
     LoopVectorize.cpp since they refer to things declared in LoopVectorize.cpp. To
     be moved to VPlan.cpp at a later time.
  3. InterleaveGroup class is moved from anonymous namespace to llvm namespace.
     Referencing it in anonymous namespace from VPlan.h ended up in warning.

Patch by Hideki Saito, thanks!

Differential Revision: https://reviews.llvm.org/D41045

llvm-svn: 320900
2017-12-16 01:12:50 +00:00
Teresa Johnson 69b2de8466 Fix NDEBUG build problem in r320895
Fix incorrect placement of #endif causing NDEBUG build failures.

llvm-svn: 320897
2017-12-16 00:29:31 +00:00
Teresa Johnson 81bbf74265 [ThinLTO] Enable importing of aliases as copy of aliasee
Summary:
This implements a missing feature to allow importing of aliases, which
was previously disabled because alias cannot be available_externally.
We instead import an alias as a copy of its aliasee.

Some additional work was required in the IndexBitcodeWriter for the
distributed build case, to ensure that the aliasee has a value id
in the distributed index file (i.e. even when it is not being
imported directly).

This is a performance win in codes that have many aliases, e.g. C++
applications that have many constructor and destructor aliases.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D40747

llvm-svn: 320895
2017-12-16 00:18:12 +00:00
Jun Bum Lim 44c58d35c1 Re-commit : [LICM] Allow sinking when foldable in loop
This recommits r320823 reverted due to the test failure in sink-foldable.ll and
an unused variable. Added "REQUIRES: aarch64-registered-target" in the test
and removed unused variable.

Original commit message:

  Continue trying to sink an instruction if its users in the loop is foldable.
  This will allow the instruction to be folded in the loop by decoupling it from
  the user outside of the loop.

  Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier

  Reviewed By: hfinkel

  Subscribers: javed.absar, bmakam, mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D37076

llvm-svn: 320858
2017-12-15 20:33:24 +00:00
Sean Fertile 42b13343fd [Memcpy Loop Lowering] Insert loop BB inbetween the split BB.
The original memcpy expansion inserted the loop basic block inbetween
the 2 new basic blocks created by splitting the original block the memcpy
call was in. This commit makes the new memcpy expansion do the same to keep the
layout of the IR matching between the old and new implementations.

Differential Review: https://reviews.llvm.org/D41197

llvm-svn: 320848
2017-12-15 19:29:12 +00:00
Sanjay Patel c722e26549 fix typo in comment and remove inaccurate comment; NFC
llvm-svn: 320838
2017-12-15 18:25:13 +00:00
Jun Bum Lim 5efd4d8b5e Revert "Re-commit : [LICM] Allow sinking when foldable in loop"
This reverts commit r320833.

llvm-svn: 320836
2017-12-15 18:12:49 +00:00
Jun Bum Lim 83ccad6684 Re-commit : [LICM] Allow sinking when foldable in loop
This recommit r320823 after fixing a test failure.

 Original commit message:

    Continue trying to sink an instruction if its users in the loop is foldable.
    This will allow the instruction to be folded in the loop by decoupling it from
    the user outside of the loop.

    Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier

    Reviewed By: hfinkel

    Subscribers: javed.absar, bmakam, mcrosier, llvm-commits

    Differential Revision: https://reviews.llvm.org/D37076

llvm-svn: 320833
2017-12-15 17:58:59 +00:00
Jun Bum Lim 6136d87f5d Revert "[LICM] Allow sinking when foldable in loop"
This reverts commit r320823.

llvm-svn: 320828
2017-12-15 16:35:09 +00:00
Jun Bum Lim 22855c26a5 [LICM] Allow sinking when foldable in loop
Summary:
Continue trying to sink an instruction if its users in the loop is foldable.
This will allow the instruction to be folded in the loop by decoupling it from
the user outside of the loop.

Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier

Reviewed By: hfinkel

Subscribers: javed.absar, bmakam, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D37076

llvm-svn: 320823
2017-12-15 16:09:54 +00:00
Fedor Sergeev 4b86d79048 [PM] port Rewrite Statepoints For GC to the new pass manager.
Summary:
The port is nearly straightforward.
The only complication is related to the analyses handling,
since one of the analyses used in this module pass is domtree,
which is a function analysis. That requires asking for the results
of each function and disallows a single interface for run-on-module
pass action.

Decided to copy-paste the main body of this pass.
Most of its code is requesting analyses anyway, so not that much
of a copy-paste.

The rest of the code movement is to transform all the implementation
helper functions like stripNonValidData into non-member statics.

Extended all the related LLVM tests with new-pass-manager use.
No failures.

Reviewers: sanjoy, anna, reames

Reviewed By: anna

Subscribers: skatkov, llvm-commits

Differential Revision: https://reviews.llvm.org/D41162

llvm-svn: 320796
2017-12-15 09:32:11 +00:00
Sanjay Patel 0ab0c1a201 [SimplifyCFG] don't sink common insts too soon (PR34603)
This should solve:
https://bugs.llvm.org/show_bug.cgi?id=34603
...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run.
It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the
sinking transform later in the optimization pipeline.

Differential Revision: https://reviews.llvm.org/D38566

llvm-svn: 320749
2017-12-14 22:05:20 +00:00
Guozhi Wei d22d1b953d [SLPVectorizer] Don't ignore scalar extraction instructions of aggregate value
In SLPVectorizer, the vector build instructions (insertvalue for aggregate type) is passed to BoUpSLP.buildTree, it is treated as UserIgnoreList, so later in cost estimation, the cost of these instructions are not counted. 
For aggregate value, later usage are more likely to be done in scalar registers, either used as individual scalars or used as a whole for function call or return value. Ignore scalar extraction instructions may cause too aggressive vectorization for aggregate values, and slow down performance. So for vectorization of aggregate value, the scalar extraction instructions are required in cost estimation.

Differential Revision: https://reviews.llvm.org/D41139

llvm-svn: 320736
2017-12-14 19:35:43 +00:00
Fedor Sergeev 83bcc68afa [PM][InstCombine] fixing omission of AliasAnalysis in new-pass-manager's version of InstCombine
Summary:
Passing AliasAnalysis results instead of nullptr appears to work just fine.
A couple new-pass-manager tests updated to align with new order of analyses.

Reviewers: chandlerc, spatel, craig.topper

Reviewed By: chandlerc

Subscribers: mehdi_amini, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D41203

llvm-svn: 320687
2017-12-14 10:36:31 +00:00
Dorit Nuzman 4750c785b3 [LV] Support efficient vectorization of an induction with redundant casts
D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose
update chain involves casts; PSCEV can now build an AddRecurrence for some
forms of such phi nodes, under the proper runtime overflow test. This means
that we can identify such phi nodes as an induction, and the loop-vectorizer
can now vectorize such inductions, however inefficiently. The vectorizer
doesn't know that it can ignore the casts, and so it vectorizes them.

This patch records the casts in the InductionDescriptor, so that they could
be marked to be ignored for cost calculation (we use VecValuesToIgnore for
that) and ignored for vectorization/widening/scalarization (i.e. treated as
TriviallyDead).

In addition to marking all these casts to be ignored, we also need to make
sure that each cast is mapped to the right vector value in the vector loop body
(be it a widened, vectorized, or scalarized induction). So whenever an
induction phi is mapped to a vector value (during vectorization/widening/
scalarization), we also map the respective cast instruction (if exists) to that
vector value. (If the phi-update sequence of an induction involves more than one
cast, then the above mapping to vector value is relevant only for the last cast
of the sequence as we allow only the "last cast" to be used outside the
induction update chain itself).

This is the last step in addressing PR30654.

llvm-svn: 320672
2017-12-14 07:56:31 +00:00
Sanjay Patel 558a465473 [EarlyCSE] recognize swapped variants of abs/nabs as equivalent
Extends https://reviews.llvm.org/rL320640

Differential Revision: https://reviews.llvm.org/D41136

llvm-svn: 320653
2017-12-13 22:57:35 +00:00
Brian M. Rzycki 580bc3c8fa Reverting [JumpThreading] Preservation of DT and LVI across the pass
Stage 2 bootstrap failed:
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/14434

llvm-svn: 320641
2017-12-13 22:01:17 +00:00
Sanjay Patel 3c7a35de7f [EarlyCSE] recognize commuted and swapped variants of min/max as equivalent (PR35642)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=35642
...we can have different forms of min/max, so we should recognize those here in EarlyCSE 
similar to how we already handle binops and compares that can commute.

Differential Revision: https://reviews.llvm.org/D41136

llvm-svn: 320640
2017-12-13 21:58:15 +00:00
Michael Zolotukhin 6af4f232b5 Remove redundant includes from lib/Transforms.
llvm-svn: 320628
2017-12-13 21:31:01 +00:00
Brian M. Rzycki d989af98b3 [JumpThreading] Preservation of DT and LVI across the pass
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.

Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.

LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perfom the
preversation was minimally altered and was simply marked
as preserved for the PassManager to be informed.

This extends the analysis available to JumpThreading for
future enhancements. One example is loop boundary threading.

Reviewers: dberlin, kuhar, sebpop

Reviewed By: kuhar, sebpop

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40146

llvm-svn: 320612
2017-12-13 20:52:26 +00:00
Aditya Kumar 49c03b11df [GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load
w.r.t. the paper
"A Practical Improvement to the Partial Redundancy Elimination in SSA Form"
(https://sites.google.com/site/jongsoopark/home/ssapre.pdf)

Proper dominance check was missing here, so having a loopinfo should not be required.
Committing this diff as this fixes the bug, if there are
further concerns, I'll be happy to work on them.

Differential Revision: https://reviews.llvm.org/D39781

llvm-svn: 320607
2017-12-13 19:40:07 +00:00
Igor Laevsky e0edb66475 Reintroduce r320049, r320014 and r319894.
OpenGL issues should be fixed by now.

llvm-svn: 320568
2017-12-13 11:21:18 +00:00
Mohammad Shahid dbd30edb7f [SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh

Reviewed By: Ayal

Subscribers: mgrang, dcaballe, hans, mzolotukhin

Differential Revision: https://reviews.llvm.org/D36130

llvm-svn: 320548
2017-12-13 03:08:29 +00:00
Florian Hahn beda7d517d [CallSiteSplitting] Refactor creating callsites.
Summary:
This change makes the call site creation more general if any of the
arguments is predicated on a condition in the call site's predecessors.

If we find a callsite, that potentially can be split, we collect the set
of conditions for the call site's predecessors (currently only 2
predecessors are allowed). To do that, we traverse each predecessor's
predecessors as long as it only has single predecessors and record the
condition, if it is relevant to the call site. For each condition, we
also check if the condition is taken or not. In case it is not taken,
we record the inverse predicate.

We use the recorded conditions to create the new call sites and split
the basic block.

This has 2 benefits: (1) it is slightly easier to see what is going on
(IMO) and (2) we can easily extend it to handle more complex control
flow.

Reviewers: davidxl, junbuml

Reviewed By: junbuml

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40728

llvm-svn: 320547
2017-12-13 03:05:20 +00:00
Evgeniy Stepanov ecb48e523e [hwasan] Inline instrumentation & fixed shadow.
Summary: This brings CPU overhead on bzip2 down from 5.5x to 2x.

Reviewers: kcc, alekseyshl

Subscribers: kubamracek, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41137

llvm-svn: 320538
2017-12-13 01:16:34 +00:00
Alexey Bataev 83c15b1363 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320525
2017-12-12 20:28:46 +00:00
Fiona Glaser b8a330c42a Reassociate: add global reassociation algorithm
This algorithm (explained more in the source code) takes into account
global redundancies by building a "pair map" to find common subexprs.

The primary motivation of this is to handle situations like

foo = (a * b) * c
bar = (a * d) * c

where we currently don't identify that "a * c" is redundant.

Accordingly, it prioritizes the emission of a * c so that CSE
can remove the redundant calculation later.

Does not change the actual reassociation algorithm -- only the
order in which the reassociated operand chain is reconstructed.

Gives ~1.5% floating point math instruction count reduction on
a large offline suite of graphics shaders.

llvm-svn: 320515
2017-12-12 19:18:02 +00:00
Alexey Bataev fa0a76dbcc Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320510 - again sanitizers bbots.

llvm-svn: 320513
2017-12-12 19:12:34 +00:00
Hiroshi Yamauchi f3bda1daa2 Split IndirectBr critical edges before PGO gen/use passes.
Summary:
The PGO gen/use passes currently fail with an assert failure if there's a
critical edge whose source is an IndirectBr instruction and that edge
needs to be instrumented.

To avoid this in certain cases, split IndirectBr critical edges in the PGO
gen/use passes. This works for blocks with single indirectbr predecessors,
but not for those with multiple indirectbr predecessors (splitting an
IndirectBr critical edge isn't always possible.)

Reviewers: davidxl, xur

Reviewed By: davidxl

Subscribers: efriedma, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D40699

llvm-svn: 320511
2017-12-12 19:07:43 +00:00
Alexey Bataev 195c97e220 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320510
2017-12-12 18:47:00 +00:00
Alexey Bataev 6132a50d2a Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320499 again to resolve the problem with the
sanitizers bbots.

llvm-svn: 320501
2017-12-12 17:35:29 +00:00
Alexey Bataev ca4c9a5246 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320499
2017-12-12 17:19:15 +00:00
Alexey Bataev d19dbe6791 Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320496 to solve the problems with sanitizer
buildbots.

llvm-svn: 320498
2017-12-12 17:08:48 +00:00
Alexey Bataev d0c3aeb200 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320496
2017-12-12 16:58:48 +00:00
Alexey Bataev c9f1d2e4a0 Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320488 because of the failed asan buildbots..

llvm-svn: 320490
2017-12-12 16:05:52 +00:00
Alexey Bataev fb68c48a82 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320488
2017-12-12 15:54:49 +00:00
Alexey Bataev ca2a8cea2f Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320483 because of the failed Windows buildbots.

llvm-svn: 320485
2017-12-12 15:24:17 +00:00
Alexey Bataev 1daef8a667 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320483
2017-12-12 15:03:17 +00:00
Anna Thomas 2dd9835f35 [InstComineLoadStoreAlloca] Optimize stores to GEP off null base
Summary:
Currently, in InstCombineLoadStoreAlloca, we have simplification
rules for the following cases:
  1. load off a null
  2. load off a GEP with null base
  3. store to a null

This patch adds support for the fourth case which is store into a
GEP with null base. Since this is UB as well (and directly analogous to
the load off a GEP with null base), we can substitute the stored val
with undef in instcombine, so that SimplifyCFG can optimize this code
into unreachable code.

Note: Right now, simplifyCFG hasn't been taught about optimizing
this to unreachable and adding an llvm.trap (this is already done for
the above 3 cases).

Reviewers: majnemer, hfinkel, sanjoy, davide

Reviewed by: sanjoy, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41026

llvm-svn: 320480
2017-12-12 14:12:33 +00:00
Eugene Leviant d53f3da772 Revert r320464 as it breaks gold plugin tests
llvm-svn: 320467
2017-12-12 10:12:46 +00:00
Igor Laevsky d63560b817 Revert r320049, r320014 and r319894
They were causing failures of the piglit OpenGL tests with AMD GPUs using the
Mesa radeonsi driver.

llvm-svn: 320466
2017-12-12 10:03:39 +00:00
Eugene Leviant 3695183395 [ThinLTO] Remove unused code from thinLTOInternalizeModule
Differential revision: https://reviews.llvm.org/D40970

llvm-svn: 320464
2017-12-12 09:12:32 +00:00
Dorit Nuzman 927b31600e [LV] Ignore the cost of values that will not appear in the vectorized loop
VecValuesToIgnore holds values that will not appear in the vectorized loop.
We should therefore ignore their cost when VF > 1.

Differential Revision: https://reviews.llvm.org/D40883

llvm-svn: 320463
2017-12-12 08:57:43 +00:00
Mikael Holmen 66cf383761 [CallSiteSplitting] Don't let debug intrinsics affect optimizations
Summary:
This solves PR35616.

We don't want the compiler to generate different code when we compile
with/without -g, so we now ignore debug intrinsics when determining if
the optimization can trigger or not.

Reviewers: junbuml

Subscribers: davide, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D41068

llvm-svn: 320460
2017-12-12 07:29:57 +00:00
Matt Arsenault 3e268cc0dd LSR: Check more intrinsic pointer operands
llvm-svn: 320424
2017-12-11 21:38:43 +00:00
Hans Wennborg 27d1c00c01 Revert r320407 "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
The tests fail (opt asserts) on Windows.

> Summary:
> If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
> &V2)))), bitcast)`, but the load is used in other instructions, it leads
> to looping in InstCombiner. Patch adds additional check that all users
> of the load instructions are stores and then replaces all uses of load
> instruction by the new one with new type.
>
> Reviewers: RKSimon, spatel, majnemer
>
> Subscribers: llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320421
2017-12-11 21:15:27 +00:00
Adrian Prantl 3c6c14d14b ASAN: Provide reliable debug info for local variables at -O0.
The function stack poisioner conditionally stores local variables
either in an alloca or in malloc'ated memory, which has the
unfortunate side-effect, that the actual address of the variable is
only materialized when the variable is accessed, which means that
those variables are mostly invisible to the debugger even when
compiling without optimizations.

This patch stores the address of the local stack base into an alloca,
which can be referred to by the debug info and is available throughout
the function. This adds one extra pointer-sized alloca to each stack
frame (but mem2reg can optimize it away again when optimizations are
enabled, yielding roughly the same debug info quality as before in
optimized code).

rdar://problem/30433661

Differential Revision: https://reviews.llvm.org/D41034

llvm-svn: 320415
2017-12-11 20:43:21 +00:00
Alexey Bataev ec128ace8a [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320407
2017-12-11 19:11:16 +00:00
Alexander Potapenko 3c934e4864 [MSan] Hotfix compilation
For some reason the override directives got removed in r320373.
I suspect this to be an unwanted effect of clang-format.

llvm-svn: 320381
2017-12-11 15:48:56 +00:00
Alexander Potapenko c07e6a0eff [MSan] introduce getShadowOriginPtr(). NFC.
This patch introduces getShadowOriginPtr(), a method that obtains both the shadow and origin pointers for an address as a Value pair.
The existing callers of getShadowPtr() and getOriginPtr() are updated to use getShadowOriginPtr().

The rationale for this change is to simplify KMSAN instrumentation implementation.
In KMSAN origins tracking is always enabled, and there's no direct mapping between the app memory and the shadow/origin pages.
Both the shadow and the origin pointer for a given address are obtained by calling a single runtime hook from the instrumentation,
therefore it's easier to work with those pointers together.

Reviewed at https://reviews.llvm.org/D40835.

llvm-svn: 320373
2017-12-11 15:05:22 +00:00
Sanjay Patel b23e148114 [SimplifyLibCalls] propagate FMF when folding pow(x, -1.0) call
Follow-up for a bug that's similar to:
https://bugs.llvm.org/show_bug.cgi?id=35601

llvm-svn: 320312
2017-12-10 17:25:54 +00:00
Sanjay Patel 09ec34349a [SimplifyLibCalls] propagate FMF when folding pow(x, 2.0) call (PR35601)
This should fix the larger problem with sqrt shown in:
https://bugs.llvm.org/show_bug.cgi?id=35601

llvm-svn: 320310
2017-12-10 16:52:26 +00:00
Xinliang David Li fa3f1a15b2 [PGO] change arg type to uint64_t to match member field type
llvm-svn: 320285
2017-12-10 07:39:53 +00:00
Simon Pilgrim a42a54258e [InstCombine] Fix SimplifyDemandedUseBits SHL handling (PR35515)
Don't assume that the pattern matched SRL can be cast to an Instruction (might be ConstExpr etc.)

llvm-svn: 320270
2017-12-09 23:42:56 +00:00
Florian Hahn c5bebffe4f [InlineFunction] Set debug loc for call to forward varargs.
Reviewers: aprantl, dblaikie, rnk

Reviewed By: rnk

Subscribers: eraman, llvm-commits, JDevlieghere

Differential Revision: https://reviews.llvm.org/D40432

llvm-svn: 320252
2017-12-09 14:25:33 +00:00
Kamil Rytarowski 3d3f91e832 Register NetBSD/x86_64 in MemorySanitizer.cpp
Summary:
Reuse the Linux new mapping as it is.

Sponsored by <The NetBSD Foundation>

Reviewers: joerg, eugenis, vitalybuka

Reviewed By: vitalybuka

Subscribers: llvm-commits, #sanitizers

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D41022

llvm-svn: 320219
2017-12-09 00:32:09 +00:00
Evgeniy Stepanov c667c1f47a Hardware-assisted AddressSanitizer (llvm part).
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumenation adds a global constructor + runtime callbacks
for every load and store.

HWASan comes with its own IR attribute.

A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).

Reviewers: kcc, pcc, alekseyshl

Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D40932

llvm-svn: 320217
2017-12-09 00:21:41 +00:00
Adrian Prantl d13170174c Generalize llvm::replaceDbgDeclare and actually support the use-case that
is mentioned in the documentation (inserting a deref before the plus_uconst).

llvm-svn: 320203
2017-12-08 21:58:18 +00:00
Florian Hahn e5089e2e94 [CodeExtractor] Add debug locations for new call and branch instrs.
Summary:
If a partially inlined function has debug info, we have to add debug
locations to the call instruction calling the outlined function.
We use the debug location of the first instruction in the outlined
function, as the introduced call transfers control to this statement and
there is no other equivalent line in the source code.

We also use the same debug location for the branch instruction added
to jump from artificial entry block for the outlined function, which just
jumps to the first actual basic block of the outlined function.

Reviewers: davide, aprantl, rriddle, dblaikie, danielcdh, wmi

Reviewed By: aprantl, rriddle, danielcdh

Subscribers: eraman, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D40413

llvm-svn: 320199
2017-12-08 21:49:03 +00:00
Xinliang David Li d91057bf52 Revert r320104: infinite loop profiling bug fix
Causes unexpected memory issue with New PM this time.
The new PM invalidates BPI but not BFI, leaving the
reference to BPI from BFI invalid.

Abandon this patch.  There is a more general solution
which also handles runtime infinite loop (but not statically).

llvm-svn: 320180
2017-12-08 19:38:07 +00:00
Brian M. Rzycki 0eae123d9e [JumpThreading] Minor comment cleanup. NFC. (test commit)
llvm-svn: 320179
2017-12-08 19:36:32 +00:00
Alexey Bataev ec95c6cc0a [InstCombine] PR35354: Convert store(bitcast, load bitcast (select (Cond, &V1, &V2)) --> store (, load (select(Cond, load &V1, load &V2)))
Summary:
If we have the code like this:
```
float a, b;
a = std::max(a ,b);
```
it is converted into something like this:
```
%call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %a.addr, float* nonnull dereferenceable(4) %b.addr)
%1 = bitcast float* %call to i32*
%2 = load i32, i32* %1, align 4
%3 = bitcast float* %a.addr to i32*
store i32 %2, i32* %3, align 4
```
After inlinning this code is converted to the next:
```
%1 = load float, float* %a.addr
%2 = load float, float* %b.addr
%cmp.i = fcmp fast olt float %1, %2
%__b.__a.i = select i1 %cmp.i, float* %a.addr, float* %b.addr
%3 = bitcast float* %__b.__a.i to i32*
%4 = load i32, i32* %3, align 4
%5 = bitcast float* %arrayidx to i32*
store i32 %4, i32* %5, align 4

```
This pattern is not recognized as minmax pattern.
Patch solves this problem by converting sequence
```
store (bitcast, (load bitcast (select ((cmp V1, V2), &V1, &V2))))
```
to a sequence
```
store (,load (select((cmp V1, V2), &V1, &V2)))
```
After this the code is recognized as minmax pattern.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40304

llvm-svn: 320157
2017-12-08 15:32:10 +00:00
Bill Seurer 957a076cce [PowerPC][asan] Update asan to handle changed memory layouts in newer kernels
In more recent Linux kernels with 47 bit VMAs the layout of virtual memory
for powerpc64 changed causing the address sanitizer to not work properly. This
patch adds support for 47 bit VMA kernels for powerpc64 and fixes up test
cases.

https://reviews.llvm.org/D40907

There is an associated patch for compiler-rt.

Tested on several 4.x and 3.x kernel releases.

llvm-svn: 320109
2017-12-07 22:53:33 +00:00
Alina Sbirlea 193429f0c8 [ModRefInfo] Make enum ModRefInfo an enum class [NFC].
Summary:
Make enum ModRefInfo an enum class. Changes to ModRefInfo values should
be done using inline wrappers.
This should prevent future bit-wise opearations from being added, which can be more error-prone.

Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40933

llvm-svn: 320107
2017-12-07 22:41:34 +00:00
Xinliang David Li 4b0027f671 [PGO] detect infinite loop and form MST properly
Differential Revision: http://reviews.llvm.org/D40873

llvm-svn: 320104
2017-12-07 22:23:28 +00:00
Igor Laevsky 4a4f2e8c67 [InstCombine] Don't crash on out of bounds index in the insertelement
Differential Revision: https://reviews.llvm.org/D40390

llvm-svn: 320049
2017-12-07 15:00:52 +00:00
Adam Nemet a502ee73c4 [LV] Interleaved access vectorization: fix computing new alias info
As a new access is generated spanning across multiple fields, we need to
propagate alias info from all the fields to form the most generic alias info.

rdar://35602528

Differential Revision: https://reviews.llvm.org/D40617

llvm-svn: 319979
2017-12-06 22:42:24 +00:00
Sanjay Patel b6404a8ca6 [InstCombine] canonicalize constant-minus-boolean to select-of-constants
This restores the half of:
https://reviews.llvm.org/rL75531
that was reverted at:
https://reviews.llvm.org/rL159230

For the x86 case mentioned there, we now produce:
leal 1(%rdi), %eax
subl %esi, %eax

We have target hooks to invert this in DAGCombiner (and x86 is enabled) with:
https://reviews.llvm.org/rL296977
https://reviews.llvm.org/rL311731

AArch64 and possibly other targets would probably benefit from enabling those hooks too. 
See PR30327:
https://bugs.llvm.org/show_bug.cgi?id=30327#c2

Differential Revision: https://reviews.llvm.org/D40612

llvm-svn: 319964
2017-12-06 21:22:57 +00:00
Matthew Simpson e363d2cebb [PGO] Make indirect call promotion a utility
This patch factors out the main code transformation utilities in the pgo-driven
indirect call promotion pass and places them in Transforms/Utils. The change is
intended to be a non-functional change, letting non-pgo-driven passes share a
common implementation with the existing pgo-driven pass.

The common utilities are used to conditionally promote indirect call sites to
direct call sites. They perform the underlying transformation, and do not
consider profile information. The pgo-specific details (e.g., the computation
of branch weight metadata) have been left in the indirect call promotion pass.

Differential Revision: https://reviews.llvm.org/D40658

llvm-svn: 319963
2017-12-06 21:22:54 +00:00
Alina Sbirlea 18fea013de [ModRefInfo] Do not use ModRefInfo result in if conditions as this makes
assumptions about the values in the enum. Replace with wrapper returning
bool [NFC].

llvm-svn: 319949
2017-12-06 19:56:37 +00:00
Florian Hahn 115d99162c [InlineFunction] Only replace call if there are VarArgs to forward.
Summary:
There is no need to replace the original call instruction if no
 VarArgs need to be forwarded. 

Reviewers: davide, rnk, majnemer, efriedma

Reviewed By: efriedma

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D40412

llvm-svn: 319947
2017-12-06 19:47:24 +00:00
Sanjay Patel 3e069f5724 [LoopUtils] simplify createTargetReduction(); NFCI
llvm-svn: 319946
2017-12-06 19:37:00 +00:00
Sanjay Patel 1ea7b6f7a1 [LoopUtils] fix variable name to match FMF vocabulary; NFC
llvm-svn: 319928
2017-12-06 19:11:23 +00:00
Hans Wennborg 146a9c3e51 Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks"
This caused PR35519.

> [memcpyopt] Teach memcpyopt to optimize across basic blocks
>
> This teaches memcpyopt to make a non-local memdep query when a local query
> indicates that the dependency is non-local. This notably allows it to
> eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
>
> Fixes PR28958.
>
> Differential Revision: https://reviews.llvm.org/D38374
>

> [memcpyopt] Commit file missed in r319482.
>
> This change was meant to be included with r319482 but was accidentally
> omitted.

llvm-svn: 319873
2017-12-06 01:47:55 +00:00
Xinliang David Li 45c819063a Revert r319794: [PGO] detect infinite loop and form MST properly: memory leak problem
llvm-svn: 319841
2017-12-05 21:54:01 +00:00
Alina Sbirlea 63d2250a42 Modify ModRefInfo values using static inline method abstractions [NFC].
Summary:
The aim is to make ModRefInfo checks and changes more intuitive
and less error prone using inline methods that abstract the bit operations.

Ideally ModRefInfo would become an enum class, but that change will require
a wider set of changes into FunctionModRefBehavior.

Reviewers: sanjoy, george.burgess.iv, dberlin, hfinkel

Subscribers: nlopes, llvm-commits

Differential Revision: https://reviews.llvm.org/D40749

llvm-svn: 319821
2017-12-05 20:12:23 +00:00
Joel Galenson ea0bafda8a [CVP] Remove some {s|u}sub.with.overflow checks.
This uses ConstantRange::makeGuaranteedNoWrapRegion's newly-added handling for subtraction to allow CVP to remove some subtraction overflow checks.

Differential Revision: https://reviews.llvm.org/D40039

llvm-svn: 319807
2017-12-05 18:14:24 +00:00
Joel Galenson d9500bc533 Test commit.
I removed a space at the end of a comment.  NFC.

llvm-svn: 319803
2017-12-05 17:59:07 +00:00
Xinliang David Li cc35bc9efc [PGO] detect infinite loop and form MST properly
Differential Revision: http://reviews.llvm.org/D40702

llvm-svn: 319794
2017-12-05 17:19:41 +00:00
Mikael Holmen 0a3e98062f Bail out of a SimplifyCFG switch table opt at undef values.
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef.

The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.

Patch by JesperAntonsson.

Reviewers: spatel, eeckstein, hans

Reviewed By: hans

Subscribers: uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D40639

llvm-svn: 319768
2017-12-05 14:14:00 +00:00
Evgeniy Stepanov 4a8d151986 [msan] Add a fixme note for a minor deficiency.
llvm-svn: 319708
2017-12-04 22:50:39 +00:00
Hiroshi Yamauchi 9364fa3434 Move splitIndirectCriticalEdges() to BasicBlockUtils.h.
Summary:
Move splitIndirectCriticalEdges() from CodeGenPrepare to BasicBlockUtils.h so
that it can be called from other places.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40750

llvm-svn: 319689
2017-12-04 20:36:01 +00:00
Sanjoy Das aa92cae14e [BypassSlowDivision] Improve our handling of divisions by constants
(This reapplies r314253.  r314253 was reverted on r314482 because of a
correctness regression on P100, but that regression was identified to be
something else.)

Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow .  This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.

Reviewers: jlebar

Subscribers: jholewinski, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38265

llvm-svn: 319677
2017-12-04 19:21:58 +00:00
Anna Thomas 7b360434ff [Loop Predication] Teach LP about reverse loops
Summary:
Currently, we only support predication for forward loops with step
of 1.  This patch enables loop predication for reverse or
countdownLoops, which satisfy the following conditions:
   1. The step of the IV is -1.
   2. The loop has a singe latch as B(X) = X <pred>
latchLimit with pred as s> or u>
   3. The IV of the guard is the decrement
IV of the latch condition (Guard is: G(X) = X-1 u< guardLimit).

This patch was downstream for a while and is the last series of patches
that's from our LP implementation downstream.

Reviewers: apilipenko, mkazantsev, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40353

llvm-svn: 319659
2017-12-04 15:11:48 +00:00
Philip Reames 6260cf71d3 [IndVars] Fix a bug introduced in r317012
Turns out we can have comparisons which are indirect users of the induction variable that we can make invariant.  In this case, there is no loop invariant value contributing and we'd fail an assert.

The test case was found by a java fuzzer and reduced.  It's a real cornercase.  You have to have a static loop which we've already proven only executes once, but haven't broken the backedge on, and an inner phi whose result can be constant folded by SCEV using exit count reasoning but not proven by isKnownPredicate.  To my knowledge, only the fuzzer has hit this case.

llvm-svn: 319583
2017-12-01 20:57:19 +00:00
Hans Wennborg e2470b95da Revert r319531 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops."
It causes builds to fail with "Instruction does not dominate all uses" (PR35497).

> Patch tries to improve vectorization of the following code:
>
> void add1(int * __restrict dst, const int * __restrict src) {
>   *dst++ = *src++;
>   *dst++ = *src++ + 1;
>   *dst++ = *src++ + 2;
>   *dst++ = *src++ + 3;
> }
> Allows to vectorize even if the very first operation is not a binary add, but just a load.
>
> Fixed issues related to previous commit.
>
> Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
>
> Reviewed By: ABataev, RKSimon
>
> Subscribers: llvm-commits, RKSimon
>
> Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 319550
2017-12-01 16:17:24 +00:00
Mikael Holmen 9c13c8b6ec Revert r319537: Bail out of a SimplifyCFG switch table opt at undef values.
Broke build bots so reverting.

llvm-svn: 319539
2017-12-01 13:11:39 +00:00
Mikael Holmen 9f047795fb Bail out of a SimplifyCFG switch table opt at undef values.
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef.

The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.

Patch by: JesperAntonsson

Reviewers: spatel, eeckstein, hans

Reviewed By: hans

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40639

llvm-svn: 319537
2017-12-01 12:30:49 +00:00
Dinar Temirbulatov 29e86584c6 [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Patch tries to improve vectorization of the following code:
    
            void add1(int * __restrict dst, const int * __restrict src) {
              *dst++ = *src++;
              *dst++ = *src++ + 1;
              *dst++ = *src++ + 2;
              *dst++ = *src++ + 3;
            }
            Allows to vectorize even if the very first operation is not a binary add, but just a load.
    
            Fixed issues related to previous commit.
    
            Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
    
            Reviewed By: ABataev, RKSimon
    
            Subscribers: llvm-commits, RKSimon
    
            Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 319531
2017-12-01 11:10:47 +00:00
Hiroshi Inoue 48e4c7aae6 Recommit rL319407: [SROA] enable splitting for non-whole-alloca loads and stores
Recommiting once reverted patch rL319407 after adding a check for bit vector size to avoid failures in some build bots.

llvm-svn: 319522
2017-12-01 06:05:05 +00:00
Zachary Turner 8065f0b975 Mark all library options as hidden.
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.

This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.

Differential Revision: https://reviews.llvm.org/D40674

llvm-svn: 319505
2017-12-01 00:53:10 +00:00
Peter Collingbourne 1f03422610 ThinLTOBitcodeWriter: Try harder to discard unused references to the merged module.
If the thin module has no references to an internal global in the
merged module, we need to make sure to preserve that property if the
global is a member of a comdat group, as otherwise promotion can end
up adding global symbols to the comdat, which is not allowed.

This situation can arise if the external global in the thin module
has dead constant users, which would cause use_empty() to return
false and would cause us to try to promote it. To prevent this from
happening, discard the dead constant users before asking whether a
global is empty.

Differential Revision: https://reviews.llvm.org/D40593

llvm-svn: 319494
2017-11-30 23:05:52 +00:00
Dan Gohman 59e4c0b938 [memcpyopt] Teach memcpyopt to optimize across basic blocks
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.

Fixes PR28958.

Differential Revision: https://reviews.llvm.org/D38374

llvm-svn: 319482
2017-11-30 22:10:53 +00:00
Xinliang David Li c23d2c6883 [PGO] Skip counter promotion for infinite loops
Differential Revision: http://reviews.llvm.org/D40662

llvm-svn: 319462
2017-11-30 19:16:25 +00:00
Hiroshi Inoue 21e8ded4d2 Revert rL319407: [SROA] enable splitting for non-whole-alloca loads and stores
This reverts commit rL319407 due to failures in some buildbot.

llvm-svn: 319410
2017-11-30 08:29:51 +00:00
Hiroshi Inoue 422e80aee2 [SROA] enable splitting for non-whole-alloca loads and stores
Currently, SROA splits loads and stores only when they are accessing the whole alloca.
This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint to or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is.
The whole-alloca loads and stores meet this new condition and so they are still splittable.

Here is a simplified motivating example.

struct record {
    long long a;
    int b;
    int c;
};

int func(struct record r) {
    for (int i = 0; i < r.c; i++)
        r.b++;
    return r.b;
}

When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a method argument.

With this patch, the above example is compiled into only few instructions without loop.
Without the patch, unnecessary loop-carried dependency is introduced by SROA and the loop cannot be eliminated by the later optimizers.

Differential Revision: https://reviews.llvm.org/D32998

llvm-svn: 319407
2017-11-30 07:44:46 +00:00
Graham Yiu 70293fa27a - Removed unused lamba (IsReturnBlock) causing build bots to fail for r319398
- Added lit testcases that were supposed to be part of r319398

llvm-svn: 319399
2017-11-30 03:36:57 +00:00
Graham Yiu 8b1882c186 With PGO information, we can do more aggressive outlining of cold regions in the inline candidate function. This contrasts with the scheme of keeping only the 'early return' portion of the inline candidate and outlining the rest of the function as a single function call.
Support for outlining multiple regions of each function is added, as well as some basic heuristics to determine which regions are good to outline. Outline candidates limited to regions that are single-entry & single-exit. We also avoid outlining regions that produce live-exit variables, which may inhibit some forms of code motion (like commoning).

Fallback to the regular partial inlining scheme is retained when either i) no regions are identified for outlining in the function, or ii) the outlined function could not be inlined in any of its callers.

Differential Revision: https://reviews.llvm.org/D38190

llvm-svn: 319398
2017-11-30 02:41:36 +00:00
Peter Collingbourne 9e3175bb6b LowerTypeTests: Deduplicate code. NFC.
llvm-svn: 319390
2017-11-30 00:27:08 +00:00
Peter Collingbourne 943aca3c27 LowerTypeTests: Remove unnecessary cast. NFC.
llvm-svn: 319387
2017-11-30 00:02:55 +00:00
Adam Nemet 2e92289014 Demote this opt remark to DEBUG.
From a random opt-stat output:

Top 10 remarks:
  tailcallelim/tailcall          53%
  inline/AlwaysInline            13%
  gvn/LoadClobbered              13%
  inline/Inlined                  8%
  inline/TooCostly                2%
  inline/NoDefinition             2%
  licm/LoadWithLoopInvariantAddressInvalidated  2%
  licm/Hoisted                    1%
  asm-printer/InstructionCount    1%
  prologepilog/StackSize          1%

llvm-svn: 319235
2017-11-28 22:11:00 +00:00
Adrian Prantl 77d90b0c39 SROA: Don't create variable fragments that are outside of the variable.
An alloca may be larger than a variable that is described to be stored
there. Don't create a dbg.value for fragments that are outside of the
variable.

This fixes PR35447.
https://bugs.llvm.org/show_bug.cgi?id=35447

llvm-svn: 319230
2017-11-28 21:30:38 +00:00
Hans Wennborg ca46db957d EntryExitInstrumenter: set DebugLocs on the inserted call instructions (PR35412)
Apparently the verifier requires that inlineable calls in a function
with debug info have debug locations.

llvm-svn: 319199
2017-11-28 18:44:26 +00:00
Jonas Paulsson f0ff20f1f0 Use getStoreSize() in various places instead of 'BitSize >> 3'.
This is needed for cases when the memory access is not as big as the width of
the data type. For instance, storing i1 (1 bit) would be done in a byte (8
bits).

Using 'BitSize >> 3' (or '/ 8') would e.g. give the memory access of an i1 a
size of 0, which for instance makes alias analysis return NoAlias even when
it shouldn't.

There are no tests as this was done as a follow-up to the bugfix for the case
where this was discovered (r318824). This handles more similar cases.

Review: Björn Petterson
https://reviews.llvm.org/D40339

llvm-svn: 319173
2017-11-28 14:44:32 +00:00
Chandler Carruth c34f789e38 Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable.
The core idea is to (re-)introduce some redundancies where their cost is
hidden by the cost of materializing immediates for constant operands of
PHI nodes. When the cost of the redundancies is covered by this,
avoiding materializing the immediate has numerous benefits:
1) Less register pressure
2) Potential for further folding / combining
3) Potential for more efficient instructions due to immediate operand

As a motivating example, consider the remarkably different cost on x86
of a SHL instruction with an immediate operand versus a register
operand.

This pattern turns up surprisingly frequently, but is somewhat rarely
obvious as a significant performance problem.

The pass is entirely target independent, but it does rely on the target
cost model in TTI to decide when to speculate things around the PHI
node. I've included x86-focused tests, but any target that sets up its
immediate cost model should benefit from this pass.

There is probably more that can be done in this space, but the pass
as-is is enough to get some important performance on our internal
benchmarks, and should be generally performance neutral, but help with
more extensive benchmarking is always welcome.

One awkward part is that this pass has to be scheduled after
*everything* that can eliminate these kinds of redundancies. This
includes SimplifyCFG, GVN, etc. I'm open to suggestions about better
places to put this. We could in theory make it part of the codegen pass
pipeline, but there doesn't really seem to be a good reason for that --
it isn't "lowering" in any sense and only relies on pretty standard cost
model based TTI queries, so it seems to fit well with the "optimization"
pipeline model. Still, further thoughts on the pipeline position are
welcome.

I've also only implemented this in the new pass manager. If folks are
very interested, I can try to add it to the old PM as well, but I didn't
really see much point (my use case is already switched over to the new
PM).

I've tested this pretty heavily without issue. A wide range of
benchmarks internally show no change outside the noise, and I don't see
any significant changes in SPEC either. However, the size class
computation in tcmalloc is substantially improved by this, which turns
into a 2% to 4% win on the hottest path through tcmalloc for us, so
there are definitely important cases where this is going to make
a substantial difference.

Differential revision: https://reviews.llvm.org/D37467

llvm-svn: 319164
2017-11-28 11:32:31 +00:00
Florian Hahn 25ea91a838 [TailRecursionElimination] Skip debug intrinsics.
Summary:
I think we do not need to analyze debug intrinsics here, as they should
not impact codegen. This has 2 benefits: 1) slightly less work to do and
2) avoiding generating optimization remarks for converting calls to
debug intrinsics to tail calls, which are not really helpful for users.

Based on work by Sander de Smalen.

Reviewers: davide, trentxintong, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits, JDevlieghere

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D40440

llvm-svn: 319158
2017-11-28 09:32:25 +00:00
Max Kazantsev 115607226a [GVN] Prevent ScalarPRE from hoisting across instructions that don't pass control flow to successors
This is to address a problem similar to those in D37460 for Scalar PRE. We should not
PRE across an instruction that may not pass execution to its successor unless it is safe
to speculatively execute it.

Differential Revision: https://reviews.llvm.org/D38619

llvm-svn: 319147
2017-11-28 07:07:55 +00:00
Rafael Espindola c06f55e1e8 This reverts commit r319096 and r319097.
Revert "[SROA] Propagate !range metadata when moving loads."
Revert "[Mem2Reg] Clang-format unformatted parts of this file. NFCI."

Davide says they broke a bot.

llvm-svn: 319131
2017-11-28 01:25:38 +00:00
Adrian Prantl d7f6f1636d SROA: Avoid creating a fragment expression that covers the entire variable.
Fixes PR35416.

https://bugs.llvm.org/show_bug.cgi?id=35416

llvm-svn: 319126
2017-11-28 00:57:53 +00:00
Davide Italiano 824d71a9c5 [Mem2Reg] Clang-format unformatted parts of this file. NFCI.
llvm-svn: 319097
2017-11-27 21:25:52 +00:00
Davide Italiano b5d59e73ee [SROA] Propagate !range metadata when moving loads.
This tries to propagate !range metadata to a pre-existing load
when a load is optimized out. This is done instead of adding an
assume because converting loads to and from assumes creates a
lot of IR.

Patch by Ariel Ben-Yehuda.

Differential Revision:  https://reviews.llvm.org/D37216

llvm-svn: 319096
2017-11-27 21:25:13 +00:00
Sanjay Patel 0de1a4bc2d [PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result
This should fix PR31455:
https://bugs.llvm.org/show_bug.cgi?id=31455

Differential Revision: https://reviews.llvm.org/D28314

llvm-svn: 319094
2017-11-27 21:15:43 +00:00
Arnold Schwaighofer d9e710984d Inliner: Don't mark notail calls with the 'tail' attribute
enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2,
                    TCK_NoTail = 3 };

TCK_NoTail is greater than TCK_Tail so taking the min does not do the
correct thing.

rdar://35639547

llvm-svn: 319075
2017-11-27 19:03:40 +00:00
Sanjay Patel 863d494730 [InstCombine] use 'auto' with 'dyn_cast'; NFC
llvm-svn: 319067
2017-11-27 18:19:32 +00:00
Benjamin Kramer 51ebcaaf25 Make helpers static. NFC.
llvm-svn: 318953
2017-11-24 14:55:41 +00:00
Alexander Potapenko 9e5477f473 MSan: remove an unnecessary cast. NFC for userspace instrumenetation.
llvm-svn: 318923
2017-11-23 15:06:51 +00:00
Alexander Potapenko 391804f54b [MSan] Move the access address check before the shadow access for that address
MSan used to insert the shadow check of the store pointer operand
_after_ the shadow of the value operand has been written.
This happens to work in the userspace, as the whole shadow range is
always mapped. However in the kernel the shadow page may not exist, so
the bug may cause a crash.

This patch moves the address check in front of the shadow access.

llvm-svn: 318901
2017-11-23 08:34:32 +00:00
Max Kazantsev 716e647d74 [IRCE][NFC] Add no wrap flags to no-wrapping SCEV calculation
In a lambda where we expect to have result within bounds, add respective `nsw/nuw` flags to
help SCEV just in case if it fails to figure them out on its own.

Differential Revision: https://reviews.llvm.org/D40168

llvm-svn: 318898
2017-11-23 06:14:39 +00:00
Davide Italiano b480b5c2ee [SCCP] Pick the right lattice value for constants.
After the dataflow algorithm proves that an argument is constant,
it replaces it value with the integer constant and drops the lattice
value associated to the DEF.

e.g. in the example we have @f() that's called twice:
call @f(undef, ...)
call @f(2, ...)

`undef` MEET 2 = 2 so we replace the argument and all its uses with
the constant 2.

Shortly after, tryToReplaceWithConstantRange() tries to get the lattice
value for the argument we just replaced, causing an assertion.
This function is a little peculiar as it runs when we're doing replacement
and not as part of the solver but still queries the solver.

The fix is that of checking whether we replaced the value already and
get a temporary lattice value for the constant.

Thanks to Zhendong Su for the report!

Fixes PR35357.

llvm-svn: 318817
2017-11-22 03:04:55 +00:00
Hans Wennborg 37cbf28e79 EntryExitInstrumenter: support __cyg_profile_func_enter_bare
It works just like __cyg_profile_func_enter but takes no arguments.

llvm-svn: 318783
2017-11-21 17:22:19 +00:00
Alina Sbirlea ff8b8aea2e Add MemorySSA as loop dependency, disabled by default [NFC].
Summary:
First step in adding MemorySSA as dependency for loop pass manager.
Adding the dependency under a flag.

New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null.
Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default.

Reviewers: sanjoy, davide, gberry

Subscribers: mehdi_amini, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D40274

llvm-svn: 318772
2017-11-21 15:45:46 +00:00
NAKAMURA Takumi 519ea284af SLPVectorizer.cpp: Avoid std::stable_sort(properlyDominates()).
properlyDominates() shouldn't be used as sort key. It causes different output between stdlibc++ and libc++.
Instead, I introduced RPOT. In most cases, it works for CSE.

llvm-svn: 318743
2017-11-21 09:41:01 +00:00
Davide Italiano 5df8080011 [SCCP] If we replace with a constant, we can't replace with a range.
This microoptimization is NFC.

llvm-svn: 318711
2017-11-21 00:21:52 +00:00
Vitaly Buka 8000f228b3 [msan] Don't sanitize "nosanitize" instructions
Reviewers: eugenis

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40205

llvm-svn: 318708
2017-11-20 23:37:56 +00:00
Hiroshi Yamauchi c94d4d70d8 Add heuristics for irreducible loop metadata under PGO
Summary:
Add the following heuristics for irreducible loop metadata:

- When an irreducible loop header is missing the loop header weight metadata,
  give it the minimum weight seen among other headers.
- Annotate indirectbr targets with the loop header weight metadata (as they are
  likely to become irreducible loop headers after indirectbr tail duplication.)

These greatly improve the accuracy of the block frequency info of the Python
interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python
performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to
better register allocation under PGO.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39980

llvm-svn: 318693
2017-11-20 21:03:38 +00:00
Teresa Johnson 3309002a86 [SROA] Correctly invalidate analyses when dead instructions deleted
Summary:
SROA can fail in rewriting alloca but still rewrite a phi resulting
in dead instruction elimination. The Changed flag was not being set
correctly, resulting in downstream passes using stale analyses.
The included test case will assert during the second BDCE pass as a
result.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39921

llvm-svn: 318677
2017-11-20 18:33:38 +00:00
Evgeniy Stepanov 8e7018d92f [asan] Use dynamic shadow on 32-bit Android, try 2.
Summary:
This change reverts r318575 and changes FindDynamicShadowStart() to
keep the memory range it found mapped PROT_NONE to make sure it is
not reused. We also skip MemoryRangeIsAvailable() check, because it
is (a) unnecessary, and (b) would fail anyway.

Reviewers: pcc, vitalybuka, kcc

Subscribers: srhines, kubamracek, mgorny, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D40203

llvm-svn: 318666
2017-11-20 17:41:57 +00:00
Gil Rapaport 8b9d1f3c5b [LV] Model masking in VPlan, introducing VPInstructions
This patch adds a new abstraction layer to VPlan and leverages it to model the planned
instructions that manipulate masks (AND, OR, NOT), introduced during predication.

The new VPValue and VPUser classes model how data flows into, through and out
of a VPlan, forming the vertices of a planned Def-Use graph. The new
VPInstruction class is a generic single-instruction Recipe that models a
planned instruction along with its opcode, operands and users. See
VectorizationPlan.rst for more details.

Differential Revision: https://reviews.llvm.org/D38676

llvm-svn: 318645
2017-11-20 12:01:47 +00:00
Max Kazantsev 268467869b [IRCE] Smart range intersection
In rL316552, we ban intersection of unsigned latch range with signed range check and vice
versa, unless the entire range check iteration space is known positive. It was a correct
functional fix that saved us from dealing with ambiguous values, but it also appeared
to be a very restrictive limitation. In particular, in the following case:

  loop:
    %iv = phi i32 [ 0, %preheader ], [ %iv.next, %latch]
    %iv.offset = add i32 %iv, 10
    %rc = icmp slt i32 %iv.offset, %len
    br i1 %rc, label %latch, label %deopt

  latch:
    %iv.next = add i32 %iv, 11
    %cond = icmp i32 ult %iv.next, 100
    br it %cond, label %loop, label %exit

Here, the unsigned iteration range is `[0, 100)`, and the safe range for range
check is `[-10, %len - 10)`. For unsigned iteration spaces, we use unsigned
min/max functions for range intersection. Given this, we wanted to avoid dealing
with `-10` because it is interpreted as a very big unsigned value. Semantically, range
check's safe range goes through unsigned border, so in fact it is two disjoint
ranges in IV's iteration space. Intersection of such ranges is not trivial, so we prohibited
this case saying that we are not allowed to intersect such ranges.

What semantics of this safe range actually means is that we can start from `-10` and go
up increasing the `%iv` by one until we reach `%len - 10` (for simplicity let's assume that
`%len - 10`  is a reasonably big positive value).

In particular, this safe iteration space includes `0, 1, 2, ..., %len - 11`. So if we were able to return
safe iteration space `[0, %len - 10)`, we could safely intersect it with IV's iteration space. All
values in this range are non-negative, so using signed/unsigned min/max for them is unambiguous.

In this patch, we alter the algorithm of safe range calculation so that it returnes a subset of the
original safe space which is represented by one continuous range that does not go through wrap.
In order to reach this, we use modified SCEV substraction function. It can be imagined as a function
that substracts by `1` (or `-1`) as long as the further substraction does not cause a wrap in IV iteration
space. This allows us to perform IRCE in many situations when we deal with IV space and range check
of different types (in terms of signed/unsigned).

We apply this approach for both matching and not matching types of IV iteration space and the
range check. One implication of this is that now IRCE became smarter in detection of empty safe
ranges. For example, in this case:
  loop:
    %iv = phi i32 [ %begin, %preheader ], [ %iv.next, %latch]
    %iv.offset = sub i32 %iv, 10
    %rc = icmp ult i32 %iv.offset, %len
    br i1 %rc, label %latch, label %deopt

  latch:
    %iv.next = add i32 %iv, 11
    %cond = icmp i32 ult %iv.next, 100
    br it %cond, label %loop, label %exit

If `%len` was less than 10 but SCEV failed to trivially prove that `%begin - 10 >u %len- 10`,
we could end up executing entire loop in safe preloop while the main loop was still generated,
but never executed. Now, cutting the ranges so that if both `begin - 10` and `%len - 10` overflow,
we have a trivially empty range of `[0, 0)`. This in some cases prevents us from meaningless optimization.

Differential Revision: https://reviews.llvm.org/D39954

llvm-svn: 318639
2017-11-20 06:07:57 +00:00
Sanjay Patel 9771a96f6e [LibCallSimplifier] allow splat vectors for pow(x, 0.5) -> sqrt() transforms
llvm-svn: 318629
2017-11-19 16:42:27 +00:00
Sanjay Patel fbd3e66b9a [LibCallSimplifier] partly fix pow(x, 0.5) -> sqrt() transforms
As the first test shows, we could transform an llvm intrinsic which never sets errno 
into a libcall which could set errno (even though it's marked readnone?), so that's 
not ideal.

It's possible that we can also transform a libcall which could set errno to an
intrinsic given the fast-math-flags constraint, but that's deferred to determine
exactly which set of FMF are needed.

Differential Revision: https://reviews.llvm.org/D40150

llvm-svn: 318628
2017-11-19 16:13:14 +00:00
Florian Hahn 2a266a343f [CallSiteSplitting] Remove some indirection (NFC).
Summary:
With this patch I tried to reduce the complexity of the code sightly, by
removing some indirection. Please let me know what you think.

Reviewers: junbuml, mcrosier, davidxl

Reviewed By: junbuml

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40037

llvm-svn: 318593
2017-11-18 18:14:13 +00:00
Walter Lee 9abeecc07c [asan] Add a full redzone after every stack variable
We were not doing that for large shadow granularity.  Also add more
stack frame layout tests for large shadow granularity.

Differential Revision: https://reviews.llvm.org/D39475

llvm-svn: 318581
2017-11-18 01:13:18 +00:00
Evgeniy Stepanov 9d564cdcb0 Revert "[asan] Use dynamic shadow on 32-bit Android" and 3 more.
Revert the following commits:
  r318369 [asan] Fallback to non-ifunc dynamic shadow on android<22.
  r318235 [asan] Prevent rematerialization of &__asan_shadow.
  r317948 [sanitizer] Remove unnecessary attribute hidden.
  r317943 [asan] Use dynamic shadow on 32-bit Android.

MemoryRangeIsAvailable() reads /proc/$PID/maps into an mmap-ed buffer
that may overlap with the address range that we plan to use for the
dynamic shadow mapping. This is causing random startup crashes.

llvm-svn: 318575
2017-11-18 00:22:34 +00:00
Jun Bum Lim 0f90672ae9 [LICM] Fix PR35342
Summary: This change fix PR35342 by replacing only the current use with undef in unreachable blocks.

Reviewers: efriedma, mcrosier, igor-laevsky

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40184

llvm-svn: 318551
2017-11-17 20:38:25 +00:00
Chandler Carruth 693eedb138 [PM/Unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching,
making it no longer even remotely simple.

The pass will now be more of a "full loop unswitching" pass rather than
anything substantively simpler than any other approach. I plan to rename
it accordingly once the dust settles.

The key ideas of the new loop unswitcher are carried over for
non-trivial unswitching:
1) Fully unswitch a branch or switch instruction from inside of a loop to
   outside of it.
2) Update the CFG and IR. This avoids needing to "remember" the
   unswitched branches as well as avoiding excessively cloning and
   reliance on complex parts of simplify-cfg to cleanup the cfg.
3) Update the analyses (where we can) rather than just blowing them away
   or relying on something else updating them.

Sadly, #3 is somewhat compromised here as the dominator tree updates
were too complex for me to want to reason about. I will need to make
another attempt to do this now that we have a nice dynamic update API
for dominators. However, we do adhere to #3 w.r.t. LoopInfo.

This approach also adds an important principls specific to non-trivial
unswitching: not *all* of the loop will be duplicated when unswitching.
This fact allows us to compute the cost in terms of how much *duplicate*
code is inserted rather than just on raw size. Unswitching conditions
which essentialy partition loops will work regardless of the total loop
size.

Some remaining issues that I will be addressing in subsequent commits:
- Handling unstructured control flow.
- Unswitching 'switch' cases instead of just branches.
- Moving to the dynamic update API for dominators.

Some high-level, interesting limitationsV that folks might want to push
on as follow-ups but that I don't have any immediate plans around:
- We could be much more clever about not cloning things that will be
  deleted. In fact, we should be able to delete *nothing* and do
  a minimal number of clones.
- There are many more interesting selection criteria for which branch to
  unswitch that we might want to look at. One that I'm interested in
  particularly are a set of conditions which all exit the loop and which
  can be merged into a single unswitched test of them.

Differential revision: https://reviews.llvm.org/D34200

llvm-svn: 318549
2017-11-17 19:58:36 +00:00
Max Kazantsev 1ac6e8ae61 [IRCE] Remove folding of two range checks into RANGE_CHECK_BOTH
The logic of replacing of a couple `RANGE_CHECK_LOWER + RANGE_CHECK_UPPER`
into `RANGE_CHECK_BOTH` in fact duplicates the logic of range intersection which
happens when we calculate safe iteration space. Effectively, the result of intersection of
these ranges doesn't differ from the range of merged range check.

We chose to remove duplicating logic in favor of code simplicity.

Differential Revision: https://reviews.llvm.org/D39589

llvm-svn: 318508
2017-11-17 06:49:26 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Mandeep Singh Grang e6bb66357c [PredicateInfo] Add comment about why we require stable sort
llvm-svn: 318487
2017-11-17 00:43:24 +00:00
Walter Lee 8f1545c629 [asan] Fix small X86_64 ShadowOffset for non-default shadow scale
The requirement is that shadow memory must be aligned to page
boundaries (4k in this case).  Use a closed form equation that always
satisfies this requirement.

Differential Revision: https://reviews.llvm.org/D39471

llvm-svn: 318421
2017-11-16 17:03:00 +00:00
Sanjay Patel b3fa94586f [InstCombine] include 'sub' in the list of narrow-able binops
// trunc (binop X, C) --> binop (trunc X, C')
      // trunc (binop (ext X), Y) --> binop X, (trunc Y)

I'm grouping sub with the other binops  because that makes the code simpler
and the transforms are valid:
https://rise4fun.com/Alive/UeF
...so even though we don't expect a sub with constant Op1 or any of the
other opcodes with constant Op0 due to canonicalization rules, we might as
well handle those situations if non-canonical code somehow reaches this
point (it should just make instcombine more efficient in reaching its
end goal).

This should solve the problem that later manifests in the vectorizers in 
PR35295:
https://bugs.llvm.org/show_bug.cgi?id=35295

llvm-svn: 318404
2017-11-16 14:40:51 +00:00
Walter Lee 2a2b69e9c7 [asan] Fix size/alignment issues with non-default shadow scale
Fix a couple places where the minimum alignment/size should be a
function of the shadow granularity:
- alignment of AllGlobals
- the minimum left redzone size on the stack

Added a test to verify that the metadata_array is properly aligned
for shadow scale of 5, to be enabled when we add build support
for testing shadow scale of 5.

Differential Revision: https://reviews.llvm.org/D39470

llvm-svn: 318395
2017-11-16 12:57:19 +00:00
Max Kazantsev b1b8aff2e7 [IRCE] Fix SCEVExpander's usage in IRCE
When expanding exit conditions for pre- and postloops, we may end up expanding a
recurrency from the loop to in its loop's preheader. This produces incorrect IR.

This patch ensures that IRCE uses SCEVExpander correctly and only expands code which
is safe to expand in this particular location.

Differentian Revision: https://reviews.llvm.org/D39234

llvm-svn: 318381
2017-11-16 06:06:27 +00:00
Evgeniy Stepanov 396ed67950 [asan] Fallback to non-ifunc dynamic shadow on android<22.
Summary: Android < 22 does not support ifunc.

Reviewers: pcc

Subscribers: srhines, kubamracek, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40116

llvm-svn: 318369
2017-11-16 02:52:19 +00:00
Craig Topper 062bcf30b1 [GVNHoist] Fix a signed/unsigned comparison warning that occurs in 32-bit builds with gcc.
std::distance returns ptrdiff_t which is signed. 64-bit builds don't notice because type promotion widens the unsigned first.

llvm-svn: 318354
2017-11-16 00:19:59 +00:00
Sanjay Patel 03d0cd6a81 [InstCombine] trunc (binop X, C) --> binop (trunc X, C')
Note that one-use and shouldChangeType() are checked ahead of the switch.

Without the narrowing folds, we can produce inferior vector code as shown in PR35299:
https://bugs.llvm.org/show_bug.cgi?id=35299

llvm-svn: 318323
2017-11-15 19:12:01 +00:00
Reid Kleckner 72b819b8ee [InstCombine] Salvage debug info during initial DCE
InstCombine salvages debug info for every instruction it erases from its
worklist, but it wasn't doing it during its initial DCE when populating
its worklist. This fixes that.

This should help improve availability of 'this' in optimized debug info
when casts are necessary.

llvm-svn: 318320
2017-11-15 18:51:12 +00:00
Adam Nemet 572a87c76f [SLP] Added more missed optimization remarks
Summary:
Added more remarks to SLP pass, in particular "missed" optimization remarks.
Also proposed several tests for new functionality.

Patch by Vladimir Miloserdov!

For reference you may look at: https://reviews.llvm.org/rL302811

Reviewers: anemet, fhahn

Reviewed By: anemet

Subscribers: javed.absar, lattner, petecoup, yakush, llvm-commits

Differential Revision: https://reviews.llvm.org/D38367

llvm-svn: 318307
2017-11-15 17:04:53 +00:00
Sanjay Patel d1becd082a [Reassociate] simplify code; NFCI
llvm-svn: 318298
2017-11-15 16:19:17 +00:00
Craig Topper f7b86728fa [InstCombine] Simplify binops that are only used by a select and are fed by a select with the same condition.
Summary:
This patch optimizes a binop sandwiched between 2 selects with the same condition. Since we know its only used by the select we can propagate the appropriate input value from the earlier select.

As I'm writing this I realize I may need to avoid doing this for division in case the select was protecting a divide by zero?

Reviewers: spatel, majnemer

Reviewed By: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39999

llvm-svn: 318267
2017-11-15 05:23:02 +00:00
Hans Wennborg 45cabacd2f Revert r318193 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops."
It crashes building sqlite; see reply on the llvm-commits thread.

> [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
>
>         Patch tries to improve vectorization of the following code:
>
>         void add1(int * __restrict dst, const int * __restrict src) {
>           *dst++ = *src++;
>           *dst++ = *src++ + 1;
>           *dst++ = *src++ + 2;
>           *dst++ = *src++ + 3;
>         }
>         Allows to vectorize even if the very first operation is not a binary add, but just a load.
>
>         Fixed issues related to previous commit.
>
>         Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
>
>         Reviewed By: ABataev, RKSimon
>
>         Subscribers: llvm-commits, RKSimon
>
>         Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 318239
2017-11-15 00:38:13 +00:00
Craig Topper bf6495fbcb [LoopRotate] processLoop should return true even if it just simplified the loop latch without making any other changes
Simplifying a loop latch changes the IR and we need to make sure the pass manager knows to invalidate analysis passes if that happened.

PR35210 discovered a case where we failed to invalidate the post dominator tree after this simplification because we no changes other than simplifying the loop latch.

Fixes PR35210.

Differential Revision: https://reviews.llvm.org/D40035

llvm-svn: 318237
2017-11-15 00:22:42 +00:00
Evgeniy Stepanov cff19ee233 [asan] Prevent rematerialization of &__asan_shadow.
Summary:
In the mode when ASan shadow base is computed as the address of an
external global (__asan_shadow, currently on android/arm32 only),
regalloc prefers to rematerialize this value to save register spills.
Even in -Os. On arm32 it is rather expensive (2 loads + 1 constant
pool entry).

This changes adds an inline asm in the function prologue to suppress
this behavior. It reduces AsanTest binary size by 7%.

Reviewers: pcc, vitalybuka

Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40048

llvm-svn: 318235
2017-11-15 00:11:51 +00:00
Davide Italiano 1380cb8055 [EntryExitInstrumenter] Placate GCC, the semicolon is redundant. NFCI.
llvm-svn: 318217
2017-11-14 23:13:38 +00:00
Sanjay Patel 64fd333304 [Reassociate] use dyn_cast instead of isa+cast; NFCI
llvm-svn: 318212
2017-11-14 23:03:56 +00:00
Reid Kleckner 29a5c03cc2 Make salvageDebugInfo of casts work for dbg.declare and dbg.addr
Summary:
Instcombine (and probably other passes) sometimes want to change the
type of an alloca. To do this, they generally create a new alloca with
the desired type, create a bitcast to make the new pointer type match
the old pointer type, replace all uses with the cast, and then simplify
the casts. We already knew how to salvage dbg.value instructions when
removing casts, but we can extend it to cover dbg.addr and dbg.declare.

Fixes a debug info quality issue uncovered in Chromium in
http://crbug.com/784609

Reviewers: aprantl, vsk

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40042

llvm-svn: 318203
2017-11-14 21:49:06 +00:00
Hans Wennborg e1ecd61b98 Rename CountingFunctionInserter and use for both mcount and cygprofile calls, before and after inlining
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.

This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)

LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.

Differential Revision: https://reviews.llvm.org/D39287

llvm-svn: 318195
2017-11-14 21:09:45 +00:00
Dinar Temirbulatov 2bd1836520 [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Patch tries to improve vectorization of the following code:
    
        void add1(int * __restrict dst, const int * __restrict src) {
          *dst++ = *src++;
          *dst++ = *src++ + 1;
          *dst++ = *src++ + 2;
          *dst++ = *src++ + 3;
        }
        Allows to vectorize even if the very first operation is not a binary add, but just a load.
    
        Fixed issues related to previous commit.
    
        Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
    
        Reviewed By: ABataev, RKSimon
    
        Subscribers: llvm-commits, RKSimon
    
        Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 318193
2017-11-14 20:55:08 +00:00
Mandeep Singh Grang b8a11bbcf1 [PredicateInfo] Stable sort ValueDFS to remove non-deterministic ordering
Summary: This fixes failure in Transforms/Util/PredicateInfo/testandor.ll uncovered by D39245.

Reviewers: dberlin

Reviewed By: dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39630

llvm-svn: 318165
2017-11-14 18:22:50 +00:00
Gil Rapaport 848581cadb [LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipe
This patch is part of D38676.

The patch introduces two new Recipes to handle instructions whose vectorization
involves masking. These Recipes take VPlan-level masks in D38676, but still rely
on ILV's existing createEdgeMask(), createBlockInMask() in this patch.

VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence
of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(),
which now handles only loop-header phi nodes.

VPWidenMemoryInstructionRecipe handles load/store which are to be widened
(but are not part of an Interleave Group). In this patch it simply calls
ILV::vectorizeMemoryInstruction on execute().

Differential Revision: https://reviews.llvm.org/D39068

llvm-svn: 318149
2017-11-14 12:09:30 +00:00
Chandler Carruth 00a301d568 [PM] Port BoundsChecking to the new PM.
Registers it and everything, updates all the references, etc.

Next patch will add support to Clang's `-fexperimental-new-pass-manager`
path to actually enable BoundsChecking correctly.

Differential Revision: https://reviews.llvm.org/D39084

llvm-svn: 318128
2017-11-14 01:30:04 +00:00
Chandler Carruth 1594feea94 [PM] Refactor BoundsChecking further to prepare it to be exposed both as
a legacy and new PM pass.

This essentially moves the class state to parameters and re-shuffles the
code to make that reasonable. It also does some minor cleanups along the
way and leaves some comments.

Differential Revision: https://reviews.llvm.org/D39081

llvm-svn: 318124
2017-11-14 01:13:59 +00:00
Hans Wennborg 08b34a017a Update some code.google.com links
llvm-svn: 318115
2017-11-13 23:47:58 +00:00
Jatin Bhateja c61ade1ca0 [SCEV] Handling for ICmp occuring in the evolution chain.
Summary:
 If a compare instruction is same or inverse of the compare in the
 branch of the loop latch, then return a constant evolution node.
 This shall facilitate computations of loop exit counts in cases
 where compare appears in the evolution chain of induction variables.

 Will fix PR 34538

Reviewers: sanjoy, hfinkel, junryoungju

Reviewed By: sanjoy, junryoungju

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38494

llvm-svn: 318050
2017-11-13 16:43:24 +00:00
Bill Seurer 44156a0efb [PowerPC][msan] Update msan to handle changed memory layouts in newer kernels
In more recent Linux kernels (including those with 47 bit VMAs) the layout of
virtual memory for powerpc64 changed causing the memory sanitizer to not
work properly. This patch adjusts a bit mask in the memory sanitizer to work
on the newer kernels while continuing to work on the older ones as well.

This is the non-runtime part of the patch and finishes it. ref: r317802

Tested on several 4.x and 3.x kernel releases.

llvm-svn: 318045
2017-11-13 15:43:19 +00:00
Florian Hahn 7114755913 [CodeExtractor] Add missing AllowVarArgs initialization.
llvm-svn: 318029
2017-11-13 11:08:47 +00:00
Florian Hahn 0e9dec672d [PartialInliner] Inline vararg functions that forward varargs.
Summary:
This patch extends the partial inliner to support inlining parts of
vararg functions, if the vararg handling is done in the outlined part.

It adds a `ForwardVarArgsTo` argument to InlineFunction. If it is
non-null, all varargs passed to the inlined function will be added to
all calls to `ForwardVarArgsTo`.

The partial inliner takes care to only pass `ForwardVarArgsTo` if the
varargs handing is done in the outlined function. It checks that vastart
is not part of the function to be inlined.

`test/Transforms/CodeExtractor/PartialInlineNoInline.ll` (already part
of the repo) checks we do not do partial inlining if vastart is used in
a basic block that will be inlined.

Reviewers: davide, davidxl, grosser

Reviewed By: davide, davidxl, grosser

Subscribers: gyiu, grosser, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D39607

llvm-svn: 318028
2017-11-13 10:35:52 +00:00
Craig Topper d3e5781e53 [InstCombine] Teach visitICmpInst to not break integer absolute value idioms
Summary:
This patch adds an early out to visitICmpInst if we are looking at a compare as part of an integer absolute value idiom. Similar is already done for min/max.

In the particular case I observed in a benchmark we had an absolute value of a load from an indexed global. We simplified the compare using foldCmpLoadFromIndexedGlobal into a magic bit vector, a shift, and an and. But the load result was still used for the select and the negate part of the absolute valute idiom. So we overcomplicated the code and lost the ability to recognize it as an absolute value.

I've chosen a simpler case for the test here.

Reviewers: spatel, davide, majnemer

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39766

llvm-svn: 317994
2017-11-12 02:28:21 +00:00
Evgeniy Stepanov 989299c42b [asan] Use dynamic shadow on 32-bit Android.
Summary:
The following kernel change has moved ET_DYN base to 0x4000000 on arm32:
https://marc.info/?l=linux-kernel&m=149825162606848&w=2

Switch to dynamic shadow base to avoid such conflicts in the future.

Reserve shadow memory in an ifunc resolver, but don't use it in the instrumentation
until PR35221 is fixed. This will eventually let use save one load per function.

Reviewers: kcc

Subscribers: aemerson, srhines, kubamracek, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D39393

llvm-svn: 317943
2017-11-10 22:27:48 +00:00
Davide Italiano acf6065183 [SimplifyCFG] Use auto * when the type is obvious. NFCI.
llvm-svn: 317923
2017-11-10 20:46:21 +00:00
Daniel Neilson 6e4aa1e481 Expand IRBuilder interface for atomic memcpy to require pointer alignments. (NFC)
Summary:
 The specification of the @llvm.memcpy.element.unordered.atomic intrinsic requires
that the pointer arguments have alignments of at least the element size. The existing
IRBuilder interface to create a call to this intrinsic does not allow for providing
the alignment of these pointer args. Having an interface that makes it easy to
construct invalid intrinsic calls doesn't seem sensible, so this patch simply
adds the requirement that one provide the argument alignments when using IRBuilder
to create atomic memcpy calls.

llvm-svn: 317918
2017-11-10 19:38:12 +00:00
Sanjoy Das 6fabb90765 [CVP] Remove some {s|u}add.with.overflow checks.
Summary:
This adds logic to CVP to remove some overflow checks.  It uses LVI to remove
operations with at least one constant.  Specifically, this can remove many
overflow intrinsics immediately following an overflow check in the source code,
such as:

if (x < INT_MAX)
    ... x + 1 ...

Patch by Joel Galenson!

Reviewers: sanjoy, regehr

Reviewed By: sanjoy

Subscribers: fhahn, pirama, srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D39483

llvm-svn: 317911
2017-11-10 19:13:35 +00:00
Easwaran Raman 0a0913def2 Add a wrapper function to set branch weights metadata.
Summary:
This wrapper checks if there is at least one non-zero weight before
setting the metadata.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39872

llvm-svn: 317845
2017-11-09 22:52:20 +00:00
Paul Robinson b46256b0b4 Fix out-of-order stepping behavior in programs with hoisted constants.
When the Constant Hoisting pass moves expensive constants into a
common block, it would assign a debug location equal to the last use
of that constant. While this is certainly intuitive, it places the
constant in an out-of-order location, according to the debug location
information. This produces out-of-order stepping when debugging
programs affected by this pass.

This patch creates in-order stepping behavior by merging the debug
locations for hoisted constants, and the new insertion point.

Patch by Matthew Voss!

Differential Revision: https://reviews.llvm.org/D38088

llvm-svn: 317827
2017-11-09 20:01:31 +00:00
Alexey Bataev 0bd9004425 [SLP] Fix PR23510: Try to find best possible vectorizable stores.
Summary:
The analysis of the store sequence goes in straight order - from the
first store to the last. Bu the best opportunity for vectorization will
happen if we're going to use reverse order - from last store to the
first. It may be best because usually users have some initialization
part + further processing and this first initialization may confuse
SLP vectorizer.

Reviewers: RKSimon, hfinkel, mkuper, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39606

llvm-svn: 317821
2017-11-09 19:07:16 +00:00
Sanjay Patel 0d66010454 [Reassociate] don't name values "tmp"; NFCI
The toxic stew of created values named 'tmp' and tests that already have
values named 'tmp' and CHECK lines looking for values named 'tmp' causes
bad things to happen in our test line auto-generation scripts because it
wants to use 'TMP' as a prefix for unnamed values. Use less 'tmp' to 
avoid that.

llvm-svn: 317818
2017-11-09 18:14:24 +00:00
Serguei Katkov 722339e405 [GVN PRE] Patch the source for Phi node in PRE
We must patch all existing incoming values of Phi node,
otherwise it is possible that we can see poison
where program does not expect to see it.

This is the similar what GVN does.

The added test test/Transforms/GVN/PRE/pre-jt-add.ll shows an
example of wrong optimization done by jump threading due to
GVN PRE did not patch existing incoming value.

Reviewers: mkazantsev, wmi, dberlin, davide
Reviewed By: dberlin
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D39637

llvm-svn: 317768
2017-11-09 06:02:18 +00:00
Dan Gohman 2c74fe977d Add an @llvm.sideeffect intrinsic
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].

Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.

As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.

[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html

Differential Revision: https://reviews.llvm.org/D38336

llvm-svn: 317729
2017-11-08 21:59:51 +00:00
Teresa Johnson 07ec7d59c2 [ThinLTO] Ensure sanitizer passes are run
Summary:
In ThinLTO compilation, we exit populateModulePassManager early and
were not adding PM extension passes meant to run at the end of the
pipeline. This includes sanitizer passes. Add these passes before
the early exit.

A test will be added to projects/compiler-rt.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D39565

llvm-svn: 317714
2017-11-08 19:45:52 +00:00
Mitch Phillips 0222224da6 Revert rL317618
The implemented pass fails and is breaking a large number of unit tests.
Example:
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/5777/steps/build-stage3-compiler/logs/stdio

This reverts commit rL317618

llvm-svn: 317641
2017-11-08 00:20:53 +00:00
Dinar Temirbulatov b9a2832874 [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Patch tries to improve vectorization of the following code:

    void add1(int * __restrict dst, const int * __restrict src) {
      *dst++ = *src++;
      *dst++ = *src++ + 1;
      *dst++ = *src++ + 2;
      *dst++ = *src++ + 3;
    }
    Allows to vectorize even if the very first operation is not a binary add, but just a load.

    Fixed PR34619 and other issues related to previous commit.

    Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev

    Reviewed By: ABataev, RKSimon

    Subscribers: llvm-commits, RKSimon

    Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 317618
2017-11-07 21:25:34 +00:00
Craig Topper 7dd4d32431 Recommit r317510 "[InstCombine] Pull shifts through a select plus binop with constant"
The hexagon test should be fixed now.

Original commit message:

This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.

This can allow us to get the select closer to other selects to enable removing one.

Differential Revision: https://reviews.llvm.org/D39222

llvm-svn: 317600
2017-11-07 18:47:24 +00:00
Craig Topper 386fc2516c [InstCombine] Update stale comment. NFC
Datalayout is no longer optional so the comment didn't match what the code currently does.

llvm-svn: 317594
2017-11-07 17:37:32 +00:00
Adrian Prantl 25a09dd408 Make DIExpression::createFragmentExpression() return an Optional.
We can't safely split arithmetic into multiple fragments because we
can't express carry-over between fragments.

llvm-svn: 317534
2017-11-07 00:45:34 +00:00
Davide Italiano 1a46affb45 [IPO/LowerTypesTest] Skip blockaddress(es) when replacing uses.
Blockaddresses refer to the function itself, therefore replacing them
would cause an assertion in doRAUW.

Fixes https://bugs.llvm.org/show_bug.cgi?id=35201

This was found when trying CFI on a proprietary kernel by Dmitry Mikulin.

Differential Revision:  https://reviews.llvm.org/D39695

llvm-svn: 317527
2017-11-07 00:09:25 +00:00
Adrian Prantl 182f9fea37 InstCombine: salvage the debug info of DCE'ed add instructions.
rdar://problem/31209283

llvm-svn: 317522
2017-11-06 22:49:39 +00:00
Hans Wennborg 8c4b10e84a Revert r317510 "[InstCombine] Pull shifts through a select plus binop with constant"
This broke the CodeGen/Hexagon/loop-idiom/pmpy-mod.ll test on a bunch of buildbots.

> This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.
>
> This can allow us to get the select closer to other selects to enable removing one.
>
> Differential Revision: https://reviews.llvm.org/D39222
>
> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317510 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-svn: 317518
2017-11-06 22:28:02 +00:00
Xinliang David Li a531f189fc Fix comment /NFC
llvm-svn: 317514
2017-11-06 21:57:51 +00:00
Craig Topper 8917647333 [InstCombine] Pull shifts through a select plus binop with constant
This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.

This can allow us to get the select closer to other selects to enable removing one.

Differential Revision: https://reviews.llvm.org/D39222

llvm-svn: 317510
2017-11-06 21:07:22 +00:00
Dehao Chen 5d2a1a5045 Include already promoted counts when computing SUM for VP.
Summary: When computing the SUM for indirect call promotion, if the callsite is already promoted in the profile, it will be promoted before ICP. In the current implementation, ICP only sees remaining counts in SUM. This may cause extra indirect call targets being promoted. This patch updates the SUM to include the counts already promoted earlier. This way we do not end up promoting too many indirect call targets.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D38763

llvm-svn: 317502
2017-11-06 19:52:49 +00:00
Sanjay Patel 629c411538 [IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html

...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.

As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic 
reassociation - 'AllowReassoc'.

We're also adding a bit to allow approximations for library functions called 'ApproxFunc' 
(this was initially proposed as 'libm' or similar).

...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did 
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), 
but that's apparently already used for other purposes. Also, I don't think we can just 
add a field to FPMathOperator because Operator is not intended to be instantiated. 
We'll defer movement of FMF to another day.

We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.

Finally, this change is binary incompatible with existing IR as seen in the 
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile 
them. For example, if nsw is ever replaced with something else, dropping it would be 
a valid way to upgrade the IR." 
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR 
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will 
fail to optimize some previously 'fast' code because it's no longer recognized as 
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.

Note: an inter-dependent clang commit to use the new API name should closely follow 
commit.

Differential Revision: https://reviews.llvm.org/D39304

llvm-svn: 317488
2017-11-06 16:27:15 +00:00
Sean Fertile 4595a915f6 [LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local.
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.

Originally commited as r317374, but reverted in r317395 to update some missed
tests.

Differential Revision: https://reviews.llvm.org/D35702

llvm-svn: 317408
2017-11-04 17:04:39 +00:00
Sean Fertile 39770ca0a1 Revert "[LTO][ThinLTO] Use the linker resolutions to mark global values ..."
Changes more tests then expected on one of the build bots.
reverting to investigate.

This reverts https://llvm.org/svn/llvm-project/llvm/trunk@317374

llvm-svn: 317395
2017-11-04 01:54:20 +00:00
Davide Italiano c7c05ae4be [CallSiteSplitting] clang-format my last commit. NFCI.
Thanks to Rui for pointing out.

llvm-svn: 317393
2017-11-04 00:44:01 +00:00
Davide Italiano 91b4790b33 [CallSiteSplitting] Silence GCC's -Wparentheses. NFCI.
llvm-svn: 317385
2017-11-03 23:03:38 +00:00
Adrian Prantl 261ac8b23c Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()
This preserves the debug info for the cast operation in the original location.

rdar://problem/33460652

Reapplied r317340 with the test moved into an ARM-specific directory.

llvm-svn: 317375
2017-11-03 21:55:03 +00:00
Sean Fertile 36528c2a9b [LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local.
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.

Differential Revision: https://reviews.llvm.org/D35702

llvm-svn: 317374
2017-11-03 21:45:55 +00:00
Craig Topper 12463779d3 [SimplifyCFG] When merging conditional stores, don't count the store we're merging against the PHINodeFoldingThreshold
Merging conditional stores tries to check to see if the code is if convertible after the store is moved. But the store hasn't been moved yet so its being counted against the threshold.

The patch adds 1 to the threshold comparison to make sure we don't count the store. I've adjusted a test to use a lower threshold to ensure we still do that conversion with the lower threshold.

Differential Revision: https://reviews.llvm.org/D39570

llvm-svn: 317368
2017-11-03 21:08:13 +00:00
Jun Bum Lim 0c99007db1 Recommit r317351 : Add CallSiteSplitting pass
This recommit r317351 after fixing a buildbot failure.

Original commit message:

    Summary:
    This change add a pass which tries to split a call-site to pass
    more constrained arguments if its argument is predicated in the control flow
    so that we can expose better context to the later passes (e.g, inliner, jump
    threading, or IPA-CP based function cloning, etc.).
    As of now we support two cases :

    1) If a call site is dominated by an OR condition and if any of its arguments
    are predicated on this OR condition, try to split the condition with more
    constrained arguments. For example, in the code below, we try to split the
    call site since we can predicate the argument (ptr) based on the OR condition.

    Split from :
          if (!ptr || c)
            callee(ptr);
    to :
          if (!ptr)
            callee(null ptr)  // set the known constant value
          else if (c)
            callee(nonnull ptr)  // set non-null attribute in the argument

    2) We can also split a call-site based on constant incoming values of a PHI
    For example,
    from :
          BB0:
           %c = icmp eq i32 %i1, %i2
           br i1 %c, label %BB2, label %BB1
          BB1:
           br label %BB2
          BB2:
           %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
           call void @bar(i32 %p)
    to
          BB0:
           %c = icmp eq i32 %i1, %i2
           br i1 %c, label %BB2-split0, label %BB1
          BB1:
           br label %BB2-split1
          BB2-split0:
           call void @bar(i32 0)
           br label %BB2
          BB2-split1:
           call void @bar(i32 1)
           br label %BB2
          BB2:
           %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]

llvm-svn: 317362
2017-11-03 20:41:16 +00:00
Aaron Ballman ecf0e95267 Add llvm::for_each as a range-based extensions to <algorithm> and make use of it in some cases where it is a more clear alternative to std::for_each.
llvm-svn: 317356
2017-11-03 20:01:25 +00:00
Jun Bum Lim 0eb1c2d63a Revert "Add CallSiteSplitting pass"
Revert due to Buildbot failure.

This reverts commit r317351.

llvm-svn: 317353
2017-11-03 19:17:11 +00:00
Jun Bum Lim 2a58933519 Add CallSiteSplitting pass
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :

1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.

Split from :
      if (!ptr || c)
        callee(ptr);
to :
      if (!ptr)
        callee(null ptr)  // set the known constant value
      else if (c)
        callee(nonnull ptr)  // set non-null attribute in the argument

2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
      BB0:
       %c = icmp eq i32 %i1, %i2
       br i1 %c, label %BB2, label %BB1
      BB1:
       br label %BB2
      BB2:
       %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
       call void @bar(i32 %p)
to
      BB0:
       %c = icmp eq i32 %i1, %i2
       br i1 %c, label %BB2-split0, label %BB1
      BB1:
       br label %BB2-split1
      BB2-split0:
       call void @bar(i32 0)
       br label %BB2
      BB2-split1:
       call void @bar(i32 1)
       br label %BB2
      BB2:
       %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]

Reviewers: davidxl, huntergr, chandlerc, mcrosier, eraman, davide

Reviewed By: davidxl

Subscribers: sdesmalen, ashutosh.nema, fhahn, mssimpso, aemerson, mgorny, mehdi_amini, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D39137

llvm-svn: 317351
2017-11-03 19:01:57 +00:00
Evgeny Stupachenko d699de2b50 The patch fixes PR35131
Summary:

Fix a misprint which led to false CTLZ recognition.

Reviewers: craig.topper

Differential Revision: https://reviews.llvm.org/D39585

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 317348
2017-11-03 18:50:03 +00:00
Adrian Prantl 8fe9fb0ae5 Revert "Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()"
This reverts commit 317342 while investigating bot breakage.

llvm-svn: 317345
2017-11-03 18:26:36 +00:00
Adrian Prantl 58e9a0bb16 Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()
This preserves the debug info for the cast operation in the original location.

rdar://problem/33460652

llvm-svn: 317340
2017-11-03 18:00:02 +00:00
Jun Bum Lim f5fb3d745d [LICM] sink through non-trivially replicable PHI
Summary:
The current LICM allows sinking an instruction only when it is exposed to exit
blocks through a trivially replacable PHI of which all incoming values are the
same instruction. This change enhance LICM to sink a sinkable instruction
through non-trivially replacable PHIs by spliting predecessors of loop
exits.

Reviewers: hfinkel, majnemer, davidxl, bmakam, mcrosier, danielcdh, efriedma, jtony

Reviewed By: efriedma

Subscribers: nemanjai, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D37163

llvm-svn: 317335
2017-11-03 16:24:53 +00:00
Anna Thomas 6879721453 [LoopPredication] NFC: Refactored code to separate out functions being reused
Summary:
Refactored the code to separate out common functions that are being
reused.
This is to reduce the changes for changes coming up wrt loop
predication with reverse loops.

This refactoring is what we have in our downstream code.

llvm-svn: 317324
2017-11-03 14:25:39 +00:00
Mikael Holmen 6018104d5e [ADCE] Use MapVector for BlockInfo to make iteration order deterministic
Summary:
Also added a reserve() method to MapVector since we want to use that from
ADCE.

DenseMap does not provide deterministic iteration order so with that
we will handle the members of BlockInfo in random order, eventually
leading to random order of the blocks in the predecessor lists.

Without this change, I get the same predecessor order in about 90% of the
time when I compile a certain reproducer and in 10% I get a different one.

No idea how to make a proper test case for this.

Reviewers: kuhar, david2050

Reviewed By: kuhar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39593

llvm-svn: 317323
2017-11-03 14:15:08 +00:00
Florian Hahn 41e32bfd68 [PartialInliner] Skip call sites where inlining fails.
Summary:
InlineFunction can fail, for example when trying to inline vararg
fuctions. In those cases, we do not want to bump partial inlining
counters or set AnyInlined to true, because this could leave an unused
function hanging around.

Reviewers: davidxl, davide, gyiu

Reviewed By: davide

Subscribers: llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D39581

llvm-svn: 317314
2017-11-03 11:29:00 +00:00
Vedant Kumar 9196ed1be1 [LSR] Clarify a comment. NFC.
llvm-svn: 317295
2017-11-03 01:01:28 +00:00
Adrian Prantl fbb6fbf709 IndVarSimplify: preserve debug information attached to widened PHI nodes.
This fixes PR35015.

https://bugs.llvm.org/show_bug.cgi?id=35015

Differential Revision: https://reviews.llvm.org/D39345

llvm-svn: 317282
2017-11-02 23:17:06 +00:00
Hiroshi Yamauchi dce9def3dd Irreducible loop metadata for more accurate block frequency under PGO.
Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.

The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.

This patch is a basic support for this.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D39028

llvm-svn: 317278
2017-11-02 22:26:51 +00:00
Anna Thomas 1d02b13eb7 [LoopPredication] Enable predication when latchCheckIV is wider than rangeCheck
Summary:
This patch allows us to predicate range checks that have a type narrower than
the latch check type. We leverage SCEV analysis to identify a truncate for the
latchLimit and latchStart.
There is also safety checks in place which requires the start and limit to be
known at compile time. We require this to make sure that the SCEV truncate expr
for the IV corresponding to the latch does not cause us to lose information
about the IV range.
Added tests show the loop predication over range checks that are of various
types and are narrower than the latch type.
This enhancement has been in our downstream tree for a while.

Reviewers: apilipenko, sanjoy, mkazantsev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39500

llvm-svn: 317269
2017-11-02 21:21:02 +00:00
Anna Thomas 729dafc16b Strip off invariant.start because memory locations arent invariant
The original change was reverted in rL317217 because of the failure in
the RS4GC testcase. I couldn't reproduce the failure on my local machine
(macbook) but could reproduce it on a linux box.

The failure was around removing the uses of invariant.start. The fix
here is to just RAUW undef (which was the first implementation in D39388).
This is perfectly valid IR as discussed in the review.

llvm-svn: 317225
2017-11-02 18:24:04 +00:00
Anna Thomas ebe429d99f Revert "[RS4GC] Strip off invariant.start because memory locations arent invariant"
This reverts commit r317215, investigating the test failure.

llvm-svn: 317217
2017-11-02 16:45:51 +00:00
Anna Thomas 486a7aaa31 [RS4GC] Strip off invariant.start because memory locations arent invariant
Summary:
Invariant.start on memory locations has the property that the memory
location is unchanging. However, this is not true in the face of
rewriting statepoints for GC.
Teach RS4GC about removing invariant.start so that optimizations after
RS4GC does not incorrect sink a load from the memory location past a
statepoint.

Added test showcasing the issue.

Reviewers: reames, apilipenko, dneilson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39388

llvm-svn: 317215
2017-11-02 16:23:31 +00:00
Clement Courbet 82bade615b Revert "[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."
undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage

This reverts commit eea333c33fa73ad225ef28607795984829f65688.

llvm-svn: 317213
2017-11-02 15:53:10 +00:00
Clement Courbet 1dc37b9c3b [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass.
Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.

See the discussion in PR33325 for more details.

Reviewers: spatel

Subscribers: mgorny

Differential Revision: https://reviews.llvm.org/D39456

llvm-svn: 317211
2017-11-02 15:02:51 +00:00
Bjorn Pettersson e73b85d1ab [SimplifyCFG] Discard speculated dbg intrinsics
Summary:
SpeculativelyExecuteBB can flatten the CFG by doing
speculative execution followed by a select instruction.
When the speculatively executed BB contained dbg intrinsics
the result could be a little bit weird, since those dbg
intrinsics were inserted before the select in the flattened
CFG. So when single stepping in the debugger, printing the
value of the variable referenced in the dbg intrinsic, it
could happen that it looked like the variable had values
that never actually were assigned to the variable.

This patch simply discards all dbg intrinsics that were found
in the speculatively executed BB.

Reviewers: aprantl, chandlerc, craig.topper

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39494

llvm-svn: 317198
2017-11-02 11:55:14 +00:00
Adrian Prantl bfa77c4c85 loop-unroll: teach remapInstruction to update dbg.value intrinsics.
Fixes PR35112.

https://bugs.llvm.org/show_bug.cgi?id=35112

llvm-svn: 317138
2017-11-01 23:12:35 +00:00
Adrian Prantl 98c6549e4a loop-rotate: avoid duplicating dbg.value intrinsics in the entry block.
This fixes the second half of PR35113.

This reapplies r317106 without modifications.

llvm-svn: 317121
2017-11-01 20:53:22 +00:00
Adrian Prantl d60f34c20a loop-rotate: eliminate duplicate debug intrinsics after splicing.
Fixes part of PR35113.

This reapplies r317105 with an additional check for isa<Instruction>
as found by the bots.

llvm-svn: 317120
2017-11-01 20:43:30 +00:00
Dehao Chen c6c051f2ea Include GUIDs from the same module when computing GUIDs that needs to be imported.
Summary: In the compile phase of SamplePGO+ThinLTO, ICP is not invoked. Instead, indirect call targets will be included as function metadata for ThinIndex to buidl the call graph. This should not only include functions defined in other modules, but also functions defined in the same module, otherwise ThinIndex may find the callee dead and eliminate it, while ICP in backend will revive the symbol, which leads to undefined symbol.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: sanjoy, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D39480

llvm-svn: 317118
2017-11-01 20:26:47 +00:00
Philip Reames 7b861f08cd Revert 317016 and 317048
The former appears to have introduced a miscompile in a stage2 clang build.  Revert so I can investigate offline.

llvm-svn: 317116
2017-11-01 19:49:20 +00:00
Adrian Prantl c8516346e4 Revert r317105 to investigate bot breakage.
llvm-svn: 317110
2017-11-01 18:06:38 +00:00
Adrian Prantl 40a0ea5f29 Revert r317106 to facilitate reverting r317105.
llvm-svn: 317109
2017-11-01 18:06:35 +00:00
Peter Collingbourne 9fb6e1a037 LTO: Apply global DCE to ThinLTO modules at LTO opt level 0.
This is necessary because DCE is applied to full LTO modules. Without
this change, a reference from a dead ThinLTO global to a dead full
LTO global will result in an undefined reference at link time.

This problem is only observable when --gc-sections is disabled, or
when targeting COFF, as the COFF port of lld requires all symbols to
have a definition even if all references are dead (this is consistent
with link.exe).

This change also adds an EliminateAvailableExternally pass at -O0. This
is necessary to handle the situation on Windows where a non-prevailing
copy of a linkonce_odr function has an SEH filter function; any
such filters must be DCE'd because they will contain a call to the
llvm.localrecover intrinsic, passing as an argument the address of the
function that the filter belongs to, and llvm.localrecover requires
this function to be defined locally.

Fixes PR35142.

Differential Revision: https://reviews.llvm.org/D39484

llvm-svn: 317108
2017-11-01 17:58:39 +00:00
Adrian Prantl 9259f21604 loop-rotate: avoid duplicating dbg.value intrinsics in the entry block.
This fixes the second half of PR35113.

llvm-svn: 317106
2017-11-01 17:28:50 +00:00
Adrian Prantl b627acd0ce loop-rotate: eliminate duplicate debug intrinsics after splicing.
Fixes part of PR35113.

llvm-svn: 317105
2017-11-01 17:28:47 +00:00
Max Kazantsev 6f5229d7da Revert rL311205 "[IRCE] Fix buggy behavior in Clamp"
This patch reverts rL311205 that was initially a wrong fix. The real problem
was in intersection of signed and unsigned ranges (see rL316552), and the
patch being reverted masked the problem instead of fixing it.

By now, the test against which rL311205 was made works OK even without this
code. This revert patch also contains a test case that demonstrates incorrect
behavior caused by rL311205: it is caused by incorrect choise of signed max
instead of unsigned.

llvm-svn: 317088
2017-11-01 13:21:56 +00:00
Florian Hahn b93c06331e [CodeExtractor] Fix iterator invalidation in findOrCreateBlockForHoisting.
Summary:
By replacing branches to CommonExitBlock, we remove the node from
CommonExitBlock's predecessors, invalidating the iterator. The problem
is exposed when the common exit block has multiple predecessors and
needs to sink lifetime info. The modification in the test case trigger
the issue.

Reviewers: davidxl, davide, wmi

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39112

llvm-svn: 317084
2017-11-01 09:48:12 +00:00
Philip Reames 357cd3289e [SimplifyIndVar] Inline makIVComparisonInvariant to eleminate code duplication [NFC]
This formulation might be slightly slower since I eagerly compute the cheap replacements.  If anyone sees this having a compile time impact, let me know and I'll use lazy population instead.

llvm-svn: 317048
2017-10-31 22:56:16 +00:00
Adrian Prantl deb437b038 loop-rotate: simplify code by using llvm::findDbgValues(). (NFC)
llvm-svn: 317037
2017-10-31 21:03:22 +00:00
Benjamin Kramer 992fc4ea2d [coro] Make Spill a proper struct instead of deriving from pair.
No functionality change.

llvm-svn: 317027
2017-10-31 19:22:55 +00:00
Craig Topper 7c7fcabd3f [SimplifyCFG] Use a more generic name for the selects created by SpeculativelyExecuteBB to prevent long names from being created
Currently the selects are created with the names of their inputs concatenated together. It's possible to get cases that chain these selects together resulting in long names due to multiple levels of concatenation. Our internal branch of llvm managed to generate names over 100000 characters in length on a particular test due to an extreme compounding of the names.

This patch changes the name to a generic name that is not dependent on its inputs.

Differential Revision: https://reviews.llvm.org/D39440

llvm-svn: 317024
2017-10-31 19:03:51 +00:00
Philip Reames dc417a9819 [IndVarSimplify] Extract wrapper around SE-.isLoopInvariantPredicate [NFC]
This an intermediate state, the next patch will re-inline the markLoopInvariantPredicate function to reduce code duplication.

llvm-svn: 317016
2017-10-31 18:04:57 +00:00
Philip Reames cd0a5bb96c [IndVarSimplify] Simplify code using a dictionary
Possibly very slightly slower, but this code is not performance critical and the readability benefit alone is huge.

llvm-svn: 317012
2017-10-31 17:06:32 +00:00
Reid Kleckner c212cc88e2 [asan] Upgrade private linkage globals to internal linkage on COFF
COFF comdats require symbol table entries, which means the comdat leader
cannot have private linkage.

llvm-svn: 317009
2017-10-31 16:16:08 +00:00
Benjamin Kramer 3f3d5be759 [LoopVectorize] Replace manual VPlan memory management with unique_ptr.
No functionality change intended.

llvm-svn: 317003
2017-10-31 14:58:22 +00:00
Matthew Simpson b6915fbfa2 [InstCombine] Simplify selects that test cmpxchg instructions
If a select instruction tests the returned flag of a cmpxchg instruction and
selects between the returned value of the cmpxchg instruction and its compare
operand, the result of the select will always be equal to its false value.

Differential Revision: https://reviews.llvm.org/D39383

llvm-svn: 316994
2017-10-31 12:34:02 +00:00
David Green 64f53b4214 [LoopUnroll] Clean up remarks for unroll remainder
The optimisation remarks for loop unrolling with an unrolled remainder looks something like:

test.c:7:18: remark: completely unrolled loop with 3 iterations [-Rpass=loop-unroll]
            C[i] += A[i*N+j];
                 ^
test.c:6:9: remark: unrolled loop by a factor of 4 with run-time trip count [-Rpass=loop-unroll]
        for(int j = 0; j < N; j++)
        ^
This removes the first of the two messages.

Differential revision: https://reviews.llvm.org/D38725

llvm-svn: 316986
2017-10-31 10:47:46 +00:00
Max Kazantsev 84286ce5dd [IRCE][NFC] Rename fields of InductiveRangeCheck
Rename `Offset`, `Scale`, `Length` into `Begin`, `Step`, `End` respectively
to make naming of similar entities for Ranges and Range Checks more
consistent.

Differential Revision: https://reviews.llvm.org/D39414

llvm-svn: 316979
2017-10-31 06:19:05 +00:00
Max Kazantsev 21e7b53490 [NFC] Get rid of variables used in assert only
llvm-svn: 316977
2017-10-31 05:33:58 +00:00
Philip Reames 59bf1e0548 [IndVarSimplify] Simplify code using preheader assumption
As noted in the nice block comment, the previous code didn't actually handle multi-entry loops correctly, it just assumed SCEV didn't analyze such loops.  Given SCEV has comments to the contrary, that seems a bit suspect.  More importantly, the pass actually requires loopsimplify form which ensures a loop-preheader is available.  Remove the excessive generaility and shorten the code greatly.

Note that we do successfully analyze many multi-entry loops, but we do so by converting them to single entry loops.  See the added test case.

llvm-svn: 316976
2017-10-31 05:16:46 +00:00
Max Kazantsev 488ec975bb Reapply "[GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors"
This patch fixes the miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
to hoist across such instructions because there is no guarantee that the load standing after such
instruction is still valid before such instruction. For example, a load from under a guard may be
invalid before the guard in the following case:
  int array[LEN];
  ...
  guard(0 <= index && index < LEN);
  use(array[index]);

Differential Revision: https://reviews.llvm.org/D37460

llvm-svn: 316975
2017-10-31 05:07:56 +00:00
Philip Reames 39a8dbff87 [SimplifyIndVar] Extract out invariant expression handling
Previously, the code returned early from the *function* when it couldn't find a free expansion, it should be returning from the *transform*.  I don't have a test case, noticed this via inspection.

As a follow up, I'm going to revisit the logic in the extract function.  I think that essentially the whole helper routine can be replaced with SCEVExpander, but I wanted to do that in a series of separate commits.

llvm-svn: 316974
2017-10-31 04:19:06 +00:00
Philip Reames 5552f503d5 Undo accidental commit
These files shouldn't have been submitted in 316967

llvm-svn: 316968
2017-10-31 00:04:09 +00:00
Philip Reames 9c3cbeea39 [CGP] Fix crash on i96 bit multiply
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725

If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area.  There's a bunch of obviously wrong code in the same function.  I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.

llvm-svn: 316967
2017-10-30 23:59:51 +00:00
Yaxun Liu d23f23d81c InferAddressSpaces: Fix bug about replacing addrspacecast
InferAddressSpaces assumes the pointee type of addrspacecast
is the same as the operand, which is not always true and causes
invalid IR.

This bug cause build failure in HCC.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39432

llvm-svn: 316957
2017-10-30 21:19:41 +00:00
Davide Italiano 834b45129b [NewGVN] Stop assuming PHI args ordering when looking at phi-of-ops.
It's not guaranteed. There's a bug open to sort them in predecessor
order, but it won't happen anytime soon. In the meanwhile, passes
will have to do an O(#preds) scan. Such is life.

llvm-svn: 316953
2017-10-30 20:20:16 +00:00
Daniel Neilson f9c7d29c77 Create instruction classes for identifying any atomicity of memory intrinsic. (NFC)
Summary:
For reference, see: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116589.html

This patch fleshes out the instruction class hierarchy with respect to atomic and
non-atomic memory intrinsics. With this change, the relevant part of the class
hierarchy becomes:

IntrinsicInst
  -> MemIntrinsicBase (methods-only class)
    -> MemIntrinsic (non-atomic intrinsics)
      -> MemSetInst
      -> MemTransferInst
        -> MemCpyInst
        -> MemMoveInst
    -> AtomicMemIntrinsic (atomic intrinsics)
      -> AtomicMemSetInst
      -> AtomicMemTransferInst
        -> AtomicMemCpyInst
        -> AtomicMemMoveInst
    -> AnyMemIntrinsic (both atomicities)
      -> AnyMemSetInst
      -> AnyMemTransferInst
        -> AnyMemCpyInst
        -> AnyMemMoveInst

This involves some class renaming:
    ElementUnorderedAtomicMemCpyInst -> AtomicMemCpyInst
    ElementUnorderedAtomicMemMoveInst -> AtomicMemMoveInst
    ElementUnorderedAtomicMemSetInst -> AtomicMemSetInst
A script for doing this renaming in downstream trees is included below.

An example of where the Any* classes should be used in LLVM is when reasoning
about the effects of an instruction (ex: aliasing).

---
Script for renaming AtomicMem* classes:
PREFIXES="[<,([:space:]]"
CLASSES="MemIntrinsic|MemTransferInst|MemSetInst|MemMoveInst|MemCpyInst"
SUFFIXES="[;)>,[:space:]]"

REGEX="(${PREFIXES})ElementUnorderedAtomic(${CLASSES})(${SUFFIXES})"
REGEX2="visitElementUnorderedAtomic(${CLASSES})"

FILES=$( grep -E "(${REGEX}|${REGEX2})" -r . | tr ':' ' ' | awk '{print $1}' | sort | uniq )

SED_SCRIPT="s~${REGEX}~\1Atomic\2\3~g"
SED_SCRIPT2="s~${REGEX2}~visitAtomic\1~g"

for f in $FILES; do
    echo "Processing: $f"
    sed  -i ".bak" -E "${SED_SCRIPT};${SED_SCRIPT2};${EA_SED_SCRIPT};${EA_SED_SCRIPT2}" $f
done

Reviewers: sanjoy, deadalnix, apilipenko, anna, skatkov, mkazantsev

Reviewed By: sanjoy

Subscribers: hfinkel, jholewinski, arsenm, sdardis, nhaehnle, JDevlieghere, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38419

llvm-svn: 316950
2017-10-30 19:51:48 +00:00
Mandeep Singh Grang f83268bd9e [GVNHoist] Fix non-deterministic sort order of PHIs for identical instructions
Summary: This fixes failure in Transforms/GVNHoist/hoist.ll uncovered by D39245.

Reviewers: hiraditya, spop, dberlin

Reviewed By: dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39410

llvm-svn: 316949
2017-10-30 19:42:41 +00:00
Clement Courbet b2c3eb8cf1 [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
- Targets that want to support memcmp expansions now return the list of
   supported load sizes.
 - Expansion codegen does not assume that all power-of-two load sizes
   smaller than the max load size are valid. For examples, this is not the
   case for x86(32bit)+sse2.

Fixes PR34887.

llvm-svn: 316905
2017-10-30 14:19:33 +00:00
Florian Hahn d0208b4b1c Recommit r315288: [SCCP] Propagate integer range info for parameters in IPSCCP.
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addressed some additional nits.

Original commit message:
This updates the SCCP solver to use of the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.

For the following function, f() can be optimized to ret i32 2 with
this change

    source_filename = "sccp.c"
    target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-unknown-linux-gnu"

    ; Function Attrs: norecurse nounwind readnone uwtable
    define i32 @main() local_unnamed_addr #0 {
    entry:
      %call = tail call fastcc i32 @f(i32 1)
      %call1 = tail call fastcc i32 @f(i32 47)
      %add3 = add nsw i32 %call, %call1
      ret i32 %add3
    }

    ; Function Attrs: noinline norecurse nounwind readnone uwtable
    define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
    entry:
      %c1 = icmp sle i32 %x, 100

      %cmp = icmp sgt i32 %x, 300
      %. = select i1 %cmp, i32 1, i32 2
      ret i32 %.
    }

    attributes #1 = { noinline }

Reviewers: davide, sanjoy, efriedma, dberlin

Reviewed By: davide, dberlin

Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D36656

llvm-svn: 316891
2017-10-30 10:07:42 +00:00
Max Kazantsev 390fc57771 [IRCE][NFC] Store Length as SCEV in RangeCheck instead of Value
llvm-svn: 316889
2017-10-30 09:35:16 +00:00
Florian Hahn d18443edad Revert r316887 to fix buildbot failures.
llvm-svn: 316888
2017-10-30 09:21:50 +00:00
Florian Hahn 925d3e4a98 Recommit r315288: [SCCP] Propagate integer range info for parameters in IPSCCP.
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addressed some additional nits.

Original commit message:
This updates the SCCP solver to use of the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.

For the following function, f() can be optimized to ret i32 2 with
this change

    source_filename = "sccp.c"
    target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-unknown-linux-gnu"

    ; Function Attrs: norecurse nounwind readnone uwtable
    define i32 @main() local_unnamed_addr #0 {
    entry:
      %call = tail call fastcc i32 @f(i32 1)
      %call1 = tail call fastcc i32 @f(i32 47)
      %add3 = add nsw i32 %call, %call1
      ret i32 %add3
    }

    ; Function Attrs: noinline norecurse nounwind readnone uwtable
    define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
    entry:
      %c1 = icmp sle i32 %x, 100

      %cmp = icmp sgt i32 %x, 300
      %. = select i1 %cmp, i32 1, i32 2
      ret i32 %.
    }

    attributes #1 = { noinline }

Reviewers: davide, sanjoy, efriedma, dberlin

Reviewed By: davide, dberlin

Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D36656

llvm-svn: 316887
2017-10-30 09:04:18 +00:00
Max Kazantsev 1d7c0439b9 [GVN][NFC] Mark instruction for deletion instead of immediate erasing in LoadPRE
It is done to uniformly handle instructions removal.

Differential Revision: https://reviews.llvm.org/D39369

llvm-svn: 316884
2017-10-30 04:48:34 +00:00
Sanjay Patel b049173157 [SimplifyCFG] use pass options and remove the latesimplifycfg pass
This is no-functional-change-intended.

This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and 
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired). 

The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.

Differential Revision: https://reviews.llvm.org/D38631

llvm-svn: 316835
2017-10-28 18:43:07 +00:00
Craig Topper 49687104d6 [PartialInlineLibCalls] Teach PartialInlineLibCalls to honor nobuiltin, properly check the function signature, and check TLI::has
Summary:
We shouldn't do this transformation if the function is marked nobuitlin.

We were only checking that the return type is floating point, we really should be checking the argument types and argument count as well. This can be accomplished by using the other version of getLibFunc that takes the Function and not just the name.

We should also be checking TLI::has since sqrtf is a macro on Windows.

Fixes PR32559.

Reviewers: hfinkel, spatel, davide, efriedma

Reviewed By: davide, efriedma

Subscribers: efriedma, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D39381

llvm-svn: 316819
2017-10-28 00:36:58 +00:00
Artur Pilipenko 8aadc643cf [LoopPredication] Handle the case when the guard and the latch IV have different offsets
This is a follow up change for D37569.

Currently the transformation is limited to the case when:
 * The loop has a single latch with the condition of the form: ++i <pred> latchLimit, where <pred> is u<, u<=, s<, or s<=.
 * The step of the IV used in the latch condition is 1.
 * The IV of the latch condition is the same as the post increment IV of the guard condition.
 * The guard condition is of the form i u< guardLimit.

This patch enables the transform in the case when the latch is

 latchStart + i <pred> latchLimit, where <pred> is u<, u<=, s<, or s<=.

And the guard is

 guardStart + i u< guardLimit

Reviewed By: anna

Differential Revision: https://reviews.llvm.org/D39097

llvm-svn: 316768
2017-10-27 14:46:17 +00:00
Max Kazantsev 665907c3c2 [GVN][NFC] Refactor loop iteration with foreach
llvm-svn: 316748
2017-10-27 08:19:35 +00:00
Eugene Zelenko 57bd5a0274 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316724
2017-10-27 01:09:08 +00:00
Philip Reames 29dd40b38e [SimplifyIndVars] Shorten code by using SCEV helper [NFC]
llvm-svn: 316709
2017-10-26 22:02:16 +00:00
Dehao Chen ed2d5402cb Do not add discriminator encoding for debug intrinsics.
Summary: There are certain requirements for debug location of debug intrinsics, e.g. the scope of the DILocalVariable should be the same as the scope of its debug location. As a result, we should not add discriminator encoding for debug intrinsics.

Reviewers: dblaikie, aprantl

Reviewed By: aprantl

Subscribers: JDevlieghere, aprantl, bjope, sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D39343

llvm-svn: 316703
2017-10-26 21:20:52 +00:00
Philip Reames 21cc2fa3f6 [LICM] Restructure implicit exit handling to be more clear [NFCI]
When going to explain this to someone else, I got tripped up by the complicated meaning of IsKnownNonEscapingObject in load-store promotion.  Extract a helper routine and clarify naming/scopes to make this a bit more obvious.

llvm-svn: 316699
2017-10-26 21:00:15 +00:00
Balaram Makam 9ee942f481 Reapply r316582 [Local] Fix a bug in the domtree update logic for MergeBasicBlockIntoOnlyPred.
Summary: This reverts r316612 to reapply r316582. The buildbot failure was unrelated to this commit.

Reviewers:

Subscribers:

llvm-svn: 316669
2017-10-26 15:04:53 +00:00
Bjorn Pettersson 86db068e39 [LSV] Avoid adding vectors of pointers as candidates
Summary:
We no longer add vectors of pointers as candidates for
load/store vectorization. It does not seem to work anyway,
but without this patch we can end up in asserts when trying
to create casts between an integer type and the pointer of
vectors type.

The test case I've added used to assert like this when trying to
cast between i64 and <2 x i16*>:
opt: ../lib/IR/Instructions.cpp:2565: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
#0 PrintStackTraceSignalHandler(void*)
#1 SignalHandler(int)
#2 __restore_rt
#3 __GI_raise
#4 __GI_abort
#5 __GI___assert_fail
#6 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*)
#7 llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateBitOrPointerCast(llvm::Value*, llvm::Type*, llvm::Twine const&)
#8 Vectorizer::vectorizeStoreChain(llvm::ArrayRef<llvm::Instruction*>, llvm::SmallPtrSet<llvm::Instruction*, 16u>*)

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D39296

llvm-svn: 316665
2017-10-26 13:59:15 +00:00
Bjorn Pettersson 22a2282da1 [LSV] Skip all non-byte sizes, not only less than eight bits
Summary:
The code comments indicate that no effort has been spent on
handling load/stores when the size isn't a multiple of the
byte size correctly. However, the code only avoided types
smaller than 8 bits. So for example a load of an i28 could
still be considered as a candidate for vectorization.

This patch adjusts the code to behave according to the code
comment.

The test case used to hit the following assert when
trying to use "cast" an i32 to i28 using CreateBitOrPointerCast:

opt: ../lib/IR/Instructions.cpp:2565: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
#0 PrintStackTraceSignalHandler(void*)
#1 SignalHandler(int)
#2 __restore_rt
#3 __GI_raise
#4 __GI_abort
#5 __GI___assert_fail
#6 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*)
#7 llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateBitOrPointerCast(llvm::Value*, llvm::Type*, llvm::Twine const&)
#8 (anonymous namespace)::Vectorizer::vectorizeLoadChain(llvm::ArrayRef<llvm::Instruction*>, llvm::SmallPtrSet<llvm::Instruction*, 16u>*)

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39295

llvm-svn: 316663
2017-10-26 13:42:55 +00:00
Eugene Zelenko 5c2aecef78 [Transforms] Revert r316630 changes in Scalar/MergeICmps.cpp to fix broken build bots (NFC).
llvm-svn: 316634
2017-10-26 01:25:14 +00:00
Eugene Zelenko 5adb96cc92 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316630
2017-10-26 00:55:39 +00:00
Matthew Simpson 99f57933ba Attempt to unbreak the expensive-checks-win bot
llvm-svn: 316625
2017-10-25 22:46:34 +00:00
Balaram Makam 52252fe20d Revert r316582 [Local] Fix a bug in the domtree update logic for MergeBasicBlockIntoOnlyPred.
Summary: This reverts commit r316582. It looks like this commit broke tests on one buildbot:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/5719

. . .
Failing Tests (1):
    LLVM :: Transforms/CalledValuePropagation/simple-arguments.ll

Reviewers:

Subscribers:

llvm-svn: 316612
2017-10-25 21:32:54 +00:00
Balaram Makam 925ddf1a93 [Local] Fix a bug in the domtree update logic for MergeBasicBlockIntoOnlyPred.
Summary: For some irreducible CFG the domtree nodes might be dead, do not update domtree for dead nodes.

Reviewers: kuhar, dberlin, hfinkel

Reviewed By: kuhar

Subscribers: llvm-commits, mcrosier

Differential Revision: https://reviews.llvm.org/D38960

llvm-svn: 316582
2017-10-25 14:55:48 +00:00
Matthew Simpson cb58558c2f Add CalledValuePropagation pass
This patch adds a new pass for attaching !callees metadata to indirect call
sites. The pass propagates values to call sites by performing an IPSCCP-like
analysis using the generic sparse propagation solver. For indirect call sites
having a small set of possible callees, the attached metadata indicates what
those callees are. The metadata can be used to facilitate optimizations like
intersecting the function attributes of the possible callees, refining the call
graph, performing indirect call promotion, etc.

Differential Revision: https://reviews.llvm.org/D37355

llvm-svn: 316576
2017-10-25 13:40:08 +00:00
Max Kazantsev 9ac7021a25 [IRCE] Fix intersection between signed and unsigned ranges
IRCE for unsigned latch conditions was temporarily disabled by rL314881. The motivating
example contained an unsigned latch condition and a signed range check. One of the safe
iteration ranges was `[1, SINT_MAX + 1]`. Its right border was incorrectly interpreted as a negative
value in `IntersectRange` function, this lead to a miscompile under which we deleted a range check
without inserting a postloop where it was needed.

This patch brings back IRCE for unsigned latch conditions. Now we treat range intersection more
carefully. If the latch condition was unsigned, we only try to consider a range check for deletion if:
1. The range check is also unsigned, or
2. Safe iteration range of the range check lies within `[0, SINT_MAX]`.
The same is done for signed latch.

Values from `[0, SINT_MAX]` are unambiguous, these values are non-negative under any interpretation,
and all values of a range intersected with such range are also non-negative.

We also use signed/unsigned min/max functions for range intersection depending on type of the
latch condition.

Differential Revision: https://reviews.llvm.org/D38581

llvm-svn: 316552
2017-10-25 06:47:39 +00:00
Max Kazantsev 4332a943bc [IRCE] Smarter detection of empty ranges using SCEV
For a SCEV range, this patch replaces the naive emptiness check for SCEV ranges
which looks like `Begin == End` with a SCEV check. The range is guaranteed to be
empty of `Begin >= End`. We should filter such ranges out and do not try to perform
IRCE for them.

For example, we can get such range when intersecting range `[A, B)` and `[C, D)`
where `A < B < C < D`. The resulting range is `[max(A, C), min(B, D)) = [C, B)`.
This range is empty, but its `Begin` does not match with `End`.

Making IRCE for an empty range is basically safe but unprofitable because we
never actually get into the main loop where the range checks are supposed to
be eliminated. This patch uses SCEV mechanisms to treat loops with proved
`Begin >= End` as empty.

Differential Revision: https://reviews.llvm.org/D39082

llvm-svn: 316550
2017-10-25 06:10:02 +00:00
Eugene Zelenko 7f0f9bc5ab [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316503
2017-10-24 21:24:53 +00:00
Artem Belevich cb8f6328dc [NVPTX] allow address space inference for volatile loads/stores.
If particular target supports volatile memory access operations, we can
avoid AS casting to generic AS. Currently it's only enabled in NVPTX for
loads and stores that access global & shared AS.

Differential Revision: https://reviews.llvm.org/D39026

llvm-svn: 316495
2017-10-24 20:31:44 +00:00
Adrian Prantl d20442d383 Delete unused instantiations of DIBuilder. NFC
llvm-svn: 316494
2017-10-24 20:26:17 +00:00
Marek Olsak ce76ea0394 AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.

Also allow kill in all shader stages.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38544

llvm-svn: 316427
2017-10-24 10:27:13 +00:00
Marek Olsak 2114fc3bcb AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D38543

llvm-svn: 316426
2017-10-24 10:26:59 +00:00
Saleem Abdulrasool 619b3269fd ObjCARC: do not increment past the end of the BB
The `BasicBlock::getFirstInsertionPt` call may return `std::end` for the
BB.  Dereferencing the end iterator results in an assertion failure
"(!NodePtr->isKnownSentinel()), function operator*".  Ensure that the
returned iterator is valid before dereferencing it.  If the end is
returned, move one position backward to get a valid insertion point.

llvm-svn: 316401
2017-10-24 00:09:10 +00:00
Mandeep Singh Grang 9ed81c66ce [GVNSink] Fix failing GVNSink tests in the reverse iteration bot
Summary:

The elts of ActivePreds which is defined as a SmallPtrSet are copied
into Blocks using std::copy. This makes the resultant order of Blocks
non-deterministic. We cannot simply sort Blocks as they need to match
the corresponding Values. So a better approach is to define ActivePreds
as SmallSetVector.

This fixes the following failures in
http://lab.llvm.org:8011/builders/reverse-iteration:
  LLVM :: Transforms/GVNSink/indirect-call.ll
  LLVM :: Transforms/GVNSink/sink-common-code.ll
  LLVM :: Transforms/GVNSink/struct.ll

Reviewers: dberlin, jmolloy, bkramer, efriedma

Reviewed By: dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39025

llvm-svn: 316369
2017-10-23 19:56:52 +00:00
Sanjay Patel b80daf0b48 [SimplifyCFG] delay switch condition forwarding to -latesimplifycfg
As discussed in D39011:
https://reviews.llvm.org/D39011
...replacing constants with a variable is inverting the transform done
by other IR passes, so we definitely don't want to do this early. 
In fact, it's questionable whether this transform belongs in SimplifyCFG 
at all. I'll look at moving this to codegen as a follow-up step.

llvm-svn: 316298
2017-10-22 19:10:07 +00:00
Sanjay Patel 24226504a7 [SimplifyCFG] try harder to forward switch condition to phi (PR34471)
The missed canonicalization/optimization in the motivating test from PR34471 leads to very different codegen:

  int switcher(int x) {
      switch(x) {
      case 17: return 17;
      case 19: return 19;
      case 42: return 42;
      default: break;
      }
      return 0;
    }

  int comparator(int x) {
    if (x == 17) return 17;
    if (x == 19) return 19;
    if (x == 42) return 42;
    return 0;
  }

For the first example, we use a bit-test optimization to avoid a series of compare-and-branch:
https://godbolt.org/g/BivDsw

Differential Revision: https://reviews.llvm.org/D39011

llvm-svn: 316293
2017-10-22 16:51:03 +00:00
David Green 907b60fbba [LoopInterchange] Fix phi node ordering miscompile.
The way that splitInnerLoopHeader splits blocks requires that
the induction PHI will be the first PHI in the inner loop
header. This makes sure that is actually the case when there
are both IV and reduction phis.

Differential Revision: https://reviews.llvm.org/D38682

llvm-svn: 316261
2017-10-21 13:58:37 +00:00
Eugene Zelenko fce435764e [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316253
2017-10-21 00:57:46 +00:00
Eugene Zelenko 99241d75c1 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316241
2017-10-20 21:47:29 +00:00
Eugene Zelenko bff0ef0324 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316190
2017-10-19 22:07:16 +00:00
Eugene Zelenko f27d161bf0 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316187
2017-10-19 21:21:30 +00:00
Simon Pilgrim 0444e4fcd4 Fix MSVC signed/unsigned comparison warning
llvm-svn: 316161
2017-10-19 15:00:31 +00:00
Max Kazantsev 3612d4b4f9 [NFC][IRCE] Filter out empty ranges early
llvm-svn: 316146
2017-10-19 05:33:28 +00:00
whitequark a99ecf1bbb [MergeFunctions] Don't blindly RAUW a GlobalValue with a ConstantExpr.
MergeFunctions uses (through FunctionComparator) a map of GlobalValues
to identifiers because it needs to compare functions and globals
do not have an inherent total order. Thus, FunctionComparator
(through GlobalNumberState) has a ValueMap<GlobalValue *>.

r315852 added a RAUW on globals that may have been previously
encountered by the FunctionComparator, which would replace
a GlobalValue * key with a ConstantExpr *, which is illegal.

This commit adjusts that code path to remove the function being
replaced from the ValueMap as well.

llvm-svn: 316145
2017-10-19 04:47:48 +00:00
Chandler Carruth 3f0e056df4 [PM] Refactor the bounds checking pass to remove a method only called in
one place.

llvm-svn: 316135
2017-10-18 22:42:36 +00:00
Sanjoy Das 2f27456c82 Revert "[ScalarEvolution] Handling for ICmp occuring in the evolution chain."
This reverts commit r316054.  There was some confusion over the review process:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171016/495884.html

llvm-svn: 316129
2017-10-18 22:00:57 +00:00
Eugene Zelenko 306d29977d [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316128
2017-10-18 21:46:47 +00:00
Jatin Bhateja 1fc49627e4 [ScalarEvolution] Handling for ICmp occuring in the evolution chain.
Summary:
 If a compare instruction is same or inverse of the compare in the
 branch of the loop latch, then return a constant evolution node.
 Currently scope of evaluation is limited to SCEV computation for
 PHI nodes.

 This shall facilitate computations of loop exit counts in cases
 where compare appears in the evolution chain of induction variables.

 Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju

Reviewed By: junryoungju

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38494

llvm-svn: 316054
2017-10-18 01:36:16 +00:00
Michael Zolotukhin c4fcc189d2 [GlobalDCE] Use DenseMap instead of unordered_multimap for GVDependencies.
Summary:
std::unordered_multimap happens to be very slow when the number of elements
grows large. On one of our internal applications we observed a 17x compile time
improvement from changing it to DenseMap.

Reviewers: mehdi_amini, serge-sans-paille, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38916

llvm-svn: 316045
2017-10-17 23:47:06 +00:00
Eugene Zelenko 6cadde7f40 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316034
2017-10-17 21:27:42 +00:00
Vitaly Buka 524c0a639d Fix signed overflow detected by ubsan
This overflow does not affect algorithm, so just suppress it.

llvm-svn: 316018
2017-10-17 18:33:15 +00:00
Philip Reames 6a7bbfb2e2 Revert 315440 on behalf of mkazantsev
This patch reverts rL315440 because of the bug described at
https://bugs.llvm.org/show_bug.cgi?id=34937

The fix for the bug is on review as D38944, but not yet ready.  Given this is a regression reverting until a fix is ready is called for.

Max would have done the revert himself, but is having trouble doing a build of fresh LLVM for some reason.  I did the build and test to ensure the revert worked as expected on his behalf.

llvm-svn: 315974
2017-10-17 06:21:07 +00:00
Craig Topper 91259e2681 [JumpThreading] Move two PredValueInfoTy vectors to a scope closer to their usage. NFCI
llvm-svn: 315941
2017-10-16 21:54:13 +00:00
Eugene Zelenko dd40f5e7c1 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 315940
2017-10-16 21:34:24 +00:00
Akira Hatanaka e8c1a54c07 [ObjCARC] Do not move a release that has the clang.imprecise_release tag
above PHI instructions.

ARC optimizer has an optimization that moves a call to an ObjC runtime
function above a phi instruction when the phi has a null operand and is
an argument passed to the function call. This optimization should not
kick in when the runtime function is an objc_release that releases an
object with precise lifetime semantics.

rdar://problem/34959669

llvm-svn: 315914
2017-10-16 16:46:59 +00:00
Sanjay Patel 42135beac8 [InstCombine] don't unnecessarily generate a constant; NFCI
llvm-svn: 315910
2017-10-16 14:47:24 +00:00
NAKAMURA Takumi 414151a47e Revert rL315894, "SLPVectorizer.cpp: Try to appease stage2-3 difference. (D38586)"
llvm-svn: 315896
2017-10-16 09:50:01 +00:00
Nikolai Bozhenov 0e7ebbccc7 Move folding of icmp with zero after checking for min/max idioms.
Summary:
The following transformation for cmp instruction:

  icmp smin(x, PositiveValue), 0 -> icmp x, 0

should only be done after checking for min/max to prevent infinite
looping caused by a reverse canonicalization. That is why this
transformation was moved to place after the mentioned check.

Reviewers: spatel, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38934

Patch by: Artur Gainullin <artur.gainullin@intel.com>

llvm-svn: 315895
2017-10-16 09:19:21 +00:00
NAKAMURA Takumi 4543affa98 SLPVectorizer.cpp: Try to appease stage2-3 difference. (D38586)
llvm-svn: 315894
2017-10-16 09:15:23 +00:00
Sanjay Patel 934738a3da revert r314984: revert r314698 - [InstCombine] remove one-use restriction for icmp (shr exact X, C1), C2 --> icmp X, (C2<<C1)
Recommitting r314698. The bug exposed by this change should be fixed with:
https://reviews.llvm.org/rL315579 

llvm-svn: 315857
2017-10-15 15:39:15 +00:00
Sanjay Patel 30f30d37fb [SimplifyCFG] use range-for-loops, tidy; NFCI
There seems to be something missing here as shown in PR34471:
https://bugs.llvm.org/show_bug.cgi?id=34471 

llvm-svn: 315855
2017-10-15 14:43:39 +00:00
Aaron Ballman 615eb47035 Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people.
Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1

llvm-svn: 315854
2017-10-15 14:32:27 +00:00
whitequark ae12efab20 [MergeFunctions] Merge small functions if possible without a thunk.
This can result in significant code size savings in some cases,
e.g. an interrupt table all filled with the same assembly stub
in a certain Cortex-M BSP results in code blowup by a factor of 2.5.

Differential Revision: https://reviews.llvm.org/D34806

llvm-svn: 315853
2017-10-15 12:29:09 +00:00
whitequark b2ce9ffede [MergeFunctions] Replace all uses of unnamed_addr functions.
This reduces code size for constructs like vtables or interrupt
tables that refer to functions in global initializers.

Differential Revision: https://reviews.llvm.org/D34805

llvm-svn: 315852
2017-10-15 12:29:01 +00:00
Hongbin Zheng 73f650435b [LoopInfo][Refactor] Make SetLoopAlreadyUnrolled a member function of the Loop Pass, NFC.
This avoid code duplication and allow us to add the disable unroll metadata elsewhere.

Differential Revision: https://reviews.llvm.org/D38928

llvm-svn: 315850
2017-10-15 07:31:02 +00:00
Sanjay Patel b869f76d85 [InstCombine] use m_Neg() to reduce code; NFCI
llvm-svn: 315762
2017-10-13 21:28:50 +00:00
Eugene Zelenko 3b87939604 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 315760
2017-10-13 21:17:07 +00:00
Peter Collingbourne 868783e855 LowerTypeTests: Give imported symbols a type with size 0 so that they are not assumed not to alias.
It is possible for both a base and a derived class to be satisfied
with a unique vtable. If a program contains casts of the same pointer
to both of those types, the CFI checks will be lowered to this
(with ThinLTO):

if (p != &__typeid_base_global_addr)
  trap();
if (p != &__typeid_derived_global_addr)
  trap();

The optimizer may then use the first condition combined
with the assumption that __typeid_base_global_addr and
__typeid_derived_global_addr may not alias to optimize away the second
comparison, resulting in an unconditional trap.

This patch fixes the bug by giving imported globals the type [0 x i8]*,
which prevents the optimizer from assuming that they do not alias.

Differential Revision: https://reviews.llvm.org/D38873

llvm-svn: 315753
2017-10-13 21:02:16 +00:00
Sanjay Patel f0242de143 [InstCombine] move code to remove repeated constant check; NFCI
Also, consolidate tests for this fold in one place.

llvm-svn: 315745
2017-10-13 20:29:11 +00:00
Sanjay Patel 28b3aa3663 [InstCombine] recycle adds for better efficiency
Also, clean up unnecessary matcher capture variable initializations.

llvm-svn: 315743
2017-10-13 20:12:21 +00:00
Sanjay Patel 2118952162 [InstCombine] use local var to reduce code duplication; NFCI
llvm-svn: 315728
2017-10-13 18:32:53 +00:00
Matthew Simpson 2284937bbc [IPSCCP] Move common functions to ValueLatticeUtils (NFC)
This patch moves some common utility functions out of IPSCCP and makes them
available globally. The functions determine if interprocedural data-flow
analyses can propagate information through function returns, arguments, and
global variables.

Differential Revision: https://reviews.llvm.org/D37638

llvm-svn: 315719
2017-10-13 17:53:44 +00:00
Sanjay Patel c419c9f640 [InstCombine] add hasOneUse check to add-zext-add fold to prevent increasing instructions
llvm-svn: 315718
2017-10-13 17:47:25 +00:00
Sanjay Patel 76ed9eab29 [InstCombine] use AddOne helper to reduce code; NFC
llvm-svn: 315709
2017-10-13 17:00:47 +00:00
Sanjay Patel 8d810fee43 [InstCombine] rearrange code to remove repeated constant check; NFCI
llvm-svn: 315703
2017-10-13 16:43:58 +00:00
Sanjay Patel 2150651ac3 [InstCombine] allow zext(bool) + C --> select bool, C+1, C for vector types
The backend should be prepared for this transform after:
https://reviews.llvm.org/rL311731

llvm-svn: 315701
2017-10-13 16:29:38 +00:00
Daniel Neilson fa14ebd138 [RS4GC] Look through vector bitcasts when looking for base pointer
Summary:
 In RS4GC it is possible that a base pointer is contained in a vector that
has undergone a bitcast from one element-pointertype to another. We teach
RS4GC how to look through bitcasts of vector types when looking for a base
pointer.

Reviewers: anna

Reviewed By: anna

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38849

llvm-svn: 315694
2017-10-13 15:59:13 +00:00
Daniel Jasper 3344a21236 Revert r314923: "Recommit : Use the basic cost if a GEP is not used as addressing mode"
Significantly reduces performancei (~30%) of gipfeli
(https://github.com/google/gipfeli)

I have not yet managed to reproduce this regression with the open-source
version of the benchmark on github, but will work with others to get a
reproducer to you later today.

llvm-svn: 315680
2017-10-13 14:04:21 +00:00
Marco Castelluccio 0dcf64ad20 Disable gcov instrumentation of functions using funclet-based exception handling
Summary: This patch fixes the crash from https://bugs.llvm.org/show_bug.cgi?id=34659 and https://bugs.llvm.org/show_bug.cgi?id=34833.

Reviewers: rnk, majnemer

Reviewed By: rnk, majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D38223

llvm-svn: 315677
2017-10-13 13:49:15 +00:00
Eugene Zelenko 5323550e9a [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 315640
2017-10-12 23:30:03 +00:00
Anna Thomas 61aec18d46 [CVP] Process binary operations even when def is local
Summary:
This patch adds processing of binary operations when the def of operands are in
the same block (i.e. local processing).

Earlier we bailed out in such cases (the bail out was introduced in rL252032)
because LVI at that time was more precise about context at the end of basic
blocks, which implied local def and use analysis didn't benefit CVP.

Since then we've added support for LVI in presence of assumes and guards. The
test cases added show how local def processing in CVP helps adding more
information to the ashr, sdiv, srem and add operators.

Note: processCmp which suffers from the same problem will
be handled in a later patch.

Reviewers: philip, apilipenko, SjoerdMeijer, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38766

llvm-svn: 315634
2017-10-12 22:39:52 +00:00
Artur Pilipenko ead69ee4bd [LoopPredication] Check whether the loop is already guarded by the first iteration check condition
llvm-svn: 315623
2017-10-12 21:21:17 +00:00
Bruno Cardoso Lopes 993d2e67d8 Revert "Reintroduce "[SCCP] Propagate integer range info for parameters in IPSCCP.""
This reverts commit r315593: still affect two bots:

http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/5308
http://green.lab.llvm.org/green/job/clang-stage2-configure-Rlto/21751/

llvm-svn: 315618
2017-10-12 20:52:34 +00:00
Artur Pilipenko b4527e1ce2 [LoopPredication] Support ule, sle latch predicates
This is a follow up for the loop predication change 313981 to support ule, sle latch predicates.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D38177

llvm-svn: 315616
2017-10-12 20:40:27 +00:00
Bruno Cardoso Lopes 326fdcbff8 Reintroduce "[SCCP] Propagate integer range info for parameters in IPSCCP."
This is r315288 & r315294, which were reverted due to stage2 bot
failures.

Summary:
This updates the SCCP solver to use of the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.

For the following function, f() can be optimized to `ret i32 2` with
this change

  source_filename = "sccp.c"
  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"

  ; Function Attrs: norecurse nounwind readnone uwtable
  define i32 @main() local_unnamed_addr #0 {
  entry:
    %call = tail call fastcc i32 @f(i32 1)
    %call1 = tail call fastcc i32 @f(i32 47)
    %add3 = add nsw i32 %call, %call1
    ret i32 %add3
  }

  ; Function Attrs: noinline norecurse nounwind readnone uwtable
  define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
  entry:
    %c1 = icmp sle i32 %x, 100

    %cmp = icmp sgt i32 %x, 300
    %. = select i1 %cmp, i32 1, i32 2
    ret i32 %.
  }

  attributes #1 = { noinline }

Reviewers: davide, sanjoy, efriedma, dberlin

Reviewed By: davide, dberlin

Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D36656

llvm-svn: 315593
2017-10-12 16:54:11 +00:00
Don Hinton 3e0199f7eb [dump] Remove NDEBUG from test to enable dump methods [NFC]
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.

Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.

Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.

Differential Revision: https://reviews.llvm.org/D38406

llvm-svn: 315590
2017-10-12 16:16:06 +00:00
Hongbin Zheng d36f2030e2 [SimplifyIndVar] Replace IVUsers with loop invariant whenever possible
Differential Revision: https://reviews.llvm.org/D38415

llvm-svn: 315551
2017-10-12 02:54:11 +00:00
Zachary Turner 41a9ee98f9 Revert "[ADT] Make Twine's copy constructor private."
This reverts commit 4e4ee1c507e2707bb3c208e1e1b6551c3015cbf5.

This is failing due to some code that isn't built on MSVC
so I didn't catch.  Not immediately obvious how to fix this
at first glance, so I'm reverting for now.

llvm-svn: 315536
2017-10-11 23:54:34 +00:00
Zachary Turner 337462b365 [ADT] Make Twine's copy constructor private.
There's a lot of misuse of Twine scattered around LLVM.  This
ranges in severity from benign (returning a Twine from a function
by value that is just a string literal) to pretty sketchy (storing
a Twine by value in a class).  While there are some uses for
copying Twines, most of the very compelling ones are confined
to the Twine class implementation itself, and other uses are
either dubious or easily worked around.

This patch makes Twine's copy constructor private, and fixes up
all callsites.

Differential Revision: https://reviews.llvm.org/D38767

llvm-svn: 315530
2017-10-11 23:33:06 +00:00
Eugene Zelenko 6f1ae631f7 [Transforms] Revert r315516 changes in PredicateInfo to fix Windows build bots (NFC).
llvm-svn: 315519
2017-10-11 21:56:44 +00:00
Eugene Zelenko 286d5897d6 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 315516
2017-10-11 21:41:43 +00:00
Vivek Pandya 9590658fb8 [NFC] Convert OptimizationRemarkEmitter old emit() calls to new closure
parameterized emit() calls

Summary: This is not functional change to adopt new emit() API added in r313691.

Reviewed By: anemet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38285

llvm-svn: 315476
2017-10-11 17:12:59 +00:00
Max Kazantsev fecaff1bd9 [NFC] Fix variables used only for assert in GVN
llvm-svn: 315448
2017-10-11 10:31:49 +00:00
Max Kazantsev 3b81809e06 [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors
This patch fixes the miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
to hoist across such instructions because there is no guarantee that the load standing after such
instruction is still valid before such instruction. For example, a load from under a guard may be
invalid before the guard in the following case:
  int array[LEN];
  ...
  guard(0 <= index && index < LEN);
  use(array[index]);

Differential Revision: https://reviews.llvm.org/D37460

llvm-svn: 315440
2017-10-11 08:10:43 +00:00
Max Kazantsev 0c8dd052b8 [LICM] Disallow sinking of unordered atomic loads into loops
Sinking of unordered atomic load into loop must be disallowed because it turns
a single load into multiple loads. The relevant section of the documentation
is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for
Optimizers section. Here is the full text of this section:

> Notes for optimizers
> In terms of the optimizer, this **prohibits any transformation that
> transforms a single load into multiple loads**, transforms a store into
> multiple stores, narrows a store, or stores a value which would not be
> stored otherwise. Some examples of unsafe optimizations are narrowing
> an assignment into a bitfield, rematerializing a load, and turning loads
> and stores into a memcpy call. Reordering unordered operations is safe,
> though, and optimizers should take advantage of that because unordered
> operations are common in languages that need them.

Patch by Daniil Suchkov!

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D38392

llvm-svn: 315438
2017-10-11 07:26:45 +00:00
Max Kazantsev 25d8655dc2 [IRCE] Do not process empty safe ranges
IRCE should not apply when the safe iteration range is proved to be empty.
In this case we do unneeded job creating pre/post loops and then never
go to the main loop.

This patch makes IRCE not apply to empty safe ranges, adds test for this
situation and also modifies one of existing tests where it used to happen
slightly.

Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D38577

llvm-svn: 315437
2017-10-11 06:53:07 +00:00
Davide Italiano e2138fe41b [GVN] Don't replace constants with constants.
This fixes PR34908. Patch by Alex Crichton!

Differential Revision:  https://reviews.llvm.org/D38765

llvm-svn: 315429
2017-10-11 04:21:51 +00:00
Eugene Zelenko e9ea08a097 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 315383
2017-10-10 22:49:55 +00:00
Dehao Chen 3f56a05ae5 Use the first instruction's count to estimate the funciton's entry frequency.
Summary: In the current implementation, we only have accurate profile count for standalone symbols. For inlined functions, we do not have entry count data because it's not available in LBR. In this patch, we use the first instruction's frequency to estimiate the function's entry count, especially for inlined functions. This may be inaccurate due to debug info in optimized code. However, this is a better estimate than the static 80/20 estimation we have in the current implementation.

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: sanjoy, llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D38478

llvm-svn: 315369
2017-10-10 21:13:50 +00:00
Bruno Cardoso Lopes 57304923ca Revert "[SCCP] Propagate integer range info for parameters in IPSCCP."
This reverts commit r315288. This is part of fixing segfault introduced
in:

http://green.lab.llvm.org/green/job/clang-stage2-configure-Rlto/21675/

llvm-svn: 315329
2017-10-10 16:37:57 +00:00
Bruno Cardoso Lopes 122c4b3c8c Revert "[SCCP] Fix mem-sanitizer failure introduced by r315288."
This reverts commit r315294. Part of fixing seg fault introduced in:
http://green.lab.llvm.org/green/job/clang-stage2-configure-Rlto/21675/

llvm-svn: 315328
2017-10-10 16:37:51 +00:00
Florian Hahn 7d2375df30 [SCCP] Fix mem-sanitizer failure introduced by r315288.
llvm-svn: 315294
2017-10-10 10:33:45 +00:00
Florian Hahn 22a44bca40 [SCCP] Propagate integer range info for parameters in IPSCCP.
Summary:
This updates the SCCP solver to use of the ValueElement lattice for 
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.

For the following function, f() can be optimized to `ret i32 2` with
this change

  source_filename = "sccp.c"
  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"
  
  ; Function Attrs: norecurse nounwind readnone uwtable
  define i32 @main() local_unnamed_addr #0 {
  entry:
    %call = tail call fastcc i32 @f(i32 1)
    %call1 = tail call fastcc i32 @f(i32 47)
    %add3 = add nsw i32 %call, %call1
    ret i32 %add3
  }
  
  ; Function Attrs: noinline norecurse nounwind readnone uwtable
  define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
  entry:
    %c1 = icmp sle i32 %x, 100
  
    %cmp = icmp sgt i32 %x, 300
    %. = select i1 %cmp, i32 1, i32 2
    ret i32 %.
  }
  
  attributes #1 = { noinline }



Reviewers: davide, sanjoy, efriedma, dberlin

Reviewed By: davide, dberlin

Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D36656

llvm-svn: 315288
2017-10-10 09:32:38 +00:00
Clement Courbet e2e8a5c496 Re-land "[MergeICmps] Disable mergeicmps if the target does not want to handle memcmp expansion."
(fixed stability issues)

This reverts commit d6492333d3b478a1d88163315002022f8d5e58dc.

llvm-svn: 315281
2017-10-10 08:00:45 +00:00
Xinliang David Li 4cdc9dab0a Renable r314928
Eliminate inttype phi with inttoptr/ptrtoint.

 This version fixed a bug in finding the matching
 phi -- the order of the incoming blocks may be 
 different (triggered in self build on Windows).
 A new test case is added.

llvm-svn: 315272
2017-10-10 05:07:54 +00:00
Adam Nemet 0965da2055 Rename OptimizationDiagnosticInfo.* to OptimizationRemarkEmitter.*
Sync it up with the name of the class actually defined here.  This has been
bothering me for a while...

llvm-svn: 315249
2017-10-09 23:19:02 +00:00
Sanjay Patel ce36b03b03 [InstCombine] fix formatting; NFC
llvm-svn: 315223
2017-10-09 17:54:46 +00:00
Sanjay Patel 72d339abb7 [InstCombine] use correct type when propagating constant condition in simplifyDivRemOfSelectWithZeroOp (PR34856)
llvm-svn: 315130
2017-10-06 23:43:06 +00:00
Sanjay Patel ae2e3a44d2 [InstCombine] rename SimplifyDivRemOfSelect to be clearer, add comments, simplify code; NFCI
There's at least one bug here - this code can fail with vector types (PR34856).
It's also being called for FREM; I'm still trying to understand how that is valid.

llvm-svn: 315127
2017-10-06 23:20:16 +00:00
Reid Kleckner b6b210e61f Revert "Roll forward r314928"
This appears to be miscompiling Clang, as shown on two Windows bootstrap
bots:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7611
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/6870

Nothing else is in the blame list. Both emit errors on this valid code
in the Windows ucrt headers:

C:\...\ucrt\malloc.h:95:32: error: invalid operands to binary expression ('char *' and 'int')
            _Ptr = (char*)_Ptr + _ALLOCA_S_MARKER_SIZE;
                   ~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~

I am attempting to reproduce this now.

This reverts r315044

llvm-svn: 315108
2017-10-06 21:17:51 +00:00
Dehao Chen 9bd60429e2 Directly return promoted direct call instead of rely on stripPointerCast.
Summary: stripPointerCast is not reliably returning the value that's being type-casted. Instead it may look further at function attributes to further propagate the value. Instead of relying on stripPOintercast, the more reliable solution is to directly use the pointer to the promoted direct call.

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D38603

llvm-svn: 315077
2017-10-06 17:04:55 +00:00
Clement Courbet d12c189e2e Revert "[MergeICmps] Disable mergeicmps if the target does not want to handle memcmp expansion."
Still a few stability issues on windows.

This reverts commit 67e3db9bc121ba244e20337aabc7cf341a62b545.

llvm-svn: 315058
2017-10-06 13:02:24 +00:00
Clement Courbet 4e1bae8136 Re-land "[MergeICmps] Disable mergeicmps if the target does not want to handle memcmp expansion."
(fixed unit tests by making comparisons stable)

This reverts commit 1b2d359ce256fd6737da4e93833346a0bd6d7583.

llvm-svn: 315056
2017-10-06 12:12:35 +00:00
Xinliang David Li bcd36f7c5a Roll forward r314928
Fixed ThinLTO bootstrap failure : track new
bitcast per incomingVal. Added new tests.

llvm-svn: 315044
2017-10-06 05:15:25 +00:00
Davide Italiano c74ea93b8c [PM] Retire disable unit-at-a-time switch.
This is a vestige from the GCC-3 days, which disables IPO passes
when set. I don't think anybody actually uses it as there are
several IPO passes which still run with this flag set and
nobody complained/noticed. This reduces the delta between
current and new pass manager and allows us to easily review
the difference when we decide to flip the switch (or audit
which passes should run, FWIW).

llvm-svn: 315043
2017-10-06 04:39:40 +00:00
Jakub Kuderski cbe9fae99d [CodeExtractor] Fix multiple bugs under certain shape of extracted region
Summary:
If the extracted region has multiple exported data flows toward the same BB which is not included in the region, correct resotre instructions and PHI nodes won't be generated inside the exitStub. The solution is simply put the restore instructions right after the definition of output values instead of putting in exitStub.
Unittest for this bug is included.

Author: myhsu

Reviewers: chandlerc, davide, lattner, silvas, davidxl, wmi, kuhar

Subscribers: dberlin, kuhar, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D37902

llvm-svn: 315041
2017-10-06 03:37:06 +00:00
Daniel Berlin 08dd582ea0 NewGVN: Factor out duplicate parts of OpIsSafeForPHIOfOps
llvm-svn: 315040
2017-10-06 01:33:06 +00:00
Peter Collingbourne 715bcfe0c9 ModuleUtils: Stop using comdat members to generate unique module ids.
It is possible for two modules to define the same set of external
symbols without causing a duplicate symbol error at link time,
as long as each of the symbols is a comdat member. So we cannot
use them as part of a unique id for the module.

Differential Revision: https://reviews.llvm.org/D38602

llvm-svn: 315026
2017-10-05 21:54:53 +00:00
Sanjay Patel 7ac2db6a48 [InstCombine] improve folds for icmp gt/lt (shr X, C1), C2
We can always eliminate the shift in: icmp gt/lt (shr X, C1), C2 --> icmp gt/lt X, C'
This patch was supposed to just be an efficiency improvement because we were doing this 3-step process to fold:

IC: Visiting:   %c = icmp ugt i4 %s, 1
IC: ADD:   %s = lshr i4 %x, 1
IC: ADD:   %1 = udiv i4 %x, 2
IC: Old =   %c = icmp ugt i4 %1, 1
    New =   <badref> = icmp uge i4 %x, 4
IC: ADD:   %c = icmp uge i4 %x, 4
IC: ERASE   %2 = icmp ugt i4 %1, 1
IC: Visiting:   %c = icmp uge i4 %x, 4
IC: Old =   %c = icmp uge i4 %x, 4
    New =   <badref> = icmp ugt i4 %x, 3
IC: ADD:   %c = icmp ugt i4 %x, 3
IC: ERASE   %2 = icmp uge i4 %x, 4
IC: Visiting:   %c = icmp ugt i4 %x, 3
IC: DCE:   %1 = udiv i4 %x, 2
IC: ERASE   %1 = udiv i4 %x, 2
IC: DCE:   %s = lshr i4 %x, 1
IC: ERASE   %s = lshr i4 %x, 1
IC: Visiting:   ret i1 %c

When we could go directly to canonical icmp form:

IC: Visiting:   %c = icmp ugt i4 %s, 1
IC: Old =   %c = icmp ugt i4 %s, 1
    New =   <badref> = icmp ugt i4 %x, 3
IC: ADD:   %c = icmp ugt i4 %x, 3
IC: ERASE   %1 = icmp ugt i4 %s, 1
IC: ADD:   %s = lshr i4 %x, 1
IC: DCE:   %s = lshr i4 %x, 1
IC: ERASE   %s = lshr i4 %x, 1
IC: Visiting:   %c = icmp ugt i4 %x, 3

...but then I noticed that the folds were incomplete too:
https://godbolt.org/g/aB2hLE

Here are attempts to prove the logic with Alive:
https://rise4fun.com/Alive/92o

Name: lshr_ult
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr i8 %x, C1
%r = icmp ult i8 %sh, C2
  =>
%r = icmp ult i8 %x, (C2 << C1)

Name: ashr_slt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr i8 %x, C1
%r = icmp slt i8 %sh, C2
  =>
%r = icmp slt i8 %x, (C2 << C1)

Name: lshr_ugt
Pre: (((C2+1) << C1) u>> C1) == (C2+1)
%sh = lshr i8 %x, C1
%r = icmp ugt i8 %sh, C2
  =>
%r = icmp ugt i8 %x, ((C2+1) << C1) - 1

Name: ashr_sgt
Pre: (C2 != 127) && ((C2+1) << C1 != -128) && (((C2+1) << C1) >> C1) == (C2+1)
%sh = ashr i8 %x, C1
%r = icmp sgt i8 %sh, C2
  =>
%r = icmp sgt i8 %x, ((C2+1) << C1) - 1

Name: ashr_exact_sgt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr exact i8 %x, C1
%r = icmp sgt i8 %sh, C2
  =>
%r = icmp sgt i8 %x, (C2 << C1)

Name: ashr_exact_slt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr exact i8 %x, C1
%r = icmp slt i8 %sh, C2
  =>
%r = icmp slt i8 %x, (C2 << C1)

Name: lshr_exact_ugt
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr exact i8 %x, C1
%r = icmp ugt i8 %sh, C2
  =>
%r = icmp ugt i8 %x, (C2 << C1)

Name: lshr_exact_ult
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr exact i8 %x, C1
%r = icmp ult i8 %sh, C2
  =>
%r = icmp ult i8 %x, (C2 << C1)

We did something similar for 'shl' in D28406.

Differential Revision: https://reviews.llvm.org/D38514

llvm-svn: 315021
2017-10-05 21:11:49 +00:00
Dehao Chen 16f01fb1db Annotate VP prof on indirect call if it is ICPed in the profiled binary.
Summary: In SamplePGO, when an indirect call is promoted in the profiled binary, before profile annotation, it will be promoted and inlined. For the original indirect call, the current implementation will not mark VP profile on it. This is an issue when profile becomes stale. This patch annotates VP prof on indirect calls during annotation.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D38477

llvm-svn: 315016
2017-10-05 20:15:29 +00:00
Davide Italiano c8708e59e8 [PassManager] Improve the interaction between -O2 and ThinLTO.
Run GDCE slightly later so that we don't have to repeat it
twice when preparing for Thin. Thanks to Mehdi for the suggestion.

llvm-svn: 314999
2017-10-05 18:23:25 +00:00
Davide Italiano ff829cea8b [PassManager] Run global optimizations after the inliner.
The inliner performs some kind of dead code elimination as it goes,
but there are cases that are not really caught by it. We might
at some point consider teaching the inliner about them, but it
is OK for now to run GlobalOpt + GlobalDCE in tandem as their
benefits generally outweight the cost, making the whole pipeline
faster.

This fixes PR34652.

Differential Revision: https://reviews.llvm.org/D38154

llvm-svn: 314997
2017-10-05 18:06:37 +00:00
Ayal Zaks c9e0f886e5 [LV] Fix PR34743 - handle casts that sink after interleaved loads
When ignoring a load that participates in an interleaved group, make sure to
move a cast that needs to sink after it.

Testcase derived from reproducer of PR34743.

Differential Revision: https://reviews.llvm.org/D38338

llvm-svn: 314986
2017-10-05 15:45:14 +00:00
Clement Courbet 922e5bc698 Revert "Re-land "[MergeICmps] Disable mergeicmps if the target does not want to handle memcmp expansion."""
broken test on windows

This reverts commit c91479518344fd1fc071c5bd5848f6eb83e53dca.

llvm-svn: 314985
2017-10-05 14:42:06 +00:00
Sanjay Patel f11b5b4f87 revert r314698 - [InstCombine] remove one-use restriction for icmp (shr exact X, C1), C2 --> icmp X, (C2<<C1)
There is a bot failure that appears to be related to this change:
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/2117

...so reverting to confirm that and attempting to keep the bot green while investigating.

llvm-svn: 314984
2017-10-05 14:26:15 +00:00
Ayal Zaks fc3f7a4f0c [LV] Fix PR34711 - widen instruction ranges when sinking casts
Instead of trying to keep LastWidenRecipe updated after creating each recipe,
have tryToWiden() retrieve the last recipe of the current VPBasicBlock and check
if it's a VPWidenRecipe when attempting to extend its range. This ensures that
such extensions, optimized to maintain the original instruction order, do so
only when the instructions are to maintain their relative order. The latter does
not always hold, e.g., when a cast needs to sink to unravel first order
recurrence (r306884).

Testcase derived from reproducer of PR34711.

Differential Revision: https://reviews.llvm.org/D38339

llvm-svn: 314981
2017-10-05 12:41:49 +00:00
Clement Courbet 4cafbb9b5e Re-land "[MergeICmps] Disable mergeicmps if the target does not want to handle memcmp expansion.""
llvm-svn: 314980
2017-10-05 12:39:57 +00:00
Clement Courbet 6603fc0e7b Revert "[MergeICmps] Disable mergeicmps if the target does not want to handle memcmp expansion."
Breaks
clang-stage1-cmake-RA-incremental/llvm/test/Transforms/MergeICmps/X86/tuple-four-int8.ll

This reverts commit 3038c459d67f8898ffa295d54a013b280690abfa.

llvm-svn: 314972
2017-10-05 08:03:39 +00:00
Craig Topper 17b0c78447 [InstCombine] Fix a vector splat handling bug in transformZExtICmp.
We were using an i1 type and then zero extending to a vector. Instead just create the 0/1 directly as a ConstantInt with the correct type. No need to ask ConstantExpr to zero extend for us.

This bug is a bit tricky to hit because it requires us to visit a zext of an icmp that would normally be simplified to true/false, but that icmp hasnt' been visited yet. In the test case this zext and icmp were created by visiting a udiv and due to worklist ordering we got to the zext first.

Fixes PR34841.

llvm-svn: 314971
2017-10-05 07:59:11 +00:00
Clement Courbet 902eef32eb [MergeICmps] Disable mergeicmps if the target does not want to handle memcmp expansion.
Summary: This is to avoid e.g. merging two cheap icmps if the target is not going to expand to something nice later.

Reviewers: dberlin, spatel

Subscribers: davide, nemanjai

Differential Revision: https://reviews.llvm.org/D38232

llvm-svn: 314970
2017-10-05 07:49:09 +00:00
Xinliang David Li 04ab11a08a Revert r314928 to investigate thinLTO bootstrap failure
llvm-svn: 314961
2017-10-05 01:40:13 +00:00
Craig Topper 7a93092399 [InstCombine] Improve support for ashr in foldICmpAndShift
We can support ashr similar to lshr, if we know that none of the shifted in bits are used. In that case SimplifyDemandedBits would normally convert it to lshr. But that conversion doesn't happen if the shift has additional users.

Differential Revision: https://reviews.llvm.org/D38521

llvm-svn: 314945
2017-10-04 23:06:13 +00:00
Hans Wennborg 899809d531 Fix a -Wparentheses warning. NFC.
llvm-svn: 314936
2017-10-04 21:14:07 +00:00
Marcello Maggioni df3e71e037 [LoopDeletion] Move deleteDeadLoop to to LoopUtils. NFC
llvm-svn: 314934
2017-10-04 20:42:46 +00:00
Sanjay Patel 4c33d5213b [SimplifyCFG] put the optional assumption cache pointer in the options struct; NFCI
This is a follow-up to https://reviews.llvm.org/D38138. 

I fixed the capitalization of some functions because we're changing those
lines anyway and that helped verify that we weren't accidentally dropping 
any options by using default param values.

llvm-svn: 314930
2017-10-04 20:26:25 +00:00
Xinliang David Li 7a73757358 Recommit r314561 after fixing msan build failure
(trial 2) Incoming val defined by terminator instruction which
also requires bitcasts can not be handled.

llvm-svn: 314928
2017-10-04 20:17:55 +00:00
Jun Bum Lim d40e03c2d8 Recommit : Use the basic cost if a GEP is not used as addressing mode
Recommitting r314517 with the fix for handling ConstantExpr.

Original commit message:
  Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing
  mode in the target. However, since it doesn't check its actual users, it will
  return FREE even in cases where the GEP cannot be folded away as a part of
  actual addressing mode. For example, if an user of the GEP is a call
  instruction taking the GEP as a parameter, then the GEP may not be folded in
  isel.

llvm-svn: 314923
2017-10-04 18:33:52 +00:00
Clement Courbet 98eaa88357 [NFC] clang-format lib/Transforms/Scalar/MergeICmps.cpp
llvm-svn: 314906
2017-10-04 15:13:52 +00:00
Max Kazantsev 8aacef6cae [IRCE] Temporarily disable unsigned latch conditions by default
We have found some corner cases connected to range intersection where IRCE makes
a bad thing when the latch condition is unsigned. The fix for that will go as a follow up.
This patch temporarily disables IRCE for unsigned latch conditions until the issue is fixed.

The unsigned latch conditions were introduced to IRCE by rL310027.

Differential Revision: https://reviews.llvm.org/D38529

llvm-svn: 314881
2017-10-04 06:53:22 +00:00
Craig Topper df63b96811 [InstCombine] Use isSignBitCheck to simplify an if statement. Directly create new sign bit compares instead of manipulating the constant. NFCI
Since we no longer had the direct constant compares, manipulating the constant seemeded less clear.

llvm-svn: 314830
2017-10-03 19:14:23 +00:00
Hans Wennborg 9a9048e19f Revert r314806 "[SLP] Vectorize jumbled memory loads."
All the buildbots are red, e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/2436/

> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask' of
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
>
> Reviewed By: Ayal
>
> Subscribers: hans, mzolotukhin
>
> Differential Revision: https://reviews.llvm.org/D36130

llvm-svn: 314824
2017-10-03 18:32:29 +00:00
Dehao Chen ea523ddb1b Revert the change that accidentally went in r314806.
llvm-svn: 314807
2017-10-03 15:50:42 +00:00
Mohammad Shahid 1d5422f27f [SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh

Reviewed By: Ayal

Subscribers: hans, mzolotukhin

Differential Revision: https://reviews.llvm.org/D36130

llvm-svn: 314806
2017-10-03 15:28:48 +00:00
Craig Topper 8ed1aa91bd [InstCombine] Change a bunch of methods to take APInts by reference instead of pointer.
This allows us to remove a bunch of dereferences and only have a few dereferences at the call sites.

llvm-svn: 314762
2017-10-03 05:31:07 +00:00
Craig Topper 664c4d0190 [InstCombine] Replace an equality compare of two APInt pointers with a compare of the APInts themselves.
Apparently this works by virtue of the fact that the pointers are pointers to the APInts stored inside of the ConstantInt objects. But I really don't think we should be relying on that.

llvm-svn: 314761
2017-10-03 04:55:04 +00:00
Davide Italiano c48d1c8519 [PassManager] Retire cl::opt that have been set for a while. NFCI.
llvm-svn: 314740
2017-10-02 23:39:20 +00:00
Sanjay Patel 6e17c00a88 [InstCombine] remove one-use restriction for icmp (shr exact X, C1), C2 --> icmp X, (C2<<C1)
llvm-svn: 314698
2017-10-02 18:26:44 +00:00
Dehao Chen f464627f28 Update getMergedLocation to check the instruction type and merge properly.
Summary: If the merged instruction is call instruction, we need to set the scope to the closes common scope between 2 locations, otherwise it will cause trouble when the call is getting inlined.

Reviewers: dblaikie, aprantl

Reviewed By: dblaikie, aprantl

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D37877

llvm-svn: 314694
2017-10-02 18:13:14 +00:00
Craig Topper 6e025a3ecc [InstCombine] Use APInt for all the math in foldICmpDivConstant
Summary: This currently uses ConstantExpr to do its math, but as noted in a TODO it can all be done directly on APInt.

Reviewers: spatel, majnemer

Reviewed By: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38440

llvm-svn: 314640
2017-10-01 23:53:54 +00:00
Daniel Jasper 3c9c60c727 Revert r314579: "Recommi r314561 after fixing over-debug assertion".
And follow-up r314585.
Leads to segfaults. I'll forward reproduction instructions to the patch
author.

Also, for a recommit, still add the original patch description.
Otherwise, it becomes really tedious to find out what a patch actually
does. The fact that it is a recommit with a fix is somewhat secondary.

llvm-svn: 314622
2017-10-01 09:53:53 +00:00
Dehao Chen d26dae0d34 Separate the logic when handling indirect calls in SamplePGO ThinLTO compile phase and other phases.
Summary: In SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracted that logic and makes it clearer before profile annotation. In the mean time, we need to make function importing process both inlined callsites as well as not promoted indirect callsites.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: sanjoy, mehdi_amini, llvm-commits, inglorion

Differential Revision: https://reviews.llvm.org/D38094

llvm-svn: 314619
2017-10-01 05:24:51 +00:00
Daniel Berlin d36c27bedb NewGVN: Fix PR 34473, by not using ExactlyEqualsExpression for finding
phi of ops users.

llvm-svn: 314612
2017-09-30 23:51:55 +00:00
Daniel Berlin c1305af09b NewGVN: Evaluate phi of ops expressions before creating phi node
llvm-svn: 314611
2017-09-30 23:51:54 +00:00
Daniel Berlin 9b926e90d3 NewGVN: Allow dependent PHI of ops
llvm-svn: 314610
2017-09-30 23:51:53 +00:00
Daniel Berlin de6958ee85 NewGVN: Make OpIsSafeForPhiOfOps non-recursive
llvm-svn: 314609
2017-09-30 23:51:04 +00:00
Dehao Chen 4f5d830343 Refactor the SamplePGO profile annotation logic to extract inlineCallInstruction. (NFC)
llvm-svn: 314601
2017-09-30 20:46:15 +00:00
Daniel Jasper 0a51ec29c9 Revert r314435: "[JumpThreading] Preserve DT and LVI across the pass"
Causes a segfault on a builtbot (and in our internal bootstrapping of
Clang). See Eli's response on the commit thread.

llvm-svn: 314589
2017-09-30 11:57:19 +00:00
Xinliang David Li b8aac3ac19 Fix buildbot failure -- tighten type check for matching phi
llvm-svn: 314585
2017-09-30 05:27:46 +00:00
Xinliang David Li 3409d9c07f Recommi r314561 after fixing over-debug assertion
llvm-svn: 314579
2017-09-30 00:46:32 +00:00
Xinliang David Li 455dec098b Revert 314561 due to debug build assertion failure
llvm-svn: 314563
2017-09-29 22:30:34 +00:00
Xinliang David Li 5b9d96825b Eliminate PHI (int typed) which has only one use by intptr
This patch will eliminate redundant intptr/ptrtoint that pessimizes
analyses such as SCEV, AA and will make optimization passes such
as auto-vectorization more powerful.

Differential revision: http://reviews.llvm.org/D37832

llvm-svn: 314561
2017-09-29 22:10:15 +00:00
Alex Shlyapnikov e76aa3b0b2 Revert "Use the basic cost if a GEP is not used as addressing mode"
This reverts commit r314517.

This commit crashes sanitizer bots, for example:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167

Stack snippet:
...
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Support/Casting.h:255:0
llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getGEPCost(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Value const*>)
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:742:0
llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>)
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:782:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/lib/Analysis/TargetTransformInfo.cpp:116:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:116:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:343:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:864:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfo.h:285:0
...

llvm-svn: 314560
2017-09-29 22:04:45 +00:00
Matthew Simpson f4bb480b62 [LV] Use correct insertion point when type shrinking reductions
When type shrinking reductions, we should insert the truncations and extends at
the end of the loop latch block. Previously, these instructions were inserted
at the end of the loop header block. The difference is only a problem for loops
with predicated instructions (e.g., conditional stores and instructions that
may divide by zero). For these instructions, we create new basic blocks inside
the vectorized loop, which cause the loop header and latch to no longer be the
same block. This should fix PR34687.

Reference: https://bugs.llvm.org/show_bug.cgi?id=34687
llvm-svn: 314542
2017-09-29 18:07:39 +00:00
Hongbin Zheng c8abdf5f25 [SimplifyIndVar] Do not fail when we constant fold an IV user to ConstantPointerNull
The type of a SCEVConstant may not match the corresponding LLVM Value.
In this case, we skip the constant folding for now.

TODO: Replace ConstantInt Zero by ConstantPointerNull
llvm-svn: 314531
2017-09-29 16:32:12 +00:00
Jun Bum Lim 0e16a59e83 Use the basic cost if a GEP is not used as addressing mode
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.

Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng

Reviewed By: hfinkel

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38085

llvm-svn: 314517
2017-09-29 14:50:16 +00:00
Sanjoy Das 0ac5ba5ade Revert "[BypassSlowDivision] Improve our handling of divisions by constants"
This reverts commit r314253.  It causes a miscompile on P100 in an internal
benchmark.  Reverting while I investigate.

llvm-svn: 314482
2017-09-29 00:54:16 +00:00
Evandro Menezes 3701df55c6 [JumpThreading] Preserve DT and LVI across the pass
JumpThreading now preserves dominance and lazy value information across the
entire pass.  The pass manager is also informed of this preservation with
the goal of DT and LVI being recalculated fewer times overall during
compilation.

This change prepares JumpThreading for enhanced opportunities; particularly
those across loop boundaries.

Patch by: Brian Rzycki <b.rzycki@samsung.com>,
          Sebastian Pop <s.pop@samsung.com>

Differential revision: https://reviews.llvm.org/D37528

llvm-svn: 314435
2017-09-28 17:24:40 +00:00
Benjamin Kramer c965b30e54 [LoopUnroll] Fix use after poison.
llvm-svn: 314418
2017-09-28 14:47:39 +00:00
Sanjoy Das def1729dc4 Use a BumpPtrAllocator for Loop objects
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.

Reviewers: chandlerc

Subscribers: mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38201

llvm-svn: 314375
2017-09-28 02:45:42 +00:00
Craig Topper 0cd25942f7 Revert r314017 '[InstCombine] Simplify check for RHS being a splat constant in foldICmpUsingKnownBits by just checking Op1Min==Op1Max rather than going through m_APInt.'
This reverts r314017 and similar code added in later commits. It seems to not work for pointer compares and is causing a bot failure for the last several days.

llvm-svn: 314360
2017-09-27 22:57:18 +00:00
Rui Ueyama 0dbb0f107e Fix -Wunused-variable for Release build.
llvm-svn: 314353
2017-09-27 22:03:15 +00:00
Sanjoy Das 4f3ebd537c Return the LoopUnrollResult from tryToUnrollLoop; NFC
I will use this in a later change.

llvm-svn: 314352
2017-09-27 21:45:22 +00:00
Sanjoy Das 8e8c1bc490 LoopDeletion: use return value instead of passing in LPMUpdater; NFC
I will use this refactoring in a later patch.

llvm-svn: 314351
2017-09-27 21:45:21 +00:00
Sanjoy Das 3567d3d2ec Rename LoopUnrollStatus to LoopUnrollResult; NFC
A "Result" suffix is more appropriate here

llvm-svn: 314350
2017-09-27 21:45:19 +00:00
Alexey Bataev 022cc6c41e [SLP] Fix crash on propagate IR flags for undef operands of min/max
reductions.

If both operands of the newly created SelectInst are Undefs the
resulting operation is also Undef, not SelectInst. It may cause crashes
when trying to propagate IR flags because function expects exactly
SelectInst instruction, nothing else.

llvm-svn: 314323
2017-09-27 17:42:49 +00:00
Chad Rosier d8b4b06f5d [InstCombine] Gating select arithmetic optimization.
These changes faciliate positive behavior for arithmetic based select
expressions that match its translation criteria, keeping code size gated to
neutral or improved scenarios.

Patch by Michael Berg <michael_c_berg@apple.com>!

Differential Revision: https://reviews.llvm.org/D38263

llvm-svn: 314320
2017-09-27 17:16:51 +00:00
Sanjay Patel fee80d5e65 [SLP] fix typos/formatting; NFC
llvm-svn: 314315
2017-09-27 16:32:56 +00:00
Sanjay Patel 0f9b4773c1 [SimplifyCFG] add a struct to house optional folds (PR34603)
This was intended to be no-functional-change, but it's not - there's a test diff.

So I thought I should stop here and post it as-is to see if this looks like what was expected 
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Notes:
 1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried 
    through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'. 
    The parameter isn't passed down, so we pick up the default value from the function signature 
    after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the 
    'SimplifyCFG' calls.

 2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops. 
    This would theoretically allow us to differentiate the transforms controlled by those params 
    independently.

 3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too. 
    I just stopped here to minimize the diffs.

 4. Similarly, I stopped short of messing with the pass manager layer. I have another question that 
    could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG 
    set to true no matter where in the pipeline it's creating SimplifyCFG passes?

    // Create an early function pass manager to cleanup the output of the
    // frontend.
    EarlyFPM.addPass(SimplifyCFGPass());

    -->

    /// \brief Construct a pass with the default thresholds
    /// and switch optimizations.
    SimplifyCFGPass::SimplifyCFGPass()
       : BonusInstThreshold(UserBonusInstThreshold),
         LateSimplifyCFG(true) {}   <-- switches get converted to lookup tables and loops may not be in canonical form

    If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG' 
    setting via recursion was masking this bug.

Differential Revision: https://reviews.llvm.org/D38138

llvm-svn: 314308
2017-09-27 14:54:16 +00:00
Hongbin Zheng d1b7b2efba [SimplifyIndVar] Constant fold IV users
This patch tries to transform cases like:

for (unsigned i = 0; i < N; i += 2) {
  bool c0 = (i & 0x1) == 0;
  bool c1 = ((i + 1) & 0x1) == 1;
}
To

for (unsigned i = 0; i < N; i += 2) {
  bool c0 = true;
  bool c1 = true;
}

This commit also update test/Transforms/IndVarSimplify/replace-srem-by-urem.ll to prevent constant folding.

Differential Revision: https://reviews.llvm.org/D38272

llvm-svn: 314266
2017-09-27 03:11:46 +00:00
Sanjoy Das eda7a86d42 [BypassSlowDivision] Improve our handling of divisions by constants
Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow .  This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.

Reviewers: jlebar

Subscribers: jholewinski, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38265

llvm-svn: 314253
2017-09-26 21:54:27 +00:00
Craig Topper 8bf622174d [InstCombine] Remove one use restriction on the shift for calls to foldICmpAndShift.
If this transformation succeeds, we're going to remove our dependency on the shift by rewriting the and. So it doesn't matter how many uses the shift has.

This distributes the one use check to other transforms in foldICmpAndConstConst that do need it.

Differential Revision: https://reviews.llvm.org/D38206

llvm-svn: 314233
2017-09-26 18:47:25 +00:00
Sanjay Patel 1d04b5bacf [DSE] Merge stores when the later store only writes to memory locations the early store also wrote to (2nd try)
This is a 2nd attempt at:
https://reviews.llvm.org/rL310055
...which was reverted at rL310123 because of PR34074:
https://bugs.llvm.org/show_bug.cgi?id=34074

In this version, we break out of the inner loop after we successfully merge and kill a pair of stores. In the
earlier rev, we were continuing instead, which meant we could process the invalid info from a now dead store.

Original commit message (authored by Filipe Cabecinhas):

This fixes PR31777.

If both stores' values are ConstantInt, we merge the two stores
(shifting the smaller store appropriately) and replace the earlier (and
larger) store with an updated constant.

In the future we should also support vectors of integers. And maybe
float/double if we can.  

Differential Revision: https://reviews.llvm.org/D30703

llvm-svn: 314206
2017-09-26 13:54:28 +00:00
Sylvestre Ledru e7d4cd639b Don't move llvm.localescape outside the entry block in the GCOV profiling pass
Summary:
This fixes https://bugs.llvm.org/show_bug.cgi?id=34714.

Patch by Marco Castelluccio

Reviewers: rnk

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38224

llvm-svn: 314201
2017-09-26 11:56:43 +00:00
Matthias Braun cc603ee3d5 TargetLibraryInfo: Stop guessing wchar_t size
Usually the frontend communicates the size of wchar_t via metadata and
we can optimize wcslen (and possibly other calls in the future). In
cases without the wchar_size metadata we would previously try to guess
the correct size based on the target triple; however this is fragile to
keep up to date and may miss users manually changing the size via flags.
Better be safe and stop guessing and optimizing if the frontend didn't
communicate the size.

Differential Revision: https://reviews.llvm.org/D38106

llvm-svn: 314185
2017-09-26 02:36:57 +00:00
Vlad Tsyrklevich 998b220e97 Add section headers to SpecialCaseLists
Summary:
Sanitizer blacklist entries currently apply to all sanitizers--there
is no way to specify that an entry should only apply to a specific
sanitizer. This is important for Control Flow Integrity since there are
several different CFI modes that can be enabled at once. For maximum
security, CFI blacklist entries should be scoped to only the specific
CFI mode(s) that entry applies to.

Adding section headers to SpecialCaseLists allows users to specify more
information about list entries, like sanitizer names or other metadata,
like so:

  [section1]
  fun:*fun1*
  [section2|section3]
  fun:*fun23*

The section headers are regular expressions. For backwards compatbility,
blacklist entries entered before a section header are put into the '[*]'
section so that blacklists without sections retain the same behavior.

SpecialCaseList has been modified to also accept a section name when
matching against the blacklist. It has also been modified so the
follow-up change to clang can define a derived class that allows
matching sections by SectionMask instead of by string.

Reviewers: pcc, kcc, eugenis, vsk

Reviewed By: eugenis, vsk

Subscribers: vitalybuka, llvm-commits

Differential Revision: https://reviews.llvm.org/D37924

llvm-svn: 314170
2017-09-25 22:11:11 +00:00
Craig Topper 30dc9797e9 [InstCombine] Move an optimization from foldICmpAndConstConst to foldICmpUsingKnownBits
All this optimization cares about is knowing how many low bits of LHS is known to be zero and whether that means that the result is 0 or greater than the RHS constant. It doesn't matter where the zeros in the low bits came from. So we don't need to specifically look for an AND. Instead we can use known bits.

Differential Revision: https://reviews.llvm.org/D38195

llvm-svn: 314153
2017-09-25 21:15:00 +00:00
Sanjay Patel ecb175608f [InstCombine] remove extract-of-select vector transform (2nd try)
The 1st attempt at this:
https://reviews.llvm.org/rL314117
was reverted at:
https://reviews.llvm.org/rL314118

because of bot fails for clang tests that were checking optimized IR. That should be fixed with:
https://reviews.llvm.org/rL314144
...so try again. 

Original commit message:

The transform to convert an extract-of-a-select-of-vectors was added at:
https://reviews.llvm.org/rL194013

And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>

Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.

The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.

The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.

Differential Revision: https://reviews.llvm.org/D38006

llvm-svn: 314147
2017-09-25 20:30:53 +00:00
Hongbin Zheng bbe448abd8 [SimplifyIndvar] Minor change to refine r314125, NFC
llvm-svn: 314130
2017-09-25 18:10:36 +00:00
Hongbin Zheng f0093e45c4 [SimplifyIndvar] Replace the srem used by IV if we can prove both of its operands are non-negative
Since now SCEV can handle 'urem', an 'urem' is a better canonical form than an 'srem' because it has well-defined behavior

This is a follow up of D34598

Differential Revision: https://reviews.llvm.org/D38072

llvm-svn: 314125
2017-09-25 17:39:40 +00:00
Sanjay Patel aa7f750bec revert r314117 because there are bogus clang tests that depend on the optimizer
llvm-svn: 314118
2017-09-25 17:00:04 +00:00
Sanjay Patel 9639897d77 [InstCombine] remove extract-of-select vector transform
The transform to convert an extract-of-a-select-of-vectors was added at:
rL194013

And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>

Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.

The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.

The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.

Differential Revision: https://reviews.llvm.org/D38006

llvm-svn: 314117
2017-09-25 16:41:34 +00:00
Alexey Bataev ccce7afee8 [SLP] Support for horizontal min/max reduction.
Summary:
SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions.
Patch fixes PR26956.

Reviewers: spatel, mkuper, hfinkel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27846

llvm-svn: 314101
2017-09-25 13:34:59 +00:00
Craig Topper ea927baee2 [InstCombine] Teach foldICmpUsingKnownBits to simplify SLE/SGE/ULE/UGE to equality comparisons when the min/max ranges intersect in a single value.
This is the inverse of what we do for SGT/SLT/UGT/ULT.

llvm-svn: 314032
2017-09-22 21:47:22 +00:00
Craig Topper 3f364aa908 [InstCombine] Add constant splat handling to one of the ICMP_SLT/SGT cases in foldICmpUsingKnownBits.
llvm-svn: 314025
2017-09-22 19:54:15 +00:00
Craig Topper 3edda87c42 [InstCombine] Move the call to isSignBitCheck into getDemandedBitsLHSMask instead of calling it outside and passing its result through a flag. NFCI
The result of the isSignBitCheck isn't used anywhere else and this allows us to share the m_APInt call in the likely case that it isn't a sign bit check.

llvm-svn: 314018
2017-09-22 18:57:23 +00:00
Craig Topper 5b35b68785 [InstCombine] Simplify check for RHS being a splat constant in foldICmpUsingKnownBits by just checking Op1Min==Op1Max rather than going through m_APInt.
llvm-svn: 314017
2017-09-22 18:57:22 +00:00
Craig Topper 2c9b7d7894 [InstCombine] Make cases for ICMP_UGT/ICMP_ULT use similar formatting since they use similar code. NFC
llvm-svn: 314016
2017-09-22 18:57:20 +00:00
Artur Pilipenko 889dc1e3a5 Rework loop predication pass
We've found a serious issue with the current implementation of loop predication.
The current implementation relies on SCEV and this turned out to be problematic.
To fix the problem we had to rework the pass substantially. We have had the
reworked implementation in our downstream tree for a while. This is the initial
patch of the series of changes to upstream the new implementation.

For now the transformation is limited to the following case:
  * The loop has a single latch with either ult or slt icmp condition.
  * The step of the IV used in the latch condition is 1.
  * The IV of the latch condition is the same as the post increment IV of the guard condition.
  * The guard condition is ult.

See the review or the LoopPredication.cpp header for the details about the
problem and the new implementation.

Reviewed By: sanjoy, mkazantsev

Differential Revision: https://reviews.llvm.org/D37569

llvm-svn: 313981
2017-09-22 13:13:57 +00:00
Sanjoy Das 388b012f4e Rename markAsErased to erase, as pointed out in a previous review; NFC
llvm-svn: 313951
2017-09-22 01:47:41 +00:00
Reid Kleckner 0fe506bc5e Re-land r313825: "[IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare"
The fix is to avoid invalidating our insertion point in
replaceDbgDeclare:
     Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore);
+    if (DII == InsertBefore)
+      InsertBefore = &*std::next(InsertBefore->getIterator());
     DII->eraseFromParent();

I had to write a unit tests for this instead of a lit test because the
use list order matters in order to trigger the bug.

The reduced C test case for this was:
  void useit(int*);
  static inline void inlineme() {
    int x[2];
    useit(x);
  }
  void f() {
    inlineme();
    inlineme();
  }

llvm-svn: 313905
2017-09-21 19:52:03 +00:00
Daniel Jasper 7d2f38d600 Revert r313825: "[IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare"
.. as well as the two subsequent changes r313826 and r313875.

This leads to segfaults in combination with ASAN. Will forward repro
instructions to the original author (rnk).

llvm-svn: 313876
2017-09-21 12:07:33 +00:00
Mikael Holmen 582e141007 [SROA] Really remove associated dbg.declare when removing dead alloca
Summary:
There already was code that tried to remove the dbg.declare, but that code
was placed after we had called
 I->replaceAllUsesWith(UndefValue::get(I->getType()));
on the alloca, so when we searched for the relevant dbg.declare, we
couldn't find it.

Now we do the search before we call RAUW so there is a chance to find it.

An existing testcase needed update due to this. Two dbg.declare with undef
were removed and then suddenly one of the two CHECKS failed.

Before this patch we got

  call void @llvm.dbg.declare(metadata i24* undef, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
  call void @llvm.dbg.declare(metadata %struct.prog_src_register* undef, metadata !14, metadata !DIExpression()), !dbg !15
  call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
  call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15

and with it we get

  call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
  call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15

However, the CHECKs in the testcase checked things in a silly order, so
they only passed since they found things in the first dbg.declare. Now
we changed the order of the checks and the test passes.

Reviewers: rnk

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37900

llvm-svn: 313875
2017-09-21 11:14:27 +00:00
Strahinja Petrovic 29202f6dc1 Fixed reverted commit rL312318
This patch contains fix for reverted commit
rL312318 which was causing failure due to use
of unchecked dyn_cast to CIInit.

Patch by: Nikola Prica.

llvm-svn: 313870
2017-09-21 10:04:02 +00:00
Serguei Katkov 675e304ef8 Revert "Re-enable "[IRCE] Identify loops with latch comparison against current IV value""
Revert the patch causing the functional failures.
The patch owner is notified with test cases which fail.
Test case has been provided to Maxim offline.

llvm-svn: 313857
2017-09-21 04:50:41 +00:00
Craig Topper 18887bf179 [InstCombine] Teach getDemandedBitsLHSMask to handle constant splat vectors
This replaces a ConstantInt dyn_cast with m_APInt

Differential Revision: https://reviews.llvm.org/D38100

llvm-svn: 313840
2017-09-20 23:48:58 +00:00
Matt Morehouse 4881a23ca8 [MSan] Disable sanitization for __sanitizer_dtor_callback.
Summary:
Eliminate unnecessary instrumentation at __sanitizer_dtor_callback
call sites.  Fixes https://github.com/google/sanitizers/issues/861.

Reviewers: eugenis, kcc

Reviewed By: eugenis

Subscribers: vitalybuka, llvm-commits, cfe-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D38063

llvm-svn: 313831
2017-09-20 22:53:08 +00:00
Sanjay Patel 73811a152a [SimplifyCFG] don't create a no-op subtract
I noticed this inefficiency while investigating PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

This fix will likely push another bug (we don't maintain state of 'LateSimplifyCFG') 
into hiding, but I'll try to clean that up with a follow-up patch anyway.

llvm-svn: 313829
2017-09-20 22:31:35 +00:00
Reid Kleckner 3f547e87b2 [IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare
Summary:
This implements the design discussed on llvm-dev for better tracking of
variables that live in memory through optimizations:
  http://lists.llvm.org/pipermail/llvm-dev/2017-September/117222.html

This is tracked as PR34136

llvm.dbg.addr is intended to be produced and used in almost precisely
the same way as llvm.dbg.declare is today, with the exception that it is
control-dependent. That means that dbg.addr should always have a
position in the instruction stream, and it will allow passes that
optimize memory operations on local variables to insert llvm.dbg.value
calls to reflect deleted stores. See SourceLevelDebugging.rst for more
details.

The main drawback to generating DBG_VALUE machine instrs is that they
usually cause LLVM to emit a location list for DW_AT_location. The next
step will be to teach DwarfDebug.cpp how to recognize more DBG_VALUE
ranges as not needing a location list, and possibly start setting
DW_AT_start_offset for variables whose lifetimes begin mid-scope.

Reviewers: aprantl, dblaikie, probinson

Subscribers: eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37768

llvm-svn: 313825
2017-09-20 21:52:33 +00:00
Craig Topper 562bf99ee6 [InstCombine] Handle (X & C2) < C1 --> (X & C2) == 0
We already did (X & C2) > C1 --> (X & C2) != 0, if any bit set in (X & C2) will produce a result greater than C1. But there is an equivalent inverse condition with <= C1 (which will be canonicalized to < C1+1)

Differential Revision: https://reviews.llvm.org/D38065

llvm-svn: 313819
2017-09-20 21:18:17 +00:00
Craig Topper a0c897f634 [InstCombine] Use APInt::getActiveBits() to avoid creating an APInt from a trailing zero count to do a comparison. NFCI
llvm-svn: 313792
2017-09-20 18:49:29 +00:00
Hans Wennborg 57c3341ada Revert r313771 "[SLP] Vectorize jumbled memory loads."
This broke the buildbots, e.g.
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391

> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask'
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Subscribers: mzolotukhin
>
> Reviewed By: ayal
>
> Differential Revision: https://reviews.llvm.org/D36130
>
> Review comments updated accordingly
>
> Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
>
> Added a TODO for sortLoadAccesses API
>
> Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
>
> Modified the TODO for sortLoadAccesses API
>
> Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
>
> Review comment update for using OpdNum to insert the mask in respective location
>
> Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
>
> Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
>
> Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b

llvm-svn: 313781
2017-09-20 18:00:03 +00:00
Quentin Colombet aa103b3d86 [InstCombine] Add select simplifications
In these cases, two selects have constant selectable operands for
both the true and false components and have the same conditional
expression.
We then create two arithmetic operations of the same type and feed a
final select operation using the result of the true arithmetic for the true
operand and the result of the false arithmetic for the false operand and reuse
the original conditionl expression.
The arithmetic operations are naturally folded as a consequence, leaving
only the newly formed select to replace the old arithmetic operation.

Patch by: Michael Berg <michael_c_berg@apple.com>
Differential Revision: https://reviews.llvm.org/D37019

llvm-svn: 313774
2017-09-20 17:32:16 +00:00
Mohammad Shahid 2b281de576 [SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask'
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Subscribers: mzolotukhin

Reviewed By: ayal

Differential Revision: https://reviews.llvm.org/D36130

Review comments updated accordingly

Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0

Added a TODO for sortLoadAccesses API

Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58

Modified the TODO for sortLoadAccesses API

Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565

Review comment update for using OpdNum to insert the mask in respective location

Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce

Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase

Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313771
2017-09-20 17:19:57 +00:00
Teresa Johnson f625118ec7 [ThinLTO] Fix dead stripping analysis for SamplePGO
Summary:
The fix for dead stripping analysis in the case of SamplePGO indirect
calls to local functions (r313151) introduced the possibility of an
infinite loop.

Make sure we check for the value being already live after we update it
for SamplePGO indirect call handling.

Reviewers: danielcdh

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D38086

llvm-svn: 313766
2017-09-20 17:09:47 +00:00
Alexander Kornienko 6a140234ed Revert r313736: "[SLP] Vectorize jumbled memory loads."
The revision breaks buildbots:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/6694/steps/test/logs/stdio

llvm-svn: 313758
2017-09-20 14:53:07 +00:00
Mohammad Shahid f8db9bd857 [SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh

Reviewed By: Ayal

Subscribers: mzolotukhin

Differential Revision: https://reviews.llvm.org/D36130

Commit after rebase for patch D36130

Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab
llvm-svn: 313736
2017-09-20 08:18:28 +00:00
Sanjoy Das 09613b122e Tighten the invariants around LoopBase::invalidate
Summary:
With this change:
 - Methods in LoopBase trip an assert if the receiver has been invalidated
 - LoopBase::clear frees up the memory held the LoopBase instance

This change also shuffles things around as necessary to work with this stricter invariant.

Reviewers: chandlerc

Subscribers: mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38055

llvm-svn: 313708
2017-09-20 02:31:57 +00:00
Daniel Berlin 064cb68d18 GVNSink: Make ModelledPHIs constructor linear (and avoid edge case it worries about) by avoiding getIncomingValueForBlock
llvm-svn: 313702
2017-09-20 00:07:27 +00:00
Daniel Berlin dd323297d0 Revert "[GVNSink] Remove dependency on SmallPtrSet iteration order."
This reverts commit r312156, because now the op and block arrays are not in the same order :(.

llvm-svn: 313701
2017-09-20 00:07:25 +00:00
Daniel Berlin 9632dd7376 NewGVN: Remove unused includes
llvm-svn: 313700
2017-09-20 00:07:12 +00:00
Sanjoy Das 76ab23234c [LoopInfo] Make LoopBase and Loop destructors non-public
Summary:
See comment for why I think this is a good idea.

This change also:

 - Removes an SCEV test case.  The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop.
 - Updates the loop pass manager test case to deal with a private ~Loop::Loop.
 - Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks.

Reviewers: chandlerc

Subscribers: mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D37996

llvm-svn: 313695
2017-09-19 23:19:00 +00:00
Adam Nemet 15fccf0009 Allow ORE.emit to take a closure to delay building the remark object
In the lambda we are now returning the remark by value so we need to preserve
its type in the insertion operator.  This requires making the insertion
operator generic.

I've also converted a few cases to use the new API.  It seems to work pretty
well.  See the LoopUnroller for a slightly more interesting case.

llvm-svn: 313691
2017-09-19 23:00:55 +00:00
Dehao Chen 62b9c33e1e Import all inlined indirect call targets for SamplePGO.
Summary: In the ThinLTO compilation, if a function is inlined in the profiling binary, we need to inline it before annotation. If the callee is not available in the primary module, a first step is needed to import that callee function. For the current implementation, if the call is an indirect call, which has been promoted to >1 targets and inlined, SamplePGO will only import one target with the largest sample count. This patch fixed the bug to import all targets instead.

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: sanjoy, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D36637

llvm-svn: 313678
2017-09-19 21:18:14 +00:00
Sanjay Patel ca14697c2b [SimplifyCFG] fix typos/formatting; NFC
llvm-svn: 313671
2017-09-19 20:58:14 +00:00
Dehao Chen b6e60c8b80 Handle profile mismatch correctly for SamplePGO.
Summary: Fix the bug when promoted call return type mismatches with the promoted function, we should not try to inline it. Otherwise it may lead to compiler crash.

Reviewers: davidxl, tejohnson, eraman

Reviewed By: tejohnson

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D38018

llvm-svn: 313658
2017-09-19 18:26:54 +00:00
Reid Kleckner 1aa4ea8104 [gcov] Emit errors when opening the notes file fails
No time to write a test case, on to the next bug. =P

Discovered while investigating PR34659

llvm-svn: 313571
2017-09-18 21:31:48 +00:00
Sanjay Patel 55de1ed8e6 [SLP] clean up for vector store case; NFCI
llvm-svn: 313541
2017-09-18 16:20:15 +00:00
Craig Topper f264fcc704 [X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles.
I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates.

llvm-svn: 313450
2017-09-16 07:36:14 +00:00
Chandler Carruth beb22b5437 [SLP] Revert r312791 and other necessary commits, except for TTI and
CostModel.

The original patch added support for horizontal min/max reductions to
the SLP vectorizer.

This patch causes LLVM to miscompile fairly simple signed min
reductions. I have attached a test progrom to http://llvm.org/PR34635
that shows the behavior change after this patch. We found this in a test
for the open source Eigen library, but also in other code.

Unfortunately, the revert is moderately challenging. It required
reverting:
r313042: [SLP] Test with multiple uses of conditional op and wrong parent.
r312853: [SLP] Fix buildbots, NFC.
r312793: [SLP] Fix the warning about paths not returning the value, NFC.
r312791: [SLP] Support for horizontal min/max reduction.

And even then, I had to completely skip reverting the changes to TTI and
CostModel because r312832 rewrote so much of this code. Plus, the cost
modeling changes aren implicated in the miscompile, so they should be
fine and will just not be used until this gets re-introduced.

llvm-svn: 313409
2017-09-15 22:23:27 +00:00
Vivek Pandya b5ab895e2a This patch fixes https://bugs.llvm.org/show_bug.cgi?id=32352
It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with
command line options.
The diagnostic handler used to be callback now this patch adds a class
DiagnosticHandler. It has virtual method to provide custom diagnostic handler
and methods to control which particular remarks are enabled. 
However LLVM-C API users can still provide callback function for diagnostic handler.

llvm-svn: 313390
2017-09-15 20:10:09 +00:00
Vivek Pandya df8598dcc4 This reverts r313381
llvm-svn: 313387
2017-09-15 19:53:54 +00:00
Vivek Pandya 00d887447b This patch fixes https://bugs.llvm.org/show_bug.cgi?id=32352
It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with
command line options.
The diagnostic handler used to be callback now this patch adds a class
DiagnosticHandler. It has virtual method to provide custom diagnostic handler
and methods to control which particular remarks are enabled. 
However LLVM-C API users can still provide callback function for diagnostic handler.

llvm-svn: 313382
2017-09-15 19:30:59 +00:00
Anna Thomas f34537dff8 [RuntimeUnroll] Add heuristic for unrolling multi-exit loop
Add a profitability heuristic to enable runtime unrolling of multi-exit
loop: There can be atmost two unique exit blocks for the loop and the
second exit block should be a deoptimizing block. Also, there can be one
other exiting block other than the latch exiting block. The reason for
the latter is so that we limit the number of branches in the unrolled
code to being at most the unroll factor.  Deoptimizing blocks are rarely
taken so these additional number of branches created due to the
unrolling are predictable, since one of their target is the deopt block.

Reviewers: apilipenko, reames, evstupac, mkuper

Subscribers: llvm-commits

Reviewed by: reames

Differential Revision: https://reviews.llvm.org/D35380

llvm-svn: 313363
2017-09-15 15:56:05 +00:00
Anna Thomas 512dde77ba [RuntimeUnrolling] Populate the VMap entry correctly when default generated through lookup
During runtime unrolling on loops with multiple exits, we update the
exit blocks with the correct phi values from both original and remainder
loop.
In this process, we lookup the VMap for the mapped incoming phi values,
but did not update the VMap if a default entry was generated in the VMap
during the lookup. This default value is generated when constants or
values outside the current loop are looked up.
This patch fixes the assertion failure when null entries are present in
the VMap because of this lookup. Added a testcase that showcases the
problem.

llvm-svn: 313358
2017-09-15 13:29:33 +00:00
Ilya Biryukov d23faa843e Revert "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops."
This reverts commit r313348.

Reason: it caused buildbot failures.
llvm-svn: 313352
2017-09-15 10:15:00 +00:00
Dinar Temirbulatov e2358b53bc [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Patch tries to improve vectorization of the following code:

void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.

Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev, davide

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 313348
2017-09-15 06:56:39 +00:00
Dinar Temirbulatov bb891b864c [SLPVectorizer] Remove duplicated functionality code in initScheduleData function, NFCI.
llvm-svn: 313341
2017-09-15 04:31:54 +00:00
Alina Sbirlea 7ed5856a32 Refactor collectChildrenInLoop to LoopUtils [NFC]
Summary: Move to LoopUtils method that collects all children of a node inside a loop.

Reviewers: majnemer, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37870

llvm-svn: 313322
2017-09-15 00:04:16 +00:00
Dehao Chen 3a81f84d9a Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader.
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.

Reviewers: eraman, davidxl

Reviewed By: eraman

Subscribers: vitalybuka, sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D37779

llvm-svn: 313277
2017-09-14 17:29:56 +00:00
Alon Kom 682cfc1d4c [LV] Fix maximum legal VF calculation
This patch fixes pr34283, which exposed that the computation of
maximum legal width for vectorization was wrong, because it relied
on MaxInterleaveFactor to obtain the maximum stride used in the loop,
however not all strided accesses in the loop have an interleave-group
associated with them.
Instead of recording the maximum stride in the loop, which can be over
conservative (e.g. if the access with the maximum stride is not involved
in the dependence limitation), this patch tracks the actual maximum legal
width imposed by accesses that are involved in dependencies.

Differential Revision: https://reviews.llvm.org/D37507

llvm-svn: 313237
2017-09-14 07:40:02 +00:00
Vitaly Buka 48624d327a Revert "Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader."
Patch introduced uninitialized value.

This reverts commit r313195.

llvm-svn: 313230
2017-09-14 05:40:33 +00:00
Peter Collingbourne cfbd089237 Reland r313157, "ThinLTO: Correctly follow aliasee references when dead stripping." which was reverted in r313222.
This reland includes a fix for the LowerTypeTests pass so that it
looks past aliases when determining which type identifiers are live.

Differential Revision: https://reviews.llvm.org/D37842

llvm-svn: 313229
2017-09-14 05:02:59 +00:00
Dinar Temirbulatov df0b843875 [SLPVectorizer] Prefer auto over explicit type for VL0, NFCI.
llvm-svn: 313228
2017-09-14 04:28:35 +00:00
Hans Wennborg ae050afeb9 Revert r313157 "ThinLTO: Correctly follow aliasee references when dead stripping."
This broke Chromium's CFI build; see crbug.com/765004.

> We were previously handling aliases during dead stripping by adding
> the aliased global's "original name" GUID to the worklist. This will
> lead to incorrect behaviour if the global has local linkage because
> the original name GUID will not correspond to the global's GUID in
> the summary.
>
> Because an alias is just another name for the global that it
> references, there is no need to mark the referenced global as used,
> or to follow references from any other copies of the global. So all
> we need to do is to follow references from the aliasee's summary
> instead of the alias.
>
> Differential Revision: https://reviews.llvm.org/D37789

llvm-svn: 313222
2017-09-14 00:40:14 +00:00
Eugene Zelenko 8002c504cd [Transforms] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 313198
2017-09-13 21:43:53 +00:00
Dehao Chen 15c86ef970 Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader.
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.

Reviewers: eraman, davidxl

Reviewed By: eraman

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D37779

llvm-svn: 313195
2017-09-13 21:22:55 +00:00
Anna Thomas 19529f75b9 [LV] Avoid computing the register usage for default VF. NFC
These are changes to reduce redundant computations when calculating a
feasible vectorization factor:
1. early return when target has no vector registers
2. don't compute register usage for the default VF.

Suggested during review for D37702.

llvm-svn: 313176
2017-09-13 19:35:45 +00:00
Hiroshi Yamauchi a43913cfaf Add options to dump PGO counts in text.
Summary:
Added text options to -pgo-view-counts and -pgo-view-raw-counts that dump block frequency and branch probability info in text.

This is useful when the graph is very large and complex (the dot command crashes, lines/edges too close to tell apart, hard to navigate without textual search) or simply when text is preferred.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37776

llvm-svn: 313159
2017-09-13 17:20:38 +00:00
Peter Collingbourne d067c8ed59 ThinLTO: Correctly follow aliasee references when dead stripping.
We were previously handling aliases during dead stripping by adding
the aliased global's "original name" GUID to the worklist. This will
lead to incorrect behaviour if the global has local linkage because
the original name GUID will not correspond to the global's GUID in
the summary.

Because an alias is just another name for the global that it
references, there is no need to mark the referenced global as used,
or to follow references from any other copies of the global. So all
we need to do is to follow references from the aliasee's summary
instead of the alias.

Differential Revision: https://reviews.llvm.org/D37789

llvm-svn: 313157
2017-09-13 17:09:20 +00:00
Teresa Johnson 1958083d35 [ThinLTO] For SamplePGO, need to handle ICP targets consistently in thin link
Summary:
SamplePGO indirect call profiles record the target as the original GUID
for statics. The importer had special handling to map to the normal GUID
in that case. The dead global analysis needs the same treatment or
inconsistencies arise, resulting in linker unsats due to some dead
symbols being exported and kept, leaving in references to other dead
symbols that are removed.

This can happen when a SamplePGO profile collected by one binary is used
for a different binary, so the indirect call profiles may not accurately
reflect live targets.

Reviewers: danielcdh

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D37783

llvm-svn: 313151
2017-09-13 15:16:38 +00:00
Ayal Zaks e2a8c0758f [LV] Fix PR34523 - avoid generating redundant selects
When converting a PHI into a series of 'select' instructions to combine the
incoming values together according their edge masks, initialize the first
value to the incoming value In0 of the first predecessor, instead of
generating a redundant assignment 'select(Cond[0], In0, In0)'. The latter
fails when the Cond[0] mask is null, representing a full mask, which can
happen only when there's a single incoming value.

No functional changes intended nor expected other than surviving null Cond[0]'s.

This fix follows D35725, which introduced using null to represent full masks.

Differential Revision: https://reviews.llvm.org/D37619

llvm-svn: 313119
2017-09-13 06:28:37 +00:00
Aditya Kumar dfa8741c96 [GVNHoist] Factor out reachability to search for anticipable instructions quickly
Factor out the reachability such that multiple queries to find reachability of values are fast. This is based on finding
the ANTIC points
in the CFG which do not change during hoisting. The ANTIC points are basically the dominance-frontiers in the inverse
graph. So we introduce a data structure (CHI nodes)
to keep track of values flowing out of a basic block. We only do this for values with multiple occurrences in the
function as they are the potential hoistable candidates.

This patch allows us to hoist instructions to a basic block with >2 successors, as well as deal with infinite loops in a
trivial way.
Relevant test cases are added to show the functionality as well as regression fixes from PR32821.

Regression from previous GVNHoist:
We do not hoist fully redundant expressions because fully redundant expressions are already handled by NewGVN

Differential Revision: https://reviews.llvm.org/D35918
Reviewers: dberlin, sebpop, gberry,

llvm-svn: 313116
2017-09-13 05:28:03 +00:00
Reid Kleckner 8a1cd91016 [InstCombine] Add a flag to disable LowerDbgDeclare
Summary:
This should improve optimized debug info for address-taken variables at
the cost of inaccurate debug info in some situations.

We patched this into clang and deployed this change to Chromium
developers, and this significantly improved debuggability of optimized
code. The long-term solution to PR34136 seems more and more like it's
going to take a while, so I would like to commit this change under a
flag so that it can be used as a stop-gap measure.

This flag should really help so for C++ aggregates like std::string and
std::vector, which are typically address-taken, even after inlining, and
cannot be SROA-ed.

Reviewers: aprantl, dblaikie, probinson, dberlin

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D36596

llvm-svn: 313108
2017-09-13 01:43:25 +00:00
Dehao Chen f3ed14d323 Refactor the code to pass down ACT to SampleProfileLoader correctly.
Summary: This change passes down ACT to SampleProfileLoader for the new PM. Also remove the default value for SampleProfileLoader class as it is not used.

Reviewers: eraman, davidxl

Reviewed By: eraman

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D37773

llvm-svn: 313080
2017-09-12 21:55:55 +00:00
Alina Sbirlea 80b806bf30 Make promoteLoopAccessesToScalars independent of AliasSet [NFC]
Summary:
The current promoteLoopAccessesToScalars method receives an AliasSet, but
the information used is in fact a list of Value*, known to must alias.
Create the list ahead of time to make this method independent of the AliasSet class.

While there is no functionality change, this adds overhead for creating
a set of Value*, when promotion would normally exit earlier.
This is meant to be as a first refactoring step in order to start replacing
AliasSetTracker with MemorySSA.
And while the end goal is to redesign LICM, the first few steps will focus on
adding MemorySSA as an alternative to the AliasSetTracker using most of the
existing functionality.

Reviewers: mkuper, danielcdh, dberlin

Subscribers: sanjoy, chandlerc, gberry, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D35439

llvm-svn: 313075
2017-09-12 21:18:44 +00:00
Anna Thomas 9f1be02fa3 [LV] Clamp the VF to the trip count
Summary:
When the MaxVectorSize > ConstantTripCount, we should just clamp the
vectorization factor to be the ConstantTripCount.
This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF.

Earlier we were finding the maximum vector width, which could be greater than
the trip count itself. The Loop vectorizer does all the work for generating a
vectorizable loop, but in the end we would always choose the scalar loop (since
the VF > trip count). This allows us to choose the VF keeping in mind the trip
count if available.

This is a fix on top of rL312472.

Reviewers: Ayal, zvi, hfinkel, dneilson

Reviewed by: Ayal

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37702

llvm-svn: 313046
2017-09-12 16:32:45 +00:00
Alexey Bataev a26d3e834d [SLP] Fix for PHINode during horizontal reduction scanning, NFC.
Reduces number of loops during instructions analysis.

llvm-svn: 313035
2017-09-12 15:13:50 +00:00
Peter Collingbourne b9b6025328 LowerTypeTests: Add import/export support for targets without absolute symbol constants.
The rationale is the same as for r312967.

Differential Revision: https://reviews.llvm.org/D37408

llvm-svn: 312968
2017-09-11 22:49:10 +00:00
Peter Collingbourne b15a35e604 WholeProgramDevirt: Add import/export support for targets without absolute symbol constants.
Not all targets support the use of absolute symbols to export
constants. In particular, ARM has a wide variety of constant encodings
that cannot currently be relocated by linkers. So instead of exporting
the constants using symbols, export them directly in the summary.
The values of the constants are left as zeroes on targets that support
symbolic exports.

This may result in more cache misses when targeting those architectures
as a result of arbitrary changes in constant values, but this seems
somewhat unavoidable for now.

Differential Revision: https://reviews.llvm.org/D37407

llvm-svn: 312967
2017-09-11 22:34:42 +00:00
Uriel Korach 18972237a2 Test commit
llvm-svn: 312878
2017-09-10 08:31:22 +00:00
Nuno Lopes 404f106d71 Merge isKnownNonNull into isKnownNonZero
It now knows the tricks of both functions.
Also, fix a bug that considered allocas of non-zero address space to be always non null

Differential Revision: https://reviews.llvm.org/D37628

llvm-svn: 312869
2017-09-09 18:23:11 +00:00
Sanjay Patel 6fd4391ddd [DivRempairs] add a pass to optimize div/rem pairs (PR31028)
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented 
as an independent pass, so there's no stretching of scope and feature creep for an existing pass. 
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost 
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028

The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow 
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.

Decomposing remainder may allow removing some code from the backend (PPC and possibly others).

Differential Revision: https://reviews.llvm.org/D37121 

llvm-svn: 312862
2017-09-09 13:38:18 +00:00
Kostya Serebryany 4192b96313 [sanitizer-coverage] call appendToUsed once per module, not once per function (which is too slow)
llvm-svn: 312855
2017-09-09 05:30:13 +00:00
Alexey Bataev 628fbcae4c [SLP] Fix buildbots, NFC.
llvm-svn: 312853
2017-09-09 02:08:45 +00:00
Dinar Temirbulatov 0d31f0af43 [SLPVectorizer] Add struct InstructionsState that holds information about analysis of vector to be vectorized.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev, davide

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D37212

llvm-svn: 312802
2017-09-08 17:08:17 +00:00
Alexey Bataev bd4a361739 [SLP] Fix the warning about paths not returning the value, NFC.
llvm-svn: 312793
2017-09-08 14:32:20 +00:00
Alexey Bataev 6dd29fccb8 [SLP] Support for horizontal min/max reduction.
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.

Differential revision: https://reviews.llvm.org/D27846

llvm-svn: 312791
2017-09-08 13:49:36 +00:00
Max Kazantsev d7b0f74c64 Re-enable "[IRCE] Identify loops with latch comparison against current IV value"
Re-applying after the found bug was fixed.

Differential Revision: https://reviews.llvm.org/D36215

llvm-svn: 312783
2017-09-08 10:15:05 +00:00
Max Kazantsev 57db44838d diff --git a/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp b/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
index f72a808..9fa49fd 100644
--- a/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
+++ b/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
@@ -450,20 +450,10 @@ struct LoopStructure {
   // equivalent to:
   //
   // intN_ty inc = IndVarIncreasing ? 1 : -1;
-  // pred_ty predicate = IndVarIncreasing
-  //                         ? IsSignedPredicate ? ICMP_SLT : ICMP_ULT
-  //                         : IsSignedPredicate ? ICMP_SGT : ICMP_UGT;
+  // pred_ty predicate = IndVarIncreasing ? ICMP_SLT : ICMP_SGT;
   //
-  //
-  // for (intN_ty iv = IndVarStart; predicate(IndVarBase, LoopExitAt);
-  //      iv = IndVarNext)
+  // for (intN_ty iv = IndVarStart; predicate(iv, LoopExitAt); iv = IndVarBase)
   //   ... body ...
-  //
-  // Here IndVarBase is either current or next value of the induction variable.
-  // in the former case, IsIndVarNext = false and IndVarBase points to the
-  // Phi node of the induction variable. Otherwise, IsIndVarNext = true and
-  // IndVarBase points to IV increment instruction.
-  //
 
   Value *IndVarBase;
   Value *IndVarStart;
@@ -471,13 +461,12 @@ struct LoopStructure {
   Value *LoopExitAt;
   bool IndVarIncreasing;
   bool IsSignedPredicate;
-  bool IsIndVarNext;
 
   LoopStructure()
       : Tag(""), Header(nullptr), Latch(nullptr), LatchBr(nullptr),
         LatchExit(nullptr), LatchBrExitIdx(-1), IndVarBase(nullptr),
         IndVarStart(nullptr), IndVarStep(nullptr), LoopExitAt(nullptr),
-        IndVarIncreasing(false), IsSignedPredicate(true), IsIndVarNext(false) {}
+        IndVarIncreasing(false), IsSignedPredicate(true) {}
 
   template <typename M> LoopStructure map(M Map) const {
     LoopStructure Result;
@@ -493,7 +482,6 @@ struct LoopStructure {
     Result.LoopExitAt = Map(LoopExitAt);
     Result.IndVarIncreasing = IndVarIncreasing;
     Result.IsSignedPredicate = IsSignedPredicate;
-    Result.IsIndVarNext = IsIndVarNext;
     return Result;
   }
 
@@ -841,42 +829,21 @@ LoopStructure::parseLoopStructure(ScalarEvolution &SE,
     return false;
   };
 
-  // `ICI` can either be a comparison against IV or a comparison of IV.next.
-  // Depending on the interpretation, we calculate the start value differently.
+  // `ICI` is interpreted as taking the backedge if the *next* value of the
+  // induction variable satisfies some constraint.
 
-  // Pair {IndVarBase; IsIndVarNext} semantically designates whether the latch
-  // comparisons happens against the IV before or after its value is
-  // incremented. Two valid combinations for them are:
-  //
-  // 1) { phi [ iv.start, preheader ], [ iv.next, latch ]; false },
-  // 2) { iv.next; true }.
-  //
-  // The latch comparison happens against IndVarBase which can be either current
-  // or next value of the induction variable.
   const SCEVAddRecExpr *IndVarBase = cast<SCEVAddRecExpr>(LeftSCEV);
   bool IsIncreasing = false;
   bool IsSignedPredicate = true;
-  bool IsIndVarNext = false;
   ConstantInt *StepCI;
   if (!IsInductionVar(IndVarBase, IsIncreasing, StepCI)) {
     FailureReason = "LHS in icmp not induction variable";
     return None;
   }
 
-  const SCEV *IndVarStart = nullptr;
-  // TODO: Currently we only handle comparison against IV, but we can extend
-  // this analysis to be able to deal with comparison against sext(iv) and such.
-  if (isa<PHINode>(LeftValue) &&
-      cast<PHINode>(LeftValue)->getParent() == Header)
-    // The comparison is made against current IV value.
-    IndVarStart = IndVarBase->getStart();
-  else {
-    // Assume that the comparison is made against next IV value.
-    const SCEV *StartNext = IndVarBase->getStart();
-    const SCEV *Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE));
-    IndVarStart = SE.getAddExpr(StartNext, Addend);
-    IsIndVarNext = true;
-  }
+  const SCEV *StartNext = IndVarBase->getStart();
+  const SCEV *Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE));
+  const SCEV *IndVarStart = SE.getAddExpr(StartNext, Addend);
   const SCEV *Step = SE.getSCEV(StepCI);
 
   ConstantInt *One = ConstantInt::get(IndVarTy, 1);
@@ -1060,7 +1027,6 @@ LoopStructure::parseLoopStructure(ScalarEvolution &SE,
   Result.IndVarIncreasing = IsIncreasing;
   Result.LoopExitAt = RightValue;
   Result.IsSignedPredicate = IsSignedPredicate;
-  Result.IsIndVarNext = IsIndVarNext;
 
   FailureReason = nullptr;
 
@@ -1350,9 +1316,8 @@ LoopConstrainer::RewrittenRangeInfo LoopConstrainer::changeIterationSpaceEnd(
                                       BranchToContinuation);
 
     NewPHI->addIncoming(PN->getIncomingValueForBlock(Preheader), Preheader);
-    auto *FixupValue =
-        LS.IsIndVarNext ? PN->getIncomingValueForBlock(LS.Latch) : PN;
-    NewPHI->addIncoming(FixupValue, RRI.ExitSelector);
+    NewPHI->addIncoming(PN->getIncomingValueForBlock(LS.Latch),
+                        RRI.ExitSelector);
     RRI.PHIValuesAtPseudoExit.push_back(NewPHI);
   }
 
@@ -1735,10 +1700,7 @@ bool InductiveRangeCheckElimination::runOnLoop(Loop *L, LPPassManager &LPM) {
   }
   LoopStructure LS = MaybeLoopStructure.getValue();
   const SCEVAddRecExpr *IndVar =
-      cast<SCEVAddRecExpr>(SE.getSCEV(LS.IndVarBase));
-  if (LS.IsIndVarNext)
-    IndVar = cast<SCEVAddRecExpr>(SE.getMinusSCEV(IndVar,
-                                                  SE.getSCEV(LS.IndVarStep)));
+      cast<SCEVAddRecExpr>(SE.getMinusSCEV(SE.getSCEV(LS.IndVarBase), SE.getSCEV(LS.IndVarStep)));
 
   Optional<InductiveRangeCheck::Range> SafeIterRange;
   Instruction *ExprInsertPt = Preheader->getTerminator();
diff --git a/test/Transforms/IRCE/latch-comparison-against-current-value.ll b/test/Transforms/IRCE/latch-comparison-against-current-value.ll
deleted file mode 100644
index afea0e6..0000000
--- a/test/Transforms/IRCE/latch-comparison-against-current-value.ll
+++ /dev/null
@@ -1,182 +0,0 @@
-; RUN: opt -verify-loop-info -irce-print-changed-loops -irce -S < %s 2>&1 | FileCheck %s
-
-; Check that IRCE is able to deal with loops where the latch comparison is
-; done against current value of the IV, not the IV.next.
-
-; CHECK: irce: in function test_01: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-; CHECK: irce: in function test_02: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-; CHECK-NOT: irce: in function test_03: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-; CHECK-NOT: irce: in function test_04: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-
-; SLT condition for increasing loop from 0 to 100.
-define void @test_01(i32* %arr, i32* %a_len_ptr) #0 {
-
-; CHECK:      test_01
-; CHECK:        entry:
-; CHECK-NEXT:     %exit.mainloop.at = load i32, i32* %a_len_ptr, !range !0
-; CHECK-NEXT:     [[COND2:%[^ ]+]] = icmp slt i32 0, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND2]], label %loop.preheader, label %main.pseudo.exit
-; CHECK:        loop:
-; CHECK-NEXT:     %idx = phi i32 [ %idx.next, %in.bounds ], [ 0, %loop.preheader ]
-; CHECK-NEXT:     %idx.next = add nuw nsw i32 %idx, 1
-; CHECK-NEXT:     %abc = icmp slt i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 true, label %in.bounds, label %out.of.bounds.loopexit1
-; CHECK:        in.bounds:
-; CHECK-NEXT:     %addr = getelementptr i32, i32* %arr, i32 %idx
-; CHECK-NEXT:     store i32 0, i32* %addr
-; CHECK-NEXT:     %next = icmp slt i32 %idx, 100
-; CHECK-NEXT:     [[COND3:%[^ ]+]] = icmp slt i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND3]], label %loop, label %main.exit.selector
-; CHECK:        main.exit.selector:
-; CHECK-NEXT:     %idx.lcssa = phi i32 [ %idx, %in.bounds ]
-; CHECK-NEXT:     [[COND4:%[^ ]+]] = icmp slt i32 %idx.lcssa, 100
-; CHECK-NEXT:     br i1 [[COND4]], label %main.pseudo.exit, label %exit
-; CHECK-NOT: loop.preloop:
-; CHECK:        loop.postloop:
-; CHECK-NEXT:    %idx.postloop = phi i32 [ %idx.copy, %postloop ], [ %idx.next.postloop, %in.bounds.postloop ]
-; CHECK-NEXT:     %idx.next.postloop = add nuw nsw i32 %idx.postloop, 1
-; CHECK-NEXT:     %abc.postloop = icmp slt i32 %idx.postloop, %exit.mainloop.at
-; CHECK-NEXT:     br i1 %abc.postloop, label %in.bounds.postloop, label %out.of.bounds.loopexit
-
-entry:
-  %len = load i32, i32* %a_len_ptr, !range !0
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %abc = icmp slt i32 %idx, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp slt i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-; ULT condition for increasing loop from 0 to 100.
-define void @test_02(i32* %arr, i32* %a_len_ptr) #0 {
-
-; CHECK:      test_02
-; CHECK:        entry:
-; CHECK-NEXT:     %exit.mainloop.at = load i32, i32* %a_len_ptr, !range !0
-; CHECK-NEXT:     [[COND2:%[^ ]+]] = icmp ult i32 0, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND2]], label %loop.preheader, label %main.pseudo.exit
-; CHECK:        loop:
-; CHECK-NEXT:     %idx = phi i32 [ %idx.next, %in.bounds ], [ 0, %loop.preheader ]
-; CHECK-NEXT:     %idx.next = add nuw nsw i32 %idx, 1
-; CHECK-NEXT:     %abc = icmp ult i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 true, label %in.bounds, label %out.of.bounds.loopexit1
-; CHECK:        in.bounds:
-; CHECK-NEXT:     %addr = getelementptr i32, i32* %arr, i32 %idx
-; CHECK-NEXT:     store i32 0, i32* %addr
-; CHECK-NEXT:     %next = icmp ult i32 %idx, 100
-; CHECK-NEXT:     [[COND3:%[^ ]+]] = icmp ult i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND3]], label %loop, label %main.exit.selector
-; CHECK:        main.exit.selector:
-; CHECK-NEXT:     %idx.lcssa = phi i32 [ %idx, %in.bounds ]
-; CHECK-NEXT:     [[COND4:%[^ ]+]] = icmp ult i32 %idx.lcssa, 100
-; CHECK-NEXT:     br i1 [[COND4]], label %main.pseudo.exit, label %exit
-; CHECK-NOT: loop.preloop:
-; CHECK:        loop.postloop:
-; CHECK-NEXT:    %idx.postloop = phi i32 [ %idx.copy, %postloop ], [ %idx.next.postloop, %in.bounds.postloop ]
-; CHECK-NEXT:     %idx.next.postloop = add nuw nsw i32 %idx.postloop, 1
-; CHECK-NEXT:     %abc.postloop = icmp ult i32 %idx.postloop, %exit.mainloop.at
-; CHECK-NEXT:     br i1 %abc.postloop, label %in.bounds.postloop, label %out.of.bounds.loopexit
-
-entry:
-  %len = load i32, i32* %a_len_ptr, !range !0
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %abc = icmp ult i32 %idx, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp ult i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-; Same as test_01, but comparison happens against IV extended to a wider type.
-; This test ensures that IRCE rejects it and does not falsely assume that it was
-; a comparison against iv.next.
-; TODO: We can actually extend the recognition to cover this case.
-define void @test_03(i32* %arr, i64* %a_len_ptr) #0 {
-
-; CHECK:      test_03
-
-entry:
-  %len = load i64, i64* %a_len_ptr, !range !1
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %idx.ext = sext i32 %idx to i64
-  %abc = icmp slt i64 %idx.ext, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp slt i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-; Same as test_02, but comparison happens against IV extended to a wider type.
-; This test ensures that IRCE rejects it and does not falsely assume that it was
-; a comparison against iv.next.
-; TODO: We can actually extend the recognition to cover this case.
-define void @test_04(i32* %arr, i64* %a_len_ptr) #0 {
-
-; CHECK:      test_04
-
-entry:
-  %len = load i64, i64* %a_len_ptr, !range !1
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %idx.ext = sext i32 %idx to i64
-  %abc = icmp ult i64 %idx.ext, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp ult i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-!0 = !{i32 0, i32 50}
-!1 = !{i64 0, i64 50}

llvm-svn: 312775
2017-09-08 04:26:41 +00:00
Peter Collingbourne 88a58cf9e7 WholeProgramDevirt: When promoting for single-impl devirt, also rename the comdat.
This is required when targeting COFF, as the comdat name must match
one of the names of the symbols in the comdat.

Differential Revision: https://reviews.llvm.org/D37550

llvm-svn: 312767
2017-09-08 00:10:53 +00:00
Reid Kleckner 0e8c4bb055 Sink some IntrinsicInst.h and Intrinsics.h out of llvm/include
Many of these uses can get by with forward declarations. Hopefully this
speeds up compilation after adding a single intrinsic.

llvm-svn: 312759
2017-09-07 23:27:44 +00:00
Richard Trieu c7828ebea4 Revert r312318, r312325, r312424, r312489
r312318 - Debug info for variables whose type is shrinked to bool
r312325, r312424, r312489 - Test case for r312318

Revision 312318 introduced a null dereference bug.
Details in https://bugs.llvm.org/show_bug.cgi?id=34490

llvm-svn: 312758
2017-09-07 23:20:35 +00:00
Krzysztof Parzyszek 1dc313727e Disable jump threading into loop headers
Consider this type of a loop:
    for (...) {
      ...
      if (...) continue;
      ...
    }
Normally, the "continue" would branch to the loop control code that
checks whether the loop should continue iterating and which contains
the (often) unique loop latch branch. In certain cases jump threading
can "thread" the inner branch directly to the loop header, creating
a second loop latch. Loop canonicalization would then transform this
loop into a loop nest. The problem with this is that in such a loop
nest neither loop is countable even if the original loop was. This
may inhibit subsequent loop optimizations and be detrimental to
performance.

Differential Revision: https://reviews.llvm.org/D36404

llvm-svn: 312664
2017-09-06 19:36:58 +00:00
Sanjay Patel 6840c5ff75 [ValueTracking, InstCombine] canonicalize fcmp ord/uno with non-NAN ops to null constants
This is a preliminary step towards solving the remaining part of PR27145 - IR for isfinite():
https://bugs.llvm.org/show_bug.cgi?id=27145

In order to solve that one more generally, we need to add matching for and/or of fcmp ord/uno
with a constant operand.

But while looking at those patterns, I realized we were missing a canonicalization for nonzero
constants. Rather than limiting to just folds for constants, we're adding a general value
tracking method for this based on an existing DAG helper.

By transforming everything to 0.0, we can simplify the existing code in foldLogicOfFCmps()
and pick up missing vector folds.

Differential Revision: https://reviews.llvm.org/D37427

llvm-svn: 312591
2017-09-05 23:13:13 +00:00
Davide Italiano 32504cf661 [GVNHoist] Move duplicated code to a helper function. NFCI.
llvm-svn: 312575
2017-09-05 20:49:41 +00:00
Craig Topper 28d6d962d5 [InstCombine] Move foldSelectICmpAnd helper function earlier in the file to enable reuse in a future patch.
llvm-svn: 312518
2017-09-05 05:26:37 +00:00
Craig Topper 4c766a0559 [InstCombine] In foldSelectIntoOp, avoid creating a Constant before we know for sure we're going to use it and avoid an unnecessary call to m_APInt.
Instead of creating a Constant and then calling m_APInt with it (which will always return true). Just create an APInt initially, and use that for the checks in isSelect01 function. If it turns out we do need the Constant, create it from the APInt.

This is a refactor for a future patch that will do some more checks of the constant values here.

llvm-svn: 312517
2017-09-05 05:26:36 +00:00
Daniel Berlin f9c9455d3f NewGVN: Fix PR 34430 - we need to look through predicateinfo copies to detect self-cycles of phi nodes. We also need to not ignore certain types of arguments when testing whether the phi has a backedge or was originally constant.
llvm-svn: 312510
2017-09-05 02:17:43 +00:00
Daniel Berlin 54a92fcc5d NewGVN: Fix PR 34452 by passing instruction all the way down when we do aggregate value simplification
llvm-svn: 312509
2017-09-05 02:17:42 +00:00
Daniel Berlin 1a58258232 NewGVN: Detect copies through predicateinfo
llvm-svn: 312508
2017-09-05 02:17:41 +00:00
Daniel Berlin 4ad7e8d263 NewGVN: Change where check for original instruction in phi of ops leader finding is done. Where we had it before, we would stop looking when we hit the original instruction, but skip it. Now we skip it and keep looking.
llvm-svn: 312507
2017-09-05 02:17:40 +00:00
Zvi Rackover 9a087a357a LoopVectorize: MaxVF should not be larger than the loop trip count
Summary:
Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count.

Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow:
1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks.
2. Compute MaxVF returned 16 on a target supporting AVX512.
3. OptForSize -> choose VF:=MaxVF.
4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop.

With this patch step 2. will choose MaxVF=8 based on TripCount.

Reviewers: Ayal, dorit, mkuper, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D37425

llvm-svn: 312472
2017-09-04 08:35:13 +00:00
Sam Parker 7cd826a321 [LoopUnroll][DebugInfo] Don't add metadata to unrolled remainder loop
Debug information can be, and was, corrupted when the runtime
remainder loop was fully unrolled. This is because a !null node can
be created instead of a unique one describing the loop. In this case,
the original node gets incorrectly updated with the NewLoopID
metadata.

In the case when the remainder loop is going to be quickly fully
unrolled, there isn't the need to add loop metadata for it anyway.

Differential Revision: https://reviews.llvm.org/D37338

llvm-svn: 312471
2017-09-04 08:12:16 +00:00
Sanjay Patel bc6da4e40f [InstCombine] replace unnecessary fcmp fold with assert
See https://reviews.llvm.org/rL312411 for related InstSimplify tests.

llvm-svn: 312421
2017-09-02 18:10:29 +00:00
Sanjay Patel 64fc5daf42 [InstCombine] combine foldAndOfFCmps and foldOrOfFcmps; NFCI
In addition to removing chunks of duplicated code, we don't
want these to diverge. If there's a fold for one, there
should be a fold of the other via DeMorgan's Laws.

llvm-svn: 312420
2017-09-02 17:53:33 +00:00
Sanjay Patel 275bb5a14e [InstCombine] fix misnamed locals and use them to reduce code; NFCI
We had these locals:
Value *Op0RHS = LHS->getOperand(1);
Value *Op1LHS = RHS->getOperand(0);
...so we confusingly transposed the meaning of left/right and op0/op1.

llvm-svn: 312418
2017-09-02 17:17:17 +00:00
Benjamin Kramer 14ddcdfb18 [LoopVectorize] Turn static DenseSet into switch.
LLVM transforms this into a bit test which is a lot faster and smaller.

llvm-svn: 312417
2017-09-02 16:41:55 +00:00
Sanjay Patel da6f9b2fee [InstCombine] remove unnecessary code; NFC
llvm-svn: 312416
2017-09-02 16:32:37 +00:00
Sanjay Patel 4c52f765a5 [InstCombine] move related functions next to each other; NFC
This makes it easier to see that they're almost duplicates.
As with the similar icmp functions, there should be identical 
folds for both logic ops because those are DeMorganized variants.

llvm-svn: 312415
2017-09-02 16:30:27 +00:00
Sanjay Patel 6b139464ca [InstCombine] use local variable to reduce code duplication; NFCI
llvm-svn: 312414
2017-09-02 15:11:55 +00:00
Daniel Berlin 94090dd13b Fix PR/33305. caused by trying to simplify expressions in phi of ops that should have no leaders.
Summary:
After a discussion with Rekka, i believe this (or a small variant)
should fix the remaining phi-of-ops problems.

Rekka's algorithm for completeness relies on looking up expressions
that should have no leader, and expecting it to fail (IE looking up
expressions that can't exist in a predecessor, and expecting it to
find nothing).

Unfortunately, sometimes these expressions can be simplified to
constants, but we need the lookup to fail anyway.  Additionally, our
simplifier outsmarts this by taking these "not quite right"
expressions, and simplifying them into other expressions or walking
through phis, etc.  In the past, we've sometimes been able to find
leaders for these expressions, incorrectly.

This change causes us to not to try to phi of ops such expressions.
We determine safety by seeing if they depend on a phi node in our
block.

This is not perfect, we can do a bit better, but this should be a
"correctness start" that we can then improve.  It also requires a
bunch of caching that i'll eventually like to eliminate.

The right solution, longer term, to the simplifier issues, is to make
the query interface for the instruction simplifier/constant folder
have the flags we need, so that we can keep most things going, but
turn off the possibly-invalid parts (threading through phis, etc).
This is an issue in another wrong code bug as well.

Reviewers: davide, mcrosier

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D37175

llvm-svn: 312401
2017-09-02 02:18:44 +00:00
Eugene Zelenko 75075efe5e [Analysis, Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 312383
2017-09-01 21:37:29 +00:00
Craig Topper 924f20262b [InstCombine][InstSimplify] Teach decomposeBitTestICmp to look through truncate instructions
This patch teaches decomposeBitTestICmp to look through truncate instructions on the input to the compare. If a truncate is found it will now return the pre-truncated Value and appropriately extend the APInt mask.

This allows some code to be removed from InstSimplify that was doing this functionality.

This allows InstCombine's bit test combining code to match a pre-truncate Value with the same Value appear with an 'and' on another icmp. Or it allows us to combine a truncate to i16 and a truncate to i8. This also required removing the type check from the beginning of getMaskedTypeForICmpPair, but I believe that's ok because we still have to find two values from the input to each icmp that are equal before we'll do any transformation. So the type check was really just serving as an early out.

There was one user of decomposeBitTestICmp that didn't want to look through truncates, so I've added a flag to prevent that behavior when necessary.

Differential Revision: https://reviews.llvm.org/D37158

llvm-svn: 312382
2017-09-01 21:27:34 +00:00
Craig Topper d3b465606a [InstCombine] Don't require the compare types to be the same in getMaskedTypeForICmpPair.
A future patch will make the code look through truncates feeding the compare. So the compares might be different types but the pretruncated types might be the same.

This should be safe because we still require the same Value* to be used truncated or not in both compares. So that serves to ensure the types are the same.

llvm-svn: 312381
2017-09-01 21:27:31 +00:00
Craig Topper 085c1f4dea [InstCombine] When converting decomposeBitTestICmp's APInt return to ConstantInt, make sure we use the type from the Value* that was also returned from decomposeBitTestICmp.
Previously we used the type from the LHS of the compare, but a future patch will change decomposeBitTestICmp to look through truncates so it will return a pretruncated Value* and the type needs to match that.

llvm-svn: 312380
2017-09-01 21:27:29 +00:00
Daniel Berlin 86932104db NewGVN: Make sure we don't incorrectly use PredicateInfo when doing PHI of ops
Summary: When we backtranslate expressions, we can't use the predicateinfo, since we are evaluating them in a different context.

Reviewers: davide, mcrosier

Subscribers: sanjoy, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D37174

llvm-svn: 312352
2017-09-01 19:20:18 +00:00
Manoj Gupta 6b54c7e11b [LoopVectorizer] Use two step casting for float to pointer types.
Summary:
LoopVectorizer is creating casts between vec<ptr> and vec<float> types
on ARM when compiling OpenCV. Since, tIs is illegal to directly cast a
floating point type to a pointer type even if the types have same size
causing a crash. Fix the crash using a two-step casting by bitcasting
to integer and integer to pointer/float.
Fixes PR33804.

Reviewers: mkuper, Ayal, dlj, rengolin, srhines

Reviewed By: rengolin

Subscribers: aemerson, kristof.beyls, mkazantsev, Meinersbur, rengolin, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D35498

llvm-svn: 312331
2017-09-01 15:36:00 +00:00
Clement Courbet bc0c4459c9 [MergeICmps] Fix build of rL312315 on clang-with-thin-lto-windows:
MergeICmps.cpp(68,15): error: chosen constructor is explicit in copy-initialization
      return {};
APInt.h(339,12): note: explicit constructor declared here
  explicit APInt() : BitWidth(1) { U.VAL = 0; }
             ^
MergeICmps.cpp(56,9): note: in implicit initialization of field 'Offset' with omitted
initializer
  APInt Offset;
          ^

llvm-svn: 312326
2017-09-01 11:51:23 +00:00
Clement Courbet 65130e2d8d Reland rL312315: [MergeICmps] MergeICmps is a new optimization pass that turns chains of integer
Add missing header.

This reverts commit 86dd6335cf7607af22f383a9a8e072ba929848cf.

llvm-svn: 312322
2017-09-01 10:56:34 +00:00
Strahinja Petrovic 676fd0b022 Debug info for variables whose type is shrinked to bool
This patch provides such debug information for integer
variables whose type is shrinked to bool by providing 
dwarf expression which returns either constant initial 
value or other value.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D35994

llvm-svn: 312318
2017-09-01 10:05:27 +00:00
Clement Courbet 316212575b Revert "[MergeICmps] MergeICmps is a new optimization pass that turns chains of integer"
Break build

This reverts commit d07ab866f7f88f81e49046d691a80dcd32d7198b.

llvm-svn: 312317
2017-09-01 09:43:08 +00:00
Clement Courbet 9473c01e96 [MergeICmps] MergeICmps is a new optimization pass that turns chains of integer
comparisons into memcmp.

Thanks to recent improvements in the LLVM codegen, the memcmp is typically
inlined as a chain of efficient hardware comparisons.
This typically benefits C++ member or nonmember operator==().

For now this is disabled by default until:
 - https://bugs.llvm.org/show_bug.cgi?id=33329 is complete
 - Benchmarks show that this is always useful.

Differential Revision:
https://reviews.llvm.org/D33987

llvm-svn: 312315
2017-09-01 09:07:05 +00:00
Eugene Zelenko fa6434bebb [Analysis] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes. Also affected in files (NFC).
llvm-svn: 312289
2017-08-31 21:56:16 +00:00
Akira Hatanaka 13d2beb14d [ObjCARC] Pass the correct BasicBlock to fix assertion failure.
The BasicBlock passed to FindPredecessorRetainWithSafePath should be the
parent block of Autorelease. This fixes a crash that occurs in
FindDependencies when StartInst is not in StartBB.

rdar://problem/33866381

llvm-svn: 312266
2017-08-31 18:27:47 +00:00
Sanjay Patel e6b48a1b02 [InstCombine] improve demanded vector elements analysis of insertelement
Recurse instead of returning on the first found optimization. Also, return early in the caller
instead of continuing because that allows another round of simplification before we might
potentially lose undef information from a shuffle mask by eliminating the shuffle.

As noted in the review, we could probably do better and be more efficient by moving all of
demanded elements into a separate pass, but this is yet another quick fix to instcombine.

Differential Revision: https://reviews.llvm.org/D37236

llvm-svn: 312248
2017-08-31 15:57:17 +00:00
Dinar Temirbulatov 8870a14e4e [SLPVectorizer] Move out Entry->NeedToGather check and assert of inner loop as invariant, NFCI.
llvm-svn: 312242
2017-08-31 14:10:07 +00:00
Max Kazantsev 0a9c1ef2eb [IRCE] Identify loops with latch comparison against current IV value
Current implementation of parseLoopStructure interprets the latch comparison as a
comarison against `iv.next`. If the actual comparison is made against the `iv` current value
then the loop may be rejected, because this misinterpretation leads to incorrect evaluation
of the latch start value.

This patch teaches the IRCE to distinguish this kind of loops and perform the optimization
for them. Now we use `IndVarBase` variable which can be either next or current value of the
induction variable (previously we used `IndVarNext` which was always the value on next iteration).

Differential Revision: https://reviews.llvm.org/D36215

llvm-svn: 312221
2017-08-31 07:04:20 +00:00
Max Kazantsev a22742be5a [IRCE][NFC] Rename IndVarNext to IndVarBase
Renaming as a preparation step to generalizing IRCE for comparison not only against
the next value of an indvar, but also against the current.

Differential Revision: https://reviews.llvm.org/D36509

llvm-svn: 312215
2017-08-31 05:58:15 +00:00
Adrian Prantl 504b82d44b Don't add a fragment expression when GlobalSRA splits up a single-member struct
Fixes PR34390.

https://bugs.llvm.org/show_bug.cgi?id=34390

llvm-svn: 312196
2017-08-31 00:06:18 +00:00
Matt Morehouse 034126e507 [SanitizeCoverage] Enable stack-depth coverage for -fsanitize=fuzzer
Summary:
- Don't sanitize __sancov_lowest_stack.
- Don't instrument leaf functions.
- Add CoverageStackDepth to Fuzzer and FuzzerNoLink.
- Only enable on Linux.

Reviewers: vitalybuka, kcc, george.karpenkov

Reviewed By: kcc

Subscribers: kubamracek, cfe-commits, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D37156

llvm-svn: 312185
2017-08-30 22:49:31 +00:00
Adrian Prantl b192b545c1 Refactor DIBuilder::createFragmentExpression into a static DIExpression member
NFC

llvm-svn: 312165
2017-08-30 20:04:17 +00:00
Daniel Berlin 23fec57e6f NewGVN: Make sure we add the correct user if we swapped the comparison operands
llvm-svn: 312162
2017-08-30 19:53:23 +00:00
Daniel Berlin 7ef26daba8 NewGVN: Allow simplification into variables
llvm-svn: 312161
2017-08-30 19:52:39 +00:00
Benjamin Kramer b99d7c9214 [GVNSink] Remove dependency on SmallPtrSet iteration order.
Found by LLVM_ENABLE_REVERSE_ITERATION.

llvm-svn: 312156
2017-08-30 18:46:37 +00:00
Sanjay Patel 6f7ac7e402 [InstCombine] remove unnecessary vector select fold; NFCI
This code is double-dead:
1. We simplify all selects with constant true/false condition in InstSimplify.
   I've minimized/moved the tests to show that works as expected.
2. All remaining vector selects with a constant condition are canonicalized to
   shufflevector, so we really can't see this pattern.

llvm-svn: 312123
2017-08-30 14:04:57 +00:00
Florian Hahn b992feee13 [InstCombine] Fold insert sequence if first ins has multiple users.
Summary:
If the first insertelement instruction has multiple users and inserts at
position 0, we can re-use this instruction when folding a chain of
insertelement instructions. As we need to generate the first
insertelement instruction anyways, this should be a strict improvement.

We could get rid of the restriction of inserting at position 0 by
creating a different shufflemask, but it is probably worth to keep the
first insertelement instruction with position 0, as this is easier to do
efficiently than at other positions I think.

Reviewers: grosser, mkuper, fpetrogalli, efriedma

Reviewed By: fpetrogalli

Subscribers: gareevroman, llvm-commits

Differential Revision: https://reviews.llvm.org/D37064

llvm-svn: 312110
2017-08-30 10:54:21 +00:00
Mandeep Singh Grang e3bbb68b0c [cfi] Fixed non-determinism in codegen due to DenseSet iteration order
llvm-svn: 312098
2017-08-30 04:47:21 +00:00
Evgeniy Stepanov 7372b67063 [cfi] Avoid branch veneers in jump tables when possible.
Summary:
When jumptable encoding does not match target code encoding (arm vs
thumb), a veneer is inserted by the linker. We can not avoid this
in all cases, because entries within one jumptable must have the same
encoding, but we can make it less common by selecting the jumptable
encoding to match the majority of its targets.

This change only covers FullLTO, and not ThinLTO.

Reviewers: pcc

Subscribers: aemerson, mehdi_amini, javed.absar, kristof.beyls, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D37171

llvm-svn: 312054
2017-08-29 22:40:19 +00:00
Evgeniy Stepanov 4731ad81c7 [cfi] Build __cfi_check as Thumb when applicable.
Summary:
Cross-DSO CFI needs all __cfi_check exports to use the same encoding
(ARM vs Thumb).

Reviewers: pcc

Subscribers: aemerson, srhines, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37243

llvm-svn: 312052
2017-08-29 22:29:15 +00:00
Matt Morehouse ba2e61b357 Revert "[SanitizeCoverage] Enable stack-depth coverage for -fsanitize=fuzzer"
This reverts r312026 due to bot breakage.

llvm-svn: 312047
2017-08-29 21:56:56 +00:00
Wei Mi ebb9327759 [LoopUnswitch] Fix a simple bug which disables loop unswitch for select statement
This is to fix PR34257. rL309059 takes an early return when FindLIVLoopCondition
fails to find a loop invariant condition. This is wrong and it will disable loop
unswitch for select. The patch fixes the bug.

Differential Revision: https://reviews.llvm.org/D36985

llvm-svn: 312045
2017-08-29 21:45:11 +00:00
Benjamin Kramer 154411e0e7 [FunctionImport] Avoid unused variable warnings in Release builds
Just skip the entire block in NDEBUG. No functionality change intended.

llvm-svn: 312031
2017-08-29 20:24:39 +00:00
Alexey Bataev 978e2e4760 [SimplifyCFG] Fix for PR34219: Preserve alignment after merging conditional stores.
Summary:
If SimplifyCFG pass is able to merge conditional stores into single one,
it loses the alignment. This may lead to incorrect codegen. Patch
sets the alignment of the new instruction if it is set in the original
one.

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36841

llvm-svn: 312030
2017-08-29 20:06:24 +00:00
Matt Morehouse 2ad8d948b2 [SanitizeCoverage] Enable stack-depth coverage for -fsanitize=fuzzer
Summary:
- Don't sanitize __sancov_lowest_stack.
- Don't instrument leaf functions.
- Add CoverageStackDepth to Fuzzer and FuzzerNoLink.
- Disable stack depth tracking on Mac.

Reviewers: vitalybuka, kcc, george.karpenkov

Reviewed By: kcc

Subscribers: kubamracek, cfe-commits, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D37156

llvm-svn: 312026
2017-08-29 19:48:12 +00:00
Craig Topper 4431bfe88c [InstCombine] Support vector splats in transformZExtICmp
This patch adds splat support to transformZExtICmp. The test cases are vector versions of tests that failed when commenting out parts of the existing scalar code.

One test didn't vectorize optimize properly due to another bug so a TODO has been added.

Differential Revision: https://reviews.llvm.org/D37253

llvm-svn: 312023
2017-08-29 18:58:13 +00:00
Teresa Johnson 2df7fc7991 [ThinLTO] Clean up stale alias import handling
Summary:
Remove some code that was no longer needed. The first FIXME is
stale since we long ago started using the index to drive importing,
rather than doing force importing based on linkage type. And
now with r309278, we no longer import any aliases.

Reviewers: dblaikie

Subscribers: inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D37266

llvm-svn: 312019
2017-08-29 18:15:34 +00:00
Dehao Chen efd007f6f4 Add null check for promoted direct call
Summary: We originally assume that in pgo-icp, the promoted direct call will never be null after strip point casts. However, stripPointerCasts is so smart that it could possibly return the value of the function call if it knows that the return value is always an argument. In this case, the returned value cannot cast to Instruction. In this patch, null check is added to ensure null pointer will not be accessed.

Reviewers: tejohnson, xur, davidxl, djasper

Reviewed By: tejohnson

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D37252

llvm-svn: 312005
2017-08-29 15:28:12 +00:00
Sanjay Patel 674d2c23ea [Instruction] add moveAfter() convenience function; NFCI
As suggested in D37121, here's a wrapper for removeFromParent() + insertAfter(),
but implemented using moveBefore() for symmetry/efficiency.

Differential Revision: https://reviews.llvm.org/D37239

llvm-svn: 312001
2017-08-29 14:07:48 +00:00
Max Kazantsev bb1d010872 [LSR] Fix Shadow IV in case of integer overflow
When LSR processes code like

  int accumulator = 0;
  for (int i = 0; i < N; i++) {
    accummulator += i;
    use((double) accummulator);
  }

It may decide to replace integer `accumulator` with a double Shadow IV to get rid
of casts.  The problem with that is that the `accumulator`'s value may overflow.
Starting from this moment, the behavior of integer and double accumulators
will differ.

This patch strenghtens up the conditions of Shadow IV mechanism applicability.
We only allow it for IVs that are proved to be `AddRec`s with `nsw`/`nuw` flag.

Differential Revision: https://reviews.llvm.org/D37209

llvm-svn: 311986
2017-08-29 07:32:20 +00:00
Craig Topper 5d6ddda92d [InstCombine] Teach foldSelectICmpAndOr to handle vector splats
This was pretty close to working already. While I was here I went ahead and passed the ICmpInst pointer from the caller instead of doing a dyn_cast that can never fail.

Differential Revision: https://reviews.llvm.org/D37237

llvm-svn: 311960
2017-08-29 00:13:49 +00:00
Justin Bogner f1a54a47b0 [sanitizer-coverage] Mark the guard and 8-bit counter arrays as used
In r311742 we marked the PCs array as used so it wouldn't be dead
stripped, but left the guard and 8-bit counters arrays alone since
these are referenced by the coverage instrumentation. This doesn't
quite work if we want the indices of the PCs array to match the other
arrays though, since elements can still end up being dead and
disappear.

Instead, we mark all three of these arrays as used so that they'll be
consistent with one another.

llvm-svn: 311959
2017-08-29 00:11:05 +00:00
Justin Bogner 873a0746f1 [sanitizer-coverage] Return the array from CreatePCArray. NFC
Be more consistent with CreateFunctionLocalArrayInSection in the API
of CreatePCArray, and assign the member variable in the caller like we
do for the guard and 8-bit counter arrays.

This also tweaks the order of method declarations to match the order
of definitions in the file.

llvm-svn: 311955
2017-08-28 23:46:11 +00:00
Justin Bogner be757de2b6 [sanitizer-coverage] Clean up trailing whitespace. NFC
llvm-svn: 311954
2017-08-28 23:38:12 +00:00
Kamil Rytarowski a9f404f813 Define NetBSD/amd64 ASAN Shadow Offset
Summary:
Catch up after compiler-rt changes and define kNetBSD_ShadowOffset64
as (1ULL << 46).
 
Sponsored by <The NetBSD Foundation>

Reviewers: kcc, joerg, filcab, vitalybuka, eugenis

Reviewed By: eugenis

Subscribers: llvm-commits, #sanitizers

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D37234

llvm-svn: 311941
2017-08-28 22:13:52 +00:00
Craig Topper 516e39cd38 [InstCombine] Teach select01 helper of foldSelectIntoOp to handle vector splats
We were handling some vectors in foldSelectIntoOp, but not if the operand of the bin op was any kind of vector constant. This patch fixes it to treat vector splats the same as scalars.

Differential Revision: https://reviews.llvm.org/D37232

llvm-svn: 311940
2017-08-28 22:00:27 +00:00
Davide Italiano 20cb7e887f [LoopUnroll] Properly update loop structure in case of successful peeling.
When peeling kicks in, it updates the loop preheader.
Later, a successful full unroll of the loop needs to update a PHI
which i-th argument comes from the loop preheader, so it'd better look
at the correct block. Fixes PR33437.

Differential Revision:  https://reviews.llvm.org/D37153

llvm-svn: 311922
2017-08-28 20:29:33 +00:00
Davide Italiano 9a09ae448d [LoopUnroll] Add a cl::opt to force peeling, for testing purposes.
Will be used to test the patch proposed in D37153.

llvm-svn: 311915
2017-08-28 19:50:55 +00:00
Taewook Oh 572f45a3c8 Create PHI node for the return value only when the return value has uses.
Summary:
Currently, a phi node is created in the normal destination to unify the return values from promoted calls and the original indirect call. This patch makes this phi node to be created only when the return value has uses.

This patch is necessary to generate valid code, as compiler crashes with the attached test case without this patch. Without this patch, an illegal phi node that has no incoming value from `entry`/`catch` is created in `cleanup` block.

I think existing implementation is good as far as there is at least one use of the original indirect call. `insertCallRetPHI` creates a new phi node in the normal destination block only when the original indirect call dominates its use and the normal destination block. Otherwise, `fixupPHINodeForNormalDest` will handle the unification of return values naturally without creating a new phi node. However, if there's no use, `insertCallRetPHI` still creates a new phi node even when the original indirect call does not dominate the normal destination block, because `getCallRetPHINode` returns false.

Reviewers: xur, davidxl, danielcdh

Reviewed By: xur

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37176

llvm-svn: 311906
2017-08-28 18:57:00 +00:00
Craig Topper 3763f0e00d [InstCombine] Call hasNoSignedWrap instead of hasNoUnsignedWrap to get the NSW flag when handling Add in SimplifyDemandedUseBits.
This is a typo from r311789.

This should fix PR34349.

llvm-svn: 311902
2017-08-28 18:44:28 +00:00
NAKAMURA Takumi a1e97a77f5 Untabify.
llvm-svn: 311875
2017-08-28 06:47:47 +00:00
Dehao Chen 191b24d3d2 revert r310985 which breaks for the following case:
struct string {
  ~string();
};
void f2();
void f1(int) { f2(); }
void run(int c) {
  string body;
  while (true) {
    if (c)
      f1(c);
    else
      f1(c);
  }
}

Will recommit once the issue is fixed.

llvm-svn: 311864
2017-08-27 22:22:39 +00:00
Ayal Zaks 1f58dda4e4 [LV] Fix PR34248 - recommit D32871 after revert r311304
Original commit r311077 of D32871 was reverted in r311304 due to failures
reported in PR34248.

This recommit fixes PR34248 by restricting the packing of predicated scalars
into vectors only when vectorizing, avoiding doing so when unrolling w/o
vectorizing. Added a test derived from the reproducer of PR34248.

llvm-svn: 311849
2017-08-27 12:55:46 +00:00
Davide Italiano 9bdccb37d5 [NewGVN] Use `auto` when the type is obvious NFCI.
llvm-svn: 311838
2017-08-26 22:31:10 +00:00
Daniel Berlin de269f4620 NewGVN: Fix PR33204 - We need to add memory users when we bypass memorydefs for loads, not just when we do it for stores.
llvm-svn: 311829
2017-08-26 07:37:11 +00:00
Davide Italiano a872519dbd [Inliner] Only compute fully inline cost when remarks are enabled.
Prior to this change (and after r311371), we computed it
unconditionally, causin gsevere compile time regressions (in some
cases, 5 to 10x).

llvm-svn: 311804
2017-08-25 22:01:42 +00:00
Matt Morehouse 6ec7595b1e Revert "[SanitizeCoverage] Enable stack-depth coverage for -fsanitize=fuzzer"
This reverts r311801 due to a bot failure.

llvm-svn: 311803
2017-08-25 22:01:21 +00:00
Matt Morehouse f42bd31323 [SanitizeCoverage] Enable stack-depth coverage for -fsanitize=fuzzer
Summary:
- Don't sanitize __sancov_lowest_stack.
- Don't instrument leaf functions.
- Add CoverageStackDepth to Fuzzer and FuzzerNoLink.

Reviewers: vitalybuka, kcc

Reviewed By: kcc

Subscribers: cfe-commits, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D37156

llvm-svn: 311801
2017-08-25 21:18:29 +00:00
Kostya Serebryany d3e4b7e24a [sanitizer-coverage] extend fsanitize-coverage=pc-table with flags for every PC
llvm-svn: 311794
2017-08-25 19:29:47 +00:00
Craig Topper 35171e5d67 [InstCombine] Don't fall back to only calling computeKnownBits if the upper bit of Add/Sub is demanded.
Just create an all 1s demanded mask and continue recursing like normal. The recursive calls should be able to handle an all 1s mask and do the right thing.

The only time we should care about knowing whether the upper bit was demanded is when we need to know if we should clear the NSW/NUW flags.

Now that we have a consistent path through the code for all cases, use KnownBits::computeForAddSub to compute the known bits at the end since we already have the LHS and RHS.

My larger goal here is to move the code that turns add into xor if only 1 bit is demanded and no bits below it are non-zero from InstCombiner::OptAndOp to here. This will allow it to be more general instead of just looking for 'add' and 'and' with constant RHS.

Differential Revision: https://reviews.llvm.org/D36486

llvm-svn: 311789
2017-08-25 18:39:40 +00:00
Florian Hahn cd78345398 [LoopInterchange] Skip zext instructions when looking for induction var.
Summary:
SimplifyIndVar may introduce zext instructions to widen arguments of the
loop exit check. They should not prevent us from splitting the loop at
the induction variable, but maybe the check should be more conservative,
e.g. making sure it only extends arguments used by a comparison?

Reviewers: karthikthecool, mcrosier, mzolotukhin

Reviewed By: mcrosier

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D34879

llvm-svn: 311783
2017-08-25 16:52:29 +00:00
Amjad Aboud 22178dd33b [InstCombine] Consider more cases where SimplifyDemandedUseBits does not convert AShr to LShr.
There are cases where AShr have better chance to be optimized than LShr, especially when the demanded bits are not known to be Zero, and also known to be similar to the sign bit.

Differential Revision: https://reviews.llvm.org/D36936

llvm-svn: 311773
2017-08-25 11:07:54 +00:00
Gor Nishanov e29e94cf87 [coroutines] Add support for symmetric control transfer (musttail on coro.resumes followed by a suspend)
Summary:
Add musttail to any resume instructions that is immediately followed by a
suspend (i.e. ret). We do this even in -O0 to support guaranteed tail call
for symmetrical coroutine control transfer (C++ Coroutines TS extension).
This transformation is done only in the resume part of the coroutine that has
identical signature and calling convention as the coro.resume call.

Reviewers: GorNishanov

Reviewed By: GorNishanov

Subscribers: EricWF, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D37125

llvm-svn: 311751
2017-08-25 02:25:10 +00:00
Justin Bogner ad96ff1228 [sanitizer-coverage] Make sure pc-tables aren't dead stripped
Add a reference to the PC array in llvm.used so that linkers that
aggressively dead strip (like ld64) don't remove it.

llvm-svn: 311742
2017-08-25 01:24:54 +00:00
Xinliang David Li 66531dd10a [Profile] backward propagate profile info in JumpThreading
Take-2 after fixing bugs in the original patch.

Differential Revsion: http://reviews.llvm.org/D36864

llvm-svn: 311727
2017-08-24 22:54:01 +00:00
Sanjay Patel bb789381fc [InstCombine] fix and enhance udiv/urem narrowing
There are 3 small independent changes here:

  1. Account for multiple uses in the pattern matching: avoid the transform if it increases the instruction count.
  2. Add a missing fold for the case where the numerator is the constant: http://rise4fun.com/Alive/E2p
  3. Enable all folds for vector types.

There's still one more potential change - use "shouldChangeType()" to keep from transforming to an illegal integer type.

Differential Revision: https://reviews.llvm.org/D36988

llvm-svn: 311726
2017-08-24 22:54:01 +00:00
Chad Rosier f98335e0b0 [PartialInlining] Formatting. NFC.
llvm-svn: 311702
2017-08-24 21:21:09 +00:00
Chad Rosier 4cb2e82774 [PartialInlining] Type. NFC.
llvm-svn: 311699
2017-08-24 20:29:02 +00:00
Sanjay Patel 5d67d8916e [BypassSlowDivision] move map helper code to header; NFC
We can reuse this code with other div/rem transforms as shown in:
https://reviews.llvm.org/D31037 
https://bugs.llvm.org/show_bug.cgi?id=31028

llvm-svn: 311661
2017-08-24 14:43:33 +00:00
Mikael Holmen 7a99e33b8e [Reassociate] Do not drop debug location if replacement is missing
Summary:
When reassociating an expression, do not drop the instruction's
original debug location in case the replacement location is
missing.

The debug location must at least not be dropped for inlinable
callsites of debug-info-bearing functions in debug-info-bearing
functions. Failing to do so would result in an "inlinable function "
"call in a function with debug info must have a !dbg location"
error in the verifier.

As preserving the original debug location is not expected
to result in overly jumpy debug line information, it is
preserved for all other cases too.

This fixes PR34231:
https://bugs.llvm.org/show_bug.cgi?id=34231

Original patch by David Stenberg

Reviewers: davide, craig.topper, mcrosier, dblaikie, aprantl

Reviewed By: davide, aprantl

Subscribers: aprantl

Differential Revision: https://reviews.llvm.org/D36865

llvm-svn: 311642
2017-08-24 09:05:00 +00:00
Daniel Berlin f948603a15 NewGVN: We weren't properly simplifying selects with equal arguments due to a thinko.
llvm-svn: 311626
2017-08-24 02:43:17 +00:00
Rong Xu 15848e5977 [PGO] Set edge weights for indirectbr instruction with profile counts
Current PGO only annotates the edge weight for branch and switch instructions
with profile counts. We should also annotate the indirectbr instruction as
all the information is there. This patch enables the annotating for indirectbr
instructions. Also uses this annotation in branch probability analysis.

Differential Revision: https://reviews.llvm.org/D37074

llvm-svn: 311604
2017-08-23 21:36:02 +00:00
Hans Wennborg 66f6fc0a49 LowerAtomic: Don't skip optnone functions; atomic still need lowering (PR34020)
The lowering isn't really an optimization, so optnone shouldn't make a
difference. ARM relies on the pass running when using "-mthread-model
single", because in that mode, it doesn't run AtomicExpand. See bug for
more details.

Differential Revision: https://reviews.llvm.org/D37040

llvm-svn: 311565
2017-08-23 15:43:28 +00:00
Gor Nishanov 2f55b958b1 [coroutines] CoroBegin from inner coroutines should be considered for spills
Summary:
If a coroutine outer calls another coroutine inner and the inner coroutine body is inlined into the outer, coro.begin from the inner coroutine should be considered for spilling if accessed across suspends.

Prior to this change, coroutine frame building code was not considering any coro.begins for spilling.
With this change, we only ignore coro.begin for the current coroutine, but, any coro.begins that were inlined into the current coroutine are eligible for spills.

Fixes PR34267

Reviewers: GorNishanov

Subscribers: qcolombet, llvm-commits, EricWF

Differential Revision: https://reviews.llvm.org/D37062

llvm-svn: 311556
2017-08-23 14:47:52 +00:00
Chad Rosier 8db41e9dbd [Reassociate] Don't canonicalize x + (-Constant * y) -> x - (Constant * y)..
..if the resulting subtract will be broken up later.  This can cause us to get
into an infinite loop.

x + (-5.0 * y)      -> x - (5.0 * y)       ; Canonicalize neg const
x - (5.0 * y)       -> x + (0 - (5.0 * y)) ; Break up subtract
x + (0 - (5.0 * y)) -> x + (-5.0 * y)      ; Replace 0-X with X*-1.

PR34078

llvm-svn: 311554
2017-08-23 14:10:06 +00:00
Davide Italiano c78885818a [InstCombine] Fold branches with irrelevant conditions to a constant.
InstCombine folds instructions with irrelevant conditions to undef.
This, as Nuno confirmed is a bug.
(see https://bugs.llvm.org/show_bug.cgi?id=33409#c1 )

Given the original motivation for the change is that of removing an
USE, we now fold to false instead (which reaches the same goal
without undesired side effects).

Fixes PR33409.

Differential Revision:  https://reviews.llvm.org/D36975

llvm-svn: 311540
2017-08-23 09:14:37 +00:00
Craig Topper a85f86225a [InstCombine] Remove unused argument. NFC
llvm-svn: 311529
2017-08-23 05:46:09 +00:00
Craig Topper a94069fb4c [InstCombine] Replace a simple matcher with a plain old dyn_cast. NFC
llvm-svn: 311528
2017-08-23 05:46:08 +00:00
Craig Topper 524c44f74e [InstCombine] Remove an unnecessary dyn_cast to Instruction and a switch over two opcodes. Just dyn_cast to the specific instruction classes individually. NFC
Change the helper methods to take the more specific class as well.

llvm-svn: 311527
2017-08-23 05:46:07 +00:00
Craig Topper ec4b82571c [InstCombine] Remove check for sext of vector icmp from shouldOptimizeCast
Looks like for 'and' and 'or' we end up performing at least some of the transformations this is bocking in a round about way anyway.

For 'and sext(cmp1), sext(cmp2) we end up later turning it into 'select cmp1, sext(cmp2), 0'. Then we optimize that back to sext (and cmp1, cmp2). This is the same result we would have gotten if shouldOptimizeCast hadn't blocked it. We do something analogous for 'or'.

With this patch we allow that transformation to happen directly in foldCastedBitwiseLogic. And we now support the same thing for 'xor'. This is definitely opening up many other cases, but since we already went around it for some cases hopefully it's ok.

Differential Revision: https://reviews.llvm.org/D36213

llvm-svn: 311508
2017-08-22 23:40:15 +00:00
Peter Collingbourne 001052a067 WholeProgramDevirt: Create bitcast to i8* at each virtual call site.
We can't reuse the llvm.assume instruction's bitcast because it may not
dominate every user of the vtable pointer.

Differential Revision: https://reviews.llvm.org/D36994

llvm-svn: 311491
2017-08-22 21:41:19 +00:00
Matt Morehouse b1fa8255db [SanitizerCoverage] Optimize stack-depth instrumentation.
Summary:
Use the initialexec TLS type and eliminate calls to the TLS
wrapper.  Fixes the sanitizer-x86_64-linux-fuzzer bot failure.

Reviewers: vitalybuka, kcc

Reviewed By: kcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37026

llvm-svn: 311490
2017-08-22 21:28:29 +00:00
Jakub Kuderski 2724d45325 [ADCE][Dominators] Reapply: Teach ADCE to preserve dominators
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.

This is reapplies the original patch r311057 that was reverted in r311381.
The previous version wasn't using the batch update api for updating dominators,
which in vary rare cases caused assertion failures.

This also fixes PR34258.

Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki

Reviewed By: davide

Subscribers: grandinj, zhendongsu, llvm-commits, david2050

Differential Revision: https://reviews.llvm.org/D35869

llvm-svn: 311467
2017-08-22 16:30:21 +00:00
Craig Topper 775ffcc8f5 [InstCombine] Move the checks for pointer types in getMaskedTypeForICmpPair earlier in the function
I don't think there's any reason to have them scattered about and on all 4 operands. We already have an early check that both compares must be the same type. And within a given compare the LHS and RHS must have the same type. Beyond that I don't think there's anyway this function returns anything valid for pointer types. So let's just return early and be done with it.

Differential Revision: https://reviews.llvm.org/D36561

llvm-svn: 311383
2017-08-21 21:00:45 +00:00
Sanjoy Das 08a38fe71e Revert "Reapply: [ADCE][Dominators] Teach ADCE to preserve dominators"
Summary: This partially reverts commit r311057 since it breaks ADCE.  See PR34258.

Reviewers: kuhar

Subscribers: mcrosier, david2050, llvm-commits

Differential Revision: https://reviews.llvm.org/D36979

llvm-svn: 311381
2017-08-21 20:39:18 +00:00
Haicheng Wu 0812c5bea3 [InlineCost] Add cl::opt to allow full inline cost to be computed for debugging purposes.
Currently, the inline cost model will bail once the inline cost exceeds the
inline threshold in order to avoid unnecessary compile-time. However, when
debugging it is useful to compute the full cost, so this command line option
is added to override the default behavior.

I took over this work from Chad Rosier (mcrosier@codeaurora.org).

Differential Revision: https://reviews.llvm.org/D35850

llvm-svn: 311371
2017-08-21 20:00:09 +00:00
Sanjay Patel 82ec872990 [LibCallSimplifier] try harder to fold memcmp with constant arguments (2nd try)
The 1st try was reverted because it could inf-loop by creating a dead instruction.
Fixed that to not happen and added a test case to verify.

Original commit message:

Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C

Without this change, we're unnecessarily checking the alignment of the constant data,
so we miss the transform in the first 2 tests in the patch.

I noted this shortcoming of LibCallSimpifier in one of the recent CGP memcmp expansion
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're
already trying to do that.

The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to
multiple memcmp calls, CSE can remove the duplicate instructions.

Differential Revision: https://reviews.llvm.org/D36922

llvm-svn: 311366
2017-08-21 19:13:14 +00:00
Craig Topper 74177e1ed1 [InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt X, 0) and (icmp sgt X, -1) as equivalent to an and with the sign bit of the truncated type
This is similar to what was already done in foldSelectICmpAndOr. Ultimately I'd like to see if we can call foldSelectICmpAnd from foldSelectIntoOp if we detect a power of 2 constant. This would allow us to remove foldSelectICmpAndOr entirely.

Differential Revision: https://reviews.llvm.org/D36498

llvm-svn: 311362
2017-08-21 19:02:06 +00:00
Sam Elliott e963c89d11 Migrate WholeProgramDevirt to new Optimization Remark API
Summary:
This is an attempt to move WholeProgramDevirt to the new remark API.

https://bugs.llvm.org/show_bug.cgi?id=33793

Reviewers: anemet

Reviewed By: anemet

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D36943

llvm-svn: 311352
2017-08-21 16:57:21 +00:00
Sam Elliott e604b563ea Emit only A Single Opt Remark When Inlining
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33786

Reviewers: anemet, davidxl, chandlerc

Reviewed By: anemet

Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D36054

llvm-svn: 311349
2017-08-21 16:45:47 +00:00
Craig Topper cc255bcd77 [InstCombine] Fix a weakness in canEvaluateZExtd around 'and' instructions
Summary:
If the bitsToClear from the LHS of an 'and' comes back non-zero, but all of those bits are known zero on the RHS, we can reset bitsToClear.

Without this, the 'or' in the modified test case blocks the transform because it has non-zero bits in its RHS in those bits.

Reviewers: spatel, majnemer, davide

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36944

llvm-svn: 311343
2017-08-21 16:04:11 +00:00
Xinliang David Li d2838fc4b9 Revert 311208, 311209
llvm-svn: 311341
2017-08-21 16:00:38 +00:00
Sanjay Patel 707f786cc5 revert r311333: [LibCallSimplifier] try harder to fold memcmp with constant arguments
We're getting lots of compile-timeout bot failures like:
http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/7119
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux

llvm-svn: 311340
2017-08-21 15:16:25 +00:00
Sanjay Patel 7756edfa93 [LibCallSimplifier] try harder to fold memcmp with constant arguments
Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C

Without this change, we're unnecessarily checking the alignment of the constant data, 
so we miss the transform in the first 2 tests in the patch.

I noted this shortcoming of LibCallSimpifier in one of the recent CGP memcmp expansion 
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're 
already trying to do that.

The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to 
multiple memcmp calls, CSE can remove the duplicate instructions.

Differential Revision: https://reviews.llvm.org/D36922

llvm-svn: 311333
2017-08-21 13:55:49 +00:00
Chandler Carruth bd6dc14230 Revert r311077: [LV] Using VPlan ...
This causes LLVM to assert fail on PPC64 and crash / infloop in other
cases. Filed http://llvm.org/PR34248 with reproducer attached.

llvm-svn: 311304
2017-08-20 23:17:11 +00:00
Benjamin Kramer 2e5be849cc [Mem2Reg] Modernize code a bit.
No functionality change intended.

llvm-svn: 311290
2017-08-20 14:34:44 +00:00
Benjamin Kramer 49a49fe816 Move helper classes into anonymous namespaces.
No functionality change intended.

llvm-svn: 311288
2017-08-20 13:03:48 +00:00
Aditya Kumar a525fffd07 [Loop Vectorize] Added a separate metadata
Added a separate metadata to indicate when the loop
has already been vectorized instead of setting width and count to 1.

Patch written by Divya Shanmughan and Aditya Kumar

Differential Revision: https://reviews.llvm.org/D36220

llvm-svn: 311281
2017-08-20 10:32:41 +00:00
Sam Elliott 7fe0aaa140 Revert "Emit only A Single Opt Remark When Inlining"
Reverting due to clang build failure

llvm-svn: 311274
2017-08-20 06:55:10 +00:00
Sam Elliott 785dd75369 Emit only A Single Opt Remark When Inlining
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33786

Reviewers: anemet, davidxl, chandlerc

Reviewed By: anemet

Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D36054

llvm-svn: 311273
2017-08-20 06:43:34 +00:00
Teresa Johnson 73305f82e9 [ThinLTO] Fix ThinLTO crash
Summary:
Follow up to fix in r311023, which fixed the case where the combined
index is written to disk. The same samplePGO logic exists for the
in-memory index when computing imports, so we need to filter out
GlobalVariable summaries there too.

Reviewers: davidxl

Subscribers: inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D36919

llvm-svn: 311254
2017-08-19 18:04:25 +00:00
Chandler Carruth 4f3aa29a46 [Inliner] Fix a nasty bug when inlining a non-recursive trace of
a function into itself.

We tried to fix this before in r306495 but that got reverted as the
assert was actually hit.

This fixes the original bug (which we seem to have lost track of with
the revert) by blocking a second remapping when the function being
inlined is also the caller and the remapping could succeed but
erroneously.

The included test case would actually load from an inlined copy of the
alloca before this change, failing to load the stored value and
miscompiling.

Many thanks to Richard Smith for diagnosing a user miscompile to this
bug, and to Kyle for the first attempt and initial analysis and David Li
for remembering the issue and how to fix it and suggesting the patch.
I'm just stitching it together and landing it. =]

llvm-svn: 311229
2017-08-19 06:56:11 +00:00
Chandler Carruth 1f8212597d [SLP] Fix an unused variable warning in non-asserts builds.
llvm-svn: 311227
2017-08-19 05:06:23 +00:00
Dinar Temirbulatov 7aff8cfa55 [SLPVectorizer] Tighten up VLeft, VRight declaration, remove unnecessary testcase test/Transforms/SLPVectorizer/X86/reorder.ll, NFCI.
llvm-svn: 311223
2017-08-19 03:15:07 +00:00
Dinar Temirbulatov e3ce1b455e [SLPVectorizer] Add opcode parameter to reorderAltShuffleOperands, reorderInputsAccordingToOpcode functions.
Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D36766

llvm-svn: 311221
2017-08-19 02:54:20 +00:00
Xinliang David Li 0d07f9d68a Fix comment /NFC
llvm-svn: 311209
2017-08-18 23:08:50 +00:00
Xinliang David Li 709ffe178e [Profile] backward propagate profile info in JumpThreading
Differential Revsion: http://reviews.llvm.org/D36864

llvm-svn: 311208
2017-08-18 23:00:05 +00:00
Max Kazantsev 0aaf8c16ac [IRCE] Fix buggy behavior in Clamp
Clamp function was too optimistic when choosing signed or unsigned min/max function for calculations.
In fact, `!IsSignedPredicate` guarantees us that `Smallest` and `Greatest` can be compared safely using unsigned
predicates, but we did not check this for `S` which can in theory be negative.

This patch makes Clamp use signed min/max for cases when it fails to prove `S` being non-negative,
and it adds a test where such situation may lead to incorrect conditions calculation.

Differential Revision: https://reviews.llvm.org/D36873

llvm-svn: 311205
2017-08-18 22:50:29 +00:00
Ana Pazos 6210f27dfc [PGO] Fixed assertion due to mismatched memcpy size type.
Summary:
Memcpy intrinsics have size argument of any integer type, like i32 or i64.
Fixed size type along with its value when cloning the intrinsic.

Reviewers: davidxl, xur

Reviewed By: davidxl

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D36844

llvm-svn: 311188
2017-08-18 19:17:08 +00:00
Matt Morehouse 5c7fc76983 [SanitizerCoverage] Add stack depth tracing instrumentation.
Summary:
Augment SanitizerCoverage to insert maximum stack depth tracing for
use by libFuzzer.  The new instrumentation is enabled by the flag
-fsanitize-coverage=stack-depth and is compatible with the existing
trace-pc-guard coverage.  The user must also declare the following
global variable in their code:
  thread_local uintptr_t __sancov_lowest_stack

https://bugs.llvm.org/show_bug.cgi?id=33857

Reviewers: vitalybuka, kcc

Reviewed By: vitalybuka

Subscribers: kubamracek, hiraditya, cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D36839

llvm-svn: 311186
2017-08-18 18:43:30 +00:00
Jakub Kuderski e608ef7635 [LoopRotate][Dominators] Use the incremental API to update DomTree
Summary: This patch teaches LoopRotate to use the new incremental API to update the DominatorTree.

Reviewers: dberlin, davide, grosser, sanjoy

Reviewed By: dberlin, davide

Subscribers: hiraditya, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D35581

llvm-svn: 311125
2017-08-17 21:48:19 +00:00
Jakub Kuderski e35a449140 [Dominators] Teach LoopUnswitch to use the incremental API
Summary:
This patch makes LoopUnswitch use new incremental API for updating dominators.
It also updates SplitCriticalEdge, as it is called in LoopUnswitch.

There doesn't seem to be any noticeable performance difference when bootstrapping clang with this patch.

Reviewers: dberlin, davide, sanjoy, grosser, chandlerc

Reviewed By: davide, grosser

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D35528

llvm-svn: 311093
2017-08-17 16:45:35 +00:00
Simon Dardis b5205c69d2 [dfsan] Add explicit zero extensions for shadow parameters in function wrappers.
In the case where dfsan provides a custom wrapper for a function,
shadow parameters are added for each parameter of the function.
These parameters are i16s. For targets which do not consider this
a legal type, the lack of sign extension information would cause
LLVM to generate anyexts around their usage with phi variables
and calling convention logic.

Address this by introducing zero exts for each shadow parameter.

Reviewers: pcc, slthakur

Differential Revision: https://reviews.llvm.org/D33349

llvm-svn: 311087
2017-08-17 14:14:25 +00:00
Ayal Zaks 6627883369 [LV] Using VPlan to model the vectorized code and drive its transformation
VPlan is an ongoing effort to refactor and extend the Loop Vectorizer. This
patch introduces the VPlan model into LV and uses it to represent the vectorized
code and drive the generation of vectorized IR.

In this patch VPlan models the vectorized loop body: the vectorized control-flow
is represented using VPlan's Hierarchical CFG, with predication refactored from
being a post-vectorization-step into a vectorization planning step modeling
if-then VPRegionBlocks, and generating code inline with non-predicated code. The
vectorized code within each VPBasicBlock is represented as a sequence of
Recipes, each responsible for modelling and generating a sequence of IR
instructions. To keep the size of this commit manageable the Recipes in this
patch are coarse-grained and capture large chunks of LV's code-generation logic.
The constructed VPlans are dumped in dot format under -debug.

This commit retains current vectorizer output, except for minor instruction
reorderings; see associated modifications to lit tests.

For further details on the VPlan model see docs/Proposals/VectorizationPlan.rst
and its references.

Authors: Gil Rapaport and Ayal Zaks

Differential Revision: https://reviews.llvm.org/D32871

llvm-svn: 311077
2017-08-17 09:29:59 +00:00
Jakub Kuderski fd5c5c9144 Reapply: [ADCE][Dominators] Teach ADCE to preserve dominators
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.

I didn't notice any performance impact when bootstrapping clang with this patch.

The patch was originally committed in r311039 and reverted in r311049.
This revision fixes the problem with not adding a dependency on the
DominatorTreeWrapperPass for the LegacyPassManager.

Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki

Reviewed By: davide

Subscribers: grandinj, zhendongsu, llvm-commits, david2050

Differential Revision: https://reviews.llvm.org/D35869

llvm-svn: 311057
2017-08-17 01:41:49 +00:00
Amjad Aboud 86111c6696 [InstCombine] Teach canEvaluateTruncated to handle arithmetic shift (including those with vector splat shift amount)
Differential Revision: https://reviews.llvm.org/D36784

llvm-svn: 311050
2017-08-16 22:42:38 +00:00
Jakub Kuderski cbcffb173c Revert "[ADCE][Dominators] Teach ADCE to preserve dominators"
This reverts commit r311039. The patch caused the
`test/Bindings/OCaml/Output/scalar_opts.ml` to fail.

llvm-svn: 311049
2017-08-16 22:10:53 +00:00
Craig Topper 882f29630b [InstCombine] Make folding (X >s -1) ? C1 : C2 --> ((X >>s 31) & (C2 - C1)) + C1 support splat vectors
This also uses decomposeBitTestICmp to decode the compare.

Differential Revision: https://reviews.llvm.org/D36781

llvm-svn: 311044
2017-08-16 21:52:07 +00:00
Jakub Kuderski 4552e9de9f [ADCE][Dominators] Teach ADCE to preserve dominators
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.

I didn't notice any performance impact when bootstrapping clang with this patch.

Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki

Reviewed By: davide

Subscribers: grandinj, zhendongsu, llvm-commits, david2050

Differential Revision: https://reviews.llvm.org/D35869

llvm-svn: 311039
2017-08-16 20:50:23 +00:00
Geoff Berry 40549ad1ac [LoopDataPrefetch][AArch64FalkorHWPFFix] Preserve ScalarEvolution
Summary:
Mark LoopDataPrefetch and AArch64FalkorHWPFFix passes as preserving
ScalarEvolution since they do not alter loop structure and should not
alter any SCEV values (though LoopDataPrefetch may introduce new
instructions that won't have cached SCEV values yet).

This can result in slight code differences, mainly w.r.t. nsw/nuw flags
on SCEVs, since these are computed somewhat lazily when a zext/sext
instruction is encountered.  As a result, passes after the modified
passes may see SCEVs with more nsw/nuw flags present.

Reviewers: sanjoy, anemet

Subscribers: aemerson, rengolin, mzolotukhin, javed.absar, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D36716

llvm-svn: 311032
2017-08-16 19:03:16 +00:00
Hal Finkel 9e54b7093a [BDCE] Don't check demanded bits on unsized types
To clear assumptions that are potentially invalid after trivialization, we need
to walk the use/def chain. Normally, the only way to reach an instruction with
an unsized type is via an instruction that has side effects (or otherwise will
demand its input bits). That would stop the walk. However, if we have a
readnone function that returns an unsized type (e.g., void), we must avoid
asking for the demanded bits of the function call's return value. A
void-returning readnone function is always dead (and so we can stop walking the
use/def chain here), but the check is necessary to avoid asserting.

Fixes PR34211.

llvm-svn: 311014
2017-08-16 16:09:22 +00:00
Dehao Chen 84d412035a Merge debug info when hoist then-else code to if.
Summary: When we move then-else code to if, we need to merge its debug info, otherwise the hoisted instruction may have inaccurate debug info attached.

Reviewers: aprantl, probinson, dblaikie, echristo, loladiro

Reviewed By: aprantl

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D36778

llvm-svn: 310985
2017-08-16 01:55:26 +00:00
Craig Topper 0a1a276d91 [InstCombine] Teach canEvaluateZExtd and canEvaluateTruncated to handle vector shifts with splat shift amount
We were only allowing ConstantInt before. This patch allows splat of ConstantInt too.

Differential Revision: https://reviews.llvm.org/D36763

llvm-svn: 310970
2017-08-15 22:48:41 +00:00
Amjad Aboud 0464c5d958 [InstCombine] Added support for (X >>s C) << C --> X & (-1 << C)
Differential Revision: https://reviews.llvm.org/D36743

llvm-svn: 310949
2017-08-15 19:33:14 +00:00
Sanjay Patel f69b7d5c93 [InstCombine] sink sext after ashr
Narrow ops are better for bit-tracking, and in the case of vectors,
may enable better codegen.

As the trunc test shows, this can allow follow-on simplifications.

There's a block of code in visitTrunc that deals with shifted ops
with FIXME comments. It may be possible to remove some of that now,
but I want to make sure there are no problems with this step first.

http://rise4fun.com/Alive/Y3a

Name: hoist_ashr_ahead_of_sext_1
  %s = sext i8 %x to i32
  %r = ashr i32 %s, 3  ; shift value is < than source bit width
  =>
  %a = ashr i8 %x, 3
  %r = sext i8 %a to i32
  
Name: hoist_ashr_ahead_of_sext_2
  %s = sext i8 %x to i32
  %r = ashr i32 %s, 8  ; shift value is >= than source bit width
  =>
  %a = ashr i8 %x, 7   ; so clamp this shift value
  %r = sext i8 %a to i32
  
Name: junc_the_trunc  
  %a = sext i16 %v to i32
  %s = ashr i32 %a, 18
  %t = trunc i32 %s to i16
  =>
  %t = ashr i16 %v, 15
llvm-svn: 310942
2017-08-15 18:25:52 +00:00
Jakub Kuderski 638c085d07 [Dominators] Include infinite loops in PostDominatorTree
Summary:
This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change.

What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG.

This patch makes the following assumptions:
- A sequence of updates should produce the same tree as a recalculating it.
- Any sequence of the same updates should lead to the same tree.
- Siblings and roots are unordered.

The last two properties are essential to efficiently perform batch updates in the future.
When it comes to the first one, we can decide later that the consistency between freshly built tree and an updated one doesn't matter match, as there are many correct ways to pick roots in infinite loops, and to relax this assumption. That should enable us to recalculate postdominators less frequently.

This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough.
That being said, my experiments showed that reverse-unreachable are very rare in the IR emitted by clang when bootstrapping  clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call:

```
# functions:  52283
# samples:  337609
# reverse unreachable BBs:  216022
# BBs:  247840796
Percent reverse-unreachable:  0.08716159869015269 %
Max(PercRevUnreachable) in a function:  87.58620689655172 %
# > 25 % samples:  471 ( 0.1395104988314885 % samples )
... in 145 ( 0.27733680163724345 % functions )
```

Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway.

I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :).

Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel

Reviewed By: dberlin

Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050

Differential Revision: https://reviews.llvm.org/D35851

llvm-svn: 310940
2017-08-15 18:14:57 +00:00
Rui Ueyama 4a17955030 Fix -Wunused-lambda-capture for Release build.
`I` and `this` are used only in assert or DEBUG, so they are unused
in Release build.

llvm-svn: 310934
2017-08-15 17:39:35 +00:00
Ayal Zaks 25e2800e20 [LV] Minor savings to Sink casts to unravel first order recurrence
Two minor savings: avoid copying the SinkAfter map and avoid moving a cast if it
is not needed.

Differential Revision: https://reviews.llvm.org/D36408

llvm-svn: 310910
2017-08-15 08:32:59 +00:00
Dinar Temirbulatov 9e43d6e7b2 [SLPVectorizer] Replace VL[0] to VL0 with assert, add propagateIRFlags extra parameter VL0,
replace E->Scalars[0] to VL0, NFCI.

llvm-svn: 310904
2017-08-15 00:31:49 +00:00
Dehao Chen 45847d3612 Add missing dependency in ICP. (NFC)
llvm-svn: 310896
2017-08-14 23:25:21 +00:00
Reid Kleckner 18728822d2 Remove checks for debug info intrinsics in use lists, NFC
These haven't done anything since debug info intrinsics stopped
appearing in Value use lists in 2014.

llvm-svn: 310892
2017-08-14 22:10:54 +00:00
Craig Topper 0aa3a19512 Recommit r310869, "[InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify"
This recommits r310869, with the moved files and no extra changes.

Original commit message:

This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.

I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.

I also had to make decomposeBitTest support vectors since InstSimplify needs that.

As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.

Differential Revision: https://reviews.llvm.org/D36593

llvm-svn: 310889
2017-08-14 21:39:51 +00:00
Andrew Kaylor 53a5fbb45f Add strictfp attribute to prevent unwanted optimizations of libm calls
Differential Revision: https://reviews.llvm.org/D34163

llvm-svn: 310885
2017-08-14 21:15:13 +00:00
Craig Topper 69fa8e0d99 Revert r310869 "[InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify"
Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything.

llvm-svn: 310873
2017-08-14 19:09:32 +00:00
Craig Topper 2f0b450666 [InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.

I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.

I also had to make decomposeBitTest support vectors since InstSimplify needs that.

As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.

Differential Revision: https://reviews.llvm.org/D36593

llvm-svn: 310869
2017-08-14 18:49:42 +00:00
Dinar Temirbulatov 7b78f5e52d [SLPVectorizer] Schedule bundle with different opcodes.
This change let us schedule a bundle with different opcodes in it, for example : [ load, add, add, add ]

Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D36518

llvm-svn: 310847
2017-08-14 15:40:16 +00:00
Sanjay Patel a1067d9bae [BDCE] reduce scope of an assert (PR34179)
The assert was added with r310779 and is usually correct,
but as the test shows, not always. The 'volatile' on the
load is needed to expose the faulty path because without
it, DemandedBits would return that the load is just dead
rather than not demanded, and so we wouldn't hit the
bogus assert.

Also, since the lambda is just a single-line now, get rid 
of it and inline the DB.isAllOnesValue() calls. 

This should fix (prevent execution of a faulty assert):
https://bugs.llvm.org/show_bug.cgi?id=34179

llvm-svn: 310842
2017-08-14 15:13:46 +00:00
Sam Parker 718c8a6a2a [LoopUnroll] Enable option to peel remainder loop
On some targets, the penalty of executing runtime unrolling checks
and then not the unrolled loop can be significantly detrimental to
performance. This results in the need to be more conservative with
the unroll count, keeping a trip count of 2 reduces the overhead as
well as increasing the chance of the unrolled body being executed. But
being conservative leaves performance gains on the table.

This patch enables the unrolling of the remainder loop introduced by
runtime unrolling. This can help reduce the overhead of misunrolled
loops because the cost of non-taken branches is much less than the
cost of the backedge that would normally be executed in the remainder
loop. This allows larger unroll factors to be used without suffering
performance loses with smaller iteration counts.

Differential Revision: https://reviews.llvm.org/D36309

llvm-svn: 310824
2017-08-14 09:25:26 +00:00
Craig Topper f720099007 [InstCombine] Simplify and inline FoldOrWithConstants/FoldXorWithConstants
Summary:
These functions were overly complicated. The body of this function was rechecking for an And operation to find the constant, but we already knew we were looking at two Ands ORed together and the pieces are in variables. We already had earlier nearby code that checked for ConstantInts. So just inline the remaining parts into the earlier code.

Next step is to use m_APInt instead of ConstantInt.

Reviewers: spatel, efriedma, davide, majnemer

Reviewed By: spatel

Subscribers: zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D36439

llvm-svn: 310806
2017-08-14 00:04:21 +00:00
Sanjay Patel fe346f9f5b [BDCE] clear poison generators after turning a value into zero (PR33695, PR34037)
nsw, nuw, and exact carry implicit assumptions about their operands, so we need
to clear those after trivializing a value. We decided there was no danger for
llvm.assume or metadata, so there's just a comment about that.

This fixes miscompiles as shown in:
https://bugs.llvm.org/show_bug.cgi?id=33695
https://bugs.llvm.org/show_bug.cgi?id=34037

Differential Revision: https://reviews.llvm.org/D36592

llvm-svn: 310779
2017-08-12 16:41:08 +00:00
Eli Friedman 51cf2604b6 [OptDiag] Updating Remarks in SampleProfile
Updating remark API to newer OptimizationDiagnosticInfo API. This
allows remarks to show up in diagnostic yaml file, and enables use
of opt-viewer tool.

Hotness information for remarks (L505 and L751) do not display hotness
information, most likely due to profile information not being
propagated yet. Unsure if this is the desired outcome.

Patch by Tarun Rajendran.

Differential Revision: https://reviews.llvm.org/D36127

llvm-svn: 310763
2017-08-11 21:12:04 +00:00
Xinliang David Li 24524f314c Fix typo /NFC
llvm-svn: 310737
2017-08-11 17:49:20 +00:00
Craig Topper 9a6110b2d3 [InstCombine] Make (X|C1)^C2 -> X^(C1^C2) iff X&~C1 == 0 work for splat vectors
This also corrects the description to match what was actually implemented. The old comment said X^(C1|C2), but it implemented X^((C1|C2)&~(C1&C2)). I believe ((C1|C2)&~(C1&C2)) is equivalent to (C1^C2).

Differential Revision: https://reviews.llvm.org/D36505

llvm-svn: 310658
2017-08-10 20:35:34 +00:00
Craig Topper 57b4d8646b [InstCombine] Fix a crash in getSelectCondition if we happen to have two inverse vectors of i1 constants.
We used to try to truncate the constant vector to vXi1, but if it's already i1 this would fail. Instead we now use IRBuilder::getZExtOrTrunc which should check the type and only create a trunc if needed. I believe this should trigger constant folding in the IRBuilder and ultimately do the same thing just with the additional type check.

llvm-svn: 310639
2017-08-10 17:48:14 +00:00
Craig Topper cd13ebca5f [InstCombine] Add a DEBUG_COUNTER to InstCombine to limit how many instructions are visited for debug
Sometimes it would be nice to stop InstCombine mid way through its combining to see the current IR. By using a debug counter we can place an upper limit on how many instructions to process.

This will also allow skipping the first X combines, but that has the potential to change later combines since earlier canonicalizations might have been skipped.

Differential Revision: https://reviews.llvm.org/D36553

llvm-svn: 310638
2017-08-10 17:48:12 +00:00
Craig Topper 9cd976d041 [DebugCounter] Move the semicolon out of the DEBUG_COUNTER macro and require it to be placed at the end of each use.
This make it consistent with STATISTIC which it will often appears near.

While there move one DEBUG_COUNTER instance out of an anonymous namespace. It's already declaring a static variable so the namespace is unnecessary.

llvm-svn: 310637
2017-08-10 17:48:11 +00:00
Alexander Potapenko 5241081532 [sanitizer-coverage] Change cmp instrumentation to distinguish const operands
This implementation of SanitizerCoverage instrumentation inserts different
callbacks depending on constantness of operands:

  1. If both operands are non-const, then a usual
     __sanitizer_cov_trace_cmp[1248] call is inserted.
  2. If exactly one operand is const, then a
     __sanitizer_cov_trace_const_cmp[1248] call is inserted. The first
     argument of the call is always the constant one.
  3. If both operands are const, then no callback is inserted.

This separation comes useful in fuzzing when tasks like "find one operand
of the comparison in input arguments and replace it with the other one"
have to be done. The new instrumentation allows us to not waste time on
searching the constant operands in the input.

Patch by Victor Chibotaru.

llvm-svn: 310600
2017-08-10 15:00:13 +00:00
Chad Rosier a5508e3119 [NewGVN] Add CL option to control the generation of phi-of-ops (disable by default).
Differential Revision: https://reviews.llvm.org/D36478539

llvm-svn: 310594
2017-08-10 14:12:57 +00:00
Sanjay Patel c50e55d0e6 [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046)
I couldn't find any smaller folds to help the cases in:
https://bugs.llvm.org/show_bug.cgi?id=34046
after:
rL310141

The truncated rotate-by-variable patterns elude all of the existing transforms because 
of multiple uses and knowledge about demanded bits and knownbits that doesn't exist 
without the whole pattern. So we need an unfortunately large pattern match. But by 
simplifying this pattern in IR, the backend is already able to generate 
rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic (although
there is a likely extraneous 'and' of the rotate amount). 

Note that rotate-by-constant doesn't have this problem - smaller folds should already 
produce the narrow IR ops.

Differential Revision: https://reviews.llvm.org/D36395

llvm-svn: 310509
2017-08-09 18:37:41 +00:00
Matt Morehouse 49e5acab33 [asan] Fix instruction emission ordering with dynamic shadow.
Summary:
Instrumentation to copy byval arguments is now correctly inserted
after the dynamic shadow base is loaded.

Reviewers: vitalybuka, eugenis

Reviewed By: vitalybuka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D36533

llvm-svn: 310503
2017-08-09 17:59:43 +00:00
Jonas Paulsson 6228aeda65 [LSR / TTI / SystemZ] Eliminate TargetTransformInfo::isFoldableMemAccess()
isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.

The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().

The isFoldableMemAccess() hook has been removed everywhere.

Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933

llvm-svn: 310463
2017-08-09 11:28:01 +00:00
Jonas Paulsson 5052771af3 [LoopStrengthReduce] Don't neglect the Fixup.Offset in isAMCompletelyFolded().
In the recursive call to isAMCompletelyFolded(), the passed offset should be
the sum of F.BaseOffset and Fixup.Offset.

Review: Quentin Colombet.
llvm-svn: 310462
2017-08-09 11:27:46 +00:00
Davide Italiano c163fac184 [GlobalOpt] Switch an explicit loop to llvm::all_of(). NFCI.
llvm-svn: 310453
2017-08-09 09:23:29 +00:00
Craig Topper 5706c01c0b [InstCombine] Use regular dyn_cast instead of a matcher for a simple case. NFC
llvm-svn: 310446
2017-08-09 06:17:48 +00:00
Wei Mi bb9106ac4b [GVN] Remove stale entries in phitranslate cache when new phi is generated for PRE
When a new phi is generated for scalarpre of an expression, the phiTranslate cache
will become stale: Before PRE, the candidate expression must not be available in a
predecessor block, and phitranslate will cache the information. After PRE, the
expression will become available in all predecessor blocks, so the related entries
in phiTranslate cache becomes stale. The patch will simply remove the stale entries
so phiTranslate can be recomputed next time.

The stale entries in phitranslate cache will not affect correctness but will cause
missing PRE opportunity for later instructions.

Differential Revision: https://reviews.llvm.org/D36124

llvm-svn: 310421
2017-08-08 21:40:14 +00:00
Dehao Chen 34cfcb29aa Make ICP uses PSI to check for hotness.
Summary: Currently, ICP checks the count against a fixed value to see if it is hot enough to be promoted. This does not work for SamplePGO because sampled count may be much smaller. This patch uses PSI to check if the count is hot enough to be promoted.

Reviewers: davidxl, tejohnson, eraman

Reviewed By: davidxl

Subscribers: sanjoy, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D36341

llvm-svn: 310416
2017-08-08 20:57:33 +00:00
Craig Topper 364359e4fc [InstCombine] Support pulling left shifts through a subtract with constant LHS
We already support pulling through an add with constant RHS. We can do the same for subtract.

Differential Revision: https://reviews.llvm.org/D36443

llvm-svn: 310407
2017-08-08 20:14:11 +00:00
Chad Rosier 4d852597f8 [NewGVN] Use a cast instead of a dyn_cast.
Differential Revision: https://reviews.llvm.org/D36478

llvm-svn: 310397
2017-08-08 18:41:49 +00:00
Anna Thomas 9b6e12f3dc [LoopVectorize] Fix assertion failure in Fcmp vectorization
Summary:
When vectorizing fcmps we can trip on incorrect cast assertion when setting the
FastMathFlags after generating the vectorized FCmp.
This can happen if the FCmp can be folded to true or false directly. The fix
here is to set the FastMathFlag using the FastMathFlagBuilder *before* creating
the FCmp Instruction. This is what's done by other optimizations such as
InstCombine.
Added a test case which trips on cast assertion without this patch.

Reviewers: Ayal, mssimpso, mkuper, gilr

Reviewed by: Ayal, mssimpso

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D36244

llvm-svn: 310389
2017-08-08 18:07:44 +00:00
Craig Topper 8e351e9018 [InstCombine] Cast to BinaryOperator earlier in foldSelectIntoOp to simplify the code.
We no longer need the explicit operand count check or the later dynamic cast.

llvm-svn: 310339
2017-08-08 06:19:24 +00:00
Chandler Carruth 7c888dca46 [PM] Fix new LoopUnroll function pass by invalidating loop analysis
results when a loop is completely removed.

This is very hard to manifest as a visible bug. You need to arrange for
there to be a subsequent allocation of a 'Loop' object which gets the
exact same address as the one which the unroll deleted, and you need the
LoopAccessAnalysis results to be significant in the way that they're
stale. And you need a million other things to align.

But when it does, you get a deeply mysterious crash due to actually
finding a stale analysis result. This fixes the issue and tests for it
by directly checking we successfully invalidate things. I have not been
able to get *any* test case to reliably trigger this. Changes to LLVM
itself caused the only test case I ever had to cease to crash.

I've looked pretty extensively at less brittle ways of fixing this and
they are actually very, very hard to do. This is a somewhat strange and
unusual case as we have a pass which is deleting an IR unit, but is not
running within that IR unit's pass framework (which is what handles this
cleanly for the normal loop unroll). And where there isn't a definitive
way to clear *all* of the stale cache entries. And where the pass *is*
updating the core analysis that provides the IR units!

For example, we don't have any of these problems with Function analyses
because it is easy to clear out function analyses when the functions
themselves may have been deleted -- we clear an entire module's worth!
But that is too heavy of a hammer down here in the LoopAnalysisManager
layer.

A better long-term solution IMO is to require that AnalysisManager's
make their keys durable to this kind of thing. Specifically, when
caching an analysis for one IR unit that is conceptually "owned" by
a higher level IR unit, the AnalysisManager should incorporate this into
its data structures so that we can reliably clear these results without
having to teach each and every pass to do so manually as we do here. But
that is a change for another day as it will be a fairly invasive change
to the AnalysisManager infrastructure. Until then, this fortunately
seems to be quite rare.

llvm-svn: 310333
2017-08-08 02:24:20 +00:00
Evgeny Stupachenko c675290680 Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720).
The root cause of reverting was fixed - PR33514.

Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 310289
2017-08-07 19:56:34 +00:00
Aaron Ballman 428f0fe910 Removing an unused variable that was missed with the refactoring in r310272; NFC.
llvm-svn: 310285
2017-08-07 19:26:17 +00:00
Craig Topper 7091a743b4 [InstCombine] Support (X | C1) & C2 --> (X & C2^(C1&C2)) | (C1&C2) for vector splats
Note the original code I deleted incorrectly listed this as (X | C1) & C2 --> (X & C2^(C1&C2)) | C1 Which is only valid if C1 is a subset of C2. This relied on SimplifyDemandedBits to remove any extra bits from C1 before we got to that code.

My new implementation avoids relying on that behavior so that it can be naively verified with alive.

Differential Revision: https://reviews.llvm.org/D36384

llvm-svn: 310272
2017-08-07 18:10:39 +00:00
Alexey Bataev 9581b42589 [SLP] General improvements of SLP vectorization process.
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:

1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the
array. This array is processed only after the vectorization of the
first-after-these instructions key node is finished. Vectorization goes
in reverse order to try to vectorize as much code as possible.

Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon

Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits

Differential Revision: https://reviews.llvm.org/D29826

llvm-svn: 310260
2017-08-07 15:25:49 +00:00
Alexey Bataev 53d523c9eb Revert "[SLP] General improvements of SLP vectorization process."
This reverts commit r310255.

llvm-svn: 310257
2017-08-07 14:51:52 +00:00
Alexey Bataev faace8f1f1 [SLP] General improvements of SLP vectorization process.
Summary:
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible.

Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon

Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits

Differential Revision: https://reviews.llvm.org/D29826

llvm-svn: 310255
2017-08-07 14:03:17 +00:00
Vitaly Buka 5d432ec929 [asan] Fix asan dynamic shadow check before copyArgsPassedByValToAllocas
llvm-svn: 310242
2017-08-07 07:35:33 +00:00
Vitaly Buka 629047de8e [asan] Disable checking of arguments passed by value for --asan-force-dynamic-shadow
Fails with "Instruction does not dominate all uses!"

llvm-svn: 310241
2017-08-07 07:12:34 +00:00
Davide Italiano b53b075bb1 [Reassociate] Use a range loop for clarity. NFCI.
While here, rename `i` to `Rank` as the latter is more
self-explanatory (and this code also uses `I` two lines below to
identify an Instruction).

llvm-svn: 310238
2017-08-07 01:57:21 +00:00
Davide Italiano a5cdc22e70 [Reassociate] Try to bail out early when canonicalizing.
This commit rearranges the checks to avoid calls to getRank()
when not needed (e.g. when RHS == LHS).

llvm-svn: 310237
2017-08-07 01:49:09 +00:00
Craig Topper 576fb91aef [InstCombine] Remove shift handling from OptAndOp.
Summary: This is all handled by SimplifyDemandedBits.

Reviewers: spatel, davide

Reviewed By: davide

Subscribers: davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D36382

llvm-svn: 310234
2017-08-06 23:30:49 +00:00
Craig Topper a1693a2ed3 [InstCombine] Support (X ^ C1) & C2 --> (X & C2) ^ (C1&C2) for vector splats.
llvm-svn: 310233
2017-08-06 23:11:49 +00:00
Craig Topper 9cbdbefd0f [InstCombine] Support '(C - X) ^ signmask -> (C + signmask - X)' and '(X + C) ^ signmask -> (X + C + signmask)' for vector splats.
llvm-svn: 310232
2017-08-06 22:17:21 +00:00
Craig Topper b5bf016015 [InstCombine] Support ~(c-X) --> X+(-c-1) and ~(X-c) --> (-c-1)-X for splat vectors.
llvm-svn: 310195
2017-08-06 06:28:41 +00:00
Craig Topper 9ffda5ab86 [InstCombine] Fold (C - X) ^ signmask -> (C + signmask - X).
llvm-svn: 310186
2017-08-05 20:00:44 +00:00
Craig Topper 65dd32afbc [InstCombine] Teach the code that pulls logical operators through constant shifts to handle vector splats too.
llvm-svn: 310185
2017-08-05 20:00:42 +00:00
Craig Topper 1bbcab9ca5 [InstCombine] Support vector splats in foldSelectICmpAnd.
Unfortunately, it looks like there's some other missed optimizations in the generated code for some of these cases. I'll try to look at some of those next.

llvm-svn: 310184
2017-08-05 20:00:41 +00:00
Dinar Temirbulatov cc2294a4eb [SLPVectorizer] Add extra parameter to setInsertPointAfterBundle to handle different opcodes, NFCI.
Differential Revision: https://reviews.llvm.org/D35769

llvm-svn: 310183
2017-08-05 18:43:52 +00:00
Sanjay Patel 94da1de1ce [InstCombine] refactor trunc(binop) transforms; NFCI
In addition to moving the shift transforms over, we may want to
detect too-wide rotate patterns here (PR34046). 

llvm-svn: 310181
2017-08-05 15:19:18 +00:00
Craig Topper fc5283092b [InstCombine] In foldSelectICmpAnd, if we need to to truncate from the 'and' type to the 'select' type, do it after shifting right instead of just bailing.
Previously we were always trying to emit the zext or truncate before any shift. This meant if the 'and' mask was larger than the size of the truncate we would skip the transformation.

Now we shift the result of the and right first leaving the bit within the range of the truncate.

This matches what we are doing in foldSelectICmpAndOr for the same problem.

llvm-svn: 310159
2017-08-05 01:45:17 +00:00
Sanjay Patel e12d734be3 [InstCombine] narrow truncated add/sub/mul with constant
Name: narrow_sub
  %sub = sub i32 C1, %x
  %r = trunc i32 %sub to i8
  =>  
  %xn = trunc i32 %x to i8
  %narrowC = trunc i32 C1 to i8
  %r = sub i8 %narrowC, %xn
 
Name: narrow_add
  %add = add i32 %x, C1
  %r = trunc i32 %add to i8
  =>  
  %xn = trunc i32 %x to i8
  %narrowC = trunc i32 C1 to i8
  %r = add i8 %xn, %narrowC
  
Name: narrow_mul
  %mul = mul i32 %x, C1
  %r = trunc i32 %mul to i8
  =>  
  %xn = trunc i32 %x to i8
  %narrowC = trunc i32 C1 to i8
  %r = mul i8 %xn, %narrowC


http://rise4fun.com/Alive/QpS

This doesn't solve PR34046 (failure to recognize rotate):
https://bugs.llvm.org/show_bug.cgi?id=34046
...but it reduces an extra complication in the description examples 
to a form that we can more easily match.

llvm-svn: 310141
2017-08-04 22:30:34 +00:00
Nico Weber 69ca5322d4 Revert r310055, it caused PR34074.
llvm-svn: 310123
2017-08-04 20:40:38 +00:00
Evgeny Stupachenko 38197c66a1 Fix PR33514
Summary:
The bug was uncovered after fix of  PR23384 (part 3 of 3).
The patch restricts pointer multiplication in SCEV computaion for ICmpZero.

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D36170

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 310092
2017-08-04 18:46:13 +00:00
Reid Kleckner da748f1c3d [ArgPromotion] Preserve alignment of byval argument in new alloca
The frontend may have requested a higher alignment for any reason, and
downstream optimizations may already have taken advantage of it.  We
should keep the same alignment when moving the allocation from the
parameter area to the local variable area.

Fixes PR34038

llvm-svn: 310071
2017-08-04 17:09:11 +00:00
Benjamin Kramer bda212a65d [InstCombine] Fold single-use variable into assert.
Avoids unused variable warnings in Release builds. No functional change.

llvm-svn: 310064
2017-08-04 16:08:41 +00:00
Craig Topper 760ff6ee87 [InstCombine] Remove the (not (sext)) case from foldBoolSextMaskToSelect and inline the remaining code to match visitOr
Summary:
The (not (sext)) case is really (xor (sext), -1) which should have been simplified to (sext (xor, 1)) before we got here. So we shouldn't need to handle it.

With that taken care of we only need to two cases so don't need the swap anymore. This makes us in sync with the equivalent code in visitOr so inline this to match.

Reviewers: spatel, eli.friedman, majnemer

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36240

llvm-svn: 310063
2017-08-04 16:07:20 +00:00
Craig Topper 3b74a68cc7 [InstCombine] Use ConstantInt::getFalse to reduce some code. NFC
llvm-svn: 310062
2017-08-04 16:07:18 +00:00
Sanjay Patel 79e7f6b3e3 [InstCombine] narrow lshr with constant
Name: narrow_shift
Pre: C1 < 8
%zx = zext i8 %x to i32
%l = lshr i32 %zx, C1
  =>  
%narrowC = trunc i32 C1 to i8
%ns = lshr i8 %x, %narrowC
%l = zext i8 %ns to i32

http://rise4fun.com/Alive/jIV

This isn't directly applicable to PR34046 as written, but we
need to have more narrowing folds like this to be sure that
rotate patterns are recognized.

llvm-svn: 310060
2017-08-04 15:42:47 +00:00
Filipe Cabecinhas fb9d2a8775 [DSE] Merge stores when the later store only writes to memory locations the early store also wrote to.
Summary:
This fixes PR31777.

If both stores' values are ConstantInt, we merge the two stores
(shifting the smaller store appropriately) and replace the earlier (and
larger) store with an updated constant.

In the future we should also support vectors of integers. And maybe
float/double if we can.

Reviewers: hfinkel, junbuml, jfb, RKSimon, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30703

llvm-svn: 310055
2017-08-04 12:28:36 +00:00
Nikolai Bozhenov 1545eb3408 [InstCombine] Canonicalize clamp of float types to minmax in fast mode.
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.

This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.

Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper

Reviewed By: efriedma

Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits

Patch by Andrei Elovikov <andrei.elovikov@intel.com>

Differential Revision: https://reviews.llvm.org/D33186

llvm-svn: 310054
2017-08-04 12:22:17 +00:00
Max Kazantsev 9505470033 Do not declare a variable which is used only in assert. NFC
llvm-svn: 310034
2017-08-04 07:41:24 +00:00
Max Kazantsev 2f6ae28152 [IRCE] Handle loops with step different from 1/-1
This patch generalizes IRCE to handle IV steps that are not equal to 1 or -1.

Differential Revision: https://reviews.llvm.org/D35539

llvm-svn: 310032
2017-08-04 07:01:04 +00:00
Max Kazantsev 07da1ab23a [IRCE] Recognize loops with unsigned latch conditions
This patch enables recognition of loops with ult/ugt latch conditions.

Differential Revision: https://reviews.llvm.org/D35302

llvm-svn: 310027
2017-08-04 05:40:20 +00:00
Craig Topper c2d3c631ff [InstCombine] Move the call to foldSelectICmpAnd into foldSelectInstWithICmp. NFCI
llvm-svn: 310025
2017-08-04 05:12:37 +00:00
Craig Topper a86ca08d26 [InstCombine] Remove unnecessary casts. NFC
We're calling an overload of getOpcode that already returns Instruction::CastOps.

llvm-svn: 310024
2017-08-04 05:12:35 +00:00
Victor Leschuk 56b03d0dd6 Un-revert r310014: false revert, it wasn't the cause of build break
llvm-svn: 310021
2017-08-04 04:51:15 +00:00
Victor Leschuk 21713ebfb1 Revert r310014 as it breaks build lld-x86_64-darwin13
llvm-svn: 310020
2017-08-04 04:43:54 +00:00
Adrian Prantl fd8c8e9fe6 Teach GlobalSRA to update the debug info for split-up globals.
This is similar to what we are doing in "regular" SROA and creates
DW_OP_LLVM_fragment operations to describe the resulting variables.

rdar://problem/33654891

llvm-svn: 310014
2017-08-04 01:19:54 +00:00
Teresa Johnson 8482e56920 Use profile summary to disable peeling for huge working sets
Summary:
Detect when the working set size of a profiled application is huge,
by comparing the number of counts required to reach the hot percentile
in the profile summary to a large threshold*.

When the working set size is determined to be huge, disable peeling
to avoid bloating the working set further.

*Note that the selected threshold (15K) is significantly larger than the
largest working set value in SPEC cpu2006 (which is gcc at around 11K).

Reviewers: davidxl

Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D36288

llvm-svn: 310005
2017-08-03 23:42:58 +00:00
Davide Italiano 5974c31d91 [NewGVN] Fix the case where we have a phi-of-ops which goes away.
Patch by Daniel Berlin, fixes PR33196 (and probably something else).

llvm-svn: 309988
2017-08-03 21:17:49 +00:00
Teresa Johnson 9a18a6f08b Disable loop peeling during full unrolling pass.
Summary:
Peeling should not occur during the full unrolling invocation early
in the pipeline, but rather later with partial and runtime loop
unrolling. The later loop unrolling invocation will also eventually
utilize profile summary and branch frequency information, which
we would like to use to control peeling. And for ThinLTO we want
to delay peeling until the backend (post thin link) phase, just as
we do for most types of unrolling.

Ensure peeling doesn't occur during the full unrolling invocation
by adding a parameter to the shared implementation function, similar
to the way partial and runtime loop unrolling are disabled.

Performance results for ThinLTO suggest this has a neutral to positive
effect on some internal benchmarks.

Reviewers: chandlerc, davidxl

Subscribers: mzolotukhin, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D36258

llvm-svn: 309966
2017-08-03 17:52:38 +00:00
Sanjay Patel 7cf745cc94 [NewGVN] fix typos; NFC
llvm-svn: 309946
2017-08-03 15:18:27 +00:00
Ewan Crawford e18490c8be [Cloning] Move distinct GlobalVariable debug info metadata in CloneModule
Duplicating the distinct Subprogram and CU metadata nodes seems like the incorrect thing to do in CloneModule for GlobalVariable debug info. As it results in the scope of the GlobalVariable DI no longer being consistent with the rest of the module, and the new CU is absent from llvm.dbg.cu.

Fixed by adding RF_MoveDistinctMDs to MapMetadata flags for GlobalVariables.

Current unit test IR after clone:
```
@gv = global i32 1, comdat($comdat), !dbg !0, !type !5

define private void @f() comdat($comdat) personality void ()* @persfn !dbg !14 {

!llvm.dbg.cu = !{!10}

!0 = !DIGlobalVariableExpression(var: !1)
!1 = distinct !DIGlobalVariable(name: "gv", linkageName: "gv", scope: !2, file: !3, line: 1, type: !9, isLocal: false, isDefinition: true)
!2 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !3, line: 4, type: !4, isLocal: true, isDefinition: true, scopeLine: 3, isOptimized: false, unit: !6, variables: !5)
!3 = !DIFile(filename: "filename.c", directory: "/file/dir/")
!4 = !DISubroutineType(types: !5)
!5 = !{}
!6 = distinct !DICompileUnit(language: DW_LANG_C99, file: !7, producer: "CloneModule", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, globals: !8)
!7 = !DIFile(filename: "filename.c", directory: "/file/dir")
!8 = !{!0}
!9 = !DIBasicType(tag: DW_TAG_unspecified_type, name: "decltype(nullptr)")
!10 = distinct !DICompileUnit(language: DW_LANG_C99, file: !7, producer: "CloneModule", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, globals: !11)
!11 = !{!12}
!12 = !DIGlobalVariableExpression(var: !13)
!13 = distinct !DIGlobalVariable(name: "gv", linkageName: "gv", scope: !14, file: !3, line: 1, type: !9, isLocal: false, isDefinition: true)
!14 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !3, line: 4, type: !4, isLocal: true, isDefinition: true, scopeLine: 3, isOptimized: false, unit: !10, variables: !5)
```

Patched IR after clone:
```
@gv = global i32 1, comdat($comdat), !dbg !0, !type !5

define private void @f() comdat($comdat) personality void ()* @persfn !dbg !2 {

!llvm.dbg.cu = !{!6}

!0 = !DIGlobalVariableExpression(var: !1)
!1 = distinct !DIGlobalVariable(name: "gv", linkageName: "gv", scope: !2, file: !3, line: 1, type: !9, isLocal: false, isDefinition: true)
!2 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !3, line: 4, type: !4, isLocal: true, isDefinition: true, scopeLine: 3, isOptimized: false, unit: !6, variables: !5)
!3 = !DIFile(filename: "filename.c", directory: "/file/dir/")
!4 = !DISubroutineType(types: !5)
!5 = !{}
!6 = distinct !DICompileUnit(language: DW_LANG_C99, file: !7, producer: "CloneModule", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, globals: !8)
!7 = !DIFile(filename: "filename.c", directory: "/file/dir")
!8 = !{!0}
!9 = !DIBasicType(tag: DW_TAG_unspecified_type, name: "decltype(nullptr)")
```

Reviewers: aprantl, probinson, dblaikie, echristo, loladiro
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36082

llvm-svn: 309928
2017-08-03 09:23:03 +00:00
Matt Arsenault e6de494b74 LV: Don't insert runtime ptr checks on divergent targets
llvm-svn: 309890
2017-08-02 21:43:08 +00:00
Craig Topper fc9bf50dee [InstCombine] Remove unnecessary temporary APInt. NFCI
llvm-svn: 309887
2017-08-02 21:05:40 +00:00
Teresa Johnson ecd901314d [PM] Split LoopUnrollPass and make partial unroller a function pass
Summary:
This is largely NFC*, in preparation for utilizing ProfileSummaryInfo
and BranchFrequencyInfo analyses. In this patch I am only doing the
splitting for the New PM, but I can do the same for the legacy PM as
a follow-on if this looks good.

*Not NFC since for partial unrolling we lose the updates done to the
loop traversal (adding new sibling and child loops) - according to
Chandler this is not very useful for partial unrolling, but it also
means that the debugging flag -unroll-revisit-child-loops no longer
works for partial unrolling.

Reviewers: chandlerc

Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D36157

llvm-svn: 309886
2017-08-02 20:35:29 +00:00
Craig Topper 4068e4eec5 [InstCombine] Remove explicit code for folding (xor(zext(cmp)), 1) and (xor(sext(cmp)), -1) to ext(!cmp).
As far as I can tell this should be handled by foldCastedBitwiseLogic which is called later in visitXor.

Differential Revision: https://reviews.llvm.org/D36214

llvm-svn: 309882
2017-08-02 20:30:27 +00:00
Craig Topper ae9b87d10c [InstCombine] Support sext in foldLogicCastConstant
This adds support for sext in foldLogicCastConstant. This is a prerequisite for D36214.

Differential Revision: https://reviews.llvm.org/D36234

llvm-svn: 309880
2017-08-02 20:25:56 +00:00
Jakub Kuderski d869913f3b [Dominators] Teach LoopDeletion to use the new incremental API
Summary:
This patch makes LoopDeletion use the incremental DominatorTree API.

We modify LoopDeletion to perform the deletion in 5 steps:
1. Create a new dummy edge from the preheader to the exit, by adding a conditional branch.
2. Inform the DomTree about the new edge.
3. Remove the conditional branch and replace it with an unconditional edge to the exit. This removes the edge to the loop header, making it unreachable.
4. Inform the DomTree about the deleted edge.
5. Remove the unreachable block from the function.

Creating the dummy conditional branch is necessary to perform incremental DomTree update.
We should consider using the batch updater when it's ready.

Reviewers: dberlin, davide, grosser, sanjoy

Reviewed By: dberlin, grosser

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D35391

llvm-svn: 309850
2017-08-02 18:17:52 +00:00
Alexey Bataev 36e6096a03 [SLPVectorizer] Generalize interface of functions, NFC.
llvm-svn: 309816
2017-08-02 14:38:07 +00:00
Alexey Bataev fba97e6e21 [SLP] Fix for PR31880: shuffle and vectorize repeated scalar ops on extracted elements
Summary:
Currently most of the time vectors of extractelement instructions are
treated as scalars that must be gathered into vectors. But in some
cases, like when we have extractelement instructions from single vector
with different constant indeces or from 2 vectors of the same size, we
can treat this operations as shuffle of a single vector or blending of 2
vectors.
```
define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
  %x0 = extractelement <2 x i8> %x, i32 0
  %y1 = extractelement <2 x i8> %y, i32 1
  %x0x0 = mul i8 %x0, %x0
  %y1y1 = mul i8 %y1, %y1
  %ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0
  %ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1
  ret <2 x i8> %ins2
}
```
can be converted to something like
```
define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
  %1 = shufflevector <2 x i8> %x, <2 x i8> %y, <2 x i32> <i32 0, i32 3>
  %2 = mul <2 x i8> %1, %1
  ret <2 x i8> %2
}
```
Currently this type of conversion is considered as high cost
transformation.

Reviewers: mzolotukhin, delena, mkuper, hfinkel, RKSimon

Subscribers: ashahid, RKSimon, spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D30200

llvm-svn: 309812
2017-08-02 13:25:26 +00:00
Davide Italiano c2f73b7fae [NewGVN] Fold single-use variables. NFCI.
llvm-svn: 309790
2017-08-02 04:05:49 +00:00
Davide Italiano b13a3fa41b [NewGVN] Remove a (now stale) comment. NFCI.
llvm-svn: 309789
2017-08-02 03:51:40 +00:00
Craig Topper 36af40c115 [SimplifyCFG] Fix typo in comment. NFC
llvm-svn: 309785
2017-08-02 02:34:16 +00:00
Chandler Carruth 95055d8f8b [PM] Fix a bug where through CGSCC iteration we can get
infinite-inlining across multiple runs of the inliner by keeping a tiny
history of internal-to-SCC inlining decisions.

This is still a bit gross, but I don't yet have any fundamentally better
ideas and numerous people are blocked on this to use new PM and ThinLTO
together.

The core of the idea is to detect when we are about to do an inline that
has a chance of re-splitting an SCC which we have split before with
a similar inlining step. That is a critical component in the inlining
forming a cycle and so far detects all of the various cyclic patterns
I can come up with as well as the original real-world test case (which
comes from a ThinLTO build of libunwind).

I've added some tests that I think really demonstrate what is going on
here. They are essentially state machines that march the inliner through
various steps of a cycle and check that we stop when the cycle is closed
and that we actually did do inlining to form that cycle.

A lot of thanks go to Eric Christopher and Sanjoy Das for the help
understanding this issue and improving the test cases.

The biggest "yuck" here is the layering issue -- the CGSCC pass manager
is providing somewhat magical state to the inliner for it to use to make
itself converge. This isn't great, but I don't honestly have a lot of
better ideas yet and at least seems nicely isolated.

I have tested this patch, and it doesn't block *any* inlining on the
entire LLVM test suite and SPEC, so it seems sufficiently narrowly
targeted to the issue at hand.

We have come up with hypothetical issues that this patch doesn't cover,
but so far none of them are practical and we don't have a viable
solution yet that covers the hypothetical stuff, so proceeding here in
the interim. Definitely an area that we will be back and revisiting in
the future.

Differential Revision: https://reviews.llvm.org/D36188

llvm-svn: 309784
2017-08-02 02:09:22 +00:00
Chad Rosier dfd1de687d [Value Tracking] Default argument to true and rename accordingly. NFC.
IMHO this is a bit more readable.

llvm-svn: 309739
2017-08-01 20:18:54 +00:00
Craig Topper 4d5050b5ba [InstCombine] Remove explicit check for impossible condition. Replace with assert
Summary:
As far as I can tell the earlier call getLimitedValue will guaranteed ShiftAmt is saturated to BitWidth-1 preventing it from ever being equal or greater than BitWidth.

At one point in the past the getLimitedValue call was only passed BitWidth not BitWidth - 1. This would have allowed the equality case to get here. And in fact this check was initially added as just BitWidth == ShiftAmt, but was changed shortly after to include > which should have never been possible.

Reviewers: spatel, majnemer, davide

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36123

llvm-svn: 309690
2017-08-01 15:10:25 +00:00
Max Kazantsev e4c220e8f2 [IRCE][NFC] Add another assert that AddRecExpr's step is not zero
One more assertion of this kind. It is a preparation step for generalizing
to the case of stride not equal to +1/-1.

llvm-svn: 309663
2017-08-01 06:49:29 +00:00
Max Kazantsev 85da7543f9 [IRCE][NFC] Add assert that AddRecExpr's step is not zero
We should never return zero steps, ensure this fact by adding
a sanity check when we are analyzing the induction variable.

llvm-svn: 309661
2017-08-01 06:27:51 +00:00
Davide Italiano 72c4285bd6 [MetaRenamer] Leave `@main` alone.
To the best of my knowledge -metarenamer is used in two cases:
1) obfuscate names, when e.g. they contain informations that
can't be shared.
2) Improve clarity of the textual IR for testcases.

One of the usecases if getting the output of `opt` and passing it
to the lli interpreter to run the test. If metarenamer renames
@main, lli can't find an entry point.

llvm-svn: 309657
2017-08-01 05:14:45 +00:00
Alina Sbirlea 30d8a881e8 Default MemoryLocation passed to getModRefInfo should be None (D35441)
llvm-svn: 309645
2017-08-01 00:47:17 +00:00
Kostya Serebryany a1f12ba17e [sanitizer-coverage] relax an assertion
llvm-svn: 309644
2017-08-01 00:44:05 +00:00
Alina Sbirlea 967e7966fc Allow None as a MemoryLocation to getModRefInfo
Summary:
Adding part of the changes in D30369 (needed to make progress):
Current patch updates AliasAnalysis and MemoryLocation, but does _not_ clean up MemorySSA.

Original summary from D30369, by dberlin:
Currently, we have instructions which affect memory but have no memory
location. If you call, for example, MemoryLocation::get on a fence,
it asserts. This means things specifically have to avoid that. It
also means we end up with a copy of each API, one taking a memory
location, one not.

This starts to fix that.

We add MemoryLocation::getOrNone as a new call, and reimplement the
old asserting version in terms of it.

We make MemoryLocation optional in the (Instruction, MemoryLocation)
version of getModRefInfo, and kill the old one argument version in
favor of passing None (it had one caller). Now both can handle fences
because you can just use MemoryLocation::getOrNone on an instruction
and it will return a correct answer.

We use all this to clean up part of MemorySSA that had to handle this difference.

Note that literally every actual getModRefInfo interface we have could be made private and replaced with:

getModRefInfo(Instruction, Optional<MemoryLocation>)
and
getModRefInfo(Instruction, Optional<MemoryLocation>, Instruction, Optional<MemoryLocation>)

and delegating to the right ones, if we wanted to.

I have not attempted to do this yet.

Reviewers: dberlin, davide, dblaikie

Subscribers: sanjoy, hfinkel, chandlerc, llvm-commits

Differential Revision: https://reviews.llvm.org/D35441

llvm-svn: 309641
2017-08-01 00:28:29 +00:00
Sanjay Patel dac0ab272c [InstCombine] allow mask hoisting transform for vector types
llvm-svn: 309627
2017-07-31 21:01:53 +00:00
Peter Collingbourne bcd204b478 Update phi nodes in LowerTypeTests control flow simplification
D33925 added a control flow simplification for -O2 --lto-O0 builds that
manually splits blocks and reassigns conditional branches but does not
correctly update phi nodes. If the else case being branched to had
incoming phi nodes the control-flow simplification would leave phi nodes
in that BB with an unhandled predecessor.

Patch by Vlad Tsyrklevich!

Differential Revision: https://reviews.llvm.org/D36012

llvm-svn: 309621
2017-07-31 20:43:07 +00:00
Kostya Serebryany bfc83fa8d7 [sanitizer-coverage] don't instrument available_externally functions
llvm-svn: 309611
2017-07-31 20:00:22 +00:00
Kostya Serebryany bb6f079a45 [sanitizer-coverage] ensure minimal alignment for coverage counters and guards
llvm-svn: 309610
2017-07-31 19:49:45 +00:00
Davide Italiano e4c2782cba [SLPVectorizer] Unbreak the build with -Werror.
GCC was complaining about `&&` within `||` without explicit
parentheses. NFCI.

llvm-svn: 309606
2017-07-31 19:14:19 +00:00
Craig Topper 317a51e886 [X86][InstCombine] Add some simplifications for BZHI intrinsics
This intrinsic clears the upper bits starting at a specified index. If the index is a constant we can do some simplifications.

This could be in InstSimplify, but we don't handle any target specific intrinsics there today.

Differential Revision: https://reviews.llvm.org/D36069

llvm-svn: 309604
2017-07-31 18:52:15 +00:00
Craig Topper 8324003818 [X86][InstCombine] Add basic simplification support for BEXTR/BEXTRI intrinsics.
This patch adds simplification support for the BEXTR/BEXTRI intrinsics to match gcc. This only supports cases that fold to 0 or can be fully constant folded. Theoretically we could support converting to AND if the shift part is unused or to only a shift if the mask doesn't modify any bits after an equivalent shl. gcc doesn't do these transformations either.

I put this in InstCombine, but it could be done in InstSimplify. It would be the first target specific intrinsic in InstSimplify.

Differential Revision: https://reviews.llvm.org/D36063

llvm-svn: 309603
2017-07-31 18:52:13 +00:00
David Majnemer 91c6330c96 [IPSCCP] Guard a user of getInitializer with hasDefinitiveInitializer
We are not allowed to reason about an initializer value without first
consulting hasDefinitiveInitializer.

llvm-svn: 309594
2017-07-31 17:47:07 +00:00
Florian Hahn 94b8a87c8e Extend ifdefs to more unused helper functions.
This fixes a buildbot failure with -Werror introduced by r309553

llvm-svn: 309572
2017-07-31 16:11:43 +00:00
Alexey Bataev 0ab22bb991 [SLP] Initial rework for min/max horizontal reduction vectorization, NFC.
Summary: All getReductionCost() functions are renamed to getArithmeticReductionCost() + added basic infrastructure to handle non-binary reduction operations.

Reviewers: spatel, mzolotukhin, Ayal, mkuper, gilr, hfinkel

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D29402

llvm-svn: 309566
2017-07-31 14:36:05 +00:00
Alexey Bataev 3e9b3eb91d [Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC.
llvm-svn: 309563
2017-07-31 14:19:32 +00:00
Ayal Zaks e841b214b1 [LV] Avoid redundant operations manipulating masks
The Loop Vectorizer generates redundant operations when manipulating masks:
AND with true, OR with false, compare equal to true. Instead of relying on
a subsequent pass to clean them up, this patch avoids generating them.

Use null (no-mask) to represent all-one full masks, instead of a constant
all-one vector, following the convention of masked gathers and scatters.

Preparing for a follow-up VPlan patch in which these mask manipulating
operations are modeled using recipes.

Differential Revision: https://reviews.llvm.org/D35725

llvm-svn: 309558
2017-07-31 13:21:42 +00:00
Florian Hahn 6b3216aad8 Guard print() functions only used by dump() functions.
Summary:
Since  r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.

Reviewers: MatzeB

Reviewed By: MatzeB

Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D35949

llvm-svn: 309553
2017-07-31 10:07:49 +00:00
Florian Hahn 4284049dcc [LoopInterchange] Do not interchange loops with function calls.
Summary:
Without any information about the called function, we cannot be sure
that it is safe to interchange loops which contain function calls. For
example there could be dependences that prevent interchanging between
accesses in the called function and the loops. Even functions without any
parameters could cause problems, as they could access memory using
global pointers.

For now, I think it is only safe to interchange loops with calls marked
as readnone.

With this patch, the LLVM test suite passes with `-O3 -mllvm
-enable-loopinterchange` and LoopInterchangeProfitability::isProfitable
returning true for all loops. check-llvm and check-clang also pass when
bootstrapped in a similar fashion, although only 3 loops got
interchanged.

Reviewers: karthikthecool, blitz.opensource, hfinkel, mcrosier, mkuper

Reviewed By: mcrosier

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D35489

llvm-svn: 309547
2017-07-31 09:00:52 +00:00
Sam Elliott 67b0e589d0 Migrate PGOMemOptSizeOpt to use new OptimizationRemarkEmitter Pass
Summary:
Fixes PR33790.

This patch still needs a yaml-style test, which I shall write tomorrow

Reviewers: anemet

Reviewed By: anemet

Subscribers: anemet, llvm-commits

Differential Revision: https://reviews.llvm.org/D35981

llvm-svn: 309497
2017-07-30 00:35:33 +00:00
Sumanth Gundapaneni 8d50a50e98 [SimplifyCFG] Make the no-jump-tables attribute also disable switch lookup tables
Differential Revision: https://reviews.llvm.org/D35579

llvm-svn: 309444
2017-07-28 22:25:40 +00:00
Adrian Prantl abe04759a6 Remove the obsolete offset parameter from @llvm.dbg.value
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.

rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951

llvm-svn: 309426
2017-07-28 20:21:02 +00:00
Alexey Bataev e109655c90 [SLP] Allow vectorization of the instruction from the same basic blocks only, NFC.
Summary:
After some changes in SLP vectorizer we missed some additional checks to
limit the instructions for vectorization. We should not perform analysis
of the instructions if the parent of instruction is not the same as the
parent of the first instruction in the tree or it was analyzed already.

Subscribers: mzolotukhin

Differential Revision: https://reviews.llvm.org/D34881

llvm-svn: 309425
2017-07-28 20:11:16 +00:00
Wei Mi 55c05e14af [GVN] Recommit the patch "Add phi-translate support in scalarpre"
Recommit after workaround the bug PR31652.

Three bugs fixed in previous recommits: The first one is to use CurrentBlock
instead of PREInstr's Parent as param of performScalarPREInsertion because
the Parent of a clone instruction may be uninitialized. The second one is stop
PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst
is defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available. The third one is an out-of-bound
array access in a flipped if guard.

Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.

long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();

void foo(long a, long b, long c, long d) {

  g1 = a * b;
  if (__builtin_expect(g2 > 3, 0)) {
    a = c;
    b = d;
    g2 = a * b;
  }
  g3 = a * b;      // fully redundant.

}

The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.

Differential Revision: https://reviews.llvm.org/D32252

llvm-svn: 309397
2017-07-28 15:47:25 +00:00
Davide Italiano 75a001ba78 [JumpThreading] Stop falsely preserving LazyValueInfo.
JumpThreading claims to preserve LVI, but it doesn't preserve
the analyses which LVI holds a reference to (e.g. the Dominator).
In the current pass manager infrastructure, after JT runs, the
PM frees these analyses (including DominatorTree) but preserves
LVI.

CorrelatedValuePropagation runs immediately after and queries
a corrupted domtree, causing weird miscompiles.

This commit disables the preservation of LVI for the time being.
Eventually, we should either move LVI to a proper dependency
tracking mechanism (i.e. an analyses shouldn't hold references
to other analyses and compute them on demand if needed), or
we should teach all the passes preserving LVI to preserve the
analyses LVI depends on.

The new pass manager has a mechanism to invalidate LVI in case
one of the analyses it depends on becomes invalid, so this problem
shouldn't exist (at least not in this immediate form), but handling
of analyses holding references is still a very delicate subject.

Fixes PR33917 (and rustc).

llvm-svn: 309355
2017-07-28 03:10:43 +00:00
Davide Italiano 01cb947abb [JumpThreading] Add an option to dump LazyValueInfo after the run.
Differential Revision:  https://reviews.llvm.org/D35973

llvm-svn: 309353
2017-07-28 02:57:43 +00:00
Dehao Chen 8260d66556 Increase the ImportHotMultiplier to 10.0
Summary: The original 3.0 hot mupltiplier is too small, and would prevent hot callsites from being inline. This patch increases the hot multilier to 10.0

Reviewers: davidxl, tejohnson

Reviewed By: tejohnson

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D35969

llvm-svn: 309344
2017-07-28 01:02:34 +00:00
Kostya Serebryany 063b652096 [sanitizer-coverage] rename sanitizer-coverage-create-pc-table into sanitizer-coverage-pc-table and add plumbing for a clang flag
llvm-svn: 309337
2017-07-28 00:09:29 +00:00
Kostya Serebryany b75d002f15 [sanitizer-coverage] add a feature sanitizer-coverage-create-pc-table=1 (works with trace-pc-guard and inline-8bit-counters) that adds a static table of instrumented PCs to be used at run-time
llvm-svn: 309335
2017-07-27 23:36:49 +00:00
whitequark 9e8197ac8f [MergeFunctions] Remove alias support.
The alias support was dead code since 2011. It was last touched
in r124182, where it was reintroduced after being removed
in r110434, and since then it was gated behind a HasGlobalAliases
flag that was permanently stuck as `false`.

It is also broken. I'm not sure if it bitrotted or was just broken
in the first place because it appears to have never been tested,
but the following IR results in a crash:

    define internal i32 @a(i32 %a, i32 %b) unnamed_addr {
      %c = add i32 %a, %b
      %d = xor i32 %a, %c
      ret i32 %c
    }

    define internal i32 @b(i32 %a, i32 %b) unnamed_addr {
      %c = add i32 %a, %b
      %d = xor i32 %a, %c
      ret i32 %c
    }

It seems safe to remove buggy untested code that no one cared about
for seven years.

Differential Revision: https://reviews.llvm.org/D34802

llvm-svn: 309313
2017-07-27 19:36:13 +00:00
Davide Italiano 82c7d3768d [FunctionImport] Prefer isa<> to dyn_cast<> as the value is not used.
This change makes GCC7 happy again.

llvm-svn: 309305
2017-07-27 18:38:09 +00:00
Hiroshi Yamauchi 60855214c2 [InstCombine] Simplify pointer difference subtractions (GEP-GEP) where GEPs have other uses and one non-constant index
Summary:
Pointer difference simplifications currently happen only if input GEPs don't have other uses or their indexes are all constants, to avoid duplicating indexing arithmetic.

This patch enables cases with exactly one non-constant index among input GEPs to happen where there is no duplicated arithmetic or code size increase even if input GEPs have other uses.

For example, this patch allows "(&A[42][i]-&A[42][0])" --> "i", which didn't happen previously, if the input GEP(s) have other uses.

Reviewers: sanjoy, bkramer

Reviewed By: sanjoy

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D35499

llvm-svn: 309304
2017-07-27 18:27:11 +00:00
Adam Nemet 0d8b5d6f69 [ICP] Migrate to OptimizationRemarkEmitter
This is a module pass so for the old PM, we can't use ORE, the function
analysis pass.  Instead ORE is created on the fly.

A few notes:

- isPromotionLegal is folded in the caller since we want to emit the Function
in the remark but we can only do that if the symbol table look-up succeeded.

- There was good test coverage for remarks in this pass.

- promoteIndirectCall uses ORE conditionally since it's also used from
SampleProfile which does not use ORE yet.

Fixes PR33792.

Differential Revision: https://reviews.llvm.org/D35929

llvm-svn: 309294
2017-07-27 16:54:15 +00:00
Daniel Neilson 2574d7cbf6 All libcalls should be considered to be GC-leaf functions.
Summary:
It is possible for some passes to materialize a call to a libcall (ex: ldexp, exp2, etc),
but these passes will not mark the call as a gc-leaf-function. All libcalls are
actually gc-leaf-functions, so we change llvm::callsGCLeafFunction() to tell us that
available libcalls are equivalent to gc-leaf-function calls.

Reviewers: sanjoy, anna, reames

Reviewed By: anna

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35840

llvm-svn: 309291
2017-07-27 16:49:39 +00:00
Alexey Bataev 07b96e8e96 [SLP] Outline code for the check that instruction users are part of
vectorization tree, NFC.

llvm-svn: 309284
2017-07-27 15:48:44 +00:00
David Blaikie 72c0b1cc1b Fix assert from r309278
llvm-svn: 309281
2017-07-27 15:28:10 +00:00
David Blaikie 2f0cc477ab ThinLTO: Don't import aliases of any kind (even linkonce_odr)
Summary:
Until a more advanced version of importing can be implemented for
aliases (one that imports an alias as an available_externally definition
of the aliasee), skip the narrow subset of cases that was possible but
came at a cost: aliases of linkonce_odr functions could be imported
because the linkonce_odr function could be safely duplicated from the
source module. This came/comes at the cost of not being able to 'home'
imported linkonce functions (they had to be emitted linkonce_odr in all
the destination modules (even if they weren't used by an alias) rather
than as available_externally - causing extra object size).

Tangentially, this also was the only reason ThinLTO would emit multiple
CUs in to the resulting DWARF - which happens to be a problem for
Fission (there's a fix for this in GDB but not released yet, etc).
(actually it's not the only reason - but I'm sending a patch to fix the
other reason shortly)

There's no reason to believe this particularly narrow alias importing
was especially/meaningfully important, only that it was /possible/ to
implement in this way. When a more general solution is done, it should
still satisfy the DWARF concerns above, since the import will still be
available_externally, and thus not create extra CUs.

Since now all aliases are treated the same, I removed/simplified some
test cases since they were testing corner cases where there are no
longer any corners.

Reviewers: tejohnson, mehdi_amini

Differential Revision: https://reviews.llvm.org/D35875

llvm-svn: 309278
2017-07-27 15:09:06 +00:00
Hiroshi Yamauchi 0445e31c88 Fix a comment (test commit).
llvm-svn: 309192
2017-07-26 21:54:43 +00:00
Adam Nemet ea06e6e865 Migrate SimplifyLibCalls to new OptimizationRemarkEmitter
Summary:
This changes SimplifyLibCalls to use the new OptimizationRemarkEmitter
API.

In fact, as SimplifyLibCalls is only ever called via InstCombine,
(as far as I can tell) the OptimizationRemarkEmitter is added there,
and then passed through to SimplifyLibCalls later.

I have avoided changing any remark text.

This closes PR33787

Patch by Sam Elliott!

Reviewers: anemet, davide

Reviewed By: anemet

Subscribers: davide, mehdi_amini, eraman, fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D35608

llvm-svn: 309158
2017-07-26 19:03:18 +00:00
Wei Mi fc0e245464 Disable loop unswitching for some patterns containing equality comparison with undef.
This is a workaround for the bug described in PR31652 and
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115497.html. The temporary
solution is to add a function EqualityPropUnSafe. In EqualityPropUnSafe, for
some simple patterns we can know the equality comparison may contains undef,
so we regard such comparison as unsafe and will not do loop-unswitching for
them. We also need to disable the select simplification when one of select
operand is undef and its result feeds into equality comparison.

The patch cannot clear the safety issue caused by the bug, but it can suppress
the issue from happening to some extent.

Differential Revision: https://reviews.llvm.org/D35811

llvm-svn: 309059
2017-07-25 23:37:17 +00:00
Chandler Carruth 1dc34c6d80 [LIR] Teach LIR to avoid extending the BE count prior to adding one to
it when safe.

Very often the BE count is the trip count minus one, and the plus one
here should fold with that minus one. But because the BE count might in
theory be UINT_MAX or some such, adding one before we extend could in
some cases wrap to zero and break when we scale things.

This patch checks to see if it would be safe to add one because the
specific case that would cause this is guarded for prior to entering the
preheader. This should handle essentially all of the common loop idioms
coming out of C/C++ code once canonicalized by LLVM.

Before this patch, both forms of loop in the added test cases ended up
subtracting one from the size, extending it, scaling it up by 8 and then
adding 8 back onto it. This is really silly, and it turns out made it
all the way into generated code very often, so this is a surprisingly
important cleanup to do.

Many thanks to Sanjoy for showing me how to do this with SCEV.

Differential Revision: https://reviews.llvm.org/D35758

llvm-svn: 308968
2017-07-25 10:48:32 +00:00
Kostya Serebryany c485ca05ac [sanitizer-coverage] simplify the code, NFC
llvm-svn: 308944
2017-07-25 02:07:38 +00:00
Florian Hahn f66efd6181 [LoopInterchange] Update code to use range-based for loops (NFC).
Summary:
The remaining non range-based for loops do not iterate over full ranges,
so leave them as they are.

Reviewers: karthikthecool, blitz.opensource, mcrosier, mkuper, aemerson

Reviewed By: aemerson

Subscribers: aemerson, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D35777

llvm-svn: 308872
2017-07-24 11:41:30 +00:00
Xinliang David Li 8e43698cf1 [PGOInstr] Add a debug print
llvm-svn: 308785
2017-07-21 21:36:25 +00:00
Haojie Wang 1dec57d5b0 ThinLTO Minimized Bitcode File Size Reduction
Summary: Currently the ThinLTO minimized bitcode file only strip the debug info, but there is still a lot of information in the minimized bit code file that will be not used for thin linker. In this patch, most of the extra information is striped to reduce the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. Now the minimized bitcode file size is reduced to 15%-30% of the debug info stripped bitcode file size.

Reviewers: danielcdh, tejohnson, pcc

Reviewed By: pcc

Subscribers: mehdi_amini, aprantl, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D35334

llvm-svn: 308760
2017-07-21 17:25:20 +00:00
Anna Thomas 5c07a4c5de [RuntimeUnroll] NFC: Add a profitability function for mutliexit loop
Separated out the profitability from the safety analysis for multiexit
loop unrolling. Currently, this is an NFC because profitability is true
only if the unroll-runtime-multi-exit is set to true (off-by-default).

This is to ease adding the profitability heuristic up for review at
D35380.

llvm-svn: 308753
2017-07-21 16:30:38 +00:00
Dinar Temirbulatov 4403b2b668 [SLPVectorizer] Replace E->Scalars to VL0 at vectorizeTree and move comment, NFCI.
llvm-svn: 308750
2017-07-21 16:02:56 +00:00
Dinar Temirbulatov b2a9a23213 [SLPVectorizer] buildTree_rec replace cast<Instruction>(VL[0]) to VL0, NFCI.
llvm-svn: 308745
2017-07-21 15:31:54 +00:00
Dinar Temirbulatov 3206409d91 [SLPVectorizer] Change canReuseExtract function parameter Opcode from unsigned to Value *, NFCI.
llvm-svn: 308739
2017-07-21 13:32:36 +00:00
Jonas Paulsson 024e319489 [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
 LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers.
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

llvm-svn: 308729
2017-07-21 11:59:37 +00:00
Davide Italiano 0c8d26c312 [PGO] Move the PGOInstrumentation pass to new OptRemark API.
This fixes PR33791.

llvm-svn: 308668
2017-07-20 20:43:05 +00:00
Peter Collingbourne 6f6788b99c LowerTypeTests: Drop function type metadata only if we're going to replace it.
Previously we were (mis)handling jump table members with a prevailing
definition in a full LTO module and a non-prevailing definition in a
ThinLTO module by dropping type metadata on those functions entirely,
which would cause type tests involving such functions to fail.

This patch causes us to drop metadata only if we are about to replace
it with metadata from cfi.functions.

We also want to replace metadata for available_externally functions,
which can arise in the opposite scenario (prevailing ThinLTO
definition, non-prevailing full LTO definition). The simplest way
to handle that is to remove the definition; there's little value in
keeping it around at this point (i.e. after most optimization passes
have already run) and later code will try to use the function's linkage
to create an alias, which would result in invalid IR if the function
is available_externally.

Fixes PR33832.

Differential Revision: https://reviews.llvm.org/D35604

llvm-svn: 308642
2017-07-20 18:02:05 +00:00
David Majnemer e6bb895ab5 [LICM] Make sinkRegion and hoistRegion non-recursive
Large CFGs can cause us to blow up the stack because we would have a
recursive step for each basic block in a region.

Instead, create a worklist and iterate it. This limits the stack usage
to something more manageable.

Differential Revision: https://reviews.llvm.org/D35609

llvm-svn: 308582
2017-07-20 03:27:02 +00:00
Davide Italiano 4b8c8eae32 [TRE] Move to the new OptRemark API.
Fixes PR33788.

Differential Revision:  https://reviews.llvm.org/D35570

llvm-svn: 308524
2017-07-19 21:13:22 +00:00
Peter Collingbourne 93fdaca5ac ThinLTOBitcodeWriter: Do not rewrite intrinsic functions when splitting modules.
Changing the type of an intrinsic may invalidate the IR.

Differential Revision: https://reviews.llvm.org/D35593

llvm-svn: 308500
2017-07-19 17:54:29 +00:00
Dinar Temirbulatov a61f4b8957 [LoopUtils] Add an extra parameter OpValue to propagateIRFlags function,
If OpValue is non-null, we only consider operations similar to OpValue
when intersecting.

Differential Revision: https://reviews.llvm.org/D35292

llvm-svn: 308428
2017-07-19 10:02:07 +00:00
Balaram Makam b05a55787a [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. Especially when such a loop's
latch block is folded into loop header it results in additional backedges and
LoopSimplify turns it into a nested loop which prevent later optimizations
from being applied (e.g., loop  unrolling and loop interleaving).

This patch augments the existing algorithm by further checking if the
destination of the branch belongs to a set of loop headers and defer
eliminating it if yes to LateSimplifyCFG.

Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605

Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl

Reviewed By: efriedma

Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35411

llvm-svn: 308422
2017-07-19 08:53:34 +00:00
Ayal Zaks 8c452d76ed [LV] Test once if vector trip count is zero, instead of twice
Generate a single test to decide if there are enough iterations to jump to the
vectorized loop, or else go to the scalar remainder loop. This test compares the
Scalar Trip Count: if STC < VF * UF go to the scalar loop. If
requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the
rest can be used to form vector iterations. So in this case the test checks
instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the
scalar loop if so. Otherwise the vector loop is entered for at-least one vector
iteration.

This test covers the case where incrementing the backedge-taken count will
overflow leading to an incorrect trip count of zero. In this (rare) case we will
also avoid the vector loop and jump to the scalar loop.

This patch simplifies the existing tests and effectively removes the basic-block
originally named "min.iters.checked", leaving the single test in block
"vector.ph".

Original observation and initial patch by Evgeny Stupachenko.

Differential Revision: https://reviews.llvm.org/D34150

llvm-svn: 308421
2017-07-19 05:16:39 +00:00
Chandler Carruth 06a86301a1 [PM/LCG] Follow-up fix to r308088 to handle deletion of library
functions.

In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.

While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.

We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.

This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.

llvm-svn: 308417
2017-07-19 04:12:25 +00:00
Weiming Zhao 984f1dc338 Fix DebugLoc propagation for unreachable LoadInst
Summary: Currently, when GVN creates a load and when InstCombine creates a new store for unreachable Load, the DebugLoc info gets lost.

Reviewers: dberlin, davide, aprantl

Reviewed By: aprantl

Subscribers: davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D34639

llvm-svn: 308404
2017-07-19 01:27:24 +00:00
Vitaly Buka 74443f0778 [asan] Copy arguments passed by value into explicit allocas for ASan
Summary:
  ASan determines the stack layout from alloca instructions. Since
arguments marked as "byval" do not have an explicit alloca instruction, ASan
does not produce red zones for them. This commit produces an explicit alloca
instruction and copies the byval argument into the allocated memory so that red
zones are produced.

  Submitted on behalf of @morehouse (Matt Morehouse)

  Reviewers: eugenis, vitalybuka

  Reviewed By: eugenis

  Subscribers: hiraditya, llvm-commits

  Differential Revision: https://reviews.llvm.org/D34789

llvm-svn: 308387
2017-07-18 22:28:03 +00:00
Davide Italiano 6da7db31df [TRE] Simplify canTRE() a bit using all_of(). NFCI.
This has a ~11 years old FIXME, which may not be true today.
We might consider removing this code altogether.

llvm-svn: 308319
2017-07-18 15:42:59 +00:00
Alexander Potapenko 9385aaa848 [sancov] Fix PR33732
Coverage hooks that take less-than-64-bit-integers as parameters need the
zeroext parameter attribute (http://llvm.org/docs/LangRef.html#paramattrs)
to make sure they are properly extended by the x86_64 ABI.

llvm-svn: 308296
2017-07-18 11:47:56 +00:00
Max Kazantsev 2c627a97fd [IRCE] Recognize loops with ne/eq latch conditions
In some particular cases eq/ne conditions can be turned into equivalent
slt/sgt conditions. This patch teaches parseLoopStructure to handle some
of these cases.

Differential Revision: https://reviews.llvm.org/D35010

llvm-svn: 308264
2017-07-18 04:53:48 +00:00
Martin Storsjo 2f24e93481 [AArch64] Extend CallingConv::X86_64_Win64 to AArch64 as well
Rename the enum value from X86_64_Win64 to plain Win64.

The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.

Differential Revision: https://reviews.llvm.org/D34474

llvm-svn: 308208
2017-07-17 20:05:19 +00:00
Teresa Johnson f9dc3deaa6 Revert "Restore with fix "[ThinLTO] Ensure we always select the same function copy to import""
This reverts commit r308114 (and follow on fixes to test).

There is a linking failure in a ThinLTO bot:
http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/3663/

(and undefined reference). It seems like it must be a second order
effect of the heuristic change I made, and may take some time to try
to reproduce locally and track down. Therefore, reverting for now.

llvm-svn: 308206
2017-07-17 19:25:38 +00:00
Simon Pilgrim 7ac3efacc0 Remove unnecessary cast. NFCI.
llvm-svn: 308166
2017-07-17 09:35:03 +00:00
Davide Italiano 579064e2c1 [InstCombine] Don't violate dominance when replacing instructions.
Differential Revision:  https://reviews.llvm.org/D35376

llvm-svn: 308144
2017-07-16 18:56:30 +00:00
Craig Topper 2072aca51c [InstCombine] Move (0 - x) & 1 --> x & 1 to SimplifyDemandedUseBits.
This removes a dedicated matcher and allows us to support more than just an AND masking the lower bit.

llvm-svn: 308124
2017-07-16 05:37:58 +00:00
Teresa Johnson a7660b0127 Restore with fix "[ThinLTO] Ensure we always select the same function copy to import"
This restores r308078/r308079 with a fix for bot non-determinisim (make
sure we run llvm-lto in single threaded mode so the debug output doesn't get
interleaved).

llvm-svn: 308114
2017-07-15 22:58:06 +00:00
Craig Topper d918d5b36b [InstCombine] Improve the expansion in SimplifyUsingDistributiveLaws to handle cases where one side doesn't simplify, but the other side resolves to an identity value
Summary:
If one side simplifies to the identity value for inner opcode, we can replace the value with just the operation that can't be simplified.

I've removed a couple now unneeded special cases in visitAnd and visitOr. There are probably other cases I missed.

Reviewers: spatel, majnemer, hfinkel, dberlin

Reviewed By: spatel

Subscribers: grandinj, llvm-commits, spatel

Differential Revision: https://reviews.llvm.org/D35451

llvm-svn: 308111
2017-07-15 21:49:49 +00:00
Sanjay Patel 3437ee2740 [InstCombine] improve (1 << x) & 1 --> zext(x == 0) folding
1. Add a one-use check to prevent increasing instruction count.
2. Generalize the pattern matching to include vector types.

llvm-svn: 308105
2017-07-15 17:26:01 +00:00
Sanjay Patel 55b9f88ecc [InstCombine] allow (0 - x) & 1 --> x & 1 for vectors
llvm-svn: 308098
2017-07-15 15:29:47 +00:00
Sanjay Patel 27339133a7 [InstCombine] remove dead code/tests; NFCI
These patterns and tests were added to InstSimplify with:
https://reviews.llvm.org/rL303004

llvm-svn: 308096
2017-07-15 15:01:33 +00:00
Chandler Carruth d78a38ed2e Revert r308078 (and subsequent tweak in r308079) which introduces a test
that appears to exhibit non-determinism and is flaking on the bots
pretty consistently.

r308078: [ThinLTO] Ensure we always select the same function copy to import
r308079: Require asserts in new test that uses debug flag
llvm-svn: 308095
2017-07-15 13:50:26 +00:00
Florian Hahn ad993521ac [LoopInterchange] Add some optimization remarks.
Reviewers: anemet, karthikthecool, blitz.opensource

Reviewed By: anemet

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D35122

llvm-svn: 308094
2017-07-15 13:13:19 +00:00
Dinar Temirbulatov 3c64077c82 [SLPVectorizer] Add an extra parameter to tryScheduleBundle function, NFCI.
llvm-svn: 308081
2017-07-15 05:43:54 +00:00
Teresa Johnson 82b4fb1afe [ThinLTO] Ensure we always select the same function copy to import
Summary:
Check if the first eligible callee is under the instruction threshold.
Checking this on the first eligible callee ensures that we don't end
up selecting different callees to import when we invoke this routine
with different thresholds due to reaching the callee via paths that
are shallower or hotter (when there are multiple copies, i.e. with
weak or linkonce linkage). We don't want to leave the decision of which
copy to import up to the backend.

Reviewers: mehdi_amini

Subscribers: inglorion, fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D35436

llvm-svn: 308078
2017-07-15 04:53:05 +00:00
Geoff Berry f7d5daa0c0 [EarlyCSE] Handle calls with no MemorySSA info.
Summary:
When checking for memory dependencies between calls using MemorySSA,
handle cases where the calls have no MemoryAccess associated with them
because the AA analysis being used has determined that the call does not
read/write memory.

Fixes PR33756

Reviewers: dberlin, davide

Subscribers: mcrosier, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D35317

llvm-svn: 308051
2017-07-14 20:13:21 +00:00
Haicheng Wu 476adcca6b [JumpThreading] Add a pattern to TryToUnfoldSelectInCurrBB()
Add the following pattern to TryToUnfoldSelectInCurrBB()

bb:
   %p = phi [0, %bb1], [1, %bb2], [0, %bb3], [1, %bb4], ...
   %c = cmp %p, 0
   %s = select %c, trueval, falseval

The Select in the above pattern will be unfolded and then jump-threaded. The
current implementation does not allow CMP in the middle of PHI and Select.

Differential Revision: https://reviews.llvm.org/D34762

llvm-svn: 308050
2017-07-14 19:16:47 +00:00
Jakub Kuderski b292c22c8d [Dominators] Make IsPostDominator a template parameter
Summary:
DominatorTreeBase used to have IsPostDominators (bool) member to indicate if the tree is a dominator or a postdominator tree. This made it possible to switch between the two 'modes' at runtime, but it isn't used in practice anywhere.

This patch makes IsPostDominator a template argument. This way, it is easier to switch between different algorithms at compile-time based on this argument and design external utilities around it. It also makes it impossible to incidentally assign a postdominator tree to a dominator tree (and vice versa), and to further simplify template code in GenericDominatorTreeConstruction.

Reviewers: dberlin, sanjoy, davide, grosser

Reviewed By: dberlin

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D35315

llvm-svn: 308040
2017-07-14 18:26:09 +00:00
Sanjay Patel 3f4db3ea97 [InstCombine] convert bitwise (in)equality checks to logical ops (PR32401)
As discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401

we have a backend transform to undo this:
https://reviews.llvm.org/rL299542

when it's likely that the xor version leads to better codegen, but we want 
this form in IR for better analysis and simplification potential.

llvm-svn: 308031
2017-07-14 15:09:49 +00:00
Max Kazantsev f80ffa1a78 [IRCE] Fix corner case with Start = INT_MAX
When iterating through loop

  for (int i = INT_MAX; i > 0; i--)

We fail to generate the pre-loop for it. It happens because we use the
overflown value in a comparison predicate when identifying whether or not
we need it.

In old logic, we used SLE predicate against Greatest value which exceeds all
seen values of the IV and might be overflown. Now we use the GreatestSeen
value of this IV with SLT predicate.

Also added a test that ensures that a pre-loop is generated for such loops.

Differential Revision: https://reviews.llvm.org/D35347

llvm-svn: 308001
2017-07-14 06:35:03 +00:00
Dinar Temirbulatov 21599fe2de [SLPVectorizer] Add an extra parameter to alreadyVectorized function, NFCI.
llvm-svn: 307996
2017-07-14 03:48:29 +00:00
Simon Pilgrim f32f4be957 Fix unused variable warning on EXPENSIVE_CHECKS release builds. NFCI.
llvm-svn: 307929
2017-07-13 17:10:12 +00:00
Davide Italiano c3dc055780 Reapply [GlobalOpt] Remove unreachable blocks before optimizing a function.
This commit reapplies r307215 now that we found out and fixed
the cause of the cfi test failure (in r307871).

llvm-svn: 307920
2017-07-13 15:40:59 +00:00
Anna Thomas ec9b326569 [RuntimeUnrolling] Update DomTree correctly when exit blocks have successors
Summary:
When we runtime unroll with multiple exit blocks, we also need to update the
immediate dominators of the immediate successors of the exit blocks.

Reviewers: reames, mkuper, mzolotukhin, apilipenko

Reviewed by: mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35304

llvm-svn: 307909
2017-07-13 13:21:23 +00:00
Xinliang David Li f564c6959e [PGO] Enhance pgo counter promotion
This is an incremental change to the promotion feature.

There are two problems with the current behavior:
1) loops with multiple exiting blocks are totally disabled
2) a counter update can only be promoted one level up in
  the loop nest -- which does help much for short trip
  count inner loops inside a high trip-count outer loops.

Due to this limitation, we still saw very large profile
count fluctuations from run to run for the affected loops
which are usually very hot.

This patch adds the support for promotion counters iteratively
across the loop nest. It also turns on the promotion for
loops with multiple exiting blocks (with a limit).

For single-threaded applications, the performance impact is flat
on average. For instance, dealII improves, but povray regresses.

llvm-svn: 307863
2017-07-12 23:27:44 +00:00
Anna Thomas 8e431a9851 [LoopUnrollRuntime] NFC: Refactored safety checks of unrolling multi-exit loop
Refactored the code and separated out a function
`canSafelyUnrollMultiExitLoop` to reduce redundant checks and make it
easier to add profitability heuristics later.
Added tests to runtime unrolling to make sure that unrolling for
multi-exit loops is not done unless the option
-unroll-runtime-multi-exit is true.

llvm-svn: 307843
2017-07-12 20:55:43 +00:00
Sam Clegg fd5ab25ae1 Remove unneeded use of #undef DEBUG_TYPE. NFC
Where is is needed (at the end of headers that define it), be
consistent about its use.

Also fix a few header guards that I found in the process.

Differential Revision: https://reviews.llvm.org/D34916

llvm-svn: 307840
2017-07-12 20:49:21 +00:00
Michael Kuperstein fdb46b2fb4 [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions.
This fixes PR33706.
Differential Revision: https://reviews.llvm.org/D35227

llvm-svn: 307837
2017-07-12 19:53:55 +00:00
Jakub Kuderski b323f4f173 [LoopRotate] Fix DomTree update logic for unreachable nodes. Fix PR33701.
Summary:
LoopRotate manually updates the DoomTree by iterating over all predecessors of a basic block and computing the Nearest Common Dominator.

When a predecessor happens to be unreachable, `DT.findNearestCommonDominator` returns nullptr.

This patch teaches LoopRotate to handle this case and fixes [[ https://bugs.llvm.org/show_bug.cgi?id=33701 | PR33701 ]].

In the future, LoopRotate should be taught to use the new incremental API for updating the DomTree.

Reviewers: dberlin, davide, uabelho, grosser

Subscribers: efriedma, mzolotukhin

Differential Revision: https://reviews.llvm.org/D35074

llvm-svn: 307828
2017-07-12 18:42:16 +00:00
Peter Collingbourne cacac6a104 LowerTypeTests: When importing functions skip definitions where the summary contains a decl.
This normally indicates mixed CFI + non-CFI compilation, and will
result in us treating the function in the same way as a function
defined outside of the LTO unit.

Part of PR33752.

Differential Revision: https://reviews.llvm.org/D35281

llvm-svn: 307744
2017-07-12 00:39:12 +00:00
Davide Italiano b8ad3eebca [IPO] Temporarily rollback r307215.
[GlobalOpt] Remove unreachable blocks before optimizing a function.
While the change is presumably correct, it exposes a latent bug
in DI which breaks on of the CFI checks. I'll analyze it further
and try to understand what's going on.

llvm-svn: 307729
2017-07-11 23:10:17 +00:00
Konstantin Zhuravlyov bb80d3e1d3 Enhance synchscope representation
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
  global and local memory. These scopes restrict how synchronization is
  achieved, which can result in improved performance.

  This change extends existing notion of synchronization scopes in LLVM to
  support arbitrary scopes expressed as target-specific strings, in addition to
  the already defined scopes (single thread, system).

  The LLVM IR and MIR syntax for expressing synchronization scopes has changed
  to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
  replaces *singlethread* keyword), or a target-specific name. As before, if
  the scope is not specified, it defaults to CrossThread/System scope.

  Implementation details:
    - Mapping from synchronization scope name/string to synchronization scope id
      is stored in LLVM context;
    - CrossThread/System and SingleThread scopes are pre-defined to efficiently
      check for known scopes without comparing strings;
    - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
      the bitcode.

Differential Revision: https://reviews.llvm.org/D21723

llvm-svn: 307722
2017-07-11 22:23:00 +00:00
Anna Thomas bafe766f5d [LoopUnrollRuntime] NFC: Add some debugging trace messages for why loop wasn't unrolled.
llvm-svn: 307705
2017-07-11 20:44:37 +00:00
Davide Italiano ee1c82112e [NewGVN] Check for congruency of memory accesses.
This is fine as nothing in the code relies on leader and memory
leader being the same for a given congruency class. Ack'ed by
Dan.

Fixes PR33720.

llvm-svn: 307699
2017-07-11 19:49:12 +00:00
Davide Italiano 67b0e53dc1 [NewGVN] Fix an innocent typo I found while debugging PR33720.
llvm-svn: 307694
2017-07-11 19:19:45 +00:00
Davide Italiano fb4544cd15 [NewGVN] Clarify the function invariants formatting them properly.
llvm-svn: 307692
2017-07-11 19:15:36 +00:00
Evgeniy Stepanov 3d5ea713f7 [msan] Only check shadow memory for operands that are sized.
Fixes PR33347: https://bugs.llvm.org/show_bug.cgi?id=33347.

Differential Revision: https://reviews.llvm.org/D35160

Patch by Matt Morehouse.

llvm-svn: 307684
2017-07-11 18:13:52 +00:00
Anna Thomas 5526a33f4f [LoopUnrollRuntime] Avoid multi-exit nested loop with epilog generation
The loop structure for the outer loop does not contain the epilog
preheader when we try to unroll inner loop with multiple exits and
epilog code is generated. For now, we just bail out in such cases.
Added a test case that shows the problem. Without this bailout, we would
trip on assert saying LCSSA form is incorrect for outer loop.

llvm-svn: 307676
2017-07-11 17:16:33 +00:00
Dinar Temirbulatov 09b6779709 [SLPVectorizer] Revert change in cancelScheduling with referencing to FirstInBundle, NFCI.
llvm-svn: 307667
2017-07-11 15:54:50 +00:00
Hiroshi Inoue 0ca79dcf4b fix typos in comments; NFC
llvm-svn: 307626
2017-07-11 06:04:59 +00:00
Chandler Carruth 01f0c8a8c4 [PM/ThinLTO] Fix PR33536, a bug where the ThinLTO bitcode writer was
querying for analysis results on a function declaration rather than
a definition.

The only reason this worked previously is by chance -- because the way
we got alias analysis results with the legacy PM, we happened to not
compute a dominator tree and so we happened to not hit an assert even
though it didn't make any real sense. Now we bail out before trying to
compute alias analysis so that we don't hit these asserts.

llvm-svn: 307625
2017-07-11 05:39:20 +00:00
Leo Li 93abd7d915 [ConstantHoisting] Remove dupliate logic in constant hoisting
Summary:
As metioned in https://reviews.llvm.org/D34576, checkings in
`collectConstantCandidates` can be replaced by using
`llvm::canReplaceOperandWithVariable`.

The only special case is that `collectConstantCandidates` return false for
all `IntrinsicInst` but it is safe for us to collect constant candidates from
`IntrinsicInst`.

Reviewers: pirama, efriedma, srhines

Reviewed By: efriedma

Subscribers: llvm-commits, javed.absar

Differential Revision: https://reviews.llvm.org/D34921

llvm-svn: 307587
2017-07-10 20:45:34 +00:00
Davide Italiano a7a77540ef [NewGVN] Simplify a lambda a little bit. NFCI.
llvm-svn: 307586
2017-07-10 20:45:00 +00:00
Serge Guelton f6329ec2e9 Fix invalid cast in instcombine UMul/ZExt idiom
Fixes https://bugs.llvm.org/show_bug.cgi?id=25454

Do not assume IRBuilder creates Instruction where it can create Value.
Do not assume idiom operands are constant, leave generalisation ot the IRBuilder.

Differential Revision: https://reviews.llvm.org/D35114

llvm-svn: 307554
2017-07-10 16:51:40 +00:00
Anna Thomas 70ffd65ca9 [LoopUnrollRuntime] Remove strict assert about VMap requirement
When unrolling under multiple exits which is under off-by-default option,
the assert that checks for VMap entry in loop exit values is too strong.
(assert if VMap entry did not exist, the value should be a
constant). However, values derived from
constants or from values outside loop, does not have a VMap entry too.

Removed the assert and added a testcase showcasing the property for
non-constant values.

llvm-svn: 307542
2017-07-10 15:29:38 +00:00
Mikael Holmen e0ced14449 [ArgumentPromotion] Change use of removed argument in llvm.dbg.value to undef
Summary:
This solves PR33641.

When removing a dead argument we must also handle possibly existing calls
to llvm.dbg.value that use the removed argument. Now we change the use
of the otherwise dead argument to an undef for some other pass to cleanup
later.

If the calls are left untouched, they will later on cause errors:
 "function-local metadata used in wrong function"
since the ArgumentPromotion rewrites the code by creating a new function
with the wanted signature, but the metadata is not recreated so the new
function may then erroneously use metadata from the old function.

Reviewers: mstorsjo, rnk, arsenm

Reviewed By: rnk

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D34874

llvm-svn: 307521
2017-07-10 06:07:24 +00:00
Craig Topper fde4723ebe [IR] Add Type::isIntOrIntVectorTy(unsigned) similar to the existing isIntegerTy(unsigned), but also works for vectors.
llvm-svn: 307492
2017-07-09 07:04:03 +00:00
Craig Topper 95d2347ae1 [IR] Make use of Type::isPtrOrPtrVectorTy/isIntOrIntVectorTy/isFPOrFPVectorTy to shorten code. NFC
llvm-svn: 307491
2017-07-09 07:04:00 +00:00
Hiroshi Inoue 713b5ba2de fix trivial typos; NFC
sucessor -> successor 

llvm-svn: 307488
2017-07-09 05:54:44 +00:00
Chandler Carruth bd9c29039e [PM] Finish implementing and fix a chain of bugs uncovered by testing
the invalidation propagation logic from an SCC to a Function.

I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.

However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.

Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.

Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.

llvm-svn: 307487
2017-07-09 03:59:31 +00:00
Craig Topper e79b3e7d9a [InstCombine] Speculatively implement a fix for what might be the root cause of PR33721 by making sure that we have integer types before doing select C, -1, 0 -> sext C to int
I recently changed m_One and m_AllOnes to use Constant::isOneValue/isAllOnesValue which work on floating point values too. The original implementation looked specifically for ConstantInt scalars and splats. So I'm guessing we are accidentally trying to issue sext/zexts on floating point types now.

Hopefully I figure out how to reproduce the failure from the PR soon.

llvm-svn: 307486
2017-07-09 03:25:17 +00:00
Max Kazantsev b9edcbcb1d Re-enable "[IndVars] Canonicalize comparisons between non-negative values and indvars"
The patch was reverted due to a bug. The bug was that if the IV is the 2nd operand of the icmp
instruction, then the "Pred" variable gets swapped and differs from the instruction's predicate.
In this patch we use the original predicate to do the transformation.

Also added a test case that exercises this situation.

Differentian Revision: https://reviews.llvm.org/D35107

llvm-svn: 307477
2017-07-08 17:17:30 +00:00
Craig Topper bb4069e439 [InstCombine] Make InstCombine's IRBuilder be passed by reference everywhere
Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference.

This patch makes it a reference everywhere including the InstCombiner class itself so there is more inconsistency. This a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do.

llvm-svn: 307451
2017-07-07 23:16:26 +00:00
Dehao Chen 64c46574b0 Increase the import-threshold for crtical functions.
Summary: For interative sample-pgo, if a hot call site is inlined in the profiling binary, we should inline it in before profile annotation in the backend. Before that, the compile phase first collects all GUIDs that needs to be imported and creates virtual "hot" call edge in the summary. However, "hot" is not good enough to guarantee the callsites get inlined. This patch introduces "critical" call edge, and assign much higher importing threshold for those edges.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: sanjoy, mehdi_amini, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D35096

llvm-svn: 307439
2017-07-07 21:01:00 +00:00
Anna Thomas e3872003d0 [LoopUnrollRuntime] Support multiple exit blocks unrolling when prolog remainder generated
With the NFC refactoring in rL307417 (git SHA 987dd01), all the logic
is in place to support multiple exit/exiting blocks when prolog
remainder is generated.
This patch removed the assert that multiple exit blocks unrolling is only
supported when epilog remainder is generated.

Also, added test runs and checks with PROLOG prefix in
runtime-loop-multiple-exits.ll test cases.

llvm-svn: 307435
2017-07-07 20:12:32 +00:00
Davide Italiano 4eb210bdeb [Local] Update the comment for removeUnreachableBlocks.
It referenced a wrong function name, and didn't mention what the
second argument did. This should be slightly more accurate now.

llvm-svn: 307425
2017-07-07 18:54:14 +00:00
Gor Nishanov 8cdf648795 [cloning] Do not duplicate types when cloning functions
Summary:
This is an addon to the change rl304488 cloning fixes. (Originally rl304226 reverted rl304228 and reapplied rl304488 https://reviews.llvm.org/D33655)

rl304488 works great when DILocalVariables that comes from the inlined function has a 'unique-ed' type, but,
in the case when the variable type is distinct we will create a second DILocalVariable in the scope of the original function that was inlined.

Consider cloning of the following function:
```
define private void @f() !dbg !5 {
  %1 = alloca i32, !dbg !11
  call void @llvm.dbg.declare(metadata i32* %1, metadata !14, metadata !12), !dbg !18
  ret void, !dbg !18
}

!14 = !DILocalVariable(name: "inlined", scope: !15, file: !6, line: 5, type: !17) ; came from an inlined function
!15 = distinct !DISubprogram(name: "inlined", linkageName: "inlined", scope: null, file: !6, line: 8, type: !7, isLocal: true, isDefinition: true, scopeLine: 9, isOptimized: false, unit: !0, variables: !16)
!16 = !{!14}
!17 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "some_struct", size: 32, align: 32)
```

Without this fix, when function 'f' is cloned, we will create another DILocalVariable for "inlined", due to its type being distinct.

```
define private void @f.1() !dbg !23 {
  %1 = alloca i32, !dbg !26
  call void @llvm.dbg.declare(metadata i32* %1, metadata !28, metadata !12), !dbg !30
  ret void, !dbg !30
}

!14 = !DILocalVariable(name: "inlined", scope: !15, file: !6, line: 5, type: !17)
!15 = distinct !DISubprogram(name: "inlined", linkageName: "inlined", scope: null, file: !6, line: 8, type: !7, isLocal: true, isDefinition: true, scopeLine: 9, isOptimized: false, unit: !0, variables: !16)
!16 = !{!14}
!17 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "some_struct", size: 32, align: 32)
 ;
!28 = !DILocalVariable(name: "inlined", scope: !15, file: !6, line: 5, type: !29) ; OOPS second DILocalVariable
!29 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "some_struct", size: 32, align: 32)
```

Now we have two DILocalVariable for "inlined" within the same scope. This result in assert in AsmPrinter/DwarfDebug.h:131: void llvm::DbgVariable::addMMIEntry(const llvm::DbgVariable &): Assertion `V.Var == Var && "conflicting variable"' failed.
(Full example: See: https://bugs.llvm.org/show_bug.cgi?id=33492)

In this change we prevent duplication of types so that when a metadata for DILocalVariable is cloned it will get uniqued to the same metadate node as an original variable.

Reviewers: loladiro, dblaikie, aprantl, echristo

Reviewed By: loladiro

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D35106

llvm-svn: 307418
2017-07-07 18:24:20 +00:00
Anna Thomas 734ab3f75c [LoopUnrollRuntime] NFC: use the precomputed loop exit in ConnectProlog
Minor refactoring to use the preexisting loop exit that's already
calculated. We do not need to recompute the loop exit in ConnectProlog.
Apart from avoiding redundant computation, this is required for
supporting multiple loop exits when Prolog remainder loops are generated.

llvm-svn: 307417
2017-07-07 18:05:28 +00:00
Yaxun Liu b909f11a31 [InferAddressSpaces] Fix assertion about null pointer
InferAddressSpaces does not check address space in collectFlatAddressExpressions,
which causes values with non flat address space put into Postorder and causes
assertion in cloneValueWithNewAddressSpace.

This patch fixes assertion in OpenCL 2.0 conformance test generic_address_space
subtest for amdgcn target.

Differential Revision: https://reviews.llvm.org/D34991

llvm-svn: 307349
2017-07-07 02:40:13 +00:00
Sean Fertile 9cd1cdf814 Extend memcpy expansion in Transform/Utils to handle wider operand types.
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.

Differential revision: https://reviews.llvm.org/D32536

llvm-svn: 307346
2017-07-07 02:00:06 +00:00
Evgeniy Stepanov 7d3eeaaa96 Revert r307342, r307343.
Revert "Copy arguments passed by value into explicit allocas for ASan."
Revert "[asan] Add end-to-end tests for overflows of byval arguments."

Build failure on lldb-x86_64-ubuntu-14.04-buildserver.
Test failure on clang-cmake-aarch64-42vma and sanitizer-x86_64-linux-android.

llvm-svn: 307345
2017-07-07 01:31:23 +00:00
Evgeniy Stepanov 2a7a4bc1c9 Copy arguments passed by value into explicit allocas for ASan.
ASan determines the stack layout from alloca instructions. Since
arguments marked as "byval" do not have an explicit alloca instruction, ASan
does not produce red zones for them. This commit produces an explicit alloca
instruction and copies the byval argument into the allocated memory so that red
zones are produced.

Patch by Matt Morehouse.

Differential revision: https://reviews.llvm.org/D34789

llvm-svn: 307342
2017-07-07 00:48:25 +00:00
Wei Mi 7586755013 [ConstHoisting] Turn on consthoist-with-block-frequency by default.
Using profile information to guide consthoisting is generally helpful for
performance, so the patch turns it on by default. No compile time or perf
regression were found using spec2000 and spec2006 on x86.  Some significant
improvement (>20%) was seen on internal benchmarks.

Differential Revision: https://reviews.llvm.org/D35063

llvm-svn: 307338
2017-07-07 00:11:05 +00:00
Craig Topper cb22039bee [InstCombine] No need to pass DataLayout to helper functions if we're passing the InstCombiner object. We can just ask it for the DataLayout. NFC
llvm-svn: 307333
2017-07-06 23:18:43 +00:00
Craig Topper 4853c4304b [InstCombine] Remove unused arguments from some helper functions. NFC
llvm-svn: 307332
2017-07-06 23:18:42 +00:00
Craig Topper 2bb9f0f620 [InstCombine] Change a couple helper functions to only take the IRBuilder as an argument and not the whole InstCombiner object. NFC
llvm-svn: 307331
2017-07-06 23:18:41 +00:00
Wei Mi 20526b2725 [ConstHoisting] choose to hoist when frequency is the same.
The patch is to adjust the strategy of frequency based consthoisting:
Previously when the candidate block has the same frequency with the existing
blocks containing a const, it will not hoist the const to the candidate block.
For that case, now we change the strategy to hoist the const if only existing
blocks have more than one block member. This is helpful for reducing code size.

Differential Revision: https://reviews.llvm.org/D35084

llvm-svn: 307328
2017-07-06 22:32:27 +00:00
Davide Italiano f4891d29f8 [lib/LTO] Add a comment to explain where we set the linkage in the summary.
Pointed out by Teresa!

llvm-svn: 307305
2017-07-06 20:04:20 +00:00
Davide Italiano 6a5fbe52fa [LTO] Fix the interaction between linker redefined symbols and ThinLTO
This is the same as r304719 but for ThinLTO.
The substantial difference is that in this case we don't have
whole visibility, just the summary.
In the LTO case, when we got the resolution for the input file we
could just see if the linker told us whether a symbol was linker
redefined (using --wrap or --defsym) and switch the linkage directly
for the GV.

Here, we have the summary. So, we record that the linkage changed
from <whatever it was> to $weakany to prevent IPOs across this symbol
boundaries and actually just switch the linkage at FunctionImport time.

This patch should also fixes the lld bits (as all the scaffolding for
communicating if a symbol is linker redefined should be there & should
be the same), but I'll make sure to add some tests there as well.

Fixes PR33192.

Differential Revision:  https://reviews.llvm.org/D35064

llvm-svn: 307303
2017-07-06 19:58:26 +00:00
Craig Topper e9bf7ebacf [InstCombine] Remove include of DIBuilder.h and Dwarf.h as they don't appear to be necessary.
llvm-svn: 307295
2017-07-06 18:47:47 +00:00
Leo Li 5499b1b8be Modify constraints in `llvm::canReplaceOperandWithVariable`
Summary:
`Instruction::Switch`: only first operand can be set to a non-constant value.
`Instruction::InsertValue` both the first and the second operand can be set to a non-constant value.
`Instruction::Alloca` return true for non-static allocation.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: srhines, pirama, llvm-commits

Differential Revision: https://reviews.llvm.org/D34905

llvm-svn: 307294
2017-07-06 18:47:05 +00:00
Craig Topper ca2c87653c [Constants] Replace calls to ConstantInt::equalsInt(0)/equalsInt(1) with isZero and isOne. NFCI
llvm-svn: 307293
2017-07-06 18:39:49 +00:00
Craig Topper 79ab643da8 [Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.

llvm-svn: 307292
2017-07-06 18:39:47 +00:00
Anna Thomas eb6d5d1950 [LoopUnrollRuntime] Bailout when multiple exiting blocks to the unique latch exit block
Currently, we do not support multiple exiting blocks to the
latch exit block. However, this bailout wasn't triggered when we had a
unique exit block (which is the latch exit), with multiple exiting
blocks to that unique exit.

Moved the bailout so that it's triggered in both cases and added
testcase.

llvm-svn: 307291
2017-07-06 18:39:26 +00:00
Craig Topper 47c8f66997 [InstCombine] Remove Builder argument from InstCombiner::tryFactorization. NFC
Builder is already a member of the InstCombiner class so we can use it with passing it.

llvm-svn: 307290
2017-07-06 18:35:52 +00:00
Craig Topper dfd01ea9ed [SimplifyCFG] Move a portion of an if statement that should already be implied to an assert
Summary: In this code we got to Dom by following the predecessor link of BB. So it stands to reason that BB should also show up as a successor of Dom's terminator right? There isn't a way to have the CFG connect in only one direction is there?

Reviewers: jmolloy, davide, mcrosier

Reviewed By: mcrosier

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D35025

llvm-svn: 307276
2017-07-06 16:29:43 +00:00
Craig Topper 95e4142f94 [InstCombine] Change helper method to a file local static method. NFC
llvm-svn: 307275
2017-07-06 16:24:23 +00:00
Craig Topper fc42acef92 [InstCombine] Clarify comment to mention other transform that it does. NFC
llvm-svn: 307274
2017-07-06 16:24:22 +00:00
Craig Topper 22795de20a [InstCombine] Add single use checks to SimplifyBSwap to ensure we are really saving instructions
Bswap isn't a simple operation so we need to make sure we are really removing a call to it before doing these simplifications.

For the case when both LHS and RHS are bswaps I've allowed it to be moved if either LHS or RHS has a single use since that at least allows us to move it later where it might find another bswap to combine with and it decreases the use count on the other side so maybe the other user can be optimized.

Differential Revision: https://reviews.llvm.org/D34974

llvm-svn: 307273
2017-07-06 16:24:21 +00:00
Craig Topper 3e1909d797 [InstCombine] Don't create extra ConstantInt objects in foldSelectICmpAnd. NFCI
Instead just use APInt objects and only create a ConstantInt at the end if we need it for the Offset.

llvm-svn: 307270
2017-07-06 15:58:54 +00:00
Wei Mi 90707394e3 [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.
When the formulae search space is huge, LSR uses a series of heuristic to keep
pruning the search space until the number of possible solutions are within
certain limit.

The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs,
which picks the register which is used by the most LSRUses and deletes the other
formulae which don't use the register. This is a effective way to prune the search
space, but quite often not a good way to keep the best solution. We saw cases before
that the heuristic pruned the best formula candidate out of search space.

To relieve the problem, we introduce a new heuristic called
NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to
reduce the search space while keeping the best formula, we want to keep as many
formulae with different Scale and ScaledReg as possible. That is because the central
idea of LSR is to choose a group of loop induction variables and use those induction
variables to represent LSRUses. An induction variable candidate is often represented
by the Scale and ScaledReg in a formula. If we have more formulae with different
ScaledReg and Scale to choose, we have better opportunity to find the best solution.
That is why we believe pruning search space by only keeping the best formula with the
same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use
two criteria to choose the best formula with the same Scale and ScaledReg. The first
criteria is to select the formula using less non shared registers, and the second
criteria is to select the formula with less cost got from RateFormula. The patch
implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the
last resort.

Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly
testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help
on the two improved internal cases we saw.

Differential Revision: https://reviews.llvm.org/D34583

llvm-svn: 307269
2017-07-06 15:52:14 +00:00
Max Kazantsev 98838527c6 Revert "Revert "Revert "[IndVars] Canonicalize comparisons between non-negative values and indvars"""
It appears that the problem is still there. Needs more analysis to understand why
SaturatedMultiply test fails.

llvm-svn: 307249
2017-07-06 10:47:13 +00:00
Max Kazantsev c8db20b78c Revert "Revert "[IndVars] Canonicalize comparisons between non-negative values and indvars""
It seems that the patch was reverted by mistake. Clang testing showed failure of the
MathExtras.SaturatingMultiply test, however I was unable to reproduce the issue on the
fresh code base and was able to confirm that the transformation introduced by the change
does not happen in the said test. This gives a strong confidence that the actual reason of
the failure of the initial patch was somewhere else, and that problem now seems to be
fixed. Re-submitting the change to confirm that.

llvm-svn: 307244
2017-07-06 09:57:41 +00:00
Frederich Munch 52dfcd18d1 Avoid constructing GlobalExtensions only to find out it is empty.
Summary:
GlobalExtensions is dereferenced twice, once for iteration and then a check if it is empty.
As a ManagedStatic this dereference forces it's construction which is unnecessary.

Reviewers: efriedma, davide, mehdi_amini

Reviewed By: mehdi_amini

Subscribers: chapuni, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D33381

llvm-svn: 307229
2017-07-06 00:09:09 +00:00
Davide Italiano 7dd0694f96 [GlobalOpt] Remove unreachable blocks before optimizing a function.
LLVM's definition of dominance allows instructions that are cyclic
in unreachable blocks, e.g.:

  %pat = select i1 %condition, @global, i16* %pat

because any instruction dominates an instruction in a block that's
not reachable from entry.
So, remove unreachable blocks from the function, because a) there's
no point in analyzing them and b) GlobalOpt should otherwise grow
some more complicated logic to break these cycles.

Differential Revision:  https://reviews.llvm.org/D35028

llvm-svn: 307215
2017-07-05 22:28:28 +00:00
Craig Topper cc418b656a [InstCombine] Use CmpInst::Predicate with m_Cmp instead of ICmpInst::Predicate. NFC
There isn't really an ICmpInst version so we're just accessing the CmpInst version through inheritance.

llvm-svn: 307199
2017-07-05 20:31:00 +00:00
Dinar Temirbulatov b78adec638 [SLPVectorizer] Add an extra parameter to cancelScheduling function, NFCI.
llvm-svn: 307158
2017-07-05 13:53:03 +00:00
David Green b26a0a460c [IndVarSimplify] Add AShr exact flags using induction variables ranges.
This adds exact flags to AShr/LShr flags where we can statically
prove it is valid using the range of induction variables. This
allows further optimisations to remove extra loads.

Differential Revision: https://reviews.llvm.org/D34207

llvm-svn: 307157
2017-07-05 13:25:58 +00:00
Max Kazantsev ebe56283bc Revert "[IndVars] Canonicalize comparisons between non-negative values and indvars"
This patch seems to cause failures of test MathExtras.SaturatingMultiply on
multiple buildbots. Reverting until the reason of that is clarified.

Differential Revision: https://reviews.llvm.org/rL307126

llvm-svn: 307135
2017-07-05 09:44:41 +00:00
Max Kazantsev 80bc4a5554 [IndVars] Canonicalize comparisons between non-negative values and indvars
-If there is a IndVar which is known to be non-negative, and there is a value which is also non-negative,
then signed and unsigned comparisons between them produce the same result. Both of those can be
seen in the same loop. To allow other optimizations to simplify them, we turn all instructions like

  %c = icmp slt i32 %iv, %b
to

  %c = icmp ult i32 %iv, %b

if both %iv and %b are known to be non-negative.

Differential Revision: https://reviews.llvm.org/D34979

llvm-svn: 307126
2017-07-05 06:38:49 +00:00
Anna Thomas ada4ddc0bc [LoopDeletion] NFC: Add loop being analyzed debug statement
llvm-svn: 307096
2017-07-04 17:00:03 +00:00
Anna Thomas 90f69abc8b [LoopDeletion] NFC: Add debug statements to the optimization
We have a DEBUG option for loop deletion, but no related debug messages.
Added some debug messages to state why loop deletion failed.

llvm-svn: 307078
2017-07-04 14:05:19 +00:00
Craig Topper 0f746c2793 [InstCombine] Add TODOs for a couple things that should maybe be in InstSimplify instead. NFC
llvm-svn: 307065
2017-07-04 06:50:48 +00:00
Florian Hahn 4eeff394d3 [LoopInterchange] Add more debug messages to currentLimitations().
Summary: This makes it easier to find out which limitation prevented this pass from doing its work.

Reviewers: karthikthecool, mzolotukhin, efriedma, mcrosier

Reviewed By: mcrosier

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D34940

llvm-svn: 307035
2017-07-03 15:32:00 +00:00
Benjamin Kramer fb620493e1 Revert "[GVN] Recommit the patch "Add phi-translate support in scalarpre"."
This reverts commit r306313. This breaks selfhost at -O3 and PR33652.
Let me know if you need additional information on reproducing the issue.

llvm-svn: 307021
2017-07-03 12:23:10 +00:00
Craig Topper 8036970008 [InstCombine] Add a TODO for a probable missing single use check. NFC
Will try to fix it soon, but in case I forget.

llvm-svn: 307003
2017-07-03 05:54:16 +00:00
Craig Topper 766ce6e9cf [InstCombine] Support BITWISE_OP( BSWAP(x), CONSTANT ) -> BSWAP( BITWISE_OP(x, BSWAP(CONSTANT) ) ) for splat vectors.
llvm-svn: 307002
2017-07-03 05:54:15 +00:00
Craig Topper 32fce4d647 [InstCombine] Remove support for BITWISE_OP(CONSTANT, BSWAP(x)) -> BSWAP(OP(BSWAP(CONSTANT), x)).
Constants were already canonicalized to the right hand side before we got here.

llvm-svn: 307000
2017-07-03 05:54:13 +00:00
Craig Topper 1e4643a98e [InstCombine] Support BITWISE_OP(BSWAP(A),BSWAP(B))->BSWAP(BITWISE_OP(A, B)) for vectors.
llvm-svn: 306999
2017-07-03 05:54:13 +00:00
Craig Topper c6948c25cc [InstCombine] Remove an if that should have been guaranteed by the caller. Replace with an assert. NFC
llvm-svn: 306997
2017-07-03 05:54:11 +00:00
Simon Pilgrim df2657ac2d [InstCombine] Use m_BitReverse pattern match helper. NFCI.
llvm-svn: 306986
2017-07-02 16:31:16 +00:00
Sanjay Patel b51e072d35 [InstCombine] fix crash when folding cmp+bswap vector
We assumed the constant was a scalar when creating the replacement operand.

Also, improve tests for this fold and move the tests for this fold to their own file.
I'll move the related and missing tests to this file as a follow-up.  

llvm-svn: 306985
2017-07-02 16:05:11 +00:00
Sanjay Patel c3d5cf0bb7 [InstCombine] look through bswap/bitreverse for equality comparisons
I noticed this missed bswap optimization in the CGP memcmp() expansion, 
and then I saw that we don't have the fold in InstCombine.

Differential Revision: https://reviews.llvm.org/D34763

llvm-svn: 306980
2017-07-02 14:34:50 +00:00
Hiroshi Inoue bb703e8960 fix trivial typos; NFC
suport -> support

llvm-svn: 306968
2017-07-02 03:24:54 +00:00
Craig Topper f60ab47098 [InstCombine] Fold (a | b) ^ (~a | ~b) --> ~(a ^ b) and (a & b) ^ (~a & ~b) --> ~(a ^ b)
Summary:
I came across this while thinking about what would happen if one of the operands in this xor pattern was itself a inverted (A & ~B) ^ (~A & B)-> (A^B).

The patterns here assume that the (~a | ~b) will be demorganed to ~(a & b) first. Though I wonder if there's a multiple use case that would prevent the demorgan.

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34870

llvm-svn: 306967
2017-07-02 01:15:51 +00:00
Davide Italiano e3f7dda1fb [CodeExtractor] Remove unneded and commented out debugging stmts.
llvm-svn: 306966
2017-07-02 00:07:18 +00:00
Hiroshi Inoue ef1c2ba22a fix trivial typos, NFC
llvm-svn: 306952
2017-07-01 07:12:15 +00:00
Davide Italiano 9282f1aece [Cloner] Re-map simplfied cloned instructions.
This commit pretty much rolls back the logic added in r306495
as in the testcase provided we simplify an `icmp` looking through
a PHI that hasn't been mapped yet.

I think instsimplify shouldn't do threading over select/phis or
just looking through phis in general, but this is what we have
now. Also, add a test to prevent this from happening in case somebody
wants to modify this code again.

Briefly discussed with Kyle Butt (thanks Kyle!).

llvm-svn: 306938
2017-07-01 03:29:33 +00:00
Teresa Johnson 32d95742b8 Recommit "r306541 - Add zero-length check to memcpy/memset load store loop expansion""
With fix for use-after-free errors. We can't add the new branch and
remove the old one until we are done with the Builder constructed for
the block.

llvm-svn: 306937
2017-07-01 03:24:10 +00:00
Teresa Johnson c12306c0ad Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."
This still breaks PPC tests we have. I'll forward reproduction
instructions to dehao.

llvm-svn: 306936
2017-07-01 03:24:09 +00:00
Teresa Johnson eb4fba9d61 re-commit r306336: Enable vectorizer-maximize-bandwidth by default.
Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306935
2017-07-01 03:24:08 +00:00
Teresa Johnson de56903bde revert r306336 for breaking ppc test.
llvm-svn: 306934
2017-07-01 03:24:07 +00:00
Teresa Johnson 1fbaffeba1 Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:

spec/2006/fp/C++/444.namd                 26.84  -0.31%
spec/2006/fp/C++/447.dealII               46.19  +0.89%
spec/2006/fp/C++/450.soplex               42.92  -0.44%
spec/2006/fp/C++/453.povray               38.57  -2.25%
spec/2006/fp/C/433.milc                   24.54  -0.76%
spec/2006/fp/C/470.lbm                    41.08  +0.26%
spec/2006/fp/C/482.sphinx3                47.58  -0.99%
spec/2006/int/C++/471.omnetpp             22.06  +1.87%
spec/2006/int/C++/473.astar               22.65  -0.12%
spec/2006/int/C++/483.xalancbmk           33.69  +4.97%
spec/2006/int/C/400.perlbench             33.43  +1.70%
spec/2006/int/C/401.bzip2                 23.02  -0.19%
spec/2006/int/C/403.gcc                   32.57  -0.43%
spec/2006/int/C/429.mcf                   40.35  +0.27%
spec/2006/int/C/445.gobmk                 26.96  +0.06%
spec/2006/int/C/456.hmmer                  24.4  +0.19%
spec/2006/int/C/458.sjeng                 27.91  -0.08%
spec/2006/int/C/462.libquantum            57.47  -0.20%
spec/2006/int/C/464.h264ref               46.52  +1.35%

geometric mean                                   +0.29%

The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.

I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.

Reviewers: hfinkel, mkuper, davidxl, chandlerc

Reviewed By: chandlerc

Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306933
2017-07-01 03:24:06 +00:00
Dinar Temirbulatov 2fb1075f14 [SLPVectorizer] Add isOdd() helper function, NFCI.
llvm-svn: 306887
2017-06-30 21:16:26 +00:00
Craig Topper bcf511c0da [InstCombine] Replace an unnecessary use of a matcher with just an isa and a cast. NFC
We aren't looking through any levels of IR here so I don't think we need the power of a matcher or the temporary variable it requires.

llvm-svn: 306885
2017-06-30 21:09:34 +00:00
Ayal Zaks 2ff59d4350 [LV] Sink casts to unravel first order recurrence
Check if a single cast is preventing handling a first-order-recurrence Phi,
because the scheduling constraints it imposes on the first-order-recurrence
shuffle are infeasible; but they can be made feasible by moving the cast
downwards. Record such casts and move them when vectorizing the loop.

Differential Revision: https://reviews.llvm.org/D33058

llvm-svn: 306884
2017-06-30 21:05:06 +00:00
Sumanth Gundapaneni 5372f0a73e [SimplifyCFG] Update the name of switch generated lookup table.
This patch appends the name of the function to the switch generated lookup
table. This will ease the visual debugging in identifying the function the table
is generated from.

Differential Revision: https://reviews.llvm.org/D34817

llvm-svn: 306867
2017-06-30 20:00:01 +00:00
Simon Pilgrim 77c3c5f9b8 [InstCombine] Add m_BitReverse pattern match helper. NFCI.
llvm-svn: 306860
2017-06-30 18:58:29 +00:00
Anna Thomas e5e5e59d8b [RuntimeUnrolling] Add logic for loops with multiple exit blocks
Summary:
Runtime unrolling is done for loops with a single exit block and a
single exiting block (and this exiting block should be the latch block).
This patch adds logic to support unrolling in the presence of multiple exit
blocks (which also means multiple exiting blocks).
Currently this is under an off-by-default option and is supported when
epilog code is generated. Support in presence of prolog code will be in
a future patch (we just need to add more tests, and update comments).

This patch is essentially an implementation patch. I have not added any
heuristic (in terms of branches added or code size) to decide when
this should be enabled.

Reviewers: mkuper, sanjoy, reames, evstupac

Reviewed by: reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33001

llvm-svn: 306846
2017-06-30 17:57:07 +00:00
Nikolai Bozhenov bde9b14c6f Revert of r306525: "Canonicalize clamp of float types to minmax"
llvm-svn: 306815
2017-06-30 10:39:09 +00:00
Ayal Zaks 8d26f0a602 [LV] Optimize for size when vectorizing loops with tiny trip count
It may be detrimental to vectorize loops with very small trip count, as various
costs of the vectorized loop body as well as enclosing overheads including
runtime tests and scalar iterations may outweigh the gains of vectorizing. The
current cost model measures the cost of the vectorized loop body only, expecting
it will amortize other costs, and loops with known or expected very small trip
counts are not vectorized at all. This patch allows loops with very small trip
counts to be vectorized, but under OptForSize constraints, which ensure the cost
of the loop body is dominant, having no runtime guards nor scalar iterations.

Patch inspired by D32451.

Differential Revision: https://reviews.llvm.org/D34373

llvm-svn: 306803
2017-06-30 08:02:35 +00:00
Craig Topper 880bf82685 [InstCombine] In foldXorToXor, move the commutable matcher from the LHS match to the RHS match. No meaningful change intended.
There are two conditions ORed here with similar checks and each contain two matches that must be true for the if to succeed. With the commutable match on the first half of the OR then both ifs basically have the same first part and only the second part distinguishs. With this change we move the commutable match to second half and make the first half unique.

This caused some tests to change because we now produce a commuted result, but this shouldn't matter in practice.

llvm-svn: 306800
2017-06-30 07:37:41 +00:00
Chandler Carruth 3545a9e1f9 Remove the BBVectorize pass.
It served us well, helped kick-start much of the vectorization efforts
in LLVM, etc. Its time has come and past. Back in 2014:
http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html

Time to actually let go and move forward. =]

I've updated the release notes both about the removal and the
deprecation of the corresponding C API.

llvm-svn: 306797
2017-06-30 07:09:08 +00:00
Daniel Jasper 3b704ceba1 Revert "r306541 - Add zero-length check to memcpy/memset load store loop expansion"
Segfaults in non-optimized builds. I'll get a stack trace and a
reproducer to Teresa.

llvm-svn: 306793
2017-06-30 06:37:33 +00:00
Daniel Jasper 5ce1ce742e Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."
This still breaks PPC tests we have. I'll forward reproduction
instructions to dehao.

llvm-svn: 306792
2017-06-30 06:32:21 +00:00
Max Kazantsev 8d0322e612 [SCEV] Use depth limit instead of local cache for SExt and ZExt
In rL300494 there was an attempt to deal with excessive compile time on
invocations of getSign/ZeroExtExpr using local caching. This approach only
helps if we request the same SCEV multiple times throughout recursion. But
in the bug PR33431 we see a case where we request different values all the time,
so caching does not help and the size of the cache grows enormously.

In this patch we remove the local cache for this methods and add the recursion
depth limit instead, as we do for arithmetics. This gives us a guarantee that the
invocation sequence is limited and reasonably short.

Differential Revision: https://reviews.llvm.org/D34273

llvm-svn: 306785
2017-06-30 05:04:09 +00:00
Eric Christopher a95aac3751 Reduce indenting and clean up comparisons around sign bit.
llvm-svn: 306781
2017-06-30 01:57:48 +00:00
Eric Christopher 710c1c8faa Reduce the complexity of the signbit/branch test functions.
llvm-svn: 306779
2017-06-30 01:35:31 +00:00
Dehao Chen 2f31d0d86e Hook the sample PGO machinery in the new PM
Summary: This patch hooks up SampleProfileLoaderPass with the new PM.

Reviewers: chandlerc, davidxl, davide, tejohnson

Reviewed By: chandlerc, tejohnson

Subscribers: tejohnson, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D34720

llvm-svn: 306763
2017-06-29 23:33:05 +00:00
Dinar Temirbulatov f05c73c132 [SLPVectorizer] Moving Entry->NeedToGather check out of inner loop,
since it is invariant there. NFCI.

llvm-svn: 306749
2017-06-29 21:56:33 +00:00
Sam Clegg 3d65030c45 Remove `inline` keyword from inline `classof` methods
The style guide states that the explicit `inline`
should not be used with inline methods.  classof is
very common inline method with a fair amount on
inconsistency:

$ git grep classof ./include | grep inline | wc -l
230
$ git grep classof ./include | grep -v inline | wc -l
257

I chose to target this method rather the larger change
since this method is easily cargo-culted (I did it at
least once).  I considered doing the larger change and
removing all occurrences but that would be a much larger
change.

Differential Revision: https://reviews.llvm.org/D33906

llvm-svn: 306731
2017-06-29 19:35:17 +00:00
Xin Tong 02008c30b5 Remove useless header. NFC
llvm-svn: 306712
2017-06-29 17:48:12 +00:00
Leo Li 20fbad9307 [ConstantHoisting] Avoid hoisting constants in GEPs that index into a struct type.
Summary:
Indices for GEPs that index into a struct type should always be
constants. This added more checks in `collectConstantCandidates:` which make
sure constants for GEP pointer type are not hoisted.

This fixed Bug https://bugs.llvm.org/show_bug.cgi?id=33538

Reviewers: ributzka, rnk

Reviewed By: ributzka

Subscribers: efriedma, llvm-commits, srhines, javed.absar, pirama

Differential Revision: https://reviews.llvm.org/D34576

llvm-svn: 306704
2017-06-29 17:03:34 +00:00
Daniel Berlin b7df17ec59 PredicateInfo: Use OrderedInstructions instead of our homemade
version.

llvm-svn: 306703
2017-06-29 17:01:14 +00:00
Daniel Berlin b779db7ebc NewGVN: Remove useless test in addPhiOfOps.
llvm-svn: 306702
2017-06-29 17:01:10 +00:00
Daniel Berlin 7c757aee38 Remove unneeded else from OrderedInstructions::dominates.
llvm-svn: 306701
2017-06-29 17:01:03 +00:00
Dinar Temirbulatov 7b96266a16 [SLPVectorizer] Introducing getTreeEntry() helper function [NFC]
Differential Revision: https://reviews.llvm.org/D34756

llvm-svn: 306655
2017-06-29 08:46:18 +00:00
Craig Topper 798a19ab8e [InstCombine] In visitXor, use m_Not on the instruction itself instead of looking for all ones in Op1. This is consistent with 3 other not checks before this one. NFCI
llvm-svn: 306617
2017-06-29 00:07:08 +00:00
Keno Fischer a236dae5d1 [InstCombine] Retain TBAA when narrowing memory accesses
Summary:
As discussed on the mailing list it is legal to propagate TBAA to loads/stores
from/to smaller regions of a larger load tagged with TBAA. Do so for
(load->extractvalue)=>(gep->load) and similar foldings.

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D31954

llvm-svn: 306615
2017-06-28 23:36:40 +00:00
Ayal Zaks d9bc43ef2a [LV] Fix PR33613 - retain order of insertelement per part
r306381 caused PR33613, by reversing the order in which insertelements were
generated per unroll part. This patch fixes PR33613 by retraining this order,
placing each set of insertelements per part immediately after the last scalar
being packed for this part. Includes a test case derived from PR33613.

Reference: https://bugs.llvm.org/show_bug.cgi?id=33613
Differential Revision: https://reviews.llvm.org/D34760

llvm-svn: 306575
2017-06-28 17:59:33 +00:00
Geoff Berry b0573547f6 [LoopUnroll] Fix bug in computeUnrollCount causing it to not honor MaxCount
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper

Subscribers: mcrosier, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D34532

llvm-svn: 306564
2017-06-28 17:01:15 +00:00
Sanjay Patel 4e96f19052 [InstCombine] use local variable to reduce code; NFCI
llvm-svn: 306560
2017-06-28 16:39:06 +00:00
Geoff Berry 66d9bdbca8 [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper

Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D34531

llvm-svn: 306554
2017-06-28 15:53:17 +00:00
Teresa Johnson 538b8d25f0 Add zero-length check to memcpy/memset load store loop expansion
Summary:
I was testing using this expansion logic in other cases besides
NVPTX, and found some runtime failures due to the lack of a check
for a zero length memcpy/memset before the loop. There is already
such a check in the memmove expansion code though.

Reviewers: hfinkel

Subscribers: jholewinski, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D34707

llvm-svn: 306541
2017-06-28 13:07:37 +00:00
Nikolai Bozhenov b01e6b5a52 [InstCombine] Canonicalize clamp of float types to minmax in fast mode.
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.

This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.

Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper

Reviewed By: efriedma

Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits

Patch by Andrei Elovikov <andrei.elovikov@intel.com>

Differential Revision: https://reviews.llvm.org/D33186

llvm-svn: 306525
2017-06-28 09:26:20 +00:00
Max Kazantsev 6c466a376e [IRCE][NFC] Better get SCEV for 1 in calculateSubRanges
A slightly more efficient way to get constant, we avoid resolving in getSCEV and excessive
invocations, and we don't create a ConstantInt if 'true' branch is taken.

Differential Revision: https://reviews.llvm.org/D34672

llvm-svn: 306503
2017-06-28 04:57:45 +00:00
Kyle Butt f73c8a06a9 Inlining: Don't re-map simplified cloned instructions.
When simplifying an instruction that has been re-mapped, it should never
simplify to an instruction in the original function. In the edge case
where we are inlining a function into itself, the existing code led to
incorrect behavior. Replace the incorrect code with an assert verifying
that we never expect simplification to produce an instruction in the old
function, unless the functions are the same.

Differential Revision: https://reviews.llvm.org/D33850

llvm-svn: 306495
2017-06-28 01:41:25 +00:00
Peter Collingbourne 92648c25a4 Bitcode: Write the irsymtab to disk.
Differential Revision: https://reviews.llvm.org/D33973

llvm-svn: 306487
2017-06-27 23:50:11 +00:00
Geoff Berry 2573a19fe6 [EarlyCSE][MemorySSA] Enable MemorySSA in function-simplification pass of EarlyCSE.
llvm-svn: 306477
2017-06-27 22:25:02 +00:00
Dehao Chen 920d022519 re-commit r306336: Enable vectorizer-maximize-bandwidth by default.
Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306473
2017-06-27 22:05:58 +00:00
Craig Topper 5fe0197622 [InstCombine] Propagate nsw flag when turning mul by pow2 into shift when the constant is a vector splat or the scalar bit width is larger than 64-bits
The check to see if we can propagate the nsw flag used m_ConstantInt(uint64_t*&) which doesn't work with splat vectors and has a restriction that the bitwidth of the ConstantInt must be 64-bits are less.

This patch changes it to use m_APInt to remove both these issues

Differential Revision: https://reviews.llvm.org/D34699

llvm-svn: 306457
2017-06-27 19:57:53 +00:00
Serge Guelton 7bc405aa4c [CodeExtractor] Prevent extraction of block involving blockaddress
BlockAddress are only valid within their function context, which does not
interact well with CodeExtractor. Detect this case and prevent it.

Differential Revision: https://reviews.llvm.org/D33839

llvm-svn: 306448
2017-06-27 18:57:53 +00:00
Yaxun Liu 7c44f340de [SROA] Fix APInt size when alloca address space is not 0
SROA assumes alloca address space is 0, which causes assertion. This patch fixes that.

Differential Revision: https://reviews.llvm.org/D34104

llvm-svn: 306440
2017-06-27 18:26:06 +00:00
Sanjay Patel 7227276d41 [InstCombine] canonicalize icmp predicate feeding select
This canonicalization was suggested in D33172 as a way to make InstCombine behavior more uniform. 
We have this transform for icmp+br, so unless there's some reason that icmp+select should be 
treated differently, we should do the same thing here.

The benefit comes from increasing the chances of creating identical instructions. This is shown in
the tests in logical-select.ll (PR32791). InstCombine doesn't fold those directly, but EarlyCSE 
can simplify the identical cmps, and then InstCombine can fold the selects together.

The possible regression for the tests in select.ll raises questions about poison/undef:
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113261.html

...but that transform is just as likely to be triggered by this canonicalization as it is to be 
missed, so we're just pointing out a commutation deficiency in the pattern matching:
https://reviews.llvm.org/rL228409

Differential Revision: https://reviews.llvm.org/D34242

llvm-svn: 306435
2017-06-27 17:53:22 +00:00
Dehao Chen 66131665c4 Enable ICP for AutoFDO.
Summary: AutoFDO should have ICP enabled.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: sanjoy, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D34662

llvm-svn: 306429
2017-06-27 17:23:33 +00:00
Anna Thomas dc935a6eb6 [LoopUnrollRuntime] Use SCEV exit count for calculating trip count. NFCI
Instead of getBackEdgeTakenCount, use getExitCount on the latch exiting block
(which is proven to be the only exiting block in the loop to be unrolled).

llvm-svn: 306410
2017-06-27 14:14:35 +00:00
Ayal Zaks fc1e210d44 Recommitting 306331.
Undoing revert 306338 after fixed bug: add metadata to the load instead of the
reverse shuffle added to it, retaining the original ValueMap implementation.

llvm-svn: 306381
2017-06-27 08:41:19 +00:00
Chandler Carruth 3f81d8024c [SROA] Fix PR32902 by more carefully propagating !nonnull metadata.
This is based heavily on the work done ni D34285. I mostly wanted to do
test cleanup for the author to save them some time, but I had a really
hard time understanding why it was so hard to write better test cases
for these issues.

The problem is that because SROA does a second rewrite of the loads and
because we *don't* propagate !nonnull for non-pointer loads, we first
introduced invalid !nonnull metadata and then stripped it back off just
in time to avoid most ways of this PR manifesting. Moving to the more
careful utility only fixes this by changing the predicate to look at the
new load's type rather than the target type. However, that *does* fix
the bug, and the utility is much nicer including adding range metadata
to model the nonnull property after a conversion to an integer.

However, we have bigger problems because we don't actually propagate
*range* metadata, and the utility to do this extracted from instcombine
isn't really in good shape to do this currently. It *only* handles the
case of copying range metadata from an integer load to a pointer load.
It doesn't even handle the trivial cases of propagating from one integer
load to another when they are the same width! This utility will need to
be beefed up prior to using in this location to get the metadata to
fully survive.

And even then, we need to go and teach things to turn the range metadata
into an assume the way we do with nonnull so that when we *promote* an
integer we don't lose the information.

All of this will require a new test case that looks kind-of like
`preserve-nonnull.ll` does here but focuses on range metadata. It will
also likely require more testing because it needs to correctly handle
changes to the integer width, especially as SROA actively tries to
change the integer width!

Last but not least, I'm a little worried about hooking the range
metadata up here because the instcombine logic for converting from
a range metadata *to* a nonnull metadata node seems broken in the face
of non-zero address spaces where null is not mapped to the integer `0`.
So that probably needs to get fixed with test cases both in SROA and in
instcombine to cover it.

But this *does* extract the core PR fix from D34285 of preventing the
!nonnull metadata from being propagated in a broken state just long
enough to feed into promotion and crash value tracking.

On D34285 there is some discussion of zero-extend handling because it
isn't necessary. First, the new load size covers all of the non-undef
(ie, possibly initialized) bits. This may even extend past the original
alloca if loading those bits could produce valid data. The only way its
valid for us to zero-extend an integer load in SROA is if the original
code had a zero extend or those bits were undef. And we get to assume
things like undef *never* satifies nonnull, so non undef bits can
participate here. No need to special case the zero-extend handling, it
just falls out correctly.

The original credit goes to Ariel Ben-Yehuda! I'm mostly landing this to
save a few rounds of trivial edits fixing style issues and test case
formulation.

Differental Revision: D34285

llvm-svn: 306379
2017-06-27 08:32:03 +00:00
Mikael Holmen 37b5120a9a [Reassociate] Make sure EraseInst sets MadeChange
Summary:
EraseInst didn't report that it made IR changes through MadeChange.

It is essential that changes to the IR are reported correctly,
since for example ReassociatePass::run() will indicate that all
analyses are preserved otherwise.
And the CGPassManager determines if the CallGraph is up-to-date
based on status from InstructionCombiningPass::runOnFunction().

Reviewers: craig.topper, rnk, davide

Reviewed By: rnk, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34616

llvm-svn: 306368
2017-06-27 05:32:13 +00:00
Dehao Chen 8b7effb344 revert r306336 for breaking ppc test.
llvm-svn: 306344
2017-06-26 23:05:35 +00:00
Ayal Zaks 3923c0c46b reverting 306331.
Causes TBAA metadata to be generates on reverse shuffles, investigating.

llvm-svn: 306338
2017-06-26 22:26:54 +00:00
Dehao Chen 79655792cc Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:

spec/2006/fp/C++/444.namd                 26.84  -0.31%
spec/2006/fp/C++/447.dealII               46.19  +0.89%
spec/2006/fp/C++/450.soplex               42.92  -0.44%
spec/2006/fp/C++/453.povray               38.57  -2.25%
spec/2006/fp/C/433.milc                   24.54  -0.76%
spec/2006/fp/C/470.lbm                    41.08  +0.26%
spec/2006/fp/C/482.sphinx3                47.58  -0.99%
spec/2006/int/C++/471.omnetpp             22.06  +1.87%
spec/2006/int/C++/473.astar               22.65  -0.12%
spec/2006/int/C++/483.xalancbmk           33.69  +4.97%
spec/2006/int/C/400.perlbench             33.43  +1.70%
spec/2006/int/C/401.bzip2                 23.02  -0.19%
spec/2006/int/C/403.gcc                   32.57  -0.43%
spec/2006/int/C/429.mcf                   40.35  +0.27%
spec/2006/int/C/445.gobmk                 26.96  +0.06%
spec/2006/int/C/456.hmmer                  24.4  +0.19%
spec/2006/int/C/458.sjeng                 27.91  -0.08%
spec/2006/int/C/462.libquantum            57.47  -0.20%
spec/2006/int/C/464.h264ref               46.52  +1.35%

geometric mean                                   +0.29%

The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.

I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.

Reviewers: hfinkel, mkuper, davidxl, chandlerc

Reviewed By: chandlerc

Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306336
2017-06-26 21:41:09 +00:00
Ayal Zaks e7e15d186b [LV] Changing the interface of ValueMap, NFC.
Instead of providing access to the internal MapStorage holding all Values
associated with a given Key, used for setting or resetting them all together,
ValueMap keeps its MapStorage internal; its new interface allows getting,
setting or resetting a single Value, per part or per part-and-lane.
Follows the discussion in https://reviews.llvm.org/D32871.

Differential Revision: https://reviews.llvm.org/D34473

llvm-svn: 306331
2017-06-26 21:03:51 +00:00
Wei Mi 71f06420e4 [GVN] Recommit the patch "Add phi-translate support in scalarpre".
The recommit fixes three bugs: The first one is to use CurrentBlock instead of
PREInstr's Parent as param of performScalarPREInsertion because the Parent
of a clone instruction may be uninitialized. The second one is stop PRE when
CurrentBlock to its predecessor is a backedge and an operand of CurInst is
defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available. The third one is an out-of-bound
array access in a flipped if guard.

Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.

long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();

void foo(long a, long b, long c, long d) {

  g1 = a * b;
  if (__builtin_expect(g2 > 3, 0)) {
    a = c;
    b = d;
    g2 = a * b;
  }
  g3 = a * b;      // fully redundant.

}

The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.

llvm-svn: 306313
2017-06-26 18:16:10 +00:00
Chandler Carruth 2abb65ae11 [InstCombine] Factor the logic for propagating !nonnull and !range
metadata out of InstCombine and into helpers.

NFC, this just exposes the logic used by InstCombine when propagating
metadata from one load instruction to another. The plan is to use this
in SROA to address PR32902.

If anyone has better ideas about how to factor this or name variables,
I'm all ears, but this seemed like a pretty good start and lets us make
progress on the PR.

This is based on a patch by Ariel Ben-Yehuda (D34285).

llvm-svn: 306267
2017-06-26 03:31:31 +00:00
Chandler Carruth 4a000883c7 [LoopSimplify] Re-instate r306081 with a bug fix w.r.t. indirectbr.
This was reverted in r306252, but I already had the bug fixed and was
just trying to form a test case.

The original commit factored the logic for forming dedicated exits
inside of LoopSimplify into a helper that could be used elsewhere and
with an approach that required fewer intermediate data structures. See
that commit for full details including the change to the statistic, etc.

The code looked fine to me and my reviewers, but in fact didn't handle
indirectbr correctly -- it left the 'InLoopPredecessors' vector dirty.

If you have code that looks *just* right, you can end up leaking these
predecessors into a subsequent rewrite, and crash deep down when trying
to update PHI nodes for predecessors that don't exist.

I've added an assert that makes the bug much more obvious, and then
changed the code to reliably clear the vector so we don't get this bug
again in some other form as the code changes.

I've also added a test case that *does* manage to catch this while also
giving some nice positive coverage in the face of indirectbr.

The real code that found this came out of what I think is CPython's
interpreter loop, but any code with really "creative" interpreter loops
mixing indirectbr and other exit paths could manage to tickle the bug.
I was hard to reduce the original test case because in addition to
having a particular pattern of IR, the whole thing depends on the order
of the predecessors which is in turn depends on use list order. The test
case added here was designed so that in multiple different predecessor
orderings it should always end up going down the same path and tripping
the same bug. I hope. At least, it tripped it for me without
manipulating the use list order which is better than anything bugpoint
could do...

llvm-svn: 306257
2017-06-25 22:45:31 +00:00
Anna Thomas e7cb633d29 [LoopDeletion] NFC: Move phi node value setting into prepass
Recommit NFC patch (rL306157) where I missed incrementing the basic block iterator,
which caused loop deletion tests to hang due to infinite loop.
Had reverted it in rL306162.

rL306157 commit message:
Currently, the implementation of delete dead loops has a special case
when the loop being deleted is never executed. This special case
(updating of exit block's incoming values for phis) can be
run as a prepass for non-executable loops before performing
the actual deletion.

llvm-svn: 306254
2017-06-25 21:13:58 +00:00
Daniel Jasper 4c6cd4ccb7 Revert "[LoopSimplify] Factor the logic to form dedicated exits into a utility."
This leads to a segfault. Chandler already has a test case and should be
able to recommit with a fix soon.

llvm-svn: 306252
2017-06-25 17:58:25 +00:00
Sanjay Patel 2f3ead7adc [InstCombine] add (sext i1 X), 1 --> zext (not X)
http://rise4fun.com/Alive/i8Q

A narrow bitwise logic op is obviously better than math for value tracking, 
and zext is better than sext. Typically, the 'not' will be folded into an 
icmp predicate.

The IR difference would even survive through codegen for x86, so we would see 
worse code:

https://godbolt.org/g/C14HMF

one_or_zero(int, int):                      # @one_or_zero(int, int)
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setle   %al
        retq

one_or_zero_alt(int, int):                  # @one_or_zero_alt(int, int)
        xorl    %ecx, %ecx
        cmpl    %esi, %edi
        setg    %cl
        movl    $1, %eax
        subl    %ecx, %eax
        retq

llvm-svn: 306243
2017-06-25 14:15:28 +00:00
Xinliang David Li b67530e9b9 [PGO] Implementate profile counter regiser promotion
Differential Revision: http://reviews.llvm.org/D34085

llvm-svn: 306231
2017-06-25 00:26:43 +00:00
Hiroshi Inoue b300824ee7 fix trivial typos in comment, NFC
dereferencable -> dereferenceable

llvm-svn: 306210
2017-06-24 15:43:33 +00:00
Craig Topper 7b66ffe875 [ValueTracking][InstCombine] Use m_Shr instead m_CombineOr(m_LShr, m_AShr). NFC
llvm-svn: 306205
2017-06-24 06:24:04 +00:00
Craig Topper 72ee6945af [Analysis][Transforms] Use commutable matchers instead of m_CombineOr in a few places. NFC
llvm-svn: 306204
2017-06-24 06:24:01 +00:00
Vitaly Buka df19ad456e [InstCombine] Don't replace allocas with smaller globals
Summary:
InstCombine replaces large allocas with small globals consts causing buffer overflows
on valid code, see PR33372.

This fix permits this optimization only if the global is dereference for alloca size.

Fixes PR33372

Reviewers: eugenis, majnemer, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34311

llvm-svn: 306194
2017-06-24 01:35:19 +00:00
Anna Thomas 77a2e6b198 Revert "[LoopDeletion] NFC: Move phi node value setting into prepass"
This reverts commit r306157.
It caused some timeouts in clang tests. Perhaps unreachable loops have
far too many phi nodes.
Reverting and investigating.

llvm-svn: 306162
2017-06-23 21:30:48 +00:00
Anna Thomas a43b387f27 [LoopDeletion] NFC: Move phi node value setting into prepass
Currently, the implementation of delete dead loops has a special case
when the loop being deleted is never executed. This special case
(updating of exit block's incoming values for phis) can be
run as a prepass for non-executable loops before performing
the actual deletion.

llvm-svn: 306157
2017-06-23 20:38:50 +00:00
Craig Topper 68ed55e06a [CorrelatedValuePropagation] Fix typo in comment sense->since. NFC
llvm-svn: 306152
2017-06-23 20:28:40 +00:00
Craig Topper 29cdfe2cd9 [CorrelatedValuePropagation] Remove comment about iterating switch cases in reverse order. This is no longer being done after r298791. NFC
llvm-svn: 306151
2017-06-23 20:28:35 +00:00
Anna Thomas 91eed9ac1a [RuntimeLoopUnrolling] Rename exit block and move assert earlier. NFC
The single exit block allowed in runtime unrolling is guaranteed to be
the Latch's successor, so rename it as LatchExitBlock.

llvm-svn: 306105
2017-06-23 14:28:01 +00:00
Anna Thomas d67165c93c [InstCombine] Recognize and simplify three way comparison idioms
Summary:
Many languages have a three way comparison idiom where comparing two values
produces not a boolean, but a tri-state value. Typical values (e.g. as used in
the lcmp/fcmp bytecodes from Java) are -1 for less than, 0 for equality, and +1
for greater than.

We actually do a great job already of converting three way comparisons into
binary comparisons when the result produced has one a single use. Unfortunately,
such values can have more than one use, and in that case, our existing
optimizations break down.

The patch adds a peephole which converts a three-way compare + test idiom into a
binary comparison on the original inputs. It focused on replacing the test on
the result of the three way compare and does nothing about removing the three
way compare itself. That's left to other optimizations (which do actually kick
in commonly.)
We currently recognize one idiom on signed integer compare. In the future, we
plan to recognize and simplify other comparison idioms on
other signed/unsigned datatypes such as floats, vectors etc.

This is a resurrection of Philip Reames' original patch:
https://reviews.llvm.org/D19452

Reviewers: majnemer, apilipenko, reames, sanjoy, mkazantsev

Reviewed by: mkazantsev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34278

llvm-svn: 306100
2017-06-23 13:41:45 +00:00
Craig Topper 2c20c42cb6 [JumpThreading] Teach jump threading how to analyze (and (cmp A, C1), (cmp A, C2)) after InstCombine has turned it into (cmp (add A, C3), C4)
Currently JumpThreading can use LazyValueInfo to analyze an 'and' or 'or' of compare if the compare is fed by a livein of a basic block. This can be used to to prove the condition can't be met for some predecessor and the jump from that predecessor can be moved to the false path of the condition.

But if the compare is something that InstCombine turns into an add and a single compare, it can't be analyzed because the livein is now an input to the add and not the compare.

This patch adds a new method to LVI to get a ConstantRange on an edge. Then we teach jump threading to detect the add livein feeding a compare and to get the ConstantRange and propagate it.

Differential Revision: https://reviews.llvm.org/D33262

llvm-svn: 306085
2017-06-23 05:41:35 +00:00
Craig Topper 7927996140 [JumpThreading] Use some temporary variables to reduce the number of times we call the same methods. NFC
A future patch will add even more uses of these variables.

llvm-svn: 306084
2017-06-23 05:41:32 +00:00
Chandler Carruth 4ab0f4910a [LoopSimplify] Factor the logic to form dedicated exits into a utility.
I want to use the same logic as LoopSimplify to form dedicated exits in
another pass (SimpleLoopUnswitch) so I wanted to factor it out here.

I also noticed that there is a pretty significantly more efficient way
to implement this than the way the code in LoopSimplify worked. We don't
need to actually retain the set of unique exit blocks, we can just
rewrite them as we find them and use only a set to deduplicate.

This did require changing one part of LoopSimplify to not re-use the
unique set of exits, but it only used it to check that there was
a single unique exit. That part of the code is about to walk the exiting
blocks anyways, so it seemed better to rewrite it to use those exiting
blocks to compute this property on-demand.

I also had to ditch a statistic, but it doesn't seem terribly valuable.

Differential Revision: https://reviews.llvm.org/D34049

llvm-svn: 306081
2017-06-23 04:03:04 +00:00
Eric Christopher 5a7c2f1700 Remove the LoadCombine pass. It was never enabled and is unsupported.
Based on discussions with the author on mailing lists.

llvm-svn: 306067
2017-06-22 22:58:12 +00:00
Anna Thomas 72c90c87f8 [LoopDeletion] Update exits correctly when multiple duplicate edges from an exiting block
Summary:
Currently, we incorrectly update exit blocks of loops when there are multiple
edges from a single exiting block to the exit block. This can happen when we
have switches as the terminator of the exiting blocks.
The fix here is to correctly update the phi nodes in the exit block, and remove
all incoming values *except* for one which is from the preheader.

Note: Currently, this error can manifest only while deleting non-executed loops. However, it
is possible to trigger this error in invariant loops, once we enhance the logic
around the exit conditions for the loop check.

Reviewers: chandlerc, dberlin, sanjoy, efriedma

Reviewed by: efriedma

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D34516

llvm-svn: 306048
2017-06-22 20:20:56 +00:00
Craig Topper dffbbcb3fd [InstCombine] Teach foldSelectICmpAndOr to recognize (select (icmp slt (trunc (X)), 0), Y, (or Y, C2))
Summary:
InstCombine likes to turn (icmp eq (and X, C1), 0) into (icmp slt (trunc (X)), 0) sometimes. This breaks foldSelectICmpAndOr's ability to recognize (select (icmp eq (and X, C1), 0), Y, (or Y, C2))->(or (shl (and X, C1), C3), y).

This patch tries to recover this. I had to flip around some of the early out checks so that I could create a new And instruction during the compare processing without it possibly never getting used.

Reviewers: spatel, majnemer, davide

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34184

llvm-svn: 306029
2017-06-22 16:23:30 +00:00
Craig Topper 0de5e6a729 [InstCombine] Add one use checks to or/and->xnor folding
If the components of the and/or had multiple uses, this transform created an additional instruction.

This patch makes sure we remove one of the components.

Differential Revision: https://reviews.llvm.org/D34498

llvm-svn: 306027
2017-06-22 16:12:02 +00:00
Sanjay Patel d1e811979c [InstCombine] reverse bitcast + bitwise-logic canonicalization (PR33138)
There are 2 parts to this patch made simultaneously to avoid a regression.

We're reversing the canonicalization that moves bitwise vector ops before bitcasts. 
We're moving bitwise vector ops *after* bitcasts instead. That's the 1st and 3rd hunks 
of the patch. The motivation is that there's only one fold that currently depends on 
the existing canonicalization (see next), but there are many folds that would 
automatically benefit from the new canonicalization. 
PR33138 ( https://bugs.llvm.org/show_bug.cgi?id=33138 ) shows why/how we have these 
patterns in IR.

There's an or(and,andn) pattern that requires an adjustment in order to continue matching
to 'select' because the bitcast changes position. This match is unfortunately complicated 
because it requires 4 logic ops with optional bitcast and sext ops.

Test diffs:

  1. The bitcast.ll and bitcast-bigendian.ll changes show the most basic difference - 
     bitcast comes before logic.
  2. There are also tests with no diffs in bitcast.ll that verify that we're still doing 
     folds that were enabled by the previous canonicalization.
  3. icmp-xor-signbit.ll shows the payoff. We don't need to adjust existing icmp patterns 
     to look through bitcasts.
  4. logical-select.ll contains several tests for the or(and,andn) --> select fold to 
     verify that we are still handling those cases. The lone diff shows the movement of 
     the bitcast from the new canonicalization rule.

Differential Revision: https://reviews.llvm.org/D33517

llvm-svn: 306011
2017-06-22 15:46:54 +00:00
Sanjay Patel e800df8eac [InstCombine] add peekThroughBitcast() helper; NFC
This is an NFC portion of D33517. We have similar helpers in the backend.

llvm-svn: 306008
2017-06-22 15:28:01 +00:00
Diana Picus b512e91515 Revert "Enable vectorizer-maximize-bandwidth by default."
This reverts commit r305960 because it broke self-hosting on AArch64.

llvm-svn: 305990
2017-06-22 10:00:28 +00:00
Sam Clegg 705f798bff Mark dump() methods as const. NFC
Add const qualifier to any dump() method where adding one
was trivial.

Differential Revision: https://reviews.llvm.org/D34481

llvm-svn: 305963
2017-06-21 22:19:17 +00:00
Dehao Chen 014db29b89 Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:

spec/2006/fp/C++/444.namd                 26.84  -0.31%
spec/2006/fp/C++/447.dealII               46.19  +0.89%
spec/2006/fp/C++/450.soplex               42.92  -0.44%
spec/2006/fp/C++/453.povray               38.57  -2.25%
spec/2006/fp/C/433.milc                   24.54  -0.76%
spec/2006/fp/C/470.lbm                    41.08  +0.26%
spec/2006/fp/C/482.sphinx3                47.58  -0.99%
spec/2006/int/C++/471.omnetpp             22.06  +1.87%
spec/2006/int/C++/473.astar               22.65  -0.12%
spec/2006/int/C++/483.xalancbmk           33.69  +4.97%
spec/2006/int/C/400.perlbench             33.43  +1.70%
spec/2006/int/C/401.bzip2                 23.02  -0.19%
spec/2006/int/C/403.gcc                   32.57  -0.43%
spec/2006/int/C/429.mcf                   40.35  +0.27%
spec/2006/int/C/445.gobmk                 26.96  +0.06%
spec/2006/int/C/456.hmmer                  24.4  +0.19%
spec/2006/int/C/458.sjeng                 27.91  -0.08%
spec/2006/int/C/462.libquantum            57.47  -0.20%
spec/2006/int/C/464.h264ref               46.52  +1.35%

geometric mean                                   +0.29%

The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.

I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.

Reviewers: hfinkel, mkuper, davidxl, chandlerc

Reviewed By: chandlerc

Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 305960
2017-06-21 22:01:32 +00:00
Craig Topper 34caf5396f [Reassociate] Use early returns in a couple places to reduce indentation and improve readability. NFC
llvm-svn: 305946
2017-06-21 19:39:35 +00:00
Craig Topper 99a2e89920 [Reassociate] Const correct a helper function. NFC
llvm-svn: 305945
2017-06-21 19:39:33 +00:00
Craig Topper a074c101e5 [InstCombine] Cleanup using commutable matchers. Make a couple helper methods standalone static functions. Put 'if' around variable declaration instead of after. NFC
llvm-svn: 305941
2017-06-21 18:57:00 +00:00
Dehao Chen 50f2aa19e8 Do not inline recursive direct calls in sample loader pass.
Summary: r305009 disables recursive inlining for indirect calls in sample loader pass. The same logic applies to direct recursive calls.

Reviewers: iteratee, davidxl

Reviewed By: iteratee

Subscribers: sanjoy, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D34456

llvm-svn: 305934
2017-06-21 17:57:43 +00:00
Craig Topper 5b173f2bb3 [InstCombine] Add range metadata to cttz/ctlz/ctpop intrinsic calls based on known bits
Summary:
I noticed that passing known bits across these intrinsics isn't great at capturing the information we really know. Turning known bits of the input into known bits of a count output isn't able to convey a lot of what we really know.

This patch adds range metadata to these intrinsics based on the known bits.

Currently the patch punts if we already have range metadata present.

Reviewers: spatel, RKSimon, davide, majnemer

Reviewed By: RKSimon

Subscribers: sanjoy, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D32582

llvm-svn: 305927
2017-06-21 16:32:35 +00:00
Craig Topper ae86cc725d [InstCombine] Don't let folding (select (icmp eq (and X, C1), 0), Y, (or Y, C2)) create more instructions than it removes
Summary:
Previously this folding had no checks to see if it was going to result in less instructions. This was pointed out during the review of D34184

This patch adds code to count how many instructions its going to create vs how many its going to remove so we can make a proper decision.

Reviewers: spatel, majnemer

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34437

llvm-svn: 305926
2017-06-21 16:07:13 +00:00
Craig Topper cbac691c4b [Reassociate] Support xor reassociating for splat vectors
Summary: This patch adds support for xors of splat vectors.

Reviewers: mcrosier

Reviewed By: mcrosier

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34354

llvm-svn: 305925
2017-06-21 16:07:09 +00:00
Davide Italiano 0ec715be1f [NewGVN] Fix a bug that made the store verifier less effective.
We weren't actually checking for duplicated stores, as the condition
was always actually false. This was found by Coverity, and I have
no clue how to trigger this in real-world code (although I
 tried for a bit).

llvm-svn: 305867
2017-06-20 22:57:40 +00:00
Sanjay Patel 4ccbd58d70 [InstCombine] fix code/test comments for r305792; NFC
These diffs were in the last version of the patch in D33342,
but I accidentally committed the previous rev. 

llvm-svn: 305793
2017-06-20 12:45:46 +00:00
Sanjay Patel adca825dc1 [InstCombine] try to canonicalize xor-of-icmps to and-of-icmps
We have a large portfolio of folds for and-of-icmps and or-of-icmps in InstSimplify and InstCombine, 
but hardly anything for xor-of-icmps. Rather than trying to rethink and translate all of those folds, 
we can use the truth table definition of xor:

X ^ Y --> (X | Y) & !(X & Y)

...to see if we can convert the xor to and/or and then use the existing folds.

http://rise4fun.com/Alive/J9v

Differential Revision: https://reviews.llvm.org/D33342

llvm-svn: 305792
2017-06-20 12:40:55 +00:00
Vedant Kumar b5794ca90c [ProfileData] PR33517: Check for failure of symtab creation
With PR33517, it became apparent that symbol table creation can fail
when presented with malformed inputs. This patch makes that sort of
error detectable, so llvm-cov etc. can fail more gracefully.

Specifically, we now check that function names within the symbol table
aren't empty.

Testing: check-{llvm,clang,profile}, some unit test updates.
llvm-svn: 305765
2017-06-20 01:38:56 +00:00
Ana Pazos f731bde064 [PATCH] [PGO] Fixed cast operation in emIntrinsicVisitor::instrumentOneMemIntrinsic.
Reviewers: xur, efriedma, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34293

llvm-svn: 305737
2017-06-19 20:04:33 +00:00
Taewook Oh 9083547ae3 Improve profile-guided heuristics to use estimated trip count.
Summary:
Existing heuristic uses the ratio between the function entry
frequency and the loop invocation frequency to find cold loops. However,
even if the loop executes frequently, if it has a small trip count per
each invocation, vectorization is not beneficial. On the other hand,
even if the loop invocation frequency is much smaller than the function
invocation frequency, if the trip count is high it is still beneficial
to vectorize the loop.

This patch uses estimated trip count computed from the profile metadata
as a primary metric to determine coldness of the loop. If the estimated
trip count cannot be computed, it falls back to the original heuristics.

Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson

Reviewed By: tejohnson

Subscribers: tejohnson, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32451

llvm-svn: 305729
2017-06-19 18:48:58 +00:00
Bjorn Pettersson 475fcd9cd8 [InstCombine] Make sure AddReachableCodeToWorklist sets MadeIRChange
Summary:
Some optimizations in AddReachableCodeToWorklist did not update
the MadeIRChange state. This could happen both when removing
trivially dead instructions (DCE) and at constant folds.

It is essential that changes to the IR is reported correctly,
since for example InstCombinePass::run() will indicate that all
analyses are preserved otherwise.
And the CGPassManager determines if the CallGraph is up-to-date
based on status from InstructionCombiningPass::runOnFunction().

The new test case early_dce_clobbers_callgraph.ll is a reproducer
for some asserts that started to trigger after changes in the
inliner in r305245. With this patch the test case passes again.

Reviewers: sanjoy, craig.topper, dblaikie

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34346

llvm-svn: 305725
2017-06-19 18:00:27 +00:00
Hans Wennborg ca69fc1cb7 Revert r304824 "Fix PR23384 (part 3 of 3)"
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.

> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>

llvm-svn: 305720
2017-06-19 17:57:15 +00:00
Davide Italiano daa9c0e403 [NewGVN] Simplify findConditionEquivalence(). NFCI.
llvm-svn: 305707
2017-06-19 16:46:15 +00:00
Dinar Temirbulatov e2c6991c07 Remove brackets, NFC.
llvm-svn: 305706
2017-06-19 16:44:07 +00:00
Craig Topper a7529b68cc [InstCombine] Cleanup some duplicated one use checks
Summary:
These 4 patterns have the same one use check repeated twice for each. Once without a cast and one with. But the cast has no effect on what method is called.

For the OR case I believe it is always profitable regardless of the number of uses since we'll never increase the instruction count.

For the AND case I believe it is profitable if the pair of xors has one use such that we'll get rid of it completely. Or if the C value is something freely invertible, in which case the not doesn't cost anything.

Reviewers: spatel, majnemer

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34308

llvm-svn: 305705
2017-06-19 16:23:49 +00:00
Craig Topper ef85498e05 [Reassociate] Support some reassociation of vector xors
Summary:
Currently we don't try to do anything with vector xors.

This patch adds support for removing duplicate pairs from a chain of vector xors as its pretty easy to support. We still dont' try to combine the xors with and/ors, but I might try that in a future patch.

Reviewers: mcrosier, davide, resistor

Reviewed By: mcrosier

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34338

llvm-svn: 305704
2017-06-19 16:23:46 +00:00
Craig Topper 4350734d36 [Reassociate] Make one of the helper methods static because it doesn't use any class variables. NFC
llvm-svn: 305703
2017-06-19 16:23:43 +00:00
Anna Thomas 7949f4529a [JumpThreading][LVI] Invalidate LVI information after blocks are merged
Summary:
After a single predecessor is merged into a basic block, we need to invalidate
the LVI information for the new merged block, when LVI is not provably true for
all of instructions in the new block.
The test cases added show the correct LVI information using the LVI printer
pass.

Reviewers: reames, dberlin, davide, sanjoy

Reviewed by: dberlin, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34108

llvm-svn: 305699
2017-06-19 15:23:33 +00:00
Xin Tong b412831d11 [TRE] Improve code motion in TRE, use AA to tell whether a load can be moved before a call that writes to memory.
Summary: use AA to tell whether a load can be moved before a call that writes to memory.

Reviewers: dberlin, davide, sanjoy, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D34115

llvm-svn: 305698
2017-06-19 15:21:18 +00:00
Daniel Berlin 36b08b2088 NewGVN: Fix PR 33461, caused by slightly overzealous verification.
llvm-svn: 305657
2017-06-19 00:24:00 +00:00
Craig Topper d96177cf72 [Reassociate] Use APInt::isNullValue() instead of comparing with 0. NFC
This should compile to slightly better code.

llvm-svn: 305651
2017-06-18 18:15:38 +00:00
Xin Tong 9d2a5b1cf7 Add argmononly attribute to strlen and wcslen, i.e. they only read memory (string) passed to them.
Summary:
This allows strlen to be moved out of the loop in case its argument is
not modified in the loop in LICM.

Reviewers: hfinkel, davide, sanjoy, dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34323

llvm-svn: 305641
2017-06-18 03:10:26 +00:00
Sanjoy Das b70ddd8901 [SROA] Add support for non-integral pointers
Summary: C.f. http://llvm.org/docs/LangRef.html#non-integral-pointer-type

Reviewers: chandlerc, loladiro

Reviewed By: loladiro

Subscribers: reames, loladiro, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D32203

llvm-svn: 305639
2017-06-17 20:28:13 +00:00
Xin Tong 025780ba6e [TRE] Add assertion for folding trivial return block
llvm-svn: 305637
2017-06-17 16:55:12 +00:00
Xin Tong d5b4d0b53a [TRE] Update comments. NFC
llvm-svn: 305636
2017-06-17 16:18:36 +00:00
Wei Mi c7ba876323 Revert rL305578. There is still some buildbot failure to be fixed.
llvm-svn: 305603
2017-06-16 23:14:35 +00:00
Anna Thomas 6bc14c65ad [InstCombine] Set correct insertion point for selects generated while folding phis
Summary:
When we fold vector constants that are operands of phi's that feed into select,
we need to set the correct insertion point for the *new* selects that get generated.
The correct insertion point is the incoming block for the phi.
Such cases can occur with patch r298845, which fixed folding of
vector constants, but the new selects could be inserted incorrectly (as the added
test case shows).

Reviewers: majnemer, spatel, sanjoy

Reviewed by: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34162

llvm-svn: 305591
2017-06-16 21:08:37 +00:00
Davide Italiano ec5b0257bf [SCCP] Simplify the code a bit. NFCI.
llvm-svn: 305583
2017-06-16 20:50:31 +00:00
Davide Italiano 0b1190aa8d [SCCP] Clarify a comment about unhandled instructions.
llvm-svn: 305579
2017-06-16 20:27:17 +00:00
Wei Mi a2493b6ad9 [GVN] Recommit the patch "Add phi-translate support in scalarpre".
The recommit fixes two bugs: The first one is to use CurrentBlock instead of
PREInstr's Parent as param of performScalarPREInsertion because the Parent
of a clone instruction may be uninitialized. The second one is stop PRE when
CurrentBlock to its predecessor is a backedge and an operand of CurInst is
defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available.

Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.

long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();

void foo(long a, long b, long c, long d) {

  g1 = a * b;
  if (__builtin_expect(g2 > 3, 0)) {
    a = c;
    b = d;
    g2 = a * b;
  }
  g3 = a * b;      // fully redundant.

}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.

Differential Revision: https://reviews.llvm.org/D32252

llvm-svn: 305578
2017-06-16 20:21:01 +00:00
Davide Italiano d95d871af0 [SCCP] Remove redundant instruction visitors.
Whenever we don't know what to do with an instruction, we send
it to overdefined anyway.

llvm-svn: 305575
2017-06-16 19:43:57 +00:00
Xinliang David Li c3f8e83253 Fix function name /NFC
llvm-svn: 305564
2017-06-16 16:54:13 +00:00
Daniel Neilson 3faabbbe85 [Atomics] Rename and change prototype for atomic memcpy intrinsic
Summary:

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.

Reviewers: reames, sanjoy, efriedma

Reviewed By: reames

Subscribers: mzolotukhin, anna, llvm-commits, skatkov

Differential Revision: https://reviews.llvm.org/D33240

llvm-svn: 305558
2017-06-16 14:43:59 +00:00
Craig Topper da6ea0d3e8 [InstCombine] Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2) if K1 and K2 are a 1-bit mask
Summary: This is the demorganed version of the case we already handle for the OR of iszero.

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34244

llvm-svn: 305548
2017-06-16 05:10:37 +00:00
Craig Topper 4d76da4fc9 [CorrelatedValuePropagation] Remove superfluous semicolon. NFC
llvm-svn: 305538
2017-06-16 01:53:20 +00:00
Evgeniy Stepanov 4d4ee93d25 [cfi] CFI-ICall for ThinLTO.
Implement ControlFlowIntegrity for indirect function calls in ThinLTO.
Design follows the RFC in llvm-dev, see
https://groups.google.com/d/msg/llvm-dev/MgUlaphu4Qc/kywu0AqjAQAJ

llvm-svn: 305533
2017-06-16 00:18:29 +00:00
Xinliang David Li eea0ade2eb [PartialInlining] Code Refactoring
This is a NFC code refactoring and interface cleanup. This paves the
way to enable outlining-only mode for the partial inliner.

llvm-svn: 305530
2017-06-15 23:56:59 +00:00
Craig Topper 2ba991ff2c [InstCombine] Add two FIXMEs for bad single use checks. NFC
llvm-svn: 305510
2017-06-15 21:38:48 +00:00
Teresa Johnson 152277952e Split PGO memory intrinsic optimization into its own source file
Summary:
Split the PGOMemOPSizeOpt pass out from IndirectCallPromotion.cpp into
its own file.

Reviewers: davidxl

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D34248

llvm-svn: 305501
2017-06-15 20:23:57 +00:00
Craig Topper f2d3e6d3d5 [InstCombine] Make the context instruction parameter of foldOrOfICmps a reference to discourage passing nullptr and to remove the '&' from all of the call sites. NFC
llvm-svn: 305493
2017-06-15 19:09:51 +00:00
Craig Topper 6eec9e21a5 [InstCombine] Handle (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2) when the one of the Ands is commuted relative to the other
Currently we expect A to be on the same side in both Ands but nothing guarantees that.

While there also switch to using matchers for some of the code.

Differential Revision: https://reviews.llvm.org/D34230

llvm-svn: 305487
2017-06-15 17:55:20 +00:00
Max Kazantsev dc80366d52 [ScalarEvolution] Apply Depth limit to getMulExpr
This is a fix for PR33292 that shows a case of extremely long compilation
of a single .c file with clang, with most time spent within SCEV.

We have a mechanism of limiting recursion depth for getAddExpr to avoid
long analysis in SCEV. However, there are calls from getAddExpr to getMulExpr
and back that do not propagate the info about depth. As result of this, a chain

  getAddExpr -> ... .> getAddExpr -> getMulExpr -> getAddExpr -> ... -> getAddExpr

can be extremely long, with every segment of getAddExpr's being up to max depth long.
This leads either to long compilation or crash by stack overflow. We face this situation while
analyzing big SCEVs in the test of PR33292.

This patch applies the same limit on max expression depth for getAddExpr and getMulExpr.

Differential Revision: https://reviews.llvm.org/D33984

llvm-svn: 305463
2017-06-15 11:48:21 +00:00
George Karpenkov 406c113103 Fixing section name for Darwin platforms for sanitizer coverage
On Darwin, section names have a 16char length limit.

llvm-svn: 305429
2017-06-14 23:40:25 +00:00
Daniel Berlin 6d2db9edb2 PredicateInfo: Don't insert conditional info when a conditional branch jumps to the same target regardless of condition
llvm-svn: 305416
2017-06-14 21:19:52 +00:00
Daniel Berlin 51e878e01d NewGVN: This is wrong by inspection, it will not cause an issue currently due to other limitations, i believe. This also means i can't make a test for it.
llvm-svn: 305415
2017-06-14 21:19:28 +00:00
Davide Italiano 0dc4778067 [EarlyCSE] Make PhiToCheck in removeMSSA() a set.
This way we end up not looking at PHI args already removed.
MemSSA now goes through the updater so we can prune
it to avoid having redundant MemoryPHI arguments, but that
doesn't quite work for the general case.

Discussed with Daniel Berlin, fixes PR33406.

llvm-svn: 305409
2017-06-14 19:29:53 +00:00
Frederich Munch dceb612eeb Hide dbgs() stream for when built with -fmodules.
Summary: Make DebugCounter::print and dump methods to be const correct.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34214

llvm-svn: 305408
2017-06-14 19:16:22 +00:00
Vedant Kumar 9c056c9e1b [InstrProf] Don't take the address of alwaysinline available_externally functions
Doing so breaks compilation of the following C program
(under -fprofile-instr-generate):

 __attribute__((always_inline)) inline int foo() { return 0; }

 int main() { return foo(); }

At link time, we fail because taking the address of an
available_externally function creates an undefined external reference,
which the TU cannot provide.

Emitting the function definition into the object file at all appears to
be a violation of the langref: "Globals with 'available_externally'
linkage are never emitted into the object file corresponding to the LLVM
module."

Differential Revision: https://reviews.llvm.org/D34134

llvm-svn: 305327
2017-06-13 22:12:35 +00:00
Teresa Johnson 8015f88525 [PGO] Update VP metadata after memory intrinsic optimization
Summary:
Leave an updated VP metadata on the fallback memcpy intrinsic after
specialization. This can be used for later possible expansion based on
the average of the remaining values.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34164

llvm-svn: 305321
2017-06-13 20:44:08 +00:00
Frederich Munch 6391c7e2a1 Revert r305313 & r305303, self-hosting build-bot isn’t liking it.
llvm-svn: 305318
2017-06-13 19:05:24 +00:00
Frederich Munch 4c73b40dca Force RegisterStandardPasses to construct std::function in the IPO library.
Summary: Fixes an issue using RegisterStandardPasses from a statically linked object before PassManagerBuilder::addGlobalExtension is called from a dynamic library.

Reviewers: efriedma, theraven

Reviewed By: efriedma

Subscribers: mehdi_amini, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D33515

llvm-svn: 305303
2017-06-13 16:48:41 +00:00
David Blaikie 6d0f39476a Inliner: Avoid calling shouldInline until it's absolutely necessary
This restores the order of evaluation (& conditionalized evaluation) of
isTriviallyDeadInstruction, InlineHistoryIncludes, and shouldInline
(with the addition of a shouldInline call after
isTriviallyDeadInstruction) from before r305245.

llvm-svn: 305267
2017-06-13 02:24:09 +00:00
George Burgess IV f613749382 Fix signed/unsigned comparison warning; NFC
llvm-svn: 305262
2017-06-13 01:28:49 +00:00
David Blaikie ae8c4af4ac Inliner: Don't remove calls to readnone+nounwind (but not always_inline) functions in the AlwaysInliner
llvm-svn: 305245
2017-06-12 23:01:17 +00:00
Anna Thomas 4b027e8f89 [RS4GC] Drop invalid metadata after pointers are relocated
Summary:
After RS4GC, we should drop metadata that is no longer valid. These metadata
is used by optimizations scheduled after RS4GC, and can cause a miscompile.
One such metadata is invariant.load which is used by LICM sinking transform.
After rewriting statepoints, the address of a load maybe relocated. With
invariant.load metadata on a load instruction, LICM sinking assumes the
loaded value (from a dererenceable address) to be invariant, and
rematerializes the load operand and the load at the exit block.
This transforms the IR to have an unrelocated use of the
address after a statepoint, which is incorrect.
Other metadata we conservatively remove are related to
dereferenceability and noalias metadata.

This patch drops such metadata on store and load instructions after
rewriting statepoints.

Reviewers: reames, sanjoy, apilipenko

Reviewed by: reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33756

llvm-svn: 305234
2017-06-12 21:26:53 +00:00
Sanjay Patel 2e33bbaff0 [InstCombine] lshr (sext iM X to iN), N-M --> zext (ashr X, min(N-M, M-1)) to iN
This is a follow-up to https://reviews.llvm.org/D33879 / https://reviews.llvm.org/rL304939 ,
and was discussed in https://reviews.llvm.org/D33338.

We prefer this form because a narrower shift may be cheaper, and we can more easily fold a
zext than a sext.

http://rise4fun.com/Alive/slVe

Name: shz
%s = sext i8 %x to i12
%r = lshr i12 %s, 4
=>
%a = ashr i8 %x, 4
%r = zext i8 %a to i12 

llvm-svn: 305190
2017-06-12 14:23:43 +00:00
Xinliang David Li 7ed6cd32ea [PartialInlining] Support shrinkwrap life_range markers
Differential Revision: http://reviews.llvm.org/D33847

llvm-svn: 305170
2017-06-11 20:46:05 +00:00
Geoff Berry 3cca1da20c [EarlyCSE] Add option to use MemorySSA for function simplification run of EarlyCSE (off by default).
Summary:
Use MemorySSA for memory dependency checking in the EarlyCSE pass at the
start of the function simplification portion of the pipeline.  We rely
on the fact that GVNHoist runs just after this pass of EarlyCSE to
amortize the MemorySSA construction cost since GVNHoist uses MemorySSA
and EarlyCSE preserves it.

This is turned off by default.  A follow-up change will turn it on to
allow for easier reversion in case it breaks something.

llvm-svn: 305146
2017-06-10 15:20:03 +00:00
Andrew Kaylor 647025f9e1 [InstSimplify] Don't constant fold or DCE calls that are marked nobuiltin
Differential Revision: https://reviews.llvm.org/D33737

llvm-svn: 305132
2017-06-09 23:18:11 +00:00
Yaxun Liu 6455b0dbf3 [SROA] Fix APInt size when load/store have different address space
Currently there is a bug in SROA::presplitLoadsAndStores which causes assertion in
GEPOperator::accumulateConstantOffset.

Basically it does not consider the situation that the pointer operand of load or store
may be in a non-zero address space and its size may be different from the size of
a pointer in address space 0.

This patch fixes assertion when compiling Blender Cycles kernels for amdgpu backend.

Diffferential Revision: https://reviews.llvm.org/D33298

llvm-svn: 305107
2017-06-09 20:46:29 +00:00
Keno Fischer 5329174cb1 [Sink] Fix predicate in legality check
Summary:
isSafeToSpeculativelyExecute is the wrong predicate to use here.
All that checks for is whether it is safe to hoist a value due to
unaligned/un-dereferencable accesses. However, not only are we doing
sinking rather than hoisting, our concern is that the location
we're loading from may have been modified. Instead forbid sinking
any load across a critical edge.

Reviewers: majnemer

Subscribers: davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D33179

llvm-svn: 305102
2017-06-09 19:31:10 +00:00
Sanjay Patel 70db424601 [SimplifyLibCalls] fix formatting; NFC
llvm-svn: 305081
2017-06-09 14:22:03 +00:00
Serguei Katkov 38414b57f9 [IndVars] Add an option to be able to disable LFTR
This change adds an option disable-lftr to be able to disable Linear Function Test Replace optimization.
By default option is off so current behavior is not changed.

Reviewers: reames, sanjoy, wmi, andreadb, apilipenko
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33979

llvm-svn: 305055
2017-06-09 06:11:59 +00:00
George Burgess IV a20352e13e [LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops.
If we're shrinking a binary operation, it may be the case that the new
operations wraps where the old didn't. If this happens, the behavior
should be well-defined. So, we can't always carry wrapping flags with us
when we shrink operations.

If we do, we get incorrect optimizations in cases like:

void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] - 128;
}

which gets optimized to:

void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] | 128;
}

Because:
- InstCombine turned `sub i32 %from.i, 128` into
  `add nuw nsw i32 %from.i, 128`.
- LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a
  vector full of `i8 128`s
- InstCombine took advantage of the fact that the newly-shrunken add
  "couldn't wrap", and changed the `add` to an `or`.

InstCombine seems happy to figure out whether we can add nuw/nsw on its
own, so I just decided to drop the flags. There are already a number of
places in LoopVectorize where we rely on InstCombine to clean up.

llvm-svn: 305053
2017-06-09 03:56:15 +00:00
David Blaikie cb9327b02d Inliner: Don't touch indirect calls
Other comments/implications are that this isn't intended behavior (nor
perserved/reimplemented in the new inliner) & complicates fixing the
'inlining' of trivially dead calls without consulting the cost function
first.

llvm-svn: 305052
2017-06-09 03:29:20 +00:00
Craig Topper a420562257 [InstCombine] Pass a proper context instruction to all of the calls into InstSimplify
Summary: This matches the behavior we already had for compares and makes us consistent everywhere.

Reviewers: dberlin, hfinkel, spatel

Reviewed By: dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33604

llvm-svn: 305049
2017-06-09 03:21:29 +00:00
Evgeniy Stepanov d02dbf6b1c [CFI] Remove LinkerSubsectionsViaSymbols.
Since D17854 LinkerSubsectionsViaSymbols is unnecessary.

It is interfering with ThinLTO implementation of CFI-ICall, where
the aliases used on the !LinkerSubsectionsViaSymbols branch are
needed to export jump tables to ThinLTO backends.

This is the second attempt to land this change after fixing PR33316.

llvm-svn: 305031
2017-06-08 23:38:22 +00:00
Craig Topper 2aa4d39f5e [ExtractGV] Fix the doxygen comment on the constructor and the class to refer to global values instead of functions. While there fix an 80 column violation. NFC
llvm-svn: 305030
2017-06-08 23:38:19 +00:00
Peter Collingbourne e357fbd243 Write summaries for merged modules when splitting modules for ThinLTO.
This is to prepare to allow for dead stripping of globals in the
merged modules.

Differential Revision: https://reviews.llvm.org/D33921

llvm-svn: 305027
2017-06-08 23:01:49 +00:00
Kostya Serebryany 2c2fb8896b [sanitizer-coverage] one more flavor of coverage: -fsanitize-coverage=inline-8bit-counters. Experimental so far, not documenting yet. Reapplying revisions 304630, 304631, 304632, 304673, see PR33308
llvm-svn: 305026
2017-06-08 22:58:19 +00:00
Dehao Chen e2a428bad7 Do not early-inline recursive calls in sample profile loader.
Summary: Early-inlining of recursive call makes the code size bloat exponentially. We should not disable it.

Reviewers: davidxl, dnovillo, iteratee

Reviewed By: iteratee

Subscribers: iteratee, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D34017

llvm-svn: 305009
2017-06-08 20:11:57 +00:00
Galina Kistanova e128958552 Changed a comparison operator for std::stable_sort to implement strict weak ordering.
This is a temporarily fix which needs additional work, as it triggers a test3 failure.
test3 is commented out till then.

llvm-svn: 304993
2017-06-08 17:27:40 +00:00
Nirav Dave 62fb8498d3 InferAddressSpaces: Avoid assertion failure with replacing identical
cloned constexpr

Have cloneConstantExprWithNewAddressSpaces return nullptr when
returning initial ConstantExpr.

Reviewers: arsenm

Subscribers: jholewinski, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D33995

llvm-svn: 304975
2017-06-08 13:20:55 +00:00
John Brawn da4a68a1d2 [BPI] Don't assume that strcmp returning >0 is more likely than <0
The zero heuristic assumes that integers are more likely positive than negative,
but this also has the effect of assuming that strcmp return values are more
likely positive than negative. Given that for nonzero strcmp return values it's
the ordering of arguments that determines the sign of the result there's no
reason to assume that's true.

Fix this by inspecting the LHS of the compare and using TargetLibraryInfo to
decide if it's strcmp-like, and if so only assume that nonzero is more likely
than zero i.e. strings are more often different than the same. This causes a
slight code generation change in the spec2006 benchmark 403.gcc, but with no
noticeable performance impact. The intent of this patch is to allow better
optimisation of dhrystone on Cortex-M cpus, but currently it won't as there are
also some changes that need to be made to if-conversion.

Differential Revision: https://reviews.llvm.org/D33934

llvm-svn: 304970
2017-06-08 09:44:40 +00:00
Sanjay Patel 66f7fdb300 [InstCombine] fold lshr (sext X), C1 --> zext (lshr X, C2)
This was discussed in D33338. We have larger pattern-matching ending in a truncate that 
we can reduce or remove by handling these smaller patterns first. Further motivation is 
that narrower shift ops are easier for value tracking and zext is better than sext.

http://rise4fun.com/Alive/rhh

Name: boolshift
%sext = sext i1 %x to i8
%r = lshr i8 %sext, 7

=>

%r = zext i1 %x to i8

Name: noboolshift
%sext = sext i3 %x to i8
%r = lshr i8 %sext, 7

=>

%sh = lshr i3 %x, 2
%r = zext i3 %sh to i8

Differential Revision: https://reviews.llvm.org/D33879

llvm-svn: 304939
2017-06-07 20:32:08 +00:00
Xinliang David Li 4f49bee764 Fix builin_expect lowering bug
PR33346

Skip cases when expected value is not constant int.

llvm-svn: 304933
2017-06-07 18:32:24 +00:00
Peter Collingbourne aaae7eed5c LowerTypeTests: Generate simpler IR for br(llvm.type.test, then, else).
This makes it so that the code quality for CFI checks when compiling
with -O2 and linking with --lto-O0 is similar to that of the rest of
the code.

Reduces the size of a chrome binary built with -O2/--lto-O0 by
about 750KB.

Differential Revision: https://reviews.llvm.org/D33925

llvm-svn: 304921
2017-06-07 15:49:14 +00:00
Craig Topper 73ba1c84be [InstCombine][InstSimplify] Use APInt::isNullValue/isOneValue to reduce compiled code for comparing APInts with 0 and 1. NFC
These methods are specifically optimized to only counting leading zeros without an additional uint64_t compare.

llvm-svn: 304876
2017-06-07 07:40:37 +00:00
Craig Topper 29c282eac8 [InstCombine] Fix two asserts that were accidentally checking that an APInt pointer is non-zero instead of checking that the APInt self is non-zero.
I believe this code used to use APInt references which would have worked. But then they were changed to pointers to allow m_APInt to be used.

llvm-svn: 304875
2017-06-07 07:40:29 +00:00
Zachary Turner 264b5d9e88 Move Object format code to lib/BinaryFormat.
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.

Differential Revision: https://reviews.llvm.org/D33843

llvm-svn: 304864
2017-06-07 03:48:56 +00:00
Evgeny Stupachenko 3b88291581 Fix PR23384 (part 3 of 3)
Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304824
2017-06-06 20:04:16 +00:00
Daniel Berlin eafdd862e5 NewGVN: Fix PR/33187. This is a bug caused by two things:
1. When there is no perfect iteration order, we can't let phi nodes
put themselves in terms of things that come later in the iteration
order, or we will endlessly cycle (the normal RPO algorithm clears the
hashtable to avoid this issue).
2. We are sometimes erasing the wrong expression (causing pessimism)
because our equality says loads and stores are the same.
We introduce an exact equality function and use it when erasing to
make sure we erase only identical expressions, not equivalent ones.

llvm-svn: 304807
2017-06-06 17:15:28 +00:00
Anna Thomas b2a212c070 [Atomics][LoopIdiom] Recognize unordered atomic memcpy
Summary:
Expanding the loop idiom test for memcpy to also recognize
unordered atomic memcpy. The only difference for recognizing
an unordered atomic memcpy and instead of a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.

Background:  http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

Patch by Daniel Neilson!

Reviewers: reames, anna, skatkov

Reviewed By: reames, anna

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33243

llvm-svn: 304806
2017-06-06 16:45:25 +00:00
Anna Thomas 7218032019 [IRCE] Canonicalize pre/post loops after the blocks are added into parent loop
Summary:
We were canonizalizing the pre loop (into loop-simplify form) before
the post loop blocks were added into parent loop. This is incorrect when IRCE is
done on a subloop. The post-loop blocks are created, but not yet added to the
parent loop. So, loop-simplification on the pre-loop incorrectly updates
LoopInfo.

This patch corrects the ordering so that pre and post loop blocks are added to
parent loop (if any), and then the loops are canonicalized to LCSSA and
LoopSimplifyForm.

Reviewers: reames, sanjoy, apilipenko

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33846

llvm-svn: 304800
2017-06-06 14:54:01 +00:00
Chandler Carruth 6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Xin Tong 9d6f08a8d4 Add a dominanance check interface that uses caching for instructions within same basic block.
Summary:
This problem stems from the fact that instructions are allocated using new
in LLVM, i.e. there is no relationship that can be derived by just looking
at the pointer value.

This interface dispatches to appropriate dominance check given 2 instructions,
i.e. in case the instructions are in the same basic block, ordered basicblock
(with instruction numbering and caching) are used. Otherwise, dominator tree
is used.

This is a preparation patch for https://reviews.llvm.org/D32720

Reviewers: dberlin, hfinkel, davide

Subscribers: davide, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D33380

llvm-svn: 304764
2017-06-06 02:34:41 +00:00
Evgeny Stupachenko f2b3b467e5 Fix PR23384 (part 2 of 3) NFC
Summary:
The patch moves LSR cost comparison to target part.

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30561

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304750
2017-06-05 23:37:00 +00:00
Evgeny Stupachenko 4d94e99446 LSR: Calculate instruction cost only if InsnsCost is set to true (NFC)
Summary:

The patch guard all instruction cost calculations with InsnCosts (-lsr-insns-cost) option.
Currently even if the option set to false we calculate and print (in debug mode) instruction costs.

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D33914

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304746
2017-06-05 22:44:18 +00:00
Sven van Haastregt 78819e0fd4 [InstCombine] Fix extractelement use before def
This fixes a bug that can cause extractelements with operands that
haven't been defined yet to be inserted at a wrong point when
optimising insertelements.

Patch by Karl Hylen.

Differential Revision: https://reviews.llvm.org/D33449

llvm-svn: 304701
2017-06-05 09:18:10 +00:00
Renato Golin cdf840fd38 Revert "[sanitizer-coverage] one more flavor of coverage: -fsanitize-coverage=inline-8bit-counters. Experimental so far, not documenting yet."
This reverts commit r304630, as it broke ARM/AArch64 bots for 2 days.

llvm-svn: 304698
2017-06-05 07:35:52 +00:00
Ayal Zaks ab32aff838 [LV] Make scalarizeInstruction() non-virtual. NFC.
Following the request made in https://reviews.llvm.org/D32871,
scalarizeInstruction() which is no longer overridden by InnerLoopUnroller is
hereby made non-virtual in InnerLoopVectorizer.

Should have been part of r297580 originally.

llvm-svn: 304685
2017-06-04 13:29:51 +00:00
Craig Topper 0799ff9e64 [InstCombine] Add support for simplifying ctlz/cttz intrinsics based on known bits.
llvm-svn: 304669
2017-06-03 18:50:32 +00:00
Galina Kistanova e9cacb6ae8 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304638
2017-06-03 05:19:32 +00:00
Galina Kistanova 55344aba7e Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304637
2017-06-03 05:19:10 +00:00
Galina Kistanova 96d51f5bcb Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304636
2017-06-03 05:18:46 +00:00
Kostya Serebryany f7db346cdf [sanitizer-coverage] one more flavor of coverage: -fsanitize-coverage=inline-8bit-counters. Experimental so far, not documenting yet.
llvm-svn: 304630
2017-06-03 01:35:47 +00:00
Evgeniy Stepanov 704003ea3d Revert "[CFI] Remove LinkerSubsectionsViaSymbols."
This reverts commit r304582: breaks cfi-devirt :: anon-namespace.cpp on Darwin.

llvm-svn: 304626
2017-06-03 00:46:27 +00:00
Alexey Bataev e4e5923ef1 [SLP] Improve comments and naming of functions/variables/members, NFC.
Fixed some comments, added an additional description of the algorithms,
improved readability of the code.

Differential revision: https://reviews.llvm.org/D33320

llvm-svn: 304616
2017-06-03 00:08:21 +00:00
Kostya Serebryany aed6ba770c [sanitizer-coverage] refactor the code to make it easier to add more sections in future. NFC
llvm-svn: 304610
2017-06-02 23:13:44 +00:00
Alexey Bataev 03ca396b95 Revert "[SLP] Improve comments and naming of functions/variables/members, NFC."
This reverts commit 6e311de8b907aa20da9a1a13ab07c3ce2ef4068a.

llvm-svn: 304609
2017-06-02 23:09:15 +00:00
Philip Reames b70cecd60a [Statepoint] Be consistent about using deopt naming [NFCI]
We'd called this "vm state" in the early days, but have long since standardized on calling it "deopt" in line with the operand bundle tag.  Fix a few cases we'd missed.

llvm-svn: 304607
2017-06-02 23:03:26 +00:00
Xinliang David Li 5fdc75aea1 Fix debug build test failure
llvm-svn: 304600
2017-06-02 22:38:48 +00:00
Xinliang David Li 0b7d858fa3 [PartialInlining] Minor cost anaysis tuning
Also added a test option and 2 cost analysis related tests.

llvm-svn: 304599
2017-06-02 22:08:04 +00:00
David Blaikie 6aeacaa527 FunctionAttrs: Skip it if the effective SCC (ignoring optnone functions) is empty
Minor optimization but mostly simplifies my debugging so I'm not dealing
with empty SCCNodeSets while investigating issues in this optimization.

llvm-svn: 304597
2017-06-02 21:24:17 +00:00
Alexey Bataev 2c08fde9e5 [SLP] Improve comments and naming of functions/variables/members, NFC.
Summary:
Fixed some comments, added an additional description of the algorithms,
improved readability of the code.

Reviewers: anemet

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33320

llvm-svn: 304593
2017-06-02 20:39:27 +00:00
Keno Fischer 514a6a54e7 [SROA] Fix crash due to bad bitcast
Summary:
As shown in the test case, SROA was crashing when trying to split
stores (to the alloca) of loads (from anywhere), because it assumed
the pointer operand to the loads and stores had to have the same
address space. This isn't the case. Make sure to use the correct
pointer type for both the load and the store.

Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D32593

llvm-svn: 304585
2017-06-02 19:04:17 +00:00
Evgeniy Stepanov 63f056327d [CFI] Remove LinkerSubsectionsViaSymbols.
Since D17854 LinkerSubsectionsViaSymbols is unnecessary.

It is interfering with ThinLTO implementation of CFI-ICall, where
the aliases used on the !LinkerSubsectionsViaSymbols branch are
needed to export jump tables to ThinLTO backends.

llvm-svn: 304582
2017-06-02 18:45:14 +00:00
Evgeniy Stepanov b933ad3a77 Skip CFI for dead functions.
Differential Revision: https://reviews.llvm.org/D33805

llvm-svn: 304578
2017-06-02 18:24:23 +00:00
Sanjay Patel ce241f48c5 [InstCombine] fix icmp with not op and constant to work with splat vector constant
llvm-svn: 304562
2017-06-02 16:29:41 +00:00
Sanjay Patel 4dc85eb75a [InstCombine] improve perf by not creating a known non-canonical instruction
Op1 (RHS) is a constant, so putting it on the LHS makes us churn through visitICmp
an extra time to canonicalize it:

INSTCOMBINE ITERATION #1 on cmpnot
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %notx = xor i8 %x, -1
IC: Visiting:   %cmp = icmp sgt i8 %notx, 42
IC: Old =   %cmp = icmp sgt i8 %notx, 42
    New =   <badref> = icmp sgt i8 -43, %x
IC: ADD:   %cmp = icmp sgt i8 -43, %x
IC: ERASE   %1 = icmp sgt i8 %notx, 42
IC: ADD:   %notx = xor i8 %x, -1
IC: DCE:   %notx = xor i8 %x, -1
IC: ERASE   %notx = xor i8 %x, -1
IC: Visiting:   %cmp = icmp sgt i8 -43, %x
IC: Mod =   %cmp = icmp sgt i8 -43, %x
    New =   %cmp = icmp slt i8 %x, -43
IC: ADD:   %cmp = icmp slt i8 %x, -43
IC: Visiting:   %cmp = icmp slt i8 %x, -43
IC: Visiting:   ret i1 %cmp

If we create the swapped ICmp directly, we go faster:

INSTCOMBINE ITERATION #1 on cmpnot
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %notx = xor i8 %x, -1
IC: Visiting:   %cmp = icmp sgt i8 %notx, 42
IC: Old =   %cmp = icmp sgt i8 %notx, 42
    New =   <badref> = icmp slt i8 %x, -43
IC: ADD:   %cmp = icmp slt i8 %x, -43
IC: ERASE   %1 = icmp sgt i8 %notx, 42
IC: ADD:   %notx = xor i8 %x, -1
IC: DCE:   %notx = xor i8 %x, -1
IC: ERASE   %notx = xor i8 %x, -1
IC: Visiting:   %cmp = icmp slt i8 %x, -43
IC: Visiting:   ret i1 %cmp

llvm-svn: 304558
2017-06-02 16:11:14 +00:00
Gor Nishanov 053d2d24f7 [coroutines] PR33271: Remove stray coro.save intrinsics during CoroSplit
Summary:
Optimization passes may remove llvm.coro.suspend intrinsic while leaving matching llvm.coro.save intrinsic orphaned.
Make sure we clean up orphaned coro.saves.  The bug manifested with a crash similar to this:

```
    llvm_unreachable("Unknown type!");
    llvm::MVT::getVT (Ty=0x489518, HandleUnknown=false)
    llvm::EVT::getEVT
    llvm::TargetLoweringBase::getValueType
    llvm::ComputeValueVTs
    llvm::SelectionDAGBuilder::visitTargetIntrinsic
```

Reviewers: GorNishanov

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D33817

llvm-svn: 304518
2017-06-02 02:18:36 +00:00
Xinliang David Li 621e8dcf1f [Profile] Enhance expect lowering to handle correlated branches
builtin_expect applied on && or || expressions were not
handled properly before. With this patch, the problem is fixed.

Differential Revision: http://reviews.llvm.org/D33164

llvm-svn: 304517
2017-06-02 02:09:31 +00:00
Philip Reames ae80045deb [RS4GC] Comment clarification
llvm-svn: 304514
2017-06-02 01:52:06 +00:00
Davide Italiano 1dd5558e52 [PM] GVNSink is off by default, fix an obvious typo.
llvm-svn: 304497
2017-06-01 23:47:53 +00:00
Xinliang David Li d6cfba2a02 Fix compiler_rt buildbot failure
llvm-svn: 304489
2017-06-01 23:05:11 +00:00
Keno Fischer fa635d730f Reapply "[Cloning] Take another pass at properly cloning debug info"
This was rL304226, reverted in 304228 due to a clang assertion failure
on the build bots. That problem should have been addressed by clang
commit rL304470.

llvm-svn: 304488
2017-06-01 23:02:12 +00:00
Evgeniy Stepanov 56584bbf16 (NFC) Track global summary liveness in GVFlags.
Replace GVFlags::LiveRoot with GVFlags::Live and use that instead of
all the DeadSymbols sets. This is refactoring in order to make
liveness information available in the RegularLTO pipeline.

llvm-svn: 304466
2017-06-01 20:30:06 +00:00
Xinliang David Li ee8d6acb1f [Profile] Fix builtin_expect lowering bug
The lowerer wrongly assumes the ICMP instruction 
 1) always has a constant operand;
 2) the operand has value 0.

It also assumes the expected value can only be one, thus
other values other than one will be considered 'zero'.

This leads to wrong profile annotation when other integer values
are used other than 0, 1 in the comparison or in the expect intrinsic.

Also missing is handling of equal predicate.

This patch fixes all the above problems.

Differential Revision: http://reviews.llvm.org/D33757

llvm-svn: 304453
2017-06-01 19:05:55 +00:00
Xinliang David Li 0a0acbcf78 [PartialInlining] Emit branch info and profile data as remarks
This allows us to collect profile statistics to tune static 
branch prediction.

Differential Revision: http://reviews.llvm.org/D33746

llvm-svn: 304452
2017-06-01 18:58:50 +00:00
Mandeep Singh Grang 33a1b73600 [PredicateInfo] Fix non-determinism in codegen uncovered by reverse iterating SmallPtrSet
Summary:
Sort OpsToRename before iterating to make iteration order deterministic.

Thanks to Daniel Berlin for the sorting logic.

Reviewers: dberlin, RKSimon, efriedma, davide

Reviewed By: dberlin, davide

Subscribers: sanjoy, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D33265

llvm-svn: 304447
2017-06-01 18:36:24 +00:00
Tim Shen 6b41141863 [ThinLTO] Migrate ThinLTOBitcodeWriter to the new PM.
Summary: Also see D33429 for other ThinLTO + New PM related changes.

Reviewers: davide, chandlerc, tejohnson

Subscribers: mehdi_amini, Prazek, cfe-commits, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D33525

llvm-svn: 304378
2017-06-01 01:02:12 +00:00
Xinliang David Li 32c5e809be [PartialInlining] Reduce outlining overhead by removing unneeded live-out(s)
Differential Revision: http://reviews.llvm.org/D33694

llvm-svn: 304375
2017-06-01 00:12:41 +00:00
Wei Mi 0bd3f41588 Revert rL304050. It may break sanitizer bootstrap. Revert it for now while investigating.
llvm-svn: 304350
2017-05-31 21:29:33 +00:00
Reid Kleckner 5fbdd17714 [IR] Add additional addParamAttr/removeParamAttr to AttributeList API
Summary:
Fairly straightforward patch to fill in some of the holes in the
attributes API with respect to accessing parameter/argument attributes.
The patch aims to step further towards encapsulating the
idx+FirstArgIndex pattern to access these attributes to within the
AttributeList.

Patch by Daniel Neilson!

Reviewers: rnk, chandlerc, pete, javed.absar, reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33355

llvm-svn: 304329
2017-05-31 19:23:09 +00:00
Kostya Serebryany 53b34c8443 [sanitizer-coverage] remove stale code (old coverage); llvm part
llvm-svn: 304319
2017-05-31 18:27:33 +00:00
Anna Thomas 777bb90bdc Revert "[Atomics][LoopIdiom] Recognize unordered atomic memcpy"
This reverts commit r304310.

It caused build failures in polly and mingw
due to undefined reference to
llvm::RTLIB::getMEMCPY_ELEMENT_ATOMIC.

llvm-svn: 304315
2017-05-31 17:20:51 +00:00
Zaara Syeda 3a7578c658 [PPC] Inline expansion of memcmp
This patch does an inline expansion of memcmp.
It changes the memcmp library call into an inline expansion when the size is
known at compile time and is under a target specified threshold.
This expansion is implemented in CodeGenPrepare and expands into straight line
code. The target specifies a maximum load size and the expansion works by using
this size to load the two sources, compare, and exit early if a difference is
found. It also has a special case when the memcmp result is used in a compare
to zero equality.

Differential Revision: https://reviews.llvm.org/D28637

llvm-svn: 304313
2017-05-31 17:12:38 +00:00
Anna Thomas 056c009f1b [Atomics][LoopIdiom] Recognize unordered atomic memcpy
Summary:
Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy.
The only difference for recognizing
an unordered atomic memcpy and instead of a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.
Background:  http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

Patch by Daniel Neilson!

Reviewers: reames, anna, skatkov

Reviewed By: reames

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33243

llvm-svn: 304310
2017-05-31 16:39:52 +00:00
Gor Nishanov 2bc782d8da [coroutines] Call initializePass in coroutine pass constructors
Summary:

Fixes: https://bugs.llvm.org/show_bug.cgi?id=33226

Reviewers: chandlerc, davide, majnemer, dblaikie

Reviewed By: chandlerc

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D33701

llvm-svn: 304277
2017-05-31 03:12:42 +00:00
Daniel Berlin be3e7ba45e NewGVN: Fix PR 33185 by checking whether we need to recursively
generate a phi of ops, which we don't currently support.

llvm-svn: 304272
2017-05-31 01:47:32 +00:00
Xinliang David Li 74480adafd [PartialInlining] Shrinkwrap allocas with live range contained in outline region.
Differential Revision: http://reviews.llvm.org/D33618

llvm-svn: 304245
2017-05-30 21:22:18 +00:00
Matthew Simpson 646475a9bc [LV] Reapply r303763 with fix for PR33193
r303763 caused build failures in some out-of-tree tests due to an assertion in
TTI. The original patch updated cost estimates for induction variable update
instructions marked for scalarization. However, it didn't consider that the
incoming value of an induction variable phi node could be a cast instruction.
This caused queries for cast instruction costs with a mix of vector and scalar
types. This patch includes a fix for cast instructions and the test case from
PR33193.

The fix was suggested by Jonas Paulsson <paulsson@linux.vnet.ibm.com>.

Reference: https://bugs.llvm.org/show_bug.cgi?id=33193
Original Differential Revision: https://reviews.llvm.org/D33457

llvm-svn: 304235
2017-05-30 19:55:57 +00:00
Keno Fischer 3fa5db4c04 Revert "[Cloning] Take another pass at properly cloning debug info"
At least one build bot is complaining. Will investigate after lunch.

llvm-svn: 304228
2017-05-30 18:56:26 +00:00
Keno Fischer 945dc1d2d1 [Cloning] Take another pass at properly cloning debug info
Summary:
In rL302576, DISubprograms gained the constraint that a !dbg attachments to functions must
have a 1:1 mapping to DISubprograms. As part of that change, the function cloning support
was adjusted to attempt to enforce this invariant during cloning. However, there
were several problems with the implementation. Part of these were fixed in rL304079.
However, there was a more fundamental problem with these changes, namely that it
bypasses the matadata value map, causing the cloned metadata to be a mix of metadata
pointing to the new suprogram (where manual code was added to fix those up) and the
old suprogram (where this was not the case). This mismatch could cause a number of
different assertion failures in the DWARF emitter. Some of these are given at
https://github.com/JuliaLang/julia/issues/22069, but some others have been observed
as well. Attempt to rectify this by partially reverting the manual DI metadata fixup,
and instead using the standard value map approach. To retain the desired semantics
of not duplicating the compilation unit and inlined subprograms, explicitly freeze
these in the value map.

Reviewers: dblaikie, aprantl, GorNishanov, echristo

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33655

llvm-svn: 304226
2017-05-30 18:28:30 +00:00
Daniel Berlin 2aa5dc1589 NewGVN: Compute hash value of expression on demand and use it in inequality testing.
llvm-svn: 304195
2017-05-30 06:58:18 +00:00
Daniel Berlin c8ed40400c NewGVN: Fix PR33194, memory corruption by putting temporary instructions in tables sometimes.
llvm-svn: 304194
2017-05-30 06:42:29 +00:00
Joerg Sonnenberger 9375a25342 Revert r303763, results in asserts i.e. while building Ruby.
llvm-svn: 304179
2017-05-29 22:52:17 +00:00
Hiroshi Inoue ac9cd3080d [trivial] fix a typo in comment, NFC
llvm-svn: 304139
2017-05-29 08:37:42 +00:00
Gor Nishanov ffbeb22b6f Cloning: Fix debug info cloning
Summary:
I believe https://reviews.llvm.org/rL302576 introduced two bugs:

1) it produces duplicate distinct variables for every: dbg.value describing the same variable.
    To fix the problme I switched form getDistinct() to get() in DebugLoc.cpp: auto reparentVar = [&](DILocalVariable *Var) {
    return DILocalVariable::getDistinct(

2) It passes NewFunction plain name as a linkagename parameter to Subprogram constructor. Breaks assert in:

 || DeclLinkageName.empty()) || LinkageName == DeclLinkageName) && "decl has a linkage name and it is different"' failed.
#9 0x00007f5010261b75 llvm::DwarfUnit::applySubprogramDefinitionAttributes(llvm::DISubprogram const*, llvm::DIE&) /home/gor/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1173:3
#
(Edit: reproducer added)

Here how https://reviews.llvm.org/rL302576 broke coroutine debug info.
Coroutine body of the original function is split into several parts by cloning and removing unneeded code.
All parts describe the original function and variables present in the original function.

For a simple case, prior to Split, original function has these two blocks:

```
PostSpill:                                        ; preds = %AllocaSpillBB
  call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !14, metadata !15), !dbg !13
  store i32 %x, i32* %x.addr, align 4
  ...
and

sw.epilog:                                        ; preds = %sw.bb
  %x.addr.reload.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 4, !dbg !20
  %4 = load i32, i32* %x.addr.reload.addr, align 4, !dbg !20
  call void @llvm.dbg.value(metadata i32 %4, i64 0, metadata !14, metadata !15), !dbg !13

!14 = !DILocalVariable(name: "x", arg: 1, scope: !6, file: !7, line: 55, type: !11)

```

Note that in two blocks different expression represent the same original user variable X.

Before rL302576, for every cloned function there was exactly one cloned DILocalVariable(name: "x" as in:

```
define i8* @f(i32 %x) #0 !dbg !6 {
  ...
!6 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped,
...
!14 = !DILocalVariable(name: "x", arg: 1, scope: !6, file: !7, line: 55, type: !11)

define internal fastcc void @f.resume(%f.Frame* %FramePtr) #0 !dbg !25 {
...
!25 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
!28 = !DILocalVariable(name: "x", arg: 1, scope: !25, file: !7, line: 55, type: !11)
```
After rL302576, for every cloned function there were as many DILocalVariable(name: "x" as there were "call void @llvm.dbg.value" for that variable.
This was causing asserts in VerifyDebugInfo and AssemblyPrinter.

Example:

```
!27 = distinct !DISubprogram(name: "f", linkageName: "f.resume", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55,
!29 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11)
!39 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11)
!41 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11)
```

Second problem:

Prior to rL302576, all clones were described by DISubprogram referring to original function.

```
define i8* @f(i32 %x) #0 !dbg !6 {
...
!6 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped,

define internal fastcc void @f.resume(%f.Frame* %FramePtr) #0 !dbg !25 {
...
!25 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped,
```

After rL302576, DISubprogram for clones is of two minds, plain name refers to the original name, linkageName refers to plain name of the clone.

```
!27 = distinct !DISubprogram(name: "f", linkageName: "f.resume", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55,
```

I think the assumption in AsmPrinter is that both name and linkageName should refer to the same entity. It asserts here when they are not:

```
 || DeclLinkageName.empty()) || LinkageName == DeclLinkageName) && "decl has a linkage name and it is different"' failed.
#9 0x00007f5010261b75 llvm::DwarfUnit::applySubprogramDefinitionAttributes(llvm::DISubprogram const*, llvm::DIE&) /home/gor/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1173:3
```
After this fix, behavior (with respect to coroutines) reverts to exactly as it was before and therefore making them debuggable again, or even more importantly, compilable, with "-g"

Reviewers: dblaikie, echristo, aprantl

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33614

llvm-svn: 304079
2017-05-27 19:41:09 +00:00
Gor Nishanov 9c6ac6138d [coroutines] Define getPassName() for coroutine passes
Reviewers: GorNishanov

Reviewed By: GorNishanov

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D33622

llvm-svn: 304065
2017-05-27 05:54:30 +00:00
Vitaly Buka a637489ef1 [PartialInlining] Replace delete with unique_ptr in computeCallsiteToProfCountMap
Reviewers: davidxl

Reviewed By: davidxl

Subscribers: vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D33220

llvm-svn: 304064
2017-05-27 05:32:09 +00:00
Wei Mi 5bbb5aafc1 [GVN] Recommit the patch "Add phi-translate support in scalarpre".
The recommit is to fix a bug about ExtractValue and InsertValue ops. For those
ops, some varargs inside GVN::Expression are not value numbers but raw index
numbers. It is wrong to do phi-translate for raw index numbers, and the fix is
to stop doing that.

Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.

long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();

void foo(long a, long b, long c, long d) {
  g1 = a * b;
  if (__builtin_expect(g2 > 3, 0)) {
    a = c;
    b = d;
    g2 = a * b;
  }
  g3 = a * b;      // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.

Differential Revision: https://reviews.llvm.org/D32252

llvm-svn: 304050
2017-05-27 00:54:19 +00:00
Benjamin Kramer debb3c35e0 Make helper functions static. NFC.
llvm-svn: 304029
2017-05-26 20:09:00 +00:00
Peter Collingbourne 7730b24448 PMB: Run the whole-program-devirt pass during LTO at --lto-O0.
The whole-program-devirt pass needs to run at -O0 because only it
knows about the llvm.type.checked.load intrinsic: it needs to both
lower the intrinsic itself and handle it in the summary.

Differential Revision: https://reviews.llvm.org/D33571

llvm-svn: 304019
2017-05-26 18:27:13 +00:00
Craig Topper d45185f231 [InstCombine] Pass the DominatorTree, AssumptionCache, and context instruction to a few calls to isKnownPositive, isKnownNegative, and isKnownNonZero
Every other place in InstCombine that uses these methods in ValueTracking already pass this information. This makes the remaining sites consistent.

Differential Revision: https://reviews.llvm.org/D33567

llvm-svn: 304018
2017-05-26 18:23:57 +00:00
Wei Mi 3250ae3f7c Revert rL303923 since it broke the sanitizer bootstrap build bot.
llvm-svn: 303969
2017-05-26 05:42:50 +00:00
Craig Topper d4039f7283 [InstCombine] Add an InstCombine specific wrapper around isKnownToBeAPowerOfTwo to shorten code. NFC
We have wrappers for several other ValueTracking methods that take care of passing all of the analysis and assumption cache parameters. This extends it to isKnownToBeAPowerOfTwo.

llvm-svn: 303924
2017-05-25 21:51:12 +00:00
Wei Mi fd257fa7bf [GVN] Add phi-translate support in scalarpre.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.

  long a[100], b[100], g1, g2, g3;
  __attribute__((pure)) long goo();

  void foo(long a, long b, long c, long d) {
    g1 = a * b;
    if (__builtin_expect(g2 > 3, 0)) {
      a = c;
      b = d;
      g2 = a * b;
    }
    g3 = a * b;      // fully redundant.
  }

The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.

Differential Revision: https://reviews.llvm.org/D32252

llvm-svn: 303923
2017-05-25 21:49:02 +00:00
Daniel Berlin e67c322260 NewGVN: Fix PR 33119, PR 33129, due to regressed undef handling
Fix PR33120 and others by eliminating self-cycles a different way.

llvm-svn: 303875
2017-05-25 15:44:20 +00:00
Artur Pilipenko 315eafc339 [InstCombine] Teach isAllocSiteRemovable to look through addrspacecasts
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D28565

llvm-svn: 303870
2017-05-25 15:14:48 +00:00
Sanjay Patel 5150612012 [InstCombine] make icmp-mul fold more efficient
There's probably a lot more like this (see also comments in D33338 about responsibility), 
but I suspect we don't usually get a visible manifestation.

Given the recent interest in improving InstCombine efficiency, another potential micro-opt
that could be repeated several times in this function: morph the existing icmp pred/operands
instead of creating a new instruction.

llvm-svn: 303860
2017-05-25 14:13:57 +00:00
James Molloy dc2d64bc35 [GVNSink] Pacify MSVC
Don't convert an unsigned to a pointer for a sentinel, use a size_t instead.

llvm-svn: 303855
2017-05-25 13:14:10 +00:00
James Molloy 2a237f19f1 [GVNSink] Don't define operator<< in NDEBUG
Without debug macros enabled, the raw_ostream operator<< overload
is unused.

llvm-svn: 303852
2017-05-25 13:11:18 +00:00
James Molloy a929063233 [GVNSink] GVNSink pass
This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for commiting but I've uploaded it to gather some initial thoughts.

This pass attempts to sink instructions into successors, reducing static
instruction count and enabling if-conversion.
We use a variant of global value numbering to decide what can be sunk.
Consider:

[ %a1 = add i32 %b, 1  ]   [ %c1 = add i32 %d, 1  ]
[ %a2 = xor i32 %a1, 1 ]   [ %c2 = xor i32 %c1, 1 ]
                 \           /
           [ %e = phi i32 %a2, %c2 ]
           [ add i32 %e, 4         ]

GVN would number %a1 and %c1 differently because they compute different
results - the VN of an instruction is a function of its opcode and the
transitive closure of its operands. This is the key property for hoisting
and CSE.

What we want when sinking however is for a numbering that is a function of
the *uses* of an instruction, which allows us to answer the question "if I
replace %a1 with %c1, will it contribute in an equivalent way to all
successive instructions?". The (new) PostValueTable class in GVN provides this
mapping.

This pass has some shown really impressive improvements especially for codesize already on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.

Differential revision: https://reviews.llvm.org/D24805

llvm-svn: 303850
2017-05-25 12:51:11 +00:00
Chandler Carruth dd2e275a47 [PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch
pass.

The original logic only considered direct successors of the hoisted
domtree nodes, but that isn't really enough. If there are other basic
blocks that are completely within the subtree, their successors could
just as easily be impacted by the hoisting.

The more I think about it, the more I think the correct update here is
to hoist every block on the dominance frontier which has an idom in the
chain we hoist across. However, this is subtle enough that I'd
definitely appreciate some more eyes on it.

Sadly, if this is the correct algorithm, it requires computing a (highly
localized) dominance frontier. I've done this in the simplest (IE, least
code) way I could come up with, but that may be too naive. Suggestions
welcome here, dominance update algorithms are not an area I've studied
much, so I don't have strong opinions.

In good news, with this patch, turning on simple unswitch passes the
LLVM test suite for me with asserts enabled.

Differential Revision: https://reviews.llvm.org/D32740

llvm-svn: 303843
2017-05-25 06:33:36 +00:00
Chandler Carruth 29c22d2835 [LegacyPM] Make the 'addLoop' method accept a loop to add rather than
having it internally allocate the loop.

This is a much more flexible API and necessary in the new loop unswitch
to reasonably support both new and old PMs in common code. It also just
seems like a cleaner separation of concerns.

NFC, this should just be a pure refactoring.

Differential Revision: https://reviews.llvm.org/D33528

llvm-svn: 303834
2017-05-25 03:01:31 +00:00
George Karpenkov a1c532784d Fix coverage check for full post-dominator basic blocks.
Coverage instrumentation which does not instrument full post-dominators
and full-dominators may skip valid paths, as the reasoning for skipping
blocks may become circular.
This patch fixes that, by only skipping
full post-dominators with multiple predecessors, as such predecessors by
definition can not be full-dominators.

llvm-svn: 303827
2017-05-25 01:41:46 +00:00
Gor Nishanov 1fbc01f70f [coroutines] CoroFrame.cpp conform to coding convention (s/repeat/Repeat) (NFC)
llvm-svn: 303826
2017-05-25 01:07:10 +00:00
Gor Nishanov 0ea1863b27 [coroutines] Relocate instructions that maybe spilled after coro.begin
Summary:
Frontend generates store instructions after allocas, for example:

```
define i8* @f(i64 %this) "coroutine.presplit"="1" personality i32 0 {
entry:
  %this.addr = alloca i64
  store i64 %this, i64* %this.addr
  ..
  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)

```
Such instructions may require spilling into coro.frame, but, coro-frame address is only available after coro.begin and thus needs to be moved after coro.begin.
The only instructions that should not be moved are the arguments of coro.begin and all of their operands.

Reviewers: GorNishanov, majnemer

Reviewed By: GorNishanov

Subscribers: llvm-commits, EricWF

Differential Revision: https://reviews.llvm.org/D33527

llvm-svn: 303825
2017-05-25 00:46:20 +00:00
Gor Nishanov 1f72d75714 [coroutines] Allow rematerialization upto 4 times. Remove incorrect assert
Reviewers: majnemer

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D33524

llvm-svn: 303819
2017-05-24 23:01:02 +00:00
Sanjay Patel 07b1ba54b5 [InstCombine] use m_APInt to allow icmp-mul-mul vector fold
The swapped operands in the first test is a manifestation of an 
inefficiency for vectors that doesn't exist for scalars because 
the IRBuilder checks for an all-ones mask for scalars, but not 
vectors.

llvm-svn: 303818
2017-05-24 22:58:17 +00:00
Craig Topper 2f9c6dafe3 [InstCombine] Merge together the SimplifyDemandedUseBits implementations for ZExt and Trunc. NFC
While there avoid resizing the DemandedMask twice. Make a copy into a separate variable instead. This potentially removes an allocation on large bit widths.

With the use of the zextOrTrunc methods on APInt and KnownBits these can be made almost source identical. The only difference is the zero of the upper bits for ZExt. This is similar to how its done in computeKnownBits in ValueTracking.

llvm-svn: 303791
2017-05-24 18:40:25 +00:00
Teresa Johnson cd2aa0d2e4 Fix a couple of typos in memory intrinsic optimization output (NFC)
s/instrinsic/intrinsic

llvm-svn: 303782
2017-05-24 17:55:25 +00:00
Craig Topper 1c660dbea6 [InstCombine] Use less bitwise operations to handle Instruction::SExt in SimplifyDemandedUseBits. Other improvements.
The current code created a NewBits mask and used it as a mask several times. One of them just before a call to trunc making it unnecessary. A call to getActiveBits can get us the same information for the case. We also ORed with this mask later when we should have just sign extended the known bits.

We also called trunc on the guaranteed to be zero KnownZeros/Ones masks entering this code. Creating appropriately sized temporary APInts is probably better.

Differential Revision: https://reviews.llvm.org/D32098

llvm-svn: 303779
2017-05-24 17:33:30 +00:00
Craig Topper 8205a1a9b6 [ValueTracking] Convert most of the calls to computeKnownBits to use the version that returns the KnownBits object.
This continues the changes started when computeSignBit was replaced with this new version of computeKnowBits.

Differential Revision: https://reviews.llvm.org/D33431

llvm-svn: 303773
2017-05-24 16:53:07 +00:00
Matthew Simpson d6f179cad6 [LV] Update type in cost model for scalarization
For non-uniform instructions marked for scalarization, we should update
`VectorTy` when computing instruction costs to reflect the scalar type. In
addition to determining instruction costs, this type is also used to signal
that all instructions in the loop will be scalarized. This currently affects
memory instructions and non-pointer induction variables and their updates. (We
also mark GEPs scalar after vectorization, but their cost is computed together
with memory instructions.) For scalarized induction updates, this patch also
scales the scalar cost by the vectorization factor, corresponding to each
induction step.

llvm-svn: 303763
2017-05-24 15:26:15 +00:00
Jonas Paulsson 8624b7e1ce [LoopVectorizer] Let target prefer scalar addressing computations.
The loop vectorizer usually vectorizes any instruction it can and then
extracts the elements for a scalarized use. On SystemZ, all elements
containing addresses must be extracted into address registers (GRs). Since
this extraction is not free, it is better to have the address in a suitable
register to begin with. By forcing address arithmetic instructions and loads
of addresses to be scalar after vectorization, two benefits result:

* No need to extract the register
* LSR optimizations trigger (LSR isn't handling vector addresses currently)

Benchmarking show improvements on SystemZ with this new behaviour.

Any other target could try this by returning false in the new hook
prefersVectorizedAddressing().

Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand
https://reviews.llvm.org/D32422

llvm-svn: 303744
2017-05-24 13:42:56 +00:00
Davide Italiano fd9100e056 [NewGVN] Update additionalUsers when we simplify to a value.
Otherwise we don't revisit an instruction that could be simplified,
and when we verify, we discover there's something that changed, i.e.
what we had wasn't a maximal fixpoint.

Fixes PR32836.

llvm-svn: 303715
2017-05-24 02:30:24 +00:00
George Karpenkov 018472c34a Revert "Disable coverage opt-out for strong postdominator blocks."
This reverts commit 2ed06f05fc10869dd1239cff96fcdea2ee8bf4ef.
Buildbots do not like this on Linux.

llvm-svn: 303710
2017-05-24 00:29:12 +00:00
Davide Italiano c4861adad9 [SCCP] Use the `hasAddressTaken()` version defined in `Function`.
Instead of using the SCCP homegrown one. We should eventually
make the private SCCP version disappear, but that wont' be today.
PR33143 tracks this issue.

Add braces for consistency while here. No functional change intended.

llvm-svn: 303706
2017-05-23 23:59:23 +00:00
Davide Italiano 7bf95b964f [LIR] Use the newly `getRecurrenceVar()` helper. NFCI.
llvm-svn: 303704
2017-05-23 23:51:54 +00:00
Davide Italiano 4bc91190ea [LIR] Strengthen the check for recurrence variable in popcnt/CTLZ.
Fixes PR33114.
Differential Revision:  https://reviews.llvm.org/D33420

llvm-svn: 303700
2017-05-23 22:32:56 +00:00
George Karpenkov 9017ca290a Disable coverage opt-out for strong postdominator blocks.
Coverage instrumentation has an optimization not to instrument extra
blocks, if the pass is already "accounted for" by a
successor/predecessor basic block.
However (https://github.com/google/sanitizers/issues/783) this
reasoning may become circular, which stops valid paths from having
coverage.
In the worst case this can cause fuzzing to stop working entirely.

This change simplifies logic to something which trivially can not have
such circular reasoning, as losing valid paths does not seem like a
good trade-off for a ~15% decrease in the # of instrumented basic blocks.

llvm-svn: 303698
2017-05-23 21:58:54 +00:00
Sanjay Patel d3106add77 [InstCombine] allow icmp-xor folds for vectors (PR33138)
This fixes the first part of:
https://bugs.llvm.org/show_bug.cgi?id=33138

More work is needed for the bitcasted variant.

llvm-svn: 303660
2017-05-23 17:29:58 +00:00
Reid Kleckner 8bf67fe98f [IR] Switch AttributeList to use an array for O(1) access
Summary:
Before this change, AttributeLists stored a pair of index and
AttributeSet. This is memory efficient if most arguments do not have
attributes. However, it requires doing a search over the pairs to test
an argument or function attribute. Profiling shows that this loop was
0.76% of the time in 'opt -O2' of sqlite3.c, because LLVM constantly
tests values for nullability.

This was worth about 2.5% of mid-level optimization cycles on the
sqlite3 amalgamation. Here are the full perf results:
https://reviews.llvm.org/P7995

Here are just the before and after cycle counts:
```
$ perf stat -r 5 ./opt_before -O2 sqlite3.bc -o /dev/null
    13,274,181,184      cycles                    #    3.047 GHz                      ( +-  0.28% )
$ perf stat -r 5 ./opt_after -O2 sqlite3.bc -o /dev/null
    12,906,927,263      cycles                    #    3.043 GHz                      ( +-  0.51% )
```

This patch *does not* change the indices used to query attributes, as
requested by reviewers. Tracking whether an index is usable for array
indexing is a huge pain that affects many of the internal APIs, so it
would be good to come back later and do a cleanup to remove this
internal adjustment.

Reviewers: pete, chandlerc

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D32819

llvm-svn: 303654
2017-05-23 17:01:48 +00:00
Anna Thomas c07d5544dd [JumpThreading] Safely replace uses of condition
This patch builds over https://reviews.llvm.org/rL303349 and replaces
the use of the condition only if it is safe to do so.

We should not blindly RAUW the condition if experimental.guard or assume
is a use of that
condition. This is because LVI may have used the guard/assume to
identify the
value of the condition, and RUAWing will fold the guard/assume and uses
before the guards/assumes.

Reviewers: sanjoy, reames, trentxintong, mkazantsev

Reviewed by: sanjoy, reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33257

llvm-svn: 303633
2017-05-23 13:36:25 +00:00
Craig Topper 7e0aeeb884 [KnownBits] Use !hasConflict() in asserts in place of Zero & One == 0 or similar. NFC
llvm-svn: 303614
2017-05-23 07:18:37 +00:00
Ayal Zaks 589e1d9610 [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis
The default behavior of -Rpass-analysis=loop-vectorizer is to report only the
first reason encountered for not vectorizing, if one is found, at which time the
vectorizer aborts its handling of the loop. This patch allows multiple reasons
for not vectorizing to be identified and reported, at the potential expense of
additional compile-time, under allowExtraAnalysis which can currently be turned
on by Clang's -fsave-optimization-record and opt's -pass-remarks-missed.

Removed from LoopVectorizationLegality::canVectorize() the redundant checking
and reporting if we CantComputeNumberOfIterations, as LAI::canAnalyzeLoop() also
does that. This redundancy is caught by a lit test once multiple reasons are
reported.

Patch initially developed by Dror Barak.

Differential Revision: https://reviews.llvm.org/D33396

llvm-svn: 303613
2017-05-23 07:08:02 +00:00
Teresa Johnson 525dcb617b Fix update VP metadata after inlining for instrumentation PGO
Summary:
With instrumentation profiling, when updating the VP metadata after
an inline, VP metadata on the inlined copy was inadvertantly having
all counts zeroed out. This was causing indirect calls from code inlined
during the call step to be marked as cold in the ThinLTO summaries and
not imported.

The CallerBFI needs to be passed down so that the CallSiteCount can be
computed from the profile summary info. With Sample PGO this was working
since the count is extracted from the branch weight metadata on the
call being inlined (even before we stopped looking at metadata for
non-sample PGO in r302844 this largely wasn't working for instrumentation
PGO since only promoted indirect calls would be getting inlined and have
the metadata).

Added an instrumentation PGO test and renamed the sample PGO test.

Reviewers: danielcdh, eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D33389

llvm-svn: 303574
2017-05-22 20:28:18 +00:00
Xinliang David Li 126157c3b4 [PartialInlining] Add internal options to enable partial inlining in pass pipeline (off by default)
1. Legacy: -mllvm -enable-partial-inlining
2. New:  -mllvm -enable-npm-partial-inlining -fexperimental-new-pass-manager

Differential Revision: http://reviews.llvm.org/D33382

llvm-svn: 303567
2017-05-22 16:41:57 +00:00
Artur Pilipenko edee25152b [LoopPredication] NFC. Add extra debug output in case we fail to parse the range check
llvm-svn: 303544
2017-05-22 12:06:57 +00:00
Artur Pilipenko c488dfabac [LoopPredication] NFC. Move a nested struct declaration before the fields, clang-format a bit
This will simplify the diff for an upcoming review.

llvm-svn: 303543
2017-05-22 12:01:32 +00:00
Craig Topper 2b1fc32f22 [InstCombine] Cleanup the interface for overflow checks
Summary:
Fix naming conventions and const correctness.
This completes the changes made in rL303029.

Patch by Yoav Ben-Shalom.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33377

llvm-svn: 303529
2017-05-22 06:25:31 +00:00
Craig Topper e777fed152 [SimplifyCFG] Prevent a few APInt copies on method calls that return const reference. NFCI
llvm-svn: 303523
2017-05-22 00:49:35 +00:00
Craig Topper aaef41f71b [KnownBits] Use isNegative/isNonNegative to shorten some code. NFC
llvm-svn: 303522
2017-05-22 00:49:33 +00:00
Daniel Berlin d130b6c27d NewGVN: Fix PR 33116, the memoryphi version of bug 32838.
llvm-svn: 303521
2017-05-21 23:41:58 +00:00
Daniel Berlin 0207cca8e0 NewGVN: Cleanup some repeated code using some templated helpers
llvm-svn: 303520
2017-05-21 23:41:56 +00:00
Daniel Berlin 0193997b7e NewGVN: Fix printing of simplified expression
llvm-svn: 303519
2017-05-21 23:41:53 +00:00
Davide Italiano 21a49dcdf1 [InstCombine] Take in account the size in sext->lshr->trunc patterns.
Otherwise we end up miscompiling, transforming:

define i8 @tinky() {
  %sext = sext i1 1 to i16
  %hibit = lshr i16 %sext, 15
  %tr = trunc i16 %hibit to i8
  ret i8 %tr
}

into:

  %sext = sext i1 1 to i8
  ret i8 %sext

and the first get folded to ret i8 1, while the second gets folded
to ret i8 -1.

Eventually we should get rid of this transform entirely, but for now,
this at least fixes a know correctness bug.

Differential Revision:  https://reviews.llvm.org/D33338

llvm-svn: 303513
2017-05-21 20:30:27 +00:00
Xin Tong 9fbfeefadf Revert "Add pthread_self function prototype and make it speculatable."
This reverts commit 143d7445b5dfa2f6d6c45bdbe0433d9fc531be21.

Build breaking

llvm-svn: 303496
2017-05-21 00:37:55 +00:00
Xin Tong 75af3af957 Add pthread_self function prototype and make it speculatable.
Summary: This allows pthread_self to be pulled out of a loop by LICM.

Reviewers: hfinkel, arsenm, davide

Reviewed By: davide

Subscribers: davide, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D32782

llvm-svn: 303495
2017-05-20 22:40:25 +00:00
Davide Italiano 9a0f542db6 [NewGVN] Create a StoreExpression instead of a VariableExpression.
In the case where we have an operand defined by a lod of the
same memory location. Historically this was a VariableExpression
because we wanted to make sure they ended up in the same class,
but if we create the right expression, they end up in the same
class anyway.

Fixes PR32897. Thanks to Dan for the detailed discussion and the
fix suggestion.

llvm-svn: 303475
2017-05-20 00:46:54 +00:00
Davide Italiano 888965c8a2 [NewGVN] Get rid of an assertion.
This was here because we don't want to switch leaders too much,
in order to avoid fixpoint(ing) issue, but it's not sure if it
matters in practice.

A first step towards fixing PR32897.

llvm-svn: 303473
2017-05-20 00:24:04 +00:00
Adrian Prantl 660437975b Revert "ThinLTO: Verify bitcode before lauching the ThinLTOCodeGenerator."
This reverts commit r303438 while deliberating buildbot breakage.

llvm-svn: 303467
2017-05-19 23:32:21 +00:00
Matthias Braun 50ec0b5dce SimplifyLibCalls: Optimize wcslen
Refactor the strlen optimization code to work for both strlen and wcslen.

This especially helps with programs in the wild where people pass
L"string"s to const std::wstring& function parameters and the wstring
constructor gets inlined.

This also fixes a lingerind API problem/bug in getConstantStringInfo()
where zeroinitializers would always give you an empty string (without a
length) back regardless of the actual length of the initializer which
did not work well in the TrimAtNul==false causing the PR mentioned
below.

Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG
memcpy lowering and may lead to some cases for out-of-bounds
zeroinitializer accesses not getting optimized anymore. So some code
with UB may produce out of bound memory reads now instead of just
producing zeros.

The refactoring "accidentally" fixes http://llvm.org/PR32124

Differential Revision: https://reviews.llvm.org/D32839

llvm-svn: 303461
2017-05-19 22:37:09 +00:00
Daniel Berlin e021d2d629 NewGVN: Fix PR32838.
This is a complicated bug involving two issues:
1. What do we do with phi nodes when we prove all arguments are not
live?
2. When is it safe to use value leaders to determine if we can ignore
an argumnet?

llvm-svn: 303453
2017-05-19 20:22:20 +00:00
Daniel Berlin b527b2cf13 Last of the major pieces to NewGVN - yay!
Summary:
NewGVN: Handle equivalence between phi of ops and op of phis.

This makes our GVN mostly-complete. It would be complete, modulo some
deliberate choices we make.  This means it detects roughly all herband
equivalences in polynomial time, including cases notoriously hard for
other GVN's to detect.  It also detects a very large swath of the
cases we currently rely on instcombine to detect that involve folding
upwards through phis.

Fixes PR 31125, 31463, PR 31868

Reviewers: davide

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D32151

llvm-svn: 303444
2017-05-19 19:01:27 +00:00
Daniel Berlin ff15200b1d NewGVN: Get rid of most dominating leader check
llvm-svn: 303443
2017-05-19 19:01:24 +00:00
Anna Thomas ae3f752f36 [NFC][loopIdiom] Clang format change rL303434
llvm-svn: 303439
2017-05-19 18:00:30 +00:00
Adrian Prantl f9ab9bfc39 ThinLTO: Verify bitcode before lauching the ThinLTOCodeGenerator.
rdar://problem/31233625

Differential Revision: https://reviews.llvm.org/D33151

llvm-svn: 303438
2017-05-19 17:55:02 +00:00
Anna Thomas 5ecb8f7593 [LoopIdiom] Refactor return value of isLegalStore [NFC]
Summary:

This NFC simply refactors the return value of LoopIdiomRecognize::isLegalStore() from bool to an enumeration, and
removes the return-through-parameter mechanism that the function was using. This function is constructed such that it will
only ever recognize a single store idiom (memset, memset_pattern, or memcpy), and never a combination of these. As such it
makes much more sense for the return value to be the single idiom that the store matches, rather than
having a separate argument-return for each idiom -- it's cleaner, and makes it clearer that
only a single idiom can be matched.

Patch by Daniel Neilson!

Reviewers: anna, sanjoy, davide, haicheng

Reviewed By: anna, haicheng

Subscribers: haicheng, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D33359

llvm-svn: 303434
2017-05-19 17:05:36 +00:00
Artur Pilipenko a6c278049a [LoopPredication] NFC. Extract LoopICmp struct and parseLoopICmp helper
llvm-svn: 303427
2017-05-19 14:02:46 +00:00
Artur Pilipenko 6780ba65b9 [LoopPredication] NFC. Extract LoopPredication::expandCheck helper
llvm-svn: 303426
2017-05-19 14:00:58 +00:00
Artur Pilipenko aab28666bc [LoopPredication] NFC. Extract CanExpand helper lambda
llvm-svn: 303425
2017-05-19 14:00:04 +00:00
Artur Pilipenko 46c4e0a4bf [LoopPredication] NFC. Add an early exit if there is no guards in the loop
llvm-svn: 303424
2017-05-19 13:59:34 +00:00
Amara Emerson 4d33c86359 Fix vector pass-through value being unused in IRBuilder::CreateMaskedGather
Also s/0/nullptr in the call site in LV.

llvm-svn: 303416
2017-05-19 10:40:18 +00:00
Davide Italiano ee49f4943c [NewGVN] Delete the old store when we find congruent to a load.
(or non-store, more in general). Fixes PR33086. Caught by the
store verifier.

llvm-svn: 303406
2017-05-19 04:06:10 +00:00
Davide Italiano eab0de2b82 [NewGVN] Break infinite recursion in singleReachablePHIPath().
We can have cycles between PHIs and this causes singleReachablePhi()
to call itself indefintely (until we run out of stack). The proper
solution would be that of computing SCCs, but it's not worth for
now, so just keep a visited set and give up when we find a cycle.
Thanks to Dan for the discussion/help with this.

Fixes PR33014.

llvm-svn: 303393
2017-05-18 23:22:44 +00:00
Davide Italiano a76e5fa111 [NewGVN] Replace predicate info leftovers.
This time with an additional fix, i.e. we remove the dead
@llvm.ssa.copy instruction.

llvm-svn: 303385
2017-05-18 21:43:23 +00:00
Sanjay Patel 5e456b943a [InstCombine] add helper to foldXorOfICmps(); NFCI
Also, fix the old-style capitalization of the related functions
and move them to the 'private' section of the class since they
are just helpers of the visit* functions.

As shown in the post-commit comments for D32143, we are missing
folds for xor-of-icmps. 

llvm-svn: 303381
2017-05-18 20:53:16 +00:00
Reid Kleckner 96ab8726a3 [IR] De-virtualize ~Value to save a vptr
Summary:
Implements PR889

Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:

https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing

This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal.  However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue.  I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.

I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.

Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv

Reviewed By: chandlerc

Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D31261

llvm-svn: 303362
2017-05-18 17:24:10 +00:00
Wei Mi 8848c1e3c7 [LSR] Call canonicalize after we generate a new Formula in GenerateTruncates. Fix PR33077.
The testcase in PR33077 generates a LSR Use Formula with two SCEVAddRecExprs for the same
loop. Such uncommon formula will become non-canonical after GenerateTruncates adds sign
extension to the ScaledReg of the Formula, and it will break the assertion that every
Formula to be inserted is canonical.

The fix is to call canonicalize for the raw Formula generated by GenerateTruncates
before inserting it.

llvm-svn: 303361
2017-05-18 17:21:22 +00:00
Anna Thomas 7bca59152a [JumpThreading] Dont RAUW condition incorrectly
Summary:
We have a bug when RAUWing the condition if experimental.guard or assumes is a use of that
condition. This is because LazyValueInfo may have used the guards/assumes to identify the
value of the condition at the end of the block. RAUW replaces the uses
at the guard/assume as well as uses before the guard/assume. Both of
these are incorrect.
For now, disable RAUW for conditions and fix the logic as a next
step: https://reviews.llvm.org/D33257

Reviewers: sanjoy, reames, trentxintong

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33279

llvm-svn: 303349
2017-05-18 13:12:18 +00:00
Craig Topper 8a950275f7 [Statistics] Add a method to atomically update a statistic that contains a maximum
Summary:
There are several places in the codebase that try to calculate a maximum value in a Statistic object. We currently do this in one of two ways:

  MaxNumFoo = std::max(MaxNumFoo, NumFoo);

or

  MaxNumFoo = (MaxNumFoo > NumFoo) ? MaxNumFoo : NumFoo;

The first version reads from MaxNumFoo one time and uncontionally rwrites to it. The second version possibly reads it twice depending on the result of the first compare.  But we have no way of knowing if the value was changed by another thread between the reads and the writes.

This patch adds a method to the Statistic object that can ensure that we only store if our value is the max and the previous max didn't change after we read it. If it changed we'll recheck if our value should still be the max or not and try again.

This spawned from an audit I'm trying to do of all places we uses the implicit conversion to unsigned on the Statistics objects. See my previous thread on llvm-dev https://groups.google.com/forum/#!topic/llvm-dev/yfvxiorKrDQ

Reviewers: dberlin, chandlerc, hfinkel, dblaikie

Reviewed By: chandlerc

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D33301

llvm-svn: 303318
2017-05-18 00:51:39 +00:00
Craig Topper 48187cffe2 [Statistics] Use Statistic::operator+= instead of adding and assigning separately.
I believe this technically fixes a multithreaded race condition in this code. But my primary concern was as part of looking at removing the ability to treat Statistics like a plain unsigned. There are many weird operations on Statistics in the codebase.

llvm-svn: 303314
2017-05-17 23:22:10 +00:00
Sanjay Patel ba212c241a [InstCombine] handle icmp i1 X, C early to avoid creating an unknown pattern
The missing optimization for xor-of-icmps still needs to be added, but by
being more efficient (not generating unnecessary logic ops with constants)
we avoid the bug.

See discussion in post-commit comments:
https://reviews.llvm.org/D32143

llvm-svn: 303312
2017-05-17 22:29:40 +00:00
Sanjay Patel e5747e3cbd [InstCombine] move icmp bool canonicalizations to helper; NFC
As noted in the post-commit comments in D32143, we should be
catching the constant operand cases sooner to be more efficient
and less likely to expose a missing fold.

llvm-svn: 303309
2017-05-17 22:15:07 +00:00
Sanjay Patel b2e7003103 [InstCombine] add isCanonicalPredicate() helper function and use it; NFCI
There should be a slight efficiency improvement from handling icmp/fcmp with one matcher and reducing duplicated code.

The larger motivation is that there are questions about how predicate canonicalization is handled, and the refactoring
should make it easier if we want to change any of that behavior.

1. As noted in the code comment, we've chosen 3 of the 16 FCMP preds as not canonical. Why those 3? It goes back to 
   rL32751 from what I can tell, but I'm not sure if there's a justification for that rule.
2. We currently do not canonicalize integer select conditions. Should we use the same rule that applies to branches 
   for selects?
3. We currently do canonicalize some FP select conditions, and those rules would conflict with the rule shown here. 
   Should one or both be changed? 

No-functional-change-intended, but adding tests anyway because there's no coverage for most of the predicates.

Differential Revision: https://reviews.llvm.org/D33247

llvm-svn: 303261
2017-05-17 14:21:19 +00:00
Gor Nishanov db38485588 [coroutines] Handle spills before catchswitch
If we need to spill the result of the PHI instruction, we insert the spill after
all of the PHIs and EHPads, however, in a catchswitch block there is no
room to insert the spill. Make room by splitting away catchswitch into a separate
block.

Before the fix:

    catch.dispatch:
       %val = phi i32 [ 1, %if.then ], [ 2, %if.else ]
       %switch = catchswitch within none [label %catch] unwind label %cleanuppad

After:

    catch.dispatch:
       %val = phi i32 [ 1, %if.then ], [ 2, %if.else ]
       %tok = cleanuppad within none []
       ; spill goes here
       cleanupret from %tok unwind label %catch.dispatch.switch
    catch.dispatch.switch:
       %switch = catchswitch within none [label %catch] unwind label %cleanuppad

https://reviews.llvm.org/D31846

llvm-svn: 303232
2017-05-17 03:09:22 +00:00
Francis Visoiu Mistrih b52e036600 BitVector: add iterators for set bits
Differential revision: https://reviews.llvm.org/D32060

llvm-svn: 303227
2017-05-17 01:07:53 +00:00
Eugene Zelenko a369a45746 [ADT] Fix some Clang-tidy modernize-use-using warnings; other minor fixes (NFC).
llvm-svn: 303221
2017-05-16 23:10:25 +00:00
Davide Italiano 79eb3b0366 [IR] Prefer use_empty() to !hasNUsesOrMore(1) for clarity.
llvm-svn: 303218
2017-05-16 22:38:40 +00:00
Evgeny Stupachenko cc19560253 The patch exclude a case from zero check skip in
CTLZ idiom recognition (r303102).

Summary:

The following case:
i = 1;
if(n)
  while (n >>= 1)
    i++;
use(i);

Was converted to:

i = 1;
if(n)
  i += builtin_ctlz(n >> 1, false);
use(i);

Which is not correct. The patch make it:

i = 1;
if(n)
  i += builtin_ctlz(n >> 1, true);
use(i);

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 303212
2017-05-16 21:44:59 +00:00
Dmitry Mikulin fce148c568 In debug builds non-trivial amount of time is spent in InstCombine processing
@llvm.dbg.* calls in visitCallInst(). They can be safely ignored.

llvm-svn: 303202
2017-05-16 20:08:49 +00:00
Daniel Berlin 6c66e9a22a NewGVN: Only do something in verifyStoreExpressions if assertions are enabled, to avoid unused code warnings.
llvm-svn: 303201
2017-05-16 20:02:45 +00:00
Daniel Berlin 4540357240 NewGVN: Fix PR 33051 by making sure we remove old store expressions
from the ExpressionToClass mapping.

llvm-svn: 303200
2017-05-16 19:58:47 +00:00
Matthew Simpson af60af1ed5 Revert 303174, 303176, and 303178
These commits are breaking the bots. Reverting to investigate.

llvm-svn: 303182
2017-05-16 15:50:30 +00:00
Matthew Simpson b7b5d55c38 [LV] Avoid potentential division by zero when selecting IC
llvm-svn: 303174
2017-05-16 14:43:55 +00:00
Gor Nishanov 23453c11ff [coroutines] Handle unwind edge splitting
Summary:
RewritePHIs algorithm used in building of CoroFrame inserts a placeholder
```
%placeholder = phi [%val]
```
on every edge leading to a block starting with PHI node with multiple incoming edges,
so that if one of the incoming values was spilled and need to be reloaded, we have a
place to insert a reload. We use SplitEdge helper function to split the incoming edge.

SplitEdge function does not deal with unwind edges comping into a block with an EHPad.

This patch adds an ehAwareSplitEdge function that can correctly split the unwind edge.

For landing pads, we clone the landing pad into every edge block and replace the original
landing pad with a PHI collection the values from all incoming landing pads.

For WinEH pads, we keep the original EHPad in place and insert cleanuppad/cleapret in the
edge blocks.

Reviewers: majnemer, rnk

Reviewed By: majnemer

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D31845

llvm-svn: 303172
2017-05-16 14:11:39 +00:00
Craig Topper 064adc6bfa [CorrelatedValuePropagation] Don't use -> to call a static method of ConstantRange. NFC
llvm-svn: 303147
2017-05-16 07:05:38 +00:00
Daniel Berlin 629e1ff6e6 NewGVN: Use StoreExpression StoredValue instead of looking it up again, since it was already looked up when it was created
llvm-svn: 303144
2017-05-16 06:06:15 +00:00
Daniel Berlin abd632dfeb NewGVN: Formatting fixes
llvm-svn: 303143
2017-05-16 06:06:12 +00:00
Davide Italiano a641842845 Revert "[NewGVN] Replace predicate info leftovers."
It's breaking the bots.

llvm-svn: 303142
2017-05-16 05:51:21 +00:00
Davide Italiano 331058fcc4 [NewGVN] Replace predicate info leftovers.
Fixes PR32945.

Differential Revision:  https://reviews.llvm.org/D33226

llvm-svn: 303141
2017-05-16 05:23:23 +00:00
Peter Collingbourne 6f0ecca3b5 IR: Give function GlobalValue::getRealLinkageName() a less misleading name: dropLLVMManglingEscape().
This function gives the wrong answer on some non-ELF platforms in some
cases. The function that does the right thing lives in Mangler.h. To try to
discourage people from using this function, give it a different name.

Differential Revision: https://reviews.llvm.org/D33162

llvm-svn: 303134
2017-05-16 00:39:01 +00:00
Xinliang David Li 8726d91d29 Fix memory leak
llvm-svn: 303126
2017-05-15 22:43:52 +00:00
David Blaikie 441cfee780 PR32288: Describe a bool parameter's DWARF location with a simple register
There's no need (& a bit incorrect) to mask off the high bits of the
register reference when describing a simple bool value.

Reviewers: aprantl

Differential Revision: https://reviews.llvm.org/D31062

llvm-svn: 303117
2017-05-15 21:34:01 +00:00
Adam Nemet e29686e5c1 [SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits).  This
is beneficial for example for 2D and 3D graphics.  This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP in the cold code.  The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
2017-05-15 21:15:01 +00:00
Evgeniy Stepanov b56012b548 [asan] Better workaround for gold PR19002.
See the comment for more details. Test in a follow-up CFE commit.

llvm-svn: 303113
2017-05-15 20:43:42 +00:00
Davide Italiano cff8a34716 [NewGVN] Remove unused setDefiningExpr(). NFCI.
llvm-svn: 303107
2017-05-15 19:35:40 +00:00
Sanjay Patel 878715f978 [InstCombine] restrict icmp fold with 2 sdiv exact operands (PR32949)
This is the InstCombine counterpart to D32954. 
I added some comments about the code duplication in:
rL302436

Alive-based verification:
http://rise4fun.com/Alive/dPw

This is a 2nd fix for the problem reported in:
https://bugs.llvm.org/show_bug.cgi?id=32949

Differential Revision: https://reviews.llvm.org/D32970

llvm-svn: 303105
2017-05-15 19:27:53 +00:00
Evgeny Stupachenko 2fecd38ab8 The patch adds CTLZ idiom recognition.
Summary:

The following loops should be recognized:
i = 0;
while (n) {
  n = n >> 1;
  i++;
  body();
}
use(i);

And replaced with builtin_ctlz(n) if body() is empty or
for CPUs that have CTLZ instruction converted to countable:

for (j = 0; j < builtin_ctlz(n); j++) {
  n = n >> 1;
  i++;
  body();
}
use(builtin_ctlz(n));

Reviewers: rengolin, joerg

Differential Revision: http://reviews.llvm.org/D32605

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 303102
2017-05-15 19:08:56 +00:00
Davide Italiano 6e7a212748 [NewGVN] Fix verification of MemoryPhis in verifyMemoryCongruency().
verifyMemoryCongruency() filters out trivially dead MemoryDef(s),
as we find them immediately dead, before moving from TOP to a new
congruence class.
This fixes the same problem for PHI(s) skipping MemoryPhis if all
the operands are dead.

Differential Revision:  https://reviews.llvm.org/D33044

llvm-svn: 303100
2017-05-15 18:50:53 +00:00
Sanjay Patel 941e8dfcbf [InstCombine] use m_OneUse to reduce code; NFCI
llvm-svn: 303090
2017-05-15 18:08:17 +00:00
Craig Topper 1a36b7d836 [ValueTracking] Replace all uses of ComputeSignBit with computeKnownBits.
This patch finishes off the conversion of ComputeSignBit to computeKnownBits.

Differential Revision: https://reviews.llvm.org/D33166

llvm-svn: 303035
2017-05-15 06:39:41 +00:00
Craig Topper bb9737247a [InstCombine] Merge duplicate functionality between InstCombine and ValueTracking
Summary:
Merge overflow computation for signed add,
appearing both in InstCombine and ValueTracking.

As part of the merge,
cleanup the interface for overflow checks in InstCombine.

Patch by Yoav Ben-Shalom.

Reviewers: craig.topper, majnemer

Reviewed By: craig.topper

Subscribers: takuto.ikuta, llvm-commits

Differential Revision: https://reviews.llvm.org/D32946

llvm-svn: 303029
2017-05-15 02:44:08 +00:00
Craig Topper 26c4159956 [InstCombine] Remove 'return' of a called function that also returned void. NFC
llvm-svn: 303028
2017-05-15 02:30:27 +00:00
Xinliang David Li 392e975693 Fix test failure on windows -- do not return deleted func
llvm-svn: 302999
2017-05-14 02:54:02 +00:00
Simon Pilgrim 7d62e4b455 [LoopOptimizer][Fix]PR32859, PR24738
The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form.
Here it is:

before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ]
after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ]

and after this change we have:

%e.0.ph = phi i32 [ 0, %for.inc.2.i ]
%e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ]

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D33055

llvm-svn: 302988
2017-05-13 13:25:57 +00:00
Craig Topper 935f7b050f [InstCombine] Prevent InstCombine from triggering an extra iteration if something changed in the initial Worklist creation
Summary:
If the Worklist build causes an IR change this change flag currently factors into the flag for running another iteration of the iteration loop. But only changes during processing should trigger another loop.

This patch captures the worklist creation change flag into the outside the loop flag currently used for DbgDeclares and only sends that flag up to the caller. Rerunning the loop only depends on IC.run() now.

This uses the debug output of InstCombine to determine if one or two iterations run. I couldn't think of a better way to detect it since the second spurious iteration shoudn't make any visible changes. Just wasted computation.

I can do a pre-commit of the test case with the CHECK-NOT as a CHECK if this is an ok way to check this.

This is a subset of D31678 as I'm still not sure how to verify the analysis behavior for that.

Reviewers: davide, majnemer, spatel, chandlerc

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32453

llvm-svn: 302982
2017-05-13 06:56:04 +00:00
Xinliang David Li 66bdfca77a [PartialInlining] Profile based cost analysis
Implemented frequency based cost/saving analysis
and related options.

The pass is now in a state ready to be turne on
in the pipeline (in follow up).

Differential Revision: http://reviews.llvm.org/D32783

llvm-svn: 302967
2017-05-12 23:41:43 +00:00
Craig Topper 8df66c602a [KnownBits] Add bit counting methods to KnownBits struct and use them where possible
This patch adds min/max population count, leading/trailing zero/one bit counting methods.

The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.

Differential Revision: https://reviews.llvm.org/D32931

llvm-svn: 302925
2017-05-12 17:20:30 +00:00
Davide Italiano c43a9f80ed [NewGVN] Improve debug output a bit. NFCI.
While debugging a predicate info problem, I noticed this was missing
a newline, making the debug output slightly less readable.

llvm-svn: 302908
2017-05-12 15:28:12 +00:00
Davide Italiano b60f6e0550 [NewGVN] Format an assertion and fix a typo. NFCI.
llvm-svn: 302906
2017-05-12 15:25:56 +00:00
Davide Italiano 41f5c7bcba [NewGVN] Don't incorrectly reset the memory leader.
This code was missing a check for stores, so we were thinking the
congruency class didn't have any memory members, and reset the
memory leader.

Differential Revision:  https://reviews.llvm.org/D33056

llvm-svn: 302905
2017-05-12 15:22:45 +00:00
Chandler Carruth d869b18826 [PM/Unswitch] Teach the new simple loop unswitch to handle loop
invariant PHI inputs and to rewrite PHI nodes during the actual
unswitching.

The checking is quite easy, but rewriting the PHI nodes is somewhat
surprisingly challenging. This should handle both branches and switches.

I think this is now a full featured trivial unswitcher, and more full
featured than the trivial cases in the old pass while still being (IMO)
somewhat simpler in how it works.

Next up is to verify its correctness in more widespread testing, and
then to add non-trivial unswitching.

Thanks to Davide and Sanjoy for the excellent review. There is one
remaining question that I may address in a follow-up patch (see the
review thread for details) but it isn't related to the functionality
specifically.

Differential Revision: https://reviews.llvm.org/D32699

llvm-svn: 302867
2017-05-12 02:19:59 +00:00
Adam Nemet 0aca09fc6c [SLP] Emit optimization remarks
The approach I followed was to emit the remark after getTreeCost concludes
that SLP is profitable.  I initially tried emitting them after the
vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely
remember missing a few cases for example in HorizontalReduction::tryToReduce.

ORE is placed in BoUpSLP so that it's available from everywhere (notably
HorizontalReduction::tryToReduce).

We use the first instruction in the root bundle as the locator for the remark.
In order to get a sense how far the tree is spanning I've include the size of
the tree in the remark.  This is not perfect of course but it gives you at
least a rough idea about the tree.  Then you can follow up with -view-slp-tree
to really see the actual tree.

llvm-svn: 302811
2017-05-11 17:06:17 +00:00
Ayal Zaks 58b28d549a [LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFC
Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and
refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from
ILV to LVP. These changes facilitate building VPlans and using them to generate
code, following https://reviews.llvm.org/D28975 and its tentative breakdown.

Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to
improve clarity; it's contents remain intact.

Differential Revision: https://reviews.llvm.org/D32200

llvm-svn: 302790
2017-05-11 11:36:33 +00:00
Alexander Potapenko a658ae8fe2 [msan] Fix PR32842
It turned out that MSan was incorrectly calculating the shadow for int comparisons: it was done by truncating the result of (Shadow1 OR Shadow2) to i1, effectively rendering all bits except LSB useless.
This approach doesn't work e.g. in the case where the values being compared are even (i.e. have the LSB of the shadow equal to zero).
Instead, if CreateShadowCast() has to cast a bigger int to i1, we replace the truncation with an ICMP to 0.

This patch doesn't affect the code generated for SPEC 2006 binaries, i.e. there's no performance impact.

For the test case reported in PR32842 MSan with the patch generates a slightly more efficient code:

  orq     %rcx, %rax
  jne     .LBB0_6
, instead of:

  orl     %ecx, %eax
  testb   $1, %al
  jne     .LBB0_6

llvm-svn: 302787
2017-05-11 11:07:48 +00:00
Serge Guelton f4dc59ba8e Remove spurious cast of nullptr. NFC.
Conversion rules allow automatic casting of nullptr to any pointer type.

llvm-svn: 302780
2017-05-11 08:53:00 +00:00
Sanjay Patel 40a87a909b [InstCombine] remove fold that swaps xor/or with constants; NFCI
// (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2)

This canonicalization was added at:
https://reviews.llvm.org/rL7264 

By moving xors out/down, we can more easily combine constants. I'm adding
tests that do not change with this patch, so we can verify that those kinds
of transforms are still happening.

This is no-functional-change-intended because there's a later fold:
// (X^C)|Y -> (X|Y)^C iff Y&C == 0
...and demanded-bits appears to guarantee that any fold that would have
hit the fold we're removing here would be caught by that 2nd fold.

Similar reasoning was used in:
https://reviews.llvm.org/rL299384

The larger motivation for removing this code is that it could interfere with 
the fix for PR32706:
https://bugs.llvm.org/show_bug.cgi?id=32706

Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse
a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'. 

Differential Revision: https://reviews.llvm.org/D33050

llvm-svn: 302733
2017-05-10 21:33:55 +00:00
Davide Italiano dc435325a8 [NewGVN] Introduce a definesNoMemory() helper and use it.
This is nice as is, but it will be used in my next patch to
fix a bug. Suggested by Daniel Berlin.

llvm-svn: 302714
2017-05-10 19:57:43 +00:00
Teresa Johnson 94624aca2a Ensure non-null ProfileSummaryInfo passed to ModuleSummaryIndex builder
This fixes a ubsan bot failure after r302597, which made getProfileCount
non-static, but ended up invoking it on a null ProfileSummaryInfo object
in some cases from buildModuleSummaryIndex.

Most testing passed because the non-static getProfileCount currently
doesn't access any member variables, but I found this when testing a
follow on patch (D32877) that adds a member variable access.

llvm-svn: 302705
2017-05-10 18:52:16 +00:00
Sanjay Patel 2e069f250a [InstCombine] add (ashr (shl i32 X, 31), 31), 1 --> and (not X), 1
This is another step towards favoring 'not' ops over random 'xor' in IR:
https://bugs.llvm.org/show_bug.cgi?id=32706

This transformation may have occurred in longer IR sequences using computeKnownBits,
but that could be much more expensive to calculate.

As the scalar result shows, we do not currently favor 'not' in all cases. The 'not'
created by the transform is transformed again (unnecessarily). Vectors don't have
this problem because vectors are (wrongly) excluded from several other combines.

llvm-svn: 302659
2017-05-10 13:56:52 +00:00
Serge Guelton 778ece82ae Use explicit false instead of casted nullptr. NFC.
llvm-svn: 302656
2017-05-10 13:24:17 +00:00
Chandler Carruth f3bd8ddedb Revert r301950: SpeculativeExecution: Stop using whitelist for costs
This pass doesn't correctly handle testing for when it is legal to hoist
arbitrary instructions. The whitelist happens to make it safe, so before
it is removed the pass's legality checks will need to be enhanced.

Details have been added to the code review thread for the patch.

llvm-svn: 302640
2017-05-10 12:30:07 +00:00
Amara Emerson 836b0f48c1 Add a late IR expansion pass for the experimental reduction intrinsics.
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shuffevector sequence.

Differential Revision: https://reviews.llvm.org/D32245

llvm-svn: 302631
2017-05-10 09:42:49 +00:00
Sanjay Patel 4133d4a56e [InstCombine] add helper function for add X, C folds; NFCI
llvm-svn: 302605
2017-05-10 00:07:16 +00:00
Easwaran Raman f5f9160072 [ProfileSummary] Make getProfileCount a non-static member function.
This change is required because the notion of count is different for
sample profiling and getProfileCount will need to determine the
underlying profile type.

Differential revision: https://reviews.llvm.org/D33012

llvm-svn: 302597
2017-05-09 23:21:10 +00:00
Peter Collingbourne c3d677f9d9 FunctionImport: Simplify function llvm::thinLTOInternalizeModule. NFCI.
llvm-svn: 302595
2017-05-09 22:43:31 +00:00
Keno Fischer 06f962c1e8 [GVN] Fix a crash on encountering non-integral pointers
Summary:
This fixes the immediate crash caused by introducing an incorrect inttoptr
before attempting the conversion. There may still be a legality
check missing somewhere earlier for non-integral pointers, but this change
seems necessary in any case.

Reviewers: sanjoy, dberlin

Reviewed By: dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32623

llvm-svn: 302587
2017-05-09 21:07:20 +00:00
Matthew Simpson 78fd46b230 [AArch64] Consider widening instructions in cost calculations
The AArch64 instruction set has a few "widening" instructions (e.g., uaddl,
saddl, uaddw, etc.) that take one or more doubleword operands and produce
quadword results. The operands are automatically sign- or zero-extended as
appropriate. However, in LLVM IR, these extends are explicit. This patch
updates TTI to consider these widening instructions as single operations whose
cost is attached to the arithmetic instruction. It marks extends that are part
of a widening operation "free" and applies a sub-target specified overhead
(zero by default) to the arithmetic instructions.

Differential Revision: https://reviews.llvm.org/D32706

llvm-svn: 302582
2017-05-09 20:18:12 +00:00
Sanjay Patel 7caaa79879 [InstCombine] clean up matchDeMorgansLaws(); NFCI
The motivation for getting rid of dyn_castNotVal is to allow fixing:
https://bugs.llvm.org/show_bug.cgi?id=32706

So this was supposed to be functional-change-intended for the case
of inverting constants and applying DeMorgan. However, I can't find 
any cases where that pattern will actually get to matchDeMorgansLaws()
because we have other folds in visitAnd/visitOr that do the same
thing. So this ends up just being a clean-up patch with slight efficiency
improvement, but no-functional-change-intended.

llvm-svn: 302581
2017-05-09 20:05:05 +00:00
Davide Italiano b7a6698ae9 [NewGVN] Simplify a DEBUG() statement. NFCI.
llvm-svn: 302579
2017-05-09 20:02:48 +00:00
Adrian Prantl c10d0e5ccd Make it illegal for two Functions to point to the same DISubprogram
As recently discussed on llvm-dev [1], this patch makes it illegal for
two Functions to point to the same DISubprogram and updates
FunctionCloner to also clone the debug info of a function to conform
to the new requirement. To simplify the implementation it also factors
out the creation of inlineAt locations from the Inliner into a
general-purpose utility in DILocation.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
<rdar://problem/31926379>

Differential Revision: https://reviews.llvm.org/D32975

This reapplies r302469 with a fix for a bot failure (reparentDebugInfo
now checks for the case the orig and new function are identical).

llvm-svn: 302576
2017-05-09 19:47:37 +00:00
Piotr Padlewski d979c1f806 NFC: refactor replaceDominatedUsesWith
Summary:
Since I will post patch with some changes to
replaceDominatedUsesWith, it would be good to avoid
duplicating code again.

Reviewers: davide, dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32798

llvm-svn: 302575
2017-05-09 19:39:44 +00:00
Serge Guelton e38003f839 Suppress all uses of LLVM_END_WITH_NULL. NFC.
Use variadic templates instead of relying on <cstdarg> + sentinel.
This enforces better type checking and makes code more readable.

Differential Revision: https://reviews.llvm.org/D32541

llvm-svn: 302571
2017-05-09 19:31:13 +00:00
Davide Italiano 63998ec3c8 [NewGVN] Explain why sorting by pointer values doesn't introduce non-determinism.
Thanks to Eli for pointing out in a post-commit review comment.

llvm-svn: 302566
2017-05-09 18:29:37 +00:00
Davide Italiano d6bb8cab03 [NewGVN] Fix a consistent order for phi nodes operands.
The way we currently define congruency for two PHIExpression(s) is:

1) The operands to the phi functions are congruent
2) The PHIs are defined in the same BasicBlock.

NewGVN works under the assumption that phi operands are in predecessor
order, or at least in some consistent order. OTOH, is valid IR:

patatino:
  %meh = phi i16 [ %0, %winky ], [ %conv1, %tinky ]
  %banana = phi i16 [ %0, %tinky ], [ %conv1, %winky ]
  br label %end

and the in-memory representations of the two SSA registers have an
inconsistent order. This violation of NewGVN assumptions results into
two PHIs found congruent when they're not. While we think it's useful
to have always a consistent order enforced, let's fix this in NewGVN
sorting uses in predecessor order before creating a PHI expression.

Differential Revision:  https://reviews.llvm.org/D32990

llvm-svn: 302552
2017-05-09 16:58:28 +00:00
Daniel Berlin 6604a2ffbb NewGVN: Make all of symbolic evaluation logically const.
llvm-svn: 302550
2017-05-09 16:40:04 +00:00
Sanjay Patel 6844e21f59 [InstCombineCasts] Fix checks in sext->lshr->trunc pattern.
The comment says to avoid the case where zero bits are shifted into the truncated value, 
but the code checks that the shift is smaller than the truncated value instead of the 
number of bits added by the sign extension. Fixing this allows a shift by more than the 
value size to be introduced, which is undefined behavior, so the shift is capped at the 
value size minus one, which has the expected behavior of filling the value with the sign 
bit.

Patch by Jacob Young!

Differential Revision: https://reviews.llvm.org/D32285

llvm-svn: 302548
2017-05-09 16:24:59 +00:00
Hans Wennborg 66fb0d9768 Revert r302469 "Make it illegal for two Functions to point to the same DISubprogram"
This caused PR32977.

Original commit message:

> Make it illegal for two Functions to point to the same DISubprogram
>
> As recently discussed on llvm-dev [1], this patch makes it illegal for
> two Functions to point to the same DISubprogram and updates
> FunctionCloner to also clone the debug info of a function to conform
> to the new requirement. To simplify the implementation it also factors
> out the creation of inlineAt locations from the Inliner into a
> general-purpose utility in DILocation.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
> <rdar://problem/31926379>
>
> Differential Revision: https://reviews.llvm.org/D32975

llvm-svn: 302533
2017-05-09 14:44:15 +00:00
Anna Thomas 0691483435 [LV] Fix insertion point for shuffle vectors in first order recurrence
Summary:
In first order recurrence vectorization, when the previous value is a phi node, we need to
set the insertion point to the first non-phi node.
We can have the previous value being a phi node, due to the generation of new
IVs as part of trunc optimization [1].

[1] https://reviews.llvm.org/rL294967

Reviewers: mssimpso, mkuper

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32969

llvm-svn: 302532
2017-05-09 14:29:33 +00:00
Amara Emerson cf9daa33a7 Introduce experimental generic intrinsics for horizontal vector reductions.
- This change allows targets to opt-in to using them instead of the log2
  shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
  factored out into LoopUtils, and now have a unified interface for generating
  reductions regardless of the preference of the target. LoopUtils now uses TTI
  to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.

Differential Revision: https://reviews.llvm.org/D30086

llvm-svn: 302514
2017-05-09 10:43:25 +00:00
Sanjoy Das 76bbdd1a16 [InstNamer] Use range-for
llvm-svn: 302481
2017-05-08 23:18:43 +00:00
Sanjoy Das 0ac3bf1cc8 [InstNamer] Don't check type of arguments (they're never void)
llvm-svn: 302480
2017-05-08 23:18:39 +00:00
Sanjoy Das 7be961b081 Delete trailing whitespace
llvm-svn: 302479
2017-05-08 23:18:36 +00:00
Adrian Prantl 200a5ef526 Make it illegal for two Functions to point to the same DISubprogram
As recently discussed on llvm-dev [1], this patch makes it illegal for
two Functions to point to the same DISubprogram and updates
FunctionCloner to also clone the debug info of a function to conform
to the new requirement. To simplify the implementation it also factors
out the creation of inlineAt locations from the Inliner into a
general-purpose utility in DILocation.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
<rdar://problem/31926379>

Differential Revision: https://reviews.llvm.org/D32975

llvm-svn: 302469
2017-05-08 21:17:08 +00:00
Sanjay Patel a1c8814891 [InstCombine] add folds for not-of-shift-right
This is another step towards getting rid of dyn_castNotVal, 
so we can recommit:
https://reviews.llvm.org/rL300977

As the tests show, we were missing the lshr case for constants
and both ashr/lshr vector splat folds. The ashr case with constant
was being performed inefficiently in 2 steps. It's also possible
there was a latent bug in that case because we can't do that fold
if the constant is positive:
http://rise4fun.com/Alive/Bge

llvm-svn: 302465
2017-05-08 20:49:59 +00:00
Davide Italiano aa42a10051 [PartialInlining] Capture by reference rather than by value.
llvm-svn: 302464
2017-05-08 20:44:01 +00:00
Sanjay Patel 2a06263036 [InstCombine] use local variable to reduce code duplication; NFCI
llvm-svn: 302438
2017-05-08 16:33:42 +00:00
Sanjay Patel 2df38a80f1 [InstCombine/InstSimplify] add comments about code duplication; NFC
llvm-svn: 302436
2017-05-08 16:21:55 +00:00
Craig Topper 7e3e7afca8 [ConstantRange][SimplifyCFG] Add a helper method to allow SimplifyCFG to determine if a ConstantRange has more than 8 elements without requiring an allocation if the ConstantRange is 64-bits wide.
Previously SimplifyCFG used getSetSize which returns an APInt that is 1 bit wider than the ConstantRange's bit width. In the reasonably common case that the ConstantRange is 64-bits wide, this requires returning a 65-bit APInt. APInt's can only store 64-bits without a memory allocation so this is inefficient.

The new method takes the 8 as an input and tells if the range contains more than that many elements without requiring any wider math.

llvm-svn: 302385
2017-05-07 22:22:11 +00:00
Sanjay Patel 599e65b1ff [InstSimplify] use ConstantRange to simplify or-of-icmps
We can simplify (or (icmp X, C1), (icmp X, C2)) to 'true' or one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything seems 
to check out. Eg, the deleted code in instcombine was completely ignoring predicates with
mismatched signedness.

This is a follow-up to:
https://reviews.llvm.org/rL301260
https://reviews.llvm.org/D32143

llvm-svn: 302370
2017-05-07 15:11:40 +00:00
Kostya Serebryany 424bfed693 [sanitizer-coverage] implement -fsanitize-coverage=no-prune,... instead of a hidden -mllvm flag. llvm part.
llvm-svn: 302319
2017-05-05 23:14:40 +00:00
Craig Topper a49e768977 Fix spelling error in command line option description. NFC
llvm-svn: 302311
2017-05-05 22:31:11 +00:00
Matthias Braun 60b40b8fec TargetLibraryInfo: Introduce wcslen
wcslen is part of the C99 and C++98 standards.

- This introduces the function to TargetLibraryInfo.
- Also set attributes for wcslen in llvm::inferLibFuncAttributes().

Differential Revision: https://reviews.llvm.org/D32837

llvm-svn: 302278
2017-05-05 20:25:50 +00:00
Craig Topper f0aeee01c3 [KnownBits] Add wrapper methods for setting and clear all bits in the underlying APInts in KnownBits.
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.

Differential Revision: https://reviews.llvm.org/D32637

llvm-svn: 302262
2017-05-05 17:36:09 +00:00
Craig Topper fc481e5eb7 [Float2Int] Replace a ConstantRange copy with a move. Remove an extra call to MapVector::find.
llvm-svn: 302256
2017-05-05 17:09:29 +00:00
Aditya Kumar 1c42d135e1 [LoopIdiom] check for safety while expanding
Loop Idiom recognition was generating memset in a case that
would result generating a division operation to an unsafe location.

Differential Revision: https://reviews.llvm.org/D32674

llvm-svn: 302238
2017-05-05 14:49:45 +00:00
Evgeniy Stepanov 9aff829f78 Remap metadata attached to global variables.
Fix for PR32577.
Global variables may have !associated metadata, which includes a reference to another global. It needs remapping.

llvm-svn: 302203
2017-05-04 23:29:39 +00:00
Craig Topper 1f673d4450 [JumpThreading] When processing compares, explicitly check that the result type is not a vector rather than check for it being an integer.
Compares always return a scalar integer or vector of integers. isIntegerTy returns false for vectors, but that's not completely obvious. So using isVectorTy is less confusing.

llvm-svn: 302198
2017-05-04 21:45:49 +00:00
Craig Topper 930689ada4 [JumpThreading] Change a dyn_cast that is already protected by an isa check to a static cast. Combine the with another static cast. NFC
Differential Revision: https://reviews.llvm.org/D32874

llvm-svn: 302197
2017-05-04 21:45:45 +00:00
Craig Topper 5974dadc69 [Float2Int] Remove return of ConstantRange from seen method. Nothing uses it so it just creates and discards a ConstantRange object for no reason.
llvm-svn: 302193
2017-05-04 21:29:45 +00:00
Peter Collingbourne 9667b91b13 Re-apply r302108, "IR: Use pointers instead of GUIDs to represent edges in the module summary. NFCI."
with a fix for the clang backend.

llvm-svn: 302176
2017-05-04 18:03:25 +00:00
Davide Italiano 94bf7846fd [NewGVN] Remove unneeded newline and format assertions. NFCI.
llvm-svn: 302173
2017-05-04 17:26:15 +00:00
Eric Liu f6039f255e Revert "IR: Use pointers instead of GUIDs to represent edges in the module summary. NFCI."
This reverts commit r302108. This causes crash in clang bootstrap with LTO.

Contacted the auther in the original commit.

llvm-svn: 302140
2017-05-04 11:49:39 +00:00
Martin Storsjo e81233d0ed [ArgPromotion] Fix a truncated variable
This fixes a regression since SVN rev 273808 (which was supposed to
not change functionality).

The regression caused miscompilations (noted in the wild when targeting
AArch64) on platforms with 32 bit long.

Differential Revision: https://reviews.llvm.org/D32850

llvm-svn: 302137
2017-05-04 10:54:35 +00:00
Jonas Paulsson 8bf1fdcc91 Use right function in LoopVectorize.
-    unsigned AS = getMemInstAlignment(I);
+    unsigned AS = getMemInstAddressSpace(I);

Review: Hal Finkel
llvm-svn: 302114
2017-05-04 05:31:56 +00:00
Peter Collingbourne 5f85a9deda IR: Use pointers instead of GUIDs to represent edges in the module summary. NFCI.
When profiling a no-op incremental link of Chromium I found that the functions
computeImportForFunction and computeDeadSymbols were consuming roughly 10% of
the profile. The goal of this change is to improve the performance of those
functions by changing the map lookups that they were previously doing into
pointer dereferences.

This is achieved by changing the ValueInfo data structure to be a pointer to
an element of the global value map owned by ModuleSummaryIndex, and changing
reference lists in the GlobalValueSummary to hold ValueInfos instead of GUIDs.
This means that a ValueInfo will take a client directly to the summary list
for a given GUID.

Differential Revision: https://reviews.llvm.org/D32471

llvm-svn: 302108
2017-05-04 03:36:16 +00:00
Craig Topper cff357c322 [InstCombine][KnownBits] Use KnownBits better to detect nsw adds
Change checkRippleForAdd from a heuristic to a full check -
if it is provable that the add does not overflow return true, otherwise false.

Patch by Yoav Ben-Shalom

Differential Revision: https://reviews.llvm.org/D32686

llvm-svn: 302093
2017-05-03 23:22:46 +00:00
Craig Topper 8189a87a1e [KnownBits] Add methods for determining if KnownBits is a constant value
This patch adds isConstant and getConstant for determining if KnownBits represents a constant value and to retrieve the value. Use them to simplify code.

Differential Revision: https://reviews.llvm.org/D32785

llvm-svn: 302091
2017-05-03 23:12:29 +00:00
Craig Topper d938fd1397 [KnownBits] Add zext, sext, and trunc methods to KnownBits
This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible.

Differential Revision: https://reviews.llvm.org/D32784

llvm-svn: 302088
2017-05-03 22:07:25 +00:00
Xin Tong 46fb813ac3 [TailCallElim] Remove an unused argument. NFCI
llvm-svn: 302080
2017-05-03 20:37:07 +00:00
Anna Thomas f475fa3575 Avoid warning of unused variable in release builds. NFC
llvm-svn: 302068
2017-05-03 19:25:04 +00:00
Sanjoy Das 23f314d04f Fix typos in comment
llvm-svn: 302063
2017-05-03 18:29:34 +00:00
Anna Thomas d4c0295cc8 Fix PPC64 warning for missing parantheses. NFC.
llvm-svn: 302061
2017-05-03 18:25:43 +00:00
Reid Kleckner a0b45f4bfc [IR] Abstract away ArgNo+1 attribute indexing as much as possible
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
  to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
  sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
  take attribute list indices.  Most of these were only used from
  BuildLibCalls, and doesNotAlias was only used to test or set if the
  return value is malloc-like.

I'm happy to split the patch, but I think they are probably easier to
review when taken together.

This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
  0: func attrs
  1: retattrs
  2...: arg attrs

Reviewers: chandlerc, pete, javed.absar

Subscribers: david2050, llvm-commits

Differential Revision: https://reviews.llvm.org/D32811

llvm-svn: 302060
2017-05-03 18:17:31 +00:00
Anna Thomas ac0ec2240b [RuntimeLoopUnroller] Add assert that we dont unroll non-rotated loops
Summary:
Cloning basic blocks in the loop for runtime loop unroller depends on loop being
in rotated form (i.e. loop latch target is the exit block).
Assert that this is true, so that callers of runtime loop unroller pass in
canonical loops.
The single caller of this function has that check recently added:
https://reviews.llvm.org/rL301239

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32801

llvm-svn: 302058
2017-05-03 17:43:59 +00:00
Anna Thomas 53c8d95c85 [Loop Deletion] Delete loops that are never executed
Summary:
Currently, loop deletion deletes loop where the only values
that are used outside the loop are loop-invariant.
This patch adds logic to delete loops where the loop is proven to be
never executed (i.e. the only predecessor of the loop preheader has a
constant conditional branch as terminator, and the preheader is not the
taken target). This will remove loops that become dead after
loop-unswitching generates constant conditional branches.

The next steps are:
1. moving the loop deletion implementation to LoopUtils.
2. Add logic in loop-simplifyCFG which will support changing conditional
constant branches to unconditional branches. If loops become unreachable in this
process, they can be removed using `deleteDeadLoop` function.

Reviewers: chandlerc, efriedma, sanjoy, reames

Reviewed by: sanjoy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32494

llvm-svn: 302015
2017-05-03 11:47:11 +00:00
Matt Arsenault 6a288c1e32 Replace hardcoded intrinsic list with speculatable attribute.
No change in which intrinsics should be speculated.

llvm-svn: 301995
2017-05-03 02:26:10 +00:00
Reid Kleckner ee4930b688 Re-land r301697 "[IR] Make add/remove Attributes use AttrBuilder instead of AttributeList"
This time, I fixed, built, and tested clang.

This reverts r301712.

llvm-svn: 301981
2017-05-02 22:07:37 +00:00
Davide Italiano 839c7e6cfb [NewGVN] Fix typo and format comment. NFCI.
llvm-svn: 301974
2017-05-02 21:11:40 +00:00
Xinliang David Li ab8722f80a [PartialInlining] Add more early filtering
This is a follow up to the previous
inline cost patch for quicker filtering.

llvm-svn: 301959
2017-05-02 18:43:21 +00:00
Matt Arsenault 9ac7d6be3c SpeculativeExecution: Stop using whitelist for costs
Just let TTI's cost do this instead of arbitrarily restricting
this.

llvm-svn: 301950
2017-05-02 18:02:18 +00:00
Sanjay Patel 6381db18fe [InstCombine] don't use DeMorgan's Law on integer constants (2nd try)
This was originally checked in here:
https://reviews.llvm.org/rL301923

And reverted here:
https://reviews.llvm.org/rL301924

Because there's a clang test that would fail after this. I fixed/removed the
offending CHECK lines in:
https://reviews.llvm.org/rL301928

So let's try this again. Original commit message:

This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in https://reviews.llvm.org/D32255.

There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate https://reviews.llvm.org/D32255 until dyn_castNotVal is completely eliminated.

1. As shown here, it transforms 'not' into random xor. This transform is harmful to SCEV and codegen because 'not' can often be folded while random xor cannot.
2. It does not transform vector constants. This is actually a good thing, but if you don't believe the above argument, then we shouldn't have excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't match the greedy nature of instcombine. If we DeMorganize a pattern that has an extra 'not' in it: ~(~(~X) & Y) --> (~X | ~Y)

  That's just another case of DeMorgan, so we should trust that we'll fold that pattern too: (~X | ~ Y) --> ~(X & Y)

Differential Revision: https://reviews.llvm.org/D32665

llvm-svn: 301929
2017-05-02 15:31:40 +00:00
Sanjay Patel da0b4deafa revert r301923 : [InstCombine] don't use DeMorgan's Law on integer constants
There's a clang test that is wrongly using -O1 and failing after this commit.

llvm-svn: 301924
2017-05-02 14:48:23 +00:00
Sanjay Patel 096a981982 [InstCombine] don't use DeMorgan's Law on integer constants
This is the fold that causes the infinite loop in BoringSSL 
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c) 
when we fix instcombine demanded bits to prefer 'not' ops as in D32255.

There are 2 or 3 problems with dyn_castNotVal, and I don't think we can 
reinstate D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is 
   harmful to SCEV and codegen because 'not' can often be folded while 
   random xor cannot.
2. It does not transform vector constants. This is actually a good thing, 
   but if you don't believe the above argument, then we shouldn't have 
   excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't
   match the greedy nature of instcombine. If we DeMorganize a pattern 
   that has an extra 'not' in it:
   ~(~(~X) & Y) --> (~X | ~Y)

   That's just another case of DeMorgan, so we should trust that we'll fold
   that pattern too:
   (~X | ~ Y) --> ~(X & Y)

Differential Revision: https://reviews.llvm.org/D32665

llvm-svn: 301923
2017-05-02 14:31:30 +00:00
Xinliang David Li 6133846be1 [PartialInlining] Hook up inline cost analysis
Differential Revision: http://reviews.llvm.org/D32666

llvm-svn: 301894
2017-05-02 02:44:14 +00:00
Xin Tong a41bf70bea Empty Space. NFC
llvm-svn: 301878
2017-05-01 23:08:19 +00:00
Davide Italiano 2dfd46bf08 [NewGVN] Don't derive incorrect implications.
In the testcase attached,  we believe %tmp1 implies %tmp4.
where:
  br i1 %tmp1, label %bb2, label %bb7
  br i1 %tmp4, label %bb5, label %bb7

because Wwhile looking at PredicateInfo stuffs we end up calling
isImpliedTrueByMatchingCmp() with the arguments backwards.

Differential Revision:  https://reviews.llvm.org/D32718

llvm-svn: 301849
2017-05-01 22:26:28 +00:00
Sanjay Patel 59d0aeaafe [InstCombine] check one-use before applying DeMorgan nor/nand folds
If we have ~(~X & Y), it only makes sense to transform it to (X | ~Y) when we do not need 
the intermediate (~X & Y) value. In that case, we would need an extra instruction to 
generate ~Y + 'or' (as shown in the test changes).

It's ok if we have multiple uses of ~X or Y, however. In those cases, we may not reduce the
instruction count or critical path, but we might improve throughput because we can generate 
~X and ~Y in parallel. Whether that actually makes perf sense or not for a target is something 
we can't answer in IR.

Differential Revision: https://reviews.llvm.org/D32703

llvm-svn: 301848
2017-05-01 22:25:42 +00:00
Peter Collingbourne a992f53099 IPO: Add missing build dep.
llvm-svn: 301835
2017-05-01 20:57:20 +00:00
Peter Collingbourne c15d60b772 Object: Remove ModuleSummaryIndexObjectFile class.
Differential Revision: https://reviews.llvm.org/D32195

llvm-svn: 301832
2017-05-01 20:42:32 +00:00
Xin Tong a4b9b9f42a Take indirect branch into account as well when folding.
We may not be able to rewrite indirect branch target, but we also want to take it into
account when folding, i.e. if it and all its successor's predecessors go to the same
destination, we can fold, i.e. no need to thread.

llvm-svn: 301816
2017-05-01 17:15:37 +00:00
Sanjoy Das e6bca0eecb Rename WeakVH to WeakTrackingVH; NFC
This relands r301424.

llvm-svn: 301812
2017-05-01 17:07:49 +00:00
Xin Tong 99dce428bc [JumpThread] Add some assertions for expected ConstantInt/BlockAddress
llvm-svn: 301808
2017-05-01 16:19:59 +00:00
Xin Tong 21f8ac235e [JumpThread] Do RAUW in case Cond folds to a constant in the CFG
Summary: [JumpThread] Do RAUW in case Cond folds to a constant in the CFG

Reviewers: sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32407

llvm-svn: 301804
2017-05-01 15:34:17 +00:00
Sanjoy Das 08989c7ecd Rename isKnownNotFullPoison to programUndefinedIfPoison; NFC
Summary:
programUndefinedIfPoison makes more sense, given what the function
does; and I'm about to add a function with a name similar to
isKnownNotFullPoison (so do the rename to avoid confusion).

Reviewers: broune, majnemer, bjarke.roune

Reviewed By: broune

Subscribers: mcrosier, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D30444

llvm-svn: 301776
2017-04-30 19:41:19 +00:00
Craig Topper ca48af3c87 [KnownBits] Add methods for determining if the known bits represent a negative/nonnegative number and add methods for changing the negative/nonnegative state
Summary: This patch adds isNegative, isNonNegative for querying whether the sign bit is known. It also adds makeNegative and makeNonNegative for controlling the sign bit.

Reviewers: RKSimon, spatel, davide

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32651

llvm-svn: 301747
2017-04-29 16:43:11 +00:00
Akira Hatanaka 6fdcb3c2ce [ObjCARC] Do not move a release between a call and a
retainAutoreleasedReturnValue that retains the returned value.

This commit fixes a bug in ARC optimizer where it moves a release
between a call and a retainAutoreleasedReturnValue, causing the returned
object to be released before the retainAutoreleasedReturnValue can
retain it.

This commit accomplishes that by doing a lookahead and checking whether
the call prevents the release from moving upwards. In the long term, we
should treat the region between the retainAutoreleasedReturnValue and
the call as a critical section and disallow moving anything there
(possibly using operand bundles).

rdar://problem/20449878

llvm-svn: 301724
2017-04-29 00:23:11 +00:00
Davide Italiano 0aaa96a07b [LoopUnswitch] Make DEBUG output more readable (part 2).
I fixed my miscompile in r301722 and I hope I don't have to take
a look at this code again now that Chandler has a new LoopUnswitch
pass, but maybe this could be of use for somebody else in the
meanwhile.

llvm-svn: 301723
2017-04-29 00:18:26 +00:00
Davide Italiano 534e314356 [LoopUnswitch] Don't remove instructions with side effects.
This fixes PR32818.

Differential Revision:  https://reviews.llvm.org/D32664

llvm-svn: 301722
2017-04-29 00:12:18 +00:00
Hans Wennborg 0f88d863b4 Revert r301697 "[IR] Make add/remove Attributes use AttrBuilder instead of AttributeList"
This broke the Clang build. (Clang-side patch missing?)

Original commit message:

> [IR] Make add/remove Attributes use AttrBuilder instead of
> AttributeList
>
> This change cleans up call sites and avoids creating temporary
> AttributeList objects.
>
> NFC

llvm-svn: 301712
2017-04-28 23:01:32 +00:00
Matt Arsenault e0f9e984fd InferAddressSpaces: Search constant expressions for addrspacecasts
These are pretty common when using local memory, and the 64-bit generic
addressing is much more expensive to compute.

llvm-svn: 301711
2017-04-28 22:52:41 +00:00
Matt Arsenault c20ccd2c02 InferAddressSpaces: Avoid looking up deleted values
While looking at pure addressing expressions, it's possible
for the value to appear later in Postorder.

I haven't been able to come up with a testcase where this
exhibits an actual issue, but if you insert a dump before
the value map lookup, a few testcases crash.

llvm-svn: 301705
2017-04-28 22:18:19 +00:00
Matt Arsenault a1e734050c InferAddressSpaces: Infer from just addrspacecasts
Eliminates some more cases where some subset of the addressing
computation remains flat. Some cases with addrspacecasts
in nested constant expressions are still left behind however.

llvm-svn: 301704
2017-04-28 22:18:08 +00:00
Daniel Berlin 98a1de85cb LoopRotate: Fix use after scope bug
llvm-svn: 301702
2017-04-28 22:05:55 +00:00
Reid Kleckner 608c8b63b3 [IR] Make add/remove Attributes use AttrBuilder instead of AttributeList
This change cleans up call sites and avoids creating temporary
AttributeList objects.

NFC

llvm-svn: 301697
2017-04-28 21:48:28 +00:00
Davide Italiano e27cb87754 [LoopUnswitch] Make DEBUG output more readable.
While debugging a miscompile I realized loopunswitch doesn't
put newlines when printing the instruction being replacement.
Ending up with a single line with many instruction replaced isn't
the best for readability and/or mental sanity.

llvm-svn: 301692
2017-04-28 21:30:50 +00:00
Reid Kleckner 859f8b544a Make getParamAlignment use argument numbers
The method is called "get *Param* Alignment", and is only used for
return values exactly once, so it should take argument indices, not
attribute indices.

Avoids confusing code like:
  IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError);
  Alignment  = CS->getParamAlignment(ArgIdx + 1);

Add getRetAlignment to handle the one case in Value.cpp that wants the
return value alignment.

This is a potentially breaking change for out-of-tree backends that do
their own call lowering.

llvm-svn: 301682
2017-04-28 20:34:27 +00:00
Daniel Berlin 4d0fe64ae3 Kill off the old SimplifyInstruction API by converting remaining users.
llvm-svn: 301673
2017-04-28 19:55:38 +00:00
Davide Italiano b6681e2b4e [IPO/MergeFunctions] This function is used only under DEBUG().
llvm-svn: 301672
2017-04-28 19:39:45 +00:00
Reid Kleckner 99351967c7 [RS4GC] Simplify attribute handling code NFC
Avoids use of AttributeList::getNumSlots, making it easier to change the
underlying implementation.

llvm-svn: 301671
2017-04-28 19:22:40 +00:00
Reid Kleckner 6652a52e2b Use Argument::hasAttribute and AttributeList::ReturnIndex more
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.

NFC

llvm-svn: 301666
2017-04-28 18:37:16 +00:00
Adrian Prantl 109b236850 Clean up DIExpression::prependDIExpr a little. (NFC)
llvm-svn: 301662
2017-04-28 17:51:05 +00:00
Craig Topper 24db6b800f [APInt] Add clearSignBit method. Use it and setSignBit in a few places. NFCI
llvm-svn: 301656
2017-04-28 16:58:05 +00:00
Teresa Johnson 51177295c4 Memory intrinsic value profile optimization: Avoid divide by 0
Summary:
Skip memops if the total value profiled count is 0, we can't correctly
scale up the counts and there is no point anyway.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32624

llvm-svn: 301645
2017-04-28 14:30:54 +00:00
Andrew Ng 03e35b6bc0 [DebugInfo][X86] Improve X86 Optimize LEAs handling of debug values.
This is a follow up to the fix in r298360 to improve the handling of debug
values when redundant LEAs are removed. The fix in r298360 effectively
discarded the debug values. This patch now attempts to preserve the debug
values by using the DWARF DW_OP_stack_value operation via prependDIExpr.

Moved functions appendOffset and prependDIExpr from Local.cpp to
DebugInfoMetadata.cpp and made them available as static member functions of
DIExpression.

Differential Revision: https://reviews.llvm.org/D31604

llvm-svn: 301630
2017-04-28 08:44:30 +00:00
Max Kazantsev 531db9a504 [EarlyCSE] Mark the condition of assume intrinsic as true
EarlyCSE should not just ignore assumes. It should use the fact that its condition is true for all dominated instructions.

Reviewers: sanjoy, reames, apilipenko, anna, skatkov

Reviewed By: reames, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32482

llvm-svn: 301625
2017-04-28 06:25:39 +00:00
Max Kazantsev 0589d9fa0f [EarlyCSE] Remove guards with conditions known to be true
If a condition is calculated only once, and there are multiple guards on this condition, we should be able
to remove all guards dominated by the first of them. This patch allows EarlyCSE to try to find the condition
of a guard among the known values, and if it is true, remove the guard. Otherwise we keep the guard and
mark its condition as 'true' for future consideration.

Reviewers: sanjoy, reames, apilipenko, skatkov, anna, dberlin

Reviewed By: reames, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32476

llvm-svn: 301623
2017-04-28 06:05:48 +00:00
Craig Topper 24e71017aa [APInt] Use inplace shift methods where possible. NFCI
llvm-svn: 301612
2017-04-28 03:36:24 +00:00
Davide Italiano 81a26da1e5 [SROA] Fix nondeterminism exposed by Simon's r299221.
Use a SmallSetSetVector instead of a SmallPtrSet as iterating
over the latter is not stable ('<' relies on addresses).

llvm-svn: 301599
2017-04-27 23:09:01 +00:00
Sanjay Patel 73d8c43da8 [InstCombine] fix matcher to bind to specific operand (PR32830)
Matching any random value would be very wrong:
https://bugs.llvm.org/show_bug.cgi?id=32830

llvm-svn: 301594
2017-04-27 21:55:03 +00:00
Evgeniy Stepanov 964f4663c4 [asan] Fix dead stripping of globals on Linux.
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.

Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.

This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.

At this moment it does not work on Gold (as in the globals are never
stripped).

This is a second re-land of r298158. This time, this feature is
limited to -fdata-sections builds.

llvm-svn: 301587
2017-04-27 20:27:27 +00:00
Evgeniy Stepanov 716f0ff222 [asan] Put ctor/dtor in comdat.
When possible, put ASan ctor/dtor in comdat.

The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.

The benefit of this is a lot less __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in gold linker.

This is a second re-land of r298756. This time with a flag to disable
the whole thing to avoid a bug in the gold linker:
  https://sourceware.org/bugzilla/show_bug.cgi?id=19002

llvm-svn: 301586
2017-04-27 20:27:23 +00:00
Chandler Carruth 1353f9a48b [PM/LoopUnswitch] Introduce a new, simpler loop unswitch pass.
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
  below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
  infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
  and instead incrementally updates it.

I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.

My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.

This pass makes two very significant assumptions that enable most of these
improvements:

1) Focus on *full* unswitching -- that is, completely removing whatever
   control flow construct is being unswitched from the loop. In the case
   of trivial unswitching, this means removing the trivial (exiting)
   edge. In non-trivial unswitching, this means removing the branch or
   switch itself. This is in opposition to *partial* unswitching where
   some part of the unswitched control flow remains in the loop. Partial
   unswitching only really applies to switches and to folded branches.
   These are very similar to full unrolling and partial unrolling. The
   full form is an effective canonicalization, the partial form needs
   a complex cost model, cannot be iterated, isn't canonicalizing, and
   should be a separate pass that runs very late (much like unrolling).

2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
   dates from a time when a great deal of LLVM's loop infrastructure was
   missing, ineffective, and/or unreliable. As a consequence, a lot of
   complexity was added which we no longer need.

With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.

Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.

One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.

Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.

Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.

Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.

I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.

Differential Revision: https://reviews.llvm.org/D32409

llvm-svn: 301576
2017-04-27 18:45:20 +00:00
Eli Friedman 10ab923b32 [GlobalOpt] Correctly update metadata when localizing a global.
Just calling dropAllReferences leaves pointers to the ConstantExpr
behind, so we would eventually crash with a null pointer dereference.

Differential Revision: https://reviews.llvm.org/D32551

llvm-svn: 301575
2017-04-27 18:39:08 +00:00
Teresa Johnson f9ea176f05 Memory intrinsic value profile optimization: Improve debug output (NFC)
Summary:
Misc improvements to debug output. Fix a couple typos and also dump the
value profile before we make any profitability checks.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32607

llvm-svn: 301574
2017-04-27 18:25:22 +00:00
Xinliang David Li d21601a929 [PartialInlining]: Improve partial inlining to handle complex conditions
Differential Revision: http://reviews.llvm.org/D32249

llvm-svn: 301561
2017-04-27 16:34:00 +00:00
Craig Topper 9474e9b6c8 [InstCombine] Use APInt bit counting methods to avoid a temporary APInt. NFC
llvm-svn: 301516
2017-04-27 04:51:25 +00:00
Chandler Carruth c246a4c973 Disable GVN Hoist due to still more bugs being found in it. There is
also a discussion about exactly what we should do prior to re-enabling
it.

The current bug is http://llvm.org/PR32821 and the discussion about this
is in the review thread for r300200.

llvm-svn: 301505
2017-04-27 00:28:03 +00:00
Davide Italiano d7b2a9981c [LibCallsShrinkWrap] Remove an unnecessary class member variable.
llvm-svn: 301477
2017-04-26 21:28:40 +00:00
Davide Italiano 11817ba2ea [LibCallsShrinkWrap] More descriptive assertion messages.
Fix a typo while I'm here.

llvm-svn: 301474
2017-04-26 21:21:02 +00:00
Davide Italiano 3c3785fd1f [LibCallsShrinkWrap] Remove some temporary cl::opt(s).
The pass has been on and working for a while.

llvm-svn: 301473
2017-04-26 21:19:05 +00:00
Davide Italiano 6abada8ab8 [LibCallsShrinkWrap] Teach the pass how to preserve the dominator.
llvm-svn: 301471
2017-04-26 21:05:40 +00:00
Daniel Berlin ede130d490 NewGVN: Use new SimplifyQuery based API
llvm-svn: 301466
2017-04-26 20:56:14 +00:00
Daniel Berlin 2c75c63063 InstCombine: Use the new SimplifyQuery versions of Simplify*. Use AssumptionCache, DominatorTree, TargetLibraryInfo everywhere.
llvm-svn: 301464
2017-04-26 20:56:07 +00:00
Daniel Berlin c9f0a4f1ec CorrelatedValuePropagation: Rename a variable for consistency
llvm-svn: 301435
2017-04-26 17:41:46 +00:00
Craig Topper b45eabcf82 [ValueTracking] Introduce a KnownBits struct to wrap the two APInts for computeKnownBits
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.

Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.

I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.

Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.

Differential Revision: https://reviews.llvm.org/D32376

llvm-svn: 301432
2017-04-26 16:39:58 +00:00
Sanjoy Das 2cbeb00f38 Reverts commit r301424, r301425 and r301426
Commits were:

"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"

The changes assumed pointers are 8 byte aligned on all architectures.

llvm-svn: 301429
2017-04-26 16:37:05 +00:00
Matthew Simpson 9eed0bee3d [LV] Handle external uses of floating-point induction variables
Reference: https://bugs.llvm.org/show_bug.cgi?id=32758
Differential Revision: https://reviews.llvm.org/D32445

llvm-svn: 301428
2017-04-26 16:23:02 +00:00
Sanjoy Das 01de557738 Rename WeakVH to WeakTrackingVH; NFC
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.

Reviewers: dblaikie, davide

Reviewed By: davide

Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D32266

llvm-svn: 301424
2017-04-26 16:20:52 +00:00
Haojian Wu e43db0a834 Fix unused-variable warning caused by r301407.
llvm-svn: 301411
2017-04-26 14:31:05 +00:00
Daniel Berlin 62aee14978 Convert LoopRotation to use SimplifyQuery version of SimplifyInstruction. Add AssumptionCache, DominatorTree, TLI if available.
llvm-svn: 301407
2017-04-26 13:52:18 +00:00
Daniel Berlin 954006fde8 Convert SimplifyInstructions to use the SimplifyQuery version of SimplifyInstruction
llvm-svn: 301406
2017-04-26 13:52:16 +00:00
Daniel Berlin 9bae449d78 Convert CVP to use SimplifyQuery version of SimplifyInstruction. Add AssumptionCache, DominatorTree, TLI if available.
llvm-svn: 301405
2017-04-26 13:52:13 +00:00
Filipe Cabecinhas 92dc348773 Simplify the CFG after loop pass cleanup.
Summary:
Otherwise we might end up with some empty basic blocks or
single-entry-single-exit basic blocks.

This fixes PR32085

Reviewers: chandlerc, danielcdh

Subscribers: mehdi_amini, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D30468

llvm-svn: 301395
2017-04-26 12:02:41 +00:00
Matthias Braun c36a78c3f3 SimplifyLibCalls: Fix crash on memset(notmalloc())
rdar://31520787

llvm-svn: 301352
2017-04-25 19:44:25 +00:00
Stanislav Mekhanoshin f2db5434be Skip bitcasts while looking for GEP in LoadStoreVectorizer
Differential Revisison: https://reviews.llvm.org/D32101

llvm-svn: 301343
2017-04-25 18:00:08 +00:00
Craig Topper 09a5878d33 [InstCombine] Remove redundant code from SimplifyUsingDistributiveLaws
The code I've removed here exists in ExpandBinOp in InstSimplify which we call into before SimplifyUsingDistributiveLaws. The code in InstSimplify looks to have been copied from here.

I verified this code doesn't fire on any lit tests. Not that that proves its definitely dead.

Differential Revision: https://reviews.llvm.org/D32472

llvm-svn: 301341
2017-04-25 17:54:12 +00:00
Craig Topper f3dbd17d0a [APInt] Use isSubsetOf, intersects, and bit counting methods to reduce temporary APInts
This patch uses various APInt methods to reduce temporary APInt creation.

This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking.

I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time.

Differential Revision: https://reviews.llvm.org/D32495

llvm-svn: 301338
2017-04-25 17:46:30 +00:00
Davide Italiano 058abf1f61 [PM] Run IndirectCallPromotion only when PGO is enabled.
Differential Revision:  https://reviews.llvm.org/D32465

llvm-svn: 301327
2017-04-25 16:54:45 +00:00
Craig Topper 7603dce6b2 [InstCombine] Remove superfluous curly braces around a single line if body. NFC
llvm-svn: 301326
2017-04-25 16:48:19 +00:00
Craig Topper ba01143193 [InstCombine] Add missing commute handling to (A | B) & (B ^ (~A)) -> (A & B)
The matching here wasn't able to handle all the possible commutes. It always assumed the not would be on the left of the xor, but that's not guaranteed.

Differential Revision: https://reviews.llvm.org/D32474

llvm-svn: 301316
2017-04-25 15:19:04 +00:00
Andrew Ng 1606fc0bf9 [SimplifyLibCalls] Fix infinite loop with fast-math optimization.
One of the fast-math optimizations is to replace calls to standard double
functions with their float equivalents, e.g. exp -> expf. However, this can
cause infinite loops for the following:

  float expf(float val) { return (float) exp((double) val); }

A similar inline declaration exists in the MinGW-w64 math.h header file which
when compiled with -O2/3 and fast-math generates infinite loops.

So this fix checks that the calling function to the standard double function
that is being replaced does not match the float equivalent.

Differential Revision: https://reviews.llvm.org/D31806

llvm-svn: 301304
2017-04-25 12:36:14 +00:00
Craig Topper c4b48a32f0 [InstCombine] Use commutable matchers to reduce some code. NFC
llvm-svn: 301294
2017-04-25 06:02:11 +00:00
Gil Rapaport 860f0a2bad [LV] Remove redundant basic block split
This patch is part of D28975's breakdown.

Genreating the control-flow to guard predicated instructions modified to
only use SplitBlockAndInsertIfThen() for producing the if-then construct.

Differential Revision: https://reviews.llvm.org/D32224

llvm-svn: 301293
2017-04-25 05:57:22 +00:00
Xinliang David Li f12a0faf88 [CodeExtractor]: Fixup use refs of the old phi.
Differential Revision: http://reviews.llvm.org/D32468

llvm-svn: 301291
2017-04-25 04:51:19 +00:00
Akira Hatanaka 490397fc08 [ObjCARC] Do not sink an objc_retain past a clang.arc.use.
We need to do this to prevent a miscompile which sinks an objc_retain
past an objc_release that releases the object objc_retain retains. This
happens because the top-down and bottom-up traversals each determines
the insert point for retain or release individually without knowing
where the other instruction is moved.

For example, when the following IR is fed to the ARC optimizer, the
top-down traversal decides to insert objc_retain right before
objc_release and the bottom-up traversal decides to insert objc_release
right after clang.arc.use.

(IR before ARC optimizer)
%11 = call i8* @objc_retain(i8* %10)
call void (...) @clang.arc.use(%0* %5)
call void @llvm.dbg.value(...)
call void @objc_release(i8* %6)

This reverses the order of objc_release and objc_retain, which causes
the object to be destructed prematurely.

(IR after ARC optimizer)
call void (...) @clang.arc.use(%0* %5)
call void @objc_release(i8* %6)
call void @llvm.dbg.value(...)
%11 = call i8* @objc_retain(i8* %10)

rdar://problem/30530580

llvm-svn: 301289
2017-04-25 04:06:35 +00:00
Davide Italiano 5b65f12bfa [SimplifyLibCalls] Remove a cl::opt that's been `true` for a long time.
llvm-svn: 301288
2017-04-25 03:48:47 +00:00
Matt Arsenault 6d7f01e3d8 InferAddressSpaces: Use reference arguments instead of pointers
llvm-svn: 301276
2017-04-24 23:42:41 +00:00
Matt Arsenault e8d0539f20 InferAddressSpaces: Remove redundant assert
This is just asserting all the operations are handled in the
switch, which the unreachable already handles.

llvm-svn: 301270
2017-04-24 23:02:57 +00:00
Sanjay Patel 35c362ebbb [InstSimplify] use ConstantRange to simplify more and-of-icmps
We can simplify (and (icmp X, C1), (icmp X, C2)) to one of the icmps in many cases. 
I had to check some of these with Alive to prove to myself it's right, but everything 
seems to check out. Eg, the code in instcombine was completely ignoring predicates with 
mismatched signedness.

Handling or-of-icmps would be a follow-up step.

Differential Revision: https://reviews.llvm.org/D32143

llvm-svn: 301260
2017-04-24 21:52:39 +00:00
Teresa Johnson b2c390e9f5 Update profile during memory instrinsic optimization
Summary:
Ensure that the new merge BB (which contains the rest of the original BB
after the mem op being optimized) gets a profile frequency, in case
there are additional mem ops later in the BB. Otherwise they get skipped
as the merge BB looks cold.

Reviewers: davidxl, xur

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32447

llvm-svn: 301244
2017-04-24 20:30:42 +00:00
Matt Arsenault 4474652c95 Revert "StructurizeCFG: Directly invert cmp instructions"
This reverts commit r300732. This breaks a few tests.
I think the problem is related to adding more uses of
the condition that don't yet exist at this point.

llvm-svn: 301242
2017-04-24 20:25:01 +00:00
Davide Italiano ca81fbcadb [LoopUnroll] Remove spurious newline.
Eli pointed out in the review, but I didn't squash the two commits
correctly. Pointy-hat to me.

llvm-svn: 301241
2017-04-24 20:17:38 +00:00
Davide Italiano 0f62eea7ff [LoopUnroll] Don't try to unroll non canonical loops.
The current Loop Unroll implementation works with loops having a
single latch that contains a conditional branch to a block outside
the loop (the other successor is, by defition of latch, the header).
If this precondition doesn't hold, avoid unrolling the loop as
the code is not ready to handle such circumstances.

Differential Revision:  https://reviews.llvm.org/D32261

llvm-svn: 301239
2017-04-24 20:14:11 +00:00
Sanjoy Das 206f65c049 [LIR] Obey non-integral pointer semantics
Summary: See http://llvm.org/docs/LangRef.html#non-integral-pointer-type

Reviewers: haicheng

Reviewed By: haicheng

Subscribers: mcrosier, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32196

llvm-svn: 301238
2017-04-24 20:12:10 +00:00
Evgeniy Stepanov 9e536081fe [asan] Let the frontend disable gc-sections optimization for asan globals.
Also extend -asan-globals-live-support flag to all binary formats.

llvm-svn: 301226
2017-04-24 19:34:13 +00:00
Mandeep Singh Grang 799a2edb3d [SimplifyCFG] Fix for non-determinism in codegen
Summary: This patch fixes issues in codegen uncovered due to https://reviews.llvm.org/D26718

Reviewers: majnemer, chenli, davide

Reviewed By: davide

Subscribers: davide, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D26726

llvm-svn: 301222
2017-04-24 19:20:45 +00:00
Evgeniy Stepanov 58ccc0949a Revert "Compute safety information in a much finer granularity."
Use-after-free in llvm::isGuaranteedToExecute.

llvm-svn: 301214
2017-04-24 18:25:07 +00:00
Sanjay Patel 0889225f51 [InstSimplify] move (A & ~B) | (A ^ B) -> (A ^ B) from InstCombine
This is a straight cut and paste, but there's a bigger problem: if this
fold exists for simplifyOr, there should be a DeMorganized version for
simplifyAnd. But more than that, we have a patchwork of ad hoc logic
optimizations in InstCombine. There should be some structure to ensure 
that we're not missing sibling folds across and/or/xor.
 

llvm-svn: 301213
2017-04-24 18:24:36 +00:00
Adrian Prantl f2c7997013 Use DW_OP_stack_value when reconstructing variable values with arithmetic.
When the location description of a source variable involves arithmetic
on the value itself, it needs to be marked with DW_OP_stack_value since it
is not describing the variable's location, but rather its value.

This is a follow-up to r297971 and fixes the source testcase quoted in
the comment in debuginfo-dce.ll.

rdar://problem/30725338

This reapplies r301093 without modifications.

llvm-svn: 301210
2017-04-24 18:11:42 +00:00
Matt Arsenault 02907f3039 InstCombine: Fix assert when reassociating fsub with undef
There is logic to track the expected number of instructions
produced. It thought in this case an instruction would
be necessary to negate the result, but here it folded
into a ConstantExpr fneg when the non-undef value operand
was cancelled out by the second fsub.

I'm not sure why we don't fold constant FP ops with undef currently,
but I think that would also avoid this problem.

llvm-svn: 301199
2017-04-24 17:24:37 +00:00
Xin Tong a266923d57 Compute safety information in a much finer granularity.
Summary:
Instead of keeping a variable indicating whether there are early exits
in the loop.  We keep all the early exits. This improves LICM's ability to
move instructions out of the loop based on is-guaranteed-to-execute.

I am going to update compilation time as well soon.

Reviewers: hfinkel, sanjoy, efriedma, mkuper

Reviewed By: hfinkel

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D32433

llvm-svn: 301196
2017-04-24 17:12:22 +00:00
Nicolai Haehnle 9c66185315 InstCombine/AMDGPU: Fix constant folding of llvm.amdgcn.{icmp,fcmp}
Summary:
The return value of these intrinsics should always have 0 bits for
inactive threads. This means that when all arguments are constant
and the comparison evaluates to true, the intrinsic should return
the current exec mask.

Fixes some GL_ARB_shader_ballot tests.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D32344

llvm-svn: 301195
2017-04-24 17:08:43 +00:00
Xinliang David Li db8d09b6c2 [PartialInine]: add triaging options
There are more bugs (runtime failures) triggered when partial
inlining is turned on. Add options to help triaging problems.

llvm-svn: 301148
2017-04-23 23:39:04 +00:00
Sanjay Patel e0c26e0640 [InstCombine] add/move folds for [not]-xor
We handled all of the commuted variants for plain xor already,
although they were scattered around and sometimes folded less
efficiently using distributive laws. We had no folds for not-xor.

Handling all of these patterns consistently is part of trying to 
reinstate:
https://reviews.llvm.org/rL300977

llvm-svn: 301144
2017-04-23 22:00:02 +00:00
Xinliang David Li 15744ad87b [PartialInlining] Add optimization remark support
Differential Revision: http://reviews.llvm.org/D32387

llvm-svn: 301143
2017-04-23 21:40:58 +00:00
Xin Tong f98602a1ab [JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor.
Summary:
In case all predecessor go to a single successor of current BB. We want to fold (not thread).

I failed to update the phi nodes properly in the last patch https://reviews.llvm.org/rL300657.

Phi nodes values are per predecessor in LLVM.

Reviewers: sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32400

llvm-svn: 301139
2017-04-23 20:56:29 +00:00
Xin Tong b7b081262a Correct grammar. NFC
llvm-svn: 301135
2017-04-23 17:36:25 +00:00
Sanjay Patel d13b0bfdac [InstCombine] add pattern matches for commuted variants of xor-to-xor
There's probably some better way to write this that eliminates the
code duplication without hurting readability, but at least this
eliminates the logic holes and is hopefully slightly more efficient
than creating new instructions.

llvm-svn: 301129
2017-04-23 16:03:00 +00:00
Renato Golin 4abfb3d741 Revert "[APInt] Fix a few places that use APInt::getRawData to operate within the normal API."
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.

For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."

PR32754.

llvm-svn: 301111
2017-04-23 12:15:30 +00:00
Craig Topper 5f68af0806 [APInt] Use operator<<= instead of shl where possible. NFC
llvm-svn: 301103
2017-04-23 05:18:31 +00:00
Davide Italiano 5da7090256 [ThinLTO/Summary] Rename anonymous globals as last action ...
... in the per-TU -O0 pipeline.
The problem is that there could be passes registered using
`addExtensionsToPM()` introducing unnamed globals.
Asan is an example, but there may be others. Building cppcheck
with `-flto=thin` and `-fsanitize=address` triggers an assertion
while we're reading bitcode (in lib/LTO), as the BitcodeReader
assumes there are no unnamed globals (because the namer has run).
Unfortunately I wasn't able to find an easy way to test this.
I added a comment in the hope nobody moves this again.

llvm-svn: 301102
2017-04-23 04:49:34 +00:00
Adrian Prantl 4677205010 Revert "Use DW_OP_stack_value when reconstructing variable values with arithmetic."
This reverts commit r301093 while investigating stage2 bot breakage.

llvm-svn: 301099
2017-04-23 00:44:40 +00:00
Adrian Prantl a2d25ac14a Use DW_OP_stack_value when reconstructing variable values with arithmetic.
When the location description of a source variable involves arithmetic
on the value itself, it needs to be marked with DW_OP_stack_value since it
is not describing the variable's location, but rather its value.

This is a follow-up to r297971 and fixes the source testcase quoted in
the comment in debuginfo-dce.ll.

rdar://problem/30725338

llvm-svn: 301093
2017-04-22 20:54:06 +00:00
Xinliang David Li 016a82ba51 [PartialInlining] Using existing hasAddressTaken interface to legality check/NFC
llvm-svn: 301090
2017-04-22 19:24:19 +00:00
Sanjay Patel 3b863f8a1e [InstCombine] use 'match' to reduce code; NFCI
The later uses of dyn_castNotVal in this block are either
incomplete (doesn't handle vector constants) or overstepping
(shouldn't handle constants at all), but this first use is
just unnecessary. 'I' is obviously not a constant, and it 
can't be a not-of-a-not because that would already be
instsimplified.

llvm-svn: 301088
2017-04-22 18:05:35 +00:00
Artur Pilipenko 0632bdc648 Fix for PR32740 - Invalid floating type, unreachable between r300969 and r301029
The bug was introduced by r301018 "[InstCombine] fadd double (sitofp x), y check that the promotion is valid". The patch didn't expect that fadd can be on vectors not necessarily scalars. Add vector support along with the test.

llvm-svn: 301070
2017-04-22 07:24:52 +00:00
Matt Arsenault 01d17e7c5f LowerSwitch: Fix producing invalid IR on unreachable code
If a switch was in an unreachable block that branched
to a block with a phi, it would leave phis with missing
predecessors.

llvm-svn: 301064
2017-04-21 23:54:12 +00:00
Matt Arsenault c07bda7b87 InferAddressSpaces: Infer for just GEPs
Fixes leaving intermediate flat addressing computations
where a GEP instruction's source is a constant expression.

Still leaves behind a trivial addrspacecast + gep pair that
instcombine is able to handle, which ideally could be folded
here directly.

llvm-svn: 301044
2017-04-21 21:35:04 +00:00
Xinliang David Li 0e9f6df169 [PartialInliner] Partial inliner needs to check use kind before transformation
Differential Revision: https://reviews.llvm.org/D32373

llvm-svn: 301042
2017-04-21 21:20:56 +00:00
Sanjay Patel 8ce1d4cbe1 [InstCombine] revert r300977 and r301021
This can cause an inf-loop. Investigating...

llvm-svn: 301035
2017-04-21 20:29:17 +00:00
Adrian Prantl 1a18f1ad10 typo
llvm-svn: 301030
2017-04-21 20:06:41 +00:00
Sanjay Patel 0f001a4701 [InstCombine] use isSubsetOf() for efficiency
C | ~D == -1
~(C | ~D) == 0
~C & D == 0
D & ~C == 0
D.isSubsetOf(C)

llvm-svn: 301021
2017-04-21 19:16:52 +00:00
Artur Pilipenko 134d94f9a3 [InstCombine] fadd double (sitofp x), y check that the promotion is valid
Doing these transformations check that the result of integer addition is representable in the FP type.

(fadd double (sitofp x), fpcst) --> (sitofp (add int x, intcst))
(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))

This is a fix for https://bugs.llvm.org//show_bug.cgi?id=27036

Reviewed By: andrew.w.kaylor, scanon, spatel

Differential Revision: https://reviews.llvm.org/D31182

llvm-svn: 301018
2017-04-21 18:45:25 +00:00
Craig Topper 7af078847c [SimplifyCFG] Fix the determination of PostBB in conditional store merging to handle the targets on the second branch being commuted
Currently we choose PostBB as the single successor of QFB, but its possible that QTB's single successor is QFB which would make QFB the correct choice.

Differential Revision: https://reviews.llvm.org/D32323

llvm-svn: 300992
2017-04-21 15:53:42 +00:00
Wei Mi 337d4d95c2 [ConstHoisting] Add BFI in constanthoisting pass and select the best insertion
places based on it.

Existing constant hoisting pass will merge a group of contants in a small range
and hoist the const materialization code to the common dominator of their uses.
However, if the uses are all in cold pathes, existing implementation may hoist
the materialization code from cold pathes to a hot place. This may hurt performance.
The patch introduces BFI to the pass and selects the best insertion places based
on it.

The change is controlled by an option consthoist-with-block-frequency which is
off by default for now.

Differential Revision: https://reviews.llvm.org/D28962

llvm-svn: 300989
2017-04-21 15:50:16 +00:00
Matthew Simpson e2037d24f9 [LV] Model if-converted phi node costs
Phi nodes in non-header blocks are converted to select instructions after
if-conversion. This patch updates the cost model to account for the selects.

Differential Revision: https://reviews.llvm.org/D31906

llvm-svn: 300980
2017-04-21 14:14:54 +00:00
Sanjay Patel 347b54b093 [InstCombine] prefer xor with -1 because 'not' is easier to understand (PR32706)
This matches the demanded bits behavior in the DAG and should fix:
https://bugs.llvm.org/show_bug.cgi?id=32706

Differential Revision: https://reviews.llvm.org/D32255

llvm-svn: 300977
2017-04-21 14:03:54 +00:00
Davide Italiano fa15de34b7 [PartialInliner] Fix crash when inlining functions with unreachable blocks.
CodeExtractor looks up the dominator node corresponding to return blocks
when splitting them. If one of these blocks is unreachable, there's no
node in the Dom and CodeExtractor crashes because it doesn't check
for domtree node validity.
In theory, we could add just a check for skipping null DTNodes in
`splitReturnBlock` but the fix I propose here is slightly different. To the
best of my knowledge, unreachable blocks are irrelevant for the algorithm,
therefore we can just skip them when building the candidate set in the
constructor.

Differential Revision:  https://reviews.llvm.org/D32335

llvm-svn: 300946
2017-04-21 04:25:00 +00:00
Davide Italiano 059574c537 [CodeExtractor] Remove an unneeded level of indirection. NFCI.
llvm-svn: 300931
2017-04-21 00:21:09 +00:00
Craig Topper 358cd9ae3a [InstCombine] Remove the zextOrTrunc from ShrinkDemandedConstant.
The demanded mask and the constant should always be the same width for all callers today.

Also stop copying the demanded mask as its passed in. We should avoid allocating memory unless we are going to do something. The final AND to create the new constant will take care of it.

llvm-svn: 300927
2017-04-20 23:58:27 +00:00
Sanjay Patel cc663b82fa [InstCombine] function names start with lower-case letter; NFC
Forgot to make this fix with the signature change in r300911.

llvm-svn: 300912
2017-04-20 22:37:01 +00:00
Sanjay Patel c9485ca895 [InstCombine] allow shl+shr demanded bits folds with splat constants
llvm-svn: 300911
2017-04-20 22:33:54 +00:00
Xinliang David Li 99e3ca1526 Use basicblock split block utility function
Instead of calling BasicBlock::SplitBasicBlock directly in 
CodeExtractor.

Differential Revision: https://reviews.llvm.org/D32308

llvm-svn: 300899
2017-04-20 21:40:22 +00:00
Sanjay Patel 3e1ae72fcf [InstCombine] allow shl demanded bits folds with splat constants
More fixes are needed to enable the helper SimplifyShrShlDemandedBits().

llvm-svn: 300898
2017-04-20 21:33:02 +00:00
Craig Topper ff23889609 [InstCombine] Use APInt::intersects and APInt::isSubsetOf to improve a few more places in SimplifyDemandedBits.
llvm-svn: 300896
2017-04-20 21:24:37 +00:00
Sanjay Patel fb5b3e773a [InstCombine] allow ashr/lshr demanded bits folds with splat constants
llvm-svn: 300888
2017-04-20 20:59:02 +00:00
Craig Topper 17f37ba3b9 [InstCombine] Use APInt::isSubsetOf to simplify some code in SimplifyDemandedBits. NFC
This allows us to use less temporary APInt for And and Invert operations.

llvm-svn: 300885
2017-04-20 20:47:35 +00:00
Craig Topper 0ec3f2f39a [InstCombine] Remove redundant code from SimplifyDemandedBits handling for Or. The code above it is equivalent if you work through the bitwise math.
llvm-svn: 300876
2017-04-20 19:31:22 +00:00
Davide Italiano b965121ba8 [CodeExtractor] Remove a bunch of unneeded constructors.
Differential Revision:  https://reviews.llvm.org/D32305

llvm-svn: 300869
2017-04-20 18:33:40 +00:00
Craig Topper bcfd2d1789 [APInt] Rename getSignBit to getSignMask
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.

Differential Revision: https://reviews.llvm.org/D32108

llvm-svn: 300856
2017-04-20 16:56:25 +00:00
Craig Topper a8129a1122 [APInt] Add isSubsetOf method that can check if one APInt is a subset of another without creating temporary APInts
This question comes up in many places in SimplifyDemandedBits. This makes it easy to ask without allocating additional temporary APInts.

The BitVector class provides a similar functionality through its (IMHO badly named) test(const BitVector&) method. Though its output polarity is reversed.

I've provided one example use case in this patch. I plan to do more as a follow up.

Differential Revision: https://reviews.llvm.org/D32258

llvm-svn: 300851
2017-04-20 16:17:13 +00:00
Craig Topper 83dc1c60aa In SimplifyDemandedUseBits, use computeKnownBits directly to handle Constants
Currently we don't explicitly process ConstantDataSequential, ConstantAggregateZero, or ConstantVector, or Undef before applying the Depth limit. Instead they occur after the depth check in the non-instruction path.

For the constant types that we do handle, the code is replicated from computeKnownBits.

This patch fixes the missing constant handling and the reduces the amount of code by just using computeKnownBits directly for any type of Constant.

Differential Revision: https://reviews.llvm.org/D32123

llvm-svn: 300849
2017-04-20 16:14:58 +00:00
Reid Kleckner f1de9e83c2 [DAE] Simplify attribute list creation, NFC
Removes a use of getSlotAttributes, which I intend to change.

llvm-svn: 300795
2017-04-19 23:45:45 +00:00
Reid Kleckner 0a5ed3d5dc [GlobalOpt] Simplify attribute code stripping nest, NFC
llvm-svn: 300787
2017-04-19 23:26:44 +00:00
Reid Kleckner aa0cec7d6d Simplify test for sret attribute in instcombine
This change is correct because the verifier requires that at most one
argument be marked 'sret'.

NFC, removes a use of AttributeList slot APIs.

llvm-svn: 300784
2017-04-19 23:17:47 +00:00
Kostya Serebryany c5d3d49034 [sanitizer-coverage] remove some more stale code
llvm-svn: 300778
2017-04-19 22:42:11 +00:00
Evgeniy Stepanov 7c9b086ef5 Remove two unused variables (-Werror).
llvm-svn: 300777
2017-04-19 22:27:23 +00:00
Kostya Serebryany be87d480ff [sanitizer-coverage] remove stale code
llvm-svn: 300769
2017-04-19 21:48:09 +00:00
Craig Topper 9b71a402c2 [APInt] Cast calls to add/sub/mul overflow methods to void if only their overflow bool out param is used.
This is preparation for a clang change to improve the [[nodiscard]] warning to not be ignored on methods that return a class marked [[nodiscard]] that are defined in the class itself. See D32207.

We should consider adding wrapper methods to APInt that return the overflow flag directly and discard the APInt result. This would eliminate the void casts and the need to create a bool before the call to pass to the out param.

llvm-svn: 300758
2017-04-19 21:09:45 +00:00
Matt Arsenault d3406bc45c StructurizeCFG: Directly invert cmp instructions
The most common case for a branch condition is
a single use compare. Directly invert the branch
predicate rather than adding a lot of xor i1 true
which the DAG will have to fold later.

This produces nicer to read structurizer output.

This produces some random changes in codegen
due to the DAG swapping branch conditions itself,
and then does a poor job of dealing with those
inverts.

llvm-svn: 300732
2017-04-19 18:29:07 +00:00
Sanjoy Das 5945447d84 [GVN] Don't coerce non-integral pointers to integers or vice versa
Summary:
See http://llvm.org/docs/LangRef.html#non-integral-pointer-type

The NewGVN test does not fail without these changes (perhaps it does
try to coerce pointers <-> integers to begin with?), but I added the
test case anyway.

Reviewers: dberlin

Subscribers: mcrosier, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D32208

llvm-svn: 300730
2017-04-19 18:21:09 +00:00
Reid Kleckner 9d16fa09c6 Prefer addAttr(Attribute::AttrKind) over the AttributeList overload
This should simplify the call sites, which typically want to tweak one
attribute at a time. It should also avoid creating ephemeral
AttributeLists that live forever.

llvm-svn: 300718
2017-04-19 17:28:52 +00:00
Davide Italiano ffcb4df204 [InstCombine] Reduce visitLoadInst() code duplication. NFCI.
llvm-svn: 300717
2017-04-19 17:26:57 +00:00
Chandler Carruth ae3386aa74 Revert r300657 due to crashes in stage2 of bootstraps:
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/2476/steps/build-stage2-LLVMgold.so/logs/stdio
http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/15036/steps/build_llvmclang/logs/stdio

I've updated the commit thread, reverting to get the bots back to green.

Original commit summary:
[JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor.

llvm-svn: 300662
2017-04-19 06:23:20 +00:00
Xin Tong 636a332906 [JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor. .
Summary: In case all predecessor go to a single successor of current BB. We want to fold (not thread).

Reviewers: efriedma, sanjoy

Reviewed By: sanjoy

Subscribers: dberlin, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D30869

llvm-svn: 300657
2017-04-19 05:15:57 +00:00
Sanjoy Das f09c1e346e Add a getPointerOperandType() helper to LoadInst and StoreInst; NFC
I will use this in a later change.

llvm-svn: 300613
2017-04-18 22:00:54 +00:00
Davide Italiano 80fe987b42 [LoopReroll] Prefer hasNUses/hasNUses or more as they're cheaper. NFCI.
llvm-svn: 300607
2017-04-18 21:42:21 +00:00
Daniel Berlin 9d0042b47c NewGVN: Fix memory congruence verification. The return true should be a return false. Merge the appropriate if statements so it doesn't happen again.
llvm-svn: 300584
2017-04-18 20:15:47 +00:00
Easwaran Raman 76aba5f6d7 [SLP vectorizer] Allow phi node reordering in tryToVectorizeList.
In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompnaying test case is fully vectorized
when we swap the orde of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.

Differential revision: https://reviews.llvm.org/D32065

llvm-svn: 300574
2017-04-18 18:16:57 +00:00
Craig Topper fc947bcfba [APInt] Use lshrInPlace to replace lshr where possible
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.

This adds an lshrInPlace(const APInt &) version as well.

Differential Revision: https://reviews.llvm.org/D32155

llvm-svn: 300566
2017-04-18 17:14:21 +00:00
Daniel Berlin ec9deb7f54 NewGVN: Don't waste time value numbering unreachable blocks
llvm-svn: 300565
2017-04-18 17:06:11 +00:00
Zvi Rackover d942397e24 LoopRerollPass: Prefer Value::hasOneUse() over Value::getNumUses(). NFC.
getNumUses() can be more expensive as it iterates over all list's elements.

llvm-svn: 300558
2017-04-18 14:55:43 +00:00
Gil Rapaport fb1d915ab2 [LV] Cache block mask values
This patch is part of D28975's breakdown.

Add caching for block masks similar to the cache already used for edge masks,
replacing generation per user with reusing the first generated value which
dominates all uses.

Differential Revision: https://reviews.llvm.org/D32054

llvm-svn: 300557
2017-04-18 14:43:43 +00:00
Nikolai Bozhenov 9e4a1c39db [GVNHoist] Mark GlobalsAA as preserved by GVNHoist.
Reviewers: sebpop, hiraditya

Reviewed By: sebpop

Subscribers: n.bozhenov, llvm-commits

Differential Revision: https://reviews.llvm.org/D32158
Patch by Andrei Elovikov <andrei.elovikov@intel.com>

llvm-svn: 300552
2017-04-18 13:25:49 +00:00
Andrea Di Biagio 517e3fc34c [SampleProfile] Don't assert when printing the DebugLoc of a branch. NFC.
llvm-svn: 300544
2017-04-18 11:27:58 +00:00
Andrea Di Biagio e3edef0977 [SampleProfile] Skip intrinsic calls when visiting callsites in InlineHotFunctions.
Before this patch, we always called method 'findCalleeFunctionSamples()' on
intrinsic calls. However, intrinsic calls like llvm.dbg.value() are not viable
candidates for obvious reasons.

No functional change intended.

Differential Revision: https://reviews.llvm.org/D32008

llvm-svn: 300541
2017-04-18 10:08:53 +00:00
Adrian Prantl 6825fb64e9 PR32382: Fix emitting complex DWARF expressions.
The DWARF specification knows 3 kinds of non-empty simple location
descriptions:
1. Register location descriptions
  - describe a variable in a register
  - consist of only a DW_OP_reg
2. Memory location descriptions
  - describe the address of a variable
3. Implicit location descriptions
  - describe the value of a variable
  - end with DW_OP_stack_value & friends

The existing DwarfExpression code is pretty much ignorant of these
restrictions. This used to not matter because we only emitted very
short expressions that we happened to get right by accident.  This
patch makes DwarfExpression aware of the rules defined by the DWARF
standard and now chooses the right kind of location description for
each expression being emitted.

This would have been an NFC commit (for the existing testsuite) if not
for the way that clang describes captured block variables. Based on
how the previous code in LLVM emitted locations, DW_OP_deref
operations that should have come at the end of the expression are put
at its beginning. Fixing this means changing the semantics of
DIExpression, so this patch bumps the version number of DIExpression
and implements a bitcode upgrade.

There are two major changes in this patch:

I had to fix the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.

When lowering a DBG_VALUE, the decision of whether to emit a register
location description or a memory location description depends on the
MachineLocation — register machine locations may get promoted to
memory locations based on their DIExpression. (Future) optimization
passes that want to salvage implicit debug location for variables may
do so by appending a DW_OP_stack_value. For example:
  DBG_VALUE, [RBP-8]                        --> DW_OP_fbreg -8
  DBG_VALUE, RAX                            --> DW_OP_reg0 +0
  DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0

All testcases that were modified were regenerated from clang. I also
added source-based testcases for each of these to the debuginfo-tests
repository over the last week to make sure that no synchronized bugs
slip in. The debuginfo-tests compile from source and run the debugger.

https://bugs.llvm.org/show_bug.cgi?id=32382
<rdar://problem/31205000>

Differential Revision: https://reviews.llvm.org/D31439

llvm-svn: 300522
2017-04-18 01:21:53 +00:00
Dehao Chen 1ea8bd8109 Build SymbolMap in SampleProfileLoader to help matchin function names with suffix.
Summary: If there is suffix added in the function name (e.g. module hash added by thinLTO), we will not be able to find a match in profile as the suffix does not exist in profile. This patch build a map from function name to Function *. The map includes the entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline.

Reviewers: davidxl, dnovillo, tejohnson

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D31952

llvm-svn: 300507
2017-04-17 22:23:05 +00:00
Craig Topper c228068d90 [SimplifyCFG] Use hasNUses instead of comparing getNumUses to a constant."
The use list is a linked list so getNumUses requires a linear scan through the whole list. hasNUses will stop scanning at N and see if that is the end.

llvm-svn: 300505
2017-04-17 22:13:00 +00:00
Davide Italiano cdc937d0fc [InstCombine] Matchers work with both ConstExpr and Instructions.
So, `cast<Instruction>` is not guaranteed to succeed. Change the
code so that we create a new constant and use it in the newly
created instruction, as it's done in other places in InstCombine.

OK'ed by Sanjay/Craig. Fixes PR32686.

llvm-svn: 300495
2017-04-17 20:49:50 +00:00
Peter Collingbourne a0f371a106 Bitcode: Add a string table to the bitcode format.
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.

This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.

On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html

Differential Revision: https://reviews.llvm.org/D31838

llvm-svn: 300464
2017-04-17 17:51:36 +00:00
Craig Topper d23004c37b Introduce APInt::isSignBitSet/isSignBitClear. Use in place isSignBitSet in place of isNegative in known bits tracking.
This makes statements like KnownZero.isNegative() (which means the value we're tracking is positive) less confusing.

llvm-svn: 300457
2017-04-17 16:38:20 +00:00
Matt Arsenault 7205f3c2e4 AMDGPU: SimplifyDemandedElts for image intrinsics
Causes some VGPR usage improvements in shaderdb, but
introduces some SGPR spilling regressions due to random
scheduling changes later.

llvm-svn: 300453
2017-04-17 15:12:44 +00:00
Davide Italiano ce161a7812 [LCSSA] Don't insert tokens into the worklist at all.
We're gonna skip them anyway, so there's no point in inserting them
in the first place.

llvm-svn: 300452
2017-04-17 14:32:05 +00:00
Max Kazantsev 751579cac0 [LoopPeeling] Get rid of Phis that become invariant after N steps
This patch is a generalization of the improvement introduced in rL296898.
Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes
an invariant on the 2nd iteration. In more general case, if a Phi becomes invariant after
N iterations, we can peel N times and turn it into invariant.
In order to do this, we for every Phi in loop's header we define the Invariant Depth value
which is calculated as follows:

Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge].

If %y is a loop invariant, then Depth(%x) = 1.
If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1.
Otherwise, Depth(%x) is infinite.
Notice that if we peel a loop, all Phis with Depth = 1 become invariants,
and all other Phis with finite depth decrease the depth by 1.
Thus, peeling N first iterations allows us to turn all Phis with Depth <= N
into invariants.

Reviewers: reames, apilipenko, mkuper, skatkov, anna, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31613

llvm-svn: 300446
2017-04-17 09:52:02 +00:00
Max Kazantsev 8ed6b66d85 [LoopPeeling] Fix condition for phi-eliminating peeling
When peeling loops basing on phis becoming invariants, we make a wrong loop size check.
UP.Threshold should be compared against the total numbers of instructions after the transformation,
which is equal to 2 * LoopSize in case of peeling one iteration.
We should also check that the maximum allowed number of peeled iterations is not zero.

Reviewers: sanjoy, anna, reames, mkuper

Reviewed By: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31753

llvm-svn: 300441
2017-04-17 05:38:28 +00:00
Craig Topper 218a359fbd [InstCombine] Simplify 1/X for vectors.
llvm-svn: 300439
2017-04-17 03:41:47 +00:00
Craig Topper 1a18a7c51e [InstCombine] Add support for vector srem->urem.
llvm-svn: 300437
2017-04-17 01:51:24 +00:00
Craig Topper f248468359 [InstCombine] Add support for turning vector sdiv into udiv.
llvm-svn: 300435
2017-04-17 01:51:19 +00:00
Davide Italiano ee654bf5f1 [LCSSA] Simplify a loop. NFCI.
llvm-svn: 300433
2017-04-17 00:02:45 +00:00
Craig Topper da886c665b [InstCombine][ValueTracking] When computing known bits for Srem make sure we don't compute known bits for the LHS twice.
If we already called computeKnownBits for the RHS being a constant power of 2, we've already computed everything we can and should just stop. I think previously we would still recurse if we had determined the result was negative or had not determined the sign bit at all.

llvm-svn: 300432
2017-04-16 21:46:12 +00:00
Davide Italiano dd37c67d81 [LCSSA] Fix non-determinism due to iterating over a SmallPtrSet.
Use a SmallSetVector instead.

llvm-svn: 300431
2017-04-16 21:07:04 +00:00
Craig Topper 0d304f01b4 [InstCombine] In SimplifyDemandedUseBits, don't bother to mask known bits of constants with DemandedMask.
Just because we didn't demand them doesn't mean they aren't known.

llvm-svn: 300430
2017-04-16 20:55:58 +00:00
Michael Zuckerman 16b20d2fc5 [X86][X86 intrinsics]Folding cmp(sub(a,b),0) into cmp(a,b) optimization
This patch adds new optimization (Folding cmp(sub(a,b),0) into cmp(a,b))
to instCombineCall pass and was written specific for X86 CMP intrinsics.

Differential Revision: https://reviews.llvm.org/D31398

llvm-svn: 300422
2017-04-16 13:26:08 +00:00
Sanjay Patel ef9f586bb2 [InstCombine] allow (X != C1 && X != C2) and similar patterns to match splat vector constants
llvm-svn: 300402
2017-04-15 17:55:06 +00:00
Vedant Kumar 1a6a2b642b [ProfileData] Unify getInstrProf*SectionName helpers
This is a version of D32090 that unifies all of the
`getInstrProf*SectionName` helper functions. (Note: the build failures
which D32090 would have addressed were fixed with r300352.)

We should unify these helper functions because they are hard to use in
their current form. E.g we recently introduced more helpers to fix
section naming for COFF files. This scheme doesn't totally succeed at
hiding low-level details about section naming, so we should switch to an
API that is easier to maintain.

This is not an NFC commit because it fixes llvm-cov's testing support
for COFF files (this falls out of the API change naturally). This is an
area where we lack tests -- I will see about adding one as a follow up.

Testing: check-clang, check-profile, check-llvm.

Differential Revision: https://reviews.llvm.org/D32097

llvm-svn: 300381
2017-04-15 00:09:57 +00:00
Craig Topper 9a458cd517 [InstCombine] MakeAnd/Or/Xor handling to reuse previous APInt computations
When checking if we should return a constant, we create some temporary APInts to see if we know all bits. But the exact computations we do are needed in several other locations in the same code.

This patch moves them to named temporaries so we can reuse them.

Ideally we'd write directly to KnownZero/One, but we currently seem to only write those variables after all the simplifications checks and I didn't want to change that with this patch.

Differential Revision: https://reviews.llvm.org/D32094

llvm-svn: 300376
2017-04-14 22:34:14 +00:00
Reid Kleckner fb502d2f5e [IR] Make paramHasAttr to use arg indices instead of attr indices
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.

Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.

llvm-svn: 300367
2017-04-14 20:19:02 +00:00
Sanjay Patel 7cfe41659c [InstCombine] (X != C1 && X != C2) --> (X | (C1 ^ C2)) != C2
...when C1 differs from C2 by one bit and C1 <u C2:
http://rise4fun.com/Alive/Vuo

And move related folds to a helper function. This reduces code duplication and
will make it easier to remove the scalar-only restriction as a follow-up step.

llvm-svn: 300364
2017-04-14 19:23:50 +00:00
Craig Topper fb71b7d3e0 [InstCombine] Support folding a subtract with a constant LHS into a phi node
We currently only support folding a subtract into a select but not a PHI. This fixes that.

I had to fix an assumption in FoldOpIntoPhi that assumed the PHI node was always in operand 0. Now we pass it in like we do for FoldOpIntoSelect. But we still require some dancing to find the Constant when we create the BinOp or ConstantExpr. This is based code is similar to what we do for selects.

Since I touched all call sites, this also renames FoldOpIntoPhi to foldOpIntoPhi to match coding standards.

Differential Revision: https://reviews.llvm.org/D31686

llvm-svn: 300363
2017-04-14 19:20:12 +00:00
Craig Topper c22c7b1459 [InstCombine] Refactor SimplifyUsingDistributiveLaws to more explicitly skip code when LHS/RHS aren't BinaryOperators
Currently this code always makes 2 or 3 calls to tryFactorization regardless of whether the LHS/RHS are BinaryOperators. We make 3 calls when both operands are BinaryOperators with the same opcode. Or surprisingly, when neither are BinaryOperators. This is because getBinOpsForFactorization returns Instruction::BinaryOpsEnd when the operand is not a BinaryOperator. If both LHS and RHS are not BinaryOperators then they both have an Opcode of Instruction::BinaryOpsEnd. When this happens we rely on tryFactorization to early out due to A/B/C/D being null. Similar behavior occurs for the other calls, we rely on getBinOpsForFactorization having made A/B or C/D null to get tryFactorization to early out.

We also rely on these null checks to check the result of getIdentityValue and early out for it.

This patches refactors this to pull these checks up to SimplifyUsingDistributiveLaws so we don't rely on BinaryOpsEnd as a sentinel or this A/B/C/D null behavior. I think this makes this code easier to reason about. Should also give a tiny performance improvement for cases where the LHS or RHS isn't a BinaryOperator.

Differential Revision: https://reviews.llvm.org/D31913

llvm-svn: 300353
2017-04-14 17:55:41 +00:00
Davide Italiano 91239088a1 [FunctionImport] assert(false) -> llvm_unreachable(). NFCI.
llvm-svn: 300344
2017-04-14 17:22:02 +00:00
Sanjoy Das e3a15e832c Tighten the API for ScalarEvolutionNormalization
llvm-svn: 300331
2017-04-14 15:49:59 +00:00
Sanjoy Das ac9f3ea0b4 Remove NormalizeAutodetect; NFC
It is cleaner to have a callback based system where the logic of
whether an add recurrence is normalized or not lives on IVUsers.

This is one step in a multi-step cleanup.

llvm-svn: 300330
2017-04-14 15:49:53 +00:00
Gil Rapaport 334f8fbe47 [LV] Remove implicit single basic block assumption
This patch is part of D28975's breakdown - no change in output intended.

LV's code currently assumes the vectorized loop is a single basic block up
until predicateInstructions() is called. This patch removes two manifestations
of this assumption (loop phi incoming values, dominator tree update) by
replacing the use of vectorLoopBody with the vectorized loop's latch/header.

Differential Revision: https://reviews.llvm.org/D32040

llvm-svn: 300310
2017-04-14 07:30:23 +00:00
Craig Topper c9a4fc0750 [InstCombine] Use APInt::setSignBit and APInt::isNegative(). NFC
llvm-svn: 300305
2017-04-14 05:09:04 +00:00
Xinliang David Li 9a71766751 Fix test failure on windows: pass module to getInstrProfXXName calls
llvm-svn: 300302
2017-04-14 03:03:24 +00:00
Daniel Berlin 2f72b19b05 NewGVN: Don't propagate over phi backedges where undef causes us to
have >1 value, unless we can prove the phi node is cycle free.

Fixes PR 32607.

llvm-svn: 300299
2017-04-14 02:53:37 +00:00
Xinliang David Li 57dea2d359 [Profile] PE binary coverage bug fix
PR/32584

Differential Revision: https://reviews.llvm.org/D32023

llvm-svn: 300277
2017-04-13 23:37:12 +00:00
Reid Kleckner f021fab2af [IR] Make getParamAttributes take argument numbers, not ArgNo+1
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.

The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.

NFC

llvm-svn: 300272
2017-04-13 23:12:13 +00:00
Craig Topper e7563f8dda [InstCombine] Use APInt::getBitsSetFrom instead of inverting the result of getLowBitsSet. NFC
llvm-svn: 300265
2017-04-13 21:49:48 +00:00
Davide Italiano af36d02430 [LCSSA] Efficiently compute blocks dominating at least one exit.
For LCSSA purposes, loop BBs not dominating any of the exits aren't
interesting, as none of the values defined in these blocks can be
used outside the loop.

The way the code computed this information was by comparing each
BB of the loop with each of the exit blocks and ask the dominator tree
about their dominance relation. This is slow.

A more efficient way, implemented here, is that of starting from the
exit blocks and walking the dom upwards until we hit an header. By
transitivity, all the blocks we encounter in our path dominate an exit.

For the testcase provided in PR31851, this reduces compile time on
`opt -O2` by ~25%, going from 1m47s to 1m22s.

Thanks to Dan/MichaelZ for discussions/suggesting the approach/review.

Differential Revision:  https://reviews.llvm.org/D31843

llvm-svn: 300255
2017-04-13 20:36:59 +00:00
Richard Smith 6c2615177b Revert accidentally-committed files in r300252.
llvm-svn: 300253
2017-04-13 20:31:21 +00:00
Richard Smith 55bd375b69 Remove all allocation and divisions from GreatestCommonDivisor
Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This
avoids the (expensive) APInt division operation in favour of bit operations.
Remove all memory allocation from within the GCD loop by tweaking our `lshr`
implementation so it can operate in-place.

Differential Revision: https://reviews.llvm.org/D31968

llvm-svn: 300252
2017-04-13 20:29:59 +00:00
Reid Kleckner 257cb4e099 [InstCombine] Fix !prof metadata preservation for invokes
Summary:
Bug noticed by inspection.

Extend the test to handle invokes as well as calls, and rewrite it to
not depend on the inliner and other passes.

Also simplify the call site replacement code with CallSite, similar to
what I did to dead arg elimination and arg promotion (rL300235 and
rL300229).

Reviewers: danielcdh, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32041

llvm-svn: 300251
2017-04-13 20:26:38 +00:00
Davide Italiano 0b30227f75 [LCSSA] Assert that we always have a valid loop.
We could otherwise add BBs not belonging to a loop in `formLCSSA`
and later crash when trying to iterate the loop blocks.

llvm-svn: 300244
2017-04-13 20:05:37 +00:00
Davide Italiano 549078d1ab [LCSSA] Remove spurious whitespaces. NFCI.
llvm-svn: 300243
2017-04-13 20:02:27 +00:00
Davide Italiano 5129951296 [LCSSA] Use `auto` when the type is obvious. NFCI.
llvm-svn: 300242
2017-04-13 20:01:30 +00:00
Dehao Chen 2c7ca9b5df SamplePGO: convert callsite samples map key from callsite_location to callsite_location+callee_name
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.

Reviewers: davidxl, dnovillo

Reviewed By: davidxl

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D31950

llvm-svn: 300240
2017-04-13 19:52:10 +00:00
Anna Thomas dcdb325fee [LV] Fix the vector code generation for first order recurrence
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.

Reviewers: mssimpso, mkuper, anemet

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D31979

llvm-svn: 300238
2017-04-13 18:59:25 +00:00
Sanjay Patel 445d03bf00 [InstCombine] fold X == 0 || X == -1 to one compare (PR32524)
This is effectively a retry of:
https://reviews.llvm.org/rL299851
but now we have tests and an assert to make sure the bug
that was exposed with that attempt will not happen again.

I'll fix the code duplication and missing sibling fold next,
but I want to make this change as small as possible to reduce
risk since I messed it up last time.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32524

llvm-svn: 300236
2017-04-13 18:47:06 +00:00
Reid Kleckner aea2a28098 [DAE] Simplify call site replacement code with CallSite NFC
llvm-svn: 300235
2017-04-13 18:42:03 +00:00
Reid Kleckner c3fae796fd [InstCombine] Simplify attribute code with new AttributeList::get NFC
llvm-svn: 300230
2017-04-13 18:11:03 +00:00
Reid Kleckner 3a1150352d [ArgPromotion] Don't drop !prof metadata on promoted calls
Noticed by inspection while doing attribute work. DAE, InstCombineCalls,
and ArgPromotion have a fair amount of duplicated code for hacking on
call sites, and you can find bugs by comparing them.

Add a test case for this.

llvm-svn: 300229
2017-04-13 18:10:30 +00:00
Sanjay Patel 9745d24a66 [InstCombine] use similar ops for related folds; NFCI
It's less efficient to produce 'ule' than 'ult' since we know we're going to
canonicalize to 'ult', but we shouldn't have duplicated code for these folds.

As a trade-off, this was a pretty terrible way to make a '2'. :)
       if (LHSC == SubOne(RHSC)) 
         AddC = ConstantExpr::getSub(AddOne(RHSC), LHSC);

The next steps are to share the code to fix PR32524 and add the missing 'and'
fold that was left out when PR14708 was fixed:
https://bugs.llvm.org/show_bug.cgi?id=14708

llvm-svn: 300222
2017-04-13 17:36:24 +00:00
Sanjay Patel a8ebb46e0e [InstCombine] fix assert to not always be true
llvm-svn: 300202
2017-04-13 16:05:01 +00:00
Geoff Berry 85a530fb59 Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This reverts commit r296872 now that PR32153 has been fixed.

llvm-svn: 300200
2017-04-13 15:36:25 +00:00
Ayal Zaks cd712b6c49 [LV] Refactor ILV to provide vectorizeInstruction(); NFC
Refactoring InnerLoopVectorizer's vectorizeBlockInLoop() to provide
vectorizeInstruction(). Aligning DeadInstructions with its only user.
Facilitates driving the transformation by VPlan - follows
https://reviews.llvm.org/D28975 and its tentative breakdown.

Differential Revision: https://reviews.llvm.org/D31997

llvm-svn: 300183
2017-04-13 09:07:23 +00:00
Reid Kleckner 7f72033e1c [IR] Take func, ret, and arg attrs separately in AttributeList::get
This seems like a much more natural API, based on Derek Schuff's
comments on r300015. It further hides the implementation detail of
AttributeList that function attributes come last and appear at index
~0U, which is easy for the user to screw up. git diff says it saves code
as well: 97 insertions(+), 137 deletions(-)

This also makes it easier to change the implementation, which I want to
do next.

llvm-svn: 300153
2017-04-13 00:58:09 +00:00
Craig Topper c75f94bfa5 [InstCombine] Teach SimplifyMultipleUseDemandedBits to handle And/Or/Xor known bits using the LHS/RHS known bits it already acquired without recursing back into computeKnownBits.
This replicates the known bits and constant creation code from the single use case for these instructions and adds it here. The computeKnownBits and constant creation code for other instructions is now in the default case of the opcode switch.

llvm-svn: 300094
2017-04-12 19:32:47 +00:00
Craig Topper cf3641fd57 [InstCombine] Remove unreachable code for turning an And where all demanded bits on both sides are known to be zero into a constant 0.
We already handled a superset check that included the known ones too and folded to a constant that may include ones. But it can also handle the case of no ones.

llvm-svn: 300093
2017-04-12 19:08:03 +00:00
Sanjay Patel 6e41018942 [InstCombine] fix wrong undef handling when converting select to shuffle
As discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32486
...the canonicalization of vector select to shufflevector does not hold up
when undef elements are present in the condition vector. 

Try to make the undef handling clear in the code and the LangRef.

Differential Revision: https://reviews.llvm.org/D31980

llvm-svn: 300092
2017-04-12 18:39:53 +00:00
Craig Topper f35a7f7b49 [InstCombine] In SimplifyMultipleUseDemandedBits, use a switch instead of cascaded ifs on opcode. NFC
llvm-svn: 300085
2017-04-12 18:25:25 +00:00
Craig Topper 9a51c7f343 [InstCombine] Teach SimplifyDemandedInstructionBits that even if we reach an instruction that has multiple uses, if we know all the bits for the demanded bits for this context we can go ahead and create a constant.
Currently if we reach an instruction with multiples uses we know we can't do any optimizations to that instruction itself since we only have the demanded bits for one of the users. But if we know all of the bits are zero/one for that one user we can still go ahead and create a constant to give to that user.

This might then reduce the instruction to having a single use and allow additional optimizations on the other path.

This picks up an additional case that r300075 didn't catch.

Differential Revision: https://reviews.llvm.org/D31552

llvm-svn: 300084
2017-04-12 18:17:46 +00:00
Craig Topper b0076fe8b4 [InstCombine] Move portion of SimplifyDemandedUseBits that deals with instructions with multiple uses out to a separate method. NFCI
llvm-svn: 300082
2017-04-12 18:05:21 +00:00
Craig Topper 845033a6c9 Teach SimplifyDemandedUseBits that adding or subtractings 0s from every bit below the highest demanded bit can be simplified
If we are adding/subtractings 0s below the highest demanded bit we can just use the other operand and remove the operation.

My primary motivation is observing that we can call ShrinkDemandedConstant for the add/sub and create a 0 constant, rather than removing the add completely. In the case I saw, we modified the constant on an add instruction to a 0, but the add is not put into the worklist. So we didn't revisit it until the next InstCombine iteration. This caused an IR modification to remove add and a subsequent iteration to be ran.

With this change we get bypass the add in the first iteration and prevent the second iteration from changing anything.

Differential Revision: https://reviews.llvm.org/D31120

llvm-svn: 300075
2017-04-12 16:49:59 +00:00
Sanjay Patel 33439f982b [InstCombine] morph an existing instruction instead of creating a new one
One potential way to make InstCombine (very slightly?) faster is to recycle instructions 
when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't
consider this an "InstSimplify". We could, however, make a new layer to house transforms 
like this if that makes InstCombine more manageable (just throwing out an idea; not sure 
how much opportunity is actually here).

Differential Revision: https://reviews.llvm.org/D31863

llvm-svn: 300067
2017-04-12 15:11:33 +00:00
Jonas Paulsson 22776892c9 [SLPVectorizer] Pass the right type argument to getCmpSelInstrCost()
In getEntryCost(), make the scalar type for a compare instruction that of the
operands, not i1. This is needed in order to call getCmpSelInstrCost() for a
compare in a sensible way, the same way as the LoopVectorizer does.

New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll

Review: Matthew Simpson
https://reviews.llvm.org/D31601

llvm-svn: 300061
2017-04-12 13:29:25 +00:00
Jonas Paulsson 592dbea779 [LoopVectorizer] Improve handling of branches during cost estimation.
The cost for a branch after vectorization is very different depending on if
the vectorizer will if-convert the block (branch is eliminated), or if
scalarized and predicated blocks will be produced (branch duplicated before
each block). There is also the case of remaining scalar branches, such as the
back-edge branch.

This patch handles these cases differently with TTI based cost estimates.

Review: Matthew Simpson
https://reviews.llvm.org/D31175

llvm-svn: 300058
2017-04-12 13:13:15 +00:00
Jonas Paulsson da74ed42da [LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore()
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.

This patch lets Target specify that it supports such instructions by means of
a new TTI hook that defaults to false.

The use for this is in the LoopVectorizer getScalarizationOverhead() method,
which will with this patch produce a smaller sum for a vector load/store on
SystemZ.

New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll

Review: Adam Nemet
https://reviews.llvm.org/D30680

llvm-svn: 300056
2017-04-12 12:41:37 +00:00
Jonas Paulsson fccc7d66c3 [SystemZ] TargetTransformInfo cost functions implemented.
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.

Interleaved access vectorization enabled.

BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.

Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631

llvm-svn: 300052
2017-04-12 11:49:08 +00:00
Bjorn Pettersson 4af0593ecc [LoadCombine] Avoid analysing dead basic blocks
Summary:
Dead basic blocks may be forming a loop, for which SSA form is
fulfilled, but with a circular def-use chain. LoadCombine could
enter an infinite loop when analysing such dead code. This patch
solves the problem by simply avoiding to analyse all basic blocks
that aren't forward reachable, from function entry, in LoadCombine.

Fixes https://bugs.llvm.org/show_bug.cgi?id=27065

Reviewers: mehdi_amini, chandlerc, grosser, Bigcheese, davide

Reviewed By: davide

Subscribers: dberlin, zzheng, bjope, grandinj, Ka-Ka, materi, jholewinski, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D31032

llvm-svn: 300034
2017-04-12 08:07:55 +00:00
Chandler Carruth 927d8e610a [IR] Redesign the case iterator in SwitchInst to actually be an iterator
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.

All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.

Doing this exposed some missing facilities in the iterator facade that
I've fixed and required some work to the actual iterator to fully
support the necessary API.

Differential Revision: https://reviews.llvm.org/D31548

llvm-svn: 300032
2017-04-12 07:27:28 +00:00
Craig Topper b5194eeebf [InstCombine][IR] Add a commutable BinOp matcher. Use it to reduce some code. NFC
llvm-svn: 300030
2017-04-12 05:49:28 +00:00
Bob Haarman 4075ccc717 ThinLTOBitcodeWriter: keep comdats together, rename if leader is renamed
Summary:
COFF requires that every comdat contain a symbol with the same name as
the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this
requirement to be violated. This change avoids such violations by
renaming comdats if their leaders are renamed. It also keeps comdats
together when splitting modules.

Reviewers: pcc, mehdi_amini, tejohnson

Reviewed By: pcc

Subscribers: rnk, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31963

llvm-svn: 300019
2017-04-12 01:43:07 +00:00
Reid Kleckner c2cb560045 [IR] Add AttributeSet to hide AttributeSetNode* again, NFC
Summary:
For now, it just wraps AttributeSetNode*. Eventually, it will hold
AvailableAttrs as an inline bitset, and adding and removing enum
attributes will be super cheap.

This sinks AttributeSetNode back down to lib/IR/AttributeImpl.h.

Reviewers: pete, chandlerc

Subscribers: llvm-commits, jfb

Differential Revision: https://reviews.llvm.org/D31940

llvm-svn: 300014
2017-04-12 00:38:00 +00:00
Evgeniy Stepanov 90fd87303c [asan] Give global metadata private linkage.
Internal linkage preserves names like "__asan_global_foo" which may
account to 2% of unstripped binary size.

llvm-svn: 299995
2017-04-11 22:28:13 +00:00
Anna Thomas 00dc1b74b7 [LV] Avoid vectorizing first order recurrence when phi uses are outside loop
In the vectorization of first order recurrence, we vectorize such
that the last element in the vector will be the one extracted to pass into the
scalar remainder loop. However, this is not true when there is a phi (other
than the primary induction variable) is used outside the loop.
In such a case, we need the value from the second last iteration (i.e.
the phi value), not the last iteration (which would be the phi update).
I've added a test case for this. Also see PR32396.

A follow up patch would generate the correct code gen for such cases,
and turn this vectorization on.

Differential Revision: https://reviews.llvm.org/D31910

Reviewers: mssimpso
llvm-svn: 299985
2017-04-11 21:02:00 +00:00
Daniel Berlin 554dcd8c89 MemorySSA: Move to Analysis, from Transforms/Utils. It's used as
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.

llvm-svn: 299980
2017-04-11 20:06:36 +00:00
Andrea Di Biagio 8e26936bfd [AddDiscriminators] Assign discriminators to MemIntrinsic calls.
Before this patch, pass AddDiscriminators always avoided to assign
discriminators to intrinsic calls. This was done mainly for two reasons:
 1) We wanted to minimize the number of based discriminators used.
 2) We wanted to avoid non-deterministic discriminator assignment for
    different debug levels.

Unfortunately, that approach was problematic for MemIntrinsic calls.
MemIntrinsic calls can be split by SROA into loads and stores, and each new
load/store instruction would obtain the debug location from the original
intrinsic call.
If we don't assign a discriminator to MemIntrinsic calls, then we cannot
correctly set the discriminator for the newly created loads and stores.
This may have a negative impact on the basic block weight computation
performed by the SampleLoader.

This patch fixes the issue by letting MemIntrinsic calls have a discriminator.

Differential Revision: https://reviews.llvm.org/D31900

llvm-svn: 299972
2017-04-11 19:07:30 +00:00
Craig Topper 957a94cc03 Fix spelling compliment->complement. Mostly refering to 2s complement. NFC
llvm-svn: 299970
2017-04-11 18:47:58 +00:00
Craig Topper 271b2245f4 [InstCombine] Use ConstantExpr::getBinOpIdentity to implement getIdentityValue.
This removes a TODO in getIdentityValue and may allow some transforms to occur earlier. But I was unable to find any transforms we didn't already handle.

llvm-svn: 299966
2017-04-11 17:42:40 +00:00
Sanjay Patel 28611acef9 revert r299851 - [InstCombine] fix matching of or-of-icmps constants (PR32524)
This is a candidate culprit for multiple bot fails, so reverting pending investigation.

llvm-svn: 299955
2017-04-11 15:57:32 +00:00
Serge Guelton 59a2d7b909 Module::getOrInsertFunction is using C-style vararg instead of variadic templates.
From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments.
The variadic template is an obvious solution to both issues.

Differential Revision: https://reviews.llvm.org/D31070

llvm-svn: 299949
2017-04-11 15:01:18 +00:00
Geoff Berry 9d597adde4 [GVNHoist] Re-enable GVNHoist by default
Turn GVNHoist back on by default now that PR32153 has been fixed.

llvm-svn: 299944
2017-04-11 14:36:30 +00:00
Keno Fischer 30779772cf [StripDeadDebug/DIFinder] Track inlined SPs
Summary:
In rL299692 I improved strip-dead-debug-info's ability to drop CUs that are not
referenced from the current module. However, in doing so I neglected to realize
that some SPs could be referenced entirely from inlined functions. It appears
I was not the only one to make this mistake, because DebugInfoFinder, doesn't
find those SPs either. Fix this in DebugInfoFinder and then use that to make
sure not to drop those CUs in strip-dead-debug-info.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31904

llvm-svn: 299936
2017-04-11 13:32:11 +00:00
Diana Picus b050c7fbe0 Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299925 because it broke the buildbots. See e.g.
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/6008

llvm-svn: 299928
2017-04-11 10:07:12 +00:00
Serge Guelton 5fd75fb72e Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.

From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.

llvm-svn: 299925
2017-04-11 08:36:52 +00:00
Sylvestre Ledru 06faa9bf32 Simplify the code and remove dead code
Summary: Fix coverity cid 1374240

Reviewers: dberlin

Reviewed By: dberlin

Differential Revision: https://reviews.llvm.org/D31928

llvm-svn: 299924
2017-04-11 08:21:27 +00:00
Craig Topper 8c75adf95b [InstCombine] Refinement of r299915. Only consider a ConstantVector for Neg if all the elements are Undef or ConstantInt.
llvm-svn: 299917
2017-04-11 06:32:48 +00:00
Craig Topper 18f9e424e7 [InstCombine] Support weird size element types in dyn_castNegVal.
llvm-svn: 299915
2017-04-11 05:42:47 +00:00
Hal Finkel b63ed91549 [LICM] Hoist fp division from the loops and replace by a reciprocal
When allowed, we can hoist a division out of a loop in favor of a
multiplication by the reciprocal. Fixes PR32157.

Patch by vit9696!

Differential Revision: https://reviews.llvm.org/D30819

llvm-svn: 299911
2017-04-11 02:22:54 +00:00
Daniel Berlin bf80cfe6b6 Revert "NewGVN: Don't propagate over phi backedges where undef causes us to have >1 value."
It's not ready yet this was an accidental commit :(

This reverts r299903

llvm-svn: 299904
2017-04-11 00:07:26 +00:00
Daniel Berlin 3938111fe7 NewGVN: Don't propagate over phi backedges where undef causes us to have >1 value.
Fixes PR 32607.

llvm-svn: 299903
2017-04-11 00:02:38 +00:00
Reid Kleckner eb9dd5b87f Reland "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies"
This re-lands r299875.

I introduced a bug in Clang code responsible for replacing K&R, no
prototype declarations with a real function definition with a prototype.
The bug was here:

       // Collect any return attributes from the call.
  -    if (oldAttrs.hasAttributes(llvm::AttributeList::ReturnIndex))
  -      newAttrs.push_back(llvm::AttributeList::get(newFn->getContext(),
  -                                                  oldAttrs.getRetAttributes()));
  +    newAttrs.push_back(oldAttrs.getRetAttributes());

Previously getRetAttributes() carried AttributeList::ReturnIndex in its
AttributeList. Now that we return the AttributeSetNode* directly, it no
longer carries that index, and we call this overload with a single node:
  AttributeList::get(LLVMContext&, ArrayRef<AttributeSetNode*>)

That aborted with an assertion on x86_32 targets. I added an explicit
triple to the test and added CHECKs to help find issues like this in the
future sooner.

llvm-svn: 299899
2017-04-10 23:31:05 +00:00
Davide Italiano f58a30236b [NewGVN] Surround with parens to clarify allegedly ambiguous precedence.
This Placates GCC7 with -Werror. Also, clang-format the assertions
while I'm here.

llvm-svn: 299895
2017-04-10 23:08:35 +00:00
Davide Italiano fa6a0a819d [MemorySSA] We don't need to compute dominator levels anymore.
Differential Revision:  https://reviews.llvm.org/D31818

llvm-svn: 299893
2017-04-10 22:44:46 +00:00
Matt Arsenault 3c1fc768ed Allow DataLayout to specify addrspace for allocas.
LLVM makes several assumptions about address space 0. However,
alloca is presently constrained to always return this address space.
There's no real way to avoid using alloca, so without this
there is no way to opt out of these assumptions.

The problematic assumptions include:
- That the pointer size used for the stack is the same size as
  the code size pointer, which is also the maximum sized pointer.

- That 0 is an invalid, non-dereferencable pointer value.

These are problems for AMDGPU because alloca is used to
implement the private address space, which uses a 32-bit
index as the pointer value. Other pointers are 64-bit
and behave more like LLVM's notion of generic address
space. By changing the address space used for allocas,
we can change our generic pointer type to be LLVM's generic
pointer type which does have similar properties.

llvm-svn: 299888
2017-04-10 22:27:50 +00:00
Dehao Chen d4a3397861 Emit less compiler optimization remarks in samplepgo to reduce a call to findCalleeFunctionSamples which is going to be refactored.
Summary: Now the SamplePGO support is more stable, we do not need so many verbose optimization remarks emitted.

Reviewers: dnovillo, davidxl

Reviewed By: davidxl

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D31826

llvm-svn: 299883
2017-04-10 20:49:16 +00:00
Geoff Berry 635e505675 [GVNHoist] Call isGuaranteedToTransferExecutionToSuccessor on each instruction
w.r.t. https://bugs.llvm.org/show_bug.cgi?id=32153
The consensus seems to be isGuaranteedToTransferExecutionToSuccessor should be called for each function.

Patch by Aditya Kumar

Differential Revision: https://reviews.llvm.org/D31035

llvm-svn: 299882
2017-04-10 20:45:17 +00:00
Evgeniy Stepanov ed7fce7c84 Revert "[asan] Put ctor/dtor in comdat."
This reverts commit r299696, which is causing mysterious test failures.

llvm-svn: 299880
2017-04-10 20:36:36 +00:00
Evgeniy Stepanov ba7c2e9661 Revert "[asan] Fix dead stripping of globals on Linux."
This reverts commit r299697, which caused a big increase in object file size.

llvm-svn: 299879
2017-04-10 20:36:30 +00:00
Reid Kleckner 211b1f324f Revert "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies"
This reverts r299875. A Linux bot came back with a test failure:
http://bb.pgr.jp/builders/test-clang-i686-linux-RA/builds/741/steps/test_clang/logs/Clang%20%3A%3A%20CodeGen__2006-05-19-SingleEltReturn.c

llvm-svn: 299878
2017-04-10 20:34:19 +00:00
Reid Kleckner 324c99dee5 [IR] Make AttributeSetNode public, avoid temporary AttributeList copies
Summary:
AttributeList::get(Fn|Ret|Param)Attributes no longer creates a temporary
AttributeList just to hide the AttributeSetNode type.

I've also added a factory method to create AttributeLists from a
parallel array of AttributeSetNodes. I think this simplifies
construction of AttributeLists when rewriting function prototypes.
Previously we would test if a particular index had attributes, and
conditionally add a temporary attribute list to a vector. Now the
attribute set vector is parallel to the argument vector already that
these passes already construct.

My long term vision is to wrap AttributeSetNode* inside an AttributeSet
type that holds the enum attributes, but that will come in a follow up
change.

I haven't done any performance measurements for this change because
profiling hasn't shown that any of the affected code is hot.

Reviewers: pete, chandlerc, sanjoy, hfinkel

Reviewed By: pete

Subscribers: jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D31198

llvm-svn: 299875
2017-04-10 20:18:10 +00:00
Sanjay Patel e4159d2238 [InstCombine] improve variable names; NFCI
llvm-svn: 299871
2017-04-10 19:38:36 +00:00
Matt Arsenault daa08875b3 [MemCpyOpt] Only replace memcpy with bitcast if address spaces match
Patch by James Price

llvm-svn: 299866
2017-04-10 19:00:25 +00:00
Daniel Berlin 74603a68ef MemorySSA: Make lifetime starts defs for mustaliased pointers
Summary:
While we don't want them aliasing with other pointers, there seems to
be no point in not having them clobber must-aliased'd pointers.

If some day, we split the aliasing and ordering chains, we'd make this
not aliasing but an ordering barrier (IE it doesn't affect it's
memory, but we can't hoist it above it).

Reviewers: hfinkel, george.burgess.iv

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31865

llvm-svn: 299865
2017-04-10 18:46:00 +00:00
Craig Topper 0d830ff7bf [InstCombine] Use commutable matchers and m_OneUse in visitSub to shorten code. Add missing test cases.
In one case I removed commute handling for a multiply with a constant since we'll eventually get the constant on the right hand side.

llvm-svn: 299863
2017-04-10 18:09:25 +00:00
Craig Topper 98851adc2a [InstCombine] Use m_c_Add to shorten some code. Add testcases for this fold since they were missing. NFC
llvm-svn: 299853
2017-04-10 16:59:40 +00:00
Sanjay Patel 570e35c157 [InstCombine] fix matching of or-of-icmps constants (PR32524)
Also, make the same change in and-of-icmps and remove a hack for detecting that case.

Finally, add some FIXME comments because the code duplication here is awful.

This should fix the remaining IR problem noted in:
https://bugs.llvm.org/show_bug.cgi?id=32524

llvm-svn: 299851
2017-04-10 16:55:57 +00:00
Craig Topper 3eec73e20b [InstCombine] Support folding of add instructions with vector constants into select operations
We currently only fold scalar add of constants into selects. This improves this to support vectors too.

Differential Revision: https://reviews.llvm.org/D31683

llvm-svn: 299847
2017-04-10 16:40:00 +00:00
Craig Topper 31cc143b51 [InstCombine] Use commutable and/or/xor matchers to simplify some code
Summary:
This is my first time using the commutable matchers so wanted to make sure I was doing it right.

Are there any other matcher tricks to further shrink this? Can we commute the whole match so we don't have to LHS and RHS separately?

Reviewers: davide, spatel

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31680

llvm-svn: 299840
2017-04-10 07:13:40 +00:00
Craig Topper 838d13e7ee [InstCombine] Make sure we preserve fast math flags when folding fp instructions into phi nodes
Summary: I noticed in the select folding code that we copied fast math flags, but did not do the same for the similar handling in phi nodes. This patch fixes that to do the same thing as select

Reviewers: spatel, davide, majnemer, hfinkel

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31690

llvm-svn: 299838
2017-04-10 07:00:10 +00:00
Craig Topper d8840d7b10 [InstCombine] use m_c_And and m_c_Xor to handle commuted versions of a transform.
llvm-svn: 299837
2017-04-10 06:53:28 +00:00
Craig Topper 7639460367 [InstCombine] Remove unnecessary dyn_cast to BinaryOperator around some matcher checks in visitXor.
The matchers themselves should be enough.

llvm-svn: 299835
2017-04-10 06:53:23 +00:00
Craig Topper 4738321f0c [InstCombine] Make the (A|B)^B -> A & ~B transform code consistent with the very similar (A&B)^B -> ~A & B code. This should be NFC except for the addition of hasOneUse check.
I think this code is still overly complicated and should use matchers, but first I wanted to make it consistent.

llvm-svn: 299834
2017-04-10 06:53:21 +00:00
Craig Topper 4f16d82d6b [InstCombine] Use m_OneUse to shorten some code. NFC
llvm-svn: 299833
2017-04-10 06:53:19 +00:00
Xin Tong 34888c08bc [SCCP] Resolve indirect branch target when possible.
Summary:
Resolve indirect branch target when possible.
This potentially eliminates more basicblocks and result in better evaluation for phi and other things.

Reviewers: davide, efriedma, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30322

llvm-svn: 299830
2017-04-10 00:33:25 +00:00
Sanjay Patel 16a054d5c7 [InstCombine] remove dead cases from icmp pair switches; NFCI
"PredicatesFoldable" returns false for signed/unsigned mismatched pairs,
so these cases should never exist. We'll default to 'unreachable' on those 
predicate combos instead.

Most of what's left in these switches belongs in InstSimplify (and may 
already be there), so there's probably more that can be done to reduce
this code.

llvm-svn: 299829
2017-04-09 21:51:34 +00:00
Davide Italiano 612d5a9c5c [Mem2Reg] Remove AliasSetTracker updating logic from the pass.
No caller has been passing it for a long time.

llvm-svn: 299827
2017-04-09 20:47:14 +00:00
Hal Finkel a9d67cf601 [MemorySSA] Fix use of pointsToConstantMemory in isUseTriviallyOptimizableToLiveOnEntry
In isUseTriviallyOptimizableToLiveOnEntry, pointsToConstantMemory needs to be
called on the load's pointer operand, not on the result of the load (which
might not even be a pointer).

llvm-svn: 299823
2017-04-09 12:57:50 +00:00
Craig Topper afa07c5ef6 [InstCombine] Extend some OR combines to support vectors.
This adds support for these combines for vectors
(X^C)|Y -> (X|Y)^C iff Y&C == 0
Y|(X^C) -> (X|Y)^C iff Y&C == 0

llvm-svn: 299822
2017-04-09 06:12:41 +00:00
Craig Topper e63c21b1ba [InstCombine] Extend a canonicalization check to apply to vector constants too.
llvm-svn: 299821
2017-04-09 06:12:39 +00:00
Craig Topper 437c97622b [InstCombine] Use the SubOne helper function to shorten some code. NFC
llvm-svn: 299819
2017-04-09 06:12:34 +00:00
Craig Topper 9d1821b262 [InstCombine] rename variable for easier reading; NFC
We usually give constants a 'C' somewhere in the name...

llvm-svn: 299818
2017-04-09 06:12:31 +00:00
Gor Nishanov bfb2a9db31 [coroutines] Make CoroSplit pass deterministic
coro-split-after-phi.ll test was flaky due to non-determinism in
the coroutine frame construction that was sorting the spill
vector using a pointer to a def as a part of the key.

The sorting was intended to make sure that spills for the same def
are kept together, however, we populate the vector by processing
defs in order, so the spill entires will end up together anyways.

This change removes spill sorting and restores the determinism
in the test.

llvm-svn: 299809
2017-04-08 00:49:46 +00:00
Evgeniy Stepanov 349adbacca [cfi] Take over existing __cfi_check in CrossDSOCFI.
https://reviews.llvm.org/D31796 will emit a dummy __cfi_check in the
frontend.

llvm-svn: 299805
2017-04-07 23:00:20 +00:00
Daniel Berlin a823656ce7 NewGVN: Make CongruenceClass a real class in preparation for splitting
NewGVN into analysis and eliminator.

llvm-svn: 299792
2017-04-07 18:38:09 +00:00
Gor Nishanov 138ad6c9c0 [coroutines] Insert spills of PHI instructions correctly
Summary:
Fix a bug where we were inserting a spill in between the PHIs in the beginning of the block.
Consider this fragment:

```
begin:
  %phi1 = phi i32 [ 0, %entry ], [ 2, %alt ]
  %phi2 = phi i32 [ 1, %entry ], [ 3, %alt ]
  %sp1 = call i8 @llvm.coro.suspend(token none, i1 false)
  switch i8 %sp1, label %suspend [i8 0, label %resume
                                  i8 1, label %cleanup]
resume:
  call i32 @print(i32 %phi1)
```
Unless we are spilling the argument or result of the invoke, we were always inserting the spill immediately following the instruction.
The fix adds a check that if the spilled instruction is a PHI Node, select an appropriate insert point with `getFirstInsertionPt()` that
skips all the PHI Nodes and EH pads.

Reviewers: majnemer, rnk

Reviewed By: rnk

Subscribers: qcolombet, EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D31799

llvm-svn: 299771
2017-04-07 14:16:49 +00:00
Matthew Simpson 11fe2e9f2b Reapply r298620: [LV] Vectorize GEPs
This patch reapplies r298620. The original patch was reverted because of two
issues. First, the patch exposed a bug in InstCombine that caused the Chromium
builds to fail (PR32414). This issue was fixed in r299017. Second, the patch
introduced a bug in the vectorizer's scalars analysis that caused test suite
builds to fail on SystemZ. The scalars analysis was too aggressive and marked a
memory instruction scalar, even though it was going to be vectorized. This
issue has been fixed in the current patch and several new test cases for the
scalars analysis have been added.

llvm-svn: 299770
2017-04-07 14:15:34 +00:00
Craig Topper 33e0dbcc58 [InstCombine] Handle more commuted cases of ((A & B) | ~A) -> (~A | B)
llvm-svn: 299747
2017-04-07 07:32:00 +00:00
Daniel Berlin d952ceae2f AliasAnalysis: Be less conservative about volatile than atomic.
Summary:
getModRefInfo is meant to answer the question "what impact does this
instruction have on a given memory location" (not even another
instruction).

Long debate on this on IRC comes to the conclusion the answer should be "nothing special".

That is, a noalias volatile store does not affect a memory location
just by being volatile.  Note: DSE and GVN and memdep currently
believe this, because memdep just goes behind AA's back after it says
"modref" right now.

see line 635 of memdep. Prior to this patch we would get modref there, then check aliasing,
and if it said noalias, we would continue.

getModRefInfo *already* has this same AA check, it just wasn't being used because volatile was
lumped in with ordering.

(I am separately testing whether this code in memdep is now dead except for the invariant load case)

Reviewers: jyknight, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31726

llvm-svn: 299741
2017-04-07 01:28:36 +00:00
Craig Topper 72a622cac7 [InstCombine] Add more commuted patterns to support folding ((~A & B) | A) -> (A | B).
llvm-svn: 299737
2017-04-07 00:29:47 +00:00
Craig Topper a521c30dc6 [InstCombine] Remove testing assert I accidentally left in r299710.
llvm-svn: 299715
2017-04-06 21:29:43 +00:00
Craig Topper b4da6840d8 [InstCombine] When checking to see if we can turn subtracts of 2^n - 1 into xor, we only need to call computeKnownBits on the RHS not the whole subtract. While there use isMask instead of isPowerOf2(C+1)
Calling computeKnownBits on the RHS should allows us to recurse one step further. isMask is equivalent to the isPowerOf2(C+1) except in the case where C is all ones. But that was already handled earlier by creating a not which is an Xor with all ones. So this should be fine.

llvm-svn: 299710
2017-04-06 21:06:03 +00:00
Rong Xu 2bf4c59025 [PGO] Preserve GlobalsAA in pgo-memop-opt pass.
Preserve GlobalsAA analysis in memory intrinsic calls optimization based on
profiled size.

llvm-svn: 299707
2017-04-06 20:56:00 +00:00
Craig Topper 7226d796aa [InstCombine] Remove redundant combine from visitAnd
This combine is fully handled by SimplifyDemandedInstructionBits as of r299658 where I fixed this code to ensure the Add/Sub had only a single user. Otherwise it would fire and create additional instructions. That fix resulted in an improvement to code generated for tsan which is why I committed it before deleting.

Differential Revision: https://reviews.llvm.org/D31543

llvm-svn: 299704
2017-04-06 20:41:48 +00:00
Mehdi Amini db11fdfda5 Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299699, the examples needs to be updated.

llvm-svn: 299702
2017-04-06 20:23:57 +00:00
Mehdi Amini 579540a8f7 Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.

From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.

Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>

Differential Revision: https://reviews.llvm.org/D31070

llvm-svn: 299699
2017-04-06 20:09:31 +00:00
Evgeniy Stepanov 6c3a8cbc4d [asan] Fix dead stripping of globals on Linux.
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.

Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.

This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.

At this moment it does not work on Gold (as in the globals are never
stripped).

This is a re-land of r298158 rebased on D31358. This time,
asan.module_ctor is put in a comdat as well to avoid quadratic
behavior in Gold.

llvm-svn: 299697
2017-04-06 19:55:17 +00:00
Evgeniy Stepanov 5dfe420d10 [asan] Put ctor/dtor in comdat.
When possible, put ASan ctor/dtor in comdat.

The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.

The benefit of this is a lot less __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in gold linker.

This is a rebase of r298756.

llvm-svn: 299696
2017-04-06 19:55:13 +00:00
Evgeniy Stepanov 039af609f1 [asan] Delay creation of asan ctor.
Create the constructor in the module pass.
This in needed for the GC-friendly globals change, where the constructor can be
put in a comdat  in some cases, but we don't know about that in the function
pass.

This is a rebase of r298731 which was reverted due to a false alarm.

llvm-svn: 299695
2017-04-06 19:55:09 +00:00
Keno Fischer bacc64b5fa [StripDeadDebugInfo] Drop dead CUs entirely
Summary:
Prior to this while it would delete the dead DIGlobalVariables, it would
leave dead DICompileUnits and everything referenced therefrom. For a bit
bitcode file with thousands of compile units those dead nodes easily
outnumbered the real ones. Clean that up.

Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D31720

llvm-svn: 299692
2017-04-06 19:26:22 +00:00
Daniel Berlin 21279bd37a NewGVN: Rename some functions for consistency
llvm-svn: 299685
2017-04-06 18:52:58 +00:00
Daniel Berlin 08fe6e0f74 NewGVN: Fixup some small issues
llvm-svn: 299684
2017-04-06 18:52:55 +00:00
Daniel Berlin 5845e0549e NewGVN: Fix a small formatting issue in performSymbolicLoadEvaluation.
llvm-svn: 299683
2017-04-06 18:52:53 +00:00
Daniel Berlin 1316a94ebc NewGVN: This patch makes memory congruence work for all types of
memorydefs, not just stores.  Along the way, we audit and fixup issues
about how we were tracking memory leaders, and improve the verifier
to notice more memory congruency issues.

llvm-svn: 299682
2017-04-06 18:52:50 +00:00
Craig Topper 3fc1225c18 [InstCombine] Fix a case where we weren't checking that an instruction had a single use resulting in extra instructions being created.
llvm-svn: 299658
2017-04-06 16:42:46 +00:00
Daniel Berlin d7a7ae061f MemorySSA: Remove MemorySSA walker caching.
Summary:
Remove all the caching the clobber walker does, and that the
caching walker does.  With the patch to enable storing clobbering
access results for stores, i can find no improvement with the cache
turned on (and a number of degradations, both time and memory, from
the cost of caching.  For a large program i have, we do millions of
lookups and inserts with zero hits).

I haven't tried to rename or simplify the walker otherwise yet.

(Appreciate some perf testing on this past my own testing)

Reviewers: george.burgess.iv, davide

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31576

llvm-svn: 299578
2017-04-05 19:01:58 +00:00
Sanjay Patel 50c82c4395 [InstCombine] add fold for icmp with or mask of low bits (PR32542)
We already have these 'and' folds:

// X & -C == -C -> X >  u ~C
// X & -C != -C -> X <= u ~C
//   iff C is a power of 2

...but we were missing the 'or' siblings.

http://rise4fun.com/Alive/n6

This should improve:
https://bugs.llvm.org/show_bug.cgi?id=32524
...but there are 2 or more other pieces to fix still.

Differential Revision: https://reviews.llvm.org/D31712

llvm-svn: 299570
2017-04-05 17:57:05 +00:00
Sanjay Patel 519a87a468 [InstCombine] fix formatting and variable names; NFCI
There must be some opportunity to refactor big chunks of nearly duplicated code in FoldOrOfICmps / FoldAndOfICmps.
Also, none of this works with vectors, but it should.

llvm-svn: 299568
2017-04-05 17:38:34 +00:00
Daniel Berlin 3082b8e062 MemorySSA: Fix and use optimized_def_chain
llvm-svn: 299566
2017-04-05 17:26:25 +00:00
Akira Hatanaka 75be84f3c2 [ObjCArc] Do not dereference an invalidated iterator.
Fix a bug in ARC contract pass where an iterator that pointed to a
deleted instruction was dereferenced.

It appears that tryToContractReleaseIntoStoreStrong was incorrectly
assuming that a call to objc_retain would not immediately follow a call
to objc_release.

rdar://problem/25276306

llvm-svn: 299507
2017-04-05 03:44:09 +00:00
Bob Haarman 6de8134784 ThinLTOBitcodeWriter: handle aliases first in filterModule
Summary: This change fixes a "local linkage requires default visibility" assert when attempting to build LLVM with ThinLTO on Windows.

Reviewers: pcc, tejohnson, mehdi_amini

Reviewed By: pcc

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D31632

llvm-svn: 299491
2017-04-05 00:42:07 +00:00
Daniel Berlin e33bc31df4 Re-apply MemorySSA: Add support for caching clobbering access in
stores with some fixes.

Summary:
This enables us to cache the clobbering access for stores, despite the
fact that we can't rewrite the use-def chains themselves.

Early testing shows that, after this change, for larger testcases, it
will be a significant net positive (memory and time) to remove the
walker caching.

Reviewers: george.burgess.iv, davide

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31567

llvm-svn: 299486
2017-04-04 23:43:10 +00:00
Daniel Berlin f49d4c45a1 Revert "MemorySSA: Add support for caching clobbering access in stores"
This reverts revision r299322.

llvm-svn: 299485
2017-04-04 23:43:04 +00:00
Sanjay Patel 0bf0abedf6 [InstCombine] rename variable for easier reading; NFC
We usually give constants a 'C' somewhere in the name...

llvm-svn: 299474
2017-04-04 22:06:03 +00:00
Craig Topper c745b6a1f6 [InstCombine] Turn subtract of vectors of i1 into xor like we do for scalar i1. Matches what we already do for add.
llvm-svn: 299472
2017-04-04 21:44:56 +00:00
Craig Topper 86173600ec [InstCombine] Support folding and/or/xor with a constant vector RHS into selects and phis
Currently we only fold with ConstantInt RHS. This generalizes to any Constant RHS.

Differential Revision: https://reviews.llvm.org/D31610

llvm-svn: 299466
2017-04-04 20:26:25 +00:00
Rong Xu 48596b6f7a [PGO] Memory intrinsic calls optimization based on profiled size
This patch optimizes two memory intrinsic operations: memset and memcpy based
on the profiled size of the operation. The high level transformation is like:
  mem_op(..., size)
  ==>
  switch (size) {
    case s1:
       mem_op(..., s1);
       goto merge_bb;
    case s2:
       mem_op(..., s2);
       goto merge_bb;
    ...
    default:
       mem_op(..., size);
       goto merge_bb;
    }
  merge_bb:

Differential Revision: http://reviews.llvm.org/D28966

llvm-svn: 299446
2017-04-04 16:42:20 +00:00
Craig Topper e06b6bcfa1 [InstCombine] Use setAllBits in place of getAllOnesValue since we know the bitwidths are the same. NFCI
llvm-svn: 299413
2017-04-04 05:03:02 +00:00
Zvi Rackover 82bf48d8b9 InstCombine: Use the InstSimplify hook for shufflevector
Summary: Start using the recently added InstSimplify hook for shuffles in the respective InstCombine visitor.

Reviewers: spatel, RKSimon, craig.topper, majnemer

Reviewed By: majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D31526

llvm-svn: 299412
2017-04-04 04:47:57 +00:00
Craig Topper 1604f0773b [InstCombine] Remove canonicalization for (X & C1) | C2 --> (X | C2) & (C1|C2) when C1 & C2 have common bits.
It turns out that SimplifyDemandedInstructionBits will get called earlier and remove bits from C1 first. Effectively doing (X & (C1&C2)) | C2. So by the time it got to this check there could be no common bits.

I think the DAGCombiner has the same check but its check can be executed because it handles demanded bits later. I'll look at it next.

llvm-svn: 299384
2017-04-03 20:41:47 +00:00
Craig Topper 3882613956 [DAGCombine][InstCombine] Fix inverted if condition in equivalent comments in DAGCombine and InstCombine. NFC
llvm-svn: 299378
2017-04-03 19:18:48 +00:00
Craig Topper 79120e80b8 Revert r299337 "[InstCombine] Remove redundant combine from visitAnd"
One of the tsan bots started failing at this commit. I don't see anything obviously wrong with the commit so trying this to see if it recovers.

Failing log: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/6792

llvm-svn: 299366
2017-04-03 17:22:23 +00:00
Sanjay Patel 77bf622db6 [InstCombine] fix formatting for foldLogOpOfMaskedICmps and related bits; NFCI
1. Improve enum, function, and variable names.
2. Improve comments.
3. Fix variable capitalization.
4. Run clang-format.

As an existing code comment suggests, this should work with vector types / splat constants too,
so making this look right first will reduce the diffs needed for that change.

llvm-svn: 299365
2017-04-03 16:53:12 +00:00
Craig Topper d33ee1b960 [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.

Differential Revision: https://reviews.llvm.org/D31565

llvm-svn: 299362
2017-04-03 16:34:59 +00:00
Craig Topper d0b053d229 [InstCombine] Make foldOpWithConstantIntoOperand take a BinaryOperator instead of a generic Instruction.
It blindly assumes there are two operands so make it explicit.

llvm-svn: 299351
2017-04-03 07:08:08 +00:00
Craig Topper 07944f891c [InstCombine] Remove a And transform that should be handled by SimplifyDemandedInstructionBits. NFCI
llvm-svn: 299349
2017-04-03 06:02:09 +00:00
Craig Topper 70e4f434ae [InstCombine] Make InstCombiner::OptAndOp take a BinaryOperator instead of an Instruction.
The callers have already performed the necessary cast before calling. This allows us to remove a comment that says the instruction must be a BinaryOperator and make it explicit in the argument type.

Had to add a default case to the switch because BinaryOperator::getOpcode() returns a BinaryOps enum.

llvm-svn: 299339
2017-04-02 17:57:30 +00:00
Craig Topper d133591a7e [InstCombine] Remove redundant combine from visitAnd
As far as I can tell this combine is fully handled by SimplifyDemandedInstructionBits.

I was only looking at this because it is the only user of APIntOps::isShiftedMask which is itself broken. As demonstrated by r299187. I was going to fix isShiftedMask and needed to make sure we had coverage for the new cases it would expose to this combine. But looks like we can nuke it instead.

Differential Revision: https://reviews.llvm.org/D31543

llvm-svn: 299337
2017-04-02 17:34:30 +00:00
Daniel Berlin 07daac8a36 NewGVN: Handle coercion of constant stores, loads, memory insts.
Summary:
Depends on D30928.

This adds support for coercion of stores and memory instructions that do not require insertion to process.
Another few tests down.
I added the relevant tests from rle.ll

Reviewers: davide

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D30929

llvm-svn: 299330
2017-04-02 13:23:44 +00:00
Nikolai Bozhenov fca527af5c [BypassSlowDivision] Do not bypass division of hash-like values
Disable bypassing if one of the operands looks like a hash value. Slow
division often occurs in hashtable implementations and fast division is
never taken there because a hash value is extremely unlikely to have
enough upper bits set to zero.

A value is considered to be hash-like if it is produced by

1) XOR operation
2) Multiplication by a constant wider than the shorter type
3) PHI node with all incoming values being hash-like

Differential Revision: https://reviews.llvm.org/D28200

llvm-svn: 299329
2017-04-02 13:14:30 +00:00
Daniel Berlin 8a00270838 MemorySSA: Add support for caching clobbering access in stores
Summary:
This enables us to cache the clobbering access for stores, despite the
fact that we can't rewrite the use-def chains themselves.

Early testing shows that, after this change, for larger testcases, it will be a significant net positive (memory and time) to remove the walker caching.

Reviewers: george.burgess.iv, davide

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31567

llvm-svn: 299322
2017-04-02 05:09:15 +00:00
Daniel Berlin 9a9c9ff260 NewGVN: Don't try to kill off the stored value of stores when
processing the congruence class of the store.
Because we use the stored value of a store as the def, it isn't dead
just because it appears as a def when it comes from a store.

Note: I have not hit any cases with the memory code as it is where
this breaks anything, just because of what memory congruences we
actually allow.  In a followup that improves memory congruence,
this bug actually breaks real stuff (but the verifier catches it).

llvm-svn: 299300
2017-04-01 09:44:33 +00:00
Daniel Berlin 9b4984926c NewGVN: Clean up GVNExpression memory hierarchy, restructure hash computation a bit so we don't have to redefine it for loads, stores, and calls
llvm-svn: 299299
2017-04-01 09:44:29 +00:00
Daniel Berlin 871ecd90ca NewGVN: Use def_chain iterator in singleReachablePhiPath instead of recursion
llvm-svn: 299298
2017-04-01 09:44:24 +00:00
Daniel Berlin 07275c3065 Move def_chain iterator to MemorySSA.h so it can be reused
llvm-svn: 299297
2017-04-01 09:44:19 +00:00
Daniel Berlin d042031f0f MemorySSA: Push const correctness further.
llvm-svn: 299295
2017-04-01 09:01:12 +00:00
Daniel Berlin 7500c5641e MemorySSA: Kill the WalkTargetCache now that we have getBlockDefs.
llvm-svn: 299294
2017-04-01 08:59:45 +00:00
Craig Topper 47fd2de304 [APInt] Fix bugs in isShiftedMask to match behavior of the similar function in MathExtras.h
This removes a parameter from the routine that was responsible for a lot of the issue. It was a bit count that had to be set to the BitWidth of the APInt and would get passed to getLowBitsSet. This guaranteed the call to getLowBitsSet would create an all ones value. This was then compared to (V | (V-1)). So the only shifted masks we detected had to have the MSB set.

The one in tree user is a transform in InstCombine that never fires due to earlier transforms covering the case better. I've submitted a patch to remove it completely, but for now I've just adapted it to the new interface for isShiftedMask.

llvm-svn: 299273
2017-03-31 22:23:42 +00:00
Craig Topper e625d74271 [InstCombine] When adding an Instruction and its Users to the worklist at the same time, make sure we put the Users in first. Then put in the instruction.
This way we ensure we immediately revisit the instruction and do any additional optimizations before visiting the users. Otherwise we might visit the users, then the instruction, then users again, then instruction again.

llvm-svn: 299267
2017-03-31 21:35:30 +00:00
Craig Topper 885fa12e8a [APInt] Remove shift functions from APIntOps namespace. Replace the few users with the APInt class methods. NFCI
llvm-svn: 299248
2017-03-31 20:01:16 +00:00
Joerg Sonnenberger 28bed106e0 Do not translate rint into nearbyint, but truncate it like nearbyint.
A common way to implement nearbyint is by fiddling with the floating
point environment and calling rint. This is used at least by the BSD
libm and musl. As such, canonicalizing the latter to the former will
create infinite loops for libm and generally pessimize performance, at
least when the generic C versions are used.

This change preserves the rint in the libcall translation and also
handles the domain truncation logic, so that rint with float argument
will be reduced to rintf etc.

llvm-svn: 299247
2017-03-31 19:58:07 +00:00
Dehao Chen fed890ea3a Fix the InstCombine to reserve the VP metadata and sets correct call count.
Summary: Currently the VP metadata was dropped when InstCombine converts a call to direct call. This patch converts the VP metadata to branch_weights so that its hotness is recorded.

Reviewers: eraman, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31344

llvm-svn: 299228
2017-03-31 15:59:52 +00:00
Mikael Holmen 79235bd4d8 [Scalarizer] Handle scalar arguments in vector GEP
Summary:
Triggered by commit r298620: "[LV] Vectorize GEPs".

If we encounter a vector GEP with scalar arguments, we splat the scalar
into a vector of appropriate size before we scatter the argument.

Reviewers: arsenm, mehdi_amini, bkramer

Reviewed By: arsenm

Subscribers: bjope, mssimpso, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D31416

llvm-svn: 299186
2017-03-31 06:29:49 +00:00
Peter Collingbourne 6b193966ac ThinLTOBitcodeWriter: Use Module::global_values(). NFCI.
llvm-svn: 299132
2017-03-30 23:43:08 +00:00
Craig Topper 79e5bc528d [InstCombine] Fix typo last->least. NFC
llvm-svn: 299123
2017-03-30 22:28:55 +00:00
Matt Arsenault 79f837c254 AMDGPU: Add all atomicrmw fields to atomic.inc/dec
Add scope, order, isVolatile

llvm-svn: 299122
2017-03-30 22:21:40 +00:00
Hongbin Zheng bfd7c38de7 [SimplifyIndvar] Replace the sdiv used by IV if we can prove both of its operands are non-negative
Since there is no sdiv in SCEV, an 'udiv' is a better canonical form than an 'sdiv' as the user of induction variable

Differential Revision: https://reviews.llvm.org/D31488

llvm-svn: 299118
2017-03-30 21:56:56 +00:00
Simon Pilgrim 68168d17b9 Spelling mistakes in comments. NFCI.
Based on corrections mentioned in patch for clang for PR27635

llvm-svn: 299072
2017-03-30 12:59:53 +00:00
Matthew Simpson c8f0aeccda [InstCombine] Correct the check for vector GEPs
Some of the GEP combines (e.g., descaling) can't handle vector GEPs. We have an
existing check that attempts to bail out if given a vector GEP. However, the
check only tests the GEP's pointer operand. A GEP results in a vector of
pointers if at least one of its operands is vector-typed (e.g., its pointer
operand could be a scalar, but its index could be a vector). We should just
check the type of the GEP itself. This should fix PR32414.

Reference: https://bugs.llvm.org/show_bug.cgi?id=32414
Differential Revision: https://reviews.llvm.org/D31470

llvm-svn: 299017
2017-03-29 18:23:08 +00:00
Filipe Cabecinhas 8b94273fe6 Cleanup in preparation for D30703. NFCI
Make the enumerators follow the coding convention and start with OW_...

llvm-svn: 298996
2017-03-29 14:42:27 +00:00
Anna Thomas 923e574bff [InstCombine] For select rule, use positive check of constant int for select operand. NFCI
llvm-svn: 298906
2017-03-28 09:32:24 +00:00
Alex Shlyapnikov bbd5cc63d7 Revert "[asan] Delay creation of asan ctor."
Speculative revert. Some libfuzzer tests are affected.

This reverts commit r298731.

llvm-svn: 298890
2017-03-27 23:11:50 +00:00
Alex Shlyapnikov 09171aa31f Revert "[asan] Put ctor/dtor in comdat."
Speculative revert, some libfuzzer tests are affected.

This reverts commit r298756.

llvm-svn: 298889
2017-03-27 23:11:47 +00:00
Matthew Simpson b8ff4a4a70 [LV] Transform truncations of non-primary induction variables
The vectorizer tries to replace truncations of induction variables with new
induction variables having the smaller type. After r295063, this optimization
was applied to all integer induction variables, including non-primary ones.
When optimizing the truncation of a non-primary induction variable, we still
need to transform the new induction so that it has the correct start value.
This should fix PR32419.

Reference: https://bugs.llvm.org/show_bug.cgi?id=32419
llvm-svn: 298882
2017-03-27 20:07:38 +00:00
Anna Thomas f57ae33381 [InstCombine] Avoid incorrect folding of select into phi nodes when incoming element is a vector type
Summary:
We are incorrectly folding selects into phi nodes when the incoming value of a phi
node is a constant vector. This optimization is done in `FoldOpIntoPhi` when the
select condition is a phi node with constant incoming values.
Without the fix, we are miscompiling (i.e. incorrectly folding the
select into the phi node) when the vector contains non-zero
elements.
This patch fixes the miscompile and we will correctly fold based on the
select vector operand (see added test cases).

Reviewers: majnemer, sanjoy, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31189

llvm-svn: 298845
2017-03-27 13:52:51 +00:00
Serge Pavlov b71bb80c2d [LoopUnroll] Remap references in peeled iteration
References in cloned blocks must be remapped prior to dominator
calculation.

Differential Revision: https://reviews.llvm.org/D31281

llvm-svn: 298811
2017-03-26 16:46:53 +00:00
Joerg Sonnenberger fa7367428a Split the SimplifyCFG pass into two variants.
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.

The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.

Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results 2.6%
a smaller binary.

Differential Revision: https://reviews.llvm.org/D30333

llvm-svn: 298799
2017-03-26 06:44:08 +00:00
Chandler Carruth 0d256c0f5d [IR] Make SwitchInst::CaseIt almost a normal iterator.
This moves it to the iterator facade utilities giving it full random
access semantics, etc. It can also now be used with standard algorithms
like std::all_of and std::any_of and range adaptors like llvm::reverse.

Also make the semantics of iterating match what every other iterator
uses and forbid decrementing past the begin iterator. This was used as
a hacky way to work around iterator invalidation. However, every
instance trying to do this failed to actually avoid touching invalid
iterators despite the clear documentation that the removed and all
subsequent iterators become invalid including the end iterator. So I've
added a return of the next iterator to removeCase and rewritten the
loops that were doing this to correctly follow the iterator pattern of
either incremneting or removing and assigning fresh values to the
iterator and the end.

In one case we were trying to go backwards to make this cleaner but it
doesn't actually work. I've made that code match the code we use
everywhere else to remove cases as we iterate. This changes the order of
cases in one test output and I moved that test to CHECK-DAG so it
wouldn't care -- the order isn't semantically meaningful anyways.

llvm-svn: 298791
2017-03-26 02:49:23 +00:00
Craig Topper 47596dd4cc [InstCombine] Change the interface of SimplifyDemandedBits so that it takes the instruction and operand instead of the Use.
The first thing it did was get the User for the Use to get the instruction back. This requires looking through the Uses for the User using the waymarking walk. That's pretty fast, but its probably still better to just pass the Instruction we already had.

llvm-svn: 298772
2017-03-25 06:52:52 +00:00
Davide Italiano e9781e7b2f [NewGVN] Adjust NDEBUG markers.
This avoids 'used but not defined' warnings in Release builds
with GCC.

llvm-svn: 298760
2017-03-25 02:40:02 +00:00
Evgeniy Stepanov 71bb8f1ad0 [asan] Put ctor/dtor in comdat.
When possible, put ASan ctor/dtor in comdat.

The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.

The benefit of this is a lot less __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in gold linker.

llvm-svn: 298756
2017-03-25 01:01:11 +00:00
Craig Topper 8fbb74b5b2 Revert r298711 "[InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits"
Tsan bot is failing.

llvm-svn: 298745
2017-03-24 22:12:10 +00:00
Ivan Krasin c2124e185c Revert r298620: [LV] Vectorize GEPs
Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes)
LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413

Original change description:

[LV] Vectorize GEPs

This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.

With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.

The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.

Original Differential Revision: https://reviews.llvm.org/D30710

llvm-svn: 298735
2017-03-24 20:49:43 +00:00
Evgeniy Stepanov 64e872a91f [asan] Delay creation of asan ctor.
Create the constructor in the module pass.
This in needed for the GC-friendly globals change, where the constructor can be
put in a comdat  in some cases, but we don't know about that in the function
pass.

llvm-svn: 298731
2017-03-24 20:42:15 +00:00
Matt Arsenault 4c7795dd31 AMDGPU: Fold rcp/rsq of undef to undef
llvm-svn: 298725
2017-03-24 19:04:57 +00:00
Matt Arsenault 18bb24a1be TTI: Split IsSimple in MemIntrinsicInfo
All this did before was assert in EarlyCSE.

llvm-svn: 298724
2017-03-24 18:56:43 +00:00
Teresa Johnson 428b9e0627 [ThinLTO] Correct counting of functions in inliner stats
Summary: Declarations need to be filtered out when counting functions.

Reviewers: eraman

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31336

llvm-svn: 298720
2017-03-24 17:59:06 +00:00
Craig Topper d4521c2fc2 [InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits
SimplifyDemandedUseBits for Add/Sub already recursed down LHS and RHS for simplifying bits. If that didn't provide any simplifications we fall back to calling computeKnownBits which will recurse again. Instead just take the known bits for LHS and RHS we already have and call into a new function in ValueTracking that can calculate the known bits given the LHS/RHS bits.

llvm-svn: 298711
2017-03-24 16:56:51 +00:00
Benjamin Kramer 46f5e2c47b Make GCC happy again.
llvm-svn: 298702
2017-03-24 14:15:35 +00:00
Daniel Berlin ffc30781f4 NewGVN: Small cleanup of two dominance related functions to make
them easier to understand.

llvm-svn: 298692
2017-03-24 06:33:51 +00:00
Daniel Berlin 0e9001131d NewGVN: Small cleanup of useless expression deletion, and don't uselessly create two expressions in symbolic store evaluation.
llvm-svn: 298691
2017-03-24 06:33:48 +00:00
Daniel Berlin 9d0796e5d0 NewGVN: Fix PR32403 - Handling of undef in phis was not quite correct
due to LLVM's view of phi nodes.  It would cause NewGVN not to fixpoint
in some interesting edge cases.

llvm-svn: 298687
2017-03-24 05:30:34 +00:00
Craig Topper 36f2e0eee8 [InstCombine] Use range-based for loop. NFC
llvm-svn: 298680
2017-03-24 02:58:02 +00:00
Craig Topper df73e7c5b7 [InstCombine] Fix 80 column violation I accidentally introduced. NFC
llvm-svn: 298679
2017-03-24 02:57:59 +00:00
Reid Kleckner 392f062675 [sancov] Don't instrument blocks with no insertion point
This prevents crashes when attempting to instrument functions containing
C++ try.

Sanitizer coverage will still fail at runtime when an exception is
thrown through a sancov instrumented function, but that seems marginally
better than what we have now. The full solution is to color the blocks
in LLVM IR and only instrument blocks that have an unambiguous color,
using the appropriate token.

llvm-svn: 298662
2017-03-23 23:30:41 +00:00
Dehao Chen 722e94061b Set the prof weight correctly for call instructions in DeadArgumentElimination.
Summary: In DeadArgumentElimination, the call instructions will be replaced. We also need to set the prof weights so that function inlining can find the correct profile.

Reviewers: eraman

Reviewed By: eraman

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31143

llvm-svn: 298660
2017-03-23 23:26:00 +00:00
Bryant Wong def79b21e4 [MetaRenamer] Don't rename library functions.
Library functions can have specific semantics that affect the behavior of
certain passes. DSE, for instance, gives special treatment to malloc-ed pointers
but not to pointers returned from an equivalently typed (but differently named)
function.

MetaRenamer ought not to alter program semantics, so library functions must
remain untouched.

Reviewers: mehdi_amini, majnemer, chandlerc, davide

Reviewed By: davide

Subscribers: davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D31304

llvm-svn: 298659
2017-03-23 23:21:07 +00:00
Dehao Chen 8c88671985 Disable loop unrolling and icp in SamplePGO ThinLTO compile phase
Summary:
loop unrolling and icp will make the sample profile annotation much harder in the backend. So disable these 2 optimization in the ThinLTO compile phase.
Will add a test in cfe in a separate patch.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: mehdi_amini, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D31217

llvm-svn: 298646
2017-03-23 21:20:05 +00:00
Craig Topper 74494d0179 [InstCombine] Remove some code from visitAnd that dealt with trying to reduce the LHS of a sub to 0. This should now be fully handled by SimplifyDemandedInstructionBits now.
Now that we call ShrinkDemandedConstant on the RHS of sub this should be taken care of. This code doesn't trigger on any in tree regressions, but did before ShrinkDemandedConstant was added to the RHS.

llvm-svn: 298644
2017-03-23 21:00:13 +00:00
Teresa Johnson 0c6a4ff8dc [ThinLTO] Add support for emitting minimized bitcode for thin link
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.

The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.

Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.

However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.

Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin linked bitcode, and typically is automatically generated from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).

Added various tests to ensure that we get identical index files out of
the thin link step.

Reviewers: mehdi_amini, pcc

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31027

llvm-svn: 298638
2017-03-23 19:47:39 +00:00
Matthew Simpson 4e7b71bc86 [LV] Vectorize GEPs
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.

With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.

The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.

Differential Revision: https://reviews.llvm.org/D30710

llvm-svn: 298620
2017-03-23 16:29:58 +00:00
Matthew Simpson 1fb4064531 [LV] Delete unneeded scalar GEP creation code
The code for generating scalar base pointers in vectorizeMemoryInstruction is
not needed. We currently scalarize all GEPs and maintain the scalarized values
in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as
that in scalarizeInstruction. The test cases that changed as a result of this
patch changed because we were able to reuse the scalarized GEP that we
previously generated instead of cloning a new one.

Differential Revision: https://reviews.llvm.org/D30587

llvm-svn: 298615
2017-03-23 16:07:21 +00:00
Dehao Chen 53a0c082d2 Do not set branch weight if the branch weight annotation is present.
Summary: ThinLTO will annotate the CFG twice. If the branch weight is set by the first annotation, we should not set the branch weight again in the second annotation because the first annotation is more accurate as there is less optimization that could affect debug info accuracy.

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: mehdi_amini, aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D31228

llvm-svn: 298602
2017-03-23 14:43:10 +00:00
Luqman Aden 3f807c91dc Preserve nonnull metadata on Loads through SROA & mem2reg.
Summary:
https://llvm.org/bugs/show_bug.cgi?id=31142 :

SROA was dropping the nonnull metadata on loads from allocas that got optimized out. This patch simply preserves nonnull metadata on loads through SROA and mem2reg.

Reviewers: chandlerc, efriedma

Reviewed By: efriedma

Subscribers: hfinkel, spatel, efriedma, arielb1, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D27114

llvm-svn: 298540
2017-03-22 19:16:39 +00:00
Peter Collingbourne f7691d8b41 IPO: Const correctness for summaries passed into passes.
Pass const qualified summaries into importers and unqualified summaries into
exporters. This lets us const-qualify the summary argument to thinBackend.

Differential Revision: https://reviews.llvm.org/D31230

llvm-svn: 298534
2017-03-22 18:22:59 +00:00
Peter Collingbourne 9a3f97977f IR: Fix a race condition in type id clients of ModuleSummaryIndex.
Add a const version of the getTypeIdSummary accessor that avoids
mutating the TypeIdMap.

Differential Revision: https://reviews.llvm.org/D31226

llvm-svn: 298531
2017-03-22 18:04:39 +00:00