Commit Graph

16194 Commits

Author SHA1 Message Date
James Molloy 3c1137c639 Revert "[SimplifyCFG] Improve FoldValueComparisonIntoPredecessors to handle more cases"
This reverts commit r280218. This *also* causes buildbot errors. Sigh. Not a successful day all around!

llvm-svn: 280239
2016-08-31 13:32:28 +00:00
James Molloy cacfc16109 Revert "[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd"
This reverts commit r280216 - it caused buildbot failures.

llvm-svn: 280234
2016-08-31 13:16:52 +00:00
James Molloy 76c9d423a7 Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches"
This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280233
2016-08-31 13:16:45 +00:00
James Molloy 06a45483a1 Revert "[SimplifyCFG] Add a workaround to fix PR30188"
This reverts commit r280219. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280232
2016-08-31 13:16:36 +00:00
James Molloy 8a66a39cbf Revert "[SimplifyCFG] Fix bootstrap failure after r280220"
This reverts commit r280228. r280216 caused buildbot failures - backing out the entire sequence.

llvm-svn: 280231
2016-08-31 13:16:30 +00:00
James Molloy b7efa6c227 [SimplifyCFG] Fix bootstrap failure after r280220
We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.

llvm-svn: 280228
2016-08-31 12:33:48 +00:00
James Molloy 171fdac7ce [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280219
2016-08-31 10:46:45 +00:00
James Molloy 8e69b032e5 [SimplifyCFG] Improve FoldValueComparisonIntoPredecessors to handle more cases
A very important case is not handled here: multiple arcs to a single block with a PHI. Consider:

    a:
      %1 = icmp %b, 1
      br %1, label %c, label %e
    c:
      %2 = icmp %b, 2
      br %2, label %d, label %e
    d:
      br %e
    e:
      phi [0, %a], [1, %c], [2, %d]

FoldValueComparisonIntoPredecessors will refuse to fold this, as it doesn't know how to deal with two arcs to a common destination with different PHI values. The answer is obvious - just split all conflicting arcs.

llvm-svn: 280218
2016-08-31 10:46:39 +00:00
James Molloy c53b40b509 [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280217
2016-08-31 10:46:33 +00:00
James Molloy 55bd04cd20 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280216
2016-08-31 10:46:23 +00:00
James Molloy 923e98c232 [SimplifyCFG] Tail-merge calls with sideeffects
This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour
at least vaguely consistent with the previous version and keep it as close to NFC as
I could.

There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup
along the way to ensure we don't create indirect calls.

Should fix PR28964.

llvm-svn: 280215
2016-08-31 10:46:16 +00:00
Gor Nishanov 50d7fb974f [Coroutines] Part 10: Add coroutine promise support.
Summary:
1) CoroEarly now lowers llvm.coro.promise intrinsic that allows to obtain
a coroutine promise pointer from a coroutine frame and vice versa.

2) CoroFrame now interprets Promise argument of llvm.coro.begin to
place CoroutinPromise alloca at a deterministic offset from the coroutine frame.

Now, the coroutine promise example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex4.ll).

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23993

llvm-svn: 280184
2016-08-31 00:35:41 +00:00
Sanjay Patel 7d9ebaf337 [InstCombine] clean up InsertRangeTest; NFCI
It's much less code and easier to read if we don't duplicate
everything between the 'Inside' and not 'Inside' cases.

As noted with the FIXME, the goal is to make this vector-friendly
in a follow-up patch.

llvm-svn: 280183
2016-08-31 00:19:35 +00:00
Alina Sbirlea 3f8f7840bf [LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148.
Summary:
LSV was using two vector sets (heads and tails) to track pairs of adjiacent position to vectorize.
A recent optimization is trying to obtain the longest chain to vectorize and assumes the positions
in heads(H) and tails(T) match, which is not the case is there are multiple tails for the same head.

e.g.:
i1: store a[0]
i2: store a[1]
i3: store a[1]
Leads to:
H: i1
T: i2 i3
Instead of:
H: i1 i1
T: i2 i3
So the positions for instructions that follow i3 will have different indexes in H/T.
This patch resolves PR29148.

This issue also surfaced the fact that if the chain is too long, and TLI
returns a "not-fast" answer, the whole chain will be abandoned for
vectorization, even though a smaller one would be beneficial.
Added a testcase and FIXME for this.

Reviewers: tstellarAMD, arsenm, jlebar

Subscribers: mzolotukhin, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24057

llvm-svn: 280179
2016-08-30 23:53:59 +00:00
Michael Kuperstein 2954d1db77 [LoopVectorizer] Predicate instructions in blocks with several incoming edges
We don't need to limit predication to blocks that have a single incoming
edge, we just need to use the right mask.
This fixes PR30172.

Differential Revision: https://reviews.llvm.org/D24009

llvm-svn: 280148
2016-08-30 20:22:21 +00:00
Sanjay Patel b37145712e [InstCombine] replace divide-by-constant checks with asserts; NFC
These folds already have tests for scalar and vector types, except 
for the vector div-by-0 case, so I'm adding tests for that.

llvm-svn: 280115
2016-08-30 17:31:34 +00:00
Sanjay Patel a7cb477277 [InstCombine] clean up foldICmpDivConstant; NFCI
1. Fix comments to match variable names
2. Remove redundant CmpRHS variable
3. Add FIXME to replace some checks with asserts

llvm-svn: 280112
2016-08-30 17:10:49 +00:00
Chad Rosier 27ac0d8670 [Reassociate] Add additional debug output. NFC.
llvm-svn: 280090
2016-08-30 13:58:35 +00:00
James Molloy d13b1239e4 [SimplifyCFG] Properly CSE metadata in SinkThenElseCodeToEnd
This was missing, meaning the metadata in sunk instructions was potentially bogus and could cause miscompiles.

llvm-svn: 280072
2016-08-30 10:56:08 +00:00
Anna Thomas 1aea6da564 [RewriteStatepointsForGC] Update comment for same PHI node check. NFC
llvm-svn: 280052
2016-08-30 02:36:48 +00:00
Kostya Serebryany 5ac427b8e4 [sanitizer-coverage] add two more modes of instrumentation: trace-div and trace-gep, mostly usaful for value-profile-based fuzzing; llvm part
llvm-svn: 280043
2016-08-30 01:12:10 +00:00
Duncan P. N. Exon Smith 5c001c367f ADT: Give ilist<T>::reverse_iterator a handle to the current node
Reverse iterators to doubly-linked lists can be simpler (and cheaper)
than std::reverse_iterator.  Make it so.

In particular, change ilist<T>::reverse_iterator so that it is *never*
invalidated unless the node it references is deleted.  This matches the
guarantees of ilist<T>::iterator.

(Note: MachineBasicBlock::iterator is *not* an ilist iterator, but a
MachineInstrBundleIterator<MachineInstr>.  This commit does not change
MachineBasicBlock::reverse_iterator, but it does update
MachineBasicBlock::reverse_instr_iterator.  See note at end of commit
message for details on bundle iterators.)

Given the list (with the Sentinel showing twice for simplicity):

     [Sentinel] <-> A <-> B <-> [Sentinel]

the following is now true:
 1. begin() represents A.
 2. begin() holds the pointer for A.
 3. end() represents [Sentinel].
 4. end() holds the poitner for [Sentinel].
 5. rbegin() represents B.
 6. rbegin() holds the pointer for B.
 7. rend() represents [Sentinel].
 8. rend() holds the pointer for [Sentinel].

The changes are #6 and #8.  Here are some properties from the old
scheme (which used std::reverse_iterator):
- rbegin() held the pointer for [Sentinel] and rend() held the pointer
  for A;
- operator*() cost two dereferences instead of one;
- converting from a valid iterator to its valid reverse_iterator
  involved a confusing increment; and
- "RI++->erase()" left RI invalid.  The unintuitive replacement was
  "RI->erase(), RE = end()".

With vector-like data structures these properties are hard to avoid
(since past-the-beginning is not a valid pointer), and don't impose a
real cost (since there's still only one dereference, and all iterators
are invalidated on erase).  But with lists, this was a poor design.

Specifically, the following code (which obviously works with normal
iterators) now works with ilist::reverse_iterator as well:

    for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;)
      fooThatMightRemoveArgFromList(*RI++);

Converting between iterator and reverse_iterator for the same node uses
the getReverse() function.

    reverse_iterator iterator::getReverse();
    iterator reverse_iterator::getReverse();

Why doesn't iterator <=> reverse_iterator conversion use constructors?

In order to catch and update old code, reverse_iterator does not even
have an explicit conversion from iterator.  It wouldn't be safe because
there would be no reasonable way to catch all the bugs from the changed
semantic (see the changes at call sites that are part of this patch).

Old code used this API:

    std::reverse_iterator::reverse_iterator(iterator);
    iterator std::reverse_iterator::base();

Here's how to update from old code to new (that incorporates the
semantic change), assuming I is an ilist<>::iterator and RI is an
ilist<>::reverse_iterator:

            [Old]         ==>          [New]
    reverse_iterator(I)       (--I).getReverse()
    reverse_iterator(I)         ++I.getReverse()
  --reverse_iterator(I)           I.getReverse()
    reverse_iterator(++I)         I.getReverse()
          RI.base()          (--RI).getReverse()
          RI.base()            ++RI.getReverse()
        --RI.base()              RI.getReverse()
      (++RI).base()              RI.getReverse()
  delete &*RI, RE = end()         delete &*RI++
  RI->erase(), RE = end()         RI++->erase()

=======================================
Note: bundle iterators are out of scope
=======================================

MachineBasicBlock::iterator, also known as
MachineInstrBundleIterator<MachineInstr>, is a wrapper to represent
MachineInstr bundles.  The idea is that each operator++ takes you to the
beginning of the next bundle.  Implementing a sane reverse iterator for
this is harder than ilist.  Here are the options:
- Use std::reverse_iterator<MBB::i>.  Store a handle to the beginning of
  the next bundle.  A call to operator*() runs a loop (usually
  operator--() will be called 1 time, for unbundled instructions).
  Increment/decrement just works.  This is the status quo.
- Store a handle to the final node in the bundle.  A call to operator*()
  still runs a loop, but it iterates one time fewer (usually
  operator--() will be called 0 times, for unbundled instructions).
  Increment/decrement just works.
- Make the ilist_sentinel<MachineInstr> *always* store that it's the
  sentinel (instead of just in asserts mode).  Then the bundle iterator
  can sniff the sentinel bit in operator++().

I initially tried implementing the end() option as part of this commit,
but updating iterator/reverse_iterator conversion call sites was
error-prone.  I have a WIP series of patches that implements the final
option.

llvm-svn: 280032
2016-08-30 00:13:12 +00:00
Teresa Johnson 8c1bc986b5 [ThinLTO] Indirect call promotion fixes for promoted local functions
Summary:
Fix a couple issues limiting the application of indirect call promotion
in ThinLTO mode:
- Invoke indirect call promotion before globalopt, since it may
  eliminate imported functions which appear unreferenced.
- Invoke indirect call promotion with InLTO=true so that the PGOFuncName
  metadata is used to get the name for locals which would have been
  renamed during promotion.

Reviewers: davidxl, mehdi_amini

Subscribers: Prazek, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24004

llvm-svn: 280024
2016-08-29 22:46:56 +00:00
Chad Rosier 6e1eaac62b [SLP] Return a boolean value for these static helpers. NFC.
Differential Revision: https://reviews.llvm.org/D24008

llvm-svn: 280020
2016-08-29 22:09:51 +00:00
Matthew Simpson df19502b16 [LV] Move insertelement sequence after scalar definitions
After r279649 when getting a vector value from VectorLoopValueMap, we create an
insertelement sequence on-demand if the value has been scalarized instead of
vectorized. We previously inserted this insertelement sequence before the
value's first vector user. However, this insert location is problematic if that
user is the phi node of a first-order recurrence. With this patch, we move the
insertelement sequence after the last scalar instruction we created when
scalarizing the value. Thus, the value's vector definition in the new loop will
immediately follow its scalar definitions. This should fix PR30183.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30183
llvm-svn: 280001
2016-08-29 20:14:04 +00:00
Vitaly Buka 3c4f6bf654 [asan] Enable new stack poisoning with store instruction by default
Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23968

llvm-svn: 279993
2016-08-29 19:28:34 +00:00
Tim Northover c10c33444e ASan: remove variable only used in assertions build
llvm-svn: 279990
2016-08-29 19:12:20 +00:00
Vitaly Buka 793913c7eb Use store operation to poison allocas for lifetime analysis.
Summary:
Calling __asan_poison_stack_memory and __asan_unpoison_stack_memory for small
variables is too expensive.

Code is disabled by default and can be enabled by -asan-experimental-poisoning.

PR27453

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23947

llvm-svn: 279984
2016-08-29 18:17:21 +00:00
Vitaly Buka db331d8be7 [asan] Separate calculation of ShadowBytes from calculating ASanStackFrameLayout
Summary: No functional changes, just refactoring to make D23947 simpler.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23954

llvm-svn: 279982
2016-08-29 17:41:29 +00:00
David Majnemer e8fd5f9ffd [SimplifyCFG] Hoisting invalidates metadata
We forgot to remove optimization metadata when performing hosting during
FoldTwoEntryPHINode.

This fixes PR29163.

llvm-svn: 279980
2016-08-29 17:14:08 +00:00
Anna Thomas 2bc129c5fd [StatepointsForGC] Rematerialize in the presence of PHIs
Summary:
While walking the use chain for identifying rematerializable values in RS4GC,
add the case where the current value and base value are the same PHI nodes.

This will aid rematerialization of geps and casts instead of relocating.

Reviewers: sanjoy, reames, igor

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23920

llvm-svn: 279975
2016-08-29 15:41:59 +00:00
Gor Nishanov dce9b02677 [Coroutines] Part 9: Add cleanup subfunction.
Summary:
[Coroutines] Part 9: Add cleanup subfunction.

This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll)

Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function. This allows to check whether a coro.id describes the enclosing function or it belongs to a different function that was later inlined.

CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by a deallocation code
# f$cleanup - just the cleanup code

CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending whether heap elision is performed or not.

Other fixes, improvements:
* Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point.

* Switched to using variable width suspend index field (no longer limited to 32 bit index field can be as little as i1 or as large as i<whatever-size_t-is>)

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23844

llvm-svn: 279971
2016-08-29 14:34:12 +00:00
Sanjay Patel 5c5311f4e5 [InstCombine] use m_APInt to allow icmp (and X, Y), C folds for splat constant vectors
llvm-svn: 279937
2016-08-28 18:18:00 +00:00
Sebastian Pop 4660199a33 GVN-hoist: invalidate MD cache (PR29144)
Without invalidating the entries in the MD cache we would try to access instructions
that were removed in previous iterations of hoisting.

Differential Revision: https://reviews.llvm.org/D23927

llvm-svn: 279907
2016-08-27 02:48:41 +00:00
Adam Nemet cef3314156 [Inliner] Report when inlining fails because callee's def is unavailable
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.

Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered.  I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off.  As before without -Rpass-with-hotness
there is no overhead.

Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.

Reviewers: eraman, chandlerc, davidxl, hfinkel

Subscribers: tejohnson, Prazek, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D23415

llvm-svn: 279860
2016-08-26 20:21:05 +00:00
Sanjay Patel 14e0e18d76 [InstCombine] add helper function for icmp (and (sh X, Y), C2), C1 ; NFC
Like other recent changes near here, the goal is to allow vector types for
all of these folds. Splitting things up makes it easier to incrementally 
enhance the code and easier to read.

llvm-svn: 279851
2016-08-26 18:28:46 +00:00
Sanjay Patel da9c56299b [InstCombine] clean up foldICmpAndConstConst(); NFC
1. Early exit to reduce indent
2. Fix comments and variable names to match
3. Reformat comments / clang-format code

llvm-svn: 279837
2016-08-26 17:15:22 +00:00
Sanjay Patel d3c7bb28be [InstCombine] add helper function for folding of icmp (and X, C2), C; NFC
llvm-svn: 279834
2016-08-26 16:42:33 +00:00
Bob Haarman 3db176410a limit the number of instructions per block examined by dead store elimination
Summary: Dead store elimination gets very expensive when large numbers of instructions need to be analyzed. This patch limits the number of instructions analyzed per store to the value of the memdep-block-scan-limit parameter (which defaults to 100). This resulted in no observed difference in performance of the generated code, and no change in the statistics for the dead store elimination pass, but improved compilation time on some files by more than an order of magnitude.

Reviewers: dexonsmith, bruno, george.burgess.iv, dberlin, reames, davidxl

Subscribers: davide, chandlerc, dberlin, davidxl, eraman, tejohnson, mbodart, llvm-commits

Differential Revision: https://reviews.llvm.org/D15537

llvm-svn: 279833
2016-08-26 16:34:27 +00:00
Sanjay Patel 311e0fabb1 [InstCombine] rename variables in foldICmpAndConstant(); NFC
llvm-svn: 279831
2016-08-26 16:14:06 +00:00
Bob Haarman 244ed8b574 test commit
llvm-svn: 279830
2016-08-26 16:00:04 +00:00
Adam Nemet 4f155b6e91 [LoopUnroll] Use OptimizationRemarkEmitter directly not via the analysis pass
We can't mark ORE (a function pass) preserved as required by the loop
passes because that is how we ensure that the required passes like
LazyBFI are all available any time ORE is used.  See the new comments in
the patch.

Instead we use it directly just like the inliner does in D22694.

As expected there is some additional overhead after removing the caching
provided by analysis passes.  The worst case, I measured was
LNT/CINT2006_ref/401.bzip2 which regresses by 12%.  As before, this only
affects -Rpass-with-hotness and not default compilation.

llvm-svn: 279829
2016-08-26 15:58:34 +00:00
Sanjay Patel f7ba0891ce [InstCombine] rename variables in foldICmpDivConstant(); NFC
Removing the redundant 'CmpRHSV' local variable exposes a bug in the caller
foldICmpShrConstant() - it was sending in the div constant instead of the
cmp constant. But I have not been able to expose this in a regression test
yet - the affected folds all appear to be handled before we ever reach this
code. I'll keep trying to find a case as I make changes to allow vector folds
in both functions.

llvm-svn: 279828
2016-08-26 15:53:01 +00:00
Tim Shen 3ad8b43cc2 [MemCpy] Add comments for r279769
Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279778
2016-08-25 21:03:46 +00:00
Tim Shen a3dbead2d6 [MemCpy] Check for alias in performMemCpyToMemSetOptzn, instead of the identity of two operands
Summary:
This fixes pr29105. The reason is that lifetime marks creates new
aliasing pointers the original ones, but before this patch aliases
were not checked in performMemCpyToMemSetOptzn.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279769
2016-08-25 19:27:26 +00:00
Wei Mi 59ca96636d [UNROLL] Postpone ScalarEvolution::forgetLoop after TripCountSC is expanded
when unroll runtime iteration loop.

In llvm::UnrollRuntimeLoopRemainder, if the loop to be unrolled is the inner
loop inside a loop nest, the scalar evolution needs to be dropped for its
parent loop which is done by ScalarEvolution::forgetLoop. However, we can
postpone forgetLoop to the end of UnrollRuntimeLoopRemainder so TripCountSC
expansion can still reuse existing value.

Differential Revision: https://reviews.llvm.org/D23572

llvm-svn: 279748
2016-08-25 16:17:18 +00:00
Sebastian Pop 5f0d0e60d1 GVN-hoist: fix hoistingFromAllPaths for loops (PR29034)
It is invalid to hoist stores or loads if they are not executed on all paths
from the hoisting point to the exit of the function. In the testcase, there are
paths in the loop that do not execute the stores or the loads, and so hoisting
them within the loop is unsafe.

The problem is that the current implementation of hoistingFromAllPaths is
incomplete: it walks all blocks dominated by the hoisting point, and does not
return false when the loop contains a path on which the hoisted ld/st is
not executed.

Differential Revision: https://reviews.llvm.org/D23843

llvm-svn: 279732
2016-08-25 11:55:47 +00:00
Xinliang David Li cad3a995a4 [Profile] Propagate branch metadata properly in instcombine
Differential Revision: http://reviews.llvm.org/D23590

llvm-svn: 279693
2016-08-25 00:26:32 +00:00
Sanjay Patel 1655414903 [InstCombine] move foldICmpDivConstConst() contents to foldICmpDivConstant(); NFCI
There was no logic in foldICmpDivConstant, so no need for a separate function.
The code is directly copy/pasted, so further cleanups to follow.

llvm-svn: 279685
2016-08-24 23:03:36 +00:00
Sanjay Patel d398d4a39e [InstCombine] use m_APInt to allow icmp eq/ne (shr X, C2), C folds for splat constant vectors
llvm-svn: 279677
2016-08-24 22:22:06 +00:00
Matthew Simpson abd2be1e2e [LV] Unify vector and scalar maps
This patch unifies the data structures we use for mapping instructions from the
original loop to their corresponding instructions in the new loop. Previously,
we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap.
WidenMap maintained the vector values each instruction from the old loop was
represented with, and ScalarIVMap maintained the scalar values each scalarized
induction variable was represented with. With this patch, all values created
for the new loop are maintained in VectorLoopValueMap.

The change allows for several simplifications. Previously, when an instruction
was scalarized, we had to insert the scalar values into vectors in order to
maintain the mapping in WidenMap. Then, if a user of the scalarized value was
also scalar, we had to extract the scalar values from the temporary vector we
created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions.
If a scalarized value is used by a scalar instruction, the scalar value is used
directly. However, if the scalarized value is needed by a vector instruction,
we generate the needed insertelement instructions on-demand.

A common idiom in several locations in the code (including the scalarization
code), is to first get the vector values an instruction from the original loop
maps to, and then extract a particular scalar value. This patch adds
getScalarValue for this purpose along side getVectorValue as an interface into
VectorLoopValueMap. These functions work together to return the requested
values if they're available or to produce them if they're not.

The mapping has also be made less permissive. Entries can be added to
VectorLoopValue map with the new initVector and initScalar functions.
getVectorValue has been modified to return a constant reference to the mapped
entries.

There's no real functional change with this patch; however, in some cases we
will generate slightly different code. For example, instead of an insertelement
sequence following the definition of an instruction, it will now precede the
first use of that instruction. This can be seen in the test case changes.

Differential Revision: https://reviews.llvm.org/D23169

llvm-svn: 279649
2016-08-24 18:23:17 +00:00
Sanjoy Das ff855b6020 [SCCP] Don't delete side-effecting instructions
I'm not sure if the `!isa<CallInst>(Inst) &&
!isa<TerminatorInst>(Inst))` bit is correct either, but this fixes the
case we know is broken.

llvm-svn: 279647
2016-08-24 18:10:21 +00:00
Sanjay Patel 8e297749c1 [InstCombine] add assert and explanatory comment for fold removed in r279568; NFC
I deleted a fold from InstCombine at:
https://reviews.llvm.org/rL279568

because it (like any InstCombine to a constant?) should always happen in InstSimplify,
however, it's not obvious what the assumptions are in the remaining code.

Add a comment and assert to make it clearer.

Differential Revision: https://reviews.llvm.org/D23819

llvm-svn: 279626
2016-08-24 13:55:55 +00:00
Gil Rapaport 550148b2f6 [Loop Vectorizer] Support predication of div/rem
div/rem instructions in basic blocks that require predication currently prevent
vectorization. This patch extends the existing mechanism for predicating stores
to handle other instructions and leverages it to predicate divs and rems.

Differential Revision: https://reviews.llvm.org/D22918

llvm-svn: 279620
2016-08-24 11:37:57 +00:00
Chandler Carruth 8882346842 [PM] Introduce basic update capabilities to the new PM's CGSCC pass
manager, including both plumbing and logic to handle function pass
updates.

There are three fundamentally tied changes here:
1) Plumbing *some* mechanism for updating the CGSCC pass manager as the
   CG changes while passes are running.
2) Changing the CGSCC pass manager infrastructure to have support for
   the underlying graph to mutate mid-pass run.
3) Actually updating the CG after function passes run.

I can separate them if necessary, but I think its really useful to have
them together as the needs of #3 drove #2, and that in turn drove #1.

The plumbing technique is to extend the "run" method signature with
extra arguments. We provide the call graph that intrinsically is
available as it is the basis of the pass manager's IR units, and an
output parameter that records the results of updating the call graph
during an SCC passes's run. Note that "...UpdateResult" isn't a *great*
name here... suggestions very welcome.

I tried a pretty frustrating number of different data structures and such
for the innards of the update result. Every other one failed for one
reason or another. Sometimes I just couldn't keep the layers of
complexity right in my head. The thing that really worked was to just
directly provide access to the underlying structures used to walk the
call graph so that their updates could be informed by the *particular*
nature of the change to the graph.

The technique for how to make the pass management infrastructure cope
with mutating graphs was also something that took a really, really large
number of iterations to get to a place where I was happy. Here are some
of the considerations that drove the design:

- We operate at three levels within the infrastructure: RefSCC, SCC, and
  Node. In each case, we are working bottom up and so we want to
  continue to iterate on the "lowest" node as the graph changes. Look at
  how we iterate over nodes in an SCC running function passes as those
  function passes mutate the CG. We continue to iterate on the "lowest"
  SCC, which is the one that continues to contain the function just
  processed.

- The call graph structure re-uses SCCs (and RefSCCs) during mutation
  events for the *highest* entry in the resulting new subgraph, not the
  lowest. This means that it is necessary to continually update the
  current SCC or RefSCC as it shifts. This is really surprising and
  subtle, and took a long time for me to work out. I actually tried
  changing the call graph to provide the opposite behavior, and it
  breaks *EVERYTHING*. The graph update algorithms are really deeply
  tied to this particualr pattern.

- When SCCs or RefSCCs are split apart and refined and we continually
  re-pin our processing to the bottom one in the subgraph, we need to
  enqueue the newly formed SCCs and RefSCCs for subsequent processing.
  Queuing them presents a few challenges:
  1) SCCs and RefSCCs use wildly different iteration strategies at
     a high level. We end up needing to converge them on worklist
     approaches that can be extended in order to be able to handle the
     mutations.
  2) The order of the enqueuing need to remain bottom-up post-order so
     that we don't get surprising order of visitation for things like
     the inliner.
  3) We need the worklists to have set semantics so we don't duplicate
     things endlessly. We don't need a *persistent* set though because
     we always keep processing the bottom node!!!! This is super, super
     surprising to me and took a long time to convince myself this is
     correct, but I'm pretty sure it is... Once we sink down to the
     bottom node, we can't re-split out the same node in any way, and
     the postorder of the current queue is fixed and unchanging.
  4) We need to make sure that the "current" SCC or RefSCC actually gets
     enqueued here such that we re-visit it because we continue
     processing a *new*, *bottom* SCC/RefSCC.

- We also need the ability to *skip* SCCs and RefSCCs that get merged
  into a larger component. We even need the ability to skip *nodes* from
  an SCC that are no longer part of that SCC.

This led to the design you see in the patch which uses SetVector-based
worklists. The RefSCC worklist is always empty until an update occurs
and is just used to handle those RefSCCs created by updates as the
others don't even exist yet and are formed on-demand during the
bottom-up walk. The SCC worklist is pre-populated from the RefSCC, and
we push new SCCs onto it and blacklist existing SCCs on it to get the
desired processing.

We then *directly* update these when updating the call graph as I was
never able to find a satisfactory abstraction around the update
strategy.

Finally, we need to compute the updates for function passes. This is
mostly used as an initial customer of all the update mechanisms to drive
their design to at least cover some real set of use cases. There are
a bunch of interesting things that came out of doing this:

- It is really nice to do this a function at a time because that
  function is likely hot in the cache. This means we want even the
  function pass adaptor to support online updates to the call graph!

- To update the call graph after arbitrary function pass mutations is
  quite hard. We have to build a fairly comprehensive set of
  data structures and then process them. Fortunately, some of this code
  is related to the code for building the cal graph in the first place.
  Unfortunately, very little of it makes any sense to share because the
  nature of what we're doing is so very different. I've factored out the
  one part that made sense at least.

- We need to transfer these updates into the various structures for the
  CGSCC pass manager. Once those were more sanely worked out, this
  became relatively easier. But some of those needs necessitated changes
  to the LazyCallGraph interface to make it significantly easier to
  extract the changed SCCs from an update operation.

- We also need to update the CGSCC analysis manager as the shape of the
  graph changes. When an SCC is merged away we need to clear analyses
  associated with it from the analysis manager which we didn't have
  support for in the analysis manager infrsatructure. New SCCs are easy!
  But then we have the case that the original SCC has its shape changed
  but remains in the call graph. There we need to *invalidate* the
  analyses associated with it.

- We also need to invalidate analyses after we *finish* processing an
  SCC. But the analyses we need to invalidate here are *only those for
  the newly updated SCC*!!! Because we only continue processing the
  bottom SCC, if we split SCCs apart the original one gets invalidated
  once when its shape changes and is not processed farther so its
  analyses will be correct. It is the bottom SCC which continues being
  processed and needs to have the "normal" invalidation done based on
  the preserved analyses set.

All of this is mostly background and context for the changes here.

Many thanks to all the reviewers who helped here. Especially Sanjoy who
caught several interesting bugs in the graph algorithms, David, Sean,
and others who all helped with feedback.

Differential Revision: http://reviews.llvm.org/D21464

llvm-svn: 279618
2016-08-24 09:37:14 +00:00
Gor Nishanov 4570e26e68 [Coroutines] Fix unused var warning in release build
llvm-svn: 279610
2016-08-24 05:20:30 +00:00
Gor Nishanov 241b041fba [Coroutines] Part 8: Coroutine Frame Building algorithm
Summary:
This patch adds coroutine frame building algorithm. Now, simple coroutines such as ex0.ll and ex1.ll (first examples from docs\Coroutines.rst can be compiled).

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...

7. Split coroutine into subfunctions. (https://reviews.llvm.org/D23461)
8. Coroutine Frame Building algorithm  <= we are here
9. Add f.cleanup subfunction.
10+. The rest of the logic

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23586

llvm-svn: 279609
2016-08-24 04:44:35 +00:00
David Callahan 012d1c0766 [ADCE] Add control dependence computation
Summary:
This is part of a serious of patches to evolve ADCE.cpp to support
removing of unnecessary control flow.

This patch adds the ability to compute control dependences using
the iterated dominance frontier. We extend the liveness propagation
to alternate between data and control dependences until convergences.

Modify the pass manager intergation to compute the post-dominator tree
needed for iterator dominance frontier.

We still force all terminators live for now until we add code to
handlinge removing control flow in a later patch.

No changes to effective behavior with this patch

Previous patches:

D23225 [ADCE] Modify data structures to support removing control flow
D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)

Reviewers: nadav, majnemer, mehdi_amini

Subscribers: twoh, freik, llvm-commits

Differential Revision: https://reviews.llvm.org/D23559

llvm-svn: 279594
2016-08-24 00:10:06 +00:00
Michael Zolotukhin bd63d436c1 [LoopUnroll] By default disable unrolling when optimizing for size.
Summary:
In clang commit r268509 we started to invoke loop-unroll pass from the
driver even under -Os. However, we happen to not initialize optsize
thresholds properly, which si fixed with this change.

r268509 led to some big compile time regressions, because we started to
unroll some loops that we didn't unroll before. With this change I hope
to recover most of the regressions. We still are slightly slower than
before, because we do some checks here and there in loop-unrolling
before we bail out, but at least the slowdown is not that huge now.

Reviewers: hfinkel, chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23388

llvm-svn: 279585
2016-08-23 23:13:15 +00:00
Sanjay Patel d64e988701 [InstCombine] use local variables for repeated values; NFCI
llvm-svn: 279578
2016-08-23 22:05:55 +00:00
Sanjay Patel dcac0dfca9 [InstCombine] move foldICmpShrConstConst() contents to foldICmpShrConst(); NFCI
There will only be 3 lines of code in foldICmpShrConst() when the cleanup is done,
so it doesn't make much sense to have a separate function for a single fold.

llvm-svn: 279575
2016-08-23 21:25:13 +00:00
Sanjay Patel 6ef22da9ec [InstCombine] remove icmp shr folds that are already handled by InstSimplify
AFAICT, these already worked in all cases for scalar types, and I enhanced
the code to work for vector types in:
https://reviews.llvm.org/rL279543

llvm-svn: 279568
2016-08-23 21:01:35 +00:00
Matthew Simpson df2ab917ad [SLP] Avoid signed integer overflow
The test case included with r279125 exposed an existing signed integer
overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together
with other costs, such as getReductionCost.

This patch removes the possibility of assigning a cost of INT_MAX. Since we
were previously using INT_MAX as an indicator for "should not vectorize", we
now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable"
before computing a cost.

This patch adds a run-line to the test case used for r279125 that ensures we
don't vectorize. Previously, this line would vectorize the test case by chance
due to undefined behavior in the cost calculation.

Differential Revision: https://reviews.llvm.org/D23723

llvm-svn: 279562
2016-08-23 20:48:50 +00:00
Xinliang David Li 812288a5b4 Possible fix of test failures on win bots
llvm-svn: 279542
2016-08-23 18:00:41 +00:00
Xinliang David Li dc49140b44 [Profile] refactor meta data copying/swapping code
Differential Revision: http://reviews.llvm.org/D23619

llvm-svn: 279523
2016-08-23 15:39:03 +00:00
Daniel Berlin ea02eee18f GVNHoist: Use the pass version of MemorySSA and preserve it.
Summary: GVNHoist: Use the pass version of MemorySSA and preserve it.

Reviewers: sebpop, george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23782

llvm-svn: 279504
2016-08-23 05:42:41 +00:00
George Burgess IV 7f414b90ab [MemorySSA] Remove unused field. NFC.
Given that we're not currently using blocker info, and whether or not we
will end up using it it is unclear, don't waste 8 (or 4) bytes of memory
per path node.

llvm-svn: 279493
2016-08-22 23:40:01 +00:00
Sanjay Patel c9196c4488 [InstCombine] change param type from Instruction to BinaryOperator for icmp helpers; NFCI
This saves some casting in the helper functions and eases some further refactoring.

llvm-svn: 279478
2016-08-22 21:24:29 +00:00
Tim Shen f2187ed321 [GraphTraits] Replace all NodeType usage with NodeRef
This should finish the GraphTraits migration.

Differential Revision: http://reviews.llvm.org/D23730

llvm-svn: 279475
2016-08-22 21:09:30 +00:00
Sanjay Patel a392049419 [InstCombine] use m_APInt to allow icmp (shr exact X, Y), 0 folds for splat constant vectors
llvm-svn: 279472
2016-08-22 20:45:06 +00:00
Daniel Berlin 3d512a2dc2 MSSA: Factor out phi node placement
llvm-svn: 279462
2016-08-22 19:14:30 +00:00
Daniel Berlin 868381bff6 MSSA: Only rename accesses whose defining access is nullptr
llvm-svn: 279461
2016-08-22 19:14:16 +00:00
James Molloy 5bf2114265 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
[Recommitting now an unrelated assertion in SROA is sorted out]

The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279460
2016-08-22 19:07:15 +00:00
James Molloy 0fee97f8ba [SROA] Remove incorrect assertion
Confirmed with aprantl, this assertion is incorrect - code can get here (for example 80-bit FP types) and if it does it's benign. This is exposed by a completely unrelated patch of mine, so stop the compiler falling over.

Original differential: http://reviews.llvm.org/D16187
aprantl's advice to remove assertion: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160815/382129.html

llvm-svn: 279454
2016-08-22 18:49:42 +00:00
Jun Bum Lim ec8b8cc595 [InstCombine] Allow sinking from unique predecessor with multiple edges
Summary: We can allow sinking if the single user block has only one unique predecessor, regardless of the number of edges. Note that a switch statement with multiple cases can have the same destination.

Reviewers: mcrosier, majnemer, spatel, reames

Subscribers: reames, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23722

llvm-svn: 279448
2016-08-22 18:21:56 +00:00
James Molloy 475f4a763f Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279443. It caused buildbot failures.

llvm-svn: 279447
2016-08-22 18:13:12 +00:00
James Molloy 353052698a [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279443
2016-08-22 17:40:23 +00:00
Artur Pilipenko b78ad9d41f Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative
This change needs to be reverted in order to revert -r278267 which cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks.

See comments on https://reviews.llvm.org/D18777 for details.

llvm-svn: 279432
2016-08-22 13:12:07 +00:00
Vitaly Buka 0672a27bb5 [asan] Use 1 byte aligned stores to poison shadow memory
Summary: r279379 introduced crash on arm 32bit bot. I suspect this is alignment issue.

Reviewers: eugenis

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D23762

llvm-svn: 279413
2016-08-22 04:16:14 +00:00
Sanjay Patel 643d21a62c [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 4
This concludes the fixes for icmp+shl in this series:
https://reviews.llvm.org/rL279339
https://reviews.llvm.org/rL279398
https://reviews.llvm.org/rL279399

llvm-svn: 279401
2016-08-21 17:10:07 +00:00
Sanjay Patel 7ffcde7422 [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 3
This is a partial enablement (move the ConstantInt guard down).

llvm-svn: 279399
2016-08-21 16:35:34 +00:00
Sanjay Patel 7e09f13fed [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 2
This is a partial enablement (move the ConstantInt guard down).

llvm-svn: 279398
2016-08-21 16:28:22 +00:00
Sanjay Patel 792636603f [InstCombine] use APInt instead of ConstantInt in isSignBitCheck(); NFCI
The callers still have ConstantInt guards, so there is no functional change
intended from this change. But relaxing the callers will allow more folds
for vector types.

llvm-svn: 279396
2016-08-21 15:07:45 +00:00
Vitaly Buka 1f9e135023 [asan] Minimize code size by using __asan_set_shadow_* for large blocks
Summary:
We can insert function call instead of multiple store operation.
Current default is blocks larger than 64 bytes.
Changes are hidden behind -asan-experimental-poisoning flag.

PR27453

Differential Revision: https://reviews.llvm.org/D23711

llvm-svn: 279383
2016-08-20 20:23:50 +00:00
Vitaly Buka 3455b9b8bc [asan] Initialize __asan_set_shadow_* callbacks
Summary:
Callbacks are not being used yet.

PR27453

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23634

llvm-svn: 279380
2016-08-20 18:34:39 +00:00
Vitaly Buka 186280daa5 [asan] Optimize store size in FunctionStackPoisoner::poisonRedZones
Summary: Reduce store size to avoid leading and trailing zeros.

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23648

llvm-svn: 279379
2016-08-20 18:34:36 +00:00
Vitaly Buka 5b4f12176c [asan] Cleanup instrumentation of dynamic allocas
Summary:
Extract instrumenting dynamic allocas into separate method.
Rename asan-instrument-allocas -> asan-instrument-dynamic-allocas

Differential Revision: https://reviews.llvm.org/D23707

llvm-svn: 279376
2016-08-20 17:22:27 +00:00
Vitaly Buka f9fd63ad39 [asan] Add support of lifetime poisoning into ComputeASanStackFrameLayout
Summary:
We are going to combine poisoning of red zones and scope poisoning.

PR27453

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23623

llvm-svn: 279373
2016-08-20 16:48:24 +00:00
Matthew Simpson 235e479984 Reapply "[SLP] Initialize VectorizedValue when gathering"
The test case included in r279125 exposed existing undefined behavior in the
SLP vectorizer that it did not introduce. This patch reapplies the original
patch, but modifies the test case to avoid hitting the undefined behavior. This
allows us to close PR28330 while keeping the UBSan bot happy. The undefined
behavior the original test uncovered will be addressed in a follow-on patch.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
llvm-svn: 279370
2016-08-20 14:49:02 +00:00
Matthew Simpson 2429656aa9 [SLP] Add command line option for minimum tree size (NFC)
llvm-svn: 279369
2016-08-20 14:10:06 +00:00
Vitaly Buka cc7db13bf0 Revert "[SLP] Initialize VectorizedValue when gathering" to fix ubsan bot.
This reverts commit r279125.

https://reviews.llvm.org/D23410

llvm-svn: 279363
2016-08-20 07:09:39 +00:00
Sanjay Patel fa7de606c4 [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 1
This is a partial enablement (move the ConstantInt guard down) because there are many
different folds here and one of the later ones will require reworking 'isSignBitCheck'.

llvm-svn: 279339
2016-08-19 22:33:26 +00:00
Daniel Berlin 11da66fc10 Partially revert 279331, as we modify this instruction in the loop
llvm-svn: 279335
2016-08-19 22:18:38 +00:00
Vitaly Buka e149b392a8 Revert "[asan] Add support of lifetime poisoning into ComputeASanStackFrameLayout"
This reverts commit r279020.

Speculative revert in hope to fix asan test on arm.

llvm-svn: 279332
2016-08-19 22:12:58 +00:00
Daniel Berlin a36f46363f Convert some depth first traversals to depth_first
llvm-svn: 279331
2016-08-19 22:06:23 +00:00
Tim Shen b5e0f5ac95 [GraphTraits] Make nodes_iterator dereference to NodeType*/NodeRef
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later.

Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705

llvm-svn: 279326
2016-08-19 21:20:13 +00:00
Reid Kleckner 98a48afa5d Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279229. It breaks intrinsic function calls in
diamonds.

llvm-svn: 279313
2016-08-19 20:22:39 +00:00
Sanjay Patel 7a104615c5 [InstCombine] remove an icmp fold that is already handled by InstSimplify
Specifically, this is done near the end of "SimplifyICmpInst" using 
computeKnownBits() as the broader solution. There are even vector
tests (yay!) for this in test/Transforms/InstSimplify/compare.ll.

I considered putting an assert here instead of just deleting, but
then we could assert every possible fold in InstSimplify in 
InstCombine, so...less is more?

llvm-svn: 279300
2016-08-19 19:03:07 +00:00
Sanjay Patel e38e79c3e6 [InstCombine] use local variables to reduce code in foldICmpShlConstant; NFC
llvm-svn: 279282
2016-08-19 17:34:05 +00:00
Sanjay Patel 38b7506f75 [InstCombine] rename variables in foldICmpShlConstant(); NFC
llvm-svn: 279279
2016-08-19 17:20:37 +00:00
Vitaly Buka 170dede75d Revert "[asan] Optimize store size in FunctionStackPoisoner::poisonRedZones"
This reverts commit r279178.

Speculative revert in hope to fix asan crash on arm.

llvm-svn: 279277
2016-08-19 17:15:38 +00:00
Vitaly Buka c8f4d69c82 Revert "[asan] Fix size of shadow incorrectly calculated in r279178"
This reverts commit r279222.

Speculative revert in hope to fix asan crash on arm.

llvm-svn: 279276
2016-08-19 17:15:33 +00:00
Reid Kleckner a871d3872a Fix regression in InstCombine introduced by r278944
The intended transform is:
  // Simplify icmp eq (or (ptrtoint P), (ptrtoint Q)), 0
  // -> and (icmp eq P, null), (icmp eq Q, null).

P and Q are both pointer types, but may have different types. We need
two calls to getNullValue() to make the icmps.

llvm-svn: 279271
2016-08-19 16:53:18 +00:00
David Majnemer 5554edabef [CloneFunction] Don't remove unrelated nodes from the CGSSC
CGSCC use a WeakVH to track call sites.  RAUW a call within a function
can result in that WeakVH getting confused about whether or not the call
site is still around.

llvm-svn: 279268
2016-08-19 16:37:40 +00:00
Sanjay Patel a867afe094 [InstCombine] use m_APInt to allow icmp (shl 1, Y), C folds for splat constant vectors
llvm-svn: 279266
2016-08-19 16:12:16 +00:00
Sanjay Patel 57b12d3876 [InstCombine] use m_APInt to allow icmp X, C folds for splat constant vectors
Of course, we really need to refactor and fix all of the cmp predicates, 
but this one is interesting because without it, we later perform an 
information-losing transform of icmp (shl 1, Y), C, and we can't recover
the better fold.

llvm-svn: 279263
2016-08-19 15:40:44 +00:00
Benjamin Kramer 96fcf5df03 [LoopVectorize] Don't copy std::vector in for-range loop.
llvm-svn: 279233
2016-08-19 12:44:24 +00:00
James Molloy 11a1936b70 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 279229
2016-08-19 10:10:27 +00:00
Vitaly Buka b81960a6c8 [asan] Fix size of shadow incorrectly calculated in r279178
Summary: r279178 generates 8 times more stores than necessary.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23708

llvm-svn: 279222
2016-08-19 08:33:53 +00:00
Xinliang David Li 63248ab888 [Profile] Fix edge count read bug
Use uint64_t to avoid value truncation before scaling.

llvm-svn: 279213
2016-08-19 06:31:45 +00:00
Xinliang David Li 2c9336823c [Profile] Simple code refactoring for reuse /NFC
llvm-svn: 279209
2016-08-19 05:31:33 +00:00
Vitaly Buka aa654292bd [asan] Optimize store size in FunctionStackPoisoner::poisonRedZones
Summary: Reduce store size to avoid leading and trailing zeros.

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23648

llvm-svn: 279178
2016-08-18 23:51:15 +00:00
Sanjay Patel 98cd99dfc6 [InstCombine] add helper function for folds of icmp (shl 1, Y), C; NFCI
Clean up the existing code by:
1. Renaming variables
2. Adding local variables
3. Making it vector-safe

This is still guarded by a ConstantInt check, so no functional change is intended.
But this should be ready to go: if we move the ConstantInt check down, all of
these folds should do the right thing for vector types.

llvm-svn: 279150
2016-08-18 21:28:30 +00:00
Amaury Sechet 763c59dc9a Make cltz and cttz zero undef when the operand cannot be zero in InstCombine
Summary: Also add popcount(n) == bitsize(n)  -> n == -1 transformation.

Reviewers: majnemer, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23134

llvm-svn: 279141
2016-08-18 20:43:50 +00:00
Sanjay Patel 40e8ca46ad [InstCombine] use m_APInt to allow icmp (trunc X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945
https://reviews.llvm.org/rL279066
https://reviews.llvm.org/rL279077
https://reviews.llvm.org/rL279101

llvm-svn: 279133
2016-08-18 20:28:54 +00:00
Sanjay Patel 5f4ce4e23d [InstCombine] clean up foldICmpTruncConstant(); NFCI
1. Fix variable names
2. Add local variables to reduce code

llvm-svn: 279132
2016-08-18 20:25:16 +00:00
Matthew Simpson 11db6b6b8c [SLP] Initialize VectorizedValue when gathering
We abort building vectorizable trees in some cases (e.g., if the maximum
recursion depth is reached, if the region size is too large, etc.). If this
happens for a reduction, we can be left with a root entry that needs to be
gathered. For these cases, we need make sure we actually set VectorizedValue to
the resulting vector.

This patch ensures we properly set VectorizedValue, and it also ensures the
insertelement sequence generated for the gathers is inserted at the correct
location.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
Differential Revison: https://reviews.llvm.org/D23410

llvm-svn: 279125
2016-08-18 19:50:32 +00:00
Sanjay Patel fa5ca2bf46 [InstCombine] use m_APInt to allow icmp (udiv X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945
https://reviews.llvm.org/rL279066
https://reviews.llvm.org/rL279077

llvm-svn: 279101
2016-08-18 17:55:59 +00:00
Sanjay Patel 12a4105647 [InstCombine] clean up foldICmpUDivConstant; NFC
1. Better variable names
2. Remove unnecessary check of ConstantInt

llvm-svn: 279094
2016-08-18 17:37:26 +00:00
Artur Pilipenko 615b820af6 CVP. Turn marking adds as no wrap (introduced by r278107) off by default
It causes a regression on our internal benchmark. Introduce cvp-dont-process flag and set it off by default while investigating the regression.

llvm-svn: 279082
2016-08-18 16:08:35 +00:00
Davide Italiano d1279df752 [IRCE] Switch over to LLVM_DUMP_METHOD. NFCI.
llvm-svn: 279079
2016-08-18 15:55:49 +00:00
Sanjay Patel 6347807f87 [InstCombine] use m_APInt to allow icmp (mul X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945
https://reviews.llvm.org/rL279066

llvm-svn: 279077
2016-08-18 15:44:44 +00:00
Sanjay Patel 5b112845da [InstCombine] use APInt in isSignTest instead of ConstantInt; NFC
This will enable vector splat folding, but NFC until the callers
have their ConstantInt restrictions removed.

llvm-svn: 279072
2016-08-18 14:59:14 +00:00
Sanjay Patel 4c5e60d95c [InstCombine] use m_APInt to allow icmp (xor X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945

llvm-svn: 279066
2016-08-18 14:10:48 +00:00
Kostya Serebryany 524c3f32e7 [sanitizer-coverage/libFuzzer] instrument comparisons with __sanitizer_cov_trace_cmp[1248] instead of __sanitizer_cov_trace_cmp, don't pass the comparison type to save a bit performance. Use these new callbacks in libFuzzer
llvm-svn: 279027
2016-08-18 01:25:28 +00:00
Vitaly Buka d5ec14989d [asan] Add support of lifetime poisoning into ComputeASanStackFrameLayout
Summary:
We are going to combine poisoning of red zones and scope poisoning.

PR27453

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23623

llvm-svn: 279020
2016-08-18 00:56:58 +00:00
Haicheng Wu e787763275 [LoopUnroll] Move a simple check earlier. NFC.
Move the check of CallInst earlier to skip expensive recursive operations.

Differential Revision: https://reviews.llvm.org/D23611

llvm-svn: 278998
2016-08-17 22:42:58 +00:00
Tim Shen 5c0c063ad5 [LV] Move LoopBodyTraits to a better place, and add comment for simplifying LoopBlocksTraversal. NFC.
Summary: I later (after r278573) found that LoopIterator.h has some overlapping with LoopBodyTraits. It's good to use LoopBodyTraits because a *Traits struct is algorithm independent.

Reviewers: anemet, nadav, mkuper

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23529

llvm-svn: 278996
2016-08-17 22:20:07 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Sanjay Patel daffec91ef [InstCombine] more clean up of foldICmpXorConstant(); NFCI
Use m_APInt for the xor constant, but this is all still guarded by the initial
ConstantInt check, so no vector types should make it in here.

llvm-svn: 278957
2016-08-17 19:45:18 +00:00
Sanjay Patel 6d5f448746 [InstCombine] clean up foldICmpXorConstant(); NFCI
1. Change variable names
2. Use local variables to reduce code
3. Early exit to reduce indent

llvm-svn: 278955
2016-08-17 19:23:42 +00:00
Sanjay Patel 63e14a07e8 [InstCombine] use m_APInt to allow icmp (or X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935

llvm-svn: 278945
2016-08-17 16:38:57 +00:00
Sanjay Patel 943e92efde [InstCombine] clean up foldICmpOrConstant(); NFCI
1. Change variable names
2. Use local variables to reduce code
3. Use ? instead of if/else
4. Use the APInt variable instead of 'RHS' so the removal of the FIXME code will be direct

llvm-svn: 278944
2016-08-17 16:30:43 +00:00
Chad Rosier ea7e4647db Revert "Reassociate: Reprocess RedoInsts after each inst".
This reverts commit r258830, which introduced a bug described in PR28367.

PR28367

llvm-svn: 278938
2016-08-17 15:54:39 +00:00
Sanjay Patel 4f7eb2aa95 [InstCombine] use m_APInt to allow icmp (add X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859

llvm-svn: 278935
2016-08-17 15:24:30 +00:00
Chad Rosier a6822f64f3 Revert "[Reassociate] Avoid iterator invalidation when negating value."
This reverts commit r278928 due to lit test failures.

llvm-svn: 278929
2016-08-17 14:31:34 +00:00
Chad Rosier cf3e8121a6 [Reassociate] Avoid iterator invalidation when negating value.
Differential Revision: https://reviews.llvm.org/D23464
PR28367

llvm-svn: 278928
2016-08-17 14:16:45 +00:00
Jonas Paulsson 7a79422536 [LoopStrenghtReduce] Refactoring and addition of a new target cost function.
Refactored so that a LSRUse owns its fixups, as oppsed to letting the
LSRInstance own them. This makes it easier to rate formulas for
LSRUses, since the fixups are available directly. The Offsets vector
has been removed since it was no longer necessary.

New target hook isFoldableMemAccessOffset(), which is used during formula
rating.

For SystemZ, this is useful to express that loads and stores with
float or vector types with a big/negative offset should be avoided in
loops. Without this, LSR will generate a lot of negative offsets that
would require extra instructions for loading the address.

Updated tests:
test/CodeGen/SystemZ/loop-01.ll

Reviewed by: Quentin Colombet and Ulrich Weigand.
https://reviews.llvm.org/D19152

llvm-svn: 278927
2016-08-17 13:24:19 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Chandler Carruth 67fc52f067 [PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.

This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:

- Array alloca merging
  - To support the above, bottom-up inlining with careful history
    tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
  Instead, it focuses on inlining functions with that attribute.

The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.

The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.

The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.

One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.

Anyways, hopefully a reasonable starting point for this pass.

Differential Revision: https://reviews.llvm.org/D23299

llvm-svn: 278896
2016-08-17 02:56:20 +00:00
Chandler Carruth f702d8ecb6 [Inliner] Add a flag to disable manual alloca merging in the Inliner.
This is off for now while testing can take place to make sure that in
fact we do sufficient stack coloring to fully obviate the manual alloca
array merging.

Some context on why we should be using stack coloring rather than
merging allocas in this way:

LLVM relies very heavily on analyzing pointers as coming from different
allocas in order to make aliasing decisions. These are some of the most
powerful aliasing signals available in LLVM. So merging allocas is an
extremely destructive operation on the LLVM IR -- it takes away highly
valuable and hard to reconstruct information.

As a consequence, inlined functions which happen to have array allocas
that this pattern matches will fail to be properly interleaved unless
SROA manages to hoist everything to an SSA register. Instead, the
inliner will have added an unnecessary dependence that one inlined
function execute after the other because they will have been rewritten
to refer to the same memory.

All that said, folks will reasonably want some time to experiment here
and make sure there are no significant regressions. A flag should give
us an easy knob to test.

For more context, see the thread here:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.html
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html

Differential Revision: https://reviews.llvm.org/D23052

llvm-svn: 278892
2016-08-17 02:40:23 +00:00
Duncan P. N. Exon Smith 362d120488 Scalar: Avoid dereferencing end() in IndVarSimplify
IndVarSimplify::sinkUnusedInvariants calls
BasicBlock::getFirstInsertionPt on the ExitBlock and moves instructions
before it.  This can return end(), so it's not safe to dereference.  Add
an iterator-based overload to Instruction::moveBefore to avoid the UB.

llvm-svn: 278886
2016-08-17 01:54:41 +00:00
Duncan P. N. Exon Smith 9e3edad932 IPO: Swap || operands to avoid dereferencing end()
IsOperandBundleUse conveniently indicates  whether
std::next(F->arg_begin(),UseIndex) will get to (or past) end().  Check
it first to avoid dereferencing end().

llvm-svn: 278884
2016-08-17 01:23:58 +00:00
Duncan P. N. Exon Smith 3bcaa81204 Scalar: Avoid dereferencing end() in InductiveRangeCheckElimination
BasicBlock::Create isn't designed to take iterators (which might be
end()), but pointers (which might be nullptr).  Fix the UB that was
converting end() to a BasicBlock* by calling BasicBlock::getNextNode()
in the first place.

llvm-svn: 278883
2016-08-17 01:16:17 +00:00
Duncan P. N. Exon Smith 0a12729f99 SimplifyCFG: Avoid dereferencing end()
When comparing a User* to a BasicBlock::iterator in
passingValueIsAlwaysUndefined, don't dereference the iterator in case it
is end().

llvm-svn: 278872
2016-08-16 23:57:56 +00:00
Sanjay Patel 60ea1b43d6 [InstCombine] clean up foldICmpAddConstant(); NFCI
1. Fix variable names
2. Add local variables to reduce code
3. Fix code comments
4. Add early exit to reduce indentation
5. Remove 'else' after if -> return
6. Hoist common predicate

llvm-svn: 278864
2016-08-16 22:34:42 +00:00
David Majnemer 744a8753db Preserve the assumption cache more often
We were clearing it out in LoopUnswitch and InlineFunction instead of
attempting to preserve it.

llvm-svn: 278860
2016-08-16 22:07:32 +00:00
Sanjay Patel e47df1ac62 [InstCombine] use m_APInt to allow icmp (sub X, Y), C folds for splat constant vectors
llvm-svn: 278859
2016-08-16 21:53:19 +00:00
Sanjay Patel b9aa67bfcf [InstCombine] fix variable names to match formula comments; NFC
llvm-svn: 278855
2016-08-16 21:26:10 +00:00
David Majnemer 110522bc0f [LoopUnroll] Don't clear out the AssumptionCache on each loop
Clearing out the AssumptionCache can cause us to rescan the entire
function for assumes.  If there are many loops, then we are scanning
over the entire function many times.

Instead of clearing out the AssumptionCache, register all cloned
assumes.

llvm-svn: 278854
2016-08-16 21:09:46 +00:00