Commit Graph

7658 Commits

Author SHA1 Message Date
Sanjay Patel d2d2aa52cd [LowerExpectIntrinsic] make default likely/unlikely ratio bigger
We need the default ratio to be sufficiently large that it triggers transforms 
based on block frequency info (BFI) and plays well with the recently introduced
BranchProbability used by CGP.

Differential Revision: http://reviews.llvm.org/D19435

llvm-svn: 267615
2016-04-26 22:23:38 +00:00
Justin Bogner 90744d215b Reassociate: Simplify using lambdas. NFC
llvm-svn: 267614
2016-04-26 22:22:18 +00:00
David Majnemer 30ffc4ce45 [SROA] Don't falsely report that changes have occured
We would report that the function changed despite creating no new
allocas or performing any promotion.

This fixes PR27316.

llvm-svn: 267507
2016-04-26 01:05:00 +00:00
Chad Rosier e2cbd13e56 [ValueTracking] Improve isImpliedCondition when the dominating cond is false.
llvm-svn: 267430
2016-04-25 17:23:36 +00:00
Andrew Kaylor aa641a5171 Re-commit optimization bisect support (r267022) without new pass manager support.
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267231
2016-04-22 22:06:11 +00:00
Justin Bogner b93949089e PM: Port SinkingPass to the new pass manager
llvm-svn: 267199
2016-04-22 19:54:10 +00:00
Justin Bogner 82077c4ab0 PM: Reorder the functions used for SinkingPass. NFC
This will make the port to the new PM easier to follow.

llvm-svn: 267198
2016-04-22 19:54:04 +00:00
Jun Bum Lim d29a24e4fd [DeadStoreElimination] Shorten beginning of memset overwritten by later stores
Summary: This change will shorten memset if the beginning of memset is overwritten by later stores.

Reviewers: hfinkel, eeckstein, dberlin, mcrosier

Subscribers: mgrang, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18906

llvm-svn: 267197
2016-04-22 19:51:29 +00:00
Justin Bogner 395c2127ed PM: Port DCE to the new pass manager
Also add a very basic test, since apparently there aren't any tests
for DCE whatsoever to add the new pass version to.

llvm-svn: 267196
2016-04-22 19:40:41 +00:00
Chad Rosier 1a4bc110f5 [EarlyCSE/CVP] Add stats for CVPs and make sure to account for any Changes.
llvm-svn: 267187
2016-04-22 18:47:21 +00:00
David Majnemer bfd695d591 [EarlyCSE] Don't add the overflow flags to the hash
We take the intersection of overflow flags while CSE'ing.
This permits us to consider two instructions with different overflow
behavior to be replaceable.

llvm-svn: 267153
2016-04-22 14:12:50 +00:00
Vedant Kumar 6013f45f92 Revert "Initial implementation of optimization bisect support."
This reverts commit r267022, due to an ASan failure:

  http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549

llvm-svn: 267115
2016-04-22 06:51:37 +00:00
David Majnemer d0ce8f1485 [GVN] Respect fast-math-flags on fcmps
We assumed that flags were only present on binary operators.  This is
not true, they may also be present on calls and fcmps.

llvm-svn: 267113
2016-04-22 06:37:51 +00:00
David Majnemer 9554c1339c [EarlyCSE] Take the intersection of flags on instructions
EarlyCSE had inconsistent behavior with regards to flag'd instructions:
- In some cases, it would pessimize if the available instruction had
  different flags by not performing CSE.
- In other cases, it would miscompile if it replaced an instruction
  which had no flags with an instruction which has flags.

Fix this by being more consistent with our flag handling by utilizing
andIRFlags.

llvm-svn: 267111
2016-04-22 06:37:45 +00:00
Andrew Kaylor f0f279291c Initial implementation of optimization bisect support.
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.

The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.

The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267022
2016-04-21 17:58:54 +00:00
Adam Nemet 963341c872 [LoopUtils] Move def of findStringMetadataForLoop to LoopUtils.cpp. NFC
The decl is in LoopUtils.h.  I think that this was added to
LoopVersioningLICM.cpp by mistake.

llvm-svn: 267014
2016-04-21 17:33:17 +00:00
Adam Nemet f787826b46 [LoopUtils] Rename {check->find}StringMetadata{Into->For}Loop. NFC
"Into" was misleading.  I am also planning to use this helper to look
for loop metadata and return the argument, so find seems like a better
name.

llvm-svn: 267013
2016-04-21 17:33:12 +00:00
Chad Rosier b346dcbc25 Typo.
llvm-svn: 266905
2016-04-20 19:16:23 +00:00
Chad Rosier 41dd31f0b0 [ValueTracking] Make isImpliedCondition return an Optional<bool>. NFC.
Phabricator Revision: http://reviews.llvm.org/D19277

llvm-svn: 266904
2016-04-20 19:15:26 +00:00
Chad Rosier b7dfbb40a3 [ValueTracking] Improve isImpliedCondition for conditions with matching operands.
This patch improves SimplifyCFG to catch cases like:

  if (a < b) {
    if (a > b) <- known to be false
      unreachable;
  }

Phabricator Revision: http://reviews.llvm.org/D18905

llvm-svn: 266767
2016-04-19 17:19:14 +00:00
Michael Kuperstein de16b44f74 Port DemandedBits to the new pass manager.
Differential Revision: http://reviews.llvm.org/D18679

llvm-svn: 266699
2016-04-18 23:55:01 +00:00
Mehdi Amini b550cb1750 [NFC] Header cleanup
Removed some unused headers, replaced some headers with forward class declarations.

Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'

Patch by Eugene Kosov <claprix@yandex.ru>

Differential Revision: http://reviews.llvm.org/D19219

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
2016-04-18 09:17:29 +00:00
Duncan P. N. Exon Smith a71301befa Transforms: Fix bootstrap after r266565
Apparently there isn't test coverage for all of these.  I'd appreciate
if someone with could reproduce and send me something to reduce, but for
now I've just looked for users of RemapInstruction and MapValue and
ensured they don't accidentally insert nullptr.  Here is one of the
bootstraps that caught:

  http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/11494

llvm-svn: 266567
2016-04-17 19:26:49 +00:00
Justin Lebar cad81cf6b3 [Speculation] Add a SpeculativeExecution mode where the pass does nothing unless TTI::hasBranchDivergence() is true.
Summary:
This lets us add this pass to the IR pass manager unconditionally; it
will simply not do anything on targets without branch divergence.

Reviewers: tra

Subscribers: llvm-commits, jingyue, rnk, chandlerc

Differential Revision: http://reviews.llvm.org/D18625

llvm-svn: 266398
2016-04-15 00:32:09 +00:00
Nicolai Haehnle 05b127da06 [StructurizeCFG] Annotate branches that were treated as uniform
Summary:
This fully solves the problem where the StructurizeCFG pass does not
consider the same branches as uniform as the SIAnnotateControlFlow pass.
The patch in D19013 helps with this problem, but is not sufficient
(and, interestingly, causes a "regression" with one of the existing
test cases).

No tests included here, because tests in D19013 already cover this.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19018

llvm-svn: 266346
2016-04-14 17:42:35 +00:00
Tim Northover 5c02f9ad28 ARM: override cost function to re-enable ConstantHoisting (& fix it).
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:

  + ConstantHoisting was modifying switch statements. This is simply invalid,
    the cases must remain integer constants no matter the notional cost.
  + ConstantHoisting was mangling alloca instructions in the entry block. These
    should be handled by FrameLowering, so constants actually have a cost of 0.
    Worse, the resulting bitcasts meant they became dynamic allocas.

rdar://25707382

llvm-svn: 266260
2016-04-13 23:08:27 +00:00
Betul Buyukkurt bf8554c279 [PGO] Remove redundant VP instrumentation
LLVM optimization passes may reduce a profiled target expression
to a constant. Removing runtime calls at such instrumentation points
would help speedup the runtime of the instrumented program.

llvm-svn: 266229
2016-04-13 18:52:19 +00:00
Sanjoy Das 5ce3272833 Don't IPO over functions that can be de-refined
Summary:
Fixes PR26774.

If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".

Motivation:

I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard.  So transforming:

```
void f(unsigned x) {
  unsigned t = 5 / x;
  (void)t;
}
```

to

```
void f(unsigned x) { }
```

is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).

Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM.  For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).

Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have.  This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.

For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store.  As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal.  The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal.  Such a
refined variant will look like it is `readonly`.  However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.

Note: this is not just a problem with atomics or with linking
differently optimized object files.  See PR26774 for more realistic
examples that involved neither.

This patch:

This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time.  It then changes a set of IPO passes to bail out if they see
such a function.

Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18634

llvm-svn: 265762
2016-04-08 00:48:30 +00:00
Ulrich Weigand fc23907673 [GVN] Address review comments for D18662
As suggested by Chandler in his review comments for D18662, this
follow-on patch renames some variables in GetLoadValueForLoad and
CoerceAvailableValueToLoadType to hopefully make it more obvious
which variables hold value sizes and which hold load/store sizes.

No functional change intended.

llvm-svn: 265687
2016-04-07 15:55:11 +00:00
Ulrich Weigand 6e6966460a [GVN] Fix handling of sub-byte types in big-endian mode
When GVN wants to re-interpret an already available value in a smaller
type, it needs to right-shift the value on big-endian systems to ensure
the correct bytes are accessed.  The shift value is the difference of
the sizes of the two types.

This is correct as long as both types occupy multiples of full bytes.
However, when one of them is a sub-byte type like i1, this no longer
holds true: we still need to shift, but only to access the correct
*byte*.  Accessing bits within the byte requires no shift in either
endianness; e.g. an i1 resides in the least-significant bit of its
containing byte on both big- and little-endian systems.

Therefore, the appropriate shift value to be used is the difference of
the *storage* sizes of the two types.  This is already handled correctly
in one place where such a shift takes place (GetStoreValueForLoad), but
is incorrect in two other places: GetLoadValueForLoad and
CoerceAvailableValueToLoadType.

This patch changes both places to use the storage size as well.

Differential Revision: http://reviews.llvm.org/D18662

llvm-svn: 265684
2016-04-07 15:45:02 +00:00
Duncan P. N. Exon Smith da68cbc4ad IR: RF_IgnoreMissingValues => RF_IgnoreMissingLocals, NFC
Clarify what this RemapFlag actually means.

  - Change the flag name to match its intended behaviour.
  - Clearly document that it's not supposed to affect globals.
  - Add a host of FIXMEs to indicate how to fix the behaviour to match
    the intent of the flag.

RF_IgnoreMissingLocals should only affect the behaviour of
RemapInstruction for function-local operands; namely, for operands of
type Argument, Instruction, and BasicBlock.  Currently, it is *only*
passed into RemapInstruction calls (and the transitive MapValue calls
that it makes).

When I split Metadata from Value I didn't understand the flag, and I
used it in a bunch of places for "global" metadata.

This commit doesn't have any functionality change, but prepares to
cleanup MapMetadata and MapValue.

llvm-svn: 265628
2016-04-07 00:26:43 +00:00
JF Bastien 800f87a871 NFC: make AtomicOrdering an enum class
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.

The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).

This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.

As a follow-up I'll add:
  bool operator<(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>(AtomicOrdering, AtomicOrdering) = delete;
  bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.

Reviewers: jyknight, reames

Subscribers: jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D18775

llvm-svn: 265602
2016-04-06 21:19:33 +00:00
Fiona Glaser 045afc4f66 Loop Unroll: add options and tweak to make Partial unrolling more useful
1. Add FullUnrollMaxCount option that works like MaxCount, but also limits
   the unroll count for fully unrolled loops. So if a loop has an iteration
   count over this, it won't fully unroll.
2. Add CLI options for MaxCount and the new option, so they can be tested
   (plus a test).
3. Make partial unrolling obey MaxCount.

An example use-case (the out of tree one this is originally designed for) is
a target’s TTI can analyze a loop and decide on a max unroll count separate
from the size threshold, e.g. based on register pressure, then constrain
LoopUnroll to not exceed that, regardless of the size of the unrolled loop.

llvm-svn: 265562
2016-04-06 16:57:25 +00:00
Fiona Glaser 16332ba861 LoopUnroll: only allow non-modulo Partial unrolling when Runtime=true
Patch by Evgeny Stupachenko <evstupac@gmail.com>.

llvm-svn: 265558
2016-04-06 16:43:45 +00:00
Chad Rosier 074ce836f0 Simplify logic. NFC.
llvm-svn: 265537
2016-04-06 13:27:13 +00:00
Richard Trieu f35d4b0928 Add parentheses to silence warning.
llvm-svn: 265516
2016-04-06 04:22:00 +00:00
Sanjoy Das 99abb2728b [RS4GC] Add a comment
llvm-svn: 265503
2016-04-06 01:33:54 +00:00
Sanjoy Das 8d89a2b296 [RS4GC] NFC cleanup of the DeferredReplacement class
Instead of constructors use clearly named factory methods.

llvm-svn: 265486
2016-04-05 23:18:53 +00:00
Sanjoy Das 49e974b33b [RS4GC] Better codegen for deoptimize calls
Don't emit a gc.result for a statepoint lowered from
@llvm.experimental.deoptimize since the call into __llvm_deoptimize is
effectively noreturn.  Instead follow the corresponding gc.statepoint
with an "unreachable".

llvm-svn: 265485
2016-04-05 23:18:35 +00:00
Sanjay Patel e77c7de459 use range loop; NFCI
llvm-svn: 265360
2016-04-04 23:05:06 +00:00
Zia Ansari a82a58a4e5 Enable unroll for constant bound loops when TripCount is not modulo of unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit.
Commit for Evgeny Stupachenko (evstupac@gmail.com)

Differential Revision: http://reviews.llvm.org/D18290

llvm-svn: 265337
2016-04-04 19:24:46 +00:00
Sanjoy Das 021de058df Introduce a @llvm.experimental.guard intrinsic
Summary:
As discussed on llvm-dev[1].

This change adds the basic boilerplate code around having this intrinsic
in LLVM:

 - Changes in Intrinsics.td, and the IR Verifier
 - A lowering pass to lower @llvm.experimental.guard to normal
   control flow
 - Inliner support

[1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html

Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18527

llvm-svn: 264976
2016-03-31 00:18:46 +00:00
David Majnemer 5d518386b6 [IndVarSimplify] Don't insert after a catchswitch
Widening a PHI requires us to insert a trunc.
The logical place for this trunc is in the same BB as the PHI.
This is not possible if the BB is terminated by a catchswitch.

This fixes PR27133.

llvm-svn: 264926
2016-03-30 21:12:06 +00:00
Adam Nemet 1428d41f9a [LoopDataPrefetch] Centralize the tuning cl::opts under the pass
This is effectively NFC, minus the renaming of the options
(-cyclone-prefetch-distance -> -prefetch-distance).

The change was requested by Tim in D17943.

llvm-svn: 264806
2016-03-29 23:45:52 +00:00
Duncan P. N. Exon Smith e8eb94a9a5 ADCE: Remove debug info intrinsics in dead scopes
During ADCE, track which debug info scopes still have live references
from the code, and delete debug info intrinsics for the dead ones.

These intrinsics describe the locations of variables (in registers or
stack slots).  If there's no code left corresponding to a variable's
scope, then there's no way to reference the variable in the debugger and
it doesn't matter what its value is.

I add a DEBUG printout when the described location in an SSA register,
in case it helps some trying to track down why locations get lost.
However, we still delete these; the scope itself isn't attached to any
real code, so the ship has already sailed.

llvm-svn: 264800
2016-03-29 22:57:12 +00:00
Adam Nemet 85fba39390 [LoopDataPrefetch] Make more member functions private, NFC.
llvm-svn: 264798
2016-03-29 22:40:02 +00:00
Hyojin Sung 4673f10568 [SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops
When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes
is currently used to recognize potential loops of which the block is the header and keep the block.
However, the current algorithm fails if the loops' exit condition is evaluated only with volatile
values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested
loop, the loop is collapsed into a single loop which prevent later optimizations from being
applied (e.g., transforming nested loops into simplified forms and loop vectorization).
    
The patch augments the existing PHI node-based check by adding a pre-test if the BB actually
belongs to a set of loop headers and not eliminating it if yes.

llvm-svn: 264697
2016-03-29 04:08:57 +00:00
Reid Kleckner ba85781f58 Revert "[SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops"
This reverts commit r264596.

It does not compile.

llvm-svn: 264604
2016-03-28 18:07:40 +00:00
Hyojin Sung 0ada5b0d14 [SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops
When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes
is currently used to recognize potential loops of which the block is the header and keep the block.
However, the current algorithm fails if the loops' exit condition is evaluated only with volatile
values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested
loop, the loop is collapsed into a single loop which prevent later optimizations from being 
applied (e.g., transforming nested loops into simplified forms and loop vectorization).

The patch augments the existing PHI node-based check by adding a pre-test if the BB actually 
belongs to a set of loop headers and not eliminating it if yes. 

llvm-svn: 264596
2016-03-28 17:22:25 +00:00
Hal Finkel 5c83a090bc [SROA] Fix typo in comment
llvm-svn: 264573
2016-03-28 11:23:21 +00:00
Hal Finkel 29f5131daf C++11 is required, remove some preprocessor checks for it
We require C++11 to build, so remove a few remaining preprocessor checks for
'__cplusplus >= 201103L'. This should always be true.

llvm-svn: 264572
2016-03-28 11:13:03 +00:00
Sanjoy Das d4c783335b [RS4GC] Lower calls to @llvm.experimental.deoptimize
This changes RS4GC to lower calls to ``@llvm.experimental.deoptimize``
to gc.statepoints wrapping ``__llvm_deoptimize``, and changes
``callsGCLeafFunction`` to recognize ``@llvm.experimental.deoptimize``
as a non GC leaf function.

I've had to hard code the ``"__llvm_deoptimize"`` name in
RewriteStatepointsForGC; since ``TargetLibraryInfo`` is available only
during codegen.  This isn't without precedent in the codebase, so I'm
not overtly concerned.

llvm-svn: 264456
2016-03-25 20:12:13 +00:00
David L Kreitzer 8d441eb936 Enable non-power-of-2 #pragma unroll counts.
Patch by Evgeny Stupachenko.

Differential Revision: http://reviews.llvm.org/D18202

llvm-svn: 264407
2016-03-25 14:24:52 +00:00
David Majnemer e09d035dad [LoopStrengthReduce] Don't hoist into a catchswitch
We try to hoist the insertion point as high as possible to encourage
sharing.  However, we must be careful not to hoist into a catchswitch as
it is both an EHPad and a terminator.

llvm-svn: 264344
2016-03-24 21:40:22 +00:00
Adam Nemet 7aba60c853 [LLE] Check for mismatching types between the store and the load earlier
isDependenceDistanceOfOne asserts that the store and the load access
through the same type.  This function is also used by
removeDependencesFromMultipleStores so we need to make sure we filter
out mismatching types before reaching this point.

Now we do this when the initial candidates are gathered.

This is a refinement of the fix made in r262267.

Fixes PR27048.

llvm-svn: 264313
2016-03-24 17:59:26 +00:00
Zinovy Nis 07ac2bd4d0 [PATCH] Force LoopReroll to reset the loop trip count value after reroll.
It's a bug fix. 
For rerolled loops SE trip count remains unchanged. It leads to incorrect work of the next passes.
My patch just resets SE info for rerolled loop forcing SE to re-evaluate it next time it requested.
I also added a verifier call in the exisitng test to be sure no invalid SE data remain. Without my fix this test would fail with -verify-scev.

Differential Revision: http://reviews.llvm.org/D18316

llvm-svn: 264051
2016-03-22 13:50:57 +00:00
Adam Nemet 709e3046ee [LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead
Summary:
It can hurt performance to prefetch ahead too much.  Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.

Reviewers: hfinkel

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17949

llvm-svn: 263772
2016-03-18 00:27:43 +00:00
Adam Nemet 6d8beeca53 [LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses
Summary:
And use this TTI for Cyclone.  As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.

I am also adding tests for this and the previous change (D17943):

* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either

Reviewers: hfinkel

Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17945

llvm-svn: 263771
2016-03-18 00:27:38 +00:00
Adam Nemet 5eccf07df3 [LoopVersioning] Annotate versioned loop with noalias metadata
Summary:
If we decide to version a loop to benefit a transformation, it makes
sense to record the now non-aliasing accesses in the newly versioned
loop.  This allows non-aliasing information to be used by subsequent
passes.

One example is 456.hmmer in SPECint2006 where after loop distribution,
we vectorize one of the newly distributed loops.  To vectorize we
version this loop to fully disambiguate may-aliasing accesses.  If we
add the noalias markers, we can use the same information in a later DSE
pass to eliminate some dead stores which amounts to ~25% of the
instructions of this hot memory-pipeline-bound loop.  The overall
performance improves by 18% on our ARM64.

The scoped noalias annotation is added in LoopVersioning.  The patch
then enables this for loop distribution.  A follow-on patch will enable
it for the vectorizer.  Eventually this should be run by default when
versioning the loop but first I'd like to get some feedback whether my
understanding and application of scoped noalias metadata is correct.

Essentially my approach was to have a separate alias domain for each
versioning of the loop.  For example, if we first version in loop
distribution and then in vectorization of the distributed loops, we have
a different set of memchecks for each versioning.  By keeping the scopes
in different domains they can conveniently be defined independently
since different alias domains don't affect each other.

As written, I also have a separate domain for each loop.  This is not
necessary and we could save some metadata here by using the same domain
across the different loops.  I don't think it's a big deal either way.

Probably the best is to review the tests first to see if I mapped this
problem correctly to scoped noalias markers.  I have plenty of comments
in the tests.

Note that the interface is prepared for the vectorizer which needs the
annotateInstWithNoAlias API.  The vectorizer does not use LoopVersioning
so we need a way to pass in the versioned instructions.  This is also
why the maps have to become part of the object state.

Also currently, we only have an AA-aware DSE after the vectorizer if we
also run the LTO pipeline.  Depending how widely this triggers we may
want to schedule a DSE toward the end of the regular pass pipeline.

Reviewers: hfinkel, nadav, ashutosh.nema

Subscribers: mssimpso, aemerson, llvm-commits, mcrosier

Differential Revision: http://reviews.llvm.org/D16712

llvm-svn: 263743
2016-03-17 20:32:32 +00:00
Sanjoy Das c9058ca9e0 [Statepoints] Export a magic constant into a header; NFC
llvm-svn: 263733
2016-03-17 18:42:17 +00:00
Sanjoy Das 312038872d [Statepoints] Separate out logic for statepoint directives; NFC
This splits out the logic that maps the `"statepoint-id"` attribute into
the actual statepoint ID, and the `"statepoint-num-patch-bytes"`
attribute into the number of patchable bytes the statpeoint is lowered
into.  The new home of this logic is in IR/Statepoint.cpp, and this
refactoring will support similar functionality when lowering calls with
deopt operand bundles in the future.

llvm-svn: 263685
2016-03-17 01:56:10 +00:00
Geoff Berry 56fabf9b55 Revert "[LSR] Create fewer redundant instructions."
This reverts commit r263644.  Investigating bootstrap failures.

llvm-svn: 263655
2016-03-16 19:21:47 +00:00
Geoff Berry 459b750871 [LSR] Create fewer redundant instructions.
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs.  This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.

Reviewers: atrick

Subscribers: mcrosier, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18001

llvm-svn: 263644
2016-03-16 17:29:49 +00:00
Haicheng Wu 7873857a88 [JumpThreading] See through Cast Instructions
To capture more jump-thread opportunity.

llvm-svn: 263618
2016-03-16 04:52:52 +00:00
Haicheng Wu 64d9d7c3f7 Revert "[JumpThreading] Simplify Instructions first in ComputeValueKnownInPredecessors()"
Not sure it handles undef properly.

llvm-svn: 263605
2016-03-15 23:38:47 +00:00
Justin Lebar 6827de19b2 [LoopUnroll] Respect the convergent attribute.
Summary:
Specifically, when we perform runtime loop unrolling of a loop that
contains a convergent op, we can only unroll k times, where k divides
the loop trip multiple.

Without this change, we'll happily unroll e.g. the following loop

  for (int i = 0; i < N; ++i) {
    if (i == 0) convergent_op();
    foo();
  }

into

  int i = 0;
  if (N % 2 == 1) {
    convergent_op();
    foo();
    ++i;
  }
  for (; i < N - 1; i += 2) {
    if (i == 0) convergent_op();
    foo();
    foo();
  }.

This is unsafe, because we've just added a control-flow dependency to
the convergent op in the prelude.

In general, runtime unrolling loops that contain convergent ops is safe
only if we don't have emit a prelude, which occurs when the unroll count
divides the trip multiple.

Reviewers: resistor

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17526

llvm-svn: 263509
2016-03-14 23:15:34 +00:00
Amaury Sechet bdb261b4c0 Imporove load to store => memcpy
Summary: This now try to reorder instructions in order to help create the optimizable pattern.

Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer

Differential Revision: http://reviews.llvm.org/D16523

llvm-svn: 263503
2016-03-14 22:52:27 +00:00
Chad Rosier 00bd82cade [CVP] Replace nonnegative with positive, per Philip's request. NFC.
llvm-svn: 263430
2016-03-14 13:48:00 +00:00
Haicheng Wu d60ae33d29 [CVP] Convert an SDiv to a UDiv if both operands are known to be nonnegative
The motivating example is this

for (j = n; j > 1; j = i) {
   i = j / 2;
}

The signed division is safely to be changed to an unsigned division (j is known
to be larger than 1 from the loop guard) and later turned into a single shift
without considering the sign bit.

llvm-svn: 263406
2016-03-14 03:24:28 +00:00
Mehdi Amini ba9fba81d6 Remove PreserveNames template parameter from IRBuilder
This reapplies r263258, which was reverted in r263321 because
of issues on Clang side.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263393
2016-03-13 21:05:13 +00:00
Eric Christopher 35abd051c0 Temporarily revert:
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date:   Fri Mar 11 17:15:50 2016 +0000

    Remove PreserveNames template parameter from IRBuilder

    Summary:
    Following r263086, we are now relying on a flag on the Context to
    discard Value names in release builds.

    Reviewers: chandlerc

    Subscribers: mzolotukhin, llvm-commits

    Differential Revision: http://reviews.llvm.org/D18023

    From: Mehdi Amini <mehdi.amini@apple.com>

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
    91177308-0d34-0410-b5e6-96231b3b80d8

until we can figure out what to do about clang and Release build testing.

This reverts commit 263258.

llvm-svn: 263321
2016-03-12 01:47:22 +00:00
Mehdi Amini 99eab3dd06 Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.

Reviewers: chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18023

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
2016-03-11 17:15:50 +00:00
Mehdi Amini 1e9c925182 Do not specialize IRBuilder to strip names in SROA
Summary:
Following r263086, we are replacing this by a runtime check.
More cleanup will follow on the IRBuilder itself, but I submitted
this patch separately as SROA has a fancy "prefixInserter" class
that needs extra-love.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18022

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263256
2016-03-11 17:15:34 +00:00
Chandler Carruth ace8c8f765 [PM] Sink the "Expression" type for GVN into the class as a private
member type.

Because of how this type is used by the ValueTable, it cannot actually
have hidden visibility. GCC actually nicely warns about this but Clang
just silently ... I don't even know. =/ We should do a better job either
way though.

This should resolve a bunch of the GCC warnings about visibility that
the port of GVN triggered and make the visibility story a bit more
correct.

llvm-svn: 263250
2016-03-11 16:25:19 +00:00
Chandler Carruth 3bc9c7fb45 [PM] The order of evaluation of these analyses is actually significant,
much to my horror, so use variables to fix it in place.

This terrifies me. Both basic-aa and memdep will provide more precise
information when the domtree and/or the loop info is available. Because
of this, if your pass (like GVN) requires domtree, and then queries
memdep or basic-aa, it will get more precise results. If it does this in
the other order, it gets less precise results.

All of the ideas I have for fixing this are, essentially, terrible. Here
I've just caused us to stop having unspecified behavior as different
implementations evaluate the order of these arguments differently. I'm
actually rather glad that they do, or the fragility of memdep and
basic-aa would have gone on unnoticed. I've left comments so we don't
immediately break this again. This should fix bots whose host compilers
evaluate the order of arguments differently from Clang.

llvm-svn: 263231
2016-03-11 13:26:47 +00:00
Chandler Carruth b47f8010a9 [PM] Make the AnalysisManager parameter to run methods a reference.
This was originally a pointer to support pass managers which didn't use
AnalysisManagers. However, that doesn't realistically come up much and
the complexity of supporting it doesn't really make sense.

In fact, *many* parts of the pass manager were just assuming the pointer
was never null already. This at least makes it much more explicit and
clear.

llvm-svn: 263219
2016-03-11 11:05:24 +00:00
Chandler Carruth 89c45a162f [PM] Port GVN to the new pass manager, wire it up, and teach a couple of
tests to run GVN in both modes.

This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.

Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.

I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.

Differential Revision: http://reviews.llvm.org/D18019

llvm-svn: 263208
2016-03-11 08:50:55 +00:00
Adam Nemet efb234135c [LLE] Add missed LoopSimplify dependence
The code assumed that we always had a preheader without making the pass
dependent on LoopSimplify.

Thanks to Mattias Eriksson V for reporting this.

llvm-svn: 263173
2016-03-10 23:54:39 +00:00
Chandler Carruth 37f1f12226 [SROA] Fix PR25873, which Andrea Di Biagio analyzed the daylights out
of, and I misdiagnosed for months and months.

Andrea has had a patch for this forever, but I just couldn't see how
it was fixing the root cause of the problem. It didn't make sense to me,
even though the patch was perfectly good and the analysis of the actual
failure event was *fantastic*.

Well, I came back to it today because the patch has sat for *far* too
long and needs attention and decided I wouldn't let it go until I really
understood what was going on. After quite some time in the debugger,
I finally realized that in fact I had just missed an important case with
my previous attempt to fix PR22093 in r225149. Not only do we need to
handle loads that won't be split, but stores-of-loads that we won't
split. We *do* actually have enough logic in the presplitting to form
new slices for split stores.... *unless* we decided not to split them!

I'm so sorry that it took me this long to come to the realization that
this is the issue. It seems so obvious in hind sight (of course).
Anyways, the fix becomes *much* smaller and more focused. The fact that
we're left doing integer smashing is related to the FIXME in my original
commit: fundamentally, we're not aggressive about pre-splitting for
loads and stores to the same alloca. If we want to get aggressive about
this, it'll need both what Andrea had put into the proposed fix, but
also a *lot* more logic to essentially iteratively pre-split the alloca
until we can't do any more. As I said in that commit log, its really
unclear that this is the right call. Instead, the integer blending and
letting targets lower this to narrower stores seems slightly better. But
we definitely shouldn't really go down that path just to fix this bug.

Again, tons of thanks are owed to Andrea and others at Sony for working
on this bug. I really should have seen what was going on here and
re-directed them sooner. =////

llvm-svn: 263121
2016-03-10 15:31:17 +00:00
Chandler Carruth d94a5962cc [SROA] Clean up some really weird code, no functionality changed.
We already have the instruction extracted into 'I', just cast that to
a store the way we do for loads. Also, we don't enter the if unless SI
is non-null, so don't test it again for null.

I'm pretty sure the entire test there can be nuked, but this is just the
trivial cleanup.

llvm-svn: 263112
2016-03-10 14:16:18 +00:00
Chandler Carruth 7776377e62 [gvn] Fix more indenting and formatting in regions of code that will
need to be changed for porting to the new pass manager.

Also sink the comment on the ValueTable class back to that class instead
of it dangling on an anonymous namespace.

No functionality changed.

llvm-svn: 263084
2016-03-10 00:58:20 +00:00
Chandler Carruth 169c84f1cc [gvn] Reformat a chunk of the GVN code that is strangely indented prior
to restructuring it for porting to the new pass manager.

No functionality changed.

llvm-svn: 263083
2016-03-10 00:58:18 +00:00
Chandler Carruth 61440d225b [PM] Port memdep to the new pass manager.
This is a fairly straightforward port to the new pass manager with one
exception. It removes a very questionable use of releaseMemory() in
the old pass to invalidate its caches between runs on a function.
I don't think this is really guaranteed to be safe. I've just used the
more direct port to the new PM to address this by nuking the results
object each time the pass runs. While this could cause some minor malloc
traffic increase, I don't expect the compile time performance hit to be
noticable, and it makes the correctness and other aspects of the pass
much easier to reason about. In some cases, it may make things faster by
making the sets and maps smaller with better locality. Indeed, the
measurements collected by Bruno (thanks!!!) show mostly compile time
improvements.

There is sadly very limited testing at this point as there are only two
tests of memdep, and both rely on GVN. I'll be porting GVN next and that
will exercise this heavily though.

Differential Revision: http://reviews.llvm.org/D17962

llvm-svn: 263082
2016-03-10 00:55:30 +00:00
Philip Reames e0a5454df4 Fix the build
I screwed up rebasing 263072.  This change fixes the build and passes all make check.

llvm-svn: 263073
2016-03-09 23:07:53 +00:00
Philip Reames b54c8e6eea [LICM] Store promotion when memory is thread local
This patch teaches LICM's implementation of store promotion to exploit the fact that the memory location being accessed might be provable thread local. The fact it's thread local weakens the requirements for where we can insert stores since no other thread can observe the write. This allows us perform store promotion even in cases where the store is not guaranteed to execute in the loop.

Two key assumption worth drawing out is that this assumes a) no-capture is strong enough to imply no-escape, and b) standard allocation functions like malloc, calloc, and operator new return values which can be assumed not to have previously escaped.

In future work, it would be nice to generalize this so that it works without directly seeing the allocation site. I believe that the nocapture return attribute should be suitable for this purpose, but haven't investigated carefully. It's also likely that we could support unescaped allocas with similar reasoning, but since SROA and Mem2Reg should destroy those, they're less interesting than they first might seem.

Differential Revision: http://reviews.llvm.org/D16783

llvm-svn: 263072
2016-03-09 22:59:30 +00:00
Adam Nemet 660748ca8c [LLE] Add missing check for unit stride
I somehow missed this.  The case in GCC (global_alloc) was similar to
the new testcase except it had an array of structs rather than a two
dimensional array.

Fixes RP26885.

llvm-svn: 263058
2016-03-09 20:47:55 +00:00
Adam Nemet 34785ecff1 [LoopDataPrefetch] Add stats and debug output
llvm-svn: 262998
2016-03-09 05:33:21 +00:00
Sanjoy Das 2eac48de9e Return StringRef instead of a naked char*; NFC
llvm-svn: 262989
2016-03-09 02:34:19 +00:00
Sanjoy Das f13900f8ac [IRCE] Reflow comments; NFC
llvm-svn: 262988
2016-03-09 02:34:15 +00:00
Sanjay Patel f831fdb56a fix variable name; NFC
llvm-svn: 262953
2016-03-08 19:07:42 +00:00
Sanjay Patel 5c96723622 use range-based loop; NFCI
llvm-svn: 262952
2016-03-08 19:06:12 +00:00
Adam Nemet bb3680bd85 [LoopDataPrefetch] If prefetch distance is not set, skip pass
This lets select sub-targets enable this pass.  The patch implements the
idea from the recent llvm-dev thread:
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/94925

The goal is to enable the LoopDataPrefetch pass for the Cyclone
sub-target only within Aarch64.

Positive and negative tests will be included in an upcoming patch that
enables selective prefetching of large-strided accesses on Cyclone.

llvm-svn: 262844
2016-03-07 18:35:42 +00:00
Adam Nemet efc091f457 [LLE] Fix a comment
llvm-svn: 262270
2016-02-29 23:21:12 +00:00
Adam Nemet 83be06e529 [LLE] Fix SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper with Polly
We can actually have dependences between accesses with different
underlying types.  Bail in this case.

A test will follow shortly.

llvm-svn: 262267
2016-02-29 22:53:59 +00:00
Chandler Carruth ad8cb382fa [LICM] Teach LICM how to handle cases where the alias set tracker was
merged into a loop that was subsequently unrolled (or otherwise nuked).

In this case it can't merge in the ASTs for any remaining nested loops,
it needs to re-add their instructions dircetly.

The fix is very isolated, but I've pulled the code for merging blocks
into the AST into a single place in the process. The only behavior
change is in the case which would have crashed before.

This fixes a crash reported by Mikael Holmen on the list after r261316
restored much of the loop pass pipelining and allowed us to actually do
this kind of nested transformation sequenc. I've taken that test case
and further reduced it into the somewhat twisty maze of loops in the
included test case. This does in fact trigger the bug even in this
reduced form.

llvm-svn: 262108
2016-02-27 04:34:07 +00:00
Haicheng Wu 5539f852ae [JumpThreading] Simplify Instructions first in ComputeValueKnownInPredecessors()
This change tries to find more opportunities to thread over basic blocks.

llvm-svn: 261981
2016-02-26 06:06:04 +00:00
Michael Zolotukhin 9f520ebc54 [LoopUnrollAnalyzer] Check that we're using SCEV for the same loop we're simulating.
Summary: Check that we're using SCEV for the same loop we're simulating. Otherwise, we might try to use the iteration number of the current loop in SCEV expressions for inner/outer loops IVs, which is clearly incorrect.

Reviewers: chandlerc, hfinkel

Subscribers: sanjoy, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17632

llvm-svn: 261958
2016-02-26 02:57:05 +00:00
Adam Nemet fb31d580ea [LoopDataPrefetch] Make it testable with opt
Summary:
Since this is an IR pass it's nice to be able to write tests without
llc.  This is the counterpart of the llc test under
CodeGen/PowerPC/loop-data-prefetch.ll.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17464

llvm-svn: 261578
2016-02-22 21:41:22 +00:00
Philip Reames ce38c2ddf6 [RS4GC] "Constant fold" the rs4gc-split-vector-values flag
This flag was part of a migration to a new means of handling vectors-of-points which was described in the llvm-dev thread "FYI: Relocating vector of pointers".  The old code path has been off by default for a while without complaints, so time to cleanup.

llvm-svn: 261569
2016-02-22 21:01:28 +00:00
Philip Reames 79fa9b75c0 [RS4GC] Revert optimization attempt due to memory corruption
This change reverts "246133 [RewriteStatepointsForGC] Reduce the number of new instructions for base pointers" and a follow on bugfix 12575.

As pointed out in pr25846, this code suffers from a memory corruption bug.  Since I'm (empirically) not going to get back to this any time soon, simply reverting the problematic change is the right answer.

llvm-svn: 261565
2016-02-22 20:45:56 +00:00
Elena Demikhovsky 9914dbd11b Allow setting MaxRerollIterations above 16
By Ayal Zaks.

Differential Revision http://reviews.llvm.org/D17258

llvm-svn: 261517
2016-02-22 09:38:28 +00:00
Duncan P. N. Exon Smith e9bc579c37 ADT: Remove == and != comparisons between ilist iterators and pointers
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.

Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators.  (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)

There should be NFC here.

llvm-svn: 261498
2016-02-21 20:39:50 +00:00
Sanjoy Das 979a11d5b2 [LoopDeletion] Add an assert that verifies LCSSA
This is inspired by PR24804 -- had this assert been there before,
isolating the root cause for PR24804 would have been far easier.

llvm-svn: 261481
2016-02-21 17:11:59 +00:00
Chandler Carruth 31088a9d58 [LPM] Factor all of the loop analysis usage updates into a common helper
routine.

We were getting this wrong in small ways and generally being very
inconsistent about it across loop passes. Instead, let's have a common
place where we do this. One minor downside is that this will require
some analyses like SCEV in more places than they are strictly needed.
However, this seems benign as these analyses are complete no-ops, and
without this consistency we can in many cases end up with the legacy
pass manager scheduling deciding to split up a loop pass pipeline in
order to run the function analysis half-way through. It is very, very
annoying to fix these without just being very pedantic across the board.

The only loop passes I've not updated here are ones that use
AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer.
They seemed less relevant.

With this patch, almost all of the problems in PR24804 around loop pass
pipelines are fixed. The one remaining issue is that we run simplify-cfg
and instcombine in the middle of the loop pass pipeline. We've recently
added some loop variants of these passes that would seem substantially
cleaner to use, but this at least gets us much closer to the previous
state. Notably, the seven loop pass managers is down to three.

I've not updated the loop passes using LoopAccessAnalysis because that
analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't
clear that those transforms want to support those forms anyways. They
all run late anyways, so this is harmless. Similarly, LSR is left alone
because it already carefully manages its forms and doesn't need to get
fused into a single loop pass manager with a bunch of other loop passes.

LoopReroll didn't use loop simplified form previously, and I've updated
the test case to match the trivially different output.

Finally, I've also factored all the pass initialization for the passes
that use this technique as well, so that should be done regularly and
reliably.

Thanks to James for the help reviewing and thinking about this stuff,
and Ben for help thinking about it as well!

Differential Revision: http://reviews.llvm.org/D17435

llvm-svn: 261316
2016-02-19 10:45:18 +00:00
Lawrence Hu 84e6f1dd70 Bug fix: use dyn_cast_or_null instead of dyn_cast
Differential Revision: http://reviews.llvm.org/D17154

llvm-svn: 261299
2016-02-19 02:17:07 +00:00
Richard Trieu 7a08381403 Remove uses of builtin comma operator.
Cleanup for upcoming Clang warning -Wcomma.  No functionality change intended.

llvm-svn: 261270
2016-02-18 22:09:30 +00:00
Adam Nemet 9d9cb274ea [PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFC
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).

Obviously the pass still only used from PPC at this point.  Subsequent
patches will start driving this from ARM64 as well.

Due to the previous patch most lines should show up as moved lines.

llvm-svn: 261265
2016-02-18 21:38:19 +00:00
Junmo Park 80440eb804 Minor code cleanup. NFC.
llvm-svn: 261200
2016-02-18 10:09:20 +00:00
Haicheng Wu 57e1a3e6ee [LIR] Avoid turning non-temporal stores into memset
This is to fix PR26645.

llvm-svn: 261149
2016-02-17 21:00:06 +00:00
Roman Gareev 036c08874a Tweak the LICM code to reuse the first sub-loop instead of creating a new one
LICM starts with an *empty* AST, and then merges in each sub-loop. While the
add code is appropriate for sub-loop 2 and up, it's utterly unnecessary for
sub-loop 1. If the AST starts off empty, we can just clone/move the contents
of the subloop into the containing AST.

Reviewed-by: Philip Reames <listmail@philipreames.com>

Differential Revision: http://reviews.llvm.org/D16753

llvm-svn: 260892
2016-02-15 14:48:50 +00:00
Chad Rosier 81362a8599 [LIR] Allow merging of memsets in negatively strided loops.
Last part of PR25166.

llvm-svn: 260732
2016-02-12 21:03:23 +00:00
Justin Lebar df04d2a1f1 [LoopRotate] Don't perform loop rotation if the loop header calls a convergent function.
Summary:
Calls to convergent functions can be duplicated, but only if the
duplicates are not control-flow dependent on any additional values.
Loop rotation doesn't meet the bar.

Reviewers: jingyue

Subscribers: mzolotukhin, llvm-commits, arsenm, joker.eph, resistor, tra, hfinkel, broune

Differential Revision: http://reviews.llvm.org/D17127

llvm-svn: 260729
2016-02-12 21:01:33 +00:00
David Majnemer 01674939b2 Remove unused variable
llvm-svn: 260722
2016-02-12 20:33:51 +00:00
Philip Reames 96fccc2d09 [GVN] Common code for local and non-local load availability [NFCI]
The attached patch removes all of the block local code for performing X-load forwarding by reusing the code used in the non-local case.

The motivation here is to remove duplication and in the process increase our test coverage of some fairly tricky code. I have some upcoming changes I'll be proposing in this area and wanted to have the code cleaned up a bit first.

Note: The review for this mostly happened in email which didn't make it to phabricator on the 258882 commit thread.

Differential Revision: http://reviews.llvm.org/D16608

llvm-svn: 260711
2016-02-12 19:24:57 +00:00
Chad Rosier 4acff96646 [LIR] Partially revert r252926(NFC), which introduced a very subtle change.
In short, before r252926 we were comparing an unsigned (StoreSize) against an a
APInt (Stride), which is fine and well.  After we were zero extending the Stride
and then converting to an unsigned, which is not the same thing.  Obviously,
Stides can also be negative.  This commit just restores the original behavior.

AFAICT, it's not possible to write a test case to expose the issue because
the code already has checks to make sure the StoreSize can't overflow an
unsigned (which prevents the Stride from overflowing an unsigned as well).

llvm-svn: 260706
2016-02-12 19:05:27 +00:00
Tamas Berghammer d12f31535a Fix MSVC 2013 build after rL260504
llvm-svn: 260511
2016-02-11 11:27:51 +00:00
Ashutosh Nema 2260a3a046 Fixed typo in comment & coding style for LoopVersioningLICM.
llvm-svn: 260504
2016-02-11 09:23:53 +00:00
Philip Reames d59638e14f Follow up to 260439, Speculative fix to clang builders
It looks like clang has a couple of test cases which caught the fact LVI was not slightly more precise after 260439.  When looking at the failures, it struck me as wasteful to be querying nullness of a constant via LVI, so instead of tweaking the clang tests, let's just stop querying constants from this source.

llvm-svn: 260451
2016-02-10 22:22:41 +00:00
Tom Stellard 6fa4e290d7 StructurizeCFG: Initialize SkipUniformRegions in the default constructor
This should fix some random bot failures caused by r260336.

llvm-svn: 260342
2016-02-10 01:10:09 +00:00
Tom Stellard 755a4e6b57 StructurizeCFG: Add an option for skipping regions with only uniform branches
Summary:
Tests for this will be added once the AMDGPU backend enables this
option.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16602

llvm-svn: 260336
2016-02-10 00:39:37 +00:00
Michael Zolotukhin 1da4afdfc9 Factor out UnrollAnalyzer to Analysis, and add unit tests for it.
Summary:
Unrolling Analyzer is already pretty complicated, and it becomes harder and harder to exercise it with usual IR tests, as with them we can only check the final decision: whether the loop is unrolled or not. This change factors this framework out from LoopUnrollPass to analyses, which allows to use unit tests.
The change itself is supposed to be NFC, except adding a couple of tests.

I plan to add more tests as I add new functionality and find/fix bugs.

Reviewers: chandlerc, hfinkel, sanjoy

Subscribers: zzheng, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D16623

llvm-svn: 260169
2016-02-08 23:03:59 +00:00
Haicheng Wu b35f772b90 [JumpThreading] Change a return of ComputeValueKnownInPredecessors()
Change a return statement of ComputeValueKnownInPredecessors() to be the same as
the rest return statements of the function. Otherwise, it might return true with
an empty Result when the current basic block has no predecessors and trigger the
first assert of JumpThreading::ProcessThreadableEdges().

llvm-svn: 260110
2016-02-08 17:00:39 +00:00
Ashutosh Nema df6763abe8 New Loop Versioning LICM Pass
Summary:
When alias analysis is uncertain about the aliasing between any two accesses,
it will return MayAlias. This uncertainty from alias analysis restricts LICM
from proceeding further. In cases where alias analysis is uncertain we might
use loop versioning as an alternative.

Loop Versioning will create a version of the loop with aggressive aliasing
assumptions in addition to the original with conservative (default) aliasing
assumptions. The version of the loop making aggressive aliasing assumptions
will have all the memory accesses marked as no-alias. These two versions of
loop will be preceded by a memory runtime check. This runtime check consists
of bound checks for all unique memory accessed in loop, and it ensures the
lack of memory aliasing. The result of the runtime check determines which of
the loop versions is executed: If the runtime check detects any memory
aliasing, then the original loop is executed. Otherwise, the version with
aggressive aliasing assumptions is used.

The pass is off by default and can be enabled with command line option 
-enable-loop-versioning-licm.

Reviewers: hfinkel, anemet, chatur01, reames

Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga,
             llvm-commits

Differential Revision: http://reviews.llvm.org/D9151

llvm-svn: 259986
2016-02-06 07:47:48 +00:00
Joseph Tremoulet adc2376375 [RS4GC] Pass DenseMap by reference, NFC
Summary:
Passing the rematerialized values map to insertRematerializationStores by
value looks to be a simple oversight; update it to pass by reference.


Reviewers: reames, sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16911

llvm-svn: 259867
2016-02-05 01:42:52 +00:00
Adam Nemet 9455c1d2b1 [LoopLoadElim] Don't allow versioning when optForSize
This was requested in the review of D16300.

llvm-svn: 259861
2016-02-05 01:14:05 +00:00
David Majnemer a53b5bbb18 [LoopStrengthReduce] Don't rewrite PHIs with incoming values from CatchSwitches
Bail out if we have a PHI on an EHPad that gets a value from a
CatchSwitchInst.  Because the CatchSwitchInst cannot be split, there is
no good place to stick any instructions.

This fixes PR26373.

llvm-svn: 259702
2016-02-03 21:30:34 +00:00
Adam Nemet d52ed84160 [LoopVersioning] Expose loop versioning as a pass too
Summary:
LoopVersioning is a transform utility that transform passes can use to
run-time disambiguate may-aliasing accesses. I'd like to also expose as
pass to allow it to be unit-tested.

I am planning to add support for non-aliasing annotation in
LoopVersioning and I'd like to be able to write tests directly using
this pass.

(After that feature is done, the pass could also be used to look for
optimization opportunities that are hidden behind incomplete alias
information at compile time.)

The pass drives LoopVersioning in its default way which is to fully
disambiguate may-aliasing accesses no matter how many checks are
required.

Reviewers: hfinkel, ashutosh.nema, sbaranga

Subscribers: zzheng, mssimpso, llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D16612

llvm-svn: 259610
2016-02-03 00:06:10 +00:00
Matthias Braun b30f2f5141 Avoid overly large SmallPtrSet/SmallSet
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.

llvm-svn: 259283
2016-01-30 01:24:31 +00:00
Fiona Glaser 36e8230db0 Fix typo in LoopSimplifyCFG
llvm-svn: 259261
2016-01-29 23:12:52 +00:00
Fiona Glaser b417d464e6 Add LoopSimplifyCFG pass
Loop transformations can sometimes fail because the loop, while in
valid rotated LCSSA form, is not in a canonical CFG form. This is
an extremely simple pass that just merges obviously redundant
blocks, which can be used to fix some known failure cases. In the
future, it may be enhanced with more cases (and have code shared with
SimplifyCFG).

This allows us to run LoopSimplifyCFG -> LoopRotate -> LoopUnroll,
so that SimplifyCFG cleans up the loop before Rotate tries to run.

Not currently used in the pass manager, since this pass doesn't do
anything unless you can hook it up in an LPM with other loop passes.
It'll be added once Chandler cleans up things to allow this.

Tested in a custom pipeline out of tree to confirm it works in
practice (in addition to the included trivial test).

llvm-svn: 259256
2016-01-29 22:35:36 +00:00
David Majnemer 75f492e7f1 Fix the build
llvm-svn: 259215
2016-01-29 17:46:57 +00:00
Sanjoy Das c816f03b70 [RS4GC] Address post-commit review on r259208 from David
NFC

llvm-svn: 259211
2016-01-29 17:20:49 +00:00
Sanjoy Das 565f7866ac [RS4GC] Remove unnecessary const_cast; NFC
GCRelocateInst::getDerivedPtr already returns a non-const llvm::Value
pointer.

llvm-svn: 259209
2016-01-29 16:54:49 +00:00
Sanjoy Das 3794eeb8bb [RS4GC] Minor local cleanup to StabilizeOrder; NFC
- Locally declare struct, and call it BaseDerivedPair
 - Use a lambda to compare, instead of a singleton with uninitialized
   fields
 - Add a constructor to BaseDerivedPair and use SmallVector::emplace_back

llvm-svn: 259208
2016-01-29 16:50:34 +00:00
Philip Reames 10e678d25a [GVN] Add clarifying assert [NFCI]
Just adding an assert which makes invariants between AnalyzeLoadsFromClobberingLoads and GetLoadValueForLoad slightly more clear.

llvm-svn: 259145
2016-01-29 02:23:10 +00:00
Sanjoy Das bcf27523f5 [RS4GC] Minor cleanups enabled by the previous change; NFC
llvm-svn: 259133
2016-01-29 01:03:20 +00:00
Sanjoy Das 4099297856 [RS4GC] Delete code that is dead due to r259129; NFC
llvm-svn: 259132
2016-01-29 01:03:17 +00:00
Sanjoy Das 0407108020 [RS4GC] Clamp UseDeoptBundles to true and update tests
The full diff for the test directory may be hard to read because of the
filename clash; so here's all that happened as far as the tests are
concerned:

```
cd test/Transforms/RewriteStatepointsForGC
git rm *ll
git mv deopt-bundles/* ./
rmdir deopt-bundles
find . -name '*.ll' | xargs gsed -i 's/-rs4gc-use-deopt-bundles //g'
```

llvm-svn: 259129
2016-01-29 00:28:57 +00:00
Sanjoy Das bb04f6e28f [PlaceSafepoints] Use DEBUG() instead of TraceLSP
DEBUG() is the more idiomatic LLVM style.

llvm-svn: 259121
2016-01-28 23:49:27 +00:00
Sanjoy Das cd23fec756 [PlaceSafepoints] Misc. minor cleanups; NFC
These changes are aimed at bringing PlaceSafepoints up to code with the
LLVM coding guidelines:

 - Fix variable naming
 - Use DenseSet instead of std::set
 - Remove dead code
 - Minor local code simplifications

llvm-svn: 259112
2016-01-28 23:03:19 +00:00
Sanjoy Das 360a4e4ee2 [PlaceSafepoints] Remvoe unused headers, and sort #includes; NFC
llvm-svn: 259111
2016-01-28 23:03:17 +00:00
Sanjoy Das 12673765cf [PlaceSafepoints] Eliminate dead code; NFC
Now that NoStatepoints is a constant `true`, we can get rid of a bunch
of dead code.

llvm-svn: 259110
2016-01-28 23:03:14 +00:00
Sanjoy Das f7302c8baf [PlaceSafepoints] Clamp NoStatepoints to true
This change permanently clamps -spp-no-statepoints to true (the code
deletion will come later).  Tests that specifically tested
PlaceSafepoint's ability to wrap calls in gc.statepoint have been moved
to RS4GC's test suite.

llvm-svn: 259096
2016-01-28 21:51:14 +00:00
Sanjoy Das 7a2e2bed67 [LICM] Keep metadata on control equivalent hoists
Summary:
If the instruction we're hoisting out of a loop into its preheader is
guaranteed to have executed in the loop, then the metadata associated
with the instruction (e.g. !range or !dereferenceable) is valid in the
preheader.  This is because once we're in the preheader, we know we're
eventually going to reach the location the metadata was valid at.

This change makes LICM smarter around this, and helps it recognize cases
like these:

```
  do {
    int a = *ptr; !range !0
    ...
  } while (i++ < N);
```

to

```
  int a = *ptr; !range !0
  do {
    ...
  } while (i++ < N);
```

Earlier we'd drop the `!range` metadata after hoisting the load from
`ptr`.

Reviewers: igor-laevsky

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16669

llvm-svn: 259053
2016-01-28 15:51:58 +00:00
Sanjoy Das cddde58f1c [IndVars] Hoist DataLayout load out of loop; NFC
llvm-svn: 258946
2016-01-27 17:05:09 +00:00
Sanjoy Das 2f7a7447c2 [IndVars] Use isSCEVable; NFC
llvm-svn: 258945
2016-01-27 17:05:06 +00:00
Sanjoy Das 8fdf87c338 [IndVars] Use range-for; NFC
llvm-svn: 258944
2016-01-27 17:05:03 +00:00
Chen Li 5cde8389cf [IndVarSimplify] Rewrite loop exit values with their initial values from loop preheader
Summary:
This is a revised version of D13974, and the following quoted summary are from D13974

"This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch."

D13974 was committed but failed one lnt test. The bug was that we only checked the condition from loop exit's incoming block was a loop invariant. But there could be another condition from loop header to that incoming block not being a loop invariant. This would produce miscompiled code.

This patch fixes the issue by checking if the incoming block is loop header, and if not, don't perform the rewrite. The could be further improved by recursively checking all conditions leading to loop exit block, but I'd like to check in this simple version first and improve it with future patches.     

Reviewers: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16570

llvm-svn: 258912
2016-01-27 07:40:41 +00:00
Philip Reames 8e785a4ec0 [GVN] Split AvailableValueInBlock into two parts [NFC]
AvailableValue is the part that represents the potential rematerialization.  AvailableValueInBlock is simply a pair of an AvailableValue and a BB which we might materialize it in.

This is motivated by http://reviews.llvm.org/D16608.  The intent is that we'll have a single function which handles the local case which both local and non-local will use to identify available values.  Once that's done, the local case can rematerialize at the use site and the non-local case can do the SSA construction as it does currently.

llvm-svn: 258882
2016-01-26 23:43:16 +00:00
Chris Bieneman e49730d4ba Remove autoconf support
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html

"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi

Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark

Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16471

llvm-svn: 258861
2016-01-26 21:29:08 +00:00
Aditya Nandakumar 3d0c46d489 Reassociate: Reprocess RedoInsts after each inst
Previously the RedoInsts was processed at the end of the block.
However it was possible that it left behind some instructions that
were not canonicalized.
This should guarantee that any previous instruction in the basic
block is canonicalized before we process a new instruction.

llvm-svn: 258830
2016-01-26 18:42:36 +00:00
Haicheng Wu f1c00a22be [LIR] Add support for structs and hand unrolled loops
This is a recommit of r258620 which causes PR26293.

The original message:

Now LIR can turn following codes into memset:

typedef struct foo {
  int a;
  int b;
} foo_t;

void bar(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    f[i].a = 0;
    f[i].b = 0;
  }
}

void test(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; i += 2) {
    f[i] = 0;
    f[i+1] = 0;
  }
}

llvm-svn: 258777
2016-01-26 02:27:47 +00:00
Philip Reames 273dcb0d82 [GVN] Rearrange code to make local vs non-local cases more obvious [NFCI]
llvm-svn: 258747
2016-01-25 23:37:53 +00:00
Philip Reames 10a50b188e [GVN] Factor out common code [NFCI]
We had the same code duplicated for each type of Def.  We also have the entire block duplicated between the local and non-local case, but let's start with local cleanup.

llvm-svn: 258740
2016-01-25 23:19:12 +00:00
Lawrence Hu d3d51061fb Enable loopreroll to rerool loop with pointer induction variable.
Example:

while (buf !=end ) {
   S += buf[0];
   S += buf[1];
   buf +=2;
};

Differential Revision: http://reviews.llvm.org/D13151

llvm-svn: 258709
2016-01-25 19:43:45 +00:00
Lawrence Hu b917cd9fa6 Undo commit 258700 due to missing commit message
llvm-svn: 258708
2016-01-25 19:36:30 +00:00
Quentin Colombet a392810bea Speculatively revert r258620 as it is the likely culprid of PR26293.
llvm-svn: 258703
2016-01-25 19:12:49 +00:00
Lawrence Hu 84b6195e41 Differential Revision: http://reviews.llvm.org/D13151
llvm-svn: 258700
2016-01-25 18:53:39 +00:00
David Majnemer eec878574e Fix build bot breakage
llvm-svn: 258661
2016-01-24 16:46:53 +00:00
David Majnemer dcd6c79d55 Fix buildbot failures
llvm-svn: 258655
2016-01-24 06:40:37 +00:00
David Majnemer 88542a0a69 [SCCP] Remove duplicate code
SCCP has code identical to changeToUnreachable's behavior, switch it
over to just call changeToUnreachable.

No functionality change intended.

llvm-svn: 258654
2016-01-24 06:26:47 +00:00
David Majnemer 35c46d3e0b [InstCombine, SCCP] Consolidate code used to remove instructions
InstCombine and SCCP both want to remove dead code in a very particular
way but using identical means to do so.  Share the code between the two.

No functionality change is intended.

llvm-svn: 258653
2016-01-24 05:26:18 +00:00
Haicheng Wu dd5e9d2159 [LIR] Add support for structs and hand unrolled loops
Now LIR can turn following codes into memset:

typedef struct foo {
  int a;
  int b;
} foo_t;

void bar(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    f[i].a = 0;
    f[i].b = 0;
  }
}

void test(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; i += 2) {
    f[i] = 0;
    f[i+1] = 0;
  }
}

llvm-svn: 258620
2016-01-23 06:52:41 +00:00
Sanjoy Das 95639746e5 [PlaceSafepoints] Introduce a -spp-no-statepoints flag
Summary:
This change adds a `-spp-no-statepoints` flag to PlaceSafepoints that
bypasses the code that wraps newly introduced polls and existing calls
in gc.statepoint.  With `-spp-no-statepoints` enabled, PlaceSafepoints
effectively becomes a safpeoint **poll** insertion pass.

The eventual goal is to "constant fold" this option, along with
`-rs4gc-use-deopt-bundles` to `true`, once clients using gc.statepoint
are okay doing so.

Reviewers: pgavlin, reames, JosephTremoulet

Subscribers: sanjoy, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16439

llvm-svn: 258551
2016-01-22 21:02:55 +00:00
Sanjoy Das acc43d197d [RS4GC] Use OB_deopt instead of "deopt"
llvm-svn: 258529
2016-01-22 19:20:40 +00:00
Eduard Burtescu 68e7f49f8e [opaque pointer types] [NFC] DataLayout::getIndexedOffset: take source element type instead of pointer type and rename to getIndexedOffsetInType.
Summary:

Reviewers: mjacob, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16282

llvm-svn: 258478
2016-01-22 03:08:27 +00:00
Eduard Burtescu e2a6917849 [opaque pointer types] [NFC] FindAvailableLoadedValue: take LoadInst instead of just the pointer.
Reviewers: mjacob, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16422

llvm-svn: 258477
2016-01-22 01:51:51 +00:00
Eduard Burtescu 1423921a24 [opaque pointer types] [NFC] Add an explicit type argument to ConstantFoldLoadFromConstPtr.
Reviewers: mjacob, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16418

llvm-svn: 258472
2016-01-22 01:17:26 +00:00
David L Kreitzer 4d7257dfa1 Fix for two constant propagation problems in GVN with the assume intrinsic
instruction.

Patch by Yuanrui Zhang.

Differential Revision: http://reviews.llvm.org/D16100

llvm-svn: 258435
2016-01-21 21:32:35 +00:00
Sanjoy Das a34ce95b60 Add a "gc-transition" operand bundle
Summary:
This adds a new kind of operand bundle to LLVM denoted by the
`"gc-transition"` tag.  Inputs to `"gc-transition"` operand bundle are
lowered into the "transition args" section of `gc.statepoint` by
`RewriteStatepointsForGC`.

This removes the last bit of functionality that was unsupported in the
deopt bundle based code path in `RewriteStatepointsForGC`.

Reviewers: pgavlin, JosephTremoulet, reames

Subscribers: sanjoy, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16342

llvm-svn: 258338
2016-01-20 19:50:25 +00:00
Eduard Burtescu 19eb03106d [opaque pointer types] [NFC] GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.

GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.

Reviewers: mjacob, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16275

llvm-svn: 258145
2016-01-19 17:28:00 +00:00
Philip Reames b336bca07e [GC] Lower vectors-of-pointers directly by default
This commit changes the default on our lowering of vectors-of-pointers from splitting in RS4GC to reporting them in the final stack map.  All of the changes to do so are already in place and tested.  Assuming no problems are unearthed in the next week, we will be deleting the old code entirely next Monday.

llvm-svn: 258111
2016-01-19 04:18:24 +00:00
Eduard Burtescu 6007e0dd02 Revert assert added in rL258028 as the alloca and OtherPtr types may differ in address space.
llvm-svn: 258029
2016-01-18 00:20:34 +00:00
Eduard Burtescu 90c4449128 [opaque pointer types] Alloca: use getAllocatedType() instead of getType()->getPointerElementType().
Reviewers: mjacob

Subscribers: llvm-commits, dblaikie

Differential Revision: http://reviews.llvm.org/D16272

llvm-svn: 258028
2016-01-18 00:10:01 +00:00
Sanjoy Das de47590589 [IndVars] Fix PR25576
`LCSSASafePhiForRAUW` as computed was incorrect -- in cases like
these (this exact example does not actually trigger the bug):

define i32 @f(i32 %n, i1* %c) {
entry:
  br label %outer.loop

outer.loop:
  br label %inner.loop

inner.loop:
  %iv = phi i32 [ 0, %outer.loop ], [ %iv.inc, %inner.loop ]
  %iv.inc = add nuw nsw i32 %iv, 1
  %tc = udiv i32 %n, 13
  %be.cond = icmp ult i32 %iv, %tc
  br i1 %be.cond, label %inner.loop, label %inner.exit

inner.exit:
  %iv.lcssa = phi i32 [ %iv, %inner.loop ]
  %outer.be.cond = load volatile i1, i1* %c
  br i1 %outer.be.cond, label %outer.loop, label %leave

leave:
  %iv.lcssa.lcssa = phi i32 [ %iv.lcssa, %inner.exit ]
  ret i32 %iv.lcssa.lcssa
}

`LCSSASafePhiForRAUW` is true for `%iv.lcssa` when re-rewriting the exit
value of `%iv` for `%inner.loop` to `%tc` (this can happen due to
`SCEVExpander::findExistingExpansion`), but the RAUW breaks LCSSA.

To fix this, instead of computing `SafePhi` with special logic, decide
the safety of RAUW directly via `replacementPreservesLCSSAForm`.

llvm-svn: 258016
2016-01-17 18:12:52 +00:00
Sanjoy Das 7a8a705c9d [IndVars] Use emplace_back; NFC
llvm-svn: 258015
2016-01-17 18:12:48 +00:00
Artur Pilipenko aba8fdc480 Fix buildbot failure introduced by 258010. Remove local variables became unused.
llvm-svn: 258011
2016-01-17 12:59:40 +00:00
Artur Pilipenko f84dc06e5b Push isDereferenceableAndAlignedPointer down into isSafeToLoadUnconditionally
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D16226

llvm-svn: 258010
2016-01-17 12:35:29 +00:00
Manuel Jacob 5f6eaac611 GlobalValue: use getValueType() instead of getType()->getPointerElementType().
Reviewers: mjacob

Subscribers: jholewinski, arsenm, dsanders, dblaikie

Patch by Eduard Burtescu.

Differential Revision: http://reviews.llvm.org/D16260

llvm-svn: 257999
2016-01-16 20:30:46 +00:00
Justin Bogner fd757648a4 PM: Fix an inverted condition in simplifyFunctionCFG
I mentioned the issue here in code review way back in September and
was sure we'd fixed it, but apparently we forgot:

  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150921/301850.html

In any case, as soon as you try to use this pass in anything but the
most basic pipeline everything falls apart. Fix the condition.

llvm-svn: 257935
2016-01-15 21:21:39 +00:00
Artur Pilipenko 6dd6969cee Change isSafeToLoadUnconditionally arguments order. Separated from http://reviews.llvm.org/D10920.
llvm-svn: 257894
2016-01-15 15:27:46 +00:00
Keno Fischer d5354fdddb [SROA] Also insert a bit piece expression if only one piece is needed
Summary: If SROA creates only one piece (e.g. because the other is not needed),
it still needs to create a bit_piece expression if that bit piece is smaller
than the original size of the alloca.

Reviewers: aprantl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16187

llvm-svn: 257795
2016-01-14 20:06:34 +00:00
Sanjay Patel 9913322327 move return variable declarations down to where they are actually used; NFCI
llvm-svn: 257700
2016-01-13 23:01:57 +00:00
Justin Bogner b8d82abb78 LoopUnroll: Move the actual unrolling logic to a standalone function. NFC
This is pure code motion - break the actual work out of runOnLoop into
a reusable standalone function.

llvm-svn: 257445
2016-01-12 05:21:37 +00:00
Justin Bogner 921b04e9a4 LoopUnroll: Make canUnrollCompletely static - it doesn't use any state. NFC
llvm-svn: 257427
2016-01-12 01:06:32 +00:00
Justin Bogner a1dd493159 LoopUnroll: Clean up the maze of initialization for unroll parameters. NFC
The layering of where the various loop unroll parameters are
initialized and overridden here was very confusing, making it pretty
difficult to tell just how the various sources interacted. Instead, we
put all of the initialization logic together in a single function so
that it's obvious what overrides what.

llvm-svn: 257426
2016-01-12 00:55:26 +00:00
Justin Bogner 0fb7ed5726 LoopUnroll: Use the optsize threshold for minsize as well
Currently we're unrolling loops more in minsize than in optsize, which
means -Oz will have a larger code size than -Os. That doesn't make any
sense.

This resolves the FIXME about this in LoopUnrollPass and extends the
optsize test to make sure we use the smaller threshold for minsize as
well.

llvm-svn: 257402
2016-01-11 22:39:43 +00:00
David Majnemer d9833ea579 [JumpThreading] Don't forget to report that the IR changed
JumpThreading's runOnFunction is supposed to return true if it made any
changes.  JumpThreading has a call to removeUnreachableBlocks which may
result in changes to the IR but runOnFunction didn't appropriate account
for this possibility, leading to badness.

While we are here, make sure to call LazyValueInfo::eraseBlock in
removeUnreachableBlocks;  JumpThreading preserves LVI.

This fixes PR26096.

llvm-svn: 257279
2016-01-10 07:13:04 +00:00
Benjamin Kramer 543762da3e [JumpThreading] Use range-based for loops.
No functionality change intended.

llvm-svn: 257262
2016-01-09 18:43:01 +00:00
Benjamin Kramer 530e0db333 [TRE] Simplify code with range-based loops and std::find.
No functional change intended.

llvm-svn: 257261
2016-01-09 17:35:29 +00:00
Manuel Jacob 734e73342d [RS4GC] Update and simplify handling of Constants in findBaseDefiningValueOfVector().
Summary:
This is analogous to r256079, which removed an overly strong assertion, and
r256812, which simplified the code by replacing three conditionals by one.

Reviewers: reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D16019

llvm-svn: 257250
2016-01-09 04:02:16 +00:00
Manuel Jacob 0593cfd336 [RS4GC] Unify two asserts. NFC.
llvm-svn: 257247
2016-01-09 03:08:49 +00:00
Philip Reames 5715f576ea [rs4gc] Optionally directly relocated vector of pointers
This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates.

The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order to visit instructions in in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications.

Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR.

At the moment, the new functionality is hidden behind a flag. To use this code, you need to pass "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested.

Differential Revision: http://reviews.llvm.org/D15982

llvm-svn: 257244
2016-01-09 01:31:13 +00:00
Sanjay Patel 9f088ab5e2 rangify; NFCI
llvm-svn: 257226
2016-01-08 22:59:42 +00:00
Sanjay Patel 9f49b683e0 variable names start with an upper case letter; NFC
llvm-svn: 257213
2016-01-08 22:05:03 +00:00
Haicheng Wu a6a3279bd3 [JumpThreading] Split select that has constant conditions coming from the PHI node
Look for PHI/Select in the same BB of the form

bb:
  %p = phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4], ...
  %s = select p, trueval, falseval

And expand the select into a branch structure. This later enables
jump-threading over bb in this pass.

Using the similar approach of SimplifyCFG::FoldCondBranchOnPHI(), unfold
select if the associated PHI has at least one constant.  If the unfolded
select is not jump-threaded, it will be folded again in the later
optimizations.

llvm-svn: 257198
2016-01-08 19:39:39 +00:00
Justin Bogner e9fb228d59 LoopInfo: Simplify ownership of Loop objects
It's strange that LoopInfo mostly owns the Loop objects, but that it
defers deleting them to the loop pass manager. Instead, change the
oddly named "updateUnloop" to "markAsRemoved" and have it queue the
Loop object for deletion. We can't delete the Loop immediately when we
remove it, since we need its pointer identity still, so we'll mark the
object as "invalid" so that clients can see what's going on.

llvm-svn: 257191
2016-01-08 19:08:53 +00:00
Mehdi Amini 599ebf2767 Remove static global GCNames from Function.cpp and move it to the Context
This remove the need for locking when deleting a function.

Differential Revision: http://reviews.llvm.org/D15988

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 257139
2016-01-08 02:28:20 +00:00
Aditya Nandakumar f94c149f7f Instructions to be redone only if from the same BB
While adding instructions(possible roots) to be redone, make sure they
are from the same basic block.

llvm-svn: 257112
2016-01-07 23:22:55 +00:00
David Majnemer f1a9c9e148 [SCCP] Don't violate the lattice invariants
We marked values which are 'undef' as constant instead of undefined
which violates SCCP's invariants.  If we can figure out that a
computation results in 'undef', leave it in the undefined state.

This fixes PR16052.

llvm-svn: 257102
2016-01-07 21:36:16 +00:00
David Majnemer f3b99dd22e Remove junk accidentally commited with r257087
llvm-svn: 257089
2016-01-07 19:30:13 +00:00
David Majnemer bae945735a [SCCP] Can't go from overdefined to constant
The fix for PR23999 made us mark loads of null as producing the constant
undef which upsets the lattice.  Instead, keep the load as "undefined".
This fixes PR26044.

llvm-svn: 257087
2016-01-07 19:25:39 +00:00
Philip Reames 103d2381d6 [RS4GC] Add an option to suppress vector splitting
At the moment, this is essentially a diangostic option so that I can start collecting failing test cases, but we will eventually migrate to removing the vector splitting code entirely.

llvm-svn: 257015
2016-01-07 02:20:11 +00:00
Mehdi Amini 0535003bef Fix PR26051: Memcpy optimization should introduce a call to memcpy before the store destination position
This is a conservative fix, I expect Amaury to relax this.
Follow-up for r256923

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 256999
2016-01-06 23:50:22 +00:00
Amaury Sechet 3235c08253 Promote aggregate store to memset when possible
Summary: As per title. This will allow the optimizer to pick up on it.

Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15923

llvm-svn: 256969
2016-01-06 19:47:24 +00:00
Amaury Sechet 5fc9f6999d Remove useless DEBUG
llvm-svn: 256968
2016-01-06 19:45:09 +00:00
Amaury Sechet d3b2c0fd94 Improve load/store to memcpy for aggregate
Summary: It turns out that if we don't try to do it at the store location, we can do it before any operation that alias the load, as long as no operation alias the store.

Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15903

llvm-svn: 256923
2016-01-06 09:30:39 +00:00
Amaury Sechet a0c242cdfd Implement load to store => memcpy in MemCpyOpt for aggregates
Summary:
Most of the tool chain is able to optimize scalar and memcpy like operation effisciently while it isn't that good with aggregates. In order to improve the support of aggregate, we try to change aggregate manipulation into either scalar or memcpy like ones whenever possible without loosing informations.

This is one such opportunity.

Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15894

llvm-svn: 256868
2016-01-05 20:17:48 +00:00
Manuel Jacob 75cbfdcf03 [RS4GC] Simplify handling of Constants in findBaseDefiningValue(). NFC.
Summary:
Previously there were three conditionals, checking for global
variables, undef values and everything constant except these two, all three
returning the same value.  This commit replaces them by one conditional.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15818

llvm-svn: 256812
2016-01-05 04:06:21 +00:00
Manuel Jacob 83eefa6d20 [Statepoints] Refactor GCRelocateOperands into an intrinsic wrapper. NFC.
Summary:
This commit renames GCRelocateOperands to GCRelocateInst and makes it an
intrinsic wrapper, similar to e.g. MemCpyInst.  Also, all users of
GCRelocateOperands were changed to use the new intrinsic wrapper instead.

Reviewers: sanjoy, reames

Subscribers: reames, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15762

llvm-svn: 256811
2016-01-05 04:03:00 +00:00
David Majnemer b33f3a239a [LICM] Fix a small oversight introduced in r256763
r256763 had promoteLoopAccessesToScalars check for the existence of a
catchswitch when the exit blocks were populated but
promoteLoopAccessesToScalars may be called with a prepopulated set of
exit blocks which would also need to be checked.

This fixes PR26019.

llvm-svn: 256788
2016-01-04 23:16:22 +00:00
Haicheng Wu 9d6c94006e [LIR] General refactoring to simplify code and the ease future code review
This is a resubmission of r256336 which was reverted in r256361. The issue was the lack of the invariant check of the memset value in processLooMemSet().

The original message:

Move several checks into isLegalStores. Also, delineate between those stores that are memset-able and those that are memcpy-able.

llvm-svn: 256783
2016-01-04 21:43:14 +00:00
Aditya Nandakumar 12d060481a Remove dead instructions before Redoing
Before reevaluating instructions, iterate over all instructions
to be reevaluated and remove trivially dead instructions and if
any of it's operands become trivially dead, mark it for deletion
until all trivially dead instructions have been removed

llvm-svn: 256773
2016-01-04 19:48:14 +00:00
David Majnemer 219055f9df [LICM] Don't insert instructions after a catchswitch when performing loop promotion
Inserting after a catchswitch results in verifier errors, bail out on
promotion if a catchswitch is a loop exit.

llvm-svn: 256763
2016-01-04 17:42:19 +00:00
David Majnemer 42a0730c42 [LICM] Make instruction sinking funclet-aware
We had two bugs here:
- We might try to sink into a catchswitch, causing verifier failures.
- We will succeed in sinking into a cleanuppad but we didn't update the
  funclet operand bundle.

This fixes PR26000.

llvm-svn: 256728
2016-01-04 03:37:39 +00:00
Manuel Jacob 67f1d3ac63 [RS4GC] Use DenseMap::count() instead of DenseMap::find()/DenseMap::end(). NFC.
llvm-svn: 256586
2015-12-29 22:16:41 +00:00
Manuel Jacob e3773d632e [PlaceSafepoints] Assert that the gc.safepoint_poll function is present in the module.
If running the PlaceSafepoints pass on a module which doesn't have the
gc.safepoint_poll function without disabling entry and backedge safepoints,
previously the pass crashed with an obscure error because of a null pointer.
Now it fails the assert instead.

llvm-svn: 256580
2015-12-29 21:57:55 +00:00
Geoff Berry 43dc285915 [JumpThreading] Fix opcode bonus in getJumpThreadDuplicationCost()
The code that was meant to adjust the duplication cost based on the
terminator opcode was not being executed in cases where the initial
threshold was hit inside the loop.

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D15536

llvm-svn: 256568
2015-12-29 18:10:16 +00:00
Manuel Jacob 9db5b93ffc [RS4GC] Fix rematerialization of bitcast of bitcast.
Summary:
Previously, only the outer (last) bitcast was rematerialized, resulting in a
use of the unrelocated inner (first) bitcast after the statepoint.  See the
test case for an example.

Reviewers: igor-laevsky, reames

Subscribers: reames, alex, llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15789

llvm-svn: 256520
2015-12-28 20:14:05 +00:00
Chen Li d71999ef1b [gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. 

Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob

Subscribers: reames, mjacob, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15662

llvm-svn: 256443
2015-12-26 07:54:32 +00:00
Nico Weber 95cc9d5f14 Revert r256336, it caused PR25939
llvm-svn: 256361
2015-12-24 04:01:06 +00:00
Chad Rosier fba65d2fd3 [LIR] General refactoring to simplify code and the ease future code review.
Move several checks into isLegalStores. Also, delineate between those stores
that are memset-able and those that are memcpy-able.

http://reviews.llvm.org/D15683
Patch by Haicheng Wu <haicheng@codeaurora.org>!

llvm-svn: 256336
2015-12-23 17:29:33 +00:00
David Majnemer 63ad9e0543 [OperandBundles] Have TailCallElim play nice with operand bundles
A call site's use of a Value might not correspond to an argument
operand but to a bundle operand.

This fixes PR25928.

llvm-svn: 256328
2015-12-23 09:58:43 +00:00
Philip Reames ee8f055327 [GC] Make GCStrategy::isGCManagedPointer a type predicate not a value predicate [NFC]
Reasons:
1) The existing form was a form of false generality.  None of the implemented GCStrategies use anything other than a type.  Its becoming more and more clear we're going to need some type of strong GC pointer in the type system and we shouldn't pretend otherwise at this point.
2) The API was awkward when applied to vectors-of-pointers.  The old one could have been made to work, but calling isGCManagedPointer(Ty->getScalarType()) is much cleaner than the Value alternatives.  
3) The rewriting implementation effectively assumes the type based predicate as well.  We should be consistent.

llvm-svn: 256312
2015-12-23 01:42:15 +00:00
Manuel Jacob a4efd8ac2e [RS4GC] Fix base pair printing for constants.
Previously, "%" + name of the value was printed for each derived and base
pointer.  This is correct for instructions, but wrong for e.g. globals.

llvm-svn: 256305
2015-12-23 00:19:45 +00:00
Cong Hou 6a2c71af0b [BPI] Fix two potential divide-by-zero operations that are introduced in r256263.
llvm-svn: 256303
2015-12-22 23:45:55 +00:00
Cong Hou e93b8e1539 [BPI] Replace weights by probabilities in BPI.
This patch removes all weight-related interfaces from BPI and replace
them by probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probabilitiy is a better
representation in terms of CFG update and validation.


Differential revision: http://reviews.llvm.org/D15519 

llvm-svn: 256263
2015-12-22 18:56:14 +00:00
Manuel Jacob 4e4f60ded0 Remove deprecated llvm.experimental.gc.result.{int,float,ptr} intrinsics.
Summary:
These were deprecated 11 months ago when a generic
llvm.experimental.gc.result intrinsic, which works for all types, was added.

Reviewers: sanjoy, reames

Subscribers: sanjoy, chenli, llvm-commits

Differential Revision: http://reviews.llvm.org/D15719

llvm-svn: 256262
2015-12-22 18:44:45 +00:00
Manuel Jacob 990dfa6fe5 [RS4GC] Fix crash in the case that a live variable has a constant base.
Summary:
Previously, RS4GC crashed in CreateGCRelocates() because it assumed
that every base is also in the array of live variables, which isn't true if a
live variable has a constant base.

This change fixes the crash by making sure CreateGCRelocates() won't try to
relocate a live variable with a constant base.  This would be unnecessary
anyway because anything with a constant base won't move.

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15556

llvm-svn: 256252
2015-12-22 16:50:44 +00:00
Chad Rosier 94274fb1ad [LIR] Refactor code to enable future patch. NFC.
llvm-svn: 256159
2015-12-21 14:49:32 +00:00
Manuel Jacob 8050a49737 [RS4GC] Add an assert which fails if there is a (yet unsupported) addrspacecast.
The slightly strange indentation comes from clang-format.

llvm-svn: 256132
2015-12-21 01:26:46 +00:00
Philip Reames 5d54689bca [RS4GC] Remove an overly strong assertion
As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation.  The code actually handled this case just fine, we only had the assert to help isolate problems under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manual Jacobs is working on a language with constant references, and b) we found a case where the optimizer does create them in practice.

llvm-svn: 256079
2015-12-19 02:38:22 +00:00
Jingyue Wu ba3ca76ed2 [NaryReassociate] allow candidate to have a different type
Summary:
If Candiadte may have a different type from GEP, we should bitcast or
pointer cast it to GEP's type so that the later RAUW doesn't complain.

Added a test in nary-gep.ll

Reviewers: tra, meheff

Subscribers: mcrosier, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D15618

llvm-svn: 256035
2015-12-18 21:36:30 +00:00
Philip Reames dd0948a1b6 [RS4GC] Use an value handle to help isolate errors quickly
Inspired by the bug reported in 25846.  Whatever we end up doing about that one, the value handle change is a generally good one since it will help catch this type of mistake more quickly.

Patch by: Manuel Jacob

llvm-svn: 255984
2015-12-18 03:53:28 +00:00
Sanjoy Das 0de2feceb1 [SCEV] Add and use SCEVConstant::getAPInt; NFCI
llvm-svn: 255921
2015-12-17 20:28:46 +00:00
Philip Reames 15145fb7b1 [EarlyCSE] DSE of atomic unordered stores
The rules for removing trivially dead stores are a lot less complicated than loads. Since we know the later store post dominates the former and the former dominates the later, unless the former has side effects other than the actual store, we can remove it. One slightly surprising thing is that we can freely remove atomic stores, even if the later one isn't atomic. There's no guarantee the atomic one was every visible.

For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain of reasoning to them, but the catch is we'd then have to model the ordering effect without a store instruction. Since our fences are a stronger than our operation orderings, simple using a fence isn't an obvious win. This arguable calls for a refinement in our fence specification, but that's (much) later work.

Differential Revision: http://reviews.llvm.org/D15352

llvm-svn: 255914
2015-12-17 18:50:50 +00:00
Eric Christopher bfba572425 Fix funciton->function typo.
llvm-svn: 255841
2015-12-16 23:10:53 +00:00
Justin Bogner 883a3ea67f LPM: Make callers of LPM.deleteLoopFromQueue update LoopInfo directly. NFC
As of r255720, the loop pass manager will DTRT when passes update the
loop info for removed loops, so they no longer need to reach into
LPPassManager APIs to do this kind of transformation. This change very
nearly removes the need for the LPPassManager to even be passed into
loop passes - the only remaining pass that uses the LPM argument is
LoopUnswitch.

llvm-svn: 255797
2015-12-16 18:40:20 +00:00
Philip Reames ae1f265bf1 [EarlyCSE] DSE of stores which write back loaded values
Extend EarlyCSE with an additional style of dead store elimination. If we write back a value just read from that memory location, we can eliminate the store under the assumption that the value hasn't changed.

I'm implementing this mostly because I noticed the omission when looking at the code. It seemed strange to have InstCombine have a peephole which was more powerful than EarlyCSE. :)

Differential Revision: http://reviews.llvm.org/D15397

llvm-svn: 255739
2015-12-16 01:01:30 +00:00
Justin Bogner 843fb204b7 LPM: Stop threading `Pass *` through all of the loop utility APIs. NFC
A large number of loop utility functions take a `Pass *` and reach
into it to find out which analyses to preserve. There are a number of
problems with this:

- The APIs have access to pretty well any Pass state they want, so
  it's hard to tell what they may or may not do.

- Other APIs have copied these and pass around a `Pass *` even though
  they don't even use it. Some of these just hand a nullptr to the API
  since the callers don't even have a pass available.

- Passes in the new pass manager don't work like the current ones, so
  the APIs can't be used as is there.

Instead, we should explicitly thread the analysis results that we
actually care about through these APIs. This is both simpler and more
reusable.

llvm-svn: 255669
2015-12-15 19:40:57 +00:00
Justin Bogner 6291b587b6 LoopRotate: Convert the methods of LoopRotate to utility functions. NFC
This moves the actual work to do loop rotation into standalone
functions with the analysis results they need passed in as arguments,
leaving the class itself as a relatively simple shim. This will make
the functions easy to reuse when we're ready to port this
transformation to the new pass manager.

llvm-svn: 255574
2015-12-14 23:22:48 +00:00
Justin Bogner a730045156 LoopRotate: Reorder some method implementations. NFC
This just moves some callers after their callees. My next patch will
convert some of these methods to stand alone functions, and that diff
is more obviously NFC if I move these first. That change, in turn,
will make it much easier to port this pass to the new pass manager
once the loop pass manager is in place.

llvm-svn: 255573
2015-12-14 23:22:44 +00:00
Sanjay Patel af674fbfd9 getParent() ^ 3 == getModule() ; NFCI
llvm-svn: 255511
2015-12-14 17:24:23 +00:00
David Majnemer 8a1c45d6e8 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
Chad Rosier d7634fc91d Revert r255247, r255265, and r255286 due to serious compile-time regressions.
Revert "[DSE] Disable non-local DSE to see if the bots go green."
Revert "[DeadStoreElimination] Use range-based loops. NFC."
Revert "[DeadStoreElimination] Add support for non-local DSE."

llvm-svn: 255354
2015-12-11 18:39:41 +00:00
Hal Finkel 494393b740 AlignmentFromAssumptions and SLPVectorizer preserves AA and GlobalsAA
GlobalsAA's assumptions that passes do not escape globals not previously
escaped is not violated by AlignmentFromAssumptions and SLPVectorizer. Marking
them as such allows GlobalsAA to be preserved until GVN in the LTO pipeline.

http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html

Patch by Vaivaswatha Nagaraj!

llvm-svn: 255348
2015-12-11 17:46:01 +00:00
Chad Rosier 843c7b4309 [DSE] Disable non-local DSE to see if the bots go green.
I see a few bots timing out, so I'm speculatively disabling r255247.

llvm-svn: 255286
2015-12-10 19:23:02 +00:00
Chad Rosier 02fe4248a2 [DeadStoreElimination] Use range-based loops. NFC.
llvm-svn: 255265
2015-12-10 17:27:18 +00:00
Chad Rosier 533bc3fcac [DeadStoreElimination] Add support for non-local DSE.
We extend the search for redundant stores to predecessor blocks that
unconditionally lead to the block BB with the current store instruction.  That
also includes single-block loops that unconditionally lead to BB, and
if-then-else blocks where then- and else-blocks unconditionally lead to BB.

http://reviews.llvm.org/D13363
Patch by Ivan Baev <ibaev@codeaurora.org>!

llvm-svn: 255247
2015-12-10 13:51:43 +00:00
Silviu Baranga 86de80db37 [LLE] Use the PredicatedScalarEvolution interface to query SCEVs for dependences
Summary:
LAA uses the PredicatedScalarEvolution interface, so it can produce
forward/backward dependences having SCEVs that are AddRecExprs only after being
transformed by PredicatedScalarEvolution.

Use PredicatedScalarEvolution to get the expected expressions.

Reviewers: anemet

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15382

llvm-svn: 255241
2015-12-10 11:07:18 +00:00
Reid Kleckner 54ade23504 [Float2Int] Don't operate on vector instructions
This fixes a crash bug. It's also not clear if we'd want to do this
transform for vectors.

llvm-svn: 255155
2015-12-09 21:08:18 +00:00
Silviu Baranga 9cd9a7e310 Re-commit r255115, with the PredicatedScalarEvolution class moved to
ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform
and Analysis modules:

[LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions

Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

llvm-svn: 255122
2015-12-09 16:06:28 +00:00
Silviu Baranga ad1ccb357b Revert r255115 until we figure out how to fix the bot failures.
llvm-svn: 255117
2015-12-09 15:25:28 +00:00
Silviu Baranga 41eb682501 [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions
Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

llvm-svn: 255115
2015-12-09 15:03:52 +00:00
JF Bastien 9938425b31 EarlyCSE: fix typo from rL255054.
llvm-svn: 255102
2015-12-09 09:05:42 +00:00
Vikram TV 74b4111483 Test commit access - Fix few missing '.' in comments of LoopInterchange code.
llvm-svn: 255095
2015-12-09 05:16:24 +00:00
Sanjoy Das 42e551b92d [IndVars] Use any_of and foreach instead of explicit for loops; NFC
llvm-svn: 255077
2015-12-08 23:52:58 +00:00
Philip Reames 8fc2cbf933 [EarlyCSE] Value forwarding for unordered atomics
This patch teaches the fully redundant load part of EarlyCSE how to forward from atomic and volatile loads and stores, and how to eliminate unordered atomics (only). This patch does not include dead store elimination support for unordered atomics, that will follow in the near future.

The basic idea is that we allow all loads and stores to be tracked by the AvailableLoad table. We store a bit in the table which tracks whether load/store was atomic, and then only replace atomic loads with ones which were also atomic.

No attempt is made to refine our handling of ordered loads or stores. Those are still treated as full fences. We could pretty easily extend the release fence handling to release stores, but that should be a separate patch.

Differential Revision: http://reviews.llvm.org/D15337

llvm-svn: 255054
2015-12-08 21:45:41 +00:00
Sanjoy Das 683bf070ef [IndVars] Have getInsertPointForUses preserve LCSSA
Summary:
Also add a stricter post-condition for IndVarSimplify.

Fixes PR25578.  Test case by Michael Zolotukhin.

Reviewers: hfinkel, atrick, mzolotukhin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15059

llvm-svn: 254977
2015-12-08 00:13:21 +00:00
Philip Reames 9e5e2d61bf Reapply 254950 w/fix
254950 ended up being not NFC.  The previous code was overriding the flags for whether an instruction read or wrote memory using the target specific flags returned via TTI.  I'd missed this in my refactoring.  Since I mistakenly built only x86 and didn't notice the number of unsupported tests, I didn't catch that before the original checkin.

This raises an interesting issue though.  Given we have function attributes (i.e. readonly, readnone, argmemonly) which describe the aliasing of intrinsics, why does TTI have this information overriding the instruction definition at all?  I see no reason for this, but decided to preserve existing behavior for the moment.  The root issue might be that we don't have a "writeonly" attribute.

Original commit message:
[EarlyCSE] Simplify and invert ParseMemoryInst [NFCI]

Restructure ParseMemoryInst - which was introduced to abstract over target specific load and stores instructions - to just query the underlying instructions. In theory, this could be slightly slower than caching the results, but in practice, it's very unlikely to be measurable.

The simple query scheme makes it far easier to understand, and much easier to extend with new queries. Given I'm about to need to add new query types, doing the cleanup first seemed worthwhile.

Do we still believe the target specific intrinsic handling is worthwhile in EarlyCSE? It adds quite a bit of complexity and makes the code harder to read. Being able to delete the abstraction entirely would be wonderful.

llvm-svn: 254957
2015-12-07 22:41:23 +00:00
Philip Reames 4b5634af44 Revert 254950
It's causing test failures on AArch64.  Due to a bad build config on my part, I apparently wasn't running the tests I thought I was.

llvm-svn: 254954
2015-12-07 21:41:29 +00:00
Philip Reames 998cae653b [EarlyCSE] Simplify and invert ParseMemoryInst [NFCI]
Restructure ParseMemoryInst - which was introduced to abstract over target specific load and stores instructions - to just query the underlying instructions. In theory, this could be slightly slower than caching the results, but in practice, it's very unlikely to be measurable.

The simple query scheme makes it far easier to understand, and much easier to extend with new queries. Given I'm about to need to add new query types, doing the cleanup first seemed worthwhile.

Do we still believe the target specific intrinsic handling is worthwhile in EarlyCSE? It adds quite a bit of complexity and makes the code harder to read. Being able to delete the abstraction entirely would be wonderful.

llvm-svn: 254950
2015-12-07 21:27:15 +00:00
Philip Reames 7c6692de16 [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC)
When the notion of target specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access.  Since I'm about to start working on this area, cleanup the naming so that patches aren't horribly confusing.  Note that the actual implementation was always bailing if the load or store wasn't simple.  

Reminder:
- "volatile" - C++ volatile, can't remove any memory operations, but in principal unordered
- "ordered" - imposes ordering constraints on other nearby memory operations
- "atomic" - can't be split or sheared.  In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used.
- "simple" - a load which is none of the above.  These are normal loads and what most of the optimizer works with.

llvm-svn: 254805
2015-12-05 00:18:33 +00:00
Akira Hatanaka 237916b537 [AttributeSet] Overload AttributeSet::addAttribute to reduce compile
time.

The new overloaded function is used when an attribute is added to a
large number of slots of an AttributeSet (for example, to function
parameters). This is much faster than calling AttributeSet::addAttribute
once per slot, because AttributeSet::getImpl (which calls
FoldingSet::FIndNodeOrInsertPos) is called only once per function
instead of once per slot.

With this commit, clang compiles a file which used to take over 22
minutes in just 13 seconds.

rdar://problem/23581000

Differential Revision: http://reviews.llvm.org/D15085

llvm-svn: 254491
2015-12-02 06:58:49 +00:00
Chad Rosier 869962f962 [LIR] Push check into helper function. NFC.
llvm-svn: 254416
2015-12-01 14:26:35 +00:00
Craig Topper d896b03e4c Remove an intermediate lambda. NFC
llvm-svn: 254246
2015-11-29 05:38:08 +00:00
Craig Topper e471cf32a0 Use range-based for loops. NFC
llvm-svn: 254222
2015-11-28 08:23:04 +00:00
Davide Italiano dd04fee8a6 [SCCP] More informative message if we don't know how to handle a terminator.
llvm-svn: 254093
2015-11-25 21:03:36 +00:00
Sanjay Patel 739f2ce93a use convenience function for copying IR flags; NFCI
llvm-svn: 253996
2015-11-24 17:16:33 +00:00
Chad Rosier a15b4b6af2 [LIR] Put includes in correct order. NFC.
llvm-svn: 253915
2015-11-23 21:09:13 +00:00
Andrew Kaylor 0615a0e65d [WinEH] Fix a case where GVN could incorrectly PRE a load into an EH pad.
Differential Revision: http://reviews.llvm.org/D14842

llvm-svn: 253908
2015-11-23 19:51:41 +00:00
Davide Italiano 945d05f6a0 [LoopStrengthReduce] Mark dump() definitions as LLVM_DUMP_METHOD.
llvm-svn: 253841
2015-11-23 02:47:30 +00:00
Craig Topper a5ea5289ff Use modulo operator instead of multiplying result of a divide and subtracting from the original dividend. NFC.
llvm-svn: 253792
2015-11-21 17:44:42 +00:00
Owen Anderson 630077ef55 Fix a pair of issues that caused an infinite loop in reassociate.
Terrifyingly, one of them is a mishandling of floating point vectors
in Constant::isZero().  How exactly this issue survived this long
is beyond me.

llvm-svn: 253655
2015-11-20 08:16:13 +00:00
Craig Topper e325e3806f Use range-based for loops. NFC
llvm-svn: 253652
2015-11-20 07:18:48 +00:00
Chad Rosier 1cd3da15e8 [LIR] Update some comments. NFC.
llvm-svn: 253603
2015-11-19 21:33:07 +00:00
Chad Rosier 3ecc8d8d83 [LIR] Fix 80-column from previous commit.
llvm-svn: 253586
2015-11-19 18:25:11 +00:00
Chad Rosier fddc01f393 [LIR] Sink checks into function to enable future refactoring. NFC.
The purpose of this change is help delineate the memset and memcpy
optimizations with the overall goal of resolving PR25520.

llvm-svn: 253585
2015-11-19 18:22:21 +00:00
Chad Rosier 85c21f0a6e [LIR] Use the more appropriate method. NFC.
llvm-svn: 253578
2015-11-19 17:27:28 +00:00
Pete Cooper 67cf9a723b Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Weiming Zhao b69babd01e Fix bug 25440: GVN assertion after coercing loads
Optimizations like LoadPRE in GVN will insert new instructions.
If the insertion point is in a already processed BB, they should
get a value number explicitly. If the insertion point is after
current instruction, then just leave it. However, current GVN framework
has no support for it.
In this patch, we just bail out if a VN can't be found.

Dfferential Revision: http://reviews.llvm.org/D14670

A    test/Transforms/GVN/pr25440.ll
M    lib/Transforms/Scalar/GVN.cpp

llvm-svn: 253536
2015-11-19 02:45:18 +00:00
Mehdi Amini adb4057a15 Fix returned value for GVN: could return "false" even after modifying the IR
This bug would manifest in some very specific cases where all the following
conditions are fullfilled:

- GVN didn't remove block
- The regular GVN iteration didn't change the IR
- PRE is enabled
- PRE will not split critical edge
- The last instruction processed by PRE didn't change the IR

Because the CallGraph PassManager relies on this returned value to decide
if it needs to recompute a node after the execution of Function passes,
not returning the right value can lead to unexpected results.

Fix for: https://llvm.org/bugs/show_bug.cgi?id=24715

Patch by Wenxiang Qiu <vincentqiuuu@gmail.com>

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 253518
2015-11-18 22:49:49 +00:00
Pete Cooper 72bc23ef02 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Mike Aizatsky c7810baaa6 Disable gvn non-local speculative loads under asan.
Summary: Fix for https://llvm.org/bugs/show_bug.cgi?id=25550

Differential Revision: http://reviews.llvm.org/D14763

llvm-svn: 253498
2015-11-18 20:43:00 +00:00
Igor Laevsky 7310c68e85 Revert "Revert "Strip metadata when speculatively hoisting instructions (r252604)"
Failing clang test is now fixed by the r253458.

llvm-svn: 253459
2015-11-18 14:50:18 +00:00
Craig Topper 66059c9f4d Replace dyn_cast with isa in places that weren't using the returned value for more than a boolean check. NFC.
llvm-svn: 253441
2015-11-18 07:07:59 +00:00
Philip Reames b6e8fe3dac [PRE] Preserve !invariant.load metadata
Spoted via inspection.  Test case included.

llvm-svn: 253275
2015-11-17 00:15:09 +00:00
Owen Anderson 2de9f545aa Add intermediate subtract instructions to reassociation worklist.
We sometimes create intermediate subtract instructions during
reassociation.  Adding these to the worklist to revisit exposes many
additional reassociation opportunities.

Patch by Aditya Nandakumar.

llvm-svn: 253240
2015-11-16 18:07:30 +00:00
David Majnemer 7378e7a333 [LoopStrengthReduce] Don't increment iterator past the end of the BB
We tried to move the insertion point beyond instructions like landingpad
and cleanuppad.
However, we *also* tried to move past catchpad.  This is problematic
because catchpad is also a terminator.

This fixes PR25541.

llvm-svn: 253238
2015-11-16 17:37:58 +00:00
Keno Fischer 86c95b5642 [Sink] Don't move landingpads
Summary: Moving landingpads into successor basic blocks makes the
verifier sad. Teach Sink that much like PHI nodes and terminator
instructions, landingpads (and cleanuppads, etc.) may not be moved
between basic blocks.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14475

llvm-svn: 253182
2015-11-16 04:47:58 +00:00
Chad Rosier cc299b627d [LIR] Add support for creating memcpys from loops with a negative stride.
This allows us to transform the below loop into a memcpy.

void test(unsigned *__restrict__ a, unsigned *__restrict__ b) {
  for (int i = 2047; i >= 0; --i) {
    a[i] = b[i];
  }
}

This is the memcpy version of r251518, which added support for memset with
negative strided loops.

llvm-svn: 253091
2015-11-13 21:51:02 +00:00
Chad Rosier 2fa50a7a05 Add a comment that should have made my last commit.
llvm-svn: 253063
2015-11-13 19:13:40 +00:00
Chad Rosier ed0c7d1316 [LIR] Factor out the code to compute base ptr for negative strided loops.
This will allow for the code to be reused in the memcpy optimization.

llvm-svn: 253061
2015-11-13 19:11:07 +00:00
Tobias Grosser 8241795d20 Revert "Fix bug 25440: GVN assertion after coercing loads"
This reverts 252919 which broke LNT: MultiSource/Applications/SPASS

llvm-svn: 252936
2015-11-12 20:04:21 +00:00
Chad Rosier a548fe569b [LIR] Minor refactoring. NFCI.
This change prevents uninteresting stores from being inserted into the list of
candidate stores for memset/memcpy conversion.

llvm-svn: 252926
2015-11-12 19:09:16 +00:00
Weiming Zhao eed0145dd2 Fix bug 25440: GVN assertion after coercing loads
Summary:
when coercing loads, it inserts some instructions, which have no GV assigned.

https://llvm.org/bugs/show_bug.cgi?id=25440


Reviewers: hfinkel, dberlin

Subscribers: dberlin, llvm-commits

Differential Revision: http://reviews.llvm.org/D14479

llvm-svn: 252919
2015-11-12 18:19:59 +00:00
Chad Rosier cc9030b60a [LIR] General refactor to improve compile-time and simplify code.
First create a list of candidates, then transform.  This simplifies the code in
that you have don't have to worry that you may be using an invalidated
iterator.

Previously, each time we created a memset/memcpy we would reevaluate the entire
loop potentially resulting in lots of redundant work for large basic blocks.

llvm-svn: 252817
2015-11-11 23:00:59 +00:00
Renato Golin 0e77d72b0a Revert "Strip metadata when speculatively hoisting instructions"
This reverts commit r252604, as it broke all ARM and AArch64 buildbots, as
well as some x86, et al.

llvm-svn: 252623
2015-11-10 18:01:16 +00:00
Igor Laevsky 01c3692a10 Strip metadata when speculatively hoisting instructions
This is fix for PR24059.

When we are hoisting instruction above some condition it may turn out
that metadata on this instruction was control dependant on the condition.
This metadata becomes invalid and we need to drop it.

This patch should cover most obvious places of speculative execution (which
I have found by greping isSafeToSpeculativelyExecute). I think there are more
cases but at least this change covers the severe ones.

Differential Revision: http://reviews.llvm.org/D14398

llvm-svn: 252604
2015-11-10 14:10:31 +00:00
Chad Rosier 19dc92dc8d Simplify. NFC.
llvm-svn: 252491
2015-11-09 16:56:06 +00:00
Silviu Baranga 2910a4f6b1 Allow LLE/LD and the loop versioning infrastructure to use SCEV predicates
Summary:
LAA currently generates a set of SCEV predicates that must be checked by users.
In the case of Loop Distribute/Loop Load Elimination, no such predicates could have
been emitted, since we don't allow stride versioning. However, in the future there
could be SCEV predicates that will need to be checked.

This change adds support for SCEV predicate versioning in the Loop Distribute, Loop
Load Eliminate and the loop versioning infrastructure.

Reviewers: anemet

Subscribers: mssimpso, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14240

llvm-svn: 252467
2015-11-09 13:26:09 +00:00
David Majnemer b222184223 [LoopStrengthReduce] Don't bother fixing up PHIs from EH Pad preds
We cannot really insert fixup code into a PHI's predecessor.

This fixes PR25445.

llvm-svn: 252416
2015-11-08 05:04:07 +00:00
Duncan P. N. Exon Smith 83c4b68720 ADT: Remove last implicit ilist iterator conversions, NFC
Some implicit ilist iterator conversions have crept back into Analysis,
Transforms, Hexagon, and llvm-stress.  This removes them.

I'll commit a patch immediately after this to disallow them (in a
separate patch so that it's easy to revert if necessary).

llvm-svn: 252371
2015-11-07 00:01:16 +00:00
Akira Hatanaka 5cfcce12eb Add 'notail' marker for call instructions.
This marker prevents optimization passes from adding 'tail' or
'musttail' markers to a call. Is is used to prevent tail call
optimization from being performed on the call.

rdar://problem/22667622

Differential Revision: http://reviews.llvm.org/D12923

llvm-svn: 252368
2015-11-06 23:55:38 +00:00
Sanjoy Das 55ea67cea7 [ValueTracking] Add parameters to isImpliedCondition; NFC
Summary:
This change makes the `isImpliedCondition` interface similar to the rest
of the functions in ValueTracking (in that it takes a DataLayout,
AssumptionCache etc.).  This is an NFC, intended to make a later diff
less noisy.

Depends on D14369

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14391

llvm-svn: 252333
2015-11-06 19:01:08 +00:00
Chad Rosier 43f9b48975 [LIR] Simplify code by making DataLayout globally accessible. NFC.
llvm-svn: 252317
2015-11-06 16:33:57 +00:00
Eugene Zelenko ffec81ca00 Fix some Clang-tidy modernize warnings, other minor fixes.
Fixed warnings are: modernize-use-override, modernize-use-nullptr and modernize-redundant-void-arg.

Differential revision: http://reviews.llvm.org/D14312

llvm-svn: 252087
2015-11-04 22:32:32 +00:00
Philip Reames 814fb60130 [CVP] Fold return values if possible
In my previous change to CVP (251606), I made CVP much more aggressive about trying to constant fold comparisons. This patch is a reversal in direction. Rather than being agressive about every compare, we restore the non-block local restriction for most, and then try hard for compares feeding returns.

The motivation for this is two fold:
 * The more I thought about it, the less comfortable I got with the possible compile time impact of the other approach. There have been no reported issues, but after talking to a couple of folks, I've come to the conclusion the time probably isn't justified.
 * It turns out we need to know the context to leverage the full power of LVI. In particular, asking about something at the end of it's block (the use of a compare in a return) will frequently get more precise results than something in the middle of a block. This is an implementation detail, but it's also hard to get around since mid-block queries have to reason about possible throwing instructions and don't get to use most of LVI's block focused infrastructure. This will become particular important when combined with http://reviews.llvm.org/D14263.

Differential Revision: http://reviews.llvm.org/D14271

llvm-svn: 252032
2015-11-04 01:43:54 +00:00
Adam Nemet 7c94c9bf07 Fix unused variable warning from r252017
llvm-svn: 252019
2015-11-04 00:10:33 +00:00
Adam Nemet e54a4fa95d LLE 6/6: Add LoopLoadElimination pass
Summary:
The goal of this pass is to perform store-to-load forwarding across the
backedge of a loop.  E.g.:

  for (i)
     A[i + 1] = A[i] + B[i]

  =>

  T = A[0]
  for (i)
     T = T + B[i]
     A[i + 1] = T

The pass relies on loop dependence analysis via LoopAccessAnalisys to
find opportunities of loop-carried dependences with a distance of one
between a store and a load.  Since it's using LoopAccessAnalysis, it was
easy to also add support for versioning away may-aliasing intervening
stores that would otherwise prevent this transformation.

This optimization is also performed by Load-PRE in GVN without the
option of multi-versioning.  As was discussed with Daniel Berlin in
http://reviews.llvm.org/D9548, this is inferior to a more loop-aware
solution applied here.  Hopefully, we will be able to remove some
complexity from GVN/MemorySSA as a consequence.

In the long run, we may want to extend this pass (or create a new one if
there is little overlap) to also eliminate loop-indepedent redundant
loads and store that *require* versioning due to may-aliasing
intervening stores/loads.  I have some motivating cases for store
elimination. My plan right now is to wait for MemorySSA to come online
first rather than using memdep for this.

The main motiviation for this pass is the 456.hmmer loop in SPECint2006
where after distributing the original loop and vectorizing the top part,
we are left with the critical path exposed in the bottom loop.  Being
able to promote the memory dependence into a register depedence (even
though the HW does perform store-to-load fowarding as well) results in a
major gain (~20%).  This gain also transfers over to x86: it's
around 8-10%.

Right now the pass is off by default and can be enabled
with -enable-loop-load-elim.  On the LNT testsuite, there are two
performance changes (negative number -> improvement):

  1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the
     critical paths is reduced
  2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this
     outside of LNT

The pass is scheduled after the loop vectorizer (which is after loop
distribution).  The rational is to try to reuse LAA state, rather than
recomputing it.  The order between LV and LLE is not critical because
normally LV does not touch scalar st->ld forwarding cases where
vectorizing would inhibit the CPU's st->ld forwarding to kick in.

LoopLoadElimination requires LAA to provide the full set of dependences
(including forward dependences).  LAA is known to omit loop-independent
dependences in certain situations.  The big comment before
removeDependencesFromMultipleStores explains why this should not occur
for the cases that we're interested in.

Reviewers: dberlin, hfinkel

Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13259

llvm-svn: 252017
2015-11-03 23:50:08 +00:00
Adam Nemet a2df750fb3 [LAA] LLE 3/6: Rename InterestingDependence to Dependences, NFC
Summary:
We now collect all types of dependences including lexically forward
deps not just "interesting" ones.

Reviewers: hfinkel

Subscribers: rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13256

llvm-svn: 251985
2015-11-03 21:39:52 +00:00
Tobias Grosser 526d52691a Revert "[IndVarSimplify] Rewrite loop exit values with their initial values from loop preheader"
Commit 251839 triggers miscompiles on some bots:

http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-fast/builds/13723

(The commit is listed in 13722, but due to an existing failure introduced in
13721 and reverted in 13723 the failure is only visible in 13723)

To verify r251839 is indeed the only change that triggered the buildbot failures
and to ensure the buildbots remain green while investigating I temporarily
revert this commit. At the current state it is unclear if this commit introduced
some miscompile or if it only exposed code to Polly that is subsequently
miscompiled by Polly.

llvm-svn: 251901
2015-11-03 07:14:39 +00:00
Chen Li d715310162 [IndVarSimplify] Rewrite loop exit values with their initial values from loop preheader
Summary:
This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch.


Reviewers: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13974

llvm-svn: 251839
2015-11-02 22:00:15 +00:00
Justin Bogner 19b679963f [PM] Port ADCE to the new pass manager
llvm-svn: 251725
2015-10-30 23:13:18 +00:00
Philip Reames eb3e9dad7f [LVI/CVP] Teach LVI about range metadata
Somewhat shockingly for an analysis pass which is computing constant ranges, LVI did not understand the ranges provided by range metadata.

As part of this change, I included a change to CVP primarily because doing so made it much easier to write small self contained test cases. CVP was previously only handling the non-local operand case, but given that LVI can sometimes figure out information about instructions standalone, I don't see any reason to restrict this.  There could possibly be a compile time impact from this, but I suspect it should be minimal.  If anyone has an example which substaintially regresses, please let me know.  I could restrict the block local handling to ICmps feeding Terminator instructions if needed.  

Note that this patch continues a somewhat bad practice in LVI. In many cases, we know facts about values, and separate context sensitive facts about values. LVI makes no effort to distinguish and will frequently cache the same value fact repeatedly for different contexts. I would like to change this, but that's a large enough change that I want it to go in separately with clear documentation of what's changing. Other examples of this include the non-null handling, and arguments.

As a meta comment: the entire motivation of this change was being able to write smaller (aka reasonable sized) test cases for a future patch teaching LVI about select instructions.

Differential Revision: http://reviews.llvm.org/D13543

llvm-svn: 251606
2015-10-29 03:57:17 +00:00
Sanjoy Das 13e63a2f21 [JumpThreading] Use dominating conditions to prove implications
Summary:
If P branches to Q conditional on C and Q branches to R conditional on
C' and C => C' then the branch conditional on C' can be folded to an
unconditional branch.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13972

llvm-svn: 251557
2015-10-28 21:27:08 +00:00
Chad Rosier 7142da0ed4 Typo.
llvm-svn: 251521
2015-10-28 15:08:33 +00:00
Chad Rosier 7967614b2b Reapply: [LIR] Add support for creating memsets from loops with a negative stride.
The simple fix is to prevent forming memcpy from loops with a negative stride.

llvm-svn: 251518
2015-10-28 14:38:49 +00:00
Chad Rosier 8eb2a18a9f Revert "[LIR] Add support for creating memsets from loops with a negative stride."
This reverts commit r251512.  This is causing LNT/chomp to fail.

llvm-svn: 251513
2015-10-28 13:54:09 +00:00
Chad Rosier d6a6bd5501 [LIR] Add support for creating memsets from loops with a negative stride.
http://reviews.llvm.org/D14125

llvm-svn: 251512
2015-10-28 12:55:34 +00:00
Chen Li 8d23a9bbef Revert r251492 "[IndVarSimplify] Rewrite loop exit values with their
initial values from loop preheader", because it broke some bots.

llvm-svn: 251498
2015-10-28 05:15:51 +00:00
Chen Li 032a5d0cea [IndVarSimplify] Rewrite loop exit values with their initial values from loop preheader
Summary:
This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch.


Reviewers: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13974

llvm-svn: 251492
2015-10-28 04:45:47 +00:00
Igor Laevsky 1ef06559f4 [RS4GC] Strip noalias attribute after statepoint rewrite
We should remove noalias along with dereference and dereference_or_null attributes 
because statepoint could potentially touch the entire heap including noalias objects.

Differential Revision: http://reviews.llvm.org/D14032

llvm-svn: 251333
2015-10-26 19:06:01 +00:00
Benjamin Kramer 8ceb323bb4 Convert assert(false) into llvm_unreachable where it makes sense.
llvm-svn: 251266
2015-10-25 22:28:27 +00:00
NAKAMURA Takumi 26c3872666 ScalarReplAggregates.cpp: Try to appease clash of anonymous::SROA in modules build.
llvm-svn: 251181
2015-10-24 06:42:42 +00:00
Igor Laevsky dde0029a25 [RS4GC] Rename stripDereferenceabilityInfo into stripNonValidAttributes.
llvm-svn: 251157
2015-10-23 22:42:44 +00:00
Tim Northover d4f55c0b1b GVN: don't try to replace instruction with itself.
After some look-ahead PRE was added for GEPs, an instruction could end
up in the table of candidates before it was actually inspected. When
this happened the pass might decide it was the best candidate to
replace itself. This didn't go well.

Should fix PR25291

llvm-svn: 251145
2015-10-23 20:30:02 +00:00
Justin Bogner 35e46cdd04 LoopPass: Simplify the API for adding a new loop. NFC
The insertLoop() API is only used to add new loops, and has confusing
ownership semantics. Simplify it by replacing it with addLoop().

llvm-svn: 251064
2015-10-22 21:21:32 +00:00
David Majnemer e0675fb8fb [Sink] Don't check BB.empty()
As an invariant, BasicBlocks cannot be empty when passed to a transform.
This is not the case for MachineBasicBlocks and the Sink pass was ported
from the MachineSink pass which would explain the check's existence.

llvm-svn: 251057
2015-10-22 20:29:08 +00:00
Sanjoy Das 3020b1bc8c [RS4GC] Remove a redundant linear search, NFCI
Since LiveVariables is uniqued (we just created it from a `DenseSet`),
`FindIndex(LiveVariables, LiveVariables[i])` is always `i`.

llvm-svn: 250786
2015-10-20 01:06:31 +00:00
Sanjoy Das b1942f14cd [RS4GC] Clean up `find_index`; NFC
- Bring it up to the LLVM Coding Style
 - Sink it inside `CreateGCRelocates`, which is its only user

llvm-svn: 250785
2015-10-20 01:06:28 +00:00
Sanjoy Das 7ad67640e9 [RS4GC] Re-purpose `normalizeForInvokeSafepoint`; NFC.
`normalizeForInvokeSafepoint` in RewriteStatepointsForGC.cpp, as it is
written today, deals with `gc.relocate` and `gc.result` uses of a
statepoint equally well.  This change documents this fact and adds a
test case.

There is no functional change here -- only documentation of existing
functionality.

llvm-svn: 250784
2015-10-20 01:06:24 +00:00
Sanjoy Das ff3dba736a [RS4GC] Minor cleanup to `normalizeForInvokeSafepoint`; NFC
llvm-svn: 250783
2015-10-20 01:06:17 +00:00
Jakub Staszak f12821a43c Preserve CFG in MergedLoadStoreMotion. This fixes PR24426.
llvm-svn: 250660
2015-10-18 19:34:10 +00:00
Sanjoy Das 58fae7cf6b [RS4GC] Dont' propagate call attrs related to patchable statepoints
The `"statepoint-id"` and `"statepoint-num-patch-bytes"` attributes are
used solely to determine properties of the `gc.statepoint` being
created.  Once the `gc.statepoint` is in place, these should be removed.

llvm-svn: 250491
2015-10-16 02:41:23 +00:00
Sanjoy Das 810a59d037 [RS4GC] Bring legalizeCallAttributes up to LLVM coding style; NFC
llvm-svn: 250490
2015-10-16 02:41:11 +00:00
Sanjoy Das 25ec1a3e60 [RS4GC] Use "deopt" operand bundles
Summary:
This is a step towards using operand bundles to carry deopt state till
RewriteStatepointsForGC.  The change adds a flag to
RewriteStatepointsForGC that teaches it to pick up deopt state from a
`"deopt"` operand bundle attached to the `call` or `invoke` it is
wrapping.

The command line flag added, `-rs4gc-use-deopt-bundles`, will only exist
for a short while.  Once we are able to pipe deopt bundle state through
the full optimization pipeline without problems, we will "constant fold"
`-rs4gc-use-deopt-bundles` to `true`.

Reviewers: swaroop.sridhar, reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13372

llvm-svn: 250489
2015-10-16 02:41:00 +00:00
Sanjoy Das 7360f30852 [IndVars] Rename getExtend; NFC
Rename `IndVarSimplify::getExtend` to `IndVarSimplify::createExtendInst`
to make it obvious that it creates `llvm::Instruction` s.

llvm-svn: 250484
2015-10-16 01:00:50 +00:00
Sanjoy Das 37e87c2023 [IndVars] Have `cloneArithmeticIVUser` guess better
Summary:
`cloneArithmeticIVUser` currently trips over expression like `add %iv,
-1` when `%iv` is being zero extended -- it tries to construct the
widened use as `add %iv.zext, zext(-1)` and (correctly) fails to prove
equivalence to `zext(add %iv, -1)` (here the SCEV for `%iv` is
`{1,+,1}`).

This change teaches `IndVars` to try sign extending the non-IV operand
if that makes the newly constructed IV use equivalent to the widened
narrow IV use.

Reviewers: atrick, hfinkel, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13717

llvm-svn: 250483
2015-10-16 01:00:47 +00:00
Sanjoy Das 472840a3d3 [IndVars] Extract out a few local variables; NFC
llvm-svn: 250482
2015-10-16 01:00:44 +00:00
Sanjoy Das 1fd184e5a2 [IndVars] Split `WidenIV::cloneIVUser`; NFC
Summary:
This NFC splitting is intended to make a later diff easier to follow.
It just tail duplicates `cloneIVUser` into `cloneArithmeticIVUser` and
`cloneBitwiseIVUser`.

Reviewers: atrick, hfinkel, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13716

llvm-svn: 250481
2015-10-16 01:00:39 +00:00
Benjamin Kramer 6db3338cb1 [ScalarOpts] Remove dead code.
Does not touch debug dumpers. NFC.

llvm-svn: 250417
2015-10-15 15:08:58 +00:00
Manman Ren 72d44b1b09 Recommit r250345, it was reverted in r250366 to investigate a bot failure.
Our internal bot is still red after r250366.

llvm-svn: 250415
2015-10-15 14:59:40 +00:00
Manman Ren f5499fd9d5 Temporarily revert r250345 to sort out bot failure.
With r250345 and r250343, we start to observe the following failure
when bootstrap clang with lto and pgo:
PHI node entries do not match predecessors!
  %.sroa.029.3.i = phi %"class.llvm::SDNode.13298"* [ null, %30953 ], [ null, %31017 ], [ null, %30998 ], [ null, %_ZN4llvm8dyn_castINS_14ConstantSDNodeENS_7SDValueEEENS_10cast_rettyIT_T0_E8ret_typeERS5_.exit.i.1804 ], [ null, %30975 ], [ null, %30991 ], [ null, %_ZNK4llvm3EVT13getScalarTypeEv.exit.i.1812 ], [ %..sroa.029.0.i, %_ZN4llvm11SmallVectorIiLj8EED1Ev.exit.i.1826 ], !dbg !451895
label %30998
label %_ZNK4llvm3EVTeqES0_.exit19.thread.i
LLVM ERROR: Broken function found, compilation aborted!

I will re-commit this if the bot does not recover.

llvm-svn: 250366
2015-10-15 04:58:24 +00:00
Cong Hou b74d3b3b86 Update the branch weight metadata in JumpThreading pass.
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).

This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250345
2015-10-14 23:14:17 +00:00
Chen Li 567aa7ab30 [LoopUnswitch] Correct misleading comments.
Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13738

llvm-svn: 250317
2015-10-14 19:47:43 +00:00
Manman Ren 2c8e16d507 Revert r250204 and r250240 due to bot failure. We failed to build PGO-ed clang.
llvm-svn: 250264
2015-10-14 03:04:03 +00:00
Chad Rosier 7f08d80595 Typo.
llvm-svn: 250224
2015-10-13 20:59:16 +00:00
Duncan P. N. Exon Smith be4d8cba1c Scalar: Remove remaining ilist iterator implicit conversions
Remove remaining `ilist_iterator` implicit conversions from
LLVMScalarOpts.

This change exposed some scary behaviour in
lib/Transforms/Scalar/SCCP.cpp around line 1770.  This patch changes a
call from `Function::begin()` to `&Function::front()`, since the return
was immediately being passed into another function that takes a
`Function*`.  `Function::front()` started to assert, since the function
was empty.  Note that `Function::end()` does not point at a legal
`Function*` -- it points at an `ilist_half_node` -- so the other
function was getting garbage before.  (I added the missing check for
`Function::isDeclaration()`.)

Otherwise, no functionality change intended.

llvm-svn: 250211
2015-10-13 19:26:58 +00:00
Cong Hou 7ab123a5cf Update the branch weight metadata in JumpThreading pass.
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250204
2015-10-13 18:43:10 +00:00
Duncan P. N. Exon Smith 3a9c9e3dcd Scalar: Remove some implicit ilist iterator conversions, NFC
Remove some of the implicit ilist iterator conversions in
LLVMScalarOpts.  More to go.

llvm-svn: 250197
2015-10-13 18:26:00 +00:00
Sanjoy Das b873cbe5c9 [IndVars] NFC Cleanup.
- Rename methods according to the LLVM Coding Style
 - Merge adjacent anonymous namespace block
 - Use `auto` in two places

llvm-svn: 250152
2015-10-13 07:17:38 +00:00
Manman Ren 9f824dab1d Revert 250089 due to bot failure. It failed when building clang itself with PGO.
llvm-svn: 250145
2015-10-13 03:38:02 +00:00
Cong Hou 3320bcd815 Update the branch weight metadata in JumpThreading pass.
In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). 

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250089
2015-10-12 19:44:08 +00:00
Sanjoy Das cc16ccc1ab [IndVars] Use `auto`; NFC
llvm-svn: 249944
2015-10-10 06:33:33 +00:00
Owen Anderson 97ca0f3f2c Generalize convergent check to handle invokes as well as calls.
llvm-svn: 249892
2015-10-09 20:17:46 +00:00
Owen Anderson 2c9978b12b Teach LoopUnswitch not to perform non-trivial unswitching on loops containing convergent operations.
Doing so could cause the post-unswitching convergent ops to be
control-dependent on the unswitch condition where they were not before.
This check could be refined to allow unswitching where the convergent
operation was already control-dependent on the unswitch condition.

llvm-svn: 249874
2015-10-09 18:40:20 +00:00
Owen Anderson d95b08a0a7 Refine the definition of convergent to only disallow the addition of new control dependencies.
This covers the common case of operations that cannot be sunk.
Operations that cannot be hoisted should already be handled properly via
the safe-to-speculate rules and mechanisms.

llvm-svn: 249865
2015-10-09 18:06:13 +00:00
Andrea Di Biagio 99493df257 [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.
Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
As a consequence, adjacent nontemporal stores are always merged into a
memset call.

Example:

;;;
define void @foo(<4 x float>* nocapture %p) {
entry:
  store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
  %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
  store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
  ret void
}

!0 = !{i32 1}
;;;

In this example, the two nontemporal stores are combined to a memset of zero
which does not preserve the nontemporal hint. Later on the backend (tested on a
x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
aligned vector stores.

opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -

Before:
  xorps  %xmm0, %xmm0
  movaps  %xmm0, 16(%rdi)
  movaps  %xmm0, (%rdi)

With this patch, we no longer merge nontemporal stores into calls to memset.
In this example, llc correctly expands the two stores into two movntps:
  xorps  %xmm0, %xmm0
  movntps %xmm0, 16(%rdi)
  movntps  %xmm0, (%rdi)

In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
calls. However a change like that would only have the effect of forcing the
backend to expand !nontemporal memsets back to sequences of store instructions.
A memset library call would not have exactly the same semantic of a builtin
!nontemporal memset call. So, SelectionDAG will have to conservatively expand
it back to a sequence of !nontemporal stores (effectively undoing the merging).

Differential Revision: http://reviews.llvm.org/D13519

llvm-svn: 249820
2015-10-09 10:53:41 +00:00
Arnaud A. de Grandmaison 859b2ac07d [EarlyCSE] Address post commit review for r249523.
llvm-svn: 249814
2015-10-09 09:23:01 +00:00
Sanjoy Das 3c520a1272 [RS4GC] Refactoring to make a later change easier, NFCI
Summary:
These non-semantic changes will help make a later change adding
support for deopt operand bundles more streamlined.

Reviewers: reames, swaroop.sridhar

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13491

llvm-svn: 249779
2015-10-08 23:18:38 +00:00
Sanjoy Das c21a05a3a4 [PlaceSafeopints] Extract out `callsGCLeafFunction`, NFC
Summary:
This will be used in a later change to RewriteStatepointsForGC.

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13490

llvm-svn: 249777
2015-10-08 23:18:30 +00:00
Sanjoy Das 1ede5367ba [RS4GC] Don't copy ADT's unneccessarily, NFCI
Summary: Use `const auto &` instead of `auto` in `makeStatepointExplicit`.

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13454

llvm-svn: 249776
2015-10-08 23:18:22 +00:00
Sanjoy Das 40bdd041db [RS4GC] Use AssertingVH for RematerializedValueMapTy, NFCI
Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13489

llvm-svn: 249620
2015-10-07 21:32:35 +00:00
Arnaud A. de Grandmaison a6178a179d [EarlyCSE] Fix handling of target memory intrinsics for CSE'ing loads.
Summary:
Some target intrinsics can access multiple elements, using the pointer as a
base address (e.g. AArch64 ld4). When trying to CSE such instructions,
it must be checked the available value comes from a compatible instruction
because the pointer is not enough to discriminate whether the value is
correct.

Reviewers: ssijaric

Subscribers: mcrosier, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D13475

llvm-svn: 249523
2015-10-07 07:41:29 +00:00
Sanjoy Das 60bf3db17f [RS4GC] Remove an unnecessary assert & related variables
I don't think this assert adds much value, and removing it and related
variables avoids an "unused variable" warning in release builds.

llvm-svn: 249511
2015-10-07 02:39:27 +00:00
Sanjoy Das b40bd1a93f [RS4GC] Cosmetic cleanup, NFC
Summary:
A series of cosmetic cleanup changes to RewriteStatepointsForGC:

  - Rename variables to LLVM style
  - Remove some redundant asserts
  - Remove an unsued `Pass *` parameter
  - Remove unnecessary variables
  - Use C++11 idioms where applicable
  - Pass CallSite by value, not reference

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13370

llvm-svn: 249508
2015-10-07 02:39:18 +00:00
Hans Wennborg 083ca9bb32 Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups.
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D13321

llvm-svn: 249482
2015-10-06 23:24:35 +00:00
Sanjoy Das 5c8bead46d [IndVars] Don't break dominance in `eliminateIdentitySCEV`
Summary:
After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and
Y are related in the dominator tree, even if X is an operand to Y (I've
included a toy example in comments, and a real example as a test case).

This commit changes `SimplifyIndVar` to require a `DominatorTree`.  I
don't think this is a problem because `ScalarEvolution` requires it
anyway.

Fixes PR25051.

Depends on D13459.

Reviewers: atrick, hfinkel

Subscribers: joker.eph, llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13460

llvm-svn: 249471
2015-10-06 21:44:49 +00:00
Arnaud A. de Grandmaison 6fd488b156 [EarlyCSE] Constify ParseMemoryInst methods (NFC).
llvm-svn: 249400
2015-10-06 13:35:30 +00:00
Piotr Padlewski dc9b2cfc50 inariant.group handling in GVN
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.

http://reviews.llvm.org/D12992

llvm-svn: 249196
2015-10-02 22:12:22 +00:00
Jingyue Wu df1a1b113b [NaryReassociate] SeenExprs records WeakVH
Summary:
The instructions SeenExprs records may be deleted during rewriting.
FindClosestMatchingDominator should ignore these deleted instructions.

Fixes PR24301.

Reviewers: grosser

Subscribers: grosser, llvm-commits

Differential Revision: http://reviews.llvm.org/D13315

llvm-svn: 248983
2015-10-01 03:51:44 +00:00
Fiona Glaser b0c6d9174e DeadCodeElimination: rewrite to be faster
Same strategy as simplifyInstructionsInBlock. ~1/3 less time
on my test suite. This pass doesn't have many in-tree users,
but getting rid of an O(N^2) worst case and making it cleaner
should at least make it a viable alternative to ADCE, since
it's now consistently somewhat faster.

llvm-svn: 248927
2015-09-30 17:49:49 +00:00
Chen Li 9f27fc0599 [LoopUnswitch] Add block frequency analysis to recognize hot/cold regions
Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default.

Reviewers: broune, silvas, dnovillo, reames

Subscribers: davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D11605

llvm-svn: 248777
2015-09-29 05:03:32 +00:00
Weiming Zhao 310770a90f [LoopReroll] Ignore debug intrinsics
Originally, debug intrinsics and annotation intrinsics may prevent
the loop to be rerolled, now they are ignored.

Differential Revision: http://reviews.llvm.org/D13150

llvm-svn: 248718
2015-09-28 17:03:23 +00:00
Justin Bogner 0638b7ba99 ADCE: Fix typo in file comment. NFC
llvm-svn: 248613
2015-09-25 21:03:46 +00:00
Lawrence Hu cac0b89289 Swap loop invariant GEP with loop variant GEP to allow more LICM.
This patch changes the order of GEPs generated by Splitting GEPs
    pass, specially when one of the GEPs has constant and the base is
    loop invariant, then we will generate the GEP with constant first
    when beneficial, to expose more cases for LICM.

    If originally Splitting GEP generate the following:
      do.body.i:
        %idxprom.i = sext i32 %shr.i to i64
        %2 = bitcast %typeD* %s to i8*
        %3 = shl i64 %idxprom.i, 2
        %uglygep = getelementptr i8, i8* %2, i64 %3
        %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
      ...
    Now it genereates:
      do.body.i:
        %idxprom.i = sext i32 %shr.i to i64
        %2 = bitcast %typeD* %s to i8*
        %3 = shl i64 %idxprom.i, 2
        %uglygep = getelementptr i8, i8* %2, i64 1032
        %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
      ...

    For no-loop cases, the original way of generating GEPs seems to
    expose more CSE cases, so we don't change the logic for no-loop
    cases, and only limit our change to the specific case we are
    interested in.

llvm-svn: 248420
2015-09-23 19:25:30 +00:00
Igor Laevsky 029bd93c5d [DeadStoreElimination] Remove dead zero store to calloc initialized memory
This change allows dead store elimination to remove zero and null stores into memory freshly allocated with calloc-like function.

Differential Revision: http://reviews.llvm.org/D13021

llvm-svn: 248374
2015-09-23 11:38:44 +00:00
Sanjoy Das 2aacc0ecca [SCEV] Introduce ScalarEvolution::getOne and getZero.
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.

I've refactored the call sites I could find easily to use getZero /
getOne.

Reviewers: hfinkel, majnemer, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12947

llvm-svn: 248362
2015-09-23 01:59:04 +00:00
Michael Zolotukhin deade19630 [Unroll] Do not crash trying to propagate a value to vector load.
llvm-svn: 248333
2015-09-22 22:27:12 +00:00
Michael Zolotukhin 8bb31dd08a [Unroll] Follow-up for r247769: fix a bug in UnrolledInstAnalyzer::visitLoad.
Apart from checking that GlobalVariable is a constant, we should check
that it's not a weak constant, in which case we can't propagate its
value.

llvm-svn: 248327
2015-09-22 21:41:29 +00:00
NAKAMURA Takumi 10c80e7996 Prune trailing whitespaces.
llvm-svn: 248265
2015-09-22 11:19:03 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
NAKAMURA Takumi 84965031a7 Reformat comment lines.
llvm-svn: 248262
2015-09-22 11:14:12 +00:00
NAKAMURA Takumi 70ad98aca4 Reformat.
llvm-svn: 248261
2015-09-22 11:13:55 +00:00
Michael Zolotukhin 9f3aea6e1f [LoopUnswitch] Require DominatorTree info.
Summary:
We should either require the DT info to be available, or check if it's
available in every place we use DT (and we already miss such check in
one place, which causes failures in some cases). As other loop passes
preserve DT and it's usually available, it makes sense to just require
it here.

There is no regression test, because the bug only shows up if pass
manager decides to clean DT info right before LoopUnswitch. If
loop-unswitch is run separately, DT is available, so bug isn't exposed.

Reviewers: chandlerc, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13036

llvm-svn: 248230
2015-09-22 00:22:47 +00:00
Philip Reames 5f99423de9 [LICM] Hoist calls to readonly argmemonly functions even with stores in the loop
We know that an argmemonly function can only access memory pointed to by it's pointer arguments. Rather than needing to consider all possible stores as aliasing (as we do for a readonly function), we can only consider the aliasing of the pointer arguments.

Note that this change only addresses hoisting. I'm thinking about how to address speculation safety as well, but that will be a different change.

FYI, argmemonly disallows accessing memory through non-pointer typed arguments.  

Differential Revision: http://reviews.llvm.org/D12771

llvm-svn: 248220
2015-09-21 22:27:59 +00:00
Mehdi Amini 24e20583d1 Fix UB: can't bind a reference to nullptr (NFC)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 248213
2015-09-21 21:29:43 +00:00
Sanjoy Das 7cc2cfecd9 [IndVars] Use C++11 style field initialization; NFCI.
llvm-svn: 248131
2015-09-20 18:42:53 +00:00
Sanjoy Das e1e352d5c5 [IndVars] Don't add a level of indentation for namespace {. NFC.
Whitespace-only change.

llvm-svn: 248130
2015-09-20 18:42:50 +00:00
Sanjoy Das 9119bf4c0b [IndVars] Don't repeat function names in comment; NFC.
Only changes comments.

llvm-svn: 248112
2015-09-20 06:58:03 +00:00
Sanjoy Das 428db150d1 [IndVars] Fix a bug in r248045.
Because -indvars widens induction variables through arithmetic,
`NeverNegative` cannot be a property of the `WidenIV` (a `WidenIV`
manages information for all transitive uses of an IV being widened,
including uses of `-1 * IV`).  Instead it must live on `NarrowIVDefUse`
which manages information for a specific def-use edge in the transitive
use list of an induction variable.

This change also adds a test case that demonstrates the problem with
r248045.

llvm-svn: 248107
2015-09-20 01:52:18 +00:00
Sanjoy Das f69d0e3384 [IndVars] Widen more comparisons for non-negative induction vars
Summary:
If an induction variable is provably non-negative, its sign extension is
equal to its zero extension.  This means narrow uses like

  icmp slt iNarrow %indvar, %rhs

can be widened into

  icmp slt iWide zext(%indvar), sext(%rhs)

Reviewers: atrick, mcrosier, hfinkel

Subscribers: hfinkel, reames, llvm-commits

Differential Revision: http://reviews.llvm.org/D12745

llvm-svn: 248045
2015-09-18 21:21:02 +00:00
Larisse Voufo 532bf7153c Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan. (Complete version of r247497. See D12886)
llvm-svn: 248022
2015-09-18 19:14:35 +00:00
Piotr Padlewski a4d43337d4 gvn small fix
http://reviews.llvm.org/D12928

llvm-svn: 247935
2015-09-17 20:34:22 +00:00
David L Kreitzer da700ce581 Test commit: Fixed a few typos in the comments.
llvm-svn: 247793
2015-09-16 13:27:30 +00:00
Michael Zolotukhin fc314be0ec [Unroll] Fix a bug in UnrolledInstAnalyzer::visitLoad.
We only checked that a global is initialized with constants, which is
incorrect. We should be checking that GlobalVariable *is* a constant,
not just initialized with it.

llvm-svn: 247769
2015-09-16 03:25:09 +00:00
Sanjoy Das 8a5526e8be [IndVars] Fix PR24783.
In `IndVarSimplify::ExpandSCEVIfNeeded`,
`SCEVExpander::findExistingExpansion` may return an `llvm::Value` that
differs in type from the SCEV it was asked to find an expansion for (but
computes the same value).  In such cases, we fall back on
`expandCodeFor`; and rely on LLVM to CSE the two equivalent
expressions (different only by a no-op cast) into a single computation.

I tried a few other approaches to fixing PR24783, all of which turned
out to be more complex than this current version:

 1. Move the `ExpandSCEVIfNeeded` logic into `expandCodeFor`.  This got
    problematic because currently we do not pass in the `Loop *` into
    `expandCodeFor`.  Changing the interface to do this is a more
    invasive change, and really does not make much semantic sense unless
    the SCEV being passed in is an add recurrence.

    There is also the problem of `expandCodeFor` being used in places
    other than `indvars` -- there may be performance / correctness
    issues elsewhere if `expandCodeFor` is moved from always generating
    IR from scratch to cache-like model.

 2. Have `findExistingExpansion` only return expression with the correct
    type.  This would make `isHighCostExpansionHelper` and thus
    `isHighCostExpansion` more conservative than necessary.

 3. Insert casts on the value returned by `findExistingExpansion` if
    needed using `InsertNoopCastOfTo`.  This is complicated because
    `InsertNoopCastOfTo` depends on internal state of its
    `SCEVExpander` (specifically `Builder.GetInserPoint()`), and this
    may not be set up when `ExpandSCEVIfNeeded` is called.

 4. Manually insert casts on the value returned by
    `findExistingExpansion` if needed using `InsertNoopCastOfTo` via
    `CastInst::Create`.  This is probably workable, but figuring out the
    location where the cast instruction needs to be inserted has enough
    edge cases (arguments, constants, invokes, LCSSA must be preserved)
    makes me feel what I have right now is simplest solution.

llvm-svn: 247749
2015-09-15 23:45:39 +00:00
Sanjoy Das 0ce51a92a8 [IndVars] Rename variable; NFC.
llvm-svn: 247748
2015-09-15 23:45:35 +00:00
Larisse Voufo 6b867c7254 Revert "Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan." for preliminary community discussion (See. D12886)
llvm-svn: 247716
2015-09-15 19:14:05 +00:00
Igor Laevsky bdc1eafe20 [CorrelatedValuePropagation] Infer nonnull attributes
LazuValueInfo can prove that value is nonnull based on the context information. 
Make use of this ability to infer nonnull attributes for the call arguments.

Differential Revision: http://reviews.llvm.org/D12836

llvm-svn: 247707
2015-09-15 17:51:50 +00:00
Marcello Maggioni 454faa84e2 [NaryReassociate] Add support for Mul instructions
This patch extends the current pass by handling
Mul instructions as well.

Patch by: Volkan Keles (vkeles@apple.com)

llvm-svn: 247705
2015-09-15 17:22:52 +00:00
Sanjoy Das f75e15e5ac [PlaceSafepoints] Make the width of a counted loop settable.
Summary:
This change lets a `PlaceSafepoints` client change how wide the trip
count of a loop has to be for the loop to be considerd "counted", via
`CountedLoopTripWidth`.  It also removes the boolean `SkipCounted` flag
and the `upperTripBound` constant -- we can get the old behavior of
`SkipCounted` == `false` by setting `CountedLoopTripWidth` to `13` (2 ^
13 == 8192).

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12789

llvm-svn: 247656
2015-09-15 01:42:48 +00:00
Chandler Carruth 29a18a4663 [PM] Port SROA to the new pass manager.
In some ways this is a very boring port to the new pass manager as there
are no interesting analyses or dependencies or other oddities.

However, this does introduce the first good example of a transformation
pass with non-trivial state porting to the new pass manager. I've tried
to carve out patterns here to replicate elsewhere, and would appreciate
comments on whether folks like these patterns:

- A common need in the new pass manager is to effectively lift the pass
  class and some of its state into a public header file. Prior to this,
  LLVM used anonymous namespaces to provide "module private" types and
  utilities, but that doesn't scale to cases where a public header file
  is needed and the new pass manager will exacerbate that. The pattern
  I've adopted here is to use the namespace-cased-name of the core pass
  (what would be a module if we had them) as a module-private namespace.
  Then utility and other code can be declared and defined in this
  namespace. At some point in the future, we could even have
  (conditionally compiled) code that used modules features when
  available to do the same basic thing.

- I've split the actual pass run method in two in order to expose
  a private method usable by the old pass manager to wrap the new class
  with a minimum of duplicated code. I actually looked at a bunch of
  ways to automate or generate these, but they are all quite terrible
  IMO. The fundamental need is to extract the set of analyses which need
  to cross this interface boundary, and that will end up being too
  unpredictable to effectively encapsulate IMO. This is also
  a relatively small amount of boiler plate that will live a relatively
  short time, so I'm not too worried about the fact that it is boiler
  plate.

The rest of the patch is totally boring but results in a massive diff
(sorry). It just moves code around and removes or adds qualifiers to
reflect the new name and nesting structure.

Differential Revision: http://reviews.llvm.org/D12773

llvm-svn: 247501
2015-09-12 09:09:14 +00:00
Larisse Voufo f57162b6e7 Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan.
llvm-svn: 247497
2015-09-12 01:41:55 +00:00
James Molloy efbba72cb2 Add GlobalsAA as preserved to a bunch of transforms
GlobalsAA must by definition be preserved in function passes, but the passmanager doesn't know that. Make each pass explicitly preserve GlobalsAA.

llvm-svn: 247263
2015-09-10 10:22:12 +00:00
Philip Reames 953817b65d [RewriteStatepointsForGC] Minor refactor to use shared implementation [NFC]
llvm-svn: 247223
2015-09-10 00:44:10 +00:00
Philip Reames b4e55f3923 [RewriteStatepointsForGC] Strengthen a confusingly weak assertion [NFC]
The assertion was weaker than it should be and gave the impression we're growing the number of base defining values being considered during the fixed point interation.  That's not true.  The tighter form of the assert is useful documentation.

llvm-svn: 247221
2015-09-10 00:32:56 +00:00
Philip Reames c8ded462c4 [RewriteStatepointsForGC] One last bit of naming [NFCI]
llvm-svn: 247220
2015-09-10 00:27:50 +00:00
Philip Reames 34d7a7493d [RewriteStatepointsForGC] Further style/naming fixup [NFCI]
llvm-svn: 247217
2015-09-10 00:22:49 +00:00
Philip Reames 7540e3a45d [RewriteStatepointsForGC] More naming cleanup [NFCI]
llvm-svn: 247213
2015-09-10 00:01:53 +00:00
Philip Reames ece70b8042 [RewriteStatepointsForGC] Code cleanup [NFC]
Factor out common code related to naming values, fix a small style issue.  More to follow in separate changes.

llvm-svn: 247211
2015-09-09 23:57:18 +00:00
Philip Reames 6628713f4f [RewriteStatepointsForGC] Extend base pointer inference to handle insertelement
This change is simply enhancing the existing inference algorithm to handle insertelement instructions by conservatively inserting a new instruction to propagate the vector of associated base pointers. In the process, I'm ripping out the peephole optimizations which mostly helped cover the fact this hadn't been done.

Note that most of the newly inserted nodes will be nearly immediately removed by the post insertion optimization pass introduced in 246718. Arguably, we should be trying harder to avoid the malloc traffic here, but I'd rather get the code correct, then worry about compile time.

Unlike previous extensions of the algorithm to handle more case, I discovered the existing code was causing miscompiles in some cases. In particular, we had an implicit assumption that the peephole covered *all* insert element instructions, so if we had a value directly based on a insert element the peephole didn't cover, we proceeded as if it were a base anyways. Not good. I believe we had the same issue with shufflevector which is why I adjusted the predicate for them as well.

Differential Revision: http://reviews.llvm.org/D12583

llvm-svn: 247210
2015-09-09 23:40:12 +00:00
Philip Reames 15d5563cea [RewriteStatepointsForGC] Make base pointer inference deterministic
Previously, the base pointer algorithm wasn't deterministic. The core fixed point was (of course), but we were inserting new nodes and optimizing them in an order which was unspecified and variable. We'd somewhat hacked around this for testing by sorting by value name, but that doesn't solve the general determinism problem.

Instead, we can use the order of traversal over the def/use graph to give us a single consistent ordering. Today, this is a DFS order, but the exact order doesn't mater provided it's deterministic for a given input.

(Q: It is safe to rely on a deterministic order of operands right?)

Note that this only fixes the determinism within a single inference step. The inference step is currently invoked many times in a non-deterministic order. That's a future change in the sequence. :)

Differential Revision: http://reviews.llvm.org/D12640

llvm-svn: 247208
2015-09-09 23:26:08 +00:00
Chandler Carruth 7b560d40bd [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal, this logic should be
  hoisted into FunctionAAResults as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

llvm-svn: 247167
2015-09-09 17:55:00 +00:00
Chandler Carruth 1688a772fc Fix a typo I spotted when hacking on SROA. Somewhat alarming that
nothing broke.

llvm-svn: 247127
2015-09-09 09:46:16 +00:00
Sanjoy Das da0d79e0a0 [IRCE] Add INITIALIZE_PASS_DEPENDENCY invocations.
IRCE was just using INITIALIZE_PASS(), which is incorrect.

llvm-svn: 247122
2015-09-09 03:47:18 +00:00
Philip Reames 3ea158950e [RewriteStatepointsForGC] Extract common code, comment, and fix a build warning [NFC]
llvm-svn: 246810
2015-09-03 21:57:40 +00:00
Philip Reames f5b8e47651 [RewriteStatepointsForGC] Strengthen invariants around BDVs
As a first step towards a new implementation of the base pointer inference algorithm, introduce an abstraction for BDVs, strengthen the assertions around them, and rewrite the BDV relation code in terms of the abstraction which includes an explicit notion of whether the BDV is also a base. The later is motivated by the fact we had a bug where insertelement was always assumed to be a base pointer even though the BDV code knew it wasn't. The strengthened assertions in this patch would have caught that bug.

The next step will be to separate the DefiningValueMap into a BDV use list cache (entirely within findBasePointers) and a base pointer cache. Having the former will allow me to use a deterministic visit order when visiting BDVs in the inference algorithm and remove a bunch of ordering related hacks. Before actually doing the last step, I'm likely going to extend the lattice with a 'BaseN' (seen only base inputs) state so that I can kill the post process optimization step.

Phabricator Revision: http://reviews.llvm.org/D12608

llvm-svn: 246809
2015-09-03 21:34:30 +00:00
Philip Reames 246e618e77 [RewriteStatepointsForGC] Workaround a lack of determinism in visit order
The visit order being used in the base pointer inference algorithm is currently non-deterministic.  When working on http://reviews.llvm.org/D12583, I discovered that we were relying on a peephole optimization to get deterministic ordering in one of the test cases.  

This change is intented to let me test and land http://reviews.llvm.org/D12583.  The current code will not be long lived.  I'm starting to investigate a rewrite of the algorithm which will combine the post-process step into the initial algorithm and make the visit order determistic.  Before doing that, I wanted to make sure the existing code was complete and the test were stable.  Hopefully, patches should be up for review for the new algorithm this week or early next.

llvm-svn: 246801
2015-09-03 20:24:29 +00:00
Philip Reames 07a2ee1aff [RewriteStatepointsForGC] Delete stale comment [NFC]
llvm-svn: 246722
2015-09-02 22:35:42 +00:00
Philip Reames b3967cd08e [RewriteStatepointsForGC] Pull a function out of anon namespace [NFC]
Thanks to David Blaikie for noticing in previous commit.

llvm-svn: 246721
2015-09-02 22:30:53 +00:00
Philip Reames 9546f367f7 [RewriteStatepointsForGC] Bugfix for change 246133
Fix a bug in change 246133. I didn't handle the case where we had a cycle in the use graph and could add an instruction we were about to erase back on to the worklist. Oddly, I have not been able to write a small test case for this, even with the AssertingVH added. I have confirmed the basic theory for the fix on a large failing example, but all attempts to reduce that to something appropriate for a test case have failed.

Differential Revision: http://reviews.llvm.org/D12575

llvm-svn: 246718
2015-09-02 22:25:07 +00:00
Philip Reames 6906e92812 Fix release build warning for unused function
llvm-svn: 246717
2015-09-02 21:57:17 +00:00
Philip Reames dab35f317d [RewriteStatepointsForGC] Improve debug output [NFC]
llvm-svn: 246713
2015-09-02 21:11:44 +00:00
Piotr Padlewski 0c7d8fc1f6 assuem(X) handling in GVN bugfix
There was infinite loop because it was trying to change assume(true) into
assume(true)
Also added handling when assume(false) appear

http://reviews.llvm.org/D12516

llvm-svn: 246697
2015-09-02 20:00:03 +00:00
Piotr Padlewski 28ffcbe1cc Constant propagation after hitting assume(cmp) bugfix
Last time code run into assertion `BBE.isSingleEdge()` in
lib/IR/Dominators.cpp:200.

http://reviews.llvm.org/D12170

llvm-svn: 246696
2015-09-02 19:59:59 +00:00
Piotr Padlewski 14e815c22b Constant propagation after hiting llvm.assume
After hitting @llvm.assume(X) we can:
- propagate equality that X == true
- if X is icmp/fcmp (with eq operation), and one of operand
  is constant we can change all variables with constants in the same BasicBlock

http://reviews.llvm.org/D11918

llvm-svn: 246695
2015-09-02 19:59:53 +00:00
Jingyue Wu e84f671830 [JumpThreading] make jump threading respect convergent annotation.
Summary:
JumpThreading shouldn't duplicate a convergent call, because that would move a convergent call into a control-inequivalent location. For example,
  if (cond) {
    ...
  } else {
    ...
  }
  convergent_call();
  if (cond) {
    ...
  } else {
    ...
  }
should not be optimized to
  if (cond) {
    ...
    convergent_call();
    ...
  } else {
    ...
    convergent_call();
    ...
  }

Test Plan: test/Transforms/JumpThreading/basic.ll

Patch by Xuetian Weng. 

Reviewers: resistor, arsenm, jingyue

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12484

llvm-svn: 246415
2015-08-31 06:10:27 +00:00
Chandler Carruth 4b682f6f24 [SROA] Fix PR24463, a crash I introduced in SROA by allowing it to
handle more allocas with loads past the end of the alloca.

I suspect there are some related crashers with slightly different
patterns, but I'll fix those and add test cases as I find them.

Thanks to David Majnemer for the excellent test case reduction here.
Made this super simple to debug and fix.

llvm-svn: 246289
2015-08-28 09:03:52 +00:00
Steven Wu 61db34d12e Revert r246244 and r246243
These two commits cause clang/llvm bootstrap to hang.

llvm-svn: 246279
2015-08-28 06:52:00 +00:00
Piotr Padlewski 3f81ec1e38 Constant propagation after hitting assume(cmp) bugfix
Last time code run into assertion `BBE.isSingleEdge()` in
lib/IR/Dominators.cpp:200.

http://reviews.llvm.org/D12170

llvm-svn: 246244
2015-08-28 01:02:00 +00:00
Piotr Padlewski 63cc5d4627 Constant propagation after hiting llvm.assume
After hitting @llvm.assume(X) we can:
- propagate equality that X == true
- if X is icmp/fcmp (with eq operation), and one of operand
  is constant we can change all variables with constants in the same BasicBlock

http://reviews.llvm.org/D11918

llvm-svn: 246243
2015-08-28 01:01:57 +00:00
James Molloy 1bbf15c57c [LoopVectorize] Extract InductionInfo into a helper class...
... and move it into LoopUtils where it can be used by other passes, just like ReductionDescriptor. The API is very similar to ReductionDescriptor - that is, not very nice at all. Sorting these both out will come in a followup.

NFC

llvm-svn: 246145
2015-08-27 09:53:00 +00:00
Philip Reames dfd890dd3a Allow value forwarding past release fences in EarlyCSE
A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-store forwarding across a release fence.

We do need to make sure that stores before the fence can't be eliminated even if there's another store to the same location after the fence. In theory, we could reorder the second store above the fence and *then* eliminate the former, but we can't do this if the stores are on opposite sides of the fence.

Note: While more aggressive then what's there, this patch is still implementing a really conservative ordering.  In particular, I'm not trying to exploit undefined behavior via races, or the fact that the LangRef says only 'atomic' accesses are ordered w.r.t. fences.

Differential Revision: http://reviews.llvm.org/D11434

llvm-svn: 246134
2015-08-27 01:32:33 +00:00
Philip Reames abcdc5e3a8 [RewriteStatepointsForGC] Reduce the number of new instructions for base pointers
When computing base pointers, we introduce new instructions to propagate the base of existing instructions which might not be bases. However, the algorithm doesn't make any effort to recognize when the new instruction to be inserted is the same as an existing one already in the IR. Since this is happening immediately before rewriting, we don't really have a chance to fix it after the pass runs without teaching loop passes about statepoints.

I'm really not thrilled with this patch. I've rewritten it 4 different ways now, but this is the best I've come up with. The case where the new instruction is just the original base defining value could be merged into the existing algorithm with some complexity. The problem is that we might have something like an extractelement from a phi of two vectors. It may be trivially obvious that the base of the 0th element is an existing instruction, but I can't see how to make the algorithm itself figure that out. Thus, I resort to the call to SimplifyInstruction instead.

Note that we can only adjust the instructions we've inserted ourselves. The live sets are still being tracked in side structures at this point in the code. We can't easily muck with instructions which might be in them. Long term, I'm really thinking we need to materialize the live pointer sets explicitly in the IR somehow rather than using side structures to track them.

Differential Revision: http://reviews.llvm.org/D12004

llvm-svn: 246133
2015-08-27 01:02:28 +00:00
Chandler Carruth 748d095ff0 [SROA] Rip out all support for SSAUpdater in SROA.
This was only added to preserve the old ScalarRepl's use of SSAUpdater
which was originally to avoid use of dominance frontiers. Now, we only
need a domtree, and we'll need a domtree right after this pass as well
and so it makes perfect sense to always and only use the dom-tree
powered mem2reg. This was flag-flipper earlier and has stuck reasonably
so I wanted to gut the now-dead code out of SROA before we waste more
time with it. Among other things, this will make passmanager porting
easier.

llvm-svn: 246028
2015-08-26 09:09:29 +00:00
NAKAMURA Takumi c57a09821f Update libdeps in LLVMipo and LLVMScalarOpts, corresponding to r245940.
llvm-svn: 245957
2015-08-25 17:11:17 +00:00
Diego Novillo 4d71113cdb Convert SampleProfile pass into a Module pass.
Eventually, we will need sample profiles to be incorporated into the
inliner's cost models.  To do this, we need the sample profile pass to
be a module pass.

This patch makes no functional changes beyond the mechanical adjustments
needed to run SampleProfile as a module pass.

llvm-svn: 245940
2015-08-25 15:25:11 +00:00
Sanjay Patel 6b2765fe49 fix typo; NFC
llvm-svn: 245869
2015-08-24 20:11:14 +00:00
Adrian Prantl cbdfdb74d3 Rename Instruction::dropUnknownMetadata() to dropUnknownNonDebugMetadata()
and make it always preserve debug locations, since all callers wanted this
behavior anyway.

This is addressing a post-commit review feedback for r245589.

NFC (inside the LLVM tree).

llvm-svn: 245622
2015-08-20 22:00:30 +00:00
Jingyue Wu 10fcea5d4b [ValueTracking] computeOverflowForSignedAdd and isKnownNonNegative
Summary:
Refactor, NFC

Extracts computeOverflowForSignedAdd and isKnownNonNegative from NaryReassociate to ValueTracking in case
others need it.

Reviewers: reames

Subscribers: majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D11313

llvm-svn: 245591
2015-08-20 18:27:04 +00:00
Adrian Prantl baf90fc265 Fix a bug that caused SimplifyCFG to drop DebugLocs.
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.

Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnowIDs.
This is now made explicit.

llvm-svn: 245589
2015-08-20 18:24:02 +00:00
Adrian Prantl a317cd2583 Fix a debug location handling bug in GVN.
Caught by the famous "DebugLoc describes the currect SubProgram" assertion.

When GVN is removing a nonlocal load it updates the debug location of the
SSA value it replaced the load with with the one of the load. In the
testcase this actually overwrites a valid debug location with an empty one.

In reality GVN has to make an arbitrary choice between two equally valid
debug locations. This patch changes to behavior to only update the
location if the value doesn't already have a debug location.

llvm-svn: 245588
2015-08-20 18:23:56 +00:00
Adam Nemet e48134093d [LVer] Fix FIXME: hide addPHINodes, NFC
Since Ashutosh made findDefsUsedOutsideOfLoop public, we can clean this
up.

Now clients that don't compute DefsUsedOutsideOfLoop can just call
versionLoop() and computing DefsUsedOutsideOfLoop will happen
implicitly.  With that there is no reason to expose addPHINodes anymore.

Ashutosh, you can now drop the calls to findDefsUsedOutsideOfLoop and
addPHINodes in LVerLICM and things should just work.

llvm-svn: 245579
2015-08-20 17:22:29 +00:00
Benjamin Kramer fcdb1c14ac Make helper functions static. NFC.
llvm-svn: 245549
2015-08-20 09:57:22 +00:00
Bjorn Steinbrink 2e2f66557e Revert "[DSE] Enable removal of lifetime intrinsics in terminating blocks"
llvm-svn: 245543
2015-08-20 08:58:47 +00:00
Bjorn Steinbrink cc7e8a9705 [DSE] Enable removal of lifetime intrinsics in terminating blocks
Usually DSE is not supposed to remove lifetime intrinsics, but it's
actually ok to remove them for dead objects in terminating blocks,
because they convey no extra information there. Until we hit a lifetime
start that cannot be removed, that is. Because from that point on the
lifetime intrinsics become interesting again, e.g. for stack coloring.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11710

llvm-svn: 245542
2015-08-20 08:25:28 +00:00
David Majnemer ba275f9947 Replace some calls to isa<LandingPadInst> with isEHPad()
No functionality change is intended.

llvm-svn: 245487
2015-08-19 19:54:02 +00:00
Nick Lewycky 1098e496e1 More clean up, still NFC. Remove dead variables now that the casts are gone.
llvm-svn: 245420
2015-08-19 06:25:30 +00:00
Nick Lewycky 2c852543a3 Clean up this file a little. Remove dead casts, casting Values to Values. Adjust some comments for typos and whitespace. NFC.
llvm-svn: 245419
2015-08-19 06:22:33 +00:00
Ashutosh Nema c5b7b55589 Exposed findDefsUsedOutsideOfLoop as a loop utility function
Exposed findDefsUsedOutsideOfLoop as a loop utility function by moving 
it from LoopDistribute to LoopUtils.

Reviewed By: anemet

llvm-svn: 245416
2015-08-19 05:40:42 +00:00
Eric Christopher 0efe9f60bb Revert "Fix PR24469 resulting from r245025 and re-enable dead store elimination across basicblocks."
This is causing bootstrap problems, e.g.: http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/2960

This reverts r245195.

llvm-svn: 245402
2015-08-19 02:15:13 +00:00
Nick Lewycky 06b0ea2e8f Fix three typos in comments; "easilly" -> "easily".
llvm-svn: 245379
2015-08-18 22:41:58 +00:00
Justin Bogner 9f00ebaeda Revert "Constant propagation after hiting llvm.assume"
This was also failing bootstrap:

http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build

This reverts r245265.

llvm-svn: 245269
2015-08-18 07:00:34 +00:00
Piotr Padlewski 94ca3783b8 Constant propagation after hiting llvm.assume
After hitting @llvm.assume(X) we can:
- propagate equality that X == true
- if X is icmp/fcmp (with eq operation), and one of operand
  is constant we can change all variables with constants in the same BasicBlock

http://reviews.llvm.org/D11918

llvm-svn: 245265
2015-08-18 03:55:30 +00:00
Karthik Bhat 3af28945b9 Fix PR24469 resulting from r245025 and re-enable dead store elimination across basicblocks.
PR24469 resulted because DeleteDeadInstruction in handleNonLocalStoreDeletion was
deleting the next basic block iterator. Fixed the same by resetting the basic block iterator
post call to DeleteDeadInstruction.

llvm-svn: 245195
2015-08-17 05:51:39 +00:00
Chandler Carruth 2f1fd1658f [PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.

I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.

But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.

To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.

To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.

With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.

Differential Revision: http://reviews.llvm.org/D12063

llvm-svn: 245193
2015-08-17 02:08:17 +00:00
Sanjoy Das 94c4aecf83 [LSR][NFC] Don’t duplicate entity name at the beginning of the comment.
llvm-svn: 245183
2015-08-16 18:22:46 +00:00
Sanjoy Das 302bfd04b5 [LSR][NFC] Use camelCase for method names in Formula and RegUseTracker.
llvm-svn: 245182
2015-08-16 18:22:43 +00:00
David Majnemer e04443baff Revert "Add support for cross block dse. This patch enables dead stroe elimination across basicblocks."
This reverts commit r245025, it caused PR24469.

llvm-svn: 245172
2015-08-16 07:11:59 +00:00
David Majnemer 0bc0eef71c [IR] Give catchret an optional 'return value' operand
Some personality routines require funclet exit points to be clearly
marked, this is done by producing a token at the funclet pad and
consuming it at the corresponding ret instruction.  CleanupReturnInst
already had a spot for this operand but CatchReturnInst did not.
Other personality routines don't need to use this which is why it has
been made optional.

llvm-svn: 245149
2015-08-15 02:46:08 +00:00
Matt Arsenault 427a0fd22e LoopStrengthReduce: Try to pass address space to isLegalAddressingMode
This seems to only work some of the time. In some situations,
this seems to use a nonsensical type and isn't actually aware of the
memory being accessed. e.g. if branch condition is an icmp of a pointer,
it checks the addressing mode of i1.

llvm-svn: 245137
2015-08-15 00:53:06 +00:00
James Molloy 87405c7f66 Separate out BDCE's analysis into a separate DemandedBits analysis.
This allows other areas of the compiler to use BDCE's bit-tracking.
NFCI.

llvm-svn: 245039
2015-08-14 11:09:09 +00:00
Adam Nemet 06ccf0145f [LVer] Remove unused Pass parameter from versionLoop, NFC
llvm-svn: 245032
2015-08-14 06:30:26 +00:00
David Majnemer b611e3f50e [IR] Add token types
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.

There are several applications for such a type but my immediate
motivation stems from WinEH.  Our personality routine enforces a
single-entry - single-exit regime for cleanups.  After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together.  We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.

Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup.  This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.

What is the burden to the optimizer?  Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate, optimizations have to watch out for
such instructions anyway.  There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.

Differential Revision: http://reviews.llvm.org/D11861

llvm-svn: 245029
2015-08-14 05:09:07 +00:00
Karthik Bhat ddc2a86a00 Add support for cross block dse.
This patch enables dead stroe elimination across basicblocks.

Example:
define void @test_02(i32 %N) {
  %1 = alloca i32
  store i32 %N, i32* %1
  store i32 10, i32* @x
  %2 = load i32, i32* %1
  %3 = icmp ne i32 %2, 0
  br i1 %3, label %4, label %5

; <label>:4
  store i32 5, i32* @x
  br label %7

; <label>:5
  %6 = load i32, i32* @x
  store i32 %6, i32* @y
  br label %7

; <label>:7
  store i32 15, i32* @x
  ret void
}
In the above example dead store "store i32 5, i32* @x" is now eliminated.

Differential Revision: http://reviews.llvm.org/D11143

llvm-svn: 245025
2015-08-14 04:17:23 +00:00
Chandler Carruth 1db22822b4 [PM/AA] Hoist the interface to TBAA into a dedicated header along with
its creation function. Update the relevant includes accordingly.

llvm-svn: 245019
2015-08-14 03:33:48 +00:00
Chandler Carruth 42ff448fe4 [PM/AA] Hoist ScopedNoAliasAA's interface into a header and move the
creation function there.

Same basic refactoring as the other alias analyses. Nothing special
required this time around.

llvm-svn: 245012
2015-08-14 02:55:50 +00:00
Jingyue Wu 1238f341ba [SeparateConstOffsetFromGEP] sext(a)+sext(b) => sext(a+b) when a+b can't sign-overflow.
Summary:
This patch implements my promised optimization to reunites certain sexts from
operands after we extract the constant offset. See the header comment of
reuniteExts for its motivation.

One key building block that enables this optimization is Bjarke's poison value
analysis (D11212). That helps to prove "a +nsw b" can't overflow.

Reviewers: broune

Subscribers: jholewinski, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12016

llvm-svn: 245003
2015-08-14 02:02:05 +00:00
Chandler Carruth bf143e2a20 [LIR] Re-instate r244880, reverted in r244884, factoring the handling of
AliasAnalysis in LoopIdiomRecognize.

The previous commit to LIR, r244879, exposed some scary bug in the loop
pass pipeline with an assert failure that showed up on several bots.
This patch got reverted as part of getting that revision reverted, but
they're actually independent and unrelated. This patch has no functional
change and should be completely safe. It is also useful for my current
work on the AA infrastructure.

llvm-svn: 244993
2015-08-14 00:21:10 +00:00
Sanjay Patel a75c41e5f3 don't repeat function names in comments; NFC
llvm-svn: 244977
2015-08-13 22:53:20 +00:00
Jingyue Wu 13a80eaceb [SeparateConstOffsetFromGEP] strengthen the inbounds attribute
We used to be over-conservative about preserving inbounds. Actually, the second
GEP (which applies the constant offset) can inherit the inbounds attribute of
the original GEP, because the resultant pointer is equivalent to that of the
original GEP. For example,

  x  = GEP inbounds a, i+5
    =>
  y = GEP a, i               // inbounds removed
  x = GEP inbounds y, 5      // inbounds preserved

llvm-svn: 244937
2015-08-13 18:48:49 +00:00
Erik Eckstein 11fc8175d9 [DeadStoreElimination] remove a redundant store even if the load is in a different block.
DeadStoreElimination does eliminate a store if it stores a value which was loaded from the same memory location.
So far this worked only if the store is in the same block as the load.
Now we can also handle stores which are in a different block than the load.
Example:

define i32 @test(i1, i32*) {
entry:
  %l2 = load i32, i32* %1, align 4
  br i1 %0, label %bb1, label %bb2
bb1:
  br label %bb3
bb2:
  ; This store is redundant
  store i32 %l2, i32* %1, align 4
  br label %bb3
bb3:
  ret i32 0
}

Differential Revision: http://reviews.llvm.org/D11854

llvm-svn: 244901
2015-08-13 15:36:11 +00:00
Renato Golin 655348f0b2 Revert "[LIR] Start leveraging the fundamental guarantees of a loop..."
This reverts commit r244879, as it broke the test-suite on
SingleSource/Regression/C/2004-03-15-IndirectGoto in AArch64.

llvm-svn: 244885
2015-08-13 11:25:38 +00:00
Renato Golin 4d57906b0e Revert "[LIR] Handle access to AliasAnalysis the same way as the other analysis in LoopIdiomRecognize."
This reverts commit r244880, as it broke the test-suite on
SingleSource/Regression/C/2004-03-15-IndirectGoto in AArch64.

llvm-svn: 244884
2015-08-13 11:25:35 +00:00
Ashutosh Nema 47802628f7 Test Commit.
llvm-svn: 244883
2015-08-13 11:18:35 +00:00
Chandler Carruth c2af09823f [LIR] Handle access to AliasAnalysis the same way as the other analysis
in LoopIdiomRecognize. This is what started me staring at this code. Now
migrating it with the new AA stuff will be trivial.

llvm-svn: 244880
2015-08-13 10:00:53 +00:00
Chandler Carruth 8ae7b81559 [LIR] Start leveraging the fundamental guarantees of a loop in
simplified form to remove redundant checks and simplify the code for
popcount recognition. We don't actually need to handle all of these
cases.

I've left a FIXME for one in particular until I finish inspecting to
make sure we don't actually *rely* on the predicate in any way.

llvm-svn: 244879
2015-08-13 09:56:20 +00:00
Chandler Carruth 18c2669aca [LIR] Handle the LoopInfo the same as all the other analyses. No utility
really in breaking pattern just for this analysis.

llvm-svn: 244878
2015-08-13 09:27:01 +00:00
Chen Li f458c6f313 [LoopUnswitch] Check OptimizeForSize before traversing over all basic blocks in current loop
Summary: This patch moves the check of OptimizeForSize before traversing over all basic blocks in current loop. If OptimizeForSize is set to true, no non-trivial unswitch is ever allowed. Therefore, the early exit will help reduce compilation time. This patch should be NFC. 

Reviewers: reames, weimingz, broune

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11997

llvm-svn: 244868
2015-08-13 05:24:29 +00:00
Chandler Carruth dc298329cc [LIR] Make the LoopIdiomRecognize pass get analyses essentially the same
way as every other pass. This simplifies the code quite a bit and is
also more idiomatic! <ba-dum!>

llvm-svn: 244853
2015-08-13 01:03:26 +00:00
Chandler Carruth 8219a501da [LIR] Remove the dedicated class for popcount recognition and sink the
code into methods on LoopIdiomRecognize.

This simplifies the code somewhat and also makes it much easier to move
the analyses around. Ultimately, the separate class wasn't providing
significant value over methods -- it contained the precondition basic
block and the current loop. The current loop is already available and
the precondition block wasn't needed everywhere and is easy to pass
around.

In several cases I just moved things to be static functions because they
already accepted most of their inputs as arguments.

This doesn't fix the way we manage analyses yet, that will be the next
patch, but it already makes the code over 50 lines shorter.

No functionality changed.

llvm-svn: 244851
2015-08-13 00:44:29 +00:00
Chandler Carruth d9c6070c98 [LIR] Move all the helpers to be private and re-order the methods in
a way that groups things logically. No functionality changed.

llvm-svn: 244845
2015-08-13 00:10:03 +00:00
Chandler Carruth be158b17db [LIR] Remove the 'LIRUtils' abstraction which was unnecessary and adding
complexity.

There is only one function that was called from multiple locations, and
that was 'getBranch' which has a reasonable one-line spelling already:
dyn_cast<BranchInst>(BB->getTerminator). We could make this shorter, but
it doesn't seem to add much value. Instead, we should avoid calling it
so many times on the same basic blocks, but that will be in a subsequent
patch.

The other functions are only called in one location, so inline them
there, and take advantage of this to use direct early exit and reduce
indentation. This makes it much more clear what is being tested for, and
in fact makes it clear now to me that there are simpler ways to do this
work. However, this patch just does the mechanical inlining. I'll clean
up the functionality of the code to leverage loop simplified form more
effectively in a follow-up.

Despite lots of early line breaks due to early-exit, this is still
shorter than it was before.

llvm-svn: 244841
2015-08-12 23:55:56 +00:00
Chandler Carruth bad690e8f7 [LIR] Run clang-format over LoopIdiomRecognize in preparation for
a significant code cleanup here.

The handling of analyses in this pass is overly complex and can be
simplified significantly, but the right way to do that is to simplify
all of the code not just the analyses, and that'll require pretty
extensive edits that would be noisy with formatting changes mixed into
them.

llvm-svn: 244828
2015-08-12 23:06:37 +00:00
Philip Reames 971dc3a82a [RewriteStatepointsForGC] Avoid using unrelocated pointers after safepoints
To be clear: this is an *optimization* not a correctness change.

CodeGenPrep likes to duplicate icmps feeding branch instructions to take advantage of x86's ability to fuze many comparison/branch patterns into a single micro-op and to reduce the need for materializing i1s into general registers. PlaceSafepoints likes to place safepoint polls right at the end of basic blocks (immediately before terminators) when inserting entry and backedge safepoints. These two heuristics interact in a somewhat unfortunate way where the branch terminating the original block will be controlled by a condition driven by unrelocated pointers. This forces the register allocator to keep both the relocated and unrelocated values of the pointers feeding the icmp alive over the safepoint poll.

One simple fix would have been to just adjust PlaceSafepoints to move one back in the basic block, but you can reach similar cases as a result of LICM or other hoisting passes. As a result, doing a post insertion fixup seems to be more robust.

I considered doing this in CodeGenPrep itself, but having to update the live sets of already rewritten safepoints gets complicated fast. In particular, you can't just use def/use information because by moving the icmp, we're extending the live range of it's inputs potentially.

Instead, this patch teaches RewriteStatepointsForGC to make the required adjustments before making the relocations explicit in the IR. This change really highlights the fact that RSForGC is a CodeGenPrep-like pass which is performing target specific lowering. In the long run, we may even want to combine the two though this would require a lot more smarts to be integrated into RSForGC first. We currently rely on being able to run a set of cleanup passes post rewriting because the IR RSForGC generates is pretty damn ugly.

Differential Revision: http://reviews.llvm.org/D11819

llvm-svn: 244821
2015-08-12 22:11:45 +00:00
Philip Reames 9ac4e38a16 [RewriteStatepointsForGC] Handle extractelement fully in the base pointer algorithm
When rewriting the IR such that base pointers are available for every live pointer, we potentially need to duplicate instructions to propagate the base. The original code had only handled PHI and Select under the belief those were the only instructions which would need duplicated. When I added support for vector instructions, I'd added a collection of hacks for ExtractElement which caught most of the common cases. Of course, I then found the one test case my hacks couldn't cover. :)

This change removes all of the early hacks for extract element. By defining extractelement as a BDV (rather than trying to look through it), we can extend the rewriting algorithm to duplicate the extract as needed.  Note that a couple of peephole optimizations were left in for the moment, because while we now handle extractelement as a first class citizen, we're not yet handling insertelement.  That change will follow in the near future.  

llvm-svn: 244808
2015-08-12 21:00:20 +00:00
Chandler Carruth 19ac7d5b29 [PM/AA] Add missing static dependency edges from DSE and memdep to TLI.
I forgot to add these in r244780 and r244778. Sorry about that.

Also order the static dependencies in a lexicographical order.

llvm-svn: 244787
2015-08-12 18:10:45 +00:00
Chandler Carruth dbe40fb45e [PM/AA] Stop getting the TargetLibraryInfo out of the AliasAnalysis and
just depend on it directly.

This was particularly frustrating because there was a really wide
mixture of using a member variable and re-extracting it from the AA that
happened to be around. I think the result is much more clear.

I've also deleted all of the pointless null checks and used references
across the APIs where I could to make it explicit that this cannot be
null in a useful fashion.

llvm-svn: 244780
2015-08-12 18:01:44 +00:00
Sanjay Patel 956e29cc8e don't repeat function names in comments; NFC
llvm-svn: 244672
2015-08-11 21:24:04 +00:00
Sanjay Patel 41f3d95f76 fix 80-cols; NFC
llvm-svn: 244668
2015-08-11 21:11:56 +00:00
Igor Laevsky 4709c03715 [IndVarSimplify] Make cost estimation in RewriteLoopExitValues smarter
Differential Revision: http://reviews.llvm.org/D11687

llvm-svn: 244474
2015-08-10 18:23:58 +00:00
Mark Heffernan 8939154a22 Add new llvm.loop.unroll.enable metadata.
This change adds the unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time.

The "llvm.loop.unroll.enable" is intended to be added for loops annotated with
"#pragma unroll".

llvm-svn: 244466
2015-08-10 17:28:08 +00:00
Fraser Cormack e29ab2bfab Prevent the scalarizer from caching incorrect entries
The scalarizer can cache incorrect entries when walking up a chain of
insertelement instructions. This occurs when it encounters more than one
instruction that it is not actively searching for, as it unconditionally caches
every element it finds. The fix is to only cache the first element that it
isn't searching for so we don't overwrite correct entries.

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D11559

llvm-svn: 244448
2015-08-10 14:48:47 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Adam Nemet 15840393f3 [LAA] Make the set of runtime checks part of the state of LAA, NFC
This is the full set of checks that clients can further filter. IOW,
it's client-agnostic.  This makes LAA complete in the sense that it now
provides the two main results of its analysis precomputed:

1. memory dependences via getDepChecker().getInsterestingDependences()
2. run-time checks via getRuntimePointerCheck().getChecks()

However, as a consequence we now compute this information pro-actively.
Thus if the client decides to skip the loop based on the dependences
we've computed the checks unnecessarily.  In order to see whether this
was a significant overhead I checked compile time on SPEC2k6 LTO bitcode
files.  The change was in the noise.

The checks are generated in canCheckPtrAtRT, at the same place where we
used to call groupChecks to merge checks.

llvm-svn: 244368
2015-08-07 22:44:15 +00:00
Pete Cooper ebcd748927 Convert a bunch of loops to foreach. NFC.
After r244074, we now have a successors() method to iterate over
all the successors of a TerminatorInst.  This commit changes a bunch
of eligible loops to use it.

llvm-svn: 244260
2015-08-06 20:22:46 +00:00
Nico Rieck 78199518c4 Rename inst_range() to instructions() for consistency. NFC
llvm-svn: 244248
2015-08-06 19:10:45 +00:00