Commit Graph

6945 Commits

Author SHA1 Message Date
Sanjoy Das 353a19e13c [RewriteStatepointsForGC] Strip deref info after rewriting.
Summary:
Once a gc.statepoint has been rewritten to relocate live references, the
SSA values represent physical pointers instead of logical references.
Logical dereferencability does not imply physical dereferencability and
after RewriteStatepointsForGC has run any attributes that imply
dereferencability of the logical references need to be stripped.

This current approach is conservative, and can be made more precise
later if needed.  For starters, we need to strip dereferencable
attributes only from pointers that live in the GC address space.

Reviewers: reames, pgavlin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10105

llvm-svn: 238883
2015-06-02 22:33:37 +00:00
Sanjoy Das ea45f0e054 [NFCI] Change RewriteStatepointsForGC to a ModulePass.
Summary:
A later change that has RewriteStatepointsForGC change function
attributes throughout the module depends on this.

Reviewers: reames, pgavlin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10104

llvm-svn: 238882
2015-06-02 22:33:34 +00:00
Owen Anderson 15d1805504 Teach the IR Sink pass to (conservatively) respect convergent annotations.
llvm-svn: 238762
2015-06-01 17:20:31 +00:00
Benjamin Kramer f5e2fc474d Replace push_back(Constructor(foo)) with emplace_back(foo) for non-trivial types
If the type isn't trivially moveable emplace can skip a potentially
expensive move. It also saves a couple of characters.


Call sites were found with the ASTMatcher + some semi-automated cleanup.

memberCallExpr(
    argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
    on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
    hasArgument(0, bindTemporaryExpr(
                       hasType(recordDecl(hasNonTrivialDestructor())),
                       has(constructExpr()))),
    unless(isInTemplateInstantiation()))

No functional change intended.

llvm-svn: 238602
2015-05-29 19:43:39 +00:00
Wei Mi e2538b5639 Enable exitValue rewrite only when the cost of expansion is low.
The patch evaluates the expansion cost of exitValue in indVarSimplify pass, and only does the rewriting when the expansion cost is low or loop can be deleted with the rewriting. It provides an option "-replexitval=" to control the default aggressiveness of the exitvalue rewriting. It also fixes some missing cases in SCEVExpander::isHighCostExpansionHelper to enhance the evaluation of SCEV expansion cost.

Differential Revision: http://reviews.llvm.org/D9800

llvm-svn: 238507
2015-05-28 21:49:07 +00:00
David Majnemer 587336d2ad [Reassociate] Canonicalizing 'x [+-] (-Constant * y)' isn't always a win
Canonicalizing 'x [+-] (-Constant * y)' is not a win if we don't *know*
we will open up CSE opportunities.

If the multiply was 'nsw', then negating 'y' requires us to clear the
'nsw' flag.  If this is actually worth pursuing, it is probably more
appropriate to do so in GVN or EarlyCSE.

This fixes PR23675.

llvm-svn: 238397
2015-05-28 06:16:39 +00:00
Jingyue Wu c2a014697a [NaryReassociate] Run EarlyCSE after NaryReassociate
Summary:
This patch made two improvements to NaryReassociate and the NVPTX pipeline

1. Run EarlyCSE/GVN after NaryReassociate to get rid of redundant common
expressions.

2. When adding an instruction to SeenExprs, maps both the SCEV before and after
reassociation to that instruction.

Test Plan: updated @reassociate_gep_nsw in nary-gep.ll

Reviewers: meheff, broune

Reviewed By: broune

Subscribers: dberlin, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9947

llvm-svn: 238396
2015-05-28 04:56:52 +00:00
Philip Reames 52e7a59e50 [PlaceSafepoints] Entry safepoint location doesn't need to be a terminator
Long ago, the poll insertion code assumed that the insertion site was a terminator.  As a result, the entry selection code would split a basic block to ensure it could pass a terminator.  The insertion code was updated quite a while ago - possibly before it ever landed upstream - but the now redundant work was never removed.  

While I'm at it, remove a comment which doesn't apply to the upstreamed code.  

NFC intended.

llvm-svn: 238254
2015-05-26 21:16:42 +00:00
Philip Reames 38840245e4 [PlaceSafepoints] Cleanup InsertSafepointPoll function
While working on another change, I noticed that the naming in this function was mildly deceptive.  While fixing that, I took the oppurtunity to modernize some of the code.  NFC intended.

llvm-svn: 238252
2015-05-26 21:03:23 +00:00
Craig Topper 042a39274a Use range-based for loops. NFC.
llvm-svn: 238154
2015-05-25 20:01:18 +00:00
NAKAMURA Takumi 5582a6a4a5 Reformat.
llvm-svn: 238126
2015-05-25 01:43:34 +00:00
NAKAMURA Takumi fb3bd7127a Prune CRLFs.
llvm-svn: 238125
2015-05-25 01:43:23 +00:00
Chandler Carruth 04cc665cef [Unroll] Switch from an eagerly populated SCEV cache to one that is
lazily built.

Also, make it a much more generic SCEV cache, which today exposes only
a reduced GEP model description but could be extended in the future to
do other profitable caching of SCEV information.

llvm-svn: 238124
2015-05-25 01:00:46 +00:00
Craig Topper 10949ae742 Give more meaningful names than I and J to some for loop variables after converting to range-based loops.
llvm-svn: 238095
2015-05-23 08:45:10 +00:00
Craig Topper 37d0d866f4 Fix an unused variable warning in release builds.
llvm-svn: 238094
2015-05-23 08:20:33 +00:00
Craig Topper 77b9941ab9 Use range-based for loops. NFC.
llvm-svn: 238093
2015-05-23 08:01:41 +00:00
Philip Reames 7c78ef7dd9 Extend EarlyCSE to handle basic cases from JumpThreading and CVP
This patch extends EarlyCSE to take advantage of the information that a controlling branch gives us about the value of a Value within this and dominated basic blocks. If the current block has a single predecessor with a controlling branch, we can infer what the branch condition must have been to execute this block. The actual change to support this is downright simple because EarlyCSE's existing scoped hash table logic deals with most of the complexity around merging.

The patch actually implements two optimizations.
1) The first is analogous to JumpThreading in that it enables EarlyCSE's CSE handling to fold branches which are exactly redundant due to a previous branch to branches on constants. (It doesn't actually replace the branch or change the CFG.) This is pretty clearly a win since it enables substantial CFG simplification before we start trying to inline.
2) The second is analogous to CVP in that it exploits the knowledge gained to replace dominated *uses* of the original value. EarlyCSE does not otherwise reason about specific uses, so this is the more arguable one. It does enable further simplication and constant folding within the rest of the visit by EarlyCSE.

In both cases, the added code only handles the easy dominance based case of each optimization. The general case is deferred to the existing passes.

Differential Revision: http://reviews.llvm.org/D9763

llvm-svn: 238071
2015-05-22 23:53:24 +00:00
Chandler Carruth 0215608bda [Unroll] Separate the logic for testing each iteration of the loop,
accumulating estimated cost, and other loop-centric logic from the logic
used to analyze instructions in a particular iteration.

This makes the visitor very narrow in scope -- all it does is visit
instructions, update a map of simplified values, and return whether it
is able to optimize away a particular instruction.

The two cost metrics are now returned as an optional struct. When the
optional is left unengaged, there is no information about the unrolled
cost of the loop, when it is engaged the cost metrics are available to
run against the thresholds.

No functionality changed.

llvm-svn: 238033
2015-05-22 17:41:35 +00:00
Chandler Carruth 5189559905 [Unroll] Replace a hand-wavy FIXME with a FIXME that explains the actual
problem instead of suggesting doing something that is trivial to do but
incorrect given the current design of the libraries.

llvm-svn: 237994
2015-05-22 03:07:28 +00:00
Chandler Carruth e1a0462dcc [Unroll] Extract the logic for caching SCEV-modeled GEPs with their
simplified model for use simulating each iteration into a separate
helper function that just returns the cache.

Building this cache had nothing to do with the rest of the unroll
analysis and so this removes an unnecessary coupling, etc. It should
also make it easier to think about the concept of providing fast cached
access to basic SCEV models as an orthogonal concept to the overall
unroll simulation.

I'd really like to see this kind of caching logic folded into SCEV
itself, it seems weird for us to provide it at this layer rather than
making repeated queries into SCEV fast all on their own.

No functionality changed.

llvm-svn: 237993
2015-05-22 03:02:22 +00:00
Chandler Carruth f174a156c3 [Unroll] Refactor the accumulation of optimized instruction costs into
a single location.

This reduces code duplication a bit and will also pave the way for
a better separation between the visitation algorithm and the unroll
analysis.

No functionality changed.

llvm-svn: 237990
2015-05-22 02:47:29 +00:00
Philip Reames b47b9c2b2b [LICM] Sinking doesn't involve the preheader
PR23608 pointed out that using the preheader to gain a context instruction isn't always legal because a loop might not have a preheader.  When looking into that, I realized that using the preheader to determine legality for sinking is questionable at best.  Given no test covers that case and the original commit didn't seem to intend it, I restructured the code to only ask context sensative queries for hoising of loads and stores.  This is effectively a partial revert of 237593.

llvm-svn: 237985
2015-05-22 02:14:05 +00:00
Daniel Berlin b301533ef1 MergedLoadStoreMotion preserves MemoryDependenceAnalysis, it does not require it.
(It already was coded assuming it can sometimes be null, so no other changes are necessary)

llvm-svn: 237978
2015-05-22 00:13:05 +00:00
Jingyue Wu 4fc97f6df8 [NaryReassoc] reassociate GEP for CSE
Summary:
x = &a[i];
y = &a[i + j];

=>

y = x + j;

along with some refactoring work such as extracting method
findClosestMatchingDominator.

Depends on D9786 which provides the ScalarEvolution::getGEPExpr interface.

Test Plan: nary-gep.ll

Reviewers: meheff, broune

Reviewed By: broune

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9802

llvm-svn: 237971
2015-05-21 23:17:30 +00:00
Benjamin Kramer e6987bf351 [LoopDistribute] Remove a layer of pointer indirection.
Just store InstPartitions directly into the std::list. No functional change
intended.

llvm-svn: 237930
2015-05-21 18:32:07 +00:00
Igor Laevsky d83f6976ba [RewriteStatepointsForGC] Fix debug assertion during derivable pointer rematerialization
Correct assertion would be that there is no other uses from chain we are currently cloning. It is ok to have other uses of values not from this chain.

Differential Revision: http://reviews.llvm.org/D9882

llvm-svn: 237899
2015-05-21 13:02:14 +00:00
Ahmed Bougacha 97876fa894 [MemCpyOpt] Do move the memset, but look at its dest's dependencies.
In effect a partial revert of r237858, which was a dumb shortcut.
Looking at the dependencies of the destination should be the proper
fix: if the new memset would depend on anything other than itself,
the transformation isn't correct.

llvm-svn: 237874
2015-05-21 01:43:39 +00:00
Ahmed Bougacha 0541c67ae7 [MemCpyOpt] Pass Instruction to IRBuilder, no need for NextNode. NFC.
We're erasing the instructions anyway.

llvm-svn: 237861
2015-05-21 00:08:35 +00:00
Ahmed Bougacha 5e0f425c27 [MemCpyOpt] Don't move the memset when optimizing memset+memcpy.
Fixes PR23599, another miscompile introduced by r235232: when there is
another dependency on the destination of the created memset (i.e., the
part of the original destination that the memcpy doesn't depend on)
between the memcpy and the original memset, we would insert the created
memset after the memcpy, and thus after the other dependency.

Instead, insert the created memset right after the old one.

llvm-svn: 237858
2015-05-20 23:55:16 +00:00
Aaron Ballman ff7d4fad54 Silencing a -Wsign-compare warning; NFC.
llvm-svn: 237794
2015-05-20 14:53:50 +00:00
Swaroop Sridhar 665bc9c936 Add a GCStrategy for CoreCLR
This change adds a new GC strategy for supporting the CoreCLR runtime.

This strategy is currently identical to Statepoint-example GC, 
but is necessary for several upcoming changes specific to CoreCLR, such as:

1. Base-pointers not explicitly reported for interior pointers
2. Different format for stack-map encoding
3. Location of Safe-point polls: polls are only needed before loop-back edges and before tail-calls (not needed at function-entry)
4. Runtime specific handshake between calls to managed/unmanaged functions.

llvm-svn: 237753
2015-05-20 01:07:23 +00:00
Philip Reames d97cdf28e6 [PlaceSafepoints] Stop special casing some intrinsics
We were special casing a handful of intrinsics as not needing a safepoint before them.  After running into another valid case - memset - I took a closer look and realized that almost no intrinsics need to have a safepoint poll before them.  Restructure the code to make that apparent so that we stop hitting these bugs.  The only intrinsics which need a safepoint poll before them are ones which can run arbitrary code.

llvm-svn: 237744
2015-05-19 23:40:11 +00:00
Jingyue Wu 5db9cba066 [Speculation] NFC: more header comments
explaining how it differs from SpeculativeExecuteBB in SimplifyCFG.

llvm-svn: 237724
2015-05-19 20:52:45 +00:00
Igor Laevsky 285fe84edd [RewriteStatepointsForGC] Fix up naming in "relocationViaAlloca" and run it through clang-format.
Differential Revision: http://reviews.llvm.org/D9774

llvm-svn: 237703
2015-05-19 16:29:43 +00:00
Igor Laevsky e03171863d [RewriteStatepointsForGC] For some values (like gep's and bitcasts) it's cheaper to clone them after statepoint than to emit proper relocates for them. This change implements this logic. There is alredy similar optimization in CodeGenPrepare, but doing so during RewriteStatepointsForGC allows to capture more opprtunities such as relocates in loops and longer instruction chains.
Differential Revision: http://reviews.llvm.org/D9774

llvm-svn: 237701
2015-05-19 15:59:05 +00:00
David Blaikie ff6409d096 Simplify IRBuilder::CreateCall* by using ArrayRef+initializer_list/braced init only
llvm-svn: 237624
2015-05-18 22:13:54 +00:00
Chen Li 74ca2a8777 [PlaceSafepoints] Assertion on that gc_result can not have preceding phis should only apply to invoke statepoint
Summary: When PlaceSafepoints pass replaces old return result with gc_result from statepoint, it asserts that gc_result can not have preceding phis in its parent block. This is only true on invoke statepoint, which terminates the block and puts its result at the beginning of the normal successor block. Call statepoint does not terminate the block and thus its result is in the same block with it. There should be no restriction on whether there are phis or not.

Reviewers: reames, igor-laevsky

Reviewed By: igor-laevsky

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9803

llvm-svn: 237597
2015-05-18 19:02:25 +00:00
Sanjoy Das f8a0db50b2 Exploit dereferenceable_or_null attribute in LICM pass
Summary:
Allow hoisting of loads from values marked with dereferenceable_or_null
attribute. For values marked with the attribute perform
context-sensitive analysis to determine whether it's known-non-null or
not.

Patch by Artur Pilipenko!

Reviewers: hfinkel, sanjoy, reames

Reviewed By: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9253

llvm-svn: 237593
2015-05-18 18:07:00 +00:00
Jingyue Wu 2982d4d795 [ScalarEvolution] refactor: extract interface getGEPExpr
Summary:
This allows other passes (such as SLSR) to compute the SCEV expression for an
imaginary GEP.

Test Plan: no regression

Reviewers: atrick, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9786

llvm-svn: 237589
2015-05-18 17:03:25 +00:00
Andrew Trick 715b27f058 indvars cruft: don't replace phi nodes for no reason.
Don't replace a phi with an identical phi. This was done long ago to
"preserve" IVUsers analysis. The code has already called
SE->forgetValue(PN) so I see no purpose in creating a new value for
the phi.

llvm-svn: 237587
2015-05-18 16:49:34 +00:00
Andrew Trick 018e55a187 SimplifyIV comments and dead argument cleanup.
Remove crufty comments. IVUsers hasn't been used here for a long time.

llvm-svn: 237586
2015-05-18 16:49:31 +00:00
Benjamin Kramer d9f06c8b87 Move Pass into anonymous namespace. NFC.
llvm-svn: 237526
2015-05-16 16:16:35 +00:00
Ahmed Bougacha f8fa3b8d4b [MemCpyOpt] Turn memcpy from just-memset'd source into memset.
There's no point in copying around constants, so, when all else fails,
we can still transform memcpy of memset into two independent memsets.

To quote the example, we can turn:
  memset(dst1, c, dst1_size);
  memcpy(dst2, dst1, dst2_size);
into:
  memset(dst1, c, dst1_size);
  memset(dst2, c, dst2_size);
When dst2_size <= dst1_size.

Like r235232 for copy constructors, this can occur in move constructors.

Differential Revision: http://reviews.llvm.org/D9682

llvm-svn: 237506
2015-05-16 01:32:26 +00:00
Ahmed Bougacha 15a31f67f7 [MemCpyOpt] Remove dead argument. NFC.
llvm-svn: 237503
2015-05-16 01:23:47 +00:00
Jingyue Wu 154eb5aa1d Add a speculative execution pass
Summary:
This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.

Credit goes to Jingyue Wu for writing an earlier version of this pass.

Patched by Bjarke Roune. 

Test Plan:
This patch adds a set of tests in test/Transforms/SpeculativeExecution/spec.ll
The pass is controlled by a flag which defaults to having the pass not run.

Reviewers: eliben, dberlin, meheff, jingyue, hfinkel

Reviewed By: jingyue, hfinkel

Subscribers: majnemer, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9360

llvm-svn: 237459
2015-05-15 17:54:48 +00:00
Jingyue Wu 80a96d299a [SLSR] handle (B | i) * S
Summary:
Consider (B | i) * S as (B + i) * S if B and i have no bits set in
common.

Test Plan: @or in slsr-mul.ll

Reviewers: broune, meheff

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9788

llvm-svn: 237456
2015-05-15 17:07:48 +00:00
Sanjoy Das 2c2661456e [PlaceSafepoints] Fix a bug that came in with rL236672.
Transfer the calling convention from the invoke being replaced by
PlaceStatepoints to the new invoke to gc.statepoint created.  Add a test
case that would have caught this issue.

llvm-svn: 237414
2015-05-15 00:26:21 +00:00
Sanjoy Das 8045810c58 [PlaceSafepoints] Fix a bug that came in with rL236672.
rL236672 would generate all invoke statepoints with deopt args set to a
list containing the single element "0", instead of an empty list.

Also add a test case that would have caught this.

llvm-svn: 237413
2015-05-15 00:26:15 +00:00
Jingyue Wu ca32190379 [ValueTracking] refactor: extract method haveNoCommonBitsSet
Summary:
Extract method haveNoCommonBitsSet so that we don't have to duplicate this logic in
InstCombine and SeparateConstOffsetFromGEP.

This patch also makes SeparateConstOffsetFromGEP more precise by passing
DominatorTree to computeKnownBits.

Test Plan: value-tracking-domtree.ll that tests ValueTracking indeed leverages dominating conditions

Reviewers: broune, meheff, majnemer

Reviewed By: majnemer

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9734

llvm-svn: 237407
2015-05-14 23:53:19 +00:00
Davide Italiano 95a77e8901 Don't rely on implicit pointerness of 'auto'.
This ends up being a copy. Pointy hat to me.
Reported by: dexonsmith, dblaikie

llvm-svn: 237394
2015-05-14 21:52:12 +00:00
Adam Nemet 2f85b7372c Attempt to fix MSVC bots
llvm-svn: 237359
2015-05-14 12:33:32 +00:00
Adam Nemet 938d3d63d6 New Loop Distribution pass
Summary:
This implements the initial version as was proposed earlier this year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html).
Since then Loop Access Analysis was split out from the Loop Vectorizer
and was made into a separate analysis pass.  Loop Distribution becomes
the second user of this analysis.

The pass is off by default and can be enabled
with -enable-loop-distribution.  There is currently no notion of
profitability; if there is a loop with dependence cycles, the pass will
try to split them off from other memory operations into a separate loop.

I decided to remove the control-dependence calculation from this first
version.  This and the issues with the PDT are actively discussed so it
probably makes sense to treat it separately.  Right now I just mark all
terminator instruction required which keeps identical CFGs for each
distributed loop.  This seems to be working pretty well for 456.hmmer
where even though there is an empty if-then block in the distributed
loop initially, it gets completely removed.

The pass keeps DominatorTree and LoopInfo updated.  I've tested this
with -loop-distribute-verify with the testsuite where we distribute ~90
loops.  SimplifyLoop is violated in some cases and I have a FIXME
covering this.

Reviewers: hfinkel, nadav, aschwaighofer

Reviewed By: aschwaighofer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8831

llvm-svn: 237358
2015-05-14 12:05:18 +00:00
Pete Cooper 7c4d7b8fbe Construct ArrayRef<const T*> from vector<T>
ArrayRef already has a SFINAE constructor which can construct ArrayRef<const T*> from ArrayRef<T*>.

This adds methods to do the same directly from SmallVector and std::vector.  This avoids an intermediate step through the use of makeArrayRef.

Also update the users of this in LICM and SROA to remove the now unnecessary makeArrayRef call.

Reviewed by David Blaikie.

llvm-svn: 237309
2015-05-13 22:43:09 +00:00
Sanjoy Das ba74e645d8 [PlaceSafepoints] New attributes for patchable statepoints.
Summary:
This patch teaches the PlaceSafepoints pass about two `CallSite`
function attributes:

 * "statepoint-id": if the string value of this attribute can be parsed
   as an integer, then it is propagated to the ID parameter of the
   statepoint created.

 * "statepoint-num-patch-bytes": if the string value of this attribute
   can be parsed as an integer, then it is propagated to the `num patch
   bytes` parameter of the statepoint created.

This change intentionally does not assert on a malformed value for these
attributes, given that they're not "official" attributes.

Reviewers: reames, pgavlin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9735

llvm-svn: 237286
2015-05-13 20:11:31 +00:00
Davide Italiano 80625afea8 [LoopIdiomRecognize] Use auto + range-based loop. NFC intended.
llvm-svn: 237284
2015-05-13 19:51:21 +00:00
Jingyue Wu c74e33bffe [NaryReassociate] avoid running forever
Avoid running forever by checking we are not reassociating an expression into
the same form.

Tested with @avoid_infinite_loops in nary-add.ll

llvm-svn: 237269
2015-05-13 18:12:24 +00:00
Diego Novillo ffc84e378a Add function entry counts from sample profiles.
This patch uses the new function profile metadata "function_entry_count"
to annotate entry counts from sample profiles.

In a sampling profile, the total samples collected at the function entry
are an approximation for the number of times that function was invoked.

llvm-svn: 237265
2015-05-13 17:04:29 +00:00
Pete Cooper 0cabcf211a Constify arguments to methods in LICM. NFC
llvm-svn: 237227
2015-05-13 01:12:18 +00:00
Pete Cooper 41e0ee3074 Change LoadAndStorePromoter to take ArrayRef instead of SmallVectorImpl&.
The array passed to LoadAndStorePromoter's constructor was a constant reference to a SmallVectorImpl, which is just the same as passing an ArrayRef.

Also, the data in the array can be 'const Instruction*' instead of 'Instruction*'.  Its not possible to convert a SmallVectorImpl<T*> to SmallVectorImpl<const T*>, but ArrayRef does provide such a method.

Currently this added calls to makeArrayRef which should be a nop, but i'm going to kick off a discussion about improving ArrayRef to not need these.

llvm-svn: 237226
2015-05-13 01:12:16 +00:00
Philip Reames 4d1a3ef659 [PlaceSafepoints] Reduce dominator tree recalculation
Reduce recalculation of the dominator tree by identifying all sites that will need a safepoint poll before doing any of the insertion. This allows us to invalidate the dominator info once, rather than once per safepoint poll inserted.

While I'm at it, update findLocationForEntrySafepoint to properly update the dom tree now that the interface has been made easy. When first written, it wasn't per comment in the code.

Differential Revision: http://reviews.llvm.org/D9727

llvm-svn: 237220
2015-05-13 00:32:23 +00:00
Jingyue Wu 4b6125d788 [SLSR] handles non-canonicalized Mul candidates
such as (2 + B) * S.

Tested by @non_canonicalized in slsr-mul.ll

llvm-svn: 237216
2015-05-13 00:03:17 +00:00
Sanjoy Das a1d39ba940 [Statepoints] Support for "patchable" statepoints.
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`.  `id` gets propagated to the ID field
in the generated StackMap section.  If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.

This change brings statepoints one step closer to patchpoints.  With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.

PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`.  This can be made more sophisticated
later.

Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9546

llvm-svn: 237214
2015-05-12 23:52:24 +00:00
Philip Reames 89fe570958 [PlaceSafepoints] Followup to commit L237172
Responding to review feedback from http://reviews.llvm.org/D9585

1) Remove a variable shadow by converting the outer loop to a range for loop.  We never really used the 'i' variable which was being shadowed.
2) Reduce DominatorTree recalculations by passing the DT to SplitEdge.

llvm-svn: 237212
2015-05-12 23:39:23 +00:00
Chandler Carruth a6ae877aec [Unrolling] Refactor the start and step offsets to simplify overflow
checking and make the cache faster and smaller.

I had thought that using an APInt here would be useful, but I think
I was just wrong. Notably, we don't have to do any fancy overflow
checking, we can just bound the values as quite small and do the math in
a higher precision integer. I've switched to a signed integer so that
UBSan will even point out if we ever have integer overflow. I've added
various asserts to try to catch things as well and hoisted the overflow
checks so that we just leave the too-large offsets out of the SCEV-GEP
cache. This makes the value in the cache quite a bit smaller which is
probably worthwhile.

No functionality changed here (for trip counts under 1 billion).

llvm-svn: 237209
2015-05-12 23:32:56 +00:00
Bjorn Steinbrink 2833966a3c CVP: Improve handling of Selects used as incoming PHI values
Summary:
If the branch that leads to the PHI node and the Select instruction
depend on correlated conditions, we might be able to directly use the
corresponding value from the Select instruction as the incoming value
for the PHI node, allowing later removal of the select instruction.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9051

llvm-svn: 237201
2015-05-12 22:31:47 +00:00
Philip Reames 311f710654 [RewriteStatepointsForGC] Extend base pointer to handle more cases w/vectors
When relocating a pointer, we need to determine a base pointer for the derived pointer being relocated. We have limited support for handling a pointer extracted from a vector; the current code only handled the case where the entire vector was known to contain base pointers. This patch extends the reasoning to handle chains of insertelements where the indices are constants. This case turns out to be fairly common in vectorized code. We can now handle vectors which contains mixtures of base and derived pointers provided the insertelements use constant indices.

Note that this doesn't solve the general problem. To handle variable indexed insertelements, we'd need to scalarize and introduce conditional branching based on the index. Alternatively, we could eagerly scalarize, but the code structure doesn't currently make either fix easy. The patch also doesn't handle shufflevector or other vector manipulation for much the same reasons. I plan to defer this work until I have a motivating test case.

Differential Revision: http://reviews.llvm.org/D9676

llvm-svn: 237200
2015-05-12 22:19:52 +00:00
Justin Bogner 383749af55 [PlaceSafepoints] Add missing "override" to PlaceBackedgeSafepointsImpl::runOnFunction
Pointed out by -Winconsistent-missing-override.

llvm-svn: 237196
2015-05-12 21:49:47 +00:00
Philip Reames 7b9817927a [PlaceSafepoints] Switch to being a FunctionPass
The pass doesn't actually modify the module outside of the function being processed. The only confusing piece is that it both inserts calls and then inlines the resulting calls. Given that, it definitely invalidates module level analysis results, but many FunctionPasses do that.

Differential Revision: http://reviews.llvm.org/D9590

llvm-svn: 237185
2015-05-12 21:21:18 +00:00
Philip Reames 9f12904ec9 [PlaceSafepoints] Make internal helper pass a FunctionPass
Switch from using a LoopPass to using a FunctionPass for the internal helper analysis pass. The next step is going to be to make this a true analysis pass which is required by the PlaceSafepoints pass itself.

p.s. The interesting semantic part here is that we're changing the iteration order over the loops. It shouldn't matter, but that's the reason to separate this into it's own distinct patch.

Differential Revision: http://reviews.llvm.org/D9588

llvm-svn: 237180
2015-05-12 21:09:36 +00:00
Philip Reames 57bdac96d9 [PlaceSafepoints] Use analysis infrastructure to get dominator tree
The old code computed dominators for every loop. This was terribly slow with no good reason. Just use the standard infrastructure for analysis passes.

Differential Revision: http://reviews.llvm.org/D9586

llvm-svn: 237176
2015-05-12 20:56:33 +00:00
Philip Reames 5708cca7ab [PlaceSafepoints] Remove dependence on LoopSimplify
As a step towards getting rid of internal pass manager hack entirely, remove the need for loop simplify to run in the inner pass manager. The new code does produce slightly different loop structures, so this isn't technically NFC.

Differential Revision: http://reviews.llvm.org/D9585

llvm-svn: 237172
2015-05-12 20:43:48 +00:00
Pete Cooper 833f34d837 Convert PHI getIncomingValue() to foreach over incoming_values(). NFC.
We already had a method to iterate over all the incoming values of a PHI.  This just changes all eligible code to use it.

Ineligible code included anything which cared about the index, or was also trying to get the i'th incoming BB.

llvm-svn: 237169
2015-05-12 20:05:31 +00:00
Pete Cooper 47e80cd796 Constify method. NFC
llvm-svn: 237167
2015-05-12 20:05:20 +00:00
Michael Zolotukhin 8c68171fef Reimplement heuristic for estimating complete-unroll optimization effects.
Summary:
This patch reimplements heuristic that tries to estimate optimization beneftis
from complete loop unrolling.

In this patch I kept the minimal changes - e.g. I removed code handling
branches and folding compares. That's a promising area, but now there
are too many questions to discuss before we can enable it.

Test Plan: Tests are included in the patch.

Reviewers: hfinkel, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8816

llvm-svn: 237156
2015-05-12 17:20:03 +00:00
Sanjoy Das 5665c999c2 Rename variables in gc_relocate related functions to follow LLVM's naming conventions.
Summary:
This patch is to rename some variables to CamelCase in gc_relocate
related functions. There is no functionality change.

Patch by Chen Li!

Reviewers: reames, AndyAyers, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9681

llvm-svn: 237069
2015-05-11 23:47:27 +00:00
Ahmed Bougacha b61696656e [MemCpyOpt] Look at any dependency -not just source- for memset+memcpy.
This fixes another miscompile introduced by r235232: when there was a
dependency on the memcpy destination other than the memset, we would
ignore it, because we only looked at the source dependency.

It was a mistake to use SrcDepInfo.  Instead, just use DepInfo.

llvm-svn: 237066
2015-05-11 23:09:46 +00:00
Davide Italiano 8ed0446e97 [LoopIdiomRecognize] Transform backedge-taken count check into an assertion.
runOnCountable() allowed the caller to call on a loop without a
predictable backedge-taken count. Change the code so that only loops
with computable backdge-count can call this function, in order to catch
abuses.

llvm-svn: 237044
2015-05-11 21:02:34 +00:00
Sanjoy Das 89c5491a72 [RewriteStatepointsForGC] Fix a bug on creating gc_relocate for pointer to vector of pointers
Summary:
In RewriteStatepointsForGC pass, we create a gc_relocate intrinsic for
each relocated pointer, and the gc_relocate has the same type with the
pointer. During the creation of gc_relocate intrinsic, llvm requires to
mangle its type. However, llvm does not support mangling of all possible
types. RewriteStatepointsForGC will hit an assertion failure when it
tries to create a gc_relocate for pointer to vector of pointers because
mangling for vector of pointers is not supported.

This patch changes the way RewriteStatepointsForGC pass creates
gc_relocate. For each relocated pointer, we erase the type of pointers
and create an unified gc_relocate of type i8 addrspace(1)*. Then a
bitcast is inserted to convert the gc_relocate to the correct type. In
this way, gc_relocate does not need to deal with different types of
pointers and the unsupported type mangling is no longer a problem. This
change would also ease further merge when LLVM erases types of pointers
and introduces an unified pointer type.

Some minor changes are also introduced to gc_relocate related part in
InstCombineCalls, CodeGenPrepare, and Verifier accordingly.

Patch by Chen Li!

Reviewers: reames, AndyAyers, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9592

llvm-svn: 237009
2015-05-11 18:49:34 +00:00
Igor Laevsky 5e23e16c6c This change is refactoring only. It moves basic block normalization for invokes to happen before replacement of a call with safepoint in "ReplaceWithStatepoint". Previously it was partly done before replacement of calls with safepoint and partly after call replacement but before RAUW's for gc_relocates, which was confusing.
llvm-svn: 236829
2015-05-08 11:59:09 +00:00
NAKAMURA Takumi 2a5bd54f4e Scalar/PlaceSafepoints.cpp: Fix a warning introduced in r228090. [-Wunused-variable]
llvm-svn: 236711
2015-05-07 10:18:46 +00:00
Philip Reames 7a738dd94c [JumpThreading] Simplify comparisons when simplifying branches
If we have recognized that a conditional is constant at a particular location in the code (while trying to decide if we can simplify a conditional branch), we can eagerly replace that condition with a constant if it's definition is post dominated by the branch in question.

In practice, this ends up being a compile time savings at most. JumpThreading would have visited each using branch anyways. CVP would have visited the cmp itself again. Unless LVI gives up early, we shouldn't gain any addition power by doing this transformation early. What we do gain is simplicity and compile time.

Differential Revision: http://reviews.llvm.org/D9312

llvm-svn: 236684
2015-05-07 00:19:14 +00:00
Sanjoy Das abf15608a7 [Statepoints] Clean up PlaceSafepoints.cpp: de-duplicate code.
Common duplicated code and remove unnecessary code.

llvm-svn: 236674
2015-05-06 23:53:21 +00:00
Sanjoy Das 93abd813ec [Statepoints] Clean up PlaceSafepoints.cpp: variable naming.
Use CamelCase.  NFC.

llvm-svn: 236673
2015-05-06 23:53:19 +00:00
Sanjoy Das abe1c685ac [IRBuilder] Add a CreateGCStatepointInvoke.
Renames the original CreateGCStatepoint to CreateGCStatepointCall, and
moves invoke creating functionality from PlaceSafepoints.cpp to
IRBuilder.cpp.

This changes the labels generated for PlaceSafepoints/invokes.ll so use
a regex there to make the basic block labels more resilient.

llvm-svn: 236672
2015-05-06 23:53:09 +00:00
Adam Nemet e340f851a3 [DomTree] verifyDomTree to unconditionally perform DT verification
I folded the check for the flag -verify-dom-info into the only caller
where I think it is supposed to be checked: verifyAnalysis.  (The idea
of the flag is to enable this expensive verification in
verifyPreservedAnalysis.)

I'm assuming that when manually scheduling the verification pass
with -passes=verify<domtree>, we do want to perform the verification.

llvm-svn: 236575
2015-05-06 08:18:41 +00:00
Sanjoy Das 499d703f52 [Statepoint] Clean up Statepoint.h: accessor names.
Use getFoo() as accessors consistently and some other naming changes.

llvm-svn: 236564
2015-05-06 02:36:26 +00:00
Duncan P. N. Exon Smith a9308c49ef IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Philip Reames 63294cbb6a [RewriteStatepointsForGC] Exclude constant values from being considered live at a safepoint
There can be various constant pointers in the IR which do not get relocated at a safepoint. One example is the address of a global variable. Another example is a pointer created via inttoptr. Note that the optimizer itself likes to create such inttoptrs when locally propagating constants through dynamically dead code.

To deal with this, we need to exclude uses of constants from contributing to the liveness of a safepoint which might reach that use. At some later date, it might be worth exploring what could be done to support the relocation of various special types of "constants", but that's future work.

Differential Revision: http://reviews.llvm.org/D9236

llvm-svn: 235821
2015-04-26 19:48:03 +00:00
Philip Reames 2e78fa49a8 Don't Place Entry Safepoints Before the llvm.frameescape() Intrinsic
llvm.frameescape() intrinsic is not a real call. The intrinsic can only exist in the entry block. Inserting a gc.statepoint() before llvm.frameescape() may split the entry block, and push the intrinsic out of the entry block.

Patch by: Swaroop.Sridhar@microsoft.com
Differential Revision: http://reviews.llvm.org/D8910

llvm-svn: 235820
2015-04-26 19:41:23 +00:00
Andrew Kaylor 08c5f1efc1 Fix LoopInterchange/reductions.ll test for debug builds
llvm-svn: 235734
2015-04-24 17:39:16 +00:00
Jingyue Wu 72fca6c89b Resurrect r235688
We should skip vector types which are not SCEVable.

test/CodeGen/NVPTX/sched2.ll passes

llvm-svn: 235695
2015-04-24 04:22:39 +00:00
Michael Zolotukhin 10ed96bf09 Fix comment for NoCommonBits.
Maybe there is a better wording, but at least it should be technically
correct now.

llvm-svn: 235660
2015-04-23 22:55:48 +00:00
Philip Reames 5461d45abf Move Value.isDereferenceablePointer to ValueTracking [NFC]
Move isDereferenceablePointer function to Analysis. This function recursively tracks dereferencability over a chain of values like other functions in ValueTracking.

This refactoring is motivated by further changes to support dereferenceable_or_null attribute (http://reviews.llvm.org/D8650). isDereferenceablePointer will be extended to perform context-sensitive analysis and IR is not a good place to have such functionality.

Patch by: Artur Pilipenko <apilipenko@azulsystems.com>
Differential Revision: reviews.llvm.org/D9075

llvm-svn: 235611
2015-04-23 17:36:48 +00:00
Karthik Bhat 8210fdf26e Add support to interchange loops with reductions.
This patch enables interchanging of tightly nested loops with reductions.
Differential Revision: http://reviews.llvm.org/D8314

llvm-svn: 235571
2015-04-23 04:51:44 +00:00
Sanjay Patel c96ee08016 don't repeat function names in comments; NFC
llvm-svn: 235531
2015-04-22 18:04:46 +00:00
Ahmed Bougacha 9692e30e8b [MemCpyOpt] Use the raw i8* dest when optimizing memset+memcpy.
MemIntrinsic::getDest() looks through pointer casts, and using it
directly when building the new GEP+memset results in stuff like:

  %0 = getelementptr i64* %p, i32 16
  %1 = bitcast i64* %0 to i8*
  call ..memset(i8* %1, ...)

instead of the correct:

  %0 = bitcast i64* %p to i8*
  %1 = getelementptr i8* %0, i32 16
  call ..memset(i8* %1, ...)

Instead, use getRawDest, which just gives you the i8* value.
While there, use the memcpy's dest, as it's live anyway.

In most cases, when the optimization triggers, the memset and memcpy
sizes are the same, so the built memset is 0-sized and eliminated.
The problem occurs when they're different.

Fixes a regression caused by r235232: PR23300.

llvm-svn: 235419
2015-04-21 21:28:33 +00:00
Daniel Berlin b4e7a4a40c Revamp PredIteratorCache interface to be cleaner.
Summary:
This lets us use range based for loops.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9169

llvm-svn: 235416
2015-04-21 21:11:50 +00:00
Sanjoy Das 7be03d69e5 [LSR][NFC] Remove a stale comment.
The comment was made stale in r171735.

llvm-svn: 235414
2015-04-21 20:42:50 +00:00
Jingyue Wu f1edf3e88f [SLSR] garbage-collect unused instructions
Summary:
After we rewrite a candidate, the instructions used by the old form may
become unused. This patch cleans up these unused instructions so that we
needn't run DCE after SLSR.

Test Plan: removed -dce in all the SLSR tests

Reviewers: broune, meheff

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9101

llvm-svn: 235410
2015-04-21 19:56:18 +00:00
Jingyue Wu f763c3fd45 [SeparateConstOffsetFromGEP] garbage-collect intermediate instructions
Summary: so that we needn't run DCE after this pass.

Test Plan: removed -dce from the commandline in split-gep.ll and split-gep-and-gvn.ll

Reviewers: meheff

Subscribers: llvm-commits, HaoLiu, hfinkel, jholewinski

Differential Revision: http://reviews.llvm.org/D9096

llvm-svn: 235409
2015-04-21 19:53:18 +00:00
Duncan P. N. Exon Smith 60635e39b6 DebugInfo: Drop rest of DIDescriptor subclasses
Delete the remaining subclasses of (the already deleted) `DIDescriptor`.
Part of PR23080.

llvm-svn: 235404
2015-04-21 18:44:06 +00:00
Ahmed Bougacha 05b72c1fd8 [MemCpyOpt] Don't force i64 when promoting memset/memcpy sizes.
Harden r235258 to support any integer bitwidth.  The quick glance at
the reference made me think only i32 and i64 were valid types, but
they're not special, so any overload is legal.

Thanks to David Majnemer for noticing!

llvm-svn: 235261
2015-04-18 23:06:04 +00:00
Ahmed Bougacha 7216ccc3f3 [MemCpyOpt] Promote both memset/memcpy sizes if differently typed.
Followup to r235232, which caused PR23278.

We can't assume the memset and memcpy sizes have the same type, as
nothing in the language reference prevents that.
Instead, zext both to i64 if they disagree.

While there, robustify tests by using i8 %c rather than i8 0 for the
memset character.

llvm-svn: 235258
2015-04-18 17:57:41 +00:00
Ahmed Bougacha 83f78a459a [MemCpyOpt] Optimize double-storing by memset+memcpy.
A common idiom in some code is to do the following:

  memset(dst, 0, dst_size);
  memcpy(dst, src, src_size);

Some of the memset is redundant; instead, we can do:

  memcpy(dst, src, src_size);
  memset(dst + src_size, 0,
         dst_size <= src_size ? 0 : dst_size - src_size);

Original patch by: Joel Jones
Differential Revision: http://reviews.llvm.org/D498

llvm-svn: 235232
2015-04-17 22:20:57 +00:00
Jingyue Wu 8579b81329 [NaryReassociate] run NaryReassociate iteratively
Summary:
An alternative is to use a worklist approach. However, that approach
would break the traversing order so that we couldn't lookup SeenExprs
efficiently. I don't see a clear winner here, so I picked the easier approach.

Along with two minor improvements:
1. preserves ScalarEvolution by forgetting instructions replaced
2. removes dead code locally avoiding the need of running DCE afterwards

Test Plan: add to slsr-add.ll a test that requires multiple iterations

Reviewers: broune, dberlin, atrick, meheff

Reviewed By: atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9058

llvm-svn: 235151
2015-04-17 00:25:10 +00:00
Jingyue Wu 771dfe91cf [NaryReassociate] speeds up candidate searching
Summary:
This fixes a left-over efficiency issue in D8950.

As Andrew and Daniel suggested, we can store the candidates in a stack
and pop the top element when it does not dominate the current
instruction. This reduces the worst-case time complexity to O(n).

Test Plan: a new test in nary-add.ll that exercises this optimization.

Reviewers: broune, dberlin, meheff, atrick

Reviewed By: atrick

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D9055

llvm-svn: 235129
2015-04-16 18:42:31 +00:00
Duncan P. N. Exon Smith cd1aecfe36 DebugInfo: Require a DebugLoc in DIBuilder::insertDeclare()
Change `DIBuilder::insertDeclare()` and `insertDbgValueIntrinsic()` to
take an `MDLocation*`/`DebugLoc` parameter which it attaches to the
created intrinsic.  Assert at creation time that the `scope:` field's
subprogram matches the variable's.  There's a matching `clang` commit to
use the API.

The context for this is PR22778, which is removing the `inlinedAt:`
field from `MDLocalVariable`, instead deferring to the `!dbg` location
attached to the debug info intrinsic.  The best way to ensure we always
have a `!dbg` attachment is to require one at creation time.  I'll be
adding verifier checks next, but this API change is the best way to
shake out frontend bugs.

Note: I added an `llvm_unreachable()` in `bindings/go` and passed in
`nullptr` for the `DebugLoc`.  The `llgo` folks will eventually need to
pass a valid `DebugLoc` here.

llvm-svn: 235041
2015-04-15 21:18:07 +00:00
Jingyue Wu 43885ebb3a [SLSR] handle candidate form (B + i * S)
Summary:
With this patch, SLSR may rewrite

S1: X = B + i * S
S2: Y = B + i' * S

to

S2: Y = X + (i' - i) * S

A secondary improvement: if (i' - i) is a power of 2, emit Y as X + (S << log(i' - i)). (S << log(i' -i)) is in a canonical form and thus more likely GVN'ed than (i' - i) * S.

Test Plan: slsr-add.ll

Reviewers: hfinkel, sanjoy, meheff, broune, eliben

Reviewed By: eliben

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8983

llvm-svn: 235019
2015-04-15 16:46:13 +00:00
Richard Trieu 6b1aa5f5e1 Change range-based for-loops to be -Wrange-loop-analysis clean.
No functionality change.

llvm-svn: 234963
2015-04-15 01:21:15 +00:00
Jingyue Wu 8cb6b2a292 Simplify n-ary adds by reassociation
Summary:
This transformation reassociates a n-ary add so that the add can partially reuse
existing instructions. For example, this pass can simplify

  void foo(int a, int b) {
    bar(a + b);
    bar((a + 2) + b);
  }

to

  void foo(int a, int b) {
    int t = a + b;
    bar(t);
    bar(t + 2);
  }

saving one add instruction.

Fixes PR22357 (https://llvm.org/bugs/show_bug.cgi?id=22357).

Test Plan: nary-add.ll

Reviewers: broune, dberlin, hfinkel, meheff, sanjoy, atrick

Reviewed By: sanjoy, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8950

llvm-svn: 234855
2015-04-14 04:59:22 +00:00
Duncan P. N. Exon Smith 537b4a8159 DebugInfo: Gut DISubprogram and DILexicalBlock*
Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses.  Note
that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`.

llvm-svn: 234850
2015-04-14 03:40:37 +00:00
Sanjoy Das e178f46965 [LoopUnrollRuntime] Avoid high-cost trip count computation.
Summary:
Runtime unrolling of loops needs to emit an expression to compute the
loop's runtime trip-count.  Avoid runtime unrolling if this computation
will be expensive.

Depends on D8993.

Reviewers: atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8994

llvm-svn: 234846
2015-04-14 03:20:38 +00:00
Sanjoy Das 2e6bb3b947 [SCEV] Refactor out isHighCostExpansion. NFCI.
Summary:
Move isHighCostExpansion from IndVarSimplify to SCEVExpander.  This
exposed function will be used in a subsequent change.

Reviewers: bogner, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8995

llvm-svn: 234844
2015-04-14 03:20:28 +00:00
Duncan P. N. Exon Smith b7e221ba55 DebugInfo: Gut DILocation
This is along the same lines as r234832, but for `DILocation`.  Clean
out all accessors from `DILocation`.  Any callers should be using
`MDLocation` directly (e.g., via `operator->()`).

llvm-svn: 234835
2015-04-14 01:35:55 +00:00
Duncan P. N. Exon Smith 6a0320a991 DebugInfo: Gut DIExpression
Completely gut `DIExpression`, turning it into a simple wrapper around
`MDExpression *`.  There are two bits of magic left:

  - It's constructed from `const MDExpression*` but convertible to
    `MDExpression*`.
  - It's default-constructed to `nullptr`.

Otherwise, it should behave quite like a raw pointer.  Once I've done
the same to the rest of the `DIDescriptor` subclasses, I'll come back to
delete them entirely (and update call sites as necessary to deal with
the missing magic).

llvm-svn: 234832
2015-04-14 01:12:42 +00:00
Philip Reames ba1984958d [RewriteStatepointsForGC] Delete dead code [NFC]
Before we had real liveness, we needed to track every value that base pointer
insertion code created because these now might be live.  We now just rerun 
the data flow liveness algorithm (which is actually faster!) and no longer 
need the associated code.

llvm-svn: 234827
2015-04-14 00:41:34 +00:00
Philip Reames f209a153f1 [RwriteStatepointsForGC] Minor indentation and naming [NFC]
Use early-return style that's preferred in LLVM and updating the naming in places I touched with other changes in the last few days.  Hopefully, NFC.

llvm-svn: 234785
2015-04-13 20:00:30 +00:00
Philip Reames 2114275263 [RewriteStatepointsForGC] Avoid inserting empty holder
We use dummy calls to adjust the liveness of values over statepoints in the midst of the insertion.  If there are no values which need held live, there's no point in actually inserting the holder.  

llvm-svn: 234779
2015-04-13 19:07:47 +00:00
Philip Reames 69e51cae33 [RewriteStatepointsForGC] Fix a latent bug in normalization for invoke statepoint [NFC]
Since we're restructuring the CFG, we also need to make sure to update the analsis passes. While I'm touching the code, I dedicided to restructure it a bit.  The code involved here was very confusing.  This change moves the normalization to essentially being a pre-pass before the main insertion work and updates a few comments to actually say what is happening and *why*.

The restructuring should be covered by existing tests.  I couldn't easily see how to create a test for the invalidation bug.  Suggestions welcome.

llvm-svn: 234769
2015-04-13 18:07:21 +00:00
Philip Reames 9a2e01d908 [RewriteStatepointsForGC] Strengthen assertions around liveness
This is related to the issues addressed in 234651.  These assertions check the properties ensured by that change at the place of use.  Note that a similiar property is checked in checkBasicSSA, but without the reachability constraint.  Technically, the liveness would be correct to include unreachable values, but this would be problematic for actual relocation.

llvm-svn: 234766
2015-04-13 17:35:55 +00:00
Philip Reames e73300b925 [RewriteStatepointsForGC] Move an expensive debugging check to XDEBUG
The check in question is attempting to help find cases where we haven't relocated a pointer at a safepoint we should have.  It does this by coercing the value to null at any safepoint which doesn't relocate it.  

Unfortunately, this turns out to be rather expensive in terms of memory usage and time.  The number of stores inserted can grow with O(number of values x number of statepoints).  On at least one example I looked at, over half of peak memory usage was coming from this check.  

With this change, the check is no longer enabled by default in Asserts builds.  It is enabled for expensive asserts builds and has a command line option to enable it in both Asserts and non-Asserts builds.  

llvm-svn: 234761
2015-04-13 16:41:32 +00:00
Benjamin Kramer 79de6e6d89 Mark empty default constructors as =default if it makes the type POD
NFC

llvm-svn: 234694
2015-04-11 18:57:14 +00:00
Alexander Kornienko f817c1cb9a Use 'override/final' instead of 'virtual' for overridden methods
The patch is generated using clang-tidy misc-use-override check.

This command was used:

  tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
    -checks='-*,misc-use-override' -header-filter='llvm|clang' \
    -j=32 -fix -format

http://reviews.llvm.org/D8925

llvm-svn: 234679
2015-04-11 02:11:45 +00:00
Philip Reames 9638ff9b37 [Statepoints] Fix a release only build failure
A function which is used only in Asserts builds needs to be defined only in Asserts builds.

llvm-svn: 234667
2015-04-11 00:06:47 +00:00
Philip Reames 4d80ede538 [RewriteStatepointsForGC] Use a SetVector for a worklist [NFC]
Using a SetVector to replace equivelent but more verbose functionality.

llvm-svn: 234662
2015-04-10 23:11:26 +00:00
Philip Reames df1ef08c0c [RewriteStatepointsForGC] Use an actual liveness algorithm
When rewriting statepoints to make relocations explicit, we need to have a conservative but consistent notion of where a particular pointer is live at a particular site. The old code just used dominance, which is correct, but decidedly more conservative then it needed to be. This patch implements a simple dataflow algorithm that's run one per function (well, twice counting fixup after base pointer insertion). There's still lots of room to make this faster, but it's fast enough for all practical purposes today.

Differential Revision: http://reviews.llvm.org/D8674

llvm-svn: 234657
2015-04-10 22:53:14 +00:00
Philip Reames 704e78b149 [RewriteStatepointsForGC] clang-format file
Format the entire file to reduce diff of change to follow.

llvm-svn: 234656
2015-04-10 22:34:56 +00:00
Philip Reames f66d73708b [RewriteStatepointsForGC] Missed review comment from 234651 & build fix
After submitting 234651, I noticed I hadn't responded to a review comment by mjacob.  This patch addresses that comment and fixes a Release only build problem due to an unused variable.  

llvm-svn: 234653
2015-04-10 22:16:58 +00:00
Philip Reames 85b36a8157 [RewriteStatepointsForGC] Preprocess the IR to remove unreachable blocks and single entry phis
Two related small changes:

    Various dominance based queries about liveness can get confused if we're talking about unreachable blocks. To avoid reasoning about such cases, just remove them before rewriting statepoints.
    Remove single entry phis (likely left behind by LCSSA) to reduce the number of live values.

Both of these are motivated by http://reviews.llvm.org/D8674 which will be submitted shortly.

Differential Revision: http://reviews.llvm.org/D8675

llvm-svn: 234651
2015-04-10 22:07:04 +00:00
Philip Reames 8531d8c491 [RewriteStatepointsForGC] Limited support for vectors of pointers
This patch adds limited support for inserting explicit relocations when there's a vector of pointers live over the statepoint. This doesn't handle the case where the vector contains a mix of base and non-base pointers; that's future work.

The current implementation just scalarizes the vector over the gc.statepoint before doing the explicit rewrite. An alternate approach would be to plumb the vector all the way though the backend lowering, but doing that appears challenging. In particular, the size of the indirect spill slot is currently assumed to be sizeof(pointer) throughout the backend.

In practice, this is enough to allow running the SLP and Loop vectorizers before RewriteStatepointsForGC.

Differential Revision: http://reviews.llvm.org/D8671

llvm-svn: 234647
2015-04-10 21:48:25 +00:00
Benjamin Kramer 3a09ef64ee [CallSite] Make construction from Value* (or Instruction*) explicit.
CallSite roughly behaves as a common base CallInst and InvokeInst. Bring
the behavior closer to that model by making upcasts explicit. Downcasts
remain implicit and work as before.

Following dyn_cast as a mental model checking whether a Value *V isa
CallSite now looks like this: 
  if (auto CS = CallSite(V)) // think dyn_cast
instead of:
  if (CallSite CS = V)

This is an extra token but I think it is slightly clearer. Making the
ctor explicit has the advantage of not accidentally creating nullptr
CallSites, e.g. when you pass a Value * to a function taking a CallSite
argument.

llvm-svn: 234601
2015-04-10 14:50:08 +00:00
Duncan P. N. Exon Smith 6186fb2cd0 Transforms: Stop using DIDescriptor::is*() and auto-casting
Same as r234255, but for lib/Analysis and lib/Transforms.

llvm-svn: 234257
2015-04-06 23:27:00 +00:00
Jingyue Wu 96d74006fd [SLSR] consider &B[S << i] as &B[(1 << i) * S]
Summary: This reduces handling &B[(1 << i) * s] to handling &B[i * S].

Test Plan: slsr-gep.ll

Reviewers: meheff

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D8837

llvm-svn: 234180
2015-04-06 17:15:48 +00:00
David Blaikie 95d3e53720 [opaque pointer type] More GEP IRBuilder API migrations
llvm-svn: 234064
2015-04-03 23:03:54 +00:00
David Blaikie aa41cd57e0 [opaque pointer type] More GEP IRBuilder API migrations...
llvm-svn: 234058
2015-04-03 21:33:42 +00:00
David Blaikie 93c5444fe0 [opaque pointer type] More GEP API migrations in IRBuilder uses
The plan here is to push the API changes out from the common components
(like Constant::getGetElementPtr and IRBuilder::CreateGEP related
functions) and just update callers to either pass the type if it's
obvious, or pass null.

Do this with LoadInst as well and anything else that comes up, then to
start porting specific uses to not pass null anymore - this may require
some refactoring in each case.

llvm-svn: 234042
2015-04-03 19:41:44 +00:00
Jingyue Wu 99a6bed965 [SLSR] handles off bounds GEPs
Summary:
The old requirement on GEP candidates being in bounds is unnecessary.
For off-bound GEPs, we still have

  &B[i * S] = B + (i * S) * e = B + (i * e) * S

Test Plan: slsr_offbound_gep in slsr-gep.ll

Reviewers: meheff

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8809

llvm-svn: 233949
2015-04-02 21:18:32 +00:00
David Blaikie 4a2e73b066 [opaque pointer type] API migration for GEP constant factories
Require the pointee type to be passed explicitly and assert that it is
correct. For now it's possible to pass nullptr here (and I've done so in
a few places in this patch) but eventually that will be disallowed once
all clients have been updated or removed. It'll be a long road to get
all the way there... but if you have the cahnce to update your callers
to pass the type explicitly without depending on a pointer's element
type, that would be a good thing to do soon and a necessary thing to do
eventually.

llvm-svn: 233938
2015-04-02 18:55:32 +00:00
Duncan P. N. Exon Smith ec819c096b Transforms: Use the new DebugLoc API, NFC
Update lib/Analysis and lib/Transforms to use the new `DebugLoc` API.

llvm-svn: 233587
2015-03-30 19:49:49 +00:00
James Molloy 0cbb2a8603 Reapply r233175 and r233183: float2int.
This re-adds float2int to the tree, after fixing PR23038. It turns
out the argument to APSInt() is true-if-unsigned, rather than
true-if-signed :(. Added testcase and explanatory comment.

llvm-svn: 233370
2015-03-27 10:36:57 +00:00
Sanjoy Das 7041fb1c13 [NFC] Fix typo in comment.
llvm-svn: 233363
2015-03-27 06:01:56 +00:00
Philip Reames a6ebf075b1 Code cleanup [NFC]
The assertion here was more expensive then it needed to be.  We're only inserting allocas in the entry block, so we only need to consider ones in the entry block.

llvm-svn: 233362
2015-03-27 05:53:16 +00:00
Philip Reames 24c6cd52e0 More code cleanup [NFC]
llvm-svn: 233361
2015-03-27 05:47:00 +00:00
Philip Reames 18d0feb7d2 More code cleanup [NFC]
Minor naming, one potentially unsafe cast

llvm-svn: 233359
2015-03-27 05:39:32 +00:00
Philip Reames aa66dfa028 Code simplification and style cleanup
All the removed assertions are either implied locally by the assert at the top of the function or properties of the verifier.

llvm-svn: 233358
2015-03-27 05:34:44 +00:00
Nick Lewycky ffb0864b44 Revert r233175 and r233183 with it. This pulls float2int back out of the tree, due to PR23038.
llvm-svn: 233350
2015-03-27 02:00:11 +00:00
Jingyue Wu 177a81578f [SLSR] handle candidate form &B[i * S]
Summary:
This patch enhances SLSR to handle another candidate form &B[i * S]. If
we found two candidates

S1: X = &B[i * S]
S2: Y = &B[i' * S]

and S1 dominates S2, we can replace S2 with

Y = &X[(i' - i) * S]

Test Plan:
slsr-gep.ll
X86/no-slsr.ll: verify that we do not run SLSR on GEPs that already fit into
an addressing mode

Reviewers: eliben, atrick, meheff, hfinkel

Reviewed By: hfinkel

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D7459

llvm-svn: 233286
2015-03-26 16:49:24 +00:00
Andrea Di Biagio 460948c9ab [optnone] Skip pass Float2Int on optnone functions.
Added test Float2Int/float2int-optnone.ll to verify that pass Float2Int
is not run on optnone functions.

llvm-svn: 233183
2015-03-25 12:22:37 +00:00
James Molloy cb75d92458 Reapply r233062: "float2int": Add a new pass to demote from float to int where possible.
Now with a fix for PR23008 and extra regression test.

llvm-svn: 233175
2015-03-25 10:03:42 +00:00
David Blaikie 68d535c45f Opaque Pointer Types: GEP API migrations to specify the gep type explicitly
The changes to InstCombine do seem a bit silly - it doesn't make
anything obviously better to have the caller access the pointers element
type (the thing I'm trying to remove) than the GEP itself, but it's a
helpful migration step. This will allow me to more obviously lock down
GEP (& Load, etc) API usage, then fix all the code that accesses pointer
element types except the places that need to be removed (most of the
InstCombines) anyway - at which point I'll need to just remove all that
code because it won't be meaningful anymore (there will be no pointer
types, so no bitcasts to combine)

llvm-svn: 233126
2015-03-24 22:38:16 +00:00
Hans Wennborg e42c64551a Revert r233062 ""float2int": Add a new pass to demote from float to int where possible."
This caused PR23008, compiles failing with: "Use still stuck around after Def is
destroyed: %.sroa.speculated"

Also reverting follow-up r233064.

llvm-svn: 233105
2015-03-24 20:07:08 +00:00
Sanjoy Das 45dc94a856 [IRCE] Fix how IRCE checks for no-sign-overflow.
IRCE requires the induction variables it handles to not sign-overflow.
The current scheme of checking if sext({X,+,S}) == {sext(X),+,sext(S)}
fails when SCEV simplifies sext(X) too.  After this change we //also//
check no-signed-wrap by looking at the flags set on the SCEVAddRecExpr.

llvm-svn: 233102
2015-03-24 19:29:22 +00:00
Sanjoy Das 337d46b36f [IRCE] Fix a regression introduced in r232444.
IRCE should not try to eliminate range checks that check an induction
variable against a loop-varying length.

llvm-svn: 233101
2015-03-24 19:29:18 +00:00
Benjamin Kramer e3b961a6e2 [float2int] Sort includes and add missing raw_ostream include.
llvm-svn: 233064
2015-03-24 11:28:47 +00:00
James Molloy 408df5160c "float2int": Add a new pass to demote from float to int where possible.
It is possible to have code that converts from integer to float, performs operations then converts back, and the result is provably the same as if integers were used.

This can come from different sources, but the most obvious is a helper function that uses floats but the arguments given at an inlined callsites are integers.

This pass considers all integers requiring a bitwidth less than or equal to the bitwidth of the mantissa of a floating point type (23 for floats, 52 for doubles) as exactly representable in floating point.

To reduce the risk of harming efficient code, the pass only attempts to perform complete removal of inttofp/fptoint operations, not just move them around.

llvm-svn: 233062
2015-03-24 11:15:23 +00:00
Benjamin Kramer 799003bf8c Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.
llvm-svn: 232998
2015-03-23 19:32:43 +00:00
Benjamin Kramer b85d3756a6 Another set of missing raw_ostream.h. Still no functional change.
llvm-svn: 232993
2015-03-23 18:45:56 +00:00
Benjamin Kramer 51f6096cf8 Move private classes into anonymous namespaces
NFC.

llvm-svn: 232944
2015-03-23 12:30:58 +00:00
Duncan P. N. Exon Smith 41a1546ebc SampleProfile: Check for missing debug locations
Don't use `DebugLoc` accessors if we're pointing at null, which will be
a problem after a WIP patch to make the `DIDescriptor` accessors more
strict.  Caught by Frontend/profile-sample-use-loc-tracking.c (in
clang).

llvm-svn: 232792
2015-03-20 00:56:55 +00:00
Duncan P. N. Exon Smith ab58a568ee Verifier: Remove the separate -verify-di pass
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`.  This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.

Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).

llvm-svn: 232772
2015-03-19 22:24:17 +00:00
David Blaikie c4dfa63928 Fix GCC -Wparentheses warning (& reformat now that the precedence is fixed)
Benign warning (clang deliberately suppresses this case) but does
regularly produce bad formatting, so it's nice to fix/reformat.

llvm-svn: 232508
2015-03-17 17:48:24 +00:00
Reid Kleckner 0b16859805 Use an underlying enum type of unsigned to silence a -Wmicrosoft warning about being unable to put (unsigned)-1 into the default underyling type of int
llvm-svn: 232498
2015-03-17 16:50:20 +00:00
Sanjoy Das 9c1bfae604 [IRCE] Add a -irce-print-range-checks option.
-irce-print-range-checks prints out the set of range checks recognized
by IRCE.

llvm-svn: 232451
2015-03-17 01:40:22 +00:00
Sanjoy Das 7a0b7f5996 [IRCE] Add comments, NFC.
This change adds some comments that justify why a potentially
overflowing operation is safe.

llvm-svn: 232445
2015-03-17 00:42:16 +00:00
Sanjoy Das e2cde6f195 [IRCE] Support half-range checks.
This change to IRCE gets it to recognize "half" range checks.  Half
range checks are range checks that only either check if the index is
`slt` some positive integer ("length") or if the index is `sge` `0`.

The range solver does not try to be clever / aggressive about solving
half-range checks -- it transforms "I < L" to "0 <= I < L" and "0 <= I"
to "0 <= I < INT_SMAX".  This is safe, but not always optimal.

llvm-svn: 232444
2015-03-17 00:42:13 +00:00
David Blaikie 741c8f81e4 [opaque pointer type] Start migrating GEP creation to explicitly specify the pointee type
I'm just going to migrate these in a pretty ad-hoc & incremental way -
providing the backwards compatible API for now, then locally removing
it, fixing a few callers, adding it back in and commiting those callers.
Rinse, repeat.

The assertions should ensure that if I get this wrong we'll find out
about it and not just have one giant patch to revert, recommit, revert,
recommit, etc.

llvm-svn: 232240
2015-03-14 01:53:18 +00:00
Robert Lougher 1858ba7626 Reapply "[Reassociate] Add initial support for vector instructions."
This reapplies the patch previously committed at revision 232190.  This was
reverted at revision 232196 as it caused test failures in tests that did not
expect operands to be commuted.  I have made the tests more resilient to
reassociation in revision 232206.

llvm-svn: 232209
2015-03-13 20:53:01 +00:00
Robert Lougher 5e0ea66d59 Revert: "[Reassociate] Add initial support for vector instructions."
This reverts revision 232190 due to buildbot failure reported on clang-hexagon-elf
for test arm64_vtst.c.  To be investigated.

llvm-svn: 232196
2015-03-13 19:20:46 +00:00
Robert Lougher 1bad505c3c [Reassociate] Add initial support for vector instructions.
This patch adds initial support for vector instructions to the reassociation
pass. It enables most parts of the pass to work with vectors but to keep the
size of the patch small, optimization of Xor trees, canonicalization of
negative constants and converting shifts to muls, etc., have been left out.
This will be handled in later patches.

The patch is based on an initial patch by Chad Rosier.

Differential Revision: http://reviews.llvm.org/D7566

llvm-svn: 232190
2015-03-13 18:33:27 +00:00
Mehdi Amini a28d91d81b DataLayout is mandatory, update the API to reflect it with references.
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.

This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.

I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.

I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.

Test Plan:

Reviewers: echristo

Subscribers: llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
2015-03-10 02:37:25 +00:00
Benjamin Kramer 7bd1f7cb58 Remove the remaining uses of abs64 and nuke it.
std::abs works just fine and we're already using it in many places. NFC intended.

llvm-svn: 231696
2015-03-09 20:20:16 +00:00
Benjamin Kramer f044d3f93b Make helper functions static.
Found by -Wmissing-prototypes. NFC.

llvm-svn: 231664
2015-03-09 16:23:46 +00:00
Kevin Qin 715b01e979 Introduce runtime unrolling disable matadata and use it to mark the scalar loop from vectorization.
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and iteration number is relatively large enough.
For some loops, we know they are not worth to be runtime unrolled.
The scalar loop from vectorization is one of the cases.

llvm-svn: 231631
2015-03-09 06:14:18 +00:00
Benjamin Kramer e8a64a20f2 LoopInterchange: Remove empty method.
llvm-svn: 231503
2015-03-06 19:37:26 +00:00
Benjamin Kramer 79442920bf LoopInterchange: Rephrase instruction moving using ilist's splice and factor it into a function
+ Random cleanups. No functional change.

llvm-svn: 231501
2015-03-06 18:59:14 +00:00
Daniel Jasper 6adbd7aecf Change the way in which error case is being handled.
Specifically this:
* Prevents an "unused" warning in non-assert builds.
* In that error case return with out removing a child loop instead of
  looping forever.

llvm-svn: 231459
2015-03-06 10:39:14 +00:00
Karthik Bhat 88db86dd29 Add a new pass "Loop Interchange"
This pass interchanges loops to provide a more cache-friendly memory access.

For e.g. given a loop like -
  for(int i=0;i<N;i++)
    for(int j=0;j<N;j++)
      A[j][i] = A[j][i]+B[j][i];

is interchanged to -
  for(int j=0;j<N;j++)
    for(int i=0;i<N;i++)
      A[j][i] = A[j][i]+B[j][i];

This pass is currently disabled by default.

To give a brief introduction it consists of 3 stages-

LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.

LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.

TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.

A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499

llvm-svn: 231458
2015-03-06 10:11:25 +00:00
Mehdi Amini 46a43556db Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never returns nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
Philip Reames 6da37857d1 [RewriteStatepointsForGC] Fix a relocation bug w.r.t values defined by invoke instructions
RewriteStatepointsForGC pass emits an alloca for each GC pointer which will be relocated. It then inserts stores after def and all relocations, and inserts loads before each use as well. In the end, mem2reg is used to update IR with relocations in SSA form.

However, there is a problem with inserting stores for values defined by invoke instructions. The code didn't expect a def was a terminator instruction, and inserting instructions after these terminators resulted in malformed IR.

This patch fixes this problem by handling invoke instructions as a special case. If the def is an invoke instruction, the store will be inserted at the beginning of the normal destination block. Since return value from invoke instruction does not dominate the unwind destination block, no action is needed there.

Patch by: Chen Li
Differential Revision: http://reviews.llvm.org/D7923

llvm-svn: 231183
2015-03-04 00:13:52 +00:00
David Blaikie 9469072367 RewriteStatepointsForGC::PhiState: Remove explicit copy ctor in favor of the Rule of Zero
The assertion was just checking a class invariant that's pretty easy to
verify by inspection (no mutating operations, and the two non-copy ctors
already ensure the state is maintained) so remove the explicit copy ctor
in favor of the default, thus allowing the use of the default copy
assignment operator without hitting the C++11 deprecation here.

llvm-svn: 231143
2015-03-03 21:49:07 +00:00
David Blaikie 7f1e0565b3 Revert "Remove the explicit SDNodeIterator::operator= in favor of the implicit default"
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.

This reverts commit r231135.

llvm-svn: 231136
2015-03-03 21:18:16 +00:00
David Blaikie bb8da4c08f Remove the explicit SDNodeIterator::operator= in favor of the implicit default
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.

llvm-svn: 231135
2015-03-03 21:17:08 +00:00
Benjamin Kramer 838752d3f6 LoopIdiom: Give globals for memset_pattern16 private linkage.
There's really no reason to have them have entries in the symbol table
anymore. Old versions of ld64 had some bugs in this area but those have
been fixed long ago.

llvm-svn: 231041
2015-03-03 00:17:09 +00:00
Sanjoy Das 2d38031271 Revert some changes that were made to fix PR20680.
This re-lands change r230921.  r230921 was reverted because it broke a
clang test; a checkin fixing the clang test will be commited shortly.

Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533.  SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow.  There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.

Revert "IndVarSimplify: Allow LFTR to fire more often"

This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).

Revert "IndVarSimplify: Don't let LFTR compare against a poison value"

This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).

Reviewers: majnemer, atrick, spatel

Differential Revision: http://reviews.llvm.org/D7979

llvm-svn: 231018
2015-03-02 21:41:07 +00:00
NAKAMURA Takumi 0cd23c842e Revert r230921, "Revert some changes that were made to fix PR20680.", for now.
It caused a failure on clang/test/Misc/backend-optimization-failure.cpp .

llvm-svn: 230929
2015-03-02 01:14:03 +00:00
Sanjoy Das 876bd51486 Revert some changes that were made to fix PR20680.
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533.  SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow.  There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.

Revert "IndVarSimplify: Allow LFTR to fire more often"

This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).

Revert "IndVarSimplify: Don't let LFTR compare against a poison value"

This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).

Reviewers: majnemer, atrick, spatel

Differential Revision: http://reviews.llvm.org/D7979

llvm-svn: 230921
2015-03-01 23:36:26 +00:00
Benjamin Kramer cb570f1bc9 TRE: Just erase dead BBs and tweak the iteration loop not to increment the deleted BB iterator.
Leaving empty blocks around just opens up a can of bugs like PR22704. Deleting
them early also slightly simplifies code.

Thanks to Sanjay for the IR test case.

llvm-svn: 230856
2015-02-28 16:47:27 +00:00
Yaron Keren 42a7adf171 Silence variable set but not used warning, NFC.
llvm-svn: 230848
2015-02-28 13:11:24 +00:00
Benjamin Kramer 4f6ac16292 Replace std::copy with a back inserter with vector append where feasible
All of the cases were just appending from random access iterators to a
vector. Using insert/append can grow the vector to the perfect size
directly and moves the growing out of the loop. No intended functionalty
change.

llvm-svn: 230845
2015-02-28 10:11:12 +00:00
Philip Reames 28e61ce60f [RewriteStatepointsForGC] Reduce indentation via early continue [NFC]
llvm-svn: 230836
2015-02-28 01:57:44 +00:00
Philip Reames 2e5bcbe8d5 [RewriteStatepointsForGC] Fix another order of iteration bug
It turns out the naming of inserted phis and selects is sensative to the order in which two sets are iterated.  We need to nail this down to avoid non-deterministic output and possible test failures.  

The modified test is the one I first noticed something odd in.  The change is making it more strict to report the error.  With the test change, but without the code change, the test fails roughly 1 in 5.  With the code change, I've run ~30 runs without error.

Long term, the right fix here is to adjust the naming scheme.  I'm checking in this hack to avoid any possible non-determinism in the tests over the weekend.  HJust because I only noticed one case doesn't mean it's actually the only case.  I hope to get to the right change Monday.

std->llvm data structure changes bugfix change #3

llvm-svn: 230835
2015-02-28 01:52:09 +00:00
Philip Reames f986d68b36 [RewriteStatepointsForGC] Reduce indentation via early continue [NFC]
llvm-svn: 230829
2015-02-28 00:54:41 +00:00
Philip Reames a226e6115c [RewriteStatepointsForGC] Fix iterator invalidation bug
Inserting into a DenseMap you're iterating over is not well defined.  This is unfortunate since this is well defined on a std::map.

"cleanup per llvm code style standards" bug #2

llvm-svn: 230827
2015-02-28 00:47:50 +00:00
Philip Reames a5aeaf4b4f [RewriteStatepointsForGC] Add tests for the base pointer identification algorithm
These tests cover the 'base object' identification and rewritting portion of RewriteStatepointsForGC.  These aren't completely exhaustive, but they've proven to be reasonable effective over time at finding regressions.

In the process of porting these tests over, I found my first "cleanup per llvm code style standards" bug.  We were relying on the order of iteration when testing the base pointers found for a derived pointer.  When we switched from std::set to DenseSet, this stopped being a safe assumption.  I'm suspecting I'm going to find more of those.  In particular, I'm now really wondering about the main iteration loop for this algorithm.  I need to go take a closer look at the assumptions there.

I'm not really happy with the fact these are testing what is essentially debug output (i.e. enabled via command line flags).  Suggestions for how to structure this better are very welcome.  

llvm-svn: 230818
2015-02-28 00:20:48 +00:00
Sanjay Patel b92e9164d2 remove function names from comments; NFC
llvm-svn: 230766
2015-02-27 17:27:15 +00:00
Sanjoy Das e91665de39 IRCE: only touch loops that have been shown to have a high
backedge-taken count in profiliing data.

llvm-svn: 230619
2015-02-26 08:56:04 +00:00
Sanjoy Das e75ed92630 IRCE: generalize to handle loops with decreasing induction variables.
IRCE can now split the iteration space for loops like:

   for (i = n; i >= 0; i--)
     a[i + k] = 42; // bounds check on access

llvm-svn: 230618
2015-02-26 08:19:31 +00:00
Sanjoy Das 48c75814a5 IRCE: print newline after printing an InductiveRangeCheck.
llvm-svn: 230607
2015-02-26 04:03:31 +00:00
Ramkumar Ramachandra 3408f3e296 PlaceSafepoints: use IRBuilder helpers
Use the IRBuilder helpers for gc.statepoint and gc.result, instead of
coding the construction by hand. Note that the gc.statepoint IRBuilder
handles only CallInst, not InvokeInst; retain that part of hand-coding.

Differential Revision: http://reviews.llvm.org/D7518

llvm-svn: 230591
2015-02-26 00:35:56 +00:00
Sanjay Patel cc29f4f2cb only propagate equality comparisons of FP values that we are certain are non-zero
This is a follow-on to r227491 which tightens the check for propagating FP
values. If a non-constant value happens to be a zero, we would hit the same
bug as before.

Bug noted and patch suggested by Eli Friedman.

llvm-svn: 230564
2015-02-25 22:46:08 +00:00
Sanjay Patel cee38616c8 remove function names from comments; NFC
llvm-svn: 230391
2015-02-24 22:43:06 +00:00
Sanjay Patel 27aa1423d2 add newline for easier reading; NFC
llvm-svn: 230265
2015-02-23 21:32:09 +00:00
David Blaikie 5e5d7840fb Roll condition into an assert then wrap it 'ifndef NDEBUG' to protect from the inevitable "unused variable" warning in a non-asserts build.
llvm-svn: 230181
2015-02-22 20:58:38 +00:00
Hal Finkel 3d4269ab05 [LICM] Refactor to expose functionality as utility functions
This refactors the core functionality of LICM: HoistRegion, SinkRegion and
PromoteAliasSet (renamed to promoteLoopAccessesToScalars) as utility functions
in LoopUtils. This will enable other transformations to make use of them
directly.

Patch by Ashutosh Nema.

llvm-svn: 230178
2015-02-22 18:35:32 +00:00
NAKAMURA Takumi f7d08f6dcc RewriteStatepointsForGC.cpp: Fix for -Asserts to mark isNullConstant() as LLVM_ATTRIBUTE_UNUSED. [-Wunused-function]
llvm-svn: 230169
2015-02-22 09:58:19 +00:00
NAKAMURA Takumi 02aa295a00 RewriteStatepointsForGC.cpp: Fix for -Asserts. [-Wunused-variable]
llvm-svn: 230168
2015-02-22 09:58:13 +00:00
Sanjoy Das 95c476db94 IRCE: generalize InductiveRangeCheck::computeSafeIterationSpace to
work with a non-canonical induction variable.

This is currently a non-functional change because we only ever call
computeSafeIterationSpace on a canonical induction variable; but the
generalization will be useful in a later commit.

llvm-svn: 230151
2015-02-21 22:20:22 +00:00
Sanjoy Das 7fc60da2f5 IRCE: use SCEVs instead of llvm::Value's for intermediate
calculations.  Semantically non-functional change.

This gets rid of some of the SCEV -> Value -> SCEV round tripping and
the Construct(SMin|SMax)Of and MaybeSimplify helper routines.

llvm-svn: 230150
2015-02-21 22:07:32 +00:00
Philip Reames 0b1b387441 [PlaceSafepoints] Adjust enablement logic to default to off and be GC configurable per GC
Previously, this pass ran over every function in the Module if added to the pass order.  With this change, it runs only over those with a GC attribute where the GC explicitly opts in.  A GC can also choose which of entry safepoint polls, backedge safepoint polls, and call safepoints it wants.  I hope to get these exposed as checks on the GCStrategy at some point, but for now, the checks are manual string comparisons.

llvm-svn: 230097
2015-02-21 00:09:09 +00:00
David Blaikie 82ad78771b Remove some unnecessary unreachables in favor of (sometimes implicit) assertions
Also simplify some else-after-return cases including some standard
algorithm convenience/use.

llvm-svn: 230094
2015-02-20 23:44:24 +00:00
Philip Reames 1f3e5c195c Hide a bunch of advanced testing options in default opt --help output
These are internal options.  I need to go through, evaluate which are worth keeping and which not.  Many of them should probably be renamed as well.  Until I have time to do that, we can at least stop poluting the standard opt -help output.

llvm-svn: 230088
2015-02-20 23:32:03 +00:00
Philip Reames 1f017547bb [RewriteStatepointsForGC] Use DenseSet in place of std::set [NFC]
This should be the last cleanup on non-llvm preferred data structures.  I left one use of std::set in an assertion; DenseSet didn't seem to have a tombstone for CallSite defined.  That might be worth fixing, but wasn't worth it for a debug only use.

llvm-svn: 230084
2015-02-20 23:16:52 +00:00
Philip Reames e9c3b9bd46 [RewriteStatepointsForGC] Replace std::map with DenseMap
I'd done the work of extracting the typedef in a previous commit, but didn't actually change it.  Hopefully this will make any subtle changes easier to isolate.

llvm-svn: 230081
2015-02-20 22:48:20 +00:00
Philip Reames d2b664642f [RewriteStatepointsForGC] Cleanup - replace std::vector usage [NFC]
Migrate std::vector usage to a combination of SmallVector and ArrayRef.

llvm-svn: 230079
2015-02-20 22:39:41 +00:00
Philip Reames 860660ea5e [RewriteStatepointsForGC] More style cleanup [NFC]
Use llvm_unreachable where appropriate, use SmallVector where easy to do so, introduce typedefs for planned type migrations.

llvm-svn: 230068
2015-02-20 22:05:18 +00:00
Philip Reames 0a3240f4de [RewriteStatepointsForGC] Remove notion of SafepointBounds [NFC]
The notion of a range of inserted safepoint related code is no longer really applicable.  This survived over from an earlier implementation.  Just saving the inserted gc.statepoint and working from that is far clearer given the current code structure.  Particularly when invokable statepoints get involved.

llvm-svn: 230063
2015-02-20 21:34:11 +00:00
Benjamin Kramer 911d5b3ace LoopRotate: When reconstructing loop simplify form don't split edges from indirectbrs.
Yet another chapter in the endless story. While this looks like we leave
the loop in a non-canonical state this replicates the logic in
LoopSimplify so it doesn't diverge from the canonical form in any way.

PR21968

llvm-svn: 230058
2015-02-20 20:49:25 +00:00
Philip Reames fa2fcf173b [GC, RewriteStatepointsForGC] Style cleanup and bug fix
When doing style cleanup, I noticed a minor bug in this code.  If we have a pointer that we think is unused after a statepoint and thus doesn't need relocation, we store a null pointer into the alloca we're about to promote.  This helps turn a mistake in liveness analysis into an easily debuggable crash.  It turned out this code had never been updated to handle invoke statepoints.  

There's no test for this.  Without a bug in liveness, it appears impossible to make this trigger in a way which is visible in the resulting IR.  We might store the null, but when promoting the alloca, there will be no uses and thus nothing to test against.  Suggestions on how to test are very welcome.

llvm-svn: 230047
2015-02-20 19:51:56 +00:00
Reid Kleckner a070ee5ef5 Use unreachable instead of assert(false) to silence MSVC warning
llvm-svn: 230045
2015-02-20 19:46:02 +00:00
Philip Reames f20413245a [GC] Style cleanup for RewriteStatepointForGC (1 of many) [NFC]
Starting to update variable naming and types to match LLVM style.  This will be an incremental process to minimize the chance of breakage as I work.  Step one, rename member variables to LLVM CamelCase and use llvm's ADT.  Much more to come.

llvm-svn: 230042
2015-02-20 19:26:04 +00:00
Philip Reames 2ef029c7ae Bugfix for 229954
Before calling Function::getGC to test for enablement, we need to make sure there's actually a GC at all via Function::hasGC.  Otherwise, we'd crash on functions without a GC.  Thankfully, this only mattered if you manually scheduled the pass, but still, oops. :(

llvm-svn: 230040
2015-02-20 18:56:14 +00:00
Benjamin Kramer 6f66545ae6 RewriteStatepointsForGC: Move details into anonymous namespaces. NFC.
While there reduce the number of duplicated std::map lookups.

llvm-svn: 230012
2015-02-20 14:00:58 +00:00
Benjamin Kramer d4a3a55564 Wrap recursive function only used in assert in #ifndef NDEBUG.
Avoids unused function warnings in Release builds.

llvm-svn: 230009
2015-02-20 13:15:49 +00:00
Nick Lewycky eb3231eefa Fix build in release mode, four cases of -Wunused-variable.
llvm-svn: 229976
2015-02-20 07:14:02 +00:00
Philip Reames 6faacf4772 Adjust enablement of RewriteStatepointsForGC
When back merging the changes in 229945 I noticed that I forgot to mark the test cases with the appropriate GC.  We want the rewriting to be off by default (even when manually added to the pass order), not on-by default.  To keep the current test working, mark them as using the statepoint-example GC and whitelist that GC.  

Longer term, we need a better selection mechanism here for both actual usage and testing.  As I migrate more tests to the in tree version of this pass, I will probably need to update the enable/disable logic as well. 

llvm-svn: 229954
2015-02-20 02:34:49 +00:00
Philip Reames d16a9b1fdc Add a pass for constructing gc.statepoint sequences w/explicit relocations
This patch consists of a single pass whose only purpose is to visit previous inserted gc.statepoints which do not have gc.relocates inserted yet, and insert them. This can be used either immediately after IR generation to perform 'early safepoint insertion' or late in the pass order to perform 'late insertion'.

This patch is setting the stage for work to continue in tree.  In particular, there are known naming and style violations in the current patch.  I'll try to get those resolved over the next week or so.  As I touch each area to make style changes, I need to make sure we have adequate testing in place.  As part of the cleanup, I will be cleaning up a collection of test cases we have out of tree and submitting them upstream. The tests included in this change are very basic and mostly to provide examples of usage.

The pass has several main subproblems it needs to address:
- First, it has identify any live pointers. In the current code, the use of address spaces to distinguish pointers to GC managed objects is hard coded, but this will become parametrizable in the near future.  Note that the current change doesn't actually contain a useful liveness analysis.  It was seperated into a followup change as the code wasn't ready to be shared.  Instead, the current implementation just considers any dominating def of appropriate pointer type to be live.
- Second, it has to identify base pointers for each live pointer. This is a fairly straight forward data flow algorithm. 
- Third, the information in the previous steps is used to actually introduce rewrites. Rather than trying to do this by hand, we simply re-purpose the code behind Mem2Reg to do this for us.

llvm-svn: 229945
2015-02-20 01:06:44 +00:00
Adam Nemet 3bfd93d789 [LoopAccesses] Create the analysis pass
This is a function pass that runs the analysis on demand.  The analysis
can be initiated by querying the loop access info via LAA::getInfo.  It
either returns the cached info or runs the analysis.

Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now.  The idea is that Loop Distribution won't support run-time stride
checking at least initially.

This means that when querying the analysis, symbolic stride information
can be provided optionally.  Whether stride information is used can
invalidate the cache entry and rerun the analysis.  Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.

Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.

On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass.  A large chunk of the
diff is due to LAI becoming a pointer from a reference.

A test will be added as part of the -analyze patch.

Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229893
2015-02-19 19:15:04 +00:00
Benjamin Kramer 1c2beed7fd LSR: Move set instead of copying. NFC.
llvm-svn: 229871
2015-02-19 17:19:43 +00:00
NAKAMURA Takumi fa520c5f49 Revert r229622: "[LoopAccesses] Make VectorizerParams global" and others. r229622 brought cyclic dependencies between Analysis and Vector.
r229622: "[LoopAccesses] Make VectorizerParams global"
  r229623: "[LoopAccesses] Stash the report from the analysis rather than emitting it"
  r229624: "[LoopAccesses] Cache the result of canVectorizeMemory"
  r229626: "[LoopAccesses] Create the analysis pass"
  r229628: "[LoopAccesses] Change debug messages from LV to LAA"
  r229630: "[LoopAccesses] Add canAnalyzeLoop"
  r229631: "[LoopAccesses] Add missing const to APIs in VectorizationReport"
  r229632: "[LoopAccesses] Split out LoopAccessReport from VectorizerReport"
  r229633: "[LoopAccesses] Add -analyze support"
  r229634: "[LoopAccesses] Change LAA:getInfo to return a constant reference"
  r229638: "Analysis: fix buildbots"

llvm-svn: 229650
2015-02-18 08:34:47 +00:00
Adam Nemet d6b7e29815 [LoopAccesses] Create the analysis pass
This is a function pass that runs the analysis on demand.  The analysis
can be initiated by querying the loop access info via LAA::getInfo.  It
either returns the cached info or runs the analysis.

Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now.  The idea is that Loop Distribution won't support run-time stride
checking at least initially.

This means that when querying the analysis, symbolic stride information
can be provided optionally.  Whether stride information is used can
invalidate the cache entry and rerun the analysis.  Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.

Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.

On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass.  A large chunk of the
diff is due to LAI becoming a pointer from a reference.

A test will be added as part of the -analyze patch.

Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229626
2015-02-18 03:43:24 +00:00
Hal Finkel 4393559621 [BDCE] Don't forget uses of root instructions seen before the instruction itself
When visiting the initial list of "root" instructions (those which must always
be alive), for those that are integer-valued (such as invokes returning an
integer), we mark their bits as (initially) all dead (we might, obviously, find
uses of those bits later, but all bits are assumed dead until proven
otherwise). Don't do so, however, if we're already seen a use of those bits by
another root instruction (such as a store).

Fixes a miscompile of the sanitizer unit tests on x86_64.

Also, add a debug line for visiting the root instructions, and remove a debug
line which tried to print instructions being removed (printing dead
instructions is dangerous, and can sometimes crash).

llvm-svn: 229618
2015-02-18 03:12:28 +00:00
Elena Demikhovsky ef035bb974 Fixed a bug in store sinking.
The problem was in store-sink barrier check.

Store sink barrier should be checked for ModRef (read-write) mode.

http://llvm.org/bugs/show_bug.cgi?id=22613

llvm-svn: 229495
2015-02-17 13:10:05 +00:00
Hal Finkel 2bb61ba2fe [BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.

Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).

The motivation for this is a case like:

int __attribute__((const)) foo(int i);
int bar(int x) {
  x |= (4 & foo(5));
  x |= (8 & foo(3));
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128& foo(4));
  return x >> 4;
}

As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).

I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).

I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark

llvm-svn: 229462
2015-02-17 01:36:59 +00:00
Hal Finkel c64150b8f3 [ADCE] Don't indent inside an anonymous namespace
To be consistent with what clang-format does, don't add extra indentation
inside an anonymous namespace. NFC.

llvm-svn: 229412
2015-02-16 18:08:00 +00:00
James Molloy e32d806b5f [LoopReroll] Relax some assumptions a little.
We won't find a root with index zero in any loop that we are able to reroll.
However, we may find one in a non-rerollable loop, so bail gracefully instead
of failing hard.

llvm-svn: 229406
2015-02-16 17:02:00 +00:00
James Molloy 4c7deb2259 [LoopReroll] Don't crash on dead code
If a PHI has no users, don't crash; bail gracefully. This shouldn't
happen often, but we can make no guarantees that previous passes didn't leave
dead code around.

llvm-svn: 229405
2015-02-16 17:01:52 +00:00
Aaron Ballman f9a1897c72 Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition.
llvm-svn: 229340
2015-02-15 22:54:22 +00:00
Hal Finkel 8626ed2eae [ADCE] Convert another loop for a range-based for
We can use a range-based for for the operands loop too; NFC.

llvm-svn: 229319
2015-02-15 15:51:25 +00:00
Hal Finkel 92fb2d3803 [ADCE] Use inst_range and range-based fors
Convert a few loops to range-based fors; NFC.

llvm-svn: 229318
2015-02-15 15:51:23 +00:00
Hal Finkel c6035cff55 [ADCE] Fix formatting of pointer types
We prefer to put the * with the variable, not with the type; NFC.

llvm-svn: 229317
2015-02-15 15:47:52 +00:00
Hal Finkel 234d8fea7b [ADCE] Fix capitalization of another local variable
Bring another local variable in compliance with our naming conventions, NFC.

llvm-svn: 229316
2015-02-15 15:45:30 +00:00
Hal Finkel 75901293a1 [ADCE] Fix capitalization of some local variables
Bring some local variables in compliance with our naming conventions, NFC.

llvm-svn: 229315
2015-02-15 15:45:28 +00:00
Andrea Di Biagio f54432388f [optnone] Skip pass Constant Hoisting on optnone functions.
Added test CodeGen/X86/constant-hoisting-optnone.ll to verify that
pass Constant Hoisting is not run on optnone functions.

llvm-svn: 229258
2015-02-14 15:11:48 +00:00
Duncan P. N. Exon Smith 2c79ad974c Transforms: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

llvm-svn: 229202
2015-02-14 01:11:29 +00:00
Chandler Carruth 30d69c2e36 [PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.

This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.

The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".

llvm-svn: 229094
2015-02-13 10:01:29 +00:00
Chandler Carruth 1fbc316534 [unroll] Concede defeat and disable the unroll analyzer for now.
The issues with the new unroll analyzer are more fundamental than code
cleanup, algorithm, or data structure changes. I've sent an email to the
original commit thread with details and a proposal for how to redesign
things. I'm disabling this for now so that we don't spend time
debugging issues with it in its current state.

llvm-svn: 229064
2015-02-13 05:31:46 +00:00
Chandler Carruth 6c03dff7cc [unroll] Merge the simplification and DCE estimation methods on the
UnrollAnalyzer.

Now they share a single worklist and have less implicit state between
them. There was no real benefit to separating these two things out.

I'm going to subsequently refactor things to share even more code.

llvm-svn: 229062
2015-02-13 04:39:05 +00:00
Chandler Carruth d9591d8922 [unroll] Remove pointless dyn_cast<>s to Instruction - the users of an
instruction must by definition be instructions.

llvm-svn: 229061
2015-02-13 04:33:21 +00:00
Chandler Carruth 5457e20d27 [unroll] Don't check the loop set for whether an instruction is
contained in it each time we try to add it to the worklist, just check
this when pulling it off the worklist. That way we do it at most once
per instruction with the cost of the worklist set we would need to pay
anyways.

llvm-svn: 229060
2015-02-13 04:30:44 +00:00
Chandler Carruth e5c30e4e10 [unroll] Change the other worklist in the unroll analyzer to be a set
vector.

In addition to dramatically reducing the work required for contrived
example loops, this also has to correct some serious latent bugs in the
cost computation. Previously, we might add an instruction onto the
worklist once for every load which it used and was simplified. Then we
would visit it many times and accumulate "savings" each time.

I mean, fortunately this couldn't matter for things like calls with 100s
of operands, but even for binary operators this code seems like it must
be double counting the savings.

I just noticed this by inspection and due to the runtime problems it can
introduce, I don't have any test cases for cases where the cost produced
by this routine is unacceptable.

llvm-svn: 229059
2015-02-13 04:27:50 +00:00
Chandler Carruth 7824bc9241 [unroll] Replace a boolean, for loop, condition, and break with
std::all_of and a lambda. Much cleaner, no functionality
changed.

llvm-svn: 229058
2015-02-13 04:18:14 +00:00
Chandler Carruth 06d537cdd6 [unroll] Directly query for dead instructions.
In the unroll analyzer, it is checking each user to see if that user
will become dead. However, it first checked if that user was missing
from the simplified values map, and then if was also missing from the
dead instructions set. We add everything from the simplified values map
to the dead instructions set, so the first step is completely subsumed
by the second. Moreover, the first step requires *inserting* something
into the simplified value map which isn't what we want at all.

This also replaces a dyn_cast with a cast as an instruction cannot be
used by a non-instruction.

llvm-svn: 229057
2015-02-13 04:14:05 +00:00
Chandler Carruth 82cb30f10c [unroll] Replace a linear time check for no uses with a constant time
check.

Also hoist this into the enqueue process as it is faster even than
testing the worklist set, we should just directly filter these out much
like we filter out constants and such.

llvm-svn: 229056
2015-02-13 04:06:08 +00:00
Chandler Carruth 3b057b3216 [unroll] Rather than an operand set, use a setvector for the worklist.
We don't just want to handle duplicate operands within an instruction,
but also duplicates across operands of different instructions. I should
have gone straight to this, but I had convinced myself that it wasn't
going to be necessary briefly. I've come to my senses after chatting
more with Nick, and am now happier here.

llvm-svn: 229054
2015-02-13 03:57:40 +00:00
Chandler Carruth 17a0496b5a [unroll] Extract the code to enqueue operansd for the worklist in the
unroll analysis into a lambda and call it. That's much simpler than
duplicating all the code.

llvm-svn: 229053
2015-02-13 03:49:41 +00:00
Chandler Carruth 8c86375a10 [unroll] Use a small set to de-duplicate operands prior to putting them
into the worklist. This avoids allocating lots of worklist memory for
them when there are large numbers of repeated operands.

llvm-svn: 229052
2015-02-13 03:48:38 +00:00
Chandler Carruth 93063e6191 [unroll] Make the unroll cost analysis terminate deterministically and
reasonably quickly.

I don't have a reduced test case, but for a version of FFMPEG, this
makes the loop unroller start finishing at all (after over 15 minutes of
running, it hadn't terminated for me, no idea if it was a true infloop
or just exponential work).

The key thing here is to check the DeadInstructions set when pulling
things off the worklist. Without this, we would re-walk the user list of
already dead instructions again and again and again. Consider phi nodes
with many, many operands and other patterns.

The other important aspect of this is that because we would keep
re-visiting instructions that were already known dead, we kept adding
their cost savings to this! This would cause our cost savings to be
*insanely* inflated from this.

While I was here, I also rotated the operand walk out of the worklist
loop to make the code easier to read. There is still work to be done to
minimize worklist traffic because we don't de-duplicate operands. This
means we may add the same instruction onto the worklist 1000s of times
if it shows up in 1000s of operansd to a PHI node for example.

Still, with this patch, the ffmpeg testcase I have finishes quickly and
I can't measure the runtime impact of the unroll analysis any more. I'll
probably try to do a few more cleanups to this code, but not sure how
much cleanup I can justify right now.

llvm-svn: 229038
2015-02-13 03:40:58 +00:00
Chandler Carruth dd6029fc6e [unroll] Make range based for loops a bit more explicit and more
readable.

The biggest thing that was causing me problems is recognizing the
references vs. poniters here. I also found that for maps naming the loop
variable as KeyValue helps make it obvious why you don't actually use it
directly. Finally, using 'auto' instead of 'User *' doesn't seem like
a good tradeoff. Much like with the other cases, I like to know its
a pointer, and 'User' is just as long and tells the reader a lot more.

llvm-svn: 229033
2015-02-13 02:45:17 +00:00
Chandler Carruth 415f41258f [unroll] Avoid the "Insn" abbreviation of Instruction. This is quite
hard to type and read for me, and is inconsistent with the other
abbreviation in the base class "Inst". For most of these (where they are
used widely) I prefer just spelling it out as Instruction. I've changed
two of the short-lived variables to use "Inst" to match the base class.

llvm-svn: 229028
2015-02-13 02:17:39 +00:00
Chandler Carruth 302a133b1e [unroll] Tidy up the integer we use to accumululate the number of
instructions optimized. NFC, just separating this out from the
functionality changing commit.

llvm-svn: 229026
2015-02-13 02:10:56 +00:00
Chandler Carruth 10a9926ab5 [unroll] Don't use a map from pointer to bool. Use a set.
This is much more efficient. In particular, the query with the user
instruction has to insert a false for every missing instruction into the
set. This is just a cleanup a long the way to fixing the underlying
algorithm problems here.

llvm-svn: 228994
2015-02-13 00:29:39 +00:00
Michael Zolotukhin 1b48019751 Prevent division by 0.
When we try to estimate number of potentially removed instructions in
loop unroller, we analyze first N iterations and then scale the
computed number by TripCount/N. We should bail out early if N is 0.

llvm-svn: 228988
2015-02-13 00:17:03 +00:00
Chandler Carruth 186ad60815 [unroll] Update the new analysis logic from r228265 to use modern coding
conventions for function names consistently. Some were already using
this but not all.

llvm-svn: 228987
2015-02-13 00:00:24 +00:00
James Molloy e805ad95dc [LoopRerolling] Be more forgiving with instruction order.
We can't solve the full subgraph isomorphism problem. But we can
allow obvious cases, where for example two instructions of different
types are out of order. Due to them having different types/opcodes,
there is no ambiguity.

llvm-svn: 228931
2015-02-12 15:54:14 +00:00
Mehdi Amini 9730116bd6 Reassociate: cannot negate a INT_MIN value
Summary:
When trying to canonicalize negative constants out of
multiplication expressions, we need to check that the
constant is not INT_MIN which cannot be negated.

Reviewers: mcrosier

Reviewed By: mcrosier

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7286

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 228872
2015-02-11 19:54:44 +00:00
James Molloy f147359376 [LoopReroll] Introduce the concept of DAGRootSets.
A DAGRootSet models an induction variable being used in a rerollable
loop. For example:

   x[i*3+0] = y1
   x[i*3+1] = y2
   x[i*3+2] = y3

   Base instruction -> i*3
                    +---+----+
                   /    |     \
               ST[y1]  +1     +2  <-- Roots
                        |      |
                      ST[y2] ST[y3]

There may be multiple DAGRootSets, for example:

   x[i*2+0] = ...   (1)
   x[i*2+1] = ...   (1)
   x[i*2+4] = ...   (2)
   x[i*2+5] = ...   (2)
   x[(i+1234)*2+5678] = ... (3)
   x[(i+1234)*2+5679] = ... (3)

This concept is similar to the "Scale" member used previously, but allows
multiple independent sets of roots based off the same induction variable.

llvm-svn: 228821
2015-02-11 09:19:47 +00:00
Zachary Turner 3bd47cee78 Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects.
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.

Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman

llvm-svn: 228798
2015-02-11 03:28:02 +00:00
David Majnemer 7679300d93 EarlyCSE: It isn't safe to CSE across synchronization boundaries
This fixes PR22514.

llvm-svn: 228760
2015-02-10 23:09:43 +00:00
Philip Reames 7e7dc3e9df Adjust how we avoid poll insertion inside the poll function (NFC)
I realized that my early fix for this was overly complicated.  Rather than scatter checks around in a bunch of places, just exit early when we visit the poll function itself.

Thinking about it a bit, the whole inlining mechanism used with gc.safepoint_poll could probably be cleaned up a bit.  Originally, poll insertion was fused with gc relocation rewriting.  It might be worth going back to see if we can simplify the chain of events now that these two are seperated.  As one thought, maybe it makes sense to rewrite calls inside the helper function before inlining it to the many callers.  This would require us to visit the poll function before any other functions though..

llvm-svn: 228634
2015-02-10 00:04:53 +00:00
Adrian Prantl 34e7590e0d Debug info: When updating debug info during SROA, do not emit debug info
for any padding introduced by SROA. In particular, do not emit debug info
for an alloca that represents only the padding introduced by a previous
iteration.

Fixes PR22495.

llvm-svn: 228632
2015-02-09 23:57:22 +00:00
Adrian Prantl 27bd01f71c Debug info: Use DW_OP_bit_piece instead of DW_OP_piece in the
intermediate representation. This
- increases consistency by using the same granularity everywhere
- allows for pieces < 1 byte
- DW_OP_piece didn't actually allow storing an offset.

Part of PR22495.

llvm-svn: 228631
2015-02-09 23:57:15 +00:00
Ramkumar Ramachandra 3edf74fe29 [Statepoint] Improve two asserts, fix some style (NFC)
Summary:
It's important that our users immediately know what gc.safepoint_poll
is. Also fix the style of the declaration of CreateGCStatepoint, in
preparation for another change that will wrap it.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7517

llvm-svn: 228626
2015-02-09 23:02:10 +00:00
Ramkumar Ramachandra 2e4b9e0a37 PlaceSafepoints: modernize gc.result.* -> gc.result
Differential Revision: http://reviews.llvm.org/D7516

llvm-svn: 228625
2015-02-09 23:00:40 +00:00
Philip Reames d4a912fefd Update file comment to clarify points highlighted in review (NFC)
llvm-svn: 228621
2015-02-09 22:44:03 +00:00
Philip Reames a29de87ea4 Use range for loops in PlaceSafepoints (NFC)
llvm-svn: 228620
2015-02-09 22:26:11 +00:00
Philip Reames b1ed02f728 Add basic tests for PlaceSafepoints
This is just adding really simple tests which should have been part of the original submission.  When doing so, I discovered that I'd mistakenly removed required pieces when preparing the patch for upstream submission.  I fixed two such bugs in this submission.

llvm-svn: 228610
2015-02-09 21:48:05 +00:00
Benjamin Kramer f094d77de8 LoopIdiom: Use utility functions.
The only difference between deleteIfDeadInstruction and
RecursivelyDeleteTriviallyDeadInstructions is that the former also
manually invalidates SCEV. That's unnecessary because SCEV automatically
gets informed when an instruction is deleted via a ValueHandle. NFC.

llvm-svn: 228508
2015-02-07 21:37:08 +00:00
Bjorn Steinbrink 71bf3b800a Properly update AA metadata when performing call slot optimization
Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7482

llvm-svn: 228500
2015-02-07 17:54:36 +00:00
Michael Zolotukhin 7af83c1f39 Use estimated number of optimized insns in unroll-threshold computation.
If complete-unroll could help us to optimize away N% of instructions, we
might want to do this even if the final size would exceed loop-unroll
threshold. However, we don't want to unroll huge loop, and we are add
AbsoluteThreshold to avoid that - this threshold will never be crossed,
even if we expect to optimize 99% instructions after that.

llvm-svn: 228434
2015-02-06 20:20:40 +00:00
Michael Zolotukhin 4e8598eee3 [InstSimplify] Add SimplifyFPBinOp function.
It is a variation of SimplifyBinOp, but it takes into account
FastMathFlags.

It is needed in inliner and loop-unroller to accurately predict the
transformation's outcome (previously we dropped the flags and were too
conservative in some cases).

Example:
float foo(float *a, float b) {
 float r;
 if (a[1] * b)
   r = /* a lot of expensive computations */;
 else
   r = 1;
 return r;
}
float boo(float *a) {
 return foo(a, 0.0);
}

Without this patch, we don't inline 'foo' into 'boo'.

llvm-svn: 228432
2015-02-06 20:02:51 +00:00
Benjamin Kramer 970eac40bf Make helper functions/classes/globals static. NFC.
llvm-svn: 228410
2015-02-06 17:51:54 +00:00
Benjamin Kramer 39f76acb5c IRCE: Demote template to ArrayRef and SmallVector to array.
NFC.

llvm-svn: 228398
2015-02-06 14:43:49 +00:00
Aaron Ballman 94d4d33a38 Removing an unused variable warning I accidentally introduced with my last warning fix; NFC.
llvm-svn: 228295
2015-02-05 13:52:42 +00:00
Aaron Ballman 1b072b340b Silencing an MSVC warning about a switch statement with no cases; NFC.
llvm-svn: 228294
2015-02-05 13:40:04 +00:00
Michael Zolotukhin a9aadd2903 Implement new heuristic for complete loop unrolling.
Complete loop unrolling can make some loads constant, thus enabling a
lot of other optimizations. To catch such cases, we look for loads that
might become constants and estimate number of instructions that would be
simplified or become dead after substitution.

Example:
Suppose we have:
int a[] = {0, 1, 0};
v = 0;
for (i = 0; i < 3; i ++)
  v += b[i]*a[i];

If we completely unroll the loop, we would get:
v = b[0]*a[0] + b[1]*a[1] + b[2]*a[2]

Which then will be simplified to:
v = b[0]* 0 + b[1]* 1 + b[2]* 0

And finally:
v = b[1]

llvm-svn: 228265
2015-02-05 02:34:00 +00:00
Tom Stellard 080209d573 StructurizeCFG: Remove obsolete fix for loop backedge detection
This is no longer needed now that we are using a reverse post-order
traversal.

llvm-svn: 228187
2015-02-04 20:49:47 +00:00
Tom Stellard 071ec90b68 StructurizeCFG: Use a reverse post-order traversal
We were previously doing a post-order traversal and operating on the
list in reverse, however this would occasionaly cause backedges for
loops to be visited before some of the other blocks in the loop.

We know use a reverse post-order traversal, which avoids this issue.

The reverse post-order traversal is not completely ideal, so we need
to manually fixup the list to ensure that inner loop backedges are
visited before outer loop backedges.

llvm-svn: 228186
2015-02-04 20:49:44 +00:00
Aaron Ballman 34c325e749 Fixing a -Wsign-compare warning; NFC
llvm-svn: 228142
2015-02-04 14:01:08 +00:00
Philip Reames 72634d6af0 Fix a warning in non-asserts builds
llvm-svn: 228114
2015-02-04 05:11:20 +00:00
Philip Reames 5a9685dba6 Clang format of a file introduced in 228090 (NFC)
llvm-svn: 228091
2015-02-04 00:39:57 +00:00
Philip Reames 47cc673e1f Add a pass for inserting safepoints into (nearly) arbitrary IR
This pass is responsible for figuring out where to place call safepoints and safepoint polls. It doesn't actually make the relocations explicit; that's the job of the RewriteStatepointsForGC pass (http://reviews.llvm.org/D6975).

Note that this code is not yet finalized.  Its moving in tree for incremental development, but further cleanup is needed and will happen over the next few days.  It is not yet part of the standard pass order.  

Planned changes in the near future:
 - I plan on restructuring the statepoint rewrite to use the functions add to the IRBuilder a while back. 
 - In the current pass, the function "gc.safepoint_poll" is treated specially but is not an intrinsic. I plan to make identifying the poll function a property of the GCStrategy at some point in the near future.
 - As follow on patches, I will be separating a collection of test cases we have out of tree and submitting them upstream. 
 - It's not explicit in the code, but these two patches are introducing a new state for a statepoint which looks a lot like a patchpoint. There's no a transient form which doesn't yet have the relocations explicitly represented, but does prevent reordering of memory operations. Once this is in, I need to update actually make this explicit by reserving the 'unused' argument of the statepoint as a flag, updating the docs, and making the code explicitly check for such a thing. This wasn't really planned, but once I split the two passes - which was done for other reasons - the intermediate state fell out. Just reminds us once again that we need to merge statepoints and patchpoints at some point in the not that distant future.

Future directions planned:
 - Identifying more cases where a backedge safepoint isn't required to ensure timely execution of a safepoint poll.
 - Tweaking the insertion process to generate easier to optimize IR. (For example, investigating making SplitBackedge) the default.
 - Adding opt-in flags for a GCStrategy to use this pass. Once done, add this pass to the actual pass ordering.

Differential Revision: http://reviews.llvm.org/D6981

llvm-svn: 228090
2015-02-04 00:37:33 +00:00
Daniel Berlin 487aed0d77 Allow PRE to insert no-cost phi nodes
llvm-svn: 228024
2015-02-03 20:37:08 +00:00
Jingyue Wu d7966ff3b9 Add straight-line strength reduction to LLVM
Summary:
Straight-line strength reduction (SLSR) is implemented in GCC but not yet in
LLVM. It has proven to effectively simplify statements derived from an unrolled
loop, and can potentially benefit many other cases too. For example,

LLVM unrolls

  #pragma unroll
  foo (int i = 0; i < 3; ++i) {
    sum += foo((b + i) * s);
  }

into

  sum += foo(b * s);
  sum += foo((b + 1) * s);
  sum += foo((b + 2) * s);

However, no optimizations yet reduce the internal redundancy of the three
expressions:

  b * s
  (b + 1) * s
  (b + 2) * s

With SLSR, LLVM can optimize these three expressions into:

  t1 = b * s
  t2 = t1 + s
  t3 = t2 + s

This commit is only an initial step towards implementing a series of such
optimizations. I will implement more (see TODO in the file commentary) in the
near future. This optimization is enabled for the NVPTX backend for now.
However, I am more than happy to push it to the standard optimization pipeline
after more thorough performance tests.

Test Plan: test/StraightLineStrengthReduce/slsr.ll

Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick

Reviewed By: jholewinski, atrick

Subscribers: karthikthecool, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7310

llvm-svn: 228016
2015-02-03 19:37:06 +00:00
Jingyue Wu 49a766e468 Resurrect the assertion removed by r227717
Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*.

Test Plan: no regression

Reviewers: mkuper

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7327

llvm-svn: 227853
2015-02-02 20:41:11 +00:00
Chandler Carruth 21fc195c13 [multiversion] Kill FunctionTargetTransformInfo, TTI itself is now
per-function and supports the exact desired interface.

llvm-svn: 227743
2015-02-01 14:37:03 +00:00
Benjamin Kramer 6ab86b1bb6 EarlyCSE: Replace custom hash mixing with Hashing.h
Brings it in line with the other hashes in EarlyCSE.

llvm-svn: 227733
2015-02-01 12:30:59 +00:00
Chandler Carruth fdb9c573f7 [multiversion] Thread a function argument through all the callers of the
getTTI method used to get an actual TTI object.

No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.

The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.

llvm-svn: 227730
2015-02-01 12:01:35 +00:00
Chandler Carruth fdffd87d68 [PM] Port SimplifyCFG to the new pass manager.
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.

llvm-svn: 227726
2015-02-01 11:34:21 +00:00
Chandler Carruth e8c686aa86 [PM] Port EarlyCSE to the new pass manager.
I've added RUN lines both to the basic test for EarlyCSE and the
target-specific test, as this serves as a nice test that the TTI layer
in the new pass manager is in fact working well.

llvm-svn: 227725
2015-02-01 10:51:23 +00:00
Jingyue Wu 6c26bb63fe [SeparateConstOffsetFromGEP] skip optnone functions
llvm-svn: 227705
2015-02-01 02:34:41 +00:00
Jingyue Wu 6e091c8eab [SeparateConstOffsetFromGEP] set PreservesCFG flag
SeparateConstOffsetFromGEP does not change the shape of the control flow graph.

llvm-svn: 227704
2015-02-01 02:33:02 +00:00
Jingyue Wu 0220df0dfd [NVPTX] Emit .pragma "nounroll" for loops marked with nounroll
Summary:
CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA
driver from unrolling a loop marked with llvm.loop.unroll.disable is not
unrolled by CUDA driver, we need to emit .pragma "nounroll" at the
header of that loop.

This patch also extracts getting unroll metadata from loop ID metadata
into a shared helper function.

Test Plan: test/CodeGen/NVPTX/nounroll.ll

Reviewers: eliben, meheff, jholewinski

Reviewed By: jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7041

llvm-svn: 227703
2015-02-01 02:27:45 +00:00
Adrian Prantl 152ac396db Fix PR22393. When recursively replacing an aggregate with a smaller
aggregate or scalar, the debug info needs to refer to the absolute offset
(relative to the entire variable) instead of storing the offset inside
the smaller aggregate.

llvm-svn: 227702
2015-02-01 00:58:04 +00:00
Chandler Carruth 705b185f90 [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

llvm-svn: 227669
2015-01-31 03:43:40 +00:00
James Molloy 64419d414b [LoopReroll] Alter the data structures used during reroll validation.
The validation algorithm used an incremental approach, building each
iteration's data structures temporarily, validating them, then
adding them to a global set.

This does not scale well to having multiple sets of Root nodes, as the
set of instructions used in each iteration is the union over all
the root nodes. Therefore, refactor the logic to create a single, simple
container to which later logic then refers. This makes it simpler
control-flow wise to make the creation of the container more complex with
the addition of multiple root sets.

llvm-svn: 227499
2015-01-29 21:52:03 +00:00
Sanjay Patel 4f07a56958 [GVN] don't propagate equality comparisons of FP zero (PR22376)
In http://reviews.llvm.org/D6911, we allowed GVN to propagate FP equalities
to allow some simple value range optimizations. But that introduced a bug
when comparing to -0.0 or 0.0: these compare equal even though they are not
bitwise identical.

This patch disallows propagating zero constants in equality comparisons. 
Fixes: http://llvm.org/bugs/show_bug.cgi?id=22376

Differential Revision: http://reviews.llvm.org/D7257

llvm-svn: 227491
2015-01-29 20:51:49 +00:00
James Molloy 5f255eb48f [LoopReroll] Refactor most of reroll() into a helper class
reroll() was slightly monolithic and a pain to modify. Refactor
a bunch of its state from local variables to member variables
of a helper class, and do some trivial simplification while we're
there.

llvm-svn: 227439
2015-01-29 13:48:05 +00:00
Philip Reames 9198b33b48 Teach SplitBlockPredecessors how to handle landingpad blocks.
Patch by: Igor Laevsky <igor@azulsystems.com>

"Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors."

Differential Revision: http://reviews.llvm.org/D7157

llvm-svn: 227390
2015-01-28 23:06:47 +00:00
Chandler Carruth b81dfa6378 [LPM] Stop using the string based preservation API. It is an
abomination.

For starters, this API is incredibly slow. In order to lookup the name
of a pass it must take a memory fence to acquire a pointer to the
managed static pass registry, and then potentially acquire locks while
it consults this registry for information about what passes exist by
that name. This stops the world of LLVMs in your process no matter
how little they cared about the result.

To make this more joyful, you'll note that we are preserving many passes
which *do not exist* any more, or are not even analyses which one might
wish to have be preserved. This means we do all the work only to say
"nope" with no error to the user.

String-based APIs are a *bad idea*. String-based APIs that cannot
produce any meaningful error are an even worse idea. =/

I have a patch that simply removes this API completely, but I'm hesitant
to commit it as I don't really want to perniciously break out-of-tree
users of the old pass manager. I'd rather they just have to migrate to
the new one at some point. If others disagree and would like me to kill
it with fire, just say the word. =]

llvm-svn: 227294
2015-01-28 04:57:56 +00:00
Sanjoy Das dcf2651043 Teach IRCE to look at branch weights when recognizing range checks
Splitting a loop to make range checks redundant is profitable only if
the range check "never" fails. Make this fact a part of recognizing a
range check -- a branch is a range check only if it is expected to
pass (via branch_weights metadata).

Differential Revision: http://reviews.llvm.org/D7192

llvm-svn: 227249
2015-01-27 21:38:12 +00:00
Eric Christopher e38c8d4aa9 Migrate SeparateConstOffsetFromGEP to use a Function with
getSubtarget.

llvm-svn: 227172
2015-01-27 07:16:37 +00:00
David Majnemer 4c82daea60 LoopRotate: Don't walk the uses of a Constant
LoopRotate wanted to avoid live range interference by looking at the
uses of a Value in the loop latch and seeing if any lied outside of the
loop.  We would wrongly perform this operation on Constants.

This fixes PR22337.

llvm-svn: 227171
2015-01-27 06:21:43 +00:00
Chandler Carruth d649c0ad56 [PM] Refactor the core logic to run EarlyCSE over a function into an
object that manages a single run of this pass.

This was already essentially how it worked. Within the run function, it
would point members at *stack local* allocations that were only live for
a single run. Instead, it seems much cleaner to have a utility object
whose lifetime is clearly bounded by the run of the pass over the
function and can use member variables in a more direct way.

This also makes it easy to plumb the analyses used into it from the pass
and will make it re-usable with the new pass manager.

No functionality changed here, its just a refactoring.

llvm-svn: 227162
2015-01-27 01:34:14 +00:00
Chad Rosier f9327d6fe9 Commoning of target specific load/store intrinsics in Early CSE.
Phabricator revision: http://reviews.llvm.org/D7121
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!

llvm-svn: 227149
2015-01-26 22:51:15 +00:00
Chandler Carruth 9dea5cdb8e [PM] General doxygen and comment cleanup for this pass.
llvm-svn: 227001
2015-01-24 11:44:32 +00:00
Chandler Carruth 7253bba458 [PM] Reformat this code with clang-format so that I can use clang-format
when refactoring for the new pass manager without introducing too many
formatting changes into meaning full diffs.

llvm-svn: 227000
2015-01-24 11:33:55 +00:00
Chandler Carruth 43e590e51f [PM] Port LowerExpectIntrinsic to the new pass manager.
This just lifts the logic into a static helper function, sinks the
legacy pass to be a trivial wrapper of that helper fuction, and adds
a trivial wrapper for the new PM as well. Not much to see here.

I switched a test case to run in both modes, but we have to strip the
dead prototypes separately as that pass isn't in the new pass manager
(yet).

llvm-svn: 226999
2015-01-24 11:13:02 +00:00
Chandler Carruth c3bf5bd8cf [PM] Change LowerExpectIntrinsic to actually return true when it has
changed the IR. This is particularly easy as we can just look for the
existence of any expect intrinsic at all to know whether we've changed
the IR.

llvm-svn: 226998
2015-01-24 11:12:57 +00:00
Chandler Carruth 6eb60eb5c9 [PM] Use a more appropriate name for the statistics variable in
lower-expect, as we don't have 'if's in the IR and we use it for
switches as well.

llvm-svn: 226997
2015-01-24 10:57:25 +00:00
Chandler Carruth d12741e0a9 [PM] Switch tihs code to use a range based for loop over the function.
We can't switch the loop over the instructions because it needs to
early-increment the iterator.

llvm-svn: 226996
2015-01-24 10:57:19 +00:00
Chandler Carruth 3f5e7b1fb6 [PM] Use a SmallVector instead of std::vector to avoid heap allocations
for small switches, and avoid using a complex loop to set up the
weights.

We know what the baseline weights will be so we can just resize the
vector to contain all that value and clobber the one slot that is
likely. This seems much more direct than the previous code that tested
at every iteration, and started off by zeroing the vector.

llvm-svn: 226995
2015-01-24 10:47:13 +00:00
Chandler Carruth 0012c778a4 [PM] Pull the two helpers for this pass into static functions. There are
no members for them to use.

Also, make them accept references as there is no possibility of a null
pointer.

llvm-svn: 226994
2015-01-24 10:39:24 +00:00
Chandler Carruth 579c5c45c2 [PM] Add a basic doxygen comment for this pass.
llvm-svn: 226993
2015-01-24 10:32:53 +00:00
Chandler Carruth 0ea746bf9f [PM] Clean up the formatting of the LowerExpectIntrinsic pass prior to
refactoring its code.

llvm-svn: 226992
2015-01-24 10:30:14 +00:00
Chandler Carruth 72793727cc [PM] Move the LowerExpectIntrinsic pass to the Scalar library.
It was already in the Scalar header and referenced extensively as being
in this library, the source file was just in the utils directory for
some reason. No actual functionality changed. I noticed as it didn't
make sense to add a pass header to the utils headers.

llvm-svn: 226991
2015-01-24 10:18:47 +00:00
Sanjoy Das 351db05308 [NFC] Introduce a 'struct Range' for IRCE
Use the struct instead of a std::pair<Value *, Value *>.  This makes a
Range an obviously immutable object, and we can now assert that a
range is well-typed (Begin->getType() == End->getType()) on its
construction.

llvm-svn: 226804
2015-01-22 09:32:02 +00:00
Sanjoy Das d1fb13ce4c Fix crashes in IRCE caused by mismatched types
There are places where the inductive range check elimination pass
depends on two llvm::Values or llvm::SCEVs to be of the same
llvm::Type when they do not need to be. This patch relaxes those
restrictions (by bailing out of the optimization if the types
mismatch), and adds test cases to trigger those paths.

These issues were found by bootstrapping clang with IRCE running in
the -O3 pass ordering.

Differential Revision: http://reviews.llvm.org/D7082

llvm-svn: 226793
2015-01-22 08:29:18 +00:00
Adrian Prantl 565cc18d8f Reapply: Teach SROA how to update debug info for fragmented variables.
This reapplies r225379.

ChangeLog:
- The assertion that this commit previously ran into about the inability
  to handle indirect variables has since been removed and the backend
  can handle this now.
- Testcases were upgrade to the new MDLocation format.
- Instead of keeping a DebugDeclares map, we now use
  llvm::FindAllocaDbgDeclare().

Original commit message follows.

Debug info: Teach SROA how to update debug info for fragmented variables.
This allows us to generate debug info for extremely advanced code such as

 typedef struct { long int a; int b;} S;

 int foo(S s) {
   return s.b;
 }

which at -O1 on x86_64 is codegen'd into

 define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
   ret i32 %s.coerce1, !dbg !24
 }

with this patch we emit the following debug info for this

 TAG_formal_parameter [3]
   AT_location( 0x00000000
                0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
                0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
                AT_name( "s" )
                AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )

Thanks to chandlerc, dblaikie, and echristo for their feedback on all
previous iterations of this patch!

llvm-svn: 226598
2015-01-20 19:42:22 +00:00
Chandler Carruth d450056c78 [PM] Replace the Pass argument to SplitEdge with specific analyses used
and updated.

This may appear to remove handling for things like alias analysis when
splitting critical edges here, but in fact no callers of SplitEdge
relied on this. Similarly, all of them wanted to preserve LCSSA if there
was any update of the loop info. That makes the interface much simpler.

With this, all of BasicBlockUtils.h is free of Pass arguments and
prepared for the new pass manager. This is tho majority of utilities
that relied on pass arguments.

llvm-svn: 226459
2015-01-19 12:36:53 +00:00
Chandler Carruth f8753fc48d [PM] Cleanup a dead option to critical edge splitting that I noticed
while refactoring this API for the new pass manager.

No functionality changed here, the code didn't actually support this
option.

llvm-svn: 226457
2015-01-19 12:12:00 +00:00
Chandler Carruth 37df2cfbf8 [PM] Remove the Pass argument from all of the critical edge splitting
APIs and replace it and numerous booleans with an option struct.

The critical edge splitting API has a really large surface of flags and
so it seems worth burning a small option struct / builder. This struct
can be constructed with the various preserved analyses and then flags
can be flipped in a builder style.

The various users are now responsible for directly passing along their
analysis information. This should be enough for the critical edge
splitting to work cleanly with the new pass manager as well.

This API is still pretty crufty and could be cleaned up a lot, but I've
focused on this change just threading an option struct rather than
a pass through the API.

llvm-svn: 226456
2015-01-19 12:09:11 +00:00
Chandler Carruth 0eae112009 [PM] Lift the analyses into the interface for
SplitLandingPadPredecessors and remove the Pass argument from its
interface.

Another step to the utilities being usable with both old and new pass
managers.

llvm-svn: 226426
2015-01-19 03:03:39 +00:00
Chandler Carruth b5797b659f [PM] Pull the analyses used for another utility routine into its API
rather than relying on the pass object.

This one is a bit annoying, but will pay off. First, supporting this one
will make the next one much easier, and for utilities like LoopSimplify,
this is moving them (slowly) closer to not having to pass the pass
object around throughout their APIs.

llvm-svn: 226396
2015-01-18 09:21:15 +00:00
Chandler Carruth 32c52c7e04 [PM] Sink the specific analyses preserved by SplitBlock into its
interface, removing Pass from its interface.

This also makes those analyses optional so that passes which don't even
preserve these (or use them) can skip the logic entirely.

llvm-svn: 226394
2015-01-18 02:39:37 +00:00
Chandler Carruth b5c115357c [PM] Replace another Pass argument with specific analyses that are
optionally updated by MergeBlockIntoPredecessors.

No functionality changed, just refactoring to clear the way for the new
pass manager.

llvm-svn: 226392
2015-01-18 02:11:23 +00:00
Chandler Carruth 94209094a5 [PM] Refactor how the LoopRotation pass access the DominatorTree.
Instead of querying the pass every where we need to, do that once and
cache a pointer in the pass object. This is both simpler and I'm about
to add yet another place where we need to dig out that pointer.

llvm-svn: 226391
2015-01-18 02:08:05 +00:00
Chandler Carruth 691addc25f [PM] Now that LoopInfo isn't in the Pass type hierarchy, it is much
cleaner to derive from the generic base.

Thise removes a ton of boiler plate code and somewhat strange and
pointless indirections. It also remove a bunch of the previously needed
friend declarations. To fully remove these, I also lifted the verify
logic into the generic LoopInfoBase, which seems good anyways -- it is
generic and useful logic even for the machine side.

llvm-svn: 226385
2015-01-18 01:25:51 +00:00
Chandler Carruth 4f8f307c77 [PM] Split the LoopInfo object apart from the legacy pass, creating
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.

This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.

llvm-svn: 226373
2015-01-17 14:16:18 +00:00
Mehdi Amini 590a2700fc Fix Reassociate handling of constant in presence of undef float
http://reviews.llvm.org/D6993

llvm-svn: 226245
2015-01-16 03:00:58 +00:00
Sanjoy Das a1837a342d Add a new pass "inductive range check elimination"
IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I will love
to get feedback and general input from people interested in trying
this out.

This pass was originally r226201.  It was reverted because it used C++
features not supported by MSVC 2012.

Differential Revision: http://reviews.llvm.org/D6693

llvm-svn: 226238
2015-01-16 01:03:22 +00:00
Sanjoy Das 7f62ac8e4d Revert r226201 (Add a new pass "inductive range check elimination")
The change used C++11 features not supported by MSVC 2012.  I will fix
the change to use things supported MSVC 2012 and recommit shortly.

llvm-svn: 226216
2015-01-15 22:18:10 +00:00
David Majnemer f1f72c9e43 InductiveRangeCheckElimination: Remove extra ';'
This silences a GCC warning.

llvm-svn: 226215
2015-01-15 21:55:16 +00:00
Sanjoy Das 7059e2959d Add a new pass "inductive range check elimination"
IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I will love
to get feedback and general input from people interested in trying
this out.

Differential Revision: http://reviews.llvm.org/D6693

llvm-svn: 226201
2015-01-15 20:45:46 +00:00
Chandler Carruth b98f63dbdb [PM] Separate the TargetLibraryInfo object from the immutable pass.
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.

Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.

llvm-svn: 226157
2015-01-15 10:41:28 +00:00
NAKAMURA Takumi 24ebfcb619 Update libdeps since TLI was moved from Target to Analysis in r226078.
llvm-svn: 226126
2015-01-15 05:21:00 +00:00
Chandler Carruth 62d4215baa [PM] Move TargetLibraryInfo into the Analysis library.
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.

This is in preparation for porting this analysis to the new pass
manager.

No functionality changed, and updates inbound for Clang and Polly.

llvm-svn: 226078
2015-01-15 02:16:27 +00:00
Ramkumar Ramachandra 40c3e03e27 Standardize {pred,succ,use,user}_empty()
The functions {pred,succ,use,user}_{begin,end} exist, but many users
have to check *_begin() with *_end() by hand to determine if the
BasicBlock or User is empty. Fix this with a standard *_empty(),
demonstrating a few usecases.

llvm-svn: 225760
2015-01-13 03:46:47 +00:00
Sanjay Patel db8e6f472e fix typo; NFC
llvm-svn: 225753
2015-01-13 01:51:52 +00:00
Sanjay Patel 06d5589a84 80-cols; NFC
llvm-svn: 225700
2015-01-12 21:21:28 +00:00
Sanjay Patel 5f1d9eaad3 GVN: propagate equalities for floating point compares
Allow optimizations based on FP comparison values in the same way
as integers. 

This resolves PR17713:
http://llvm.org/bugs/show_bug.cgi?id=17713

Differential Revision: http://reviews.llvm.org/D6911

llvm-svn: 225660
2015-01-12 19:29:48 +00:00
Hal Finkel 38dd590861 [LoopUnroll] Fix the partial unrolling threshold for small loop sizes
When we compute the size of a loop, we include the branch on the backedge and
the comparison feeding the conditional branch. Under normal circumstances,
these don't get replicated with the rest of the loop body when we unroll. This
led to the somewhat surprising behavior that really small loops would not get
unrolled enough -- they could be unrolled more and the resulting loop would be
below the threshold, because we were assuming they'd take
(LoopSize * UnrollingFactor) instructions after unrolling, instead of
(((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation.

llvm-svn: 225565
2015-01-10 00:30:55 +00:00
Tim Northover eb16112e97 Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before
doing Load PRE"

It's not really expected to stick around, last time it provoked a weird LTO
build failure that I can't reproduce now, and the bot logs are long gone. I'll
re-revert it if the failures recur.

Original description: Perform Scalar PRE on gep indices that feed loads before
doing Load PRE.

llvm-svn: 225536
2015-01-09 19:19:56 +00:00
Philip Reames 567feb98f0 [Refactor] Have getNonLocalPointerDependency take the query instruction
Previously, MemoryDependenceAnalysis::getNonLocalPointerDependency was taking a list of properties about the instruction being queried. Since I'm about to need one more property to be passed down through the infrastructure - I need to know a query instruction is non-volatile in an inner helper - fix the interface once and for all.

I also added some assertions and behaviour clarifications around volatile and ordered field accesses. At the moment, this is mostly to document expected behaviour. The only non-standard instructions which can currently reach this are atomic, but unordered, loads and stores. Neither ordered or volatile accesses can reach here.

The call in GVN is protected by an isSimple check when it first considers the load. The calls in MemDepPrinter are protected by isUnordered checks. Both utilities also check isVolatile for loads and stores.

llvm-svn: 225481
2015-01-09 00:04:22 +00:00
Adrian Prantl 2561bb8831 Revert "Reapply: Teach SROA how to update debug info for fragmented variables."
This reverts commit r225379 while investigating an assertion failure reported
by Alexey.

llvm-svn: 225424
2015-01-08 02:02:00 +00:00
Adrian Prantl 72b8ee708f Reapply: Teach SROA how to update debug info for fragmented variables.
The two buildbot failures were addressed in LLVM r225378 and CFE r225359.

This rapplies commit 225272 without modifications.

llvm-svn: 225379
2015-01-07 20:52:22 +00:00
Adrian Prantl 52f943b536 Revert "Reapply: Teach SROA how to update debug info for fragmented variables."
because of a tsan buildbot failure.
This reverts commit 225272.

Fix should be coming soon.

llvm-svn: 225288
2015-01-06 19:47:27 +00:00
Adrian Prantl 8335a5724a Reapply: Teach SROA how to update debug info for fragmented variables.
This also rolls in the changes discussed in http://reviews.llvm.org/D6766.
Defers migrating the debug info for new allocas until after all partitions
are created.

Thanks to Chandler for reviewing!

llvm-svn: 225272
2015-01-06 17:14:10 +00:00
Chandler Carruth 73b0164fe5 [SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an
assert out of the new pre-splitting in SROA.

This fix makes the code do what was originally intended -- when we have
a store of a load both dealing in the same alloca, we force them to both
be pre-split with identical offsets. This is really quite hard to do
because we can keep discovering problems as we go along. We have to
track every load over the current alloca which for any resaon becomes
invalid for pre-splitting, and go back to remove all stores of those
loads. I've included a couple of test cases derived from PR22093 that
cover the different ways this can happen. While that PR only really
triggered the first of these two, its the same fundamental issue.

The other challenge here is documented in a FIXME now. We end up being
quite a bit more aggressive for pre-splitting when loads and stores
don't refer to the same alloca. This aggressiveness comes at the cost of
introducing potentially redundant loads. It isn't clear that this is the
right balance. It might be considerably better to require that we only
do pre-splitting when we can presplit every load and store involved in
the entire operation. That would give more consistent if conservative
results. Unfortunately, it requires a non-trivial change to the actual
pre-splitting operation in order to correctly handle cases where we end
up pre-splitting stores out-of-order. And it isn't 100% clear that this
is the right direction, although I'm starting to suspect that it is.

llvm-svn: 225149
2015-01-05 04:17:53 +00:00
Chandler Carruth 66b3130cda [PM] Split the AssumptionTracker immutable pass into two separate APIs:
a cache of assumptions for a single function, and an immutable pass that
manages those caches.

The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating an separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.

Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.

For now, this adds boiler plate to the various passes, but this kind of
boiler plate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.

llvm-svn: 225131
2015-01-04 12:03:27 +00:00
Chandler Carruth 24ac830d7c [SROA] Teach SROA to be more aggressive in splitting now that we have
a pre-splitting pass over loads and stores.

Historically, splitting could cause enough problems that I hamstrung the
entire process with a requirement that splittable integer loads and
stores must cover the entire alloca. All smaller loads and stores were
unsplittable to prevent chaos from ensuing. With the new pre-splitting
logic that does load/store pair splitting I introduced in r225061, we
can now very nicely handle arbitrarily splittable loads and stores. In
order to fully benefit from these smarts, we need to mark all of the
integer loads and stores as splittable.

However, we don't actually want to rewrite partitions with all integer
loads and stores marked as splittable. This will fail to extract scalar
integers from aggregates, which is kind of the point of SROA. =] In
order to resolve this, what we really want to do is only do
pre-splitting on the alloca slices with integer loads and stores fully
splittable. This allows us to uncover all non-integer uses of the alloca
that would benefit from a split in an integer load or store (and where
introducing the split is safe because it is just memory transfer from
a load to a store). Once done, we make all the non-whole-alloca integer
loads and stores unsplittable just as they have historically been,
repartition and rewrite.

The result is that when there are integer loads and stores anywhere
within an alloca (such as from a memcpy of a sub-object of a larger
object), we can split them up if there are non-integer components to the
aggregate hiding beneath. I've added the challenging test cases to
demonstrate how this is able to promote to scalars even a case where we
have even *partially* overlapping loads and stores.

This restores the single-store behavior for small arrays of i8s which is
really nice. I've restored both the little endian testing and big endian
testing for these exactly as they were prior to r225061. It also forced
me to be more aggressive in an alignment test to actually defeat SROA.
=] Without the added volatiles there, we actually split up the weird i16
loads and produce nice double allocas with better alignment.

This also uncovered a number of bugs where we failed to handle
splittable load and store slices which didn't have a begininng offset of
zero. Those fixes are included, and without them the existing test cases
explode in glorious fireworks. =]

I've kept support for leaving whole-alloca integer loads and stores as
splittable even for the purpose of rewriting, but I think that's likely
no longer needed. With the new pre-splitting, we might be able to remove
all the splitting support for loads and stores from the rewriter. Not
doing that in this patch to try to isolate any performance regressions
that causes in an easy to find and revert chunk.

llvm-svn: 225074
2015-01-02 03:55:54 +00:00
Chandler Carruth 5986b541d4 [SROA] Make the computation of adjusted pointers not leak GEP
instructions.

I noticed this when working on dialing up how aggressively we can
pre-split loads and stores. My test case wasn't passing because dead
GEPs into the allocas persisted when they were built by this routine.
This isn't terribly harmful, we still rewrote and promoted the alloca
and I can't conceive of how to cause this to happen in a case where we
will keep the exact same alloca but rewrite and promote the uses of it.
If that ever happened, we'd get an assert out of mem2reg.

So I don't have a direct test case yet, but the subsequent commit's test
case wouldn't pass without this. There are other problems fixed by this
patch that I spotted purely by inspection such as the fact that
getAdjustedPtr could have actually deleted dead base pointers. I don't
know how to get a base pointer to go into getAdjustedPtr today, so
I think this bug could never have manifested (and I certainly can't
write a test case for it) but, it wasn't the intent of the code. The
code really just wanted to GC the new instructions built. That can be
done more directly by comparing with the base pointer which is the only
non-new instruction that this code can return.

llvm-svn: 225073
2015-01-02 02:47:38 +00:00
Chandler Carruth 29c22fae46 [SROA] Fix the loop exit placement to be prior to indexing the splits
array. This prevents it from walking out of bounds on the splits array.

Bug found with the existing tests by ASan and by the MSVC debug build.

llvm-svn: 225069
2015-01-02 00:10:22 +00:00
Chandler Carruth c39eaa5041 [SROA] Fix two total think-os in r225061 that should have been caught on
a +asserts bootstrap, but my bootstrap had asserts off. Oops.

Anyways, in some places it is reasonable to cast (as a sanity check) the
pointer operand to a load or store to an instruction within SROA --
namely when the pointer operand is expected to be derived from an
alloca, and thus always an instruction. However, the pre-splitting code
also deals with loads and stores to non-alloca pointers and there we
need to just use the Value*. Nothing about the code relied on the
instruction cast, it was only there essentially as an invariant
assertion. Remove the two that don't actually hold.

This should fix the proximate issue in PR22080, but I'm also doing an
asserts bootstrap myself to see if there are other issues lurking.

I'll craft a reduced test case in a moment, but I wanted to get the tree
healthy as quickly as possible.

llvm-svn: 225068
2015-01-01 23:26:16 +00:00
Chandler Carruth 6044c0bc78 [SROA] Switch to using a more direct debug logging technique in one part
of my new load and store splitting, and fix a bug where it logged
a totally irrelevant slice rather than the actual slice in question.

The logging here previously worked because we used to place new slices
onto the back of the core sequence, but that caused other problems.
I updated the actual code to store new slices in their own vector but
didn't update the logging. There isn't a good way to reuse the logging
any more, and frankly it wasn't needed. We can directly log this bit
more easily.

llvm-svn: 225063
2015-01-01 12:56:47 +00:00
Chandler Carruth 994cde8869 [SROA] Fix formatting with clang-format which I managed to fail to do
prior to committing r225061. Sorry for that.

llvm-svn: 225062
2015-01-01 12:01:03 +00:00
Chandler Carruth 0715cba02d [SROA] Teach SROA how to much more intelligently handle split loads and
stores.

When there are accesses to an entire alloca with an integer
load or store as well as accesses to small pieces of the alloca, SROA
splits up the large integer accesses. In order to do that, it uses bit
math to merge the small accesses into large integers. While this is
effective, it produces insane IR that can cause significant problems in
the rest of the optimizer:

- It can cause load and store mismatches with GVN on the non-alloca side
  where we end up loading an i64 (or some such) rather than loading
  specific elements that are stored.
- We can't always get rid of the integer bit math, which is why we can't
  always fix the loads and stores to work well with GVN.
- This is especially bad when we have operations that mix poorly with
  integer bit math such as floating point operations.
- It will block things like the vectorizer which might be able to handle
  the scalar stores that underly the aggregate.

At the same time, we can't just directly split up these loads and stores
in all cases. If there is actual integer arithmetic involved on the
values, then using integer bit math is actually the perfect lowering
because we can often combine it heavily with the surrounding math.

The solution this patch provides is to find places where SROA is
partitioning aggregates into small elements, and look for splittable
loads and stores that it can split all the way to some other adjacent
load and store. These are uniformly the cases where failing to split the
loads and stores hurts the optimizer that I have seen, and I've looked
extensively at the code produced both from more and less aggressive
approaches to this problem.

However, it is quite tricky to actually do this in SROA. We may have
loads and stores to the same alloca, or other complex patterns that are
hard to handle. This complexity leads to the somewhat subtle algorithm
implemented here. We have to do this entire process as a separate pass
over the partitioning of the alloca, and split up all of the loads prior
to splitting the stores so that we can handle safely the cases of
overlapping, including partially overlapping, loads and stores to the
same alloca. We also have to reconstitute the post-split slice
configuration so we can avoid iterating again over all the alloca uses
(the slow part of SROA). But we also have to ensure that when we split
up loads and stores to *other* allocas, we *do* re-iterate over them in
SROA to adapt to the more refined partitioning now required.

With this, I actually think we can fix a long-standing TODO in SROA
where I avoided splitting as many loads and stores as probably should be
splittable. This limitation historically mitigated the fallout of all
the bad things mentioned above. Now that we have more intelligent
handling, I plan to remove the FIXME and more aggressively mark integer
loads and stores as splittable. I'll do that in a follow-up patch to
help with bisecting any fallout.

The net result of this change should be more fine-grained and accurate
scalars being formed out of aggregates. At the very least, Clang now
generates perfect code for this high-level test case using
std::complex<float>:

  #include <complex>

  void g1(std::complex<float> &x, float a, float b) {
    x += std::complex<float>(a, b);
  }
  void g2(std::complex<float> &x, float a, float b) {
    x -= std::complex<float>(a, b);
  }

  void foo(const std::complex<float> &x, float a, float b,
           std::complex<float> &x1, std::complex<float> &x2) {
    std::complex<float> l1 = x;
    g1(l1, a, b);
    std::complex<float> l2 = x;
    g2(l2, a, b);
    x1 = l1;
    x2 = l2;
  }

This code isn't just hypothetical either. It was reduced out of the hot
inner loops of essentially every part of the Eigen math library when
using std::complex<float>. Those loops would consistently and
pervasively hop between the floating point unit and the integer unit due
to bit math extraction and insertion of floating point values that were
"stored" in a 64-bit integer register around the loop backedge.

So far, this change has passed a bootstrap and I have done some other
testing and so far, no issues. That doesn't mean there won't be though,
so I'll be prepared to help with any fallout. If you performance swings
in particular, please let me know. I'm very curious what all the impact
of this change will be. Stay tuned for the follow-up to also split more
integer loads and stores.

llvm-svn: 225061
2015-01-01 11:54:38 +00:00
Philip Reames b35f46ce06 Refine the notion of MayThrow in LICM to include a header specific version
In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up.

This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loops header block - will not be lifted by subsequent LICM runs.

define void @nothrow_header(i64 %x, i64 %y, i1 %cond) {
; CHECK-LABEL: nothrow_header
; CHECK-LABEL: entry
; CHECK: %div = udiv i64 %x, %y
; CHECK-LABEL: loop
; CHECK: call void @use(i64 %div)
entry:
  br label %loop
loop: ; preds = %entry, %for.inc
  %div = udiv i64 %x, %y
  br i1 %cond, label %loop-if, label %exit
loop-if:
  call void @use(i64 %div)
  br label %loop
exit:
  ret void
}

The current patch really only helps with non-memory instructions (i.e. divs, etc..) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load.  The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invarant.load metadata of tbaa 'is constant memory' metadata.

Differential Revision: http://reviews.llvm.org/D6725

llvm-svn: 224965
2014-12-29 23:00:57 +00:00
Chandler Carruth ffb7ce56a6 [SROA] Update the documentation and names for accessing the slices
within a partition of an alloca in SROA.

This reflects the fact that the organization of the slices isn't really
ideal for analysis, but is the naive way in which the slices are
available while we're processing them in the core partitioning
algorithm.

It is possible we could improve matters, and I've left a FIXME with
one of my ideas for how to do this, but it is a lot of work, the benefit
is somewhat minor, and it isn't clear that it would be strictly better.
=/ Not really satisfying, but I'm out of really good ideas.

This also improves one place where the debug logging failed to mark some
split partitions. Now we log in one place, slightly later, and with
accurate information about whether the slice is split by the partition
being rewritten.

llvm-svn: 224800
2014-12-24 01:48:09 +00:00
Chandler Carruth 5031bbe86a [SROA] Refactor the integer and vector promotion testing logic to
operate in terms of the new Partition class, and generally have a more
clear set of arguments. No functionality changed.

The most notable improvements here are consistently using the
terminology of 'partition' for a collection of slices that will be
rewritten together and 'slice' for a region of an alloca that is used by
a particular instruction.

This also makes it more clear that the split things are actually slices
as well, just ones that will be split by the proposed partition.

This doesn't yet address the confusing aspects of the partition's
interface where slices that will be split by the partition and start
prior to the partition are accesssed via Partition::splitSlices() while
the core range of slices exposed by a Partition includes both unsplit
slices and slices which will be split by the end, but started within the
offset range of the partition. This is particularly hard to address
because the algorithm which computes partitions quite literally doesn't
know which slices these will end up being until too late. I'm looking at
whether I can fix that or not, but I'm not optimistic. I'll update the
comments and/or names to further explain this either way. I've also
added one FIXME in this patch relating to this confusion so that I don't
forget about it.

llvm-svn: 224798
2014-12-24 01:05:14 +00:00
Chandler Carruth c7d1e24b34 Revert r224739: Debug info: Teach SROA how to update debug info for
fragmented variables.

This caused codegen to start crashing when we built somewhat large
programs with debug info and optimizations. 'check-msan' hit in, and
I suspect a bootstrap would as well. I mailed a test case to the
review thread.

llvm-svn: 224750
2014-12-23 02:58:14 +00:00
Chandler Carruth e2f66ceed9 [SROA] Lift the logic for traversing the alloca slices one partition at
a time into a partition iterator and a Partition class.

There is a lot of knock-on simplification that this enables, largely
stemming from having a Partition object to refer to in lots of helpers.
I've only done a minimal amount of that because enoguh stuff is changing
as-is in this commit.

This shouldn't change any observable behavior. I've worked hard to
preserve the *exact* traversal semantics which were originally present
even though some of them make no sense. I'll be changing some of this in
subsequent commits now that the logic is carefully factored into
a reusable place.

The primary motivation for this change is to break the rewriting into
phases in order to support more intelligent rewriting. For example, I'm
planning to change how split loads and stores are rewritten to remove
the significant overuse of integer bit packing in the resulting code and
allow more effective secondary splitting of aggregates. For any of this
to work, they have to share the exact traversal logic.

llvm-svn: 224742
2014-12-22 22:46:00 +00:00
Bruno Cardoso Lopes bad65c3b70 [LCSSA] Handle PHI insertion in disjoint loops
Take two disjoint Loops L1 and L2.

LoopSimplify fails to simplify some loops (e.g. when indirect branches
are involved). In such situations, it can happen that an exit for L1 is
the header of L2. Thus, when we create PHIs in one of such exits we are
also inserting PHIs in L2 header.

This could break LCSSA form for L2 because these inserted PHIs can also
have uses in L2 exits, which are never handled in the current
implementation. Provide a fix for this corner case and test that we
don't assert/crash on that.

Differential Revision: http://reviews.llvm.org/D6624

rdar://problem/19166231

llvm-svn: 224740
2014-12-22 22:35:46 +00:00
Adrian Prantl a47ace5901 Debug info: Teach SROA how to update debug info for fragmented variables.
This allows us to generate debug info for extremely advanced code such as

  typedef struct { long int a; int b;} S;

  int foo(S s) {
    return s.b;
  }

which at -O1 on x86_64 is codegen'd into

  define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
    ret i32 %s.coerce1, !dbg !24
  }

with this patch we emit the following debug info for this

  TAG_formal_parameter [3]
    AT_location( 0x00000000
                 0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
                 0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
                 AT_name( "s" )
                 AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )

Thanks to chandlerc, dblaikie, and echristo for their feedback on all
previous iterations of this patch!

llvm-svn: 224739
2014-12-22 22:26:00 +00:00
Chandler Carruth 113dc64c67 [SROA] Run clang-format over the entire SROA pass as I wrote it before
much of the glory of clang-format, and now any time I touch it I risk
introducing formatting changes as part of a functional commit.

Also, clang-format is *way* better at formatting my code than I am.
Most of this is a huge improvement although I reverted a couple of
places where I hit a clang-format bug with lambdas that has been filed
but not (fully) fixed.

llvm-svn: 224666
2014-12-20 02:39:18 +00:00
Chandler Carruth 68ea415d04 [SROA] Cleanup - remove the use of std::mem_fun_ref nonsense and use
a lambda now that we have them.

llvm-svn: 224500
2014-12-18 05:19:47 +00:00
Elena Demikhovsky a5599bfd72 Sink store based on alias analysis
- by Ella Bolshinsky
The alias analysis is used define whether the given instruction
is a barrier for store sinking. For 2 identical stores, following
instructions are checked in the both basic blocks, to determine
whether they are sinking barriers.

http://reviews.llvm.org/D6420

llvm-svn: 224247
2014-12-15 14:09:53 +00:00
Chad Rosier 78943bcc18 [Reassociate] Use dbgs() instead of errs().
llvm-svn: 224125
2014-12-12 14:44:12 +00:00
Duncan P. N. Exon Smith 5bf8fef580 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin
I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
Tom Stellard 1f0dded057 StructurizeCFG: Use LoopInfo analysis for better loop detection
We were assuming that each back-edge in a region represented a unique
loop, which is not always the case.  We need to use LoopInfo to
correctly determine which back-edges are loops.

llvm-svn: 223199
2014-12-03 04:28:32 +00:00
Bruno Cardoso Lopes d035fbb96f [LICM] Avoind store sinking if no preheader is available
Load instructions are inserted into loop preheaders when sinking stores
and later removed if not used by the SSA updater. Avoid sinking if the
loop has no preheader and avoid crashes. This fixes one more side effect
of not handling indirectbr instructions properly on LoopSimplify.

llvm-svn: 223119
2014-12-02 14:22:34 +00:00
Bruno Cardoso Lopes 46d5bf2982 [LICM] Store sink and indirectbr instructions
Loop simplify skips exit-block insertion when exits contain indirectbr
instructions. This leads to an assertion in LICM when trying to sink
stores out of non-dedicated loop exits containing indirectbr
instructions. This patch fix this issue by re-checking for dedicated
exits in LICM prior to store sink attempts.

Differential Revision: http://reviews.llvm.org/D6414

rdar://problem/18943047

llvm-svn: 222927
2014-11-28 19:47:46 +00:00
Chandler Carruth 1a3c2c414c Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite
clearly only exactly equal width ptrtoint and inttoptr casts are no-op
casts, it says so right there in the langref. Make the code agree.

Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 222739
2014-11-25 08:20:27 +00:00
David Majnemer 1f44142e4e This Reassociate change unintentionally slipped in r222499
llvm-svn: 222500
2014-11-21 02:37:38 +00:00
David Majnemer c0a313b57c SROA: The alloca type isn't a candidate promotion type for vectors
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.

This manifested as an assertion failure when we compared the various
types: we had a size mismatch.

This fixes PR21480.

llvm-svn: 222499
2014-11-21 02:34:55 +00:00
Chad Rosier 90a2f9b110 Revert "[Reassociate] As the expression tree is rewritten make sure the operands are"
This reverts commit r222142.  This is causing/exposing an execution-time regression
in spec2006/gcc and coremark on AArch64/A57/Ofast.

Conflicts:

	test/Transforms/Reassociate/optional-flags.ll

llvm-svn: 222398
2014-11-19 23:21:20 +00:00
Arnaud A. de Grandmaison 7b9dc28060 Fix tail recursion elimination
When the BasicBlock containing the return instrution has a PHI with 2
incoming values, FoldReturnIntoUncondBranch will remove the no longer
used incoming value and remove the no longer needed phi as well. This
leaves us with a BB that no longer has a PHI, but the subsequent call
to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not
remove the return instruction (which still uses the result of the call
instruction). This prevents EliminateRecursiveTailCall to remove
the value, as it is still being used in a basicblock which has no
predecessors.

The basicblock can not be erased on the spot, because its iterator is
still being used in runTRE.

This issue was exposed when removing the threshold on size for lifetime
marker insertion for named temporaries in clang. The testcase is a much
reduced version of peelOffOuterExpr(const Expr*, const ExplodedNode *)
from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp.

llvm-svn: 222354
2014-11-19 13:32:51 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Hao Liu 1d2a061bd8 [SeparateConstOffsetFromGEP] Allow SeparateConstOffsetFromGEP pass to lower GEPs.
If LowerGEP is enabled, it can lower a GEP with multiple indices into GEPs with a single index
or arithmetic operations. Lowering GEPs can always extract structure indices. Lowering GEPs can
also give use more optimization opportunities. It can benefit passes like CSE, LICM and CGP.

Reviewed in http://reviews.llvm.org/D5864

llvm-svn: 222328
2014-11-19 06:24:44 +00:00
Manman Ren c67109313c Revert r222039 because of bot failure.
http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/
Hopefully, bot will be green. If not, we will re-submit the commit.

llvm-svn: 222287
2014-11-19 00:13:26 +00:00
Chad Rosier e53e8c8e58 [Reassociate] Rename local variable to not use same name as a member
variable. NFC.

llvm-svn: 222248
2014-11-18 20:21:54 +00:00
Philip Reames 018dbf18c4 Tweak EarlyCSE to recognize series of dead stores
EarlyCSE is giving up on the current instruction immediately when it recognizes that the current instruction makes a previous store trivially dead. There's no reason to do this. Once the previous store has been deleted, it's perfectly legal to remember the value of the current store (for value forwarding) and the fact the store occurred (it could be dead too!).

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D6301

llvm-svn: 222241
2014-11-18 17:46:32 +00:00
David Majnemer 9a91e4a18a IndVarSimplify: Allow LFTR to fire more often
I added a pessimization in r217102 to prevent miscompiles when the
incremented induction variable was used in a comparison; it would be
poison.

Try to use the incremented induction variable more often when we can be
sure that the increment won't end in poison.

Differential Revision: http://reviews.llvm.org/D6222

llvm-svn: 222213
2014-11-18 02:20:58 +00:00
Chad Rosier bc0b869be9 [Reassociate] As the expression tree is rewritten make sure the operands are
emitted in canonical form.

llvm-svn: 222142
2014-11-17 16:33:50 +00:00
Chad Rosier 9a1ac6e494 [Reassociate] Canonicalize constants to RHS operand.
Fix a thinko where the RHS was already a constant.

llvm-svn: 222139
2014-11-17 15:52:51 +00:00
Chad Rosier 1ff4c0bf0b Reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before
doing Load PRE"

This commit updates the failing test in
Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll

The failing test is sensitive to the order in which we process loads.  This
version turns on the RPO traversal instead of the while DT traversal in GVN.
The new test code is functionally same just the order of loads that are
eliminated is swapped.

This new version also fixes an issue where GVN splits a critical edge and
potentially invalidate the RPO/DT iterator.

llvm-svn: 222039
2014-11-14 21:09:13 +00:00
Chad Rosier df8f2a23cb [Reassociate] Canonicalize the operands of all binary operators.
llvm-svn: 222008
2014-11-14 17:09:19 +00:00
Chad Rosier d99df68e19 [Reassociate] Canonicalize operands of vector binary operators.
Prior to this commit fmul and fadd binary operators were being canonicalized for
both scalar and vector versions.  We now canonicalize add, mul, and, or, and xor
vector instructions.

llvm-svn: 222006
2014-11-14 17:08:15 +00:00
Chad Rosier f8b55f1bc5 [Reassociate] Canonicalize constants to RHS operand.
llvm-svn: 222005
2014-11-14 17:05:59 +00:00
Chad Rosier f59e548ba7 [Reassociate] Improve rank debug information. NFC.
llvm-svn: 221999
2014-11-14 15:01:38 +00:00
Chad Rosier 8716b58583 Revert "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE."
This reverts commit r221924.  It appears the commit was a bit premature and is causing
bot failures that need further investigation.

llvm-svn: 221939
2014-11-13 22:54:59 +00:00