The new version has several advantages:
1) IMSHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 5, i32 7
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
llvm-svn: 279229
Summary:
We are going to combine poisoning of red zones and scope poisoning.
PR27453
Reviewers: kcc, eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23623
llvm-svn: 279020
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
- To support the above, bottom-up inlining with careful history
tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
llvm-svn: 278896
When comparing a User* to a BasicBlock::iterator in
passingValueIsAlwaysUndefined, don't dereference the iterator in case it
is end().
llvm-svn: 278872
Clearing out the AssumptionCache can cause us to rescan the entire
function for assumes. If there are many loops, then we are scanning
over the entire function many times.
Instead of clearing out the AssumptionCache, register all cloned
assumes.
llvm-svn: 278854
This reverts commit r278660.
It causes downstream assertion failure in InstCombine on shuffle
instructions. Comes up in __mm_swizzle_epi32.
llvm-svn: 278672
The new version has several advantages:
1) IMSHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 5, i32 7
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
llvm-svn: 278660
They aren't static, and moving them to the entry block across something
else will only result in tears.
Root cause of http://crbug.com/636558.
llvm-svn: 278571
Summary:
Port the NameAnonFunction pass and add a test.
Depends on D23439.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23440
llvm-svn: 278509
We are seeing r276077 drastically increasing compiler time for our larger
benchmarks in PGO profile generation build (both clang based and IR based
mode) -- it can be 20x slower than without the patch (like from 30 secs to
780 secs)
The increased time are all in pass LCSSA. The problematic code is about
PostProcessPHIs after use-rewrite. Note that the InsertedPhis from ssa_updater
is accumulating (never been cleared). Since the inserted PHIs are added to the
candidate for each rewrite, The earlier ones will be repeatedly added. Later
when adding the new PHIs to the work-list, we don't check the duplication
either. This can result in extremely long work-list that containing tons of
duplicated PHIs.
This patch fixes the issue by hoisting the code out of the loop.
Differential Revision: http://reviews.llvm.org/D23344
llvm-svn: 278250
Hal pointed out that the semantic of our intrinsic and the libc
call are slightly different. Add a comment while I'm here to
explain why we can't emit an intrinsic. Thanks Hal!
llvm-svn: 278200
Summary:
This hopefully fixes PR28825. The problem now was that a value from the
original loop was used in a subloop, which became a sibling after separation.
While a subloop doesn't need an lcssa phi node, a sibling does, and that's
where we broke LCSSA. The most natural way to fix this now is to simply call
formLCSSA on the original loop: it'll do what we've been doing before plus
it'll cover situations described above.
I think we don't need to run formLCSSARecursively here, and we have an assert
to verify this (I've tried testing it on LLVM testsuite + SPECs). I'd be happy
to be corrected here though.
I also changed a run line in the test from '-lcssa -loop-unroll' to
'-lcssa -loop-simplify -indvars', because it exercises LCSSA
preservation to the same extent, but also makes less unrelated
transformation on the CFG, which makes it easier to verify.
Reviewers: chandlerc, sanjoy, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23288
llvm-svn: 278173
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278077
Summary:
Ensure that the MemorySSA object never changes address when using the
new pass manager since the walkers contained by MemorySSA cache pointers
to it at construction time. This is achieved by wrapping the
MemorySSAAnalysis result in a unique_ptr. Also add some asserts that
check for this bug.
Reviewers: george.burgess.iv, dberlin
Subscribers: mcrosier, hfinkel, chandlerc, silvas, llvm-commits
Differential Revision: https://reviews.llvm.org/D23171
llvm-svn: 278028
Summary:
In the use optimizer, we need to keep of whether the lower bound still
dominates us or else we may decide a lower bound is still valid when it
is not due to intervening pushes/pops. Fixes PR28880 (and probably a
bunch of other things).
Reviewers: george.burgess.iv
Subscribers: MatzeB, llvm-commits, sebpop
Differential Revision: https://reviews.llvm.org/D23237
llvm-svn: 277978
Summary:
The correctness fix here is that when we CSE a load with another load,
we need to combine the metadata on the two loads. This matches the
behavior of other passes, like instcombine and GVN.
There's also a minor optimization improvement here: for load PRE, the
aliasing metadata on the inserted load should be the same as the
metadata on the original load. Not sure why the old code was throwing
it away.
Issue found by inspection.
Differential Revision: http://reviews.llvm.org/D21460
llvm-svn: 277977
Summary:
Originally the plan was to use the custom worklist to do some block popping,
and because we don't actually need a visited set. The custom one we have
here is slightly broken, and it's not worth fixing vs using depth_first_iterator since we aren't going to go the route we originally
were.
Fixes PR28874
Reviewers: george.burgess.iv
Subscribers: llvm-commits, gberry
Differential Revision: https://reviews.llvm.org/D23187
llvm-svn: 277880
This fixes PR28825. The problem was that we only checked if a value from
a created inner loop is used in the outer loop, and fixed LCSSA for
them. But we missed to fixup LCSSA for values used in exits of the outer
loop.
llvm-svn: 277877
Summary: We do not care about intrinsic calls when assigning discriminators.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23212
llvm-svn: 277843
This generated IR based on the order of evaluation, which is different
between GCC and Clang. With that in mind you get bootstrap miscompares
if you compare a Clang built with GCC-built Clang vs. Clang built with
Clang-built Clang. Diagnosing that made my head hurt.
This also reverts commit r277337, which "fixed" the test case.
llvm-svn: 277820
Not a correctness issue, but it would be nice if we didn't have to
recompute our block numbering (worst-case) every time we move MSSA.
llvm-svn: 277652
This reverts commit r277611 and the followup r277614.
Bootstrap builds and chromium builds are crashing during inlining after
this change.
llvm-svn: 277642
This is a follow-up to r277637. It teaches MemorySSA that invariant
loads (and loads of provably constant memory) are always liveOnEntry.
llvm-svn: 277640
This patch makes MemorySSA recognize atomic/volatile loads, and makes
MSSA treat said loads specially. This allows us to be a bit more
aggressive in some cases.
Administrative note: Revision was LGTM'ed by reames in person.
Additionally, this doesn't include the `invariant.load` recognition in
the differential revision, because I feel it's better to commit that
separately. Will commit soon.
Differential Revision: https://reviews.llvm.org/D16875
llvm-svn: 277637
It is possible for the value map to not have an entry for some value
that has already been removed.
I don't have a testcase, this is fall-out from a buildbot.
llvm-svn: 277614
We were able to figure out that the result of a call is some constant.
While propagating that fact, we added the constant to the value map.
This is problematic because it results in us losing the call site when
processing the value map.
This fixes PR28802.
llvm-svn: 277611
This fixes a bug where we'd sometimes cache overly-conservative results
with our walker. This bug was made more obvious by r277480, which makes
our cache far more spotty than it was. Test case is llvm-unit, because
we're likely going to use CachingWalker only for def optimization in the
future.
The bug stems from that there was a place where the walker assumed that
`DefNode.Last` was a valid target to cache to when failing to optimize
phis. This is sometimes incorrect if we have a cache hit. The fix is to
use the thing we *can* assume is a valid target to cache to. :)
llvm-svn: 277559
Summary: We really want to move towards MemoryLocOrCall (or fix AA) everywhere, but for now, this lets us have a single instructionClobbersQuery.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23072
llvm-svn: 277530
As agreed in post-commit review of r265388, I'm switching the flag to
its original value until the 90% runtime performance regression on
SingleSource/Benchmarks/Stanford/Bubblesort is addressed.
llvm-svn: 277524
Fixes PR28670
Summary:
Rewrite the use optimizer to be less memory intensive and 50% faster.
Fixes PR28670
The new use optimizer works like a standard SSA renaming pass, storing
all possible versions a MemorySSA use could get in a stack, and just
tracking indexes into the stack.
This uses much less memory than caching N^2 alias query results.
It's also a lot faster.
The current version defers phi node walking to the normal walker.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23032
llvm-svn: 277480
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277411
Using RAUW was wrong here; if we have a switch transform such as:
18 -> 6 then
6 -> 0
If we use RAUW, while performing the second transform the *transformed* 6
from the first will be also replaced, so we end up with:
18 -> 0
6 -> 0
Found by clang stage2 bootstrap; testcase added.
llvm-svn: 277332
If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:
switch (i) {
case 5: ...
case 9: ...
case 13: ...
case 17: ...
}
can become:
if ( (i - 5) % 4 ) goto default;
switch ((i - 5) / 4) {
case 0: ...
case 1: ...
case 2: ...
case 3: ...
}
or even better:
switch ( ROTR(i - 5, 2) {
case 0: ...
case 1: ...
case 2: ...
case 3: ...
}
The division and remainder operations could be costly so we only do this if the factor is a power of two, and emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.
llvm-svn: 277325
When extracting a set of blocks make sure to inherit all of the target
dependent attributes to make sure that the function will be valid for
lowering. One example is the "target-features" attribute for x86, if the
extracted region has functionality that relies on a specific feature it
will fail to be lowered.
This also allows for extracted functions to be valid for inlining, at
least back into the parent function, as the target attributes are tested
when inlining for compatibility.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22713
llvm-svn: 277315
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277313
LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
is added to the common function analysis passes that loop passes
depend on.
The BFI and indirectly BPI used in this pass is computed lazily so no
overhead should be observed unless -pass-remarks-with-hotness is used.
This is how the patch affects the O3 pipeline:
Dominator Tree Construction
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Rotate Loops
Loop Invariant Code Motion
Unswitch loops
Simplify the CFG
Dominator Tree Construction
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Combine redundant instructions
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Induction Variable Simplification
Recognize loop idioms
Delete dead loops
Unroll loops
...
llvm-svn: 277203
Patch by Sunita Marathe
Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)
llvm-svn: 277189
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
Summary:
copypasta doc of ImportedFunctionsInliningStatistics class
\brief Calculate and dump ThinLTO specific inliner stats.
The main statistics are:
(1) Number of inlined imported functions,
(2) Number of imported functions inlined into importing module (indirect),
(3) Number of non imported functions inlined into importing module
(indirect).
The difference between first and the second is that first stat counts
all performed inlines on imported functions, but the second one only the
functions that have been eventually inlined to a function in the importing
module (by a chain of inlines). Because llvm uses bottom-up inliner, it is
possible to e.g. import function `A`, `B` and then inline `B` to `A`,
and after this `A` might be too big to be inlined into some other function
that calls it. It calculates this statistic by building graph, where
the nodes are functions, and edges are performed inlines and then by marking
the edges starting from not imported function.
If `Verbose` is set to true, then it also dumps statistics
per each inlined function, sorted by the greatest inlines count like
- number of performed inlines
- number of performed inlines to importing module
Reviewers: eraman, tejohnson, mehdi_amini
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D22491
llvm-svn: 277089
Sanitizers set nobuiltin attribute on certain library functions to
avoid a situation where such function is neither instrumented nor
intercepted.
At the moment the list of interesting functions is hardcoded. This
change replaces it with logic based on
TargetLibraryInfo::hasOptimizedCodegen and the presense of readnone
function attribute (sanitizers are generally interested in memory
behavior of library functions).
This is expected to be a no-op change: the new logic matches exactly
the same set of functions.
r276771 (currently reverted) added mempcpy() to the list, breaking
MSan tests. With this change, r276771 can be safely re-landed.
llvm-svn: 277086
Summary:
LCSSAWrapperPass currently doesn't override verifyAnalysis method, so pass
manager doesn't verify LCSSA. This patch adds the method so that we start
verifying LCSSA between loop passes.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22888
llvm-svn: 276941
This lets you actually check to see if a block is valid before trying to
extract.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22699
llvm-svn: 276846
There didn't appear to be a good reason to use iplist in this case, a regular
list of unique_ptr works just as well.
Change made in preparation to a new PM port (since iplist is not moveable).
llvm-svn: 276668
Allowed loop vectorization with secondary FP IVs. Like this:
float *A;
float x = init;
for (int i=0; i < N; ++i) {
A[i] = x;
x -= fp_inc;
}
The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute.
Differential Revision: https://reviews.llvm.org/D21330
llvm-svn: 276554
checkClobberSanity will now be run for all results of `ClobberWalk`,
instead of just the crazy phi-optimized ones. This can help us catch
cases where our cache is being wonky.
llvm-svn: 276553
This unblocks the new PM part of River's patch in
https://reviews.llvm.org/D22706
Conveniently, this same change was needed for D21921 and so these
changes are just spun out from there.
llvm-svn: 276515
A seemingly common use for the walker's getClobberingMemoryAccess
function is:
```
MemoryAccess *getClobber(MemorySSAWalker *W, MemoryUseOrDef *MUD) {
const Instruction *I = MUD->getMemoryInst();
return W->getClobberingMemoryAccess(I);
}
```
Which is kind of redundant, since walkers will ultimately query MSSA to
find out which MemoryAccess `I` maps to (...which is always `MUD`).
So, this patch adds an overload of getClobberingMemoryAccess that
accepts MemoryAccesses directly. As a result, the Instruction overload
of getClobberingMemoryAccess becomes a lightweight wrapper around our
new overload.
Additionally, this patch un`virtual`izes the Instruction overload of
getClobberingMemoryAccess, since there doesn't seem to be a walker that
benefits from that being virtual, and I can't think of how else one
would implement it. Happy to make it virtual again if we would benefit
from doing so.
llvm-svn: 276169
As noted in https://reviews.llvm.org/D22537 , we can use this functionality in
visitSelectInstWithICmp() and InstSimplify, but currently we have duplicated
code.
llvm-svn: 276140
Revert "[LoopSimplify] Update LCSSA after separating nested loops."
This reverts commit r275891.
Revert "[LCSSA] Post-process PHI-nodes created by SSAUpdate when constructing LCSSA form."
This reverts commit r275883.
llvm-svn: 276064
This patch updates MemorySSA's use-optimizing walker to be more
accurate and, in some cases, faster.
Essentially, this changed our core walking algorithm from a
cache-as-you-go DFS to an iteratively expanded DFS, with all of the
caching happening at the end. Said expansion happens when we hit a Phi,
P; we'll try to do the smallest amount of work possible to see if
optimizing above that Phi is legal in the first place. If so, we'll
expand the search to see if we can optimize to the next phi, etc.
An iteratively expanded DFS lets us potentially quit earlier (because we
don't assume that we can optimize above all phis) than our old walker.
Additionally, because we don't cache as we go, we can now optimize above
loops.
As an added bonus, this patch adds a ton of verification (if
EXPENSIVE_CHECKS are enabled), so finding bugs is easier.
Differential Revision: https://reviews.llvm.org/D21777
llvm-svn: 275940
Summary:
Usually LCSSA survives this transformation, but in some cases (see
attached test) it doesn't: values from the original loop after
separating might be used from the outer loop. Before the transformation
it was the same loop, so LCSSA phis were not required.
This fixes PR28272.
Reviewers: sanjoy, hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21665
llvm-svn: 275891
Summary:
When a pass tries to keep LCSSA form it's often convenient to be able to update
LCSSA for a set of instructions rather than for the entire loop. This patch makes the
processInstruction from LCSSA externally available under a name
formLCSSAForInstruction.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22378
llvm-svn: 275613
Calling getModRefInfo with a fence resulted in crashes because fences
don't have a memory location. Add a new predicate to Instruction
called isFenceLike which indicates that the instruction mutates memory
but not any single memory location in particular. In practice, it is a
proxy for the set of instructions which "mayWriteToMemory" but cannot be
used with MemoryLocation::get.
This fixes PR28570.
llvm-svn: 275581