factory functions for the two modes the loop unroller is actually used
in in-tree: simplified full-unrolling and the entire thing including
partial unrolling.
I've also wired these up to nice names so you can express both of these
being in a pipeline easily. This is a precursor to actually enabling
these parts of the O2 pipeline.
Differential Revision: https://reviews.llvm.org/D28897
llvm-svn: 293136
Even when we don't create a remainder loop (that is, when we unroll by 2), we
may duplicate nested loops into the remainder. This is complicated by the fact
the remainder may itself be either inserted into an outer loop, or at the top
level. In the latter case, we may need to create new top-level loops.
Differential Revision: https://reviews.llvm.org/D29156
llvm-svn: 293124
loops.
We do this by reconstructing the newly added loops after the unroll
completes to avoid threading pass manager details through all the mess
of the unrolling infrastructure.
I've enabled some extra assertions in the LPM to try and catch issues
here and enabled a bunch of unroller tests to try and make sure this is
sane.
Currently, I'm manually running loop-simplify when needed. That should
go away once it is folded into the LPM infrastructure.
Differential Revision: https://reviews.llvm.org/D28848
llvm-svn: 293011
Summary: Partial unrolling should have separate threshold with full unrolling.
Reviewers: efriedma, mzolotukhin
Reviewed By: efriedma, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28831
llvm-svn: 292293
a function's CFG when that CFG is unchanged.
This allows transformation passes to simply claim they preserve the CFG
and analysis passes to check for the CFG being preserved to remove the
fanout of all analyses being listed in all passes.
I've gone through and removed or cleaned up as many of the comments
reminding us to do this as I could.
Differential Revision: https://reviews.llvm.org/D28627
llvm-svn: 292054
the latter to the Transforms library.
While the loop PM uses an analysis to form the IR units, the current
plan is to have the PM itself establish and enforce both loop simplified
form and LCSSA. This would be a layering violation in the analysis
library.
Fundamentally, the idea behind the loop PM is to *transform* loops in
addition to running passes over them, so it really seemed like the most
natural place to sink this was into the transforms library.
We can't just move *everything* because we also have loop analyses that
rely on a subset of the invariants. So this patch splits the the loop
infrastructure into the analysis management that has to be part of the
analysis library, and the transform-aware pass manager.
This also required splitting the loop analyses' printer passes out to
the transforms library, which makes sense to me as running these will
transform the code into LCSSA in theory.
I haven't split the unittest though because testing one component
without the other seems nearly intractable.
Differential Revision: https://reviews.llvm.org/D28452
llvm-svn: 291662
arguments much like the CGSCC pass manager.
This is a major redesign following the pattern establish for the CGSCC layer to
support updates to the set of loops during the traversal of the loop nest and
to support invalidation of analyses.
An additional significant burden in the loop PM is that so many passes require
access to a large number of function analyses. Manually ensuring these are
cached, available, and preserved has been a long-standing burden in LLVM even
with the help of the automatic scheduling in the old pass manager. And it made
the new pass manager extremely unweildy. With this design, we can package the
common analyses up while in a function pass and make them immediately available
to all the loop passes. While in some cases this is unnecessary, I think the
simplicity afforded is worth it.
This does not (yet) address loop simplified form or LCSSA form, but those are
the next things on my radar and I have a clear plan for them.
While the patch is very large, most of it is either mechanically updating loop
passes to the new API or the new testing for the loop PM. The code for it is
reasonably compact.
I have not yet updated all of the loop passes to correctly leverage the update
mechanisms demonstrated in the unittests. I'll do that in follow-up patches
along with improved FileCheck tests for those passes that ensure things work in
more realistic scenarios. In many cases, there isn't much we can do with these
until the loop simplified form and LCSSA form are in place.
Differential Revision: https://reviews.llvm.org/D28292
llvm-svn: 291651
Summary:
The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.
The patch also simplified the code by reducing one parameter in UP.
The patch only affects code-gen of two speccpu2006 benchmark:
445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26989
llvm-svn: 290737
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
This implements PGO-driven loop peeling.
The basic idea is that when the average dynamic trip-count of a loop is known,
based on PGO, to be low, we can expect a performance win by peeling off the
first several iterations of that loop.
Unlike unrolling based on a known trip count, or a trip count multiple, this
doesn't save us the conditional check and branch on each iteration. However,
it does allow us to simplify the straight-line code we get (constant-folding,
etc.). This is important given that we know that we will usually only hit this
code, and not the actual loop.
This is currently disabled by default.
Differential Revision: https://reviews.llvm.org/D25963
llvm-svn: 288274
Summary:
For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold.
For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling.
Reviewers: davidxl, mzolotukhin
Subscribers: sanjoy, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D26527
llvm-svn: 287186
Summary:
Unrolled Loop Size calculations moved to a function.
Constant representing number of optimized instructions
when "back edge" becomes "fall through" replaced with
variable.
Some comments added.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D21719
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 286389
When we have a loop with a known upper bound on the number of iterations, and
furthermore know that either the number of iterations will be either exactly
that upper bound or zero, then we can fully unroll up to that upper bound
keeping only the first loop test to check for the zero iteration case.
Most of the work here is in plumbing this 'max-or-zero' information from the
part of scalar evolution where it's detected through to loop unrolling. I've
also gone for the safe default of 'false' everywhere but howManyLessThans which
could probably be improved.
Differential Revision: https://reviews.llvm.org/D25682
llvm-svn: 284818
Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049.
The original summary:
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
llvm-svn: 284053
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
Summary: Debug info should *not* affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info.
Reviewers: davidxl, mzolotukhin
Subscribers: haicheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25098
llvm-svn: 282894
This commit enables more unrolling for SystemZ by implementing the
SystemZTargetTransformInfo::getUnrollingPreferences() method.
It has been found that it is better to only unroll moderately, so the
DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order
to set this to a lower value for SystemZ (4).
Reviewers: Evgeny Stupachenko, Ulrich Weigand.
https://reviews.llvm.org/D24451
llvm-svn: 282570
We can't mark ORE (a function pass) preserved as required by the loop
passes because that is how we ensure that the required passes like
LazyBFI are all available any time ORE is used. See the new comments in
the patch.
Instead we use it directly just like the inliner does in D22694.
As expected there is some additional overhead after removing the caching
provided by analysis passes. The worst case, I measured was
LNT/CINT2006_ref/401.bzip2 which regresses by 12%. As before, this only
affects -Rpass-with-hotness and not default compilation.
llvm-svn: 279829
Summary:
In clang commit r268509 we started to invoke loop-unroll pass from the
driver even under -Os. However, we happen to not initialize optsize
thresholds properly, which si fixed with this change.
r268509 led to some big compile time regressions, because we started to
unroll some loops that we didn't unroll before. With this change I hope
to recover most of the regressions. We still are slightly slower than
before, because we do some checks here and there in loop-unrolling
before we bail out, but at least the slowdown is not that huge now.
Reviewers: hfinkel, chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23388
llvm-svn: 279585
One exception here is LoopInfo which must forward-declare it (because
the typedef is in LoopPassManager.h which depends on LoopInfo).
Also, some includes for LoopPassManager.h were needed since that file
provides the typedef.
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278079
LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
is added to the common function analysis passes that loop passes
depend on.
The BFI and indirectly BPI used in this pass is computed lazily so no
overhead should be observed unless -pass-remarks-with-hotness is used.
This is how the patch affects the O3 pipeline:
Dominator Tree Construction
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Rotate Loops
Loop Invariant Code Motion
Unswitch loops
Simplify the CFG
Dominator Tree Construction
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Combine redundant instructions
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Induction Variable Simplification
Recognize loop idioms
Delete dead loops
Unroll loops
...
llvm-svn: 277203
We just set PreserveLCSSA to always true since we don't have an
analogous method `mustPreserveAnalysisID(LCSSA)`.
Also port LoopInfo verifier pass to test LoopUnrollPass.
llvm-svn: 276063
We do not support splitting cleanuppad or catchswitches. This is
problematic for passes which assume that a loop is in loop simplify
form (the loop would have a dedicated exit block instead of sharing it).
While it isn't great that we don't support this for cleanups, we still
cannot make loop-simplify form an assertable precondition because
indirectbr will also disable these sorts of CFG cleanups.
This fixes PR28132.
llvm-svn: 272739
with user specified count has been applied.
Summary:
Previously SetLoopAlreadyUnrolled() set the disable pragma only if
there was some loop metadata.
Now it set the pragma in all cases. This helps to prevent multiple
unroll when -unroll-count=N is given.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D20765
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 272195
In r270478, where I enabled the new heuristic I posted testing results,
which I got when explicitly passed the thresholds values via CL options.
However, setting the CL options init-values is not enough to change the
default values of thresholds, so I'm changing them in another place now.
llvm-svn: 271615
Summary:
Unroll factor (Count) calculations moved to a new function.
Early exits on pragma and "-unroll-count" defined factor added.
New type of unrolling "Force" introduced (previously used implicitly).
New unroll preference "AllowRemainder" introduced and set "true" by default.
(should be set to false for architectures that suffers from it).
Reviewers: hfinkel, mzolotukhin, zzheng
Differential Revision: http://reviews.llvm.org/D19553
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 271071
Condition might be simplified to a Constant, but it doesn't have to be
ConstantInt, so we should dyn_cast, instead of cast.
This fixes PR27886.
llvm-svn: 270924
Summary:
This patch turns on LoopUnrollAnalyzer by default. To mitigate compile
time regressions, I chose very conservative thresholds for now. Later we
can make them more aggressive, but it might require being smarter in
which loops we're optimizing. E.g. currently the biggest issue is that
with more agressive thresholds we unroll many cold loops, which
increases compile time for no performance benefit (performance of those
loops is improved, but it doesn't matter since they are cold).
Test results for compile time(using 4 samples to reduce noise):
```
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 5.19%
SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect 4.19%
MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow 3.39%
MultiSource/Applications/JM/lencod/lencod 1.47%
MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 -6.06%
```
I didn't see any performance changes in the testsuite, but it improves
some internal tests.
Reviewers: hfinkel, chandlerc
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D20482
llvm-svn: 270478
Summary:
...loop after the last iteration.
This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.
This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
avoid this from becoming an excessively slow analysis.
The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up, I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look at.
Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrances around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.
Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when if we had continued, the cost would have gone back
below the threshold. These kinds of bugs can cause incredibly hard to
track down random changes to behavior.
We could use a techinque similar (but much simpler) within the inliner
as well to avoid considering speculated code in the inline cost.
Reviewers: chandlerc
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11758
llvm-svn: 269388
Before r268509, Clang would disable the loop unroll pass when optimizing
for size. That commit enabled it to be able to support unroll pragmas
in -Os builds. However, this regressed binary size in one of Chromium's
DLLs with ~100 KB.
This restores the original behaviour of no unrolling at -Os, but doing it
in LLVM instead of Clang makes more sense, and also allows the pragmas to
keep working.
Differential revision: http://reviews.llvm.org/D20115
llvm-svn: 269124
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
1. Add FullUnrollMaxCount option that works like MaxCount, but also limits
the unroll count for fully unrolled loops. So if a loop has an iteration
count over this, it won't fully unroll.
2. Add CLI options for MaxCount and the new option, so they can be tested
(plus a test).
3. Make partial unrolling obey MaxCount.
An example use-case (the out of tree one this is originally designed for) is
a target’s TTI can analyze a loop and decide on a max unroll count separate
from the size threshold, e.g. based on register pressure, then constrain
LoopUnroll to not exceed that, regardless of the size of the unrolled loop.
llvm-svn: 265562
Summary:
Specifically, when we perform runtime loop unrolling of a loop that
contains a convergent op, we can only unroll k times, where k divides
the loop trip multiple.
Without this change, we'll happily unroll e.g. the following loop
for (int i = 0; i < N; ++i) {
if (i == 0) convergent_op();
foo();
}
into
int i = 0;
if (N % 2 == 1) {
convergent_op();
foo();
++i;
}
for (; i < N - 1; i += 2) {
if (i == 0) convergent_op();
foo();
foo();
}.
This is unsafe, because we've just added a control-flow dependency to
the convergent op in the prelude.
In general, runtime unrolling loops that contain convergent ops is safe
only if we don't have emit a prelude, which occurs when the unroll count
divides the trip multiple.
Reviewers: resistor
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17526
llvm-svn: 263509
Summary: Check that we're using SCEV for the same loop we're simulating. Otherwise, we might try to use the iteration number of the current loop in SCEV expressions for inner/outer loops IVs, which is clearly incorrect.
Reviewers: chandlerc, hfinkel
Subscribers: sanjoy, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17632
llvm-svn: 261958
routine.
We were getting this wrong in small ways and generally being very
inconsistent about it across loop passes. Instead, let's have a common
place where we do this. One minor downside is that this will require
some analyses like SCEV in more places than they are strictly needed.
However, this seems benign as these analyses are complete no-ops, and
without this consistency we can in many cases end up with the legacy
pass manager scheduling deciding to split up a loop pass pipeline in
order to run the function analysis half-way through. It is very, very
annoying to fix these without just being very pedantic across the board.
The only loop passes I've not updated here are ones that use
AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer.
They seemed less relevant.
With this patch, almost all of the problems in PR24804 around loop pass
pipelines are fixed. The one remaining issue is that we run simplify-cfg
and instcombine in the middle of the loop pass pipeline. We've recently
added some loop variants of these passes that would seem substantially
cleaner to use, but this at least gets us much closer to the previous
state. Notably, the seven loop pass managers is down to three.
I've not updated the loop passes using LoopAccessAnalysis because that
analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't
clear that those transforms want to support those forms anyways. They
all run late anyways, so this is harmless. Similarly, LSR is left alone
because it already carefully manages its forms and doesn't need to get
fused into a single loop pass manager with a bunch of other loop passes.
LoopReroll didn't use loop simplified form previously, and I've updated
the test case to match the trivially different output.
Finally, I've also factored all the pass initialization for the passes
that use this technique as well, so that should be done regularly and
reliably.
Thanks to James for the help reviewing and thinking about this stuff,
and Ben for help thinking about it as well!
Differential Revision: http://reviews.llvm.org/D17435
llvm-svn: 261316
Summary:
Unrolling Analyzer is already pretty complicated, and it becomes harder and harder to exercise it with usual IR tests, as with them we can only check the final decision: whether the loop is unrolled or not. This change factors this framework out from LoopUnrollPass to analyses, which allows to use unit tests.
The change itself is supposed to be NFC, except adding a couple of tests.
I plan to add more tests as I add new functionality and find/fix bugs.
Reviewers: chandlerc, hfinkel, sanjoy
Subscribers: zzheng, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D16623
llvm-svn: 260169
The layering of where the various loop unroll parameters are
initialized and overridden here was very confusing, making it pretty
difficult to tell just how the various sources interacted. Instead, we
put all of the initialization logic together in a single function so
that it's obvious what overrides what.
llvm-svn: 257426
Currently we're unrolling loops more in minsize than in optsize, which
means -Oz will have a larger code size than -Os. That doesn't make any
sense.
This resolves the FIXME about this in LoopUnrollPass and extends the
optsize test to make sure we use the smaller threshold for minsize as
well.
llvm-svn: 257402
As of r255720, the loop pass manager will DTRT when passes update the
loop info for removed loops, so they no longer need to reach into
LPPassManager APIs to do this kind of transformation. This change very
nearly removes the need for the LPPassManager to even be passed into
loop passes - the only remaining pass that uses the LPM argument is
LoopUnswitch.
llvm-svn: 255797
A large number of loop utility functions take a `Pass *` and reach
into it to find out which analyses to preserve. There are a number of
problems with this:
- The APIs have access to pretty well any Pass state they want, so
it's hard to tell what they may or may not do.
- Other APIs have copied these and pass around a `Pass *` even though
they don't even use it. Some of these just hand a nullptr to the API
since the callers don't even have a pass available.
- Passes in the new pass manager don't work like the current ones, so
the APIs can't be used as is there.
Instead, we should explicitly thread the analysis results that we
actually care about through these APIs. This is both simpler and more
reusable.
llvm-svn: 255669
Apart from checking that GlobalVariable is a constant, we should check
that it's not a weak constant, in which case we can't propagate its
value.
llvm-svn: 248327
We only checked that a global is initialized with constants, which is
incorrect. We should be checking that GlobalVariable *is* a constant,
not just initialized with it.
llvm-svn: 247769
GlobalsAA must by definition be preserved in function passes, but the passmanager doesn't know that. Make each pass explicitly preserve GlobalsAA.
llvm-svn: 247263
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
This change adds the unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time.
The "llvm.loop.unroll.enable" is intended to be added for loops annotated with
"#pragma unroll".
llvm-svn: 244466
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
llvm-svn: 243994
through PHI nodes across iterations.
This patch teaches the new advanced loop unrolling heuristics to propagate
constants into the loop from the preheader and around the backedge after
simulating each iteration. This lets us brute force solve simple recurrances
that aren't modeled effectively by SCEV. It also makes it more clear why we
need to process the loop in-order rather than bottom-up which might otherwise
make much more sense (for example, for DCE).
This came out of an attempt I'm making to develop a principled way to account
for dead code in the unroll estimation. When I implemented
a forward-propagating version of that it produced incorrect results due to
failing to propagate *cost* between loop iterations through the PHI nodes, and
it occured to me we really should at least propagate simplifications across
those edges, and it is quite easy thanks to the loop being in canonical and
LCSSA form.
Differential Revision: http://reviews.llvm.org/D11706
llvm-svn: 243900
Summary:
Resolving a branch allows us to ignore blocks that won't be executed, and thus make our estimate more accurate.
This patch is intended to be applied after D10205 (though it could be applied independently).
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10206
llvm-svn: 243084
During estimation of unrolling effect we should be able to propagate
constants through casts.
Differential Revision: http://reviews.llvm.org/D10207
llvm-svn: 242257
Enable runtime unrolling for loops with unroll count metadata ("#pragma unroll N")
and a runtime trip count. Also, do not unroll loops with unroll full metadata if the
loop has a runtime loop count. Previously, such loops would be unrolled with a
very large threshold (pragma-unroll-threshold) if runtime unrolled happened to be
enabled resulting in a very large (and likely unwise) unroll factor.
llvm-svn: 242047
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
Summary:
Using some SCEV functionality helped to entirely remove SCEVCache class and FindConstantPointers SCEV visitor.
Also, this makes the code more universal - I'll take advandate of it in next patches where I start handling additional types of instructions.
Test Plan: Tests would be submitted in subsequent patches.
Reviewers: atrick, chandlerc
Reviewed By: atrick, chandlerc
Subscribers: atrick, llvm-commits
Differential Revision: http://reviews.llvm.org/D10205
llvm-svn: 239282
Summary:
canUnrollCompletely takes `unsigned` values for `UnrolledCost` and
`RolledDynamicCost` but is passed in `uint64_t`s that are silently
truncated. Because of this, when `UnrolledSize` is a large integer
that has a small remainder with UINT32_MAX, LLVM tries to completely
unroll loops with high trip counts.
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10293
llvm-svn: 239218
The new naming is (to me) much easier to understand. Here is a summary
of the new state of the world:
- '*Threshold' is the threshold for full unrolling. It is measured
against the estimated unrolled cost as computed by getUserCost in TTI
(or CodeMetrics, etc). We will exceed this threshold when unrolling
loops where unrolling exposes a significant degree of simplification
of the logic within the loop.
- '*PercentDynamicCostSavedThreshold' is the percentage of the loop's
estimated dynamic execution cost which needs to be saved by unrolling
to apply a discount to the estimated unrolled cost.
- '*DynamicCostSavingsDiscount' is the discount applied to the estimated
unrolling cost when the dynamic savings are expected to be high.
When actually analyzing the loop, we now produce both an estimated
unrolled cost, and an estimated rolled cost. The rolled cost is notably
a dynamic estimate based on our analysis of the expected execution of
each iteration.
While we're still working to build up the infrastructure for making
these estimates, to me it is much more clear *how* to make them better
when they have reasonably descriptive names. For example, we may want to
apply estimated (from heuristics or profiles) dynamic execution weights
to the *dynamic* cost estimates. If we start doing that, we would also
need to track the static unrolled cost and the dynamic unrolled cost, as
only the latter could reasonably be weighted by profile information.
This patch is sadly not without functionality change for the new unroll
analysis logic. Buried in the heuristic management were several things
that surprised me. For example, we never subtracted the optimized
instruction count off when comparing against the unroll heursistics!
I don't know if this just got lost somewhere along the way or what, but
with the new accounting of things, this is much easier to keep track of
and we use the post-simplification cost estimate to compare to the
thresholds, and use the dynamic cost reduction ratio to select whether
we can exceed the baseline threshold.
The old values of these flags also don't necessarily make sense. My
impression is that none of these thresholds or discounts have been tuned
yet, and so they're just arbitrary placehold numbers. As such, I've not
bothered to adjust for the fact that this is now a discount and not
a tow-tier threshold model. We need to tune all these values once the
logic is ready to be enabled.
Differential Revision: http://reviews.llvm.org/D9966
llvm-svn: 239164
lazily built.
Also, make it a much more generic SCEV cache, which today exposes only
a reduced GEP model description but could be extended in the future to
do other profitable caching of SCEV information.
llvm-svn: 238124
accumulating estimated cost, and other loop-centric logic from the logic
used to analyze instructions in a particular iteration.
This makes the visitor very narrow in scope -- all it does is visit
instructions, update a map of simplified values, and return whether it
is able to optimize away a particular instruction.
The two cost metrics are now returned as an optional struct. When the
optional is left unengaged, there is no information about the unrolled
cost of the loop, when it is engaged the cost metrics are available to
run against the thresholds.
No functionality changed.
llvm-svn: 238033
simplified model for use simulating each iteration into a separate
helper function that just returns the cache.
Building this cache had nothing to do with the rest of the unroll
analysis and so this removes an unnecessary coupling, etc. It should
also make it easier to think about the concept of providing fast cached
access to basic SCEV models as an orthogonal concept to the overall
unroll simulation.
I'd really like to see this kind of caching logic folded into SCEV
itself, it seems weird for us to provide it at this layer rather than
making repeated queries into SCEV fast all on their own.
No functionality changed.
llvm-svn: 237993
a single location.
This reduces code duplication a bit and will also pave the way for
a better separation between the visitation algorithm and the unroll
analysis.
No functionality changed.
llvm-svn: 237990
checking and make the cache faster and smaller.
I had thought that using an APInt here would be useful, but I think
I was just wrong. Notably, we don't have to do any fancy overflow
checking, we can just bound the values as quite small and do the math in
a higher precision integer. I've switched to a signed integer so that
UBSan will even point out if we ever have integer overflow. I've added
various asserts to try to catch things as well and hoisted the overflow
checks so that we just leave the too-large offsets out of the SCEV-GEP
cache. This makes the value in the cache quite a bit smaller which is
probably worthwhile.
No functionality changed here (for trip counts under 1 billion).
llvm-svn: 237209
Summary:
This patch reimplements heuristic that tries to estimate optimization beneftis
from complete loop unrolling.
In this patch I kept the minimal changes - e.g. I removed code handling
branches and folding compares. That's a promising area, but now there
are too many questions to discuss before we can enable it.
Test Plan: Tests are included in the patch.
Reviewers: hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8816
llvm-svn: 237156
Summary:
Runtime unrolling of loops needs to emit an expression to compute the
loop's runtime trip-count. Avoid runtime unrolling if this computation
will be expensive.
Depends on D8993.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8994
llvm-svn: 234846
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and iteration number is relatively large enough.
For some loops, we know they are not worth to be runtime unrolled.
The scalar loop from vectorization is one of the cases.
llvm-svn: 231631
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
llvm-svn: 229202
The issues with the new unroll analyzer are more fundamental than code
cleanup, algorithm, or data structure changes. I've sent an email to the
original commit thread with details and a proposal for how to redesign
things. I'm disabling this for now so that we don't spend time
debugging issues with it in its current state.
llvm-svn: 229064
UnrollAnalyzer.
Now they share a single worklist and have less implicit state between
them. There was no real benefit to separating these two things out.
I'm going to subsequently refactor things to share even more code.
llvm-svn: 229062
contained in it each time we try to add it to the worklist, just check
this when pulling it off the worklist. That way we do it at most once
per instruction with the cost of the worklist set we would need to pay
anyways.
llvm-svn: 229060
vector.
In addition to dramatically reducing the work required for contrived
example loops, this also has to correct some serious latent bugs in the
cost computation. Previously, we might add an instruction onto the
worklist once for every load which it used and was simplified. Then we
would visit it many times and accumulate "savings" each time.
I mean, fortunately this couldn't matter for things like calls with 100s
of operands, but even for binary operators this code seems like it must
be double counting the savings.
I just noticed this by inspection and due to the runtime problems it can
introduce, I don't have any test cases for cases where the cost produced
by this routine is unacceptable.
llvm-svn: 229059
In the unroll analyzer, it is checking each user to see if that user
will become dead. However, it first checked if that user was missing
from the simplified values map, and then if was also missing from the
dead instructions set. We add everything from the simplified values map
to the dead instructions set, so the first step is completely subsumed
by the second. Moreover, the first step requires *inserting* something
into the simplified value map which isn't what we want at all.
This also replaces a dyn_cast with a cast as an instruction cannot be
used by a non-instruction.
llvm-svn: 229057
check.
Also hoist this into the enqueue process as it is faster even than
testing the worklist set, we should just directly filter these out much
like we filter out constants and such.
llvm-svn: 229056
We don't just want to handle duplicate operands within an instruction,
but also duplicates across operands of different instructions. I should
have gone straight to this, but I had convinced myself that it wasn't
going to be necessary briefly. I've come to my senses after chatting
more with Nick, and am now happier here.
llvm-svn: 229054
reasonably quickly.
I don't have a reduced test case, but for a version of FFMPEG, this
makes the loop unroller start finishing at all (after over 15 minutes of
running, it hadn't terminated for me, no idea if it was a true infloop
or just exponential work).
The key thing here is to check the DeadInstructions set when pulling
things off the worklist. Without this, we would re-walk the user list of
already dead instructions again and again and again. Consider phi nodes
with many, many operands and other patterns.
The other important aspect of this is that because we would keep
re-visiting instructions that were already known dead, we kept adding
their cost savings to this! This would cause our cost savings to be
*insanely* inflated from this.
While I was here, I also rotated the operand walk out of the worklist
loop to make the code easier to read. There is still work to be done to
minimize worklist traffic because we don't de-duplicate operands. This
means we may add the same instruction onto the worklist 1000s of times
if it shows up in 1000s of operansd to a PHI node for example.
Still, with this patch, the ffmpeg testcase I have finishes quickly and
I can't measure the runtime impact of the unroll analysis any more. I'll
probably try to do a few more cleanups to this code, but not sure how
much cleanup I can justify right now.
llvm-svn: 229038
readable.
The biggest thing that was causing me problems is recognizing the
references vs. poniters here. I also found that for maps naming the loop
variable as KeyValue helps make it obvious why you don't actually use it
directly. Finally, using 'auto' instead of 'User *' doesn't seem like
a good tradeoff. Much like with the other cases, I like to know its
a pointer, and 'User' is just as long and tells the reader a lot more.
llvm-svn: 229033
hard to type and read for me, and is inconsistent with the other
abbreviation in the base class "Inst". For most of these (where they are
used widely) I prefer just spelling it out as Instruction. I've changed
two of the short-lived variables to use "Inst" to match the base class.
llvm-svn: 229028
This is much more efficient. In particular, the query with the user
instruction has to insert a false for every missing instruction into the
set. This is just a cleanup a long the way to fixing the underlying
algorithm problems here.
llvm-svn: 228994
When we try to estimate number of potentially removed instructions in
loop unroller, we analyze first N iterations and then scale the
computed number by TripCount/N. We should bail out early if N is 0.
llvm-svn: 228988
If complete-unroll could help us to optimize away N% of instructions, we
might want to do this even if the final size would exceed loop-unroll
threshold. However, we don't want to unroll huge loop, and we are add
AbsoluteThreshold to avoid that - this threshold will never be crossed,
even if we expect to optimize 99% instructions after that.
llvm-svn: 228434
It is a variation of SimplifyBinOp, but it takes into account
FastMathFlags.
It is needed in inliner and loop-unroller to accurately predict the
transformation's outcome (previously we dropped the flags and were too
conservative in some cases).
Example:
float foo(float *a, float b) {
float r;
if (a[1] * b)
r = /* a lot of expensive computations */;
else
r = 1;
return r;
}
float boo(float *a) {
return foo(a, 0.0);
}
Without this patch, we don't inline 'foo' into 'boo'.
llvm-svn: 228432
Complete loop unrolling can make some loads constant, thus enabling a
lot of other optimizations. To catch such cases, we look for loads that
might become constants and estimate number of instructions that would be
simplified or become dead after substitution.
Example:
Suppose we have:
int a[] = {0, 1, 0};
v = 0;
for (i = 0; i < 3; i ++)
v += b[i]*a[i];
If we completely unroll the loop, we would get:
v = b[0]*a[0] + b[1]*a[1] + b[2]*a[2]
Which then will be simplified to:
v = b[0]* 0 + b[1]* 1 + b[2]* 0
And finally:
v = b[1]
llvm-svn: 228265
Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*.
Test Plan: no regression
Reviewers: mkuper
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D7327
llvm-svn: 227853
getTTI method used to get an actual TTI object.
No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.
The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.
llvm-svn: 227730
Summary:
CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA
driver from unrolling a loop marked with llvm.loop.unroll.disable is not
unrolled by CUDA driver, we need to emit .pragma "nounroll" at the
header of that loop.
This patch also extracts getting unroll metadata from loop ID metadata
into a shared helper function.
Test Plan: test/CodeGen/NVPTX/nounroll.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D7041
llvm-svn: 227703
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.
This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.
llvm-svn: 226373
When we compute the size of a loop, we include the branch on the backedge and
the comparison feeding the conditional branch. Under normal circumstances,
these don't get replicated with the rest of the loop body when we unroll. This
led to the somewhat surprising behavior that really small loops would not get
unrolled enough -- they could be unrolled more and the resulting loop would be
below the threshold, because we were assuming they'd take
(LoopSize * UnrollingFactor) instructions after unrolling, instead of
(((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation.
llvm-svn: 225565
a cache of assumptions for a single function, and an immutable pass that
manages those caches.
The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating an separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.
Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.
For now, this adds boiler plate to the various passes, but this kind of
boiler plate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.
llvm-svn: 225131
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532. Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.
I have a follow-up patch prepared for `clang`. If this breaks other
sub-projects, I apologize in advance :(. Help me compile it on Darwin
I'll try to fix it. FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.
This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.
Here's a quick guide for updating your code:
- `Metadata` is the root of a class hierarchy with three main classes:
`MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from
the `Value` class hierarchy. It is typeless -- i.e., instances do
*not* have a `Type`.
- `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
- `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.
If you're referring solely to resolved `MDNode`s -- post graph
construction -- just use `MDNode*`.
- `MDNode` (and the rest of `Metadata`) have only limited support for
`replaceAllUsesWith()`.
As long as an `MDNode` is pointing at a forward declaration -- the
result of `MDNode::getTemporary()` -- it maintains a side map of its
uses and can RAUW itself. Once the forward declarations are fully
resolved RAUW support is dropped on the ground. This means that
uniquing collisions on changing operands cause nodes to become
"distinct". (This already happened fairly commonly, whenever an
operand went to null.)
If you're constructing complex (non self-reference) `MDNode` cycles,
you need to call `MDNode::resolveCycles()` on each node (or on a
top-level node that somehow references all of the nodes). Also,
don't do that. Metadata cycles (and the RAUW machinery needed to
construct them) are expensive.
- An `MDNode` can only refer to a `Constant` through a bridge called
`ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).
As a side effect, accessing an operand of an `MDNode` that is known
to be, e.g., `ConstantInt`, takes three steps: first, cast from
`Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
third, cast down to `ConstantInt`.
The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
metadata schema owners transition away from using `Constant`s when
the type isn't important (and they don't care about referring to
`GlobalValue`s).
In the meantime, I've added transitional API to the `mdconst`
namespace that matches semantics with the old code, in order to
avoid adding the error-prone three-step equivalent to every call
site. If your old code was:
MDNode *N = foo();
bar(isa <ConstantInt>(N->getOperand(0)));
baz(cast <ConstantInt>(N->getOperand(1)));
bak(cast_or_null <ConstantInt>(N->getOperand(2)));
bat(dyn_cast <ConstantInt>(N->getOperand(3)));
bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
you can trivially match its semantics with:
MDNode *N = foo();
bar(mdconst::hasa <ConstantInt>(N->getOperand(0)));
baz(mdconst::extract <ConstantInt>(N->getOperand(1)));
bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2)));
bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3)));
bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
and when you transition your metadata schema to `MDInt`:
MDNode *N = foo();
bar(isa <MDInt>(N->getOperand(0)));
baz(cast <MDInt>(N->getOperand(1)));
bak(cast_or_null <MDInt>(N->getOperand(2)));
bat(dyn_cast <MDInt>(N->getOperand(3)));
bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
- A `CallInst` -- specifically, intrinsic instructions -- can refer to
metadata through a bridge called `MetadataAsValue`. This is a
subclass of `Value` where `getType()->isMetadataTy()`.
`MetadataAsValue` is the *only* class that can legally refer to a
`LocalAsMetadata`, which is a bridged form of non-`Constant` values
like `Argument` and `Instruction`. It can also refer to any other
`Metadata` subclass.
(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)
llvm-svn: 223802
routines and fix all of the bugs they expose.
I hit a test case that crashed even without these asserts due to passing
a non-exiting latch to the ExitingBlock parameter of the trip count
computation machinery. However, when I add the nice asserts, it turns
out we have plenty of coverage of these bugs, they just didn't manifest
in crashers.
The core problem seems to stem from an assumption that the latch *is*
the exiting block. While this is often true, and somewhat the "normal"
way to think about loops, it isn't necessarily true. The correct way to
call the trip count routines in a *generic* fashion (that is, without
a particular exit in mind) is to just use the loop's single exiting
block if it has one. The trip count can't be computed generically unless
it does. This works great for the loop vectorizer. The loop unroller
actually *wants* to select the latch when it has to chose between
multiple exits because for unrolling it is the latch trips that matter.
But if this is the desire, it needs to explicitly guard for non-exiting
latches and check for the generic trip count in that case.
I've added the asserts, and added convenience APIs for querying the trip
count generically that check for a single exit block. I've kept the APIs
consistent between computing trip count and trip multiples.
Thansk to Mark for the help debugging and tracking down the *right* fix
here!
llvm-svn: 219550
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.
No functional change.
llvm-svn: 218004
This adds a set of utility functions for collecting 'ephemeral' values. These
are LLVM IR values that are used only by @llvm.assume intrinsics (directly or
indirectly), and thus will be removed prior to code generation, implying that
they should be considered free for certain purposes (like inlining). The
inliner's cost analysis, and a few other passes, have been updated to account
for ephemeral values using the provided functionality.
This functionality is important for the usability of @llvm.assume, because it
limits the "non-local" side-effects of adding llvm.assume on inlining, loop
unrolling, etc. (these are hints, and do not generate code, so they should not
directly contribute to estimates of execution cost).
llvm-svn: 217335
This adds an immutable pass, AssumptionTracker, which keeps a cache of
@llvm.assume call instructions within a module. It uses callback value handles
to keep stale functions and intrinsics out of the map, and it relies on any
code that creates new @llvm.assume calls to notify it of the new instructions.
The benefit is that code needing to find @llvm.assume intrinsics can do so
directly, without scanning the function, thus allowing the cost of @llvm.assume
handling to be negligible when none are present.
The current design is intended to be lightweight. We don't keep track of
anything until we need a list of assumptions in some function. The first time
this happens, we scan the function. After that, we add/remove @llvm.assume
calls from the cache in response to registration calls and ValueHandle
callbacks.
There are no new direct test cases for this pass, but because it calls it
validation function upon module finalization, we'll pick up detectable
inconsistencies from the other tests that touch @llvm.assume calls.
This pass will be used by follow-up commits that make use of @llvm.assume.
llvm-svn: 217334
hint) the loop unroller replaces the llvm.loop.unroll.count metadata with
llvm.loop.unroll.disable metadata to prevent any subsequent unrolling
passes from unrolling more than the hint indicates. This patch fixes
an issue where loop unrolling could be disabled for other loops as well which
share the same llvm.loop metadata.
llvm-svn: 213900
[LLVM part]
These patches rename the loop unrolling and loop vectorizer metadata
such that they have a common 'llvm.loop.' prefix. Metadata name
changes:
llvm.vectorizer.* => llvm.loop.vectorizer.*
llvm.loopunroll.* => llvm.loop.unroll.*
This was a suggestion from an earlier review
(http://reviews.llvm.org/D4090) which added the loop unrolling
metadata.
Patch by Mark Heffernan.
llvm-svn: 211710
[This is resubmitting r210721, which was reverted due to suspected breakage
which turned out to be unrelated].
Some extra review comments were addressed. See D4090 and D4147 for more details.
The Clang change that produces this metadata was committed in r210667
Patch by Mark Heffernan.
llvm-svn: 211076
See http://reviews.llvm.org/D4090 for more details.
The Clang change that produces this metadata was committed in r210667
Patch by Mark Heffernan.
llvm-svn: 210721