This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Using these queries with a context instruction and without a cache
seems to be about 2x slower than with it so this theoretically
improves compile time.
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
Teach the unroller(s) how to handle an invalid cost. This avoids crashes when the backend can't provide a cost due to either a fundemental limitation or an unimplemented cost model case.
Differential Revision: https://reviews.llvm.org/D127305
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.
On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.
I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.
Differential Revision: https://reviews.llvm.org/D127131
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.
This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.
Reviewed by: nikic, MaskRay, RKSimon
Differential Revision: https://reviews.llvm.org/D126550
Similar to c515b2f39e, If there are no loops in the function as seen
through LI, we should avoid computing the remaining expensive analyses
(such as SCEV, BPI). Reordered the analyses requests and early return
if there are no loops.
The logic of avoiding expensive analyses is applied to LoopVectorizer,
LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate
on loops.
This is an NFC with compile time improvement.
Differential Revision: https://reviews.llvm.org/D124529
IMO when user provide unroll pragma, compiler should always respect it.
It is not clear to me why loop unroll pass currently ensure that the
unrolled loop size is limited by PragmaUnrollThreshold.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D119148
Cleanup code in peelLoop API. We already have usage of DT without guarding
against a null DT, so this change constant folds the remaining null DT
checks.
Also make the argument a reference so that it is clear the argument is
a nonnull DT.
Extracted from D118472.
If a loop isn't forced to be unrolled, we want to avoid unrolling it when there
is an explicit unroll-and-jam pragma. This is to prevent automatic unrolling
from interfering with the user requested transformation.
Differential Revision: https://reviews.llvm.org/D114886
5c77aa2b91 [unroll] Use early return in shouldFullUnroll [nfc]
wasn't quite NFC since !(x <= y) is x > y rather than x >= y
Credit to Justin Bogner for spotting the bug
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D114894
This change should be NFC. It's posted for review mostly to make sure others are happy with the names I'm introducing for "exact full unroll" and "bounded full unroll". The motivation here is that our cost model for bounded unrolling is too aggressive - it gives benefits for exits we aren't going to prune - but I also just think the new version of the code is a lot easier to follow.
Differential Revision: https://reviews.llvm.org/D114453
These variables are not out-params, and we immediately return after assigning them. Thus, the assignments are dead and just confusing.
I believe these used to be out-params, but they're not any more.
This patch adds a new cost heuristic that allows peeling a single
iteration off read-only loops, if the loop contains a load that
1. is feeding an exit condition,
2. dominates the latch,
3. is not already known to be dereferenceable,
4. and has a loop invariant address.
If all non-latch exits are terminated with unreachable, such loads
in the loop are guaranteed to be dereferenceable after peeling,
enabling hoisting/CSE'ing them.
This enables vectorization of loops with certain runtime-checks, like
multiple calls to `std::vector::at` if the vector is passed as pointer.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D108114
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.
Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.
The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).
LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch
Differential Revision: https://reviews.llvm.org/D109310
Decoupling the unrolling logic into three different functions. The shouldPragmaUnroll() covers the 1st and 2nd priorities of the previous code, the shouldFullUnroll() covers the 3rd, and the shouldPartialUnroll() covers the 5th. The output of each function, Optional<unsigned>, could be a value for UP.Count, which means unrolling factor has been set, or None, which means decision hasn't been made yet and should try the next priority.
Reviewed By: mtrofin, jdoerfert
Differential Revision: https://reviews.llvm.org/D106001
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D107271
This aligns the multiple exit costing with all the other cost decisions. Note that UnrollAndJam, which is the only other caller of the original home of this code, unconditionally bails out of multiple exit loops.
As these are no longer passed to UnrollLoop(), there is no need to
modify them in computeUnrollCount(). Make them non-reference parameters.
Differential Revision: https://reviews.llvm.org/D104590
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as priviledged when it comes to
full unrolling.
The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.
Differential Revision: https://reviews.llvm.org/D102982
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.
Differential Revision: https://reviews.llvm.org/D104487
Loop peeling is currently performed as part of UnrollLoop().
Outside test scenarios, it is always performed with an unroll
count of 1. This means that unrolling doesn't actually do anything
apart from performing post-unroll simplification.
When testing, it's currently possible to specify both an explicit
peel count and an explicit unroll count. This doesn't perform any
sensible operation and may result in miscompiles, see
https://bugs.llvm.org/show_bug.cgi?id=45939.
This patch moves peeling from UnrollLoop() into tryToUnrollLoop(),
so that peeling does not also perform a susequent unroll. We only
run the post-unroll simplifications. Specifying both an explicit
peel count and unroll count is forbidden.
In the future, we may want to support both (non-PGO) peeling a
loop and unrolling it, but this needs to be done by first performing
the peel and then recalculating unrolling heuristics on a now
possibly analyzable loop.
Differential Revision: https://reviews.llvm.org/D103362
This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed.
The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile.
Differential Revision: https://reviews.llvm.org/D103620
This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly.
Differential Revision: https://reviews.llvm.org/D103584
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.
By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.
Differential Revision: https://reviews.llvm.org/D102934
Loop peeling removes conditions from loop bodies that become invariant
after a small number of iterations. When triggered, this leads to fewer
compares and possibly PHIs in loop bodies, enabling further
optimizations. The current cost-model of loop peeling should be quite
conservative/safe, i.e. only peel if a condition in the loop becomes
known after peeling.
For example, see PR47671, where loop peeling enables vectorization by
removing a PHI the vectorizer does not understand. Granted, the
loop-vectorizer could also be taught about constant PHIs, but loop
peeling is likely to enable other optimizations as well.
This has an impact on quite a few benchmarks from
MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example
Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsVectorized
Program base patch diff
test-suite...ve-susan/automotive-susan.test 8.00 9.00 12.5%
test-suite...nal/skidmarks10/skidmarks.test 35.00 31.00 -11.4%
test-suite...lications/sqlite3/sqlite3.test 41.00 43.00 4.9%
test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test 25.00 26.00 4.0%
test-suite...006/450.soplex/450.soplex.test 88.00 89.00 1.1%
test-suite...TimberWolfMC/timberwolfmc.test 120.00 119.00 -0.8%
test-suite.../CINT2006/403.gcc/403.gcc.test 215.00 216.00 0.5%
test-suite...006/447.dealII/447.dealII.test 957.00 958.00 0.1%
test-suite...ternal/HMMER/hmmcalibrate.test 75.00 75.00 0.0%
Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsAnalyzed
Program base patch diff
test-suite...ks/Prolangs-C/agrep/agrep.test 440.00 434.00 -1.4%
test-suite...nal/skidmarks10/skidmarks.test 312.00 308.00 -1.3%
test-suite...marks/7zip/7zip-benchmark.test 6399.00 6323.00 -1.2%
test-suite...lications/minisat/minisat.test 134.00 135.00 0.7%
test-suite...rks/FreeBench/pifft/pifft.test 295.00 297.00 0.7%
test-suite...TimberWolfMC/timberwolfmc.test 1879.00 1869.00 -0.5%
test-suite...pplications/treecc/treecc.test 689.00 691.00 0.3%
test-suite...T2000/300.twolf/300.twolf.test 1593.00 1597.00 0.3%
test-suite.../Benchmarks/Bullet/bullet.test 1394.00 1392.00 -0.1%
test-suite...ications/JM/ldecod/ldecod.test 1431.00 1429.00 -0.1%
test-suite...6/464.h264ref/464.h264ref.test 2229.00 2230.00 0.0%
test-suite...lications/sqlite3/sqlite3.test 2590.00 2589.00 -0.0%
test-suite...ications/JM/lencod/lencod.test 2732.00 2733.00 0.0%
test-suite...006/453.povray/453.povray.test 3395.00 3394.00 -0.0%
Note the -11% regression in number of loops vectorized for skidmarks. I
suspect this corresponds to the fact that those loops are gone now (see
the reduction in number of loops analyzed by LV).
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D88471