When unrolling, the exit values in LCSSA phis will get updated.
Invalidate cached SCEV values for those phis in case SCEV looked through
a exit phi.
Fixes#58340.
After unrolling a loop, the block and loop dispositions need to be
cleared. As we don't know which SCEVs in the loop/blocks may be
impacted, completely clear the cache. This should also fix some cases
where deleted loops remained in the LoopDispositions cache.
This fixes a verification failure surfaced by D134531.
I am planning on reviewing/updating the existing uses of
forgetLoopDispositions to check if they should be replaced by
forgetBlockAndLoopDispositions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134612
This patch replaces calls to GreatestCommonDivisor64 with std::gcd
where both arguments are known to be of unsigned types no larger than
64 bits in size.
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
I am suspecting a bug around updates of loop info for unreachable exits, but don't have a test case. Running this locally on make check didn't reveal anything, we'll see if the expensive checks bots find it.
If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead.
The change itself is pretty straight forward, but let me add two points of context:
* I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying.
* I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure.
Differential Revision: https://reviews.llvm.org/D116496
The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.
With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.
However, the real motivation for this change isn't performance. Its readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.
Differential Revision: https://reviews.llvm.org/D104487
Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and
debug code. For debug code, print information about all exits.
For optimization remarks, only include the unroll count and the
type of unroll (complete, partial or runtime), but omit detailed
information about exit folding, now that more than one exit may
be folded.
Differential Revision: https://reviews.llvm.org/D104482
Fold all exits based on known trip count/multiple information from
SCEV. Previously only the latch exit or the single exit were folded.
This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple
entirely: They're still used to a) decide whether runtime unrolling
should be performed and b) for ORE remarks. However, the core
unrolling logic is independent of them now.
Differential Revision: https://reviews.llvm.org/D104203
Unrolling with more iterations than MaxTripCount is pointless, as
those iterations can never be executed. As such, we clamp ULO.Count
to MaxTripCount if it is known. This means we no longer need to
consider iterations after MaxTripCount for exit folding, and the
CompletelyUnroll flag becomes independent of ULO.TripCount.
Differential Revision: https://reviews.llvm.org/D103748
Loop peeling is currently performed as part of UnrollLoop().
Outside test scenarios, it is always performed with an unroll
count of 1. This means that unrolling doesn't actually do anything
apart from performing post-unroll simplification.
When testing, it's currently possible to specify both an explicit
peel count and an explicit unroll count. This doesn't perform any
sensible operation and may result in miscompiles, see
https://bugs.llvm.org/show_bug.cgi?id=45939.
This patch moves peeling from UnrollLoop() into tryToUnrollLoop(),
so that peeling does not also perform a susequent unroll. We only
run the post-unroll simplifications. Specifying both an explicit
peel count and unroll count is forbidden.
In the future, we may want to support both (non-PGO) peeling a
loop and unrolling it, but this needs to be done by first performing
the peel and then recalculating unrolling heuristics on a now
possibly analyzable loop.
Differential Revision: https://reviews.llvm.org/D103362
This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed.
The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile.
Differential Revision: https://reviews.llvm.org/D103620
This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly.
Differential Revision: https://reviews.llvm.org/D103584
This cleans up the unroll action into two phases. Phase 1 does the mechanical act of unrolling, and leaves all conditional branches in place. Phase 2 optimizes away some of the conditional branches and then simplifies the loop. The primary benefit of the reordering is that we can delete some special cases dom tree update logic.
Differential Revision: https://reviews.llvm.org/D103561
When fulling unrolling with a non-latch exit, the latch block is
folded to unreachable. Replace this folding with the existing
changeToUnreachable() helper, rather than performing it manually.
This also moves the fold to happen after the manual DT update
for exit blocks. I believe this is correct in that the conversion
of an unconditional backedge into unreachable should not affect
the DT at all.
Differential Revision: https://reviews.llvm.org/D103340
This does some non-functional cleanup of exit folding during
unrolling. The two main changes are:
* First rewrite latch->header edges, which is unrelated to exit
folding.
* Combine folding for latch and non-latch exits. After the
previous change, the only difference in their logic is that
for non-latch exits we currently only fold "known non-exit"
cases, but not "known exit" cases.
I think this helps a lot to clarify this code and prepare it for
future changes.
Differential Revision: https://reviews.llvm.org/D103333
Turns out simplifyLoopIVs sometimes returns a non-dead instruction in it's DeadInsts out param. I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed it's documentation. I'm simplfy dropping that part of the change.
Commit message from try 3:
Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :)
Original commit message:
The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
This patch implements first part of Flow Sensitive SampleFDO (FSAFDO).
It has the following changes:
(1) disable current discriminator encoding scheme,
(2) new hierarchical discriminator for FSAFDO.
For this patch, option "-enable-fs-discriminator=true" turns on the new
functionality. Option "-enable-fs-discriminator=false" (the default)
keeps the current SampleFDO behavior. When the fs-discriminator is
enabled, we insert a flag variable, namely, llvm_fs_discriminator, to
the object. This symbol will checked by create_llvm_prof tool, and used
to generate a profile with FS-AFDO discriminators enabled. If this
happens, for an extbinary format profile, create_llvm_prof tool
will add a flag to profile summary section.
Differential Revision: https://reviews.llvm.org/D102246
Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :)
The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.
In the cloning infrastructure, only track an MDNode mapping,
without explicitly storing the Metadata mapping, same as is done
during inlining. This makes things slightly simpler.
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patched (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed.
Notes:
- it also includes changes and tests from D90104
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D92887
Loop peeling as a last step triggers loop simplification and this
can change the loop structure. As a result all cashed values like
latch branch becomes invalid.
Patch re-structure the code to take into account the possible
changes caused by peeling.
Reviewers: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour
Reviewed By: Meinersbur, fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D93686
Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling.
The reason for this change is that Loop Peeling is no longer only being used by
loop unrolling; Patch D82927 introduces loop peeling with fusion, such that
loops can be modified to have to same trip count, making them legal to be
peeled.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D83056
Summary:
Avoid exposing details about how children are stored. This will enable
subsequent type-erasure changes.
New methods are introduced to cover common access patterns.
Change-Id: Idb5f4b1b9c84e4cc71ddb39bb52a388682f5674f
Reviewers: arsenm, RKSimon, mehdi_amini, courbet
Subscribers: qcolombet, sdardis, wdng, hiraditya, jrtc27, zzheng, atanasyan, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83083
is not necessary one of them.
Summary: Currently LoopUnrollPass already allow loops with multiple
exiting blocks, but it is only allowed when the loop latch is one of the
exiting blocks.
When the loop latch is not an exiting block, then only single exiting
block is supported.
When possible, the single loop latch or the single exiting block
terminator is optimized to an unconditional branch in the unrolled loop.
This patch allows loops with multiple exiting blocks even if the loop
latch is not one of them. However, the optimization of exiting block
terminator to unconditional branch is not done when there exists more
than one exiting block.
Reviewer: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour
Reviewed By: efriedma
Subscribers: hiraditya, zzheng, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D81053
I think the current code dealing with connecting the unrolled iterations
is a bit more complicated than necessary currently. To connect the
unrolled iterations, we have to update the unrolled latch blocks to
branch to the header of the next unrolled iteration.
We need to do this regardless whether the latch is exiting or not.
Additionally, we try to turn the conditional branch in the exiting block
to an unconditional one. This is an optimization only; alternatively we
could leave the conditional branches in place and rely on other passes
to simplify the conditions.
Logically, this is a separate step from connecting the latches to the
headers, but it is convenient to fold them into the same loop, if the
latch is also exiting. For headers (or other non-latch exiting blocks,
this is done separately).
Hopefully the patch with additional comments makes things a bit clearer.
Reviewers: efriedma, dmgreen, hfinkel, Whitney
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D80544
Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73704