After the IV has been widened, flattening may still not be possible, e.g. when
it is deemed unprofitable. We were not checking for this properly, so
flattening was applied when it should not have been, which led to incorrect
results (miscompilation).
This should fix PR51980 (https://bugs.llvm.org/show_bug.cgi?id=51980)
Differential Revision: https://reviews.llvm.org/D110712
rG6a076fa9539e introduced a problem with updating the old/narrow phi nodes
after IV widening. If the transformation is *not* applied after the IV has been
widened, the narrow phi node was still modified, which should only happen when
flattening actually takes place. This can be seen in the added test
widen-iv2.ll, where the narrow phi incorrectly had 1 incoming value instead of
its original 2 incoming values; these are now restored.
Differential Revision: https://reviews.llvm.org/D110234
LoopFlatten wasn't triggering on this motivating case after IV widening:
  void foo(int *A, int N, int M) {
    for (int i = 0; i < N; ++i)
      for (int j = 0; j < M; ++j)
        f(A[i*M+j]);
  }
The reason was that the old induction phi nodes were getting in the way. These
narrow and dead induction phis are not always trivially dead, and having both
the narrow and wide IVs confused the analysis and caused it to bail. This adds
some extra bookkeeping for these old phis, so we can filter them out when
checks on phi nodes are performed. Other clean up passes will get rid of these
old phis and increment instructions.
As this was one of the motivating examples from the beginning, it was
surprising this wasn't triggering from C/C++ code. It looks like the IR and CFG
are just slightly different.
Differential Revision: https://reviews.llvm.org/D109309
There is an assertion failure in computeOverflowForUnsignedMul
(used in checkOverflow) due to the inner and outer trip counts
having different types. This occurs when the IV has been widened,
but the loop components are not successfully rediscovered.
This is fixed by some refactoring of the code in findLoopComponents
which identifies the trip count of the loop.
Differential Revision: https://reviews.llvm.org/D108107
There is an assertion failure in computeOverflowForUnsignedMul
(used in checkOverflow) due to the inner and outer trip counts
having different types. This occurs when the IV has been widened,
but the loop components are not successfully rediscovered.
This is fixed by some refactoring of the code in findLoopComponents
which identifies the trip count of the loop.
When the limit of the inner loop is a known integer, the InstCombine
pass now causes the transformation e.g. icmp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.
Differential Revision: https://reviews.llvm.org/D105802
When the trip count of the inner loop is a constant, the InstCombine
pass now causes the transformation e.g. icmp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.
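As a rough illustration of why the two compare forms are interchangeable, here
is a minimal C sketch. The function names are hypothetical and this is not the
pass's code; it only models the two icmp forms with unsigned 32-bit values and
assumes that neither j + step nor trip_count - step wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original exit test: compare the incremented IV (%inc) with the trip count. */
bool cmp_on_increment(uint32_t j, uint32_t step, uint32_t trip_count) {
    return j + step < trip_count;
}

/* Rewritten exit test: compare the IV itself (%j) with tripcount - step. */
bool cmp_on_iv(uint32_t j, uint32_t step, uint32_t trip_count) {
    return j < trip_count - step;
}

/* Check that both forms agree for every j in [0, trip_count].
   Assumes trip_count >= step and that trip_count + step fits in 32 bits. */
bool forms_agree(uint32_t step, uint32_t trip_count) {
    for (uint32_t j = 0; j <= trip_count; ++j)
        if (cmp_on_increment(j, step, trip_count) != cmp_on_iv(j, step, trip_count))
            return false;
    return true;
}
```

Without the no-wrap assumption the forms differ: with trip_count = 0 and
step = 1, trip_count - step wraps to a large value, so the second form is true
for j = 0 while the first is false.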
Differential Revision: https://reviews.llvm.org/D105802
The SCEV method getBackedgeTakenCount() returns a SCEVCouldNotCompute
object if the backedge-taken count is unpredictable. This fix ensures
there is no longer an attempt to use such an object to find the trip
count.
Patch by: Rosie Sumpter.
Differential Revision: https://reviews.llvm.org/D106970
Replace pattern-matching with existing SCEV and Loop APIs as a more
robust way of identifying the loop increment and trip count. Also
rename 'Limit' as 'TripCount' to be consistent with terminology.
Differential Revision: https://reviews.llvm.org/D106580
The loop flattening pass requires loops to be in simplified form. If the
loops are not in simplified form, the pass cannot operate. This patch
simplifies all loops before flattening. As a result, all loops will be
simplified regardless of whether anything ends up being flattened.
This change was inspired by observing a certain loop that was not flattened
because the loops were not in simplified form. This loop is added as a
test to verify that it is now flattened.
Differential Revision: https://reviews.llvm.org/D102249
Change-Id: I45bcabe70fb99b0d89f0effafc82eb9e0585ec30
The `InductionPHI` is not necessarily the increment instruction, as
demonstrated in pr49571.ll.
This patch removes the assertion and instead bails out from the
`LoopFlatten` pass if that happens.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49571
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D99252
The -loop-flatten legacy pass preserves loop analyses. The legacy PM checks
that every pass that preserves loop analyses also preserves LCSSA, which
implicitly involves running -loop-simplify. The test shouldn't depend on
verify flags being set in order to run -loop-simplify, so add it explicitly.
Otherwise the new PM ends up not running it.
I disabled the widening in fa5cb4b because it ran into an assert, which was
related to replacing values with different types. I forgot that an extend could
also be a zero-extend, which I have added now. This means that the approach now
is to create and insert a trunc value of the outerloop for each user, and use
that to replace IV values.
Differential Revision: https://reviews.llvm.org/D91690
Widen the IV to the widest available and legal integer type, which makes this
transformation always safe so that we can skip overflow checks.
Motivation is to let this pass trigger on 64-bit targets too, and this is the
last patch in a series to achieve this: D90402 moves pass LoopFlatten to just
before IndVarSimplify so that IVs are not already widened, D90421 factors out
widening from IndVarSimplify into Utils/SimplifyIndVar so that we can also use
it in LoopFlatten.
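To illustrate why widening removes the need for overflow checks, here is a
small C sketch. It is not taken from the pass: the function names are
hypothetical, and it uses unsigned types so that the wrap-around is well
defined in C. It assumes a 32-bit IV widened to 64 bits.

```c
#include <stdint.h>

/* Flattened trip count computed in the narrow 32-bit type: the product
   wraps modulo 2^32, so an overflow check would be needed. */
uint32_t trip_count_narrow(uint32_t n, uint32_t m) {
    return n * m;
}

/* The same product after widening both operands to 64 bits: the product of
   two 32-bit values always fits in 64 bits, so no check is required. */
uint64_t trip_count_wide(uint32_t n, uint32_t m) {
    return (uint64_t)n * (uint64_t)m;
}
```

For n = m = 65536 the narrow product wraps to 0, while the wide product is
4294967296.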
Differential Revision: https://reviews.llvm.org/D90640
This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't
run into problems with a loop pass deleting an (inner) loop.
Differential Revision: https://reviews.llvm.org/D90940
This is a simple pass that flattens nested loops. The intention is to optimise
loop nests like this, which together access an array linearly:
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      f(A[i*M+j]);
into one loop:
  for (int i = 0; i < (N*M); ++i)
    f(A[i]);
It can also flatten loops where the induction variables are not used in the
loop. This can help with code size and runtime, especially on simple CPUs
without advanced branch prediction.
Flattening is only worthwhile if the induction variables are only used in an
expression like i*M+j. If they had any other uses, we would have to insert a
div/mod to reconstruct the original values, so this wouldn't be profitable.
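The following C sketch illustrates both situations. It is only an
illustration, not output of the pass: all function names are hypothetical,
and the flat trip count is computed in long long so that it cannot overflow
for int-sized N and M.

```c
/* Nested form: i and j are used only in the expression i*m + j. */
long sum_nested(const int *a, int n, int m) {
    long total = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            total += a[i * m + j];
    return total;
}

/* Flattened form: a single IV walks the n*m elements linearly. */
long sum_flat(const int *a, int n, int m) {
    if (n <= 0 || m <= 0)
        return 0;
    long total = 0;
    long long count = (long long)n * m;
    for (long long k = 0; k < count; ++k)
        total += a[k];
    return total;
}

/* Here i has another use besides the index expression. */
long weighted_nested(const int *a, int n, int m) {
    long total = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            total += (long)i * a[i * m + j];
    return total;
}

/* Flattening it by hand needs a division per iteration to recover i,
   which is the extra cost that makes this case unprofitable. */
long weighted_flat(const int *a, int n, int m) {
    if (n <= 0 || m <= 0)
        return 0;
    long total = 0;
    long long count = (long long)n * m;
    for (long long k = 0; k < count; ++k)
        total += (long)(k / m) * a[k];
    return total;
}
```

In the first pair the flat loop does strictly less work per iteration. In the
second pair the saved branch is traded for a division in every iteration.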
This partially fixes PR40581 as this pass triggers on one of the two cases. I
will follow up on this to teach LoopFlatten a few more (small) tricks. Please
note that LoopFlatten is not yet enabled by default.
Patch by Oliver Stannard, with minor tweaks from Dave Green and myself.
Differential Revision: https://reviews.llvm.org/D42365