`isValidRewrite()` checks that the both the original SCEV,
and the rewrite SCEV have the same base pointer.
I //believe//, after all the recent SCEV improvements,
this invariant is already enforced by SCEV itself.
I originally tried changing it into an assert in D108043,
but that showed that it triggers on e.g. https://reviews.llvm.org/D108043#2946621,
where SCEV manages to forward the store to load,
test added.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D108655
Changes since aec08e:
* Adjust placement of a closing brace so that the general case actually runs. Turns out we had *no* coverage of the switch case. I added one in eae90fd.
* Drop .llvm.loop.* metadata from the new branch as there is no longer a loop to annotate.
Original commit message:
This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
Currently we drop wrapping flags for expressions like (A + C1)<flags> - C2.
But we can retain flags under certain conditions:
* Adding a smaller constant is NUW if the original AddExpr was NUW.
* Adding a constant with the same sign and small magnitude is NSW, if the
original AddExpr was NSW.
This can improve results after using `SimplifyICmpOperands`, which may
subtract one in order to use stricter predicates, as is the case for
`isKnownPredicate`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104319
A backedge-taken count doesn't refer to memory; returning a pointer type
is nonsense. So make sure we always return an integer.
The obvious way to do this would be to just convert the operands of the
icmp to integers, but that doesn't quite work out at the moment:
isLoopEntryGuardedByCond currently gets confused by ptrtoint operations.
So we perform the ptrtoint conversion late for lt/gt operations.
The test changes are mostly innocuous. The most interesting changes are
more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to
i64)) + %ptr)". This is expected: we can't fold this to zero because we
need to preserve the pointer base.
The call to isLoopEntryGuardedByCond in howFarToZero is less precise
because of ptrtoint operations; this shows up in the function
pr46786_c26_char in ptrtoint.ll. Fixing it here would require more
complex refactoring. It should eventually be fixed by future
improvements to isImpliedCond.
See https://bugs.llvm.org/show_bug.cgi?id=46786 for context.
Differential Revision: https://reviews.llvm.org/D103656
The old version of this code would blindly perform arithmetic without
paying attention to whether the types involved were pointers or
integers. This could lead to weird expressions like negating a pointer.
Explicitly handle simple cases involving pointers, like "x < y ? x : y".
In all other cases, coerce the operands of the comparison to integer
types. This avoids the weird cases, while handling most of the
interesting cases.
Differential Revision: https://reviews.llvm.org/D103660
I think currently isImpliedViaMerge can incorrectly return true for phis
in a loop/cycle, if the found condition involves the previous value of
Consider the case in exit_cond_depends_on_inner_loop.
At some point, we call (modulo simplifications)
isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1).
The existing code tries to prove IncV <= -1 for all incoming values
InvV using the found condition (%call <= -1). At the moment this succeeds,
but only because it does not compare the same runtime value. The found
condition checks the value of the last iteration, but the incoming value
is from the *previous* iteration.
Hence we incorrectly determine that the *previous* value was <= -1,
which may not be true.
I think we need to be more careful when looking at the incoming values
here. In particular, we need to rule out that a found condition refers to
any value that may refer to one of the previous iterations. I'm not sure
there's a reliable way to do so (that also works of irreducible control
flow).
So for now this patch adds an additional requirement that the incoming
value must properly dominate the phi block. This should ensure the
values do not change in a cycle. I am not entirely sure if will catch
all cases and I appreciate a through second look in that regard.
Alternatively we could also unconditionally bail out in this case,
instead of checking the incoming values
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101829
The test is a crasher reduced from:
https://llvm.org/PR49993
linearFunctionTestReplace() assumes that we have an add recurrence,
so check for that as a condition of matching a loop counter.
Differential Revision: https://reviews.llvm.org/D101291
As being discussed in https://reviews.llvm.org/D100721,
this modelling is lossy, we can't reconstruct `ash`/`ashr exact`
from it, which means that whenever we actually expand the IR,
we've just pessimized the code..
It would be good to model this pattern, after all it comes up every time
you want to compute a distance between two pointers, but not at this cost.
This reverts commit ec54867df5.
When eliminating comparisons, we can use common dominator of
all its users as context. This gives better results when ICMP is not
computed right before the branch that uses it.
Differential Revision: https://reviews.llvm.org/D98924
Reviewed By: lebedev.ri
We can prove more predicates when we have a context when eliminating ICmp.
As first (and very obvious) approximation we can use the ICmp instruction itself,
though in the future we are going to use a common dominator of all its users.
Need some refactoring before that.
Observed ~0.5% negative compile time impact.
Differential Revision: https://reviews.llvm.org/D98697
Reviewed By: lebedev.ri
By definition of Implication operator, `false -> true` and `false -> false`. It means that
`false` implies any predicate, no matter true or false. We don't need to go any further
trying to prove the statement we need and just always say that `false` implies it in this case.
In practice it means that we are trying to prove something guarded by `false` condition,
which means that this code is unreachable, and we can safely prove any fact or perform any
transform in this code.
Differential Revision: https://reviews.llvm.org/D98706
Reviewed By: lebedev.ri
This reverts commit 329aeb5db4,
and relands commit 61f006ac65.
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.
Differential Revision: https://reviews.llvm.org/D97578 (and part)
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.
This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.
Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.
Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
This builds on the restricted after initial revert form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA.
When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes.
This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop.
(I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.)
The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue.
Differential Revision: https://reviews.llvm.org/D94378
This is a resubmit of dd6bb367 (which was reverted due to stage2 build failures in 7c63aac), with the additional restriction added to the transform to only consider outer most loops.
As shown in the added test case, ensuring LCSSA is up to date when deleting an inner loop is tricky as we may actually need to remove blocks from any outer loops, thus changing the exit block set. For the moment, just avoid transforming this case. I plan to return to this case in a follow up patch and see if we can do better.
Original commit message follows...
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906
This reverts commit dd6bb367d1.
Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906
In 35676a4f9a I've added handling for
non-trivial dominating conditions that imply non-zero on the true
branch. This adds the same support for the false branch.
The changes in pr45360.ll change block ordering and naming, but
don't change the control flow. The urem is still guaraded by a
non-zero check correctly.
And that exposes that a number of tests don't *actually* manage to
maintain DomTree validity, which is inline with my observations.
Once again, SimlifyCFG pass currently does not require/preserve DomTree
by default, so this is effectively NFC.
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
First step after e113317958,
in these tests, DomTree is valid afterwards, so mark them as such,
so that they don't regress.
In further steps, SimplifyCFG transforms shall taught to preserve DomTree,
in as small steps as possible.
The NPM runs loop passes over loops in forward program order, rather
than the legacy loop PM's reverse program order. This seems to produce
better results as shown here.
I verified that changing the loop order to reverse program order results
in the same IR with the NPM.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D92817
This reverts commit 4bd35cdc3a.
The patch was reverted during the investigation. The investigation
shown that the patch did not cause any trouble, but just exposed
the existing problem that is addressed by the previous patch
"[IndVars] Quick fix LHS/RHS bug". Returning without changes.
This reverts commit 0c9c6ddf17.
We are seeing some failures with this patch locally. Not clear
if it's causing them or just triggering a problem in another
place. Reverting while investigating.
If we decided to widen IV with zext, then unsigned comparisons
should not prevent widening (same for sext/sign comparisons).
The result of comparison in wider type does not change in this case.
Differential Revision: https://reviews.llvm.org/D92207
Reviewed By: nikic
When widening an IndVar that has LCSSA Phi users outside
the loop, we can safely widen it as usual and then truncate
the result outside the loop without hurting the performance.
Differential Revision: https://reviews.llvm.org/D91593
Reviewed By: skatkov
When bailing out in rewriteLoopExitValues() you could be left with PHI
nodes in the DeadInsts vector. Those would be not handled by the use of
RecursivelyDeleteTriviallyDeadInstructions() in IndVarSimplify. This
resulted in the IndVarSimplify pass returning an incorrect modified
status. This was caught by the expensive check introduced in D86589.
This patches changes IndVarSimplify so that it deletes those PHI nodes,
using RecursivelyDeleteDeadPHINode().
This fixes PR47486.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D91153
Use more context to prove contextual facts about the last iteration. It is
only executed when the backedge is taken, so we can use `isLoopBackedgeGuardedByCond`
to make this check.
Differential Revision: https://reviews.llvm.org/D91535
Reviewed By: skatkov
When deciding to widen narrow use, we may need to prove some facts
about it. For proof, the context is used. Currently we take the instruction
being widened as the context.
However, we may be more precise here if we take as context the point that
dominates all users of instruction being widened.
Differential Revision: https://reviews.llvm.org/D90456
Reviewed By: skatkov
Some nested loops may share the same ExitingBB, so after we finishing FoldExit,
we need to notify OuterLoop and SCEV to drop any stored trip count.
Patched by: guopeilin
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D91325
In some cases we can handle IV and iter count of different types. It's a typical situation
after IV have been widened. This patch adds support for such cases, when legal.
Differential Revision: https://reviews.llvm.org/D88528
Reviewed By: skatkov
In an effort to make code around flag determination more readable, and (possibly) prepare for a follow up change, factor out some of the flag detection logic. In the process, reduce the number of locations we mutate wrap flags by a couple.
Note that this isn't NFC. The old code tried for NSW xor (NUW || NW). This is, two different paths computed different sets of wrap flags. The new code will try for all three. The result is that some expressions end up with a few extra flags set.
If we cannot prove that the check is trivially true, but can prove that it either
fails on the 1st iteration or never fails, we can replace it with first iteration check.
Differential Revision: https://reviews.llvm.org/D88527
Reviewed By: skatkov
Sometimes the an instruction we are trying to widen is used by the IV
(which means the instruction is the IV increment). Currently this may
prevent its widening. We should ignore such user because it will be
dead once the transform is done anyways.
Differential Revision: https://reviews.llvm.org/D90920
Reviewed By: fhahn
InstCombine canonicalizes 'sub nuw' instructions to 'add' without the
`nuw` flag. The typical case where we see it is decrementing induction
variables. For them, IndVars fails to prove that it's legal to widen them,
and inserts unprofitable `zext`'s.
This patch adds recognition of such pattern using SCEV.
Differential Revision: https://reviews.llvm.org/D89550
Reviewed By: fhahn, skatkov
Our range computation methods benefit from no-wrap flags. But if the ranges
were first computed before the flags were set, the cached range will be too
pessimistic.
We need to drop cached ranges whenever we sharpen AddRec's no wrap flags.
Differential Revision: https://reviews.llvm.org/D89847
Reviewed By: fhahn
If we know that some check will not be executed on the last iteration, we can use this
fact to eliminate its check.
Differential Revision: https://reviews.llvm.org/D88210
Reviwed By: ebrevnov
This reverts commit e038b60d91.
This reverts commit a0d84d8031.
This revert was a mistake. The reason of the failures was
"Use uint64_t for branch weights instead of uint32_t"
Differential Revision: https://reviews.llvm.org/D87832
This reverts commit c6ca26c0bf.
This breaks stage2 builds due to hitting this assert:
```
Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights
```
when compiling AArch64RegisterBankInfo.cpp in LLVM.
Even if the exact exit count is unknown, we can still prove that this
exit will not be taken. If we can prove that the predicate is monotonic,
fulfilled on first & last iteration, and no overflow happened in between,
then the check can be removed.
Differential Revision: https://reviews.llvm.org/D87832
Reviewed By: apilipenko
Same change as 0dda633317, but for
mul expressions. We want to first fold any constant operans and
then strengthen the nowrap flags, as we can compute more precise
flags at that point.
We should first try to constant fold the add expression and only
strengthen nowrap flags afterwards. This allows us to determine
stronger flags if e.g. only two operands are left after constant
folding (and thus "guaranteed no wrap region" code applies) or the
resulting operands are non-negative and thus nsw->nuw strengthening
applies.
This reverts commit a10a64e7e3.
It broke polly/test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll
The difference suggests that this may be a serious issue.
Fixed wrapping range case & proof methods reduced to constant range
checks to save compile time.
Differential Revision: https://reviews.llvm.org/D89381
It was reverted because of negative compile time impact. In this version,
less powerful proof methods are used (non-recursive reasoning only), and
scope limited to constant End values to avoid explision of complex proofs.
Differential Revision: https://reviews.llvm.org/D89381
We can sharpen the range of a AddRec if we know that it does not
self-wrap and know the symbolic iteration count in the loop. If we can
evaluate the value of AddRec on the last iteration and prove that at least
one its intermediate value lies between start and end, then no-wrap flag
allows us to conclude that all of them also lie between start and end. So
the estimate of range can be improved to union of ranges of start and end.
Differential Revision: https://reviews.llvm.org/D89381
Reviewed By: efriedma
While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.
Let's back this out, and re-attempt with some another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.
This reverts commits 1fb6104293,
7324616660,
aaafe350bb,
e92a8e0c74.
I've kept&improved the tests though.