When the limit of the inner loop is a known integer, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.
Differential Revision: https://reviews.llvm.org/D105802
I don't know much about this pass, but we need a stronger
check on the memset length arg to avoid an assert. The
current code was added with D59000.
The test is reduced from:
https://llvm.org/PR50910
Differential Revision: https://reviews.llvm.org/D106462
Additional asserts were added to ScalarEvolution to enforce
pointer/int type rules. An assert is triggered when the LSR pass
attempts to extend a pointer SCEV in GenerateTruncates.
This patch changes GenerateTruncates to exit early if the Formaula
contains a ScaledReg or BaseReg with a pointer type.
Differential Revision: https://reviews.llvm.org/D107185
SCEVToIterCountExpr only expects to be fed affine expressions, but
DbgRewriteSalvageableDVIs is feeding it non-affine induction variables.
Following this up with an obvious fix, will add test coverage too if
this avoids D105207 being reverted.
When the trip count of the inner loop is a constant, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.
Differential Revision: https://reviews.llvm.org/D105802
Reapply commit d675b594f4 that was
reverted due to buildbot failures. A simple fix has been applied to
remove an assertion.
Differential Revision: https://reviews.llvm.org/D105207
The SCEV method getBackedgeTakenCount() returns a SCEVCouldNotCompute
object if the backedge-taken count is unpredictable. This fix ensures
there is no longer an attempt to use such an object to find the trip
count.
Patch by: Rosie Sumpter.
Differential Revision: https://reviews.llvm.org/D106970
Reapply commit 796b84d26f that was
reverted due to reports of crashes. A minor change now guards against
getVariableLocationOperand() returning a nullptr.
Differential Revision: https://reviews.llvm.org/D106659
This transform was added with e38b7e8948
and as shown in:
https://llvm.org/PR51241
...it could crash without an extra check of the blocks.
There might be a more compact way to write this constraint,
but we can't just count the successors/predecessors without
affecting a test that includes a switch instruction.
As an instruction is replaced in optimizeTransposes RAUW will replace it in
the ShapeMap (ShapeMap is ValueMap so that uses are updated). In
finalizeLowering however we skip updating uses if they are in the ShapeMap
since they will be lowered separately at which point we pick up the lowered
operands.
In the testcase what happened was that since we replaced the doubled-transpose
with the shuffle, it ended up in the ShapeMap. As we lowered the
columnwise-load the use in the shuffle was not updated. Then as we removed
the original columnwise-load we changed that to an undef. I.e. we ended up
with:
```
%shuf = shufflevector <8 x double> undef, <8 x double> poison, <6 x i32>
^^^^^
<i32 0, i32 1, i32 2, i32 4, i32 5, i32 6>
```
Besides the fix itself, I have fortified this last bit. As we change uses to
undef when removing instruction we track the undefed instruction to make sure
we eventually remove those too. This would have caught the issue at compile
time.
Differential Revision: https://reviews.llvm.org/D106714
The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.
This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.
This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:
```
sw.bb sw.bb
/ | \ / | \
case1 case2 case3 case1 case2 case3
\ | / / | \
latch.bb latch.2 latch.3 latch.1
br sw.bb / | \
sw.bb.2 sw.bb.3 sw.bb.1
br case2 br case3 br case1
```
Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D99205
When hoisting/moving calls to locations, we strip unknown metadata. Such calls are usually marked `speculatable`, i.e. they are guaranteed to not cause undefined behaviour when run anywhere. So, we should strip attributes that can cause immediate undefined behaviour if those attributes are not valid in the context where the call is moved to.
This patch introduces such an API and uses it in relevant passes. See
updated tests.
Fix for PR50744.
Reviewed By: nikic, jdoerfert, lebedev.ri
Differential Revision: https://reviews.llvm.org/D104641
This reapplies commit 76f3ffb2b2 that was
reverted due to buildbot failures.
- Update lit tests with REQUIRES condition.
- Abandon salvage attempt if SCEVUnknown::getValue() returns nullptr.
Differential Revision: https://reviews.llvm.org/D105207
This patch extends salvaging of debuginfo in the Loop Strength Reduction
(LSR) pass by translating Scalar Evaluations (SCEV) into DIExpressions.
The method is as follows:
- Cache dbg.value intrinsics that are salvageable.
- Obtain a loop Induction Variable (IV) from ScalarExpressionExpander or
the loop header.
- Translate the IV SCEV into an expression that recovers the current
loop iteration count. Combine this with the dbg.value's location
op SCEV to create a DIExpression that salvages the value.
Review by: jmorse
Differential Revision: https://reviews.llvm.org/D105207
Replace pattern-matching with existing SCEV and Loop APIs as a more
robust way of identifying the loop increment and trip count. Also
rename 'Limit' as 'TripCount' to be consistent with terminology.
Differential Revision: https://reviews.llvm.org/D106580
Apparently this fails to line up the types -- try to sidestep the
issue entirely by writing the code in a more reasonable way: Walk
over the operands and perform a set lookup, rather than walking
over the set and performing an operand scan.
Separate out the BCECmp part from BCECmpBlock, which just stores
the comparison atoms without the branch instruction. At the same
time switch the code to return Optional<> rather than objects in
invalid state and partially constructed objects.
This adjusts mayHaveSideEffect() to return true for !willReturn()
instructions. Just like other side-effects, non-willreturn calls
(aka "divergence") cannot be removed and cannot be reordered relative
to other side effects. This fixes a number of bugs where
non-willreturn calls are either incorrectly dropped or moved. In
particular, it also fixes the last open problem in
https://bugs.llvm.org/show_bug.cgi?id=50511.
I performed a cursory review of all current mayHaveSideEffect()
uses, which convinced me that these are indeed the desired default
semantics. Places that do not want to consider non-willreturn as a
sideeffect generally do not want mayHaveSideEffect() semantics at
all. I identified two such cases, which are addressed by D106591
and D106742. Finally, there is a use in SCEV for which we don't
really have an appropriate API right now -- what it wants is
basically "would this be considered forward progress". I've just
spelled out the previous semantics there.
Differential Revision: https://reviews.llvm.org/D106749
The check for sinking instructions past the load + cmp sequence
currently checks for side-effects, which includes writing to memory
and unwinding. However, I don't believe we care about sinking the
instructions past an unwind (as they don't have any side-effects
themselves).
Differential Revision: https://reviews.llvm.org/D106591
We should only add the fake lowering entry for the matrix remark if the
transpose is not lowered on its own. `MapVector::insert` is used to insert
the entry during proper lowering which does not overwrite the fake entry in
the map.
We actually had test coverage for this but the reference output code was
wrong; it was storing undef rather than the transposed column.
Also add an assert that would have caught this.
Differential Revision: https://reviews.llvm.org/D106457
The purpose of patch is to learn Loop idiom recognition pass how to recognize simple memmove patterns
in similar way like GCC: https://godbolt.org/z/fh95e83od
LoopIdiomRecognize already has machinery for memset and memcpy recognition, patch tries to extend exisiting capabilities with minimal effort.
Differential Revision: https://reviews.llvm.org/D104464
Make getLatchCmpInst non-static and use it in LoopFlatten as a more
robust way of identifying the compare.
Differential Revision: https://reviews.llvm.org/D106256
The patch does not depend on the availability of the library functions for
memcpy/memset as it operates on LLVM intrinsics. The optimizations are useful
on the targets that have these functions disabled (e.g. NVPTX & AMDGPU).
Differential Revision: https://reviews.llvm.org/D104801
This patch adds a new pass called LNICM which is a LoopNest version of LICM and a test case to show how LNICM works.
Basically, LNICM only hoists invariants out of loop nest (not a loop) to keep/make perfect loop nest. This enables later optimizations that require perfect loop nest.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D104180
Replace code which identifies induction phi with helper function
getInductionVariable to improve robustness.
Differential Revision: https://reviews.llvm.org/D106045
We already know that we need to check whether lcssa
phis are supported in inner loop exit block or in
outer loop exit block, and we have logic to check
them already. Presumably the inner loop latch does
not have lcssa phis and there is no code that deals
with lcssa phis in the inner loop latch. However,
that assumption is not true, when we have loops
with more than two-level nesting. This patch adds
checks for lcssa phis in the inner latch.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D102300
This patch addresses assertion failure in case when the only found formula for LSR
is `1*reg => reg` which was supposed to be an impossible situation, however there
is a test that shows it is possible.
In this case, we can use scale register with scale of 1 as the missing base register.
Reviewed By: huihuiz, reames
Differential Revision: https://reviews.llvm.org/D105009
To help with debugging non-trivial unswitching issues.
Don't care about the legacy pass, nobody is using it.
If a pass's string params are empty (e.g. "simple-loop-unswitch"), don't
default to the empty constructor for the pass params. We should still
let the parser take care of it in case the parser has its own defaults.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D105933
Make sure getMinusSCEV() didn't return a pointer. The following check
would never succeed if it was a pointer, anyway, but calling
getMulExpr() on a pointer SCEV now asserts.
This new test demonstrates a case where a base ptr is generated
twice for the same value: the first one is generated while
the gc.get.pointer.base() is inlined, the second is generated
for the statepoint. This happens because the methods
inlineGetBaseAndOffset() and insertParsePoints() do not share
their defining value cache used by the findBasePointer() method.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D103240
This patch adds support for hoisting the division and maybe the
remainder for control flow graphs like this.
```
PredBB
| \
| Rem
| /
Div
```
If we have DivRem we'll hoist both to PredBB. If not we'll just
hoist Div and expand Rem using the Div.
This improves our codegen for something like this
```
__uint128_t udivmodti4(__uint128_t dividend, __uint128_t divisor, __uint128_t *remainder) {
if (remainder != 0)
*remainder = dividend % divisor;
return dividend / divisor;
}
```
Reviewed By: spatel, lebedev.ri
Differential Revision: https://reviews.llvm.org/D87555
Rules:
1. SCEVUnknown is a pointer if and only if the LLVM IR value is a
pointer.
2. SCEVPtrToInt is never a pointer.
3. If any other SCEV expression has no pointer operands, the result is
an integer.
4. If a SCEVAddExpr has exactly one pointer operand, the result is a
pointer.
5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other
pointer operands, the result is a pointer.
6. If every operand of a SCEVMinMaxExpr is a pointer, the result is a
pointer.
7. Otherwise, the SCEV expression is invalid.
I'm not sure how useful rule 6 is in practice. If we exclude it, we can
guarantee that ScalarEvolution::getPointerBase always returns a
SCEVUnknown, which might be a helpful property. Anyway, I'll leave that
for a followup.
This is basically mop-up at this point; all the changes with significant
functional effects have landed. Some of the remaining changes could be
split off, but I don't see much point.
Differential Revision: https://reviews.llvm.org/D105510
Added check for switch-terminated blocks in loops.
Now if a block is terminated with a switch, we try to find out which of the
cases is taken on 1st iteration and mark corresponding edge from the block
to the case successor as live.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D105688
Reviewed By: nikic, mkazantsev
There was an alias between 'simplifycfg' and 'simplify-cfg' in the
PassRegistry. That was the original reason for this patch, which
effectively removes the alias.
This patch also replaces all occurrances of 'simplify-cfg'
by 'simplifycfg'. Reason for choosing that form for the name is
that it matches the DEBUG_TYPE for the pass, and the legacy PM name
and also how it is spelled out in other passes such as
'loop-simplifycfg', and in other options such as
'simplifycfg-merge-cond-stores'.
I for some reason the name should be changed to 'simplify-cfg' in
the future, then I think such a renaming should be more widely done
and not only impacting the PassRegistry.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D105627
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.
There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.
The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.
As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base. This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.
This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes). The resulting changes
seem okay to me, but suggestions welcome. As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.
I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.
Recommitting with fix to MemoryDepChecker::isDependent.
Differential Revision: https://reviews.llvm.org/D104806
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.
There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.
The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.
As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base. This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.
This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes). The resulting changes
seem okay to me, but suggestions welcome. As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.
I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.
Differential Revision: https://reviews.llvm.org/D104806
If the store address does not dominate the matrix multiply, try to hoist
address computation instructions without side-effects and/or memory
reads before the multiply, to allow fusion.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D105193
With 'for' loop there is is a single place where 'Current' is adjusted. It helps to avoid copy paste and makes a bit easy to understand overall loop controll flow.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D101044
Previously we used the vector type, but we're loading/storing
invididual elements so I think only element alignment should matter.
Noticed while looking at the code for something else so I don't
have a test case.
Differential Revision: https://reviews.llvm.org/D105220
Compiler crashes at an assertion while casting operands to PtrToIntInst at some cases when
ptrtoint is present as an explicit operand to inttoptr. Explicit instruction operator as
operand can not be casted to an Instruction.
This patch replaces cast from PtrToInst to Operator which are later checked for constant
expressions.
Differential Revision: https://reviews.llvm.org/D105002
We can exploit branches by `undef` condition. Frankly, the LangRef says that
such branches are UB, so we can assume that all outgoing edges of such blocks
are dead.
However, from practical perspective, we know that this is not supported correctly
in some other places. So we are being conservative about it.
Branch by undef is treated in the following way:
- If it is a loop-exiting branch, we always assume it exits the loop;
- If not, we arbitrarily assume it takes `true` value.
Differential Revision: https://reviews.llvm.org/D104689
Reviewed By: nikic
For the start shortening optimization, always use a i8 type for
the GEP, as it is a raw offset calculation.
Handling of non-i8* memset/memcpy arguments requires insertion
of casts. These cases were previously miscompiled, as the offset
calculation was performed on the wrong type.
Apparently, it is legal to use memcpy/memset with pointer types
other than i8*. Prior to 81fcdae68c
this case was silently miscompiled, as the i8 offset calculation
was performed on some other type. Now it would crash due to a
type mismatch. Fix this by inserting an explicit bitcast to i8*.
Similar to what we already do for `ret` terminators.
As noted by @rnk, clang seems to already generate a single `ret`/`resume`,
so this isn't likely to cause widespread changes.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104849
Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104597
This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block,
and generalizes it to support dealing not with just the `ret` in the future.
This effectively lifts the CallBr restriction, although this isn't really intentional.
That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it.
Other restrictions of the transform remain.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104598
Follow-up on Roman's idea expressed in D103959.
- If a Phi has undefined inputs from live blocks:
- and no other inputs, assume it is undef itself;
- and exactly one non-undef input, we can assume that all undefs are equal to this input.
Differential Revision: https://reviews.llvm.org/D104618
Reviewed By: lebedev.ri, nikic
Zero factor leads to division by zero and failure of corresponding
assert as shown in PR50765. We should filter out such factors.
Differential Revision: https://reviews.llvm.org/D104702
Reviewed By: huihuiz, reames
There was a bug from cost calculation for partially invariant unswitch.
The costs of non-duplicated blocks are substracted from the total LoopCost, so
anything that is duplicated should not be counted.
Differential Revision: https://reviews.llvm.org/D103816
Patch was reverted due to a bug that existed before it and was exposed
by it. Returning after the underlying bug has been fixed.
Differential Revision: https://reviews.llvm.org/D103959
As these are no longer passed to UnrollLoop(), there is no need to
modify them in computeUnrollCount(). Make them non-reference parameters.
Differential Revision: https://reviews.llvm.org/D104590
This reverts commit bb1dc876eb.
This patch causes an assertion failure when building an arm64 defconfig
Linux kernel.
See https://reviews.llvm.org/D103959 for a link to the original bug
report and a reduced reproducer.
This patch lifts the requirement to have the only incoming live block
for Phis. There can be multiple live blocks if the same value comes to
phi from all of them.
Differential Revision: https://reviews.llvm.org/D103959
Reviewed By: nikic, lebedev.ri
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as priviledged when it comes to
full unrolling.
The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.
Differential Revision: https://reviews.llvm.org/D102982
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.
The test case this helps is from code like this, which can come up in
certain matrix operations:
for(i=..)
dst[i] = 0;
for(j=..)
dst[i] += src[i*n+j];
After LICM, this becomes:
for(i=..)
dst[i] = 0;
sum = 0;
for(j=..)
sum += src[i*n+j];
dst[i] = sum;
The first store is dead, and with this patch is now removed.
Differntial Revision: https://reviews.llvm.org/D100464
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.
Differential Revision: https://reviews.llvm.org/D104487
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.
I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104477
This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topoligical
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach header via the latch, then the backedge can be broken.
This implementation uses InstSimplify. SCEV version was rejected due to high
compile time impact.
Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: nikic
This pass emits a floating point compare and a conditional branch,
but if strictfp is enabled we don't emit a constrained compare
intrinsic.
The backend also won't expand the readonly sqrt call this pass inserts
to a sqrt instruction under strictfp. So we end up with 2 libcalls as
seen here. https://godbolt.org/z/oax5zMEWd
Fix these things by disabling the pass.
Differential Revision: https://reviews.llvm.org/D104479
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
Addition of this pass has been botched.
There is no particular reason why it had to be sold as an inseparable part
of new-pm transition. It was added when old-pm was still the default,
and very *very* few users were actually tracking new-pm,
so it's effects weren't measured.
Which means, some of the turnoil of the new-pm transition
are actually likely regressions due to this pass.
Likewise, there has been a number of post-commit feedback
(post new-pm switch), namely
* https://reviews.llvm.org/D37467#2787157 (regresses HW-loops)
* https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before)
* https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata)
and in the half year past, the pass authors (google) still haven't found time to respond to any of that.
Hereby it is proposed to backout the pass from the pipeline,
until someone who cares about it can address the issues reported,
and properly start the process of adding a new pass into the pipeline,
with proper performance evaluation.
Furthermore, neither google nor facebook reports any perf changes
from this change, so i'm dropping the pass completely.
It can always be re-reverted should/if anyone want to pick it up again.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D104099
Loops with irreducible cycles may loop infinitely. Those cannot be
removed, unless the loop/function is marked as mustprogress.
Also discussed in D103382.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104238
Currently, Loop strengh reduce is not handling loops with scalable stride very well.
Take loop vectorized with scalable vector type <vscale x 8 x i16> for instance,
(refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added).
Memory accesses are incremented by "16*vscale", while induction variable is incremented
by "8*vscale". The scaling factor "2" needs to be extracted to build candidate formula
i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)}". So that addrec register reg({0,+,(8*vscale)})
can be reused among Address and ICmpZero LSRUses to enable optimal solution selection.
This patch allow LSR getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y",
and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR
is missing candidate formula with proper scaled factor to leverage target scaled-index
addressing mode.
Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable
vector. But allow simple valid scale to pass through.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D103939
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.
This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.
Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99173