Explicitly opt-out llvm/test/Transforms/Attributor.
Verified by flipping the default value of allow-unused-prefixes and
observing that none of the failures were under llvm/test/Transforms.
Differential Revision: https://reviews.llvm.org/D92404
Reverted because the compile time impact is still too high.
isKnownViaNonRecursiveReasoning is used twice, we can do it just once.
Differential Revision: https://reviews.llvm.org/D92152
Previously we tried to using isKnownPredicateAt, but it makes an
extra query to isKnownPredicate, which has negative impact on compile
time. Let's try to use more lightweight isBasicBlockEntryGuardedByCond.
Differential Revision: https://reviews.llvm.org/D92152
A piece of code in `isLoopBackedgeGuardedByCond` basically duplicates
the dominators traversal from `isBlockEntryGuardedByCond` called from
`isKnownPredicateAt`, but it's less powerful because it does not give context
to `isImpliedCond`. This patch reuses the `isKnownPredicateAt `function there,
reducing the amount of code duplication and making it more powerful.
Differential Revision: https://reviews.llvm.org/D92152
Reviewed By: skatkov
If we've got an SCEVPtrToIntExpr(op), where op is not an SCEVUnknown,
we want to sink the SCEVPtrToIntExpr into an operand,
so that the operation is performed on integers,
and eventually we end up with just an `SCEVPtrToIntExpr(SCEVUnknown)`.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D89692
And use it to model LLVM IR's `ptrtoint` cast.
This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)
As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
and their operands, instead of treating them as `SCEVUnknown`
* It should help with initial problem of PR46786 - this should eventually allow us
to not loose pointer-ness of an expression in more cases
As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.
But i think that it isn't the best solution, because it doesn't really matter
from memory consumption side - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter, and it allows for much better discoverability.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D89456
We cannot guarantee that the replacement expression is loop-invariant in
all AddRecs in the source expression. Use a rewriter that skips
AddRecExpr for now.
Fixes PR47776.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow match with offset by inserting
compensation code in the DIExpression.
Includes fix for the nullptr deref that caused the original commit
to be reverted in 9d63029770.
Fixes : PR38815
Differential Revision: https://reviews.llvm.org/D87494
This reverts commit a3caf7f610.
The ReleaseLTO-g test-suite configuration has been failing
to build since this commit, because clang segfaults while
building 7zip.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow match with offset by inserting
compensation code in the DIExpression.
Fixes : PR38815
Differential Revision: https://reviews.llvm.org/D87494
This commit was originally because it was suspected to cause a crash,
but a reproducer did not surface.
A crash that was exposed by this change was fixed in 1d8f2e5292.
This reverts the revert commit 0581c0b0ee.
This prevents us from doing things like LICM'ing it out of a loop,
which is usually a net loss because we end up having to spill a
callee-saved FPR to accomodate it.
This does perturb instruction scheduling around this instruction,
so a number of tests had to be updated to account for it.
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D87316
In GenerateConstantOffsetsImpl, we may generate non canonical Formula
if BaseRegs of that Formula is updated and includes a recurrent expr reg
related with current loop while its ScaledReg is not.
Patched by: mdchen
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D86939
This reverts commit e441b7a7a0.
This patch causes a compile error in tensorflow opensource project. The stack trace looks like:
Point of crash:
llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35
(gdb) ptype *this
type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop]
(gdb) p *this
$1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1,
DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const*>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8,
CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true}
(gdb) p *this->DenseBlockSet->CurArray
$2 = (const void *) 0xfffffffffffffffe
I will try to get a case from tensorflow or use creduce to get a small case.
Now that SCEVExpander can preserve LCSSA form,
we do not have to worry about LCSSA form when
trying to look through PHIs. SCEVExpander will take
care of inserting LCSSA PHI nodes as required.
This increases precision of the analysis in some cases.
Reviewed By: mkazantsev, bmahjour
Differential Revision: https://reviews.llvm.org/D71539
Currently the SCEVExpander tries to re-use existing casts, even if they
are not exactly at the insertion point it was asked to create the cast.
To do so in some case, it creates a new cast at the insertion point and
updates all users to use the new cast.
This behavior is problematic, because it changes the IR outside of the
instructions created during the expansion. Therefore we cannot
completely undo all changes made during expansion.
This re-use should be only an extra optimization, so only using the new
cast in the expanded instructions should not be a correctness issue.
There are many cases equivalent instructions are created during
expansion.
This patch also adjusts findInsertPointAfter to skip instructions
inserted during expansion. This enables re-using existing casts without
the renaming any uses, by picking a better insertion point.
Reviewed By: efriedma, lebedev.ri
Differential Revision: https://reviews.llvm.org/D84399
querying getSCEV() for incomplete phis leads to wrong cache value in `ExprToIVMap`,
because incomplete phis may be simplified to same value before get SCEV expression.
Reviewed By: lebedev.ri, mkazantsev
Differential Revision: https://reviews.llvm.org/D77560
The test is quite frafile, as the check lines match IR numbers and it is
not obvious why only a very small subset is checked.
Re-generate check lines, so further changes are more obvious.
The legacy pass was called "loop-reduce".
This lowers the number of check-llvm failures under NPM by 83.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D82925
Summary:
In SCEVExpander FactorOutConstant(), when GEP indexing into/over scalable vector,
it is legal for the 'Factor' in a MulExpr to be the size of a scalable vector
instead of a compile-time constant.
Current upstream crash with the test attached.
Reviewers: efriedma, sdesmalen, sanjoy.google, mkazantsev
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80973
This patch adds a new TTI hook to allow targets to tell LSR that
a chain including some instruction is already profitable and
should not be optimized. This patch also adds an implementation
of this TTI hook for ARM so LSR doesn't optimize chains that include
the VCTP intrinsic.
Differential Revision: https://reviews.llvm.org/D79418
greater-than-sign is not a BRE special character.
POSIX.1-2017 XBD Section 9.3.2 indicates that the interpretation of `\>`
is undefined. This patch replaces the pattern.
llvm-svn: 363828
Second functional change following on from rL362687. Pass the
NoWrapFlags from the MulExpr to InsertBinop when we're generating a
shl or mul.
Differential Revision: https://reviews.llvm.org/D61934
llvm-svn: 363540
InsertBinop now accepts NoWrapFlags, so pass them through when
expanding a simple add expression.
This is the first re-commit of the functional changes from rL362687,
which was previously reverted.
Differential Revision: https://reviews.llvm.org/D61934
llvm-svn: 363364
If the given SCEVExpr has no (un)signed flags attached to it, transfer
these to the resulting instruction or use them to find an existing
instruction.
Differential Revision: https://reviews.llvm.org/D61934
llvm-svn: 362687
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.
This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.
One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.
Differential Revision: https://reviews.llvm.org/D61472
llvm-svn: 361691
Summary:
Currently we express umin as `~umax(~x, ~y)`. However, this becomes
a problem for operands in non-integral pointer spaces, because `~x`
is not something we can compute for `x` non-integral. However, since
comparisons are generally still allowed, we are actually able to
express `umin(x, y)` directly as long as we don't try to express is
as a umax. Support this by adding an explicit umin/smin representation
to SCEV. We do this by factoring the existing getUMax/getSMax functions
into a new function that does all four. The previous two functions were
largely identical.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D50167
llvm-svn: 360159
In some circumstances we can end up with setup costs that are very complex to
compute, even though the scevs are not very complex to create. This can also
lead to setupcosts that are calculated to be exactly -1, which LSR treats as an
invalid cost. This patch puts a limit on the recursion depth for setup cost to
prevent them taking too long.
Thanks to @reames for the report and test case.
Differential Revision: https://reviews.llvm.org/D60944
llvm-svn: 358958
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
If LSR split critical edge during rewriting phi operands and
phi node has other pending fixup operands, we need to
update those pending fixups. Otherwise formulae will not be
implemented completely and some instructions will not be eliminated.
llvm.org/PR41445
Differential Revision: https://reviews.llvm.org/D60645
Patch by: Denis Bakhvalov <denis.bakhvalov@intel.com>
llvm-svn: 358457
For the attached test case, unchecked addition of immediate starts and
ends overflows, as they can be arbitrary i64 constants.
Proof: https://rise4fun.com/Alive/Plqc
Reviewers: qcolombet, gilr, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59218
llvm-svn: 357217
Summary:
This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.
See D59688 for context.
Reviewers: andreadb, lebedev.ri
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59872
llvm-svn: 357171
We are adding a sign extended IR value to an int64_t, which can cause
signed overflows, as in the attached test case, where we have a formula
with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min().
If the addition would overflow, skip the simplification for this
formula. Note that the target triple is required to trigger the failure.
Reviewers: qcolombet, gilr, kparzysz, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59211
llvm-svn: 356256
In some loops, we end up generating loop induction variables that look like:
{(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
{(zext i16 (%i0 * %i1) to i32),+,-1}
i.e we count up from -limit to 0, not the simpler counting down from limit to
0. This is because the scores, as LSR calculates them, are the same and the
second is filtered in place of the first. We end up with a redundant SUB from 0
in the code.
This patch tries to make the calculation of the setup cost a little more
thoroughly, recursing into the scev members to better approximate the setup
required. The cost function for comparing LSR costs is:
return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
C2.ScaleCost, C2.ImmCost, C2.SetupCost);
So this will only alter results if none of the other variables turn out to be
different.
Differential Revision: https://reviews.llvm.org/D58770
llvm-svn: 355597
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.
Differential Revision: https://reviews.llvm.org/D55373
llvm-svn: 353403
When LSR first adds SCEVs to BaseRegs, it only does it if `isZero()` has
returned false. In the end, in invocation of `InsertFormula`, it asserts that
all values there are still not zero constants. However between these two
points, it makes some transformations, in particular extends them to wider
type.
SCEV does not give us guarantee that if `S` is not a constant zero, then
`sext(S)` is also not a constant zero. It might have missed some optimizing
transforms when it was calculating `S` and then made them when it took `sext`.
For example, it may happen if previously optimizing transforms were limited
by depth or somehow else.
This patch adds a bailout when we may end up with a zero SCEV after extension.
Differential Revision: https://reviews.llvm.org/D57565
Reviewed By: samparker
llvm-svn: 353136
Currently SCEV attempts to limit transformations so that they do not work with
big SCEVs (that may take almost infinite compile time). But for this, it uses heuristics
such as recursion depth and number of operands, which do not give us a guarantee
that we don't actually have big SCEVs. This situation is still possible, though it is not
likely to happen. However, the bug PR33494 showed a bunch of simple corner case
tests where we still produce huge SCEVs, even not reaching big recursion depth etc.
This patch introduces a concept of 'huge' SCEVs. A SCEV is huge if its expression
size (intoduced in D35989) exceeds some threshold value. We prohibit optimizing
transformations if any of SCEVs we are dealing with is huge. This gives us a reliable
check that we don't spend too much time working with them.
As the next step, we can possibly get rid of old limiting mechanisms, such as recursion
depth thresholds.
Differential Revision: https://reviews.llvm.org/D35990
Reviewed By: reames
llvm-svn: 352728
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.
This commit fixes a bug in the original patch (committed at r345114, reverted
at r345123).
Differential Revision: https://reviews.llvm.org/D51861
llvm-svn: 346390
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.
Differential Revision: https://reviews.llvm.org/D51861
llvm-svn: 345114
LSR reassociates small constants that fit into add immediate operands as
unfolded offset. Since unfolded offset is not combined with loop-invariant
registers, LSR does not consider solutions that bump invariant registers by
these constants outside the loop.
llvm-svn: 341835
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].
The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
with an type assertion in cast, because `%dec` is no longer an `Instruction`,
If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.
So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.
Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi
Reviewed By: mkazantsev
Subscribers: evstupac, javed.absar, spatel, llvm-commits
Differential Revision: https://reviews.llvm.org/D48599
llvm-svn: 335950
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)
2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.
2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
- Since Pred is not deleted (BB is), the entry block does not need updating.
- The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
- isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
- eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
- adding some test owner as subscribers for the interesting tests modified:
test/CodeGen/X86/avx-cmp.ll
test/CodeGen/AMDGPU/nested-loop-conditions.ll
test/CodeGen/AMDGPU/si-annotate-cf.ll
test/CodeGen/X86/hoist-spill.ll
test/CodeGen/X86/2006-11-17-IllegalMove.ll
3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.
Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D48202
llvm-svn: 335183
The purpose of this patch is to have LSR generate better code on Power.
This is done by overriding isLSRCostLess.
Differential Revision: https://reviews.llvm.org/D40855
llvm-svn: 326906
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
Summary:
Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.
For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.
Patch by: Bevin Hansson
Reviewers: atrick, qcolombet, sanjoy
Reviewed By: qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42103
llvm-svn: 323946
Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.
llvm-svn: 323922
SCEV tracks the correspondence of created SCEV to original instruction.
However during creation of SCEV it is possible that nuw/nsw/exact flags are
lost.
As a result during expansion of the SCEV the instruction with nuw/nsw/exact
will be used where it was expected and we produce poison incorreclty.
Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41578
llvm-svn: 322058
Summary:
1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to
accommodate similar operand appearing in the DAG e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will now look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity
then complex LEAs (having 3 operands) could be factored out e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.
4/ Simplify LEA converts (lea (BASE,1,INDEX,0) --> add (BASE, INDEX) which offers better through put.
PR32755 will be taken care of by this pathc.
Previous patch revisions : r313343 , r314886
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja
Reviewed By: lsaba, RKSimon, jbhateja
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 319543
It broke the Chromium / SQLite build; see PR34830.
> Summary:
> 1/ Operand folding during complex pattern matching for LEAs has been
> extended, such that it promotes Scale to accommodate similar operand
> appearing in the DAG.
> e.g.
> T1 = A + B
> T2 = T1 + 10
> T3 = T2 + A
> For above DAG rooted at T3, X86AddressMode will no look like
> Base = B , Index = A , Scale = 2 , Disp = 10
>
> 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
> so that if there is an opportunity then complex LEAs (having 3 operands)
> could be factored out.
> e.g.
> leal 1(%rax,%rcx,1), %rdx
> leal 1(%rax,%rcx,2), %rcx
> will be factored as following
> leal 1(%rax,%rcx,1), %rdx
> leal (%rdx,%rcx) , %edx
>
> 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
> thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314919
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
Reviewed By: lsaba
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314886
This caused PR34629: asserts firing when building Chromium. It also broke some
buildbots building test-suite as reported on the commit thread.
> Summary:
> 1/ Operand folding during complex pattern matching for LEAs has been
> extended, such that it promotes Scale to accommodate similar operand
> appearing in the DAG.
> e.g.
> T1 = A + B
> T2 = T1 + 10
> T3 = T2 + A
> For above DAG rooted at T3, X86AddressMode will no look like
> Base = B , Index = A , Scale = 2 , Disp = 10
>
> 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
> so that if there is an opportunity then complex LEAs (having 3 operands)
> could be factored out.
> e.g.
> leal 1(%rax,%rcx,1), %rdx
> leal 1(%rax,%rcx,2), %rcx
> will be factored as following
> leal 1(%rax,%rcx,1), %rdx
> leal (%rdx,%rcx) , %edx
>
> 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
> thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet
>
> Reviewed By: lsaba
>
> Subscribers: spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 313376
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet
Reviewed By: lsaba
Subscribers: spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 313343
When LSR processes code like
int accumulator = 0;
for (int i = 0; i < N; i++) {
accummulator += i;
use((double) accummulator);
}
It may decide to replace integer `accumulator` with a double Shadow IV to get rid
of casts. The problem with that is that the `accumulator`'s value may overflow.
Starting from this moment, the behavior of integer and double accumulators
will differ.
This patch strenghtens up the conditions of Shadow IV mechanism applicability.
We only allow it for IVs that are proved to be `AddRec`s with `nsw`/`nuw` flag.
Differential Revision: https://reviews.llvm.org/D37209
llvm-svn: 311986