Commit Graph

360 Commits

Author SHA1 Message Date
Gil Rapaport d874c3a480 [LSR] Add tests for small constants; NFC
LSR reassociates small constants that fit into add immediate operands as
unfolded offset. Since unfolded offset is not combined with loop-invariant
registers, LSR does not consider solutions that bump invariant registers by
these constants outside the loop.

llvm-svn: 341835
2018-09-10 14:56:24 +00:00
Roman Lebedev 8d081b78e4 SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].

The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
    with an type assertion in cast, because `%dec` is no longer an `Instruction`,

If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.

So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.

Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi

Reviewed By: mkazantsev

Subscribers: evstupac, javed.absar, spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D48599

llvm-svn: 335950
2018-06-29 07:44:20 +00:00
Alina Sbirlea dfd14adeb0 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

llvm-svn: 335183
2018-06-20 22:01:04 +00:00
Daniil Fukalov 37433dc2e1 reapply r334209 with fixes for harfbuzz in Chromium
r334209 description:
[LSR] Check yet more intrinsic pointer operands

the patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334300
2018-06-08 16:22:52 +00:00
Reid Kleckner a3609f75b2 Revert r334209 "[LSR] Check yet more intrinsic pointer operands"
This causes cast failures when compiling harfbuzz in Chromium.
Reproducer on the way.

llvm-svn: 334254
2018-06-08 00:43:27 +00:00
Daniil Fukalov 12c0663a25 [LSR] Check yet more intrinsic pointer operands
the patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334209
2018-06-07 17:30:58 +00:00
Stanislav Mekhanoshin 595fdcf43b [AMDGPU] Move lsr test. NFC.
llvm-svn: 332562
2018-05-17 01:30:51 +00:00
Evgeny Stupachenko bff9302c3d Fix LSR compile time hang.
Summary:
Limit number of reassociations in GenerateReassociationsImpl.

Reviewers: qcolombet, mkazantsev

Differential Revision: https://reviews.llvm.org/D46039

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 332426
2018-05-16 02:48:50 +00:00
Stefan Pintilie ef7c4976bb Revert "[PowerPC] LSR tunings for PowerPC"
Revert the rest of the LST tune commit.
It seems that the LSR tune commit breaks internal tests.
Reverting the commit.

llvm-svn: 327143
2018-03-09 16:08:55 +00:00
Stefan Pintilie 7f879a8467 Revert "[PowerPC] Move test to correct location."
Revert part of the LSR tune commit.

llvm-svn: 327142
2018-03-09 16:08:48 +00:00
Stefan Pintilie f8c2dce236 [PowerPC] Move test to correct location.
Test was added in r326906 to an incorrect location.
Moving the test to PPC CodeGen directory as the test is PPC specific.

llvm-svn: 326923
2018-03-07 18:27:10 +00:00
Stefan Pintilie f8438e8e59 [PowerPC] LSR tunings for PowerPC
The purpose of this patch is to have LSR generate better code on Power.
This is done by overriding isLSRCostLess.

Differential Revision: https://reviews.llvm.org/D40855

llvm-svn: 326906
2018-03-07 16:53:09 +00:00
Sanjay Patel d7c702b451 [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base 
address offsets.

SPEC2017 on Ryzen shows no significant perf difference.

Differential Revision: https://reviews.llvm.org/D42607

llvm-svn: 324289
2018-02-05 23:43:05 +00:00
Yaxun Liu 2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Mikael Holmen 6d06976e74 [LSR] Don't force bases of foldable formulae to the final type.
Summary:
Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.

For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.

Patch by: Bevin Hansson

Reviewers: atrick, qcolombet, sanjoy

Reviewed By: qcolombet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42103

llvm-svn: 323946
2018-02-01 06:38:34 +00:00
Puyan Lotfi 43e94b15ea Followup on Proposal to move MIR physical register namespace to '$' sigil.
Discussed here:

http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html

In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.

llvm-svn: 323922
2018-01-31 22:04:26 +00:00
Sanjay Patel ffb37a29d1 [LoopStrengthReduce] add test to show potential macro-fusion-based diff (PR35681); NFC
This is the baseline output for the test proposed with D42607.

llvm-svn: 323806
2018-01-30 19:17:38 +00:00
Sanjay Patel 5bce08ddff [x86] auto-generate complete checks; NFC
llvm-svn: 323571
2018-01-26 22:06:07 +00:00
Serguei Katkov 6a7a4c6a55 [SCEV] Do not cache S -> V if S is not equivalent of V
SCEV tracks the correspondence of created SCEV to original instruction.
However during creation of SCEV it is possible that nuw/nsw/exact flags are
lost.

As a result during expansion of the SCEV the instruction with nuw/nsw/exact
will be used where it was expected and we produce poison incorreclty.

Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41578

llvm-svn: 322058
2018-01-09 06:47:14 +00:00
Matt Arsenault 3e268cc0dd LSR: Check more intrinsic pointer operands
llvm-svn: 320424
2017-12-11 21:38:43 +00:00
Matt Morehouse 9e658c974b Revert "[X86] Improvement in CodeGen instruction selection for LEAs."
This reverts r319543, due to ASan bot breakage.

llvm-svn: 319591
2017-12-01 22:20:26 +00:00
Jatin Bhateja 328199ec26 [X86] Improvement in CodeGen instruction selection for LEAs.
Summary:
1/  Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to
     accommodate similar operand appearing in the DAG  e.g.
                 T1 = A + B
                 T2 = T1 + 10
                 T3 = T2 + A
    For above DAG rooted at T3, X86AddressMode will now look like
                Base = B , Index = A , Scale = 2 , Disp = 10

2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity
     then complex LEAs (having 3 operands) could be factored out  e.g.
                 leal 1(%rax,%rcx,1), %rdx
                 leal 1(%rax,%rcx,2), %rcx
     will be factored as following
                 leal 1(%rax,%rcx,1), %rdx
                 leal (%rdx,%rcx)   , %edx

3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.

4/ Simplify LEA converts (lea (BASE,1,INDEX,0)  --> add (BASE, INDEX) which offers better through put.

PR32755 will be taken care of by this pathc.

Previous patch revisions : r313343 , r314886

Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja

Reviewed By: lsaba, RKSimon, jbhateja

Subscribers: jmolloy, spatel, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 319543
2017-12-01 14:07:38 +00:00
Hans Wennborg 2a6c9adb2f Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)"
It broke the Chromium / SQLite build; see PR34830.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>          T1 = A + B
>          T2 = T1 + 10
>          T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will no look like
>          Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>          leal 1(%rax,%rcx,1), %rdx
>          leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>          leal 1(%rax,%rcx,1), %rdx
>          leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
>     Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314919
2017-10-04 17:54:06 +00:00
Jatin Bhateja 3c29bacd43 [X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
         T1 = A + B
         T2 = T1 + 10
         T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will no look like
         Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
         leal 1(%rax,%rcx,1), %rdx
         leal 1(%rax,%rcx,2), %rcx
       will be factored as following
         leal 1(%rax,%rcx,1), %rdx
         leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy

Reviewed By: lsaba

Subscribers: jmolloy, spatel, igorb, llvm-commits

    Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314886
2017-10-04 09:02:10 +00:00
Hans Wennborg 534bfbd3ba Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs."
This caused PR34629: asserts firing when building Chromium. It also broke some
buildbots building test-suite as reported on the commit thread.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>           T1 = A + B
>           T2 = T1 + 10
>           T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will no look like
>           Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>           leal 1(%rax,%rcx,1), %rdx
>           leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>           leal 1(%rax,%rcx,1), %rdx
>           leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet
>
> Reviewed By: lsaba
>
> Subscribers: spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313376
2017-09-15 18:40:26 +00:00
Jatin Bhateja 908c8b37c2 [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
          T1 = A + B
          T2 = T1 + 10
          T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will no look like
          Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
          leal 1(%rax,%rcx,1), %rdx
          leal 1(%rax,%rcx,2), %rcx
       will be factored as following
          leal 1(%rax,%rcx,1), %rdx
          leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet

Reviewed By: lsaba

Subscribers: spatel, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313343
2017-09-15 05:29:51 +00:00
Max Kazantsev bb1d010872 [LSR] Fix Shadow IV in case of integer overflow
When LSR processes code like

  int accumulator = 0;
  for (int i = 0; i < N; i++) {
    accummulator += i;
    use((double) accummulator);
  }

It may decide to replace integer `accumulator` with a double Shadow IV to get rid
of casts.  The problem with that is that the `accumulator`'s value may overflow.
Starting from this moment, the behavior of integer and double accumulators
will differ.

This patch strenghtens up the conditions of Shadow IV mechanism applicability.
We only allow it for IVs that are proved to be `AddRec`s with `nsw`/`nuw` flag.

Differential Revision: https://reviews.llvm.org/D37209

llvm-svn: 311986
2017-08-29 07:32:20 +00:00
Max Kazantsev f2e017b083 [NFC] Fix indents in test
llvm-svn: 311982
2017-08-29 05:30:58 +00:00
Max Kazantsev 03407da281 [NFC] Refactor ShadowIV test to use FileCheck
Also get rid of unnamed values that make the test hard to read.

llvm-svn: 311980
2017-08-29 05:20:56 +00:00
Evgeny Astigeevich 540a39adf7 [ARM, Thumb1] Prevent ARMTargetLowering::isLegalAddressingMode from accepting illegal modes
ARMTargetLowering::isLegalAddressingMode can accept illegal addressing modes
for the Thumb1 target. This causes generation of redundant code and affects
performance.

This fixes PR34106: https://bugs.llvm.org/show_bug.cgi?id=34106

Differential Revision: https://reviews.llvm.org/D36467

llvm-svn: 311649
2017-08-24 10:00:25 +00:00
Evgeny Stupachenko c675290680 Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720).
The root cause of reverting was fixed - PR33514.

Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 310289
2017-08-07 19:56:34 +00:00
Amara Emerson 56dca4e3ca [SCEV] Preserve NSW information for sext(subtract).
Pushes the sext onto the operands of a Sub if NSW is present.
Also adds support for propagating the nowrap flags of the
llvm.ssub.with.overflow intrinsic during analysis.

Differential Revision: https://reviews.llvm.org/D35256

llvm-svn: 310117
2017-08-04 20:19:46 +00:00
Evgeny Stupachenko 38197c66a1 Fix PR33514
Summary:
The bug was uncovered after fix of  PR23384 (part 3 of 3).
The patch restricts pointer multiplication in SCEV computaion for ICmpZero.

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D36170

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 310092
2017-08-04 18:46:13 +00:00
Adrian Prantl abe04759a6 Remove the obsolete offset parameter from @llvm.dbg.value
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.

rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951

llvm-svn: 309426
2017-07-28 20:21:02 +00:00
Wei Mi 90707394e3 [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.
When the formulae search space is huge, LSR uses a series of heuristic to keep
pruning the search space until the number of possible solutions are within
certain limit.

The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs,
which picks the register which is used by the most LSRUses and deletes the other
formulae which don't use the register. This is a effective way to prune the search
space, but quite often not a good way to keep the best solution. We saw cases before
that the heuristic pruned the best formula candidate out of search space.

To relieve the problem, we introduce a new heuristic called
NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to
reduce the search space while keeping the best formula, we want to keep as many
formulae with different Scale and ScaledReg as possible. That is because the central
idea of LSR is to choose a group of loop induction variables and use those induction
variables to represent LSRUses. An induction variable candidate is often represented
by the Scale and ScaledReg in a formula. If we have more formulae with different
ScaledReg and Scale to choose, we have better opportunity to find the best solution.
That is why we believe pruning search space by only keeping the best formula with the
same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use
two criteria to choose the best formula with the same Scale and ScaledReg. The first
criteria is to select the formula using less non shared registers, and the second
criteria is to select the formula with less cost got from RateFormula. The patch
implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the
last resort.

Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly
testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help
on the two improved internal cases we saw.

Differential Revision: https://reviews.llvm.org/D34583

llvm-svn: 307269
2017-07-06 15:52:14 +00:00
Hans Wennborg ca69fc1cb7 Revert r304824 "Fix PR23384 (part 3 of 3)"
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.

> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>

llvm-svn: 305720
2017-06-19 17:57:15 +00:00
Max Kazantsev 35b2a18eb9 [SCEV] Teach SCEVExpander to expand BinPow
Current implementation of SCEVExpander demonstrates a very naive behavior when
it deals with power calculation. For example, a SCEV for x^8 looks like

  (x * x * x * x * x * x * x * x)

If we try to expand it, it generates a very straightforward sequence of muls, like:

  x2 = mul x, x
  x3 = mul x2, x
  x4 = mul x3, x
      ...
  x8 = mul x7, x

This is a non-efficient way of doing that. A better way is to generate a sequence of
binary power calculation. In this case the expanded calculation will look like:

  x2 = mul x, x
  x4 = mul x2, x2
  x8 = mul x4, x4

In some cases the code size reduction for such SCEVs is dramatic. If we had a loop:

  x = a;
  for (int i = 0; i < 3; i++)
    x = x * x;

And this loop have been fully unrolled, we have something like:

  x = a;
  x2 = x * x;
  x4 = x2 * x2;
  x8 = x4 * x4;

The SCEV for x8 is the same as in example above, and if we for some reason
want to expand it, we will generate naively 7 multiplications instead of 3.
The BinPow expansion algorithm here allows to keep code size reasonable.

This patch teaches SCEV Expander to generate a sequence of BinPow multiplications
if we have repeating arguments in SCEVMulExpressions.

Differential Revision: https://reviews.llvm.org/D34025

llvm-svn: 305663
2017-06-19 06:24:53 +00:00
Evgeny Stupachenko 3b88291581 Fix PR23384 (part 3 of 3)
Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304824
2017-06-06 20:04:16 +00:00
Craig Topper 2b54baeb96 [X86] Replace 'REQUIRES: x86' in tests with 'REQUIRES: x86-registered-target' which seems to be the correct way to make them run on an x86 build.
llvm-svn: 304682
2017-06-04 08:21:58 +00:00
Keno Fischer 090f1959c1 [SCEVExpander] Try harder to avoid introducing inttoptr
Summary:
This fixes introduction of an incorrect inttoptr/ptrtoint pair in
the included test case which makes use of non-integral pointers. I
suspect there are more cases like this left, but this takes care of
the one I was seeing at the moment.

Reviewers: sanjoy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D33129

llvm-svn: 304058
2017-05-27 03:22:55 +00:00
Max Kazantsev 41450329f7 Re-enable "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start"
The patch rL303730 was reverted because test lsr-expand-quadratic.ll failed on
many non-X86 configs with this patch. The reason of this is that the patch
makes a correctless fix that changes optimizer's behavior for this test.
Without the change, LSR was making an overconfident simplification basing on a
wrong SCEV. Apparently it did not need the IV analysis to do this. With the
change, it chose a different way to simplify (that wasn't so confident), and
this way required the IV analysis. Now, following the right execution path,
LSR tries to make a transformation relying on IV Users analysis. This analysis
is target-dependent due to this code:

  // LSR is not APInt clean, do not touch integers bigger than 64-bits.
  // Also avoid creating IVs of non-native types. For example, we don't want a
  // 64-bit IV in 32-bit code just because the loop has one 64-bit cast.
  uint64_t Width = SE->getTypeSizeInBits(I->getType());
  if (Width > 64 || !DL.isLegalInteger(Width))
    return false;

To make a proper transformation in this test case, the type i32 needs to be
legal for the specified data layout. When the test runs on some non-X86
configuration (e.g. pure ARM 64), opt gets confused by the specified target
and does not use it, rejecting the specified data layout as well. Instead,
it uses some default layout that does not treat i32 as a legal type
(currently the layout that is used when it is not specified does not have
legal types at all). As result, the transformation we expect to happen does
not happen for this test.

This re-enabling patch does not have any source code changes compared to the
original patch rL303730. The only difference is that the failing test is
moved to X86 directory and now has requirement of running on x86 only to comply
with the specified target triple and data layout.

Differential Revision: https://reviews.llvm.org/D33543

llvm-svn: 303971
2017-05-26 06:47:04 +00:00
Diana Picus 183863fc3b Revert "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start"
This reverts commit r303730 because it broke all the buildbots.

llvm-svn: 303747
2017-05-24 14:16:04 +00:00
Max Kazantsev 13e016bf48 [SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start
When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that
the loop of our base recurrency is the bottom-lost in terms of domination. This assumption
may be broken by an expression which is treated as invariant, and which depends on a complex
Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than
the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence.

Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike
other SCEVs, SCEVUnknown are sometimes position-bound. For example, here:

for (...) { // loop
  phi = {A,+,B}
}
X = load ...
Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot
exist while we are iterating in loop (this memory can be even not allocated and not filled by this moment).
It is only valid to make such folding if X is defined before the loop. In this case the recurrence {A+X,+,B}<loop>
may be existant.

This patch prohibits folding of SCEVUnknown (and those who use them) into the start value of an AddRecExpr,
if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that
relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer
expect such behavior.

llvm-svn: 303730
2017-05-24 08:52:18 +00:00
Wei Mi 8848c1e3c7 [LSR] Call canonicalize after we generate a new Formula in GenerateTruncates. Fix PR33077.
The testcase in PR33077 generates a LSR Use Formula with two SCEVAddRecExprs for the same
loop. Such uncommon formula will become non-canonical after GenerateTruncates adds sign
extension to the ScaledReg of the Formula, and it will break the assertion that every
Formula to be inserted is canonical.

The fix is to call canonicalize for the raw Formula generated by GenerateTruncates
before inserting it.

llvm-svn: 303361
2017-05-18 17:21:22 +00:00
Tim Northover 8b1240b0f0 ARM: handle post-indexed NEON ops where the offset isn't the access width.
Before, we assumed that any ConstantInt offset was precisely the access width,
so we could use the "[rN]!" form. ISelLowering only ever created that kind, but
further simplification during combining could lead to unexpected constants and
incorrect codegen.

Should fix PR32658.

llvm-svn: 300878
2017-04-20 19:54:02 +00:00
Eli Friedman 5fba1e53f2 Turn on -addr-sink-using-gep by default.
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.

Differential Revision: https://reviews.llvm.org/D30596

llvm-svn: 299723
2017-04-06 22:42:18 +00:00
Matt Arsenault 3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Evgeny Stupachenko d6aa0d02c2 Set option enabling LSR alternative way to resolve complex solution to false.
Differential Revision: http://reviews.llvm.org/D29862

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 296959
2017-03-04 03:14:05 +00:00
Wei Mi 74d5a90fa6 [LSR] Canonicalize formula and put recursive Reg related with current loop in ScaledReg.
After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs.
Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg.
But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg.
Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant
parts RegA and RegB are not grouped together and cannot be promoted outside of loop.
With this patch, it will ensure lsr.iv to be generated later in the expr:
RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop.

Differential Revision: https://reviews.llvm.org/D26781

llvm-svn: 295884
2017-02-22 21:47:08 +00:00
Evgeny Stupachenko 9909872e30 The patch introduces new way of narrowing complex (>UINT16 variants) solutions.
The new method introduced under "-lsr-exp-narrow" option (currenlty set to true).

Summary:

The method is based on registers number mathematical expectation and should be
 generally closer to optimal solution.
Please see details in comments to
 "LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function
 (in lib/Transforms/Scalar/LoopStrengthReduce.cpp).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D29862

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 295704
2017-02-21 07:34:40 +00:00
Wei Mi 493fb266ed [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops
In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops
other than current loop. This is good for the case when induction variable
of outerloop being used in expr in innerloop. But it is very bad to allow
such Reg from sibling loop because we may need to add lsr.iv in other sibling
loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase
below, one loop can be inserted with a bunch of lsr.iv because of LSR for
other loops. 

// The induction variable j from a loop in the middle will have initial
// value generated from previous sibling loop and exit value used by its
// next sibling loop.
void goo(long i, long j); 
long cond; 

void foo(long N) { 
long i = 0; 
long j = 0; 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
i = 0; do { goo(i, j); i++; j++; } while (cond); 
} 

The fix is to only allow formula with SCEVAddRecExpr type of Reg from current
loop or its parents.

Differential Revision: https://reviews.llvm.org/D30021

llvm-svn: 295378
2017-02-16 21:27:31 +00:00
Mikael Holmen ece84cd10c [LSR] Pointers with different address spaces are considered incompatible.
Summary:
Function isCompatibleIVType is already used as a guard before the call to

 SE.getMinusSCEV(OperExpr, PrevExpr);

in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.

Reviewers: qcolombet, eli.friedman

Reviewed By: qcolombet

Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29885

llvm-svn: 295033
2017-02-14 06:37:42 +00:00
Evgeny Stupachenko 5f3d9b6c09 The patch fixes r294821
Summary:
Update register match for windows testing

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 294825
2017-02-11 05:39:00 +00:00
Evgeny Stupachenko fe6f548d2d Fix PR23384 (under "-lsr-insns-cost" option)
Summary:
The patch adds instructions number generated by a solution
 to LSR cost under "-lsr-insns-cost" option.

Reviewers: qcolombet, hfinkel

Differential Revision: http://reviews.llvm.org/D28307

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 294821
2017-02-11 02:57:43 +00:00
Wei Mi 8f20e63a20 [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
The recommit includes some changes of testcases. No functional change to the patch.

In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.

Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.

Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.

Differential Revision: https://reviews.llvm.org/D26429

llvm-svn: 294814
2017-02-11 00:50:23 +00:00
Matt Arsenault cb3fa37c7e LSR: Check atomic instruction pointer operands
llvm-svn: 294410
2017-02-08 06:44:58 +00:00
Matt Arsenault 1f2ca66317 LSR: Don't drop address space when type doesn't match
For targets with different addressing modes in each address space,
if this is dropped querying isLegalAddressingMode later with this
will give a nonsense result, breaking the isLegalUse assertions.

This is a candidate for the 4.0 release branch.

llvm-svn: 293542
2017-01-30 19:50:17 +00:00
Chandler Carruth d501b18990 This test apparently requires an x86 target and is failing on numerous
bots ever since d0k fixed the CHECK lines so that it did something at
all.

It isn't actually testing SCEV directly but LSR, so move it into LSR and
the x86-specific tree of tests that already exists there. Target
dependence is common and unavoidable with the current design of LSR.

llvm-svn: 292774
2017-01-23 08:33:29 +00:00
Chandler Carruth 0952750fae [PM] Clean up the testing for IVUsers, especially with the new PM.
First, I've moved a test of IVUsers from the LSR tree to a dedicated
IVUsers test directory. I've also simplified its RUN line now that the
new pass manager's loop PM is providing analyses on their own.

No functionality changed, but it makes subsequent changes cleaner.

llvm-svn: 292060
2017-01-15 09:29:27 +00:00
David Majnemer bba17390c7 [LoopStrengthReduce] Don't bother rewriting PHIs in catchswitch blocks
The catchswitch instruction cannot be split, don't bother trying to
rewrite it.

This fixes PR31627.

llvm-svn: 291966
2017-01-13 22:24:27 +00:00
Wei Mi 37c4aaaf52 Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
llvm-svn: 287014
2016-11-15 19:42:05 +00:00
Wei Mi 7ccf7651c0 [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.

Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.

Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.

Differential Revision: https://reviews.llvm.org/D26429

llvm-svn: 286999
2016-11-15 18:35:53 +00:00
Alexandros Lamprineas 0ee3ec2fe4 [ARM] Loop Strength Reduction crashes when targeting ARM or Thumb.
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop invariants. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop invariants.

Differential Revision: https://reviews.llvm.org/D26185

llvm-svn: 286347
2016-11-09 08:53:07 +00:00
Krzysztof Parzyszek 3cb5ffeb35 Fix testcases failing after r284036
The codegen has changed slightly between my tests and the commit.

llvm-svn: 284049
2016-10-12 20:39:33 +00:00
Krzysztof Parzyszek 8271be9a1d Do not remove implicit defs in BranchFolder
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.

Differential Revision: https://reviews.llvm.org/D25478

llvm-svn: 284036
2016-10-12 19:50:57 +00:00
James Molloy 196ad0823e [LSR] Don't try and create post-inc expressions on non-rotated loops
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!

Motivating testcase:

    void f(float *a, float *b, float *c, int n) {
      while (n-- > 0)
        *c++ = *a++ + *b++;
    }

It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.

llvm-svn: 278658
2016-08-15 07:53:03 +00:00
Geoff Berry d01828096f [SCEV] Update interface to handle SCEVExpander insert point motion.
Summary:
This is an extension of the fix in r271424.  That fix dealt with builder
insert points being moved by SCEV expansion, but only for the lifetime
of the expand call.  This change modifies the interface so that LSR can
safely call expand multiple times at the same insert point and do the
right thing if one of the expansions decides to move the original insert
point.

This is a fix for PR28719.

Reviewers: sanjoy

Subscribers: llvm-commits, mcrosier, mzolotukhin

Differential Revision: https://reviews.llvm.org/D23342

llvm-svn: 278413
2016-08-11 21:05:17 +00:00
Chandler Carruth 6cb2ab2c60 [PM] Significantly refactor the pass pipeline parsing to be easier to
reason about and less error prone.

The core idea is to fully parse the text without trying to identify
passes or structure. This is done with a single state machine. There
were various bugs in the logic around this previously that were repeated
and scattered across the code. Having a single routine makes it much
easier to fix and get correct. For example, this routine doesn't suffer
from PR28577.

Then the actual pass construction is handled using *much* easier to read
code and simple loops, with particular pass manager construction sunk to
live with other pass construction. This is especially nice as the pass
managers *are* in fact passes.

Finally, the "implicit" pass manager synthesis is done much more simply
by forming "pre-parsed" structures rather than having to duplicate tons
of logic.

One of the bugs fixed by this was evident in the tests where we accepted
a pipeline that wasn't really well formed. Another bug is PR28577 for
which I have added a test case.

The code is less efficient than the previous code but I'm really hoping
that's not a priority. ;]

Thanks to Sean for the review!

Differential Revision: https://reviews.llvm.org/D22724

llvm-svn: 277561
2016-08-03 03:21:41 +00:00
Dehao Chen 6132ee8502 [PM] Convert Loop Strength Reduce pass to new PM
Summary: Convert Loop String Reduce pass to new PM

Reviewers: davidxl, silvas

Subscribers: junbuml, sanjoy, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D22468

llvm-svn: 275919
2016-07-18 21:41:50 +00:00
Dehao Chen 1a44452b11 [PM] Convert IVUsers analysis to new pass manager.
Summary: Convert IVUsers analysis to new pass manager.

Reviewers: davidxl, silvas

Subscribers: junbuml, sanjoy, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D22434

llvm-svn: 275698
2016-07-16 22:51:33 +00:00
Matt Arsenault f42c69206d AMDGPU: Run pointer optimization passes
llvm-svn: 272736
2016-06-15 00:11:01 +00:00
Geoff Berry 43e5160d0e Reapply [LSR] Create fewer redundant instructions.
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs.  This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.

Originally reviewed in http://reviews.llvm.org/D18001

Reviewers: atrick

Subscribers: llvm-commits, mzolotukhin, mcrosier

Differential Revision: http://reviews.llvm.org/D18480

llvm-svn: 271929
2016-06-06 19:10:46 +00:00
Matt Arsenault 71fa1f375e AMDGPU: Fix a few slightly broken tests
Fix minor bugs and uses of undef which break when
pointer related optimization passes are run.

llvm-svn: 269944
2016-05-18 15:48:44 +00:00
Matt Arsenault 7d1b6c81af AMDGPU: Stop reporting an addressing mode for unknown addrspace
This was being treated the same as private, which has an immediate
offset. For unknown, it probably means it's for a computation not
actually being used for accessing memory, so it should not have a
nontrivial addressing mode.

llvm-svn: 268002
2016-04-29 06:25:10 +00:00
Chuang-Yu Cheng d3fb38cae5 Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge
Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
basic blocks. This pass can delete loop preheaders which frequently creates
critical edges. A preheader can be a convenient place to spill registers to
the stack. If the entrance to a loop body is a critical edge, then spills
may occur in the loop body rather than immediately before it. This patch
protects loop preheaders from deletion in CodeGenPrepare even if they are
nearly empty.

Since the patch alters the CFG, it affects a large number of test cases.
In most cases, the changes are merely cosmetic (basic blocks have different
names or instruction orders change slightly). I am somewhat concerned about
the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
deleted, then the MIPS backend does not take advantage of a branch delay
slot. Consequently, I would like some close review by a MIPS expert.

The patch also partially subsumes D16893 from George Burgess IV. George
correctly notes that CodeGenPrepare does not actually preserve the dominator
tree. I think the dominator tree was usually not valid when CodeGenPrepare
ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
now always valid before CodeGenPrepare.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng

http://reviews.llvm.org/D16984

llvm-svn: 265397
2016-04-05 14:06:20 +00:00
David Majnemer e09d035dad [LoopStrengthReduce] Don't hoist into a catchswitch
We try to hoist the insertion point as high as possible to encourage
sharing.  However, we must be careful not to hoist into a catchswitch as
it is both an EHPad and a terminator.

llvm-svn: 264344
2016-03-24 21:40:22 +00:00
Geoff Berry 56fabf9b55 Revert "[LSR] Create fewer redundant instructions."
This reverts commit r263644.  Investigating bootstrap failures.

llvm-svn: 263655
2016-03-16 19:21:47 +00:00
Geoff Berry 459b750871 [LSR] Create fewer redundant instructions.
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs.  This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.

Reviewers: atrick

Subscribers: mcrosier, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18001

llvm-svn: 263644
2016-03-16 17:29:49 +00:00
Wei Mi a49559befb [SCEV] Try to reuse existing value during SCEV expansion
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.

This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.

The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.

1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds SCEV for the old instruction,
applies some tranformation on the SCEV generated, then expands the transformed
SCEV and insert the expanded value into new function. Because SCEV expansion
may reuse value cached in ExprValueMap, the value in old function may be
inserted into new function, which is wrong.
   In SCEVExpander::expand, there is a logic to check the cached value to
be used should dominate the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function. However
for unreachable node, DominatorTreeBase::dominates thinks it will be
dominated by any other node.
   The fix is to simply add a check that the cached value to be used in
expansion should be in the same function as the insertion point instruction.

2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a normal value cached. Although in the cached value set in ExprValueMap,
there is a Constant type value, but it is not easy to find it out -- the cached
Value set is not sorted according to the potential cost. Existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
   The fix is that when the SCEV is of scConstant type, don't try the reuse
logic. simply expand it.

Differential Revision: http://reviews.llvm.org/D12090

llvm-svn: 259736
2016-02-04 01:27:38 +00:00
David Majnemer a53b5bbb18 [LoopStrengthReduce] Don't rewrite PHIs with incoming values from CatchSwitches
Bail out if we have a PHI on an EHPad that gets a value from a
CatchSwitchInst.  Because the CatchSwitchInst cannot be split, there is
no good place to stick any instructions.

This fixes PR26373.

llvm-svn: 259702
2016-02-03 21:30:34 +00:00
Wei Mi 97de385868 Revert r259662, which caused regressions on polly tests.
llvm-svn: 259675
2016-02-03 18:05:57 +00:00
Wei Mi ed133978a0 [SCEV] Try to reuse existing value during SCEV expansion
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.

This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.

Differential Revision: http://reviews.llvm.org/D12090

llvm-svn: 259662
2016-02-03 17:05:12 +00:00
Dan Gohman 75452734e4 Followup to 258750; update more tests to use .p2align .
llvm-svn: 258755
2016-01-26 00:35:07 +00:00
David Majnemer bbfc7219ef [IR] Remove terminatepad
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function.  This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.

Depends on D15478.

Differential Revision: http://reviews.llvm.org/D15479

llvm-svn: 255522
2015-12-14 18:34:23 +00:00
David Majnemer 8a1c45d6e8 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
David Majnemer 7378e7a333 [LoopStrengthReduce] Don't increment iterator past the end of the BB
We tried to move the insertion point beyond instructions like landingpad
and cleanuppad.
However, we *also* tried to move past catchpad.  This is problematic
because catchpad is also a terminator.

This fixes PR25541.

llvm-svn: 253238
2015-11-16 17:37:58 +00:00
David Majnemer b222184223 [LoopStrengthReduce] Don't bother fixing up PHIs from EH Pad preds
We cannot really insert fixup code into a PHI's predecessor.

This fixes PR25445.

llvm-svn: 252416
2015-11-08 05:04:07 +00:00
David Majnemer 235acde953 [ScalarEvolutionExpander] PHI on a catchpad can be used on both edges
A PHI on a catchpad might be used by both edges out of the catchpad,
feeding back into a loop.  In this case, just use the insertion point.
Anything more clever would require new basic blocks or PHI placement.

llvm-svn: 251442
2015-10-27 19:48:28 +00:00
David Majnemer dd9a815746 [ScalarEvolutionExpander] Properly insert no-op casts + EH Pads
We want to insert no-op casts as close as possible to the def.  This is
tricky when the cast is of a PHI node and the BasicBlocks between the
def and the use cannot hold any instructions.  Iteratively walk EH pads
until we hit a non-EH pad.

This fixes PR25326.

llvm-svn: 251393
2015-10-27 07:36:42 +00:00
Sanjoy Das 8f27415c05 [SCEV] Mark AddExprs as nsw or nuw if legal
Summary:
This uses `ScalarEvolution::getRange` and not potentially control
dependent `nsw` and `nuw` bits on the arithmetic instruction.

Reviewers: atrick, hfinkel, nlewycky

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13613

llvm-svn: 251048
2015-10-22 19:57:19 +00:00
Craig Topper 2c4068f409 [TwoAddressInstructionPass] When looking for a 3 addr conversion after commuting, make sure regB has been updated to take into account the commute.
llvm-svn: 249378
2015-10-06 05:39:59 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Duncan P. N. Exon Smith 814b8e91c7 DI: Require subprogram definitions to be distinct
As a follow-up to r246098, require `DISubprogram` definitions
(`isDefinition: true`) to be 'distinct'.  Specifically, add an assembler
check, a verifier check, and bitcode upgrading logic to combat testcase
bitrot after the `DIBuilder` change.

While working on the testcases, I realized that
test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore.  Its
purpose was to check for a corner case in PR22792 where two subprogram
definitions match exactly and share the same metadata node.  The new
verifier check, requiring that subprogram definitions are 'distinct',
precludes that possibility.

I updated almost all the IR with the following script:

    git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'

Likely some variant of would work for out-of-tree testcases.

llvm-svn: 246327
2015-08-28 20:26:49 +00:00
Jingyue Wu ca3ef11a9b [NVPTX] truncating 64-bit to 32-bit is free
Summary:
Add an LSR test that exercises isTruncateFree. Without this change, LSR creates
another indvar representing the truncated value.

Reviewers: jholewinski, eliben

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D12058

llvm-svn: 245611
2015-08-20 20:59:02 +00:00
Matt Arsenault 427a0fd22e LoopStrengthReduce: Try to pass address space to isLegalAddressingMode
This seems to only work some of the time. In some situations,
this seems to use a nonsensical type and isn't actually aware of the
memory being accessed. e.g. if branch condition is an icmp of a pointer,
it checks the addressing mode of i1.

llvm-svn: 245137
2015-08-15 00:53:06 +00:00
Bjarke Hammersholt Roune 9791ed4705 [SCEV] Apply NSW and NUW flags via poison value analysis for sub, mul and shl
Summary:
http://reviews.llvm.org/D11212 made Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs for add instructions. This patch expands that to sub, mul and shl instructions.

This change makes LSR able to generate pointer induction variables for loops like these, where the index is 32 bit and the pointer is 64 bit:

  for (int i = 0; i < numIterations; ++i)
    sum += ptr[i - offset];

  for (int i = 0; i < numIterations; ++i)
    sum += ptr[i * stride];

  for (int i = 0; i < numIterations; ++i)
    sum += ptr[3 * (i << 7)];


Reviewers: atrick, sanjoy

Subscribers: sanjoy, majnemer, hfinkel, llvm-commits, meheff, jingyue, eliben

Differential Revision: http://reviews.llvm.org/D11860

llvm-svn: 245118
2015-08-14 22:45:26 +00:00
Sanjoy Das 215df9ed98 Revert "[LSR] Generate and use zero extends"
This reverts commit r243348 and r243357.  They caused PR24347.

llvm-svn: 243939
2015-08-04 01:52:05 +00:00
Duncan P. N. Exon Smith ed013cd221 DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.

Most of the testcase updates were generated by the following sed script:

    find test/ -name "*.ll" -o -name "*.mir" |
    xargs grep -l 'DILocalVariable' |
    xargs sed -i '' \
      -e 's/tag: DW_TAG_arg_variable, //' \
      -e 's/tag: DW_TAG_auto_variable, //'

There were only a handful of tests in `test/Assembly` that I needed to
update by hand.

(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`.  I've added a FIXME to that effect.)

llvm-svn: 243774
2015-07-31 18:58:39 +00:00
Jingyue Wu 42f1d67a45 [SCEV] Apply NSW and NUW flags via poison value analysis
Summary:
Make Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs in some cases. This is based on reasoning about when poison from instructions with these flags would trigger undefined behavior. This gives a 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.

There does not seem to be clear agreement about when poison should be considered to propagate through instructions. In this analysis, poison propagates only in cases where that should be uncontroversial.

This change makes LSR able to create induction variables for expressions like &ptr[i + offset] for loops like this:

  for (int i = 0; i < limit; ++i) {
    sum += ptr[i + offset];
  }

Here ptr is a 64 bit pointer and offset is a 32 bit integer. For NVPTX, LSR currently creates an induction variable for i + offset instead, which is not as fast. Improving this situation is what brings the 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.


There are more details in this discussion on llvmdev.
June: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/thread.html#87234
July: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/thread.html#87392

Patch by Bjarke Roune

Reviewers: eliben, atrick, sanjoy

Subscribers: majnemer, hfinkel, jingyue, meheff, llvm-commits

Differential Revision: http://reviews.llvm.org/D11212

llvm-svn: 243460
2015-07-28 18:22:40 +00:00
Sanjoy Das 3895a57b32 [LSR] Move X86 specific test case to X86/
rL243348 added the test case in the wrong directory.

llvm-svn: 243357
2015-07-28 00:13:42 +00:00