Commit Graph

291 Commits

Author SHA1 Message Date
Kyle Butt 7e8be28661 CodeGen: BlockPlacement: Don't always tail-duplicate with no other successor.
The math works out where it can actually be counter-productive. The probability
calculations correctly handle the case where the alternative is 0 probability,
rely on those calculations.

Includes a test case that demonstrates the problem.

llvm-svn: 299892
2017-04-10 22:28:22 +00:00
Kyle Butt ee51a20164 CodeGen: BlockPlacement: Minor probability changes.
Qin may be large, and Succ may be more frequent than BB. Take these both into
account when deciding if tail-duplication is profitable.

llvm-svn: 299891
2017-04-10 22:28:18 +00:00
Dehao Chen b197d5b0a0 Fix trellis layout to avoid mis-identify triangle.
Summary:
For the following CFG:

A->B
B->C
A->C

If there is another edge B->D, then ABC should not be considered as triangle.

Reviewers: davidxl, iteratee

Reviewed By: iteratee

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D31310

llvm-svn: 298661
2017-03-23 23:28:09 +00:00
Kyle Butt 08655997eb CodeGen: BlockPlacement: Reduce TriangleChainCount to 2
This produces a 1% speedup on an important internal Google benchmark
(protocol buffers), with no other regressions in google or in the llvm
test-suite. Only 5 targets in the entire llvm test-suite are affected,
and on those 5 targets the size increase is 0.027%

llvm-svn: 297925
2017-03-16 01:32:29 +00:00
Kyle Butt 1fa6030767 CodeGen: BlockPlacement: Precompute layout for chains of triangles.
For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:

1) The post-dominators marked 50% are actually taken 56% (This shrinks with
   longer chains)
2) The chains are statically correlated. Branch probabilities have a very
   U-shaped distribution.
   [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
   If the branches in a chain are likely to be from the same side of the
   distribution as their predecessor, but are independent at runtime, this
   transformation is profitable. (Because the cost of being wrong is a small
   fixed cost, unlike the standard triangle layout where the cost of being
   wrong scales with the # of triangles.)
3) The chains are dynamically correlated. If the probability that a previous
   branch was taken positively influences whether the next branch will be
   taken
We believe that 2 and 3 are common enough to justify the small margin in 1.

The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.

llvm-svn: 296845
2017-03-03 01:00:22 +00:00
Kyle Butt 1393761e0c CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic.
Outlining optional branches isn't a good heuristic, and it's never been
on by default. Remove it to clean things up.

llvm-svn: 296818
2017-03-02 21:44:24 +00:00
Kyle Butt ebe6cc4dad CodeGen: MachineBlockPlacement: Rename member to more general name. NFC.
Rename ComputedTrellisEdges to ComputedEdges to allow for other methods of
pre-computing edges.

Differential Revision: https://reviews.llvm.org/D30308

llvm-svn: 296018
2017-02-23 21:22:24 +00:00
Kyle Butt 7fbec9bdf1 Codegen: Make chains from trellis-shaped CFGs
Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:

  A     B
  |\   /|
  | \ / |
  |  X  |
  | / \ |
  |/   \|
  C     D

would be laid out A; B->C ; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a a group of 2 or more
predecessor blocks that all have the same successors.

because of this we can tail duplicate to extend existing trellises.

As an example consider the following CFG:

    B   D   F   H
   / \ / \ / \ / \
  A---C---E---G---Ret

Where A,C,E,G are all small (Currently 2 instructions).

The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.

The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),F(G),G,H,ret

define void @straight_test(i32 %tag) {
entry:
  br label %test1
test1: ; A
  %tagbit1 = and i32 %tag, 1
  %tagbit1eq0 = icmp eq i32 %tagbit1, 0
  br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
  call void @a()
  br label %test2
test2: ; C
  %tagbit2 = and i32 %tag, 2
  %tagbit2eq0 = icmp eq i32 %tagbit2, 0
  br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
  call void @b()
  br label %test3
test3: ; E
  %tagbit3 = and i32 %tag, 4
  %tagbit3eq0 = icmp eq i32 %tagbit3, 0
  br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
  call void @c()
  br label %test4
test4: ; G
  %tagbit4 = and i32 %tag, 8
  %tagbit4eq0 = icmp eq i32 %tagbit4, 0
  br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
  call void @d()
  br label %exit
exit:
  ret void
}

here is the layout after D27742:
straight_test:                          # @straight_test
; ... Prologue elided
; BB#0:                                 # %entry ; A (merged with test1)
; ... More prologue elided
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_2
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_3
	b .LBB0_4
.LBB0_2:                                # %optional1 ; B (copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_4
.LBB0_3:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_5
	b .LBB0_6
.LBB0_4:                                # %optional2 ; D (copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_5:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
	b .LBB0_7
.LBB0_6:                                # %optional3 ; F (copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit ; Ret
	ld 30, 96(1)                    # 8-byte Folded Reload
	addi 1, 1, 112
	ld 0, 16(1)
	mtlr 0
	blr

The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.

This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.

Here is the resulting concrete layout:

straight_test:                          # @straight_test
; BB#0:                                 # %entry ; A (merged with test1)
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_4
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_5
.LBB0_2:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_3:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	bne	 0, .LBB0_7
	b .LBB0_8
.LBB0_4:                                # %optional1 ; B (Copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_2
.LBB0_5:                                # %optional2 ; D (Copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_3
.LBB0_6:                                # %optional3 ; F (Copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit

Differential Revision: https://reviews.llvm.org/D28522

llvm-svn: 295223
2017-02-15 19:49:14 +00:00
Xinliang David Li 538d666814 include function name in dot filename
Differential Revision: http://reviews.llvm.org/D29975

llvm-svn: 295220
2017-02-15 19:21:04 +00:00
Kyle Butt c7d67eef5a [CodeGen]: BlockPlacement: Skip extraneous logging.
Move a check for blocks that are not candidates for tail duplication up before
the logging. Reduces logging noise. No non-logging changes intended.

llvm-svn: 294086
2017-02-04 02:26:34 +00:00
Kyle Butt e9425c4ff8 [CodeGen]: BlockPlacement: Apply const liberally. NFC
Anything that needs to be passed to AnalyzeBranch unfortunately can't be const,
or more would be const. Added const_iterator to BlockChain to allow
BlockChain to be const when we don't expect to change it.

llvm-svn: 294085
2017-02-04 02:26:32 +00:00
Xinliang David Li 58fcc9bdce [PGO] internal option cleanups
1. Added comments for options
2. Added missing option cl::desc field
3. Uniified function filter option for graph viewing.
   Now PGO count/raw-counts share the same
   filter option: -view-bfi-func-name=.

llvm-svn: 293938
2017-02-02 21:29:17 +00:00
Xinliang David Li 1eb4ec6a2e [PGO] make graph view internal options available for all builds
Differential Revision: https://reviews.llvm.org/D29259

llvm-svn: 293921
2017-02-02 19:18:56 +00:00
Kyle Butt b15c06677c CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well, subject to some simple frequency calculations.

Differential Revision: https://reviews.llvm.org/D28583

llvm-svn: 293716
2017-01-31 23:48:32 +00:00
Xinliang David Li fd3f645f9d Add support to dump dot graph block layout after MBP
Differential Revision: https://reviews.llvm.org/D29141

llvm-svn: 293408
2017-01-29 01:57:02 +00:00
Kyle Butt efe56fed12 Revert "CodeGen: Allow small copyable blocks to "break" the CFG."
This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded.

This needs a simple probability check because there are some cases where it is
not profitable.

llvm-svn: 291695
2017-01-11 19:55:19 +00:00
Kyle Butt df27aa8c89 CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well.

Differential revision: https://reviews.llvm.org/D27742

llvm-svn: 291609
2017-01-10 23:04:30 +00:00
Hal Finkel 34f9d6ac11 Trying to fix NDEBUG build after r289764
llvm-svn: 289766
2016-12-15 05:33:19 +00:00
Sanjoy Das d7389d6261 [MachineBlockPlacement] Don't make blocks "uneditable"
Summary:
This fixes an issue with MachineBlockPlacement due to a badly timed call
to `analyzeBranch` with `AllowModify` set to true.  The timeline is as
follows:

 1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls
    `TailDup.shouldTailDuplicate` on its argument, which in turn calls
    `analyzeBranch` with `AllowModify` set to true.

 2. This `analyzeBranch` call edits the terminator sequence of the block
    based on the physical layout of the machine function, turning an
    unanalyzable non-fallthrough block to a unanalyzable fallthrough
    block.  Normally MBP bails out of rearranging such blocks, but this
    block was unanalyzable non-fallthrough (and thus rearrangeable) the
    first time MBP looked at it, and so it goes ahead and decides where
    it should be placed in the function.

 3. When placing this block MBP fails to analyze and thus update the
    block in keeping with the new physical layout.

Concretely, before (1) we have something like:

```
LBL0:
  < unknown terminator op that may branch to LBL1 >
  jmp LBL1

LBL1:
  ... A

LBL2:
  ... B
```

In (2), analyze branch simplifies this to

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump removed

LBL1:
  ... A

LBL2:
  ... B
```

In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2
after the first block since that is profitable.

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump

LBL2:
  ... B

LBL1:
  ... A
```

and the program now has incorrect behavior (we no longer fall-through
from `LBL0` to `LBL1`) because MBP can no longer edit LBL0.

There are several possible solutions, but I went with removing the teeth
off of the `analyzeBranch` calls in TailDuplicator.  That makes thinking
about the result of these calls easier, and breaks nothing in the lit
test suite.

I've also added some bookkeeping to the MachineBlockPlacement pass and
used that to write an assert that would have caught this.

Reviewers: chandlerc, gberry, MatzeB, iteratee

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27783

llvm-svn: 289764
2016-12-15 05:08:57 +00:00
Rong Xu 66827427e1 Make block placement deterministic
We fail to produce bit-to-bit matching stage2 and stage3 compiler in PGO
bootstrap build. The reason is because LoopBlockSet is of SmallPtrSet type
whose iterating order depends on the pointer value.

This patch fixes this issue by changing to use SmallSetVector.

Differential Revision: http://reviews.llvm.org/D26634

llvm-svn: 287148
2016-11-16 20:50:06 +00:00
Eric Christopher 690f8e587e Move the initialization of PreferredLoopExit into runOnMachineFunction to be near the other function specific initializations.
llvm-svn: 285758
2016-11-01 22:15:50 +00:00
Sam McCall 2a36eee4ca Fix uninitialized access in MachineBlockPlacement.
Summary:
Currently PreferredLoopExit is set only in buildLoopChains, which is
never called if there are no MachineLoops.

MSan is currently broken by this:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/145/steps/check-llvm%20msan/logs/stdio

This is a naive fix to get things green again. iteratee: you may have a better fix.

This change will also mean PreferredLoopExit will not carry over if
buildCFGChains() is called a second time in runOnMachineFunction, this
appears to be the right thing.

Reviewers: bkramer, iteratee, echristo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26069

llvm-svn: 285757
2016-11-01 22:02:14 +00:00
Kyle Butt ab9cca7b0c CodeGen: Handle missed case of block removal during BlockPlacement.
There is a use after free bug in the existing code. Loop layout selects
a preferred exit block, and then lays out the loop. If this block is
removed during layout, it needs to be invalidated to prevent a use after
free.

llvm-svn: 285348
2016-10-27 21:37:20 +00:00
Kyle Butt 0846e56e63 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283934
2016-10-11 20:36:43 +00:00
Daniel Jasper 0c42dc4784 Revert "Codegen: Tail-duplicate during placement."
This reverts commit r283842.

test/CodeGen/X86/tail-dup-repeat.ll causes and llc crash with our
internal testing. I'll share a link with you.

llvm-svn: 283857
2016-10-11 07:36:11 +00:00
Kyle Butt ae068a320c Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283842
2016-10-11 01:20:33 +00:00
Kyle Butt 2facd194a2 Revert "Codegen: Tail-duplicate during placement."
This reverts commit 71c312652c10f1855b28d06697c08d47e7a243e4.

llvm-svn: 283647
2016-10-08 01:47:05 +00:00
Kyle Butt 37e676d857 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283619
2016-10-07 22:33:20 +00:00
Kyle Butt 25ac35d822 Revert "Codegen: Tail-duplicate during placement."
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.

Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.

llvm-svn: 283292
2016-10-05 01:39:29 +00:00
Kyle Butt adabac2d57 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283274
2016-10-04 23:54:18 +00:00
Kyle Butt 3ffb8529bc Revert "Codegen: Tail-duplicate during placement."
This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.

Causing crashes on aarch64 build.

llvm-svn: 283172
2016-10-04 00:38:23 +00:00
Kyle Butt 396bfdd707 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

llvm-svn: 283164
2016-10-04 00:00:09 +00:00
Matt Arsenault 1b9fc8ed65 Finish renaming remaining analyzeBranch functions
llvm-svn: 281535
2016-09-14 20:43:16 +00:00
Matt Arsenault e8e0f5cac6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Kyle Butt 64e428147f Branch Folding: Accept explicit threshold for tail merge size.
This is prep work for allowing the threshold to be different during layout,
and to enforce a single threshold between merging and duplicating during
layout. No observable change intended.

llvm-svn: 279117
2016-08-18 18:57:29 +00:00
Sjoerd Meijer 15c81b05ea [MBP] do not reorder and move up loop latch block
Do not reorder and move up a loop latch block before a loop header
when optimising for size because this will generate an extra 
unconditional branch.

Differential Revision: https://reviews.llvm.org/D22521

llvm-svn: 278840
2016-08-16 19:50:33 +00:00
David Majnemer c700490f48 Use the range variant of remove_if instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278475
2016-08-12 04:32:37 +00:00
David Majnemer 0d955d0bf5 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
Kyle Butt 02d8d054ab Codegen: MachineBlockPlacement Improve probability layout.
The following pattern was being layed out poorly:

              A
             / \
            B   C
           / \ / \
          D   E   ? (Doesn't matter)

Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)

The current algorithm gives:
A,B,C,E (D goes on worklist)

It does this even if C has a frequency count of 0. This patch
adjusts the layout calculation so that if freq(B->E) >> freq(C->E)
then we go ahead and layout E rather than C. Fallthrough half the time
is better than fallthrough never, or fallthrough very rarely. The
resulting layout is:

A,B,E, (C and D are in a worklist)

llvm-svn: 277187
2016-07-29 18:09:28 +00:00
Sjoerd Meijer 5e11a18f5a [MBP] Added some more debug messages and some clean ups /NFC
Differential Revision: https://reviews.llvm.org/D22669

llvm-svn: 276849
2016-07-27 08:49:23 +00:00
Sjoerd Meijer fd0ad4e193 [MBP] Clean up of the comments, and a first attempt to better describe a part
of the algorithm.

Differential Revision: https://reviews.llvm.org/D22364

llvm-svn: 275595
2016-07-15 18:41:56 +00:00
Jacques Pienaar 71c30a14b7 Rename AnalyzeBranch* to analyzeBranch*.
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.

Reviewers: tstellarAMD, mcrosier

Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai

Differential Revision: https://reviews.llvm.org/D22409

llvm-svn: 275564
2016-07-15 14:41:04 +00:00
Xinliang David Li 93926acbb2 [MBP] method interface cleanup
Make worklist and ehworklist member of the
class so that they don't need to be passed around.

llvm-svn: 274333
2016-07-01 05:46:48 +00:00
Kyle Butt 82c2290e0f Codegen: [MBP] Add messages to asserts. NFC
llvm-svn: 274075
2016-06-28 22:50:54 +00:00
Xinliang David Li 449cdfd00a [MBP] show function name in debug dump
llvm-svn: 273744
2016-06-24 22:54:21 +00:00
Kyle Butt b3875ea71b Codegen: [MBP] Add assert strings. NFC
llvm-svn: 273067
2016-06-17 22:40:19 +00:00
Xinliang David Li e34ed833e5 [MBP] add comments and bug fix
Document the new parameter and threshod computation
model.  Also fix a bug when the threshold parameter
is set to be different from the default.

 

llvm-svn: 272749
2016-06-15 03:03:30 +00:00
Dehao Chen 9f2bdfb40f Set machine block placement hot prob threshold for both static and runtime profile.
Summary: With runtime profile, we have more confidence in branch probability, thus during basic block layout, we set a lower hot prob threshold so that blocks can be layouted optimally.

Reviewers: djasper, davidxl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20991

llvm-svn: 272729
2016-06-14 22:27:17 +00:00
Xinliang David Li 52530a72c9 [MBP] Interface cleanups /NFC
Save machine function pointer so that
the reference does not need to be passed around.

This also gives other methods access to machine
function for information such as entry count etc.

 

llvm-svn: 272594
2016-06-13 22:23:44 +00:00
Xinliang David Li cbf1214f76 [MBP] Code cleanup #3 /NFC
This is third patch to clean up the code.

Included in this patch:
1. Further unclutter trace/chain formation main routine;
2. Isolate the logic to compute global cost/conflict detection
   into its own method;
3. Heavily document the selection algorithm;
4. Added helper hook to allow PGO specific logic to be
   added in the future.
 

llvm-svn: 272582
2016-06-13 20:24:19 +00:00
Xinliang David Li 071d0f1807 [MBP] Code cleanup /NFC
This is second patch to clean up the code.

In this patch, the logic to determine block outlinining
is refactored and more comments are added.
 

llvm-svn: 272514
2016-06-12 16:54:03 +00:00
Xinliang David Li 594ffa3d36 [MBP] Code cleanup /NFC
This is one of the patches to clean up the code so that
it is in a better form to make future enhancements easier.

In htis patch, the logic to collect viable successors are
extrated as a helper to unclutter the caller which gets very
large recenty. Also cleaned up BP adjustment code.
 

llvm-svn: 272482
2016-06-11 18:35:40 +00:00
Haicheng Wu 5b458cc1f6 Reapply "[MBP] Reduce code size by running tail merging in MBP.""
This reapplies commit r271930, r271915, r271923.  They hit a bug in
Thumb which is fixed in r272258 now.

The original message:

The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.

This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.

llvm-svn: 272267
2016-06-09 15:24:29 +00:00
Dehao Chen 769219b11a Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and backward-hot-prob consistently.
Summary:
Consider the following diamond CFG:

 A
/ \
B C
 \/
 D

Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.

Reviewers: djasper, davidxl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20989

llvm-svn: 272203
2016-06-08 21:30:12 +00:00
Haicheng Wu 4fa9f3ae45 Revert "[MBP] Reduce code size by running tail merging in MBP."
This reverts commit r271930, r271915, r271923.  They break a thumb selfhosting
bot.

llvm-svn: 272017
2016-06-07 15:17:21 +00:00
Haicheng Wu 77ea344786 [MBP] Reduce code size by running tail merging in MBP.
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.

This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.

Differential Revision: http://reviews.llvm.org/D20276

llvm-svn: 271925
2016-06-06 18:36:07 +00:00
Xinliang David Li ff2873742e Replace hard coded probability threshold with parameter /NFC
llvm-svn: 271751
2016-06-03 23:48:36 +00:00
Haicheng Wu 90a55651e6 [MBP] Factor out the optimizations on branch conditions and unanalyzable branches. NFCI.
The benefits of this patch are

-- We call AnalyzeBranch() to optimize unanalyzable branches, but the result of
   AnalyzeBranch() is not used. Now the result is useful.

-- Before the layout of all the MBBs is set, the result of AnalyzeBranch() is
   not correct and needs to be fixed before using it to optimize the branch
   conditions. Now this optimization is called after the layout, the code used
   to fix the result of AnalyzeBranch() is not needed.

-- The branch condition of the last block is not optimized before. Now it is
   optimized.

Differential Revision: http://reviews.llvm.org/D20177

llvm-svn: 270623
2016-05-24 22:16:14 +00:00
Haicheng Wu c01919e796 [MBP] Remove a redundant skipFunction(). NFC.
skipFunction() is called twice.

Differential Revision: http://reviews.llvm.org/D20377

llvm-svn: 269994
2016-05-18 22:34:45 +00:00
Xinliang David Li b840bb8714 Fix option description /NFC
llvm-svn: 269307
2016-05-12 16:39:02 +00:00
Xinliang David Li f0ab6dfedc [Layout] Add a new option (NFC)
Currently cost based loop rotation algo can only be turned on with
two conditions: the function has real profile data, and -precise-rotation-cost
flag is turned on. This is not convenient for developers to experiment
when profile is not available. Add a new option to force the new
rotation algorithm -force-precise-rotation-cost

llvm-svn: 269266
2016-05-12 02:04:41 +00:00
Andrew Kaylor 50271f787e Add opt-bisect support to additional passes that can be skipped
Differential Revision: http://reviews.llvm.org/D19882

llvm-svn: 268457
2016-05-03 22:32:30 +00:00
Quentin Colombet 776e6de516 [MachineBlockPlacement] Let the target optimize the branches at the end.
After the layout of the basic blocks is set, the target may be able to get rid
of unconditional branches to fallthrough blocks that the generic code does not
catch. This happens any time TargetInstrInfo::AnalyzeBranch is not able to
analyze all the branches involved in the terminators sequence, while still
understanding a few of them.

In such situation, AnalyzeBranch can directly modify the branches if it has been
instructed to do so.

This patch takes advantage of that.

llvm-svn: 268328
2016-05-02 22:58:59 +00:00
Haicheng Wu 4afe0425db [MBP] Use Function::optForSize() instead of checking OptimizeForSize directly.
Fix a FIXME.  Disable loop alignment if compiled with -Oz now.

llvm-svn: 268121
2016-04-29 22:01:10 +00:00
Haicheng Wu e749ce53d4 [MBP] Split placement and alignment into two functions. NFC.
Cut and Paste.

llvm-svn: 268067
2016-04-29 17:06:44 +00:00
Andrew Kaylor aa641a5171 Re-commit optimization bisect support (r267022) without new pass manager support.
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267231
2016-04-22 22:06:11 +00:00
Vedant Kumar 6013f45f92 Revert "Initial implementation of optimization bisect support."
This reverts commit r267022, due to an ASan failure:

  http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549

llvm-svn: 267115
2016-04-22 06:51:37 +00:00
Andrew Kaylor f0f279291c Initial implementation of optimization bisect support.
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.

The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.

The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267022
2016-04-21 17:58:54 +00:00
Amaury Sechet c53ad4f3b2 Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule
Summary:
EHPad BB are not entered the classic way and therefor do not need to be placed after their predecessors. This patch make sure EHPad BB are not chosen amongst successors to form chains, and are selected as last resort when selecting the best candidate.

EHPad are scheduled in reverse probability order in order to have them flow into each others naturally.

Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17625

llvm-svn: 265726
2016-04-07 21:29:39 +00:00
Amaury Sechet 33c161c02f [BlockPlacement] Remove an unnecessary continue
NFC.

llvm-svn: 265643
2016-04-07 06:35:00 +00:00
Amaury Sechet 9ee4ddd710 [MBP] Remove an unused function parameter
NFC.

llvm-svn: 265642
2016-04-07 06:34:47 +00:00
Amaury Sechet 41474a52e7 Revert "[BlockPlacement] Remove an unnecessary continue" and "[MBP] Remove an unused function parameter"
llvm-svn: 265638
2016-04-07 04:28:40 +00:00
Haicheng Wu 1951cf24a7 [MBP] Remove an unused function parameter
NFC.

llvm-svn: 265596
2016-04-06 20:38:20 +00:00
Haicheng Wu 3618fa786f [BlockPlacement] Remove an unnecessary continue
NFC.

llvm-svn: 265407
2016-04-05 15:37:08 +00:00
Amaury Sechet eae09c2c2a Factor out MachineBlockPlacement::fillWorkLists. NFC
Summary: There are places in MachineBlockPlacement where a worklist is filled in pretty much identical way. The code is duplicated. This refactor it so that the same code is used in both scenarii.

Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18077

llvm-svn: 263495
2016-03-14 21:24:11 +00:00
Junmo Park 4ba6cf69e4 Minor code cleanup. NFC.
llvm-svn: 263196
2016-03-11 05:07:07 +00:00
Philip Reames ae27b2380f [MBP] Renaming a confusing variable and add clarifying comments
Was discussed as part of http://reviews.llvm.org/D17830

llvm-svn: 262571
2016-03-03 00:58:43 +00:00
Philip Reames 23d933982a [MBP] Avoid placing random blocks between loop preheader and header
If we have a loop with a rarely taken path, we will prune that from the blocks which get added as part of the loop chain. The problem is that we weren't then recognizing the loop chain as schedulable when considering the preheader when forming the function chain. We'd then fall to various non-predecessors before finally scheduling the loop chain (as if the CFG was unnatural.) The net result was that there could be lots of garbage between a loop preheader and the loop, even though we could have directly fallen into the loop. It also meant we separated hot code with regions of colder code.

The particular reason for the rejection of the loop chain was that we were scanning predecessor of the header, seeing the backedge, believing that was a globally more important predecessor (true), but forgetting to account for the fact the backedge precessor was already part of the existing loop chain (oops!.

Differential Revision: http://reviews.llvm.org/D17830

llvm-svn: 262547
2016-03-03 00:01:42 +00:00
Philip Reames 02e1132afb [MBP] Remove overly verbose debug output
llvm-svn: 262531
2016-03-02 22:40:51 +00:00
Philip Reames b9688f4382 [MBP] Adjust debug output to be more focused and approachable
llvm-svn: 262522
2016-03-02 21:45:13 +00:00
Chad Rosier 406808e344 Partially revert "Add command line options to force function/loop alignments."
This partially reverts r256571 in favor of the solution in r258409.

llvm-svn: 258421
2016-01-21 18:49:15 +00:00
Geoff Berry 10494aca05 [BlockPlacement] Add option to align all non-fall-through blocks.
Summary: This option is being added for testing purposes.

Reviewers: mcrosier

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16410

llvm-svn: 258409
2016-01-21 17:25:52 +00:00
Chad Rosier 6b4326367a Add command line options to force function/loop alignments.
These are being added for testing purposes.
http://reviews.llvm.org/D15648

llvm-svn: 256571
2015-12-29 18:18:07 +00:00
Cong Hou d97c100dc4 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
(This is the second attempt to submit this patch. The first caused two assertion
 failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)

The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254377
2015-12-01 05:29:22 +00:00
Hans Wennborg 1dbaf67537 Revert r254348: "Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces."
and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."

Asserts were firing in Chromium builds. See PR25687.

llvm-svn: 254366
2015-12-01 03:49:42 +00:00
Cong Hou 1ccca9e673 Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction.
The root cause is the rounding behavior in BranchProbability construction. We may consider to use truncation instead in the future.

llvm-svn: 254356
2015-12-01 00:55:42 +00:00
Cong Hou fa1917c673 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254348
2015-12-01 00:02:51 +00:00
Cong Hou 41cf1a5dfb Improving edge probabilities computation when choosing the best successor in machine block placement.
When looking for the best successor from the outer loop for a block
belonging to an inner loop, the edge probability computation can be
improved so that edges in the inner loop are ignored. For example,
suppose we are building chains for the non-loop part of the following
code, and looking for B1's best successor. Assume the true body is very
hot, then B3 should be the best candidate. However, because of the
existence of the back edge from B1 to B0, the probability from B1 to B3
can be very small, preventing B3 to be its successor. In this patch, when
computing the probability of the edge from B1 to B3, the weight on the
back edge B1->B0 is ignored, so that B1->B3 will have 100% probability.

if (...)
  do {
    B0;
    ... // some branches
    B1;
  } while(...);
else
  B2;
B3;


Differential revision: http://reviews.llvm.org/D10825

llvm-svn: 253414
2015-11-18 00:52:52 +00:00
Cong Hou b90b9e0531 In MachineBlockPlacement, filter cold blocks off the loop chain when profile data is available.
In the current BB placement algorithm, a loop chain always contains all loop blocks. This has a drawback that cold blocks in the loop may be inserted on a hot function path, hence increasing branch cost and also reducing icache locality.

Consider a simple example shown below:

A
|
B⇆C
|
D

When B->C is quite cold, the best BB-layout should be A,B,D,C. But the current implementation produces A,C,B,D.

This patch filters those cold blocks off from the loop chain by comparing the ratio:

LoopBBFreq / LoopFreq

to 20%: if it is less than 20%, we don't include this BB to the loop chain. Here LoopFreq is the frequency of the loop when we reduce the loop into a single node. In general we have more cold blocks when the loop has few iterations. And vice versa.


Differential revision: http://reviews.llvm.org/D11662

llvm-svn: 251833
2015-11-02 21:24:00 +00:00
Cong Hou 7745dbc5c4 Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.
Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation:

1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header.
2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop.
3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain.

Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies.

Differential revision: http://reviews.llvm.org/D10717

llvm-svn: 250754
2015-10-19 23:16:40 +00:00
Duncan P. N. Exon Smith 6ac07fd228 CodeGen: Remove implicit iterator conversions from MBB.cpp
Remove implicit ilist iterator conversions from MachineBasicBlock.cpp.

I've also added an overload of `splice()` that takes a pointer, since
it's a natural API.  This is similar to the overloads I added for
`remove()` and `erase()` in r249867.

llvm-svn: 249883
2015-10-09 19:36:12 +00:00
Craig Topper 77ec077067 Fix a spelling error in the description of a statistic. NFC
llvm-svn: 247771
2015-09-16 03:52:32 +00:00
Reid Kleckner 0e2882345d [WinEH] Add some support for code generating catchpad
We can now run 32-bit programs with empty catch bodies.  The next step
is to change PEI so that we get funclet prologues and epilogues.

llvm-svn: 246235
2015-08-27 23:27:47 +00:00
Cong Hou ec10587205 Revert r244154 which causes some build failure. See https://llvm.org/bugs/show_bug.cgi?id=24377.
llvm-svn: 244239
2015-08-06 18:17:29 +00:00
Cong Hou 36e7e52aa4 Record whether the weights on out-edges from a MBB are normalized.
1. Create a utility function normalizeEdgeWeights() in MachineBranchProbabilityInfo that normalizes a list of edge weights so that the sum of then can fit in uint32_t.
2. Provide an interface in MachineBasicBlock to normalize its successors' weights.
3. Add a flag in MachineBasicBlock that tracks whether its successors' weights are normalized.
4. Provide an overload of getSumForBlock that accepts a non-const pointer to a MBB so that it can force normalizing this MBB's successors' weights.
5. Update several uses of getSumForBlock() by eliminating the once needed weight scale.

Differential Revision: http://reviews.llvm.org/D11442

llvm-svn: 244154
2015-08-05 22:01:20 +00:00
Sanjay Patel 924879ad2c wrap OptSize and MinSize attributes for easier and consistent access (NFCI)
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).

Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.

This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.

Differential Revision: http://reviews.llvm.org/D11734

llvm-svn: 243994
2015-08-04 15:49:57 +00:00
Cong Hou 0881fc1198 Test commit.
This is a test commit (one blank line deleted).

llvm-svn: 242308
2015-07-15 17:58:15 +00:00
Alexander Kornienko f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Chandler Carruth 26d3017b8e [MBP] Spell the conditions the same way through out this if statement.
NFC.

llvm-svn: 235009
2015-04-15 13:39:42 +00:00
Chandler Carruth cfb2b9d755 [MBP] Sink a comment into the if block to which it pertains. This makes
the content of the comment make much more sense.

llvm-svn: 235007
2015-04-15 13:26:41 +00:00
Chandler Carruth 9a512a48b2 [MBP] Fix a really misleading typo in a comment.
llvm-svn: 235006
2015-04-15 13:19:54 +00:00
Benjamin Kramer 799003bf8c Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.
llvm-svn: 232998
2015-03-23 19:32:43 +00:00
Daniel Jasper 214997c63b [MBP] Don't outline short optional branches
With the option -outline-optional-branches, LLVM will place optional
branches out of line (more details on r231230).

With this patch, this is not done for short optional branches. A short
optional branch is a branch containing a single block with an
instruction count below a certain threshold (defaulting to 3). Still
everything is guarded under -outline-optional-branches).

Outlining a short branch can't significantly improve code locality. It
can however decrease performance because of the additional jmp and in
cases where the optional branch is hot. This fixes a compile time
regression I have observed in a benchmark.

Review: http://reviews.llvm.org/D8108
llvm-svn: 232802
2015-03-20 10:00:37 +00:00
Chandler Carruth 7a715dae05 [MBP] Use range based for-loops throughout this code. Several had
already been added and the inconsistency made choosing names and
changing code more annoying. Plus, wow are they better for this code!

llvm-svn: 231347
2015-03-05 03:19:05 +00:00
Chandler Carruth 2fc3fe1282 [MBP] NFC, run clang-format over this code and tweak things to make the
result reasonable.

This code predated clang-format and so there was a reasonable amount of
crufty formatting that had accumulated. This should ensure that neither
myself nor others end up with formatting-only changes sneaking into
other fixes.

llvm-svn: 231341
2015-03-05 02:35:31 +00:00
Chandler Carruth d0dced58ab [MBP] This is no longer 'block-placement2'. ;] The old variants are long
gone, update this code to reflect that.

llvm-svn: 231340
2015-03-05 02:28:25 +00:00
Chandler Carruth af7e99f2f4 [MBP] Revert r231238 which attempted to fix a nasty bug where MBP is
just arbitrarily interleaving unrelated control flows once they get
moved "out-of-line" (both outside of natural CFG ordering and with
diamonds that cannot be fully laid out by chaining fallthrough edges).

This easy solution doesn't work in practice, and it isn't just a small
bug. It looks like a very different strategy will be required. I'm
working on that now, and it'll again go behind some flag so that
everyone can experiment and make sure it is working well for them.

llvm-svn: 231332
2015-03-05 01:07:03 +00:00
Chandler Carruth 9a53fbe243 [MBP] Fix a really horrible bug in MachineBlockPlacement, but behind
a flag for now.

First off, thanks to Daniel Jasper for really pointing out the issue
here. It's been here forever (at least, I think it was there when
I first wrote this code) without getting really noticed or fixed.

The key problem is what happens when two reasonably common patterns
happen at the same time: we outline multiple cold regions of code, and
those regions in turn have diamonds or other CFGs for which we can't
just topologically lay them out. Consider some C code that looks like:

  if (a1()) { if (b1()) c1(); else d1(); f1(); }
  if (a2()) { if (b2()) c2(); else d2(); f2(); }
  done();

Now consider the case where a1() and a2() are unlikely to be true. In
that case, we might lay out the first part of the function like:

  a1, a2, done;

And then we will be out of successors in which to build the chain. We go
to find the best block to continue the chain with, which is perfectly
reasonable here, and find "b1" let's say. Laying out successors gets us
to:

  a1, a2, done; b1, c1;

At this point, we will refuse to lay out the successor to c1 (f1)
because there are still un-placed predecessors of f1 and we want to try
to preserve the CFG structure. So we go get the next best block, d1.

... wait for it ...

Except that the next best block *isn't* d1. It is b2! d1 is waaay down
inside these conditionals. It is much less important than b2. Except
that this is exactly what we didn't want. If we keep going we get the
entire set of the rest of the CFG *interleaved*!!!

  a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;

So we clearly need a better strategy here. =] My current favorite
strategy is to actually try to place the block whose predecessor is
closest. This very simply ensures that we unwind these kinds of CFGs the
way that is natural and fitting, and should minimize the number of cache
lines instructions are spread across.

It also happens to be *dead simple*. It's like the datastructure was
specifically set up for this use case or something. We only push blocks
onto the work list when the last predecessor for them is placed into the
chain. So the back of the worklist *is* the nearest next block.

Unfortunately, a change like this is going to cause *soooo* many
benchmarks to swing wildly. So for now I'm adding this under a flag so
that we and others can validate that this is fixing the problems
described, that it seems possible to enable, and hopefully that it fixes
more of our problems long term.

llvm-svn: 231238
2015-03-04 12:18:08 +00:00
Daniel Jasper 471e856f49 Add a flag to experiment with outlining optional branches.
In a CFG with the edges A->B->C and A->C, B is an optional branch.

LLVM's default behavior is to lay the blocks out naturally, i.e. A, B,
C, in order to improve code locality and fallthroughs. However, if a
function contains many of those optional branches only a few of which
are taken, this leads to a lot of unnecessary icache misses. Moving B
out of line can work around this.

Review: http://reviews.llvm.org/D7719
llvm-svn: 231230
2015-03-04 11:05:34 +00:00
Daniel Jasper ed9eb7209e NFC: Use range-based for loops and more consistent naming.
No functional changes intended.

(I plan on doing some modifications to this function and would like to
have as few unrelated changes as possible in the patch)

llvm-svn: 229649
2015-02-18 08:19:16 +00:00
Daniel Jasper 4d7b04384e Remove experimental options to control machine block placement.
This reverts r226034. Benchmarking with those flags has not revealed
anything interesting.

llvm-svn: 229648
2015-02-18 08:18:07 +00:00
Duncan P. N. Exon Smith 70eb9c5ae5 CodeGen: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

Also, add `Function::getFnStackAlignment()`, and canonicalize:

getAttributes().getStackAlignment(AttributeSet::FunctionIndex)
  => getFnStackAlignment()

llvm-svn: 229208
2015-02-14 01:44:41 +00:00
Chandler Carruth e3288147f0 [MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement.
Some benchmarks have shown that this could lead to a potential
performance benefit, and so adding some flags to try to help measure the
difference.

A possible explanation. In diamond-shaped CFGs (A followed by either
B or C both followed by D), putting B and C both in between A and
D leads to the code being less dense than it could be. Always either
B or C have to be skipped increasing the chance of cache misses etc.
Moving either B or C to after D might be beneficial on average.

In the long run, but we should probably do a better job of analyzing the
basic block and branch probabilities to move the correct one of B or
C to after D. But even if we don't use this in the long run, it is
a good baseline for benchmarking.

Original patch authored by Daniel Jasper with test tweaks and a second
flag added by me.

Differential Revision: http://reviews.llvm.org/D6969

llvm-svn: 226034
2015-01-14 20:19:29 +00:00
Hal Finkel 5772566ed6 [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference
The existing code provided for specifying a global loop alignment preference.
However, the preferred loop alignment might depend on the loop itself. For
recent POWER cores, loops between 5 and 8 instructions should have 32-byte
alignment (while the others are better with 16-byte alignment) so that the
entire loop will fit in one i-cache line.

To support this, getPrefLoopAlignment has been made virtual, and can be
provided with an optional MachineLoop* so the target can inspect the loop
before answering the query. The default behavior, as before, is to return the
value set with setPrefLoopAlignment. MachineBlockPlacement now queries the
target for each loop instead of only once per function. There should be no
functional change for other targets.

llvm-svn: 225117
2015-01-03 17:58:24 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Eric Christopher fc6de428c8 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Eric Christopher d913448b38 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Alp Toker e69170a110 Revert "Introduce a string_ostream string builder facilty"
Temporarily back out commits r211749, r211752 and r211754.

llvm-svn: 211814
2014-06-26 22:52:05 +00:00
Alp Toker 614717388c Introduce a string_ostream string builder facilty
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.

small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.

This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.

The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.

llvm-svn: 211749
2014-06-26 00:00:48 +00:00
Chandler Carruth 1b9dde087e [Modules] Remove potential ODR violations by sinking the DEBUG_TYPE
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.

Other sub-trees will follow.

llvm-svn: 206837
2014-04-22 02:02:50 +00:00
Craig Topper c0196b1b40 [C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr.
llvm-svn: 206142
2014-04-14 00:51:57 +00:00
Paul Robinson 7c99ec5b99 Disable each MachineFunctionPass for 'optnone' functions, unless that
pass normally runs at optimization level None, or is part of the
register allocation pipeline.

llvm-svn: 205228
2014-03-31 17:43:35 +00:00
Craig Topper 4584cd54e3 [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 203220
2014-03-07 09:26:03 +00:00
Benjamin Kramer b6d0bd48bd [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Benjamin Kramer 3a377bce4e Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate.
No intended functionality change.

llvm-svn: 202588
2014-03-01 11:47:00 +00:00
Nico Weber 7408c7066a Add a LLVM_DUMP_METHOD macro.
The motivation is to mark dump methods as used in debug builds so that they can
be called from lldb, but to not do so in release builds so that they can be
dead-stripped.

There's lots of potential follow-up work suggested in the thread
"Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev,
but everyone seems to agreen on this subset.

Macro name chosen by fair coin toss.

llvm-svn: 198456
2014-01-03 22:53:37 +00:00
Michael Gottesman b78dec8faf [block-freq] Update MachineBlockPlacement and RegAllocGreedy to use the new MachineBlockFrequencyInfo methods.
llvm-svn: 197290
2013-12-14 00:25:45 +00:00
Matt Arsenault 0f5f015bfd Fix gcc warnings.
Unused variable and unused typedef in release build.

llvm-svn: 196947
2013-12-10 18:55:37 +00:00
Matt Arsenault 79d55f5c1f Revert part of GCC warning fix to fix debug build.
The typedef is used inside the DEBUG(), and apparently can't be moved
inside of it.

llvm-svn: 196528
2013-12-05 20:02:18 +00:00
Matt Arsenault c44a3ff638 Fix minor GCC warnings.
Unused typedefs and unused variables.

llvm-svn: 196526
2013-12-05 19:37:36 +00:00
Chandler Carruth 260258b9c0 Output a bit more information in the debug printing for MBP. This was
useful when analyzing parts of zlib's behavior here.

llvm-svn: 195588
2013-11-25 00:43:41 +00:00
Benjamin Kramer c8160d6523 MachineBlockPlacement: Strengthen the source order bias when picking an exit block.
We now only allow breaking source order if the exit block frequency is
significantly higher than the other exit block. The actual bias is
currently under a flag so the best cut-off can be found; the flag
defaults to the old behavior. The idea is to get some benchmark coverage
over different values for the flag and pick the best one.

When we require the new frequency to be at least 20% higher than the old
frequency I see a 5% speedup on zlib's deflate when compressing a random
file on x86_64/westmere. Hal reported a small speedup on Fhourstones on
a BG/Q and no regressions in the test suite.

The test case is the full long_match function from zlib's deflate. I was
reluctant to add it for previous tweaks to branch probabilities because
it's large and potentially fragile, but changed my mind since it's an
important use case and more likely to break with all the current work
going into the PGO infrastructure.

Differential Revision: http://llvm-reviews.chandlerc.com/D2202

llvm-svn: 195265
2013-11-20 19:08:44 +00:00
Shuxin Yang 8b8fd2171c Fix a defect in code-layout pass, improving Benchmarks/Olden/em3d/em3d by about 30%
(4.58s vs 3.2s on an oldish Mac Tower). 

  The corresponding src is excerpted bellow. The lopp accounts for about 90% of execution time.
  --------------------
    cat -n test-suite/MultiSource/Benchmarks/Olden/em3d/make_graph.c
     90 
     91         for (k=0; k<j; k++)
     92           if (other_node == cur_node->to_nodes[k]) break;

  The defective layout is sketched bellow, where the two branches need to swap.
  ------------------------------------------------------------------------
      L:
         ...
      if (cond) goto out-of-loop
      goto L

  While this code sequence is defective, I don't understand why it incurs 1/3 of 
execution time. CPU-event-profiling indicates the poor laoyout dose not increase
in br-misprediction; it dosen't increase stall cycle at all, and it dosen't 
prevent the CPU detect the loop (i.e. Loop-Stream-Detector seems to be working fine
as well)... 

   The root cause of the problem is that the layout pass calls AnalyzeBranch() 
with basic-block which is not updated to reflect its current layout.

rdar://13966341

llvm-svn: 183174
2013-06-04 01:00:57 +00:00
Nadav Rotem c0adc9fd91 Don't disable block layout when forcing block alignment.
llvm-svn: 179355
2013-04-12 01:24:16 +00:00
Nadav Rotem c3b0f50ac2 Add a flag to align all basic blocks in the function.
When debugging performance regressions we often ask ourselves if the regression
that we see is due to poor isel/sched/ra or due to some micro-architetural
problem.  When comparing two code sequences one good way to rule out front-end
bottlenecks (and other the issues) is to force code alignment. This pass adds
a flag that forces the alignment of all of the basic blocks in the program.

llvm-svn: 179353
2013-04-12 00:48:32 +00:00
Nadav Rotem 6036f581aa Fix a typo
llvm-svn: 178346
2013-03-29 16:34:23 +00:00
Benjamin Kramer 56b31bd9d7 Split TargetLowering into a CodeGen and a SelectionDAG part.
This fixes some of the cycles between libCodeGen and libSelectionDAG. It's still
a complete mess but as long as the edges consist of virtual call it doesn't
cause breakage. BasicTTI did static calls and thus broke some build
configurations.

llvm-svn: 172246
2013-01-11 20:05:37 +00:00
Bill Wendling 698e84fc4f Remove the Function::getFnAttributes method in favor of using the AttributeSet
directly.

This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.

llvm-svn: 171253
2012-12-30 10:32:01 +00:00
Bill Wendling 3d7b0b8ac7 Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future.
llvm-svn: 170502
2012-12-19 07:18:57 +00:00
Chandler Carruth ed0881b2a6 Use the new script to sort the includes of every file under lib.
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.

Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]

llvm-svn: 169131
2012-12-03 16:50:05 +00:00
Bill Wendling c9b22d735a Create enums for the different attributes.
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.

llvm-svn: 165488
2012-10-09 07:45:08 +00:00
Bill Wendling 863bab689a Remove the `hasFnAttr' method from Function.
The hasFnAttr method has been replaced by querying the Attributes explicitly. No
intended functionality change.

llvm-svn: 164725
2012-09-26 21:48:26 +00:00
Duncan Sands 291d47efdf Remove silly dead store. Patch by Ettl Martin.
llvm-svn: 163882
2012-09-14 09:00:11 +00:00
Chandler Carruth 881d0a7966 Add a much more conservative strategy for aligning branch targets.
Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.

As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.

Fixes PR13265.

llvm-svn: 161409
2012-08-07 09:45:24 +00:00
Manman Ren 2b6a0dfd4c Reverse order of the two branches at end of a basic block if it is profitable.
We branch to the successor with higher edge weight first.
Convert from
     je    LBB4_8  --> to outer loop
     jmp   LBB4_14 --> to inner loop
to
     jne   LBB4_14
     jmp   LBB4_8

PR12750
rdar: 11393714

llvm-svn: 161018
2012-07-31 01:11:07 +00:00
Chandler Carruth 9139f44d23 Update a bunch of stale comments that dated from when this folled the
very first (and worst) placement algorithm. These should now more
accurately reflect the reality of the pass.

llvm-svn: 159185
2012-06-26 05:16:37 +00:00
Benjamin Kramer bde9176663 Fix typos found by http://github.com/lyda/misspell-check
llvm-svn: 157885
2012-06-02 10:20:22 +00:00
Chandler Carruth 8c0b41d656 Add a somewhat hacky heuristic to do something different from whole-loop
rotation. When there is a loop backedge which is an unconditional
branch, we will end up with a branch somewhere no matter what. Try
placing this backedge in a fallthrough position above the loop header as
that will definitely remove at least one branch from the loop iteration,
where whole loop rotation may not.

I haven't seen any benchmarks where this is important but loop-blocks.ll
tests for it, and so this will be covered when I flip the default.

llvm-svn: 154812
2012-04-16 13:33:36 +00:00
Chandler Carruth 8c74c7b1c6 Tweak the loop rotation logic to check whether the loop is naturally
laid out in a form with a fallthrough into the header and a fallthrough
out of the bottom. In that case, leave the loop alone because any
rotation will introduce unnecessary branches. If either side looks like
it will require an explicit branch, then the rotation won't add any, do
it to ensure the branch occurs outside of the loop (if possible) and
maximize the benefit of the fallthrough in the bottom.

llvm-svn: 154806
2012-04-16 09:31:23 +00:00