Simplify the logic of handling sections in BOLT. This change brings more
direct and predictable mapping of BinarySection instances to sections in
the input and output files.
* Only sections from the input binary will have a non-null SectionRef.
When a new section is created as a copy of the input section,
its SectionRef is reset to null.
* RewriteInstance::getOutputSectionName() is removed as the section name
in the output file is now defined by BinarySection::getOutputName().
* Querying BinaryContext for sections by name uses their original name.
E.g., getUniqueSectionByName(".rodata") will return the original
section even if the new .rodata section was created.
* Input file sections (with relocations applied) are emitted via MC with
".bolt.org" prefix. However, their name in the output binary is
unchanged unless a new section with the same name is created.
* New sections are emitted internally with ".bolt.new" prefix if there's
a name conflict with an input file section. Their original name is
preserved in the output file.
* Section header string table is properly populated with section names
that are actually used. Previously, discarded section names were
included as well.
* Fix the problem when dynamic relocations were propagated to a new
section with a name that matched a section in the input binary.
E.g., the new .rodata with jump tables had dynamic relocations from
the original .rodata.
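
The naming rules in the list above can be sketched roughly as follows. This is a minimal model, not BOLT's code: `SectionModel` and `emissionName` are hypothetical names, and the assumption that the prefix is simply prepended to the section name is mine.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a section; this is not BOLT's BinarySection.
struct SectionModel {
  std::string Name; // original name, also the key for lookups by name
  bool FromInput;   // true if the section comes from the input binary
};

// Name used internally while emitting through MC (assumed: prefix + name).
std::string emissionName(const SectionModel &S,
                         const std::set<std::string> &InputNames) {
  // Input sections are always emitted with the ".bolt.org" prefix.
  if (S.FromInput)
    return ".bolt.org" + S.Name;
  // New sections get ".bolt.new" only when they clash with an input section.
  if (InputNames.count(S.Name))
    return ".bolt.new" + S.Name;
  return S.Name;
}
```

The sketch only covers the internal emission name; the name written to the output file is decided separately, as described above.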
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D135494
This adds a round of checks to memory references, looking for
incorrect references to jump table objects. Fix them by replacing the
jump table reference with another object reference + offset.
This solves bugs related to regular data references in code
accidentally being bound to a jump table, and this reference being
updated to a new (incorrect) location because we moved this jump
table.
Fixes #55004
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D134098
I went over the output of the following mess of a command:
`(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z | parallel --xargs -0 cat | aspell list --mode=none --ignore-case | grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n | grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)`
and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D130824
This does *not* link with libLLVM, but with static archives instead. Not
super-great, but at least the build works, which is probably better than
failing.
Related to #57551
Differential Revision: https://reviews.llvm.org/D134434
After BOLT's merge to LLVM, there are two (almost identical) versions of the
code layout algorithm. The diff unifies the implementations by keeping the one
in LLVM.
There are mild changes in the resulting block orders. I tested the changes
extensively both on the clang binary and on prod services. I did not see
statistically significant differences on average.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D129895
For exception handling, LSDA call sites have to be emitted for each
fragment individually. With this patch, call sites and respective LSDA
symbols are generated and associated with each fragment of their
function, such that they can be used by the emitter.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D132052
To enable split strategies that require a view of the entire CFG (e.g., to
estimate the cost of a path from the entry block), with this patch, all blocks of
a function are passed to `SplitStrategy::fragment`. Because this might
move non-outlineable blocks into a split fragment, these blocks are
moved back into the main fragment after fragmenting. This also gives
strategies the option to specify whether empty fragments should be
kept or removed.
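
The post-processing step described above could look roughly like this. It is a sketch under stated assumptions: `BlockModel`, its fields, and `restoreNonOutlineable` are hypothetical, and fragment 0 is assumed to be the main fragment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical block model; not BOLT's BinaryBasicBlock.
struct BlockModel {
  bool CanOutline;   // false for blocks that must stay in the main fragment
  unsigned Fragment; // assumed convention: 0 is the main fragment
};

// After a strategy has assigned fragments to all blocks, move every
// non-outlineable block back into the main fragment.
void restoreNonOutlineable(std::vector<BlockModel> &Blocks) {
  for (BlockModel &B : Blocks)
    if (!B.CanOutline)
      B.Fragment = 0;
}
```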
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D132423
ICP has two modes: jump table promotion and indirect call promotion.
The selection is based on whether an instruction has a jump table or not.
An instruction with unknown control flow doesn't have a jump table and will
fall under indirect call promotion policy which might be incorrect/unsafe
(if an instruction is not a tail call, i.e. has local jump targets).
Prevent ICP for functions containing instructions with unknown control flow.
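
A minimal sketch of such a guard, assuming a simplified per-instruction classification. `FlowKind` and `isSafeForICP` are hypothetical names and do not correspond to BOLT's API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical classification of an instruction's control flow.
enum class FlowKind { Regular, JumpTable, IndirectCall, Unknown };

// A function is considered for ICP only if none of its instructions has
// unknown control flow.
bool isSafeForICP(const std::vector<FlowKind> &Instructions) {
  return std::none_of(Instructions.begin(), Instructions.end(),
                      [](FlowKind K) { return K == FlowKind::Unknown; });
}
```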
Follow-up to https://reviews.llvm.org/D128870.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D132882
This introduces an abstract base class for splitting strategies to
document the interface a strategy needs to implement, and also to avoid
code bloat of the `splitFunction` method.
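
As an illustration of the idea, a strategy interface might be shaped like the sketch below. Every name here (`SplitStrategyModel`, `ColdByCountStrategy`, `splitFunctionModel`, `BlockModel`) is hypothetical; the actual interface in BOLT may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical block model used only for this sketch.
struct BlockModel {
  uint64_t ExecCount = 0;
  unsigned Fragment = 0; // assumed convention: 0 is the main fragment
};

// Abstract interface that every splitting strategy implements.
class SplitStrategyModel {
public:
  virtual ~SplitStrategyModel() = default;
  // Assign a fragment number to each block.
  virtual void fragment(std::vector<BlockModel> &Blocks) = 0;
};

// Example strategy: blocks that never executed go to fragment 1.
class ColdByCountStrategy final : public SplitStrategyModel {
public:
  void fragment(std::vector<BlockModel> &Blocks) override {
    for (BlockModel &B : Blocks)
      B.Fragment = (B.ExecCount == 0) ? 1 : 0;
  }
};

// The driver stays small because the policy lives in the strategy object.
void splitFunctionModel(std::vector<BlockModel> &Blocks,
                        SplitStrategyModel &Strategy) {
  Strategy.fragment(Blocks);
}
```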
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D132054
This changes `FunctionFragment` from being used as a temporary proxy
object to access basic block ranges to a heap-allocated object that can
store fragment-specific information.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132050
A const-qualified reference to function layout allows accessing
non-const qualified basic blocks on a const-qualified function. This
patch adds or removes const-qualifiers where necessary to indicate where
basic blocks are used in a non-const manner.
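
The underlying C++ issue is that a const container of raw pointers still hands out pointers to mutable objects. A sketch of one way to keep const-ness intact, using hypothetical names (`LayoutModel`, `BlockModel`, `getBlock`) rather than BOLT's classes:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

struct BlockModel {
  int ExecCount = 0;
};

class LayoutModel {
  std::vector<BlockModel *> Blocks;

public:
  void add(BlockModel *B) { Blocks.push_back(B); }

  // Non-const access: callers may modify the block.
  BlockModel *getBlock(std::size_t I) { return Blocks[I]; }

  // Const access: the returned pointer does not allow modification.
  const BlockModel *getBlock(std::size_t I) const { return Blocks[I]; }
};
```

With this pair of overloads, code holding a `const LayoutModel &` cannot mutate blocks by accident; it has to ask for a non-const layout explicitly.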
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132049
This patch adds exception handling trampolines when a function is split
into more than two fragments. Trampolines are tracked per-fragment, such
that they can be removed if splitting is reversed.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132048
Use isSplit() instead of isCold() when building the call graph and
update parameter names to reflect this.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132047
This adds a strategy to split functions into a random number of
fragments at randomly chosen split points.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130647
This adds a function splitting strategy that splits each outlineable
basic block into its own fragment. This is exposed through a new command
line option `--split-strategy`.
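
A rough sketch of what such a strategy computes, assuming fragment 0 is the main fragment and new fragments are numbered in block order. `assignOwnFragments` is a hypothetical helper, not a BOLT function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns, for each block, the fragment it is placed in. Blocks that
// cannot be outlined stay in fragment 0; every outlineable block gets
// a fresh fragment number.
std::vector<unsigned> assignOwnFragments(const std::vector<bool> &CanOutline) {
  std::vector<unsigned> Fragment(CanOutline.size(), 0);
  unsigned Next = 1;
  for (std::size_t I = 0; I < CanOutline.size(); ++I)
    if (CanOutline[I])
      Fragment[I] = Next++;
  return Fragment;
}
```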
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D129827
This patch adds support to generate any number of sections that are
assigned to fragments of functions that are split more than two-way.
With this, a function's *nth* split fragment goes into section
`.text.cold.n`.
This also changes `FunctionLayout::erase` to make sure that there are
no empty fragments at the end of the function. This sometimes happens
when blocks are erased from the function. To avoid creating symbols
pointing to these fragments, they need to be removed.
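
Both points can be sketched in a few lines. The helper names are hypothetical, the section name takes the `.text.cold.n` pattern above literally (the numbering base is an assumption), and fragments are modeled as plain vectors of block ids.

```cpp
#include <cassert>
#include <string>
#include <vector>

using FragmentModel = std::vector<int>; // block ids in one fragment

// Section name for the nth split fragment, following the pattern above.
std::string splitSectionName(unsigned N) {
  return ".text.cold." + std::to_string(N);
}

// Drop empty fragments at the end of the function so that no symbol
// ends up pointing at them. The main fragment (index 0) is always kept.
void dropTrailingEmptyFragments(std::vector<FragmentModel> &Fragments) {
  while (Fragments.size() > 1 && Fragments.back().empty())
    Fragments.pop_back();
}
```

Note that only trailing empty fragments are removed in this sketch; an empty fragment in the middle is left in place.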
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130521
This adds basic fragment awareness in the exception handling passes and
generates the necessary symbols for fragments.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130520
To track whether a function's new layout is different from its old
layout when updating it, the old layout would be kept around in memory
indefinitely (if the new layout is different). This was used only for
debugging/logging purposes. This patch forces the caller of function
layout's update method to copy the old layout into a temporary if they
need it by removing the old layout fields.
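
From the caller's side, the new contract might look like the following sketch. `LayoutModel` and `updateAndReportChange` are hypothetical; the point is only that the old layout lives in a short-lived temporary owned by the caller.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical layout model that no longer remembers its previous state.
struct LayoutModel {
  std::vector<int> Blocks;
  void update(std::vector<int> NewBlocks) { Blocks = std::move(NewBlocks); }
};

// A caller that wants to know whether the layout changed copies the old
// layout first; the copy is destroyed when the function returns.
bool updateAndReportChange(LayoutModel &Layout, std::vector<int> NewBlocks) {
  const std::vector<int> Old = Layout.Blocks;
  Layout.update(std::move(NewBlocks));
  return Old != Layout.Blocks;
}
```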
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D131413
This patch refactors BAT to be testable as a library, so we
can have open-source tests on it. This further fixes an issue with
basic blocks that lack a valid input offset, making BAT omit those
when writing translation tables.
Test Plan: new testcases added, new testing tool added (llvm-bat-dump)
Differential Revision: https://reviews.llvm.org/D129382
Rather than iterating over the whole function from the start until no
internal calls are found, process each block only once and continue
processing after splitting. This version of the function also does not
appear to invalidate iterators from within the loop.
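
One way to get a single pass with stable iterators is sketched below. It assumes that a block is split right after an internal call and uses a `std::list`, whose iterators stay valid on insertion; all names are hypothetical and the real pass works on BOLT's own data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <utility>
#include <vector>

enum class Inst { Other, InternalCall };
using BlockModel = std::vector<Inst>;

// Visit each block once. When an internal call is found in the middle of
// a block, the remainder becomes a new block placed right after it; the
// outer loop then continues with that new block.
void splitAfterInternalCalls(std::list<BlockModel> &Blocks) {
  for (auto It = Blocks.begin(); It != Blocks.end(); ++It) {
    BlockModel &B = *It;
    for (std::size_t I = 0; I + 1 < B.size(); ++I) {
      if (B[I] != Inst::InternalCall)
        continue;
      BlockModel Tail(B.begin() + I + 1, B.end());
      B.erase(B.begin() + I + 1, B.end());
      Blocks.insert(std::next(It), std::move(Tail)); // It stays valid
      break; // the tail is handled on the next outer iteration
    }
  }
}
```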
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D130436
This patch adds a dedicated class to keep track of each function's
layout. It also lays the groundwork for splitting functions into
multiple fragments (as opposed to a strict hot/cold split).
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D129518
As we are moving towards support for multiple fragments, loops that
iterate over all basic blocks of a function, but do not depend on the
order of basic blocks in the final layout, should iterate over binary
functions directly, rather than the layout.
Eventually, all loops using the layout list should either iterate over
the function, or be aware of multiple layouts. This patch replaces
references to binary function's block layout with the binary function
itself where only little code changes are necessary.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D129585
There are two assumptions regarding jump tables:
(a) A jump table is accessed by only one fragment, say, Parent
(b) All entries target instructions in Parent
For (a), BOLT stores jump table entries as offsets relative to Parent.
For (b), BOLT treats jump table entries that target somewhere outside
Parent as INVALID_OFFSET, including fragments of the same split function.
In this update, we extend (a) and (b) to include fragments of the same
split function. For (a), we store jump table entries as absolute offsets
instead. In addition, the jump table stores all fragments that access
it. A fragment uses this information to create labels only for the jump
table entries that target that fragment.
For (b), using absolute offsets allows jump table entries to target
fragments of the same split function, i.e., it extends support to split
jump tables. This can be done using relocations (fragment start/size)
and fragment detection heuristics (e.g., using symbol name patterns for
non-stripped binaries).
For jump table targets that can only be reached by one fragment, we
mark them as local labels; otherwise, they become secondary function
entries of the target fragment.
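
The effect of storing absolute offsets can be illustrated with a small sketch: given the address range of each fragment, a fragment can pick out exactly the entries that point into it. `FragmentRange` and `entriesTargeting` are hypothetical names, and the addresses in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical address range of one fragment of a split function.
struct FragmentRange {
  uint64_t Start;
  uint64_t Size;
  bool contains(uint64_t Addr) const {
    return Addr >= Start && Addr - Start < Size;
  }
};

// Jump table entries (absolute addresses) that land inside the given
// fragment; only these would receive a label in that fragment.
std::vector<uint64_t> entriesTargeting(const std::vector<uint64_t> &Entries,
                                       const FragmentRange &Fragment) {
  std::vector<uint64_t> Result;
  for (uint64_t Addr : Entries)
    if (Fragment.contains(Addr))
      Result.push_back(Addr);
  return Result;
}
```

For example, with a parent at `0x1000` (size `0x100`) and a cold fragment at `0x2000` (size `0x80`), the entry `0x2004` is attributed to the cold fragment, while `0x1010` and `0x1020` stay with the parent.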
Test Plan
```
ninja check-bolt
```
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D128474
The gold linker veneers are written between functions without symbols,
so we need to handle them specially in BOLT.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D129260
Add -experimental-shrink-wrapping flag to control when we
want to move callee-saved registers even when addresses of the stack
frame are captured and used in pointer arithmetic, making it more
challenging to do alias analysis to prove that we do not access
optimized stack positions. This alias analysis is not yet implemented,
hence, it is experimental. In practice, though, no compiler would emit
code to do pointer arithmetic to access a saved callee-saved register
unless there is a memory bug or we are failing to identify a
callee-saved reg, so I'm not sure how useful it would be to formally
prove that.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D126115
Change shrink-wrapping to try a priority list of save
positions, instead of trying the best one and giving up if it doesn't
work. This also increases coverage.
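
The change in search policy amounts to the loop below. This is a sketch with hypothetical names: positions are plain integers ordered from best to worst, and `IsValid` stands in for whatever legality checks shrink wrapping performs.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Try candidate save positions in priority order and return the first one
// that passes validation, instead of giving up when the best one fails.
std::optional<int> pickSavePosition(const std::vector<int> &Candidates,
                                    const std::function<bool(int)> &IsValid) {
  for (int Position : Candidates)
    if (IsValid(Position))
      return Position;
  return std::nullopt;
}
```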
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D126114
Add the option to run -equalize-bb-counts before shrink
wrapping to avoid unnecessarily optimizing some CFGs where profile is
inaccurate but we can prove two blocks have the same frequency.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D126113
Change how function score is calculated and provide more
detailed statistics when reporting back frame optimizer and shrink
wrapping results. In these new statistics, we provide dynamic coverage
numbers. The main metric for shrink wrapping is the number of executed
stores that were saved because of shrink wrapping (push instructions
that were either entirely moved away from the hot block or converted
to a stack adjustment instruction). There is still a number of reduced
load instructions (pop) that we are not counting at the moment. Also
update alloc combiner to report dynamic numbers, as well as frame
optimizer.
For debugging purposes, we also include a list of top 10 functions
optimized by shrink wrapping. These changes are aimed at better
understanding the impact of shrink wrapping in a given binary.
We also remove an assertion in dataflow analysis so that it does not
choke on empty functions (which makes no sense).
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D126111
There is a post-processing step in ext-tsp block reordering that merges some
blocks into chains. This makes it possible to maintain the original block order
in the absence of profile data and can be beneficial for code size (when
fallthroughs are merged).
In the earlier version we could merge hot and cold (with zero execution count)
chains that were later split by SplitFunction.cpp (when split-all-cold=1). The
diff eliminates the redundant merging.
It is unlikely the change will affect the performance of a binary in a
measurable way, as it mostly operates on cold basic blocks. However, after
the diff the impact of split-all-cold is almost negligible and we can avoid the
extra function splitting.
Measuring on the clang binary (negative is good, positive is a regression):
**clang12**
benchmark1: `0.0253`
benchmark2: `-0.1843`
benchmark3: `0.3234`
benchmark4: `0.0333`
**clang10**
benchmark1: `-0.2517`
benchmark2: `-0.3703`
benchmark3: `-0.1186`
benchmark4: `-0.3822`
**clang7**
benchmark1: `0.2526`
benchmark2: `0.0500`
benchmark3: `0.3024`
benchmark4: `-0.0489`
**Overall**: `-0.0671 ± 0.1172` (insignificant)
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D129397
When the SplitFunctions pass adds trampoline code for exception landing
pads (limited to shared objects), it may increase the size of the hot
fragment making it larger than the whole function pre-split. When this
happens, the pass reverts the splitting action by restoring the original
block order and marking all blocks hot.
However, if createEHTrampolines() added new blocks to the CFG and
modified invoke instructions, simply restoring the original block layout
will not suffice as the new CFG has more blocks.
For proper backout of the split, modify the original layout by merging
in trampoline blocks immediately before their matching targets. As a
result, the number of blocks increases, but the number of instructions
and the function size remains the same as pre-split.
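
The backout step can be sketched as follows, with blocks modeled as integer ids. `mergeTrampolines` is a hypothetical helper, and the sketch assumes at most one trampoline per target block.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Rebuild the pre-split layout, placing each trampoline block immediately
// before the block it targets. The map goes from target id to trampoline id.
std::vector<int> mergeTrampolines(const std::vector<int> &OriginalLayout,
                                  const std::map<int, int> &TrampolineForTarget) {
  std::vector<int> Merged;
  Merged.reserve(OriginalLayout.size() + TrampolineForTarget.size());
  for (int Block : OriginalLayout) {
    auto It = TrampolineForTarget.find(Block);
    if (It != TrampolineForTarget.end())
      Merged.push_back(It->second);
    Merged.push_back(Block);
  }
  return Merged;
}
```

The merged layout has more blocks than the original one, which is the reason a block-count assertion on layout updates is useful.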
Add an assertion for the number of blocks when updating a function
layout.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D128696
For test purposes, we want to split functions at a random split point
to be able to test different layouts without relying on the profile.
This patch introduces an option that randomly chooses a split point
to partition blocks of a function into hot and cold regions.
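
A minimal sketch of choosing such a split point. `randomSplitPoint` is a hypothetical helper; keeping at least one block on each side of the split is an assumption of this sketch, not something the patch description states.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Returns the index of the first cold block. Blocks [0, result) are hot
// and blocks [result, NumBlocks) are cold. Requires at least two blocks.
std::size_t randomSplitPoint(std::size_t NumBlocks, std::mt19937 &Rng) {
  assert(NumBlocks >= 2 && "need at least two blocks to split");
  std::uniform_int_distribution<std::size_t> Dist(1, NumBlocks - 1);
  return Dist(Rng);
}
```

Seeding the generator with a fixed value makes a test run reproducible while still exercising layouts that do not depend on the profile.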
Reviewed By: Amir, yota9
Differential Revision: https://reviews.llvm.org/D128773
This reverts commit 425dda76e9.
This commit is currently causing BOLT to crash in one of our
binaries and needs a bit more checking to make sure it is safe
to land.
The gold linker veneers are written between functions without symbols,
so we need to handle them specially in BOLT.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D128082
ICP peel for inline mode only makes sense for calls, not jump tables.
Plus, add a check that the Target BinaryFunction is found.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D128404