Commit Graph

256 Commits

Author SHA1 Message Date
Amir Ayupov 556efdba85 [BOLT][NFC] Extend debug logging in analyzeJumpTable
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D131918
2022-08-15 20:34:40 -07:00
Kazu Hirata 2febc32c9c Use llvm::erase_if (NFC) 2022-08-13 12:55:48 -07:00
Fangrui Song 53113515cd [BOLT] Use Optional::emplace to avoid move assignment. NFC 2022-08-12 12:51:50 -07:00
Fangrui Song 0972a390b9 LLVM_FALLTHROUGH => [[fallthrough]]. NFC 2022-08-09 04:06:52 +00:00
Thorsten Schütt 0c9258612b [bolt] silence unused variables warnings 2022-08-06 20:52:45 +02:00
Rafael Auler 19eb908e61 [BOLT] Remove always true if statement
Got a warning from GCC when building this.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D131092
2022-08-03 13:11:33 -07:00
Nicolai Hähnle f7872cdce1 CommandLine: add and use cl::SubCommand::get{All,TopLevel}
Prefer using these accessors to access the special sub-commands
corresponding to the top-level (no subcommand) and all sub-commands.

This is a preparatory step towards removing the use of ManagedStatic:
with a subsequent change, these global instances will be moved to
be regular function-scope statics.

It is split up to give downstream projects a (albeit short) window in
which they can switch to using the accessors in a forward-compatible
way.

Differential Revision: https://reviews.llvm.org/D129118
2022-08-02 23:49:16 +02:00
David Blaikie 7651522b78 Fold assert-used variable into assert
Fixes #56724
2022-08-01 21:57:11 +00:00
Alexander Yermolovich dd29b3c542 [BOLT][DWARF] Fix handling of multiple DW_OP_addrx in an expression
We were not handling correclty multiple DW_OP_addrx in the location expression.
This was exposed by clang-15 build in release mode with debug information.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D130812
2022-08-01 14:38:47 -07:00
Kazu Hirata bf6021709a Use drop_begin (NFC) 2022-07-31 15:17:09 -07:00
Kazu Hirata ce3b687b88 [BOLT] Remove redundaunt string initialization (NFC)
Identified with readability-redundant-string-init.
2022-07-31 15:17:05 -07:00
Kazu Hirata 1bf531a5d0 [BOLT] Use boolean literals (NFC)
Identified with modernize-use-bool-literals.
2022-07-31 15:17:02 -07:00
Amir Ayupov 468d4f6d18 Revert "[BOLT] Ignore functions accessing false positive jump tables"
This diff uncovers an ASAN leak in getOrCreateJumpTable:
```
Indirect leak of 264 byte(s) in 1 object(s) allocated from:
    #1 0x4f6e48c in llvm::bolt::BinaryContext::getOrCreateJumpTable ...
```
The removal of an assertion needs to be accompanied by proper deallocation of
a `JumpTable` object for which `analyzeJumpTable` was unsuccessful.

This reverts commit 52cd00cabf.
2022-07-30 10:39:46 -07:00
Kazu Hirata 12b29900a1 Use any_of (NFC) 2022-07-30 10:35:56 -07:00
Kazu Hirata f081ec20b5 [bolt] Remove redundaunt virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-30 10:35:51 -07:00
Kazu Hirata b498a8991e [bolt] Remove redundaunt control-flow statements (NFC)
Identified with readability-redundant-control-flow.
2022-07-30 10:35:49 -07:00
Kazu Hirata 60db8d9b4e Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2022-07-30 10:35:48 -07:00
Rafael Auler fc0ced73dc Add BAT testing framework
This patch refactors BAT to be testable as a library, so we
can have open-source tests on it. This further fixes an issue with
basic blocks that lack a valid input offset, making BAT omit those
when writing translation tables.

Test Plan: new testcases added, new testing tool added (llvm-bat-dump)

Differential Revision: https://reviews.llvm.org/D129382
2022-07-29 14:55:04 -07:00
Fangrui Song 7430894a65 Replace Optional::hasValue with has_value or operator bool. NFC 2022-07-29 10:57:25 -07:00
Fangrui Song 999514bb9a [bolt] Replace Optional::getValue with value or operator*. NFC 2022-07-29 01:15:24 -07:00
Huan Nguyen 52cd00cabf [BOLT] Ignore functions accessing false positive jump tables
Disassembly and branch target analysis are not decoupled, so any
analysis that depends on disassembly may not operate properly.

In specific, analyzeJumpTable uses instruction bounds check property.
A jump table was analyzed twice: (a) during disassembly, and (b) after
disassembly, so there are potentially some mismatched results.

In this update, functions that access JTs which fail the second check
will be marked as ignored.

Test Plan:
```
ninja check-bolt
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130431
2022-07-28 23:22:17 -07:00
Huan Nguyen ccabbfff86 [BOLT] Remove --allow-stripped option
AllowStripped has not been used in BOLT.
This option is replaced by actively detecting stripped binary.

Test Plan:

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130036
2022-07-28 23:15:53 -07:00
Huan Nguyen 986362d4a3 [BOLT] Add BinaryContext::IsStripped
Determine stripped status of a binary based on .symtab

Test Plan:
```
ninja check-bolt
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130034
2022-07-28 23:11:03 -07:00
Amir Ayupov 77c1977384 [BOLT] Support files with no symbols
`LastSymbol` handling in `discoverFileObjects` assumes a non-zero number of
symbols in an object file. It's not the case for broken_dynsym.test added in
D130073, and potentially other stripped binaries.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D130544
2022-07-26 00:07:59 -07:00
Fabian Parzefall 83882606db [BOLT] Process each block only once in fixCFGForPIC
Rather than iterating over the whole function from the start until no
internal calls are found, process each block only once and continue
processing after splitting. This version of the function also does not
seemingly invalidate iterators from within the loop.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D130436
2022-07-25 15:06:24 -07:00
Huan Nguyen 8eb68d92d4 [BOLT] Handle broken .dynsym in stripped binaries
Strip tools cause a few symbols in .dynsym to have bad section index.
This update safely keeps such broken symbols intact.

Test Plan:
```
ninja check-bolt
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130073
2022-07-22 11:24:09 -07:00
Maksim Panchenko 661577b5f4 [BOLT] Add support for the latest perf tool
The latest perf tool can return non-empty buffer when executing
buildid-list command, even when perf.data was recorded with -B flag.
Some binaries will be listed without the ID, while others may have a
recorded ID. Allow invalid entires on the input, while checking the
valid ones for the match.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130223
2022-07-22 07:56:15 -07:00
Sriraman Tallam 116ee23f4c [bolt] std::atomic_uint64_t to std::atomic<uint64_t>
Differential Revision: https://reviews.llvm.org/D129903
2022-07-19 16:09:11 -07:00
Fabian Parzefall 8477bc6761 [BOLT] Add function layout class
This patch adds a dedicated class to keep track of each function's
layout. It also lays the groundwork for splitting functions into
multiple fragments (as opposed to a strict hot/cold split).

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D129518
2022-07-16 17:23:24 -07:00
Huan Nguyen ae563c9146 [BOLT] Support split landing pad
We previously support split jump table, where some jump table entries
target different fragments of same function. In this fix, we provide
support for another type of intra-indirect transfer: landing pad.

When C++ exception handling is used, compiler emits .gcc_except_table
that describes the location of catch block (landing pad) for specific
range that potentially invokes a throw(). Normally landing pads reside
in the function, but with -fsplit-machine-functions, landing pads can
be moved to another fragment. The intuition is, landing pads are rarely
executed, so compiler can move them to .cold section.

This update will mark all fragments that have landing pad to another
fragment as non-simple, and later propagate non-simple to all related
fragments.

This update also includes one manual test case: split-landing-pad.s

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D128561
2022-07-14 18:10:22 -07:00
Fabian Parzefall d55dfeaf32 [BOLT] Replace uses of layout with basic block list
As we are moving towards support for multiple fragments, loops that
iterate over all basic blocks of a function, but do not depend on the
order of basic blocks in the final layout, should iterate over binary
functions directly, rather than the layout.

Eventually, all loops using the layout list should either iterate over
the function, or be aware of multiple layouts. This patch replaces
references to binary function's block layout with the binary function
itself where only little code changes are necessary.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D129585
2022-07-14 13:07:05 -07:00
Huan Nguyen 05523dc32d [BOLT] Support multiple parents for split jump table
There are two assumptions regarding jump table:
(a) It is accessed by only one fragment, say, Parent
(b) All entries target instructions in Parent

For (a), BOLT stores jump table entries as relative offset to Parent.
For (b), BOLT treats jump table entries target somewhere out of Parent
as INVALID_OFFSET, including fragment of same split function.

In this update, we extend (a) and (b) to include fragment of same split
functinon. For (a), we store jump table entries in absolute offset
instead. In addition, jump table will store all fragments that access
it. A fragment uses this information to only create label for jump table
entries that target to that fragment.

For (b), using absolute offset allows jump table entries to target
fragments of same split function, i.e., extend support for split jump
table. This can be done using relocation (fragment start/size) and
fragment detection heuristics (e.g., using symbol name pattern for
non-stripped binaries).

For jump table targets that can only be reached by one fragment, we
mark them as local label; otherwise, they would be the secondary
function entry to the target fragment.

Test Plan
```
ninja check-bolt
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D128474
2022-07-13 23:37:31 -07:00
Vladislav Khmelevsky 35efe1d806 [BOLT][AArch64] Handle gold linker veneers
The gold linker veneers are written between functions without symbols,
so we to handle it specially in BOLT.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D129260
2022-07-13 14:47:22 +03:00
Denis Revunov 7564167885 [BOLT][AArch64] Use all supported CPU features on AArch64
Since we now have +all feature for AArch64 disassembler, we can use it
in BOLT and allow it to disassemble all ARM instructions supported by LLVM.

Reviewed by: rafauler

Differential Revision: https://reviews.llvm.org/D129139
2022-07-12 03:56:04 -04:00
Rafael Auler a3cfdd746e [BOLT] Increase coverage of shrink wrapping [5/5]
Add -experimental-shrink-wrapping flag to control when we
want to move callee-saved registers even when addresses of the stack
frame are captured and used in pointer arithmetic, making it more
challenging to do alias analysis to prove that we do not access
optimized stack positions. This alias analysis is not yet implemented,
hence, it is experimental. In practice, though, no compiler would emit
code to do pointer arithmetic to access a saved callee-saved register
unless there is a memory bug or we are failing to identify a
callee-saved reg, so I'm not sure how useful it would be to formally
prove that.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126115
2022-07-11 17:30:13 -07:00
Rafael Auler 3e5f67f356 [BOLT] Increase coverage of shrink wrapping [4/5]
Change shrink-wrapping to try a priority list of save
positions, instead of trying the best one and giving up if it doesn't
work. This also increases coverage.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126114
2022-07-11 17:30:05 -07:00
Rafael Auler 3332904ad6 [BOLT] Increase coverage of shrink wrapping [3/5]
Add the option to run -equalize-bb-counts before shrink
wrapping to avoid unnecessarily optimizing some CFGs where profile is
inaccurate but we can prove two blocks have the same frequency.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126113
2022-07-11 17:30:00 -07:00
Rafael Auler 3508ced6ea [BOLT] Increase coverage of shrink wrapping [2/5]
Refactor isStackAccess() to reflect updates by D126116. Now we only
handle simple stack accesses and delegate the rest of the cases to
getMemDataSize.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126112
2022-07-11 17:29:54 -07:00
Rafael Auler 42465efd17 [BOLT] Increase coverage of shrink wrapping [1/5]
Change how function score is calculated and provide more
detailed statistics when reporting back frame optimizer and shrink
wrapping results. In this new statistics, we provide dynamic coverage
numbers. The main metric for shrink wrapping is the number of executed
stores that were saved because of shrink wrapping (push instructions
that were either entirely moved away from the hot block or converted
to a stack adjustment instruction). There is still a number of reduced
load instructions (pop) that we are not counting at the moment. Also
update alloc combiner to report dynamic numbers, as well as frame
optimizer.

For debugging purposes, we also include a list of top 10 functions
optimized by shrink wrapping. These changes are aimed at better
understanding the impact of shrink wrapping in a given binary.

We also remove an assertion in dataflow analysis to do not choke on
empty functions (which makes no sense).

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126111
2022-07-11 17:29:22 -07:00
spupyrev 228970f612 Revert "Rebase: [Facebook] Revert "[BOLT] Update dynamic relocations from section relocations""
This reverts commit 76029cc53e.
2022-07-11 09:50:47 -07:00
spupyrev eecd41aa09 Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"
This reverts commit 6d0528636a.
2022-07-11 09:50:47 -07:00
spupyrev 7228371054 [BOLT] Do not merge cold and hot chains of basic blocks
There is a post-processing in ext-tsp block reordering that merges some blocks
into chains. This allows to maintain the original block order in the absense of
profile data and can be beneficial for code size (when fallthroughs are merged).
In the earlier version we could merge hot and cold (with zero execution count)
chains, that later were split by SplitFunction.cpp (when split-all-cold=1). The
diff eliminates the redundant merging.

It is unlikely the change will affect the performance of a binary in a
measurable way, as it is mostly operates with cold basic blocks. However, after
the diff the impact of split-all-cold is almost negligible and we can avoid the
extra function splitting.

Measuring on the clang binary (negative is good, positive is a regression):
**clang12**
benchmark1:  `0.0253`
benchmark2:  `-0.1843`
benchmark3:  `0.3234`
benchmark4:  `0.0333`

**clang10**
benchmark1  `-0.2517`
benchmark2  `-0.3703`
benchmark3  `-0.1186`
benchmark4  `-0.3822`

**clang7**
benchmark1  `0.2526`
benchmark2  `0.0500`
benchmark3  `0.3024`
benchmark4  `-0.0489`

**Overall**: `-0.0671 ± 0.1172` (insignificant)

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D129397
2022-07-11 09:31:52 -07:00
Maksim Panchenko 76029cc53e Rebase: [Facebook] Revert "[BOLT] Update dynamic relocations from section relocations"
Summary:
This reverts commit 729d29e167.

Needed as a workaround for T112872562.

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D35230076
https://phabricator.intern.facebook.com/D35681740

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: spupyrev

Differential Revision: https://phabricator.intern.facebook.com/D37098481
2022-07-11 09:31:52 -07:00
Rafael Auler 6d0528636a Rebase: [Facebook] [MC] Introduce NeverAlign fragment type
Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.

LLVM: https://reviews.llvm.org/D97982

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: phabricatorlinter

Differential Revision: https://phabricator.intern.facebook.com/D31361547
2022-07-11 09:31:52 -07:00
Alexander Yermolovich e159abdb04 [BOLT][DWARF] Support mix mode DWARF
Added support for mixing monolithic DWARF5 with legacy DWARF, and monolithic legacy and DWARF5 split dwarf.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D128232
2022-06-30 16:53:15 -07:00
Amir Ayupov 66b01a8934 [BOLT] Fix getDynoStats to handle BCs with no functions
Address fuzzer crash

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D120696
2022-06-30 01:18:45 -07:00
Amir Ayupov cb75faf40c [X86][BOLT] Use getOperandType to determine memory access size
Generate INSTRINFO_OPERAND_TYPE table in X86GenInstrInfo.inc.

This diff adds support for instructions that were previously reported as having
memory access size 0. It replaces the heuristic of looking at instruction
register width to determine memory access width by instead checking the memory
operand type using tablegen-provided tables.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D126116
2022-06-30 00:25:32 -07:00
Amir Ayupov 798e92c6c4 [BOLT] Respect shouldPrint in dump-dot-all
Don't dump dot CFG graph for functions that should not be printed.

Reviewed By: rafauler, maksfb

Differential Revision: https://reviews.llvm.org/D128699
2022-06-29 17:01:17 -07:00
Maksim Panchenko ed74304506 [BOLT] Fix EH trampoline backout code
When SplitFunctions pass adds a trampoline code for exception landing
pads (limited to shared objects), it may increase the size of the hot
fragment making it larger than the whole function pre-split. When this
happens, the pass reverts the splitting action by restoring the original
block order and marking all blocks hot.

However, if createEHTrampolines() added new blocks to the CFG and
modified invoke instructions, simply restoring the original block layout
will not suffice as the new CFG has more blocks.

For proper backout of the split, modify the original layout by merging
in trampoline blocks immediately before their matching targets. As a
result, the number of blocks increases, but the number of instructions
and the function size remains the same as pre-split.

Add an assertion for the number of blocks when updating a function
layout.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128696
2022-06-29 14:35:57 -07:00
Fabian Parzefall e341e9f094 [BOLT] Add option to randomize function split point
For test purposes, we want to split functions at a random split point
to be able to test different layouts without relying on the profile.
This patch introduces an option, that randomly chooses a split point
to partition blocks of a function into hot and cold regions.

Reviewed By: Amir, yota9

Differential Revision: https://reviews.llvm.org/D128773
2022-06-29 13:02:05 -07:00