It's more natural to use uint8_t * (std::byte needs C++17 and llvm has
too much uint8_t *) and most callers use uint8_t * instead of char *.
The functions are recently moved into `llvm::compression::zlib::`, so
downstream projects need to make adaption anyway.
Summary:
Introduce NeverAlign fragment type.
The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.
In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).
This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.
The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.
LLVM: https://reviews.llvm.org/D97982
Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613
Test Plan: sandcastle
Reviewers: #llvm-bolt
Subscribers: phabricatorlinter
Differential Revision: https://phabricator.intern.facebook.com/D31361547
* Refactor compression namespaces across the project, making way for a possible
introduction of alternatives to zlib compression.
Changes are as follows:
* Relocate the `llvm::zlib` namespace to `llvm::compression::zlib`.
Reviewed By: MaskRay, leonardchan, phosek
Differential Revision: https://reviews.llvm.org/D128953
Finish adding support for the remaining binary named operators in expression context: XOR, SHL, and SHR.
Differential Revision: https://reviews.llvm.org/D129299
Currently we use the `.llvm.offloading` section to store device-side
objects inside the host, creating a fat binary. The contents of these
sections is currently determined by the name of the section while it
should ideally be determined by its type. This patch adds the new
`SHT_LLVM_OFFLOADING` section type to the ELF section types. Which
should make it easier to identify this specific data format.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D129052
ml.exe and ml64.exe support the use of anonymous labels, with @B and @F referring to the previous and next anonymous label respectively.
We add similar support to llvm-ml, with the exception that we are unable to emit an error message for an @F expression not followed by a @@ label.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D128944
`mov x0, 1024u` is permitted in binutils but rejected by the integrated
assembler. Support the case. This is especially important when using the C
pre-processor with the assembler: some shared code between C and assembler may
use lower-cased suffices.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D128871
We are going to change alignment for DWARF sections. This patch is to
fix functionality issue if the alignment is not the same with
DefaultSectionAlign defined in XCOFFObjectWriter.cpp.
Currently no test for this patch as for now alignment for DWARF sections
and other sections are always the same. This patch will be tested when
patch changing DWARF section is merged in.
Reviewed By: Esme
Differential Revision: https://reviews.llvm.org/D116092
This is a resurrection of D106421 with the change that it keeps backward-compatibility. This means decoding the previous version of `LLVM_BB_ADDR_MAP` will work. This is required as the profile mapping tool is not released with LLVM (AutoFDO). As suggested by @jhenderson we rename the original section type value to `SHT_LLVM_BB_ADDR_MAP_V0` and assign a new value to the `SHT_LLVM_BB_ADDR_MAP` section type. The new encoding adds a version byte to each function entry to specify the encoding version for that function. This patch also adds a feature byte to be used with more flexibility in the future. An use-case example for the feature field is encoding multi-section functions more concisely using a different format.
Conceptually, the new encoding emits basic block offsets and sizes as label differences between each two consecutive basic block begin and end label. When decoding, offsets must be aggregated along with basic block sizes to calculate the final offsets of basic blocks relative to the function address.
This encoding uses smaller values compared to the existing one (offsets relative to function symbol).
Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 17% total reduction in the size of the bb-address-map section (from about 11MB to 9MB for the clang PGO binary).
The extra two bytes (version and feature fields) incur a small 3% size overhead to the `LLVM_BB_ADDR_MAP` section size.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D121346
This is already possible for e.g. `cstring_literals`, but the entry for zerofill was unnamed.
rdar://90336380
Differential Revision: https://reviews.llvm.org/D128654
DXContainer files resemble traditional object files in that they are
comprised of parts which resemble sections. Adding DXContainer as an
object file format in the MC layer will allow emitting DXContainer
objects through the normal object emission pipeline.
Differential Revision: https://reviews.llvm.org/D127165
Previously, omitting unnecessary DWARF unwinds was only done in two
cases:
* For Darwin + aarch64, if no DWARF unwind info is needed for all the
functions in a TU, then the `__eh_frame` section would be omitted
entirely. If any one function needed DWARF unwind, then MC would emit
DWARF unwind entries for all the functions in the TU.
* For watchOS, MC would omit DWARF unwind on a per-function basis, as
long as compact unwind was available for that function.
This diff makes it so that we omit DWARF unwind on a per-function basis
for Darwin + aarch64 as well. In addition, we introduce the flag
`--emit-dwarf-unwind=` which can toggle between `always`,
`no-compact-unwind` (only emit DWARF when CU cannot be emitted for a
given function), and the target platform `default`. `no-compact-unwind`
is particularly useful for newer x86_64 platforms: we don't want to omit
DWARF unwind for x86_64 in general due to possible backwards compat
issues, but we should make it possible for people to opt into this
behavior if they are only targeting newer platforms.
**Motivation:** I'm working on adding support for `__eh_frame` to LLD,
but I'm concerned that we would suffer a perf hit. Processing compact
unwind is already expensive, and that's a simpler format than EH frames.
Given that MC currently produces one EH frame entry for every compact
unwind entry, I don't think processing them will be cheap. I tried to do
something clever on LLD's end to drop the unnecessary EH frames at parse
time, but this made the code significantly more complex. So I'm looking
at fixing this at the MC level instead.
**Addendum:** It turns out that there was a latent bug in the X86
backend when `OmitDwarfIfHaveCompactUnwind` is naively enabled, which is
not too surprising given that this combination has not been heretofore
used.
For functions that have unwind info that cannot be encoded with CU, MC
would end up dropping both the compact unwind entry (OK; existing
behavior) as well as the DWARF entries (not OK). This diff fixes things
so that we emit the DWARF entry, as well as a CU entry with encoding
`UNWIND_X86_MODE_DWARF` -- this basically tells the unwinder to look for
the DWARF entry. I'm not 100% sure the `UNWIND_X86_MODE_DWARF` CU entry
is necessary, this was the simplest fix. ld64 seems to be able to handle
both the absence and presence of this CU entry. Ultimately ld64 (and
LLD) will synthesize `UNWIND_X86_MODE_DWARF` if it is absent, so there
is no impact to the final binary size.
Reviewed By: davide, lhames
Differential Revision: https://reviews.llvm.org/D122258
The callsite identifier used in pseudo probe encoding and decoding is consisted of a function name and the callsite probe id. For encoding, i.e., `MCPseudoProbeInlineTree`, the function name is callee function name. However for decoding, i.e., `MCDecodedPseudoProbeInlineTree`, the caller function name is used actually. This results in multiple callees that are inlined at the same callsite, likely via indirect call promotion, sharing the same decoded inline frame. While it is not a problem for profile generation, it confuses probe re-encoding in Bolt.
In Bolt, we decode pseudo probes first and build `MCDecodedPseudoProbeInlineTree`. The decoded tree is used for final re-encoding. Here comes the problem. Two inlinees from the same callsite share the same decoded inline frame. During re-encoding, the frame name (whatever inlinee comes first) will be used and encoded in the bolted binary. This will cause wrong inline contexts in the profile generated on the bolted binary.
The fix is a no-op to pre-bolt profile generation. Some of the bolt tests are not yet upstreamed, thus I'm not adding a bolt test here.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D126434
Upon invalid alignment value, still set a default valid alignment value to avoid
hitting later asserts.
Fix#55273
Differential Revision: https://reviews.llvm.org/D125688
It's a fairly common issue that the generating code incorrectly marks
instructions as narrow or wide; check that the instruction lengths
add up to the expected value, and error out if it doesn't. This allows
catching code generation bugs.
Also check that prologs and epilogs are properly terminated, to
catch other code generation issues.
Differential Revision: https://reviews.llvm.org/D125647
This includes .seh_* directives for generating it from assembly.
It is designed fairly similarly to the ARM64 handling.
For .seh_handler directives, such as
".seh_handler __C_specific_handler, @except" (which is supported
on x86_64 and aarch64 so far), the "@except" bit doesn't work in
ARM assembly, as '@' is used as a comment character (on all current
platforms).
Allow using '%' instead of '@' for this purpose. This convention
is used by GAS in similar contexts already,
e.g. [1]:
Note on targets where the @ character is the start of a comment
(eg ARM) then another character is used instead. For example the
ARM port uses the % character.
In practice, this unfortunately means that all such .seh_handler
directives will need ifdefs for ARM.
Contrary to ARM64, on ARM, it's quite common that we can't evaluate
e.g. the function length at this point, due to instructions whose
length is finalized later. (Also, inline jump tables end with
a ".p2align 1".)
If unable to to evaluate the function length immediately, emit
it as an MCExpr instead. If we'd implement splitting the unwind
info for a function (which isn't implemented for ARM64 yet either),
we wouldn't know whether we need to split it though.
Avoid calling getFrameIndexOffset() on an unset
FuncInfo.UnwindHelpFrameIdx, to avoid triggering asserts in the
preexisting testcase CodeGen/ARM/Windows/wineh-basic.ll. (Once
MSVC exception handling is fully implemented, those changes
can be reverted.)
[1] https://sourceware.org/binutils/docs/as/Section.html#Section
Differential Revision: https://reviews.llvm.org/D125645
For ARM SEH, the epilogs will need a little more associated data than
just the plain list of opcodes.
This is a preparatory refactoring for D125645.
Differential Revision: https://reviews.llvm.org/D125879
MCSymbolizer::tryAddingSymbolicOperand() overloaded the Size parameter
to specify either the instruction size or the operand size depending on
the architecture. However, for proper symbolic disassembly on X86, we
need to know both sizes, as an instruction can have two operands, and
the instruction size cannot be reliably calculated based on the operand
offset and its size. Hence, split Size into OpSize and InstSize.
For X86, the new interface allows to fix a couple of issues:
* Correctly adjust the value of PC-relative operands.
* Set operand size to zero when the operand is specified implicitly.
Differential Revision: https://reviews.llvm.org/D126101
There's an inconsistency regarding the epilogs of packed ARM64
unwind info with homed parameters; according to the documentation
(and according to common sense), the epilog wouldn't have a series
of nop instructions matching the stp x0-x7 in the prolog - however
in practice, RtlVirtualUnwind still seems to behave as if the epilog
does have the mirrored nops from the prolog.
In practice, MSVC doesn't seem to produce packed unwind info with
homed parameters, which might be why this inconsistency hasn't
been noticed.
Thus, to play it safe, avoid creating such packed unwind info with
homed parameters. (LLVM's current behaviour matches the current
runtime behaviour of RtlVirtualUnwind, but if it later is bug fixed
to match the documentation, such unwind information would be
incorrect.)
See https://github.com/llvm/llvm-project/issues/54879 for further
discussion on the matter.
Differential Revision: https://reviews.llvm.org/D125876
This allows sharing opcodes between prolog and epilog even when there
is more than one epilog.
I didn't make any handcrafted special MC level testcases for this (yet
at least), but it does seem to have the expected effect on two existing
CodeGen level testcases.
Differential Revision: https://reviews.llvm.org/D125619