We already do this for personality pointers referenced from compact
unwind entries; this patch extends that behavior to personalities
referenced via EH frames as well.
This reduces the number of distinct personalities we need in the final
binary, and helps us avoid hitting the "too many personalities" error.
I renamed `UnwindInfoSection::prepareRelocations()` to simply `prepare`
since we now do some non-reloc-specific stuff within.
Fixes#58277.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D135728
This commit adds support for chained fixups, which were introduced in
Apple's late 2020 OS releases. This format replaces the dyld opcodes
used for supplying rebase and binding information, and encodes most of
that data directly in the memory location that will have the fixup
applied.
This reduces binary size and is a requirement for page-in linking, which
will be available starting with macOS 13.
A high-level overview of the format and my implementation can be found
in SyntheticSections.h.
This feature is currently gated behind the `-fixup_chains` flag, and
will be enabled by default for supported targets in a later commit.
Like in ld64, lazy binding is disabled when chained fixups are in use,
and the `-init_offsets` transformation is performed by default.
Differential Revision: https://reviews.llvm.org/D132560
Builds that error out on duplicate symbols can still succeed if the symbols
will be dead stripped. Currently, this is the current behavior in ld64.
https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/Resolver.cpp#L2018.
In order to provide an easier to path for adoption, introduce a new flag that will
retain compatibility with ld64's behavior (similar to `--deduplicate-literals`). This is
turned off by default since we do not encourage this behavior in the linker.
Reviewed By: #lld-macho, thakis, int3
Differential Revision: https://reviews.llvm.org/D134794
If a value for a given key is not present, `DenseMap::operator[]`
default-constructs one, which is wasteful when we don't do anything with
it afterwards. Fix it by calling `lookup()` instead which only returns
the default value, but does not modify the map.
This speeds up linking a fair bit when only a small portion of all
sections are specified in the order file, like in the case of Chromium
Framework:
N Min Max Median Avg Stddev
x 25 3.727684 3.8808699 3.753552 3.7702461 0.0397282
+ 25 3.6469049 3.7523289 3.6764321 3.6841622 0.025525047
Difference at 95.0% confidence
-0.0860839 +/- 0.0189924
-2.28324% +/- 0.503745%
(Student's t, pooled s = 0.0333906)
Differential Revision: https://reviews.llvm.org/D134811
`__thread_vars` contains pointers to `__tlv_bootstrap`, which are fixed
up by dyld; however the section's alignment is not specified. This means
that the relocations might end up on odd addresses, which is not
representable by the soon to be added chained fixups.
This is arguably a bug in MC, but this behavior has been there since TLV
support was originally added.
This patch forces the `__thread_vars` sections to be aligned to the
target's pointer size. This is done by ld64 as well.
Differential Revision: https://reviews.llvm.org/D134594
This arg is undocumented but from looking at the code + experiment, it's used to add additional DYLD_ENVIRONMENT load commands to the output.
Differential Revision: https://reviews.llvm.org/D134058
This commit moves the parsing of linker optimization hints into
`ARM64::applyOptimizationHints`. This lets us avoid allocating memory
for holding the parsed information, and moves work out of
`ObjFile::parse`, which is not parallelized at the moment.
This change reduces the overhead of processing LOHs to 25-30 ms when
linking Chromium Framework on my M1 machine; previously it took close to
100 ms.
There's no statistically significant change in runtime for a --threads=1
link.
Performance figures with all 8 cores utilized:
N Min Max Median Avg Stddev
x 20 3.8027232 3.8760762 3.8505335 3.8454145 0.026352574
+ 20 3.7019017 3.8660538 3.7546209 3.7620371 0.032680043
Difference at 95.0% confidence
-0.0833775 +/- 0.019
-2.16823% +/- 0.494094%
(Student's t, pooled s = 0.0296854)
Differential Revision: https://reviews.llvm.org/D133439
This flag instructs dyld to make the segment read-only after fixups have
been performed.
I'm not sure why this flag is needed, as on macOS 13 beta at least,
__DATA_CONST is read-only even without this flag; but ld64 sets it as
well.
Differential Revision: https://reviews.llvm.org/D133010
This section stores 32-bit `__TEXT` segment offsets of initializer
functions, and is used instead of `__mod_init_func` when chained fixups
are enabled.
Storing the offsets lets us avoid emitting fixups for the initializers.
Differential Revision: https://reviews.llvm.org/D132947
We now re-use the existing needsBinding() helper to determine if a
branch has to go through a stub. The logic for determining which type of
binding is needed is moved inside StubsSection::addEntry().
This is an NFC refactor that simplifies my diff that adds support for
chained fixups.
Differential Revision: https://reviews.llvm.org/D132476
Apple Clang in Xcode 14 introduced a new feature for reducing the
overhead of objc_msgSend calls by deduplicating the setup calls for each
individual selector. This works by clang adding undefined symbols for
each selector called in a translation unit, such as `_objc_msgSend$foo`
for calling the `foo` method on any `NSObject`. There are 2
different modes for this behavior, the default directly does the setup
for `_objc_msgSend` and calls it, and the smaller option does the
selector setup, and then calls the standard `_objc_msgSend` stub
function.
The general overview of how this works is:
- Undefined symbols with the given prefix are collected
- The suffix of each matching undefined symbol is added as a string to
`__objc_methname`
- A pointer is added for every method name in the `__objc_selrefs`
section
- A `got` entry is emitted for `_objc_msgSend`
- Stubs are emitting pointing to the synthesized locations
Notes:
- Both `__objc_methname` and `__objc_selrefs` can also exist from object
files, so their contents are merged with our synthesized contents
- The compiler emits method names for defined methods, but not for
undefined symbols you call, but stubs are used for both
- This only implements the default "fast" mode currently just to reduce
the diff, I also doubt many folks will care to swap modes
- This only implements this for arm64 and x86_64, we don't need to
implement this for 32 bit iOS archs, but we should implement it for
watchOS archs in a later diff
Differential Revision: https://reviews.llvm.org/D128108
A symbol `$ld$previous$/Another$1.2.3$1$3.0$14.0$_xxx$` means
"pretend symbol `_xxx` is in dylib `/Another` with version `1.2.3`
if the deployment target is between `3.0` and `14.0` and we're
targeting platform `1` (ie macOS)".
This means dylibs can now inject synthetic dylibs into the link, so
DylibFile needs to grow a 3rd constructor.
The only other interesting thing is that such an injected dylib
counts as a use of the original dylib. This patch gets this mostly
right (if _only_ `$ld$previous` symbols are used from a dylib,
we don't add a dep on the dylib itself, matching ld64), but one case
where we don't match ld64 yet is that ld64 even omits the original
dylib when linking it with `-needed-l`. Lld currently still adds a load
command for the original dylib in that case. (That's for a future
patch.)
Fixes#56074.
Differential Revision: https://reviews.llvm.org/D130725
Previously, we treated it as a regular ConcatInputSection. However, ld64
actually parses its contents and uses that to synthesize a single image
info struct, generating one 8-byte section instead of `8 * number of
object files with ObjC code`.
I'm not entirely sure what impact this section has on the runtime, so I
just tried to follow ld64's semantics as closely as possible in this
diff. My main motivation though was to reduce binary size.
No significant perf change on chromium_framework on my 16-core Mac Pro:
base diff difference (95% CI)
sys_time 1.764 ± 0.062 1.748 ± 0.032 [ -2.4% .. +0.5%]
user_time 5.112 ± 0.104 5.106 ± 0.046 [ -0.9% .. +0.7%]
wall_time 6.111 ± 0.184 6.085 ± 0.076 [ -1.6% .. +0.8%]
samples 30 32
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D130125
Identical literal folding takes ~1.4% of the time, and was missing
from the trace.
Signature computation still needs ~2.2% of the time, so probably worth
explicitly marking its contribution to "Write output file" (9.1%)
Differential Revision: https://reviews.llvm.org/D128343
ld64.lld used to print the "undefined symbol" line for each reference to
an undefined symbol previously:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x0)
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _quux+0x1)
Now they are deduplicated:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x0)
>>> referenced by /path/to/bar.o:(symbol _quux+0x1)
As with the other lld ports, only the first 3 references are printed.
Differential Revision: https://reviews.llvm.org/D127753
The error used to look like this:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o
Now it displays the name of the function that contains the undefined
reference as well:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x4)
Differential Revision: https://reviews.llvm.org/D127696
This reverts commit 942f4e3a7c.
The additional change required to avoid the assertion errors seen
previously is:
--- a/lld/MachO/ICF.cpp
+++ b/lld/MachO/ICF.cpp
@@ -443,7 +443,9 @@ void macho::foldIdenticalSections() {
/*relocVA=*/0);
isec->data = copy;
}
- } else {
+ } else if (!isEhFrameSection(isec)) {
+ // EH frames are gathered as hashables from unwindEntry above; give a
+ // unique ID to everything else.
isec->icfEqClass[0] = ++icfUniqueID;
}
}
Differential Revision: https://reviews.llvm.org/D123435
== Background ==
`llvm-mc` generates unwind info in both compact unwind and DWARF
formats. LLD already handles the compact unwind format; this diff gets
us close to handling the DWARF format properly.
== Caveats ==
It's not quite done yet, but I figure it's worth getting this reviewed
and landed first as it's shaping up to be a fairly large code change.
**Known limitations of the current code:**
* Only works for x86_64, for which `llvm-mc` emits "abs-ified"
relocations as described in 618def651b.
`llvm-mc` emits regular relocations for ARM EH frames, which we do not
yet handle correctly.
Since the feature is not ready for real use yet, I've gated it behind a
flag that only gets toggled on during test suite runs. With most of the
new code disabled, we see just a hint of perf regression, so I don't
think it'd be remiss to land this as-is:
base diff difference (95% CI)
sys_time 1.926 ± 0.168 1.979 ± 0.117 [ -1.2% .. +6.6%]
user_time 3.590 ± 0.033 3.606 ± 0.028 [ +0.0% .. +0.9%]
wall_time 7.104 ± 0.184 7.179 ± 0.151 [ -0.2% .. +2.3%]
samples 30 31
== Design ==
Like compact unwind entries, EH frames are also represented as regular
ConcatInputSections that get pointed to via `Defined::unwindEntry`. This
allows them to be handled generically by e.g. the MarkLive and ICF
code. (But note that unlike compact unwind subsections, EH frame
subsections do end up in the final binary.)
In order to make EH frames "look like" a regular ConcatInputSection,
some processing is required. First, we need to split the `__eh_frame`
section along EH frame boundaries rather than along symbol boundaries.
We do this by decoding the length field of each EH frame. Second, the
abs-ified relocations need to be turned into regular Relocs.
== Next Steps ==
In order to support EH frames on ARM targets, we will either have to
teach LLD how to handle EH frames with explicit relocs, or we can try to
make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do
the latter as I think it will make the LLD implementation both simpler
and faster to execute.
== Misc ==
The `obj-file-with-stabs.s` test had to be updated as the previous
version would trip assertion errors in the code. It appears that in our
attempt to produce a minimal YAML test input, we created a file with
invalid EH frame data. I've fixed this by re-generating the YAML and not
doing any hand-pruning of it.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D123435
This reduces linking time by ~8% for my project (1.19s -> 0.53s for
writeSections()). writeTo is const, which bodes well for it being
parallelizable, and I've looked through the different overridden versions and
can't see any race conditions. It produces the same byte-for-byte output for my
project.
Differential Revision: https://reviews.llvm.org/D126800
With -platform_version flags for two distinct platforms,
this writes a LC_BUILD_VERSION header for each.
The motivation is that this is needed for self-hosting with lld as linker
after D124059.
To create a zippered output at the clang driver level, pass
-target arm64-apple-macos -darwin-target-variant arm64-apple-ios-macabi
to create a zippered dylib.
(In Xcode's clang, `-darwin-target-variant` is spelled just `-target-variant`.)
(If you pass `-target arm64-apple-ios-macabi -target-variant arm64-apple-macos`
instead, ld64 crashes!)
This results in two -platform_version flags being passed to the linker.
ld64 also verifies that the iOS SDK version is at least 13.1. We don't do that
yet. But ld64 also does that for other platforms and we don't. So we need to
do that at some point, but not in this patch.
Only dylib and bundle outputs can be zippered.
I verified that a Catalyst app linked against a dylib created with
clang -shared foo.cc -o libfoo.dylib \
-target arm64-apple-macos \
-target-variant arm64-apple-ios-macabi \
-Wl,-install_name,@rpath/libfoo.dylib \
-fuse-ld=$PWD/out/gn/bin/ld64.lld
runs successfully. (The app calls a function `f()` in libfoo.dylib
that returns a const char* "foo", and NSLog(@"%s")s it.)
ld64 is a bit more permissive when writing zippered outputs,
see references to "unzippered twins". That's not implemented yet.
(If anybody wants to implement that, D124275 is a good start.)
Differential Revision: https://reviews.llvm.org/D124887
This diff is motivated by my work to add proper DWARF unwind support. As
detailed in PR50956 functions that need DWARF unwind need to have
compact unwind entries synthesized for them. These CU entries encode an
offset within `__eh_frame` that points to the corresponding DWARF FDE.
In order to encode this offset during
`UnwindInfoSectionImpl::finalize()`, we need to first assign values to
`InputSection::outSecOff` for each `__eh_frame` subsection. But
`__eh_frame` is ordered after `__unwind_info` (according to ld64 at
least), which puts us in a bit of a bind: `outSecOff` gets assigned
during finalization, but `__eh_frame` is being finalized after
`__unwind_info`.
But it occurred to me that there's no real need for most
ConcatOutputSections to be finalized sequentially. It's only necessary
for text-containing ConcatOutputSections that may contain branch relocs
which may need thunks. ConcatOutputSections containing other types of
data can be finalized in any order.
This diff moves the finalization logic for non-text sections into a
separate `finalizeContents()` method. This method is called before
section address assignment & unwind info finalization takes place. In
theory we could call these `finalizeContents()` methods in parallel, but
in practice it seems to be faster to do it all on the main thread.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D123279
`config->priorities` has been used to hold the intermediate state during the construction of the order in which sections should be laid out. This is not a good place to hold this state since the intermediate state is not a "configuration" for LLD. It should be encapsulated in a class for building a mapping from section to priority (which I created in this diff as the `PriorityBuilder` class).
The same thing is being done for `config->callGraphProfile`.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D122156
All references to interposable symbols can be redirected at runtime to
point to a different symbol definition (with the same name). For
example, if both dylib A and B define symbol _foo, and we load A before
B at runtime, then all references to _foo within dylib B will point to
the definition in dylib A.
ld64 makes all extern symbols interposable when linking with
`-flat_namespace`.
TODO 1: Support `-interposable` and `-interposable_list`, which should
just be a matter of parsing those CLI flags and setting the
`Defined::interposable` bit.
TODO 2: Set Reloc::FinalDefinitionInLinkageUnit correctly with this info
(we are currently not setting it at all, so we're erring on the
conservative side, but we should help the LTO backend generate more
optimal code.)
Reviewed By: modimo, MaskRay
Differential Revision: https://reviews.llvm.org/D119294
Earlier in LLD's evolution, I tried to create the illusion that
subsections were indistinguishable from "top-level" sections. Thus, even
though the subsections shared many common field values, I hid those
common values away in a private Shared struct (see D105305). More
recently, however, @gkm added a public `Section` struct in D113241 that
served as an explicit way to store values that are common to an entire
set of subsections (aka InputSections). Now that we have another "common
value" struct, `Shared` has been rendered redundant. All its fields can
be moved into `Section` instead, and the pointer to `Shared` can be replaced
with a pointer to `Section`.
This `Section` pointer also has the advantage of letting us inspect other
subsections easily, simplifying the implementation of {D118798}.
P.S. I do think that having both `Section` and `InputSection` makes for
a slightly confusing naming scheme. I considered renaming `InputSection`
to `Subsection`, but that would break the symmetry with `OutputSection`.
It would also make us deviate from LLD-ELF's naming scheme.
This change is perf-neutral on my 3.2 GHz 16-Core Intel Xeon W machine:
base diff difference (95% CI)
sys_time 1.258 ± 0.031 1.248 ± 0.023 [ -1.6% .. +0.1%]
user_time 3.659 ± 0.047 3.658 ± 0.041 [ -0.5% .. +0.4%]
wall_time 4.640 ± 0.085 4.625 ± 0.063 [ -1.0% .. +0.3%]
samples 49 61
There's also no stat sig change in RSS (as measured by `time -l`):
base diff difference (95% CI)
time 998038627.097 ± 13567305.958 1003327715.556 ± 15210451.236 [ -0.2% .. +1.2%]
samples 31 36
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D118797
Added some comments (particularly around finalize() and
finalizeContents()) as well as doing some rephrasing / grammar fixes for
existing comments.
Also did some minor style fixups, such as by putting methods together in
a class definition and having fields of similar types next to each
other.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D118714
This is in preparation for moving the code that parses and processes
order files into this file.
See https://reviews.llvm.org/D117354 for context and discussion.
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.
See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html
Differential Revision: https://reviews.llvm.org/D108850
The PlatformKind/PlatformType enums contain the same information, which requires
them to be kept in-sync. This commit changes over to PlatformType as the sole
source of truth, which allows the removal of the redundant PlatformKind.
The majority of the changes were in LLD and TextAPI.
Reviewed By: cishida
Differential Revision: https://reviews.llvm.org/D117163
Depends on D112160
This adds the new options `--call-graph-profile-sort` (default),
`--no-call-graph-profile-sort` and `--print-symbol-order=`. If call graph
profile sorting is enabled, reads `__LLVM,__cg_profile` sections from object
files and uses the resulting graph to put callees and callers close to each
other in the final binary via the C3 clustering heuristic.
Differential Revision: https://reviews.llvm.org/D112164
After {D115416}, the "Write map file" event no longer shows up
in the time trace. Each time trace profiler instance is thread-local,
but we had neglected to initialize a separate instance for the mapfile
worker thread.
Reviewed By: keith
Differential Revision: https://reviews.llvm.org/D117069
References from thread-local variable sections are treated as offsets
relative to the start of the thread-local data memory area, which is
initialized via copying all the TLV data sections (which are all
contiguous). If later data sections require a greater alignment than
earlier ones, the offsets of data within those sections won't be
guaranteed to aligned unless we normalize alignments. We therefore use
the largest alignment for all TLV data sections.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D116263
For large applications that write to map files, writing map files can take quite
a bit of time. Sorting the biggest contributors to link times, writing map files
ranks in at 2nd place, with load input files being the biggest contributor of
link times. Avoiding writing map files on the critical path (and having its own
thread) saves ~2-3 seconds when linking chromium framework on a 16-Core
Intel Xeon W.
```
base diff difference (95% CI)
sys_time 1.617 ± 0.034 1.657 ± 0.026 [ +1.5% .. +3.5%]
user_time 28.536 ± 0.245 28.609 ± 0.180 [ -0.1% .. +0.7%]
wall_time 23.833 ± 0.271 21.684 ± 0.194 [ -9.5% .. -8.5%]
samples 31 24
```
Reviewed By: #lld-macho, oontvoo, int3
Differential Revision: https://reviews.llvm.org/D115416
This is an NFC diff that prepares for pruning & relocating `__eh_frame`.
Along the way, I made the following changes to ...
* clarify usage of `section` vs. `subsection`
* remove `map` & `vec` from type names
* disambiguate class `Section` from template parameter `SectionHeader`.
Differential Revision: https://reviews.llvm.org/D113241
We need to reset global state between runs, similar to the other ports.
There's some file-static state which needs to be reset as well and we
need to add some new helpers for that.
With this change, most LLD Mach-O tests pass with `LLD_IN_TEST=2` (which
runs the linker twice on each test). Some tests will be fixed by the
remainder of this stack, and the rest are fundamentally incompatible
with that mode (e.g. they intentionally throw fatal errors).
Fixes PR52070.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D112878
Having to remember to call `canonical()` all over the place is
error-prone; let's do it in a centralized location instead. It also
appears to improve performance slightly.
base diff difference (95% CI)
sys_time 0.984 ± 0.009 0.983 ± 0.014 [ -0.8% .. +0.6%]
user_time 6.508 ± 0.035 6.475 ± 0.036 [ -0.8% .. -0.2%]
wall_time 5.321 ± 0.034 5.300 ± 0.033 [ -0.7% .. -0.1%]
samples 36 23
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D112687