Symbols occur at non-zero offsets in a subsection if they are
`.alt_entry` symbols, or if `.subsections_via_symbols` is omitted.
It doesn't seem like ld64 supports folding those subsections either.
Moreover, supporting this it makes `foldIdentical` a lot more
complicated to implement. The existing implementation has some
questionable behavior around STABS omission -- if a section with an
non-zero offset symbol was folded into one without, we would omit the
STABS entry for the non-zero offset symbol.
I will be following up with a diff that makes `foldIdentical` zero out
the symbol sizes for folded symbols. Again, this is much easier to
implement if we don't have to worry about non-zero offsets.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D136000
The previous form is currently "harmless" and happened to work but may not in the future:
Consider the struct: (for x86-64, but same issue can be said for the ARM/64 families):
```
UNWIND_X86_64_MODE_MASK = 0x0F000000,
UNWIND_X86_64_MODE_RBP_FRAME = 0x01000000,
UNWIND_X86_64_MODE_STACK_IMMD = 0x02000000,
UNWIND_X86_64_MODE_STACK_IND = 0x03000000,
UNWIND_X86_64_MODE_DWARF = 0x04000000,
```
Previously, we were doing: `(encoding & MODE_DWARF) == MODE_DWARF`
As soon as a new `UNWIND_X86_64_MODE_FOO = 0x05000000` is defined, then the check above would always return true for encoding=MODE_FOO (because `(0b0101 & 0b0100) == 0b0100` )
Differential Revision: https://reviews.llvm.org/D135359
While reading this code, I was wondering if we change these variables in the
loop. We don't, so make them const to make this easier to see next time.
No behavior change.
Differential Revision: https://reviews.llvm.org/D135877
This commit moves the parsing of linker optimization hints into
`ARM64::applyOptimizationHints`. This lets us avoid allocating memory
for holding the parsed information, and moves work out of
`ObjFile::parse`, which is not parallelized at the moment.
This change reduces the overhead of processing LOHs to 25-30 ms when
linking Chromium Framework on my M1 machine; previously it took close to
100 ms.
There's no statistically significant change in runtime for a --threads=1
link.
Performance figures with all 8 cores utilized:
N Min Max Median Avg Stddev
x 20 3.8027232 3.8760762 3.8505335 3.8454145 0.026352574
+ 20 3.7019017 3.8660538 3.7546209 3.7620371 0.032680043
Difference at 95.0% confidence
-0.0833775 +/- 0.019
-2.16823% +/- 0.494094%
(Student's t, pooled s = 0.0296854)
Differential Revision: https://reviews.llvm.org/D133439
This is similar to the `-alias` CLI option, but it gives finer-grained
control in that it allows the aliased symbols to be treated as private
externs.
While working on this, I realized that our `-alias` handling did not
cover the cases where the aliased symbol is a common or dylib symbol,
nor the case where we have an undefined that gets treated specially and
converted to a defined later on. My N_INDR handling neglects this too
for now; I've added checks and TODO messages for these.
`N_INDR` symbols cropped up as part of our attempt to link swift-stdlib.
Reviewed By: #lld-macho, thakis, thevinster
Differential Revision: https://reviews.llvm.org/D133825
This is similar to the `-alias` CLI option, but it gives finer-grained
control in that it allows the aliased symbols to be treated as private
externs.
While working on this, I realized that our `-alias` handling did not
cover the cases where the aliased symbol is a common or dylib symbol,
nor the case where we have an undefined that gets treated specially and
converted to a defined later on. My N_INDR handling neglects this too
for now; I've added checks and TODO messages for these.
`N_INDR` symbols cropped up as part of our attempt to link swift-stdlib.
Reviewed By: #lld-macho, thakis, thevinster
Differential Revision: https://reviews.llvm.org/D133825
This is what ld64 does (though it doesn't use ICF to do this; instead it
always dedups selrefs by default).
We'll want to dedup implicitly-defined selrefs as well, but I will leave
that for future work.
Additionally, I'm not *super* happy with the current LLD implementation
because I think it is rather janky and inefficient. But at least it
moves us toward the goal of closing the size gap with ld64. I've
described ideas for cleaning up our implementation here:
https://github.com/llvm/llvm-project/issues/57714
Differential Revision: https://reviews.llvm.org/D133780
This section is marked S_ATTR_LIVE_SUPPORT in input files, which meant
that on arm64, we were unnecessarily preserving FDEs if we e.g. had
multiple weak definitions for a function. Worse, we would actually
produce an invalid `__eh_frame` section in that case, because the CIE
associated with the unnecessary FDE would still get dead-stripped and
we'd end up with a dangling FDE. We set up associations from functions
to their FDEs, so dead-stripping will just work naturally, and we can
clear S_ATTR_LIVE_SUPPORT from our input `__eh_frame` sections to fix
dead-stripping.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D132489
Apple Clang in Xcode 14 introduced a new feature for reducing the
overhead of objc_msgSend calls by deduplicating the setup calls for each
individual selector. This works by clang adding undefined symbols for
each selector called in a translation unit, such as `_objc_msgSend$foo`
for calling the `foo` method on any `NSObject`. There are 2
different modes for this behavior, the default directly does the setup
for `_objc_msgSend` and calls it, and the smaller option does the
selector setup, and then calls the standard `_objc_msgSend` stub
function.
The general overview of how this works is:
- Undefined symbols with the given prefix are collected
- The suffix of each matching undefined symbol is added as a string to
`__objc_methname`
- A pointer is added for every method name in the `__objc_selrefs`
section
- A `got` entry is emitted for `_objc_msgSend`
- Stubs are emitting pointing to the synthesized locations
Notes:
- Both `__objc_methname` and `__objc_selrefs` can also exist from object
files, so their contents are merged with our synthesized contents
- The compiler emits method names for defined methods, but not for
undefined symbols you call, but stubs are used for both
- This only implements the default "fast" mode currently just to reduce
the diff, I also doubt many folks will care to swap modes
- This only implements this for arm64 and x86_64, we don't need to
implement this for 32 bit iOS archs, but we should implement it for
watchOS archs in a later diff
Differential Revision: https://reviews.llvm.org/D128108
Normally we'd use LLVM_FALLTHROUGH, or now, [[fallthrough]].
But for case labels followed directly by other case labels, we
use neither.
No behavior change.
Previously we only supporting using the system pointer size (aka the
`absptr` encoding) because `llvm-mc`'s CFI directives always generate EH
frames with that encoding. But libffi uses 4-byte-encoded, hand-rolled
EH frames, so this patch adds support for it.
Fixes#56576.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D130804
A symbol `$ld$previous$/Another$1.2.3$1$3.0$14.0$_xxx$` means
"pretend symbol `_xxx` is in dylib `/Another` with version `1.2.3`
if the deployment target is between `3.0` and `14.0` and we're
targeting platform `1` (ie macOS)".
This means dylibs can now inject synthetic dylibs into the link, so
DylibFile needs to grow a 3rd constructor.
The only other interesting thing is that such an injected dylib
counts as a use of the original dylib. This patch gets this mostly
right (if _only_ `$ld$previous` symbols are used from a dylib,
we don't add a dep on the dylib itself, matching ld64), but one case
where we don't match ld64 yet is that ld64 even omits the original
dylib when linking it with `-needed-l`. Lld currently still adds a load
command for the original dylib in that case. (That's for a future
patch.)
Fixes#56074.
Differential Revision: https://reviews.llvm.org/D130725
Linking fails when targeting `x86_64-apple-darwin` for runtimes. The issue
is that LLD strictly assumes the target architecture be present in the tbd
files (which isn't always true). For example, when targeting `x86_64h`, it should
work with `x86_64` because they are ABI compatible. This is also inline with what
ld64 does.
An environment variable (which ld64 also supports) is also added to preserve the
existing behavior of strict architecture matching.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D130683
This flag was introduced in ld64-609. It instructs the linker to link to
a static library while treating its symbols as if they had hidden
visibility. This is useful when building a dylib that links to static
libraries but we don't want the symbols from those to be exported.
Closes#51505
This reland adds bitcode file handling, so we won't get any compile
errors due to BitcodeFile::forceHidden being unused.
Differential Revision: https://reviews.llvm.org/D130473
This flag was introduced in ld64-609. It instructs the linker to link to
a static library while treating its symbols as if they had hidden
visibility. This is useful when building a dylib that links to static
libraries but we don't want the symbols from those to be exported.
Closes#51505
Differential Revision: https://reviews.llvm.org/D130473
Previously, we treated it as a regular ConcatInputSection. However, ld64
actually parses its contents and uses that to synthesize a single image
info struct, generating one 8-byte section instead of `8 * number of
object files with ObjC code`.
I'm not entirely sure what impact this section has on the runtime, so I
just tried to follow ld64's semantics as closely as possible in this
diff. My main motivation though was to reduce binary size.
No significant perf change on chromium_framework on my 16-core Mac Pro:
base diff difference (95% CI)
sys_time 1.764 ± 0.062 1.748 ± 0.032 [ -2.4% .. +0.5%]
user_time 5.112 ± 0.104 5.106 ± 0.046 [ -0.9% .. +0.7%]
wall_time 6.111 ± 0.184 6.085 ± 0.076 [ -1.6% .. +0.8%]
samples 30 32
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D130125
which occurs when there are EH frames present in the object file's weak
def.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D130409
`advanceSubsection()` didn't account for the possibility that a section
could have no subsections.
Reviewed By: #lld-macho, thakis, BertalanD
Differential Revision: https://reviews.llvm.org/D130288
If there are multiple symbols at the same address, our unwind info
implementation assumes that we always register unwind entries to a
single canonical symbol.
This assumption was violated by the `registerEhFrame` code.
Fixes#56570.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D130208
It's only used in one branch, so we were unnecessarily calculating the
length of many symbol names.
Tiny speedup when linking chromium_framework on my M1 Mac mini:
x before.txt
+ after.txt
N Min Max Median Avg Stddev
x 10 3.9917109 4.0418 4.0318099 4.0203902 0.021459873
+ 10 3.944725 4.053988 3.9708955 3.9825602 0.037257609
Difference at 95.0% confidence
-0.03783 +/- 0.0285663
-0.940953% +/- 0.710536%
(Student's t, pooled s = 0.0304028)
Differential Revision: https://reviews.llvm.org/D130234
Similar to cstrings ld64 always deduplicates cfstrings. This was already
being done when enabling ICF, but for debug builds you may want to flip
this on if you cannot eliminate your instances of this, so this change
makes --deduplicate-literals also apply to cfstrings.
Differential Revision: https://reviews.llvm.org/D130134
To do this, we need to slice away the LSDA pointer, just like we are
slicing away the functionAddress pointer.
No observable difference in perf on chromium_framework:
base diff difference (95% CI)
sys_time 1.769 ± 0.068 1.761 ± 0.065 [ -2.7% .. +1.8%]
user_time 9.517 ± 0.110 9.528 ± 0.116 [ -0.6% .. +0.8%]
wall_time 8.291 ± 0.174 8.307 ± 0.183 [ -1.1% .. +1.5%]
samples 21 25
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D129830
While working on {D129830}, I realized that our handling of ICF +
eh_frame combined was untested. Additionally I realized that the comment
explaining why we were safely slicing away the functionAddress reloc
from our compact unwind entries was... insufficient and slightly
misleading. I've tried to clarify it.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D129894
This just removes the code that gates the logic. The main issue here is
perf impact: without {D122258}, LLD takes a significant perf hit because
it now has to do a lot more work in the input parsing phase. But with
that change to eliminate unnecessary EH frames from input object files,
the perf overhead here is minimal. Concretely, here are the numbers for
some builds as measured on my 16-core Mac Pro:
**chromium_framework**
This is without the use of `-femit-dwarf-unwind=no-compact-unwind`:
base diff difference (95% CI)
sys_time 1.826 ± 0.019 1.962 ± 0.034 [ +6.5% .. +8.4%]
user_time 9.306 ± 0.054 9.926 ± 0.082 [ +6.2% .. +7.1%]
wall_time 8.225 ± 0.068 8.947 ± 0.128 [ +8.0% .. +9.6%]
samples 15 22
With that flag enabled, the regression mostly disappears, as hoped:
base diff difference (95% CI)
sys_time 1.839 ± 0.062 1.866 ± 0.068 [ -0.9% .. +3.8%]
user_time 9.452 ± 0.068 9.490 ± 0.067 [ -0.1% .. +0.9%]
wall_time 8.383 ± 0.127 8.452 ± 0.114 [ -0.1% .. +1.8%]
samples 17 21
**Unnamed internal app**
Without `-femit-dwarf-unwind`, this is the perf hit:
base diff difference (95% CI)
sys_time 1.372 ± 0.029 1.317 ± 0.024 [ -4.6% .. -3.5%]
user_time 2.835 ± 0.028 2.980 ± 0.027 [ +4.8% .. +5.4%]
wall_time 3.205 ± 0.079 3.383 ± 0.066 [ +4.9% .. +6.2%]
samples 102 83
With `-femit-dwarf-unwind`, the perf hit almost disappears:
base diff difference (95% CI)
sys_time 1.274 ± 0.026 1.270 ± 0.025 [ -0.9% .. +0.3%]
user_time 2.812 ± 0.023 2.822 ± 0.035 [ +0.1% .. +0.7%]
wall_time 3.166 ± 0.047 3.174 ± 0.059 [ -0.2% .. +0.7%]
samples 95 97
Just for fun, I measured the impact of `-femit-dwarf-unwind` on ld64
(`base` has the extra DWARF unwind info in the input object files,
`diff` doesn't):
base diff difference (95% CI)
sys_time 1.128 ± 0.010 1.124 ± 0.023 [ -1.3% .. +0.6%]
user_time 7.176 ± 0.030 7.106 ± 0.094 [ -1.5% .. -0.4%]
wall_time 7.874 ± 0.041 7.795 ± 0.121 [ -1.7% .. -0.3%]
samples 16 25
And for LLD:
base diff difference (95% CI)
sys_time 1.315 ± 0.019 1.280 ± 0.019 [ -3.2% .. -2.0%]
user_time 2.980 ± 0.022 2.822 ± 0.016 [ -5.5% .. -5.0%]
wall_time 3.369 ± 0.038 3.175 ± 0.033 [ -6.2% .. -5.3%]
samples 47 47
So parsing the extra EH frames is a lot more expensive for us than for
ld64. But given that we are quite a lot faster than ld64 to begin with,
I guess this isn't entirely unexpected...
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D129540
This load command specifies the offset and size of the exports trie.
This information used to be a field in LC_DYLD_INFO, but in newer
libraries, it has a dedicated load command: LC_DYLD_EXPORTS_TRIE.
The format of the trie is the same for both load commands, so the code
for parsing it can be shared.
LLD does not generate this yet; it is mainly useful when chained fixups
are in use, as the other members of LC_DYLD_INFO are unused then, so the
smaller LC_DYLD_EXPORTS_TRIE can be output instead.
LLDB gained support for this in D107673.
Fixes#54550
Differential Revision: https://reviews.llvm.org/D129430
Linker optimization hints mark a sequence of instructions used for
synthesizing an address, like ADRP+ADD. If the referenced symbol ends up
close enough, it can be replaced by a faster sequence of instructions
like ADR+NOP.
This commit adds support for 2 of the 7 defined ARM64 optimization
hints:
- LOH_ARM64_ADRP_ADD, which transforms a pair of ADRP+ADD into ADR+NOP
if the referenced address is within +/- 1 MiB
- LOH_ARM64_ADRP_ADRP, which transforms two ADRP instructions into
ADR+NOP if they reference the same page
These two kinds already cover more than 50% of all LOHs in
chromium_framework.
Differential Review: https://reviews.llvm.org/D128093
The error used to look like this:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x4)
If DWARF line information is available, we now show where in the source
the references are coming from:
ld64.lld: error: unreferenced symbol: _foo
>>> referenced by: bar.cpp:42 (/path/to/bar.cpp:42)
>>> /path/to/bar.o:(symbol _baz+0x4)
The reland is identical to the first time this landed. The fix was in D128294.
This reverts commit 0cc7ad4175.
Differential Revision: https://reviews.llvm.org/D128184
If ld64.lld was supplied an object file that had a `__debug_abbrev` or
`__debug_str` section, but didn't have any compile unit DIEs in
`__debug_info`, it would dereference an iterator pointing to the empty
array of DIEs. This underlying issue started causing segmentation faults
when parsing for `__debug_info` was addded in D128184. That commit was
reverted, and this one fixes the invalid dereference to allow relanding
it.
This commit adds an assertion to `filter_iterator_base`'s dereference
operators to catch bugs like this one.
Ran check-llvm, check-clang and check-lld.
Differential Revision: https://reviews.llvm.org/D128294
The error used to look like this:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x4)
If DWARF line information is available, we now show where in the source
the references are coming from:
ld64.lld: error: unreferenced symbol: _foo
>>> referenced by: bar.cpp:42 (/path/to/bar.cpp:42)
>>> /path/to/bar.o:(symbol _baz+0x4)
Differential Revision: https://reviews.llvm.org/D128184
This reverts commit 942f4e3a7c.
The additional change required to avoid the assertion errors seen
previously is:
--- a/lld/MachO/ICF.cpp
+++ b/lld/MachO/ICF.cpp
@@ -443,7 +443,9 @@ void macho::foldIdenticalSections() {
/*relocVA=*/0);
isec->data = copy;
}
- } else {
+ } else if (!isEhFrameSection(isec)) {
+ // EH frames are gathered as hashables from unwindEntry above; give a
+ // unique ID to everything else.
isec->icfEqClass[0] = ++icfUniqueID;
}
}
Differential Revision: https://reviews.llvm.org/D123435
For arm64, llvm-mc emits relocations for the target function
address like so:
ltmp:
<CIE start>
...
<CIE end>
... multiple FDEs ...
<FDE start>
<target function address - (ltmp + pcrel offset)>
...
If any of the FDEs in `multiple FDEs` get dead-stripped, then `FDE start`
will move to an earlier address, and `ltmp + pcrel offset` will no longer
reflect an accurate pcrel value. To avoid this problem, we "canonicalize"
our relocation by adding an `EH_Frame` symbol at `FDE start`, and updating
the reloc to be `target function address - (EH_Frame + new pcrel offset)`.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D124561
== Background ==
`llvm-mc` generates unwind info in both compact unwind and DWARF
formats. LLD already handles the compact unwind format; this diff gets
us close to handling the DWARF format properly.
== Caveats ==
It's not quite done yet, but I figure it's worth getting this reviewed
and landed first as it's shaping up to be a fairly large code change.
**Known limitations of the current code:**
* Only works for x86_64, for which `llvm-mc` emits "abs-ified"
relocations as described in 618def651b.
`llvm-mc` emits regular relocations for ARM EH frames, which we do not
yet handle correctly.
Since the feature is not ready for real use yet, I've gated it behind a
flag that only gets toggled on during test suite runs. With most of the
new code disabled, we see just a hint of perf regression, so I don't
think it'd be remiss to land this as-is:
base diff difference (95% CI)
sys_time 1.926 ± 0.168 1.979 ± 0.117 [ -1.2% .. +6.6%]
user_time 3.590 ± 0.033 3.606 ± 0.028 [ +0.0% .. +0.9%]
wall_time 7.104 ± 0.184 7.179 ± 0.151 [ -0.2% .. +2.3%]
samples 30 31
== Design ==
Like compact unwind entries, EH frames are also represented as regular
ConcatInputSections that get pointed to via `Defined::unwindEntry`. This
allows them to be handled generically by e.g. the MarkLive and ICF
code. (But note that unlike compact unwind subsections, EH frame
subsections do end up in the final binary.)
In order to make EH frames "look like" a regular ConcatInputSection,
some processing is required. First, we need to split the `__eh_frame`
section along EH frame boundaries rather than along symbol boundaries.
We do this by decoding the length field of each EH frame. Second, the
abs-ified relocations need to be turned into regular Relocs.
== Next Steps ==
In order to support EH frames on ARM targets, we will either have to
teach LLD how to handle EH frames with explicit relocs, or we can try to
make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do
the latter as I think it will make the LLD implementation both simpler
and faster to execute.
== Misc ==
The `obj-file-with-stabs.s` test had to be updated as the previous
version would trip assertion errors in the code. It appears that in our
attempt to produce a minimal YAML test input, we created a file with
invalid EH frame data. I've fixed this by re-generating the YAML and not
doing any hand-pruning of it.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D123435
This change implements --icf=safe for MachO based on addrsig section that is implemented in D123751.
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D123752
Before this,
clang empty.cc -target x86_64-apple-ios13.1-macabi \
-framework CoreServices -fuse-ld=lld
would error out with
ld64.lld: error: path/to/MacOSX.sdk/System/Library/Frameworks/
CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/
Versions/A/CarbonCore.tbd(
/System/Library/Frameworks/
CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/
Versions/A/CarbonCore) is incompatible with x86_64 (macCatalyst)
Now it works, like with ld64.
Differential Revision: https://reviews.llvm.org/D124336
Previously, we stored a pointer from the ObjFile to its compact unwind
section in order to avoid iterating over the file's sections a second
time. However, given the small number of sections (not subsections) per
file, this caching was really quite unnecessary. We will soon do lookups
for more sections (such as the `__eh_frame` section), so let's simplify
the code first.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D123434
A "zippered" dylib contains several LC_BUILD_VERSION load commands, usually
one each for "normal" macOS and one for macCatalyst.
These are usually created by passing something like
-shared -target arm64-apple-macos -darwin-target-variant arm64-apple-ios13.1-macabi
to clang, which turns it into
-platform_version macos 12.0.0 12.3 -platform_version "mac catalyst" 14.0.0 15.4
for the linker.
ld64.lld can read these files fine, but it can't write them. Before this
change, it would just silently use the last -platform_version flag and ignore
the rest.
This change adds a warning that writing zippered dylibs isn't implemented yet
instead.
Sadly, parts of ld64.lld's test suite relied on the previous
"silently use last flag" semantics for its test suite: `%lld` always expanded
to `ld64.lld -platform_version macos 10.15 11.0` and tests that wanted a
different value passed a 2nd `-platform_version` flag later on. But this now
produces a warning if the platform passed to `-platform_version` is not `macos`.
There weren't very many cases of this, so move these to use `%no-arg-lld` and
manually pass `-arch`.
Differential Revision: https://reviews.llvm.org/D124106
{D123302} got me looking deeper at `includeInSymtab`. I thought it was a
little odd that there were excluded (live) symbols for which
`includeInSymtab` was false; we shouldn't have so many different ways to
exclude a symbol. As such, this diff makes the `L`-prefixed-symbol
exclusion code use `includeInSymtab` too. (Note that as part of our
support for `__eh_frame`, we will also be excluding all `__eh_frame`
symbols from the symtab in a future diff.)
Another thing I noticed is that the `emitStabs` code never has to deal
with excluded symbols because `SymtabSection::finalize()` already
filters them out. As such, I've updated the comments and asserts from
{D123302} to reflect this.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D123433