Commit Graph

102 Commits

Author SHA1 Message Date
Keith Smiley 3c24fae398
[lld-macho] Add support for objc_msgSend stubs
Apple Clang in Xcode 14 introduced a new feature for reducing the
overhead of objc_msgSend calls by deduplicating the setup calls for each
individual selector. This works by clang adding undefined symbols for
each selector called in a translation unit, such as `_objc_msgSend$foo`
for calling the `foo` method on any `NSObject`. There are 2
different modes for this behavior, the default directly does the setup
for `_objc_msgSend` and calls it, and the smaller option does the
selector setup, and then calls the standard `_objc_msgSend` stub
function.

The general overview of how this works is:

- Undefined symbols with the given prefix are collected
- The suffix of each matching undefined symbol is added as a string to
  `__objc_methname`
- A pointer is added for every method name in the `__objc_selrefs`
  section
- A `got` entry is emitted for `_objc_msgSend`
- Stubs are emitting pointing to the synthesized locations

Notes:

- Both `__objc_methname` and `__objc_selrefs` can also exist from object
  files, so their contents are merged with our synthesized contents
- The compiler emits method names for defined methods, but not for
  undefined symbols you call, but stubs are used for both
- This only implements the default "fast" mode currently just to reduce
  the diff, I also doubt many folks will care to swap modes
- This only implements this for arm64 and x86_64, we don't need to
  implement this for 32 bit iOS archs, but we should implement it for
  watchOS archs in a later diff

Differential Revision: https://reviews.llvm.org/D128108
2022-08-10 17:17:17 -07:00
Fangrui Song 4b2b68d5ab [lld] Change vector to SmallVector. NFC
My lld executable is 1.6KiB smaller and some functions are now more efficient.
2022-07-30 18:11:21 -07:00
Vincent Lee f030132c72 [lld-macho] Allow linking with ABI compatible architectures
Linking fails when targeting `x86_64-apple-darwin` for runtimes. The issue
is that LLD strictly assumes the target architecture be present in the tbd
files (which isn't always true). For example, when targeting `x86_64h`, it should
work with `x86_64` because they are ABI compatible. This is also inline with what
ld64 does.

An environment variable (which ld64 also supports) is also added to preserve the
existing behavior of strict architecture matching.

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D130683
2022-07-28 17:16:32 -07:00
Jez Ng 87ce7b41d8 [lld-macho] Simplify archive loading logic
This is a follow-on to {D129556}. I've refactored the code such that
`addFile()` no longer needs to take an extra parameter. Additionally,
the "do we force-load or not" policy logic is now fully contained within
addFile, instead of being split between `addFile` and
`parseLCLinkerOptions`. This also allows us to move the `ForceLoad` (now
`LoadType`) enum out of the header file.

Additionally, we can now correctly report loads induced by
`LC_LINKER_OPTION` in our `-why_load` output.

I've also added another test to check that CLI library non-force-loads
take precedence over `LC_LINKER_OPTION` + `-force_load_swift_libs`. (The
existing logic is correct, just untested.)

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D130137
2022-07-19 21:56:24 -04:00
Keith Smiley 0bc100986c
[lld-macho] Add support for -alias
This creates a symbol alias similar to --defsym in the elf linker. This
is used by swiftpm for all executables, so it's useful to support. This
doesn't implement -alias_list but that could be done pretty easily as
needed.

Differential Revision: https://reviews.llvm.org/D129938
2022-07-19 13:55:56 -07:00
Jez Ng 403d61aedd [lld-macho] Enable EH frame relocation / pruning
This just removes the code that gates the logic. The main issue here is
perf impact: without {D122258}, LLD takes a significant perf hit because
it now has to do a lot more work in the input parsing phase. But with
that change to eliminate unnecessary EH frames from input object files,
the perf overhead here is minimal. Concretely, here are the numbers for
some builds as measured on my 16-core Mac Pro:

**chromium_framework**

This is without the use of `-femit-dwarf-unwind=no-compact-unwind`:

             base           diff           difference (95% CI)
  sys_time   1.826 ± 0.019  1.962 ± 0.034  [  +6.5% ..   +8.4%]
  user_time  9.306 ± 0.054  9.926 ± 0.082  [  +6.2% ..   +7.1%]
  wall_time  8.225 ± 0.068  8.947 ± 0.128  [  +8.0% ..   +9.6%]
  samples    15             22

With that flag enabled, the regression mostly disappears, as hoped:

             base           diff           difference (95% CI)
  sys_time   1.839 ± 0.062  1.866 ± 0.068  [  -0.9% ..   +3.8%]
  user_time  9.452 ± 0.068  9.490 ± 0.067  [  -0.1% ..   +0.9%]
  wall_time  8.383 ± 0.127  8.452 ± 0.114  [  -0.1% ..   +1.8%]
  samples    17             21

**Unnamed internal app**

Without `-femit-dwarf-unwind`, this is the perf hit:

             base           diff           difference (95% CI)
  sys_time   1.372 ± 0.029  1.317 ± 0.024  [  -4.6% ..   -3.5%]
  user_time  2.835 ± 0.028  2.980 ± 0.027  [  +4.8% ..   +5.4%]
  wall_time  3.205 ± 0.079  3.383 ± 0.066  [  +4.9% ..   +6.2%]
  samples    102            83

With `-femit-dwarf-unwind`, the perf hit almost disappears:

             base           diff           difference (95% CI)
  sys_time   1.274 ± 0.026  1.270 ± 0.025  [  -0.9% ..   +0.3%]
  user_time  2.812 ± 0.023  2.822 ± 0.035  [  +0.1% ..   +0.7%]
  wall_time  3.166 ± 0.047  3.174 ± 0.059  [  -0.2% ..   +0.7%]
  samples    95             97

Just for fun, I measured the impact of `-femit-dwarf-unwind` on ld64
(`base` has the extra DWARF unwind info in the input object files,
`diff` doesn't):

             base           diff           difference (95% CI)
  sys_time   1.128 ± 0.010  1.124 ± 0.023  [  -1.3% ..   +0.6%]
  user_time  7.176 ± 0.030  7.106 ± 0.094  [  -1.5% ..   -0.4%]
  wall_time  7.874 ± 0.041  7.795 ± 0.121  [  -1.7% ..   -0.3%]
  samples    16             25

And for LLD:

             base           diff           difference (95% CI)
  sys_time   1.315 ± 0.019  1.280 ± 0.019  [  -3.2% ..   -2.0%]
  user_time  2.980 ± 0.022  2.822 ± 0.016  [  -5.5% ..   -5.0%]
  wall_time  3.369 ± 0.038  3.175 ± 0.033  [  -6.2% ..   -5.3%]
  samples    47             47

So parsing the extra EH frames is a lot more expensive for us than for
ld64. But given that we are quite a lot faster than ld64 to begin with,
I guess this isn't entirely unexpected...

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D129540
2022-07-13 21:14:05 -04:00
Daniel Bertalan a3f67f0920 [lld-macho] Initial support for Linker Optimization Hints
Linker optimization hints mark a sequence of instructions used for
synthesizing an address, like ADRP+ADD. If the referenced symbol ends up
close enough, it can be replaced by a faster sequence of instructions
like ADR+NOP.

This commit adds support for 2 of the 7 defined ARM64 optimization
hints:
- LOH_ARM64_ADRP_ADD, which transforms a pair of ADRP+ADD into ADR+NOP
  if the referenced address is within +/- 1 MiB
- LOH_ARM64_ADRP_ADRP, which transforms two ADRP instructions into
  ADR+NOP if they reference the same page

These two kinds already cover more than 50% of all LOHs in
chromium_framework.

Differential Review: https://reviews.llvm.org/D128093
2022-06-30 06:28:42 +02:00
Kazu Hirata 757d9d22cd [lld] Use value_or instead of getValueOr (NFC) 2022-06-19 00:29:41 -07:00
Keith Smiley 272bf0fc41
[lld-macho] Add support for exporting no symbols
As an optimization for ld64 sometimes it can be useful to not export any
symbols for top level binaries that don't need any exports, to do this
you can pass `-exported_symbols_list /dev/null`, or new with Xcode 14
(ld64 816) there is a `-no_exported_symbols` flag for the same behavior.
This reproduces this behavior where previously an empty exported symbols
list file would have been ignored.

Differential Revision: https://reviews.llvm.org/D127562
2022-06-15 15:07:27 -07:00
Jez Ng e183bf8e15 [lld-macho][reland] Initial support for EH Frames
This reverts commit 942f4e3a7c.

The additional change required to avoid the assertion errors seen
previously is:

  --- a/lld/MachO/ICF.cpp
  +++ b/lld/MachO/ICF.cpp
  @@ -443,7 +443,9 @@ void macho::foldIdenticalSections() {
                                 /*relocVA=*/0);
           isec->data = copy;
         }
  -    } else {
  +    } else if (!isEhFrameSection(isec)) {
  +      // EH frames are gathered as hashables from unwindEntry above; give a
  +      // unique ID to everything else.
         isec->icfEqClass[0] = ++icfUniqueID;
       }
     }

Differential Revision: https://reviews.llvm.org/D123435
2022-06-13 07:45:16 -04:00
Douglas Yung 942f4e3a7c Revert "[lld-macho] Initial support for EH Frames"
This reverts commit 826be330af.

This was causing a test failure on build bots:
  - https://lab.llvm.org/buildbot/#/builders/36/builds/21770
  - https://lab.llvm.org/buildbot/#/builders/58/builds/23913
2022-06-09 05:25:43 -07:00
Jez Ng 826be330af [lld-macho] Initial support for EH Frames
== Background ==

`llvm-mc` generates unwind info in both compact unwind and DWARF
formats. LLD already handles the compact unwind format; this diff gets
us close to handling the DWARF format properly.

== Caveats ==

It's not quite done yet, but I figure it's worth getting this reviewed
and landed first as it's shaping up to be a fairly large code change.

**Known limitations of the current code:**

* Only works for x86_64, for which `llvm-mc` emits "abs-ified"
  relocations as described in 618def651b.
  `llvm-mc` emits regular relocations for ARM EH frames, which we do not
  yet handle correctly.

Since the feature is not ready for real use yet, I've gated it behind a
flag that only gets toggled on during test suite runs. With most of the
new code disabled, we see just a hint of perf regression, so I don't
think it'd be remiss to land this as-is:

             base           diff           difference (95% CI)
  sys_time   1.926 ± 0.168  1.979 ± 0.117  [  -1.2% ..   +6.6%]
  user_time  3.590 ± 0.033  3.606 ± 0.028  [  +0.0% ..   +0.9%]
  wall_time  7.104 ± 0.184  7.179 ± 0.151  [  -0.2% ..   +2.3%]
  samples    30             31

== Design ==

Like compact unwind entries, EH frames are also represented as regular
ConcatInputSections that get pointed to via `Defined::unwindEntry`. This
allows them to be handled generically by e.g. the MarkLive and ICF
code. (But note that unlike compact unwind subsections, EH frame
subsections do end up in the final binary.)

In order to make EH frames "look like" a regular ConcatInputSection,
some processing is required. First, we need to split the `__eh_frame`
section along EH frame boundaries rather than along symbol boundaries.
We do this by decoding the length field of each EH frame. Second, the
abs-ified relocations need to be turned into regular Relocs.

== Next Steps ==

In order to support EH frames on ARM targets, we will either have to
teach LLD how to handle EH frames with explicit relocs, or we can try to
make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do
the latter as I think it will make the LLD implementation both simpler
and faster to execute.

== Misc ==

The `obj-file-with-stabs.s` test had to be updated as the previous
version would trip assertion errors in the code. It appears that in our
attempt to produce a minimal YAML test input, we created a file with
invalid EH frame data. I've fixed this by re-generating the YAML and not
doing any hand-pruning of it.

Reviewed By: #lld-macho, Roger

Differential Revision: https://reviews.llvm.org/D123435
2022-06-08 23:40:52 -04:00
Vy Nguyen fae6bd7563 [lld-macho] Support -non_global_symbols_strip_list, -non_global_symbols_no_strip_list, -x
PR/55600

Differential Revision: https://reviews.llvm.org/D126046
2022-05-25 19:22:04 +07:00
Nico Weber 895a72111b [lld/mac] Support writing zippered dylibs and bundles
With -platform_version flags for two distinct platforms,
this writes a LC_BUILD_VERSION header for each.

The motivation is that this is needed for self-hosting with lld as linker
after D124059.

To create a zippered output at the clang driver level, pass

    -target arm64-apple-macos -darwin-target-variant arm64-apple-ios-macabi

to create a zippered dylib.

(In Xcode's clang, `-darwin-target-variant` is spelled just `-target-variant`.)

(If you pass `-target arm64-apple-ios-macabi -target-variant arm64-apple-macos`
instead, ld64 crashes!)

This results in two -platform_version flags being passed to the linker.

ld64 also verifies that the iOS SDK version is at least 13.1. We don't do that
yet. But ld64 also does that for other platforms and we don't. So we need to
do that at some point, but not in this patch.

Only dylib and bundle outputs can be zippered.

I verified that a Catalyst app linked against a dylib created with

    clang -shared foo.cc -o libfoo.dylib \
          -target arm64-apple-macos \
          -target-variant arm64-apple-ios-macabi \
          -Wl,-install_name,@rpath/libfoo.dylib \
          -fuse-ld=$PWD/out/gn/bin/ld64.lld

runs successfully. (The app calls a function `f()` in libfoo.dylib
that returns a const char* "foo", and NSLog(@"%s")s it.)

ld64 is a bit more permissive when writing zippered outputs,
see references to "unzippered twins". That's not implemented yet.
(If anybody wants to implement that, D124275 is a good start.)

Differential Revision: https://reviews.llvm.org/D124887
2022-05-04 19:23:35 -04:00
Nikita Popov b8f50abd04 [lld] Remove support for legacy pass manager
This removes options for performing LTO with the legacy pass
manager in LLD. Options that explicitly enable the new pass manager
are retained as no-ops.

Differential Revision: https://reviews.llvm.org/D123219
2022-04-07 10:17:31 +02:00
Nikita Popov ed4e6e0398 [cmake] Remove LLVM_ENABLE_NEW_PASS_MANAGER cmake option
Or rather, error out if it is set to something other than ON. This
removes the ability to enable the legacy pass manager by default,
but does not remove the ability to explicitly enable it through
various flags like -flegacy-pass-manager or -enable-new-pm=0.

I checked, and our test suite definitely doesn't pass with
LLVM_ENABLE_NEW_PASS_MANAGER=OFF anymore.

Differential Revision: https://reviews.llvm.org/D123126
2022-04-06 09:52:21 +02:00
Roger Kim 34b9729561 [lld-macho][NFC] Encapsulate symbol priority implementation.
Just some code clean up.

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D122752
2022-03-31 13:47:38 -04:00
Roger Kim f858fba631 [lld][Macho][NFC] Encapsulate priorities map in a priority class
`config->priorities` has been used to hold the intermediate state during the construction of the order in which sections should be laid out. This is not a good place to hold this state since the intermediate state is not a "configuration" for LLD. It should be encapsulated in a class for building a mapping from section to priority (which I created in this diff as the `PriorityBuilder` class).

The same thing is being done for `config->callGraphProfile`.

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D122156
2022-03-23 13:57:26 -04:00
Jez Ng 850592ec14 [lld-macho] Implement -why_live (without perf overhead)
This was based off @thakis' draft in {D103517}. I employed templates to ensure
the support for `-why_live` wouldn't slow down the regular non-why-live code
path.

No stat sig perf difference on my 3.2 GHz 16-Core Intel Xeon W:

             base           diff           difference (95% CI)
  sys_time   1.195 ± 0.015  1.199 ± 0.022  [  -0.4% ..   +1.0%]
  user_time  3.716 ± 0.022  3.701 ± 0.025  [  -0.7% ..   -0.1%]
  wall_time  4.606 ± 0.034  4.597 ± 0.046  [  -0.6% ..   +0.2%]
  samples    44             37

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D120377
2022-02-24 15:49:36 -05:00
Juergen Ributzka 3025c3eded Replace PlatformKind with PlatformType.
The PlatformKind/PlatformType enums contain the same information, which requires
them to be kept in-sync. This commit changes over to PlatformType as the sole
source of truth, which allows the removal of the redundant PlatformKind.

The majority of the changes were in LLD and TextAPI.

Reviewed By: cishida

Differential Revision: https://reviews.llvm.org/D117163
2022-01-13 09:23:49 -08:00
Leonard Grey 6db04b97e6 [lld-macho] Port CallGraphSort from COFF/ELF
Depends on D112160

This adds the new options `--call-graph-profile-sort` (default),
`--no-call-graph-profile-sort` and `--print-symbol-order=`. If call graph
profile sorting is enabled, reads `__LLVM,__cg_profile` sections from object
files and uses the resulting graph to put callees and callers close to each
other in the final binary via the C3 clustering heuristic.

Differential Revision: https://reviews.llvm.org/D112164
2022-01-12 10:47:04 -05:00
Fangrui Song 477bc36d3b [lld-macho] Change some global pointers to unique_ptr
Similar to D116143. My x86-64 `lld` is ~8KiB smaller.

Reviewed By: keith

Differential Revision: https://reviews.llvm.org/D116902
2022-01-10 19:39:14 -08:00
Vincent Lee adfbb5411b [lld-macho] Add warn flags to enable/disable warnings on -install_name
ld64 doesn't warn on builds using `-install_name` if it's a bundle. But, the
current warning is nice to have because `install_name` only works with dylib.
To prevent an overflow of warnings in build logs and have parity with ld64,
create a `--warn-dylib-install-name` and `--warn-no-dylib-install-name` flag
that enables this LLD specific warning.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D113534
2021-11-17 16:18:14 -08:00
Keith Smiley 6629ec3ecc [lld-macho] Implement -arch_errors_fatal
By default with ld64, architecture mismatches are just warnings, then
this flag can be passed to make these fail. This matches that behavior.

Reviewed By: int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D113082
2021-11-03 22:01:53 -07:00
Jez Ng 6c2f26a159 [lld-macho] -all_load and -ObjC should not affect LC_LINKER_OPTION flags
In particular, they should not cause archives to be eagerly loaded. This
matches ld64's behavior.

Fixes PR52246.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D112756
2021-10-29 11:00:28 -04:00
Vincent Lee d54360cd32 [lld-macho] Implement -S
There are a couple internal builds that require the use of this flag.

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D112594
2021-10-27 17:09:57 -07:00
Vy Nguyen 236197e2d0 [lld-macho] Implement -oso_prefix
https://bugs.llvm.org/show_bug.cgi?id=50229

Differential Revision: https://reviews.llvm.org/D112291
2021-10-22 16:32:42 -04:00
Nico Weber 2d6fb62ef2 [lld/mac] Handle symbols from -U in treatUndefinedSymbol()
In ld64, `-U section$start$FOO$bar` handles `section$start$FOO$bar`
as a regular `section$start` symbol, that is section$start processing
happens before -U processing.

Likely, nobody uses that in practice so it doesn't seem very important
to be compatible with this, but it also moves the -U handling code next
to the `-undefined dynamic_lookup` handling code, which is nice because
they do the same thing. And, in fact, this did identify a bug in a corner
case in the intersection of `-undefined dynamic_lookup` and dead-stripping
(fix for that in D106565).

Vaguely related to PR50760.

No interesting behavior change.

Differential Revision: https://reviews.llvm.org/D106566
2021-07-22 19:43:57 -04:00
Vincent Lee d695d0d6f6 [lld-macho] Optimize bind opcodes with multiple passes
In D105866, we used an intermediate container to store a list of opcodes. Here,
we use that data structure to help us perform optimization passes that would allow
a more efficient encoding of bind opcodes. Currently, the functionality mirrors the
optimization pass {1,2} done in ld64 for bind opcodes under optimization gate
to prevent slight regressions.

Reviewed By: int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D105867
2021-07-15 20:52:46 -07:00
Leonard Grey c931ff72bd [lld-macho] Add LTO cache support
This adds support for the lld-only `--thinlto-cache-policy` option, as well as
implementations for ld64's `-cache_path_lto`, `-prune_interval_lto`,
`-prune_after_lto`, and `-max_relative_cache_size_lto`.

Test is adapted from lld/test/ELF/lto/cache.ll

Differential Revision: https://reviews.llvm.org/D105922
2021-07-15 12:56:13 -04:00
Nico Weber f21801dab2 [lld/mac] Implement -application_extension
Differential Revision: https://reviews.llvm.org/D105818
2021-07-12 13:42:16 -04:00
Nico Weber 3eb2fc4b50 [lld/mac] Partially implement -export_dynamic
This implements the part of -export_dynamic that adds external
symbols as dead strip roots even for executables.

It does not yet implement the effect -export_dynamic has for LTO.
I tried just replacing `config->outputType != MH_EXECUTE` with
`(config->outputType != MH_EXECUTE || config->exportDynamic)` in
LTO.cpp, but then local symbols make it into the symbol table too,
which is too much (and also doesn't match ld64). So punt on this
for now until I understand it better.
(D91583 may or may not be related too).

Differential Revision: https://reviews.llvm.org/D105482
2021-07-06 11:22:18 -04:00
Nico Weber 64be5b7d87 [lld/mac] Implement -arch_multiple
This is the other flag clang passes when calling clang with two -arch
flags (which means with this, `clang -arch x86_64 -arch arm64 -fuse-ld=lld ...`
now no longer prints any warnings \o/). Since clang calls the linker several
times in that setup, it's not clear to the user from which invocation the
errors are. The flag's help text is

    Specifies that the linker should augment error and warning messages
    with the architecture name.

In ld64, the only effect of the flag is that undefined symbols are prefaced
with

    Undefined symbols for architecture x86_64:

instead of the usual "Undefined symbols:". So for now, let's add this
only to undefined symbol errors too. That's probably the most common
linker diagnostic.

Another idea would be to prefix errors and warnings with "ld64.lld(x86_64):"
instead of the usual "ld64.lld:", but I'm not sure if people would
misunderstand that as a comment about the arch of ld itself.
But open to suggestions on what effect this flag should have :) And we
don't have to get it perfect now, we can iterate on it.

Differential Revision: https://reviews.llvm.org/D105450
2021-07-06 00:25:18 -04:00
Nico Weber 2c25f39fcc [lld/mac] Implement -final_output
This is one of two flags clang passes to the linker when giving calling
clang with multiple -arch flags.

I think it'd make sense to also use finalOutput instead of outputFile
in CodeSignatureSection() and when replacing @executable_path, but
ld64 doesn't do that, so I'll at least put those in separate commits.

Differential Revision: https://reviews.llvm.org/D105449
2021-07-05 20:06:26 -04:00
Nico Weber db64306d99 [lld/mac] Implement -umbrella
I think this is an old way for doing what is done with
-reexport_library these days, but it's e.g. still used in libunwind's
build (the opensource.apple.com one, not the llvm one).

Differential Revision: https://reviews.llvm.org/D105448
2021-07-05 20:06:25 -04:00
Leonard Grey fe08e9c487 [lld-macho] Add support for LTO optimization level
Everything (including test) modified from ELF/COFF. Using the same syntax
(--lto-O3, etc) as ELF.

Differential Revision: https://reviews.llvm.org/D105223
2021-07-01 15:01:59 -04:00
Greg McGary f27e4548fc [lld-macho] Implement ICF
ICF = Identical C(ode|OMDAT) Folding

This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs.

`check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64.

We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`.

Here is the perf impact for linking `chromium_framekwork` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF:
```
    N           Min           Max        Median           Avg        Stddev
x  20          4.27          4.44          4.34         4.349   0.043029977
+  20          4.37          4.46         4.405        4.4115   0.025188761
Difference at 95.0% confidence
        0.0625 +/- 0.0225658
        1.43711% +/- 0.518873%
        (Student's t, pooled s = 0.0352566)
```

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D103292
2021-06-17 10:07:44 -07:00
Nico Weber b579938d40 [lld/mac] Add support for -no_data_in_code_info flag
Differential Revision: https://reviews.llvm.org/D104345
2021-06-16 06:40:42 -04:00
Nico Weber e87c095af3 [lld/mac] Print dylib search details with --print-dylib-search or RC_TRACE_DYLIB_SEARCHING
For debugging dylib loading, it's useful to have some insight into what
the linker is doing.

ld64 has the undocumented RC_TRACE_DYLIB_SEARCHING env var
for this printing dylib search candidates.

This adds a flag --print-dylib-search to make lld print the seame information.
It's useful for users, but also for writing tests. The output is formatted
slightly differently than ld64, but we still support RC_TRACE_DYLIB_SEARCHING
to offer at least a compatible way to trigger this.

ld64 has both `-print_statistics` and `-trace_symbol_output` to enable
diagnostics output. I went with "print" since that seems like a more
straightforward name.

Differential Revision: https://reviews.llvm.org/D103985
2021-06-09 22:08:20 -04:00
Jez Ng 447dfbe005 [lld-macho] Implement -force_load_swift_libs
It causes libraries whose names start with "swift" to be force-loaded.
Note that unlike the more general `-force_load`, this flag only applies
to libraries specified via LC_LINKER_OPTIONS, and not those passed on
the command-line. This is what ld64 does.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D103709
2021-06-07 23:48:35 -04:00
Jez Ng 04259cde15 [lld-macho] Implement cstring deduplication
Our implementation draws heavily from LLD-ELF's, which in turn delegates
its string deduplication to llvm-mc's StringTableBuilder. The messiness of
this diff is largely due to the fact that we've previously assumed that
all InputSections get concatenated together to form the output. This is
no longer true with CStringInputSections, which split their contents into
StringPieces. StringPieces are much more lightweight than InputSections,
which is important as we create a lot of them. They may also overlap in
the output, which makes it possible for strings to be tail-merged. In
fact, the initial version of this diff implemented tail merging, but
I've dropped it for reasons I'll explain later.

**Alignment Issues**

Mergeable cstring literals are found under the `__TEXT,__cstring`
section. In contrast to ELF, which puts strings that need different
alignments into different sections, clang's Mach-O backend puts them all
in one section. Strings that need to be aligned have the `.p2align`
directive emitted before them, which simply translates into zero padding
in the object file.

I *think* ld64 extracts the desired per-string alignment from this data
by preserving each string's offset from the last section-aligned
address. I'm not entirely certain since it doesn't seem consistent about
doing this; but perhaps this can be chalked up to cases where ld64 has
to deduplicate strings with different offset/alignment combos -- it
seems to pick one of their alignments to preserve. This doesn't seem
correct in general; we can in fact can induce ld64 to produce a crashing
binary just by linking in an additional object file that only contains
cstrings and no code. See PR50563 for details.

Moreover, this scheme seems rather inefficient: since unaligned and
aligned strings are all put in the same section, which has a single
alignment value, it doesn't seem possible to tell whether a given string
doesn't have any alignment requirements. Preserving offset+alignments
for strings that don't need it is wasteful.

In practice, the crashes seen so far seem to stem from x86_64 SIMD
operations on cstrings. X86_64 requires SIMD accesses to be
16-byte-aligned. So for now, I'm thinking of just aligning all strings
to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise
it's simpler than preserving per-string alignment+offsets. It also
avoids the aforementioned crash after deduplication of
differently-aligned strings. Finally, the overhead is not huge: using
16-byte alignment (vs no alignment) is only a 0.5% size overhead when
linking chromium_framework.

With these alignment requirements, it doesn't make sense to attempt tail
merging -- most strings will not be eligible since their overlaps aren't
likely to start at a 16-byte boundary. Tail-merging (with alignment) for
chromium_framework only improves size by 0.3%.

It's worth noting that LLD-ELF only does tail merging at `-O2`. By
default (at `-O1`), it just deduplicates w/o tail merging. @thakis has
also mentioned that they saw it regress compressed size in some cases
and therefore turned it off. `ld64` does not seem to do tail merging at
all.

**Performance Numbers**

CString deduplication reduces chromium_framework from 250MB to 242MB, or
about a 3.2% reduction.

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.99          4.14         4.015        4.0365     0.0492336
  Difference at 95.0% confidence
          0.0865 +/- 0.027245
          2.18987% +/- 0.689746%
          (Student's t, pooled s = 0.0425673)

As expected, cstring merging incurs some non-trivial overhead.

When passing `--no-literal-merge`, it seems that performance is the
same, i.e. the refactoring in this diff didn't cost us.

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.89          4.02         3.935        3.9435   0.043197831
  No difference proven at 95.0% confidence

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D102964
2021-06-07 23:48:35 -04:00
Alexander Shaposhnikov 1309c181a8 [lld][MachO] Add first bits to support special symbols
This diff adds first bits to support special symbols $ld$previous* in LLD.
$ld$* symbols modify properties/behavior of the library
(e.g. its install name, compatibility version or hide/add symbols)
for specific target versions.

Test plan: make check-lld-macho

Differential revision: https://reviews.llvm.org/D103505
2021-06-04 23:32:26 -07:00
Nico Weber a5645513db [lld/mac] Implement -dead_strip
Also adds support for live_support sections, no_dead_strip sections,
.no_dead_strip symbols.

Chromium Framework 345MB unstripped -> 250MB stripped
(vs 290MB unstripped -> 236M stripped with ld64).

Doing dead stripping is a bit faster than not, because so much less
data needs to be processed:

    % ministat lld_*
    x lld_nostrip.txt
    + lld_strip.txt
        N           Min           Max        Median           Avg        Stddev
    x  10      3.929414       4.07692     4.0269079     4.0089678   0.044214794
    +  10     3.8129408     3.9025559     3.8670411     3.8642573   0.024779651
    Difference at 95.0% confidence
            -0.144711 +/- 0.0336749
            -3.60967% +/- 0.839989%
            (Student's t, pooled s = 0.0358398)

This interacts with many parts of the linker. I tried to add test coverage
for all added `isLive()` checks, so that some test will fail if any of them
is removed. I checked that the test expectations for the most part match
ld64's behavior (except for live-support-iterations.s, see the comment
in the test). Interacts with:
- debug info
- export tries
- import opcodes
- flags like -exported_symbol(s_list)
- -U / dynamic_lookup
- mod_init_funcs, mod_term_funcs
- weak symbol handling
- unwind info
- stubs
- map files
- -sectcreate
- undefined, dylib, common, defined (both absolute and normal) symbols

It's possible it interacts with more features I didn't think of,
of course.

I also did some manual testing:
- check-llvm check-clang check-lld work with lld with this patch
  as host linker and -dead_strip enabled
- Chromium still starts
- Chromium's base_unittests still pass, including unwind tests

Implemenation-wise, this is InputSection-based, so it'll work for
object files with .subsections_via_symbols (which includes all
object files generated by clang). I first based this on the COFF
implementation, but later realized that things are more similar to ELF.
I think it'd be good to refactor MarkLive.cpp to look more like the ELF
part at some point, but I'd like to get a working state checked in first.

Mechanical parts:
- Rename canOmitFromOutput to wasCoalesced (no behavior change)
  since it really is for weak coalesced symbols
- Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP
  (`.no_dead_strip` in asm)

Fixes PR49276.

Differential Revision: https://reviews.llvm.org/D103324
2021-06-02 11:09:26 -04:00
Nico Weber 2c1903412b [lld/mac] Implement removal of unused dylibs
This omits load commands for unreferenced dylibs if:
- the dylib was loaded implicitly,
- it is marked MH_DEAD_STRIPPABLE_DYLIB
- or -dead_strip_dylibs is passed

This matches ld64.

Currently, the "is dylib referenced" state is computed before dead code
stripping and is not updated after dead code stripping. This too matches ld64.
We should do better here.

With this, clang-format linked with lld (like with ld64) no longer has
libobjc.A.dylib in `otool -L` output. (It was implicitly loaded as a reexport
of CoreFoundation.framework, but it's not needed.)

Differential Revision: https://reviews.llvm.org/D103430
2021-06-01 16:06:30 -04:00
Nico Weber 0b39f055d8 [lld/mac] Don't write mtimes to N_OSO entries if ZERO_AR_DATE is set.
This is important for build determinism. This matches ld64.

Differential Revision: https://reviews.llvm.org/D103446
2021-06-01 15:29:38 -04:00
Nico Weber 9ab49ae55d [lld/mac] Implement -sectalign
clang sometimes passes this flag along (see D68351), so we should implement it.

Differential Revision: https://reviews.llvm.org/D102247
2021-05-11 13:31:32 -04:00
Jez Ng 0f8854f7f5 [lld-macho] Don't reference entry symbol for non-executables
This would cause us to pull in symbols (and code) that should
be unused.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D102137
2021-05-09 20:30:26 -04:00
Greg McGary 27b426b0c8 [lld-macho] Implement builtin section renaming
ld64 automatically renames many sections depending on output type and assorted flags. Here, we implement the most common configs. We can add more obscure flags and behaviors as needed.

Depends on D101393

Differential Revision: https://reviews.llvm.org/D101395
2021-05-03 21:26:51 -07:00
Jez Ng ed4a4e3312 [lld-macho][nfc] Add accessors for commonly-used PlatformInfo fields
As discussed here: https://reviews.llvm.org/D100523#inline-951543

Reviewed By: #lld-macho, thakis, alexshap

Differential Revision: https://reviews.llvm.org/D100978
2021-04-21 15:43:56 -04:00
Jez Ng ab9c21bbab [lld-macho] Support LC_ENCRYPTION_INFO
This load command records a range spanning from the end of the load
commands to the end of the `__TEXT` segment. Presumably the kernel will encrypt
all this data.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D100973
2021-04-21 13:39:56 -04:00