Commit Graph

260 Commits

Author SHA1 Message Date
Vy Nguyen c7c5a1c9ae [lld-macho] Ignore debug symbols while preparing relocations.
Details: see https://bugs.llvm.org/show_bug.cgi?id=50812

Differential Revision: https://reviews.llvm.org/D105210
2021-07-02 13:51:46 -04:00
Jez Ng f6b6e72143 [lld-macho] Factor out common InputSection members
We have been creating many ConcatInputSections with identical values due
to .subsections_via_symbols. This diff factors out the identical values
into a Shared struct, to reduce memory consumption and make copying
cheaper.

I also changed `callSiteCount` from a uint32_t to a 31-bit field to save an
extra word.

All in all, this takes InputSection from 120 to 72 bytes (and
ConcatInputSection from 160 to 112 bytes), i.e. 30% size reduction in
ConcatInputSection.

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          4.14          4.24          4.18         4.183   0.027548999
  +  20          4.04          4.11         4.075        4.0775   0.018027756
  Difference at 95.0% confidence
          -0.1055 +/- 0.0149005
          -2.52211% +/- 0.356215%
          (Student's t, pooled s = 0.0232803)

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D105305
2021-07-01 21:22:39 -04:00
Jez Ng ac2dd06b91 [lld-macho] Deduplicate CFStrings
`__cfstring` is a special literal section, so instead of breaking it up
at symbol boundaries, we break it up at fixed-width boundaries (since
each literal is the same size). Symbols can only occur at one of those
boundaries, so this is strictly more powerful than
`.subsections_via_symbols`.

With that in place, we then run the section through ICF.

This change is about perf-neutral when linking chromium_framework.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D105045
2021-07-01 21:22:38 -04:00
Nico Weber aed0a08c69 [lld/mac] Make symbol table order deterministic
SymtabSection::emitStabs() writes the symbol table in the order
of externalSymbols, which has the order of symtab->getSymbols(),
which is just the order symbols are added to the symbol table.

In practice, symbols in the symbol files of input .o files are
sorted, but since that's not guaranteed we sort them in
ObjFile::parseSymbols(). To make sure several symbols with the same
address keep the order they're in the input file, we have to use
stable_sort().

In practice, std::sort() on already-sorted inputs won't change the order
of just adjacent elements, and while in theory std::sort() could use a
random pivot, in practice the code should be deterministic as it was
previously too.

But now lld/test/MachO/stabs.s passes with LLVM_ENABLE_EXPENSIVE_CHECKS=ON
(the last test that was failing with that set).

Fixes a regression from D99972.

While here, remove an empty section in stabs.s and move
.subsections_via_symbols to the end where it usually is (this part no
behavior change).

Differential Revision: https://reviews.llvm.org/D105071
2021-06-29 09:29:49 -04:00
Leonard Grey a8a6e5b094 [lld-macho] Preserve alignment for non-deduplicated cstrings
Fixes PR50637.

Downstream bug: https://crbug.com/1218958

Currently, we split __cstring along symbol boundaries with .subsections_via_symbols
when not deduplicating, and along null bytes when deduplicating. This change splits
along null bytes unconditionally, and preserves original alignment in the non-
deduplicated case.

Removing subsections-section-relocs.s because with this change, __cstring
is never reordered based on the order file.

Differential Revision: https://reviews.llvm.org/D104919
2021-06-28 22:26:43 -04:00
Jez Ng 4c49f9ceaf [lld-macho] Handle non-extern symbols marked as private extern
Previously, we asserted that such a case was invalid, but in fact
`ld -r` can emit such symbols if the input contained a (true) private
extern, or if it contained a symbol started with "L".

Non-extern symbols marked as private extern are essentially equivalent
to regular TU-scoped symbols, so no new functionality is needed.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D104502
2021-06-18 16:36:14 -04:00
Muhammad Omair Javaid 9777f3fd06 Fix build failure on 32 bit Arm
This patch fixes build failure caused by commit
f27e4548fc on 32 bit arm.

Differential Revision: https://reviews.llvm.org/D103292
2021-06-18 15:27:09 +00:00
Greg McGary f27e4548fc [lld-macho] Implement ICF
ICF = Identical C(ode|OMDAT) Folding

This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs.

`check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64.

We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`.

Here is the perf impact for linking `chromium_framekwork` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF:
```
    N           Min           Max        Median           Avg        Stddev
x  20          4.27          4.44          4.34         4.349   0.043029977
+  20          4.37          4.46         4.405        4.4115   0.025188761
Difference at 95.0% confidence
        0.0625 +/- 0.0225658
        1.43711% +/- 0.518873%
        (Student's t, pooled s = 0.0352566)
```

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D103292
2021-06-17 10:07:44 -07:00
Jez Ng eeac6b2bec [lld-macho] Handle multiple LC_LINKER_OPTIONs
We previously only parsed the first one.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D104352
2021-06-16 15:23:06 -04:00
Jez Ng d52d1b93c3 [lld-macho] Downgrade version mismatch to warning
It's a warning in ld64. While having LLD be stricter would be nice, it
makes it harder for it to be a drop-in replacement into existing builds.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D104333
2021-06-16 11:06:26 -04:00
Nico Weber b579938d40 [lld/mac] Add support for -no_data_in_code_info flag
Differential Revision: https://reviews.llvm.org/D104345
2021-06-16 06:40:42 -04:00
Alexander Shaposhnikov 928394d109 [lld][MachO] Add support for LC_DATA_IN_CODE
Add first bits for emitting LC_DATA_IN_CODE.

Test plan: make check-lld-macho

Differential revision: https://reviews.llvm.org/D103006
2021-06-14 19:21:59 -07:00
Jez Ng cc17bfe489 [lld-macho] Fix "shift exponent too large" UBSAN error
UBSAN seems to have added this check somewhere along the way...

This might also fix the PPC buildbot, which is failing on the same test
2021-06-14 13:47:25 -04:00
Jez Ng 681cfeb591 [lld-macho][nfc] Have InputSection ctors take some parameters
This is motivated by an upcoming diff in which the
WordLiteralInputSection ctor sets itself up based on the value of its
section flags. As such, it needs to be passed the `flags` value as part
of its ctor parameters, instead of having them assigned after the fact
in `parseSection()`. While refactoring code to make that possible, I
figured it would make sense for the other InputSections to also take
their initial values as ctor parameters.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D103978
2021-06-11 19:50:09 -04:00
Jez Ng 5d88f2dd94 [lld-macho] Deduplicate fixed-width literals
Conceptually, the implementation is pretty straightforward: we put each
literal value into a hashtable, and then write out the keys of that
hashtable at the end.

In contrast with ELF, the Mach-O format does not support variable-length
literals that aren't strings. Its literals are either 4, 8, or 16 bytes
in length. LLD-ELF dedups its literals via sorting + uniq'ing, but since
we don't need to worry about overly-long values, we should be able to do
a faster job by just hashing.

That said, the implementation right now is far from optimal, because we
add to those hashtables serially. To parallelize this, we'll need a
basic concurrent hashtable (only needs to support concurrent writes w/o
interleave reads), which shouldn't be to hard to implement, but I'd like
to punt on it for now.

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          4.27          4.39         4.315        4.3225   0.033225703
  +  20          4.36          4.82          4.44        4.4845    0.13152846
  Difference at 95.0% confidence
          0.162 +/- 0.0613971
          3.74783% +/- 1.42041%
          (Student's t, pooled s = 0.0959262)

This corresponds to binary size savings of 2MB out of 335MB, or 0.6%.
It's not a great tradeoff as-is, but as mentioned our implementation can
be signficantly optimized, and literal dedup will unlock more
opportunities for ICF to identify identical structures that reference
the same literals.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D103113
2021-06-11 19:50:08 -04:00
Nico Weber 0e399eb527 [lld/mac] When handling @loader_path, use realpath() of symlinks
This is important for Frameworks, which are usually symlinks.

ld64 gets this right for @rpath that's replaced with @loader_path, but not for
bare @loader_path -- ld64's code calls realpath() in that case too, but ignores
the result.

ld64 somehow manages to find libbar1.dylib in the test without the
explicit `-rpath` in Foo1. I don't understand why or how. But this
change is a step forward and fixes an immediate problem I'm having,
so let's start with this :)

Differential Revision: https://reviews.llvm.org/D103990
2021-06-09 20:36:07 -04:00
Jez Ng 04259cde15 [lld-macho] Implement cstring deduplication
Our implementation draws heavily from LLD-ELF's, which in turn delegates
its string deduplication to llvm-mc's StringTableBuilder. The messiness of
this diff is largely due to the fact that we've previously assumed that
all InputSections get concatenated together to form the output. This is
no longer true with CStringInputSections, which split their contents into
StringPieces. StringPieces are much more lightweight than InputSections,
which is important as we create a lot of them. They may also overlap in
the output, which makes it possible for strings to be tail-merged. In
fact, the initial version of this diff implemented tail merging, but
I've dropped it for reasons I'll explain later.

**Alignment Issues**

Mergeable cstring literals are found under the `__TEXT,__cstring`
section. In contrast to ELF, which puts strings that need different
alignments into different sections, clang's Mach-O backend puts them all
in one section. Strings that need to be aligned have the `.p2align`
directive emitted before them, which simply translates into zero padding
in the object file.

I *think* ld64 extracts the desired per-string alignment from this data
by preserving each string's offset from the last section-aligned
address. I'm not entirely certain since it doesn't seem consistent about
doing this; but perhaps this can be chalked up to cases where ld64 has
to deduplicate strings with different offset/alignment combos -- it
seems to pick one of their alignments to preserve. This doesn't seem
correct in general; we can in fact can induce ld64 to produce a crashing
binary just by linking in an additional object file that only contains
cstrings and no code. See PR50563 for details.

Moreover, this scheme seems rather inefficient: since unaligned and
aligned strings are all put in the same section, which has a single
alignment value, it doesn't seem possible to tell whether a given string
doesn't have any alignment requirements. Preserving offset+alignments
for strings that don't need it is wasteful.

In practice, the crashes seen so far seem to stem from x86_64 SIMD
operations on cstrings. X86_64 requires SIMD accesses to be
16-byte-aligned. So for now, I'm thinking of just aligning all strings
to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise
it's simpler than preserving per-string alignment+offsets. It also
avoids the aforementioned crash after deduplication of
differently-aligned strings. Finally, the overhead is not huge: using
16-byte alignment (vs no alignment) is only a 0.5% size overhead when
linking chromium_framework.

With these alignment requirements, it doesn't make sense to attempt tail
merging -- most strings will not be eligible since their overlaps aren't
likely to start at a 16-byte boundary. Tail-merging (with alignment) for
chromium_framework only improves size by 0.3%.

It's worth noting that LLD-ELF only does tail merging at `-O2`. By
default (at `-O1`), it just deduplicates w/o tail merging. @thakis has
also mentioned that they saw it regress compressed size in some cases
and therefore turned it off. `ld64` does not seem to do tail merging at
all.

**Performance Numbers**

CString deduplication reduces chromium_framework from 250MB to 242MB, or
about a 3.2% reduction.

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.99          4.14         4.015        4.0365     0.0492336
  Difference at 95.0% confidence
          0.0865 +/- 0.027245
          2.18987% +/- 0.689746%
          (Student's t, pooled s = 0.0425673)

As expected, cstring merging incurs some non-trivial overhead.

When passing `--no-literal-merge`, it seems that performance is the
same, i.e. the refactoring in this diff didn't cost us.

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.89          4.02         3.935        3.9435   0.043197831
  No difference proven at 95.0% confidence

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D102964
2021-06-07 23:48:35 -04:00
Nico Weber 17c43c4045 [lld/mac] Add reexports after reexporter to inputFiles
When a library "host"'s reexports change their installName with
`$ld$os10.11$install_name$host`, we used to write a load command for "host" but
write the version numbers of the reexport instead of "host". This fixes that.

I first thought that the rule is to take the version numbers from the library
that originally had that install name (implemented in D103819), but that's not
what ld64 seems to be doing: It takes the version number from the first dylib
with that install name it loads, and it loads the reexporting library before
the reexports. We already did most of that, we just added reexports before the
reexporter. After this change, we add the reexporter before the reexports.

Addresses https://bugs.llvm.org/show_bug.cgi?id=49800#c11 part 1.

(ld64 seems to add reexports after processing _all_ files on the command line,
while we add them right after the reexporter. For the common case of reexport +
$ld$ symbol changing back to the exporter name, this doesn't make a difference,
but you can construct a case where it does. I expect this to not make a
difference in practice though.)

Differential Revision: https://reviews.llvm.org/D103821
2021-06-07 17:04:03 -04:00
Nico Weber c5ffe97988 [lld/mac] Implement support for searching dylibs with @rpath/ in install name
Also adjust a few comments, and move the DylibFile comment talking about
umbrella next to the parameter again.

Differential Revision: https://reviews.llvm.org/D103783
2021-06-07 06:22:52 -04:00
Nico Weber 52489021cf [lld/mac] Implement support for searching dylibs with @loader_path/ in install name
Differential Revision: https://reviews.llvm.org/D103779
2021-06-06 20:19:50 -04:00
Nico Weber a48bd587f7 [lld/mac] Implement support for searching dylibs with @executable_path/ in install name
Differential Revision: https://reviews.llvm.org/D103775
2021-06-06 20:01:50 -04:00
Nico Weber 7def700667 [lld/mac] Rename DylibFile::dylibName to DylibFile::installName
The flag to set it is called `-install_name`, and it's called `installName` in tbd files.

No behavior change.

Differential Revision: https://reviews.llvm.org/D103776
2021-06-06 20:00:35 -04:00
Nico Weber e910437443 [lld/mac] Use fewer magic numbers in magic $ld$ handling code
Also simply a conditional and de-alias a variable.
Minor cleanups, no behavior change.

Differential Revision: https://reviews.llvm.org/D103774
2021-06-06 18:13:16 -04:00
Alexander Shaposhnikov 5e49ee8794 [lld][MachO] Add support for $ld$install_name symbols
This diff adds support for $ld$install_name symbols.

Test plan: make check-lld-macho

Differential revision: https://reviews.llvm.org/D103746
2021-06-05 12:58:59 -07:00
Alexander Shaposhnikov 1309c181a8 [lld][MachO] Add first bits to support special symbols
This diff adds first bits to support special symbols $ld$previous* in LLD.
$ld$* symbols modify properties/behavior of the library
(e.g. its install name, compatibility version or hide/add symbols)
for specific target versions.

Test plan: make check-lld-macho

Differential revision: https://reviews.llvm.org/D103505
2021-06-04 23:32:26 -07:00
Jez Ng 6881f29a36 [lld-macho] Parse re-exports of nested TAPI documents
D103423 neglected to call `parseReexports()` for nested TBD
documents, leading to symbol resolution failures when trying to look up
a symbol nested more than one level deep in a TBD file. This fixes the
regression and adds a test.

It also appears that `umbrella` wasn't being set properly when calling
`parseLoadCommands` -- it's supposed to resolve to `this` if `nullptr`
is passed. I didn't write a failing test case for this but I've made
`umbrella` a member so the previous behavior should be preserved.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D103586
2021-06-03 12:02:30 -04:00
Nico Weber a5645513db [lld/mac] Implement -dead_strip
Also adds support for live_support sections, no_dead_strip sections,
.no_dead_strip symbols.

Chromium Framework 345MB unstripped -> 250MB stripped
(vs 290MB unstripped -> 236M stripped with ld64).

Doing dead stripping is a bit faster than not, because so much less
data needs to be processed:

    % ministat lld_*
    x lld_nostrip.txt
    + lld_strip.txt
        N           Min           Max        Median           Avg        Stddev
    x  10      3.929414       4.07692     4.0269079     4.0089678   0.044214794
    +  10     3.8129408     3.9025559     3.8670411     3.8642573   0.024779651
    Difference at 95.0% confidence
            -0.144711 +/- 0.0336749
            -3.60967% +/- 0.839989%
            (Student's t, pooled s = 0.0358398)

This interacts with many parts of the linker. I tried to add test coverage
for all added `isLive()` checks, so that some test will fail if any of them
is removed. I checked that the test expectations for the most part match
ld64's behavior (except for live-support-iterations.s, see the comment
in the test). Interacts with:
- debug info
- export tries
- import opcodes
- flags like -exported_symbol(s_list)
- -U / dynamic_lookup
- mod_init_funcs, mod_term_funcs
- weak symbol handling
- unwind info
- stubs
- map files
- -sectcreate
- undefined, dylib, common, defined (both absolute and normal) symbols

It's possible it interacts with more features I didn't think of,
of course.

I also did some manual testing:
- check-llvm check-clang check-lld work with lld with this patch
  as host linker and -dead_strip enabled
- Chromium still starts
- Chromium's base_unittests still pass, including unwind tests

Implemenation-wise, this is InputSection-based, so it'll work for
object files with .subsections_via_symbols (which includes all
object files generated by clang). I first based this on the COFF
implementation, but later realized that things are more similar to ELF.
I think it'd be good to refactor MarkLive.cpp to look more like the ELF
part at some point, but I'd like to get a working state checked in first.

Mechanical parts:
- Rename canOmitFromOutput to wasCoalesced (no behavior change)
  since it really is for weak coalesced symbols
- Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP
  (`.no_dead_strip` in asm)

Fixes PR49276.

Differential Revision: https://reviews.llvm.org/D103324
2021-06-02 11:09:26 -04:00
Nico Weber 476e4d65d4 [lld/mac] Address review feedback and improve a comment
I forgot to move the message() call around as requested in D103428
before committing that change. Move it now.

Also, improve the ordinal uniq'ing comment. I hadn't realized that the
distinct-but-identical files happen with --reproduce and not in general.

No behavior change.

Differential Revision: https://reviews.llvm.org/D103522
2021-06-02 10:54:53 -04:00
Vy Nguyen 8f89c054af [lld-macho][nfc] Remove unnecessary use of Optional<T*>
In all of these cases, the functions could simply return a nullptr instead of {}.
There is no case where Optional<nullptr> has a special meaning.

Differential Revision: https://reviews.llvm.org/D103489
2021-06-01 18:35:31 -04:00
Nico Weber 2c1903412b [lld/mac] Implement removal of unused dylibs
This omits load commands for unreferenced dylibs if:
- the dylib was loaded implicitly,
- it is marked MH_DEAD_STRIPPABLE_DYLIB
- or -dead_strip_dylibs is passed

This matches ld64.

Currently, the "is dylib referenced" state is computed before dead code
stripping and is not updated after dead code stripping. This too matches ld64.
We should do better here.

With this, clang-format linked with lld (like with ld64) no longer has
libobjc.A.dylib in `otool -L` output. (It was implicitly loaded as a reexport
of CoreFoundation.framework, but it's not needed.)

Differential Revision: https://reviews.llvm.org/D103430
2021-06-01 16:06:30 -04:00
Nico Weber 24979e1113 [lld/mac] Don't load DylibFiles from the DylibFile constructor
loadDylib() keeps a name->DylibFile cache, but it only writes
to the cache once the DylibFile constructor has completed.
So dylib loads done recursively from the DylibFile constructor
wouldn't use the cache.

Now, we load additional dylibs after writing to the cache,
which means the cache now gets used for dylibs loaded because
they're referenced from other dylibs.

Related to PR49514 and PR50101, but no dramatic behavior change in itself.
(Technically we no longer crash when a tbd file reexports itself,
but that doesn't happen in practice. We now accept it silently instead
of crashing; ld64 has a diag for the reexport cycle.)

Differential Revision: https://reviews.llvm.org/D103423
2021-06-01 15:31:02 -04:00
Jez Ng 8535834ef7 [lld-macho][nfc] Misc code cleanup
* Move `static_asserts` into cpp instead of header file. I noticed they
  had been separated from the main class definition in the header, so I
  set about to clean that up, then figured it made more sense as part of
  the cpp file so as not to incur unnecessary compile-time overhead.

* Remove unnecessary `virtual`s

* Remove unnecessary comment / reword another comment
2021-05-25 14:58:29 -04:00
Alexander Shaposhnikov 57501e512e [lld][MachO] Fix code formatting
Apply clang-format -style=llvm to InputFile.cpp. NFC.

Test plan: make check-all
2021-05-23 20:35:55 -07:00
Nico Weber 4a12248ee2 [lld/mac] Honor REFERENCED_DYAMICALLY, set it on __mh_execute_header
Has the effect that `__mh_execute_header` stays in the symbol table of
outputs even after running `strip` on the output. I don't know if that's
important for anything -- my motivation for the patch is just is to make
the output more similar to ld64.

(Corresponds to symbolTableInAndNeverStrip in ld64.)

Differential Revision: https://reviews.llvm.org/D102619
2021-05-17 14:22:12 -04:00
Jez Ng b1c3c2e4fc [lld-macho] Fix order file arch filtering
We had a hardcoded check and a stale TODO, written back when we only had
support for one architecture.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D102154
2021-05-10 15:45:54 -04:00
Nico Weber 7f673fcaa9 [lld/mac] Fix alignment on subsections
On a section with alignment of 16, subsections aligned to 16-byte
boundaries should keep their 16-byte alignment.

Fixes PR50274. (The same bug could have happened with -order_file
previously.)

Differential Revision: https://reviews.llvm.org/D102139
2021-05-09 21:00:56 -04:00
Nico Weber d5a70db193 [lld/mac] Write every weak symbol only once in the output
Before this, if an inline function was defined in several input files,
lld would write each copy of the inline function the output. With this
patch, it only writes one copy.

Reduces the size of Chromium Framework from 378MB to 345MB (compared
to 290MB linked with ld64, which also does dead-stripping, which we
don't do yet), and makes linking it faster:

        N           Min           Max        Median           Avg        Stddev
    x  10     3.9957051     4.3496981     4.1411121      4.156837    0.10092097
    +  10      3.908154      4.169318     3.9712729     3.9846753   0.075773012
    Difference at 95.0% confidence
            -0.172162 +/- 0.083847
            -4.14165% +/- 2.01709%
            (Student's t, pooled s = 0.0892373)

Implementation-wise, when merging two weak symbols, this sets a
"canOmitFromOutput" on the InputSection belonging to the weak symbol not put in
the symbol table. We then don't write InputSections that have this set, as long
as they are not referenced from other symbols. (This happens e.g. for object
files that don't set .subsections_via_symbols or that use .alt_entry.)

Some restrictions:
- not yet done for bitcode inputs
- no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) --
  Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs)
  (that is, catch block unwind information) and Personality Routines
  associated with weak functions still not stripped. This is wasteful,
  but harmless.
- However, this does strip weaks from __unwind_info (which is needed for
  correctness and not just for size)
- This nopes out on InputSections that are referenced form more than
  one symbol (eg from .alt_entry) for now

Things that work based on symbols Just Work:
- map files (change in MapFile.cpp is no-op and not needed; I just
  found it a bit more explicit)
- exports

Things that work with inputSections need to explicitly check if
an inputSection is written (e.g. unwind info).

This patch is useful in itself, but it's also likely also a useful foundation
for dead_strip.

I used to have a "canoncialRepresentative" pointer on InputSection instead of
just the bool, which would be handy for ICF too. But I ended up not needing it
for this patch, so I removed that again for now.

Differential Revision: https://reviews.llvm.org/D102076
2021-05-07 17:11:40 -04:00
Jez Ng 9260760235 [lld-macho] Support loading of zippered dylibs
ld64 can emit dylibs that support more than one platform (typically macOS and
macCatalyst). This diff allows LLD to read in those dylibs. Note that this is a
super bare-bones implementation -- in particular, I haven't added support for
LLD to emit those multi-platform dylibs, nor have I added a variety of
validation checks that ld64 does. Until we have a use-case for emitting zippered
dylibs, I think this is good enough.

Fixes PR49597.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D101954
2021-05-06 11:19:40 -04:00
Vy Nguyen 23233ad139 [lld-macho] Check simulator platforms to avoid issuing false positive errors.
Currently the linker causes unnecessary errors when either the target or the config's platform is a simulator.

Differential Revision: https://reviews.llvm.org/D101855
2021-05-05 18:07:58 -04:00
Jez Ng 001ba65375 [lld-macho] De-templatize mach_header operations
@thakis pointed out that `mach_header` and `mach_header_64`
actually have the same set of (used) fields, with the 64-bit version
having extra padding. So we can access the fields we need using the
single `mach_header` type instead of using templates to switch between
the two.

I also spotted a potential issue where hasObjCSection tries to parse a
file w/o checking if it does indeed match the target arch... As such,
I've added a quick magic number check to ensure we don't access invalid
memory during `findCommand()`.

Addresses PR50180.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D101724
2021-05-03 18:31:23 -04:00
Jez Ng 05c5363b39 [lld-macho] Parse & emit the N_ARM_THUMB_DEF symbol flag
Eventually we'll use this flag to properly handle bl/blx
opcodes.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D101558
2021-04-30 16:17:26 -04:00
Nico Weber 4b456038e4 [lld/mac] Tweak two comments and fix style on one variable name
Cosmetic, no behavior change.
2021-04-30 09:30:51 -04:00
Nico Weber a38ebed258 [lld/mac] Implement support for .weak_def_can_be_hidden
I first had a more invasive patch for this (D101069), but while trying
to get that polished for review I realized that lld's current symbol
merging semantics mean that only a very small code change is needed.
So this goes with the smaller patch for now.

This has no effect on projects that build with -fvisibility=hidden
(e.g.  chromium), since these see .private_extern symbols instead.

It does have an effect on projects that build with -fvisibility-inlines-hidden
(e.g. llvm) in -O2 builds, where LLVM's GlobalOpt pass will promote most inline
functions from .weak_definition to .weak_def_can_be_hidden.

Before this patch:

    % ls -l out/gn/bin/clang out/gn/lib/libclang.dylib
    -rwxr-xr-x  1 thakis  staff  113059936 Apr 22 11:51 out/gn/bin/clang
    -rwxr-xr-x  1 thakis  staff   86370064 Apr 22 11:51 out/gn/lib/libclang.dylib
    % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/bin/clang | wc -l
        8291
    % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/lib/libclang.dylib | wc -l
        5698

With this patch:

    % ls -l out/gn/bin/clang out/gn/lib/libclang.dylib
    -rwxr-xr-x  1 thakis  staff  111721096 Apr 22 11:55 out/gn/bin/clang
    -rwxr-xr-x  1 thakis  staff   85291208 Apr 22 11:55 out/gn/lib/libclang.dylib
    thakis@MBP llvm-project % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/bin/clang | wc -l
         725
    thakis@MBP llvm-project % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/lib/libclang.dylib | wc -l
         542

Linking clang becomes a tiny bit faster with this patch:

    x 100    0.67263818    0.77847815    0.69430709    0.69877208   0.017715892
    + 100    0.67209601    0.73323393    0.68600798    0.68917346   0.012824377
    Difference at 95.0% confidence
            -0.00959861 +/- 0.00428661
            -1.37364% +/- 0.613449%
            (Student's t, pooled s = 0.0154648)

This only happens if lld with the patch and lld without the patch are both
linked with an lld with the patch or both linked with an lld without the patch
(...or with ld64). I accidentally linked the lld with the patch with an lld
without the patch and the other way round at first. In that setup, no
difference is found. That makese sense, since having fewer weak imports will
make the linked output a bit faster too. So not only does this make linking
binaries such as clang a bit faster (since fewer exports need to be written to
the export trie by lld), the linked output binary binary is also a bit faster
(since dyld needs to process fewer dynamic imports).

This also happens to fix the one `check-clang` failure when using lld as host
linker, but mostly for silly reasons: See crbug.com/1183336, mostly comment 26.
The real bug here is that c-index-test links all of LLVM both statically and
dynamically, which is an ODR violation. Things just happen to work with this
patch.

So after this patch, check-clang, check-lld, check-llvm all pass with lld as
host linker :)

Differential Revision: https://reviews.llvm.org/D101080
2021-04-22 22:51:34 -04:00
Jez Ng 8c17a87515 [re-land][lld-macho] Fix min version check
We had got it backwards... the minimum version of the target
should be higher than the min version of the object files, presumably
since new platforms are backwards-compatible with older formats.

Fixes PR50078.

The original commit (aa05439c9c) broke many tests that had inputs too
new for our target platform (10.0). This commit changes the inputs to
target 10.0, which was the simpler thing to do, but we should really
just have our lit.local.cfg default to targeting 10.15... we're not
likely to ever have proper support for the older versions anyway. I will
follow up with a change to that effect.

Differential Revision: https://reviews.llvm.org/D101114
2021-04-22 19:35:32 -04:00
Jez Ng 75ecb804b1 Revert "[lld-macho] Fix min version check"
This reverts commit aa05439c9c.
2021-04-22 19:07:41 -04:00
Jez Ng aa05439c9c [lld-macho] Fix min version check
We had got it backwards... the minimum version of the target
should be higher than the min version of the object files, presumably
since new platforms are backwards-compatible with older formats.

Fixes PR50078.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D101114
2021-04-22 18:25:44 -04:00
Jez Ng ed4a4e3312 [lld-macho][nfc] Add accessors for commonly-used PlatformInfo fields
As discussed here: https://reviews.llvm.org/D100523#inline-951543

Reviewed By: #lld-macho, thakis, alexshap

Differential Revision: https://reviews.llvm.org/D100978
2021-04-21 15:43:56 -04:00
Alexander Shaposhnikov b5720354ef [lld][MachO] Refactor findCommand
Refactor findCommand to allow passing multiple types. NFC.

Test plan: make check-lld-macho

Differential revision: https://reviews.llvm.org/D100954
2021-04-21 08:38:17 -07:00
Alexander Shaposhnikov 5c835e1ae5 [lld][MachO] Add support for LC_VERSION_MIN_* load commands
This diff adds initial support for the legacy LC_VERSION_MIN_* load commands.

Test plan: make check-lld-macho

Differential revision: https://reviews.llvm.org/D100523
2021-04-21 05:41:14 -07:00
Jez Ng 7208bd4b32 [lld-macho] Skip platform checks for a few libSystem re-exports
XCode 12 ships with mismatched platforms for these libraries,
so this hack is necessary...

Fixes PR49799.

Reviewed By: #lld-macho, gkm, smeenai

Differential Revision: https://reviews.llvm.org/D100913
2021-04-20 19:54:53 -04:00
Jez Ng 1aa29dffce [lld-macho] Support subtractor relocations that reference sections
The minuend (but not the subtrahend) can reference a section.

Note that we do not yet properly validate that the subtrahend isn't
referencing a section; I've filed PR50034 to track that.

I've also extended the reloc-subtractor.s test to reorder symbols, to
make sure that the addends are being associated with the minuend (and not
the subtrahend) relocation.

Fixes PR49999.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D100804
2021-04-20 16:58:57 -04:00
Jez Ng 3142fc3b5b [lld-macho] Have toString() emit full path to archive files
It doesn't make sense to take just the base filename for archives when we emit
the full path for object files. (LLD-ELF emits the full path too.)

This will also make it easier to write a proper test for {D100147}.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D100357
2021-04-13 10:43:28 -04:00
Jez Ng 2461804b48 [lld-macho] Symbol::value should always be uint64_t
D98837 migrated a bunch of `value`s to uint64_t, but missed these.
2021-04-06 17:54:11 -04:00
Jez Ng ceec610754 [lld-macho] Fix & refactor symbol size calculations
I noticed two problems with the previous implementation:

* N_ALT_ENTRY symbols weren't being handled correctly -- they should
  determine the size of the previous symbol, even though they don't
  cause a new section to be created
* The last symbol in a section had its size calculated wrongly;
  the first subsection's size was used instead of the last one

I decided to take the opportunity to refactor things as well, mainly to
realize my observation
[here](https://reviews.llvm.org/D98837#inline-931511) that we could
avoid doing a binary search to match symbols with subsections. I think
the resulting code is a bit simpler too.

      N           Min           Max        Median           Avg        Stddev
  x  20          4.31          4.43          4.37        4.3775   0.034162922
  +  20          4.32          4.43          4.38        4.3755    0.02799906
  No difference proven at 95.0% confidence

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D99972
2021-04-06 15:10:01 -04:00
Jez Ng e0df2b540a [lld-macho] Rename SubsectionMapping to SubsectionMap
We bikeshedded about it here: https://reviews.llvm.org/D98837#inline-931557

I initially suggested SubsectionMapping, but I thought the discussion
landed on doing `std::vector<SubsectionEntry>`. @alexshap went and did
both, but on hindsight I regret adding 3 more characters to an already
long name, and I think SubsectionEntry is descriptive enough...

This diff also renames `subsectionMap` to `subsecMap` for consistency
with other variable names in the codebase.
2021-04-06 14:26:13 -04:00
Cyndy Ishida 0116d04d04 [TextAPI] move source code files out of subdirectory, NFC
TextAPI/ELF has moved out into InterfaceStubs, so theres no longer a
need to seperate out TextAPI between formats.

Reviewed By: ributzka, int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D99811
2021-04-05 10:24:42 -07:00
Jez Ng 817d98d841 [lld-macho][nfc] Refactor in preparation for 32-bit support
The main challenge was handling the different on-disk structures (e.g.
`mach_header` vs `mach_header_64`). I tried to strike a balance between
sprinkling `target->wordSize == 8` checks everywhere (branchy = slow, and ugly)
and templatizing everything (causes code bloat, also ugly). I think I struck a
decent balance by judicious use of type erasure.

Note that LLD-ELF has a similar architecture, though it seems to use more templating.

Linking chromium_framework takes about the same time before and after this
change:

      N           Min           Max        Median           Avg        Stddev
  x  20          4.52          4.67         4.595        4.5945   0.044423204
  +  20           4.5          4.71         4.575         4.582   0.056344803
  No difference proven at 95.0% confidence

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D99633
2021-04-02 18:46:39 -04:00
Yang Fan d441dee5c2
[lld][MachO] Fix -Wsign-compare warning (NFC)
GCC warning:
```
/llvm-project/lld/MachO/InputFiles.cpp:484:24: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘uint64_t’ {aka ‘long unsigned int’} [-Wsign-compare]
484 |           return value < subsectionEntry.offset;
    |                  ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
```
2021-04-02 11:33:56 +08:00
Alexander Shaposhnikov f6ad045366 [lld][MachO] Make emitEndFunStab independent from .subsections_via_symbols
This diff addresses FIXME in SyntheticSections.cpp and removes
the dependency of emitEndFunStab on .subsections_via_symbols.

Test plan: make check-lld-macho

Differential revision: https://reviews.llvm.org/D99054
2021-04-01 17:48:09 -07:00
Alexander Shaposhnikov f1e4e2fb20 [lld][MachO] Refactor handling of subsections
This diff is a preparation for fixing FunStabs (incorrect size calculation).
std::map<uint32_t, InputSection*> (SubsectionMap) is replaced with
a sorted vector + binary search. If .subsections_via_symbols is set
this vector will contain the list of subsections, otherwise,
the offsets will be used for calculating the symbols sizes.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D98837
2021-03-31 16:52:53 -07:00
Jez Ng dc8bee9265 [lld-macho] Check address ranges when applying relocations
This diff required fixing `getEmbeddedAddend` to apply sign
extension to 32-bit values. We were previously passing around wrong
64-bit addend values that became "right" after being truncated back to
32-bit.

I've also made `getEmbeddedAddend` return a signed int, which is similar
to what LLD-ELF does for its `getImplicitAddend`.

`reportRangeError`, `checkUInt`, and `checkInt` are counterparts of similar
functions in LLD-ELF.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D98387
2021-03-12 17:26:27 -05:00
Jez Ng a723db92d8 [lld-macho][nfc] Refactor subtractor reloc handling
SUBTRACTOR relocations are always paired with UNSIGNED
relocations to indicate a pair of symbols whose address difference we
want. Functionally they are like a single relocation: only one pointer
gets written / relocated. Previously, we would handle these pairs by
skipping over the SUBTRACTOR relocation and writing the pointer when
handling the UNSIGNED reloc. This diff reverses things, so we write
while handling SUBTRACTORs and skip over the UNSIGNED relocs instead.

Being able to distinguish between SUBTRACTOR and UNSIGNED relocs in the
write phase (i.e. inside `relocateOne`) is useful for the upcoming range
check diff: we want to check that SUBTRACTOR relocs write signed values,
but UNSIGNED relocs (naturally) write unsigned values.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D98386
2021-03-11 13:28:13 -05:00
Jez Ng e8a3058303 [lld-macho] Fix handling of X86_64_RELOC_SIGNED_{1,2,4}
The previous implementation miscalculated the addend, resulting
in an underflow. This meant that every SIGNED_N section relocation would
be associated with the last subsection (since the addend would now be a
huge number). We were "lucky" that this mistake was typically cancelled
out -- 64-to-32-bit-truncation meant that the final value was correct,
as long as subsections were not rearranged.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D98385
2021-03-11 13:28:11 -05:00
Jez Ng 5433a79176 [lld-macho][nfc] Create Relocations.{h,cpp} for relocation-specific code
This more closely mirrors the structure of lld-ELF.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D98384
2021-03-11 13:28:09 -05:00
Jez Ng 1752f28506 [lld-macho][nfc] Remove `MachO::` prefix where possible
Previously, SyntheticSections.cpp did not have a top-level `using namespace
llvm::MachO` because it caused a naming conflict: `llvm::MachO::Symbol` would
collide with `lld::macho::Symbol`.

`MachO::Symbol` represents the symbols defined in InterfaceFiles (TBDs). By
moving the inclusion of InterfaceFile.h into our .cpp files, we can avoid this
name collision in other files where we are only dealing with LLD's own symbols.

Along the way, I removed all unnecessary "MachO::" prefixes in our code.

Cons of this approach: If TextAPI/MachO/Symbol.h gets included via some other
header file in the future, we could run into this collision again.

Alternative 1: Have either TextAPI/MachO or BinaryFormat/MachO.h use a different
namespace. Most of the benefit of `using namespace llvm::MachO` comes from being
able to use things in BinaryFormat/MachO.h conveniently; if TextAPI was under a
different (and fully-qualified) namespace like `llvm::tapi` that would solve our
problems. Cons: lots of files across llvm-project will need to be updated, and
folks who own the TextAPI code need to agree to the name change.

Alternative 2: Rename our Symbol to something like `LldSymbol`. I think this is
ugly.

Personally I think alternative #1 is ideal, but I'm not sure the effort to do it is
worthwhile, this diff's halfway solution seems good enough to me. Thoughts?

Reviewed By: #lld-macho, oontvoo, MaskRay

Differential Revision: https://reviews.llvm.org/D98149
2021-03-11 13:28:08 -05:00
Greg McGary fdc0c21973 [lld-macho][NFC] when reasonable, replace auto keyword with type names
lld policy discourages `auto`. Replace it with a type name whenever reasonable. Retain `auto` to avoid ...
* redundancy, as for decls such as `auto *t = mumble_cast<TYPE *>` or similar that specifies the result type on the RHS
* verbosity, as for iterators
* gratuitous suffering, as for lambdas

Along the way, add `const` when appropriate.

Note: a future diff will ...
* add more `const` qualifiers
* remove `opt::` when we are already `using llvm::opt`

Differential Revision: https://reviews.llvm.org/D98313
2021-03-09 22:08:32 -08:00
Vy Nguyen 70c0dbf151 [lld-macho][NFC] Replace config param with a global in hasCompatVersion() helper.
Differential Revision: https://reviews.llvm.org/D98115
2021-03-06 11:32:51 -05:00
Vy Nguyen fc5d804ddb [lld-macho] Check platform and version when constructor ObjFile
Differential Revision: https://reviews.llvm.org/D97979
2021-03-05 17:34:38 -05:00
Jez Ng 3c19b4f34d [lld-macho] Skip over symbols in un-parsed debug info sections
clang appears to emit symbols in `__debug_aranges`, at least
for arm64... in the examples I've seen, it doesn't seem like those
symbols are referenced outside of `__DWARF`, so I think they're safe to
ignore. But hopefully @clayborg can confirm.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D98073
2021-03-05 17:24:32 -05:00
Jez Ng fc011b5eb1 [lld-macho] Replace debug-info-related assert with FIXME
We'll need to properly handle object files with multiple source inputs
eventually, but remove the assert for now so we can successfully emit binaries
for testing.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D98067
2021-03-05 17:24:31 -05:00
Jez Ng 0d4dadc64c [lld-macho] Include install name in error messages for dylibs from TBDs
Since multiple dylibs can be defined in one TBD, this is
necessary to avoid confusion.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D97905
2021-03-04 14:36:49 -05:00
Jez Ng 55a32812fa [lld-macho] Filter TAPI re-exports by target
Previously, we were loading re-exports without checking whether
they were compatible with our target. Prior to {D97209}, it meant that
we were defining dylib symbols that were invalid -- usually a silent
failure unless our binary actually used them. D97209 exposed this as an
explicit error.

Along the way, I've extended our TAPI compatibility check to cover the
platform as well, instead of just checking the arch. To this end, I've
replaced MachO::Architecture with MachO::Target in our Config struct.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D97867
2021-03-04 14:36:47 -05:00
Jez Ng 8601be809e [lld-macho] Fix & fold reexport-nested-libs test into stub-link.s
The reexport-nested-libs test added in D97438 was a bit wonky.

First, it was linking against libReexportSystem.tbd which targets the
iOS simulator, and which in turn attempted to re-export the iOS
simulator's libSystem. However, due to the way `-syslibroot` works, it
was actually re-exporting the macOS libSystem.

As a result, the test was not actually able to resolve the symbols in
the desired libSystem. I'm guessing that @oontvoo was confused by this
and therefore included those symbols in libReexportSystem.tbd itself.
But this means that the test wasn't actually testing the resolution of
re-exported symbols (though it did at least verify that the re-exported
libraries could be located).

After some consideration, I figured that stub-link.s could be extended
to cover what reexport-nested-libs.s was attempting to do. The test
targets macOS, so we only have one `-syslibroot` and no chance of
confusion.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D97866
2021-03-04 14:36:46 -05:00
Jez Ng 5d9aafc09a [lld-macho] Bind re-exported symbols directly to implicitly-linked umbrellas
Suppose we are linking against libFoo, which re-exports the
implicitly-bound libSystem, which in turn re-exports some
non-explicitly-bound library like `/usr/lib/system/libsystem_c.dylib`.
Then any bindings we have to a symbol in libsystem_c should use
libSystem (and not libFoo) as the umbrella library.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D97865
2021-03-04 14:36:44 -05:00
Jez Ng b63919e180 [lld-macho] Require -arch and -platform_version to always be specified
We previously defaulted to x86_64 and an unknown platform, which was fine when
we only supported one arch and did no platform checks, but that will no longer
be true going ahead. Therefore, we should require those flags to be specified
whenever the linker is invoked.

Note that LLD-ELF and ld64 both infer the arch from their input object files,
but the usefulness of that is questionable since clang will always specify these
flags, and most of the time `lld` will be invoked via clang.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D97799
2021-03-03 15:52:10 -05:00
Greg McGary 4af1522a85 [lld-macho] Rework length check when opening input files
This reverts diff D97610 (commit 0223ab035c) and adds a one-line fix to verify that a `MemoryBufferRef` has sufficient length before reading a 4-byte magic number.

Differential Revision: https://reviews.llvm.org/D97757
2021-03-02 13:00:57 -08:00
Vy Nguyen 9a2e2de15f [lld-macho] Change loadReexport to handle the case where a TAPI re-exports to reference documents nested within other TBD.
Currently, it was delibrately impleneted to not handle this case, but as it has turnt out, we need this feature.
The concrete use case is
       `System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa` reexports
               /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit , which then rexports
                    /System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation

The current implemention uses a global currentTopLevelTapi, which is not reset until it finishes loading the whole tree.
This is a problem because if the top-level is set to Cocoa, then when we get to UIFoundation, it will try to find UIFoundation in the current top level, which is Cocoa and will not find it.

The right thing should be:
 - When loading a library from a TBD file, re-exports need to be looked up in the auxiliary documents within the same TBD.
 - When loading from an actual dylib, no additional TBD documents need to be examined.
 - In no case does a re-export mentioned in one TBD file need to be looked up in a document in an auxiliary document from a different TBD file

Differential Revision: https://reviews.llvm.org/D97438
2021-03-02 12:14:31 -05:00
Nico Weber 8174f33dc9 [lld/mac] Add support for -flat_namespace
-flat_namespace makes lld emit binaries that use name lookup that's more in
line with other POSIX systems: Instead of looking up symbols as (dylib,name)
pairs by dyld, they're instead looked up just by name.

-flat_namespace has three effects:

1. MH_TWOLEVEL and MH_NNOUNDEFS are no longer set in the Mach-O header
2. All symbols use BIND_SPECIAL_DYLIB_FLAT_LOOKUP as ordinal
3. When a dylib is added to the link, its dependent dylibs are also added,
   so that lld can verify that no undefined symbols remain at the end of
   a link with -flat_namespace. These transitive dylibs are added for symbol
   resolution, but they are not emitted in LC_LOAD_COMMANDs.

-undefined with -flat_namespace still isn't implemented. Before this change,
it was impossible to hit that combination because -flat_namespace caused a
diagnostic. Now that it no longer does, emit a dedicated temporary diagnostic
when both flags are used.

Differential Revision: https://reviews.llvm.org/D97641
2021-03-01 15:25:10 -05:00
Jez Ng f083f652c3 [lld-macho][nfc] Remove TODO regarding addends
There was initially some concern around the correct handling of pcrel
section relocations with r_length != 2. But it looks like there are no such
relocations in practice -- x86_64's pcrel section relocs all have r_length == 2,
and ARM64 doesn't even have pcrel section relocs. So we can replace the TODO
with an assert.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D97576
2021-03-01 12:30:08 -05:00
Greg McGary 0223ab035c [lld-macho] check minimum header length when opening linkable input files
Bifurcate the `readFile()` API into ...
* `readRawFile()` which performs no checks, and
* `readLinkableFile()` which enforces minimum length of 20 bytes, same as ld64

There are no new tests because tweaks to existing tests are sufficient.

Differential Revision: https://reviews.llvm.org/D97610
2021-02-27 14:41:40 -08:00
Jez Ng 82b3da6f6f [lld-macho] Extract embedded addends for arm64 UNSIGNED relocations
On arm64, UNSIGNED relocs are the only ones that use embedded addends
instead of the ADDEND relocation.

Also ensure that the addend works when UNSIGNED is part of a SUBTRACTOR
pair.

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D97105
2021-02-27 12:31:34 -05:00
Jez Ng 541390131e [lld-macho] Don't emit rebase opcodes for subtractor minuend relocs
Also add a few asserts to verify that we are indeed handling an
UNSIGNED relocation as the minued. I haven't made it an actual
user-facing error since I don't think llvm-mc is capable of generating
SUBTRACTOR relocations without an associated UNSIGNED.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D97103
2021-02-27 12:31:34 -05:00
Jez Ng 84579fc24f [lld-macho] Basic support for linkage and visibility attributes in LTO
When parsing bitcode, convert LTO Symbols to LLD Symbols in order to perform
resolution. The "winning" symbol will then be marked as Prevailing at LTO
compilation time. This is similar to what the other LLD ports do.

This change allows us to handle `linkonce` symbols correctly, and to deal with
duplicate bitcode symbols gracefully. Previously, both scenarios would result in
an assertion failure inside the LTO code, complaining that multiple Prevailing
definitions are not allowed.

While at it, I also added basic logic around visibility. We don't do anything
useful with it yet, but we do check that its value is valid. LLD-ELF appears to
use it only to set FinalDefinitionInLinkageUnit for LTO, which I think is just a
performance optimization.

From my local experimentation, the linker itself doesn't seem to do anything
differently when encountering linkonce / linkonce_odr / weak / weak_odr. So I've
only written a test for one of them. LLD-ELF has more, but they seem to mostly
be testing the intermediate bitcode output of their LTO backend...? I'm far from
an expert here though, so I might very well be missing things.

Reviewed By: #lld-macho, MaskRay, smeenai

Differential Revision: https://reviews.llvm.org/D94342
2021-02-25 13:27:40 -05:00
Jez Ng 4752cdc9a2 [lld-macho] Check for arch compatibility when loading ObjFiles and TBDs
The silent failures had confused me a few times.

I haven't added a similar check for platform yet as we don't yet have logic to
infer the platform automatically, and so adding that check would require
updating dozens of test files.

Reviewed By: #lld-macho, thakis, alexshap

Differential Revision: https://reviews.llvm.org/D97209
2021-02-23 22:02:38 -05:00
Jez Ng 5e851733c5 [lld-macho] Fix semantics & add tests for ARM64 GOT/TLV relocs
I've adjusted the RelocAttrBits to better fit the semantics of
the relocations. In particular:

1. *_UNSIGNED relocations are no longer marked with the `TLV` bit, even
   though they can occur within TLV sections. Instead the `TLV` bit is
   reserved for relocations that can reference thread-local symbols, and
   *_UNSIGNED relocations have their own `UNSIGNED` bit. The previous
   implementation caused TLV and regular UNSIGNED semantics to be
   conflated, resulting in rebase opcodes being incorrectly emitted for TLV
   relocations.

2. I've added a new `POINTER` bit to denote non-relaxable GOT
   relocations. This distinction isn't important on x86 -- the GOT
   relocations there are either relaxable or non-relaxable loads -- but
   arm64 has `GOT_LOAD_PAGE21` which loads the page that the referent
   symbol is in (regardless of whether the symbol ends up in the GOT). This
   relocation must reference a GOT symbol (so must have the `GOT` bit set)
   but isn't itself relaxable (so must not have the `LOAD` bit). The
   `POINTER` bit is used for relocations that *must* reference a GOT
   slot.

3. A similar situation occurs for TLV relocations.

4. ld64 supports both a pcrel and an absolute version of
   ARM64_RELOC_POINTER_TO_GOT. But the semantics of the absolute version
   are pretty weird -- it results in the value of the GOT slot being
   written, rather than the address. (That means a reference to a
   dynamically-bound slot will result in zeroes being written.) The
   programs I've tried linking don't use this form of the relocation, so
   I've dropped our partial support for it by removing the relevant
   RelocAttrBits.

Reviewed By: alexshap

Differential Revision: https://reviews.llvm.org/D97031
2021-02-23 22:02:38 -05:00
Jez Ng e5d780e049 [lld-macho] Use full input file name in invalid relocation error message
Just something I noticed while debugging arm relocations...

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D97078
2021-02-23 22:02:38 -05:00
Vy Nguyen 5a856f5b44 Reland [lld-macho]Implement bundle_loader
Reland 1a0afcf518
  https://reviews.llvm.org/D95913

New change: fix UB bug caused by copying empty path/name. (since the executable does not have a name)
2021-02-22 14:05:12 -05:00
Vitaly Buka c17547df44 Revert "Implement -bundle_loader"
D95913 passes null pointer into memcpy

This reverts commit 1a0afcf518.
2021-02-19 17:40:07 -08:00
Vy Nguyen 1a0afcf518 Implement -bundle_loader
Differential Revision: https://reviews.llvm.org/D95913

Usage: -bundle_loader <executable>
This option specifies the executable that will load the build output file being linked.
When building a bundle, users can use the --bundle_loader  to specify an executable
that contains symbols referenced, but not implemented in the bundle.
2021-02-18 16:11:37 -05:00
Greg McGary 87104faac4 [lld-macho] Add ARM64 target arch
This is an initial base commit for ARM64 target arch support. I don't represent that it complete or bug-free, but wish to put it out for review now that some basic things like branch target & load/store address relocs are working.

I can add more tests to this base commit, or add them in follow-up commits.

It is not entirely clear whether I use the "ARM64" (Apple) or "AArch64" (non-Apple) naming convention. Guidance is appreciated.

Differential Revision: https://reviews.llvm.org/D88629
2021-02-08 18:14:07 -07:00
Jez Ng f843bb82c0 [lld-macho] Force-loading should share code path with regular archive loads
This extends {D92539} to work even when we are loading archive
members via `-force_load`. I uncovered this issue while trying to
force-load archives containing bitcode -- we were segfaulting.

In addition to fixing the `-force_load` case, this diff also addresses
the behavior of `-ObjC` when LTO bitcode is involved -- we need to
force-load those archive members if they contain ObjC categories.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D95265
2021-02-03 13:43:47 -05:00
Jez Ng 163dcd8513 [lld-macho] Associate each Symbol with an InputFile
This makes our error messages more informative. But the bigger motivation is for
LTO symbol resolution, which will be in an upcoming diff. The changes in this
one are largely mechanical.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D94316
2021-02-03 13:43:47 -05:00
Greg McGary 3a9d2f1488 [lld-macho][NFC] refactor relocation handling
Add per-reloc-type attribute bits and migrate code from per-target file into target independent code, driven by reloc attributes.

Many cleanups

Differential Revision: https://reviews.llvm.org/D95121
2021-02-02 10:54:53 -07:00
Jez Ng e98b441a09 [lld-macho] Remove unnecessary llvm:: namespace prefixes 2021-01-09 12:44:35 -05:00
Nico Weber 13f439a187 [lld/mac] Implement support for private extern symbols
Private extern symbols are used for things scoped to the linkage unit.
They cause duplicate symbol errors (so they're in the symbol table,
unlike TU-scoped truly local symbols), but they don't make it into the
export trie. They are created e.g. by compiling with
-fvisibility=hidden.

If two weak symbols have differing privateness, the combined symbol is
non-private external. (Example: inline functions and some TUs that
include the header defining it were built with
-fvisibility-inlines-hidden and some weren't).

A weak private external symbol implicitly has its "weak" dropped and
behaves like a regular strong private external symbol: Weak is an export
trie concept, and private symbols are not in the export trie.

If a weak and a strong symbol have different privateness, the strong
symbol wins.

If two common symbols have differing privateness, the larger symbol
wins. If they have the same size, the privateness of the symbol seen
later during the link wins (!) -- this is a bit lame, but it matches
ld64 and this behavior takes 2 lines less to implement than the less
surprising "result is non-private external), so match ld64.
(Example: `int a` in two .c files, both built with -fcommon,
one built with -fvisibility=hidden and one without.)

This also makes `__dyld_private` a true TU-local symbol, matching ld64.
To make this work, make the `const char*` StringRefZ ctor to correctly
set `size` (without this, writing the string table crashed when calling
getName() on the __dyld_private symbol).

Mention in CommonSymbol's comment that common symbols are now disabled
by default in clang.

Mention in -keep_private_externs's HelpText that the flag only has an
effect with `-r` (which we don't implement yet -- so this patch here
doesn't regress any behavior around -r + -keep_private_externs)). ld64
doesn't explicitly document it, but the commit text of
http://reviews.llvm.org/rL216146 does, and ld64's
OutputFile::buildSymbolTable() checks `_options.outputKind() ==
Options::kObjectFile` before calling `_options.keepPrivateExterns()`
(the only reference to that function).

Fixes PR48536.

Differential Revision: https://reviews.llvm.org/D93609
2020-12-21 21:23:33 -05:00
Greg McGary d4ec3346b1 [lld-macho][nfc] Refactor to accommodate paired relocs
This is a refactor to pave the way for supporting paired-ADDEND for ARM64. The only paired reloc type for X86_64 is SUBTRACTOR. In a later diff, I will add SUBTRACTOR for both X86_64 and ARM64.

* s/`getImplicitAddend`/`getAddend`/ because it handles all forms of addend: implicit, explicit, paired.
* add predicate `bool isPairedReloc()`
* check range of `relInfo.r_symbolnum` is internal, unrelated to user-input, so use `assert()`, not `error()`
* minor cleanups & rearrangements in `InputFile::parseRelocations()`

Differential Revision: https://reviews.llvm.org/D90614
2020-12-17 20:21:41 -08:00
Jez Ng 4c8276cdc1 [lld-macho] Use LC_LOAD_WEAK_DYLIB for dylibs with only weakrefs
Note that dylibs without *any* refs will still be loaded in the usual
(strong) fashion.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D93435
2020-12-17 08:49:17 -05:00
Jez Ng 811444d7a1 [lld-macho] Add support for weak references
Weak references need not necessarily be satisfied at runtime (but they must
still be satisfied at link time). So symbol resolution still works as per usual,
but we now pass around a flag -- ultimately emitting it in the bind table -- to
indicate if a given dylib symbol is a weak reference.

ld64's behavior for symbols that have both weak and strong references is
a bit bizarre. For non-function symbols, it will emit a weak import. For
function symbols (those referenced by BRANCH relocs), it will emit a
regular import. I'm not sure what value there is in that behavior, and
since emulating it will make our implementation more complex, I've
decided to treat regular weakrefs like function symbol ones for now.

Fixes PR48511.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D93369
2020-12-17 08:49:16 -05:00
Nico Weber ec88746a05 [lld/mac] fill in current and compatibility version for LC_LOAD_(WEAK_)DYLIB
Not sure if anything actually depends on this, but it makes `otool -L`
output look nicer.

Differential Revision: https://reviews.llvm.org/D93332
2020-12-15 19:34:59 -05:00
Jez Ng 3aa8e071dd [lld-macho] Add implicit dylib support for frameworks
{D93000} applied to frameworks. Partial fix for PR48511.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D93277
2020-12-15 15:58:26 -05:00
Jez Ng 544148ae70 [lld-macho] -weak_{library,framework} should always take priority
We were not setting forceWeakImport for file paths given by
`-weak_library` if we had already loaded the file. This diff fixes that
by having `loadDylib` return a cached DylibFile instance even if we have
already loaded that file.

We still avoid emitting multiple LC_LOAD_DYLIBs, but we achieve this by
making inputFiles a SetVector instead of relying on the `loadedDylibs`
cache.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D93255
2020-12-15 15:58:26 -05:00
Jez Ng 76c36c11a9 [lld-macho] Don't load dylibs more than once
Also remove `DylibFile::reexported` since it's unused.

Fixes llvm.org/PR48393.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D93001
2020-12-10 15:57:52 -08:00
Jez Ng 6a348f6158 [lld-macho] Implement `-no_implicit_dylibs`
Dylibs that are "public" -- i.e. top-level system libraries -- are considered
implicitly linked when another library re-exports them. That is, we should load
them & bind directly to their symbols instead of via their re-exporting
umbrella library. This diff implements that behavior by default, as well as an
opt-out flag.

In theory, this is just a performance optimization, but in practice it seems
that it's needed for correctness.

Fixes llvm.org/PR48395.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D93000
2020-12-10 15:57:52 -08:00
Jez Ng 863f7a745e [lld-macho] Don't attempt to emit rebase opcodes for debug sections
This was causing a crash as we were attempting to look up the
nonexistent parent OutputSection of the debug sections. We didn't detect
it earlier because there was no test for PIEs with debug info (PIEs
require us to emit rebases for X86_64_RELOC_UNSIGNED).

This diff filters out the debug sections while loading the ObjFiles. In
addition to fixing the above problem, it also lets us avoid doing
redundant work -- we no longer parse / apply relocations / attempt to
emit dyld opcodes for these sections that we don't emit.

Fixes llvm.org/PR48392.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92904
2020-12-10 15:57:51 -08:00
Jez Ng 78976bf3da [lld-macho] Support parsing of bitcode within archives
Also error out if we find anything other than an object or bitcode file
in the archive.

Note that we were previously inserting the symbols and sections of the
unpacked ObjFile into the containing ArchiveFile. This was actually
unnecessary -- we can just insert the ObjectFile (or BitcodeFile) into
the `inputFiles` vector. This is the approach taken by LLD-ELF.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92539
2020-12-08 10:34:32 -08:00
Jez Ng 7b007ac080 [lld-macho][nfc] Move some methods from InputFile to ObjFile
Additionally:
1. Move the helper functions in InputSection.h below the definition of
   `InputSection`, so the important stuff is on top
2. Remove unnecessary `explicit`

Reviewed By: #lld-macho, compnerd

Differential Revision: https://reviews.llvm.org/D92453
2020-12-08 10:34:32 -08:00
Nico Weber 16b1f6e385 [mac/lld] Add support for the LC_LINKER_OPTION load command in o files
clang puts `-framework CoreFoundation` in this load command for files
that use @available / __builtin_available. Without support for this,
binaries that don't explicitly link to CoreFoundation fail to link.

Differential Revision: https://reviews.llvm.org/D92624
2020-12-04 08:46:53 -05:00
Nico Weber 7cb0a373d1 [mac/lld] Implement -t
Goes well with `-why_load` to get an idea of load order.

Differential Revision: https://reviews.llvm.org/D92583
2020-12-03 16:02:38 -05:00
Nico Weber 3422f3cc6e Reland "[mac/lld] Implement -why_load".
The problem was that `sym` became replaced in the call
to make<ObjFile> and referring to it afer that read memory that now
stored a different kind of symbol (a Defined instead of a LazySymbol).
Since this happens only once per archive, just copy the symbol to the
stack before make<ObjFile> and read the copy instead.

Originally reviewed at https://reviews.llvm.org/D92496
2020-12-03 08:35:12 -05:00
Nico Weber ea0029f55d Revert "[mac/lld] Implement -why_load"
This reverts commit 542d3b609d.
Seems to break check-lld. Reverting while I take a look.
2020-12-02 18:57:46 -05:00
Nico Weber 542d3b609d [mac/lld] Implement -why_load
This is useful for debugging why lld loads .o files it shouldn't load.
It's also useful for users of lld -- I've used ld64's version of this a
few times.

Differential Revision: https://reviews.llvm.org/D92496
2020-12-02 18:33:12 -05:00
Nico Weber ca634393fc [mac/lld] Make --reproduce work with thin archives
See http://reviews.llvm.org/rL268229 and
http://reviews.llvm.org/rL313832 which did the same for the ELF port.

Differential Revision: https://reviews.llvm.org/D92456
2020-12-02 09:48:31 -05:00
Nico Weber b2f00f24a3 [mac/lld] Include archive name in diagnostics
Also, for .o files, include full path as given on link command line.

Before:
    lld: error: undefined symbol [...], referenced from sandbox_logging.o

After:
    lld: error: undefined symbol [...], referenced from libseatbelt.a(sandbox_logging.o)

Move archiveName up to InputFile so we can consistently use toString()
to print InputFiles in diags, and pass it to the ObjFile ctor. This
matches the ELF and COFF ports.

Differential Revision: https://reviews.llvm.org/D92437
2020-12-01 23:00:25 -05:00
Nico Weber 07ab597bb0 [lld/mac] Fix issues around thin archives
- most importantly, fix a use-after-free when using thin archives,
  by putting the archive unique_ptr to the arena allocator. This
  ports D65565 to MachO

- correctly demangle symbol namess from archives in diagnostics

- add a test for thin archives -- it finds this UaF, but only when
  running it under asan (it also finds the demangling fix)

- make forceLoadArchive() use addFile() with a bool to have the archive
  loading code in fewer places. no behavior change; matches COFF port a
  bit better

Differential Revision: https://reviews.llvm.org/D92360
2020-12-01 18:48:29 -05:00
Jez Ng 78f6498cdc [lld-macho] Flesh out STABS implementation
This addresses a lot of the comments in {D89257}. Ideally it'd have been
done in the same diff, but the commits in between make that difficult.

This diff implements:
* N_GSYM and N_STSYM, the STABS for global and static symbols
* Has the STABS reflect the section IDs of their referent symbols
* Ensures we don't fail when encountering absolute symbols or files with
  no debug info
* Sorts STABS symbols by file to minimize the number of N_OSO entries

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D92366
2020-12-01 15:05:21 -08:00
Jez Ng b768d57b36 [lld-macho] Add archive name and file modtime to STABS output
We should also set the modtime when running LTO. That will be done in a
future diff, together with support for the `-object_path_lto` flag.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D91318
2020-12-01 15:05:21 -08:00
Jez Ng 3fcb0eeb15 [lld-macho] Emit STABS symbols for debugging, and drop debug sections
Debug sections contain a large amount of data. In order not to bloat the size
of the final binary, we remove them and instead emit STABS symbols for
`dsymutil` and the debugger to locate their contents in the object files.

With this diff, `dsymutil` is able to locate the debug info. However, we need
a few more features before `lldb` is able to work well with our binaries --
e.g. having `LC_DYSYMTAB` accurately reflect the number of local symbols,
emitting `LC_UUID`, and more. Those will be handled in follow-up diffs.

Note also that the STABS we emit differ slightly from what ld64 does. First, we
emit the path to the source file as one `N_SO` symbol instead of two. (`ld64`
emits one `N_SO` for the dirname and one of the basename.) Second, we do not
emit `N_BNSYM` and `N_ENSYM` STABS to mark the start and end of functions,
because the `N_FUN` STABS already serve that purpose. @clayborg recommended
these changes based on his knowledge of what the debugging tools look for.

Additionally, this current implementation doesn't accurately reflect the size
of function symbols. It uses the size of their containing sectioins as a proxy,
but that is only accurate if `.subsections_with_symbols` is set, and if there
isn't an `N_ALT_ENTRY` in that particular subsection. I think we have two
options to solve this:

1. We can split up subsections by symbol even if `.subsections_with_symbols`
   is not set, but include constraints to ensure those subsections retain
   their order in the final output. This is `ld64`'s approach.
2. We could just add a `size` field to our `Symbol` class. This seems simpler,
   and I'm more inclined toward it, but I'm not sure if there are use cases
   that it doesn't handle well. As such I'm punting on the decision for now.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D89257
2020-12-01 15:05:20 -08:00
Nico Weber 83e60f5a55 [lld/mac] Add --reproduce option
This adds support for ld.lld's --reproduce / lld-link's /reproduce:
flag to the MachO port. This flag can be added to a link command
to make the link write a tar file containing all inputs to the link
and a response file containing the link command. This can be used
to reproduce the link on another machine, which is useful for sharing
bug report inputs or performance test loads.

Since the linker is usually called through the clang driver and
adding linker flags can be a bit cumbersome, setting the env var
`LLD_REPRODUCE=foo.tar` triggers the feature as well.

The file response.txt in the archive can be used with
`ld64.lld.darwinnew $(cat response.txt)` as long as the contents are
smaller than the command-line limit, or with `ld64.lld.darwinnew
@response.txt` once D92149 is in.

The support in this patch is sufficient to create a tar file for
Chromium's base_unittests that can link after unpacking on a different
machine.

Differential Revision: https://reviews.llvm.org/D92274
2020-11-30 08:40:21 -05:00
Nico Weber c519bc7e16 lld/MachO: Move MachOOptTable to DriverUtils.cpp, remove DriverUtils.h
This makes lld/MachO look more like lld/COFF and lld/ELF, as discussed
in D91640.
2020-11-18 12:33:15 -05:00
Jez Ng 21f831134c [lld-macho] Add very basic support for LTO
Just enough to consume some bitcode files and link them. There's more
to be done around the symbol resolution API and the LTO config, but I don't yet
understand what all the various LTO settings do...

Reviewed By: #lld-macho, compnerd, smeenai, MaskRay

Differential Revision: https://reviews.llvm.org/D90663
2020-11-10 12:19:28 -08:00
Jez Ng 62a3f0c984 [lld-macho] Support absolute symbols
They operate like Defined symbols but with no associated InputSection.

Note that `ld64` seems to treat the weak definition flag like a no-op for
absolute symbols, so I have replicated that behavior.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D87909
2020-09-25 11:28:35 -07:00
Jez Ng c32e69b2ce [lld-macho][re-land] Initial support for common symbols
Fix earlier build break via a static_cast.

This reverts commit 8112d494d3.

Differential Revision: https://reviews.llvm.org/D86909
2020-09-24 15:00:20 -07:00
Muhammad Omair Javaid 8112d494d3 Revert "[lld-macho] Initial support for common symbols"
This reverts commit 63ace77962.

Breaks LLDB Arm build:
http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4409
2020-09-24 12:26:40 +05:00
Jez Ng 9c70281497 [lld-macho][NFC] Make `!= nullptr` implicit 2020-09-23 20:09:49 -07:00
Jez Ng 63ace77962 [lld-macho] Initial support for common symbols
On Unix, it is traditionally allowed to write variable definitions without
initialization expressions (such as "int foo;") to header files. These are
called tentative definitions.

The compiler creates common symbols when it sees tentative definitions. When
linking the final binary, if there are remaining common symbols after name
resolution is complete, the linker converts them to regular defined symbols in
a `__common` section.

This diff implements most of that functionality, though we do not yet handle
the case where there are both common and non-common definitions of the same
symbol.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D86909
2020-09-23 19:26:40 -07:00
Greg McGary 1a3ef0417c [lld-macho] In the context of relocs, s/target/referent/ for sections & symbols
The word "target" is overloaded, so lighten its load by using another word to denote the symbol or section to which a reloc points. While more stilted than "target", "referent" is rather less pompous than "designatum" or "denotatum". :P

Along the way, make a few neighboring variable names more descriptive.

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D87584
2020-09-22 20:31:01 -07:00
Jez Ng cbe27316ef [lld-macho] Implement weak bindings for GOT/TLV
Previously, we were only emitting regular bindings to weak
dynamic symbols; this diff adds support for the weak bindings too, which
can overwrite the regular bindings at runtime. We also treat weak
defined global symbols similarly -- since they can also be interposed at
runtime, they need to be treated as potentially dynamic symbols.

Note that weak bindings differ from regular bindings in that they do not
specify the dylib to do the lookup in (i.e. weak symbol lookup happens
in a flat namespace.)

Differential Revision: https://reviews.llvm.org/D86572
2020-08-26 19:21:09 -07:00
Jez Ng cf918c809b [lld-macho] Implement -ObjC
It's roughly like -force_load with some filtering.

Differential Revision: https://reviews.llvm.org/D86181
2020-08-26 19:20:55 -07:00
Jez Ng 7394460d87 [lld-macho] Handle TAPI and regular re-exports uniformly
The re-exports list in a TAPI document can either refer to other inlined
TAPI documents, or to on-disk files (which may themselves be TBD or
regular files.) Similarly, the re-exports of a regular dylib can refer
to a TBD file.

Differential Revision: https://reviews.llvm.org/D85404
2020-08-26 19:20:48 -07:00
Jez Ng 6336c042f6 [lld-macho] Make it possible to re-export .tbd files
Two things needed fixing for that to work:

1. getName() no longer returns null for DylibFiles constructed from TAPIs
2. markSubLibrary() now accepts .tbd as a possible extension

Differential Revision: https://reviews.llvm.org/D86180
2020-08-26 19:20:42 -07:00
Jez Ng 7e6d675499 [lld-macho] Avoid unnecessary shared_ptr in DylibFile ctor
DylibFile doesn't store a pointer to its InterfaceFile
parameter, so there's no need to use a shared_ptr.

Reviewed By: #lld-macho, compnerd

Differential Revision: https://reviews.llvm.org/D85402
2020-08-12 19:50:12 -07:00
Jez Ng a499898e86 [lld-macho] Generate ObjC symbols from .tbd files
I followed similar logic in TapiFile.cpp.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D85255
2020-08-12 19:50:10 -07:00
Jez Ng 3c9100fb78 [lld-macho] Support dynamic linking of thread-locals
References to symbols in dylibs work very similarly regardless of
whether the symbol is a TLV. The main difference is that we have a
separate `__thread_ptrs` section that acts as the GOT for these
thread-locals.

We can identify thread-locals in dylibs by a flag in their export trie
entries, and we cross-check it with the relocations that refer to them
to ensure that we are not using a GOT relocation to reference a
thread-local (or vice versa).

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D85081
2020-08-12 19:50:09 -07:00
Greg McGary a379f2c251 [lld-macho] Handle command-line option -sectcreate SEG SECT FILE
Handle command-line option `-sectcreate SEG SECT FILE`, which inputs a binary blob from `FILE` into `SEG,SECT`

Reviewed By: int3

Differential Revision: https://reviews.llvm.org/D85501
2020-08-10 18:47:13 -07:00
Jez Ng 31d5885842 [lld-macho] Partial support for weak definitions
This diff adds support for weak definitions, though it doesn't handle weak
symbols in dylibs quite correctly -- we need to emit binding opcodes for them
in the weak binding section rather than the lazy binding section.

What *is* covered in this diff:

1. Reading the weak flag from symbol table / export trie, and writing it to the
   export trie
2. Refining the symbol table's rules for choosing one symbol definition over
   another. Wrote a few dozen test cases to make sure we were matching ld64's
   behavior.

We can now link basic C++ programs.

Reviewed By: #lld-macho, compnerd

Differential Revision: https://reviews.llvm.org/D83532
2020-07-24 15:55:25 -07:00
Jez Ng 74871cdad7 [lld-macho] Ensure __bss sections we output have file offset of zero
Summary:
llvm-mc emits `__bss` sections with an offset of zero, but we weren't expecting
that in our input, so we were copying non-zero data from the start of the file and
putting it in `__bss`, with obviously undesirable runtime results. (It appears that
the kernel will copy those nonzero bytes as long as the offset is nonzero, regardless
of whether S_ZERO_FILL is set.)

I debated on whether to make a special ZeroFillSection -- separate from a
regular InputSection -- but it seemed like too much work for now. But I'm happy
to refactor if anyone feels strongly about having it as a separate class.

Depends on D80857.

Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee

Reviewed By: smeenai

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80859
2020-06-17 20:41:28 -07:00
Jez Ng fcde378dcb [lld-macho] Support non-pcrel section relocs
Summary: Depends on D80854.

Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80855
2020-06-17 20:41:28 -07:00
Saleem Abdulrasool 73312976ad lld: remove old test support path
This removes the stub library that lld injected to satisfy the
dependency on the libSystem.  Now with TBD support, we can provide the
stub library to permit the tests to function properly as they would on a
real system.

Reviewed By: smeenai
Differential Revision: https://reviews.llvm.org/D81418
2020-06-16 15:57:58 -07:00
Jez Ng 53c796b948 [lld-macho] Properly handle & validate relocation r_length
Summary:
We should be reading / writing our addends / relocated addresses based on
r_length, and not just based on the type of the relocation. But since only
some r_length values are valid for a given reloc type, I've also added some
validation.

ld64 has code to allow for r_length = 0 in X86_64_RELOC_BRANCH relocs, but I'm
not sure how to create such a relocation...

Reviewed By: smeenai

Differential Revision: https://reviews.llvm.org/D80854
2020-06-14 16:35:23 -07:00
Saleem Abdulrasool 6fe27b5fed lld: initial pass at supporting TBD
Add support to lld to use Text Based API stubs for linking.  This is
support is incomplete not filtering out platforms.  It also does not
account for architecture specific API handling and potentially does not
correctly handle trees of re-exports with inlined libraries being
treated as direct children of the top level library.
2020-06-08 18:15:40 -07:00
Jez Ng 1e1a3f67ee [lld-macho] Ensure reads from nlist_64 structs are aligned when necessary
My test refactoring in D80217 seems to have caused yaml2obj to emit
unaligned nlist_64 structs, causing ASAN'd lld to be unhappy. I don't
think this is an issue with yaml2obj though -- llvm-mc also seems to
emit unaligned nlist_64s. This diff makes lld able to safely do aligned
reads under ASAN builds while hopefully creating no overhead for regular
builds on architectures that support unaligned reads.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D80414
2020-06-02 13:19:38 -07:00
Jez Ng 6f6d91867d [lld-macho] Add some relocation validation logic
I considered making a `Target::validate()` method, but I wasn't sure how
I felt about the overhead of doing yet another switch-dispatch on the
relocation type, so I put the validation in `relocateOne` instead...
might be a bit of a micro-optimization, but `relocateOne` does assume
certain things about the relocations it gets, and this error handling
makes that explicit, so it's not a totally unreasonable code
organization.

Reviewed By: smeenai

Differential Revision: https://reviews.llvm.org/D80049
2020-06-02 13:19:38 -07:00
Jez Ng ce0d8beebc [lld-macho][re-land] Support X86_64_RELOC_UNSIGNED
This reverts commit db8559eee4.
2020-05-19 12:31:55 -07:00
Jez Ng 4eb6f4854e [lld-macho][re-land] Support .subsections_via_symbols
Summary:
This diff restores and builds upon @pcc and @ruiu's initial work on
subsections.

The .subsections_via_symbols directive indicates we can split each
section along symbol boundaries, unless those symbols have been marked
with `.alt_entry`.

We exercise this functionality in our tests by using order files that
rearrange those symbols.

Depends on D79668.

Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee

Reviewed By: smeenai

Subscribers: thakis, llvm-commits, pcc, ruiu

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79926
2020-05-19 12:31:54 -07:00
Jez Ng 70fbbcdd34 Revert "[lld-macho] Support .subsections_via_symbols"
Due to build breakage mentioned in https://reviews.llvm.org/D79926.

This reverts commit e270b2f172.
2020-05-19 08:30:02 -07:00
Jez Ng db8559eee4 Revert "[lld-macho] Support X86_64_RELOC_UNSIGNED"
This reverts commit 1f820e3559.
2020-05-19 08:30:02 -07:00
Jez Ng 1f820e3559 [lld-macho] Support X86_64_RELOC_UNSIGNED
Note that it's only used for non-pc-relative contexts.

Reviewed By: MaskRay, smeenai

Differential Revision: https://reviews.llvm.org/D80048
2020-05-19 07:46:57 -07:00
Jez Ng e270b2f172 [lld-macho] Support .subsections_via_symbols
This diff restores and builds upon @pcc and @ruiu's initial work on
subsections.

The .subsections_via_symbols directive indicates we can split each
section along symbol boundaries, unless those symbols have been marked
with `.alt_entry`.

We exercise this functionality in our tests by using order files that
rearrange those symbols.

Reviewed By: smeenai

Differential Revision: https://reviews.llvm.org/D79926
2020-05-19 07:46:57 -07:00
Jez Ng 55e9eb416e [lld-macho] Support -order_file
The order file indicates how input sections should be sorted within each
output section, based on the symbols contained within those sections.

This diff sets the stage for implementing and testing
`.subsections_via_symbols`, where we will break up InputSections by each
symbol and sort them more granularly.

Reviewed By: smeenai

Differential Revision: https://reviews.llvm.org/D79668
2020-05-19 07:46:57 -07:00
Kellie Medlin 2b920ae78c [lld] Add archive file support to Mach-O backend
With this change, basic archive files can be linked together. Input
section discovery has been refactored into a function since archive
files lazily resolve their symbols / the object files containing those
symbols.

Reviewed By: int3, smeenai

Differential Revision: https://reviews.llvm.org/D78342
2020-05-14 12:58:35 -07:00