llvm-project

Commit Graph

Author	SHA1	Message	Date
Vy Nguyen	c7c5a1c9ae	[lld-macho] Ignore debug symbols while preparing relocations. Details: see https://bugs.llvm.org/show_bug.cgi?id=50812 Differential Revision: https://reviews.llvm.org/D105210	2021-07-02 13:51:46 -04:00
Jez Ng	f6b6e72143	[lld-macho] Factor out common InputSection members We have been creating many ConcatInputSections with identical values due to .subsections_via_symbols. This diff factors out the identical values into a Shared struct, to reduce memory consumption and make copying cheaper. I also changed `callSiteCount` from a uint32_t to a 31-bit field to save an extra word. All in all, this takes InputSection from 120 to 72 bytes (and ConcatInputSection from 160 to 112 bytes), i.e. 30% size reduction in ConcatInputSection. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 4.14 4.24 4.18 4.183 0.027548999 + 20 4.04 4.11 4.075 4.0775 0.018027756 Difference at 95.0% confidence -0.1055 +/- 0.0149005 -2.52211% +/- 0.356215% (Student's t, pooled s = 0.0232803) Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D105305	2021-07-01 21:22:39 -04:00
Jez Ng	ac2dd06b91	[lld-macho] Deduplicate CFStrings `__cfstring` is a special literal section, so instead of breaking it up at symbol boundaries, we break it up at fixed-width boundaries (since each literal is the same size). Symbols can only occur at one of those boundaries, so this is strictly more powerful than `.subsections_via_symbols`. With that in place, we then run the section through ICF. This change is about perf-neutral when linking chromium_framework. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D105045	2021-07-01 21:22:38 -04:00
Nico Weber	aed0a08c69	[lld/mac] Make symbol table order deterministic SymtabSection::emitStabs() writes the symbol table in the order of externalSymbols, which has the order of symtab->getSymbols(), which is just the order symbols are added to the symbol table. In practice, symbols in the symbol files of input .o files are sorted, but since that's not guaranteed we sort them in ObjFile::parseSymbols(). To make sure several symbols with the same address keep the order they're in the input file, we have to use stable_sort(). In practice, std::sort() on already-sorted inputs won't change the order of just adjacent elements, and while in theory std::sort() could use a random pivot, in practice the code should be deterministic as it was previously too. But now lld/test/MachO/stabs.s passes with LLVM_ENABLE_EXPENSIVE_CHECKS=ON (the last test that was failing with that set). Fixes a regression from D99972. While here, remove an empty section in stabs.s and move .subsections_via_symbols to the end where it usually is (this part no behavior change). Differential Revision: https://reviews.llvm.org/D105071	2021-06-29 09:29:49 -04:00
Leonard Grey	a8a6e5b094	[lld-macho] Preserve alignment for non-deduplicated cstrings Fixes PR50637. Downstream bug: https://crbug.com/1218958 Currently, we split __cstring along symbol boundaries with .subsections_via_symbols when not deduplicating, and along null bytes when deduplicating. This change splits along null bytes unconditionally, and preserves original alignment in the non-deduplicated case. Removing subsections-section-relocs.s because with this change, __cstring is never reordered based on the order file. Differential Revision: https://reviews.llvm.org/D104919	2021-06-28 22:26:43 -04:00
Jez Ng	4c49f9ceaf	[lld-macho] Handle non-extern symbols marked as private extern Previously, we asserted that such a case was invalid, but in fact `ld -r` can emit such symbols if the input contained a (true) private extern, or if it contained a symbol that started with "L". Non-extern symbols marked as private extern are essentially equivalent to regular TU-scoped symbols, so no new functionality is needed. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104502	2021-06-18 16:36:14 -04:00
Muhammad Omair Javaid	9777f3fd06	Fix build failure on 32 bit Arm This patch fixes build failure caused by commit `f27e4548fc` on 32 bit arm. Differential Revision: https://reviews.llvm.org/D103292	2021-06-18 15:27:09 +00:00
Greg McGary	f27e4548fc	[lld-macho] Implement ICF ICF = Identical C(ode\|OMDAT) Folding This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs. `check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64. We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`. Here is the perf impact for linking `chromium_framework` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF: ``` N Min Max Median Avg Stddev x 20 4.27 4.44 4.34 4.349 0.043029977 + 20 4.37 4.46 4.405 4.4115 0.025188761 Difference at 95.0% confidence 0.0625 +/- 0.0225658 1.43711% +/- 0.518873% (Student's t, pooled s = 0.0352566) ``` Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D103292	2021-06-17 10:07:44 -07:00
Jez Ng	eeac6b2bec	[lld-macho] Handle multiple LC_LINKER_OPTIONs We previously only parsed the first one. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104352	2021-06-16 15:23:06 -04:00
Jez Ng	d52d1b93c3	[lld-macho] Downgrade version mismatch to warning It's a warning in ld64. While having LLD be stricter would be nice, it makes it harder for it to be a drop-in replacement into existing builds. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104333	2021-06-16 11:06:26 -04:00
Nico Weber	b579938d40	[lld/mac] Add support for -no_data_in_code_info flag Differential Revision: https://reviews.llvm.org/D104345	2021-06-16 06:40:42 -04:00
Alexander Shaposhnikov	928394d109	[lld][MachO] Add support for LC_DATA_IN_CODE Add first bits for emitting LC_DATA_IN_CODE. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D103006	2021-06-14 19:21:59 -07:00
Jez Ng	cc17bfe489	[lld-macho] Fix "shift exponent too large" UBSAN error UBSAN seems to have added this check somewhere along the way... This might also fix the PPC buildbot, which is failing on the same test	2021-06-14 13:47:25 -04:00
Jez Ng	681cfeb591	[lld-macho][nfc] Have InputSection ctors take some parameters This is motivated by an upcoming diff in which the WordLiteralInputSection ctor sets itself up based on the value of its section flags. As such, it needs to be passed the `flags` value as part of its ctor parameters, instead of having them assigned after the fact in `parseSection()`. While refactoring code to make that possible, I figured it would make sense for the other InputSections to also take their initial values as ctor parameters. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D103978	2021-06-11 19:50:09 -04:00
Jez Ng	5d88f2dd94	[lld-macho] Deduplicate fixed-width literals Conceptually, the implementation is pretty straightforward: we put each literal value into a hashtable, and then write out the keys of that hashtable at the end. In contrast with ELF, the Mach-O format does not support variable-length literals that aren't strings. Its literals are either 4, 8, or 16 bytes in length. LLD-ELF dedups its literals via sorting + uniq'ing, but since we don't need to worry about overly-long values, we should be able to do a faster job by just hashing. That said, the implementation right now is far from optimal, because we add to those hashtables serially. To parallelize this, we'll need a basic concurrent hashtable (only needs to support concurrent writes w/o interleaved reads), which shouldn't be too hard to implement, but I'd like to punt on it for now. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 4.27 4.39 4.315 4.3225 0.033225703 + 20 4.36 4.82 4.44 4.4845 0.13152846 Difference at 95.0% confidence 0.162 +/- 0.0613971 3.74783% +/- 1.42041% (Student's t, pooled s = 0.0959262) This corresponds to binary size savings of 2MB out of 335MB, or 0.6%. It's not a great tradeoff as-is, but as mentioned our implementation can be significantly optimized, and literal dedup will unlock more opportunities for ICF to identify identical structures that reference the same literals. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D103113	2021-06-11 19:50:08 -04:00
Nico Weber	0e399eb527	[lld/mac] When handling @loader_path, use realpath() of symlinks This is important for Frameworks, which are usually symlinks. ld64 gets this right for @rpath that's replaced with @loader_path, but not for bare @loader_path -- ld64's code calls realpath() in that case too, but ignores the result. ld64 somehow manages to find libbar1.dylib in the test without the explicit `-rpath` in Foo1. I don't understand why or how. But this change is a step forward and fixes an immediate problem I'm having, so let's start with this :) Differential Revision: https://reviews.llvm.org/D103990	2021-06-09 20:36:07 -04:00
Jez Ng	04259cde15	[lld-macho] Implement cstring deduplication Our implementation draws heavily from LLD-ELF's, which in turn delegates its string deduplication to llvm-mc's StringTableBuilder. The messiness of this diff is largely due to the fact that we've previously assumed that all InputSections get concatenated together to form the output. This is no longer true with CStringInputSections, which split their contents into StringPieces. StringPieces are much more lightweight than InputSections, which is important as we create a lot of them. They may also overlap in the output, which makes it possible for strings to be tail-merged. In fact, the initial version of this diff implemented tail merging, but I've dropped it for reasons I'll explain later. Alignment Issues Mergeable cstring literals are found under the `__TEXT,__cstring` section. In contrast to ELF, which puts strings that need different alignments into different sections, clang's Mach-O backend puts them all in one section. Strings that need to be aligned have the `.p2align` directive emitted before them, which simply translates into zero padding in the object file. I think ld64 extracts the desired per-string alignment from this data by preserving each string's offset from the last section-aligned address. I'm not entirely certain since it doesn't seem consistent about doing this; but perhaps this can be chalked up to cases where ld64 has to deduplicate strings with different offset/alignment combos -- it seems to pick one of their alignments to preserve. This doesn't seem correct in general; we can in fact induce ld64 to produce a crashing binary just by linking in an additional object file that only contains cstrings and no code. See PR50563 for details. 
Moreover, this scheme seems rather inefficient: since unaligned and aligned strings are all put in the same section, which has a single alignment value, it doesn't seem possible to tell whether a given string doesn't have any alignment requirements. Preserving offset+alignments for strings that don't need it is wasteful. In practice, the crashes seen so far seem to stem from x86_64 SIMD operations on cstrings. X86_64 requires SIMD accesses to be 16-byte-aligned. So for now, I'm thinking of just aligning all strings to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise it's simpler than preserving per-string alignment+offsets. It also avoids the aforementioned crash after deduplication of differently-aligned strings. Finally, the overhead is not huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead when linking chromium_framework. With these alignment requirements, it doesn't make sense to attempt tail merging -- most strings will not be eligible since their overlaps aren't likely to start at a 16-byte boundary. Tail-merging (with alignment) for chromium_framework only improves size by 0.3%. It's worth noting that LLD-ELF only does tail merging at `-O2`. By default (at `-O1`), it just deduplicates w/o tail merging. @thakis has also mentioned that they saw it regress compressed size in some cases and therefore turned it off. `ld64` does not seem to do tail merging at all. Performance Numbers CString deduplication reduces chromium_framework from 250MB to 242MB, or about a 3.2% reduction. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.99 4.14 4.015 4.0365 0.0492336 Difference at 95.0% confidence 0.0865 +/- 0.027245 2.18987% +/- 0.689746% (Student's t, pooled s = 0.0425673) As expected, cstring merging incurs some non-trivial overhead. When passing `--no-literal-merge`, it seems that performance is the same, i.e. 
the refactoring in this diff didn't cost us. N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.89 4.02 3.935 3.9435 0.043197831 No difference proven at 95.0% confidence Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D102964	2021-06-07 23:48:35 -04:00
Nico Weber	17c43c4045	[lld/mac] Add reexports after reexporter to inputFiles When a library "host"'s reexports change their installName with `$ld$os10.11$install_name$host`, we used to write a load command for "host" but write the version numbers of the reexport instead of "host". This fixes that. I first thought that the rule is to take the version numbers from the library that originally had that install name (implemented in D103819), but that's not what ld64 seems to be doing: It takes the version number from the first dylib with that install name it loads, and it loads the reexporting library before the reexports. We already did most of that, we just added reexports before the reexporter. After this change, we add the reexporter before the reexports. Addresses https://bugs.llvm.org/show_bug.cgi?id=49800#c11 part 1. (ld64 seems to add reexports after processing _all_ files on the command line, while we add them right after the reexporter. For the common case of reexport + $ld$ symbol changing back to the exporter name, this doesn't make a difference, but you can construct a case where it does. I expect this to not make a difference in practice though.) Differential Revision: https://reviews.llvm.org/D103821	2021-06-07 17:04:03 -04:00
Nico Weber	c5ffe97988	[lld/mac] Implement support for searching dylibs with @rpath/ in install name Also adjust a few comments, and move the DylibFile comment talking about umbrella next to the parameter again. Differential Revision: https://reviews.llvm.org/D103783	2021-06-07 06:22:52 -04:00
Nico Weber	52489021cf	[lld/mac] Implement support for searching dylibs with @loader_path/ in install name Differential Revision: https://reviews.llvm.org/D103779	2021-06-06 20:19:50 -04:00
Nico Weber	a48bd587f7	[lld/mac] Implement support for searching dylibs with @executable_path/ in install name Differential Revision: https://reviews.llvm.org/D103775	2021-06-06 20:01:50 -04:00
Nico Weber	7def700667	[lld/mac] Rename DylibFile::dylibName to DylibFile::installName The flag to set it is called `-install_name`, and it's called `installName` in tbd files. No behavior change. Differential Revision: https://reviews.llvm.org/D103776	2021-06-06 20:00:35 -04:00
Nico Weber	e910437443	[lld/mac] Use fewer magic numbers in magic $ld$ handling code Also simplify a conditional and de-alias a variable. Minor cleanups, no behavior change. Differential Revision: https://reviews.llvm.org/D103774	2021-06-06 18:13:16 -04:00
Alexander Shaposhnikov	5e49ee8794	[lld][MachO] Add support for $ld$install_name symbols This diff adds support for $ld$install_name symbols. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D103746	2021-06-05 12:58:59 -07:00
Alexander Shaposhnikov	1309c181a8	[lld][MachO] Add first bits to support special symbols This diff adds first bits to support special symbols $ld$previous* in LLD. $ld$* symbols modify properties/behavior of the library (e.g. its install name, compatibility version or hide/add symbols) for specific target versions. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D103505	2021-06-04 23:32:26 -07:00
Jez Ng	6881f29a36	[lld-macho] Parse re-exports of nested TAPI documents D103423 neglected to call `parseReexports()` for nested TBD documents, leading to symbol resolution failures when trying to look up a symbol nested more than one level deep in a TBD file. This fixes the regression and adds a test. It also appears that `umbrella` wasn't being set properly when calling `parseLoadCommands` -- it's supposed to resolve to `this` if `nullptr` is passed. I didn't write a failing test case for this but I've made `umbrella` a member so the previous behavior should be preserved. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D103586	2021-06-03 12:02:30 -04:00
Nico Weber	a5645513db	[lld/mac] Implement -dead_strip Also adds support for live_support sections, no_dead_strip sections, .no_dead_strip symbols. Chromium Framework 345MB unstripped -> 250MB stripped (vs 290MB unstripped -> 236M stripped with ld64). Doing dead stripping is a bit faster than not, because so much less data needs to be processed: % ministat lld_* x lld_nostrip.txt + lld_strip.txt N Min Max Median Avg Stddev x 10 3.929414 4.07692 4.0269079 4.0089678 0.044214794 + 10 3.8129408 3.9025559 3.8670411 3.8642573 0.024779651 Difference at 95.0% confidence -0.144711 +/- 0.0336749 -3.60967% +/- 0.839989% (Student's t, pooled s = 0.0358398) This interacts with many parts of the linker. I tried to add test coverage for all added `isLive()` checks, so that some test will fail if any of them is removed. I checked that the test expectations for the most part match ld64's behavior (except for live-support-iterations.s, see the comment in the test). Interacts with: - debug info - export tries - import opcodes - flags like -exported_symbol(s_list) - -U / dynamic_lookup - mod_init_funcs, mod_term_funcs - weak symbol handling - unwind info - stubs - map files - -sectcreate - undefined, dylib, common, defined (both absolute and normal) symbols It's possible it interacts with more features I didn't think of, of course. I also did some manual testing: - check-llvm check-clang check-lld work with lld with this patch as host linker and -dead_strip enabled - Chromium still starts - Chromium's base_unittests still pass, including unwind tests Implementation-wise, this is InputSection-based, so it'll work for object files with .subsections_via_symbols (which includes all object files generated by clang). I first based this on the COFF implementation, but later realized that things are more similar to ELF. I think it'd be good to refactor MarkLive.cpp to look more like the ELF part at some point, but I'd like to get a working state checked in first. 
Mechanical parts: - Rename canOmitFromOutput to wasCoalesced (no behavior change) since it really is for weak coalesced symbols - Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP (`.no_dead_strip` in asm) Fixes PR49276. Differential Revision: https://reviews.llvm.org/D103324	2021-06-02 11:09:26 -04:00
Nico Weber	476e4d65d4	[lld/mac] Address review feedback and improve a comment I forgot to move the message() call around as requested in D103428 before committing that change. Move it now. Also, improve the ordinal uniq'ing comment. I hadn't realized that the distinct-but-identical files happen with --reproduce and not in general. No behavior change. Differential Revision: https://reviews.llvm.org/D103522	2021-06-02 10:54:53 -04:00
Vy Nguyen	8f89c054af	[lld-macho][nfc] Remove unnecessary use of Optional<T*> In all of these cases, the functions could simply return a nullptr instead of {}. There is no case where Optional<nullptr> has a special meaning. Differential Revision: https://reviews.llvm.org/D103489	2021-06-01 18:35:31 -04:00
Nico Weber	2c1903412b	[lld/mac] Implement removal of unused dylibs This omits load commands for unreferenced dylibs if: - the dylib was loaded implicitly, - it is marked MH_DEAD_STRIPPABLE_DYLIB - or -dead_strip_dylibs is passed This matches ld64. Currently, the "is dylib referenced" state is computed before dead code stripping and is not updated after dead code stripping. This too matches ld64. We should do better here. With this, clang-format linked with lld (like with ld64) no longer has libobjc.A.dylib in `otool -L` output. (It was implicitly loaded as a reexport of CoreFoundation.framework, but it's not needed.) Differential Revision: https://reviews.llvm.org/D103430	2021-06-01 16:06:30 -04:00
Nico Weber	24979e1113	[lld/mac] Don't load DylibFiles from the DylibFile constructor loadDylib() keeps a name->DylibFile cache, but it only writes to the cache once the DylibFile constructor has completed. So dylib loads done recursively from the DylibFile constructor wouldn't use the cache. Now, we load additional dylibs after writing to the cache, which means the cache now gets used for dylibs loaded because they're referenced from other dylibs. Related to PR49514 and PR50101, but no dramatic behavior change in itself. (Technically we no longer crash when a tbd file reexports itself, but that doesn't happen in practice. We now accept it silently instead of crashing; ld64 has a diag for the reexport cycle.) Differential Revision: https://reviews.llvm.org/D103423	2021-06-01 15:31:02 -04:00
Jez Ng	8535834ef7	[lld-macho][nfc] Misc code cleanup * Move `static_asserts` into cpp instead of header file. I noticed they had been separated from the main class definition in the header, so I set about to clean that up, then figured it made more sense as part of the cpp file so as not to incur unnecessary compile-time overhead. * Remove unnecessary `virtual`s * Remove unnecessary comment / reword another comment	2021-05-25 14:58:29 -04:00
Alexander Shaposhnikov	57501e512e	[lld][MachO] Fix code formatting Apply clang-format -style=llvm to InputFile.cpp. NFC. Test plan: make check-all	2021-05-23 20:35:55 -07:00
Nico Weber	4a12248ee2	[lld/mac] Honor REFERENCED_DYNAMICALLY, set it on __mh_execute_header Has the effect that `__mh_execute_header` stays in the symbol table of outputs even after running `strip` on the output. I don't know if that's important for anything -- my motivation for the patch is just to make the output more similar to ld64. (Corresponds to symbolTableInAndNeverStrip in ld64.) Differential Revision: https://reviews.llvm.org/D102619	2021-05-17 14:22:12 -04:00
Jez Ng	b1c3c2e4fc	[lld-macho] Fix order file arch filtering We had a hardcoded check and a stale TODO, written back when we only had support for one architecture. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D102154	2021-05-10 15:45:54 -04:00
Nico Weber	7f673fcaa9	[lld/mac] Fix alignment on subsections On a section with alignment of 16, subsections aligned to 16-byte boundaries should keep their 16-byte alignment. Fixes PR50274. (The same bug could have happened with -order_file previously.) Differential Revision: https://reviews.llvm.org/D102139	2021-05-09 21:00:56 -04:00
Nico Weber	d5a70db193	[lld/mac] Write every weak symbol only once in the output Before this, if an inline function was defined in several input files, lld would write each copy of the inline function to the output. With this patch, it only writes one copy. Reduces the size of Chromium Framework from 378MB to 345MB (compared to 290MB linked with ld64, which also does dead-stripping, which we don't do yet), and makes linking it faster: N Min Max Median Avg Stddev x 10 3.9957051 4.3496981 4.1411121 4.156837 0.10092097 + 10 3.908154 4.169318 3.9712729 3.9846753 0.075773012 Difference at 95.0% confidence -0.172162 +/- 0.083847 -4.14165% +/- 2.01709% (Student's t, pooled s = 0.0892373) Implementation-wise, when merging two weak symbols, this sets a "canOmitFromOutput" on the InputSection belonging to the weak symbol not put in the symbol table. We then don't write InputSections that have this set, as long as they are not referenced from other symbols. (This happens e.g. for object files that don't set .subsections_via_symbols or that use .alt_entry.) Some restrictions: - not yet done for bitcode inputs - no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) -- Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs) (that is, catch block unwind information) and Personality Routines associated with weak functions still not stripped. This is wasteful, but harmless. - However, this does strip weaks from __unwind_info (which is needed for correctness and not just for size) - This nopes out on InputSections that are referenced from more than one symbol (eg from .alt_entry) for now Things that work based on symbols Just Work: - map files (change in MapFile.cpp is no-op and not needed; I just found it a bit more explicit) - exports Things that work with inputSections need to explicitly check if an inputSection is written (e.g. unwind info). This patch is useful in itself, but it's also likely a useful foundation for dead_strip. 
I used to have a "canonicalRepresentative" pointer on InputSection instead of just the bool, which would be handy for ICF too. But I ended up not needing it for this patch, so I removed that again for now. Differential Revision: https://reviews.llvm.org/D102076	2021-05-07 17:11:40 -04:00
Jez Ng	9260760235	[lld-macho] Support loading of zippered dylibs ld64 can emit dylibs that support more than one platform (typically macOS and macCatalyst). This diff allows LLD to read in those dylibs. Note that this is a super bare-bones implementation -- in particular, I haven't added support for LLD to emit those multi-platform dylibs, nor have I added a variety of validation checks that ld64 does. Until we have a use-case for emitting zippered dylibs, I think this is good enough. Fixes PR49597. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D101954	2021-05-06 11:19:40 -04:00
Vy Nguyen	23233ad139	[lld-macho] Check simulator platforms to avoid issuing false positive errors. Currently the linker causes unnecessary errors when either the target or the config's platform is a simulator. Differential Revision: https://reviews.llvm.org/D101855	2021-05-05 18:07:58 -04:00
Jez Ng	001ba65375	[lld-macho] De-templatize mach_header operations @thakis pointed out that `mach_header` and `mach_header_64` actually have the same set of (used) fields, with the 64-bit version having extra padding. So we can access the fields we need using the single `mach_header` type instead of using templates to switch between the two. I also spotted a potential issue where hasObjCSection tries to parse a file w/o checking if it does indeed match the target arch... As such, I've added a quick magic number check to ensure we don't access invalid memory during `findCommand()`. Addresses PR50180. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D101724	2021-05-03 18:31:23 -04:00
Jez Ng	05c5363b39	[lld-macho] Parse & emit the N_ARM_THUMB_DEF symbol flag Eventually we'll use this flag to properly handle bl/blx opcodes. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D101558	2021-04-30 16:17:26 -04:00
Nico Weber	4b456038e4	[lld/mac] Tweak two comments and fix style on one variable name Cosmetic, no behavior change.	2021-04-30 09:30:51 -04:00
Nico Weber	a38ebed258	[lld/mac] Implement support for .weak_def_can_be_hidden I first had a more invasive patch for this (D101069), but while trying to get that polished for review I realized that lld's current symbol merging semantics mean that only a very small code change is needed. So this goes with the smaller patch for now. This has no effect on projects that build with -fvisibility=hidden (e.g. chromium), since these see .private_extern symbols instead. It does have an effect on projects that build with -fvisibility-inlines-hidden (e.g. llvm) in -O2 builds, where LLVM's GlobalOpt pass will promote most inline functions from .weak_definition to .weak_def_can_be_hidden. Before this patch: % ls -l out/gn/bin/clang out/gn/lib/libclang.dylib -rwxr-xr-x 1 thakis staff 113059936 Apr 22 11:51 out/gn/bin/clang -rwxr-xr-x 1 thakis staff 86370064 Apr 22 11:51 out/gn/lib/libclang.dylib % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/bin/clang \| wc -l 8291 % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/lib/libclang.dylib \| wc -l 5698 With this patch: % ls -l out/gn/bin/clang out/gn/lib/libclang.dylib -rwxr-xr-x 1 thakis staff 111721096 Apr 22 11:55 out/gn/bin/clang -rwxr-xr-x 1 thakis staff 85291208 Apr 22 11:55 out/gn/lib/libclang.dylib thakis@MBP llvm-project % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/bin/clang \| wc -l 725 thakis@MBP llvm-project % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/lib/libclang.dylib \| wc -l 542 Linking clang becomes a tiny bit faster with this patch: x 100 0.67263818 0.77847815 0.69430709 0.69877208 0.017715892 + 100 0.67209601 0.73323393 0.68600798 0.68917346 0.012824377 Difference at 95.0% confidence -0.00959861 +/- 0.00428661 -1.37364% +/- 0.613449% (Student's t, pooled s = 0.0154648) This only happens if lld with the patch and lld without the patch are both linked with an lld with the patch or both linked with an lld without the patch (...or with ld64). 
I accidentally linked the lld with the patch with an lld without the patch and the other way round at first. In that setup, no difference is found. That makes sense, since having fewer weak imports will make the linked output a bit faster too. So not only does this make linking binaries such as clang a bit faster (since fewer exports need to be written to the export trie by lld), the linked output binary is also a bit faster (since dyld needs to process fewer dynamic imports). This also happens to fix the one `check-clang` failure when using lld as host linker, but mostly for silly reasons: See crbug.com/1183336, mostly comment 26. The real bug here is that c-index-test links all of LLVM both statically and dynamically, which is an ODR violation. Things just happen to work with this patch. So after this patch, check-clang, check-lld, check-llvm all pass with lld as host linker :) Differential Revision: https://reviews.llvm.org/D101080	2021-04-22 22:51:34 -04:00
Jez Ng	8c17a87515	[re-land][lld-macho] Fix min version check We had got it backwards... the minimum version of the target should be higher than the min version of the object files, presumably since new platforms are backwards-compatible with older formats. Fixes PR50078. The original commit (`aa05439c9c`) broke many tests that had inputs too new for our target platform (10.0). This commit changes the inputs to target 10.0, which was the simpler thing to do, but we should really just have our lit.local.cfg default to targeting 10.15... we're not likely to ever have proper support for the older versions anyway. I will follow up with a change to that effect. Differential Revision: https://reviews.llvm.org/D101114	2021-04-22 19:35:32 -04:00
Jez Ng	75ecb804b1	Revert "[lld-macho] Fix min version check" This reverts commit `aa05439c9c`.	2021-04-22 19:07:41 -04:00
Jez Ng	aa05439c9c	[lld-macho] Fix min version check We had got it backwards... the minimum version of the target should be higher than the min version of the object files, presumably since new platforms are backwards-compatible with older formats. Fixes PR50078. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D101114	2021-04-22 18:25:44 -04:00
Jez Ng	ed4a4e3312	[lld-macho][nfc] Add accessors for commonly-used PlatformInfo fields As discussed here: https://reviews.llvm.org/D100523#inline-951543 Reviewed By: #lld-macho, thakis, alexshap Differential Revision: https://reviews.llvm.org/D100978	2021-04-21 15:43:56 -04:00
Alexander Shaposhnikov	b5720354ef	[lld][MachO] Refactor findCommand Refactor findCommand to allow passing multiple types. NFC. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D100954	2021-04-21 08:38:17 -07:00
Alexander Shaposhnikov	5c835e1ae5	[lld][MachO] Add support for LC_VERSION_MIN_* load commands This diff adds initial support for the legacy LC_VERSION_MIN_* load commands. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D100523	2021-04-21 05:41:14 -07:00
Jez Ng	7208bd4b32	[lld-macho] Skip platform checks for a few libSystem re-exports XCode 12 ships with mismatched platforms for these libraries, so this hack is necessary... Fixes PR49799. Reviewed By: #lld-macho, gkm, smeenai Differential Revision: https://reviews.llvm.org/D100913	2021-04-20 19:54:53 -04:00
Jez Ng	1aa29dffce	[lld-macho] Support subtractor relocations that reference sections The minuend (but not the subtrahend) can reference a section. Note that we do not yet properly validate that the subtrahend isn't referencing a section; I've filed PR50034 to track that. I've also extended the reloc-subtractor.s test to reorder symbols, to make sure that the addends are being associated with the minuend (and not the subtrahend) relocation. Fixes PR49999. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D100804	2021-04-20 16:58:57 -04:00
Jez Ng	3142fc3b5b	[lld-macho] Have toString() emit full path to archive files It doesn't make sense to take just the base filename for archives when we emit the full path for object files. (LLD-ELF emits the full path too.) This will also make it easier to write a proper test for {D100147}. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D100357	2021-04-13 10:43:28 -04:00
Jez Ng	2461804b48	[lld-macho] Symbol::value should always be uint64_t D98837 migrated a bunch of `value`s to uint64_t, but missed these.	2021-04-06 17:54:11 -04:00
Jez Ng	ceec610754	[lld-macho] Fix & refactor symbol size calculations I noticed two problems with the previous implementation: * N_ALT_ENTRY symbols weren't being handled correctly -- they should determine the size of the previous symbol, even though they don't cause a new section to be created * The last symbol in a section had its size calculated wrongly; the first subsection's size was used instead of the last one I decided to take the opportunity to refactor things as well, mainly to realize my observation [here](https://reviews.llvm.org/D98837#inline-931511) that we could avoid doing a binary search to match symbols with subsections. I think the resulting code is a bit simpler too. N Min Max Median Avg Stddev x 20 4.31 4.43 4.37 4.3775 0.034162922 + 20 4.32 4.43 4.38 4.3755 0.02799906 No difference proven at 95.0% confidence Reviewed By: #lld-macho, alexshap Differential Revision: https://reviews.llvm.org/D99972	2021-04-06 15:10:01 -04:00
Jez Ng	e0df2b540a	[lld-macho] Rename SubsectionMapping to SubsectionMap We bikeshedded about it here: https://reviews.llvm.org/D98837#inline-931557 I initially suggested SubsectionMapping, but I thought the discussion landed on doing `std::vector<SubsectionEntry>`. @alexshap went and did both, but in hindsight I regret adding 3 more characters to an already long name, and I think SubsectionEntry is descriptive enough... This diff also renames `subsectionMap` to `subsecMap` for consistency with other variable names in the codebase.	2021-04-06 14:26:13 -04:00
Cyndy Ishida	0116d04d04	[TextAPI] move source code files out of subdirectory, NFC TextAPI/ELF has moved out into InterfaceStubs, so there's no longer a need to separate out TextAPI between formats. Reviewed By: ributzka, int3, #lld-macho Differential Revision: https://reviews.llvm.org/D99811	2021-04-05 10:24:42 -07:00
Jez Ng	817d98d841	[lld-macho][nfc] Refactor in preparation for 32-bit support The main challenge was handling the different on-disk structures (e.g. `mach_header` vs `mach_header_64`). I tried to strike a balance between sprinkling `target->wordSize == 8` checks everywhere (branchy = slow, and ugly) and templatizing everything (causes code bloat, also ugly). I think I struck a decent balance by judicious use of type erasure. Note that LLD-ELF has a similar architecture, though it seems to use more templating. Linking chromium_framework takes about the same time before and after this change: N Min Max Median Avg Stddev x 20 4.52 4.67 4.595 4.5945 0.044423204 + 20 4.5 4.71 4.575 4.582 0.056344803 No difference proven at 95.0% confidence Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D99633	2021-04-02 18:46:39 -04:00
Yang Fan	d441dee5c2	[lld][MachO] Fix -Wsign-compare warning (NFC) GCC warning: ``` /llvm-project/lld/MachO/InputFiles.cpp:484:24: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘uint64_t’ {aka ‘long unsigned int’} [-Wsign-compare] 484 \| return value < subsectionEntry.offset; \| ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ ```	2021-04-02 11:33:56 +08:00
Alexander Shaposhnikov	f6ad045366	[lld][MachO] Make emitEndFunStab independent from .subsections_via_symbols This diff addresses FIXME in SyntheticSections.cpp and removes the dependency of emitEndFunStab on .subsections_via_symbols. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D99054	2021-04-01 17:48:09 -07:00
Alexander Shaposhnikov	f1e4e2fb20	[lld][MachO] Refactor handling of subsections This diff is a preparation for fixing FunStabs (incorrect size calculation). std::map<uint32_t, InputSection*> (SubsectionMap) is replaced with a sorted vector + binary search. If .subsections_via_symbols is set this vector will contain the list of subsections, otherwise, the offsets will be used for calculating the symbols sizes. Test plan: make check-all Differential revision: https://reviews.llvm.org/D98837	2021-03-31 16:52:53 -07:00
Jez Ng	dc8bee9265	[lld-macho] Check address ranges when applying relocations This diff required fixing `getEmbeddedAddend` to apply sign extension to 32-bit values. We were previously passing around wrong 64-bit addend values that became "right" after being truncated back to 32-bit. I've also made `getEmbeddedAddend` return a signed int, which is similar to what LLD-ELF does for its `getImplicitAddend`. `reportRangeError`, `checkUInt`, and `checkInt` are counterparts of similar functions in LLD-ELF. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D98387	2021-03-12 17:26:27 -05:00
Jez Ng	a723db92d8	[lld-macho][nfc] Refactor subtractor reloc handling SUBTRACTOR relocations are always paired with UNSIGNED relocations to indicate a pair of symbols whose address difference we want. Functionally they are like a single relocation: only one pointer gets written / relocated. Previously, we would handle these pairs by skipping over the SUBTRACTOR relocation and writing the pointer when handling the UNSIGNED reloc. This diff reverses things, so we write while handling SUBTRACTORs and skip over the UNSIGNED relocs instead. Being able to distinguish between SUBTRACTOR and UNSIGNED relocs in the write phase (i.e. inside `relocateOne`) is useful for the upcoming range check diff: we want to check that SUBTRACTOR relocs write signed values, but UNSIGNED relocs (naturally) write unsigned values. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D98386	2021-03-11 13:28:13 -05:00
Jez Ng	e8a3058303	[lld-macho] Fix handling of X86_64_RELOC_SIGNED_{1,2,4} The previous implementation miscalculated the addend, resulting in an underflow. This meant that every SIGNED_N section relocation would be associated with the last subsection (since the addend would now be a huge number). We were "lucky" that this mistake was typically cancelled out -- 64-to-32-bit-truncation meant that the final value was correct, as long as subsections were not rearranged. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D98385	2021-03-11 13:28:11 -05:00
Jez Ng	5433a79176	[lld-macho][nfc] Create Relocations.{h,cpp} for relocation-specific code This more closely mirrors the structure of lld-ELF. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D98384	2021-03-11 13:28:09 -05:00
Jez Ng	1752f28506	[lld-macho][nfc] Remove `MachO::` prefix where possible Previously, SyntheticSections.cpp did not have a top-level `using namespace llvm::MachO` because it caused a naming conflict: `llvm::MachO::Symbol` would collide with `lld::macho::Symbol`. `MachO::Symbol` represents the symbols defined in InterfaceFiles (TBDs). By moving the inclusion of InterfaceFile.h into our .cpp files, we can avoid this name collision in other files where we are only dealing with LLD's own symbols. Along the way, I removed all unnecessary "MachO::" prefixes in our code. Cons of this approach: If TextAPI/MachO/Symbol.h gets included via some other header file in the future, we could run into this collision again. Alternative 1: Have either TextAPI/MachO or BinaryFormat/MachO.h use a different namespace. Most of the benefit of `using namespace llvm::MachO` comes from being able to use things in BinaryFormat/MachO.h conveniently; if TextAPI was under a different (and fully-qualified) namespace like `llvm::tapi` that would solve our problems. Cons: lots of files across llvm-project will need to be updated, and folks who own the TextAPI code need to agree to the name change. Alternative 2: Rename our Symbol to something like `LldSymbol`. I think this is ugly. Personally I think alternative #1 is ideal, but I'm not sure the effort to do it is worthwhile, this diff's halfway solution seems good enough to me. Thoughts? Reviewed By: #lld-macho, oontvoo, MaskRay Differential Revision: https://reviews.llvm.org/D98149	2021-03-11 13:28:08 -05:00
Greg McGary	fdc0c21973	[lld-macho][NFC] when reasonable, replace auto keyword with type names lld policy discourages `auto`. Replace it with a type name whenever reasonable. Retain `auto` to avoid ... * redundancy, as for decls such as `auto t = mumble_cast<TYPE >` or similar that specifies the result type on the RHS * verbosity, as for iterators * gratuitous suffering, as for lambdas Along the way, add `const` when appropriate. Note: a future diff will ... * add more `const` qualifiers * remove `opt::` when we are already `using llvm::opt` Differential Revision: https://reviews.llvm.org/D98313	2021-03-09 22:08:32 -08:00
Vy Nguyen	70c0dbf151	[lld-macho][NFC] Replace config param with a global in hasCompatVersion() helper. Differential Revision: https://reviews.llvm.org/D98115	2021-03-06 11:32:51 -05:00
Vy Nguyen	fc5d804ddb	[lld-macho] Check platform and version when constructing ObjFile Differential Revision: https://reviews.llvm.org/D97979	2021-03-05 17:34:38 -05:00
Jez Ng	3c19b4f34d	[lld-macho] Skip over symbols in un-parsed debug info sections clang appears to emit symbols in `__debug_aranges`, at least for arm64... in the examples I've seen, it doesn't seem like those symbols are referenced outside of `__DWARF`, so I think they're safe to ignore. But hopefully @clayborg can confirm. Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D98073	2021-03-05 17:24:32 -05:00
Jez Ng	fc011b5eb1	[lld-macho] Replace debug-info-related assert with FIXME We'll need to properly handle object files with multiple source inputs eventually, but remove the assert for now so we can successfully emit binaries for testing. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D98067	2021-03-05 17:24:31 -05:00
Jez Ng	0d4dadc64c	[lld-macho] Include install name in error messages for dylibs from TBDs Since multiple dylibs can be defined in one TBD, this is necessary to avoid confusion. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D97905	2021-03-04 14:36:49 -05:00
Jez Ng	55a32812fa	[lld-macho] Filter TAPI re-exports by target Previously, we were loading re-exports without checking whether they were compatible with our target. Prior to {D97209}, it meant that we were defining dylib symbols that were invalid -- usually a silent failure unless our binary actually used them. D97209 exposed this as an explicit error. Along the way, I've extended our TAPI compatibility check to cover the platform as well, instead of just checking the arch. To this end, I've replaced MachO::Architecture with MachO::Target in our Config struct. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D97867	2021-03-04 14:36:47 -05:00
Jez Ng	8601be809e	[lld-macho] Fix & fold reexport-nested-libs test into stub-link.s The reexport-nested-libs test added in D97438 was a bit wonky. First, it was linking against libReexportSystem.tbd which targets the iOS simulator, and which in turn attempted to re-export the iOS simulator's libSystem. However, due to the way `-syslibroot` works, it was actually re-exporting the macOS libSystem. As a result, the test was not actually able to resolve the symbols in the desired libSystem. I'm guessing that @oontvoo was confused by this and therefore included those symbols in libReexportSystem.tbd itself. But this means that the test wasn't actually testing the resolution of re-exported symbols (though it did at least verify that the re-exported libraries could be located). After some consideration, I figured that stub-link.s could be extended to cover what reexport-nested-libs.s was attempting to do. The test targets macOS, so we only have one `-syslibroot` and no chance of confusion. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D97866	2021-03-04 14:36:46 -05:00
Jez Ng	5d9aafc09a	[lld-macho] Bind re-exported symbols directly to implicitly-linked umbrellas Suppose we are linking against libFoo, which re-exports the implicitly-bound libSystem, which in turn re-exports some non-explicitly-bound library like `/usr/lib/system/libsystem_c.dylib`. Then any bindings we have to a symbol in libsystem_c should use libSystem (and not libFoo) as the umbrella library. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D97865	2021-03-04 14:36:44 -05:00
Jez Ng	b63919e180	[lld-macho] Require -arch and -platform_version to always be specified We previously defaulted to x86_64 and an unknown platform, which was fine when we only supported one arch and did no platform checks, but that will no longer be true going ahead. Therefore, we should require those flags to be specified whenever the linker is invoked. Note that LLD-ELF and ld64 both infer the arch from their input object files, but the usefulness of that is questionable since clang will always specify these flags, and most of the time `lld` will be invoked via clang. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D97799	2021-03-03 15:52:10 -05:00
Greg McGary	4af1522a85	[lld-macho] Rework length check when opening input files This reverts diff D97610 (commit `0223ab035c`) and adds a one-line fix to verify that a `MemoryBufferRef` has sufficient length before reading a 4-byte magic number. Differential Revision: https://reviews.llvm.org/D97757	2021-03-02 13:00:57 -08:00
Vy Nguyen	9a2e2de15f	[lld-macho] Change loadReexport to handle the case where a TAPI re-exports to reference documents nested within other TBD. Currently, it was deliberately implemented to not handle this case, but as it has turned out, we need this feature. The concrete use case is `System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa` reexports /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit , which then reexports /System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation The current implementation uses a global currentTopLevelTapi, which is not reset until it finishes loading the whole tree. This is a problem because if the top-level is set to Cocoa, then when we get to UIFoundation, it will try to find UIFoundation in the current top level, which is Cocoa and will not find it. The right thing should be: - When loading a library from a TBD file, re-exports need to be looked up in the auxiliary documents within the same TBD. - When loading from an actual dylib, no additional TBD documents need to be examined. - In no case does a re-export mentioned in one TBD file need to be looked up in a document in an auxiliary document from a different TBD file Differential Revision: https://reviews.llvm.org/D97438	2021-03-02 12:14:31 -05:00
Nico Weber	8174f33dc9	[lld/mac] Add support for -flat_namespace -flat_namespace makes lld emit binaries that use name lookup that's more in line with other POSIX systems: Instead of looking up symbols as (dylib,name) pairs by dyld, they're instead looked up just by name. -flat_namespace has three effects: 1. MH_TWOLEVEL and MH_NOUNDEFS are no longer set in the Mach-O header 2. All symbols use BIND_SPECIAL_DYLIB_FLAT_LOOKUP as ordinal 3. When a dylib is added to the link, its dependent dylibs are also added, so that lld can verify that no undefined symbols remain at the end of a link with -flat_namespace. These transitive dylibs are added for symbol resolution, but they are not emitted in LC_LOAD_COMMANDs. -undefined with -flat_namespace still isn't implemented. Before this change, it was impossible to hit that combination because -flat_namespace caused a diagnostic. Now that it no longer does, emit a dedicated temporary diagnostic when both flags are used. Differential Revision: https://reviews.llvm.org/D97641	2021-03-01 15:25:10 -05:00
Jez Ng	f083f652c3	[lld-macho][nfc] Remove TODO regarding addends There was initially some concern around the correct handling of pcrel section relocations with r_length != 2. But it looks like there are no such relocations in practice -- x86_64's pcrel section relocs all have r_length == 2, and ARM64 doesn't even have pcrel section relocs. So we can replace the TODO with an assert. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D97576	2021-03-01 12:30:08 -05:00
Greg McGary	0223ab035c	[lld-macho] check minimum header length when opening linkable input files Bifurcate the `readFile()` API into ... * `readRawFile()` which performs no checks, and * `readLinkableFile()` which enforces minimum length of 20 bytes, same as ld64 There are no new tests because tweaks to existing tests are sufficient. Differential Revision: https://reviews.llvm.org/D97610	2021-02-27 14:41:40 -08:00
Jez Ng	82b3da6f6f	[lld-macho] Extract embedded addends for arm64 UNSIGNED relocations On arm64, UNSIGNED relocs are the only ones that use embedded addends instead of the ADDEND relocation. Also ensure that the addend works when UNSIGNED is part of a SUBTRACTOR pair. Reviewed By: #lld-macho, alexshap Differential Revision: https://reviews.llvm.org/D97105	2021-02-27 12:31:34 -05:00
Jez Ng	541390131e	[lld-macho] Don't emit rebase opcodes for subtractor minuend relocs Also add a few asserts to verify that we are indeed handling an UNSIGNED relocation as the minuend. I haven't made it an actual user-facing error since I don't think llvm-mc is capable of generating SUBTRACTOR relocations without an associated UNSIGNED. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D97103	2021-02-27 12:31:34 -05:00
Jez Ng	84579fc24f	[lld-macho] Basic support for linkage and visibility attributes in LTO When parsing bitcode, convert LTO Symbols to LLD Symbols in order to perform resolution. The "winning" symbol will then be marked as Prevailing at LTO compilation time. This is similar to what the other LLD ports do. This change allows us to handle `linkonce` symbols correctly, and to deal with duplicate bitcode symbols gracefully. Previously, both scenarios would result in an assertion failure inside the LTO code, complaining that multiple Prevailing definitions are not allowed. While at it, I also added basic logic around visibility. We don't do anything useful with it yet, but we do check that its value is valid. LLD-ELF appears to use it only to set FinalDefinitionInLinkageUnit for LTO, which I think is just a performance optimization. From my local experimentation, the linker itself doesn't seem to do anything differently when encountering linkonce / linkonce_odr / weak / weak_odr. So I've only written a test for one of them. LLD-ELF has more, but they seem to mostly be testing the intermediate bitcode output of their LTO backend...? I'm far from an expert here though, so I might very well be missing things. Reviewed By: #lld-macho, MaskRay, smeenai Differential Revision: https://reviews.llvm.org/D94342	2021-02-25 13:27:40 -05:00
Jez Ng	4752cdc9a2	[lld-macho] Check for arch compatibility when loading ObjFiles and TBDs The silent failures had confused me a few times. I haven't added a similar check for platform yet as we don't yet have logic to infer the platform automatically, and so adding that check would require updating dozens of test files. Reviewed By: #lld-macho, thakis, alexshap Differential Revision: https://reviews.llvm.org/D97209	2021-02-23 22:02:38 -05:00
Jez Ng	5e851733c5	[lld-macho] Fix semantics & add tests for ARM64 GOT/TLV relocs I've adjusted the RelocAttrBits to better fit the semantics of the relocations. In particular: 1. _UNSIGNED relocations are no longer marked with the `TLV` bit, even though they can occur within TLV sections. Instead the `TLV` bit is reserved for relocations that can reference thread-local symbols, and _UNSIGNED relocations have their own `UNSIGNED` bit. The previous implementation caused TLV and regular UNSIGNED semantics to be conflated, resulting in rebase opcodes being incorrectly emitted for TLV relocations. 2. I've added a new `POINTER` bit to denote non-relaxable GOT relocations. This distinction isn't important on x86 -- the GOT relocations there are either relaxable or non-relaxable loads -- but arm64 has `GOT_LOAD_PAGE21` which loads the page that the referent symbol is in (regardless of whether the symbol ends up in the GOT). This relocation must reference a GOT symbol (so must have the `GOT` bit set) but isn't itself relaxable (so must not have the `LOAD` bit). The `POINTER` bit is used for relocations that must reference a GOT slot. 3. A similar situation occurs for TLV relocations. 4. ld64 supports both a pcrel and an absolute version of ARM64_RELOC_POINTER_TO_GOT. But the semantics of the absolute version are pretty weird -- it results in the value of the GOT slot being written, rather than the address. (That means a reference to a dynamically-bound slot will result in zeroes being written.) The programs I've tried linking don't use this form of the relocation, so I've dropped our partial support for it by removing the relevant RelocAttrBits. Reviewed By: alexshap Differential Revision: https://reviews.llvm.org/D97031	2021-02-23 22:02:38 -05:00
Jez Ng	e5d780e049	[lld-macho] Use full input file name in invalid relocation error message Just something I noticed while debugging arm relocations... Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D97078	2021-02-23 22:02:38 -05:00
Vy Nguyen	5a856f5b44	Reland [lld-macho]Implement bundle_loader Reland `1a0afcf518` https://reviews.llvm.org/D95913 New change: fix UB bug caused by copying empty path/name. (since the executable does not have a name)	2021-02-22 14:05:12 -05:00
Vitaly Buka	c17547df44	Revert "Implement -bundle_loader" D95913 passes null pointer into memcpy This reverts commit `1a0afcf518`.	2021-02-19 17:40:07 -08:00
Vy Nguyen	1a0afcf518	Implement -bundle_loader Differential Revision: https://reviews.llvm.org/D95913 Usage: -bundle_loader <executable> This option specifies the executable that will load the build output file being linked. When building a bundle, users can use -bundle_loader to specify an executable that contains symbols referenced, but not implemented, in the bundle.	2021-02-18 16:11:37 -05:00
Greg McGary	87104faac4	[lld-macho] Add ARM64 target arch This is an initial base commit for ARM64 target arch support. I don't represent that it is complete or bug-free, but wish to put it out for review now that some basic things like branch target & load/store address relocs are working. I can add more tests to this base commit, or add them in follow-up commits. It is not entirely clear whether I use the "ARM64" (Apple) or "AArch64" (non-Apple) naming convention. Guidance is appreciated. Differential Revision: https://reviews.llvm.org/D88629	2021-02-08 18:14:07 -07:00
Jez Ng	f843bb82c0	[lld-macho] Force-loading should share code path with regular archive loads This extends {D92539} to work even when we are loading archive members via `-force_load`. I uncovered this issue while trying to force-load archives containing bitcode -- we were segfaulting. In addition to fixing the `-force_load` case, this diff also addresses the behavior of `-ObjC` when LTO bitcode is involved -- we need to force-load those archive members if they contain ObjC categories. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D95265	2021-02-03 13:43:47 -05:00
Jez Ng	163dcd8513	[lld-macho] Associate each Symbol with an InputFile This makes our error messages more informative. But the bigger motivation is for LTO symbol resolution, which will be in an upcoming diff. The changes in this one are largely mechanical. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D94316	2021-02-03 13:43:47 -05:00
Greg McGary	3a9d2f1488	[lld-macho][NFC] refactor relocation handling Add per-reloc-type attribute bits and migrate code from per-target file into target independent code, driven by reloc attributes. Many cleanups Differential Revision: https://reviews.llvm.org/D95121	2021-02-02 10:54:53 -07:00
Jez Ng	e98b441a09	[lld-macho] Remove unnecessary llvm:: namespace prefixes	2021-01-09 12:44:35 -05:00
Nico Weber	13f439a187	[lld/mac] Implement support for private extern symbols Private extern symbols are used for things scoped to the linkage unit. They cause duplicate symbol errors (so they're in the symbol table, unlike TU-scoped truly local symbols), but they don't make it into the export trie. They are created e.g. by compiling with -fvisibility=hidden. If two weak symbols have differing privateness, the combined symbol is non-private external. (Example: inline functions and some TUs that include the header defining it were built with -fvisibility-inlines-hidden and some weren't). A weak private external symbol implicitly has its "weak" dropped and behaves like a regular strong private external symbol: Weak is an export trie concept, and private symbols are not in the export trie. If a weak and a strong symbol have different privateness, the strong symbol wins. If two common symbols have differing privateness, the larger symbol wins. If they have the same size, the privateness of the symbol seen later during the link wins (!) -- this is a bit lame, but it matches ld64 and this behavior takes 2 lines less to implement than the less surprising "result is non-private external", so match ld64. (Example: `int a` in two .c files, both built with -fcommon, one built with -fvisibility=hidden and one without.) This also makes `__dyld_private` a true TU-local symbol, matching ld64. To make this work, make the `const char*` StringRefZ ctor correctly set `size` (without this, writing the string table crashed when calling getName() on the __dyld_private symbol). Mention in CommonSymbol's comment that common symbols are now disabled by default in clang. Mention in -keep_private_externs's HelpText that the flag only has an effect with `-r` (which we don't implement yet -- so this patch here doesn't regress any behavior around -r + -keep_private_externs). 
ld64 doesn't explicitly document it, but the commit text of http://reviews.llvm.org/rL216146 does, and ld64's OutputFile::buildSymbolTable() checks `_options.outputKind() == Options::kObjectFile` before calling `_options.keepPrivateExterns()` (the only reference to that function). Fixes PR48536. Differential Revision: https://reviews.llvm.org/D93609	2020-12-21 21:23:33 -05:00
Greg McGary	d4ec3346b1	[lld-macho][nfc] Refactor to accommodate paired relocs This is a refactor to pave the way for supporting paired-ADDEND for ARM64. The only paired reloc type for X86_64 is SUBTRACTOR. In a later diff, I will add SUBTRACTOR for both X86_64 and ARM64. * s/`getImplicitAddend`/`getAddend`/ because it handles all forms of addend: implicit, explicit, paired. * add predicate `bool isPairedReloc()` * check range of `relInfo.r_symbolnum` is internal, unrelated to user-input, so use `assert()`, not `error()` * minor cleanups & rearrangements in `InputFile::parseRelocations()` Differential Revision: https://reviews.llvm.org/D90614	2020-12-17 20:21:41 -08:00
Jez Ng	4c8276cdc1	[lld-macho] Use LC_LOAD_WEAK_DYLIB for dylibs with only weakrefs Note that dylibs without any refs will still be loaded in the usual (strong) fashion. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D93435	2020-12-17 08:49:17 -05:00
Jez Ng	811444d7a1	[lld-macho] Add support for weak references Weak references need not necessarily be satisfied at runtime (but they must still be satisfied at link time). So symbol resolution still works as per usual, but we now pass around a flag -- ultimately emitting it in the bind table -- to indicate if a given dylib symbol is a weak reference. ld64's behavior for symbols that have both weak and strong references is a bit bizarre. For non-function symbols, it will emit a weak import. For function symbols (those referenced by BRANCH relocs), it will emit a regular import. I'm not sure what value there is in that behavior, and since emulating it will make our implementation more complex, I've decided to treat regular weakrefs like function symbol ones for now. Fixes PR48511. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D93369	2020-12-17 08:49:16 -05:00
Nico Weber	ec88746a05	[lld/mac] fill in current and compatibility version for LC_LOAD_(WEAK_)DYLIB Not sure if anything actually depends on this, but it makes `otool -L` output look nicer. Differential Revision: https://reviews.llvm.org/D93332	2020-12-15 19:34:59 -05:00
Jez Ng	3aa8e071dd	[lld-macho] Add implicit dylib support for frameworks {D93000} applied to frameworks. Partial fix for PR48511. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D93277	2020-12-15 15:58:26 -05:00
Jez Ng	544148ae70	[lld-macho] -weak_{library,framework} should always take priority We were not setting forceWeakImport for file paths given by `-weak_library` if we had already loaded the file. This diff fixes that by having `loadDylib` return a cached DylibFile instance even if we have already loaded that file. We still avoid emitting multiple LC_LOAD_DYLIBs, but we achieve this by making inputFiles a SetVector instead of relying on the `loadedDylibs` cache. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D93255	2020-12-15 15:58:26 -05:00
Jez Ng	76c36c11a9	[lld-macho] Don't load dylibs more than once Also remove `DylibFile::reexported` since it's unused. Fixes llvm.org/PR48393. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D93001	2020-12-10 15:57:52 -08:00
Jez Ng	6a348f6158	[lld-macho] Implement `-no_implicit_dylibs` Dylibs that are "public" -- i.e. top-level system libraries -- are considered implicitly linked when another library re-exports them. That is, we should load them & bind directly to their symbols instead of via their re-exporting umbrella library. This diff implements that behavior by default, as well as an opt-out flag. In theory, this is just a performance optimization, but in practice it seems that it's needed for correctness. Fixes llvm.org/PR48395. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D93000	2020-12-10 15:57:52 -08:00
Jez Ng	863f7a745e	[lld-macho] Don't attempt to emit rebase opcodes for debug sections This was causing a crash as we were attempting to look up the nonexistent parent OutputSection of the debug sections. We didn't detect it earlier because there was no test for PIEs with debug info (PIEs require us to emit rebases for X86_64_RELOC_UNSIGNED). This diff filters out the debug sections while loading the ObjFiles. In addition to fixing the above problem, it also lets us avoid doing redundant work -- we no longer parse / apply relocations / attempt to emit dyld opcodes for these sections that we don't emit. Fixes llvm.org/PR48392. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D92904	2020-12-10 15:57:51 -08:00
Jez Ng	78976bf3da	[lld-macho] Support parsing of bitcode within archives Also error out if we find anything other than an object or bitcode file in the archive. Note that we were previously inserting the symbols and sections of the unpacked ObjFile into the containing ArchiveFile. This was actually unnecessary -- we can just insert the ObjectFile (or BitcodeFile) into the `inputFiles` vector. This is the approach taken by LLD-ELF. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D92539	2020-12-08 10:34:32 -08:00
Jez Ng	7b007ac080	[lld-macho][nfc] Move some methods from InputFile to ObjFile Additionally: 1. Move the helper functions in InputSection.h below the definition of `InputSection`, so the important stuff is on top 2. Remove unnecessary `explicit` Reviewed By: #lld-macho, compnerd Differential Revision: https://reviews.llvm.org/D92453	2020-12-08 10:34:32 -08:00
Nico Weber	16b1f6e385	[mac/lld] Add support for the LC_LINKER_OPTION load command in o files clang puts `-framework CoreFoundation` in this load command for files that use @available / __builtin_available. Without support for this, binaries that don't explicitly link to CoreFoundation fail to link. Differential Revision: https://reviews.llvm.org/D92624	2020-12-04 08:46:53 -05:00
Nico Weber	7cb0a373d1	[mac/lld] Implement -t Goes well with `-why_load` to get an idea of load order. Differential Revision: https://reviews.llvm.org/D92583	2020-12-03 16:02:38 -05:00
Nico Weber	3422f3cc6e	Reland "[mac/lld] Implement -why_load". The problem was that `sym` was replaced in the call to make<ObjFile> and referring to it after that read memory that now stored a different kind of symbol (a Defined instead of a LazySymbol). Since this happens only once per archive, just copy the symbol to the stack before make<ObjFile> and read the copy instead. Originally reviewed at https://reviews.llvm.org/D92496	2020-12-03 08:35:12 -05:00
Nico Weber	ea0029f55d	Revert "[mac/lld] Implement -why_load" This reverts commit `542d3b609d`. Seems to break check-lld. Reverting while I take a look.	2020-12-02 18:57:46 -05:00
Nico Weber	542d3b609d	[mac/lld] Implement -why_load This is useful for debugging why lld loads .o files it shouldn't load. It's also useful for users of lld -- I've used ld64's version of this a few times. Differential Revision: https://reviews.llvm.org/D92496	2020-12-02 18:33:12 -05:00
Nico Weber	ca634393fc	[mac/lld] Make --reproduce work with thin archives See http://reviews.llvm.org/rL268229 and http://reviews.llvm.org/rL313832 which did the same for the ELF port. Differential Revision: https://reviews.llvm.org/D92456	2020-12-02 09:48:31 -05:00
Nico Weber	b2f00f24a3	[mac/lld] Include archive name in diagnostics Also, for .o files, include full path as given on link command line. Before: lld: error: undefined symbol [...], referenced from sandbox_logging.o After: lld: error: undefined symbol [...], referenced from libseatbelt.a(sandbox_logging.o) Move archiveName up to InputFile so we can consistently use toString() to print InputFiles in diags, and pass it to the ObjFile ctor. This matches the ELF and COFF ports. Differential Revision: https://reviews.llvm.org/D92437	2020-12-01 23:00:25 -05:00
Nico Weber	07ab597bb0	[lld/mac] Fix issues around thin archives - most importantly, fix a use-after-free when using thin archives, by putting the archive unique_ptr to the arena allocator. This ports D65565 to MachO - correctly demangle symbol names from archives in diagnostics - add a test for thin archives -- it finds this UaF, but only when running it under asan (it also finds the demangling fix) - make forceLoadArchive() use addFile() with a bool to have the archive loading code in fewer places. no behavior change; matches COFF port a bit better Differential Revision: https://reviews.llvm.org/D92360	2020-12-01 18:48:29 -05:00
Jez Ng	78f6498cdc	[lld-macho] Flesh out STABS implementation This addresses a lot of the comments in {D89257}. Ideally it'd have been done in the same diff, but the commits in between make that difficult. This diff implements: * N_GSYM and N_STSYM, the STABS for global and static symbols * Has the STABS reflect the section IDs of their referent symbols * Ensures we don't fail when encountering absolute symbols or files with no debug info * Sorts STABS symbols by file to minimize the number of N_OSO entries Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D92366	2020-12-01 15:05:21 -08:00
Jez Ng	b768d57b36	[lld-macho] Add archive name and file modtime to STABS output We should also set the modtime when running LTO. That will be done in a future diff, together with support for the `-object_path_lto` flag. Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D91318	2020-12-01 15:05:21 -08:00
Jez Ng	3fcb0eeb15	[lld-macho] Emit STABS symbols for debugging, and drop debug sections Debug sections contain a large amount of data. In order not to bloat the size of the final binary, we remove them and instead emit STABS symbols for `dsymutil` and the debugger to locate their contents in the object files. With this diff, `dsymutil` is able to locate the debug info. However, we need a few more features before `lldb` is able to work well with our binaries -- e.g. having `LC_DYSYMTAB` accurately reflect the number of local symbols, emitting `LC_UUID`, and more. Those will be handled in follow-up diffs. Note also that the STABS we emit differ slightly from what ld64 does. First, we emit the path to the source file as one `N_SO` symbol instead of two. (`ld64` emits one `N_SO` for the dirname and one for the basename.) Second, we do not emit `N_BNSYM` and `N_ENSYM` STABS to mark the start and end of functions, because the `N_FUN` STABS already serve that purpose. @clayborg recommended these changes based on his knowledge of what the debugging tools look for. Additionally, this current implementation doesn't accurately reflect the size of function symbols. It uses the size of their containing sections as a proxy, but that is only accurate if `.subsections_via_symbols` is set, and if there isn't an `N_ALT_ENTRY` in that particular subsection. I think we have two options to solve this: 1. We can split up subsections by symbol even if `.subsections_via_symbols` is not set, but include constraints to ensure those subsections retain their order in the final output. This is `ld64`'s approach. 2. We could just add a `size` field to our `Symbol` class. This seems simpler, and I'm more inclined toward it, but I'm not sure if there are use cases that it doesn't handle well. As such I'm punting on the decision for now. Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D89257	2020-12-01 15:05:20 -08:00
Nico Weber	83e60f5a55	[lld/mac] Add --reproduce option This adds support for ld.lld's --reproduce / lld-link's /reproduce: flag to the MachO port. This flag can be added to a link command to make the link write a tar file containing all inputs to the link and a response file containing the link command. This can be used to reproduce the link on another machine, which is useful for sharing bug report inputs or performance test loads. Since the linker is usually called through the clang driver and adding linker flags can be a bit cumbersome, setting the env var `LLD_REPRODUCE=foo.tar` triggers the feature as well. The file response.txt in the archive can be used with `ld64.lld.darwinnew $(cat response.txt)` as long as the contents are smaller than the command-line limit, or with `ld64.lld.darwinnew @response.txt` once D92149 is in. The support in this patch is sufficient to create a tar file for Chromium's base_unittests that can link after unpacking on a different machine. Differential Revision: https://reviews.llvm.org/D92274	2020-11-30 08:40:21 -05:00
Nico Weber	c519bc7e16	lld/MachO: Move MachOOptTable to DriverUtils.cpp, remove DriverUtils.h This makes lld/MachO look more like lld/COFF and lld/ELF, as discussed in D91640.	2020-11-18 12:33:15 -05:00
Jez Ng	21f831134c	[lld-macho] Add very basic support for LTO Just enough to consume some bitcode files and link them. There's more to be done around the symbol resolution API and the LTO config, but I don't yet understand what all the various LTO settings do... Reviewed By: #lld-macho, compnerd, smeenai, MaskRay Differential Revision: https://reviews.llvm.org/D90663	2020-11-10 12:19:28 -08:00
Jez Ng	62a3f0c984	[lld-macho] Support absolute symbols They operate like Defined symbols but with no associated InputSection. Note that `ld64` seems to treat the weak definition flag like a no-op for absolute symbols, so I have replicated that behavior. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D87909	2020-09-25 11:28:35 -07:00
Jez Ng	c32e69b2ce	[lld-macho][re-land] Initial support for common symbols Fix earlier build break via a static_cast. This reverts commit `8112d494d3`. Differential Revision: https://reviews.llvm.org/D86909	2020-09-24 15:00:20 -07:00
Muhammad Omair Javaid	8112d494d3	Revert "[lld-macho] Initial support for common symbols" This reverts commit `63ace77962`. Breaks LLDB Arm build: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4409	2020-09-24 12:26:40 +05:00
Jez Ng	9c70281497	[lld-macho][NFC] Make `!= nullptr` implicit	2020-09-23 20:09:49 -07:00
Jez Ng	63ace77962	[lld-macho] Initial support for common symbols On Unix, it is traditionally allowed to write variable definitions without initialization expressions (such as "int foo;") to header files. These are called tentative definitions. The compiler creates common symbols when it sees tentative definitions. When linking the final binary, if there are remaining common symbols after name resolution is complete, the linker converts them to regular defined symbols in a `__common` section. This diff implements most of that functionality, though we do not yet handle the case where there are both common and non-common definitions of the same symbol. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D86909	2020-09-23 19:26:40 -07:00
Greg McGary	1a3ef0417c	[lld-macho] In the context of relocs, s/target/referent/ for sections & symbols The word "target" is overloaded, so lighten its load by using another word to denote the symbol or section to which a reloc points. While more stilted than "target", "referent" is rather less pompous than "designatum" or "denotatum". :P Along the way, make a few neighboring variable names more descriptive. Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D87584	2020-09-22 20:31:01 -07:00
Jez Ng	cbe27316ef	[lld-macho] Implement weak bindings for GOT/TLV Previously, we were only emitting regular bindings to weak dynamic symbols; this diff adds support for the weak bindings too, which can overwrite the regular bindings at runtime. We also treat weak defined global symbols similarly -- since they can also be interposed at runtime, they need to be treated as potentially dynamic symbols. Note that weak bindings differ from regular bindings in that they do not specify the dylib to do the lookup in (i.e. weak symbol lookup happens in a flat namespace.) Differential Revision: https://reviews.llvm.org/D86572	2020-08-26 19:21:09 -07:00
Jez Ng	cf918c809b	[lld-macho] Implement -ObjC It's roughly like -force_load with some filtering. Differential Revision: https://reviews.llvm.org/D86181	2020-08-26 19:20:55 -07:00
Jez Ng	7394460d87	[lld-macho] Handle TAPI and regular re-exports uniformly The re-exports list in a TAPI document can either refer to other inlined TAPI documents, or to on-disk files (which may themselves be TBD or regular files.) Similarly, the re-exports of a regular dylib can refer to a TBD file. Differential Revision: https://reviews.llvm.org/D85404	2020-08-26 19:20:48 -07:00
Jez Ng	6336c042f6	[lld-macho] Make it possible to re-export .tbd files Two things needed fixing for that to work: 1. getName() no longer returns null for DylibFiles constructed from TAPIs 2. markSubLibrary() now accepts .tbd as a possible extension Differential Revision: https://reviews.llvm.org/D86180	2020-08-26 19:20:42 -07:00
Jez Ng	7e6d675499	[lld-macho] Avoid unnecessary shared_ptr in DylibFile ctor DylibFile doesn't store a pointer to its InterfaceFile parameter, so there's no need to use a shared_ptr. Reviewed By: #lld-macho, compnerd Differential Revision: https://reviews.llvm.org/D85402	2020-08-12 19:50:12 -07:00
Jez Ng	a499898e86	[lld-macho] Generate ObjC symbols from .tbd files I followed similar logic in TapiFile.cpp. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D85255	2020-08-12 19:50:10 -07:00
Jez Ng	3c9100fb78	[lld-macho] Support dynamic linking of thread-locals References to symbols in dylibs work very similarly regardless of whether the symbol is a TLV. The main difference is that we have a separate `__thread_ptrs` section that acts as the GOT for these thread-locals. We can identify thread-locals in dylibs by a flag in their export trie entries, and we cross-check it with the relocations that refer to them to ensure that we are not using a GOT relocation to reference a thread-local (or vice versa). Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D85081	2020-08-12 19:50:09 -07:00
Greg McGary	a379f2c251	[lld-macho] Handle command-line option -sectcreate SEG SECT FILE Handle command-line option `-sectcreate SEG SECT FILE`, which inputs a binary blob from `FILE` into `SEG,SECT` Reviewed By: int3 Differential Revision: https://reviews.llvm.org/D85501	2020-08-10 18:47:13 -07:00
Jez Ng	31d5885842	[lld-macho] Partial support for weak definitions This diff adds support for weak definitions, though it doesn't handle weak symbols in dylibs quite correctly -- we need to emit binding opcodes for them in the weak binding section rather than the lazy binding section. What is covered in this diff: 1. Reading the weak flag from symbol table / export trie, and writing it to the export trie 2. Refining the symbol table's rules for choosing one symbol definition over another. Wrote a few dozen test cases to make sure we were matching ld64's behavior. We can now link basic C++ programs. Reviewed By: #lld-macho, compnerd Differential Revision: https://reviews.llvm.org/D83532	2020-07-24 15:55:25 -07:00
Jez Ng	74871cdad7	[lld-macho] Ensure __bss sections we output have file offset of zero Summary: llvm-mc emits `__bss` sections with an offset of zero, but we weren't expecting that in our input, so we were copying non-zero data from the start of the file and putting it in `__bss`, with obviously undesirable runtime results. (It appears that the kernel will copy those nonzero bytes as long as the offset is nonzero, regardless of whether S_ZERO_FILL is set.) I debated on whether to make a special ZeroFillSection -- separate from a regular InputSection -- but it seemed like too much work for now. But I'm happy to refactor if anyone feels strongly about having it as a separate class. Depends on D80857. Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee Reviewed By: smeenai Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80859	2020-06-17 20:41:28 -07:00
Jez Ng	fcde378dcb	[lld-macho] Support non-pcrel section relocs Summary: Depends on D80854. Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80855	2020-06-17 20:41:28 -07:00
Saleem Abdulrasool	73312976ad	lld: remove old test support path This removes the stub library that lld injected to satisfy the dependency on the libSystem. Now with TBD support, we can provide the stub library to permit the tests to function properly as they would on a real system. Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D81418	2020-06-16 15:57:58 -07:00
Jez Ng	53c796b948	[lld-macho] Properly handle & validate relocation r_length Summary: We should be reading / writing our addends / relocated addresses based on r_length, and not just based on the type of the relocation. But since only some r_length values are valid for a given reloc type, I've also added some validation. ld64 has code to allow for r_length = 0 in X86_64_RELOC_BRANCH relocs, but I'm not sure how to create such a relocation... Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D80854	2020-06-14 16:35:23 -07:00
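As a rough illustration of reading and writing a relocated field by `r_length` rather than by relocation type: `r_length` encodes the log2 of the field width, so the field is `1 << r_length` bytes. The helper names `readField` and `writeField` are hypothetical, and the sketch assumes little-endian fields and returns the raw unsigned value without sign extension. Which `r_length` values are valid for which relocation type is left out here.

```cpp
#include <cstdint>
#include <stdexcept>

// Reads a little-endian field of (1 << rLength) bytes starting at loc.
// rLength follows the Mach-O convention: 0 = 1 byte, 1 = 2, 2 = 4, 3 = 8.
std::uint64_t readField(const std::uint8_t *loc, unsigned rLength) {
  if (rLength > 3)
    throw std::invalid_argument("invalid r_length");
  const unsigned width = 1u << rLength;
  std::uint64_t value = 0;
  for (unsigned i = 0; i < width; ++i)
    value |= static_cast<std::uint64_t>(loc[i]) << (8 * i);
  return value;
}

// Writes the low (1 << rLength) bytes of value to loc, little-endian.
void writeField(std::uint8_t *loc, unsigned rLength, std::uint64_t value) {
  if (rLength > 3)
    throw std::invalid_argument("invalid r_length");
  const unsigned width = 1u << rLength;
  for (unsigned i = 0; i < width; ++i)
    loc[i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xff);
}
```

Assembling the value byte by byte keeps the sketch independent of host endianness and alignment.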
Saleem Abdulrasool	6fe27b5fed	lld: initial pass at supporting TBD Add support to lld to use Text Based API stubs for linking. This support is incomplete: it does not filter out platforms. It also does not account for architecture specific API handling and potentially does not correctly handle trees of re-exports with inlined libraries being treated as direct children of the top level library.	2020-06-08 18:15:40 -07:00
Jez Ng	1e1a3f67ee	[lld-macho] Ensure reads from nlist_64 structs are aligned when necessary My test refactoring in D80217 seems to have caused yaml2obj to emit unaligned nlist_64 structs, causing ASAN'd lld to be unhappy. I don't think this is an issue with yaml2obj though -- llvm-mc also seems to emit unaligned nlist_64s. This diff makes lld able to safely do aligned reads under ASAN builds while hopefully creating no overhead for regular builds on architectures that support unaligned reads. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D80414	2020-06-02 13:19:38 -07:00
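One common way to read a struct safely from a possibly unaligned buffer is to copy it out with `memcpy`. This is a sketch of that general technique, not necessarily what the commit above does. `NList64` is a local mirror of the `nlist_64` field layout declared for the example, and `readNList` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Local mirror of the Mach-O nlist_64 layout, declared here so the
// example is self-contained.
struct NList64 {
  std::uint32_t n_strx;
  std::uint8_t n_type;
  std::uint8_t n_sect;
  std::uint16_t n_desc;
  std::uint64_t n_value;
};

// Copies the index-th entry out of a raw symbol table buffer. memcpy places
// no alignment requirement on the source pointer, so this is well-defined
// even when buf is not 8-byte aligned.
NList64 readNList(const unsigned char *buf, std::size_t index) {
  NList64 out;
  std::memcpy(&out, buf + index * sizeof(NList64), sizeof(NList64));
  return out;
}
```

On targets that support unaligned loads, compilers typically lower a fixed-size `memcpy` like this to plain loads, so the copy is usually cheap.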
Jez Ng	6f6d91867d	[lld-macho] Add some relocation validation logic I considered making a `Target::validate()` method, but I wasn't sure how I felt about the overhead of doing yet another switch-dispatch on the relocation type, so I put the validation in `relocateOne` instead... might be a bit of a micro-optimization, but `relocateOne` does assume certain things about the relocations it gets, and this error handling makes that explicit, so it's not a totally unreasonable code organization. Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D80049	2020-06-02 13:19:38 -07:00
Jez Ng	ce0d8beebc	[lld-macho][re-land] Support X86_64_RELOC_UNSIGNED This reverts commit `db8559eee4`.	2020-05-19 12:31:55 -07:00
Jez Ng	4eb6f4854e	[lld-macho][re-land] Support .subsections_via_symbols Summary: This diff restores and builds upon @pcc and @ruiu's initial work on subsections. The .subsections_via_symbols directive indicates we can split each section along symbol boundaries, unless those symbols have been marked with `.alt_entry`. We exercise this functionality in our tests by using order files that rearrange those symbols. Depends on D79668. Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee Reviewed By: smeenai Subscribers: thakis, llvm-commits, pcc, ruiu Tags: #llvm Differential Revision: https://reviews.llvm.org/D79926	2020-05-19 12:31:54 -07:00
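The splitting rule described above (cut a section at each symbol boundary, except at symbols marked `.alt_entry`) can be sketched as below. `SymbolInfo`, `Range`, and `splitAtSymbols` are hypothetical names for illustration, and the sketch models only byte ranges, not lld's actual section objects.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct SymbolInfo {
  std::uint64_t offset; // offset of the symbol within its section
  bool altEntry;        // true if marked with .alt_entry
};

struct Range {
  std::uint64_t begin; // inclusive
  std::uint64_t end;   // exclusive
};

// Splits [0, sectionSize) into subsections that start at each symbol which is
// not an alt entry. Alt-entry symbols stay inside the preceding subsection.
std::vector<Range> splitAtSymbols(std::uint64_t sectionSize,
                                  std::vector<SymbolInfo> syms) {
  std::vector<Range> ranges;
  if (sectionSize == 0)
    return ranges;
  std::stable_sort(syms.begin(), syms.end(),
                   [](const SymbolInfo &a, const SymbolInfo &b) {
                     return a.offset < b.offset;
                   });
  std::vector<std::uint64_t> cuts{0};
  for (const SymbolInfo &s : syms)
    if (!s.altEntry && s.offset > cuts.back() && s.offset < sectionSize)
      cuts.push_back(s.offset);
  for (std::size_t i = 0; i < cuts.size(); ++i)
    ranges.push_back({cuts[i], i + 1 < cuts.size() ? cuts[i + 1] : sectionSize});
  return ranges;
}
```

For a 32-byte section with symbols at offsets 0, 8, 16 (alt entry), and 24, this yields three subsections, with the alt-entry symbol remaining inside the one that starts at offset 8.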
Jez Ng	70fbbcdd34	Revert "[lld-macho] Support .subsections_via_symbols" Due to build breakage mentioned in https://reviews.llvm.org/D79926. This reverts commit `e270b2f172`.	2020-05-19 08:30:02 -07:00
Jez Ng	db8559eee4	Revert "[lld-macho] Support X86_64_RELOC_UNSIGNED" This reverts commit `1f820e3559`.	2020-05-19 08:30:02 -07:00
Jez Ng	1f820e3559	[lld-macho] Support X86_64_RELOC_UNSIGNED Note that it's only used for non-pc-relative contexts. Reviewed By: MaskRay, smeenai Differential Revision: https://reviews.llvm.org/D80048	2020-05-19 07:46:57 -07:00
Jez Ng	e270b2f172	[lld-macho] Support .subsections_via_symbols This diff restores and builds upon @pcc and @ruiu's initial work on subsections. The .subsections_via_symbols directive indicates we can split each section along symbol boundaries, unless those symbols have been marked with `.alt_entry`. We exercise this functionality in our tests by using order files that rearrange those symbols. Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D79926	2020-05-19 07:46:57 -07:00
Jez Ng	55e9eb416e	[lld-macho] Support -order_file The order file indicates how input sections should be sorted within each output section, based on the symbols contained within those sections. This diff sets the stage for implementing and testing `.subsections_via_symbols`, where we will break up InputSections by each symbol and sort them more granularly. Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D79668	2020-05-19 07:46:57 -07:00
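A minimal sketch of sorting input sections by an order file, under stated assumptions: each order-file line is taken to be a bare symbol name, a section's rank is the earliest-listed symbol it contains, and unlisted sections keep their input order after the listed ones. `Section` and `sortByOrderFile` are hypothetical names, not lld's.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

struct Section {
  std::string name;
  std::vector<std::string> symbols; // symbols defined in this section
};

// Returns the sections reordered so that those containing earlier-listed
// symbols come first. stable_sort keeps input order among equal ranks.
std::vector<Section> sortByOrderFile(std::vector<Section> sections,
                                     const std::vector<std::string> &orderFile) {
  std::unordered_map<std::string, std::size_t> rank;
  for (std::size_t i = 0; i < orderFile.size(); ++i)
    rank.emplace(orderFile[i], i); // the first mention of a symbol wins

  const std::size_t unlisted = std::numeric_limits<std::size_t>::max();
  auto sectionRank = [&](const Section &s) {
    std::size_t best = unlisted;
    for (const std::string &sym : s.symbols) {
      auto it = rank.find(sym);
      if (it != rank.end())
        best = std::min(best, it->second);
    }
    return best;
  };

  std::stable_sort(sections.begin(), sections.end(),
                   [&](const Section &a, const Section &b) {
                     return sectionRank(a) < sectionRank(b);
                   });
  return sections;
}
```

Finer-grained ordering follows naturally once sections are split per symbol, since each piece then carries the rank of its own symbol.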
Kellie Medlin	2b920ae78c	[lld] Add archive file support to Mach-O backend With this change, basic archive files can be linked together. Input section discovery has been refactored into a function since archive files lazily resolve their symbols / the object files containing those symbols. Reviewed By: int3, smeenai Differential Revision: https://reviews.llvm.org/D78342	2020-05-14 12:58:35 -07:00

260 Commits