llvm-project

Commit Graph

Author	SHA1	Message	Date
Nico Weber	87248ba5b1	[lld/elf] Use C++17 nested namespace syntax in most places Like D131405, but for ELF. No behavior change. Differential Revision: https://reviews.llvm.org/D131612	2022-08-10 16:47:30 -04:00
Alex Brachet	dbd04b853b	[ELF] Support --package-metadata This was recently introduced in GNU linkers and it makes sense for ld.lld to have the same support. This implementation omits checking if the input string is valid json to reduce size bloat. Differential Revision: https://reviews.llvm.org/D131439	2022-08-08 21:31:58 +00:00
Fangrui Song	81ed005c4c	[ELF] Remove EhFrameSection::addSection. NFC	2022-07-31 19:55:05 -07:00
Fangrui Song	a465e79f19	[ELF] Move SyntheticSections to InputSection.h. NFC Keep the main SectionBase hierarchy in InputSection.h. And inline MergeInputSection::getParent.	2022-07-30 17:42:08 -07:00
Fangrui Song	2e2d5304f0	[ELF] Move combineEhSections from Writer to SyntheticSections. NFC This not only places the function in the right place, but also allows inlining addSection.	2022-07-29 00:47:30 -07:00
Mitch Phillips	786c89fed3	[ELF][MTE] Add --android-memtag-* options to synthesize ELF notes This ELF note is aarch64 and Android-specific. It specifies to the dynamic loader that specific work should be scheduled to enable MTE protection of stack and heap regions. Current synthesis of the ".note.android.memtag" ELF note is done in the Android build system. We'd like to move that to the compiler. This patch adds the --memtag-stack, --memtag-heap, and --memtag-mode={async, sync, none} flags to the linker, which synthesises the note for us. Future changes will add -fsanitize=memtag* flags to clang which will pass these through to lld. Depends on D119381. Differential Revision: https://reviews.llvm.org/D119384	2022-04-04 11:17:36 -07:00
Nico Weber	cd52b35ee4	fix comment typos to cycle bots	2022-04-04 08:56:18 -04:00
Fangrui Song	1db59dc8e2	[ELF] Fix llvm_unreachable failure when COMMON is placed in SHT_PROGBITS output section Fix a regression in aa27bab5a1a17e9c4168a741a6298ecaa92c1ecb: COMMON in an SHT_PROGBITS output section caused llvm_unreachable failure.	2022-03-28 11:05:52 -07:00
Fangrui Song	385573e07b	[ELF] Inline ARMExidxSyntheticSection::classof. NFC To optimize the only call site `dyn_cast<ARMExidxSyntheticSection>(first)` and decrease code size.	2022-03-15 23:41:30 -07:00
Joao Moreira	9d7001eba9	[ELF][X86] Don't create IBT .plt if there is no PLT entry https://github.com/ClangBuiltLinux/linux/issues/1606 When GNU_PROPERTY_X86_FEATURE_1_IBT is enabled, ld.lld will create .plt output section even if there is no PLT entry. Fix this by implementing IBTPltSection::isNeeded instead of using the default code path (which always returns true). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D120600	2022-02-26 03:55:40 +00:00
Fangrui Song	27bb799095	[ELF] Clean up headers. NFC	2022-02-07 21:53:34 -08:00
Fangrui Song	196aedb843	[ELF] Change vector<InputSection *> to SmallVector. NFC My x86-64 lld executable is 8KiB smaller.	2022-02-01 00:14:21 -08:00
Fangrui Song	7cd0c45364	[ELF] Simplify SectionBase::partition handling and make it live by default. NFC Previously an InputSectionBase is dead (`partition==0`) by default. SyntheticSection calls markLive and BssSection overrides that with markDead. It is more natural to make InputSectionBase live by default and let --gc-sections mark InputSectionBase dead. When linking a Release build of clang: * --no-gc-sections: the removed `inputSections` loop decreases markLive time from 4ms to 1ms. * --gc-sections: the extra `inputSections` loop increases markLive time from 0.181296s to 0.188526s. This is as if we lose the optimization that removed one `inputSections` loop (`4374824ccf`). I believe the loss can be mitigated if we refactor markLive.	2022-01-30 15:12:09 -08:00
Fangrui Song	988a03c585	[ELF] Add some MipsSection to InStruct and change make<MipsSection> to std::make_unique Similar to D116143. My x86-64 lld executable is 20+KiB smaller.	2022-01-29 23:55:29 -08:00
Fangrui Song	469c4124ab	[ELF] --gdb-index: switch to SmallVector. NFC	2022-01-29 15:24:56 -08:00
Fangrui Song	da0e5b885b	[ELF] Refactor -z combreloc * `RelocationBaseSection::addReloc` increases `numRelativeRelocs`, which duplicates the work done by RelocationSection<ELFT>::writeTo. * --pack-dyn-relocs=android has inappropriate DT_RELACOUNT. AndroidPackedRelocationSection does not necessarily place relative relocations in the front and DT_RELACOUNT might cause semantics error (though our implementation doesn't and Android bionic doesn't use DT_RELACOUNT anyway.) Move `llvm::partition` to a new function `partitionRels` and compute `numRelativeRelocs` there. Now `RelocationBaseSection::addReloc` is trivial and can be moved to the header to enable inlining. The rest of DynamicReloc and `-z combreloc` handling is moved to the non-template `RelocationBaseSection::computeRels` to decrease code size. My x86-64 lld executable is 44+KiB smaller. While here, rename `sort` to `combreloc`.	2022-01-29 14:45:58 -08:00
Fangrui Song	ac0986f880	[ELF] Change std::vector<InputSectionBase *> to SmallVector There is no remaining std::vector<InputSectionBase> now. My x86-64 lld executable is 2KiB smaller.	2022-01-17 10:25:07 -08:00
Fangrui Song	e205445434	[ELF] StringTableSection: Use DenseMap<CachedHashStringRef> to avoid redundant hash computation 5~6% speedup when linking clang and chrome.	2022-01-16 21:02:05 -08:00
Fangrui Song	a5249c2dd2	[ELF] Change gnuHashTab/hashTab to unique_ptr. NFC and remove associated make<XXX> calls. My x86-64 `lld` is ~5KiB smaller.	2022-01-12 13:04:32 -08:00
Fangrui Song	7f1955dc96	[ELF] Support mixed TLSDESC and TLS GD We only support both TLSDESC and TLS GD for x86 so this is an x86-specific problem. If both are used, only one R_X86_64_TLSDESC is produced and TLS GD accesses will incorrectly reference R_X86_64_TLSDESC. Fix this by introducing SymbolAux::tlsDescIdx. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D116900	2022-01-10 10:03:21 -08:00
Fangrui Song	cb203f3f92	[ELF] Change InStruct/Partition pointers to unique_ptr and remove associated make<XXX> calls. gnuHash and sysvHash are unchanged, otherwise LinkerScript::discard would destroy the objects which may be referenced by input section descriptions. My x86-64 lld executable is 121+KiB smaller.	2021-12-27 18:15:23 -08:00
Fangrui Song	80c14dcc0e	[ELF] Delete stale declaration. NFC	2021-12-27 12:56:38 -08:00
Fangrui Song	a1c2ee0147	[ELF] LinkerScript/OutputSection: change other std::vector members to SmallVector 11+KiB smaller .text with both libc++ and libstdc++ builds.	2021-12-26 13:53:47 -08:00
Fangrui Song	ad26b0b233	Revert "[ELF] Make Partition/InStruct members unique_ptr and remove associate make<XXX>" This reverts commit `e48b1c8a27`. This reverts commit `d019de23a1`. The changes caused memory leaks (non-final classes cannot use unique_ptr).	2021-12-22 23:55:11 -08:00
Fangrui Song	ba6973c89b	[ELF] Change nonnull pointer parameters to references	2021-12-22 22:02:29 -08:00
Fangrui Song	e48b1c8a27	[ELF] Make Partition members unique_ptr and remove associate make<XXX> See D116143 for benefits. My lld executable (x86-64) is 103+KiB smaller.	2021-12-22 21:34:26 -08:00
Fangrui Song	d019de23a1	[ELF] Make InStruct members unique_ptr and remove associate make<XXX> See D116143 for benefits. My lld executable (x86-64) is 24+KiB smaller.	2021-12-22 21:11:26 -08:00
Fangrui Song	5c75cc51b3	[ELF] Change nonnull pointer parameters to references. NFC	2021-12-22 21:09:57 -08:00
Fangrui Song	baa3eb0dd9	[ELF] Change some non-null pointer parameters to references. NFC	2021-12-22 20:51:11 -08:00
Fangrui Song	6683099a0d	[ELF] Optimize RelocationSection<ELFT>::writeTo When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured ``` 9.131462 Total ExecuteLinker 1.449913 Total Write output file 1.445784 Total Write sections 0.657152 Write sections {"detail":".rela.dyn"} ``` This change decreases the .rela.dyn time to 0.25, leading to 4% speed up in the total time. * The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values. * The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it. With the change, the new encodeDynamicReloc is cheap (0.05s). So no need to parallelize it. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D115993	2021-12-21 09:43:44 -08:00
Fangrui Song	552d84414d	[ELF] Use SmallVector for many SyntheticSections. NFC This decreases struct sizes and usually decreases the lld executable size (39KiB for my x86-64 executable) (unless in some cases smaller SmallVector leads to more inlining, e.g. StringTableBuilder). For --gdb-index, there may be memory usage saving.	2021-12-17 19:22:16 -08:00
Fangrui Song	93558e575e	[ELF] Internalize createMergeSynthetic. NFC Only called once. Moving to OutputSections.cpp can make it inlined. finalizeInputSections can be very hot, especially in -O1 links with much debug info.	2021-12-16 20:50:06 -08:00
Fangrui Song	5ca54c6686	[ELF] Simplify GnuHashSection::write. NFC	2021-11-25 14:23:25 -08:00
Fangrui Song	55c14d6dbf	[ELF] Simplify DynamicSection content computation. NFC The new code computes the content twice, but avoids the tricky std::function<uint64_t()>. Removed 13KiB code in a Release build.	2021-11-25 14:12:34 -08:00
Alex Richardson	cc7cb9523e	[ELF][AArch64] Write addends for TLSDESC relocations with -z rel Since D100490 this case is diagnosed for -z rel. This commit implements R_AARCH64_TLSDESC cases for AArch64::getImplicitAddend() and AArch64::relocate(). However, there are probably further relocation types that need to be handled for full support of -z rel. Fixes https://bugs.llvm.org/show_bug.cgi?id=47009 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100544	2021-07-09 10:41:41 +01:00
Alex Richardson	35c5e564e6	[ELF] Check the Elf_Rel addends for dynamic relocations There used to be many cases where addends for Elf_Rel were not emitted in the final object file (mostly when building for MIPS64 since the input .o files use RELA but the output uses REL). These cases have been fixed since, but this patch adds a check to ensure that the written values are correct. It is based on a previous patch that I added to the CHERI fork of LLD since we were using MIPS64 as a baseline. The work has now almost entirely shifted to RISC-V and Arm Morello (which use Elf_Rela), but I thought it would be useful to upstream our local changes anyway. This patch adds a (hidden) command line flag --check-dynamic-relocations that can be used to enable these checks. It is also on by default in assertions builds for targets that handle all dynamic relocations kinds that LLD can emit in Target::getImplicitAddend(). Currently this is enabled for ARM, MIPS, and I386. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101450	2021-07-09 10:41:40 +01:00
Alex Richardson	6d87ca08ae	[ELF] Refactor DynamicReloc to fix incorrect relocation addends This patch changes the DynamicReloc class to store an enum instead of the overloaded useSymVA member to make it easier to understand and fix incorrect addends being written in some corner cases. The change is motivated by a follow-up review that checks the value of implicit Elf_Rel addends written to the output file. This patch fixes an incorrect output when using `-z rela` for i386 files with R_386_GOT32 relocations (not that this really matters since it's an unsupported configuration). Storing the relocation expression kind also addresses an incorrect addend FIXME in ppc64-abs64-dyn.s introduced in D63383. DynamicReloc now also has a special case for the MIPS TLS relocations (DynamicReloc::AgainstSymbolWithTargetVA) since the R_MIPS_TLS_TPREL{32/64} the symbol VA to the GOT for preemptible symbols. I'm not sure if the symbol value actually should be written for R_MIPS_TLS_TPREL32, but this patch does not attempt to change that behaviour. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100490	2021-07-09 10:41:40 +01:00
Georgii Rymar	ed146d6291	[LLD][ELF] - Use LLVM_ELF_IMPORT_TYPES_ELFT instead of multiple types definitions. NFCI. We can reduce the number of "using" declarations. `LLVM_ELF_IMPORT_TYPES_ELFT` was extended in D93801. Differential revision: https://reviews.llvm.org/D93856	2020-12-29 10:50:07 +03:00
Jessica Clarke	bef38e86b4	[ELF] Handle SHT_RISCV_ATTRIBUTES similarly to SHT_ARM_ATTRIBUTES Currently we treat SHT_RISCV_ATTRIBUTES like a normal section and concatenate all such input sections, yielding invalid output unless only a single attributes section is present in the input. Instead, pick the first as with SHT_ARM_ATTRIBUTES. We do not currently need to condition our behaviour on the contents, unlike Arm. In future, we should both do stricter validation of the input and merge all sections together to ensure we have, for example, the full arch string requirement, but this rudimentary implementation is good enough for most common cases. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D86309	2020-09-05 18:36:23 +01:00
Fangrui Song	21b4f8060a	[ELF] --icf: don't fold text sections with LSDA Fix PR36272 and PR46835 A .eh_frame FDE references a text section and (optionally) a LSDA (in .gcc_except_table). Even if two text sections have identical content and relocations (e.g. a() and b()), we cannot fold them if their LSDA are different. ``` void foo(); void a() { try { foo(); } catch (int) { } } void b() { try { foo(); } catch (float) { } } ``` Scan .eh_frame pieces with LSDA and disallow referenced text sections to be folded. If two .gcc_except_table have identical semantics (usually identical content with PC-relative encoding), we will lose folding opportunity. For ClickHouse (an exception-heavy application), this can reduce --icf=all efficiency from 9% to 5%. There may be some percentage we can reclaim without affecting correctness, if we analyze .eh_frame and .gcc_except_table sections. gold 2.24 implemented a more complex fix (resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=21066) which combines the checksum of .eh_frame CIE/FDE pieces. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D84610	2020-08-05 09:16:28 -07:00
Peter Smith	3b1622d63a	[LLD][ELF][ARM] recommit Fix ARM Exidx order for non monotonic section order Fixed error detected by msan. The size field of the .ARM.exidx synthetic section needs to be initialized to at least estimation level before calling assignAddresses as that will use the size field. This was previously reverted in `1ca16fc4f5`. Differential Revision: https://reviews.llvm.org/D78422	2020-04-24 13:47:28 +01:00
Kazuaki Ishizaki	7c5fcb3591	[lld] NFC: fix trivial typos in comments Differential Revision: https://reviews.llvm.org/D72339	2020-04-02 01:21:36 +09:00
Fangrui Song	00925aadb3	[ELF][PPC32] Fix canonical PLTs when the order does not match the PLT order Reviewed By: Bdragon28 Differential Revision: https://reviews.llvm.org/D75394	2020-02-28 22:23:14 -08:00
Fangrui Song	837e8a9c0c	[ELF][PPC32] Support canonical PLT -fno-pie produces a pair of non-GOT-non-PLT relocations R_PPC_ADDR16_{HA,LO} (R_ABS) referencing external functions. ``` lis 3, func@ha la 3, func@l(3) ``` In a -no-pie/-pie link, if func is not defined in the executable, a canonical PLT entry (st_value>0, st_shndx=0) will be needed. References to func in shared objects will be resolved to this address. -fno-pie -pie should fail with "can't create dynamic relocation ... against ...", so we just need to think about -no-pie. On x86, the PLT entry passes the JMP_SLOT offset to the rtld PLT resolver. On x86-64: the PLT entry passes the JUMP_SLOT index to the rtld PLT resolver. On ARM/AArch64: the PLT entry passes &.got.plt[n]. The PLT header passes &.got.plt[fixed-index]. The rtld PLT resolver can compute the JUMP_SLOT index from the two addresses. For these targets, the canonical PLT entry can just reuse the regular PLT entry (in PltSection). On PPC32: PltSection (.glink) consists of `b PLTresolve` instructions and `PLTresolve`. The rtld PLT resolver depends on r11 having been set up to the .plt (GotPltSection) entry. On PPC64 ELFv2: PltSection (.glink) consists of `__glink_PLTresolve` and `bl __glink_PLTresolve`. The rtld PLT resolver depends on r12 having been set up to the .plt (GotPltSection) entry. We cannot reuse a `b PLTresolve`/`bl __glink_PLTresolve` in PltSection as a canonical PLT entry. PPC64 ELFv2 avoids the problem by using TOC for any external reference, even in non-pic code, so the canonical PLT entry scenario should not happen in the first place. For PPC32, we have to create a PLT call stub as the canonical PLT entry. The code sequence sets up r11. Reviewed By: Bdragon28 Differential Revision: https://reviews.llvm.org/D73399	2020-01-25 17:56:37 -08:00
Peter Smith	01ad4c8384	[LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS. In D71281 a fix was put in to round up the size of a ThunkSection to the nearest 4KiB when performing errata patching. This fixed a problem with a very large instrumented program that had thunks and patches mutually trigger each other. Unfortunately it triggers an assertion failure in an AArch64 allyesconfig build of the kernel. There is a specific assertion preventing an InputSectionDescription being larger than 4KiB. This will always trigger if there is at least one Thunk needed in that InputSectionDescription, which is possible for an allyesconfig build. Abstractly the problem case is: .text : { *(.text) ; ... . = ALIGN(SZ_4K); __idmap_text_start = .; *(.idmap.text) __idmap_text_end = .; ... } The assertion checks that __idmap_text_end - __idmap_text_start is < 4 KiB. Note that there is more than one InputSectionDescription in the OutputSection so we can't just restrict the fix to OutputSections smaller than 4 KiB. The fix presented here limits the D71281 to InputSectionDescriptions that meet the following conditions: 1.) The OutputSection is bigger than the thunkSectionSpacing so adding thunks will affect the addresses of following code. 2.) The InputSectionDescription is larger than 4 KiB. This will prevent any assertion failures that an InputSectionDescription is < 4 KiB in size. We do this at ThunkSection creation time as at this point we know that the addresses are stable and up to date prior to adding the thunks as assignAddresses() will have been called immediately prior to thunk generation. The fix reverts the two tests affected by D71281 to their original state as they no longer need the 4KiB size roundup. I've added simpler tests to check for D71281 when the OutputSection size is larger than the ThunkSection spacing. Fixes https://github.com/ClangBuiltLinux/linux/issues/812 Differential Revision: https://reviews.llvm.org/D72344	2020-01-17 10:47:21 +00:00
Fangrui Song	7cd429f27d	[ELF] Add -z force-ibt and -z shstk for Intel Control-flow Enforcement Technology This patch is a joint work by Rui Ueyama and me based on D58102 by Xiang Zhang. It adds Intel CET (Control-flow Enforcement Technology) support to lld. The implementation follows the draft version of psABI which you can download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI. CET introduces a new restriction on indirect jump instructions so that you can limit the places to which you can jump to using indirect jumps. In order to use the feature, you need to compile source files with -fcf-protection=full. * IBT is enabled if all input files are compiled with the flag. To force enabling ibt, pass -z force-ibt. * SHSTK is enabled if all input files are compiled with the flag, or if -z shstk is specified. IBT-enabled executables/shared objects have two PLT sections, ".plt" and ".plt.sec". For the details as to why we have two sections, please read the comments. Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D59780	2020-01-13 23:39:28 -08:00
Fangrui Song	07522e4e23	[ELF] Fix a comment. NFC	2019-12-17 17:17:33 -08:00
Fangrui Song	891a8655ab	[ELF] Add IpltSection PltSection is used by both PLT and IPLT. The PLT section may have a header while the IPLT section does not. Split off IpltSection from PltSection to be clearer. Unlike other targets, PPC64 cannot use the same code sequence for PLT and IPLT. This helps make a future PPC64 patch (D71509) more isolated. On EM_386 and EM_X86_64, when PLT is empty while IPLT is not, currently we are inconsistent whether the PLT header is conceptually attached to in.plt or in.iplt . Consistently attach the header to in.plt can make the -z retpolineplt logic simpler. It also makes `jmp` point to an aesthetically better place for non-retpolineplt cases. Reviewed By: grimar, ruiu Differential Revision: https://reviews.llvm.org/D71519	2019-12-17 00:06:04 -08:00
Fangrui Song	98afa2c1f1	[ELF] De-template PltSection::addEntry. NFC	2019-12-16 11:03:20 -08:00
Peter Smith	86d24193a9	[LLD][ELF][AArch64][ARM] When errata patching, round thunk size to 4KiB. On some edge cases such as Chromium compiled with full instrumentation we have a .text section over twice the size of the maximum branch range and the instrumented code generation containing many examples of the erratum sequence. The combination of Thunks and many erratum sequences causes finalizeAddressDependentContent() to not converge. We end up with: start - Thunk Creation (disturbs addresses after thunks, creating more patches) - Patch Creation (disturbs addresses after patches, creating more thunks) - goto start In most images with few thunks and patches the mutual disturbance does not cause convergence problems. As the .text size and number of patches go up the risk increases. A way to prevent the thunk creation from interfering with patch creation is to round up the size of the thunks to a 4KiB boundary when the erratum patch is enabled. As the erratum sequence only triggers when an instruction sequence starts at 0xff8 or 0xffc modulo (4 KiB) by making the thunks not affect addresses modulo (4 KiB) we prevent thunks from interfering with the patch. The patches themselves could be aggregated in the same way that Thunks are within ThunkSections and we could round up the size in the same way. This would reduce the number of patches created in a .text section size > 128 MiB but would not likely help convergence problems. Differential Revision: https://reviews.llvm.org/D71281 fixes (remaining part of) pr44071, other part in D71242	2019-12-11 14:09:15 +00:00

313 Commits
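
The 4 KiB rounding argument in commit 86d24193a9 above can be checked with a little arithmetic. The following is a minimal sketch in Python, not lld code: the addresses, the thunk size, and all function names are hypothetical, chosen only to show that inserting a block whose size is a multiple of 4 KiB leaves every later address unchanged modulo 4 KiB, so a sequence that did not start at 0xff8 or 0xffc before the insertion cannot start there afterwards.

```python
PAGE = 0x1000  # 4 KiB, the granularity the erratum sequence depends on


def round_up(size: int, align: int = PAGE) -> int:
    """Round size up to the next multiple of align (a power of two)."""
    return (size + align - 1) & ~(align - 1)


def page_offset(addr: int) -> int:
    """Return addr modulo 4 KiB."""
    return addr & (PAGE - 1)


def shifted_offsets(addrs, inserted_size):
    """Page offsets of addrs after inserted_size bytes are placed before them."""
    return [page_offset(a + inserted_size) for a in addrs]


# Hypothetical addresses of code that follows a thunk section.
following = [0x20FF8, 0x31FFC, 0x42010]
raw_thunk_size = 0x2C  # hypothetical unrounded thunk section size
rounded = round_up(raw_thunk_size)

before = [page_offset(a) for a in following]
after_raw = shifted_offsets(following, raw_thunk_size)
after_rounded = shifted_offsets(following, rounded)

print([hex(x) for x in before])
print([hex(x) for x in after_raw])
print([hex(x) for x in after_rounded])
```

With the unrounded size the page offsets of the following code all move, which is how thunk creation can create or destroy erratum candidates; with the rounded size they are identical to the originals.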