Commit Graph

313 Commits

Author SHA1 Message Date
Nico Weber 87248ba5b1 [lld/elf] Use C++17 nested namespace syntax in most places
Like D131405, but for ELF.

No behavior change.

Differential Revision: https://reviews.llvm.org/D131612
2022-08-10 16:47:30 -04:00
Alex Brachet dbd04b853b [ELF] Support --package-metadata
This was recently introduced in GNU linkers and it makes sense for
ld.lld to have the same support. This implementation omits checking if
the input string is valid json to reduce size bloat.

Differential Revision: https://reviews.llvm.org/D131439
2022-08-08 21:31:58 +00:00
Fangrui Song 81ed005c4c [ELF] Remove EhFrameSection::addSection. NFC 2022-07-31 19:55:05 -07:00
Fangrui Song a465e79f19 [ELF] Move SyntheticSections to InputSection.h. NFC
Keep the main SectionBase hierarchy in InputSection.h.
And inline MergeInputSection::getParent.
2022-07-30 17:42:08 -07:00
Fangrui Song 2e2d5304f0 [ELF] Move combineEhSections from Writer to SyntheticSections. NFC
This not only places the function in the right place, but also allows inlining addSection.
2022-07-29 00:47:30 -07:00
Mitch Phillips 786c89fed3 [ELF][MTE] Add --android-memtag-* options to synthesize ELF notes
This ELF note is aarch64 and Android-specific. It specifies to the
dynamic loader that specific work should be scheduled to enable MTE
protection of stack and heap regions.

Current synthesis of the ".note.android.memtag" ELF note is done in the
Android build system. We'd like to move that to the compiler. This patch
adds the --memtag-stack, --memtag-heap, and --memtag-mode={async, sync,
none} flags to the linker, which synthesises the note for us.

Future changes will add -fsanitize=memtag* flags to clang which will
pass these through to lld.

Depends on D119381.

Differential Revision: https://reviews.llvm.org/D119384
2022-04-04 11:17:36 -07:00
Nico Weber cd52b35ee4 fix comment typos to cycle bots 2022-04-04 08:56:18 -04:00
Fangrui Song 1db59dc8e2 [ELF] Fix llvm_unreachable failure when COMMON is placed in SHT_PROGBITS output section
Fix a regression in aa27bab5a1a17e9c4168a741a6298ecaa92c1ecb: COMMON in an
SHT_PROGBITS output section caused llvm_unreachable failure.
2022-03-28 11:05:52 -07:00
Fangrui Song 385573e07b [ELF] Inline ARMExidxSyntheticSection::classof. NFC
To optimize the only call site `dyn_cast<ARMExidxSyntheticSection>(first)` and
decrease code size.
2022-03-15 23:41:30 -07:00
Joao Moreira 9d7001eba9 [ELF][X86] Don't create IBT .plt if there is no PLT entry
https://github.com/ClangBuiltLinux/linux/issues/1606
When GNU_PROPERTY_X86_FEATURE_1_IBT is enabled, ld.lld will create .plt output
section even if there is no PLT entry. Fix this by implementing
IBTPltSection::isNeeded instead of using the default code path (which always
returns true).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D120600
2022-02-26 03:55:40 +00:00
Fangrui Song 27bb799095 [ELF] Clean up headers. NFC 2022-02-07 21:53:34 -08:00
Fangrui Song 196aedb843 [ELF] Change vector<InputSection *> to SmallVector. NFC
My x86-64 lld executable is 8KiB smaller.
2022-02-01 00:14:21 -08:00
Fangrui Song 7cd0c45364 [ELF] Simplify SectionBase::partition handling and make it live by default. NFC
Previously an InputSectionBase is dead (`partition==0`) by default.
SyntheticSection calls markLive and BssSection overrides that with markDead.

It is more natural to make InputSectionBase live by default and let
--gc-sections mark InputSectionBase dead.

When linking a Release build of clang:

* --no-gc-sections:, the removed `inputSections` loop decreases markLive time from 4ms to 1ms.
* --gc-sections: the extra `inputSections` loop increases markLive time from 0.181296s to 0.188526s.
  This is as of we lose the removing one `inputSections` loop optimization (4374824ccf).
  I believe the loss can be mitigated if we refactor markLive.
2022-01-30 15:12:09 -08:00
Fangrui Song 988a03c585 [ELF] Add some Mips*Section to InStruct and change make<Mips*Section> to std::make_unique
Similar to D116143. My x86-64 lld executable is 20+KiB smaller.
2022-01-29 23:55:29 -08:00
Fangrui Song 469c4124ab [ELF] --gdb-index: switch to SmallVector. NFC 2022-01-29 15:24:56 -08:00
Fangrui Song da0e5b885b [ELF] Refactor -z combreloc
* `RelocationBaseSection::addReloc` increases `numRelativeRelocs`, which
  duplicates the work done by RelocationSection<ELFT>::writeTo.
* --pack-dyn-relocs=android has inappropropriate DT_RELACOUNT.
  AndroidPackedRelocationSection does not necessarily place relative relocations
  in the front and DT_RELACOUNT might cause semantics error (though our
  implementation doesn't and Android bionic doesn't use DT_RELACOUNT anyway.)

Move `llvm::partition` to a new function `partitionRels` and compute
`numRelativeRelocs` there. Now `RelocationBaseSection::addReloc` is trivial and
can be moved to the header to enable inlining.

The rest of DynamicReloc and `-z combreloc` handling is moved to the
non-template `RelocationBaseSection::computeRels` to decrease code size. My
x86-64 lld executable is 44+KiB smaller.

While here, rename `sort` to `combreloc`.
2022-01-29 14:45:58 -08:00
Fangrui Song ac0986f880 [ELF] Change std::vector<InputSectionBase *> to SmallVector
There is no remaining std::vector<InputSectionBase> now. My x86-64 lld
executable is 2KiB small.
2022-01-17 10:25:07 -08:00
Fangrui Song e205445434 [ELF] StringTableSection: Use DenseMap<CachedHashStringRef> to avoid redundant hash computation
5~6% speedup when linking clang and chrome.
2022-01-16 21:02:05 -08:00
Fangrui Song a5249c2dd2 [ELF] Change gnuHashTab/hashTab to unique_ptr. NFC
and remove associated make<XXX> calls.

My x86-64 `lld` is ~5KiB smaller.
2022-01-12 13:04:32 -08:00
Fangrui Song 7f1955dc96 [ELF] Support mixed TLSDESC and TLS GD
We only support both TLSDESC and TLS GD for x86 so this is an x86-specific
problem. If both are used, only one R_X86_64_TLSDESC is produced and TLS GD
accesses will incorrectly reference R_X86_64_TLSDESC. Fix this by introducing
SymbolAux::tlsDescIdx.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D116900
2022-01-10 10:03:21 -08:00
Fangrui Song cb203f3f92 [ELF] Change InStruct/Partition pointers to unique_ptr
and remove associated make<XXX> calls.
gnuHash and sysvHash are unchanged, otherwise LinkerScript::discard would
destroy the objects which may be referenced by input section descriptions.

My x86-64 lld executable is 121+KiB smaller.
2021-12-27 18:15:23 -08:00
Fangrui Song 80c14dcc0e [ELF] Delete stale declaration. NFC 2021-12-27 12:56:38 -08:00
Fangrui Song a1c2ee0147 [ELF] LinkerScript/OutputSection: change other std::vector members to SmallVector
11+KiB smaller .text with both libc++ and libstdc++ builds.
2021-12-26 13:53:47 -08:00
Fangrui Song ad26b0b233 Revert "[ELF] Make Partition/InStruct members unique_ptr and remove associate make<XXX>"
This reverts commit e48b1c8a27.
This reverts commit d019de23a1.

The changes caused memory leaks (non-final classes cannot use unique_ptr).
2021-12-22 23:55:11 -08:00
Fangrui Song ba6973c89b [ELF] Change nonnull pointer parameters to references 2021-12-22 22:02:29 -08:00
Fangrui Song e48b1c8a27 [ELF] Make Partition members unique_ptr and remove associate make<XXX>
See D116143 for benefits. My lld executable (x86-64) is 103+KiB smaller.
2021-12-22 21:34:26 -08:00
Fangrui Song d019de23a1 [ELF] Make InStruct members unique_ptr and remove associate make<XXX>
See D116143 for benefits. My lld executable (x86-64) is 24+KiB smaller.
2021-12-22 21:11:26 -08:00
Fangrui Song 5c75cc51b3 [ELF] Change nonnull pointer parameters to references. NFC 2021-12-22 21:09:57 -08:00
Fangrui Song baa3eb0dd9 [ELF] Change some non-null pointer parameters to references. NFC 2021-12-22 20:51:11 -08:00
Fangrui Song 6683099a0d [ELF] Optimize RelocationSection<ELFT>::writeTo
When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured

```
9.131462 Total ExecuteLinker
1.449913 Total Write output file
1.445784 Total Write sections
0.657152 Write sections {"detail":".rela.dyn"}
```

This change decreases the .rela.dyn time to 0.25, leading to 4% speed up in the total time.

* The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values.
* The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it.

With the change, the new encodeDynamicReloc is cheap (0.05s). So no need to parallelize it.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D115993
2021-12-21 09:43:44 -08:00
Fangrui Song 552d84414d [ELF] Use SmallVector for many SyntheticSections. NFC
This decreases struct sizes and usually decreases the lld executable
size (39KiB for my x86-64 executable) (unless in some cases smaller
SmallVector leads to more inlining, e.g. StringTableBuilder).
For --gdb-index, there may be memory usage saving.
2021-12-17 19:22:16 -08:00
Fangrui Song 93558e575e [ELF] Internalize createMergeSynthetic. NFC
Only called once. Moving to OutputSections.cpp can make it inlined.
finalizeInputSections can be very hot, especially in -O1 links with much debug info.
2021-12-16 20:50:06 -08:00
Fangrui Song 5ca54c6686 [ELF] Simplify GnuHashSection::write. NFC 2021-11-25 14:23:25 -08:00
Fangrui Song 55c14d6dbf [ELF] Simplify DynamicSection content computation. NFC
The new code computes the content twice, but avoides the tricky
std::function<uint64_t()>. Removed 13KiB code in a Release build.
2021-11-25 14:12:34 -08:00
Alex Richardson cc7cb9523e [ELF][AArch64] Write addends for TLSDESC relocations with -z rel
Since D100490 this case is diagnosed for -z rel. This commit implements
R_AARCH64_TLSDESC cases for AArch64::getImplicitAddend() and
AArch64::relocate(). However, there are probably further relocation types
that need to be handled for full support of -z rel.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47009

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D100544
2021-07-09 10:41:41 +01:00
Alex Richardson 35c5e564e6 [ELF] Check the Elf_Rel addends for dynamic relocations
There used to be many cases where addends for Elf_Rel were not emitted in
the final object file (mostly when building for MIPS64 since the input .o
files use RELA but the output uses REL). These cases have been fixed since,
but this patch adds a check to ensure that the written values are correct.
It is based on a previous patch that I added to the CHERI fork of LLD since
we were using MIPS64 as a baseline. The work has now almost entirely
shifted to RISC-V and Arm Morello (which use Elf_Rela), but I thought
it would be useful to upstream our local changes anyway.

This patch adds a (hidden) command line flag --check-dynamic-relocations
that can be used to enable these checks. It is also on by default in
assertions builds for targets that handle all dynamic relocations kinds
that LLD can emit in Target::getImplicitAddend(). Currently this is
enabled for ARM, MIPS, and I386.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101450
2021-07-09 10:41:40 +01:00
Alex Richardson 6d87ca08ae [ELF] Refactor DynamicReloc to fix incorrect relocation addends
This patch changes the DynamicReloc class to store an enum instead
of the overloaded useSymVA member to make it easier to understand
and fix incorrect addends being written in some corner cases. The
change is motivated by a follow-up review that checks the value of
implicit Elf_Rel addends written to the output file.

This patch fixes an incorrect output when using `-z rela` for i386 files
with R_386_GOT32 relocations (not that this really matters since it's an
unsupported configuration).
Storing the relocation expression kind also addresses an incorrect addend
FIXME in ppc64-abs64-dyn.s introduced in D63383.

DynamicReloc now also has a special case for the MIPS TLS relocations
(DynamicReloc::AgainstSymbolWithTargetVA) since the
R_MIPS_TLS_TPREL{32/64} the symbol VA to the GOT for preemptible
symbols. I'm not sure if the symbol value actually should be written
for R_MIPS_TLS_TPREL32, but this patch does not attempt to change
that behaviour.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D100490
2021-07-09 10:41:40 +01:00
Georgii Rymar ed146d6291 [LLD][ELF] - Use LLVM_ELF_IMPORT_TYPES_ELFT instead of multiple types definitions. NFCI.
We can reduce the number of "using" declarations.
`LLVM_ELF_IMPORT_TYPES_ELFT` was extended in D93801.

Differential revision: https://reviews.llvm.org/D93856
2020-12-29 10:50:07 +03:00
Jessica Clarke bef38e86b4 [ELF] Handle SHT_RISCV_ATTRIBUTES similarly to SHT_ARM_ATTRIBUTES
Currently we treat SHT_RISCV_ATTRIBUTES like a normal section and
concatenate all such input sections, yielding invalid output unless only
a single attributes section is present in the input. Instead, pick the
first as with SHT_ARM_ATTRIBUTES. We do not currently need to condition
our behaviour on the contents, unlike Arm. In future, we should both do
stricter validation of the input and merge all sections together to
ensure we have, for example, the full arch string requirement, but this
rudimentary implementation is good enough for most common cases.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D86309
2020-09-05 18:36:23 +01:00
Fangrui Song 21b4f8060a [ELF] --icf: don't fold text sections with LSDA
Fix PR36272 and PR46835

A .eh_frame FDE references a text section and (optionally) a LSDA (in
.gcc_except_table).  Even if two text sections have identical content and
relocations (e.g. a() and b()), we cannot fold them if their LSDA are different.

```
void foo();
void a() {
  try { foo(); } catch (int) { }
}
void b() {
  try { foo(); } catch (float) { }
}
```

Scan .eh_frame pieces with LSDA and disallow referenced text sections to be
folded. If two .gcc_except_table have identical semantics (usually identical
content with PC-relative encoding), we will lose folding opportunity.
For ClickHouse (an exception-heavy application), this can reduce --icf=all efficiency
from 9% to 5%. There may be some percentage we can reclaim without affecting
correctness, if we analyze .eh_frame and .gcc_except_table sections.

gold 2.24 implemented a more complex fix (resolution to
https://sourceware.org/bugzilla/show_bug.cgi?id=21066) which combines the
checksum of .eh_frame CIE/FDE pieces.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D84610
2020-08-05 09:16:28 -07:00
Peter Smith 3b1622d63a [LLD][ELF][ARM] recommit Fix ARM Exidx order for non monotonic section order
Fixed error detected by msan. The size field of the .ARM.exidx synthetic
section needs to be initialized to at least estimation level before
calling assignAddresses as that will use the size field.

This was previously reverted in 1ca16fc4f5.

Differential Revision: https://reviews.llvm.org/D78422
2020-04-24 13:47:28 +01:00
Kazuaki Ishizaki 7c5fcb3591 [lld] NFC: fix trivial typos in comments
Differential Revision: https://reviews.llvm.org/D72339
2020-04-02 01:21:36 +09:00
Fangrui Song 00925aadb3 [ELF][PPC32] Fix canonical PLTs when the order does not match the PLT order
Reviewed By: Bdragon28

Differential Revision: https://reviews.llvm.org/D75394
2020-02-28 22:23:14 -08:00
Fangrui Song 837e8a9c0c [ELF][PPC32] Support canonical PLT
-fno-pie produces a pair of non-GOT-non-PLT relocations R_PPC_ADDR16_{HA,LO} (R_ABS) referencing external
functions.

```
lis 3, func@ha
la 3, func@l(3)
```

In a -no-pie/-pie link, if func is not defined in the executable, a canonical PLT entry (st_value>0, st_shndx=0) will be needed.
References to func in shared objects will be resolved to this address.
-fno-pie -pie should fail with "can't create dynamic relocation ... against ...", so we just need to think about -no-pie.

On x86, the PLT entry passes the JMP_SLOT offset to the rtld PLT resolver.
On x86-64: the PLT entry passes the JUMP_SLOT index to the rtld PLT resolver.
On ARM/AArch64: the PLT entry passes &.got.plt[n]. The PLT header passes &.got.plt[fixed-index]. The rtld PLT resolver can compute the JUMP_SLOT index from the two addresses.

For these targets, the canonical PLT entry can just reuse the regular PLT entry (in PltSection).

On PPC32: PltSection (.glink) consists of `b PLTresolve` instructions and `PLTresolve`. The rtld PLT resolver depends on r11 having been set up to the .plt (GotPltSection) entry.
On PPC64 ELFv2: PltSection (.glink) consists of `__glink_PLTresolve` and `bl __glink_PLTresolve`. The rtld PLT resolver depends on r12 having been set up to the .plt (GotPltSection) entry.

We cannot reuse a `b PLTresolve`/`bl __glink_PLTresolve` in PltSection as a canonical PLT entry. PPC64 ELFv2 avoids the problem by using TOC for any external reference, even in non-pic code, so the canonical PLT entry scenario should not happen in the first place.
For PPC32, we have to create a PLT call stub as the canonical PLT entry. The code sequence sets up r11.

Reviewed By: Bdragon28

Differential Revision: https://reviews.llvm.org/D73399
2020-01-25 17:56:37 -08:00
Peter Smith 01ad4c8384 [LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS.
In D71281 a fix was put in to round up the size of a ThunkSection to the
nearest 4KiB when performing errata patching. This fixed a problem with a
very large instrumented program that had thunks and patches mutually
trigger each other. Unfortunately it triggers an assertion failure in an
AArch64 allyesconfig build of the kernel. There is a specific assertion
preventing an InputSectionDescription being larger than 4KiB. This will
always trigger if there is at least one Thunk needed in that
InputSectionDescription, which is possible for an allyesconfig build.

Abstractly the problem case is:
.text : {
          *(.text) ;
          ...
          . = ALIGN(SZ_4K);
          __idmap_text_start = .;
          *(.idmap.text)
          __idmap_text_end = .;
          ...
        }
The assertion checks that __idmap_text_end - __idmap_start is < 4 KiB.
Note that there is more than one InputSectionDescription in the
OutputSection so we can't just restrict the fix to OutputSections smaller
than 4 KiB.

The fix presented here limits the D71281 to InputSectionDescriptions that
meet the following conditions:
1.) The OutputSection is bigger than the thunkSectionSpacing so adding
thunks will affect the addresses of following code.
2.) The InputSectionDescription is larger than 4 KiB. This will prevent
any assertion failures that an InputSectionDescription is < 4 KiB
in size.

We do this at ThunkSection creation time as at this point we know that
the addresses are stable and up to date prior to adding the thunks as
assignAddresses() will have been called immediately prior to thunk
generation.

The fix reverts the two tests affected by D71281 to their original state
as they no longer need the 4KiB size roundup. I've added simpler tests to
check for D71281 when the OutputSection size is larger than the ThunkSection
spacing.

Fixes https://github.com/ClangBuiltLinux/linux/issues/812

Differential Revision: https://reviews.llvm.org/D72344
2020-01-17 10:47:21 +00:00
Fangrui Song 7cd429f27d [ELF] Add -z force-ibt and -z shstk for Intel Control-flow Enforcement Technology
This patch is a joint work by Rui Ueyama and me based on D58102 by Xiang Zhang.

It adds Intel CET (Control-flow Enforcement Technology) support to lld.
The implementation follows the draft version of psABI which you can
download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI.

CET introduces a new restriction on indirect jump instructions so that
you can limit the places to which you can jump to using indirect jumps.

In order to use the feature, you need to compile source files with
-fcf-protection=full.

* IBT is enabled if all input files are compiled with the flag. To force enabling ibt, pass -z force-ibt.
* SHSTK is enabled if all input files are compiled with the flag, or if -z shstk is specified.

IBT-enabled executables/shared objects have two PLT sections, ".plt" and
".plt.sec".  For the details as to why we have two sections, please read
the comments.

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D59780
2020-01-13 23:39:28 -08:00
Fangrui Song 07522e4e23 [ELF] Fix a comment. NFC 2019-12-17 17:17:33 -08:00
Fangrui Song 891a8655ab [ELF] Add IpltSection
PltSection is used by both PLT and IPLT. The PLT section may have a
header while the IPLT section does not. Split off IpltSection from
PltSection to be clearer.

Unlike other targets, PPC64 cannot use the same code sequence for PLT
and IPLT. This helps make a future PPC64 patch (D71509) more isolated.

On EM_386 and EM_X86_64, when PLT is empty while IPLT is not, currently
we are inconsistent whether the PLT header is conceptually attached to
in.plt or in.iplt .  Consistently attach the header to in.plt can make
the -z retpolineplt logic simpler. It also makes `jmp` point to an
aesthetically better place for non-retpolineplt cases.

Reviewed By: grimar, ruiu

Differential Revision: https://reviews.llvm.org/D71519
2019-12-17 00:06:04 -08:00
Fangrui Song 98afa2c1f1 [ELF] De-template PltSection::addEntry. NFC 2019-12-16 11:03:20 -08:00
Peter Smith 86d24193a9 [LLD][ELF][AArch64][ARM] When errata patching, round thunk size to 4KiB.
On some edge cases such as Chromium compiled with full instrumentation we
have a .text section over twice the size of the maximum branch range and
the instrumented code generation containing many examples of the erratum
sequence. The combination of Thunks and many erratum sequences causes
finalizeAddressDependentContent() to not converge. We end up with:
start
- Thunk Creation (disturbs addresses after thunks, creating more patches)
- Patch Creation (disturbs addresses after patches, creating more thunks)
- goto start

In most images with few thunks and patches the mutual disturbance does not
cause convergence problems. As the .text size and number of patches go up
the risk increases.

A way to prevent the thunk creation from interfering with patch creation is
to round up the size of the thunks to a 4KiB boundary when the
erratum patch is enabled. As the erratum sequence only triggers when an
instruction sequence starts at 0xff8 or 0xffc modulo (4 KiB) by making the
thunks not affect addresses modulo (4 KiB) we prevent thunks from
interfering with the patch.

The patches themselves could be aggregated in the same way that Thunks are
within ThunkSections and we could round up the size in the same way. This
would reduce the number of patches created in a .text section size >
128 MiB but would not likely help convergence problems.

Differential Revision: https://reviews.llvm.org/D71281

fixes (remaining part of) pr44071, other part in D71242
2019-12-11 14:09:15 +00:00