Commit Graph

595 Commits

Author SHA1 Message Date
Fangrui Song 4eda362539 [ELF] Inline computeAddend. NFC 2022-10-17 13:09:39 -07:00
Fangrui Song d8af31eced [ELF] Move ELFT-agnostic relocation code to processAux 2022-10-17 11:57:17 -07:00
Fangrui Song 874fc6bd78 [ELF] Move ELFT-agnostic relocation code to processAux. NFC 2022-10-17 11:44:28 -07:00
Fangrui Song dc884f0f43 [ELF] Remove RelocationScanner::target. NFC 2022-10-16 12:39:37 -07:00
Fangrui Song e983109876 [ELF] Move R_TPREL/R_TPREL_NEG check into handleTlsRelocation 2022-10-16 12:19:58 -07:00
Fangrui Song 2b153088be [ELF] Set DF_STATIC_TLS for AArch64/PPC32/PPC64 2022-10-16 12:08:08 -07:00
Fangrui Song 9c626d4a0d [ELF] Remove symtab indirection. NFC
Add LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr indirection.
2022-10-01 14:46:49 -07:00
Fangrui Song 34fa860048 [ELF] Remove ctx indirection. NFC
Add LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection. We can move other global variables into ctx without
indirection concern. In the long term we may consider passing Ctx
as a parameter to various functions and eliminate global state as
much as possible and then remove `Ctx::reset`.
2022-10-01 12:06:33 -07:00
Fangrui Song a623a4c8b4 [ELF] Remove elf::config indirection. NFC
`config` has 1000+ uses so we try to avoid changing `config->foo`. Define a
wrapper with LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection.

My x86-64 lld executable is 11+KiB smaller.
2022-10-01 11:39:45 -07:00
Fangrui Song ab11ed5249 [ELF] Reset verdefIndex for Defined preempting SharedSymbol
to avoid spurious "attempt to reassign symbol '...'" warning after
7a58dd1046
2022-09-29 21:26:53 -07:00
Fangrui Song e3ecc6a912 [ELF] Make symAux[0] a sentinel
And default auxIdx to 0.
2022-09-29 00:50:19 -07:00
Fangrui Song 7a58dd1046 [ELF] Refactor Symbol initialization and overwriting
Symbol::replace intends to overwrite a few fields (mostly Elf{32,64}_Sym
fields), but the implementation copies all fields then restores some old fields.
This is error-prone and wasteful. Add Symbol::overwrite to copy just the
needed fields and add other overwrite member functions to copy the extra
fields.
2022-09-28 13:11:31 -07:00
Fangrui Song 62e7c5b4e2 Revert "[ELF] --pack-dyn-relocs=android: scan relocation serially after D133003"
This reverts commit bce6416775.

The workaround is unneeded after 7dac9f4e48.
2022-09-28 07:06:49 +00:00
Fangrui Song bce6416775 [ELF] --pack-dyn-relocs=android: scan relocation serially after D133003
https://reviews.llvm.org/D133003#3806508 can reproduce a non-determinism with
--threads=4. Making the config serial fixes non-determinism (by running the link
many times and compare output).
2022-09-21 11:43:13 -07:00
Martin Storsjö e280940bfb [Support] Access threadIndex via a wrapper function
On Unix platforms, this wrapper function is inline, so it should
expand to the same direct access to the thread local variable. On
Windows, it's a non-inline function within Parallel.cpp, allowing
making the thread_local variable static.

Windows Native TLS doesn't support direct access to thread local
variables in a different DLL, and GCC/binutils on Windows occasionally
has problems with non-static thread local variables too.

This fixes mingw dylib builds with native TLS after
e6aebff674.

At the same time, move the whole thread local variable within
    #if LLVM_ENABLE_THREADS
to fix builds without threading support.

Differential Revision: https://reviews.llvm.org/D133759
2022-09-14 09:19:27 +03:00
Fangrui Song e6aebff674 [ELF] Parallelize relocation scanning
* Change `Symbol::flags` to a `std::atomic<uint16_t>`
* Add `llvm::parallel::threadIndex` as a thread-local non-negative integer
* Add `relocsVec` to part.relaDyn and part.relrDyn so that relative relocations can be added without a mutex
* Arbitrarily change -z nocombreloc to move relative relocations to the end. Disable parallelism for deterministic output.

MIPS and PPC64 use global states for relocation scanning. Keep serial scanning.

Speed-up with mimalloc and --threads=8 on an Intel Skylake machine:

* clang (Release): 1.27x as fast
* clang (Debug): 1.06x as fast
* chrome (default): 1.05x as fast
* scylladb (default): 1.04x as fast

Speed-up with glibc malloc and --threads=16 on a ThunderX2 (AArch64):

* clang (Release): 1.31x as fast
* scylladb (default): 1.06x as fast

Reviewed By: andrewng

Differential Revision: https://reviews.llvm.org/D133003
2022-09-12 12:56:35 -07:00
Fangrui Song bd16ffb389 [ELF] Merge Symbol::needs* into uint16_t flags. NFC
Split off from D133003 ([ELF] Parallelize relocation scanning) to make its diff smaller.
2022-09-09 14:37:18 -07:00
Fangrui Song 50b7eb91f0 [ELF] Reuse one RelocationScanner to scan all sections. NFC 2022-09-04 23:12:27 -07:00
Fangrui Song 94ca041905 [ELF] Move scanRelocations into Relocations.cpp. NFC 2022-09-04 21:31:18 -07:00
Fangrui Song c8d9d0000b [ELF] Relocations: set hasDirectReloc only if not ifunc. NFC 2022-09-04 21:30:19 -07:00
Fangrui Song 82ed93ea05 [ELF] Use stOther to track visibility
This simplifies SymbolTableSection<ELFT>::writeTo. Add dsoProtected to be used
in canDefineSymbolInExecutable and get the side benefit that the protected DSO
preemption diagnostic is clearer.
2022-09-04 17:27:35 -07:00
Sam Clegg 2cd4cd9a32 [lld][ELF] Rename SymbolTable::symbols() to SymbolTable::getSymbols(). NFC
This change renames this method match its original name and the name
used in the wasm linker.

Back in d8f8abbd4a the ELF SymbolTable
method `getSymbols()` was replaced with `forEachSymbol`.

Then in a2fc964417 `forEachSymbol` was
replaced with a `llvm::iterator_range`.

Then in e9262edf0d we came full circle
and the `llvm::iterator_range` was replaced with a `symbols()` accessor
that was identical the original `getSymbols()`.

`getSymbols` also matches the name used elsewhere in the ELF linker as
well as in both COFF and wasm backend (e.g. `InputFiles.h` and
`SyntheticSections.h`)

Differential Revision: https://reviews.llvm.org/D130787
2022-08-19 14:56:08 -07:00
Fangrui Song e3fcf2e06f [ELF] Simplify llvm::enumerate with structured binding. NFC 2022-08-09 21:52:08 -07:00
Fangrui Song abd9807590 [ELF] mergeCmp: work around irreflexivity bug
Some tests (e.g. aarch64-feature-pac.s) segfault in libstdc++ _GLIBCXX_DEBUG
builds (enabled by LLVM_ENABLE_EXPENSIVE_CHECKS).

dyn_cast<ThunkSection> is incorrectly true for any SyntheticSection. std::merge
transitively calls mergeCmp(x, x) (due to __glibcxx_requires_irreflexive_pred)
and will segfault in `ta->getTargetInputSection()`. The dyn_cast<ThunkSection>
issue should be eventually fixed properly, bug `a != b` is robust enough for now.
2022-08-05 17:08:37 -07:00
Fangrui Song 3e9adff456 [ELF] Split EhInputSection::pieces into cies and fdes
This simplifies code, removes a read32 (for id==0 check), and makes it feasible
to combine some operations in EhInputSection::split and EhFrameSection::addRecords.

Mostly NFC, but fixes "Relocation not in any piece" assertion failure in an
erroneous case when a relocation offset precedes all CIE/FDE pices.
2022-07-31 16:16:10 -07:00
Fangrui Song 6611d58f5b [ELF] Relax R_RISCV_ALIGN
Alternative to D125036. Implement R_RISCV_ALIGN relaxation so that we can handle
-mrelax object files (i.e. -mno-relax is no longer needed) and creates a
framework for future relaxation.

`relaxAux` is placed in a union with InputSectionBase::jumpInstrMod, storing
auxiliary information for relaxation. In the first pass, `relaxAux` is allocated.
The main data structure is `relocDeltas`: when referencing `relocations[i]`, the
actual offset is `r_offset - (i ? relocDeltas[i-1] : 0)`.

`relaxOnce` performs one relaxation pass. It computes `relocDeltas` for all text
section. Then, adjust st_value/st_size for symbols relative to this section
based on `SymbolAnchor`. `bytesDropped` is set so that `assignAddresses` knows
that the size has changed.

Run `relaxOnce` in the `finalizeAddressDependentContent` loop to wait for
convergence of text sections and other address dependent sections (e.g.
SHT_RELR). Note: extrating `relaxOnce` into a separate loop works for many cases
but has issues in some linker script edge cases.

After convergence, compute section contents: shrink the NOP sequence of each
R_RISCV_ALIGN as appropriate. Instead of deleting bytes, we run a sequence of
memcpy on the content delimitered by relocation locations. For R_RISCV_ALIGN let
the next memcpy skip the desired number of bytes. Section content computation is
parallelizable, but let's ensure the implementation is mature before
optimizations. Technically we can save a copy if we interleave some code with
`OutputSection::writeTo`, but let's not pollute the generic code (we don't have
templated relocation resolving, so using conditions can impose overhead to
non-RISCV.)

Tested:
`make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- LLVM=1 defconfig all` built Linux kernel using -mrelax is bootable.
FreeBSD RISCV64 system using -mrelax is bootable.
bash/curl/firefox/libevent/vim/tmux using -mrelax works.

Differential Revision: https://reviews.llvm.org/D127581
2022-07-07 10:16:09 -07:00
Fangrui Song 9a572164d5 [ELF] Move InputFiles global variables (memoryBuffers, objectFiles, etc) into Ctx. NFC 2022-06-29 18:53:38 -07:00
Daniel Bertalan 0eec7e2a89 Reland "[lld-macho] Group undefined symbol diagnostics by symbol".
This reverts commit 36e7c9a450.

This relands d61341768c with the fix described in
https://reviews.llvm.org/D127753#3587390
2022-06-15 19:22:39 -04:00
Xiaodong Liu 36d4f42c36 [lld] Fix typo for processAux; NFC
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D125163
2022-05-09 10:21:47 +08:00
Fangrui Song 3bc79808d0 [ELF] Fix branch range computation when picking ThunkSection
Similar to D117734. Take AArch64 as an example when the branch range is +-0x8000000.

getISDThunkSec returns `ts` when `src-0x8000000-r_addend <= tsBase < src-0x8000000`
and the new thunk will be placed in `ts` (`ts->addThunk(t)`). However, the new
thunk (at the end of ts) may be unreachable from src. In the next pass,
`normalizeExistingThunk` reverts the relocation back to the original target.
Then a new thunk is created and the same `ts` is picked as before. The `ts` is
still unreachable.

I have observed it in one test with a sufficiently large r_addend (47664): there
are initially 245 Thunk's, then in each pass 14 new Thunk's are created and get
appended to the unreachable ThunkSection. After 15 passes lld fails with
`thunk creation not converged`.

The new test aarch64-thunk-reuse2.s checks the case.

Without `- pcBias`, arm-thumb-thunk-empty-pass.s and arm-thunk-multipass-plt.s
will fail.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D124653
2022-05-03 08:46:15 -07:00
Fangrui Song 7c7702b318 [ELF] Move section assignment from initializeSymbols to postParse
https://discourse.llvm.org/t/parallel-input-file-parsing/60164

initializeSymbols currently sets Defined::section and handles non-prevailing
COMDAT groups. Move the code to the parallel postParse to reduce work from the
single-threading code path and make parallel section initialization infeasible.

Postpone reporting duplicate symbol errors so that the messages have the
section information. (`Defined::section` is assigned in postParse and another
thread may not have the information).

* duplicated-synthetic-sym.s: BinaryFile duplicate definition (very rare) now
  has no section information
* comdat-binding: `%t/w.o %t/g.o` leads to an undesired undefined symbol. This
  is not ideal but we report a diagnostic to inform that this is unsupported.
  (See release note)
* comdat-discarded-lazy.s: %tdef.o is unextracted. The new behavior (discarded
  section error) makes more sense
* i386-comdat.s: switched to a better approach working around
  .gnu.linkonce.t.__x86.get_pc_thunk.bx in glibc<2.32 for x86-32.
  Drop the ancient no-longer-relevant workaround for __i686.get_pc_thunk.bx

Depends on D120640

Differential Revision: https://reviews.llvm.org/D120626
2022-03-15 19:24:41 -07:00
Fangrui Song 9b61fff0eb Revert D120626 "[ELF] Move section assignment from initializeSymbols to postParse"
This reverts commit c30e6447c0.
It exposed brittle support for __x86.get_pc_thunk.bx.
Need to think a bit how to support __x86.get_pc_thunk.bx.
2022-03-15 19:00:54 -07:00
Fangrui Song c30e6447c0 [ELF] Move section assignment from initializeSymbols to postParse
https://discourse.llvm.org/t/parallel-input-file-parsing/60164

initializeSymbols currently sets Defined::section and handles non-prevailing
COMDAT groups. Move the code to the parallel postParse to reduce work from the
single-threading code path and make parallel section initialization infeasible.

Postpone reporting duplicate symbol errors so that the messages have the
section information. (`Defined::section` is assigned in postParse and another
thread may not have the information).

* duplicated-synthetic-sym.s: BinaryFile duplicate definition (very rare) now
  has no section information
* comdat-binding: `%t/w.o %t/g.o` leads to an undesired undefined symbol. This
  is not ideal but we report a diagnostic to inform that this is unsupported.
  (See release note)
* comdat-discarded-lazy.s: %tdef.o is unextracted. The new behavior (discarded
  section error) makes more sense

Depends on D120640

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D120626
2022-03-14 14:13:41 -07:00
Fangrui Song 7b8fbb796c [ELF] Simplify addCopyRelSymbol with invokeELFT. NFC 2022-03-12 14:08:10 -08:00
Fangrui Song 38fbedab32 [ELF] Don't rely on Symbols.h's transitive inclusion of InputFiles.h. NFC 2022-02-23 20:44:34 -08:00
Fangrui Song 47d18be58b [ELF] Remove SharedSymbol::getFile. NFC
Symbol.h depends on InputFiles.h. This change moves us toward dropping the
weird dependency.

The call sites will become slightly uglier (`cast<SharedFile>(s->file)`), but
the compromise is acceptable.
2022-02-23 17:57:52 -08:00
Fangrui Song ae1ba6194f [ELF] Replace uncompressed InputSectionBase::data() with rawData. NFC
In many call sites we know uncompression cannot happen (non-SHF_ALLOC, or the
data (even if compressed) must have been uncompressed by a previous pass).
Prefer rawData in these cases. data() increases code size and prevents
optimization on rawData.
2022-02-21 00:39:26 -08:00
Fangrui Song 27bb799095 [ELF] Clean up headers. NFC 2022-02-07 21:53:34 -08:00
Fangrui Song cb03ac0b5d [ELF] Move Symbol::needsTlsLd to config->needsTlsLd
to decrease sizeof(SymbolUnion) from 72 to 64 on ELF64 platforms.

Use a dummy `Undefined` to prevent null pointer dereference (though unused)
`*rel.sym` in InputSectionBase::relocateAlloc.

The relocation order may shuffle a bit, but otherwise there is no behavior
difference.
2022-02-07 10:26:16 -08:00
Alexander Kornienko ec8a693717 Revert "[ELF] Move Symbol::needsTlsLd to config->needsTlsLd. NFC"
This reverts commit f9e3ca542e.

The commit results in internal test failures. Test case provided offline.
2022-02-07 19:00:09 +01:00
Fangrui Song 977a1a523c [ELF] Symbol::replace: use the old nameData/nameSize. NFC
Currently `this->getName() == newSym.getName()`.
By keeping the old nameData/nameSize, newSym's nameData/nameSize will be
ignored. The call sites can avoid calling getName().

printTraceSymbol needs to take the symbol name since `other`'s name is empty.
2022-02-05 16:34:02 -08:00
Fangrui Song 9af90e205a [ELF] De-template reportUndefinedSymbols. NFC
My x86-64 lld executable is 16KiB smaller.
2022-02-05 15:03:56 -08:00
Fangrui Song f9e3ca542e [ELF] Move Symbol::needsTlsLd to config->needsTlsLd. NFC
to decrease sizeof(SymbolUnion) from 72 to 64 on ELF64 platforms.
2022-02-05 14:40:15 -08:00
Fangrui Song c03fdd3403 [ELF] Fix the branch range computation when reusing a thunk
Notation: dst is `t->getThunkTargetSym()->getVA()`

On AArch64, when `src-0x8000000-r_addend <= dst < src-0x8000000`, the condition
`target->inBranchRange(rel.type, src, rel.sym->getVA(rel.addend))` may
incorrectly consider a thunk reusable.
`rel.addend = -getPCBias(rel.type)` resets the addend to 0 for AArch64/PPC
and the zero addend is used by `rel.sym->getVA(rel.addend)` to check
out-of-range relocations.

See the test for a case this computation is wrong:
`error: a.o:(.text_high+0x4): relocation R_AARCH64_JUMP26 out of range: -134217732 is not in [-134217728, 134217727]`
I have seen a real world case with r_addend=19960.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D117734
2022-01-24 09:03:21 -08:00
Fangrui Song 54fe70bfba [ELF] RelocationScanner::scanOne: replace rel.r_offset with offset. NFC 2022-01-17 00:05:27 -08:00
Fangrui Song 4c36567179 [ELF] Relocations: remove some cast<Undefined>. NFC 2022-01-17 00:02:47 -08:00
Fangrui Song b8d4eb84d7 [ELF] De-template getAlternativeSpelling. NFC 2022-01-16 23:56:25 -08:00
Fangrui Song 37a1291885 [ELF] Add RelocationScanner. NFC
Currently the way some relocation-related static functions pass around
states is clumsy. Add a Resolver class to store some states as member
variables.

Advantages:

* Avoid the parameter `InputSectionBase &sec` (this offsets the cost passing around `this` paramemter)
* Avoid the parameter `end` (Mips and PowerPC hacks)
* `config` and `target` can be cached as member variables to reduce global state accesses. (potential speedup because the compiler didn't know `config`/`target` were not changed across function calls)
* If we ever want to reduce if-else costs (e.g. `config->emachine==EM_MIPS` for non-Mips) or introduce parallel relocation scan not handling some tricky arches (PPC/Mips), we can templatize Resolver

`target` isn't used as much as `config`, so I change it to a const reference
during the migration.

There is a minor performance inprovement for elf::scanRelocations.

Reviewed By: ikudrin, peter.smith

Differential Revision: https://reviews.llvm.org/D116881
2022-01-11 09:54:53 -08:00
Fangrui Song 5dbbd4eeb8 [ELF] Move OffsetGetter before some static functions. NFC
to prepare for D116881.
2022-01-10 20:16:02 -08:00
Fangrui Song 7f1955dc96 [ELF] Support mixed TLSDESC and TLS GD
We only support both TLSDESC and TLS GD for x86 so this is an x86-specific
problem. If both are used, only one R_X86_64_TLSDESC is produced and TLS GD
accesses will incorrectly reference R_X86_64_TLSDESC. Fix this by introducing
SymbolAux::tlsDescIdx.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D116900
2022-01-10 10:03:21 -08:00