Renames CommonOrcRuntimeTypes.h to ExecutorAddress.h and moves ExecutorAddress
into the 'orc' namespace (rather than orc::shared).
Also makes ExecutorAddress a class, adds an ExecutorAddrDiff type and some
arithmetic operations on the pair (subtracting two addresses yields an addrdiff,
adding an addrdiff and an address yields an address).
The computeNamedSymbolDependencies and computeLocalDeps methods on
ObjectLinkingLayerJITLinkContext are responsible for computing, for each symbol
in the current MaterializationResponsibility, the set of non-locally-scoped
symbols that are depended on. To calculate this we have to consider the effect
of chains of dependence through locally scoped symbols in the LinkGraph. E.g.
.text
.globl foo
foo:
callq bar ## foo depneds on external 'bar'
movq Ltmp1(%rip), %rcx ## foo depends on locally scoped 'Ltmp1'
addl (%rcx), %eax
retq
.data
Ltmp1:
.quad x ## Ltmp1 depends on external 'x'
In this example symbol 'foo' depends directly on 'bar', and indirectly on 'x'
via 'Ltmp1', which is locally scoped.
Performance of the existing implementations appears to have been mediocre:
Based on flame graphs posted by @drmeister (in #jit on the LLVM discord server)
the computeLocalDeps function was taking up a substantial amount of time when
starting up Clasp (https://github.com/clasp-developers/clasp).
This commit attempts to address the performance problems in three ways:
1. Using jitlink::Blocks instead of jitlink::Symbols as the nodes of the
dependencies-introduced-by-locally-scoped-symbols graph.
Using either Blocks or Symbols as nodes provides the same information, but since
there may be more than one locally scoped symbol per block the block-based
version of the dependence graph should always be a subgraph of the Symbol-based
version, and so faster to operate on.
2. Improved worklist management.
The older version of computeLocalDeps used a fixed worklist containing all
nodes, and iterated over this list propagating dependencies until no further
changes were required. The worklist was not sorted into a useful order before
the loop started.
The new version uses a variable work-stack, visiting nodes in DFS order and
only adding nodes when there is meaningful work to do on them.
Compared to the old version the new version avoids revisiting nodes which
haven't changed, and I suspect it converges more quickly (due to the DFS
ordering).
3. Laziness and caching.
Mappings of...
jitlink::Symbol* -> Interned Name (as SymbolStringPtr)
jitlink::Block* -> Immediate dependencies (as SymbolNameSet)
jitlink::Block* -> Transitive dependencies (as SymbolNameSet)
are all built lazily and cached while running computeNamedSymbolDependencies.
According to @drmeister these changes reduced Clasp startup time in his test
setup (averaged over a handful of starts) from 4.8 to 2.8 seconds (with
ORC/JITLink linking ~11,000 object files in that time), which seems like
enough to justify switching to the new algorithm in the absence of any other
perf numbers.
MachOJITDylibInitializers::SectionExtent represented the address range of a
section as an (address, size) pair. The new ExecutorAddressRange type
generalizes this to an address range (for any object, not necessarily a section)
represented as a (start-address, end-address) pair.
The aim is to express more of ORC (and the ORC runtime) in terms of simple types
that can be serialized/deserialized via SPS. This will simplify SPS-based RPC
involving arguments/return-values of these types.
Adds support for both synchronous and asynchronous calls to wrapper functions
using SPS (Simple Packed Serialization). Also adds support for wrapping
functions on the JIT side in SPS-based wrappers that can be called from the
executor.
These new methods simplify calls between the JIT and Executor, and will be used
in upcoming ORC runtime patches to enable communication between ORC and the
runtime.
This is a first step towards consistently using the term 'executor' for the
process that executes JIT'd code. I've opted for 'executor' as the preferred
term over 'target' as target is already heavily overloaded ("the target
machine for the executor" is much clearer than "the target machine for the
target").
ELFLinkGraphBuilder<ELFT> will hold generic parsing and LinkGraph-building code
that can be shared between JITLink ELF backends for different architectures.
For now it's just a stub. The plan is to incrementally move functionality down
from ELFLinkGraphBuilder_x86_64 into the new template.
This patch was derived from Valentin Churavy's work in
https://reviews.llvm.org/D104480. It adds support for setting the transform on
an IRTransformLayer, and for accessing the IRTransformLayer in LLJIT. It also
adds access to the ThreadSafeModule::withModuleDo method for thread-safe
access to modules.
A new example has been added to show how to use these APIs to optimize a module
during materialization.
Thanks Valentin!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D103855
Provides ObjectTransformLayer APIs, a getter to access the
ObjectTransformLayer member of LLJIT, and the DumpObjects utility
to make construction of a dump-to-disk transform easy.
An example showing how the new APIs can be used has been added in
llvm/examples/OrcV2Examples/OrcV2CBindingsDumpObjects.
Addresses FIXMEs in TPC-based EH-frame and debug object registration code by
replacing manual argument serialization with WrapperFunction utility calls.
Replace the existing WrapperFunctionResult type in
llvm/include/ExecutionEngine/Orc/Shared/TargetProcessControlTypes.h with a
version adapted from the ORC runtime's implementation.
Also introduce the SimplePackedSerialization scheme (also adapted from the ORC
runtime's implementation) for wrapper functions to avoid manual serialization
and deserialization for calls to runtime functions involving common types.
The C-string section splitting support added in f9649d123d triggered an assert
("Duplicate canonical symbol at address") when multiple symbols were defined at
the the same offset within a C-string block (this triggered on arm64, where we
always add a block start symbol). The bug was caused by a failure to update the
record of the last canonical symbol address. The fix was to maintain this record
correctly, and move the auto-generation of the block-start symbol above the
handling for symbols defined in the object itself so that all symbols
(auto-generated and defined) are processed in address order.
MachO C-string literal sections should be split on null-terminator boundaries,
rather than the usual symbol boundaries. This patch updates
MachOLinkGraphBuilder to do that.
As suggested on rG937c4cffd024, use llvm_unreachable for unhandled integer types (which shouldn't be possible) instead of breaking and dropping down to the existing fatal error handler.
Helps silence static analyzer warnings.
During the generic x86-64 support refactor in ecf6466f01 the implementation
of MachO_arm64_GOTAndStubsBuilder::isGOTEdgeToFix was altered to only return
true for external symbols. This behavior is incorrect: GOT entries may be
required for defined symbols (e.g. in the large code model).
This patch fixes the bug and adds a test case for it (renaming an old test
case to avoid any ambiguity).
Currently, BPF only contains three relocations:
R_BPF_NONE for no relocation
R_BPF_64_64 for LD_imm64 and normal 64-bit data relocation
R_BPF_64_32 for call insn and normal 32-bit data relocation
Also .BTF and .BTF.ext sections contain symbols in allocated
program and data sections. These two sections reserved 32bit
space to hold the offset relative to the symbol's section.
When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld
may attempt to resolve relocations for .BTF and .BTF.ext,
which we want to prevent. So we used R_BPF_NONE for such relocations.
This all works fine until when we try to do linking of
multiple objects.
. R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data
is different, so lld target->relocate() needs more context
to do a correct job.
. The same for R_BPF_64_32. More context is needed for
lld target->relocate() to differentiate call insn vs.
normal 32-bit data relocation.
. Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE,
they will not be relocated properly when multiple .BTF/.BTF.ext
sections are merged by lld.
This patch intends to address this issue by adding additional
relocation kinds:
R_BPF_64_ABS64 for normal 64-bit data relocation
R_BPF_64_ABS32 for normal 32-bit data relocation
R_BPF_64_NODYLD32 for .BTF and .BTF.ext style relocations.
The old R_BPF_64_{64,32} semantics:
R_BPF_64_64 for LD_imm64 relocation
R_BPF_64_32 for call insn relocation
The existing R_BPF_64_64/R_BPF_64_32 mapping to numeric values
is maintained. They are the most common use cases for
bpf programs and we want to maintain backward compatibility
as much as possible.
ExecutionEngine RuntimeDyld BPF relocations are adjusted as well.
R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and
other relocations will be ignored.
Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in
RuntimeDyldELF.cpp will result in "Relocation type not implemented yet!"
fatal error.
FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp
are removed as they are not triggered in BPF backend.
BPF backend used FK_SecRel_8 for LD_imm64 instruction operands.
Differential Revision: https://reviews.llvm.org/D102712
This patch introduces new operations on jitlink::Blocks: setMutableContent,
getMutableContent and getAlreadyMutableContent. The setMutableContent method
will set the block content data and size members and flag the content as
mutable. The getMutableContent method will return a mutable copy of the existing
content value, auto-allocating and populating a new mutable copy if the existing
content is marked immutable. The getAlreadyMutableMethod asserts that the
existing content is already mutable and returns it.
setMutableContent should be used when updating the block with totally new
content backed by mutable memory. It can be used to change the size of the
block. The argument value should *not* be shared with any other block.
getMutableContent should be used when clients want to modify the existing
content and are unsure whether it is mutable yet.
getAlreadyMutableContent should be used when clients want to modify the existing
content and know from context that it must already be immutable.
These operations reduce copy-modify-update boilerplate and unnecessary copies
introduced when clients couldn't me sure whether the existing content was
mutable or not.
The implementation and intent behind freeing the triple string here is the same
as LLVMGetDefaultTargetTriple (and any other owned c string returned from the C
API), so we should use LLVMDisposeMessage for to free the string for
consistency.
Patch by Mats Larsen -- thanks Mats!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D102957
This patch introduces functionality used by BOLT when
re-linking the final binary. It adds to MemoryManager a new member
function allowStubAllocation to control whether this MemoryManager
supports increasing code size with stubs or not. Since BOLT can
rewrite some files in-place, it needs to avoid stub insertion done
by the linker. This patch also introduces allowsZeroSymbols to the
JITSymbolResolver class, enabling us to finish a link successfully
even when some symbols resolve to the value zero. When rewriting a
binary, sometimes we do need to resolve a target to zero in case
the input binary calls address zero and we want to be bug
compatible. We also expose reassignSectionAddress as it is used by
BOLT.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97898
Fix was implemented in the ittap repo to solve an error about cross-compiling ITTAPI in LLVM with mingw.
The problem occurred in the cross-compilation environment for Julia's dependencies.
The corresponding issue item in ittapi repo: https://github.com/intel/ittapi/issues/19
A new tag was created in ittapi repo for that fix.
This patch contains changes to update the ittapi tag in LLVM.
Reviewed By: bader
Differential Revision: https://reviews.llvm.org/D102471
This is separate from (but builds on) the support added in ec6b71df70 for
emitting LinkGraphs in the context of an active materialization. This commit
makes LinkGraphs a first-class data structure with features equivalent to
object files within ObjectLinkingLayer.
These can be used to create eh-frame section fixing passes outside the usual
linker pipeline, which can be useful for tests and tools that just want to
verify or dump graphs.
This commit reorders some fields and fixes the width of others to try to
maintain more consistent columns. It also switches to long-hand scope
and linkage names, since LinkGraph dumps aren't read often enough for
single-character codes to be memorable.
Generalizing this API allows work to be distributed more evenly. In particular,
query callbacks can now be dispatched (rather than running immediately on the
thread that satisfied the query). This avoids the pathalogical case where an
operation on one thread satisfies many queries simultaneously, causing large
amounts of work to be run on that thread while other threads potentially sit
idle.
This can be useful for clients constructing custom JIT stacks: If the C API
for your custom stack exposes API to obtain a reference to an object layer
(e.g. LLVMOrcLLJITGetObjLinkingLayer) then the newly added
LLVMOrcObjectLayerAddObjectFile and LLVMOrcObjectLayerAddObjectFileWithRT
functions can be used to add objects directly to that layer.
This reapplies 8740360093, which was reverted in bbddadd46e due to buildbot
errors.
This version checks that a JIT instance can be safely constructed, skipping
tests if it can not be. To enable this it introduces new C API to retrieve and
set the target triple for a JITTargetMachineBuilder.
Adds support for creating custom MaterializationUnits in the C API with the new
LLVMOrcCreateCustomMaterializationUnit function.
Modifies ownership rules for LLVMOrcAbsoluteSymbols to make it consistent with
LLVMOrcCreateCustomMaterializationUnit. This is an ABI breaking change for any
clients of the LLVMOrcAbsoluteSymbols API.
Adds LLVMOrcLLJITGetObjLinkingLayer and LLVMOrcObjectLayerEmit functions to
allow clients to get a reference to an LLJIT instance's linking layer, then
emit an object file using it. This can be used to support construction of
custom materialization units in the common case where those units will
generate an object file that needs to be emitted to complete the
materialization.
In EHFrameRegistrationPlugin::notifyTransferringResources if SrcKey had
eh-frames associated but DstKey did not we would create a new entry for DskKey,
invalidating the iterator for SrcKey in the process. This commit fixes that by
removing SrcKey first in this case.
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.
This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.
After this lands and has baked for a couple days, planned cleanups:
Make GCStatepointInst a IntrinsicInst subclass.
Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
Do the same in SelectionDAG.
Do the same in FastISEL.
Differential Revision: https://reviews.llvm.org/D99976