Commit Graph

465 Commits

Author SHA1 Message Date
Lang Hames c6ade39ba0 [ORC] Replace LLJIT::defineAbsolute with an LLJIT::define convenience method.
LLJIT::defineAbsolute did not mangle its Name argument, which is inconsistent
with the behavior of other LLJIT methods (e.g. lookup). Since it is currently
unused anyway, this commit replaces it with a generic 'define' convenience
method for adding MaterializationUnits to the main JITDylib. This simplifies
use of the generic absoluteSymbols function (as well as the symbolAlias,
reexports and other functions that generate MaterializationUnits) with LLJIT.
2020-04-18 14:16:54 -07:00
vgxbj ac00376a13 [Object] Change uint32_t getSymbolFlags() to Expected<uint32_t> getSymbolFlags().
This change enables getSymbolFlags() to return errors which benefit error reporting in clients.

Differential Revision: https://reviews.llvm.org/D77860
2020-04-18 21:27:57 +08:00
Lang Hames 59ed45b483 [ORC] Add an OrcV2 C API function for configuring TargetMachines. 2020-04-10 15:51:29 -07:00
Lang Hames 37bcf2df01 [ORC] Require JITDylib to be specified when adding IR and objects in the C API. 2020-04-09 17:59:26 -07:00
Lang Hames 0d5f15f700 [ORC] Add C API support for adding object files to an LLJIT instance. 2020-04-09 16:18:46 -07:00
Lang Hames 1cd8493e69 [ORC] Expand the OrcV2 C API bindings.
Adds basic support for LLJITBuilder and DynamicLibrarySearchGenerator. This
allows C API clients to configure LLJIT to expose process symbols to JIT'd
code. An example of this is added in
llvm/examples/OrcV2CBindingsReflectProcessSymbols.
2020-04-09 16:18:46 -07:00
Lang Hames 5877d6f5f4 [ORC] Make mangling convenience methods part of the public API of LLJIT.
This saves clients from having to manually construct a MangleAndInterner.
2020-04-08 20:20:13 -07:00
Lang Hames 1b39c6f62c [ORC] Add MachO universal binary support to StaticLibraryDefinitionGenerator.
Add a new overload of StaticLibraryDefinitionGenerator::Load that takes a triple
argument and supports loading archives from MachO universal binaries in addition
to regular archives.

The LLI tool is updated to use this overload.
2020-04-05 20:21:05 -07:00
Lang Hames 05598441de Re-apply 0071eaaf08, "[ORC] Export __cxa_atexit ...", with fixes.
Forgot to include part of the testcase. Thank to Nico for spotting that and
reverting!
2020-04-02 16:03:35 -07:00
Nico Weber 5bac8d427d Revert "[ORC] Export __cxa_atexit from the main JITDylib in LLJIT."
This reverts commit 0071eaaf08.
Inputs/noop-main.ll wasn't checked in, so this breaks check-llvm
everywhere.
2020-04-01 22:49:38 -04:00
Lang Hames 0071eaaf08 [ORC] Export __cxa_atexit from the main JITDylib in LLJIT.
Failure to export __cxa_atexit can lead to an attempt to import a definition
from the process itself (if __cxa_atexit is referenced from another JITDylib),
but the process definition will clash with the existing non-exported definition
to produce an unexpected DuplicateDefinitionError.

This patch fixes the immediate issue by exporting __cxa_atexit. It also fixes a
bug where atexit functions in other JITDylibs were not being run by adding a
copy of run_atexits_helper to every JITDylib.

A follow up patch will deal with the bug where definition generators are called
despite a non-exported definition being present.
2020-04-01 19:12:08 -07:00
Lang Hames 8e5a8f620c [ORC] Don't require a null-terminator on MemoryBuffers for objects in archives.
The MemoryBuffer::getMemBuffer method's RequiresNullTerminator parameter
defaults to true, but object files are not null terminated so we need to
explicitly pass false here.
2020-04-01 12:16:38 -07:00
Lang Hames cb84e4827e [ORC] Introduce JITSymbolFlags::HasMaterializeSideEffectsOnly flag.
This flag can be used to mark a symbol as existing only for the purpose of
enabling materialization. Such a symbol can be looked up to trigger
materialization with the lookup returning only once materialization is
complete. Symbols with this flag will never resolve however (to avoid
permanently polluting the symbol table), and should only be looked up using
the SymbolLookupFlags::WeaklyReferencedSymbol flag. The primary use case for
this flag is initialization symbols.
2020-03-27 11:02:54 -07:00
Lang Hames d38d06e649 [ORC] Don't create MaterializingInfo entries unnecessarily. 2020-03-27 11:02:54 -07:00
Alina Sbirlea 3abcbf9903 [CFG/BasicBlock] Rename succ_const to const_succ. [NFC]
Summary:
Rename `succ_const_iterator` to `const_succ_iterator` and
`succ_const_range` to `const_succ_range` for consistency with the
predecessor iterators, and the corresponding iterators in
MachineBasicBlock.

Reviewers: nicholas, dblaikie, nlewycky

Subscribers: hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75952
2020-03-25 12:40:55 -07:00
Lang Hames 38a8760b99 [ORC] Move ostream operators for debugging output out of Core.h.
DebugUtils.h seems like a more appropriate home for these.
2020-03-21 18:27:28 -07:00
Lang Hames 39253a50f0 [ORC] Re-apply 98f2bb4461, enable JITEventListeners in OrcV2, with fixes.
Updates the object buffer ownership scheme in jitLinkForOrc and related
functions: Ownership of both the object::ObjectFile and underlying
MemoryBuffer is passed into jitLinkForOrc and passed back to the onEmit
callback once linking is complete. This avoids the use-after-free errors
that were seen in 98f2bb4461.
2020-03-19 16:30:08 -07:00
Lang Hames 54aec178da [ORC] Don't use a platform mutex for LLJIT's GenericLLVMIRPlatformSupport class.
Along the same lines as eb918d8daf1: This code also had to acquire the session
mutex, and this could cause a deadlock under the wrong circumstances. This
patch updates GenericLLVMIRPlatformSupport to just use the session lock for
everything.
2020-03-19 11:03:34 -07:00
Lang Hames ad2da631bf [ORC] Fix indentation in debugging output. 2020-03-19 11:02:56 -07:00
Lang Hames eb918d8daf [ORC] Use finer-grained and session locking in MachOPlatform to avoid deadlock.
In MachOPlatform, obtaining the link-order for a JITDylib requires locking the
session, but also needs to be part of a larger atomic operation that collates
initializer symbols tracked by the platform. Trying to do this under a separate
platform mutex leads to potential locking order issues, e.g.

T1 locks session then tries to lock platform to register a new init symbol
meanwhile
T2 locks platform then tries to lock session to obtain link order.

Removing the platform lock and performing all these operations under the session
lock eliminates this possibility.

At the same time we also need to collate init pointers from the
MachOPlatform::InitScraperPlugin, and we don't need or want to lock the session
for that. The new InitSeqMutex has been added to guard these init pointers, and
the session mutex is never obtained while the InitSeqMutex is held.
2020-03-19 11:02:56 -07:00
Lang Hames a7b8393ffe [ORC] Don't waste time building empty replacement MaterializationUnits. 2020-03-19 11:02:56 -07:00
Lang Hames cd34c0570b [ORC] Bail out early if a replacement MaterializationUnit is empty.
The MU may define no symbols, but still contain a non-trivial destructor (e.g.
an LLVM IR module that has been stripped of all externally visible
definitions, but which still needs to lock its context to be destroyed).
Bailing out early ensures that we destroy the unit outside the session lock,
rather than under it which may cause deadlocks.

Also adds some extra sanity-checking assertions.
2020-03-19 11:02:56 -07:00
Lang Hames 9c5771710e Revert "[ORC] Enable JITEventListeners in the RTDyldObjectLinkingLayer."
This reverts commit 98f2bb4461.

Reverting while I investigate bot failures.
2020-03-15 15:35:08 -07:00
Lang Hames 98f2bb4461 [ORC] Enable JITEventListeners in the RTDyldObjectLinkingLayer.
Enable use of ExecutionEngine JITEventListeners in RTDyldObjectLinkingLayer.
This allows existing MCJIT clients to more easily migrate to LLJIT / ORCv2.

Example usage in llvm/examples/OrcV2Examples/LLJITWithGDBRegistrationListener.

Differential Revision: https://reviews.llvm.org/D75838
2020-03-15 15:14:46 -07:00
Lang Hames 981f017c5c [ORC] Print symbol flags and materializer name in ExecutionSession::dump.
The extra information can be helpful in diagnosing JIT bugs.
2020-03-14 18:52:10 -07:00
Lang Hames 9c9eb60b4b [JITLink][MachO] Re-apply b64afadf30, MachO linker-private support, with fixes.
Global symbols with linker-private prefixes should be resolvable across object
boundaries, but internal symbols with linker-private prefixes should not.
2020-03-14 18:36:15 -07:00
Lang Hames a7d187d9c0 Revert "[JITLink][MachO] Treat linker private symbols as hidden rather than private."
This reverts commit b64afadf30.

Reverting while I investigate bot failures.
2020-03-14 16:52:25 -07:00
Lang Hames b64afadf30 [JITLink][MachO] Treat linker private symbols as hidden rather than private.
Linker-private symbols should be resolvable across object file boundaries.
2020-03-14 16:33:15 -07:00
Lang Hames 633ea07200 [Orc] Add basic OrcV2 C bindings and example.
Renames the llvm/examples/LLJITExamples directory to llvm/examples/OrcV2Examples
since it is becoming a home for all OrcV2 examples, not just LLJIT.

See http://llvm.org/PR31103.
2020-03-14 14:41:22 -07:00
Jan Korous b7ce8fa91e [LLJIT] Add std::move() as a workaround for older compilers
Clang 3.8 isn't able to bind the variable to rvalue-ref which breaks the build.
2020-03-13 15:25:25 -07:00
Lang Hames 7266a8bfeb [ORC] Enable exception handling in JIT'd code when using LLJIT on Darwin.
This patch enables exception handling in code added to LLJIT on Darwin by
adding an orc::EHFrameRegistrationPlugin instance to the ObjectLinkingLayer
(which is currently used on Darwin only).
2020-03-12 15:33:56 -07:00
Lang Hames 214a9f0dd4 [ORC] Add a mutex to guard EHFrameRegistrationPlugin data structures.
These may be accessed from multiple threads if concurrent materialization is
enabled in ORC.

Testcase coming in a follow-up patch that enables eh-frame registration for
LLJIT.
2020-03-12 15:33:56 -07:00
Lang Hames b19801640b [ORC] Fix an overly aggressive assert.
It is ok to add dependencies on symbols that are ready, they should just be
skipped.
2020-03-11 20:04:54 -07:00
Lang Hames 4b87f9230b [ORC] Add some extra debugging output. 2020-03-11 20:04:54 -07:00
Lang Hames 4b15decb60 [ORC] Remove hard dependency on libobjc when using MachOPlatform with LLJIT.
The LLJIT::MachOPlatformSupport class used to unconditionally attempt to
register __objc_selrefs and __objc_classlist sections. If libobjc had not
been loaded this resulted in an assertion, even if no objc sections were
actually present. This patch replaces this unconditional registration with
a check that no objce sections are present if libobjc has not been loaded.
This will allow clients to use MachOPlatform with LLJIT without requiring
libobjc for non-objc code.
2020-03-04 21:49:28 -08:00
Stefan Gränitz 76c59a63bc [ORC] Decompose LazyCallThroughManager::callThroughToSymbol()
Summary: Decompose callThroughToSymbol() into findReexport(), resolveSymbol(), notifyResolved() and reportCallThroughError(). This allows derived classes to reuse the functionality while adding their own code in between.

Reviewers: lhames

Reviewed By: lhames

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75084
2020-03-05 00:24:23 +01:00
Lang Hames 8363ff04af [ORC] Add some debugging output for initializers.
This output can be useful in tracking down initialization failures in the JIT.
2020-03-04 12:38:25 -08:00
Lang Hames 31e0331763 [ORC] Skip ST_File symbols in MaterializationUnit interfaces / resolution.
ST_File symbols aren't relevant for linking purposes, but can end up shadowing
real symbols if they're not filtered.

No test case yet: The ideal testcase for this would be an ELF llvm-jitlink test,
but llvm-jitlink support for ELF is still under development. We should add a
testcase for this once support lands in tree.
2020-03-03 16:15:44 -08:00
Lang Hames ff4fd8dead [ORC] Make sure we add initializers to the SymbolFlags map for objects. 2020-03-03 09:33:37 -08:00
Lang Hames 22ed8c4994 [ORC] Remove an out-of-date FIXME 2020-03-03 09:33:37 -08:00
Reid Kleckner af450eabb9 Avoid including FileSystem.h from MemoryBuffer.h
Lots of headers pass around MemoryBuffer objects, but very few open
them. Let those that do include FileSystem.h.

Saves ~250 includes of Chrono.h & FileSystem.h:

$ diff -u thedeps-before.txt thedeps-after.txt | grep '^[-+] ' | sort | uniq -c | sort -nr
    254 -    ../llvm/include/llvm/Support/FileSystem.h
    253 -    ../llvm/include/llvm/Support/Chrono.h
    237 -    ../llvm/include/llvm/Support/NativeFormatting.h
    237 -    ../llvm/include/llvm/Support/FormatProviders.h
    192 -    ../llvm/include/llvm/ADT/StringSwitch.h
    190 -    ../llvm/include/llvm/Support/FormatVariadicDetails.h
...

This requires duplicating the file_t typedef, which is unfortunate. I
sunk the choice of mapping mode down into the cpp file using variable
template specializations instead of class members in headers.
2020-02-29 12:30:23 -08:00
Lang Hames b7aa1cc3a4 [ORC] Remove the JITDylib::SymbolTableEntry::isInMaterializingState() method.
It was being used inconsistently. Uses have been replaced with direct checks
on the symbol state.
2020-02-25 16:44:12 -08:00
Benjamin Kramer 8c893cac3f [ORC] Remove spammy debug print 2020-02-24 12:10:13 +01:00
Lang Hames 1df947ab40 [ORC] Update LLJIT to automatically run specially named initializer functions.
The GenericLLVMIRPlatformSupport class runs a transform on all LLVM IR added to
the LLJIT instance to replace instances of llvm.global_ctors with a specially
named function that runs the corresponing static initializers (See
(GlobalCtorDtorScraper from lib/ExecutionEngine/Orc/LLJIT.cpp). This patch
updates the GenericIRPlatform class to check for this specially named function
in other materialization units that are added to the JIT and, if found, add
the function to the initializer work queue. Doing this allows object files
that were compiled from IR and cached to be reloaded in subsequent JIT sessions
without their initializers being skipped.

To enable testing this patch also updates the lli tool's -jit-kind=orc-lazy mode
to respect the -enable-cache-manager and -object-cache-dir options, and modifies
the CompileOnDemandLayer to rename extracted submodules to include a hash of the
names of their symbol definitions. This allows a simple object caching scheme
based on module names (which was already implemented in lli) to work with the
lazy JIT.
2020-02-22 11:49:14 -08:00
Lang Hames 81726894d3 [ORC] Add errors for missing and extraneous symbol definitions.
This patch adds new errors and error checking to the ObjectLinkingLayer to
catch cases where a compiled or loaded object either:
(1) Contains definitions not covered by its responsibility set, or
(2) Is missing definitions that are covered by its responsibility set.

Proir to this patch providing the correct set of definitions was treated as
an API contract requirement, however this requires that the client be confident
in the correctness of the whole compiler / object-cache pipeline and results
in difficult-to-debug assertions upon failure. Treating this as a recoverable
error results in clearer diagnostics.

The performance overhead of this check is one comparison of densemap keys
(symbol string pointers) per linking object, which is minimal. If this overhead
ever becomes a problem we can add the check under a flag that can be turned off
if the client fully trusts the rest of the pipeline.
2020-02-22 11:49:14 -08:00
Hans Wennborg 1f984c83a4 Add #include <condition_variable> to fix build after 85fb997659
See https://reviews.llvm.org/D74300#1884614
2020-02-20 16:36:19 +01:00
Lang Hames 63d0932c35 [ORC] Fix a missing move. 2020-02-19 14:27:31 -08:00
Lang Hames 85fb997659 [ORC] Add generic initializer/deinitializer support.
Initializers and deinitializers are used to implement C++ static constructors
and destructors, runtime registration for some languages (e.g. with the
Objective-C runtime for Objective-C/C++ code) and other tasks that would
typically be performed when a shared-object/dylib is loaded or unloaded by a
statically compiled program.

MCJIT and ORC have historically provided limited support for discovering and
running initializers/deinitializers by scanning the llvm.global_ctors and
llvm.global_dtors variables and recording the functions to be run. This approach
suffers from several drawbacks: (1) It only works for IR inputs, not for object
files (including cached JIT'd objects). (2) It only works for initializers
described by llvm.global_ctors and llvm.global_dtors, however not all
initializers are described in this way (Objective-C, for example, describes
initializers via specially named metadata sections). (3) To make the
initializer/deinitializer functions described by llvm.global_ctors and
llvm.global_dtors searchable they must be promoted to extern linkage, polluting
the JIT symbol table (extra care must be taken to ensure this promotion does
not result in symbol name clashes).

This patch introduces several interdependent changes to ORCv2 to support the
construction of new initialization schemes, and includes an implementation of a
backwards-compatible llvm.global_ctor/llvm.global_dtor scanning scheme, and a
MachO specific scheme that handles Objective-C runtime registration (if the
Objective-C runtime is available) enabling execution of LLVM IR compiled from
Objective-C and Swift.

The major changes included in this patch are:

(1) The MaterializationUnit and MaterializationResponsibility classes are
extended to describe an optional "initializer" symbol for the module (see the
getInitializerSymbol method on each class). The presence or absence of this
symbol indicates whether the module contains any initializers or
deinitializers. The initializer symbol otherwise behaves like any other:
searching for it triggers materialization.

(2) A new Platform interface is introduced in llvm/ExecutionEngine/Orc/Core.h
which provides the following callback interface:

  - Error setupJITDylib(JITDylib &JD): Can be used to install standard symbols
    in JITDylibs upon creation. E.g. __dso_handle.

  - Error notifyAdding(JITDylib &JD, const MaterializationUnit &MU): Generally
    used to record initializer symbols.

  - Error notifyRemoving(JITDylib &JD, VModuleKey K): Used to notify a platform
    that a module is being removed.

  Platform implementations can use these callbacks to track outstanding
initializers and implement a platform-specific approach for executing them. For
example, the MachOPlatform installs a plugin in the JIT linker to scan for both
__mod_inits sections (for C++ static constructors) and ObjC metadata sections.
If discovered, these are processed in the usual platform order: Objective-C
registration is carried out first, then static initializers are executed,
ensuring that calls to Objective-C from static initializers will be safe.

This patch updates LLJIT to use the new scheme for initialization. Two
LLJIT::PlatformSupport classes are implemented: A GenericIR platform and a MachO
platform. The GenericIR platform implements a modified version of the previous
llvm.global-ctor scraping scheme to provide support for Windows and
Linux. LLJIT's MachO platform uses the MachOPlatform class to provide MachO
specific initialization as described above.

Reviewers: sgraenitz, dblaikie

Subscribers: mgorny, hiraditya, mgrang, ributzka, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74300
2020-02-19 13:59:32 -08:00
Alexandre Ganea 8404aeb56a [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
2020-02-14 10:24:22 -05:00
Lang Hames ca6f58486f [ORC] Fix symbol dependence propagation algorithm in ObjectLinkingLayer.
ObjectLinkingLayer was not correctly propagating dependencies through local
symbols within an object. This could cause symbol lookup to return before a
searched-for symbol is ready if the following conditions are met:
(1) The definition of the symbol being searched for transitively depends on a
    local symbol within the same object, and that local symbol in turn
    transitively depends on an external symbol provided by some other module
    in the JIT.
(2) Concurrent compilation is enabled.
(3) Thread scheduling causes the lookup of the searched-for symbol to return
    before all transitive dependencies of the looked-up symbol are emitted.

This bug was found by inspection and has not been observed in practice.

A jitlink test case has been added to verify that symbol dependencies are
correctly propagated through local symbol definitions.
2020-02-11 12:56:41 -08:00