forked from OSchip/llvm-project
				
			
		
			
				
	
	
		
			324 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
			
		
		
	
	
			324 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
| =======================================================
 | |
| Building a JIT: Starting out with KaleidoscopeJIT
 | |
| =======================================================
 | |
| 
 | |
| .. contents::
 | |
|    :local:
 | |
| 
 | |
| Chapter 1 Introduction
 | |
| ======================
 | |
| 
 | |
| **Warning: This tutorial is currently being updated to account for ORC API
 | |
| changes. Only Chapters 1 and 2 are up-to-date.**
 | |
| 
 | |
| **Example code from Chapters 3 to 5 will compile and run, but has not been
 | |
| updated**
 | |
| 
 | |
| Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This
 | |
| tutorial runs through the implementation of a JIT compiler using LLVM's
 | |
| On-Request-Compilation (ORC) APIs. It begins with a simplified version of the
 | |
| KaleidoscopeJIT class used in the
 | |
| `Implementing a language with LLVM <LangImpl01.html>`_ tutorials and then
 | |
| introduces new features like concurrent compilation, optimization, lazy
 | |
| compilation and remote execution.
 | |
| 
 | |
| The goal of this tutorial is to introduce you to LLVM's ORC JIT APIs, show how
 | |
| these APIs interact with other parts of LLVM, and to teach you how to recombine
 | |
| them to build a custom JIT that is suited to your use-case.
 | |
| 
 | |
| The structure of the tutorial is:
 | |
| 
 | |
| - Chapter #1: Investigate the simple KaleidoscopeJIT class. This will
 | |
|   introduce some of the basic concepts of the ORC JIT APIs, including the
 | |
|   idea of an ORC *Layer*.
 | |
| 
 | |
| - `Chapter #2 <BuildingAJIT2.html>`_: Extend the basic KaleidoscopeJIT by adding
 | |
|   a new layer that will optimize IR and generated code.
 | |
| 
 | |
| - `Chapter #3 <BuildingAJIT3.html>`_: Further extend the JIT by adding a
 | |
|   Compile-On-Demand layer to lazily compile IR.
 | |
| 
 | |
| - `Chapter #4 <BuildingAJIT4.html>`_: Improve the laziness of our JIT by
 | |
|   replacing the Compile-On-Demand layer with a custom layer that uses the ORC
 | |
|   Compile Callbacks API directly to defer IR-generation until functions are
 | |
|   called.
 | |
| 
 | |
| - `Chapter #5 <BuildingAJIT5.html>`_: Add process isolation by JITing code into
 | |
|   a remote process with reduced privileges using the JIT Remote APIs.
 | |
| 
 | |
| To provide input for our JIT we will use a lightly modified version of the
 | |
| Kaleidoscope REPL from `Chapter 7 <LangImpl07.html>`_ of the "Implementing a
 | |
| language in LLVM tutorial".
 | |
| 
 | |
| Finally, a word on API generations: ORC is the 3rd generation of LLVM JIT API.
 | |
| It was preceded by MCJIT, and before that by the (now deleted) legacy JIT.
 | |
| These tutorials don't assume any experience with these earlier APIs, but
 | |
| readers acquainted with them will see many familiar elements. Where appropriate
 | |
| we will make this connection with the earlier APIs explicit to help people who
 | |
| are transitioning from them to ORC.
 | |
| 
 | |
| JIT API Basics
 | |
| ==============
 | |
| 
 | |
| The purpose of a JIT compiler is to compile code "on-the-fly" as it is needed,
 | |
| rather than compiling whole programs to disk ahead of time as a traditional
 | |
| compiler does. To support that aim our initial, bare-bones JIT API will have
 | |
| just two functions:
 | |
| 
 | |
| 1. ``Error addModule(std::unique_ptr<Module> M)``: Make the given IR module
 | |
|    available for execution.
 | |
| 2. ``Expected<JITEvaluatedSymbol> lookup()``: Search for pointers to
 | |
|    symbols (functions or variables) that have been added to the JIT.
 | |
| 
 | |
| A basic use-case for this API, executing the 'main' function from a module,
 | |
| will look like:
 | |
| 
 | |
| .. code-block:: c++
 | |
| 
 | |
|   JIT J;
 | |
|   J.addModule(buildModule());
 | |
|   auto *Main = (int(*)(int, char*[]))J.lookup("main").getAddress();
 | |
|   int Result = Main();
 | |
| 
 | |
| The APIs that we build in these tutorials will all be variations on this simple
 | |
| theme. Behind this API we will refine the implementation of the JIT to add
 | |
| support for concurrent compilation, optimization and lazy compilation.
 | |
| Eventually we will extend the API itself to allow higher-level program
 | |
| representations (e.g. ASTs) to be added to the JIT.
 | |
| 
 | |
| KaleidoscopeJIT
 | |
| ===============
 | |
| 
 | |
| In the previous section we described our API, now we examine a simple
 | |
| implementation of it: The KaleidoscopeJIT class [1]_ that was used in the
 | |
| `Implementing a language with LLVM <LangImpl01.html>`_ tutorials. We will use
 | |
| the REPL code from `Chapter 7 <LangImpl07.html>`_ of that tutorial to supply the
 | |
| input for our JIT: Each time the user enters an expression the REPL will add a
 | |
| new IR module containing the code for that expression to the JIT. If the
 | |
| expression is a top-level expression like '1+1' or 'sin(x)', the REPL will also
 | |
| use the lookup method of our JIT class find and execute the code for the
 | |
| expression. In later chapters of this tutorial we will modify the REPL to enable
 | |
| new interactions with our JIT class, but for now we will take this setup for
 | |
| granted and focus our attention on the implementation of our JIT itself.
 | |
| 
 | |
| Our KaleidoscopeJIT class is defined in the KaleidoscopeJIT.h header. After the
 | |
| usual include guards and #includes [2]_, we get to the definition of our class:
 | |
| 
 | |
| .. code-block:: c++
 | |
| 
 | |
|   #ifndef LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
 | |
|   #define LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
 | |
| 
 | |
|   #include "llvm/ADT/StringRef.h"
 | |
|   #include "llvm/ExecutionEngine/JITSymbol.h"
 | |
|   #include "llvm/ExecutionEngine/Orc/CompileUtils.h"
 | |
|   #include "llvm/ExecutionEngine/Orc/Core.h"
 | |
|   #include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
 | |
|   #include "llvm/ExecutionEngine/Orc/IRCompileLayer.h"
 | |
|   #include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
 | |
|   #include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
 | |
|   #include "llvm/ExecutionEngine/SectionMemoryManager.h"
 | |
|   #include "llvm/IR/DataLayout.h"
 | |
|   #include "llvm/IR/LLVMContext.h"
 | |
|   #include <memory>
 | |
| 
 | |
|   namespace llvm {
 | |
|   namespace orc {
 | |
| 
 | |
|   class KaleidoscopeJIT {
 | |
|   private:
 | |
|     ExecutionSession ES;
 | |
|     RTDyldObjectLinkingLayer ObjectLayer;
 | |
|     IRCompileLayer CompileLayer;
 | |
| 
 | |
|     DataLayout DL;
 | |
|     MangleAndInterner Mangle;
 | |
|     ThreadSafeContext Ctx;
 | |
| 
 | |
|   public:
 | |
|     KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
 | |
|         : ObjectLayer(ES,
 | |
|                       []() { return std::make_unique<SectionMemoryManager>(); }),
 | |
|           CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
 | |
|           DL(std::move(DL)), Mangle(ES, this->DL),
 | |
|           Ctx(std::make_unique<LLVMContext>()) {
 | |
|       ES.getMainJITDylib().setGenerator(
 | |
|           cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL)));
 | |
|     }
 | |
| 
 | |
| Our class begins with six member variables: An ExecutionSession member, ``ES``,
 | |
| which provides context for our running JIT'd code (including the string pool,
 | |
| global mutex, and error reporting facilities); An RTDyldObjectLinkingLayer,
 | |
| ``ObjectLayer``, that can be used to add object files to our JIT (though we will
 | |
| not use it directly); An IRCompileLayer, ``CompileLayer``, that can be used to
 | |
| add LLVM Modules to our JIT (and which builds on the ObjectLayer), A DataLayout
 | |
| and MangleAndInterner, ``DL`` and ``Mangle``, that will be used for symbol mangling
 | |
| (more on that later); and finally an LLVMContext that clients will use when
 | |
| building IR files for the JIT.
 | |
| 
 | |
| Next up we have our class constructor, which takes a `JITTargetMachineBuilder``
 | |
| that will be used by our IRCompiler, and a ``DataLayout`` that we will use to
 | |
| initialize our DL member. The constructor begins by initializing our
 | |
| ObjectLayer.  The ObjectLayer requires a reference to the ExecutionSession, and
 | |
| a function object that will build a JIT memory manager for each module that is
 | |
| added (a JIT memory manager manages memory allocations, memory permissions, and
 | |
| registration of exception handlers for JIT'd code). For this we use a lambda
 | |
| that returns a SectionMemoryManager, an off-the-shelf utility that provides all
 | |
| the basic memory management functionality required for this chapter. Next we
 | |
| initialize our CompileLayer. The CompileLayer needs three things: (1) A
 | |
| reference to the ExecutionSession, (2) A reference to our object layer, and (3)
 | |
| a compiler instance to use to perform the actual compilation from IR to object
 | |
| files. We use the off-the-shelf ConcurrentIRCompiler utility as our compiler,
 | |
| which we construct using this constructor's JITTargetMachineBuilder argument.
 | |
| The ConcurrentIRCompiler utility will use the JITTargetMachineBuilder to build
 | |
| llvm TargetMachines (which are not thread safe) as needed for compiles. After
 | |
| this, we initialize our supporting members: ``DL``, ``Mangler`` and ``Ctx`` with
 | |
| the input DataLayout, the ExecutionSession and DL member, and a new default
 | |
| constructed LLVMContext respectively. Now that our members have been initialized,
 | |
| so the one thing that remains to do is to tweak the configuration of the
 | |
| *JITDylib* that we will store our code in. We want to modify this dylib to
 | |
| contain not only the symbols that we add to it, but also the symbols from our
 | |
| REPL process as well. We do this by attaching a
 | |
| ``DynamicLibrarySearchGenerator`` instance using the
 | |
| ``DynamicLibrarySearchGenerator::GetForCurrentProcess`` method.
 | |
| 
 | |
| 
 | |
| .. code-block:: c++
 | |
| 
 | |
|   static Expected<std::unique_ptr<KaleidoscopeJIT>> Create() {
 | |
|     auto JTMB = JITTargetMachineBuilder::detectHost();
 | |
| 
 | |
|     if (!JTMB)
 | |
|       return JTMB.takeError();
 | |
| 
 | |
|     auto DL = JTMB->getDefaultDataLayoutForTarget();
 | |
|     if (!DL)
 | |
|       return DL.takeError();
 | |
| 
 | |
|     return std::make_unique<KaleidoscopeJIT>(std::move(*JTMB), std::move(*DL));
 | |
|   }
 | |
| 
 | |
|   const DataLayout &getDataLayout() const { return DL; }
 | |
| 
 | |
|   LLVMContext &getContext() { return *Ctx.getContext(); }
 | |
| 
 | |
| Next we have a named constructor, ``Create``, which will build a KaleidoscopeJIT
 | |
| instance that is configured to generate code for our host process. It does this
 | |
| by first generating a JITTargetMachineBuilder instance using that classes'
 | |
| detectHost method and then using that instance to generate a datalayout for
 | |
| the target process. Each of these operations can fail, so each returns its
 | |
| result wrapped in an Expected value [3]_ that we must check for error before
 | |
| continuing. If both operations succeed we can unwrap their results (using the
 | |
| dereference operator) and pass them into KaleidoscopeJIT's constructor on the
 | |
| last line of the function.
 | |
| 
 | |
| Following the named constructor we have the ``getDataLayout()`` and
 | |
| ``getContext()`` methods. These are used to make data structures created and
 | |
| managed by the JIT (especially the LLVMContext) available to the REPL code that
 | |
| will build our IR modules.
 | |
| 
 | |
| .. code-block:: c++
 | |
| 
 | |
|   void addModule(std::unique_ptr<Module> M) {
 | |
|     cantFail(CompileLayer.add(ES.getMainJITDylib(),
 | |
|                               ThreadSafeModule(std::move(M), Ctx)));
 | |
|   }
 | |
| 
 | |
|   Expected<JITEvaluatedSymbol> lookup(StringRef Name) {
 | |
|     return ES.lookup({&ES.getMainJITDylib()}, Mangle(Name.str()));
 | |
|   }
 | |
| 
 | |
| Now we come to the first of our JIT API methods: addModule. This method is
 | |
| responsible for adding IR to the JIT and making it available for execution. In
 | |
| this initial implementation of our JIT we will make our modules "available for
 | |
| execution" by adding them to the CompileLayer, which will it turn store the
 | |
| Module in the main JITDylib. This process will create new symbol table entries
 | |
| in the JITDylib for each definition in the module, and will defer compilation of
 | |
| the module until any of its definitions is looked up. Note that this is not lazy
 | |
| compilation: just referencing a definition, even if it is never used, will be
 | |
| enough to trigger compilation. In later chapters we will teach our JIT to defer
 | |
| compilation of functions until they're actually called.  To add our Module we
 | |
| must first wrap it in a ThreadSafeModule instance, which manages the lifetime of
 | |
| the Module's LLVMContext (our Ctx member) in a thread-friendly way. In our
 | |
| example, all modules will share the Ctx member, which will exist for the
 | |
| duration of the JIT. Once we switch to concurrent compilation in later chapters
 | |
| we will use a new context per module.
 | |
| 
 | |
| Our last method is ``lookup``, which allows us to look up addresses for
 | |
| function and variable definitions added to the JIT based on their symbol names.
 | |
| As noted above, lookup will implicitly trigger compilation for any symbol
 | |
| that has not already been compiled. Our lookup method calls through to
 | |
| `ExecutionSession::lookup`, passing in a list of dylibs to search (in our case
 | |
| just the main dylib), and the symbol name to search for, with a twist: We have
 | |
| to *mangle* the name of the symbol we're searching for first. The ORC JIT
 | |
| components use mangled symbols internally the same way a static compiler and
 | |
| linker would, rather than using plain IR symbol names. This allows JIT'd code
 | |
| to interoperate easily with precompiled code in the application or shared
 | |
| libraries. The kind of mangling will depend on the DataLayout, which in turn
 | |
| depends on the target platform. To allow us to remain portable and search based
 | |
| on the un-mangled name, we just re-produce this mangling ourselves using our
 | |
| ``Mangle`` member function object.
 | |
| 
 | |
| This brings us to the end of Chapter 1 of Building a JIT. You now have a basic
 | |
| but fully functioning JIT stack that you can use to take LLVM IR and make it
 | |
| executable within the context of your JIT process. In the next chapter we'll
 | |
| look at how to extend this JIT to produce better quality code, and in the
 | |
| process take a deeper look at the ORC layer concept.
 | |
| 
 | |
| `Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_
 | |
| 
 | |
| Full Code Listing
 | |
| =================
 | |
| 
 | |
| Here is the complete code listing for our running example. To build this
 | |
| example, use:
 | |
| 
 | |
| .. code-block:: bash
 | |
| 
 | |
|     # Compile
 | |
|     clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
 | |
|     # Run
 | |
|     ./toy
 | |
| 
 | |
| Here is the code:
 | |
| 
 | |
| .. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h
 | |
|    :language: c++
 | |
| 
 | |
| .. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a
 | |
|        simplifying assumption: symbols cannot be re-defined. This will make it
 | |
|        impossible to re-define symbols in the REPL, but will make our symbol
 | |
|        lookup logic simpler. Re-introducing support for symbol redefinition is
 | |
|        left as an exercise for the reader. (The KaleidoscopeJIT.h used in the
 | |
|        original tutorials will be a helpful reference).
 | |
| 
 | |
| .. [2] +-----------------------------+-----------------------------------------------+
 | |
|        |         File                |               Reason for inclusion            |
 | |
|        +=============================+===============================================+
 | |
|        |        JITSymbol.h          | Defines the lookup result type                |
 | |
|        |                             | JITEvaluatedSymbol                            |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
|        |       CompileUtils.h        | Provides the SimpleCompiler class.            |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
|        |           Core.h            | Core utilities such as ExecutionSession and   |
 | |
|        |                             | JITDylib.                                     |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
|        |      ExecutionUtils.h       | Provides the DynamicLibrarySearchGenerator    |
 | |
|        |                             | class.                                        |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
|        |      IRCompileLayer.h       | Provides the IRCompileLayer class.            |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
|        |  JITTargetMachineBuilder.h  | Provides the JITTargetMachineBuilder class.   |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
|        | RTDyldObjectLinkingLayer.h  | Provides the RTDyldObjectLinkingLayer class.  |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
|        |   SectionMemoryManager.h    | Provides the SectionMemoryManager class.      |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
|        |        DataLayout.h         | Provides the DataLayout class.                |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
|        |        LLVMContext.h        | Provides the LLVMContext class.               |
 | |
|        +-----------------------------+-----------------------------------------------+
 | |
| 
 | |
| .. [3] See the ErrorHandling section in the LLVM Programmer's Manual
 | |
|        (http://llvm.org/docs/ProgrammersManual.html#error-handling)
 |