============================================================
Extending LLVM: Adding instructions, intrinsics, types, etc.
============================================================

Introduction and Warning
========================

During the course of using LLVM, you may wish to customize it for your research
project or for experimentation. At this point, you may realize that you need to
add something to LLVM, whether it be a new fundamental type, a new intrinsic
function, or a whole new instruction.

When you come to this realization, stop and think. Do you really need to extend
LLVM? Is it a new fundamental capability that LLVM does not support in its
current incarnation, or can it be synthesized from existing LLVM elements? If
you are not sure, ask on the `LLVM-dev
<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ list. The reason is that
extending LLVM is an involved process: you need to update every pass that you
intend to use with your extension, and there are *many* LLVM analyses and
transformations, so it may be quite a bit of work.

Adding an `intrinsic function`_ is far easier than adding an
instruction, and is transparent to optimization passes.  If your added
functionality can be expressed as a function call, an intrinsic function is the
method of choice for LLVM extension.

Before you invest a significant amount of effort into a non-trivial extension,
**ask on the list** if what you are looking to do can be done with
already-existing infrastructure, or if maybe someone else is already working on
it. You will save yourself a lot of time and effort by doing so.

.. _intrinsic function:

Adding a new intrinsic function
===============================

Adding a new intrinsic function to LLVM is much easier than adding a new
instruction.  Almost all extensions to LLVM should start as an intrinsic
function and then be turned into an instruction if warranted.

#. ``llvm/docs/LangRef.html``:

   Document the intrinsic.  Decide whether it is code-generator-specific and
   what the restrictions are.  Talk to other people about it so that you are
   sure it's a good idea.

#. ``llvm/include/llvm/IR/Intrinsics*.td``:

   Add an entry for your intrinsic.  Describe its memory access characteristics
   for optimization (this controls whether it will be DCE'd, CSE'd, etc.). Note
   that any intrinsic using one of the ``llvm_any*_ty`` types for an argument or
   return type will be treated by ``tblgen`` as overloaded, and the corresponding
   suffix will be required on the intrinsic's name.

#. ``llvm/lib/Analysis/ConstantFolding.cpp``:

   If it is possible to constant fold your intrinsic, add support for it in the
   ``canConstantFoldCallTo`` and ``ConstantFoldCall`` functions.

#. ``llvm/test/*``:

   Add test cases for your intrinsic to the test suite.
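
To give a feel for the ``ConstantFolding.cpp`` step, here is a minimal sketch in plain C++ (no LLVM headers) of the split between a cheap "can this call be folded?" query and the fold itself. Everything in it is hypothetical: the intrinsic names ``llvm.myext.popcount.i32`` and ``llvm.myext.bswap.i16`` (whose type endings imitate the suffix required on overloaded intrinsics), the ``MyIntrinsic`` enum, and the functions ``canFoldCallTo`` and ``foldCall``, which only stand in for ``canConstantFoldCallTo`` and ``ConstantFoldCall``. The real functions work on LLVM IR calls and constants, not on strings and integers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical IDs for two made-up intrinsics. In LLVM, intrinsic IDs are
// generated by tblgen from the Intrinsics*.td entries.
enum class MyIntrinsic { NotIntrinsic, Popcount32, ByteSwap16 };

MyIntrinsic lookupIntrinsic(const std::string &Name) {
  // The ".i32"/".i16" endings imitate the type suffix that overloaded
  // intrinsics carry in their names.
  if (Name == "llvm.myext.popcount.i32") return MyIntrinsic::Popcount32;
  if (Name == "llvm.myext.bswap.i16") return MyIntrinsic::ByteSwap16;
  return MyIntrinsic::NotIntrinsic;
}

// Cheap yes/no query, playing the role of canConstantFoldCallTo.
bool canFoldCallTo(const std::string &Name) {
  return lookupIntrinsic(Name) != MyIntrinsic::NotIntrinsic;
}

// The fold itself, playing the role of ConstantFoldCall. Callers are expected
// to ask canFoldCallTo first. Returns std::nullopt when the operands cannot be
// folded (in this sketch: anything other than exactly one operand).
std::optional<std::uint32_t> foldCall(const std::string &Name,
                                      const std::vector<std::uint32_t> &Ops) {
  assert(canFoldCallTo(Name) && "ask canFoldCallTo before folding");
  if (Ops.size() != 1) return std::nullopt;
  std::uint32_t X = Ops[0];
  switch (lookupIntrinsic(Name)) {
  case MyIntrinsic::Popcount32: {
    std::uint32_t Count = 0;
    for (; X != 0; X &= X - 1) // each iteration clears the lowest set bit
      ++Count;
    return Count;
  }
  case MyIntrinsic::ByteSwap16:
    return ((X & 0xFFu) << 8) | ((X >> 8) & 0xFFu);
  case MyIntrinsic::NotIntrinsic:
    break;
  }
  return std::nullopt;
}
```

For example, folding ``llvm.myext.bswap.i16`` on the constant ``0x1234`` yields ``0x3412``, while a call with the wrong number of operands is left unfolded.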

Once the intrinsic has been added to the system, you must add code generator
support for it.  Generally you must do the following steps:

#. Add support to the ``.td`` file for the target(s) of your choice in
   ``lib/Target/*/*.td``.

   This is usually a matter of adding a pattern to the ``.td`` file that matches
   the intrinsic, though it may also require adding the instructions you want to
   generate.  There are lots of examples in the PowerPC and X86 backends to
   follow.
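
In a real backend this step is written as a TableGen pattern, and generated code applies it during instruction selection. The idea can be pictured as a lookup from what was matched to the instruction to emit; the sketch below is a drastically simplified model of that idea only, not of how TableGen-generated selectors actually work. The intrinsic names and the mnemonics ``MYPOPCNT32rr`` and ``MYBSWAP16rr`` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Each entry plays the role of one selection pattern in a target .td file:
// "when this intrinsic is matched, emit this target instruction".
// The intrinsic names and instruction mnemonics are hypothetical.
const std::map<std::string, std::string> &selectionPatterns() {
  static const std::map<std::string, std::string> Patterns = {
      {"llvm.myext.popcount.i32", "MYPOPCNT32rr"},
      {"llvm.myext.bswap.i16", "MYBSWAP16rr"},
  };
  return Patterns;
}

// Returns the instruction to emit, or std::nullopt when no pattern matches.
std::optional<std::string> selectIntrinsic(const std::string &Name) {
  const auto &Patterns = selectionPatterns();
  auto It = Patterns.find(Name);
  if (It == Patterns.end())
    return std::nullopt;
  return It->second;
}
```

A lookup miss corresponds to the situation where no pattern covers the intrinsic; in LLVM that typically ends in a "Cannot select" error during instruction selection.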

Adding a new SelectionDAG node
==============================

As with intrinsics, adding a new SelectionDAG node to LLVM is much easier than
adding a new instruction.  New nodes are often added to help represent
instructions common to many targets.  These nodes often map to an LLVM
instruction (add, sub) or intrinsic (byteswap, population count).  In other
cases, new nodes have been added to allow many targets to perform a common task
(converting between floating-point and integer representation) or capture more
complicated behavior in a single node (rotate).
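
As an illustration of why a single node such as rotate is useful, the sketch below compares a rotate-left performed as one operation with the same result built from shifts, masks, and an OR, which is roughly what a target without a rotate instruction has to emit instead. It is plain C++ on 32-bit values with hypothetical function names, not LLVM's own expansion code.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left as a single operation, the way a target with a native rotate
// instruction would perform it. The amount is taken modulo the bit width.
std::uint32_t rotl32(std::uint32_t X, unsigned Amt) {
  Amt &= 31;
  if (Amt == 0)
    return X; // shifting a 32-bit value by 32 would be undefined
  return (X << Amt) | (X >> (32 - Amt));
}

// The same result built only from shifts, masks, a subtraction and an OR,
// roughly what a target without a rotate instruction must do instead.
// Masking both amounts keeps every shift below 32, so no branch is needed.
std::uint32_t rotl32Expanded(std::uint32_t X, unsigned Amt) {
  unsigned Left = Amt & 31;
  unsigned Right = (32 - Left) & 31; // becomes 0 when Left is 0
  return (X << Left) | (X >> Right);
}
```

Both functions agree for every shift amount, which is the property any expansion of a rotate node has to preserve.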

#. ``include/llvm/CodeGen/ISDOpcodes.h``:

   Add an enum value for the new SelectionDAG node.

#. ``lib/CodeGen/SelectionDAG/SelectionDAG.cpp``:

   Add code to print the node to ``getOperationName``.  If your new node can be
   evaluated at compile time when given constant arguments (such as an add of a
   constant with another constant), find the ``getNode`` method that takes the
   appropriate number of arguments, and add a case for your node to the switch
   statement that performs constant folding for nodes that take the same number
   of arguments as your new node.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   Add code to `legalize, promote, and expand
   <CodeGenerator.html#selectiondag_legalize>`_ the node as necessary.  At a
   minimum, you will need to add a case statement for your node in
   ``LegalizeOp`` which calls ``LegalizeOp`` on the node's operands, and returns
   a new node if any of the operands changed as a result of being legalized.  It
   is likely that not all targets supported by the SelectionDAG framework will
   natively support the new node.  In this case, you must also add code in your
   node's case statement in ``LegalizeOp`` to Expand your node into simpler,
   legal operations.  The case for ``ISD::UREM`` for expanding a remainder into
   a divide, a multiply, and a subtract is a good example.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   If targets may support the new node being added only at certain sizes, you
   will also need to add code to your node's case statement in ``LegalizeOp``
   to Promote your node's operands to a larger size, and perform the correct
   operation.  You will also need to add code to ``PromoteOp`` to do this.  For
   a good example, see ``ISD::BSWAP``, which promotes its operand to a wider
   size, performs the byteswap, and then shifts the correct bytes right to
   emulate the narrower byteswap in the wider type.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   Add a case for your node in ``ExpandOp`` to teach the legalizer how to
   perform the action represented by the new node on a value that has been split
   into high and low halves.  This case will be used to support your node with a
   64-bit operand on a 32-bit target.

#. ``lib/CodeGen/SelectionDAG/DAGCombiner.cpp``:

   If your node can be combined with itself, or other existing nodes in a
   peephole-like fashion, add a visit function for it, and make sure the
   combiner calls that function when it visits your node. There are several good
   examples for simple combines you can do; ``visitFABS`` and ``visitSRL`` are
   good starting places.

#. ``lib/Target/PowerPC/PPCISelLowering.cpp``:

   Each target has an implementation of the ``TargetLowering`` class, usually in
   its own file (although some targets include it in the same file as the
   DAGToDAGISel).  The default behavior for a target is to assume that your new
   node is legal for all types that are legal for that target.  If this target
   does not natively support your node, then tell the target to either Promote
   it (if it is supported at a larger type) or Expand it.  This will cause the
   code you wrote in ``LegalizeOp`` above to decompose your new node into other
   legal nodes for this target.

#. ``lib/Target/TargetSelectionDAG.td``:

   Most current targets supported by LLVM generate code using the DAGToDAG
   method, where SelectionDAG nodes are pattern-matched to target-specific
   nodes, which represent individual instructions.  In order for the targets to
   match an instruction to your new node, you must add a def for that node to
   the list in this file, with the appropriate type constraints. Look at
   ``add``, ``bswap``, and ``fadd`` for examples.

#. ``lib/Target/PowerPC/PPCInstrInfo.td``:

   Each target has a TableGen file that describes the target's instruction set.
   For targets that use the DAGToDAG instruction selection framework, add a
   pattern for your new node that uses one or more target nodes.  Documentation
   for this is a bit sparse right now, but there are several decent examples.
   See the patterns for ``rotl`` in ``PPCInstrInfo.td``.

#. TODO: document complex patterns.

#. ``llvm/test/CodeGen/*``:

   Add test cases for your new node to the test suite.
   ``llvm/test/CodeGen/X86/bswap.ll`` is a good example.
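
The Expand, Promote, and split-into-halves strategies described in the legalization steps can be checked outside LLVM by computing what the rewritten operations would produce. The sketch below does that in plain C++ for the ``ISD::UREM`` and ``ISD::BSWAP`` examples, assuming a hypothetical target whose widest legal integer type is 32 bits and which has a native 32-bit byteswap. It does not build or rewrite any SelectionDAG nodes, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed legal 32-bit byteswap on the hypothetical target.
std::uint32_t bswap32(std::uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) | ((X << 8) & 0x00FF0000u) |
         (X << 24);
}

// Expand: a remainder rewritten as divide, multiply and subtract, the
// decomposition described for ISD::UREM.
std::uint32_t uremExpanded(std::uint32_t A, std::uint32_t B) {
  assert(B != 0 && "remainder by zero is undefined");
  std::uint32_t Quot = A / B;
  return A - Quot * B;
}

// Promote: a 16-bit byteswap done in a 32-bit register. After the 32-bit
// swap the two interesting bytes sit in the top half, so shift right by 16.
std::uint16_t bswap16Promoted(std::uint16_t X) {
  std::uint32_t Wide = X; // zero-extend to 32 bits
  return static_cast<std::uint16_t>(bswap32(Wide) >> 16);
}

// Split into halves: a 64-bit byteswap when only 32-bit operations are
// legal. Swap each half, then exchange the halves.
std::uint64_t bswap64Split(std::uint64_t X) {
  std::uint32_t Lo = static_cast<std::uint32_t>(X);
  std::uint32_t Hi = static_cast<std::uint32_t>(X >> 32);
  std::uint64_t NewHi = bswap32(Lo);
  std::uint64_t NewLo = bswap32(Hi);
  return (NewHi << 32) | NewLo;
}
```

Each function must agree with the operation it replaces for every input; for instance, ``bswap64Split(0x0102030405060708)`` returns ``0x0807060504030201``.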
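
The Legal/Promote/Expand choice in the ``TargetLowering`` step amounts to a table keyed by operation and value type, with Legal as the default. In LLVM, targets fill in that table from their ``TargetLowering`` constructor with calls such as ``setOperationAction``. The class, enums, and methods below are simplified, hypothetical stand-ins that only show the shape of the lookup.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-ins for ISD opcodes, value types and legalize actions.
enum class Opcode { Add, BSwap, MyNewNode };
enum class ValueType { i16, i32, i64 };
enum class Action { Legal, Promote, Expand };

class ToyTargetLowering {
  std::map<std::pair<Opcode, ValueType>, Action> Actions;

public:
  // Record a non-default action, in the spirit of setOperationAction.
  void setAction(Opcode Op, ValueType VT, Action A) { Actions[{Op, VT}] = A; }

  // Anything not configured is assumed Legal, matching the default
  // behavior described for targets.
  Action getAction(Opcode Op, ValueType VT) const {
    auto It = Actions.find({Op, VT});
    return It == Actions.end() ? Action::Legal : It->second;
  }
};

// What a hypothetical 32-bit target might configure for the new node.
ToyTargetLowering makeToyTarget() {
  ToyTargetLowering TLI;
  TLI.setAction(Opcode::MyNewNode, ValueType::i16, Action::Promote); // use i32
  TLI.setAction(Opcode::MyNewNode, ValueType::i64, Action::Expand);  // split
  return TLI;
}
```

With this configuration, the hypothetical new node is legal at i32, promoted at i16, and expanded at i64, while every opcode that was never configured stays Legal.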

Adding a new instruction
========================

.. warning::

  Adding instructions changes the bitcode format, and it will take some effort
  to maintain compatibility with the previous version. Only add an instruction
  if it is absolutely necessary.

#. ``llvm/include/llvm/IR/Instruction.def``:

   Add a number for your instruction and an enum name.

#. ``llvm/include/llvm/IR/Instructions.h``:

   Add a definition for the class that will represent your instruction.

#. ``llvm/include/llvm/IR/InstVisitor.h``:

   Add a prototype for a visitor to your new instruction type.

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   Add a new token to parse your instruction from textual assembly.

#. ``llvm/lib/AsmParser/LLParser.cpp``:

   Add the grammar describing how your instruction is read and what it
   constructs as a result.

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   Add a case for your instruction describing how it is parsed from bitcode.

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   Add a case for your instruction describing how it is written to bitcode.

#. ``llvm/lib/IR/Instruction.cpp``:

   Add a case for how your instruction will be printed out to assembly.

#. ``llvm/lib/IR/Instructions.cpp``:

   Implement the class you defined in ``llvm/include/llvm/IR/Instructions.h``.

#. Test your instruction.

#. ``llvm/lib/Target/*``:

   Add support for your instruction to code generators, or add a lowering pass.

#. ``llvm/test/*``:

   Add your test cases to the test suite.
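
The ``BitcodeReader.cpp`` and ``BitcodeWriter.cpp`` cases must stay in lockstep: whatever the writer emits for the new instruction, the reader has to decode back into the same instruction. The sketch below shows that symmetry with a record modeled as a plain vector of integers. The record code ``FUNC_CODE_INST_MYINST``, its value, and the ``MyInst`` struct are hypothetical; the real reader and writer operate on LLVM's bitstream records, not on ``std::vector``.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record code for the new instruction. Once files using it
// exist, changing the value would make them unreadable.
constexpr std::uint64_t FUNC_CODE_INST_MYINST = 1000;

// Hypothetical in-memory form of the new instruction: two operand IDs.
struct MyInst {
  std::uint64_t LHS;
  std::uint64_t RHS;
};

// Writer side: flatten the instruction into a record (code, then operands).
std::vector<std::uint64_t> writeMyInst(const MyInst &I) {
  return {FUNC_CODE_INST_MYINST, I.LHS, I.RHS};
}

// Reader side: the matching case. It accepts exactly what the writer emits
// and rejects malformed or unknown records rather than guessing.
std::optional<MyInst> readRecord(const std::vector<std::uint64_t> &Record) {
  if (Record.empty())
    return std::nullopt;
  switch (Record[0]) {
  case FUNC_CODE_INST_MYINST:
    if (Record.size() != 3)
      return std::nullopt; // wrong number of operands
    return MyInst{Record[1], Record[2]};
  default:
    return std::nullopt; // not a record this sketch understands
  }
}
```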

Also, you need to implement (or modify) any analyses or passes that you want to
understand this new instruction.
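
To see why passes have to learn about a new instruction, it helps to look at how LLVM describes its instruction list once and expands it in several places. ``Instruction.def`` is an X-macro file: each user defines ``HANDLE_*`` macros and includes the file to generate enums, names, or visitor dispatch. The self-contained sketch below imitates that technique with a hypothetical ``FOR_EACH_INST`` list containing a new ``MyInst`` entry; the macro names, ``dispatch``, and the ``IsCommutative`` visitor are illustrative only and do not match ``Instruction.def`` or ``InstVisitor`` exactly.

```cpp
#include <cassert>
#include <string>

// One master list of instructions, in the spirit of Instruction.def. Each
// use site passes in a macro that decides what to generate per entry.
// "MyInst" is the hypothetical new instruction.
#define FOR_EACH_INST(HANDLE_INST)                                             \
  HANDLE_INST(1, Add)                                                          \
  HANDLE_INST(2, Sub)                                                          \
  HANDLE_INST(3, MyInst)

// Use 1: the opcode enum.
enum class Opcode {
#define MAKE_ENUM(Num, Name) Name = Num,
  FOR_EACH_INST(MAKE_ENUM)
#undef MAKE_ENUM
};

// Use 2: a printable name for each opcode.
const char *getOpcodeName(Opcode Op) {
  switch (Op) {
#define MAKE_CASE(Num, Name)                                                   \
  case Opcode::Name:                                                           \
    return #Name;
    FOR_EACH_INST(MAKE_CASE)
#undef MAKE_CASE
  }
  return "<invalid>";
}

// Use 3: visitor dispatch. Every opcode is routed to visit<Name>(), so a
// visitor type must provide a method for each listed instruction.
template <typename VisitorT> bool dispatch(VisitorT &V, Opcode Op) {
  switch (Op) {
#define MAKE_VISIT(Num, Name)                                                  \
  case Opcode::Name:                                                           \
    return V.visit##Name();
    FOR_EACH_INST(MAKE_VISIT)
#undef MAKE_VISIT
  }
  return false;
}

// A tiny "analysis" written as a visitor; it had to be taught about MyInst.
struct IsCommutative {
  bool visitAdd() { return true; }
  bool visitSub() { return false; }
  bool visitMyInst() { return false; } // hypothetical semantics
};
```

In this sketch, a visitor that lacks ``visitMyInst`` fails to compile. The real ``InstVisitor`` instead supplies default visit methods that delegate to more general ones, so a pass that has not been updated still compiles and quietly takes the generic path, which is one more reason to review the passes you care about.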

Adding a new type
=================

.. warning::

  Adding new types changes the bitcode format, and will break compatibility with
  currently-existing LLVM installations. Only add new types if it is absolutely
  necessary.

Adding a fundamental type
-------------------------

#. ``llvm/include/llvm/IR/Type.h``:

   Add an enumerator for the new type; add a static ``Type*`` for this type.

#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/IR/ValueTypes.cpp``:

   Add a mapping from ``TypeID`` => ``Type*``; initialize the static ``Type*``.

#. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``:

   Add an ``LLVMTypeKind`` enumerator and modify
   ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type.

#. ``llvm/include/llvm/IR/TypeBuilder.h``:

   Add a new class to represent the new type in the hierarchy.

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   Add a token for the new type so that it is recognized in textual assembly.

#. ``llvm/lib/AsmParser/LLParser.cpp``:

   Add the ability to parse the type from textual assembly.

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   Modify ``static void WriteTypeTable(const ValueEnumerator &VE,
   BitstreamWriter &Stream)`` to serialize your type.

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   Modify ``bool BitcodeReader::ParseTypeType()`` to read your data type.

#. ``include/llvm/Bitcode/LLVMBitCodes.h``:

   Add a ``TypeCodes`` enumerator for the new type.
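
The pieces touched when adding a fundamental type (an enumerator, a shared ``Type`` instance, a textual keyword, and a C API kind) can be seen together in one reduced model. In the sketch below, ``MyFloat8TyID``, the ``myfloat8`` keyword, ``ToyTypeKind``, and all function names are hypothetical. Real LLVM keeps one instance of each fundamental type per ``LLVMContext`` rather than in function-local statics, but, as here, types are compared by pointer.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Reduced, hypothetical TypeID enum; MyFloat8TyID stands for the new type.
enum class TypeID { VoidTyID, FloatTyID, DoubleTyID, MyFloat8TyID };

class Type {
  TypeID ID;
  explicit Type(TypeID TheID) : ID(TheID) {}

public:
  TypeID getTypeID() const { return ID; }

  // One shared instance per fundamental type, so types compare by pointer.
  static Type *get(TypeID Kind) {
    static Type Void(TypeID::VoidTyID), Float(TypeID::FloatTyID),
        Double(TypeID::DoubleTyID), MyFloat8(TypeID::MyFloat8TyID);
    switch (Kind) {
    case TypeID::VoidTyID:     return &Void;
    case TypeID::FloatTyID:    return &Float;
    case TypeID::DoubleTyID:   return &Double;
    case TypeID::MyFloat8TyID: return &MyFloat8;
    }
    return nullptr;
  }
};

// Printer side: the keyword used for each type in textual IR.
std::string typeKeyword(TypeID ID) {
  switch (ID) {
  case TypeID::VoidTyID:     return "void";
  case TypeID::FloatTyID:    return "float";
  case TypeID::DoubleTyID:   return "double";
  case TypeID::MyFloat8TyID: return "myfloat8"; // hypothetical keyword
  }
  return "";
}

// Lexer side: map a keyword back to its Type*, or nullptr if unknown.
Type *lexTypeKeyword(const std::string &Word) {
  for (TypeID ID : {TypeID::VoidTyID, TypeID::FloatTyID, TypeID::DoubleTyID,
                    TypeID::MyFloat8TyID})
    if (typeKeyword(ID) == Word)
      return Type::get(ID);
  return nullptr;
}

// C API mirror, in the spirit of LLVMTypeKind and LLVMGetTypeKind.
enum ToyTypeKind {
  ToyVoidTypeKind,
  ToyFloatTypeKind,
  ToyDoubleTypeKind,
  ToyMyFloat8TypeKind
};

ToyTypeKind getTypeKind(const Type *Ty) {
  switch (Ty->getTypeID()) {
  case TypeID::VoidTyID:     return ToyVoidTypeKind;
  case TypeID::FloatTyID:    return ToyFloatTypeKind;
  case TypeID::DoubleTyID:   return ToyDoubleTypeKind;
  case TypeID::MyFloat8TyID: return ToyMyFloat8TypeKind;
  }
  return ToyVoidTypeKind;
}
```

Keeping the printer and lexer mappings in sync is what lets a type written out as text be read back as the very same ``Type*``.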

Adding a derived type
---------------------

#. ``llvm/include/llvm/IR/Type.h``:

   Add an enumerator for the new type; also add a forward declaration of the
   type.

#. ``llvm/include/llvm/IR/DerivedTypes.h``:

   Add a new class to represent the new type in the hierarchy; add a forward
   declaration to the TypeMap value type.

#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/IR/ValueTypes.cpp``:

   Add support for the derived type, notably the ``enum TypeID`` entry and the
   ``is`` and ``get`` methods.

#. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``:

   Add an ``LLVMTypeKind`` enumerator and modify
   ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type.

#. ``llvm/include/llvm/IR/TypeBuilder.h``:

   Add a new class to represent the new type in the hierarchy.

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   Modify ``lltok::Kind LLLexer::LexIdentifier()`` to add the ability to
   parse the type from textual assembly.

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   Modify ``static void WriteTypeTable(const ValueEnumerator &VE,
   BitstreamWriter &Stream)`` to serialize your type.

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   Modify ``bool BitcodeReader::ParseTypeType()`` to read your data type.

#. ``include/llvm/Bitcode/LLVMBitCodes.h``:

   Add a ``TypeCodes`` enumerator for the new type.

#. ``llvm/lib/IR/AsmWriter.cpp``:

   Modify ``void TypePrinting::print(Type *Ty, raw_ostream &OS)``
   to output the new derived type.
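
A derived type differs from a fundamental one mainly in that it is parameterized by other types, which affects uniquing, printing, and serialization. The sketch below models a hypothetical ``mygroup<N x T>`` type: ``getMyGroupTy`` returns one shared object per (element type, count) pair, ``printType`` plays the role of ``TypePrinting::print``, and the type table stores each derived type as a record that refers to its element type by index, so the element is written first. The record codes, the names, and the ``mygroup<...>`` syntax are all assumptions of this sketch; the real bitcode type table uses its own codes from ``LLVMBitCodes.h``.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Reduced, hypothetical type model. MyGroupTyID is a derived type holding
// NumElts copies of an element type, written "mygroup<N x T>".
enum class TypeID { Int32TyID, MyGroupTyID };

struct Type {
  TypeID ID;
  const Type *Elt = nullptr; // only meaningful for MyGroupTyID
  unsigned NumElts = 0;      // only meaningful for MyGroupTyID
  bool isMyGroupTy() const { return ID == TypeID::MyGroupTyID; }
};

// Owns and uniques types, loosely like an LLVMContext: asking twice for the
// same derived type returns the same object.
class TypeContext {
  Type Int32{TypeID::Int32TyID};
  std::map<const Type *, std::map<unsigned, std::unique_ptr<Type>>> Groups;

public:
  const Type *getInt32Ty() const { return &Int32; }
  const Type *getMyGroupTy(const Type *Elt, unsigned N) {
    std::unique_ptr<Type> &Slot = Groups[Elt][N];
    if (!Slot)
      Slot.reset(new Type{TypeID::MyGroupTyID, Elt, N});
    return Slot.get();
  }
};

// Printer side, in the spirit of TypePrinting::print.
std::string printType(const Type *T) {
  if (!T->isMyGroupTy())
    return "i32";
  return "mygroup<" + std::to_string(T->NumElts) + " x " + printType(T->Elt) +
         ">";
}

// Writer side: one record per type; hypothetical codes 1 = i32 and
// 2 = mygroup [NumElts, EltIndex]. Elements are emitted before their users.
struct TypeTable {
  std::vector<std::vector<unsigned>> Records;
  std::map<const Type *, unsigned> Index;

  unsigned add(const Type *T) {
    auto It = Index.find(T);
    if (It != Index.end())
      return It->second;
    std::vector<unsigned> Rec = {1}; // i32
    if (T->isMyGroupTy())
      Rec = {2, T->NumElts, add(T->Elt)}; // element record comes first
    Records.push_back(Rec);
    unsigned Idx = static_cast<unsigned>(Records.size() - 1);
    Index[T] = Idx;
    return Idx;
  }
};

// Reader side: rebuild the types in order. A record may only refer to an
// earlier index; anything else is treated as a malformed table.
std::vector<const Type *> readTypeTable(TypeContext &Ctx, const TypeTable &Tab) {
  std::vector<const Type *> Types;
  for (const std::vector<unsigned> &Rec : Tab.Records) {
    if (Rec.size() == 1 && Rec[0] == 1)
      Types.push_back(Ctx.getInt32Ty());
    else if (Rec.size() == 3 && Rec[0] == 2 && Rec[2] < Types.size())
      Types.push_back(Ctx.getMyGroupTy(Types[Rec[2]], Rec[1]));
    else
      return {}; // unknown code or forward reference
  }
  return Types;
}
```

Writing ``mygroup<2 x mygroup<4 x i32>>`` produces three records (``i32``, then the inner group, then the outer group), and reading them back in that order reconstructs the same printed type.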