494 lines
		
	
	
		
			14 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
			
		
		
	
	
			494 lines
		
	
	
		
			14 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
========================================
 | 
						|
Machine IR (MIR) Format Reference Manual
 | 
						|
========================================
 | 
						|
 | 
						|
.. contents::
 | 
						|
   :local:
 | 
						|
 | 
						|
.. warning::
 | 
						|
  This is a work in progress.
 | 
						|
 | 
						|
Introduction
 | 
						|
============
 | 
						|
 | 
						|
This document is a reference manual for the Machine IR (MIR) serialization
 | 
						|
format. MIR is a human readable serialization format that is used to represent
 | 
						|
LLVM's :ref:`machine specific intermediate representation
 | 
						|
<machine code representation>`.
 | 
						|
 | 
						|
The MIR serialization format is designed to be used for testing the code
 | 
						|
generation passes in LLVM.
 | 
						|
 | 
						|
Overview
 | 
						|
========
 | 
						|
 | 
						|
The MIR serialization format uses a YAML container. YAML is a standard
 | 
						|
data serialization language, and the full YAML language spec can be read at
 | 
						|
`yaml.org
 | 
						|
<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_.
 | 
						|
 | 
						|
A MIR file is split up into a series of `YAML documents`_. The first document
 | 
						|
can contain an optional embedded LLVM IR module, and the rest of the documents
 | 
						|
contain the serialized machine functions.
 | 
						|
 | 
						|
.. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132
 | 
						|
 | 
						|
MIR Testing Guide
 | 
						|
=================
 | 
						|
 | 
						|
You can use the MIR format for testing in two different ways:
 | 
						|
 | 
						|
- You can write MIR tests that invoke a single code generation pass using the
 | 
						|
  ``run-pass`` option in llc.
 | 
						|
 | 
						|
- You can use llc's ``stop-after`` option with existing or new LLVM assembly
 | 
						|
  tests and check the MIR output of a specific code generation pass.
 | 
						|
 | 
						|
Testing Individual Code Generation Passes
 | 
						|
-----------------------------------------
 | 
						|
 | 
						|
The ``run-pass`` option in llc allows you to create MIR tests that invoke
 | 
						|
just a single code generation pass. When this option is used, llc will parse
 | 
						|
an input MIR file, run the specified code generation pass, and print the
 | 
						|
resulting MIR to the standard output stream.
 | 
						|
 | 
						|
You can generate an input MIR file for the test by using the ``stop-after``
 | 
						|
option in llc. For example, if you would like to write a test for the
 | 
						|
post register allocation pseudo instruction expansion pass, you can specify
 | 
						|
the machine copy propagation pass in the ``stop-after`` option, as it runs
 | 
						|
just before the pass that we are trying to test:
 | 
						|
 | 
						|
   ``llc -stop-after machine-cp bug-trigger.ll > test.mir``
 | 
						|
 | 
						|
After generating the input MIR file, you'll have to add a run line that uses
 | 
						|
the ``-run-pass`` option to it. In order to test the post register allocation
 | 
						|
pseudo instruction expansion pass on X86-64, a run line like the one shown
 | 
						|
below can be used:
 | 
						|
 | 
						|
    ``# RUN: llc -run-pass postrapseudos -march=x86-64 %s -o /dev/null | FileCheck %s``
 | 
						|
 | 
						|
The MIR files are target dependent, so they have to be placed in the target
 | 
						|
specific test directories. They also need to specify a target triple or a
 | 
						|
target architecture either in the run line or in the embedded LLVM IR module.
 | 
						|
 | 
						|
Limitations
 | 
						|
-----------
 | 
						|
 | 
						|
Currently the MIR format has several limitations in terms of which state it
 | 
						|
can serialize:
 | 
						|
 | 
						|
- The target-specific state in the target-specific ``MachineFunctionInfo``
 | 
						|
  subclasses isn't serialized at the moment.
 | 
						|
 | 
						|
- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and
 | 
						|
  SystemZ backends) aren't serialized at the moment.
 | 
						|
 | 
						|
- The ``MCSymbol`` machine operands are only printed, they can't be parsed.
 | 
						|
 | 
						|
- A lot of the state in ``MachineModuleInfo`` isn't serialized - only the CFI
 | 
						|
  instructions and the variable debug information from MMI is serialized right
 | 
						|
  now.
 | 
						|
 | 
						|
These limitations impose restrictions on what you can test with the MIR format.
 | 
						|
For now, tests that would like to test some behaviour that depends on the state
 | 
						|
of certain ``MCSymbol``  operands or the exception handling state in MMI, can't
 | 
						|
use the MIR format. As well as that, tests that test some behaviour that
 | 
						|
depends on the state of the target specific ``MachineFunctionInfo`` or
 | 
						|
``MachineConstantPoolValue`` subclasses can't use the MIR format at the moment.
 | 
						|
 | 
						|
High Level Structure
 | 
						|
====================
 | 
						|
 | 
						|
.. _embedded-module:
 | 
						|
 | 
						|
Embedded Module
 | 
						|
---------------
 | 
						|
 | 
						|
When the first YAML document contains a `YAML block literal string`_, the MIR
 | 
						|
parser will treat this string as an LLVM assembly language string that
 | 
						|
represents an embedded LLVM IR module.
 | 
						|
Here is an example of a YAML document that contains an LLVM module:
 | 
						|
 | 
						|
.. code-block:: llvm
 | 
						|
 | 
						|
       define i32 @inc(i32* %x) {
 | 
						|
       entry:
 | 
						|
         %0 = load i32, i32* %x
 | 
						|
         %1 = add i32 %0, 1
 | 
						|
         store i32 %1, i32* %x
 | 
						|
         ret i32 %1
 | 
						|
       }
 | 
						|
 | 
						|
.. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688
 | 
						|
 | 
						|
Machine Functions
 | 
						|
-----------------
 | 
						|
 | 
						|
The remaining YAML documents contain the machine functions. This is an example
 | 
						|
of such YAML document:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
     ---
 | 
						|
     name:            inc
 | 
						|
     tracksRegLiveness: true
 | 
						|
     liveins:
 | 
						|
       - { reg: '%rdi' }
 | 
						|
     body: |
 | 
						|
       bb.0.entry:
 | 
						|
         liveins: %rdi
 | 
						|
 | 
						|
         %eax = MOV32rm %rdi, 1, _, 0, _
 | 
						|
         %eax = INC32r killed %eax, implicit-def dead %eflags
 | 
						|
         MOV32mr killed %rdi, 1, _, 0, _, %eax
 | 
						|
         RETQ %eax
 | 
						|
     ...
 | 
						|
 | 
						|
The document above consists of attributes that represent the various
 | 
						|
properties and data structures in a machine function.
 | 
						|
 | 
						|
The attribute ``name`` is required, and its value should be identical to the
 | 
						|
name of a function that this machine function is based on.
 | 
						|
 | 
						|
The attribute ``body`` is a `YAML block literal string`_. Its value represents
 | 
						|
the function's machine basic blocks and their machine instructions.
 | 
						|
 | 
						|
Machine Instructions Format Reference
 | 
						|
=====================================
 | 
						|
 | 
						|
The machine basic blocks and their instructions are represented using a custom,
 | 
						|
human readable serialization language. This language is used in the
 | 
						|
`YAML block literal string`_ that corresponds to the machine function's body.
 | 
						|
 | 
						|
A source string that uses this language contains a list of machine basic
 | 
						|
blocks, which are described in the section below.
 | 
						|
 | 
						|
Machine Basic Blocks
 | 
						|
--------------------
 | 
						|
 | 
						|
A machine basic block is defined in a single block definition source construct
 | 
						|
that contains the block's ID.
 | 
						|
The example below defines two blocks that have an ID of zero and one:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    bb.0:
 | 
						|
      <instructions>
 | 
						|
    bb.1:
 | 
						|
      <instructions>
 | 
						|
 | 
						|
A machine basic block can also have a name. It should be specified after the ID
 | 
						|
in the block's definition:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    bb.0.entry:       ; This block's name is "entry"
 | 
						|
       <instructions>
 | 
						|
 | 
						|
The block's name should be identical to the name of the IR block that this
 | 
						|
machine block is based on.
 | 
						|
 | 
						|
Block References
 | 
						|
^^^^^^^^^^^^^^^^
 | 
						|
 | 
						|
The machine basic blocks are identified by their ID numbers. Individual
 | 
						|
blocks are referenced using the following syntax:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %bb.<id>[.<name>]
 | 
						|
 | 
						|
Examples:
 | 
						|
 | 
						|
.. code-block:: llvm
 | 
						|
 | 
						|
    %bb.0
 | 
						|
    %bb.1.then
 | 
						|
 | 
						|
Successors
 | 
						|
^^^^^^^^^^
 | 
						|
 | 
						|
The machine basic block's successors have to be specified before any of the
 | 
						|
instructions:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    bb.0.entry:
 | 
						|
      successors: %bb.1.then, %bb.2.else
 | 
						|
      <instructions>
 | 
						|
    bb.1.then:
 | 
						|
      <instructions>
 | 
						|
    bb.2.else:
 | 
						|
      <instructions>
 | 
						|
 | 
						|
The branch weights can be specified in brackets after the successor blocks.
 | 
						|
The example below defines a block that has two successors with branch weights
 | 
						|
of 32 and 16:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    bb.0.entry:
 | 
						|
      successors: %bb.1.then(32), %bb.2.else(16)
 | 
						|
 | 
						|
.. _bb-liveins:
 | 
						|
 | 
						|
Live In Registers
 | 
						|
^^^^^^^^^^^^^^^^^
 | 
						|
 | 
						|
The machine basic block's live in registers have to be specified before any of
 | 
						|
the instructions:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    bb.0.entry:
 | 
						|
      liveins: %edi, %esi
 | 
						|
 | 
						|
The list of live in registers and successors can be empty. The language also
 | 
						|
allows multiple live in register and successor lists - they are combined into
 | 
						|
one list by the parser.
 | 
						|
 | 
						|
Miscellaneous Attributes
 | 
						|
^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						|
 | 
						|
The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be
 | 
						|
specified in brackets after the block's definition:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    bb.0.entry (address-taken):
 | 
						|
      <instructions>
 | 
						|
    bb.2.else (align 4):
 | 
						|
      <instructions>
 | 
						|
    bb.3(landing-pad, align 4):
 | 
						|
      <instructions>
 | 
						|
 | 
						|
.. TODO: Describe the way the reference to an unnamed LLVM IR block can be
 | 
						|
   preserved.
 | 
						|
 | 
						|
Machine Instructions
 | 
						|
--------------------
 | 
						|
 | 
						|
A machine instruction is composed of a name,
 | 
						|
:ref:`machine operands <machine-operands>`,
 | 
						|
:ref:`instruction flags <instruction-flags>`, and machine memory operands.
 | 
						|
 | 
						|
The instruction's name is usually specified before the operands. The example
 | 
						|
below shows an instance of the X86 ``RETQ`` instruction with a single machine
 | 
						|
operand:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    RETQ %eax
 | 
						|
 | 
						|
However, if the machine instruction has one or more explicitly defined register
 | 
						|
operands, the instruction's name has to be specified after them. The example
 | 
						|
below shows an instance of the AArch64 ``LDPXpost`` instruction with three
 | 
						|
defined register operands:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %sp, %fp, %lr = LDPXpost %sp, 2
 | 
						|
 | 
						|
The instruction names are serialized using the exact definitions from the
 | 
						|
target's ``*InstrInfo.td`` files, and they are case sensitive. This means that
 | 
						|
similar instruction names like ``TSTri`` and ``tSTRi`` represent different
 | 
						|
machine instructions.
 | 
						|
 | 
						|
.. _instruction-flags:
 | 
						|
 | 
						|
Instruction Flags
 | 
						|
^^^^^^^^^^^^^^^^^
 | 
						|
 | 
						|
The flag ``frame-setup`` can be specified before the instruction's name:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %fp = frame-setup ADDXri %sp, 0, 0
 | 
						|
 | 
						|
.. _registers:
 | 
						|
 | 
						|
Registers
 | 
						|
---------
 | 
						|
 | 
						|
Registers are one of the key primitives in the machine instructions
 | 
						|
serialization language. They are primarly used in the
 | 
						|
:ref:`register machine operands <register-operands>`,
 | 
						|
but they can also be used in a number of other places, like the
 | 
						|
:ref:`basic block's live in list <bb-liveins>`.
 | 
						|
 | 
						|
The physical registers are identified by their name. They use the following
 | 
						|
syntax:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %<name>
 | 
						|
 | 
						|
The example below shows three X86 physical registers:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %eax
 | 
						|
    %r15
 | 
						|
    %eflags
 | 
						|
 | 
						|
The virtual registers are identified by their ID number. They use the following
 | 
						|
syntax:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %<id>
 | 
						|
 | 
						|
Example:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %0
 | 
						|
 | 
						|
The null registers are represented using an underscore ('``_``'). They can also be
 | 
						|
represented using a '``%noreg``' named register, although the former syntax
 | 
						|
is preferred.
 | 
						|
 | 
						|
.. _machine-operands:
 | 
						|
 | 
						|
Machine Operands
 | 
						|
----------------
 | 
						|
 | 
						|
There are seventeen different kinds of machine operands, and all of them, except
 | 
						|
the ``MCSymbol`` operand, can be serialized. The ``MCSymbol`` operands are
 | 
						|
just printed out - they can't be parsed back yet.
 | 
						|
 | 
						|
Immediate Operands
 | 
						|
^^^^^^^^^^^^^^^^^^
 | 
						|
 | 
						|
The immediate machine operands are untyped, 64-bit signed integers. The
 | 
						|
example below shows an instance of the X86 ``MOV32ri`` instruction that has an
 | 
						|
immediate machine operand ``-42``:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %eax = MOV32ri -42
 | 
						|
 | 
						|
.. TODO: Describe the CIMM (Rare) and FPIMM immediate operands.
 | 
						|
 | 
						|
.. _register-operands:
 | 
						|
 | 
						|
Register Operands
 | 
						|
^^^^^^^^^^^^^^^^^
 | 
						|
 | 
						|
The :ref:`register <registers>` primitive is used to represent the register
 | 
						|
machine operands. The register operands can also have optional
 | 
						|
:ref:`register flags <register-flags>`,
 | 
						|
:ref:`a subregister index <subregister-indices>`,
 | 
						|
and a reference to the tied register operand.
 | 
						|
The full syntax of a register operand is shown below:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ]
 | 
						|
 | 
						|
This example shows an instance of the X86 ``XOR32rr`` instruction that has
 | 
						|
5 register operands with different register flags:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
  dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al
 | 
						|
 | 
						|
.. _register-flags:
 | 
						|
 | 
						|
Register Flags
 | 
						|
~~~~~~~~~~~~~~
 | 
						|
 | 
						|
The table below shows all of the possible register flags along with the
 | 
						|
corresponding internal ``llvm::RegState`` representation:
 | 
						|
 | 
						|
.. list-table::
 | 
						|
   :header-rows: 1
 | 
						|
 | 
						|
   * - Flag
 | 
						|
     - Internal Value
 | 
						|
 | 
						|
   * - ``implicit``
 | 
						|
     - ``RegState::Implicit``
 | 
						|
 | 
						|
   * - ``implicit-def``
 | 
						|
     - ``RegState::ImplicitDefine``
 | 
						|
 | 
						|
   * - ``def``
 | 
						|
     - ``RegState::Define``
 | 
						|
 | 
						|
   * - ``dead``
 | 
						|
     - ``RegState::Dead``
 | 
						|
 | 
						|
   * - ``killed``
 | 
						|
     - ``RegState::Kill``
 | 
						|
 | 
						|
   * - ``undef``
 | 
						|
     - ``RegState::Undef``
 | 
						|
 | 
						|
   * - ``internal``
 | 
						|
     - ``RegState::InternalRead``
 | 
						|
 | 
						|
   * - ``early-clobber``
 | 
						|
     - ``RegState::EarlyClobber``
 | 
						|
 | 
						|
   * - ``debug-use``
 | 
						|
     - ``RegState::Debug``
 | 
						|
 | 
						|
.. _subregister-indices:
 | 
						|
 | 
						|
Subregister Indices
 | 
						|
~~~~~~~~~~~~~~~~~~~
 | 
						|
 | 
						|
The register machine operands can reference a portion of a register by using
 | 
						|
the subregister indices. The example below shows an instance of the ``COPY``
 | 
						|
pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy 8
 | 
						|
lower bits from the 32-bit virtual register 0 to the 8-bit virtual register 1:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %1 = COPY %0:sub_8bit
 | 
						|
 | 
						|
The names of the subregister indices are target specific, and are typically
 | 
						|
defined in the target's ``*RegisterInfo.td`` file.
 | 
						|
 | 
						|
Global Value Operands
 | 
						|
^^^^^^^^^^^^^^^^^^^^^
 | 
						|
 | 
						|
The global value machine operands reference the global values from the
 | 
						|
:ref:`embedded LLVM IR module <embedded-module>`.
 | 
						|
The example below shows an instance of the X86 ``MOV64rm`` instruction that has
 | 
						|
a global value operand named ``G``:
 | 
						|
 | 
						|
.. code-block:: text
 | 
						|
 | 
						|
    %rax = MOV64rm %rip, 1, _, @G, _
 | 
						|
 | 
						|
The named global values are represented using an identifier with the '@' prefix.
 | 
						|
If the identifier doesn't match the regular expression
 | 
						|
`[-a-zA-Z$._][-a-zA-Z$._0-9]*`, then this identifier must be quoted.
 | 
						|
 | 
						|
The unnamed global values are represented using an unsigned numeric value with
 | 
						|
the '@' prefix, like in the following examples: ``@0``, ``@989``.
 | 
						|
 | 
						|
.. TODO: Describe the parsers default behaviour when optional YAML attributes
 | 
						|
   are missing.
 | 
						|
.. TODO: Describe the syntax for the bundled instructions.
 | 
						|
.. TODO: Describe the syntax for virtual register YAML definitions.
 | 
						|
.. TODO: Describe the machine function's YAML flag attributes.
 | 
						|
.. TODO: Describe the syntax for the external symbol and register
 | 
						|
   mask machine operands.
 | 
						|
.. TODO: Describe the frame information YAML mapping.
 | 
						|
.. TODO: Describe the syntax of the stack object machine operands and their
 | 
						|
   YAML definitions.
 | 
						|
.. TODO: Describe the syntax of the constant pool machine operands and their
 | 
						|
   YAML definitions.
 | 
						|
.. TODO: Describe the syntax of the jump table machine operands and their
 | 
						|
   YAML definitions.
 | 
						|
.. TODO: Describe the syntax of the block address machine operands.
 | 
						|
.. TODO: Describe the syntax of the CFI index machine operands.
 | 
						|
.. TODO: Describe the syntax of the metadata machine operands, and the
 | 
						|
   instructions debug location attribute.
 | 
						|
.. TODO: Describe the syntax of the target index machine operands.
 | 
						|
.. TODO: Describe the syntax of the register live out machine operands.
 | 
						|
.. TODO: Describe the syntax of the machine memory operands.
 |