384 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
			
		
		
	
	
			384 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
====================
 | 
						|
Constant Interpreter
 | 
						|
====================
 | 
						|
 | 
						|
.. contents::
 | 
						|
   :local:
 | 
						|
 | 
						|
Introduction
 | 
						|
============
 | 
						|
 | 
						|
The constexpr interpreter aims to replace the existing tree evaluator in
 | 
						|
clang, improving performance on constructs which are executed inefficiently
 | 
						|
by the evaluator. The interpreter is activated using the following flags:
 | 
						|
 | 
						|
* ``-fexperimental-new-constant-interpreter`` enables the interpreter,
 | 
						|
  emitting an error if an unsupported feature is encountered
 | 
						|
 | 
						|
Bytecode Compilation
 | 
						|
====================
 | 
						|
 | 
						|
Bytecode compilation is handled in ``ByteCodeStmtGen.h`` for statements
 | 
						|
and ``ByteCodeExprGen.h`` for expressions. The compiler has two different
 | 
						|
backends: one to generate bytecode for functions (``ByteCodeEmitter``) and
 | 
						|
one to directly evaluate expressions as they are compiled, without
 | 
						|
generating bytecode (``EvalEmitter``). All functions are compiled to
 | 
						|
bytecode, while toplevel expressions used in constant contexts are directly
 | 
						|
evaluated since the bytecode would never be reused. This mechanism aims to
 | 
						|
pave the way towards replacing the evaluator, improving its performance on
 | 
						|
functions and loops, while being just as fast on single-use toplevel
 | 
						|
expressions.
 | 
						|
 | 
						|
The interpreter relies on stack-based, strongly-typed opcodes. The glue
 | 
						|
logic between the code generator, along with the enumeration and
 | 
						|
description of opcodes, can be found in ``Opcodes.td``. The opcodes are
 | 
						|
implemented as generic template methods in ``Interp.h`` and instantiated
 | 
						|
with the relevant primitive types by the interpreter loop or by the
 | 
						|
evaluating emitter.
 | 
						|
 | 
						|
Primitive Types
 | 
						|
---------------
 | 
						|
 | 
						|
* ``PT_{U|S}int{8|16|32|64}``
 | 
						|
 | 
						|
  Signed or unsigned integers of a specific bit width, implemented using
 | 
						|
  the ```Integral``` type.
 | 
						|
 | 
						|
* ``PT_{U|S}intFP``
 | 
						|
 | 
						|
  Signed or unsigned integers of an arbitrary, but fixed width used to
 | 
						|
  implement integral types which are required by the target, but are not
 | 
						|
  supported by the host. Under the hood, they rely on APValue. The
 | 
						|
  ``Integral`` specialisation for these types is required by opcodes to
 | 
						|
  share an implementation with fixed integrals.
 | 
						|
 | 
						|
* ``PT_Bool``
 | 
						|
 | 
						|
  Representation for boolean types, essentially a 1-bit unsigned
 | 
						|
  ``Integral``.
 | 
						|
 | 
						|
* ``PT_RealFP``
 | 
						|
 | 
						|
  Arbitrary, but fixed precision floating point numbers. Could be
 | 
						|
  specialised in the future similarly to integers in order to improve
 | 
						|
  floating point performance.
 | 
						|
 | 
						|
* ``PT_Ptr``
 | 
						|
 | 
						|
  Pointer type, defined in ``"Pointer.h"``. A pointer can be either null,
 | 
						|
  reference interpreter-allocated memory (``BlockPointer``) or point to an
 | 
						|
  address which can be derived, but not accessed (``ExternPointer``).
 | 
						|
 | 
						|
* ``PT_FnPtr``
 | 
						|
 | 
						|
  Function pointer type, can also be a null function pointer. Defined
 | 
						|
  in ``"FnPointer.h"``.
 | 
						|
 | 
						|
* ``PT_MemPtr``
 | 
						|
 | 
						|
  Member pointer type, can also be a null member pointer. Defined
 | 
						|
  in ``"MemberPointer.h"``
 | 
						|
 | 
						|
* ``PT_VoidPtr``
 | 
						|
 | 
						|
  Void pointer type, can be used for rount-trip casts. Represented as
 | 
						|
  the union of all pointers which can be cast to void.
 | 
						|
  Defined in ``"VoidPointer.h"``.
 | 
						|
 | 
						|
* ``PT_ObjCBlockPtr``
 | 
						|
 | 
						|
  Pointer type for ObjC blocks. Defined in ``"ObjCBlockPointer.h"``.
 | 
						|
 | 
						|
Composite types
 | 
						|
---------------
 | 
						|
 | 
						|
The interpreter distinguishes two kinds of composite types: arrays and
 | 
						|
records (structs and classes). Unions are represented as records, except
 | 
						|
at most a single field can be marked as active. The contents of inactive
 | 
						|
fields are kept until they are reactivated and overwritten.
 | 
						|
Complex numbers (``_Complex``) and vectors
 | 
						|
(``__attribute((vector_size(16)))``) are treated as arrays.
 | 
						|
 | 
						|
 | 
						|
Bytecode Execution
 | 
						|
==================
 | 
						|
 | 
						|
Bytecode is executed using a stack-based interpreter. The execution
 | 
						|
context consists of an ``InterpStack``, along with a chain of
 | 
						|
``InterpFrame`` objects storing the call frames. Frames are built by
 | 
						|
call instructions and destroyed by return instructions. They perform
 | 
						|
one allocation to reserve space for all locals in a single block.
 | 
						|
These objects store all the required information to emit stack traces
 | 
						|
whenever evaluation fails.
 | 
						|
 | 
						|
Memory Organisation
 | 
						|
===================
 | 
						|
 | 
						|
Memory management in the interpreter relies on 3 data structures: ``Block``
 | 
						|
objects which store the data and associated inline metadata, ``Pointer``
 | 
						|
objects which refer to or into blocks, and ``Descriptor`` structures which
 | 
						|
describe blocks and subobjects nested inside blocks.
 | 
						|
 | 
						|
Blocks
 | 
						|
------
 | 
						|
 | 
						|
Blocks contain data interleaved with metadata. They are allocated either
 | 
						|
statically in the code generator (globals, static members, dummy parameter
 | 
						|
values etc.) or dynamically in the interpreter, when creating the frame
 | 
						|
containing the local variables of a function. Blocks are associated with a
 | 
						|
descriptor that characterises the entire allocation, along with a few
 | 
						|
additional attributes:
 | 
						|
 | 
						|
* ``IsStatic`` indicates whether the block has static duration in the
 | 
						|
  interpreter, i.e. it is not a local in a frame.
 | 
						|
 | 
						|
* ``DeclID`` identifies each global declaration (it is set to an invalid
 | 
						|
  and irrelevant value for locals) in order to prevent illegal writes and
 | 
						|
  reads involving globals and temporaries with static storage duration.
 | 
						|
 | 
						|
Static blocks are never deallocated, but local ones might be deallocated
 | 
						|
even when there are live pointers to them. Pointers are only valid as
 | 
						|
long as the blocks they point to are valid, so a block with pointers to
 | 
						|
it whose lifetime ends is kept alive until all pointers to it go out of
 | 
						|
scope. Since the frame is destroyed on function exit, such blocks are
 | 
						|
turned into a ``DeadBlock`` and copied to storage managed by the
 | 
						|
interpreter itself, not the frame. Reads and writes to these blocks are
 | 
						|
illegal and cause an appropriate diagnostic to be emitted. When the last
 | 
						|
pointer goes out of scope, dead blocks are also deallocated.
 | 
						|
 | 
						|
The lifetime of blocks is managed through 3 methods stored in the
 | 
						|
descriptor of the block:
 | 
						|
 | 
						|
* **CtorFn**: initializes the metadata which is store in the block,
 | 
						|
  alongside actual data. Invokes the default constructors of objects
 | 
						|
  which are not trivial (``Pointer``, ``RealFP``, etc.)
 | 
						|
 | 
						|
* **DtorFn**: invokes the destructors of non-trivial objects.
 | 
						|
 | 
						|
* **MoveFn**: moves a block to dead storage.
 | 
						|
 | 
						|
Non-static blocks track all the pointers into them through an intrusive
 | 
						|
doubly-linked list, required to adjust and invalidate all pointers when
 | 
						|
transforming a block into a dead block. If the lifetime of an object ends,
 | 
						|
all pointers to it are invalidated, emitting the appropriate diagnostics when
 | 
						|
dereferenced.
 | 
						|
 | 
						|
The interpreter distinguishes 3 different kinds of blocks:
 | 
						|
 | 
						|
* **Primitives**
 | 
						|
 | 
						|
  A block containing a single primitive with no additional metadata.
 | 
						|
 | 
						|
* **Arrays of primitives**
 | 
						|
 | 
						|
  An array of primitives contains a pointer to an ``InitMap`` storage as its
 | 
						|
  first field: the initialisation map is a bit map indicating all elements of
 | 
						|
  the array which were initialised. If the pointer is null, no elements were
 | 
						|
  initialised, while a value of ``(InitMap*)-1`` indicates that the object was
 | 
						|
  fully initialised. When all fields are initialised, the map is deallocated
 | 
						|
  and replaced with that token.
 | 
						|
 | 
						|
  Array elements are stored sequentially, without padding, after the pointer
 | 
						|
  to the map.
 | 
						|
 | 
						|
* **Arrays of composites and records**
 | 
						|
 | 
						|
  Each element in an array of composites is preceded by an ``InlineDescriptor``
 | 
						|
  which stores the attributes specific to the field and not the whole
 | 
						|
  allocation site. Descriptors and elements are stored sequentially in the
 | 
						|
  block.
 | 
						|
  Records are laid out identically to arrays of composites: each field and base
 | 
						|
  class is preceded by an inline descriptor. The ``InlineDescriptor``
 | 
						|
  has the following fields:
 | 
						|
 | 
						|
   * **Offset**: byte offset into the array or record, used to step back to the
 | 
						|
     parent array or record.
 | 
						|
   * **IsConst**: flag indicating if the field is const-qualified.
 | 
						|
   * **IsInitialized**: flag indicating whether the field or element was
 | 
						|
     initialized. For non-primitive fields, this is only relevant to determine
 | 
						|
     the dynamic type of objects during construction.
 | 
						|
   * **IsBase**: flag indicating whether the record is a base class. In that
 | 
						|
     case, the offset can be used to identify the derived class.
 | 
						|
   * **IsActive**: indicates if the field is the active field of a union.
 | 
						|
   * **IsMutable**: indicates if the field is marked as mutable.
 | 
						|
 | 
						|
Inline descriptors are filled in by the `CtorFn` of blocks, which leaves storage
 | 
						|
in an uninitialised, but valid state.
 | 
						|
 | 
						|
Descriptors
 | 
						|
-----------
 | 
						|
 | 
						|
Descriptors are generated at bytecode compilation time and contain information
 | 
						|
required to determine if a particular memory access is allowed in constexpr.
 | 
						|
They also carry all the information required to emit a diagnostic involving
 | 
						|
a memory access, such as the declaration which originates the block.
 | 
						|
Currently there is a single kind of descriptor encoding information for all
 | 
						|
block types.
 | 
						|
 | 
						|
Pointers
 | 
						|
--------
 | 
						|
 | 
						|
Pointers, implemented in ``Pointer.h`` are represented as a tagged union.
 | 
						|
Some of these may not yet be available in upstream ``clang``.
 | 
						|
 | 
						|
 * **BlockPointer**: used to reference memory allocated and managed by the
 | 
						|
   interpreter, being the only pointer kind which allows dereferencing in the
 | 
						|
   interpreter
 | 
						|
 * **ExternPointer**: points to memory which can be addressed, but not read by
 | 
						|
   the interpreter. It is equivalent to APValue, tracking a declaration and a path
 | 
						|
   of fields and indices into that allocation.
 | 
						|
 * **TargetPointer**: represents a target address derived from a base address
 | 
						|
   through pointer arithmetic, such as ``((int *)0x100)[20]``. Null pointers are
 | 
						|
   target pointers with a zero offset.
 | 
						|
 * **TypeInfoPointer**: tracks information for the opaque type returned by
 | 
						|
   ``typeid``
 | 
						|
 * **InvalidPointer**: is dummy pointer created by an invalid operation which
 | 
						|
   allows the interpreter to continue execution. Does not allow pointer
 | 
						|
   arithmetic or dereferencing.
 | 
						|
 | 
						|
Besides the previously mentioned union, a number of other pointer-like types
 | 
						|
have their own type:
 | 
						|
 | 
						|
 * **ObjCBlockPointer** tracks Objective-C blocks
 | 
						|
 * **FnPointer** tracks functions and lazily caches their compiled version
 | 
						|
 * **MemberPointer** tracks C++ object members
 | 
						|
 | 
						|
Void pointers, which can be built by casting any of the aforementioned
 | 
						|
pointers, are implemented as a union of all pointer types. The ``BitCast``
 | 
						|
opcode is responsible for performing all legal conversions between these
 | 
						|
types and primitive integers.
 | 
						|
 | 
						|
BlockPointer
 | 
						|
~~~~~~~~~~~~
 | 
						|
 | 
						|
Block pointers track a ``Pointee``, the block to which they point, along
 | 
						|
with a ``Base`` and an ``Offset``. The base identifies the innermost field,
 | 
						|
while the offset points to an array element relative to the base (including
 | 
						|
one-past-end pointers). The offset identifies the array element or field
 | 
						|
which is referenced, while the base points to the outer object or array which
 | 
						|
contains the field. These two fields allow all pointers to be uniquely
 | 
						|
identified, disambiguated and characterised.
 | 
						|
 | 
						|
As an example, consider the following structure:
 | 
						|
 | 
						|
.. code-block:: c
 | 
						|
 | 
						|
    struct A {
 | 
						|
        struct B {
 | 
						|
            int x;
 | 
						|
            int y;
 | 
						|
        } b;
 | 
						|
        struct C {
 | 
						|
            int a;
 | 
						|
            int b;
 | 
						|
        } c[2];
 | 
						|
        int z;
 | 
						|
    };
 | 
						|
    constexpr A a;
 | 
						|
 | 
						|
On the target, ``&a`` and ``&a.b.x`` are equal. So are ``&a.c[0]`` and
 | 
						|
``&a.c[0].a``. In the interpreter, all these pointers must be
 | 
						|
distinguished since the are all allowed to address distinct range of
 | 
						|
memory.
 | 
						|
 | 
						|
In the interpreter, the object would require 240 bytes of storage and
 | 
						|
would have its field interleaved with metadata. The pointers which can
 | 
						|
be derived to the object are illustrated in the following diagram:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
      0   16  32  40  56  64  80  96  112 120 136 144 160 176 184 200 208 224 240
 | 
						|
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 | 
						|
  + B | D | D | x | D | y | D | D | D | a | D | b | D | D | a | D | b | D | z |
 | 
						|
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 | 
						|
      ^   ^   ^       ^       ^   ^   ^       ^       ^   ^       ^       ^
 | 
						|
      |   |   |       |       |   |   |   &a.c[0].b   |   |   &a.c[1].b   |
 | 
						|
      a   |&a.b.x   &a.y    &a.c  |&a.c[0].a          |&a.c[1].a          |
 | 
						|
        &a.b                   &a.c[0]            &a.c[1]               &a.z
 | 
						|
 | 
						|
The ``Base`` offset of all pointers points to the start of a field or
 | 
						|
an array and is preceded by an inline descriptor (unless ``Base`` is
 | 
						|
zero, pointing to the root). All the relevant attributes can be read
 | 
						|
from either the inline descriptor or the descriptor of the block.
 | 
						|
 | 
						|
 | 
						|
Array elements are identified by the ``Offset`` field of pointers,
 | 
						|
pointing to past the inline descriptors for composites and before
 | 
						|
the actual data in the case of primitive arrays. The ``Offset``
 | 
						|
points to the offset where primitives can be read from. As an example,
 | 
						|
``a.c + 1`` would have the same base as ``a.c`` since it is an element
 | 
						|
of ``a.c``, but its offset would point to ``&a.c[1]``. The
 | 
						|
array-to-pointer decay operation adjusts a pointer to an array (where
 | 
						|
the offset is equal to the base) to a pointer to the first element.
 | 
						|
 | 
						|
ExternPointer
 | 
						|
~~~~~~~~~~~~~
 | 
						|
 | 
						|
Extern pointers can be derived, pointing into symbols which are not
 | 
						|
readable from constexpr. An external pointer consists of a base
 | 
						|
declaration, along with a path designating a subobject, similar to
 | 
						|
the ``LValuePath`` of an APValue. Extern pointers can be converted
 | 
						|
to block pointers if the underlying variable is defined after the
 | 
						|
pointer is created, as is the case in the following example:
 | 
						|
 | 
						|
.. code-block:: c
 | 
						|
 | 
						|
  extern const int a;
 | 
						|
  constexpr const int *p = &a;
 | 
						|
  const int a = 5;
 | 
						|
  static_assert(*p == 5, "x");
 | 
						|
 | 
						|
TargetPointer
 | 
						|
~~~~~~~~~~~~~
 | 
						|
 | 
						|
While null pointer arithmetic or integer-to-pointer conversion is
 | 
						|
banned in constexpr, some expressions on target offsets must be folded,
 | 
						|
replicating the behaviour of the ``offsetof`` builtin. Target pointers
 | 
						|
are characterised by 3 offsets: a field offset, an array offset and a
 | 
						|
base offset, along with a descriptor specifying the type the pointer is
 | 
						|
supposed to refer to. Array indexing adjusts the array offset, while the
 | 
						|
field offset is adjusted when a pointer to a member is created. Casting
 | 
						|
an integer to a pointer sets the value of the base offset. As a special
 | 
						|
case, null pointers are target pointers with all offsets set to 0.
 | 
						|
 | 
						|
TypeInfoPointer
 | 
						|
~~~~~~~~~~~~~~~
 | 
						|
 | 
						|
``TypeInfoPointer`` tracks two types: the type assigned to
 | 
						|
``std::type_info`` and the type which was passed to ``typeinfo``.
 | 
						|
 | 
						|
InvalidPointer
 | 
						|
~~~~~~~~~~~~~~
 | 
						|
 | 
						|
Such pointers are built by operations which cannot generate valid
 | 
						|
pointers, allowing the interpreter to continue execution after emitting
 | 
						|
a warning. Inspecting such a pointer stops execution.
 | 
						|
 | 
						|
TODO
 | 
						|
====
 | 
						|
 | 
						|
Missing Language Features
 | 
						|
-------------------------
 | 
						|
 | 
						|
* Changing the active field of unions
 | 
						|
* ``volatile``
 | 
						|
* ``__builtin_constant_p``
 | 
						|
* ``dynamic_cast``
 | 
						|
* ``new`` and ``delete``
 | 
						|
* Fixed Point numbers and arithmetic on Complex numbers
 | 
						|
* Several builtin methods, including string operations and
 | 
						|
  ``__builtin_bit_cast``
 | 
						|
* Continue-after-failure: a form of exception handling at the bytecode
 | 
						|
  level should be implemented to allow execution to resume. As an example,
 | 
						|
  argument evaluation should resume after the computation of an argument fails.
 | 
						|
* Pointer-to-Integer conversions
 | 
						|
* Lazy descriptors: the interpreter creates a ``Record`` and ``Descriptor``
 | 
						|
  when it encounters a type: ones which are not yet defined should be lazily
 | 
						|
  created when required
 | 
						|
 | 
						|
Known Bugs
 | 
						|
----------
 | 
						|
 | 
						|
* If execution fails, memory storing APInts and APFloats is leaked when the
 | 
						|
  stack is cleared
 |