.. _transformation-metadata:

============================
Code Transformation Metadata
============================

.. contents::
   :local:

Overview
========

LLVM transformation passes can be controlled by attaching metadata to
the code to transform. By default, transformation passes use heuristics
to determine whether or not to perform transformations and, when they
do, how the transformations are applied (e.g., which vectorization
factor to select).
Unless the optimizer is otherwise directed, transformations are applied
conservatively. This conservatism generally allows the optimizer to
avoid unprofitable transformations, but in practice it also causes the
optimizer to skip transformations that would be highly profitable.

Frontends can give additional hints to LLVM passes on which
transformations they should apply. These hints can convey additional
knowledge that cannot be derived from the emitted IR, or directives
passed from the user/programmer. OpenMP pragmas are an example of the
latter.

If any such metadata is dropped from the program, the code's semantics
must not change.

Metadata on Loops
=================

Attributes can be attached to loops as described in :ref:`llvm.loop`.
Attributes can describe properties of the loop, disable transformations,
force specific transformations and set transformation options.

Because metadata nodes are immutable (with the exception of
``MDNode::replaceOperandWith``, which is dangerous to use on uniqued
metadata), adding or removing a loop attribute requires creating a new
``MDNode`` and assigning it as the new ``llvm.loop`` metadata. Any
connection between the old ``MDNode`` and the loop is lost. The
``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e.
the loop effectively gets a new identifier. For instance,
``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if
the parallel access property is to be preserved after adding/removing
loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be
updated to the new LoopID.

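As a sketch of what replacing the LoopID entails, the following
hypothetical fragment shows a loop before and after an unroll count is
added. The value names (``%arrayidx``, ``%exitcond``), the block labels,
the node numbering and the chosen attribute are illustrative assumptions
and are not taken from any particular frontend.

.. code-block:: llvm

    ; Before: the load and the latch branch both refer to LoopID !0.
    %val = load i32, i32* %arrayidx, !llvm.mem.parallel_loop_access !0
    br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
    ...
    !0 = distinct !{!0}

    ; After: a new LoopID !1 carries the added attribute, and the
    ; parallel-access reference is updated to point to the new node.
    %val = load i32, i32* %arrayidx, !llvm.mem.parallel_loop_access !1
    br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !1
    ...
    !1 = distinct !{!1, !2}
    !2 = !{!"llvm.loop.unroll.count", i32 4}

If the load kept referring to ``!0``, it would no longer be associated
with the loop, and the parallel access property would be lost.
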
Transformation Metadata Structure
=================================

Some attributes describe code transformations (unrolling, vectorizing,
loop distribution, etc.). They can either be a hint to the optimizer
that a transformation might be beneficial, an instruction to use a
specific option, or convey a specific request from the user (such as
``#pragma clang loop`` or ``#pragma omp simd``).

If a transformation is forced but cannot be carried out for any reason,
an optimization-missed warning must be emitted. Semantic information
such as a transformation being safe (e.g.
``llvm.mem.parallel_loop_access``) can be unused by the optimizer
without generating a warning.

Unless explicitly disabled, any optimization pass may heuristically
determine whether a transformation is beneficial and apply it. If
metadata for another transformation was specified, applying a different
transformation before it might inadvertently defeat the request, because
the requested transformation would then be applied to a different loop
or the loop might no longer exist. To avoid having to
explicitly disable an unknown number of passes, the attribute
``llvm.loop.disable_nonforced`` disables all optional, high-level,
restructuring transformations.

The following example prevents the loop from being altered (for
instance, unrolled) before it is vectorized.

.. code-block:: llvm

    br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
    ...
    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    !2 = !{!"llvm.loop.disable_nonforced"}

After a transformation is applied, follow-up attributes are set on the
transformed and/or new loop(s). This allows additional attributes,
including followup-transformations, to be specified. Specifying multiple
transformations in the same metadata node is possible for compatibility
reasons, but their execution order is undefined. For instance, when
``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are
specified at the same time, unrolling may occur either before or after
vectorization.

As an example, the following instructs a loop to be vectorized and only
then unrolled.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2, !3}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    !2 = !{!"llvm.loop.disable_nonforced"}
    !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{!"llvm.loop.unroll.enable"}}

If, and only if, no followup is specified, the pass may add attributes itself.
For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and
all attributes from the original loop excluding its loop vectorizer
attributes. To avoid this, an empty followup attribute can be used, e.g.

.. code-block:: llvm

    !3 = !{!"llvm.loop.vectorize.followup_vectorized"}

The followup attributes of a transformation that cannot be applied will
never be added to a loop and are therefore effectively ignored. This means
that a followup-transformation in such attributes can only take place if
all prior transformations in the chain have been applied.
The user should receive a warning about the first transformation in the
transformation chain that could not be applied if it is a forced
transformation. All following transformations are skipped.

Pass-Specific Transformation Metadata
=====================================

Transformation options are specific to each transformation. In the
following, we present the model for each LLVM loop optimization pass and
the metadata to influence them.

Loop Vectorization and Interleaving
-----------------------------------

Loop vectorization and interleaving are interpreted as a single
transformation. It is interpreted as forced if
``!{!"llvm.loop.vectorize.enable", i1 true}`` is set.

Assuming the pre-vectorization loop is

.. code-block:: c

    for (int i = 0; i < n; i+=1) // original loop
      Stmt(i);

then the code after vectorization will be approximately (assuming an
SIMD width of 4):

.. code-block:: c

    int i = 0;
    if (rtc) {
      for (; i + 3 < n; i+=4) // vectorized/interleaved loop
        Stmt(i:i+3);
    }
    for (; i < n; i+=1) // epilogue loop
      Stmt(i);

where ``rtc`` is a generated runtime check.

``llvm.loop.vectorize.followup_vectorized`` will set the attributes for
the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is
combined with the original loop's attributes to avoid it being
vectorized multiple times.

``llvm.loop.vectorize.followup_epilogue`` will set the attributes for
the epilogue (remainder) loop. If not specified, it will have the
original loop's attributes combined with ``llvm.loop.isvectorized`` and
``llvm.loop.unroll.runtime.disable`` (unless the original loop already
has unroll metadata).

The attributes specified by ``llvm.loop.vectorize.followup_all`` are
added to both loops.

When using a follow-up attribute, it replaces any automatically deduced
attributes for the generated loop in question. Therefore it is
recommended to add ``llvm.loop.isvectorized`` to
``llvm.loop.vectorize.followup_all``, which prevents the loop
vectorizer from trying to optimize the loops again.

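The three vectorizer follow-up attributes might be combined as in the
following sketch. It is an illustration rather than the output of a
particular frontend: the node numbering and the unroll attributes chosen
for the individual loops are assumptions, and the ``i32 1`` operand of
``llvm.loop.isvectorized`` is assumed to match the form the vectorizer
itself emits.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2, !3, !4}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    ; Added to both the vectorized loop and the epilogue loop.
    !2 = !{!"llvm.loop.vectorize.followup_all", !5}
    ; Added to the vectorized loop only (hypothetical choice).
    !3 = !{!"llvm.loop.vectorize.followup_vectorized", !6}
    ; Added to the epilogue loop only (hypothetical choice).
    !4 = !{!"llvm.loop.vectorize.followup_epilogue", !7}
    !5 = !{!"llvm.loop.isvectorized", i32 1}
    !6 = !{!"llvm.loop.unroll.count", i32 2}
    !7 = !{!"llvm.loop.unroll.disable"}

Because follow-up attributes are present for both generated loops, the
automatically deduced attributes are not added, which is why
``llvm.loop.isvectorized`` is listed explicitly.
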
Loop Unrolling
--------------

Unrolling is interpreted as forced if any ``!{!"llvm.loop.unroll.enable"}``
metadata or option (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``)
is present. Unrolling can be full unrolling, partial unrolling of a loop
with constant trip count or runtime unrolling of a loop with a trip
count unknown at compile-time.

If the loop has been unrolled fully, there is no followup-loop. For
partial/runtime unrolling, the original loop of

.. code-block:: c

    for (int i = 0; i < n; i+=1) // original loop
      Stmt(i);

is transformed into (using an unroll factor of 4):

.. code-block:: c

    int i = 0;
    for (; i + 3 < n; i+=4) { // unrolled loop
      Stmt(i);
      Stmt(i+1);
      Stmt(i+2);
      Stmt(i+3);
    }
    for (; i < n; i+=1) // remainder loop
      Stmt(i);

``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of
the unrolled loop. If not specified, the attributes of the original loop
without the ``llvm.loop.unroll.*`` attributes are copied and
``llvm.loop.unroll.disable`` is added to it.

``llvm.loop.unroll.followup_remainder`` defines the attributes of the
remainder loop. If not specified, the remainder loop will have no
attributes. The remainder loop might not be present due to being fully
unrolled, in which case this attribute has no effect.

Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the
unrolled and remainder loops.

To prevent the partially unrolled loop from being unrolled again, it is
recommended to add ``llvm.loop.unroll.disable`` to
``llvm.loop.unroll.followup_all``. If no follow-up attribute is specified
for a generated loop, it is added automatically.

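A minimal sketch of the unrolling follow-up attributes is shown below.
The unroll count, the node numbering and the attribute chosen for the
remainder loop are illustrative assumptions.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2, !3}
    ; Forces unrolling with a factor of 4.
    !1 = !{!"llvm.loop.unroll.count", i32 4}
    ; Added to both the unrolled loop and the remainder loop.
    !2 = !{!"llvm.loop.unroll.followup_all", !4}
    ; Added to the remainder loop only (hypothetical choice).
    !3 = !{!"llvm.loop.unroll.followup_remainder", !5}
    !4 = !{!"llvm.loop.unroll.disable"}
    !5 = !{!"llvm.loop.disable_nonforced"}
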
Unroll-And-Jam
--------------

Unroll-and-jam uses the following transformation model (here with an
unroll factor of 2). Currently, it does not support a fallback version
when the transformation is unsafe.

.. code-block:: c

    for (int i = 0; i < n; i+=1) { // original outer loop
      Fore(i);
      for (int j = 0; j < m; j+=1) // original inner loop
        SubLoop(i, j);
      Aft(i);
    }

.. code-block:: c

    int i = 0;
    for (; i + 1 < n; i+=2) { // unrolled outer loop
      Fore(i);
      Fore(i+1);
      for (int j = 0; j < m; j+=1) { // unrolled inner loop
        SubLoop(i, j);
        SubLoop(i+1, j);
      }
      Aft(i);
      Aft(i+1);
    }
    for (; i < n; i+=1) { // remainder outer loop
      Fore(i);
      for (int j = 0; j < m; j+=1) // remainder inner loop
        SubLoop(i, j);
      Aft(i);
    }

``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes
of the unrolled outer loop. If not specified, the attributes of the
original outer loop without the ``llvm.loop.unroll.*`` attributes are
copied and ``llvm.loop.unroll.disable`` is added to it.

``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes
of the unrolled inner loop. If not specified, the attributes of the
original inner loop are used unchanged.

``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop
attributes of the outer remainder loop. If not specified, it will not
have any attributes. The remainder loop might not be present due to
being fully unrolled.

``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop
attributes of the inner remainder loop. If not specified, it will have
the attributes of the original inner loop. If the outer remainder loop
is unrolled, the inner remainder loop might be present multiple times.

Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are
added to all of the aforementioned output loops.

To prevent the unrolled loop from being unrolled again, it is
recommended to add ``llvm.loop.unroll.disable`` to
``llvm.loop.unroll_and_jam.followup_all``. It suppresses unroll-and-jam
as well as an additional inner loop unrolling. If no follow-up
attribute is specified for a generated loop, it is added automatically.

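As a sketch, metadata of the following shape could be attached to the
outer loop to request the transformation and suppress further unrolling
of every output loop. The factor of 2 mirrors the model above; the node
numbering is an illustrative assumption.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2}
    ; Requests unroll-and-jam with a factor of 2.
    !1 = !{!"llvm.loop.unroll_and_jam.count", i32 2}
    ; Added to the unrolled and remainder loops, outer and inner alike.
    !2 = !{!"llvm.loop.unroll_and_jam.followup_all", !3}
    !3 = !{!"llvm.loop.unroll.disable"}
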
Loop Distribution
-----------------

The LoopDistribution pass tries to separate vectorizable parts of a loop
from the non-vectorizable part (which otherwise would make the entire
loop non-vectorizable). Conceptually, it transforms a loop such as

.. code-block:: c

    for (int i = 1; i < n; i+=1) { // original loop
      A[i] = i;
      B[i] = 2 + B[i];
      C[i] = 3 + C[i - 1];
    }

into the following code:

.. code-block:: c

    if (rtc) {
      for (int i = 1; i < n; i+=1) // coincident loop
        A[i] = i;
      for (int i = 1; i < n; i+=1) // coincident loop
        B[i] = 2 + B[i];
      for (int i = 1; i < n; i+=1) // sequential loop
        C[i] = 3 + C[i - 1];
    } else {
      for (int i = 1; i < n; i+=1) { // fallback loop
        A[i] = i;
        B[i] = 2 + B[i];
        C[i] = 3 + C[i - 1];
      }
    }

where ``rtc`` is a generated runtime check.

``llvm.loop.distribute.followup_coincident`` sets the loop attributes of
all loops without loop-carried dependencies (i.e. vectorizable loops).
There might be more than one such loop. If not defined, the loops will
inherit the original loop's attributes.

``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the
loop with potentially unsafe dependencies. There should be at most one
such loop. If not defined, the loop will inherit the original loop's
attributes.

``llvm.loop.distribute.followup_fallback`` defines the loop attributes
for the fallback loop, which is a copy of the original loop for when
loop versioning is required. If undefined, the fallback loop inherits
all attributes from the original loop.

Attributes defined in ``llvm.loop.distribute.followup_all`` are added to
all of the aforementioned output loops.

It is recommended to add ``llvm.loop.disable_nonforced`` to
``llvm.loop.distribute.followup_fallback``. This prevents the
fallback version (which is likely never executed) from being further
optimized, which would increase the code size.

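The recommendation for the fallback loop might be expressed as in the
following sketch, which also asks for the coincident loops to be
vectorized. The node numbering and the choice to force vectorization
are illustrative assumptions.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2, !3}
    !1 = !{!"llvm.loop.distribute.enable", i1 true}
    ; Added to every loop without loop-carried dependencies.
    !2 = !{!"llvm.loop.distribute.followup_coincident", !4}
    ; Added to the fallback copy of the original loop.
    !3 = !{!"llvm.loop.distribute.followup_fallback", !5}
    !4 = !{!"llvm.loop.vectorize.enable", i1 true}
    !5 = !{!"llvm.loop.disable_nonforced"}
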
Versioning LICM
---------------

The pass hoists code out of loops that is only loop-invariant when
dynamic conditions apply. For instance, it transforms the loop

.. code-block:: c

    for (int i = 0; i < n; i+=1) // original loop
      A[i] = B[0];

into:

.. code-block:: c

    if (rtc) {
      auto b = B[0];
      for (int i = 0; i < n; i+=1) // versioned loop
        A[i] = b;
    } else {
      for (int i = 0; i < n; i+=1) // unversioned loop
        A[i] = B[0];
    }

The runtime condition (``rtc``) checks that the array ``A`` and the
element ``B[0]`` do not alias.

Currently, this transformation does not support followup-attributes.

Loop Interchange
----------------

Currently, the ``LoopInterchange`` pass does not use any metadata.

Ambiguous Transformation Order
==============================

If there are multiple transformations defined, the order in which they
are executed depends on the order in LLVM's pass pipeline, which is
subject to change. The default optimization pipeline (anything higher
than ``-O0``) has the following order.

When using the legacy pass manager:

 - LoopInterchange (if enabled)
 - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
 - VersioningLICM (if enabled)
 - LoopDistribute
 - LoopVectorizer
 - LoopUnrollAndJam (if enabled)
 - LoopUnroll (partial and runtime unrolling)

When using the legacy pass manager with LTO:

 - LoopInterchange (if enabled)
 - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
 - LoopVectorizer
 - LoopUnroll (partial and runtime unrolling)

When using the new pass manager:

 - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
 - LoopDistribute
 - LoopVectorizer
 - LoopUnrollAndJam (if enabled)
 - LoopUnroll (partial and runtime unrolling)

Leftover Transformations
========================

Forced transformations that have not been applied after the last
transformation pass should be reported to the user. The transformation
passes themselves cannot be responsible for this reporting because they
might not be in the pipeline, there might be multiple passes able to
apply a transformation (e.g. ``LoopInterchange`` and Polly) or a
transformation attribute may be 'hidden' inside another pass's followup
attribute.

The pass ``-transform-warning`` (``WarnMissedTransformationsPass``)
emits such warnings. It should be placed after the last transformation
pass.

The current pass pipeline has a fixed order in which transformation
passes are executed. A transformation can be in the followup of a pass
that is executed later and thus be left over. For instance, a loop nest
cannot be distributed and then interchanged with the current pass
pipeline. The loop distribution will execute, but there is no loop
interchange pass following it, so any loop interchange metadata will
be ignored. The ``-transform-warning`` pass should emit a warning in
this case.

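A leftover transformation can also be sketched with the attributes
documented in the earlier sections. Assuming the pass orders listed
under "Ambiguous Transformation Order", where LoopVectorizer runs before
the LoopUnroll pass that performs partial unrolling, the vectorization
request in the following hypothetical fragment would only appear on the
unrolled loop after the vectorizer has already run. It would therefore
be expected to remain unapplied and to be reported by
``-transform-warning``. The unroll count and node numbering are
illustrative assumptions.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll.count", i32 4}
    ; Requests vectorization of the loop produced by unrolling.
    !2 = !{!"llvm.loop.unroll.followup_unrolled", !3}
    !3 = !{!"llvm.loop.vectorize.enable", i1 true}
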
Future versions of LLVM may fix this by executing transformations using
a dynamic ordering.