==========================
Clang Transformer Tutorial
==========================

A tutorial on how to write a source-to-source translation tool using Clang Transformer.

.. contents::
   :local:

What is Clang Transformer?
--------------------------

Clang Transformer is a framework for writing C++ diagnostics and program
transformations. It is built on the clang toolchain and the LibTooling library,
but aims to hide much of the complexity of clang's native, low-level libraries.

The core abstraction of Transformer is the *rewrite rule*, which specifies how
to change a given program pattern into a new form. Here are some examples of
tasks you can achieve with Transformer:

*   warn against using the name ``MkX`` for a declared function,
*   change ``MkX`` to ``MakeX``, where ``MkX`` is the name of a declared function,
*   change ``s.size()`` to ``Size(s)``, where ``s`` is a ``string``,
*   collapse ``e.child().m()`` to ``e.m()``, for any expression ``e`` and method named
    ``m``.

All of the examples have a common form: they identify a pattern that is the
target of the transformation, they specify an *edit* to the code identified by
the pattern, and their pattern and edit refer to common variables, like ``s``,
``e``, and ``m``, that range over code fragments. Our second and third examples also
specify constraints on the pattern that aren't apparent from the syntax alone,
like "``s`` is a ``string``." Even the first example ("warn ...") shares this form,
even though it doesn't change any of the code -- its "edit" is simply a no-op.

Transformer helps users succinctly specify rules of this sort and easily execute
them locally over a collection of files, apply them to selected portions of
a codebase, or even bundle them as a clang-tidy check for ongoing application.

Who is Clang Transformer for?
-----------------------------

Clang Transformer is for developers who want to write clang-tidy checks or write
tools to modify a large number of C++ files in (roughly) the same way. What
qualifies as "large" really depends on the nature of the change and your
patience for repetitive editing. In our experience, automated solutions become
worthwhile somewhere between 100 and 500 files.

Getting Started
---------------

Patterns in Transformer are expressed with :doc:`clang's AST matchers <LibASTMatchers>`.
Matchers are a language of combinators for describing portions of a clang
Abstract Syntax Tree (AST). Since clang's AST includes complete type information
(within the limits of a single `Translation Unit (TU)`_),
these patterns can even encode rich constraints on the type properties of AST
nodes.

.. _`Translation Unit (TU)`: https://en.wikipedia.org/wiki/Translation_unit_\(programming\)

We assume a familiarity with the clang AST and the corresponding AST matchers
for the purpose of this tutorial. Users who are unfamiliar with either are
encouraged to start with the recommended references in `Related Reading`_.

Example: style-checking names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Assume you have a style-guide rule which forbids functions from being named
"MkX" and you want to write a check that catches any violations of this rule. We
can express this as a Transformer rewrite rule:

.. code-block:: c++

   makeRule(functionDecl(hasName("MkX")).bind("fun"),
            noopEdit(node("fun")),
            cat("The name ``MkX`` is not allowed for functions; please rename"));

``makeRule`` is our go-to function for generating rewrite rules. It takes three
arguments: the pattern, the edit, and (optionally) an explanatory note. In our
example, the pattern (``functionDecl(...)``) identifies the declaration of the
function ``MkX``. Since we're just diagnosing the problem, but not suggesting a
fix, our edit is a no-op. It does, however, contain an *anchor* for the diagnostic
message: ``node("fun")`` says to associate the message with the source range of
the AST node bound to "fun"; in this case, the ill-named function declaration.
Finally, we use ``cat`` to build a message that explains the problem. Regarding the
name ``cat`` -- we'll discuss it in more detail below, but suffice it to say that
it can also take multiple arguments and concatenate their results.

Note that the result of ``makeRule`` is a value of type
``clang::transformer::RewriteRule``, but most users don't need to care about the
details of this type.

Example: renaming a function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now, let's extend this example to a *transformation*; specifically, the second
example above:

.. code-block:: c++

   makeRule(declRefExpr(to(functionDecl(hasName("MkX")))),
            changeTo(cat("MakeX")),
            cat("MkX has been renamed MakeX"));

In this example, the pattern (``declRefExpr(...)``) identifies any *reference* to
the function ``MkX``, rather than the declaration itself, as in our previous
example. Our edit (``changeTo(...)``) says to *change* the code matched by the
pattern *to* the text "MakeX". Finally, we use ``cat`` again to build a message
that explains the change.

Here are some example changes that this rule would make:

+--------------------------+----------------------------+
| Original                 | Result                     |
+==========================+============================+
| ``X x = MkX(3);``        | ``X x = MakeX(3);``        |
+--------------------------+----------------------------+
| ``CallFactory(MkX, 3);`` | ``CallFactory(MakeX, 3);`` |
+--------------------------+----------------------------+
| ``auto f = MkX;``        | ``auto f = MakeX;``        |
+--------------------------+----------------------------+

Example: method to function
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, let's write a rule to replace a method call with a (free) function call,
applied to the original method call's target object. Specifically, "change
``s.size()`` to ``Size(s)``, where ``s`` is a ``string``." We start with a simpler
change that ignores the type of ``s``. That is, it will modify *any* method call
where the method is named "size":

.. code-block:: c++

   llvm::StringRef s = "str";
   makeRule(
     cxxMemberCallExpr(
       on(expr().bind(s)),
       callee(cxxMethodDecl(hasName("size")))),
     changeTo(cat("Size(", node(s), ")")),
     cat("Method ``size`` is deprecated in favor of free function ``Size``"));

We express the pattern with the given AST matcher, which binds the method call's
target to ``s`` [#f1]_. For the edit, we again use ``changeTo``, but this
time we construct the term from multiple parts, which we compose with ``cat``. The
second part of our term is ``node(s)``, which selects the source code
corresponding to the AST node ``s`` that was bound when a match was found in the
AST for our rule's pattern. ``node(s)`` constructs a ``RangeSelector``, which, when
used in ``cat``, indicates that the selected source should be inserted in the
output at that point.

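To make the behavior of ``cat`` concrete, here is a minimal, self-contained
sketch of the idea in standard C++. It is a toy model and not Transformer's
implementation: the types ``RawText``, ``BoundNode`` and ``MatchedSource`` and
the function ``evalStencil`` are hypothetical names invented for this
illustration. The sketch assumes that a match can be summarized as a map from
binding ids to the source text of the bound nodes.

.. code-block:: c++

   #include <map>
   #include <string>
   #include <variant>
   #include <vector>

   // Toy model only: none of these types belong to the Transformer API.
   struct RawText { std::string text; };   // copied verbatim to the output
   struct BoundNode { std::string id; };   // replaced by the source of a bound node
   using Part = std::variant<RawText, BoundNode>;

   // Maps a binding id (such as "str") to the source text of the matched node.
   using MatchedSource = std::map<std::string, std::string>;

   // Concatenates the parts in order, looking up bound nodes in the match.
   // Throws std::out_of_range if a part refers to an id that was never bound.
   std::string evalStencil(const std::vector<Part> &parts,
                           const MatchedSource &match) {
     std::string out;
     for (const Part &part : parts) {
       if (const auto *raw = std::get_if<RawText>(&part))
         out += raw->text;
       else
         out += match.at(std::get<BoundNode>(part).id);
     }
     return out;
   }

In this model, the parts ``RawText{"Size("}``, ``BoundNode{"str"}`` and
``RawText{")"}`` play the role of ``cat("Size(", node(s), ")")``: evaluating
them against a match that binds "str" to the text ``name`` yields
``Size(name)``.
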
Now, we probably don't want to rewrite *all* invocations of "size" methods, just
those on ``std::string``\ s. We can achieve this change simply by refining our
matcher. The rest of the rule remains unchanged:

.. code-block:: c++

   llvm::StringRef s = "str";
   makeRule(
     cxxMemberCallExpr(
       on(expr(hasType(namedDecl(hasName("std::string"))))
            .bind(s)),
       callee(cxxMethodDecl(hasName("size")))),
     changeTo(cat("Size(", node(s), ")")),
     cat("Method ``size`` is deprecated in favor of free function ``Size``"));

Example: rewriting method calls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this example, we delete an "intermediary" method call in a string of
invocations. This scenario can arise, for example, if you want to collapse a
substructure into its parent.

.. code-block:: c++

   llvm::StringRef e = "expr", m = "member";
   auto child_call = cxxMemberCallExpr(on(expr().bind(e)),
                                       callee(cxxMethodDecl(hasName("child"))));
   makeRule(cxxMemberCallExpr(on(child_call), callee(memberExpr().bind(m))),
            // node(e) copies the source of the bound expression; a bare id
            // passed to cat would be emitted as literal text.
            changeTo(cat(node(e), ".", member(m), "()")),
            cat("``child`` accessor is being removed; call ",
                member(m), " directly on parent"));

This rule isn't quite what we want: it will rewrite ``my_object.child().foo()`` to
``my_object.foo()``, but it will also rewrite ``my_ptr->child().foo()`` to
``my_ptr.foo()``, which is not what we intend. We could fix this by restricting
the pattern with ``unless(isArrow())`` in the definition of ``child_call``. Yet, we
*want* to rewrite calls through pointers.

To capture this idiom, we provide the ``access`` combinator to intelligently
construct a field/method access. In our example, the member access is expressed
as:

.. code-block:: c++

   access(e, cat(member(m)))

The first argument specifies the object being accessed and the second, a
description of the field/method name. In this case, we specify that the method
name should be copied from the source -- specifically, the source range of ``m``'s
member. To construct the method call, we would use this expression in ``cat``:

.. code-block:: c++

   cat(access(e, cat(member(m))), "()")

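The decision that ``access`` makes for us can be sketched in a few lines of
standard C++. This is a deliberately simplified toy model: the function
``accessMember`` is a hypothetical name, and it assumes the caller already
knows whether the base expression has pointer type, whereas the real
combinator works from the AST node bound to the id.

.. code-block:: c++

   #include <string>

   // Toy model only: builds a member access from the base expression's source
   // text, choosing "->" for pointers and "." otherwise.
   std::string accessMember(const std::string &base, bool baseIsPointer,
                            const std::string &member) {
     return base + (baseIsPointer ? "->" : ".") + member;
   }

With this model, ``accessMember("my_object", false, "foo") + "()"`` produces
``my_object.foo()``, while ``accessMember("my_ptr", true, "foo") + "()"``
produces ``my_ptr->foo()``, which is the result we wanted for calls through
pointers.
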
Reference: ranges, stencils, edits, rules
-----------------------------------------

The above examples demonstrate just the basics of rewrite rules. Every element
we touched on has more available constructors: range selectors, stencils, edits
and rules. In this section, we'll briefly review each in turn, with references
to the source headers for up-to-date information. First, though, we clarify what
rewrite rules are actually rewriting.

Rewriting ASTs to... Text?
^^^^^^^^^^^^^^^^^^^^^^^^^^

The astute reader may have noticed that we've been somewhat vague in our
explanation of what the rewrite rules are actually rewriting. We've referred to
"code", but code can be represented both as raw source text and as an abstract
syntax tree. So, which one is it?

Ideally, we'd be rewriting the input AST to a new AST, but clang's AST is not
terribly amenable to this kind of transformation. So, we compromise: we express
our patterns and the names that they bind in terms of the AST, but our changes
in terms of source code text. We've designed Transformer's language to bridge
the gap between the two representations, in an attempt to minimize the user's
need to reason about source code locations and other, low-level syntactic
details.

Range Selectors
^^^^^^^^^^^^^^^

Transformer provides a small API for describing source ranges: the
``RangeSelector`` combinators. These ranges are most commonly used to specify the
source code affected by an edit and to extract source code in constructing new
text.

Roughly, there are two kinds of range combinators: ones that select a source
range based on the AST, and others that combine existing ranges into new ranges.
For example, ``node`` selects the range of source spanned by a particular AST
node, as we've seen, while ``after`` selects the (empty) range located immediately
after its argument range. So, ``after(node("id"))`` is the empty range immediately
following the AST node bound to ``id``.

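The notion of an empty range that marks an insertion point can be illustrated
with a small toy model in standard C++. The ``Range`` struct and the helper
functions below are hypothetical names chosen for this sketch, which assumes
that a range is simply a pair of character offsets into the source string;
Transformer's own ranges are expressed in terms of clang source locations.

.. code-block:: c++

   #include <cstddef>
   #include <string>

   // Toy model only: a half-open character range [begin, end) into a source string.
   struct Range {
     std::size_t begin;
     std::size_t end;
   };

   // The empty ranges located immediately before and after the given range.
   Range beforeRange(Range r) { return {r.begin, r.begin}; }
   Range afterRange(Range r) { return {r.end, r.end}; }

   // Returns the source text covered by the range.
   std::string textOf(const std::string &source, Range r) {
     return source.substr(r.begin, r.end - r.begin);
   }

   // Replaces the text covered by the range; an empty range yields an insertion.
   std::string replaceRange(std::string source, Range r, const std::string &text) {
     return source.replace(r.begin, r.end - r.begin, text);
   }

For the source ``x = f(a);`` the range ``{4, 8}`` covers ``f(a)``. Replacing
``afterRange`` of that range with ``.get()`` gives ``x = f(a).get();``, and
replacing ``beforeRange`` of it with ``*`` gives ``x = *f(a);``.
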
For the full collection of ``RangeSelector``\ s, see the header,
`clang/Tooling/Transformer/RangeSelector.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RangeSelector.h>`_.

Stencils
^^^^^^^^

Transformer offers a large and growing collection of combinators for
constructing output. Above, we demonstrated ``cat``, the core function for
constructing stencils. It takes a series of arguments, of three possible kinds:

#.  Raw text, to be copied directly to the output.
#.  Selector: specified with a ``RangeSelector``, indicates a range of source text
    to copy to the output.
#.  Builder: an operation that constructs a code snippet from its arguments. For
    example, the ``access`` function we saw above.

Data of these different types are all represented (generically) by a ``Stencil``.
``cat`` takes text and ``RangeSelector``\ s directly as arguments, rather than
requiring that they be constructed with a builder; other builders are
constructed explicitly.

In general, ``Stencil``\ s produce text from a match result. So, they are not
limited to generating source code, but can also be used to generate diagnostic
messages that reference (named) elements of the matched code, like we saw in the
example of rewriting method calls.

Further details of the ``Stencil`` type are documented in the header file
`clang/Tooling/Transformer/Stencil.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/Stencil.h>`_.

Edits
^^^^^

Transformer supports additional forms of edits. First, in a ``changeTo``, we can
specify the particular portion of code to be replaced, using the same
``RangeSelector`` we saw earlier. For example, we could change the function name
in a function declaration with:

.. code-block:: c++

   llvm::StringRef f = "fun";  // binding id; the particular string is arbitrary
   makeRule(functionDecl(hasName("bad")).bind(f),
            changeTo(name(f), cat("good")),
            cat("bad is now good"));

We also provide simpler editing primitives for insertion and deletion:
``insertBefore``, ``insertAfter`` and ``remove``. These can all be found in the header
file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

We are not limited to one edit per match found. Some situations require making
multiple edits for each match. For example, suppose we wanted to swap two
arguments of a function call.

For this, we provide an overload of ``makeRule`` that takes a list of edits,
rather than just a single one. Our example might look like:

.. code-block:: c++

   // arg0 and arg2 are ids bound to the first and third arguments by the
   // (elided) pattern.
   makeRule(callExpr(...),
            {changeTo(node(arg0), cat(node(arg2))),
             changeTo(node(arg2), cat(node(arg0)))},
            cat("swap the first and third arguments of the call"));

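When several edits come from one match, each refers to positions in the
*original* source, so they cannot simply be applied one after another from
left to right. The following toy model in standard C++ sketches one common way
to handle this, under the assumption that the edits do not overlap. The names
``TextEdit`` and ``applyEdits`` are hypothetical and the sketch is not how
Transformer or clang's rewriting libraries are implemented.

.. code-block:: c++

   #include <algorithm>
   #include <cstddef>
   #include <string>
   #include <vector>

   // Toy model only: replace `length` characters at `offset` with `text`.
   struct TextEdit {
     std::size_t offset;
     std::size_t length;
     std::string text;
   };

   // Applies non-overlapping edits whose offsets all refer to the original source.
   std::string applyEdits(std::string source, std::vector<TextEdit> edits) {
     // Work from the back so that earlier offsets stay valid after each replacement.
     std::sort(edits.begin(), edits.end(),
               [](const TextEdit &a, const TextEdit &b) { return a.offset > b.offset; });
     for (const TextEdit &e : edits)
       source.replace(e.offset, e.length, e.text);
     return source;
   }

For the call ``f(a, bb, ccc);`` the first argument occupies offset 2 (length 1)
and the third occupies offset 9 (length 3). Applying the two edits
``{2, 1, "ccc"}`` and ``{9, 3, "a"}`` yields ``f(ccc, bb, a);``.
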
``EditGenerator``\ s (Advanced)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The particular edits we've seen so far are all instances of the ``ASTEdit`` class,
or a list of such. But, not all edits can be expressed as ``ASTEdit``\ s. So, we
also support a very general signature for edit generators:

.. code-block:: c++

   using EditGenerator = MatchConsumer<llvm::SmallVector<Edit, 1>>;

That is, an ``EditGenerator`` is a function that maps a ``MatchResult`` to a set
of edits, or fails. This signature supports a very general form of computation
over match results. Transformer provides a number of functions for working with
``EditGenerator``\ s, most notably
`flatten <https://github.com/llvm/llvm-project/blob/1fabe6e51917bcd7a1242294069c682fe6dffa45/clang/include/clang/Tooling/Transformer/RewriteRule.h#L165-L167>`_,
which combines several ``EditGenerator``\ s into one, much like list flattening.
For the full list, see the header file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

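The shape of this abstraction, a function from a match to a list of edits that
may fail, can be modeled in standard C++ as follows. Everything here is a
hypothetical stand-in: ``Match``, ``ToyEdit``, ``ToyEditGenerator`` and
``flattenAll`` are invented for the sketch, and failure is modeled with
``std::optional`` instead of the error type used by Transformer.

.. code-block:: c++

   #include <functional>
   #include <optional>
   #include <string>
   #include <utility>
   #include <vector>

   // Toy model only: these stand in for Transformer's MatchResult and Edit.
   struct Match { std::string matchedText; };
   struct ToyEdit { std::string replacement; };

   // A generator maps a match to a list of edits, or fails (std::nullopt).
   using ToyEditGenerator =
       std::function<std::optional<std::vector<ToyEdit>>(const Match &)>;

   // Combines generators into one whose result is the concatenation of their
   // results. If any generator fails, the combined generator fails as well.
   ToyEditGenerator flattenAll(std::vector<ToyEditGenerator> generators) {
     return [generators = std::move(generators)](
                const Match &m) -> std::optional<std::vector<ToyEdit>> {
       std::vector<ToyEdit> all;
       for (const ToyEditGenerator &gen : generators) {
         std::optional<std::vector<ToyEdit>> edits = gen(m);
         if (!edits)
           return std::nullopt;
         all.insert(all.end(), edits->begin(), edits->end());
       }
       return all;
     };
   }

Combining a generator that yields one edit with a generator that yields two
gives a generator that yields three edits, in order. Combining any generator
with one that fails gives a generator that fails.
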
Rules
^^^^^

We can also compose multiple *rules*, rather than just edits within a rule,
using ``applyFirst``: it composes a list of rules as an ordered choice, where
Transformer applies the first rule whose pattern matches, ignoring others in the
list that follow. If the matchers are independent then order doesn't matter. In
that case, ``applyFirst`` is simply joining the set of rules into one.

The benefit of ``applyFirst`` is that, for some problems, it allows the user to
more concisely formulate later rules in the list, since their patterns need not
explicitly exclude the earlier patterns of the list. For example, consider a set
of rules that rewrite compound statements, where one rule handles the case of an
empty compound statement and the other handles non-empty compound statements.
With ``applyFirst``, these rules can be expressed compactly as:

.. code-block:: c++

   applyFirst({
     makeRule(compoundStmt(statementCountIs(0)).bind("empty"), ...),
     makeRule(compoundStmt().bind("non-empty"), ...)
   })

The second rule does not need to explicitly specify that the compound statement
is non-empty -- it follows from the rule's position in ``applyFirst``. For more
complicated examples, this can lead to substantially more readable code.

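The ordered-choice semantics can be sketched independently of clang with a toy
model in standard C++. ``ToyRule`` and ``applyFirstMatch`` are hypothetical
names, and the sketch assumes that a pattern is just a predicate over a string
of code; the real ``applyFirst`` operates on AST matchers.

.. code-block:: c++

   #include <functional>
   #include <optional>
   #include <string>
   #include <vector>

   // Toy model only: a "rule" pairs a predicate (the pattern) with a rewrite.
   struct ToyRule {
     std::function<bool(const std::string &)> matches;
     std::function<std::string(const std::string &)> rewrite;
   };

   // Ordered choice: applies the first rule whose pattern matches and ignores
   // the rules that follow it. Returns std::nullopt if no rule matches.
   std::optional<std::string> applyFirstMatch(const std::vector<ToyRule> &rules,
                                              const std::string &code) {
     for (const ToyRule &rule : rules)
       if (rule.matches(code))
         return rule.rewrite(code);
     return std::nullopt;
   }

As in the compound-statement example, a rule list can start with a specific
rule that matches only ``{}``, followed by a general rule that matches anything
beginning with ``{``. The general rule never sees ``{}``, even though its
predicate would accept it, because the earlier rule wins.
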
Sometimes, a modification to the code might require the inclusion of a
particular header file. To this end, users can modify rules to specify include
directives with ``addInclude``.

For additional documentation on these functions, see the header file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

Using a RewriteRule as a clang-tidy check
-----------------------------------------

Transformer supports executing a rewrite rule as a
`clang-tidy <https://clang.llvm.org/extra/clang-tidy/>`_ check, with the class
``clang::tidy::utils::TransformerClangTidyCheck``. It is designed to require
minimal code in the definition. For example, given a rule
``MyCheckAsRewriteRule``, one can define a tidy check as follows:

.. code-block:: c++

   class MyCheck : public TransformerClangTidyCheck {
    public:
     MyCheck(StringRef Name, ClangTidyContext *Context)
         : TransformerClangTidyCheck(MyCheckAsRewriteRule, Name, Context) {}
   };

``TransformerClangTidyCheck`` implements the virtual ``registerMatchers`` and
``check`` methods based on your rule specification, so you don't need to implement
them yourself. If the rule needs to be configured based on the language options
and/or the clang-tidy configuration, it can be expressed as a function taking
these as parameters and (optionally) returning a ``RewriteRule``. This would be
useful, for example, for our function-renaming rule, which is parameterized by the
original name and the target. For details, see
`clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h <https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h>`_.

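The idea of a rule that is built from configuration, and that may be absent
when the check does not apply, can be sketched with a toy model in standard
C++. All names here are hypothetical: ``ToyLangOptions``, ``ToyCheckOptions``,
``ToyRenameRule`` and ``makeRenameRule`` are stand-ins for illustration, and
the option keys ``OldName`` and ``NewName`` are invented. Consult the header
above for the actual constructor signatures.

.. code-block:: c++

   #include <map>
   #include <optional>
   #include <string>

   // Toy model only: stand-ins for the language options and the check's options.
   struct ToyLangOptions { bool cPlusPlus; };
   using ToyCheckOptions = std::map<std::string, std::string>;
   struct ToyRenameRule { std::string from; std::string to; };

   // Builds the rule from configuration. Returns std::nullopt when the check
   // does not apply (here: non-C++ code, or a missing option).
   std::optional<ToyRenameRule> makeRenameRule(const ToyLangOptions &lang,
                                               const ToyCheckOptions &options) {
     auto from = options.find("OldName");
     auto to = options.find("NewName");
     if (!lang.cPlusPlus || from == options.end() || to == options.end())
       return std::nullopt;
     return ToyRenameRule{from->second, to->second};
   }

With the options ``OldName=MkX`` and ``NewName=MakeX`` and C++ enabled, the
function returns a rule that renames ``MkX`` to ``MakeX``. With C++ disabled,
or with either option missing, it returns no rule and the check does nothing.
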
Related Reading
---------------

A good place to start understanding the clang AST and its matchers is with the
introductions on clang's site:

*   :doc:`Introduction to the Clang AST <IntroductionToTheClangAST>`
*   :doc:`Matching the Clang AST <LibASTMatchers>`
*   `AST Matcher Reference <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_

.. rubric:: Footnotes

.. [#f1] Technically, it binds it to the string "str", to which our
    variable ``s`` is bound. But, the choice of that id string is
    irrelevant, so we elide the difference.