======================================================
How to set up LLVM-style RTTI for your class hierarchy
======================================================

.. contents::

Background
==========

LLVM avoids using C++'s built-in RTTI. Instead, it pervasively uses its
own hand-rolled form of RTTI, which is much more efficient and flexible,
although it requires a bit more work from you as a class author.

A description of how to use LLVM-style RTTI from a client's perspective is
given in the `Programmer's Manual <ProgrammersManual.html#isa>`_. This
document, in contrast, discusses the steps you need to take as a class
hierarchy author to make LLVM-style RTTI available to your clients.

Before diving in, make sure that you are familiar with the object-oriented
programming concept of "`is-a`_".

.. _is-a: http://en.wikipedia.org/wiki/Is-a

Basic Setup
===========

This section describes how to set up the most basic form of LLVM-style RTTI
(which is sufficient for 99.9% of the cases). We will set up LLVM-style
RTTI for this class hierarchy:

.. code-block:: c++

   class Shape {
   public:
     Shape() {}
     virtual double computeArea() = 0;
   };

   class Square : public Shape {
     double SideLength;
   public:
     Square(double S) : SideLength(S) {}
     double computeArea() override;
   };

   class Circle : public Shape {
     double Radius;
   public:
     Circle(double R) : Radius(R) {}
     double computeArea() override;
   };

The most basic working setup for LLVM-style RTTI requires the following
steps:

#. In the header where you declare ``Shape``, you will want to ``#include
   "llvm/Support/Casting.h"``, which declares LLVM's RTTI templates. That
   way your clients don't even have to think about it.

   .. code-block:: c++

      #include "llvm/Support/Casting.h"

#. In the base class, introduce an enum which discriminates all of the
   different concrete classes in the hierarchy, and stash the enum value
   somewhere in the base class.

   Here is the code after introducing this change:

   .. code-block:: c++

       class Shape {
       public:
      +  /// Discriminator for LLVM-style RTTI (dyn_cast<> et al.)
      +  enum ShapeKind {
      +    SK_Square,
      +    SK_Circle
      +  };
      +private:
      +  const ShapeKind Kind;
      +public:
      +  ShapeKind getKind() const { return Kind; }
      +
         Shape() {}
         virtual double computeArea() = 0;
       };

   You will usually want to keep the ``Kind`` member encapsulated and
   private, but let the enum ``ShapeKind`` be public along with providing a
   ``getKind()`` method. This is convenient for clients so that they can do
   a ``switch`` over the enum.

   A common naming convention is that these enums are "kind"s, to avoid
   ambiguity with the words "type" or "class", which have overloaded meanings
   in many contexts within LLVM. Sometimes there will be a natural name for
   it, like "opcode". Don't bikeshed over this; when in doubt use ``Kind``.

   You might wonder why the ``ShapeKind`` enum doesn't have an entry for
   ``Shape``. The reason is that since ``Shape`` is abstract
   (``computeArea() = 0;``), you will never actually have non-derived
   instances of exactly that class (only subclasses). See `Concrete Bases
   and Deeper Hierarchies`_ for information on how to deal with
   non-abstract bases. It's worth mentioning here that unlike
   ``dynamic_cast<>``, LLVM-style RTTI can be used (and is often used) for
   classes that don't have v-tables.

#. Next, you need to make sure that the ``Kind`` gets initialized to the
   value corresponding to the dynamic type of the class. Typically, you will
   want to have it be an argument to the constructor of the base class, and
   then pass in the respective ``XXXKind`` from subclass constructors.

   Here is the code after that change:

   .. code-block:: c++

       class Shape {
       public:
         /// Discriminator for LLVM-style RTTI (dyn_cast<> et al.)
         enum ShapeKind {
           SK_Square,
           SK_Circle
         };
       private:
         const ShapeKind Kind;
       public:
         ShapeKind getKind() const { return Kind; }

      -  Shape() {}
      +  Shape(ShapeKind K) : Kind(K) {}
         virtual double computeArea() = 0;
       };

       class Square : public Shape {
         double SideLength;
       public:
      -  Square(double S) : SideLength(S) {}
      +  Square(double S) : Shape(SK_Square), SideLength(S) {}
         double computeArea() override;
       };

       class Circle : public Shape {
         double Radius;
       public:
      -  Circle(double R) : Radius(R) {}
      +  Circle(double R) : Shape(SK_Circle), Radius(R) {}
         double computeArea() override;
       };

#. Finally, you need to inform LLVM's RTTI templates how to dynamically
   determine the type of a class (i.e. whether the ``isa<>``/``dyn_cast<>``
   should succeed). The default "99.9% of use cases" way to accomplish this
   is through a small static member function ``classof``. In order to have
   proper context for an explanation, we will display this code first, and
   then below describe each part:

   .. code-block:: c++

       class Shape {
       public:
         /// Discriminator for LLVM-style RTTI (dyn_cast<> et al.)
         enum ShapeKind {
           SK_Square,
           SK_Circle
         };
       private:
         const ShapeKind Kind;
       public:
         ShapeKind getKind() const { return Kind; }

         Shape(ShapeKind K) : Kind(K) {}
         virtual double computeArea() = 0;
       };

       class Square : public Shape {
         double SideLength;
       public:
         Square(double S) : Shape(SK_Square), SideLength(S) {}
         double computeArea() override;
      +
      +  static bool classof(const Shape *S) {
      +    return S->getKind() == SK_Square;
      +  }
       };

       class Circle : public Shape {
         double Radius;
       public:
         Circle(double R) : Shape(SK_Circle), Radius(R) {}
         double computeArea() override;
      +
      +  static bool classof(const Shape *S) {
      +    return S->getKind() == SK_Circle;
      +  }
       };

   The job of ``classof`` is to dynamically determine whether an object of
   a base class is in fact of a particular derived class. In order to
   downcast a type ``Base`` to a type ``Derived``, there needs to be a
   ``classof`` in ``Derived`` which will accept an object of type ``Base``.

   To be concrete, consider the following code:

   .. code-block:: c++

      Shape *S = ...;
      if (isa<Circle>(S)) {
        /* do something ... */
      }

   The ``isa<>`` test in this code will eventually boil down, after
   template instantiation and some other machinery, to a check roughly
   like ``Circle::classof(S)``. For more information, see
   :ref:`classof-contract`.

   The argument to ``classof`` should always be an *ancestor* class because
   the implementation has logic to allow and optimize away
   upcasts/up-``isa<>``'s automatically. It is as though every class
   ``Foo`` automatically has a ``classof`` like:

   .. code-block:: c++

      class Foo {
        [...]
        template <class T>
        static bool classof(const T *,
                            typename ::std::enable_if<
                              ::std::is_base_of<Foo, T>::value
                            >::type* = nullptr) { return true; }
        [...]
      };

   Note that this is the reason that we did not need to introduce a
   ``classof`` into ``Shape``: all relevant classes derive from ``Shape``,
   and ``Shape`` itself is abstract (has no entry in the ``Kind`` enum),
   so this notional inferred ``classof`` is all we need. See `Concrete
   Bases and Deeper Hierarchies`_ for more information about how to extend
   this example to more general hierarchies.
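
To see how the four steps fit together without an LLVM checkout, here is a
minimal self-contained sketch. The ``sketch::isa`` and ``sketch::dyn_cast``
templates are simplified, hypothetical stand-ins written for this example;
they are not the real templates from ``llvm/Support/Casting.h``, which also
handle references, ``const`` pointers, and more. The ``getRadius()`` accessor,
the virtual destructor, and the ``radiusOrNegative`` helper are likewise
assumptions added only for illustration.

```cpp
#include <type_traits>

namespace sketch {
// Upcast (To is a base of From, or the same class): always true, and
// To::classof is never touched. This mirrors the "notional" classof above.
template <class To, class From>
bool isaImpl(const From *, std::true_type) { return true; }

// Anything else is forwarded to the author-provided To::classof().
template <class To, class From>
bool isaImpl(const From *Val, std::false_type) { return To::classof(Val); }

// Hypothetical stand-in for llvm::isa<>.
template <class To, class From>
bool isa(const From *Val) {
  return isaImpl<To>(Val, std::is_base_of<To, From>());
}

// Hypothetical stand-in for llvm::dyn_cast<>: null when the check fails.
template <class To, class From>
To *dyn_cast(From *Val) {
  return isa<To>(Val) ? static_cast<To *>(Val) : nullptr;
}
} // namespace sketch

class Shape {
public:
  enum ShapeKind { SK_Square, SK_Circle };

private:
  const ShapeKind Kind;

public:
  ShapeKind getKind() const { return Kind; }
  explicit Shape(ShapeKind K) : Kind(K) {}
  virtual ~Shape() = default;
  virtual double computeArea() = 0;
};

class Square : public Shape {
  double SideLength;

public:
  explicit Square(double S) : Shape(SK_Square), SideLength(S) {}
  double computeArea() override { return SideLength * SideLength; }
  static bool classof(const Shape *S) { return S->getKind() == SK_Square; }
};

class Circle : public Shape {
  double Radius;

public:
  explicit Circle(double R) : Shape(SK_Circle), Radius(R) {}
  double computeArea() override { return 3.14159265358979 * Radius * Radius; }
  double getRadius() const { return Radius; }
  static bool classof(const Shape *S) { return S->getKind() == SK_Circle; }
};

// Returns the radius if S is a Circle, or -1.0 otherwise.
double radiusOrNegative(Shape *S) {
  if (Circle *C = sketch::dyn_cast<Circle>(S))
    return C->getRadius();
  return -1.0;
}
```

In this sketch, ``sketch::isa<Shape>(&SomeCircle)`` never calls a
``classof`` at all, which is why ``Shape`` does not need one.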

Although for this small example setting up LLVM-style RTTI seems like a lot
of "boilerplate", if your classes are doing anything interesting then this
will end up being a tiny fraction of the code.
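
Two remarks from step 2 can be illustrated together: clients can ``switch``
over the public enum, and the scheme does not require a v-table. The sketch
below is one possible arrangement, not a prescribed one. The free function
``computeArea``, the accessors ``getSideLength()`` and ``getRadius()``, and
the protected base constructor are assumptions made for this example.

```cpp
#include <type_traits>

// No virtual functions anywhere; the Kind field alone identifies the type.
class Shape {
public:
  enum ShapeKind { SK_Square, SK_Circle };
  ShapeKind getKind() const { return Kind; }

protected:
  explicit Shape(ShapeKind K) : Kind(K) {}

private:
  const ShapeKind Kind;
};

class Square : public Shape {
  double SideLength;

public:
  explicit Square(double S) : Shape(SK_Square), SideLength(S) {}
  double getSideLength() const { return SideLength; }
  static bool classof(const Shape *S) { return S->getKind() == SK_Square; }
};

class Circle : public Shape {
  double Radius;

public:
  explicit Circle(double R) : Shape(SK_Circle), Radius(R) {}
  double getRadius() const { return Radius; }
  static bool classof(const Shape *S) { return S->getKind() == SK_Circle; }
};

static_assert(!std::is_polymorphic<Shape>::value,
              "this sketch deliberately has no v-table");

// Client-side dispatch: switch on the discriminator, then downcast.
double computeArea(const Shape &S) {
  switch (S.getKind()) {
  case Shape::SK_Square: {
    const Square &Sq = static_cast<const Square &>(S);
    return Sq.getSideLength() * Sq.getSideLength();
  }
  case Shape::SK_Circle: {
    const Circle &C = static_cast<const Circle &>(S);
    return 3.14159265358979 * C.getRadius() * C.getRadius();
  }
  }
  return 0.0; // Not reached for a valid ShapeKind.
}
```

Listing every enumerator without a ``default`` label lets compilers that
warn about unhandled enumerators point out a ``switch`` that was not updated
when a new kind is added.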

Concrete Bases and Deeper Hierarchies
=====================================

For concrete bases (i.e. non-abstract interior nodes of the inheritance
tree), the ``Kind`` check inside ``classof`` needs to be a bit more
complicated. The situation differs from the example above in that

* Since the class is concrete, it must itself have an entry in the ``Kind``
  enum because it is possible to have objects with this class as a dynamic
  type.

* Since the class has children, the check inside ``classof`` must take them
  into account.

Say that ``SpecialSquare`` and ``OtherSpecialSquare`` derive
from ``Square``, and so ``ShapeKind`` becomes:

.. code-block:: c++

    enum ShapeKind {
      SK_Square,
   +  SK_SpecialSquare,
   +  SK_OtherSpecialSquare,
      SK_Circle
    };

Then in ``Square``, we would need to modify the ``classof`` like so:

.. code-block:: c++

   -  static bool classof(const Shape *S) {
   -    return S->getKind() == SK_Square;
   -  }
   +  static bool classof(const Shape *S) {
   +    return S->getKind() >= SK_Square &&
   +           S->getKind() <= SK_OtherSpecialSquare;
   +  }

The reason that we need to test a range like this instead of just equality
is that ``SpecialSquare`` and ``OtherSpecialSquare`` each "is-a"
``Square``, and so ``classof`` needs to return ``true`` for them.

This approach can be made to scale to arbitrarily deep hierarchies. The
trick is that you arrange the enum values so that they correspond to a
preorder traversal of the class hierarchy tree. With that arrangement, all
subclass tests can be done with two comparisons as shown above. If you just
list the class hierarchy like a list of bullet points, you'll get the
ordering right::

   | Shape
     | Square
       | SpecialSquare
       | OtherSpecialSquare
     | Circle
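
The following sketch spells out the tree above as compilable code. One
detail the text does not cover is how a subclass of ``Square`` gets its own
kind into the base class; the protected ``Square(ShapeKind)`` constructor
used here is an assumption of this example, not something the tutorial
prescribes. Data members and ``computeArea()`` are left out to keep the
focus on ``classof``.

```cpp
class Shape {
public:
  // Enumerators follow a preorder walk of the inheritance tree.
  enum ShapeKind {
    SK_Square,
    SK_SpecialSquare,
    SK_OtherSpecialSquare,
    SK_Circle
  };
  ShapeKind getKind() const { return Kind; }

protected:
  explicit Shape(ShapeKind K) : Kind(K) {}

private:
  const ShapeKind Kind;
};

class Square : public Shape {
public:
  Square() : Shape(SK_Square) {}

  // Concrete base with children: accept its own kind and every child kind.
  static bool classof(const Shape *S) {
    return S->getKind() >= SK_Square &&
           S->getKind() <= SK_OtherSpecialSquare;
  }

protected:
  // Lets subclasses forward their own kind up to Shape.
  explicit Square(ShapeKind K) : Shape(K) {}
};

class SpecialSquare : public Square {
public:
  SpecialSquare() : Square(SK_SpecialSquare) {}
  static bool classof(const Shape *S) {
    return S->getKind() == SK_SpecialSquare;
  }
};

class OtherSpecialSquare : public Square {
public:
  OtherSpecialSquare() : Square(SK_OtherSpecialSquare) {}
  static bool classof(const Shape *S) {
    return S->getKind() == SK_OtherSpecialSquare;
  }
};

class Circle : public Shape {
public:
  Circle() : Shape(SK_Circle) {}
  static bool classof(const Shape *S) { return S->getKind() == SK_Circle; }
};
```

With these definitions, ``Square::classof`` accepts a ``SpecialSquare`` or
an ``OtherSpecialSquare`` but rejects a ``Circle``, while each leaf class
still uses a plain equality test.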

A Bug to be Aware Of
--------------------

The example just given opens the door to bugs where the ``classof``\s are
not updated to match the ``Kind`` enum when adding (or removing) classes to
(from) the hierarchy.

Continuing the example above, suppose we add a ``SomewhatSpecialSquare`` as
a subclass of ``Square``, and update the ``ShapeKind`` enum like so:

.. code-block:: c++

    enum ShapeKind {
      SK_Square,
      SK_SpecialSquare,
      SK_OtherSpecialSquare,
   +  SK_SomewhatSpecialSquare,
      SK_Circle
    };

Now, suppose that we forget to update ``Square::classof()``, so it still
looks like:

.. code-block:: c++

   static bool classof(const Shape *S) {
     // BUG: Returns false when S->getKind() == SK_SomewhatSpecialSquare,
     // even though SomewhatSpecialSquare "is a" Square.
     return S->getKind() >= SK_Square &&
            S->getKind() <= SK_OtherSpecialSquare;
   }

As the comment indicates, this code contains a bug. A straightforward and
non-clever way to avoid this is to introduce an explicit ``SK_LastSquare``
entry in the enum when adding the first subclass(es). This entry only marks
the end of the range; no class uses it as its own kind. For example, we
could rewrite the example at the beginning of `Concrete Bases and Deeper
Hierarchies`_ as:

.. code-block:: c++

    enum ShapeKind {
      SK_Square,
   +  SK_SpecialSquare,
   +  SK_OtherSpecialSquare,
   +  SK_LastSquare,
      SK_Circle
    };
   ...
   // Square::classof()
   -  static bool classof(const Shape *S) {
   -    return S->getKind() == SK_Square;
   -  }
   +  static bool classof(const Shape *S) {
   +    return S->getKind() >= SK_Square &&
   +           S->getKind() <= SK_LastSquare;
   +  }

Then, adding new subclasses is easy, as long as the new entry goes before
``SK_LastSquare``:

.. code-block:: c++

    enum ShapeKind {
      SK_Square,
      SK_SpecialSquare,
      SK_OtherSpecialSquare,
   +  SK_SomewhatSpecialSquare,
      SK_LastSquare,
      SK_Circle
    };

Notice that ``Square::classof`` does not need to be changed.
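
The range test can also be checked at compile time. In the sketch below,
``isSquareKind`` is a hypothetical free function standing in for the
comparison that ``Square::classof`` performs on ``S->getKind()``; pulling it
out of the class is an assumption made only so that ``static_assert`` can
exercise it without constructing any objects.

```cpp
enum ShapeKind {
  SK_Square,
  SK_SpecialSquare,
  SK_OtherSpecialSquare,
  SK_SomewhatSpecialSquare, // Newly added; nothing below needs to change.
  SK_LastSquare,            // Marker only; no class uses this kind.
  SK_Circle
};

// The same two comparisons that Square::classof() applies to getKind().
constexpr bool isSquareKind(ShapeKind K) {
  return K >= SK_Square && K <= SK_LastSquare;
}

// These fail to compile if a Square subclass kind lands outside the range.
static_assert(isSquareKind(SK_SpecialSquare), "must be in the Square range");
static_assert(isSquareKind(SK_OtherSpecialSquare),
              "must be in the Square range");
static_assert(isSquareKind(SK_SomewhatSpecialSquare),
              "must be in the Square range");
static_assert(!isSquareKind(SK_Circle), "Circle is not a Square");
```

A check like this turns a misplaced enumerator, for example one appended
after ``SK_LastSquare``, into a build error rather than a silent ``isa<>``
failure at run time.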

.. _classof-contract:

The Contract of ``classof``
---------------------------

To be more precise, let ``classof`` be inside a class ``C``. Then the
contract for ``classof`` is "return ``true`` if the dynamic type of the
argument is-a ``C``". As long as your implementation fulfills this
contract, you can tweak and optimize it as much as you want.

For example, LLVM-style RTTI can work fine in the presence of
multiple-inheritance by defining an appropriate ``classof``.
An example of this in practice is
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ vs.
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
inside Clang.
The ``Decl`` hierarchy is done very similarly to the example setup
demonstrated in this tutorial.
The key part is how to then incorporate ``DeclContext``: all that is needed
is in ``bool DeclContext::classof(const Decl *)``, which asks the question
"Given a ``Decl``, how can I determine if it is-a ``DeclContext``?".
It answers this with a simple switch over the set of ``Decl`` "kinds",
returning ``true`` for the ones that are known to be ``DeclContext``\s.
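
The shape of such a ``classof`` can be sketched with a toy hierarchy. The
names ``Node``, ``Scope``, ``Variable``, ``Function`` and ``Namespace`` below
are invented for this example and are not Clang's classes; the code only
illustrates the idea of a second base class answering "is this ``Node`` also
a ``Scope``?" by switching over the kinds. Converting a ``Node *`` into a
``Scope *`` needs additional machinery that is out of scope here.

```cpp
class Node {
public:
  enum NodeKind { NK_Variable, NK_Function, NK_Namespace };
  NodeKind getKind() const { return Kind; }

protected:
  explicit Node(NodeKind K) : Kind(K) {}

private:
  const NodeKind Kind;
};

// A second base class that lives outside the Node inheritance chain.
class Scope {
public:
  // "Given a Node, is it also a Scope?" Answered by listing the kinds.
  static bool classof(const Node *N) {
    switch (N->getKind()) {
    case Node::NK_Function:
    case Node::NK_Namespace:
      return true;
    case Node::NK_Variable:
      return false;
    }
    return false; // Not reached for a valid NodeKind.
  }
};

class Variable : public Node {
public:
  Variable() : Node(NK_Variable) {}
  static bool classof(const Node *N) { return N->getKind() == NK_Variable; }
};

class Function : public Node, public Scope {
public:
  Function() : Node(NK_Function) {}
  static bool classof(const Node *N) { return N->getKind() == NK_Function; }
};

class Namespace : public Node, public Scope {
public:
  Namespace() : Node(NK_Namespace) {}
  static bool classof(const Node *N) { return N->getKind() == NK_Namespace; }
};
```

The ``switch`` fulfills the contract stated above: it returns ``true``
exactly for the kinds whose classes also derive from ``Scope``.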

.. TODO::

   Touch on some of the more advanced features, like ``isa_impl`` and
   ``simplify_type``. However, those two need reference documentation in
   the form of doxygen comments as well. We need the doxygen so that we can
   say "for full details, see http://llvm.org/doxygen/..."

Rules of Thumb
==============

#. The ``Kind`` enum should have one entry per concrete class, ordered
   according to a preorder traversal of the inheritance tree.
#. The argument to ``classof`` should be a ``const Base *``, where ``Base``
   is some ancestor in the inheritance hierarchy. The argument should
   *never* be a derived class or the class itself: the template machinery
   for ``isa<>`` already handles this case and optimizes it.
#. For each class in the hierarchy that has no children, implement a
   ``classof`` that checks only against its ``Kind``.
#. For each class in the hierarchy that has children, implement a
   ``classof`` that checks a range from the first child's ``Kind`` to the
   last child's ``Kind`` (starting at the class's own ``Kind`` if the class
   is concrete).