=========================
Driver Design & Internals
=========================

.. contents::
   :local:

Introduction
============

This document describes the Clang driver. The purpose of this document
is to describe both the motivation and design goals for the driver, as
well as details of the internal implementation.

Features and Goals
==================

The Clang driver is intended to be a production quality compiler driver
providing access to the Clang compiler and tools, with a command line
interface which is compatible with the gcc driver.

Although the driver is part of and driven by the Clang project, it is
logically a separate tool which shares many of the same goals as Clang:

.. contents:: Features
   :local:

GCC Compatibility
-----------------

The number one goal of the driver is to ease the adoption of Clang by
allowing users to drop Clang into a build system which was designed to
call GCC. Although this makes the driver much more complicated than
might otherwise be necessary, we decided that being very compatible with
the gcc command line interface was worth it in order to allow users to
quickly test clang on their projects.

Flexible
--------

The driver was designed to be flexible and easily accommodate new uses
as we grow the clang and LLVM infrastructure. As one example, the driver
can easily support the introduction of tools which have an integrated
assembler, something we hope to add to LLVM in the future.

Similarly, most of the driver functionality is kept in a library which
can be used to build other tools which want to implement or accept a
gcc-like interface.

Low Overhead
------------

The driver should have as little overhead as possible. In practice, we
found that the gcc driver by itself incurred a small but meaningful
overhead when compiling many small files. The driver doesn't do much
work compared to a compilation, but we have tried to keep it as
efficient as possible by following a few simple principles:

-  Avoid memory allocation and string copying when possible.
-  Don't parse arguments more than once.
-  Provide a few simple interfaces for efficiently searching arguments.

Simple
------

Finally, the driver was designed to be "as simple as possible", given
the other goals. Notably, trying to be completely compatible with the
gcc driver adds a significant amount of complexity. However, the design
of the driver attempts to mitigate this complexity by dividing the
process into a number of independent stages instead of a single
monolithic task.

Internal Design and Implementation
==================================

.. contents::
   :local:
   :depth: 1

Internals Introduction
----------------------

In order to satisfy the stated goals, the driver was designed to
completely subsume the functionality of the gcc executable; that is, the
driver should not need to delegate to gcc to perform subtasks. On
Darwin, this implies that the Clang driver also subsumes the gcc
driver-driver, which is used to implement support for building universal
images (binaries and object files). This also implies that the driver
should be able to call the language specific compilers (e.g. cc1)
directly, which means that it must have enough information to forward
command line arguments to child processes correctly.

Design Overview
---------------

The diagram below shows the significant components of the driver
architecture and how they relate to one another. The orange components
represent concrete data structures built by the driver, the green
components indicate conceptually distinct stages which manipulate these
data structures, and the blue components are important helper classes.

.. image:: DriverArchitecture.png
   :align: center
   :alt: Driver Architecture Diagram

Driver Stages
-------------

The driver functionality is conceptually divided into five stages:

#. **Parse: Option Parsing**

   The command line argument strings are decomposed into arguments
   (``Arg`` instances). The driver expects to understand all available
   options, although there is some facility for just passing certain
   classes of options through (like ``-Wl,``).

   Each argument corresponds to exactly one abstract ``Option``
   definition, which describes how the option is parsed along with some
   additional metadata. The Arg instances themselves are lightweight and
   merely contain enough information for clients to determine which
   option they correspond to and their values (if they have additional
   parameters).

   For example, a command line like ``-Ifoo -I foo`` would parse to two
   Arg instances (a JoinedArg and a SeparateArg instance), but each
   would refer to the same Option.

   Options are lazily created in order to avoid populating all Option
   classes when the driver is loaded. Most of the driver code only needs
   to deal with options by their unique ID (e.g., ``options::OPT_I``).

   Arg instances themselves do not generally store the values of
   parameters. In many cases, this would simply result in creating
   unnecessary string copies. Instead, Arg instances are always embedded
   inside an ArgList structure, which contains the original vector of
   argument strings. Each Arg itself only needs to contain an index into
   this vector instead of storing its values directly.

   The clang driver can dump the results of this stage using the
   ``-ccc-print-options`` flag (which must precede any actual command
   line arguments). For example:

   .. code-block:: console

      $ clang -ccc-print-options -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c
      Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
      Option 1 - Name: "-Wa,", Values: {"-fast"}
      Option 2 - Name: "-I", Values: {"foo"}
      Option 3 - Name: "-I", Values: {"foo"}
      Option 4 - Name: "<input>", Values: {"t.c"}

   After this stage is complete, the command line should be broken down
   into well defined option objects with their appropriate parameters.
   Subsequent stages should rarely, if ever, need to do any string
   processing.

#. **Pipeline: Compilation Job Construction**

   Once the arguments are parsed, the tree of subprocess jobs needed for
   the desired compilation sequence is constructed. This involves
   determining the input files and their types, what work is to be done
   on them (preprocess, compile, assemble, link, etc.), and constructing
   a list of Action instances for each task. The result is a list of one
   or more top-level actions, each of which generally corresponds to a
   single output (for example, an object or linked executable).

   The majority of Actions correspond to actual tasks; however, there
   are two special Actions. The first is InputAction, which simply
   serves to adapt an input argument for use as an input to other
   Actions. The second is BindArchAction, which conceptually alters the
   architecture to be used for all of its input Actions.

   The clang driver can dump the results of this stage using the
   ``-ccc-print-phases`` flag. For example:

   .. code-block:: console

      $ clang -ccc-print-phases -x c t.c -x assembler t.s
      0: input, "t.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: input, "t.s", assembler
      5: assembler, {4}, object
      6: linker, {3, 5}, image

   Here the driver is constructing seven distinct actions: four to
   compile the "t.c" input into an object file, two to assemble the
   "t.s" input, and one to link them together.

   A rather different compilation pipeline is shown here; in this
   example there are two top-level actions to compile the input files
   into two separate object files, where each object file is built using
   ``lipo`` to merge results built for two separate architectures.

   .. code-block:: console

      $ clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c
      0: input, "t0.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: bind-arch, "i386", {3}, object
      5: bind-arch, "x86_64", {3}, object
      6: lipo, {4, 5}, object
      7: input, "t1.c", c
      8: preprocessor, {7}, cpp-output
      9: compiler, {8}, assembler
      10: assembler, {9}, object
      11: bind-arch, "i386", {10}, object
      12: bind-arch, "x86_64", {10}, object
      13: lipo, {11, 12}, object

   After this stage is complete, the compilation process is divided into
   a simple set of actions which need to be performed to produce
   intermediate or final outputs (in some cases, like ``-fsyntax-only``,
   there is no "real" final output). Phases are well known compilation
   steps, such as "preprocess", "compile", "assemble", "link", etc.

#. **Bind: Tool & Filename Selection**

   This stage (in conjunction with the Translate stage) turns the tree
   of Actions into a list of actual subprocesses to run. Conceptually,
   the driver performs a top-down matching to assign Action(s) to Tools.
   The ToolChain is responsible for selecting the tool to perform a
   particular action; once selected, the driver interacts with the tool
   to see if it can match additional actions (for example, by having an
   integrated preprocessor).

   Once Tools have been selected for all actions, the driver determines
   how the tools should be connected (for example, using an in-process
   module, pipes, temporary files, or user provided filenames). If an
   output file is required, the driver also computes the appropriate
   file name (the suffix and file location depend on the input types and
   options such as ``-save-temps``).

   The driver interacts with a ToolChain to perform the Tool bindings.
   Each ToolChain contains information about all the tools needed for
   compilation for a particular architecture, platform, and operating
   system. A single driver invocation may query multiple ToolChains
   during one compilation in order to interact with tools for separate
   architectures.

   The results of this stage are not computed directly, but the driver
   can print the results via the ``-ccc-print-bindings`` option. For
   example:

   .. code-block:: console

      $ clang -ccc-print-bindings -arch i386 -arch ppc t0.c
      # "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
      # "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
      # "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
      # "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
      # "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
      # "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
      # "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"

   This shows the tool chain, tool, inputs and outputs which have been
   bound for this compilation sequence. Here clang is being used to
   compile t0.c on the i386 architecture and darwin specific versions of
   the tools are being used to assemble and link the result, but generic
   gcc versions of the tools are being used on PowerPC.

#. **Translate: Tool Specific Argument Translation**

   Once a Tool has been selected to perform a particular Action, the
   Tool must construct concrete Jobs which will be executed during
   compilation. The main work is in translating from the gcc style
   command line options to whatever options the subprocess expects.

   Some tools, such as the assembler, only interact with a handful of
   arguments and just determine the path of the executable to call and
   pass on their input and output arguments. Others, like the compiler
   or the linker, may translate a large number of arguments in addition.

   The ArgList class provides a number of simple helper methods to
   assist with translating arguments; for example, to pass on only the
   last of the arguments corresponding to some option, or all arguments
   for an option.

   The result of this stage is a list of Jobs (executable paths and
   argument strings) to execute.

#. **Execute**

   Finally, the compilation pipeline is executed. This is mostly
   straightforward, although there is some interaction with options like
   ``-pipe``, ``-pass-exit-codes`` and ``-time``.

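The index-based storage described in the Parse stage can be illustrated
with a small, self-contained sketch. All names below (``OptionID``,
``ArgKind``, ``Arg``, ``ArgList``, ``getValue``) are hypothetical
stand-ins chosen for this example and only handle ``-I`` and plain
inputs; they are not the driver's actual classes or signatures.

.. code-block:: c++

   #include <cstddef>
   #include <string>
   #include <string_view>
   #include <utility>
   #include <vector>

   // Hypothetical model of the Parse stage; not the real driver API.
   enum class OptionID { Input, I };
   enum class ArgKind { Positional, Joined, Separate };

   // An Arg records which option it matched and where its text lives,
   // but never copies the value string.
   struct Arg {
     OptionID Opt;
     ArgKind Kind;
     std::size_t Index; // index of this argument's first string
   };

   class ArgList {
   public:
     explicit ArgList(std::vector<std::string> Argv)
         : Strings(std::move(Argv)) {
       parse();
     }

     const std::vector<Arg> &args() const { return Args; }

     // Recover the value on demand as a view into the original strings.
     std::string_view getValue(const Arg &A) const {
       switch (A.Kind) {
       case ArgKind::Joined:
         return std::string_view(Strings[A.Index]).substr(2); // skip "-I"
       case ArgKind::Separate:
         return Strings[A.Index + 1];
       case ArgKind::Positional:
         return Strings[A.Index];
       }
       return {};
     }

   private:
     // Single pass over the strings; each string is examined once.
     void parse() {
       for (std::size_t I = 0; I < Strings.size(); ++I) {
         const std::string &S = Strings[I];
         if (S == "-I" && I + 1 < Strings.size()) {
           Args.push_back({OptionID::I, ArgKind::Separate, I});
           ++I; // the next string is this argument's value
         } else if (S.rfind("-I", 0) == 0) {
           Args.push_back({OptionID::I, ArgKind::Joined, I});
         } else {
           Args.push_back({OptionID::Input, ArgKind::Positional, I});
         }
       }
     }

     std::vector<std::string> Strings; // the original argument strings
     std::vector<Arg> Args;
   };

In this sketch, ``-Ifoo`` and ``-I foo`` produce two ``Arg`` values of
different kinds that share the same ``OptionID``, and both values are
read back from the original string vector rather than from a copy.
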
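The Pipeline stage can be sketched in the same spirit. The code below is
a simplified model, assuming only two input types (``c`` and
``assembler``) and a final link step; ``Action``, ``InputFile`` and
``buildActions`` are hypothetical names, and the real driver represents
Actions as a class hierarchy rather than as strings.

.. code-block:: c++

   #include <cstddef>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical model of the Pipeline stage; not the real driver API.
   struct Action {
     std::string Kind;                // "input", "preprocessor", ...
     std::vector<std::size_t> Inputs; // indices of the actions consumed
     std::string OutputType;          // "c", "cpp-output", "object", ...
   };

   struct InputFile {
     std::string Name;
     std::string Type; // "c" or "assembler" in this sketch
   };

   std::vector<Action> buildActions(const std::vector<InputFile> &Files) {
     std::vector<Action> Actions;
     std::vector<std::size_t> Objects;

     // Append an action and return its index in the list.
     auto Add = [&Actions](std::string Kind, std::vector<std::size_t> In,
                           std::string Type) {
       Actions.push_back({std::move(Kind), std::move(In), std::move(Type)});
       return Actions.size() - 1;
     };

     for (const InputFile &F : Files) {
       std::size_t Cur = Add("input", {}, F.Type);
       if (F.Type == "c") {
         // C sources need preprocessing and compilation first.
         Cur = Add("preprocessor", {Cur}, "cpp-output");
         Cur = Add("compiler", {Cur}, "assembler");
       }
       Cur = Add("assembler", {Cur}, "object");
       Objects.push_back(Cur);
     }

     // One top-level action links every object together.
     Add("linker", Objects, "image");
     return Actions;
   }

Given the inputs ``t.c`` (type ``c``) and ``t.s`` (type ``assembler``),
this sketch builds seven actions whose final ``linker`` action consumes
actions 3 and 5, which mirrors the shape of the first
``-ccc-print-phases`` listing above.
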
Additional Notes
----------------

The Compilation Object
^^^^^^^^^^^^^^^^^^^^^^

The driver constructs a Compilation object for each set of command line
arguments. The Driver itself is intended to be invariant during
construction of a Compilation; an IDE should be able to construct a
single long lived driver instance to use for an entire build, for
example.

The Compilation object holds information that is particular to each
compilation sequence. For example, it holds the list of temporary files
used (which must be removed once compilation is finished) and the list
of result files (which should be removed if compilation fails).

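The two cleanup rules can be expressed as a minimal sketch. The class
below is a hypothetical stand-in (its name mirrors the concept, but the
methods ``addTempFile``, ``addResultFile`` and ``filesToRemove`` are
invented for this example), and it only computes which files would be
removed rather than touching the file system.

.. code-block:: c++

   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical per-invocation state; not the real driver class.
   class Compilation {
   public:
     void addTempFile(std::string Path) {
       TempFiles.push_back(std::move(Path));
     }
     void addResultFile(std::string Path) {
       ResultFiles.push_back(std::move(Path));
     }

     // Temporaries are always removed; result files are removed only
     // when the compilation failed.
     std::vector<std::string> filesToRemove(bool Succeeded) const {
       std::vector<std::string> Out = TempFiles;
       if (!Succeeded)
         Out.insert(Out.end(), ResultFiles.begin(), ResultFiles.end());
       return Out;
     }

   private:
     std::vector<std::string> TempFiles;
     std::vector<std::string> ResultFiles;
   };

Keeping this state on a per-compilation object, rather than on the
driver, is what lets a single long lived driver instance serve many
compilations.
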
Unified Parsing & Pipelining
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Parsing and pipelining both occur without reference to a Compilation
instance. This is by design; the driver expects that both of these
phases are platform neutral, with a few very well defined exceptions
such as whether the platform uses a driver driver.

ToolChain Argument Translation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to match gcc very closely, the clang driver currently allows
tool chains to perform their own translation of the argument list (into
a new ArgList data structure). Although this allows the clang driver to
match gcc easily, it also makes the driver operation much harder to
understand (since the Tools stop seeing some arguments the user
provided, and see new ones instead).

For example, on Darwin ``-gfull`` gets translated into two separate
arguments, ``-g`` and ``-fno-eliminate-unused-debug-symbols``. Trying to
write Tool logic to do something with ``-gfull`` will not work, because
Tool argument translation is done after the tool chain has already
translated the arguments.

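The effect of that ordering can be seen in a small sketch. The function
names ``translateForDarwin`` and ``hasArg`` are hypothetical, and plain
strings stand in for the driver's argument objects; only the ``-gfull``
mapping itself is taken from the text above.

.. code-block:: c++

   #include <algorithm>
   #include <string>
   #include <vector>

   // Hypothetical tool chain translation step: rewrites -gfull into the
   // two arguments named in the text and passes everything else through.
   std::vector<std::string>
   translateForDarwin(const std::vector<std::string> &Args) {
     std::vector<std::string> Out;
     for (const std::string &A : Args) {
       if (A == "-gfull") {
         Out.push_back("-g");
         Out.push_back("-fno-eliminate-unused-debug-symbols");
       } else {
         Out.push_back(A);
       }
     }
     return Out;
   }

   // Hypothetical query a Tool might run on the list it receives.
   bool hasArg(const std::vector<std::string> &Args,
               const std::string &Name) {
     return std::find(Args.begin(), Args.end(), Name) != Args.end();
   }

A Tool that receives the translated list and calls
``hasArg(Translated, "-gfull")`` gets ``false`` even though the user
passed ``-gfull``; it has to look for ``-g`` and
``-fno-eliminate-unused-debug-symbols`` instead.
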
A long term goal is to remove this tool chain specific translation, and
instead force each tool to change its own logic to do the right thing on
the untranslated original arguments.

Unused Argument Warnings
^^^^^^^^^^^^^^^^^^^^^^^^

The driver operates by parsing all arguments but giving Tools the
opportunity to choose which arguments to pass on. One downside of this
infrastructure is that if the user misspells some option, or is confused
about which options to use, some command line arguments the user really
cared about may go unused. This problem is particularly important when
using clang as a compiler, since the clang compiler does not support
anywhere near all the options that gcc does, and we want to make sure
users know which ones are being used.

To support this, the driver maintains a bit for each argument that
records whether it has been used (at all) during the compilation. This
bit usually doesn't need to be set by hand, as the key ArgList accessors
will set it automatically.

When a compilation is successful (there are no errors), the driver
checks the bit and emits an "unused argument" warning for any arguments
which were never accessed. This is conservative (the argument may not
have been used to do what the user wanted) but still catches the most
obvious cases.

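A minimal sketch of this bookkeeping follows. ``TrackedArg``,
``TrackedArgList``, ``hasArg`` and ``unclaimed`` are hypothetical names
used only for illustration, and the policy of marking every matching
argument on a query is an assumption of the sketch rather than a
statement about the driver's accessors.

.. code-block:: c++

   #include <string>
   #include <vector>

   // Hypothetical argument record with a "used" bit.
   struct TrackedArg {
     std::string Spelling;
     bool Claimed;
   };

   class TrackedArgList {
   public:
     explicit TrackedArgList(const std::vector<std::string> &Strings) {
       for (const std::string &S : Strings)
         Args.push_back({S, false});
     }

     // Query accessor: reports whether the argument is present and, as
     // a side effect, marks every matching argument as used.
     bool hasArg(const std::string &Spelling) {
       bool Found = false;
       for (TrackedArg &A : Args) {
         if (A.Spelling == Spelling) {
           A.Claimed = true;
           Found = true;
         }
       }
       return Found;
     }

     // Arguments that no query ever touched; these would be reported
     // as unused after a successful compilation.
     std::vector<std::string> unclaimed() const {
       std::vector<std::string> Out;
       for (const TrackedArg &A : Args)
         if (!A.Claimed)
           Out.push_back(A.Spelling);
       return Out;
     }

   private:
     std::vector<TrackedArg> Args;
   };

Because the accessor sets the bit itself, Tool code only has to ask for
the arguments it cares about; anything left over (a misspelled flag, for
instance) shows up in ``unclaimed()`` without extra bookkeeping.
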
Relation to GCC Driver Concepts
-------------------------------

For those familiar with the gcc driver, this section provides a brief
overview of how things from the gcc driver map to the clang driver.

-  **Driver Driver**

   The driver driver is fully integrated into the clang driver. The
   driver simply constructs additional Actions to bind the architecture
   during the *Pipeline* phase. The tool chain specific argument
   translation is responsible for handling ``-Xarch_``.

   The one caveat is that this approach requires ``-Xarch_`` not be used
   to alter the compilation itself (for example, one cannot provide
   ``-S`` as an ``-Xarch_`` argument). The driver attempts to reject
   such invocations, and overall there isn't a good reason to abuse
   ``-Xarch_`` to that end in practice.

   The upside is that the clang driver is more efficient and does little
   extra work to support universal builds. It also provides better error
   reporting and UI consistency.

-  **Specs**

   The clang driver has no direct counterpart for "specs". The
   majority of the functionality that is embedded in specs is in the
   Tool specific argument translation routines. The parts of specs which
   control the compilation pipeline are generally part of the *Pipeline*
   stage.

-  **Toolchains**

   The gcc driver has no direct understanding of tool chains. Each gcc
   binary roughly corresponds to the information which is embedded
   inside a single ToolChain.

   The clang driver is intended to be portable and support complex
   compilation environments. All platform and tool chain specific code
   should be protected behind either abstract or well defined interfaces
   (such as whether the platform supports use as a driver driver).