parent
4cd449813a
commit
22a0599816
186
clang/README.txt
186
clang/README.txt
|
|
@ -1,178 +1,26 @@
|
||||||
//===----------------------------------------------------------------------===//
|
//===----------------------------------------------------------------------===//
|
||||||
// C Language Family Front-end
|
// C Language Family Front-end
|
||||||
//===----------------------------------------------------------------------===//
|
//===----------------------------------------------------------------------===//
|
||||||
Chris Lattner
|
|
||||||
|
|
||||||
I. Introduction:
|
Welcome to Clang. This is a compiler front-end for the C family of languages
|
||||||
|
(C, C++, Objective-C, and Objective-C++) which is built as part of the LLVM
|
||||||
clang: noun
|
compiler intrastructure project.
|
||||||
1. A loud, resonant, metallic sound.
|
|
||||||
2. The strident call of a crane or goose.
|
|
||||||
3. C-language family front-end toolkit.
|
|
||||||
|
|
||||||
The world needs better compiler tools, tools which are built as libraries. This
|
Unlike many other compiler frontends, Clang is useful for a number of things
|
||||||
design point allows reuse of the tools in new and novel ways. However, building
|
beyond just compiling code: we intend for Clang to be host to a number of
|
||||||
the tools as libraries isn't enough: they must have clean APIs, be as
|
different source level tools. One example of this is the Clang Static Analyzer.
|
||||||
decoupled from each other as possible, and be easy to modify/extend. This
|
|
||||||
requires clean layering, decent design, and avoiding tying the libraries to a
|
|
||||||
specific use. Oh yeah, did I mention that we want the resultant libraries to
|
|
||||||
be as fast as possible? :)
|
|
||||||
|
|
||||||
This front-end is built as a component of the LLVM toolkit that can be used
|
If you're interested in more (including how to build Clang) it is best to read
|
||||||
with the LLVM backend or independently of it. In this spirit, the API has been
|
the relevant web sites. Here are some pointers:
|
||||||
carefully designed as the following components:
|
|
||||||
|
|
||||||
libsupport - Basic support library, reused from LLVM.
|
|
||||||
|
|
||||||
libsystem - System abstraction library, reused from LLVM.
|
Information on Clang: http://clang.llvm.org/
|
||||||
|
Building and using Clang: http://clang.llvm.org/get_started.html
|
||||||
libbasic - Diagnostics, SourceLocations, SourceBuffer abstraction,
|
Clang Static Analyzer: http://clang-analyzer.llvm.org/
|
||||||
file system caching for input source files. This depends on
|
Information on the LLVM project: http://llvm.org/
|
||||||
libsupport and libsystem.
|
|
||||||
|
|
||||||
libast - Provides classes to represent the C AST, the C type system,
|
If you have questions or comments about Clang, a great place to discuss them is
|
||||||
builtin functions, and various helpers for analyzing and
|
on the Clang development mailing list:
|
||||||
manipulating the AST (visitors, pretty printers, etc). This
|
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
|
||||||
library depends on libbasic.
|
|
||||||
|
|
||||||
|
|
||||||
liblex - C/C++/ObjC lexing and preprocessing, identifier hash table,
|
|
||||||
pragma handling, tokens, and macros. This depends on libbasic.
|
|
||||||
|
|
||||||
libparse - C (for now) parsing and local semantic analysis. This library
|
|
||||||
invokes coarse-grained 'Actions' provided by the client to do
|
|
||||||
stuff (e.g. libsema builds ASTs). This depends on liblex.
|
|
||||||
|
|
||||||
libsema - Provides a set of parser actions to build a standardized AST
|
|
||||||
for programs. AST's are 'streamed' out a top-level declaration
|
|
||||||
at a time, allowing clients to use decl-at-a-time processing,
|
|
||||||
build up entire translation units, or even build 'whole
|
|
||||||
program' ASTs depending on how they use the APIs. This depends
|
|
||||||
on libast and libparse.
|
|
||||||
|
|
||||||
librewrite - Fast, scalable rewriting of source code. This operates on
|
|
||||||
the raw syntactic text of source code, allowing a client
|
|
||||||
to insert and delete text in very large source files using
|
|
||||||
the same source location information embedded in ASTs. This
|
|
||||||
is intended to be a low-level API that is useful for
|
|
||||||
higher-level clients and libraries such as code refactoring.
|
|
||||||
|
|
||||||
libanalysis - Source-level dataflow analysis useful for performing analyses
|
|
||||||
such as computing live variables. It also includes a
|
|
||||||
path-sensitive "graph-reachability" engine for writing
|
|
||||||
analyses that reason about different possible paths of
|
|
||||||
execution through source code. This is currently being
|
|
||||||
employed to write a set of checks for finding bugs in software.
|
|
||||||
|
|
||||||
libcodegen - Lower the AST to LLVM IR for optimization & codegen. Depends
|
|
||||||
on libast.
|
|
||||||
|
|
||||||
clang - An example driver, client of the libraries at various levels.
|
|
||||||
This depends on all these libraries, and on LLVM VMCore.
|
|
||||||
|
|
||||||
This front-end has been intentionally built as a DAG of libraries, making it
|
|
||||||
easy to reuse individual parts or replace pieces if desired. For example, to
|
|
||||||
build a preprocessor, you take the Basic and Lexer libraries. If you want an
|
|
||||||
indexer, you take those plus the Parser library and provide some actions for
|
|
||||||
indexing. If you want a refactoring, static analysis, or source-to-source
|
|
||||||
compiler tool, it makes sense to take those plus the AST building and semantic
|
|
||||||
analyzer library. Finally, if you want to use this with the LLVM backend,
|
|
||||||
you'd take these components plus the AST to LLVM lowering code.
|
|
||||||
|
|
||||||
In the future I hope this toolkit will grow to include new and interesting
|
|
||||||
components, including a C++ front-end, ObjC support, and a whole lot of other
|
|
||||||
things.
|
|
||||||
|
|
||||||
Finally, it should be pointed out that the goal here is to build something that
|
|
||||||
is high-quality and industrial-strength: all the obnoxious features of the C
|
|
||||||
family must be correctly supported (trigraphs, preprocessor arcana, K&R-style
|
|
||||||
prototypes, GCC/MS extensions, etc). It cannot be used if it is not 'real'.
|
|
||||||
|
|
||||||
|
|
||||||
II. Usage of clang driver:
|
|
||||||
|
|
||||||
* Basic Command-Line Options:
|
|
||||||
- Help: clang --help
|
|
||||||
- Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
|
|
||||||
- To make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
|
|
||||||
- Enable metric printing: -stats
|
|
||||||
|
|
||||||
* -fsyntax-only is currently the default mode.
|
|
||||||
|
|
||||||
* -E mode works the same way as GCC.
|
|
||||||
|
|
||||||
* -Eonly mode does all preprocessing, but does not print the output,
|
|
||||||
useful for timing the preprocessor.
|
|
||||||
|
|
||||||
* -fsyntax-only is currently partially implemented, lacking some
|
|
||||||
semantic analysis (some errors and warnings are not produced).
|
|
||||||
|
|
||||||
* -parse-noop parses code without building an AST. This is useful
|
|
||||||
for timing the cost of the parser without including AST building
|
|
||||||
time.
|
|
||||||
|
|
||||||
* -parse-ast builds ASTs, but doesn't print them. This is most
|
|
||||||
useful for timing AST building vs -parse-noop.
|
|
||||||
|
|
||||||
* -parse-ast-print pretty prints most expression and statements nodes.
|
|
||||||
|
|
||||||
* -parse-ast-check checks that diagnostic messages that are expected
|
|
||||||
are reported and that those which are reported are expected.
|
|
||||||
|
|
||||||
* -dump-cfg builds ASTs and then CFGs. CFGs are then pretty-printed.
|
|
||||||
|
|
||||||
* -view-cfg builds ASTs and then CFGs. CFGs are then visualized by
|
|
||||||
invoking Graphviz.
|
|
||||||
|
|
||||||
For more information on getting Graphviz to work with clang/LLVM,
|
|
||||||
see: http://llvm.org/docs/ProgrammersManual.html#ViewGraph
|
|
||||||
|
|
||||||
|
|
||||||
III. Current advantages over GCC:
|
|
||||||
|
|
||||||
* Column numbers are fully tracked (no 256 col limit, no GCC-style pruning).
|
|
||||||
* All diagnostics have column numbers, includes 'caret diagnostics', and they
|
|
||||||
highlight regions of interesting code (e.g. the LHS and RHS of a binop).
|
|
||||||
* Full diagnostic customization by client (can format diagnostics however they
|
|
||||||
like, e.g. in an IDE or refactoring tool) through DiagnosticClient interface.
|
|
||||||
* Built as a framework, can be reused by multiple tools.
|
|
||||||
* All languages supported linked into same library (no cc1,cc1obj, ...).
|
|
||||||
* mmap's code in read-only, does not dirty the pages like GCC (mem footprint).
|
|
||||||
* LLVM License, can be linked into non-GPL projects.
|
|
||||||
* Full diagnostic control, per diagnostic. Diagnostics are identified by ID.
|
|
||||||
* Significantly faster than GCC at semantic analysis, parsing, preprocessing
|
|
||||||
and lexing.
|
|
||||||
* Defers exposing platform-specific stuff to as late as possible, tracks use of
|
|
||||||
platform-specific features (e.g. #ifdef PPC) to allow 'portable bytecodes'.
|
|
||||||
* The lexer doesn't rely on the "lexer hack": it has no notion of scope and
|
|
||||||
does not categorize identifiers as types or variables -- this is up to the
|
|
||||||
parser to decide.
|
|
||||||
|
|
||||||
Potential Future Features:
|
|
||||||
|
|
||||||
* Fine grained diag control within the source (#pragma enable/disable warning).
|
|
||||||
* Better token tracking within macros? (Token came from this line, which is
|
|
||||||
a macro argument instantiated here, recursively instantiated here).
|
|
||||||
* Fast #import with a module system.
|
|
||||||
* Dependency tracking: change to header file doesn't recompile every function
|
|
||||||
that texually depends on it: recompile only those functions that need it.
|
|
||||||
This is aka 'incremental parsing'.
|
|
||||||
|
|
||||||
|
|
||||||
IV. Missing Functionality / Improvements
|
|
||||||
|
|
||||||
Lexer:
|
|
||||||
* Source character mapping. GCC supports ASCII and UTF-8.
|
|
||||||
See GCC options: -ftarget-charset and -ftarget-wide-charset.
|
|
||||||
* Universal character support. Experimental in GCC, enabled with
|
|
||||||
-fextended-identifiers.
|
|
||||||
* -fpreprocessed mode.
|
|
||||||
|
|
||||||
Preprocessor:
|
|
||||||
* #assert/#unassert
|
|
||||||
* MSExtension: "L#param" stringizes to a wide string literal.
|
|
||||||
* Add support for -M*
|
|
||||||
|
|
||||||
Traditional Preprocessor:
|
|
||||||
* Currently, we have none. :)
|
|
||||||
|
|
||||||
|
If you find a bug in Clang, please file it in the LLVM bug tracker:
|
||||||
|
http://llvm.org/bugs/
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue