140 lines
		
	
	
		
			6.6 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
			
		
		
	
	
			140 lines
		
	
	
		
			6.6 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
| //===----------------------------------------------------------------------===//
 | |
| // Clang Static Analyzer
 | |
| //===----------------------------------------------------------------------===//
 | |
| 
 | |
| = Library Structure =
 | |
| 
 | |
| The analyzer library has two layers: a (low-level) static analysis
 | |
| engine (GRExprEngine.cpp and friends), and some static checkers
 | |
| (*Checker.cpp).  The latter are built on top of the former via the
 | |
| Checker and CheckerVisitor interfaces (Checker.h and
 | |
| CheckerVisitor.h).  The Checker interface is designed to be minimal
 | |
| and simple for checker writers, and attempts to isolate them from much
 | |
| of the gore of the internal analysis engine.
 | |
| 
 | |
| = How It Works =
 | |
| 
 | |
| The analyzer is inspired by several foundational research papers ([1],
 | |
| [2]).  (FIXME: kremenek to add more links)
 | |
| 
 | |
| In a nutshell, the analyzer is basically a source code simulator that
 | |
| traces out possible paths of execution.  The state of the program
 | |
| (values of variables and expressions) is encapsulated by the state
 | |
| (ProgramState).  A location in the program is called a program point
 | |
| (ProgramPoint), and the combination of state and program point is a
 | |
| node in an exploded graph (ExplodedGraph).  The term "exploded" comes
 | |
| from exploding the control-flow edges in the control-flow graph (CFG).
 | |
| 
 | |
| Conceptually the analyzer does a reachability analysis through the
 | |
| ExplodedGraph.  We start at a root node, which has the entry program
 | |
| point and initial state, and then simulate transitions by analyzing
 | |
| individual expressions.  The analysis of an expression can cause the
 | |
| state to change, resulting in a new node in the ExplodedGraph with an
 | |
| updated program point and an updated state.  A bug is found by hitting
 | |
| a node that satisfies some "bug condition" (basically a violation of a
 | |
| checking invariant).
 | |
| 
 | |
| The analyzer traces out multiple paths by reasoning about branches and
 | |
| then bifurcating the state: on the true branch the conditions of the
 | |
| branch are assumed to be true and on the false branch the conditions
 | |
| of the branch are assumed to be false.  Such "assumptions" create
 | |
| constraints on the values of the program, and those constraints are
 | |
| recorded in the ProgramState object (and are manipulated by the
 | |
| ConstraintManager).  If assuming the conditions of a branch would
 | |
| cause the constraints to be unsatisfiable, the branch is considered
 | |
| infeasible and that path is not taken.  This is how we get
 | |
| path-sensitivity.  We reduce exponential blow-up by caching nodes.  If
 | |
| a new node with the same state and program point as an existing node
 | |
| would get generated, the path "caches out" and we simply reuse the
 | |
| existing node.  Thus the ExplodedGraph is not a DAG; it can contain
 | |
| cycles as paths loop back onto each other and cache out.
 | |
| 
 | |
| ProgramState and ExplodedNodes are basically immutable once created.  Once
 | |
| one creates a ProgramState, you need to create a new one to get a new
 | |
| ProgramState.  This immutability is key since the ExplodedGraph represents
 | |
| the behavior of the analyzed program from the entry point.  To
 | |
| represent these efficiently, we use functional data structures (e.g.,
 | |
| ImmutableMaps) which share data between instances.
 | |
| 
 | |
| Finally, individual Checkers work by also manipulating the analysis
 | |
| state.  The analyzer engine talks to them via a visitor interface.
 | |
| For example, the PreVisitCallExpr() method is called by GRExprEngine
 | |
| to tell the Checker that we are about to analyze a CallExpr, and the
 | |
| checker is asked to check for any preconditions that might not be
 | |
| satisfied.  The checker can do nothing, or it can generate a new
 | |
| ProgramState and ExplodedNode which contains updated checker state.  If it
 | |
| finds a bug, it can tell the BugReporter object about the bug,
 | |
| providing it an ExplodedNode which is the last node in the path that
 | |
| triggered the problem.
 | |
| 
 | |
| = Notes about C++ =
 | |
| 
 | |
| Since now constructors are seen before the variable that is constructed 
 | |
| in the CFG, we create a temporary object as the destination region that 
 | |
| is constructed into. See ExprEngine::VisitCXXConstructExpr().
 | |
| 
 | |
| In ExprEngine::processCallExit(), we always bind the object region to the
 | |
| evaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the
 | |
| corresponding lazy compound value if the variable is not a reference, and
 | |
| bind the variable region to the lazy compound value. If the variable
 | |
| is a reference, just use the object region as the initilizer value.
 | |
| 
 | |
| Before entering a C++ method (or ctor/dtor), the 'this' region is bound
 | |
| to the object region. In ctors, we synthesize 'this' region with  
 | |
| CXXRecordDecl*, which means we do not use type qualifiers. In methods, we
 | |
| synthesize 'this' region with CXXMethodDecl*, which has getThisType() 
 | |
| taking type qualifiers into account. It does not matter we use qualified
 | |
| 'this' region in one method and unqualified 'this' region in another
 | |
| method, because we only need to ensure the 'this' region is consistent 
 | |
| when we synthesize it and create it directly from CXXThisExpr in a single
 | |
| method call.
 | |
| 
 | |
| = Working on the Analyzer =
 | |
| 
 | |
| If you are interested in bringing up support for C++ expressions, the
 | |
| best place to look is the visitation logic in GRExprEngine, which
 | |
| handles the simulation of individual expressions.  There are plenty of
 | |
| examples there of how other expressions are handled.
 | |
| 
 | |
| If you are interested in writing checkers, look at the Checker and
 | |
| CheckerVisitor interfaces (Checker.h and CheckerVisitor.h).  Also look
 | |
| at the files named *Checker.cpp for examples on how you can implement
 | |
| these interfaces.
 | |
| 
 | |
| = Debugging the Analyzer =
 | |
| 
 | |
| There are some useful command-line options for debugging.  For example:
 | |
| 
 | |
| $ clang -cc1 -help | grep analyze
 | |
|  -analyze-function <value>
 | |
|  -analyzer-display-progress
 | |
|  -analyzer-viz-egraph-graphviz
 | |
|  ...
 | |
| 
 | |
| The first allows you to specify only analyzing a specific function.
 | |
| The second prints to the console what function is being analyzed.  The
 | |
| third generates a graphviz dot file of the ExplodedGraph.  This is
 | |
| extremely useful when debugging the analyzer and viewing the
 | |
| simulation results.
 | |
| 
 | |
| Of course, viewing the CFG (Control-Flow Graph) is also useful:
 | |
| 
 | |
| $ clang -cc1 -help | grep cfg
 | |
|  -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses
 | |
|  -cfg-add-initializers   Add C++ initializers to CFGs for all analyses
 | |
|  -cfg-dump               Display Control-Flow Graphs
 | |
|  -cfg-view               View Control-Flow Graphs using GraphViz
 | |
|  -unoptimized-cfg        Generate unoptimized CFGs for all analyses
 | |
| 
 | |
| -cfg-dump dumps a textual representation of the CFG to the console,
 | |
| and -cfg-view creates a GraphViz representation.
 | |
| 
 | |
| = References =
 | |
| 
 | |
| [1] Precise interprocedural dataflow analysis via graph reachability,
 | |
|     T Reps, S Horwitz, and M Sagiv, POPL '95,
 | |
|     http://portal.acm.org/citation.cfm?id=199462
 | |
| 
 | |
| [2] A memory model for static analysis of C programs, Z Xu, T
 | |
|     Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf
 |