<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h3 style="color:red">This Page Is Under Construction</h3>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs path-sensitive exploration of the program and
relies on a set of checkers to implement the logic for detecting and
constructing specific bug reports. Anyone who is interested in implementing their own
checker should check out the Building a Checker in 24 Hours talk
(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
 <a href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>)
and refer to this page for additional information on writing a checker. The static analyzer is
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for developer guidelines, and send your questions and proposals to the
<a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>.
</p>

    <ul>
      <li><a href="#start">Getting Started</a></li>
      <li><a href="#analyzer">Static Analyzer Overview</a>
      <ul>
        <li><a href="#interaction">Interaction with Checkers</a></li>
        <li><a href="#values">Representing Values</a></li>
      </ul></li>
      <li><a href="#idea">Idea for a Checker</a></li>
      <li><a href="#registration">Checker Registration</a></li>
      <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
      <li><a href="#extendingstates">Custom Program States</a></li>
      <li><a href="#bugs">Bug Reports</a></li>
      <li><a href="#ast">AST Visitors</a></li>
      <li><a href="#testing">Testing</a></li>
      <li><a href="#commands">Useful Commands/Debugging Hints</a>
      <ul>
        <li><a href="#attaching">Attaching the Debugger</a></li>
        <li><a href="#narrowing">Narrowing Down the Problem</a></li>
        <li><a href="#visualizing">Visualizing the Analysis</a></li>
        <li><a href="#debugprints">Debug Prints and Tricks</a></li>
      </ul></li>
      <li><a href="#additioninformation">Additional Sources of Information</a></li>
      <li><a href="#links">Useful Links</a></li>
    </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
  page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
     <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from Clang's build
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>

    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for different output formats, fine tuning, and
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path-sensitive and every possible path through
  the program is explored. The explored execution traces are represented with an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when and how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
  kind means that the state is the result of purging dead symbols - the
  analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any
    checker-defined state
  </ul>

  <h3 id=interaction>Interaction with Checkers</h3>

  <p>
  Checkers are not merely passive receivers of the analyzer core changes - they
  actively participate in the <tt>ProgramState</tt> construction through the
  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part
  of the state. Each time the analyzer engine explores a new statement, it
  notifies each checker registered to listen for that statement, giving it an
  opportunity to either report a bug or modify the state. (As a rule of thumb,
  the checker itself should be stateless.) The checkers are called one after another
  in a predefined order; thus, calling all the checkers adds a chain of nodes to the
  <tt>ExplodedGraph</tt>.
  </p>

  <h3 id=values>Representing Values</h3>

  <p>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
  objects are used to represent the semantic evaluation of expressions.
  They can represent things like concrete
  integers, symbolic values, or memory locations (which are memory regions).
  They are a discriminated union of "values", symbolic and otherwise.
  If a value isn't symbolic, usually that means there is no symbolic
  information to track. For example, if the value was an integer, such as
  <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
  and the checker doesn't usually need to track any state with the concrete
  number. In some cases, an <tt>SVal</tt> is not a symbol even though it really
  should be a symbolic value. This happens when the analyzer cannot reason about
  something (yet). An example is floating point numbers. In such cases, the
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
  This represents a case that is outside the realm of the analyzer's reasoning
  capabilities. <tt>SVals</tt> are value objects and their values can be viewed
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
  symbols or regions.
  </p>

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value. Symbols represent
  an actual (immutable) value. We might not know what its specific value is, but
  we can associate constraints with that value as we analyze a path. For
  example, we might record that the value of a symbol is greater than
  <tt>0</tt>.
  </p>

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
  It is used to provide a lexicon of how to describe abstract memory. Regions can
  layer on top of other regions, providing a layered approach to representing memory.
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.
  </p>

  <p>
  Let's see how the analyzer processes the expressions in the following example:
  </p>

  <pre class="code_example">
  int foo(int x) {
     int y = x * 2;
     int z = x;
     ...
  }
  </pre>

  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
might reference the same underlying value.
  </p>

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).
</p>
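<p>
The walk-through above can be mimicked with a small self-contained model. This
is only a sketch under stated assumptions: <tt>ToySVal</tt>, <tt>ToyStore</tt>,
<tt>load</tt>, <tt>describe</tt>, and <tt>multiply</tt> are hypothetical names
invented for illustration and are not part of the analyzer's API, which is
considerably richer. The model only shows the shape of the idea: a value is a
tagged union, the store binds regions to values, and multiplying a symbol by a
constant yields a new symbolic value.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model for illustration only; none of these types exist in Clang.
enum class Kind { Unknown, ConcreteInt, Symbol, Region };

struct ToySVal {
  Kind kind;
  long number;       // meaningful only when kind == Kind::ConcreteInt
  std::string name;  // symbol name ("$0") or region name ("x")
};

ToySVal unknown() { return ToySVal{Kind::Unknown, 0, ""}; }
ToySVal concrete(long n) { return ToySVal{Kind::ConcreteInt, n, ""}; }
ToySVal symbol(const std::string &n) { return ToySVal{Kind::Symbol, 0, n}; }
ToySVal region(const std::string &n) { return ToySVal{Kind::Region, 0, n}; }

// The store maps a region (named by its variable) to the value bound to it.
typedef std::map<std::string, ToySVal> ToyStore;

// Lvalue-to-rvalue conversion: fetch the value currently bound to a region.
ToySVal load(const ToyStore &store, const ToySVal &lvalue) {
  if (lvalue.kind != Kind::Region)
    return unknown();
  ToyStore::const_iterator it = store.find(lvalue.name);
  return it == store.end() ? unknown() : it->second;
}

// Printable form, also used to name compound symbolic expressions.
std::string describe(const ToySVal &v) {
  switch (v.kind) {
  case Kind::ConcreteInt: return std::to_string(v.number);
  case Kind::Symbol:      return v.name;
  case Kind::Region:      return "&" + v.name;
  case Kind::Unknown:     break;
  }
  return "Unknown";
}

// Multiplication: fold two constants, otherwise build a new symbolic value.
// Anything the toy cannot reason about (regions, unknowns) stays Unknown.
ToySVal multiply(const ToySVal &lhs, const ToySVal &rhs) {
  bool lhsOk = lhs.kind == Kind::ConcreteInt || lhs.kind == Kind::Symbol;
  bool rhsOk = rhs.kind == Kind::ConcreteInt || rhs.kind == Kind::Symbol;
  if (!lhsOk || !rhsOk)
    return unknown();
  if (lhs.kind == Kind::ConcreteInt && rhs.kind == Kind::ConcreteInt)
    return concrete(lhs.number * rhs.number);
  return symbol("(" + describe(lhs) + " * " + describe(rhs) + ")");
}
```

<p>
In this model, binding <tt>symbol("$0")</tt> to the region for <tt>x</tt>,
loading it, and multiplying by <tt>concrete(2)</tt> produces a symbolic value
that the toy names after its operands; the text above calls the corresponding
value <tt>$1</tt>.
</p>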

  <!--
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.
  -->

<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in existing code bases might
    give you some ideas. </li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer only reasons about one procedure at a time (no
    inter-procedural analysis). Also, it uses a simple range-tracking-based
    solver to model symbolic execution.</li>

    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a>
    to get some ideas for new checkers, and consider starting by improving or fixing
    bugs in the existing checkers.</li>
  </ul>

<p>Once an idea for a checker has been chosen, there are two key decisions that
need to be made:
  <ul>
    <li> Which events the checker should be tracking. This is discussed in more
    detail in the section <a href="#events_callbacks">Events, Callbacks, and
    Checker Class Structure</a>.
    <li> What checker-specific data needs to be stored as part of the program
    state (if any). This should be minimized as much as possible. More detail about
    implementing custom program state is given in the section <a
    href="#extendingstates">Custom Program States</a>.
  </ul>


<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the
  <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
  how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
  stream APIs, was registered with the analyzer.
  Similar steps should be followed for a new checker.
<ol>
  <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
  created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
  <li>The following registration code was added to the implementation file:
<pre class="code_example">
void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;SimpleStreamChecker&gt;();
}
</pre>
<li>A package was selected for the checker and the checker was defined in the
table of checkers at <tt>lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Since all
checkers should first be developed as "alpha", and the SimpleStreamChecker
performs UNIX API checks, the correct package is "alpha.unix", and the following
was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
<pre class="code_example">
let ParentPackage = UnixAlpha in {
...
def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
  HelpText&lt;"Check for misuses of stream APIs"&gt;,
  DescFile&lt;"SimpleStreamChecker.cpp"&gt;;
...
} // end "alpha.unix"
</pre>

<li>The source code file was made visible to CMake by adding it to
<tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

</ol>

After adding a new checker to the analyzer, one can verify that the new checker
was successfully added by seeing if it appears in the list of available checkers:
<br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>

<h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>

<p> All checkers inherit from the <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
Checker</a></tt> template class; the template parameter(s) describe the type of
events that the checker is interested in processing. The various types of events
that are available are described in the file <a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a>.

<p> For each event type requested, a corresponding callback function must be
defined in the checker class (<a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a> shows the
correct function name and signature for each event type).

<p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
take action at the following times:

<ul>
<li>Before making a call to a function, check if the function is <tt>fclose</tt>.
If so, check the parameter being passed.
<li>After making a function call, check if the function is <tt>fopen</tt>. If
so, process the return value.
<li>When values go out of scope, check whether they are still-open file
descriptors, and report a bug if so. In addition, remove any information about
them from the program state in order to keep the state as small as possible.
<li>When file pointers "escape" (are used in a way that the analyzer can no longer
track them), mark them as such. This prevents false positives in the cases where
the analyzer cannot be sure whether the file was closed or not.
</ul>

<p>The events that will be used for each of these actions are, respectively, <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
and <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
The high-level structure of the checker's class is thus:

<pre class="code_example">
class SimpleStreamChecker : public Checker&lt;check::PreCall,
                                           check::PostCall,
                                           check::DeadSymbols,
                                           check::PointerEscape&gt; {
public:

  void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;

  ProgramStateRef checkPointerEscape(ProgramStateRef State,
                                     const InvalidatedSymbols &amp;Escaped,
                                     const CallEvent *Call,
                                     PointerEscapeKind Kind) const;
};
</pre>
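<p>
The bookkeeping behind these four callbacks can be sketched without any
analyzer machinery. The model below is a toy under stated assumptions:
<tt>ToyStreamChecker</tt> and its member names are hypothetical, symbols are
plain integers, and the "check the parameter" step is assumed to mean
detecting a second <tt>fclose</tt> on the same stream. Unlike a real checker,
which should be stateless and keep this map in the <tt>ProgramState</tt>, the
toy stores it in a member purely for brevity.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model for illustration only; it does not use or mirror Clang's classes.
enum class StreamState { Opened, Closed };

struct ToyStreamChecker {
  std::map<int, StreamState> streams;  // symbol id -> tracked stream state
  std::vector<std::string> reports;    // bugs found so far

  // PostCall on fopen: start tracking the returned stream symbol.
  void postCallFopen(int sym) { streams[sym] = StreamState::Opened; }

  // PreCall on fclose: closing an already closed stream is reported.
  void preCallFclose(int sym) {
    std::map<int, StreamState>::iterator it = streams.find(sym);
    if (it == streams.end())
      return;  // untracked stream: nothing to conclude
    if (it->second == StreamState::Closed)
      reports.push_back("double close of $" + std::to_string(sym));
    else
      it->second = StreamState::Closed;
  }

  // DeadSymbols: a stream that dies while still open has leaked. Either way,
  // drop the entry so that the tracked state stays small.
  void deadSymbol(int sym) {
    std::map<int, StreamState>::iterator it = streams.find(sym);
    if (it == streams.end())
      return;
    if (it->second == StreamState::Opened)
      reports.push_back("leak of $" + std::to_string(sym));
    streams.erase(it);
  }

  // PointerEscape: stop tracking, since the stream may be closed elsewhere.
  void pointerEscape(int sym) { streams.erase(sym); }
};
```

<p>
Feeding the model the sequence open, close, close for one symbol records a
double close; opening a second symbol and letting it die records a leak; a
symbol that escapes before dying is not reported at all.
</p>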

<h2 id=extendingstates>Custom Program States</h2>

<p> Checkers often need to keep track of information specific to the checks they
perform. However, since checkers have no guarantee about the order in which the
program will be explored, or even that all possible paths will be explored, this
state information cannot be kept within individual checkers. Therefore, if
checkers need to store custom information, they need to add new categories of
data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
several macros designed for this purpose. They are:

<ul>
<li><a
href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
Used when the state information is a single value. The methods available for
state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
<tt>remove</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
Used when the state information is a list of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
Used when the state information is a set of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
Used when the state information is a map from a key to a value. The methods
available for state types declared with this macro are <tt>add</tt>,
<tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
</ul>

<p>All of these macros take as parameters the name to be used for the custom
category of state information and the data type(s) to be used for storage. The
data type(s) specified will become the parameter type and/or return type of the
methods that manipulate the new category of state information. Each of these
methods is templated with the name of the custom data type.

<p>For example, a common case is the need to track data associated with a
symbolic expression; a map type is the most logical way to implement this. The
key for this map will be a pointer to a symbolic expression
(<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
expression is an integer, then the custom category of state information would be
declared as

<pre class="code_example">
REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
</pre>

The data would be looked up with the <tt>get</tt> method, which for a map
returns a pointer to the stored value (null if the key has no entry):

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
...
const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym);
</pre>

and set with the <tt>set</tt> method:

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
int newValue;
...
ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
</pre>

<p>In addition, the macros define a data type used for storing the data of the
new data category; the name of this type is the name of the data category with
"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
be the passed data type; for the other three macros, this will be a specialized
version of the <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
<a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
or <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
templated class. For the <tt>ExampleDataType</tt> example above, the type
created would be equivalent to writing the declaration:

<pre class="code_example">
typedef llvm::ImmutableMap&lt;SymbolRef, int&gt; ExampleDataTypeTy;
</pre>

<p>These macros will cover a majority of use cases; however, they still have a
few limitations. They cannot be used inside namespaces (since they expand to
contain top-level namespace references), and the data types that they define
cannot be referenced from more than one file.

<p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
one, functions that modify the state will return a copy of the previous state
with the change applied. This updated state must then be provided to the
analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
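<p>
The practical consequence of immutability is that the result of a modifying
call must be kept and used; the original state never changes. The following
self-contained sketch models only that behavior. <tt>ToyState</tt> and
<tt>ToyStateRef</tt> are hypothetical names, symbols are plain integers, and
the toy copies its whole map on every update, whereas the analyzer relies on
the <tt>llvm::Immutable*</tt> containers mentioned above. Handing the new
state to the analyzer core is not modeled.
</p>

```cpp
#include <cassert>
#include <map>
#include <memory>

// Toy model for illustration only; this is not the analyzer's ProgramState.
class ToyState;
typedef std::shared_ptr<const ToyState> ToyStateRef;

class ToyState {
  std::map<int, int> data;  // symbol id -> checker-specific value

public:
  // Returns a pointer to the value stored for sym, or nullptr if none.
  const int *get(int sym) const {
    std::map<int, int>::const_iterator it = data.find(sym);
    return it == data.end() ? nullptr : &it->second;
  }

  // Returns a NEW state with sym mapped to value; *this is left untouched.
  ToyStateRef set(int sym, int value) const {
    std::shared_ptr<ToyState> next = std::make_shared<ToyState>(*this);
    next->data[sym] = value;
    return next;
  }
};
```

<p>
With this model, calling <tt>set</tt> on a state and discarding the result has
no observable effect, which is the toy analogue of forgetting to pass the
updated state on to the analyzer core.
</p>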
<h2 id=bugs>Bug Reports</h2>


<p> When a checker detects a mistake in the analyzed code, it needs a way to
report it to the analyzer core so that it can be displayed. The two classes used
to construct this report are <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
and <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
BugReport</a></tt>.

<p>
<tt>BugType</tt>, as the name would suggest, represents a type of bug. The
constructor for <tt>BugType</tt> takes two parameters: the name of the bug
type, and the name of the category of the bug. These are used, for example, in the
summary page generated by the scan-build tool.

<p>
  The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
  the most common case, three parameters are used to form a <tt>BugReport</tt>:
<ol>
<li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
<li>A short descriptive string. This is placed at the location of the bug in
the detailed line-by-line output generated by scan-build.
<li>The context in which the bug occurred. This includes both the location of
the bug in the program and the program's state when the location is reached. These are
both encapsulated in an <tt>ExplodedNode</tt>.
</ol>

<p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
as to whether or not analysis can continue along the current path. This decision
is based on whether the detected bug is one that would prevent the program under
analysis from continuing. For example, leaking of a resource should not stop
analysis, as the program can continue to run after the leak. Dereferencing a
null pointer, on the other hand, should stop analysis, as there is no way for
the program to meaningfully continue after such an error.

<p>If analysis can continue, then the most recent <tt>ExplodedNode</tt>
generated by the checker can be passed to the <tt>BugReport</tt> constructor
without additional modification. This <tt>ExplodedNode</tt> will be the one
returned by the most recent call to <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>.
If no transition has been performed during the current callback, the checker should call <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a>
and use the returned node for bug reporting.

<p>If analysis cannot continue, then the current state should be transitioned
into a so-called <i>sink node</i>, a node from which no further analysis will be
performed. This is done by calling the <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0">
CheckerContext::generateSink</a> function; this function is the same as the
<tt>addTransition</tt> function, but marks the state as a sink node. Like
<tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
state, which can then be passed to the <tt>BugReport</tt> constructor.

<p>
After a <tt>BugReport</tt> is created, it should be passed to the analyzer core
by calling <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>.

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective; a simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a compiler
  warning, for example because of a relatively high false positive rate. In this
  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the <tt>clang</tt> build directory:
    <pre class="code">
    $ <b>bin/llvm-lit -sv ../llvm/tools/clang/test/Analysis</b>
    </pre>

<h2 id=commands>Useful Commands/Debugging Hints</h2>

<h3 id=attaching>Attaching the Debugger</h3>

<p>When your command contains the <tt><b>-cc1</b></tt> flag, you can attach the
debugger to it directly:</p>

<pre class="code">
    $ <b>gdb --args clang -cc1 -analyze -analyzer-checker=core test.c</b>
    $ <b>lldb -- clang -cc1 -analyze -analyzer-checker=core test.c</b>
</pre>

<p>
Otherwise, if your command line contains <tt><b>--analyze</b></tt>,
the actual clang instance is run in a separate process. In
order to debug it, use the <tt><b>-###</b></tt> flag to obtain
the command line of the child process:
</p>

<pre class="code">
    $ <b>clang --analyze test.c -\#\#\#</b>
</pre>

<p>
Below we describe a few useful command line arguments, all of which assume that
you are running <tt><b>clang -cc1</b></tt>.
</p>

| <h3 id=narrowing>Narrowing Down the Problem</h3>
 | |
| 
 | |
| <p>While investigating a checker-related issue, instruct the analyzer to only
 | |
| execute a single checker:
 | |
| </p>
 | |
| <pre class="code">
 | |
|     $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
 | |
| </pre>
 | |
| 
 | |
| <p>If you are experiencing a crash, to see which function is failing while
 | |
| processing a large file use the  <tt><b>-analyzer-display-progress</b></tt>
 | |
| option.</p>
 | |
| 
 | |
| <p>You can analyze a particular function within the file, which is often useful
 | |
| because the problem is always in a certain function:</p>
 | |
| <pre class="code">
 | |
|     $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress</b>
 | |
|     ANALYZE (Syntax): test.c foo
 | |
|     ANALYZE (Syntax): test.c bar
 | |
|     ANALYZE (Path,  Inline_Regular): test.c bar
 | |
|     ANALYZE (Path,  Inline_Regular): test.c foo
 | |
|     $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress -analyze-function=foo</b>
 | |
|     ANALYZE (Syntax): test.c foo
 | |
|     ANALYZE (Path,  Inline_Regular): test.c foo
 | |
| </pre>
 | |
| 
 | |
| <p>The bug reporter mechanism removes path diagnostics inside intermediate
 | |
| function calls that have returned by the time the bug was found and contain
 | |
| no interesting pieces. Usually it is up to the checkers to produce more
 | |
| interesting pieces by adding custom <tt>BugReporterVisitor</tt> objects.
 | |
| However, you can disable path pruning while debugging with the
 | |
| <tt><b>-analyzer-config prune-paths=false</b></tt> option.</p>
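 | |
| <p>A minimal sketch of such an invocation, assuming the <tt>core</tt> checkers
 | |
| and the <tt>test.c</tt> file used in the earlier examples:</p>
 | |
| <pre class="code">
 | |
|     $ <b>clang -cc1 -analyze -analyzer-checker=core -analyzer-config prune-paths=false test.c</b>
 | |
| </pre>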
 | |
| 
 | |
| <h3 id=visualizing>Visualizing the Analysis</h3>
 | |
| 
 | |
| <p>To dump the AST, which often helps in understanding how the program should
 | |
| behave:</p>
 | |
| <pre class="code">
 | |
|     $ <b>clang -cc1 -ast-dump test.c</b>
 | |
| </pre>
 | |
| 
 | |
| <p>To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or
 | |
| <tt>debug.DumpCFG</tt> checkers:</p>
 | |
| <pre class="code">
 | |
|     $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
 | |
| </pre>
 | |
| 
 | |
| <p><tt>ExplodedGraph</tt> (the state graph explored by the analyzer) can be
 | |
| visualized with another debug checker:</p>
 | |
| <pre class="code">
 | |
|     $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewExplodedGraph test.c</b>
 | |
| </pre>
 | |
| <p>Or, equivalently, with the <tt><b>-analyzer-viz-egraph-graphviz</b></tt>
 | |
| option, which does the same thing: it dumps the exploded graph in graphviz
 | |
| <tt><b>.dot</b></tt> format.</p>
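 | |
| <p>For instance, assuming the same <tt>test.c</tt> and the <tt>core</tt>
 | |
| checkers as in the earlier examples, the option could be passed like this:</p>
 | |
| <pre class="code">
 | |
|     $ <b>clang -cc1 -analyze -analyzer-checker=core -analyzer-viz-egraph-graphviz test.c</b>
 | |
| </pre>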
 | |
| 
 | |
| <p>You can convert <tt><b>.dot</b></tt> files into other formats - in
 | |
| particular, converting to <tt><b>.svg</b></tt> and viewing in your web
 | |
| browser might be more comfortable than using a <tt><b>.dot</b></tt> viewer:</p>
 | |
| <pre class="code">
 | |
|     $ <b>dot -Tsvg ExprEngine-501e2e.dot -o ExprEngine-501e2e.svg</b>
 | |
| </pre>
 | |
| 
 | |
| <p>The <tt><b>-trim-egraph</b></tt> option removes all paths except those
 | |
| leading to bug reports from the exploded graph dump. This is useful
 | |
| because exploded graphs are often huge and hard to navigate.</p>
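 | |
| <p>As a sketch, assuming the option is combined with the
 | |
| <tt>debug.ViewExplodedGraph</tt> checker shown above and with the <tt>core</tt>
 | |
| checkers, which produce the bug reports the graph is trimmed to:</p>
 | |
| <pre class="code">
 | |
|     $ <b>clang -cc1 -analyze -analyzer-checker=core,debug.ViewExplodedGraph -trim-egraph test.c</b>
 | |
| </pre>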
 | |
| 
 | |
| <p>Viewing <tt>ExplodedGraph</tt> is your most powerful tool for understanding
 | |
| the analyzer's false positives, because it gives comprehensive information
 | |
| on every decision made by the analyzer across all analysis paths.</p>
 | |
| 
 | |
| <p>There are more debug checkers available. To see all available debug checkers:
 | |
| </p>
 | |
| <pre class="code">
 | |
|     $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
 | |
| </pre>
 | |
| 
 | |
| <h3 id=debugprints>Debug Prints and Tricks</h3>
 | |
| 
 | |
| <p>To view a "half-baked" <tt>ExplodedGraph</tt> while debugging, jump to a frame
 | |
| that has a <tt>clang::ento::ExprEngine</tt> object and execute:</p>
 | |
| <pre class="code">
 | |
|     (gdb) <b>p ViewGraph(0)</b>
 | |
| </pre>
 | |
| 
 | |
| <p>To see the <tt>ProgramState</tt> while debugging, use the following command:</p>
 | |
| <pre class="code">
 | |
|     (gdb) <b>p State->dump()</b>
 | |
| </pre>
 | |
| 
 | |
| <p>To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
 | |
| pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
 | |
| source code.</p>
 | |
| <pre class="code">
 | |
|     (gdb) <b>p E->dump()</b>
 | |
| </pre>
 | |
| 
 | |
| <p>To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs
 | |
| to:</p>
 | |
| <pre class="code">
 | |
|     (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
 | |
| </pre>
 | |
| 
 | |
| <h2 id=additioninformation>Additional Sources of Information</h2>
 | |
| 
 | |
| <p>Here are some additional resources that are useful when working on the Clang
 | |
| Static Analyzer:</p>
 | |
| 
 | |
| <ul>
 | |
| <li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
 | |
| up-to-date documentation about the APIs available in Clang. Relevant entries
 | |
| have been linked throughout this page. Also of use is the
 | |
| <a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
 | |
| from LLVM.
 | |
| <li> The <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">
 | |
| cfe-dev mailing list</a>. This is the primary mailing list used for
 | |
| discussion of Clang development (including static code analysis). The
 | |
| <a href="http://lists.llvm.org/pipermail/cfe-dev">archive</a> also contains
 | |
| a lot of information.
 | |
| <li> The "Building a Checker in 24 hours" presentation given at the <a
 | |
| href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's
 | |
| meeting</a>. Describes the construction of SimpleStreamChecker. <a
 | |
| href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
 | |
| and <a
 | |
| href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>
 | |
| are available.
 | |
| </ul>
 | |
| 
 | |
| <h2 id=links>Useful Links</h2>
 | |
| <ul>
 | |
| <li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
 | |
| </ul>
 | |
| 
 | |
| </div>
 | |
| </div>
 | |
| </body>
 | |
| </html>
 |