=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`,
UndefinedBehaviorSanitizer, or without any sanitizer.  Pass one of the
following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``,
``LSAN_OPTIONS``, ``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as
appropriate. For the standalone coverage mode, use ``UBSAN_OPTIONS``.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last byte of
the magic (``0x64`` or ``0x32``) defines the size of the following offsets
(64-bit or 32-bit). The rest of the data is the offsets in the corresponding
binary/DSO that were executed during the run.

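
The file layout described above can be read with a few lines of code. The
following is a minimal sketch, not the code used by ``sancov.py``: the name
``ParseSancov`` is hypothetical, and the sketch assumes the file was written in
the byte order of the machine that reads it.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Magic values quoted in the text above.
    const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

    // Parses the contents of a .sancov file held in memory (hypothetical helper).
    // Returns false if the header is truncated or the magic is unknown.
    // Assumption: the data uses the byte order of this host.
    bool ParseSancov(const std::vector<uint8_t> &data,
                     std::vector<uint64_t> *offsets) {
      uint64_t magic = 0;
      if (data.size() < sizeof(magic)) return false;
      std::memcpy(&magic, data.data(), sizeof(magic));

      size_t width = 0;  // Size of each offset in bytes.
      if (magic == kMagic64) width = 8;
      else if (magic == kMagic32) width = 4;
      else return false;

      offsets->clear();
      for (size_t pos = sizeof(magic); pos + width <= data.size(); pos += width) {
        if (width == 8) {
          uint64_t value = 0;
          std::memcpy(&value, &data[pos], 8);
          offsets->push_back(value);
        } else {
          uint32_t value = 0;
          std::memcpy(&value, &data[pos], 4);
          offsets->push_back(value);
        }
      }
      return true;
    }

Trailing bytes that do not form a whole offset are ignored in this sketch.
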
A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

Sancov Tool
===========

A new experimental ``sancov`` tool is being developed to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass ``.sancov`` files (named
``<module_name>.<pid>.sancov``) and paths to all corresponding binary ELF files.
Sancov matches these files using module names and binary file names.

.. code-block:: console

    USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -html-report              - Print HTML coverage report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports

Automatic HTML Report Generation
================================

If ``*SAN_OPTIONS`` contains the ``html_cov_report=1`` option, an HTML
coverage report is automatically generated alongside the coverage files.
The ``sancov`` binary should be present in ``PATH``, or the
``sancov_path=<path_to_sancov>`` option can be used to specify the tool location.

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

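
The subtraction itself is a plain set difference. As a rough illustration
(``MissingPCs`` is a hypothetical helper, not part of ``sancov.py``), it could
be written like this once both lists of PCs are available:

.. code-block:: c++

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <vector>

    // Returns the instrumented PCs that do not appear in the covered list.
    // Both arguments are taken by value because they are sorted locally.
    std::vector<uint64_t> MissingPCs(std::vector<uint64_t> instrumented,
                                     std::vector<uint64_t> covered) {
      std::sort(instrumented.begin(), instrumented.end());
      std::sort(covered.begin(), covered.end());
      std::vector<uint64_t> missing;
      // std::set_difference requires both input ranges to be sorted.
      std::set_difference(instrumented.begin(), instrumented.end(),
                          covered.begin(), covered.end(),
                          std::back_inserter(missing));
      return missing;
    }
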
Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks; let's name them A, B, and C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

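
To make the notion of a critical edge concrete: an edge is critical when its
source block has more than one successor and its destination block has more
than one predecessor. The sketch below applies that definition to an edge list.
It only illustrates the definition and is not how the compiler pass is
implemented; ``Edge`` and ``CriticalEdges`` are hypothetical names.

.. code-block:: c++

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // An edge from block `first` to block `second`.
    typedef std::pair<std::string, std::string> Edge;

    // Returns the edges whose source has several successors and whose
    // destination has several predecessors.
    std::vector<Edge> CriticalEdges(const std::vector<Edge> &edges) {
      std::map<std::string, int> successors, predecessors;
      for (const Edge &e : edges) {
        ++successors[e.first];
        ++predecessors[e.second];
      }
      std::vector<Edge> critical;
      for (const Edge &e : edges) {
        if (successors[e.first] > 1 && predecessors[e.second] > 1)
          critical.push_back(e);
      }
      return critical;
    }

For the graph above (edges A=>B, B=>C, A=>C), only A=>C satisfies both
conditions, which is the edge that block D splits.
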
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.

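
One simple way to distill a corpus from such bitsets is a single greedy pass
that keeps an input only if it sets a bit that no earlier input has set. This
is a sketch of one possible heuristic, not a tool shipped with the sanitizers;
``DistillCorpus`` is a hypothetical name and the result depends on the order of
the inputs.

.. code-block:: c++

    #include <cstddef>
    #include <string>
    #include <vector>

    // Each element of `bitsets` is the text of one *.bitset-sancov file.
    // Returns the indices of the inputs that add at least one new '1' bit.
    std::vector<size_t> DistillCorpus(const std::vector<std::string> &bitsets) {
      std::string seen;  // Union of all bits observed so far.
      std::vector<size_t> kept;
      for (size_t i = 0; i < bitsets.size(); ++i) {
        const std::string &bits = bitsets[i];
        if (seen.size() < bits.size()) seen.resize(bits.size(), '0');
        bool adds_coverage = false;
        for (size_t j = 0; j < bits.size(); ++j) {
          if (bits[j] == '1' && seen[j] == '0') {
            seen[j] = '1';
            adds_coverage = true;
          }
        }
        if (adds_coverage) kept.push_back(i);
      }
      return kept;
    }

With the two bitsets from the example above, both runs are kept, because the
second one covers blocks that the first one does not.
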
Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee.  At shutdown time the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

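
Because the file is a sequence of line pairs, it can be loaded into a map from
each caller to its set of callees. The following is a small sketch under that
assumption; ``CallGraph`` and ``ReadCallerCallee`` are hypothetical names, and
each line is kept verbatim as a ``module offset`` string.

.. code-block:: c++

    #include <istream>
    #include <map>
    #include <set>
    #include <sstream>
    #include <string>

    // Maps a caller line to the set of callee lines observed for it.
    typedef std::map<std::string, std::set<std::string> > CallGraph;

    // Reads lines in pairs: the first line is the caller, the second the callee.
    // A trailing unpaired line is ignored.
    CallGraph ReadCallerCallee(std::istream &in) {
      CallGraph graph;
      std::string caller, callee;
      while (std::getline(in, caller) && std::getline(in, callee))
        graph[caller].insert(callee);
      return graph;
    }

The function accepts any ``std::istream``, for example a ``std::ifstream``
opened on the dump file or a ``std::istringstream`` holding its contents.
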
Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`__'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information.  In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

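
The mapping from a counter value to its range bit can be sketched as follows.
This is an illustration of the ranges listed above rather than the run-time's
own code: ``CounterToBit`` is a hypothetical name, and the sketch assumes that
the first range corresponds to the least significant bit, which matches the
``01`` bytes in the ``xxd`` output above for blocks executed once.

.. code-block:: c++

    #include <cstdint>

    // Maps a raw counter value to the single bit that represents its range.
    // Ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+ (bit 0 through bit 7).
    // A counter of 0 sets no bit.
    uint8_t CounterToBit(unsigned counter) {
      if (counter >= 128) return 1 << 7;
      if (counter >= 32) return 1 << 6;
      if (counter >= 16) return 1 << 5;
      if (counter >= 8) return 1 << 4;
      if (counter >= 4) return 1 << 3;
      if (counter == 3) return 1 << 2;
      if (counter == 2) return 1 << 1;
      if (counter == 1) return 1 << 0;
      return 0;
    }

Two runs that execute the same edge 5 and 6 times fall into the same range and
are therefore indistinguishable, which is what makes the counters imprecise.
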
Tracing basic blocks
====================

Experimental support for basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
(depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).
Example:

.. code-block:: console

    % clang -g -fsanitize=address -fsanitize-coverage=edge,trace-bb foo.cc
    % ASAN_OPTIONS=coverage=1 ./a.out

This will produce two files after the process exits:
``trace-points.PID.sancov`` and ``trace-events.PID.sancov``.
The first file will contain a textual description of all the instrumented points in the program
in a form that you can feed into llvm-symbolizer (e.g. ``a.out 0x4dca89``), one per line.
The second file will contain the actual execution trace as a sequence of 4-byte integers;
these integers are indices into the array of instrumented points (the first file).

Basic block tracing is currently supported only for single-threaded applications.

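
Combining the two files is a matter of looking up each 4-byte index in the list
of points. The sketch below shows the idea; ``DecodeTrace`` is a hypothetical
helper, and it assumes the integers are stored in the byte order of the host
and treats them as unsigned.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>

    // points: the lines of trace-points.PID.sancov, in file order.
    // events: the raw bytes of trace-events.PID.sancov.
    // Returns the executed points in order. Indices that fall outside the
    // points array are skipped.
    std::vector<std::string> DecodeTrace(const std::vector<std::string> &points,
                                         const std::vector<uint8_t> &events) {
      std::vector<std::string> trace;
      for (size_t pos = 0; pos + 4 <= events.size(); pos += 4) {
        uint32_t index = 0;
        std::memcpy(&index, &events[pos], 4);
        if (index < points.size()) trace.push_back(points[index]);
      }
      return trace;
    }

Each returned line can then be passed to llvm-symbolizer as described above.
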
Tracing PCs
===========

An *experimental* feature similar to tracing basic blocks, but with a different API.
With ``-fsanitize-coverage=trace-pc`` the compiler will insert
``__sanitizer_cov_trace_pc()`` on every edge.
With an additional ``...=trace-pc,indirect-calls`` flag
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
These callbacks are not implemented in the Sanitizer run-time and should be defined
by the user, so these flags do not require any other sanitizer to be used.
This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller)
and can be used with `AFL <http://lcamtuf.coredump.cx/afl>`__.

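
A user-defined callback can be very small. The sketch below records which call
sites were reached in a fixed-size table; the table, its size, and the
``CountHits`` helper are assumptions made for illustration, and
``__builtin_return_address`` is a GCC/Clang builtin. The file that defines the
callback should presumably be compiled without ``-fsanitize-coverage``, since
otherwise the callback itself would be instrumented.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>

    // Hypothetical hit table; distinct PCs may collide in the same slot.
    static const size_t kTableSize = 1 << 16;
    static uint8_t g_hits[kTableSize];

    // Called by instrumented code on every edge.
    extern "C" void __sanitizer_cov_trace_pc() {
      // The return address identifies the instrumented location that called us.
      uintptr_t pc = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
      g_hits[pc % kTableSize] = 1;
    }

    // Returns the number of table slots that have been hit so far.
    size_t CountHits() {
      size_t count = 0;
      for (size_t i = 0; i < kTableSize; ++i) count += g_hits[i];
      return count;
    }

The callback avoids library calls and allocation on purpose, because it runs on
every edge of the instrumented program.
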
Tracing data flow
=================

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer needs to define the following functions,
which are called by the instrumented code.

.. code-block:: c++

  // Called before a comparison instruction.
  // SizeAndType is a packed value containing
  //   - [63:32] the Size of the operands of comparison in bits
  //   - [31:0] the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
  // Arg1 and Arg2 are arguments of the comparison.
  void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

  // Called before a switch statement.
  // Val is the switch operand.
  // Cases[0] is the number of case constants.
  // Cases[1] is the size of Val in bits.
  // Cases[2:] are the case constants.
  void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.

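
Inside these callbacks a fuzzer first has to unpack the arguments. The helpers
below sketch that step based only on the layout given in the comments above;
``CmpInfo``, ``UnpackSizeAndType``, and ``SwitchCaseConstants`` are hypothetical
names and not part of any sanitizer header.

.. code-block:: c++

    #include <cstdint>
    #include <vector>

    // Unpacked form of the SizeAndType argument.
    struct CmpInfo {
      uint32_t size_in_bits;  // Bits [63:32] of SizeAndType.
      uint32_t type;          // Bits [31:0] of SizeAndType.
    };

    // Splits the packed value into its two 32-bit halves.
    CmpInfo UnpackSizeAndType(uint64_t SizeAndType) {
      CmpInfo info;
      info.size_in_bits = static_cast<uint32_t>(SizeAndType >> 32);
      info.type = static_cast<uint32_t>(SizeAndType & 0xFFFFFFFFu);
      return info;
    }

    // Copies the case constants out of the Cases array:
    // Cases[0] is the count, Cases[1] the operand size, Cases[2:] the constants.
    std::vector<uint64_t> SwitchCaseConstants(const uint64_t *Cases) {
      uint64_t count = Cases[0];
      return std::vector<uint64_t>(Cases + 2, Cases + 2 + count);
    }

A fuzzer could, for instance, compare ``Val`` against the returned constants to
see which case values have not been tried yet.
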
Output directory
================

By default, .sancov files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes twice as much data as the
default, because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data is useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. Calling it after testing every new input
tells the fuzzer whether the coverage has increased.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process.  Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

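
The core of such a loop is "run the input, then check whether the coverage
number grew". The sketch below is not taken from the fuzzer in the LLVM tree;
``KeepInterestingInputs`` is a hypothetical name. The coverage query is passed
in as a callable so that the sketch compiles without the sanitizer run-time; in
an instrumented build one would presumably pass
``__sanitizer_get_total_unique_coverage``.

.. code-block:: c++

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    // Runs every candidate input and returns those that increased coverage.
    // run_target: executes the code under test on one input.
    // total_coverage: returns the number of covered entities so far.
    std::vector<std::string> KeepInterestingInputs(
        const std::vector<std::string> &candidates,
        const std::function<void(const std::string &)> &run_target,
        const std::function<uintptr_t()> &total_coverage) {
      std::vector<std::string> corpus;
      uintptr_t best = total_coverage();
      for (const std::string &input : candidates) {
        run_target(input);
        uintptr_t now = total_coverage();
        if (now > best) {  // This input reached something new.
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

A real fuzzer would additionally mutate the kept inputs and feed the results
back into the loop.
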
Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
     benchmark      cov0        cov1   diff 0-1       cov2   diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
 400.perlbench    1296.00    1307.00       1.01    1465.00       1.13       1.12
     401.bzip2     858.00     854.00       1.00    1010.00       1.18       1.18
       403.gcc     613.00     617.00       1.01     683.00       1.11       1.11
       429.mcf     605.00     582.00       0.96     610.00       1.01       1.05
     445.gobmk     896.00     880.00       0.98    1050.00       1.17       1.19
     456.hmmer     892.00     892.00       1.00     918.00       1.03       1.03
     458.sjeng     995.00    1009.00       1.01    1217.00       1.22       1.21
462.libquantum     497.00     492.00       0.99     534.00       1.07       1.09
   464.h264ref    1461.00    1467.00       1.00    1543.00       1.06       1.05
   471.omnetpp     575.00     590.00       1.03     660.00       1.15       1.12
     473.astar     658.00     652.00       0.99     715.00       1.09       1.10
 483.xalancbmk     471.00     491.00       1.04     582.00       1.24       1.19
      433.milc     616.00     627.00       1.02     627.00       1.02       1.00
      444.namd     602.00     601.00       1.00     654.00       1.09       1.09
    447.dealII     630.00     634.00       1.01     653.00       1.04       1.03
    450.soplex     365.00     368.00       1.01     395.00       1.08       1.07
    453.povray     427.00     434.00       1.02     495.00       1.16       1.14
       470.lbm     357.00     375.00       1.05     370.00       1.04       0.99
   482.sphinx3     927.00     928.00       1.00    1000.00       1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.