forked from OSchip/llvm-project
				
			
		
			
				
	
	
		
			795 lines
		
	
	
		
			33 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
			
		
		
	
	
			795 lines
		
	
	
		
			33 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
=======================================================
 | 
						||
libFuzzer – a library for coverage-guided fuzz testing.
 | 
						||
=======================================================
 | 
						||
.. contents::
 | 
						||
   :local:
 | 
						||
   :depth: 1
 | 
						||
 | 
						||
Introduction
 | 
						||
============
 | 
						||
 | 
						||
LibFuzzer is in-process, coverage-guided, evolutionary fuzzing engine.
 | 
						||
 | 
						||
LibFuzzer is linked with the library under test, and feeds fuzzed inputs to the
 | 
						||
library via a specific fuzzing entrypoint (aka "target function"); the fuzzer
 | 
						||
then tracks which areas of the code are reached, and generates mutations on the
 | 
						||
corpus of input data in order to maximize the code coverage.
 | 
						||
The code coverage
 | 
						||
information for libFuzzer is provided by LLVM's SanitizerCoverage_
 | 
						||
instrumentation.
 | 
						||
 | 
						||
Contact: libfuzzer(#)googlegroups.com
 | 
						||
 | 
						||
Versions
 | 
						||
========
 | 
						||
 | 
						||
LibFuzzer is under active development so you will need the current
 | 
						||
(or at least a very recent) version of the Clang compiler.
 | 
						||
 | 
						||
(If `building Clang from trunk`_ is too time-consuming or difficult, then
 | 
						||
the Clang binaries that the Chromium developers build are likely to be
 | 
						||
fairly recent:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  mkdir TMP_CLANG
 | 
						||
  cd TMP_CLANG
 | 
						||
  git clone https://chromium.googlesource.com/chromium/src/tools/clang
 | 
						||
  cd ..
 | 
						||
  TMP_CLANG/clang/scripts/update.py
 | 
						||
 | 
						||
This installs the Clang binary as
 | 
						||
``./third_party/llvm-build/Release+Asserts/bin/clang``)
 | 
						||
 | 
						||
The libFuzzer code resides in the LLVM repository, and requires a recent Clang
 | 
						||
compiler to build (and is used to `fuzz various parts of LLVM itself`_).
 | 
						||
However the fuzzer itself does not (and should not) depend on any part of LLVM
 | 
						||
infrastructure and can be used for other projects without requiring the rest
 | 
						||
of LLVM.
 | 
						||
 | 
						||
 | 
						||
Getting Started
 | 
						||
===============
 | 
						||
 | 
						||
.. contents::
 | 
						||
   :local:
 | 
						||
   :depth: 1
 | 
						||
 | 
						||
Fuzz Target
 | 
						||
-----------
 | 
						||
 | 
						||
The first step in using libFuzzer on a library is to implement a
 | 
						||
*fuzz target* -- a function that accepts an array of bytes and
 | 
						||
does something interesting with these bytes using the API under test.
 | 
						||
Like this:
 | 
						||
 | 
						||
.. code-block:: c++
 | 
						||
 | 
						||
  // fuzz_target.cc
 | 
						||
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
 | 
						||
    DoSomethingInterestingWithMyAPI(Data, Size);
 | 
						||
    return 0;  // Non-zero return values are reserved for future use.
 | 
						||
  }
 | 
						||
 | 
						||
Note that this fuzz target does not depend on libFuzzer in any way
 | 
						||
and so it is possible and even desirable to use it with other fuzzing engines
 | 
						||
e.g. AFL_ and/or Radamsa_.
 | 
						||
 | 
						||
Some important things to remember about fuzz targets:
 | 
						||
 | 
						||
* The fuzzing engine will execute the fuzz target many times with different inputs in the same process.
 | 
						||
* It must tolerate any kind of input (empty, huge, malformed, etc).
 | 
						||
* It must not `exit()` on any input.
 | 
						||
* It may use threads but ideally all threads should be joined at the end of the function.
 | 
						||
* It must be as deterministic as possible. Non-determinism (e.g. random decisions not based on the input bytes) will make fuzzing inefficient.
 | 
						||
* It must be fast. Try avoiding cubic or greater complexity, logging, or excessive memory consumption.
 | 
						||
* Ideally, it should not modify any global state (although that's not strict).
 | 
						||
 | 
						||
 | 
						||
Building
 | 
						||
--------
 | 
						||
 | 
						||
Next, build the libFuzzer library as a static archive, without any sanitizer
 | 
						||
options. Note that the libFuzzer library contains the ``main()`` function:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer  # or git clone https://chromium.googlesource.com/chromium/llvm-project/llvm/lib/Fuzzer
 | 
						||
  ./Fuzzer/build.sh  # Produces libFuzzer.a
 | 
						||
 | 
						||
Then build the fuzzing target function and the library under test using
 | 
						||
the SanitizerCoverage_ option, which instruments the code so that the fuzzer
 | 
						||
can retrieve code coverage information (to guide the fuzzing).  Linking with
 | 
						||
the libFuzzer code then gives a fuzzer executable.
 | 
						||
 | 
						||
You should also enable one or more of the *sanitizers*, which help to expose
 | 
						||
latent bugs by making incorrect behavior generate errors at runtime:
 | 
						||
 | 
						||
 - AddressSanitizer_ (ASAN) detects memory access errors. Use `-fsanitize=address`.
 | 
						||
 - UndefinedBehaviorSanitizer_ (UBSAN) detects the use of various features of C/C++ that are explicitly
 | 
						||
   listed as resulting in undefined behavior.  Use `-fsanitize=undefined -fno-sanitize-recover=undefined`
 | 
						||
   or any individual UBSAN check, e.g.  `-fsanitize=signed-integer-overflow -fno-sanitize-recover=undefined`.
 | 
						||
   You may combine ASAN and UBSAN in one build.
 | 
						||
 - MemorySanitizer_ (MSAN) detects uninitialized reads: code whose behavior relies on memory
 | 
						||
   contents that have not been initialized to a specific value. Use `-fsanitize=memory`.
 | 
						||
   MSAN can not be combined with other sanirizers and should be used as a seprate build.
 | 
						||
 | 
						||
Finally, link with ``libFuzzer.a``::
 | 
						||
 | 
						||
  clang -fsanitize-coverage=trace-pc-guard -fsanitize=address your_lib.cc fuzz_target.cc libFuzzer.a -o my_fuzzer
 | 
						||
 | 
						||
Corpus
 | 
						||
------
 | 
						||
 | 
						||
Coverage-guided fuzzers like libFuzzer rely on a corpus of sample inputs for the
 | 
						||
code under test.  This corpus should ideally be seeded with a varied collection
 | 
						||
of valid and invalid inputs for the code under test; for example, for a graphics
 | 
						||
library the initial corpus might hold a variety of different small PNG/JPG/GIF
 | 
						||
files.  The fuzzer generates random mutations based around the sample inputs in
 | 
						||
the current corpus.  If a mutation triggers execution of a previously-uncovered
 | 
						||
path in the code under test, then that mutation is saved to the corpus for
 | 
						||
future variations.
 | 
						||
 | 
						||
LibFuzzer will work without any initial seeds, but will be less
 | 
						||
efficient if the library under test accepts complex,
 | 
						||
structured inputs.
 | 
						||
 | 
						||
The corpus can also act as a sanity/regression check, to confirm that the
 | 
						||
fuzzing entrypoint still works and that all of the sample inputs run through
 | 
						||
the code under test without problems.
 | 
						||
 | 
						||
If you have a large corpus (either generated by fuzzing or acquired by other means)
 | 
						||
you may want to minimize it while still preserving the full coverage. One way to do that
 | 
						||
is to use the `-merge=1` flag:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  mkdir NEW_CORPUS_DIR  # Store minimized corpus here.
 | 
						||
  ./my_fuzzer -merge=1 NEW_CORPUS_DIR FULL_CORPUS_DIR
 | 
						||
 | 
						||
You may use the same flag to add more interesting items to an existing corpus.
 | 
						||
Only the inputs that trigger new coverage will be added to the first corpus.
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  ./my_fuzzer -merge=1 CURRENT_CORPUS_DIR NEW_POTENTIALLY_INTERESTING_INPUTS_DIR
 | 
						||
 | 
						||
 | 
						||
Running
 | 
						||
-------
 | 
						||
 | 
						||
To run the fuzzer, first create a Corpus_ directory that holds the
 | 
						||
initial "seed" sample inputs:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  mkdir CORPUS_DIR
 | 
						||
  cp /some/input/samples/* CORPUS_DIR
 | 
						||
 | 
						||
Then run the fuzzer on the corpus directory:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  ./my_fuzzer CORPUS_DIR  # -max_len=1000 -jobs=20 ...
 | 
						||
 | 
						||
As the fuzzer discovers new interesting test cases (i.e. test cases that
 | 
						||
trigger coverage of new paths through the code under test), those test cases
 | 
						||
will be added to the corpus directory.
 | 
						||
 | 
						||
By default, the fuzzing process will continue indefinitely – at least until
 | 
						||
a bug is found.  Any crashes or sanitizer failures will be reported as usual,
 | 
						||
stopping the fuzzing process, and the particular input that triggered the bug
 | 
						||
will be written to disk (typically as ``crash-<sha1>``, ``leak-<sha1>``,
 | 
						||
or ``timeout-<sha1>``).
 | 
						||
 | 
						||
 | 
						||
Parallel Fuzzing
 | 
						||
----------------
 | 
						||
 | 
						||
Each libFuzzer process is single-threaded, unless the library under test starts
 | 
						||
its own threads.  However, it is possible to run multiple libFuzzer processes in
 | 
						||
parallel with a shared corpus directory; this has the advantage that any new
 | 
						||
inputs found by one fuzzer process will be available to the other fuzzer
 | 
						||
processes (unless you disable this with the ``-reload=0`` option).
 | 
						||
 | 
						||
This is primarily controlled by the ``-jobs=N`` option, which indicates that
 | 
						||
that `N` fuzzing jobs should be run to completion (i.e. until a bug is found or
 | 
						||
time/iteration limits are reached).  These jobs will be run across a set of
 | 
						||
worker processes, by default using half of the available CPU cores; the count of
 | 
						||
worker processes can be overridden by the ``-workers=N`` option.  For example,
 | 
						||
running with ``-jobs=30`` on a 12-core machine would run 6 workers by default,
 | 
						||
with each worker averaging 5 bugs by completion of the entire process.
 | 
						||
 | 
						||
 | 
						||
Options
 | 
						||
=======
 | 
						||
 | 
						||
To run the fuzzer, pass zero or more corpus directories as command line
 | 
						||
arguments.  The fuzzer will read test inputs from each of these corpus
 | 
						||
directories, and any new test inputs that are generated will be written
 | 
						||
back to the first corpus directory:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  ./fuzzer [-flag1=val1 [-flag2=val2 ...] ] [dir1 [dir2 ...] ]
 | 
						||
 | 
						||
If a list of files (rather than directories) are passed to the fuzzer program,
 | 
						||
then it will re-run those files as test inputs but will not perform any fuzzing.
 | 
						||
In this mode the fuzzer binary can be used as a regression test (e.g. on a
 | 
						||
continuous integration system) to check the target function and saved inputs
 | 
						||
still work.
 | 
						||
 | 
						||
The most important command line options are:
 | 
						||
 | 
						||
``-help``
 | 
						||
  Print help message.
 | 
						||
``-seed``
 | 
						||
  Random seed. If 0 (the default), the seed is generated.
 | 
						||
``-runs``
 | 
						||
  Number of individual test runs, -1 (the default) to run indefinitely.
 | 
						||
``-max_len``
 | 
						||
  Maximum length of a test input. If 0 (the default), libFuzzer tries to guess
 | 
						||
  a good value based on the corpus (and reports it).
 | 
						||
``-timeout``
 | 
						||
  Timeout in seconds, default 1200. If an input takes longer than this timeout,
 | 
						||
  the process is treated as a failure case.
 | 
						||
``-rss_limit_mb``
 | 
						||
  Memory usage limit in Mb, default 2048. Use 0 to disable the limit.
 | 
						||
  If an input requires more than this amount of RSS memory to execute,
 | 
						||
  the process is treated as a failure case.
 | 
						||
  The limit is checked in a separate thread every second.
 | 
						||
  If running w/o ASAN/MSAN, you may use 'ulimit -v' instead.
 | 
						||
``-timeout_exitcode``
 | 
						||
  Exit code (default 77) used if libFuzzer reports a timeout.
 | 
						||
``-error_exitcode``
 | 
						||
  Exit code (default 77) used if libFuzzer itself (not a sanitizer) reports a bug (leak, OOM, etc).
 | 
						||
``-max_total_time``
 | 
						||
  If positive, indicates the maximum total time in seconds to run the fuzzer.
 | 
						||
  If 0 (the default), run indefinitely.
 | 
						||
``-merge``
 | 
						||
  If set to 1, any corpus inputs from the 2nd, 3rd etc. corpus directories
 | 
						||
  that trigger new code coverage will be merged into the first corpus
 | 
						||
  directory.  Defaults to 0. This flag can be used to minimize a corpus.
 | 
						||
``-minimize_crash``
 | 
						||
  If 1, minimizes the provided crash input.
 | 
						||
  Use with -runs=N or -max_total_time=N to limit the number of attempts.
 | 
						||
``-reload``
 | 
						||
  If set to 1 (the default), the corpus directory is re-read periodically to
 | 
						||
  check for new inputs; this allows detection of new inputs that were discovered
 | 
						||
  by other fuzzing processes.
 | 
						||
``-jobs``
 | 
						||
  Number of fuzzing jobs to run to completion. Default value is 0, which runs a
 | 
						||
  single fuzzing process until completion.  If the value is >= 1, then this
 | 
						||
  number of jobs performing fuzzing are run, in a collection of parallel
 | 
						||
  separate worker processes; each such worker process has its
 | 
						||
  ``stdout``/``stderr`` redirected to ``fuzz-<JOB>.log``.
 | 
						||
``-workers``
 | 
						||
  Number of simultaneous worker processes to run the fuzzing jobs to completion
 | 
						||
  in. If 0 (the default), ``min(jobs, NumberOfCpuCores()/2)`` is used.
 | 
						||
``-dict``
 | 
						||
  Provide a dictionary of input keywords; see Dictionaries_.
 | 
						||
``-use_counters``
 | 
						||
  Use `coverage counters`_ to generate approximate counts of how often code
 | 
						||
  blocks are hit; defaults to 1.
 | 
						||
``-use_value_profile``
 | 
						||
  Use `value profile`_ to guide corpus expansion; defaults to 0.
 | 
						||
``-only_ascii``
 | 
						||
  If 1, generate only ASCII (``isprint``+``isspace``) inputs. Defaults to 0.
 | 
						||
``-artifact_prefix``
 | 
						||
  Provide a prefix to use when saving fuzzing artifacts (crash, timeout, or
 | 
						||
  slow inputs) as ``$(artifact_prefix)file``.  Defaults to empty.
 | 
						||
``-exact_artifact_path``
 | 
						||
  Ignored if empty (the default).  If non-empty, write the single artifact on
 | 
						||
  failure (crash, timeout) as ``$(exact_artifact_path)``. This overrides
 | 
						||
  ``-artifact_prefix`` and will not use checksum in the file name. Do not use
 | 
						||
  the same path for several parallel processes.
 | 
						||
``-print_pcs``
 | 
						||
  If 1, print out newly covered PCs. Defaults to 0.
 | 
						||
``-print_final_stats``
 | 
						||
  If 1, print statistics at exit.  Defaults to 0.
 | 
						||
``-detect_leaks``
 | 
						||
  If 1 (default) and if LeakSanitizer is enabled
 | 
						||
  try to detect memory leaks during fuzzing (i.e. not only at shut down).
 | 
						||
``-close_fd_mask``
 | 
						||
  Indicate output streams to close at startup. Be careful, this will
 | 
						||
  remove diagnostic output from target code (e.g. messages on assert failure).
 | 
						||
 | 
						||
   - 0 (default): close neither ``stdout`` nor ``stderr``
 | 
						||
   - 1 : close ``stdout``
 | 
						||
   - 2 : close ``stderr``
 | 
						||
   - 3 : close both ``stdout`` and ``stderr``.
 | 
						||
 | 
						||
For the full list of flags run the fuzzer binary with ``-help=1``.
 | 
						||
 | 
						||
Output
 | 
						||
======
 | 
						||
 | 
						||
During operation the fuzzer prints information to ``stderr``, for example::
 | 
						||
 | 
						||
  INFO: Seed: 1523017872
 | 
						||
  INFO: Loaded 1 modules (16 guards): [0x744e60, 0x744ea0), 
 | 
						||
  INFO: -max_len is not provided, using 64
 | 
						||
  INFO: A corpus is not provided, starting from an empty corpus
 | 
						||
  #0	READ units: 1
 | 
						||
  #1	INITED cov: 3 ft: 2 corp: 1/1b exec/s: 0 rss: 24Mb
 | 
						||
  #3811	NEW    cov: 4 ft: 3 corp: 2/2b exec/s: 0 rss: 25Mb L: 1 MS: 5 ChangeBit-ChangeByte-ChangeBit-ShuffleBytes-ChangeByte-
 | 
						||
  #3827	NEW    cov: 5 ft: 4 corp: 3/4b exec/s: 0 rss: 25Mb L: 2 MS: 1 CopyPart-
 | 
						||
  #3963	NEW    cov: 6 ft: 5 corp: 4/6b exec/s: 0 rss: 25Mb L: 2 MS: 2 ShuffleBytes-ChangeBit-
 | 
						||
  #4167	NEW    cov: 7 ft: 6 corp: 5/9b exec/s: 0 rss: 25Mb L: 3 MS: 1 InsertByte-
 | 
						||
  ...
 | 
						||
 | 
						||
The early parts of the output include information about the fuzzer options and
 | 
						||
configuration, including the current random seed (in the ``Seed:`` line; this
 | 
						||
can be overridden with the ``-seed=N`` flag).
 | 
						||
 | 
						||
Further output lines have the form of an event code and statistics.  The
 | 
						||
possible event codes are:
 | 
						||
 | 
						||
``READ``
 | 
						||
  The fuzzer has read in all of the provided input samples from the corpus
 | 
						||
  directories.
 | 
						||
``INITED``
 | 
						||
  The fuzzer has completed initialization, which includes running each of
 | 
						||
  the initial input samples through the code under test.
 | 
						||
``NEW``
 | 
						||
  The fuzzer has created a test input that covers new areas of the code
 | 
						||
  under test.  This input will be saved to the primary corpus directory.
 | 
						||
``pulse``
 | 
						||
  The fuzzer has generated 2\ :sup:`n` inputs (generated periodically to reassure
 | 
						||
  the user that the fuzzer is still working).
 | 
						||
``DONE``
 | 
						||
  The fuzzer has completed operation because it has reached the specified
 | 
						||
  iteration limit (``-runs``) or time limit (``-max_total_time``).
 | 
						||
``MIN<n>``
 | 
						||
  The fuzzer is minimizing the combination of input corpus directories into
 | 
						||
  a single unified corpus (due to the ``-merge`` command line option).
 | 
						||
``RELOAD``
 | 
						||
  The fuzzer is performing a periodic reload of inputs from the corpus
 | 
						||
  directory; this allows it to discover any inputs discovered by other
 | 
						||
  fuzzer processes (see `Parallel Fuzzing`_).
 | 
						||
 | 
						||
Each output line also reports the following statistics (when non-zero):
 | 
						||
 | 
						||
``cov:``
 | 
						||
  Total number of code blocks or edges covered by the executing the current
 | 
						||
  corpus.
 | 
						||
``ft:``
 | 
						||
  libFuzzer uses different signals to evaluate the code coverage:
 | 
						||
  edge coverage, edge counters, value profiles, indirect caller/callee pairs, etc.
 | 
						||
  These signals combined are called *features* (`ft:`).
 | 
						||
``corp:``
 | 
						||
  Number of entries in the current in-memory test corpus and its size in bytes.
 | 
						||
``exec/s:``
 | 
						||
  Number of fuzzer iterations per second.
 | 
						||
``rss:``
 | 
						||
  Current memory consumption.
 | 
						||
 | 
						||
For ``NEW`` events, the output line also includes information about the mutation
 | 
						||
operation that produced the new input:
 | 
						||
 | 
						||
``L:``
 | 
						||
  Size of the new input in bytes.
 | 
						||
``MS: <n> <operations>``
 | 
						||
  Count and list of the mutation operations used to generate the input.
 | 
						||
 | 
						||
 | 
						||
Examples
 | 
						||
========
 | 
						||
.. contents::
 | 
						||
   :local:
 | 
						||
   :depth: 1
 | 
						||
 | 
						||
Toy example
 | 
						||
-----------
 | 
						||
 | 
						||
A simple function that does something interesting if it receives the input
 | 
						||
"HI!"::
 | 
						||
 | 
						||
  cat << EOF > test_fuzzer.cc
 | 
						||
  #include <stdint.h>
 | 
						||
  #include <stddef.h>
 | 
						||
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
 | 
						||
    if (size > 0 && data[0] == 'H')
 | 
						||
      if (size > 1 && data[1] == 'I')
 | 
						||
         if (size > 2 && data[2] == '!')
 | 
						||
         __builtin_trap();
 | 
						||
    return 0;
 | 
						||
  }
 | 
						||
  EOF
 | 
						||
  # Build test_fuzzer.cc with asan and link against libFuzzer.a
 | 
						||
  clang++ -fsanitize=address -fsanitize-coverage=trace-pc-guard test_fuzzer.cc libFuzzer.a
 | 
						||
  # Run the fuzzer with no corpus.
 | 
						||
  ./a.out
 | 
						||
 | 
						||
You should get an error pretty quickly::
 | 
						||
 | 
						||
  INFO: Seed: 1523017872
 | 
						||
  INFO: Loaded 1 modules (16 guards): [0x744e60, 0x744ea0), 
 | 
						||
  INFO: -max_len is not provided, using 64
 | 
						||
  INFO: A corpus is not provided, starting from an empty corpus
 | 
						||
  #0	READ units: 1
 | 
						||
  #1	INITED cov: 3 ft: 2 corp: 1/1b exec/s: 0 rss: 24Mb
 | 
						||
  #3811	NEW    cov: 4 ft: 3 corp: 2/2b exec/s: 0 rss: 25Mb L: 1 MS: 5 ChangeBit-ChangeByte-ChangeBit-ShuffleBytes-ChangeByte-
 | 
						||
  #3827	NEW    cov: 5 ft: 4 corp: 3/4b exec/s: 0 rss: 25Mb L: 2 MS: 1 CopyPart-
 | 
						||
  #3963	NEW    cov: 6 ft: 5 corp: 4/6b exec/s: 0 rss: 25Mb L: 2 MS: 2 ShuffleBytes-ChangeBit-
 | 
						||
  #4167	NEW    cov: 7 ft: 6 corp: 5/9b exec/s: 0 rss: 25Mb L: 3 MS: 1 InsertByte-
 | 
						||
  ==31511== ERROR: libFuzzer: deadly signal
 | 
						||
  ...
 | 
						||
  artifact_prefix='./'; Test unit written to ./crash-b13e8756b13a00cf168300179061fb4b91fefbed
 | 
						||
 | 
						||
 | 
						||
More examples
 | 
						||
-------------
 | 
						||
 | 
						||
Examples of real-life fuzz targets and the bugs they find can be found
 | 
						||
at http://tutorial.libfuzzer.info. Among other things you can learn how
 | 
						||
to detect Heartbleed_ in one second.
 | 
						||
 | 
						||
 | 
						||
Advanced features
 | 
						||
=================
 | 
						||
.. contents::
 | 
						||
   :local:
 | 
						||
   :depth: 1
 | 
						||
 | 
						||
Dictionaries
 | 
						||
------------
 | 
						||
LibFuzzer supports user-supplied dictionaries with input language keywords
 | 
						||
or other interesting byte sequences (e.g. multi-byte magic values).
 | 
						||
Use ``-dict=DICTIONARY_FILE``. For some input languages using a dictionary
 | 
						||
may significantly improve the search speed.
 | 
						||
The dictionary syntax is similar to that used by AFL_ for its ``-x`` option::
 | 
						||
 | 
						||
  # Lines starting with '#' and empty lines are ignored.
 | 
						||
 | 
						||
  # Adds "blah" (w/o quotes) to the dictionary.
 | 
						||
  kw1="blah"
 | 
						||
  # Use \\ for backslash and \" for quotes.
 | 
						||
  kw2="\"ac\\dc\""
 | 
						||
  # Use \xAB for hex values
 | 
						||
  kw3="\xF7\xF8"
 | 
						||
  # the name of the keyword followed by '=' may be omitted:
 | 
						||
  "foo\x0Abar"
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Tracing CMP instructions
 | 
						||
------------------------
 | 
						||
 | 
						||
With an additional compiler flag ``-fsanitize-coverage=trace-cmp``
 | 
						||
(see SanitizerCoverageTraceDataFlow_)
 | 
						||
libFuzzer will intercept CMP instructions and guide mutations based
 | 
						||
on the arguments of intercepted CMP instructions. This may slow down
 | 
						||
the fuzzing but is very likely to improve the results.
 | 
						||
 | 
						||
Value Profile
 | 
						||
-------------
 | 
						||
 | 
						||
*EXPERIMENTAL*.
 | 
						||
With  ``-fsanitize-coverage=trace-cmp``
 | 
						||
and extra run-time flag ``-use_value_profile=1`` the fuzzer will
 | 
						||
collect value profiles for the parameters of compare instructions
 | 
						||
and treat some new values as new coverage.
 | 
						||
 | 
						||
The current imlpementation does roughly the following:
 | 
						||
 | 
						||
* The compiler instruments all CMP instructions with a callback that receives both CMP arguments.
 | 
						||
* The callback computes `(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12)` and uses this value to set a bit in a bitset.
 | 
						||
* Every new observed bit in the bitset is treated as new coverage.
 | 
						||
 | 
						||
 | 
						||
This feature has a potential to discover many interesting inputs,
 | 
						||
but there are two downsides.
 | 
						||
First, the extra instrumentation may bring up to 2x additional slowdown.
 | 
						||
Second, the corpus may grow by several times.
 | 
						||
 | 
						||
Fuzzer-friendly build mode
 | 
						||
---------------------------
 | 
						||
Sometimes the code under test is not fuzzing-friendly. Examples:
 | 
						||
 | 
						||
  - The target code uses a PRNG seeded e.g. by system time and
 | 
						||
    thus two consequent invocations may potentially execute different code paths
 | 
						||
    even if the end result will be the same. This will cause a fuzzer to treat
 | 
						||
    two similar inputs as significantly different and it will blow up the test corpus.
 | 
						||
    E.g. libxml uses ``rand()`` inside its hash table.
 | 
						||
  - The target code uses checksums to protect from invalid inputs.
 | 
						||
    E.g. png checks CRC for every chunk.
 | 
						||
 | 
						||
In many cases it makes sense to build a special fuzzing-friendly build
 | 
						||
with certain fuzzing-unfriendly features disabled. We propose to use a common build macro
 | 
						||
for all such cases for consistency: ``FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION``.
 | 
						||
 | 
						||
.. code-block:: c++
 | 
						||
 | 
						||
  void MyInitPRNG() {
 | 
						||
  #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
 | 
						||
    // In fuzzing mode the behavior of the code should be deterministic.
 | 
						||
    srand(0);
 | 
						||
  #else
 | 
						||
    srand(time(0));
 | 
						||
  #endif
 | 
						||
  }
 | 
						||
 | 
						||
 | 
						||
 | 
						||
AFL compatibility
 | 
						||
-----------------
 | 
						||
LibFuzzer can be used together with AFL_ on the same test corpus.
 | 
						||
Both fuzzers expect the test corpus to reside in a directory, one file per input.
 | 
						||
You can run both fuzzers on the same corpus, one after another:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@
 | 
						||
  ./llvm-fuzz testcase_dir findings_dir  # Will write new tests to testcase_dir
 | 
						||
 | 
						||
Periodically restart both fuzzers so that they can use each other's findings.
 | 
						||
Currently, there is no simple way to run both fuzzing engines in parallel while sharing the same corpus dir.
 | 
						||
 | 
						||
You may also use AFL on your target function ``LLVMFuzzerTestOneInput``:
 | 
						||
see an example `here <https://github.com/llvm-mirror/llvm/blob/master/lib/Fuzzer/afl/afl_driver.cpp>`__.
 | 
						||
 | 
						||
How good is my fuzzer?
 | 
						||
----------------------
 | 
						||
 | 
						||
Once you implement your target function ``LLVMFuzzerTestOneInput`` and fuzz it to death,
 | 
						||
you will want to know whether the function or the corpus can be improved further.
 | 
						||
One easy to use metric is, of course, code coverage.
 | 
						||
You can get the coverage for your corpus like this:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  ASAN_OPTIONS=coverage=1 ./fuzzer CORPUS_DIR -runs=0
 | 
						||
 | 
						||
This will run all tests in the CORPUS_DIR but will not perform any fuzzing.
 | 
						||
At the end of the process it will dump a single ``.sancov`` file with coverage 
 | 
						||
information.  See SanitizerCoverage_ for details on querying the file using the
 | 
						||
``sancov`` tool.
 | 
						||
 | 
						||
You may also use other ways to visualize coverage,
 | 
						||
e.g. using `Clang coverage <http://clang.llvm.org/docs/SourceBasedCodeCoverage.html>`_,
 | 
						||
but those will require
 | 
						||
you to rebuild the code with different compiler flags.
 | 
						||
 | 
						||
User-supplied mutators
 | 
						||
----------------------
 | 
						||
 | 
						||
LibFuzzer allows to use custom (user-supplied) mutators,
 | 
						||
see FuzzerInterface.h_
 | 
						||
 | 
						||
Startup initialization
 | 
						||
----------------------
 | 
						||
If the library being tested needs to be initialized, there are several options.
 | 
						||
 | 
						||
The simplest way is to have a statically initialized global object inside
 | 
						||
`LLVMFuzzerTestOneInput` (or in global scope if that works for you):
 | 
						||
 | 
						||
.. code-block:: c++
 | 
						||
 | 
						||
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
 | 
						||
    static bool Initialized = DoInitialization();
 | 
						||
    ...
 | 
						||
 | 
						||
Alternatively, you may define an optional init function and it will receive
 | 
						||
the program arguments that you can read and modify. Do this **only** if you
 | 
						||
realy need to access ``argv``/``argc``.
 | 
						||
 | 
						||
.. code-block:: c++
 | 
						||
 | 
						||
   extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv) {
 | 
						||
    ReadAndMaybeModify(argc, argv);
 | 
						||
    return 0;
 | 
						||
   }
 | 
						||
 | 
						||
 | 
						||
Leaks
 | 
						||
-----
 | 
						||
 | 
						||
Binaries built with AddressSanitizer_ or LeakSanitizer_ will try to detect
 | 
						||
memory leaks at the process shutdown.
 | 
						||
For in-process fuzzing this is inconvenient
 | 
						||
since the fuzzer needs to report a leak with a reproducer as soon as the leaky
 | 
						||
mutation is found. However, running full leak detection after every mutation
 | 
						||
is expensive.
 | 
						||
 | 
						||
By default (``-detect_leaks=1``) libFuzzer will count the number of
 | 
						||
``malloc`` and ``free`` calls when executing every mutation.
 | 
						||
If the numbers don't match (which by itself doesn't mean there is a leak)
 | 
						||
libFuzzer will invoke the more expensive LeakSanitizer_
 | 
						||
pass and if the actual leak is found, it will be reported with the reproducer
 | 
						||
and the process will exit.
 | 
						||
 | 
						||
If your target has massive leaks and the leak detection is disabled
 | 
						||
you will eventually run out of RAM (see the ``-rss_limit_mb`` flag).
 | 
						||
 | 
						||
 | 
						||
Developing libFuzzer
 | 
						||
====================
 | 
						||
 | 
						||
Building libFuzzer as a part of LLVM project and running its test requires
 | 
						||
fresh clang as the host compiler and special CMake configuration:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
    cmake -GNinja  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=YES -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON /path/to/llvm
 | 
						||
    ninja check-fuzzer
 | 
						||
 | 
						||
 | 
						||
Fuzzing components of LLVM
 | 
						||
==========================
 | 
						||
.. contents::
 | 
						||
   :local:
 | 
						||
   :depth: 1
 | 
						||
 | 
						||
To build any of the LLVM fuzz targets use the build instructions above.
 | 
						||
 | 
						||
clang-format-fuzzer
 | 
						||
-------------------
 | 
						||
The inputs are random pieces of C++-like text.
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
    ninja clang-format-fuzzer
 | 
						||
    mkdir CORPUS_DIR
 | 
						||
    ./bin/clang-format-fuzzer CORPUS_DIR
 | 
						||
 | 
						||
Optionally build other kinds of binaries (ASan+Debug, MSan, UBSan, etc).
 | 
						||
 | 
						||
Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052
 | 
						||
 | 
						||
clang-fuzzer
 | 
						||
------------
 | 
						||
 | 
						||
The behavior is very similar to ``clang-format-fuzzer``.
 | 
						||
 | 
						||
Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
 | 
						||
 | 
						||
llvm-as-fuzzer
 | 
						||
--------------
 | 
						||
 | 
						||
Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=24639
 | 
						||
 | 
						||
llvm-mc-fuzzer
 | 
						||
--------------
 | 
						||
 | 
						||
This tool fuzzes the MC layer. Currently it is only able to fuzz the
 | 
						||
disassembler but it is hoped that assembly, and round-trip verification will be
 | 
						||
added in future.
 | 
						||
 | 
						||
When run in dissassembly mode, the inputs are opcodes to be disassembled. The
 | 
						||
fuzzer will consume as many instructions as possible and will stop when it
 | 
						||
finds an invalid instruction or runs out of data.
 | 
						||
 | 
						||
Please note that the command line interface differs slightly from that of other
 | 
						||
fuzzers. The fuzzer arguments should follow ``--fuzzer-args`` and should have
 | 
						||
a single dash, while other arguments control the operation mode and target in a
 | 
						||
similar manner to ``llvm-mc`` and should have two dashes. For example:
 | 
						||
 | 
						||
.. code-block:: console
 | 
						||
 | 
						||
  llvm-mc-fuzzer --triple=aarch64-linux-gnu --disassemble --fuzzer-args -max_len=4 -jobs=10
 | 
						||
 | 
						||
Buildbot
 | 
						||
--------
 | 
						||
 | 
						||
A buildbot continuously runs the above fuzzers for LLVM components, with results
 | 
						||
shown at http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer .
 | 
						||
 | 
						||
FAQ
 | 
						||
=========================
 | 
						||
 | 
						||
Q. Why doesn't libFuzzer use any of the LLVM support?
 | 
						||
-----------------------------------------------------
 | 
						||
 | 
						||
There are two reasons.
 | 
						||
 | 
						||
First, we want this library to be used outside of the LLVM without users having to
 | 
						||
build the rest of LLVM. This may sound unconvincing for many LLVM folks,
 | 
						||
but in practice the need for building the whole LLVM frightens many potential
 | 
						||
users -- and we want more users to use this code.
 | 
						||
 | 
						||
Second, there is a subtle technical reason not to rely on the rest of LLVM, or
 | 
						||
any other large body of code (maybe not even STL). When coverage instrumentation
 | 
						||
is enabled, it will also instrument the LLVM support code which will blow up the
 | 
						||
coverage set of the process (since the fuzzer is in-process). In other words, by
 | 
						||
using more external dependencies we will slow down the fuzzer while the main
 | 
						||
reason for it to exist is extreme speed.
 | 
						||
 | 
						||
Q. What about Windows then? The fuzzer contains code that does not build on Windows.
 | 
						||
------------------------------------------------------------------------------------
 | 
						||
 | 
						||
Volunteers are welcome.
 | 
						||
 | 
						||
Q. When libFuzzer is not a good solution for a problem?
 | 
						||
---------------------------------------------------------
 | 
						||
 | 
						||
* If the test inputs are validated by the target library and the validator
 | 
						||
  asserts/crashes on invalid inputs, in-process fuzzing is not applicable.
 | 
						||
* Bugs in the target library may accumulate without being detected. E.g. a memory
 | 
						||
  corruption that goes undetected at first and then leads to a crash while
 | 
						||
  testing another input. This is why it is highly recommended to run this
 | 
						||
  in-process fuzzer with all sanitizers to detect most bugs on the spot.
 | 
						||
* It is harder to protect the in-process fuzzer from excessive memory
 | 
						||
  consumption and infinite loops in the target library (still possible).
 | 
						||
* The target library should not have significant global state that is not
 | 
						||
  reset between the runs.
 | 
						||
* Many interesting target libraries are not designed in a way that supports
 | 
						||
  the in-process fuzzer interface (e.g. require a file path instead of a
 | 
						||
  byte array).
 | 
						||
* If a single test run takes a considerable fraction of a second (or
 | 
						||
  more) the speed benefit from the in-process fuzzer is negligible.
 | 
						||
* If the target library runs persistent threads (that outlive
 | 
						||
  execution of one test) the fuzzing results will be unreliable.
 | 
						||
 | 
						||
Q. So, what exactly this Fuzzer is good for?
 | 
						||
--------------------------------------------
 | 
						||
 | 
						||
This Fuzzer might be a good choice for testing libraries that have relatively
 | 
						||
small inputs, each input takes < 10ms to run, and the library code is not expected
 | 
						||
to crash on invalid inputs.
 | 
						||
Examples: regular expression matchers, text or binary format parsers, compression,
 | 
						||
network, crypto.
 | 
						||
 | 
						||
Trophies
 | 
						||
========
 | 
						||
* GLIBC: https://sourceware.org/glibc/wiki/FuzzingLibc
 | 
						||
 | 
						||
* MUSL LIBC: `[1] <http://git.musl-libc.org/cgit/musl/commit/?id=39dfd58417ef642307d90306e1c7e50aaec5a35c>`__ `[2] <http://www.openwall.com/lists/oss-security/2015/03/30/3>`__
 | 
						||
 | 
						||
* `pugixml <https://github.com/zeux/pugixml/issues/39>`_
 | 
						||
 | 
						||
* PCRE: Search for "LLVM fuzzer" in http://vcs.pcre.org/pcre2/code/trunk/ChangeLog?view=markup;
 | 
						||
  also in `bugzilla <https://bugs.exim.org/buglist.cgi?bug_status=__all__&content=libfuzzer&no_redirect=1&order=Importance&product=PCRE&query_format=specific>`_
 | 
						||
 | 
						||
* `ICU <http://bugs.icu-project.org/trac/ticket/11838>`_
 | 
						||
 | 
						||
* `Freetype <https://savannah.nongnu.org/search/?words=LibFuzzer&type_of_search=bugs&Search=Search&exact=1#options>`_
 | 
						||
 | 
						||
* `Harfbuzz <https://github.com/behdad/harfbuzz/issues/139>`_
 | 
						||
 | 
						||
* `SQLite <http://www3.sqlite.org/cgi/src/info/088009efdd56160b>`_
 | 
						||
 | 
						||
* `Python <http://bugs.python.org/issue25388>`_
 | 
						||
 | 
						||
* OpenSSL/BoringSSL: `[1] <https://boringssl.googlesource.com/boringssl/+/cb852981cd61733a7a1ae4fd8755b7ff950e857d>`_ `[2] <https://openssl.org/news/secadv/20160301.txt>`_ `[3] <https://boringssl.googlesource.com/boringssl/+/2b07fa4b22198ac02e0cee8f37f3337c3dba91bc>`_ `[4] <https://boringssl.googlesource.com/boringssl/+/6b6e0b20893e2be0e68af605a60ffa2cbb0ffa64>`_  `[5] <https://github.com/openssl/openssl/pull/931/commits/dd5ac557f052cc2b7f718ac44a8cb7ac6f77dca8>`_ `[6] <https://github.com/openssl/openssl/pull/931/commits/19b5b9194071d1d84e38ac9a952e715afbc85a81>`_
 | 
						||
 | 
						||
* `Libxml2
 | 
						||
  <https://bugzilla.gnome.org/buglist.cgi?bug_status=__all__&content=libFuzzer&list_id=68957&order=Importance&product=libxml2&query_format=specific>`_ and `[HT206167] <https://support.apple.com/en-gb/HT206167>`_ (CVE-2015-5312, CVE-2015-7500, CVE-2015-7942)
 | 
						||
 | 
						||
* `Linux Kernel's BPF verifier <https://github.com/iovisor/bpf-fuzzer>`_
 | 
						||
 | 
						||
* Capstone: `[1] <https://github.com/aquynh/capstone/issues/600>`__ `[2] <https://github.com/aquynh/capstone/commit/6b88d1d51eadf7175a8f8a11b690684443b11359>`__
 | 
						||
 | 
						||
* file:`[1] <http://bugs.gw.com/view.php?id=550>`__  `[2] <http://bugs.gw.com/view.php?id=551>`__  `[3] <http://bugs.gw.com/view.php?id=553>`__  `[4] <http://bugs.gw.com/view.php?id=554>`__
 | 
						||
 | 
						||
* Radare2: `[1] <https://github.com/revskills?tab=contributions&from=2016-04-09>`__
 | 
						||
 | 
						||
* gRPC: `[1] <https://github.com/grpc/grpc/pull/6071/commits/df04c1f7f6aec6e95722ec0b023a6b29b6ea871c>`__ `[2] <https://github.com/grpc/grpc/pull/6071/commits/22a3dfd95468daa0db7245a4e8e6679a52847579>`__ `[3] <https://github.com/grpc/grpc/pull/6071/commits/9cac2a12d9e181d130841092e9d40fa3309d7aa7>`__ `[4] <https://github.com/grpc/grpc/pull/6012/commits/82a91c91d01ce9b999c8821ed13515883468e203>`__ `[5] <https://github.com/grpc/grpc/pull/6202/commits/2e3e0039b30edaf89fb93bfb2c1d0909098519fa>`__ `[6] <https://github.com/grpc/grpc/pull/6106/files>`__
 | 
						||
 | 
						||
* WOFF2: `[1] <https://github.com/google/woff2/commit/a15a8ab>`__
 | 
						||
 | 
						||
* LLVM: `Clang <https://llvm.org/bugs/show_bug.cgi?id=23057>`_, `Clang-format <https://llvm.org/bugs/show_bug.cgi?id=23052>`_, `libc++ <https://llvm.org/bugs/show_bug.cgi?id=24411>`_, `llvm-as <https://llvm.org/bugs/show_bug.cgi?id=24639>`_, `Demangler <https://bugs.chromium.org/p/chromium/issues/detail?id=606626>`_, Disassembler: http://reviews.llvm.org/rL247405, http://reviews.llvm.org/rL247414, http://reviews.llvm.org/rL247416, http://reviews.llvm.org/rL247417, http://reviews.llvm.org/rL247420, http://reviews.llvm.org/rL247422.
 | 
						||
 | 
						||
* Tensorflow: `[1] <https://github.com/tensorflow/tensorflow/commit/7231d01fcb2cd9ef9ffbfea03b724892c8a4026e>`__
 | 
						||
 | 
						||
* Ffmpeg: `[1] <https://github.com/FFmpeg/FFmpeg/commit/c92f55847a3d9cd12db60bfcd0831ff7f089c37c>`__  `[2] <https://github.com/FFmpeg/FFmpeg/commit/25ab1a65f3acb5ec67b53fb7a2463a7368f1ad16>`__  `[3] <https://github.com/FFmpeg/FFmpeg/commit/85d23e5cbc9ad6835eef870a5b4247de78febe56>`__ `[4] <https://github.com/FFmpeg/FFmpeg/commit/04bd1b38ee6b8df410d0ab8d4949546b6c4af26a>`__
 | 
						||
 | 
						||
.. _pcre2: http://www.pcre.org/
 | 
						||
.. _AFL: http://lcamtuf.coredump.cx/afl/
 | 
						||
.. _Radamsa: https://github.com/aoh/radamsa
 | 
						||
.. _SanitizerCoverage: http://clang.llvm.org/docs/SanitizerCoverage.html
 | 
						||
.. _SanitizerCoverageTraceDataFlow: http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow
 | 
						||
.. _AddressSanitizer: http://clang.llvm.org/docs/AddressSanitizer.html
 | 
						||
.. _LeakSanitizer: http://clang.llvm.org/docs/LeakSanitizer.html
 | 
						||
.. _Heartbleed: http://en.wikipedia.org/wiki/Heartbleed
 | 
						||
.. _FuzzerInterface.h: https://github.com/llvm-mirror/llvm/blob/master/lib/Fuzzer/FuzzerInterface.h
 | 
						||
.. _3.7.0: http://llvm.org/releases/3.7.0/docs/LibFuzzer.html
 | 
						||
.. _building Clang from trunk: http://clang.llvm.org/get_started.html
 | 
						||
.. _MemorySanitizer: http://clang.llvm.org/docs/MemorySanitizer.html
 | 
						||
.. _UndefinedBehaviorSanitizer: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
 | 
						||
.. _`coverage counters`: http://clang.llvm.org/docs/SanitizerCoverage.html#coverage-counters
 | 
						||
.. _`value profile`: #value-profile
 | 
						||
.. _`caller-callee pairs`: http://clang.llvm.org/docs/SanitizerCoverage.html#caller-callee-coverage
 | 
						||
.. _BoringSSL: https://boringssl.googlesource.com/boringssl/
 | 
						||
.. _`fuzz various parts of LLVM itself`: `Fuzzing components of LLVM`_
 |