I found that the initial corpus allocation of fork mode has certain defects.
I designed a new initial corpus allocation strategy based on size grouping.
This method can give more energy to the small seeds in the corpus and
increase the throughput of the test.
Fuzzbench data (glibfuzzer is -fork_corpus_groups=1):
https://www.fuzzbench.com/reports/experimental/2021-08-05-parallel/index.html
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D105084
Extend the existing single-pass algorithm for `Merger::Merge` with an algorithm that gives better results. This new implementation can be used with a new **set_cover_merge=1** flag.
This greedy set cover implementation gives a substantially smaller final corpus (40%-80% less testcases) while preserving the same features/coverage. At the same time, the execution time penalty is not that significant (+50% for ~1M corpus files and far less for smaller corpora). These results were obtained by comparing several targets with varying size corpora.
Change `Merger::CrashResistantMergeInternalStep` to collect all features from each file and not just unique ones. This is needed for the set cover algorithm to work correctly. The implementation of the algorithm in `Merger::SetCoverMerge` uses a bitvector to store features that are covered by a file while performing the pass. Collisions while indexing the bitvector are ignored similarly to the fuzzer.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D105284
Attempting to build a standalone libFuzzer in Fuchsia's default toolchain for the purpose of cross-compiling the unit tests revealed a number of not-quite-proper type conversions. Fuchsia's toolchain include `-std=c++17` and `-Werror`, among others, leading to many errors like `-Wshorten-64-to-32`, `-Wimplicit-float-conversion`, etc.
Most of these have been addressed by simply making the conversion explicit with a `static_cast`. These typically fell into one of two categories: 1) conversions between types where high precision isn't critical, e.g. the "energy" calculations for `InputInfo`, and 2) conversions where the values will never reach the bits being truncated, e.g. `DftTimeInSeconds` is not going to exceed 136 years.
The major exception to this is the number of features: there are several places that treat features as `size_t`, and others as `uint32_t`. This change makes the decision to cap the features at 32 bits. The maximum value of a feature as produced by `TracePC::CollectFeatures` is roughly:
(NumPCsInPCTables + ValueBitMap::kMapSizeInBits + ExtraCountersBegin() - ExtraCountersEnd() + log2(SIZE_MAX)) * 8
It's conceivable for extremely large targets and/or extra counters that this limit could be reached. This shouldn't break fuzzing, but it will cause certain features to collide and lower the fuzzers overall precision. To address this, this change adds a warning to TracePC::PrintModuleInfo about excessive feature size if it is detected, and recommends refactoring the fuzzer into several smaller ones.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D97992
This change adds additional unit tests for fuzzer::Merger::Parse and fuzzer::Merger::Merge in anticipation of additional changes to the merge control file format to support cross-process fuzzing.
It modifies the parameter handling of Merge slightly in order to make NewFeatures and NewCov consistent with NewFiles; namely, Merge *replaces* the contents of these output parameters rather than accumulating them (thereby fixing a buggy return value).
This is change 1 of (at least) 18 for cross-process fuzzing support.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D94506
This patch adds an option "keep_seed" to keep all initial seed inputs in the
corpus. Previously, only the initial seed inputs that find new coverage were
added to the corpus, and all the other initial inputs were discarded. We
observed in some circumstances that useful initial seed inputs are discarded as
they find no new coverage, even though they contain useful fragments in them
(e.g., SQLITE3 FuzzBench benchmark). This newly added option provides a way to
keeping seed inputs in the corpus for those circumstances. With this patch, and
with -keep_seed=1, all initial seed inputs are kept in the corpus regardless of
whether they find new coverage or not. Further, these seed inputs are not
replaced with smaller inputs even if -reduce_inputs=1.
Differential Revision: https://reviews.llvm.org/D86577
It broke the Windows build:
C:\b\s\w\ir\cache\builder\src\third_party\llvm\compiler-rt\lib\fuzzer\FuzzerDataFlowTrace.cpp(243): error C3861: 'setenv': identifier not found
This also reverts the follow-up r363327.
llvm-svn: 363358
Summary:
Pass seed corpus list in a file to get around argument length limits on Windows.
This limit was preventing many uses of fork mode on Windows.
Reviewers: kcc, morehouse
Reviewed By: kcc
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D60980
llvm-svn: 359610
Summary:
Port libFuzzer's fork mode to Windows.
Implement Windows versions of MkDir, RmDir, and IterateDirRecursive to do this.
Don't print error messages under new normal uses of FileSize (on a non-existent file).
Implement portable way of piping output to /dev/null.
Fix test for Windows and comment fork-sigusr.test on why it won't be ported to Win.
Reviewers: zturner
Reviewed By: zturner
Subscribers: kcc, zturner, jdoerfert, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58513
llvm-svn: 355019