aristotelis e6597dbae8 Greedy set cover implementation of Merger::Merge
Extend the existing single-pass algorithm for `Merger::Merge` with an algorithm that gives better results. The new implementation is enabled with the new **set_cover_merge=1** flag.

This greedy set cover implementation produces a substantially smaller final corpus (40%-80% fewer test cases) while preserving the same features/coverage. The execution time penalty is modest (+50% for ~1M corpus files and far less for smaller corpora). These results were obtained by comparing several targets with corpora of varying sizes.

Change `Merger::CrashResistantMergeInternalStep` to collect all features from each file, not just the unique ones. This is needed for the set cover algorithm to work correctly. The implementation of the algorithm in `Merger::SetCoverMerge` uses a bitvector to store the features that are covered by a file while performing the pass. Collisions while indexing the bitvector are ignored, as they are in the fuzzer itself.
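
The idea behind a greedy set cover pass can be illustrated with a minimal sketch. This is not the code in `Merger::SetCoverMerge`; the function name `GreedySetCoverSketch`, the `NumBits` parameter, and the modulo indexing are assumptions made for illustration. It repeatedly picks the file that covers the most not-yet-covered bitvector slots and, like the description above, treats colliding features as the same slot.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not libFuzzer's Merger::SetCoverMerge.
// FileFeatures[I] holds all features observed for file I.
// Returns the indices of the chosen files in the order they were picked.
// NumBits must be greater than zero.
std::vector<size_t>
GreedySetCoverSketch(const std::vector<std::vector<uint32_t>> &FileFeatures,
                     size_t NumBits = 1 << 16) {
  // Map every feature to a bitvector slot. Distinct features may collide;
  // collisions are simply ignored (assumption: modulo indexing).
  std::vector<std::vector<size_t>> Slots(FileFeatures.size());
  for (size_t I = 0; I < FileFeatures.size(); ++I) {
    for (uint32_t F : FileFeatures[I])
      Slots[I].push_back(F % NumBits);
    // Deduplicate so each slot is counted once per file.
    std::sort(Slots[I].begin(), Slots[I].end());
    Slots[I].erase(std::unique(Slots[I].begin(), Slots[I].end()),
                   Slots[I].end());
  }

  std::vector<bool> Covered(NumBits, false);
  std::vector<size_t> Chosen;
  for (;;) {
    size_t BestFile = 0, BestGain = 0;
    for (size_t I = 0; I < Slots.size(); ++I) {
      // Gain = number of slots this file would newly cover.
      size_t Gain = 0;
      for (size_t S : Slots[I])
        if (!Covered[S])
          ++Gain;
      // Strict comparison: on ties the file with the lowest index wins.
      if (Gain > BestGain) {
        BestGain = Gain;
        BestFile = I;
      }
    }
    if (BestGain == 0)
      break; // No remaining file adds anything new.
    for (size_t S : Slots[BestFile])
      Covered[S] = true;
    Chosen.push_back(BestFile);
  }
  return Chosen;
}
```

For the inputs `{{1, 2}, {2, 3, 4, 5}, {1}, {5, 6}}` the sketch picks files 1, 0 and 3 in that order; file 2 is dropped because its only feature is already covered by file 0. This quadratic loop favors clarity over speed, so it says nothing about the timing figures quoted above.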

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D105284
2021-09-07 09:42:38 -07:00

94 lines
3.7 KiB
C++

//===- FuzzerMerge.h - merging corpora --------------------------*- C++ -* ===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
// Merging Corpora.
//
// The task:
// Take the existing corpus (possibly empty) and merge new inputs into
// it so that only inputs with new coverage ('features') are added.
// The process should tolerate crashes, OOMs, leaks, etc.
//
// Algorithm:
// The outer process collects the set of files and writes their names
// into a temporary "control" file, then repeatedly launches the inner
// process until all inputs are processed.
// The outer process does not actually execute the target code.
//
// The inner process reads the control file and sees a) the list of all inputs
// and b) the last processed input. Then it starts processing the inputs one
// by one. Before processing each input it writes one line to the control file:
// STARTED INPUT_ID INPUT_SIZE
// After processing an input it writes the following lines:
// FT INPUT_ID Feature1 Feature2 Feature3 ...
// COV INPUT_ID Coverage1 Coverage2 Coverage3 ...
// If a crash happens while processing an input, the last line in the control
// file will be "STARTED INPUT_ID INPUT_SIZE", so the next process will know
// where to resume.
//
// Once all inputs are processed by the inner process(es), the outer process
// reads the control file and does the merge based entirely on its contents.
// Merger::Merge uses a single-pass greedy algorithm that considers the
// smallest inputs first and, among inputs of the same size, prefers those
// that have more new features. Merger::SetCoverMerge is a greedy set cover
// alternative.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_FUZZER_MERGE_H
#define LLVM_FUZZER_MERGE_H
#include "FuzzerDefs.h"
#include "FuzzerIO.h"
#include <istream>
#include <ostream>
#include <set>
#include <vector>
namespace fuzzer {
struct MergeFileInfo {
  std::string Name;
  size_t Size = 0;
  std::vector<uint32_t> Features, Cov;
};
struct Merger {
  std::vector<MergeFileInfo> Files;
  size_t NumFilesInFirstCorpus = 0;
  size_t FirstNotProcessedFile = 0;
  std::string LastFailure;
  bool Parse(std::istream &IS, bool ParseCoverage);
  bool Parse(const std::string &Str, bool ParseCoverage);
  void ParseOrExit(std::istream &IS, bool ParseCoverage);
  size_t Merge(const std::set<uint32_t> &InitialFeatures,
               std::set<uint32_t> *NewFeatures,
               const std::set<uint32_t> &InitialCov, std::set<uint32_t> *NewCov,
               std::vector<std::string> *NewFiles);
  size_t SetCoverMerge(const std::set<uint32_t> &InitialFeatures,
                       std::set<uint32_t> *NewFeatures,
                       const std::set<uint32_t> &InitialCov,
                       std::set<uint32_t> *NewCov,
                       std::vector<std::string> *NewFiles);
  size_t ApproximateMemoryConsumption() const;
  std::set<uint32_t> AllFeatures() const;
};
void CrashResistantMerge(const std::vector<std::string> &Args,
                         const std::vector<SizedFile> &OldCorpus,
                         const std::vector<SizedFile> &NewCorpus,
                         std::vector<std::string> *NewFiles,
                         const std::set<uint32_t> &InitialFeatures,
                         std::set<uint32_t> *NewFeatures,
                         const std::set<uint32_t> &InitialCov,
                         std::set<uint32_t> *NewCov, const std::string &CFPath,
                         bool Verbose, bool IsSetCoverMerge);
} // namespace fuzzer
#endif // LLVM_FUZZER_MERGE_H
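
The resume logic described in the header comment (a trailing `STARTED` line with no matching `FT` line marks the input that crashed) can be sketched as follows. This is an illustrative reading of that comment, not the actual `Merger::Parse`: the names `ResumeInfoSketch` and `FindResumePointSketch` are hypothetical, the file list at the top of the control file is simply skipped, and the sketch assumes the next process resumes with the input after the one that failed.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch.
struct ResumeInfoSketch {
  size_t NextInput = 0;         // First input that still needs processing.
  bool LastInputFailed = false; // True if the log ends with a dangling STARTED.
  size_t FailedInput = 0;       // Valid only when LastInputFailed is true.
};

// Scans the STARTED/FT/COV lines of a control file and works out where the
// next inner process should resume. Lines that do not look like
// "<MARKER> <INPUT_ID> ..." are ignored.
ResumeInfoSketch FindResumePointSketch(std::istream &IS) {
  ResumeInfoSketch R;
  bool Pending = false; // A STARTED line was seen without a matching FT line.
  size_t PendingId = 0;
  std::string Line;
  while (std::getline(IS, Line)) {
    std::istringstream LS(Line);
    std::string Marker;
    size_t Id = 0;
    if (!(LS >> Marker >> Id))
      continue;
    if (Marker == "STARTED") {
      Pending = true;
      PendingId = Id;
    } else if (Marker == "FT" && Pending && Id == PendingId) {
      // The input finished: its features were written out.
      Pending = false;
      R.NextInput = Id + 1;
    }
  }
  if (Pending) {
    // The process died while running PendingId. Assumption: skip that input.
    R.LastInputFailed = true;
    R.FailedInput = PendingId;
    R.NextInput = PendingId + 1;
  }
  return R;
}
```

Fed a log that ends in `STARTED 1 20` after a completed input 0, the sketch reports input 1 as failed and input 2 as the next one to process.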