[libc++] Update utilities to compare benchmarks (#157556)
This patch replaces the previous `libcxx-compare-benchmarks` wrapper by a new
`compare-benchmarks` script which works with LNT-compatible data. This allows
comparing benchmark results across libc++ microbenchmarks, SPEC, and anything
else that would produce LNT-compatible data.

It also adds a simple script to consolidate LNT benchmark output into a single
file, simplifying the process of doing A/B runs locally. The simplest way to do
this doesn't require creating two build directories after this patch anymore.

It also adds the ability to produce either a standalone HTML chart or a plain
text output for diffing results locally when prototyping changes.

Example text output of the new tool:

```
Benchmark                              Baseline    Candidate    Difference    % Difference
-----------------------------------  ----------  -----------  ------------  --------------
BM_join_view_deques/0                      8.11         8.16          0.05            0.63
BM_join_view_deques/1                     13.56        13.79          0.23            1.69
BM_join_view_deques/1024                6606.51      7011.34        404.83            6.13
BM_join_view_deques/2                     17.99        19.92          1.93           10.72
BM_join_view_deques/4000               27655.58     29864.72       2209.14            7.99
BM_join_view_deques/4096               26218.07     30520.13       4302.05           16.41
BM_join_view_deques/512                 3231.66      2832.47       -399.19          -12.35
BM_join_view_deques/5500               47144.82     42207.41      -4937.42          -10.47
BM_join_view_deques/64                   247.23       262.66         15.43            6.24
BM_join_view_deques/64000             756221.63    511247.48    -244974.15          -32.39
BM_join_view_deques/65536             537110.91    560241.61      23130.70            4.31
BM_join_view_deques/70000             815739.07    616181.34    -199557.73          -24.46
BM_join_view_out_vectors/0                 0.93         0.93          0.00            0.07
BM_join_view_out_vectors/1                 3.11         3.14          0.03            0.82
BM_join_view_out_vectors/1024           3090.92      3563.29        472.37           15.28
BM_join_view_out_vectors/2                 5.52         5.56          0.04            0.64
BM_join_view_out_vectors/4000           9887.21      9774.40       -112.82           -1.14
BM_join_view_out_vectors/4096          10158.78     10190.44         31.66            0.31
BM_join_view_out_vectors/512            1218.68      1209.59         -9.09           -0.75
BM_join_view_out_vectors/5500          13559.23     13676.06        116.84            0.86
BM_join_view_out_vectors/64              158.95       157.91         -1.04           -0.65
BM_join_view_out_vectors/64000         178514.73    226520.97      48006.24           26.89
BM_join_view_out_vectors/65536         184639.37    207180.35      22540.98           12.21
BM_join_view_out_vectors/70000         235006.69    213886.93     -21119.77           -8.99
```
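The last two columns of the table follow from the first two. As a minimal sketch mirroring the arithmetic in the script's `plain_text_comparison` (the helper name `difference_columns` is hypothetical, and the printed table is computed from unrounded measurements, so recomputing from the rounded columns can differ in the last digit for some rows):

```python
def difference_columns(baseline, candidate):
    """Return (absolute difference, percent difference relative to the baseline)."""
    diff = candidate - baseline
    percent = 100 * diff / baseline
    return diff, percent

# Values copied from the BM_join_view_deques/64000 row of the table.
diff, percent = difference_columns(756221.63, 511247.48)
print(f"{diff:.2f} {percent:.2f}")
```

With the default `execution_time` metric, a negative difference means the candidate measured lower than the baseline.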
parent f9e5d39c4c
commit a6af641b89
@@ -471,7 +471,7 @@ removed from the Standard. These tests should be written like:
 Benchmarks
 ==========

-Libc++'s test suite also contains benchmarks. The benchmarks are written using the `Google Benchmark`_
+Libc++'s test suite also contains benchmarks. Many benchmarks are written using the `Google Benchmark`_
 library, a copy of which is stored in the LLVM monorepo. For more information about using the Google
 Benchmark library, see the `official documentation <https://github.com/google/benchmark>`_.

@@ -490,27 +490,46 @@ run through ``check-cxx`` for anything, instead run the benchmarks manually usin
 the instructions for running individual tests.

 If you want to compare the results of different benchmark runs, we recommend using the
-``libcxx-compare-benchmarks`` helper tool. First, configure CMake in a build directory
-and run the benchmark:
+``compare-benchmarks`` helper tool. Note that the script has some dependencies, which can
+be installed with:

 .. code-block:: bash

-  $ cmake -S runtimes -B <build1> [...]
-  $ libcxx/utils/libcxx-lit <build1> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+  $ python -m venv .venv && source .venv/bin/activate # Optional but recommended
+  $ pip install -r libcxx/utils/requirements.txt

-Then, do the same for the second configuration you want to test. Use a different build
-directory for that configuration:
+Once that's done, start by configuring CMake in a build directory and running one or
+more benchmarks, as usual:

 .. code-block:: bash

-  $ cmake -S runtimes -B <build2> [...]
-  $ libcxx/utils/libcxx-lit <build2> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+  $ cmake -S runtimes -B <build> [...]
+  $ libcxx/utils/libcxx-lit <build> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed

-Finally, use ``libcxx-compare-benchmarks`` to compare both:
+Then, get the consolidated benchmark output for that run using ``consolidate-benchmarks``:

 .. code-block:: bash

-  $ libcxx/utils/libcxx-compare-benchmarks <build1> <build2> libcxx/test/benchmarks/string.bench.cpp
+  $ libcxx/utils/consolidate-benchmarks <build> > baseline.lnt
+
+The ``baseline.lnt`` file will contain a consolidation of all the benchmark results present in the build
+directory. You can then make the desired modifications to the code, run the benchmark(s) again, and then run:
+
+.. code-block:: bash
+
+  $ libcxx/utils/consolidate-benchmarks <build> > candidate.lnt
+
+Finally, use ``compare-benchmarks`` to compare both:
+
+.. code-block:: bash
+
+  $ libcxx/utils/compare-benchmarks baseline.lnt candidate.lnt
+
+  # Useful one-liner when iterating locally:
+  $ libcxx/utils/compare-benchmarks baseline.lnt <(libcxx/utils/consolidate-benchmarks <build>)
+
+The ``compare-benchmarks`` script provides some useful options like creating a chart to easily visualize
+differences in a browser window. Use ``compare-benchmarks --help`` for details.

 .. _`Google Benchmark`: https://github.com/google/benchmark

123
libcxx/utils/compare-benchmarks
Executable file
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+
+import argparse
+import re
+import statistics
+import sys
+
+import plotly
+import tabulate
+
+def parse_lnt(lines):
+    """
+    Parse lines in LNT format and return a dictionary of the form:
+
+        {
+            'benchmark1': {
+                'metric1': [float],
+                'metric2': [float],
+                ...
+            },
+            'benchmark2': {
+                'metric1': [float],
+                'metric2': [float],
+                ...
+            },
+            ...
+        }
+
+    Each metric may have multiple values.
+    """
+    results = {}
+    for line in lines:
+        line = line.strip()
+        if not line:
+            continue
+
+        (identifier, value) = line.split(' ')
+        (name, metric) = identifier.split('.')
+        if name not in results:
+            results[name] = {}
+        if metric not in results[name]:
+            results[name][metric] = []
+        results[name][metric].append(float(value))
+    return results
+
+def plain_text_comparison(benchmarks, baseline, candidate):
+    """
+    Create a tabulated comparison of the baseline and the candidate.
+    """
+    headers = ['Benchmark', 'Baseline', 'Candidate', 'Difference', '% Difference']
+    fmt = (None, '.2f', '.2f', '.2f', '.2f')
+    table = []
+    for (bm, base, cand) in zip(benchmarks, baseline, candidate):
+        diff = (cand - base) if base and cand else None
+        percent = 100 * (diff / base) if base and cand else None
+        row = [bm, base, cand, diff, percent]
+        table.append(row)
+    return tabulate.tabulate(table, headers=headers, floatfmt=fmt, numalign='right')
+
+def create_chart(benchmarks, baseline, candidate):
+    """
+    Create a bar chart comparing 'baseline' and 'candidate'.
+    """
+    figure = plotly.graph_objects.Figure()
+    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=baseline, name='Baseline'))
+    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=candidate, name='Candidate'))
+    return figure
+
+def prepare_series(baseline, candidate, metric, aggregate=statistics.median):
+    """
+    Prepare the data for being formatted or displayed as a chart.
+
+    Metrics that have more than one value are aggregated using the given aggregation function.
+    """
+    all_benchmarks = sorted(list(set(baseline.keys()) | set(candidate.keys())))
+    baseline_series = []
+    candidate_series = []
+    for bm in all_benchmarks:
+        baseline_series.append(aggregate(baseline[bm][metric]) if bm in baseline and metric in baseline[bm] else None)
+        candidate_series.append(aggregate(candidate[bm][metric]) if bm in candidate and metric in candidate[bm] else None)
+    return (all_benchmarks, baseline_series, candidate_series)
+
+def main(argv):
+    parser = argparse.ArgumentParser(
+        prog='compare-benchmarks',
+        description='Compare the results of two sets of benchmarks in LNT format.',
+        epilog='This script requires the `tabulate` and the `plotly` Python modules.')
+    parser.add_argument('baseline', type=argparse.FileType('r'),
+        help='Path to a LNT format file containing the benchmark results for the baseline.')
+    parser.add_argument('candidate', type=argparse.FileType('r'),
+        help='Path to a LNT format file containing the benchmark results for the candidate.')
+    parser.add_argument('--metric', type=str, default='execution_time',
+        help='The metric to compare. LNT data may contain multiple metrics (e.g. code size, execution time, etc) -- '
+             'this option allows selecting which metric is being analyzed. The default is "execution_time".')
+    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
+        help='Path of a file where to output the resulting comparison. Default to stdout.')
+    parser.add_argument('--filter', type=str, required=False,
+        help='An optional regular expression used to filter the benchmarks included in the comparison. '
+             'Only benchmarks whose names match the regular expression will be included.')
+    parser.add_argument('--format', type=str, choices=['text', 'chart'], default='text',
+        help='Select the output format. "text" generates a plain-text comparison in tabular form, and "chart" '
+             'generates a self-contained HTML graph that can be opened in a browser. The default is text.')
+    args = parser.parse_args(argv)
+
+    baseline = parse_lnt(args.baseline.readlines())
+    candidate = parse_lnt(args.candidate.readlines())
+
+    if args.filter is not None:
+        regex = re.compile(args.filter)
+        baseline = {k: v for (k, v) in baseline.items() if regex.search(k)}
+        candidate = {k: v for (k, v) in candidate.items() if regex.search(k)}
+
+    (benchmarks, baseline_series, candidate_series) = prepare_series(baseline, candidate, args.metric)
+
+    if args.format == 'chart':
+        figure = create_chart(benchmarks, baseline_series, candidate_series)
+        plotly.io.write_html(figure, file=args.output)
+    else:
+        diff = plain_text_comparison(benchmarks, baseline_series, candidate_series)
+        args.output.write(diff)
+
+if __name__ == '__main__':
+    main(sys.argv[1:])
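To make the input format concrete: `parse_lnt` expects one `<benchmark>.<metric> <value>` pair per line, and `prepare_series` collapses repeated samples with the median by default. The following is a minimal standalone sketch of that pipeline, not the script itself. The sample data and the `parse` helper are made up for illustration, and it uses `rsplit` where the script uses a plain `split`, a deviation chosen here so that a benchmark name containing a dot would still parse.

```python
import statistics

# Hypothetical LNT-style input: "<benchmark>.<metric> <value>", one sample per line.
SAMPLE = """\
BM_example/1.execution_time 10.0
BM_example/1.execution_time 14.0
BM_example/1.execution_time 11.0

BM_example/2.execution_time 20.0
"""

def parse(lines):
    """Group samples by benchmark name, then by metric."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are skipped, as in the script
        identifier, value = line.split(' ')
        # Split on the last dot only (assumption of this sketch, see lead-in).
        name, metric = identifier.rsplit('.', 1)
        results.setdefault(name, {}).setdefault(metric, []).append(float(value))
    return results

parsed = parse(SAMPLE.splitlines())

# Aggregate repeated samples of one metric with the median.
medians = {name: statistics.median(metrics['execution_time'])
           for name, metrics in parsed.items()}
print(medians)
```

Repeated lines for the same benchmark and metric accumulate into a list, which is what lets several runs be concatenated into one file and aggregated afterwards.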
36
libcxx/utils/consolidate-benchmarks
Executable file
@@ -0,0 +1,36 @@
+#!/usr/bin/env python3
+
+import argparse
+import pathlib
+import sys
+
+def main(argv):
+    parser = argparse.ArgumentParser(
+        prog='consolidate-benchmarks',
+        description='Consolidate benchmark result files (in LNT format) into a single LNT-format file.')
+    parser.add_argument('files_or_directories', type=str, nargs='+',
+        help='Path to files or directories containing LNT data to consolidate. Directories are searched '
+             'recursively for files with a .lnt extension.')
+    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
+        help='Where to output the result. Default to stdout.')
+    args = parser.parse_args(argv)
+
+    files = []
+    for arg in args.files_or_directories:
+        path = pathlib.Path(arg)
+        if path.is_dir():
+            for p in path.rglob('*.lnt'):
+                files.append(p)
+        else:
+            files.append(path)
+
+    for file in files:
+        for line in file.open().readlines():
+            line = line.strip()
+            if not line:
+                continue
+            args.output.write(line)
+            args.output.write('\n')
+
+if __name__ == '__main__':
+    main(sys.argv[1:])
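The directory walk in `consolidate-benchmarks` can be tried without a build tree. The sketch below recreates the same two steps (a recursive `*.lnt` search, then copying the non-empty lines) against a temporary directory. The file names and contents are invented for the example, and the `sorted` call is an addition made here for a deterministic order, since the script writes files in whatever order `rglob` yields them.

```python
import pathlib
import tempfile

def consolidate(root):
    """Collect the non-empty lines of every .lnt file found under root."""
    lines = []
    for path in sorted(pathlib.Path(root).rglob('*.lnt')):
        for line in path.read_text().splitlines():
            line = line.strip()
            if line:
                lines.append(line)
    return lines

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    # Hypothetical layout: results nested at different depths, plus an unrelated file.
    (root / 'a').mkdir()
    (root / 'a' / 'first.lnt').write_text('BM_x.execution_time 1.5\n\n')
    (root / 'b' / 'nested').mkdir(parents=True)
    (root / 'b' / 'nested' / 'second.lnt').write_text('BM_y.execution_time 2.5\n')
    (root / 'b' / 'notes.txt').write_text('ignored\n')
    consolidated = consolidate(root)

print('\n'.join(consolidated))
```

Only files ending in `.lnt` contribute, so other build artifacts in the same tree are left out of the consolidated output.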
@@ -1,57 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-
-PROGNAME="$(basename "${0}")"
-MONOREPO_ROOT="$(realpath $(dirname "${PROGNAME}"))"
-function usage() {
-cat <<EOF
-Usage:
-${PROGNAME} [-h|--help] <build-directory> benchmarks...
-
-Print the path to the JSON files containing benchmark results for the given benchmarks.
-
-This requires those benchmarks to have already been run, i.e. this only resolves the path
-to the benchmark .json file within the build directory.
-
-<build-directory> The path to the build directory.
-benchmarks... Paths of the benchmarks to extract the results for. Those paths are relative to '<monorepo-root>'.
-
-Example
-=======
-$ cmake -S runtimes -B build/ -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi"
-$ libcxx-lit build/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-$ less \$(${PROGNAME} build/ libcxx/test/benchmarks/algorithms/for_each.bench.cpp)
-EOF
-}
-
-if [[ "${1}" == "-h" || "${1}" == "--help" ]]; then
-    usage
-    exit 0
-fi
-
-if [[ $# -lt 1 ]]; then
-    usage
-    exit 1
-fi
-
-build_dir="${1}"
-shift
-
-for benchmark in ${@}; do
-    # Normalize the paths by turning all benchmarks paths into absolute ones and then making them
-    # relative to the root of the monorepo.
-    benchmark="$(realpath ${benchmark})"
-    relative=$(python -c "import os; import sys; print(os.path.relpath(sys.argv[1], sys.argv[2]))" "${benchmark}" "${MONOREPO_ROOT}")
-
-    # Extract components of the benchmark path
-    directory="$(dirname ${relative})"
-    file="$(basename ${relative})"
-
-    # Reconstruct the (slightly weird) path to the benchmark json file. This should be kept in sync
-    # whenever the test suite changes.
-    json="${build_dir}/${directory}/Output/${file}.dir/benchmark-result.json"
-    if [[ -f "${json}" ]]; then
-        echo "${json}"
-    fi
-done
@@ -1,73 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-
-PROGNAME="$(basename "${0}")"
-MONOREPO_ROOT="$(realpath $(dirname "${PROGNAME}"))"
-function usage() {
-cat <<EOF
-Usage:
-${PROGNAME} [-h|--help] <baseline-build> <candidate-build> benchmarks... [-- gbench-args...]
-
-Compare the given benchmarks between the baseline and the candidate build directories.
-
-This requires those benchmarks to have already been generated in both build directories.
-
-<baseline-build> The path to the build directory considered the baseline.
-<candidate-build> The path to the build directory considered the candidate.
-benchmarks... Paths of the benchmarks to compare. Those paths are relative to '<monorepo-root>'.
-[-- gbench-args...] Any arguments provided after '--' will be passed as-is to GoogleBenchmark's compare.py tool.
-
-Example
-=======
-$ libcxx-lit build1/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-$ libcxx-lit build2/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-$ ${PROGNAME} build1/ build2/ libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-EOF
-}
-
-if [[ "${1}" == "-h" || "${1}" == "--help" ]]; then
-    usage
-    exit 0
-fi
-
-if [[ $# -lt 1 ]]; then
-    usage
-    exit 1
-fi
-
-baseline="${1}"
-candidate="${2}"
-shift; shift
-
-GBENCH="${MONOREPO_ROOT}/third-party/benchmark"
-
-python3 -m venv /tmp/libcxx-compare-benchmarks-venv
-source /tmp/libcxx-compare-benchmarks-venv/bin/activate
-pip3 install -r ${GBENCH}/tools/requirements.txt
-
-benchmarks=""
-while [[ $# -gt 0 ]]; do
-    if [[ "${1}" == "--" ]]; then
-        shift
-        break
-    fi
-    benchmarks+=" ${1}"
-    shift
-done
-
-for benchmark in ${benchmarks}; do
-    base="$(${MONOREPO_ROOT}/libcxx/utils/libcxx-benchmark-json ${baseline} ${benchmark})"
-    cand="$(${MONOREPO_ROOT}/libcxx/utils/libcxx-benchmark-json ${candidate} ${benchmark})"
-
-    if [[ ! -e "${base}" ]]; then
-        echo "Benchmark ${benchmark} does not exist in the baseline"
-        continue
-    fi
-    if [[ ! -e "${cand}" ]]; then
-        echo "Benchmark ${benchmark} does not exist in the candidate"
-        continue
-    fi
-
-    "${GBENCH}/tools/compare.py" benchmarks "${base}" "${cand}" ${@}
-done
2
libcxx/utils/requirements.txt
Normal file
@@ -0,0 +1,2 @@
+plotly
+tabulate