[libc++] Update utilities to compare benchmarks (#157556)

This patch replaces the previous `libcxx-compare-benchmarks` wrapper with
a new `compare-benchmarks` script that works with LNT-compatible data.
This allows comparing benchmark results across libc++ microbenchmarks,
SPEC, and anything else that would produce LNT-compatible data.

It also adds a simple script to consolidate LNT benchmark output into a
single file, simplifying the process of doing A/B runs locally. After
this patch, the simplest way to do an A/B run no longer requires
creating two build directories.

It also adds the ability to produce either a standalone HTML chart or a
plain text output for diffing results locally when prototyping changes.
Example text output of the new tool:

```
Benchmark                              Baseline    Candidate    Difference    % Difference
-----------------------------------  ----------  -----------  ------------  --------------
BM_join_view_deques/0                      8.11         8.16          0.05            0.63
BM_join_view_deques/1                     13.56        13.79          0.23            1.69
BM_join_view_deques/1024                6606.51      7011.34        404.83            6.13
BM_join_view_deques/2                     17.99        19.92          1.93           10.72
BM_join_view_deques/4000               27655.58     29864.72       2209.14            7.99
BM_join_view_deques/4096               26218.07     30520.13       4302.05           16.41
BM_join_view_deques/512                 3231.66      2832.47       -399.19          -12.35
BM_join_view_deques/5500               47144.82     42207.41      -4937.42          -10.47
BM_join_view_deques/64                   247.23       262.66         15.43            6.24
BM_join_view_deques/64000             756221.63    511247.48    -244974.15          -32.39
BM_join_view_deques/65536             537110.91    560241.61      23130.70            4.31
BM_join_view_deques/70000             815739.07    616181.34    -199557.73          -24.46
BM_join_view_out_vectors/0                 0.93         0.93          0.00            0.07
BM_join_view_out_vectors/1                 3.11         3.14          0.03            0.82
BM_join_view_out_vectors/1024           3090.92      3563.29        472.37           15.28
BM_join_view_out_vectors/2                 5.52         5.56          0.04            0.64
BM_join_view_out_vectors/4000           9887.21      9774.40       -112.82           -1.14
BM_join_view_out_vectors/4096          10158.78     10190.44         31.66            0.31
BM_join_view_out_vectors/512            1218.68      1209.59         -9.09           -0.75
BM_join_view_out_vectors/5500          13559.23     13676.06        116.84            0.86
BM_join_view_out_vectors/64              158.95       157.91         -1.04           -0.65
BM_join_view_out_vectors/64000        178514.73    226520.97      48006.24           26.89
BM_join_view_out_vectors/65536        184639.37    207180.35      22540.98           12.21
BM_join_view_out_vectors/70000        235006.69    213886.93     -21119.77           -8.99
```
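The last two columns follow directly from the first two. Below is a minimal sketch of that arithmetic, using the rounded values of one row from the table above. The `difference` helper is a hypothetical name used only for illustration, and since the tool works on unrounded data, the last printed digit may differ from the table.

```python
def difference(baseline, candidate):
    """Return the (absolute, percent) difference of candidate relative to baseline."""
    diff = candidate - baseline
    percent = 100 * diff / baseline
    return diff, percent


# Rounded values taken from the BM_join_view_deques/4096 row above.
diff, percent = difference(26218.07, 30520.13)
print(f"{diff:.2f} {percent:.2f}")
```

For a time-based metric, a positive difference means the candidate is slower than the baseline, and a negative one means it is faster.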
Louis Dionne 2025-09-09 10:52:44 -04:00 committed by GitHub
parent f9e5d39c4c
commit a6af641b89
6 changed files with 191 additions and 141 deletions

@@ -471,7 +471,7 @@ removed from the Standard. These tests should be written like:
 Benchmarks
 ==========
-Libc++'s test suite also contains benchmarks. The benchmarks are written using the `Google Benchmark`_
+Libc++'s test suite also contains benchmarks. Many benchmarks are written using the `Google Benchmark`_
 library, a copy of which is stored in the LLVM monorepo. For more information about using the Google
 Benchmark library, see the `official documentation <https://github.com/google/benchmark>`_.
@@ -490,27 +490,46 @@ run through ``check-cxx`` for anything, instead run the benchmarks manually usin
 the instructions for running individual tests.
 If you want to compare the results of different benchmark runs, we recommend using the
-``libcxx-compare-benchmarks`` helper tool. First, configure CMake in a build directory
-and run the benchmark:
+``compare-benchmarks`` helper tool. Note that the script has some dependencies, which can
+be installed with:
 .. code-block:: bash
-  $ cmake -S runtimes -B <build1> [...]
-  $ libcxx/utils/libcxx-lit <build1> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+  $ python -m venv .venv && source .venv/bin/activate # Optional but recommended
+  $ pip install -r libcxx/utils/requirements.txt
-Then, do the same for the second configuration you want to test. Use a different build
-directory for that configuration:
+Once that's done, start by configuring CMake in a build directory and running one or
+more benchmarks, as usual:
 .. code-block:: bash
-  $ cmake -S runtimes -B <build2> [...]
-  $ libcxx/utils/libcxx-lit <build2> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+  $ cmake -S runtimes -B <build> [...]
+  $ libcxx/utils/libcxx-lit <build> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
-Finally, use ``libcxx-compare-benchmarks`` to compare both:
+Then, get the consolidated benchmark output for that run using ``consolidate-benchmarks``:
 .. code-block:: bash
-  $ libcxx/utils/libcxx-compare-benchmarks <build1> <build2> libcxx/test/benchmarks/string.bench.cpp
+  $ libcxx/utils/consolidate-benchmarks <build> > baseline.lnt
+The ``baseline.lnt`` file will contain a consolidation of all the benchmark results present in the build
+directory. You can then make the desired modifications to the code, run the benchmark(s) again, and then run:
+.. code-block:: bash
+  $ libcxx/utils/consolidate-benchmarks <build> > candidate.lnt
+Finally, use ``compare-benchmarks`` to compare both:
+.. code-block:: bash
+  $ libcxx/utils/compare-benchmarks baseline.lnt candidate.lnt
+  # Useful one-liner when iterating locally:
+  $ libcxx/utils/compare-benchmarks baseline.lnt <(libcxx/utils/consolidate-benchmarks <build>)
+The ``compare-benchmarks`` script provides some useful options like creating a chart to easily visualize
+differences in a browser window. Use ``compare-benchmarks --help`` for details.
 .. _`Google Benchmark`: https://github.com/google/benchmark

libcxx/utils/compare-benchmarks Executable file
@@ -0,0 +1,123 @@
#!/usr/bin/env python3

import argparse
import re
import statistics
import sys

import plotly
import tabulate

def parse_lnt(lines):
    """
    Parse lines in LNT format and return a dictionary of the form:

        {
            'benchmark1': {
                'metric1': [float],
                'metric2': [float],
                ...
            },
            'benchmark2': {
                'metric1': [float],
                'metric2': [float],
                ...
            },
            ...
        }

    Each metric may have multiple values.
    """
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue

        (identifier, value) = line.split(' ')
        (name, metric) = identifier.split('.')
        if name not in results:
            results[name] = {}
        if metric not in results[name]:
            results[name][metric] = []
        results[name][metric].append(float(value))
    return results

def plain_text_comparison(benchmarks, baseline, candidate):
    """
    Create a tabulated comparison of the baseline and the candidate.
    """
    headers = ['Benchmark', 'Baseline', 'Candidate', 'Difference', '% Difference']
    fmt = (None, '.2f', '.2f', '.2f', '.2f')
    table = []
    for (bm, base, cand) in zip(benchmarks, baseline, candidate):
        diff = (cand - base) if base and cand else None
        percent = 100 * (diff / base) if base and cand else None
        row = [bm, base, cand, diff, percent]
        table.append(row)
    return tabulate.tabulate(table, headers=headers, floatfmt=fmt, numalign='right')

def create_chart(benchmarks, baseline, candidate):
    """
    Create a bar chart comparing 'baseline' and 'candidate'.
    """
    figure = plotly.graph_objects.Figure()
    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=baseline, name='Baseline'))
    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=candidate, name='Candidate'))
    return figure

def prepare_series(baseline, candidate, metric, aggregate=statistics.median):
    """
    Prepare the data for being formatted or displayed as a chart.

    Metrics that have more than one value are aggregated using the given aggregation function.
    """
    all_benchmarks = sorted(list(set(baseline.keys()) | set(candidate.keys())))
    baseline_series = []
    candidate_series = []
    for bm in all_benchmarks:
        baseline_series.append(aggregate(baseline[bm][metric]) if bm in baseline and metric in baseline[bm] else None)
        candidate_series.append(aggregate(candidate[bm][metric]) if bm in candidate and metric in candidate[bm] else None)
    return (all_benchmarks, baseline_series, candidate_series)

def main(argv):
    parser = argparse.ArgumentParser(
        prog='compare-benchmarks',
        description='Compare the results of two sets of benchmarks in LNT format.',
        epilog='This script requires the `tabulate` and the `plotly` Python modules.')
    parser.add_argument('baseline', type=argparse.FileType('r'),
        help='Path to a LNT format file containing the benchmark results for the baseline.')
    parser.add_argument('candidate', type=argparse.FileType('r'),
        help='Path to a LNT format file containing the benchmark results for the candidate.')
    parser.add_argument('--metric', type=str, default='execution_time',
        help='The metric to compare. LNT data may contain multiple metrics (e.g. code size, execution time, etc) -- '
             'this option allows selecting which metric is being analyzed. The default is "execution_time".')
    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
        help='Path of a file where to output the resulting comparison. Default to stdout.')
    parser.add_argument('--filter', type=str, required=False,
        help='An optional regular expression used to filter the benchmarks included in the comparison. '
             'Only benchmarks whose names match the regular expression will be included.')
    parser.add_argument('--format', type=str, choices=['text', 'chart'], default='text',
        help='Select the output format. "text" generates a plain-text comparison in tabular form, and "chart" '
             'generates a self-contained HTML graph that can be opened in a browser. The default is text.')
    args = parser.parse_args(argv)

    baseline = parse_lnt(args.baseline.readlines())
    candidate = parse_lnt(args.candidate.readlines())

    if args.filter is not None:
        regex = re.compile(args.filter)
        baseline = {k: v for (k, v) in baseline.items() if regex.search(k)}
        candidate = {k: v for (k, v) in candidate.items() if regex.search(k)}

    (benchmarks, baseline_series, candidate_series) = prepare_series(baseline, candidate, args.metric)

    if args.format == 'chart':
        figure = create_chart(benchmarks, baseline_series, candidate_series)
        plotly.io.write_html(figure, file=args.output)
    else:
        diff = plain_text_comparison(benchmarks, baseline_series, candidate_series)
        args.output.write(diff)

if __name__ == '__main__':
    main(sys.argv[1:])
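
As a rough illustration of the data flow in this script: the input it consumes is one `<benchmark>.<metric> <value>` pair per line, and repeated samples of the same metric are aggregated with the median by default. The sketch below uses only the standard library; the benchmark names and values are made up, and `parse` is a simplified stand-in for `parse_lnt`, not the script's own code.

```python
import statistics

# Made-up LNT-style samples: "<benchmark>.<metric> <value>", one per line.
SAMPLE = """\
bm_copy.execution_time 10.0
bm_copy.execution_time 12.0
bm_copy.execution_time 50.0
bm_copy.code_size 2048

bm_sort.execution_time 7.5
"""


def parse(text):
    """Group values by benchmark name, then by metric (simplified stand-in)."""
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are skipped
        identifier, value = line.split(' ')
        name, metric = identifier.split('.')
        results.setdefault(name, {}).setdefault(metric, []).append(float(value))
    return results


data = parse(SAMPLE)

# Aggregate repeated samples of one metric with the median.
medians = {
    name: statistics.median(metrics['execution_time'])
    for name, metrics in data.items()
    if 'execution_time' in metrics
}
print(medians)
```

Using the median means the single outlier sample (50.0) does not move the aggregated value for `bm_copy`. Note that a two-way unpacking of `identifier.split('.')` assumes the identifier contains exactly one dot.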

@@ -0,0 +1,36 @@
#!/usr/bin/env python3

import argparse
import pathlib
import sys

def main(argv):
    parser = argparse.ArgumentParser(
        prog='consolidate-benchmarks',
        description='Consolidate benchmark result files (in LNT format) into a single LNT-format file.')
    parser.add_argument('files_or_directories', type=str, nargs='+',
        help='Path to files or directories containing LNT data to consolidate. Directories are searched '
             'recursively for files with a .lnt extension.')
    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
        help='Where to output the result. Default to stdout.')
    args = parser.parse_args(argv)

    files = []
    for arg in args.files_or_directories:
        path = pathlib.Path(arg)
        if path.is_dir():
            for p in path.rglob('*.lnt'):
                files.append(p)
        else:
            files.append(path)

    for file in files:
        for line in file.open().readlines():
            line = line.strip()
            if not line:
                continue
            args.output.write(line)
            args.output.write('\n')

if __name__ == '__main__':
    main(sys.argv[1:])
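
The consolidation step amounts to a recursive search for `.lnt` files followed by concatenating their non-blank lines. Here is a small self-contained sketch of that idea, run against a temporary directory. The directory layout, file names, and the `consolidate` helper are hypothetical, and the `sorted()` call is added only to make the demo deterministic; the script itself does not sort.

```python
import pathlib
import tempfile


def consolidate(root):
    """Collect the non-blank lines of every .lnt file found under root."""
    lines = []
    # sorted() is an addition for this demo so the order is reproducible.
    for path in sorted(pathlib.Path(root).rglob('*.lnt')):
        for line in path.read_text().splitlines():
            line = line.strip()
            if line:
                lines.append(line)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'a' / 'Output').mkdir(parents=True)
    (root / 'b').mkdir()
    # Hypothetical result files; the trailing blank line is dropped on output.
    (root / 'a' / 'Output' / 'results.lnt').write_text('bm_copy.execution_time 10.0\n\n')
    (root / 'b' / 'results.lnt').write_text('bm_sort.execution_time 7.5\n')
    (root / 'b' / 'notes.txt').write_text('ignored\n')  # not a .lnt file
    consolidated = consolidate(root)

print(consolidated)
```

Because every `.lnt` file under the directory is picked up, results left over from earlier runs in the same build directory would presumably end up in the consolidated output as well.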

@@ -1,57 +0,0 @@
#!/usr/bin/env bash
set -e
PROGNAME="$(basename "${0}")"
MONOREPO_ROOT="$(realpath $(dirname "${PROGNAME}"))"
function usage() {
cat <<EOF
Usage:
${PROGNAME} [-h|--help] <build-directory> benchmarks...
Print the path to the JSON files containing benchmark results for the given benchmarks.
This requires those benchmarks to have already been run, i.e. this only resolves the path
to the benchmark .json file within the build directory.
<build-directory> The path to the build directory.
benchmarks... Paths of the benchmarks to extract the results for. Those paths are relative to '<monorepo-root>'.
Example
=======
$ cmake -S runtimes -B build/ -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi"
$ libcxx-lit build/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
$ less \$(${PROGNAME} build/ libcxx/test/benchmarks/algorithms/for_each.bench.cpp)
EOF
}
if [[ "${1}" == "-h" || "${1}" == "--help" ]]; then
usage
exit 0
fi
if [[ $# -lt 1 ]]; then
usage
exit 1
fi
build_dir="${1}"
shift
for benchmark in ${@}; do
# Normalize the paths by turning all benchmarks paths into absolute ones and then making them
# relative to the root of the monorepo.
benchmark="$(realpath ${benchmark})"
relative=$(python -c "import os; import sys; print(os.path.relpath(sys.argv[1], sys.argv[2]))" "${benchmark}" "${MONOREPO_ROOT}")
# Extract components of the benchmark path
directory="$(dirname ${relative})"
file="$(basename ${relative})"
# Reconstruct the (slightly weird) path to the benchmark json file. This should be kept in sync
# whenever the test suite changes.
json="${build_dir}/${directory}/Output/${file}.dir/benchmark-result.json"
if [[ -f "${json}" ]]; then
echo "${json}"
fi
done

@@ -1,73 +0,0 @@
#!/usr/bin/env bash
set -e
PROGNAME="$(basename "${0}")"
MONOREPO_ROOT="$(realpath $(dirname "${PROGNAME}"))"
function usage() {
cat <<EOF
Usage:
${PROGNAME} [-h|--help] <baseline-build> <candidate-build> benchmarks... [-- gbench-args...]
Compare the given benchmarks between the baseline and the candidate build directories.
This requires those benchmarks to have already been generated in both build directories.
<baseline-build> The path to the build directory considered the baseline.
<candidate-build> The path to the build directory considered the candidate.
benchmarks... Paths of the benchmarks to compare. Those paths are relative to '<monorepo-root>'.
[-- gbench-args...] Any arguments provided after '--' will be passed as-is to GoogleBenchmark's compare.py tool.
Example
=======
$ libcxx-lit build1/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
$ libcxx-lit build2/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
$ ${PROGNAME} build1/ build2/ libcxx/test/benchmarks/algorithms/for_each.bench.cpp
EOF
}
if [[ "${1}" == "-h" || "${1}" == "--help" ]]; then
usage
exit 0
fi
if [[ $# -lt 1 ]]; then
usage
exit 1
fi
baseline="${1}"
candidate="${2}"
shift; shift
GBENCH="${MONOREPO_ROOT}/third-party/benchmark"
python3 -m venv /tmp/libcxx-compare-benchmarks-venv
source /tmp/libcxx-compare-benchmarks-venv/bin/activate
pip3 install -r ${GBENCH}/tools/requirements.txt
benchmarks=""
while [[ $# -gt 0 ]]; do
if [[ "${1}" == "--" ]]; then
shift
break
fi
benchmarks+=" ${1}"
shift
done
for benchmark in ${benchmarks}; do
base="$(${MONOREPO_ROOT}/libcxx/utils/libcxx-benchmark-json ${baseline} ${benchmark})"
cand="$(${MONOREPO_ROOT}/libcxx/utils/libcxx-benchmark-json ${candidate} ${benchmark})"
if [[ ! -e "${base}" ]]; then
echo "Benchmark ${benchmark} does not exist in the baseline"
continue
fi
if [[ ! -e "${cand}" ]]; then
echo "Benchmark ${benchmark} does not exist in the candidate"
continue
fi
"${GBENCH}/tools/compare.py" benchmarks "${base}" "${cand}" ${@}
done

@@ -0,0 +1,2 @@
plotly
tabulate