[libc++] Update utilities to compare benchmarks (#157556)

This patch replaces the previous `libcxx-compare-benchmarks` wrapper with
a new `compare-benchmarks` script which works with LNT-compatible data.
This allows comparing benchmark results across libc++ microbenchmarks,
SPEC, and anything else that would produce LNT-compatible data.

It also adds a simple script to consolidate LNT benchmark output into a
single file, simplifying the process of doing A/B runs locally. After
this patch, the simplest way to do an A/B run no longer requires creating
two build directories.

It also adds the ability to produce either a standalone HTML chart or a
plain text output for diffing results locally when prototyping changes.
Example text output of the new tool:

```
Benchmark                              Baseline    Candidate    Difference    % Difference
-----------------------------------  ----------  -----------  ------------  --------------
BM_join_view_deques/0                      8.11         8.16          0.05            0.63
BM_join_view_deques/1                     13.56        13.79          0.23            1.69
BM_join_view_deques/1024                6606.51      7011.34        404.83            6.13
BM_join_view_deques/2                     17.99        19.92          1.93           10.72
BM_join_view_deques/4000               27655.58     29864.72       2209.14            7.99
BM_join_view_deques/4096               26218.07     30520.13       4302.05           16.41
BM_join_view_deques/512                 3231.66      2832.47       -399.19          -12.35
BM_join_view_deques/5500               47144.82     42207.41      -4937.42          -10.47
BM_join_view_deques/64                   247.23       262.66         15.43            6.24
BM_join_view_deques/64000             756221.63    511247.48    -244974.15          -32.39
BM_join_view_deques/65536             537110.91    560241.61      23130.70            4.31
BM_join_view_deques/70000             815739.07    616181.34    -199557.73          -24.46
BM_join_view_out_vectors/0                 0.93         0.93          0.00            0.07
BM_join_view_out_vectors/1                 3.11         3.14          0.03            0.82
BM_join_view_out_vectors/1024           3090.92      3563.29        472.37           15.28
BM_join_view_out_vectors/2                 5.52         5.56          0.04            0.64
BM_join_view_out_vectors/4000           9887.21      9774.40       -112.82           -1.14
BM_join_view_out_vectors/4096          10158.78     10190.44         31.66            0.31
BM_join_view_out_vectors/512            1218.68      1209.59         -9.09           -0.75
BM_join_view_out_vectors/5500          13559.23     13676.06        116.84            0.86
BM_join_view_out_vectors/64              158.95       157.91         -1.04           -0.65
BM_join_view_out_vectors/64000         178514.73    226520.97      48006.24           26.89
BM_join_view_out_vectors/65536         184639.37    207180.35      22540.98           12.21
BM_join_view_out_vectors/70000         235006.69    213886.93     -21119.77           -8.99
```
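
As an illustration only (not part of the patch), the sketch below shows how the Difference and % Difference columns can be derived from LNT-style lines of the form `<benchmark>.<metric> <value>`. It loosely mirrors the parsing and median aggregation done by the new `compare-benchmarks` script. The benchmark name `BM_example/64`, the sample values, and the helper `difference` are made up for this example.

```python
import statistics


def parse_lnt(lines):
    """Collect '<benchmark>.<metric> <value>' lines into {benchmark: {metric: [values]}}."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        identifier, value = line.split(" ")
        name, metric = identifier.split(".")
        results.setdefault(name, {}).setdefault(metric, []).append(float(value))
    return results


def difference(base, cand):
    """Return (absolute, percent) difference, or (None, None) if a side is missing or zero."""
    if not base or not cand:
        return (None, None)
    diff = cand - base
    return (diff, 100 * (diff / base))


# Made-up sample data: three runs of one hypothetical benchmark on each side.
baseline_lines = [
    "BM_example/64.execution_time 100.0",
    "BM_example/64.execution_time 104.0",
    "BM_example/64.execution_time 96.0",
]
candidate_lines = [
    "BM_example/64.execution_time 125.0",
    "BM_example/64.execution_time 127.0",
    "BM_example/64.execution_time 123.0",
]

baseline = parse_lnt(baseline_lines)
candidate = parse_lnt(candidate_lines)

# Repeated measurements of the same metric are aggregated with the median.
base = statistics.median(baseline["BM_example/64"]["execution_time"])
cand = statistics.median(candidate["BM_example/64"]["execution_time"])
diff, percent = difference(base, cand)

print(f"BM_example/64 {base:.2f} {cand:.2f} {diff:.2f} {percent:.2f}")
```

In the example table above, the displayed values are rounded to two decimals, so recomputing a percentage from the rounded Baseline and Candidate columns may differ slightly from the printed % Difference.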
2025-09-09 10:52:44 -04:00
commit a6af641b89
parent f9e5d39c4c
6 changed files with 191 additions and 141 deletions
--- a/libcxx/docs/TestingLibcxx.rst
+++ b/libcxx/docs/TestingLibcxx.rst
@@ -471,7 +471,7 @@ removed from the Standard. These tests should be written like:
 Benchmarks
 ==========

-Libc++'s test suite also contains benchmarks. The benchmarks are written using the `Google Benchmark`_
+Libc++'s test suite also contains benchmarks. Many benchmarks are written using the `Google Benchmark`_
 library, a copy of which is stored in the LLVM monorepo. For more information about using the Google
 Benchmark library, see the `official documentation <https://github.com/google/benchmark>`_.

@@ -490,27 +490,46 @@ run through ``check-cxx`` for anything, instead run the benchmarks manually usin
 the instructions for running individual tests.

 If you want to compare the results of different benchmark runs, we recommend using the
-``libcxx-compare-benchmarks`` helper tool. First, configure CMake in a build directory
-and run the benchmark:
+``compare-benchmarks`` helper tool. Note that the script has some dependencies, which can
+be installed with:

 .. code-block:: bash

-  $ cmake -S runtimes -B <build1> [...]
-  $ libcxx/utils/libcxx-lit <build1> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+  $ python -m venv .venv && source .venv/bin/activate # Optional but recommended
+  $ pip install -r libcxx/utils/requirements.txt

-Then, do the same for the second configuration you want to test. Use a different build
-directory for that configuration:
+Once that's done, start by configuring CMake in a build directory and running one or
+more benchmarks, as usual:

 .. code-block:: bash

-  $ cmake -S runtimes -B <build2> [...]
-  $ libcxx/utils/libcxx-lit <build2> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+  $ cmake -S runtimes -B <build> [...]
+  $ libcxx/utils/libcxx-lit <build> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed

-Finally, use ``libcxx-compare-benchmarks`` to compare both:
+Then, get the consolidated benchmark output for that run using ``consolidate-benchmarks``:

 .. code-block:: bash

-  $ libcxx/utils/libcxx-compare-benchmarks <build1> <build2> libcxx/test/benchmarks/string.bench.cpp
+  $ libcxx/utils/consolidate-benchmarks <build> > baseline.lnt
+
+The ``baseline.lnt`` file will contain a consolidation of all the benchmark results present in the build
+directory. You can then make the desired modifications to the code, run the benchmark(s) again, and then run:
+
+.. code-block:: bash
+
+  $ libcxx/utils/consolidate-benchmarks <build> > candidate.lnt
+
+Finally, use ``compare-benchmarks`` to compare both:
+
+.. code-block:: bash
+
+  $ libcxx/utils/compare-benchmarks baseline.lnt candidate.lnt
+
+  # Useful one-liner when iterating locally:
+  $ libcxx/utils/compare-benchmarks baseline.lnt <(libcxx/utils/consolidate-benchmarks <build>)
+
+The ``compare-benchmarks`` script provides some useful options like creating a chart to easily visualize
+differences in a browser window. Use ``compare-benchmarks --help`` for details.

 .. _`Google Benchmark`: https://github.com/google/benchmark

--- a/libcxx/utils/compare-benchmarks
+++ b/libcxx/utils/compare-benchmarks
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+
+import argparse
+import re
+import statistics
+import sys
+
+import plotly
+import tabulate
+
+def parse_lnt(lines):
+    """
+    Parse lines in LNT format and return a dictionary of the form:
+
+        {
+            'benchmark1': {
+                'metric1': [float],
+                'metric2': [float],
+                ...
+            },
+            'benchmark2': {
+                'metric1': [float],
+                'metric2': [float],
+                ...
+            },
+            ...
+        }
+
+    Each metric may have multiple values.
+    """
+    results = {}
+    for line in lines:
+        line = line.strip()
+        if not line:
+            continue
+
+        (identifier, value) = line.split(' ')
+        (name, metric) = identifier.split('.')
+        if name not in results:
+            results[name] = {}
+        if metric not in results[name]:
+            results[name][metric] = []
+        results[name][metric].append(float(value))
+    return results
+
+def plain_text_comparison(benchmarks, baseline, candidate):
+    """
+    Create a tabulated comparison of the baseline and the candidate.
+    """
+    headers = ['Benchmark', 'Baseline', 'Candidate', 'Difference', '% Difference']
+    fmt = (None, '.2f', '.2f', '.2f', '.2f')
+    table = []
+    for (bm, base, cand) in zip(benchmarks, baseline, candidate):
+        diff = (cand - base) if base and cand else None
+        percent = 100 * (diff / base) if base and cand else None
+        row = [bm, base, cand, diff, percent]
+        table.append(row)
+    return tabulate.tabulate(table, headers=headers, floatfmt=fmt, numalign='right')
+
+def create_chart(benchmarks, baseline, candidate):
+    """
+    Create a bar chart comparing 'baseline' and 'candidate'.
+    """
+    figure = plotly.graph_objects.Figure()
+    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=baseline, name='Baseline'))
+    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=candidate, name='Candidate'))
+    return figure
+
+def prepare_series(baseline, candidate, metric, aggregate=statistics.median):
+    """
+    Prepare the data for being formatted or displayed as a chart.
+
+    Metrics that have more than one value are aggregated using the given aggregation function.
+    """
+    all_benchmarks = sorted(list(set(baseline.keys()) | set(candidate.keys())))
+    baseline_series = []
+    candidate_series = []
+    for bm in all_benchmarks:
+        baseline_series.append(aggregate(baseline[bm][metric]) if bm in baseline and metric in baseline[bm] else None)
+        candidate_series.append(aggregate(candidate[bm][metric]) if bm in candidate and metric in candidate[bm] else None)
+    return (all_benchmarks, baseline_series, candidate_series)
+
+def main(argv):
+    parser = argparse.ArgumentParser(
+        prog='compare-benchmarks',
+        description='Compare the results of two sets of benchmarks in LNT format.',
+        epilog='This script requires the `tabulate` and the `plotly` Python modules.')
+    parser.add_argument('baseline', type=argparse.FileType('r'),
+        help='Path to a LNT format file containing the benchmark results for the baseline.')
+    parser.add_argument('candidate', type=argparse.FileType('r'),
+        help='Path to a LNT format file containing the benchmark results for the candidate.')
+    parser.add_argument('--metric', type=str, default='execution_time',
+        help='The metric to compare. LNT data may contain multiple metrics (e.g. code size, execution time, etc) -- '
+             'this option allows selecting which metric is being analyzed. The default is "execution_time".')
+    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
+        help='Path of a file where to output the resulting comparison. Default to stdout.')
+    parser.add_argument('--filter', type=str, required=False,
+        help='An optional regular expression used to filter the benchmarks included in the comparison. '
+             'Only benchmarks whose names match the regular expression will be included.')
+    parser.add_argument('--format', type=str, choices=['text', 'chart'], default='text',
+        help='Select the output format. "text" generates a plain-text comparison in tabular form, and "chart" '
+             'generates a self-contained HTML graph that can be opened in a browser. The default is text.')
+    args = parser.parse_args(argv)
+
+    baseline = parse_lnt(args.baseline.readlines())
+    candidate = parse_lnt(args.candidate.readlines())
+
+    if args.filter is not None:
+        regex = re.compile(args.filter)
+        baseline = {k: v for (k, v) in baseline.items() if regex.search(k)}
+        candidate = {k: v for (k, v) in candidate.items() if regex.search(k)}
+
+    (benchmarks, baseline_series, candidate_series) = prepare_series(baseline, candidate, args.metric)
+
+    if args.format == 'chart':
+        figure = create_chart(benchmarks, baseline_series, candidate_series)
+        plotly.io.write_html(figure, file=args.output)
+    else:
+        diff = plain_text_comparison(benchmarks, baseline_series, candidate_series)
+        args.output.write(diff)
+
+if __name__ == '__main__':
+    main(sys.argv[1:])
--- a/libcxx/utils/consolidate-benchmarks
+++ b/libcxx/utils/consolidate-benchmarks
@@ -0,0 +1,36 @@
+#!/usr/bin/env python3
+
+import argparse
+import pathlib
+import sys
+
+def main(argv):
+    parser = argparse.ArgumentParser(
+        prog='consolidate-benchmarks',
+        description='Consolidate benchmark result files (in LNT format) into a single LNT-format file.')
+    parser.add_argument('files_or_directories', type=str, nargs='+',
+        help='Path to files or directories containing LNT data to consolidate. Directories are searched '
+             'recursively for files with a .lnt extension.')
+    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
+        help='Where to output the result. Default to stdout.')
+    args = parser.parse_args(argv)
+
+    files = []
+    for arg in args.files_or_directories:
+        path = pathlib.Path(arg)
+        if path.is_dir():
+            for p in path.rglob('*.lnt'):
+                files.append(p)
+        else:
+            files.append(path)
+
+    for file in files:
+        for line in file.open().readlines():
+            line = line.strip()
+            if not line:
+                continue
+            args.output.write(line)
+            args.output.write('\n')
+
+if __name__ == '__main__':
+    main(sys.argv[1:])
--- a/libcxx/utils/libcxx-benchmark-json
+++ b/libcxx/utils/libcxx-benchmark-json
@@ -1,57 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-
-PROGNAME="$(basename "${0}")"
-MONOREPO_ROOT="$(realpath $(dirname "${PROGNAME}"))"
-function usage() {
-cat <<EOF
-Usage:
-${PROGNAME} [-h|--help] <build-directory> benchmarks...
-
-Print the path to the JSON files containing benchmark results for the given benchmarks.
-
-This requires those benchmarks to have already been run, i.e. this only resolves the path
-to the benchmark .json file within the build directory.
-
-<build-directory>  The path to the build directory.
-benchmarks...      Paths of the benchmarks to extract the results for. Those paths are relative to '<monorepo-root>'.
-
-Example
-=======
-$ cmake -S runtimes -B build/ -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi"
-$ libcxx-lit build/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-$ less \$(${PROGNAME} build/ libcxx/test/benchmarks/algorithms/for_each.bench.cpp)
-EOF
-}
-
-if [[ "${1}" == "-h" || "${1}" == "--help" ]]; then
-    usage
-    exit 0
-fi
-
-if [[ $# -lt 1 ]]; then
-    usage
-    exit 1
-fi
-
-build_dir="${1}"
-shift
-
-for benchmark in ${@}; do
-    # Normalize the paths by turning all benchmarks paths into absolute ones and then making them
-    # relative to the root of the monorepo.
-    benchmark="$(realpath ${benchmark})"
-    relative=$(python -c "import os; import sys; print(os.path.relpath(sys.argv[1], sys.argv[2]))" "${benchmark}" "${MONOREPO_ROOT}")
-
-    # Extract components of the benchmark path
-    directory="$(dirname ${relative})"
-    file="$(basename ${relative})"
-
-    # Reconstruct the (slightly weird) path to the benchmark json file. This should be kept in sync
-    # whenever the test suite changes.
-    json="${build_dir}/${directory}/Output/${file}.dir/benchmark-result.json"
-    if [[ -f "${json}" ]]; then
-        echo "${json}"
-    fi
-done
--- a/libcxx/utils/libcxx-compare-benchmarks
+++ b/libcxx/utils/libcxx-compare-benchmarks
@@ -1,73 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-
-PROGNAME="$(basename "${0}")"
-MONOREPO_ROOT="$(realpath $(dirname "${PROGNAME}"))"
-function usage() {
-cat <<EOF
-Usage:
-${PROGNAME} [-h|--help] <baseline-build> <candidate-build> benchmarks... [-- gbench-args...]
-
-Compare the given benchmarks between the baseline and the candidate build directories.
-
-This requires those benchmarks to have already been generated in both build directories.
-
-<baseline-build>     The path to the build directory considered the baseline.
-<candidate-build>    The path to the build directory considered the candidate.
-benchmarks...        Paths of the benchmarks to compare. Those paths are relative to '<monorepo-root>'.
-[-- gbench-args...]  Any arguments provided after '--' will be passed as-is to GoogleBenchmark's compare.py tool.
-
-Example
-=======
-$ libcxx-lit build1/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-$ libcxx-lit build2/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-$ ${PROGNAME} build1/ build2/ libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-EOF
-}
-
-if [[ "${1}" == "-h" || "${1}" == "--help" ]]; then
-    usage
-    exit 0
-fi
-
-if [[ $# -lt 1 ]]; then
-    usage
-    exit 1
-fi
-
-baseline="${1}"
-candidate="${2}"
-shift; shift
-
-GBENCH="${MONOREPO_ROOT}/third-party/benchmark"
-
-python3 -m venv /tmp/libcxx-compare-benchmarks-venv
-source /tmp/libcxx-compare-benchmarks-venv/bin/activate
-pip3 install -r ${GBENCH}/tools/requirements.txt
-
-benchmarks=""
-while [[ $# -gt 0 ]]; do
-    if [[ "${1}" == "--" ]]; then
-        shift
-        break
-    fi
-    benchmarks+=" ${1}"
-    shift
-done
-
-for benchmark in ${benchmarks}; do
-    base="$(${MONOREPO_ROOT}/libcxx/utils/libcxx-benchmark-json ${baseline} ${benchmark})"
-    cand="$(${MONOREPO_ROOT}/libcxx/utils/libcxx-benchmark-json ${candidate} ${benchmark})"
-
-    if [[ ! -e "${base}" ]]; then
-        echo "Benchmark ${benchmark} does not exist in the baseline"
-        continue
-    fi
-    if [[ ! -e "${cand}" ]]; then
-        echo "Benchmark ${benchmark} does not exist in the candidate"
-        continue
-    fi
-
-    "${GBENCH}/tools/compare.py" benchmarks "${base}" "${cand}" ${@}
-done
--- a/libcxx/utils/requirements.txt
+++ b/libcxx/utils/requirements.txt
@@ -0,0 +1,2 @@
+plotly
+tabulate