Previously, the segmented iterator optimization was limited to `std::{for_each, for_each_n}`. This patch
extends the optimization to `std::ranges::for_each` and `std::ranges::for_each_n`, ensuring consistent
optimizations across these algorithms. This patch first generalizes the `std` algorithms by introducing
a `Projection` parameter, which is set to `__identity` for the `std` algorithms. The `ranges`
algorithms then directly call their `std` counterparts, passing their `__proj` argument through. Benchmarks
demonstrate performance improvements of up to 21.4x for `std::deque::iterator` and 22.3x for
`join_view` of `vector<vector<char>>`.
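For illustration, here is a minimal sketch of the shape of this generalization; `identity_proj` and `generic_for_each` are hypothetical stand-ins for libc++'s internal `__identity` and `for_each` machinery, and the segmented-iterator dispatch is omitted:

```cpp
#include <functional>

// Stand-in for libc++'s __identity: a projection that returns its argument.
struct identity_proj {
  template <class T>
  constexpr T&& operator()(T&& t) const noexcept {
    return static_cast<T&&>(t);
  }
};

// The generalized algorithm takes a projection; the plain std:: entry point
// would pass identity_proj{}, while the ranges:: overload forwards the
// user-provided projection.
template <class Iter, class Sent, class Func, class Proj>
constexpr Func generic_for_each(Iter first, Sent last, Func f, Proj proj) {
  for (; first != last; ++first)
    std::invoke(f, std::invoke(proj, *first));
  return f;
}
```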
Addresses a subtask of #102817.
This patch enhances the performance of `std::for_each_n` when used with
segmented iterators, leading to the significant improvements summarized
in the tables below. This addresses a subtask of
https://github.com/llvm/llvm-project/issues/102817.
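The core of the segmented-iterator optimization is to traverse each underlying segment with a tight, pointer-based inner loop rather than advancing the segmented iterator one element at a time. A rough sketch of the idea, where `local()` and `segment_end()` are hypothetical accessors standing in for what libc++'s `__segmented_iterator_traits` provides:

```cpp
#include <algorithm>

// Assumed interface: local() returns a pointer into the current segment,
// segment_end() a pointer one past that segment's last element.
template <class SegIter, class Size, class Func>
SegIter for_each_n_segmented(SegIter first, Size n, Func f) {
  while (n > 0) {
    auto* p   = first.local();
    auto* end = first.segment_end();
    Size len  = std::min<Size>(n, static_cast<Size>(end - p));
    for (Size i = 0; i < len; ++i)
      f(p[i]);      // fast inner loop over contiguous memory
    first += len;   // crosses a segment boundary when necessary
    n -= len;
  }
  return first;
}
```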
This patch adds missing benchmark coverage for partial_sort,
partial_sort_copy, is_sorted and is_sorted_until.
It also refactors the existing benchmarks for sort and stable_sort to
follow the consistent style of the new algorithm benchmarks. These
benchmarks were notoriously slow to run since they tested multiple data
patterns on multiple data types. To alleviate this, I reduced the
benchmarks to run only on integral types and on a single non-integral
type, which should faithfully show how the algorithm behaves for
anything non-integral. However, this is technically a reduction in
coverage.
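As a rough sketch of what this reduced coverage looks like in practice (hypothetical benchmark code, not the actual file; `BM_sort` and `from_index` are illustrative names):

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>
#include <benchmark/benchmark.h>

// Produce a comparable value of type T from an index.
template <class T>
T from_index(std::size_t i) { return static_cast<T>(i); }
template <>
std::string from_index<std::string>(std::size_t i) { return std::to_string(i); }

// A single template instantiated for the integral types and for one
// non-integral type, instead of a full cross product of element types.
template <class T>
static void BM_sort(benchmark::State& state) {
  std::mt19937 gen(42);
  std::vector<T> data;
  for (std::size_t i = 0; i != static_cast<std::size_t>(state.range(0)); ++i)
    data.push_back(from_index<T>(i));
  for (auto _ : state) {
    state.PauseTiming();
    std::shuffle(data.begin(), data.end(), gen);
    state.ResumeTiming();
    std::sort(data.begin(), data.end());
    benchmark::DoNotOptimize(data);
  }
}
BENCHMARK_TEMPLATE(BM_sort, int)->Arg(1024);
BENCHMARK_TEMPLATE(BM_sort, std::string)->Arg(1024);
BENCHMARK_MAIN();
```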
This PR optimizes the performance of `std::ranges::rotate` for
`vector<bool>::iterator`. The optimization yields a performance
improvement of up to 2096x.
Closes #64038.
This PR optimizes the performance of `std::ranges::swap_ranges` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to **611x** for
aligned range swaps and **78x** for unaligned range swaps.
Additionally, comprehensive tests covering up to 4 storage words (256
bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
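The aligned fast path amounts to swapping whole storage words at a time and masking the final partial word. A simplified sketch of that idea, assuming 64-bit storage words and both ranges starting at bit offset 0 (libc++'s real implementation works on its internal `__bit_iterator` and also handles the unaligned case):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Swap nbits bits between two word arrays. Whole words are swapped directly;
// the trailing partial word is swapped with a masked XOR so that bits outside
// the range are left untouched.
void swap_bits_aligned(std::uint64_t* a, std::uint64_t* b, std::size_t nbits) {
  std::size_t nwords = nbits / 64;
  for (std::size_t i = 0; i != nwords; ++i)
    std::swap(a[i], b[i]);
  if (std::size_t rem = nbits % 64) {
    std::uint64_t mask = (std::uint64_t{1} << rem) - 1;  // low rem bits
    std::uint64_t diff = (a[nwords] ^ b[nwords]) & mask; // bits that differ
    a[nwords] ^= diff;
    b[nwords] ^= diff;
  }
}
```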
This PR optimizes the performance of `std::ranges::equal` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to 188x for
aligned equality comparison and 82x for unaligned equality
comparison. Moreover, comprehensive tests covering up to 4 storage words
(256 bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
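The same word-at-a-time idea underlies the equality fast path: compare full storage words directly and mask the last partial word. A minimal sketch of the aligned case, again assuming 64-bit storage words and ranges starting at bit offset 0 (hypothetical helper; the real code operates on `__bit_iterator` and covers unaligned offsets too):

```cpp
#include <cstddef>
#include <cstdint>

// Compare nbits bits starting at bit offset 0 in both word arrays.
bool equal_bits_aligned(const std::uint64_t* a, const std::uint64_t* b,
                        std::size_t nbits) {
  std::size_t nwords = nbits / 64;
  for (std::size_t i = 0; i != nwords; ++i)
    if (a[i] != b[i])  // whole-word comparison, 64 bits at a time
      return false;
  std::size_t rem = nbits % 64;
  if (rem == 0)
    return true;
  std::uint64_t mask = (std::uint64_t{1} << rem) - 1; // low rem bits
  return ((a[nwords] ^ b[nwords]) & mask) == 0;
}
```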
As a follow-up to #121013 (which optimized `ranges::copy`) and #121026
(which optimized `ranges::copy_backward`), this PR enhances the
performance of `std::ranges::{move, move_backward}` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations bring performance improvements analogous to those
achieved for the `{copy, copy_backward}` algorithms: up to 2000x for
aligned moves and 60x for unaligned moves. Moreover, comprehensive
tests covering up to 4 storage words (256 bytes) with odd and even bit
sizes are provided, which validate the proposed optimizations in this
patch.
Some of our benchmarks were exercising very specific functionality,
namely the optimizations in vector<bool>::iterator. Call this out by
renaming the benchmarks appropriately. In the future
we will also increase the coverage of these benchmarks to test other
containers.
Rewrite the sequence container benchmarks to only rely on the actual
operations specified in SequenceContainer requirements and add
benchmarks for std::list, which is also considered a sequence container.
One of the major goals of this refactoring is also to make these
container benchmarks run faster so that they can be run more frequently.
The existing benchmarks have the significant problem that they take so
long to run that they must basically be run overnight. This patch
reduces the size of inputs such that the rewritten benchmarks each take
at most a minute to run.
This patch doesn't touch the string benchmarks, which were not using the
generic container benchmark functions previously.
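A hypothetical sketch of the container-generic style: the body uses only operations guaranteed by the SequenceContainer requirements, so one template covers `vector`, `deque`, and `list` alike (`BM_insert_front` is an illustrative name, not an actual benchmark from the suite):

```cpp
#include <deque>
#include <list>
#include <vector>
#include <benchmark/benchmark.h>

// insert(begin(), value) is required of every SequenceContainer, so this
// single template benchmarks all three containers without special-casing.
template <class Container>
static void BM_insert_front(benchmark::State& state) {
  using T = typename Container::value_type;
  for (auto _ : state) {
    Container c;
    for (int i = 0; i != 1024; ++i)
      c.insert(c.begin(), T(i));
    benchmark::DoNotOptimize(c);
  }
}
BENCHMARK_TEMPLATE(BM_insert_front, std::vector<int>);
BENCHMARK_TEMPLATE(BM_insert_front, std::deque<int>);
BENCHMARK_TEMPLATE(BM_insert_front, std::list<int>);
BENCHMARK_MAIN();
```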
As a follow-up to #121013 (which focused on `std::ranges::copy`), this
PR optimizes the performance of `std::ranges::copy_backward` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to 2000x for
aligned copies and 60x for unaligned copies.
The benchmark currently uses makeCartesianProductBenchmark, which
doesn't make a ton of sense, since minmax_element always goes through
every element one by one. The runtime doesn't depend on the values
of the elements.
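A sketch of the simpler shape such a benchmark can take, assuming only the input size matters (illustrative code, not the actual replacement):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>
#include <benchmark/benchmark.h>

// minmax_element makes exactly one pass whatever the values are, so
// parameterizing over the input size alone is sufficient.
static void BM_minmax_element(benchmark::State& state) {
  std::vector<int> v(state.range(0));
  std::iota(v.begin(), v.end(), 0);
  for (auto _ : state) {
    auto result = std::minmax_element(v.begin(), v.end());
    benchmark::DoNotOptimize(result);
  }
}
BENCHMARK(BM_minmax_element)->Range(8, 1 << 16);
BENCHMARK_MAIN();
```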
Fixes #120758
Instead of building the benchmarks separately via CMake and running them
separately from the test suite, this patch merges the benchmarks into
the test suite and handles both uniformly.
As a result:
- It is now possible to run individual benchmarks like we run tests
(e.g. using libcxx-lit), which is a huge quality-of-life improvement.
- The benchmarks will be run under exactly the same configuration as
the rest of the tests, which is a nice simplification. This does
mean that one has to be careful to enable the desired optimization
flags when running benchmarks, but that is easy with e.g.
`libcxx-lit <...> --param optimization=speed`.
- Benchmarks can use the same annotations as the rest of the test
  suite, such as `// UNSUPPORTED` & friends (see the example after this
  list).
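For example, a benchmark translation unit can now carry the usual Lit directives in comments, just like a regular test (hypothetical file; `BM_count` is an illustrative name):

```cpp
// UNSUPPORTED: c++03, c++11, c++14

#include <algorithm>
#include <vector>
#include <benchmark/benchmark.h>

static void BM_count(benchmark::State& state) {
  std::vector<int> v(1024, 1);
  for (auto _ : state)
    benchmark::DoNotOptimize(std::count(v.begin(), v.end(), 1));
}
BENCHMARK(BM_count);
BENCHMARK_MAIN();
```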
When running the tests via `check-cxx`, we only compile the benchmarks
because running them would be too time consuming. This introduces a bit
of complexity in the testing setup, and instead it would be better to
allow passing a --dry-run flag to GoogleBenchmark executables, which is
the topic of https://github.com/google/benchmark/issues/1827.
I am not really satisfied with the layering violation of adding the
%{benchmark_flags} substitution to cmake-bridge; however, I believe
this can be improved in the future.
This patch fixes warnings and errors that come up when running the
benchmarks as part of the test suite. It also adds the necessary Lit
annotations to make it pass in various configurations and increases the
portability of the benchmarks.
This is an intermediate and fairly mechanical step towards unifying the
benchmarks with the rest of the test suite. Moving this around requires
a few changes, notably making sure we don't throw a wrench into the
discovery process of the normal test suite. This won't be a problem
anymore once benchmarks are taken into account by the test setup out of
the box.