Louis Dionne
c7aaaeaef4
[libc++] Rename a few benchmarks to allow identifying what's being benchmarked from the name ( #185747 )
2026-03-11 08:15:02 -04:00
Nikolas Klauser
15bcae5e3b
[libc++] Drop the unrepresentative search_n benchmark ( #184783 )
...
This benchmark isn't very good at benchmarking `search_n`, since a good
`search_n` implementation can go through it in ~10 perfectly predictable
steps. We can drop it to avoid spending unnecessary resources. This also
fixes that the two benchmark sets have identical names.
Fixes #183832
2026-03-10 10:40:12 +01:00
Nikolas Klauser
ab25249e63
[libc++] Refactor std::push_heap benchmark ( #181343 )
...
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~8 minutes
per run.
Fixes #177028
2026-02-14 08:34:45 +01:00
Nikolas Klauser
8c93fb0c05
[libc++] Refactor benchmarking std::make_heap and std::sort_heap together ( #180935 )
...
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~18 minutes
per run.
2026-02-11 13:17:44 -05:00
Nikolas Klauser
bc1e852241
[libc++] Refactor std::sort_heap benchmark ( #180941 )
...
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~80 seconds
per run.
2026-02-11 13:17:28 -05:00
Nikolas Klauser
f32fd56f0e
[libc++] Reduce the number of runs on the ranges::min{,max} benchmarks ( #179912 )
...
Testing a bunch of range sizes has relatively little value. This reduces
the number of benchmarks so we can run them on a regular basis. This
saves ~10 minutes when running benchmarks.
Fixes #179698
2026-02-11 13:17:03 -05:00
Nikolas Klauser
064694160f
[libc++] Rewrite the std::pop_heap benchmark ( #179911 )
...
Testing a bunch of random types has relatively little value. This
reduces the number of benchmarks so we can run them on a regular basis.
This saves ~90 seconds when running the benchmarks.
2026-02-11 12:47:33 +01:00
Nikolas Klauser
0bf41e3d7a
[libc++] Rewrite the std::make_heap benchmark ( #178696 )
...
This rewrites the `make_heap` benchmark to make it significantly faster
to run. In my test it saves ~10 minutes.
This patch also drops `ranges::` heap benchmarks, since we've decided to
remove `ranges::` benchmarks if there is a `std::` equivalent.
2026-02-05 11:46:11 +01:00
Nikolas Klauser
a2fb416b53
[libc++] Rewrite the std::lower_bound benchmark to be more efficient and add an upper_bound benchmark ( #177180 )
...
The current benchmark is incredibly slow to run. This patch refactors
the benchmark to be faster and also adds an equivalent benchmark for
`std::upper_bound`.
Fixes #177026
2026-01-27 13:46:36 -05:00
Nikolas Klauser
07a0e0fb31
[libc++] Remove benchmarks for ranges algorithms that have a std equivalent ( #176138 )
...
We're currently running all the algorithms benchmarks for the `std` and
`ranges` variants, even though almost all the algorithms share the same
code. This makes running the benchmarks on a large set of commits very
slow and costly. This reduced running the `algorithm/` subdirectory from
~4 hours to roughtly 2.5 hours.
Fixes #175973
2026-01-19 16:04:32 +01:00
Nikolas Klauser
4304106dc2
Reapply "[libc++] Optimize std::find_if" ( #175903 ) ( #175921 )
...
#175913 removed that `__builtin_assume_dereferenceable(ptr, 0)` implies
`ptr != nullptr`, which should allow us to use the builtin with LLVM 23.
This reverts commit 776c09c212e945fdceeae240b42c38df3dd34727.
2026-01-15 10:16:45 +01:00
Nikolas Klauser
776c09c212
Revert "[libc++] Optimize std::find_if" ( #175903 )
...
`__builtin_assume_dereferenceable` currently implies that the pointer
given to it is non-null, even if the size is zero. This causes
miscompilations, since we consider [nullptr, nullptr) to be a valid
range.
Reverts llvm/llvm-project#167697
2026-01-14 11:15:21 +01:00
Nikolas Klauser
6189512f73
[libc++] Optimize std::find_if ( #167697 )
...
```
Benchmark 4ecfaa602f56 80d5ac247d34 Difference % Difference
---------------------------- -------------- -------------- ------------ --------------
bm_find_if_autovectorization 1901.51 306.12 -1595.39 -83.90%
```
2026-01-13 11:13:54 +01:00
Nikolas Klauser
fbde9240e9
[libc++] Remove some low value benchmarks ( #175178 )
...
This patch removes some of the benchmarks in our suite that are
relatively low value. Specifically:
- `lexicographical_compare` isn't benchmarked with every number of
elements between 1 and 8 anymore
- `bitset` isn't benchmarked with as many sizes anymore. Instead, only
some representative sizes are benchmarked.
- `to_chars` isn't benchmarked for every possible base anymore, but only
2, 8, 10, 16 and 23 (so we have an uncommon base as well)
2026-01-12 14:09:54 -05:00
Nikolas Klauser
06c6a5020f
[libc++] Optimize search_n ( #171389 )
...
This changes the algorithm to more efficiently skip ranges which cannot
match the needle for random access iterators. Specifically, we now
search for a mismatching element from the back of the subrange we want
to check. When a mismatch occurs we can directly start one after the
mismatched element, since there cannot possibly be a matching subrange
starting between the start of the subrange we checked and the mismatched
element (since all elements have to be equal). The algorithm also
remembers the subrange which was already match as being equal and
doesn't try to compare it a second time, reducing the time spent in case
of a match.
```
Benchmark old new Difference % Difference
--------------------------------------------------- -------------- -------------- ------------ --------------
rng::search_n(deque<int>)_(no_match)/1000 458.33 14.22 -444.11 -96.90%
rng::search_n(deque<int>)_(no_match)/1024 456.17 13.89 -442.28 -96.95%
rng::search_n(deque<int>)_(no_match)/1048576 453420.38 17.69 -453402.69 -100.00%
rng::search_n(deque<int>)_(no_match)/8192 3566.08 17.60 -3548.49 -99.51%
rng::search_n(deque<int>,_pred)_(no_match)/1000 597.88 15.25 -582.63 -97.45%
rng::search_n(deque<int>,_pred)_(no_match)/1024 608.42 15.39 -593.03 -97.47%
rng::search_n(deque<int>,_pred)_(no_match)/1048576 594533.99 18.91 -594515.08 -100.00%
rng::search_n(deque<int>,_pred)_(no_match)/8192 4670.23 18.88 -4651.35 -99.60%
rng::search_n(list<int>)_(no_match)/1000 733.72 730.22 -3.50 -0.48%
rng::search_n(list<int>)_(no_match)/1024 759.93 753.10 -6.84 -0.90%
rng::search_n(list<int>)_(no_match)/1048576 833841.54 813483.75 -20357.79 -2.44%
rng::search_n(list<int>)_(no_match)/8192 8352.18 8417.31 65.14 0.78%
rng::search_n(list<int>,_pred)_(no_match)/1000 776.79 789.72 12.93 1.66%
rng::search_n(list<int>,_pred)_(no_match)/1024 788.42 806.70 18.28 2.32%
rng::search_n(list<int>,_pred)_(no_match)/1048576 955536.40 982976.81 27440.41 2.87%
rng::search_n(list<int>,_pred)_(no_match)/8192 8874.02 8915.18 41.16 0.46%
rng::search_n(vector<int>)_(no_match)/1000 212.69 3.79 -208.90 -98.22%
rng::search_n(vector<int>)_(no_match)/1024 219.67 3.70 -215.96 -98.31%
rng::search_n(vector<int>)_(no_match)/1048576 209622.54 3.67 -209618.87 -100.00%
rng::search_n(vector<int>)_(no_match)/8192 1643.80 3.83 -1639.98 -99.77%
rng::search_n(vector<int>,_pred)_(no_match)/1000 461.93 7.55 -454.38 -98.36%
rng::search_n(vector<int>,_pred)_(no_match)/1024 472.43 7.74 -464.69 -98.36%
rng::search_n(vector<int>,_pred)_(no_match)/1048576 546180.29 8.71 -546171.58 -100.00%
rng::search_n(vector<int>,_pred)_(no_match)/8192 3786.26 7.88 -3778.38 -99.79%
std::search_n(deque<int>)_(no_match)/1000 455.53 14.19 -441.34 -96.88%
std::search_n(deque<int>)_(no_match)/1024 459.79 13.98 -445.81 -96.96%
std::search_n(deque<int>)_(no_match)/1048576 449780.32 17.99 -449762.33 -100.00%
std::search_n(deque<int>)_(no_match)/8192 3508.55 17.97 -3490.58 -99.49%
std::search_n(deque<int>,_pred)_(no_match)/1000 571.53 17.16 -554.37 -97.00%
std::search_n(deque<int>,_pred)_(no_match)/1024 584.43 17.09 -567.34 -97.08%
std::search_n(deque<int>,_pred)_(no_match)/1048576 581418.31 19.16 -581399.15 -100.00%
std::search_n(deque<int>,_pred)_(no_match)/8192 4661.97 19.36 -4642.61 -99.58%
std::search_n(list<int>)_(no_match)/1000 722.45 710.39 -12.06 -1.67%
std::search_n(list<int>)_(no_match)/1024 748.50 727.08 -21.42 -2.86%
std::search_n(list<int>)_(no_match)/1048576 821655.28 784520.12 -37135.16 -4.52%
std::search_n(list<int>)_(no_match)/8192 7941.73 8002.05 60.32 0.76%
std::search_n(list<int>,_pred)_(no_match)/1000 766.59 786.31 19.72 2.57%
std::search_n(list<int>,_pred)_(no_match)/1024 785.92 804.43 18.51 2.35%
std::search_n(list<int>,_pred)_(no_match)/1048576 948252.76 969125.41 20872.65 2.20%
std::search_n(list<int>,_pred)_(no_match)/8192 8658.99 8825.71 166.72 1.93%
std::search_n(vector<int>)_(no_match)/1000 210.36 3.47 -206.89 -98.35%
std::search_n(vector<int>)_(no_match)/1024 217.60 4.13 -213.47 -98.10%
std::search_n(vector<int>)_(no_match)/1048576 209386.43 3.51 -209382.92 -100.00%
std::search_n(vector<int>)_(no_match)/8192 1643.79 3.50 -1640.29 -99.79%
std::search_n(vector<int>,_pred)_(no_match)/1000 460.88 5.44 -455.45 -98.82%
std::search_n(vector<int>,_pred)_(no_match)/1024 475.36 5.43 -469.93 -98.86%
std::search_n(vector<int>,_pred)_(no_match)/1048576 682722.75 7.15 -682715.60 -100.00%
std::search_n(vector<int>,_pred)_(no_match)/8192 3779.95 5.43 -3774.52 -99.86%
Geomean 4956.15 87.96 -4868.19 -98.23%
```
Fixes #129327
2026-01-08 11:26:08 +01:00
Nikolas Klauser
f21e56c632
[libc++] Add benchmark for std::find(vector<short>)
...
We have special cases for `std::find` with `char` and `int` on most
platforms, so the only other benchmark of or vector implementation
is currently with `long long`, which is a rather big type. Adding
`short` to the benchmarks allows us to more meaningfully compare the
different implementations.
2026-01-07 11:21:57 +01:00
Nikolas Klauser
4fc5b6d8c4
[libc++] Optimize {std,ranges}::for_each for iterating over __trees ( #164405 )
...
This patch optimizes how `for_each` iterates over trees by using
recursion and storing pointers to the next nodes on the stack. This
avoids pointer chasing through the `__parent_` pointer, reducing cache
misses. It also makes use of the compiler being able tail-call optimize
the recursive function, removing back-tracking the iterators have to do.
```
Benchmark old new Difference % Difference
-------------------------------------------------- -------------- -------------- ------------ --------------
rng::for_each(map<int>)/32 35.19 26.67 -8.52 -24.21%
rng::for_each(map<int>)/50 64.13 40.68 -23.45 -36.57%
rng::for_each(map<int>)/8 5.06 6.49 1.43 28.21%
rng::for_each(map<int>)/8192 22893.89 9266.68 -13627.21 -59.52%
rng::for_each(map<int>::iterator)/32 35.51 26.88 -8.63 -24.31%
rng::for_each(map<int>::iterator)/50 64.39 41.24 -23.15 -35.95%
rng::for_each(map<int>::iterator)/8 5.12 5.93 0.81 15.80%
rng::for_each(map<int>::iterator)/8192 21283.14 9736.83 -11546.31 -54.25%
rng::for_each(multimap<int>)/32 35.22 26.61 -8.61 -24.45%
rng::for_each(multimap<int>)/50 64.10 40.07 -24.03 -37.49%
rng::for_each(multimap<int>)/8 5.08 6.69 1.61 31.70%
rng::for_each(multimap<int>)/8192 23130.44 9026.16 -14104.28 -60.98%
rng::for_each(multimap<int>::iterator)/32 35.40 25.08 -10.32 -29.15%
rng::for_each(multimap<int>::iterator)/50 64.19 38.15 -26.04 -40.56%
rng::for_each(multimap<int>::iterator)/8 5.04 5.25 0.22 4.31%
rng::for_each(multimap<int>::iterator)/8192 22875.97 9392.08 -13483.89 -58.94%
rng::for_each(multiset<int>)/32 35.82 27.11 -8.72 -24.33%
rng::for_each(multiset<int>)/50 62.92 41.59 -21.32 -33.89%
rng::for_each(multiset<int>)/8 4.79 6.79 2.00 41.70%
rng::for_each(multiset<int>)/8192 22642.68 9280.95 -13361.73 -59.01%
rng::for_each(multiset<int>::iterator)/32 35.76 26.71 -9.04 -25.28%
rng::for_each(multiset<int>::iterator)/50 63.44 39.00 -24.44 -38.53%
rng::for_each(multiset<int>::iterator)/8 4.90 5.21 0.30 6.18%
rng::for_each(multiset<int>::iterator)/8192 19930.45 9867.60 -10062.85 -50.49%
rng::for_each(set<int>)/32 35.90 27.30 -8.60 -23.96%
rng::for_each(set<int>)/50 63.15 40.75 -22.40 -35.47%
rng::for_each(set<int>)/8 4.77 6.83 2.06 43.23%
rng::for_each(set<int>)/8192 20262.77 9381.57 -10881.20 -53.70%
rng::for_each(set<int>::iterator)/32 36.02 26.42 -9.60 -26.64%
rng::for_each(set<int>::iterator)/50 63.29 37.97 -25.32 -40.01%
rng::for_each(set<int>::iterator)/8 4.72 5.22 0.50 10.50%
rng::for_each(set<int>::iterator)/8192 20041.91 9831.91 -10210.00 -50.94%
```
2025-12-12 09:05:51 +01:00
lbonn
3ebe99f4c2
[libcxx] Unwrap iterators in __find_segment ( #161274 )
...
The segmented iterator optimized implementation of find now unwraps
iterators when processing each segments.
As a result, it is able to take better advantage to some find
specializations: calling memchr/wmemchr for vector<vector<{char,int}>>
```
Benchmark Baseline Candidate Difference % Difference
-------------------------------------------------------------- ---------- ----------- ------------ --------------
rng::find(join_view(deque<deque<int>>))_(process_all)/1024 71.13 61.19 -9.94 -13.97
rng::find(join_view(deque<deque<int>>))_(process_all)/32768 2359.19 2237.02 -122.17 -5.18
rng::find(join_view(deque<deque<int>>))_(process_all)/50 16.88 17.59 0.71 4.20
rng::find(join_view(deque<deque<int>>))_(process_all)/8 15.59 16.10 0.51 3.27
rng::find(join_view(deque<deque<int>>))_(process_all)/8192 647.01 532.75 -114.26 -17.66
rng::find(join_view(list<vector<int>>))_(process_all)/1024 689.76 680.74 -9.02 -1.31
rng::find(join_view(list<vector<int>>))_(process_all)/32768 22284.95 21500.26 -784.69 -3.52
rng::find(join_view(list<vector<int>>))_(process_all)/50 32.77 32.12 -0.65 -1.98
rng::find(join_view(list<vector<int>>))_(process_all)/8 6.11 5.92 -0.19 -3.11
rng::find(join_view(list<vector<int>>))_(process_all)/8192 5527.88 5373.43 -154.45 -2.79
rng::find(join_view(vector<list<int>>))_(process_all)/1024 1305.59 1264.04 -41.55 -3.18
rng::find(join_view(vector<list<int>>))_(process_all)/32768 42840.88 43322.64 481.76 1.12
rng::find(join_view(vector<list<int>>))_(process_all)/50 57.52 62.35 4.82 8.38
rng::find(join_view(vector<list<int>>))_(process_all)/8 6.06 5.98 -0.07 -1.18
rng::find(join_view(vector<list<int>>))_(process_all)/8192 20700.53 21431.66 731.12 3.53
rng::find(join_view(vector<vector<char>>))_(process_all)/1024 310.64 18.34 -292.30 -94.09
rng::find(join_view(vector<vector<char>>))_(process_all)/32768 9424.96 531.99 -8892.97 -94.36
rng::find(join_view(vector<vector<char>>))_(process_all)/50 18.58 3.25 -15.32 -82.49
rng::find(join_view(vector<vector<char>>))_(process_all)/8 4.81 2.98 -1.84 -38.13
rng::find(join_view(vector<vector<char>>))_(process_all)/8192 2437.50 126.88 -2310.62 -94.79
rng::find(join_view(vector<vector<int>>))_(process_all)/1024 297.10 41.70 -255.39 -85.96
rng::find(join_view(vector<vector<int>>))_(process_all)/32768 9662.42 1822.05 -7840.36 -81.14
rng::find(join_view(vector<vector<int>>))_(process_all)/50 22.29 5.10 -17.19 -77.11
rng::find(join_view(vector<vector<int>>))_(process_all)/8 3.73 3.13 -0.60 -16.05
rng::find(join_view(vector<vector<int>>))_(process_all)/8192 2399.68 356.10 -2043.58 -85.16
```
2025-11-28 10:13:21 +01:00
Nikolas Klauser
97367d1046
[libc++] Vectorize std::find ( #156431 )
...
```
Apple M4:
-----------------------------------------------------------------------------
Benchmark old new
-----------------------------------------------------------------------------
std::find(vector<char>) (bail 25%)/8 1.43 ns 1.44 ns
std::find(vector<char>) (bail 25%)/1024 5.54 ns 5.59 ns
std::find(vector<char>) (bail 25%)/8192 38.4 ns 39.1 ns
std::find(vector<char>) (bail 25%)/32768 134 ns 136 ns
std::find(vector<int>) (bail 25%)/8 1.56 ns 1.57 ns
std::find(vector<int>) (bail 25%)/1024 65.3 ns 65.4 ns
std::find(vector<int>) (bail 25%)/8192 465 ns 464 ns
std::find(vector<int>) (bail 25%)/32768 1832 ns 1832 ns
std::find(vector<long long>) (bail 25%)/8 0.920 ns 1.20 ns
std::find(vector<long long>) (bail 25%)/1024 65.2 ns 31.2 ns
std::find(vector<long long>) (bail 25%)/8192 464 ns 255 ns
std::find(vector<long long>) (bail 25%)/32768 1833 ns 992 ns
std::find(vector<char>) (process all)/8 1.21 ns 1.22 ns
std::find(vector<char>) (process all)/50 1.92 ns 1.93 ns
std::find(vector<char>) (process all)/1024 16.6 ns 16.9 ns
std::find(vector<char>) (process all)/8192 134 ns 136 ns
std::find(vector<char>) (process all)/32768 488 ns 503 ns
std::find(vector<int>) (process all)/8 2.45 ns 2.48 ns
std::find(vector<int>) (process all)/50 12.7 ns 12.7 ns
std::find(vector<int>) (process all)/1024 236 ns 236 ns
std::find(vector<int>) (process all)/8192 1830 ns 1834 ns
std::find(vector<int>) (process all)/32768 7351 ns 7346 ns
std::find(vector<long long>) (process all)/8 2.02 ns 1.45 ns
std::find(vector<long long>) (process all)/50 12.0 ns 6.12 ns
std::find(vector<long long>) (process all)/1024 235 ns 123 ns
std::find(vector<long long>) (process all)/8192 1830 ns 983 ns
std::find(vector<long long>) (process all)/32768 7306 ns 3969 ns
std::find(vector<bool>) (process all)/8 1.14 ns 1.15 ns
std::find(vector<bool>) (process all)/50 1.16 ns 1.17 ns
std::find(vector<bool>) (process all)/1024 4.51 ns 4.53 ns
std::find(vector<bool>) (process all)/8192 33.6 ns 33.5 ns
std::find(vector<bool>) (process all)/1048576 3660 ns 3660 ns
```
2025-09-29 11:10:19 +02:00
Louis Dionne
a34048bfed
[libc++][NFC] Fix benchmark name missing a parenthesis
2025-09-24 23:24:55 -04:00
Louis Dionne
794351355e
[libc++] Fix broken unique and unique_copy benchmarks ( #158287 )
...
These benchmarks have an assumption that the container size is divisible
by 4 because of how we populate their content, which wasn't satisfied.
2025-09-15 16:00:51 -04:00
Peng Liu
9827440f1e
[libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators ( #132896 )
...
Previously, the segmented iterator optimization was limited to `std::{for_each, for_each_n}`. This patch
extends the optimization to `std::ranges::for_each` and `std::ranges::for_each_n`, ensuring consistent
optimizations across these algorithms. This patch first generalizes the `std` algorithms by introducing
a `Projection` parameter, which is set to `__identity` for the `std` algorithms. Then we let the `ranges`
algorithms to directly call their `std` counterparts with a general `__proj` argument. Benchmarks
demonstrate performance improvements of up to 21.4x for ``std::deque::iterator`` and 22.3x for
``join_view`` of ``vector<vector<char>>``.
Addresses a subtask of #102817 .
2025-06-18 12:22:47 -04:00
Peng Liu
09c266b75d
[libc++] Optimize std::for_each_n for segmented iterators ( #135468 )
...
This patch enhances the performance of `std::for_each_n` when used with
segmented iterators, leading to significant performance improvements,
summarized in the tables below. This addresses a subtask of
https://github.com/llvm/llvm-project/issues/102817 .
2025-05-21 12:10:50 -04:00
Louis Dionne
8c435886aa
[libc++] Refactor and add benchmark coverage for [alg.sort] ( #128236 )
...
This patch adds missing benchmark coverage for partial_sort,
partial_sort_copy, is_sorted and is_sorted_until.
It also refactors the existing benchmarks for sort and stable_sort to
follow the consistent style of the new algorithm benchmarks. However,
these benchmarks were notoriously slow to run since they tested multiple
data patterns on multiple data types. To try to alleviate this, I
reduced the benchmarks to only run on integral types and on a single
non-integral type, which should faithfully show how the algorithm
behaves for anything non-integral. However, this is technically a
reduction in coverage.
2025-05-14 14:52:17 -04:00
Louis Dionne
410754410f
[libc++] Add benchmarks for partitioning algorithms ( #127324 )
...
This patch adds benchmarks for std::partition, is_partitioned, etc and
their ranges:: variants.
2025-03-24 12:24:43 -04:00
Louis Dionne
b3a4bf9d8f
[libc++] Refactor and add benchmarks from [alg.nonmodifying] ( #128206 )
2025-03-19 00:08:46 -04:00
Louis Dionne
24e88b0e6b
[libc++] Add remaining benchmarks from [alg.modifying.operations] ( #127354 )
...
This patch adds benchmarks for all the remaining algorithms in
[alg.modifying.operations] that we didn't already have a benchmark for.
2025-03-17 15:11:13 -04:00
Peng Liu
4baf1c03fa
[libc++] Optimize ranges::rotate for vector<bool>::iterator ( #121168 )
...
This PR optimizes the performance of `std::ranges::rotate` for
`vector<bool>::iterator`. The optimization yields a performance
improvement of up to 2096x.
Closes #64038 .
2025-03-13 14:07:23 -04:00
Peng Liu
a12744ff05
[libc++] Optimize ranges::swap_ranges for vector<bool>::iterator ( #121150 )
...
This PR optimizes the performance of `std::ranges::swap_ranges` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038 .
The optimizations yield performance improvements of up to **611x** for
aligned range swap and **78x** for unaligned range swap comparison.
Additionally, comprehensive tests covering up to 4 storage words (256
bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
2025-03-04 17:15:36 -05:00
Peng Liu
7717a549e9
[libc++] Optimize ranges::equal for vector<bool>::iterator ( #121084 )
...
This PR optimizes the performance of `std::ranges::equal` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038 .
The optimizations yield performance improvements of up to 188x for
aligned equality comparison and 82x for unaligned equality
comparison. Moreover, comprehensive tests covering up to 4 storage words
(256 bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
2025-02-26 12:18:25 -05:00
Louis Dionne
8feb5bac32
[libc++] Add benchmarks for copy algorithms ( #127328 )
...
This patch adds benchmarks for the copy family of algorithms (copy,
copy_n, copy_if, copy_backward).
2025-02-20 08:24:32 -05:00
Peng Liu
ab3d793982
[libc++] Optimize ranges::move{,_backward} for vector<bool>::iterator ( #121109 )
...
As a follow-up to #121013 (which optimized `ranges::copy`) and #121026
(which optimized `ranges::copy_backward`), this PR enhances the
performance of `std::ranges::{move, move_backward}` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038 .
The optimizations bring performance improvements analogous to those
achieved for the `{copy, copy_backward}` algorithms: up to 2000x for
aligned moves and 60x for unaligned moves. Moreover, comprehensive
tests covering up to 4 storage words (256 bytes) with odd and even bit
sizes are provided, which validate the proposed optimizations in this
patch.
2025-02-19 11:36:45 -05:00
Louis Dionne
a6093d3034
[libc++] Explicitly mention vector_bool in the name of benchmarks ( #127313 )
...
We have some benchmarks that were benchmarking very specific
functionality, namely the optimizations in vector<bool>::iterator. Call
this out in the benchmarks by renaming them appropriately. In the future
we will also increase the coverage of these benchmarks to test other
containers.
2025-02-15 14:56:19 +01:00
Louis Dionne
c7c7eabc7f
[libc++] Add a benchmark for std::reverse ( #125262 )
2025-02-03 12:10:47 -05:00
Louis Dionne
439bef9751
[libc++] Refactor the sequence container benchmarks ( #119763 )
...
Rewrite the sequence container benchmarks to only rely on the actual
operations specified in SequenceContainer requirements and add
benchmarks for std::list, which is also considered a sequence container.
One of the major goals of this refactoring is also to make these
container benchmarks run faster so that they can be run more frequently.
The existing benchmarks have the significant problem that they take so
long to run that they must basically be run overnight. This patch
reduces the size of inputs such that the rewritten benchmarks each take
at most a minute to run.
This patch doesn't touch the string benchmarks, which were not using the
generic container benchmark functions previously.
2025-01-30 15:02:34 -05:00
Peng Liu
edc3dc6abd
[libc++] Optimize ranges::copy_backward for vector<bool>::iterator ( #121026 )
...
As a follow-up to #121013 (which focused on `std::ranges::copy`), this
PR optimizes the performance of `std::ranges::copy_backward` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038 .
The optimizations yield performance improvements of up to 2000x for
aligned copies and 60x for unaligned copies.
2025-01-30 14:55:05 -05:00
Peng Liu
5b65896ad6
[libc++] Optimize ranges::copy{, _n} for vector<bool>::iterator ( #121013 )
...
This PR optimizes the performance of `std::ranges::copy` and
`std::ranges::copy_n` specifically for `vector<bool>::iterator`,
addressing a subtask outlined in issue #64038 . The optimizations yield
performance improvements of up to **2000x** for aligned copies and
**60x** for unaligned copies. Additionally, new tests have been added to
validate these enhancements.
- Aligned source-destination bits
ranges::copy
```
--------------------------------------------------------------------------
Benchmark Before After Improvement
--------------------------------------------------------------------------
bm_ranges_copy_vb_aligned/8 10.8 ns 1.42 ns 8x
bm_ranges_copy_vb_aligned/64 88.5 ns 2.28 ns 39x
bm_ranges_copy_vb_aligned/512 709 ns 1.95 ns 364x
bm_ranges_copy_vb_aligned/4096 5568 ns 5.01 ns 1111x
bm_ranges_copy_vb_aligned/32768 44754 ns 38.7 ns 1156x
bm_ranges_copy_vb_aligned/65536 91092 ns 73.2 ns 1244x
bm_ranges_copy_vb_aligned/102400 139473 ns 127 ns 1098x
bm_ranges_copy_vb_aligned/106496 189004 ns 81.5 ns 2319x
bm_ranges_copy_vb_aligned/110592 153647 ns 71.1 ns 2161x
bm_ranges_copy_vb_aligned/114688 159261 ns 70.2 ns 2269x
bm_ranges_copy_vb_aligned/118784 181910 ns 73.5 ns 2475x
bm_ranges_copy_vb_aligned/122880 174117 ns 76.5 ns 2276x
bm_ranges_copy_vb_aligned/126976 176020 ns 82.0 ns 2147x
bm_ranges_copy_vb_aligned/131072 180757 ns 137 ns 1319x
bm_ranges_copy_vb_aligned/135168 190342 ns 158 ns 1205x
bm_ranges_copy_vb_aligned/139264 192831 ns 103 ns 1872x
bm_ranges_copy_vb_aligned/143360 199627 ns 89.4 ns 2233x
bm_ranges_copy_vb_aligned/147456 203881 ns 88.6 ns 2301x
bm_ranges_copy_vb_aligned/151552 213345 ns 88.4 ns 2413x
bm_ranges_copy_vb_aligned/155648 216892 ns 92.9 ns 2335x
bm_ranges_copy_vb_aligned/159744 222751 ns 96.4 ns 2311x
bm_ranges_copy_vb_aligned/163840 225995 ns 173 ns 1306x
bm_ranges_copy_vb_aligned/167936 235230 ns 202 ns 1165x
bm_ranges_copy_vb_aligned/172032 244093 ns 131 ns 1863x
bm_ranges_copy_vb_aligned/176128 244434 ns 111 ns 2202x
bm_ranges_copy_vb_aligned/180224 249570 ns 108 ns 2311x
bm_ranges_copy_vb_aligned/184320 254538 ns 108 ns 2357x
bm_ranges_copy_vb_aligned/188416 261817 ns 113 ns 2317x
bm_ranges_copy_vb_aligned/192512 269923 ns 125 ns 2159x
bm_ranges_copy_vb_aligned/196608 273494 ns 210 ns 1302x
bm_ranges_copy_vb_aligned/200704 280035 ns 269 ns 1041x
bm_ranges_copy_vb_aligned/204800 293102 ns 231 ns 1269x
```
ranges::copy_n
```
--------------------------------------------------------------------------
Benchmark Before After Improvement
--------------------------------------------------------------------------
bm_ranges_copy_n_vb_aligned/8 11.8 ns 0.89 ns 13x
bm_ranges_copy_n_vb_aligned/64 91.6 ns 2.06 ns 44x
bm_ranges_copy_n_vb_aligned/512 718 ns 2.45 ns 293x
bm_ranges_copy_n_vb_aligned/4096 5750 ns 5.02 ns 1145x
bm_ranges_copy_n_vb_aligned/32768 45824 ns 40.9 ns 1120x
bm_ranges_copy_n_vb_aligned/65536 92267 ns 73.8 ns 1250x
bm_ranges_copy_n_vb_aligned/102400 143267 ns 125 ns 1146x
bm_ranges_copy_n_vb_aligned/106496 148625 ns 82.4 ns 1804x
bm_ranges_copy_n_vb_aligned/110592 154817 ns 72.0 ns 2150x
bm_ranges_copy_n_vb_aligned/114688 157953 ns 70.4 ns 2244x
bm_ranges_copy_n_vb_aligned/118784 162374 ns 71.5 ns 2270x
bm_ranges_copy_n_vb_aligned/122880 168638 ns 72.9 ns 2313x
bm_ranges_copy_n_vb_aligned/126976 175596 ns 76.6 ns 2292x
bm_ranges_copy_n_vb_aligned/131072 181164 ns 135 ns 1342x
bm_ranges_copy_n_vb_aligned/135168 184697 ns 157 ns 1176x
bm_ranges_copy_n_vb_aligned/139264 191395 ns 104 ns 1840x
bm_ranges_copy_n_vb_aligned/143360 194954 ns 88.3 ns 2208x
bm_ranges_copy_n_vb_aligned/147456 208917 ns 86.1 ns 2426x
bm_ranges_copy_n_vb_aligned/151552 211101 ns 87.2 ns 2421x
bm_ranges_copy_n_vb_aligned/155648 213175 ns 89.0 ns 2395x
bm_ranges_copy_n_vb_aligned/159744 218988 ns 86.7 ns 2526x
bm_ranges_copy_n_vb_aligned/163840 225263 ns 156 ns 1444x
bm_ranges_copy_n_vb_aligned/167936 230725 ns 184 ns 1254x
bm_ranges_copy_n_vb_aligned/172032 235795 ns 119 ns 1981x
bm_ranges_copy_n_vb_aligned/176128 241145 ns 101 ns 2388x
bm_ranges_copy_n_vb_aligned/180224 250680 ns 99.5 ns 2519x
bm_ranges_copy_n_vb_aligned/184320 262954 ns 99.7 ns 2637x
bm_ranges_copy_n_vb_aligned/188416 258584 ns 103 ns 2510x
bm_ranges_copy_n_vb_aligned/192512 267190 ns 125 ns 2138x
bm_ranges_copy_n_vb_aligned/196608 270821 ns 213 ns 1271x
bm_ranges_copy_n_vb_aligned/200704 279532 ns 262 ns 1067x
bm_ranges_copy_n_vb_aligned/204800 283412 ns 222 ns 1277x
```
- Unaligned source-destination bits
```
--------------------------------------------------------------------------------
Benchmark Before After Improvement
--------------------------------------------------------------------------------
bm_ranges_copy_vb_unaligned/8 12.8 ns 8.59 ns 1.5x
bm_ranges_copy_vb_unaligned/64 98.2 ns 8.24 ns 12x
bm_ranges_copy_vb_unaligned/512 755 ns 18.1 ns 42x
bm_ranges_copy_vb_unaligned/4096 6027 ns 102 ns 59x
bm_ranges_copy_vb_unaligned/32768 47663 ns 774 ns 62x
bm_ranges_copy_vb_unaligned/262144 378981 ns 6455 ns 59x
bm_ranges_copy_vb_unaligned/1048576 1520486 ns 25942 ns 59x
bm_ranges_copy_n_vb_unaligned/8 11.3 ns 8.22 ns 1.4x
bm_ranges_copy_n_vb_unaligned/64 97.3 ns 7.89 ns 12x
bm_ranges_copy_n_vb_unaligned/512 747 ns 18.1 ns 41x
bm_ranges_copy_n_vb_unaligned/4096 5932 ns 99.0 ns 60x
bm_ranges_copy_n_vb_unaligned/32768 47776 ns 749 ns 64x
bm_ranges_copy_n_vb_unaligned/262144 378802 ns 6576 ns 58x
bm_ranges_copy_n_vb_unaligned/1048576 1547234 ns 26229 ns 59x
```
2025-01-30 17:26:26 +01:00
Nikolas Klauser
fd784726db
[libc++] Rewrite minmax_element benchmark
...
The benchmark currently uses makeCartesianProductBenchmark, which
doesn't make a ton of sense, since minmax_element always goes through
every element one by one. The runtime doesn't depend on the values
of the elements.
Fixes #120758
2024-12-21 20:07:28 +01:00
Louis Dionne
6a9279ca40
[libc++] Slight reorganization of the benchmarks ( #119625 )
...
Move various container benchmarks to the same subdirectory, and regroup
some format-related benchmarks.
2024-12-12 08:14:50 -05:00
Louis Dionne
e236a52a88
[libc++] Unify the benchmarks with the test suite ( #101399 )
...
Instead of building the benchmarks separately via CMake and running them
separately from the test suite, this patch merges the benchmarks into
the test suite and handles both uniformly.
As a result:
- It is now possible to run individual benchmarks like we run tests
(e.g. using libcxx-lit), which is a huge quality-of-life improvement.
- The benchmarks will be run under exactly the same configuration as
the rest of the tests, which is a nice simplification. This does
mean that one has to be careful to enable the desired optimization
flags when running benchmarks, but that is easy with e.g.
`libcxx-lit <...> --param optimization=speed`.
- Benchmarks can use the same annotations as the rest of the test
suite, such as `// UNSUPPORTED` & friends.
When running the tests via `check-cxx`, we only compile the benchmarks
because running them would be too time consuming. This introduces a bit
of complexity in the testing setup, and instead it would be better to
allow passing a --dry-run flag to GoogleBenchmark executables, which is
the topic of https://github.com/google/benchmark/issues/1827 .
I am not really satisfied with the layering violation of adding the
%{benchmark_flags} substitution to cmake-bridge, however I believe
this can be improved in the future.
2024-11-07 09:07:50 -05:00
Louis Dionne
b2d2494731
[libc++] Make benchmarks forward-compatible with the test suite ( #114502 )
...
This patch fixes warnings and errors that come up when running the
benchmarks as part of the test suite. It also adds the necessary Lit
annotations to make it pass in various configurations and increases the
portability of the benchmarks.
2024-11-05 09:08:00 -05:00
Nikolas Klauser
d07fdf9779
[libc++] Optimize lexicographical_compare ( #65279 )
...
If the comparison operation is equivalent to < and that is a total
order, we know that we can use equality comparison on that type instead
to extract some information. Furthermore, if equality comparison on that
type is trivial, the user can't observe that we're calling it. So
instead of using the user-provided total order, we use std::mismatch,
which uses equality comparison (and is vertorized). Additionally, if the
type is trivially lexicographically comparable, we can go one step
further and use std::memcmp directly instead of calling std::mismatch.
Benchmarks:
```
-------------------------------------------------------------------------------------
Benchmark old new
-------------------------------------------------------------------------------------
bm_lexicographical_compare<unsigned char>/1 1.17 ns 2.34 ns
bm_lexicographical_compare<unsigned char>/2 1.64 ns 2.57 ns
bm_lexicographical_compare<unsigned char>/3 2.23 ns 2.58 ns
bm_lexicographical_compare<unsigned char>/4 2.82 ns 2.57 ns
bm_lexicographical_compare<unsigned char>/5 3.34 ns 2.11 ns
bm_lexicographical_compare<unsigned char>/6 3.94 ns 2.21 ns
bm_lexicographical_compare<unsigned char>/7 4.56 ns 2.11 ns
bm_lexicographical_compare<unsigned char>/8 5.25 ns 2.11 ns
bm_lexicographical_compare<unsigned char>/16 9.88 ns 2.11 ns
bm_lexicographical_compare<unsigned char>/64 38.9 ns 2.36 ns
bm_lexicographical_compare<unsigned char>/512 317 ns 6.54 ns
bm_lexicographical_compare<unsigned char>/4096 2517 ns 41.4 ns
bm_lexicographical_compare<unsigned char>/32768 20052 ns 488 ns
bm_lexicographical_compare<unsigned char>/262144 159579 ns 4409 ns
bm_lexicographical_compare<unsigned char>/1048576 640456 ns 20342 ns
bm_lexicographical_compare<signed char>/1 1.18 ns 2.37 ns
bm_lexicographical_compare<signed char>/2 1.65 ns 2.60 ns
bm_lexicographical_compare<signed char>/3 2.23 ns 2.83 ns
bm_lexicographical_compare<signed char>/4 2.81 ns 3.06 ns
bm_lexicographical_compare<signed char>/5 3.35 ns 3.30 ns
bm_lexicographical_compare<signed char>/6 3.90 ns 3.99 ns
bm_lexicographical_compare<signed char>/7 4.56 ns 3.78 ns
bm_lexicographical_compare<signed char>/8 5.20 ns 4.02 ns
bm_lexicographical_compare<signed char>/16 9.80 ns 6.21 ns
bm_lexicographical_compare<signed char>/64 39.0 ns 3.16 ns
bm_lexicographical_compare<signed char>/512 318 ns 7.58 ns
bm_lexicographical_compare<signed char>/4096 2514 ns 47.4 ns
bm_lexicographical_compare<signed char>/32768 20096 ns 504 ns
bm_lexicographical_compare<signed char>/262144 156617 ns 4146 ns
bm_lexicographical_compare<signed char>/1048576 624265 ns 19810 ns
bm_lexicographical_compare<int>/1 1.15 ns 2.12 ns
bm_lexicographical_compare<int>/2 1.60 ns 2.36 ns
bm_lexicographical_compare<int>/3 2.21 ns 2.59 ns
bm_lexicographical_compare<int>/4 2.74 ns 2.83 ns
bm_lexicographical_compare<int>/5 3.26 ns 3.06 ns
bm_lexicographical_compare<int>/6 3.81 ns 4.53 ns
bm_lexicographical_compare<int>/7 4.41 ns 4.72 ns
bm_lexicographical_compare<int>/8 5.08 ns 2.36 ns
bm_lexicographical_compare<int>/16 9.54 ns 3.08 ns
bm_lexicographical_compare<int>/64 37.8 ns 4.71 ns
bm_lexicographical_compare<int>/512 309 ns 24.6 ns
bm_lexicographical_compare<int>/4096 2422 ns 204 ns
bm_lexicographical_compare<int>/32768 19362 ns 1947 ns
bm_lexicographical_compare<int>/262144 155727 ns 19793 ns
bm_lexicographical_compare<int>/1048576 623614 ns 80180 ns
bm_ranges_lexicographical_compare<unsigned char>/1 1.07 ns 2.35 ns
bm_ranges_lexicographical_compare<unsigned char>/2 1.72 ns 2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/3 2.46 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/4 3.17 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/5 3.86 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/6 4.55 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/7 5.25 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/8 5.95 ns 2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/16 11.7 ns 2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/64 45.5 ns 2.36 ns
bm_ranges_lexicographical_compare<unsigned char>/512 366 ns 6.35 ns
bm_ranges_lexicographical_compare<unsigned char>/4096 2886 ns 40.9 ns
bm_ranges_lexicographical_compare<unsigned char>/32768 23054 ns 489 ns
bm_ranges_lexicographical_compare<unsigned char>/262144 185302 ns 4339 ns
bm_ranges_lexicographical_compare<unsigned char>/1048576 741576 ns 19430 ns
bm_ranges_lexicographical_compare<signed char>/1 1.10 ns 2.12 ns
bm_ranges_lexicographical_compare<signed char>/2 1.66 ns 2.35 ns
bm_ranges_lexicographical_compare<signed char>/3 2.23 ns 2.58 ns
bm_ranges_lexicographical_compare<signed char>/4 2.82 ns 2.82 ns
bm_ranges_lexicographical_compare<signed char>/5 3.34 ns 3.06 ns
bm_ranges_lexicographical_compare<signed char>/6 3.92 ns 3.99 ns
bm_ranges_lexicographical_compare<signed char>/7 4.64 ns 4.10 ns
bm_ranges_lexicographical_compare<signed char>/8 5.21 ns 4.61 ns
bm_ranges_lexicographical_compare<signed char>/16 9.79 ns 7.42 ns
bm_ranges_lexicographical_compare<signed char>/64 38.9 ns 2.93 ns
bm_ranges_lexicographical_compare<signed char>/512 317 ns 7.31 ns
bm_ranges_lexicographical_compare<signed char>/4096 2500 ns 47.5 ns
bm_ranges_lexicographical_compare<signed char>/32768 19940 ns 496 ns
bm_ranges_lexicographical_compare<signed char>/262144 159166 ns 4393 ns
bm_ranges_lexicographical_compare<signed char>/1048576 638206 ns 19786 ns
bm_ranges_lexicographical_compare<int>/1 1.10 ns 2.12 ns
bm_ranges_lexicographical_compare<int>/2 1.64 ns 3.04 ns
bm_ranges_lexicographical_compare<int>/3 2.23 ns 2.58 ns
bm_ranges_lexicographical_compare<int>/4 2.81 ns 2.81 ns
bm_ranges_lexicographical_compare<int>/5 3.35 ns 3.05 ns
bm_ranges_lexicographical_compare<int>/6 3.94 ns 4.60 ns
bm_ranges_lexicographical_compare<int>/7 4.60 ns 4.81 ns
bm_ranges_lexicographical_compare<int>/8 5.19 ns 2.35 ns
bm_ranges_lexicographical_compare<int>/16 9.85 ns 2.87 ns
bm_ranges_lexicographical_compare<int>/64 38.9 ns 4.70 ns
bm_ranges_lexicographical_compare<int>/512 318 ns 24.5 ns
bm_ranges_lexicographical_compare<int>/4096 2494 ns 202 ns
bm_ranges_lexicographical_compare<int>/32768 20000 ns 1939 ns
bm_ranges_lexicographical_compare<int>/262144 160433 ns 19730 ns
bm_ranges_lexicographical_compare<int>/1048576 642636 ns 80760 ns
```
2024-08-04 10:02:43 +02:00
Louis Dionne
6a54dfbfe5
[libc++][NFC] Add missing license headers
...
Also standardize the license comment in several files where it was
different from what we normally do.
2024-07-31 12:58:09 -04:00
Louis Dionne
78b4b5cccb
[libc++] Move the benchmarks under libcxx/test ( #99371 )
...
This is an intermediate and fairly mechanical step towards unifying the
benchmarks with the rest of the test suite. Moving this around requires
a few changes, notably making sure we don't throw a wrench into the
discovery process of the normal test suite. This won't be a problem
anymore once benchmarks are taken into account by the test setup out of
the box.
2024-07-31 11:18:32 -04:00