44 Commits

Author SHA1 Message Date
Louis Dionne
c7aaaeaef4
[libc++] Rename a few benchmarks to allow identifying what's being benchmarked from the name (#185747) 2026-03-11 08:15:02 -04:00
Nikolas Klauser
15bcae5e3b
[libc++] Drop the unrepresentative search_n benchmark (#184783)
This benchmark isn't very good at benchmarking `search_n`, since a good
`search_n` implementation can go through it in ~10 perfectly predictable
steps. We can drop it to avoid spending unnecessary resources. This also
fixes that the two benchmark sets have identical names.

Fixes #183832
2026-03-10 10:40:12 +01:00
Nikolas Klauser
ab25249e63
[libc++] Refactor std::push_heap benchmark (#181343)
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~8 minutes
per run.

Fixes #177028
2026-02-14 08:34:45 +01:00
Nikolas Klauser
8c93fb0c05
[libc++] Refactor benchmarking std::make_heap and std::sort_heap together (#180935)
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~18 minutes
per run.
2026-02-11 13:17:44 -05:00
Nikolas Klauser
bc1e852241
[libc++] Refactor std::sort_heap benchmark (#180941)
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~80 seconds
per run.
2026-02-11 13:17:28 -05:00
Nikolas Klauser
f32fd56f0e
[libc++] Reduce the number of runs on the ranges::min{,max} benchmarks (#179912)
Testing a bunch of range sizes has relatively little value. This reduces
the number of benchmarks so we can run them on a regular basis. This
saves ~10 minutes when running benchmarks.

Fixes #179698
2026-02-11 13:17:03 -05:00
Nikolas Klauser
064694160f
[libc++] Rewrite the std::pop_heap benchmark (#179911)
Testing a bunch of random types has relatively little value. This
reduces the number of benchmarks so we can run them on a regular basis.
This saves ~90 seconds when running the benchmarks.
2026-02-11 12:47:33 +01:00
Nikolas Klauser
0bf41e3d7a
[libc++] Rewrite the std::make_heap benchmark (#178696)
This rewrites the `make_heap` benchmark to make it significantly faster
to run. In my test it saves ~10 minutes.

This patch also drops `ranges::` heap benchmarks, since we've decided to
remove `ranges::` benchmarks if there is a `std::` equivalent.
2026-02-05 11:46:11 +01:00
Nikolas Klauser
a2fb416b53
[libc++] Rewrite the std::lower_bound benchmark to be more efficient and add an upper_bound benchmark (#177180)
The current benchmark is incredibly slow to run. This patch refactors
the benchmark to be faster and also adds an equivalent benchmark for
`std::upper_bound`.

Fixes #177026
2026-01-27 13:46:36 -05:00
Nikolas Klauser
07a0e0fb31
[libc++] Remove benchmarks for ranges algorithms that have a std equivalent (#176138)
We're currently running all the algorithms benchmarks for the `std` and
`ranges` variants, even though almost all the algorithms share the same
code. This makes running the benchmarks on a large set of commits very
slow and costly. This reduced running the `algorithm/` subdirectory from
~4 hours to roughtly 2.5 hours.

Fixes #175973
2026-01-19 16:04:32 +01:00
Nikolas Klauser
4304106dc2
Reapply "[libc++] Optimize std::find_if" (#175903) (#175921)
#175913 removed that `__builtin_assume_dereferenceable(ptr, 0)` implies
`ptr != nullptr`, which should allow us to use the builtin with LLVM 23.

This reverts commit 776c09c212e945fdceeae240b42c38df3dd34727.
2026-01-15 10:16:45 +01:00
Nikolas Klauser
776c09c212
Revert "[libc++] Optimize std::find_if" (#175903)
`__builtin_assume_dereferenceable` currently implies that the pointer
given to it is non-null, even if the size is zero. This causes
miscompilations, since we consider [nullptr, nullptr) to be a valid
range.

Reverts llvm/llvm-project#167697
2026-01-14 11:15:21 +01:00
Nikolas Klauser
6189512f73
[libc++] Optimize std::find_if (#167697)
```
Benchmark                       4ecfaa602f56    80d5ac247d34    Difference    % Difference
----------------------------  --------------  --------------  ------------  --------------
bm_find_if_autovectorization         1901.51          306.12      -1595.39         -83.90%
```
2026-01-13 11:13:54 +01:00
Nikolas Klauser
fbde9240e9
[libc++] Remove some low value benchmarks (#175178)
This patch removes some of the benchmarks in our suite that are
relatively low value. Specifically:
- `lexicographical_compare` isn't benchmarked with every number of
    elements between 1 and 8 anymore
- `bitset` isn't benchmarked with as many sizes anymore. Instead, only
    some representative sizes are benchmarked.
- `to_chars` isn't benchmarked for every possible base anymore, but only
    2, 8, 10, 16 and 23 (so we have an uncommon base as well)
2026-01-12 14:09:54 -05:00
Nikolas Klauser
06c6a5020f
[libc++] Optimize search_n (#171389)
This changes the algorithm to more efficiently skip ranges which cannot
match the needle for random access iterators. Specifically, we now
search for a mismatching element from the back of the subrange we want
to check. When a mismatch occurs we can directly start one after the
mismatched element, since there cannot possibly be a matching subrange
starting between the start of the subrange we checked and the mismatched
element (since all elements have to be equal). The algorithm also
remembers the subrange which was already match as being equal and
doesn't try to compare it a second time, reducing the time spent in case
of a match.

```
Benchmark                                                       old             new    Difference    % Difference
---------------------------------------------------  --------------  --------------  ------------  --------------
rng::search_n(deque<int>)_(no_match)/1000                    458.33           14.22       -444.11         -96.90%
rng::search_n(deque<int>)_(no_match)/1024                    456.17           13.89       -442.28         -96.95%
rng::search_n(deque<int>)_(no_match)/1048576              453420.38           17.69    -453402.69        -100.00%
rng::search_n(deque<int>)_(no_match)/8192                   3566.08           17.60      -3548.49         -99.51%
rng::search_n(deque<int>,_pred)_(no_match)/1000              597.88           15.25       -582.63         -97.45%
rng::search_n(deque<int>,_pred)_(no_match)/1024              608.42           15.39       -593.03         -97.47%
rng::search_n(deque<int>,_pred)_(no_match)/1048576        594533.99           18.91    -594515.08        -100.00%
rng::search_n(deque<int>,_pred)_(no_match)/8192             4670.23           18.88      -4651.35         -99.60%
rng::search_n(list<int>)_(no_match)/1000                     733.72          730.22         -3.50          -0.48%
rng::search_n(list<int>)_(no_match)/1024                     759.93          753.10         -6.84          -0.90%
rng::search_n(list<int>)_(no_match)/1048576               833841.54       813483.75     -20357.79          -2.44%
rng::search_n(list<int>)_(no_match)/8192                    8352.18         8417.31         65.14           0.78%
rng::search_n(list<int>,_pred)_(no_match)/1000               776.79          789.72         12.93           1.66%
rng::search_n(list<int>,_pred)_(no_match)/1024               788.42          806.70         18.28           2.32%
rng::search_n(list<int>,_pred)_(no_match)/1048576         955536.40       982976.81      27440.41           2.87%
rng::search_n(list<int>,_pred)_(no_match)/8192              8874.02         8915.18         41.16           0.46%
rng::search_n(vector<int>)_(no_match)/1000                   212.69            3.79       -208.90         -98.22%
rng::search_n(vector<int>)_(no_match)/1024                   219.67            3.70       -215.96         -98.31%
rng::search_n(vector<int>)_(no_match)/1048576             209622.54            3.67    -209618.87        -100.00%
rng::search_n(vector<int>)_(no_match)/8192                  1643.80            3.83      -1639.98         -99.77%
rng::search_n(vector<int>,_pred)_(no_match)/1000             461.93            7.55       -454.38         -98.36%
rng::search_n(vector<int>,_pred)_(no_match)/1024             472.43            7.74       -464.69         -98.36%
rng::search_n(vector<int>,_pred)_(no_match)/1048576       546180.29            8.71    -546171.58        -100.00%
rng::search_n(vector<int>,_pred)_(no_match)/8192            3786.26            7.88      -3778.38         -99.79%
std::search_n(deque<int>)_(no_match)/1000                    455.53           14.19       -441.34         -96.88%
std::search_n(deque<int>)_(no_match)/1024                    459.79           13.98       -445.81         -96.96%
std::search_n(deque<int>)_(no_match)/1048576              449780.32           17.99    -449762.33        -100.00%
std::search_n(deque<int>)_(no_match)/8192                   3508.55           17.97      -3490.58         -99.49%
std::search_n(deque<int>,_pred)_(no_match)/1000              571.53           17.16       -554.37         -97.00%
std::search_n(deque<int>,_pred)_(no_match)/1024              584.43           17.09       -567.34         -97.08%
std::search_n(deque<int>,_pred)_(no_match)/1048576        581418.31           19.16    -581399.15        -100.00%
std::search_n(deque<int>,_pred)_(no_match)/8192             4661.97           19.36      -4642.61         -99.58%
std::search_n(list<int>)_(no_match)/1000                     722.45          710.39        -12.06          -1.67%
std::search_n(list<int>)_(no_match)/1024                     748.50          727.08        -21.42          -2.86%
std::search_n(list<int>)_(no_match)/1048576               821655.28       784520.12     -37135.16          -4.52%
std::search_n(list<int>)_(no_match)/8192                    7941.73         8002.05         60.32           0.76%
std::search_n(list<int>,_pred)_(no_match)/1000               766.59          786.31         19.72           2.57%
std::search_n(list<int>,_pred)_(no_match)/1024               785.92          804.43         18.51           2.35%
std::search_n(list<int>,_pred)_(no_match)/1048576         948252.76       969125.41      20872.65           2.20%
std::search_n(list<int>,_pred)_(no_match)/8192              8658.99         8825.71        166.72           1.93%
std::search_n(vector<int>)_(no_match)/1000                   210.36            3.47       -206.89         -98.35%
std::search_n(vector<int>)_(no_match)/1024                   217.60            4.13       -213.47         -98.10%
std::search_n(vector<int>)_(no_match)/1048576             209386.43            3.51    -209382.92        -100.00%
std::search_n(vector<int>)_(no_match)/8192                  1643.79            3.50      -1640.29         -99.79%
std::search_n(vector<int>,_pred)_(no_match)/1000             460.88            5.44       -455.45         -98.82%
std::search_n(vector<int>,_pred)_(no_match)/1024             475.36            5.43       -469.93         -98.86%
std::search_n(vector<int>,_pred)_(no_match)/1048576       682722.75            7.15    -682715.60        -100.00%
std::search_n(vector<int>,_pred)_(no_match)/8192            3779.95            5.43      -3774.52         -99.86%
Geomean                                                     4956.15           87.96      -4868.19         -98.23%
```

Fixes #129327
2026-01-08 11:26:08 +01:00
Nikolas Klauser
f21e56c632 [libc++] Add benchmark for std::find(vector<short>)
We have special cases for `std::find` with `char` and `int` on most
platforms, so the only other benchmark of or vector implementation
is currently with `long long`, which is a rather big type. Adding
`short` to the benchmarks allows us to more meaningfully compare the
different implementations.
2026-01-07 11:21:57 +01:00
Nikolas Klauser
4fc5b6d8c4
[libc++] Optimize {std,ranges}::for_each for iterating over __trees (#164405)
This patch optimizes how `for_each` iterates over trees by using
recursion and storing pointers to the next nodes on the stack. This
avoids pointer chasing through the `__parent_` pointer, reducing cache
misses. It also makes use of the compiler being able tail-call optimize
the recursive function, removing back-tracking the iterators have to do.

```
Benchmark                                                      old             new    Difference    % Difference
--------------------------------------------------  --------------  --------------  ------------  --------------
rng::for_each(map<int>)/32                                   35.19           26.67         -8.52         -24.21%
rng::for_each(map<int>)/50                                   64.13           40.68        -23.45         -36.57%
rng::for_each(map<int>)/8                                     5.06            6.49          1.43          28.21%
rng::for_each(map<int>)/8192                              22893.89         9266.68     -13627.21         -59.52%
rng::for_each(map<int>::iterator)/32                         35.51           26.88         -8.63         -24.31%
rng::for_each(map<int>::iterator)/50                         64.39           41.24        -23.15         -35.95%
rng::for_each(map<int>::iterator)/8                           5.12            5.93          0.81          15.80%
rng::for_each(map<int>::iterator)/8192                    21283.14         9736.83     -11546.31         -54.25%
rng::for_each(multimap<int>)/32                              35.22           26.61         -8.61         -24.45%
rng::for_each(multimap<int>)/50                              64.10           40.07        -24.03         -37.49%
rng::for_each(multimap<int>)/8                                5.08            6.69          1.61          31.70%
rng::for_each(multimap<int>)/8192                         23130.44         9026.16     -14104.28         -60.98%
rng::for_each(multimap<int>::iterator)/32                    35.40           25.08        -10.32         -29.15%
rng::for_each(multimap<int>::iterator)/50                    64.19           38.15        -26.04         -40.56%
rng::for_each(multimap<int>::iterator)/8                      5.04            5.25          0.22           4.31%
rng::for_each(multimap<int>::iterator)/8192               22875.97         9392.08     -13483.89         -58.94%
rng::for_each(multiset<int>)/32                              35.82           27.11         -8.72         -24.33%
rng::for_each(multiset<int>)/50                              62.92           41.59        -21.32         -33.89%
rng::for_each(multiset<int>)/8                                4.79            6.79          2.00          41.70%
rng::for_each(multiset<int>)/8192                         22642.68         9280.95     -13361.73         -59.01%
rng::for_each(multiset<int>::iterator)/32                    35.76           26.71         -9.04         -25.28%
rng::for_each(multiset<int>::iterator)/50                    63.44           39.00        -24.44         -38.53%
rng::for_each(multiset<int>::iterator)/8                      4.90            5.21          0.30           6.18%
rng::for_each(multiset<int>::iterator)/8192               19930.45         9867.60     -10062.85         -50.49%
rng::for_each(set<int>)/32                                   35.90           27.30         -8.60         -23.96%
rng::for_each(set<int>)/50                                   63.15           40.75        -22.40         -35.47%
rng::for_each(set<int>)/8                                     4.77            6.83          2.06          43.23%
rng::for_each(set<int>)/8192                              20262.77         9381.57     -10881.20         -53.70%
rng::for_each(set<int>::iterator)/32                         36.02           26.42         -9.60         -26.64%
rng::for_each(set<int>::iterator)/50                         63.29           37.97        -25.32         -40.01%
rng::for_each(set<int>::iterator)/8                           4.72            5.22          0.50          10.50%
rng::for_each(set<int>::iterator)/8192                    20041.91         9831.91     -10210.00         -50.94%
```
2025-12-12 09:05:51 +01:00
lbonn
3ebe99f4c2
[libcxx] Unwrap iterators in __find_segment (#161274)
The segmented iterator optimized implementation of find now unwraps
iterators when processing each segments.

As a result, it is able to take better advantage to some find
specializations: calling memchr/wmemchr for vector<vector<{char,int}>>

```
Benchmark                                                         Baseline    Candidate    Difference    % Difference
--------------------------------------------------------------  ----------  -----------  ------------  --------------
rng::find(join_view(deque<deque<int>>))_(process_all)/1024           71.13        61.19         -9.94          -13.97
rng::find(join_view(deque<deque<int>>))_(process_all)/32768        2359.19      2237.02       -122.17           -5.18
rng::find(join_view(deque<deque<int>>))_(process_all)/50             16.88        17.59          0.71            4.20
rng::find(join_view(deque<deque<int>>))_(process_all)/8              15.59        16.10          0.51            3.27
rng::find(join_view(deque<deque<int>>))_(process_all)/8192          647.01       532.75       -114.26          -17.66
rng::find(join_view(list<vector<int>>))_(process_all)/1024          689.76       680.74         -9.02           -1.31
rng::find(join_view(list<vector<int>>))_(process_all)/32768       22284.95     21500.26       -784.69           -3.52
rng::find(join_view(list<vector<int>>))_(process_all)/50             32.77        32.12         -0.65           -1.98
rng::find(join_view(list<vector<int>>))_(process_all)/8               6.11         5.92         -0.19           -3.11
rng::find(join_view(list<vector<int>>))_(process_all)/8192         5527.88      5373.43       -154.45           -2.79
rng::find(join_view(vector<list<int>>))_(process_all)/1024         1305.59      1264.04        -41.55           -3.18
rng::find(join_view(vector<list<int>>))_(process_all)/32768       42840.88     43322.64        481.76            1.12
rng::find(join_view(vector<list<int>>))_(process_all)/50             57.52        62.35          4.82            8.38
rng::find(join_view(vector<list<int>>))_(process_all)/8               6.06         5.98         -0.07           -1.18
rng::find(join_view(vector<list<int>>))_(process_all)/8192        20700.53     21431.66        731.12            3.53
rng::find(join_view(vector<vector<char>>))_(process_all)/1024       310.64        18.34       -292.30          -94.09
rng::find(join_view(vector<vector<char>>))_(process_all)/32768     9424.96       531.99      -8892.97          -94.36
rng::find(join_view(vector<vector<char>>))_(process_all)/50          18.58         3.25        -15.32          -82.49
rng::find(join_view(vector<vector<char>>))_(process_all)/8            4.81         2.98         -1.84          -38.13
rng::find(join_view(vector<vector<char>>))_(process_all)/8192      2437.50       126.88      -2310.62          -94.79
rng::find(join_view(vector<vector<int>>))_(process_all)/1024        297.10        41.70       -255.39          -85.96
rng::find(join_view(vector<vector<int>>))_(process_all)/32768      9662.42      1822.05      -7840.36          -81.14
rng::find(join_view(vector<vector<int>>))_(process_all)/50           22.29         5.10        -17.19          -77.11
rng::find(join_view(vector<vector<int>>))_(process_all)/8             3.73         3.13         -0.60          -16.05
rng::find(join_view(vector<vector<int>>))_(process_all)/8192       2399.68       356.10      -2043.58          -85.16
```
2025-11-28 10:13:21 +01:00
Nikolas Klauser
97367d1046
[libc++] Vectorize std::find (#156431)
```
Apple M4:
-----------------------------------------------------------------------------
Benchmark                                                 old             new
-----------------------------------------------------------------------------
std::find(vector<char>) (bail 25%)/8                  1.43 ns         1.44 ns
std::find(vector<char>) (bail 25%)/1024               5.54 ns         5.59 ns
std::find(vector<char>) (bail 25%)/8192               38.4 ns         39.1 ns
std::find(vector<char>) (bail 25%)/32768               134 ns          136 ns
std::find(vector<int>) (bail 25%)/8                   1.56 ns         1.57 ns
std::find(vector<int>) (bail 25%)/1024                65.3 ns         65.4 ns
std::find(vector<int>) (bail 25%)/8192                 465 ns          464 ns
std::find(vector<int>) (bail 25%)/32768               1832 ns         1832 ns
std::find(vector<long long>) (bail 25%)/8            0.920 ns         1.20 ns
std::find(vector<long long>) (bail 25%)/1024          65.2 ns         31.2 ns
std::find(vector<long long>) (bail 25%)/8192           464 ns          255 ns
std::find(vector<long long>) (bail 25%)/32768         1833 ns          992 ns
std::find(vector<char>) (process all)/8               1.21 ns         1.22 ns
std::find(vector<char>) (process all)/50              1.92 ns         1.93 ns
std::find(vector<char>) (process all)/1024            16.6 ns         16.9 ns
std::find(vector<char>) (process all)/8192             134 ns          136 ns
std::find(vector<char>) (process all)/32768            488 ns          503 ns
std::find(vector<int>) (process all)/8                2.45 ns         2.48 ns
std::find(vector<int>) (process all)/50               12.7 ns         12.7 ns
std::find(vector<int>) (process all)/1024              236 ns          236 ns
std::find(vector<int>) (process all)/8192             1830 ns         1834 ns
std::find(vector<int>) (process all)/32768            7351 ns         7346 ns
std::find(vector<long long>) (process all)/8          2.02 ns         1.45 ns
std::find(vector<long long>) (process all)/50         12.0 ns         6.12 ns
std::find(vector<long long>) (process all)/1024        235 ns          123 ns
std::find(vector<long long>) (process all)/8192       1830 ns          983 ns
std::find(vector<long long>) (process all)/32768      7306 ns         3969 ns
std::find(vector<bool>) (process all)/8               1.14 ns         1.15 ns
std::find(vector<bool>) (process all)/50              1.16 ns         1.17 ns
std::find(vector<bool>) (process all)/1024            4.51 ns         4.53 ns
std::find(vector<bool>) (process all)/8192            33.6 ns         33.5 ns
std::find(vector<bool>) (process all)/1048576         3660 ns         3660 ns
```
2025-09-29 11:10:19 +02:00
Louis Dionne
a34048bfed [libc++][NFC] Fix benchmark name missing a parenthesis 2025-09-24 23:24:55 -04:00
Louis Dionne
794351355e
[libc++] Fix broken unique and unique_copy benchmarks (#158287)
These benchmarks have an assumption that the container size is divisible
by 4 because of how we populate their content, which wasn't satisfied.
2025-09-15 16:00:51 -04:00
Peng Liu
9827440f1e
[libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators (#132896)
Previously, the segmented iterator optimization was limited to `std::{for_each, for_each_n}`. This patch
extends the optimization to `std::ranges::for_each` and `std::ranges::for_each_n`, ensuring consistent
optimizations across these algorithms. This patch first generalizes the `std` algorithms by introducing
a `Projection` parameter, which is set to `__identity` for the `std` algorithms. Then we let the `ranges`
algorithms to directly call their `std` counterparts with a general `__proj` argument. Benchmarks
demonstrate performance improvements of up to 21.4x for ``std::deque::iterator`` and 22.3x for
``join_view`` of ``vector<vector<char>>``.

Addresses a subtask of #102817.
2025-06-18 12:22:47 -04:00
Peng Liu
09c266b75d
[libc++] Optimize std::for_each_n for segmented iterators (#135468)
This patch enhances the performance of `std::for_each_n` when used with
segmented iterators, leading to significant performance improvements,
summarized in the tables below. This addresses a subtask of
https://github.com/llvm/llvm-project/issues/102817.
2025-05-21 12:10:50 -04:00
Louis Dionne
8c435886aa
[libc++] Refactor and add benchmark coverage for [alg.sort] (#128236)
This patch adds missing benchmark coverage for partial_sort,
partial_sort_copy, is_sorted and is_sorted_until.

It also refactors the existing benchmarks for sort and stable_sort to
follow the consistent style of the new algorithm benchmarks. However,
these benchmarks were notoriously slow to run since they tested multiple
data patterns on multiple data types. To try to alleviate this, I
reduced the benchmarks to only run on integral types and on a single
non-integral type, which should faithfully show how the algorithm
behaves for anything non-integral. However, this is technically a
reduction in coverage.
2025-05-14 14:52:17 -04:00
Louis Dionne
410754410f
[libc++] Add benchmarks for partitioning algorithms (#127324)
This patch adds benchmarks for std::partition, is_partitioned, etc and
their ranges:: variants.
2025-03-24 12:24:43 -04:00
Louis Dionne
b3a4bf9d8f
[libc++] Refactor and add benchmarks from [alg.nonmodifying] (#128206) 2025-03-19 00:08:46 -04:00
Louis Dionne
24e88b0e6b
[libc++] Add remaining benchmarks from [alg.modifying.operations] (#127354)
This patch adds benchmarks for all the remaining algorithms in
[alg.modifying.operations] that we didn't already have a benchmark for.
2025-03-17 15:11:13 -04:00
Peng Liu
4baf1c03fa
[libc++] Optimize ranges::rotate for vector<bool>::iterator (#121168)
This PR optimizes the performance of `std::ranges::rotate` for
`vector<bool>::iterator`. The optimization yields a performance
improvement of up to 2096x.

Closes #64038.
2025-03-13 14:07:23 -04:00
Peng Liu
a12744ff05
[libc++] Optimize ranges::swap_ranges for vector<bool>::iterator (#121150)
This PR optimizes the performance of `std::ranges::swap_ranges` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to **611x** for
aligned range swap and **78x** for unaligned range swap comparison.
Additionally, comprehensive tests covering up to 4 storage words (256
bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
2025-03-04 17:15:36 -05:00
Peng Liu
7717a549e9
[libc++] Optimize ranges::equal for vector<bool>::iterator (#121084)
This PR optimizes the performance of `std::ranges::equal` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to 188x for
aligned equality comparison and 82x for unaligned equality
comparison. Moreover, comprehensive tests covering up to 4 storage words
(256 bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
2025-02-26 12:18:25 -05:00
Louis Dionne
8feb5bac32
[libc++] Add benchmarks for copy algorithms (#127328)
This patch adds benchmarks for the copy family of algorithms (copy,
copy_n, copy_if, copy_backward).
2025-02-20 08:24:32 -05:00
Peng Liu
ab3d793982
[libc++] Optimize ranges::move{,_backward} for vector<bool>::iterator (#121109)
As a follow-up to #121013 (which optimized `ranges::copy`) and #121026
(which optimized `ranges::copy_backward`), this PR enhances the
performance of `std::ranges::{move, move_backward}` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.

The optimizations bring performance improvements analogous to those
achieved for the `{copy, copy_backward}` algorithms: up to 2000x for
aligned moves and 60x for unaligned moves. Moreover, comprehensive
tests covering up to 4 storage words (256 bytes) with odd and even bit
sizes are provided, which validate the proposed optimizations in this
patch.
2025-02-19 11:36:45 -05:00
Louis Dionne
a6093d3034
[libc++] Explicitly mention vector_bool in the name of benchmarks (#127313)
We have some benchmarks that were benchmarking very specific
functionality, namely the optimizations in vector<bool>::iterator. Call
this out in the benchmarks by renaming them appropriately. In the future
we will also increase the coverage of these benchmarks to test other
containers.
2025-02-15 14:56:19 +01:00
Louis Dionne
c7c7eabc7f
[libc++] Add a benchmark for std::reverse (#125262) 2025-02-03 12:10:47 -05:00
Louis Dionne
439bef9751
[libc++] Refactor the sequence container benchmarks (#119763)
Rewrite the sequence container benchmarks to only rely on the actual
operations specified in SequenceContainer requirements and add
benchmarks for std::list, which is also considered a sequence container.

One of the major goals of this refactoring is also to make these
container benchmarks run faster so that they can be run more frequently.
The existing benchmarks have the significant problem that they take so
long to run that they must basically be run overnight. This patch
reduces the size of inputs such that the rewritten benchmarks each take
at most a minute to run.

This patch doesn't touch the string benchmarks, which were not using the
generic container benchmark functions previously.
2025-01-30 15:02:34 -05:00
Peng Liu
edc3dc6abd
[libc++] Optimize ranges::copy_backward for vector<bool>::iterator (#121026)
As a follow-up to #121013 (which focused on `std::ranges::copy`), this
PR optimizes the performance of `std::ranges::copy_backward` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to 2000x for
aligned copies and 60x for unaligned copies.
2025-01-30 14:55:05 -05:00
Peng Liu
5b65896ad6
[libc++] Optimize ranges::copy{, _n} for vector<bool>::iterator (#121013)
This PR optimizes the performance of `std::ranges::copy` and
`std::ranges::copy_n` specifically for `vector<bool>::iterator`,
addressing a subtask outlined in issue #64038. The optimizations yield
performance improvements of up to **2000x** for aligned copies and
**60x** for unaligned copies. Additionally, new tests have been added to
validate these enhancements.


- Aligned source-destination bits

ranges::copy
```
--------------------------------------------------------------------------
Benchmark                                Before        After   Improvement
--------------------------------------------------------------------------
bm_ranges_copy_vb_aligned/8              10.8 ns      1.42 ns           8x
bm_ranges_copy_vb_aligned/64             88.5 ns      2.28 ns          39x
bm_ranges_copy_vb_aligned/512             709 ns      1.95 ns         364x
bm_ranges_copy_vb_aligned/4096           5568 ns      5.01 ns        1111x
bm_ranges_copy_vb_aligned/32768         44754 ns      38.7 ns        1156x
bm_ranges_copy_vb_aligned/65536         91092 ns      73.2 ns        1244x
bm_ranges_copy_vb_aligned/102400       139473 ns       127 ns        1098x
bm_ranges_copy_vb_aligned/106496       189004 ns      81.5 ns        2319x
bm_ranges_copy_vb_aligned/110592       153647 ns      71.1 ns        2161x
bm_ranges_copy_vb_aligned/114688       159261 ns      70.2 ns        2269x
bm_ranges_copy_vb_aligned/118784       181910 ns      73.5 ns        2475x
bm_ranges_copy_vb_aligned/122880       174117 ns      76.5 ns        2276x
bm_ranges_copy_vb_aligned/126976       176020 ns      82.0 ns        2147x
bm_ranges_copy_vb_aligned/131072       180757 ns       137 ns        1319x
bm_ranges_copy_vb_aligned/135168       190342 ns       158 ns        1205x
bm_ranges_copy_vb_aligned/139264       192831 ns       103 ns        1872x
bm_ranges_copy_vb_aligned/143360       199627 ns      89.4 ns        2233x
bm_ranges_copy_vb_aligned/147456       203881 ns      88.6 ns        2301x
bm_ranges_copy_vb_aligned/151552       213345 ns      88.4 ns        2413x
bm_ranges_copy_vb_aligned/155648       216892 ns      92.9 ns        2335x
bm_ranges_copy_vb_aligned/159744       222751 ns      96.4 ns        2311x
bm_ranges_copy_vb_aligned/163840       225995 ns       173 ns        1306x
bm_ranges_copy_vb_aligned/167936       235230 ns       202 ns        1165x
bm_ranges_copy_vb_aligned/172032       244093 ns       131 ns        1863x
bm_ranges_copy_vb_aligned/176128       244434 ns       111 ns        2202x
bm_ranges_copy_vb_aligned/180224       249570 ns       108 ns        2311x
bm_ranges_copy_vb_aligned/184320       254538 ns       108 ns        2357x
bm_ranges_copy_vb_aligned/188416       261817 ns       113 ns        2317x
bm_ranges_copy_vb_aligned/192512       269923 ns       125 ns        2159x
bm_ranges_copy_vb_aligned/196608       273494 ns       210 ns        1302x
bm_ranges_copy_vb_aligned/200704       280035 ns       269 ns        1041x
bm_ranges_copy_vb_aligned/204800       293102 ns       231 ns        1269x
```

ranges::copy_n
```
--------------------------------------------------------------------------
Benchmark                                Before        After   Improvement
--------------------------------------------------------------------------
bm_ranges_copy_n_vb_aligned/8            11.8 ns       0.89 ns         13x
bm_ranges_copy_n_vb_aligned/64           91.6 ns       2.06 ns         44x
bm_ranges_copy_n_vb_aligned/512           718 ns       2.45 ns        293x
bm_ranges_copy_n_vb_aligned/4096         5750 ns       5.02 ns       1145x
bm_ranges_copy_n_vb_aligned/32768       45824 ns       40.9 ns       1120x
bm_ranges_copy_n_vb_aligned/65536       92267 ns       73.8 ns       1250x
bm_ranges_copy_n_vb_aligned/102400     143267 ns       125 ns        1146x
bm_ranges_copy_n_vb_aligned/106496     148625 ns      82.4 ns        1804x
bm_ranges_copy_n_vb_aligned/110592     154817 ns      72.0 ns        2150x
bm_ranges_copy_n_vb_aligned/114688     157953 ns      70.4 ns        2244x
bm_ranges_copy_n_vb_aligned/118784     162374 ns      71.5 ns        2270x
bm_ranges_copy_n_vb_aligned/122880     168638 ns      72.9 ns        2313x
bm_ranges_copy_n_vb_aligned/126976     175596 ns      76.6 ns        2292x
bm_ranges_copy_n_vb_aligned/131072     181164 ns       135 ns        1342x
bm_ranges_copy_n_vb_aligned/135168     184697 ns       157 ns        1176x
bm_ranges_copy_n_vb_aligned/139264     191395 ns       104 ns        1840x
bm_ranges_copy_n_vb_aligned/143360     194954 ns      88.3 ns        2208x
bm_ranges_copy_n_vb_aligned/147456     208917 ns      86.1 ns        2426x
bm_ranges_copy_n_vb_aligned/151552     211101 ns      87.2 ns        2421x
bm_ranges_copy_n_vb_aligned/155648     213175 ns      89.0 ns        2395x
bm_ranges_copy_n_vb_aligned/159744     218988 ns      86.7 ns        2526x
bm_ranges_copy_n_vb_aligned/163840     225263 ns       156 ns        1444x
bm_ranges_copy_n_vb_aligned/167936     230725 ns       184 ns        1254x
bm_ranges_copy_n_vb_aligned/172032     235795 ns       119 ns        1981x
bm_ranges_copy_n_vb_aligned/176128     241145 ns       101 ns        2388x
bm_ranges_copy_n_vb_aligned/180224     250680 ns      99.5 ns        2519x
bm_ranges_copy_n_vb_aligned/184320     262954 ns      99.7 ns        2637x
bm_ranges_copy_n_vb_aligned/188416     258584 ns       103 ns        2510x
bm_ranges_copy_n_vb_aligned/192512     267190 ns       125 ns        2138x
bm_ranges_copy_n_vb_aligned/196608     270821 ns       213 ns        1271x
bm_ranges_copy_n_vb_aligned/200704     279532 ns       262 ns        1067x
bm_ranges_copy_n_vb_aligned/204800     283412 ns       222 ns        1277x
```

- Unaligned source-destination bits
```
--------------------------------------------------------------------------------
Benchmark                                    Before           After  Improvement
--------------------------------------------------------------------------------
bm_ranges_copy_vb_unaligned/8               12.8 ns         8.59 ns         1.5x
bm_ranges_copy_vb_unaligned/64              98.2 ns         8.24 ns          12x
bm_ranges_copy_vb_unaligned/512              755 ns         18.1 ns          42x
bm_ranges_copy_vb_unaligned/4096            6027 ns          102 ns          59x
bm_ranges_copy_vb_unaligned/32768          47663 ns          774 ns          62x
bm_ranges_copy_vb_unaligned/262144        378981 ns         6455 ns          59x
bm_ranges_copy_vb_unaligned/1048576      1520486 ns        25942 ns          59x
bm_ranges_copy_n_vb_unaligned/8             11.3 ns         8.22 ns         1.4x
bm_ranges_copy_n_vb_unaligned/64            97.3 ns         7.89 ns          12x
bm_ranges_copy_n_vb_unaligned/512            747 ns         18.1 ns          41x
bm_ranges_copy_n_vb_unaligned/4096          5932 ns         99.0 ns          60x
bm_ranges_copy_n_vb_unaligned/32768        47776 ns         749 ns           64x
bm_ranges_copy_n_vb_unaligned/262144      378802 ns        6576 ns           58x
bm_ranges_copy_n_vb_unaligned/1048576    1547234 ns       26229 ns           59x
```
2025-01-30 17:26:26 +01:00
Nikolas Klauser
fd784726db [libc++] Rewrite minmax_element benchmark
The benchmark currently uses makeCartesianProductBenchmark, which
doesn't make a ton of sense, since minmax_element always goes through
every element one by one. The runtime doesn't depend on the values
of the elements.

Fixes #120758
2024-12-21 20:07:28 +01:00
Louis Dionne
6a9279ca40
[libc++] Slight reorganization of the benchmarks (#119625)
Move various container benchmarks to the same subdirectory, and regroup
some format-related benchmarks.
2024-12-12 08:14:50 -05:00
Louis Dionne
e236a52a88
[libc++] Unify the benchmarks with the test suite (#101399)
Instead of building the benchmarks separately via CMake and running them
separately from the test suite, this patch merges the benchmarks into
the test suite and handles both uniformly.

As a result:
- It is now possible to run individual benchmarks like we run tests
  (e.g. using libcxx-lit), which is a huge quality-of-life improvement.

- The benchmarks will be run under exactly the same configuration as
  the rest of the tests, which is a nice simplification. This does
  mean that one has to be careful to enable the desired optimization
  flags when running benchmarks, but that is easy with e.g.
  `libcxx-lit <...> --param optimization=speed`.

- Benchmarks can use the same annotations as the rest of the test
  suite, such as `// UNSUPPORTED` & friends.

When running the tests via `check-cxx`, we only compile the benchmarks
because running them would be too time consuming. This introduces a bit
of complexity in the testing setup, and instead it would be better to
allow passing a --dry-run flag to GoogleBenchmark executables, which is
the topic of https://github.com/google/benchmark/issues/1827.

I am not really satisfied with the layering violation of adding the
%{benchmark_flags} substitution to cmake-bridge, however I believe
this can be improved in the future.
2024-11-07 09:07:50 -05:00
Louis Dionne
b2d2494731
[libc++] Make benchmarks forward-compatible with the test suite (#114502)
This patch fixes warnings and errors that come up when running the
benchmarks as part of the test suite. It also adds the necessary Lit
annotations to make it pass in various configurations and increases the
portability of the benchmarks.
2024-11-05 09:08:00 -05:00
Nikolas Klauser
d07fdf9779
[libc++] Optimize lexicographical_compare (#65279)
If the comparison operation is equivalent to < and that is a total
order, we know that we can use equality comparison on that type instead
to extract some information. Furthermore, if equality comparison on that
type is trivial, the user can't observe that we're calling it. So
instead of using the user-provided total order, we use std::mismatch,
which uses equality comparison (and is vertorized). Additionally, if the
type is trivially lexicographically comparable, we can go one step
further and use std::memcmp directly instead of calling std::mismatch.

Benchmarks:
```
-------------------------------------------------------------------------------------
Benchmark                                                         old             new
-------------------------------------------------------------------------------------
bm_lexicographical_compare<unsigned char>/1                   1.17 ns         2.34 ns
bm_lexicographical_compare<unsigned char>/2                   1.64 ns         2.57 ns
bm_lexicographical_compare<unsigned char>/3                   2.23 ns         2.58 ns
bm_lexicographical_compare<unsigned char>/4                   2.82 ns         2.57 ns
bm_lexicographical_compare<unsigned char>/5                   3.34 ns         2.11 ns
bm_lexicographical_compare<unsigned char>/6                   3.94 ns         2.21 ns
bm_lexicographical_compare<unsigned char>/7                   4.56 ns         2.11 ns
bm_lexicographical_compare<unsigned char>/8                   5.25 ns         2.11 ns
bm_lexicographical_compare<unsigned char>/16                  9.88 ns         2.11 ns
bm_lexicographical_compare<unsigned char>/64                  38.9 ns         2.36 ns
bm_lexicographical_compare<unsigned char>/512                  317 ns         6.54 ns
bm_lexicographical_compare<unsigned char>/4096                2517 ns         41.4 ns
bm_lexicographical_compare<unsigned char>/32768              20052 ns          488 ns
bm_lexicographical_compare<unsigned char>/262144            159579 ns         4409 ns
bm_lexicographical_compare<unsigned char>/1048576           640456 ns        20342 ns
bm_lexicographical_compare<signed char>/1                     1.18 ns         2.37 ns
bm_lexicographical_compare<signed char>/2                     1.65 ns         2.60 ns
bm_lexicographical_compare<signed char>/3                     2.23 ns         2.83 ns
bm_lexicographical_compare<signed char>/4                     2.81 ns         3.06 ns
bm_lexicographical_compare<signed char>/5                     3.35 ns         3.30 ns
bm_lexicographical_compare<signed char>/6                     3.90 ns         3.99 ns
bm_lexicographical_compare<signed char>/7                     4.56 ns         3.78 ns
bm_lexicographical_compare<signed char>/8                     5.20 ns         4.02 ns
bm_lexicographical_compare<signed char>/16                    9.80 ns         6.21 ns
bm_lexicographical_compare<signed char>/64                    39.0 ns         3.16 ns
bm_lexicographical_compare<signed char>/512                    318 ns         7.58 ns
bm_lexicographical_compare<signed char>/4096                  2514 ns         47.4 ns
bm_lexicographical_compare<signed char>/32768                20096 ns          504 ns
bm_lexicographical_compare<signed char>/262144              156617 ns         4146 ns
bm_lexicographical_compare<signed char>/1048576             624265 ns        19810 ns
bm_lexicographical_compare<int>/1                             1.15 ns         2.12 ns
bm_lexicographical_compare<int>/2                             1.60 ns         2.36 ns
bm_lexicographical_compare<int>/3                             2.21 ns         2.59 ns
bm_lexicographical_compare<int>/4                             2.74 ns         2.83 ns
bm_lexicographical_compare<int>/5                             3.26 ns         3.06 ns
bm_lexicographical_compare<int>/6                             3.81 ns         4.53 ns
bm_lexicographical_compare<int>/7                             4.41 ns         4.72 ns
bm_lexicographical_compare<int>/8                             5.08 ns         2.36 ns
bm_lexicographical_compare<int>/16                            9.54 ns         3.08 ns
bm_lexicographical_compare<int>/64                            37.8 ns         4.71 ns
bm_lexicographical_compare<int>/512                            309 ns         24.6 ns
bm_lexicographical_compare<int>/4096                          2422 ns          204 ns
bm_lexicographical_compare<int>/32768                        19362 ns         1947 ns
bm_lexicographical_compare<int>/262144                      155727 ns        19793 ns
bm_lexicographical_compare<int>/1048576                     623614 ns        80180 ns
bm_ranges_lexicographical_compare<unsigned char>/1            1.07 ns         2.35 ns
bm_ranges_lexicographical_compare<unsigned char>/2            1.72 ns         2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/3            2.46 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/4            3.17 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/5            3.86 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/6            4.55 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/7            5.25 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/8            5.95 ns         2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/16           11.7 ns         2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/64           45.5 ns         2.36 ns
bm_ranges_lexicographical_compare<unsigned char>/512           366 ns         6.35 ns
bm_ranges_lexicographical_compare<unsigned char>/4096         2886 ns         40.9 ns
bm_ranges_lexicographical_compare<unsigned char>/32768       23054 ns          489 ns
bm_ranges_lexicographical_compare<unsigned char>/262144     185302 ns         4339 ns
bm_ranges_lexicographical_compare<unsigned char>/1048576    741576 ns        19430 ns
bm_ranges_lexicographical_compare<signed char>/1              1.10 ns         2.12 ns
bm_ranges_lexicographical_compare<signed char>/2              1.66 ns         2.35 ns
bm_ranges_lexicographical_compare<signed char>/3              2.23 ns         2.58 ns
bm_ranges_lexicographical_compare<signed char>/4              2.82 ns         2.82 ns
bm_ranges_lexicographical_compare<signed char>/5              3.34 ns         3.06 ns
bm_ranges_lexicographical_compare<signed char>/6              3.92 ns         3.99 ns
bm_ranges_lexicographical_compare<signed char>/7              4.64 ns         4.10 ns
bm_ranges_lexicographical_compare<signed char>/8              5.21 ns         4.61 ns
bm_ranges_lexicographical_compare<signed char>/16             9.79 ns         7.42 ns
bm_ranges_lexicographical_compare<signed char>/64             38.9 ns         2.93 ns
bm_ranges_lexicographical_compare<signed char>/512             317 ns         7.31 ns
bm_ranges_lexicographical_compare<signed char>/4096           2500 ns         47.5 ns
bm_ranges_lexicographical_compare<signed char>/32768         19940 ns          496 ns
bm_ranges_lexicographical_compare<signed char>/262144       159166 ns         4393 ns
bm_ranges_lexicographical_compare<signed char>/1048576      638206 ns        19786 ns
bm_ranges_lexicographical_compare<int>/1                      1.10 ns         2.12 ns
bm_ranges_lexicographical_compare<int>/2                      1.64 ns         3.04 ns
bm_ranges_lexicographical_compare<int>/3                      2.23 ns         2.58 ns
bm_ranges_lexicographical_compare<int>/4                      2.81 ns         2.81 ns
bm_ranges_lexicographical_compare<int>/5                      3.35 ns         3.05 ns
bm_ranges_lexicographical_compare<int>/6                      3.94 ns         4.60 ns
bm_ranges_lexicographical_compare<int>/7                      4.60 ns         4.81 ns
bm_ranges_lexicographical_compare<int>/8                      5.19 ns         2.35 ns
bm_ranges_lexicographical_compare<int>/16                     9.85 ns         2.87 ns
bm_ranges_lexicographical_compare<int>/64                     38.9 ns         4.70 ns
bm_ranges_lexicographical_compare<int>/512                     318 ns         24.5 ns
bm_ranges_lexicographical_compare<int>/4096                   2494 ns          202 ns
bm_ranges_lexicographical_compare<int>/32768                 20000 ns         1939 ns
bm_ranges_lexicographical_compare<int>/262144               160433 ns        19730 ns
bm_ranges_lexicographical_compare<int>/1048576              642636 ns        80760 ns
```
2024-08-04 10:02:43 +02:00
Louis Dionne
6a54dfbfe5 [libc++][NFC] Add missing license headers
Also standardize the license comment in several files where it was
different from what we normally do.
2024-07-31 12:58:09 -04:00
Louis Dionne
78b4b5cccb
[libc++] Move the benchmarks under libcxx/test (#99371)
This is an intermediate and fairly mechanical step towards unifying the
benchmarks with the rest of the test suite. Moving this around requires
a few changes, notably making sure we don't throw a wrench into the
discovery process of the normal test suite. This won't be a problem
anymore once benchmarks are taken into account by the test setup out of
the box.
2024-07-31 11:18:32 -04:00