llvm-project

Author	SHA1	Message	Date
Louis Dionne	c7aaaeaef4	[libc++] Rename a few benchmarks to allow identifying what's being benchmarked from the name (#185747 )	2026-03-11 08:15:02 -04:00
Nikolas Klauser	15bcae5e3b	[libc++] Drop the unrepresentative search_n benchmark (#184783 ) This benchmark isn't very good at benchmarking `search_n`, since a good `search_n` implementation can go through it in ~10 perfectly predictable steps. We can drop it to avoid spending unnecessary resources. This also fixes that the two benchmark sets have identical names. Fixes #183832	2026-03-10 10:40:12 +01:00
Nikolas Klauser	ab25249e63	[libc++] Refactor std::push_heap benchmark (#181343 ) We're trying to get the time it takes to run all the benchmarks down, so that we can run them on a regular basis. This patch saves us ~8 minutes per run. Fixes #177028	2026-02-14 08:34:45 +01:00
Nikolas Klauser	8c93fb0c05	[libc++] Refactor benchmarking std::make_heap and std::sort_heap together (#180935 ) We're trying to get the time it takes to run all the benchmarks down, so that we can run them on a regular basis. This patch saves us ~18 minutes per run.	2026-02-11 13:17:44 -05:00
Nikolas Klauser	bc1e852241	[libc++] Refactor std::sort_heap benchmark (#180941 ) We're trying to get the time it takes to run all the benchmarks down, so that we can run them on a regular basis. This patch saves us ~80 seconds per run.	2026-02-11 13:17:28 -05:00
Nikolas Klauser	f32fd56f0e	[libc++] Reduce the number of runs on the ranges::min{,max} benchmarks (#179912 ) Testing a bunch of range sizes has relatively little value. This reduces the number of benchmarks so we can run them on a regular basis. This saves ~10 minutes when running benchmarks. Fixes #179698	2026-02-11 13:17:03 -05:00
Nikolas Klauser	064694160f	[libc++] Rewrite the std::pop_heap benchmark (#179911 ) Testing a bunch of random types has relatively little value. This reduces the number of benchmarks so we can run them on a regular basis. This saves ~90 seconds when running the benchmarks.	2026-02-11 12:47:33 +01:00
Nikolas Klauser	0bf41e3d7a	[libc++] Rewrite the std::make_heap benchmark (#178696 ) This rewrites the `make_heap` benchmark to make it significantly faster to run. In my test it saves ~10 minutes. This patch also drops `ranges::` heap benchmarks, since we've decided to remove `ranges::` benchmarks if there is a `std::` equivalent.	2026-02-05 11:46:11 +01:00
Nikolas Klauser	a2fb416b53	[libc++] Rewrite the std::lower_bound benchmark to be more efficient and add an upper_bound benchmark (#177180 ) The current benchmark is incredibly slow to run. This patch refactors the benchmark to be faster and also adds an equivalent benchmark for `std::upper_bound`. Fixes #177026	2026-01-27 13:46:36 -05:00
Nikolas Klauser	07a0e0fb31	[libc++] Remove benchmarks for ranges algorithms that have a std equivalent (#176138 ) We're currently running all the algorithms benchmarks for the `std` and `ranges` variants, even though almost all the algorithms share the same code. This makes running the benchmarks on a large set of commits very slow and costly. This reduced running the `algorithm/` subdirectory from ~4 hours to roughtly 2.5 hours. Fixes #175973	2026-01-19 16:04:32 +01:00
Nikolas Klauser	4304106dc2	Reapply "[libc++] Optimize std::find_if" (#175903 ) (#175921 ) #175913 removed that `__builtin_assume_dereferenceable(ptr, 0)` implies `ptr != nullptr`, which should allow us to use the builtin with LLVM 23. This reverts commit 776c09c212e945fdceeae240b42c38df3dd34727.	2026-01-15 10:16:45 +01:00
Nikolas Klauser	776c09c212	Revert "[libc++] Optimize std::find_if" (#175903 ) `__builtin_assume_dereferenceable` currently implies that the pointer given to it is non-null, even if the size is zero. This causes miscompilations, since we consider [nullptr, nullptr) to be a valid range. Reverts llvm/llvm-project#167697	2026-01-14 11:15:21 +01:00
Nikolas Klauser	6189512f73	[libc++] Optimize std::find_if (#167697 ) ``` Benchmark 4ecfaa602f56 80d5ac247d34 Difference % Difference ---------------------------- -------------- -------------- ------------ -------------- bm_find_if_autovectorization 1901.51 306.12 -1595.39 -83.90% ```	2026-01-13 11:13:54 +01:00
Nikolas Klauser	fbde9240e9	[libc++] Remove some low value benchmarks (#175178 ) This patch removes some of the benchmarks in our suite that are relatively low value. Specifically: - `lexicographical_compare` isn't benchmarked with every number of elements between 1 and 8 anymore - `bitset` isn't benchmarked with as many sizes anymore. Instead, only some representative sizes are benchmarked. - `to_chars` isn't benchmarked for every possible base anymore, but only 2, 8, 10, 16 and 23 (so we have an uncommon base as well)	2026-01-12 14:09:54 -05:00
Nikolas Klauser	06c6a5020f	[libc++] Optimize search_n (#171389 ) This changes the algorithm to more efficiently skip ranges which cannot match the needle for random access iterators. Specifically, we now search for a mismatching element from the back of the subrange we want to check. When a mismatch occurs we can directly start one after the mismatched element, since there cannot possibly be a matching subrange starting between the start of the subrange we checked and the mismatched element (since all elements have to be equal). The algorithm also remembers the subrange which was already match as being equal and doesn't try to compare it a second time, reducing the time spent in case of a match. ``` Benchmark old new Difference % Difference --------------------------------------------------- -------------- -------------- ------------ -------------- rng::search_n(deque<int>)_(no_match)/1000 458.33 14.22 -444.11 -96.90% rng::search_n(deque<int>)_(no_match)/1024 456.17 13.89 -442.28 -96.95% rng::search_n(deque<int>)_(no_match)/1048576 453420.38 17.69 -453402.69 -100.00% rng::search_n(deque<int>)_(no_match)/8192 3566.08 17.60 -3548.49 -99.51% rng::search_n(deque<int>,_pred)_(no_match)/1000 597.88 15.25 -582.63 -97.45% rng::search_n(deque<int>,_pred)_(no_match)/1024 608.42 15.39 -593.03 -97.47% rng::search_n(deque<int>,_pred)_(no_match)/1048576 594533.99 18.91 -594515.08 -100.00% rng::search_n(deque<int>,_pred)_(no_match)/8192 4670.23 18.88 -4651.35 -99.60% rng::search_n(list<int>)_(no_match)/1000 733.72 730.22 -3.50 -0.48% rng::search_n(list<int>)_(no_match)/1024 759.93 753.10 -6.84 -0.90% rng::search_n(list<int>)_(no_match)/1048576 833841.54 813483.75 -20357.79 -2.44% rng::search_n(list<int>)_(no_match)/8192 8352.18 8417.31 65.14 0.78% rng::search_n(list<int>,_pred)_(no_match)/1000 776.79 789.72 12.93 1.66% rng::search_n(list<int>,_pred)_(no_match)/1024 788.42 806.70 18.28 2.32% rng::search_n(list<int>,_pred)_(no_match)/1048576 955536.40 982976.81 27440.41 2.87% rng::search_n(list<int>,_pred)_(no_match)/8192 8874.02 8915.18 41.16 0.46% rng::search_n(vector<int>)_(no_match)/1000 212.69 3.79 -208.90 -98.22% rng::search_n(vector<int>)_(no_match)/1024 219.67 3.70 -215.96 -98.31% rng::search_n(vector<int>)_(no_match)/1048576 209622.54 3.67 -209618.87 -100.00% rng::search_n(vector<int>)_(no_match)/8192 1643.80 3.83 -1639.98 -99.77% rng::search_n(vector<int>,_pred)_(no_match)/1000 461.93 7.55 -454.38 -98.36% rng::search_n(vector<int>,_pred)_(no_match)/1024 472.43 7.74 -464.69 -98.36% rng::search_n(vector<int>,_pred)_(no_match)/1048576 546180.29 8.71 -546171.58 -100.00% rng::search_n(vector<int>,_pred)_(no_match)/8192 3786.26 7.88 -3778.38 -99.79% std::search_n(deque<int>)_(no_match)/1000 455.53 14.19 -441.34 -96.88% std::search_n(deque<int>)_(no_match)/1024 459.79 13.98 -445.81 -96.96% std::search_n(deque<int>)_(no_match)/1048576 449780.32 17.99 -449762.33 -100.00% std::search_n(deque<int>)_(no_match)/8192 3508.55 17.97 -3490.58 -99.49% std::search_n(deque<int>,_pred)_(no_match)/1000 571.53 17.16 -554.37 -97.00% std::search_n(deque<int>,_pred)_(no_match)/1024 584.43 17.09 -567.34 -97.08% std::search_n(deque<int>,_pred)_(no_match)/1048576 581418.31 19.16 -581399.15 -100.00% std::search_n(deque<int>,_pred)_(no_match)/8192 4661.97 19.36 -4642.61 -99.58% std::search_n(list<int>)_(no_match)/1000 722.45 710.39 -12.06 -1.67% std::search_n(list<int>)_(no_match)/1024 748.50 727.08 -21.42 -2.86% std::search_n(list<int>)_(no_match)/1048576 821655.28 784520.12 -37135.16 -4.52% std::search_n(list<int>)_(no_match)/8192 7941.73 8002.05 60.32 0.76% std::search_n(list<int>,_pred)_(no_match)/1000 766.59 786.31 19.72 2.57% std::search_n(list<int>,_pred)_(no_match)/1024 785.92 804.43 18.51 2.35% std::search_n(list<int>,_pred)_(no_match)/1048576 948252.76 969125.41 20872.65 2.20% std::search_n(list<int>,_pred)_(no_match)/8192 8658.99 8825.71 166.72 1.93% std::search_n(vector<int>)_(no_match)/1000 210.36 3.47 -206.89 -98.35% std::search_n(vector<int>)_(no_match)/1024 217.60 4.13 -213.47 -98.10% std::search_n(vector<int>)_(no_match)/1048576 209386.43 3.51 -209382.92 -100.00% std::search_n(vector<int>)_(no_match)/8192 1643.79 3.50 -1640.29 -99.79% std::search_n(vector<int>,_pred)_(no_match)/1000 460.88 5.44 -455.45 -98.82% std::search_n(vector<int>,_pred)_(no_match)/1024 475.36 5.43 -469.93 -98.86% std::search_n(vector<int>,_pred)_(no_match)/1048576 682722.75 7.15 -682715.60 -100.00% std::search_n(vector<int>,_pred)_(no_match)/8192 3779.95 5.43 -3774.52 -99.86% Geomean 4956.15 87.96 -4868.19 -98.23% ``` Fixes #129327	2026-01-08 11:26:08 +01:00
Nikolas Klauser	f21e56c632	[libc++] Add benchmark for std::find(vector<short>) We have special cases for `std::find` with `char` and `int` on most platforms, so the only other benchmark of or vector implementation is currently with `long long`, which is a rather big type. Adding `short` to the benchmarks allows us to more meaningfully compare the different implementations.	2026-01-07 11:21:57 +01:00
Nikolas Klauser	4fc5b6d8c4	[libc++] Optimize {std,ranges}::for_each for iterating over __trees (#164405 ) This patch optimizes how `for_each` iterates over trees by using recursion and storing pointers to the next nodes on the stack. This avoids pointer chasing through the `__parent_` pointer, reducing cache misses. It also makes use of the compiler being able tail-call optimize the recursive function, removing back-tracking the iterators have to do. ``` Benchmark old new Difference % Difference -------------------------------------------------- -------------- -------------- ------------ -------------- rng::for_each(map<int>)/32 35.19 26.67 -8.52 -24.21% rng::for_each(map<int>)/50 64.13 40.68 -23.45 -36.57% rng::for_each(map<int>)/8 5.06 6.49 1.43 28.21% rng::for_each(map<int>)/8192 22893.89 9266.68 -13627.21 -59.52% rng::for_each(map<int>::iterator)/32 35.51 26.88 -8.63 -24.31% rng::for_each(map<int>::iterator)/50 64.39 41.24 -23.15 -35.95% rng::for_each(map<int>::iterator)/8 5.12 5.93 0.81 15.80% rng::for_each(map<int>::iterator)/8192 21283.14 9736.83 -11546.31 -54.25% rng::for_each(multimap<int>)/32 35.22 26.61 -8.61 -24.45% rng::for_each(multimap<int>)/50 64.10 40.07 -24.03 -37.49% rng::for_each(multimap<int>)/8 5.08 6.69 1.61 31.70% rng::for_each(multimap<int>)/8192 23130.44 9026.16 -14104.28 -60.98% rng::for_each(multimap<int>::iterator)/32 35.40 25.08 -10.32 -29.15% rng::for_each(multimap<int>::iterator)/50 64.19 38.15 -26.04 -40.56% rng::for_each(multimap<int>::iterator)/8 5.04 5.25 0.22 4.31% rng::for_each(multimap<int>::iterator)/8192 22875.97 9392.08 -13483.89 -58.94% rng::for_each(multiset<int>)/32 35.82 27.11 -8.72 -24.33% rng::for_each(multiset<int>)/50 62.92 41.59 -21.32 -33.89% rng::for_each(multiset<int>)/8 4.79 6.79 2.00 41.70% rng::for_each(multiset<int>)/8192 22642.68 9280.95 -13361.73 -59.01% rng::for_each(multiset<int>::iterator)/32 35.76 26.71 -9.04 -25.28% rng::for_each(multiset<int>::iterator)/50 63.44 39.00 -24.44 -38.53% rng::for_each(multiset<int>::iterator)/8 4.90 5.21 0.30 6.18% rng::for_each(multiset<int>::iterator)/8192 19930.45 9867.60 -10062.85 -50.49% rng::for_each(set<int>)/32 35.90 27.30 -8.60 -23.96% rng::for_each(set<int>)/50 63.15 40.75 -22.40 -35.47% rng::for_each(set<int>)/8 4.77 6.83 2.06 43.23% rng::for_each(set<int>)/8192 20262.77 9381.57 -10881.20 -53.70% rng::for_each(set<int>::iterator)/32 36.02 26.42 -9.60 -26.64% rng::for_each(set<int>::iterator)/50 63.29 37.97 -25.32 -40.01% rng::for_each(set<int>::iterator)/8 4.72 5.22 0.50 10.50% rng::for_each(set<int>::iterator)/8192 20041.91 9831.91 -10210.00 -50.94% ```	2025-12-12 09:05:51 +01:00
lbonn	3ebe99f4c2	[libcxx] Unwrap iterators in __find_segment (#161274 ) The segmented iterator optimized implementation of find now unwraps iterators when processing each segments. As a result, it is able to take better advantage to some find specializations: calling memchr/wmemchr for vector<vector<{char,int}>> ``` Benchmark Baseline Candidate Difference % Difference -------------------------------------------------------------- ---------- ----------- ------------ -------------- rng::find(join_view(deque<deque<int>>))_(process_all)/1024 71.13 61.19 -9.94 -13.97 rng::find(join_view(deque<deque<int>>))_(process_all)/32768 2359.19 2237.02 -122.17 -5.18 rng::find(join_view(deque<deque<int>>))_(process_all)/50 16.88 17.59 0.71 4.20 rng::find(join_view(deque<deque<int>>))_(process_all)/8 15.59 16.10 0.51 3.27 rng::find(join_view(deque<deque<int>>))_(process_all)/8192 647.01 532.75 -114.26 -17.66 rng::find(join_view(list<vector<int>>))_(process_all)/1024 689.76 680.74 -9.02 -1.31 rng::find(join_view(list<vector<int>>))_(process_all)/32768 22284.95 21500.26 -784.69 -3.52 rng::find(join_view(list<vector<int>>))_(process_all)/50 32.77 32.12 -0.65 -1.98 rng::find(join_view(list<vector<int>>))_(process_all)/8 6.11 5.92 -0.19 -3.11 rng::find(join_view(list<vector<int>>))_(process_all)/8192 5527.88 5373.43 -154.45 -2.79 rng::find(join_view(vector<list<int>>))_(process_all)/1024 1305.59 1264.04 -41.55 -3.18 rng::find(join_view(vector<list<int>>))_(process_all)/32768 42840.88 43322.64 481.76 1.12 rng::find(join_view(vector<list<int>>))_(process_all)/50 57.52 62.35 4.82 8.38 rng::find(join_view(vector<list<int>>))_(process_all)/8 6.06 5.98 -0.07 -1.18 rng::find(join_view(vector<list<int>>))_(process_all)/8192 20700.53 21431.66 731.12 3.53 rng::find(join_view(vector<vector<char>>))_(process_all)/1024 310.64 18.34 -292.30 -94.09 rng::find(join_view(vector<vector<char>>))_(process_all)/32768 9424.96 531.99 -8892.97 -94.36 rng::find(join_view(vector<vector<char>>))_(process_all)/50 18.58 3.25 -15.32 -82.49 rng::find(join_view(vector<vector<char>>))_(process_all)/8 4.81 2.98 -1.84 -38.13 rng::find(join_view(vector<vector<char>>))_(process_all)/8192 2437.50 126.88 -2310.62 -94.79 rng::find(join_view(vector<vector<int>>))_(process_all)/1024 297.10 41.70 -255.39 -85.96 rng::find(join_view(vector<vector<int>>))_(process_all)/32768 9662.42 1822.05 -7840.36 -81.14 rng::find(join_view(vector<vector<int>>))_(process_all)/50 22.29 5.10 -17.19 -77.11 rng::find(join_view(vector<vector<int>>))_(process_all)/8 3.73 3.13 -0.60 -16.05 rng::find(join_view(vector<vector<int>>))_(process_all)/8192 2399.68 356.10 -2043.58 -85.16 ```	2025-11-28 10:13:21 +01:00
Nikolas Klauser	97367d1046	[libc++] Vectorize std::find (#156431 ) ``` Apple M4: ----------------------------------------------------------------------------- Benchmark old new ----------------------------------------------------------------------------- std::find(vector<char>) (bail 25%)/8 1.43 ns 1.44 ns std::find(vector<char>) (bail 25%)/1024 5.54 ns 5.59 ns std::find(vector<char>) (bail 25%)/8192 38.4 ns 39.1 ns std::find(vector<char>) (bail 25%)/32768 134 ns 136 ns std::find(vector<int>) (bail 25%)/8 1.56 ns 1.57 ns std::find(vector<int>) (bail 25%)/1024 65.3 ns 65.4 ns std::find(vector<int>) (bail 25%)/8192 465 ns 464 ns std::find(vector<int>) (bail 25%)/32768 1832 ns 1832 ns std::find(vector<long long>) (bail 25%)/8 0.920 ns 1.20 ns std::find(vector<long long>) (bail 25%)/1024 65.2 ns 31.2 ns std::find(vector<long long>) (bail 25%)/8192 464 ns 255 ns std::find(vector<long long>) (bail 25%)/32768 1833 ns 992 ns std::find(vector<char>) (process all)/8 1.21 ns 1.22 ns std::find(vector<char>) (process all)/50 1.92 ns 1.93 ns std::find(vector<char>) (process all)/1024 16.6 ns 16.9 ns std::find(vector<char>) (process all)/8192 134 ns 136 ns std::find(vector<char>) (process all)/32768 488 ns 503 ns std::find(vector<int>) (process all)/8 2.45 ns 2.48 ns std::find(vector<int>) (process all)/50 12.7 ns 12.7 ns std::find(vector<int>) (process all)/1024 236 ns 236 ns std::find(vector<int>) (process all)/8192 1830 ns 1834 ns std::find(vector<int>) (process all)/32768 7351 ns 7346 ns std::find(vector<long long>) (process all)/8 2.02 ns 1.45 ns std::find(vector<long long>) (process all)/50 12.0 ns 6.12 ns std::find(vector<long long>) (process all)/1024 235 ns 123 ns std::find(vector<long long>) (process all)/8192 1830 ns 983 ns std::find(vector<long long>) (process all)/32768 7306 ns 3969 ns std::find(vector<bool>) (process all)/8 1.14 ns 1.15 ns std::find(vector<bool>) (process all)/50 1.16 ns 1.17 ns std::find(vector<bool>) (process all)/1024 4.51 ns 4.53 ns std::find(vector<bool>) (process all)/8192 33.6 ns 33.5 ns std::find(vector<bool>) (process all)/1048576 3660 ns 3660 ns ```	2025-09-29 11:10:19 +02:00
Louis Dionne	a34048bfed	[libc++][NFC] Fix benchmark name missing a parenthesis	2025-09-24 23:24:55 -04:00
Louis Dionne	794351355e	[libc++] Fix broken unique and unique_copy benchmarks (#158287 ) These benchmarks have an assumption that the container size is divisible by 4 because of how we populate their content, which wasn't satisfied.	2025-09-15 16:00:51 -04:00
Peng Liu	9827440f1e	[libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators (#132896 ) Previously, the segmented iterator optimization was limited to `std::{for_each, for_each_n}`. This patch extends the optimization to `std::ranges::for_each` and `std::ranges::for_each_n`, ensuring consistent optimizations across these algorithms. This patch first generalizes the `std` algorithms by introducing a `Projection` parameter, which is set to `__identity` for the `std` algorithms. Then we let the `ranges` algorithms to directly call their `std` counterparts with a general `__proj` argument. Benchmarks demonstrate performance improvements of up to 21.4x for ``std::deque::iterator`` and 22.3x for ``join_view`` of ``vector<vector<char>>``. Addresses a subtask of #102817.	2025-06-18 12:22:47 -04:00
Peng Liu	09c266b75d	[libc++] Optimize std::for_each_n for segmented iterators (#135468 ) This patch enhances the performance of `std::for_each_n` when used with segmented iterators, leading to significant performance improvements, summarized in the tables below. This addresses a subtask of https://github.com/llvm/llvm-project/issues/102817.	2025-05-21 12:10:50 -04:00
Louis Dionne	8c435886aa	[libc++] Refactor and add benchmark coverage for [alg.sort] (#128236 ) This patch adds missing benchmark coverage for partial_sort, partial_sort_copy, is_sorted and is_sorted_until. It also refactors the existing benchmarks for sort and stable_sort to follow the consistent style of the new algorithm benchmarks. However, these benchmarks were notoriously slow to run since they tested multiple data patterns on multiple data types. To try to alleviate this, I reduced the benchmarks to only run on integral types and on a single non-integral type, which should faithfully show how the algorithm behaves for anything non-integral. However, this is technically a reduction in coverage.	2025-05-14 14:52:17 -04:00
Louis Dionne	410754410f	[libc++] Add benchmarks for partitioning algorithms (#127324 ) This patch adds benchmarks for std::partition, is_partitioned, etc and their ranges:: variants.	2025-03-24 12:24:43 -04:00
Louis Dionne	b3a4bf9d8f	[libc++] Refactor and add benchmarks from [alg.nonmodifying] (#128206 )	2025-03-19 00:08:46 -04:00
Louis Dionne	24e88b0e6b	[libc++] Add remaining benchmarks from [alg.modifying.operations] (#127354 ) This patch adds benchmarks for all the remaining algorithms in [alg.modifying.operations] that we didn't already have a benchmark for.	2025-03-17 15:11:13 -04:00
Peng Liu	4baf1c03fa	[libc++] Optimize ranges::rotate for vector<bool>::iterator (#121168 ) This PR optimizes the performance of `std::ranges::rotate` for `vector<bool>::iterator`. The optimization yields a performance improvement of up to 2096x. Closes #64038.	2025-03-13 14:07:23 -04:00
Peng Liu	a12744ff05	[libc++] Optimize ranges::swap_ranges for vector<bool>::iterator (#121150 ) This PR optimizes the performance of `std::ranges::swap_ranges` for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to 611x for aligned range swap and 78x for unaligned range swap comparison. Additionally, comprehensive tests covering up to 4 storage words (256 bytes) with odd and even bit sizes are provided, which validate the proposed optimizations in this patch.	2025-03-04 17:15:36 -05:00
Peng Liu	7717a549e9	[libc++] Optimize ranges::equal for vector<bool>::iterator (#121084 ) This PR optimizes the performance of `std::ranges::equal` for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to 188x for aligned equality comparison and 82x for unaligned equality comparison. Moreover, comprehensive tests covering up to 4 storage words (256 bytes) with odd and even bit sizes are provided, which validate the proposed optimizations in this patch.	2025-02-26 12:18:25 -05:00
Louis Dionne	8feb5bac32	[libc++] Add benchmarks for copy algorithms (#127328 ) This patch adds benchmarks for the copy family of algorithms (copy, copy_n, copy_if, copy_backward).	2025-02-20 08:24:32 -05:00
Peng Liu	ab3d793982	[libc++] Optimize ranges::move{,_backward} for vector<bool>::iterator (#121109 ) As a follow-up to #121013 (which optimized `ranges::copy`) and #121026 (which optimized `ranges::copy_backward`), this PR enhances the performance of `std::ranges::{move, move_backward}` for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations bring performance improvements analogous to those achieved for the `{copy, copy_backward}` algorithms: up to 2000x for aligned moves and 60x for unaligned moves. Moreover, comprehensive tests covering up to 4 storage words (256 bytes) with odd and even bit sizes are provided, which validate the proposed optimizations in this patch.	2025-02-19 11:36:45 -05:00
Louis Dionne	a6093d3034	[libc++] Explicitly mention vector_bool in the name of benchmarks (#127313 ) We have some benchmarks that were benchmarking very specific functionality, namely the optimizations in vector<bool>::iterator. Call this out in the benchmarks by renaming them appropriately. In the future we will also increase the coverage of these benchmarks to test other containers.	2025-02-15 14:56:19 +01:00
Louis Dionne	c7c7eabc7f	[libc++] Add a benchmark for std::reverse (#125262 )	2025-02-03 12:10:47 -05:00
Louis Dionne	439bef9751	[libc++] Refactor the sequence container benchmarks (#119763 ) Rewrite the sequence container benchmarks to only rely on the actual operations specified in SequenceContainer requirements and add benchmarks for std::list, which is also considered a sequence container. One of the major goals of this refactoring is also to make these container benchmarks run faster so that they can be run more frequently. The existing benchmarks have the significant problem that they take so long to run that they must basically be run overnight. This patch reduces the size of inputs such that the rewritten benchmarks each take at most a minute to run. This patch doesn't touch the string benchmarks, which were not using the generic container benchmark functions previously.	2025-01-30 15:02:34 -05:00
Peng Liu	edc3dc6abd	[libc++] Optimize ranges::copy_backward for vector<bool>::iterator (#121026 ) As a follow-up to #121013 (which focused on `std::ranges::copy`), this PR optimizes the performance of `std::ranges::copy_backward` for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to 2000x for aligned copies and 60x for unaligned copies.	2025-01-30 14:55:05 -05:00
Peng Liu	5b65896ad6	[libc++] Optimize ranges::copy{, _n} for vector<bool>::iterator (#121013 ) This PR optimizes the performance of `std::ranges::copy` and `std::ranges::copy_n` specifically for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to 2000x for aligned copies and 60x for unaligned copies. Additionally, new tests have been added to validate these enhancements. - Aligned source-destination bits ranges::copy ``` -------------------------------------------------------------------------- Benchmark Before After Improvement -------------------------------------------------------------------------- bm_ranges_copy_vb_aligned/8 10.8 ns 1.42 ns 8x bm_ranges_copy_vb_aligned/64 88.5 ns 2.28 ns 39x bm_ranges_copy_vb_aligned/512 709 ns 1.95 ns 364x bm_ranges_copy_vb_aligned/4096 5568 ns 5.01 ns 1111x bm_ranges_copy_vb_aligned/32768 44754 ns 38.7 ns 1156x bm_ranges_copy_vb_aligned/65536 91092 ns 73.2 ns 1244x bm_ranges_copy_vb_aligned/102400 139473 ns 127 ns 1098x bm_ranges_copy_vb_aligned/106496 189004 ns 81.5 ns 2319x bm_ranges_copy_vb_aligned/110592 153647 ns 71.1 ns 2161x bm_ranges_copy_vb_aligned/114688 159261 ns 70.2 ns 2269x bm_ranges_copy_vb_aligned/118784 181910 ns 73.5 ns 2475x bm_ranges_copy_vb_aligned/122880 174117 ns 76.5 ns 2276x bm_ranges_copy_vb_aligned/126976 176020 ns 82.0 ns 2147x bm_ranges_copy_vb_aligned/131072 180757 ns 137 ns 1319x bm_ranges_copy_vb_aligned/135168 190342 ns 158 ns 1205x bm_ranges_copy_vb_aligned/139264 192831 ns 103 ns 1872x bm_ranges_copy_vb_aligned/143360 199627 ns 89.4 ns 2233x bm_ranges_copy_vb_aligned/147456 203881 ns 88.6 ns 2301x bm_ranges_copy_vb_aligned/151552 213345 ns 88.4 ns 2413x bm_ranges_copy_vb_aligned/155648 216892 ns 92.9 ns 2335x bm_ranges_copy_vb_aligned/159744 222751 ns 96.4 ns 2311x bm_ranges_copy_vb_aligned/163840 225995 ns 173 ns 1306x bm_ranges_copy_vb_aligned/167936 235230 ns 202 ns 1165x bm_ranges_copy_vb_aligned/172032 244093 ns 131 ns 1863x bm_ranges_copy_vb_aligned/176128 244434 ns 111 ns 2202x bm_ranges_copy_vb_aligned/180224 249570 ns 108 ns 2311x bm_ranges_copy_vb_aligned/184320 254538 ns 108 ns 2357x bm_ranges_copy_vb_aligned/188416 261817 ns 113 ns 2317x bm_ranges_copy_vb_aligned/192512 269923 ns 125 ns 2159x bm_ranges_copy_vb_aligned/196608 273494 ns 210 ns 1302x bm_ranges_copy_vb_aligned/200704 280035 ns 269 ns 1041x bm_ranges_copy_vb_aligned/204800 293102 ns 231 ns 1269x ``` ranges::copy_n ``` -------------------------------------------------------------------------- Benchmark Before After Improvement -------------------------------------------------------------------------- bm_ranges_copy_n_vb_aligned/8 11.8 ns 0.89 ns 13x bm_ranges_copy_n_vb_aligned/64 91.6 ns 2.06 ns 44x bm_ranges_copy_n_vb_aligned/512 718 ns 2.45 ns 293x bm_ranges_copy_n_vb_aligned/4096 5750 ns 5.02 ns 1145x bm_ranges_copy_n_vb_aligned/32768 45824 ns 40.9 ns 1120x bm_ranges_copy_n_vb_aligned/65536 92267 ns 73.8 ns 1250x bm_ranges_copy_n_vb_aligned/102400 143267 ns 125 ns 1146x bm_ranges_copy_n_vb_aligned/106496 148625 ns 82.4 ns 1804x bm_ranges_copy_n_vb_aligned/110592 154817 ns 72.0 ns 2150x bm_ranges_copy_n_vb_aligned/114688 157953 ns 70.4 ns 2244x bm_ranges_copy_n_vb_aligned/118784 162374 ns 71.5 ns 2270x bm_ranges_copy_n_vb_aligned/122880 168638 ns 72.9 ns 2313x bm_ranges_copy_n_vb_aligned/126976 175596 ns 76.6 ns 2292x bm_ranges_copy_n_vb_aligned/131072 181164 ns 135 ns 1342x bm_ranges_copy_n_vb_aligned/135168 184697 ns 157 ns 1176x bm_ranges_copy_n_vb_aligned/139264 191395 ns 104 ns 1840x bm_ranges_copy_n_vb_aligned/143360 194954 ns 88.3 ns 2208x bm_ranges_copy_n_vb_aligned/147456 208917 ns 86.1 ns 2426x bm_ranges_copy_n_vb_aligned/151552 211101 ns 87.2 ns 2421x bm_ranges_copy_n_vb_aligned/155648 213175 ns 89.0 ns 2395x bm_ranges_copy_n_vb_aligned/159744 218988 ns 86.7 ns 2526x bm_ranges_copy_n_vb_aligned/163840 225263 ns 156 ns 1444x bm_ranges_copy_n_vb_aligned/167936 230725 ns 184 ns 1254x bm_ranges_copy_n_vb_aligned/172032 235795 ns 119 ns 1981x bm_ranges_copy_n_vb_aligned/176128 241145 ns 101 ns 2388x bm_ranges_copy_n_vb_aligned/180224 250680 ns 99.5 ns 2519x bm_ranges_copy_n_vb_aligned/184320 262954 ns 99.7 ns 2637x bm_ranges_copy_n_vb_aligned/188416 258584 ns 103 ns 2510x bm_ranges_copy_n_vb_aligned/192512 267190 ns 125 ns 2138x bm_ranges_copy_n_vb_aligned/196608 270821 ns 213 ns 1271x bm_ranges_copy_n_vb_aligned/200704 279532 ns 262 ns 1067x bm_ranges_copy_n_vb_aligned/204800 283412 ns 222 ns 1277x ``` - Unaligned source-destination bits ``` -------------------------------------------------------------------------------- Benchmark Before After Improvement -------------------------------------------------------------------------------- bm_ranges_copy_vb_unaligned/8 12.8 ns 8.59 ns 1.5x bm_ranges_copy_vb_unaligned/64 98.2 ns 8.24 ns 12x bm_ranges_copy_vb_unaligned/512 755 ns 18.1 ns 42x bm_ranges_copy_vb_unaligned/4096 6027 ns 102 ns 59x bm_ranges_copy_vb_unaligned/32768 47663 ns 774 ns 62x bm_ranges_copy_vb_unaligned/262144 378981 ns 6455 ns 59x bm_ranges_copy_vb_unaligned/1048576 1520486 ns 25942 ns 59x bm_ranges_copy_n_vb_unaligned/8 11.3 ns 8.22 ns 1.4x bm_ranges_copy_n_vb_unaligned/64 97.3 ns 7.89 ns 12x bm_ranges_copy_n_vb_unaligned/512 747 ns 18.1 ns 41x bm_ranges_copy_n_vb_unaligned/4096 5932 ns 99.0 ns 60x bm_ranges_copy_n_vb_unaligned/32768 47776 ns 749 ns 64x bm_ranges_copy_n_vb_unaligned/262144 378802 ns 6576 ns 58x bm_ranges_copy_n_vb_unaligned/1048576 1547234 ns 26229 ns 59x ```	2025-01-30 17:26:26 +01:00
Nikolas Klauser	fd784726db	[libc++] Rewrite minmax_element benchmark The benchmark currently uses makeCartesianProductBenchmark, which doesn't make a ton of sense, since minmax_element always goes through every element one by one. The runtime doesn't depend on the values of the elements. Fixes #120758	2024-12-21 20:07:28 +01:00
Louis Dionne	6a9279ca40	[libc++] Slight reorganization of the benchmarks (#119625 ) Move various container benchmarks to the same subdirectory, and regroup some format-related benchmarks.	2024-12-12 08:14:50 -05:00
Louis Dionne	e236a52a88	[libc++] Unify the benchmarks with the test suite (#101399 ) Instead of building the benchmarks separately via CMake and running them separately from the test suite, this patch merges the benchmarks into the test suite and handles both uniformly. As a result: - It is now possible to run individual benchmarks like we run tests (e.g. using libcxx-lit), which is a huge quality-of-life improvement. - The benchmarks will be run under exactly the same configuration as the rest of the tests, which is a nice simplification. This does mean that one has to be careful to enable the desired optimization flags when running benchmarks, but that is easy with e.g. `libcxx-lit <...> --param optimization=speed`. - Benchmarks can use the same annotations as the rest of the test suite, such as `// UNSUPPORTED` & friends. When running the tests via `check-cxx`, we only compile the benchmarks because running them would be too time consuming. This introduces a bit of complexity in the testing setup, and instead it would be better to allow passing a --dry-run flag to GoogleBenchmark executables, which is the topic of https://github.com/google/benchmark/issues/1827. I am not really satisfied with the layering violation of adding the %{benchmark_flags} substitution to cmake-bridge, however I believe this can be improved in the future.	2024-11-07 09:07:50 -05:00
Louis Dionne	b2d2494731	[libc++] Make benchmarks forward-compatible with the test suite (#114502 ) This patch fixes warnings and errors that come up when running the benchmarks as part of the test suite. It also adds the necessary Lit annotations to make it pass in various configurations and increases the portability of the benchmarks.	2024-11-05 09:08:00 -05:00
Nikolas Klauser	d07fdf9779	[libc++] Optimize lexicographical_compare (#65279 ) If the comparison operation is equivalent to < and that is a total order, we know that we can use equality comparison on that type instead to extract some information. Furthermore, if equality comparison on that type is trivial, the user can't observe that we're calling it. So instead of using the user-provided total order, we use std::mismatch, which uses equality comparison (and is vertorized). Additionally, if the type is trivially lexicographically comparable, we can go one step further and use std::memcmp directly instead of calling std::mismatch. Benchmarks: ``` ------------------------------------------------------------------------------------- Benchmark old new ------------------------------------------------------------------------------------- bm_lexicographical_compare<unsigned char>/1 1.17 ns 2.34 ns bm_lexicographical_compare<unsigned char>/2 1.64 ns 2.57 ns bm_lexicographical_compare<unsigned char>/3 2.23 ns 2.58 ns bm_lexicographical_compare<unsigned char>/4 2.82 ns 2.57 ns bm_lexicographical_compare<unsigned char>/5 3.34 ns 2.11 ns bm_lexicographical_compare<unsigned char>/6 3.94 ns 2.21 ns bm_lexicographical_compare<unsigned char>/7 4.56 ns 2.11 ns bm_lexicographical_compare<unsigned char>/8 5.25 ns 2.11 ns bm_lexicographical_compare<unsigned char>/16 9.88 ns 2.11 ns bm_lexicographical_compare<unsigned char>/64 38.9 ns 2.36 ns bm_lexicographical_compare<unsigned char>/512 317 ns 6.54 ns bm_lexicographical_compare<unsigned char>/4096 2517 ns 41.4 ns bm_lexicographical_compare<unsigned char>/32768 20052 ns 488 ns bm_lexicographical_compare<unsigned char>/262144 159579 ns 4409 ns bm_lexicographical_compare<unsigned char>/1048576 640456 ns 20342 ns bm_lexicographical_compare<signed char>/1 1.18 ns 2.37 ns bm_lexicographical_compare<signed char>/2 1.65 ns 2.60 ns bm_lexicographical_compare<signed char>/3 2.23 ns 2.83 ns bm_lexicographical_compare<signed char>/4 2.81 ns 3.06 ns bm_lexicographical_compare<signed char>/5 3.35 ns 3.30 ns bm_lexicographical_compare<signed char>/6 3.90 ns 3.99 ns bm_lexicographical_compare<signed char>/7 4.56 ns 3.78 ns bm_lexicographical_compare<signed char>/8 5.20 ns 4.02 ns bm_lexicographical_compare<signed char>/16 9.80 ns 6.21 ns bm_lexicographical_compare<signed char>/64 39.0 ns 3.16 ns bm_lexicographical_compare<signed char>/512 318 ns 7.58 ns bm_lexicographical_compare<signed char>/4096 2514 ns 47.4 ns bm_lexicographical_compare<signed char>/32768 20096 ns 504 ns bm_lexicographical_compare<signed char>/262144 156617 ns 4146 ns bm_lexicographical_compare<signed char>/1048576 624265 ns 19810 ns bm_lexicographical_compare<int>/1 1.15 ns 2.12 ns bm_lexicographical_compare<int>/2 1.60 ns 2.36 ns bm_lexicographical_compare<int>/3 2.21 ns 2.59 ns bm_lexicographical_compare<int>/4 2.74 ns 2.83 ns bm_lexicographical_compare<int>/5 3.26 ns 3.06 ns bm_lexicographical_compare<int>/6 3.81 ns 4.53 ns bm_lexicographical_compare<int>/7 4.41 ns 4.72 ns bm_lexicographical_compare<int>/8 5.08 ns 2.36 ns bm_lexicographical_compare<int>/16 9.54 ns 3.08 ns bm_lexicographical_compare<int>/64 37.8 ns 4.71 ns bm_lexicographical_compare<int>/512 309 ns 24.6 ns bm_lexicographical_compare<int>/4096 2422 ns 204 ns bm_lexicographical_compare<int>/32768 19362 ns 1947 ns bm_lexicographical_compare<int>/262144 155727 ns 19793 ns bm_lexicographical_compare<int>/1048576 623614 ns 80180 ns bm_ranges_lexicographical_compare<unsigned char>/1 1.07 ns 2.35 ns bm_ranges_lexicographical_compare<unsigned char>/2 1.72 ns 2.13 ns bm_ranges_lexicographical_compare<unsigned char>/3 2.46 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/4 3.17 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/5 3.86 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/6 4.55 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/7 5.25 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/8 5.95 ns 2.13 ns bm_ranges_lexicographical_compare<unsigned char>/16 11.7 ns 2.13 ns bm_ranges_lexicographical_compare<unsigned char>/64 45.5 ns 2.36 ns bm_ranges_lexicographical_compare<unsigned char>/512 366 ns 6.35 ns bm_ranges_lexicographical_compare<unsigned char>/4096 2886 ns 40.9 ns bm_ranges_lexicographical_compare<unsigned char>/32768 23054 ns 489 ns bm_ranges_lexicographical_compare<unsigned char>/262144 185302 ns 4339 ns bm_ranges_lexicographical_compare<unsigned char>/1048576 741576 ns 19430 ns bm_ranges_lexicographical_compare<signed char>/1 1.10 ns 2.12 ns bm_ranges_lexicographical_compare<signed char>/2 1.66 ns 2.35 ns bm_ranges_lexicographical_compare<signed char>/3 2.23 ns 2.58 ns bm_ranges_lexicographical_compare<signed char>/4 2.82 ns 2.82 ns bm_ranges_lexicographical_compare<signed char>/5 3.34 ns 3.06 ns bm_ranges_lexicographical_compare<signed char>/6 3.92 ns 3.99 ns bm_ranges_lexicographical_compare<signed char>/7 4.64 ns 4.10 ns bm_ranges_lexicographical_compare<signed char>/8 5.21 ns 4.61 ns bm_ranges_lexicographical_compare<signed char>/16 9.79 ns 7.42 ns bm_ranges_lexicographical_compare<signed char>/64 38.9 ns 2.93 ns bm_ranges_lexicographical_compare<signed char>/512 317 ns 7.31 ns bm_ranges_lexicographical_compare<signed char>/4096 2500 ns 47.5 ns bm_ranges_lexicographical_compare<signed char>/32768 19940 ns 496 ns bm_ranges_lexicographical_compare<signed char>/262144 159166 ns 4393 ns bm_ranges_lexicographical_compare<signed char>/1048576 638206 ns 19786 ns bm_ranges_lexicographical_compare<int>/1 1.10 ns 2.12 ns bm_ranges_lexicographical_compare<int>/2 1.64 ns 3.04 ns bm_ranges_lexicographical_compare<int>/3 2.23 ns 2.58 ns bm_ranges_lexicographical_compare<int>/4 2.81 ns 2.81 ns bm_ranges_lexicographical_compare<int>/5 3.35 ns 3.05 ns bm_ranges_lexicographical_compare<int>/6 3.94 ns 4.60 ns bm_ranges_lexicographical_compare<int>/7 4.60 ns 4.81 ns bm_ranges_lexicographical_compare<int>/8 5.19 ns 2.35 ns bm_ranges_lexicographical_compare<int>/16 9.85 ns 2.87 ns bm_ranges_lexicographical_compare<int>/64 38.9 ns 4.70 ns bm_ranges_lexicographical_compare<int>/512 318 ns 24.5 ns bm_ranges_lexicographical_compare<int>/4096 2494 ns 202 ns bm_ranges_lexicographical_compare<int>/32768 20000 ns 1939 ns bm_ranges_lexicographical_compare<int>/262144 160433 ns 19730 ns bm_ranges_lexicographical_compare<int>/1048576 642636 ns 80760 ns ```	2024-08-04 10:02:43 +02:00
Louis Dionne	6a54dfbfe5	[libc++][NFC] Add missing license headers Also standardize the license comment in several files where it was different from what we normally do.	2024-07-31 12:58:09 -04:00
Louis Dionne	78b4b5cccb	[libc++] Move the benchmarks under libcxx/test (#99371 ) This is an intermediate and fairly mechanical step towards unifying the benchmarks with the rest of the test suite. Moving this around requires a few changes, notably making sure we don't throw a wrench into the discovery process of the normal test suite. This won't be a problem anymore once benchmarks are taken into account by the test setup out of the box.	2024-07-31 11:18:32 -04:00

44 Commits