Previously, the segmented iterator optimization was limited to `std::{for_each, for_each_n}`. This patch
extends the optimization to `std::ranges::for_each` and `std::ranges::for_each_n`, ensuring consistent
optimizations across these algorithms. This patch first generalizes the `std` algorithms by introducing
a `Projection` parameter, which is set to `__identity` for the `std` algorithms. The `ranges`
algorithms then directly call their `std` counterparts, passing their `__proj` argument through. Benchmarks
demonstrate performance improvements of up to 21.4x for `std::deque::iterator` and 22.3x for
`join_view` of `vector<vector<char>>`.
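For illustration, here is a minimal sketch of the shape of this generalization; `identity_proj` and `generic_for_each` are hypothetical stand-ins for libc++'s internal `__identity` and `for_each` machinery, and the segmented-iterator dispatch is omitted:

```cpp
#include <functional>

// Stand-in for libc++'s __identity: a projection that returns its argument.
struct identity_proj {
  template <class T>
  constexpr T&& operator()(T&& t) const noexcept {
    return static_cast<T&&>(t);
  }
};

// The generalized algorithm takes a projection; the plain std:: entry point
// would pass identity_proj{}, while the ranges:: overload forwards the
// user-provided projection.
template <class Iter, class Sent, class Func, class Proj>
constexpr Func generic_for_each(Iter first, Sent last, Func f, Proj proj) {
  for (; first != last; ++first)
    std::invoke(f, std::invoke(proj, *first));
  return f;
}
```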
Addresses a subtask of #102817.
This patch enhances the performance of `std::for_each_n` when used with
segmented iterators, leading to the significant improvements summarized
in the tables below. This addresses a subtask of
https://github.com/llvm/llvm-project/issues/102817.
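The core of the segmented-iterator optimization is to traverse each underlying segment with a tight, pointer-based inner loop rather than advancing the segmented iterator one element at a time. A rough sketch of the idea, where `local()` and `segment_end()` are hypothetical accessors standing in for what libc++'s `__segmented_iterator_traits` provides:

```cpp
#include <algorithm>

// Assumed interface: local() returns a pointer into the current segment,
// segment_end() a pointer one past that segment's last element.
template <class SegIter, class Size, class Func>
SegIter for_each_n_segmented(SegIter first, Size n, Func f) {
  while (n > 0) {
    auto* p   = first.local();
    auto* end = first.segment_end();
    Size len  = std::min<Size>(n, static_cast<Size>(end - p));
    for (Size i = 0; i < len; ++i)
      f(p[i]);      // fast inner loop over contiguous memory
    first += len;   // crosses a segment boundary when necessary
    n -= len;
  }
  return first;
}
```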
This patch adds missing benchmark coverage for partial_sort,
partial_sort_copy, is_sorted and is_sorted_until.
It also refactors the existing benchmarks for sort and stable_sort to
follow the consistent style of the new algorithm benchmarks. These
benchmarks were notoriously slow to run since they tested multiple data
patterns on multiple data types. To alleviate this, I reduced the
benchmarks to run only on integral types and on a single non-integral
type, which should faithfully show how the algorithm behaves for
anything non-integral. However, this is technically a reduction in
coverage.
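As a rough sketch of what this reduced coverage looks like in practice (hypothetical benchmark code, not the actual file; `BM_sort` and `from_index` are illustrative names):

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>
#include <benchmark/benchmark.h>

// Produce a comparable value of type T from an index.
template <class T>
T from_index(std::size_t i) { return static_cast<T>(i); }
template <>
std::string from_index<std::string>(std::size_t i) { return std::to_string(i); }

// A single template instantiated for the integral types and for one
// non-integral type, instead of a full cross product of element types.
template <class T>
static void BM_sort(benchmark::State& state) {
  std::mt19937 gen(42);
  std::vector<T> data;
  for (std::size_t i = 0; i != static_cast<std::size_t>(state.range(0)); ++i)
    data.push_back(from_index<T>(i));
  for (auto _ : state) {
    state.PauseTiming();
    std::shuffle(data.begin(), data.end(), gen);
    state.ResumeTiming();
    std::sort(data.begin(), data.end());
    benchmark::DoNotOptimize(data);
  }
}
BENCHMARK_TEMPLATE(BM_sort, int)->Arg(1024);
BENCHMARK_TEMPLATE(BM_sort, std::string)->Arg(1024);
BENCHMARK_MAIN();
```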
This PR optimizes the performance of `std::ranges::rotate` for
`vector<bool>::iterator`. The optimization yields a performance
improvement of up to 2096x.
Closes #64038.
This PR optimizes the performance of `std::ranges::swap_ranges` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to **611x** for
aligned range swaps and **78x** for unaligned range swaps.
Additionally, comprehensive tests covering up to 4 storage words (256
bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
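The aligned fast path amounts to swapping whole storage words at a time and masking the final partial word. A simplified sketch of that idea, assuming 64-bit storage words and both ranges starting at bit offset 0 (libc++'s real implementation works on its internal `__bit_iterator` and also handles the unaligned case):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Swap nbits bits between two word arrays. Whole words are swapped directly;
// the trailing partial word is swapped with a masked XOR so that bits outside
// the range are left untouched.
void swap_bits_aligned(std::uint64_t* a, std::uint64_t* b, std::size_t nbits) {
  std::size_t nwords = nbits / 64;
  for (std::size_t i = 0; i != nwords; ++i)
    std::swap(a[i], b[i]);
  if (std::size_t rem = nbits % 64) {
    std::uint64_t mask = (std::uint64_t{1} << rem) - 1;  // low rem bits
    std::uint64_t diff = (a[nwords] ^ b[nwords]) & mask; // bits that differ
    a[nwords] ^= diff;
    b[nwords] ^= diff;
  }
}
```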
This PR optimizes the performance of `std::ranges::equal` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to 188x for
aligned equality comparison and 82x for unaligned equality
comparison. Moreover, comprehensive tests covering up to 4 storage words
(256 bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
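The same word-at-a-time idea underlies the equality fast path: compare full storage words directly and mask the last partial word. A minimal sketch of the aligned case, again assuming 64-bit storage words and ranges starting at bit offset 0 (hypothetical helper; the real code operates on `__bit_iterator` and covers unaligned offsets too):

```cpp
#include <cstddef>
#include <cstdint>

// Compare nbits bits starting at bit offset 0 in both word arrays.
bool equal_bits_aligned(const std::uint64_t* a, const std::uint64_t* b,
                        std::size_t nbits) {
  std::size_t nwords = nbits / 64;
  for (std::size_t i = 0; i != nwords; ++i)
    if (a[i] != b[i])  // whole-word comparison, 64 bits at a time
      return false;
  std::size_t rem = nbits % 64;
  if (rem == 0)
    return true;
  std::uint64_t mask = (std::uint64_t{1} << rem) - 1; // low rem bits
  return ((a[nwords] ^ b[nwords]) & mask) == 0;
}
```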
As a follow-up to #121013 (which optimized `ranges::copy`) and #121026
(which optimized `ranges::copy_backward`), this PR enhances the
performance of `std::ranges::{move, move_backward}` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations bring performance improvements analogous to those
achieved for the `{copy, copy_backward}` algorithms: up to 2000x for
aligned moves and 60x for unaligned moves. Moreover, comprehensive
tests covering up to 4 storage words (256 bytes) with odd and even bit
sizes are provided, which validate the proposed optimizations in this
patch.
Some of our benchmarks were exercising very specific functionality,
namely the optimizations in vector<bool>::iterator. Call this out by
renaming the benchmarks appropriately. In the future
we will also increase the coverage of these benchmarks to test other
containers.
Rewrite the sequence container benchmarks to only rely on the actual
operations specified in SequenceContainer requirements and add
benchmarks for std::list, which is also considered a sequence container.
One of the major goals of this refactoring is also to make these
container benchmarks run faster so that they can be run more frequently.
The existing benchmarks have the significant problem that they take so
long to run that they must basically be run overnight. This patch
reduces the size of inputs such that the rewritten benchmarks each take
at most a minute to run.
This patch doesn't touch the string benchmarks, which were not using the
generic container benchmark functions previously.
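A hypothetical sketch of the container-generic style: the body uses only operations guaranteed by the SequenceContainer requirements, so one template covers `vector`, `deque`, and `list` alike (`BM_insert_front` is an illustrative name, not an actual benchmark from the suite):

```cpp
#include <deque>
#include <list>
#include <vector>
#include <benchmark/benchmark.h>

// insert(begin(), value) is required of every SequenceContainer, so this
// single template benchmarks all three containers without special-casing.
template <class Container>
static void BM_insert_front(benchmark::State& state) {
  using T = typename Container::value_type;
  for (auto _ : state) {
    Container c;
    for (int i = 0; i != 1024; ++i)
      c.insert(c.begin(), T(i));
    benchmark::DoNotOptimize(c);
  }
}
BENCHMARK_TEMPLATE(BM_insert_front, std::vector<int>);
BENCHMARK_TEMPLATE(BM_insert_front, std::deque<int>);
BENCHMARK_TEMPLATE(BM_insert_front, std::list<int>);
BENCHMARK_MAIN();
```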
As a follow-up to #121013 (which focused on `std::ranges::copy`), this
PR optimizes the performance of `std::ranges::copy_backward` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to 2000x for
aligned copies and 60x for unaligned copies.
The benchmark currently uses makeCartesianProductBenchmark, which
doesn't make a ton of sense, since minmax_element always goes through
every element one by one. The runtime doesn't depend on the values
of the elements.
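A sketch of the simpler shape such a benchmark can take, assuming only the input size matters (illustrative code, not the actual replacement):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>
#include <benchmark/benchmark.h>

// minmax_element makes exactly one pass whatever the values are, so
// parameterizing over the input size alone is sufficient.
static void BM_minmax_element(benchmark::State& state) {
  std::vector<int> v(state.range(0));
  std::iota(v.begin(), v.end(), 0);
  for (auto _ : state) {
    auto result = std::minmax_element(v.begin(), v.end());
    benchmark::DoNotOptimize(result);
  }
}
BENCHMARK(BM_minmax_element)->Range(8, 1 << 16);
BENCHMARK_MAIN();
```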
Fixes #120758
Instead of building the benchmarks separately via CMake and running them
separately from the test suite, this patch merges the benchmarks into
the test suite and handles both uniformly.
As a result:
- It is now possible to run individual benchmarks like we run tests
(e.g. using libcxx-lit), which is a huge quality-of-life improvement.
- The benchmarks will be run under exactly the same configuration as
the rest of the tests, which is a nice simplification. This does
mean that one has to be careful to enable the desired optimization
flags when running benchmarks, but that is easy with e.g.
`libcxx-lit <...> --param optimization=speed`.
- Benchmarks can use the same annotations as the rest of the test
  suite, such as `// UNSUPPORTED` & friends (see the example after this
  list).
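For example, a benchmark translation unit can now carry the usual Lit directives in comments, just like a regular test (hypothetical file; `BM_count` is an illustrative name):

```cpp
// UNSUPPORTED: c++03, c++11, c++14

#include <algorithm>
#include <vector>
#include <benchmark/benchmark.h>

static void BM_count(benchmark::State& state) {
  std::vector<int> v(1024, 1);
  for (auto _ : state)
    benchmark::DoNotOptimize(std::count(v.begin(), v.end(), 1));
}
BENCHMARK(BM_count);
BENCHMARK_MAIN();
```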
When running the tests via `check-cxx`, we only compile the benchmarks
because running them would be too time consuming. This introduces a bit
of complexity in the testing setup, and instead it would be better to
allow passing a --dry-run flag to GoogleBenchmark executables, which is
the topic of https://github.com/google/benchmark/issues/1827.
I am not really satisfied with the layering violation of adding the
%{benchmark_flags} substitution to cmake-bridge; however, I believe
this can be improved in the future.
This patch fixes warnings and errors that come up when running the
benchmarks as part of the test suite. It also adds the necessary Lit
annotations to make it pass in various configurations and increases the
portability of the benchmarks.
This is an intermediate and fairly mechanical step towards unifying the
benchmarks with the rest of the test suite. Moving this around requires
a few changes, notably making sure we don't throw a wrench into the
discovery process of the normal test suite. This won't be a problem
anymore once benchmarks are taken into account by the test setup out of
the box.