365 Commits

Author SHA1 Message Date
Дмитрий Изволов
69b54c1a05
[libcxx][algorithm] Optimize std::stable_sort via radix sort algorithm (#104683)
The LSD radix sort algorithm can speed up std::stable_sort dramatically
when sorting integers.
The speedup ranges from relatively small to about 10x, depending on the
type of the sorted elements and the initial order of the array.
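
For readers unfamiliar with the technique, here is a minimal sketch of a
byte-wise LSD radix sort. It is an illustration only, not the code added to
libc++ by this commit: the names `to_key` and `lsd_radix_sort`, the fixed
8-bit digit width, and the use of `std::vector` are all assumptions made for
the example. Each pass is a stable counting sort on one byte of the key, so
the sort as a whole is stable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical helper (not a libc++ internal): maps an integer to an unsigned
// key whose unsigned order matches the order of the original values.
template <class T>
std::make_unsigned_t<T> to_key(T value) {
    using U = std::make_unsigned_t<T>;
    U key = static_cast<U>(value);
    if constexpr (std::is_signed_v<T>) {
        // Flipping the sign bit places negative values below non-negative ones.
        key ^= static_cast<U>(U{1} << (std::numeric_limits<U>::digits - 1));
    }
    return key;
}

// Hypothetical sketch of an LSD radix sort, one byte per pass.
template <class T>
void lsd_radix_sort(std::vector<T>& data) {
    static_assert(std::is_integral_v<T>, "this sketch handles integer keys only");
    using U = std::make_unsigned_t<T>;
    constexpr int kBitsPerPass = 8;  // assumed digit width
    constexpr std::size_t kBuckets = std::size_t{1} << kBitsPerPass;
    constexpr int kPasses = std::numeric_limits<U>::digits / kBitsPerPass;

    std::vector<T> buffer(data.size());
    for (int pass = 0; pass < kPasses; ++pass) {
        const int shift = pass * kBitsPerPass;

        // 1. Count how many keys fall into each bucket for this byte.
        std::array<std::size_t, kBuckets> count{};
        for (T value : data)
            ++count[(to_key(value) >> shift) & (kBuckets - 1)];

        // 2. Exclusive prefix sum: count[b] becomes the first output index of bucket b.
        std::size_t offset = 0;
        for (std::size_t& c : count) {
            const std::size_t n = c;
            c = offset;
            offset += n;
        }
        assert(offset == data.size());

        // 3. Scatter in input order, which keeps equal digits in their original order.
        for (T value : data)
            buffer[count[(to_key(value) >> shift) & (kBuckets - 1)]++] = value;

        data.swap(buffer);
    }
}
```

The sketch does O(n) work per pass but needs a buffer of n elements and a
fixed per-pass cost for the 256 counters, which is one plausible reason a
radix path only pays off above some input size.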

```
Running ./libcxx/test/benchmarks/stable_sort.bench.out
Run on (12 X 2600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB
  L1 Instruction 32 KiB
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB
Load Average: 3.48, 3.38, 3.08
---------------------------------------------------------------------------
Benchmark                                               After        Before 
---------------------------------------------------------------------------
BM_StableSort_int8_Random_1                           3.39 ns       3.58 ns 
BM_StableSort_int8_Random_4                           21.1 ns       21.9 ns 
BM_StableSort_int8_Random_16                           142 ns        147 ns 
BM_StableSort_int8_Random_64                           893 ns        903 ns 
BM_StableSort_int8_Random_256                          409 ns       5810 ns 
BM_StableSort_int8_Random_1024                        1235 ns      29973 ns 
BM_StableSort_int8_Random_4096                        4410 ns     141880 ns 
BM_StableSort_int8_Random_16384                      18044 ns     620540 ns 
BM_StableSort_int8_Random_65536                     144030 ns    2592013 ns 
BM_StableSort_int8_Random_262144                    858350 ns   10935814 ns 
BM_StableSort_int8_Random_524288                   2929988 ns   27060729 ns 
BM_StableSort_int8_Random_1048576                  6058292 ns   49622720 ns 
BM_StableSort_int8_Ascending_1                        3.42 ns       3.92 ns 
BM_StableSort_int8_Ascending_4                        5.86 ns       8.08 ns 
BM_StableSort_int8_Ascending_16                       10.6 ns       12.0 ns 
BM_StableSort_int8_Ascending_64                       28.9 ns       30.6 ns 
BM_StableSort_int8_Ascending_256                       415 ns        391 ns 
BM_StableSort_int8_Ascending_1024                     1666 ns       2309 ns 
BM_StableSort_int8_Ascending_4096                     7748 ns      12269 ns 
BM_StableSort_int8_Ascending_16384                   40588 ns      60181 ns 
BM_StableSort_int8_Ascending_65536                  178843 ns     298221 ns 
BM_StableSort_int8_Ascending_262144                 919959 ns    1402692 ns 
BM_StableSort_int8_Ascending_524288                2397397 ns    3036984 ns 
BM_StableSort_int8_Ascending_1048576               5080043 ns    7218581 ns 
BM_StableSort_int8_Descending_1                       3.44 ns       3.53 ns 
BM_StableSort_int8_Descending_4                       7.94 ns       8.29 ns 
BM_StableSort_int8_Descending_16                      59.6 ns       57.7 ns 
BM_StableSort_int8_Descending_64                      1051 ns       1027 ns 
BM_StableSort_int8_Descending_256                      422 ns       4718 ns 
BM_StableSort_int8_Descending_1024                    1676 ns      21044 ns 
BM_StableSort_int8_Descending_4096                    7766 ns      64827 ns 
BM_StableSort_int8_Descending_16384                  40230 ns      93981 ns 
BM_StableSort_int8_Descending_65536                 190978 ns     421151 ns 
BM_StableSort_int8_Descending_262144               1055141 ns    1918927 ns 
BM_StableSort_int8_Descending_524288               2875115 ns    3809153 ns 
BM_StableSort_int8_Descending_1048576              5854135 ns    8713690 ns 
BM_StableSort_int8_SingleElement_1                    3.52 ns       3.46 ns 
BM_StableSort_int8_SingleElement_4                    6.25 ns       5.79 ns 
BM_StableSort_int8_SingleElement_16                   10.7 ns       11.4 ns 
BM_StableSort_int8_SingleElement_64                   29.3 ns       30.3 ns 
BM_StableSort_int8_SingleElement_256                   858 ns        380 ns 
BM_StableSort_int8_SingleElement_1024                 3036 ns       2231 ns 
BM_StableSort_int8_SingleElement_4096                11580 ns      11866 ns 
BM_StableSort_int8_SingleElement_16384               44956 ns      59621 ns 
BM_StableSort_int8_SingleElement_65536              182006 ns     297853 ns 
BM_StableSort_int8_SingleElement_262144             962181 ns    1432857 ns 
BM_StableSort_int8_SingleElement_524288            2256687 ns    2975707 ns 
BM_StableSort_int8_SingleElement_1048576           4522556 ns    6949948 ns 
BM_StableSort_int8_PipeOrgan_1                        3.26 ns       3.64 ns 
BM_StableSort_int8_PipeOrgan_4                        6.21 ns       6.58 ns 
BM_StableSort_int8_PipeOrgan_16                       23.7 ns       25.4 ns 
BM_StableSort_int8_PipeOrgan_64                        250 ns        248 ns 
BM_StableSort_int8_PipeOrgan_256                       414 ns       2498 ns 
BM_StableSort_int8_PipeOrgan_1024                     1697 ns      10946 ns 
BM_StableSort_int8_PipeOrgan_4096                     7840 ns      37238 ns 
BM_StableSort_int8_PipeOrgan_16384                   41402 ns      74805 ns 
BM_StableSort_int8_PipeOrgan_65536                  180107 ns     357891 ns 
BM_StableSort_int8_PipeOrgan_262144                 988273 ns    1647296 ns 
BM_StableSort_int8_PipeOrgan_524288                2547374 ns    3245991 ns 
BM_StableSort_int8_PipeOrgan_1048576               5128783 ns    7342444 ns 
BM_StableSort_int8_QuickSortAdversary_1               3.14 ns       4.01 ns 
BM_StableSort_int8_QuickSortAdversary_4               6.05 ns       7.02 ns 
BM_StableSort_int8_QuickSortAdversary_16              10.5 ns       11.9 ns 
BM_StableSort_int8_QuickSortAdversary_64               520 ns        516 ns 
BM_StableSort_int8_QuickSortAdversary_256              920 ns        386 ns 
BM_StableSort_int8_QuickSortAdversary_1024            3083 ns       2299 ns 
BM_StableSort_int8_QuickSortAdversary_4096           11659 ns      12295 ns 
BM_StableSort_int8_QuickSortAdversary_16384          45721 ns      60931 ns 
BM_StableSort_int8_QuickSortAdversary_65536         186334 ns     295423 ns 
BM_StableSort_int8_QuickSortAdversary_262144        946262 ns    1399973 ns 
BM_StableSort_int8_QuickSortAdversary_524288       2282004 ns    2832266 ns 
BM_StableSort_int8_QuickSortAdversary_1048576      4691123 ns    6963253 ns 
BM_StableSort_uint8_Random_1                          3.11 ns       3.44 ns 
BM_StableSort_uint8_Random_4                          21.9 ns       23.1 ns 
BM_StableSort_uint8_Random_16                          154 ns        171 ns 
BM_StableSort_uint8_Random_64                         1000 ns       1051 ns 
BM_StableSort_uint8_Random_256                         402 ns       6498 ns 
BM_StableSort_uint8_Random_1024                       1176 ns      35310 ns 
BM_StableSort_uint8_Random_4096                       4415 ns     164087 ns 
BM_StableSort_uint8_Random_16384                     17849 ns     686769 ns 
BM_StableSort_uint8_Random_65536                    146109 ns    2932051 ns 
BM_StableSort_uint8_Random_262144                   876710 ns   12163988 ns 
BM_StableSort_uint8_Random_524288                  2858089 ns   26458830 ns 
BM_StableSort_uint8_Random_1048576                 5766942 ns   54836214 ns 
BM_StableSort_uint8_Ascending_1                       3.11 ns       3.43 ns 
BM_StableSort_uint8_Ascending_4                       6.18 ns       7.24 ns 
BM_StableSort_uint8_Ascending_16                      14.5 ns       17.0 ns 
BM_StableSort_uint8_Ascending_64                      50.7 ns       59.2 ns 
BM_StableSort_uint8_Ascending_256                      395 ns        536 ns 
BM_StableSort_uint8_Ascending_1024                    1752 ns       2956 ns 
BM_StableSort_uint8_Ascending_4096                    7785 ns      15146 ns 
BM_StableSort_uint8_Ascending_16384                  41442 ns      74136 ns 
BM_StableSort_uint8_Ascending_65536                 180879 ns     354261 ns 
BM_StableSort_uint8_Ascending_262144                945880 ns    1674256 ns 
BM_StableSort_uint8_Ascending_524288               2287832 ns    3138581 ns 
BM_StableSort_uint8_Ascending_1048576              4630290 ns    7296278 ns 
BM_StableSort_uint8_Descending_1                      3.19 ns       3.63 ns 
BM_StableSort_uint8_Descending_4                      9.60 ns       11.5 ns 
BM_StableSort_uint8_Descending_16                     78.3 ns       86.0 ns 
BM_StableSort_uint8_Descending_64                     1265 ns       1308 ns 
BM_StableSort_uint8_Descending_256                     395 ns       6556 ns 
BM_StableSort_uint8_Descending_1024                   1712 ns      24669 ns 
BM_StableSort_uint8_Descending_4096                   7748 ns      83407 ns 
BM_StableSort_uint8_Descending_16384                 40779 ns     104043 ns 
BM_StableSort_uint8_Descending_65536                181560 ns     467680 ns 
BM_StableSort_uint8_Descending_262144              1146627 ns    2102769 ns 
BM_StableSort_uint8_Descending_524288              2874096 ns    4572229 ns 
BM_StableSort_uint8_Descending_1048576             5873195 ns   10170663 ns 
BM_StableSort_uint8_SingleElement_1                   3.28 ns       3.58 ns 
BM_StableSort_uint8_SingleElement_4                   6.44 ns       7.40 ns 
BM_StableSort_uint8_SingleElement_16                  14.9 ns       16.4 ns 
BM_StableSort_uint8_SingleElement_64                  51.2 ns       52.9 ns 
BM_StableSort_uint8_SingleElement_256                  876 ns        490 ns 
BM_StableSort_uint8_SingleElement_1024                3041 ns       2750 ns 
BM_StableSort_uint8_SingleElement_4096               11947 ns      14326 ns 
BM_StableSort_uint8_SingleElement_16384              46669 ns      69984 ns 
BM_StableSort_uint8_SingleElement_65536             197903 ns     328961 ns 
BM_StableSort_uint8_SingleElement_262144           1031466 ns    1551436 ns 
BM_StableSort_uint8_SingleElement_524288           2447672 ns    3049553 ns 
BM_StableSort_uint8_SingleElement_1048576          4793087 ns    7615245 ns 
BM_StableSort_uint8_PipeOrgan_1                       3.38 ns       3.56 ns 
BM_StableSort_uint8_PipeOrgan_4                       7.16 ns       8.70 ns 
BM_StableSort_uint8_PipeOrgan_16                      31.7 ns       35.3 ns 
BM_StableSort_uint8_PipeOrgan_64                       326 ns        366 ns 
BM_StableSort_uint8_PipeOrgan_256                      409 ns       2942 ns 
BM_StableSort_uint8_PipeOrgan_1024                    1994 ns      12571 ns 
BM_StableSort_uint8_PipeOrgan_4096                    8086 ns      46278 ns 
BM_StableSort_uint8_PipeOrgan_16384                  41749 ns      79813 ns 
BM_StableSort_uint8_PipeOrgan_65536                 180697 ns     375120 ns 
BM_StableSort_uint8_PipeOrgan_262144               1004899 ns    1676143 ns 
BM_StableSort_uint8_PipeOrgan_524288               2456081 ns    3333949 ns 
BM_StableSort_uint8_PipeOrgan_1048576              5030857 ns    7591303 ns 
BM_StableSort_uint8_QuickSortAdversary_1              3.12 ns       3.46 ns 
BM_StableSort_uint8_QuickSortAdversary_4              7.25 ns       6.83 ns 
BM_StableSort_uint8_QuickSortAdversary_16             14.6 ns       16.2 ns 
BM_StableSort_uint8_QuickSortAdversary_64              650 ns        665 ns 
BM_StableSort_uint8_QuickSortAdversary_256             395 ns       2982 ns 
BM_StableSort_uint8_QuickSortAdversary_1024           3125 ns       2583 ns 
BM_StableSort_uint8_QuickSortAdversary_4096          11797 ns      13929 ns 
BM_StableSort_uint8_QuickSortAdversary_16384         45803 ns      66513 ns 
BM_StableSort_uint8_QuickSortAdversary_65536        190745 ns     313467 ns 
BM_StableSort_uint8_QuickSortAdversary_262144       974646 ns    1469014 ns 
BM_StableSort_uint8_QuickSortAdversary_524288      2317553 ns    3022065 ns 
BM_StableSort_uint8_QuickSortAdversary_1048576     4898703 ns    6854079 ns 
BM_StableSort_int16_Random_1                          3.94 ns       3.49 ns 
BM_StableSort_int16_Random_4                          20.8 ns       23.2 ns 
BM_StableSort_int16_Random_16                          133 ns        163 ns 
BM_StableSort_int16_Random_64                          903 ns        953 ns 
BM_StableSort_int16_Random_256                        5638 ns       6258 ns 
BM_StableSort_int16_Random_1024                       3056 ns      34587 ns 
BM_StableSort_int16_Random_4096                      10596 ns     168397 ns 
BM_StableSort_int16_Random_16384                     49908 ns     753031 ns 
BM_StableSort_int16_Random_65536                    444605 ns    3838368 ns 
BM_StableSort_int16_Random_262144                  2419345 ns   15657285 ns 
BM_StableSort_int16_Random_524288                  7984040 ns   32726933 ns 
BM_StableSort_int16_Random_1048576                16092424 ns   67999766 ns 
BM_StableSort_int16_Ascending_1                       3.40 ns       3.43 ns 
BM_StableSort_int16_Ascending_4                       5.45 ns       5.79 ns 
BM_StableSort_int16_Ascending_16                      12.0 ns       15.3 ns 
BM_StableSort_int16_Ascending_64                      39.6 ns       52.6 ns 
BM_StableSort_int16_Ascending_256                      470 ns        550 ns 
BM_StableSort_int16_Ascending_1024                    1686 ns       2707 ns 
BM_StableSort_int16_Ascending_4096                    5676 ns      14165 ns 
BM_StableSort_int16_Ascending_16384                  21413 ns      69483 ns 
BM_StableSort_int16_Ascending_65536                  88010 ns     334466 ns 
BM_StableSort_int16_Ascending_262144                567239 ns    1570620 ns 
BM_StableSort_int16_Ascending_524288               1553063 ns    3424666 ns 
BM_StableSort_int16_Ascending_1048576              3145577 ns    8499649 ns 
BM_StableSort_int16_Descending_1                      3.22 ns       3.54 ns 
BM_StableSort_int16_Descending_4                      6.85 ns       10.2 ns 
BM_StableSort_int16_Descending_16                     62.7 ns       62.2 ns 
BM_StableSort_int16_Descending_64                     1138 ns       1036 ns 
BM_StableSort_int16_Descending_256                    5541 ns       4696 ns 
BM_StableSort_int16_Descending_1024                   3046 ns      19577 ns 
BM_StableSort_int16_Descending_4096                  10962 ns      79149 ns 
BM_StableSort_int16_Descending_16384                 58182 ns     327709 ns 
BM_StableSort_int16_Descending_65536                447025 ns    1424896 ns 
BM_StableSort_int16_Descending_262144              1104973 ns    5921903 ns 
BM_StableSort_int16_Descending_524288              2547840 ns   17956789 ns 
BM_StableSort_int16_Descending_1048576             5093555 ns   17044318 ns 
BM_StableSort_int16_SingleElement_1                   3.56 ns       3.96 ns 
BM_StableSort_int16_SingleElement_4                   5.75 ns       6.72 ns 
BM_StableSort_int16_SingleElement_16                  12.4 ns       16.1 ns 
BM_StableSort_int16_SingleElement_64                  36.9 ns       54.4 ns 
BM_StableSort_int16_SingleElement_256                  473 ns        557 ns 
BM_StableSort_int16_SingleElement_1024                1828 ns       2826 ns 
BM_StableSort_int16_SingleElement_4096                6239 ns      14252 ns 
BM_StableSort_int16_SingleElement_16384              23695 ns      70369 ns 
BM_StableSort_int16_SingleElement_65536              93281 ns     361641 ns 
BM_StableSort_int16_SingleElement_262144            599078 ns    1640216 ns 
BM_StableSort_int16_SingleElement_524288           1659678 ns    3343087 ns 
BM_StableSort_int16_SingleElement_1048576          3184033 ns    7770271 ns 
BM_StableSort_int16_PipeOrgan_1                       3.75 ns       3.76 ns 
BM_StableSort_int16_PipeOrgan_4                       5.94 ns       7.74 ns 
BM_StableSort_int16_PipeOrgan_16                      26.7 ns       25.9 ns 
BM_StableSort_int16_PipeOrgan_64                       300 ns        263 ns 
BM_StableSort_int16_PipeOrgan_256                     2769 ns       2760 ns 
BM_StableSort_int16_PipeOrgan_1024                    2996 ns      10544 ns 
BM_StableSort_int16_PipeOrgan_4096                   11641 ns      44750 ns 
BM_StableSort_int16_PipeOrgan_16384                  57224 ns     200464 ns 
BM_StableSort_int16_PipeOrgan_65536                 416873 ns     887631 ns 
BM_StableSort_int16_PipeOrgan_262144                843264 ns    3588669 ns 
BM_StableSort_int16_PipeOrgan_524288               2027741 ns   11056924 ns 
BM_StableSort_int16_PipeOrgan_1048576              4223773 ns   13261276 ns 
BM_StableSort_int16_QuickSortAdversary_1              3.83 ns       3.68 ns 
BM_StableSort_int16_QuickSortAdversary_4              5.55 ns       6.93 ns 
BM_StableSort_int16_QuickSortAdversary_16             12.3 ns       15.2 ns 
BM_StableSort_int16_QuickSortAdversary_64              646 ns        632 ns 
BM_StableSort_int16_QuickSortAdversary_256            2751 ns       2542 ns 
BM_StableSort_int16_QuickSortAdversary_1024           3028 ns      16901 ns 
BM_StableSort_int16_QuickSortAdversary_4096          10862 ns      80222 ns 
BM_StableSort_int16_QuickSortAdversary_16384         57753 ns     317281 ns 
BM_StableSort_int16_QuickSortAdversary_65536         94064 ns     328502 ns 
BM_StableSort_int16_QuickSortAdversary_262144       557796 ns    1613208 ns 
BM_StableSort_int16_QuickSortAdversary_524288      1518451 ns    3479740 ns 
BM_StableSort_int16_QuickSortAdversary_1048576     3165129 ns    7655880 ns 
BM_StableSort_uint16_Random_1                         3.26 ns       3.44 ns 
BM_StableSort_uint16_Random_4                         21.1 ns       22.2 ns 
BM_StableSort_uint16_Random_16                         157 ns        156 ns 
BM_StableSort_uint16_Random_64                         955 ns        947 ns 
BM_StableSort_uint16_Random_256                       5886 ns       6097 ns 
BM_StableSort_uint16_Random_1024                      2787 ns      30776 ns 
BM_StableSort_uint16_Random_4096                      9973 ns     155652 ns 
BM_StableSort_uint16_Random_16384                    48628 ns     741072 ns 
BM_StableSort_uint16_Random_65536                   439609 ns    3478966 ns 
BM_StableSort_uint16_Random_262144                 2336983 ns   15197642 ns 
BM_StableSort_uint16_Random_524288                 7888701 ns   34234254 ns 
BM_StableSort_uint16_Random_1048576               14865180 ns   68516386 ns 
BM_StableSort_uint16_Ascending_1                      3.33 ns       4.00 ns 
BM_StableSort_uint16_Ascending_4                      5.79 ns       6.64 ns 
BM_StableSort_uint16_Ascending_16                     14.9 ns       15.5 ns 
BM_StableSort_uint16_Ascending_64                     50.2 ns       52.5 ns 
BM_StableSort_uint16_Ascending_256                     538 ns        546 ns 
BM_StableSort_uint16_Ascending_1024                   1645 ns       2652 ns 
BM_StableSort_uint16_Ascending_4096                   5559 ns      14517 ns 
BM_StableSort_uint16_Ascending_16384                 22803 ns      70275 ns 
BM_StableSort_uint16_Ascending_65536                 83109 ns     333446 ns 
BM_StableSort_uint16_Ascending_262144               562667 ns    1568670 ns 
BM_StableSort_uint16_Ascending_524288              1564646 ns    3059839 ns 
BM_StableSort_uint16_Ascending_1048576             3178826 ns    7048327 ns 
BM_StableSort_uint16_Descending_1                     3.34 ns       3.93 ns 
BM_StableSort_uint16_Descending_4                     8.75 ns       9.73 ns 
BM_StableSort_uint16_Descending_16                    55.9 ns       55.5 ns 
BM_StableSort_uint16_Descending_64                    1021 ns       1035 ns 
BM_StableSort_uint16_Descending_256                   4752 ns       4931 ns 
BM_StableSort_uint16_Descending_1024                  2982 ns      19727 ns 
BM_StableSort_uint16_Descending_4096                 10432 ns      83165 ns 
BM_StableSort_uint16_Descending_16384                56593 ns     326131 ns 
BM_StableSort_uint16_Descending_65536               439134 ns    1371346 ns 
BM_StableSort_uint16_Descending_262144             1220925 ns    5735665 ns 
BM_StableSort_uint16_Descending_524288             2767234 ns   16758330 ns 
BM_StableSort_uint16_Descending_1048576            5673769 ns   17541715 ns 
BM_StableSort_uint16_SingleElement_1                  3.53 ns       3.73 ns 
BM_StableSort_uint16_SingleElement_4                  6.27 ns       5.81 ns 
BM_StableSort_uint16_SingleElement_16                 14.8 ns       15.1 ns 
BM_StableSort_uint16_SingleElement_64                 51.5 ns       50.9 ns 
BM_StableSort_uint16_SingleElement_256                 536 ns        540 ns 
BM_StableSort_uint16_SingleElement_1024               1669 ns       2690 ns 
BM_StableSort_uint16_SingleElement_4096               5840 ns      14230 ns 
BM_StableSort_uint16_SingleElement_16384             22468 ns      68524 ns 
BM_StableSort_uint16_SingleElement_65536             89845 ns     332187 ns 
BM_StableSort_uint16_SingleElement_262144           590736 ns    1550868 ns 
BM_StableSort_uint16_SingleElement_524288          1573677 ns    3095703 ns 
BM_StableSort_uint16_SingleElement_1048576         3183421 ns    8251180 ns 
BM_StableSort_uint16_PipeOrgan_1                      3.70 ns       3.64 ns 
BM_StableSort_uint16_PipeOrgan_4                      7.01 ns       6.81 ns 
BM_StableSort_uint16_PipeOrgan_16                     25.7 ns       26.4 ns 
BM_StableSort_uint16_PipeOrgan_64                      283 ns        277 ns 
BM_StableSort_uint16_PipeOrgan_256                    2562 ns       2852 ns 
BM_StableSort_uint16_PipeOrgan_1024                   2863 ns      10892 ns 
BM_StableSort_uint16_PipeOrgan_4096                  10585 ns      45668 ns 
BM_StableSort_uint16_PipeOrgan_16384                 59151 ns     194358 ns 
BM_StableSort_uint16_PipeOrgan_65536                508579 ns     854692 ns 
BM_StableSort_uint16_PipeOrgan_262144               901294 ns    3606346 ns 
BM_StableSort_uint16_PipeOrgan_524288              2192498 ns   10449279 ns 
BM_StableSort_uint16_PipeOrgan_1048576             4204368 ns   11956606 ns 
BM_StableSort_uint16_QuickSortAdversary_1             3.20 ns       3.63 ns 
BM_StableSort_uint16_QuickSortAdversary_4             5.30 ns       6.38 ns 
BM_StableSort_uint16_QuickSortAdversary_16            14.5 ns       15.3 ns 
BM_StableSort_uint16_QuickSortAdversary_64             575 ns        611 ns 
BM_StableSort_uint16_QuickSortAdversary_256           2423 ns       2577 ns 
BM_StableSort_uint16_QuickSortAdversary_1024          2794 ns      16854 ns 
BM_StableSort_uint16_QuickSortAdversary_4096         10511 ns      75952 ns 
BM_StableSort_uint16_QuickSortAdversary_16384        56214 ns     333824 ns 
BM_StableSort_uint16_QuickSortAdversary_65536       422512 ns    1354867 ns 
BM_StableSort_uint16_QuickSortAdversary_262144      583301 ns    1564443 ns 
BM_StableSort_uint16_QuickSortAdversary_524288     1584319 ns    3265575 ns 
BM_StableSort_uint16_QuickSortAdversary_1048576    3197732 ns    7945245 ns 
BM_StableSort_int32_Random_1                          3.81 ns       3.70 ns 
BM_StableSort_int32_Random_4                          20.8 ns       23.4 ns 
BM_StableSort_int32_Random_16                          134 ns        161 ns 
BM_StableSort_int32_Random_64                          895 ns        984 ns 
BM_StableSort_int32_Random_256                        5640 ns       5897 ns 
BM_StableSort_int32_Random_1024                       6994 ns      32118 ns 
BM_StableSort_int32_Random_4096                      27367 ns     168960 ns 
BM_StableSort_int32_Random_16384                    183261 ns     843240 ns 
BM_StableSort_int32_Random_65536                    950914 ns    3953588 ns 
BM_StableSort_int32_Random_262144                  3673311 ns   16790171 ns 
BM_StableSort_int32_Random_524288                 11515700 ns   36023098 ns 
BM_StableSort_int32_Random_1048576                24492515 ns   78116028 ns 
BM_StableSort_int32_Ascending_1                       3.31 ns       4.48 ns 
BM_StableSort_int32_Ascending_4                       5.96 ns       6.99 ns 
BM_StableSort_int32_Ascending_16                      13.0 ns       16.0 ns 
BM_StableSort_int32_Ascending_64                      36.7 ns       53.0 ns 
BM_StableSort_int32_Ascending_256                      391 ns        471 ns 
BM_StableSort_int32_Ascending_1024                    2705 ns       2682 ns 
BM_StableSort_int32_Ascending_4096                    8773 ns      14231 ns 
BM_StableSort_int32_Ascending_16384                  34709 ns      70625 ns 
BM_StableSort_int32_Ascending_65536                 142907 ns     344482 ns 
BM_StableSort_int32_Ascending_262144                745483 ns    1591418 ns 
BM_StableSort_int32_Ascending_524288               1873701 ns    3190305 ns 
BM_StableSort_int32_Ascending_1048576              3851590 ns    7570095 ns 
BM_StableSort_int32_Descending_1                      3.22 ns       4.23 ns 
BM_StableSort_int32_Descending_4                      7.58 ns       11.2 ns 
BM_StableSort_int32_Descending_16                     63.9 ns       58.6 ns 
BM_StableSort_int32_Descending_64                     1133 ns       1017 ns 
BM_StableSort_int32_Descending_256                    4850 ns       4464 ns 
BM_StableSort_int32_Descending_1024                   7023 ns      18954 ns 
BM_StableSort_int32_Descending_4096                  28550 ns      75163 ns 
BM_StableSort_int32_Descending_16384                200880 ns     341104 ns 
BM_StableSort_int32_Descending_65536               1095910 ns    1398021 ns 
BM_StableSort_int32_Descending_262144              3818864 ns    5695486 ns 
BM_StableSort_int32_Descending_524288              5606779 ns   17593982 ns 
BM_StableSort_int32_Descending_1048576            16416366 ns   26649503 ns 
BM_StableSort_int32_SingleElement_1                   3.81 ns       3.71 ns 
BM_StableSort_int32_SingleElement_4                   6.57 ns       6.61 ns 
BM_StableSort_int32_SingleElement_16                  14.0 ns       15.8 ns 
BM_StableSort_int32_SingleElement_64                  38.7 ns       53.5 ns 
BM_StableSort_int32_SingleElement_256                  386 ns        554 ns 
BM_StableSort_int32_SingleElement_1024                2761 ns       3046 ns 
BM_StableSort_int32_SingleElement_4096                9179 ns      15188 ns 
BM_StableSort_int32_SingleElement_16384              34794 ns      70119 ns 
BM_StableSort_int32_SingleElement_65536             135190 ns     354755 ns 
BM_StableSort_int32_SingleElement_262144            760995 ns    1644072 ns 
BM_StableSort_int32_SingleElement_524288           1969575 ns    3343419 ns 
BM_StableSort_int32_SingleElement_1048576          4423816 ns    8346971 ns 
BM_StableSort_int32_PipeOrgan_1                       3.79 ns       3.63 ns 
BM_StableSort_int32_PipeOrgan_4                       6.21 ns       6.73 ns 
BM_StableSort_int32_PipeOrgan_16                      27.5 ns       26.0 ns 
BM_StableSort_int32_PipeOrgan_64                       291 ns        265 ns 
BM_StableSort_int32_PipeOrgan_256                     2557 ns       2518 ns 
BM_StableSort_int32_PipeOrgan_1024                    6765 ns      10976 ns 
BM_StableSort_int32_PipeOrgan_4096                   26373 ns      44537 ns 
BM_StableSort_int32_PipeOrgan_16384                 201466 ns     188582 ns 
BM_StableSort_int32_PipeOrgan_65536                1148533 ns     802368 ns 
BM_StableSort_int32_PipeOrgan_262144               2255177 ns    3477829 ns 
BM_StableSort_int32_PipeOrgan_524288               3947015 ns   10356637 ns 
BM_StableSort_int32_PipeOrgan_1048576             10274312 ns   16405366 ns 
BM_StableSort_int32_QuickSortAdversary_1              3.32 ns       4.36 ns 
BM_StableSort_int32_QuickSortAdversary_4              5.98 ns       7.44 ns 
BM_StableSort_int32_QuickSortAdversary_16             13.0 ns       16.3 ns 
BM_StableSort_int32_QuickSortAdversary_64              657 ns        616 ns 
BM_StableSort_int32_QuickSortAdversary_256            2569 ns       2483 ns 
BM_StableSort_int32_QuickSortAdversary_1024           6898 ns      19635 ns 
BM_StableSort_int32_QuickSortAdversary_4096          27092 ns      75108 ns 
BM_StableSort_int32_QuickSortAdversary_16384        190379 ns     316463 ns 
BM_StableSort_int32_QuickSortAdversary_65536       1109040 ns    1319018 ns 
BM_StableSort_int32_QuickSortAdversary_262144      4361925 ns    5472779 ns 
BM_StableSort_int32_QuickSortAdversary_524288      6528215 ns   17538983 ns 
BM_StableSort_int32_QuickSortAdversary_1048576    18345325 ns   27223926 ns 
BM_StableSort_uint32_Random_1                         3.67 ns       3.82 ns 
BM_StableSort_uint32_Random_4                         22.3 ns       21.8 ns 
BM_StableSort_uint32_Random_16                         155 ns        153 ns 
BM_StableSort_uint32_Random_64                         946 ns        976 ns 
BM_StableSort_uint32_Random_256                       5824 ns       6019 ns 
BM_StableSort_uint32_Random_1024                      4525 ns      32764 ns 
BM_StableSort_uint32_Random_4096                     17223 ns     158608 ns 
BM_StableSort_uint32_Random_16384                   134821 ns     748525 ns 
BM_StableSort_uint32_Random_65536                   716644 ns    3453325 ns 
BM_StableSort_uint32_Random_262144                 3628062 ns   16065414 ns 
BM_StableSort_uint32_Random_524288                10971334 ns   36567712 ns 
BM_StableSort_uint32_Random_1048576               22688377 ns   77533497 ns 
BM_StableSort_uint32_Ascending_1                      3.57 ns       3.44 ns 
BM_StableSort_uint32_Ascending_4                      5.73 ns       5.33 ns 
BM_StableSort_uint32_Ascending_16                     14.5 ns       14.0 ns 
BM_StableSort_uint32_Ascending_64                     50.3 ns       51.3 ns 
BM_StableSort_uint32_Ascending_256                     465 ns        467 ns 
BM_StableSort_uint32_Ascending_1024                   3042 ns       2530 ns 
BM_StableSort_uint32_Ascending_4096                   9842 ns      12207 ns 
BM_StableSort_uint32_Ascending_16384                 37994 ns      61726 ns 
BM_StableSort_uint32_Ascending_65536                148890 ns     294385 ns 
BM_StableSort_uint32_Ascending_262144               855080 ns    1422167 ns 
BM_StableSort_uint32_Ascending_524288              2154903 ns    3203018 ns 
BM_StableSort_uint32_Ascending_1048576             5002518 ns    7563817 ns 
BM_StableSort_uint32_Descending_1                     3.51 ns       3.40 ns 
BM_StableSort_uint32_Descending_4                     9.09 ns       7.95 ns 
BM_StableSort_uint32_Descending_16                    54.8 ns       74.4 ns 
BM_StableSort_uint32_Descending_64                    1003 ns       1305 ns 
BM_StableSort_uint32_Descending_256                   4545 ns       5300 ns 
BM_StableSort_uint32_Descending_1024                  4361 ns      21884 ns 
BM_StableSort_uint32_Descending_4096                 16018 ns      90534 ns 
BM_StableSort_uint32_Descending_16384               146274 ns     381943 ns 
BM_StableSort_uint32_Descending_65536               938248 ns    1536806 ns 
BM_StableSort_uint32_Descending_262144             3899300 ns    6387843 ns 
BM_StableSort_uint32_Descending_524288             5808157 ns   21959858 ns 
BM_StableSort_uint32_Descending_1048576           17520047 ns   26351912 ns 
BM_StableSort_uint32_SingleElement_1                  4.03 ns       3.97 ns 
BM_StableSort_uint32_SingleElement_4                  6.55 ns       6.41 ns 
BM_StableSort_uint32_SingleElement_16                 15.6 ns       15.8 ns 
BM_StableSort_uint32_SingleElement_64                 52.3 ns       58.7 ns 
BM_StableSort_uint32_SingleElement_256                 473 ns        485 ns 
BM_StableSort_uint32_SingleElement_1024               3020 ns       2407 ns 
BM_StableSort_uint32_SingleElement_4096               9998 ns      12527 ns 
BM_StableSort_uint32_SingleElement_16384             38072 ns      62228 ns 
BM_StableSort_uint32_SingleElement_65536            153706 ns     295662 ns 
BM_StableSort_uint32_SingleElement_262144           836532 ns    1477099 ns 
BM_StableSort_uint32_SingleElement_524288          2144900 ns    3157204 ns 
BM_StableSort_uint32_SingleElement_1048576         4995525 ns    7617233 ns 
BM_StableSort_uint32_PipeOrgan_1                      4.02 ns       3.99 ns 
BM_StableSort_uint32_PipeOrgan_4                      6.97 ns       6.84 ns 
BM_StableSort_uint32_PipeOrgan_16                     26.1 ns       29.7 ns 
BM_StableSort_uint32_PipeOrgan_64                      266 ns        333 ns 
BM_StableSort_uint32_PipeOrgan_256                    2462 ns       2892 ns 
BM_StableSort_uint32_PipeOrgan_1024                   4291 ns      12431 ns 
BM_StableSort_uint32_PipeOrgan_4096                  15638 ns      51449 ns 
BM_StableSort_uint32_PipeOrgan_16384                154563 ns     217460 ns 
BM_StableSort_uint32_PipeOrgan_65536                907724 ns     925873 ns 
BM_StableSort_uint32_PipeOrgan_262144              2394580 ns    4103575 ns 
BM_StableSort_uint32_PipeOrgan_524288              4177145 ns   13947158 ns 
BM_StableSort_uint32_PipeOrgan_1048576            11848224 ns   18807297 ns 
BM_StableSort_uint32_QuickSortAdversary_1             3.50 ns       3.43 ns 
BM_StableSort_uint32_QuickSortAdversary_4             5.88 ns       4.96 ns 
BM_StableSort_uint32_QuickSortAdversary_16            14.6 ns       14.0 ns 
BM_StableSort_uint32_QuickSortAdversary_64             576 ns        715 ns 
BM_StableSort_uint32_QuickSortAdversary_256           2353 ns       2797 ns 
BM_StableSort_uint32_QuickSortAdversary_1024          4176 ns      21775 ns 
BM_StableSort_uint32_QuickSortAdversary_4096         15565 ns      96188 ns 
BM_StableSort_uint32_QuickSortAdversary_16384       149092 ns     398332 ns 
BM_StableSort_uint32_QuickSortAdversary_65536       902488 ns    1552393 ns 
BM_StableSort_uint32_QuickSortAdversary_262144     3946517 ns    6560414 ns 
BM_StableSort_uint32_QuickSortAdversary_524288     6247114 ns   22420977 ns 
BM_StableSort_uint32_QuickSortAdversary_1048576   19892446 ns   26529576 ns 
BM_StableSort_int64_Random_1                          3.83 ns       3.98 ns 
BM_StableSort_int64_Random_4                          21.1 ns       24.0 ns 
BM_StableSort_int64_Random_16                          129 ns        136 ns 
BM_StableSort_int64_Random_64                          890 ns        906 ns 
BM_StableSort_int64_Random_256                        5542 ns       5901 ns 
BM_StableSort_int64_Random_1024                      16085 ns      33112 ns 
BM_StableSort_int64_Random_4096                      63895 ns     162181 ns 
BM_StableSort_int64_Random_16384                    348827 ns     790045 ns 
BM_StableSort_int64_Random_65536                   1488237 ns    3557506 ns 
BM_StableSort_int64_Random_262144                  8195713 ns   16315808 ns 
BM_StableSort_int64_Random_524288                 16586833 ns   38274075 ns 
BM_StableSort_int64_Random_1048576                40346644 ns   79182089 ns 
BM_StableSort_int64_Ascending_1                       3.76 ns       3.55 ns 
BM_StableSort_int64_Ascending_4                       5.82 ns       6.19 ns 
BM_StableSort_int64_Ascending_16                      11.7 ns       11.8 ns 
BM_StableSort_int64_Ascending_64                      32.9 ns       36.8 ns 
BM_StableSort_int64_Ascending_256                      415 ns        550 ns 
BM_StableSort_int64_Ascending_1024                    5352 ns       3347 ns 
BM_StableSort_int64_Ascending_4096                   17516 ns      19134 ns 
BM_StableSort_int64_Ascending_16384                  64147 ns      91099 ns 
BM_StableSort_int64_Ascending_65536                 322126 ns     434009 ns 
BM_StableSort_int64_Ascending_262144               1554669 ns    2057056 ns 
BM_StableSort_int64_Ascending_524288               3656527 ns    5016650 ns 
BM_StableSort_int64_Ascending_1048576             10469979 ns   12908613 ns 
BM_StableSort_int64_Descending_1                      4.09 ns       3.35 ns 
BM_StableSort_int64_Descending_4                      9.13 ns       8.01 ns 
BM_StableSort_int64_Descending_16                     76.8 ns       92.9 ns 
BM_StableSort_int64_Descending_64                     1336 ns       1417 ns 
BM_StableSort_int64_Descending_256                    5525 ns       5674 ns 
BM_StableSort_int64_Descending_1024                  17461 ns      22558 ns 
BM_StableSort_int64_Descending_4096                  64285 ns     102360 ns 
BM_StableSort_int64_Descending_16384                336946 ns     388940 ns 
BM_StableSort_int64_Descending_65536                837912 ns    1662169 ns 
BM_StableSort_int64_Descending_262144              3680806 ns    7494323 ns 
BM_StableSort_int64_Descending_524288             11023784 ns   24935033 ns 
BM_StableSort_int64_Descending_1048576            20023568 ns   33220712 ns 
BM_StableSort_int64_SingleElement_1                   3.37 ns       3.98 ns 
BM_StableSort_int64_SingleElement_4                   5.32 ns       6.92 ns 
BM_StableSort_int64_SingleElement_16                  10.9 ns       13.3 ns 
BM_StableSort_int64_SingleElement_64                  32.1 ns       43.8 ns 
BM_StableSort_int64_SingleElement_256                  420 ns        541 ns 
BM_StableSort_int64_SingleElement_1024                5689 ns       3381 ns 
BM_StableSort_int64_SingleElement_4096               19199 ns      17989 ns 
BM_StableSort_int64_SingleElement_16384              75754 ns      91963 ns 
BM_StableSort_int64_SingleElement_65536             357106 ns     500326 ns 
BM_StableSort_int64_SingleElement_262144           1672975 ns    2417734 ns 
BM_StableSort_int64_SingleElement_524288           3642891 ns    5200878 ns 
BM_StableSort_int64_SingleElement_1048576         11172007 ns   13729511 ns 
BM_StableSort_int64_PipeOrgan_1                       3.38 ns       3.94 ns 
BM_StableSort_int64_PipeOrgan_4                       5.73 ns       6.44 ns 
BM_StableSort_int64_PipeOrgan_16                      27.5 ns       29.0 ns 
BM_StableSort_int64_PipeOrgan_64                       310 ns        321 ns 
BM_StableSort_int64_PipeOrgan_256                     2761 ns       2918 ns 
BM_StableSort_int64_PipeOrgan_1024                   16105 ns      12525 ns 
BM_StableSort_int64_PipeOrgan_4096                   65289 ns      59990 ns 
BM_StableSort_int64_PipeOrgan_16384                 341757 ns     270636 ns 
BM_StableSort_int64_PipeOrgan_65536                 587452 ns    1126132 ns 
BM_StableSort_int64_PipeOrgan_262144               2837955 ns    5034180 ns 
BM_StableSort_int64_PipeOrgan_524288               6617313 ns   15267354 ns 
BM_StableSort_int64_PipeOrgan_1048576             15208796 ns   23162989 ns 
BM_StableSort_int64_QuickSortAdversary_1              3.77 ns       3.45 ns 
BM_StableSort_int64_QuickSortAdversary_4              5.55 ns       5.20 ns 
BM_StableSort_int64_QuickSortAdversary_16             12.5 ns       11.5 ns 
BM_StableSort_int64_QuickSortAdversary_64              646 ns        750 ns 
BM_StableSort_int64_QuickSortAdversary_256            2655 ns       3539 ns 
BM_StableSort_int64_QuickSortAdversary_1024          16373 ns      22349 ns 
BM_StableSort_int64_QuickSortAdversary_4096          62306 ns      97248 ns 
BM_StableSort_int64_QuickSortAdversary_16384        321755 ns     388084 ns 
BM_StableSort_int64_QuickSortAdversary_65536       1374694 ns    1596091 ns 
BM_StableSort_int64_QuickSortAdversary_262144      4374661 ns    6894139 ns 
BM_StableSort_int64_QuickSortAdversary_524288     12736074 ns   23932229 ns 
BM_StableSort_int64_QuickSortAdversary_1048576    22615219 ns   33355629 ns 
BM_StableSort_uint64_Random_1                         3.82 ns       3.49 ns 
BM_StableSort_uint64_Random_4                         22.4 ns       23.4 ns 
BM_StableSort_uint64_Random_16                         154 ns        146 ns 
BM_StableSort_uint64_Random_64                         924 ns        926 ns 
BM_StableSort_uint64_Random_256                       5864 ns       5913 ns 
BM_StableSort_uint64_Random_1024                      7168 ns      31746 ns 
BM_StableSort_uint64_Random_4096                     27668 ns     154224 ns 
BM_StableSort_uint64_Random_16384                   219526 ns     755205 ns 
BM_StableSort_uint64_Random_65536                   965251 ns    3490165 ns 
BM_StableSort_uint64_Random_262144                 6262162 ns   15889589 ns 
BM_StableSort_uint64_Random_524288                12530078 ns   36458581 ns 
BM_StableSort_uint64_Random_1048576               38462191 ns   75168445 ns 
BM_StableSort_uint64_Ascending_1                      3.30 ns       3.35 ns 
BM_StableSort_uint64_Ascending_4                      5.65 ns       5.84 ns 
BM_StableSort_uint64_Ascending_16                     14.7 ns       12.6 ns 
BM_StableSort_uint64_Ascending_64                     55.3 ns       34.6 ns 
BM_StableSort_uint64_Ascending_256                     513 ns        533 ns 
BM_StableSort_uint64_Ascending_1024                   5541 ns       3189 ns 
BM_StableSort_uint64_Ascending_4096                  17706 ns      20326 ns 
BM_StableSort_uint64_Ascending_16384                 66420 ns      93757 ns 
BM_StableSort_uint64_Ascending_65536                341425 ns     435016 ns 
BM_StableSort_uint64_Ascending_262144              1595691 ns    2088317 ns 
BM_StableSort_uint64_Ascending_524288              3808703 ns    5092832 ns 
BM_StableSort_uint64_Ascending_1048576            11060417 ns   13023250 ns 
BM_StableSort_uint64_Descending_1                     3.29 ns       3.35 ns 
BM_StableSort_uint64_Descending_4                     8.65 ns       7.92 ns 
BM_StableSort_uint64_Descending_16                    54.7 ns       80.2 ns 
BM_StableSort_uint64_Descending_64                    1028 ns       1307 ns 
BM_StableSort_uint64_Descending_256                   4521 ns       5635 ns 
BM_StableSort_uint64_Descending_1024                  7122 ns      23323 ns 
BM_StableSort_uint64_Descending_4096                 30538 ns      95892 ns 
BM_StableSort_uint64_Descending_16384               195565 ns     392721 ns 
BM_StableSort_uint64_Descending_65536               852002 ns    1720358 ns 
BM_StableSort_uint64_Descending_262144             3737884 ns    7484130 ns 
BM_StableSort_uint64_Descending_524288            11159345 ns   25690770 ns 
BM_StableSort_uint64_Descending_1048576           20648864 ns   33057383 ns 
BM_StableSort_uint64_SingleElement_1                  3.62 ns       4.10 ns 
BM_StableSort_uint64_SingleElement_4                  6.73 ns       6.64 ns 
BM_StableSort_uint64_SingleElement_16                 14.9 ns       11.3 ns 
BM_StableSort_uint64_SingleElement_64                 52.0 ns       33.0 ns 
BM_StableSort_uint64_SingleElement_256                 511 ns        582 ns 
BM_StableSort_uint64_SingleElement_1024               6499 ns       3287 ns 
BM_StableSort_uint64_SingleElement_4096              22190 ns      17616 ns 
BM_StableSort_uint64_SingleElement_16384             84378 ns      86885 ns 
BM_StableSort_uint64_SingleElement_65536            466257 ns     457144 ns 
BM_StableSort_uint64_SingleElement_262144          1993687 ns    2361999 ns 
BM_StableSort_uint64_SingleElement_524288          4759565 ns    5096771 ns 
BM_StableSort_uint64_SingleElement_1048576        12426111 ns   13468453 ns 
BM_StableSort_uint64_PipeOrgan_1                      3.73 ns       3.94 ns 
BM_StableSort_uint64_PipeOrgan_4                      7.18 ns       7.54 ns 
BM_StableSort_uint64_PipeOrgan_16                     25.2 ns       29.1 ns 
BM_StableSort_uint64_PipeOrgan_64                      260 ns        321 ns 
BM_StableSort_uint64_PipeOrgan_256                    2468 ns       2970 ns 
BM_StableSort_uint64_PipeOrgan_1024                   7025 ns      12912 ns 
BM_StableSort_uint64_PipeOrgan_4096                  28968 ns      53379 ns 
BM_StableSort_uint64_PipeOrgan_16384                194156 ns     239790 ns 
BM_StableSort_uint64_PipeOrgan_65536                599491 ns     993800 ns 
BM_StableSort_uint64_PipeOrgan_262144              2648585 ns    4689680 ns 
BM_StableSort_uint64_PipeOrgan_524288              7621109 ns   15401808 ns 
BM_StableSort_uint64_PipeOrgan_1048576            15608814 ns   23484821 ns 
BM_StableSort_uint64_QuickSortAdversary_1             3.38 ns       3.54 ns 
BM_StableSort_uint64_QuickSortAdversary_4             5.50 ns       6.03 ns 
BM_StableSort_uint64_QuickSortAdversary_16            14.2 ns       11.0 ns 
BM_StableSort_uint64_QuickSortAdversary_64             597 ns        688 ns 
BM_StableSort_uint64_QuickSortAdversary_256           2446 ns       2818 ns 
BM_StableSort_uint64_QuickSortAdversary_1024          7266 ns      20319 ns 
BM_StableSort_uint64_QuickSortAdversary_4096         31155 ns      89112 ns 
BM_StableSort_uint64_QuickSortAdversary_16384       201033 ns     390574 ns 
BM_StableSort_uint64_QuickSortAdversary_65536       871014 ns    1685639 ns 
BM_StableSort_uint64_QuickSortAdversary_262144     3978535 ns    7265830 ns 
BM_StableSort_uint64_QuickSortAdversary_524288    10279721 ns   25350004 ns 
BM_StableSort_uint64_QuickSortAdversary_1048576   20256585 ns   33054393 ns 
```
2025-01-09 19:02:35 +01:00
Vitaly Buka
570f03096a
Revert "Reapply "[libc++] Explicitly convert to masks in SIMD code (#107983)"" (#122022)
Reverts llvm/llvm-project#121352

Triggers "vector type should not be a bool!" on:
```
  bool a[100];
  bool b[100];
  auto t = std::mismatch(std::begin(a), std::end(a), std::begin(b), std::end(b));
```

https://godbolt.org/z/Y73s3sdef
2025-01-08 18:14:39 +01:00
Nikolas Klauser
f69585235e
[libc++] Put _LIBCPP_NODEBUG on all internal aliases (#118710)
This significantly reduces the amount of debug information generated
for codebases using libc++, without hurting the debugging experience.
2025-01-08 11:12:59 -05:00
Vitaly Buka
ed572f2003
Reapply "[libc++] Explicitly convert to masks in SIMD code (#107983)" (#121352)
This reverts commit 0ea40bf02138c02e7680ce6fa8169502f2a8bd42.

Passes with https://github.com/llvm/llvm-project/issues/121365 fix: 
https://lab.llvm.org/buildbot/#/builders/55/builds/4930
2025-01-01 11:30:35 +01:00
Nikolas Klauser
b905bcc509
[libc++] Remove some unused includes (#120219)
2024-12-18 21:10:27 +01:00
Nikolas Klauser
59890c1334
[libc++] Granularize <new> includes (#119964)
2024-12-17 11:29:16 +01:00
Nikolas Klauser
a2042521a0
[libc++] Remove _AlgPolicy from std::copy and algorithms using std::copy (#115887)
`std::copy` doesn't use the `_AlgPolicy` for anything other than calling
itself with it, so we can just remove the argument. This also removes
the need for it in a few other algorithms, which had an `_AlgPolicy`
argument only to call `copy`.
2024-11-12 23:03:52 +01:00
Nikolas Klauser
5b67372aec
[libc++] Remove a few unused includes from <__algorithm/find_end.h>
2024-11-12 22:11:15 +01:00
Nikolas Klauser
eab7be5d42
[libc++] Forward more algorithms to the classic algorithms (#114674)
This partially addresses #105687.
2024-11-06 12:10:06 +01:00
Nikolas Klauser
c6f3b7bcd0
[libc++] Refactor the configuration macros to being always defined (#112094)
This is a follow-up to #89178. This updates the `<__config_site>`
macros.
2024-11-06 10:39:19 +01:00
Nikolas Klauser
e99c4906e4
[libc++] Granularize <cstddef> includes (#108696)
2024-10-31 02:20:10 +01:00
Nikolas Klauser
0fb76bae6b
Reapply "[libc++] Simplify the implementation of std::sort a bit (#104902)" (#114023)
This reverts commit ef44e4659878f2. The patch was originally reverted
because it was deemed to introduce a performance regression for small
inputs. However, it also fixed a previous performance regression for
larger inputs, so overall this patch is desirable.
2024-10-30 11:51:55 +01:00
Louis Dionne
8e6bba230e
[libc++][NFC] Rename fold.h to ranges_fold.h (#109696)
This follows the pattern we use consistently for ranges algorithms.

This is a re-application of 24bc3244d4e which had been reverted in
f11abac65 due to unrelated failures.
2024-09-30 08:30:16 -04:00
Chris B
f11abac652
Revert "[libc++][modules] Rewrite the modulemap to have fewer top-level modules (#107638)" (#110384)
This reverts 3 commits:
45a09d1811d5d6597385ef02ecf2d4b7320c37c5
24bc3244d4e221f4e6740f45e2bf15a1441a3076
bc6bd3bc1e99c7ec9e22dff23b4f4373fa02cae3

The GitHub pre-merge CI has been broken since this PR went in. This
change reverts it to see if I can get the pre-merge CI working again.
2024-09-28 21:47:09 -05:00
Louis Dionne
24bc3244d4
[libc++][NFC] Rename fold.h to ranges_fold.h (#109696)
This follows the pattern we use consistently for ranges algorithms.
2024-09-27 01:02:21 -04:00
Louis Dionne
ef44e46598
Revert "[libc++] Simplify the implementation of std::sort a bit (#104902)"
This reverts commit d4ffccfce103b01401b8a9222e373f2d404f8439, which
caused a performance regression that needs to be investigated further.
2024-09-20 09:48:58 -04:00
Thurston Dang
0ea40bf021
Revert "[libc++] Explicitly convert to masks in SIMD code (#107983)"
This reverts commit 1603f99a37c5b179a21dbb8000c39a471a950927.

Reason: buildbot breakage e.g., https://lab.llvm.org/buildbot/#/builders/55/builds/2061
  llvm-libc++-shared.cfg.in :: std/algorithms/alg.nonmodifying/alg.starts_with/ranges.starts_with.pass.cpp
  llvm-libc++-shared.cfg.in :: std/algorithms/alg.nonmodifying/mismatch/mismatch.pass.cpp
  llvm-libc++-shared.cfg.in :: std/algorithms/alg.nonmodifying/mismatch/ranges_mismatch.pass.cpp
  ...

(Buildbot re-run passed with the previous revision, 1fc288bf481726393c73133eef9aa73c0f78312e)
2024-09-17 21:52:33 +00:00
Nikolas Klauser
1603f99a37
[libc++] Explicitly convert to masks in SIMD code (#107983)
This makes it clearer when we use masks and avoids MSan complaining.
2024-09-17 12:04:54 +02:00
Louis Dionne
09e3a36058
[libc++][modules] Fix missing and incorrect includes (#108850)
This patch adds a large number of missing includes in the libc++ headers
and the test suite. Those were found as part of the effort to move
towards a mostly monolithic top-level std module.
2024-09-16 15:06:20 -04:00
A. Jiang
94e7c0b051
[libc++] Remove get_temporary_buffer and return_temporary_buffer (#100914)
Works towards P0619R4 / #99985.

Uses of `std::get_temporary_buffer` and `std::return_temporary_buffer`
are replaced with a `unique_ptr`-based RAII buffer holder.
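
A `unique_ptr`-based holder of this kind could look roughly like the sketch below. The names `RawBufferDeleter`, `RawBuffer`, `allocate_raw_buffer` and `demo_buffer_roundtrip` are hypothetical and are not the identifiers used inside libc++. The sketch also ignores over-aligned types and size overflow, which a real implementation would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical deleter: releases raw storage obtained from ::operator new.
// It does not run destructors; the caller manages the lifetime of any
// objects it constructs inside the buffer.
struct RawBufferDeleter {
  void operator()(void* p) const noexcept { ::operator delete(p); }
};

template <class T>
using RawBuffer = std::unique_ptr<T, RawBufferDeleter>;

// Hypothetical helper: returns uninitialized storage for n objects of type T,
// or an empty holder if the allocation fails.
template <class T>
RawBuffer<T> allocate_raw_buffer(std::size_t n) {
  void* p = ::operator new(n * sizeof(T), std::nothrow);
  return RawBuffer<T>(static_cast<T*>(p));
}

// Usage sketch: construct one int in the buffer and read it back.
// The storage is released automatically when buf goes out of scope.
inline int demo_buffer_roundtrip() {
  RawBuffer<int> buf = allocate_raw_buffer<int>(4);
  assert(buf && "allocation failed");
  ::new (static_cast<void*>(buf.get())) int(42);
  return *buf;
}
```

Compared with a `get_temporary_buffer`/`return_temporary_buffer` pair, the storage cannot leak on an early return or an exception, because the deleter runs when the holder is destroyed.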

Escape hatches:
- `_LIBCPP_ENABLE_CXX20_REMOVED_TEMPORARY_BUFFER` restores
`std::get_temporary_buffer` and `std::return_temporary_buffer`.

Drive-by changes:
- In `<syncstream>`, states that `get_temporary_buffer` is now removed,
because `<syncstream>` is added in C++20.
2024-09-16 11:53:05 -04:00
Nikolas Klauser
17e0686ab1
[libc++][NFC] Use [[__nodiscard__]] unconditionally (#80454)
`__has_cpp_attribute(__nodiscard__)` is always true now, so we might as
well replace `_LIBCPP_NODISCARD`. It's one less macro that can result in
bad diagnostics.
2024-09-12 21:18:43 +02:00
Louis Dionne
d6832a611a
[libc++][modules] Modularize <cstddef> (#107254)
Many headers include `<cstddef>` just for size_t, and pulling in
additional content (e.g. the traits used for std::byte) is unnecessary.
To solve this problem, this patch splits up `<cstddef>` into
subcomponents so that headers can include only the parts that they
actually require.

This has the added benefit of making the modules build a lot stricter
with respect to IWYU, and also providing a canonical location where we
define `std::size_t` and friends (which were previously defined in
multiple headers like `<cstddef>` and `<ctime>`).

After this patch, there are still many places in the codebase where we
include `<cstddef>` when `<__cstddef/size_t.h>` would be sufficient.
This patch focuses on removing `<cstddef>` includes from __type_traits
to make these headers non-circular with `<cstddef>`. Additional
refactorings can be tackled separately.
2024-09-05 08:28:33 -04:00
Louis Dionne
0df78123fd
[libc++] Add missing include to three_way_comp_ref_type.h
We were using a `_LIBCPP_ASSERT_FOO` macro without including `<__assert>`.

rdar://134425695
2024-08-27 14:22:49 -04:00
Nikolas Klauser
d4ffccfce1
[libc++] Simplify the implementation of std::sort a bit (#104902)
This does a few things to canonicalize the library a bit. Specifically:
- use `__desugars_to_v` instead of the custom `__is_simple_comparator`
- make `__use_branchless_sort` an inline variable
- remove the `_maybe_branchless` versions of the `__sortN` functions and
overload based on whether we can do branchless sorting instead.
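
The last point, overloading on whether branchless sorting is possible, can be illustrated with a two-element sort. This is only a sketch: `use_branchless_v` and `sort2` are hypothetical names, and the condition used here (arithmetic type compared with `std::less<>`) is an assumption for illustration, not the condition libc++ uses.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical trait, written as an inline variable: branchless sorting is
// assumed to be worthwhile only for arithmetic types compared with std::less<>.
template <class Compare, class T>
inline constexpr bool use_branchless_v =
    std::is_arithmetic_v<T> && std::is_same_v<Compare, std::less<>>;

// Overload selected when branchless sorting is allowed. Both results are
// computed unconditionally, which gives the compiler the chance to emit
// conditional moves instead of a branch.
template <class Compare, class T,
          std::enable_if_t<use_branchless_v<Compare, T>, int> = 0>
void sort2(T& a, T& b, Compare comp) {
  const bool swap_needed = comp(b, a);
  const T lo = swap_needed ? b : a;
  const T hi = swap_needed ? a : b;
  a = lo;
  b = hi;
  assert(!comp(b, a));  // postcondition: the pair is ordered
}

// Overload selected otherwise: a plain compare-and-swap.
template <class Compare, class T,
          std::enable_if_t<!use_branchless_v<Compare, T>, int> = 0>
void sort2(T& a, T& b, Compare comp) {
  if (comp(b, a)) {
    using std::swap;
    swap(a, b);
  }
  assert(!comp(b, a));  // postcondition: the pair is ordered
}
```

Callers write `sort2(x, y, comp)` in both cases, and overload resolution picks the implementation, so no separately named "maybe branchless" entry point is needed.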
2024-08-27 16:54:05 +02:00
Louis Dionne
f73050e722
[libc++] Fix several double-moves in the code base (#104616)
This patch hardens the "test iterators" we use to test algorithms by
ensuring that they don't get double-moved. As a result of this
hardening, the tests started reporting multiple failures where we would
double-move iterators, which are being fixed in this patch.

In particular:
- Fix a double-move in pstl.partition
- Add coverage for begin()/end() in subrange tests
- Fix tests for ranges::ends_with and ranges::contains, which were
  incorrectly calling begin() twice on the same subrange containing
  non-copyable input iterators.

Fixes #100709
2024-08-20 14:36:11 -04:00
Louis Dionne
257831582c
[libc++] Check correctly ref-qualified __is_callable in algorithms (#101553)
We were only checking that the comparator was rvalue callable,
when in reality the algorithms always call comparators as lvalues.
This patch also refactors the tests for callable requirements and
expands them to a few missing algorithms.

This is take 2 of #73451, which was reverted because it broke some
CI bots. The issue was that we checked __is_callable with arguments
in the wrong order inside std::upper_bound. This has now been fixed
and a test was added.
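
The difference between the two checks can be shown with ref-qualified call operators. The sketch below uses the standard `std::is_invocable` traits as a stand-in for the internal `__is_callable`. The comparator types and the `is_ordered_pair` helper are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Callable only as an lvalue (note the trailing &).
struct LvalueOnlyLess {
  bool operator()(int a, int b) & { return a < b; }
};

// Callable only as an rvalue (note the trailing &&).
struct RvalueOnlyLess {
  bool operator()(int a, int b) && { return a < b; }
};

// Checking the comparator as an rvalue rejects a type the algorithms could
// use and accepts one they could not.
static_assert(!std::is_invocable_v<LvalueOnlyLess, int, int>, "");
static_assert(std::is_invocable_v<RvalueOnlyLess, int, int>, "");

// Checking it as an lvalue matches how algorithms actually call it.
static_assert(std::is_invocable_v<LvalueOnlyLess&, int, int>, "");
static_assert(!std::is_invocable_v<RvalueOnlyLess&, int, int>, "");

// Hypothetical helper that uses its comparator the way an algorithm does:
// through a named parameter, which is an lvalue.
template <class Comp>
bool is_ordered_pair(int a, int b, Comp comp) {
  static_assert(std::is_invocable_r_v<bool, Comp&, int, int>,
                "comparator must be callable as an lvalue");
  return !comp(b, a);
}
```

With the lvalue check, `is_ordered_pair(1, 2, LvalueOnlyLess{})` compiles, while passing `RvalueOnlyLess{}` fails at the `static_assert` rather than deep inside the call.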

Fixes #69554
2024-08-05 11:23:06 -04:00
Nikolas Klauser
d07fdf9779
[libc++] Optimize lexicographical_compare (#65279)
If the comparison operation is equivalent to < and that is a total
order, we know that we can use equality comparison on that type instead
to extract some information. Furthermore, if equality comparison on that
type is trivial, the user can't observe that we're calling it. So
instead of using the user-provided total order, we use std::mismatch,
which uses equality comparison (and is vectorized). Additionally, if the
type is trivially lexicographically comparable, we can go one step
further and use std::memcmp directly instead of calling std::mismatch.
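
The two fast paths described above can be sketched as follows. The function names are hypothetical, and the sketch is restricted to `int` and `unsigned char` pointers so that the preconditions (a total order given by `<`, trivial equality, and for `memcmp` a byte-wise ordering) hold by construction. The dispatch logic libc++ uses to detect those properties is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch for int ranges compared with operator<: find the first differing
// position with std::mismatch (equality only), then compare that one pair.
inline bool lex_less_via_mismatch(const int* first1, const int* last1,
                                  const int* first2, const int* last2) {
  const std::ptrdiff_t len1 = last1 - first1;
  const std::ptrdiff_t len2 = last2 - first2;
  const std::ptrdiff_t n = std::min(len1, len2);
  const auto diff = std::mismatch(first1, first1 + n, first2);
  if (diff.first == first1 + n)
    return len1 < len2;  // common prefix is equal: the shorter range is smaller
  return *diff.first < *diff.second;
}

// Sketch for unsigned char ranges: memcmp compares bytes as unsigned char,
// so its result already is the lexicographical order of the common prefix.
inline bool lex_less_via_memcmp(const unsigned char* first1, const unsigned char* last1,
                                const unsigned char* first2, const unsigned char* last2) {
  const std::size_t len1 = static_cast<std::size_t>(last1 - first1);
  const std::size_t len2 = static_cast<std::size_t>(last2 - first2);
  const std::size_t n = std::min(len1, len2);
  const int r = (n == 0) ? 0 : std::memcmp(first1, first2, n);
  return r != 0 ? r < 0 : len1 < len2;
}
```

Both functions are expected to agree with `std::lexicographical_compare` on the same inputs. The `memcmp` variant would not be valid for `signed char` or multi-byte integers, whose ordering differs from byte-wise comparison.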

Benchmarks:
```
-------------------------------------------------------------------------------------
Benchmark                                                         old             new
-------------------------------------------------------------------------------------
bm_lexicographical_compare<unsigned char>/1                   1.17 ns         2.34 ns
bm_lexicographical_compare<unsigned char>/2                   1.64 ns         2.57 ns
bm_lexicographical_compare<unsigned char>/3                   2.23 ns         2.58 ns
bm_lexicographical_compare<unsigned char>/4                   2.82 ns         2.57 ns
bm_lexicographical_compare<unsigned char>/5                   3.34 ns         2.11 ns
bm_lexicographical_compare<unsigned char>/6                   3.94 ns         2.21 ns
bm_lexicographical_compare<unsigned char>/7                   4.56 ns         2.11 ns
bm_lexicographical_compare<unsigned char>/8                   5.25 ns         2.11 ns
bm_lexicographical_compare<unsigned char>/16                  9.88 ns         2.11 ns
bm_lexicographical_compare<unsigned char>/64                  38.9 ns         2.36 ns
bm_lexicographical_compare<unsigned char>/512                  317 ns         6.54 ns
bm_lexicographical_compare<unsigned char>/4096                2517 ns         41.4 ns
bm_lexicographical_compare<unsigned char>/32768              20052 ns          488 ns
bm_lexicographical_compare<unsigned char>/262144            159579 ns         4409 ns
bm_lexicographical_compare<unsigned char>/1048576           640456 ns        20342 ns
bm_lexicographical_compare<signed char>/1                     1.18 ns         2.37 ns
bm_lexicographical_compare<signed char>/2                     1.65 ns         2.60 ns
bm_lexicographical_compare<signed char>/3                     2.23 ns         2.83 ns
bm_lexicographical_compare<signed char>/4                     2.81 ns         3.06 ns
bm_lexicographical_compare<signed char>/5                     3.35 ns         3.30 ns
bm_lexicographical_compare<signed char>/6                     3.90 ns         3.99 ns
bm_lexicographical_compare<signed char>/7                     4.56 ns         3.78 ns
bm_lexicographical_compare<signed char>/8                     5.20 ns         4.02 ns
bm_lexicographical_compare<signed char>/16                    9.80 ns         6.21 ns
bm_lexicographical_compare<signed char>/64                    39.0 ns         3.16 ns
bm_lexicographical_compare<signed char>/512                    318 ns         7.58 ns
bm_lexicographical_compare<signed char>/4096                  2514 ns         47.4 ns
bm_lexicographical_compare<signed char>/32768                20096 ns          504 ns
bm_lexicographical_compare<signed char>/262144              156617 ns         4146 ns
bm_lexicographical_compare<signed char>/1048576             624265 ns        19810 ns
bm_lexicographical_compare<int>/1                             1.15 ns         2.12 ns
bm_lexicographical_compare<int>/2                             1.60 ns         2.36 ns
bm_lexicographical_compare<int>/3                             2.21 ns         2.59 ns
bm_lexicographical_compare<int>/4                             2.74 ns         2.83 ns
bm_lexicographical_compare<int>/5                             3.26 ns         3.06 ns
bm_lexicographical_compare<int>/6                             3.81 ns         4.53 ns
bm_lexicographical_compare<int>/7                             4.41 ns         4.72 ns
bm_lexicographical_compare<int>/8                             5.08 ns         2.36 ns
bm_lexicographical_compare<int>/16                            9.54 ns         3.08 ns
bm_lexicographical_compare<int>/64                            37.8 ns         4.71 ns
bm_lexicographical_compare<int>/512                            309 ns         24.6 ns
bm_lexicographical_compare<int>/4096                          2422 ns          204 ns
bm_lexicographical_compare<int>/32768                        19362 ns         1947 ns
bm_lexicographical_compare<int>/262144                      155727 ns        19793 ns
bm_lexicographical_compare<int>/1048576                     623614 ns        80180 ns
bm_ranges_lexicographical_compare<unsigned char>/1            1.07 ns         2.35 ns
bm_ranges_lexicographical_compare<unsigned char>/2            1.72 ns         2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/3            2.46 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/4            3.17 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/5            3.86 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/6            4.55 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/7            5.25 ns         2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/8            5.95 ns         2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/16           11.7 ns         2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/64           45.5 ns         2.36 ns
bm_ranges_lexicographical_compare<unsigned char>/512           366 ns         6.35 ns
bm_ranges_lexicographical_compare<unsigned char>/4096         2886 ns         40.9 ns
bm_ranges_lexicographical_compare<unsigned char>/32768       23054 ns          489 ns
bm_ranges_lexicographical_compare<unsigned char>/262144     185302 ns         4339 ns
bm_ranges_lexicographical_compare<unsigned char>/1048576    741576 ns        19430 ns
bm_ranges_lexicographical_compare<signed char>/1              1.10 ns         2.12 ns
bm_ranges_lexicographical_compare<signed char>/2              1.66 ns         2.35 ns
bm_ranges_lexicographical_compare<signed char>/3              2.23 ns         2.58 ns
bm_ranges_lexicographical_compare<signed char>/4              2.82 ns         2.82 ns
bm_ranges_lexicographical_compare<signed char>/5              3.34 ns         3.06 ns
bm_ranges_lexicographical_compare<signed char>/6              3.92 ns         3.99 ns
bm_ranges_lexicographical_compare<signed char>/7              4.64 ns         4.10 ns
bm_ranges_lexicographical_compare<signed char>/8              5.21 ns         4.61 ns
bm_ranges_lexicographical_compare<signed char>/16             9.79 ns         7.42 ns
bm_ranges_lexicographical_compare<signed char>/64             38.9 ns         2.93 ns
bm_ranges_lexicographical_compare<signed char>/512             317 ns         7.31 ns
bm_ranges_lexicographical_compare<signed char>/4096           2500 ns         47.5 ns
bm_ranges_lexicographical_compare<signed char>/32768         19940 ns          496 ns
bm_ranges_lexicographical_compare<signed char>/262144       159166 ns         4393 ns
bm_ranges_lexicographical_compare<signed char>/1048576      638206 ns        19786 ns
bm_ranges_lexicographical_compare<int>/1                      1.10 ns         2.12 ns
bm_ranges_lexicographical_compare<int>/2                      1.64 ns         3.04 ns
bm_ranges_lexicographical_compare<int>/3                      2.23 ns         2.58 ns
bm_ranges_lexicographical_compare<int>/4                      2.81 ns         2.81 ns
bm_ranges_lexicographical_compare<int>/5                      3.35 ns         3.05 ns
bm_ranges_lexicographical_compare<int>/6                      3.94 ns         4.60 ns
bm_ranges_lexicographical_compare<int>/7                      4.60 ns         4.81 ns
bm_ranges_lexicographical_compare<int>/8                      5.19 ns         2.35 ns
bm_ranges_lexicographical_compare<int>/16                     9.85 ns         2.87 ns
bm_ranges_lexicographical_compare<int>/64                     38.9 ns         4.70 ns
bm_ranges_lexicographical_compare<int>/512                     318 ns         24.5 ns
bm_ranges_lexicographical_compare<int>/4096                   2494 ns          202 ns
bm_ranges_lexicographical_compare<int>/32768                 20000 ns         1939 ns
bm_ranges_lexicographical_compare<int>/262144               160433 ns        19730 ns
bm_ranges_lexicographical_compare<int>/1048576              642636 ns        80760 ns
```
2024-08-04 10:02:43 +02:00
Louis Dionne
451bba6fbf
[libc++] Revert "Check correctly ref-qualified __is_callable in algorithms (#73451)"
This reverts commit 8d151f804ff43aaed1edf810bb2a07607b8bba14, which
broke some build bots. I think that is caused by an invalid argument
order when checking __is_callable in upper_bound.
2024-08-01 15:56:06 -04:00
Nhat Nguyen
8d151f804f
[libc++] Check correctly ref-qualified __is_callable in algorithms (#73451)
We were only checking that the comparator was rvalue callable,
when in reality the algorithms always call comparators as lvalues.
This patch also refactors the tests for callable requirements and
expands them to a few missing algorithms.

Fixes #69554
2024-08-01 14:08:21 -04:00
Christopher Di Bella
d10dc5a06f
[libc++] Remove dedicated namespaces for ranges functions (#76543)
We originally put implementation-detail function objects into individual
namespaces for `std::ranges` without a good reason for doing so. This
practice was continued, presumably because there was prior art. Since
there's no reason to keep these namespaces, this commit removes them,
which will slightly impact binary size.

This commit does not apply to CPOs, some of which need additional work.
2024-08-01 08:54:06 -04:00
Hewill Kang
5b6b48800e
[libc++][NFC] Remove two unused implementation details __find_end (#100685)
Those two `__find_end` functions are no longer used after 101d1e9b3c86.
After that commit, `std::find_end` started dispatching to `__find_end_classic`,
and `ranges::find_end` to `__find_end_impl`, which means that the two `__find_end`
functions were no longer necessary.

Fixes #100569
2024-07-31 10:34:19 -04:00
nicole mazzuca
04760bfadb
[libc++][ranges] P1223R5: find_last (#99312)
Implements [P1223R5][] completely.

Includes an implementation of `find_last`, `find_last_if`, and
`find_last_if_not`.
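
The semantics of the three algorithms can be sketched as follows. This is an illustrative forward-iterator version, not the libc++ implementation. The `*_sketch` names are hypothetical, and a `std::pair` stands in for the subrange `[i, last)` that the `std::ranges` versions return.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns {i, last}, where i is the last position satisfying pred,
// or {last, last} if no element does. Single forward pass.
template <class ForwardIt, class Pred>
std::pair<ForwardIt, ForwardIt> find_last_if_sketch(ForwardIt first, ForwardIt last, Pred pred) {
  ForwardIt found = last;
  for (; first != last; ++first) {
    if (pred(*first))
      found = first;  // remember the most recent match
  }
  return {found, last};
}

template <class ForwardIt, class T>
std::pair<ForwardIt, ForwardIt> find_last_sketch(ForwardIt first, ForwardIt last, const T& value) {
  return find_last_if_sketch(first, last, [&](const auto& x) { return x == value; });
}

template <class ForwardIt, class Pred>
std::pair<ForwardIt, ForwardIt> find_last_if_not_sketch(ForwardIt first, ForwardIt last, Pred pred) {
  return find_last_if_sketch(first, last, [&](const auto& x) { return !pred(x); });
}

inline void usage_example() {
  const std::vector<int> v{1, 2, 3, 2, 1};
  auto [tail_begin, tail_end] = find_last_sketch(v.begin(), v.end(), 2);
  assert(tail_begin == v.begin() + 3);  // the second 2
  assert(tail_end == v.end());
}
```

For bidirectional ranges, an implementation can instead scan from the back and stop at the first match. The sketch above does not attempt that.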

[P1223R5]: https://wg21.link/p1223r5
2024-07-19 09:42:16 -07:00
Iuri Chaer
a0662176a9
[libc++] Speed up set_intersection() by fast-forwarding over ranges of non-matching elements with one-sided binary search. (#75230)
One-sided binary search, aka meta binary search, has been in the public
domain for decades, and has the general advantage of being constant time
in the best case, with the downside of executing at most 2*log(N)
comparisons vs classic binary search's exact log(N). There are two
scenarios in which it really shines: the first one is when operating
over non-random-access iterators, because the classic algorithm requires
knowing the container's size upfront, which adds N iterator increments
to the complexity. The second one is when traversing the container in
order, trying to fast-forward to the next value: in that case the
classic algorithm requires at least O(N*log(N)) comparisons and, for
non-random-access iterators, O(N^2) iterator increments, whereas the
one-sided version will yield O(N) operations on both counts, with a
best-case of O(log(N)) comparisons which is very common in practice.
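
A minimal sketch of the idea, assuming sorted input and `operator<`: probe at doubling distances from the current position, then finish with a classic search inside the last window. `one_sided_lower_bound` and `intersect_sorted` are hypothetical names for this illustration and are not the helpers added by the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the first position in the sorted range [first, last) whose
// element is not less than `value`. Works with forward iterators and
// never needs the size of the range upfront.
template <class ForwardIt, class T>
ForwardIt one_sided_lower_bound(ForwardIt first, ForwardIt last, const T& value) {
  if (first == last || !(*first < value))
    return first;  // constant-time best case

  // Invariant from here on: *first < value, so the answer is after `first`.
  for (std::ptrdiff_t step = 1;; step *= 2) {
    ForwardIt probe = first;
    std::ptrdiff_t advanced = 0;
    while (advanced < step && probe != last) {
      ++probe;
      ++advanced;
    }
    assert(advanced > 0);  // `first` is dereferenceable, so we always move
    if (probe == last || !(*probe < value)) {
      // The answer lies in (first, probe]: finish with a classic binary
      // search over that bounded window.
      return std::lower_bound(std::next(first), probe, value);
    }
    first = probe;
  }
}

// Intersection of two sorted vectors that fast-forwards over runs of
// non-matching elements instead of stepping one element at a time.
template <class T>
std::vector<T> intersect_sorted(const std::vector<T>& a, const std::vector<T>& b) {
  std::vector<T> out;
  auto i = a.begin();
  auto j = b.begin();
  while (i != a.end() && j != b.end()) {
    if (*i < *j) {
      i = one_sided_lower_bound(i, a.end(), *j);
    } else if (*j < *i) {
      j = one_sided_lower_bound(j, b.end(), *i);
    } else {
      out.push_back(*i);
      ++i;
      ++j;
    }
  }
  return out;
}
```

Because each search starts from the current position, the cost of a fast-forward depends on how far the target is, not on the size of the remaining range.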
2024-07-18 16:11:24 -04:00
Louis Dionne
e2c2ffbe7a
[libc++][NFC] Run clang-format on libcxx/include again (#95874)
As time went by, a few files have become mis-formatted w.r.t.
clang-format. This was made worse by the fact that formatting was not
being enforced in extensionless headers. This commit simply brings all
of libcxx/include in-line with clang-format again.

We might have to do this from time to time as we update our clang-format
version, but frankly this is really low effort now that we've formatted
everything once.
2024-06-18 09:13:45 -04:00
Louis Dionne
acb896a344
[libc++] Remove unnecessary #ifdef guards around PSTL implementation details (#95268)
We want the PSTL implementation details to be available regardless of
the Standard mode or whether the experimental PSTL is enabled. This
patch guards the inclusion of the PSTL to the top-level headers that
define the public API in `__numeric` and `__algorithm`.
2024-06-12 17:25:43 -04:00
Louis Dionne
fe4cd104a8 [libc++][NFC] Fix typo in concept PSTL concept check 2024-06-12 12:31:05 -04:00
Louis Dionne
9540950a45
[libc++] Overhaul the PSTL dispatching mechanism (#88131)
The experimental PSTL's current dispatching mechanism was designed with
flexibility in mind. However, while reviewing the in-progress OpenMP
backend, I realized that the dispatching mechanism based on ADL and
default definitions in the frontend had several downsides. To name a
few:

1. The dispatching of an algorithm to the back-end and its default
   implementation is bundled together via `_LIBCPP_PSTL_CUSTOMIZATION_POINT`.
   This makes the dispatching really confusing and leads to annoyances
   such as variable shadowing and weird lambda captures in the front-end.
2. The distinction between back-end functions and front-end algorithms
   is not as clear as it could be, which led us to call one where we meant
   the other in a few cases. This is bad due to the exception requirements
   of the PSTL: calling a front-end algorithm inside the implementation of
   a back-end is incorrect for exception-safety.
3. There are two levels of back-end dispatching in the PSTL, which treat
   CPU backends as a special case. This was confusing and not as flexible
   as we'd like. For example, there was no straightforward way to dispatch
   all uses of `unseq` to a specific back-end from the OpenMP backend,
   or for CPU backends to fall back on each other.

This patch rewrites the backend dispatching mechanism to solve these
problems, but doesn't touch any of the actual implementation of
algorithms. Specifically, this rewrite has the following
characteristics:

- There is a single level of backend dispatching, however partial backends can
  be stacked to provide a full implementation of the PSTL. The two-level dispatching
  that was used for CPU-based backends is handled by providing CPU-based basis 
  operations as simple helpers that can easily be reused when defining any PSTL 
  backend.

- The default definitions for algorithms are separated from their dispatching logic.

- The front-end is thus simplified a whole lot and made very consistent
  for all algorithms, which makes it easier to audit the front-end for
  things like exception-correctness, appropriate forwarding, etc.
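
The general shape of single-level dispatching over stacked partial backends can be sketched as below. This is only an illustration of the concept under simplifying assumptions. All names (`backend_list`, `has_fill`, `dispatch_count`, and so on) are hypothetical, and the explicit `has_*` flags stand in for whatever detection mechanism the real code uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical partial backend: implements `fill` only.
struct partial_backend {
  static constexpr bool has_fill = true;
  static constexpr bool has_count = false;
  static inline int calls = 0;
  template <class It, class T>
  static void fill(It first, It last, const T& v) { ++calls; std::fill(first, last, v); }
};

// Hypothetical full backend that completes the stack.
struct serial_backend {
  static constexpr bool has_fill = true;
  static constexpr bool has_count = true;
  static inline int calls = 0;
  template <class It, class T>
  static void fill(It first, It last, const T& v) { ++calls; std::fill(first, last, v); }
  template <class It, class T>
  static std::ptrdiff_t count(It first, It last, const T& v) { ++calls; return std::count(first, last, v); }
};

template <class... Backends>
struct backend_list {};

// Dispatching: pick the first backend in the list that implements the
// operation. It is a compile error if no backend does.
template <class First, class... Rest, class It, class T>
void dispatch_fill(backend_list<First, Rest...>, It first, It last, const T& v) {
  if constexpr (First::has_fill) First::fill(first, last, v);
  else dispatch_fill(backend_list<Rest...>{}, first, last, v);
}

template <class First, class... Rest, class It, class T>
std::ptrdiff_t dispatch_count(backend_list<First, Rest...>, It first, It last, const T& v) {
  if constexpr (First::has_count) return First::count(first, last, v);
  else return dispatch_count(backend_list<Rest...>{}, first, last, v);
}

using configured_backends = backend_list<partial_backend, serial_backend>;

// Front-end: nothing but forwarding to the dispatcher.
template <class It, class T>
void my_fill(It first, It last, const T& v) { dispatch_fill(configured_backends{}, first, last, v); }

template <class It, class T>
std::ptrdiff_t my_count(It first, It last, const T& v) { return dispatch_count(configured_backends{}, first, last, v); }

inline void usage_example() {
  std::vector<int> v(5, 0);
  my_fill(v.begin(), v.end(), 7);                // handled by partial_backend
  assert(my_count(v.begin(), v.end(), 7) == 5);  // falls through to serial_backend
}
```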

Fixes #70718
2024-06-12 12:24:34 -04:00
Zibi Sarbinowski
ffc3a6b286
[libc++] Fix endianness for algorithm mismatch (#93082)
This PR is required to fix
`std/algorithms/alg.nonmodifying/mismatch/mismatch.pass.cpp` test for
big-endian platforms such as z/OS.
2024-06-11 08:29:12 -04:00
Louis Dionne
e406d5ed9c
[libc++][pstl] Merge all frontend functions for the PSTL (#89219)
This is an intermediate step towards the PSTL dispatching mechanism
rework. It will make it a lot easier to track the upcoming front-end
changes. After the rework, there are basically no implementation details
in the front-end, so the definition of each algorithm will become much
simpler. Otherwise, it wouldn't make sense to define all the algorithms
in the same header.
2024-05-27 17:51:12 -04:00
Louis Dionne
72417920d3
[libc++] Remove a few unused includes of trivially_copyable.h (#93200) 2024-05-23 15:58:51 -04:00
Louis Dionne
bd3f5a4bd3
[libc++][pstl] Improve exception handling (#88998)
There were various places where we incorrectly handled exceptions in the
PSTL. Typical issues were missing `noexcept` and taking iterators by
value instead of by reference.

This patch fixes those inconsistent and incorrect instances, and adds
proper tests for all of those. Note that the previous tests were often
incorrectly turned into no-ops by the compiler due to copy elision,
which doesn't happen with these new tests.
2024-05-22 12:39:21 -07:00
zibi2
af57ad6536
[libc++][z/OS] Correct a definition of __native_vector_size (#91995)
Fix `std/ranges/range.adaptors/range.lazy.split/general.pass.cpp` which
started failing on z/OS after this
[commit](https://github.com/llvm/llvm-project/commit/985c1a44f8d49e0af).

This test case is passing on other platforms such as AIX. This is
because the `__ALTIVEC__` macro is defined and `__mismatch` under
`_LIBCPP_VECTORIZE_ALGORITHMS` guard is compiled out. However, on z/OS
`_LIBCPP_VECTORIZE_ALGORITHMS` is defined. Analyzing the algorithm of
`__mismatch` shows that the culprit is the definition of
`__native_vector_size` which was defined wrongly as 1. This PR corrects
the definition of `__native_vector_size` and fixes the affected test.
2024-05-16 08:58:44 -04:00
Nikolas Klauser
05cc2d5fe1
[libc++] Vectorize std::mismatch with trivially equality comparable types (#87716) 2024-05-11 23:32:48 +02:00
Nikolas Klauser
840032419d
[libc++][NFC] Rename __find_impl to __find (#90163)
For most algorithms we've just added underscores to the detail function.
This changes `std::find` to match that pattern.
2024-04-27 09:51:59 +02:00
Nikolas Klauser
83bc7b5771
[libc++] Remove _LIBCPP_DISABLE_NODISCARD_EXTENSIONS and refactor the tests (#87094)
This also adds a few tests that were missing.
2024-04-22 22:13:58 +02:00
Louis Dionne
0e08bce142
[libc++][pstl] Move the CPU algorithm implementations to __pstl (#89109)
This colocates the CPU algorithms closer to the rest of the PSTL
implementation details.
2024-04-18 07:50:57 -04:00
Louis Dionne
d423d80e56
[libc++][pstl] Promote CPU backends to top-level backends (#88968)
This patch removes the two-level backend dispatching mechanism we had in
the PSTL. Instead of selecting both a PSTL backend and a PSTL CPU
backend, we now only select a top-level PSTL backend. This greatly
simplifies the PSTL configuration layer.

While this patch technically removes some flexibility from the PSTL
configuration mechanism because CPU backends are not considered
separately, it opens the door to a much more powerful configuration
mechanism based on chained backends in a follow-up patch.

This is a step towards overhauling the PSTL dispatching mechanism.
2024-04-17 13:36:53 -04:00
Louis Dionne
d57907d0b4
[libc++] Add missing iterator requirement checks in the PSTL (#88127)
Also add tests for those, and add a few missing requirements to testing
iterators in the test suite.
2024-04-17 08:21:48 -04:00
Louis Dionne
5b811562a5
[libc++] Rename __cpu_traits functions (#88741)
Functions inside __cpu_traits were needlessly prefixed with __parallel,
which doesn't serve a real purpose anymore now that they are inside a
traits class.
2024-04-16 10:33:39 +02:00
Louis Dionne
a3ce29f7bb
[libc++][PSTL] Introduce cpu traits (#88134)
Currently, CPU backends in the PSTL are created by defining functions
in the __par_backend namespace. Then, the PSTL includes the CPU backend
that gets configured via CMake and gets those definitions.

This prevents CPU backends from easily co-existing and is a bit confusing.
To solve this problem, this patch introduces the notion of __cpu_traits,
which is a cheap encapsulation of the basis operations required to
implement a CPU-based PSTL. Different backends can now define their own
tag and coexist, and the CPU-based PSTL will simply use __cpu_traits to
dispatch to the right implementation of e.g. __for_each.
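
As a rough illustration of the traits-plus-tag idea (not the actual libc++ code), each backend can define a tag type and specialize a traits class for it, and the shared CPU algorithm layer then goes through the traits. The names `cpu_traits`, `serial_backend_tag`, `chunked_backend_tag`, and `cpu_for_each` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Each backend is identified by its own tag type, so several backends
// can coexist in one program.
struct serial_backend_tag {};
struct chunked_backend_tag {};

// Primary template is left undefined: a backend must specialize it.
template <class Backend>
struct cpu_traits;

template <>
struct cpu_traits<serial_backend_tag> {
  template <class It, class F>
  static void for_each(It first, It last, F f) {
    for (; first != last; ++first)
      f(*first);
  }
};

template <>
struct cpu_traits<chunked_backend_tag> {
  // Processes the range in fixed-size chunks. A real parallel backend
  // would hand each chunk to a worker; here the chunks run in sequence.
  template <class RandomIt, class F>
  static void for_each(RandomIt first, RandomIt last, F f) {
    constexpr std::ptrdiff_t chunk = 4;
    while (first != last) {
      const std::ptrdiff_t n = std::min<std::ptrdiff_t>(chunk, last - first);
      std::for_each(first, first + n, f);
      first += n;
    }
  }
};

// Shared algorithm layer: written once, parameterized on the backend tag.
template <class Backend, class It, class F>
void cpu_for_each(It first, It last, F f) {
  cpu_traits<Backend>::for_each(first, last, f);
}

inline void usage_example() {
  const std::vector<int> v{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  int serial_sum = 0;
  int chunked_sum = 0;
  cpu_for_each<serial_backend_tag>(v.begin(), v.end(), [&](int x) { serial_sum += x; });
  cpu_for_each<chunked_backend_tag>(v.begin(), v.end(), [&](int x) { chunked_sum += x; });
  assert(serial_sum == 55);
  assert(chunked_sum == 55);
}
```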

Note that this patch doesn't change the actual implementation of the
backends in any way, it only modifies how that implementation is
accessed to implement PSTL algorithms.

This patch is a step towards #88131.
2024-04-15 10:30:00 -04:00