24 Commits

Author SHA1 Message Date
Nikolas Klauser
935e699173
[libc++] Optimize ranges::minmax (#87335)
This allows Clang to vectorize the loop.
```
---------------------------------------------------------------------
Benchmark                                         old             new
---------------------------------------------------------------------
BM_std_minmax<char>/1                        0.659 ns         1.41 ns
BM_std_minmax<char>/2                         1.08 ns         2.16 ns
BM_std_minmax<char>/3                         2.16 ns         2.96 ns
BM_std_minmax<char>/4                         2.82 ns         3.81 ns
BM_std_minmax<char>/5                         3.43 ns         4.69 ns
BM_std_minmax<char>/6                         4.08 ns         5.63 ns
BM_std_minmax<char>/7                         4.75 ns         6.51 ns
BM_std_minmax<char>/8                         5.42 ns         7.41 ns
BM_std_minmax<char>/9                         6.05 ns         8.34 ns
BM_std_minmax<char>/10                        6.68 ns         9.29 ns
BM_std_minmax<char>/11                        7.47 ns         10.6 ns
BM_std_minmax<char>/12                        7.95 ns         11.4 ns
BM_std_minmax<char>/13                        8.64 ns         12.4 ns
BM_std_minmax<char>/14                        9.35 ns         13.4 ns
BM_std_minmax<char>/15                        10.1 ns         14.4 ns
BM_std_minmax<char>/16                        10.6 ns         2.25 ns
BM_std_minmax<char>/17                        11.3 ns         2.82 ns
BM_std_minmax<char>/18                        11.8 ns         3.71 ns
BM_std_minmax<char>/19                        12.6 ns         4.52 ns
BM_std_minmax<char>/20                        13.2 ns         5.47 ns
BM_std_minmax<char>/21                        14.1 ns         6.67 ns
BM_std_minmax<char>/22                        14.5 ns         7.78 ns
BM_std_minmax<char>/23                        15.1 ns         8.67 ns
BM_std_minmax<char>/24                        15.7 ns         9.68 ns
BM_std_minmax<char>/25                        16.4 ns         10.7 ns
BM_std_minmax<char>/26                        17.1 ns         11.7 ns
BM_std_minmax<char>/27                        17.8 ns         12.8 ns
BM_std_minmax<char>/28                        18.4 ns         14.1 ns
BM_std_minmax<char>/29                        19.0 ns         15.0 ns
BM_std_minmax<char>/30                        19.6 ns         16.0 ns
BM_std_minmax<char>/31                        20.2 ns         17.0 ns
BM_std_minmax<char>/32                        20.8 ns         2.46 ns
BM_std_minmax<char>/64                        41.5 ns         2.97 ns
BM_std_minmax<char>/512                        340 ns         6.05 ns
BM_std_minmax<char>/1024                       667 ns         8.83 ns
BM_std_minmax<char>/4000                      2571 ns         28.6 ns
BM_std_minmax<char>/4096                      2632 ns         25.8 ns
BM_std_minmax<char>/5500                      3554 ns         51.1 ns
BM_std_minmax<char>/64000                    41175 ns          480 ns
BM_std_minmax<char>/65536                    42039 ns          490 ns
BM_std_minmax<char>/70000                    44931 ns          528 ns
BM_std_minmax<short>/1                       0.708 ns         1.20 ns
BM_std_minmax<short>/2                        1.18 ns         1.78 ns
BM_std_minmax<short>/3                        1.98 ns         2.42 ns
BM_std_minmax<short>/4                        2.47 ns         3.05 ns
BM_std_minmax<short>/5                        3.09 ns         3.72 ns
BM_std_minmax<short>/6                        3.49 ns         4.37 ns
BM_std_minmax<short>/7                        4.24 ns         5.03 ns
BM_std_minmax<short>/8                        4.65 ns         2.12 ns
BM_std_minmax<short>/9                        5.34 ns         2.51 ns
BM_std_minmax<short>/10                       5.82 ns         3.18 ns
BM_std_minmax<short>/11                       6.36 ns         3.97 ns
BM_std_minmax<short>/12                       6.73 ns         4.68 ns
BM_std_minmax<short>/13                       7.59 ns         5.49 ns
BM_std_minmax<short>/14                       7.77 ns         6.45 ns
BM_std_minmax<short>/15                       8.54 ns         7.55 ns
BM_std_minmax<short>/16                       8.74 ns         2.38 ns
BM_std_minmax<short>/17                       9.59 ns         2.76 ns
BM_std_minmax<short>/18                       9.88 ns         3.37 ns
BM_std_minmax<short>/19                       10.7 ns         4.17 ns
BM_std_minmax<short>/20                       10.9 ns         4.88 ns
BM_std_minmax<short>/21                       12.1 ns         5.70 ns
BM_std_minmax<short>/22                       12.6 ns         6.64 ns
BM_std_minmax<short>/23                       13.5 ns         7.72 ns
BM_std_minmax<short>/24                       13.2 ns         2.87 ns
BM_std_minmax<short>/25                       14.2 ns         3.10 ns
BM_std_minmax<short>/26                       14.2 ns         3.59 ns
BM_std_minmax<short>/27                       15.4 ns         4.35 ns
BM_std_minmax<short>/28                       15.3 ns         5.10 ns
BM_std_minmax<short>/29                       16.2 ns         5.87 ns
BM_std_minmax<short>/30                       16.2 ns         6.88 ns
BM_std_minmax<short>/31                       17.0 ns         7.78 ns
BM_std_minmax<short>/32                       17.2 ns         3.45 ns
BM_std_minmax<short>/64                       34.1 ns         3.35 ns
BM_std_minmax<short>/512                       279 ns         8.37 ns
BM_std_minmax<short>/1024                      549 ns         14.2 ns
BM_std_minmax<short>/4000                     2111 ns         50.1 ns
BM_std_minmax<short>/4096                     2167 ns         47.9 ns
BM_std_minmax<short>/5500                     2895 ns         69.7 ns
BM_std_minmax<short>/64000                   33454 ns          953 ns
BM_std_minmax<short>/65536                   34474 ns          970 ns
BM_std_minmax<short>/70000                   36691 ns         1037 ns
BM_std_minmax<int>/1                         0.664 ns         1.17 ns
BM_std_minmax<int>/2                          1.11 ns         1.69 ns
BM_std_minmax<int>/3                          2.36 ns         2.29 ns
BM_std_minmax<int>/4                          2.53 ns         2.91 ns
BM_std_minmax<int>/5                          3.23 ns         3.56 ns
BM_std_minmax<int>/6                          3.56 ns         4.23 ns
BM_std_minmax<int>/7                          4.28 ns         4.91 ns
BM_std_minmax<int>/8                          4.60 ns         5.60 ns
BM_std_minmax<int>/9                          5.38 ns         6.31 ns
BM_std_minmax<int>/10                         5.69 ns         7.03 ns
BM_std_minmax<int>/11                         6.41 ns         7.70 ns
BM_std_minmax<int>/12                         6.73 ns         8.39 ns
BM_std_minmax<int>/13                         7.38 ns         9.07 ns
BM_std_minmax<int>/14                         7.74 ns         9.79 ns
BM_std_minmax<int>/15                         8.53 ns         10.5 ns
BM_std_minmax<int>/16                         8.79 ns         11.2 ns
BM_std_minmax<int>/17                         9.63 ns         12.0 ns
BM_std_minmax<int>/18                         9.84 ns         12.7 ns
BM_std_minmax<int>/19                         10.6 ns         13.5 ns
BM_std_minmax<int>/20                         11.0 ns         14.3 ns
BM_std_minmax<int>/21                         11.7 ns         15.0 ns
BM_std_minmax<int>/22                         12.0 ns         15.7 ns
BM_std_minmax<int>/23                         13.1 ns         16.5 ns
BM_std_minmax<int>/24                         13.0 ns         17.3 ns
BM_std_minmax<int>/25                         13.7 ns         17.9 ns
BM_std_minmax<int>/26                         14.0 ns         18.6 ns
BM_std_minmax<int>/27                         14.8 ns         19.4 ns
BM_std_minmax<int>/28                         15.1 ns         20.3 ns
BM_std_minmax<int>/29                         15.8 ns         20.9 ns
BM_std_minmax<int>/30                         16.1 ns         21.7 ns
BM_std_minmax<int>/31                         16.9 ns         22.5 ns
BM_std_minmax<int>/32                         17.2 ns         3.40 ns
BM_std_minmax<int>/64                         33.9 ns         4.04 ns
BM_std_minmax<int>/512                         275 ns         14.6 ns
BM_std_minmax<int>/1024                        541 ns         27.5 ns
BM_std_minmax<int>/4000                       2093 ns         96.3 ns
BM_std_minmax<int>/4096                       2146 ns         98.3 ns
BM_std_minmax<int>/5500                       2866 ns          157 ns
BM_std_minmax<int>/64000                     33619 ns         1954 ns
BM_std_minmax<int>/65536                     34252 ns         2009 ns
BM_std_minmax<int>/70000                     36618 ns         2125 ns
BM_std_minmax<long long>/1                   0.709 ns         1.19 ns
BM_std_minmax<long long>/2                    1.01 ns         1.65 ns
BM_std_minmax<long long>/3                    2.14 ns         2.21 ns
BM_std_minmax<long long>/4                    2.45 ns         2.83 ns
BM_std_minmax<long long>/5                    3.09 ns         3.47 ns
BM_std_minmax<long long>/6                    3.44 ns         4.11 ns
BM_std_minmax<long long>/7                    4.16 ns         4.79 ns
BM_std_minmax<long long>/8                    4.54 ns         5.47 ns
BM_std_minmax<long long>/9                    5.37 ns         6.20 ns
BM_std_minmax<long long>/10                   5.71 ns         6.93 ns
BM_std_minmax<long long>/11                   6.00 ns         7.60 ns
BM_std_minmax<long long>/12                   6.43 ns         8.27 ns
BM_std_minmax<long long>/13                   7.01 ns         8.94 ns
BM_std_minmax<long long>/14                   7.45 ns         9.65 ns
BM_std_minmax<long long>/15                   8.16 ns         10.4 ns
BM_std_minmax<long long>/16                   8.46 ns         5.22 ns
BM_std_minmax<long long>/17                   9.16 ns         5.22 ns
BM_std_minmax<long long>/18                   9.53 ns         5.52 ns
BM_std_minmax<long long>/19                   10.2 ns         6.02 ns
BM_std_minmax<long long>/20                   10.5 ns         6.89 ns
BM_std_minmax<long long>/21                   11.3 ns         7.83 ns
BM_std_minmax<long long>/22                   11.6 ns         8.59 ns
BM_std_minmax<long long>/23                   12.3 ns         9.91 ns
BM_std_minmax<long long>/24                   12.6 ns         10.1 ns
BM_std_minmax<long long>/25                   13.2 ns         12.0 ns
BM_std_minmax<long long>/26                   13.6 ns         13.5 ns
BM_std_minmax<long long>/27                   14.2 ns         14.8 ns
BM_std_minmax<long long>/28                   14.7 ns         15.9 ns
BM_std_minmax<long long>/29                   15.3 ns         16.6 ns
BM_std_minmax<long long>/30                   15.8 ns         17.3 ns
BM_std_minmax<long long>/31                   16.3 ns         18.2 ns
BM_std_minmax<long long>/32                   16.7 ns         7.18 ns
BM_std_minmax<long long>/64                   33.1 ns         11.5 ns
BM_std_minmax<long long>/512                   268 ns         71.0 ns
BM_std_minmax<long long>/1024                  532 ns          138 ns
BM_std_minmax<long long>/4000                 2056 ns          533 ns
BM_std_minmax<long long>/4096                 2112 ns          539 ns
BM_std_minmax<long long>/5500                 2823 ns          749 ns
BM_std_minmax<long long>/64000               32956 ns         8590 ns
BM_std_minmax<long long>/65536               33795 ns         8791 ns
BM_std_minmax<long long>/70000               36084 ns         9442 ns
BM_std_minmax<unsigned char>/1               0.714 ns         1.41 ns
BM_std_minmax<unsigned char>/2               0.955 ns         1.96 ns
BM_std_minmax<unsigned char>/3                1.90 ns         2.63 ns
BM_std_minmax<unsigned char>/4                2.40 ns         3.34 ns
BM_std_minmax<unsigned char>/5                2.87 ns         4.10 ns
BM_std_minmax<unsigned char>/6                3.47 ns         4.88 ns
BM_std_minmax<unsigned char>/7                4.04 ns         5.66 ns
BM_std_minmax<unsigned char>/8                4.65 ns         6.45 ns
BM_std_minmax<unsigned char>/9                5.18 ns         7.24 ns
BM_std_minmax<unsigned char>/10               5.80 ns         8.05 ns
BM_std_minmax<unsigned char>/11               6.24 ns         8.86 ns
BM_std_minmax<unsigned char>/12               6.78 ns         9.70 ns
BM_std_minmax<unsigned char>/13               7.30 ns         10.6 ns
BM_std_minmax<unsigned char>/14               7.86 ns         11.4 ns
BM_std_minmax<unsigned char>/15               8.46 ns         12.3 ns
BM_std_minmax<unsigned char>/16               9.00 ns         2.12 ns
BM_std_minmax<unsigned char>/17               9.58 ns         2.83 ns
BM_std_minmax<unsigned char>/18               10.1 ns         3.37 ns
BM_std_minmax<unsigned char>/19               10.7 ns         4.11 ns
BM_std_minmax<unsigned char>/20               11.2 ns         4.85 ns
BM_std_minmax<unsigned char>/21               11.9 ns         5.69 ns
BM_std_minmax<unsigned char>/22               12.3 ns         6.77 ns
BM_std_minmax<unsigned char>/23               13.1 ns         7.56 ns
BM_std_minmax<unsigned char>/24               13.5 ns         8.40 ns
BM_std_minmax<unsigned char>/25               14.2 ns         9.30 ns
BM_std_minmax<unsigned char>/26               14.4 ns         10.1 ns
BM_std_minmax<unsigned char>/27               15.0 ns         11.1 ns
BM_std_minmax<unsigned char>/28               15.3 ns         11.9 ns
BM_std_minmax<unsigned char>/29               16.2 ns         12.9 ns
BM_std_minmax<unsigned char>/30               16.5 ns         13.9 ns
BM_std_minmax<unsigned char>/31               17.2 ns         14.8 ns
BM_std_minmax<unsigned char>/32               17.6 ns         2.36 ns
BM_std_minmax<unsigned char>/64               35.6 ns         3.21 ns
BM_std_minmax<unsigned char>/512               288 ns         6.00 ns
BM_std_minmax<unsigned char>/1024              573 ns         8.80 ns
BM_std_minmax<unsigned char>/4000             2222 ns         28.6 ns
BM_std_minmax<unsigned char>/4096             2265 ns         25.9 ns
BM_std_minmax<unsigned char>/5500             3047 ns         48.8 ns
BM_std_minmax<unsigned char>/64000           35059 ns          480 ns
BM_std_minmax<unsigned char>/65536           35941 ns          491 ns
BM_std_minmax<unsigned char>/70000           38922 ns          525 ns
BM_std_minmax<unsigned short>/1              0.711 ns         1.18 ns
BM_std_minmax<unsigned short>/2              0.957 ns         1.65 ns
BM_std_minmax<unsigned short>/3               2.13 ns         2.21 ns
BM_std_minmax<unsigned short>/4               2.14 ns         2.78 ns
BM_std_minmax<unsigned short>/5               3.06 ns         3.29 ns
BM_std_minmax<unsigned short>/6               2.89 ns         3.87 ns
BM_std_minmax<unsigned short>/7               3.80 ns         4.55 ns
BM_std_minmax<unsigned short>/8               3.68 ns         2.02 ns
BM_std_minmax<unsigned short>/9               4.53 ns         2.40 ns
BM_std_minmax<unsigned short>/10              4.60 ns         2.94 ns
BM_std_minmax<unsigned short>/11              5.67 ns         3.67 ns
BM_std_minmax<unsigned short>/12              5.39 ns         4.22 ns
BM_std_minmax<unsigned short>/13              6.58 ns         4.78 ns
BM_std_minmax<unsigned short>/14              6.33 ns         5.54 ns
BM_std_minmax<unsigned short>/15              7.34 ns         6.30 ns
BM_std_minmax<unsigned short>/16              7.17 ns         2.25 ns
BM_std_minmax<unsigned short>/17              8.19 ns         2.61 ns
BM_std_minmax<unsigned short>/18              8.02 ns         3.19 ns
BM_std_minmax<unsigned short>/19              9.03 ns         3.72 ns
BM_std_minmax<unsigned short>/20              8.89 ns         4.36 ns
BM_std_minmax<unsigned short>/21              9.77 ns         5.10 ns
BM_std_minmax<unsigned short>/22              9.70 ns         5.55 ns
BM_std_minmax<unsigned short>/23              10.8 ns         6.29 ns
BM_std_minmax<unsigned short>/24              10.6 ns         2.41 ns
BM_std_minmax<unsigned short>/25              11.6 ns         2.75 ns
BM_std_minmax<unsigned short>/26              11.4 ns         3.26 ns
BM_std_minmax<unsigned short>/27              12.4 ns         3.86 ns
BM_std_minmax<unsigned short>/28              12.3 ns         4.45 ns
BM_std_minmax<unsigned short>/29              13.2 ns         5.07 ns
BM_std_minmax<unsigned short>/30              13.1 ns         5.77 ns
BM_std_minmax<unsigned short>/31              13.9 ns         6.65 ns
BM_std_minmax<unsigned short>/32              13.9 ns         2.72 ns
BM_std_minmax<unsigned short>/64              27.8 ns         3.25 ns
BM_std_minmax<unsigned short>/512              220 ns         8.30 ns
BM_std_minmax<unsigned short>/1024             435 ns         14.1 ns
BM_std_minmax<unsigned short>/4000            1703 ns         49.8 ns
BM_std_minmax<unsigned short>/4096            1746 ns         47.9 ns
BM_std_minmax<unsigned short>/5500            2350 ns         69.9 ns
BM_std_minmax<unsigned short>/64000          27388 ns          953 ns
BM_std_minmax<unsigned short>/65536          28040 ns          975 ns
BM_std_minmax<unsigned short>/70000          29967 ns         1040 ns
BM_std_minmax<unsigned int>/1                0.712 ns         1.18 ns
BM_std_minmax<unsigned int>/2                0.965 ns         1.65 ns
BM_std_minmax<unsigned int>/3                 2.13 ns         2.14 ns
BM_std_minmax<unsigned int>/4                 2.09 ns         2.64 ns
BM_std_minmax<unsigned int>/5                 3.02 ns         3.21 ns
BM_std_minmax<unsigned int>/6                 2.94 ns         3.81 ns
BM_std_minmax<unsigned int>/7                 3.91 ns         4.38 ns
BM_std_minmax<unsigned int>/8                 3.75 ns         4.93 ns
BM_std_minmax<unsigned int>/9                 4.71 ns         5.60 ns
BM_std_minmax<unsigned int>/10                4.59 ns         6.26 ns
BM_std_minmax<unsigned int>/11                5.57 ns         6.80 ns
BM_std_minmax<unsigned int>/12                5.43 ns         7.47 ns
BM_std_minmax<unsigned int>/13                6.45 ns         8.10 ns
BM_std_minmax<unsigned int>/14                6.32 ns         8.69 ns
BM_std_minmax<unsigned int>/15                7.29 ns         9.37 ns
BM_std_minmax<unsigned int>/16                7.12 ns         9.99 ns
BM_std_minmax<unsigned int>/17                8.24 ns         10.6 ns
BM_std_minmax<unsigned int>/18                8.00 ns         11.2 ns
BM_std_minmax<unsigned int>/19                8.94 ns         12.0 ns
BM_std_minmax<unsigned int>/20                8.91 ns         12.6 ns
BM_std_minmax<unsigned int>/21                9.73 ns         17.2 ns
BM_std_minmax<unsigned int>/22                9.75 ns         13.8 ns
BM_std_minmax<unsigned int>/23                10.6 ns         14.5 ns
BM_std_minmax<unsigned int>/24                10.6 ns         15.1 ns
BM_std_minmax<unsigned int>/25                11.5 ns         15.7 ns
BM_std_minmax<unsigned int>/26                11.4 ns         16.3 ns
BM_std_minmax<unsigned int>/27                12.3 ns         17.0 ns
BM_std_minmax<unsigned int>/28                12.3 ns         17.6 ns
BM_std_minmax<unsigned int>/29                13.2 ns         18.3 ns
BM_std_minmax<unsigned int>/30                13.2 ns         19.0 ns
BM_std_minmax<unsigned int>/31                14.0 ns         19.6 ns
BM_std_minmax<unsigned int>/32                14.0 ns         3.39 ns
BM_std_minmax<unsigned int>/64                27.6 ns         4.05 ns
BM_std_minmax<unsigned int>/512                221 ns         14.2 ns
BM_std_minmax<unsigned int>/1024               439 ns         25.5 ns
BM_std_minmax<unsigned int>/4000              1720 ns         96.3 ns
BM_std_minmax<unsigned int>/4096              1762 ns         97.8 ns
BM_std_minmax<unsigned int>/5500              2364 ns          146 ns
BM_std_minmax<unsigned int>/64000            27874 ns         1905 ns
BM_std_minmax<unsigned int>/65536            28012 ns         1961 ns
BM_std_minmax<unsigned int>/70000            29899 ns         2087 ns
BM_std_minmax<unsigned long long>/1          0.707 ns         1.18 ns
BM_std_minmax<unsigned long long>/2          0.909 ns         1.65 ns
BM_std_minmax<unsigned long long>/3           1.65 ns         2.70 ns
BM_std_minmax<unsigned long long>/4           1.93 ns         2.69 ns
BM_std_minmax<unsigned long long>/5           2.45 ns         3.34 ns
BM_std_minmax<unsigned long long>/6           2.78 ns         3.81 ns
BM_std_minmax<unsigned long long>/7           3.28 ns         4.43 ns
BM_std_minmax<unsigned long long>/8           3.70 ns         4.92 ns
BM_std_minmax<unsigned long long>/9           4.12 ns         5.64 ns
BM_std_minmax<unsigned long long>/10          4.44 ns         6.15 ns
BM_std_minmax<unsigned long long>/11          4.91 ns         6.81 ns
BM_std_minmax<unsigned long long>/12          5.31 ns         7.41 ns
BM_std_minmax<unsigned long long>/13          5.72 ns         7.96 ns
BM_std_minmax<unsigned long long>/14          6.05 ns         8.66 ns
BM_std_minmax<unsigned long long>/15          6.55 ns         9.37 ns
BM_std_minmax<unsigned long long>/16          6.89 ns         7.98 ns
BM_std_minmax<unsigned long long>/17          7.34 ns         8.13 ns
BM_std_minmax<unsigned long long>/18          7.73 ns         8.42 ns
BM_std_minmax<unsigned long long>/19          8.26 ns         8.63 ns
BM_std_minmax<unsigned long long>/20          8.54 ns         8.96 ns
BM_std_minmax<unsigned long long>/21          9.14 ns         9.37 ns
BM_std_minmax<unsigned long long>/22          9.39 ns         9.67 ns
BM_std_minmax<unsigned long long>/23          10.1 ns         10.1 ns
BM_std_minmax<unsigned long long>/24          10.4 ns         10.6 ns
BM_std_minmax<unsigned long long>/25          11.0 ns         11.3 ns
BM_std_minmax<unsigned long long>/26          11.3 ns         12.1 ns
BM_std_minmax<unsigned long long>/27          11.8 ns         14.2 ns
BM_std_minmax<unsigned long long>/28          12.1 ns         15.8 ns
BM_std_minmax<unsigned long long>/29          12.6 ns         17.4 ns
BM_std_minmax<unsigned long long>/30          13.1 ns         18.1 ns
BM_std_minmax<unsigned long long>/31          13.4 ns         18.8 ns
BM_std_minmax<unsigned long long>/32          13.8 ns         10.4 ns
BM_std_minmax<unsigned long long>/64          27.3 ns         15.5 ns
BM_std_minmax<unsigned long long>/512          222 ns         80.6 ns
BM_std_minmax<unsigned long long>/1024         443 ns          156 ns
BM_std_minmax<unsigned long long>/4000        1731 ns          591 ns
BM_std_minmax<unsigned long long>/4096        1752 ns          609 ns
BM_std_minmax<unsigned long long>/5500        2340 ns          819 ns
BM_std_minmax<unsigned long long>/64000      27166 ns         9652 ns
BM_std_minmax<unsigned long long>/65536      27869 ns         9876 ns
BM_std_minmax<unsigned long long>/70000      29920 ns        10680 ns
```
2024-04-06 17:22:07 +02:00
Nikolas Klauser
985c1a44f8
[libc++] Optimize the two range overload of mismatch (#86853)
```
-----------------------------------------------------------------------------
Benchmark                                                 old             new
-----------------------------------------------------------------------------
bm_mismatch_two_range_overload<char>/1               0.941 ns         1.88 ns
bm_mismatch_two_range_overload<char>/2                1.43 ns         2.15 ns
bm_mismatch_two_range_overload<char>/3                1.95 ns         2.55 ns
bm_mismatch_two_range_overload<char>/4                2.58 ns         2.90 ns
bm_mismatch_two_range_overload<char>/5                3.75 ns         3.31 ns
bm_mismatch_two_range_overload<char>/6                5.00 ns         3.83 ns
bm_mismatch_two_range_overload<char>/7                5.59 ns         4.35 ns
bm_mismatch_two_range_overload<char>/8                6.37 ns         4.84 ns
bm_mismatch_two_range_overload<char>/16               11.8 ns         6.72 ns
bm_mismatch_two_range_overload<char>/64               45.5 ns         2.59 ns
bm_mismatch_two_range_overload<char>/512               366 ns         12.6 ns
bm_mismatch_two_range_overload<char>/4096             2890 ns         91.6 ns
bm_mismatch_two_range_overload<char>/32768           23038 ns          758 ns
bm_mismatch_two_range_overload<char>/262144         142813 ns         6573 ns
bm_mismatch_two_range_overload<char>/1048576        366679 ns        26710 ns
bm_mismatch_two_range_overload<short>/1              0.934 ns         1.88 ns
bm_mismatch_two_range_overload<short>/2               1.30 ns         2.58 ns
bm_mismatch_two_range_overload<short>/3               1.76 ns         3.28 ns
bm_mismatch_two_range_overload<short>/4               2.24 ns         3.98 ns
bm_mismatch_two_range_overload<short>/5               2.80 ns         4.92 ns
bm_mismatch_two_range_overload<short>/6               3.58 ns         6.01 ns
bm_mismatch_two_range_overload<short>/7               4.29 ns         7.03 ns
bm_mismatch_two_range_overload<short>/8               4.67 ns         7.39 ns
bm_mismatch_two_range_overload<short>/16              9.86 ns         13.1 ns
bm_mismatch_two_range_overload<short>/64              38.9 ns         4.55 ns
bm_mismatch_two_range_overload<short>/512              348 ns         27.7 ns
bm_mismatch_two_range_overload<short>/4096            2881 ns          225 ns
bm_mismatch_two_range_overload<short>/32768          23111 ns         1715 ns
bm_mismatch_two_range_overload<short>/262144        184846 ns        14416 ns
bm_mismatch_two_range_overload<short>/1048576       742885 ns        57264 ns
bm_mismatch_two_range_overload<int>/1                0.838 ns         1.19 ns
bm_mismatch_two_range_overload<int>/2                 1.19 ns         1.65 ns
bm_mismatch_two_range_overload<int>/3                 1.83 ns         2.06 ns
bm_mismatch_two_range_overload<int>/4                 2.38 ns         2.42 ns
bm_mismatch_two_range_overload<int>/5                 3.60 ns         2.47 ns
bm_mismatch_two_range_overload<int>/6                 3.68 ns         3.05 ns
bm_mismatch_two_range_overload<int>/7                 4.32 ns         3.36 ns
bm_mismatch_two_range_overload<int>/8                 5.18 ns         3.58 ns
bm_mismatch_two_range_overload<int>/16                10.6 ns         2.84 ns
bm_mismatch_two_range_overload<int>/64                39.0 ns         7.78 ns
bm_mismatch_two_range_overload<int>/512                247 ns         53.9 ns
bm_mismatch_two_range_overload<int>/4096              1927 ns          429 ns
bm_mismatch_two_range_overload<int>/32768            15569 ns         3393 ns
bm_mismatch_two_range_overload<int>/262144          125413 ns        28504 ns
bm_mismatch_two_range_overload<int>/1048576         504549 ns       112729 ns
```
2024-04-01 18:21:51 +02:00
Nikolas Klauser
beaff78528
[libc++] Optimize the std::mismatch tail (#83440)
This adds vectorization to the last 0-3 vectors and, if the range is
large enough, the remaining elements that don't fill a vector
completely.
```
-----------------------------------------------------------------------
Benchmark                           old    full vectors  partial vector
-----------------------------------------------------------------------
bm_mismatch<char>/1             1.40 ns         1.62 ns         2.09 ns
bm_mismatch<char>/2             1.88 ns         2.10 ns         2.33 ns
bm_mismatch<char>/3             2.67 ns         2.56 ns         2.72 ns
bm_mismatch<char>/4             3.01 ns         3.20 ns         3.70 ns
bm_mismatch<char>/5             3.51 ns         3.73 ns         3.64 ns
bm_mismatch<char>/6             4.71 ns         4.85 ns         4.37 ns
bm_mismatch<char>/7             5.12 ns         5.33 ns         4.37 ns
bm_mismatch<char>/8             5.79 ns         6.02 ns         4.75 ns
bm_mismatch<char>/15            9.20 ns         10.5 ns         7.23 ns
bm_mismatch<char>/16            10.2 ns         10.1 ns         7.46 ns
bm_mismatch<char>/17            10.2 ns         10.8 ns         7.57 ns
bm_mismatch<char>/31            17.6 ns         17.1 ns         10.8 ns
bm_mismatch<char>/32            17.4 ns         1.64 ns         1.64 ns
bm_mismatch<char>/33            23.3 ns         2.10 ns         2.33 ns
bm_mismatch<char>/63            31.8 ns         16.9 ns         2.33 ns
bm_mismatch<char>/64            32.6 ns         2.10 ns         2.10 ns
bm_mismatch<char>/65            33.6 ns         2.57 ns         2.80 ns
bm_mismatch<char>/127           67.3 ns         18.1 ns         3.27 ns
bm_mismatch<char>/128           2.17 ns         2.14 ns         2.57 ns
bm_mismatch<char>/129           2.36 ns         2.80 ns         3.27 ns
bm_mismatch<char>/255           67.5 ns         19.6 ns         4.68 ns
bm_mismatch<char>/256           3.76 ns         3.71 ns         3.97 ns
bm_mismatch<char>/257           3.77 ns         4.04 ns         4.43 ns
bm_mismatch<char>/511           70.8 ns         22.1 ns         7.47 ns
bm_mismatch<char>/512           7.27 ns         7.30 ns         6.95 ns
bm_mismatch<char>/513           7.11 ns         7.05 ns         6.96 ns
bm_mismatch<char>/1023          75.9 ns         27.4 ns         13.3 ns
bm_mismatch<char>/1024          13.9 ns         13.8 ns         12.4 ns
bm_mismatch<char>/1025          13.6 ns         13.6 ns         12.8 ns
bm_mismatch<char>/2047          87.3 ns         37.5 ns         25.4 ns
bm_mismatch<char>/2048          26.8 ns         27.4 ns         24.0 ns
bm_mismatch<char>/2049          26.7 ns         27.3 ns         25.5 ns
bm_mismatch<char>/4095           112 ns         64.7 ns         48.7 ns
bm_mismatch<char>/4096          53.0 ns         54.2 ns         46.8 ns
bm_mismatch<char>/4097          52.7 ns         54.2 ns         48.4 ns
bm_mismatch<char>/8191           160 ns          118 ns         98.4 ns
bm_mismatch<char>/8192           107 ns          108 ns         96.0 ns
bm_mismatch<char>/8193           106 ns          108 ns         97.2 ns
bm_mismatch<char>/16383          283 ns          234 ns          215 ns
bm_mismatch<char>/16384          227 ns          223 ns          217 ns
bm_mismatch<char>/16385          221 ns          221 ns          215 ns
bm_mismatch<char>/32767          547 ns          499 ns          488 ns
bm_mismatch<char>/32768          495 ns          492 ns          492 ns
bm_mismatch<char>/32769          491 ns          489 ns          488 ns
bm_mismatch<char>/65535         1028 ns          979 ns          971 ns
bm_mismatch<char>/65536          976 ns          970 ns          974 ns
bm_mismatch<char>/65537          970 ns          965 ns          971 ns
bm_mismatch<char>/131071        2031 ns         1948 ns         2005 ns
bm_mismatch<char>/131072        1973 ns         1955 ns         1974 ns
bm_mismatch<char>/131073        1989 ns         1932 ns         2001 ns
bm_mismatch<char>/262143        4469 ns         4244 ns         4223 ns
bm_mismatch<char>/262144        4443 ns         4183 ns         4243 ns
bm_mismatch<char>/262145        4400 ns         4232 ns         4246 ns
bm_mismatch<char>/524287       10169 ns         9733 ns         9592 ns
bm_mismatch<char>/524288       10154 ns         9664 ns         9843 ns
bm_mismatch<char>/524289       10113 ns         9641 ns        10003 ns
bm_mismatch<short>/1            1.86 ns         2.53 ns         2.32 ns
bm_mismatch<short>/2            2.57 ns         2.77 ns         2.55 ns
bm_mismatch<short>/3            3.26 ns         3.00 ns         2.79 ns
bm_mismatch<short>/4            3.95 ns         3.39 ns         3.15 ns
bm_mismatch<short>/5            4.83 ns         3.97 ns         3.72 ns
bm_mismatch<short>/6            5.43 ns         4.34 ns         4.03 ns
bm_mismatch<short>/7            6.11 ns         4.73 ns         4.44 ns
bm_mismatch<short>/8            6.84 ns         5.02 ns         4.79 ns
bm_mismatch<short>/15           11.5 ns         7.12 ns         6.50 ns
bm_mismatch<short>/16           13.9 ns         1.87 ns         2.11 ns
bm_mismatch<short>/17           14.0 ns         3.00 ns         2.47 ns
bm_mismatch<short>/31           23.1 ns         7.87 ns         2.47 ns
bm_mismatch<short>/32           23.8 ns         2.57 ns         2.81 ns
bm_mismatch<short>/33           24.5 ns         3.70 ns         2.94 ns
bm_mismatch<short>/63           44.8 ns         9.37 ns         3.46 ns
bm_mismatch<short>/64           2.32 ns         2.57 ns         2.64 ns
bm_mismatch<short>/65           2.52 ns         3.02 ns         3.51 ns
bm_mismatch<short>/127          45.6 ns         9.97 ns         5.18 ns
bm_mismatch<short>/128          3.85 ns         3.93 ns         3.94 ns
bm_mismatch<short>/129          3.82 ns         4.20 ns         4.70 ns
bm_mismatch<short>/255          50.4 ns         12.6 ns         8.07 ns
bm_mismatch<short>/256          7.23 ns         6.91 ns         6.98 ns
bm_mismatch<short>/257          7.24 ns         7.19 ns         7.55 ns
bm_mismatch<short>/511          52.3 ns         17.8 ns         14.0 ns
bm_mismatch<short>/512          13.6 ns         13.7 ns         13.6 ns
bm_mismatch<short>/513          13.9 ns         13.8 ns         18.5 ns
bm_mismatch<short>/1023         60.9 ns         30.9 ns         26.3 ns
bm_mismatch<short>/1024         26.7 ns         27.7 ns         25.7 ns
bm_mismatch<short>/1025         27.7 ns         27.6 ns         25.3 ns
bm_mismatch<short>/2047         88.4 ns         58.0 ns         51.6 ns
bm_mismatch<short>/2048         52.8 ns         55.3 ns         50.6 ns
bm_mismatch<short>/2049         55.2 ns         54.8 ns         48.7 ns
bm_mismatch<short>/4095          153 ns          113 ns          102 ns
bm_mismatch<short>/4096          105 ns          110 ns          101 ns
bm_mismatch<short>/4097          110 ns          110 ns         99.1 ns
bm_mismatch<short>/8191          277 ns          219 ns          206 ns
bm_mismatch<short>/8192          226 ns          214 ns          250 ns
bm_mismatch<short>/8193          226 ns          207 ns          208 ns
bm_mismatch<short>/16383         519 ns          492 ns          488 ns
bm_mismatch<short>/16384         494 ns          492 ns          492 ns
bm_mismatch<short>/16385         492 ns          488 ns          489 ns
bm_mismatch<short>/32767        1007 ns          968 ns          964 ns
bm_mismatch<short>/32768         977 ns          972 ns          970 ns
bm_mismatch<short>/32769         972 ns          962 ns          967 ns
bm_mismatch<short>/65535        1978 ns         1918 ns         1956 ns
bm_mismatch<short>/65536        1940 ns         1927 ns         1970 ns
bm_mismatch<short>/65537        1937 ns         1922 ns         1959 ns
bm_mismatch<short>/131071       4524 ns         4193 ns         4304 ns
bm_mismatch<short>/131072       4445 ns         4196 ns         4306 ns
bm_mismatch<short>/131073       4452 ns         4278 ns         4311 ns
bm_mismatch<short>/262143       9801 ns        10188 ns         9634 ns
bm_mismatch<short>/262144       9738 ns        10151 ns         9651 ns
bm_mismatch<short>/262145       9716 ns        10171 ns         9715 ns
bm_mismatch<short>/524287      19944 ns        20718 ns        20044 ns
bm_mismatch<short>/524288      21139 ns        20647 ns        20008 ns
bm_mismatch<short>/524289      21162 ns        19512 ns        20068 ns
bm_mismatch<int>/1              1.40 ns         1.84 ns         1.87 ns
bm_mismatch<int>/2              1.87 ns         2.08 ns         2.09 ns
bm_mismatch<int>/3              2.36 ns         2.31 ns         2.87 ns
bm_mismatch<int>/4              3.06 ns         2.72 ns         2.95 ns
bm_mismatch<int>/5              3.66 ns         3.37 ns         3.42 ns
bm_mismatch<int>/6              4.55 ns         3.65 ns         3.73 ns
bm_mismatch<int>/7              5.03 ns         3.93 ns         3.94 ns
bm_mismatch<int>/8              5.67 ns         1.86 ns         1.87 ns
bm_mismatch<int>/15             9.89 ns         4.41 ns         2.34 ns
bm_mismatch<int>/16             10.1 ns         2.33 ns         2.34 ns
bm_mismatch<int>/17             10.2 ns         3.34 ns         2.86 ns
bm_mismatch<int>/31             17.2 ns         5.54 ns         3.28 ns
bm_mismatch<int>/32             2.16 ns         2.15 ns         2.58 ns
bm_mismatch<int>/33             2.36 ns         3.01 ns         3.28 ns
bm_mismatch<int>/63             17.7 ns         6.50 ns         4.93 ns
bm_mismatch<int>/64             3.81 ns         3.58 ns         3.90 ns
bm_mismatch<int>/65             3.74 ns         4.36 ns         4.45 ns
bm_mismatch<int>/127            19.5 ns         9.56 ns         7.74 ns
bm_mismatch<int>/128            7.30 ns         6.41 ns         6.85 ns
bm_mismatch<int>/129            7.09 ns         7.04 ns         7.06 ns
bm_mismatch<int>/255            24.7 ns         14.8 ns         13.3 ns
bm_mismatch<int>/256            14.0 ns         12.1 ns         12.3 ns
bm_mismatch<int>/257            13.8 ns         12.7 ns         12.8 ns
bm_mismatch<int>/511            34.3 ns         26.3 ns         24.8 ns
bm_mismatch<int>/512            27.6 ns         23.6 ns         23.9 ns
bm_mismatch<int>/513            27.3 ns         24.4 ns         25.1 ns
bm_mismatch<int>/1023           62.5 ns         50.9 ns         48.3 ns
bm_mismatch<int>/1024           54.4 ns         46.1 ns         46.6 ns
bm_mismatch<int>/1025           54.2 ns         48.4 ns         47.5 ns
bm_mismatch<int>/2047            116 ns         97.8 ns         94.1 ns
bm_mismatch<int>/2048            108 ns         92.6 ns         92.4 ns
bm_mismatch<int>/2049            108 ns          104 ns         94.0 ns
bm_mismatch<int>/4095            233 ns          222 ns          205 ns
bm_mismatch<int>/4096            226 ns          223 ns          225 ns
bm_mismatch<int>/4097            221 ns          219 ns          210 ns
bm_mismatch<int>/8191            499 ns          485 ns          488 ns
bm_mismatch<int>/8192            496 ns          490 ns          495 ns
bm_mismatch<int>/8193            491 ns          485 ns          488 ns
bm_mismatch<int>/16383           982 ns          962 ns          964 ns
bm_mismatch<int>/16384           974 ns          971 ns          971 ns
bm_mismatch<int>/16385           971 ns          961 ns          968 ns
bm_mismatch<int>/32767          2003 ns         1959 ns         1920 ns
bm_mismatch<int>/32768          1996 ns         1947 ns         1928 ns
bm_mismatch<int>/32769          1990 ns         1945 ns         1926 ns
bm_mismatch<int>/65535          4434 ns         4275 ns         4312 ns
bm_mismatch<int>/65536          4437 ns         4267 ns         4321 ns
bm_mismatch<int>/65537          4442 ns         4261 ns         4321 ns
bm_mismatch<int>/131071         9673 ns         9648 ns         9465 ns
bm_mismatch<int>/131072         9667 ns         9671 ns         9465 ns
bm_mismatch<int>/131073         9661 ns         9653 ns         9464 ns
bm_mismatch<int>/262143        20595 ns        19605 ns        19064 ns
bm_mismatch<int>/262144        19894 ns        19572 ns        19009 ns
bm_mismatch<int>/262145        19851 ns        19656 ns        18999 ns
bm_mismatch<int>/524287        39556 ns        39364 ns        38131 ns
bm_mismatch<int>/524288        39678 ns        39573 ns        38183 ns
bm_mismatch<int>/524289        40168 ns        39301 ns        38121 ns
```
2024-03-29 19:29:54 +01:00
Nikolas Klauser
b68e2eba0b
[libc++] Vectorize mismatch (#73255)
```
---------------------------------------------------
Benchmark                           old         new
---------------------------------------------------
bm_mismatch<char>/1           0.835 ns      2.37 ns
bm_mismatch<char>/2            1.44 ns      2.60 ns
bm_mismatch<char>/3            2.06 ns      2.83 ns
bm_mismatch<char>/4            2.60 ns      3.29 ns
bm_mismatch<char>/5            3.15 ns      3.77 ns
bm_mismatch<char>/6            3.82 ns      4.17 ns
bm_mismatch<char>/7            4.29 ns      4.52 ns
bm_mismatch<char>/8            4.78 ns      4.86 ns
bm_mismatch<char>/16           9.06 ns      7.54 ns
bm_mismatch<char>/64           31.7 ns      19.1 ns
bm_mismatch<char>/512           249 ns      8.16 ns
bm_mismatch<char>/4096         1956 ns      44.2 ns
bm_mismatch<char>/32768       15498 ns       501 ns
bm_mismatch<char>/262144     123965 ns      4479 ns
bm_mismatch<char>/1048576    495668 ns     21306 ns
bm_mismatch<short>/1          0.710 ns      2.12 ns
bm_mismatch<short>/2           1.03 ns      2.66 ns
bm_mismatch<short>/3           1.29 ns      3.56 ns
bm_mismatch<short>/4           1.68 ns      4.29 ns
bm_mismatch<short>/5           1.96 ns      5.18 ns
bm_mismatch<short>/6           2.59 ns      5.91 ns
bm_mismatch<short>/7           2.86 ns      6.63 ns
bm_mismatch<short>/8           3.19 ns      7.33 ns
bm_mismatch<short>/16          5.48 ns      13.0 ns
bm_mismatch<short>/64          16.6 ns      4.06 ns
bm_mismatch<short>/512          130 ns      13.8 ns
bm_mismatch<short>/4096         985 ns      93.8 ns
bm_mismatch<short>/32768       7846 ns      1002 ns
bm_mismatch<short>/262144     63217 ns     10637 ns
bm_mismatch<short>/1048576   251782 ns     42471 ns
bm_mismatch<int>/1            0.716 ns      1.91 ns
bm_mismatch<int>/2             1.21 ns      2.49 ns
bm_mismatch<int>/3             1.38 ns      3.46 ns
bm_mismatch<int>/4             1.71 ns      4.04 ns
bm_mismatch<int>/5             2.00 ns      4.98 ns
bm_mismatch<int>/6             2.43 ns      5.67 ns
bm_mismatch<int>/7             3.05 ns      6.38 ns
bm_mismatch<int>/8             3.22 ns      7.09 ns
bm_mismatch<int>/16            5.18 ns      12.8 ns
bm_mismatch<int>/64            16.6 ns      5.28 ns
bm_mismatch<int>/512            129 ns      25.2 ns
bm_mismatch<int>/4096          1009 ns       201 ns
bm_mismatch<int>/32768         7776 ns      2144 ns
bm_mismatch<int>/262144       62371 ns     20551 ns
bm_mismatch<int>/1048576     254750 ns     90097 ns
```
2024-03-23 15:28:22 +01:00
Nikolas Klauser
07b18c5e1b
[libc++] Optimize ranges::fill{,_n} for vector<bool>::iterator (#84642)
```
------------------------------------------------------
Benchmark                          old             new
------------------------------------------------------
bm_ranges_fill_n/1             1.64 ns         3.06 ns
bm_ranges_fill_n/2             3.45 ns         3.06 ns
bm_ranges_fill_n/3             4.88 ns         3.06 ns
bm_ranges_fill_n/4             6.46 ns         3.06 ns
bm_ranges_fill_n/5             8.03 ns         3.06 ns
bm_ranges_fill_n/6             9.65 ns         3.07 ns
bm_ranges_fill_n/7             11.5 ns         3.06 ns
bm_ranges_fill_n/8             13.0 ns         3.06 ns
bm_ranges_fill_n/16            25.9 ns         3.06 ns
bm_ranges_fill_n/64             103 ns         4.62 ns
bm_ranges_fill_n/512            711 ns         4.40 ns
bm_ranges_fill_n/4096          5642 ns         9.86 ns
bm_ranges_fill_n/32768        45135 ns         33.6 ns
bm_ranges_fill_n/262144      360818 ns          243 ns
bm_ranges_fill_n/1048576    1442828 ns          982 ns
bm_ranges_fill/1               1.63 ns         3.17 ns
bm_ranges_fill/2               3.43 ns         3.28 ns
bm_ranges_fill/3               4.97 ns         3.31 ns
bm_ranges_fill/4               6.53 ns         3.27 ns
bm_ranges_fill/5               8.12 ns         3.33 ns
bm_ranges_fill/6               9.76 ns         3.32 ns
bm_ranges_fill/7               11.6 ns         3.29 ns
bm_ranges_fill/8               13.2 ns         3.26 ns
bm_ranges_fill/16              26.3 ns         3.26 ns
bm_ranges_fill/64               104 ns         4.92 ns
bm_ranges_fill/512              716 ns         4.47 ns
bm_ranges_fill/4096            5772 ns         8.21 ns
bm_ranges_fill/32768          45778 ns         33.1 ns
bm_ranges_fill/262144        351422 ns          241 ns
bm_ranges_fill/1048576      1404710 ns          965 ns
```
2024-03-17 20:00:54 +01:00
ZijunZhaoCCK
fdd089b500
[libc++] Implement ranges::contains (#65148)
Differential Revision: https://reviews.llvm.org/D159232
```
Running ./ranges_contains.libcxx.out
Run on (10 X 24.121 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x10)
  L1 Instruction 128 KiB (x10)
  L2 Unified 4096 KiB (x5)
Load Average: 3.37, 6.77, 5.27
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
bm_contains_char/16             1.88 ns         1.87 ns    371607095
bm_contains_char/256            7.48 ns         7.47 ns     93292285
bm_contains_char/4096           99.7 ns         99.6 ns      7013185
bm_contains_char/65536          1296 ns         1294 ns       540436
bm_contains_char/1048576       23887 ns        23860 ns        29302
bm_contains_char/16777216     389420 ns       389095 ns         1796
bm_contains_int/16              7.14 ns         7.14 ns     97776288
bm_contains_int/256             90.4 ns         90.3 ns      7558089
bm_contains_int/4096            1294 ns         1290 ns       543052
bm_contains_int/65536          20482 ns        20443 ns        34334
bm_contains_int/1048576       328817 ns       327965 ns         2147
bm_contains_int/16777216     5246279 ns      5239361 ns          133
bm_contains_bool/16             2.19 ns         2.19 ns    322565780
bm_contains_bool/256            3.42 ns         3.41 ns    205025467
bm_contains_bool/4096           22.1 ns         22.1 ns     31780479
bm_contains_bool/65536           333 ns          332 ns      2106606
bm_contains_bool/1048576        5126 ns         5119 ns       135901
bm_contains_bool/16777216      81656 ns        81574 ns         8569
```

---------

Co-authored-by: Nathan Gauër <brioche@google.com>
2023-12-19 16:34:19 -08:00
Nikolas Klauser
f7407411a1
[libc++] Optimize std::find for segmented iterators (#67224)
```
--------------------------------------------------------------------------
Benchmark                                              old             new
--------------------------------------------------------------------------
bm_find<std::deque<char>>/1                        6.06 ns         10.6 ns
bm_find<std::deque<char>>/2                        15.5 ns         10.6 ns
bm_find<std::deque<char>>/3                        19.0 ns         10.6 ns
bm_find<std::deque<char>>/4                        20.8 ns         10.6 ns
bm_find<std::deque<char>>/5                        22.0 ns         10.6 ns
bm_find<std::deque<char>>/6                        23.0 ns         10.5 ns
bm_find<std::deque<char>>/7                        24.8 ns         10.7 ns
bm_find<std::deque<char>>/8                        25.7 ns         10.6 ns
bm_find<std::deque<char>>/16                       28.3 ns         10.6 ns
bm_find<std::deque<char>>/64                       44.2 ns         27.0 ns
bm_find<std::deque<char>>/512                       133 ns         37.6 ns
bm_find<std::deque<char>>/4096                      867 ns         53.1 ns
bm_find<std::deque<char>>/32768                    6838 ns          160 ns
bm_find<std::deque<char>>/262144                  52897 ns         1495 ns
bm_find<std::deque<char>>/1048576                215621 ns         6077 ns
bm_find<std::deque<short>>/1                       6.03 ns         6.28 ns
bm_find<std::deque<short>>/2                       15.8 ns         15.8 ns
bm_find<std::deque<short>>/3                       20.5 ns         20.3 ns
bm_find<std::deque<short>>/4                       21.0 ns         21.0 ns
bm_find<std::deque<short>>/5                       23.0 ns         22.1 ns
bm_find<std::deque<short>>/6                       22.6 ns         23.0 ns
bm_find<std::deque<short>>/7                       23.4 ns         23.7 ns
bm_find<std::deque<short>>/8                       24.4 ns         24.9 ns
bm_find<std::deque<short>>/16                      26.6 ns         27.2 ns
bm_find<std::deque<short>>/64                      43.2 ns         40.9 ns
bm_find<std::deque<short>>/512                      124 ns         90.7 ns
bm_find<std::deque<short>>/4096                     845 ns          525 ns
bm_find<std::deque<short>>/32768                   7273 ns         3194 ns
bm_find<std::deque<short>>/262144                 53710 ns        24385 ns
bm_find<std::deque<short>>/1048576               216086 ns        96195 ns
bm_find<std::deque<int>>/1                         6.03 ns         10.3 ns
bm_find<std::deque<int>>/2                         15.6 ns         10.3 ns
bm_find<std::deque<int>>/3                         19.1 ns         10.3 ns
bm_find<std::deque<int>>/4                         22.3 ns         10.3 ns
bm_find<std::deque<int>>/5                         23.5 ns         10.4 ns
bm_find<std::deque<int>>/6                         23.1 ns         10.3 ns
bm_find<std::deque<int>>/7                         23.7 ns         10.2 ns
bm_find<std::deque<int>>/8                         24.5 ns         10.2 ns
bm_find<std::deque<int>>/16                        27.9 ns         26.6 ns
bm_find<std::deque<int>>/64                        42.6 ns         32.2 ns
bm_find<std::deque<int>>/512                        123 ns         43.0 ns
bm_find<std::deque<int>>/4096                       874 ns         93.5 ns
bm_find<std::deque<int>>/32768                     7031 ns          751 ns
bm_find<std::deque<int>>/262144                   57723 ns         6169 ns
bm_find<std::deque<int>>/1048576                 230867 ns        35851 ns
bm_ranges_find<std::deque<char>>/1                 5.97 ns         10.6 ns
bm_ranges_find<std::deque<char>>/2                 16.0 ns         10.5 ns
bm_ranges_find<std::deque<char>>/3                 19.5 ns         10.5 ns
bm_ranges_find<std::deque<char>>/4                 21.1 ns         10.6 ns
bm_ranges_find<std::deque<char>>/5                 22.8 ns         10.5 ns
bm_ranges_find<std::deque<char>>/6                 22.8 ns         10.6 ns
bm_ranges_find<std::deque<char>>/7                 23.4 ns         10.8 ns
bm_ranges_find<std::deque<char>>/8                 24.1 ns         10.5 ns
bm_ranges_find<std::deque<char>>/16                26.9 ns         10.6 ns
bm_ranges_find<std::deque<char>>/64                50.2 ns         27.2 ns
bm_ranges_find<std::deque<char>>/512                126 ns         38.3 ns
bm_ranges_find<std::deque<char>>/4096               868 ns         53.8 ns
bm_ranges_find<std::deque<char>>/32768             6695 ns          161 ns
bm_ranges_find<std::deque<char>>/262144           54411 ns         1497 ns
bm_ranges_find<std::deque<char>>/1048576         241699 ns         6042 ns
bm_ranges_find<std::deque<short>>/1                6.39 ns         6.31 ns
bm_ranges_find<std::deque<short>>/2                15.8 ns         15.9 ns
bm_ranges_find<std::deque<short>>/3                19.0 ns         19.8 ns
bm_ranges_find<std::deque<short>>/4                20.8 ns         20.9 ns
bm_ranges_find<std::deque<short>>/5                21.8 ns         22.1 ns
bm_ranges_find<std::deque<short>>/6                23.0 ns         23.0 ns
bm_ranges_find<std::deque<short>>/7                23.2 ns         23.9 ns
bm_ranges_find<std::deque<short>>/8                23.7 ns         24.4 ns
bm_ranges_find<std::deque<short>>/16               26.6 ns         26.8 ns
bm_ranges_find<std::deque<short>>/64               43.4 ns         39.7 ns
bm_ranges_find<std::deque<short>>/512               131 ns         90.5 ns
bm_ranges_find<std::deque<short>>/4096              851 ns          523 ns
bm_ranges_find<std::deque<short>>/32768            7370 ns         3166 ns
bm_ranges_find<std::deque<short>>/262144          60778 ns        24814 ns
bm_ranges_find<std::deque<short>>/1048576        229288 ns        99273 ns
bm_ranges_find<std::deque<int>>/1                  6.43 ns         10.2 ns
bm_ranges_find<std::deque<int>>/2                  16.6 ns         10.2 ns
bm_ranges_find<std::deque<int>>/3                  19.6 ns         10.2 ns
bm_ranges_find<std::deque<int>>/4                  21.0 ns         10.2 ns
bm_ranges_find<std::deque<int>>/5                  21.9 ns         10.4 ns
bm_ranges_find<std::deque<int>>/6                  22.7 ns         10.2 ns
bm_ranges_find<std::deque<int>>/7                  23.9 ns         10.2 ns
bm_ranges_find<std::deque<int>>/8                  23.8 ns         10.2 ns
bm_ranges_find<std::deque<int>>/16                 27.2 ns         27.1 ns
bm_ranges_find<std::deque<int>>/64                 42.4 ns         32.4 ns
bm_ranges_find<std::deque<int>>/512                 122 ns         43.0 ns
bm_ranges_find<std::deque<int>>/4096                895 ns         93.7 ns
bm_ranges_find<std::deque<int>>/32768              6890 ns          756 ns
bm_ranges_find<std::deque<int>>/262144            54025 ns         6102 ns
bm_ranges_find<std::deque<int>>/1048576          221558 ns        32783 ns
```
2023-12-15 17:10:16 +01:00
Nikolas Klauser
c81bfc61da [libc++] Optimize for_each for segmented iterators
```
---------------------------------------------------
Benchmark                       old             new
---------------------------------------------------
bm_for_each/1               3.00 ns         2.98 ns
bm_for_each/2               4.53 ns         4.57 ns
bm_for_each/3               5.82 ns         5.82 ns
bm_for_each/4               6.94 ns         6.91 ns
bm_for_each/5               7.55 ns         7.75 ns
bm_for_each/6               7.06 ns         7.45 ns
bm_for_each/7               6.69 ns         7.14 ns
bm_for_each/8               6.86 ns         4.06 ns
bm_for_each/16              11.5 ns         5.73 ns
bm_for_each/64              43.7 ns         4.06 ns
bm_for_each/512              356 ns         7.98 ns
bm_for_each/4096            2787 ns         53.6 ns
bm_for_each/32768          20836 ns          438 ns
bm_for_each/262144        195362 ns         4945 ns
bm_for_each/1048576       685482 ns        19822 ns
```

Reviewed By: ldionne, Mordante, #libc

Spies: bgraur, sberg, arichardson, libcxx-commits

Differential Revision: https://reviews.llvm.org/D151274
2023-11-14 23:55:24 +01:00
Nikolas Klauser
a9138cdb36 [libc++] Optimize ranges::count for __bit_iterators
```
---------------------------------------------------------------
Benchmark                                    old            new
---------------------------------------------------------------
bm_vector_bool_count/1                   1.92 ns        1.92 ns
bm_vector_bool_count/2                   1.92 ns        1.92 ns
bm_vector_bool_count/3                   1.92 ns        1.92 ns
bm_vector_bool_count/4                   1.92 ns        1.92 ns
bm_vector_bool_count/5                   1.92 ns        1.92 ns
bm_vector_bool_count/6                   1.92 ns        1.92 ns
bm_vector_bool_count/7                   1.92 ns        1.92 ns
bm_vector_bool_count/8                   1.92 ns        1.92 ns
bm_vector_bool_count/16                  1.92 ns        1.92 ns
bm_vector_bool_count/64                  2.24 ns        2.25 ns
bm_vector_bool_count/512                 3.19 ns        3.20 ns
bm_vector_bool_count/4096                14.1 ns        12.3 ns
bm_vector_bool_count/32768               84.0 ns        83.6 ns
bm_vector_bool_count/262144               664 ns         661 ns
bm_vector_bool_count/1048576             2623 ns        2628 ns
bm_vector_bool_ranges_count/1            1.07 ns        1.92 ns
bm_vector_bool_ranges_count/2            1.65 ns        1.92 ns
bm_vector_bool_ranges_count/3            2.27 ns        1.92 ns
bm_vector_bool_ranges_count/4            2.68 ns        1.92 ns
bm_vector_bool_ranges_count/5            3.33 ns        1.92 ns
bm_vector_bool_ranges_count/6            3.99 ns        1.92 ns
bm_vector_bool_ranges_count/7            4.67 ns        1.92 ns
bm_vector_bool_ranges_count/8            5.19 ns        1.92 ns
bm_vector_bool_ranges_count/16           11.1 ns        1.92 ns
bm_vector_bool_ranges_count/64           52.2 ns        2.24 ns
bm_vector_bool_ranges_count/512           452 ns        3.20 ns
bm_vector_bool_ranges_count/4096         3577 ns        12.1 ns
bm_vector_bool_ranges_count/32768       28725 ns        83.7 ns
bm_vector_bool_ranges_count/262144     229676 ns         662 ns
bm_vector_bool_ranges_count/1048576    905574 ns        2625 ns
```

Reviewed By: #libc, ldionne

Spies: arichardson, ldionne, libcxx-commits

Differential Revision: https://reviews.llvm.org/D156956
2023-10-06 22:58:41 +02:00
Zijun Zhao
0218ea4aaa [libc++] Implement ranges::ends_with
Reviewed By: #libc, var-const

Differential Revision: https://reviews.llvm.org/D150831
2023-09-18 11:56:10 -07:00
Nikolas Klauser
68b1035965 [libc++][PSTL] Add a __parallel_sort implementation to libdispatch
Reviewed By: #libc, ldionne

Spies: ldionne, libcxx-commits

Differential Revision: https://reviews.llvm.org/D155136
2023-08-15 12:20:40 -07:00
Nikolas Klauser
8670b53e11 [libc++] Optimize ranges::find for vector<bool>
Benchmark results:
```
----------------------------------------------------------------
Benchmark                                    old             new
----------------------------------------------------------------
bm_vector_bool_ranges_find/1             5.64 ns         6.08 ns
bm_vector_bool_ranges_find/2             16.5 ns         6.03 ns
bm_vector_bool_ranges_find/3             20.3 ns         6.07 ns
bm_vector_bool_ranges_find/4             22.2 ns         6.08 ns
bm_vector_bool_ranges_find/5             23.5 ns         6.05 ns
bm_vector_bool_ranges_find/6             24.4 ns         6.10 ns
bm_vector_bool_ranges_find/7             26.7 ns         6.10 ns
bm_vector_bool_ranges_find/8             25.0 ns         6.08 ns
bm_vector_bool_ranges_find/16            27.9 ns         6.07 ns
bm_vector_bool_ranges_find/64            44.5 ns         5.35 ns
bm_vector_bool_ranges_find/512            243 ns         25.7 ns
bm_vector_bool_ranges_find/4096          1858 ns         35.6 ns
bm_vector_bool_ranges_find/32768        15461 ns         93.5 ns
bm_vector_bool_ranges_find/262144      126462 ns          571 ns
bm_vector_bool_ranges_find/1048576     497736 ns         2272 ns
```

Reviewed By: #libc, Mordante

Spies: var-const, Mordante, libcxx-commits

Differential Revision: https://reviews.llvm.org/D156039
2023-08-01 10:28:25 -07:00
Louis Dionne
5aa03b648b [libc++][NFC] Apply clang-format on large parts of the code base
This commit does a pass of clang-format over files in libc++ that
don't require major changes to conform to our style guide, or for
which we're not overly concerned about conflicting with in-flight
patches or hindering the git blame.

This roughly covers:
- benchmarks
- range algorithms
- concepts
- type traits

I did a manual verification of all the changes, and in particular I
applied clang-format on/off annotations in a few places where the
result was less readable after than before. This was not necessary
in a lot of places, however I did find that clang-format had pretty
bad taste when it comes to formatting concepts.

Differential Revision: https://reviews.llvm.org/D153140
2023-06-19 11:19:51 -04:00
Nikolas Klauser
d965960fcf Revert "[libc++] Optimize for_each for segmented iterators"
This reverts commit b1dc43aa3a05c2f14725e2e6428544208ccbe161.
2023-06-05 10:00:02 -07:00
Nikolas Klauser
b1dc43aa3a [libc++] Optimize for_each for segmented iterators
```
---------------------------------------------------
Benchmark                       old             new
---------------------------------------------------
bm_for_each/1               3.00 ns         2.98 ns
bm_for_each/2               4.53 ns         4.57 ns
bm_for_each/3               5.82 ns         5.82 ns
bm_for_each/4               6.94 ns         6.91 ns
bm_for_each/5               7.55 ns         7.75 ns
bm_for_each/6               7.06 ns         7.45 ns
bm_for_each/7               6.69 ns         7.14 ns
bm_for_each/8               6.86 ns         4.06 ns
bm_for_each/16              11.5 ns         5.73 ns
bm_for_each/64              43.7 ns         4.06 ns
bm_for_each/512              356 ns         7.98 ns
bm_for_each/4096            2787 ns         53.6 ns
bm_for_each/32768          20836 ns          438 ns
bm_for_each/262144        195362 ns         4945 ns
bm_for_each/1048576       685482 ns        19822 ns
```

Reviewed By: ldionne, Mordante, #libc

Spies: arichardson, libcxx-commits

Differential Revision: https://reviews.llvm.org/D151274
2023-05-31 18:15:25 -07:00
Nikolas Klauser
1fd08edd58 [libc++] Forward to std::{,w}memchr in std::find
Reviewed By: #libc, ldionne

Spies: Mordante, libcxx-commits, ldionne, mikhail.ramalho

Differential Revision: https://reviews.llvm.org/D144394
2023-05-25 07:59:50 -07:00
Nikolas Klauser
aff3cdc604 [libc++] Optimize std::ranges::{min, max} for types that are cheap to copy
Don't forward to `min_element` for small types that are trivially copyable, and instead use a naive loop that keeps track of the smallest element (as opposed to an iterator to the smallest element). This allows the compiler to vectorize the loop in some cases.

Reviewed By: #libc, ldionne

Spies: ldionne, libcxx-commits

Differential Revision: https://reviews.llvm.org/D143596
2023-03-11 16:28:24 +01:00
Nikolas Klauser
b4ecfd3c46 [libc++] Forward to std::memcmp for trivially comparable types in equal
Reviewed By: #libc, ldionne

Spies: ldionne, Mordante, libcxx-commits

Differential Revision: https://reviews.llvm.org/D139554
2023-02-21 17:11:21 +01:00
Louis Dionne
355e0ce3c5 [libc++] Extend check for non-ASCII characters to src/, test/ and benchmarks/
Differential Revision: https://reviews.llvm.org/D132180
2022-08-23 18:36:38 -04:00
Konstantin Varlamov
c945bd0da6 [libc++][ranges] Implement modifying heap algorithms:
- `ranges::make_heap`;
- `ranges::push_heap`;
- `ranges::pop_heap`;
- `ranges::sort_heap`.

Differential Revision: https://reviews.llvm.org/D128115
2022-07-08 13:48:41 -07:00
Konstantin Varlamov
94c7b89fe5 [libc++][ranges] Implement ranges::stable_sort.
Differential Revision: https://reviews.llvm.org/D127834
2022-07-01 16:34:26 -07:00
Konstantin Varlamov
ff3989e6ae [libc++][ranges] Implement ranges::sort.
Differential Revision: https://reviews.llvm.org/D127557
2022-06-16 15:21:06 -07:00
Nikolas Klauser
8171586176 [libc++][ranges] Implement ranges::binary_search and ranges::{lower, upper}_bound
Reviewed By: Mordante, var-const, ldionne, #libc

Spies: sstefan1, libcxx-commits, mgorny

Differential Revision: https://reviews.llvm.org/D121964
2022-06-06 13:33:18 +02:00
Nikolas Klauser
f94a447679 [libc++] Granularize algorithm benchmarks
Reviewed By: ldionne, #libc

Spies: libcxx-commits, mgorny, mgrang

Differential Revision: https://reviews.llvm.org/D124740
2022-05-19 16:13:52 +02:00