9 Commits

Author SHA1 Message Date
Nikolas Klauser
1cac2be874
[libc++] Optimize copy construction and assignment of __tree (#151304)
```
----------------------------------------------------------------------------------------------------------
Benchmark                                                                              old             new
----------------------------------------------------------------------------------------------------------
std::map<int, int>::ctor(const&)/0                                                 15.5 ns         14.9 ns
std::map<int, int>::ctor(const&)/32                                                 474 ns          321 ns
std::map<int, int>::ctor(const&)/1024                                             24591 ns        11101 ns
std::map<int, int>::ctor(const&)/8192                                            236153 ns        98868 ns
std::map<std::string, int>::ctor(const&)/0                                         15.2 ns         14.9 ns
std::map<std::string, int>::ctor(const&)/32                                        2673 ns         2340 ns
std::map<std::string, int>::ctor(const&)/1024                                    115354 ns        86088 ns
std::map<std::string, int>::ctor(const&)/8192                                   1298510 ns       626876 ns
std::map<int, int>::operator=(const&) (into cleared Container)/0                   16.5 ns         16.1 ns
std::map<int, int>::operator=(const&) (into cleared Container)/32                   548 ns          323 ns
std::map<int, int>::operator=(const&) (into cleared Container)/1024               28418 ns        11026 ns
std::map<int, int>::operator=(const&) (into cleared Container)/8192              281827 ns        97113 ns
std::map<int, int>::operator=(const&) (into populated Container)/0                 2.42 ns         1.85 ns
std::map<int, int>::operator=(const&) (into populated Container)/32                 369 ns         73.0 ns
std::map<int, int>::operator=(const&) (into populated Container)/1024             24078 ns         2322 ns
std::map<int, int>::operator=(const&) (into populated Container)/8192            266537 ns        22963 ns
std::map<std::string, int>::operator=(const&) (into cleared Container)/0           16.6 ns         16.2 ns
std::map<std::string, int>::operator=(const&) (into cleared Container)/32          2614 ns         1622 ns
std::map<std::string, int>::operator=(const&) (into cleared Container)/1024      116826 ns        63281 ns
std::map<std::string, int>::operator=(const&) (into cleared Container)/8192     1316655 ns       649177 ns
std::map<std::string, int>::operator=(const&) (into populated Container)/0         2.42 ns         1.89 ns
std::map<std::string, int>::operator=(const&) (into populated Container)/32        1264 ns          581 ns
std::map<std::string, int>::operator=(const&) (into populated Container)/1024    238826 ns        39943 ns
std::map<std::string, int>::operator=(const&) (into populated Container)/8192   2412327 ns       379456 ns
```

Fixes #77658
Fixes #62571
2025-08-05 09:49:40 +02:00
xbcnn
3efa461d45
[libcxx] Avoid hash key in __hash_table::find() if it is empty. (#126837)
If the hash table is empty, with or without buckets, the find() can do
fast return. Then computing hash key is useless and avoidable, since it
could be expensive for some key types, such as long strings.

This is a small optimization but useful in cases like a checklist
(unordered_set/map) which is mostly empty.

```
For std::unordered_set<*>, `--benchmark_filter=find`
1. With the opt:

---------------------------------------------------------------------------------------------------------
Benchmark                                                               Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------
std::unordered_set<int>::find(key) (existent)/0                     0.118 ns        0.118 ns   5939922720
std::unordered_set<int>::find(key) (existent)/32                     52.1 ns         52.1 ns     13287232
std::unordered_set<int>::find(key) (existent)/1024                   51.1 ns         51.1 ns     13449472
std::unordered_set<int>::find(key) (existent)/8192                   53.1 ns         53.1 ns     13420864
std::unordered_set<int>::find(key) (non-existent)/0                  14.7 ns         14.7 ns     47725472
std::unordered_set<int>::find(key) (non-existent)/32                 44.1 ns         44.1 ns     15478144
std::unordered_set<int>::find(key) (non-existent)/1024               41.2 ns         41.2 ns     15082464
std::unordered_set<int>::find(key) (non-existent)/8192               49.5 ns         49.5 ns     15233600
std::unordered_set<std::string>::find(key) (existent)/0             0.136 ns        0.136 ns   5157977920
std::unordered_set<std::string>::find(key) (existent)/32              739 ns          739 ns      1023744
std::unordered_set<std::string>::find(key) (existent)/1024            836 ns          836 ns       840448
std::unordered_set<std::string>::find(key) (existent)/8192            768 ns          768 ns      1085664
std::unordered_set<std::string>::find(key) (non-existent)/0          14.6 ns         14.6 ns     47844160
std::unordered_set<std::string>::find(key) (non-existent)/32          608 ns          608 ns      1106496
std::unordered_set<std::string>::find(key) (non-existent)/1024        646 ns          646 ns       986272
std::unordered_set<std::string>::find(key) (non-existent)/8192        669 ns          669 ns      1047584


2. Without the opt:

---------------------------------------------------------------------------------------------------------
Benchmark                                                               Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------
std::unordered_set<int>::find(key) (existent)/0                     0.135 ns        0.135 ns   5188502304
std::unordered_set<int>::find(key) (existent)/32                     54.4 ns         54.4 ns     12954144
std::unordered_set<int>::find(key) (existent)/1024                   57.7 ns         57.7 ns     13107008
std::unordered_set<int>::find(key) (existent)/8192                   50.7 ns         50.7 ns     12953312
std::unordered_set<int>::find(key) (non-existent)/0                  16.1 ns         16.1 ns     43460192
std::unordered_set<int>::find(key) (non-existent)/32                 45.8 ns         45.8 ns     17139584
std::unordered_set<int>::find(key) (non-existent)/1024               44.6 ns         44.6 ns     16538048
std::unordered_set<int>::find(key) (non-existent)/8192               41.5 ns         41.5 ns     12850816
std::unordered_set<std::string>::find(key) (existent)/0             0.133 ns        0.133 ns   5214104992
std::unordered_set<std::string>::find(key) (existent)/32              731 ns          731 ns      1000576
std::unordered_set<std::string>::find(key) (existent)/1024            716 ns          716 ns      1131584
std::unordered_set<std::string>::find(key) (existent)/8192            745 ns          745 ns       909632
std::unordered_set<std::string>::find(key) (non-existent)/0           600 ns          600 ns      1089792
std::unordered_set<std::string>::find(key) (non-existent)/32          645 ns          645 ns       979232
std::unordered_set<std::string>::find(key) (non-existent)/1024        675 ns          675 ns       962240
std::unordered_set<std::string>::find(key) (non-existent)/8192        711 ns          711 ns      1054880

```

We can see the improvements when find() for non-existent
`std::string`(random size 1~1024) keys:
```
std::unordered_set<std::string>::find(key) (non-existent)/0          14.6 ns         14.6 ns     47844160
std::unordered_set<std::string>::find(key) (non-existent)/0           600 ns          600 ns      1089792
```

---------

Co-authored-by: yangxiaobing <yangxiaobing@jwzg.com>
2025-07-03 09:39:06 +02:00
Hui
34b2e934ea
[libc++] Introduce __product_iterator_traits and optimise flat_map::insert (#139454)
Fixes #108624

This allows `flat_map::insert(Iter, Iter)` to directly forward to
underlying containers' `insert(Iter, Iter)`, instead of inserting one
element at a time, when input models "product iterator". atm,
`flat_map::iterator` and `zip_view::iterator` are "product iterator"s.

This gives about almost 10x speed up in my benchmark with -03 (for both
before and after)

```cpp
Benchmark                                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
flat_map::insert_product_iterator_flat_map/32                   -0.5028         -0.5320           149            74           149            70
flat_map::insert_product_iterator_flat_map/1024                 -0.8617         -0.8618          3113           430          3112           430
flat_map::insert_product_iterator_flat_map/8192                 -0.8877         -0.8877         26682          2995         26679          2995
flat_map::insert_product_iterator_flat_map/65536                -0.8769         -0.8769        226235         27844        226221         27841
flat_map::insert_product_iterator_zip/32                        -0.5844         -0.5844           162            67           162            67
flat_map::insert_product_iterator_zip/1024                      -0.8754         -0.8754          3427           427          3427           427
flat_map::insert_product_iterator_zip/8192                      -0.8934         -0.8934         28134          3000         28132          3000
flat_map::insert_product_iterator_zip/65536                     -0.8783         -0.8783        229783         27960        229767         27958
OVERALL_GEOMEAN                                                 -0.8319         -0.8332             0             0             0             0
```

---------

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
2025-06-28 13:42:50 +01:00
Louis Dionne
27598aba49
[libc++] Further refactor sequence container benchmarks (#126129)
This patch does not significantly change how the sequence container
benchmarks are done, but it adopts the same style as the associative
container benchmarks.

This commit does adjust how we were benchmarking push_back, where we
never really measured the overhead of the slow path of push_back (when
we need to reallocate).
2025-02-07 09:56:45 -05:00
Louis Dionne
1d319dfe7d
[libc++] Implement generic associative container benchmarks (#123663)
This patch implements generic associative container benchmarks for
containers with unique keys. In doing so, it replaces the existing
std::map benchmarks which were based on the cartesian product
infrastructure and were too slow to execute.

These new benchmarks aim to strike a balance between exhaustive coverage
of all operations in the most interesting case, while executing fairly
rapidly (~40s on my machine).

This bumps the requirement for the map benchmarks from C++17 to C++20
because the common header that provides associative container benchmarks
requires support for C++20 concepts.
2025-02-06 16:08:55 -05:00
Louis Dionne
439bef9751
[libc++] Refactor the sequence container benchmarks (#119763)
Rewrite the sequence container benchmarks to only rely on the actual
operations specified in SequenceContainer requirements and add
benchmarks for std::list, which is also considered a sequence container.

One of the major goals of this refactoring is also to make these
container benchmarks run faster so that they can be run more frequently.
The existing benchmarks have the significant problem that they take so
long to run that they must basically be run overnight. This patch
reduces the size of inputs such that the rewritten benchmarks each take
at most a minute to run.

This patch doesn't touch the string benchmarks, which were not using the
generic container benchmark functions previously.
2025-01-30 15:02:34 -05:00
Louis Dionne
65dc0d4447 [libc++] Remove string benchmark for internal function
We strive to keep our benchmarks portable, so we should only
benchmark standard APIs.
2025-01-16 17:46:12 -05:00
Peng Liu
0298e58c7d
[libc++] Optimize input_iterator-pair insert for std::vector (#113768)
As a follow-up to #113852, this PR optimizes the performance of the
`insert(const_iterator pos, InputIt first, InputIt last)` function for
`input_iterator`-pair inputs in `std::vector` for cases where
reallocation occurs during insertion. Additionally, this optimization
enhances exception safety by replacing the traditional `try-catch`
mechanism with a modern exception guard for the `insert` function.

The optimization targets cases where insertion trigger reallocation. In
scenarios without reallocation, the implementation remains unchanged.

Previous implementation
-----------------------
The previous implementation of `insert` is inefficient in reallocation
scenarios because it performs the following steps separately:
- `reserve()`: This leads to the first round of relocating old
elements to new memory;
- `rotate()`: This leads to the second round of reorganizing the
existing elements;
- Move-forward: Moves the elements after the insertion position to
their final positions.
- Insert: performs the actual insertion.

This approach results in a lot of redundant operations, requiring the
elements to undergo three rounds of relocations/reorganizations to be
placed in their final positions.

Proposed implementation
-----------------------
The proposed implementation jointly optimize the above 4 steps in the
previous implementation such that each element is placed in its final
position in just one round of relocation. Specifically, this
optimization reduces the total cost from 2 relocations + 1 std::rotate
call to just 1 relocation, without needing to call `std::rotate`,
thereby significantly improving overall performance.
2025-01-14 11:40:29 -05:00
Louis Dionne
6a9279ca40
[libc++] Slight reorganization of the benchmarks (#119625)
Move various container benchmarks to the same subdirectory, and regroup
some format-related benchmarks.
2024-12-12 08:14:50 -05:00