Fixes#108624
This allows `flat_map::insert(Iter, Iter)` to directly forward to
underlying containers' `insert(Iter, Iter)`, instead of inserting one
element at a time, when input models "product iterator". atm,
`flat_map::iterator` and `zip_view::iterator` are "product iterator"s.
This gives about almost 10x speed up in my benchmark with -03 (for both
before and after)
```cpp
Benchmark Time CPU Time Old Time New CPU Old CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
flat_map::insert_product_iterator_flat_map/32 -0.5028 -0.5320 149 74 149 70
flat_map::insert_product_iterator_flat_map/1024 -0.8617 -0.8618 3113 430 3112 430
flat_map::insert_product_iterator_flat_map/8192 -0.8877 -0.8877 26682 2995 26679 2995
flat_map::insert_product_iterator_flat_map/65536 -0.8769 -0.8769 226235 27844 226221 27841
flat_map::insert_product_iterator_zip/32 -0.5844 -0.5844 162 67 162 67
flat_map::insert_product_iterator_zip/1024 -0.8754 -0.8754 3427 427 3427 427
flat_map::insert_product_iterator_zip/8192 -0.8934 -0.8934 28134 3000 28132 3000
flat_map::insert_product_iterator_zip/65536 -0.8783 -0.8783 229783 27960 229767 27958
OVERALL_GEOMEAN -0.8319 -0.8332 0 0 0 0
```
---------
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
This patch does not significantly change how the sequence container
benchmarks are done, but it adopts the same style as the associative
container benchmarks.
This commit does adjust how we were benchmarking push_back, where we
never really measured the overhead of the slow path of push_back (when
we need to reallocate).
This patch implements generic associative container benchmarks for
containers with unique keys. In doing so, it replaces the existing
std::map benchmarks which were based on the cartesian product
infrastructure and were too slow to execute.
These new benchmarks aim to strike a balance between exhaustive coverage
of all operations in the most interesting case, while executing fairly
rapidly (~40s on my machine).
This bumps the requirement for the map benchmarks from C++17 to C++20
because the common header that provides associative container benchmarks
requires support for C++20 concepts.
Rewrite the sequence container benchmarks to only rely on the actual
operations specified in SequenceContainer requirements and add
benchmarks for std::list, which is also considered a sequence container.
One of the major goals of this refactoring is also to make these
container benchmarks run faster so that they can be run more frequently.
The existing benchmarks have the significant problem that they take so
long to run that they must basically be run overnight. This patch
reduces the size of inputs such that the rewritten benchmarks each take
at most a minute to run.
This patch doesn't touch the string benchmarks, which were not using the
generic container benchmark functions previously.
As a follow-up to #113852, this PR optimizes the performance of the
`insert(const_iterator pos, InputIt first, InputIt last)` function for
`input_iterator`-pair inputs in `std::vector` for cases where
reallocation occurs during insertion. Additionally, this optimization
enhances exception safety by replacing the traditional `try-catch`
mechanism with a modern exception guard for the `insert` function.
The optimization targets cases where insertion trigger reallocation. In
scenarios without reallocation, the implementation remains unchanged.
Previous implementation
-----------------------
The previous implementation of `insert` is inefficient in reallocation
scenarios because it performs the following steps separately:
- `reserve()`: This leads to the first round of relocating old
elements to new memory;
- `rotate()`: This leads to the second round of reorganizing the
existing elements;
- Move-forward: Moves the elements after the insertion position to
their final positions.
- Insert: performs the actual insertion.
This approach results in a lot of redundant operations, requiring the
elements to undergo three rounds of relocations/reorganizations to be
placed in their final positions.
Proposed implementation
-----------------------
The proposed implementation jointly optimize the above 4 steps in the
previous implementation such that each element is placed in its final
position in just one round of relocation. Specifically, this
optimization reduces the total cost from 2 relocations + 1 std::rotate
call to just 1 relocation, without needing to call `std::rotate`,
thereby significantly improving overall performance.