llvm-project

Author	SHA1	Message	Date
Nikolas Klauser	1cac2be874	[libc++] Optimize copy construction and assignment of __tree (#151304)
```
----------------------------------------------------------------------------------------------------------
Benchmark                                                                             old         new
----------------------------------------------------------------------------------------------------------
std::map<int, int>::ctor(const&)/0                                                15.5 ns     14.9 ns
std::map<int, int>::ctor(const&)/32                                                474 ns      321 ns
std::map<int, int>::ctor(const&)/1024                                            24591 ns    11101 ns
std::map<int, int>::ctor(const&)/8192                                           236153 ns    98868 ns
std::map<std::string, int>::ctor(const&)/0                                        15.2 ns     14.9 ns
std::map<std::string, int>::ctor(const&)/32                                       2673 ns     2340 ns
std::map<std::string, int>::ctor(const&)/1024                                   115354 ns    86088 ns
std::map<std::string, int>::ctor(const&)/8192                                  1298510 ns   626876 ns
std::map<int, int>::operator=(const&) (into cleared Container)/0                  16.5 ns     16.1 ns
std::map<int, int>::operator=(const&) (into cleared Container)/32                  548 ns      323 ns
std::map<int, int>::operator=(const&) (into cleared Container)/1024              28418 ns    11026 ns
std::map<int, int>::operator=(const&) (into cleared Container)/8192             281827 ns    97113 ns
std::map<int, int>::operator=(const&) (into populated Container)/0                2.42 ns     1.85 ns
std::map<int, int>::operator=(const&) (into populated Container)/32                369 ns     73.0 ns
std::map<int, int>::operator=(const&) (into populated Container)/1024            24078 ns     2322 ns
std::map<int, int>::operator=(const&) (into populated Container)/8192           266537 ns    22963 ns
std::map<std::string, int>::operator=(const&) (into cleared Container)/0          16.6 ns     16.2 ns
std::map<std::string, int>::operator=(const&) (into cleared Container)/32         2614 ns     1622 ns
std::map<std::string, int>::operator=(const&) (into cleared Container)/1024     116826 ns    63281 ns
std::map<std::string, int>::operator=(const&) (into cleared Container)/8192    1316655 ns   649177 ns
std::map<std::string, int>::operator=(const&) (into populated Container)/0        2.42 ns     1.89 ns
std::map<std::string, int>::operator=(const&) (into populated Container)/32       1264 ns      581 ns
std::map<std::string, int>::operator=(const&) (into populated Container)/1024   238826 ns    39943 ns
std::map<std::string, int>::operator=(const&) (into populated Container)/8192  2412327 ns   379456 ns
```
Fixes #77658
Fixes #62571	2025-08-05 09:49:40 +02:00
xbcnn	3efa461d45	[libcxx] Avoid hash key in __hash_table::find() if it is empty. (#126837)
If the hash table is empty, with or without buckets, find() can return early. Computing the hash is then unnecessary, and it can be expensive for some key types, such as long strings. This is a small optimization, but it is useful in cases like a checklist (unordered_set/map) that is mostly empty.
```
For std::unordered_set<*>, `--benchmark_filter=find`

1. With the opt:
---------------------------------------------------------------------------------------------------------
Benchmark                                                            Time        CPU  Iterations
---------------------------------------------------------------------------------------------------------
std::unordered_set<int>::find(key) (existent)/0                  0.118 ns   0.118 ns  5939922720
std::unordered_set<int>::find(key) (existent)/32                  52.1 ns    52.1 ns    13287232
std::unordered_set<int>::find(key) (existent)/1024                51.1 ns    51.1 ns    13449472
std::unordered_set<int>::find(key) (existent)/8192                53.1 ns    53.1 ns    13420864
std::unordered_set<int>::find(key) (non-existent)/0               14.7 ns    14.7 ns    47725472
std::unordered_set<int>::find(key) (non-existent)/32              44.1 ns    44.1 ns    15478144
std::unordered_set<int>::find(key) (non-existent)/1024            41.2 ns    41.2 ns    15082464
std::unordered_set<int>::find(key) (non-existent)/8192            49.5 ns    49.5 ns    15233600
std::unordered_set<std::string>::find(key) (existent)/0          0.136 ns   0.136 ns  5157977920
std::unordered_set<std::string>::find(key) (existent)/32           739 ns     739 ns     1023744
std::unordered_set<std::string>::find(key) (existent)/1024         836 ns     836 ns      840448
std::unordered_set<std::string>::find(key) (existent)/8192         768 ns     768 ns     1085664
std::unordered_set<std::string>::find(key) (non-existent)/0       14.6 ns    14.6 ns    47844160
std::unordered_set<std::string>::find(key) (non-existent)/32       608 ns     608 ns     1106496
std::unordered_set<std::string>::find(key) (non-existent)/1024     646 ns     646 ns      986272
std::unordered_set<std::string>::find(key) (non-existent)/8192     669 ns     669 ns     1047584

2. Without the opt:
---------------------------------------------------------------------------------------------------------
Benchmark                                                            Time        CPU  Iterations
---------------------------------------------------------------------------------------------------------
std::unordered_set<int>::find(key) (existent)/0                  0.135 ns   0.135 ns  5188502304
std::unordered_set<int>::find(key) (existent)/32                  54.4 ns    54.4 ns    12954144
std::unordered_set<int>::find(key) (existent)/1024                57.7 ns    57.7 ns    13107008
std::unordered_set<int>::find(key) (existent)/8192                50.7 ns    50.7 ns    12953312
std::unordered_set<int>::find(key) (non-existent)/0               16.1 ns    16.1 ns    43460192
std::unordered_set<int>::find(key) (non-existent)/32              45.8 ns    45.8 ns    17139584
std::unordered_set<int>::find(key) (non-existent)/1024            44.6 ns    44.6 ns    16538048
std::unordered_set<int>::find(key) (non-existent)/8192            41.5 ns    41.5 ns    12850816
std::unordered_set<std::string>::find(key) (existent)/0          0.133 ns   0.133 ns  5214104992
std::unordered_set<std::string>::find(key) (existent)/32           731 ns     731 ns     1000576
std::unordered_set<std::string>::find(key) (existent)/1024         716 ns     716 ns     1131584
std::unordered_set<std::string>::find(key) (existent)/8192         745 ns     745 ns      909632
std::unordered_set<std::string>::find(key) (non-existent)/0        600 ns     600 ns     1089792
std::unordered_set<std::string>::find(key) (non-existent)/32       645 ns     645 ns      979232
std::unordered_set<std::string>::find(key) (non-existent)/1024     675 ns     675 ns      962240
std::unordered_set<std::string>::find(key) (non-existent)/8192     711 ns     711 ns     1054880
```
We can see the improvement for find() on non-existent `std::string` keys (random size 1~1024) in the /0 case. The first row is with the optimization, the second without:
```
std::unordered_set<std::string>::find(key) (non-existent)/0       14.6 ns    14.6 ns    47844160
std::unordered_set<std::string>::find(key) (non-existent)/0        600 ns     600 ns     1089792
```
---------
Co-authored-by: yangxiaobing <yangxiaobing@jwzg.com>	2025-07-03 09:39:06 +02:00
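The early return described in this commit can be illustrated with a minimal sketch. This is not libc++'s `__hash_table`: `ToySet`, `CountingHash`, and `g_hash_calls` are hypothetical names invented for illustration, and the toy container assumes a fixed, non-zero bucket count. The only point shown is that checking the element count before hashing lets a lookup in an empty container skip the hash computation entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Hypothetical global counter: how many times a key has been hashed.
std::size_t g_hash_calls = 0;

// Hypothetical hasher that counts its invocations.
struct CountingHash {
    std::size_t operator()(const std::string& s) const {
        ++g_hash_calls;
        return std::hash<std::string>{}(s);
    }
};

// Toy chained hash set (illustration only, not libc++'s __hash_table).
class ToySet {
public:
    // Assumption: bucket_count is non-zero and never changes.
    explicit ToySet(std::size_t bucket_count = 8) : buckets_(bucket_count) {}

    void insert(const std::string& key) {
        auto& bucket = buckets_[CountingHash{}(key) % buckets_.size()];
        for (const auto& k : bucket)
            if (k == key) return;  // already present
        bucket.push_back(key);
        ++size_;
    }

    bool contains(const std::string& key) const {
        // Fast path: an empty set cannot contain the key, so skip hashing.
        if (size_ == 0) return false;
        const auto& bucket = buckets_[CountingHash{}(key) % buckets_.size()];
        for (const auto& k : bucket)
            if (k == key) return true;
        return false;
    }

private:
    std::vector<std::list<std::string>> buckets_;
    std::size_t size_ = 0;
};
```

With a long `std::string` key, the skipped hash is the dominant cost of a failed lookup, which is consistent with the non-existent/0 rows in the benchmark above.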
Hui	34b2e934ea	[libc++] Introduce `__product_iterator_traits` and optimise `flat_map::insert` (#139454)
Fixes #108624
This allows `flat_map::insert(Iter, Iter)` to forward directly to the underlying containers' `insert(Iter, Iter)`, instead of inserting one element at a time, when the input models a "product iterator". Currently, `flat_map::iterator` and `zip_view::iterator` are product iterators. This gives almost a 10x speedup in my benchmark with -O3 (for both before and after).
```cpp
Benchmark                                               Time       CPU  Time Old  Time New   CPU Old   CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
flat_map::insert_product_iterator_flat_map/32        -0.5028   -0.5320       149        74       149        70
flat_map::insert_product_iterator_flat_map/1024      -0.8617   -0.8618      3113       430      3112       430
flat_map::insert_product_iterator_flat_map/8192      -0.8877   -0.8877     26682      2995     26679      2995
flat_map::insert_product_iterator_flat_map/65536     -0.8769   -0.8769    226235     27844    226221     27841
flat_map::insert_product_iterator_zip/32             -0.5844   -0.5844       162        67       162        67
flat_map::insert_product_iterator_zip/1024           -0.8754   -0.8754      3427       427      3427       427
flat_map::insert_product_iterator_zip/8192           -0.8934   -0.8934     28134      3000     28132      3000
flat_map::insert_product_iterator_zip/65536          -0.8783   -0.8783    229783     27960    229767     27958
OVERALL_GEOMEAN                                      -0.8319   -0.8332         0         0         0         0
```
---------
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>	2025-06-28 13:42:50 +01:00
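The idea of forwarding a range insert to the underlying containers can be sketched with a toy structure. This is not `std::flat_map` and does not use `__product_iterator_traits`: `ToyFlatMap`, `sort_and_unique`, and `insert_product` are hypothetical names, and the way the sorted-unique invariant is restored afterwards is an assumption made for the example. The sketch only shows that when the source is itself a pair of parallel ranges, keys and values can each be appended with one range `insert` instead of element by element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Toy "flat map": keys and values live in two parallel, key-sorted vectors.
struct ToyFlatMap {
    std::vector<int> keys;
    std::vector<int> values;
};

// Restore the sorted, unique-key invariant after a bulk append.
// The stable sort keeps earlier entries first among equal keys, so an
// existing entry wins over a newly appended duplicate.
void sort_and_unique(ToyFlatMap& m) {
    std::vector<std::size_t> order(m.keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return m.keys[a] < m.keys[b];
                     });
    ToyFlatMap out;
    for (std::size_t i : order) {
        if (!out.keys.empty() && out.keys.back() == m.keys[i]) continue;
        out.keys.push_back(m.keys[i]);
        out.values.push_back(m.values[i]);
    }
    m = std::move(out);
}

// Bulk path: the source is a "product" of a key range and a value range,
// so each range is forwarded to the matching container's range insert.
void insert_product(ToyFlatMap& dst, const ToyFlatMap& src) {
    dst.keys.insert(dst.keys.end(), src.keys.begin(), src.keys.end());
    dst.values.insert(dst.values.end(), src.values.begin(), src.values.end());
    sort_and_unique(dst);
}
```

For example, inserting `{2, 3, 4} -> {20, 99, 40}` into `{1, 3, 5} -> {10, 30, 50}` leaves key 3 mapped to 30, matching the usual rule that `insert` does not overwrite existing keys.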
Louis Dionne	27598aba49	[libc++] Further refactor sequence container benchmarks (#126129) This patch does not significantly change how the sequence container benchmarks are done, but it adopts the same style as the associative container benchmarks. This commit does adjust how we were benchmarking push_back, where we never really measured the overhead of the slow path of push_back (when we need to reallocate).	2025-02-07 09:56:45 -05:00
Louis Dionne	1d319dfe7d	[libc++] Implement generic associative container benchmarks (#123663) This patch implements generic associative container benchmarks for containers with unique keys. In doing so, it replaces the existing std::map benchmarks which were based on the cartesian product infrastructure and were too slow to execute. These new benchmarks aim to strike a balance between exhaustive coverage of all operations in the most interesting case, while executing fairly rapidly (~40s on my machine). This bumps the requirement for the map benchmarks from C++17 to C++20 because the common header that provides associative container benchmarks requires support for C++20 concepts.	2025-02-06 16:08:55 -05:00
Louis Dionne	439bef9751	[libc++] Refactor the sequence container benchmarks (#119763) Rewrite the sequence container benchmarks to only rely on the actual operations specified in SequenceContainer requirements and add benchmarks for std::list, which is also considered a sequence container. One of the major goals of this refactoring is also to make these container benchmarks run faster so that they can be run more frequently. The existing benchmarks have the significant problem that they take so long to run that they must basically be run overnight. This patch reduces the size of inputs such that the rewritten benchmarks each take at most a minute to run. This patch doesn't touch the string benchmarks, which were not using the generic container benchmark functions previously.	2025-01-30 15:02:34 -05:00
Louis Dionne	65dc0d4447	[libc++] Remove string benchmark for internal function We strive to keep our benchmarks portable, so we should only benchmark standard APIs.	2025-01-16 17:46:12 -05:00
Peng Liu	0298e58c7d	[libc++] Optimize input_iterator-pair `insert` for std::vector (#113768)
As a follow-up to #113852, this PR optimizes the performance of the `insert(const_iterator pos, InputIt first, InputIt last)` function for `input_iterator`-pair inputs in `std::vector` for cases where reallocation occurs during insertion. Additionally, this optimization enhances exception safety by replacing the traditional `try-catch` mechanism with a modern exception guard for the `insert` function.

The optimization targets cases where insertion triggers reallocation. In scenarios without reallocation, the implementation remains unchanged.

Previous implementation
-----------------------
The previous implementation of `insert` is inefficient in reallocation scenarios because it performs the following steps separately:
- `reserve()`: This leads to the first round of relocating old elements to new memory;
- `rotate()`: This leads to the second round of reorganizing the existing elements;
- Move-forward: Moves the elements after the insertion position to their final positions.
- Insert: Performs the actual insertion.

This approach results in a lot of redundant operations, requiring the elements to undergo three rounds of relocations/reorganizations to be placed in their final positions.

Proposed implementation
-----------------------
The proposed implementation jointly optimizes the four steps above so that each element is placed in its final position in just one round of relocation. Specifically, this optimization reduces the total cost from 2 relocations + 1 `std::rotate` call to just 1 relocation, without needing to call `std::rotate`, thereby significantly improving overall performance.	2025-01-14 11:40:29 -05:00
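The "one round of relocation" idea can be sketched on top of the public `std::vector` API. This is a simplified illustration, not the libc++ implementation: `insert_with_single_relocation` and `Tracked` are hypothetical names, and the sketch assumes the new elements have already been buffered into a vector (the real code has to consume a single-pass input range, which is not modeled here). It builds the final layout directly in fresh storage as prefix, new elements, then suffix, so every existing element is move-constructed exactly once and no `std::rotate` is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper type that records how often a value has been moved.
struct Tracked {
    int value;
    int moves = 0;
    explicit Tracked(int v) : value(v) {}
    Tracked(Tracked&& o) noexcept : value(o.value), moves(o.moves + 1) {}
    Tracked& operator=(Tracked&& o) noexcept {
        value = o.value;
        moves = o.moves + 1;
        return *this;
    }
};

// Insert `extra` before index `pos` of `v`, writing every element straight
// to its final position in newly allocated storage.
// Assumption: pos <= v.size().
template <class T>
void insert_with_single_relocation(std::vector<T>& v, std::size_t pos,
                                   std::vector<T> extra) {
    std::vector<T> out;
    out.reserve(v.size() + extra.size());  // single allocation up front
    auto mid = v.begin() + static_cast<std::ptrdiff_t>(pos);
    // Prefix: elements before the insertion point.
    out.insert(out.end(), std::make_move_iterator(v.begin()),
               std::make_move_iterator(mid));
    // New elements.
    out.insert(out.end(), std::make_move_iterator(extra.begin()),
               std::make_move_iterator(extra.end()));
    // Suffix: elements after the insertion point.
    out.insert(out.end(), std::make_move_iterator(mid),
               std::make_move_iterator(v.end()));
    v.swap(out);  // the old storage is released when `out` is destroyed
}
```

The `Tracked` type is only there to make the claim checkable: after the call, each element that was already in the vector reports a single move.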
Louis Dionne	6a9279ca40	[libc++] Slight reorganization of the benchmarks (#119625) Move various container benchmarks to the same subdirectory, and regroup some format-related benchmarks.	2024-12-12 08:14:50 -05:00

9 Commits