llvm-project

Author	SHA1	Message	Date
Nikolas Klauser	4e112e5c1c	Reapply "[libc++] Optimize vector growing of trivially relocatable types" (#80558) This reapplies #76657. Non-trivial elements didn't get destroyed previously. This fixes the bug and adds tests for all the vector insertion functions.	2024-02-04 00:28:29 +01:00
Kirill Stoimenov	2352fdd202	Revert "[libc++] Optimize vector growing of trivially relocatable types (#76657)" Broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/5/builds/40641 This reverts commit 67eee4a029797c09129889c3655416d1be487cfe.	2024-02-02 20:43:51 +00:00
Nikolas Klauser	67eee4a029	[libc++] Optimize vector growing of trivially relocatable types (#76657) This patch introduces a new trait to represent whether a type is trivially relocatable, and uses that trait to optimize the growth of a std::vector of trivially relocatable objects. ``` -------------------------------------------------- Benchmark old new -------------------------------------------------- bm_grow<int> 1354 ns 1301 ns bm_grow<std::string> 5584 ns 3370 ns bm_grow<std::unique_ptr<int>> 3506 ns 1994 ns bm_grow<std::deque<int>> 27114 ns 27209 ns ``` This also changes the order of moving and destroying the objects when growing the vector. This should not affect our conformance.	2024-02-02 17:13:55 +01:00
Martijn Vels	6fe4e033f0	[libc++] Optimize vector push_back to avoid continuous load and store of end pointer Credits: this change is based on analysis and a proof of concept by gerbens@google.com. Before, the compiler loses track of end as 'this' and other references possibly escape beyond the compiler's scope. This can be seen in the generated assembly: 16.28 │200c80: mov %r15d,(%rax) 60.87 │200c83: add $0x4,%rax │200c87: mov %rax,-0x38(%rbp) 0.03 │200c8b: → jmpq 200d4e ... ... 1.69 │200d4e: cmp %r15d,%r12d │200d51: → je 200c40 16.34 │200d57: inc %r15d 0.05 │200d5a: mov -0x38(%rbp),%rax 3.27 │200d5e: mov -0x30(%rbp),%r13 1.47 │200d62: cmp %r13,%rax │200d65: → jne 200c80 We fix this by always explicitly storing the loaded local and pointer back at the end of push_back. This generates some slight source 'noise', but creates nice and compact fast path code, i.e.: 32.64 │200760: mov %r14d,(%r12) 9.97 │200764: add $0x4,%r12 6.97 │200768: mov %r12,-0x38(%rbp) 32.17 │20076c: add $0x1,%r14d 2.36 │200770: cmp %r14d,%ebx │200773: → je 200730 8.98 │200775: mov -0x30(%rbp),%r13 6.75 │200779: cmp %r13,%r12 │20077c: → jne 200760 Now there is a single store for the push_back value (as before), and a single store for the end without a reload (dependency). For fully local vectors (i.e., not referenced elsewhere), the capacity load and store inside the loop could also be removed, but this requires more substantial refactoring inside vector. Differential Revision: https://reviews.llvm.org/D80588	2023-10-02 09:12:37 -04:00
Konstantin Varlamov	dd788af74a	Reapply "[libc++][ranges] Add benchmarks for the `from_range` constructors of `vector` and `deque`." (#67753) This reverts commit 10edd5d9436153ace82009a04900ac67d3adc202 and guards against older versions of GCC to work around the problem.	2023-09-29 10:27:20 -04:00
Aaron Ballman	10edd5d943	Revert "[libc++][ranges] Add benchmarks for the `from_range` constructors of `vector` and `deque`." This reverts commit 390ac823178fc1073612b4c8a38835f441138d9d. It broke the sphinx publish bots for our documentation: https://lab.llvm.org/buildbot/#/builders/242/builds/1130 because that machine has GCC 9.4.0 which does not know about C++23	2023-09-25 10:40:51 -04:00
varconst	390ac82317	[libc++][ranges] Add benchmarks for the `from_range` constructors of `vector` and `deque`. Differential Revision: https://reviews.llvm.org/D150747	2023-09-05 15:57:45 -07:00
Louis Dionne	5aa03b648b	[libc++][NFC] Apply clang-format on large parts of the code base This commit does a pass of clang-format over files in libc++ that don't require major changes to conform to our style guide, or for which we're not overly concerned about conflicting with in-flight patches or hindering the git blame. This roughly covers: - benchmarks - range algorithms - concepts - type traits I did a manual verification of all the changes, and in particular I applied clang-format on/off annotations in a few places where the result was less readable after than before. This was not necessary in a lot of places, however I did find that clang-format had pretty bad taste when it comes to formatting concepts. Differential Revision: https://reviews.llvm.org/D153140	2023-06-19 11:19:51 -04:00
AdityaK	63a2b206fa	[libc++, std::vector] call the optimized version of __uninitialized_allocator_copy for trivial types See: https://github.com/llvm/llvm-project/issues/61987 Fix suggested by: @philnik and @var-const Reviewers: philnik, ldionne, EricWF, var-const Differential Revision: https://reviews.llvm.org/D147741 Testing: ninja check-cxx check-clang check-llvm Benchmark Testcases (BM_CopyConstruct, and BM_Assignment) added. performance improvement: Run on (8 X 4800 MHz CPU s) CPU Caches: L1 Data 48 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 1280 KiB (x4) L3 Unified 12288 KiB (x1) Load Average: 1.66, 3.02, 2.43 Comparing build-runtimes-base/libcxx/benchmarks/vector_operations.libcxx.out to build-runtimes/libcxx/benchmarks/vector_operations.libcxx.out Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------- BM_ConstructSize/vector_byte/5140480 +0.0362 +0.0362 116906 121132 116902 121131 BM_CopyConstruct/vector_int/5140480 -0.4563 -0.4577 1755224 954241 1755330 951987 BM_Assignment/vector_int/5140480 -0.0222 -0.0220 990045 968095 989917 968125 BM_ConstructSizeValue/vector_byte/5140480 +0.0308 +0.0307 116970 120567 116977 120573 BM_ConstructIterIter/vector_char/1024 -0.0831 -0.0831 19 17 19 17 BM_ConstructIterIter/vector_size_t/1024 +0.0129 +0.0131 88 89 88 89 BM_ConstructIterIter/vector_string/1024 -0.0064 -0.0018 54455 54109 54208 54112 OVERALL_GEOMEAN -0.0845 -0.0842 0 0 0 0 FYI, the perf improvements for BM_CopyConstruct due to this patch are mostly subsumed by https://reviews.llvm.org/D149826. However, this patch still adds value by converting copy to memmove (the second testcase).
Before the patch: ``` define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 { %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1 %6 = load ptr, ptr %5, align 8, !tbaa !12 %7 = icmp eq ptr %1, %2 br i1 %7, label %16, label %8 8: ; preds = %4, %8 %9 = phi ptr [ %13, %8 ], [ %1, %4 ] %10 = phi ptr [ %14, %8 ], [ %6, %4 ] %11 = icmp ne ptr %10, null tail call void @llvm.assume(i1 %11) %12 = load i32, ptr %9, align 4, !tbaa !14 store i32 %12, ptr %10, align 4, !tbaa !14 %13 = getelementptr inbounds i32, ptr %9, i64 1 %14 = getelementptr inbounds i32, ptr %10, i64 1 %15 = icmp eq ptr %13, %2 br i1 %15, label %16, label %8, !llvm.loop !16 16: ; preds = %8, %4 %17 = phi ptr [ %6, %4 ], [ %14, %8 ] store ptr %17, ptr %5, align 8, !tbaa !12 ret void } ``` After the patch: ``` define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 { %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1 %6 = load ptr, ptr %5, align 8, !tbaa !12 %7 = ptrtoint ptr %2 to i64 %8 = ptrtoint ptr %1 to i64 %9 = sub i64 %7, %8 %10 = ashr exact i64 %9, 2 tail call void @llvm.memmove.p0.p0.i64(ptr align 4 %6, ptr align 4 %1, i64 %9, i1 false) %11 = getelementptr inbounds i32, ptr %6, i64 %10 store ptr %11, ptr %5, align 8, !tbaa !12 ret void } ``` This is due to the optimized version of uninitialized_allocator_copy function.	2023-05-24 13:40:53 -07:00
Nico Weber	f938755a33	libcxx: Rename .hpp files in libcxx/benchmarks to .h LLVM uses .h as its extension for header files. Differential Revision: https://reviews.llvm.org/D66509 llvm-svn: 369487	2019-08-21 01:59:12 +00:00
Eric Fiselier	d4ace50ed0	Fix PR35637: suboptimal codegen for `vector<unsigned char>`. The optimizer is petulant and temperamental. In this case LLVM failed to lower the "insert at end" loop used by `vector<unsigned char>` to a `memset` despite `memset` being substantially faster over a range of bytes. LLVM has the ability to lower loops to `memset` when appropriate, but the odd nature of libc++'s loops prevented the optimization from taking place. This patch addresses the issue by rewriting the loops from the form `do { ... --__n; } while (__n > 0);` to instead use a for loop over a pointer range (For example: `for (auto *__i = ...; __i < __e; ++__i)`). This patch also rewrites the asan annotations to unpoison all additional memory at the start of the loop instead of once per iteration. This could potentially permit false negatives where the constructor of element N attempts to access element N + 1 during its construction. The before and after results for the `BM_ConstructSize/vector_byte/5140480_mean` benchmark (run 5 times) are: -------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------------------------------- Before ------ BM_ConstructSize/vector_byte/5140480_mean 12530140 ns 12469693 ns N/A BM_ConstructSize/vector_byte/5140480_median 12512818 ns 12445571 ns N/A BM_ConstructSize/vector_byte/5140480_stddev 106224 ns 107907 ns 5 ----- After ----- BM_ConstructSize/vector_byte/5140480_mean 167285 ns 166500 ns N/A BM_ConstructSize/vector_byte/5140480_median 166749 ns 166069 ns N/A BM_ConstructSize/vector_byte/5140480_stddev 3242 ns 3184 ns 5 llvm-svn: 367183	2019-07-28 04:37:02 +00:00
Eric Fiselier	1903976d37	Update Google Benchmark library llvm-svn: 322812	2018-01-18 04:23:01 +00:00
Eric Fiselier	0e100998fc	Start adding benchmarks for vector llvm-svn: 276552	2016-07-24 06:51:55 +00:00
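
The loop rewrite described in d4ace50ed0 can be illustrated with a small stand-alone sketch. This is not libc++'s code: `fill_counted`, `fill_range`, and `loops_agree` are hypothetical names, and the bodies only mimic the two loop shapes the commit message mentions (a counted `do`/`while` versus a `for` loop over a pointer range). Whether a given compiler lowers either form to `memset` depends on the compiler version and flags, so the sketch only shows that the two forms compute the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counted do/while form, similar in shape to the loop
// the commit message says was replaced. Precondition: n > 0.
inline void fill_counted(unsigned char* dst, std::size_t n) {
  assert(n > 0 && "the do/while form always runs its body at least once");
  do {
    *dst = 0;
    ++dst;
    --n;
  } while (n > 0);
}

// Hypothetical helper: the same work written as a loop over a pointer range,
// the shape the commit message says the loops were rewritten to.
inline void fill_range(unsigned char* dst, std::size_t n) {
  for (unsigned char* end = dst + n; dst < end; ++dst)
    *dst = 0;
}

// Returns true when both loop shapes leave n zero bytes behind (n > 0).
inline bool loops_agree(std::size_t n) {
  std::vector<unsigned char> a(n, 0xFF), b(n, 0xFF);
  fill_counted(a.data(), n);
  fill_range(b.data(), n);
  return a == b && a == std::vector<unsigned char>(n, 0);
}
```

To compare the generated code, one option is to compile both helpers with optimizations enabled and inspect the assembly for a call to `memset`.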

13 Commits
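
The idea behind 67eee4a029 (growing a vector by relocating elements in bulk when the type allows it) can be sketched roughly as follows. This is an illustration under stated assumptions, not the libc++ implementation: `relocate_n` and `grow` are hypothetical helpers, and `std::is_trivially_copyable` is used as a conservative stand-in for the new trait the commit introduces. With this stand-in, `std::string` and `std::unique_ptr` take the element-by-element path, whereas the benchmark in the commit message shows them benefiting from the real trait. The sketch requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: move n objects from src into uninitialized storage at
// dst and end the lifetime of the sources.
template <class T>
void relocate_n(T* src, std::size_t n, T* dst) {
  if constexpr (std::is_trivially_copyable_v<T>) {
    // Assumption: trivially copyable is used here as a conservative proxy
    // for "trivially relocatable"; the bytes are copied in one call.
    if (n != 0)
      std::memcpy(static_cast<void*>(dst), static_cast<const void*>(src), n * sizeof(T));
  } else {
    // Fallback: move-construct each element, then destroy its source.
    for (std::size_t i = 0; i != n; ++i) {
      ::new (static_cast<void*>(dst + i)) T(std::move(src[i]));
      src[i].~T();
    }
  }
}

// Hypothetical helper: allocate a larger buffer, relocate the live elements
// into it, release the old buffer, and return the new one.
template <class T>
T* grow(T* old_buf, std::size_t size, std::size_t old_cap, std::size_t new_cap) {
  assert(new_cap >= size && "new buffer must hold all live elements");
  std::allocator<T> alloc;
  T* new_buf = alloc.allocate(new_cap);
  relocate_n(old_buf, size, new_buf);
  alloc.deallocate(old_buf, old_cap);
  return new_buf;
}
```

The caller remains responsible for destroying the relocated elements and deallocating the buffer returned by `grow`.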