13 Commits

Author SHA1 Message Date
Nikolas Klauser
4e112e5c1c
Reapply "[libc++] Optimize vector growing of trivially relocatable types" (#80558)
This reapplies #76657. Non-trivial elements didn't get destroyed
previously. This fixes the bug and adds tests for all the vector
insertion functions.
2024-02-04 00:28:29 +01:00
Kirill Stoimenov
2352fdd202 Revert "[libc++] Optimize vector growing of trivially relocatable types (#76657)"
Broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/5/builds/40641

This reverts commit 67eee4a029797c09129889c3655416d1be487cfe.
2024-02-02 20:43:51 +00:00
Nikolas Klauser
67eee4a029
[libc++] Optimize vector growing of trivially relocatable types (#76657)
This patch introduces a new trait to represent whether a type is
trivially
relocatable, and uses that trait to optimize the growth of a std::vector
of trivially relocatable objects.

```
--------------------------------------------------
Benchmark                           old        new
--------------------------------------------------
bm_grow<int>                    1354 ns    1301 ns
bm_grow<std::string>            5584 ns    3370 ns
bm_grow<std::unique_ptr<int>>   3506 ns    1994 ns
bm_grow<std::deque<int>>       27114 ns   27209 ns
```

This also changes to order of moving and destroying the objects when
growing the vector. This should not affect our conformance.
2024-02-02 17:13:55 +01:00
Martijn Vels
6fe4e033f0 [libc++] Optimize vector push_back to avoid continuous load and store of end pointer
Credits: this change is based on analysis and a proof of concept by
gerbens@google.com.

Before, the compiler loses track of end as 'this' and other references
possibly escape beyond the compiler's scope. This can be see in the
generated assembly:

     16.28 │200c80:   mov     %r15d,(%rax)
     60.87 │200c83:   add     $0x4,%rax
           │200c87:   mov     %rax,-0x38(%rbp)
      0.03 │200c8b: → jmpq    200d4e
      ...
      ...
      1.69 │200d4e:   cmp     %r15d,%r12d
           │200d51: → je      200c40
     16.34 │200d57:   inc     %r15d
      0.05 │200d5a:   mov     -0x38(%rbp),%rax
      3.27 │200d5e:   mov     -0x30(%rbp),%r13
      1.47 │200d62:   cmp     %r13,%rax
           │200d65: → jne     200c80

We fix this by always explicitly storing the loaded local and pointer
back at the end of push back. This generates some slight source 'noise',
but creates nice and compact fast path code, i.e.:

     32.64 │200760:   mov    %r14d,(%r12)
      9.97 │200764:   add    $0x4,%r12
      6.97 │200768:   mov    %r12,-0x38(%rbp)
     32.17 │20076c:   add    $0x1,%r14d
      2.36 │200770:   cmp    %r14d,%ebx
           │200773: → je     200730
      8.98 │200775:   mov    -0x30(%rbp),%r13
      6.75 │200779:   cmp    %r13,%r12
           │20077c: → jne    200760

Now there is a single store for the push_back value (as before), and a
single store for the end without a reload (dependency).

For fully local vectors, (i.e., not referenced elsewhere), the capacity
load and store inside the loop could also be removed, but this requires
more substantial refactoring inside vector.

Differential Revision: https://reviews.llvm.org/D80588
2023-10-02 09:12:37 -04:00
Konstantin Varlamov
dd788af74a
Reapply "[libc++][ranges] Add benchmarks for the from_range constructors of vector and deque." (#67753)
This reverts commit 10edd5d9436153ace82009a04900ac67d3adc202 and guards
against older versions of GCC to work around the problem.
2023-09-29 10:27:20 -04:00
Aaron Ballman
10edd5d943 Revert "[libc++][ranges] Add benchmarks for the from_range constructors of vector and deque."
This reverts commit 390ac823178fc1073612b4c8a38835f441138d9d.

It broke the sphinx publish bots for our documentation:
https://lab.llvm.org/buildbot/#/builders/242/builds/1130

because that machine has GCC 9.4.0 which does not know about C++23
2023-09-25 10:40:51 -04:00
varconst
390ac82317 [libc++][ranges] Add benchmarks for the from_range constructors of vector and deque.
Differential Revision: https://reviews.llvm.org/D150747
2023-09-05 15:57:45 -07:00
Louis Dionne
5aa03b648b [libc++][NFC] Apply clang-format on large parts of the code base
This commit does a pass of clang-format over files in libc++ that
don't require major changes to conform to our style guide, or for
which we're not overly concerned about conflicting with in-flight
patches or hindering the git blame.

This roughly covers:
- benchmarks
- range algorithms
- concepts
- type traits

I did a manual verification of all the changes, and in particular I
applied clang-format on/off annotations in a few places where the
result was less readable after than before. This was not necessary
in a lot of places, however I did find that clang-format had pretty
bad taste when it comes to formatting concepts.

Differential Revision: https://reviews.llvm.org/D153140
2023-06-19 11:19:51 -04:00
AdityaK
63a2b206fa [libc++, std::vector] call the optimized version of __uninitialized_allocator_copy for trivial types
See: https://github.com/llvm/llvm-project/issues/61987

Fix suggested by: @philnik and @var-const

Reviewers: philnik, ldionne, EricWF, var-const

Differential Revision: https://reviews.llvm.org/D147741

Testing:
ninja check-cxx check-clang check-llvm

Benchmark Testcases (BM_CopyConstruct, and BM_Assignment) added.

performance improvement:

Run on (8 X 4800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 1280 KiB (x4)
  L3 Unified 12288 KiB (x1)
Load Average: 1.66, 3.02, 2.43

Comparing build-runtimes-base/libcxx/benchmarks/vector_operations.libcxx.out to build-runtimes/libcxx/benchmarks/vector_operations.libcxx.out
Benchmark                                                   Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------
BM_ConstructSize/vector_byte/5140480                     +0.0362         +0.0362        116906        121132        116902        121131
BM_CopyConstruct/vector_int/5140480                      -0.4563         -0.4577       1755224        954241       1755330        951987
BM_Assignment/vector_int/5140480                         -0.0222         -0.0220        990045        968095        989917        968125
BM_ConstructSizeValue/vector_byte/5140480                +0.0308         +0.0307        116970        120567        116977        120573
BM_ConstructIterIter/vector_char/1024                    -0.0831         -0.0831            19            17            19            17
BM_ConstructIterIter/vector_size_t/1024                  +0.0129         +0.0131            88            89            88            89
BM_ConstructIterIter/vector_string/1024                  -0.0064         -0.0018         54455         54109         54208         54112
OVERALL_GEOMEAN                                          -0.0845         -0.0842             0             0             0             0

FYI, the perf improvements for BM_CopyConstruct due to this patch is mostly subsumed by the https://reviews.llvm.org/D149826. However this patch still adds value by converting copy to memmove (the second testcase).

Before the patch:

```
define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 {
  %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1
  %6 = load ptr, ptr %5, align 8, !tbaa !12
  %7 = icmp eq ptr %1, %2
  br i1 %7, label %16, label %8

8:                                                ; preds = %4, %8
  %9 = phi ptr [ %13, %8 ], [ %1, %4 ]
  %10 = phi ptr [ %14, %8 ], [ %6, %4 ]
  %11 = icmp ne ptr %10, null
  tail call void @llvm.assume(i1 %11)
  %12 = load i32, ptr %9, align 4, !tbaa !14
  store i32 %12, ptr %10, align 4, !tbaa !14
  %13 = getelementptr inbounds i32, ptr %9, i64 1
  %14 = getelementptr inbounds i32, ptr %10, i64 1
  %15 = icmp eq ptr %13, %2
  br i1 %15, label %16, label %8, !llvm.loop !16

16:                                               ; preds = %8, %4
  %17 = phi ptr [ %6, %4 ], [ %14, %8 ]
  store ptr %17, ptr %5, align 8, !tbaa !12
  ret void
}
```

After the patch:
```
define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 {
  %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1
  %6 = load ptr, ptr %5, align 8, !tbaa !12
  %7 = ptrtoint ptr %2 to i64
  %8 = ptrtoint ptr %1 to i64
  %9 = sub i64 %7, %8
  %10 = ashr exact i64 %9, 2
  tail call void @llvm.memmove.p0.p0.i64(ptr align 4 %6, ptr align 4 %1, i64 %9, i1 false)
  %11 = getelementptr inbounds i32, ptr %6, i64 %10
  store ptr %11, ptr %5, align 8, !tbaa !12
  ret void
}
```

This is due to the optimized version of uninitialized_allocator_copy function.
2023-05-24 13:40:53 -07:00
Nico Weber
f938755a33 libcxx: Rename .hpp files in libcxx/benchmarks to .h
LLVM uses .h as its extension for header files.

Differential Revision: https://reviews.llvm.org/D66509

llvm-svn: 369487
2019-08-21 01:59:12 +00:00
Eric Fiselier
d4ace50ed0 Fix PR35637: suboptimal codegen for vector<unsigned char>.
The optimizer is petulant and temperamental. In this case LLVM failed to lower
the the "insert at end" loop used by`vector<unsigned char>` to a `memset` despite
`memset` being substantially faster over a range of bytes.

LLVM has the ability to lower loops to `memset` whet appropriate, but the
odd nature of libc++'s loops prevented the optimization from taking places.

This patch addresses the issue by rewriting the loops from the form
`do [ ... --__n; } while (__n > 0);` to instead use a for loop over a pointer
range (For example: `for (auto *__i = ...; __i < __e; ++__i)`).

This patch also rewrites the asan annotations to unposion all additional memory
at the start of the loop instead of once per iterations. This could potentially
permit false negatives where the constructor of element N attempts to access
element N + 1 during its construction.

The before and after results for the `BM_ConstructSize/vector_byte/5140480_mean`
benchmark (run 5 times) are:

--------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations
--------------------------------------------------------------------------------------------
Before
------
BM_ConstructSize/vector_byte/5140480_mean          12530140 ns     12469693 ns            N/A
BM_ConstructSize/vector_byte/5140480_median        12512818 ns     12445571 ns            N/A
BM_ConstructSize/vector_byte/5140480_stddev          106224 ns       107907 ns            5
-----
After
-----
BM_ConstructSize/vector_byte/5140480_mean            167285 ns       166500 ns            N/A
BM_ConstructSize/vector_byte/5140480_median          166749 ns       166069 ns            N/A
BM_ConstructSize/vector_byte/5140480_stddev            3242 ns         3184 ns            5

llvm-svn: 367183
2019-07-28 04:37:02 +00:00
Eric Fiselier
1903976d37 Update Google Benchmark library
llvm-svn: 322812
2018-01-18 04:23:01 +00:00
Eric Fiselier
0e100998fc Start adding benchmarks for vector
llvm-svn: 276552
2016-07-24 06:51:55 +00:00