Adrian Vogelsgesang f02b661054
[libc++] Add move constructor & assignment to exception_ptr (#164281)
This commit adds move constructor, move assignment and `swap`
to `exception_ptr`. Adding those operators allows us to avoid
unnecessary calls to `__cxa_{inc,dec}rement_refcount`.

Performance results (from libc++'s CI):

```
Benchmark                               Baseline    Candidate    Difference    % Difference
------------------------------------  ----------  -----------  ------------  --------------
bm_exception_ptr_copy_assign_nonnull        9.77         9.94          0.18           1.79%
bm_exception_ptr_copy_assign_null          10.29        10.65          0.35           3.42%
bm_exception_ptr_copy_ctor_nonnull          7.02         7.01         -0.01          -0.13%
bm_exception_ptr_copy_ctor_null            10.54        10.60          0.06           0.56%
bm_exception_ptr_move_assign_nonnull       16.92        13.76         -3.16         -18.70%
bm_exception_ptr_move_assign_null          10.61        10.76          0.14           1.36%
bm_exception_ptr_move_ctor_nonnull         13.31        10.25         -3.06         -23.02%
bm_exception_ptr_move_ctor_null            10.28         7.30         -2.98         -28.95%
bm_exception_ptr_swap_nonnull              19.22         0.63        -18.59         -96.74%
bm_exception_ptr_swap_null                 20.02         7.79        -12.23         -61.07%
```

As expected, the `bm_exception_ptr_copy_*` benchmarks are not influenced by
this change. `bm_exception_ptr_move_*` benefits between 18% and 30%. The
`bm_exception_ptr_swap_*` tests show the biggest improvements since multiple
calls to the copy constructor are replaced by a simple pointer swap.

While `bm_exception_ptr_move_assign_null` did not show a regression in the CI
measurements, local measurements showed a regression from 3.98 to 4.71, i.e. by
18%. This is due to the additional `__tmp` inside `operator=`. The destructor
of `__other` is a no-op after the move because `__other.__ptr` will be a
nullptr. However, the compiler does not realize this, since the destructor is
not inlined and is lacking a fast-path. As such, the swap-based implementation
leads to an additional destructor call. `bm_exception_ptr_move_assign_nonnull`
still benefits because the swap-based move constructor avoids unnecessary
__cxa_{in,de}crement_refcount calls. As soon as we inline the destructor, this
regression should disappear again.

Works towards #44892
2025-11-03 21:18:43 +01:00
..