This patch does two things.
(1) It replaces SFINAE with `if constexpr`, avoiding some overload
resolution and unnecessary boilerplate.
(2) It removes an overload from `__for_each_n` to forward to
`__for_each`, since `__for_each` doesn't provide any further
optimizations.
Part of #102817.
This patch attempts to optimize the performance of `std::generate` for
segmented iterators. Below are the benchmark numbers from
`libcxx\test\benchmarks\algorithms\modifying\generate.bench.cpp`. Test
cases that use segmented iterators have also been added.
- before
```
std::generate(deque<int>)/32 194 ns 193 ns 3733333
std::generate(deque<int>)/50 276 ns 276 ns 2488889
std::generate(deque<int>)/1024 5096 ns 5022 ns 112000
std::generate(deque<int>)/8192 40806 ns 40806 ns 17231
```
- after
```
std::generate(deque<int>)/32 106 ns 105 ns 6400000
std::generate(deque<int>)/50 139 ns 138 ns 4977778
std::generate(deque<int>)/1024 2713 ns 2699 ns 248889
std::generate(deque<int>)/8192 18983 ns 19252 ns 37333
```
---------
Co-authored-by: A. Jiang <de34@live.cn>
Currently, `size_t` and `__libcpp_is_constant_evaluated` are obtained by
transitive inclusion in `<__algorithm/find.h>` when `<cwchar>` is not
included. This broke module build when `_LIBCPP_HAS_WIDE_CHARACTERS` is
`1` and caused CI failure. We should explicitly include the
corresponding internal headers.
This patch adds additional overloads to `map::at` in case its known that
the argument is transparently comparable to the key type. This avoids
actually constructing the key type in some cases, potentially removing
allocations.
```
--------------------------------------------------------
Benchmark old new
--------------------------------------------------------
BM_map_find_string_literal 12.8 ns 2.68 ns
```
`static` is not allowed inside constexpr functions in C++ versions below
23. This is causing a build error when compiled with GCC and warning
when compiled with Clang. This patch fixes this.
---------
Co-authored-by: A. Jiang <de34@live.cn>
We've built up quite a few links directly to github within the code
base. We should instead use `llvm.org/PR<issue-number>` to link to bugs,
since that is resilient to the bug tracker changing in the future. This
is especially relevant for tests linking to bugs, since they will
probably be there for decades to come. A nice side effect is that these
links are significantly shorter than the GH links, making them much less
of an eyesore.
This patch also replaces a few links that linked to the old bugzilla
instance on llvm.org.
In #135303, we started using `__bit_log2` instead of `__log2i` inside
`std::sort`. However, `__bit_log2` has a precondition that `__log2i`
didn't have, which is that the input is non-zero. While it technically
makes no sense to request the logarithm of 0, `__log2i` handled that
case and returned 0 without issues.
After switching to `__bit_log2`, passing 0 as an input results in an
unsigned integer overflow which can trigger
`-fsanitize=unsigned-integer-overflow`. While not technically UB in
itself, it's clearly not intended either.
To fix this, we add an internal assertion to `__bit_log2` which catches
the issue in our test suite, and we make sure not to violate
`__bit_log2`'s preconditions before we call it from `std::sort`.
Libc++'s policy is to support only the latest released Xcode, which is
Xcode 16.x. We did update our CI jobs to Xcode 16.x, but we forgot to
update the documentation, which still mentioned Xcode 15. This patch
updates the documentation and cleans up outdated mentions of
apple-clang-15 in the test suite.
Previously, the segmented iterator optimization was limited to `std::{for_each, for_each_n}`. This patch
extends the optimization to `std::ranges::for_each` and `std::ranges::for_each_n`, ensuring consistent
optimizations across these algorithms. This patch first generalizes the `std` algorithms by introducing
a `Projection` parameter, which is set to `__identity` for the `std` algorithms. Then we let the `ranges`
algorithms to directly call their `std` counterparts with a general `__proj` argument. Benchmarks
demonstrate performance improvements of up to 21.4x for ``std::deque::iterator`` and 22.3x for
``join_view`` of ``vector<vector<char>>``.
Addresses a subtask of #102817.
This reverts commit c861fe8a71e64f3d2108c58147e7375cd9314521.
Unfortunately, this use of hidden visibility attributes causes
user-defined specializations of standard-library types to also be marked
hidden by default, which is incorrect. See discussion thread on #131156.
...and also reverts the follow-up commits:
Revert "[libc++] Add explicit ABI annotations to functions from the block runtime declared in <__functional/function.h> (#140592)"
This reverts commit 3e4c9dc299c35155934688184319d391b298fff7.
Revert "[libc++] Make ABI annotations explicit for windows-specific code (#140507)"
This reverts commit f73287e623a6c2e4a3485832bc3e10860cd26eb5.
Revert "[libc++][NFC] Replace a few "namespace std" with the correct macro (#140510)"
This reverts commit 1d411f27c769a32cb22ce50b9dc4421e34fd40dd.
This patch enhances the performance of `std::for_each_n` when used with
segmented iterators, leading to significant performance improvements,
summarized in the tables below. This addresses a subtask of
https://github.com/llvm/llvm-project/issues/102817.
This patch introduces `_LIBCPP_{BEGIN,END}_EXPLICIT_ABI_ANNOTATIONS`,
which allow us to have implicit annotations for most functions, and just
where it's not "hide_from_abi everything" we add explicit annotations.
This allows us to drop the `_LIBCPP_HIDE_FROM_ABI` macro from most
functions in libc++.
Previously, the segmented iterator optimization for `std::for_each` was restricted to C++23 and later due to its dependency on `__movable_box`, which is not available in earlier standards. This patch eliminates that restriction, enabling consistent optimizations starting from C++11.
By backporting this enhancement, we improve performance across older standards and create opportunities to extend similar optimizations to other algorithms by forwarding their calls to `std::for_each`.
`__libcpp_{ctz, clz}` were previously used as fallbacks for `__builtin_{ctzg, clzg}` to ensure compatibility with older compilers (Clang 18 and earlier), as `__builtin_{ctzg, clzg}` became available in Clang 19. Now that support for Clang 18 has been officially dropped in #130142, we can now safely replace all instances of `__libcpp_{ctz, clz}` with `__count{l,r}_zero` (which internally call `__builtin_{ctzg, clzg}` and eliminate the fallback logic.
Closes#131179.
The `__log2i` function template in `<algorithm/sort.h>` is basically
equivalent to `__bit_log2` in `<__bit/bit_log2.h>`. It seems better to
avoid duplication.
# Overview
As a disclaimer, this is my first PR to LLVM and while I've tried to
ensure I've followed the LLVM and libc++ contributing guidelines,
there's probably a good chance I missed something. If I have, just let
me know and I'll try to correct it as soon as I can.
This PR implements `std::ranges::iota` and
`std::ranges::out_value_result` outlined in
[P2440r1](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2440r1.html).
As outlined in the paper above, I've:
- Implemented `out_value_result` and added to `<algorithm>`
- Added `out_value_result`, `iota_result`, and two overloads of `iota`
to `std::ranges` in `<numeric>`
- Updated the version macro `__cpp_lib_ranges_iota` in `<version>`
I've also added tests for `ranges::iota` and `ranges::out_value_result`.
Lastly, I added those structs to the appropriate module files.
Partially implements #105184
EDIT: Forgot to mention in the original post, thanks to @hawkinsw for
taking a look at a preliminary version of this PR!
# TODOs
- [x] Updating the range [status
doc](https://github.com/jamesETsmith/llvm-project/blob/main/libcxx/docs/Status/RangesMajorFeatures.csv)
- [x] Ensure all comments from https://reviews.llvm.org/D121436 are
addressed here
- [X] EDIT (I'll do this in a separate PR). ~~I'm open to implementing
the rest of P2440r1 (`ranges::shift_left` and `ranges::shift_right`) if
that's ok, I just wanted to get feedback on `ranges::iota` first~~
- [x] I've been having trouble building the modules locally and want to
make sure that's working properly
Closes: #134060
Previously, ranges::min_element delegated to ranges::__min_element_impl, which
duplicated the definition of std::__min_element. This patch updates
ranges::min_element to directly call std::__min_element, which allows
removing the redundant code in ranges::__min_element_impl.
Upon removal of ranges::__min_element_impl, the other ranges algorithms
ranges::{min,max,max_element}, which previously delegated to ranges::__min_element_impl,
have been updated to call std::__min_element instead.
This refactoring unifies the implementation across these algorithms,
ensuring that future optimizations or maintenance work only need to be
applied in one place.
Previously, const and ref qualification on an operation would cause
__desugars_to to report false, which would lead to unnecessary
pessimizations. The same holds for reference_wrapper.
In practice, const and ref qualifications on the operation itself are
not relevant to determining whether an operation desugars to something
else or not, so can be ignored.
We are not stripping volatile qualifiers from operations in this patch
because we feel that this requires additional discussion.
Fixes#129312
Drive-by changes:
- Consistently mark `std::__inplace_merge::__inplace_merge_impl`
`_LIBCPP_CONSTEXPR_SINCE_CXX26`.
- This function template is only called by other functions that becomes
constexpr since C++26, and it itself calls `std::__inplace_merge` that
is constexpr since C++26.
- Unblock related test coverage in constant evaluation for
`stable_partition`, `ranges::stable_sort`, `std::stable_sort`,
`std::stable_partition`, and `std::inplace_merge`.
This patch fixes an issue in libc++ where `std::copy_backward` and
`ranges::copy_backward` incorrectly copy `std::vector<bool>` with small
storage types (e.g., `uint8_t`, `uint16_t`). The problem arises from
flawed bit mask computations involving integral promotions and sign-bit
extension, leading to unintended zeroing of bits. This patch corrects
the bitwise operations to ensure correct bit-level copying.
Fixes#131718.
The current implementation of `{std, ranges}::copy` fails to copy
`vector<bool>` correctly when the underlying storage type
(`__storage_type`) is smaller than `int`, such as `unsigned char`,
`unsigned short`, `uint8_t` and `uint16_t`. The root cause is that the
unsigned small storage type undergoes integer promotion to (signed)
`int`, which is then left and right shifted, leading to UB (before
C++20) and sign-bit extension (since C++20) respectively. As a result,
the underlying bit mask evaluations become incorrect, causing erroneous
copying behavior.
This patch resolves the issue by correcting the internal bitwise
operations, ensuring that `{std, ranges}::copy` operates correctly for
`vector<bool>` with any custom (unsigned) storage types.
Fixes#131692.
The current implementation of `{std, ranges}::equal` fails to correctly
compare `vector<bool>`s when the underlying storage type is smaller than
`int` (e.g., `unsigned char`, `unsigned short`, `uint8_t` and
`uint16_t`). See [demo](https://godbolt.org/z/j4s87s6b3)). The problem
arises due to integral promotions on the intermediate bitwise
operations, leading to incorrect final equality comparison results. This
patch fixes the issue by ensuring that `{std, ranges}::equal` operate
properly for both aligned and unaligned bits.
Fixes#126369.
This PR fixes an ambiguous call encountered while using the
`std::ranges::count` and `std::count` algorithms with `vector<bool>`
with small `size_type`s.
The ambiguity arises from integral promotions during the internal
bitwise arithmetic of the `count` algorithms for small integral types.
This results in multiple viable candidates:
`__libcpp_popcount(unsigned)`,` __libcpp_popcount(unsigned long)`, and
`__libcpp_popcount(unsigned long long)`, leading to an ambiguous call
error. To resolve this ambiguity, we introduce a dispatcher function,
`__popcount`, which directs calls to the appropriate overloads of
`__libcpp_popcount`. This closes#122528.
This PR fixes an ambiguous call encountered when using the `std::ranges::find` or `std::find`
algorithms with `vector<bool>` with small `allocator_traits::size_type`s, an issue reported
in #122528. The ambiguity arises from integral promotions during the internal bitwise
arithmetic of the `find` algorithms when applied to `vector<bool>` with small integral
`size_type`s. This leads to multiple viable candidates for small integral types:
__libcpp_ctz(unsigned), __libcpp_ctz(unsigned long), and __libcpp_ctz(unsigned long long),
none of which represent a single best viable match, resulting in an ambiguous call error.
To resolve this, we propose invoking an internal function __countr_zero as a dispatcher
that directs the call to the appropriate overload of __libcpp_ctz. Necessary amendments
have also been made to __countr_zero.
This PR optimizes the performance of `std::ranges::rotate` for
`vector<bool>::iterator`. The optimization yields a performance
improvement of up to 2096x.
Closes#64038.
Drive-by: Enables test coverage for `ranges::stable_sort` with proxy
iterators, and changes "constexpr in" to "constexpr since" in comments
in `<algorithm>`.
This PR optimizes the performance of `std::ranges::swap_ranges` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to **611x** for
aligned range swap and **78x** for unaligned range swap comparison.
Additionally, comprehensive tests covering up to 4 storage words (256
bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
Drive-by changes:
- Enables no-memory case for Clang.
- Enables `robust_re_difference_type.compile.pass.cpp` and
`robust_against_proxy_iterators_lifetime_bugs.pass.cpp` test coverage
for `std::stable_sort` in constant evaluation since C++26. The changes
were missing in the PR making `std::stable_sort` `constexpr`.
Previously the wrong detection macro has been used to check whether arm
NEON is available. This fixes it, and removes a few unnecessary includes
from `__algorithm/simd_utils.h` as a drive-by.
This PR optimizes the performance of `std::ranges::equal` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to 188x for
aligned equality comparison and 82x for unaligned equality
comparison. Moreover, comprehensive tests covering up to 4 storage words
(256 bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
This is technically not necessary in most cases to prevent issues with ADL,
but let's be consistent. This allows us to remove the libcpp-qualify-declval
clang-tidy check, which is now enforced by the robust-against-adl clang-tidy check.
As a follow-up to #121013 (which optimized `ranges::copy`) and #121026
(which optimized `ranges::copy_backward`), this PR enhances the
performance of `std::ranges::{move, move_backward}` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations bring performance improvements analogous to those
achieved for the `{copy, copy_backward}` algorithms: up to 2000x for
aligned moves and 60x for unaligned moves. Moreover, comprehensive
tests covering up to 4 storage words (256 bytes) with odd and even bit
sizes are provided, which validate the proposed optimizations in this
patch.
This PR addresses an undefined behavior that arises when using the
`std::fill` and `std::fill_n` algorithms, as well as their ranges
counterparts `ranges::fill` and `ranges::fill_n`, with `vector<bool, Alloc>`
that utilizes a custom-sized allocator with small integral types.