19 Commits

Author SHA1 Message Date
Louis Dionne
348e74139a [libc++][NFC] Run clang-format on libcxx/include
This re-formats a few headers that had become out-of-sync with respect
to formatting since we ran clang-format on the whole codebase. There's
surprisingly few instances of it.
2024-08-30 12:09:36 -04:00
Mark de Wever
59e66c515a
[libc++][format] Switches to Unicode 15.1. (#86543)
In addition to changes in the tables the extended grapheme clustering
algorithm has been overhauled. Before I considered a separate state
machine to implement the rules. With the new rule GB9c this became more
attractive and the design has changed.

This change initially had quite an impact on the performance. By making
the state machine persistent the performance was improved greatly. Note
it is still slower than before due to the larger Unicode tables.

Before
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>             1891 ns         1889 ns       369504
BM_unicode_text<char>         106642 ns       106397 ns         6576
BM_cyrillic_text<char>         73420 ns        73277 ns         9445
BM_japanese_text<char>         62485 ns        62387 ns        11153
BM_emoji_text<char>             1895 ns         1893 ns       369525
BM_ascii_text<wchar_t>          2015 ns         2013 ns       346887
BM_unicode_text<wchar_t>       92119 ns        92017 ns         7598
BM_cyrillic_text<wchar_t>      62637 ns        62568 ns        11117
BM_japanese_text<wchar_t>      53850 ns        53785 ns        12803
BM_emoji_text<wchar_t>          2016 ns         2014 ns       347325

After
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>             1906 ns         1904 ns       369409
BM_unicode_text<char>         265462 ns       265175 ns         2628
BM_cyrillic_text<char>        181063 ns       180865 ns         3871
BM_japanese_text<char>        130927 ns       130789 ns         5324
BM_emoji_text<char>             1892 ns         1890 ns       370537
BM_ascii_text<wchar_t>          2038 ns         2035 ns       343689
BM_unicode_text<wchar_t>      277603 ns       277282 ns         2526
BM_cyrillic_text<wchar_t>     188558 ns       188339 ns         3727
BM_japanese_text<wchar_t>     133084 ns       132943 ns         5262
BM_emoji_text<wchar_t>          2012 ns         2010 ns       348015

Persistent
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>             1904 ns         1899 ns       367472
BM_unicode_text<char>         133609 ns       133287 ns         5246
BM_cyrillic_text<char>         90185 ns        89941 ns         7796
BM_japanese_text<char>         75137 ns        74946 ns         9316
BM_emoji_text<char>             1906 ns         1901 ns       368081
BM_ascii_text<wchar_t>          2703 ns         2696 ns       259153
BM_unicode_text<wchar_t>      131497 ns       131168 ns         5341
BM_cyrillic_text<wchar_t>      87071 ns        86840 ns         8076
BM_japanese_text<wchar_t>      72279 ns        72099 ns         9682
BM_emoji_text<wchar_t>          2021 ns         2016 ns       346767
2024-04-09 19:20:06 +02:00
Konstantin Varlamov
4f215fdd62
[libc++][hardening] Categorize more assertions. (#75918)
Also introduce `_LIBCPP_ASSERT_PEDANTIC` for assertions violating which
results in a no-op or other benign behavior, but which may nevertheless
indicate a bug in the invoking code.
2024-01-05 16:29:23 -08:00
Louis Dionne
9783f28cbb
[libc++] Format the code base (#74334)
This patch runs clang-format on all of libcxx/include and libcxx/src, in
accordance with the RFC discussed at [1]. Follow-up patches will format
the benchmarks, the test suite and remaining parts of the code. I'm
splitting this one into its own patch so the diff is a bit easier to
review.

This patch was generated with:

   find libcxx/include libcxx/src -type f \
      | grep -v 'module.modulemap.in' \
      | grep -v 'CMakeLists.txt' \
      | grep -v 'README.txt' \
      | grep -v 'libcxx.imp' \
      | grep -v '__config_site.in' \
      | xargs clang-format -i

A Git merge driver is available in libcxx/utils/clang-format-merge-driver.sh
to help resolve merge and rebase issues across these formatting changes.

[1]: https://discourse.llvm.org/t/rfc-clang-formatting-all-of-libc-once-and-for-all
2023-12-18 14:01:33 -05:00
Mark de Wever
285e1e2a00 [libc++][format] Removes unneeded includes.
I did a manual review after the post-review comments in D149543

Reviewed By: #libc, philnik, ldionne

Differential Revision: https://reviews.llvm.org/D154122
2023-07-08 12:39:33 +02:00
varconst
cd0ad4216c [libc++][hardening][NFC] Introduce _LIBCPP_ASSERT_UNCATEGORIZED.
Replace most uses of `_LIBCPP_ASSERT` with
`_LIBCPP_ASSERT_UNCATEGORIZED`.

This is done as a prerequisite to introducing hardened mode to libc++.
The idea is to make enabling assertions an opt-in with (somewhat)
fine-grained controls over which categories of assertions are enabled.
The vast majority of assertions are currently uncategorized; the new
macro will allow turning on `_LIBCPP_ASSERT` (the underlying mechanism
for all kinds of assertions) without enabling all the uncategorized
assertions (in the future; this patch preserves the current behavior).

Differential Revision: https://reviews.llvm.org/D153816
2023-06-28 15:10:31 -07:00
Ian Anderson
d5ce68afdf [libc++] __iterator/readable_traits.h isn't standalone
`__iterator/readable_traits.h` can't be used by itself, intantiating `iter_value_t` requires `__iterator/iterator_traits.h`. `readable_traits.h` can't include `iterator_traits.h` though because `iterator_traits.h` requires `readable_traits.h`.

Move `iter_value_t` to `__iterator/iterator_traits.h` so that both headers can work standalone.

Reviewed By: Mordante, #libc

Differential Revision: https://reviews.llvm.org/D153828
2023-06-27 10:52:08 -07:00
Mark de Wever
09addf9cbe [libc++][format] Fixes UTF-8 continuation.
The mask used to check whether a code unit is a valid continuation was
incorrect and accepts non-continuation code points. This fixes the
issue.

Reviewed By: ldionne, tahonermann, #libc

Differential Revision: https://reviews.llvm.org/D149672
2023-06-20 19:28:02 +02:00
Mark de Wever
c866855b42 [libc++][format] Improves Unicode decoders.
During the implementation of P2286 a second Unicode decoder was added.
The original decoder was only used for the width estimation. Changing
an ill-formed Unicode sequence to the replacement character, works
properly for this use case. For P2286 an ill-formed Unicode sequence
needs to be formatted as a sequence of code units. The exact wording in
the Standard as a bit unclear and there was odd example in the WP. This
made it hard to use the same decoder. SG16 determined the odd example in
the WP was a bug and this has been fixed in the WP.

This made it possible to combine the two decoders. The P2286 decoder
kept track of the size of the ill-formed sequence. However this was not
needed since the output algorithm needs to keep track of size of a
well-formed and an ill-formed sequence. So this feature has been
removed.

The error status remains since it's needed for P2286, the grapheme
clustering can ignore this unneeded value. (In general, grapheme
clustering is only has specified behaviour for Unicode. When the string
is in a non-Unicode encoding there are no requirements. Ill-formed
Unicode is a non-Unicode encoding. Still libc++ does a best effort
estimation.)

There UTF-8 decoder accepted several ill-formed sequences:
- Values in the surrogate range U+D800..U+DFFF.
- Values encoded in more code units than required, for example 0+0020
  in theory can be encoded using 1, 2, 3, or 4 were accepted. This is
  not allowed by the Unicode Standard.
- Values larger than U+10FFFF were not always rejected.

Reviewed By: #libc, ldionne, tahonermann, Mordante

Differential Revision: https://reviews.llvm.org/D144346
2023-03-08 22:01:49 +01:00
Nikolas Klauser
40a20ae6ab [libc++] Granularize <bit> includes
Reviewed By: ldionne, #libc

Spies: libcxx-commits

Differential Revision: https://reviews.llvm.org/D141228
2023-02-17 11:36:19 +01:00
Nikolas Klauser
4f15267d3d [libc++][NFC] Replace _LIBCPP_STD_VER > x with _LIBCPP_STD_VER >= x
This change is almost fully mechanical. The only interesting change is in `generate_feature_test_macro_components.py` to generate `_LIBCPP_STD_VER >=` instead. To avoid churn in the git-blame this commit should be added to the `.git-blame-ignore-revs` once committed.

Reviewed By: ldionne, var-const, #libc

Spies: jloser, libcxx-commits, arichardson, arphaman, wenlei

Differential Revision: https://reviews.llvm.org/D143962
2023-02-15 16:52:25 +01:00
Louis Dionne
1562e51491 [libc++] Don't assume that string_view::const_iterator is a raw pointer
Our implementation of std::format assumed that string_view's iterators
were raw pointers in various places. If we want to introduce a checked
iterator in debug mode, that won't be true anymore. This patch removes
that assumption.

Differential Revision: https://reviews.llvm.org/D138795
2023-01-30 10:19:32 -05:00
Nikolas Klauser
1f5d698a8b [libc++] Add missing include in __format/unicode.h 2023-01-09 16:37:31 +01:00
Louis Dionne
5935db6ebd [libc++] Fix incorrect guard against the presence of wide characters
TEST_HAS_NO_WIDE_CHARACTERS should only be used in the tests.

Differential Revision: https://reviews.llvm.org/D138828
2022-11-28 14:33:49 -08:00
Nikolas Klauser
3574b800cf [libc++][clang-tidy] Enable readability-simplify-boolean-expr
Reviewed By: ldionne, #libc

Spies: Eugene.Zelenko, aheejin, libcxx-commits, xazax.hun

Differential Revision: https://reviews.llvm.org/D137804
2022-11-24 00:42:19 +01:00
Mark de Wever
a48007355a [libc++][format] Implements string escaping.
Implements parts of
- P2286R8 Formatting Ranges

Reviewed By: #libc, tahonermann

Differential Revision: https://reviews.llvm.org/D134036
2022-10-20 17:29:34 +02:00
Louis Dionne
c2df707666 [libc++] Suppress -Wctad-maybe-unsupported on types w/o deduction guides
There are a handful of standard library types that are intended
to support CTAD but don't need any explicit deduction guides to
do so.

This patch adds a dummy deduction guide to those types to suppress
-Wctad-maybe-unsupported (which gets emitted in user code).

This is a re-application of the original patch by Eric Fiselier in
fcd549a7d828 which had been reverted due to reasons lost at this point.
I also added the macro to a few more types. Reviving this patch was
prompted by the discussion on https://llvm.org/D133425.

Differential Revision: https://reviews.llvm.org/D133535
2022-10-03 14:05:08 -04:00
Mark de Wever
4db55a459e [libc++][format] Adhere to clang-tidy style.
D126971 broke the CI due to recent changes in the clang-tidy settings.
This fixes them.
2022-07-21 17:33:27 +02:00
Mark de Wever
857a78c04d [libc++] Implements Unicode grapheme clustering
This implements the Grapheme clustering as required by
P1868R2 width: clarifying units of width and precision in std::format

This was omitted in the initial patch, but the paper was marked as completed. This really completes the paper.

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D126971
2022-07-20 18:38:32 +02:00