llvm-project

Author	SHA1	Message	Date
Louis Dionne	348e74139a	[libc++][NFC] Run clang-format on libcxx/include This re-formats a few headers that had become out-of-sync with respect to formatting since we ran clang-format on the whole codebase. There's surprisingly few instances of it.	2024-08-30 12:09:36 -04:00
Mark de Wever	59e66c515a	[libc++][format] Switches to Unicode 15.1. (#86543) In addition to changes in the tables the extended grapheme clustering algorithm has been overhauled. Before I considered a separate state machine to implement the rules. With the new rule GB9c this became more attractive and the design has changed. This change initially had quite an impact on the performance. By making the state machine persistent the performance was improved greatly. Note it is still slower than before due to the larger Unicode tables.
Before
--------------------------------------------------------------------
Benchmark                       Time          CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>          1891 ns      1889 ns       369504
BM_unicode_text<char>      106642 ns    106397 ns         6576
BM_cyrillic_text<char>      73420 ns     73277 ns         9445
BM_japanese_text<char>      62485 ns     62387 ns        11153
BM_emoji_text<char>          1895 ns      1893 ns       369525
BM_ascii_text<wchar_t>       2015 ns      2013 ns       346887
BM_unicode_text<wchar_t>    92119 ns     92017 ns         7598
BM_cyrillic_text<wchar_t>   62637 ns     62568 ns        11117
BM_japanese_text<wchar_t>   53850 ns     53785 ns        12803
BM_emoji_text<wchar_t>       2016 ns      2014 ns       347325
After
--------------------------------------------------------------------
Benchmark                       Time          CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>          1906 ns      1904 ns       369409
BM_unicode_text<char>      265462 ns    265175 ns         2628
BM_cyrillic_text<char>     181063 ns    180865 ns         3871
BM_japanese_text<char>     130927 ns    130789 ns         5324
BM_emoji_text<char>          1892 ns      1890 ns       370537
BM_ascii_text<wchar_t>       2038 ns      2035 ns       343689
BM_unicode_text<wchar_t>   277603 ns    277282 ns         2526
BM_cyrillic_text<wchar_t>  188558 ns    188339 ns         3727
BM_japanese_text<wchar_t>  133084 ns    132943 ns         5262
BM_emoji_text<wchar_t>       2012 ns      2010 ns       348015
Persistent
--------------------------------------------------------------------
Benchmark                       Time          CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>          1904 ns      1899 ns       367472
BM_unicode_text<char>      133609 ns    133287 ns         5246
BM_cyrillic_text<char>      90185 ns     89941 ns         7796
BM_japanese_text<char>      75137 ns     74946 ns         9316
BM_emoji_text<char>          1906 ns      1901 ns       368081
BM_ascii_text<wchar_t>       2703 ns      2696 ns       259153
BM_unicode_text<wchar_t>   131497 ns    131168 ns         5341
BM_cyrillic_text<wchar_t>   87071 ns     86840 ns         8076
BM_japanese_text<wchar_t>   72279 ns     72099 ns         9682
BM_emoji_text<wchar_t>       2021 ns      2016 ns       346767	2024-04-09 19:20:06 +02:00
Konstantin Varlamov	4f215fdd62	[libc++][hardening] Categorize more assertions. (#75918) Also introduce `_LIBCPP_ASSERT_PEDANTIC` for assertions violating which results in a no-op or other benign behavior, but which may nevertheless indicate a bug in the invoking code.	2024-01-05 16:29:23 -08:00
Louis Dionne	9783f28cbb	[libc++] Format the code base (#74334) This patch runs clang-format on all of libcxx/include and libcxx/src, in accordance with the RFC discussed at [1]. Follow-up patches will format the benchmarks, the test suite and remaining parts of the code. I'm splitting this one into its own patch so the diff is a bit easier to review. This patch was generated with: find libcxx/include libcxx/src -type f | grep -v 'module.modulemap.in' | grep -v 'CMakeLists.txt' | grep -v 'README.txt' | grep -v 'libcxx.imp' | grep -v '__config_site.in' | xargs clang-format -i A Git merge driver is available in libcxx/utils/clang-format-merge-driver.sh to help resolve merge and rebase issues across these formatting changes. [1]: https://discourse.llvm.org/t/rfc-clang-formatting-all-of-libc-once-and-for-all	2023-12-18 14:01:33 -05:00
Mark de Wever	285e1e2a00	[libc++][format] Removes unneeded includes. I did a manual review after the post-review comments in D149543 Reviewed By: #libc, philnik, ldionne Differential Revision: https://reviews.llvm.org/D154122	2023-07-08 12:39:33 +02:00
varconst	cd0ad4216c	[libc++][hardening][NFC] Introduce `_LIBCPP_ASSERT_UNCATEGORIZED`. Replace most uses of `_LIBCPP_ASSERT` with `_LIBCPP_ASSERT_UNCATEGORIZED`. This is done as a prerequisite to introducing hardened mode to libc++. The idea is to make enabling assertions an opt-in with (somewhat) fine-grained controls over which categories of assertions are enabled. The vast majority of assertions are currently uncategorized; the new macro will allow turning on `_LIBCPP_ASSERT` (the underlying mechanism for all kinds of assertions) without enabling all the uncategorized assertions (in the future; this patch preserves the current behavior). Differential Revision: https://reviews.llvm.org/D153816	2023-06-28 15:10:31 -07:00
Ian Anderson	d5ce68afdf	[libc++] __iterator/readable_traits.h isn't standalone `__iterator/readable_traits.h` can't be used by itself, instantiating `iter_value_t` requires `__iterator/iterator_traits.h`. `readable_traits.h` can't include `iterator_traits.h` though because `iterator_traits.h` requires `readable_traits.h`. Move `iter_value_t` to `__iterator/iterator_traits.h` so that both headers can work standalone. Reviewed By: Mordante, #libc Differential Revision: https://reviews.llvm.org/D153828	2023-06-27 10:52:08 -07:00
Mark de Wever	09addf9cbe	[libc++][format] Fixes UTF-8 continuation. The mask used to check whether a code unit is a valid continuation was incorrect and accepted non-continuation code points. This fixes the issue. Reviewed By: ldionne, tahonermann, #libc Differential Revision: https://reviews.llvm.org/D149672	2023-06-20 19:28:02 +02:00
Mark de Wever	c866855b42	[libc++][format] Improves Unicode decoders. During the implementation of P2286 a second Unicode decoder was added. The original decoder was only used for the width estimation. Changing an ill-formed Unicode sequence to the replacement character works properly for this use case. For P2286 an ill-formed Unicode sequence needs to be formatted as a sequence of code units. The exact wording in the Standard was a bit unclear and there was an odd example in the WP. This made it hard to use the same decoder. SG16 determined the odd example in the WP was a bug and this has been fixed in the WP. This made it possible to combine the two decoders. The P2286 decoder kept track of the size of the ill-formed sequence. However this was not needed since the output algorithm needs to keep track of the size of a well-formed and an ill-formed sequence. So this feature has been removed. The error status remains since it's needed for P2286; the grapheme clustering can ignore this unneeded value. (In general, grapheme clustering only has specified behaviour for Unicode. When the string is in a non-Unicode encoding there are no requirements. Ill-formed Unicode is a non-Unicode encoding. Still libc++ does a best effort estimation.) The UTF-8 decoder accepted several ill-formed sequences: - Values in the surrogate range U+D800..U+DFFF. - Values encoded in more code units than required, for example U+0020 can in theory be encoded using 1, 2, 3, or 4 code units, and all of these were accepted. This is not allowed by the Unicode Standard. - Values larger than U+10FFFF were not always rejected. Reviewed By: #libc, ldionne, tahonermann, Mordante Differential Revision: https://reviews.llvm.org/D144346	2023-03-08 22:01:49 +01:00
Nikolas Klauser	40a20ae6ab	[libc++] Granularize <bit> includes Reviewed By: ldionne, #libc Spies: libcxx-commits Differential Revision: https://reviews.llvm.org/D141228	2023-02-17 11:36:19 +01:00
Nikolas Klauser	4f15267d3d	[libc++][NFC] Replace _LIBCPP_STD_VER > x with _LIBCPP_STD_VER >= x This change is almost fully mechanical. The only interesting change is in `generate_feature_test_macro_components.py` to generate `_LIBCPP_STD_VER >=` instead. To avoid churn in the git-blame this commit should be added to the `.git-blame-ignore-revs` once committed. Reviewed By: ldionne, var-const, #libc Spies: jloser, libcxx-commits, arichardson, arphaman, wenlei Differential Revision: https://reviews.llvm.org/D143962	2023-02-15 16:52:25 +01:00
Louis Dionne	1562e51491	[libc++] Don't assume that string_view::const_iterator is a raw pointer Our implementation of std::format assumed that string_view's iterators were raw pointers in various places. If we want to introduce a checked iterator in debug mode, that won't be true anymore. This patch removes that assumption. Differential Revision: https://reviews.llvm.org/D138795	2023-01-30 10:19:32 -05:00
Nikolas Klauser	1f5d698a8b	[libc++] Add missing include in __format/unicode.h	2023-01-09 16:37:31 +01:00
Louis Dionne	5935db6ebd	[libc++] Fix incorrect guard against the presence of wide characters TEST_HAS_NO_WIDE_CHARACTERS should only be used in the tests. Differential Revision: https://reviews.llvm.org/D138828	2022-11-28 14:33:49 -08:00
Nikolas Klauser	3574b800cf	[libc++][clang-tidy] Enable readability-simplify-boolean-expr Reviewed By: ldionne, #libc Spies: Eugene.Zelenko, aheejin, libcxx-commits, xazax.hun Differential Revision: https://reviews.llvm.org/D137804	2022-11-24 00:42:19 +01:00
Mark de Wever	a48007355a	[libc++][format] Implements string escaping. Implements parts of - P2286R8 Formatting Ranges Reviewed By: #libc, tahonermann Differential Revision: https://reviews.llvm.org/D134036	2022-10-20 17:29:34 +02:00
Louis Dionne	c2df707666	[libc++] Suppress -Wctad-maybe-unsupported on types w/o deduction guides There are a handful of standard library types that are intended to support CTAD but don't need any explicit deduction guides to do so. This patch adds a dummy deduction guide to those types to suppress -Wctad-maybe-unsupported (which gets emitted in user code). This is a re-application of the original patch by Eric Fiselier in fcd549a7d828 which had been reverted due to reasons lost at this point. I also added the macro to a few more types. Reviving this patch was prompted by the discussion on https://llvm.org/D133425. Differential Revision: https://reviews.llvm.org/D133535	2022-10-03 14:05:08 -04:00
Mark de Wever	4db55a459e	[libc++][format] Adhere to clang-tidy style. D126971 broke the CI due to recent changes in the clang-tidy settings. This fixes them.	2022-07-21 17:33:27 +02:00
Mark de Wever	857a78c04d	[libc++] Implements Unicode grapheme clustering This implements the Grapheme clustering as required by P1868R2 "width: clarifying units of width and precision in std::format". This was omitted in the initial patch, but the paper was marked as completed. This really completes the paper. Reviewed By: ldionne, #libc Differential Revision: https://reviews.llvm.org/D126971	2022-07-20 18:38:32 +02:00

19 Commits