llvm-project

Author	SHA1	Message	Date
Mark de Wever	5db033e204	[libc++][format] Improves fill character. The main change is to allow a UCS scalar value as fill character. Especially for char based formatting this increases the number of valid characters. Originally this was expected to be ABI breaking, however the current change does not seem to break the ABI. Implements - P2572 std::format() fill character allowances Depends on D144499 Reviewed By: ldionne, tahonermann, #libc Differential Revision: https://reviews.llvm.org/D144742	2023-05-19 17:20:50 +02:00
Nikolas Klauser	eb65912e41	[libc++] Move __errc to __system_error/errc.h This file was added before we started granularizing the headers, but is essentially just a granularized header. This moves the header to the correct place. Reviewed By: #libc, EricWF Spies: libcxx-commits, arichardson, mikhail.ramalho Differential Revision: https://reviews.llvm.org/D146395	2023-04-10 19:23:42 +02:00
Mark de Wever	57e20cab5a	[libc++][format] Use granularized charconv. This reduces the number of transitive includes when using format. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D146240	2023-04-07 17:49:21 +02:00
Mark de Wever	c866855b42	[libc++][format] Improves Unicode decoders. During the implementation of P2286 a second Unicode decoder was added. The original decoder was only used for the width estimation. Changing an ill-formed Unicode sequence to the replacement character works properly for this use case. For P2286 an ill-formed Unicode sequence needs to be formatted as a sequence of code units. The exact wording in the Standard was a bit unclear and there was an odd example in the WP. This made it hard to use the same decoder. SG16 determined the odd example in the WP was a bug and this has been fixed in the WP. This made it possible to combine the two decoders. The P2286 decoder kept track of the size of the ill-formed sequence. However this was not needed since the output algorithm needs to keep track of the size of a well-formed and an ill-formed sequence. So this feature has been removed. The error status remains since it's needed for P2286; the grapheme clustering can ignore this unneeded value. (In general, grapheme clustering only has specified behaviour for Unicode. When the string is in a non-Unicode encoding there are no requirements. Ill-formed Unicode is a non-Unicode encoding. Still libc++ does a best effort estimation.) The UTF-8 decoder accepted several ill-formed sequences: - Values in the surrogate range U+D800..U+DFFF. - Values encoded in more code units than required, for example U+0020, which in theory can be encoded using 1, 2, 3, or 4 code units, was accepted in all of these forms. This is not allowed by the Unicode Standard. - Values larger than U+10FFFF were not always rejected. Reviewed By: #libc, ldionne, tahonermann, Mordante Differential Revision: https://reviews.llvm.org/D144346	2023-03-08 22:01:49 +01:00
Nikolas Klauser	4f15267d3d	[libc++][NFC] Replace _LIBCPP_STD_VER > x with _LIBCPP_STD_VER >= x This change is almost fully mechanical. The only interesting change is in `generate_feature_test_macro_components.py` to generate `_LIBCPP_STD_VER >=` instead. To avoid churn in the git-blame this commit should be added to the `.git-blame-ignore-revs` once committed. Reviewed By: ldionne, var-const, #libc Spies: jloser, libcxx-commits, arichardson, arphaman, wenlei Differential Revision: https://reviews.llvm.org/D143962	2023-02-15 16:52:25 +01:00
Louis Dionne	a845b5b4fb	[libc++] Use bounded iterators in std::string_view when the debug mode is enabled Differential Revision: https://reviews.llvm.org/D142903	2023-01-31 18:23:46 -05:00
Louis Dionne	1562e51491	[libc++] Don't assume that string_view::const_iterator is a raw pointer Our implementation of std::format assumed that string_view's iterators were raw pointers in various places. If we want to introduce a checked iterator in debug mode, that won't be true anymore. This patch removes that assumption. Differential Revision: https://reviews.llvm.org/D138795	2023-01-30 10:19:32 -05:00
Mark de Wever	22e8525dfd	[libc++][format] Implements range_formatter Implements parts of - P2286R8 Formatting Ranges - P2585R0 Improving default container formatting Depends on D140651 Reviewed By: ldionne, #libc Differential Revision: https://reviews.llvm.org/D140653	2023-01-19 17:20:05 +01:00
Nikolas Klauser	841399a218	[libc++] Add custom clang-tidy checks Reviewed By: #libc, ldionne Spies: jwakely, beanz, smeenai, cfe-commits, tschuett, avogelsgesang, Mordante, sstefan1, libcxx-commits, ldionne, mgorny, arichardson, miyuki Differential Revision: https://reviews.llvm.org/D131963	2022-12-23 15:42:13 +01:00
Mark de Wever	e31d27e460	[libc++][format] Renames __null_sentinel. While the FreeBSD CI was enabled in D128084 it was discovered libc++ uses the name of a system macro on FreeBSD. This renames the macro to fix the issue. Reviewed By: emaste, #libc, philnik Differential Revision: https://reviews.llvm.org/D140117	2022-12-17 13:43:52 +01:00
Mark de Wever	ddcb2d19b3	[libc++] Improves modular build. Makes sure headers having a xxx_result as return type export the proper header. Without exporting these modularized headers are not self contained. This is related to D136045. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D136711	2022-11-01 20:24:33 +01:00
Mark de Wever	a48007355a	[libc++][format] Implements string escaping. Implements parts of - P2286R8 Formatting Ranges Reviewed By: #libc, tahonermann Differential Revision: https://reviews.llvm.org/D134036	2022-10-20 17:29:34 +02:00
Mark de Wever	37c98da395	[libc++][format] Fixes broken CI. Some of the merged patches didn't have conflicts but were not compatible. This should fix it.	2022-08-31 20:14:10 +02:00
Mark de Wever	a6ce0d087a	[NFC][libc++][format] Use ranges in the output. This should avoid some copies of the output iterator. Reviewed By: #libc, Mordante Differential Revision: https://reviews.llvm.org/D132812	2022-08-29 18:13:30 +02:00
Mark de Wever	f7c0df002a	[libc++][format] Improve format buffer. Allow bulk output operations on the buffer instead of adding one code unit at a time. This has a huge performance benefit at the cost of a larger binary. This doesn't implement @vitaut's earlier suggestion to avoid buffering for std::string when writing strings. That can be done in a follow-up patch. There are some minor complications for the non-buffered format_to_n. When writing one character at a time it's easy to detect when reaching the limit n. This is solved by adding a small overhead for format_to_n. When the next write would overflow it stores the data in the internal buffer and copies up to n code units from it. The overhead isn't measured, but it's expected to only be an issue for small values of n; for larger values the general improvements will outweigh the new overhead. ``` text data bss dec hex filename 349081 6096 440 355617 56d21 format.libcxx.out-baseline 344442 6088 440 350970 55afa formatted_size.libcxx.out-baseline 4567980 57272 424 4625676 46950c formatter_float.libcxx.out-baseline 718800 12472 488 731760 b2a70 formatter_int.libcxx.out-baseline 376341 6096 552 382989 5d80d format_to.libcxx.out-beaseline 370169 6096 440 376705 5bf81 format.libcxx.out 365530 6088 440 372058 5ad5a formatted_size.libcxx.out 4575116 57272 424 4632812 46b0ec formatter_float.libcxx.out 725936 12472 488 738896 b4650 formatter_int.libcxx.out 397429 6096 552 404077 62a6d format_to.libcxx.out ``` For very small strings the new method is slower; from 4 characters onward there's already a small gain. 
``` Comparing ./format.libcxx.out-baseline to ./format.libcxx.out Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------------------- BM_format_string<char>/1 +0.0268 +0.0268 43 44 43 44 BM_format_string<char>/2 +0.0133 +0.0133 22 22 22 22 BM_format_string<char>/4 -0.0248 -0.0248 12 11 12 11 BM_format_string<char>/8 -0.0831 -0.0831 6 6 6 6 BM_format_string<char>/16 -0.2976 -0.2976 4 3 4 3 BM_format_string<char>/32 -0.4369 -0.4369 3 2 3 2 BM_format_string<char>/64 -0.6375 -0.6375 3 1 3 1 BM_format_string<char>/128 -0.7685 -0.7685 2 1 2 1 ``` The int benchmark has benefits for the simple formatting, but shines for the complex formatting: ``` Comparing ./formatter_int.libcxx.out-baseline to ./formatter_int.libcxx.out Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------- BM_Basic<uint32_t> -0.2307 -0.2307 60 46 60 46 BM_Basic<int32_t> -0.1985 -0.1985 61 49 61 49 BM_Basic<uint64_t> -0.3478 -0.3479 81 53 81 53 BM_Basic<int64_t> -0.3475 -0.3475 81 53 81 53 BM_BasicLow<__uint128_t> -0.3388 -0.3388 86 57 86 57 BM_BasicLow<__int128_t> -0.3431 -0.3431 86 57 86 57 BM_Basic<__uint128_t> -0.2822 -0.2822 236 170 236 170 BM_Basic<__int128_t> -0.3107 -0.3107 219 151 219 151 Integral_LocFalse_BaseBin_AlignNone_Int64 -0.5781 -0.5781 178 75 178 75 Integral_LocFalse_BaseBin_AlignmentLeft_Int64 -0.9231 -0.9231 1156 89 1156 89 Integral_LocFalse_BaseBin_AlignmentCenter_Int64 -0.9179 -0.9179 1107 91 1107 91 Integral_LocFalse_BaseBin_AlignmentRight_Int64 -0.9238 -0.9238 1147 87 1147 87 Integral_LocFalse_BaseBin_ZeroPadding_Int64 -0.9170 -0.9170 1137 94 1137 94 Integral_LocFalse_BaseBin_AlignNone_Uint64 -0.5923 -0.5923 175 71 175 71 Integral_LocFalse_BaseBin_AlignmentLeft_Uint64 -0.9251 -0.9251 1154 86 1154 86 
Integral_LocFalse_BaseBin_AlignmentCenter_Uint64 -0.9204 -0.9204 1105 88 1105 88 Integral_LocFalse_BaseBin_AlignmentRight_Uint64 -0.9242 -0.9242 1125 85 1125 85 Integral_LocFalse_BaseBin_ZeroPadding_Uint64 -0.9232 -0.9232 1139 88 1139 88 Integral_LocFalse_BaseOct_AlignNone_Int64 -0.3241 -0.3241 100 67 100 67 Integral_LocFalse_BaseOct_AlignmentLeft_Int64 -0.9322 -0.9322 1166 79 1166 79 Integral_LocFalse_BaseOct_AlignmentCenter_Int64 -0.9251 -0.9251 1108 83 1108 83 Integral_LocFalse_BaseOct_AlignmentRight_Int64 -0.9303 -0.9303 1136 79 1136 79 Integral_LocFalse_BaseOct_ZeroPadding_Int64 -0.9264 -0.9264 1156 85 1156 85 Integral_LocFalse_BaseOct_AlignNone_Uint64 -0.3116 -0.3116 96 66 96 66 Integral_LocFalse_BaseOct_AlignmentLeft_Uint64 -0.9310 -0.9310 1168 81 1168 81 Integral_LocFalse_BaseOct_AlignmentCenter_Uint64 -0.9281 -0.9281 1128 81 1128 81 Integral_LocFalse_BaseOct_AlignmentRight_Uint64 -0.9299 -0.9299 1148 80 1148 80 Integral_LocFalse_BaseOct_ZeroPadding_Uint64 -0.9288 -0.9288 1153 82 1153 82 Integral_LocFalse_BaseDec_AlignNone_Int64 -0.3342 -0.3342 95 63 95 63 Integral_LocFalse_BaseDec_AlignmentLeft_Int64 -0.9360 -0.9360 1157 74 1157 74 Integral_LocFalse_BaseDec_AlignmentCenter_Int64 -0.9303 -0.9303 1128 79 1128 79 Integral_LocFalse_BaseDec_AlignmentRight_Int64 -0.9369 -0.9369 1164 73 1164 73 Integral_LocFalse_BaseDec_ZeroPadding_Int64 -0.9323 -0.9323 1157 78 1157 78 Integral_LocFalse_BaseDec_AlignNone_Uint64 -0.3198 -0.3198 93 63 93 63 Integral_LocFalse_BaseDec_AlignmentLeft_Uint64 -0.9351 -0.9351 1158 75 1158 75 Integral_LocFalse_BaseDec_AlignmentCenter_Uint64 -0.9298 -0.9298 1128 79 1128 79 Integral_LocFalse_BaseDec_AlignmentRight_Uint64 -0.9361 -0.9361 1157 74 1157 74 Integral_LocFalse_BaseDec_ZeroPadding_Uint64 -0.9333 -0.9333 1151 77 1151 77 Integral_LocFalse_BaseHex_AlignNone_Int64 -0.3020 -0.3020 89 62 89 62 Integral_LocFalse_BaseHex_AlignmentLeft_Int64 -0.9357 -0.9357 1174 75 1174 75 Integral_LocFalse_BaseHex_AlignmentCenter_Int64 -0.9319 -0.9319 1129 
77 1129 77 Integral_LocFalse_BaseHex_AlignmentRight_Int64 -0.9350 -0.9350 1161 75 1161 75 Integral_LocFalse_BaseHex_ZeroPadding_Int64 -0.9293 -0.9293 1150 81 1150 81 Integral_LocFalse_BaseHex_AlignNone_Uint64 -0.3056 -0.3057 86 59 86 59 Integral_LocFalse_BaseHex_AlignmentLeft_Uint64 -0.9378 -0.9378 1174 73 1174 73 Integral_LocFalse_BaseHex_AlignmentCenter_Uint64 -0.9341 -0.9341 1129 74 1130 74 Integral_LocFalse_BaseHex_AlignmentRight_Uint64 -0.9361 -0.9361 1157 74 1157 74 Integral_LocFalse_BaseHex_ZeroPadding_Uint64 -0.9315 -0.9315 1147 79 1147 79 Integral_LocFalse_BaseHexUpper_AlignNone_Int64 -0.0019 -0.0019 91 90 91 90 Integral_LocFalse_BaseHexUpper_AlignmentLeft_Int64 -0.9099 -0.9099 1162 105 1162 105 Integral_LocFalse_BaseHexUpper_AlignmentCenter_Int64 -0.9041 -0.9041 1121 108 1121 108 Integral_LocFalse_BaseHexUpper_AlignmentRight_Int64 -0.9086 -0.9086 1162 106 1162 106 Integral_LocFalse_BaseHexUpper_ZeroPadding_Int64 -0.9057 -0.9057 1164 110 1164 110 Integral_LocFalse_BaseHexUpper_AlignNone_Uint64 +0.0110 +0.0110 86 87 86 87 Integral_LocFalse_BaseHexUpper_AlignmentLeft_Uint64 -0.9136 -0.9136 1161 100 1161 100 Integral_LocFalse_BaseHexUpper_AlignmentCenter_Uint64 -0.9078 -0.9078 1133 104 1133 104 Integral_LocFalse_BaseHexUpper_AlignmentRight_Uint64 -0.9132 -0.9132 1177 102 1177 102 Integral_LocFalse_BaseHexUpper_ZeroPadding_Uint64 -0.9091 -0.9091 1160 105 1160 105 ``` Other benchmarks give similar results. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D129964	2022-08-16 18:54:10 +02:00
Mark de Wever	4db55a459e	[libc++][format] Adhere to clang-tidy style. D126971 broke the CI due to recent changes in the clang-tidy settings. This fixes them.	2022-07-21 17:33:27 +02:00
Mark de Wever	857a78c04d	[libc++] Implements Unicode grapheme clustering This implements the Grapheme clustering as required by P1868R2 width: clarifying units of width and precision in std::format This was omitted in the initial patch, but the paper was marked as completed. This really completes the paper. Reviewed By: ldionne, #libc Differential Revision: https://reviews.llvm.org/D126971	2022-07-20 18:38:32 +02:00
Mark de Wever	6589729206	[libc++][format] Improves parsing speed. A format string like "{}" is quite common. In this case avoid parsing the format-spec when it's not present. Previously the parser was always called, therefore some refactoring is done to make sure the formatters work properly when their parse member isn't called. From the wording it's not entirely clear whether this optimization is allowed [tab:formatter] ``` and the range [pc.begin(), pc.end()) from the last call to f.parse(pc). ``` Implies there's always a call to `f.parse` even when the format-spec isn't present. Therefore this optimization isn't done for handle classes; it's unclear whether that would break user defined formatters. The improvements give a small reduction in code size: 719408 12472 488 732368 b2cd0 before 718824 12472 488 731784 b2a88 after The performance benefits when not using a format-spec are: ``` Comparing ./formatter_int.libcxx.out-baseline to ./formatter_int.libcxx.out Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------- BM_Basic<uint32_t> -0.0688 -0.0687 67 62 67 62 BM_Basic<int32_t> -0.1105 -0.1107 73 65 73 65 BM_Basic<uint64_t> -0.1053 -0.1049 95 85 95 85 BM_Basic<int64_t> -0.0889 -0.0888 93 85 93 85 BM_BasicLow<__uint128_t> -0.0655 -0.0655 96 90 96 90 BM_BasicLow<__int128_t> -0.0693 -0.0694 97 90 97 90 BM_Basic<__uint128_t> -0.0359 -0.0359 256 247 256 247 BM_Basic<__int128_t> -0.0414 -0.0414 239 229 239 229 ``` For the cases where a format-spec is used the results remain similar; some are faster, some are slower, differing per run. Reviewed By: ldionne, #libc Differential Revision: https://reviews.llvm.org/D129426	2022-07-13 17:39:09 +02:00
Nikolas Klauser	b48c5010a4	[libc++] Make parameter names consistent and enforce the naming style using readability-identifier-naming Ensure that parameter names have the style `__lower_case` Reviewed By: ldionne, #libc Spies: aheejin, sstefan1, libcxx-commits, miyuki Differential Revision: https://reviews.llvm.org/D129051	2022-07-08 18:17:47 +02:00
Mark de Wever	207e7e4a70	[libc++][format][NFC] Removes dead code. This removes a part of the now obsolete formatter code. The removal also removes the _v2 suffix where it's no longer needed. Depends on D128785 Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D128846	2022-07-07 08:00:43 +02:00
Mark de Wever	152d922295	[libc++][format] Improve floating-point formatters. This changes the implementation of the formatter. Instead of inheriting from a specialized parser all formatters will use the same generic parser. This reduces the binary size. The new parser contains some additional fields only used in the chrono formatting. Since this doesn't change the size of the parser the fields are in the generic parser. The parser is designed to fit in 128-bit, making it cheap to pass by value. The new format function is a const member function. This isn't required by the Standard yet, but it will be after LWG-3636 is accepted. Additionally P2286 adds a formattable concept which requires the member function to be const qualified in C++23. This paper is likely to be accepted in the 2022 July plenary. This is based on D125606. That commit did the groundwork and did similar changes for the string formatters. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D128785	2022-07-07 08:00:05 +02:00
Mark de Wever	9afaa158f5	[libc++][format] Copy code to new location. This is a helper patch to ease the reviewing of D128139. The originals will be removed at a later time when all formatters are converted to the new style. (Floating-point and pointer aren't up for review yet.) Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D128367	2022-06-23 17:21:37 +02:00
Mark de Wever	77ad77c071	[libc++][format] Improve string formatters This changes the implementation of the formatter. Instead of inheriting from a specialized parser all formatters will use the same generic parser. This reduces the binary size. The new parser contains some additional fields only used in the chrono formatting. Since this doesn't change the size of the parser the fields are in the generic parser. The parser is designed to fit in 128-bit, making it cheap to pass by value. The new format function is a const member function. This isn't required by the Standard yet, but it will be after LWG-3636 is accepted. Additionally P2286 adds a formattable concept which requires the member function to be const qualified in C++23. This paper is likely to be accepted in the 2022 July plenary. Depends on D121530 NOTE parts of the code now contain duplicates for the current and new parser. The intention is to remove the duplication in follow-up patches. A general overview of the final code is available in D124620. That review however lacks a bit of polish. Most of the new code is based on the same algorithms used in the current code. 
The final version of this code reduces the binary size by 17 KB for this example code ``` int main() { { std::string_view sv{"hello world"}; std::format("{}{}|{}{}{}{}{}{}|{}{}{}{}{}{}|{}{}{}|{}{}|{}", true, '', (signed char)(42), (short)(42), (int)(42), (long)(42), (long long)(42), (__int128_t)(42), (unsigned char)(42), (unsigned short)(42), (unsigned int)(42), (unsigned long)(42), (unsigned long long)(42), (__uint128_t)(42), (float)(42), (double)(42), (long double)(42), "hello world", sv, nullptr); } { std::wstring_view sv{L"hello world"}; std::format(L"{}{}|{}{}{}{}{}{}|{}{}{}{}{}{}|{}{}{}|{}{}|{}", true, L'', (signed char)(42), (short)(42), (int)(42), (long)(42), (long long)(42), (__int128_t)(42), (unsigned char)(42), (unsigned short)(42), (unsigned int)(42), (unsigned long)(42), (unsigned long long)(42), (__uint128_t)(42), (float)(42), (double)(42), (long double)(42), L"hello world", sv, nullptr); } } ``` Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D125606	2022-06-22 07:40:36 +02:00
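
Commit c866855b42 names three kinds of ill-formed UTF-8 that the decoder should reject: surrogates, overlong encodings, and values above U+10FFFF. The sketch below shows one way such checks could look. It is not libc++'s decoder; `decode_first` is a hypothetical helper written for illustration, and it assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper, not libc++'s implementation: decodes the first code
// point of `s` and returns std::nullopt when the sequence is ill-formed.
constexpr std::optional<char32_t> decode_first(std::string_view s) {
  if (s.empty()) return std::nullopt;
  const auto b0 = static_cast<unsigned char>(s[0]);
  if (b0 < 0x80) return static_cast<char32_t>(b0);  // single code unit (ASCII)

  std::size_t len = 0;  // number of code units announced by the lead byte
  char32_t value = 0;   // payload bits collected so far
  char32_t min = 0;     // smallest value that really needs `len` code units
  if ((b0 & 0xE0) == 0xC0)      { len = 2; value = b0 & 0x1F; min = 0x80; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; value = b0 & 0x0F; min = 0x800; }
  else if ((b0 & 0xF8) == 0xF0) { len = 4; value = b0 & 0x07; min = 0x10000; }
  else return std::nullopt;  // stray continuation byte or invalid lead byte

  if (s.size() < len) return std::nullopt;  // truncated sequence
  for (std::size_t i = 1; i < len; ++i) {
    const auto b = static_cast<unsigned char>(s[i]);
    if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
    value = (value << 6) | (b & 0x3F);
  }

  if (value < min) return std::nullopt;                         // overlong encoding
  if (value >= 0xD800 && value <= 0xDFFF) return std::nullopt;  // surrogate range
  if (value > 0x10FFFF) return std::nullopt;                    // beyond Unicode
  return value;
}

// The three cases listed in the commit message, plus two well-formed inputs.
static_assert(decode_first(" ") == U' ');
static_assert(decode_first("\xE2\x82\xAC") == char32_t{0x20AC});
static_assert(!decode_first("\xC0\xA0"));          // U+0020 in two code units
static_assert(!decode_first("\xED\xA0\x80"));      // U+D800, a surrogate
static_assert(!decode_first("\xF4\x90\x80\x80"));  // 0x110000 > U+10FFFF
```

Comparing the decoded value against the minimum for its length catches every overlong form with one check, so the lead bytes 0xC0 and 0xC1 need no special case.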

23 Commits
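
Commit f7c0df002a replaces one-code-unit-at-a-time output with bulk operations on the format buffer. The sketch below illustrates the general idea with a small fixed-size buffer that flushes to a `std::string`. The names `output_buffer` and `left_align` and the 8-byte capacity are assumptions made for this example; libc++'s actual buffer classes differ.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical class for illustration only; not libc++'s buffer.
class output_buffer {
public:
  explicit output_buffer(std::string& sink) : sink_(sink) {}
  ~output_buffer() { flush(); }

  // The slow path: one code unit per call.
  void push_back(char c) {
    if (size_ == storage_.size()) flush();
    storage_[size_++] = c;
  }

  // Bulk copy: moves as many code units as fit, flushing between chunks.
  void copy(std::string_view text) {
    while (!text.empty()) {
      if (size_ == storage_.size()) flush();
      const std::size_t n = std::min(text.size(), storage_.size() - size_);
      std::copy_n(text.data(), n, storage_.data() + size_);
      size_ += n;
      text.remove_prefix(n);
    }
  }

  // Bulk fill: writes `count` copies of `c`, again chunk by chunk.
  void fill(std::size_t count, char c) {
    while (count != 0) {
      if (size_ == storage_.size()) flush();
      const std::size_t n = std::min(count, storage_.size() - size_);
      std::fill_n(storage_.data() + size_, n, c);
      size_ += n;
      count -= n;
    }
  }

  // Hands the buffered code units to the sink and empties the buffer.
  void flush() {
    sink_.append(storage_.data(), size_);
    size_ = 0;
  }

private:
  std::string& sink_;
  std::array<char, 8> storage_{};  // deliberately tiny to exercise flushing
  std::size_t size_ = 0;
};

// Writes `text` followed by fill characters up to `width` code units.
std::string left_align(std::string_view text, std::size_t width, char fill) {
  std::string result;
  {
    output_buffer buffer(result);
    buffer.copy(text);
    if (text.size() < width) buffer.fill(width - text.size(), fill);
  }  // the destructor flushes the remaining code units
  return result;
}
```

With these definitions, `left_align("hello world", 15, '*')` yields `hello world****`. Padding in a loop of `push_back` calls would check the buffer capacity once per character, whereas `fill` checks it once per chunk, which is consistent with the aligned cases showing the largest gains in the benchmark.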