The main change is to allow a UCS scalar value as the fill character.
Especially for char-based formatting this increases the number of valid
characters. Originally this change was expected to break the ABI; however,
the current change does not seem to break it.
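For char-based formatting, a fill that is a single UCS scalar value can span 1 to 4 UTF-8 code units. The sketch below assumes a UTF-8 `char` format string and shows how a fill-and-align prefix could be recognized under that rule. The names `fill_align`, `scalar_value_length`, and `parse_fill_align` are hypothetical and are not libc++'s internals. The exclusion of `{` and `}` as fill comes from the Standard's fill grammar.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result of parsing the fill-and-align part of a format-spec.
struct fill_align {
  std::string_view fill;  // one UCS scalar value, 1 to 4 UTF-8 code units
  char align;             // '<', '^' or '>'
};

constexpr bool is_align(char c) { return c == '<' || c == '^' || c == '>'; }

// Returns the number of code units of the UTF-8 sequence at the start of `s`
// when it encodes a UCS scalar value, otherwise 0.
constexpr std::size_t scalar_value_length(std::string_view s) {
  if (s.empty())
    return 0;
  const auto lead = static_cast<unsigned char>(s[0]);
  if (lead < 0x80)
    return 1;
  std::size_t n = 0;
  char32_t value = 0;
  char32_t min_value = 0;  // smallest value that legitimately needs n code units
  if ((lead & 0xE0) == 0xC0) { n = 2; value = lead & 0x1F; min_value = 0x80; }
  else if ((lead & 0xF0) == 0xE0) { n = 3; value = lead & 0x0F; min_value = 0x800; }
  else if ((lead & 0xF8) == 0xF0) { n = 4; value = lead & 0x07; min_value = 0x10000; }
  else return 0;  // continuation byte or invalid lead byte
  if (s.size() < n)
    return 0;
  for (std::size_t i = 1; i < n; ++i) {
    const auto c = static_cast<unsigned char>(s[i]);
    if ((c & 0xC0) != 0x80)
      return 0;
    value = (value << 6) | (c & 0x3F);
  }
  if (value < min_value || (value >= 0xD800 && value <= 0xDFFF) || value > 0x10FFFF)
    return 0;  // overlong, surrogate, or out of range
  return n;
}

constexpr std::optional<fill_align> parse_fill_align(std::string_view spec) {
  const std::size_t n = scalar_value_length(spec);
  if (n != 0 && spec.size() > n && is_align(spec[n])) {
    if (n == 1 && (spec[0] == '{' || spec[0] == '}'))
      return std::nullopt;  // '{' and '}' may not be used as fill
    return fill_align{spec.substr(0, n), spec[n]};
  }
  if (!spec.empty() && is_align(spec[0]))
    return fill_align{" ", spec[0]};  // alignment only: the default fill is a space
  return std::nullopt;                // no fill-and-align present
}
```

For brevity the sketch returns `std::nullopt` both when there is no fill-and-align and when the fill is invalid (e.g. `{<`); a real parser would report the latter as an error.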
Implements
- P2572 std::format() fill character allowances
Depends on D144499
Reviewed By: ldionne, tahonermann, #libc
Differential Revision: https://reviews.llvm.org/D144742
This file was added before we started granularizing the headers, but is essentially just a granularized header. This moves the header to the correct place.
Reviewed By: #libc, EricWF
Spies: libcxx-commits, arichardson, mikhail.ramalho
Differential Revision: https://reviews.llvm.org/D146395
During the implementation of P2286 a second Unicode decoder was added.
The original decoder was only used for the width estimation. Changing
an ill-formed Unicode sequence to the replacement character works
properly for this use case. For P2286 an ill-formed Unicode sequence
needs to be formatted as a sequence of code units. The exact wording in
the Standard was a bit unclear and there was an odd example in the WP.
This made it hard to use the same decoder. SG16 determined the odd
example in the WP was a bug and this has been fixed in the WP.
This made it possible to combine the two decoders. The P2286 decoder
kept track of the size of the ill-formed sequence. However, this was not
needed since the output algorithm already needs to keep track of the size
of both well-formed and ill-formed sequences. So this feature has been
removed.
The error status remains since it's needed for P2286; the grapheme
clustering can ignore this value. (In general, grapheme clustering only
has specified behaviour for Unicode. When the string is in a non-Unicode
encoding there are no requirements. Ill-formed Unicode is a non-Unicode
encoding. Still, libc++ makes a best-effort estimation.)
The UTF-8 decoder accepted several ill-formed sequences:
- Values in the surrogate range U+D800..U+DFFF.
- Values encoded in more code units than required (overlong encodings).
  For example, U+0020 can in theory be encoded using 1, 2, 3, or 4 code
  units, and the longer forms were accepted. This is not allowed by the
  Unicode Standard.
- Values larger than U+10FFFF were not always rejected.
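The three checks above can be sketched as a small strict decoder. This is a minimal illustration assuming UTF-8 input in a `std::string_view`. The names `utf8_decoder`, `decode_result`, and `decode_status` are hypothetical. On error the sketch substitutes U+FFFD and advances by a single code unit, which simplifies how real decoders determine the extent of an ill-formed subsequence. It keeps an error status but no size of the ill-formed sequence; callers derive sizes from `position()`.

```cpp
#include <cstddef>
#include <string_view>

enum class decode_status { ok, error };

struct decode_result {
  char32_t value;        // U+FFFD (replacement character) on error
  decode_status status;  // needed for P2286-style output; width estimation may ignore it
};

class utf8_decoder {
public:
  explicit utf8_decoder(std::string_view input) : input_(input) {}

  bool at_end() const { return pos_ == input_.size(); }
  std::size_t position() const { return pos_; }  // callers derive sizes from this

  // Precondition: !at_end().
  decode_result consume() {
    const auto lead = static_cast<unsigned char>(input_[pos_]);
    if (lead < 0x80) {
      ++pos_;
      return {lead, decode_status::ok};
    }
    std::size_t n = 0;
    char32_t value = 0;
    char32_t min_value = 0;  // smallest value that legitimately needs n code units
    if ((lead & 0xE0) == 0xC0) { n = 2; value = lead & 0x1F; min_value = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { n = 3; value = lead & 0x0F; min_value = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { n = 4; value = lead & 0x07; min_value = 0x10000; }
    else return error();  // stray continuation byte or invalid lead byte
    if (input_.size() - pos_ < n)
      return error();  // truncated sequence
    for (std::size_t i = 1; i < n; ++i) {
      const auto c = static_cast<unsigned char>(input_[pos_ + i]);
      if ((c & 0xC0) != 0x80)
        return error();
      value = (value << 6) | (c & 0x3F);
    }
    if (value < min_value)  // overlong, e.g. U+0020 in 2 code units
      return error();
    if (value >= 0xD800 && value <= 0xDFFF)  // surrogate range
      return error();
    if (value > 0x10FFFF)  // beyond the Unicode code space
      return error();
    pos_ += n;
    return {value, decode_status::ok};
  }

private:
  decode_result error() {
    ++pos_;
    return {U'\uFFFD', decode_status::error};
  }

  std::string_view input_;
  std::size_t pos_ = 0;
};
```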
Reviewed By: #libc, ldionne, tahonermann, Mordante
Differential Revision: https://reviews.llvm.org/D144346
This change is almost fully mechanical. The only interesting change is in `generate_feature_test_macro_components.py`, which now generates `_LIBCPP_STD_VER >=` instead of `_LIBCPP_STD_VER >`. To avoid churn in `git blame`, this commit should be added to `.git-blame-ignore-revs` once committed.
Reviewed By: ldionne, var-const, #libc
Spies: jloser, libcxx-commits, arichardson, arphaman, wenlei
Differential Revision: https://reviews.llvm.org/D143962
Our implementation of std::format assumed in various places that
string_view's iterators were raw pointers. If we want to introduce a
checked iterator in debug mode, that won't be true anymore. This patch
removes that assumption.
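As an illustration of the general pattern (not the actual patch), code that needs a pointer can derive it from `data()` plus the iterator's distance from `begin()`, and scanning code can stay purely iterator-based. Both forms compile whether `std::string_view::const_iterator` is a raw pointer or a checked class type. The helpers `to_pointer` and `find_brace` are hypothetical. In C++20, `std::to_address` from `<memory>` may be another option for contiguous iterators.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: obtains a pointer for `it` without assuming that the
// iterator type is `const char*`.
const char* to_pointer(std::string_view sv, std::string_view::const_iterator it) {
  return sv.data() + (it - sv.begin());
}

// Hypothetical helper: offset of the first '{' or '}' in `fmt`, or fmt.size()
// when there is none. Only iterator operations are used.
std::size_t find_brace(std::string_view fmt) {
  auto it = fmt.begin();
  while (it != fmt.end() && *it != '{' && *it != '}')
    ++it;
  return static_cast<std::size_t>(it - fmt.begin());
}
```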
Differential Revision: https://reviews.llvm.org/D138795
While enabling the FreeBSD CI in D128084 it was discovered that libc++
uses the name of a system macro on FreeBSD. This renames the macro to
fix the issue.
Reviewed By: emaste, #libc, philnik
Differential Revision: https://reviews.llvm.org/D140117
Makes sure headers that have an xxx_result as return type export the
proper header. Without these exports the modularized headers are not
self-contained.
This is related to D136045.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D136711
This implements the grapheme clustering required by P1868R2
"width: clarifying units of width and precision in std::format".
This was omitted in the initial patch, but the paper was marked as completed. This really completes the paper.
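To show what grapheme clustering changes in the width estimate, here is a deliberately simplified sketch over already-decoded UTF-32 text: each extended grapheme cluster contributes the width of its first code point, 2 if it falls in a wide range and 1 otherwise. Assumptions: the ranges below are only a subset of those P1868R2 estimates as width 2, and the clustering rule is reduced to "U+0300..U+036F (combining diacritical marks) extend the previous cluster"; real UAX #29 segmentation has many more rules. All names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// A subset of the code point ranges that P1868R2 estimates as width 2.
constexpr bool is_wide(char32_t c) {
  return (c >= 0x1100 && c <= 0x115F) ||   // Hangul Jamo
         (c >= 0x2E80 && c <= 0x303E) ||
         (c >= 0x3040 && c <= 0xA4CF) ||   // includes CJK Unified Ideographs
         (c >= 0xAC00 && c <= 0xD7A3) ||   // Hangul Syllables
         (c >= 0xF900 && c <= 0xFAFF) ||
         (c >= 0xFF00 && c <= 0xFF60) ||   // fullwidth forms
         (c >= 0x1F300 && c <= 0x1F64F) ||
         (c >= 0x20000 && c <= 0x2FFFD);
}

// Simplification: only combining diacritical marks extend a cluster.
constexpr bool is_extend(char32_t c) { return c >= 0x0300 && c <= 0x036F; }

// Hypothetical estimator: the width is counted per cluster, not per code point.
constexpr std::size_t estimate_width(std::u32string_view text) {
  std::size_t width = 0;
  std::size_t i = 0;
  while (i < text.size()) {
    width += is_wide(text[i]) ? 2 : 1;  // the first code point decides the width
    ++i;
    while (i < text.size() && is_extend(text[i]))
      ++i;  // the rest of the cluster adds nothing
  }
  return width;
}
```

Counting code points instead would give 2 for `U"e\u0301"` (e followed by a combining acute accent), while the cluster-based estimate gives 1.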
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D126971
A format string like "{}" is quite common. In this case, avoid parsing
the format-spec since it's not present. Previously the parser was always
called, so some refactoring was done to make sure the formatters work
properly when their parse member isn't called.
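The condition for the shortcut can be illustrated with a small scanner. This is a hypothetical sketch, not libc++'s parser: `has_no_format_spec` looks at the text right after a `{` and reports whether the replacement field is just `}` or a numeric arg-id followed by `}`, meaning there is no `:` format-spec to parse. Validation of the format string is out of scope.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: `field` is the text right after an opening '{'.
// True when the field is "}" or "<arg-id>}", i.e. it has no format-spec.
constexpr bool has_no_format_spec(std::string_view field) {
  std::size_t i = 0;
  while (i < field.size() && field[i] >= '0' && field[i] <= '9')
    ++i;  // skip an optional numeric arg-id
  return i < field.size() && field[i] == '}';
}

// Counts the replacement fields in `fmt` for which parsing could be skipped.
constexpr int count_fields_without_spec(std::string_view fmt) {
  int count = 0;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '{')
      continue;
    if (i + 1 < fmt.size() && fmt[i + 1] == '{') {
      ++i;  // escaped "{{" is not a replacement field
      continue;
    }
    if (has_no_format_spec(fmt.substr(i + 1)))
      ++count;
  }
  return count;
}
```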
From the wording it's not entirely clear whether this optimization is
allowed. [tab:formatter] says:
```
and the range [pc.begin(), pc.end()) from the last call to f.parse(pc).
```
This implies there's always a call to `f.parse`, even when the
format-spec isn't present. Therefore this optimization isn't done for
handle classes; it's unclear whether skipping the call would break
user-defined formatters.
The improvements give a small reduction in code size, 584 bytes of text
(columns as printed by `size`: text, data, bss, dec, hex):
719408 12472 488 732368 b2cd0 before
718824 12472 488 731784 b2a88 after
The performance benefits when not using a format-spec are:
```
Comparing ./formatter_int.libcxx.out-baseline to ./formatter_int.libcxx.out
Benchmark                       Time       CPU  Time Old  Time New   CPU Old   CPU New
--------------------------------------------------------------------------------------
BM_Basic<uint32_t>           -0.0688   -0.0687        67        62        67        62
BM_Basic<int32_t>            -0.1105   -0.1107        73        65        73        65
BM_Basic<uint64_t>           -0.1053   -0.1049        95        85        95        85
BM_Basic<int64_t>            -0.0889   -0.0888        93        85        93        85
BM_BasicLow<__uint128_t>     -0.0655   -0.0655        96        90        96        90
BM_BasicLow<__int128_t>      -0.0693   -0.0694        97        90        97        90
BM_Basic<__uint128_t>        -0.0359   -0.0359       256       247       256       247
BM_Basic<__int128_t>         -0.0414   -0.0414       239       229       239       229
```
For the cases where a format-spec is used the results remain similar,
some are faster some are slower, differing per run.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D129426
This removes a part of the now obsolete formatter code.
The removal also removes the _v2 suffix where it's no longer needed.
Depends on D128785
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D128846
This changes the implementation of the formatter. Instead of inheriting
from a specialized parser all formatters will use the same generic
parser. This reduces the binary size.
The new parser contains some additional fields only used in the chrono
formatting. Since this doesn't change the size of the parser, the fields
are in the generic parser. The parser is designed to fit in 128 bits,
making it cheap to pass by value.
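A minimal sketch of such a layout, assuming 32-bit `int32_t`/`char32_t` members and bit-fields packed into one 32-bit storage unit as on common ABIs. The struct name, member names, and bit widths are hypothetical and not libc++'s actual parser. The sketch shows how chrono-only flags can occupy otherwise unused bits without growing the struct.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical generic parsed format-spec; field names are illustrative only.
struct parsed_spec {
  // All flags share one 32-bit storage unit (16 of its 32 bits are used).
  std::uint32_t alignment : 3;             // default, left, center, right, zero-padding
  std::uint32_t sign : 2;                  // default, minus, plus, space
  std::uint32_t alternate_form : 1;        // '#'
  std::uint32_t locale_specific_form : 1;  // 'L'
  std::uint32_t type : 6;                  // presentation type
  // Only meaningful for chrono formatting, but they fit in the unused bits,
  // so keeping them here does not grow the struct.
  std::uint32_t hour_of_day : 1;
  std::uint32_t weekday : 1;
  std::uint32_t day_of_year : 1;

  std::int32_t width;
  std::int32_t precision;
  char32_t fill;
};

static_assert(sizeof(parsed_spec) <= 16, "fits in 128 bits");
static_assert(std::is_trivially_copyable_v<parsed_spec>);

// Taking the spec by value is cheap for a small trivially copyable struct.
constexpr std::int32_t width_of(parsed_spec spec) { return spec.width; }
```

On ABIs such as x86-64 System V, a 16-byte trivially copyable argument like this is typically passed in registers.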
The new format function is a const member function. This isn't required
by the Standard yet, but it will be after LWG-3636 is accepted.
Additionally, P2286 adds a formattable concept, which requires the
member function to be const-qualified in C++23. This paper is likely to
be accepted in the July 2022 plenary.
This is based on D125606. That commit did the groundwork and did similar
changes for the string formatters.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D128785
This is a helper patch to ease the review of D128139.
The originals will be removed later, when all formatters have been
converted to the new style. (The floating-point and pointer formatters
aren't up for review yet.)
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D128367
This changes the implementation of the formatter. Instead of inheriting
from a specialized parser all formatters will use the same generic
parser. This reduces the binary size.
The new parser contains some additional fields only used in the chrono
formatting. Since this doesn't change the size of the parser, the fields
are in the generic parser. The parser is designed to fit in 128 bits,
making it cheap to pass by value.
The new format function is a const member function. This isn't required
by the Standard yet, but it will be after LWG-3636 is accepted.
Additionally, P2286 adds a formattable concept, which requires the
member function to be const-qualified in C++23. This paper is likely to
be accepted in the July 2022 plenary.
Depends on D121530
NOTE: parts of the code now contain duplicates for the current and new parser.
The intention is to remove the duplication in follow-up patches. A general
overview of the final code is available in D124620. That review, however,
lacks a bit of polish.
Most of the new code is based on the same algorithms used in the current code.
The final version of this code reduces the binary size by 17 KB for this
example code:
```
#include <format>
#include <string_view>

int main() {
  {
    std::string_view sv{"hello world"};
    std::format("{}{}|{}{}{}{}{}{}|{}{}{}{}{}{}|{}{}{}|{}{}|{}", true, '*',
                (signed char)(42), (short)(42), (int)(42), (long)(42), (long long)(42), (__int128_t)(42),
                (unsigned char)(42), (unsigned short)(42), (unsigned int)(42), (unsigned long)(42),
                (unsigned long long)(42), (__uint128_t)(42),
                (float)(42), (double)(42), (long double)(42),
                "hello world", sv,
                nullptr);
  }
  {
    std::wstring_view sv{L"hello world"};
    std::format(L"{}{}|{}{}{}{}{}{}|{}{}{}{}{}{}|{}{}{}|{}{}|{}", true, L'*',
                (signed char)(42), (short)(42), (int)(42), (long)(42), (long long)(42), (__int128_t)(42),
                (unsigned char)(42), (unsigned short)(42), (unsigned int)(42), (unsigned long)(42),
                (unsigned long long)(42), (__uint128_t)(42),
                (float)(42), (double)(42), (long double)(42),
                L"hello world", sv,
                nullptr);
  }
}
```
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D125606