This is a re-application of bc6bd3bc1e9 which was reverted in
f11abac6524 because it broke the Clang pre-commit CI.
Original commit message:
This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.
This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.
To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:
// in libc++'s include/c++/v1/module.modulemap
module std {
header stddef.h
header stdlib.h
}
// in the C library's include/module.modulemap
module clib {
header stddef.h
header stdlib.h
}
Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.
However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.
To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.
Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.
As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.
A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.
Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.
Fixes#86193
This reverts 3 commits:
45a09d1811d5d6597385ef02ecf2d4b7320c37c5
24bc3244d4e221f4e6740f45e2bf15a1441a3076
bc6bd3bc1e99c7ec9e22dff23b4f4373fa02cae3
The GitHub pre-merge CI has been broken since this PR went in. This
change reverts it to see if I can get the pre-merge CI working again.
This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.
This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.
To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:
// in libc++'s include/c++/v1/module.modulemap
module std {
header stddef.h
header stdlib.h
}
// in the C library's include/module.modulemap
module clib {
header stddef.h
header stdlib.h
}
Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.
However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.
To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.
Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.
As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.
A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.
Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.
Fixes#86193
We can instead generate it on-the-fly when we install the headers. This
reduces the amount of boilerplate we have to re-generate whenever we
add, remove or relocate header files.
Fixes#88529
This adds the std.compat module. The patch contains a bit of refactoring
to avoid code duplication between the std and std.compat module.
Implements parts of
- P2465R3 Standard Library Modules std and std.compat
This takes the header restrictions into account instead of manually
duplicating this build information. This is a preparation to properly
support the libc++ disabled parts in the std module.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D158192
Use header_information to generate the __std_clang_module header. Instead of using lit_header_restrictions like the manually written header did, make a new header_include_requirements to codify what can be included rather than what can be fully tested.
Reviewed By: Mordante, #libc
Differential Revision: https://reviews.llvm.org/D157364
This finishes the transition of tests covered in generate_header_tests.py
to the new .gen.py format.
Differential Revision: https://reviews.llvm.org/D152008
This allows removing a bunch of boilerplate from the test suite and
reducing the amount of manual stuff contributors have to do when they
add a new public header.
Differential Revision: https://reviews.llvm.org/D151830
As obvious from the paper's title this is an LWG issue and thus retroactively
applied to C++20. This change may the output for certain code points:
1 Considers 8477 extra codepoints as having a width 2 (as of Unicode 15)
(mostly Tangut Ideographs)
2 Change the width of 85 unassigned code points from 2 to 1
3 Change the width of 8 codepoints (in the range U+3248 CIRCLED NUMBER
TEN ON BLACK SQUARE ... U+324F CIRCLED NUMBER EIGHTY ON BLACK
SQUARE) from 2 to 1, because it seems questionable to make an exception
for those without input from Unicode
Note that libc++ already uses Unicode 15, while the Standard requires Unicode 12.
(The last time I checked MSVC STL used Unicode 14.)
So in practice the only notable change is item 3.
Implements
P2675 LWG3780: The Paper
format's width estimation is too approximate and not forward compatible
Benchmark before these changes
--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
BM_ascii_text<char> 3928 ns 3928 ns 178131
BM_unicode_text<char> 75231 ns 75230 ns 9158
BM_cyrillic_text<char> 59837 ns 59834 ns 11529
BM_japanese_text<char> 39842 ns 39832 ns 17501
BM_emoji_text<char> 3931 ns 3930 ns 177750
BM_ascii_text<wchar_t> 4024 ns 4024 ns 174190
BM_unicode_text<wchar_t> 63756 ns 63751 ns 11136
BM_cyrillic_text<wchar_t> 44639 ns 44638 ns 15597
BM_japanese_text<wchar_t> 34425 ns 34424 ns 20283
BM_emoji_text<wchar_t> 3937 ns 3937 ns 177684
Benchmark after these changes
--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
BM_ascii_text<char> 3914 ns 3913 ns 178814
BM_unicode_text<char> 70380 ns 70378 ns 9694
BM_cyrillic_text<char> 51889 ns 51877 ns 13488
BM_japanese_text<char> 41707 ns 41705 ns 16723
BM_emoji_text<char> 3908 ns 3907 ns 177912
BM_ascii_text<wchar_t> 3949 ns 3948 ns 177525
BM_unicode_text<wchar_t> 64591 ns 64587 ns 10649
BM_cyrillic_text<wchar_t> 44089 ns 44078 ns 15721
BM_japanese_text<wchar_t> 39369 ns 39367 ns 17779
BM_emoji_text<wchar_t> 3936 ns 3934 ns 177821
Benchmarks without "if(__code_point < (__entries[0] >> 14))"
--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
BM_ascii_text<char> 3922 ns 3922 ns 178587
BM_unicode_text<char> 94474 ns 94474 ns 7351
BM_cyrillic_text<char> 69202 ns 69200 ns 10157
BM_japanese_text<char> 42735 ns 42692 ns 16382
BM_emoji_text<char> 3920 ns 3919 ns 178704
BM_ascii_text<wchar_t> 3951 ns 3950 ns 177224
BM_unicode_text<wchar_t> 81003 ns 80988 ns 8668
BM_cyrillic_text<wchar_t> 57020 ns 57018 ns 12048
BM_japanese_text<wchar_t> 39695 ns 39687 ns 17582
BM_emoji_text<wchar_t> 3977 ns 3976 ns 176479
This optimization does carry its weight for the Unicode and Cyrillic
test. For the Japanese tests the gains are minor and for emoji it seems
to have no effect.
Reviewed By: ldionne, tahonermann, #libc
Differential Revision: https://reviews.llvm.org/D144499
This makes it possible for programmers to run IWYU and get more accurate
standard library inclusions. Prior to this commit, the following program
would be transformed thusly:
```cpp
// Before
#include <algorithm>
#include <vector>
void f() {
auto v = std::vector{0, 1};
std::find(std::ranges::begin(v), std::ranges::end(v), 0);
}
```
```cpp
// After
#include <__algorithm/find.h>
#include <__ranges/access.h>
#include <vector>
...
```
There are two ways to fix this issue: to use [comment pragmas](https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUPragmas.md)
on every private include, or to write a canonical [mapping file](https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUMappings.md)
that provides the tool with a manual on how libc++ is laid out. Due to
the complexity of libc++, this commit opts for the latter, to maximise
correctness and minimise developer burden.
To mimimise developer updates to the file, it makes use of wildcards
that match everything within listed subdirectories. A script has also
been added to ensure that the mapping is always fresh in CI, and makes
the process a single step.
Finally, documentation has been added to inform users that IWYU is
supported, and what they need to do in order to leverage the mapping
file.
Closes#56937.
Differential Revision: https://reviews.llvm.org/D138189
This changes makes it easier to update the Unicode data files used for
the Extended Graphme Clustering as added in D126971.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D129668
Note that `generate_assertion_tests.py` will be renamed to
`generate_header_tests.py` separately to facilitate change tracking.
Differential Revision: https://reviews.llvm.org/D123000
This patch changes the requirement for getting the declaration of the
assertion handler from including <__assert> to including any public
C++ header of the library. Note that C compatibility headers are
excluded because we don't implement all the C headers ourselves --
some of them are taken straight from the C library, like assert.h.
It also adds a generated test to check it. Furthermore, this new
generated test is designed in a way that will make it possible to
replace almost all the existing test-generation scripts with this
system in upcoming patches.
Differential Revision: https://reviews.llvm.org/D122506
libc++ has started splicing standard library headers into much more
fine-grained content for maintainability. It's very likely that outdated
and naive tooling (some of which is outside of LLVM's scope) will
suggest users include things such as `<__algorithm/find.h>` instead of
`<algorithm>`, and Hyrum's law suggests that users will eventually begin
to rely on this without the help of tooling. As such, this commit
intends to protect users from themselves, by making it a hard error for
anyone outside of the standard library to include libc++ detail headers.
This is the first of four patches. Patch #2 will solve the problem for
pre-processor `#include`s; patches #3 and #4 will solve the problem for
`<__tree>` and `<__hash_table>` (since I've never touched the test cases
that are failing for these two, I want to split them out into their own
commits to be extra careful). Patch #5 will concern itself with
`<__threading_support>`, which intersects with libcxxabi (which I know
even less about).
Differential Revision: https://reviews.llvm.org/D105932
As we automate more and more things in the library, it becomes useful for
contributors to have a single target for running all the automation as
part of their workflow. This commit adds a new `libcxx-generate-files`
target that should re-generate all the auto-generated files in the library.
As a fly-by, I also revamped the documentation on Contributing to account
for this new target and present it as a bullet list of things to check
before committing. I also added a few things that are often overlooked
to that list, such as updating the synopsis and the status files.
Differential Revision: https://reviews.llvm.org/D106067