21 Commits

Author SHA1 Message Date
Louis Dionne
41145feb77
[libc++][modules] Rewrite the modulemap to have fewer top-level modules (#110501)
This is a re-application of bc6bd3bc1e9 which was reverted in
f11abac6524 because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes #86193
2024-09-30 14:17:05 -04:00
Chris B
f11abac652
Revert "[libc++][modules] Rewrite the modulemap to have fewer top-level modules (#107638)" (#110384)
This reverts 3 commits:
45a09d1811d5d6597385ef02ecf2d4b7320c37c5
24bc3244d4e221f4e6740f45e2bf15a1441a3076
bc6bd3bc1e99c7ec9e22dff23b4f4373fa02cae3

The GitHub pre-merge CI has been broken since this PR went in. This
change reverts it to see if I can get the pre-merge CI working again.
2024-09-28 21:47:09 -05:00
Louis Dionne
bc6bd3bc1e
[libc++][modules] Rewrite the modulemap to have fewer top-level modules (#107638)
This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes #86193
2024-09-26 13:19:48 -04:00
Louis Dionne
e2754890ca
[libc++] Don't commit libcxx.imp (#89391)
We can instead generate it on-the-fly when we install the headers. This
reduces the amount of boilerplate we have to re-generate whenever we
add, remove or relocate header files.

Fixes #88529
2024-04-22 08:45:02 -04:00
Mark de Wever
59e66c515a
[libc++][format] Switches to Unicode 15.1. (#86543)
In addition to changes in the tables the extended grapheme clustering
algorithm has been overhauled. Before I considered a separate state
machine to implement the rules. With the new rule GB9c this became more
attractive and the design has changed.

This change initially had quite an impact on the performance. By making
the state machine persistent the performance was improved greatly. Note
it is still slower than before due to the larger Unicode tables.

Before
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>             1891 ns         1889 ns       369504
BM_unicode_text<char>         106642 ns       106397 ns         6576
BM_cyrillic_text<char>         73420 ns        73277 ns         9445
BM_japanese_text<char>         62485 ns        62387 ns        11153
BM_emoji_text<char>             1895 ns         1893 ns       369525
BM_ascii_text<wchar_t>          2015 ns         2013 ns       346887
BM_unicode_text<wchar_t>       92119 ns        92017 ns         7598
BM_cyrillic_text<wchar_t>      62637 ns        62568 ns        11117
BM_japanese_text<wchar_t>      53850 ns        53785 ns        12803
BM_emoji_text<wchar_t>          2016 ns         2014 ns       347325

After
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>             1906 ns         1904 ns       369409
BM_unicode_text<char>         265462 ns       265175 ns         2628
BM_cyrillic_text<char>        181063 ns       180865 ns         3871
BM_japanese_text<char>        130927 ns       130789 ns         5324
BM_emoji_text<char>             1892 ns         1890 ns       370537
BM_ascii_text<wchar_t>          2038 ns         2035 ns       343689
BM_unicode_text<wchar_t>      277603 ns       277282 ns         2526
BM_cyrillic_text<wchar_t>     188558 ns       188339 ns         3727
BM_japanese_text<wchar_t>     133084 ns       132943 ns         5262
BM_emoji_text<wchar_t>          2012 ns         2010 ns       348015

Persistent
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>             1904 ns         1899 ns       367472
BM_unicode_text<char>         133609 ns       133287 ns         5246
BM_cyrillic_text<char>         90185 ns        89941 ns         7796
BM_japanese_text<char>         75137 ns        74946 ns         9316
BM_emoji_text<char>             1906 ns         1901 ns       368081
BM_ascii_text<wchar_t>          2703 ns         2696 ns       259153
BM_unicode_text<wchar_t>      131497 ns       131168 ns         5341
BM_cyrillic_text<wchar_t>      87071 ns        86840 ns         8076
BM_japanese_text<wchar_t>      72279 ns        72099 ns         9682
BM_emoji_text<wchar_t>          2021 ns         2016 ns       346767
2024-04-09 19:20:06 +02:00
Mark de Wever
600462a2db
[libc++][modules] Adds std.compat module. (#71438)
This adds the std.compat module. The patch contains a bit of refactoring
to avoid code duplication between the std and std.compat module.

Implements parts of
- P2465R3 Standard Library Modules std and std.compat
2023-12-09 13:51:50 +01:00
Mark de Wever
41161aeb54 [libc++][modules] Generates std.cppm.in.
This takes the header restrictions into account instead of manually
duplicating this build information. This is a preparation to properly
support the libc++ disabled parts in the std module.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D158192
2023-08-22 20:13:39 +02:00
Ian Anderson
f0c5ce0800 [libc++][Modules] Generate the __std_clang_module header
Use header_information to generate the __std_clang_module header. Instead of using lit_header_restrictions like the manually written header did, make a new header_include_requirements to codify what can be included rather than what can be fully tested.

Reviewed By: Mordante, #libc

Differential Revision: https://reviews.llvm.org/D157364
2023-08-14 12:08:00 -07:00
Louis Dionne
81cc929d4f [libc++] Use .gen.py tests for the transitive inclusion tests
This finishes the transition of tests covered in generate_header_tests.py
to the new .gen.py format.

Differential Revision: https://reviews.llvm.org/D152008
2023-06-05 07:23:31 -07:00
Louis Dionne
45307f1b0d [libc++] Refactor the mandatory header inclusion tests to .gen.py
This allows removing a bunch of boilerplate from the test suite and
reducing the amount of manual stuff contributors have to do when they
add a new public header.

Differential Revision: https://reviews.llvm.org/D151830
2023-06-01 19:56:30 -07:00
Mark de Wever
68c3d66a97 [libc++][format] Improves width estimate.
As obvious from the paper's title this is an LWG issue and thus retroactively
applied to C++20. This change may the output for certain code points:
1 Considers 8477 extra codepoints as having a width 2 (as of Unicode 15)
  (mostly Tangut Ideographs)
2 Change the width of 85 unassigned code points from 2 to 1
3 Change the width of 8 codepoints (in the range U+3248 CIRCLED NUMBER
  TEN ON BLACK SQUARE ... U+324F CIRCLED NUMBER EIGHTY ON BLACK
  SQUARE) from 2 to 1, because it seems questionable to make an exception
  for those without input from Unicode

Note that libc++ already uses Unicode 15, while the Standard requires Unicode 12.
(The last time I checked MSVC STL used Unicode 14.)

So in practice the only notable change is item 3.

Implements
  P2675 LWG3780: The Paper
  format's width estimation is too approximate and not forward compatible

Benchmark before these changes
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>             3928 ns         3928 ns       178131
BM_unicode_text<char>          75231 ns        75230 ns         9158
BM_cyrillic_text<char>         59837 ns        59834 ns        11529
BM_japanese_text<char>         39842 ns        39832 ns        17501
BM_emoji_text<char>             3931 ns         3930 ns       177750
BM_ascii_text<wchar_t>          4024 ns         4024 ns       174190
BM_unicode_text<wchar_t>       63756 ns        63751 ns        11136
BM_cyrillic_text<wchar_t>      44639 ns        44638 ns        15597
BM_japanese_text<wchar_t>      34425 ns        34424 ns        20283
BM_emoji_text<wchar_t>          3937 ns         3937 ns       177684

Benchmark after these changes
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>             3914 ns         3913 ns       178814
BM_unicode_text<char>          70380 ns        70378 ns         9694
BM_cyrillic_text<char>         51889 ns        51877 ns        13488
BM_japanese_text<char>         41707 ns        41705 ns        16723
BM_emoji_text<char>             3908 ns         3907 ns       177912
BM_ascii_text<wchar_t>          3949 ns         3948 ns       177525
BM_unicode_text<wchar_t>       64591 ns        64587 ns        10649
BM_cyrillic_text<wchar_t>      44089 ns        44078 ns        15721
BM_japanese_text<wchar_t>      39369 ns        39367 ns        17779
BM_emoji_text<wchar_t>          3936 ns         3934 ns       177821

Benchmarks without "if(__code_point < (__entries[0] >> 14))"
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>             3922 ns         3922 ns       178587
BM_unicode_text<char>          94474 ns        94474 ns         7351
BM_cyrillic_text<char>         69202 ns        69200 ns        10157
BM_japanese_text<char>         42735 ns        42692 ns        16382
BM_emoji_text<char>             3920 ns         3919 ns       178704
BM_ascii_text<wchar_t>          3951 ns         3950 ns       177224
BM_unicode_text<wchar_t>       81003 ns        80988 ns         8668
BM_cyrillic_text<wchar_t>      57020 ns        57018 ns        12048
BM_japanese_text<wchar_t>      39695 ns        39687 ns        17582
BM_emoji_text<wchar_t>          3977 ns         3976 ns       176479

This optimization does carry its weight for the Unicode and Cyrillic
test. For the Japanese tests the gains are minor and for emoji it seems
to have no effect.

Reviewed By: ldionne, tahonermann, #libc

Differential Revision: https://reviews.llvm.org/D144499
2023-04-20 21:18:33 +02:00
Christopher Di Bella
ab46648082 [libcxx] adds an include-what-you-use (IWYU) mapping file
This makes it possible for programmers to run IWYU and get more accurate
standard library inclusions. Prior to this commit, the following program
would be transformed thusly:

```cpp
// Before
 #include <algorithm>
 #include <vector>

void f() {
  auto v = std::vector{0, 1};
  std::find(std::ranges::begin(v), std::ranges::end(v), 0);
}
```

```cpp
// After
 #include <__algorithm/find.h>
 #include <__ranges/access.h>
 #include <vector>
...
```

There are two ways to fix this issue: to use [comment pragmas](https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUPragmas.md)
on every private include, or to write a canonical [mapping file](https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUMappings.md)
that provides the tool with a manual on how libc++ is laid out. Due to
the complexity of libc++, this commit opts for the latter, to maximise
correctness and minimise developer burden.

To mimimise developer updates to the file, it makes use of wildcards
that match everything within listed subdirectories. A script has also
been added to ensure that the mapping is always fresh in CI, and makes
the process a single step.

Finally, documentation has been added to inform users that IWYU is
supported, and what they need to do in order to leverage the mapping
file.

Closes #56937.

Differential Revision: https://reviews.llvm.org/D138189
2022-11-22 01:09:49 +00:00
Mark de Wever
a48007355a [libc++][format] Implements string escaping.
Implements parts of
- P2286R8 Formatting Ranges

Reviewed By: #libc, tahonermann

Differential Revision: https://reviews.llvm.org/D134036
2022-10-20 17:29:34 +02:00
Mark de Wever
130b1816c5 [libc++] Improve updating data files.
This changes makes it easier to update the Unicode data files used for
the Extended Graphme Clustering as added in D126971.

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D129668
2022-08-16 18:55:46 +02:00
Louis Dionne
9a44ed43cf [libc++] Implement tests for private headers using the new generator
Differential Revision: https://reviews.llvm.org/D123028
2022-04-04 17:44:47 -04:00
Louis Dionne
a4f73b9b14 [libc++][NFC] Rename generate_assertion_tests.py to generate_header_tests.py 2022-04-04 09:10:52 -04:00
Louis Dionne
be1294de9d [libc++] Implement all public header tests using the new generator
Note that `generate_assertion_tests.py` will be renamed to
`generate_header_tests.py` separately to facilitate change tracking.

Differential Revision: https://reviews.llvm.org/D123000
2022-04-04 09:09:37 -04:00
Louis Dionne
385cc25a53 [libc++] Ensure that all public C++ headers include <__assert>
This patch changes the requirement for getting the declaration of the
assertion handler from including <__assert> to including any public
C++ header of the library. Note that C compatibility headers are
excluded because we don't implement all the C headers ourselves --
some of them are taken straight from the C library, like assert.h.

It also adds a generated test to check it. Furthermore, this new
generated test is designed in a way that will make it possible to
replace almost all the existing test-generation scripts with this
system in upcoming patches.

Differential Revision: https://reviews.llvm.org/D122506
2022-03-30 15:05:31 -04:00
Louis Dionne
9efffe8278 [libc++][NFC] Make private header generation CMake comment more consistent 2021-07-29 14:17:04 -04:00
Christopher Di Bella
e37bbfe59c [libcxx][modules] protects users from relying on libc++ detail headers (1/n)
libc++ has started splicing standard library headers into much more
fine-grained content for maintainability. It's very likely that outdated
and naive tooling (some of which is outside of LLVM's scope) will
suggest users include things such as `<__algorithm/find.h>` instead of
`<algorithm>`, and Hyrum's law suggests that users will eventually begin
to rely on this without the help of tooling. As such, this commit
intends to protect users from themselves, by making it a hard error for
anyone outside of the standard library to include libc++ detail headers.

This is the first of four patches. Patch #2 will solve the problem for
pre-processor `#include`s; patches #3 and #4 will solve the problem for
`<__tree>` and `<__hash_table>` (since I've never touched the test cases
that are failing for these two, I want to split them out into their own
commits to be extra careful). Patch #5 will concern itself with
`<__threading_support>`, which intersects with libcxxabi (which I know
even less about).

Differential Revision: https://reviews.llvm.org/D105932
2021-07-16 22:39:18 +00:00
Louis Dionne
1f8e286cdc [libc++] Add a CMake target to re-generate files and revamp CONTRIBUTING.rst
As we automate more and more things in the library, it becomes useful for
contributors to have a single target for running all the automation as
part of their workflow. This commit adds a new `libcxx-generate-files`
target that should re-generate all the auto-generated files in the library.

As a fly-by, I also revamped the documentation on Contributing to account
for this new target and present it as a bullet list of things to check
before committing. I also added a few things that are often overlooked
to that list, such as updating the synopsis and the status files.

Differential Revision: https://reviews.llvm.org/D106067
2021-07-15 12:07:26 -04:00