llvm-project

Author	SHA1	Message	Date
Louis Dionne	41145feb77	[libc++][modules] Rewrite the modulemap to have fewer top-level modules (#110501)

This is a re-application of bc6bd3bc1e9 which was reverted in f11abac6524 because it broke the Clang pre-commit CI. Original commit message:

This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup:

```
// in libc++'s include/c++/v1/module.modulemap
module std {
  header "stddef.h"
  header "stdlib.h"
}

// in the C library's include/module.modulemap
module clib {
  header "stddef.h"
  header "stdlib.h"
}
```

Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this.

Fixes #86193	2024-09-30 14:17:05 -04:00
Chris B	f11abac652	Revert "[libc++][modules] Rewrite the modulemap to have fewer top-level modules (#107638)" (#110384) This reverts 3 commits: 45a09d1811d5d6597385ef02ecf2d4b7320c37c5, 24bc3244d4e221f4e6740f45e2bf15a1441a3076, bc6bd3bc1e99c7ec9e22dff23b4f4373fa02cae3. The GitHub pre-merge CI has been broken since this PR went in. This change reverts it to see if I can get the pre-merge CI working again.	2024-09-28 21:47:09 -05:00
Louis Dionne	bc6bd3bc1e	[libc++][modules] Rewrite the modulemap to have fewer top-level modules (#107638)

This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup:

```
// in libc++'s include/c++/v1/module.modulemap
module std {
  header "stddef.h"
  header "stdlib.h"
}

// in the C library's include/module.modulemap
module clib {
  header "stddef.h"
  header "stdlib.h"
}
```

Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this.

Fixes #86193	2024-09-26 13:19:48 -04:00
Louis Dionne	e2754890ca	[libc++] Don't commit libcxx.imp (#89391) We can instead generate it on-the-fly when we install the headers. This reduces the amount of boilerplate we have to re-generate whenever we add, remove or relocate header files. Fixes #88529	2024-04-22 08:45:02 -04:00
Mark de Wever	59e66c515a	[libc++][format] Switches to Unicode 15.1. (#86543)

In addition to changes in the tables, the extended grapheme clustering algorithm has been overhauled. Before, I considered a separate state machine to implement the rules. With the new rule GB9c this became more attractive and the design has changed.

This change initially had quite an impact on the performance. By making the state machine persistent, the performance was improved greatly. Note it is still slower than before due to the larger Unicode tables.

Before
```
--------------------------------------------------------------------
Benchmark                         Time           CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>            1891 ns       1889 ns       369504
BM_unicode_text<char>        106642 ns     106397 ns         6576
BM_cyrillic_text<char>        73420 ns      73277 ns         9445
BM_japanese_text<char>        62485 ns      62387 ns        11153
BM_emoji_text<char>            1895 ns       1893 ns       369525
BM_ascii_text<wchar_t>         2015 ns       2013 ns       346887
BM_unicode_text<wchar_t>      92119 ns      92017 ns         7598
BM_cyrillic_text<wchar_t>     62637 ns      62568 ns        11117
BM_japanese_text<wchar_t>     53850 ns      53785 ns        12803
BM_emoji_text<wchar_t>         2016 ns       2014 ns       347325
```

After
```
--------------------------------------------------------------------
Benchmark                         Time           CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>            1906 ns       1904 ns       369409
BM_unicode_text<char>        265462 ns     265175 ns         2628
BM_cyrillic_text<char>       181063 ns     180865 ns         3871
BM_japanese_text<char>       130927 ns     130789 ns         5324
BM_emoji_text<char>            1892 ns       1890 ns       370537
BM_ascii_text<wchar_t>         2038 ns       2035 ns       343689
BM_unicode_text<wchar_t>     277603 ns     277282 ns         2526
BM_cyrillic_text<wchar_t>    188558 ns     188339 ns         3727
BM_japanese_text<wchar_t>    133084 ns     132943 ns         5262
BM_emoji_text<wchar_t>         2012 ns       2010 ns       348015
```

Persistent
```
--------------------------------------------------------------------
Benchmark                         Time           CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>            1904 ns       1899 ns       367472
BM_unicode_text<char>        133609 ns     133287 ns         5246
BM_cyrillic_text<char>        90185 ns      89941 ns         7796
BM_japanese_text<char>        75137 ns      74946 ns         9316
BM_emoji_text<char>            1906 ns       1901 ns       368081
BM_ascii_text<wchar_t>         2703 ns       2696 ns       259153
BM_unicode_text<wchar_t>     131497 ns     131168 ns         5341
BM_cyrillic_text<wchar_t>     87071 ns      86840 ns         8076
BM_japanese_text<wchar_t>     72279 ns      72099 ns         9682
BM_emoji_text<wchar_t>         2021 ns       2016 ns       346767
```	2024-04-09 19:20:06 +02:00
Mark de Wever	600462a2db	[libc++][modules] Adds std.compat module. (#71438) This adds the std.compat module. The patch contains a bit of refactoring to avoid code duplication between the std and std.compat module. Implements parts of - P2465R3 Standard Library Modules std and std.compat	2023-12-09 13:51:50 +01:00
Mark de Wever	41161aeb54	[libc++][modules] Generates std.cppm.in. This takes the header restrictions into account instead of manually duplicating this build information. This is a preparation to properly support the libc++ disabled parts in the std module. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D158192	2023-08-22 20:13:39 +02:00
Ian Anderson	f0c5ce0800	[libc++][Modules] Generate the __std_clang_module header Use header_information to generate the __std_clang_module header. Instead of using lit_header_restrictions like the manually written header did, make a new header_include_requirements to codify what can be included rather than what can be fully tested. Reviewed By: Mordante, #libc Differential Revision: https://reviews.llvm.org/D157364	2023-08-14 12:08:00 -07:00
Louis Dionne	81cc929d4f	[libc++] Use .gen.py tests for the transitive inclusion tests This finishes the transition of tests covered in generate_header_tests.py to the new .gen.py format. Differential Revision: https://reviews.llvm.org/D152008	2023-06-05 07:23:31 -07:00
Louis Dionne	45307f1b0d	[libc++] Refactor the mandatory header inclusion tests to .gen.py This allows removing a bunch of boilerplate from the test suite and reducing the amount of manual stuff contributors have to do when they add a new public header. Differential Revision: https://reviews.llvm.org/D151830	2023-06-01 19:56:30 -07:00
Mark de Wever	68c3d66a97	[libc++][format] Improves width estimate.

As obvious from the paper's title, this is an LWG issue and thus retroactively applied to C++20. This change may alter the output for certain code points:
1. Considers 8477 extra code points as having a width of 2 (as of Unicode 15) (mostly Tangut Ideographs)
2. Changes the width of 85 unassigned code points from 2 to 1
3. Changes the width of 8 code points (in the range U+3248 CIRCLED NUMBER TEN ON BLACK SQUARE ... U+324F CIRCLED NUMBER EIGHTY ON BLACK SQUARE) from 2 to 1, because it seems questionable to make an exception for those without input from Unicode

Note that libc++ already uses Unicode 15, while the Standard requires Unicode 12. (The last time I checked, MSVC STL used Unicode 14.) So in practice the only notable change is item 3.

Implements P2675 LWG3780: The Paper format's width estimation is too approximate and not forward compatible

Benchmark before these changes
```
--------------------------------------------------------------------
Benchmark                         Time           CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>            3928 ns       3928 ns       178131
BM_unicode_text<char>         75231 ns      75230 ns         9158
BM_cyrillic_text<char>        59837 ns      59834 ns        11529
BM_japanese_text<char>        39842 ns      39832 ns        17501
BM_emoji_text<char>            3931 ns       3930 ns       177750
BM_ascii_text<wchar_t>         4024 ns       4024 ns       174190
BM_unicode_text<wchar_t>      63756 ns      63751 ns        11136
BM_cyrillic_text<wchar_t>     44639 ns      44638 ns        15597
BM_japanese_text<wchar_t>     34425 ns      34424 ns        20283
BM_emoji_text<wchar_t>         3937 ns       3937 ns       177684
```

Benchmark after these changes
```
--------------------------------------------------------------------
Benchmark                         Time           CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>            3914 ns       3913 ns       178814
BM_unicode_text<char>         70380 ns      70378 ns         9694
BM_cyrillic_text<char>        51889 ns      51877 ns        13488
BM_japanese_text<char>        41707 ns      41705 ns        16723
BM_emoji_text<char>            3908 ns       3907 ns       177912
BM_ascii_text<wchar_t>         3949 ns       3948 ns       177525
BM_unicode_text<wchar_t>      64591 ns      64587 ns        10649
BM_cyrillic_text<wchar_t>     44089 ns      44078 ns        15721
BM_japanese_text<wchar_t>     39369 ns      39367 ns        17779
BM_emoji_text<wchar_t>         3936 ns       3934 ns       177821
```

Benchmarks without `if(__code_point < (__entries[0] >> 14))`
```
--------------------------------------------------------------------
Benchmark                         Time           CPU   Iterations
--------------------------------------------------------------------
BM_ascii_text<char>            3922 ns       3922 ns       178587
BM_unicode_text<char>         94474 ns      94474 ns         7351
BM_cyrillic_text<char>        69202 ns      69200 ns        10157
BM_japanese_text<char>        42735 ns      42692 ns        16382
BM_emoji_text<char>            3920 ns       3919 ns       178704
BM_ascii_text<wchar_t>         3951 ns       3950 ns       177224
BM_unicode_text<wchar_t>      81003 ns      80988 ns         8668
BM_cyrillic_text<wchar_t>     57020 ns      57018 ns        12048
BM_japanese_text<wchar_t>     39695 ns      39687 ns        17582
BM_emoji_text<wchar_t>         3977 ns       3976 ns       176479
```

This optimization does carry its weight for the Unicode and Cyrillic tests. For the Japanese tests the gains are minor, and for emoji it seems to have no effect.

Reviewed By: ldionne, tahonermann, #libc

Differential Revision: https://reviews.llvm.org/D144499	2023-04-20 21:18:33 +02:00
Christopher Di Bella	ab46648082	[libcxx] adds an include-what-you-use (IWYU) mapping file

This makes it possible for programmers to run IWYU and get more accurate standard library inclusions. Prior to this commit, the following program would be transformed thusly:

```cpp
// Before
#include <algorithm>
#include <vector>

void f() {
  auto v = std::vector{0, 1};
  std::find(std::ranges::begin(v), std::ranges::end(v), 0);
}
```

```cpp
// After
#include <__algorithm/find.h>
#include <__ranges/access.h>
#include <vector>
...
```

There are two ways to fix this issue: to use [comment pragmas](https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUPragmas.md) on every private include, or to write a canonical [mapping file](https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUMappings.md) that provides the tool with a manual on how libc++ is laid out. Due to the complexity of libc++, this commit opts for the latter, to maximise correctness and minimise developer burden. To minimise developer updates to the file, it makes use of wildcards that match everything within listed subdirectories.

A script has also been added to ensure that the mapping is always fresh in CI, and makes the process a single step.

Finally, documentation has been added to inform users that IWYU is supported, and what they need to do in order to leverage the mapping file.

Closes #56937.

Differential Revision: https://reviews.llvm.org/D138189	2022-11-22 01:09:49 +00:00
Mark de Wever	a48007355a	[libc++][format] Implements string escaping. Implements parts of - P2286R8 Formatting Ranges Reviewed By: #libc, tahonermann Differential Revision: https://reviews.llvm.org/D134036	2022-10-20 17:29:34 +02:00
Mark de Wever	130b1816c5	[libc++] Improve updating data files. This change makes it easier to update the Unicode data files used for the Extended Grapheme Clustering added in D126971. Reviewed By: ldionne, #libc Differential Revision: https://reviews.llvm.org/D129668	2022-08-16 18:55:46 +02:00
Louis Dionne	9a44ed43cf	[libc++] Implement tests for private headers using the new generator Differential Revision: https://reviews.llvm.org/D123028	2022-04-04 17:44:47 -04:00
Louis Dionne	a4f73b9b14	[libc++][NFC] Rename generate_assertion_tests.py to generate_header_tests.py	2022-04-04 09:10:52 -04:00
Louis Dionne	be1294de9d	[libc++] Implement all public header tests using the new generator Note that `generate_assertion_tests.py` will be renamed to `generate_header_tests.py` separately to facilitate change tracking. Differential Revision: https://reviews.llvm.org/D123000	2022-04-04 09:09:37 -04:00
Louis Dionne	385cc25a53	[libc++] Ensure that all public C++ headers include <__assert> This patch changes the requirement for getting the declaration of the assertion handler from including <__assert> to including any public C++ header of the library. Note that C compatibility headers are excluded because we don't implement all the C headers ourselves -- some of them are taken straight from the C library, like assert.h. It also adds a generated test to check it. Furthermore, this new generated test is designed in a way that will make it possible to replace almost all the existing test-generation scripts with this system in upcoming patches. Differential Revision: https://reviews.llvm.org/D122506	2022-03-30 15:05:31 -04:00
Louis Dionne	9efffe8278	[libc++][NFC] Make private header generation CMake comment more consistent	2021-07-29 14:17:04 -04:00
Christopher Di Bella	e37bbfe59c	[libcxx][modules] protects users from relying on libc++ detail headers (1/n) libc++ has started splitting standard library headers into much more fine-grained content for maintainability. It's very likely that outdated and naive tooling (some of which is outside of LLVM's scope) will suggest users include things such as `<__algorithm/find.h>` instead of `<algorithm>`, and Hyrum's law suggests that users will eventually begin to rely on this without the help of tooling. As such, this commit intends to protect users from themselves, by making it a hard error for anyone outside of the standard library to include libc++ detail headers. This is the first of four patches. Patch #2 will solve the problem for pre-processor `#include`s; patches #3 and #4 will solve the problem for `<__tree>` and `<__hash_table>` (since I've never touched the test cases that are failing for these two, I want to split them out into their own commits to be extra careful). Patch #5 will concern itself with `<__threading_support>`, which intersects with libcxxabi (which I know even less about). Differential Revision: https://reviews.llvm.org/D105932	2021-07-16 22:39:18 +00:00
Louis Dionne	1f8e286cdc	[libc++] Add a CMake target to re-generate files and revamp CONTRIBUTING.rst As we automate more and more things in the library, it becomes useful for contributors to have a single target for running all the automation as part of their workflow. This commit adds a new `libcxx-generate-files` target that should re-generate all the auto-generated files in the library. As a fly-by, I also revamped the documentation on Contributing to account for this new target and present it as a bullet list of things to check before committing. I also added a few things that are often overlooked to that list, such as updating the synopsis and the status files. Differential Revision: https://reviews.llvm.org/D106067	2021-07-15 12:07:26 -04:00

21 Commits