llvm-project

Author	SHA1	Message	Date
Ilya Biryukov	f02b1cc99e	[ASTWriter] Detect more non-affecting FileIDs to reduce source location duplication (#112015 ) Currently, any FileID that references a module map file that was required for a compilation is considered as affecting. This misses an important opportunity to reduce the source location space taken by the resulting PCM. In particular, consider the situation where the same module map file is passed multiple times in the dependency chain: ```shell $ clang -fmodule-map-file=foo.modulemap ... -o mod1.pcm $ clang -fmodule-map-file=foo.modulemap -fmodule-file=mod1.pcm ... -o mod2.pcm ... $ clang -fmodule-map-file=foo.modulemap -fmodule-file=mod$((N-1)).pcm ... -o mod$N.pcm ``` Because `foo.modulemap` is read before reading any of the `.pcm` files, we have to create a unique `FileID` for it when creating each module. However, when reading the `.pcm` files, we will reuse the `FileID` loaded from it for the same module map file and the `FileID` we created can never be used again, but we will still mark it as affecting and it will take the source location space in the output PCM. For a chain of N dependencies, this results in the file taking `N * (size of file)` source location space, which could be significant. For example, we observed internally that some targets that run out of 2GB of source location space end up wasting up to 20% of that space in module maps as described above. I take extra care to still write the InputFile entries for those files that occupied source location space before. This is required for the correctness of clang-scan-deps.	2024-11-08 09:10:37 +01:00
Jan Svoboda	53e49f15ab	[clang][serialization] Pass `ASTContext` explicitly (#115235 ) This patch removes `ASTWriter::Context` and starts passing `ASTContext &` explicitly to functions that actually need it. This is a non-functional change with the end-goal of being able to write lightweight PCM files with no `ASTContext` at all.	2024-11-07 14:40:21 -08:00
Jan Svoboda	304c412173	[clang][serialization] Reduce `ASTWriter::writeUnhashedControlBlock()` scope	2024-11-06 12:54:01 -08:00
Jan Svoboda	0276621f8f	[clang][serialization] Reduce `ASTWriter::WriteControlBlock()` scope	2024-11-06 12:36:46 -08:00
Jan Svoboda	bcb64e1317	[clang][serialization] Reduce `ASTWriter::WriteSourceManagerBlock()` scope	2024-11-06 12:34:24 -08:00
David Pagan	435e58468a	[clang][OpenMP] Add 'allocator' modifier for 'allocate' clause. (#114883 ) The 'allocator' modifier is now accepted in the 'allocate' clause. Added LIT tests covering codegen, PCH, template handling, and serialization for 'allocator' modifier. Added support for allocator-modifier to release notes. Testing - New allocate modifier LIT tests. - OpenMP LIT tests. - check-all - relevant sollve_vv test cases tests/5.2/scope/test_scope_allocate_construct.c	2024-11-05 17:06:41 -08:00
Jan Svoboda	e494e2694a	[clang][lex] Remove `HeaderFileInfo::Framework` (#114460 ) This PR removes the `HeaderFileInfo::Framework` member and reduces the size of this data type from 32B to 16B. This should improve Clang's memory usage in situations where it keeps track of lots of header files. NFCI. Depends on #114459.	2024-10-31 16:33:28 -07:00
Jan Svoboda	19b4f17d4c	[clang][lex] Remove `-index-header-map` (#114459 ) This PR removes the `-index-header-map` functionality from Clang. AFAIK this was only used internally at Apple and is now dead code. The main motivation behind this change is to enable the removal of `HeaderFileInfo::Framework` member and reducing the size of that data structure. rdar://84036149	2024-10-31 16:04:35 -07:00
Jan Svoboda	da1a16ae10	[clang][modules] Preserve the module map that allowed inferring (#113389 ) With inferred modules, the dependency scanner takes care to replace the fake "__inferred_module.map" path with the file that allowed the module to be inferred. However, this only worked when such a module was imported directly in the TU. Whenever such module got loaded transitively, the scanner would fail to perform the replacement. This is caused by the fact that PCM files are lossy and drop this information. This patch makes sure that PCMs include this file for each submodule (in the `SUBMODULE_DEFINITION` record), fixes one existing test with an incorrect assertion, and does a little drive-by refactoring of `ModuleMap`.	2024-10-28 11:24:27 -07:00
Jan Svoboda	590b1e3154	[clang][modules] Only serialize info for locally-included headers (#113718 ) I noticed that some PCM files contain `HeaderFileInfo` for headers only included in a dependent PCM file, which is wasteful. This patch changes the logic to only write headers that are included locally. This makes the PCM files smaller and saves some superfluous deserialization of `HeaderFileInfo` triggered by `Preprocessor::alreadyIncluded()`.	2024-10-25 15:00:07 -07:00
Jan Svoboda	61946687bc	[clang][modules] Shrink the size of `Module::Headers` (#113395 ) This patch shrinks the size of the `Module` class from 2112B to 1624B. I wasn't able to get good data on the actual impact on memory usage, but given my `clang-scan-deps` workload at hand (with tens of thousands of instances), I think there should be some win here. This also speeds up my benchmark by under 0.1%.	2024-10-25 11:33:44 -07:00
Jan Svoboda	0ffa29fe81	[clang][modules] Timestamp PCM files when writing (#112452 ) Clang uses timestamp files to track the last time an implicitly-built PCM file was verified to be up-to-date with regard to its inputs. With `-fbuild-session-{file,timestamp}=` and `-fmodules-validate-once-per-build-session` this reduces the number of times a PCM file is checked per "build session". The behavior I'm seeing with the current scheme is that when lots of Clang instances wait for the same PCM to be built, they race to validate it as soon as the file lock gets released, causing lots of concurrent IO. This patch makes it so that the timestamp is written by the same Clang instance responsible for building the PCM while still holding the lock. This makes it so that whenever a PCM file gets compiled, it's never re-validated in the same build session. I believe this is as sound as the current scheme. One thing to be aware of is that there might be a time interval between accessing input file N and writing the timestamp file, where changes to input files 0..<N would not result in a rebuild. Since this is also the case with the current scheme, I'm not too concerned about that. I've seen this speed up `clang-scan-deps` by ~27%.	2024-10-22 15:08:02 -07:00
Erich Keane	c8cbdc659c	[OpenACC] Implement 'loop' 'vector' clause (#112259 ) The 'vector' clause specifies the iterations to be executed in vector or SIMD mode. There are some limitations on which associated compute contexts may be associated with this and have arguments, but otherwise this is a fairly unrestricted clause. It DOES have region limits like 'gang' and 'worker'.	2024-10-15 06:12:19 -07:00
Erich Keane	cf456ed2a4	[OpenACC] implement loop 'worker' clause. (#112206 ) The worker clause specifies iterations of the loop that are executed in parallel by distributing the iterations among the multiple workers within a single gang. The sema rules for this type are simply that it cannot be combined with a `kernel` construct with a `num_workers` clause, child `loop` clauses cannot contain a `gang` or `worker` clause, and that the argument is only allowed when associated with a `kernel`.	2024-10-14 09:08:24 -07:00
Erich Keane	5b25c31351	[OpenACC] Implement loop 'gang' clause. (#112006 ) The 'gang' clause is used to specify parallel execution of loops, thus has some complicated rules depending on the 'loop's associated compute construct. This patch implements all of those.	2024-10-11 09:05:19 -07:00
Michael Kruse	5b03efb85d	[Clang][OpenMP] Add permutation clause (#92030 ) Add the permutation clause for the interchange directive which will be introduced in the upcoming OpenMP 6.0 specification. A preview has been published in [Technical Report12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf).	2024-10-09 14:56:43 +02:00
Erich Keane	d412cea8c4	[OpenACC] Implement 'tile' attribute AST (#110999 ) The 'tile' clause shares quite a bit of the rules with 'collapse', so a followup patch will add those tests/behaviors. This patch deals with adding the AST node. The 'tile' clause takes a series of integer constant expressions, or *. The asterisk is now represented by a new OpenACCAsteriskSizeExpr node, else this clause is very similar to others.	2024-10-03 08:34:43 -07:00
Doug Wyatt	7fe43ada28	[Clang] nonblocking/nonallocating attributes: 2nd pass caller/callee analysis (#99656 ) - In Sema, when encountering Decls with function effects needing verification, add them to a vector, DeclsWithEffectsToVerify. - Update AST serialization to include DeclsWithEffectsToVerify. - In AnalysisBasedWarnings, use DeclsWithEffectsToVerify as a work queue, verifying functions with declared effects, and inferring (when permitted and necessary) whether their callees have effects. --------- Co-authored-by: Doug Wyatt <dwyatt@apple.com> Co-authored-by: Sirraide <aeternalmail@gmail.com> Co-authored-by: Erich Keane <ekeane@nvidia.com>	2024-10-03 02:14:51 +02:00
Erich Keane	97da34e015	[OpenACC] Add 'collapse' clause AST/basic Sema implementation (#109461 ) The 'collapse' clause on a 'loop' construct is used to specify how many nested loops are associated with the 'loop' construct. It takes an optional 'force' tag, and an integer constant expression as arguments. There are many other restrictions based on the contents of the loop/etc, but those are implemented in followup patches, for now, this patch just adds the AST node and does basic argument checking on the loop-count.	2024-10-01 06:40:21 -07:00
Dmitry Polukhin	9a361684c8	[C++20][Modules] Fix non-determinism in serialized AST (#110131 ) Summary: https://github.com/llvm/llvm-project/pull/109167 serializes FunctionToLambdasMap in the order of pointers in DenseMap. It gives different order with different memory layouts. Fix this issue by using LocalDeclID instead of pointers. Test Plan: check-clang	2024-09-27 07:33:59 +01:00
Kadir Cetinkaya	2ad435f9f6	Revert "[clang] Extend diagnose_if to accept more detailed warning information (#70976 )" This reverts commit e39205654dc11c50bd117e8ccac243a641ebd71f. There are further discussions in https://github.com/llvm/llvm-project/pull/70976, happening for the past two weeks. Since there have been no responses for a couple of weeks now, reverting until the author is back.	2024-09-26 12:16:07 +02:00
Dmitry Polukhin	2ccac07bf2	[C++20][Modules] Fix crash when function and lambda inside loaded from different modules (#109167 ) Summary: Because AST loading code is lazy and happens in unpredictable order, it is possible that a function and lambda inside the function can be loaded from different modules. As a result, the captured DeclRefExpr won’t match the corresponding VarDecl inside the function. This situation is reflected in the AST as follows: ``` FunctionDecl 0x555564f4aff0 <Conv.h:33:1, line:41:1> line:33:35 imported in ./thrift_cpp2_base.h hidden tryTo 'Expected<Tgt, const char *> ()' inline |-also in ./folly-conv.h `-CompoundStmt 0x555564f7cfc8 <col:43, line:41:1> |-DeclStmt 0x555564f7ced8 <line:34:3, col:17> | `-VarDecl 0x555564f7cef8 <col:3, col:16> col:7 imported in ./thrift_cpp2_base.h hidden referenced result 'Tgt' cinit | `-IntegerLiteral 0x555564f7d080 <col:16> 'int' 0 |-CallExpr 0x555564f7cea8 <line:39:3, col:76> '<dependent type>' | |-UnresolvedLookupExpr 0x555564f7bea0 <col:3, col:19> '<overloaded function type>' lvalue (no ADL) = 'then_' 0x555564f7bef0 | |-CXXTemporaryObjectExpr 0x555564f7bcb0 <col:25, col:45> 'Expected<bool, int>':'folly::Expected<bool, int>' 'void () noexcept' zeroing | `-LambdaExpr 0x555564f7bc88 <col:48, col:75> '(lambda at Conv.h:39:48)' | |-CXXRecordDecl 0x555564f76b88 <col:48> col:48 imported in ./folly-conv.h hidden implicit <undeserialized declarations> class definition | | |-also in ./thrift_cpp2_base.h | | `-DefinitionData lambda empty standard_layout trivially_copyable literal can_const_default_init | | |-DefaultConstructor defaulted_is_constexpr | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param | | |-MoveConstructor exists simple trivial needs_implicit | | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param | | |-MoveAssignment | | `-Destructor simple irrelevant trivial constexpr needs_implicit | `-CompoundStmt 0x555564f7d1a8 <col:58, col:75> | `-ReturnStmt 0x555564f7d198 <col:60, col:67> | `-DeclRefExpr 0x555564f7d0a0 <col:67> 'Tgt' lvalue Var 0x555564f7d0c8 'result' 'Tgt' refers_to_enclosing_variable_or_capture `-ReturnStmt 0x555564f7bc78 <line:40:3, col:11> `-InitListExpr 0x555564f7bc38 <col:10, col:11> 'void' ``` This diff modifies the AST deserialization process to load lambdas within the canonical function declaration sooner, immediately following the function, ensuring that they are loaded from the same module. Re-land https://github.com/llvm/llvm-project/pull/104512 Added test case that caused crash due to multiple enclosed lambdas deserialization. Test Plan: check-clang	2024-09-25 08:31:49 +01:00
Nikolas Klauser	f5be5cdaad	[Clang] Add __builtin_common_type (#99473 ) This implements the logic of the `common_type` base template as a builtin alias. If there should be no `type` member, an empty class is returned. Otherwise a specialization of a `type_identity`-like class is returned. The base template (i.e. `std::common_type`) as well as the empty class and `type_identity`-like struct are given as arguments to the builtin.	2024-09-22 09:25:52 +02:00
Nikolas Klauser	e39205654d	Reapply "Reapply "[clang] Extend diagnose_if to accept more detailed warning information (#70976 )" (#108453 )" This reverts commit e1bd9740faa62c11cc785a7b70ec1ad17e286bd1. Fixes incorrect use of the `DiagnosticsEngine` in the clangd tests.	2024-09-14 22:25:08 +02:00
Florian Mayer	e1bd9740fa	Revert "Reapply "[clang] Extend diagnose_if to accept more detailed warning information (#70976 )" (#108453 )" This reverts commit e7f782e7481cea23ef452a75607d3d61f5bd0d22. This had UBSan failures: [----------] 1 test from ConfigCompileTests [ RUN ] ConfigCompileTests.DiagnosticSuppression Config fragment: compiling <unknown>:0 -> 0x00007B8366E2F7D8 (trusted=false) /usr/local/google/home/fmayer/large/llvm-project/llvm/include/llvm/ADT/IntrusiveRefCntPtr.h:203:33: runtime error: reference binding to null pointer of type 'clang::DiagnosticIDs' UndefinedBehaviorSanitizer: undefined-behavior /usr/local/google/home/fmayer/large/llvm-project/llvm/include/llvm/ADT/IntrusiveRefCntPtr.h:203:33 Pull Request: https://github.com/llvm/llvm-project/pull/108645	2024-09-13 15:01:33 -07:00
Nikolas Klauser	e7f782e748	Reapply "[clang] Extend diagnose_if to accept more detailed warning information (#70976 )" (#108453 ) This reverts commit e0cd11eba526234ca14a0b91f5598ca3363b6aca. Update the use of `getWarningOptionForDiag` in flang to use the DiagnosticIDs.	2024-09-13 11:34:20 +02:00
Kazu Hirata	e0cd11eba5	Revert "[clang] Extend diagnose_if to accept more detailed warning information (#70976 )" This reverts commit 030c6da7af826b641db005be925b20f956c3a6bb. Several build bots are failing: https://lab.llvm.org/buildbot/#/builders/89/builds/6211 https://lab.llvm.org/buildbot/#/builders/157/builds/7578 https://lab.llvm.org/buildbot/#/builders/140/builds/6429	2024-09-12 12:19:26 -07:00
Nikolas Klauser	030c6da7af	[clang] Extend diagnose_if to accept more detailed warning information (#70976 ) This implements parts of the extension proposed in https://discourse.llvm.org/t/exposing-the-diagnostic-engine-to-c/73092/7. Specifically, this makes it possible to specify a diagnostic group in an optional third argument.	2024-09-12 20:15:01 +02:00
Helena Kotas	e00e9a3f82	[HLSL] Add HLSLAttributedResourceType (#106181 ) Introducing `HLSLAttributedResourceType` - a new type that is similar to `AttributedType` but with additional data specific to HLSL resources. `AttributedType` currently only stores an attribute kind and no additional data from the type attribute parameters. This does not really work for HLSL resources since their type attributes contain non-boolean values that need to be retained as well. For example: ``` template <typename T> class RWBuffer { __hlsl_resource_t [[hlsl::resource_class(uav)]] [[hlsl::is_rov]] handle; }; ``` The data `HLSLAttributedResourceType` needs to eventually store are: - resource class (SRV, UAV, CBuffer, Sampler) - texture dimension (1-3) - flags is_rov, is_array, is_feedback and is_multisample - contained type All of these values except contained type will be stored in `HLSLAttributedResourceType::Attributes` struct and accessed individually via the fields. There is also a `Data` alias that covers all of these values as an `unsigned` which is used for hashing and the AST type serialization. During type attribute processing all HLSL type attributes will be validated and collected by SemaHLSL (by `SemaHLSL::handleResourceTypeAttr`) and in the end combined into a single `HLSLAttributedResourceType` instance (in `SemaHLSL::ProcessResourceTypeAttributes`). `SemaHLSL` will also need to short-term store the `TypeLoc` information for the new type that will be grabbed by `TypeSpecLocFiller` soon after the type is created. Part 1/2 of #104861	2024-08-29 21:42:20 -07:00
Chuanqi Xu	47615ff234	[C++20] [Modules] Don't insert class not in named modules to PendingEmittingVTables (#106501 ) Close https://github.com/llvm/llvm-project/issues/102933 The root cause of the issue is an oversight in https://github.com/llvm/llvm-project/pull/102287 that I didn't notice that PendingEmittingVTables should only accept classes in named modules.	2024-08-29 15:42:57 +08:00
Jonas Hahnfeld	66bd5d7989	[clang-repl] Fix PCH with delayed template parsing (#103028 ) When instantiating a delayed template, the recorded token stream is passed to `Parser::ParseLateTemplatedFuncDef` which will append the current token "so it doesn't get lost". With incremental extensions enabled, this is `repl_input_end` which subsequently needs support for (de)serialization.	2024-08-14 15:11:04 +02:00
Chuanqi Xu	4915fddbb2	[Serialization] Add a callback to register new created predefined decls for DeserializationListener (#102855 ) Close https://github.com/llvm/llvm-project/issues/102684 The root cause of the issue is, it is possible that the predefined decl is not registered at the beginning of writing a module file but got created during the process of writing from reading. This is incorrect. The predefined decls should always be predefined decls. Another deep thought about the issue is, we shouldn't read any new things after we start to write the module file. But this is another deeper question.	2024-08-12 18:27:37 +08:00
Chuanqi Xu	cb372bd5e7	Revert "[NFC] [C++20] [Modules] Adjust the implementation of wasDeclEmitted to make it more clear" This reverts commit 4399f2a5ef38df381c2b65052621131890194d59. This fails with Modules/aarch64-sme-keywords.cppm	2024-08-12 14:50:32 +08:00
Chuanqi Xu	4399f2a5ef	[NFC] [C++20] [Modules] Adjust the implementation of wasDeclEmitted to make it more clear The previous implementation of wasDeclEmitted may be confusing as to why we need to filter declarations that are not from modules. Now adjust the implementation to avoid the problem.	2024-08-12 11:25:05 +08:00
Shilei Tian	1c269929d0	[Clang][Sema][OpenMP] Allow `thread_limit` to accept multiple expressions (#102715 )	2024-08-10 09:54:58 -04:00
Chuanqi Xu	847f9cb0e8	Reland [C++20] [Modules] [Itanium ABI] Generate the vtable in the mod… (#102287 ) Reland https://github.com/llvm/llvm-project/pull/75912 The differences between this PR and https://github.com/llvm/llvm-project/pull/75912 are: - Fixed a regression in `Decl::isInAnotherModuleUnit()` in DeclBase.cpp pointed out by @mizvekov and added the corresponding test. - Fixed the regression on Windows https://github.com/llvm/llvm-project/issues/97447. The changes are in `CodeGenModule::getVTableLinkage` from `clang/lib/CodeGen/CGVTables.cpp`. According to the feedback from MSVC devs, the linkage of vtables won't be affected by modules. So I simply skipped the case for MSVC. Given this is more or less fundamental to the use of modules, I hope we can backport this to 19.x.	2024-08-08 13:14:09 +08:00
Shilei Tian	cee594cf36	[Clang][Sema][OpenMP] Allow `num_teams` to accept multiple expressions (#99732 ) By the OpenMP standard, `num_teams` clause can only accept one expression (for now). In this patch, we extend it to accept multiple expressions when it is used with `target teams ompx_bare` construct. This allows launching a multi-dim grid, same as CUDA/HIP.	2024-08-06 10:55:15 -04:00
Julian Brown	a42e515e3a	[OpenMP] OpenMP 5.1 "assume" directive parsing support (#92731 ) This is a minimal patch to support parsing for "omp assume" directives. These are meant to be hints to a compiler's optimisers: as such, it is legitimate (if not very useful) to ignore them. The patch builds on top of the existing support for "omp assumes" directives (note spelling!). Unlike the "omp [begin/end] assumes" directives, "omp assume" is associated with a compound statement, i.e. it can appear within a function. The "holds" assumption could (theoretically) be mapped onto the existing builtin "__builtin_assume", though the latter applies to a single point in the program, and the former to a range (i.e. the whole of the associated compound statement). This patch fixes sollve's OpenMP 5.1 "omp assume"-based tests.	2024-08-05 07:37:07 -04:00
Kazu Hirata	1fa7f05b70	[clang] Construct SmallVector with ArrayRef (NFC) (#101898 )	2024-08-04 23:46:34 -07:00
Krzysztof Parzyszek	243b27f7e1	[clang][OpenMP] Rename `varlists` to `varlist`, NFC (#101058 ) It returns a range of variables (via Expr*), not a range of lists.	2024-07-30 08:11:09 -05:00
Chuanqi Xu	c184b94ff6	[C++20] [Modules] Write ODRHash for decls in GMF Previously, we skipped calculating ODRHash for decls in GMF when writing them to .pcm files as an optimization. But actually, it is not true that this is a pure optimization. Whether or not it is beneficial depends on the use cases. For example, if we're writing a function `a` in a module and there are 10 consumers of `a` in other TUs, then the other TUs will pay the cost of calculating the ODR hash for `a` ten times. Then this optimization doesn't work. However, if none of the consumers of the module touch `a`, then we save the cost of calculating the ODR hash of `a` once. The assumption behind it was: generally, the consumers of a module may only consume a small part of the imported module. This is the reason why we tried to load declarations, types and identifiers lazily. Then it looked good to do a similar thing for calculating ODR hashes. It worked fine for a long time, until we started to look into the support of modules in clangd. Then we met multiple issue reports complaining that we're calculating ODR hashes in the wrong place. To work around these issue reports, I decided to always write the ODRHash for decls in GMF. In my local test, I observed less than 1% compile time regression after doing this, so it should be fine.	2024-07-18 11:42:23 +08:00
Chuanqi Xu	91d40ef6e3	Revert "[C++20] [Modules] [Itanium ABI] Generate the vtable in the module unit of dynamic classes (#75912 )" This reverts commit 18f3bcbb13ca83d33223b00761d8cddf463e9ffb, 15bb02650e26875c48889053d6a9697444583721 and 99873b35da7ecb905143c8a6b8deca4d4416f1a9. See the post commit message in https://github.com/llvm/llvm-project/pull/75912 to see the reasons.	2024-07-10 10:58:18 +08:00
Jan Svoboda	0387a86f9a	[clang][modules] Fix use-after-free in header serialization (#96356 ) With the pruning of unused module map files disabled (`-fno-modules-prune-non-affecting-module-map-files`), `HeaderFileInfo` no longer gets deserialized before `ASTWriter::WriteHeaderSearch()`. This function then interleaves the stores of references to `KnownHeader` with their lazy deserialization. Lazy deserialization may cause reallocation of `ModuleMap::Headers` entries (including its `SmallVector<KnownHeader, 1>` values) thus making previously-stored `ArrayRef<KnownHeader>` dangling. This patch fixes that situation by storing a copy instead.	2024-07-08 09:26:44 -07:00
Chuanqi Xu	fa20184a8f	[C++20] [Modules] [Serialization] Don't reuse type ID and identifier ID from imported modules To support no-transitive-change model for named modules, we can't reuse type ID and identifier ID from imported modules arbitrarily. Since the theory for no-transitive-change model is, for a user of a named module, the user can only access the indirectly imported decls via the directly imported module. So that it is possible to control what matters to the users when writing the module. And it will be unsafe to do so if the users can reuse the type IDs and identifier IDs from the indirectly imported modules not via the directly imported modules. So in this patch, we don't reuse the type ID and identifier ID in the AST writer to avoid the problematic case.	2024-06-25 15:04:32 +08:00
Chuanqi Xu	1ecc5ae13b	[Serialization] Register Special types before registering decls We only register top-level types and decls in ASTWriter up front, and we register the sub-level types and decls during the process of writing types and decls. So the IDs of sub-level types can differ if the decl-writing process changes the order of the to-be-emitted type queue. This is not ideal since it causes unnecessary changes, especially in the no-transitive-change model. This patch mitigates the issue by registering special types before registering decls. This makes sure that the special types at the 2nd level are registered earlier than the decls. But it might still be problematic if the special types have more levels. Luckily, we don't have such special types.	2024-06-24 11:08:46 +08:00
Fangrui Song	f3005d5b86	[Serialization] Change input file content hash from size_t to uint64_t https://reviews.llvm.org/D67249 added content hash (see -fvalidate-ast-input-files-content) using llvm::hash_code (size_t). The hash value is 32-bit on 32-bit systems, which was unintentional. Fix #96379: #96136 switched the hash function to xxh3_64bit but did not update the ContentHash type, leading to mismatch between ASTReader and ASTWriter.	2024-06-22 14:21:36 -07:00
Chuanqi Xu	d4d95ee651	[Serialization] Register identifiers in ahead and don't emit predefined decls See the added test for the motivating example. In that example, we add a new function declaration in `a.cppm` and this is not used in the reduced BMI of `b.cppm`. We expect that the change won't affect the BMI of `b.cppm`. But that is not the case. There are 2 reasons for the unexpected result: 1. We registered the interesting identifiers in a pretty late phase. This may change some predefined identifier IDs because we insert other identifiers while emitting decls and types. 2. In `GenerateNameLookup`, we would generate information for predefined decls. This may not be intended, since no predefined decl belongs to any module. This patch solves the first issue by registering the identifiers at a very early point to make sure the IDs won't be affected by the process of emitting decls and types. And we solve the second issue by simply filtering out predefined decls.	2024-06-21 17:50:30 +08:00
Fangrui Song	874dcaea09	[Serialization] Use stable hash functions clangSerialization currently uses hash_combine/hash_value from Hashing.h, which are not guaranteed to be deterministic. Replace these uses with xxh3_64bits. Pull Request: https://github.com/llvm/llvm-project/pull/96136	2024-06-20 23:53:07 -07:00
Chuanqi Xu	03921b979d	[serialization] No transitive type change (#92511 ) Follow-up to https://github.com/llvm/llvm-project/pull/92085. #### motivation The motivation is still cutting off the unnecessary change in the dependency chain. See the above link (recursively) for details. And this will be the last patch of the `no-transitive-*-change` series. If there are any following patches, they might be C++20 Named modules specific to handle special grammars like `ADL` (See the reply in https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/53 for example). So they won't affect the whole serialization part as the patches in this series did. #### example After this patch, we are finally able to cut off unnecessary changes of types. For example, ``` //--- m-partA.cppm export module m:partA; //--- m-partA.v1.cppm export module m:partA; namespace NS { class A { public: int getValue() { return 43; } }; } //--- m-partB.cppm export module m:partB; export inline int getB() { return 430; } //--- m.cppm export module m; export import :partA; export import :partB; //--- useBOnly.cppm export module useBOnly; import m; export inline int get() { return getB(); } ``` The BMI of `useBOnly.cppm` is expected to not change if we only add a new class in `m:partA`. This will be pretty useful in practice. #### implementation details The key idea of this patch is similar to the previous patches: extend the 32-bit type ID to 64 bits so that we can store the module file index in the higher bits. Then the encoding of the type ID is independent of the imported modules. But there are two differences from the previous patches: - TypeID is not completely an index of serialized types. We used the lower 3 bits to store the qualifiers. - TypeID won't take part in any lookup process. So TypeID is used far less than the IDs in the previous patches. The first difference means we need slightly more complex bit operations. And the second difference makes the patch much simpler than the previous ones.	2024-06-21 09:21:40 +08:00
Chuanqi Xu	2f2ea3557b	[Serialization] No transitive identifier change (#92085 ) Follow-up to https://github.com/llvm/llvm-project/pull/92083 The motivation is still cutting off the unnecessary change in the dependency chain. See the above link (recursively) for details. After this patch (and the above patch), we can already do something pretty interesting. For example, #### Motivation example ``` //--- m-partA.cppm export module m:partA; export inline int getA() { return 43; } export class A { public: int getMem(); }; export template <typename T> class ATempl { public: T getT(); }; //--- m-partA.v1.cppm export module m:partA; export inline int getA() { return 43; } // Now we add a new declaration without introducing a new type. // The consuming module which didn't use m:partA completely is expected to be // not changed. export inline int getA2() { return 88; } export class A { public: int getMem(); // Now we add a new declaration without introducing a new type. // The consuming module which didn't use m:partA completely is expected to be // not changed. int getMem2(); }; export template <typename T> class ATempl { public: T getT(); // Add a new declaration without introducing a new type. T getT2(); }; //--- m-partB.cppm export module m:partB; export inline int getB() { return 430; } //--- m.cppm export module m; export import :partA; export import :partB; //--- useBOnly.cppm export module useBOnly; import m; export inline int get() { return getB(); } ``` In this example, module `m` exports two partitions `:partA` and `:partB`. And a consumer `useBOnly` only consumes the entities from `:partB`. So we don't want the BMI of `useBOnly` to change if only `:partA` changes. After this patch, we can achieve this if the change of `:partA` doesn't introduce new types. (And we can get rid of this restriction if we make no-transitive-type-change.) As the example shows, when we change the implementation of `:partA` from `m-partA.cppm` to `m-partA.v1.cppm`, we add a new function declaration `getA2()` at the global namespace, add a new member function `getMem2()` to class `A` and add a new member function `getT2()` to class template `ATempl`. And since `:partA` is not used by `useBOnly` at all, the BMI of `useBOnly` won't change after the above changes. #### Design details The method used in this patch is similar to https://github.com/llvm/llvm-project/pull/92083 and https://github.com/llvm/llvm-project/pull/86912. It extends the 32-bit IdentifierID to 64 bits and uses the higher 32 bits to store the module file index, so that the encoding of the identifier isn't affected by other modules. #### Overhead Similar to https://github.com/llvm/llvm-project/pull/92083 and https://github.com/llvm/llvm-project/pull/86912. The change is only expected to increase the size of the on-disk .pcm files and not affect compile-time performance. And in my experiment, the on-disk size increased by only a little over 1% and I observed no compile-time impact. #### Future Plans I'll try to do the same thing for type ids. IIRC, it won't change the dependency graph if we add a new type in an unused unit. I do think this is a significant win. And this will be a pretty good answer to "why modules are better than headers."	2024-06-20 13:30:05 +08:00
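
To put the `N * (size of file)` cost from f02b1cc99e in perspective, here is a worked example with purely illustrative numbers. These are assumptions, not measurements from the commit: a 64 KiB module map (S = 2^16 bytes), a chain of N = 1000 modules, and roughly one source location offset consumed per byte of the file. A 2 GiB source location space is 2^31 offsets.

$$
N \cdot S = 1000 \times 2^{16} = 65{,}536{,}000 \ \text{offsets} = 62.5\ \text{MiB},
\qquad
\frac{N \cdot S}{2^{31}} = \frac{1000}{2^{15}} \approx 3.05\%.
$$

Because the waste grows linearly in both N and S, longer chains or larger module maps quickly push it toward the up-to-20% figure reported in the commit.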

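Commit 9a361684c8 fixes non-determinism caused by serializing `FunctionToLambdasMap` in pointer order. The sketch below shows the general technique in simplified form: it turns a pointer-keyed map into a list keyed by stable IDs and sorts that list before writing, so the output no longer depends on memory layout. `Decl`, `LocalID`, `FunctionToLambdas`, and `stableOrder` are hypothetical stand-ins rather than clang's types. `std::unordered_map` also stands in for `llvm::DenseMap`.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a declaration. LocalID models a stable,
// per-module declaration ID (loosely analogous to clang's LocalDeclID).
struct Decl {
  std::uint32_t LocalID;
};

// Pointer-keyed map: iterating it yields an order that depends on the
// addresses handed out by the allocator, so it can differ between runs.
using FunctionToLambdas =
    std::unordered_map<const Decl *, std::vector<const Decl *>>;

using SerializedEntry = std::pair<std::uint32_t, std::vector<std::uint32_t>>;

// Convert pointers to stable IDs and sort by the function's ID, so the
// output order no longer depends on memory layout or hash-map iteration.
std::vector<SerializedEntry> stableOrder(const FunctionToLambdas &Map) {
  std::vector<SerializedEntry> Out;
  Out.reserve(Map.size());
  for (const auto &[Fn, Lambdas] : Map) {
    std::vector<std::uint32_t> LambdaIDs;
    for (const Decl *L : Lambdas)
      LambdaIDs.push_back(L->LocalID);  // preserves each vector's own order
    Out.emplace_back(Fn->LocalID, std::move(LambdaIDs));
  }
  std::sort(Out.begin(), Out.end(),
            [](const SerializedEntry &A, const SerializedEntry &B) {
              return A.first < B.first;
            });
  return Out;
}
```

Sorting is only one way to obtain a layout-independent order. The commit itself describes switching from pointers to `LocalDeclID`.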
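
Commit 0387a86f9a describes a dangling-view bug. An `ArrayRef<KnownHeader>` pointed into storage that lazy deserialization could reallocate, and the fix was to store a copy instead. The sketch below models that fix with `std::vector`. `KnownHeader`, `HeaderTable`, and `snapshotHeaders` are hypothetical simplifications, not clang's `ModuleMap` API. A `std::span` held into the table would carry the same risk as `llvm::ArrayRef`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for clang's KnownHeader.
struct KnownHeader {
  int ModuleID;
};

// Hypothetical header table: one vector of owning modules per file.
// Adding headers to a file's vector (as lazy deserialization might) can
// reallocate that vector's buffer and invalidate any views into it.
using HeaderTable = std::vector<std::vector<KnownHeader>>;

// Return an owning copy of one file's headers. Keeping a non-owning view
// (std::span, or llvm::ArrayRef in LLVM code) into Table[File] across later
// insertions is what can leave a dangling reference.
std::vector<KnownHeader> snapshotHeaders(const HeaderTable &Table,
                                         std::size_t File) {
  return Table.at(File);
}
```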

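Commit f3005d5b86 widens the input-file content hash from `size_t` to `uint64_t`. The sketch below is a simplified model of that kind of writer/reader mismatch, not the actual ASTWriter/ASTReader code path. If the writer stores a 64-bit hash in a 32-bit field, as `size_t` is on 32-bit hosts, the value the reader recomputes no longer matches. The 64-bit FNV-1a function is only a dependency-free stand-in for `xxh3_64bits`, and `contentHash64` and `roundTripMatches` are hypothetical names.

```cpp
#include <cstdint>
#include <string_view>

// Stand-in 64-bit content hash (FNV-1a). Clang itself uses xxh3_64bits;
// FNV-1a is used here only to keep the sketch dependency-free.
constexpr std::uint64_t contentHash64(std::string_view Data) {
  std::uint64_t H = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
  for (unsigned char C : Data) {
    H ^= C;
    H *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
  }
  return H;
}

// Simplified writer/reader round trip: the writer stores the hash in a field
// of type StoredHashT, the reader recomputes the full 64-bit hash and
// compares. With StoredHashT = std::uint32_t the upper 32 bits are dropped,
// so the comparison fails whenever any of those bits are set.
template <typename StoredHashT>
bool roundTripMatches(std::string_view Contents) {
  const StoredHashT Stored = static_cast<StoredHashT>(contentHash64(Contents));
  return static_cast<std::uint64_t>(Stored) == contentHash64(Contents);
}
```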
1414 Commits
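
Commits 2f2ea3557b and 03921b979d widen identifier and type IDs to 64 bits and keep the owning module file's index in the upper 32 bits. As a result, an ID's encoding is not affected by entries added to other module files. The sketch below encodes that layout. For type IDs it also reserves the low 3 bits for qualifiers, as 03921b979d describes. All helper names are hypothetical, and the exact bit layout inside clang may differ.

```cpp
#include <cstdint>

// Hypothetical 64-bit identifier ID: module file index in the high 32 bits,
// identifier index local to that module file in the low 32 bits.
constexpr std::uint64_t makeIdentifierID(std::uint32_t ModuleFileIndex,
                                         std::uint32_t LocalIndex) {
  return (static_cast<std::uint64_t>(ModuleFileIndex) << 32) | LocalIndex;
}

constexpr std::uint32_t moduleFileIndexOf(std::uint64_t ID) {
  return static_cast<std::uint32_t>(ID >> 32);
}

constexpr std::uint32_t localIndexOf(std::uint64_t ID) {
  return static_cast<std::uint32_t>(ID);  // keep only the low 32 bits
}

// Hypothetical 64-bit type ID: module file index in the high 32 bits, and
// (local type index << 3) | qualifier bits in the low 32 bits.
// Assumes LocalTypeIndex < 2^29 so the shifted index fits in 32 bits.
constexpr unsigned FastQualBits = 3;
constexpr std::uint32_t FastQualMask = (1u << FastQualBits) - 1;

constexpr std::uint64_t makeTypeID(std::uint32_t ModuleFileIndex,
                                   std::uint32_t LocalTypeIndex,
                                   std::uint32_t FastQuals) {
  const std::uint32_t Low =
      (LocalTypeIndex << FastQualBits) | (FastQuals & FastQualMask);
  return (static_cast<std::uint64_t>(ModuleFileIndex) << 32) | Low;
}

constexpr std::uint32_t localTypeIndexOf(std::uint64_t ID) {
  return static_cast<std::uint32_t>(ID) >> FastQualBits;
}

constexpr std::uint32_t fastQualsOf(std::uint64_t ID) {
  return static_cast<std::uint32_t>(ID) & FastQualMask;
}
```

Because each local index is relative to its own module file, adding entries to one module file does not shift the local indices recorded for entities from another.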