llvm-project

Author	SHA1	Message	Date
James Y Knight	c7f3437507	NFC: Clean up of IntrusiveRefCntPtr construction from raw pointers. (#151545 ) Handles clang::DiagnosticsEngine and clang::DiagnosticIDs. For DiagnosticIDs, this mostly migrates from `new DiagnosticIDs` to convenience method `DiagnosticIDs::create()`. Part of cleanup https://github.com/llvm/llvm-project/issues/151026	2025-07-31 15:07:35 -04:00
Kazu Hirata	096cad59df	[Serialization] Fix a warning This patch fixes: clang/unittests/Serialization/SourceLocationEncodingTest.cpp:39:34: error: unused variable 'Biggest' [-Werror,-Wunused-const-variable]	2025-06-25 07:53:24 -07:00
Haojian Wu	e90ab0e342	[Serialization] Remove delta encoding optimization (#145670 ) See the discussion in https://github.com/llvm/llvm-project/pull/145529. This will slightly increase the PCM size (~5%), some data (in-memory preamble size in clangd): - SemaExpr.cpp: 77MB -> 80MB - FindTarget.cpp: 71MB -> 75MB	2025-06-25 15:50:09 +02:00
Jan Svoboda	13e1a2cb22	Reapply "[clang] Remove intrusive reference count from `DiagnosticOptions` (#139584 )" This reverts commit e2a885537f11f8d9ced1c80c2c90069ab5adeb1d. Build failures were fixed right away and reverting the original commit without the fixes breaks the build again.	2025-05-22 12:52:03 -07:00
Kazu Hirata	e2a885537f	Revert "[clang] Remove intrusive reference count from `DiagnosticOptions` (#139584 )" This reverts commit 9e306ad4600c4d3392c194a8be88919ee758425c. Multiple builtbot failures have been reported: https://github.com/llvm/llvm-project/pull/139584	2025-05-22 12:44:20 -07:00
Jan Svoboda	9e306ad460	[clang] Remove intrusive reference count from `DiagnosticOptions` (#139584 ) The `DiagnosticOptions` class is currently intrusively reference-counted, which makes reasoning about its lifetime very difficult in some cases. For example, `CompilerInvocation` owns the `DiagnosticOptions` instance (wrapped in `llvm::IntrusiveRefCntPtr`) and only exposes an accessor returning `DiagnosticOptions &`. One would think this gives `CompilerInvocation` exclusive ownership of the object, but that's not the case: ```c++ void shareOwnership(CompilerInvocation &CI) { llvm::IntrusiveRefCntPtr<DiagnosticOptions> CoOwner = &CI.getDiagnosticOptions(); // ... } ``` This is a perfectly valid pattern that is being actually used in the codebase. I would like to ensure the ownership of `DiagnosticOptions` by `CompilerInvocation` is guaranteed to be exclusive. This can be leveraged for a copy-on-write optimization later on. This PR changes usages of `DiagnosticOptions` across `clang`, `clang-tools-extra` and `lldb` to not be intrusively reference-counted.	2025-05-22 12:33:52 -07:00
Jan Svoboda	b69dcb8734	[clang][frontend] Require invocation to construct `CompilerInstance` (#137668 ) This PR makes it so that `CompilerInvocation` needs to be provided to `CompilerInstance` on construction. There are a couple of benefits in my view: * Making it impossible to mis-use some `CompilerInstance` APIs. For example there are cases, where `createDiagnostics()` was called before `setInvocation()`, causing the `DiagnosticEngine` to use the default-constructed `DiagnosticOptions` instead of the intended ones. * This shrinks `CompilerInstance`'s state space. * This makes it possible to access the invocation in `CompilerInstance`'s constructor (to be used in a follow-up).	2025-05-01 07:31:30 -07:00
Reid Kleckner	e3c0565b74	Reapply "[cmake] Refactor clang unittest cmake" (#134195 ) This reapplies 5ffd9bdb50b57 (#133545) with fixes. The BUILD_SHARED_LIBS=ON build was fixed by adding missing LLVM dependencies to the InterpTests binary in unittests/AST/ByteCode/CMakeLists.txt .	2025-04-02 21:07:30 -07:00
dpalermo	03a791f703	Revert "[cmake] Refactor clang unittest cmake" (#134022 ) Reverts llvm/llvm-project#133545 This change is breaking several buildbots as well as developer's builds. Reverting to allow people to make progress.	2025-04-01 22:19:27 -05:00
Reid Kleckner	5ffd9bdb50	[cmake] Refactor clang unittest cmake (#133545 ) Pass all the dependencies into add_clang_unittest. This is consistent with how it is done for LLDB. I borrowed the same named argument list structure from add_lldb_unittest. This is a necessary step towards consolidating unit tests into fewer binaries, but seems like a good refactoring in its own right.	2025-04-01 14:12:44 -07:00
Ilya Biryukov	7f43120152	[Serialization] Free memory in LoadSpecLazilyTest Default Clang invocations set DisableFree = true, which causes ASAN to complain. Override it in tests that are not supposed to leak.	2024-12-12 11:19:11 +01:00
Chuanqi Xu	20e9049509	[Serialization] Support loading template specializations lazily (#119333 ) Reland https://github.com/llvm/llvm-project/pull/83237 --- (Original comments) Currently all the specializations of a template (including instantiation, specialization and partial specializations) will be loaded at once if we want to instantiate another instance for the template, or find instantiation for the template, or just want to complete the redecl chain. This means basically we need to load every specializations for the template once the template declaration got loaded. This is bad since when we load a specialization, we need to load all of its template arguments. Then we have to deserialize a lot of unnecessary declarations. For example, ``` // M.cppm export module M; export template <class T> class A {}; export class ShouldNotBeLoaded {}; export class Temp { A<ShouldNotBeLoaded> AS; }; // use.cpp import M; A<int> a; ``` We have a specialization ` A<ShouldNotBeLoaded>` in `M.cppm` and we instantiate the template `A` in `use.cpp`. Then we will deserialize `ShouldNotBeLoaded` surprisingly when compiling `use.cpp`. And this patch tries to avoid that. Given that the templates are heavily used in C++, this is a pain point for the performance. This patch adds MultiOnDiskHashTable for specializations in the ASTReader. Then we will only deserialize the specializations with the same template arguments. We made that by using ODRHash for the template arguments as the key of the hash table. To review this patch, I think `ASTReaderDecl::AddLazySpecializations` may be a good entry point.	2024-12-11 09:40:47 +08:00
Haowei Wu	12bdeba76e	Revert "[Serialization] Support load lazy specialization lazily" This reverts commit b5bd19211118c6d43bc525a4e3fb65d2c750d61e. It brokes multiple llvm bots including clang-x64-windows-msvc	2024-12-06 10:33:57 -08:00
Chuanqi Xu	b5bd192111	[Serialization] Support load lazy specialization lazily Currently all the specializations of a template (including instantiation, specialization and partial specializations) will be loaded at once if we want to instantiate another instance for the template, or find instantiation for the template, or just want to complete the redecl chain. This means basically we need to load every specializations for the template once the template declaration got loaded. This is bad since when we load a specialization, we need to load all of its template arguments. Then we have to deserialize a lot of unnecessary declarations. For example, ``` // M.cppm export module M; export template <class T> class A {}; export class ShouldNotBeLoaded {}; export class Temp { A<ShouldNotBeLoaded> AS; }; // use.cpp import M; A<int> a; ``` We should a specialization ` A<ShouldNotBeLoaded>` in `M.cppm` and we instantiate the template `A` in `use.cpp`. Then we will deserialize `ShouldNotBeLoaded` surprisingly when compiling `use.cpp`. And this patch tries to avoid that. Given that the templates are heavily used in C++, this is a pain point for the performance. This patch adds MultiOnDiskHashTable for specializations in the ASTReader. Then we will only deserialize the specializations with the same template arguments. We made that by using ODRHash for the template arguments as the key of the hash table. To review this patch, I think `ASTReaderDecl::AddLazySpecializations` may be a good entry point. The patch was reviewed in https://github.com/llvm/llvm-project/pull/83237 but that PR is a stacked PR. But I feel the intention of the stacked PRs get lost during the review process. So I feel it is better to merge the commits into a single commit instead of merging them in the PR page. It is better for us to cherry-pick and revert.	2024-12-06 10:52:35 +08:00
Kadir Cetinkaya	df9a14d7bb	Reapply "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852 )" This reverts commit a1153cd6fedd4c906a9840987934ca4712e34cb2 with fixes to lldb breakages. Fixes https://github.com/llvm/llvm-project/issues/117145.	2024-11-21 14:55:30 +01:00
Sylvestre Ledru	a1153cd6fe	Revert "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852 )" Reverted for causing: https://github.com/llvm/llvm-project/issues/117145 This reverts commit bdd10d9d249bd1c2a45e3de56a5accd97e953458.	2024-11-21 13:04:30 +01:00
kadir çetinkaya	bdd10d9d24	[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852 ) Starting with 41e3919ded78d8870f7c95e9181c7f7e29aa3cc4 DiagnosticsEngine creation might perform IO. It was implicitly defaulting to getRealFileSystem. This patch makes it explicit by pushing the decision making to callers. It uses ambient VFS if one is available, and keeps using `getRealFileSystem` if there aren't any VFS.	2024-11-21 12:11:41 +01:00
Chuanqi Xu	947b062823	Reland "[Modules] No transitive source location change (#86912 )" This relands 6c31104. The patch was reverted due to incorrectly introduced alignment. And the patch was re-commited after fixing the alignment issue. Following off are the original message: This is part of "no transitive change" patch series, "no transitive source location change". I talked this with @Bigcheese in the tokyo's WG21 meeting. The idea comes from @jyknight posted on LLVM discourse. That for: ``` // A.cppm export module A; ... // B.cppm export module B; import A; ... //--- C.cppm export module C; import C; ``` Almost every time A.cppm changes, we need to recompile `B`. Due to we think the source location is significant to the semantics. But it may be good if we can avoid recompiling `C` if the change from `A` wouldn't change the BMI of B. This patch only cares source locations. So let's focus on source location's example. We can see the full example from the attached test. ``` //--- A.cppm export module A; export template <class T> struct C { T func() { return T(43); } }; export int funcA() { return 43; } //--- A.v1.cppm export module A; export template <class T> struct C { T func() { return T(43); } }; export int funcA() { return 43; } //--- B.cppm export module B; import A; export int funcB() { return funcA(); } //--- C.cppm export module C; import A; export void testD() { C<int> c; c.func(); } ``` Here the only difference between `A.cppm` and `A.v1.cppm` is that `A.v1.cppm` has an additional blank line. Then the test shows that two BMI of `B.cppm`, one specified `-fmodule-file=A=A.pcm` and the other specified `-fmodule-file=A=A.v1.pcm`, should have the bit-wise same contents. However, it is a different story for C, since C instantiates templates from A, and the instantiation records the source information from module A, which is different from `A` and `A.v1`, so it is expected that the BMI `C.pcm` and `C.v1.pcm` can and should differ. To fully understand the patch, we need to understand how we encodes source locations and how we serialize and deserialize them. For source locations, we encoded them as: ``` \| \| \| _____ base offset of an imported module \| \| \| \|_____ base offset of another imported module \| \| \| \| \| ___ 0 ``` As the diagram shows, we encode the local (unloaded) source location from 0 to higher bits. And we allocate the space for source locations from the loaded modules from high bits to 0. Then the source locations from the loaded modules will be mapped to our source location space according to the allocated offset. For example, for, ``` // a.cppm export module a; ... // b.cppm export module b; import a; ... ``` Assuming the offset of a source location (let's name the location as `S`) in a.cppm is 45 and we will record the value `45` into the BMI `a.pcm`. Then in b.cppm, when we import a, the source manager will allocate a space for module 'a' (according to the recorded number of source locations) as the base offset of module 'a' in the current source location spaces. Let's assume the allocated base offset as 90 in this example. Then when we want to get the location in the current source location space for `S`, we can get it simply by adding `45` to `90` to `135`. Finally we can get the source location for `S` in module B as `135`. And when we want to write module `b`, we would also write the source location of `S` as `135` directly in the BMI. And to clarify the location `S` comes from module `a`, we also need to record the base offset of module `a`, 90 in the BMI of `b`. Then the problem comes. Since the base offset of module 'a' is computed by the number source locations in module 'a'. In module 'b', the recorded base offset of module 'a' will change every time the number of source locations in module 'a' increase or decrease. In other words, the contents of BMI of B will change every time the number of locations in module 'a' changes. This is pretty sensitive. Almost every change will change the number of locations. So this is the problem this patch want to solve. Let's continue with the existing design to understand what's going on. Another interesting case is: ``` // c.cppm export module c; import whatever; import a; import b; ... ``` In `c.cppm`, when we import `a`, we still need to allocate a base location offset for it, let's say the value becomes to `200` somehow. Then when we reach the location `S` recorded in module `b`, we need to translate it into the current source location space. The solution is quite simple, we can get it by `135 + (200 - 90) = 245`. In another word, the offset of a source location in current module can be computed as `Recorded Offset + Base Offset of the its module file - Recorded Base Offset`. Then we're almost done about how we handle the offset of source locations in serializers. From the abstract level, what we want to do is to remove the hardcoded base offset of imported modules and remain the ability to calculate the source location in a new module unit. To achieve this, we need to be able to find the module file owning a source location from the encoding of the source location. So in this patch, for each source location, we will store the local offset of the location and the module file index. For the above example, in `b.pcm`, the source location of `S` will be recorded as `135` directly. And in the new design, the source location of `S` will be recorded as `<1, 45>`. Here `1` stands for the module file index of `a` in module `b`. And `45` means the offset of `S` to the base offset of module `a`. So the trade-off here is that, to make the BMI more independent, we need to record more abstract information. And I feel it is worthy. The recompilation problem of modules is really annoying and there are still people complaining this. But if we can make this (including stopping other changes transitively), I think this may be a killer feature for modules. And from @Bigcheese , this should be helpful for clang explicit modules too. And the benchmarking side, I tested this patch against https://github.com/alibaba/async_simple/tree/CXX20Modules. No significant change on compilation time. The size of .pcm files becomes to 204M from 200M. I think the trade-off is pretty fair. I didn't use another slot to record the module file index. I tried to use the higher 32 bits of the existing source location encodings to store that information. This design may be safe. Since we use `unsigned` to store source locations but we use uint64_t in serialization. And generally `unsigned` is 32 bit width in most platforms. So it might not be a safe problem. Since all the bits we used to store the module file index is not used before. So the new encodings may be: ``` \|-----------------------\|-----------------------\| \| A \| B \| C \| * A: 32 bit. The index of the module file in the module manager + 1. * The +1 here is necessary since we wish 0 stands for the current module file. * B: 31 bit. The offset of the source location to the module file * containing it. * C: The macro bit. We rotate it to the lowest bit so that we can save * some space in case the index of the module file is 0. ``` (The B and C is the existing raw encoding for source locations) Another reason to reuse the same slot of the source location is to reduce the impact of the patch. Since there are a lot of places assuming we can store and get a source location from a slot. And if I tried to add another slot, a lot of codes breaks. I don't feel it is worhty. Another impact of this decision is that, the existing small optimizations for encoding source location may be invalided. The key of the optimization is that we can turn large values into small values then we can use VBR6 format to reduce the size. But if we decided to put the module file index into the higher bits, then maybe it simply doesn't work. An example may be the `SourceLocationSequence` optimization. This will only affect the size of on-disk .pcm files. I don't expect this impact the speed and memory use of compilations. And seeing my small experiments above, I feel this trade off is worthy. The mental model for handling source location offsets is not so complex and I believe we can solve it by adding module file index to each stored source location. For the practical side, since the source location is pretty sensitive, and the patch can pass all the in-tree tests and a small scale projects, I feel it should be correct. I'll continue to work on no transitive decl change and no transitive identifier change (if matters) to achieve the goal to stop the propagation of unnecessary changes. But all of this depends on this patch. Since, clearly, the source locations are the most sensitive thing. --- The release nots and documentation will be added seperately.	2024-05-06 13:35:16 +08:00
Chuanqi Xu	d333a0de68	Revert "[Modules] No transitive source location change (#86912 )" This reverts commit 6c3110464bac3600685af9650269b0b2b8669d34. Required by the post commit comments: https://github.com/llvm/llvm-project/pull/86912	2024-04-30 22:32:02 +08:00
Chuanqi Xu	6c3110464b	[Modules] No transitive source location change (#86912 ) This is part of "no transitive change" patch series, "no transitive source location change". I talked this with @Bigcheese in the tokyo's WG21 meeting. The idea comes from @jyknight posted on LLVM discourse. That for: ``` // A.cppm export module A; ... // B.cppm export module B; import A; ... //--- C.cppm export module C; import C; ``` Almost every time A.cppm changes, we need to recompile `B`. Due to we think the source location is significant to the semantics. But it may be good if we can avoid recompiling `C` if the change from `A` wouldn't change the BMI of B. # Motivation Example This patch only cares source locations. So let's focus on source location's example. We can see the full example from the attached test. ``` //--- A.cppm export module A; export template <class T> struct C { T func() { return T(43); } }; export int funcA() { return 43; } //--- A.v1.cppm export module A; export template <class T> struct C { T func() { return T(43); } }; export int funcA() { return 43; } //--- B.cppm export module B; import A; export int funcB() { return funcA(); } //--- C.cppm export module C; import A; export void testD() { C<int> c; c.func(); } ``` Here the only difference between `A.cppm` and `A.v1.cppm` is that `A.v1.cppm` has an additional blank line. Then the test shows that two BMI of `B.cppm`, one specified `-fmodule-file=A=A.pcm` and the other specified `-fmodule-file=A=A.v1.pcm`, should have the bit-wise same contents. However, it is a different story for C, since C instantiates templates from A, and the instantiation records the source information from module A, which is different from `A` and `A.v1`, so it is expected that the BMI `C.pcm` and `C.v1.pcm` can and should differ. # Internal perspective of status quo To fully understand the patch, we need to understand how we encodes source locations and how we serialize and deserialize them. For source locations, we encoded them as: ``` \| \| \| _____ base offset of an imported module \| \| \| \|_____ base offset of another imported module \| \| \| \| \| ___ 0 ``` As the diagram shows, we encode the local (unloaded) source location from 0 to higher bits. And we allocate the space for source locations from the loaded modules from high bits to 0. Then the source locations from the loaded modules will be mapped to our source location space according to the allocated offset. For example, for, ``` // a.cppm export module a; ... // b.cppm export module b; import a; ... ``` Assuming the offset of a source location (let's name the location as `S`) in a.cppm is 45 and we will record the value `45` into the BMI `a.pcm`. Then in b.cppm, when we import a, the source manager will allocate a space for module 'a' (according to the recorded number of source locations) as the base offset of module 'a' in the current source location spaces. Let's assume the allocated base offset as 90 in this example. Then when we want to get the location in the current source location space for `S`, we can get it simply by adding `45` to `90` to `135`. Finally we can get the source location for `S` in module B as `135`. And when we want to write module `b`, we would also write the source location of `S` as `135` directly in the BMI. And to clarify the location `S` comes from module `a`, we also need to record the base offset of module `a`, 90 in the BMI of `b`. Then the problem comes. Since the base offset of module 'a' is computed by the number source locations in module 'a'. In module 'b', the recorded base offset of module 'a' will change every time the number of source locations in module 'a' increase or decrease. In other words, the contents of BMI of B will change every time the number of locations in module 'a' changes. This is pretty sensitive. Almost every change will change the number of locations. So this is the problem this patch want to solve. Let's continue with the existing design to understand what's going on. Another interesting case is: ``` // c.cppm export module c; import whatever; import a; import b; ... ``` In `c.cppm`, when we import `a`, we still need to allocate a base location offset for it, let's say the value becomes to `200` somehow. Then when we reach the location `S` recorded in module `b`, we need to translate it into the current source location space. The solution is quite simple, we can get it by `135 + (200 - 90) = 245`. In another word, the offset of a source location in current module can be computed as `Recorded Offset + Base Offset of the its module file - Recorded Base Offset`. Then we're almost done about how we handle the offset of source locations in serializers. # The high level design of current patch From the abstract level, what we want to do is to remove the hardcoded base offset of imported modules and remain the ability to calculate the source location in a new module unit. To achieve this, we need to be able to find the module file owning a source location from the encoding of the source location. So in this patch, for each source location, we will store the local offset of the location and the module file index. For the above example, in `b.pcm`, the source location of `S` will be recorded as `135` directly. And in the new design, the source location of `S` will be recorded as `<1, 45>`. Here `1` stands for the module file index of `a` in module `b`. And `45` means the offset of `S` to the base offset of module `a`. So the trade-off here is that, to make the BMI more independent, we need to record more abstract information. And I feel it is worthy. The recompilation problem of modules is really annoying and there are still people complaining this. But if we can make this (including stopping other changes transitively), I think this may be a killer feature for modules. And from @Bigcheese , this should be helpful for clang explicit modules too. And the benchmarking side, I tested this patch against https://github.com/alibaba/async_simple/tree/CXX20Modules. No significant change on compilation time. The size of .pcm files becomes to 204M from 200M. I think the trade-off is pretty fair. # Some low level details I didn't use another slot to record the module file index. I tried to use the higher 32 bits of the existing source location encodings to store that information. This design may be safe. Since we use `unsigned` to store source locations but we use uint64_t in serialization. And generally `unsigned` is 32 bit width in most platforms. So it might not be a safe problem. Since all the bits we used to store the module file index is not used before. So the new encodings may be: ``` \|-----------------------\|-----------------------\| \| A \| B \| C \| * A: 32 bit. The index of the module file in the module manager + 1. The +1 here is necessary since we wish 0 stands for the current module file. * B: 31 bit. The offset of the source location to the module file containing it. * C: The macro bit. We rotate it to the lowest bit so that we can save some space in case the index of the module file is 0. ``` (The B and C is the existing raw encoding for source locations) Another reason to reuse the same slot of the source location is to reduce the impact of the patch. Since there are a lot of places assuming we can store and get a source location from a slot. And if I tried to add another slot, a lot of codes breaks. I don't feel it is worhty. Another impact of this decision is that, the existing small optimizations for encoding source location may be invalided. The key of the optimization is that we can turn large values into small values then we can use VBR6 format to reduce the size. But if we decided to put the module file index into the higher bits, then maybe it simply doesn't work. An example may be the `SourceLocationSequence` optimization. This will only affect the size of on-disk .pcm files. I don't expect this impact the speed and memory use of compilations. And seeing my small experiments above, I feel this trade off is worthy. # Correctness The mental model for handling source location offsets is not so complex and I believe we can solve it by adding module file index to each stored source location. For the practical side, since the source location is pretty sensitive, and the patch can pass all the in-tree tests and a small scale projects, I feel it should be correct. # Future Plans I'll continue to work on no transitive decl change and no transitive identifier change (if matters) to achieve the goal to stop the propagation of unnecessary changes. But all of this depends on this patch. Since, clearly, the source locations are the most sensitive thing. --- The release nots and documentation will be added seperately.	2024-04-30 15:57:58 +08:00
Chuanqi Xu	da00c60dae	[C++20] [Modules] Introduce reduced BMI (#75894 ) Close https://github.com/llvm/llvm-project/issues/71034 See https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755 This patch introduces reduced BMI, which doesn't contain the definitions of functions and variables if its definitions won't contribute to the ABI. Testing is a big part of the patch. We want to make sure the reduced BMI contains the same behavior with the existing and relatively stable fatBMI. This is pretty helpful for further reduction. The user interfaces part it left to following patches to ease the reviewing.	2024-03-08 10:12:51 +08:00
Krasimir Georgiev	b2ebd8b897	clang serialization unittests: fix some leaks (#82773 ) No functional changes intended. Fixes some leaks found by running under asan with `--gtest_repeat=2`.	2024-02-26 16:58:39 +01:00
Chuanqi Xu	49775b1dc0	[Serialization] Record whether the ODR is skipped (#82302 ) Close https://github.com/llvm/llvm-project/issues/80570. In `a0b6747804`, we skipped ODR checks for decls in GMF. Then it should be natural to skip storing the ODR values in BMI. Generally it should be fine as long as the writer and the reader keep consistent. However, the use of preamble in clangd shows the tricky part. For, ``` // test.cpp module; // any one off these is enough to crash clangd // #include <iostream> // #include <string_view> // #include <cmath> // #include <system_error> // #include <new> // #include <bit> // probably many more // only ok with libc++, not the system provided libstdc++ 13.2.1 // these are ok export module test; ``` clangd will store the headers as preamble to speedup the parsing and the preamble reuses the serialization techniques. (Generally we'd call the preamble as PCH. However it is not true strictly. I've tested the PCH wouldn't be problematic.) However, the tricky part is that the preamble is not modules. It literally serialiaze and deserialize things. So before clangd parsing the above test module, clangd will serialize the headers into the preamble. Note that there is no concept like GMF now. So the ODR bits are stored. However, when clangd parse the file actually, the decls from preamble are thought as in GMF literally, then hte ODR bits are skipped. Then mismatch happens. To solve the problem, this patch adds another bit for decls to record whether or not the ODR bits are skipped.	2024-02-20 13:31:28 +08:00
Chuanqi Xu	d76b56fd28	[NFC] Eliminate warnings in SourceLocationEncodingTest.cpp There are 2 dangling else warnings and a warning pararences around bit operations. This patch tries to eliminate that.	2023-11-02 14:01:17 +08:00
Benjamin Kramer	263fc4c79a	Turn off memory leaks in unit test	2023-09-14 13:16:22 +02:00
Chuanqi Xu	45c160510f	[C++20] [Modules] [Serialization] Check input file contents with option ForceCheckCXX20ModulesInputFiles enabled With overriden input files, e,g,. the compiler get the file from an in-memroy buffer, the compiler can't get correct modified time information to indicate whehter the input files are changed or not. Then, the semantics of ForceCheckCXX20ModulesInputFiles are broken. In this patch, if both ForceCheckCXX20ModulesInputFiles and ValidateASTInputFilesContent and enabled, the compiler will still check the hash value of the contents even if their modification time is the same.	2023-09-14 11:27:50 +08:00
Mikhail Goncharov	4faa6c6699	[clang][test] Use a physical copy of FS (missing piece from https://reviews.llvm.org/D152265) 476e7c49ecb762df1d68273696b06c36feb0fd96	2023-06-06 17:36:35 +02:00
Kadir Cetinkaya	476e7c49ec	[clang][test] Use a physical copy of FS Make use of a physical copy, rather than real FS in unittests that change working-directory to get rid of the side effect of changing cwd for the whole process. It's triggering crashes depending on the test order. Differential Revision: https://reviews.llvm.org/D152265	2023-06-06 15:45:55 +02:00
Chuanqi Xu	c336c983bc	[C++20] [Modules] [Serialization] Don't write comments to BMI for C++20 Named Modules This patch forbids to write comment to BMIs for C++20 Named Modules. Originally I thought this was helpful for language services like clangd. But I found clangd don't want the BMI to contain comments actually. So it is meaningless for C++20 Named Modules to keep such comments in their BMI. It is simple to enable this when someday we found we want this actually.	2023-06-06 13:05:17 +08:00
Michael Liao	2daf91dae3	Fix shared library build again from 1c9a800. NFC	2023-05-24 14:09:38 -04:00
Amy Kwan	61262f9ef4	Fix shared library build from 1c9a800. Fix the shared library build failure on clang-ppc64le-rhel from 1c9a800 as seen in: https://lab.llvm.org/buildbot/#/builders/57/builds/27080/steps/6/logs/stdio	2023-05-24 12:29:23 -05:00
Chuanqi Xu	1c9a8004ed	Recommit [C++20] [Modules] Serialize the evaluated constant values for VarDecl Close https://github.com/llvm/llvm-project/issues/62796. Previously, we didn't serialize the evaluated result for VarDecl. This caused the compilation of template metaprogramming become slower than expect. This patch fixes the issue. This is a recommit tested with asan built clang.	2023-05-24 15:45:16 +08:00
Chuanqi Xu	651b40e8ff	Revert "[C++20] [Modules] Serialize the evaluated constant values for VarDecl" This reverts commit c0d6f85e3ae8bcfdb7217d165314f01c1a4af9ae. The asan bot detected a memory leak after this patch. Revert it for now.	2023-05-24 13:56:09 +08:00
Chuanqi Xu	597dd1f91d	[NFC] Fix the warning for dangling pointer for c0d6f85e3ae8bc The bot notes a warning-converted-error for the dangling pointer. And the patch fixes that.	2023-05-24 11:42:55 +08:00
Chuanqi Xu	c0d6f85e3a	[C++20] [Modules] Serialize the evaluated constant values for VarDecl Close https://github.com/llvm/llvm-project/issues/62796. Previously, we didn't serialize the evaluated result for VarDecl. This caused the compilation of template metaprogramming become slower than expect. This patch fixes the issue.	2023-05-24 10:17:33 +08:00
Kazu Hirata	6ad0788c33	[clang] Use std::optional instead of llvm::Optional (NFC) This patch replaces (llvm::\|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 12:31:01 -08:00
Kazu Hirata	a1580d7b59	[clang] Add #include <optional> (NFC) This patch adds #include <optional> to those files containing llvm::Optional<...> or Optional<...>. I'll post a separate patch to actually replace llvm::Optional with std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 11:07:21 -08:00
Kazu Hirata	a41fbb1fc2	[clang/unittests] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-03 12:14:19 -08:00
Sam McCall	481691572d	[Serialization] Add missing includes for CHAR_BIT	2022-05-19 10:04:25 +02:00
Sam McCall	4df795bff7	[Serialization] Delta-encode consecutive SourceLocations in TypeLoc Much of the size of PCH/PCM files comes from stored SourceLocations. These are encoded using (almost) their raw value, VBR-encoded. Absolute SourceLocations can be relatively large numbers, so this commonly takes 20-30 bits per location. We can reduce this by exploiting redundancy: many "nearby" SourceLocations are stored differing only slightly and can be delta-encoded. Randam-access loading of AST nodes constrains how long these sequences can be, but we can do it at least within a node that always gets deserialized as an atomic unit. TypeLoc is implemented in this patch as it's a relatively small change that shows most of the API. This saves ~3.5% of PCH size, I have local changes applying this technique further that save another 3%, I think it's possible to get to 10% total. Differential Revision: https://reviews.llvm.org/D125403	2022-05-19 09:40:44 +02:00
Sam McCall	499d0b96cb	[clang] createInvocationFromCommandLine -> createInvocation, delete former. NFC (Followup from 40c13720a4b977d4347bbde53c52a4d0703823c2) Differential Revision: https://reviews.llvm.org/D125012	2022-05-06 16:21:48 +02:00
Ben Barham	766a08df12	[Frontend] Only compile modules if not already finalized It was possible to re-add a module to a shared in-memory module cache when search paths are changed. This can eventually cause a crash if the original module is referenced after this occurs. 1. Module A depends on B 2. B exists in two paths C and D 3. First run only has C on the search path, finds A and B and loads them 4. Second run adds D to the front of the search path. A is loaded and contains a reference to the already compiled module from C. But searching finds the module from D instead, causing a mismatch 5. B and the modules that depend on it are considered out of date and thus rebuilt 6. The recompiled module A is added to the in-memory cache, freeing the previously inserted one This can never occur from a regular clang process, but is very easy to do through the API - whether through the use of a shared case or just running multiple compilations from a single `CompilerInstance`. Update the compilation to return early if a module is already finalized so that the pre-condition in the in-memory module cache holds. Resolves rdar://78180255 Differential Revision: https://reviews.llvm.org/D105328	2021-07-15 18:27:08 -07:00
Rumeet Dhindsa	57a2eaf3c1	Revert "[modules] Do not cache invalid state for modules that we attempted to load." As per comment on https://reviews.llvm.org/D72860, it is suggested to revert this change in the meantime, since it has introduced regression. This reverts commit 83f4c3af021cd5322ea10fd1c4e839874c1dae49.	2020-03-10 10:59:26 -07:00
Volodymyr Sapsai	83f4c3af02	[modules] Do not cache invalid state for modules that we attempted to load. Partially reverts 0a2be46cfdb698fefcc860a56b47dde0884d5335 as it turned out to cause redundant module rebuilds in multi-process incremental builds. When a module was getting out of date, all compilation processes started at the same time were marking it as `ToBuild`. So each process was building the same module instead of checking if it was built by someone else and using that result. In addition to the work duplication, contention on the same .pcm file wasn't making builds faster. Note that for a single-process build this change would cause redundant module reads and validations. But reading a module is faster than building it and multi-process builds are more common than single-process. So I'm willing to make such a trade-off. rdar://problem/54395127 Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D72860	2020-01-16 17:12:41 -08:00
Tom Stellard	2e97d2aa1b	cmake: Add CLANG_LINK_CLANG_DYLIB option Summary: Setting CLANG_LINK_CLANG_DYLIB=ON causes clang tools to link against libclang_shared.so instead of the individual component libraries. Reviewers: mgorny, beanz, smeenai, phosek, sylvestre.ledru Subscribers: arphaman, cfe-commits, llvm-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D63503 llvm-svn: 365092	2019-07-03 22:45:55 +00:00
Francis Visoiu Mistrih	e0308279cb	[Bitcode] Move Bitstream to a separate library This moves Bitcode/Bitstream*, Bitcode/BitCodes.h to Bitstream/. This is needed to avoid a circular dependency when using the bitstream code for parsing optimization remarks. Since Bitcode uses Core for the IR part: libLLVMRemarks -> Bitcode -> Core and Core uses libLLVMRemarks to generate remarks (see IR/RemarkStreamer.cpp): Core -> libLLVMRemarks we need to separate the Bitstream and Bitcode part. For clang-doc, it seems that it doesn't need the whole bitcode layer, so I updated the CMake to only use the bitstream part. Differential Revision: https://reviews.llvm.org/D63899 llvm-svn: 365091	2019-07-03 22:40:07 +00:00
Duncan P. N. Exon Smith	b7db2e9f82	Stop relying on allocator behaviour in modules unit test Another fixup for r355778 for Windows bots, this time to stop accidentally relying on allocator behaviour for the test to pass. llvm-svn: 355780	2019-03-09 20:15:01 +00:00
Duncan P. N. Exon Smith	0a2be46cfd	Modules: Invalidate out-of-date PCMs as they're discovered Leverage the InMemoryModuleCache to invalidate a module the first time it fails to import (and to lock a module as soon as it's built or imported successfully). For implicit module builds, this optimizes importing deep graphs where the leaf module is out-of-date; see example near the end of the commit message. Previously the cache finalized ("locked in") all modules imported so far when starting a new module build. This was sufficient to prevent loading two versions of the same module, but was somewhat arbitrary and hard to reason about. Now the cache explicitly tracks module state, where each module must be one of: - Unknown: module not in the cache (yet). - Tentative: module in the cache, but not yet fully imported. - ToBuild: module found on disk could not be imported; need to build. - Final: module in the cache has been successfully built or imported. Preventing repeated failed imports avoids variation in builds based on shifting filesystem state. Now it's guaranteed that a module is loaded from disk exactly once. It now seems safe to remove FileManager::invalidateCache, but I'm leaving that for a later commit. The new, precise logic uncovered a pre-existing problem in the cache: the map key is the module filename, and different contexts use different filenames for the same PCM file. (In particular, the test Modules/relative-import-path.c does not build without this commit. r223577 started using a relative path to describe a module's base directory when importing it within another module. As a result, the module cache sees an absolute path when (a) building the module or importing it at the top-level, and a relative path when (b) importing the module underneath another one.) The "obvious" fix is to resolve paths using FileManager::getVirtualFile and change the map key for the cache to a FileEntry, but some contexts (particularly related to ASTUnit) have a shorter lifetime for their FileManager than the InMemoryModuleCache. This is worth pursuing further in a later commit; perhaps by tying together the FileManager and InMemoryModuleCache lifetime, or moving the in-memory PCM storage into a VFS layer. For now, use the PCM's base directory as-written for constructing the filename to check the ModuleCache. Example ======= To understand the build optimization, first consider the build of a module graph TU -> A -> B -> C -> D with an empty cache: TU builds A' A' builds B' B' builds C' C' builds D' imports D' B' imports C' imports D' A' imports B' imports C' imports D' TU imports A' imports B' imports C' imports D' If we build TU again, where A, B, C, and D are in the cache and D is out-of-date, we would previously get this build: TU imports A imports B imports C imports D (out-of-date) TU builds A' A' imports B imports C imports D (out-of-date) builds B' B' imports C imports D (out-of-date) builds C' C' imports D (out-of-date) builds D' imports D' B' imports C' imports D' A' imports B' imports C' imports D' TU imports A' imports B' imports C' imports D' After this commit, we'll immediateley invalidate A, B, C, and D when we first observe that D is out-of-date, giving this build: TU imports A imports B imports C imports D (out-of-date) TU builds A' // The same graph as an empty cache. A' builds B' B' builds C' C' builds D' imports D' B' imports C' imports D' A' imports B' imports C' imports D' TU imports A' imports B' imports C' imports D' The new build matches what we'd naively expect, pretty closely matching the original build with the empty cache. rdar://problem/48545366 llvm-svn: 355778	2019-03-09 17:44:01 +00:00
Duncan P. N. Exon Smith	8bef5cd49a	Modules: Rename MemoryBufferCache to InMemoryModuleCache Change MemoryBufferCache to InMemoryModuleCache, moving it from Basic to Serialization. Another patch will start using it to manage module build more explicitly, but this is split out because it's mostly mechanical. Because of the move to Serialization we can no longer abuse the Preprocessor to forward it to the ASTReader. Besides the rename and file move, that means Preprocessor::Preprocessor has one fewer parameter and ASTReader::ASTReader has one more. llvm-svn: 355777	2019-03-09 17:33:56 +00:00

49 Commits