These three clauses are all quite trivial, as they take no parameters.
They are mutually exclusive, and 'seq' has some other exclusives that
are implemented here.
The ONE thing that isn't implemented is 2.9's restriction (line 2010):
'A loop associated with a 'loop' construct that does not have a 'seq'
clause must be written to meet all the following conditions'.
Future clauses will require similar work, so it'll be done as a
followup.
Following of https://github.com/llvm/llvm-project/pull/86912
The motivation of the patch series is that, for a module interface unit
`X`, when the dependent modules of `X` changes, if the changes is not
relevant with `X`, we hope the BMI of `X` won't change. For the specific
patch, we hope if the changes was about irrelevant declaration changes,
we hope the BMI of `X` won't change. **However**, I found the patch
itself is not very useful in practice, since the adding or removing
declarations, will change the state of identifiers and types in most
cases.
That said, for the most simple example,
```
// partA.cppm
export module m:partA;
// partA.v1.cppm
export module m:partA;
export void a() {}
// partB.cppm
export module m:partB;
export void b() {}
// m.cppm
export module m;
export import :partA;
export import :partB;
// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
b();
}
```
the BMI of `onlyUseB` will change after we change the implementation of
`partA.cppm` to `partA.v1.cppm`. Since `partA.v1.cppm` introduces new
identifiers and types (the function prototype).
So in this patch, we have to write the tests as:
```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }
// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }
// partB.cppm
export module m:partB;
export void b() {}
// m.cppm
export module m;
export import :partA;
export import :partB;
// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
b();
}
```
so that the new introduced declaration `int getA(int)` doesn't introduce
new identifiers and types, then the BMI of `onlyUseB` can keep
unchanged.
While it looks not so great, the patch should be the base of the patch
to erase the transitive change for identifiers and types since I don't
know how can we introduce new types and identifiers without introducing
new declarations. Given how tightly the relationship between
declarations, types and identifiers, I think we can only reach the ideal
state after we made the series for all of the three entties.
The design of the patch is similar to
https://github.com/llvm/llvm-project/pull/86912, which extends the
32-bit DeclID to 64-bit and use the higher bits to store the module file
index and the lower bits to store the Local Decl ID.
A slight difference is that we only use 48 bits to store the new DeclID
since we try to use the higher 16 bits to store the module ID in the
prefix of Decl class. Previously, we use 32 bits to store the module ID
and 32 bits to store the DeclID. I don't want to allocate additional
space so I tried to make the additional space the same as 64 bits. An
potential interesting thing here is about the relationship between the
module ID and the module file index. I feel we can get the module file
index by the module ID. But I didn't prove it or implement it. Since I
want to make the patch itself as small as possible. We can make it in
the future if we want.
Another change in the patch is the new concept Decl Index, which means
the index of the very big array `DeclsLoaded` in ASTReader. Previously,
the index of a loaded declaration is simply the Decl ID minus
PREDEFINED_DECL_NUMs. So there are some places they got used
ambiguously. But this patch tried to split these two concepts.
As https://github.com/llvm/llvm-project/pull/86912 did, the change will
increase the on-disk PCM file sizes. As the declaration ID may be the
most IDs in the PCM file, this can have the biggest impact on the size.
In my experiments, this change will bring 6.6% increase of the on-disk
PCM size. No compile-time performance regression observed. Given the
benefits in the motivation example, I think the cost is worthwhile.
Particular example that lead to this is a very long chain of
`UsingShadowDecl`s that we hit in our codebase in generated code.
To avoid that, check for stack exhaustion when deserializing the
declaration. At that point, we can point to source location of a
particular declaration that is being deserialized.
This reverts commit ccb73e882b2d727877cfda42a14a6979cfd31f04.
It looks like there are some bots complaining about the patch.
See the post commit comment in
https://github.com/llvm/llvm-project/pull/92083 to track it.
Following of https://github.com/llvm/llvm-project/pull/86912
#### Motivation Example
The motivation of the patch series is that, for a module interface unit
`X`, when the dependent modules of `X` changes, if the changes is not
relevant with `X`, we hope the BMI of `X` won't change. For the specific
patch, we hope if the changes was about irrelevant declaration changes,
we hope the BMI of `X` won't change. **However**, I found the patch
itself is not very useful in practice, since the adding or removing
declarations, will change the state of identifiers and types in most
cases.
That said, for the most simple example,
```
// partA.cppm
export module m:partA;
// partA.v1.cppm
export module m:partA;
export void a() {}
// partB.cppm
export module m:partB;
export void b() {}
// m.cppm
export module m;
export import :partA;
export import :partB;
// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
b();
}
```
the BMI of `onlyUseB` will change after we change the implementation of
`partA.cppm` to `partA.v1.cppm`. Since `partA.v1.cppm` introduces new
identifiers and types (the function prototype).
So in this patch, we have to write the tests as:
```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }
// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }
// partB.cppm
export module m:partB;
export void b() {}
// m.cppm
export module m;
export import :partA;
export import :partB;
// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
b();
}
```
so that the new introduced declaration `int getA(int)` doesn't introduce
new identifiers and types, then the BMI of `onlyUseB` can keep
unchanged.
While it looks not so great, the patch should be the base of the patch
to erase the transitive change for identifiers and types since I don't
know how can we introduce new types and identifiers without introducing
new declarations. Given how tightly the relationship between
declarations, types and identifiers, I think we can only reach the ideal
state after we made the series for all of the three entties.
#### Design details
The design of the patch is similar to
https://github.com/llvm/llvm-project/pull/86912, which extends the
32-bit DeclID to 64-bit and use the higher bits to store the module file
index and the lower bits to store the Local Decl ID.
A slight difference is that we only use 48 bits to store the new DeclID
since we try to use the higher 16 bits to store the module ID in the
prefix of Decl class. Previously, we use 32 bits to store the module ID
and 32 bits to store the DeclID. I don't want to allocate additional
space so I tried to make the additional space the same as 64 bits. An
potential interesting thing here is about the relationship between the
module ID and the module file index. I feel we can get the module file
index by the module ID. But I didn't prove it or implement it. Since I
want to make the patch itself as small as possible. We can make it in
the future if we want.
Another change in the patch is the new concept Decl Index, which means
the index of the very big array `DeclsLoaded` in ASTReader. Previously,
the index of a loaded declaration is simply the Decl ID minus
PREDEFINED_DECL_NUMs. So there are some places they got used
ambiguously. But this patch tried to split these two concepts.
#### Overhead
As https://github.com/llvm/llvm-project/pull/86912 did, the change will
increase the on-disk PCM file sizes. As the declaration ID may be the
most IDs in the PCM file, this can have the biggest impact on the size.
In my experiments, this change will bring 6.6% increase of the on-disk
PCM size. No compile-time performance regression observed. Given the
benefits in the motivation example, I think the cost is worthwhile.
'reduction' has a few restrictions over normal 'var-list' clauses:
1- On parallel, a num_gangs can only have 1 argument when combined with
reduction. These two aren't able to be combined on any other of the
compute constructs however.
2- The vars all must be 'numerical data types' types of some sort, or a
'composite of numerical data types'. A list of types is given in the
standard as a minimum, so we choose 'isScalar', which covers all of
these types and keeps types that are actually numeric. Other compilers
don't seem to implement the 'composite of numerical data types', though
we do.
3- Because of the above restrictions, member-of-composite is not
allowed, so any access via a memberexpr is disallowed. Array-element and
sub-arrays (aka array sections) are both permitted, so long as they meet
the requirements of #2.
This patch implements all of these for compute constructs.
device_type, also spelled as dtype, specifies the applicability of the
clauses following it, and takes a series of identifiers representing the
architectures it applies to. As we don't have a source for the valid
architectures yet, this patch just accepts all.
Semantically, this also limits the list of clauses that can be applied
after the device_type, so this implements that as well.
This reverts commit 06f04b2e27f2586d3db2204ed4e54f8b78fea74e.
This reapplies commit c4a9a374749deb5f2a932a7d4ef9321be1b2ae5d.
The build failures were caused by the patch depending on the order of
evaluation of arguments to a function. This reapplication separates out
the capture of one of the values.
This reverts commit c4a9a374749deb5f2a932a7d4ef9321be1b2ae5d.
This and the followup patch keep hitting an assert I wrote on the build
bots in a way that isn't clear. Reverting so I can fix it without a
rush.
device_type, also spelled as dtype, specifies the applicability of the
clauses following it, and takes a series of identifiers representing the
architectures it applies to. As we don't have a source for the valid
architectures yet, this patch just accepts all.
Semantically, this also limits the list of clauses that can be applied
after the device_type, so this implements that as well.
In ASTBitCodes.h, there are two type alias for the ID type of
Identifiers with the same underlying type. It is confusing. This patch
tries to merge the `IdentID` to `IdentifierID` to erase such confusion.
'wait' takes a few int-exprs (well, a series of async-arguments, but
those are effectively just an int-expr), plus a pair of tags. This
patch adds the support for this to the AST, and does the appropriate
semantic analysis for them.
Use `VFS.equivalent()`, which follows symlinks, to check if two module
cache paths are equivalent. This prevents a PCH error when building from
a different path that is a symlink of the original.
```
error: PCH was compiled with module cache path '/home/foo/blah/ModuleCache/2IBP1TNT8OR8D', but the path is currently '/data/users/foo/blah/ModuleCache/2IBP1TNT8OR8D'
1 error generated.
```
This is a pretty simple clause, it takes an 'async-argument', which
effectively needs to be just parsed as an 'int' argument, since it can
be an arbitrarly integer at runtime (and negative values are legal for
implementation defined values).
This patch also cleans up the async-argument parsing, so 'wait' got some
minor quality-of-life improvements for parsing (both clause and
construct).
These two are very similar to the other 'var-list' variants, except they
require that the type of the variable be a pointer. This patch
implements that restriction.
This relands 6c31104.
The patch was reverted due to incorrectly introduced alignment. And the
patch was re-commited after fixing the alignment issue.
Following off are the original message:
This is part of "no transitive change" patch series, "no transitive
source location change". I talked this with @Bigcheese in the tokyo's
WG21 meeting.
The idea comes from @jyknight posted on LLVM discourse. That for:
```
// A.cppm
export module A;
...
// B.cppm
export module B;
import A;
...
//--- C.cppm
export module C;
import C;
```
Almost every time A.cppm changes, we need to recompile `B`. Due to we
think the source location is significant to the semantics. But it may be
good if we can avoid recompiling `C` if the change from `A` wouldn't
change the BMI of B.
This patch only cares source locations. So let's focus on source
location's example. We can see the full example from the attached test.
```
//--- A.cppm
export module A;
export template <class T>
struct C {
T func() {
return T(43);
}
};
export int funcA() {
return 43;
}
//--- A.v1.cppm
export module A;
export template <class T>
struct C {
T func() {
return T(43);
}
};
export int funcA() {
return 43;
}
//--- B.cppm
export module B;
import A;
export int funcB() {
return funcA();
}
//--- C.cppm
export module C;
import A;
export void testD() {
C<int> c;
c.func();
}
```
Here the only difference between `A.cppm` and `A.v1.cppm` is that
`A.v1.cppm` has an additional blank line. Then the test shows that two
BMI of `B.cppm`, one specified `-fmodule-file=A=A.pcm` and the other
specified `-fmodule-file=A=A.v1.pcm`, should have the bit-wise same
contents.
However, it is a different story for C, since C instantiates templates
from A, and the instantiation records the source information from module
A, which is different from `A` and `A.v1`, so it is expected that the
BMI `C.pcm` and `C.v1.pcm` can and should differ.
To fully understand the patch, we need to understand how we encodes
source locations and how we serialize and deserialize them.
For source locations, we encoded them as:
```
|
|
| _____ base offset of an imported module
|
|
|
|_____ base offset of another imported module
|
|
|
|
| ___ 0
```
As the diagram shows, we encode the local (unloaded) source location
from 0 to higher bits. And we allocate the space for source locations
from the loaded modules from high bits to 0. Then the source locations
from the loaded modules will be mapped to our source location space
according to the allocated offset.
For example, for,
```
// a.cppm
export module a;
...
// b.cppm
export module b;
import a;
...
```
Assuming the offset of a source location (let's name the location as
`S`) in a.cppm is 45 and we will record the value `45` into the BMI
`a.pcm`. Then in b.cppm, when we import a, the source manager will
allocate a space for module 'a' (according to the recorded number of
source locations) as the base offset of module 'a' in the current source
location spaces. Let's assume the allocated base offset as 90 in this
example. Then when we want to get the location in the current source
location space for `S`, we can get it simply by adding `45` to `90` to
`135`. Finally we can get the source location for `S` in module B as
`135`.
And when we want to write module `b`, we would also write the source
location of `S` as `135` directly in the BMI. And to clarify the
location `S` comes from module `a`, we also need to record the base
offset of module `a`, 90 in the BMI of `b`.
Then the problem comes. Since the base offset of module 'a' is computed
by the number source locations in module 'a'. In module 'b', the
recorded base offset of module 'a' will change every time the number of
source locations in module 'a' increase or decrease. In other words, the
contents of BMI of B will change every time the number of locations in
module 'a' changes. This is pretty sensitive. Almost every change will
change the number of locations. So this is the problem this patch want
to solve.
Let's continue with the existing design to understand what's going on.
Another interesting case is:
```
// c.cppm
export module c;
import whatever;
import a;
import b;
...
```
In `c.cppm`, when we import `a`, we still need to allocate a base
location offset for it, let's say the value becomes to `200` somehow.
Then when we reach the location `S` recorded in module `b`, we need to
translate it into the current source location space. The solution is
quite simple, we can get it by `135 + (200 - 90) = 245`. In another
word, the offset of a source location in current module can be computed
as `Recorded Offset + Base Offset of the its module file - Recorded Base
Offset`.
Then we're almost done about how we handle the offset of source
locations in serializers.
From the abstract level, what we want to do is to remove the hardcoded
base offset of imported modules and remain the ability to calculate the
source location in a new module unit. To achieve this, we need to be
able to find the module file owning a source location from the encoding
of the source location.
So in this patch, for each source location, we will store the local
offset of the location and the module file index. For the above example,
in `b.pcm`, the source location of `S` will be recorded as `135`
directly. And in the new design, the source location of `S` will be
recorded as `<1, 45>`. Here `1` stands for the module file index of `a`
in module `b`. And `45` means the offset of `S` to the base offset of
module `a`.
So the trade-off here is that, to make the BMI more independent, we need
to record more abstract information. And I feel it is worthy. The
recompilation problem of modules is really annoying and there are still
people complaining this. But if we can make this (including stopping
other changes transitively), I think this may be a killer feature for
modules. And from @Bigcheese , this should be helpful for clang explicit
modules too.
And the benchmarking side, I tested this patch against
https://github.com/alibaba/async_simple/tree/CXX20Modules. No
significant change on compilation time. The size of .pcm files becomes
to 204M from 200M. I think the trade-off is pretty fair.
I didn't use another slot to record the module file index. I tried to
use the higher 32 bits of the existing source location encodings to
store that information. This design may be safe. Since we use `unsigned`
to store source locations but we use uint64_t in serialization. And
generally `unsigned` is 32 bit width in most platforms. So it might not
be a safe problem. Since all the bits we used to store the module file
index is not used before. So the new encodings may be:
```
|-----------------------|-----------------------|
| A | B | C |
* A: 32 bit. The index of the module file in the module manager + 1.
* The +1
here is necessary since we wish 0 stands for the current
module file.
* B: 31 bit. The offset of the source location to the module file
* containing it.
* C: The macro bit. We rotate it to the lowest bit so that we can save
* some
space in case the index of the module file is 0.
```
(The B and C is the existing raw encoding for source locations)
Another reason to reuse the same slot of the source location is to
reduce the impact of the patch. Since there are a lot of places assuming
we can store and get a source location from a slot. And if I tried to
add another slot, a lot of codes breaks. I don't feel it is worhty.
Another impact of this decision is that, the existing small
optimizations for encoding source location may be invalided. The key of
the optimization is that we can turn large values into small values then
we can use VBR6 format to reduce the size. But if we decided to put the
module file index into the higher bits, then maybe it simply doesn't
work. An example may be the `SourceLocationSequence` optimization.
This will only affect the size of on-disk .pcm files. I don't expect
this impact the speed and memory use of compilations. And seeing my
small experiments above, I feel this trade off is worthy.
The mental model for handling source location offsets is not so complex
and I believe we can solve it by adding module file index to each stored
source location.
For the practical side, since the source location is pretty sensitive,
and the patch can pass all the in-tree tests and a small scale projects,
I feel it should be correct.
I'll continue to work on no transitive decl change and no transitive
identifier change (if matters) to achieve the goal to stop the
propagation of unnecessary changes. But all of this depends on this
patch. Since, clearly, the source locations are the most sensitive
thing.
---
The release nots and documentation will be added seperately.
This patch revolves around the misuse of UnresolvedLookupExpr in
BuildTemplateIdExpr.
Basically, we build up an UnresolvedLookupExpr not only for function
overloads but for "unresolved" templates wherever we need an expression
for template decls. For example, a dependent VarTemplateDecl can be
wrapped with such an expression before template instantiation. (See
617007240c)
Also, one important thing is that UnresolvedLookupExpr uses a
"canonical"
QualType to describe the containing unresolved decls: a DependentTy is
for dependent expressions and an OverloadTy otherwise. Therefore, this
modeling for non-dependent templates leaves a problem in that the
expression
is marked and perceived as if describing overload functions. The
consumer then
expects functions for every such expression, although the fact is the
reverse.
Hence, we run into crashes.
As to the patch, I added a new canonical type "UnresolvedTemplateTy" to
model these cases. Given that we have been using this model
(intentionally or
accidentally) and it is pretty baked in throughout the code, I think
extending the role of UnresolvedLookupExpr is reasonable. Further, I
added
some diagnostics for the direct occurrence of these expressions, which
are supposed to be ill-formed.
As a bonus, this patch also fixes some typos in the diagnostics and
creates
RecoveryExprs rather than nothing in the hope of a better error-recovery
for clangd.
Fixes https://github.com/llvm/llvm-project/issues/88832
Fixes https://github.com/llvm/llvm-project/issues/63243
Fixes https://github.com/llvm/llvm-project/issues/48673
Like 'copy', these also have alternate names, so this implements that as
well. Additionally, these have an optional tag of either 'readonly' or
'zero' depending on the clause.
Otherwise, this is a pretty rote implementation of the clause, as there
aren't any special rules for it.
Like present, no_create, and first_private, copy is a clause that takes
just a var-list, and follows the same rules as the others.
The one unique part of this clause is that it ALSO supports two
deprecated/backwards-compatibility spellings, so this patch adds them
and implements them.
There were two diffs that introduced some options useful when you build
modules externally and cannot rely on file modification time as the key
for detecting input file changes:
- [D67249](https://reviews.llvm.org/D67249) introduced the
`-fmodules-validate-input-files-content` option, which allows the use of
file content hash in addition to the modification time.
- [D141632](https://reviews.llvm.org/D141632) propagated the use of
`-fno-pch-timestamps` with Clang modules.
There is a problem when the size of the input file (header) is not
modified but the content is. In this case, Clang cannot detect the file
change when the `-fno-pch-timestamps` option is used. The
`-fmodules-validate-input-files-content` option should help, but there
is an issue with its application: it's not applied when the modification
time is stored as zero that is the case for `-fno-pch-timestamps`.
The issue can be fixed using the same trick that was applied during the
processing of `ForceCheckCXX20ModulesInputFiles`:
```
// When ForceCheckCXX20ModulesInputFiles and ValidateASTInputFilesContent
// enabled, it is better to check the contents of the inputs. Since we can't
// get correct modified time information for inputs from overriden inputs.
if (HSOpts.ForceCheckCXX20ModulesInputFiles && ValidateASTInputFilesContent &&
F.StandardCXXModule && FileChange.Kind == Change::None)
FileChange = HasInputContentChanged(FileChange);
```
The patch suggests the solution similar to the presented above and
includes a LIT test to verify it.
The private clause is the first that takes a 'var-list', thus this has a
lot of additional work to enable the var-list type. A 'var' is a
traditional variable reference, subscript, member-expression, or
array-section, so checking of these is pretty minor.
Note: This ran into some issues with array-sections (aka sub-arrays)
that will be fixed in a follow-up patch.
This is part of "no transitive change" patch series, "no transitive
source location change". I talked this with @Bigcheese in the tokyo's
WG21 meeting.
The idea comes from @jyknight posted on LLVM discourse. That for:
```
// A.cppm
export module A;
...
// B.cppm
export module B;
import A;
...
//--- C.cppm
export module C;
import C;
```
Almost every time A.cppm changes, we need to recompile `B`. Due to we
think the source location is significant to the semantics. But it may be
good if we can avoid recompiling `C` if the change from `A` wouldn't
change the BMI of B.
# Motivation Example
This patch only cares source locations. So let's focus on source
location's example. We can see the full example from the attached test.
```
//--- A.cppm
export module A;
export template <class T>
struct C {
T func() {
return T(43);
}
};
export int funcA() {
return 43;
}
//--- A.v1.cppm
export module A;
export template <class T>
struct C {
T func() {
return T(43);
}
};
export int funcA() {
return 43;
}
//--- B.cppm
export module B;
import A;
export int funcB() {
return funcA();
}
//--- C.cppm
export module C;
import A;
export void testD() {
C<int> c;
c.func();
}
```
Here the only difference between `A.cppm` and `A.v1.cppm` is that
`A.v1.cppm` has an additional blank line. Then the test shows that two
BMI of `B.cppm`, one specified `-fmodule-file=A=A.pcm` and the other
specified `-fmodule-file=A=A.v1.pcm`, should have the bit-wise same
contents.
However, it is a different story for C, since C instantiates templates
from A, and the instantiation records the source information from module
A, which is different from `A` and `A.v1`, so it is expected that the
BMI `C.pcm` and `C.v1.pcm` can and should differ.
# Internal perspective of status quo
To fully understand the patch, we need to understand how we encodes
source locations and how we serialize and deserialize them.
For source locations, we encoded them as:
```
|
|
| _____ base offset of an imported module
|
|
|
|_____ base offset of another imported module
|
|
|
|
| ___ 0
```
As the diagram shows, we encode the local (unloaded) source location
from 0 to higher bits. And we allocate the space for source locations
from the loaded modules from high bits to 0. Then the source locations
from the loaded modules will be mapped to our source location space
according to the allocated offset.
For example, for,
```
// a.cppm
export module a;
...
// b.cppm
export module b;
import a;
...
```
Assuming the offset of a source location (let's name the location as
`S`) in a.cppm is 45 and we will record the value `45` into the BMI
`a.pcm`. Then in b.cppm, when we import a, the source manager will
allocate a space for module 'a' (according to the recorded number of
source locations) as the base offset of module 'a' in the current source
location spaces. Let's assume the allocated base offset as 90 in this
example. Then when we want to get the location in the current source
location space for `S`, we can get it simply by adding `45` to `90` to
`135`. Finally we can get the source location for `S` in module B as
`135`.
And when we want to write module `b`, we would also write the source
location of `S` as `135` directly in the BMI. And to clarify the
location `S` comes from module `a`, we also need to record the base
offset of module `a`, 90 in the BMI of `b`.
Then the problem comes. Since the base offset of module 'a' is computed
by the number source locations in module 'a'. In module 'b', the
recorded base offset of module 'a' will change every time the number of
source locations in module 'a' increase or decrease. In other words, the
contents of BMI of B will change every time the number of locations in
module 'a' changes. This is pretty sensitive. Almost every change will
change the number of locations. So this is the problem this patch want
to solve.
Let's continue with the existing design to understand what's going on.
Another interesting case is:
```
// c.cppm
export module c;
import whatever;
import a;
import b;
...
```
In `c.cppm`, when we import `a`, we still need to allocate a base
location offset for it, let's say the value becomes to `200` somehow.
Then when we reach the location `S` recorded in module `b`, we need to
translate it into the current source location space. The solution is
quite simple, we can get it by `135 + (200 - 90) = 245`. In another
word, the offset of a source location in current module can be computed
as `Recorded Offset + Base Offset of the its module file - Recorded Base
Offset`.
Then we're almost done about how we handle the offset of source
locations in serializers.
# The high level design of current patch
From the abstract level, what we want to do is to remove the hardcoded
base offset of imported modules and remain the ability to calculate the
source location in a new module unit. To achieve this, we need to be
able to find the module file owning a source location from the encoding
of the source location.
So in this patch, for each source location, we will store the local
offset of the location and the module file index. For the above example,
in `b.pcm`, the source location of `S` will be recorded as `135`
directly. And in the new design, the source location of `S` will be
recorded as `<1, 45>`. Here `1` stands for the module file index of `a`
in module `b`. And `45` means the offset of `S` to the base offset of
module `a`.
So the trade-off here is that, to make the BMI more independent, we need
to record more abstract information. And I feel it is worthy. The
recompilation problem of modules is really annoying and there are still
people complaining this. But if we can make this (including stopping
other changes transitively), I think this may be a killer feature for
modules. And from @Bigcheese , this should be helpful for clang explicit
modules too.
And the benchmarking side, I tested this patch against
https://github.com/alibaba/async_simple/tree/CXX20Modules. No
significant change on compilation time. The size of .pcm files becomes
to 204M from 200M. I think the trade-off is pretty fair.
# Some low level details
I didn't use another slot to record the module file index. I tried to
use the higher 32 bits of the existing source location encodings to
store that information. This design may be safe. Since we use `unsigned`
to store source locations but we use uint64_t in serialization. And
generally `unsigned` is 32 bit width in most platforms. So it might not
be a safe problem. Since all the bits we used to store the module file
index is not used before. So the new encodings may be:
```
|-----------------------|-----------------------|
| A | B | C |
* A: 32 bit. The index of the module file in the module manager + 1. The +1
here is necessary since we wish 0 stands for the current module file.
* B: 31 bit. The offset of the source location to the module file containing it.
* C: The macro bit. We rotate it to the lowest bit so that we can save some
space in case the index of the module file is 0.
```
(The B and C is the existing raw encoding for source locations)
Another reason to reuse the same slot of the source location is to
reduce the impact of the patch. Since there are a lot of places assuming
we can store and get a source location from a slot. And if I tried to
add another slot, a lot of codes breaks. I don't feel it is worhty.
Another impact of this decision is that, the existing small
optimizations for encoding source location may be invalided. The key of
the optimization is that we can turn large values into small values then
we can use VBR6 format to reduce the size. But if we decided to put the
module file index into the higher bits, then maybe it simply doesn't
work. An example may be the `SourceLocationSequence` optimization.
This will only affect the size of on-disk .pcm files. I don't expect
this impact the speed and memory use of compilations. And seeing my
small experiments above, I feel this trade off is worthy.
# Correctness
The mental model for handling source location offsets is not so complex
and I believe we can solve it by adding module file index to each stored
source location.
For the practical side, since the source location is pretty sensitive,
and the patch can pass all the in-tree tests and a small scale projects,
I feel it should be correct.
# Future Plans
I'll continue to work on no transitive decl change and no transitive
identifier change (if matters) to achieve the goal to stop the
propagation of unnecessary changes. But all of this depends on this
patch. Since, clearly, the source locations are the most sensitive
thing.
---
The release nots and documentation will be added seperately.
OpenACC is going to need an array sections implementation that is a
simpler version/more restrictive version of the OpenMP version.
This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`,
then adds an enum to choose between the two.
This also fixes a couple of 'drive-by' issues that I discovered on the way,
but leaves the OpenACC Sema parts reasonably unimplemented (no semantic
analysis implementation), as that will be a followup patch.
This patch tries to remove all the direct use of DeclID except the real
low level reading and writing. All the use of DeclID is converted to
the use of LocalDeclID or GlobalDeclID. This is helpful to increase the
readability and type safety.
This patch tries to remove all the direct use of DeclID except the real
low level reading and writing. All the use of DeclID is converted to
the use of LocalDeclID or GlobalDeclID. This is helpful to increase the
readability and type safety.
Previously, the DeclID is defined in serialization/ASTBitCodes.h under
clang::serialization namespace. However, actually the DeclID is not
purely used in serialization part. The DeclID is already widely used in
AST and all around the clang project via classes like `LazyPtrDecl` or
calling `ExternalASTSource::getExernalDecl()`. All such uses are via the
raw underlying type of `DeclID` as `uint32_t`. This is not pretty good.
This patch moves the DeclID class family to a new header `AST/DeclID.h`
so that the whole project can use the wrapped class `DeclID`,
`GlobalDeclID` and `LocalDeclID` instead of the raw underlying type.
This can improve the readability and the type safety.
Previously, the LocalDeclID and GlobalDeclID are defined as:
```
using LocalDeclID = DeclID;
using GlobalDeclID = DeclID;
```
This is more or less concerning that we may misuse LocalDeclID and
GlobalDeclID without understanding it. There is also a FIXME saying
this.
This patch tries to turn LocalDeclID into a class to improve the type
safety here.
This patch tries to use DeclID in the code bases to avoid use the raw
type 'uint32_t'. It is problematic to use the raw type 'uint32_t' if we
want to change the type of DeclID some day.
num_gangs takes an 'int-expr-list', for 'parallel', and an 'int-expr'
for 'kernels'. This patch changes the parsing to always parse it as an
'int-expr-list', then correct the expression count during Sema. It also
implements the rest of the semantic analysis changes for this clause.
The 'vector_length' clause is semantically identical to the
'num_workers' clause, in that it takes a mandatory single int-expr. This
is implemented identically to it.
`self` clauses on compute constructs take an optional condition
expression. We again limit the implementation to ONLY compute constructs
to ensure we get all the rules correct for others. However, this one
will be particularly complicated, as it takes a `var-list` for `update`,
so when we get to that construct/clause combination, we need to do that
as well.
This patch also furthers uses of the `OpenACCClauses.def` as it became
useful while implementing this (as well as some other minor refactors as
I went through).
Finally, `self` and `if` clauses have an interaction with each other, if
an `if` clause evaluates to `true`, the `self` clause has no effect.
While this is intended and can be used 'meaningfully', we are warning on
this with a very granular warning, so that this edge case will be
noticed by newer users, but can be disabled trivially.
Close https://github.com/llvm/llvm-project/issues/87609
We tried to profile the body of the lambda expressions in
https://reviews.llvm.org/D153957. But as the original comments show,
it is indeed dangerous. After we tried to skip calculating the ODR
hash values recently, we have fall into this trap twice.
So in this patch, I choose to not profile the body of the lambda
expression. The signature of the lambda is still profiled.
Like with the 'default' clause, this is being applied to only Compute
Constructs for now. The 'if' clause takes a condition expression which
is used as a runtime value.
This is not a particularly complex semantic implementation, as there
isn't much to this clause, other than its interactions with 'self',
which will be managed in the patch to implement that.
Following of https://github.com/llvm/llvm-project/pull/76930
This follows the idea of "only writes what we writes", which I think is
the most natural and efficient way to implement this optimization.
We start writing the BMI from the first declaration in module purview
instead of the global module fragment, so that everything in the GMF
untouched won't be written in the BMI naturally.
The exception is, as I said in
https://github.com/llvm/llvm-project/pull/76930, when we write a
declaration we need to write its decl context, and when we write the
decl context, we need to write everything from it. So when we see
`std::vector`, we basically need to write everything under namespace
std. This violates our intention. To fix this, this patch delays the
writing of namespace in the GMF.
From my local measurement, the size of the BMI decrease to 90M from 112M
for a local modules build. I think this is significant.
This feature will be covered under the experimental reduced BMI so that
it won't affect any existing users. So I'd like to land this when the CI
gets green.
Documents will be added seperately.