Follow-up to https://github.com/llvm/llvm-project/pull/92085.
#### Motivation
The motivation is still to cut off unnecessary changes in the
dependency chain. See the above link (recursively) for details.
This will be the last patch of the `no-transitive-*-change` series.
Any follow-up patches are likely to be specific to C++20 named modules,
handling special language rules like `ADL` (see the reply in
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/53
for an example), so they won't affect the whole serialization part the
way the patches in this series did.
#### Example
After this patch, we are finally able to cut off unnecessary changes
caused by types. For example,
```
//--- m-partA.cppm
export module m:partA;

//--- m-partA.v1.cppm
export module m:partA;

namespace NS {
class A {
public:
  int getValue() {
    return 43;
  }
};
}

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
  return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
  return getB();
}
```
The BMI of `useBOnly.cppm` is expected not to change if we only add a
new class to `m:partA`. This will be pretty useful in practice.
#### Implementation details
The key idea of this patch is similar to the previous patches: extend
the 32-bit type ID to 64 bits so that we can store the module file index
in the higher bits. The encoding of a type ID then no longer depends on
the imported modules.
But there are two differences from the previous patches:
- TypeID is not purely an index of serialized types. We use the
lower 3 bits to store the qualifiers.
- TypeID doesn't take part in any lookup process, so there are far fewer
uses of TypeID than of the IDs in the previous patches.
The first difference requires slightly more complex bit
operations, and the second makes the patch much simpler than
the previous ones.
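
To make the bit layout described above concrete, here is a minimal sketch of how such a 64-bit type ID could be packed and unpacked. It is not the code from the patch: the names `SketchTypeID`, `makeSketchTypeID` and the accessors are hypothetical, and the exact split (module file index in the upper 32 bits, a 29-bit local type index, qualifiers in the lower 3 bits) is an assumption based on the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names and layout, for illustration only (not Clang's API).
// Assumed layout of the 64-bit ID:
//   bits 63..32  module file index
//   bits 31..3   local type index within that module file
//   bits  2..0   qualifiers
using SketchTypeID = std::uint64_t;

constexpr unsigned QualifierBits = 3;
constexpr unsigned ModuleFileIndexShift = 32;
constexpr std::uint64_t QualifierMask = (1u << QualifierBits) - 1;

inline SketchTypeID makeSketchTypeID(std::uint32_t ModuleFileIndex,
                                     std::uint32_t LocalTypeIndex,
                                     unsigned Quals) {
  // The local index must leave room for the qualifier bits.
  assert(LocalTypeIndex < (1u << (32 - QualifierBits)));
  assert(Quals <= QualifierMask);
  return (static_cast<std::uint64_t>(ModuleFileIndex) << ModuleFileIndexShift) |
         (static_cast<std::uint64_t>(LocalTypeIndex) << QualifierBits) |
         Quals;
}

inline std::uint32_t getTypeModuleFileIndex(SketchTypeID ID) {
  return static_cast<std::uint32_t>(ID >> ModuleFileIndexShift);
}

inline std::uint32_t getLocalTypeIndex(SketchTypeID ID) {
  // Drop the module file index first, then shift away the qualifier bits.
  return static_cast<std::uint32_t>(ID & 0xFFFFFFFFu) >> QualifierBits;
}

inline unsigned getTypeQualifiers(SketchTypeID ID) {
  return static_cast<unsigned>(ID & QualifierMask);
}
```

Under this assumed layout, `makeSketchTypeID(2, 5, 1)` describes local type 5 of module file 2 with qualifier bits `0b001`, and that value does not depend on how many types any other module file contains.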
Follow-up to https://github.com/llvm/llvm-project/pull/92083.
The motivation is still to cut off unnecessary changes in the
dependency chain. See the above link (recursively) for details.
After this patch (and the above patch), we can already do something
pretty interesting. For example,
#### Motivating example
```
//--- m-partA.cppm
export module m:partA;

export inline int getA() {
  return 43;
}

export class A {
public:
  int getMem();
};

export template <typename T>
class ATempl {
public:
  T getT();
};

//--- m-partA.v1.cppm
export module m:partA;

export inline int getA() {
  return 43;
}

// Now we add a new declaration without introducing a new type.
// The consuming module which didn't use m:partA completely is expected to be
// not changed.
export inline int getA2() {
  return 88;
}

export class A {
public:
  int getMem();
  // Now we add a new declaration without introducing a new type.
  // The consuming module which didn't use m:partA completely is expected to be
  // not changed.
  int getMem2();
};

export template <typename T>
class ATempl {
public:
  T getT();
  // Add a new declaration without introducing a new type.
  T getT2();
};

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
  return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
  return getB();
}
```
In this example, module `m` exports two partitions, `:partA` and
`:partB`, and the consumer `useBOnly` only uses entities from
`:partB`. So we don't want the BMI of `useBOnly` to change if only
`:partA` changes. After this patch, that holds as long as the change to
`:partA` doesn't introduce new types. (And we can lift this restriction
once we implement no-transitive-type-change.)
As the example shows, when we change the implementation of `:partA` from
`m-partA.cppm` to `m-partA.v1.cppm`, we add a new function declaration
`getA2()` in the global namespace, a new member function `getMem2()`
to class `A`, and a new member function `getT2()` to class
template `ATempl`. Since `useBOnly` doesn't use `:partA` at all,
the BMI of `useBOnly` won't change after we make the above
changes.
#### Design details
The method used in this patch is similar to
https://github.com/llvm/llvm-project/pull/92083 and
https://github.com/llvm/llvm-project/pull/86912. It extends the 32-bit
IdentifierID to 64 bits and uses the higher 32 bits to store the module
file index, so that the encoding of an identifier isn't affected by
other modules.
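
As a rough illustration of the encoding described above, the following sketch packs a module file index into the upper 32 bits of a 64-bit identifier ID. The names `SketchIdentifierID`, `makeSketchIdentifierID` and the accessors are hypothetical and not taken from the patch; treating the lower 32 bits as the identifier's index local to its module file is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only (not Clang's API).
// Assumed layout: upper 32 bits = module file index,
//                 lower 32 bits = identifier index local to that module file.
using SketchIdentifierID = std::uint64_t;

inline SketchIdentifierID makeSketchIdentifierID(std::uint32_t ModuleFileIndex,
                                                 std::uint32_t LocalIndex) {
  return (static_cast<std::uint64_t>(ModuleFileIndex) << 32) | LocalIndex;
}

inline std::uint32_t getIdentifierModuleFileIndex(SketchIdentifierID ID) {
  return static_cast<std::uint32_t>(ID >> 32);
}

inline std::uint32_t getLocalIdentifierIndex(SketchIdentifierID ID) {
  return static_cast<std::uint32_t>(ID & 0xFFFFFFFFu);
}
```

Because the ID is built only from the owning module file's index and a local index, adding identifiers to one module file (say, the one for `:partA`) leaves the IDs that refer into another module file (say, the one for `:partB`) unchanged in this sketch.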
#### Overhead
Similar to https://github.com/llvm/llvm-project/pull/92083 and
https://github.com/llvm/llvm-project/pull/86912. The change is only
expected to increase the size of the on-disk .pcm files, not to affect
compile-time performance. In my experiment, the on-disk size increased
by only a little over 1%, and I observed no compile-time impact.
#### Future Plans
I'll try to do the same thing for type IDs. IIRC, it won't change the
dependency graph if we add a new type in an unused unit. I do think
this is a significant win, and it will be a pretty good answer to "why
modules are better than headers."