1835 Commits

Author SHA1 Message Date
Abhina Sreeskantharajan
d6f91200fe Revert "[SystemZ][z/OS] Propagate IsText flag continuation"
This reverts commit 3b3accb598ec87a6a30b0e18ded06071030bb78f.
2024-09-20 08:17:45 -04:00
Abhina Sreeskantharajan
3b3accb598 [SystemZ][z/OS] Propagate IsText flag continuation 2024-09-19 15:16:08 -04:00
Abhina Sree
edf3b277a5
[SystemZ][z/OS] Propagate IsText parameter to open text files as text (#107906)
This patch adds an IsText parameter to the following functions
openFileForRead, getBufferForFile, getBufferForFileImpl and determines
whether a file is text by querying the file tag on z/OS. The default is
set to OF_Text instead of OF_None, this change in value does not affect
any other platforms other than z/OS.
2024-09-19 14:30:10 -04:00
Vakhurin Sergei
eda72fac54
Fix OOM in FormatDiagnostic (2nd attempt) (#108866)
Resolves: #70930 (and probably latest comments from clangd/clangd#251)
by fixing racing for the shared DiagStorage value which caused messing with args inside the storage and then formatting the following message with getArgSInt(1) == 2:

def err_module_odr_violation_function : Error<
  "%q0 has different definitions in different modules; "
  "%select{definition in module '%2'|defined here}1 "
  "first difference is "

which causes HandleSelectModifier to go beyond the ArgumentLen so the recursive call to FormatDiagnostic was made with DiagStr > DiagEnd that leads to infinite while (DiagStr != DiagEnd).

The Main Idea:
Reuse the existing DiagStorageAllocator logic to make all DiagnosticBuilders having independent states.
Also, encapsulating the rest of state (e.g. ID and Loc) into DiagnosticBuilder.

The last attempt failed -
https://github.com/llvm/llvm-project/pull/108187#issuecomment-2353122096
so was reverted - #108838
2024-09-18 11:46:25 -04:00
Aaron Ballman
5cead0cb0b
Revert "Fix OOM in FormatDiagnostic" (#108838)
Reverting due to build failures found in #108187
2024-09-16 10:49:17 -04:00
Vakhurin Sergei
e5d255607d
Fix OOM in FormatDiagnostic (#108187)
Resolves: #70930 (and probably latest comments from
https://github.com/clangd/clangd/issues/251)
by fixing racing for the shared `DiagStorage` value which caused messing
with args inside the storage and then formatting the following message
with `getArgSInt(1)` == 2:
```
def err_module_odr_violation_function : Error<
  "%q0 has different definitions in different modules; "
  "%select{definition in module '%2'|defined here}1 "
  "first difference is "
```
which causes `HandleSelectModifier` to go beyond the `ArgumentLen` so
the recursive call to `FormatDiagnostic` was made with `DiagStr` >
`DiagEnd` that leads to infinite `while (DiagStr != DiagEnd)`.

**The Main Idea:**
Reuse the existing `DiagStorageAllocator` logic to make all
`DiagnosticBuilder`s having independent states.
Also, encapsulating the rest of state (e.g. ID and Loc) into
`DiagnosticBuilder`.

**TODO (if it will be requested by reviewer):**
- [x] add a test (I have no idea how to turn a whole bunch of my
proprietary code which leads `clangd` to OOM into a small public
example.. probably I must try using
[this](https://github.com/llvm/llvm-project/issues/70930#issuecomment-2209872975)
instead)
- [x] [`Diag.CurDiagID !=
diag::fatal_too_many_errors`](https://github.com/llvm/llvm-project/pull/108187#pullrequestreview-2296395489)
- [ ] ? get rid of `DiagStorageAllocator` at all and make
`DiagnosticBuilder` having they own `DiagnosticStorage` coz it seems
pretty small so should fit the stack for short-living
`DiagnosticBuilder` instances
2024-09-16 10:30:53 -04:00
Nikolas Klauser
e39205654d Reapply "Reapply "[clang] Extend diagnose_if to accept more detailed warning information (#70976)" (#108453)"
This reverts commit e1bd9740faa62c11cc785a7b70ec1ad17e286bd1.

Fixes incorrect use of the `DiagnosticsEngine` in the clangd tests.
2024-09-14 22:25:08 +02:00
Florian Mayer
e1bd9740fa Revert "Reapply "[clang] Extend diagnose_if to accept more detailed warning information (#70976)" (#108453)"
This reverts commit e7f782e7481cea23ef452a75607d3d61f5bd0d22.

This had UBSan failures:

[----------] 1 test from ConfigCompileTests
[ RUN      ] ConfigCompileTests.DiagnosticSuppression
Config fragment: compiling <unknown>:0 -> 0x00007B8366E2F7D8 (trusted=false)
/usr/local/google/home/fmayer/large/llvm-project/llvm/include/llvm/ADT/IntrusiveRefCntPtr.h:203:33: runtime error: reference binding to null pointer of type 'clang::DiagnosticIDs'

UndefinedBehaviorSanitizer: undefined-behavior /usr/local/google/home/fmayer/large/llvm-project/llvm/include/llvm/ADT/IntrusiveRefCntPtr.h:203:33

Pull Request: https://github.com/llvm/llvm-project/pull/108645
2024-09-13 15:01:33 -07:00
Nikolas Klauser
e7f782e748
Reapply "[clang] Extend diagnose_if to accept more detailed warning information (#70976)" (#108453)
This reverts commit e0cd11eba526234ca14a0b91f5598ca3363b6aca.

Update the use of `getWarningOptionForDiag` in flang to use the
DiagnosticIDs.
2024-09-13 11:34:20 +02:00
Kazu Hirata
e0cd11eba5 Revert "[clang] Extend diagnose_if to accept more detailed warning information (#70976)"
This reverts commit 030c6da7af826b641db005be925b20f956c3a6bb.

Several build bots are failing:
https://lab.llvm.org/buildbot/#/builders/89/builds/6211
https://lab.llvm.org/buildbot/#/builders/157/builds/7578
https://lab.llvm.org/buildbot/#/builders/140/builds/6429
2024-09-12 12:19:26 -07:00
Nikolas Klauser
030c6da7af
[clang] Extend diagnose_if to accept more detailed warning information (#70976)
This implements parts of the extension proposed in
https://discourse.llvm.org/t/exposing-the-diagnostic-engine-to-c/73092/7.

Specifically, this makes it possible to specify a diagnostic group in an
optional third argument.
2024-09-12 20:15:01 +02:00
Pranav Kant
3cd01371e0
Revert "[RFC][C++20][Modules] Fix crash when function and lambda insi… (#108311)
…de loaded from different modules (#104512)"

This reverts commit d778689fdc812033e7142ed87e4ee13c4997b3f9.
2024-09-11 17:21:42 -07:00
Dmitry Polukhin
d778689fdc
[RFC][C++20][Modules] Fix crash when function and lambda inside loaded from different modules (#104512)
Summary:
Because AST loading code is lazy and happens in unpredictable order it
could happen that function and lambda inside function can be loaded from
different modules. In this case, captured DeclRefExpr won’t match the
corresponding VarDecl inside function. In AST it looks like this:
```
FunctionDecl 0x555564f4aff0 <Conv.h:33:1, line:41:1> line:33:35 imported in ./thrift_cpp2_base.h hidden tryTo 'Expected<Tgt, const char *> ()' inline
|-also in ./folly-conv.h
`-CompoundStmt 0x555564f7cfc8 <col:43, line:41:1>
  |-DeclStmt 0x555564f7ced8 <line:34:3, col:17>
  | `-VarDecl 0x555564f7cef8 <col:3, col:16> col:7 imported in ./thrift_cpp2_base.h hidden referenced result 'Tgt' cinit
  |   `-IntegerLiteral 0x555564f7d080 <col:16> 'int' 0
  |-CallExpr 0x555564f7cea8 <line:39:3, col:76> '<dependent type>'
  | |-UnresolvedLookupExpr 0x555564f7bea0 <col:3, col:19> '<overloaded function type>' lvalue (no ADL) = 'then_' 0x555564f7bef0
  | |-CXXTemporaryObjectExpr 0x555564f7bcb0 <col:25, col:45> 'Expected<bool, int>':'folly::Expected<bool, int>' 'void () noexcept' zeroing
  | `-LambdaExpr 0x555564f7bc88 <col:48, col:75> '(lambda at Conv.h:39:48)'
  |   |-CXXRecordDecl 0x555564f76b88 <col:48> col:48 imported in ./folly-conv.h hidden implicit <undeserialized declarations> class definition
  |   | |-also in ./thrift_cpp2_base.h
  |   | `-DefinitionData lambda empty standard_layout trivially_copyable literal can_const_default_init
  |   |   |-DefaultConstructor defaulted_is_constexpr
  |   |   |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
  |   |   |-MoveConstructor exists simple trivial needs_implicit
  |   |   |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
  |   |   |-MoveAssignment
  |   |   `-Destructor simple irrelevant trivial constexpr needs_implicit
  |   `-CompoundStmt 0x555564f7d1a8 <col:58, col:75>
  |     `-ReturnStmt 0x555564f7d198 <col:60, col:67>
  |       `-DeclRefExpr 0x555564f7d0a0 <col:67> 'Tgt' lvalue Var 0x555564f7d0c8 'result' 'Tgt' refers_to_enclosing_variable_or_capture
  `-ReturnStmt 0x555564f7bc78 <line:40:3, col:11>
    `-InitListExpr 0x555564f7bc38 <col:10, col:11> 'void'
```
This diff changes AST deserialization to load lambdas inside canonical
function declaration earlier right after the function to make sure that
their canonical decl is loaded from the same module.

Test Plan: check-clang
2024-09-10 16:15:50 +01:00
Haojian Wu
57ef16c699 Fix -Wswitch warning after 4fef204ac42eb84e167d43ce076c9a167eae3be0 2024-09-01 15:40:44 +02:00
Helena Kotas
e00e9a3f82
[HLSL] Add HLSLAttributedResourceType (#106181)
Introducing `HLSLAttributedResourceType` - a new type that is similar to
`AttributedType` but with additional data specific to HLSL resources.
`AttributeType` currently only stores an attribute kind and no
additional data from the type attribute parameters. This does not really
work for HLSL resources since its type attributes contain non-boolean
values that need to be retained as well.

For example:

```
template <typename T> class RWBuffer {
  __hlsl_resource_t  [[hlsl::resource_class(uav)]] [[hlsl::is_rov]] handle;
};
```

The data `HLSLAttributedResourceType` needs to eventually store are:
- resource class (SRV, UAV, CBuffer, Sampler)
- texture dimension(1-3)
- flags is_rov, is_array, is_feedback and is_multisample
- contained type

All of these values except contained type will be stored in
`HLSLAttributedResourceType::Attributes` struct and accessed
individually via the fields. There is also `Data` alias that covers all
of these values as a `unsigned` which is used for hashing and the AST
type serialization.

During type attribute processing all HLSL type attributes will be
validated and collected by SemaHLSL (by
`SemaHLSL::handleResourceTypeAttr`) and in the end combined into a
single `HLSLAttributedResourceType` instance (in
`SemaHLSL::ProcessResourceTypeAttributes`). `SemaHLSL` will also need to
short-term store the `TypeLoc` information for the new type that will be
grabbed by `TypeSpecLocFiller` soon after the type is created.

Part 1/2 of #104861
2024-08-29 21:42:20 -07:00
Chuanqi Xu
55cdb3c785
[C++20] [Modules] Merge lambdas in source to imported lambdas (#106483)
Close https://github.com/llvm/llvm-project/issues/102721

Generally, the type of merged decls will be reused in ASTContext. But
for lambda, in the import and then include case, we can't decide its
previous decl in the imported modules so that we can't assign the
previous decl before creating the type for it. Since we can't decide its
numbering before creating it. So we have to assign the previous decl and
the canonical type for it after creating it, which is unusual and
slightly hack.
2024-08-29 12:37:56 +08:00
Kazu Hirata
1e3dc8cdb4 [Serialization] Fix a warning
This patch fixes:

  clang/lib/Serialization/ASTReader.cpp:9978:27: error: lambda capture
  'this' is not used [-Werror,-Wunused-lambda-capture]
2024-08-23 05:49:49 -07:00
Chuanqi Xu
3cca522d21
[C++20] [Modules] Warn for duplicated decls in mutliple module units (#105799)
It is a long standing issue that the duplicated declarations in multiple
module units would cause the compilation performance to get slowed down.
And there are many questions or issue reports. So I think it is better
to add a warning for it.

And given this is not because the users' code violates the language
specification or any best practices, the warning is disabled by default
even if `-Wall` is specified. The users need to specify the warning
explcitly or use `Weverything`.

The documentation will add separately.
2024-08-23 17:42:47 +08:00
Jonas Hahnfeld
66bd5d7989
[clang-repl] Fix PCH with delayed template parsing (#103028)
When instantiating a delayed template, the recorded token stream is
passed to `Parser::ParseLateTemplatedFuncDef` which will append the
current token "so it doesn't get lost". With incremental extensions
enabled, this is `repl_input_end` which subsequently needs support for
(de)serialization.
2024-08-14 15:11:04 +02:00
Chuanqi Xu
4915fddbb2
[Serialization] Add a callback to register new created predefined decls for DeserializationListener (#102855)
Close https://github.com/llvm/llvm-project/issues/102684

The root cause of the issue is, it is possible that the predefined decl
is not registered at the beginning of writing a module file but got
created during the process of writing from reading.

This is incorrect. The predefined decls should always be predefined
decls.

Another deep thought about the issue is, we shouldn't read any new
things after we start to write the module file. But this is another
deeper question.
2024-08-12 18:27:37 +08:00
Kazu Hirata
4ce2f988b2
[Serialization] Use traditional for loops (NFC) (#102761)
The use of _ requires either:

- (void)_ and curly braces, or
- [[maybe_unused]].

For simple repetitions like these, we can use traditional for loops
for readable warning-free code.
2024-08-10 10:49:39 -07:00
Kazu Hirata
ac83582a2e [Serialization] Fix a warning
This patch fixes:

  clang/lib/Serialization/ASTReader.cpp:11484:13: error: unused
  variable '_' [-Werror,-Wunused-variable]
2024-08-10 09:21:02 -07:00
Shilei Tian
1c269929d0
[Clang][Sema][OpenMP] Allow thread_limit to accept multiple expressions (#102715) 2024-08-10 09:54:58 -04:00
Volodymyr Sapsai
4357175569
[re-format][Modules] Follow-up formatting to "Mention which AST file's options differ from the current TU options." (#102484)
Fix formatting for fdf8e3e31103bc81917cdb27150877f524bb2669.
2024-08-08 11:59:47 -03:00
Volodymyr Sapsai
fdf8e3e311
[Modules][Diagnostic] Mention which AST file's options differ from the current TU options. (#101413)
Claiming a mismatch is always in a precompiled header is wrong and
misleading as a mismatch can happen in any provided AST file. Emitting a
path for a file with a problem allows to disambiguate between multiple
input files.

Use generic term "AST file" because we don't always know a kind of the
provided file (for example, see `ASTReader::readASTFileControlBlock`).

rdar://65005546
2024-08-08 11:23:47 -03:00
Chuanqi Xu
847f9cb0e8
Reland [C++20] [Modules] [Itanium ABI] Generate the vtable in the mod… (#102287)
Reland https://github.com/llvm/llvm-project/pull/75912

The differences of this PR between
https://github.com/llvm/llvm-project/pull/75912 are:

- Fixed a regression in `Decl::isInAnotherModuleUnit()` in DeclBase.cpp
pointed by @mizvekov and add the corresponding test.
- Fixed the regression in windows
https://github.com/llvm/llvm-project/issues/97447. The changes are in
`CodeGenModule::getVTableLinkage` from
`clang/lib/CodeGen/CGVTables.cpp`. According to the feedbacks from MSVC
devs, the linkage of vtables won't affected by modules. So I simply
skipped the case for MSVC.

Given this is more or less fundamental to the use of modules. I hope we
can backport this to 19.x.
2024-08-08 13:14:09 +08:00
Shilei Tian
0c2ded6706 [NFC] Fix compile warning introduced in #99732 2024-08-06 12:45:29 -04:00
Kazu Hirata
b809671a41 [Serialization] Fix a warning
This patch fixes:

  clang/lib/Serialization/ASTReader.cpp:11426:13: error: unused
  variable '_' [-Werror,-Wunused-variable]
2024-08-06 08:22:00 -07:00
Shilei Tian
cee594cf36
[Clang][Sema][OpenMP] Allow num_teams to accept multiple expressions (#99732)
By the OpenMP standard, `num_teams` clause can only accept one
expression (for now). In this patch, we extend it to allow to accept
multiple expressions when it is used with `target teams ompx_bare`
construct. This will allow to launch a multi-dim grid, same as CUDA/HIP.
2024-08-06 10:55:15 -04:00
Helena Kotas
52956b0f70
[HLSL] Implement intangible AST type (#97362)
HLSL has a set of intangible types which are described in in the
[draft HLSL Specification
(**[Basic.types]**)](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf):
  There are special implementation-defined types such as handle types,
  which fall into a category of standard intangible types. Intangible
  types are types that have no defined object representation or value
  representation, as such the size is unknown at compile time.
    
  A class type T is an intangible class type if it contains an base
  classes or members of intangible class type, standard intangible type,
  or arrays of such types. Standard intangible types and intangible class
  types are collectively called intangible
  types([9](https://microsoft.github.io/hlsl-specs/specs/hlsl.html#Intangible)).

This PR implements one standard intangible type `__hlsl_resource_t`
and sets up the infrastructure that will make it easier to add more
in the future, such as samplers or raytracing payload handles. The
HLSL intangible types are declared in
`clang/include/clang/Basic/HLSLIntangibleTypes.def` and this file is
included with related macro definition in most places that require edits
when a new type is added.

The new types are added as keywords and not typedefs to make sure they
cannot be redeclared, and they can only be declared in builtin implicit
headers. The `__hlsl_resource_t` type represents a handle to a memory
resource and it is going to be used in builtin HLSL buffer types like this:

        template <typename T>
        class RWBuffer {
          [[hlsl::contained_type(T)]]
          [[hlsl::is_rov(false)]]
          [[hlsl::resource_class(uav)]]  
          __hlsl_resource_t Handle;
        };

Part 1/3 of llvm/llvm-project#90631.

---------

Co-authored-by: Justin Bogner <mail@justinbogner.com>
2024-08-05 10:50:34 -07:00
Julian Brown
a42e515e3a
[OpenMP] OpenMP 5.1 "assume" directive parsing support (#92731)
This is a minimal patch to support parsing for "omp assume" directives.
These are meant to be hints to a compiler's optimisers: as such, it is
legitimate (if not very useful) to ignore them. The patch builds on top
of the existing support for "omp assumes" directives (note spelling!).

Unlike the "omp [begin/end] assumes" directives, "omp assume" is
associated with a compound statement, i.e. it can appear within a
function. The "holds" assumption could (theoretically) be mapped onto
the existing builtin "__builtin_assume", though the latter applies to a
single point in the program, and the former to a range (i.e. the whole
of the associated compound statement).

This patch fixes sollve's OpenMP 5.1 "omp assume"-based tests.
2024-08-05 07:37:07 -04:00
Dmitry Polukhin
f3761a4bd3
[C++20][Modules] Allow using stdarg.h with header units (#100739)
Summary:
Macro like `va_start`/`va_end` marked as builtin functions that makes
these identifiers special and it results in redefinition of the
identifiers as builtins and it hides macro definitions during preloading
C++ modules. In case of modules Clang ignores special identifiers but
`PP.getCurrentModule()` was not set. This diff fixes IsModule detection
logic for this particular case.

Test Plan: check-clang

---------

Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
2024-08-01 12:33:20 +03:00
Volodymyr Sapsai
f9827e67ce
[Modules][Diagnostic] Don't claim a METADATA mismatch is always in PCH file. (#101280)
You can provide more than one AST file as an input. Emit a path for a
file with a problem, so you can disambiguate between multiple files.

rdar://65005546
2024-07-31 10:38:32 -07:00
Chuanqi Xu
2f0910d2d7 [C++20] [Modules] Skip ODR checks if either declaration comes from GMF
This patch tries to workaround the case that:
- in a module unit that imports another module unit
- both the module units including overlapped headers
- the compiler emits false positive ODR violation diagnostics for the
  overlapped headers if ODR check is enabled
- the current module units enables PCH

For the third point, we disabled ODR check if the declarations comes
from GMF. However, due to the forth point, the check whether the
declaration comes from GMF failed. Then we still going to check it and
then the users get false positive checks.

What's worse is that, this always happens in clangd, where will generate
the PCH automatically before parsing the input files.

The root cause of the problem we mixed the modules in the semantical
level and the module in the serialization level.

The problem is pretty fundamental and we need time to fix that. But 19.x
is going to be branched and I hope to give clangd better user
experience. So I decided to land this workaround even if it is pretyy
niche and may only work for the case of clangd's pattern.
2024-07-19 13:38:57 +08:00
Chuanqi Xu
91d40ef6e3 Revert "[C++20] [Modules] [Itanium ABI] Generate the vtable in the module unit of dynamic classes (#75912)"
This reverts commit 18f3bcbb13ca83d33223b00761d8cddf463e9ffb, 15bb02650e26875c48889053d6a9697444583721 and
99873b35da7ecb905143c8a6b8deca4d4416f1a9.

See the post commit message in
https://github.com/llvm/llvm-project/pull/75912 to see the reasons.
2024-07-10 10:58:18 +08:00
Chuanqi Xu
79b0966f2f [NFC] [Serialization] Refactor getLocalDeclID to 'LocalDeclID::get'
I just realized that the name `getLocalDeclID` looks like an member
function in ASTReader. It looks not good. So I decided to refactor this
into a static member function in LocalDeclID.
2024-06-24 14:10:52 +08:00
Fangrui Song
f3005d5b86 [Serialization] Change input file content hash from size_t to uint64_t
https://reviews.llvm.org/D67249 added content hash (see
-fvalidate-ast-input-files-content) using llvm::hash_code (size_t).
The hash value is 32-bit on 32-bit systems, which was unintentional.

Fix #96379: #96136 switched the hash function to xxh3_64bit but did not
update the ContentHash type, leading to mismatch between ASTReader and
ASTWriter.
2024-06-22 14:21:36 -07:00
Fangrui Song
874dcaea09
[Serialization] Use stable hash functions
clangSerialization currently uses hash_combine/hash_value from
Hashing.h, which are not guaranteed to be deterministic.
Replace these uses with xxh3_64bits.

Pull Request: https://github.com/llvm/llvm-project/pull/96136
2024-06-20 23:53:07 -07:00
Chuanqi Xu
03921b979d
[serialization] No transitive type change (#92511)
Following of https://github.com/llvm/llvm-project/pull/92085. 

#### motivation

The motivation is still cutting of the unnecessary change in the
dependency chain. See the above link (recursively) for details.

And this will be the last patch of the `no-transitive-*-change` series.
If there are any following patches, they might be C++20 Named modules
specific to handle special grammars like `ADL` (See the reply in
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/53
for example). So they won't affect the whole serialization part as the
series patch did.

#### example

After this patch, finally we are able to cut of unnecessary change of
types. For example,

```

//--- m-partA.cppm
export module m:partA;

//--- m-partA.v1.cppm
export module m:partA;

namespace NS {
    class A {
        public:
            int getValue() {
                return 43;
            }
    };
}

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
    return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
    return getB();
}
```

The BMI of `useBOnly.cppm` is expected to not change if we only add a
new class in `m:partA`. This will be pretty useful in practice.

#### implementation details

The key idea of this patch is similar with the previous patches: extend
the 32bits type ID to 64bits so that we can store the module file index
in the higher bits. Then the encoding of the type ID is independent on
the imported modules.

But there are two differences from the previous patches:
- TypeID is not completely an index of serialized types. We used the
lower 3 bits to store the qualifiers.
- TypeID won't take part in any lookup process. So the uses of TypeID is
much less than the previous patches.

The first difference make we have some more slightly complex bit
operations. And the second difference makes the patch much simpler than
the previous ones.
2024-06-21 09:21:40 +08:00
Chuanqi Xu
2f2ea3557b
[Serialization] No transitive identifier change (#92085)
Following of https://github.com/llvm/llvm-project/pull/92083

The motivation is still cutting of the unnecessary change in the
dependency chain. See the above link (recursively) for details.

After this patch, (and the above patch), we can already do something
pretty interesting. For example,

#### Motivation example

```

//--- m-partA.cppm
export module m:partA;

export inline int getA() {
    return 43;
}

export class A {
public:
    int getMem();
};

export template <typename T>
class ATempl {
public:
    T getT();
};

//--- m-partA.v1.cppm
export module m:partA;

export inline int getA() {
    return 43;
}

// Now we add a new declaration without introducing a new type.
// The consuming module which didn't use m:partA completely is expected to be
// not changed.
export inline int getA2() {
    return 88;
}

export class A {
public:
    int getMem();
    // Now we add a new declaration without introducing a new type.
    // The consuming module which didn't use m:partA completely is expected to be
    // not changed.
    int getMem2();
};

export template <typename T>
class ATempl {
public:
    T getT();
    // Add a new declaration without introducing a new type.
    T getT2();
};

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
    return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
    return getB();
}
```

In this example, module `m` exports two partitions `:partA` and
`:partB`. And a consumer `useBOnly` only consumes the entities from
`:partB`. So we don't hope the BMI of `useBOnly` changes if only
`:partA` changes. After this patch, we can make it if the change of
`:partA` doesn't introduce new types. (And we can get rid of this if we
make no-transitive-type-change).

As the example shows, when we change the implementation of `:partA` from
`m-partA.cppm` to `m-partA.v1.cppm`, we add new function declaration
`getA2()` at the global namespace, add a new member function `getMem2()`
to class `A` and add a new member function to `getT2()` to class
template `ATempl`. And since `:partA` is not used by `useBOnly`
completely, the BMI of `useBOnly` won't change after we made above
changes.

#### Design details

Method used in this patch is similar with
https://github.com/llvm/llvm-project/pull/92083 and
https://github.com/llvm/llvm-project/pull/86912. It extends the 32 bit
IdentifierID to 64 bits and use the higher 32 bits to store the module
file index. So that the encoding of the identifier won't get affected by
other modules.

#### Overhead

Similar with https://github.com/llvm/llvm-project/pull/92083 and
https://github.com/llvm/llvm-project/pull/86912. The change is only
expected to increase the size of the on-disk .pcm files and not affect
the compile-time performances. And from my experiment, the size of the
on-disk change only increase 1%+ and observe no compile-time impacts.

#### Future Plans

I'll try to do the same thing for type ids. IIRC, it won't change the
dependency graph if we add a new type in an unused units. I do think
this is a significant win. And this will be a pretty good answer to "why
modules are better than headers."
2024-06-20 13:30:05 +08:00
Chuanqi Xu
da2ad44119 [HeaderSearch] Introduce LazyIdentifierInfoPtr for Controlling Macro in HeaderFileInfo
This patch is helpful to reduce 32 bits for HeaderFileInfo by combining
a uint32_t and pointer into a tagged pointer.

This is reviewed as part of
https://github.com/llvm/llvm-project/pull/92085 and required to be split
as a separate commit
2024-06-20 13:13:47 +08:00
Chuanqi Xu
8af86025af [NFC] [Serialization] Unify how LocalDeclID can be created
Now we can create a LocalDeclID directly with an integer without
verifying. It may be hard to refactor if we want to change the way we
serialize DeclIDs (See https://github.com/llvm/llvm-project/pull/95897).
Also it is hard for us to debug if someday someone construct a
LocalDeclID with an incorrect value.

So in this patch, I tried to unify the way we can construct a
LocalDeclID in ASTReader, where we will construct the LocalDeclID from
the serialized data. Also, now we can verify the constructed LocalDeclID
sooner in the new interface.
2024-06-19 15:18:01 +08:00
Shilei Tian
ad599211a7
[Clang][AMDGPU] Add a new builtin type for buffer rsrc (#94830)
This patch adds a new builtin type for AMDGPU's buffer rsrc data type,
which is effectively an AS 8 pointer. This is needed because we'd like
to expose certain intrinsics to users via builtins which take buffer
rsrc as argument.
2024-06-18 20:46:53 -04:00
Chuanqi Xu
15bb02650e
[C++20] [Modules] [Itanium ABI] Generate the vtable in the module unit of dynamic classes (#75912)
Close https://github.com/llvm/llvm-project/issues/70585 and reflect
https://github.com/itanium-cxx-abi/cxx-abi/issues/170.

The significant change of the patch is: for dynamic classes attached to
module units, we generate the vtable to the attached module units
directly and the key functions for such classes is meaningless.
2024-06-17 10:25:35 +08:00
Ziqing Luo
2e7b95e4c0
[Safe Buffers] Serialize unsafe_buffer_usage pragmas (#92031)
The commit adds serialization and de-serialization implementations for
the stored regions. Basically, the serialized representation of the
regions of a PP is a (ordered) sequence of source location encodings.
For de-serialization, regions from loaded files are stored by their ASTs.
When later one queries if a loaded location L is in an opt-out
region, PP looks up the regions of the loaded AST where L is at.

(Background if helps: a pair of `#pragma clang unsafe_buffer_usage begin/end` pragmas marks a
warning-opt-out region. The begin and end locations (opt-out regions)
are stored in preprocessor instances (PP) and will be queried by the
`-Wunsafe-buffer-usage` analyzer.)

The reported issue at upstream: https://github.com/llvm/llvm-project/issues/90501
rdar://124035402
2024-06-13 22:44:24 -07:00
Chuanqi Xu
5a0181f568 [serialization] no transitive decl change (#92083)
Following of https://github.com/llvm/llvm-project/pull/86912

The motivation of the patch series is that, for a module interface unit
`X`, when the dependent modules of `X` changes, if the changes is not
relevant with `X`, we hope the BMI of `X` won't change. For the specific
patch, we hope if the changes was about irrelevant declaration changes,
we hope the BMI of `X` won't change. **However**, I found the patch
itself is not very useful in practice, since the adding or removing
declarations, will change the state of identifiers and types in most
cases.

That said, for the most simple example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of
`partA.cppm` to `partA.v1.cppm`. Since `partA.v1.cppm` introduces new
identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the new introduced declaration `int getA(int)` doesn't introduce
new identifiers and types, then the BMI of `onlyUseB` can keep
unchanged.

While it looks not so great, the patch should be the base of the patch
to erase the transitive change for identifiers and types since I don't
know how can we introduce new types and identifiers without introducing
new declarations. Given how tightly the relationship between
declarations, types and identifiers, I think we can only reach the ideal
state after we made the series for all of the three entties.

The design of the patch is similar to
https://github.com/llvm/llvm-project/pull/86912, which extends the
32-bit DeclID to 64-bit and use the higher bits to store the module file
index and the lower bits to store the Local Decl ID.

A slight difference is that we only use 48 bits to store the new DeclID
since we try to use the higher 16 bits to store the module ID in the
prefix of Decl class. Previously, we use 32 bits to store the module ID
and 32 bits to store the DeclID. I don't want to allocate additional
space so I tried to make the additional space the same as 64 bits. An
potential interesting thing here is about the relationship between the
module ID and the module file index. I feel we can get the module file
index by the module ID. But I didn't prove it or implement it. Since I
want to make the patch itself as small as possible. We can make it in
the future if we want.

Another change in the patch is the new concept Decl Index, which means
the index of the very big array `DeclsLoaded` in ASTReader. Previously,
the index of a loaded declaration is simply the Decl ID minus
PREDEFINED_DECL_NUMs. So there are some places they got used
ambiguously. But this patch tried to split these two concepts.

As https://github.com/llvm/llvm-project/pull/86912 did, the change will
increase the on-disk PCM file sizes. As the declaration ID may be the
most IDs in the PCM file, this can have the biggest impact on the size.
In my experiments, this change will bring 6.6% increase of the on-disk
PCM size. No compile-time performance regression observed. Given the
benefits in the motivation example, I think the cost is worthwhile.
2024-06-07 20:21:55 +08:00
Chuanqi Xu
4f70c5ec4a Revert "[serialization] no transitive decl change (#92083)"
This reverts commit 5c104879c1a98eeb845c03e7c45206bd48e88f0c.

The ArmV7 bot is complaining the change breaks the alignment.
2024-06-07 11:29:09 +08:00
Chuanqi Xu
5c104879c1 [serialization] no transitive decl change (#92083)
Following of https://github.com/llvm/llvm-project/pull/86912

The motivation of the patch series is that, for a module interface unit
`X`, when the dependent modules of `X` changes, if the changes is not
relevant with `X`, we hope the BMI of `X` won't change. For the specific
patch, we hope if the changes was about irrelevant declaration changes,
we hope the BMI of `X` won't change. **However**, I found the patch
itself is not very useful in practice, since the adding or removing
declarations, will change the state of identifiers and types in most
cases.

That said, for the most simple example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of
`partA.cppm` to `partA.v1.cppm`. Since `partA.v1.cppm` introduces new
identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the new introduced declaration `int getA(int)` doesn't introduce
new identifiers and types, then the BMI of `onlyUseB` can keep
unchanged.

While it looks not so great, the patch should be the base of the patch
to erase the transitive change for identifiers and types since I don't
know how can we introduce new types and identifiers without introducing
new declarations. Given how tightly the relationship between
declarations, types and identifiers, I think we can only reach the ideal
state after we made the series for all of the three entties.

The design of the patch is similar to
https://github.com/llvm/llvm-project/pull/86912, which extends the
32-bit DeclID to 64-bit and use the higher bits to store the module file
index and the lower bits to store the Local Decl ID.

A slight difference is that we only use 48 bits to store the new DeclID
since we try to use the higher 16 bits to store the module ID in the
prefix of Decl class. Previously, we use 32 bits to store the module ID
and 32 bits to store the DeclID. I don't want to allocate additional
space so I tried to make the additional space the same as 64 bits. An
potential interesting thing here is about the relationship between the
module ID and the module file index. I feel we can get the module file
index by the module ID. But I didn't prove it or implement it. Since I
want to make the patch itself as small as possible. We can make it in
the future if we want.

Another change in the patch is the new concept Decl Index, which means
the index of the very big array `DeclsLoaded` in ASTReader. Previously,
the index of a loaded declaration is simply the Decl ID minus
PREDEFINED_DECL_NUMs. So there are some places they got used
ambiguously. But this patch tried to split these two concepts.

As https://github.com/llvm/llvm-project/pull/86912 did, the change will
increase the on-disk PCM file sizes. As the declaration ID may be the
most IDs in the PCM file, this can have the biggest impact on the size.
In my experiments, this change will bring 6.6% increase of the on-disk
PCM size. No compile-time performance regression observed. Given the
benefits in the motivation example, I think the cost is worthwhile.
2024-06-07 10:47:53 +08:00
Chuanqi Xu
e2858189bd Revert "[serialization] no transitive decl change (#92083)"
This reverts commit 97c866f6c86456b3316006e6beff47e68a81c00a.

This fails on 32bit machines. See
https://github.com/llvm/llvm-project/pull/92083
2024-06-06 17:49:59 +08:00
Chuanqi Xu
97c866f6c8 [serialization] no transitive decl change (#92083)
Following of https://github.com/llvm/llvm-project/pull/86912

The motivation of the patch series is that, for a module interface unit
`X`, when the dependent modules of `X` changes, if the changes is not
relevant with `X`, we hope the BMI of `X` won't change. For the specific
patch, we hope if the changes was about irrelevant declaration changes,
we hope the BMI of `X` won't change. **However**, I found the patch
itself is not very useful in practice, since the adding or removing
declarations, will change the state of identifiers and types in most
cases.

That said, for the most simple example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of
`partA.cppm` to `partA.v1.cppm`. Since `partA.v1.cppm` introduces new
identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the new introduced declaration `int getA(int)` doesn't introduce
new identifiers and types, then the BMI of `onlyUseB` can keep
unchanged.

While it looks not so great, the patch should be the base of the patch
to erase the transitive change for identifiers and types since I don't
know how can we introduce new types and identifiers without introducing
new declarations. Given how tightly the relationship between
declarations, types and identifiers, I think we can only reach the ideal
state after we made the series for all of the three entties.

The design of the patch is similar to
https://github.com/llvm/llvm-project/pull/86912, which extends the
32-bit DeclID to 64-bit and use the higher bits to store the module file
index and the lower bits to store the Local Decl ID.

A slight difference is that we only use 48 bits to store the new DeclID
since we try to use the higher 16 bits to store the module ID in the
prefix of Decl class. Previously, we use 32 bits to store the module ID
and 32 bits to store the DeclID. I don't want to allocate additional
space so I tried to make the additional space the same as 64 bits. An
potential interesting thing here is about the relationship between the
module ID and the module file index. I feel we can get the module file
index by the module ID. But I didn't prove it or implement it. Since I
want to make the patch itself as small as possible. We can make it in
the future if we want.

Another change in the patch is the new concept Decl Index, which means
the index of the very big array `DeclsLoaded` in ASTReader. Previously,
the index of a loaded declaration is simply the Decl ID minus
PREDEFINED_DECL_NUMs. So there are some places they got used
ambiguously. But this patch tried to split these two concepts.

As https://github.com/llvm/llvm-project/pull/86912 did, the change will
increase the on-disk PCM file sizes. As the declaration ID may be the
most IDs in the PCM file, this can have the biggest impact on the size.
In my experiments, this change will bring 6.6% increase of the on-disk
PCM size. No compile-time performance regression observed. Given the
benefits in the motivation example, I think the cost is worthwhile.
2024-06-06 11:51:05 +08:00