17 Commits

Author SHA1 Message Date
yronglin
1da403937e
Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)" (#173789)
The patch reapply https://github.com/llvm/llvm-project/pull/173130.

This patch implement the following papers:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).

At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:

 - After skipping horizontal whitespace are
    - at the start of a logical line, or
    - preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
    - <, ", or : (but not ::) pp tokens for import, or
    - ; for module
Otherwise the token is treated as an identifier.

Additionally:

- The entire import or module directive (including the closing ;) must
be on a single logical line and for module must not come from an
#include.
- The expansion of macros must not result in an import or module
directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
in a file (excluding the global module fragment.)
- Preprocessor conditionals shall not span a module declaration.

After this patch, we handle C++ module-import and module-declaration as
a real pp-directive in preprocessor. Additionally, we refactor module
name lexing, remove the complex state machine and read full module name
during module/import directive handling. Possibly we can introduce a
tok::annot_module_name token in the future, avoid duplicatly parsing
module name in both preprocessor and parser, but it's makes error
recovery much diffcult(eg. import a; import b; in same line).

This patch also introduce 2 new keyword `__preprocessed_module` and
`__preprocessed_import`. These 2 keyword was generated during `-E` mode.
This is useful to avoid confusion with `module` and `import` keyword in
preprocessed output:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```

Fixes https://github.com/llvm/llvm-project/issues/54047

The previous patch has an use-after-free issue in
Lexer::LexTokenInternal function. Since C++20, the `export`, `import`
and `module` identifiers may be an introducer of a C++ module
declaration/importing directive, and the directive will handled in
`LexIdentifierContinue`. Unfortunately, the EOF may be encountered in
`LexIdentifierContinue` and `CurLexer` might be destructed in
`HandleEndOfFile`, If the code after `LexIdentifierContinue` try to
access `LangOps` or other class members in this Lexer, it will hit
undefined behavior.

This patch also fix the use-after-free issue in Lexer by introduce a
mechanism to delay the destruction of `CurLexer` in `Preprocessor`
class.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
2026-01-20 17:42:46 +08:00
yronglin
71bba12587
Revert "Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)" (#173549)
This reverts commit 0d1c396ce8178baf05f277b16bf41b8a6b847d6d.

Co-authored-by: Yihan Wang <yihwang@nvidia.com>
2025-12-25 19:55:40 +08:00
yronglin
0d1c396ce8
Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)
This PR reapply https://github.com/llvm/llvm-project/pull/107168.

---------

Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
2025-12-25 18:55:44 +08:00
Paul Kirth
2b8b305d46
Revert "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173118)
Reverts llvm/llvm-project#107168

This patch broke on bots:
- https://lab.llvm.org/buildbot/#/builders/190/builds/33105
- https://lab.llvm.org/buildbot/#/builders/94/builds/13727
- https://lab.llvm.org/buildbot/#/builders/169/builds/18192

and on mac-aarch64 builds.
see
https://github.com/llvm/llvm-project/pull/107168#issuecomment-3675990781
2025-12-19 15:17:31 -08:00
yronglin
d2e62d9024
[C++20][Modules] Implement P1857R3 Modules Dependency Discovery (#107168)
This PR implement the following papers:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).

At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:

 - After skipping horizontal whitespace are
    - at the start of a logical line, or
    - preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
    - <, ", or : (but not ::) pp tokens for import, or
    - ; for module
Otherwise the token is treated as an identifier.

Additionally:

- The entire import or module directive (including the closing ;) must
be on a single logical line and for module must not come from an
#include.
- The expansion of macros must not result in an import or module
directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
in a file (excluding the global module fragment.)
- Preprocessor conditionals shall not span a module declaration.

After this patch, we handle C++ module-import and module-declaration as
a real pp-directive in preprocessor. Additionally, we refactor module
name lexing, remove the complex state machine and read full module name
during module/import directive handling. Possibly we can introduce a
tok::annot_module_name token in the future, avoid duplicatly parsing
module name in both preprocessor and parser, but it's makes error
recovery much diffcult(eg. import a; import b; in same line).

This patch also introduce 2 new keyword `__preprocessed_module` and
`__preprocessed_import`. These 2 keyword was generated during `-E` mode.
This is useful to avoid confusion with `module` and `import` keyword in
preprocessed output:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```

Fixes https://github.com/llvm/llvm-project/issues/54047

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
2025-12-19 23:29:17 +08:00
yronglin
e6e874ce8f
[clang] Allow trivial pp-directives before C++ module directive (#153641)
Consider the following code:

```cpp
# 1 __FILE__ 1 3
export module a;
```

According to the wording in
[P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html):
```
A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
```

and the wording in
[[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file)
```
module-file:
    pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```

`#` is the first pp-token in the translation unit, and it was rejected
by clang, but they really should be exempted from this rule. The goal is
to not allow any preprocessor conditionals or most state changes, but
these don't fit that.

State change would mean most semantically observable preprocessor state,
particularly anything that is order dependent. Global flags like being a
system header/module shouldn't matter.

We should exempt a brunch of directives, even though it violates the
current standard wording.

In this patch, we introduce a `TrivialDirectiveTracer` to trace the
**State change** that described above and propose to exempt the
following kind of directive: `#line`, GNU line marker, `#ident`,
`#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma
clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC
error`, `#pragma gcc diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma
warning`, `#pragma execution_character_set`, `#pragma clang
assume_nonnull` and builtin macro expansion.

Fixes https://github.com/llvm/llvm-project/issues/145274

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
2025-08-18 14:17:35 +08:00
James Y Knight
c7f3437507
NFC: Clean up of IntrusiveRefCntPtr construction from raw pointers. (#151545)
Handles clang::DiagnosticsEngine and clang::DiagnosticIDs.

For DiagnosticIDs, this mostly migrates from `new DiagnosticIDs` to
convenience method `DiagnosticIDs::create()`.

Part of cleanup https://github.com/llvm/llvm-project/issues/151026
2025-07-31 15:07:35 -04:00
Jan Svoboda
13e1a2cb22 Reapply "[clang] Remove intrusive reference count from DiagnosticOptions (#139584)"
This reverts commit e2a885537f11f8d9ced1c80c2c90069ab5adeb1d. Build failures were fixed right away and reverting the original commit without the fixes breaks the build again.
2025-05-22 12:52:03 -07:00
Kazu Hirata
e2a885537f Revert "[clang] Remove intrusive reference count from DiagnosticOptions (#139584)"
This reverts commit 9e306ad4600c4d3392c194a8be88919ee758425c.

Multiple builtbot failures have been reported:
https://github.com/llvm/llvm-project/pull/139584
2025-05-22 12:44:20 -07:00
Jan Svoboda
9e306ad460
[clang] Remove intrusive reference count from DiagnosticOptions (#139584)
The `DiagnosticOptions` class is currently intrusively
reference-counted, which makes reasoning about its lifetime very
difficult in some cases. For example, `CompilerInvocation` owns the
`DiagnosticOptions` instance (wrapped in `llvm::IntrusiveRefCntPtr`) and
only exposes an accessor returning `DiagnosticOptions &`. One would
think this gives `CompilerInvocation` exclusive ownership of the object,
but that's not the case:

```c++
void shareOwnership(CompilerInvocation &CI) {
  llvm::IntrusiveRefCntPtr<DiagnosticOptions> CoOwner = &CI.getDiagnosticOptions();
  // ...
}
```

This is a perfectly valid pattern that is being actually used in the
codebase.

I would like to ensure the ownership of `DiagnosticOptions` by
`CompilerInvocation` is guaranteed to be exclusive. This can be
leveraged for a copy-on-write optimization later on. This PR changes
usages of `DiagnosticOptions` across `clang`, `clang-tools-extra` and
`lldb` to not be intrusively reference-counted.
2025-05-22 12:33:52 -07:00
Jan Svoboda
985410f87f
[clang] Hide the TargetOptions pointer from CompilerInvocation (#106271)
This PR hides the reference-counted pointer that holds `TargetOptions`
from the public API of `CompilerInvocation`. This gives
`CompilerInvocation` an exclusive control over the lifetime of this
member, which will eventually be leveraged to implement a copy-on-write
behavior.

There are two clients that currently share ownership of that pointer:

* `TargetInfo` - This was refactored to hold a non-owning reference to
`TargetOptions`. The options object is typically owned by the
`CompilerInvocation` or by the new `CompilerInstance::AuxTargetOpts` for
the auxiliary target. This needed a bit of care in `ASTUnit::Parse()` to
keep the `CompilerInvocation` alive.
* `clangd::PreambleData` - This was refactored to exclusively own the
`TargetOptions` that get moved out of the `CompilerInvocation`.
2025-04-28 07:43:26 -07:00
Jan Svoboda
1688c3062a
[clang] Do not share ownership of PreprocessorOptions (#133467)
This PR makes it so that `CompilerInvocation` is the sole owner of the
`PreprocessorOptions` instance.
2025-04-04 10:11:14 -07:00
Jan Svoboda
7a370748c0
[clang][lex] Store non-owning options ref in HeaderSearch (#132780)
This makes it so that `CompilerInvocation` can be the only entity that
manages ownership of `HeaderSearchOptions`, making it possible to
implement copy-on-write semantics.
2025-03-25 12:14:06 -07:00
Jonas Hahnfeld
3116d60494 [Lex] Introduce Preprocessor::LexTokensUntilEOF()
This new method repeatedly calls Lex() until end of file is reached
and optionally fills a std::vector of Tokens. Use it in Clang's unit
tests to avoid quite some code duplication.

Differential Revision: https://reviews.llvm.org/D158413
2023-10-05 11:04:07 +02:00
Chuanqi Xu
54cf24dc6e [NFC] Refactor ModuleDeclStateTest to make it not dependent on Frontend
Required in https://reviews.llvm.org/D137526. And it is indeed odd that
LexTest depends on ClangFrontend.
2023-02-15 11:30:18 +08:00
Jie Fu
35537aea12 [Modules][Test][NFC] Fix -Wsign-compare in clang/unittests/Lex/ModuleDeclStateTest.cpp
In file included from /data/jiefu/llvm-project/clang/unittests/Lex/ModuleDeclStateTest.cpp:22:
/data/jiefu/llvm-project/third-party/unittest/googletest/include/gtest/gtest.h:1526:11: error: comparison of integers of different signs: 'const unsigned long' and 'const int' [-Werror,-Wsign-compare]
  if (lhs == rhs) {
      ~~~ ^  ~~~
/data/jiefu/llvm-project/third-party/unittest/googletest/include/gtest/gtest.h:1553:12: note: in instantiation of function template specialization 'testing::internal::CmpHelperEQ<unsigned long, int>' requested here
    return CmpHelperEQ(lhs_expression, rhs_expression, lhs, rhs);
           ^
/data/jiefu/llvm-project/clang/unittests/Lex/ModuleDeclStateTest.cpp:124:3: note: in instantiation of function template specialization 'testing::internal::EqHelper::Compare<unsigned long, int, nullptr>' requested here
  EXPECT_EQ(Callback->importNamedModuleNum(), 0);
  ^
/data/jiefu/llvm-project/third-party/unittest/googletest/include/gtest/gtest.h:2027:54: note: expanded from macro 'EXPECT_EQ'
  EXPECT_PRED_FORMAT2(::testing::internal::EqHelper::Compare, val1, val2)
                                                     ^
1 error generated.
2023-02-10 10:51:21 +08:00
Chuanqi Xu
6470706bc0 [C++20] [Modules] [NFC] Add Preprocessor methods for named modules - for ClangScanDeps (1/4)
This patch prepares the necessary interfaces in the preprocessor part
for D137527 since we need to recognize if we're in a module unit, the
module kinds and the module declaration and the module we're importing
in the preprocessor.

Differential Revision: https://reviews.llvm.org/D137526
2023-02-10 10:11:36 +08:00