19 Commits

Author SHA1 Message Date
yronglin
1da403937e
Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)" (#173789)
The patch reapply https://github.com/llvm/llvm-project/pull/173130.

This patch implement the following papers:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).

At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:

 - After skipping horizontal whitespace are
    - at the start of a logical line, or
    - preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
    - <, ", or : (but not ::) pp tokens for import, or
    - ; for module
Otherwise the token is treated as an identifier.

Additionally:

- The entire import or module directive (including the closing ;) must
be on a single logical line and for module must not come from an
#include.
- The expansion of macros must not result in an import or module
directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
in a file (excluding the global module fragment.)
- Preprocessor conditionals shall not span a module declaration.

After this patch, we handle C++ module-import and module-declaration as
a real pp-directive in preprocessor. Additionally, we refactor module
name lexing, remove the complex state machine and read full module name
during module/import directive handling. Possibly we can introduce a
tok::annot_module_name token in the future, avoid duplicatly parsing
module name in both preprocessor and parser, but it's makes error
recovery much diffcult(eg. import a; import b; in same line).

This patch also introduce 2 new keyword `__preprocessed_module` and
`__preprocessed_import`. These 2 keyword was generated during `-E` mode.
This is useful to avoid confusion with `module` and `import` keyword in
preprocessed output:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```

Fixes https://github.com/llvm/llvm-project/issues/54047

The previous patch has an use-after-free issue in
Lexer::LexTokenInternal function. Since C++20, the `export`, `import`
and `module` identifiers may be an introducer of a C++ module
declaration/importing directive, and the directive will handled in
`LexIdentifierContinue`. Unfortunately, the EOF may be encountered in
`LexIdentifierContinue` and `CurLexer` might be destructed in
`HandleEndOfFile`, If the code after `LexIdentifierContinue` try to
access `LangOps` or other class members in this Lexer, it will hit
undefined behavior.

This patch also fix the use-after-free issue in Lexer by introduce a
mechanism to delay the destruction of `CurLexer` in `Preprocessor`
class.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
2026-01-20 17:42:46 +08:00
yronglin
71bba12587
Revert "Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)" (#173549)
This reverts commit 0d1c396ce8178baf05f277b16bf41b8a6b847d6d.

Co-authored-by: Yihan Wang <yihwang@nvidia.com>
2025-12-25 19:55:40 +08:00
yronglin
0d1c396ce8
Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)
This PR reapply https://github.com/llvm/llvm-project/pull/107168.

---------

Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
2025-12-25 18:55:44 +08:00
Paul Kirth
2b8b305d46
Revert "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173118)
Reverts llvm/llvm-project#107168

This patch broke on bots:
- https://lab.llvm.org/buildbot/#/builders/190/builds/33105
- https://lab.llvm.org/buildbot/#/builders/94/builds/13727
- https://lab.llvm.org/buildbot/#/builders/169/builds/18192

and on mac-aarch64 builds.
see
https://github.com/llvm/llvm-project/pull/107168#issuecomment-3675990781
2025-12-19 15:17:31 -08:00
yronglin
d2e62d9024
[C++20][Modules] Implement P1857R3 Modules Dependency Discovery (#107168)
This PR implement the following papers:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).

At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:

 - After skipping horizontal whitespace are
    - at the start of a logical line, or
    - preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
    - <, ", or : (but not ::) pp tokens for import, or
    - ; for module
Otherwise the token is treated as an identifier.

Additionally:

- The entire import or module directive (including the closing ;) must
be on a single logical line and for module must not come from an
#include.
- The expansion of macros must not result in an import or module
directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
in a file (excluding the global module fragment.)
- Preprocessor conditionals shall not span a module declaration.

After this patch, we handle C++ module-import and module-declaration as
a real pp-directive in preprocessor. Additionally, we refactor module
name lexing, remove the complex state machine and read full module name
during module/import directive handling. Possibly we can introduce a
tok::annot_module_name token in the future, avoid duplicatly parsing
module name in both preprocessor and parser, but it's makes error
recovery much diffcult(eg. import a; import b; in same line).

This patch also introduce 2 new keyword `__preprocessed_module` and
`__preprocessed_import`. These 2 keyword was generated during `-E` mode.
This is useful to avoid confusion with `module` and `import` keyword in
preprocessed output:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```

Fixes https://github.com/llvm/llvm-project/issues/54047

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
2025-12-19 23:29:17 +08:00
Sam McCall
7c1ee5e95f [Pseudo] Token/TokenStream, PP directive parser.
The TokenStream class is the representation of the source code that will
be fed into the GLR parser.

This patch allows a "raw" TokenStream to be built by reading source code.
It also supports scanning a TokenStream to find the directive structure.

Next steps (with placeholders in the code): heuristically choosing a
path through #ifs, preprocessing the code by stripping directives and comments.
These will produce a suitable stream to feed into the parser proper.

Differential Revision: https://reviews.llvm.org/D119162
2022-02-23 17:52:02 +01:00
Serge Pavlov
fcb6123d05 Use switch instead of series of comparisons
This is style correction, no functional changes.

Differential Revision: https://reviews.llvm.org/D65670

llvm-svn: 367759
2019-08-03 16:32:49 +00:00
Serge Pavlov
3c26163d1a [Parser] Use special definition for pragma annotations
Previously pragma annotation tokens were described as any other
annotations in TokenKinds.def. This change introduces special macro
PRAGMA_ANNOTATION for the pragma descriptions. It allows implementing
checks that deal with pragma annotations only.

Differential Revision: https://reviews.llvm.org/D65405

llvm-svn: 367575
2019-08-01 15:15:10 +00:00
Chandler Carruth
2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Craig Topper
f1186c5a8f [C++11] Use 'nullptr'.
llvm-svn: 208280
2014-05-08 06:41:40 +00:00
Alp Toker
b3f9501b25 Prospective MSVC 2010 build fix
Try to fix Compiler Error C2011 following r198607 by removing enum from 'enum
TokenKind' parameter types.

llvm-svn: 198621
2014-01-06 15:52:13 +00:00
Alp Toker
a231ad2216 Support diagnostic formatting of keyword tokens
Implemented with a new getKeywordSpelling() accessor. Unlike getTokenName() the
result of this function is stable and may be used in diagnostic output.

Uses of this feature are split out into the subsequent commit.

llvm-svn: 198604
2014-01-06 12:54:18 +00:00
Alp Toker
6d35eab5a6 Rename getTokenSimpleSpelling() to getPunctuatorSpelling()
That's what it does, what the documentation says it does and what callers
expect it to do.

llvm-svn: 198603
2014-01-06 12:54:07 +00:00
Alp Toker
637b347ed0 Apply some LLVM_READONLY / LLVM_READNONE on diagnostic functions
llvm-svn: 198598
2014-01-06 11:30:15 +00:00
Chandler Carruth
3a02247dc9 Sort all of Clang's files under 'lib', and fix up the broken headers
uncovered.

This required manually correcting all of the incorrect main-module
headers I could find, and running the new llvm/utils/sort_includes.py
script over the files.

I also manually added quite a few missing headers that were uncovered by
shuffling the order or moving headers up to be main-module-headers.

llvm-svn: 169237
2012-12-04 09:13:33 +00:00
Kovarththanan Rajaratnam
7632da4b8a This patch adds a PUNCTUATOR macro (specialization of TOK) in TokenKinds.def and makes use of it in tok::getTokenSimpleSpelling.
llvm-svn: 90042
2009-11-28 16:09:28 +00:00
Douglas Gregor
96977da72c Clean up and document code modification hints.
llvm-svn: 65641
2009-02-27 17:53:17 +00:00
Douglas Gregor
87f95b0a6a Introduce code modification hints into the diagnostics system. When we
know how to recover from an error, we can attach a hint to the
diagnostic that states how to modify the code, which can be one of:

  - Insert some new code (a text string) at a particular source
    location
  - Remove the code within a given range
  - Replace the code within a given range with some new code (a text
    string)

Right now, we use these hints to annotate diagnostic information. For
example, if one uses the '>>' in a template argument in C++98, as in
this code:

  template<int I> class B { };
  B<1000 >> 2> *b1;

we'll warn that the behavior will change in C++0x. The fix is to
insert parenthese, so we use code insertion annotations to illustrate
where the parentheses go:

test.cpp:10:10: warning: use of right-shift operator ('>>') in template
argument will require parentheses in C++0x
  B<1000 >> 2> *b1;
         ^
    (        )


Use of these annotations is partially implemented for HTML
diagnostics, but it's not (yet) producing valid HTML, which may be
related to PR2386, so it has been #if 0'd out.

In this future, we could consider hooking this mechanism up to the
rewriter to actually try to fix these problems during compilation (or,
after a compilation whose only errors have fixes). For now, however, I
suggest that we use these code modification hints whenever we can, so
that we get better diagnostics now and will have better coverage when
we find better ways to use this information.

This also fixes PR3410 by placing the complaint about missing tokens
just after the previous token (rather than at the location of the next
token).

llvm-svn: 65570
2009-02-26 21:00:50 +00:00
Chris Lattner
7a51313d8a Make a major restructuring of the clang tree: introduce a top-level
lib dir and move all the libraries into it.  This follows the main
llvm tree, and allows the libraries to be built in parallel.  The
top level now enforces that all the libs are built before Driver,
but we don't care what order the libs are built in.  This speeds
up parallel builds, particularly incremental ones.

llvm-svn: 48402
2008-03-15 23:59:48 +00:00