The patch reapply https://github.com/llvm/llvm-project/pull/173130.
This patch implement the following papers:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).
At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:
- After skipping horizontal whitespace are
- at the start of a logical line, or
- preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
- <, ", or : (but not ::) pp tokens for import, or
- ; for module
Otherwise the token is treated as an identifier.
Additionally:
- The entire import or module directive (including the closing ;) must
be on a single logical line and for module must not come from an
#include.
- The expansion of macros must not result in an import or module
directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
in a file (excluding the global module fragment.)
- Preprocessor conditionals shall not span a module declaration.
After this patch, we handle C++ module-import and module-declaration as
a real pp-directive in preprocessor. Additionally, we refactor module
name lexing, remove the complex state machine and read full module name
during module/import directive handling. Possibly we can introduce a
tok::annot_module_name token in the future, avoid duplicatly parsing
module name in both preprocessor and parser, but it's makes error
recovery much diffcult(eg. import a; import b; in same line).
This patch also introduce 2 new keyword `__preprocessed_module` and
`__preprocessed_import`. These 2 keyword was generated during `-E` mode.
This is useful to avoid confusion with `module` and `import` keyword in
preprocessed output:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```
Fixes https://github.com/llvm/llvm-project/issues/54047
The previous patch has an use-after-free issue in
Lexer::LexTokenInternal function. Since C++20, the `export`, `import`
and `module` identifiers may be an introducer of a C++ module
declaration/importing directive, and the directive will handled in
`LexIdentifierContinue`. Unfortunately, the EOF may be encountered in
`LexIdentifierContinue` and `CurLexer` might be destructed in
`HandleEndOfFile`, If the code after `LexIdentifierContinue` try to
access `LangOps` or other class members in this Lexer, it will hit
undefined behavior.
This patch also fix the use-after-free issue in Lexer by introduce a
mechanism to delay the destruction of `CurLexer` in `Preprocessor`
class.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This PR implement the following papers:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).
At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:
- After skipping horizontal whitespace are
- at the start of a logical line, or
- preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
- <, ", or : (but not ::) pp tokens for import, or
- ; for module
Otherwise the token is treated as an identifier.
Additionally:
- The entire import or module directive (including the closing ;) must
be on a single logical line and for module must not come from an
#include.
- The expansion of macros must not result in an import or module
directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
in a file (excluding the global module fragment.)
- Preprocessor conditionals shall not span a module declaration.
After this patch, we handle C++ module-import and module-declaration as
a real pp-directive in preprocessor. Additionally, we refactor module
name lexing, remove the complex state machine and read full module name
during module/import directive handling. Possibly we can introduce a
tok::annot_module_name token in the future, avoid duplicatly parsing
module name in both preprocessor and parser, but it's makes error
recovery much diffcult(eg. import a; import b; in same line).
This patch also introduce 2 new keyword `__preprocessed_module` and
`__preprocessed_import`. These 2 keyword was generated during `-E` mode.
This is useful to avoid confusion with `module` and `import` keyword in
preprocessed output:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```
Fixes https://github.com/llvm/llvm-project/issues/54047
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
The TokenStream class is the representation of the source code that will
be fed into the GLR parser.
This patch allows a "raw" TokenStream to be built by reading source code.
It also supports scanning a TokenStream to find the directive structure.
Next steps (with placeholders in the code): heuristically choosing a
path through #ifs, preprocessing the code by stripping directives and comments.
These will produce a suitable stream to feed into the parser proper.
Differential Revision: https://reviews.llvm.org/D119162
Previously pragma annotation tokens were described as any other
annotations in TokenKinds.def. This change introduces special macro
PRAGMA_ANNOTATION for the pragma descriptions. It allows implementing
checks that deal with pragma annotations only.
Differential Revision: https://reviews.llvm.org/D65405
llvm-svn: 367575
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Implemented with a new getKeywordSpelling() accessor. Unlike getTokenName() the
result of this function is stable and may be used in diagnostic output.
Uses of this feature are split out into the subsequent commit.
llvm-svn: 198604
uncovered.
This required manually correcting all of the incorrect main-module
headers I could find, and running the new llvm/utils/sort_includes.py
script over the files.
I also manually added quite a few missing headers that were uncovered by
shuffling the order or moving headers up to be main-module-headers.
llvm-svn: 169237
know how to recover from an error, we can attach a hint to the
diagnostic that states how to modify the code, which can be one of:
- Insert some new code (a text string) at a particular source
location
- Remove the code within a given range
- Replace the code within a given range with some new code (a text
string)
Right now, we use these hints to annotate diagnostic information. For
example, if one uses the '>>' in a template argument in C++98, as in
this code:
template<int I> class B { };
B<1000 >> 2> *b1;
we'll warn that the behavior will change in C++0x. The fix is to
insert parenthese, so we use code insertion annotations to illustrate
where the parentheses go:
test.cpp:10:10: warning: use of right-shift operator ('>>') in template
argument will require parentheses in C++0x
B<1000 >> 2> *b1;
^
( )
Use of these annotations is partially implemented for HTML
diagnostics, but it's not (yet) producing valid HTML, which may be
related to PR2386, so it has been #if 0'd out.
In this future, we could consider hooking this mechanism up to the
rewriter to actually try to fix these problems during compilation (or,
after a compilation whose only errors have fixes). For now, however, I
suggest that we use these code modification hints whenever we can, so
that we get better diagnostics now and will have better coverage when
we find better ways to use this information.
This also fixes PR3410 by placing the complaint about missing tokens
just after the previous token (rather than at the location of the next
token).
llvm-svn: 65570
lib dir and move all the libraries into it. This follows the main
llvm tree, and allows the libraries to be built in parallel. The
top level now enforces that all the libs are built before Driver,
but we don't care what order the libs are built in. This speeds
up parallel builds, particularly incremental ones.
llvm-svn: 48402