This removes the assumption that a deserialized module is backed by a
`FileEntry`. The `OptionalFileEntryRef` member is replaced with
`ModuleFile{Name,Key}`.
This PR upstreams a piece of infrastructure from Swift's LLVM fork. This
allows controlling behavior of the dependency scanner, e.g. modifying
the primary compiler invocation and modifying compiler instances (the
primary one and for imported modules). This will be used to implement
modules support in the CAS-based compilation caching mode.
Fold normalization of `.`'s into `FileManager::makeAbsolutePath` so that
different places that need to see through it can just call it instead of
handling it in wrappers.
There shouldn't be a functional impact.
This patch essentially reverts 62fec3d2 which was an NFC, and replaces
the remaining check for the output format with more generic scanning
service option that controls whether we report absolute or relative file
paths. Along with #182063 this makes the scanner implementation entirely
independent of the desired output format.
Adding new configuration knobs in the scanner is fairly painful now,
especially with a diverging downstream. This patch extracts what was
previously passed into the service constructor into a struct. This
encourages one knob customization per line, reduces difficult merge
conflicts, `/*ArgName=*/`-style comments with copy-pasted defaults, etc.
The patch reapply https://github.com/llvm/llvm-project/pull/173130.
This patch implement the following papers:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).
At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:
- After skipping horizontal whitespace are
- at the start of a logical line, or
- preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
- <, ", or : (but not ::) pp tokens for import, or
- ; for module
Otherwise the token is treated as an identifier.
Additionally:
- The entire import or module directive (including the closing ;) must
be on a single logical line and for module must not come from an
#include.
- The expansion of macros must not result in an import or module
directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
in a file (excluding the global module fragment.)
- Preprocessor conditionals shall not span a module declaration.
After this patch, we handle C++ module-import and module-declaration as
a real pp-directive in preprocessor. Additionally, we refactor module
name lexing, remove the complex state machine and read full module name
during module/import directive handling. Possibly we can introduce a
tok::annot_module_name token in the future, avoid duplicatly parsing
module name in both preprocessor and parser, but it's makes error
recovery much diffcult(eg. import a; import b; in same line).
This patch also introduce 2 new keyword `__preprocessed_module` and
`__preprocessed_import`. These 2 keyword was generated during `-E` mode.
This is useful to avoid confusion with `module` and `import` keyword in
preprocessed output:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```
Fixes https://github.com/llvm/llvm-project/issues/54047
The previous patch has an use-after-free issue in
Lexer::LexTokenInternal function. Since C++20, the `export`, `import`
and `module` identifiers may be an introducer of a C++ module
declaration/importing directive, and the directive will handled in
`LexIdentifierContinue`. Unfortunately, the EOF may be encountered in
`LexIdentifierContinue` and `CurLexer` might be destructed in
`HandleEndOfFile`, If the code after `LexIdentifierContinue` try to
access `LangOps` or other class members in this Lexer, it will hit
undefined behavior.
This patch also fix the use-after-free issue in Lexer by introduce a
mechanism to delay the destruction of `CurLexer` in `Preprocessor`
class.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This PR unifies the terminology for:
* "context hash" - previously ambiguously referred to as "module hash"
or as overly specific "module context hash"
* "specific module cache path" - previously referred to as just "module
cache path" - hard to distinguish from the command-line-provided module
cache path without the context hash
NFCI
This PR implement the following papers:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).
At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:
- After skipping horizontal whitespace are
- at the start of a logical line, or
- preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
- <, ", or : (but not ::) pp tokens for import, or
- ; for module
Otherwise the token is treated as an identifier.
Additionally:
- The entire import or module directive (including the closing ;) must
be on a single logical line and for module must not come from an
#include.
- The expansion of macros must not result in an import or module
directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
in a file (excluding the global module fragment.)
- Preprocessor conditionals shall not span a module declaration.
After this patch, we handle C++ module-import and module-declaration as
a real pp-directive in preprocessor. Additionally, we refactor module
name lexing, remove the complex state machine and read full module name
during module/import directive handling. Possibly we can introduce a
tok::annot_module_name token in the future, avoid duplicatly parsing
module name in both preprocessor and parser, but it's makes error
recovery much diffcult(eg. import a; import b; in same line).
This patch also introduce 2 new keyword `__preprocessed_module` and
`__preprocessed_import`. These 2 keyword was generated during `-E` mode.
This is useful to avoid confusion with `module` and `import` keyword in
preprocessed output:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```
Fixes https://github.com/llvm/llvm-project/issues/54047
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
This patch is the first of two in refactoring Clang's dependency
scanning tooling to remove its dependency on clangDriver.
It separates Tooling/DependencyScanningTool.cpp from the rest of
clangDependencyScanning and moves clangDependencyScanning out of
clangTooling into its own library. No functional changes are
introduced.
The follow-up patch (#169964) will restrict clangDependencyScanning to
handling only -cc1 command line inputs and will move all functionality
related to handling driver commands into clangTooling.
(Tooling/DependencyScanningTool.cpp).
This is part of a broader effort to support driver-managed builds for
compilations using C++ named modules and/or Clang modules. It is
required for linking the dependency scanning tooling against the driver
without introducing cyclic dependencies, which would otherwise cause
build failures when dynamic linking is enabled.
The RFC for this change can be found here:
https://discourse.llvm.org/t/rfc-new-clangoptions-library-remove-dependency-on-clangdriver-from-clangfrontend-and-flangfrontend/88773?u=naveen-seth