This patch is part of a series to support driver managed module builds
for C++ named modules and Clang modules.
This introduces a scanner that detects C++ named module usage early in
the driver with only negligible overhead.
For now, it is enabled only with the `-fmodules-driver` flag and serves
solely diagnostic purposes. In the future, the scanner will be enabled
for any (modules-driver compatible) compilation with two or more inputs,
and will help the driver determine whether to implicitly enable the
modules driver.
Since the scanner adds very little overhead, we are also exploring
enabling it for compilations with only a single input. This approach
could allow us to detect `import std` usage in a single-file
compilation, which would then activate the modules driver. For
performance measurements on this, see
https://github.com/naveen-seth/llvm-dev-cxx-modules-check-benchmark.
RFC for driver managed module builds:
https://discourse.llvm.org/t/rfc-modules-support-simple-c-20-modules-use-from-the-clang-driver-without-a-build-system
This patch relands the reland (2d31fc8) for commit ded1426. The earlier
reland failed due to a missing link dependency on `clangLex`. This
reland fixes the issue by adding the link dependency after discussing it
in the following RFC:
https://discourse.llvm.org/t/rfc-driver-link-the-driver-against-clangdependencyscanning-clangast-clangfrontend-clangserialization-and-clanglex
Consider the following code:
```cpp
# 1 __FILE__ 1 3
export module a;
```
According to the wording in
[P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html):
```
A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
```
and the wording in
[[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file)
```
module-file:
pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```
`#` is the first pp-token in the translation unit, so clang rejected the
module directive. Directives like this line marker really should be
exempted from the rule. The goal of the rule is to disallow preprocessor
conditionals and most state changes, and these directives are neither.
State change would mean most semantically observable preprocessor state,
particularly anything that is order dependent. Global flags like being a
system header/module shouldn't matter.
We should exempt a bunch of directives, even though doing so violates
the current standard wording.
In this patch, we introduce a `TrivialDirectiveTracer` to trace the
**state changes** described above and propose to exempt the
following kinds of directives: `#line`, GNU line markers, `#ident`,
`#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma
clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC
error`, `#pragma GCC diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma
warning`, `#pragma execution_character_set`, `#pragma clang
assume_nonnull`, and builtin macro expansion.
Fixes https://github.com/llvm/llvm-project/issues/145274
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Fixes #152829
---
This patch addresses the issue where the preprocessor could crash when
parsing `#embed` parameters containing unmatched closing brackets:
```cpp
#embed "file" prefix(])
#embed "file" prefix(})
```
This patch is part of a series to natively support C++20 module usage
from the Clang driver (without requiring an external build system). This
introduces a new scanner that detects C++20 module usage in source files
without using the preprocessor or lexer.
For now, it is enabled only with the `-fmodules-driver` flag and serves
solely diagnostic purposes. In the future, the scanner will be enabled
for any (modules-driver compatible) compilation with two or more inputs,
and will help the driver determine whether to implicitly enable the
modules driver.
Since the scanner adds very little overhead, we are also exploring
enabling it for compilations with only a single input. This approach
could allow us to detect `import std` usage in a single-file
compilation, which would then activate the modules driver. For
performance measurements on this, see
https://github.com/naveen-seth/llvm-dev-cxx-modules-check-benchmark.
RFC:
https://discourse.llvm.org/t/rfc-modules-support-simple-c-20-modules-use-from-the-clang-driver-without-a-build-system
This patch relands commit ded1426. The CI failure is resolved by
removing the compatibility warning for using the `-fmodules-driver` flag
with pre-C++20 standards, which also better aligns its behavior with
other features/flags supported only in newer standards.
Fixes #149669; the old check compared with the end of the literal, but
we can just check that after parsing digits, we're pointing to one
character past the token start.
Previously, the newline after a module directive was not properly
captured and printed by `clang::printDependencyDirectivesAsSource`.
According to P1857R3, each directive must, after skipping horizontal
whitespace, appear at the start of a logical line. Because the newline
after module directives was missing, this invalidated the following
line.
This fixes tests that were previously in violation of P1857R3,
including for Objective-C directives, which should also comply with
P1857R3.
This also ensures that the global module fragment `module;` is captured
by the dependency directives scanner.
The dependency directive scanner was incorrectly classifying uses of a
namespace named `import`, such as `import::inner xi`, as directives.
According to P1857R3, `import` should not be treated as a directive when
followed by `::`.
This change fixes that behavior.
This PR addresses instances of compiler warning C4146 where the flagged
expression can be replaced with `std::numeric_limits`. Specifically,
these are cases where a literal such as `-1ULL` was used to assign a
value to a `uint64_t` variable. The intent is much clearer if we use the
appropriate `std::numeric_limits<Type>::max()` value for these cases.
Addresses #147439
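As an illustration of the kind of replacement described above, here is a
minimal sketch. The names `InvalidOffset` and `isValidOffset` are
hypothetical and do not come from the patch; only the `-1ULL` to
`std::numeric_limits` substitution mirrors the description.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sentinel, used only for this illustration.
// Before: negating an unsigned literal triggers MSVC warning C4146.
//   constexpr std::uint64_t InvalidOffset = -1ULL;
// After: the intent (largest representable value) is spelled out.
constexpr std::uint64_t InvalidOffset =
    std::numeric_limits<std::uint64_t>::max();

// Both spellings yield the same value, so behavior is unchanged.
static_assert(InvalidOffset == static_cast<std::uint64_t>(-1),
              "same value as -1ULL");

// Hypothetical helper showing how such a sentinel is typically checked.
inline bool isValidOffset(std::uint64_t Offset) {
  return Offset != InvalidOffset;
}
```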
The `SourceLocation` of a `RootSignatureToken` is incorrectly set to be
the "offset" into the concatenated string that denotes the
rootsignature. This causes an issue when the `StringLiteral` is a
multi-line expansion macro, since the offset will not account for the
characters between `StringLiteral` tokens.
This PR resolves this by retaining the `SourceLocation` information that
is kept in `StringLiteral` and then converting the offset in the
concatenated string into the proper `SourceLocation` using the
`StringLiteral::getLocationOfByte` interface. To do so, we need to
adjust the `RootSignatureToken` to only hold its offset into the root
signature string. Then, when the parser uses the token, it needs
to compute its actual `SourceLocation`.
See linked issue for more context.
For example:
```
#define DemoRootSignature \
"CBV(b0)," \
"RootConstants(num32BitConstants = 3, b0, invalid)"
expected caret location ---------------^
actual caret location ------------^
```
The caret points 5 characters early because the current offset did not
account for the characters:
```
'"' ' ' '\' ' ' '"'
1 2 3 4 5
```
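The offset-to-location conversion can be sketched as follows. This is
not the `StringLiteral::getLocationOfByte` implementation, only a
simplified model of the idea: `LiteralPiece`, `SourcePos`, and
`sourcePosOfByte` are hypothetical names, and source positions are plain
integers rather than `SourceLocation`s.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one quoted piece of a concatenated literal.
struct LiteralPiece {
  std::size_t SourcePos; // position of the first character after the quote
  std::string Text;      // contents between the quotes
};

// Maps a byte offset in the concatenated string back to a source position
// by walking the pieces and skipping the characters between them.
inline std::size_t sourcePosOfByte(const std::vector<LiteralPiece> &Pieces,
                                   std::size_t ByteOffset) {
  for (const LiteralPiece &P : Pieces) {
    if (ByteOffset < P.Text.size())
      return P.SourcePos + ByteOffset;
    ByteOffset -= P.Text.size(); // the byte lives in a later piece
  }
  assert(false && "offset past the end of the literal");
  return 0;
}
```

With a first piece `CBV(b0),` starting at position 1 and a second piece
starting at position 14, offset 8 maps to 14 instead of the naive 9,
which is the 5-character gap described above.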
- Updates `RootSignatureParser` to retain `SourceLocation` information
by retaining the `StringLiteral` and passing the underlying `StringRef`
to the `Lexer`
- Updates `RootSignatureLexer` so that the constructed tokens only
reflect an offset into the `StringRef`
- Updates `RootSignatureParser` to directly construct its used `Lexer`
so that the `StringLiteral` is directly tied with the string used in the
`RootSignatureLexer`
- Updates `RootSignatureParser` to use
`StringLiteral::getLocationOfByte` to get the actual token location for
diagnostics
- Updates `ParseHLSLRootSignatureTest` to construct a phony
`AST`/`StringLiteral` for the test cases
- Adds a test to `RootSignature-err.hlsl` showing that the
`SourceLocation` is correctly set for diagnostics in a multi-line macro
expansion
Resolves: https://github.com/llvm/llvm-project/issues/146967
In ms-compatibility mode we inject a `static_assert` macro definition if
the `assert` macro is defined. This was done by
8da090381d567d0ec555840f6b2a651d2997e4b3
for the sake of better diagnostics, in particular to emit a compatibility
warning when the `static_assert` keyword is used without including
<assert.h>. Unfortunately, it doesn't do a good job in C99 mode, where it
adds that macro unexpectedly for users, so this patch removes the macro
injection and the diagnostics.
---------
Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>
Otherwise we are continuing in an invalid state and can easily crash.
It is a follow-up to cde90e68f8123e7abef3f9e18d79980aa19f460a but an
important difference is when a failure happens in a submodule. In this
case in `Preprocessor::HandleEndOfFile` `tok::eof` is replaced by
`tok::annot_module_end`. And after exiting a file with bad
`#include/#import` we work with a new buffer, so `BufferPtr < BufferEnd`.
As there are no signs to stop lexing we just keep doing it.
The fix is the same as in dc9fdaf2171cc480300d5572606a8ede1678d18b in
`Lexer::LexTokenInternal` but this time in
`Lexer::LexDependencyDirectiveToken` as well.
rdar://152499276
`clang-scan-deps` falsely reported an "unterminated conditional
directive" error on the following example:
```
#ifndef __TEST
#define __TEST
#if defined(__TEST_DUMMY)
#if defined(__TEST_DUMMY2)
#pragma GCC warning \
"Hello!"
#else
#pragma GCC error \
"World!"
#endif // defined(__TEST_DUMMY2)
#endif // defined(__TEST_DUMMY)
#endif // #ifndef __TEST
```
The issue comes from PR #143950, where the flag `LastNonWhitespace` does
not correctly represent the state for the example above. The PR aimed to
allow a line continuation to be followed by whitespace.
This commit fixes the issue by moving the `LastNonWhitespace` variable
to the inner loop so that it will be correctly reset.
rdar://153742186
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Introduce a type alias for the commonly used `std::pair<FileID,
unsigned>` to improve code readability, and make it easier for future
updates (64-bit source locations).
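A minimal sketch of what such an alias looks like in use. The `FileID`
struct below is a reduced stand-in, and both the alias name
`FileIDAndOffset` and the helper `isInSameFile` are assumptions made for
this illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical minimal stand-in for clang::FileID.
struct FileID {
  int ID = 0;
  bool operator==(const FileID &Other) const { return ID == Other.ID; }
};

// One name for the decomposed location. If the offset type ever changes
// (for example for 64-bit source locations), only this line is touched.
using FileIDAndOffset = std::pair<FileID, unsigned>;

// Hypothetical helper written against the alias instead of the raw pair.
inline bool isInSameFile(const FileIDAndOffset &A, const FileIDAndOffset &B) {
  return A.first == B.first;
}
```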
Depends on [[clang][Preprocessor] Add peekNextPPToken, makes look ahead
next token without
side-effects](https://github.com/llvm/llvm-project/pull/143898).
This PR fixes the performance regression that was introduced in
https://github.com/llvm/llvm-project/pull/144233.
The original PR (https://github.com/llvm/llvm-project/pull/144233) handled
the first pp-token in the main source file in the macro
definition/expansion and `Lexer::Lex`, but the lexer is almost always on
the hot path, so this can cause a performance regression. In this PR, we
handle the first pp-token in `Preprocessor::EnterMainSourceFile`.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This PR follows the
suggestion (https://github.com/llvm/llvm-project/pull/143898#discussion_r2164253141)
to refine the implementation of `Preprocessor::isNextPPToken`, and also
uses a C++ fold expression to refine `Token::isOneOf`. We don't need `bool
isOneOf(tok::TokenKind K1, tok::TokenKind K2) const` anymore.
In order to reduce the impact, the specified `TokenKind` is still passed to
`Token::isOneOf` and `Preprocessor::isNextPPTokenOneOf` as function
parameters.
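A minimal sketch of the fold-expression approach, assuming C++17. The
`TokenKind` enum and the `Token` class here are reduced, hypothetical
stand-ins and not the real `clang::Token`.

```cpp
#include <cassert>

// Hypothetical reduced set of token kinds.
enum class TokenKind { identifier, l_paren, semi, coloncolon, eof };

class Token {
  TokenKind Kind;

public:
  explicit Token(TokenKind K) : Kind(K) {}

  bool is(TokenKind K) const { return Kind == K; }

  // A single variadic template replaces the two-argument overload plus
  // the recursive variadic one; the fold expands to is(K1) || is(K2) || ...
  template <typename... Ts> bool isOneOf(Ts... Ks) const {
    static_assert(sizeof...(Ts) > 0, "requires at least one token kind");
    return (is(Ks) || ...);
  }
};
```

The kinds are still passed as ordinary function parameters, so existing
call sites such as `Tok.isOneOf(TokenKind::l_paren, TokenKind::semi)`
keep compiling unchanged.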
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Fixes https://github.com/llvm/llvm-project/issues/145240.
UCNs in an identifier produced by preprocessor token pasting were not
resolved to Unicode, which may cause the following issue:
```c
#define CAT(a,b) a##b
char foo\u00b5;
char*p = &CAT(foo, \u00b5); // error: use of undeclared identifier 'foo\u00b5'
```
The real identifier after pasting is `fooµ`. This PR fixes the issue in
`TokenLexer::pasteTokens`: if any of the pasted tokens contains a UCN, the
final pasted token gets the `Token::HasUCN` flag. Then
`Preprocessor::LookUpIdentifierInfo` will expand the UCNs in this token.
Signed-off-by: yronglin <yronglin777@gmail.com>
This PR introduces a new function, `peekNextPPToken`. It's an extension of
`isNextPPTokenLParen` and looks ahead one token in the preprocessor
without side effects.
It's also the 1st part of
https://github.com/llvm/llvm-project/pull/107168, where it is used to look
ahead at the next token and determine whether the pp directive currently
being lexed is a pp-import or pp-module directive.
At the start of phase 4, an import or module token is treated as starting
a directive and is converted to its respective keyword iff:
- After skipping horizontal whitespace, it is
  - at the start of a logical line, or
  - preceded by an export at the start of the logical line.
- It is followed by an identifier pp token (before macro expansion), or
  - <, ", or : (but not ::) pp tokens for import, or
  - ; for module
Otherwise the token is treated as an identifier.
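The rule above can be sketched as a predicate over the pp-tokens of one
logical line. This is a simplified model and not the preprocessor's
implementation: tokens are plain strings with horizontal whitespace
already skipped, and `isIdentifier` and `startsModuleOrImportDirective`
are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified identifier check: only the first character is inspected.
inline bool isIdentifier(const std::string &Tok) {
  if (Tok.empty())
    return false;
  unsigned char C = static_cast<unsigned char>(Tok[0]);
  return std::isalpha(C) || C == '_';
}

// Line holds the pp-tokens of one logical line, in order.
inline bool startsModuleOrImportDirective(const std::vector<std::string> &Line) {
  std::size_t I = 0;
  if (I < Line.size() && Line[I] == "export")
    ++I; // an optional leading export is allowed
  if (I >= Line.size() || (Line[I] != "import" && Line[I] != "module"))
    return false;
  const bool IsImport = Line[I] == "import";
  if (++I >= Line.size())
    return false; // nothing follows on this logical line
  const std::string &Next = Line[I];
  if (isIdentifier(Next))
    return true;
  if (IsImport) // "::" is a distinct pp-token and does not match ":"
    return Next == "<" || Next[0] == '"' || Next == ":";
  return Next == ";";
}
```

Under this model, `import::inner xi;` is not a directive because the
token after `import` is `::`, while `module;` and `export module a;`
both are.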
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This PR is the 2nd part of the
[P1857R3](https://github.com/llvm/llvm-project/pull/107168)
implementation, and mainly implements the restriction `A module directive
may only appear as the first preprocessing tokens in a file (excluding
the global module fragment.)`:
[cpp.pre](https://eel.is/c++draft/cpp.pre):
```
module-file:
pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```
We also refine the tests to use `split-file` instead of conditional macros.
Signed-off-by: yronglin <yronglin777@gmail.com>
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
P2223R2 allows the line-continuation backslash `\` to be followed by
additional whitespace. The Clang lexer already implements this behavior,
including for versions prior to C++23. The dependency directive scanner,
however, only implements it for `#define` directives (15d5f5d).
This fully implements P2223R2 for the dependency directive scanner (for
any C++ standard) and aligns the dependency directive scanner's splicing
behavior with that of the Clang lexer.
For example, the following code was previously not scanned correctly by
`clang-scan-deps` but now works as expected:
```cpp
import \<whitespace here>
A;
```
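The splicing behavior can be sketched as follows. This is a standalone
model and not the scanner's code: `spliceLines` and
`isHorizontalWhitespace` are hypothetical helpers that operate on a
whole string instead of a token stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline bool isHorizontalWhitespace(char C) {
  return C == ' ' || C == '\t' || C == '\f' || C == '\v';
}

// Returns Source with every "backslash, optional horizontal whitespace,
// newline" sequence removed, as P2223R2 permits.
inline std::string spliceLines(const std::string &Source) {
  std::string Out;
  for (std::size_t I = 0; I < Source.size(); ++I) {
    if (Source[I] == '\\') {
      std::size_t J = I + 1;
      while (J < Source.size() && isHorizontalWhitespace(Source[J]))
        ++J; // skip trailing whitespace after the backslash
      if (J < Source.size() && (Source[J] == '\n' || Source[J] == '\r')) {
        if (Source[J] == '\r' && J + 1 < Source.size() && Source[J + 1] == '\n')
          ++J; // treat "\r\n" as a single newline
        I = J; // the loop increment then steps past the newline
        continue;
      }
    }
    Out += Source[I];
  }
  return Out;
}
```

A backslash that is not followed by a newline (possibly after
whitespace) is kept as-is.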
Sometimes, when a user writes invalid code, the minimization used for
scanning can create a stream of tokens that is invalid at lex time. This
patch protects against the case where there are valid (non-c++20) import
directives discovered in the middle of an invalid `import` declaration.
Mostly authored by: @akyrtzi
resolves: rdar://152335844
Fixes https://github.com/llvm/llvm-project/issues/141230.
Currently, prefixed octal literals used with floating-point suffixes are
not rejected, causing Clang to crash.
This adds proper handling to reject invalid literals such as `0o0.1` or
`0o0e1`.
No release note because this is fixing an issue with a new change.
Reland with debug traces to try to understand a bug that only happens on
one CI configuration
===
This introduces a way to detect the libstdc++ version and
uses that to enable workarounds.
The version is cached.
This should make it easier in the future to find and remove
these hacks.
I did not find the need for enabling a hack between or after
specific versions, so it's left as a future exercise.
We can extend this feature to other libraries as the need arises.
===
This introduces a way to detect the libstdc++ version and uses that to
enable workarounds.
The version is cached.
This should make it easier in the future to find and remove these hacks.
I did not find the need for enabling a hack between or after specific
versions, so it's left as a future exercise.
We can extend this feature to other libraries as the need arises.
WG14 N3469 changed _Lengthof to _Countof but it also introduced the
<stdcountof.h> header to expose a macro with a non-ugly identifier. GCC
vends this header as part of the compiler implementation, so Clang
should do the same.
Suggested-by: Alejandro Colomar <alx@kernel.org>
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
We can simplify the code with *Map::try_emplace where we need
default-constructed values while avoiding calling constructors when
keys are already present.
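A minimal sketch of the pattern, using `std::unordered_map` as a
stand-in for the LLVM map types (an assumption made so the example is
self-contained). `countOccurrences` is a hypothetical function.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Counts how often each name occurs in Names.
inline std::unordered_map<std::string, unsigned>
countOccurrences(const std::vector<std::string> &Names) {
  std::unordered_map<std::string, unsigned> Counts;
  for (const std::string &Name : Names) {
    // try_emplace value-initializes the mapped value (0 here) only when
    // Name is absent; an existing entry is left untouched.
    auto [It, Inserted] = Counts.try_emplace(Name);
    (void)Inserted; // not needed for counting
    ++It->second;
  }
  return Counts;
}
```

The same single lookup both finds an existing entry and creates a
missing one, which avoids a separate `find` followed by `insert`.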
Static analysis flagged the unconditional access of getExternalSource().
We don't initialize ExternalSource during construction but via
setExternalSource(). If this is not set it will violate the invariant
covered by the assert.
There are checks in the clang codebase that determine the type of source
file associated with a given location; specifically, whether it is an
ordinary file or comes from sources like command-line options or
built-in definitions. These checks often rely on calls to
`getPresumedLoc`, which is relatively expensive. In certain cases, these
checks are combined, leading to repeated calls to the costly
function, which negatively affects compile time.
This change tries to optimize such checks. It should fix the compile-time
regression introduced in
https://github.com/llvm/llvm-project/pull/137306/.
---------
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Add a new Cygwin toolchain that just goes through the motions to
initialize the Generic_GCC base properly. This allows removing some old,
almost certainly wrong hard-coded paths from Lex/InitHeaderSearch.cpp.
MSYS2 (GCC triple (arch)-pc-msys) is a fork of Cygwin (GCC triple
(arch)-pc-cygwin), and this driver can be used for either.
Add a simple test for this driver.
Move the Darwin framework search path logic from
InitHeaderSearch::AddDefaultIncludePaths to
DarwinClang::AddClangSystemIncludeArgs. Add a new -internal-iframework
cc1 argument to support the tool chain adding these paths.
Now that the tool chain is adding search paths via cc1 flag, they're
only added if they exist, so the Preprocessor/cuda-macos-includes.cu
test is no longer relevant.
Change Driver/driverkit-path.c and Driver/darwin-subframeworks.c to do
-### style testing similar to the darwin-header-search and
darwin-embedded-search-paths tests. Rename darwin-subframeworks.c to
darwin-framework-search-paths.c and have it test all framework search
paths, not just SubFrameworks.
Add a unit test to validate that the myriad of search path flags result
in the expected search path list.
Fixes https://github.com/llvm/llvm-project/issues/75638