Consider the following code:
```cpp
# 1 __FILE__ 1 3
export module a;
```
According to the wording in
[P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html):
```
A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
```
and the wording in
[[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file)
```
module-file:
pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```
`#` is the first pp-token in the translation unit, and it was rejected
by clang, but they really should be exempted from this rule. The goal is
to not allow any preprocessor conditionals or most state changes, but
these don't fit that.
State change would mean most semantically observable preprocessor state,
particularly anything that is order dependent. Global flags like being a
system header/module shouldn't matter.
We should exempt a brunch of directives, even though it violates the
current standard wording.
In this patch, we introduce a `TrivialDirectiveTracer` to trace the
**State change** that described above and propose to exempt the
following kind of directive: `#line`, GNU line marker, `#ident`,
`#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma
clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC
error`, `#pragma gcc diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma
warning`, `#pragma execution_character_set`, `#pragma clang
assume_nonnull` and builtin macro expansion.
Fixes https://github.com/llvm/llvm-project/issues/145274
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This PR addresses instances of compiler warning C4146 that can be
replaced with std::numeric_limits. Specifically, these are cases where a
literal such as '-1ULL' was used to assign a value to a uint64_t
variable. The intent is much cleaner if we use the appropriate
std::numeric_limits value<Type>::max() for these cases.
Addresses #147439
Otherwise we are continuing in an invalid state and can easily crash.
It is a follow-up to cde90e68f8123e7abef3f9e18d79980aa19f460a but an
important difference is when a failure happens in a submodule. In this
case in `Preprocessor::HandleEndOfFile` `tok::eof` is replaced by
`tok::annot_module_end`. And after exiting a file with bad
`#include/#import` we work with a new buffer, so `BufferPtr < BufferEnd`.
As there are no signs to stop lexing we just keep doing it.
The fix is the same as in dc9fdaf2171cc480300d5572606a8ede1678d18b in
`Lexer::LexTokenInternal` but this time in
`Lexer::LexDependencyDirectiveToken` as well.
rdar://152499276
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Introduce a type alias for the commonly used `std::pair<FileID,
unsigned>` to improve code readability, and make it easier for future
updates (64-bit source locations).
Depends on [[clang][Preprocessor] Add peekNextPPToken, makes look ahead
next token without
side-effects](https://github.com/llvm/llvm-project/pull/143898).
This PR fix the performance regression that introduced in
https://github.com/llvm/llvm-project/pull/144233.
The original PR(https://github.com/llvm/llvm-project/pull/144233) handle
the first pp-token in the main source file in the macro
definition/expansion and `Lexer::Lex`, but the lexer is almost always on
the hot path, we may hit a performance regression. In this PR, we handle
the first pp-token in `Preprocessor::EnterMainSourceFile`.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This PR introduce a new function `peekNextPPToken`. It's an extension of
`isNextPPTokenLParen` and can makes look ahead one token in preprocessor
without side-effects.
It's also the 1st part of
https://github.com/llvm/llvm-project/pull/107168 and it was used to look
ahead next token then determine whether current lexing pp directive is
one of pp-import or pp-module directive.
At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:
- After skipping horizontal whitespace are
- at the start of a logical line, or
- preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
- <, ", or : (but not ::) pp tokens for import, or
- ; for module
Otherwise the token is treated as an identifier.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This PR is 2nd part of
[P1857R3](https://github.com/llvm/llvm-project/pull/107168)
implementation, and mainly implement the restriction `A module directive
may only appear as the first preprocessing tokens in a file (excluding
the global module fragment.)`:
[cpp.pre](https://eel.is/c++draft/cpp.pre):
```
module-file:
pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```
We also refine tests use `split-file` instead of conditional macro.
Signed-off-by: yronglin <yronglin777@gmail.com>
This is crucial when recovering from fatal loader errors. Without it,
the `Lexer` keeps yielding more tokens and the compiler may access
invalid `ASTReader` state.
rdar://133388373
WG14 added N3411 to the list of papers which apply to older versions of
C in C2y, and WG21 adopted CWG787 as a Defect Report in C++11. So we no
longer should be issuing a pedantic diagnostic about a file which does
not end with a newline character.
We do, however, continue to support -Wnewline-eof as an opt-in
diagnostic.
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It
also deprecates use of octal literals without a prefix, except for the
literal 0.
This feature is being exposed as an extension in older C language modes
as well as in all C++ language modes.
This paper (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3411.pdf)
allows a source file to end without a newline. Clang has supported this
as a conforming extension for a long time, so this suppresses the
diagnotic in C2y mode but continues to diagnose as an extension in
earlier language modes. It also continues to diagnose if the user passes
-Wnewline-eof explicitly.
Similarly to https://github.com/llvm/llvm-project/pull/122918, leading
comments are currently not being moved.
```
struct Foo {
// This one is the cool field.
int a;
int b;
};
```
becomes:
```
struct Foo {
// This one is the cool field.
int b;
int a;
};
```
but should be:
```
struct Foo {
int b;
// This one is the cool field.
int a;
};
```
We have two copies of the same code in clang-tidy and
clang-reorder-fields, and those are extremenly similar to
`Lexer::findNextToken`, so just add an extra agument to the latter.
---------
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
The Lexer used in getRawToken is not told to keep whitespace, so when it
skips over escaped newlines, it also ignores whitespace, regardless of
getRawToken's IgnoreWhiteSpace parameter.
Instead of letting this case fall through to lexing, check
for whitespace after skipping over any escaped newlines.
OpenCL has a reserved operator (^^), the use of which was diagnosed as
an error (735c6cdebdcd4292928079cb18a90f0dd5cd65fb). However, OpenCL
also encourages working with the blocks language extension. This token
has a parsing ambiguity as a result. Consider:
unsigned x=0;
unsigned y=x^^{return 0;}();
This should result in y holding the value zero (0^0) through an
immediately invoked block call as the right-hand side of the xor
operator. However, it causes errors instead because of this reserved
token: https://godbolt.org/z/navf7jTv1
This token is still reserved in OpenCL 3.0, so we still wish to issue a
diagnostic for its use. However, we do not need to create a token for an
extension point that's been unused for about a decade. So this patch
moves the diagnostic from a parsing diagnostic to a lexing diagnostic
and no longer forms a single token. The diagnostic behavior is slightly
worse as a result, but still seems acceptable.
Part of the reason this is coming up is because WG21 is considering
using ^^ as a token for reflection, so this token may come back in the
future.
This enables raw R"" string literals in C in some language modes
and adds an option to disable or enable them explicitly as an
extension.
Background: GCC supports raw string literals in C in `-gnuXY` modes
starting with gnu99. This pr both enables raw string literals in gnu99
mode and later in C and adds an `-f[no-]raw-string-literals` flag to override
this behaviour. The decision not to enable raw string literals in gnu89
mode, according to the GCC devs, is intentional as that mode is supposed
to be used for ‘old code’ that they don’t want to break; we’ve decided to
match GCC’s behaviour here as well.
The `-fraw-string-literals` flag can additionally be used to enable raw string
literals in modes where they aren’t enabled by default (such as c99—as
opposed to gnu99—or even e.g. C++03); conversely, the negated flag can
be used to disable them in any gnuXY modes that *do* provide them by
default, or to override a previous flag. However, we do *not* support
disabling raw string literals (or indeed either of these two options) in
C++11 mode and later, because we don’t want to just start supporting
disabling features that are actually part of the language in the general case.
This fixes#85703.
(bad error message on incorrect string literal)
Fixed the error message for incorrect string literal
before:
```
test.cpp:1:19: error: invalid character '
' character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string
char const* a = R"
^
```
now:
```
test.cpp:1:19: error: invalid newline character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string
1 | char const* a = R"
| ^
```
---------
Co-authored-by: Jon Roelofs <jroelofs@gmail.com>
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
HLSL supports vector swizzles on scalars by implicitly converting the
scalar to a single-element vector. This syntax is a convienent way to
initialize vectors based on filling a scalar value.
There are two parts of this change. The first part in the Lexer splits
numeric constant tokens when a `.x` or `.r` suffix is encountered. This
splitting is a bit hacky but allows the numeric constant to be parsed
separately from the vector element expression. There is an ambiguity
here with the `r` suffix used by fixed point types, however fixed point
types aren't supported in HLSL so this should not cause any exposable
problems (a separate issue has been filed to track validating language
options for HLSL: #67689).
The second part of this change is in Sema::LookupMemberExpr. For HLSL,
if the base type is a scalar, we implicit cast the scalar to a
one-element vector then call back to perform the vector lookup.
Fixes#56658 and #67511
Instead of passing the Size by reference, assuming it is initialized,
return it alongside the expected char result as a POD.
This makes the interface less error prone: previous interface expected
the Size reference to be initialized, and it was often forgotten,
leading to uninitialized variable usage. This patch fixes the issue.
This also generates faster code, as the returned POD (a char and an
unsigned) fits in 64 bits. The speedup according to compile time tracker
reach -O.7%, with a good number of -0.4%. Details are available on
https://llvm-compile-time-tracker.com/compare.php?from=3fe63f81fcb999681daa11b2890c82fda3aaeef5&to=fc76a9202f737472ecad4d6e0b0bf87a013866f3&stat=instructions:u
And icing on the cake, on my setup it also shaves 2kB out of
libclang-cpp :-)
This is a recommit of d8f5a18b6e587aeaa8b99707e87b652f49b160cd for
This reverts commit f2583f3acf596cc545c8c0e3cb28e712f4ebf21b.
There is a large body of non-conforming C-like code using format strings
like this:
#define PRIuS "zu"
void h(size_t foo, size_t bar) {
printf("foo is %"PRIuS", bar is %"PRIuS, foo, bar);
}
Rejecting this code would be very disruptive. We could decide to do
that, but it's sufficiently disruptive that I think it requires
gathering more community consensus with an RFC, and Aaron indicated [1]
it's OK to revert for now so continuous testing systems can see past
this issue while we decide what to do.
[1] https://reviews.llvm.org/D153156#4607717
For applications like clangd, the preamble remains an important optimization
when editing a module definition. The global module fragment is a good fit for
it as it by definition contains only preprocessor directives.
Before this patch, we would terminate the preamble immediately at the "module"
keyword.
Differential Revision: https://reviews.llvm.org/D158439
This does the rename for most internal uses of C2x, but does not rename
or reword diagnostics (those will be done in a follow-up).
I also updated standards references and citations to the final wording
in the standard.
During the ISO C++ Committee meeting plenary session the C++23 Standard
has been voted as technical complete.
This updates the reference to c++2b to c++23 and updates the __cplusplus
macro.
Drive-by fixes c++1z -> c++17 and c++2a -> c++20 when seen.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D149553
This ensures we get the correct FileCharacteristic during scanning. In a
yet-to-be-upstreamed branch this fixes observable failures, but it's
also good to handle this on principle: the FileCharacteristic is a
property of the file that is observable in the scanner, so there is
nothing preventing us from depending on it.
rdar://108627403
Differential Revision: https://reviews.llvm.org/D149777
\u<DIGIT>{...} was incorrectly parsed as a valid UCN instead
of emitting a diagnostic, causing an assertion failure.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D139889