llvm-project

Author	SHA1	Message	Date
yronglin	e6e874ce8f	[clang] Allow trivial pp-directives before C++ module directive (#153641) Consider code in which a GNU line marker `# 1 __FILE__ 1 3` appears on the line before `export module a;`. According to the wording in [P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html), "A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)", and according to the wording in [[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file), `module-file: pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]`. Here `#` is the first pp-token in the translation unit, so clang rejected the code, but directives like this should be exempted from the rule. The goal is to disallow preprocessor conditionals and most state changes, and these directives are neither. A state change means most semantically observable preprocessor state, particularly anything that is order dependent; global flags such as being a system header or module shouldn't matter. We should therefore exempt a bunch of directives, even though this violates the current standard wording. This patch introduces a `TrivialDirectiveTracer` to trace the state changes described above and proposes to exempt the following kinds of directives: `#line`, GNU line markers, `#ident`, `#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC error`, `#pragma GCC diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma warning`, `#pragma execution_character_set`, `#pragma clang assume_nonnull`, and builtin macro expansion. Fixes https://github.com/llvm/llvm-project/issues/145274 --------- Signed-off-by: yronglin <yronglin777@gmail.com>	2025-08-18 14:17:35 +08:00
Alex Sepkowski	7c16a31aa5	Address a handful of C4146 compiler warnings where literals can be replaced with std::numeric_limits (#147623) This PR addresses instances of compiler warning C4146 that can be resolved by using std::numeric_limits. Specifically, these are cases where a literal such as '-1ULL' was used to assign a value to a uint64_t variable. The intent is much clearer if we use the appropriate std::numeric_limits<Type>::max() for these cases. Addresses #147439	2025-07-09 16:13:28 -07:00
Volodymyr Sapsai	3b05edfc5f	[clang][deps] Stop lexing if hit a failure while loading a PCH/module in a submodule. (#146976) Otherwise we continue in an invalid state and can easily crash. This is a follow-up to cde90e68f8123e7abef3f9e18d79980aa19f460a, but with an important difference: here the failure happens in a submodule. In this case, `Preprocessor::HandleEndOfFile` replaces `tok::eof` with `tok::annot_module_end`. And after exiting a file with a bad `#include/#import`, we work with a new buffer, so `BufferPtr < BufferEnd`. As nothing signals that lexing should stop, we just keep going. The fix is the same as the one dc9fdaf2171cc480300d5572606a8ede1678d18b made in `Lexer::LexTokenInternal`, applied this time to `Lexer::LexDependencyDirectiveToken` as well. rdar://152499276	2025-07-07 12:28:03 -07:00
Haojian Wu	b7c4ac2db4	NFC, use structured binding to simplify the code.	2025-07-07 17:07:37 +02:00
Kazu Hirata	c9cdc33dd6	[clang] Remove unused includes (NFC) (#146254) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-06-28 20:41:46 -07:00
Haojian Wu	0b6ddb02ef	[clang] NFC: Add alias for std::pair<FileID, unsigned> used in SourceLocation (#145711) Introduce a type alias for the commonly used `std::pair<FileID, unsigned>` to improve code readability, and make it easier for future updates (64-bit source locations).	2025-06-26 14:12:51 +02:00
yronglin	0529a34600	[clang][Preprocessor] Handle the first pp-token in EnterMainSourceFile (#145244) Depends on [[clang][Preprocessor] Add peekNextPPToken, makes look ahead next token without side-effects](https://github.com/llvm/llvm-project/pull/143898). This PR fixes the performance regression introduced in https://github.com/llvm/llvm-project/pull/144233. The original PR (https://github.com/llvm/llvm-project/pull/144233) handled the first pp-token in the main source file in macro definition/expansion and in `Lexer::Lex`, but the lexer is almost always on the hot path, so this could cause a performance regression. In this PR, we handle the first pp-token in `Preprocessor::EnterMainSourceFile` instead. --------- Signed-off-by: yronglin <yronglin777@gmail.com>	2025-06-26 08:49:43 +08:00
yronglin	e8976e92f6	[clang][Preprocessor] Add peekNextPPToken, makes look ahead next token without side-effects (#143898) This PR introduces a new function, `peekNextPPToken`. It is an extension of `isNextPPTokenLParen` and can look ahead one token in the preprocessor without side effects. It is also the first part of https://github.com/llvm/llvm-project/pull/107168, where it is used to look ahead at the next token and determine whether the pp directive currently being lexed is a pp-import or pp-module directive. At the start of phase 4, an import or module token is treated as starting a directive and is converted to its respective keyword iff: (1) after skipping horizontal whitespace, it is at the start of a logical line, or preceded by an export at the start of the logical line; and (2) it is followed by an identifier pp-token (before macro expansion), by a <, ", or : (but not ::) pp-token for import, or by a ; for module. Otherwise the token is treated as an identifier. --------- Signed-off-by: yronglin <yronglin777@gmail.com>	2025-06-24 18:55:21 +08:00
yronglin	ea321392eb	[C++][Modules] A module directive may only appear as the first preprocessing tokens in a file (#144233) This PR is the second part of the [P1857R3](https://github.com/llvm/llvm-project/pull/107168) implementation and mainly implements the restriction `A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)`: [cpp.pre](https://eel.is/c++draft/cpp.pre): `module-file: pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]`. We also refine the tests to use `split-file` instead of conditional macros. Signed-off-by: yronglin <yronglin777@gmail.com>	2025-06-21 18:58:56 +08:00
Jan Svoboda	cde90e68f8	[clang][deps] Respect `Lexer::cutOffLexing()` (#134404) This is crucial when recovering from fatal loader errors. Without it, the `Lexer` keeps yielding more tokens and the compiler may access invalid `ASTReader` state. rdar://133388373	2025-04-04 10:21:33 -07:00
Aaron Ballman	449cdfacc0	Suppress pedantic diagnostic for a file not ending in EOL (#131794) WG14 added N3411 to the list of papers which apply to older versions of C in C2y, and WG21 adopted CWG787 as a Defect Report in C++11. So we should no longer issue a pedantic diagnostic about a file which does not end with a newline character. We do, however, continue to support -Wnewline-eof as an opt-in diagnostic.	2025-03-19 07:49:16 -04:00
Aaron Ballman	9cf46fb230	[C2y] Add octal prefixes, deprecate unprefixed octals (#131626) WG14 N3353 added support for 0o and 0O as octal literal prefixes. It also deprecates use of octal literals without a prefix, except for the literal 0. This feature is being exposed as an extension in older C language modes as well as in all C++ language modes.	2025-03-18 07:28:59 -04:00
Aaron Ballman	9fc3310798	[C2y] Implement WG14 N3411 (#130180) This paper (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3411.pdf) allows a source file to end without a newline. Clang has supported this as a conforming extension for a long time, so this suppresses the diagnostic in C2y mode but continues to diagnose as an extension in earlier language modes. It also continues to diagnose if the user passes -Wnewline-eof explicitly.	2025-03-07 08:34:22 -05:00
Clement Courbet	fbd86d05fe	[clang-reorder-fields] Reorder leading comments (#123740) Similarly to https://github.com/llvm/llvm-project/pull/122918, leading comments are currently not being moved. ``` struct Foo { // This one is the cool field. int a; int b; }; ``` becomes: ``` struct Foo { // This one is the cool field. int b; int a; }; ``` but should be: ``` struct Foo { int b; // This one is the cool field. int a; }; ```	2025-01-22 13:42:00 +01:00
Clement Courbet	1819646623	[clang][refactor] Refactor `findNextTokenIncludingComments` (#123060) We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremely similar to `Lexer::findNextToken`, so just add an extra argument to the latter. --------- Co-authored-by: cor3ntin <corentinjabot@gmail.com>	2025-01-16 17:06:05 +01:00
Samira Bazuzi	f7e8be7c66	Skip escaped newlines before checking for whitespace in Lexer::getRawToken. (#117548) The Lexer used in getRawToken is not told to keep whitespace, so when it skips over escaped newlines, it also ignores whitespace, regardless of getRawToken's IgnoreWhiteSpace parameter. Instead of letting this case fall through to lexing, check for whitespace after skipping over any escaped newlines.	2024-12-05 09:37:46 -05:00
Kazu Hirata	7642759498	[Lex] Remove unused includes (NFC) (#116460) Identified with misc-include-cleaner.	2024-11-16 12:14:06 -08:00
Aaron Ballman	1881f648e2	Remove ^^ as a token in OpenCL (#108224) OpenCL has a reserved operator (^^), the use of which was diagnosed as an error (735c6cdebdcd4292928079cb18a90f0dd5cd65fb). However, OpenCL also encourages working with the blocks language extension. This token has a parsing ambiguity as a result. Consider: unsigned x=0; unsigned y=x^^{return 0;}(); This should result in y holding the value zero (0^0) through an immediately invoked block call as the right-hand side of the xor operator. However, it causes errors instead because of this reserved token: https://godbolt.org/z/navf7jTv1 This token is still reserved in OpenCL 3.0, so we still wish to issue a diagnostic for its use. However, we do not need to create a token for an extension point that's been unused for about a decade. So this patch moves the diagnostic from a parsing diagnostic to a lexing diagnostic and no longer forms a single token. The diagnostic behavior is slightly worse as a result, but still seems acceptable. Part of the reason this is coming up is because WG21 is considering using ^^ as a token for reflection, so this token may come back in the future.	2024-09-16 07:46:58 -04:00
Mital Ashok	4137309842	[Clang] Warn with -Wpre-c23-compat instead of -Wpre-c++17-compat for u8 character literals in C23 (#97210) Co-authored-by: cor3ntin <corentinjabot@gmail.com>	2024-09-05 10:15:54 +02:00
Sirraide	e46468407a	[Clang] Allow raw string literals in C as an extension (#88265) This enables raw R"" string literals in C in some language modes and adds an option to disable or enable them explicitly as an extension. Background: GCC supports raw string literals in C in `-gnuXY` modes starting with gnu99. This PR both enables raw string literals in gnu99 mode and later in C and adds an `-f[no-]raw-string-literals` flag to override this behaviour. The decision not to enable raw string literals in gnu89 mode, according to the GCC devs, is intentional as that mode is supposed to be used for ‘old code’ that they don’t want to break; we’ve decided to match GCC’s behaviour here as well. The `-fraw-string-literals` flag can additionally be used to enable raw string literals in modes where they aren’t enabled by default (such as c99, as opposed to gnu99, or even e.g. C++03); conversely, the negated flag can be used to disable them in any gnuXY modes that do provide them by default, or to override a previous flag. However, we do not support disabling raw string literals (or indeed either of these two options) in C++11 mode and later, because we don’t want to just start supporting disabling features that are actually part of the language in the general case. This fixes #85703.	2024-07-10 12:10:44 +02:00
Aaron Ballman	4f09ac7705	Fix off-by-one issue found by post-commit review	2024-06-13 07:48:08 -04:00
cor3ntin	2ace7bdcfe	[Clang] allow @, $, and ` (backtick) in raw string delimiters in C++26 (#93216) They are also allowed as an extension in older language modes. Per https://eel.is/c++draft/lex.string#nt:d-char. Fixes #93130	2024-05-28 15:38:02 +02:00
akshaykumars614	cc23574184	bad error message on incorrect string literal #18079 (#81670) (bad error message on incorrect string literal) Fixed the error message for an incorrect string literal. Before: ``` test.cpp:1:19: error: invalid character ' ' character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string char const* a = R" ^ ``` Now: ``` test.cpp:1:19: error: invalid newline character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string 1 | char const* a = R" | ^ ``` --------- Co-authored-by: Jon Roelofs <jroelofs@gmail.com>	2024-02-15 20:07:54 -05:00
Owen Pan	a8279a8bc5	[clang][NFC] Move isSimpleTypeSpecifier() from Sema to Token (#80101) So that it can be used by clang-format.	2024-01-31 20:16:18 -08:00
Kazu Hirata	f3dcc2351c	[clang] Use StringRef::{starts,ends}_with (NFC) (#75149) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 08:54:13 -08:00
Chris B	2630d72cb3	[HLSL] Support vector swizzles on scalars (#67700) HLSL supports vector swizzles on scalars by implicitly converting the scalar to a single-element vector. This syntax is a convenient way to initialize vectors by filling them with a scalar value. There are two parts to this change. The first part, in the Lexer, splits numeric constant tokens when a `.x` or `.r` suffix is encountered. This splitting is a bit hacky but allows the numeric constant to be parsed separately from the vector element expression. There is an ambiguity here with the `r` suffix used by fixed point types; however, fixed point types aren't supported in HLSL, so this should not cause any exposable problems (a separate issue has been filed to track validating language options for HLSL: #67689). The second part of this change is in Sema::LookupMemberExpr. For HLSL, if the base type is a scalar, we implicitly cast the scalar to a one-element vector and then call back to perform the vector lookup. Fixes #56658 and #67511	2023-11-29 11:25:02 -06:00
serge-sans-paille	8116b6dce7	[clang] Change GetCharAndSizeSlow interface to by-value style Instead of passing the Size by reference, assuming it is initialized, return it alongside the expected char result as a POD. This makes the interface less error prone: the previous interface expected the Size reference to be initialized, and this was often forgotten, leading to uninitialized variable usage. This patch fixes the issue. This also generates faster code, as the returned POD (a char and an unsigned) fits in 64 bits. According to the compile-time tracker, the speedup reaches -0.7%, with a good number of results at -0.4%. Details are available on https://llvm-compile-time-tracker.com/compare.php?from=3fe63f81fcb999681daa11b2890c82fda3aaeef5&to=fc76a9202f737472ecad4d6e0b0bf87a013866f3&stat=instructions:u And icing on the cake, on my setup it also shaves 2kB out of libclang-cpp :-) This is a recommit of d8f5a18b6e587aeaa8b99707e87b652f49b160cd for	2023-10-31 00:08:01 +01:00
Nico Weber	1c876ff515	Revert "Perf/lexer faster slow get char and size (#70543)" This reverts commit d8f5a18b6e587aeaa8b99707e87b652f49b160cd. Breaks build, see: https://github.com/llvm/llvm-project/pull/70543#issuecomment-1784227421	2023-10-29 21:11:39 -04:00
serge-sans-paille	d8f5a18b6e	Perf/lexer faster slow get char and size (#70543) Co-authored-by: serge-sans-paille <sguelton@mozilla.com>	2023-10-29 18:17:02 +00:00
serge-sans-paille	9f0f606081	[clang] Provide an SSE4.2 implementation of identifier token lexer (#68962) The _mm_cmpistri intrinsic can be used to quickly parse identifiers. With this patch activated, clang pre-processes <iostream> 1.8% faster, and the sqlite3.c amalgamation 1.5% faster, based on time measurements and number of executed instructions as measured by valgrind. The introduction of an extra helper function in the regular case has no impact on performance, see https://llvm-compile-time-tracker.com/compare.php?from=30240e428f0ec7d4a6d1b84f9f807ce12b46cfd1&to=12bcb016cde4579ca7b75397762098c03eb4f264&stat=instructions:u --------- Co-authored-by: serge-sans-paille <sguelton@mozilla.com>	2023-10-19 08:45:54 +00:00
Timm Bäder	c654193c22	[clang][Lex][NFC] Make some local variables const	2023-10-07 07:11:45 +02:00
Corentin Jabot	3eb67d28de	[Clang] Handle non-ASCII after line splicing. `int a\` followed by `ス;` on the next line failed to be parsed as a valid identifier. Fixes #65156 Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D159345	2023-09-06 23:20:00 +02:00
Timm Bäder	bb94817ecf	[clang][NFC] Remove stray slash	2023-09-04 16:12:30 +02:00
Reid Kleckner	0d9919d362	Revert "[Clang] CWG1473: do not err on the lack of space after operator""" This reverts commit f2583f3acf596cc545c8c0e3cb28e712f4ebf21b. There is a large body of non-conforming C-like code using format strings like this: #define PRIuS "zu" void h(size_t foo, size_t bar) { printf("foo is %"PRIuS", bar is %"PRIuS, foo, bar); } Rejecting this code would be very disruptive. We could decide to do that, but it's sufficiently disruptive that I think it requires gathering more community consensus with an RFC, and Aaron indicated [1] it's OK to revert for now so continuous testing systems can see past this issue while we decide what to do. [1] https://reviews.llvm.org/D153156#4607717	2023-08-22 18:10:41 -07:00
Sam McCall	23459f13fc	[Lex] Preambles should contain the global module fragment. For applications like clangd, the preamble remains an important optimization when editing a module definition. The global module fragment is a good fit for it as it by definition contains only preprocessor directives. Before this patch, we would terminate the preamble immediately at the "module" keyword. Differential Revision: https://reviews.llvm.org/D158439	2023-08-22 11:55:51 +02:00
Po-yao Chang	f2583f3acf	[Clang] CWG1473: do not err on the lack of space after operator"" In addition: 1. Fix tests for CWG2521 deprecation warning. 2. Enable -Wdeprecated-literal-operator by default. Differential Revision: https://reviews.llvm.org/D153156	2023-08-17 23:10:37 +08:00
Aaron Ballman	9c4ade0623	[C23] Rename C2x->C23 in diagnostics This renames C2x to C23 in diagnostic identifiers and messages. The changes were made mechanically.	2023-08-11 08:42:01 -04:00
Aaron Ballman	0ce056a814	[C23] Rename C2x -> C23; NFC This does the rename for most internal uses of C2x, but does not rename or reword diagnostics (those will be done in a follow-up). I also updated standards references and citations to the final wording in the standard.	2023-08-11 07:43:43 -04:00
Nikolas Klauser	874217f99b	[clang] Enable C++11-style attributes in all language modes This also ignores and deprecates the `-fdouble-square-bracket-attributes` command line flag, which seems to not be used anywhere. At least a code search exclusively found mentions of it in documentation: https://sourcegraph.com/search?q=context:global+-fdouble-square-bracket-attributes+-file:clang/+-file:test/Sema/+-file:test/Parser/+-file:test/AST/+-file:test/Preprocessor/+-file:test/Misc/+archived:yes&patternType=standard&sm=0&groupBy=repo RFC: https://discourse.llvm.org/t/rfc-enable-c-11-c2x-attributes-in-all-standard-modes-as-an-extension-and-remove-fdouble-square-bracket-attributes This enables `[[]]` attributes in all C and C++ language modes without warning by default. `-Wc++-extensions` does warn. GCC has enabled this extension in all C modes since GCC 10. Reviewed By: aaron.ballman, MaskRay Spies: #clang-vendors, beanz, JDevlieghere, Michael137, MaskRay, sstefan1, jplehr, cfe-commits, lldb-commits, dmgreen, jdoerfert, wenlei, wlei Differential Revision: https://reviews.llvm.org/D151683	2023-07-22 09:34:15 -07:00
Corentin Jabot	304e974694	[Clang] Correctly handle $, @, and ` when represented as UCN This covers * P2558R2 (C++, wg21.link/P2558) * N2701 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm) * N3124 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3124.pdf) This patch * Disallows representing $ as a UCN in all language modes, which did not work properly (see GH62133) and which was made ill-formed in C++ and C by P2558 and N3124 respectively * Allows a UCN for any character in C2X, in string and character literals Fixes #62133 Reviewed By: #clang-language-wg, tahonermann Differential Revision: https://reviews.llvm.org/D153621	2023-07-12 08:03:23 +02:00
Mark de Wever	ba15d186e5	[clang] Use -std=c++23 instead of -std=c++2b During the ISO C++ Committee meeting plenary session the C++23 Standard has been voted as technical complete. This updates the reference to c++2b to c++23 and updates the __cplusplus macro. Drive-by fixes c++1z -> c++17 and c++2a -> c++20 when seen. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D149553	2023-05-04 19:19:52 +02:00
Ben Langmuir	7b492d1be0	[clang][deps] Teach dep directive scanner about #pragma clang system_header This ensures we get the correct FileCharacteristic during scanning. In a yet-to-be-upstreamed branch this fixes observable failures, but it's also good to handle this on principle: the FileCharacteristic is a property of the file that is observable in the scanner, so there is nothing preventing us from depending on it. rdar://108627403 Differential Revision: https://reviews.llvm.org/D149777	2023-05-03 13:53:21 -07:00
Chuanqi Xu	aba32abe2d	[C++20] [Modules] Avoid crash if the inconsistency the size of lang options exceeds 1 Close https://github.com/llvm/llvm-project/issues/62359 The root reason for the crash is that we didn't test the case where the bit width of a language option exceeds 1.	2023-04-27 14:20:59 +08:00
Kazu Hirata	8bdf387858	Use *{Map,Set}::contains (NFC) Differential Revision: https://reviews.llvm.org/D146104	2023-03-15 08:46:32 -07:00
Kazu Hirata	55e2cd1609	Use llvm::count{lr}_{zero,one} (NFC)	2023-01-28 12:41:20 -08:00
Argyrios Kyrtzidis	ed6d09dd4e	[Lex] For dependency directive lexing, angled includes in `__has_include` should be lexed as string literals rdar://104386604 Differential Revision: https://reviews.llvm.org/D142143	2023-01-19 15:23:21 -08:00
Kazu Hirata	2d861436a9	[clang] Remove remaining uses of llvm::Optional (NFC) This patch removes several "using" declarations and #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 13:37:25 -08:00
Kazu Hirata	6ad0788c33	[clang] Use std::optional instead of llvm::Optional (NFC) This patch replaces (llvm::|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 12:31:01 -08:00
Kazu Hirata	a1580d7b59	[clang] Add #include <optional> (NFC) This patch adds #include <optional> to those files containing llvm::Optional<...> or Optional<...>. I'll post a separate patch to actually replace llvm::Optional with std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 11:07:21 -08:00
Corentin Jabot	0d6b26b4d3	[Clang] Fix a crash when encountering an ill-formed delimited UCN. \u<DIGIT>{...} was incorrectly parsed as a valid UCN instead of emitting a diagnostic, causing an assertion failure. Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D139889	2023-01-03 20:57:52 +01:00
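Commit 7c16a31aa5 replaces literals such as `-1ULL` with `std::numeric_limits<Type>::max()`. MSVC warning C4146 flags unary minus applied to an unsigned operand, because the result is still unsigned and silently wraps. A minimal sketch of the before/after pattern follows; the names `Sentinel` and `lowBitsMask` are illustrative assumptions, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Before: unary minus applied to an unsigned literal. The result is well
// defined (it wraps modulo 2^64), but MSVC reports C4146 and the reader has
// to know the wrap-around trick to see the intent.
//   constexpr std::uint64_t Sentinel = -1ULL;

// After: the intent ("largest uint64_t value") is stated directly.
constexpr std::uint64_t Sentinel = std::numeric_limits<std::uint64_t>::max();

// Both spellings denote the same all-ones value.
static_assert(Sentinel == ~std::uint64_t{0}, "max() has every bit set");
static_assert(Sentinel == static_cast<std::uint64_t>(-1), "same as -1 wrapped");

// Hypothetical helper (not from the patch): a mask with the low N bits set,
// using the named constant for the full-width case instead of -1ULL.
constexpr std::uint64_t lowBitsMask(unsigned N) {
  return N >= 64 ? Sentinel : (std::uint64_t{1} << N) - 1;
}
```

The `N >= 64` branch matters: shifting a 64-bit value by 64 is undefined behavior, so the full-width mask has to come from the named constant rather than from the shift.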

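Commit 9cf46fb230 summarizes WG14 N3353: `0o` and `0O` become octal prefixes, and unprefixed octal literals other than `0` are deprecated. The sketch below classifies a digit string under those rules. It is a hypothetical helper, not Clang's actual literal parser, and it deliberately ignores suffixes, digit separators, hexadecimal and binary forms, and overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical classification result for an integer literal spelling.
struct OctalInfo {
  bool IsOctal = false;          // digits are interpreted in base 8
  bool IsDeprecatedForm = false; // legacy unprefixed octal such as 017
  std::uint64_t Value = 0;
};

// Sketch of the N3353 rules for decimal and octal spellings only.
// Returns std::nullopt for malformed input.
std::optional<OctalInfo> classifyOctal(std::string_view S) {
  if (S.empty())
    return std::nullopt;
  OctalInfo Info;
  std::string_view Digits = S;
  if (S.size() > 2 && S[0] == '0' && (S[1] == 'o' || S[1] == 'O')) {
    Info.IsOctal = true; // new 0o / 0O prefix: not deprecated
    Digits = S.substr(2);
  } else if (S.size() > 1 && S[0] == '0') {
    Info.IsOctal = true; // leading-zero octal: deprecated by N3353
    Info.IsDeprecatedForm = true;
    Digits = S.substr(1);
  }
  // Otherwise the spelling is treated as decimal; a lone "0" also lands
  // here, matching N3353's exemption of the literal 0 from the deprecation.
  const unsigned Base = Info.IsOctal ? 8 : 10;
  for (char C : Digits) {
    if (C < '0' || C >= static_cast<char>('0' + Base))
      return std::nullopt; // digit out of range for the base, or not a digit
    Info.Value = Info.Value * Base + static_cast<unsigned>(C - '0');
  }
  return Info;
}
```

For example, `017` and `0o17` both yield the value 15, but only `017` is flagged as the deprecated form, while `019` and a bare `0o` are rejected.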
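Commit 2ace7bdcfe points at the d-char grammar in [lex.string]: a raw string delimiter holds at most 16 characters, each a basic character set member other than space, the parentheses, backslash, and the tab, vertical tab, form feed, and newline controls. P2558 added `@`, `$`, and backtick to the basic character set in C++26. A minimal sketch of that check, assuming ASCII input; the function names and the `AllowNewBasicChars` flag are hypothetical, and the flag models only the strict C++26 rule, whereas Clang also accepts these characters as an extension in older modes.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical check for one d-char, assuming ASCII input.
// AllowNewBasicChars models C++26, where '@', '$', and '`' joined the
// basic character set.
bool isDChar(char C, bool AllowNewBasicChars) {
  switch (C) {
  // Explicitly excluded by the d-char grammar.
  case ' ': case '(': case ')': case '\\':
  case '\t': case '\v': case '\f': case '\n':
    return false;
  // Basic character set members only from C++26 on.
  case '@': case '$': case '`':
    return AllowNewBasicChars;
  default:
    // Every other printable ASCII character is in the basic character set;
    // control characters and non-ASCII bytes are rejected.
    return C >= 0x21 && C <= 0x7e;
  }
}

// A delimiter is a sequence of at most 16 d-chars (it may be empty).
bool isValidDelimiter(std::string_view D, bool AllowNewBasicChars) {
  if (D.size() > 16)
    return false;
  for (char C : D)
    if (!isDChar(C, AllowNewBasicChars))
      return false;
  return true;
}
```

With this rule, `R"@$`(text)@$`"` is accepted only when `AllowNewBasicChars` is true, while a delimiter containing a space or a parenthesis is always rejected.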

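Commit 8116b6dce7 changes `GetCharAndSizeSlow` to return the character and its size together as a POD, instead of writing the size through a reference that the caller had to pre-initialize. The sketch below shows the by-value style on a simplified problem, skipping backslash-newline line splices. The `SizedChar` struct and `getCharAndSize` function are hypothetical stand-ins, not Clang's code; a real lexer must also handle cases such as trigraphs and `\r\n` line endings.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical POD result: the decoded character plus the number of source
// bytes it occupied. A char and an unsigned fit in 64 bits, so common ABIs
// return this in registers.
struct SizedChar {
  char Char;
  unsigned Size;
};

// Old style, for contrast: the caller must initialize Size before the call,
// which is easy to forget.
//   char getCharAndSize(const char *Ptr, unsigned &Size);

// By-value style: skip any backslash-newline splices starting at Pos and
// return the next real character together with the bytes consumed.
SizedChar getCharAndSize(std::string_view Buf, std::size_t Pos) {
  unsigned Size = 0;
  while (Pos + 1 < Buf.size() && Buf[Pos] == '\\' && Buf[Pos + 1] == '\n') {
    Pos += 2; // drop the splice
    Size += 2;
  }
  if (Pos >= Buf.size())
    return {'\0', Size}; // end of buffer: report NUL plus splice bytes
  return {Buf[Pos], Size + 1};
}
```

Because the size travels with the character in the return value, there is no caller-side `Size` variable left uninitialized, which is the error-prone pattern the commit message describes.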
435 Commits