418 Commits

Author SHA1 Message Date
Aaron Ballman
1881f648e2
Remove ^^ as a token in OpenCL (#108224)
OpenCL has a reserved operator (^^), the use of which was diagnosed as
an error (735c6cdebdcd4292928079cb18a90f0dd5cd65fb). However, OpenCL
also encourages working with the blocks language extension. This token
has a parsing ambiguity as a result. Consider:

  unsigned x=0;
  unsigned y=x^^{return 0;}();

This should result in y holding the value zero (0^0) through an
immediately invoked block call as the right-hand side of the xor
operator. However, it causes errors instead because of this reserved
token: https://godbolt.org/z/navf7jTv1

This token is still reserved in OpenCL 3.0, so we still wish to issue a
diagnostic for its use. However, we do not need to create a token for an
extension point that's been unused for about a decade. So this patch
moves the diagnostic from a parsing diagnostic to a lexing diagnostic
and no longer forms a single token. The diagnostic behavior is slightly
worse as a result, but still seems acceptable.

Part of the reason this is coming up is that WG21 is considering
using ^^ as a token for reflection, so this token may come back in the
future.
2024-09-16 07:46:58 -04:00
Mital Ashok
4137309842
[Clang] Warn with -Wpre-c23-compat instead of -Wpre-c++17-compat for u8 character literals in C23 (#97210)
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
2024-09-05 10:15:54 +02:00
Sirraide
e46468407a
[Clang] Allow raw string literals in C as an extension (#88265)
This enables raw R"" string literals in C in some language modes
and adds an option to disable or enable them explicitly as an
extension.

Background: GCC supports raw string literals in C in `-std=gnuXY` modes
starting with gnu99. This PR enables raw string literals in C in gnu99
mode and later, and adds an `-f[no-]raw-string-literals` flag to override
this behaviour. The decision not to enable raw string literals in gnu89
mode, according to the GCC devs, is intentional as that mode is supposed
to be used for ‘old code’ that they don’t want to break; we’ve decided to
match GCC’s behaviour here as well.

The `-fraw-string-literals` flag can additionally be used to enable raw string
literals in modes where they aren’t enabled by default (such as c99—as 
opposed to gnu99—or even e.g. C++03); conversely, the negated flag can 
be used to disable them in any gnuXY modes that *do* provide them by 
default, or to override a previous flag. However, we do *not* support
disabling raw string literals (or indeed either of these two options) in 
C++11 mode and later, because we don’t want to just start supporting 
disabling features that are actually part of the language in the general case.

This fixes #85703.
2024-07-10 12:10:44 +02:00
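For readers unfamiliar with the syntax this commit enables, here is a minimal sketch of raw string literals. It is written in C++, where the feature is standard since C++11; the commit message above describes the same `R""` spelling being accepted in C as an extension. The function names and the `sep` delimiter are illustrative choices, not taken from the patch.

```cpp
#include <string>

// Default delimiter: everything between R"( and )" is taken verbatim,
// so the backslashes below are ordinary characters, not escapes.
inline std::string rawPath() { return R"(C:\temp\new)"; }

// A custom delimiter ("sep" is an arbitrary choice) lets the body contain
// the two-character sequence )" without ending the literal early.
inline std::string rawQuoted() { return R"sep(say "(hi)")sep"; }
```

Each function returns the same text that an ordinary literal would need escapes to express.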
Aaron Ballman
4f09ac7705 Fix off-by-one issue found by post-commit review 2024-06-13 07:48:08 -04:00
cor3ntin
2ace7bdcfe
[Clang] allow `@`, `$`, and `` ` `` in raw string delimiters in C++26 (#93216)
And as an extension in older language modes.

Per https://eel.is/c++draft/lex.string#nt:d-char

Fixes #93130
2024-05-28 15:38:02 +02:00
akshaykumars614
cc23574184
bad error message on incorrect string literal #18079 (#81670)
Fixes the error message emitted for an incorrect string literal.

before:

```
test.cpp:1:19: error: invalid character '
' character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string
char const* a = R"
                  ^
```

now:

```
test.cpp:1:19: error: invalid newline character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string
    1 | char const* a = R"
      |                   ^
```

---------

Co-authored-by: Jon Roelofs <jroelofs@gmail.com>
2024-02-15 20:07:54 -05:00
Owen Pan
a8279a8bc5
[clang][NFC] Move isSimpleTypeSpecifier() from Sema to Token (#80101)
So that it can be used by clang-format.
2024-01-31 20:16:18 -08:00
Kazu Hirata
f3dcc2351c
[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00
Chris B
2630d72cb3
[HLSL] Support vector swizzles on scalars (#67700)
HLSL supports vector swizzles on scalars by implicitly converting the
scalar to a single-element vector. This syntax is a convenient way to
initialize a vector by filling it with a scalar value.

There are two parts of this change. The first part in the Lexer splits
numeric constant tokens when a `.x` or `.r` suffix is encountered. This
splitting is a bit hacky but allows the numeric constant to be parsed
separately from the vector element expression. There is an ambiguity
here with the `r` suffix used by fixed point types, however fixed point
types aren't supported in HLSL so this should not cause any exposable
problems (a separate issue has been filed to track validating language
options for HLSL: #67689).

The second part of this change is in Sema::LookupMemberExpr. For HLSL,
if the base type is a scalar, we implicitly cast the scalar to a
one-element vector, then call back to perform the vector lookup.

Fixes #56658 and #67511
2023-11-29 11:25:02 -06:00
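The lexer-side splitting described in the commit above can be pictured with a small sketch. It is a deliberate simplification and not the code from the patch: `splitSwizzle` is a hypothetical helper that cuts a numeric-constant spelling at the first `.x` or `.r`, so that the constant and the vector element expression can be handled separately.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical helper: split a numeric-constant spelling at the first ".x"
// or ".r" and return {constant, swizzle}. The swizzle part is empty when no
// such suffix is present.
inline std::pair<std::string_view, std::string_view>
splitSwizzle(std::string_view Spelling) {
  for (std::size_t I = 0; I + 1 < Spelling.size(); ++I) {
    const char Next = Spelling[I + 1];
    if (Spelling[I] == '.' && (Next == 'x' || Next == 'r'))
      return {Spelling.substr(0, I), Spelling.substr(I)};
  }
  return {Spelling, std::string_view()};
}
```

Under these assumptions, `1.xxxx` splits into `1` and `.xxxx`, while `2.0` is returned unchanged with an empty swizzle.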
serge-sans-paille
8116b6dce7
[clang] Change GetCharAndSizeSlow interface to by-value style
Instead of passing the Size by reference, assuming it is initialized,
return it alongside the expected char result as a POD.

This makes the interface less error prone: the previous interface expected
the Size reference to be initialized, and this was often forgotten,
leading to uninitialized variable usage. This patch fixes the issue.

This also generates faster code, as the returned POD (a char and an
unsigned) fits in 64 bits. According to the compile-time tracker, the
speedup reaches -0.7%, with a good number of results around -0.4%. Details are available on

        https://llvm-compile-time-tracker.com/compare.php?from=3fe63f81fcb999681daa11b2890c82fda3aaeef5&to=fc76a9202f737472ecad4d6e0b0bf87a013866f3&stat=instructions:u

And icing on the cake, on my setup it also shaves 2kB out of
libclang-cpp :-)

This is a recommit of d8f5a18b6e587aeaa8b99707e87b652f49b160cd for
2023-10-31 00:08:01 +01:00
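The interface change described in the commit above can be sketched roughly as follows. The names `CharAndSize`, `getCharOutParam`, and `getCharByValue` are hypothetical, and the only slow-path case handled here is a backslash-newline splice, so this is a simplified illustration rather than the lexer's real logic.

```cpp
// Hypothetical result type: the character plus the number of bytes consumed.
struct CharAndSize {
  char Char;
  unsigned Size;
};

// Assumption: unsigned is 4 bytes, as on typical targets, so the pair fits
// in 64 bits and can be returned in registers.
static_assert(sizeof(CharAndSize) <= 8, "char + unsigned should fit in 64 bits");

// Old style (sketch): Size is only added to, so the caller must remember to
// initialize it before the call.
inline char getCharOutParam(const char *Ptr, unsigned &Size) {
  while (Ptr[0] == '\\' && Ptr[1] == '\n') { // skip escaped newlines
    Ptr += 2;
    Size += 2;
  }
  Size += 1;
  return *Ptr;
}

// New style (sketch): the size is a local that always starts at zero and is
// returned next to the character, so there is nothing to forget.
inline CharAndSize getCharByValue(const char *Ptr) {
  unsigned Size = 0;
  while (Ptr[0] == '\\' && Ptr[1] == '\n') { // skip escaped newlines
    Ptr += 2;
    Size += 2;
  }
  return {*Ptr, Size + 1};
}
```

Both functions expect a NUL-terminated buffer. With the by-value form, a caller reads `.Char` and `.Size` from the result instead of declaring and zeroing a size variable first.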
Nico Weber
1c876ff515 Revert "Perf/lexer faster slow get char and size (#70543)"
This reverts commit d8f5a18b6e587aeaa8b99707e87b652f49b160cd.
Breaks build, see:
https://github.com/llvm/llvm-project/pull/70543#issuecomment-1784227421
2023-10-29 21:11:39 -04:00
serge-sans-paille
d8f5a18b6e
Perf/lexer faster slow get char and size (#70543)
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
2023-10-29 18:17:02 +00:00
serge-sans-paille
9f0f606081
[clang] Provide an SSE4.2 implementation of identifier token lexer (#68962)
The _mm_cmpistri instruction can be used to quickly parse identifiers.

With this patch activated, clang pre-processes <iostream> 1.8% faster,
and the sqlite3.c amalgamation 1.5% faster, based on time measurements and
the number of executed instructions as measured by valgrind.

The introduction of an extra helper function in the regular case has no
impact on performance, see


https://llvm-compile-time-tracker.com/compare.php?from=30240e428f0ec7d4a6d1b84f9f807ce12b46cfd1&to=12bcb016cde4579ca7b75397762098c03eb4f264&stat=instructions:u

---------

Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
2023-10-19 08:45:54 +00:00
Timm Bäder
c654193c22 [clang][Lex][NFC] Make some local variables const 2023-10-07 07:11:45 +02:00
Corentin Jabot
3eb67d28de [Clang] Handle non-ASCII after line splicing
int a\
ス;

Failed to be parsed as a valid identifier.

Fixes #65156

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D159345
2023-09-06 23:20:00 +02:00
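As background for the commit above, line splicing removes each backslash that is immediately followed by a newline, joining the two physical lines. The sketch below applies that rule as a standalone pass over a string. `spliceLines` is a hypothetical name, and this is not a claim about how the lexer is structured internally. It only shows why `a\`, newline, `ス` should end up as the single identifier `aス`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Sketch: drop every backslash-newline pair and copy all other bytes through
// unchanged, including the bytes of multi-byte UTF-8 characters.
inline std::string spliceLines(std::string_view Src) {
  std::string Out;
  Out.reserve(Src.size());
  for (std::size_t I = 0; I < Src.size(); ++I) {
    if (Src[I] == '\\' && I + 1 < Src.size() && Src[I + 1] == '\n') {
      ++I; // skip the newline as well as the backslash
      continue;
    }
    Out.push_back(Src[I]);
  }
  return Out;
}
```

Applied to the snippet in the commit message, the result is `int aス;` on a single logical line.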
Timm Bäder
bb94817ecf [clang][NFC] Remove stray slash 2023-09-04 16:12:30 +02:00
Reid Kleckner
0d9919d362 Revert "[Clang] CWG1473: do not err on the lack of space after operator"""
This reverts commit f2583f3acf596cc545c8c0e3cb28e712f4ebf21b.

There is a large body of non-conforming C-like code using format strings
like this:

  #define PRIuS "zu"
  void h(size_t foo, size_t bar) {
    printf("foo is %"PRIuS", bar is %"PRIuS, foo, bar);
  }

Rejecting this code would be very disruptive. We could decide to do
that, but it's sufficiently disruptive that I think it requires
gathering more community consensus with an RFC, and Aaron indicated [1]
it's OK to revert for now so continuous testing systems can see past
this issue while we decide what to do.

[1] https://reviews.llvm.org/D153156#4607717
2023-08-22 18:10:41 -07:00
Sam McCall
23459f13fc [Lex] Preambles should contain the global module fragment.
For applications like clangd, the preamble remains an important optimization
when editing a module definition. The global module fragment is a good fit for
it as it by definition contains only preprocessor directives.
Before this patch, we would terminate the preamble immediately at the "module"
keyword.

Differential Revision: https://reviews.llvm.org/D158439
2023-08-22 11:55:51 +02:00
Po-yao Chang
f2583f3acf [Clang] CWG1473: do not err on the lack of space after operator""
In addition:
  1. Fix tests for CWG2521 deprecation warning.
  2. Enable -Wdeprecated-literal-operator by default.

Differential Revision: https://reviews.llvm.org/D153156
2023-08-17 23:10:37 +08:00
Aaron Ballman
9c4ade0623 [C23] Rename C2x->C23 in diagnostics
This renames C2x to C23 in diagnostic identifiers and messages. The
changes were made mechanically.
2023-08-11 08:42:01 -04:00
Aaron Ballman
0ce056a814 [C23] Rename C2x -> C23; NFC
This does the rename for most internal uses of C2x, but does not rename
or reword diagnostics (those will be done in a follow-up).

I also updated standards references and citations to the final wording
in the standard.
2023-08-11 07:43:43 -04:00
Nikolas Klauser
874217f99b [clang] Enable C++11-style attributes in all language modes
This also ignores and deprecates the `-fdouble-square-bracket-attributes` command line flag, which seems to not be used anywhere. At least a code search exclusively found mentions of it in documentation: https://sourcegraph.com/search?q=context:global+-fdouble-square-bracket-attributes+-file:clang/*+-file:test/Sema/*+-file:test/Parser/*+-file:test/AST/*+-file:test/Preprocessor/*+-file:test/Misc/*+archived:yes&patternType=standard&sm=0&groupBy=repo

RFC: https://discourse.llvm.org/t/rfc-enable-c-11-c2x-attributes-in-all-standard-modes-as-an-extension-and-remove-fdouble-square-bracket-attributes

This enables `[[]]` attributes in all C and C++ language modes without warning by default. `-Wc++-extensions` does warn. GCC has enabled this extension in all C modes since GCC 10.

Reviewed By: aaron.ballman, MaskRay

Spies: #clang-vendors, beanz, JDevlieghere, Michael137, MaskRay, sstefan1, jplehr, cfe-commits, lldb-commits, dmgreen, jdoerfert, wenlei, wlei

Differential Revision: https://reviews.llvm.org/D151683
2023-07-22 09:34:15 -07:00
Corentin Jabot
304e974694 [Clang] Correctly handle $, @, and ` when represented as UCN
This covers
 * P2558R2 (C++, wg21.link/P2558)
 * N2701 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm)
 * N3124 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3124.pdf)

This patch
 * Disallows representing $ as a UCN in all language modes, which did not
   work properly (see GH62133), and which is made ill-formed in
   C++ and C by P2558 and N3124 respectively
 * Allows a UCN for any character in C2X, in string and character
   literals
Fixes #62133

Reviewed By: #clang-language-wg, tahonermann

Differential Revision: https://reviews.llvm.org/D153621
2023-07-12 08:03:23 +02:00
Mark de Wever
ba15d186e5 [clang] Use -std=c++23 instead of -std=c++2b
During the plenary session of the ISO C++ Committee meeting, the C++23
Standard was voted technically complete.

This updates references from c++2b to c++23 and updates the __cplusplus
macro.

Drive-by fixes c++1z -> c++17 and c++2a -> c++20 when seen.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D149553
2023-05-04 19:19:52 +02:00
Ben Langmuir
7b492d1be0 [clang][deps] Teach dep directive scanner about #pragma clang system_header
This ensures we get the correct FileCharacteristic during scanning. In a
yet-to-be-upstreamed branch this fixes observable failures, but it's
also good to handle this on principle: the FileCharacteristic is a
property of the file that is observable in the scanner, so there is
nothing preventing us from depending on it.

rdar://108627403

Differential Revision: https://reviews.llvm.org/D149777
2023-05-03 13:53:21 -07:00
Chuanqi Xu
aba32abe2d [C++20] [Modules] Avoid crash if the inconsistency the size of lang options exceeds 1
Close https://github.com/llvm/llvm-project/issues/62359

The root cause of the crash is that we didn't test the case where
the bit width of a language option exceeds 1.
2023-04-27 14:20:59 +08:00
Kazu Hirata
8bdf387858 Use *{Map,Set}::contains (NFC)
Differential Revision: https://reviews.llvm.org/D146104
2023-03-15 08:46:32 -07:00
Kazu Hirata
55e2cd1609 Use llvm::count{lr}_{zero,one} (NFC) 2023-01-28 12:41:20 -08:00
Argyrios Kyrtzidis
ed6d09dd4e [Lex] For dependency directive lexing, angled includes in __has_include should be lexed as string literals
rdar://104386604

Differential Revision: https://reviews.llvm.org/D142143
2023-01-19 15:23:21 -08:00
Kazu Hirata
2d861436a9 [clang] Remove remaining uses of llvm::Optional (NFC)
This patch removes several "using" declarations and #include
"llvm/ADT/Optional.h".

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-14 13:37:25 -08:00
Kazu Hirata
6ad0788c33 [clang] Use std::optional instead of llvm::Optional (NFC)
This patch replaces (llvm::|)Optional< with std::optional<.  I'll post
a separate patch to remove #include "llvm/ADT/Optional.h".

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-14 12:31:01 -08:00
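A minimal sketch of what code looks like after the `llvm::Optional` to `std::optional` migration described in these commits. `parseDigit` is a hypothetical helper written for illustration and does not come from the patch. The comments note the older `llvm::Optional` member names referenced in the linked deprecation thread.

```cpp
#include <optional>

// Hypothetical helper: return the value of a decimal digit, or no value at
// all when C is not a digit.
inline std::optional<int> parseDigit(char C) {
  if (C < '0' || C > '9')
    return std::nullopt; // previously spelled None
  return C - '0';
}

// Member names on the standard type versus the deprecated spellings:
//   has_value()  replaces hasValue()
//   value()      replaces getValue()
//   value_or(x)  replaces getValueOr(x)
inline int digitOrMinusOne(char C) { return parseDigit(C).value_or(-1); }
```

`std::optional` requires C++17.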
Kazu Hirata
a1580d7b59 [clang] Add #include <optional> (NFC)
This patch adds #include <optional> to those files containing
llvm::Optional<...> or Optional<...>.

I'll post a separate patch to actually replace llvm::Optional with
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-14 11:07:21 -08:00
Corentin Jabot
0d6b26b4d3 [Clang] Fix a crash when encountering an ill-formed delimited UCN.
\u<DIGIT>{...} was incorrectly parsed as a valid UCN instead
of emitting a diagnostic, causing an assertion failure.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D139889
2023-01-03 20:57:52 +01:00
Krasimir Georgiev
231992d9b8 [clang] silence unused variable warning
No functional changes intended.
2022-12-16 11:22:46 +00:00
Corentin Jabot
31f4859c3e [Clang] Allow additional mathematical symbols in identifiers.
Implement the proposed UAX Profile
"Mathematical notation profile for default identifiers".

This implements a not-yet-approved Unicode proposal for a vetted
UAX31 identifier profile
https://www.unicode.org/L2/L2022/22230-math-profile.pdf

This change mitigates the reported disruption caused
by the implementation of UAX31 in C++ and C2x,
as these mathematical symbols are commonly used in the
scientific community.

Fixes #54732

Reviewed By: tahonermann, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D137051
2022-12-16 10:20:49 +01:00
Fangrui Song
b1df3a2c0b [Support] llvm::Optional => std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-16 08:49:10 +00:00
Argyrios Kyrtzidis
59df56413b [clang/Lexer] Enhance Lexer::getImmediateMacroNameForDiagnostics to return a result from non-file buffers
Use `SourceManager::isWrittenInScratchSpace()` to specifically check for token paste or stringization, instead of
excluding all non-file buffers. This allows diagnostics to mention macro names that were defined from the command-line.

Differential Revision: https://reviews.llvm.org/D140164
2022-12-15 22:46:41 -08:00
Corentin Jabot
dbfe446ef3 [Clang] Implement CWG2640 Allow more characters in an n-char sequence
Reviewed By: #clang-language-wg, aaron.ballman, tahonermann

Differential Revision: https://reviews.llvm.org/D138861
2022-12-13 09:02:52 +01:00
Kazu Hirata
f7dffc28b3 Don't include None.h (NFC)
I've converted all known uses of None to std::nullopt, so we no longer
need to include None.h.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-10 11:24:26 -08:00
Kazu Hirata
5891420e68 [clang] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-03 11:54:46 -08:00
serge-sans-paille
c8ecbaa2eb
[clang] Fix assert message 2022-11-18 10:10:42 +01:00
serge-sans-paille
cb3f8d53e6
[Lexer] Speedup LexTokenInternal
Only reset "NeedsCleaning" flag in case of re-entrant call.
Do not needlessly blank IdentifierInfo. This information will be set
once the token type is picked.

This yields a nice 1% speedup when pre-processing sqlite amalgamation
through:

valgrind --tool=callgrind ./bin/clang -E sqlite3.c -o/dev/null

Differential Revision: https://reviews.llvm.org/D137960
2022-11-16 15:57:32 +01:00
Argyrios Kyrtzidis
aa484c90cf [Lex/DependencyDirectivesScanner] Keep track of the presence of tokens between the last scanned directive and EOF
The directive `dependency_directives_scan::tokens_present_before_eof` is introduced to indicate that there were tokens present between
the last scanned dependency directive and EOF.
This is useful to ensure we correctly identify the macro guards when lexing using the dependency directives.

Differential Revision: https://reviews.llvm.org/D133357
2022-09-07 10:31:29 -07:00
Fangrui Song
3f18f7c007 [clang] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D131346
2022-08-08 09:12:46 -07:00
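For reference, the standard C++17 attribute that replaces the `LLVM_FALLTHROUGH` macro in this commit is used as shown below. `countLevels` is a hypothetical function written only to show the attribute's placement.

```cpp
// Hypothetical example: each case intentionally falls into the next one,
// and [[fallthrough]] marks that intent so the compiler does not warn.
inline int countLevels(int Level) {
  int Count = 0;
  switch (Level) {
  case 2:
    ++Count;
    [[fallthrough]]; // formerly LLVM_FALLTHROUGH
  case 1:
    ++Count;
    [[fallthrough]];
  case 0:
    ++Count;
    break;
  default:
    break;
  }
  return Count;
}
```

The attribute must be written as its own statement, directly before the next `case` label.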
Gabriel Ravier
5674a3c880 Fixed a number of typos
I went over the output of the following mess of a command:

(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z |
 parallel --xargs -0 cat | aspell list --mode=none --ignore-case |
 grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n |
 grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)

and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).

Differential Revision: https://reviews.llvm.org/D130827
2022-08-01 13:13:18 -04:00
Corentin Jabot
ad16268f13 [Clang] Do not check for underscores in isAllowedInitiallyIDChar
isAllowedInitiallyIDChar is only used with non-ASCII code points;
ASCII code points are handled by isAsciiIdentifierStart.
To make that clearer, remove the check for _ from
isAllowedInitiallyIDChar, and assert on ASCII to ensure neither
_ nor $ is passed to this function.

Reviewed By: tahonermann, aaron.ballman

Differential Revision: https://reviews.llvm.org/D130750
2022-07-29 17:46:38 +02:00
Corentin Jabot
aee76cb59c [Clang] Add support for Unicode identifiers (UAX31) in C2x mode.
This implements
N2836 Identifier Syntax using Unicode Standard Annex 31.

The feature was already implemented for C++,
and the semantics are the same.

Unlike C++, there was, afaict, no decision to
backport the feature to older language modes,
so C17 and earlier are not modified and the
code point tables for these language modes are preserved.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D130416
2022-07-23 14:08:08 +02:00
Corentin Jabot
6882ca9aff [Clang] Adjust extension warnings for delimited sequences
WG21 approved delimited escape sequences and named escape
sequences.
Adjust the extension warnings accordingly, and update
the release notes.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D129664
2022-07-14 07:50:58 +02:00
Corentin Jabot
d4892a168f [Clang] Add a warning on invalid UTF-8 in comments.
Introduce an off-by-default `-Winvalid-utf8` warning
that detects invalid UTF-8 code unit sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as it cannot appear in identifiers and other grammar constructs.

The warning is off by default as it's likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to-be-approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059
2022-07-13 10:19:26 +02:00
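To make "invalid UTF-8 code unit sequences" concrete, here is a self-contained validity check. It is a generic sketch of the UTF-8 well-formedness rules, not the code added by this commit, and the names `utf8SequenceLength` and `isValidUtf8` are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Returns the byte length of the well-formed UTF-8 sequence starting at Pos,
// or 0 if the bytes there are ill-formed. Requires Pos < S.size().
inline std::size_t utf8SequenceLength(std::string_view S, std::size_t Pos) {
  auto Byte = [&](std::size_t I) { return static_cast<unsigned char>(S[I]); };
  const unsigned char Lead = Byte(Pos);
  if (Lead < 0x80)
    return 1; // plain ASCII
  std::size_t Len = 0;
  unsigned Min = 0; // smallest code point allowed at this length
  unsigned CP = 0;
  if ((Lead & 0xE0) == 0xC0) { Len = 2; Min = 0x80; CP = Lead & 0x1F; }
  else if ((Lead & 0xF0) == 0xE0) { Len = 3; Min = 0x800; CP = Lead & 0x0F; }
  else if ((Lead & 0xF8) == 0xF0) { Len = 4; Min = 0x10000; CP = Lead & 0x07; }
  else return 0; // stray continuation byte or invalid lead byte
  if (Pos + Len > S.size())
    return 0; // sequence is cut off by the end of the buffer
  for (std::size_t I = 1; I < Len; ++I) {
    const unsigned char C = Byte(Pos + I);
    if ((C & 0xC0) != 0x80)
      return 0; // not a continuation byte
    CP = (CP << 6) | (C & 0x3Fu);
  }
  if (CP < Min) return 0;                     // overlong encoding
  if (CP >= 0xD800 && CP <= 0xDFFF) return 0; // UTF-16 surrogate
  if (CP > 0x10FFFF) return 0;                // beyond the Unicode range
  return Len;
}

// True when every sequence in S (for example, a comment's text) is well formed.
inline bool isValidUtf8(std::string_view S) {
  for (std::size_t Pos = 0; Pos < S.size();) {
    const std::size_t Len = utf8SequenceLength(S, Pos);
    if (Len == 0)
      return false;
    Pos += Len;
  }
  return true;
}
```

A diagnostic in the spirit of `-Winvalid-utf8` would report the position where the length comes back as 0, rather than only returning a boolean.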
Jonas Devlieghere
a262f4dbd7 Revert "[Clang] Add a warning on invalid UTF-8 in comments."
This reverts commit cc309721d20c8e544ae7a10a66735ccf4981a11c because it
breaks the following tests on GreenDragon:

  TestDataFormatterObjCCF.py
  TestDataFormatterObjCExpr.py
  TestDataFormatterObjCKVO.py
  TestDataFormatterObjCNSBundle.py
  TestDataFormatterObjCNSData.py
  TestDataFormatterObjCNSError.py
  TestDataFormatterObjCNSNumber.py
  TestDataFormatterObjCNSURL.py
  TestDataFormatterObjCPlain.py
  TestDataFormatterObjNSException.py

https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45288/
2022-07-12 15:22:29 -07:00