247 Commits

Author SHA1 Message Date
Timothy Herchen
8366dc207a
[clang] Don't warn on zero literals with -std=c2y (#149688)
Fixes #149669; the old check compared with the end of the literal, but
we can just check that after parsing digits, we're pointing to one
character past the token start.
2025-07-21 12:15:09 -04:00
Naveen Seth Hanig
f482b9677e
[C2y] Handle FP-suffixes on prefixed octals (#141230) (#141695)
Fixes https://github.com/llvm/llvm-project/issues/141230.

Currently, prefixed octal literals used with floating-point suffixes are
not
rejected, causing Clang to crash.
This adds proper handling to reject invalid literals such as `0o0.1` or
`0.0e1`.

No release note because this is fixing an issue with a new change.
2025-06-06 09:47:34 +02:00
Aaron Ballman
f2b8539803
[C2y] Correctly handle 0 in the preprocessor (#137844)
We do not diagnose 0 as a deprecated octal literal outside of the
preprocessor. This fixes a bug where we were accidentally diagnosing 0
from a preprocessor conditional, however.

No release note because this is fixing an issue with a new change.
2025-04-30 06:55:20 -04:00
Aaron Ballman
9cf46fb230
[C2y] Add octal prefixes, deprecate unprefixed octals (#131626)
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It
also deprecates use of octal literals without a prefix, except for the
literal 0.

This feature is being exposed as an extension in older C language modes
as well as in all C++ language modes.
2025-03-18 07:28:59 -04:00
Saleem Abdulrasool
dcec224240
Lex: add support for i128 and ui128 suffixes (#130993)
Microsoft's compiler supports an extension for 128-bit literals. This is
referenced in `intsafe.h` which is included transitievly. When building
with modules, the literal parsing causes a failure due to the missing
support for the extension. To alleviate this issue, support parsing this
literal, especially now that there is the BitInt extension.

Take the opportunity to tighten up the code slightly by ensuring that we
do not access out-of-bounds characters when lexing the token.
2025-03-13 16:36:07 -07:00
Aaron Ballman
8f0c865d10
Fix a crash with empty escape sequences when lexing (#102339)
The utilities we use for lexing string and character literals can be run
in a mode where we pass a null pointer for the diagnostics engine. This
mode is used by the format string checkers, for example. However, there
were two places that failed to account for a null diagnostic engine
pointer: `\o{}` and `\x{}`.

This patch adds a check for a null pointer and correctly handles
fallback behavior.

Fixes #102218
2024-08-08 07:32:39 -04:00
Mike Rice
54b61adc0c
[NFC][clang] Replace unreachable code in literal processing with assert (#96579)
Address static verifier concerns about dead code in DoubleUnderscore
check. Replace it with an assert.
2024-06-25 07:14:40 -07:00
Serge Pavlov
f4066fa2dd
[clang] Use constant rounding mode for floating literals (#90877)
Conversion of floating-point literal to binary representation must be
made using constant rounding mode, which can be changed using pragma
FENV_ROUND. For example, the literal "0.1F" should be representes by
either 0.099999994 or 0.100000001 depending on the rounding direction.
2024-05-17 12:06:34 +07:00
js324
ca1f1c9572
[BitInt] Expose a _BitInt literal suffix in C++ (#86586)
This exposes _BitInt literal suffixes __wb and u__wb as an extension
in C++. There is a new Extension warning, and the tests are
essentially the same as the existing _BitInt literal tests for C but
with a few additional cases.

Fixes #85223
2024-04-22 14:42:57 -04:00
PiJoules
3d2a918831
[clang] Fixes inf loop parsing fixed point literal (#83071)
Clang was incorrectly finding the start of the exponent in a fixed point
hex literal. It would unconditionally find the first `e/E/p/P` in a
constant regardless of if it were hex or not and parser the remaining
digits as an APInt. In a debug build, this would be caught by an
assertion, but in a release build, the assertion is removed and we'd end
up in an infinite loop.

Fixes #83050
2024-02-26 14:47:16 -08:00
Aaron Ballman
8e24bc096d
[C23] Do not diagnose binary literals as an extension (#81658)
We previously would diagnose them as a GNU extension in C mode, but they
are now a feature of C23. The -Wgnu-binary-literal warning group no
longer controls any diagnostics as this is no longer a GNU extension.
The warning group is retained as a noop to help avoid "unknown warning"
diagnostics.

This also adds the companion compatibility warning which existed for C++
but not for C.

Fixes https://github.com/llvm/llvm-project/issues/72017
2024-02-14 09:08:28 -05:00
Chris B
2630d72cb3
[HLSL] Support vector swizzles on scalars (#67700)
HLSL supports vector swizzles on scalars by implicitly converting the
scalar to a single-element vector. This syntax is a convienent way to
initialize vectors based on filling a scalar value.

There are two parts of this change. The first part in the Lexer splits
numeric constant tokens when a `.x` or `.r` suffix is encountered. This
splitting is a bit hacky but allows the numeric constant to be parsed
separately from the vector element expression. There is an ambiguity
here with the `r` suffix used by fixed point types, however fixed point
types aren't supported in HLSL so this should not cause any exposable
problems (a separate issue has been filed to track validating language
options for HLSL: #67689).

The second part of this change is in Sema::LookupMemberExpr. For HLSL,
if the base type is a scalar, we implicit cast the scalar to a
one-element vector then call back to perform the vector lookup.

Fixes #56658 and #67511
2023-11-29 11:25:02 -06:00
Aaron Ballman
9c4ade0623 [C23] Rename C2x->C23 in diagnostics
This renames C2x to C23 in diagnostic identifiers and messages. The
changes were made mechanically.
2023-08-11 08:42:01 -04:00
Aaron Ballman
0ce056a814 [C23] Rename C2x -> C23; NFC
This does the rename for most internal uses of C2x, but does not rename
or reword diagnostics (those will be done in a follow-up).

I also updated standards references and citations to the final wording
in the standard.
2023-08-11 07:43:43 -04:00
Corentin Jabot
49e0495feb [Clang] Produce a warning instead of an error in unevaluated strings before C++26
Emiting an error on unexpected encoding prefix - which was allowed before C++26 -
caused build errors for a few users.
This downgrade the error to a warning on older language modes and C.

Reviewed By: aaron.ballman, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D156596
2023-08-10 09:43:20 +02:00
Corentin Jabot
ab4e4a6985 Revert "[Clang] Produce a warning instead of an error in unevaluated strings before C++26"
Causes build failure on bots after rebase.

This reverts commit 20e01167b15aa17dac09e4742909a7138eca7afc.
2023-08-10 08:47:57 +02:00
Corentin Jabot
20e01167b1 [Clang] Produce a warning instead of an error in unevaluated strings before C++26
Reviewed By: aaron.ballman, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D156596
2023-08-10 08:32:42 +02:00
Richard Dzenis
66c43fbd27 Enable concatenation of predefined identifiers
Predefined identifiers like __FUNCTION__ are treated like string
literals in MSVC, which means they can be concatentated together with
an adjacent string literal. Clang now supports this behavior as well,
in Microsoft extensions mode.

Fixes https://github.com/llvm/llvm-project/issues/63563
Differential Revision: https://reviews.llvm.org/D153914
2023-08-09 13:55:03 -04:00
Corentin Jabot
68410fbed7 Fix handling of medial hyphens in Unicode Names.
In a Unicode name was stored in a way that caused
a medial hyphen to be at the end of a a chunk, it would not
be properly ignored by the loose matching algorithm.

For example if `LEFT-TO-RIGHT OVERRIDE` was stored as
`LEFT-` [...], the `-` would not be ignored.

The generators now ensures nodes are not cut accross
medial hyphen boundaries.

Fixes #64161

Differential Revision: https://reviews.llvm.org/D156518
2023-07-28 15:09:08 +02:00
Corentin Jabot
304e974694 [Clang] Correctly handle $, @, and ` when represented as UCN
This covers
 * P2558R2 (C++, wg21.link/P2558)
 * N2701 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm)
 * N3124 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3124.pdf)

This patch
 * Disallow representing $ as a UCN in all language mode, which did not
   properly work (see GH62133), and which in made ill-formed in
   C++ and C by P2558 and N3124 respectively
 * Allow a UCN for any character in C2X, in string and character
   literals

Fixes #62133

Reviewed By: #clang-language-wg, tahonermann

Differential Revision: https://reviews.llvm.org/D153621
2023-07-12 08:03:23 +02:00
Sergio Afonso
63ca93c7d1
[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.

`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.

Differential Revision: https://reviews.llvm.org/D154591
2023-07-10 14:14:16 +01:00
Corentin Jabot
95f50964fb Implement P2361 Unevaluated string literals
This patch proposes to handle in an uniform fashion
the parsing of strings that are never evaluated,
in asm statement, static assert, attrributes, extern,
etc.

Unevaluated strings are UTF-8 internally and so currently
behave as narrow strings, but these things will diverge with
D93031.

The big question both for this patch and the P2361 paper
is whether we risk breaking code by disallowing
encoding prefixes in this context.
I hope this patch may allow to gather some data on that.

Future work:
Improve the rendering of unicode characters, line break
and so forth in static-assert messages

Reviewed By: aaron.ballman, shafik

Differential Revision: https://reviews.llvm.org/D105759
2023-07-07 13:30:27 +02:00
Mark de Wever
ba15d186e5 [clang] Use -std=c++23 instead of -std=c++2b
During the ISO C++ Committee meeting plenary session the C++23 Standard
has been voted as technical complete.

This updates the reference to c++2b to c++23 and updates the __cplusplus
macro.

Drive-by fixes c++1z -> c++17 and c++2a -> c++20 when seen.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D149553
2023-05-04 19:19:52 +02:00
Timm Bäder
9bbe25eca5 [clang][Lex][NFC] Use a range for loop in StringLiteralParser 2023-04-20 11:51:05 +02:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Sergei Barannikov
574e417460 [clang] Fix a bug that allowed some overflowing octal escape sequences
Reviewed By: cor3ntin

Differential Revision: https://reviews.llvm.org/D144100
2023-02-16 15:19:24 +03:00
Shilei Tian
9c2cfaaada [Clang][OpenMP] Allow f16 literal suffix when compiling OpenMP target offloading for NVPTX
Fix #58087.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D142075
2023-01-19 22:24:38 -05:00
Alexandre Ganea
eded23dfda [Clang] Silence a "unused variable" warning when building with MSVC 2023-01-09 23:45:20 -05:00
Fangrui Song
b1df3a2c0b [Support] llvm::Optional => std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-16 08:49:10 +00:00
Corentin Jabot
dbfe446ef3 [Clang] Implement CWG2640 Allow more characters in an n-char sequence
Reviewed By: #clang-language-wg, aaron.ballman, tahonermann

Differential Revision: https://reviews.llvm.org/D138861
2022-12-13 09:02:52 +01:00
Kadir Cetinkaya
36f77e20d9
Revert "Revert "[clang][Lex] Fix a crash on malformed string literals""
This reverts commit feea7ef23cb1bef92d363cc613052f8f3a878fc2.
Drops the test case, see https://reviews.llvm.org/D135161#3839510
2022-10-06 11:41:18 +02:00
Kadir Cetinkaya
feea7ef23c
Revert "[clang][Lex] Fix a crash on malformed string literals"
This reverts commit 36a200208facf58d454c9b7253c956c2f2a8b946.
2022-10-05 10:37:32 +02:00
Kadir Cetinkaya
36a200208f
[clang][Lex] Fix a crash on malformed string literals
Differential Revision: https://reviews.llvm.org/D135161
2022-10-05 09:55:50 +02:00
Fangrui Song
3f18f7c007 [clang] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D131346
2022-08-08 09:12:46 -07:00
Corentin Jabot
6882ca9aff [Clang] Adjust extension warnings for delimited sequences
WG21 approved delimited escape sequences and named escape
sequences.
Adjust the extension warnings accordingly, and update
the release notes.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D129664
2022-07-14 07:50:58 +02:00
Corentin Jabot
a9a60f20e6 [Clang] Rename StringLiteral::isAscii() => isOrdinary() [NFC]
"Ascii" StringLiteral instances are actually narrow strings
that are UTF-8 encoded and do not have an encoding prefix.
(UTF8 StringLiteral are also UTF-8 encoded strings, but with
the u8 prefix.

To avoid possible confusion both with actuall ASCII strings,
and with future works extending the set of literal encodings
supported by clang, this rename StringLiteral::isAscii() to
isOrdinary(), matching C++ standard terminology.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D128762
2022-06-29 18:28:51 +02:00
Corentin Jabot
c92056d038 [Clang][C++23] P2071 Named universal character escapes
Implements [[ https://wg21.link/p2071r1  | P2071 Named Universal Character Escapes ]] - as an extension in all language mode, the patch  not warn in c++23 mode will be done later once this paper is plenary approved (in July).

We add

 * A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)`
 * A set of functions in `Unicode.h` to query that data, including

   * A function to find an exact match of a given Unicode character name
   * A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching
   * A function returning the best matching codepoint for a given string per edit distance

 * Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits
 * Support of `\N{}` as UCN with loose matching diagnostics/fixits.

Loose matching is considered an error to match closely the semantics of P2071.

The generated data contributes to 280kB of data to the binaries.

`UnicodeData.txt` and `NameAliases.txt`  are not committed to the repository in this patch, and regenerating the data is a manual process.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D123064
2022-06-25 19:03:33 +02:00
Sam McCall
817550919e [Lex] Don't assert when decoding invalid UCNs.
Currently if a lexically-valid UCN encodes an invalid codepoint, then we
diagnose that, and then hit an assertion while trying to decode it.

Since there isn't anything preventing us reaching this state, remove the
assertion. expandUCNs("X\UAAAAAAAAY") will produce "XY".

Differential Revision: https://reviews.llvm.org/D125059
2022-05-06 08:51:42 +02:00
Aaron Ballman
9e3e85ac6e Silence -Wlogical-op-parentheses and fix a logic bug while doing so 2022-03-14 10:13:39 -04:00
Aaron Ballman
8cba72177d Implement literal suffixes for _BitInt
WG14 adopted N2775 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2775.pdf)
at our Feb 2022 meeting. This paper adds a literal suffix for
bit-precise types that automatically sizes the bit-precise type to be
the smallest possible legal _BitInt type that can represent the literal
value. The suffix chosen is wb (for a signed bit-precise type) which
can be combined with the u suffix (for an unsigned bit-precise type).

The preprocessor continues to operate as-if all integer types were
intmax_t/uintmax_t, including bit-precise integer types. It is a
constraint violation if the bit-precise literal is too large to fit
within that type in the context of the preprocessor (when still using
a pp-number preprocessing token), but it is not a constraint violation
in other circumstances. This allows you to make bit-precise integer
literals that are wider than what the preprocessor currently supports
in order to initialize variables, etc.
2022-03-14 09:24:19 -04:00
Daan De Meyer
5a6dac66db LiteralSupport: Don't assert() on invalid input
When using clangd, it's possible to trigger assertions in
NumericLiteralParser and CharLiteralParser when switching git branches.
This commit removes the initial asserts on invalid input and replaces
those asserts with the error handling mechanism from those respective
classes instead. This allows clangd to gracefully recover without
crashing.

See https://github.com/clangd/clangd/issues/888 for more information
on the clangd crashes.
2021-11-17 23:51:30 +00:00
Kazu Hirata
dccfaddc6b [clang] Use StringRef::contains (NFC) 2021-10-21 08:58:19 -07:00
Jay Foad
d933adeaca [APInt] Stop using soft-deprecated constructors and methods in clang. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.

Differential Revision: https://reviews.llvm.org/D110808
2021-10-04 09:38:11 +01:00
Aaron Ballman
38d09080c9 Removing a default constructor argument; NFC
The argument is always used with its default value, so remove the
argument entirely.
2021-09-27 09:41:28 -04:00
Corentin Jabot
274adcb866 Implement delimited escape sequences.
\x{XXXX} \u{XXXX} and \o{OOOO} are accepted in all languages mode
in characters and string literals.

This is a feature proposed for both C++ (P2290R1) and C (N2785). The
papers have been seen by both committees but are not yet adopted into
either standard. However, they do have support from both committees.
2021-09-15 09:54:49 -04:00
Corentin Jabot
bdeda959ab Make wide multi-character character literals ill-formed
This implements P2362, which has not yet been approved by the
C++ committee, but because wide-multi character literals are
implementation defined, clang might not have to wait for WG21.

This change is also being applied in C mode as the behavior is
implementation-defined in C as well and there's no benefit to
having different rules between the languages.

The other part of P2362, making non-representable character
literals ill-formed, is already implemented by clang
2021-08-20 11:10:53 -04:00
Jan Svoboda
aa245ddd46 [clang][lex] NFC: Add explicit cast to silence -Wsign-compare 2021-07-22 12:21:12 +02:00
Anton Bikineev
dc7ebd2cb0 [C++2b] Support size_t literals
This adds support for C++2b's z/uz suffixes for size_t literals (P0330).
2021-03-31 13:36:23 +00:00
Jan Svoboda
23cc8ebf59 [clang][lex] Speculative fix for buffer overrun on raw string parse
This attempts to fix a (non-deterministic) buffer overrun when parsing raw string literals during modular build.

Similar fix to 4e5b5c36f47c9a406ea7f6b4f89fae477693973a.

Reviewed By: beccadax

Differential Revision: https://reviews.llvm.org/D94950
2021-03-15 15:13:47 +01:00
Chuyang Chen
8fa45e1fd5 Convert diagnostics about multi-character literals from extension to warning
This addresses PR46797.
2020-10-06 08:47:17 -04:00