Fixes#149669; the old check compared with the end of the literal, but
we can just check that after parsing digits, we're pointing to one
character past the token start.
Fixes https://github.com/llvm/llvm-project/issues/141230.
Currently, prefixed octal literals used with floating-point suffixes are
not
rejected, causing Clang to crash.
This adds proper handling to reject invalid literals such as `0o0.1` or
`0.0e1`.
No release note because this is fixing an issue with a new change.
We do not diagnose 0 as a deprecated octal literal outside of the
preprocessor. This fixes a bug where we were accidentally diagnosing 0
from a preprocessor conditional, however.
No release note because this is fixing an issue with a new change.
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It
also deprecates use of octal literals without a prefix, except for the
literal 0.
This feature is being exposed as an extension in older C language modes
as well as in all C++ language modes.
Microsoft's compiler supports an extension for 128-bit literals. This is
referenced in `intsafe.h` which is included transitievly. When building
with modules, the literal parsing causes a failure due to the missing
support for the extension. To alleviate this issue, support parsing this
literal, especially now that there is the BitInt extension.
Take the opportunity to tighten up the code slightly by ensuring that we
do not access out-of-bounds characters when lexing the token.
The utilities we use for lexing string and character literals can be run
in a mode where we pass a null pointer for the diagnostics engine. This
mode is used by the format string checkers, for example. However, there
were two places that failed to account for a null diagnostic engine
pointer: `\o{}` and `\x{}`.
This patch adds a check for a null pointer and correctly handles
fallback behavior.
Fixes#102218
Conversion of floating-point literal to binary representation must be
made using constant rounding mode, which can be changed using pragma
FENV_ROUND. For example, the literal "0.1F" should be representes by
either 0.099999994 or 0.100000001 depending on the rounding direction.
This exposes _BitInt literal suffixes __wb and u__wb as an extension
in C++. There is a new Extension warning, and the tests are
essentially the same as the existing _BitInt literal tests for C but
with a few additional cases.
Fixes#85223
Clang was incorrectly finding the start of the exponent in a fixed point
hex literal. It would unconditionally find the first `e/E/p/P` in a
constant regardless of if it were hex or not and parser the remaining
digits as an APInt. In a debug build, this would be caught by an
assertion, but in a release build, the assertion is removed and we'd end
up in an infinite loop.
Fixes#83050
We previously would diagnose them as a GNU extension in C mode, but they
are now a feature of C23. The -Wgnu-binary-literal warning group no
longer controls any diagnostics as this is no longer a GNU extension.
The warning group is retained as a noop to help avoid "unknown warning"
diagnostics.
This also adds the companion compatibility warning which existed for C++
but not for C.
Fixes https://github.com/llvm/llvm-project/issues/72017
HLSL supports vector swizzles on scalars by implicitly converting the
scalar to a single-element vector. This syntax is a convienent way to
initialize vectors based on filling a scalar value.
There are two parts of this change. The first part in the Lexer splits
numeric constant tokens when a `.x` or `.r` suffix is encountered. This
splitting is a bit hacky but allows the numeric constant to be parsed
separately from the vector element expression. There is an ambiguity
here with the `r` suffix used by fixed point types, however fixed point
types aren't supported in HLSL so this should not cause any exposable
problems (a separate issue has been filed to track validating language
options for HLSL: #67689).
The second part of this change is in Sema::LookupMemberExpr. For HLSL,
if the base type is a scalar, we implicit cast the scalar to a
one-element vector then call back to perform the vector lookup.
Fixes#56658 and #67511
This does the rename for most internal uses of C2x, but does not rename
or reword diagnostics (those will be done in a follow-up).
I also updated standards references and citations to the final wording
in the standard.
Emiting an error on unexpected encoding prefix - which was allowed before C++26 -
caused build errors for a few users.
This downgrade the error to a warning on older language modes and C.
Reviewed By: aaron.ballman, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D156596
Predefined identifiers like __FUNCTION__ are treated like string
literals in MSVC, which means they can be concatentated together with
an adjacent string literal. Clang now supports this behavior as well,
in Microsoft extensions mode.
Fixes https://github.com/llvm/llvm-project/issues/63563
Differential Revision: https://reviews.llvm.org/D153914
In a Unicode name was stored in a way that caused
a medial hyphen to be at the end of a a chunk, it would not
be properly ignored by the loose matching algorithm.
For example if `LEFT-TO-RIGHT OVERRIDE` was stored as
`LEFT-` [...], the `-` would not be ignored.
The generators now ensures nodes are not cut accross
medial hyphen boundaries.
Fixes#64161
Differential Revision: https://reviews.llvm.org/D156518
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
This patch proposes to handle in an uniform fashion
the parsing of strings that are never evaluated,
in asm statement, static assert, attrributes, extern,
etc.
Unevaluated strings are UTF-8 internally and so currently
behave as narrow strings, but these things will diverge with
D93031.
The big question both for this patch and the P2361 paper
is whether we risk breaking code by disallowing
encoding prefixes in this context.
I hope this patch may allow to gather some data on that.
Future work:
Improve the rendering of unicode characters, line break
and so forth in static-assert messages
Reviewed By: aaron.ballman, shafik
Differential Revision: https://reviews.llvm.org/D105759
During the ISO C++ Committee meeting plenary session the C++23 Standard
has been voted as technical complete.
This updates the reference to c++2b to c++23 and updates the __cplusplus
macro.
Drive-by fixes c++1z -> c++17 and c++2a -> c++20 when seen.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D149553
WG21 approved delimited escape sequences and named escape
sequences.
Adjust the extension warnings accordingly, and update
the release notes.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D129664
"Ascii" StringLiteral instances are actually narrow strings
that are UTF-8 encoded and do not have an encoding prefix.
(UTF8 StringLiteral are also UTF-8 encoded strings, but with
the u8 prefix.
To avoid possible confusion both with actuall ASCII strings,
and with future works extending the set of literal encodings
supported by clang, this rename StringLiteral::isAscii() to
isOrdinary(), matching C++ standard terminology.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D128762
Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] - as an extension in all language mode, the patch not warn in c++23 mode will be done later once this paper is plenary approved (in July).
We add
* A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)`
* A set of functions in `Unicode.h` to query that data, including
* A function to find an exact match of a given Unicode character name
* A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching
* A function returning the best matching codepoint for a given string per edit distance
* Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits
* Support of `\N{}` as UCN with loose matching diagnostics/fixits.
Loose matching is considered an error to match closely the semantics of P2071.
The generated data contributes to 280kB of data to the binaries.
`UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D123064
Currently if a lexically-valid UCN encodes an invalid codepoint, then we
diagnose that, and then hit an assertion while trying to decode it.
Since there isn't anything preventing us reaching this state, remove the
assertion. expandUCNs("X\UAAAAAAAAY") will produce "XY".
Differential Revision: https://reviews.llvm.org/D125059
WG14 adopted N2775 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2775.pdf)
at our Feb 2022 meeting. This paper adds a literal suffix for
bit-precise types that automatically sizes the bit-precise type to be
the smallest possible legal _BitInt type that can represent the literal
value. The suffix chosen is wb (for a signed bit-precise type) which
can be combined with the u suffix (for an unsigned bit-precise type).
The preprocessor continues to operate as-if all integer types were
intmax_t/uintmax_t, including bit-precise integer types. It is a
constraint violation if the bit-precise literal is too large to fit
within that type in the context of the preprocessor (when still using
a pp-number preprocessing token), but it is not a constraint violation
in other circumstances. This allows you to make bit-precise integer
literals that are wider than what the preprocessor currently supports
in order to initialize variables, etc.
When using clangd, it's possible to trigger assertions in
NumericLiteralParser and CharLiteralParser when switching git branches.
This commit removes the initial asserts on invalid input and replaces
those asserts with the error handling mechanism from those respective
classes instead. This allows clangd to gracefully recover without
crashing.
See https://github.com/clangd/clangd/issues/888 for more information
on the clangd crashes.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.
Differential Revision: https://reviews.llvm.org/D110808
\x{XXXX} \u{XXXX} and \o{OOOO} are accepted in all languages mode
in characters and string literals.
This is a feature proposed for both C++ (P2290R1) and C (N2785). The
papers have been seen by both committees but are not yet adopted into
either standard. However, they do have support from both committees.
This implements P2362, which has not yet been approved by the
C++ committee, but because wide-multi character literals are
implementation defined, clang might not have to wait for WG21.
This change is also being applied in C mode as the behavior is
implementation-defined in C as well and there's no benefit to
having different rules between the languages.
The other part of P2362, making non-representable character
literals ill-formed, is already implemented by clang
This attempts to fix a (non-deterministic) buffer overrun when parsing raw string literals during modular build.
Similar fix to 4e5b5c36f47c9a406ea7f6b4f89fae477693973a.
Reviewed By: beccadax
Differential Revision: https://reviews.llvm.org/D94950