This reverts commit f2583f3acf596cc545c8c0e3cb28e712f4ebf21b.
There is a large body of non-conforming C-like code using format strings
like this:
#define PRIuS "zu"
void h(size_t foo, size_t bar) {
printf("foo is %"PRIuS", bar is %"PRIuS, foo, bar);
}
Rejecting this code would be very disruptive. We could decide to do
that, but it's sufficiently disruptive that I think it requires
gathering more community consensus with an RFC, and Aaron indicated [1]
it's OK to revert for now so continuous testing systems can see past
this issue while we decide what to do.
[1] https://reviews.llvm.org/D153156#4607717
For applications like clangd, the preamble remains an important optimization
when editing a module definition. The global module fragment is a good fit for
it as it by definition contains only preprocessor directives.
Before this patch, we would terminate the preamble immediately at the "module"
keyword.
Differential Revision: https://reviews.llvm.org/D158439
This does the rename for most internal uses of C2x, but does not rename
or reword diagnostics (those will be done in a follow-up).
I also updated standards references and citations to the final wording
in the standard.
During the ISO C++ Committee meeting plenary session the C++23 Standard
has been voted as technical complete.
This updates the reference to c++2b to c++23 and updates the __cplusplus
macro.
Drive-by fixes c++1z -> c++17 and c++2a -> c++20 when seen.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D149553
This ensures we get the correct FileCharacteristic during scanning. In a
yet-to-be-upstreamed branch this fixes observable failures, but it's
also good to handle this on principle: the FileCharacteristic is a
property of the file that is observable in the scanner, so there is
nothing preventing us from depending on it.
rdar://108627403
Differential Revision: https://reviews.llvm.org/D149777
\u<DIGIT>{...} was incorrectly parsed as a valid UCN instead
of emitting a diagnostic, causing an assertion failure.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D139889
Implement the proposed UAX Profile
"Mathematical notation profile for default identifiers".
This implements a not-yet approved Unicode for a vetted
UAX31 identifier profile
https://www.unicode.org/L2/L2022/22230-math-profile.pdf
This change mitigates the reported disruption caused
by the implementation of UAX31 in C++ and C2x,
as these mathematical symbols are commonly used in the
scientific community.
Fixes#54732
Reviewed By: tahonermann, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D137051
Use `SourceManager::isWrittenInScratchSpace()` to specifically check for token paste or stringization, instead of
excluding all non-file buffers. This allows diagnostics to mention macro names that were defined from the command-line.
Differential Revision: https://reviews.llvm.org/D140164
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Only reset "NeedsCleaning" flag in case of re-entrant call.
Do not needlessly blank IdentifierInfo. This information will be set
once the token type is picked.
This yields a nice 1% speedup when pre-processing sqlite amalgamation
through:
valgrind --tool=callgrind ./bin/clang -E sqlite3.c -o/dev/null
Differential Revision: https://reviews.llvm.org/D137960
Directive `dependency_directives_scan::tokens_present_before_eof` is introduced to indicate there were tokens present before
the last scanned dependency directive and EOF.
This is useful to ensure we correctly identify the macro guards when lexing using the dependency directives.
Differential Revision: https://reviews.llvm.org/D133357
I went over the output of the following mess of a command:
(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z |
parallel --xargs -0 cat | aspell list --mode=none --ignore-case |
grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n |
grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)
and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).
Differential Revision: https://reviews.llvm.org/D130827
isAllowedInitiallyIDChar is only used with non-ASCII codepoints,
which are handled by isAsciiIdentifierStart.
To make that clearer, remove the check for _ from
isAllowedInitiallyIDChar, and assert on ASCII - to ensure neither
_ or $ are passed to this function.
Reviewed By: tahonermann, aaron.ballman
Differential Revision: https://reviews.llvm.org/D130750
This implements
N2836 Identifier Syntax using Unicode Standard Annex 31.
The feature was already implemented for C++,
and the semantics are the same.
Unlike C++ there was, afaict, no decision to
backport the feature in older languages mode,
so C17 and earlier are not modified and the
code point tables for these language modes are conserved.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D130416
WG21 approved delimited escape sequences and named escape
sequences.
Adjust the extension warnings accordingly, and update
the release notes.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D129664
Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.
Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.
The warning is off by default as its likely to be somewhat disruptive
otherwise.
This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D128059
This reverts commit cc309721d20c8e544ae7a10a66735ccf4981a11c because it
breaks the following tests on GreenDragon:
TestDataFormatterObjCCF.py
TestDataFormatterObjCExpr.py
TestDataFormatterObjCKVO.py
TestDataFormatterObjCNSBundle.py
TestDataFormatterObjCNSData.py
TestDataFormatterObjCNSError.py
TestDataFormatterObjCNSNumber.py
TestDataFormatterObjCNSURL.py
TestDataFormatterObjCPlain.py
TestDataFormatterObjNSException.py
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45288/
Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.
Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.
The warning is off by default as its likely to be somewhat disruptive
otherwise.
This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D128059
Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.
Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.
The warning is off by default as its likely to be somewhat disruptive
otherwise.
This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D128059
This reverts commit 4174f0ca618b467571b43cff12cbe4c4239670f8.
Also revert follow-up "[Clang] Fix invalid utf-8 detection"
This reverts commit bf45e27a676d87944f1f13d5f0d0f39935fc4010.
The second commit broke tests, see comments on
https://reviews.llvm.org/D129223, and it sounds like the first
commit isn't valid without the second one. So reverting both for now.
Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.
Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.
The warning is off by default as its likely to be somewhat disruptive
otherwise.
This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D128059
Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.
Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.
The warning is off by default as its likely to be somewhat disruptive
otherwise.
This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D128059
Otherwise a header may be erroneously marked as having a header macro guard and won't get re-included.
Differential Revision: https://reviews.llvm.org/D128772
Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] - as an extension in all language mode, the patch not warn in c++23 mode will be done later once this paper is plenary approved (in July).
We add
* A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)`
* A set of functions in `Unicode.h` to query that data, including
* A function to find an exact match of a given Unicode character name
* A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching
* A function returning the best matching codepoint for a given string per edit distance
* Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits
* Support of `\N{}` as UCN with loose matching diagnostics/fixits.
Loose matching is considered an error to match closely the semantics of P2071.
The generated data contributes to 280kB of data to the binaries.
`UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D123064
This is a commit with the following changes:
* Remove `ExcludedPreprocessorDirectiveSkipMapping` and related functionality
Removes `ExcludedPreprocessorDirectiveSkipMapping`; its intended benefit for fast skipping of excluded directived blocks
will be superseded by a follow-up patch in the series that will use dependency scanning lexing for the same purpose.
* Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources
Replaces the "source minimization" mechanism with a mechanism that produces lexed dependency directives tokens.
* Make the special lexing for dependency scanning a first-class feature of the `Preprocessor` and `Lexer`
This is bringing the following benefits:
* Full access to the preprocessor state during dependency scanning. E.g. a component can see what includes were taken and where they were located in the actual sources.
* Improved performance for dependency scanning. Measurements with a release+thin-LTO build shows ~ -11% reduction in wall time.
* Opportunity to use dependency scanning lexing to speed-up skipping of excluded conditional blocks during normal preprocessing (as follow-up, not part of this patch).
For normal preprocessing measurements show differences are below the noise level.
Since, after this change, we don't minimize sources and pass them in place of the real sources, `DependencyScanningFilesystem` is not technically necessary, but it has valuable performance benefits for caching file `stat`s along with the results of scanning the sources. So the setup of using the `DependencyScanningFilesystem` during a dependency scan remains.
Differential Revision: https://reviews.llvm.org/D125486
Differential Revision: https://reviews.llvm.org/D125487
Differential Revision: https://reviews.llvm.org/D125488
> Includes regression test for problem noted by @hans.
> is reverts commit 973de71.
>
> Differential Revision: https://reviews.llvm.org/D106898
Feature implemented as-is is fairly expensive and hasn't been used by
libc++. A potential reimplementation is possible if libc++ become
interested in this feature again.
Differential Revision: https://reviews.llvm.org/D123885
Given that there is only one external user of Lexer::getLangOpts
we can remove getter entirely without much pain.
Differential Revision: https://reviews.llvm.org/D120404