When reviewing #147156, the reviewers pointed out that we didn't need to
support the trigraph. The code never handled it right.
In the debug build, this kind of input caused the assertion in the
function `countLeadingWhitespace` to fail. The release build without
assertions outputted `?` `?` `/` separated by spaces.
```C
#define A ??/
int i;
```
This is because the code in `countLeadingWhitespace` assumed that the
underlying lexer recognized the entire `??/` sequence as a single token.
In fact, the lexer recognized it as 3 separate tokens. The flag to make
the lexer recognize trigraphs was never enabled.
This patch enables the flag in the underlying lexer. This way, the
program now either turns the trigraph into a single `\` or removes it
altogether if the line is short enough. There are operators like the
`??=` in C#. So the flag is not enabled for all input languages. Instead
the check for the token size is moved from the assert line into the if
line.
The problem was introduced by my own patch 370bee480139 from about 3
years ago. I added code to count the number of characters in the escape
sequence probably just because the block of code used to have a comment
saying someone should add the feature. Maybe I forgot to enable
assertions when I ran the code. I found the problem because reviewing
pull request 145243 made me look at the code again.
Fixes#145226.
Implement
[P2223R2](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2223r2.pdf)
in clang-format to correctly handle cases where a backslash '\\' is
followed by trailing whitespace before the newline.
Previously, `clang-format` failed to properly detect and handle such
cases, leading to misformatted code.
With this, `clang-format` matches the behavior already implemented in
Clang's lexer and `DependencyDirectivesScanner.cpp`, which allow
trailing whitespace after a line continuation in any C++ standard.
There is some code to make sure that C++ keywords that are identifiers
in the other languages are not treated as keywords. Right now, the kind
is set to identifier, and the identifier info is cleared. The latter is
probably so that the code for identifying C++ structures does not
recognize those structures by mistake when formatting a language that
does not have those structures. But we did not find an instance where
the language can have the sequence of tokens, the code tries to parse
the structure as if it is C++ using the identifier info instead of the
token kind, but without checking for the language setting. However,
there are places where the code checks whether the identifier info field
is null or not. They are places where an identifier and a keyword are
treated the same way. For example, the name of a function in
JavaScript. This patch removes the lines that clear the identifier
info. This way, a C++ keyword gets treated in the same way as an
identifier in those places.
JavaScript
New
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Old
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Java
New
```Java
enum union { ABC, CDE }
```
Old
```Java
enum
union { ABC, CDE }
```
This reverts commit 97dcbdef6089175c45e14fcbcf5c88b10233a79a.
There is some code to make sure that C++ keywords that are identifiers
in the other languages are not treated as keywords. Right now, the kind
is set to identifier, and the identifier info is cleared. The latter is
probably so that the code for identifying C++ structures does not
recognize those structures by mistake when formatting a language that
does not have those structures. But we did not find an instance where
the language can have the sequence of tokens, the code tries to parse
the structure as if it is C++ using the identifier info instead of the
token kind, but without checking for the language setting. However,
there are places where the code checks whether the identifier info field
is null or not. They are places where an identifier and a keyword are
treated the same way. For example, the name of a function in
JavaScript. This patch removes the lines that clear the identifier
info. This way, a C++ keyword gets treated in the same way as an
identifier in those places.
JavaScript
New
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Old
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Java
New
```Java
enum union { ABC, CDE }
```
Old
```Java
enum
union { ABC, CDE }
```
A regular expression was used in the lexing process. It made the program
take more than linear time with regards to the length of the input. It
looked like the entire buffer could be scanned for every token lexed.
Now the regular expression is replaced with code. Previously it took 20
minutes for the program to format 125 000 lines of code on my computer.
Now it takes 315 milliseconds.
This reverts commit b92d6dd704d789240685a336ad8b25a9f381b4cc. See
github.com/llvm/llvm-project/commit/b92d6dd704d7#commitcomment-139992444
We should use a tool like Visual Studio to clean up the headers.
This implements the annotation of the values in TableGen.
The main changes are,
- parseTableGenValue(), the simplified parser method for the syntax of
values.
- modified consumeToken() to parseTableGenValue in 'if', 'assert' and
after '='.
- modified parseParens() to call parseTableGenValue inside.
- modified parseSquare() to to call parseTableGenValue inside, with
skipping separator tokens.
- modified parseAngle() to call parseTableGenValue inside, with skipping
separator tokens.
Adds the support for tokens that have forms like unary operators.
- bang operators: `!name`
- cond operator: `!cond`
- numeric literals: `+1`, `-1`
cond operator are one of bang operators but is distinguished because it has very specific syntax.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
The Verilog implication operator `->` is a binary operator meaning
either the left hand side is false or the right hand side is true.
Previously it was treated as the C++ struct member operator.
I didn't even know it existed when I added the operator formatting part.
And I didn't check all the tests for all the operators I added. That is
how the bad test got in.
If a non-keyword identifier is found in TypeNames, then a *, &, or && that
follows it is annotated as TT_PointerOrReference.
Differential Revision: https://reviews.llvm.org/D155273