136 Commits

Author SHA1 Message Date
sstwcw
12a3afe47d
[clang-format] Remove code related to trigraphs (#148640)
When reviewing #147156, the reviewers pointed out that we didn't need to
support the trigraph. The code never handled it right.

In the debug build, this kind of input caused the assertion in the
function `countLeadingWhitespace` to fail. The release build without
assertions outputted `?` `?` `/` separated by spaces.

```C
#define A ??/
  int i;
```

This is because the code in `countLeadingWhitespace` assumed that the
underlying lexer recognized the entire `??/` sequence as a single token.
In fact, the lexer recognized it as 3 separate tokens. The flag to make
the lexer recognize trigraphs was never enabled.

This patch enables the flag in the underlying lexer. This way, the
program now either turns the trigraph into a single `\` or removes it
altogether if the line is short enough. There are operators like the
`??=` in C#. So the flag is not enabled for all input languages. Instead
the check for the token size is moved from the assert line into the if
line.

The problem was introduced by my own patch 370bee480139 from about 3
years ago. I added code to count the number of characters in the escape
sequence probably just because the block of code used to have a comment
saying someone should add the feature. Maybe I forgot to enable
assertions when I ran the code. I found the problem because reviewing
pull request 145243 made me look at the code again.
2025-07-21 15:40:28 +00:00
Owen Pan
c384ec431d
[clang-format] Add MacrosSkippedByRemoveParentheses option (#148345)
This allows RemoveParentheses to skip the invocations of function-like
macros.

Fixes #68354.
Fixes #147780.
2025-07-13 14:29:51 -07:00
Owen Pan
cb52efb893
[clang-format] Split line comments separated by backslashes (#147648)
Fixes #147341
2025-07-10 18:14:45 -07:00
Owen Pan
a5af874503 [clang-format][NFC] Use empty() instead of comparing size() to 0 or 1 2025-07-06 23:54:00 -07:00
Owen Pan
5ccbea9f48
[clang-format][NFC] Replace size() with empty() (#147164) 2025-07-06 14:19:30 -07:00
Naveen Seth Hanig
dd47b845a6
[clang-format] Handle Trailing Whitespace After Line Continuation (P2223R2) (#145243)
Fixes #145226.

Implement
[P2223R2](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2223r2.pdf)
in clang-format to correctly handle cases where a backslash '\\' is
followed by trailing whitespace before the newline.
Previously, `clang-format` failed to properly detect and handle such
cases, leading to misformatted code.

With this, `clang-format` matches the behavior already implemented in
Clang's lexer and `DependencyDirectivesScanner.cpp`, which allow
trailing whitespace after a line continuation in any C++ standard.
2025-06-25 18:13:00 +02:00
Owen Pan
b7f5950bb3
[clang-format] Handle Java text blocks (#141334)
Fix #61954
2025-05-25 15:40:45 -07:00
Owen Pan
eb341f0b04
[clang-format][NFC] FormatTokenLexer.cpp cleanup (#141202) 2025-05-23 13:49:28 -07:00
Owen Pan
8effc8da29 Reland [clang-format] Add OneLineFormatOffRegex option (#137577) 2025-04-30 19:58:59 -07:00
Owen Pan
7752e0a10b Revert "[clang-format] Add OneLineFormatOffRegex option (#137577)"
This reverts commit b8bb1ccb4f9126d1bc9817be24e17f186a75a08b which triggered
an assertion failure in CodeGenTest.TestNonAlterTest.
2025-04-30 00:12:41 -07:00
Owen Pan
b8bb1ccb4f
[clang-format] Add OneLineFormatOffRegex option (#137577)
Close #54334
2025-04-29 19:22:53 -07:00
Owen Pan
9efabbbbe5
[clang-format] Fix a bug in lexing C++ UDL ending in $ (#136476)
Fix #61612
2025-04-22 21:08:09 -07:00
Owen Pan
09c8cfe219
[clang-format][NFC] Add isJava() and isTextProto() in FormatStyle (#135466)
Also remove redundant name qualifiers format::, FormatStyle::, and
LanguageKind::.
2025-04-12 15:04:29 -07:00
sstwcw
ed85822027
[clang-format] Handle C++ keywords in other languages better (#132941)
There is some code to make sure that C++ keywords that are identifiers
in the other languages are not treated as keywords.  Right now, the kind
is set to identifier, and the identifier info is cleared.  The latter is
probably so that the code for identifying C++ structures does not
recognize those structures by mistake when formatting a language that
does not have those structures.  But we did not find an instance where
the language can have the sequence of tokens, the code tries to parse
the structure as if it is C++ using the identifier info instead of the
token kind, but without checking for the language setting.  However,
there are places where the code checks whether the identifier info field
is null or not.  They are places where an identifier and a keyword are
treated the same way.  For example, the name of a function in
JavaScript.  This patch removes the lines that clear the identifier
info.  This way, a C++ keyword gets treated in the same way as an
identifier in those places.

JavaScript

New

```JavaScript
async function
union(
    myparamnameiswaytooloooong) {
}
```

Old

```JavaScript
async function
    union(
        myparamnameiswaytooloooong) {
}
```

Java

New

```Java
enum union { ABC, CDE }
```

Old

```Java
enum
union { ABC, CDE }
```

This reverts commit 97dcbdef6089175c45e14fcbcf5c88b10233a79a.
2025-04-10 12:51:10 +00:00
Owen Pan
97dcbdef60 Revert "[clang-format] Handle C++ keywords in other languages better (#132941)"
This reverts commit ab7cee8a0ecf29fdb47c64c8d431a694d63390d2 which had
formatting errors.
2025-04-01 18:59:12 -07:00
sstwcw
ab7cee8a0e [clang-format] Handle C++ keywords in other languages better (#132941)
There is some code to make sure that C++ keywords that are identifiers
in the other languages are not treated as keywords.  Right now, the kind
is set to identifier, and the identifier info is cleared.  The latter is
probably so that the code for identifying C++ structures does not
recognize those structures by mistake when formatting a language that
does not have those structures.  But we did not find an instance where
the language can have the sequence of tokens, the code tries to parse
the structure as if it is C++ using the identifier info instead of the
token kind, but without checking for the language setting.  However,
there are places where the code checks whether the identifier info field
is null or not.  They are places where an identifier and a keyword are
treated the same way.  For example, the name of a function in
JavaScript.  This patch removes the lines that clear the identifier
info.  This way, a C++ keyword gets treated in the same way as an
identifier in those places.

JavaScript

New

```JavaScript
async function
union(
    myparamnameiswaytooloooong) {
}
```

Old

```JavaScript
async function
    union(
        myparamnameiswaytooloooong) {
}
```

Java

New

```Java
enum union { ABC, CDE }
```

Old

```Java
enum
union { ABC, CDE }
```
2025-03-31 13:54:49 +00:00
Owen Pan
91328dbae9
[clang-format] Correctly annotate user-defined conversion functions (#131434)
Also fix/delete existing invalid/redundant test cases.

Fix #130894
2025-03-16 16:11:39 -07:00
sstwcw
fbef1f835f
[clang-format][NFC] Make formatting Verilog faster (#121139)
A regular expression was used in the lexing process. It made the program
take more than linear time with regards to the length of the input. It
looked like the entire buffer could be scanned for every token lexed.
Now the regular expression is replaced with code. Previously it took 20
minutes for the program to format 125 000 lines of code on my computer.
Now it takes 315 milliseconds.
2025-01-14 15:37:06 +00:00
Owen Pan
04610b901f
[clang-format][NFC] Replace SmallVectorImpl with ArrayRef (#121621) 2025-01-04 16:19:46 -08:00
Owen Pan
1a0d0ae234
[clang-format] Add VariableTemplates option (#121318)
Closes #120148.
2025-01-01 18:24:56 -08:00
Owen Pan
786db636b9
[clang-format] Add KeepFormFeed option (#113268)
Closes #113170.
2024-10-23 19:55:32 -07:00
Owen Pan
688bc958bd
[clang-format] Add TemplateNames option to help parse C++ angles (#109916)
Closes #109912.
2024-10-02 18:10:56 -07:00
Owen Pan
7153a4bbf6
[clang-format] Reimplement InsertNewlineAtEOF (#108513)
Fixes #108333.
2024-09-17 21:16:20 -07:00
Owen Pan
364f988d3f Reland "[clang-format] Fix FormatToken::isSimpleTypeSpecifier() (#91712)"
Remove FormatToken::isSimpleTypeSpecifier() and call
Token::isSimpleTypeSpecifier(LangOpts) instead.
2024-05-13 21:54:23 -07:00
Owen Pan
1fadb2b0c8 Revert "[clang-format] Fix FormatToken::isSimpleTypeSpecifier() (#91712)"
This reverts commits e62ce1f8842c, 5cd280433e8e, and de641e289269 due to
buildbot failures.
2024-05-12 23:15:35 -07:00
Owen Pan
e62ce1f884
[clang-format] Fix FormatToken::isSimpleTypeSpecifier() (#91712)
Remove FormatToken::isSimpleTypeSpecifier() and call
Token::isSimpleTypeSpecifier(LangOpts) instead.
2024-05-10 19:27:02 -07:00
Owen Pan
684f27d37a [clang-format][NFC] Use is instead of getType() == 2024-04-06 01:51:45 -07:00
Owen Pan
b2082a9817 Revert "[clang-format][NFC] Delete 100+ redundant #include lines in .cpp files"
This reverts commit b92d6dd704d789240685a336ad8b25a9f381b4cc. See
github.com/llvm/llvm-project/commit/b92d6dd704d7#commitcomment-139992444

We should use a tool like Visual Studio to clean up the headers.
2024-03-19 21:28:22 -07:00
Owen Pan
6f31cf51df Revert "[clang-format][NFC] Eliminate the IsCpp parameter in all functions (#84599)"
This reverts c3a1eb6207d8 (and the related commit f3c5278efa3b) which makes
cleanupAroundReplacements() no longer thread-safe.
2024-03-19 18:06:59 -07:00
Owen Pan
b92d6dd704 [clang-format][NFC] Delete 100+ redundant #include lines in .cpp files 2024-03-16 22:24:11 -07:00
Owen Pan
c3a1eb6207 Reland [clang-format][NFC] Eliminate the IsCpp parameter in all functions (#84599)
Initialize IsCpp in LeftRightQualifierAlignmentFixer ctor.
2024-03-14 19:44:40 -07:00
Mehdi Amini
b0d1e32ca2
Revert "[clang-format][NFC] Eliminate the IsCpp parameter in all functions" (#85353)
Reverts llvm/llvm-project#84599

This broke the presubmit bot.
2024-03-14 19:33:11 -07:00
Owen Pan
0c07102927
[clang-format][NFC] Eliminate the IsCpp parameter in all functions (#84599) 2024-03-14 18:56:24 -07:00
Owen Pan
61c83e9491 Revert "[clang-format][NFC] Make LangOpts global in namespace Format"
This reverts commit 32e65b0b8a743678974c7ca7913c1d6c41bb0772.

It seems to break some PowerPC bots.

See https://github.com/llvm/llvm-project/pull/81390#issuecomment-1941964803.
2024-02-13 21:02:14 -08:00
Hirofumi Nakamura
6a471611a4
[clang-format] Support of TableGen value annotations. (#80299)
This implements the annotation of the values in TableGen.
The main changes are,

- parseTableGenValue(), the simplified parser method for the syntax of
values.
- modified consumeToken() to parseTableGenValue in 'if', 'assert' and
after '='.
- modified parseParens() to call parseTableGenValue inside.
- modified parseSquare() to to call parseTableGenValue inside, with
skipping separator tokens.
- modified parseAngle() to call parseTableGenValue inside, with skipping
separator tokens.
2024-02-12 23:27:09 +09:00
Owen Pan
32e65b0b8a Reland "[clang-format][NFC] Make LangOpts global in namespace Format (#81390)"
Restore getFormattingLangOpts().
2024-02-11 22:01:23 -08:00
Owen Pan
3dc8ef677d Revert "[clang-format][NFC] Make LangOpts global in namespace Format (#81390)"
This reverts commit 03f571995b4f0c260254955afd16ec44d0764794.

We can't hide getFormattingLangOpts() as it's used by other tools.
2024-02-11 13:08:28 -08:00
Owen Pan
03f571995b
[clang-format][NFC] Make LangOpts global in namespace Format (#81390) 2024-02-11 12:59:05 -08:00
Owen Pan
5609bd83c3 Revert "[clang-format] Update FormatToken::isSimpleTypeSpecifier() (#80241)"
This reverts commit 763139afc19ddf2e0f0265dc828ce8e5fbe92530.

It seems that LangOpts is not initialized before use.
2024-02-09 01:52:41 -08:00
Owen Pan
763139afc1
[clang-format] Update FormatToken::isSimpleTypeSpecifier() (#80241)
Now with a8279a8bc541, we can make the update.
2024-02-08 21:42:29 -08:00
Kazu Hirata
b67ce7e349 [clang] Use StringRef::starts_with (NFC) 2024-01-31 23:54:09 -08:00
Hirofumi Nakamura
0058263600
[clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (#78996)
Adds the support for tokens that have forms like unary operators.
- bang operators:  `!name`
- cond operator: `!cond`
- numeric literals: `+1`, `-1`
cond operator are one of bang operators but is distinguished because it has very specific syntax.
2024-01-31 00:30:37 +09:00
Hirofumi Nakamura
fcb6737f82
[clang-format] Support of TableGen identifiers beginning with a number. (#78571)
TableGen allows the identifiers beginning with a number.
This patch add the support of the recognition of such identifiers.
2024-01-20 21:15:58 +09:00
Hirofumi Nakamura
e3702f6225
[clang-format] TableGen multi line string support. (#78032)
Support the handling of TableGen's multiline string (code) literal.
That has the form, 
[{ this is the string possibly with multi line... }]
2024-01-17 21:20:35 +09:00
Hirofumi Nakamura
0cc31579e0
[clang-format] TableGen keywords support. (#77477)
Add TableGen keywords to the additional keyword list of the formatter.

This pull request is the splited part from
https://github.com/llvm/llvm-project/pull/76059 .
2024-01-11 20:07:49 +01:00
Kazu Hirata
f3dcc2351c
[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00
Owen Pan
4c17452076
[clang-format][NFC] Extend isProto() to also cover LK_TextProto (#73582) 2023-11-29 12:52:01 -08:00
sstwcw
3af82b3962
[clang-format] Add spaces around the Verilog implication operator (#71352)
The Verilog implication operator `->` is a binary operator meaning
either the left hand side is false or the right hand side is true.
Previously it was treated as the C++ struct member operator.

I didn't even know it existed when I added the operator formatting part.
And I didn't check all the tests for all the operators I added. That is
how the bad test got in.
2023-11-29 15:17:59 +00:00
Owen Pan
91c4db0061 [clang-format][NFC] Replace !is() with isNot()
Differential Revision: https://reviews.llvm.org/D158571
2023-08-24 01:27:24 -07:00
Owen Pan
5c106f7b94 [clang-format] Add TypeNames option to disambiguate types/objects
If a non-keyword identifier is found in TypeNames, then a *, &, or && that
follows it is annotated as TT_PointerOrReference.

Differential Revision: https://reviews.llvm.org/D155273
2023-07-18 14:18:40 -07:00