@nico mentioned that FormatTests and BasicTests are small binaries with
few dependencies, so keeping them separate is nice. I broke them out as
distinct test binaries, and they are still pretty small:
$ find tools/clang/unittests/ -type f -name '*Tests' | xargs du -cksh |
sort -nr
708M total
276M tools/clang/unittests/AllClangUnitTests
244M tools/clang/unittests/Interpreter/ClangReplInterpreterTests
167M
tools/clang/unittests/Interpreter/ExceptionTests/ClangReplInterpreterExceptionTests
13M tools/clang/unittests/Format/FormatTests
6.9M tools/clang/unittests/Basic/BasicTests
1.1M tools/clang/unittests/libclang/CrashTests/libclangCrashTests
I also broke out libclangCrashTests and re-enabled the failing test to
resolve llvm#137855.
Formatting this piece of code made the program crash.
```
class TypedVecListRegOperand<RegisterClass Reg, int lanes, string eltsize>
: RegisterOperand<Reg, "printTypedVectorList<" # lanes # ", '"
# eltsize # "'>">;
```
The line starting with the `#` was treated as a separate preprocessor
directive line. Then the code dereferenced a null pointer when it tried
to continue parsing the first line that did not end in a semicolon.
Now the 2 problems are fixed.
There is some code to make sure that C++ keywords that are identifiers
in the other languages are not treated as keywords. Right now, the kind
is set to identifier, and the identifier info is cleared. The latter is
probably so that the code for identifying C++ structures does not
recognize those structures by mistake when formatting a language that
does not have those structures. But we did not find an instance where
the language can have the sequence of tokens, the code tries to parse
the structure as if it is C++ using the identifier info instead of the
token kind, but without checking for the language setting. However,
there are places where the code checks whether the identifier info field
is null or not. They are places where an identifier and a keyword are
treated the same way. For example, the name of a function in
JavaScript. This patch removes the lines that clear the identifier
info. This way, a C++ keyword gets treated in the same way as an
identifier in those places.
JavaScript
New
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Old
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Java
New
```Java
enum union { ABC, CDE }
```
Old
```Java
enum
union { ABC, CDE }
```
This reverts commit 97dcbdef6089175c45e14fcbcf5c88b10233a79a.
This reapplies 5ffd9bdb50b57 (#133545) with fixes.
The BUILD_SHARED_LIBS=ON build was fixed by adding missing LLVM
dependencies to the InterpTests binary in
unittests/AST/ByteCode/CMakeLists.txt .
Pass all the dependencies into add_clang_unittest. This is consistent
with how it is done for LLDB. I borrowed the same named argument list
structure from add_lldb_unittest. This is a necessary step towards
consolidating unit tests into fewer binaries, but seems like a good
refactoring in its own right.
There is some code to make sure that C++ keywords that are identifiers
in the other languages are not treated as keywords. Right now, the kind
is set to identifier, and the identifier info is cleared. The latter is
probably so that the code for identifying C++ structures does not
recognize those structures by mistake when formatting a language that
does not have those structures. But we did not find an instance where
the language can have the sequence of tokens, the code tries to parse
the structure as if it is C++ using the identifier info instead of the
token kind, but without checking for the language setting. However,
there are places where the code checks whether the identifier info field
is null or not. They are places where an identifier and a keyword are
treated the same way. For example, the name of a function in
JavaScript. This patch removes the lines that clear the identifier
info. This way, a C++ keyword gets treated in the same way as an
identifier in those places.
JavaScript
New
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Old
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Java
New
```Java
enum union { ABC, CDE }
```
Old
```Java
enum
union { ABC, CDE }
```
before
```Verilog
wait fork
;
wait fork
;
wait fork
;
```
after
```Verilog
wait fork;
wait fork;
wait fork;
```
The `wait fork` statement should not start a block. Previously the
formatter treated the `fork` part as the start of a new block. Now the
problem is fixed.