This is a continuation of #153494. In a WebAssembly file, the "name"
section contains names for the segments in the data section
(WASM_NAMES_DATA_SEGMENT). We already parse these as symbols, and with
this PR, we now also create sub-sections for each of the segments.
This PR adds support for parsing the data symbols from the WebAssembly
name section, which consists of a name and address range for the
segments in the Wasm data section. Unlike other object file formats,
Wasm has no symbols for referencing items within those segments (i.e.
symbols the user has defined).
Tag types like stucts or enums didn't have a declaration attached to
them. The source locations are present in the IPI stream in
`LF_UDT_MOD_SRC_LINE` records:
```
0x101F | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1C63]
udt = 0x1058, mod = 3, file = 1, line = 0
0x2789 | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1E5A]
udt = 0x1253, mod = 35, file = 93, line = 17069
```
The file is an ID in the string table `/names`:
```
ID | String
1 | '\<unknown>'
12 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\wingdi.h'
93 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\winnt.h'
```
Here, we're not interested in `mod`. This would indicate which module
contributed the UDT.
I was looking at Rustc's PDB and found that it uses `<unknown>` for some
types, so I added a check for that.
This makes two DIA PDB shell tests to work with the native PDB plugin.
---------
Co-authored-by: Michael Buch <michaelbuch12@gmail.com>
This PR adds support for parsing the WebAssembly symbol table. The
symbol table is encoded in the "names" section and contains names and
indexes into other sections. For now we only support parsing function
(code) symbols. The result is that you can set breakpoints by symbol
name, while previously breakpoints by name required debug info (DWARF).
This is also necessary for Swift, which checks for the presence of
`swift_release` as a heuristic to determine if there's a static Swift
stdlib.
Relands #152295.
Checking for the overloads did not account for them being out of order.
For example, [the failed
output](https://github.com/llvm/llvm-project/pull/152295#issuecomment-3177563247)
contained the overloads, but out of order. The last commit here fixes
that by using `-DAG`.
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
This adds the ability for functions to be matched by their basename.
Before, the globals were searched for the name. This works if the full
name is available but fails for basenames.
PDB only includes the full names of functions, so we need to cache all
basenames. This is (again) very similar to
[SymbolFilePDB](b242150b07/lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp (L1291-L1383)).
There are two main differences:
- We can't just access the parent of a function to determine that it's a
member function - we need to check the type of the function, and its
"this type".
- SymbolFilePDB saved the full method name in the map. However, when
searching for methods, only the basename is passed, so the function
never found any methods.
Fixes#51933.
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
Some DIA PDB tests pass with the native plugin already, but didn't test
this. This adds test runs with the native plugin - no functional
changes.
In addition to the x86 calling convention test, there's also
9f102a9004/lldb/test/Shell/SymbolFile/PDB/calling-conventions-arm.test,
but I can't test this.
Fixes a bug that surfaces in frame recognizers.
Details about the bug:
A new frame recognizer is configured to match a specific symbol
(`swift_willThrow`). This is an `extern "C"` symbol defined in a C++
source file. When Swift is built with debug info, the function
`ParseFunctionFromDWARF` will use the debug info to construct a function
name that looks like a C++ declaration (`::swift_willThrow(void *,
SwiftError**)`). The `Mangled` instance will have this string as its
`m_demangled` field, and have _no_ string for its `m_mangled` field.
The result is the frame recognizer would not match the symbol to the
name (`swift_willThrow` != `::swift_willThrow(void *, SwiftError**)`.
By changing `ParseFunctionFromDWARF` to assign both a demangled name and
a mangled, frame recognizers can successfully match symbols in this
configuration.
Languages other than C/C++ don't necessarily emit mangled names in the
`UniqueName` field of type records. Rust specifically emits a unique ID
that doesn't contain the name.
For example, `(i32, i32)` is emitted as
```llvm
!266 = !DICompositeType(
tag: DW_TAG_structure_type, name: "tuple$<i32,i32>", file: !9, size: 64, align: 32,
elements: !267, templateParams: !17, identifier: "19122721b0632fe96c0dd37477674472"
)
```
which results in
```
0x1091 | LF_STRUCTURE [size = 72, hash = 0x1AC67] `tuple$<i32,i32>`
unique name: `19122721b0632fe96c0dd37477674472`
vtable: <no type>, base list: <no type>, field list: 0x1090
options: has unique name, sizeof 8
```
In C++ with Clang and MSVC, a structure similar to this would result in
```
0x136F | LF_STRUCTURE [size = 44, hash = 0x30BE2] `MyTuple`
unique name: `.?AUMyTuple@@`
vtable: <no type>, base list: <no type>, field list: 0x136E
options: has unique name, sizeof 8
```
With this PR, if a `UniqueName` is encountered that couldn't be parsed,
it will fall back to using the undecorated (→ do the same as if the
unique name is empty/unavailable).
I'm not sure how to test this. Maybe compiling the LLVM IR that rustc
emits?
Fixes#152051.
The test for this is artificial as I'm not aware of any upstream targets
that use DW_CFA_val_offset
RegisterContextUnwind::ReadFrameAddress now reports how it's attempting
to obtain the CFA unless all success/failure cases emit logs that
clearly identify the method it was attempting. Previously several of the
existing failure paths emit no message or a message that's
indistinguishable from those on other paths.
Initially suggested in
https://github.com/llvm/llvm-project/pull/149305#issuecomment-3113413702
- this PR adds the setting `plugin.symbol-file.pdb.use-native-reader`.
It doesn't remove support for `LLDB_USE_NATIVE_PDB_READER` to allow some
backwards compatibility. This was the suggested way to use the native
reader - changing that would mean users who set this, now use the DIA
reader. The setting has priority over the environment variable, though.
If the default gets flipped on Windows, the environment variable could
probably be removed as well.
This would make it possible to test both native PDB and DIA PDB in the
API tests (see linked PR).
Previously, `type lookup` for types in namespaces didn't work with the
native PDB plugin, because `FindTypes` would only look for types whose
base name was equal to their full name. PDB/CodeView does not store the
base names in the TPI stream, but the types have their full name (e.g.
`std::thread` instead of `thread`). So `findRecordsByName` would only
return types in the top level namespace.
This PR changes the lookup to go through all types and check their base
name. As that could be a bit expensive, the names are first cached
(similar to the function lookup in the DIA PDB plugin). Potential types
are checked with `TypeQuery::ContextMatches`.
To be able to handle anonymous namespaces, I changed
`TypeQuery::ContextMatches`. The [`TypeQuery`
constructor](9ad7edef42/lldb/source/Symbol/Type.cpp (L76-L79))
inserts all name components as `CompilerContextKind::AnyDeclContext`. To
skip over anonymous namespaces, `ContextMatches` checked if a component
was empty and exactly of kind `Namespace`. For our query, the last check
was always false, so we never skipped anonymous namespaces. DWARF
doesn't have this problem, as it [constructs the context
outside](abe93d9d7e/lldb/source/Plugins/SymbolFile/DWARF/DWARFIndex.cpp (L154-L160))
and has proper information about namespaces. I'm not fully sure if my
change is correct and that it doesn't break other users of `TypeQuery`.
This enables `type lookup <type>` to work on types in namespaces.
However, expressions don't work with this yet, because `FindNamespace`
is unimplemented for native PDB.
In #145967 Clang was taught to emit trap reasons on UBSan traps in debug
info using the same method as `__builtin_verbose_trap`. This patch adds
a test case to make sure that the existing "Verbose Trap StackFrame
Recognizer" recognizes the trap reason and sets the stop reason and
stack frame appropriately.
Part of a GSoC 2025 Project.
%T has been deprecated for about seven years, mostly because it is not
unique to each test which can lead to races. This patch updates the few
remaining tests in lldb that use %T to not use it (either directly using
files or creating their own temp dir). The eventual goal is to remove
support for %T from llvm-lit given few tests use it and it still has
racey behavior.
This patch errors on the side of creating new temp dirs even when not
strictly necessary to avoid needing to update filenames inside filecheck
matchers.
Second attempt at relanding the lldb-rpc-gen tool. This should fix 2
issues:
- An assert that was hitting when building on Linux. The assert would
hit in the server source emitter, specifically when attemping to
determine the storage size for a return type is that is a pointer, but
isn't a const char *, const char ** or void pointer.
The assert would hit when attempting to generate
SBAttachInfo::GetProcessPluginName, which returns a const char *
(meaning it shouldn't have been in the code block for the assert at
all). The reason that it was hitting the assert when generating this
function is that lldb_rpc_gen::TypeIsConstCharPtr was returning false
for this function even though it did return a const char *. This was
happening because when checking the return type for a const char *,
TypeIsConstCharPtr would only check that the underlying type was a
signed char. This failed on Linux (but was fine on Darwin), as the
underlying type also needs to be checked for being an unsigned char.
- Cross compiling support
The build for lldb-rpc-gen had no support for cross compiling and as
such, the sources generated for lldb-rpc-gen would get compiled too
early in phase 2 when cross compiling, before the Clang toolchain was
built and this led to an error when trying to include stdlib files. This
reland splits this build into 2 by building the tool first and then
compiling the sources in the second stage of the cross-compiled build.
Original PR Description:
This commit upstreams the lldb-rpc-gen tool, a ClangTool that generates
the LLDB RPC client and server interfaces. This tool, as well as LLDB
RPC itself is built by default. If it needs to be disabled, put
-DLLDB_BUILD_LLDBRPC=OFF in your CMake invocation.
https://discourse.llvm.org/t/rfc-upstreaming-lldb-rpc/85804
Original PR Link:
github.com/llvm/llvm-project/pull/138031
When gathering the headers to fix up and place in LLDB.framework, we
were previously globbing the header files from a location in the build
directory. This commit changes this to glob from the source directory
instead, as we were globbing from the build directory without ensuring
that the necessary files were actually in that location before globbing.
https://github.com/llvm/llvm-project/pull/149282 changed
the max children depth and that caused one part of the
output to become `{...}`.
The original PR set a higher limit for a different test,
so I'm doing the same here.
Deeply nested structs can be noisy, so Apple's LLDB fork sets the
default to `4`:
9c93adbb28/lldb/source/Target/TargetProperties.td (L134-L136)
Thought it would be useful to upstream this. Though happy to pick a
different default or keep it as-is.
Prior, Process Minidump would return
```
Status::FromErrorString("could not parse memory info");
```
For any unsuccessful memory read, with no differentiation between an
error in LLDB and the data simply not being present. This lead to a lot
of user confusion and overall pretty terrible user experience. To fix
this I've refactored the APIs so we can pass an error back in an llvm
expected.
There were also no shell tests for memory read and process Minidump so I
added one.
This fails because it tells clang to use DWARF which link.exe
then discards.
The test may not need DWARF, but I'm going to confirm that in
a follow up PR review.
Test added by https://github.com/llvm/llvm-project/pull/149088.
When dumping variables, LLDB will print a one-time warning about
truncating children (when the children count exceeds the default
`target.max-children-count`). But we only do this for `frame variable`.
So if we use `dwim-print` or `expression`, the output gets truncated but
we don't print a warning. But because we store the fact that we
truncated some output on the `CommandInterpreter`, we fire the warning
next time we use `frame variable`. E.g.,:
```
(lldb) p arr
(int[1000]) {
[0] = -5
[1] = 0
[2] = 0
<-- snipped -->
[253] = 0
[254] = 0
[255] = 0
...
}
(lldb) v someLocal
(int) someLocal = 10
*** Some of the displayed variables have more members than the debugger
will show by default. To show all of them, you can either use the
--show-all-children option to frame variable or raise the limit by
changing the target.max-children-count setting.
```
This patch prints the warning for `dwim-print` and `expression`.
I only added a test for the `target.max-children-count` for now because
it seems the `target.max-children-depth` warning is broken (I can't get
it to fire).
This changes the example command added in
https://github.com/llvm/llvm-project/pull/145793 so that the fdis
program does not have to be a single program name.
Doing so also means we can run the test on Windows where the program
needs to be "python.exe script_name".
I've changed "fdis set" to treat the rest of the command as the program.
Then store that as a list to be passed to subprocess. If we just use a
string, Python will think that "python.exe foo" is the name of an actual
program instead of a program and an argument to it.
This will still break if the paths have spaces in, but I'm trying to do
just enough to fix the test here without rewriting all the option
handling.
LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.
This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
dissasemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.
The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.
With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .
Changes in this PR:
- Decouple "can't disassemble" with "instruction size".
DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
valid disassembly, and has the size as an out paramter.
Use the size even if the disassembly is invalid.
Disassemble if disassemby is valid.
- Always print out the opcode when -b is specified.
Previously it wouldn't print out the opcode if it couldn't disassemble.
- Print out RISC-V opcodes the way llvm-objdump does.
Code for the new Opcode Type eType16_32Tuples by Jason Molenda.
- Print <unknown> for instructions that can't be disassembled, matching
llvm-objdump, instead of printing nothing.
- Update max riscv32 and riscv64 instruction size to 8.
- Add example "fdis" command script.
- Added disassembly byte test for x86 with known and unknown instructions.
- Added disassembly byte test for riscv32 with known and unknown instructions,
with and without filtering.
- Added test from Jason Molenda to RISC-V disassembly unit tests.
There is currently no way to prevent `${function.name-with-args}` from
using the `plugin.cplusplus.display.function-name-format` setting. Even
if the setting is set to an empty string. As a way to disable formatting
by language plugin, this patch makes it so
`plugin.cplusplus.display.function-name-format` falls back to the old
way of printing `${function.name-with-args}`. Even if we didn't want to
add a fallback, making the setting an empty string shouldn't really
"succeed".
LLDB breakpoint conditions take an expression that's evaluated using the
language of the code where the breakpoint is located. Users have asked
to have an option to tell it to evaluate the expression in a specific
language.
This is feature is especially helpful for Swift, for example for a
condition based on the value in memory at an offset from a register.
Such a condition is pretty difficult to write in Swift, but easy in C.
This PR adds a new argument (-Y) to specify the language of the
condition expression. We can't reuse the current -L option, since you
might want to break on only Swift symbols, but run a C expression there
as per the example above.
rdar://146119507
Relands the commit to upstream the lldb-rpc-gen tool in order to fix a
build failure on the linux remote bots. The reland adds the Clang
resource dir unconditionally to the invocation for the tool instead of
only adding it in the event that we're using a standalone build.
Original PR description:
This commit upstreams the lldb-rpc-gen tool, a ClangTool that generates
the LLDB RPC client and server interfaces. This tool, as well as LLDB
RPC itself is built by default. If it needs to be disabled, put
-DLLDB_BUILD_LLDBRPC=OFF in your CMake invocation.
https://discourse.llvm.org/t/rfc-upstreaming-lldb-rpc/85804
Original PR Link:
https://github.com/llvm/llvm-project/pull/138031
This commit upstreams the LLDB RPC server interface emitters. These
emitters generate the server-side API interfaces for RPC, which
communicate directly with liblldb itself out of process using the SB
API.
https://discourse.llvm.org/t/rfc-upstreaming-lldb-rpc/85804
Reverts llvm/llvm-project#138031. This is failing during the build phase
on the Ubuntu buildbot:
```
Error while processing /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/include/lldb/API/SBWatchpoint.h.
[78/78] Processing file /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/include/lldb/API/SBWatchpointOptions.h.
warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option]
warning: unknown warning option '-Wno-class-memaccess'; did you mean '-Wno-class-varargs'? [-Wunknown-warning-option]
warning: unknown warning option '-Wno-stringop-truncation'; did you mean '-Wno-format-truncation'? [-Wunknown-warning-option]
warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option]
warning: unknown warning option '-Wno-class-memaccess'; did you mean '-Wno-class-varargs'? [-Wunknown-warning-option]
warning: unknown warning option '-Wno-stringop-truncation'; did you mean '-Wno-format-truncation'? [-Wunknown-warning-option]
In file included from /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/include/lldb/API/SBWatchpointOptions.h:12:
In file included from /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/include/lldb/API/SBDefines.h:12:
In file included from /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/include/lldb/lldb-defines.h:12:
In file included from /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/include/lldb/lldb-types.h:12:
/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/include/lldb/lldb-enumerations.h:12:10: fatal error: 'cstdint' file not found
12 | #include <cstdint>
| ^~~~~~~~~
```
This commit upstreams the `lldb-rpc-gen` tool, a ClangTool that
generates the LLDB RPC client and server interfaces. This tool, as well
as LLDB RPC itself is built by default. If it needs to be disabled, put
-DLLDB_BUILD_LLDBRPC=OFF in your CMake invocation.
https://discourse.llvm.org/t/rfc-upstreaming-lldb-rpc/85804
The script used to fix up LLDB's header for use in the macOS framework
contained 2 bugs that this commit addreses:
1. The output contents were appended to the output file multiple times
instead of only being written once.
2. The script was not considering LLDB includes that were *not* from the
SB API.
This commit addresses and fixes both of these bugs and updates the
corresponding test to match.
In the script that's used by RPC to convert LLDB headers to LLDB RPC
headers, there's a bug with how it converts namespace usage. An
overeager regex pattern caused *all* text before any `lldb::` namespace
usage to get replaced with `lldb_rpc::` instead of just the namespace
itself. This commit changes that regex pattern to be less overeager and
modifies one of the shell tests for this script to actually check that
the namespace usage replacement is working correctly.
rdar://154126268
This patch addresses 2 issues:
1. It makes registers available on non-crashed threads all the time
2. It fixes arm64 registers parsing for registers that don't use the `x`
prefix (`fp` -> `x29` / `lr` -> `x30`)
---------
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This patch adds support to the haswell sub-architecture (x86_64h) to
scripted processes.
rdar://147208252
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This patch makes interactive mode as the default when using the crashlog
command. It replaces the existing `-i|--interactive` flag with a new
`-m|--mode` option, that can either be `interactive` or `batch`.
By default, when the option is not explicitely set by the user, the
interactive mode is selected, however, lldb will fallback to batch mode
if the command interpreter is not interactive or if stdout is not a tty.
This also adds some railguards to prevent users from using interactive
only options with the batch mode and updates the tests accordingly.
rdar://97801509
Differential Revision: https://reviews.llvm.org/D141658
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>