This abstracts the base Transport handler to have a MessageHandler
component and allows us to generalize both JSON-RPC 2.0 for MCP (or an
LSP) and DAP format.
This should allow us to create clearly defined clients and servers for
protocols, both for testing and for RPC between the lldb instances and
an lldb-mcp multiplexer.
This basic model is inspiried by the clangd/Transport.h file and the
mlir/lsp-server-support/Transport.h that are both used for LSP servers
within the llvm project.
Additionally, this helps with testing by subclassing `Transport` to
allow us to simplify sending/receiving messages without needing to use a
toJSON/fromJSON and a pair of pipes, see `TestTransport` in
DAP/TestBase.h.
The current implementation of
CPlusPlusLanguage::SymbolNameFitsToLanguage will return true if the
symbol is mangled for any language that lldb knows about.
## Problem
When the new setting
```
set target.parallel-module-load true
```
was added, lldb began fetching modules from the devices from multiple
threads simultaneously. This caused crashes of lldb when debugging on
android devices.
The top of the stack in the crash look something like this:
```
#0 0x0000555aaf2b27fe llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm/bin/lldb-dap+0xb87fe)
#1 0x0000555aaf2b0a99 llvm::sys::RunSignalHandlers() (/opt/llvm/bin/lldb-dap+0xb6a99)
#2 0x0000555aaf2b2fda SignalHandler(int, siginfo_t*, void*) (/opt/llvm/bin/lldb-dap+0xb8fda)
#3 0x00007f9c02444560 __restore_rt /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0
#4 0x00007f9c04ea7707 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) (usr/bin/../lib/liblldb.so.15+0x22a7707)
#5 0x00007f9c04ea5b41 lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5b41)
#6 0x00007f9c04ea5c1e lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5c1e)
#7 0x00007f9c052916ff lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) (usr/bin/../lib/liblldb.so.15+0x26916ff)
#8 0x00007f9c0528b9dc lldb_private::platform_android::PlatformAndroid::GetFile(lldb_private::FileSpec const&, lldb_private::FileSpec const&) (usr/bin/../lib/liblldb.so.15+0x268b9dc)
```
Our workaround was to set `set target.parallel-module-load ` to `false`
to avoid the crash.
## Background
PlatformAndroid creates two different classes with one stateful adb
connection shared between the two -- one through AdbClient and another
through AdbClient::SyncService. The connection management and state is
complex, and seems to be responsible for the segfault we are seeing. The
AdbClient code resets these connections at times, and re-establishes
connections if they are not active. Similarly, PlatformAndroid caches
its SyncService, which uses an AdbClient class, but the SyncService puts
its connection into a different 'sync' state that is incompatible with a
standard connection.
## Changes in this diff
* This diff refactors the code to (hopefully) have clearer ownership of
the connection, clearer separation of AdbClient and SyncService by
making a new class for clearer separations of concerns, called
AdbSyncService.
* New unit tests are added
* Additional logs were added (see
https://github.com/llvm/llvm-project/pull/145382#issuecomment-3055535017
for details)
* This adjusts the `Request`/`Response` types to have an `id` that is
either a string or a number.
* Merges 'Error' into 'Response' to have a single response type that
represents both errors and results.
* Adjusts the `Error.data` field to by any JSON value.
* Adds `operator==` support to the base protocol types and simplifies
the tests.
Reapply "[lldb] Update JSONTransport to use MainLoop for reading."
(#152155)
This reverts commit cd40281685f642ad879e33f3fda8d1faa136ebf4.
This also includes some updates to try to address the platforms with
failing tests.
I updated the JSONTransport and tests to use std::function instead of
llvm:unique_function. I think the tests were failing due to the
unique_function not being moved correctly in the loop on some platforms.
Add `ValueObject::CreateValueObjectFromScalar` function and adjust
`Scalar::GetData` to be able to both extend and truncate the data bytes
in Scalar to the specified size.
This reverts commit 600976f4bfb06526c283dcc4efc4801792f08ca5.
The test was crashing trying to access any element of the GPR struct.
(gdb) disas
Dump of assembler code for function _ZN7testing8internal11CmpHelperEQIyyEENS_15AssertionResultEPKcS4_RKT_RKT0_:
0x00450afc <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0x00450b00 <+4>: sub sp, sp, #60 ; 0x3c
0x00450b04 <+8>: ldr r5, [sp, #96] ; 0x60
=> 0x00450b08 <+12>: ldm r3, {r4, r7}
0x00450b0c <+16>: ldm r5, {r6, r9}
0x00450b10 <+20>: eor r7, r7, r9
0x00450b14 <+24>: eor r6, r4, r6
0x00450b18 <+28>: orrs r7, r6, r7
(gdb) p/x r3
$3 = 0x3e300f6e
"However, load and store multiple instructions (LDM and STM) and load and store double-word (LDRD or STRD) must be aligned to at least a word boundary."
https://developer.arm.com/documentation/den0013/d/Porting/Alignment
>>> 0x3e300f6e % 4
2
Program received signal SIGBUS, Bus error.
0x00450b08 in testing::AssertionResult testing::internal::CmpHelperEQ<unsigned long long, unsigned long long>(char const*, char const*, unsigned long long const&, unsigned long long const&) ()
The struct is packed with 1 byte alignment, but it needs to start at an aligned address for us
to ldm from it. So I've done that with alignas.
Also fixed some compiler warnings in the test itself.
The `EmulateInstructionARM64::EvaluateInstruction()` method used
`uint32_t` to store the PC value, causing addresses greater than 2^32 to
be truncated.
As for now, the issue does not affect the mainline because the method is
never called with both `eEmulateInstructionOptionAutoAdvancePC` and
`eEmulateInstructionOptionIgnoreConditions` options set. However, it can
trigger on a downstream that uses software stepping.
This moves all the generic MCP code into the new ProtocolMCP library so
it can be shared between the MCP implementation in LLDB (built on the
private API) and lldb-mcp (built on the public API).
This PR doesn't include the actual MCP server. The reason is that the
transport mechanism will be different between the LLDB implementation
(using sockets) and the lldb-mcp implementation (using stdio). The goal
s to abstract away that difference by making the server use
JSONTransport (again) but I'm saving that for a separate PR.
`PlatformList::Create()` added an item to the list even when
`Platform::Create()` returned `nullptr`. Other methods use these items
without checking, which can lead to a crash. For example:
```
> lldb
(lldb) platform select unknown
error: unable to find a plug-in for the platform named "unknown"
(lldb) file a.out-arm64
PLEASE submit a bug report to...
Stack dump:
0. Program arguments: lldb
1. HandleCommand(command = "file a.out-arm64 ")
...
```
This way we make sure that the logic to reconstruct demangled names in
the tests is the same as the logic when reconstructing the actual
frame-format variable.
`DemangledNameInfo::SuffixRange` is currently the only one which we
can't test in the same way until we set it from inside the
`TrackingOutputBuffer`. I added TODOs to track this.
This PR moves the MCP protocol code into its own library
(`lldbProtocolMCP`) so the code can be shared between the
`ProtocolServerMCP` plugin in LLDB as well as `lldb-mcp`. The goal is to
do the same thing for DAP (which, for now, would be used exclusively
from `lldb-dap`).
To make it clear that it's neither part of the `lldb` nor the
`lldb_private` namespace, I created a new `lldb_protocol` namespace.
Depending on how much code would be reused by lldb-dap, we may move more
code into the protocol library.
The test for this is artificial as I'm not aware of any upstream targets
that use DW_CFA_val_offset
RegisterContextUnwind::ReadFrameAddress now reports how it's attempting
to obtain the CFA unless all success/failure cases emit logs that
clearly identify the method it was attempting. Previously several of the
existing failure paths emit no message or a message that's
indistinguishable from those on other paths.
This updates JSONTransport to use a MainLoop for reading messages.
This also allows us to read in larger chunks than we did previously.
With the event driven reading operations we can read in chunks and store
the contents in an internal buffer. Separately we can parse the buffer
and split the contents up into messages.
Our previous version approach would read a byte at a time, which is less
efficient.
This patch adds 2 new attributes to `DemangledNameInfo`: `TemplateRange`
and `NameQualifiersRange`. It also introduces the
`function.name-qualifiers` entity formatter which allows tracking
qualifiers between the name of a function and its arguments/template.
This will be used downstream in Swift but may have applications in C++:
https://github.com/swiftlang/llvm-project/pull/11068.
### Summary
We have internally seen cases like this
`DW_OP_lit0, DW_OP_stack_value, DW_OP_piece 0x28`
where we have longer op pieces than what Scalar supports (32, 64 or 128
bits). In these cases LLDB is currently hitting the assertion
`assert(ap_int.getBitWidth() >= bit_size);`
We are extending the generated APInt to the piece size by filling zeros.
### Test plan
Added a unit to cover this case.
### Reviewers
@clayborg , @jeffreytan81 , @Jlalond
LLDB currently attaches `AsmLabel`s to `FunctionDecl`s such that that
the `IRExecutionUnit` can determine which mangled name to call (we can't
rely on Clang deriving the correct mangled name to call because the
debug-info AST doesn't contain all the info that would be encoded in the
DWARF linkage names). However, we don't attach `AsmLabel`s for structors
because they have multiple variants and thus it's not clear which
mangled name to use. In the [RFC on fixing expression evaluation of
abi-tagged
structors](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
we discussed encoding the structor variant into the `AsmLabel`s.
Specifically in [this
thread](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
we discussed that the contents of the `AsmLabel` are completely under
LLDB's control and we could make use of it to uniquely identify a
function by encoding the exact module and DIE that the function is
associated with (mangled names need not be enough since two identical
mangled symbols may live in different modules). So if we already have a
custom `AsmLabel` format, we can encode the structor variant in a
follow-up (the current idea is to append the structor variant as a
suffix to our custom `AsmLabel` when Clang emits the mangled name into
the JITted IR). Then we would just have to teach the `IRExecutionUnit`
to pick the correct structor variant DIE during symbol resolution. The
draft of this is available
[here](https://github.com/llvm/llvm-project/pull/149827)
This patch sets up the infrastructure for the custom `AsmLabel` format
by encoding the module id, DIE id and mangled name in it.
**Implementation**
The flow is as follows:
1. Create the label in `DWARFASTParserClang`. The format is:
`$__lldb_func:module_id:die_id:mangled_name`
2. When resolving external symbols in `IRExecutionUnit`, we parse this
label and then do a lookup by DIE ID (or mangled name into the module if
the encoded DIE is a declaration).
Depends on https://github.com/llvm/llvm-project/pull/151355
This switches to `makeIntrusiveRefCnt<FileSystem>` where creating a new
object, and to passing/returning by `IntrusiveRefCntPtr<FileSystem>`
instead of `FileSystem*` or `FileSystem&`, when dealing with existing
objects.
Part of cleanup #151026.
Split out from https://github.com/llvm/llvm-project/pull/148877
This patch prepares `TypeSystemClang` APIs to take `AsmLabel`s which
concatenated strings (hence `std::string`) instead of a plain `const
char*`.
Add support for DW_OP_WASM_location in DWARFExpression. This PR rebases
#78977 and cleans up the unit test. The DWARF extensions are documented
at https://yurydelendik.github.io/webassembly-dwarf/ and supported by
LLVM-based toolchains such as Clang, Swift, Emscripten, and Rust.
Eliminate the `lldb_private::dwarf` namespace, in favor of using
`llvm::dwarf` directly. The latter is shorter, and this avoids ambiguity
in the ABI plugins that define a `dwarf` namespace inside an anonymous
namespace.
According to the LLVM Style Guide we don't need braces because it's a
single-line, but it looks like the macro expands to code that includes a
potentially ambiguous "else". Use braces to silence the warning.
Target::ReadMemory may or may not read live memory, but whether it did
read from live memory or from the filecache is opaque to callers. Add an
extra out parameter to indicate whether live memory was read or not.
Prior, Process Minidump would return
```
Status::FromErrorString("could not parse memory info");
```
For any unsuccessful memory read, with no differentiation between an
error in LLDB and the data simply not being present. This lead to a lot
of user confusion and overall pretty terrible user experience. To fix
this I've refactored the APIs so we can pass an error back in an llvm
expected.
There were also no shell tests for memory read and process Minidump so I
added one.
# Lldb-server crash
We have seen stacks like the following in lldb-server core dumps:
```
[
"__GI___pthread_kill at pthread_kill.c:46",
"__GI_raise at raise.c:26",
"__GI_abort at abort.c:100",
"__assert_fail_base at assert.c:92",
"__GI___assert_fail at assert.c:101",
"lldb_private::NativeProcessProtocol::RemoveSoftwareBreakpoint(unsigned long) at /redacted/lldb-server:0"
]
```
# Hypothesis of root cause
In `NativeProcessProtocol::RemoveSoftwareBreakpoint()`
([code](19b2dd9d79/lldb/source/Host/common/NativeProcessProtocol.cpp (L359-L423))),
a `ref_count` is asserted and reduced. If it becomes zero, the code
first go through a series of memory reads and writes to remove the
breakpoint trap opcode and to restore the original process code, then,
if everything goes fine, removes the entry from the map
`m_software_breakpoints` at the end of the function.
However, if any of the validations for the above reads and writes goes
wrong, the code returns an error early, skipping the removal of the
entry. This leaves the entry behind with a `ref_count` of zero.
The next call to `NativeProcessProtocol::RemoveSoftwareBreakpoint()` for
the same breakpoint[*] would violate the assertion about `ref_count > 0`
([here](19b2dd9d79/lldb/source/Host/common/NativeProcessProtocol.cpp (L365))),
which would cause a crash.
[*] We haven't found a *regular* way to repro such a next call in lldb
or lldb-dap. This is because both of them remove the breakpoint from
their internal list when they get any response from the lldb-server (OK
or error). Asking the client to delete the breakpoint a second time
doesn't trigger the client to send the `$z` gdb packet to lldb-server.
We are able to trigger the crash by sending the `$z` packet directly,
see "Manual test" below.
# Fix
Lift the removal of the map entry to be immediately after the decrement
of `ref_count`, before the early returns. This ensures that the asserted
case will never happen. The validation errors can still happen, and
whether they happen or not, the breakpoint has been removed from the
perspective of the lldb-server (same as that of lldb and lldb-dap).
# Manual test & unit test
See PR.
…(#144919)"
This was causing a couple of failures on the Ubuntu bots. Reverting
while we wait on a fix for those issues.
This reverts commit 8612926c306c5191a5fb385dd11467728c59e982.
This PR addresses a race condition encountered when using LLDB through
the Python scripting interface.
I'm relatively new to LLDB, so feedback is very welcome, especially if
there's a more appropriate way to address this issue.
### Bug Description
When running a script that repeatedly calls
`debugger.GetListener().WaitForEvent()` in a loop, and at some point
invokes `process.Kill()` from within that loop to terminate the session,
a race condition can occur if `process.Kill()` is called around the same
time a breakpoint is hit.
### Race Condition Details
The issue arises when the following sequence of events happens:
1. The process's **private state** transitions to `stopped` when the
breakpoint is hit.
2. `process.Kill()` calls `Process::Destroy()`, which invokes
`Process::WaitForProcessToStop()`. At this point:
- `private_state = stopped`
- `public_state = running` (the public state has not yet been updated)
3. The **public stop event** is broadcast **before** the hijack listener
is installed.
4. As a result, the stop event is delivered to the **non-hijack
listener**.
5. The interrupt request sent by `Process::StopForDestroyOrDetach()` is
ignored because the process is already stopped (`private_state =
stopped`).
6. No public stop event reaches the hijack listener.
7. `Process::WaitForProcessToStop()` hangs waiting for a public stop
event, but the event is never received.
8. `process.Kill()` times out after 20 seconds
### Fix Summary
This patch modifies `Process::WaitForProcessToStop()` to ensure that any
pending events in the non-hijack listener queue are processed before
checking the hijack listener. This guarantees that any missed public
state change events are handled, preventing the hang.
### Additional Context
A discussion of this issue, including a script to reproduce the bug, can
be found here:
[LLDB hangs when killing process at the same time a breakpoint is
hit](https://discourse.llvm.org/t/lldb-hangs-when-killing-process-at-the-same-time-a-breakpoint-is-hit)
LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.
This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
dissasemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.
The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.
With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .
Changes in this PR:
- Decouple "can't disassemble" with "instruction size".
DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
valid disassembly, and has the size as an out paramter.
Use the size even if the disassembly is invalid.
Disassemble if disassemby is valid.
- Always print out the opcode when -b is specified.
Previously it wouldn't print out the opcode if it couldn't disassemble.
- Print out RISC-V opcodes the way llvm-objdump does.
Code for the new Opcode Type eType16_32Tuples by Jason Molenda.
- Print <unknown> for instructions that can't be disassembled, matching
llvm-objdump, instead of printing nothing.
- Update max riscv32 and riscv64 instruction size to 8.
- Add example "fdis" command script.
- Added disassembly byte test for x86 with known and unknown instructions.
- Added disassembly byte test for riscv32 with known and unknown instructions,
with and without filtering.
- Added test from Jason Molenda to RISC-V disassembly unit tests.
This PR is updating the string table offset (DW_AT_str_offsets_base
which is introduced in `DWARF5`) based on the DWARF format, as per the
DWARF specification; For the 32-bit DWARF format, each offset is 4 bytes
long; for the 64-bit DWARF format, each offset is 8 bytes long.