I'm trying to remove the redirection in SmallSet.h:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
to make it clear that we are using SmallPtrSet. There are only
handful places that rely on this redirection.
This patch replaces SmallSet to SmallPtrSet where the element type is
a pointer.
When providing allocation and deallocation traces,
the ASan compiler-rt runtime already provides call
addresses (`TracePCType::Calls`).
On Darwin, system sanitizers (libsanitizers)
provides return address. It also discards a few
non-user frames at the top of the stack, because
these internal libmalloc/libsanitizers stack
frames do not provide any value when diagnosing
memory errors.
Introduce and add handling for
`TracePCType::ReturnsNoZerothFrame` to cover this
case and enable libsanitizers traces line-level
testing.
rdar://157596927
---
Commit 1 is a mechanical refactoring to introduce
and adopt `TracePCType` enum to replace
`pcs_are_call_addresses` bool. It preserve the
current behavior:
```
pcs_are_call_addresses:
false -> TracePCType::Returns (default)
true -> TracePCType::Calls
```
Best reviewed commit by commit.
A CMake change included in CMake 4.0 makes `AIX` into a variable
(similar to `APPLE`, etc.)
ff03db6657
However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to
`AIX` and `if` auto-expands variable names in CMake. That means you get
a double expansion if you write:
`if (${CMAKE_SYSTEM_NAME} MATCHES "AIX")`
which becomes:
`if (AIX MATCHES "AIX")`
which is as if you wrote:
`if (ON MATCHES "AIX")`
You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}",
due to policy
[CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054)
which is on by default in 4.0+. Most of the LLVM CMake already does
this, but this PR fixes the remaining cases where we do not.
`std::wstring AnsiToUtf16(const std::string &ansi)` is a
reimplementation of `llvm::sys::windows::UTF8ToUTF16`. This patch
removes `AnsiToUtf16` and its usages entirely.
Improve error handling in ObjectFileWasm by using helpers that wrap
their result in an llvm::Expected. The helper to read a Wasm string now
return an Expected<std::string> and I created a helper to parse 32-bit
ULEBs that returns an Expected<uint32_t>.
added getName method in SBModule.h and .cpp in order to get the name of
the module from m_object_name.
---------
Co-authored-by: Bar Soloveychik <barsolo@fb.com>
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 10 instances that rely on this "redirection". Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.
I'm planning to remove the redirection eventually.
This is a continuation of #153494. In a WebAssembly file, the "name"
section contains names for the segments in the data section
(WASM_NAMES_DATA_SEGMENT). We already parse these as symbols, and with
this PR, we now also create sub-sections for each of the segments.
This abstracts the base Transport handler to have a MessageHandler
component and allows us to generalize both JSON-RPC 2.0 for MCP (or an
LSP) and DAP format.
This should allow us to create clearly defined clients and servers for
protocols, both for testing and for RPC between the lldb instances and
an lldb-mcp multiplexer.
This basic model is inspiried by the clangd/Transport.h file and the
mlir/lsp-server-support/Transport.h that are both used for LSP servers
within the llvm project.
Additionally, this helps with testing by subclassing `Transport` to
allow us to simplify sending/receiving messages without needing to use a
toJSON/fromJSON and a pair of pipes, see `TestTransport` in
DAP/TestBase.h.
Introduces `_regexp-step`, a step command which additionally allows for
stepping into a target function. This change updates `step` and `s` to
be aliases for `_regexp-step`.
The existing `sif` alias ("Step Into Function") is not well known
amongst users. This change updates `step` and `s` to also work like
`sif`, taking an optional function name.
This is implemented to not break uses of `step` or `s` with a flag, for
example running `step -r func_to_avoid` works as expected.
The current implementation of
CPlusPlusLanguage::SymbolNameFitsToLanguage will return true if the
symbol is mangled for any language that lldb knows about.
This fixes a few bugs, effectively through a fallback to `p` when `po` fails.
The motivating bug this fixes is when an error within the compiler causes `po` to fail.
Previously when that happened, only its value (typically an object's address) was
printed – and problematically, no compiler diagnostics were shown. With this change,
compiler diagnostics are shown, _and_ the object is fully printed (ie `p`).
Another bug this fixes is when `po` is used on a type that doesn't provide an object
description (such as a struct). Again, the normal `ValueObject` printing is used.
Additionally, this also improves how lldb handles an object description method that
fails in some way. Now an error will be shown (it wasn't before), and the value will be
printed normally.
This updates the DIL code for handling array subscripting to more
closely match and handle all the cases from the original 'frame var'
implementation. Also updates the DIL array subscripting test. This
particularly fixes some issues with handling synthetic children, objc
pointers, and accessing specific bits within scalar data types.
The current implementation tries to (1) patch the existing readline
module definition if it's already present in the inittab and (2) append
our patched readline module to the inittab. The former (1) uses the
non-stable Python API and I can't find a situation where this is
necessary.
We do this work before initialization, so for the readline
module to exist, it either needs to be added by Python itself (which
doesn't seem to be the case), or someone would have had to have added it
without initializing.
Use `PyThread_get_thread_ident`, which is part of the Stable API,
instead of accessing a member of the PyThreadState, which is opaque when
using the Stable API.
This PR adds support for parsing the data symbols from the WebAssembly
name section, which consists of a name and address range for the
segments in the Wasm data section. Unlike other object file formats,
Wasm has no symbols for referencing items within those segments (i.e.
symbols the user has defined).
## Problem
When the new setting
```
set target.parallel-module-load true
```
was added, lldb began fetching modules from the devices from multiple
threads simultaneously. This caused crashes of lldb when debugging on
android devices.
The top of the stack in the crash look something like this:
```
#0 0x0000555aaf2b27fe llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm/bin/lldb-dap+0xb87fe)
#1 0x0000555aaf2b0a99 llvm::sys::RunSignalHandlers() (/opt/llvm/bin/lldb-dap+0xb6a99)
#2 0x0000555aaf2b2fda SignalHandler(int, siginfo_t*, void*) (/opt/llvm/bin/lldb-dap+0xb8fda)
#3 0x00007f9c02444560 __restore_rt /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0
#4 0x00007f9c04ea7707 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) (usr/bin/../lib/liblldb.so.15+0x22a7707)
#5 0x00007f9c04ea5b41 lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5b41)
#6 0x00007f9c04ea5c1e lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5c1e)
#7 0x00007f9c052916ff lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) (usr/bin/../lib/liblldb.so.15+0x26916ff)
#8 0x00007f9c0528b9dc lldb_private::platform_android::PlatformAndroid::GetFile(lldb_private::FileSpec const&, lldb_private::FileSpec const&) (usr/bin/../lib/liblldb.so.15+0x268b9dc)
```
Our workaround was to set `set target.parallel-module-load ` to `false`
to avoid the crash.
## Background
PlatformAndroid creates two different classes with one stateful adb
connection shared between the two -- one through AdbClient and another
through AdbClient::SyncService. The connection management and state is
complex, and seems to be responsible for the segfault we are seeing. The
AdbClient code resets these connections at times, and re-establishes
connections if they are not active. Similarly, PlatformAndroid caches
its SyncService, which uses an AdbClient class, but the SyncService puts
its connection into a different 'sync' state that is incompatible with a
standard connection.
## Changes in this diff
* This diff refactors the code to (hopefully) have clearer ownership of
the connection, clearer separation of AdbClient and SyncService by
making a new class for clearer separations of concerns, called
AdbSyncService.
* New unit tests are added
* Additional logs were added (see
https://github.com/llvm/llvm-project/pull/145382#issuecomment-3055535017
for details)
This patch fixes:
lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp:647:3:
error: 'llvm::Expected' may not intend to support class template
argument deduction [-Werror,-Wctad-maybe-unsupported]
lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp:677:3:
error: 'llvm::Expected' may not intend to support class template
argument deduction [-Werror,-Wctad-maybe-unsupported]
Fixes https://github.com/llvm/llvm-project/issues/135707
Follow up to https://github.com/llvm/llvm-project/pull/148836
which fixed some of this issue but not all of it.
Our Value/ValueObject system does not store the endian directly
in the values. Instead it assumes that the endian of the result
of a cast can be assumed to be the target's endian, or the host
but only as a fallback. It assumes the place it is copying from
is also that endian.
This breaks down when you have register values. These are always
host endian and continue to be when cast. Casting them to big
endian when on a little endian host breaks certain calls like
GetValueAsUnsigned.
To fix this, check the context of the value. If it has a register
context, always treat it as host endian and make the result host
endian.
I had an alternative where I passed an "is_register" flag into
all calls to this, but it felt like a layering violation and changed
many more lines.
This solution isn't much more robust, but it works for all the test
cases I know of. Perhaps you can create a register value without
a RegisterInfo backing it, but I don't know of a way myself.
For testing, I had to add a minimal program file for each arch
so that there is a type system to support the casting. This is
generated from YAML since we only need the machine and endian
to be set.
Tag types like stucts or enums didn't have a declaration attached to
them. The source locations are present in the IPI stream in
`LF_UDT_MOD_SRC_LINE` records:
```
0x101F | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1C63]
udt = 0x1058, mod = 3, file = 1, line = 0
0x2789 | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1E5A]
udt = 0x1253, mod = 35, file = 93, line = 17069
```
The file is an ID in the string table `/names`:
```
ID | String
1 | '\<unknown>'
12 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\wingdi.h'
93 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\winnt.h'
```
Here, we're not interested in `mod`. This would indicate which module
contributed the UDT.
I was looking at Rustc's PDB and found that it uses `<unknown>` for some
types, so I added a check for that.
This makes two DIA PDB shell tests to work with the native PDB plugin.
---------
Co-authored-by: Michael Buch <michaelbuch12@gmail.com>
Relates to https://github.com/llvm/llvm-project/issues/135707
Where it was reported that reading the PC using "register read" had
different results to an expression "$pc".
This was happening because registers are treated in lldb as pure
"values" that don't really have an endian. We have to store them
somewhere on the host of course, so the endian becomes host endian.
When you want to use a register as a value in an expression you're
pretending that it's a variable in memory. In target memory. Therefore
we must convert the register value to that endian before use.
The test I have added is based on the one used for XML register flags.
Where I fake an AArch64 little endian and an s390x big endian target. I
set up the data in such a way the pc value should print the same for
both, either with register read or an expression.
I considered just adding a live process test that checks the two are the
same but with on one doing cross endian testing, I doubt it would have
ever caught this bug.
Simulating this means most of the time, little endian hosts will test
little to little and little to big. In the minority of cases with a big
endian host, they'll check the reverse. Covering all the combinations.
Use std::numeric_limits<uint32_t>::max() for all overflow checks in
ObjectFileWasm and fix a few locations where I incorrectly used `>=`
instead of `>`.
* This adjusts the `Request`/`Response` types to have an `id` that is
either a string or a number.
* Merges 'Error' into 'Response' to have a single response type that
represents both errors and results.
* Adjusts the `Error.data` field to by any JSON value.
* Adds `operator==` support to the base protocol types and simplifies
the tests.
The previous commit was reverted because it broke a test on the bots.
Original commit message:
By profiling LLDB debugging a Swift application without a dSYM and a
large amount of .o files, I identified that querying shared modules was
the biggest bottleneck when running "frame variable", and Clang types
need to be searched.
One of the reasons for that slowness is that the shared module list can
can grow very large, and the search through it is O(n).
To solve this issue, this patch adds a new hashmap to the shared module
list whose key is the name of the module, and the value is all the
modules that share that name. This should speed up any search where the
query contains the module name.
rdar://156753350
This PR adds support for parsing the WebAssembly symbol table. The
symbol table is encoded in the "names" section and contains names and
indexes into other sections. For now we only support parsing function
(code) symbols. The result is that you can set breakpoints by symbol
name, while previously breakpoints by name required debug info (DWARF).
This is also necessary for Swift, which checks for the presence of
`swift_release` as a heuristic to determine if there's a static Swift
stdlib.
Relands #152295.
Checking for the overloads did not account for them being out of order.
For example, [the failed
output](https://github.com/llvm/llvm-project/pull/152295#issuecomment-3177563247)
contained the overloads, but out of order. The last commit here fixes
that by using `-DAG`.
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
This adds the ability for functions to be matched by their basename.
Before, the globals were searched for the name. This works if the full
name is available but fails for basenames.
PDB only includes the full names of functions, so we need to cache all
basenames. This is (again) very similar to
[SymbolFilePDB](b242150b07/lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp (L1291-L1383)).
There are two main differences:
- We can't just access the parent of a function to determine that it's a
member function - we need to check the type of the function, and its
"this type".
- SymbolFilePDB saved the full method name in the map. However, when
searching for methods, only the basename is passed, so the function
never found any methods.
Fixes#51933.
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
`cfa_reg_contents` is a local variable. Whatever value we assign there
right before the `return` statement will be lost anyway.
Co-authored-by: Matej Košík <matej.kosik@codasip.com>
Reapply "[lldb] Update JSONTransport to use MainLoop for reading."
(#152155)
This reverts commit cd40281685f642ad879e33f3fda8d1faa136ebf4.
This also includes some updates to try to address the platforms with
failing tests.
I updated the JSONTransport and tests to use std::function instead of
llvm:unique_function. I think the tests were failing due to the
unique_function not being moved correctly in the loop on some platforms.
Instead of returning an error when:
- it can't obtain section information from a module.
- there are other issues calculating the size.
when we encounter such an error we log the error and continue with the
other modules.
tested with
lldb/test/API/functionalities/process_save_core_minidump/TestProcessSaveCoreMinidump.py
---------
Co-authored-by: Bar Soloveychik <barsolo@fb.com>
Prior to this patch, SBFrame/SBThread methods exhibit racy behavior if
called while the process is running, because they do not lock the
`Process::RetRunLock` mutex. If they did, they would fail, correctly
identifying that the process is not running.
Some methods _attempt_ to protect against this with the pattern:
```
ExecutionContext exe_ctx(m_opaque_sp.get(), lock); // this is a different lock
Process *process = exe_ctx.GetProcessPtr();
if (process) {
Process::StopLocker stop_locker;
if (stop_locker.TryLock(&process->GetRunLock()))
.... do work ...
```
However, this is also racy: the constructor of `ExecutionContext` will
access the frame list, which is something that can only be done once the
process is stopped.
With this patch:
1. The constructor of `ExecutionContext` now expects a `ProcessRunLock`
as an argument. It attempts to lock the run lock, and only fills in
information about frames and threads if the lock can be acquired.
Callers of the constructor are expected to check the lock.
2. All uses of ExecutionContext are adjusted to conform to the above.
3. The SBThread.cpp-defined helper function ResumeNewPlan now expects a
locked ProcessRunLock as _proof_ that the execution is stopped. It will
unlock the mutex prior to resuming the process.
This commit exposes many opportunities for early-returns, but these
would increase the diff of this patch and distract from the important
changes, so we opt not to do it here.
In architectures where pointers may contain metadata, such as arm64e,
the metadata may need to be cleaned prior to sending this pointer to be
used in expression evaluation generated code.
This patch is a step towards allowing consumers of pointers to decide
whether they want to keep or remove metadata, as opposed to discarding
metadata at the moment pointers are created. See
https://github.com/llvm/llvm-project/pull/150537.
This was tested running the LLDB test suite on arm64e.
In architectures where pointers may contain metadata, such as arm64e, it
is important to ignore those bits when comparing two different StackIDs,
as the metadata may not be the same even if the pointers are.
This patch is a step towards allowing consumers of pointers to decide
whether they want to keep or remove metadata, as opposed to discarding
metadata at the moment pointers are created. See
https://github.com/llvm/llvm-project/pull/150537.
This was tested running the LLDB test suite on arm64e.
https://github.com/llvm/llvm-project/pull/147835 made changes that
caused libstdc++ std::string tests to fail:
```
======================================================================
FAIL: test_unavailable_summary_libstdcxx_dwo (TestDataFormatterStdString.StdStringDataFormatterTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1805, in test_method
return attrvalue(self)
File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/string/TestDataFormatterStdString.py", line 223, in test_unavailable_summary_libstdcxx
self.do_test_summary_unavailable()
File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/string/TestDataFormatterStdString.py", line 213, in do_test_summary_unavailable
self.assertEqual(summary, "Summary Unavailable", "No summary for bad value")
AssertionError: '(null)' != 'Summary Unavailable'
- (null)
+ Summary Unavailable
: No summary for bad value
Config=aarch64-/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang
----------------------------------------------------------------------
```
This test constructs an invalid std::string by starting with a null
pointer.
Somehow this improvement in Clang uncovered a bug in the formatter.
Perhaps because we now know the type of the field we tried to access is
char *, so fall back to a C string formatter that produces `(null)`.
The formatter tries to access `_M_p` and checked whether the resulting
ValueObjectSP was null, but not that it did not contain an error value.
I think that error value can be there if you are able to access one part
of the path, `_M_dataplus`, but another part fails. Since the layout
looks like this:
```
struct _Alloc_hider :
{
pointer _M_p; // The actual data.
};
_Alloc_hider _M_dataplus;
void
_M_data(pointer __p)
{ _M_dataplus._M_p = __p; }
```
So I think we were able to read `_M_dataplus` just by offset, but then
failed because it contains, or points to something containing nulls or a
null pointer.
Or perhaps an error value means we know what the class member is, but
could not read from it.
I found this by comparing with the libcxx formatter, so I've copied the
same handling from there to fix the issue.