In #165604, a test was skipped on Windows, because the native PDB plugin
didn't set sizes on symbols. While the test isn't compiled with debug
info, it's linked with `-gdwarf`, causing a PDB to be created on
Windows. This PDB will only contain the public symbols (written by the
linker) and section information. The symbols themselves don't have a
size, however the DIA SDK sets a size for them.
It seems like, for these data symbols, the size given from DIA is the
distance to the next symbol (or the section end).
This PR implements the naive approach for the native plugin. The main
difference is in function/code symbols. There, DIA searches for a
corresponding `S_GPROC32` which have a "code size" that is sometimes
slightly smaller than the difference to the next symbol.
Most of the cases were where a C++ file was being compiled with the C substitution.
There were a few cases of the opposite though.
LLDB seems to be the only real culprit in the LLVM codebase for these mismatches.
Rest of the LLVM presumably sticks at least language-specific options in the common substitutions
making the mistakes immediately apparent.
I found these by using Clang frontend configuration files containing language-specific options for
both C and C++ (e.g. `-std=c2y` and `-std=c++26`).
After the default PDB plugin changed to the native one (#165363), this
test failed, because it uses the size of public symbols and the native
plugin sets the size to 0 (as PDB doesn't include this information
explicitly). A PDB was built because the final executable in that test
was linked with `-gdwarf`.
The test did not work as intended when the empty function `done()`
contained prologue/epilogue code, because a breakpoint was set before
the last instruction of the function, which caused the test to pass even
with the fix from #126838 having been reverted.
The test is intended to check a case when a breakpoint is set on a
return instruction, which is the very last instruction of a function.
When stepping out from this breakpoint, there is interaction between
`ThreadPlanStepOut` and `ThreadPlanStepOverBreakpoint` that could lead
to missing the stop location in the outer frame; the detailed
explanation can be found in #126838.
On `Linux/AArch64`, the source is compiled into:
```
> objdump -d main.o
0000000000000000 <done>:
0: d65f03c0 ret
```
So, when the command `bt set -n done` from the original test sets a
breakpoint to the first instruction of `done()`, this instruction
luckily also happens to be the last one.
On `Linux/x86_64`, it compiles into:
```
> objdump -d main.o
0000000000000000 <done>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 5d pop %rbp
5: c3 ret
```
In this case, setting a breakpoint by function name means setting it
several instructions before `ret`, which does not provoke the
interaction between `ThreadPlanStepOut` and
`ThreadPlanStepOverBreakpoint`.
While debugging the tests for #155000 I found it helpful to have both
sides
of the simulated gdb-rsp traffic rather than just the responses so I've
extended
the packetLog in MockGDBServerResponder to record traffic in both
directions.
Tests have been updated accordingly
StopInfoBreakpoint keeps a BreakpointLocationCollection for all the
breakpoint locations at the BreakpointSite that was hit. It is also
lives through the time a given thread is stopped, so there are plenty of
opportunities for one of the owning breakpoints to get deleted.
But BreakpointLocations don't keep their owner Breakpoints alive, so if
the BreakpointLocationCollection can live past when some code gets a
chance to delete an owner breakpoint, and then you ask that location for
some breakpoint information, it will access freed memory.
This wasn't a problem before PR #158128 because the StopInfoBreakpoint
just kept the BreakpointSite that was hit, and when you asked it
questions, it relooked up that list. That was not great, however,
because if you hit breakpoints 5 & 6, deleted 5 and then asked which
breakpoints got hit, you would just get 6. For that and other reasons
that PR changed to storing a BreakpointLocationCollection of the
breakpoints that were hit. That's better from a UI perspective but
caused this potential problem.
I fix it by adding a variant of the BreakpointLocationCollection that
also holds onto a shared pointer to the Breakpoints that own the
locations that were hit, thus keeping them alive till the
StopInfoBreakpoint goes away.
This fixed the ASAN assertion. I also added a test that works harder to
cause trouble by deleting breakpoints during a stop.
Finally figured out the issue with TestCortexMExceptionUnwind.py
failing on some CI. When the llvm is configured without the ARM
target enabled, the ARM ABI plugins will not be initialized in lldb,
and the unwind engine won't backtrace more than one stack frame.
This test requires that the ARM target be enabled in the llvm cmake.
Update the decorator to specify that; should be the end of these
mysterious fails.
From
https://github.com/llvm/llvm-project/pull/163077#issuecomment-3396435083:
Currently, `std::atomic<T>` will always use the MSVC STL synthetic
children and summary. When inspecting types from other STLs, the output
would not show any children.
This PR adds a check that `std::atomic` contains `_Storage` to be
classified as coming from MSVC's STL.
Still seeing a failure on a CI bot with this test that
I cannot reproduce locally, but luckily this one CI bot is giving
me trace output from the tests. Turn on the unwind log when tracing
is enabled, migth get a better hint what's up with this test fail.
This test, with a corefile created via yaml2macho-core plus an
ObjectFileJSON binary with symbol addresses and ranges, was failing
on some machines/CI because the wrong ABI was being picked.
The bytes of the functions were not included in the yaml or .json
binary. The unwind falls back to using the ABI plugin default
unwind plans. We have two armv7 ABIs - the Darwin ABI that always
uses r7 as the frame pointer, and the AAPCS ABI which uses r11 code.
In reality, armv7 code uses r11 in arm mode, r7 in thumb code. But
the ABI ArchDefaultUnwindPlan doesn't have any access to the Target's
ArchSpec or Process register state, to determine the correct processor
state (arm or thumb). And in fact, on Cortex-M targets, the
instructions are always thumb, so the arch default unwind plan
(hardcoded r11) is always wrong.
The corefile doesn't specify a vendor/os, only a cpu.
The object file json specifies the armv7m-apple-* triple, which will
select the correct ABI plugin, and the test runs.
In some cases, it looks like the Process ABI was fetched after
opening the corefile, but before the binary.json was loaded and
corrected the Target's ArchSpec. And we never re-evaluate the ABI
once it is set, in a Process. When we picked the AAPCS armv7 ABI,
we would try to use r11 as frame pointer, and the unwind would stop
after one stack frame.
I'm stepping around this problem by (1) adding the register bytes of
the prologues of every test function in the backtrace, and (2)
shortening the function ranges (in binary.json) to specify that the
functions are all just long enough for the prologue where execution
is stopped. The instruction emulation plugin will fail if it can't
get all of the bytes from the function instructions, so I hacked
the function sizes in the .json to cover the prologue plus one and
changed the addresses in the backtrace to fit within those ranges.
[ updated this commit to keep the @skipIfRemote on the API test
because two remote CI bots are failing for reasons I don't quite
see. ]
This patch adds the notion of "Facade" locations which can be reported
from a ScriptedResolver instead of the actual underlying breakpoint
location for the breakpoint. Also add a "was_hit" method to the scripted
resolver that allows the breakpoint to say which of these "Facade"
locations was hit, and "get_location_description" to provide a
description for the facade locations.
I apologize in advance for the size of the patch. Almost all of what's
here was necessary to (a) make the feature testable and (b) not break
any of the current behavior.
The motivation for this feature is given in the "Providing Facade
Locations" section that I added to the python-reference.rst so I won't
repeat it here.
rdar://152112327
When a thread reaches a breakpoint at the return address set by
`ThreadPlanStepOut`, `ThreadPlanStepOut::ShouldStop()` calls
`ThreadPlanShouldStopHere::InvokeShouldStopHereCallback()`, and if it
returns `false`, `ThreadPlanShouldStopHere::QueueStepOutFromHerePlan()`
is called to queue a new plan to skip the corresponding range. Once the
new plan finishes, `ThreadPlanStepOut::ShouldStop()` should recheck the
stop condition; however, there is no code path in the method that sets
`done` to `true`. Before #126838, if `done` was `false`, the method checked
if a suitable frame had been reached. After the patch, the check is only
performed at a breakpoint; thus, the execution continues.
This patch causes `ThreadPlanStepOut::ShouldStop()` to recheck the stop
condition when `m_step_out_further_plan_sp` completes.
This test, with a corefile created via yaml2macho-core plus an
ObjectFileJSON binary with symbol addresses and ranges, was failing
on some machines/CI because the wrong ABI was being picked.
The bytes of the functions were not included in the yaml or .json
binary. The unwind falls back to using the ABI plugin default
unwind plans. We have two armv7 ABIs - the Darwin ABI that always
uses r7 as the frame pointer, and the AAPCS ABI which uses r11 code.
In reality, armv7 code uses r11 in arm mode, r7 in thumb code. But
the ABI ArchDefaultUnwindPlan doesn't have any access to the Target's
ArchSpec or Process register state, to determine the correct processor
state (arm or thumb). And in fact, on Cortex-M targets, the
instructions are always thumb, so the arch default unwind plan
(hardcoded r11) is always wrong.
The corefile doesn't specify a vendor/os, only a cpu.
The object file json specifies the armv7m-apple-* triple, which will
select the correct ABI plugin, and the test runs.
In some cases, it looks like the Process ABI was fetched after
opening the corefile, but before the binary.json was loaded and
corrected the Target's ArchSpec. And we never re-evaluate the ABI
once it is set, in a Process. When we picked the AAPCS armv7 ABI,
we would try to use r11 as frame pointer, and the unwind would stop
after one stack frame.
I'm stepping around this problem by (1) adding the register bytes of
the prologues of every test function in the backtrace, and (2)
shortening the function ranges (in binary.json) to specify that the
functions are all just long enough for the prologue where execution
is stopped. The instruction emulation plugin will fail if it can't
get all of the bytes from the function instructions, so I hacked
the function sizes in the .json to cover the prologue plus one and
changed the addresses in the backtrace to fit within those ranges.
This test failed during testing on the RISC-V target because we couldn't
strip the main label from the binary. main is dynamically linked when
the -fPIC flag is enabled. The RISC-V ABI requires that executables
support loading at arbitrary addresses to enable shared libraries and
secure loading (ASLR). In PIC mode, function addresses cannot be
hardcoded in the code. Instead, code is generated to load addresses from
the GOT/PLT tables, which are initialized by the dynamic loader. The
reference to main thus ends up in .dynsym and is dynamically bound. We
cannot strip main or any other dynamically linked functions because
these functions are referenced indirectly via dynamic linking tables
(.plt and .got). Removing these symbols would break the dynamic linking
mechanism needed to resolve function addresses at runtime, causing the
executable to fail to correctly call them.
This patch adds a load core time, right now we don't have much insight
into the performance of load core, especially for large coredumps. To
start collecting information on this I've added some minor
instrumentation code to measure the two call sites of `LoadCore`.
I've also added a test to validate the new metric is output in
statistics dump
This started as me being annoyed that I got loads of this when
inspecting memory regions on Mac:
Modified memory (dirty) page list provided, 0 entries.
So I thought I should test the existing behaviour, which led me to
refactor the existing test to run the same checks on all regions.
In the process I realised that the output is not wrong. There is a
difference between knowing that no pages are dirty and not knowing
anything about dirty pages. We print that there are 0 entries so the
user knows that difference.
The test case now checks "memory region" output as well as API use.
There were also some checks only run on certain regions, like page size,
which now run for all of them.
The `ProcessElfCore::FindModuleUUID` function can be called by multiple
threads at the same time when `target.parallel-module-load` is true. We
were using the `operator[]` to lookup the UUID which will mutate the map
when the key is not present. This is unsafe in a multi-threaded contex
so we now use a read-only `find` operation and explicitly return an
invalid UUID when the key is not present.
The `m_uuids` map can follow a create-then-query pattern. It is
populated in the `DoLoadCore` function which looks like it is only
called in a single-threaded context so we do not need extra locking as
long as we keep the other accesses read-only.
Other fixes I considered
* Use a lock to protect access - We don't need to modify the map after
creation so we can allow concurrent read-only access.
* Store the map in a const pointer container to prevent accidental
mutation in other places.
- Only accessed in one place currently so just added a comment.
* Store the UUID in the NT_FILE_Entry after building the mapping
correctly in `UpdateBuildIdForNTFileEntries`. - The map lets us avoid a
linear search in `FindModuleUUID`.
This commit also reverts the temporary workaround from #159395 which
disabled parallel module loading to avoid the test failure.
Fixes#159377
Since #157170 this test has been flakey on several LLDB buildbots. I
suspect it's to do with mutli-threading, there are more details in
#159377.
Disable parallel loading for now so we are not spamming people making
unrelated changes.
These no longer compile because the warning now defaults to an error
after #157364, so downgrade the error to a warning for now; I’m not
familiar enough with either LLDB or MacOS to fix these warnings properly
(assuming they’re unintended).
The lldb-remote-linux-ubuntu bot (and only this bot) is still failing
for TestCortexMExceptionUnwind.py because the Target triple is
somehow inheriting a non-Darwin OS. I marked this API test
skipUnlessDarwin but this bot can be identified more specifically
by a skipIfRemote test. There's no benefit to running this test
remotely anyway; it doesn't execute any code.
to possibly debug why this test fails on the
lldb-remote-linux-ubuntu CI bot. I'm sure the Target ArchSpec is
somehow ending up _not_ armv7m-apple-* like it should be, but I'd
like to gather a little more info before I just give up on running
this test on linux systems.
When I added support for the Cortex-M exception return unwinding,
I got CI test failures on the lldb-remote-linux-ubuntu bot. The
triple from my test `binary.json`, "armv7m-apple", was not being
used for the Target, so the incorrect SysV / AAPCS ABI was being
selected, and that ABI plugin has default unwind plans that hardcode
the arm-code r11 frame pointer behavior. This is a Cortex-M
core, and r7 should be used. The Darwin Arm ABI plugin uses r7 for
both arm and thumb, and behaves correctly.
Try getting the triple from `binary.json` in the API test, creating
the target with that triple explicitly before loading the corefile.
This may help prevent however we were losing the "-apple-" part of
the triple. We'll see what the CI bot looks like with this added.
`INSERT BEFORE` keyword is not supported in current versions gold and
mold linkers. Since we cannot confirm accurately what linker and version
is available on the system and when it will be supported. We test it
with a sample program using the script keywords.
I'm getting a failure on one linux CI, lldb-remote-linux-ubuntu,
where the test of assertEqual(thread.GetNumFrames(), 6) fails
because the unwinder only has one frame, most likely the
target triple is not set to armv7m-apple-* so ABISysV_arm is
being used instead of ABIMacOSX_arm and the architecture default
unwind plans behave differently. Possibly the frame pointer
register, or possibly the way the arch default unwind plans are
structured.
Skipping on linux for now until I can debug further. Can repo on
Darwin by changing the binary.json `"triple": "armv7m-apple"` to
armv7m-linux.
This is something the StopInfo class manages, so it should be allowed to
compute this rather than having SBThread do so. This code just moves the
computation to methods in StopInfo. It is mostly NFC. The one change
that I actually had to adjust the tests for was a couple of tests that
were asking for the UnixSignal stop info data by asking for the data at
index 1. GetStopInfoDataCount returns 1 and we don't do 1 based indexing
so the test code was clearly wrong. But I don't think it makes sense to
perpetuate handing out the value regardless of what index you pass us.
When a processor faults/is interrupted/gets an exception, it will stop
running code and jump to an exception catcher routine. Most processors
will store the pc that was executing in a system register, and the
catcher functions have special instructions to retrieve that & possibly
other registers. It may then save those values to stack, and the author
can add .cfi directives to tell lldb's unwinder where to find those
saved values.
ARM Cortex-M (microcontroller) processors have a simpler mechanism where
a fixed set of registers are saved to the stack on an exception, and a
unique value is put in the link register to indicate to the caller that
this has taken place. No special handling needs to be written into the
exception catcher, unless it wants to inspect these preserved values.
And it is possible for a general stack walker to walk the stack with no
special knowledge about what the catch function does.
This patch adds an Architecture plugin method to allow an Architecture
to override/augment the UnwindPlan that lldb would use for a stack
frame, given the contents of the return address register. It resembles a
feature where the LanguageRuntime can replace/augment the unwind plan
for a function, but it is doing it at offset by one level. The
LanguageRuntime is looking at the local register context and/or symbol
name to decide if it will override the unwind rules. For the Cortex-M
exception unwinds, we need to modify THIS frame's unwind plan if the
CALLER's LR had a specific value. RegisterContextUnwind has to retrieve
the caller's LR value before it has completely decided on the UnwindPlan
it will use for THIS stack frame.
This does mean that we will need one additional read of stack memory
than we currently do when unwinding, on Armv7 Cortex-M targets. The
unwinder walks the stack lazily, as stack frames are requested, and so
now if you ask for 2 stack frames, we will read enough stack to walk 2
frames, plus we will read one extra word of memory, the spilled LR value
from the stack. In practice, with 512-byte memory cache reads, this is
unlikely to be a real performance hit.
This PR includes a test with a yaml corefile description and a JSON
ObjectFile, incorporating all of the necessary stack memory and symbol
names from a real debug session I worked on. The architectural default
unwind plans are used for all stack frames except the 0th because
there's no instructions for the functions, and no unwind info. I may
need to add an encoding of unwind fules to ObjectFileJSON in the future
as we create more test cases like this.
This PR depends on the yaml2macho-core utility from
https://github.com/llvm/llvm-project/pull/153911 to run its API test.
rdar://110663219
Due to a fallback in GDBRemoteCommunicationClient.cpp, on Linux we will
assume a PID of 1 if the remote does not respond in some way that tells
us the real PID.
So if PID 1 happened to be a process that the current user could read,
we would try to debug that instead. On my current machine, PID 1 was
sshd run by root so we would ignore it. If I changed the fallback to
some process ID run by my user, the test would fail.
To prevent this, select the remote-linux platform before creating the
target. This means we won't attempt any host lookups.
Fixes#155895
This patch loads values of the VFP registers from the NT_ARM_VFP note.
Note that a CORE/NT_FPREGSET note is typically present in core dump
files and is used to store the FPA registers. The FPA unit is rare and
obsolete; however, Linux creates the note even if the unit is absent.
When hitting a sanitizer breakpoint, LLDB currently displays the frame
in the sanitizer dylib (which we usually don't have debug-info for),
which isn't very helpful to the user. A more helpful frame to display
would be the first frame not in the sanitizer library (we have a
[similar heuristic when we trap inside
libc++](https://github.com/llvm/llvm-project/pull/108825)). This patch
does just that, by implementing the `GetSuggestedStackFrameIndex` API
Depends on https://github.com/llvm/llvm-project/pull/133078
This came up in https://github.com/llvm/llvm-project/issues/155691.
For `std::basic_string` our formatter matching logic required the
allocator template parameter to be a `std::allocator`. There is no
compelling reason (that I know of) why this would be required for us to
apply the existing formatter to the string. We don't check the
`allocator` parameter for other STL containers either. This meant that
`std::string` that used custom allocators wouldn't be formatted. This
patch relaxes the regex for `basic_string`.
The LLVM Style Guide says the following about error and warning messages
[1]:
> [T]o match error message styles commonly produced by other tools,
> start the first sentence with a lowercase letter, and finish the last
> sentence without a period, if it would end in one otherwise.
I often provide this feedback during code review, but we still have a
bunch of places where we have inconsistent error message, which bothers
me as a user. This PR identifies a handful of those places and updates
the messages to be consistent.
[1] https://llvm.org/docs/CodingStandards.html#error-and-warning-messages
This patch introduces a new scripting affordance in lldb:
`ScriptedFrame`.
This allows user to produce mock stackframes in scripted threads and
scripted processes from a python script.
With this change, StackFrame can be synthetized from different sources:
- Either from a dictionary containing a load address, and a frame index,
which is the legacy way.
- Or by creating a ScriptedFrame python object.
One particularity of synthezising stackframes from the ScriptedFrame
python object, is that these frame have an optional PC, meaning that
they don't have a report a valid PC and they can act as shells that just
contain static information, like the frame function name, the list of
variables or registers, etc. It can also provide a symbol context.
rdar://157260006
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Add a bunch of mnemonics to the command options now that they're
highlighted in the help output. This uncovered two issues:
- We had an instance where we weren't applying the ANSI formatting.
- We had a place where we were now incorrectly computing the column
width.
Both are fixed by this PR.
This patch changes the way frames created from scripted affordances like
Scripted Threads are displayed. Currently, they're marked artificial
which is used usually for compiler generated frames.
This patch changes that behaviour by introducing a new synthetic
StackFrame kind and moves 'artificial' to be a distinct StackFrame
attribut.
On top of making these frames less confusing, this allows us to know
when a frame was created from a scripted affordance.
rdar://155949703
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
The `TestVariableAnnotationsDisassembler.py` test assembles
`d_original_example.s`,
which contains ELF-specific directives such as:
- `.ident`
- `.section ".note.GNU-stack", "", @progbits`
- `.section .debug_line, "", @progbits`
These directives are not understood by COFF on Windows, so the test
fails
on the lldb-remote-linux-win builder even when running on x86_64.
This patch adds a decorator to gate the test,
- `@skipUnlessPlatform(["linux", "freebsd", "netbsd", "android"])` —
runs only on ELF platforms
Follow-up to #155942.