Finally figured out the issue with TestCortexMExceptionUnwind.py
failing on some CI. When the llvm is configured without the ARM
target enabled, the ARM ABI plugins will not be initialized in lldb,
and the unwind engine won't backtrace more than one stack frame.
This test requires that the ARM target be enabled in the llvm cmake.
Update the decorator to specify that; should be the end of these
mysterious fails.
Still seeing a failure on a CI bot with this test that
I cannot reproduce locally, but luckily this one CI bot is giving
me trace output from the tests. Turn on the unwind log when tracing
is enabled, migth get a better hint what's up with this test fail.
This test, with a corefile created via yaml2macho-core plus an
ObjectFileJSON binary with symbol addresses and ranges, was failing
on some machines/CI because the wrong ABI was being picked.
The bytes of the functions were not included in the yaml or .json
binary. The unwind falls back to using the ABI plugin default
unwind plans. We have two armv7 ABIs - the Darwin ABI that always
uses r7 as the frame pointer, and the AAPCS ABI which uses r11 code.
In reality, armv7 code uses r11 in arm mode, r7 in thumb code. But
the ABI ArchDefaultUnwindPlan doesn't have any access to the Target's
ArchSpec or Process register state, to determine the correct processor
state (arm or thumb). And in fact, on Cortex-M targets, the
instructions are always thumb, so the arch default unwind plan
(hardcoded r11) is always wrong.
The corefile doesn't specify a vendor/os, only a cpu.
The object file json specifies the armv7m-apple-* triple, which will
select the correct ABI plugin, and the test runs.
In some cases, it looks like the Process ABI was fetched after
opening the corefile, but before the binary.json was loaded and
corrected the Target's ArchSpec. And we never re-evaluate the ABI
once it is set, in a Process. When we picked the AAPCS armv7 ABI,
we would try to use r11 as frame pointer, and the unwind would stop
after one stack frame.
I'm stepping around this problem by (1) adding the register bytes of
the prologues of every test function in the backtrace, and (2)
shortening the function ranges (in binary.json) to specify that the
functions are all just long enough for the prologue where execution
is stopped. The instruction emulation plugin will fail if it can't
get all of the bytes from the function instructions, so I hacked
the function sizes in the .json to cover the prologue plus one and
changed the addresses in the backtrace to fit within those ranges.
[ updated this commit to keep the @skipIfRemote on the API test
because two remote CI bots are failing for reasons I don't quite
see. ]
This test, with a corefile created via yaml2macho-core plus an
ObjectFileJSON binary with symbol addresses and ranges, was failing
on some machines/CI because the wrong ABI was being picked.
The bytes of the functions were not included in the yaml or .json
binary. The unwind falls back to using the ABI plugin default
unwind plans. We have two armv7 ABIs - the Darwin ABI that always
uses r7 as the frame pointer, and the AAPCS ABI which uses r11 code.
In reality, armv7 code uses r11 in arm mode, r7 in thumb code. But
the ABI ArchDefaultUnwindPlan doesn't have any access to the Target's
ArchSpec or Process register state, to determine the correct processor
state (arm or thumb). And in fact, on Cortex-M targets, the
instructions are always thumb, so the arch default unwind plan
(hardcoded r11) is always wrong.
The corefile doesn't specify a vendor/os, only a cpu.
The object file json specifies the armv7m-apple-* triple, which will
select the correct ABI plugin, and the test runs.
In some cases, it looks like the Process ABI was fetched after
opening the corefile, but before the binary.json was loaded and
corrected the Target's ArchSpec. And we never re-evaluate the ABI
once it is set, in a Process. When we picked the AAPCS armv7 ABI,
we would try to use r11 as frame pointer, and the unwind would stop
after one stack frame.
I'm stepping around this problem by (1) adding the register bytes of
the prologues of every test function in the backtrace, and (2)
shortening the function ranges (in binary.json) to specify that the
functions are all just long enough for the prologue where execution
is stopped. The instruction emulation plugin will fail if it can't
get all of the bytes from the function instructions, so I hacked
the function sizes in the .json to cover the prologue plus one and
changed the addresses in the backtrace to fit within those ranges.
The lldb-remote-linux-ubuntu bot (and only this bot) is still failing
for TestCortexMExceptionUnwind.py because the Target triple is
somehow inheriting a non-Darwin OS. I marked this API test
skipUnlessDarwin but this bot can be identified more specifically
by a skipIfRemote test. There's no benefit to running this test
remotely anyway; it doesn't execute any code.
to possibly debug why this test fails on the
lldb-remote-linux-ubuntu CI bot. I'm sure the Target ArchSpec is
somehow ending up _not_ armv7m-apple-* like it should be, but I'd
like to gather a little more info before I just give up on running
this test on linux systems.
When I added support for the Cortex-M exception return unwinding,
I got CI test failures on the lldb-remote-linux-ubuntu bot. The
triple from my test `binary.json`, "armv7m-apple", was not being
used for the Target, so the incorrect SysV / AAPCS ABI was being
selected, and that ABI plugin has default unwind plans that hardcode
the arm-code r11 frame pointer behavior. This is a Cortex-M
core, and r7 should be used. The Darwin Arm ABI plugin uses r7 for
both arm and thumb, and behaves correctly.
Try getting the triple from `binary.json` in the API test, creating
the target with that triple explicitly before loading the corefile.
This may help prevent however we were losing the "-apple-" part of
the triple. We'll see what the CI bot looks like with this added.
I'm getting a failure on one linux CI, lldb-remote-linux-ubuntu,
where the test of assertEqual(thread.GetNumFrames(), 6) fails
because the unwinder only has one frame, most likely the
target triple is not set to armv7m-apple-* so ABISysV_arm is
being used instead of ABIMacOSX_arm and the architecture default
unwind plans behave differently. Possibly the frame pointer
register, or possibly the way the arch default unwind plans are
structured.
Skipping on linux for now until I can debug further. Can repo on
Darwin by changing the binary.json `"triple": "armv7m-apple"` to
armv7m-linux.
When a processor faults/is interrupted/gets an exception, it will stop
running code and jump to an exception catcher routine. Most processors
will store the pc that was executing in a system register, and the
catcher functions have special instructions to retrieve that & possibly
other registers. It may then save those values to stack, and the author
can add .cfi directives to tell lldb's unwinder where to find those
saved values.
ARM Cortex-M (microcontroller) processors have a simpler mechanism where
a fixed set of registers are saved to the stack on an exception, and a
unique value is put in the link register to indicate to the caller that
this has taken place. No special handling needs to be written into the
exception catcher, unless it wants to inspect these preserved values.
And it is possible for a general stack walker to walk the stack with no
special knowledge about what the catch function does.
This patch adds an Architecture plugin method to allow an Architecture
to override/augment the UnwindPlan that lldb would use for a stack
frame, given the contents of the return address register. It resembles a
feature where the LanguageRuntime can replace/augment the unwind plan
for a function, but it is doing it at offset by one level. The
LanguageRuntime is looking at the local register context and/or symbol
name to decide if it will override the unwind rules. For the Cortex-M
exception unwinds, we need to modify THIS frame's unwind plan if the
CALLER's LR had a specific value. RegisterContextUnwind has to retrieve
the caller's LR value before it has completely decided on the UnwindPlan
it will use for THIS stack frame.
This does mean that we will need one additional read of stack memory
than we currently do when unwinding, on Armv7 Cortex-M targets. The
unwinder walks the stack lazily, as stack frames are requested, and so
now if you ask for 2 stack frames, we will read enough stack to walk 2
frames, plus we will read one extra word of memory, the spilled LR value
from the stack. In practice, with 512-byte memory cache reads, this is
unlikely to be a real performance hit.
This PR includes a test with a yaml corefile description and a JSON
ObjectFile, incorporating all of the necessary stack memory and symbol
names from a real debug session I worked on. The architectural default
unwind plans are used for all stack frames except the 0th because
there's no instructions for the functions, and no unwind info. I may
need to add an encoding of unwind fules to ObjectFileJSON in the future
as we create more test cases like this.
This PR depends on the yaml2macho-core utility from
https://github.com/llvm/llvm-project/pull/153911 to run its API test.
rdar://110663219