This allows the unwinder to handle code with mid-function epilogues
where the subsequent code is reachable through a backwards branch.
Two changes are required to accomplish this:
1. Do not enqueue the subsequent instruction if the current instruction
is a barrier(*).
2. When processing an instruction, stop ignoring branches with negative
offsets.
(*) As per the definition in LLVM's MC layer, a barrier is any
instruction that "stops control flow from executing the instruction
immediately following it". See `MCInstrDesc::isBarrier` in MCInstrDesc.h
Part of a sequence of PRs:
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
#169630
[lldb][NFC] Rename forward_branch_offset to branch_offset in
UnwindAssemblyInstEmulation #169631
[lldb] Add DisassemblerLLVMC::IsBarrier API #169632
[lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633
commit-id:fd266c13
This will reduce the diff in subsequent patches
Part of a sequence of PRs:
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
#169630
[lldb][NFC] Rename forward_branch_offset to branch_offset in
UnwindAssemblyInstEmulation #169631
[lldb] Add DisassemblerLLVMC::IsBarrier API #169632
[lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633
commit-id:5e758a22
Currently, UnwindAssemblyInstEmulation visits instructions in the order
in which they appear in a function. This commit makes an NFCI change to
UnwindAssemblyInstEmulation so that it follows the function's CFG:
1. The first instruction is enqueued.
2. While the queue is not empty:
2.1 Visit the instruction in the *back* queue to compute the new unwind
state.
2.2 Push(+) the next instruction to the *back* of the queue.
2.3 If the instruction is a forward branch with a known branch target,
push(+) the destination instruction to the *front* of the queue.
(+) Only push if this instruction hasn't been enqueued before.
(+) When pushing an instruction, the current unwind state is attached to
it.
Note that:
* the "next instruction" is pushed to the *back* of the queue,
* a branch target is pushed to the *front* of the queue, and
* we always dequeue from the *back* of the queue.
This means that consecutive instructions are visited one after the
other; this is important to support "conditional blocks" [1] of
instructions (see the line with "if last_condition != new_condition").
This is arguably a very Thumb specific thing, so maybe it shouldn't be
in the generic algorithm; that said, it is already in the code, so we
have to support it.
The main reason this patch is NFCI and not NFC is that, now, the
destination of a forward branch is visited in a slightly different
moment than before. This should not cause any changes in output, as if a
branch destination is reachable through two different paths, any well
behaved compiler will generate the same unwind state in both paths.
The motivation for this patch is to change step 2.2 so that it _only_
pushes the next instruction if the current instruction is not an
unconditional branch / return, and to change step 2.3 so that backwards
branches are also allowed, fixing the bug described by [2].
[1]:
https://developer.arm.com/documentation/dui0473/m/arm-and-thumb-instructions/it
[2]: https://github.com/llvm/llvm-project/pull/168398
Part of a sequence of PRs:
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
#169630
[lldb][NFC] Rename forward_branch_offset to branch_offset in
UnwindAssemblyInstEmulation #169631
[lldb] Add DisassemblerLLVMC::IsBarrier API #169632
[lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633
commit-id:dce6b515
In most places, the rows are copied anyway (because they are generated
by cumulating modifications) immediately after adding them to the unwind
plans. In others, they can be moved into the unwind plan. This lets us
remove some backflip copies and make `const UnwindPlan` actually mean
something.
I've split this patch into two (and temporarily left both APIs) as this
patch was getting a bit big. This patch covers all the interesting
cases. Part two all about converting "architecture default" unwind plans
from ABI and InstructionEmulation plugins.
My main motivation was trying to understand how the function and whether
the rows need to be (shared) pointers. I noticed that the function
essentially constructs two copies unwind plans in parallel (the second
being the saved_unwind_states).
If we delay the construction of the unwind plan to the end of the
function, then we never need two copies of a single row (we can just
move it into the final result), so we can just use them as value types.
This makes the overall logic of the function stand out better as it
avoids the laborious deep copies of the Row shared pointer.
I've also noticed that a large portion of the function is devoted to
recomputing certain properties of the unwind state (e.g. the
`m_fp_is_cfa` field). Instead of doing that, this patch just
saves/restores them together with the rest of the state.
The whole unwind plan is already stored in a shared pointer, and there's
no need to persist Rows individually. If there's ever a need to do that,
there are at least two options:
- copy the row (they're not that big, and they're being copied left and
right during construction already)
- use the shared_ptr subobject constructor to create a shared_ptr which
points to a Row but holds the entire unwind plan alive
This also changes all of the getter functions to return const Row
pointers, which is important for safety because all of these objects are
cached and potentially accessed from multiple threads. (Technically one
could hand out `shared_ptr<const Row>`s, but we don't have a habit of
doing that.)
As a next step, I'd like to remove the internal UnwindPlan usages of the
shared pointer, but I'm doing this separately to gauge feedback, and
also because the patch got rather big.
Add the ability to override the disassembly CPU and CPU features through
a target setting (`target.disassembly-cpu` and
`target.disassembly-features`) and a `disassemble` command option
(`--cpu` and `--features`).
This is especially relevant for architectures like RISC-V which relies
heavily on CPU extensions.
The majority of this patch is plumbing the options through. I recommend
looking at DisassemblerLLVMC and the test for the observable change in
behavior.
…unction (#91321)"
This reapplies fd1bd53ba5a06f344698a55578f6a5d79c457e30, which was
reverted due to a test failure on aarch64/windows. The failure was
caused by a combination of several factors:
- clang targeting aarch64-windows (unlike msvc, and unlike clang
targeting other aarch64 platforms) defaults to -fomit-frame-pointers
- lldb's code for looking up register values for `<same>` unwind rules
is recursive
- the test binary creates a very long chain of fp-less function frames
(it manages to fit about 22k frames before it blows its stack)
Together, these things have caused lldb to recreate the same deep
recursion when unwinding through this, and blow its own stack as well.
Since lldb frames are larger, about 4k frames like this was sufficient
to trigger the stack overflow.
This version of the patch works around this problem by increasing the
frame size of the test binary, thereby causing it to blow its stack
sooner. This doesn't fix the issue -- the same problem can occur with a
real binary -- but it's not very likely, as it requires an infinite
recursion in a simple (so it doesn't use the frame pointer) function
with a very small frame (so you can fit a lot of them on the stack).
A more principled fix would be to make lldb's lookup code non-recursive,
but I believe that's out of scope for this patch.
The original patch description follows:
A leaf function may not store the link register to stack, but we it can
still end up being a non-zero frame if it gets interrupted by a signal.
Currently, we were unable to unwind past this function because we could
not read the link register value.
To make this work, this patch:
- changes the function-entry unwind plan to include the `fp|lr = <same>`
rules. This in turn necessitated an adjustment in the generic
instruction emulation logic to ensure that `lr=[sp-X]` can override the
`<same>` rule.
- allows the `<same>` rule for pc and lr in all
`m_all_registers_available` frames (and not just frame zero).
The test verifies that we can unwind in a situation like this, and that
the backtrace matches the one we computed before getting a signal.
This reverts commit fd1bd53ba5a06f344698a55578f6a5d79c457e30.
TestInterruptBacktrace was broken on AArch64/Windows as a result of this change.
See lldb-aarch64-windows buildbot here:
https://lab.llvm.org/buildbot/#/builders/219/builds/11261
A leaf function may not store the link register to stack, but we it can
still end up being a non-zero frame if it gets interrupted by a signal.
Currently, we were unable to unwind past this function because we could
not read the link register value.
To make this work, this patch:
- changes the function-entry unwind plan to include the `fp|lr = <same>`
rules. This in turn necessitated an adjustment in the generic
instruction emulation logic to ensure that `lr=[sp-X]` can override the
`<same>` rule.
- allows the `<same>` rule for pc and lr in all
`m_all_registers_available` frames (and not just frame zero).
The test verifies that we can unwind in a situation like this, and that
the backtrace matches the one we computed before getting a signal.
Walter Erquinigo added optional instruction annotations for x86
instructions in 2022 for the `thread trace dump instruction` command,
and code to DisassemblerLLVMC to add annotations for instructions that
change flow control, v. https://reviews.llvm.org/D128477
This was added as an option to `disassemble`, and the trace dump command
enables it by default, but several other instruction dumpers were
changed to display them by default as well. These are only implemented
for Intel instructions, so our disassembly on other targets ends up
looking like
```
(lldb) x/5i 0x1000086e4
0x1000086e4: 0xa9be6ffc unknown stp x28, x27, [sp, #-0x20]!
0x1000086e8: 0xa9017bfd unknown stp x29, x30, [sp, #0x10]
0x1000086ec: 0x910043fd unknown add x29, sp, #0x10
0x1000086f0: 0xd11843ff unknown sub sp, sp, #0x610
0x1000086f4: 0x910c63e8 unknown add x8, sp, #0x318
```
instead of `disassemble`'s output style of
```
lldb`main:
lldb[0x1000086e4] <+0>: stp x28, x27, [sp, #-0x20]!
lldb[0x1000086e8] <+4>: stp x29, x30, [sp, #0x10]
lldb[0x1000086ec] <+8>: add x29, sp, #0x10
lldb[0x1000086f0] <+12>: sub sp, sp, #0x610
lldb[0x1000086f4] <+16>: add x8, sp, #0x318
```
Adding symbolic annotations for assembly instructions is something I'm
interested in too, because we may have users investigating a crash or
apparent-incorrect behavior who must debug optimized assembly and they
may not be familiar with the ISA they're using, so short of flipping
through a many-thousand-page PDF to understand each instruction, they're
lost. They don't write assembly or work at that level, but to understand
a bug, they have to understand what the instructions are actually doing.
But the annotations that exist today don't move us forward much on that
front - I'd argue that the flow control instructions on Intel are not
hard to understand from their names, but that might just be my personal
bias. Much trickier instructions exist in any event.
Displaying this information by default for all targets when we only have
one class of instructions on one target is not a good default.
Also, in 2011 when Greg implemented the `memory read -f i` (aka `x/i`)
command
```
commit 5009f9d5010a7e34ae15f962dac8505ea11a8716
Author: Greg Clayton <gclayton@apple.com>
Date: Thu Oct 27 17:55:14 2011 +0000
[...]
eFormatInstruction will print out disassembly with bytes and it will use the
current target's architecture. The format character for this is "i" (which
used to be being used for the integer format, but the integer format also has
"d", so we gave the "i" format to disassembly), the long format is
"instruction".
```
he had DumpDataExtractor's DumpInstructions print the bytes of the
instruction -- that's the first field we see above for the `x/5i` after
the address -- and this is only useful for people who are debugging the
disassembler itself, I would argue. I don't want this displayed by
default either.
tl;dr this patch removes both fields from `memory read -f -i` and I
think this is the right call today. While I'm really interested in
instruction annotation, I don't think `x/i` is the right place to have
it enabled by default unless it's really compelling on at least some of
our major targets.
This is the first step to being able to handle non
trivial types in the union.
info_type effects the lifetime of the objects in the union,
so making it private means we know you have to call one of the
Set<...> functions to change it.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D134039
In UnwindAssemblyInstEmulation we correctly recognize when a LDP
restores the fp & lr in an epilogue, and mark them as having the
caller's contents now, but we don't update the CFA register rule
at that point to indicate that the CFA is now calculated in terms
of $sp. This doesn't impact the backtrace because the register
contents are all <same> now, but it can confuse the stepper when
the StackID changes mid-epilogue.
Differential Revision: https://reviews.llvm.org/D124492
rdar://92064415
Most of our code was including Log.h even though that is not where the
"lldb" log channel is defined (Log.h defines the generic logging
infrastructure). This worked because Log.h included Logging.h, even
though it should.
After the recent refactor, it became impossible the two files include
each other in this direction (the opposite inclusion is needed), so this
patch removes the workaround that was put in place and cleans up all
files to include the right thing. It also renames the file to LLDBLog to
better reflect its purpose.
There is no reason why this function should be returning a ConstString.
While modifying these files, I also fixed several instances where
GetPluginName and GetPluginNameStatic were returning different strings.
I am not changing the return type of GetPluginNameStatic in this patch, as that
would necessitate additional changes, and this patch is big enough as it is.
Differential Revision: https://reviews.llvm.org/D111877
In all these years, we haven't found a use for this function (it has
zero callers). Lets just remove the boilerplate.
Differential Revision: https://reviews.llvm.org/D109600
Commiting this patch for Augusto Noronha who is getting set
up still.
This patch changes Target::ReadMemory so the default behavior
when a read is in a Section that is read-only is to fetch the
data from the local binary image, instead of reading it from
memory. Update all callers to use their old preferences
(the old prefer_file_cache bool) using the new API; we should
revisit these calls and see if they really intend to read
live memory, or if reading from a read-only Section would be
equivalent and important for performance-sensitive cases.
rdar://30634422
Differential revision: https://reviews.llvm.org/D100338
Fix a bug where UnwindAssemblyInstEmulation would confuse which
register is used to compute the Canonical Frame Address after it
had branched over a mid-function epilogue (where the CFA reg changes
from $fp to $sp in the process of epiloguing). Reinstate the
correct CFA register after we forward the unwind rule for branch
targets. The failure mode was that UnwindAssemblyInstEmulation
would think CFA was set in terms of $sp after one of these epilogues,
and if it sees modifications to $sp after the branch target, it would
change the CFA offset in the unwind rule -- even though the CFA is
defined in terms of $fp and the $sp changes are irrelevant to correct
calculation.
<rdar://problem/60300528>
Differential Revision: https://reviews.llvm.org/D78077
This is a step towards making the initialize and terminate calls be
generated by CMake, which in turn is towards making it possible to
disable plugins at configuration time.
Differential revision: https://reviews.llvm.org/D74245
Summary:
A *.cpp file header in LLDB (and in LLDB) should like this:
```
//===-- TestUtilities.cpp -------------------------------------------------===//
```
However in LLDB most of our source files have arbitrary changes to this format and
these changes are spreading through LLDB as folks usually just use the existing
source files as templates for their new files (most notably the unnecessary
editor language indicator `-*- C++ -*-` is spreading and in every review
someone is pointing out that this is wrong, resulting in people pointing out that this
is done in the same way in other files).
This patch removes most of these inconsistencies including the editor language indicators,
all the different missing/additional '-' characters, files that center the file name, missing
trailing `===//` (mostly caused by clang-format breaking the line).
Reviewers: aprantl, espindola, jfb, shafik, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: dexonsmith, wuzish, emaste, sdardis, nemanjai, kbarton, MaskRay, atanasyan, arphaman, jfb, abidh, jsji, JDevlieghere, usaxena95, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D73258
Summary:
NFC = [[ https://llvm.org/docs/Lexicon.html#nfc | Non functional change ]]
This commit is the result of modernizing the LLDB codebase by using
`nullptr` instread of `0` or `NULL`. See
https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-nullptr.html
for more information.
This is the command I ran and I to fix and format the code base:
```
run-clang-tidy.py \
-header-filter='.*' \
-checks='-*,modernize-use-nullptr' \
-fix ~/dev/llvm-project/lldb/.* \
-format \
-style LLVM \
-p ~/llvm-builds/debug-ninja-gcc
```
NOTE: There were also changes to `llvm/utils/unittest` but I did not
include them because I felt that maybe this library shall be updated in
isolation somehow.
NOTE: I know this is a rather large commit but it is a nobrainer in most
parts.
Reviewers: martong, espindola, shafik, #lldb, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, JDevlieghere, teemperor, rnkovacs, emaste, kubamracek, nemanjai, ki.stfu, javed.absar, arichardson, kbarton, jrtc27, MaskRay, atanasyan, dexonsmith, arphaman, jfb, jsji, jdoerfert, lldb-commits, llvm-commits
Tags: #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D61847
llvm-svn: 361484
A lot of comments in LLDB are surrounded by an ASCII line to delimit the
begging and end of the comment.
Its use is not really consistent across the code base, sometimes the
lines are longer, sometimes they are shorter and sometimes they are
omitted. Furthermore, it looks kind of weird with the 80 column limit,
where the comment actually extends past the line, but not by much.
Furthermore, when /// is used for Doxygen comments, it looks
particularly odd. And when // is used, it incorrectly gives the
impression that it's actually a Doxygen comment.
I assume these lines were added to improve distinguishing between
comments and code. However, given that todays editors and IDEs do a
great job at highlighting comments, I think it's worth to drop this for
the sake of consistency. The alternative is fixing all the
inconsistencies, which would create a lot more churn.
Differential revision: https://reviews.llvm.org/D60508
llvm-svn: 358135
The `ap` suffix is a remnant of lldb's former use of auto pointers,
before they got deprecated. Although all their uses were replaced by
unique pointers, some variables still carried the suffix.
In r353795 I removed another auto_ptr remnant, namely redundant calls to
::get for unique_pointers. Jim justly noted that this is a good
opportunity to clean up the variable names as well.
I went over all the changes to ensure my find-and-replace didn't have
any undesired side-effects. I hope I didn't miss any, but if you end up
at this commit doing a git blame on a weirdly named variable, please
know that the change was unintentional.
llvm-svn: 353912
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
The dump function was the only part of this class which depended on
high-level functionality. This was due to the DumpDataExtractor
function, which uses info from a running target to control dump format
(although, RegisterValue doesn't really use the high-level part of
DumpDataExtractor).
This patch follows the same approach done for the DataExtractor class,
and extracts the dumping code into a separate function/file. This file
can stay in the higher level code, while the RegisterValue class and
anything that does not depend in dumping can stay go to lower layers.
The XCode project will need to be updated after this patch.
Reviewers: zturner, jingham, clayborg
Subscribers: lldb-commits, mgorny
Differential Revision: https://reviews.llvm.org/D48351
llvm-svn: 337832
This is intended as a clean up after the big clang-format commit
(r280751), which unfortunately resulted in many of the comment
paragraphs in LLDB being very hard to read.
FYI, the script I used was:
import textwrap
import commands
import os
import sys
import re
tmp = "%s.tmp"%sys.argv[1]
out = open(tmp, "w+")
with open(sys.argv[1], "r") as f:
header = ""
text = ""
comment = re.compile(r'^( *//) ([^ ].*)$')
special = re.compile(r'^((([A-Z]+[: ])|([0-9]+ )).*)|(.*;)$')
for line in f:
match = comment.match(line)
if match and not special.match(match.group(2)):
# skip intentionally short comments.
if not text and len(match.group(2)) < 40:
out.write(line)
continue
if text:
text += " " + match.group(2)
else:
header = match.group(1)
text = match.group(2)
continue
if text:
filled = textwrap.wrap(text, width=(78-len(header)),
break_long_words=False)
for l in filled:
out.write(header+" "+l+'\n')
text = ""
out.write(line)
os.rename(tmp, sys.argv[1])
Differential Revision: https://reviews.llvm.org/D46144
llvm-svn: 331197
The rationale here is that ArchSpec is used throughout the codebase,
including in places which should not depend on the rest of the code in
the Core module.
This commit touches many files, but most of it is just renaming of
#include lines. In a couple of cases, I removed the #include ArchSpec
line altogether, as the file was not using it. In one or two places,
this necessitated adding other #includes like lldb-private-defines.h.
llvm-svn: 318048
This renames the LLDB error class to Status, as discussed
on the lldb-dev mailing list.
A change of this magnitude cannot easily be done without
find and replace, but that has potential to catch unwanted
occurrences of common strings such as "Error". Every effort
was made to find all the obvious things such as the word "Error"
appearing in a string, etc, but it's possible there are still
some lingering occurences left around. Hopefully nothing too
serious.
llvm-svn: 302872