This patch factors out the `-e` option logic into two helper functions.
The `EvaluateExpression` helper might seem redundant but I'll be adding
to it in a follow-up patch to fix an issue when running `memory find -e`
for Swift targets.
Also adds test coverage for the error cases that were previously
untested.
rdar://152113525
This patch pushes the error handling boundary for the GetBitSize()
methods from Runtime into the Type and CompilerType APIs. This makes it
easier to diagnose problems thanks to more meaningful error messages
being available. GetBitSize() is often the first thing LLDB asks about a
type, so this method is particularly important for a better user
experience.
rdar://145667239
This is intended for use with Arm's Guarded Control Stack extension
(GCS). Which reuses some existing shadow stack support in Linux. It
should also work with the x86 equivalent.
A "ss" flag is added to the "VmFlags" line of shadow stack memory
regions in `/proc/<pid>/smaps`. To keep the naming generic I've called
it shadow stack instead of guarded control stack.
Also the wording is "shadow stack: yes" because the shadow stack region
is just where it's stored. It's enabled for the whole process or it
isn't. As opposed to memory tagging which can be enabled per region, so
"memory tagging: enabled" fits better for that.
I've added a test case that is also intended to be the start of a set of
tests for GCS. This should help me avoid duplicating the inline assembly
needed.
Note that no special compiler support is needed for the test. However,
for the intial enabling of GCS (assuming the libc isn't doing it) we do
need to use an inline assembly version of prctl.
This is because as soon as you enable GCS, all returns are checked
against the GCS. If the GCS is empty, the program will fault. In other
words, you can never return from the function that enabled GCS, unless
you push values onto it (which is possible but not needed here).
So you cannot use the libc's prctl wrapper for this reason. You can use
that wrapper for anything else, as we do to check if GCS is enabled.
ValueObject is part of lldbCore for historical reasons, but conceptually
it deserves to be its own library. This does introduce a (link-time) circular
dependency between lldbCore and lldbValueObject, which is unfortunate
but probably unavoidable because so many things in LLDB rely on
ValueObject. We already have cycles and these libraries are never built
as dylibs so while this doesn't improve the situation, it also doesn't
make things worse.
The header includes were updated with the following command:
```
find . -type f -exec sed -i.bak "s%include \"lldb/Core/ValueObject%include \"lldb/ValueObject/ValueObject%" '{}' \;
```
This patch removes all of the Set.* methods from Status.
This cleanup is part of a series of patches that make it harder use the
anti-pattern of keeping a long-lives Status object around and updating
it while dropping any errors it contains on the floor.
This patch is largely NFC, the more interesting next steps this enables
is to:
1. remove Status.Clear()
2. assert that Status::operator=() never overwrites an error
3. remove Status::operator=()
Note that step (2) will bring 90% of the benefits for users, and step
(3) will dramatically clean up the error handling code in various
places. In the end my goal is to convert all APIs that are of the form
` ResultTy DoFoo(Status& error)
`
to
` llvm::Expected<ResultTy> DoFoo()
`
How to read this patch?
The interesting changes are in Status.h and Status.cpp, all other
changes are mostly
` perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git
grep -l SetErrorString lldb/source)
`
plus the occasional manual cleanup.
Compilers and language runtimes often use helper functions that are
fundamentally uninteresting when debugging anything but the
compiler/runtime itself. This patch introduces a user-extensible
mechanism that allows for these frames to be hidden from backtraces and
automatically skipped over when navigating the stack with `up` and
`down`.
This does not affect the numbering of frames, so `f <N>` will still
provide access to the hidden frames. The `bt` output will also print a
hint that frames have been hidden.
My primary motivation for this feature is to hide thunks in the Swift
programming language, but I'm including an example recognizer for
`std::function::operator()` that I wished for myself many times while
debugging LLDB.
rdar://126629381
Example output. (Yes, my proof-of-concept recognizer could hide even
more frames if we had a method that returned the function name without
the return type or I used something that isn't based off regex, but it's
really only meant as an example).
before:
```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb)
```
after
```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```
This change by itself has no measurable effect on the LLDB
testsuite. I'm making it in preparation for threading through more
errors in the Swift language plugin.
As we have debuginfod as symbol locator available in lldb now, we want
to make full use of it.
In case of post mortem debugging, we don't always have the main
executable available.
However, the .note.gnu.build-id of the main executable(some other
modules too), should be available in the core file, as those binaries
are loaded in memory and dumped in the core file.
We try to iterate through the NT_FILE entries, read and store the gnu
build id if possible. This will be very useful as this id is the unique
key which is needed for querying the debuginfod server.
Test:
Build and run lldb. Breakpoint set to
https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolLocator/Debuginfod/SymbolLocatorDebuginfod.cpp#L147
Verified after this commit, module_uuid is the correct gnu build id of
the main executable which caused the crash(first in the NT_FILE entry)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
This patch revives the effort to get this Phabricator patch into
upstream:
https://reviews.llvm.org/D137900
This patch was accepted before in Phabricator but I found some
-gsimple-template-names issues that are fixed in this patch.
A fixed up version of the description from the original patch starts
now.
This patch started off trying to fix Module::FindFirstType() as it
sometimes didn't work. The issue was the SymbolFile plug-ins didn't do
any filtering of the matching types they produced, and they only looked
up types using the type basename. This means if you have two types with
the same basename, your type lookup can fail when only looking up a
single type. We would ask the Module::FindFirstType to lookup "Foo::Bar"
and it would ask the symbol file to find only 1 type matching the
basename "Bar", and then we would filter out any matches that didn't
match "Foo::Bar". So if the SymbolFile found "Foo::Bar" first, then it
would work, but if it found "Baz::Bar" first, it would return only that
type and it would be filtered out.
Discovering this issue lead me to think of the patch Alex Langford did a
few months ago that was done for finding functions, where he allowed
SymbolFile objects to make sure something fully matched before parsing
the debug information into an AST type and other LLDB types. So this
patch aimed to allow type lookups to also be much more efficient.
As LLDB has been developed over the years, we added more ways to to type
lookups. These functions have lots of arguments. This patch aims to make
one API that needs to be implemented that serves all previous lookups:
- Find a single type
- Find all types
- Find types in a namespace
This patch introduces a `TypeQuery` class that contains all of the state
needed to perform the lookup which is powerful enough to perform all of
the type searches that used to be in our API. It contain a vector of
CompilerContext objects that can fully or partially specify the lookup
that needs to take place.
If you just want to lookup all types with a matching basename,
regardless of the containing context, you can specify just a single
CompilerContext entry that has a name and a CompilerContextKind mask of
CompilerContextKind::AnyType.
Or you can fully specify the exact context to use when doing lookups
like: CompilerContextKind::Namespace "std"
CompilerContextKind::Class "foo"
CompilerContextKind::Typedef "size_type"
This change expands on the clang modules code that already used a
vector<CompilerContext> items, but it modifies it to work with
expression type lookups which have contexts, or user lookups where users
query for types. The clang modules type lookup is still an option that
can be enabled on the `TypeQuery` objects.
This mirrors the most recent addition of type lookups that took a
vector<CompilerContext> that allowed lookups to happen for the
expression parser in certain places.
Prior to this we had the following APIs in Module:
```
void
Module::FindTypes(ConstString type_name, bool exact_match, size_t max_matches,
llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
TypeList &types);
void
Module::FindTypes(llvm::ArrayRef<CompilerContext> pattern, LanguageSet languages,
llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
TypeMap &types);
void Module::FindTypesInNamespace(ConstString type_name,
const CompilerDeclContext &parent_decl_ctx,
size_t max_matches, TypeList &type_list);
```
The new Module API is much simpler. It gets rid of all three above
functions and replaces them with:
```
void FindTypes(const TypeQuery &query, TypeResults &results);
```
The `TypeQuery` class contains all of the needed settings:
- The vector<CompilerContext> that allow efficient lookups in the symbol
file classes since they can look at basename matches only realize fully
matching types. Before this any basename that matched was fully realized
only to be removed later by code outside of the SymbolFile layer which
could cause many types to be realized when they didn't need to.
- If the lookup is exact or not. If not exact, then the compiler context
must match the bottom most items that match the compiler context,
otherwise it must match exactly
- If the compiler context match is for clang modules or not. Clang
modules matches include a Module compiler context kind that allows types
to be matched only from certain modules and these matches are not needed
when d oing user type lookups.
- An optional list of languages to use to limit the search to only
certain languages
The `TypeResults` object contains all state required to do the lookup
and store the results:
- The max number of matches
- The set of SymbolFile objects that have already been searched
- The matching type list for any matches that are found
The benefits of this approach are:
- Simpler API, and only one API to implement in SymbolFile classes
- Replaces the FindTypesInNamespace that used a CompilerDeclContext as a
way to limit the search, but this only worked if the TypeSystem matched
the current symbol file's type system, so you couldn't use it to lookup
a type in another module
- Fixes a serious bug in our FindFirstType functions where if we were
searching for "foo::bar", and we found a "baz::bar" first, the basename
would match and we would only fetch 1 type using the basename, only to
drop it from the matching list and returning no results
[lldb] Part 2 of 2 - Refactor `CommandObject::DoExecute(...)` to return
`void` instead of ~~`bool`~~
Justifications:
- The code doesn't ultimately apply the `true`/`false` return values.
- The methods already pass around a `CommandReturnObject`, typically
with a `result` parameter.
- Each command return object already contains:
- A more precise status
- The error code(s) that apply to that status
Part 1 refactors the `CommandObject::Execute(...)` method.
- See
[https://github.com/llvm/llvm-project/pull/69989](https://github.com/llvm/llvm-project/pull/69989)
rdar://117378957
Refactor OptionValue to return a std::optional instead of taking a fail
value. This allows the caller to handle situations where there's no
value, instead of being unable to distinguish between the absence of a
value and the value happening the match the fail value. When a fail
value is required, std::optional::value_or() provides the same
functionality.
This is a follow up to https://reviews.llvm.org/D141629
and applies the change it made to all paths through ToAddress
(now DoToAddress).
I have included the test from my previous attempt
https://reviews.llvm.org/D136938.
The initial change only applied fixing to addresses that
would parse as integers, so my test case failed. Since
ToAddress has multiple exit points, I've wrapped it into
a new method DoToAddress.
Now you can call ToAddress, it will call DoToAddress and
no matter what path you take, the address will be fixed.
For the memory tagging commands we actually want the full
address (to work out mismatches). So I added ToRawAddress
for that.
I have tested this on a QEMU AArch64 Linux system with
Memory Tagging, Pointer Authentication and Top Byte Ignore
enabled. By running the new test and all other tests in
API/linux/aarch64.
Some commands have had calls to the ABI plugin removed
as ToAddress now does this for them.
The "memory region" command still needs to use the ABI plugin
to detect the end of memory when there are non-address bits.
Reviewed By: jasonmolenda
Differential Revision: https://reviews.llvm.org/D142715
std::optional::value() has undesired exception checking semantics and is
unavailable in older Xcode (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). The
call sites block std::optional migration.
There are two conditions for the loop exit. Either we hit LLDB_INVALID_ADDRESS
or the ABI tells us we are beyond mappable memory.
I made a mistake in that second part that meant if you had no ABI plugin
--all would stop on the first loop and return nothing.
If there's no ABI plugin we should only check for LLDB_INVALID_ADDRESS.
Depends on D134029
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D134030
When I wrote the initial version I forgot that a region being
unmapped is not an error. There are real errors that we don't
want to hide, such as the remote not supporting the
qMemoryRegionInfo packet (gdbserver does not).
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D134029
Refactor the command option enum values and the command argument table
to connect the two. This has two benefits:
- We guarantee that two options that use the same argument type have
the same accepted values.
- We can print the enum values and their description in the help
output. (D129707)
Differential revision: https://reviews.llvm.org/D129703
Adds a check to ensure that a process exists before attempting to get
its ABI to prevent lldb from crashing due to trying to read from page zero.
Differential revision: https://reviews.llvm.org/D127016
This is off by default. If you get a result and that
memory has memory tags, when --show-tags is given you'll
see the tags inline with the memory content.
```
(lldb) memory read mte_buf mte_buf+64 --show-tags
<...>
0xfffff7ff8020: 00 00 00 00 00 00 00 00 0d f0 fe ca 00 00 00 00 ................ (tag: 0x2)
<...>
(lldb) memory find -e 0xcafef00d mte_buf mte_buf+64 --show-tags
data found at location: 0xfffff7ff8028
0xfffff7ff8028: 0d f0 fe ca 00 00 00 00 00 00 00 00 00 00 00 00 ................ (tags: 0x2 0x3)
0xfffff7ff8038: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ (tags: 0x3 0x4)
```
The logic for handling alignments is the same as for memory read
so in the above example because the line starts misaligned to the
granule it covers 2 granules.
Depends on D125089
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D125090
This reverts commit 3e928c4b9dfb01efd2cb968795e605760828e873.
This fixes an issue seen on Windows where we did not properly
get the section names of regions if they overlapped. Windows
has regions like:
[0x00007fff928db000-0x00007fff949a0000) ---
[0x00007fff949a0000-0x00007fff949a1000) r-- PECOFF header
[0x00007fff949a0000-0x00007fff94a3d000) r-x .hexpthk
[0x00007fff949a0000-0x00007fff94a85000) r-- .rdata
[0x00007fff949a0000-0x00007fff94a88000) rw- .data
[0x00007fff949a0000-0x00007fff94a94000) r-- .pdata
[0x00007fff94a94000-0x00007fff95250000) ---
I assumed that you could just resolve the address and get the section
name using the start of the region but here you'd always get
"PECOFF header" because they all have the same start point.
The usual command repeating loop used the end address of the previous
region when requesting the next, or getting the section name.
So I've matched this in the --all scenario.
In the example above, somehow asking for the region at
0x00007fff949a1000 would get you a region that starts at
0x00007fff949a0000 but has a different end point. Using the load
address you get (what I assume is) the correct section name.
This does 2 things:
* Moves it after the short options. Which makes sense given it's
a niche, default off option.
(if 2 files for one option seems a bit much, I am going to reuse
them for "memory find" later)
* Fixes the use of repeated commands. For example:
memory read buf --show-tags
<shows tags>
memory read
<shows tags>
Added tests for the repetition and updated existing help tests.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D125089
This adds an option to the memory region command
to print all regions at once. Like you can do by
starting at address 0 and repeating the command
manually.
memory region [-a] [<address-expression>]
(lldb) memory region --all
[0x0000000000000000-0x0000000000400000) ---
[0x0000000000400000-0x0000000000401000) r-x <...>/a.out PT_LOAD[0]
<...>
[0x0000fffffffdf000-0x0001000000000000) rw- [stack]
[0x0001000000000000-0xffffffffffffffff) ---
The output matches exactly what you'd get from
repeating the command. Including that it shows
unmapped areas between the mapped regions.
(this is why Process GetMemoryRegions is not
used, that skips unmapped areas)
Help text has been updated to show that you can have
an address or --all but not both.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D111791
FixAnyAddress is to be used when we don't know or don't care
whether we're fixing a code or data address.
By using FixAnyAddress over the others, you document that no
specific choice was made.
On all existing platforms apart from Arm Thumb, you could use
either FixCodeAddress or FixDataAddress and be fine. Up until
now I've chosen to use FixDataAddress but if I had
chosen to use FixCodeAddress that would have broken Arm Thumb.
Hence FixAnyAddress, to give you the "safest" option when you're
in generic code.
Uses of FixDataAddress in memory region code have been changed
to FixAnyAddress. The functionality is unchanged.
Reviewed By: omjavaid, JDevlieghere
Differential Revision: https://reviews.llvm.org/D124000
Given that you'd never find empty string, just error.
Also add a test that an invalid expr generates an error.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D123793
Currently, all data buffers are assumed to be writable. This is a
problem on macOS where it's not allowed to load unsigned binaries in
memory as writable. To be more precise, MAP_RESILIENT_CODESIGN and
MAP_RESILIENT_MEDIA need to be set for mapped (unsigned) binaries on our
platform.
Binaries are mapped through FileSystem::CreateDataBuffer which returns a
DataBufferLLVM. The latter is backed by a llvm::WritableMemoryBuffer
because every DataBuffer in LLDB is considered to be writable. In order
to use a read-only llvm::MemoryBuffer I had to split our abstraction
around it.
This patch distinguishes between a DataBuffer (read-only) and
WritableDataBuffer (read-write) and updates LLDB to use the appropriate
one.
rdar://74890607
Differential revision: https://reviews.llvm.org/D122856
Applied modernize-use-equals-default clang-tidy check over LLDB.
This check is already present in the lldb/.clang-tidy config.
Differential Revision: https://reviews.llvm.org/D121844
Applied modernize-use-default-member-init clang-tidy check over LLDB.
It appears in many files we had already switched to in class member init but
never updated the constructors to reflect that. This check is already present in
the lldb/.clang-tidy config.
Differential Revision: https://reviews.llvm.org/D121481
This way if you have a long stack, you can issue "thread backtrace --count 10"
and then subsequent <Return>-s will page you through the stack.
This took a little more effort than just adding the repeat command, since
the GetRepeatCommand API was returning a "const char *". That meant the command
had to keep the repeat string alive, which is inconvenient. The original
API returned either a nullptr, or a const char *, so I changed the private API to
return an llvm::Optional<std::string>. Most of the patch is propagating that change.
Also, there was a little thinko in fetching the repeat command. We don't
fetch repeat commands for commands that aren't being added to history, which
is in general reasonable. And we don't add repeat commands to the history -
also reasonable. But we do want the repeat command to be able to generate
the NEXT repeat command. So I adjusted the logic in HandleCommand to work
that way.
Differential Revision: https://reviews.llvm.org/D119046
This reverts commit 0df522969a7a0128052bd79182c8d58e00556e2f.
Additional checks are added to fix the detection of the last memory region
in GetMemoryRegions or repeating the "memory region" command when the
target has non-address bits.
Normally you keep reading from address 0, looking up each region's end
address until you get LLDB_INVALID_ADDR as the region end address.
(0xffffffffffffffff)
This is what the remote will return once you go beyond the last mapped region:
[0x0000fffffffdf000-0x0001000000000000) rw- [stack]
[0x0001000000000000-0xffffffffffffffff) ---
Problem is that when we "fix" the lookup address, we remove some bits
from it. On an AArch64 system we have 48 bit virtual addresses, so when
we fix the end address of the [stack] region the result is 0.
So we loop back to the start.
[0x0000fffffffdf000-0x0001000000000000) rw- [stack]
[0x0000000000000000-0x0000000000400000) ---
To fix this I added an additional check for the last range.
If the end address of the region is different once you apply
FixDataAddress, we are at the last region.
Since the end of the last region will be the last valid mappable
address, plus 1. That 1 will be removed by the ABI plugin.
The only side effect is that on systems with non-address bits, you
won't get that last catch all unmapped region from the max virtual
address up to 0xf...f.
[0x0000fffff8000000-0x0000fffffffdf000) ---
[0x0000fffffffdf000-0x0001000000000000) rw- [stack]
<ends here>
Though in some way this is more correct because that region is not
just unmapped, it's not mappable at all.
No extra testing is needed because this is already covered by
TestMemoryRegion.py, I simply forgot to run it on system that had
both top byte ignore and pointer authentication.
This change has been tested on a qemu VM with top byte ignore,
memory tagging and pointer authentication enabled.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D115508