Resolves#141955
- Adds data to breakpoints `Source` object, in order for assembly
breakpoints, which rely on a temporary `sourceReference` value, to be
able to resolve in future sessions like normal path+line breakpoints
- Adds optional `instructions_offset` parameter to `BreakpointResolver`
This commit handles the following types:
- clang::ExternalASTSource
- clang::TargetInfo
- clang::ASTContext
- clang::SourceManager
- clang::FileManager
Part of cleanup #151026
This is a continuation of 68fd102, which did the same thing but only for
StopInfo. Using make_shared is both safer and more efficient:
- With make_shared, the object and the control block are allocated
together, which is more efficient.
- With make_shared, the enable_shared_from_this base class is properly
linked to the control block before the constructor finishes, so
shared_from_this() will be safe to use (though still not recommended
during construction).
We defend against a direct cycle where a base class ValueObject is its
own parent, but not against a longer base cycle. This cycle requires
that some value's Type includes a base class, and that base class is in
a class hierarchy that cycles back to the original base class.
I wrote a test case that creates a cycle in the class hierarchy by
dynamically overwriting the superclass of an object, but I can't
reproduce the crash. I can't think of any other way to make a real
object that behaves that way. Maybe is a type system problem in making
up the type for whatever type we're trying to ingest here.
While unsatisfying, without a reproducer this is the best we can do for
now.
rdar://140293233
Some inter-plugin dependencies are okay, others are not. Yet others not,
but we're sort of stuck with them. The idea is to be able to prevent
backsliding while making sure that acceptable dependencies are..
accepted. For context, see
https://github.com/llvm/llvm-project/pull/139170 and the attached
changes to the documentation.
If we're not touching them, we don't need to do anything special to pass
them along -- with one important caveat: due to how cmake arguments
work, the implicitly passed arguments need to be specified before
arguments that we handle.
This isn't particularly nice, but the alternative is enumerating all
arguments that can be used by llvm_add_library and the macros it calls
(it also relies on implicit passing of some arguments to
llvm_process_sources).
The problem was in calling GetLoadAddress on a value in the error state,
where `ValueObject::GetLoadAddress` could end up accessing the
uninitialized "address type" by-ref return value from `GetAddressOf`.
This probably happened because each function expected the other to
initialize it.
We can guarantee initialization by turning this into a proper return
value.
I've added a test, but it only (reliably) crashes if lldb is built with
ubsan.
We're reading from the object's vtable to determine the pointer to the
full object. The vtable is normally in the "rodata" section of the
executable, which is often not included in the core file because it's
not supposed to change and the debugger can extrapolate its contents
from the executable file. We weren't doing that.
This patch changes the read operation to use the target class (which
falls back onto the executable module as expected) and adds the missing
ReadSignedIntegerFromMemory API. The fix is tested by creating a core
(minidump) file which deliberately omits the vtable pointer.
Fixes parsing of an ObjC type encoding such as `{?="a""b"}`. Parsing of such a type
encoding would lead to an assert. This was observed when running `language objc
class-table dump`.
The function `ReadQuotedString` consumes the closing quote, however one of its two
callers (`ReadStructElement`) was also consuming a quote. For the above type encoding,
where two quoted strings occur back to back, the parser would unintentionally consume
the opening quote of the second quoted string - leaving the remaining text with an
unbalanced quote.
This changes fixes `ReadStructElement` to not consume a quote after calling
`ReadQuotedString`.
For callers to know whether a string was successfully parsed, `ReadQuotedString` now
returns an optional string.
Change `objc tagged-pointer info` to call
`OptionArgParser::ToRawAddress`.
Previously `ToAddress` was used, but it calls `FixCodeAddress`, which
can erroneously mutate the bits of a tagged pointer.
This patch pushes the error handling boundary for the GetBitSize()
methods from Runtime into the Type and CompilerType APIs. This makes it
easier to diagnose problems thanks to more meaningful error messages
being available. GetBitSize() is often the first thing LLDB asks about a
type, so this method is particularly important for a better user
experience.
rdar://145667239
This PR fixes LLDB stepping out, rather than stepping through a C++
thunk. The implementation is based on, and upstreams, the support for
runtime thunks in the Swift fork.
Fixes#43413
AppleObjCRuntimeV2 prints a warning when we read libobjc.A.dylib from
memory, as a canary to detect that we are reading system binaries out of
memory (which is slow, and we try hard to avoid). But with a corefile,
reading out of "memory" is fine, and may be the only way we can find the
correct binary.
rdar://144322688
Remove Debugger::GetOutputStream and Debugger::GetErrorStream in
preparation for replacing both with a new variant that needs to be
locked and hence can't be handed out like we do right now.
The patch replaces most uses with GetAsyncOutputStream and
GetAsyncErrorStream respectively. There methods return new StreamSP
objects that automatically get flushed on destruction.
See #126630 for more details.
…uffer
ValueObjectDynamicValue::UpdateValue() assumes that the dynamic type
found by GetDynamicTypeAndAddress() would return an address in the
inferior. This commit makes it so it can deal with being passed a host
address instead.
This is needed downstream by the Swift fork.
rdar://143357274
Many uses of SC::GetAddressRange were not interested in the range, but
in the address of the function/symbol contained inside the symbol
context. They were getting that by calling the GetBaseAddress on the
returned range, which worked well enough so far, but isn't compatible
with discontinuous functions, whose address (entry point) may not be the
lowest address in the range.
To resolve this problem, this PR creates a new function whose purpose is
return the address of the function or symbol inside the symbol context.
It also changes all of the callers of GetAddressRange which do not
actually care about the range to call this function instead.
Close https://github.com/llvm/llvm-project/issues/90154
This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.
Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.
Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.
In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as
the key. And refactored `DeclContext::lookup()` method to take the
module information. So that a lookup in a DeclContext won't load
declarations that are local to **other** modules.
And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.
BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.
This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.
---
On the API side, this patch unfortunately add a maybe-confusing argument
`Module *NamedModule` to
`ExternalASTSource::FindExternalVisibleDeclsByName()`. People may think
we can get the information from the first argument `const DeclContext
*DC`. But sadly there are declarations (e.g., namespace) can appear in
multiple different modules as a single declaration. So we have to add
additional information to indicate this.
Lots of code around LLDB was directly accessing the target's section
load list. This NFC patch makes the section load list private so the
Target class can access it, but everyone else now uses accessor
functions. This allows us to control the resolving of addresses and will
allow for functionality in LLDB which can lazily resolve addresses in
JIT plug-ins with a future patch.
This is motivated by exposing some Swift language-specific flags through
the API, in the example here it is used to communicate the Objective-C
runtime version. This could also be a meaningful extension point to get
information about "embedded: languages, such as extracting the C++
version in an Objective-C++ frame or something along those lines.
The `relative_list_list_entry_t` offset field in the Objective-C runtime
is of type `int64_t`. There are cases where these offsets are negative
values. For negative offsets, LLDB would currently incorrectly
zero-extend the offset (dropping the fact that the offset was negative),
instead producing large offsets that, when added to the
`m_baseMethods_ptr` result in addresses that had their upper bits set
(e.g., `0x00017ff81b3241b0`). We then would try to `GetMethodList` from
such an address but fail to read it (because it's an invalid address).
This would manifest in Objective-C decls not getting completed correctly
(and formatters not working). We noticed this in CI failures on our
Intel bots. This happened to work fine on arm64 because we strip the
upper bits when calling `ClassDescriptorV2::method_list_t::Read` using
the `FixCodeAddress` ABI plugin API (which doesn't do that on Intel).
The fix is to sign-extend the offset calculation.
Example failure before this patch:
```
======================================================================
FAIL: test_break_dwarf (TestRuntimeTypes.RuntimeTypesTestCase)
Test setting objc breakpoints using '_regexp-break' and 'breakpoint set'.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/michaelbuch/Git/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1769, in test_method
return attrvalue(self)
File "/Users/michaelbuch/Git/llvm-project/lldb/test/API/lang/objc/foundation/TestRuntimeTypes.py", line 48, in test_break
self.expect(
File "/Users/michaelbuch/Git/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2370, in expect
self.runCmd(
File "/Users/michaelbuch/Git/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1000, in runCmd
self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Got a valid type
Error output:
error: <user expression 1>:1:11: no known method '+stringWithCString:encoding:'; cast the message send to the method's return type
1 | [NSString stringWithCString:"foo" encoding:1]
| ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config=x86_64-/Users/michaelbuch/Git/lldb-build-main-no-modules/bin/clang
---------------------------------------------------------------------- ```
ValueObject is part of lldbCore for historical reasons, but conceptually
it deserves to be its own library. This does introduce a (link-time) circular
dependency between lldbCore and lldbValueObject, which is unfortunate
but probably unavoidable because so many things in LLDB rely on
ValueObject. We already have cycles and these libraries are never built
as dylibs so while this doesn't improve the situation, it also doesn't
make things worse.
The header includes were updated with the following command:
```
find . -type f -exec sed -i.bak "s%include \"lldb/Core/ValueObject%include \"lldb/ValueObject/ValueObject%" '{}' \;
```
This commit changes the libc++ frame recognizer to hide implementation
details of libc++ more aggressively. The applied heuristic is rather
straightforward: We consider every function name starting with `__` as
an implementation detail.
This works pretty neatly for `std::invoke`, `std::function`,
`std::sort`, `std::map::emplace` and many others. Also, this should
align quite nicely with libc++'s general coding convention of using the
`__` for their implementation details, thereby keeping the future
maintenance effort low.
However, this heuristic by itself does not work in 100% of the cases:
E.g., `std::ranges::sort` is not a function, but an object with an
overloaded `operator()`, which means that there is no actual call
`std::ranges::sort` in the call stack. Instead, there is a
`std::ranges::__sort::operator()` call. To make sure that we don't hide
this stack frame, we never hide the frame which represents the entry
point from user code into libc++ code
LLVM now triggers an assertion when the format string and arguments
don't match. Fix a variety of incorrect format strings I discovered when
enabling logging with a debug build.
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess
indirection.
This patch extends TypeQuery matching to support anonymous namespaces. A
new flag is added to control the behavior. In the "strict" mode, the
query must match the type exactly -- all anonymous namespaces included.
The dynamic type resolver in the itanium abi (the motivating use case
for this) uses this flag, as it queries using the name from the
demangles, which includes anonymous namespaces.
This ensures we don't confuse a type with a same-named type in an
anonymous namespace. However, this does *not* ensure we don't confuse
two types in anonymous namespacs (in different CUs). To resolve this, we
would need to use a completely different lookup algorithm, which
probably also requires a DWARF extension.
In the "lax" mode (the default), the anonymous namespaces in the query
are optional, and this allows one search for the type using the usual
language rules (`::A` matches `::(anonymous namespace)::A`).
This patch also changes the type context computation algorithm in
DWARFDIE, so that it includes anonymous namespace information. This
causes a slight change in behavior: the algorithm previously stopped
computing the context after encountering an anonymous namespace, which
caused the outer namespaces to be ignored. This meant that a type like
`NS::(anonymous namespace)::A` would be (incorrectly) recognized as
`::A`). This can cause code depending on the old behavior to misbehave.
The fix is to specify all the enclosing namespaces in the query, or use
a non-exact match.
This patch removes all of the Set.* methods from Status.
This cleanup is part of a series of patches that make it harder use the
anti-pattern of keeping a long-lives Status object around and updating
it while dropping any errors it contains on the floor.
This patch is largely NFC, the more interesting next steps this enables
is to:
1. remove Status.Clear()
2. assert that Status::operator=() never overwrites an error
3. remove Status::operator=()
Note that step (2) will bring 90% of the benefits for users, and step
(3) will dramatically clean up the error handling code in various
places. In the end my goal is to convert all APIs that are of the form
` ResultTy DoFoo(Status& error)
`
to
` llvm::Expected<ResultTy> DoFoo()
`
How to read this patch?
The interesting changes are in Status.h and Status.cpp, all other
changes are mostly
` perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git
grep -l SetErrorString lldb/source)
`
plus the occasional manual cleanup.
With this commit, we also hide the implementation details of
`std::invoke`. To do so, the `LibCXXFrameRecognizer` got a couple more
regular expressions.
The regular expression passed into `AddRecognizer` became problematic,
as it was evaluated on the demangled name. Those names also included
result types for C++ symbols. For `std::__invoke` the return type is a
huge `decltype(...)`, making the regular expresison really hard to
write.
Instead, I added support to `AddRecognizer` for matching on the
demangled names without result type and argument types.
By hiding the implementation details of `invoke`, also the back traces
for `std::function` become even nicer, because `std::function` is using
`__invoke` internally.
Co-authored-by: Adrian Prantl <aprantl@apple.com>
Compilers and language runtimes often use helper functions that are
fundamentally uninteresting when debugging anything but the
compiler/runtime itself. This patch introduces a user-extensible
mechanism that allows for these frames to be hidden from backtraces and
automatically skipped over when navigating the stack with `up` and
`down`.
This does not affect the numbering of frames, so `f <N>` will still
provide access to the hidden frames. The `bt` output will also print a
hint that frames have been hidden.
My primary motivation for this feature is to hide thunks in the Swift
programming language, but I'm including an example recognizer for
`std::function::operator()` that I wished for myself many times while
debugging LLDB.
rdar://126629381
Example output. (Yes, my proof-of-concept recognizer could hide even
more frames if we had a method that returned the function name without
the return type or I used something that isn't based off regex, but it's
really only meant as an example).
before:
```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb)
```
after
```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```
This patch changes the return type of `GetMetadata` from a
`ClangASTMetadata*` to a `std::optional<ClangASTMetadata>`. Except for
one call-site (`SetDeclIsForcefullyCompleted`), we never actually make
use of the mutability of the returned metadata. And we never make use of
the pointer-identity. By passing `ClangASTMetadata` by-value (the type
is fairly small, size of 2 64-bit pointers) we'll avoid some questions
surrounding the lifetimes/ownership/mutability of this metadata.
For consistency, we also change the parameter to `SetMetadata` from
`ClangASTMetadata&` to `ClangASTMetadata` (which is an NFC since we copy
the data anyway).
This came up during some changes we plan to make where we [create
redeclaration chains for decls in the LLDB
AST](https://github.com/llvm/llvm-project/pull/95100). We want to avoid
having to dig out the canonical decl of the declaration chain for
retrieving/setting the metadata. It should just be copied across all
decls in the chain. This is easier to guarantee when everything is done
by-value.
Make LanguageRuntime::GetTypeBitSize return an optional. This should be
NFC, though the ObjCLanguageRuntime implementation is (possibly) more
defensive against returning 0.
I'm not sure if it's possible for both `m_ivar.size` and `m_ivar.offset`
to be zero. Previously, we'd return 0 and cache it, only to discard it
the next time when finding it in the cache, and recomputing it again.
The new code will avoid putting it in the cache in the first place.
AppleObjCTypeEncodingParser::BuildObjCObjectPointerType currently
contains an lldbassert to detect situations where we have a forward
declaration without a definition. According to the accompanying comment,
its purpose is to catch "weird cases" during test suite runs.
However, because this is an lldbassert, we show a scary message to our
users who think this is a problem and report the issue to us.
Unfortunately those reports aren't very actionable without a way to know
the name of the type.
This patch changes the lldbassert to a regular assert and emits a log
message to the types log when this happens.
rdar://127439898
This is fixing a report from ubsan which I don't think is super high
value, but our testsuite hits it on
TestDataFormatterObjCNSContainer.py so I'd like to work around it. We
are getting
```
runtime error: left shift of negative value -8827055269646171913
3159 int64_t data_payload_signed =
3160 ((int64_t)((int64_t)unobfuscated
-> 3161 << m_objc_debug_taggedpointer_ext_payload_lshift) >>
3162 m_objc_debug_taggedpointer_ext_payload_rshift);
```
At this point `unobfuscated` is 0x85800000000000f7 and
`m_objc_debug_taggedpointer_ext_payload_lshift` is 9, so
`(int64_t)0x85800000000000f7<<9` shifts off the "sign" bit and then some
zeroes etc, and that's how we get this error.
We're only trying to extract some bits in the middle of the doubleword,
so the fact that we're "losing" the sign is not a bug. Change the inner
cast to (uint64_t).