Depends on https://github.com/llvm/llvm-project/pull/142163
This patch makes the `-ast-dump-filter` Clang option available to the
`target modules dump ast` command. This allows us to selectively dump
parts of the AST by name.
The AST can quickly grow way too large to skim on the console. This will
aid in debugging AST related issues.
Example:
```
(lldb) target modules dump ast --filter func
Dumping clang ast for 48 modules.
Dumping func:
FunctionDecl 0xc4b785008 <<invalid sloc>> <invalid sloc> func 'void (int)' extern
|-ParmVarDecl 0xc4b7853d8 <<invalid sloc>> <invalid sloc> x 'int'
`-AsmLabelAttr 0xc4b785358 <<invalid sloc>> Implicit "_Z4funcIiEvT_"
Dumping func<int>:
FunctionDecl 0xc4b7850b8 <<invalid sloc>> <invalid sloc> func<int> 'void (int)' implicit_instantiation extern
|-TemplateArgument type 'int'
| `-BuiltinType 0xc4b85b110 'int'
`-ParmVarDecl 0xc4b7853d8 <<invalid sloc>> <invalid sloc> x 'int'
```
The majority of this patch is adjust the `Dump` API. The main change in
behaviour is in `TypeSystemClang::Dump`, where we now use the
`ASTPrinter` for dumping the `TranslationUnitDecl`. This is where the
`-ast-dump-filter` functionality lives in Clang.
Inferred submodule declarations are emitted in DWARF as `DW_TAG_module`s
without `DW_AT_LLVM_include_path`s. Instead the parent DIE will have the
include path. This patch adds support for such setups. Without this, the
`ClangModulesDeclVendor` would fail to `AddModule` the submodules (i.e.,
compile and load the submodules). This would cause errors such as:
```
note: error: Header search couldn't locate module 'TopLevel'
```
The test added here also tests
https://github.com/llvm/llvm-project/pull/141220. Without that patch
we'd fail with:
```
note: error: No module map file in /Users/jonas/Git/llvm-worktrees/llvm-project/lldb/test/API/lang/cpp/decl-from-submodule
```
Unfortunately the embedded clang instance doesn't allow us to use the
decls we find in the modules. But we'll try to fix that in a separate
patch.
rdar://151022173
Add an overloaded `GetTypeSystem` to specify the expected type system subclass. Changes code from `GetTypeSystem().dyn_cast_or_null<TypeSystemClang>()` to `GetTypeSystem<TypeSystemClang>()`.
Identical PR to: https://github.com/llvm/llvm-project/pull/134563
Previous PR was approved and landed but broke the build due to bad
merge.
Manually resolve the merge conflict and try to land again.
Co-authored-by: George Hu <georgehuyubo@gmail.com>
This reverts commit 070a4ae2f9bcf6967a7147ed2972f409eaa7d3a6.
Multiple buildbot failures have been reported:
https://github.com/llvm/llvm-project/pull/134563
The build fails with:
lldb/source/Target/Statistics.cpp:75:39: error: use of undeclared
identifier 'num_symbols_loaded'
The check is not correct for discontinuous functions, as one of the
blocks could very well begin before the function entry point. To catch
dead-stripped ranges, I check whether the functions is after the first
known code address. I don't print any error in this case as that is a
common/expected situation.
This avoids many errors like:
```
error: ld-linux-x86-64.so.2 0x00085f3b: adding range [0x0000000000001ae8-0x0000000000001b07) which has a
base that is less than the function's low PC 0x000000000001cfb0. Please file a bug and attach the file at
the start of this error message
```
when debugging binaries on debian trixie because the dynamic linker
(ld-linux) contains discontinuous functions.
If the block ranges is not a subrange of the enclosing block then this
will range will currently be added to the outer block as well (i.e., we
get the same behavior that's currently possible for non-subrange blocks
larger than function_low_pc). However, this code path is buggy and I'd
like to change that (#117725).
This patch addresses the issue #129543.
After this patch DWARFExpression does not call DWARFUnit directly and does not depend on
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp and a lot of clang code.
After this patch the size of lldb-server binary (Linux Aarch64) is reduced from 47MB to 17MB.
This patch pushes the error handling boundary for the GetBitSize()
methods from Runtime into the Type and CompilerType APIs. This makes it
easier to diagnose problems thanks to more meaningful error messages
being available. GetBitSize() is often the first thing LLDB asks about a
type, so this method is particularly important for a better user
experience.
rdar://145667239
This reverts commit 6041c745f32e8fd60ed24e29e7d919d8d1c87ca6.
Relands the original patch with the test-case data fixed. Weirldy the PR CI
didn't seem to run the unit-tests? In any case, the problem was an
incorrect expectation in the test-case data. Since we have both public
and internal SDK in that test-case, we should `expect_mismatch` to be
`true`.
Reverts llvm/llvm-project#128712
```
******************** TEST 'lldb-unit :: SymbolFile/DWARF/./SymbolFileDWARFTests/10/14' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/tools/lldb/unittests/SymbolFile/DWARF/./SymbolFileDWARFTests-lldb-unit-1021-10-14.json GTEST_SHUFFLE=1 GTEST_TOTAL_SHARDS=14 GTEST_SHARD_INDEX=10 GTEST_RANDOM_SEED=62233 /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/tools/lldb/unittests/SymbolFile/DWARF/./SymbolFileDWARFTests
--
Script:
--
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/tools/lldb/unittests/SymbolFile/DWARF/./SymbolFileDWARFTests --gtest_filter=SDKPathParsingTests/SDKPathParsingMultiparamTests.TestSDKPathFromDebugInfo/6
--
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/unittests/SymbolFile/DWARF/XcodeSDKModuleTests.cpp:265: Failure
Expected equality of these values:
found_mismatch
Which is: true
expect_mismatch
Which is: false
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/unittests/SymbolFile/DWARF/XcodeSDKModuleTests.cpp:265
Expected equality of these values:
found_mismatch
Which is: true
expect_mismatch
Which is: false
```
`GetSDKRoot` uses `xcrun` to find an SDK root path for a given SDK
version string. But if the SDK doesn't exist in the Xcode installations,
but instead lives in the `CommandLineTools`, `xcrun` will fail to find
it. Negative searches for an SDK path cost a lot (a few seconds) each
time `xcrun` is invoked. We do cache negative results in
`find_cached_path` inside LLDB, but we would still pay the price on
every new debug session the first time we evaluate an expression. This
doesn't only cause a noticable delay in running the expression, but also
generates following error:
```
error: Error while searching for Xcode SDK: timed out waiting for shell command to complete
(int) $0 = 42
```
In this patch we avoid these possibly expensive calls to `xcrun` by
checking the `DW_AT_LLVM_sysroot`, and if it exists, using that as the
SDK path. We need an explicit check for the `CommandLineTools` path
before we call `RegisterXcodeSDK`, because that will try to call
`xcrun`. This won't prevent other uses of `GetSDKRoot` popping up that
cause us to make expensive `xcrun` calls, but for now this addresses the
regression in the expression evaluator. We also had to adjust the
`XcodeSDK::Merge` logic to update the sysroot. There is one case for
which this wouldn't make sense: if a CU was compiled with
`CommandLineTools` and a different one with an older internal SDK, in
that case we would update the `CommandLineTools` sysroot with a
`.Internal.sdk` prefix, which won't possibly exist for
`CommandLineTools`. I added a unit-test for this. Not sure if we want to
explicitly detect and disallow this, given it's quite a niche scenario.
rdar://113619904
rdar://113619723
The purpose of this originally was to check for DWARF which refers to
garbage-collected functions (by checking whether we're able to get a
good address out of the function). The address check has been removed in
https://reviews.llvm.org/D112310, so the code computing it is not doing
anything.
This is the behavior expected by DWARF. It also requires some fixups to
algorithms which were storing the addresses of some objects (Blocks and
Variables) relative to the beginning of the function.
There are plenty of things that still don't work in this setups, but
this change is sufficient for the expression evaluator to correctly
recognize the entry point of a function in this case.
`GetAttributes` returns all attributes on a given DIE, including any
attributes that the DIE references via `DW_AT_abstract_origin` and
`DW_AT_specification`. However, if an attribute exists on both the
referring DIE and the referenced DIE, the first one encountered will be
the one that takes precendence when querying the returned
`DWARFAttributes`. But there was no guarantee in which order those
attributes get visited. That means there's no convenient way of ensuring
that an attribute of a definition doesn't get shadowed by one found on
the declaration. One use-case where we don't want this to happen is for
`DW_AT_object_pointer` (which can exist on both definitions and
declarations, see https://github.com/llvm/llvm-project/pull/123089).
This patch makes sure we visit the current DIE's attributes before
following DIE references. I tried keeping as much of the original
`GetAttributes` unchanged and just add an outer `GetAttributes` that
keeps track of the DIEs we need to visit next.
There's precendent for this iteration order in
`llvm::DWARFDie::findRecursively` and also
`lldb_private::ElaboratingDIEIterator`. We could use the latter to
implement `GetAttributes`, though it also follows `DW_AT_signature` so I
decided to leave it for follow-up.
Anonymous namespaces are supposed to be optional when looking up types.
This was not working in combination with -gsimple-template-names,
because the way it was constructing the complete (with template args)
name scope (i.e., by generating thescope as a string and then reparsing
it) did not preserve the information about the scope kinds.
Essentially what the code wants here is to call `GetTypeLookupContext`
(that's the function used to get the context in the "regular" code
path), but to embelish each name with the template arguments (if they
don't have them already). This PR implements exactly that by adding an
argument to control which kind of names are we interested in. This
should also make the lookup faster as it avoids parsing of the long
string, but I haven't attempted to benchmark that.
I believe this function can also be used in some other places where
we're manually appending template names, but I'm leaving that for
another patch.
The problem here manifests as follows:
1. We are stopped in main.o, so the first `ParseTypeFromDWARF` on
`FooImpl<char>` gets called on `main.o`'s SymbolFile. This adds a
mapping from *declaration die* -> `TypeSP` into `main.o`'s
`GetDIEToType` map.
2. We then `CompleteType(FooImpl<char>)`. Depending on the order of
entries in the debug-map, this might call `CompleteType` on `lib.o`'s
SymbolFile. In which case, `GetDIEToType().lookup(decl_die)` will return
a `nullptr`. This is already a bit iffy because some of the surrounding
code assumes we don't call `CompleteTypeFromDWARF` with a `nullptr`
`Type*`. E.g., `CompleteEnumType` blindly dereferences it (though enums
will never encounter this because their definition is fetched in
ParseEnum, unlike for structures).
3. While in `CompleteTypeFromDWARF`, we call `ParseTypeFromDWARF` again.
This will parse the member function `FooImpl::Create` and its return
type which is a typedef to `FooImpl*`. But now we're inside `lib.o`'s
SymbolFile, so we call it on the definition DIE. In step (2) we just
inserted a `nullptr` into `GetDIEToType` for the definition DIE, so we
trivially return a `nullptr` from `ParseTypeFromDWARF`. Instead of
reporting back this parse failure to the user LLDB trucks on and marks
`FooImpl::Ref` to be `void*`.
This test-case will trigger an assert in `TypeSystemClang::VerifyDecl`
even if we just `frame var` (but only in debug-builds). In release
builds where this function is a no-op, we'll create an incorrect Clang
AST node for the `Ref` typedef.
The proposed fix here is to share the `GetDIEToType` map between
SymbolFiles if a debug-map exists.
**Alternatives considered**
* Check the `GetDIEToType` map of the `SymbolFile` that the declaration
DIE belongs to. The assumption here being that if we called
`ParseTypeFromDWARF` on a declaration, the `GetDIEToType` map that the
result was inserted into was the one on that DIE's SymbolFile. This was
the first version of this patch, but that felt like a weaker version
sharing the map. It complicates the code in `CompleteType` and is less
consistent with the other bookkeeping structures we already share
between SymbolFiles
* Return from `SymbolFileDWARF::CompleteType` if there is no type in the
current `GetDIEToType`. Then `SymbolFileDWARFDebugMap::CompleteType`
could continue to the next `SymbolFile` which does own the type. But
that didn't quite work because we remove the
`GetForwardCompilerTypeToDie` entry in `SymbolFile::CompleteType`, which
`SymbolFileDWARFDebugMap::CompleteType` relies upon for iterating
The main difference is that the llvm class (just a std::vector in
disguise) is not sorted. It turns out this isn't an issue because the
callers either:
- ignore the range list;
- convert it to a different format (which is then sorted);
- or query the minimum value (which is faster than sorting)
The last case is something I want to get rid of in a followup as a part
of removing the assumption that function's entry point is also its
lowest address.
This reverts commit 2526d5b1689389da9b194b5ec2878cfb2f4aca93, reapplying
ba14dac481564000339ba22ab867617590184f4c after fixing the conflict with
#117532. The change is that Function::GetAddressRanges now recomputes
the returned value instead of returning the member. This means it now
returns a value instead of a reference type.
This is a follow-up/reimplementation of #115730. While working on that
patch, I did not realize that the correct (discontinuous) set of ranges
is already stored in the block representing the whole function. The
catch -- ranges for this block are only set later, when parsing all of
the blocks of the function.
This patch changes that by populating the function block ranges eagerly
-- from within the Function constructor. This also necessitates a
corresponding change in all of the symbol files -- so that they stop
populating the ranges of that block. This allows us to avoid some
unnecessary work (not parsing the function DW_AT_ranges twice) and also
results in some simplification of the parsing code.
It's basically true already (except for a brief time during
construction). This patch makes sure the objects are constructed with a
valid parent and enforces this in the type system, which allows us to
get rid of some nullptr checks.
In #108907, the index classes started filtering the DIEs according to
the full type query (instead of just the base name). This means that the
checks in SymbolFileDWARF are now redundant.
I've also moved the non-redundant checks so that now all checking is
done in the DWARFIndex class and the caller can expect to get the final
filtered list of types.
This reverts commit f06c187799d910fd3ac3e9106397e5eecff9f265.
Temporary revert: there is https://github.com/llvm/llvm-project/pull/117239 that is suppose to fix the issue.
Reverting to keep things rolling.
This is the second half of
https://github.com/llvm/llvm-project/pull/90008.
Essentially, it replaces the work of resolving template types when we
just need the qualified names with walking the DIE tree using
`DWARFTypePrinter`.
### Result
For an internal target, the time spent on `expr *this` for the first
time reduced from 28 secs to 17 secs.
The class is only used from one place, which is trivial to implement
using the llvm class.
The main difference is that in the new implementation, the ranges are
parsed each time anew (instead of being parsed at startup and cached). I
believe this is fine because:
- this is already how things work with DWARF v5 debug_rnglists
- parsing debug_ranges is fairly fast (definitely faster than rnglists)
- generally, this result will be cached at a higher level anyway.
Browsing the code I did find one instance where that is not the case --
SymbolFileDWARF::ResolveFunctionAndBlock -- which is called each time we
resolve an address (to the block level). However, this function is
already pretty suboptimal: it first traverses the DIE tree (which
involves parsing all the DIE attributes) to find the correct block, then
it parses them again to construct the `lldb_private::Block`
representation, and *then* it uses the ID of the block DIE it found in
the first step to look up the `Block` object. If this turns out to be a
bottleneck, I think there are better ways to optimize it than caching
the debug_ranges parse.
The motiviation for this is that DWARFDebugRanges sorts the block
ranges, even though the order of the ranges is load-bearing (in the
absence of DW_AT_low_pc, the "base address" of a scope is determined by
the first range entry). Delaying the parsing (and sorting) step makes it
easier to access the first entry.
"statistics dump" currently report the statistics of all targets in
debugger instead of current target. This is wrong because there is a
"statistics dump --all-targets" option that supposed to include
everything.
This PR fixes the issue by only report statistics for current target
instead of all. It also includes the change to reset statistics debug
info/symbol table parsing/indexing time during debugger destroy. This is
required so that we report current statistics if we plan to reuse
lldb/lldb-dap across debug sessions
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
This is the beginning of a different, more fundamental approach to
handling. This PR tries to tries to minimize functional changes. It only
makes sure that we store the true set of ranges inside the function
object, so that subsequent patches can make use of it.
In DWARF 4 and earlier `static const` members of structs, classes and
unions have an entry tag `DW_TAG_member`, and are also tagged as
`DW_AT_declaration`, but otherwise follow the same rules as
`DW_TAG_variable`.
Swift types have mangled type names. This adds functionality to look up
those types through the FindTypes API by searching for the mangled type
name instead of the regular name.
## Summary
This PR is a continuation of
https://github.com/llvm/llvm-project/pull/108907 by using `.debug_names`
parent chain faster lookup for namespaces.
## Implementation
Similar to https://github.com/llvm/llvm-project/pull/108907. This PR
adds a new API: `GetNamespacesWithParents` in `DWARFIndex` base class.
The API performs the same function as `GetNamespaces()` with additional
filtering using parents `CompilerDeclContext`. A default implementation
is given in `DWARFIndex` class which parses debug info and performs the
matching. In the `DebugNameDWARFIndex` override, parents
`CompilerDeclContext` is cross checked with parent chain in
`.debug_names` for much faster filtering before fallback to base
implementation for final filtering.
## Performance Results
For the same benchmark used in
https://github.com/llvm/llvm-project/pull/108907, this PR improves: 48s
=> 28s
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
I've been getting complaints from users being spammed by -gmodules
missing file warnings going out of control because each object file
depends on an entire DAG of PCM files that usually are all missing at
once. To reduce this problem, this patch does two things:
1. Module now maintains a DenseMap<hash, once> that is used to display
each warning only once, based on its actual text.
2. The PCM warning itself is reworded to include less details, such as
the DIE offset, which is only useful to LLDB developers, who can get
this from the dwarf log if they need it. Because the detail is omitted
the hashing from (1) deduplicates the warnings.
rdar://138144624
## Summary
This PR improves `SymbolFileDWARF::FindTypes()` by utilizing the newly
added parent chain `DW_IDX_parent` in `.debug_names`. The proposal was
originally discussed in [this
RFC](https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151).
## Implementation
To leverage the parent chain for `SymbolFileDWARF::FindTypes()`, this PR
adds a new API: `GetTypesWithQuery` in `DWARFIndex` base class. The API
performs the same function as `GetTypes` with additional filtering using
`TypeQuery`. Since this only introduces filtering, the callback
mechanisms at all call sites remain unchanged. A default implementation
is given in `DWARFIndex` class which parses debug info and performs the
matching. In the `DebugNameDWARFIndex` override, the parent_contexts in
the `TypeQuery` is cross checked with parent chain in `.debug_names` for
for much faster filtering before fallback to base implementation for
final filtering.
Unlike the `GetFullyQualifiedType` API, which fully consumes the
`DW_IDX_parent` parent chain for exact matching, these new APIs perform
partial subset matching for type/namespace queries. This is necessary to
support queries involving anonymous or inline namespaces. For instance,
a user might request `NS1::NS2::NS3::Foo`, while the index table's
parent chain might contain `NS1::inline_NS2::NS3::Foo`, which would fail
exact matching.
## Performance Results
In one of our internal target using `.debug_names` + split dwarf.
Expanding a "this" pointer in locals view in VSCode:
94s => 48s. (Not sure why I got 94s this time instead of 70s last week).
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
This can legitimately happen for static function-local variables with a
non-manual dwarf index. According to the DWARF spec, these variables
should be (and are) included in the compiler generated indexes, but they
are ignored by our manual index. Encountering them does not indicate any
sort of error.
The error message is particularly annoying due to a combination of the
fact that we don't cache negative hits (so we print it every time we
encounter it), VSCode's aggresive tab completion (which attempts a
lookup after every keypress) and the fact that some low-level libraries
(e.g. tcmalloc) have several local variables called "v". The result is a
console full of error messages everytime you type "v ".
We already have tests (e.g. find-basic-variable.cpp), which check that
these variables are not included in the result (and by extension, that
their presence does not crash lldb). It would be possible to extend it
to make sure it does *not* print this error message, but it doesn't seem
like it would be particularly useful.
My build of LLDB is all the time loading targets with a version of
libc++ that was built with gcc that uses the DW_FORM 0x1e that is not
implemented by LLVM, and I doubt it'll ever implement it. It's used for
some 128 bit encoding of numbers, which is just very weird. Because of
this, LLDB is showing some warnings all the time for my users, so I'm
adding a flag to control the enablement of this warning.
This patch removes all of the Set.* methods from Status.
This cleanup is part of a series of patches that make it harder use the
anti-pattern of keeping a long-lives Status object around and updating
it while dropping any errors it contains on the floor.
This patch is largely NFC, the more interesting next steps this enables
is to:
1. remove Status.Clear()
2. assert that Status::operator=() never overwrites an error
3. remove Status::operator=()
Note that step (2) will bring 90% of the benefits for users, and step
(3) will dramatically clean up the error handling code in various
places. In the end my goal is to convert all APIs that are of the form
` ResultTy DoFoo(Status& error)
`
to
` llvm::Expected<ResultTy> DoFoo()
`
How to read this patch?
The interesting changes are in Status.h and Status.cpp, all other
changes are mostly
` perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git
grep -l SetErrorString lldb/source)
`
plus the occasional manual cleanup.
@walter-erquinigo found the the [PR with testing and a fix for
DebugInfoD](https://github.com/llvm/llvm-project/pull/98344) caused an
issue when working with stripped binaries.
The issue is that when you're working with split-dwarf, there are *3*
possible files: The stripped binary the user is debugging, the
"only-keep-debug" *or* unstripped binary, plus the `.dwp` file. The
debuginfod plugin should provide the unstripped/OKD binary. However, if
the debuginfod plugin fails, the default symbol locator plugin will just
return the stripped binary, which doesn't help. So, to address that, the
SymbolVendorELF code checks to see if the SymbolLocator's
ExecutableObjectFile request returned the same file, and bails if that's
the case. You can see the specific diff as the second commit in the PR.
I'm investigating adding a test: I can't quite get a simple repro, and
I'm unwilling to make any additional changes to Makefile.rules to this
diff, for Pavlovian reasons.
This reverts commit 2fa1220a37a3f55b76a29803d8333b3a3937d53a.
This reverts commit b9496a74eb4029629ca2e440c5441614e766f773.
The patch #98344 causes a crash in LLDB when parsing some files like `numpy.libs/libgfortran-daac5196.so.5.0.0` on graviton (you can download it in https://drive.google.com/file/d/12ygLjJwWpzdYsrzBPp1JGiFHxcgM0-XY/view?usp=drive_link if you want to troubleshoot yourself).
The assert that is hit is the following:
```
llvm-project/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp:2452: std::pair<unsigned int, std::map<long unsigned int, lldb_private::AddressClass> > ObjectFileELF::ParseSymbolTable(lldb_private::Symtab*, lldb::user_id_t, lldb_private::Section*): Assertion `strtab->GetObjectFile() == this' failed.
[383588:383636:20240716,025305.572639:ERROR crashpad_client_linux.cc:780] Crashpad isn't enabled
```
This object file doesn't have apparently a strings table but LLDB still tries to process it due to the code that is being reverted.