When creating LLDB types from `LF_MODIFIER` records, the type name of
the modified type was used. This didn't include the modifiers
(`const`/`volatile`/`__unaligned`). With this PR, they're included.
The DIA plugin had a test for this. That test also assumed that function
types had a name. I removed that check here, because function/procedure
types themselves in PDB don't have a name:
```
0x1015 | LF_ARGLIST [size = 20, hash = 0xBCB6]
0x0074 (int): `int`
0x1013: `int* __restrict`
0x1014: `int& __restrict`
0x1016 | LF_PROCEDURE [size = 16, hash = 0x3F611]
return type = 0x0003 (void), # args = 3, param list = 0x1015
calling conv = cdecl, options = None
```
I assume DIA gets the name from the function symbol itself. In the
native plugin, that name isn't included and multiple functions with the
same signature will reuse one type, whereas DIA would create a new type
for each function. The
[Shell/SymbolFile/PDB/func-symbols.test](b29c7ded31/lldb/test/Shell/SymbolFile/PDB/func-symbols.test)
also relies on this.
If LLDB is built without the DIA SDK enabled, then the native plugin is
used regardless of `plugin.symbol-file.pdb.reader` or
`LLDB_USE_NATIVE_PDB_READER`. This made the test fail on Windows when
the DIA SDK was disabled
(https://github.com/llvm/llvm-project/issues/114906#issuecomment-3241796062).
This PR changes the requirement for the test from `target-windows` to
`diasdk` (only used in this test).
After parsing blocks in a function, the blocks should be marked as
parsed for them to be dumped (see
[Function::Dump](e6aefbec78/lldb/source/Symbol/Function.cpp (L446-L447))).
As explained in
https://github.com/llvm/llvm-project/issues/114906#issuecomment-3255016266,
this happens (accidentally?) in the DIA plugin when parsing variables,
because it calls `function.GetBlock(can_create=true)` which marks blocks
as parsed. In the native plugin, this was never called, so blocks and
variables were never included in the `lldb-test symbols` output.
The `variables.test` for the DIA plugin tests this. One difference
between the plugins is how they specify the location of local variables.
This causes the output of the native plugin to be two lines per
variable, whereas the DIA plugin has one line:
```
(native):
000002C4B7593020: Variable{0x1c800001}, name = "var_arg1", type = {0000000000000744} 0x000002C4B6CA7900 (int), scope = parameter, location = 0x00000000:
[0x000000014000102c, 0x000000014000103e): DW_OP_breg7 RSP+8
```
```
(DIA):
000002778C827EE0: Variable{0x0000001b}, name = "var_arg1", type = {0000000000000005} 0x000002778C1FBAB0 (int), scope = parameter, decl = VariablesTest.cpp:32, location = DW_OP_breg7 RSP+8
```
In the test, I filtered lines starting with spaces followed by `[0x`, so
we can still use `CHECK-NEXT`.
---
Another difference between the plugins is that DIA marks the `this`
pointer as artificial (equivalent to DWARF). This is done if a
variable's object kind is `ObjectPtr`
([source](ab898f32c6/lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp (L1050))).
As far as I know, there isn't anything in the debug info that says "this
variable is the `this` pointer" other than the name/type of a variable
and the type of the function.
This test was brokem by migrating to the lit internal shell due to a
lack of env prefix for setting environment variables. This was fixed in
prevented the breakpoint in the test from mapping to anything, causing
the test to file. This patch restores the original line numbering.
These tests were failing on darwin, because the internal shell needs
environment var definitions to start with 'env'. This PR (hopefully)
fixes that problem.
To find global variables, `SymbolFileNativePDB` used to search the
globals stream for the name passed to `FindGlobalVariables`. However,
the symbols in the globals stream contain the fully qualified name and
`FindGlobalVariables` only gets the basename. The approach here is
similar to the one for types and functions.
As we already search the globals stream for functions, we can cache the
basenames for global variables there as well.
This makes the `expressions.test` from the DIA PDB plugin pass with the
native one (#114906).
This patch updates the lld lit test config to use the internal shell by
default. This has some performance advantages (~10-15%) and also
produces nicer failure output. It also updates the two LLDB tests to not
require shell (so that they run under the internal shell), after first
verifying that they run and pass using the internal shell; and it fixes
one test that was not passing under the internal shell.
This upstreams https://github.com/swiftlang/llvm-project/pull/10313.
If we detect a situation where a forward declaration is C++ and the
definition DIE is Objective-C, then just don't try to complete the type
(it would crash otherwise). In the long term we might want to add
support for completing such types.
We've seen real world crashes when debugging WebKit and wxWidgets
because of this. Both projects forward declare ObjC++ decls in the way
shown in the test.
rdar://145959981
This uses split DWARF and from looking at other tests, it should not be
running on Darwin or Windows.
It does pass using the DIA PDB plugin but I think this is misleading
because it's not actually testing the intended feature.
When the native PDB plugin is used it fails because it cannot set a
breakpoint. I don't see a point to running this test on Windows at all.
Native PDB plugin test failures are being tracked in #114906.
Tag types like stucts or enums didn't have a declaration attached to
them. The source locations are present in the IPI stream in
`LF_UDT_MOD_SRC_LINE` records:
```
0x101F | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1C63]
udt = 0x1058, mod = 3, file = 1, line = 0
0x2789 | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1E5A]
udt = 0x1253, mod = 35, file = 93, line = 17069
```
The file is an ID in the string table `/names`:
```
ID | String
1 | '\<unknown>'
12 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\wingdi.h'
93 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\winnt.h'
```
Here, we're not interested in `mod`. This would indicate which module
contributed the UDT.
I was looking at Rustc's PDB and found that it uses `<unknown>` for some
types, so I added a check for that.
This makes two DIA PDB shell tests to work with the native PDB plugin.
---------
Co-authored-by: Michael Buch <michaelbuch12@gmail.com>
Relands #152295.
Checking for the overloads did not account for them being out of order.
For example, [the failed
output](https://github.com/llvm/llvm-project/pull/152295#issuecomment-3177563247)
contained the overloads, but out of order. The last commit here fixes
that by using `-DAG`.
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
This adds the ability for functions to be matched by their basename.
Before, the globals were searched for the name. This works if the full
name is available but fails for basenames.
PDB only includes the full names of functions, so we need to cache all
basenames. This is (again) very similar to
[SymbolFilePDB](b242150b07/lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp (L1291-L1383)).
There are two main differences:
- We can't just access the parent of a function to determine that it's a
member function - we need to check the type of the function, and its
"this type".
- SymbolFilePDB saved the full method name in the map. However, when
searching for methods, only the basename is passed, so the function
never found any methods.
Fixes#51933.
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
Some DIA PDB tests pass with the native plugin already, but didn't test
this. This adds test runs with the native plugin - no functional
changes.
In addition to the x86 calling convention test, there's also
9f102a9004/lldb/test/Shell/SymbolFile/PDB/calling-conventions-arm.test,
but I can't test this.
Fixes a bug that surfaces in frame recognizers.
Details about the bug:
A new frame recognizer is configured to match a specific symbol
(`swift_willThrow`). This is an `extern "C"` symbol defined in a C++
source file. When Swift is built with debug info, the function
`ParseFunctionFromDWARF` will use the debug info to construct a function
name that looks like a C++ declaration (`::swift_willThrow(void *,
SwiftError**)`). The `Mangled` instance will have this string as its
`m_demangled` field, and have _no_ string for its `m_mangled` field.
The result is the frame recognizer would not match the symbol to the
name (`swift_willThrow` != `::swift_willThrow(void *, SwiftError**)`.
By changing `ParseFunctionFromDWARF` to assign both a demangled name and
a mangled, frame recognizers can successfully match symbols in this
configuration.
Languages other than C/C++ don't necessarily emit mangled names in the
`UniqueName` field of type records. Rust specifically emits a unique ID
that doesn't contain the name.
For example, `(i32, i32)` is emitted as
```llvm
!266 = !DICompositeType(
tag: DW_TAG_structure_type, name: "tuple$<i32,i32>", file: !9, size: 64, align: 32,
elements: !267, templateParams: !17, identifier: "19122721b0632fe96c0dd37477674472"
)
```
which results in
```
0x1091 | LF_STRUCTURE [size = 72, hash = 0x1AC67] `tuple$<i32,i32>`
unique name: `19122721b0632fe96c0dd37477674472`
vtable: <no type>, base list: <no type>, field list: 0x1090
options: has unique name, sizeof 8
```
In C++ with Clang and MSVC, a structure similar to this would result in
```
0x136F | LF_STRUCTURE [size = 44, hash = 0x30BE2] `MyTuple`
unique name: `.?AUMyTuple@@`
vtable: <no type>, base list: <no type>, field list: 0x136E
options: has unique name, sizeof 8
```
With this PR, if a `UniqueName` is encountered that couldn't be parsed,
it will fall back to using the undecorated (→ do the same as if the
unique name is empty/unavailable).
I'm not sure how to test this. Maybe compiling the LLVM IR that rustc
emits?
Fixes#152051.
Initially suggested in
https://github.com/llvm/llvm-project/pull/149305#issuecomment-3113413702
- this PR adds the setting `plugin.symbol-file.pdb.use-native-reader`.
It doesn't remove support for `LLDB_USE_NATIVE_PDB_READER` to allow some
backwards compatibility. This was the suggested way to use the native
reader - changing that would mean users who set this, now use the DIA
reader. The setting has priority over the environment variable, though.
If the default gets flipped on Windows, the environment variable could
probably be removed as well.
This would make it possible to test both native PDB and DIA PDB in the
API tests (see linked PR).
Previously, `type lookup` for types in namespaces didn't work with the
native PDB plugin, because `FindTypes` would only look for types whose
base name was equal to their full name. PDB/CodeView does not store the
base names in the TPI stream, but the types have their full name (e.g.
`std::thread` instead of `thread`). So `findRecordsByName` would only
return types in the top level namespace.
This PR changes the lookup to go through all types and check their base
name. As that could be a bit expensive, the names are first cached
(similar to the function lookup in the DIA PDB plugin). Potential types
are checked with `TypeQuery::ContextMatches`.
To be able to handle anonymous namespaces, I changed
`TypeQuery::ContextMatches`. The [`TypeQuery`
constructor](9ad7edef42/lldb/source/Symbol/Type.cpp (L76-L79))
inserts all name components as `CompilerContextKind::AnyDeclContext`. To
skip over anonymous namespaces, `ContextMatches` checked if a component
was empty and exactly of kind `Namespace`. For our query, the last check
was always false, so we never skipped anonymous namespaces. DWARF
doesn't have this problem, as it [constructs the context
outside](abe93d9d7e/lldb/source/Plugins/SymbolFile/DWARF/DWARFIndex.cpp (L154-L160))
and has proper information about namespaces. I'm not fully sure if my
change is correct and that it doesn't break other users of `TypeQuery`.
This enables `type lookup <type>` to work on types in namespaces.
However, expressions don't work with this yet, because `FindNamespace`
is unimplemented for native PDB.
%T has been deprecated for about seven years, mostly because it is not
unique to each test which can lead to races. This patch updates the few
remaining tests in lldb that use %T to not use it (either directly using
files or creating their own temp dir). The eventual goal is to remove
support for %T from llvm-lit given few tests use it and it still has
racey behavior.
This patch errors on the side of creating new temp dirs even when not
strictly necessary to avoid needing to update filenames inside filecheck
matchers.
https://github.com/llvm/llvm-project/pull/149282 changed
the max children depth and that caused one part of the
output to become `{...}`.
The original PR set a higher limit for a different test,
so I'm doing the same here.
Deeply nested structs can be noisy, so Apple's LLDB fork sets the
default to `4`:
9c93adbb28/lldb/source/Target/TargetProperties.td (L134-L136)
Thought it would be useful to upstream this. Though happy to pick a
different default or keep it as-is.
* Changes the default synthetic symbol names to contain their file
address
This is a new PR after the first PR (#137512) was reverted because it
didn't update the way unnamed symbols were searched in the symbol table,
which relied on the index being in the name.
This time also added extra test to make sure the symbol is found as
expected
similar to #140570
getting this error:
exit status 1
ld.lld: error: section '.text' address (0x8074) is smaller than image
base (0x10000); specify --image-base
The check is not correct for discontinuous functions, as one of the
blocks could very well begin before the function entry point. To catch
dead-stripped ranges, I check whether the functions is after the first
known code address. I don't print any error in this case as that is a
common/expected situation.
This avoids many errors like:
```
error: ld-linux-x86-64.so.2 0x00085f3b: adding range [0x0000000000001ae8-0x0000000000001b07) which has a
base that is less than the function's low PC 0x000000000001cfb0. Please file a bug and attach the file at
the start of this error message
```
when debugging binaries on debian trixie because the dynamic linker
(ld-linux) contains discontinuous functions.
If the block ranges is not a subrange of the enclosing block then this
will range will currently be added to the outer block as well (i.e., we
get the same behavior that's currently possible for non-subrange blocks
larger than function_low_pc). However, this code path is buggy and I'd
like to change that (#117725).
When searching for the end of prologue, I'm only iterating through the
address range (~basic block) which contains the function entry point.
The reason for that is that even if some other range somehow contained
the end-of-prologue marker, the fact that it's in a different range
would imply it's reachable through some form of control flow, and that's
usually not a good place to set an function entry breakpoint.
This patch pushes the error handling boundary for the GetBitSize()
methods from Runtime into the Type and CompilerType APIs. This makes it
easier to diagnose problems thanks to more meaningful error messages
being available. GetBitSize() is often the first thing LLDB asks about a
type, so this method is particularly important for a better user
experience.
rdar://145667239
The llvm versions of these functions do that, so we must to so as well.
Practically this meant that were were unable to correctly un-simplify
the names of some types when using type units, which resulted in type
lookup errors.
PR #86603 broke unwinding in for unwind info added via "target symbols
add". #86770 attempted to fix this, but the fix was only partial -- it
accepted new sources of unwind information, but didn't take into account
that the symbol file can alter what lldb percieves as function
boundaries.
A stripped file will not contain information about private
(non-exported) symbols, which will make the public symbols appear very
large. If lldb tries to unwind from such a function before symbols are
added, then the cached unwind plan will prevent new (correct) unwind
plans from being created.
target-symbols-add-unwind.test might have caught this, were it not for
the fact that the "image show-unwind" command does *not* use cached
unwind information (it recomputes it from scratch).
The changes in this patch come in three pieces:
- Clear cached unwind plans when adding symbols. Since the symbol
boundaries can change, we cannot trust anything we've computed
previously.
- Add a flag to "image show-unwind" to display the cached unwind
information (mainly for the use in the test, but I think it's also
generally useful).
- Rewrite the test to better and more reliably simulate the real-world
scenario: I've swapped the running process for a core (minidump) file so
it can run anywhere; used the caching version of the show-unwind
command; and swapped C for assembly to better control the placement of
symbols
The original code resulted in a misfire in the symtab vs. debug info
deduplication code, which caused us to return the same function twice
when searching via a regex (for functions whose entry point is also not
the lowest address).
This is XFAILed for now until we find a good way to locate the
DW_AT_object_pointer of function declarations (a possible solution being
https://github.com/llvm/llvm-project/pull/124790).
Made it a shell test because I couldn't find any SBAPIs that i could
query to find the CV-qualifiers/etc. of member functions.
This is the behavior expected by DWARF. It also requires some fixups to
algorithms which were storing the addresses of some objects (Blocks and
Variables) relative to the beginning of the function.
There are plenty of things that still don't work in this setups, but
this change is sufficient for the expression evaluator to correctly
recognize the entry point of a function in this case.
In Objective-C, forward declarations are currently represented as:
```
DW_TAG_structure_type
DW_AT_name ("Foo")
DW_AT_declaration (true)
DW_AT_APPLE_runtime_class (DW_LANG_ObjC)
```
However, when compiling with `-gmodules`, when a class definition is
turned into a forward declaration within a `DW_TAG_module`, the DIE for
the forward declaration looks as follows:
```
DW_TAG_structure_type
DW_AT_name ("Foo")
DW_AT_declaration (true)
```
Note the absence of `DW_AT_APPLE_runtime_class`. With recent changes in
LLDB, not being able to differentiate between C++ and Objective-C
forward declarations has become problematic (see attached test-case and
explanation in https://github.com/llvm/llvm-project/pull/119860).
The ManualDWARFIndex class can create a index cache if the LLDB index
cache is enabled. This used to save the index to the same file,
regardless of wether the cache was a full index (no .debug_names) or a
partial index (have .debug_names, but not all .o files were had
.debug_names). So we could end up saving an index cache that was
partial, and then later load that partial index as if it were a full
index if the user set the 'settings set
plugin.symbol-file.dwarf.ignore-file-indexes true'. This would cause us
to ignore the .debug_names section, and if the index cache was enabled,
we could end up loading the partial index as if it were a full DWARF
index.
This patch detects when the ManualDWARFIndex is being used with
.debug_names, and saves out a cache file with a suffix of "-full" or
"-partial" to avoid this issue.