Adds the ability to lookup and display all merged functions for an
address in llvm-gsymutil.
Now, when `--merged-functions` is used in combination with
`--address/--addresses-from-stdin`, lookup results will contain
information about merged functions, if available.
To support printing merged function information when using the
`--verbose` option, the `LookupResult` data structure also had to be
extended with pointers to the raw function data and raw merged function
data. This is because merged functions share the same address range, so
it's not easy to look up the raw merged function data for a particular
`LookupResult` that is based on a merged function.
This patch makes sure YAML files are opened for reading as text file to
trigger auto-conversion from EBCDIC encoding into expected ASCII
encoding on z/OS platform. This is required to fix the following lit
tests:
```
LLVM :: tools/llvm-gsymutil/ARM_AArch64/macho-gsym-callsite-info-exe.yaml
LLVM :: tools/llvm-gsymutil/ARM_AArch64/macho-gsym-callsite-info-obj.test
LLVM :: tools/llvm-gsymutil/ARM_AArch64/macho-gsym-callsite-info-dsym.yaml
LLVM :: Transforms/PGOProfile/memprof_undrift_missing_leaf.ll
```
Previously, the test
`llvm/test/tools/llvm-gsymutil/ARM_AArch64/macho-gsym-merged-callsites-dsym.yaml`
[was disabled](https://github.com/llvm/llvm-project/pull/119957/files)
due to failing on Linux.
The issue is that the merged functions appear in a different order in
the final produced `gSYM`.
To ensure deterministic ordering of merged functions, we change the
sorting of functions to use the `stable_sort` algorithm. Before merged
functions, this was not an issue as the address is used as a comparator
in the sorting algorithm so there was no chance of two functions testing
as identical according to the comparator.
Confirmed that test now passes locally with
`-DLLVM_ENABLE_EXPENSIVE_CHECKS=ON`.
This patch modifies the DWARF verifier to handle a valid case where two
or more functions have identical address ranges due to being merged by
ICF (Identical Code Folding). Previously, the verifier would incorrectly
report these as errors, but functions merged via ICF (such as when using
LLD's --keep-icf-stabs option) can legitimately share the same address
range.
A new test case has been added to verify this behavior using YAML-based
DWARF data that simulates two DW_TAG_subprogram entries with identical
address ranges. The test ensures that the verifier correctly identifies
this as a valid case and doesn't emit any errors, while still
maintaining the existing verification for truly invalid overlapping
ranges in other scenarios. Before this change, the newly added test case
would have failed, with `llvm-dwarfdump` marking the overlapping address
ranges in the DWARF as an error.
We also modify the existing tests `llvm-dwarfutil/ELF/X86/verify.test` and
`llvm/test/tools/llvm-dwarfdump/X86/verify_parent_zero_length.yaml`
which rely on the existence of the error that we're trying to
suppress. We slightly change one offset so that the ranges don't
perfectly overlap and an error is still generated.
This change adds support for loading gSYM callsite information from
DWARF. Previously the only support was for loading callsites info from
YAML.
For testing, we add a pass where `macho-gsym-merged-callsites-dsym`
loads callsite info from DWARF rather than YAML.
[llvm-debuginfo-analyzer] Fix crash due to un-checked error in LVReaderHandler::handleArchive
method.
- Added README describing how to generated the binary files used for the test.
- A follow up patch to add extra ASSERT_NE
Committed on behalf of @aurelien35
Currently, when dumping the contents of a GSYM there are three issues:
- Callsite information is not displayed for merged functions - this is
because of a bug in `CallSiteInfoLoader::buildFunctionMap` where when
enumerating through `Func.MergedFunctions` - we enumerate by value
instead of by reference.
- There is no variable indent for printing callsite info - meaning that
when printing callsites for merged functions, the indent will be
different than the other info of the merged function. To address this we
add configurable indent for printing callsite info
- Callsite info is printed right after merged function info. Meaning
that if the merged function also has call site information, the parent's
callsite info will appear right after the merged function's callsite
info - leading to confusion. To address this we print the callsite info
first, then the merged functions info.
This change addresses all the above 3 issues.
Example of old vs new:
<img width="1074" alt="image"
src="https://github.com/user-attachments/assets/d039ad69-fa79-4abb-9816-eda9cc2eda53"
/>
For the given C++ code:
```
template <typename T> class Foo { T Member; };
template <template <typename T> class TemplateType>
class Bar {
TemplateType<int> Int;
};
template <template <template <typename> class> class TemplateTemplateType>
class Baz {
TemplateTemplateType<Foo> Foo;
};
typedef Baz<Bar> Example;
Example TT;
```
The '--attribute=encoded' option, will produce the logical view:
```
{Class} 'Foo<int>'
{Encoded} <int>
{Class} 'Bar<Foo>'
{Encoded} <> <-- Missing the template argument info (Foo)
{Class} 'Baz<Bar>'
{Encoded} <> <-- Missing the template argument info (Bar)
```
When the template argument is another template it is not included in the
{Encoded} field. The correct output should be:
```
{Class} 'Foo<int>'
{Encoded} <int>
{Class} 'Bar<Foo>'
{Encoded} <Foo>
{Class} 'Baz<Bar>'
{Encoded} <Bar>
```
- In the DWARF reader, for those attributes that can have an unsigned
value, allow for the following cases:
* Is an implicit constant
* Is an optional value
- The testing is done by creating a file with generated DWARF, using
`DwarfGenerator` (generate DWARF debug info for unit tests).
This PR adds support in the gSYM format for call site information and
adds support for loading call sites from a YAML file. The support for
YAML input is mostly for testing purposes - so we have a way to test the
functionality.
Note that this data is not currently used in the gSYM tooling - the
logic to use call sites will be added in a later PR.
The reason why we need call site information in gSYM files is so that we
can support better call stack function disambiguation in the case where
multiple functions have been merged due to optimization (linker ICF).
When resolving a merged function on the callstack, we can use the call
site information of the calling function to narrow down the actual
function that is being called, from the set of all merged functions.
See [this
RFC](https://discourse.llvm.org/t/rfc-extending-gsym-format-with-call-site-information-for-merged-function-disambiguation/80682)
for more details on this change.
This reverts commit f06c187799d910fd3ac3e9106397e5eecff9f265.
Temporary revert: there is https://github.com/llvm/llvm-project/pull/117239 that is suppose to fix the issue.
Reverting to keep things rolling.
This is the second half of
https://github.com/llvm/llvm-project/pull/90008.
Essentially, it replaces the work of resolving template types when we
just need the qualified names with walking the DIE tree using
`DWARFTypePrinter`.
### Result
For an internal target, the time spent on `expr *this` for the first
time reduced from 28 secs to 17 secs.
The code dealing with DW_AT_call_line/DW_AT_call_file is in the wrong
place. The correct functions were call, but with incorrect values:
DW_AT_call_line <-- Filename Index
DW_AT_call_file <-- Line number
The DW_OP_GNU_push_tls_address, DW_OP_form_tls_address DWARF
location forms generated for thread local storage variables, caused a
crash in the DWARFReader, due to incorrect number of operands.
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.
For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
In DwarfTransformer::verify() line number information is retrieved for
each address using:
auto DwarfInlineInfos =
DICtx.getInliningInfoForAddress(SectAddr, DLIS);
Later down the loop, another such invocation was made before:
Gsym->dump(Log, *FI);
There is a continue after that, DwarfInlineInfos do not affect the
dump() invocation, I am not aware of any other side effects that is
needed from the extra getInliningInfoForAddress() invocation, and tests
pass without it, so just remove it.
This generalizes DWARFTypePrinter class to a template class so that it
can be reused for lldb's DWARFDIE type.
This is a split of https://github.com/llvm/llvm-project/pull/90008. The
difference is that this doesn't have `Visitor` template parameter which
can be added later if necessary.
They are close enough to swap lldb's `DWARFDebugArangeSet` with the llvm
one.
The difference is that llvm's `DWARFDebugArangeSet` add empty ranges
when extracting. To accommodate this, `DWARFDebugAranges` in lldb
filters out empty ranges when extracting.
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
… (#99444)
The previous version introduced a bug (caught by cross-project tests).
Explicit signature resolution is still necessary when one wants to
access the children (not attributes) of a given DIE.
The new version keeps just the findRecursively extension, and reverts
all the DWARFTypePrinter modifications.
Nothing in the affected code depends on the `ModuleName` being
null-terminated,
so take it by `StringRef` instead of `const std::string &`.
This change simplifies API consumption, since one doesn't always have a
`std::string` at the call site (might have `std::string_view` instead),
and also gives some minor performance improvements by removing
string-copies in the cache-hit path of `getOrCreateModuleInfo`.
This patch fixes .debug_names verification for split DWARF with no type
units. It will print out an error for any name entries where we can't
locate the .dwo file. It finds the non skeleton unit and correctly
figures out the DIE offset in the .dwo file. If the non skeleton unit is
found and yet the skeleton unit has a DWO ID, an error will be emitted
showing we couldn't access the non-skeleton compile unit.
This allows `llvm-dwarfdump` to decode the DWARF 5 opcode
`DW_OP_implicit_pointer` (0xa0). GCC makes use of this opcode in recent
versions. LLVM contains some (unfinished) support as well. With existing
usage in the ecosystem, adding decoding support here seems reasonable.
This sorts DWARF op descriptions in `DWARFExpression.cpp` by opcode and version, packing the standardised ops together. A few ops also had the wrong version listed, so this fixes those versions as well. (The version does not appear to actually be used currently.)
This patch introduces support for storing debug info for merged
functions in the GSYM debug info. It allows GSYM to represent multiple
functions that share the same address range, which occur when multiple
functions are merged during linker ICF.
The core of this functionality is the new `MergedFunctionsInfo` class,
which is integrated into the existing `FunctionInfo` structure. During
GSYM creation, functions with identical address ranges are now grouped
together, with one function serving as the "master" and the others
becoming "merged" functions. This organization is preserved in the GSYM
format and can be read back and displayed when dumping GSYM information.
Old readers will only see the master function, and ther "merged"
functions will not be processed.
Note: This patch just adds the functionality to the gSYM format -
additional changes to the gsym format and algorithmic changes to logic
existing tooling are needed to take advantage of this data.
Exact output of `llvm-gsymutil --verify --verbose` for the included
test:
[gist](https://gist.github.com/alx32/b9c104d7f87c0b3e7b4171399fc2dca3)
…_sig8
Splitting from #99495.
I've extended the type unit test case to feature more kinds of
references, including the gcc-style DW_AT_type[DW_FORM_ref_sig8]
reference, which this patch fixes.
LLVM Symbolizer attempt to symbolize addresses of optimized binaries
reports missing line numbers for some cases. It maybe due to compiler
which sometimes cannot map an instruction to line number due to
optimizations. Symbolizer should handle those cases gracefully.
Adding an option '--skip-line-zero' to symbolizer so as to report the
nearest non-zero line number.
---------
Co-authored-by: Amit Pandey <amit.pandey@amd.com>
This patch adds support for verifying local type units in .debug_names
section. It adds a test to test if the TU index is valid, and a test
that tests that an error is found inside the name entry for a type unit.
We don't need to test all other errors in the name entry because these
are essentially identical to compile unit entries, they just use a
different DWARF unit offset index.
findRecursively follows DW_AT_specification and DW_AT_abstract_origin
references, but not DW_AT_signature. As far as I can tell, there is no
fundamental difference between these attributes that would make this
behavior desirable, and this just seems like a consequence of the fact
that this attribute is newer. This patch aims to change that.
The motivation is some code in lldb, which assumes that it can construct
a qualified name of a type by just walking the parent chain and looking
at the name attribute. This works for "regular" debug info, even when
some of the DIEs are just forward declarations, but it breaks in the
presence of type units, because of the need to explicitly resolve the
signature reference.
While LLDB does not use the llvm's DWARFDie class (yet?), this seems
like a very important change in the overall API, and any divergence here
would complicate eventual reunification, which is why I am making the
change in the llvm API first. However, putting lldb aside, I think this
change is beneficial in llvm on its own, as it allows us to remove the
explicit DW_AT_signature resolution in the DWARFTypePrinter.
The result of the function cannot be correctly interpreted without
knowing the precise form type (a type signature needs to be looked up
very differently from a supplementary debug info reference). The
function sort of worked because the two reference types (unit-relative
and section-relative) that can be handled uniformly are also the most
common types of references, but this setup made it easy to write code
which does not support other kinds of reference (and if one tried to
support them, the result didn't look pretty --
https://github.com/llvm/llvm-project/pull/97423/files#r1676217081).
The split is based on the reference type classification from DWARFv5
(Section 7.5.5 Classes and Forms), and it should enable uniform (if
slightly more verbose) hadling. Note that this only affects users which
want more control of how (or if) the references are resolved. Users
which just want to access the referenced DIE can use the higher level
API (DWARFDie::GetAttributeValueAsReferencedDie) which returns (or will
return after #97423 is merged) the correct die for all reference types
(except for supplementary references, which we don't support right now).