raw_string_ostream::flush() is essentially a no-op (also specified in docs).
Don't call it in tests that aren't meant to test 'raw_string_ostream' itself.
p.s. remove a few redundant calls to raw_string_ostream::str()
… (#99444)
The previous version introduced a bug (caught by cross-project tests).
Explicit signature resolution is still necessary when one wants to
access the children (not attributes) of a given DIE.
The new version keeps just the findRecursively extension, and reverts
all the DWARFTypePrinter modifications.
The result of the function cannot be correctly interpreted without
knowing the precise form type (a type signature needs to be looked up
very differently from a supplementary debug info reference). The
function sort of worked because the two reference types (unit-relative
and section-relative) that can be handled uniformly are also the most
common types of references, but this setup made it easy to write code
which does not support other kinds of reference (and if one tried to
support them, the result didn't look pretty --
https://github.com/llvm/llvm-project/pull/97423/files#r1676217081).
The split is based on the reference type classification from DWARFv5
(Section 7.5.5 Classes and Forms), and it should enable uniform (if
slightly more verbose) hadling. Note that this only affects users which
want more control of how (or if) the references are resolved. Users
which just want to access the referenced DIE can use the higher level
API (DWARFDie::GetAttributeValueAsReferencedDie) which returns (or will
return after #97423 is merged) the correct die for all reference types
(except for supplementary references, which we don't support right now).
Update the folder titles for targets in the monorepository that have not
seen taken care of for some time. These are the folders that targets are
organized in Visual Studio and XCode
(`set_property(TARGET <target> PROPERTY FOLDER "<title>")`)
when using the respective CMake's IDE generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property`, but are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
The relative offset for a CU in Dwarf v5 (and later) is different than
the relative offset for a CU in Dwarf v4 (and before).
Signed-off-by: Will Hawkins <hawkinsw@obs.cr>
As part of the WebAssembly support work review
https://github.com/llvm/llvm-project/pull/82588
It was decided to rename:
Files: LVElfReader.cpp[h] -> LVDWARFReader.cpp[h]
ELFReaderTest.cpp -> DWARFReaderTest.cpp
Class: LVELFReader -> LVDWARFReader
The name LVDWARFReader would match the another reader LVCodeViewReader
as they will reflect the type of
debug information format that they are parsing.
We recently ran into some bad DWARF where the `DW_AT_stmt_list` of many
compile units was randomly set to invalid values and was causing LLDB to
crash due to an assertion about address sizes not matching. Instead of
asserting, we should return an appropriate recoverable `llvm::Error`.
GsymUtil, like DwarfDump --verify, spews a *lot* of data necessary to
understand/diagnose issues with DWARF data. The trouble is that the kind
of information necessary to make the messages useful also makes them
nearly impossible to easily categorize. I put together a similar output
categorizer (https://github.com/llvm/llvm-project/pull/79648) that will
emit a summary of issues identified at the bottom of the (very verbose)
output, enabling easier tracking of issues as they arise or are
addressed.
There's a single output change, where a message "warning: Unable to
retrieve DWO .debug_info section for some object files. (Remove the
--quiet flag for full output)" was being dumped the first time it was
encountered (in what looks like an attempt to make something easily
grep-able), but rather than keep the output in the same order, that
message is now a 'category' so gets emitted at the end of the output.
The test 'tools/llvm-gsymutil/X86/elf-dwo.yaml' was changed to reflect
this difference.
---------
Co-authored-by: Kevin Frei <freik@meta.com>
This commit brings support for debug_names in DWARFYAML. It parses YAML
and generates emits a DWARF5 Accelerator table with the following
limitations:
1. All forms must have a fixed length (zero length is also ok).
2. Hard-coded support for DWARF 5 and DWARF32.
3. The generated table does not contain a hash index
All of these limitations can be lifted in the future, but for now this
is good enough to enable testing.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
llvm-gsymutil allows address ranges to overlap. There was a bug where if
we had debug info for a function with a range like [0x100-0x200) and a
symbol at the same start address yet with a larger range like
[0x100-0x300), we would randomly get either only information from the
first or second entry. This could cause lookups to fail due to the way
the binary search worked.
This patch makes sure that when lookups happen we find the first address
table entry that can match an address, and also ensures that we always
select the first FunctionInfo that could match. FunctionInfo entries are
sorted such that the most debug info rich entries come first. And if we
have two ranges that have the same start address, the smaller range
comes first and the larger one comes next. This patch also adds the
ability to iterate over all function infos with the same start address
to always find a range that contains the address.
Added a unit test to test this functionality that failed prior to this
fix and now succeeds.
Also fix an issue when dumping an entire GSYM file that has duplicate address entries where it used to always print out the binary search match for the FunctionInfo, not the actual data for the address index.
We now have 64-bit XCOFF object file support, so these tests can be
enabled again. However, some tests still fail due to unsupported debug
sections, so I cleaned up their comments.
DWARF produced by LTO and BOLT can sometimes be broken where file
indexes are beyond the end of the line table's file list in the
prologue. This patch allows llvm-gsymutil to convert this DWARF without
crashing, and emits errors when:
line table contains entries with an invalid file index (line entry will
be removed) inline functions that have invalid DW_AT_call_file file
indexes when there are no line table entries for a function and we fall
back to making a single line table entry from the functions
DW_AT_decl_file/DW_AT_decl_line attributes, we make sure the
DW_AT_decl_file attribute is valid before emitting it.
Fix line table lookups in line tables with multiple lines with the same
address.
Compilers emit line tables that have multiple line table entries with
the same address. When doing lookups, we always need to use the last
line entry if a lookup address matches the address of one or more line
entries. This is because the size of an address range for a line uses
the next line entry to figure out how big the current line entry is. If
the next line entry has the same address, that means the current line
entry has a size of zero, so no bytes correspond to the line entry.
This patch ensures that lookups will always pick the last matching line
entry when the lookup address matches more than one line entry.
Instead of reporting the error directly through the DWARFContext passed
in as an argument, it would be more flexible to have extract return the
error and allow the caller to react appropriately.
This will be useful for using llvm's DWARFHeaderUnit from lldb which may
report header extraction errors through a different mechanism.
Previous to this fix, if we had a DW_TAG_subprogram that had a
DW_AT_linkage_name that was empty, it would attempt to use this name
which would cause an error to be emitted when saving the gsym file to
disk:
error: DWARF conversion failed: : attempted to encode invalid
FunctionInfo object
This patch fixes this issue and adds a unit test case.
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class. This patch replaces
{big,little,native} with llvm::endianness::{big,little,native}.
This patch completes the migration to llvm::endianness and
llvm::endianness::{big,little,native}. I'll post a separate patch to
remove the migration helpers in llvm/Support/Endian.h:
using endianness = llvm::endianness;
constexpr llvm::endianness big = llvm::endianness::big;
constexpr llvm::endianness little = llvm::endianness::little;
constexpr llvm::endianness native = llvm::endianness::native;
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an enum.
This patch replaces llvm::support::{big,little,native} with
llvm::endianness::{big,little,native}.
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form. This patch replaces
support::endianness with llvm::endianness.
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form. This patch replaces
llvm::support::endianness with llvm::endianness.
system_endianness() just returns llvm::endianness::native, a
compile-time constant equivalent to std::native in C++20. This patch
deprecates system_endianness() while replacing all invocations of
system_endianness() with llvm::endianness::native.
While we are at it, this patch replaces
llvm::support::endianness::{big,little} with
llvm::endianness::{big,little} in those statements that happen to call
system_endianness(). It does not go out of its way to replace other
occurrences of llvm::support::endianness::{big,little}.
To make DWARFDebugAbbrev more amenable to error-handling, I would like
to change the return type of DWARFDebugAbbrev::parse from `void` to
`Error`. Users of DWARFDebugAbbrev can consume the error if they want to
use all the valid DWARF that was parsed (without worrying about the
malformed DWARF) or stop when the parse fails if the use case needs to
be strict.
This also will bring the LLVM DWARFDebugAbbrev interface closer to
LLDB's which opens up the opportunity for LLDB adopt the LLVM
implementation with minimal changes.
Extend llvm-objdump to show CO-RE relocations when `-r` option is
passed and object file has .BTF and .BTF.ext sections.
For example, the following C program:
#define __pai __attribute__((preserve_access_index))
struct foo { int i; int j;} __pai;
struct bar { struct foo f[7]; } __pai;
extern void sink(void *);
void root(struct bar *bar) {
sink(&bar[2].f[3].j);
}
Should lead to the following objdump output:
$ clang --target=bpf -O2 -g t.c -c -o - | \
llvm-objdump --no-addresses --no-show-raw-insn -dr -
...
r2 = 0x94
CO-RE <byte_off> [2] struct bar::[2].f[3].j (2:0:3:1)
r1 += r2
call -0x1
R_BPF_64_32 sink
exit
...
More examples could be found in unit tests, see BTFParserTest.cpp.
To achieve this:
- Move CO-RE relocation kinds definitions from BPFCORE.h to BTF.h.
- Extend BTF.h with types derived from BTF::CommonType, e.g.
BTF::IntType and BTF::StrutType, to allow dyn_cast() and access to
type additional data.
- Extend BTFParser to load BTF type and relocation data.
- Modify llvm-objdump.cpp to create instance of BTFParser when
disassembly of object file with BTF sections is processed and `-r`
flag is supplied.
Additional information about CO-RE is available at [1].
[1] https://docs.kernel.org/bpf/llvm_reloc.html
Depends on D149058
Differential Revision: https://reviews.llvm.org/D150079
llvm-gsymutil would emit errors about address ranges for DW_TAG_inlined_subroutine DIEs whose address range didn't exist in the parent inline information. When a DW_TAG_subprogram DIE has more than one address range with a DW_AT_ranges attribute, we emit multiple FunctionInfo objets, one for each range of a function. When we parsed the inline information, it might have inline contribution that appear in any of the function's ranges, and if we were parsing the first range of a function, all inline entries that appeared in other valid ranges of the functions would end up emitting error messages. This patch fixes this by always passing down the full list of ranges, even if they aren't being used in the parse of the information. This eliminates reporting of errors when we shouldn't have been emitting error messages. Added a test to track this and ensure this doesn't regress.
Also we don't warn if we end up with empty inline information if the only top level inline function have been elided where the high and low PC values are the same which indicates that the inline function was elided.
Differential Revision: https://reviews.llvm.org/D157669
The GSYM code alwasy logging to streams even in quiet mode. When in quiet mode we would use the "nulls()" stream to avoid logging to the terminal, but this still caused logging functions to be called on DWARFDie objects and other messages which were quite expensive and not needed if we weren't logging anything. This patch switches some logs in performant areas to be "raw_ostream *" values and if the ostream pointer is NULL, then we don't call the expensive logging functions on DWARFDie and other objects which will improve performance.
Differential Revision: https://reviews.llvm.org/D157466
llvm-gsymutil was maintaining an address ranges collection behind a mutex and having the multi-threaded code access this and hold the mutex was causing slowdown when converting DWARF to GSYM. This patch does the following:
- removes the "Ranges" variable from the GsymCreator and any functions and places that used it
- clients don't try to detect if a function has been added for an address range, we now remove any inferior copies of information in the GsymCreator::finalize() routine as was done before, we just have more items to remove, though performance is greator due to less mutex thread locking
- after I started adding all of the inferior funtion info objects the previous patch that tried to remove infrior debug info had bugs in it, so I replace the removeIfBinary() function in GsymCreator with a more efficient and easier to debug way to do things which copies items from the GsymCreator::Funcs into a new vector of FunctionInfo objects and then replaces GsymCreator::Funcs at the end.
- Sorting of FunctionInfo objects has been modified to also compare InlineInfo objects. We found cases where LTO was ruining inline function address ranges and we ended up with a variety of FunctionInfo objects for the same range that had varying amounts of valid debug info. This patch now ensure that two function info objects with different inline info for the same function address range, the best one will be picked to ensure the greatest fidelity.
- If we detect that a DW_TAG_subprogram has inline functions and after parsing it, we don't end up with any valid inline information, we set the optional to std::nullopt to avoid emitting empty inline information and wasting space.
My tests show a 200% perf increase on M1 macs and a 100% performance increase on linux machines for the same complex large DWARF input binary.
Differential Revision: https://reviews.llvm.org/D156773
If a function contains inline function ranges whose address ranges are not contained in the parent scope, then emit an error message and omit them from the final GSYM. Prior to this we would only test if an inline function's address range was within the concrete function's ranges. If we ran into a case where the inline range was within the function's ranges, but not within one of the parent inline function's ranges, then we would fail to produce a GSYM file and exit with an error.
The current code will emit full details on invalid inline ranges as they are being parsed and will omit any bad ranges from the final GSYM file.
Differential Revision: https://reviews.llvm.org/D155254
Test bot reported an issue with unit tests for D149058 in [1]:
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from BTFParserTest
[ RUN ] BTFParserTest.simpleCorrectInput
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/unittests/DebugInfo/BTF/BTFParserTest.cpp:141:33:
runtime error: upcast of misaligned address 0x7facce60411f for type 'llvm::SmallString<0>', which requires 8 byte alignment
0x7facce60411f: note: pointer points here
64 00 00 00 37 41 60 ce ac 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/unittests/DebugInfo/BTF/BTFParserTest.cpp:141:33
The issue is caused by attribute "packed" used for too many things:
#pragma pack(push, 1)
struct MockData1 {
struct B {
...
} BTF;
struct E {
...
} Ext;
int BTFSectionLen = sizeof(BTF);
int ExtSectionLen = sizeof(Ext);
SmallString<0> Storage;
std::unique_ptr<ObjectFile> Obj;
}
#pragma pack(pop)
Access to unaligned pointers in `Storage`/`Obj` causes unaligned
access errors.
To fix this #pragma directives are pushed invards to apply only to `B`
and `E` definitions.
[1] https://lab.llvm.org/buildbot/#/builders/5/builds/35040
Differential Revision: https://reviews.llvm.org/D155176
"BTF" is a debug information format used by LLVM's BPF backend.
The format is much smaller in scope than DWARF, the following info is
available:
- full set of C types used in the binary file;
- types for global values;
- line number / line source code information .
BTF information is embedded in ELF as .BTF and .BTF.ext sections.
Detailed format description could be found as a part of Linux Source
tree, e.g. here: [1].
This commit modifies `llvm-objdump` utility to use line number
information provided by BTF if DWARF information is not available.
E.g., the goal is to make the following to print source code lines,
interleaved with disassembly:
$ clang --target=bpf -g test.c -o test.o
$ llvm-strip --strip-debug test.o
$ llvm-objdump -Sd test.o
test.o: file format elf64-bpf
Disassembly of section .text:
<foo>:
; void foo(void) {
r1 = 0x1
; consume(1);
call -0x1
r1 = 0x2
; consume(2);
call -0x1
; }
exit
A common production use case for BPF programs is to:
- compile separate object files using clang with `-g -c` flags;
- link these files as a final "static" binary using bpftool linker ([2]).
The bpftool linker discards most of the DWARF sections
(line information sections as well) but merges .BTF and .BTF.ext sections.
Hence, having `llvm-objdump` capable to print source code using .BTF.ext
is valuable.
The commit consists of the following modifications:
- llvm/lib/DebugInfo/BTF aka `DebugInfoBTF` component is added to host
the code needed to process BTF (with assumption that BTF support
would be added to some other tools as well, e.g. `llvm-readelf`):
- `DebugInfoBTF` provides `llvm::BTFParser` class, that loads information
from `.BTF` and `.BTF.ext` sections of a given `object::ObjectFile`
instance and allows to query this information.
Currently only line number information is loaded.
- `DebugInfoBTF` also provides `llvm::BTFContext` class, which is an
implementation of `DIContext` interface, used by `llvm-objdump` to
query information about line numbers corresponding to specific
instructions.
- Structure `DILineInfo` is modified with field `LineSource`.
`DIContext` interface uses `DILineInfo` structure to communicate
line number and source code information.
Specifically, `DILineInfo::Source` field encodes full file source code,
if available. BTF only stores source code for selected lines of the
file, not a complete source file. Moreover, stored lines are not
guaranteed to be sorted in a specific order.
To avoid reconstruction of a file source code from a set of
available lines, this commit adds `LineSource` field instead.
- `Symbolize` class is modified to use `BTFContext` instead of
`DWARFContext` when DWARF sections are not available but BTF
sections are present in the object file.
(`Symbolize` is instantiated by `llvm-objdump`).
- Integration and unit tests.
Note, that DWARF has a notion of "instruction sequence".
DWARF implementation of `DIContext::getLineInfoForAddress()` provides
inexact responses if exact address information is not available but
address falls within "instruction sequence" with some known line
information (see `DWARFDebugLine::LineTable::findRowInSeq()`).
BTF does not provide instruction sequence groupings, thus
`getLineInfoForAddress()` queries only return exact matches.
This does not seem to be a big issue in practice, but output
of the `llvm-objdump -Sd` might differ slightly when BTF
is used instead of DWARF.
[1] https://www.kernel.org/doc/html/latest/bpf/btf.html
[2] https://github.com/libbpf/bpftool
Depends on https://reviews.llvm.org/D149501
Reviewed By: MaskRay, yonghong-song, nickdesaulniers, #debug-info
Differential Revision: https://reviews.llvm.org/D149058
This extends DWARFDebugLine to properly parse line number programs with
maximum_operations_per_instruction > 1 for VLIW targets.
No functions that use that parsed output to retrieve line information
have been extended to support multiple op-indexes. This means that when
retrieving information for an address with multiple op-indexes, e.g.
when using llvm-addr2line, the penultimate row for that address will be
used, which in most cases is the row for the second largest op-index.
This will be addressed in further changes, but this patch at least
allows us to correctly parse such line number programs, with a warning
saying that the line number information may be incorrect (incomplete).
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D152536
This is a preparatory patch for extending DWARFDebugLine to properly
parse line number programs with maximum_operations_per_instruction > 1
for VLIW targets.
Add some scaffolding for handling op-index in line number programs, and
add printouts for that in the table. As this affects a lot of tests,
this is done in a separate commit to get a cleaner review for the actual
op-index implementation.
Verbose printouts are not present in many tests, and adding op-index to
those will require a bit more code changes, so that is done in the
actual implementation patch.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D152535
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
The extension codespace for DWARF expressions (DW_OP_LLVM_{lo,hi}_user)
has shrunk over time, as no extension is ever "retired" in practice. To
facilitate future extensions, this patch reserves one open opcode as an extension
point (0xfe), which is followed by a ULEB128-encoded SubOperation, and
then by the subop's operands.
There is some prior-art, namely DW_OP_AARCH64_operation
(see edd7460d87/aadwarf64/aadwarf64.rst (45dwarf-expression-operations)).
This version makes some different tradeoffs, opting to use a ULEB128 for
the subop encoding for future-proofing.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D147271
This commit aims to improve error handling in the
DWARFAbbreviationDeclarationSet class. Specifically, we change the return type
of DWARFAbbreviationDeclarationSet::extract to an llvm::Error. In doing
so, we propagate the error from DWARFAbbreviationDeclaration::extract
another layer upward.
I have built on the previous unittest for DWARFDebugAbbrev that I
wrote a few days prior.
Namely, I am verifying that the following should give an error:
- An invalid tag following a non-null code
- An invalid attribute with a valid form
- A valid attribute with an invalid form
- An incorrectly terminated DWARFAbbreviationDeclaration
Additionally, I uncovered some invalid DWARF in an unrelated dsymutil
test. Namely the last Abbreviation Decl was missing a code.
This test has been updated accordingly. However, this commit does
not fix the underlying issue: llvm-dwarfdump does not correctly
verify the debug abbreviation section to catch these kinds of
mistakes. I have updated DWARFVerifier to not dereference a
pointer without first checking it and left a FIXME for future
contributors.
Differential Revision: https://reviews.llvm.org/D151353
This reverts commit 11d61c079d4b4927efea42a38a27d4586887b764 to re-apply
6836a47b7e6b57927664ec6ec750ae37bb951129 with modifications.
Specifically, the errors in DWARFAbbreviationDeclaration::extract needed
to be moved as they are returned to ensure the right Error constructor
is selected.
In trying to hoist errors further up this callstack, I discovered that
if the data in the debug_abbrev section is invalid entirely, the code
that parses the debug_abbrev section may do strange or unpredictable
things. The underlying issue is that DataExtractor will return a value
of 0 when it encounters an error in extracting a LEB128 value. It's thus
difficult to determine if there was an error just by looking at the
return value. This patch aims to bail at the first sight of an error in
the debug_abbrev parsing code.
Differential Revision: https://reviews.llvm.org/D151755
I landed D151001 before it had gotten sign-off from all the reviewers.
This is a follow-up to address the additional feedback.
Differential Revision: https://reviews.llvm.org/D151233