Reapply "[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#…
(#145959)
This reapplies cbf781f0bdf2f680abbe784faedeefd6f84c246e, with fixes for
the shared-library build and the unconventional sanitizer-runtime build.
Original Description:
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
I am happy to put it in another location, or to structure it differently
if that makes sense. Some have suggested in BinaryFormat, but it is not
a great fit there. But if that makes more sense to the reviewers, I can
do that.
Another possibility would be to use pass-through headers to allow
clients who don't care to depend only on DebugInfo/DWARF. This would be
a much less invasive change, and perhaps easier for clients. But also a
system that hides details.
Either way, I'm open.
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
We saw occasional segfaults while processing some binaries. The reason
probably is that we may clear the DIE while we are reading it's data
from another thread which happens due to cross-unit references.
---------
Co-authored-by: Arslan Khabutdinov <akhabutdinov@fb.com>
Verification crashed here
https://github.com/llvm/llvm-project/blob/main/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp#L519
The reason being that during verification to extract inline_info we
recreate compile unit dies. Assert fails because we previously cleaned
up just the DIEs but some other fields remained initialized.
Co-authored-by: Arslan Khabutdinov <akhabutdinov@fb.com>
Same as https://github.com/llvm/llvm-project/pull/139907/ except there
is now a special dovoidwork helper function.
Previous approach with assert(f();return success;) failed tests for
release builds, so I created a separate helper. Open to suggestions how
to solve this more elegantly.
Co-authored-by: Arslan Khabutdinov <akhabutdinov@fb.com>
For large binaries gsymutil ends up using too much memory. This diff
adds DIE tree cleanup per compile unit to reduce memory usage.
P. S. Not sure about formatting. Maybe it hasn't been run in a while, or
I have misconfigured something.
`$ git clang-format HEAD~1
clang-format did not modify any files
$ clang-format --version
clang-format version 21.0.0git
(git@github.com:peremyach/llvm-project.git
8d945c8357e1bd9872a34f92620d4916bfd27482)
`
Co-authored-by: Arslan Khabutdinov <akhabutdinov@fb.com>
llvm-gsymutil eats a lot of RAM. On some large binaries, it causes OOM's on smaller hardware, consuming well over 64GB of RAM. This change frees line tables once we're done with them, and frees DWARFUnits's DIE's when we finish processing each DU, though they may get reconstituted if there are references from other DU's during processing. Once the conversion is complete, all DIE's are freed. The reduction in peak memory usage from these changes showed between 7-12% in my tests.
The double-checked locking around the creation & freeing of the data structures was tested on a 166 core system. I validated that it trivially malfunctioned without the locks (and with stupid reordering of the locks) and worked reliably with them.
---------
Co-authored-by: Kevin Frei <freik@meta.com>
Motivation: LLDB is able to report errors about these scenarios whereas
LLVM's DWARF parser only gives a boolean success/fail. I want to migrate
LLDB to using LLVM's DWARFUnitHeader class, but I don't want to lose
some of the error reporting, so I'm adding it to the LLVM class first.
Instead of reporting the error directly through the DWARFContext passed
in as an argument, it would be more flexible to have extract return the
error and allow the caller to react appropriately.
This will be useful for using llvm's DWARFHeaderUnit from lldb which may
report header extraction errors through a different mechanism.
For now, data location expression is hard coded to little endian. We are
going to support sanitizers on AIX which is big endian. Support big
endian too in the data location expression parser of llvm-symbolizer.
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`
This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch.
Differential Revision: https://reviews.llvm.org/D158016
This gives us more meaningful information when
`getAbbreviationDeclarationSet` fails. Right now only
`verifyAbbrevSection` actually uses the error that it returns, but the
other call sites could be rewritten to take advantage of the returned error.
Differential Revision: https://reviews.llvm.org/D153459
In Split DWARF, if the unit had a non-trivial base address (a real
low_pc, rather than one with fixed value 0) then computing addresses
needs to access that base address to add to any base address-relative
values. But the code was trying to access the base address in the split
unit, when it's actually in the skeleton unit. So delegate to the
skeleton if it's available.
Fixes#62941
Previously we'd stash a null pointer in a sorted vector of CUs - the
next time around, we'd try to do a binary search in that vector (sorting
on a key inside the objects pointed to by the elements of the vector)
which would deref null if we'd stashed a null in there previously.
As a reasonable, but not ideal, workaround - don't stash any result in
the vector - this means every query will produce a new warning
(resulting in duplicate warnings) but better than a crash.
Stashing null in the list could be workable if we also stashed the
offset in a pair - but then all the clients would need to be fixed up
(maybe using a filtering iterator) which seems like overkill for this
uncommon error case.
Changed contribution data structure to 64 bit. I added the 32bit and 64bit
accessors to make it explicit where we use 32bit and where we use 64bit. Also to
make sure sure we catch all the cases where this data structure is used.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D139379
Changed contribution data structure to 64 bit. I added the 32bit and 64bit
accessors to make it explicit where we use 32bit and where we use 64bit. Also to
make sure sure we catch all the cases where this data structure is used.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D139379
Summary:
Changed contribution data structure to 64 bit. I added the 32bit and 64bit
accessors to make it explicit where we use 32bit and where we use 64bit. Also to
make sure sure we catch all the cases where this data structure is used.
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
llvm-debuginfo-analyzer is a command line tool that processes debug
info contained in a binary file and produces a debug information
format agnostic “Logical View”, which is a high-level semantic
representation of the debug info, independent of the low-level
format.
The code has been divided into the following patches:
1) Interval tree
2) Driver and documentation
3) Logical elements
4) Locations and ranges
5) Select elements
6) Warning and internal options
7) Compare elements
8) ELF Reader
9) CodeView Reader
Full details:
https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570
This patch:
This is a high level summary of the changes in this patch.
ELF Reader
- Support for ELF/DWARF.
LVBinaryReader, LVELFReader
Reviewed By: psamolysov, probinson
Differential Revision: https://reviews.llvm.org/D125783
This review is extracted from D96035.
DWARF Debuginfo classes have two representations for DIEs: DWARFDebugInfoEntry
(short) and DWARFDie(extended). Depending on the task, it might be more convenient
to use DWARFDebugInfoEntry or/and DWARFDie. DWARFUnit class already has methods
working with DWARFDie and DWARFDebugInfoEntry. This patch adds more
methods working with DWARFDebugInfoEntry to have paired functionality.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D126059
This is minimum changes extracted from https://reviews.llvm.org/D78950. The old patch tried to add LRU eviction of caching data structure. Due to multiple layers of interfaces that users could be using, it was not clear where to put the functionality. While we work out on where to put that functionality, it'll be great to add this minimum interface change so that the user could implement their own memory management. More specifically:
* Add a clearLineTable method for DWARFDebugLine which erases the given offset from the LineTableMap.
* DWARFDebugContext adds the clearLineTableForUnit method that leverages clearLineTable to remove the object corresponding to a given compile unit, for memory management purposes. When it is referred to again, the line table object will be repopulated.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D90006
Currently, llvm-symbolizer doesn't like to parse .debug_info in order to
show the line info for global variables. addr2line does this. In the
future, I'm looking to migrate AddressSanitizer off of internal metadata
over to using debuginfo, and this is predicated on being able to get the
line info for global variables.
This patch adds the requisite support for getting the line info from the
.debug_info section for symbolizing global variables. This only happens
when you ask for a global variable to be symbolized as data.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D123538
As usual with that header cleanup series, some implicit dependencies now need to
be explicit:
llvm/DebugInfo/DWARF/DWARFContext.h no longer includes:
- "llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h"
- "llvm/DebugInfo/DWARF/DWARFCompileUnit.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAranges.h"
- "llvm/DebugInfo/DWARF/DWARFDebugFrame.h"
- "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
- "llvm/DebugInfo/DWARF/DWARFDebugMacro.h"
- "llvm/DebugInfo/DWARF/DWARFGdbIndex.h"
- "llvm/DebugInfo/DWARF/DWARFSection.h"
- "llvm/DebugInfo/DWARF/DWARFTypeUnit.h"
- "llvm/DebugInfo/DWARF/DWARFUnitIndex.h"
Plus llvm/Support/Errc.h not included by a bunch of llvm/DebugInfo/DWARF/DWARF*.h files
Preprocessed lines to build llvm on my setup:
after: 1065629059
before: 1066621848
Which is a great diff!
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119723
Avoid duplicating the string decoding - improve the error messages down
in form parsing (& produce an Expected<const char*> instead of
Optional<const char*> to communicate the extra error details)
Some dwarf loaders in LLVM are hard-coded to only accept 4-byte and 8-byte address sizes. This patch generalizes acceptance into `DWARFContext::isAddressSizeSupported` and provides a common way to generate rejection errors.
The MSP430 target has been given new tests to cover dwarf loading cases that previously failed due to 2-byte addresses.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D111953
This patch implements suggestion done while reviewing D102634. It adds two fields:
ParentIdx and SiblingIdx. These fields allow fast navigation to die parent and
die sibling. These fields are set at the moment when dies are loaded.
dsymutil works 2% faster with this patch(run on clang binary).
Differential Revision: https://reviews.llvm.org/D110363
DWARFUnit::clearDIEs() uses std::vector::shrink_to_fit() to make
capacity of DieArray matched with its size(). The shrink_to_fit()
is not binding request to make capacity match with size().
Thus the memory could still be reserved after DWARFUnit::clearDIEs()
is called. This patch erases capacity when DWARFUnit::clearDIEs() is requested.
So the memory occupied by dies would be freed.
Differential Revision: https://reviews.llvm.org/D109499
This call would incorrectly overwrite (with the .debug_rnglists.dwo from
the executable, if there was one) the rnglists section instead of the
correct value (from the .debug_rnglists.dwo in the .dwo file) that's
applied in DWARFUnit::tryExtractDIEsIfNeeded
Originally committed as 04c203e310bd3fb58e16c936c0200d680100526e
Reverted in 768510632c5ddbf9438693d9c7db1903e39295ad due to the test
failing when encountering windows directory separators.
Fix the path separator platform issue with a FileCheck pattern {{[/\\]}}
Original commit message:
A followup to the feature added in 69da27c7496ea373567ce5121e6fe8613846e7a5
that added the optional "start file name" to match "start line" - but this
didn't work with Split DWARF because of the need for the decl file number
resolution code to refer back to the skeleton unit to find its .debug_line
contribution. So this patch adds the necessary infrastructure to track the
skeleton unit corresponding to a split full unit for the purpose of this
lookup.
A followup to the feature added in
69da27c7496ea373567ce5121e6fe8613846e7a5 that added the optional "start
file name" to match "start line" - but this didn't work with Split DWARF
because of the need for the decl file number resolution code to refer
back to the skeleton unit to find its .debug_line contribution. So this
patch adds the necessary infrastructure to track the skeleton unit
corresponding to a split full unit for the purpose of this lookup.
llvm-dwarfdump was silent even when the format of DWARF was invalid
and/or llvm-dwarfdump did not understand/support some of the constructs.
This can be pretty confusing as llvm-dwarfdump is a tool for DWARF
producers+consumers development.
Review comments also by @dblaikie.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D104271
In some cases a broken or invalid debug info could cause a crash in DWARFUnit::getInlinedChainForAddress during parsing a chain of in-lined functions. This patch fixes this issue.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D98119
There's no way to know whether there's a loclist contribution to parse
if there's no loclistx encoding - and if there is one, there's no need
to walk back from the loclist_base (or, uin the case of
info.dwo/loclist.dwo - starting at 0 in the contribution) to parse the
header, instead rely on the DWARF32/64 and address size in the CU
that's already available.
This would come up in split DWARF (non-split wouldn't try to read a
loclist header in the absence of a loclist_base) when one unit had
location lists and another does not (because the loclists.dwo section
would be non-empty in that case - in the case where it's empty the
parsing would silently skip).
Simplify the testing a bit, rather than needing a whole dwp, etc - by
creating a malformed loclists.dwo section (and use single file Split
DWARF) that would trip up any attempt to parse it - but no attempt
should be made.