Split DWARF doesn't handle LTO of any form (roughly there's an
assumption that each dwo file will have one CU - it's not explicitly
documented, nor explicitly handled, so the ecosystem isn't really well
understood/tested/etc).
This had previously been handled by implementing (& disabling by
default) the `-split-dwarf-cross-cu-references` flag, which would
disable use of ref_addr across two dwo CUs.
This worked for a while, at least in LTO (it didn't address Split
DWARF+Full LTO, but that's an unlikely combination, as the benefits of
Split DWARF are more limited in a full LTO build) - because the only
source of cross-CU references was inlined functions, so by making those
non-cross-CU (by moving the referenced inlined function DWARF
description into the referencing CU) the result was one CU per dwo.
But recently the Function Specialization pass was added to the ThinLTO
pipeline, which caused imported functions that may not be inlined to be
emitted by a backend compile. This meant foreign CU entities (not just
abstract origins/cross-CU referenced entities)/standalone foreign CUs
could be emitted by a backend compile.
The end result was, due to a bug* in binutils dwp (I think basically
it saw two CUs in a single dwo and reprocessed the offsets in the shared
debug_str_offsets.dwo section) this situation lead to corrupted strings.
So to make this more robust, I've generalized the definition of the
`-split-dwarf-cross-cu-references` flag (perhaps it should be renamed at
this point, but it's /really/ niche, doubt anyone's using it - more or
less there for experimentation when we get around to figuring out
spec'ing LTO+Split DWARF) to mean "single CU in a dwo file" and added
more general handling for this.
There's certainly some weird corner cases that could come up in terms of
"how do we choose which CU to put everything in" - for now it's "first
come, first served" which is probably going to be OK for ThinLTO - the
base module will have the first functions and first CU, imported
fragments will come after that. For LTO the choice will be fairly
arbitrary - but, again, essentially whichever module comes first.
* Arguably a bug in binutils dwp, but since the feature isn't well
specified, I'd rather avoid dabbling in this uncertain area and ensure
LLVM doesn't produce especially novel DWARF (dwos with multiple CUs)
regardless of whether binutils dwp would/should be fixed. I'm not
confident debuggers could read such a dwo file well, etc.
Summary: DWARF32 is not supported for XCOFF64 under non-integrated-as mode on AIX, because system assembler will fill the debug section lengths according to DWARF64 format. While in intergrated-as mode, XCOFF64 should be able to select the DWARF format.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D150181
This patch consumes the EntryValueObjects in a MachineFunction's table, using
them to emit the appropriate debug information for these variables.
Depends on D149880
Differential Revision: https://reviews.llvm.org/D149881
MachineFunction keeps a table of variables whose addresses never change
throughout the function. Today, the only kinds of locations it can
handle are stack slots.
However, we could expand this for variables whose address is derived
from the value a register had upon function entry. One case where this
happens is with variables alive across coroutine funclets: these can
be placed in a coroutine frame object whose pointer is placed in a
register that is an argument to coroutine funclets.
```
define @foo(ptr %frame_ptr) {
dbg.declare(%frame_ptr, !some_var,
!DIExpression(EntryValue, <ptr_arithmetic>))
```
This is a patch in a series that aims to improve the debug information
generated by the CoroSplit pass in the context of `swiftasync`
arguments. Variables stored in the coroutine frame _must_ be described
the entry_value of the ABI-defined register containing a pointer to the
coroutine frame. Since these variables have a single location throughout
their lifetime, they are candidates for being stored in the
MachineFunction table.
Differential Revision: https://reviews.llvm.org/D149879
Reland D147506 after fixing the failure in bot
https://lab.llvm.org/buildbot/#/builders/247/builds/4125
Some debuggers like DBX on AIX assume the address in debug line
entries is always incremental. But clang generates two entries (entry
for file scope line and entry for prologue end) with same address if
prologue is empty
And if the prologue is empty, seems the first debug line entry for the
function is unnecessary(i.e. removing the first entry won't impact the
behavior in GDB on Linux), so I implement this for all debuggers.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D147506
Prevent optimization of DebugLoc across section boundaries, such optimization will yield incorrect source location if memory layout of sections does not strictly match the Asm file.
Reviewed By: #debug-info, dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D149294
Some debuggers like DBX on AIX assume the address in debug line
entries is always incremental. But clang generates two entries (entry
for file scope line and entry for prologue end) with same address if
prologue is empty
And if the prologue is empty, seems the first debug line entry for the
function is unnecessary(i.e. removing the first entry won't impact the
behavior in GDB on Linux), so I implement this for all debuggers.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D147506
Given the intent of Split DWARF is to minimize .o file size it seems
like adequate signal that it's worth a minor tradeoff in .dwo size to
significantly reduce .o size (though it doesn't reduce linked executable
size - the cost is mostly in the static relocations resolved by the
linker).
Old behavior was to add to .debug_aranges only when we create a DIE. As the
result we could end up in situation where DW_AT_ranges have addresses that are
not in .debug_aranges. This has caused issues for LLDB: D136395.
Changed it to add addresses to .debug_aranges even when DIE is not created.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D137139
When finding call-site argument locations, don't consider registers to be
location candidates if they will be clobbered between the copy to/from them
and call site. Doing so would present overwritten register values as entry
values in called functions.
This patch adds a collection of register units defined as we walk back from
the call site, and prevents the acceptance of a call-site parameter
location if it will be clobbered on that path.
Fixes https://github.com/llvm/llvm-project/issues/57444
Differential Revision: https://reviews.llvm.org/D141279
We identify epilogue code by looking for instructions tagged
with FrameDestroy.
A function may have more than one epilogue, e.g., because of early
returns or code duplicated during optimization. We need only track
the current block, and emit epilogie_begin at most once per block.
We reduce the number of entries in the line table by combining
epilogue_begin with other flags instead of emitting a separate
entry just for epilogue_begin.
Reviewed By: dblaikie, aprantl
Differential Revision: https://reviews.llvm.org/D133376
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table, by rounding their size up to 1. Entries with zero
length violate the DWARF 5 spec, which states:
> Each descriptor is a triple consisting of a segment selector, the beginning
> address within that segment of a range of text or data covered by some entry
> owned by the corresponding compilation unit, followed by the non-zero length
> of that range.
In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.
Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact,
the AsmPrinter does try to avoid emitting such zero-sized symbols when labels
aren't involved, but doesn't when the symbol to emitted is a difference of two
labels; this patch extends that logic to handle the case in which the symbol is
defined via labels.
Furthermore, this patch fixes a bug in which `available_externally` symbols
would cause unpredictable values to be emitted into the `.debug_aranges` table
under certain circumstances. In practice I don't believe that this caused
issues up until now, but the root cause of this bug--an invalid DenseMap
lookup--triggered failures in Chromium when combined with an earlier version of
this patch. Therefore, this patch fixes that bug too.
This is a revised version of diff D126257, which was reverted due to breaking
tests. The now-reverted version of this patch didn't distinguish between
symbols that didn't have their size reported to the DwarfDebug handler and
those that had their size reported to be zero. This new version of the patch
instead restricts the special handling only to the symbols whose size is
definitively known to be zero.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D126835
This reverts commit 256a52d9aac8a9e98fbfd6a3d91090bf127cef7d (and
also the follow-up commit 38eb4fe74b3843ab0d7fc1e that moved a test
case to a different directory).
As discussed in https://reviews.llvm.org/D126257 there is a suspicion
that something was wrong with this commit as text section range was
shortened to 1 byte rather than rounded up as shown in the
llvm/test/DebugInfo/X86/dwarf-aranges.ll test case.
This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table, by rounding their size up to 1. Entries with zero
length violate the DWARF 5 spec, which states:
> Each descriptor is a triple consisting of a segment selector, the beginning
> address within that segment of a range of text or data covered by some entry
> owned by the corresponding compilation unit, followed by the non-zero length
> of that range.
In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.
Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact,
the AsmPrinter does try to avoid emitting such zero-sized symbols when labels
aren't involved, but doesn't when the symbol to emitted is a difference of two
labels; this patch extends that logic to handle the case in which the symbol is
defined via labels.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D126257
Returning `std::array<uint8_t, N>` is better ergonomics for the hashing functions usage, instead of a `StringRef`:
* When returning `StringRef`, client code is "jumping through hoops" to do string manipulations instead of dealing with fixed array of bytes directly, which is more natural
* Returning `std::array<uint8_t, N>` avoids the need for the hasher classes to keep a field just for the purpose of wrapping it and returning it as a `StringRef`
As part of this patch also:
* Introduce `TruncatedBLAKE3` which is useful for using BLAKE3 as the hasher type for `HashBuilder` with non-default hash sizes.
* Make `MD5Result` inherit from `std::array<uint8_t, 16>` which improves & simplifies its API.
Differential Revision: https://reviews.llvm.org/D123100
We could only do this in limited ways (since we emit the TUs first, we
can't use ref_addr (& we can't use that in Split DWARF either) - so we
had to synthesize declarations into the TUs) and they were ambiguous in
some cases (if the CU type had internal linkage, parsing the TU would
require knowing which CU was referencing the TU to know which type the
declaration was for, which seems not-ideal). So to avoid all that, let's
just not reference types defined in the CU from TUs - instead moving the
TU type into the CU (recursively).
This does increase debug info size (by pulling more things out of type
units, into the compile unit) - about 2% of uncompressed dwp file size
for clang -O0 -g -gsplit-dwarf. (5% .debug_info.dwo section size
increase in the .dwp)
This reverts commit ab4756338c5b2216d52d9152b2f7e65f233c4dac.
Breaks some cases, including this:
namespace {
template <typename> struct a {};
} // namespace
class c {
c();
};
class b {
b();
a<c> ax;
};
b::b() {}
c::c() {}
By producing a reference to a type unit for "c" but not producing the type unit.
Doing this causes a declaration of the internal linkage (anonymous
namespace) type to be emitted in the type unit, which would then be
ambiguous as to which internal linkage definition it refers to (since
the name is only valid internally).
It's possible these internal linkage types could be resolved relative to
the unit the TU is referred to from - but that doesn't seem ideal, and
there's no reason to put the type in a type unit since it can only be
defined in one CU anyway (since otherwise it'd be an ODR violation) & so
avoiding the type unit should be a smaller DWARF encoding anyway.
This also addresses an issue with Simplified Template Names where the
template parameter could not be rebuilt from the declaration emitted
into the TU (specifically for an enum non-type template parameter, where
looking up the enumerators is necessary to rebuild the full template
name)
Commit 2bddab25dba8d4b0 removed a piece of code from
DwarfDebug::emitDebugLocEntry that according to code comments
"Make sure comments stay aligned".
This patch restores that piece of code, together with the addition
of some extra checks in an existing lit test to work as a regression
test. Without this patch we incorrectly get
.byte 159 # 0
instead of
.byte 159 # DW_OP_stack_value
Differential Revision: https://reviews.llvm.org/D117441
Instead of hashing DIE offsets, hash DIE references the same as they
would be when used outside of a loclist - that is, deep hash the type on
first use, and hash the numbering on subsequent uses.
This does produce different hashes for different type references, where
it did not before (because we were hashing zero all the time - so it
didn't matter what type was referenced, the hash would be identical).
This also allows us to enforce that the DIE offset (& size) is not
queried before it is used (which came up while investigating another bug
recently).
Causes invalid debug_gnu_pubnames (& I think non-gnu pubnames too) -
visible as 0 values for the offset in gnu pubnames. More details on the
original review in D115325.
This reverts commit 78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2.
This reverts commit 54586582d3e17c0bba76258d2930486a6dfaaf2a.
Try to revert D113741 once again.
This also reverts 0ac75e82fff93a80ca401d3db3541e8d1d9098f9 (D114705)
as it causes LLDB's lldb-api.lang/cpp/nsimport.TestCppNsImport.py test
failure w/o D113741.
This reverts commit f9607d45f399e2afc39ec16222ea68b4e0831564.
Differential Revision: https://reviews.llvm.org/D116225
This patch causes invalid DWARF to be generated in some cases of LTO +
Split DWARF - follow-up on the original review thread (D113741) contains
further detail and test cases.
This reverts commit 75b622a7959479ccdb758ebe0973061b7b4fcacd.
This reverts commit b6ccca217c35a95b8c2a337a7801b37cb23dbae2.
This reverts commit 514d37441918dd04a2cd4f35c08959d7c3ff096d.
Fixes https://llvm.org/PR51087: Extraneous enum record in DWARF with type units.
As explained in PR51087 we sometimes get skeleton DIEs for enums in a Dwarf
Compile Unit (CU) that are not referenced from any CU and are already described
by a type unit.
Types for enums are emitted whether used or not, all together before most types
in the CU. Mechanically, the extraneous CU records are generated because the
enum types are generated with a call to CU->getOrCreateTypeDIE. This function
will recursively get-or-create the parent DIE (in the CU) and the type unit for
each. We don't need the CU-side DIEs if the type units are sucesfully
emitted. Fix by only emitting the type units for enums if possible, falling back
to a call to getOrCreateTypeDIE if not. Do the same for retained types.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D115325
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
local imported entities; related types can be created as a part of
the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).
Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.
The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
from this time IR can be significantly changed by target-specific passes.
If it happens for debug metadata of global entities, those changes will not
be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
but it isn't known in DwarfDebug::beginModule() (it's possible to make some
guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
(subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
(they need parent emitted first). Another problem is if to try to gather
some information about local entities and defer their emission
(till subprogram's processing or DwarfDebug::endModule()) all the gathered
details might be irrelevant / invalid by the time the entities are being
emitted (because of (1)).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114705
This reverts commits
* ee691970a9a85470948ada623c31f0ab8773617c (D113741),
* 79d3132998b2828be8f7d2ec411f91fb11b3e01f (D114705)
due to lldb and dexter test failures.
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
local imported entities; related types can be created as a part of
the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).
Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.
The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
from this time IR can be significantly changed by target-specific passes.
If it happens for debug metadata of global entities, those changes will not
be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
but it isn't known in DwarfDebug::beginModule() (it's possible to make some
guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
(subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
(they need parent emitted first). Another problem is if to try to gather
some information about local entities and defer their emission
(till subprogram's processing or DwarfDebug::endModule()) all the gathered
details might be irrelevant / invalid by the time the entities are being
emitted (because of (1)).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114705
In a LTO build, the `end_sequence` in debug_line table for each compile unit (CU) points the end of text section which merged all CUs. The `end_sequence` needs to point to the end of each CU's range. This bug often causes invalid `debug_line` table in the final `.dSYM` binary for MachO after running `dsymutil` which tries to compensate an out-of-range address of `end_sequence`.
The fix is to sync the line table termination with the range operations that are already maintained in DwarfDebug. When CU or section changes, or nodebug functions appear or module is finished, the prior pending line table is terminated using the last range label. In the MC path where no range is tracked, the old logic is conservatively used to end the line table using the section end symbol.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D108261