Extend `DebugInfoFinder` to collect and expose macro debug information
(`DIMacro` and `DIMacroFile` nodes).
Also update `ModuleDebugInfoPrinter` to display macro information
including the macro type, name, value, and source location.
-----
The motivation behind this PR is that `DebugInfoFinder` is key for the
[SPIRV-LLVM-Translator](https://github.com/KhronosGroup/SPIRV-LLVM-Translator)
and also for future support of debug info in the SPIRV backend in LLVM.
This new lookup of `DIMacro` with their `DIMacroFile` when available
simplifies the logic around the translation for this debug information.
This is an attempt to merge https://reviews.llvm.org/D144006 with LTO
fix.
The last merge attempt was
https://github.com/llvm/llvm-project/pull/75385.
The issue with it was investigated in
https://github.com/llvm/llvm-project/pull/75385#issuecomment-2386684121.
The problem happens when
1. Several modules are being linked.
2. There are several DISubprograms that initially belong to different
modules but represent the same source code function (for example, a
function included from the same source code file).
3. Some of such DISubprograms survive IR linking. It may happen if one
of them is inlined somewhere or if the functions that have these
DISubprograms attached have internal linkage.
4. Each of these DISubprograms has a local type that corresponds to the
same source code type. These types are initially from different modules,
but have the same ODR identifier.
If the same (in the sense of ODR identifier/ODR uniquing rules) local
type is present in two modules, and these modules are linked together,
the type gets uniqued. A DIType, that happens to be loaded first,
survives linking, and the references on other types with the same ODR
identifier from the modules loaded later are replaced with the
references on the DIType loaded first. Since defintion subprograms, in
scope of which these types are located, are not deduplicated, the linker
output may contain multiple DISubprogram's having the same (uniqued)
type in their retainedNodes lists.
Further compilation of such modules causes crashes.
To tackle that,
* previous solution to handle LTO linking with local types in
retainedNodes is removed (cloneLocalTypes() function),
* for each loaded distinct (definition) DISubprogram, its retainedNodes
list is scanned after loading, and DITypes with a scope of another
subprogram are removed. If something from a Function corresponding to
the DISubprogram references uniqued type, we rely on cross-CU links.
Additionally:
* a check is added to Verifier to report about local types located in a
wrong retainedNodes list,
Original commit message follows.
---------
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Similar to imported declarations, the patch tracks function-local types in
DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
the aforementioned metadata change and provided a support of function-local
types scoped within a lexical block.
The patch assumes that DICompileUnit's 'enums field' no longer tracks local
types and DwarfDebug would assert if any locally-scoped types get placed there.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>
For swift async code, we need to use a debug intrinsic that behaves like
an llvm.dbg.declare but can take any location type rather than just a
pointer or integer.
To solve this, a new debug instrinsic called llvm.dbg.declare_value has
been created, which behaves exactly like an llvm.dbg.declare but can
take non pointer and integer location types.
More information here:
https://discourse.llvm.org/t/rfc-introduce-new-llvm-dbg-coroframe-entry-intrinsic/88269
This is the first patch as part of a stack of patches, with the one
succeeding it being: https://github.com/llvm/llvm-project/pull/168134
These checks ensure that retained nodes of a DISubprogram belong to the
subprogram.
Tests with incorrect IR are fixed. We should not have variables of one subprogram present in retained nodes of other subprograms.
Also, interface for accessing DISubprogram's retained nodes is slightly
refactored. `DISubprogram::visitRetainedNodes` and
`DISubprogram::forEachRetainedNode` are added to avoid repeating checks
like
```
if (const auto *LV = dyn_cast<DILocalVariable>(N))
...
else if (const auto *L = dyn_cast<DILabel>(N))
...
else if (const auto *IE = dyn_cast<DIImportedEntity>(N))
...
```
This patch sets up `DICompileUnit` to support the DWARFv6
`DW_AT_language_name` and `DW_AT_language_version` attributes (which are
set to replace `DW_AT_language`). This patch changes the
`DICompileUnit::SourceLanguage` field type to a `DISourceLanguageName`
that encapsulates the notion of "versioned vs. unversioned name". A
"versioned" name is one that has an associated version stored separately
in `DISourceLanguageName::Version`.
This patch just changes all the clients of the `getSourceLanguage` API
to the expect a `DISourceLanguageName`. Currently they all just `assert`
(via `DISourceLanguageName::getUnversionedName`) that we're dealing with
"unversioned names" (i.e., the pre-DWARFv6 language codes). In follow-up
patches (e.g., draft is at
https://github.com/llvm/llvm-project/pull/162261), when we start
emitting versioned language codes, the `getUnversionedName` calls can
then be adjusted to `getName`.
**Implementation considerations**
* We could have added a new member to `DICompileUnit` alongside the
existing `SourceLanguage` field. I don't think this would have made the
transition any simpler (clients would still need to be aware of
"versioned" vs. "unversioned" language names). I felt that encapsulating
this inside a `DISourceLanguageName` was easier to reason about for
maintainers.
* Currently DISourceLanguageName is a `12` byte structure. We could
probably pack all the info inside a `uint64_t` (16-bits for the name,
32-bits for the version, 1-bit for answering the `hasVersionedName`).
Just to keep the prototype simple I used a `std::optional`. But since
the guts of the structure are hidden, we can always change the layout to
a more compact representation instead.
**How to review**
* The new `DISourceLanguageName` structure is defined in
`DebugInfoMetadata.h`. All the other changes fall out from changing the
`DICompileUnit::SourceLanguage` from `unsigned` to
`DISourceLanguageName`.
Inliner/IROutliner/CodeExtractor all uses the
updateLoopMetadataDebugLocations helper in order to modify debug
location related to loop metadata. However, the helper has only
been updating DILocation nodes found as operands to the first level
of the MD_loop metadata. There could however be more DILocations
as part of the various kinds of followup metadata. A typical example
would be llvm.loop metadata like this
!6 = distinct !{!6, !7, !8, !9, !10, !11}
!7 = !DILocation(line: 6, column: 3, scope: !3)
!8 = !DILocation(line: 7, column: 22, scope: !3)
!11 = !{!"llvm.loop.distribute.followup_all", !7, !8, ..., !14}
!14 = !{!"llvm.loop.vectorize.followup_all", !7, !8, ...}
Instead of just updating !7 and !8 in !6, this patch make sure that
we now recursively update the DILocations in !11 and !14 as well.
Fixes#141568
In order to better see what's going on during ThinLTO linking, this PR
adds more profile tags when using `--time-trace` on a `lld-link.exe`
invocation.
After PR, linking `clang.exe`:
<img width="3839" height="2026" alt="Capture d’écran 2025-09-02 082021"
src="https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910"
/>
Linking a custom (Unreal Engine game) binary gives a completly
different picture, probably because of using Unity files, and the sheer
amount of input files (here, providing over 60 GB of .OBJs/.LIBs).
<img width="1940" height="1008" alt="Capture d’écran 2025-09-02 102048"
src="https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0"
/>
At this stage I'm just opportunistically deleting any code using
debug-intrinsic types, largely adjacent to calls to findDbgUsers. I'll
get to deleting that in probably one or more two commits.
This change adds some support to the C DebugInfo capability to create set types,
subrange types, dynamic array types and the ability to replace arrays.
RFC on discourse:
https://discourse.llvm.org/t/rfc-debug-info-for-coroutine-suspension-locations-take-2/86606
With this commit, we add `DILabel` debug infos to the resume points of a
coroutine. Those labels can be used by debugging scripts to figure out
the exact line and column at which a coroutine was suspended by looking
up current `__coro_index` value inside the coroutines frame, and then
searching for the corresponding label inside the coroutine's resume
function.
The DWARF information generated for such a label looks like:
```
0x00000f71: DW_TAG_label
DW_AT_name ("__coro_resume_1")
DW_AT_decl_file ("generator-example.cpp")
DW_AT_decl_line (5)
DW_AT_decl_column (3)
DW_AT_artificial (true)
DW_AT_LLVM_coro_suspend_idx (0x01)
DW_AT_low_pc (0x00000000000019be)
```
The labels can be mapped to their corresponding `__coro_idx` values
either via their naming convention `__coro_resume_<N>` or using the new
`DW_AT_LLVM_coro_suspend_idx` attribute. In gdb, those line numebrs can
be looked up using `info line -function my_coroutine -label
__coro_resume_1`. LLDB unfortunately does not understand DW_TAG_label
debug information, yet.
Given this is an artificial compiler-generated label, I did apply the
DW_AT_artificial tag to it. The DWARFv5 standard only allows that tag on
type and variable definitions, but this is a natural extension and was
also blessed in the RFC on discourse.
Also, this commit adds `DW_AT_decl_column` to labels, not only for
coroutines but also for normal C and C++ labels. While not strictly
necessary, I am doing so now because it would be harder to do so later
without breaking the binary LLVM-IR format
Drive-by fixes: While reading the existing test cases to understand how
to write my own test case, I did a couple of small typo fixes and
comment improvements
The C API does not provide a way to replace the subroutine type after
creating a subprogram. This functionality is useful for creating a
subroutine type composed of types which have the subprogram as scope
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
Part of the coverage-tracking feature, following #107279.
In order for DebugLoc coverage testing to work, we firstly have to set
annotations for intentionally-empty DebugLocs, and secondly we have to
ensure that we do not drop these annotations as we propagate DebugLocs
throughout compilation. As the annotations exist as part of the DebugLoc
class, and not the underlying DILocation, they will not survive a
DebugLoc->DILocation->DebugLoc roundtrip. Therefore this patch modifies
a number of places in the compiler to propagate DebugLocs directly
rather than via the underlying DILocation. This has no effect on the
output of normal builds; it only ensures that during coverage builds, we
do not drop incorrectly annotations and therefore create false
positives.
The bulk of these changes are in replacing
DILocation::getMergedLocation(s) with a DebugLoc equivalent, and in
changing the IRBuilder to store a DebugLoc directly rather than storing
DILocations in its general Metadata array. We also use a new function,
`DebugLoc::orElse`, which selects the "best" DebugLoc out of a pair
(valid location > annotated > empty), preferring the current DebugLoc on
a tie - this encapsulates the existing behaviour at a few sites where we
_may_ assign a DebugLoc to an existing instruction, while extending the
logic to handle annotation DebugLocs at the same time.
Specifically this is the assertion in BasicBlock.cpp. Now that we're not
examining or setting that flag consistently (because it'll be deleted in
about an hour) there's no need to keep this assertion.
Original commit title:
[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)
This reverts commit c71a2e688828ab3ede4fb54168a674ff68396f61.
/me squints -- this is hitting an assertion I thought had been deleted,
will revert and investigate for a bit.
These are opportunistic deletions as more places that make use of the
IsNewDbgInfoFormat flag are removed. It should (TM)(R) all be dead code
now that `IsNewDbgInfoFormat` should be true everywhere.
FastISel: we don't need to do debug-aware instruction counting any more,
because there are no debug instructions,
Autoupgrade: you can no-longer avoid autoupgrading of intrinsics to
records
DIBuilder: Delete the code for creating debug intrinsics (!)
LoopUtils: No need to handle debug instructions, they don't exist
Since https://reviews.llvm.org/D144004, DISubprogram's retainedNodes
field is used to track DIImportedEntities, in addition to local
variables and labels.
However, the corresponding update for DebugInfoFinder, to make it visit
DISubprogram's retainedNodes, was missing.
This is the fix for it.
This change is separated from
https://github.com/llvm/llvm-project/pull/119001 to simplify it.
Reapplied after fixing the config issue that was causing issues following
the previous merge.
This reverts commit fdbf073a86573c9ac4d595fac8e06d252ce1469f.
Since `LLVMDIBuilderCreateEnumerator` only supports up to 64 bits, this
PR adds a new `LLVMDIBuilderCreateEnumeratorOfArbitraryPrecision`
function that takes an arbitrary number of words, based on
`LLVMConstIntOfArbitraryPrecision`. This allows even larger enumeration
types to represent their values exactly. (It seems LLVM should already
support i128 enums since 13.0.0.)
This reverts commit a9d93ecf1f8d2cfe3f77851e0df179b386cff353.
Reverted due to the commit including a config in LLVM headers that is not
available outside of the llvm source tree.
This is part of a series of patches that tries to improve DILocation bug
detection in Debugify; see the review for more details. This is the patch
that adds the main feature, adding a set of `DebugLoc::get<Kind>`
functions that can be used for instructions with intentionally empty
DebugLocs to prevent Debugify from treating them as bugs, removing the
currently-pervasive false positives and allowing us to use Debugify (in
its original DI preservation mode) to reliably detect existing bugs and
regressions. This patch does not add uses of these functions, except for
once in Clang before optimizations, and in
`Instruction::dropLocation()`, since that is an obvious case that
immediately removes a set of false positives.
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or with a deprecation warning an Instruction *, or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
As a special exception, DIBuilder::insertDeclare() keeps a separate
overload accepting a BasicBlock *InsertAtEnd. This is because despite
the name, this method does not insert at the end of the block, therefore
this cannot be handled implicitly by using InsertPosition.
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or with a deprecation warning an Instruction *, or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
This is an upstream proposal from
e60884cb98
We observed malfunctioning StripNonLineTableDebugInfo during debugging
and it's caused by out-of-order evaluation, this is a C++ level semantic
ambiguity issue, refer
https://en.cppreference.com/w/cpp/language/eval_order
Solution is simply separating one line into two.
This reverts commit c3a935e3f967f8f22f5db240d145459ee621c1e0.
The only change to the reverted commit is that this also updates
the OCaml bindings according to the C debug-info API changes.
The build failure originally introduced was:
```
FAILED: bindings/ocaml/debuginfo/debuginfo_ocaml.o /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bindings/ocaml/debuginfo/debuginfo_ocaml.o
cd /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bindings/ocaml/debuginfo && /usr/bin/ocamlfind ocamlc -c /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bindings/ocaml/debuginfo/debuginfo_ocaml.c -ccopt "-I/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/bindings/ocaml/debuginfo/../llvm -D_GNU_SOURCE -D_DEBUG -D_GLIBCXX_ASSERTIONS -DEXPENSIVE_CHECKS -D_GLIBCXX_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/b/1/llvm-clang-x86_64-expensive-checks-debian/build/include -I/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/include -DNDEBUG "
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bindings/ocaml/debuginfo/debuginfo_ocaml.c: In function ‘llvm_dibuild_create_object_pointer_type’:
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bindings/ocaml/debuginfo/debuginfo_ocaml.c:620:30: error: too few arguments to function ‘LLVMDIBuilderCreateObjectPointerType’
620 | LLVMMetadataRef Metadata = LLVMDIBuilderCreateObjectPointerType(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bindings/ocaml/debuginfo/debuginfo_ocaml.c:23:
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/include/llvm-c/DebugInfo.h:880:17: note: declared here
880 | LLVMMetadataRef LLVMDIBuilderCreateObjectPointerType(LLVMDIBuilderRef Builder,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Note that PointerUnion::{is,get,dyn_cast} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
Patch [3/x] to fix structured bindings debug info in SROA.
This function computes a fragment, bit-extract operation if needed, and new
constant offset to describe a part of a variable covered by some memory.
This generalises, simplifies, and replaces at::calculateFragmentIntersect. That
version is still used as a wrapper for now though to keep this change NFC.
The new version takes doesn't have a DbgRecord parameter, instead using an
explicit address and address offset. The old version only operates on
dbg_assigns and this change means it can also operate on dbg_declare records
easily, which it will do in a subsequent patch.
The new version has a new out-param OffsetFromLocationInBits which is set to
the difference between the first bit of the variable location and the first
bit of the memory slice. This will be used in a subsequent patch in SROA to
determine the new offset to use in the address expression after splitting an
alloca.
Fix#91814
When instructions are extracted into a new function the `DIAssignID` metadata
uses and attachments need to be remapped so that the stores and assignment
markers don't link to stores and assignment markers in the original function.
This matches existing inlining behaviour for DIAssignIDs.