[DebugInfo][DWARF] Emit Per-Function Line Table Offsets and End Sequences (#110192)

**Summary**

This patch introduces a new compiler option `-mllvm
-emit-func-debug-line-table-offsets` that enables the emission of
per-function line table offsets and end sequences in DWARF debug
information. This enhancement allows tools and debuggers to accurately
attribute line number information to their corresponding functions, even
in scenarios where functions are merged or share the same address space
due to optimizations like Identical Code Folding (ICF) in the linker.

**Background**

RFC: [New DWARF Attribute for Symbolication of Merged
Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434)

Previous similar PR:
[#93137](https://github.com/llvm/llvm-project/pull/93137). That PR was
very similar to this one, but at the time the assembler had no support
for emitting labels within the line table. That support was added in PR
[#99710](https://github.com/llvm/llvm-project/pull/99710), and this PR
builds on it.

In the current implementation, Clang generates line information in the
`.debug_line` section without directly associating line entries with
their originating `DW_TAG_subprogram` DIEs. This can lead to issues when
post-compilation optimizations merge functions, resulting in overlapping
address ranges and ambiguous line information.

For example, when functions are merged by ICF in LLD, multiple functions
may end up sharing the same address range. Without explicit linkage
between functions and their line entries, tools cannot accurately
attribute line information to the correct function, adversely affecting
debugging and call stack resolution.
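
The ambiguity can be illustrated with a minimal sketch. The `Subprogram`
struct and the `subprogramsCovering` helper below are hypothetical and
are not LLVM APIs. They model a tool that has only the `DW_AT_low_pc`
and `DW_AT_high_pc` ranges to go on, and the addresses in the usage note
are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of a DW_TAG_subprogram DIE: a name plus
// the half-open address range [LowPC, HighPC).
struct Subprogram {
  std::string Name;
  uint64_t LowPC;
  uint64_t HighPC;
};

// Return the names of all subprograms whose range covers Addr. After ICF
// folds two functions together, more than one name can come back.
std::vector<std::string>
subprogramsCovering(const std::vector<Subprogram> &SPs, uint64_t Addr) {
  std::vector<std::string> Names;
  for (const Subprogram &SP : SPs) {
    assert(SP.LowPC <= SP.HighPC && "malformed address range");
    if (SP.LowPC <= Addr && Addr < SP.HighPC)
      Names.push_back(SP.Name);
  }
  return Names;
}
```

If `foo` and `bar` are both recorded at `[0x1000, 0x1010)` after
folding, a lookup of `0x1004` returns both names, and the address alone
cannot tell which function's line entries apply.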


**Implementation Details**

To address the above issue, the patch makes the following key changes:

**`DW_AT_LLVM_stmt_sequence` Attribute**: Adds a new LLVM-specific
attribute, `DW_AT_LLVM_stmt_sequence`, to each `DW_TAG_subprogram` DIE.
The attribute is emitted via a label and holds the offset in the line
table (form `DW_FORM_sec_offset`) where the function's line entries
begin.
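
As a rough sketch of how a consumer might use the attribute: the `Row`
and `Sequence` types and both helpers below are hypothetical, not part
of LLVM. The sketch assumes the line table has already been decoded into
sequences keyed by their starting byte offset in `.debug_line`. The
offsets `0x43` and `0x56` in the usage note are borrowed from the test
in this patch, while the shared address range is invented to mimic two
ICF-merged functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical decoded line-table row.
struct Row {
  uint64_t Address;
  uint32_t Line;
};

// Hypothetical decoded sequence of rows.
struct Sequence {
  uint64_t StartOffset;  // Byte offset in .debug_line where it begins.
  uint64_t EndAddress;   // Address carried by the end_sequence row.
  std::vector<Row> Rows; // Rows in non-decreasing address order.
};

// Pick the sequence whose start offset equals a subprogram's
// DW_AT_LLVM_stmt_sequence value; nullptr if there is none.
const Sequence *findSequence(const std::vector<Sequence> &Table,
                             uint64_t StmtSequence) {
  for (const Sequence &S : Table)
    if (S.StartOffset == StmtSequence)
      return &S;
  return nullptr;
}

// Look up the source line for Addr using only the rows of one sequence.
std::optional<uint32_t> lineForAddress(const Sequence &S, uint64_t Addr) {
  assert(std::is_sorted(S.Rows.begin(), S.Rows.end(),
                        [](const Row &A, const Row &B) {
                          return A.Address < B.Address;
                        }) &&
         "rows must be ordered by address");
  if (S.Rows.empty() || Addr < S.Rows.front().Address || Addr >= S.EndAddress)
    return std::nullopt;
  uint32_t Line = S.Rows.front().Line;
  for (const Row &R : S.Rows) {
    if (R.Address > Addr)
      break;
    Line = R.Line; // Last row at or below Addr wins.
  }
  return Line;
}
```

With two sequences at offsets `0x43` and `0x56` that both cover
`[0x1000, 0x1010)`, looking up `0x1006` through the first can yield one
function's line and through the second the other's. The attribute value,
not the address, selects which sequence is consulted.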

**End-of-Sequence Markers**: Emits an explicit `DW_LNE_end_sequence`
after each function's line entries in the line table. This marks the end
of the line information for that function, ensuring that line entries
are correctly delimited.
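
The effect of the per-function terminators can be sketched from the
consumer side. The `Row` type and `splitIntoSequences` helper below are
hypothetical, not LLVM APIs. They assume the line program has already
been decoded into a flat list of rows, and the rows in the usage note
mirror the expected output of the test in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical decoded line-table row; EndSequence is set on the row
// produced by DW_LNE_end_sequence.
struct Row {
  uint64_t Address;
  uint32_t Line;
  bool EndSequence;
};

// Group a flat list of rows into sequences, closing a sequence at each
// end_sequence row. Trailing rows without a terminator are ignored in
// this sketch.
std::vector<std::vector<Row>>
splitIntoSequences(const std::vector<Row> &Rows) {
  std::vector<std::vector<Row>> Sequences;
  std::vector<Row> Current;
  for (const Row &R : Rows) {
    assert((Current.empty() || Current.back().Address <= R.Address) &&
           "addresses must not decrease within a sequence");
    Current.push_back(R);
    if (R.EndSequence) {
      Sequences.push_back(std::move(Current));
      Current.clear(); // Reset the moved-from vector for the next sequence.
    }
  }
  return Sequences;
}
```

Fed the seven rows from the test (three for `func01` ending at address
`0x6`, four for `main` ending at `0x1f`), this yields two sequences, one
per function.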

**Assembler and Streamer Modifications**: Modifies `MCStreamer` and
related classes to support emitting the necessary labels and tracking
the current function's line entries. A new flag,
`GenerateFuncLineTableOffsets`, is added to control this behavior.

**Compiler Option**: Introduces the `-mllvm
-emit-func-debug-line-table-offsets` option to enable this
functionality, allowing users to opt in as needed.
alx32 2024-11-13 18:51:34 -08:00 committed by GitHub
parent e9aee4fd80
commit f407dff50c
8 changed files with 180 additions and 11 deletions

View File

@ -618,6 +618,7 @@ HANDLE_DW_AT(0x3e08, LLVM_ptrauth_isa_pointer, 0, LLVM)
HANDLE_DW_AT(0x3e09, LLVM_ptrauth_authenticates_null_values, 0, LLVM)
HANDLE_DW_AT(0x3e0a, LLVM_ptrauth_authentication_mode, 0, LLVM)
HANDLE_DW_AT(0x3e0b, LLVM_num_extra_inhabitants, 0, LLVM)
HANDLE_DW_AT(0x3e0c, LLVM_stmt_sequence, 0, LLVM)
// Apple extensions.

View File

@ -313,6 +313,8 @@ public:
void setAllowAutoPadding(bool v) { AllowAutoPadding = v; }
bool getAllowAutoPadding() const { return AllowAutoPadding; }
MCSymbol *emitLineTableLabel();
/// When emitting an object file, create and emit a real label. When emitting
/// textual assembly, this should do nothing to avoid polluting our output.
virtual MCSymbol *emitCFILabel();

View File

@ -49,6 +49,12 @@ cl::opt<cl::boolOrDefault> AddLinkageNamesToDeclCallOrigins(
"referenced by DW_AT_call_origin attributes. Enabled by default "
"for -gsce debugger tuning."));
static cl::opt<bool> EmitFuncLineTableOffsetsOption(
"emit-func-debug-line-table-offsets", cl::Hidden,
cl::desc("Include line table offset in function's debug info and emit end "
"sequence after each function's line data."),
cl::init(false));
static bool AddLinkageNamesToDeclCallOriginsForTuning(const DwarfDebug *DD) {
bool EnabledByDefault = DD->tuneForSCE();
if (EnabledByDefault)
@ -511,7 +517,8 @@ void DwarfCompileUnit::addWasmRelocBaseGlobal(DIELoc *Loc, StringRef GlobalName,
// Find DIE for the given subprogram and attach appropriate DW_AT_low_pc
// and DW_AT_high_pc attributes. If there are global variables in this
// scope then create and insert DIEs for these variables.
DIE &DwarfCompileUnit::updateSubprogramScopeDIE(const DISubprogram *SP) {
DIE &DwarfCompileUnit::updateSubprogramScopeDIE(const DISubprogram *SP,
MCSymbol *LineTableSym) {
DIE *SPDie = getOrCreateSubprogramDIE(SP, includeMinimalInlineScopes());
SmallVector<RangeSpan, 2> BB_List;
// If basic block sections are on, ranges for each basic block section has
@ -526,6 +533,12 @@ DIE &DwarfCompileUnit::updateSubprogramScopeDIE(const DISubprogram *SP) {
*DD->getCurrentFunction()))
addFlag(*SPDie, dwarf::DW_AT_APPLE_omit_frame_ptr);
if (emitFuncLineTableOffsets() && LineTableSym) {
addSectionLabel(
*SPDie, dwarf::DW_AT_LLVM_stmt_sequence, LineTableSym,
Asm->getObjFileLowering().getDwarfLineSection()->getBeginSymbol());
}
// Only include DW_AT_frame_base in full debug info
if (!includeMinimalInlineScopes()) {
const TargetFrameLowering *TFI = Asm->MF->getSubtarget().getFrameLowering();
@ -1096,8 +1109,9 @@ sortLocalVars(SmallVectorImpl<DbgVariable *> &Input) {
}
DIE &DwarfCompileUnit::constructSubprogramScopeDIE(const DISubprogram *Sub,
LexicalScope *Scope) {
DIE &ScopeDIE = updateSubprogramScopeDIE(Sub);
LexicalScope *Scope,
MCSymbol *LineTableSym) {
DIE &ScopeDIE = updateSubprogramScopeDIE(Sub, LineTableSym);
if (Scope) {
assert(!Scope->getInlinedAt());
@ -1691,6 +1705,10 @@ bool DwarfCompileUnit::includeMinimalInlineScopes() const {
(DD->useSplitDwarf() && !Skeleton);
}
bool DwarfCompileUnit::emitFuncLineTableOffsets() const {
return EmitFuncLineTableOffsetsOption;
}
void DwarfCompileUnit::addAddrTableBase() {
const TargetLoweringObjectFile &TLOF = Asm->getObjFileLowering();
MCSymbol *Label = DD->getAddressPool().getLabel();

View File

@ -152,6 +152,8 @@ public:
bool includeMinimalInlineScopes() const;
bool emitFuncLineTableOffsets() const;
void initStmtList();
/// Apply the DW_AT_stmt_list from this compile unit to the specified DIE.
@ -207,10 +209,10 @@ public:
void attachLowHighPC(DIE &D, const MCSymbol *Begin, const MCSymbol *End);
/// Find DIE for the given subprogram and attach appropriate
/// DW_AT_low_pc and DW_AT_high_pc attributes. If there are global
/// variables in this scope then create and insert DIEs for these
/// variables.
DIE &updateSubprogramScopeDIE(const DISubprogram *SP);
/// DW_AT_low_pc, DW_AT_high_pc and DW_AT_LLVM_stmt_sequence attributes.
/// If there are global variables in this scope then create and insert DIEs
/// for these variables.
DIE &updateSubprogramScopeDIE(const DISubprogram *SP, MCSymbol *LineTableSym);
void constructScopeDIE(LexicalScope *Scope, DIE &ParentScopeDIE);
@ -254,8 +256,8 @@ public:
DIE *getOrCreateContextDIE(const DIScope *Ty) override;
/// Construct a DIE for this subprogram scope.
DIE &constructSubprogramScopeDIE(const DISubprogram *Sub,
LexicalScope *Scope);
DIE &constructSubprogramScopeDIE(const DISubprogram *Sub, LexicalScope *Scope,
MCSymbol *LineTableSym);
DIE *createAndAddScopeChildren(LexicalScope *Scope, DIE &ScopeDIE);

View File

@ -2361,6 +2361,9 @@ void DwarfDebug::beginFunctionImpl(const MachineFunction *MF) {
return;
DwarfCompileUnit &CU = getOrCreateDwarfCompileUnit(SP->getUnit());
FunctionLineTableLabel = CU.emitFuncLineTableOffsets()
? Asm->OutStreamer->emitLineTableLabel()
: nullptr;
Asm->OutStreamer->getContext().setDwarfCompileUnitID(
getDwarfCompileUnitIDForLineTable(CU));
@ -2474,11 +2477,14 @@ void DwarfDebug::endFunctionImpl(const MachineFunction *MF) {
}
ProcessedSPNodes.insert(SP);
DIE &ScopeDIE = TheCU.constructSubprogramScopeDIE(SP, FnScope);
DIE &ScopeDIE =
TheCU.constructSubprogramScopeDIE(SP, FnScope, FunctionLineTableLabel);
if (auto *SkelCU = TheCU.getSkeleton())
if (!LScopes.getAbstractScopesList().empty() &&
TheCU.getCUNode()->getSplitDebugInlining())
SkelCU->constructSubprogramScopeDIE(SP, FnScope);
SkelCU->constructSubprogramScopeDIE(SP, FnScope, FunctionLineTableLabel);
FunctionLineTableLabel = nullptr;
// Construct call site entries.
constructCallSiteEntryDIEs(*SP, TheCU, ScopeDIE, *MF);

View File

@ -410,6 +410,9 @@ class DwarfDebug : public DebugHandlerBase {
std::pair<std::unique_ptr<DwarfTypeUnit>, const DICompositeType *>, 1>
TypeUnitsUnderConstruction;
/// Symbol pointing to the current function's DWARF line table entries.
MCSymbol *FunctionLineTableLabel;
/// Used to set a uniqe ID for a Type Unit.
/// This counter represents number of DwarfTypeUnits created, not necessarily
/// number of type units that will be emitted.

View File

@ -483,6 +483,20 @@ void MCStreamer::emitCFIEndProcImpl(MCDwarfFrameInfo &Frame) {
Frame.End = (MCSymbol *)1;
}
MCSymbol *MCStreamer::emitLineTableLabel() {
// Create a label and insert it into the line table and return this label
const MCDwarfLoc &DwarfLoc = getContext().getCurrentDwarfLoc();
MCSymbol *LineStreamLabel = getContext().createTempSymbol();
MCDwarfLineEntry LabelLineEntry(nullptr, DwarfLoc, LineStreamLabel);
getContext()
.getMCDwarfLineTable(getContext().getDwarfCompileUnitID())
.getMCLineSections()
.addLineEntry(LabelLineEntry, getCurrentSectionOnly() /*Section*/);
return LineStreamLabel;
}
MCSymbol *MCStreamer::emitCFILabel() {
// Return a dummy non-null value so that label fields appear filled in when
// generating textual assembly.

View File

@ -0,0 +1,123 @@
; RUN: llc -O3 -mtriple=i686-w64-mingw32 -o %t_no -filetype=obj %s
; RUN: llvm-dwarfdump -v -all %t_no | FileCheck %s -check-prefix=NO_STMT_SEQ
; RUN: llc -O3 -mtriple=i686-w64-mingw32 -o %t_yes -filetype=obj %s -emit-func-debug-line-table-offsets
; RUN: llvm-dwarfdump -v -all %t_yes | FileCheck %s -check-prefix=STMT_SEQ
; NO_STMT_SEQ-NOT: DW_AT_LLVM_stmt_sequence
; STMT_SEQ: [[[ABBREV_CODE1:[0-9]+]]] DW_TAG_subprogram
; STMT_SEQ: DW_AT_LLVM_stmt_sequence DW_FORM_sec_offset
; STMT_SEQ: [[[ABBREV_CODE2:[0-9]+]]] DW_TAG_subprogram
; STMT_SEQ: DW_AT_LLVM_stmt_sequence DW_FORM_sec_offset
; STMT_SEQ: DW_TAG_subprogram [[[ABBREV_CODE1]]]
; STMT_SEQ: DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset] (0x00000043)
; STMT_SEQ: DW_AT_name {{.*}}func01
; STMT_SEQ: DW_TAG_subprogram [[[ABBREV_CODE2]]]
; STMT_SEQ: DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset] (0x00000056)
; STMT_SEQ: DW_AT_name {{.*}}main
;; Check the entire line sequence to see that it's correct
; STMT_SEQ: Address Line Column File ISA Discriminator OpIndex Flags
; STMT_SEQ-NEXT: ------------------ ------ ------ ------ --- ------------- ------- -------------
; STMT_SEQ-NEXT: 0x00000043: 04 DW_LNS_set_file (0)
; STMT_SEQ-NEXT: 0x00000045: 05 DW_LNS_set_column (9)
; STMT_SEQ-NEXT: 0x00000047: 0a DW_LNS_set_prologue_end
; STMT_SEQ-NEXT: 0x00000048: 00 DW_LNE_set_address (0x00000000)
; STMT_SEQ-NEXT: 0x0000004f: 16 address += 0, line += 4, op-index += 0
; STMT_SEQ-NEXT: 0x0000000000000000 5 9 0 0 0 0 is_stmt prologue_end
; STMT_SEQ-NEXT: 0x00000050: 05 DW_LNS_set_column (3)
; STMT_SEQ-NEXT: 0x00000052: 67 address += 6, line += 1, op-index += 0
; STMT_SEQ-NEXT: 0x0000000000000006 6 3 0 0 0 0 is_stmt
; STMT_SEQ-NEXT: 0x00000053: 00 DW_LNE_end_sequence
; STMT_SEQ-NEXT: 0x0000000000000006 6 3 0 0 0 0 is_stmt end_sequence
; STMT_SEQ-NEXT: 0x00000056: 04 DW_LNS_set_file (0)
; STMT_SEQ-NEXT: 0x00000058: 00 DW_LNE_set_address (0x00000008)
; STMT_SEQ-NEXT: 0x0000005f: 03 DW_LNS_advance_line (10)
; STMT_SEQ-NEXT: 0x00000061: 01 DW_LNS_copy
; STMT_SEQ-NEXT: 0x0000000000000008 10 0 0 0 0 0 is_stmt
; STMT_SEQ-NEXT: 0x00000062: 05 DW_LNS_set_column (10)
; STMT_SEQ-NEXT: 0x00000064: 0a DW_LNS_set_prologue_end
; STMT_SEQ-NEXT: 0x00000065: 83 address += 8, line += 1, op-index += 0
; STMT_SEQ-NEXT: 0x0000000000000010 11 10 0 0 0 0 is_stmt prologue_end
; STMT_SEQ-NEXT: 0x00000066: 05 DW_LNS_set_column (3)
; STMT_SEQ-NEXT: 0x00000068: 9f address += 10, line += 1, op-index += 0
; STMT_SEQ-NEXT: 0x000000000000001a 12 3 0 0 0 0 is_stmt
; STMT_SEQ-NEXT: 0x00000069: 02 DW_LNS_advance_pc (addr += 5, op-index += 0)
; STMT_SEQ-NEXT: 0x0000006b: 00 DW_LNE_end_sequence
; STMT_SEQ-NEXT: 0x000000000000001f 12 3 0 0 0 0 is_stmt end_sequence
; generated from:
; clang -Oz -g -S -emit-llvm test.c -o test.ll
; ======= test.c ======
; volatile int g_var1 = 1;
; #define ATTR __attribute__((noinline))
; ATTR int func01() {
; g_var1++;
; func01();
; return 1;
; }
; ATTR int main() {
; g_var1 = 100;
; func01();
; g_var1--;
; return g_var1;
; }
; =====================
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@g_var1 = dso_local global i32 1, align 4, !dbg !0
; Function Attrs: minsize nofree noinline norecurse noreturn nounwind optsize memory(readwrite, argmem: none) uwtable
define dso_local noundef i32 @func01() local_unnamed_addr #0 !dbg !14 {
entry:
br label %tailrecurse
tailrecurse: ; preds = %tailrecurse, %entry
%0 = load volatile i32, ptr @g_var1, align 4, !dbg !17, !tbaa !18
%inc = add nsw i32 %0, 1, !dbg !17
store volatile i32 %inc, ptr @g_var1, align 4, !dbg !17, !tbaa !18
br label %tailrecurse, !dbg !22
}
; Function Attrs: minsize nofree noinline norecurse noreturn nounwind optsize uwtable
define dso_local noundef i32 @main() local_unnamed_addr #1 !dbg !23 {
entry:
store volatile i32 100, ptr @g_var1, align 4, !dbg !24, !tbaa !18
%call = tail call i32 @func01() #2, !dbg !25
unreachable, !dbg !26
}
attributes #0 = { minsize nofree noinline norecurse noreturn nounwind optsize memory(readwrite, argmem: none) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { minsize nofree noinline norecurse noreturn nounwind optsize uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #2 = { minsize optsize }
!llvm.dbg.cu = !{!2}
!llvm.module.flags = !{!7, !8, !9, !10, !11, !12}
!llvm.ident = !{!13}
!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "g_var1", scope: !2, file: !3, line: 1, type: !5, isLocal: false, isDefinition: true)
!2 = distinct !DICompileUnit(language: DW_LANG_C11, file: !3, producer: "clang version 20.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, globals: !4, splitDebugInlining: false, nameTableKind: None)
!3 = !DIFile(filename: "test.c", directory: "/tmp/tst", checksumkind: CSK_MD5, checksum: "eee003eb3c4fd0a1ff078d3148679e06")
!4 = !{!0}
!5 = !DIDerivedType(tag: DW_TAG_volatile_type, baseType: !6)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{i32 7, !"Dwarf Version", i32 5}
!8 = !{i32 2, !"Debug Info Version", i32 3}
!9 = !{i32 1, !"wchar_size", i32 4}
!10 = !{i32 8, !"PIC Level", i32 2}
!11 = !{i32 7, !"PIE Level", i32 2}
!12 = !{i32 7, !"uwtable", i32 2}
!13 = !{!"clang version 20.0.0"}
!14 = distinct !DISubprogram(name: "func01", scope: !3, file: !3, line: 4, type: !15, scopeLine: 4, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
!15 = !DISubroutineType(types: !16)
!16 = !{!6}
!17 = !DILocation(line: 5, column: 9, scope: !14)
!18 = !{!19, !19, i64 0}
!19 = !{!"int", !20, i64 0}
!20 = !{!"omnipotent char", !21, i64 0}
!21 = !{!"Simple C/C++ TBAA"}
!22 = !DILocation(line: 6, column: 3, scope: !14)
!23 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 10, type: !15, scopeLine: 10, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
!24 = !DILocation(line: 11, column: 10, scope: !23)
!25 = !DILocation(line: 12, column: 3, scope: !23)
!26 = !DILocation(line: 13, column: 9, scope: !23)