This introduces a substantial (5-10%) regression in .debug_info size, so
we're discussing alternatives in #122742 and #124790.
This reverts commit 7c729418d721147bf1f2b257afd30f84721888ad.
https://github.com/llvm/llvm-project/pull/122183 adds a codegen pass to
infer machine jump table entry's hotness from the MBB hotness. This is a
follow-up PR to produce `.hot` and or `.unlikely` section prefix for
jump table's (read-only) data sections in the relocatable `.o` files.
When this patch is enabled, linker will see {`.rodata`, `.rodata.hot`,
`.rodata.unlikely`} in input sections. It can map `.rodata.hot` and
`.rodata` in the input sections to `.rodata.hot` in the executable, and
map `.rodata.unlikely` into `.rodata` with a pending extension to
`--keep-text-section-prefix` like
059e7cbb66,
or with a linker script.
1. To partition hot and jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group.
2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement
`TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on
MJTE's hotness.
- The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does.
3. The new code path is off by default.
- Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it
readable through `TargetMachine`.
- To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in
`CodeGen/CommandFlags.h/cpp`.
- A subsequent patch
([draft](8f36a13743)) will add a clang option to enable the new code path.
---------
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
.gfids$y contains a list of indirect calls for Control Flow Guard. This
wasn't working properly for ARM64EC: direct calls were being treated as
indirect calls. Make sure we correctly filter out direct calls.
This improves the protection from Control Flow Guard, and also fixes a
link error when using certain functions from oldnames.lib.
Also add support for new relocation types required by debug information.
Constants have been taken from CodeView Symbolic Debug Information
Specification.
Currently Clang only attaches `DW_AT_object_pointer` to
`DW_TAG_subprogram` definitions. LLDB constructs C++ method types from
their `DW_TAG_subprogram` declaration, which is also the point at which
it needs to determine whether a method is static or not. LLDB's
heuristic for this could be very simple if we emitted
`DW_AT_object_pointer` on declarations. But since we don't, LLDB has to
guess whether an argument is an implicit object parameter based on the
DW_AT_name and DW_AT_type.
To simplify LLDB's job (and to eventually support C++23's explicit
object parameters), this patch adds the `DIFlagObjectPointer` to
`DISubprogram` declarations.
For reference, GCC attaches the object-pointer DIE to both the
definition and declaration: https://godbolt.org/z/3TWjTfWon
Fixes https://github.com/llvm/llvm-project/issues/120973
This restores the functionality of AsmPrinterHandlers to what it was
prior to https://github.com/llvm/llvm-project/pull/96785. The attempted
hack there of adding a duplicate DebugHandlerBase handling added a lot
of hidden state and assumptions, which just segfaulted when we tried to
continuing using this API. Instead, this just goes back to the old
design, but adds a separate array for the basic EH handles. The
duplicate array is identical to the other array of handler, but which
doesn't get their begin/endInstruction callbacks called. This still
saves the negligible but measurable amount of virtual function calls as
was the goal of #96785, while restoring the API to the pre-LLVM-19
status quo.
Fixes the "use after poison" issue introduced by #121516 (see
<https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>).
The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.
The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
The caller `AsmPrinter::emitJumpTableInfo` checks [1] `MJTI` is not a
null pointer before calling `emitJumpTableEntry` or
`emitJumpTableSizesSection`.
This patch updates callee function's signature to accept const
reference, this way it's explicit `MJTI` won't be nullptr inside the
callee.
[1]
9d5299eb61/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (L2857)
Unsigned subtraction wrap-around occurs in `emitGlobalConstantImpl` on
an AIX-specific code path from 8e4423eb0888 when a structure type has
zero elements.
With assertions enabled, this manifests as:
```
TypeSize llvm::StructLayout::getElementOffset(unsigned int) const: Assertion `Idx < NumElements && "Invalid element idx!"' failed.
```
When creating a Type Unit (TU), LLVM attempts to do so optimistically.
However, if this fails, it discards the TU state and creates the TU
within the Compilation Unit (CU). In such cases, an entry for the
top-level DIE is not created in the debug names table.
This can cause issues when running llvm-dwarfdump --debug-names
--verify, as the missing entry will result in verification failure.
To address this issue, this patch adds a call to the
updateAcceleratorTables when TU creation fails. This ensures that the
debug names table is updated correctly, even in cases where TU creation
fails.
AIX assembly is very different from the gas syntax. We don't expect
other targets to share these differences. Unify the numerous,
essentially AIX-specific variables.
llvm-mc --assemble prints an initial `.text` from `initSections`.
This is weird for quick assembly tasks that do not specify `.text`.
Omit the .text by moving section directive printing from `changeSection`
to `switchSection`. switchSectionNoPrint now correctly calls the
`changeSection` hook (needed by MachO).
The initial directives of clang -S are now reordered. On ELF targets, we
get `.file "a.c"; .text` instead of `.text; .file "a.c"`.
If there is no function, `.text` will be omitted.
This makes sure no optimizations are applied that assume the
bigger alignment or size, which could be incorrect if we link
together with non-instrumented code.
Re-apply #113148 after revert in #119331
If function pointer signing is enabled, sign personality function
pointer stored in `.DW.ref.__gxx_personality_v0` section with IA key,
0x7EAD = `ptrauth_string_discriminator("personality")` constant
discriminator and address diversity enabled.
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See
6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
Currently LLVMContext::emitError emits any error as an "inline asm"
error which does not make any sense. InlineAsm appears to be special,
in that it uses a "LocCookie" from srcloc metadata, which looks like
a parallel mechanism to ordinary source line locations. This meant
that other types of failures had degraded source information reported
when available.
Introduce some new generic error types, and only use inline asm
in the appropriate contexts. The DiagnosticInfo types are still
a bit of a mess, and I'm not sure why DiagnosticInfoWithLocationBase
exists instead of just having an optional DiagnosticLocation in the
base class.
DK_Generic is for any error that derives from an IR level instruction,
and thus can pull debug locations directly from it. DK_GenericWithLoc
is functionally the generic codegen error, since it does not depend
on the IR and instead can construct a DiagnosticLocation from the
MI debug location.
If function pointer signing is enabled, sign personality function
pointer stored in `.DW.ref.__gxx_personality_v0` section with IA key,
0x7EAD = `ptrauth_string_discriminator("personality")` constant
discriminator and address diversity enabled.
The optimiser will produce empty blocks that are unconditionally
executed according to the CFG -- while it may not be meaningful code,
and won't get a prologue_end position, we need to not crash on this
input.
The fault comes from assuming that there's always a next block with some
instructions in it, that will eventually produce some meaningful control
flow to stop at -- in the given reproducer in issue #117206 this isn't
true, because the function terminates with `unreachable`. Thus, I've
refactored the "get next instruction logic" into a helper that'll step
through all blocks and terminate if there aren't any more.
Reproducer from aeubanks
Sometimes we want to use a `PgoAnalysisMap` feature that doesn't require
the BB entries info, e.g. only the `FuncEntryCount`, but the BB entries
is emitted by default, so I'm adding an option to skip the info for this
case to save the binary size(can save ~90% size of the section). For
implementation, it extends a new field(`OmitBBEntries`) in
`BBAddrMap::Features` for this and it's controlled by a switch
`--basic-block-address-map-skip-bb-entries`.
Note that this naturally supports backwards compatibility as the field
is zero for the old version, matches the decoding in the new version
llvm.
When GlobalMerge creates a MergedGlobal of statics all initialized to
zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results
in just emitting zeros followed by labels for the aliases. We need to
handle it more like how emitGlobalConstantStruct does by emitting each
global inside the aggregate.
---------
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
Consider the case when the compiler generates a static member. Any
consumer of the debug info generated for that case, would benefit if
that member has the DW_AT_artificial flag.
Add a filter to avoid picking prologue_end when a function is empty (it may
have blocks but no instructions). This saves us from pushing more
validity-checking into findPrologueEndLoc.
In 39b2979a4 Pavel has kindly refined the implementation of a test in such
a way that it doesn't trip up over this patch -- the test wishes to
stimulate LLDBs presentation of line0 locations, rather than wanting to
always step on line-zero on entry to artificial_location.c. As that's what
was tripping up this change, reapply.
Original commit message follows.
[DWARF] Emit a worst-case prologue_end flag for pathological inputs (#107849)
prologue_end usually indicates where the end of the function-initialization
lies, and is where debuggers usually choose to put the initial breakpoint
for a function. Our current algorithm piggy-backs it on the first available
source-location: which doesn't necessarily have anything to do with the
start of the function.
To avoid this in heavily-optimised code that lacks many useful source
locations, pick a worst-case "if all else fails" prologue_end location, of
the first instruction that appears to do meaningful computation. It'll be
given the function-scope line number, which should run-on from the start of
the function anyway. This means if your code is completely inverted by the
optimiser, you can at least put a breakpoint at the _start_ like you
expect, even if it's difficult to then step through.
This patch also attempts to preserve some good behaviour we have without
optimisations -- at O0, if the prologue immediately falls into a loop body
without any computation happening, then prologue_end lands at the start of
that loop. This is desirable; but does mean we need to do more work to
detect and support those situations.
**Summary**
This patch introduces a new compiler option `-mllvm
-emit-func-debug-line-table-offsets` that enables the emission of
per-function line table offsets and end sequences in DWARF debug
information. This enhancement allows tools and debuggers to accurately
attribute line number information to their corresponding functions, even
in scenarios where functions are merged or share the same address space
due to optimizations like Identical Code Folding (ICF) in the linker.
**Background**
RFC: [New DWARF Attribute for Symbolication of Merged
Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434)
Previous similar PR:
[#93137](https://github.com/llvm/llvm-project/pull/93137) – This PR was
very similar to the current one but at the time, the assembler had no
support for emitting labels within the line table. That support was
added in PR [#99710](https://github.com/llvm/llvm-project/pull/99710) -
and in this PR we use some of the support added in the assembler PR.
In the current implementation, Clang generates line information in the
`debug_line` section without directly associating line entries with
their originating `DW_TAG_subprogram` DIEs. This can lead to issues when
post-compilation optimizations merge functions, resulting in overlapping
address ranges and ambiguous line information.
For example, when functions are merged by ICF in LLD, multiple functions
may end up sharing the same address range. Without explicit linkage
between functions and their line entries, tools cannot accurately
attribute line information to the correct function, adversely affecting
debugging and call stack resolution.
**Implementation Details**
To address the above issue, the patch makes the following key changes:
**`DW_AT_LLVM_stmt_sequence` Attribute**: Introduces a new LLVM-specific
attribute `DW_AT_LLVM_stmt_sequence` to each `DW_TAG_subprogram` DIE.
This attribute holds a label pointing to the offset in the line table
where the function's line entries begin.
**End-of-Sequence Markers**: Emits an explicit DW_LNE_end_sequence after
each function's line entries in the line table. This marks the end of
the line information for that function, ensuring that line entries are
correctly delimited.
**Assembler and Streamer Modifications**: Modifies the MCStreamer and
related classes to support emitting the necessary labels and tracking
the current function's line entries. A new flag
GenerateFuncLineTableOffsets is added to control this behavior.
**Compiler Option**: Introduces the `-mllvm
-emit-func-debug-line-table-offsets` option to enable this
functionality, allowing users to opt-in as needed.
Add a specification attribute to LLVM DebugInfo, which is analogous
to DWARF's DW_AT_specification. According to the DWARF spec:
"A debugging information entry that represents a declaration that
completes another (earlier) non-defining declaration may have a
DW_AT_specification attribute whose value is a reference to the
debugging information entry representing the non-defining declaration."
This patch allows types to be specifications of other types. This is
used by Swift to represent generic types. For example, given this Swift
program:
```
struct MyStruct<T> {
let t: T
}
let variable = MyStruct<Int>(t: 43)
```
The Swift compiler emits (roughly) an unsubtituted type for MyStruct<T>:
```
DW_TAG_structure_type
DW_AT_name ("MyStruct")
// "$s1w8MyStructVyxGD" is a Swift mangled name roughly equivalent to
// MyStruct<T>
DW_AT_linkage_name ("$s1w8MyStructVyxGD")
// other attributes here
```
And a specification for MyStruct<Int>:
```
DW_TAG_structure_type
DW_AT_specification (<link to "MyStruct">)
// "$s1w8MyStructVySiGD" is a Swift mangled name equivalent to
// MyStruct<Int>
DW_AT_linkage_name ("$s1w8MyStructVySiGD")
DW_AT_byte_size (0x08)
// other attributes here
```
This reverts commit bf483ddb42065405e345393e022dc72357ec5a3a.
See PR, there's a test testing for this behaviour (possibly adaptable), and
a duplicate line entry too
This patch follows on from the changes made in #105524, by adding an
additional heuristic that prevents us from applying the start-of-MBB
is_stmt flag when we can see that, for all direct branches to the MBB,
the last line stepped on before the branch is the same as the first line
of the MBB. This is mainly to prevent certain pathological cases, such
as macros that expand to multiple basic blocks that all have the same
source location, from giving us repeated steps on the same line. This
approach is not comprehensive, since it relies on analyzeBranch to read
edges, but the default fallback of applying is_stmt may lead only to
useless steps in some cases, rather than skipping useful steps
altogether.
prologue_end usually indicates where the end of the function-initialization
lies, and is where debuggers usually choose to put the initial breakpoint
for a function. Our current algorithm piggy-backs it on the first available
source-location: which doesn't necessarily have anything to do with the
start of the function.
To avoid this in heavily-optimised code that lacks many useful source
locations, pick a worst-case "if all else fails" prologue_end location, of
the first instruction that appears to do meaningful computation. It'll be
given the function-scope line number, which should run-on from the start of
the function anyway. This means if your code is completely inverted by the
optimiser, you can at least put a breakpoint at the _start_ like you
expect, even if it's difficult to then step through.
This patch also attempts to preserve some good behaviour we have without
optimisations -- at O0, if the prologue immediately falls into a loop body
without any computation happening, then prologue_end lands at the start of
that loop. This is desirable; but does mean we need to do more work to
detect and support those situations.
These specify that the value of the given register in the previous frame
is the CFA plus some offset. This isn't very common but can be necessary
if the original value is normally reconstructed from the stack/frame
pointer instead of being saved on the stack and reloaded from there.
An extra inhabitant is a bit pattern that does not represent a valid
value for instances of a given type. The number of extra inhabitants is
the number of those bit configurations.
This is used by Swift to save space when composing types. For example,
because Bool only needs 2 bit patterns to represent all of its values
(true and false), an Optional<Bool> only occupies 1 byte in memory by
using a bit configuration that is unused by Bool. Which bit patterns are
unused are part of the ABI of the language.
Since Swift generics are not monomorphized, by using dynamic libraries
you can have generic types whose size, alignment, etc, are known only
at runtime (which is why this feature is needed).
This patch adds num_extra_inhabitants to LLVM-IR debug info and in DWARF
as an Apple extension.
... matching what we have in the disassembler. This isn't turned on by
default since several of the scheduling models are not completely
accurate, and we don't want to be misleading.
In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.
This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.
When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.
For example:
.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv
The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".
It is alright to skip this as we already check the section corresponding
to the debugloc entry.
Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.
For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
This patch makes it so that specifying all or none for -pgo-analysis-map
along with an explicit option causes an error as this set of options
does not really make sense.
This patch adds none and all options to the -pgo-analysis-map flag,
which do basically what they say on the tin. The none option is added to
enable forcing the pgo-analysis-map by overriding an earlier invocation
of the flag. The all option is just added for convenience.
Store Swift mangled names in DW_AT_linkage_name. The Swift compiler
emits only the type mangled name in debug information, and LLDB uses
those mangled names as keys to look up size, alignment, fields, etc
from either reflection metadata or Swift modules.
Additionally, emit types linkage names for types into the accelerator
table if they exist and they're different from the display name.