With
3feb724496,
AsmPrinter can place jump table entries into `.hot` or `.unlikely`
prefixed data sections. This change refactors AsmPrinter and
AArch64AsmPrinter to prepare for the aarch64 port.
* Before this patch, the AsmPrinter class exposes `emitJumpTableInfo` as
a virtual method, and AArch64AsmPrinter overrides `emitJumpTableInfo`
for jump table emission.
* After this patch, both AsmPrinter and AArch64AsmPrinter shares
`AsmPrinter::emitJumpTableInfo`, and class-specific code are moved
inside each class's `emitJumpTableImpl` respectively.
This is a follow-up of https://github.com/llvm/llvm-project/pull/125987,
and https://github.com/llvm/llvm-project/pull/126018 implements the port
for aarch64.
AsmPrinter may switch the current section when e.g., emitting a jump
table for a switch. `.stack_sizes` should still be linked to the
function section. If the section is wrong, readelf emits a warning
"relocation symbol is not in the expected section".
https://reviews.llvm.org/D17938 introduced lowerRelativeReference to
give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an
`unnamed_addr` function, create a PLT-generating relocation. This was
intended for C++ relative vtables, but C++ relative vtable ended up
using DSOLocalEquivalent (lowerDSOLocalEquivalent).
This special treatment of `unnamed_addr` seems unusual.
Let's remove it. Only COFF needs an overload to generate a @IMGREL32
relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll).
Pull Request: https://github.com/llvm/llvm-project/pull/134781
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.
The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on__patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.
Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:
https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
This is a follow-up patch of
https://github.com/llvm/llvm-project/pull/125756
In this PR, static-data-splitter pass produces the aggregated profile
counts of constants for constant pools in a global state
(`StateDataProfileInfo`), and asm printer consumes the profile counts to
produce `.hot` or `.unlikely` prefixes.
This implementation covers both x86 and aarch64 asm printer.
clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and
`@gotpcrel` specifiers in data directives. The syntax is not used in
humand-written assembly code, and is not supported by GNU assembler.
Note: the `@plt` in `.word foo@plt` is different from
the legacy `call func@plt` (where `@plt` is simply ignored).
The `@plt` syntax was selected was simply due to a quirk of AsmParser:
the syntax was supported by all targets until I updated it
to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c
RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc,
and we should follow this convention.
This PR adds support for `.word %pltpcrel(foo+offset)` and
`.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`.
* MCValue::SymA can no longer have a SymbolVariant. Add an assert
similar to that of AArch64ELFObjectWriter.cpp before
https://reviews.llvm.org/D81446 (see my analysis at
https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
if intrigued)
* `jump foo@plt, x31` now has a different diagnostic.
Pull Request: https://github.com/llvm/llvm-project/pull/132569
https://reviews.llvm.org/D23669 inappropriately added MIPS-specific
dtprel/tprel directives to MCStreamer. In addition,
llvm-mc -filetype=null parsing these directives will crash.
This patch moves these functions to MipsTargetStreamer and fixes
-filetype=null.
gprel32 and gprel64, called by AsmPrinter, are moved to
MCTargetStreamer.
setType is unneeded (and AsmPrinter tries not to modify symbols).
AsmPrinter. MCSA_ELF_TypeFunction is available on all
targets using getSymbolPreferLocal.
Pull Request: https://github.com/llvm/llvm-project/pull/128138
https://github.com/llvm/llvm-project/pull/122183 adds a codegen pass to
infer machine jump table entry's hotness from the MBB hotness. This is a
follow-up PR to produce `.hot` and or `.unlikely` section prefix for
jump table's (read-only) data sections in the relocatable `.o` files.
When this patch is enabled, linker will see {`.rodata`, `.rodata.hot`,
`.rodata.unlikely`} in input sections. It can map `.rodata.hot` and
`.rodata` in the input sections to `.rodata.hot` in the executable, and
map `.rodata.unlikely` into `.rodata` with a pending extension to
`--keep-text-section-prefix` like
059e7cbb66,
or with a linker script.
1. To partition hot and jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group.
2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement
`TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on
MJTE's hotness.
- The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does.
3. The new code path is off by default.
- Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it
readable through `TargetMachine`.
- To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in
`CodeGen/CommandFlags.h/cpp`.
- A subsequent patch
([draft](8f36a13743)) will add a clang option to enable the new code path.
---------
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
This restores the functionality of AsmPrinterHandlers to what it was
prior to https://github.com/llvm/llvm-project/pull/96785. The attempted
hack there of adding a duplicate DebugHandlerBase handling added a lot
of hidden state and assumptions, which just segfaulted when we tried to
continuing using this API. Instead, this just goes back to the old
design, but adds a separate array for the basic EH handles. The
duplicate array is identical to the other array of handler, but which
doesn't get their begin/endInstruction callbacks called. This still
saves the negligible but measurable amount of virtual function calls as
was the goal of #96785, while restoring the API to the pre-LLVM-19
status quo.
The caller `AsmPrinter::emitJumpTableInfo` checks [1] `MJTI` is not a
null pointer before calling `emitJumpTableEntry` or
`emitJumpTableSizesSection`.
This patch updates callee function's signature to accept const
reference, this way it's explicit `MJTI` won't be nullptr inside the
callee.
[1]
9d5299eb61/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (L2857)
Unsigned subtraction wrap-around occurs in `emitGlobalConstantImpl` on
an AIX-specific code path from 8e4423eb0888 when a structure type has
zero elements.
With assertions enabled, this manifests as:
```
TypeSize llvm::StructLayout::getElementOffset(unsigned int) const: Assertion `Idx < NumElements && "Invalid element idx!"' failed.
```
AIX assembly is very different from the gas syntax. We don't expect
other targets to share these differences. Unify the numerous,
essentially AIX-specific variables.
llvm-mc --assemble prints an initial `.text` from `initSections`.
This is weird for quick assembly tasks that do not specify `.text`.
Omit the .text by moving section directive printing from `changeSection`
to `switchSection`. switchSectionNoPrint now correctly calls the
`changeSection` hook (needed by MachO).
The initial directives of clang -S are now reordered. On ELF targets, we
get `.file "a.c"; .text` instead of `.text; .file "a.c"`.
If there is no function, `.text` will be omitted.
This makes sure no optimizations are applied that assume the
bigger alignment or size, which could be incorrect if we link
together with non-instrumented code.
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See
6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
Sometimes we want to use a `PgoAnalysisMap` feature that doesn't require
the BB entries info, e.g. only the `FuncEntryCount`, but the BB entries
is emitted by default, so I'm adding an option to skip the info for this
case to save the binary size(can save ~90% size of the section). For
implementation, it extends a new field(`OmitBBEntries`) in
`BBAddrMap::Features` for this and it's controlled by a switch
`--basic-block-address-map-skip-bb-entries`.
Note that this naturally supports backwards compatibility as the field
is zero for the old version, matches the decoding in the new version
llvm.
When GlobalMerge creates a MergedGlobal of statics all initialized to
zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results
in just emitting zeros followed by labels for the aliases. We need to
handle it more like how emitGlobalConstantStruct does by emitting each
global inside the aggregate.
---------
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
... matching what we have in the disassembler. This isn't turned on by
default since several of the scheduling models are not completely
accurate, and we don't want to be misleading.
In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.
This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.
When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.
For example:
.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv
The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".
It is alright to skip this as we already check the section corresponding
to the debugloc entry.
Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
This patch makes it so that specifying all or none for -pgo-analysis-map
along with an explicit option causes an error as this set of options
does not really make sense.
This patch adds none and all options to the -pgo-analysis-map flag,
which do basically what they say on the tin. The none option is added to
enable forcing the pgo-analysis-map by overriding an earlier invocation
of the flag. The all option is just added for convenience.
This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on.
Built by GCC 11.
Fix warnings:
llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp: In member
function ‘void llvm::AsmPrinter::emitJumpTableSizesSection(const
llvm::MachineJumpTableInfo*, const llvm::Function&) const’:
llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:2852:31: error:
enumerated and non-enumerated type in conditional expression
[-Werror=extra]
2852 | int Flags = F.hasComdat() ? ELF::SHF_GROUP : 0;
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
This reapplies commit 1911a50fae8a441b445eb835b98950710d28fc88 with a
minor fix in lld/ELF/LTO.cpp which sets Options.BBAddrMap when
`--lto-basic-block-sections=labels` is passed.
This feature is supported via the newer option
`-fbasic-block-address-map`. Using the old option still works by
delegating to the newer option, while a warning is printed to show
deprecation.
AsmPrinter always creates a symbol for the end of function if valid
debug info is present. However, this breaks SPIR-V target's output,
because SPIR-V specification allows label instructions only inside a
block, not after the function body (see
https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpLabel).
This PR proposes to disable emission of label instructions after the
function body if the target is SPIR-V.
This PR is a fix of the
https://github.com/llvm/llvm-project/issues/102732 issue.
This patch will make LLVM emit a new section .llvm_jump_table_sizes
containing tuples of (jump table address, entry count) in object files.
This section is useful for tools that need to statically reconstruct the
control flow of executables.
At the moment this is only enabled by default for the PS5 target.
The InitUndef pass currently uses target-specific pseudo instructions,
with one pseudo per register class.
Instead, add a generic pseudo instruction, which can be used by all
targets and register classes.