Adds support for MSVC's undocumented `/funcoverride` flag, which marks
functions as being replaceable by the Windows kernel loader. This is
used to allow functions to be upgraded depending on the capabilities of
the current processor (e.g., the kernel can be built with the naive
implementation of a function, but that function can be replaced at boot
with one that uses SIMD instructions if the processor supports them).
For each marked function we need to generate:
* An undefined symbol named `<name>_$fo$`.
* A defined symbol `<name>_$fo_default$` that points to the `.data`
section (anywhere in the data section, it is assumed to be zero sized).
* An `/ALTERNATENAME` linker directive that points from `<name>_$fo$` to
`<name>_$fo_default$`.
This is used by the MSVC linker to generate the appropriate metadata in
the Dynamic Value Relocation Table.
Marked function must never be inlined (otherwise those inline sites
can't be replaced).
Note that I've chosen to implement this in AsmPrinter as there was no
way to create a `GlobalVariable` for `<name>_$fo$` that would result in
a symbol being emitted (as nothing consumes it and it has no
initializer). I tried to have `llvm.used` and `llvm.compiler.used` point
to it, but this didn't help.
Within LLVM I referred to this feature as "loader replaceable" as
"function override" already has a different meaning to C++ developers...
I also took the opportunity to extract the feature symbol generation
code used by both AArch64 and X86 into a common function in AsmPrinter.
Reapply "IR: Remove uselist for constantdata (#137313)"
This reverts commit 5936c02c8b9c6d1476f7830517781ce8b6e26e75.
Fix checking uselists of constants in assume bundle queries
Register assembly printer passes in the pass registry.
This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests.
Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow
targets to use the `INITIALIZE_PASS` macros and register the pass in the
pass registry. This currently has a default parameter so it won't break
any targets that have not been updated.
This is a follow up change to eliminating uselists for ConstantData.
In the previous revision, ConstantData had a replacement reference count
instead of a uselist. This reference count was misleading, and not useful
in the same way as it would be for another value. The references may not
have even been in the current module, since these are shared throughout
the LLVMContext.
This doesn't space leak any more than we previously did; nothing was
attempting to garbage collect unused constants.
Previously the use_empty, and hasNUses type of APIs were supported through
the reference count. These now behave as if the uses are always empty.
Ideally it would be illegal to inspect these, but this forces API complexity
into quite a few places. It may be doable to make it illegal to check these
counts, but I would like there to be a targeted fuzzing effort to make sure
every transform properly deals with a constant in every operand position.
All tests pass if I turn the hasNUses* and getNumUses queries into assertions,
only hasOneUse in particular appears to hit in some set of contexts. I've
added unit tests to ensure logical consistency between these cases
This is a resurrected version of the patch attached to this RFC:
https://discourse.llvm.org/t/rfc-constantdata-should-not-have-use-lists/42606
In this adaptation, there are a few differences. In the original patch, the Use's
use list was replaced with an unsigned* to the reference count in the value. This
version leaves them as null and leaves the ref counting only in Value.
Remove use-lists from instances of ConstantData (which are shared
across modules and have no operands).
To continue supporting most of the use-list API, store a ref-count in
place of the use-list; this is for API like Value::use_empty and
Value::hasNUses. Operations that actually need the use-list -- like
Value::use_begin -- will assert.
This change has three benefits:
1. The compiler output cannot in any way depend on the use-list order
of instances of ConstantData.
2. There's no use-list traffic when adding and removing simple
constants from operand lists (although there is ref-count traffic;
YMMV).
3. It's cheaper to serialize use-lists (since we're no longer
serializing the use-list order of things like i32 0).
The downside is that you can't look at all the users of ConstantData,
but traversals of users of i32 0 are already ill-advised.
Possible follow-ups:
- Track if an instance of a ConstantVector/ConstantArray/etc. is known
to have all ConstantData arguments, and drop the use-lists to
ref-counts in those cases. Callers need to check Value::hasUseList
before iterating through the use-list.
- Remove even the ref-counts. I'm not sure they have any benefit
besides minimizing the scope of this commit, and maintaining the
counts is not free.
Fixes#58629
Co-authored-by: Duncan P. N. Exon Smith <dexonsmith@apple.com>
With
3feb724496,
AsmPrinter can place jump table entries into `.hot` or `.unlikely`
prefixed data sections. This change refactors AsmPrinter and
AArch64AsmPrinter to prepare for the aarch64 port.
* Before this patch, the AsmPrinter class exposes `emitJumpTableInfo` as
a virtual method, and AArch64AsmPrinter overrides `emitJumpTableInfo`
for jump table emission.
* After this patch, both AsmPrinter and AArch64AsmPrinter shares
`AsmPrinter::emitJumpTableInfo`, and class-specific code are moved
inside each class's `emitJumpTableImpl` respectively.
This is a follow-up of https://github.com/llvm/llvm-project/pull/125987,
and https://github.com/llvm/llvm-project/pull/126018 implements the port
for aarch64.
AsmPrinter may switch the current section when e.g., emitting a jump
table for a switch. `.stack_sizes` should still be linked to the
function section. If the section is wrong, readelf emits a warning
"relocation symbol is not in the expected section".
https://reviews.llvm.org/D17938 introduced lowerRelativeReference to
give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an
`unnamed_addr` function, create a PLT-generating relocation. This was
intended for C++ relative vtables, but C++ relative vtable ended up
using DSOLocalEquivalent (lowerDSOLocalEquivalent).
This special treatment of `unnamed_addr` seems unusual.
Let's remove it. Only COFF needs an overload to generate a @IMGREL32
relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll).
Pull Request: https://github.com/llvm/llvm-project/pull/134781
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.
The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on__patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.
Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:
https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
This is a follow-up patch of
https://github.com/llvm/llvm-project/pull/125756
In this PR, static-data-splitter pass produces the aggregated profile
counts of constants for constant pools in a global state
(`StateDataProfileInfo`), and asm printer consumes the profile counts to
produce `.hot` or `.unlikely` prefixes.
This implementation covers both x86 and aarch64 asm printer.
clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and
`@gotpcrel` specifiers in data directives. The syntax is not used in
humand-written assembly code, and is not supported by GNU assembler.
Note: the `@plt` in `.word foo@plt` is different from
the legacy `call func@plt` (where `@plt` is simply ignored).
The `@plt` syntax was selected was simply due to a quirk of AsmParser:
the syntax was supported by all targets until I updated it
to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c
RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc,
and we should follow this convention.
This PR adds support for `.word %pltpcrel(foo+offset)` and
`.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`.
* MCValue::SymA can no longer have a SymbolVariant. Add an assert
similar to that of AArch64ELFObjectWriter.cpp before
https://reviews.llvm.org/D81446 (see my analysis at
https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
if intrigued)
* `jump foo@plt, x31` now has a different diagnostic.
Pull Request: https://github.com/llvm/llvm-project/pull/132569
https://reviews.llvm.org/D23669 inappropriately added MIPS-specific
dtprel/tprel directives to MCStreamer. In addition,
llvm-mc -filetype=null parsing these directives will crash.
This patch moves these functions to MipsTargetStreamer and fixes
-filetype=null.
gprel32 and gprel64, called by AsmPrinter, are moved to
MCTargetStreamer.
setType is unneeded (and AsmPrinter tries not to modify symbols).
AsmPrinter. MCSA_ELF_TypeFunction is available on all
targets using getSymbolPreferLocal.
Pull Request: https://github.com/llvm/llvm-project/pull/128138
https://github.com/llvm/llvm-project/pull/122183 adds a codegen pass to
infer machine jump table entry's hotness from the MBB hotness. This is a
follow-up PR to produce `.hot` and or `.unlikely` section prefix for
jump table's (read-only) data sections in the relocatable `.o` files.
When this patch is enabled, linker will see {`.rodata`, `.rodata.hot`,
`.rodata.unlikely`} in input sections. It can map `.rodata.hot` and
`.rodata` in the input sections to `.rodata.hot` in the executable, and
map `.rodata.unlikely` into `.rodata` with a pending extension to
`--keep-text-section-prefix` like
059e7cbb66,
or with a linker script.
1. To partition hot and jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group.
2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement
`TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on
MJTE's hotness.
- The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does.
3. The new code path is off by default.
- Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it
readable through `TargetMachine`.
- To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in
`CodeGen/CommandFlags.h/cpp`.
- A subsequent patch
([draft](8f36a13743)) will add a clang option to enable the new code path.
---------
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
This restores the functionality of AsmPrinterHandlers to what it was
prior to https://github.com/llvm/llvm-project/pull/96785. The attempted
hack there of adding a duplicate DebugHandlerBase handling added a lot
of hidden state and assumptions, which just segfaulted when we tried to
continuing using this API. Instead, this just goes back to the old
design, but adds a separate array for the basic EH handles. The
duplicate array is identical to the other array of handler, but which
doesn't get their begin/endInstruction callbacks called. This still
saves the negligible but measurable amount of virtual function calls as
was the goal of #96785, while restoring the API to the pre-LLVM-19
status quo.
The caller `AsmPrinter::emitJumpTableInfo` checks [1] `MJTI` is not a
null pointer before calling `emitJumpTableEntry` or
`emitJumpTableSizesSection`.
This patch updates callee function's signature to accept const
reference, this way it's explicit `MJTI` won't be nullptr inside the
callee.
[1]
9d5299eb61/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (L2857)
Unsigned subtraction wrap-around occurs in `emitGlobalConstantImpl` on
an AIX-specific code path from 8e4423eb0888 when a structure type has
zero elements.
With assertions enabled, this manifests as:
```
TypeSize llvm::StructLayout::getElementOffset(unsigned int) const: Assertion `Idx < NumElements && "Invalid element idx!"' failed.
```
AIX assembly is very different from the gas syntax. We don't expect
other targets to share these differences. Unify the numerous,
essentially AIX-specific variables.
llvm-mc --assemble prints an initial `.text` from `initSections`.
This is weird for quick assembly tasks that do not specify `.text`.
Omit the .text by moving section directive printing from `changeSection`
to `switchSection`. switchSectionNoPrint now correctly calls the
`changeSection` hook (needed by MachO).
The initial directives of clang -S are now reordered. On ELF targets, we
get `.file "a.c"; .text` instead of `.text; .file "a.c"`.
If there is no function, `.text` will be omitted.
This makes sure no optimizations are applied that assume the
bigger alignment or size, which could be incorrect if we link
together with non-instrumented code.
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See
6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
Sometimes we want to use a `PgoAnalysisMap` feature that doesn't require
the BB entries info, e.g. only the `FuncEntryCount`, but the BB entries
is emitted by default, so I'm adding an option to skip the info for this
case to save the binary size(can save ~90% size of the section). For
implementation, it extends a new field(`OmitBBEntries`) in
`BBAddrMap::Features` for this and it's controlled by a switch
`--basic-block-address-map-skip-bb-entries`.
Note that this naturally supports backwards compatibility as the field
is zero for the old version, matches the decoding in the new version
llvm.
When GlobalMerge creates a MergedGlobal of statics all initialized to
zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results
in just emitting zeros followed by labels for the aliases. We need to
handle it more like how emitGlobalConstantStruct does by emitting each
global inside the aggregate.
---------
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
... matching what we have in the disassembler. This isn't turned on by
default since several of the scheduling models are not completely
accurate, and we don't want to be misleading.
In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.
This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.
When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.
For example:
.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv
The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".
It is alright to skip this as we already check the section corresponding
to the debugloc entry.
Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.