https://github.com/llvm/llvm-project/pull/122183 adds a codegen pass to
infer machine jump table entry's hotness from the MBB hotness. This is a
follow-up PR to produce `.hot` and or `.unlikely` section prefix for
jump table's (read-only) data sections in the relocatable `.o` files.
When this patch is enabled, linker will see {`.rodata`, `.rodata.hot`,
`.rodata.unlikely`} in input sections. It can map `.rodata.hot` and
`.rodata` in the input sections to `.rodata.hot` in the executable, and
map `.rodata.unlikely` into `.rodata` with a pending extension to
`--keep-text-section-prefix` like
059e7cbb66,
or with a linker script.
1. To partition hot and jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group.
2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement
`TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on
MJTE's hotness.
- The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does.
3. The new code path is off by default.
- Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it
readable through `TargetMachine`.
- To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in
`CodeGen/CommandFlags.h/cpp`.
- A subsequent patch
([draft](8f36a13743)) will add a clang option to enable the new code path.
---------
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
This restores the functionality of AsmPrinterHandlers to what it was
prior to https://github.com/llvm/llvm-project/pull/96785. The attempted
hack there of adding a duplicate DebugHandlerBase handling added a lot
of hidden state and assumptions, which just segfaulted when we tried to
continuing using this API. Instead, this just goes back to the old
design, but adds a separate array for the basic EH handles. The
duplicate array is identical to the other array of handler, but which
doesn't get their begin/endInstruction callbacks called. This still
saves the negligible but measurable amount of virtual function calls as
was the goal of #96785, while restoring the API to the pre-LLVM-19
status quo.
The caller `AsmPrinter::emitJumpTableInfo` checks [1] `MJTI` is not a
null pointer before calling `emitJumpTableEntry` or
`emitJumpTableSizesSection`.
This patch updates callee function's signature to accept const
reference, this way it's explicit `MJTI` won't be nullptr inside the
callee.
[1]
9d5299eb61/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (L2857)
Unsigned subtraction wrap-around occurs in `emitGlobalConstantImpl` on
an AIX-specific code path from 8e4423eb0888 when a structure type has
zero elements.
With assertions enabled, this manifests as:
```
TypeSize llvm::StructLayout::getElementOffset(unsigned int) const: Assertion `Idx < NumElements && "Invalid element idx!"' failed.
```
AIX assembly is very different from the gas syntax. We don't expect
other targets to share these differences. Unify the numerous,
essentially AIX-specific variables.
llvm-mc --assemble prints an initial `.text` from `initSections`.
This is weird for quick assembly tasks that do not specify `.text`.
Omit the .text by moving section directive printing from `changeSection`
to `switchSection`. switchSectionNoPrint now correctly calls the
`changeSection` hook (needed by MachO).
The initial directives of clang -S are now reordered. On ELF targets, we
get `.file "a.c"; .text` instead of `.text; .file "a.c"`.
If there is no function, `.text` will be omitted.
This makes sure no optimizations are applied that assume the
bigger alignment or size, which could be incorrect if we link
together with non-instrumented code.
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See
6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
Sometimes we want to use a `PgoAnalysisMap` feature that doesn't require
the BB entries info, e.g. only the `FuncEntryCount`, but the BB entries
is emitted by default, so I'm adding an option to skip the info for this
case to save the binary size(can save ~90% size of the section). For
implementation, it extends a new field(`OmitBBEntries`) in
`BBAddrMap::Features` for this and it's controlled by a switch
`--basic-block-address-map-skip-bb-entries`.
Note that this naturally supports backwards compatibility as the field
is zero for the old version, matches the decoding in the new version
llvm.
When GlobalMerge creates a MergedGlobal of statics all initialized to
zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results
in just emitting zeros followed by labels for the aliases. We need to
handle it more like how emitGlobalConstantStruct does by emitting each
global inside the aggregate.
---------
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
... matching what we have in the disassembler. This isn't turned on by
default since several of the scheduling models are not completely
accurate, and we don't want to be misleading.
In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.
This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.
When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.
For example:
.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv
The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".
It is alright to skip this as we already check the section corresponding
to the debugloc entry.
Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
This patch makes it so that specifying all or none for -pgo-analysis-map
along with an explicit option causes an error as this set of options
does not really make sense.
This patch adds none and all options to the -pgo-analysis-map flag,
which do basically what they say on the tin. The none option is added to
enable forcing the pgo-analysis-map by overriding an earlier invocation
of the flag. The all option is just added for convenience.
This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on.
Built by GCC 11.
Fix warnings:
llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp: In member
function ‘void llvm::AsmPrinter::emitJumpTableSizesSection(const
llvm::MachineJumpTableInfo*, const llvm::Function&) const’:
llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:2852:31: error:
enumerated and non-enumerated type in conditional expression
[-Werror=extra]
2852 | int Flags = F.hasComdat() ? ELF::SHF_GROUP : 0;
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
This reapplies commit 1911a50fae8a441b445eb835b98950710d28fc88 with a
minor fix in lld/ELF/LTO.cpp which sets Options.BBAddrMap when
`--lto-basic-block-sections=labels` is passed.
This feature is supported via the newer option
`-fbasic-block-address-map`. Using the old option still works by
delegating to the newer option, while a warning is printed to show
deprecation.
AsmPrinter always creates a symbol for the end of function if valid
debug info is present. However, this breaks SPIR-V target's output,
because SPIR-V specification allows label instructions only inside a
block, not after the function body (see
https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpLabel).
This PR proposes to disable emission of label instructions after the
function body if the target is SPIR-V.
This PR is a fix of the
https://github.com/llvm/llvm-project/issues/102732 issue.
This patch will make LLVM emit a new section .llvm_jump_table_sizes
containing tuples of (jump table address, entry count) in object files.
This section is useful for tools that need to statically reconstruct the
control flow of executables.
At the moment this is only enabled by default for the PS5 target.
The InitUndef pass currently uses target-specific pseudo instructions,
with one pseudo per register class.
Instead, add a generic pseudo instruction, which can be used by all
targets and register classes.
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
All Mach-O targets have this property, so just remove this variable,
which could lure contributors to add unneeded object file format
specific properties.
This patch makes LLVM emit line 0 source locations for nops emitted as
basic block padding.
---------
Co-authored-by: Orlando Cazalet-Hyams <orlando.hyams@sony.com>
This avoids another unserializable field. Move the DbgInfoAvailable
field into the AsmPrinter, which is only really a cache/convenience
bit for checking a direct IR module metadata check.
Enabled in clang using:
-fptrauth-indirect-gotos
and at the IR level using function attribute:
"ptrauth-indirect-gotos"
Signing uses IA and a per-function integer discriminator. The
discriminator isn't ABI-visible, and is currently:
ptrauth_string_discriminator("<function_name> blockaddress")
A sufficiently sophisticated frontend could benefit from per-indirectbr
discrimination, which would need additional machinery, such as allowing
"ptrauth" bundles on indirectbr. For our purposes, the simple scheme
above is sufficient.
This approach doesn't support subtracting label addresses and using
the result as offsets, because each label address is signed.
Pointer arithmetic on signed pointers corrupts the signature bits,
and because label address expressions aren't typed beyond void*,
we can't do anything reliably intelligent on the arithmetic exprs.
Not signing addresses when used to form offsets would allow
easily hijacking control flow by overwriting the offset.
This diagnoses the basic cases (`&&lbl2 - &&lbl1`) in the frontend,
while we evaluate either alternative implementations (e.g., lowering
blockaddress to a bb number, and indirectbr to a checked jump-table),
or better diagnostics (both at the frontend level and on unencodable
IR constants).
Summary:
- Remove wrappers in `MachineDominatorTree`.
- Remove `MachineDominatorTree` update code in
`MachineBasicBlock::SplitCriticalEdge`.
- Use `MachineDomTreeUpdater` in passes which call
`MachineBasicBlock::SplitCriticalEdge` and preserve
`MachineDominatorTreeWrapperPass` or CFG analyses.
Commit abea99f65a97248974c02a5544eaf25fc4240056 introduced related
methods in 2014. Now we have SemiNCA based dominator tree in 2017 and
dominator tree updater, the solution adopted here seems a bit outdated.
If a function's address is taken, which means it may be called via a function pointer,
we need the function descriptor for it.
Otherwise, the function descriptor can be omitted for external symbols.
getSectionIDNum may return same value for two different MBBSectionID.
e.g. A Cold type MBBSectionID with number 0 and a Default type
MBBSectionID with number 2 get same value 2 from getSectionIDNum. This
may lead to overwrite of MBBSectionRanges. Using MBBSectionID itself
as DenseMap key is better choice.
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
Timers are an out-of-line function call and a global variable access,
here twice per emitted instruction. At this granularity, not only the
time results become skewed, but the timers also add a performance
overhead when profiling is disabled. Also outside of the innermost loop,
timers add a measurable overhead. As this is quite expensive for a
mostly unused profiling facility, remove the timers.
Fixes#39650.
Currently, an AsmPrinterHandler has several methods that allow to
dynamically hook in unwind or debug info emission, e.g. at begin/end of
every function or instruction. The class hierarchy and the actually
overridden functions are as follows:
(SymSz=setSymbolSize, mFE=markFunctionEnd, BBS=BasicBlockSection,
FL=Funclet; b=beginX, e=endX)
SymSz Mod Fn mFE BBS FL Inst
AsmPrinterHandler - - - - - - -
` PseudoProbeHandler - - - - - - -
` WinCFGuard - e e - - - -
` EHStreamer - - - - - - -
` DwarfCFIException - e be - be - -
` ARMException - - be e - - -
` AIXException - - e - - - -
` WinException - e be e - be -
` WasmException - e be - - - -
` DebugHandlerBase - b be - be - be
` BTFDebug - e - - - - b
` CodeViewDebug - be - - - - b
` DWARFDebug yes be - - - - b
Doing virtual function calls per instruction is costly and useless when
the called function does nothing.
This commit performs the following clean-up/improvements:
- PseudoProbeHandler is no longer an AsmPrinterHandler -- it used
nothing of its functionality to hook in at the possible points. This
avoids virtual function calls when a pseudo probe printer is present.
- DebugHandlerBase is no longer an AsmPrinterHandler, but a separate
base class. DebugHandlerBase is the only remaining "hook" for begin/end
instruction and setSymbolSize (only used by DWARFDebug). begin/end for
function and basic block sections are never overriden and therefore are
no longer virtual. (Originally I intended there to be only one debug
handler, but BPF as the only target supports two at the same time: DWARF
and BTF.)
- AsmPrinterHandler no longer has begin/end instruction and
setSymbolSize hooks -- these were only used by DebugHandlerBase. This
avoid iterating over handlers in every instruction.
AsmPrinterHandler Mod Fn mFE BBS FL
` WinCFGuard e e - - -
` EHStreamer - - - - -
` DwarfCFIException e be - be -
` ARMException - be e - -
` AIXException - e - - -
` WinException e be e - be
` WasmException e be - - -
SymSz Mod Fn BBS Inst
DebugHandlerBase - b be be be
` BTFDebug - e b
` CodeViewDebug - be b
` DWARFDebug yes be b
PseudoProbeHandler (no shared methods)
To continue allowing external users (e.g., Julia) to hook in at every
instruction, a new method addDebugHandler is exposed.
This results in a performance improvement, especially in the -O0 -g0
case with unwind information (e.g., JIT baseline).