This patch will make LLVM emit a new section .llvm_jump_table_sizes
containing tuples of (jump table address, entry count) in object files.
This section is useful for tools that need to statically reconstruct the
control flow of executables.
At the moment this is only enabled by default for the PS5 target.
The InitUndef pass currently uses target-specific pseudo instructions,
with one pseudo per register class.
Instead, add a generic pseudo instruction, which can be used by all
targets and register classes.
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
All Mach-O targets have this property, so just remove this variable,
which could lure contributors to add unneeded object file format
specific properties.
This patch makes LLVM emit line 0 source locations for nops emitted as
basic block padding.
---------
Co-authored-by: Orlando Cazalet-Hyams <orlando.hyams@sony.com>
This avoids another unserializable field. Move the DbgInfoAvailable
field into the AsmPrinter, which is only really a cache/convenience
bit for checking a direct IR module metadata check.
Enabled in clang using:
-fptrauth-indirect-gotos
and at the IR level using function attribute:
"ptrauth-indirect-gotos"
Signing uses IA and a per-function integer discriminator. The
discriminator isn't ABI-visible, and is currently:
ptrauth_string_discriminator("<function_name> blockaddress")
A sufficiently sophisticated frontend could benefit from per-indirectbr
discrimination, which would need additional machinery, such as allowing
"ptrauth" bundles on indirectbr. For our purposes, the simple scheme
above is sufficient.
This approach doesn't support subtracting label addresses and using
the result as offsets, because each label address is signed.
Pointer arithmetic on signed pointers corrupts the signature bits,
and because label address expressions aren't typed beyond void*,
we can't do anything reliably intelligent on the arithmetic exprs.
Not signing addresses when used to form offsets would allow
easily hijacking control flow by overwriting the offset.
This diagnoses the basic cases (`&&lbl2 - &&lbl1`) in the frontend,
while we evaluate either alternative implementations (e.g., lowering
blockaddress to a bb number, and indirectbr to a checked jump-table),
or better diagnostics (both at the frontend level and on unencodable
IR constants).
Summary:
- Remove wrappers in `MachineDominatorTree`.
- Remove `MachineDominatorTree` update code in
`MachineBasicBlock::SplitCriticalEdge`.
- Use `MachineDomTreeUpdater` in passes which call
`MachineBasicBlock::SplitCriticalEdge` and preserve
`MachineDominatorTreeWrapperPass` or CFG analyses.
Commit abea99f65a97248974c02a5544eaf25fc4240056 introduced related
methods in 2014. Now we have SemiNCA based dominator tree in 2017 and
dominator tree updater, the solution adopted here seems a bit outdated.
If a function's address is taken, which means it may be called via a function pointer,
we need the function descriptor for it.
Otherwise, the function descriptor can be omitted for external symbols.
getSectionIDNum may return same value for two different MBBSectionID.
e.g. A Cold type MBBSectionID with number 0 and a Default type
MBBSectionID with number 2 get same value 2 from getSectionIDNum. This
may lead to overwrite of MBBSectionRanges. Using MBBSectionID itself
as DenseMap key is better choice.
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
Timers are an out-of-line function call and a global variable access,
here twice per emitted instruction. At this granularity, not only the
time results become skewed, but the timers also add a performance
overhead when profiling is disabled. Also outside of the innermost loop,
timers add a measurable overhead. As this is quite expensive for a
mostly unused profiling facility, remove the timers.
Fixes#39650.
Currently, an AsmPrinterHandler has several methods that allow to
dynamically hook in unwind or debug info emission, e.g. at begin/end of
every function or instruction. The class hierarchy and the actually
overridden functions are as follows:
(SymSz=setSymbolSize, mFE=markFunctionEnd, BBS=BasicBlockSection,
FL=Funclet; b=beginX, e=endX)
SymSz Mod Fn mFE BBS FL Inst
AsmPrinterHandler - - - - - - -
` PseudoProbeHandler - - - - - - -
` WinCFGuard - e e - - - -
` EHStreamer - - - - - - -
` DwarfCFIException - e be - be - -
` ARMException - - be e - - -
` AIXException - - e - - - -
` WinException - e be e - be -
` WasmException - e be - - - -
` DebugHandlerBase - b be - be - be
` BTFDebug - e - - - - b
` CodeViewDebug - be - - - - b
` DWARFDebug yes be - - - - b
Doing virtual function calls per instruction is costly and useless when
the called function does nothing.
This commit performs the following clean-up/improvements:
- PseudoProbeHandler is no longer an AsmPrinterHandler -- it used
nothing of its functionality to hook in at the possible points. This
avoids virtual function calls when a pseudo probe printer is present.
- DebugHandlerBase is no longer an AsmPrinterHandler, but a separate
base class. DebugHandlerBase is the only remaining "hook" for begin/end
instruction and setSymbolSize (only used by DWARFDebug). begin/end for
function and basic block sections are never overriden and therefore are
no longer virtual. (Originally I intended there to be only one debug
handler, but BPF as the only target supports two at the same time: DWARF
and BTF.)
- AsmPrinterHandler no longer has begin/end instruction and
setSymbolSize hooks -- these were only used by DebugHandlerBase. This
avoid iterating over handlers in every instruction.
AsmPrinterHandler Mod Fn mFE BBS FL
` WinCFGuard e e - - -
` EHStreamer - - - - -
` DwarfCFIException e be - be -
` ARMException - be e - -
` AIXException - e - - -
` WinException e be e - be
` WasmException e be - - -
SymSz Mod Fn BBS Inst
DebugHandlerBase - b be be be
` BTFDebug - e b
` CodeViewDebug - be b
` DWARFDebug yes be b
PseudoProbeHandler (no shared methods)
To continue allowing external users (e.g., Julia) to hook in at every
instruction, a new method addDebugHandler is exposed.
This results in a performance improvement, especially in the -O0 -g0
case with unwind information (e.g., JIT baseline).
This reverts commit ab58b6d58edf6a7c8881044fc716ca435d7a0156.
In `CodeGen/Generic/MachineBranchProb.ll`, `llc` crashed with dumped MIR
when targeting PowerPC. Move test to `llc/new-pm`, which is X86
specific.
This reverts commit 245491a9f384e4c53421196533c2a2b693efaf8d ("[MC] Disable MCAssembler based constant folding for DwarfDebug")
and cb09b5f3d53e5b7b4452bb3db78dca79fc9b3f17 ("[MC] Disable MCAssembler based constant folding for compact unwind and emitJumpTableEntry").
Checking the relative order of FA and FB is now faster due to
de19f7b6d46f1c38e10e604154f0fdaaffde9ebd ("[MC] Replace fragment ilist with singly-linked lists").
There are only three actual uses of the section kind in MCSection:
isText(), XCOFF, and WebAssembly. Store isText() in the MCSection, and
store other info in the actual section variants where required.
ELF and COFF flags also encode all relevant information, so for these
two section variants, remove the SectionKind parameter entirely.
This allows to remove the string switch (which is unnecessary and
inaccurate) from createELFSectionImpl. This was introduced in
[D133456](https://reviews.llvm.org/D133456), but apparently, it was
never hit for non-writable sections anyway and the resulting kind was
never used.
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
Lower global references to ptrauth constants into `@AUTH` `MCExpr`'s.
The logic is common for MachO and ELF - test both.
---------
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Similar to commit 245491a9f384e4c53421196533c2a2b693efaf8d for DwarfDebug.
This completely disables the expensive MCFragment walk code in
`AttemptToFoldSymbolOffsetDifference` when compiling sqlite3.i for
macOS.
In the future, we should try enabling the MCFragment walk only for
constructs like `.if . -_start == 1` and `.subsection a-b` and
remove these `setUseAssemblerInfoForParsing`.
Related to the poor performance of MCAssembler based constant folding
(see `bool MCExpr::evaluateAsAbsolute(int64_t &Res, const MCAssembler *Asm) const` and
`AttemptToFoldSymbolOffsetDifference`),
commit 9500a5d02e23f9b43294e5f662ac099f8989c0e4 (#91082) caused -O0 -g
compile time regression.
9500a5d02e23f9b43294e5f662ac099f8989c0e4 special cased .eh_frame FDE
emitting. This patch adds a special case to .debug_* emitting as well to
mitigate the rest regression.
The MCAssembler based constant folding strategy should be improved to
remove the two special cases.
If `LLVM_APPEND_VC_REV` is on, add the git revision to the `.file`
string. The revision can be set with `LLVM_FORCE_VC_REVISION`.
Before:
`.file "git_revision.cpp",,"LLVM version 19.0.0git"`
After:
`.file "git_revision.cpp",,"LLVM version 19.0.0git (LLVM_REVISION)"`
-fsanitize=function emits a signature and function hash before a
function. Similar to 7f6e2c9, these can be sheared off when
`.subsections_via_symbols` is used.
This change uses the same technique 7f6e2c9 introduced for prefixes:
emitting a symbol for the metadata, then marking the actual function
entry as an .alt_entry symbol.
`clang -c -masm=intel` compiling a source file with file scope basic asm
incorrectly uses the AT&T dialect.
```
% cat a.c
asm("mov rax, rax");
% clang a.c -c -masm=intel
<inline asm>:1:1: error: unknown use of instruction mnemonic without a size suffix
mov rax, rax
^
```
Fix this by setting the assembler dialect from the MCAsmInfo object.
Note: `clang -c -flto -masm=intel a.c` still fails because of
https://reviews.llvm.org/D82862 for #34830: it tried to support AT&T
syntax for clang-cl, but the forced AT&T syntax is not compatible with
intended Intel syntax.
Pull Request: https://github.com/llvm/llvm-project/pull/85367
This is a small part of #70452, attempting to take a small simpler part
of it in isolation to simplify what remains. It changes the getSpillSize,
getFoldedSpillSize, getRestoreSize and getFoldedRestoreSize methods to return
optional<LocationSize> instead of unsigned. The code is intended to be the
same, keeping the optional<> to specify when there was no size found, with some
minor adjustments to make sure that unknown (~UINT64_C(0)) sizes are handled
sensibly. Hopefully as more unsigned's are converted to LocationSize's the use
of ~UINT64_C(0) can be cleaned up too.
The current implementation of aliases tries to remove all the aliases in
the module to prevent the generic version of `AsmPrinter` from emitting
them incorrectly. Unfortunately, if the aliases are used this will fail.
Instead let's override the function to print aliases directly.
In addition, the declarations of the alias functions must occur before
the uses. To fix this we emit alias declarations as part of
`emitDeclarations` and only emit the `.alias` directives at the end
(where we can assume the aliasee has also already been declared).
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.
First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.
| | |
|--|--|
| Address of the function | Function Address |
| Number of basic blocks in this function | NumBlocks |
| BB entry 1
| BB entry 2
| ...
| BB entry #NumBlocks
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section section as before: the base address of the section, its
number of blocks, and BB entries for its basic block. The first section
in the BB address map is always the function entry section.
| | |
|--|--|
| Number of sections for this function | NumBBRanges |
| Section 1 begin address | BaseAddress[1] |
| Number of basic blocks in section 1 | NumBlocks[1] |
| BB entries for Section 1
|..................|
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges |
NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compiling
unit, or for none (depending on `TargetOptions::BBAddrMap`).
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.
We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handing to their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
- New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
'-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF
sections has landed, there is no longer a need to keep around the
mbb-profile-dump flag.
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.
Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of the
AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for
other targets, and generate most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.
The other changes are supporting changes, to handle the new constructs
generated by that pass.
There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.
There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.
I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.
Thanks to @dpaoliello for extensive testing and suggestions.
(Originally posted as https://reviews.llvm.org/D157547 .)
Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF
section. Implements filecheck tests to verify emitting new fields.
This patch emits optional PGO related analyses into the bb-addr-map ELF
section during AsmPrinter. This currently supports Function Entry Count,
Machine Block Frequencies. and Machine Branch Probabilities. Each is
independently enabled via the `feature` byte of `bb-addr-map` for the given
function.
A part of [RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis](https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902).
... by lowering them as lazy resolve-on-first-use symbol resolvers. Note that this is subtly different timing than on ELF platforms, where ifunc resolution happens at load time.
Since ld64 and ld-prime don't support all the cases we need for these, we lower them manually in the AsmPrinter.
This works around an AIX assembler and linker bug. If the
-fno-integrated-as and -frecord-command-line options are used but
there's no actual code in the source file, the assembler creates an
object file with only an .info section. The AIX linker rejects such an
object file.
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.