On RISC-V, there are certain relocations that target a specific
instruction instead of a more abstract location like a function or basic
block. Take the following example that loads a value from symbol `foo`:
```
nop
1: auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(1b)(t0)
```
This results in two relocation:
- auipc: `R_RISCV_PCREL_HI20` referencing `foo`;
- ld: `R_RISCV_PCREL_LO12_I` referencing to local label `1` which points
to the auipc instruction.
It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps
referring to the auipc instruction; if not, the program will fail to
assemble. However, BOLT currently does not guarantee this.
BOLT currently assumes that all local symbols are jump targets and
always starts a new basic block at symbol locations. The example above
results in a CFG the looks like this:
```
.BB0:
nop
.BB1:
auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(.BB1)(t0)
```
While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation
points to the correct instruction), it has two downsides:
- Too many basic blocks are created (the example above is logically only
one yet two are created);
- If instructions are inserted in `.BB1` (e.g., by instrumentation),
things will break since the label will not point to the auipc anymore.
This patch proposes to fix this issue by teaching BOLT to track labels
that should always point to a specific instruction. This is implemented
as follows:
- Add a new annotation type (`kLabel`) that allows us to annotate
instructions with an `MCSymbol *`;
- Whenever we encounter a relocation type that is used to refer to a
specific instruction (`Relocation::isInstructionReference`), we
register it without a symbol;
- During disassembly, whenever we encounter an instruction with such a
relocation, create a symbol for its target and store it in an offset
to symbol map (to ensure multiple relocations referencing the same
instruction use the same label);
- After disassembly, iterate this map to attach labels to instructions
via the new annotation type;
- During emission, emit these labels right before the instruction.
I believe the use of annotations works quite well for this use case as
it allows us to reliably track instruction labels. If we were to store
them as offsets in basic blocks, it would be error prone to keep them
updated whenever instructions are inserted or removed.
I have chosen to add labels as first-class annotations (as opposed to a
generic one) because the documentation of `MCAnnotation` suggests that
generic annotations are to be used for optional metadata that can be
discarded without affecting correctness. As this is not the case for
labels, a first-class annotation seemed more appropriate.
In large code model, the address of GOT is calculated by the
static linker via R_X86_GOTPC64 reloc applied against a MOVABSQ
instruction. In the final binary, it can be disassembled as a regular
immediate, but because such immediate is the result of PC-relative
pointer arithmetic, we need to parse this relocation and update this
calculation whenever we move code, otherwise we break the code trying
to read GOT.
A test case showing how GOT is accessed was provided.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D158911
Calls on RISC-V are typically compiled to `auipc`/`jalr` pairs to allow
a maximum target range (32-bit pc-relative). In order to optimize calls
to near targets, linker relaxation may replace those pairs with, for
example, single `jal` instructions.
To allow BOLT to freely reassign function addresses in relaxed binaries,
this patch proposes the following approach:
- Expand all relaxed calls back to `auipc`/`jalr`;
- Rely on JITLink to relax those back to shorter forms where possible.
This is implemented by detecting all possible call instructions and
replacing them with `PseudoCALL` (or `PseudoTAIL`) instructions. The
RISC-V backend then expands those and adds the necessary relocations for
relaxation.
Since BOLT generally ignores pseudo instruction, this patch makes
`MCPlusBuilder::isPseudo` virtual so that `RISCVMCPlusBuilder` can
override it to exclude `PseudoCALL` and `PseudoTAIL`.
To ensure JITLink knows about the correct section addresses while
relaxing, reassignment of addresses has been moved to a post-allocation
pass. Note that this is probably the time it had to be done in the
first place since in `notifyResolved` (where it was done before), all
symbols are supposed to be resolved already.
Depends on D159082
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D159089
Jump tables may contain a function start address. One real-world example
is when a target basic block contains a recursive tail call that is
later optimized/folded into a jump table target.
While analyzing a jump table, we treat start address similar to an
address past the end of the containing function (a result of
__builtin_unreachable), i.e. we require another "regular" entry for the
heuristic to proceed.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D156206
Just enough features are implemented to process a simple "hello world"
executable and produce something that still runs (including libc calls).
This was mainly a matter of implementing support for various
relocations. Currently, the following are handled:
- R_RISCV_JAL
- R_RISCV_CALL
- R_RISCV_CALL_PLT
- R_RISCV_BRANCH
- R_RISCV_RVC_BRANCH
- R_RISCV_RVC_JUMP
- R_RISCV_GOT_HI20
- R_RISCV_PCREL_HI20
- R_RISCV_PCREL_LO12_I
- R_RISCV_RELAX
- R_RISCV_NONE
Executables linked with linker relaxation will probably fail to be
processed. BOLT relocates .text to a high address while leaving .plt at
its original (low) address. This causes PC-relative PLT calls that were
relaxed to a JAL to not fit their offset in an I-immediate anymore. This
is something that will be addressed in a later patch.
Changes to the BOLT core are relatively minor. Two things were tricky to
implement and needed slightly larger changes. I'll explain those below.
The R_RISCV_CALL(_PLT) relocation is put on the first instruction of a
AUIPC/JALR pair, the second does not get any relocation (unlike other
PCREL pairs). This causes issues with the combinations of the way BOLT
processes binaries and the RISC-V MC-layer handles relocations:
- BOLT reassembles instructions one by one and since the JALR doesn't
have a relocation, it simply gets copied without modification;
- Even though the MC-layer handles R_RISCV_CALL properly (adjusts both
the AUIPC and the JALR), it assumes the immediates of both
instructions are 0 (to be able to or-in a new value). This will most
likely not be the case for the JALR that got copied over.
To handle this difficulty without resorting to RISC-V-specific hacks in
the BOLT core, a new binary pass was added that searches for
AUIPC/JALR pairs and zeroes-out the immediate of the JALR.
A second difficulty was supporting ABS symbols. As far as I can tell,
ABS symbols were not handled at all, causing __global_pointer$ to break.
RewriteInstance::analyzeRelocation was updated to handle these
generically.
Tests are provided for all supported relocations. Note that in order to
test the correct handling of PLT entries, an ELF file produced by GCC
had to be used. While I tried to strip the YAML representation, it's
still quite large. Any suggestions on how to improve this would be
appreciated.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D145687
RuntimeDyld has been deprecated in favor of JITLink. [1] This patch
replaces all uses of RuntimeDyld in BOLT with JITLink.
Care has been taken to minimize the impact on the code structure in
order to ease the inspection of this (rather large) changeset. Since
BOLT relied on the RuntimeDyld API in multiple places, this wasn't
always possible though and I'll explain the changes in code structure
first.
Design note: BOLT uses a JIT linker to perform what essentially is
static linking. No linked code is ever executed; the result of linking
is simply written back to an executable file. For this reason, I
restricted myself to the use of the core JITLink library and avoided ORC
as much as possible.
RuntimeDyld contains methods for loading objects (loadObject) and symbol
lookup (getSymbol). Since JITLink doesn't provide a class with a similar
interface, the BOLTLinker abstract class was added to implement it. It
was added to Core since both the Rewrite and RuntimeLibs libraries make
use of it. Wherever a RuntimeDyld object was used before, it was
replaced with a BOLTLinker object.
There is one major difference between the RuntimeDyld and BOLTLinker
interfaces: in JITLink, section allocation and the application of fixups
(relocation) happens in a single call (jitlink::link). That is, there is
no separate method like finalizeWithMemoryManagerLocking in RuntimeDyld.
BOLT used to remap sections between allocating (loadObject) and linking
them (finalizeWithMemoryManagerLocking). This doesn't work anymore with
JITLink. Instead, BOLTLinker::loadObject accepts a callback that is
called before fixups are applied which is used to remap sections.
The actual implementation of the BOLTLinker interface lives in the
JITLinkLinker class in the Rewrite library. It's the only part of the
BOLT code that should directly interact with the JITLink API.
For loading object, JITLinkLinker first creates a LinkGraph
(jitlink::createLinkGraphFromObject) and then links it (jitlink::link).
For the latter, it uses a custom JITLinkContext with the following
properties:
- Use BOLT's ExecutableFileMemoryManager. This one was updated to
implement the JITLinkMemoryManager interface. Since BOLT never
executes code, its finalization step is a no-op.
- Pass config: don't use the default target passes since they modify
DWARF sections in a way that seems incompatible with BOLT. Also run a
custom pre-prune pass that makes sure sections without symbols are not
pruned by JITLink.
- Implement symbol lookup. This used to be implemented by
BOLTSymbolResolver.
- Call the section mapper callback before the final linking step.
- Copy symbol values when the LinkGraph is resolved. Symbols are stored
inside JITLinkLinker to ensure that later objects (i.e.,
instrumentation libraries) can find them. This functionality used to
be provided by RuntimeDyld but I did not find a way to use JITLink
directly for this.
Some more minor points of interest:
- BinarySection::SectionID: JITLink doesn't have something equivalent to
RuntimeDyld's Section IDs. Instead, sections can only be referred to
by name. Hence, SectionID was updated to a string.
- There seem to be no tests for Mach-O. I've tested a small hello-world
style binary but not more than that.
- On Mach-O, JITLink "normalizes" section names to include the segment
name. I had to parse the section name back from this manually which
feels slightly hacky.
[1] https://reviews.llvm.org/D145686#4222642
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D147544
Add helper methods and simplify cases where we want to check if two functions
are parent-child of each other (function-fragment relationship).
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D142668
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream
to encode the instruction into a SmallVector. The raw_ostream however
incurs some overhead for the actual encoding.
This change allows an MCCodeEmitter to directly emit an instruction into
a SmallVector without using a raw_ostream and therefore allow for
performance improvments in encoding. A default path that uses existing
raw_ostream implementations is provided.
Reviewed By: MaskRay, Amir
Differential Revision: https://reviews.llvm.org/D145791
This patch fixes few problems with supporting dynamic relocations in CI.
1. After dynamic relocations and functions were read search for dynamic
relocations located in functions. Currently we expected them only to be
relative and only to be in constant island. Mark islands of such
functions to have dynamic relocations and create CI access symbol on the
relocation offset, so the BD would be created for such place.
2. During function disassemble and handling address reference for
constant island check if the referred external CI has dynamic
relocation. And if it has one we would continue to refer original CI
rather then creating a local copy.
3. After function disassembly stage mark function that has dynamic reloc
in CI as non-simple. We don't want such functions to be optimized, since
such passes as split function would create 2 copies of CI which we
unable to support currently.
4. During updating output values for BF search for BD located in CI and
update their output locations.
5. On dynamic relocation patching stage search for binary data located
on relocation offset. If it was moved use new relocation offset value
rather then an old one.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D143748
ICF optimization runs multiple passes and the order in which functions
are folded could be dependent on the order they are being processed.
This order is indeterministic as functions are intermediately stored in
std::unordered_map<>. Note that this order is mostly stable, but is not
guaranteed to be and can change e.g. after switching to a different C++
library implementation.
Because the processing (and folding) order is indeterministic, the
previous way of calculating merged function call count could produce
different results.
Change the way we calculate the ICF call count to make it independent of
the function folding/processing order.
Mostly NFC as the output binary should remain the same, the change
affects only the console output.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D144807
`isChildOf` is a more concise name for the check. Also, there's no need to
test if the function is a fragment before doing `isChildOf` check.
Reviewed By: #bolt, rafauler, maksfb
Differential Revision: https://reviews.llvm.org/D142667
In lite mode, include split function fragments to the list of functions to
process even if a fragment has no samples. This is required to properly
detect and update split jump tables (jump tables that contain pointers to code
in the main and cold fragments).
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D140457
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This has the following advantages:
- std::shared_timed_mutex is macOS 10.12+ only. llvm::sys::RWMutex
automatically switches to a different implementation internally
when targeting older macOS versions.
- bolt only needs std::shared_mutex, not std::shared_timed_mutex.
llvm::sys::RWMutex automatically uses std::shared_mutex internally
where available.
std::shared_mutex and RWMutex have the same API, so no code changes
other than types and includes are needed.
Differential Revision: https://reviews.llvm.org/D138423
This patch replaces NoneType() and NoneType::None with None in
preparation for migration from llvm::Optional to std::optional.
In the std::optional world, we are not guranteed to be able to
default-construct std::nullopt_t or peek what's inside it, so neither
NoneType() nor NoneType::None has a corresponding expression in the
std::optional world.
Once we consistently use None, we should even be able to replace the
contents of llvm/include/llvm/ADT/None.h with something like:
using NoneType = std::nullopt_t;
inline constexpr std::nullopt_t None = std::nullopt;
to ease the migration from llvm::Optional to std::optional.
Differential Revision: https://reviews.llvm.org/D138376
matchLinkerVeneer() returns 3 if `Instruction` and the last
two instructions in `[Instructions.begin, Instructions.end())`
match the pattern
ADRP x16, imm
ADD x16, x16, imm
BR x16
BinaryContext.cpp used to use
--Count;
for (auto It = std::prev(Instructions.end()); Count != 0;
It = std::prev(It), --Count) {
...use It...
}
to walk these instructions. The first `--Count` skips the
instruction that's in `Instruction` instead of in `Instructions`.
The loop then walks over `Instructions`.
However, on the last iteration, this calls `std::prev()` on an
iterator that points at the container's begin(), which can blow
up.
Instead, use rbegin(), which sidesteps this issue.
Fixes test/AArch64/veneer-gold.s on a macOS host.
With this, check-bolt passes on macOS.
Differential Revision: https://reviews.llvm.org/D138313
If NewName twine has reference to the old name, then after
Section.Name = NewName.str(); this reference is invalidated,
so we cannot use NewName.str() anymore.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D137616
Always use non-symbolizing disassembler for instruction encoding
validation as symbols will be treated as undefined/zeros be the encoder
and causing byte sequence mismatches.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D136118
Simplify the logic of handling sections in BOLT. This change brings more
direct and predictable mapping of BinarySection instances to sections in
the input and output files.
* Only sections from the input binary will have a non-null SectionRef.
When a new section is created as a copy of the input section,
its SectionRef is reset to null.
* RewriteInstance::getOutputSectionName() is removed as the section name
in the output file is now defined by BinarySection::getOutputName().
* Querying BinaryContext for sections by name uses their original name.
E.g., getUniqueSectionByName(".rodata") will return the original
section even if the new .rodata section was created.
* Input file sections (with relocations applied) are emitted via MC with
".bolt.org" prefix. However, their name in the output binary is
unchanged unless a new section with the same name is created.
* New sections are emitted internally with ".bolt.new" prefix if there's
a name conflict with an input file section. Their original name is
preserved in the output file.
* Section header string table is properly populated with section names
that are actually used. Previously we used to include discarded
section names as well.
* Fix the problem when dynamic relocations were propagated to a new
section with a name that matched a section in the input binary.
E.g., the new .rodata with jump tables had dynamic relocations from
the original .rodata.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D135494
Put code that creates references to symbol+addend behind MCPlusBuilder.
Will use this later in validate memory references pass.
Reviewed By: #bolt, maksfb, yota9
Differential Revision: https://reviews.llvm.org/D134097
In non-pie binaries BOLT unconditionally converted type encoding
from indirect to absptr, which broke std exceptions since pointers
to their typeinfo were only assigned at runtime in .data section.
In this patch we preserve original encoding so that indirect
remains indirect and can be resolved at runtime, and absolute remains absolute.
Reviewed By: rafauler, maksfb
Differential Revision: https://reviews.llvm.org/D132484
This changes `FunctionFragment` from being used as a temporary proxy
object to access basic block ranges to a heap-allocated object that can
store fragment-specific information.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132050
This changes `FunctionFragment` from being used as a temporary proxy
object to access basic block ranges to a heap-allocated object that can
store fragment-specific information.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132050
This patch adds support to generate any number of sections that are
assigned to fragments of functions that are split more than two-way.
With this, a function's *nth* split fragment goes into section
`.text.cold.n`.
This also changes `FunctionLayout::erase` to make sure, that there are
no empty fragments at the end of the function. This sometimes happens
when blocks are erased from the function. To avoid creating symbols
pointing to these fragments, they need to be removed.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130521
This changes code emission such that it can emit specific function
fragments instead of scanning all basic blocks of a function and just
emitting those that are hot or cold.
To implement this, `FunctionLayout` explicitly distinguishes the "main"
fragment (i.e. the one that contains the entry block and is associated
with the original symbol) from "split" fragments. Additionally,
`BinaryFunction` receives support for multiple cold symbols - one for
each split fragment.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130052
Move logging into LLVM_DEBUG scope.
Remove redundant printing of jump table parents:
Old logging:
```
failed to analyze jump table in function _ZN12_GLOBAL__N_116InitHeaderSearch23Ad
dDefaultCIncludePathsERKN4llvm6TripleERKN5clang19HeaderSearchOptionsE/1(*2)
PIC Jump table JUMP_TABLE/_ZN12_GLOBAL__N_116InitHeaderSearch23AddDefaultCInclud
ePathsERKN4llvm6TripleERKN5clang19HeaderSearchOptionsE/1.1 for function _ZN12_GL
OBAL__N_116InitHeaderSearch23AddDefaultCIncludePathsERKN4llvm6TripleERKN5clang19
HeaderSearchOptionsE/1(*2) at 0x65996e0 with a total count of 0:
0x9dc
next jump table at 0x659a810 belongs to function _ZN5clang5Lexer40LexDependencyD
irectiveTokenWhileSkippingERNS_5TokenE
PIC Jump table JUMP_TABLE/_ZN5clang5Lexer40LexDependencyDirectiveTokenWhileSkipp
ingERNS_5TokenE.0 for function _ZN5clang5Lexer40LexDependencyDirectiveTokenWhile
SkippingERNS_5TokenE at 0x659a810 with a total count of 0:
jump table heuristic failure
```
New logging:
```
failed to analyze PIC Jump table JUMP_TABLE/_ZN12_GLOBAL__N_116InitHeaderSearch2
3AddDefaultCIncludePathsERKN4llvm6TripleERKN5clang19HeaderSearchOptionsE/1.1 for
function _ZN12_GLOBAL__N_116InitHeaderSearch23AddDefaultCIncludePathsERKN4llvm6T
ripleERKN5clang19HeaderSearchOptionsE/1(*2) at 0x65996e0 with a total count of 0:
absolute offset: 0x52ac58c
next PIC Jump table JUMP_TABLE/_ZN5clang5Lexer40LexDependencyDirectiveTokenWhile
SkippingERNS_5TokenE.0 for function _ZN5clang5Lexer40LexDependencyDirectiveToken
WhileSkippingERNS_5TokenE at 0x659a810 with a total count of 0:
jump table heuristic failure
```
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D131243
This diff uncovers an ASAN leak in getOrCreateJumpTable:
```
Indirect leak of 264 byte(s) in 1 object(s) allocated from:
#1 0x4f6e48c in llvm::bolt::BinaryContext::getOrCreateJumpTable ...
```
The removal of an assertion needs to be accompanied by proper deallocation of
a `JumpTable` object for which `analyzeJumpTable` was unsuccessful.
This reverts commit 52cd00cabf479aa7eb6dbb063b7ba41ea57bce9e.
Disassembly and branch target analysis are not decoupled, so any
analysis that depends on disassembly may not operate properly.
In specific, analyzeJumpTable uses instruction bounds check property.
A jump table was analyzed twice: (a) during disassembly, and (b) after
disassembly, so there are potentially some mismatched results.
In this update, functions that access JTs which fail the second check
will be marked as ignored.
Test Plan:
```
ninja check-bolt
```
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D130431
There are two assumptions regarding jump table:
(a) It is accessed by only one fragment, say, Parent
(b) All entries target instructions in Parent
For (a), BOLT stores jump table entries as relative offset to Parent.
For (b), BOLT treats jump table entries target somewhere out of Parent
as INVALID_OFFSET, including fragment of same split function.
In this update, we extend (a) and (b) to include fragment of same split
functinon. For (a), we store jump table entries in absolute offset
instead. In addition, jump table will store all fragments that access
it. A fragment uses this information to only create label for jump table
entries that target to that fragment.
For (b), using absolute offset allows jump table entries to target
fragments of same split function, i.e., extend support for split jump
table. This can be done using relocation (fragment start/size) and
fragment detection heuristics (e.g., using symbol name pattern for
non-stripped binaries).
For jump table targets that can only be reached by one fragment, we
mark them as local label; otherwise, they would be the secondary
function entry to the target fragment.
Test Plan
```
ninja check-bolt
```
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D128474
The gold linker veneers are written between functions without symbols,
so we to handle it specially in BOLT.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D129260
Since we now have +all feature for AArch64 disassembler, we can use it
in BOLT and allow it to disassemble all ARM instructions supported by LLVM.
Reviewed by: rafauler
Differential Revision: https://reviews.llvm.org/D129139
Added support for mixing monolithic DWARF5 with legacy DWARF, and monolithic legacy and DWARF5 split dwarf.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D128232
This reverts commit 425dda76e9fac93117289fd68a2abdfb1e4a0ba5.
This commit is currently causing BOLT to crash in one of our
binaries and needs a bit more checking to make sure it is safe
to land.
The gold linker veneers are written between functions without symbols,
so we to handle it specially in BOLT.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D128082