Do not recommend the strict mode to the user when ADR relaxation fails
on a non-simple function, i.e. a function with unknown CFG.
We cannot rely on relocations to reconstruct compiler-generated jump
tables for AArch64, hence strict mode does not work as intended.
gas has supported " quoted symbols since 2015.
Both \ and " need to be escaped.
https://sourceware.org/pipermail/binutils/2015-August/090003.html
We don't unescape \\ or \" in assembly strings, leading to clang -c
--save-temps vs clang -c difference for the following C code:
```
int x asm("a\"\\b");
```
Fix#138390
MC/COFF/safeseh.h looks incorrect. \01 in `.safeseh "\01foo"` is not a
correct escape sequence. Change it to \\
Pull Request: https://github.com/llvm/llvm-project/pull/138817
This patch adds code generation for RISCV64 instrumentation.The work
involved includes the following three points:
a) Implements support for instrumenting direct function call and jump
on RISC-V which relies on , Atomic instructions
(used to increment counters) are only available on RISC-V when the A
extension is used.
b) Implements support for instrumenting direct function inderect call
by implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
need to accurately record the target address and IndCallID to ensure
the correct recording of the indirect call counters.
c)Implemented the RISCV64 Bolt runtime library, implemented some system
call interfaces through embedded assembly. Get the difference between
runtime addrress of .text section andstatic address in section header
table, which in turn can be used to search for indirect call
description.
However, the community code currently has problems with relocation in
some scenarios, but this has nothing to do with instrumentation. We
may continue to submit patches to fix the related bugs.
To handle relative vftable, which is enabled with clang option
`-fexperimental-relative-c++-abi-vtables`, we look for PC relative
relocations whose fixup locations fall in vtable address ranges.
For such relocations, actual target is just virtual function itself,
and the addend is to record the distance between vtable slot for
target virtual function and the first virtual function slot in vtable,
which is to match generated code that calls virtual function. So
we can skip the logic of handling "function + offset" and directly
save such relocations for future fixup after new layout is known.
A few tests generate a statically-linked position-independent executable
with `-nostdlib -Wl,--unresolved-symbols=ignore-all -pie` (`%clang`) and
test PLT handling. (--unresolved-symbols=ignore-all suppresses undefined
symbol errors and serves as a convenience hack.)
This relies on an unguaranteed linker behavior: a statically-linked PIE
does not necessarily generate PLT entries.
While current lld generates a PLT entry, it will change to suppress the
PLT entry to simplify internal handling and improve consistency.
(The behavior has no consistency in GNU ld, some ports generated a
.dynsym entry while some don't. While most seem to generate a PLT entry
but some ports use a weird `R_*_NONE` relocation.)
1. With a Clang that doesn't default to GNU extensions they need to be enabled explicitly.
2. The X86 directory lit config sets it already, there's no reason for this test to do it by itself.
3. The C frontend executable will fail if there's for example a Clang resource file for the C++ mode that sets C++-specific options:
```
+ /home/tambre/dev/llvm/build/bin/clang --target=x86_64-unknown-linux-gnu -fPIE -fuse-ld=lld -Wl,--unresolved-symbols=ignore-all -pie -fPIC -shared /home/tambre/dev/llvm/bolt/test/R_ABS.pic.lld.cpp -o /home/tambre/dev/llvm/build/tools/bolt/test/Output/R_ABS.pic.lld.cpp.tmp.so -Wl,-q -fuse-ld=lld
clang: warning: argument unused during compilation: '-pie' [-Wunused-command-line-argument]
error: invalid argument '-std=c23' not allowed with 'C++'
```
We used to emit EH trampolines for PIE/DSO whenever a function fragment
contained a landing pad outside of it. However, it is common to have all
landing pads in a cold fragment even when their throwers are in a hot
one.
To reduce the number of trampolines, analyze landing pads for any given
function fragment, and if they all belong to the same (possibly
different) fragment, designate that fragment as a landing pad fragment
for the "thrower" fragment. Later, emit landing pad fragment symbol as
an LPStart for the thrower LSDA.
Exempt special symbols (hot text/data and _end symbol) from normal
handling. We only need to set their value and make them absolute.
If these symbols are handled as normal symbols and if they alias
functions we may create non-sensical symbols, e.g. __hot_start.cold.
Test Plan: updated hot-end-symbol.s
Reviewers: maksfb, rafaelauler, ayermolo, dcci
Reviewed By: dcci, maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/92713
When we rewrite dynamic relocations, there could be cases where they
reference code locations inside functions that were rewritten. When this
happens, we need to precisely map old address to a new one. Until we can
reliably perform the mapping, detect such condition and issue an error
refusing to write a broken binary.
If a jump table has entries at the end that are a result of
__builtin_unreachable() targets, BOLT can confuse them with function
pointers. In such case, we should exclude these targets from the table
as we risk incorrectly updating the function pointers. It is safe to
exclude them as branching on such targets is considered an undefined
behavior.
Test was failing when only X86 was specified for LLVM_TARGETS_TO_BUILD.
Changed so that it will now report unsupporeted.
For "X86;AArch64" it still passes.
For "X86" reports UNSUPPORTED: BOLT :: runtime/instrument-wrong-target.s
(1 of 1)
NFC processing time script identifies tests by output filename.
When `/dev/null` is used as output filename, we're unable to tell the
source test, and the reports are unhelpful.
Replace `/dev/null/` with `%t.null` which resolves the issue.
Whenever LPStartEncoding was different from DW_EH_PE_omit, we used to
miscalculate LPStart. As a result, landing pads were assigned wrong
addresses. Fix that.
Currently strict mode is used to expand number of optimized functions,
not to shrink it. Revert the option usage in the pass, so passing strict
option would relax adr instruction even if there are no nops around it.
Also add check for nop after adr instruction.
Closes https://github.com/llvm/llvm-project/issues/63097
Before merging please make sure the change to
bolt/include/bolt/Passes/StokeInfo.h is correct.
bolt/include/bolt/Passes/StokeInfo.h
```diff
// This Pass solves the two major problems to use the Stoke program without
- // proting its code:
+ // probing its code:
```
I'm still not happy about the awkward wording in this comment.
bolt/include/bolt/Passes/FixRelaxationPass.h
```
$ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p'
// This file declares the FixRelaxations class, which locates instructions with
// wrong targets and fixes them. Such problems usually occures when linker
// relaxes (changes) instructions, but doesn't fix relocations types properly
// for them.
$
```
bolt/docs/doxygen.cfg.in
bolt/include/bolt/Core/BinaryContext.h
bolt/include/bolt/Core/BinaryFunction.h
bolt/include/bolt/Core/BinarySection.h
bolt/include/bolt/Core/DebugData.h
bolt/include/bolt/Core/DynoStats.h
bolt/include/bolt/Core/Exceptions.h
bolt/include/bolt/Core/MCPlusBuilder.h
bolt/include/bolt/Core/Relocation.h
bolt/include/bolt/Passes/FixRelaxationPass.h
bolt/include/bolt/Passes/InstrumentationSummary.h
bolt/include/bolt/Passes/ReorderAlgorithm.h
bolt/include/bolt/Passes/StackReachingUses.h
bolt/include/bolt/Passes/StokeInfo.h
bolt/include/bolt/Passes/TailDuplication.h
bolt/include/bolt/Profile/DataAggregator.h
bolt/include/bolt/Profile/DataReader.h
bolt/lib/Core/BinaryContext.cpp
bolt/lib/Core/BinarySection.cpp
bolt/lib/Core/DebugData.cpp
bolt/lib/Core/DynoStats.cpp
bolt/lib/Core/Relocation.cpp
bolt/lib/Passes/Instrumentation.cpp
bolt/lib/Passes/JTFootprintReduction.cpp
bolt/lib/Passes/ReorderData.cpp
bolt/lib/Passes/RetpolineInsertion.cpp
bolt/lib/Passes/ShrinkWrapping.cpp
bolt/lib/Passes/TailDuplication.cpp
bolt/lib/Rewrite/BoltDiff.cpp
bolt/lib/Rewrite/DWARFRewriter.cpp
bolt/lib/Rewrite/RewriteInstance.cpp
bolt/lib/Utils/CommandLineOpts.cpp
bolt/runtime/instr.cpp
bolt/test/AArch64/got-ld64-relaxation.test
bolt/test/AArch64/unmarked-data.test
bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s
bolt/test/X86/Inputs/linenumber.cpp
bolt/test/X86/double-jump.test
bolt/test/X86/dwarf5-call-pc-function-null-check.test
bolt/test/X86/dwarf5-split-dwarf4-monolithic.test
bolt/test/X86/dynrelocs.s
bolt/test/X86/fallthrough-to-noop.test
bolt/test/X86/tail-duplication-cache.s
bolt/test/runtime/X86/instrumentation-ind-calls.s
BOLT currently hooks its its instrumentation finalization function via
`DT_FINI`. However, this method of calling finalization routines is not
supported anymore on newer ABIs like RISC-V. `DT_FINI_ARRAY` is
preferred there.
This patch adds support for hooking into `DT_FINI_ARRAY` instead if the
binary does not have a `DT_FINI` entry. If it does, `DT_FINI` takes
precedence so this patch should not change how the currently supported
instrumentation targets behave.
`DT_FINI_ARRAY` points to an array in memory of `DT_FINI_ARRAYSZ` bytes.
It consists of pointer-length entries that contain the addresses of
finalization functions. However, the addresses are only filled-in by the
dynamic linker at load time using relative relocations. This makes
hooking via `DT_FINI_ARRAY` a bit more complicated than via `DT_FINI`.
The implementation works as follows:
- While scanning the binary: find the section where `DT_FINI_ARRAY`
points to, read its first dynamic relocation and use its addend to find
the address of the fini function we will use to hook;
- While writing the output file: overwrite the addend of the dynamic
relocation with the address of the runtime library's fini function.
Updating the dynamic relocation required a bit of boiler plate: since
dynamic relocations are stored in a `std::multiset` which doesn't
support getting mutable references to its items, functions were added to
`BinarySection` to take an existing relocation and insert a new one.
Currently we were testing only the binaries compiled with O0, which
results in indirect call to the IFUNC trampoline and the trampoline has
associated IFUNC symbol with it. Compile with O3 results in direct
calling the IFUNC trampoline and no symbols are associated with it, the
IFUNC symbol address becomes the same as IFUNC resolver address. Since
no symbol was associated the BF was not created before PLT analyze and
be the algorithm we're going to analyze target relocation. As we're
expecting the JUMP relocation we're also expecting the associated symbol
with it to be presented. But for IFUNC relocation the IRELATIVE
relocation is used and no symbol is associated with it, the addend value
is pointing on the target symbol, so we need to find BF using it and use
it's symbol in this situation. Currently this is checked only for
AArch64 platform, so I've limited it in code to use this logic only for
this platform, although I wouldn't be surprised if other platforms needs
to activate this logic too.
This could happen, for example, when instrumenting an AArch64 binary on
an x86 host because the instrumentation library is always built for the
host.
Note that this check will probably need to be refined in the future as
merely having the same architecture does not guarantee objects can be
linked. For example, on RISC-V, the float ABI of all objects should
match.
In large code model, the address of GOT is calculated by the
static linker via R_X86_GOTPC64 reloc applied against a MOVABSQ
instruction. In the final binary, it can be disassembled as a regular
immediate, but because such immediate is the result of PC-relative
pointer arithmetic, we need to parse this relocation and update this
calculation whenever we move code, otherwise we break the code trying
to read GOT.
A test case showing how GOT is accessed was provided.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D158911
Fix tests that are failing in cross-compilation after D151920
(https://lab.llvm.org/buildbot/#/builders/221/builds/17715):
- instrumentation-ind-call, basic-instrumentation: add -mno-outline-atomics flag to runtime lib
- bolt-address-translation-internal-call, internal-call-instrument: add %cflags
- meta-merge-fdata: restrict to x86_64
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D159094
Since the issue with trap value is fixed in D158191, it now should pass
on both platforms.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D158899
Since the test executes instrumented version of the binary, move it under
runtime/X86. Note that it can be adjusted to also run under AArch64 now that
instrumentation is supported.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159298
The relationship of X86 registers is shown in the diagram. BL and BH do
not have a direct alias relationship. However, if the BH register cannot be
swapped, then the BX/EBX/RBX registers cannot be swapped as well, which
means that BL register also cannot be swapped. Therefore, in the presence
of BX/EBX/RBX registers, BL and BH have an alias relationship.
┌────────────────┐
│ RBX │
├────┬───────────┤
│ │ EBX │
├────┴──┬────────┤
│ │ BX │
├───────┼───┬────┤
│ │BH │BL │
└───────┴───┴────┘
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D155098
BOLT-ERROR and BOLT-WARNING messages are output to stderr which is not captured
by piping to FileCheck. Redirect stderr to stdout to fix that in tests.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156340
We identify instructions to be instrumented based on Offset annotation.
BOLT "expands" conditional tail calls into a conditional jump to a basic block
with unconditional tail call. Move Offset annotation from former CTC to the tail
call.
For expanded CTC we keep Offset attached to the original instruction which is
converted into a regular conditional jump, while leaving the newly created tail
call without an Offset annotation. This leads to attempting the instrumentation
of the conditional jump which points to the basic block with an inherited input
offset thus creating an invalid edge description. At the same time, the newly
created tail call is skipped entirely which means we're not creating a call
description for it.
If we instead reassign Offset annotation from the conditional jump to the tail
call we fix both issues. The conditional jump will be skipped not creating an
invalid edge description, while tail call will be handled properly (unformly
with regular calls).
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156389
Align perf reader to fdata behavior by sorting BranchData after reading samples,
in the same way as DataReader:
20c66a0c66/bolt/lib/Profile/DataReader.cpp (L1239)
Namely, that order affects CallSiteInfo annotations which determine the
construction order of CallGraph, which in turn affects function reordering.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D152731
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: jhenderson, #libc, Mordante, sivachandra
Differential Revision: https://reviews.llvm.org/D150784
Make retpoline functions invariant of X86 register numbers.
retpoline-synthetic.test is known to fail NFC testing due to shifting
register numbers. Use canonical register names instead of tablegen
numbers.
Before:
```
__retpoline_r51_
__retpoline_mem_r58+DATAat0x200fe8
__retpoline_mem_r51+0
__retpoline_mem_r132+0+8*53
```
After:
```
__retpoline_%rax_
__retpoline_mem_%rip+DATAat0x200fe8
__retpoline_mem_%rax+0
__retpoline_mem_%r12+0+8*%rbx
```
Test Plan:
- Revert 67bd3c58c0c7389e39c5a2f4d3b1a30459ccf5b7 that touches X86RegisterInfo.td.
- retpoline-synthetic.test passes in NFC mode with this diff, fails without it.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D150138
Remove the usage of StringMap in places where the iteration order
affects the output since the iteration over StringMap is
non-deterministic.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D145194
Avoid replacing one adr instruction with two adrp+add by utilizing linker-provided nops
when they are present. By doing so we preserve relative offsets of next instructions
in a function which reduces chances to break undetected jump tables. This commit makes
release-mode lld-linked clang, lld and etc work after BOLT.
Reviewed By: rafauler, yota9
Differential Revision: https://reviews.llvm.org/D143887
The test has 3 invocations with 1M iterations each, which adds delay to fast
check-bolt testing. Reduce the number to 1K.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D139651
This patch adds the huge pages support (-hugify) for PIE/no-PIE
binaries. Also returned functionality to support the kernels < 5.10
where there is a problem in a dynamic loader with the alignment of
pages addresses.
Differential Revision: https://reviews.llvm.org/D129107
This prepares for an upcoming change to make --print-imm-hex the default
behavior of llvm-objdump. These tests were updated in a semi-automatic
fashion.
See D136972 for details.