NFC processing time script identifies tests by output filename.
When `/dev/null` is used as output filename, we're unable to tell the
source test, and the reports are unhelpful.
Replace `/dev/null/` with `%t.null` which resolves the issue.
Whenever LPStartEncoding was different from DW_EH_PE_omit, we used to
miscalculate LPStart. As a result, landing pads were assigned wrong
addresses. Fix that.
Closes https://github.com/llvm/llvm-project/issues/63097
Before merging please make sure the change to
bolt/include/bolt/Passes/StokeInfo.h is correct.
bolt/include/bolt/Passes/StokeInfo.h
```diff
// This Pass solves the two major problems to use the Stoke program without
- // proting its code:
+ // probing its code:
```
I'm still not happy about the awkward wording in this comment.
bolt/include/bolt/Passes/FixRelaxationPass.h
```
$ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p'
// This file declares the FixRelaxations class, which locates instructions with
// wrong targets and fixes them. Such problems usually occures when linker
// relaxes (changes) instructions, but doesn't fix relocations types properly
// for them.
$
```
bolt/docs/doxygen.cfg.in
bolt/include/bolt/Core/BinaryContext.h
bolt/include/bolt/Core/BinaryFunction.h
bolt/include/bolt/Core/BinarySection.h
bolt/include/bolt/Core/DebugData.h
bolt/include/bolt/Core/DynoStats.h
bolt/include/bolt/Core/Exceptions.h
bolt/include/bolt/Core/MCPlusBuilder.h
bolt/include/bolt/Core/Relocation.h
bolt/include/bolt/Passes/FixRelaxationPass.h
bolt/include/bolt/Passes/InstrumentationSummary.h
bolt/include/bolt/Passes/ReorderAlgorithm.h
bolt/include/bolt/Passes/StackReachingUses.h
bolt/include/bolt/Passes/StokeInfo.h
bolt/include/bolt/Passes/TailDuplication.h
bolt/include/bolt/Profile/DataAggregator.h
bolt/include/bolt/Profile/DataReader.h
bolt/lib/Core/BinaryContext.cpp
bolt/lib/Core/BinarySection.cpp
bolt/lib/Core/DebugData.cpp
bolt/lib/Core/DynoStats.cpp
bolt/lib/Core/Relocation.cpp
bolt/lib/Passes/Instrumentation.cpp
bolt/lib/Passes/JTFootprintReduction.cpp
bolt/lib/Passes/ReorderData.cpp
bolt/lib/Passes/RetpolineInsertion.cpp
bolt/lib/Passes/ShrinkWrapping.cpp
bolt/lib/Passes/TailDuplication.cpp
bolt/lib/Rewrite/BoltDiff.cpp
bolt/lib/Rewrite/DWARFRewriter.cpp
bolt/lib/Rewrite/RewriteInstance.cpp
bolt/lib/Utils/CommandLineOpts.cpp
bolt/runtime/instr.cpp
bolt/test/AArch64/got-ld64-relaxation.test
bolt/test/AArch64/unmarked-data.test
bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s
bolt/test/X86/Inputs/linenumber.cpp
bolt/test/X86/double-jump.test
bolt/test/X86/dwarf5-call-pc-function-null-check.test
bolt/test/X86/dwarf5-split-dwarf4-monolithic.test
bolt/test/X86/dynrelocs.s
bolt/test/X86/fallthrough-to-noop.test
bolt/test/X86/tail-duplication-cache.s
bolt/test/runtime/X86/instrumentation-ind-calls.s
In large code model, the address of GOT is calculated by the
static linker via R_X86_GOTPC64 reloc applied against a MOVABSQ
instruction. In the final binary, it can be disassembled as a regular
immediate, but because such immediate is the result of PC-relative
pointer arithmetic, we need to parse this relocation and update this
calculation whenever we move code, otherwise we break the code trying
to read GOT.
A test case showing how GOT is accessed was provided.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D158911
Since the issue with trap value is fixed in D158191, it now should pass
on both platforms.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D158899
Since the test executes instrumented version of the binary, move it under
runtime/X86. Note that it can be adjusted to also run under AArch64 now that
instrumentation is supported.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159298
The relationship of X86 registers is shown in the diagram. BL and BH do
not have a direct alias relationship. However, if the BH register cannot be
swapped, then the BX/EBX/RBX registers cannot be swapped as well, which
means that BL register also cannot be swapped. Therefore, in the presence
of BX/EBX/RBX registers, BL and BH have an alias relationship.
┌────────────────┐
│ RBX │
├────┬───────────┤
│ │ EBX │
├────┴──┬────────┤
│ │ BX │
├───────┼───┬────┤
│ │BH │BL │
└───────┴───┴────┘
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D155098
BOLT-ERROR and BOLT-WARNING messages are output to stderr which is not captured
by piping to FileCheck. Redirect stderr to stdout to fix that in tests.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156340
We identify instructions to be instrumented based on Offset annotation.
BOLT "expands" conditional tail calls into a conditional jump to a basic block
with unconditional tail call. Move Offset annotation from former CTC to the tail
call.
For expanded CTC we keep Offset attached to the original instruction which is
converted into a regular conditional jump, while leaving the newly created tail
call without an Offset annotation. This leads to attempting the instrumentation
of the conditional jump which points to the basic block with an inherited input
offset thus creating an invalid edge description. At the same time, the newly
created tail call is skipped entirely which means we're not creating a call
description for it.
If we instead reassign Offset annotation from the conditional jump to the tail
call we fix both issues. The conditional jump will be skipped not creating an
invalid edge description, while tail call will be handled properly (unformly
with regular calls).
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156389
Align perf reader to fdata behavior by sorting BranchData after reading samples,
in the same way as DataReader:
20c66a0c66/bolt/lib/Profile/DataReader.cpp (L1239)
Namely, that order affects CallSiteInfo annotations which determine the
construction order of CallGraph, which in turn affects function reordering.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D152731
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: jhenderson, #libc, Mordante, sivachandra
Differential Revision: https://reviews.llvm.org/D150784
Make retpoline functions invariant of X86 register numbers.
retpoline-synthetic.test is known to fail NFC testing due to shifting
register numbers. Use canonical register names instead of tablegen
numbers.
Before:
```
__retpoline_r51_
__retpoline_mem_r58+DATAat0x200fe8
__retpoline_mem_r51+0
__retpoline_mem_r132+0+8*53
```
After:
```
__retpoline_%rax_
__retpoline_mem_%rip+DATAat0x200fe8
__retpoline_mem_%rax+0
__retpoline_mem_%r12+0+8*%rbx
```
Test Plan:
- Revert 67bd3c58c0c7389e39c5a2f4d3b1a30459ccf5b7 that touches X86RegisterInfo.td.
- retpoline-synthetic.test passes in NFC mode with this diff, fails without it.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D150138
Remove the usage of StringMap in places where the iteration order
affects the output since the iteration over StringMap is
non-deterministic.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D145194
The test has 3 invocations with 1M iterations each, which adds delay to fast
check-bolt testing. Reduce the number to 1K.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D139651
This patch adds the huge pages support (-hugify) for PIE/no-PIE
binaries. Also returned functionality to support the kernels < 5.10
where there is a problem in a dynamic loader with the alignment of
pages addresses.
Differential Revision: https://reviews.llvm.org/D129107
This adds a round of checks to memory references, looking for
incorrect references to jump table objects. Fix them by replacing the
jump table reference with another object reference + offset.
This solves bugs related to regular data references in code
accidentally being bound to a jump table, and this reference being
updated to a new (incorrect) location because we moved this jump
table.
Fixes#55004
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D134098
.bss section emitted by llvm-bolt (e.g. with instrumentation) is not a
real BSS section, i.e. it takes space in the output file. Hence the
order with respect to .data is not defined. Remove .bss from the test
and fix the buildbot failure.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D135475
While the order of new sections in the output binary was deterministic
in the past (i.e. there was no run-to-run variation), it wasn't always
rational as we used size to define the precedence of allocatable
sections within "code" or "data" groups (probably unintentionally).
Fix that by defining stricter section-ordering rules.
Other than the order of sections, this should be NFC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D135235
For functions with references to internal offsets from data, verify externally
referenced blocks against the set of jump table targets. Mark the function
as non-simple if there are any unclaimed data to code references.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D132495
For exception handling, LSDA call sites have to be emitted for each
fragment individually. With this patch, call sites and respective LSDA
symbols are generated and associated with each fragment of their
function, such that they can be used by the emitter.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D132052
If the first block of a fragment is also a landing pad, the landing pad
is not used if an exception is thrown. This is because the landing pad
is at the same start address that the corresponding LSDA describes. In
that case, the offset in the call site records to refer to that landing
pad is zero, and a zero offset is interpreted by the personality
function as "no handler" and ignored.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D132053
If BOLT instrumentation runtime uses XMM registers, it can interfere
with the user program causing crashes and unexpected behavior. This
happens as the instrumentation code preserves general purpose registers
only.
Build BOLT instrumentation runtime with "-mno-sse".
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D128960
When SplitFunctions pass adds a trampoline code for exception landing
pads (limited to shared objects), it may increase the size of the hot
fragment making it larger than the whole function pre-split. When this
happens, the pass reverts the splitting action by restoring the original
block order and marking all blocks hot.
However, if createEHTrampolines() added new blocks to the CFG and
modified invoke instructions, simply restoring the original block layout
will not suffice as the new CFG has more blocks.
For proper backout of the split, modify the original layout by merging
in trampoline blocks immediately before their matching targets. As a
result, the number of blocks increases, but the number of instructions
and the function size remains the same as pre-split.
Add an assertion for the number of blocks when updating a function
layout.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D128696
The SplitFunctions pass does not distinguish between various splitting
modes anymore. This change updates the command line interface to
reflect this behavior by deprecating values passed to the
--split-function option.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D128558
Add functionality to allow splitting code with C++ exceptions in shared
libraries and PIEs. To overcome a limitation in exception ranges format,
for functions with fragments spanning multiple sections, add trampoline
landing pads in the same section as the corresponding throwing range.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D127936
Replace a single dash with a double dash for options that have more
than a single letter.
llvm-bolt-wrapper.py has special treatment for output options such as
"-o" and "-w" causing issues when a single dash is used, e.g. for
"-write-dwp". The wrapper can be fixed as well, but using a double dash
has other advantages as well.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D127538
Replace "cache+" with "ext-tsp" in all BOLT tests
Test Plan:
```
ninja check-bolt
grep -rnw . -e "cache+"
```
no more tests containing "cache+"
"cache+" and "ext-tsp" are aliases
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126714
Align with an upstream change D120305 to make PIE the default on linux-gnu.
Add `-no-pie` to tests that require it.
Reviewed By: maksfb, yota9
Differential Revision: https://reviews.llvm.org/D123329
Matching an exact byte offset is fragile if a different version of compiler
is used (e.g. distro clang).
Resolves an issue with running with BOLT_CLANG_EXE + clang-12
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D117440
Summary:
The majority of tests in LLVM projects are using - instead of _ in the name,
i.e. `check-something.test` is preferred over `check_something.test`.
It makes sense for us to adopt the same naming scheme for our future tests and
to rename existing ones.
(cherry picked from FBD32185879)
Summary:
Change cmake config in BOLT to only support Linux. In other
platforms, we print a warning that we won't build BOLT. Change
configs to determine whether we will build BOLT runtime libs. This
only happens in x86 hosts. If true, we will build the runtime and
enable bolt-runtime tests. New tests that depend on the bolt_rt lib
needs to be marked REQUIRES:bolt-runtime. I updated the relevant
tests. Fix cmake to do not crash when building llvm with a target
that BOLT does not support.
(cherry picked from FBD31935760)
Summary:
Add lit.local.cfg to X86 and AArch64 folders.
Fix host_arch in lit config for AArch64.
Fix AArch64 and X86 tests.
Elvina Yakubova,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD31702068)