Check that the function will be emitted in the final binary. Preserving
old function address is needed in case it is PLT trampiline, that is
currently not moved by the BOLT.
Differential Revision: https://reviews.llvm.org/D122098
BOLT treats aarch64 objects located in text as empty functions with
contant islands. Emit them with at least 8-byte alignment to the new
text section.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D122097
AArch64 requires CI to be aligned to 8 bytes due to access instructions
restrictions. E.g. the ldr with imm, where imm must be aligned to 8 bytes.
Differential Revision: https://reviews.llvm.org/D122065
It seems the earlier implementation does not follow the description
in LoopRotationPass.h: It rotates loops even if they are already laid out
correctly. The diff adjusts the behaviour.
Given that the impact of LoopInversionPass is minor, this change won't
yield significant perf differences. Tested on clang-10: there seems to be a
0.1%-0.3% cpu win and a small reduction of branch misses.
**Before:**
BOLT-INFO: 120 Functions were reordered by LoopInversionPass
**After:**
BOLT-INFO: 79 Functions were reordered by LoopInversionPass
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D121921
Run tentativeLayoutRelocMode twice only if UseOldText option was passed.
Refactor BF loop to break on condtition met.
Differential Revision: https://reviews.llvm.org/D121825
Since LLVM MC now preserves redundant AdSize override prefix (0x67), remove it
in BOLT explicitly (-x86-strip-redundant-adsize, on by default).
Test Plan:
`bin/llvm-lit -a bolt/test/X86/addr32.s`
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120975
The BinaryEmitter uses opts::AlignText value to align the hot text
section. Also check that the opts::AlignText is at least
equal opts::AlignFunctions for the same reason, as described in D121392.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D121728
The cold text section alignment is set using the maximum alignment value
passed to the emitCodeAlignment. In order to calculate tentetive layout
right we will set the minimum alignment of such sections to the maximum
possible function alignment explicitly.
Differential Revision: https://reviews.llvm.org/D121392
Fix data race reported by ThreadSanitizer in clang.test:
```
ThreadSanitizer: data race /data/llvm-project/bolt/lib/Passes/ShrinkWrapping.cpp:1359:28
in llvm::bolt::ShrinkWrapping::moveSaveRestores()
```
The issue is with incrementing global counters from multiple threads.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D120218
Summary:
- variable 'TotalSize' set but not used
- variable 'TotalCallsTopN' set but not used
- use of bitwise '|' with boolean operands
Reviewed By: maksfb
FBD33911129
Summary:
Move the annotation to avoid dynamic memory allocations.
Improves the CPU time of instrumenting a large binary by 1% (+-0.8%, p-value 0.01)
Test Plan: NFC
Reviewers: maksfb
FBD30091656
Summary:
Refactor bolt/*/Passes to follow the braces rule for if/else/loop from
[LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html).
(cherry picked from FBD33344642)
Summary:
The lower_bound might return the end iterator, the ignoring of which will
cause memory corruption.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD33307803)
Summary:
The patch moves the shortenInstructions and nop remove to separate binary
passes. As a result when llvm-bolt optimizations stage will begin the
instructions of the binary functions will be absolutely the same as it
was in the binary. This is needed for the golang support by llvm-bolt.
Some of the tests must be changed, since bb alignment nops might create
unreachable BBs in original functions.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD32896517)
Summary:
Refactor members of BinaryBasicBlock. Replace some std containers with
ADT equivalents. The size of BinaryBasicBlock on x86-64 Linux is reduced
from 232 bytes to 192 bytes.
(cherry picked from FBD33081850)
Summary:
Switched members of BinaryFunction to ADT where it was possible and
made sense. As a result, the size of BinaryFunction on x86-64 Linux
reduced from 1624 bytes to 1448.
(cherry picked from FBD32981555)
Summary:
Some optimizations may remove all instructions in a basic block.
The pass will cleanup the CFG afterwards by removing empty basic
blocks and merging duplicate CFG edges.
The normalized CFG is printed under '-print-normalized' option.
(cherry picked from FBD32774360)
Summary:
TailDuplication::isInCacheLine makes the assumption that the block
has a valid layout index, which is not the case for unreachable blocks.
Add a check for a valid layout index.
(cherry picked from FBD32659755)
Summary:
The push and pop instructions might have wrong reorder due to this
error. Thanks rafaelauler for the provided test case.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD32478348)
Summary:
We were not tracking -DBUILD_SHARED_LIBS=ON and we introduced
some commits that break that by not specifying the correct dependencies.
Fix that.
(cherry picked from FBD32453377)
Summary:
Make BOLT build in VisualStudio compiler and run without
crashing on a simple test. Other tests are not running.
(cherry picked from FBD32378736)
Summary:
Moved the FDATA printing under the condition of non-empty profile data.
The change reduces the assembly dumps.
(cherry picked from FBD32262675)
Summary:
Replace erroneous check for function eligibility from `Function.isIgnored()`
to `shouldOptimize(Function)`. This prevents non-simple functions from being
processed.
(cherry picked from FBD32301958)
Summary:
This commit uses reviews.llvm.org/D6629 as a reference to optimize
X86::EFLAGS load/store in the instrumentation snippet by using lahf/sahf
instructions instead of pushf/popf.
(cherry picked from FBD31662303)
Summary:
Added new functionality of dumping simple functions into assembly.
This includes:
- function control flow (basic blocks, instructions),
- profile information as `FDATA` directives, to be consumed by link_fdata,
- data labels,
- CFI directives,
- symbols for callee functions,
- jump table symbols.
Envisioned usage:
1. Find a function that triggers BOLT crash (e.g. with `bughunter.sh`).
2. Generate reproducer asm source for that function (using `-funcs`).
3. Attach it to an issue.
4. Reduce and include as a test case.
Current limitations:
1. Emitted assembly won't match input file relocations.
2. No DWARF support.
3. Data is not emitted.
(cherry picked from FBD32746857)
Summary:
BinaryContext is available via BinaryFunction::getBinaryContext(),
hence there's no reason to pass both as arguments to a function.
In a similar fashion, BinaryBasicBlock has an access to BinaryFunction
via getFunction(). Eliminate unneeded arguments.
(cherry picked from FBD31921680)
Summary:
Moves source files into separate components, and make explicit
component dependency on each other, so LLVM build system knows how to
build BOLT in BUILD_SHARED_LIBS=ON.
Please use the -c merge.renamelimit=230 git option when rebasing your
work on top of this change.
To achieve this, we create a new library to hold core IR files (most
classes beginning with Binary in their names), a new library to hold
Utils, some command line options shared across both RewriteInstance
and core IR files, a new library called Rewrite to hold most classes
concerned with running top-level functions coordinating the binary
rewriting process, and a new library called Profile to hold classes
dealing with profile reading and writing.
To remove the dependency from BinaryContext into X86-specific classes,
we do some refactoring on the BinaryContext constructor to receive a
reference to the specific backend directly from RewriteInstance. Then,
the dependency on X86 or AArch64-specific classes is transfered to the
Rewrite library. We can't have the Core library depend on targets
because targets depend on Core (which would create a cycle).
Files implementing the entry point of a tool are transferred to the
tools/ folder. All header files are transferred to the include/
folder. The src/ folder was renamed to lib/.
(cherry picked from FBD32746834)