On AArch64, we create optional/weak relocations that may not be
processed due to the relocated value overflow. When the overflow
happens, we used to enforce patching for all functions in the binary via
--force-patch option. This PR relaxes the requirement, and enforces
patching only for functions that are target of optional relocations.
Moreover, if the compact code model is used, the relocation overflow is
guaranteed not to happen and the patching will be skipped.
The current implementation of `lookupStubFromGroup` is incorrect. The
function is intended to find and return the closest stub using
`lower_bound`, which identifies the first element in a sorted range that
is not less than a specified value. However, if such an element is not
found within `Candidates` and the list is not empty, the function
returns `nullptr`. Instead, it should check whether the last element
satisfies the condition.
Add `--compact-code-model` option that executes alternative branch
relaxation with an assumption that the resulting binary has less than
128MB of code. The relaxation is done in `relaxLocalBranches()`, which
operates on a function level and executes on multiple functions in
parallel.
Running the new option on AArch64 Clang binary produces slightly smaller
code and the relaxation finishes in about 1/10th of the time.
Note that the new `.text` has to be smaller than 128MB, *and* `.plt` has
to be closer than 128MB to `.text`.
When split functions is used, BOLT may skip tentative code layout
estimation in some cases, like:
- when there is no profile data for some blocks (ie cold blocks)
- when there are cold functions in lite mode
- when skip functions is used
However, when rewriting the binary we still need to compute PC-relative
distances between hot and cold basic blocks. Without cold layout
estimation, BOLT uses '0x0' as the address of the first cold block,
leading to incorrect estimations of any PC-relative addresses.
This affects large binaries as the relaxStub method expands more
branches than necessary using the short-jump sequence, at it wrongly
believes it has exceeded the branch distance boundary.
This increases code size with both a larger and slower sequence;
however,
performance regression is expected to be minimal since this only affects
any called cold code.
Example of such an unnecessary relaxation:
from:
```armasm
b .Ltmp1234
```
to:
```armasm
adrp x16, .Ltmp1234
add x16, x16, :lo12:.Ltmp1234
br x16
```
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are print to screen once the `--log-file` option is
used.
In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.
Because this is a significant change by itself, not all code was
yet ported. Code from Profiler libs (DataAggregator and friends)
still print errors directly to screen.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch continue the migration
on libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.
Test Plan: NFC
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we change the
interface to `BinaryFunctionPass` to return an Error on
`runOnFunctions()`. This gives passes the ability to report a
serious problem to the caller (RewriteInstance class), so the
caller may decide how to best handle the exceptional situation.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
If a local stub is out-of-range, at LongJmp we will try to find another
local stub first. However, The original implementation do not work as
expected and it leads to an infinite loop between replaceTargetWithStub
and fixBranches.
After this patch, we first convert the target of BB back to the target
of the local stub, and then look up for other valid local stubs and so
on.
Currently minimal alignment of function is hardcoded to 2 bytes.
Add 2 more cases:
1. In case BF is data in code return the alignment of CI as minimal
alignment
2. For aarch64 and riscv platforms return the minimal value of 4 (added
test for aarch64)
Otherwise fallback to returning the 2 as it previously was.
In instruction encoding, the relative offset address of the PC is
signed, that is, the number of positive offset bits and the number of
negative offset bits is asymmetric. Therefore, the maximum and minimum
values are used to replace Mask to determine the boundary.
Co-authored-by: qijitao <qijitao@hisilicon.com>
This patch adds a dedicated class to keep track of each function's
layout. It also lays the groundwork for splitting functions into
multiple fragments (as opposed to a strict hot/cold split).
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D129518
This reverts commit 3988bd13988aad72ec979beb2361e8738584926b.
Did not build on this bot:
https://lab.llvm.org/buildbot#builders/215/builds/6372
/usr/include/c++/9/bits/predefined_ops.h:177:11: error: no match for call to
‘(llvm::less_first) (std::pair<long unsigned int, llvm::bolt::BinaryBasicBlock*>&, const std::pair<long unsigned int, std::nullptr_t>&)’
177 | { return bool(_M_comp(*__it, __val)); }
One could reuse this functor instead of rolling out your own version.
There were a couple other cases where the code was similar, but not
quite the same, such as it might have an assertion in the lambda or other
constructs. Thus, I've not touched any of those, as it might change the
behavior in some way.
As per https://discourse.llvm.org/t/submitting-simple-nfc-patches/62640/3?u=steakhal
Chris Lattner
> LLVM intentionally has a “yes, you can apply common sense judgement to
> things” policy when it comes to code review. If you are doing mechanical
> patches (e.g. adopting less_first) that apply to the entire monorepo,
> then you don’t need everyone in the monorepo to sign off on it. Having
> some +1 validation from someone is useful, but you don’t need everyone
> whose code you touch to weigh in.
Differential Revision: https://reviews.llvm.org/D126068
Check that the function will be emitted in the final binary. Preserving
old function address is needed in case it is PLT trampiline, that is
currently not moved by the BOLT.
Differential Revision: https://reviews.llvm.org/D122098
AArch64 requires CI to be aligned to 8 bytes due to access instructions
restrictions. E.g. the ldr with imm, where imm must be aligned to 8 bytes.
Differential Revision: https://reviews.llvm.org/D122065
Run tentativeLayoutRelocMode twice only if UseOldText option was passed.
Refactor BF loop to break on condtition met.
Differential Revision: https://reviews.llvm.org/D121825
The BinaryEmitter uses opts::AlignText value to align the hot text
section. Also check that the opts::AlignText is at least
equal opts::AlignFunctions for the same reason, as described in D121392.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D121728
The cold text section alignment is set using the maximum alignment value
passed to the emitCodeAlignment. In order to calculate tentetive layout
right we will set the minimum alignment of such sections to the maximum
possible function alignment explicitly.
Differential Revision: https://reviews.llvm.org/D121392
Summary:
Refactor bolt/*/Passes to follow the braces rule for if/else/loop from
[LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html).
(cherry picked from FBD33344642)
Summary:
The lower_bound might return the end iterator, the ignoring of which will
cause memory corruption.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD33307803)
Summary:
Refactor members of BinaryBasicBlock. Replace some std containers with
ADT equivalents. The size of BinaryBasicBlock on x86-64 Linux is reduced
from 232 bytes to 192 bytes.
(cherry picked from FBD33081850)
Summary:
Moves source files into separate components, and make explicit
component dependency on each other, so LLVM build system knows how to
build BOLT in BUILD_SHARED_LIBS=ON.
Please use the -c merge.renamelimit=230 git option when rebasing your
work on top of this change.
To achieve this, we create a new library to hold core IR files (most
classes beginning with Binary in their names), a new library to hold
Utils, some command line options shared across both RewriteInstance
and core IR files, a new library called Rewrite to hold most classes
concerned with running top-level functions coordinating the binary
rewriting process, and a new library called Profile to hold classes
dealing with profile reading and writing.
To remove the dependency from BinaryContext into X86-specific classes,
we do some refactoring on the BinaryContext constructor to receive a
reference to the specific backend directly from RewriteInstance. Then,
the dependency on X86 or AArch64-specific classes is transfered to the
Rewrite library. We can't have the Core library depend on targets
because targets depend on Core (which would create a cycle).
Files implementing the entry point of a tool are transferred to the
tools/ folder. All header files are transferred to the include/
folder. The src/ folder was renamed to lib/.
(cherry picked from FBD32746834)