91 Commits

Author SHA1 Message Date
Job Noorman
f873029386 [BOLT] Add minimal RISC-V 64-bit support
Just enough features are implemented to process a simple "hello world"
executable and produce something that still runs (including libc calls).
This was mainly a matter of implementing support for various
relocations. Currently, the following are handled:

- R_RISCV_JAL
- R_RISCV_CALL
- R_RISCV_CALL_PLT
- R_RISCV_BRANCH
- R_RISCV_RVC_BRANCH
- R_RISCV_RVC_JUMP
- R_RISCV_GOT_HI20
- R_RISCV_PCREL_HI20
- R_RISCV_PCREL_LO12_I
- R_RISCV_RELAX
- R_RISCV_NONE

Executables linked with linker relaxation will probably fail to be
processed. BOLT relocates .text to a high address while leaving .plt at
its original (low) address. This causes PC-relative PLT calls that were
relaxed to a JAL to not fit their offset in an I-immediate anymore. This
is something that will be addressed in a later patch.

Changes to the BOLT core are relatively minor. Two things were tricky to
implement and needed slightly larger changes. I'll explain those below.

The R_RISCV_CALL(_PLT) relocation is put on the first instruction of a
AUIPC/JALR pair, the second does not get any relocation (unlike other
PCREL pairs). This causes issues with the combinations of the way BOLT
processes binaries and the RISC-V MC-layer handles relocations:
- BOLT reassembles instructions one by one and since the JALR doesn't
  have a relocation, it simply gets copied without modification;
- Even though the MC-layer handles R_RISCV_CALL properly (adjusts both
  the AUIPC and the JALR), it assumes the immediates of both
  instructions are 0 (to be able to or-in a new value). This will most
  likely not be the case for the JALR that got copied over.

To handle this difficulty without resorting to RISC-V-specific hacks in
the BOLT core, a new binary pass was added that searches for
AUIPC/JALR pairs and zeroes-out the immediate of the JALR.

A second difficulty was supporting ABS symbols. As far as I can tell,
ABS symbols were not handled at all, causing __global_pointer$ to break.
RewriteInstance::analyzeRelocation was updated to handle these
generically.

Tests are provided for all supported relocations. Note that in order to
test the correct handling of PLT entries, an ELF file produced by GCC
had to be used. While I tried to strip the YAML representation, it's
still quite large. Any suggestions on how to improve this would be
appreciated.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D145687
2023-06-16 12:19:36 +02:00
Maksim Panchenko
5c4d306a10 [BOLT][NFC] Change signature of MCPlusBuilder::isUnsupportedBranch()
Make MCPlusBuilder::isUnsupportedBranch() take MCInst, not opcode.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D152765
2023-06-13 12:20:36 -07:00
Maksim Panchenko
43f56a2f27 [BOLT] Fix handling of code references from unmodified code
In lite mode (default for X86), BOLT optimizes and relocates functions
with profile. The rest of the code is preserved, but if it references
relocated code such references have to be updated. The update is handled
by scanExternalRefs() function. Note that we cannot solely rely on
relocations written by the linker, as not all code references are
exposed to the linker. Additionally, the linker can modify certain
instructions and relocations will no longer match the code.

With this change, start using symbolic disassembler for scanning code
for references in scanExternalRefs(). Unlike the previous approach, the
symbolizer properly detects and creates references for instructions with
multiple/ambiguous symbolic operands and handles cases where a
relocation doesn't match any operand. See test cases for examples.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D152631
2023-06-12 10:46:51 -07:00
spupyrev
2316a10fe5 [BOLT] stale profile matching [part 2 out of 2]
This is a first "serious" version of stale profile matching in BOLT. This diff
extends the hash computation for basic blocks so that we can apply a fuzzy
hash-based matching. The idea is to compute several "versions" of a hash value
for a basic block. A loose version of a hash (computed by ignoring instruction
operands) allows to match blocks in functions whose content has been changed,
while stricter hash values (considering instruction opcodes with operands and
even based on hashes of block's successors/predecessors) allow to resolve
collisions. In order to save space and build time, individual hash components
are blended into a single uint64_t.
There are likely numerous ways of improving hash computation but already this
simple variant provides significant perf benefits.

**Perf testing** on the clang binary: collecting data on clang-10 and using it
to optimize clang-11 (with ~1 year of commits in between). Next, we compare
- //stale_clang// (clang-11 optimized with profile collected on clang-10 with **infer-stale-profile=0**)
- //opt_clang// (clang-11 optimized with profile collected on clang-11)
- //infer_clang// (clang-11 optimized with profile collected on clang-10 with **infer-stale-profile=1**)

`LTO-only` mode:
//stale_clang// vs //opt_clang//: task-clock [delta(%): 9.4252 ± 1.6582, p-value: 0.000002]
(That is, there is a ~9.5% perf regression)
//infer_clang// vs //opt_clang//: task-clock [delta(%): 2.1834 ± 1.8158, p-value: 0.040702]
(That is, the regression is reduced to ~2%)
Related BOLT logs:
```
BOLT-INFO: identified 2114 (18.61%) stale functions responsible for 30.96% samples
BOLT-INFO: inferred profile for 2101 (18.52% of all profiled) functions responsible for 30.95% samples
```

`LTO+AutoFDO` mode:
//stale_clang// vs //opt_clang//: task-clock [delta(%): 19.1293 ± 1.4131, p-value: 0.000002]
//infer_clang// vs //opt_clang//: task-clock [delta(%): 7.4364 ± 1.3343, p-value: 0.000002]
Related BOLT logs:
```
BOLT-INFO: identified 5452 (50.27%) stale functions responsible for 85.34% samples
BOLT-INFO: inferred profile for 5442 (50.23% of all profiled) functions responsible for 85.33% samples
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D146661
2023-06-08 14:42:41 -07:00
Amir Ayupov
713b28532e [BOLT][NFC] Fix debug messages
Fix debug printing, making it easier to compare two debug logs side by side:
- `BinaryFunction::addRelocation`: print function name instead of `this` ptr,
- `DataAggregator::doTrace`: remove duplicated function name.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D152314
2023-06-06 15:50:58 -07:00
spupyrev
3e3a926be8 [BOLT][NFC] Add hash computation for basic blocks
Extending yaml profile format with block hashes, which are used for stale
profile matching. To avoid duplication of the code, created a new class with a
collection of utilities for computing hashes.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D144306
2023-05-02 14:03:47 -07:00
Rafael Auler
b87bf74428 [BOLT] Fix creation of invalid CFG in presence of dead code
When there is a direct jump right after an indirect one, in
the absence of code jumpting to this direct jump, this is obviously
dead code. However, BOLT was failing to recognize that by mistakenly
placing both jmp instructions in the same basic block, and creating
wrong successor edges. Fix that, so we can safely run UCE on
that. This bug also causes validateCFG to fail and BOLT to crash if it
is running ICP on that function.

Reviewed By: #bolt, Amir

Differential Revision: https://reviews.llvm.org/D148055
2023-04-11 17:19:39 -07:00
Alexis Engelke
0c049ea60a [MC] Always encode instruction into SmallVector
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream
to encode the instruction into a SmallVector. The raw_ostream however
incurs some overhead for the actual encoding.

This change allows an MCCodeEmitter to directly emit an instruction into
a SmallVector without using a raw_ostream and therefore allow for
performance improvments in encoding. A default path that uses existing
raw_ostream implementations is provided.

Reviewed By: MaskRay, Amir

Differential Revision: https://reviews.llvm.org/D145791
2023-04-06 16:21:49 +02:00
spupyrev
92758a99c3 [BOLT] computing raw branch count for yaml profiles
`Function.RawBranchCount` is initialized for fdata profile but not for yaml one.
The diff adds the computation of the field for yaml profiles

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D144211
2023-03-28 11:09:21 -07:00
Kazu Hirata
4e585e51c1 Use *{Map,Set}::contains (NFC) 2023-03-15 22:55:35 -07:00
Amir Ayupov
edda85771a [BOLT][NFC] Move addRelocation{X86,AArch64} into MCPlusBuilder
The two methods don't belong in BinaryFunction methods.
Move the dispatch tables into target-specific MCPlusBuilder methods.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D131813
2023-03-14 17:34:25 -07:00
Amir Ayupov
2eae9d8eb2 [BOLT][NFC] Use llvm::is_contained
Apply the replacement throughout BOLT.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D145464
2023-03-14 15:37:03 -07:00
Vladislav Khmelevsky
7117af529e [BOLT] Improve dynamic relocations support for CI
This patch fixes few problems with supporting dynamic relocations in CI.
1. After dynamic relocations and functions were read search for dynamic
relocations located in functions. Currently we expected them only to be
relative and only to be in constant island. Mark islands of such
functions to have dynamic relocations and create CI access symbol on the
relocation offset, so the BD would be created for such place.
2. During function disassemble and handling address reference for
constant island check if the referred external CI has dynamic
relocation. And if it has one we would continue to refer original CI
rather then creating a local copy.
3. After function disassembly stage mark function that has dynamic reloc
in CI as non-simple. We don't want such functions to be optimized, since
such passes as split function would create 2 copies of CI which we
unable to support currently.
4. During updating output values for BF search for BD located in CI and
update their output locations.
5. On dynamic relocation patching stage search for binary data located
on relocation offset. If it was moved use new relocation offset value
rather then an old one.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D143748
2023-03-13 13:37:28 +04:00
Amir Ayupov
d77f96a9e0 [BOLT][NFC] Simplify BinaryFunction::setTrapOnEntry
Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D144758
2023-02-27 15:41:40 -08:00
Amir Ayupov
1c286acfb8 [BOLT] Prevent unsetting unknown control flow for split jump table
In case of a function with unknown control flow but with a single jump
table and a single jump table site, we attempt to match the jump table
and a site and update block successors using jump table targets.
Restrict this behavior for split jump tables which have targets in a
fragment function.

Fixes https://github.com/llvm/llvm-project/issues/60795.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D144602
2023-02-27 15:22:43 -08:00
Amir Ayupov
be2f67c4d8 [BOLT][NFC] Replace anonymous namespace functions with static
Follow LLVM Coding Standards guideline on using anonymous namespaces
(https://llvm.org/docs/CodingStandards.html#anonymous-namespaces)
and use `static` modifier for function definitions.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D143124
2023-02-06 18:05:41 -08:00
Amir Ayupov
69a9bbf106 [BOLT][NFC] Replace ambiguous BinarySection::isReadOnly with isWritable
Address feedback in https://reviews.llvm.org/D102284#2755060

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D141733
2023-01-18 14:53:07 -08:00
Amir Ayupov
f40d25dd8d [BOLT][NFC] Use llvm::reverse
Use llvm::reverse instead of `for (auto I = rbegin(), E = rend(); I != E; ++I)`

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D140516
2023-01-03 17:32:11 -08:00
Maksim Panchenko
be9d3edee8 [BOLT][NFC] Remove unused PrintInstructions argument
PrintInstructions was unused in BinaryFunction::print() and dump().

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D140440
2022-12-20 15:57:13 -08:00
Amir Ayupov
76cfea0c47 [BOLT][NFC] Use std::optional for readDWARFExpressionTargetReg 2022-12-11 22:13:47 -08:00
Amir Ayupov
72528ee4b4 [BOLT][NFC] Use std::optional in has*NameRegex 2022-12-11 22:13:47 -08:00
Amir Ayupov
2563fd63c6 [BOLT][NFC] Use std::optional in MCPlusBuilder
Reviewed By: maksfb, #bolt

Differential Revision: https://reviews.llvm.org/D139260
2022-12-06 14:51:38 -08:00
Maksim Panchenko
bcc4c90954 [BOLT] Fix instruction encoding validation
Always use non-symbolizing disassembler for instruction encoding
validation as symbols will be treated as undefined/zeros be the encoder
and causing byte sequence mismatches.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D136118
2022-10-18 13:50:00 -07:00
Rafael Auler
8d1fc45dc3 [BOLT][NFC] Refactor creation of symbol+addend references
Put code that creates references to symbol+addend behind MCPlusBuilder.
Will use this later in validate memory references pass.

Reviewed By: #bolt, maksfb, yota9

Differential Revision: https://reviews.llvm.org/D134097
2022-10-12 18:39:26 -07:00
Amir Ayupov
e002523b65 [BOLT] Verify externally referenced blocks against jump table targets
For functions with references to internal offsets from data, verify externally
referenced blocks against the set of jump table targets. Mark the function
as non-simple if there are any unclaimed data to code references.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D132495
2022-09-16 11:44:33 -07:00
Fabian Parzefall
3ac46f377a [BOLT] Emit LSDA call sites for all fragments
For exception handling, LSDA call sites have to be emitted for each
fragment individually. With this patch, call sites and respective LSDA
symbols are generated and associated with each fragment of their
function, such that they can be used by the emitter.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D132052
2022-09-08 17:10:29 -07:00
Kazu Hirata
a0c7ca8ad4 [BOLT] Use range-based for loops (NFC)
LLVM Coding Standards discourage for_each unless callable objects
already exist.
2022-09-03 11:17:32 -07:00
Amir Ayupov
f119a2483d [BOLT][NFC] Use llvm::any_of
Replace the imperative pattern of the following kind
```
bool IsTrue = false;
for (Element : Range) {
  if (Condition(Element)) {
    IsTrue = true;
    break;
  }
}
```
with functional style `llvm::any_of`:
```
bool IsTrue = llvm::any_of(Range, [&](Element) {
  return Condition(Element);
});
```

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132276
2022-08-27 21:36:15 -07:00
Fabian Parzefall
9b6e7861ae [BOLT] Track fragment info for all split fragments
To generate all symbols correctly, it is necessary to record the address
of each fragment. This patch moves the address info for the main and
cold fragments from BinaryFunction to FunctionFragment, where this data
is recorded for all fragments.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132051
2022-08-24 18:07:09 -07:00
Fabian Parzefall
07f63b0ac5 [BOLT] Allocate FunctionFragment on heap
This changes `FunctionFragment` from being used as a temporary proxy
object to access basic block ranges to a heap-allocated object that can
store fragment-specific information.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132050
2022-08-24 18:06:08 -07:00
Fabian Parzefall
d5c03def24 [BOLT] Towards FunctionLayout const-correctness
A const-qualified reference to function layout allows accessing
non-const qualified basic blocks on a const-qualified function. This
patch adds or removes const-qualifiers where necessary to indicate where
basic blocks are used in a non-const manner.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132049
2022-08-24 16:32:33 -07:00
Fabian Parzefall
f24c299e7d Revert "[BOLT] Towards FunctionLayout const-correctness"
This reverts commit 587d2653420d75ef10f30bd612d86f1e08fe9ea7.
2022-08-24 10:51:38 -07:00
Fabian Parzefall
5065134aa0 Revert "[BOLT] Allocate FunctionFragment on heap"
This reverts commit 101344af1af82d1633c773b718788eaa813d7f79.
2022-08-24 10:51:36 -07:00
Fabian Parzefall
6304e38281 Revert "[BOLT] Track fragment info for all split fragments"
This reverts commit 7e254818e49454a53bd00e3737007025b62d0f79.
2022-08-24 10:51:19 -07:00
Fabian Parzefall
7e254818e4 [BOLT] Track fragment info for all split fragments
To generate all symbols correctly, it is necessary to record the address
of each fragment. This patch moves the address info for the main and
cold fragments from BinaryFunction to FunctionFragment, where this data
is recorded for all fragments.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132051
2022-08-24 10:17:17 -07:00
Fabian Parzefall
101344af1a [BOLT] Allocate FunctionFragment on heap
This changes `FunctionFragment` from being used as a temporary proxy
object to access basic block ranges to a heap-allocated object that can
store fragment-specific information.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132050
2022-08-24 10:17:17 -07:00
Fabian Parzefall
587d265342 [BOLT] Towards FunctionLayout const-correctness
A const-qualified reference to function layout allows accessing
non-const qualified basic blocks on a const-qualified function. This
patch adds or removes const-qualifiers where necessary to indicate where
basic blocks are used in a non-const manner.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132049
2022-08-24 10:17:17 -07:00
Amir Ayupov
37cbbea674 [BOLT][NFC] Move out handleAArch64IndirectCall
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
255 to 233 LoC.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132104
2022-08-23 17:37:01 -07:00
Amir Ayupov
c844850bdf [BOLT][NFC] Move out handleIndirectBranch
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
295 to 255 LoC.

Differential Revision: https://reviews.llvm.org/D132101
2022-08-23 17:36:51 -07:00
Amir Ayupov
ec1fbf229e [BOLT][NFC] Move out handleExternalReference
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
338 to 295 LoC.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132100
2022-08-23 17:36:41 -07:00
Amir Ayupov
6cd475f8ca [BOLT][NFC] Move out handlePCRelOperand
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
377 to 338 LoC.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132099
2022-08-23 17:36:29 -07:00
Fabian Parzefall
0f74d191d1 [BOLT] Generate sections for multiple fragments
This patch adds support to generate any number of sections that are
assigned to fragments of functions that are split more than two-way.
With this, a function's *nth* split fragment goes into section
`.text.cold.n`.

This also changes `FunctionLayout::erase` to make sure, that there are
no empty fragments at the end of the function. This sometimes happens
when blocks are erased from the function. To avoid creating symbols
pointing to these fragments, they need to be removed.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D130521
2022-08-18 21:55:06 -07:00
Fabian Parzefall
a191ea7d59 [BOLT] Make exception handling fragment aware
This adds basic fragment awareness in the exception handling passes and
generates the necessary symbols for fragments.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D130520
2022-08-18 21:55:06 -07:00
Fabian Parzefall
275e075cbe [BOLT] Support passing fragments to code emission
This changes code emission such that it can emit specific function
fragments instead of scanning all basic blocks of a function and just
emitting those that are hot or cold.

To implement this, `FunctionLayout` explicitly distinguishes the "main"
fragment (i.e. the one that contains the entry block and is associated
with the original symbol) from "split" fragments. Additionally,
`BinaryFunction` receives support for multiple cold symbols - one for
each split fragment.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D130052
2022-08-18 21:55:06 -07:00
Denis Revunov
d0e29e87cd [BOLT][AArch64] Ignore functions with islandsInfo during VeneerEliminarion and ICF
Differential Revision: https://reviews.llvm.org/D131881

Reviewed By: yota9
2022-08-18 11:08:47 -04:00
Amir Ayupov
cdef841fe7 [BOLT][NFC] Simplify scanExternalRefs
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132013
2022-08-17 17:33:59 -07:00
Fabian Parzefall
0f8412c19c [BOLT] Add main fragment to function layout
Functions that do not contain any code still have to be emitted. This
occurs on AArch64 where functions can consist only of a constant island.
To support fragment semantics in code emission, this commits adds a
guaranteed main fragment to function layout. This fragment might be
empty, but allows us omit checks whether the function is empty in most
places.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D130051
2022-08-17 14:51:31 -07:00
Rafael Auler
19eb908e61 [BOLT] Remove always true if statement
Got a warning from GCC when building this.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D131092
2022-08-03 13:11:33 -07:00
Kazu Hirata
b498a8991e [bolt] Remove redundaunt control-flow statements (NFC)
Identified with readability-redundant-control-flow.
2022-07-30 10:35:49 -07:00
Fabian Parzefall
8477bc6761 [BOLT] Add function layout class
This patch adds a dedicated class to keep track of each function's
layout. It also lays the groundwork for splitting functions into
multiple fragments (as opposed to a strict hot/cold split).

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D129518
2022-07-16 17:23:24 -07:00