165 Commits

Author SHA1 Message Date
Amir Ayupov
3968ebd00d
[BOLT] Keep multi-entry functions simple in aggregation mode (#128253)
BOLT used to mark multi-entry functions non-simple in non-relocation
mode with the reasoning that we can't move them due to potentially
undetected references. However, in aggregation mode it doesn't apply as
BOLT doesn't perform optimizations.

Relax this constraint in case of an aggregation job.
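A minimal sketch of the relaxed condition. `FunctionInfo`, `RunMode` and `mustMarkNonSimple` are hypothetical stand-ins for the real BinaryContext/BinaryFunction state, not BOLT's actual code:

```cpp
#include <cassert>

// Hypothetical stand-ins for BinaryFunction/BinaryContext state.
struct FunctionInfo {
  bool IsMultiEntry; // function has secondary entry points
};
struct RunMode {
  bool HasRelocations; // relocation mode: functions can be moved
  bool IsAggregation;  // aggregation-only job: no code is rewritten
};

// Returns true if a multi-entry function must be marked non-simple.
bool mustMarkNonSimple(const FunctionInfo &F, const RunMode &M) {
  // Without relocations, references to secondary entries may go
  // undetected, so such functions cannot be moved. An aggregation job
  // never moves code, so the restriction is unnecessary there.
  return F.IsMultiEntry && !M.HasRelocations && !M.IsAggregation;
}
```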

Test Plan: added entry-point-fallthru.s
2025-02-25 10:53:45 -08:00
YongKang Zhu
9fa77c1854
[BOLT][Linker][NFC] Remove lookupSymbol() in favor of lookupSymbolInfo() (#128070)
Sometimes we need to know the size of a symbol besides its address, so
maybe we can start using the existing `BOLTLinker::lookupSymbolInfo()`
(that returns symbol address and size) and remove
`BOLTLinker::lookupSymbol()` (that only returns symbol address). For
both, we need to check the return value as it is wrapped in
`std::optional<>`, which makes the difference even smaller.
2025-02-20 17:14:33 -08:00
Maksim Panchenko
0ba391a85f
[BOLT] Improve constant island disassembly (#127971)
* Add label that identifies constant island.
* Support cases where the island is located after the function.
2025-02-20 11:16:01 -08:00
Maksim Panchenko
3115278c4e [BOLT] Fixup for commit 137c378/#125961 2025-02-06 00:26:20 -08:00
Maksim Panchenko
137c3781e6
[BOLT][AArch64] Include constant islands in disassembly (#125961)
When printing disassembly of a function with constant islands, include
the island info in the dump.

At the moment, only print islands in pre-CFG state. Include islands that
are interleaved with instructions.
2025-02-05 22:41:40 -08:00
Maksim Panchenko
ef232a7e34
[BOLT][AArch64] Remove nops in functions with defined control flow (#124705)
When a function has an indirect branch with unknown control flow, we
preserve nops in order to keep all instruction offsets (from the start
of the function) the same in case the indirect branch is used by a
PC-relative jump table. However, when we know the control flow of the
function, we should be able to safely remove nops.
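The decision can be sketched as follows, assuming a hypothetical, simplified `Inst` model and a precomputed `HasUnknownIndirectBranch` flag (neither is BOLT's real API):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction model.
enum class Opcode { Nop, Add, Jmp };
struct Inst {
  Opcode Op;
};

// Drops NOPs unless the function has an indirect branch with unknown
// control flow; in that case offsets must stay unchanged in case the
// branch uses a PC-relative jump table.
void removeNopsIfSafe(std::vector<Inst> &Insts, bool HasUnknownIndirectBranch) {
  if (HasUnknownIndirectBranch)
    return;
  Insts.erase(std::remove_if(Insts.begin(), Insts.end(),
                             [](const Inst &I) { return I.Op == Opcode::Nop; }),
              Insts.end());
}
```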
2025-01-28 11:03:49 -08:00
Alexander Yermolovich
3c357a49d6
[BOLT] Add support for safe-icf (#116275)
Identical Code Folding (ICF) folds functions that are identical into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems, for example when
function pointers are compared. Such comparisons can appear explicitly
in the code or be introduced into the generated IR by optimization
passes like Indirect Call Promotion (ICP). After ICF, what used to be
two different addresses become the same address, which can lead to a
different code path being taken.

This is where safe ICF comes in. The linker (LLD) implements it using
the address-significance section generated by clang. If a symbol is
listed in that section, or if an object file does not have the section
at all, the corresponding symbols are not folded.

BOLT does not have the information about which objects lack this
section, so it can't reuse this mechanism.

This implementation scans the code section and conservatively marks
function symbols as unsafe. It treats a symbol as unsafe if it is used
in a non-control-flow instruction. It also scans the data relocation
sections and does the same for relocations that reference a function
symbol. The latter handles the case when a function pointer is stored
in a local or global variable, etc. Relocations whose address points
within a vtable are skipped.
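The scan described above can be sketched like this. The input structs (`CodeRef`, `DataReloc`, `AddrRange`) and `collectUnsafeSymbols` are hypothetical simplifications of what BOLT obtains from disassembly and relocation processing:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified inputs.
struct CodeRef {      // symbol operand found in an instruction
  std::string Symbol;
  bool IsControlFlow; // true for call/branch targets
};
struct DataReloc {    // relocation in a data section
  uint64_t Address;
  std::string Symbol;
};
struct AddrRange {    // [Begin, End), e.g. a vtable
  uint64_t Begin, End;
};

std::set<std::string>
collectUnsafeSymbols(const std::set<std::string> &Functions,
                     const std::vector<CodeRef> &CodeRefs,
                     const std::vector<DataReloc> &DataRelocs,
                     const std::vector<AddrRange> &VTables) {
  std::set<std::string> Unsafe;
  // A function referenced by a non-control-flow instruction may have its
  // address taken, so folding it could change a pointer comparison.
  for (const CodeRef &R : CodeRefs)
    if (!R.IsControlFlow && Functions.count(R.Symbol))
      Unsafe.insert(R.Symbol);
  // A function referenced from data may be stored as a pointer;
  // references located inside vtables are skipped.
  for (const DataReloc &R : DataRelocs) {
    if (!Functions.count(R.Symbol))
      continue;
    bool InVTable = false;
    for (const AddrRange &V : VTables)
      if (R.Address >= V.Begin && R.Address < V.End)
        InVTable = true;
    if (!InVTable)
      Unsafe.insert(R.Symbol);
  }
  return Unsafe;
}
```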
2024-12-16 21:49:53 -08:00
Enna1
4d2bc0adc6
[BOLT] Extract comparator for sorting functions by index into helper function (#116217)
This change extracts the comparator for sorting functions by index into
a helper function, `compareBinaryFunctionByIndex()`.

I'm not sure why the comparator used in
`BinaryContext::getSortedFunctions()` is not the same as in the other
two places. I think they should use the same comparator, so I also
changed `BinaryContext::getSortedFunctions()` to use
`compareBinaryFunctionByIndex()` for sorting functions.
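A comparator of this kind might look roughly as follows. `Func` and `compareFuncByIndex` are hypothetical, and the exact ordering (indexed functions first by index, the rest by address) is an assumption for illustration rather than the upstream definition:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for BinaryFunction.
struct Func {
  std::optional<unsigned> Index; // unset if no valid index was assigned
  uint64_t Address;
};

// Assumed ordering: functions with a valid index come first, sorted by
// index; the remaining ones are sorted by address.
bool compareFuncByIndex(const Func *A, const Func *B) {
  const bool AHas = A->Index.has_value();
  const bool BHas = B->Index.has_value();
  if (AHas && BHas)
    return *A->Index < *B->Index;
  if (AHas != BHas)
    return AHas;
  return A->Address < B->Address;
}
```

Sharing one such helper across all call sites keeps every sorted function list in the same order.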
2024-11-27 09:01:12 +08:00
Daniel Sanders
74003f11b3
[mc] Add CFI directive to emit val_offset() rules (#113971)
These specify that the value of the given register in the previous frame
is the CFA plus some offset. This isn't very common but can be necessary
if the original value is normally reconstructed from the stack/frame
pointer instead of being saved on the stack and reloaded from there.
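To contrast the new rule with the existing `offset(N)` rule, here is a hypothetical unwinder helper. `Rule`, `recoverRegister` and the byte-buffer model of stack memory are illustrative assumptions, not MC or BOLT APIs:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum class RuleKind {
  Offset,   // offset(N): previous value is saved in memory at CFA + N
  ValOffset // val_offset(N): previous value *is* CFA + N
};
struct Rule {
  RuleKind Kind;
  int64_t N;
};

// Recovers a register's value in the previous frame. Stack memory is
// modeled as a byte buffer that starts at address MemBase.
uint64_t recoverRegister(const Rule &R, uint64_t CFA, const uint8_t *Mem,
                         uint64_t MemBase) {
  const uint64_t Addr = CFA + static_cast<uint64_t>(R.N);
  if (R.Kind == RuleKind::ValOffset)
    return Addr; // nothing is loaded from the stack
  uint64_t Value;
  std::memcpy(&Value, Mem + (Addr - MemBase), sizeof(Value));
  return Value;
}
```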
2024-11-11 11:38:36 -08:00
Kazu Hirata
41baa69a7e
[BOLT] Fix warnings (#114116)
This patch fixes:

  bolt/lib/Core/BinaryFunction.cpp:2537:13: error: enumeration value
  'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]

  bolt/lib/Core/BinaryFunction.cpp:2661:13: error: enumeration value
  'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]

  bolt/lib/Core/BinaryFunction.cpp:2805:13: error: enumeration value
  'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]
2024-10-29 13:52:22 -07:00
Kazu Hirata
7928e14f5e
[BOLT] Avoid repeated map lookups (NFC) (#112118) 2024-10-12 22:06:49 -07:00
Maksim Panchenko
4db0cc4c55
[BOLT] Allow sections in --print-only flag (#109622)
While printing functions, expand the --print-only flag to accept section
names. E.g., "--print-only=\.init" will only print functions from the
".init" section.
2024-09-25 23:44:06 +02:00
Maksim Panchenko
abd69b3653
[BOLT] Handle internal calls in ValidateInternalCalls (#105736)
Move handling of all internal calls into the designated pass. Preserve
NOPs and mark functions as non-simple on non-X86 platforms.
2024-08-27 11:31:32 -07:00
Maksim Panchenko
8f3050684e
[BOLT] Reduce CFI warning verbosity (#105336)
CFI programs may have more saves than restores and this is completely
benign from BOLT's perspective. Reduce the verbosity and print the
warning only under `-v=1` and above.
2024-08-20 13:41:19 -07:00
Amir Ayupov
f83a89c1b1
[BOLT] Turn non-empty CFI StateStack assert into a warning (#102216)
clang-15 can produce binaries with mismatched RememberState/RestoreState
CFIs. This is benign for unwinding, so replace an assert with a warning.
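The assert-to-warning change can be sketched by replaying the CFI program with a state stack, modeled here as a depth counter. `CFIOp` and `replayCFI` are hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class CFIOp { RememberState, RestoreState, Other };

// Replays a function's CFI program with a stack of saved states (modeled
// as a depth counter). Instead of asserting that the stack is empty at
// the end, record a warning and continue.
size_t replayCFI(const std::vector<CFIOp> &Ops,
                 std::vector<std::string> &Warnings) {
  size_t Depth = 0;
  for (CFIOp Op : Ops) {
    if (Op == CFIOp::RememberState)
      ++Depth;
    else if (Op == CFIOp::RestoreState && Depth > 0)
      --Depth; // restore without a matching remember is not modeled here
  }
  if (Depth != 0)
    Warnings.push_back("non-empty CFI state stack at end of function");
  return Depth;
}
```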
2024-08-06 17:23:43 -07:00
Amir Ayupov
3023b15fb1 [BOLT] Support POSSIBLE_PIC_FIXED_BRANCH
Detect and support fixed PIC indirect jumps of the following form:
```
movslq  En(%rip), %r1
leaq  PIC_JUMP_TABLE(%rip), %r2
addq  %r2, %r1
jmpq  *%r1
```

with PIC_JUMP_TABLE that looks like the following:

```
  JT:  ----------
   E1:| L1 - JT  |
      |----------|
   E2:| L2 - JT  |
      |----------|
      |          |
         ......
   En:| Ln - JT  |
       ----------
```
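A minimal sketch of how the jump target follows from that sequence, assuming 32-bit table entries as implied by `movslq`. `picJumpTarget` is a hypothetical helper, not BOLT's code. Because the load reads the fixed entry En rather than an indexed one, the jump can only reach Ln:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Computes the target of "movslq En(%rip), %r1; leaq JT(%rip), %r2;
// addq %r2, %r1; jmpq *%r1" given the jump table address and contents.
// Entries are read in host byte order (little-endian on x86-64).
uint64_t picJumpTarget(uint64_t JTAddress, const uint8_t *JTBytes,
                       size_t EntryIndex) {
  int32_t Entry; // each entry holds the 32-bit difference Ln - JT
  std::memcpy(&Entry, JTBytes + EntryIndex * sizeof(int32_t), sizeof(Entry));
  // movslq sign-extends the entry; addq adds the table base.
  return JTAddress + static_cast<uint64_t>(static_cast<int64_t>(Entry));
}
```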

Such code can be produced by compilers; see
https://github.com/llvm/llvm-project/issues/91648.

Test Plan: updated jump-table-fixed-ref-pic.test

Reviewers: maksfb, ayermolo, dcci, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/91667
2024-07-18 20:57:05 -07:00
Fangrui Song
2718654c54
[MC] Support .cfi_label
GNU assembler 2.26 introduced the .cfi_label directive. It does not
expand to any CFI instructions, but defines a label in
.eh_frame/.debug_frame, which can be used by runtime patching code to
locate the FDE. .cfi_label is not allowed for CIE's initial
instructions, and can therefore be used to force the next instruction to
be placed in an FDE instead of a CIE.

In glibc since 2018, sysdeps/riscv/start.S utilizes .cfi_label to force
DW_CFA_undefined to be placed in an FDE. arc/csky/loongarch ports have
copied this use.
```
.cfi_startproc
// DW_CFA_undefined is allowed for CIE's initial instructions.
// Without .cfi_label, gas would place DW_CFA_undefined in a CIE.
.cfi_label .Ldummy
.cfi_undefined ra
.cfi_endproc
```

No CFI instruction is associated with .cfi_label, so the `case
MCCFIInstruction::OpLabel:` code in BOLT is unreachable and only exists
to make -Wswitch happy.

Close #97222

Pull Request: https://github.com/llvm/llvm-project/pull/97922
2024-07-07 12:41:13 -07:00
Amir Ayupov
344228ebf4 [BOLT] Drop macro-fusion alignment (#97358)
9d0754ada5dbbc0c009bcc2f7824488419cc5530 dropped MC support required for
optimal macro-fusion alignment in BOLT. Remove the support in BOLT as
performance measurements with large binaries didn't show a significant
improvement.

Test Plan:
macro-fusion alignment was never upstreamed, so no upstream tests are
affected.
2024-07-02 09:20:41 -07:00
Nathan Sidwell
6c5b62b846
[BOLT][NFC] Separate isReversibleBranch's 2 semantics (#95572)
`isUnsupportedBranch` was renamed (and inverted) to `isReversibleBranch`, as that was how it was being used. But one use in `BinaryFunction::disassemble` relied on the original meaning to detect unsupported branches, and `isUnsupportedBranch` combined 2 separate semantic checks.

Move the unsupported branch check from `isReversibleBranch` to a new entry point: `isUnsupportedInstruction`. Call that from `BinaryFunction::disassemble`.

Move the dynamic branch check from X86's isReversibleBranch to the base class, as it is not an architecture-specific check.

Remove unnecessary `isReversibleBranch` calls from Instrumentation and X86 MCPlusBuilder.
2024-06-28 07:45:37 -04:00
Maksim Panchenko
d16b21b17d
[BOLT][Linux] Support ORC for alternative instructions (#96709)
Alternative instruction sequences in the Linux kernel can modify the
stack and thus they need their own ORC unwind entries. Since there's
only one ORC table, it has to be "shared" among multiple instruction
sequences. The kernel achieves this by putting a restriction on
instruction boundaries. If ORC state changes at a given IP, only one of
the alternative sequences can have an instruction starting/ending at
this IP. Then, developers can insert NOPs to guarantee the above
requirement is met.

The most common use of ORC with alternatives is "pushf; pop %rax"
sequence used for paravirtualization. Note that newer kernel versions
no longer use .parainstructions; instead, they utilize alternatives for
the same purpose.

Before we implement better support for alternatives, we can safely
skip ORC entries associated with them.

Fixes #87052.
2024-06-27 19:26:11 -07:00
Maksim Panchenko
ca06b61084
[BOLT] Omit CFI state while printing functions without CFI (#96723)
If a function has no CFI program attached to it, do not print redundant
empty CFI state for every basic block.
2024-06-27 17:26:58 -07:00
Nikita Popov
b23fe1088f [bolt] Add missing <stack> include (NFC) 2024-06-21 14:02:15 +02:00
shaw young
4be3083bb3
[BOLT] Remove mutable from BB::LayoutIndex (#93224)
Removed mutability from BB::LayoutIndex and, consequently, removed const
from BB::SetLayout. Changed BF::dfs to track visited blocks with a set
instead of tracking and altering LayoutIndexes, which makes the code
more consistent.
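A set-based traversal of this kind can be sketched as follows. `Block` and `dfs` here are hypothetical simplifications, not BOLT's BinaryBasicBlock or BF::dfs:

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified basic block.
struct Block {
  int Id;
  std::vector<Block *> Succs;
};

// Preorder DFS that records visited blocks in a set instead of writing
// into a per-block field such as a layout index.
std::vector<Block *> dfs(Block *Entry) {
  std::vector<Block *> Order;
  std::unordered_set<const Block *> Visited;
  std::vector<Block *> Stack{Entry};
  while (!Stack.empty()) {
    Block *BB = Stack.back();
    Stack.pop_back();
    if (!Visited.insert(BB).second)
      continue; // already visited
    Order.push_back(BB);
    // Push successors in reverse so the first successor is visited first.
    for (auto It = BB->Succs.rbegin(); It != BB->Succs.rend(); ++It)
      if (!Visited.count(*It))
        Stack.push_back(*It);
  }
  return Order;
}
```

Keeping the visited state local to the traversal means the blocks themselves are never modified, which is what allows dropping `mutable`.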
2024-05-31 11:52:22 -07:00
Amir Ayupov
f239490592
[BOLT][NFC] Define getExprValue helper (#91663)
Move out common code extracting the address of a MCExpr. To be reused in
#91667.

Test Plan: NFC
2024-05-24 15:33:25 -07:00
Amir Ayupov
720cade2b6
[BOLT][NFC] Avoid computing BF hash twice in YAML reader (#75096)
We compute BF hashes in `YAMLProfileReader::readProfile` when first
matching profile functions with binary functions, and a second time in
`YAMLProfileReader::parseFunctionProfile` during the profile assignment
(we need to do that to account for LTO private functions with
mismatching suffix).

Avoid recomputing the hash if it's been set.
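The caching pattern amounts to computing the hash only when it is unset. `Func` and `getOrComputeHash` below are hypothetical names used for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for BinaryFunction with a cached hash.
struct Func {
  std::optional<uint64_t> Hash;
};

// Computes the hash only if it has not been set yet.
template <typename ComputeFn>
uint64_t getOrComputeHash(Func &F, ComputeFn Compute) {
  if (!F.Hash)
    F.Hash = Compute(F);
  return *F.Hash;
}
```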
2024-05-24 14:00:03 -07:00
Amir Ayupov
935b946b1f
[BOLT] Process cross references between ignored functions in BAT mode (#92484)
To align YAML and fdata profiles produced in BAT mode, lift two
restrictions applied in non-relocation mode when BAT is present:
1) register secondary entry points from ignored functions,
2) treat functions with secondary entry points as simple.

This allows constructing CFG for non-simple functions in non-relocation
mode and emitting YAML profile for them, which can then be used for
optimizations in relocation mode.

Test Plan: added test ignored-interprocedural-reference.s
2024-05-21 20:22:12 -07:00
Nathan Sidwell
76fdc2e527
[BOLT][NFC] Rename isUnsupportedBranch to isReversibleBranch (#92447)
`isUnsupportedBranch` is not a very informative name, and doesn't match
its corresponding `reverseBranchCondition`, as I noted in PR #92018.
Here's a renaming to a more mnemonic name.
2024-05-17 15:40:40 -04:00
Nathan Sidwell
725014d866
[BOLT][NFC] Simplify CFG validation (#91977)
Remove 'Valid' local boolean that has a single use, and return directly instead.
2024-05-14 09:36:34 -04:00
Amir Ayupov
db29f20fdd
[BOLT] Ignore returns in DataAggregator
Returns are ignored in the perf/pre-aggregated/fdata profile reader (see
DataReader::convertBranchData). They are also omitted in
YAMLProfileWriter by virtue of not having the profile attached to them
in the reader, and YAMLProfileWriter converting the profile attached to
BinaryFunctions. Thus, return profile is universally ignored across all
profile types except BAT YAML.

To make returns ignored for YAML produced in BAT mode, we can:
1) ignore them in YAMLProfileReader,
2) omit them from YAML profile in profile conversion/writing.

The first option is prone to the profile staleness issue, where the profiled
binary doesn't match the one to be optimized, and thus returns in the
profile can no longer be reliably detected (as we don't distinguish them
from calls in the profile).

The second option is robust to staleness but requires disassembling the
branch source instruction.

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/90807
2024-05-08 12:02:18 -07:00
Amir Ayupov
fd38366e45
[BOLT][NFC] Clean includes, add license headers (#87200) 2024-03-31 19:29:45 -07:00
Amir Ayupov
d12e45ad16
[BOLT][NFC] Split out DomTree construction from BF::calculateLoopInfo (#87181) 2024-03-31 06:24:19 -07:00
Amir Ayupov
d8fe2e4bb0
[BOLT] Fix enumeration of secondary entry points
Make them start with 1 instead of 0 (reserved for primary entry point).

Test Plan:
```
bin/llvm-lit -a tools/bolt/test/X86/yaml-secondary-entry-discriminator.s
```

Reviewers: rafaelauler, ayermolo, maksfb, dcci

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/86848
2024-03-27 15:23:49 -07:00
Maksim Panchenko
6b1cf00400
[BOLT] Add support for Linux kernel static keys jump table (#86090)
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever the condition changes, the kernel runtime
modifies all code paths associated with that key, flipping the code
between a nop and an (unconditional) jump.
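A minimal sketch of the flip for one patch site, assuming the 5-byte forms (`jmp rel32` and the 5-byte NOP `0F 1F 44 00 00`). `PatchSite` and `setStaticBranch` are hypothetical; the kernel may also use shorter 2-byte forms:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using PatchSite = std::array<uint8_t, 5>;

// Flips a 5-byte patch site between a NOP and "jmp rel32" to Target.
void setStaticBranch(PatchSite &Site, uint64_t SiteAddr, uint64_t Target,
                     bool Enabled) {
  if (!Enabled) {
    Site = {0x0F, 0x1F, 0x44, 0x00, 0x00}; // nopl 0x0(%rax,%rax,1)
    return;
  }
  // rel32 is relative to the end of the 5-byte jmp instruction. Range
  // checking is omitted in this sketch.
  const uint32_t Rel = static_cast<uint32_t>(Target - (SiteAddr + 5));
  Site[0] = 0xE9;
  for (int I = 0; I < 4; ++I)
    Site[1 + I] = static_cast<uint8_t>(Rel >> (8 * I)); // little-endian
}
```

Both encodings are 5 bytes long, so the kernel can swap one for the other in place without shifting any surrounding code.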
2024-03-21 14:05:21 -07:00
Maksim Panchenko
d7d564b2fc
[BOLT] Add BinaryFunction::registerBranch(). NFC (#83337)
Add an external interface to register a branch in a function that is in
disassembled state. This allows custom modifications to the
disassembler. E.g., a pre-CFG pass can add an instruction and register a
branch that will later be used during the CFG construction.
2024-02-28 20:04:28 -08:00
Maksim Panchenko
3f2a9e5910
[BOLT] Sort TakenBranches immediately before use. NFCI (#83333)
Move code that sorts TakenBranches right before the branches are used.
We can populate TakenBranches in pre-CFG post-processing and hence have
to postpone the sorting to a later point in the processing pipeline.
Will add such a pass later. For now it's NFC.
2024-02-28 19:51:44 -08:00
Maksim Panchenko
7c206c7812
[BOLT] Refactor interface for instruction labels. NFCI (#83209)
To avoid accidentally setting the label twice for the same instruction,
which can lead to a "lost" label, introduce getOrSetInstLabel()
function. Rename existing functions to getInstLabel()/setInstLabel() to
make it explicit that they operate on instruction labels. Add an
assertion in setInstLabel() that the instruction did not have a prior
label set.
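The get-or-set pattern can be sketched with a hypothetical label table. BOLT attaches MCSymbols to MCInsts. The `InstLabels` class, the string labels and the `.Ltmp` naming below are illustrative assumptions:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified instruction.
struct Inst {
  int Opcode;
};

// Hypothetical label table keyed by instruction address.
class InstLabels {
  std::unordered_map<const Inst *, std::string> Labels;
  unsigned NextId = 0;

public:
  const std::string *getInstLabel(const Inst &I) const {
    auto It = Labels.find(&I);
    return It == Labels.end() ? nullptr : &It->second;
  }
  void setInstLabel(const Inst &I, std::string Label) {
    // Setting a second label would silently "lose" the first one.
    assert(!Labels.count(&I) && "instruction already has a label");
    Labels.emplace(&I, std::move(Label));
  }
  const std::string &getOrSetInstLabel(const Inst &I) {
    if (const std::string *Existing = getInstLabel(I))
      return *Existing;
    setInstLabel(I, ".Ltmp" + std::to_string(NextId++));
    return Labels.at(&I);
  }
};
```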
2024-02-27 18:44:28 -08:00
Amir Ayupov
52cf07116b
[BOLT][NFC] Log through JournalingStreams (#81524)
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test, log.test, enforces this by verifying that
no strings are printed to the screen once the `--log-file` option is
used.
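The idea of routing output through a redirectable stream can be sketched with the standard library. `LogContext`, `setLogFile` and `setStream` are hypothetical and only mimic the spirit of `BinaryContext::outs()`, which uses LLVM's raw_ostream rather than std::ostream:

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical logging holder in the spirit of BinaryContext::outs().
class LogContext {
  std::unique_ptr<std::ofstream> File; // owns the log file, if any
  std::ostream *Out = &std::cout;      // default: print to the screen

public:
  // Redirects logging to a file; returns false if it cannot be opened.
  bool setLogFile(const std::string &Path) {
    auto F = std::make_unique<std::ofstream>(Path);
    if (!*F)
      return false;
    File = std::move(F);
    Out = File.get();
    return true;
  }
  // Redirects logging to a stream owned by the caller.
  void setStream(std::ostream &OS) { Out = &OS; }
  std::ostream &outs() { return *Out; }
};
```

Code that always writes through `outs()` never needs to know where the output ends up; on Linux, passing "/dev/null" to `setLogFile` would discard it, similar to `--log-file=/dev/null`.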

In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.

Because this is a significant change by itself, not all code has been
ported yet. Code from the Profiler libs (DataAggregator and friends)
still prints errors directly to the screen.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:53:53 -08:00
Amir Ayupov
13d60ce2f2
[BOLT][NFC] Propagate BOLTErrors from Core, RewriteInstance, and passes (2/2) (#81523)
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch continue the migration
on libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.

Test Plan: NFC

Co-authored-by: Rafael Auler <rafaelauler@fb.com>
2024-02-12 14:51:15 -08:00
Amir Ayupov
b039ccc684
[BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253)
Provide backwards compatibility for YAML profile that uses `std::hash`:
xxh3 hash is the default for newly produced profile (sets
`std-hash: false`), whereas the profile that doesn't specify `std-hash`
will be treated as `std-hash: true`, preserving old behavior.
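The defaulting rule fits in a few lines. `HashFunction` and `selectHashFunction` are hypothetical names. The missing field is modeled as an empty `std::optional`:

```cpp
#include <cassert>
#include <optional>

enum class HashFunction { StdHash, XXH3 };

// Maps the optional `std-hash` header field to the hash function used
// when matching the profile. A missing field means an older profile
// produced with std::hash.
HashFunction selectHashFunction(std::optional<bool> StdHashField) {
  return StdHashField.value_or(true) ? HashFunction::StdHash
                                     : HashFunction::XXH3;
}
```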
2023-12-11 12:27:32 -08:00
Maksim Panchenko
4f3081296f
[BOLT][NFC] Fix comment (#73983)
Fix off-by-one error in comment.
2023-11-30 14:31:38 -08:00
Maksim Panchenko
4bcbbe1f70
[BOLT] Refactor fixBranches() (#73752)
Simplify code in fixBranches(). Mostly NFC, except that the x86-specific
check for code fragments now takes into account the presence of more
than two fragments. This should only matter when we split code into
multiple fragments and can run fixBranches() more than once.

Also, don't replace a branch target with the same one, as such operation
may allocate memory for extra MCSymbolRefExpr.
2023-11-29 16:24:16 -08:00
spupyrev
e7dd596c68
[BOLT] Use deterministic xxh3 for computing BF/BB hashes (#72542)
std::hash and ADT/Hashing::hash_value are non-deterministic functions
whose results might vary across implementations/processes/executions.
Use xxh3 instead for computing hashes of BinaryFunctions and
BinaryBasicBlocks for stale profile matching.
(A possible alternative is to use ADT/StableHashing.h based on FNV
hashing, but xxh3 seems to be more popular in LLVM.)
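To show what "deterministic" buys here, the sketch below uses 64-bit FNV-1a, the alternative mentioned above, because xxh3 is too long to reproduce. It is not the hash BOLT adopted. `fnv1a64` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a: the result depends only on the input bytes, unlike
// std::hash, whose values may differ between implementations or runs.
uint64_t fnv1a64(std::string_view Data) {
  uint64_t Hash = 0xcbf29ce484222325ULL; // FNV offset basis
  for (unsigned char C : Data) {
    Hash ^= C;
    Hash *= 0x100000001b3ULL; // FNV prime
  }
  return Hash;
}
```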

This is to address https://github.com/llvm/llvm-project/issues/65241.
2023-11-27 14:45:46 -08:00
Maksim Panchenko
f4834255d3
[BOLT] Reset output addresses for deleted blocks (#73429)
This is a follow-up to #73076. We need to reset output addresses for
deleted blocks, otherwise the address translation may mistakenly
attribute input address of a deleted block to a non-zero address.

While working on a test case, I've discovered that DWARF output ranges
were already broken for deleted basic blocks: #73428. I will provide a
test case for this PR with a DWARF address range fix PR.
2023-11-25 23:23:47 -08:00
Maksim Panchenko
365114292a
[BOLT][NFC] Refactor function state check (#73420)
Remove redundant check in updateOutputValues().
2023-11-25 21:09:54 -08:00
ShatianWang
d333c0e062
[BOLT] Extend calculateEmittedSize() for block size calculation (#73076)
This commit modifies BinaryContext::calculateEmittedSize() to update
the BinaryBasicBlock::OutputAddressRange of each basic block in the
function in place. BinaryBasicBlock::getOutputSize() now gives the
emitted size of the basic block.
2023-11-23 15:28:31 -05:00
Maksim Panchenko
f653f6d57a
[BOLT][NFC] Delete unused declarations (#72596) 2023-11-16 23:36:19 -08:00
Vladislav Khmelevsky
5b59540661
[BOLT] Enhance fixed indirect branch handling (#71324)
Previously, HasFixedIndirectBranch was set in BF so that isSimple would
later be set to false, because the unreachable BB elimination pass might
remove a BB whose symbols are accessed by instructions other than calls.
A better solution seems to be adding an extra entry point at the target
offset instead of marking the BF as non-simple.
2023-11-16 09:30:55 +04:00
Maksim Panchenko
e823136d43
[BOLT] Refactor --keep-nops option. NFC. (#72228)
Run RemoveNops pass only if --keep-nops is set to false (default).
2023-11-14 11:28:13 -08:00
Maksim Panchenko
f633f325a1
[BOLT] Fix NOP instruction emission on x86 (#72186)
Use MCAsmBackend::writeNopData() interface to emit NOP instructions on
x86. There are multiple forms of NOP instruction on x86 with different
sizes. Currently, LLVM's assembly/disassembly does not support all forms
correctly which can lead to a breakage of input code semantics, e.g. if
the program relies on NOP instructions for reserving a patch space.

Add "--keep-nops" option to preserve NOP instructions.
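Why size matters can be illustrated with the multi-byte NOP sequences recommended in the Intel SDM. `makeNopPadding` is a hypothetical helper. It does not reproduce LLVM's `MCAsmBackend::writeNopData()`. It fills a requested number of bytes using the longest available NOP forms:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns exactly Size bytes of x86 NOPs, using multi-byte forms of up
// to 9 bytes each.
std::vector<uint8_t> makeNopPadding(size_t Size) {
  static const std::vector<uint8_t> Nops[] = {
      {0x90},
      {0x66, 0x90},
      {0x0F, 0x1F, 0x00},
      {0x0F, 0x1F, 0x40, 0x00},
      {0x0F, 0x1F, 0x44, 0x00, 0x00},
      {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
      {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
      {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
      {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
  };
  std::vector<uint8_t> Out;
  while (Size > 0) {
    const size_t Chunk = Size < 9 ? Size : 9;
    const std::vector<uint8_t> &Nop = Nops[Chunk - 1];
    Out.insert(Out.end(), Nop.begin(), Nop.end());
    Size -= Chunk;
  }
  return Out;
}
```

A program that reserves, say, 5 bytes for later patching depends on getting back one 5-byte instruction of exactly that size, which is why the original NOP encoding or at least its size has to survive rewriting.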
2023-11-13 18:12:39 -08:00
Maksim Panchenko
2db9b6a93f
[BOLT] Make instruction size a first-class annotation (#72167)
When NOP instructions are used to reserve space in the code, e.g. for
patching, it becomes critical to preserve their original size while
emitting the code. On x86, we rely on the "Size" annotation for the size
of NOP instructions, as the original instruction size is lost in the
disassembly/assembly process.

This change makes instruction size a first-class annotation and is
effectively NFCI. A follow-up diff will use the annotation for code
emission.
2023-11-13 14:33:39 -08:00