1386 Commits

Author SHA1 Message Date
Alexandros Lamprineas
64b728128d
[BOLT][AArch64] Add minimal support for liveness analysis. (#183298)
In this patch I am adding the missing target hooks required for the
liveness analysis to run on AArch64. These are
 - getFlagsReg()
 - getRegsUsedAsParams()
 - getDefaultLiveOut()
 - getGPRegs()
 - isCleanRegXOR()

I am also introducing the following API in LivenessAnalysis
 - BitVector getLiveIn/Out(const MCInst &)
 - MCPhysReg scavengeRegFromState(BitVector &)
 
My intention is to allow the LongJmp pass to scavenge usable registers when
injecting code.
2026-04-02 11:59:59 +01:00
wangjue
8c2feea2f7
[BOLT] Delete unnecessary instructions (#189297) 2026-04-02 06:48:38 +03:00
Alexandros Lamprineas
abc0674f83
[BOLT][AArch64] Handle irreversible branches in compact-code-model (#186850)
When the compact-code-model is used, LongJmpPass::relaxLocalBranches
attempts to reverseBranchCondition without calling isReversibleBranch
resulting in a runtime error. With this patch I am adding an additional
trampoline to handle irreversible FEAT_CMPBR branches.

In the future the plan is to use liveness analysis and replace the
irreversible branch with compare followed by branch (see #185731) as
long as the condition flags are dead, or emit the additional trampoline
otherwise.
2026-03-27 13:41:58 +00:00
Amir Ayupov
2fafeb0509 [BOLT] Support buildid in pre-aggregated profile (#186931)
Sample addresses belonging to external DSOs (buildid doesn't match the
current file) are treated as external (0).

Buildid for the main binary is expected to be omitted.

Test Plan:
added pre-aggregated-perf-buildid.test
2026-03-24 15:15:08 -07:00
Amir Ayupov
2e247a1d54 Revert "[BOLT] Support buildid in pre-aggregated profile"
Accidentally pushed unreviewed version.

This reverts commit fce6895804e596f18765c4db0f76931dac8df9f8.
2026-03-24 15:13:14 -07:00
Amir Ayupov
fce6895804 [BOLT] Support buildid in pre-aggregated profile
Sample addresses belonging to external DSOs (buildid doesn't match the
current file) are treated as external (0).

Buildid for the main binary is expected to be omitted.

Test Plan: added pre-aggregated-perf-buildid.test

Reviewers:
paschalis-mpeis, maksfb, yavtuk, ayermolo, yozhu, rafaelauler, yota9

Reviewed By: paschalis-mpeis

Pull Request: https://github.com/llvm/llvm-project/pull/186931
2026-03-24 15:05:33 -07:00
Fangrui Song
d1b9b4c548
[MC] Remove unused NoExecStack parameter from MCStreamer::initSections. NFC (#188184)
Unused after commit 34bc5d580b73c0ca79653bb03e5c50419be2c634
2026-03-24 07:42:09 +00:00
Ádám Kallai
733bc3409b
[BOLT][Perf2bolt] Add support to generate pre-parsed perf data (#171144)
Adding a generator to Perf2bolt is the initial step toward supporting
large end-to-end tests for Arm SPE. This functionality provides a unified format for the
pre-parsed profile that Perf2bolt is able to consume.

Why does the test need to have a textual format SPE profile?

* Collecting an Arm SPE profile with Linux Perf requires
an Arm development device with SPE support.
* Decoding SPE data also requires a suitable version of
Linux Perf.
* The minimum required version of Linux Perf is v6.15.

To bypass these technical difficulties, it is easier to provide
a pre-generated textual profile.

The generator relies on the aggregator to spawn the required
perf-script jobs based on the aggregation type, and merges the
results of the perf-script jobs into a single file.
This hybrid profile will contain all required events such as BuildID,
MMAP, TASK, BRSTACK, or MEM event for the aggregation.

The examples below show how to generate pre-parsed perf data.
As input for Arm SPE aggregation:

`perf2bolt -p perf.data BINARY -o perf.text --spe --generate-perf-script`

Or for basic aggregation:

`perf2bolt -p perf.data BINARY -o perf.text --ba --generate-perf-script`
2026-03-23 12:03:52 +01:00
Shanzhi Chen
de514fbaba
[BOLT] Remove some unused code (NFC) (#183880)
Remove some unused code in BOLT:
- `RewriteInstance::linkRuntime` is declared but not defined
- `BranchContext` typedef is never used
- `FuncBranchData::getBranch` is defined but never used
- `FuncBranchData::getDirectCallBranch` is defined but never used
2026-03-23 09:13:00 +00:00
YongKang Zhu
b7d97d9e8d
[BOLT] Remove outdated assertion from local symtab update logic (#187409)
The assert condition (function is not split or split
into less than three fragments) is not always true now
that we will emit more local symbols due to #184074.
2026-03-21 13:15:49 -07:00
Vasily Leonenko
51fd033521
[BOLT] Enable compatibility of instrumentation-file-append-pid with instrumentation-sleep-time (#183919)
This commit enables compatibility of instrumentation-file-append-pid and
instrumentation-sleep-time options. It also requires keeping the
counters mapping between the watcher process and the instrumented binary
process in shared mode. This is useful when we instrument a shared
library that is used by several tasks running on the target system. When
we cannot wait for every task to complete, we must use the
sleep-time option. Without the append-pid option, we would overwrite the
profile at the same path but collected from different tasks, leading to
unexpected or suboptimal optimization effects.

Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
2026-03-18 09:14:03 +03:00
YongKang Zhu
037c2095e6
Add hybrid function ordering support (#186003)
Allow `--function-order` to be combined with `--reorder-functions`
algorithms. Functions listed in the order file are pinned first
(indices 0..N-1), then the selected algorithm orders remaining
functions starting at index N.
2026-03-17 11:12:54 -07:00
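The index assignment described in #186003 can be sketched roughly as follows. This is a simplified model, not BOLT's code: `assignIndices`, `Pinned`, and `AlgoOrder` are hypothetical names, and functions are represented only by their names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model: Pinned holds the names from the order file, AlgoOrder
// holds all functions in the order chosen by the reordering algorithm.
// Returns a map from function name to its final index.
std::unordered_map<std::string, std::size_t>
assignIndices(const std::vector<std::string> &Pinned,
              const std::vector<std::string> &AlgoOrder) {
  std::unordered_map<std::string, std::size_t> Index;
  // Pinned functions take indices 0..N-1 in order-file order.
  for (const std::string &Name : Pinned)
    Index.emplace(Name, Index.size());
  // Remaining functions continue from N; emplace does nothing for
  // names that were already pinned.
  for (const std::string &Name : AlgoOrder)
    Index.emplace(Name, Index.size());
  return Index;
}
```

With `Pinned = {"c", "a"}` and `AlgoOrder = {"a", "b", "c", "d"}`, the pinned functions keep indices 0 and 1, and `b` and `d` follow in algorithm order.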
Anatoly Trosinenko
481da949a4
[BOLT] Gadget scanner: implement finer-grained --scanners=... argument (#176135)
Add separate options to enable each of the available gadget detectors.
Furthermore, add two meta-options enabling all PtrAuth scanners and all
available scanners of any type (which is only PtrAuth for now, though).

This commit renames `pacret` option to `ptrauth-pac-ret` and `pauth` to
`ptrauth-all`.
2026-03-13 15:03:25 +00:00
Ádám Kallai
fd225e296f
[BOLT] Spawn buildid-list perf job at perf2bolt start. NFC (#185865)
Launch this perf job with the others at the beginning of the aggregation
process.

Extracting buildid-list from perf data is not a costly process, so it
can be performed by default. This provides a distinct advantage when
this dataset is required in other perf2bolt stages as well.

Please see PR #171144.
2026-03-12 10:24:09 +01:00
Amina Chabane
498906f2df
[BOLT] Error out on SHF_COMPRESSED debug sections (#185662)
Some binaries are built using `-gz=zstd`, but when using
`--update-debug-sections` on said binaries BOLT crashes.

This patch fixes this issue by recognising compressed debug sections in
binaries via their flag `SHF_COMPRESSED` and appropriately erroring out.

Legacy GNU-style compression is not handled.
2026-03-10 10:18:12 -07:00
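The check described in #185662 comes down to testing one bit in the section header flags. A minimal sketch, assuming the flags have already been read into a 64-bit integer; the constant and function names here are illustrative and are not taken from BOLT:

```cpp
#include <cstdint>

// Value of SHF_COMPRESSED in the ELF gABI. Named with a suffix here to avoid
// clashing with a system <elf.h> macro.
constexpr std::uint64_t SHF_COMPRESSED_FLAG = 0x800;

// True if the section header flags mark the section as compressed.
// Legacy GNU-style compression does not set this flag, so it is not detected.
constexpr bool isCompressedSection(std::uint64_t ShFlags) {
  return (ShFlags & SHF_COMPRESSED_FLAG) != 0;
}
```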
Fangrui Song
c889454f1d
[MC] Rename PrivateGlobalPrefix to InternalSymbolPrefix. NFC (#185164)
The "private global" terminology, which likely came from
llvm/lib/IR/Mangler.cpp, is misleading: "private" is the opposite of
"global", and these prefixed symbols are not global in the object file
format sense (e.g. ELF has STB_GLOBAL while these symbols are always
STB_LOCAL). The term "internal symbol" better describes their purpose:
symbols for internal use by compilers and assemblers, not meant to be
visible externally.

This rename is a step toward adopting the "internal symbol prefix"
terminology agreed with GNU as
(https://sourceware.org/pipermail/binutils/2026-March/148448.html).
2026-03-10 01:03:27 -07:00
Asher Dobrescu
7bce678ec1
[BOLT] Check if symbol is in data area of function (#160143)
There are cases in which `getEntryIDForSymbol` is called, where the
given Symbol is in a constant island, and so BOLT can not find its
function. This causes BOLT to reach `llvm_unreachable("symbol not
found")` and crash. This patch adds a check that avoids this crash.
2026-03-06 10:37:54 +00:00
YongKang Zhu
95685ca52e
[BOLT] Retain certain local symbols (#184074)
BOLT currently strips all STT_NOTYPE STB_LOCAL zero-sized symbols
that fall inside function bodies. Certain such symbols are named
labels (loop markers and subroutine entry points) or local function
symbols in hand-written assembly. We now keep them in the local symbol
table of BOLT-processed binaries for better symbolication.
2026-03-05 00:34:36 -08:00
YongKang Zhu
14bcb1a009
[BOLT] Make sure IOAddressMap exist before lookup (NFC) (#183184)
`BinaryFunction::translateInputToOutputAddress()` contains fallback
logic in case that querying `IOAddressMap` doesn't yield an output
address. Because this function could be called in scenarios where
`IOAddressMap` won't be set up, we should check if the map actually
exists before lookup.
2026-03-01 23:27:39 -08:00
Gergely Bálint
9d762ad279
[BOLT][BTI] Patch ignored functions in place when targeting them with indirect branches (#177165)
When applying BTI fixups to indirect branch targets, ignored functions
are considered as a special case:
- these hold no instructions,
- have no CFG,
- and are not emitted in the new text section.

The solution is to patch the entry points in the original location.

If such a situation occurs in a binary, recompilation using the
-fpatchable-function-entry flag is required. This will place a nop at
all function starts, which BOLT can use to patch the original section.

Without the extra nop, BOLT cannot safely patch the original .text
section.

An alternative solution could be to also ignore the function from which
the stub starts. This has not been tried as LongJmp pass - where most
stubs are inserted - is currently not equipped to ignore functions.

Testing: both the success and failure cases are covered with lit tests.
2026-02-24 11:09:42 +01:00
Maksim Panchenko
7063b22c63
[BOLT] Always place new PT_LOAD after existing ones (#182642)
Insert new PT_LOAD segments right after the last existing PT_LOAD in the
program header table, instead of before PT_DYNAMIC or at the end. This
maintains the ascending p_vaddr order required by the ELF specification.

Previously, new segments could end up breaking PT_LOAD p_vaddr order
when PT_LOAD segments followed PT_DYNAMIC or PT_GNU_STACK. This led to
the runtime loader incorrectly assessing dynamic object size and silently
corrupting memory.
2026-02-21 14:09:36 -08:00
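The placement rule from #182642 can be illustrated with a small model of the program header table. This is a sketch under assumptions: `Phdr` is a two-field stand-in rather than a real ELF structure, `insertAfterLastLoad` is a hypothetical name, and the new segment is assumed to have a higher `p_vaddr` than every existing PT_LOAD.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t PT_LOAD_TYPE = 1; // value of PT_LOAD in the ELF spec

// Minimal stand-in for a program header entry.
struct Phdr {
  std::uint32_t Type;
  std::uint64_t VAddr;
};

// Insert NewSeg right after the last existing PT_LOAD entry.
// If the table has no PT_LOAD at all, NewSeg is appended.
void insertAfterLastLoad(std::vector<Phdr> &Table, const Phdr &NewSeg) {
  std::size_t Pos = Table.size();
  // Scan backwards to find the last PT_LOAD.
  for (std::size_t I = Table.size(); I-- > 0;) {
    if (Table[I].Type == PT_LOAD_TYPE) {
      Pos = I + 1;
      break;
    }
  }
  Table.insert(Table.begin() + static_cast<std::ptrdiff_t>(Pos), NewSeg);
}
```

In a table laid out as PT_LOAD, PT_DYNAMIC, PT_LOAD, PT_GNU_STACK, the new entry lands between the second PT_LOAD and PT_GNU_STACK, so the PT_LOAD entries stay in ascending address order.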
Amir Ayupov
393adaac1d
[BOLT] Mark BOLTReserved segment executable (#181606)
Summary:
When .bolt_reserved section is defined in the linker script, there's
no way to mark the containing segment executable other than via PHDRS
command which overrides program headers entirely which is impractical.

Since .bolt_reserved contains executable code, mark segment executable
in BOLT.

Test Plan: bolt-reserved.test
2026-02-19 15:07:50 -08:00
Fangrui Song
6f0b0ecaba
[NFC] Ensure MCTargetOptions outlives MCAsmInfo at createMCAsmInfo call sites (#180465)
Preparatory change for storing the MCTargetOptions pointer in MCAsmInfo
(#180464)
2026-02-17 21:48:22 -08:00
Alexey Moksyakov
12b561a5e2
[bolt][aarch64] Change indirect call instrumentation snippet (#180229)
The indirect call instrumentation snippet uses the x16 register in the exit
handler to jump to the destination target:

    __bolt_instr_ind_call_handler_func:
            msr  nzcv, x1
            ldp  x0, x1, [sp], #16
            ldr  x16, [sp], #16
            ldp  x0, x1, [sp], #16
            br   x16	<-----

This patch changes the instrumentation snippet to call the instrumentation
runtime library through an indirect call instruction, and adds a wrapper
that stores and reloads the target value and the register used by the
original indirect instruction.

Example:
            mov x16, foo

    indirectCall:
            adrp x8, Label
            add  x8, x8, #:lo12:Label
            blr x8

Before:

    Instrumented indirect call:
            stp     x0, x1, [sp, #-16]!
            mov     x0, x8
            movk    x1, #0x0, lsl #48
            movk    x1, #0x0, lsl #32
            movk    x1, #0x0, lsl #16
            movk    x1, #0x0
            stp     x0, x1, [sp, #-16]!
            adrp    x0, __bolt_instr_ind_call_handler_func
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x0

    __bolt_instr_ind_call_handler:  (exit snippet)
            msr     nzcv, x1
            ldp     x0, x1, [sp], #16
            ldr     x16, [sp], #16
            ldp     x0, x1, [sp], #16
            br      x16    <- overwrites the original value in X16

    __bolt_instr_ind_call_handler_func:  (entry snippet)
            stp     x0, x1, [sp, #-16]!
            mrs     x1, nzcv
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler


_________________________________________________________________________

After:

            mov     x16, foo
    indirectCall:
            adrp    x8, Label
            add     x8, x8, #:lo12:Label
            blr     x8

    Instrumented indirect call:
            stp     x0, x30, [sp, #-16]!
            mov     x0, callsiteid
            stp    x8, x0, [sp, #-16]!
            adrp    x8, __bolt_instr_ind_call_handler_func
            add     x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x8       <--- call trampoline instr lib
            ldr     x8, [sp], #16
            ldp     x0, x30, [sp], #16
            blr     x8       <--- original indirect call instruction

    // don't touch regs besides x0, x1
    __bolt_instr_ind_call_handler:  (exit snippet)
            ret     <---- return to original function with indirect call

    __bolt_instr_ind_call_handler_func: (entry snippet)
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler
2026-02-16 10:45:08 +03:00
Alexandros Lamprineas
0584699c11
[BOLT][AArch64] Support FEAT_CMPBR branch instructions. (#174972)
The Armv9.6-A compare-and-branch instructions use a short range 9-bit
immediate value. They do not have a corresponding relocation type in the
ABI. For now we only support them in compact code model, with
diagnostics added in the LongJmp pass to ensure this condition. Some
interesting edge cases we cover:
- function splitting works when target is within or beyond the 1KB range
of those instructions,
 - but doesn't work beyond the 128MB limit of the compact code model
- branch inversion works with block reordering so long as the immediate
value adjustments remain in bounds
2026-02-12 15:49:00 +00:00
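The 1KB reach mentioned in #174972 follows from the immediate width. A minimal sketch of the range check, assuming the 9-bit immediate is signed and counts 4-byte instructions; the names are illustrative, not BOLT's:

```cpp
#include <cstdint>

// Assumption of this sketch: the immediate is a signed 9-bit count of
// 4-byte instructions relative to the branch.
constexpr int ImmBits = 9;
constexpr std::int64_t InsnSize = 4;
constexpr std::int64_t MinOffset =
    -(std::int64_t{1} << (ImmBits - 1)) * InsnSize; // -1024 bytes
constexpr std::int64_t MaxOffset =
    ((std::int64_t{1} << (ImmBits - 1)) - 1) * InsnSize; // +1020 bytes

// True if a byte offset can be encoded directly in the branch.
constexpr bool fitsCompareBranch(std::int64_t Offset) {
  return Offset % InsnSize == 0 && Offset >= MinOffset && Offset <= MaxOffset;
}
```

A target outside this window is the case where the commit message falls back on the compact code model handling in the LongJmp pass.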
Gergely Bálint
f7c5316468
[BOLT][BTI] Refactor: move applyBTIFixup under MCPlusBuilder (#177164)
This patch moves the applyBTIFixup from LongJmp pass to MCPlusBuilder.
This refactor allows applyBTIFixup to be called from other passes
inserting indirect branches, such as:
- Hugify,
- PatchEntries.

As different passes have different information about their targets (e.g.
target BasicBlock, target Symbol, target Function), specialized versions
are created (applyBTIFixupToSymbol, applyBTIFixupToTarget), and each
calls applyBTIFixupCommon, which implements the original logic.

Names of related lit tests are updated to have the "bti" prefix.
2026-02-12 08:29:16 +01:00
Maksim Panchenko
5129b3c449
[BOLT] Make FoldedIntoFunction always point to root parent (#180855)
After ICF folds functions, FoldedIntoFunction may point to a function
that was also folded. Add a post-processing step at the end of ICF to
flatten all chains so FoldedIntoFunction always points to the ultimate
root parent (a function that is not itself folded).
2026-02-11 11:35:02 -08:00
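The chain flattening described in #180855 resembles path compression in a union-find structure. A sketch on a simplified model, where functions are array indices and `Parent` is a hypothetical stand-in for the FoldedIntoFunction links:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: Parent[I] is the index of the function that function I
// was folded into, or I itself if function I was not folded.
// Assumes the links contain no cycles.
void flattenFoldChains(std::vector<std::size_t> &Parent) {
  for (std::size_t I = 0; I < Parent.size(); ++I) {
    // Walk up to the root: a function that is its own parent.
    std::size_t Root = I;
    while (Parent[Root] != Root)
      Root = Parent[Root];
    // Re-point every function on the path directly at the root.
    std::size_t Cur = I;
    while (Parent[Cur] != Root) {
      const std::size_t Next = Parent[Cur];
      Parent[Cur] = Root;
      Cur = Next;
    }
  }
}
```

After the pass, every entry either points to itself or to a function that was not folded, so a lookup never has to follow more than one link.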
Maksim Panchenko
f80e3b3d7e
[BOLT] Keep folded functions in BinaryFunctions map. NFC (#180392)
In relocation mode, keep folded functions in the BinaryFunctions map
instead of erasing them. Mark them as folded using setFolded() and skip
emitting them.
2026-02-10 14:56:26 -08:00
Shanzhi Chen
e4674b85e9
[BOLT][NFC] Stop populating unnecessary samples into MemSamples (#179472)
Currently, many unnecessary samples are populated into MemSamples,
including zero-initialized samples and samples in which the PC address
is not contained in any BinaryFunction. But these samples are totally
skipped during processing and the whole MemSamples vector is cleared
immediately after processing. So, we could just stop populating these
samples into MemSamples, which would reduce maximum resident set size
when processing a large perf.data.
2026-02-08 19:27:55 -08:00
Maksim Panchenko
1e5493b1b8
[BOLT] Don't fold hot text mover functions in ICF (#180367)
Hot text mover functions are placed in special sections (e.g.,
.never_hugify) to avoid being placed on hot/huge pages. Folding them
with functions from other sections could defeat this purpose.

Add a check in ICF's isIdenticalWith() to prevent folding when either
function is a hot text mover.
2026-02-07 20:39:24 -08:00
YongKang Zhu
fc89b1c2d8
[BOLT] Get symbol for const island referenced across func by relocation (#178988)
When handling a relocation in one function that references code or
data defined in another function, we should check whether the relocation
target is a constant island, and get the referenced symbol
accordingly in both cases.
2026-02-02 16:05:40 -08:00
Maksim Panchenko
2b2e02bea7
[BOLT] Refactor rewriteFunctionsInPlace from rewriteFile (#178787)
Extract the code that rewrites functions in place from rewriteFile()
into a separate rewriteFunctionsInPlace() function.
2026-01-30 11:51:29 -08:00
Maksim Panchenko
34a7608fad
[BOLT] Drop -znow requirement for PLT optimization on x86-64 (#178758)
On x86-64, PLT optimization does not require the binary to be linked
with -znow because indirect calls through GOT work correctly with lazy
binding. At runtime, the dynamic linker's resolver will populate the GOT
entry on the first call, just like with a regular PLT call.

This change removes the -znow requirement specifically for x86-64 while
keeping it for other architectures. I haven't checked RISC-V, but it's
still necessary on AArch64.
2026-01-29 16:10:43 -08:00
Francesco Petrogalli
460c9b2db1
[RISC-V][Mach-O] Add assembler support for Mach-O relocations. (#177446)
This patch adds comprehensive assembler (MC layer) support for the
Mach-O object file format on RISC-V targets, enabling assembly and
disassembly of RISC-V code targeting Apple platforms.

Key changes:

- Define RISC-V-specific Mach-O relocation types in BinaryFormat/MachO.h
- Implement RISCVMachObjectWriter with full relocation handling for:
  - PCREL_HI/LO pairs for PC-relative addressing
  - GOT relocations for external symbols
  - Branch relocations (CALL, unconditional/conditional branches)
  - Data section relocations

Test files include llvm-otool dumps to verify the generated relocations.

This code is based on code originally written by Tim Northover.
2026-01-26 14:06:30 -08:00
Gergely Bálint
de40ef2a3f
[BOLT][BTI] Patch LLD-generated PLTs to contain BTI landing pad (#173245)
This patch adds the patchPLTEntryForBTI to enable patching PLT entries
generated by LLD.

## Context:

To keep BTI consistent, targets of stubs inserted in LongJmp need to be
patched. As PLTs are not optimized and emitted by BOLT, this patch adds
a helper for patching them in the original .plt section.

For PLTs generated by LLD, this is safe as LLD inserts extra nops to
PLTs which don't already contain a BTI.

PLT entry before patching:
```
   adrp x16, Page(&(.got.plt[n]))
   ldr  x17, [x16, Offset(&(.got.plt[n]))]
   add  x16, x16, Offset(&(.got.plt[n]))
   br   x17
   nop
   nop
```

PLT entry after patching:
```
   bti c
   adrp x16, Page(&(.got.plt[n]))
   ldr  x17, [x16, Offset(&(.got.plt[n]))]
   add  x16, x16, Offset(&(.got.plt[n]))
   br   x17
   nop
```

## Safety considerations:

The PLT entry can become incorrect if shifting the ADRP moves it
across a page boundary.

The PLT entry is 24 bytes, and page size is 4096 (or 16384) bytes.
Their GCD is 8 bytes, meaning that shifting the ADRP is safe, as long as
it's shifted by less than 8 bytes.

The introduced function only shifts the ADRP by one instruction (4
bytes), meaning there is no need to recompute the ADRP offset.
2026-01-16 09:48:54 +01:00
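The safety argument in #173245 can be checked numerically. The sketch below uses the entry size and page sizes quoted in the commit message; the 8-byte alignment of the first PLT entry is an assumption of this example, and the function names are illustrative.

```cpp
#include <cstdint>
#include <numeric>

constexpr std::uint64_t PLTEntrySize = 24; // from the commit message
constexpr std::uint64_t PageSize = 4096;   // 4 KiB pages
constexpr std::uint64_t ShiftBytes = 4;    // one AArch64 instruction

// The GCD of the entry size and either page size is 8, so an ADRP at the
// start of an 8-byte-aligned entry always sits at a multiple of 8 in its page.
static_assert(std::gcd(PLTEntrySize, PageSize) == 8);
static_assert(std::gcd(PLTEntrySize, std::uint64_t{16384}) == 8);

// True if moving an instruction from Addr to Addr + ShiftBytes keeps it
// within the same page.
constexpr bool staysInPage(std::uint64_t Addr) {
  return Addr / PageSize == (Addr + ShiftBytes) / PageSize;
}

// Checks Count consecutive PLT entries starting at FirstEntry.
constexpr bool allEntriesSafe(std::uint64_t FirstEntry, std::uint64_t Count) {
  for (std::uint64_t I = 0; I < Count; ++I)
    if (!staysInPage(FirstEntry + I * PLTEntrySize))
      return false;
  return true;
}
```

Because the largest in-page offset that is a multiple of 8 is 4088, adding 4 bytes never reaches the next page boundary.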
Gergely Bálint
4193c404ca
[BOLT][BTI] Disassemble PLT entries when processing BTI binaries (#169663)
PLT entries are PseudoFunctions, and are not disassembled or emitted.
For BTI, we need to check the first MCInst of PLT entries, to see
if indirectly calling them is safe or not.

This patch disassembles PLTs for binaries using BTI, while not changing
the behaviour for binaries without BTI.

The PLTs are only disassembled, not emitted.

---------

Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>
2026-01-16 07:40:05 +01:00
Austin Jiang
e6cdfb75ac
Fix typos and spelling errors across codebase (#156270)
Corrected various spelling mistakes such as 'occurred', 'receiver',
'initialized', 'length', and others in comments, variable names,
function names, and documentation throughout the project. These
changes improve code readability and maintain consistency in naming
and documentation.

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
2026-01-13 11:52:46 -05:00
Harald van Dijk
e720636120
[BOLT] Avoid UB due to misaligned access. (#174990)
There is no guarantee that PatchOffset is suitably aligned for uint32_t,
and in BOLT's own tests, it is not aligned for uint32_t.

Fixes test failures seen with LLVM_USE_SANITIZER=Undefined.
2026-01-10 03:45:13 +00:00
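A common way to avoid the kind of misaligned access described in #174990 is to copy bytes instead of dereferencing a cast pointer. The following is a generic sketch and not necessarily the approach taken in the patch; `readU32` and `writeU32` are hypothetical helpers.

```cpp
#include <cstdint>
#include <cstring>

// Read a 32-bit value from a possibly unaligned position.
// Dereferencing reinterpret_cast<const std::uint32_t *>(Ptr) would be
// undefined if Ptr is not suitably aligned; memcpy has no such requirement.
std::uint32_t readU32(const unsigned char *Ptr) {
  std::uint32_t Value;
  std::memcpy(&Value, Ptr, sizeof(Value));
  return Value; // host byte order
}

// Write a 32-bit value to a possibly unaligned position.
void writeU32(unsigned char *Ptr, std::uint32_t Value) {
  std::memcpy(Ptr, &Value, sizeof(Value));
}
```

Compilers typically lower these fixed-size copies to a single load or store on targets that allow unaligned access.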
Harald van Dijk
e42f862042
[BOLT][AArch64] Avoid UB due to shift of negative value. (#174994)
A build with LLVM_USE_SANITIZER=Undefined showed:

  bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp:2277:60:
  runtime error: left shift of negative value -32768

This showed up in bolt/test/AArch64/veneer-lite-mode.s.

It is valid for ADRP's operand to be negative, and not valid to shift it
like that. To perform this shift reliably, cast the value to unsigned.
2026-01-08 17:06:40 +00:00
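The cast described in #174994 can be sketched as below. The shift amount of 12 (4 KiB pages) and the function name are assumptions of this example; the commit message only states that the ADRP operand is shifted.

```cpp
#include <cstdint>

// Scale a signed page count to a byte offset.
// Left-shifting a negative signed value is undefined before C++20, so the
// shift is performed on the unsigned representation and converted back.
// The conversion back to signed is modular on mainstream compilers and is
// guaranteed to be so from C++20 on.
std::int64_t pageCountToByteOffset(std::int64_t Pages) {
  const std::uint64_t Shifted = static_cast<std::uint64_t>(Pages) << 12;
  return static_cast<std::int64_t>(Shifted);
}
```

For the value from the sanitizer report, `pageCountToByteOffset(-32768)` yields -134217728, the same result the signed shift was meant to produce.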
Gergely Bálint
d6c22d4b0e
[BOLT][BTI] Disallow instrumenting BTI binaries (#174936)
Until instrumentation support is added, the feature should be
disabled for BTI binaries. An error message is added to explain
the situation.
Meanwhile, users can choose sampling-based profiling methods.

Added a TODO comment explaining missing steps.
2026-01-08 12:42:20 +01:00
Haibo Jiang
8fc3f6ddb8
[BOLT] Add option instrumentation-max-size for bump allocator (#174716)
While the current max memory size is sufficient for most binaries, a few
binaries may encounter insufficient allocated memory space.

Allow specifying the max memory size of the instrumentation bump allocator.
2026-01-08 11:53:20 +03:00
Gergely Bálint
76c300c8c7
[BOLT][BTI] Fix assertions checking getNumOperands (#174600)
Several BTI-related functions are checking that a call MCInst has one
non-annotation operand.

This patch changes these checks to use MCPlus::getNumPrimeOperands,
instead of getNumOperands.

Testing: added annotations to existing gtests to serve as regression
tests. These now also explicitly check getNumOperands and getNumPrimeOperands
usage on the annotated MCInsts.
2026-01-07 10:54:00 +01:00
Maksim Panchenko
3e840d2957
[BOLT] Remove unnecessary dependency. NFC (#174645)
There's no need for a full definition of `BinaryBasicBlock` in
`MCPlusBuilder.h`. Use `InstructionListType::iterator` instead of
`BinaryBasicBlock::iterator` in `findMemcpySizeInBytes()`.
2026-01-06 13:30:28 -08:00
Victor Chernyakin
c438773432
[LLVM][ADT] Migrate users of make_scope_exit to CTAD (#174030)
This is a followup to #173131, which introduced the CTAD functionality.
2026-01-02 20:42:56 -08:00
Anatoly Trosinenko
2469d39a75
[BOLT] Overhaul the comments in PAuthGadgetScanner for readability (NFC) (#169801)
Update the comments in PAuthGadgetScanner.cpp to better describe the
current version of the code. Along the way, shorten identifier names
that are redundant taking their context into account:
`RegsToTrackInstsFor` (made `RegsToTrack`) and `getNumTrackedRegisters`
(made `getNumRegisters`).

Co-authored-by: Kristof Beyls <kristof.beyls@arm.com>
2025-12-29 14:10:11 +03:00
Amir Ayupov
6f4ddf920b
[BOLT][NFC] Split up StaleProfileMatching::matchWeights (#165492)
Simplify matchWeights in preparation for pseudo probe matching
(#100446).

Test Plan: NFC
2025-12-23 18:55:03 -08:00
Anatoly Trosinenko
96ee7d2c37
[ADT] Make use of subsetOf and anyCommon methods of BitVector (NFC) (#170876)
Replace the code along these lines

    BitVector Tmp = LHS;
    Tmp &= RHS;
    return Tmp.any();

and

    BitVector Tmp = LHS;
    Tmp.reset(RHS);
    return Tmp.none();

with `LHS.anyCommon(RHS)` and `LHS.subsetOf(RHS)`, correspondingly,
which do not require creating temporary BitVector and can return early.
2025-12-23 11:57:35 +03:00
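The benefit named in #170876, no temporary and an early return, can be seen in a word-wise sketch. `Words` below is a simplified stand-in and not `llvm::BitVector`; only the method names `anyCommon` and `subsetOf` come from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a bit vector: a sequence of 64-bit words.
using Words = std::vector<std::uint64_t>;

// True if LHS and RHS share at least one set bit.
// Returns at the first word with a common bit, without building LHS & RHS.
bool anyCommon(const Words &LHS, const Words &RHS) {
  const std::size_t N = LHS.size() < RHS.size() ? LHS.size() : RHS.size();
  for (std::size_t I = 0; I < N; ++I)
    if (LHS[I] & RHS[I])
      return true;
  return false;
}

// True if every bit set in LHS is also set in RHS.
// Words missing from a shorter RHS are treated as all zeros.
bool subsetOf(const Words &LHS, const Words &RHS) {
  for (std::size_t I = 0; I < LHS.size(); ++I) {
    const std::uint64_t Other = I < RHS.size() ? RHS[I] : 0;
    if (LHS[I] & ~Other)
      return false;
  }
  return true;
}
```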
Amir Ayupov
8c010dea55
[BOLT] Lookup top-level inline tree node in YAMLProfileWriter (#165491)
Top-level (binary) functions don't have a unique GUID mapping, for various
reasons, notably coroutine fragments sharing the same parent source
function GUID.

Replace the top-level inline tree node GUID lookup with a probe lookup
coupled with a walk up the inline tree.

Test Plan: added test-coro-probes.yaml
2025-12-22 14:47:21 -08:00
Gergely Bálint
674f308324
[BOLT][BTI] Add needed BTIs in LongJmp or refuse to optimize binary (#171149)
This patch adds BTI landing pads to ShortJmp/LongJmp targets in the
LongJmp pass when optimizing BTI binaries.

BOLT does not have the ability to add BTI to all types of functions.
This patch aims to insert the landing pad where possible, and emit an
error and exit where it currently is not.

BOLT cannot insert BTIs into several function "types", including:
- ignored functions,
- PLT functions,
- other functions without a CFG.

Additional context:

In #161206, BOLT gained the ability to decode the .note.gnu.property
section, and warn about lack of BTI support for BOLT. However, this
warning is misleading: the emitted binary may not need extra BTI landing
pads.

With this patch, the emitted binary will be "BTI-safe".
2025-12-22 12:24:42 +01:00
Gergely Bálint
24297bea96
[BOLT][BTI] Refactor BTI helpers (#173000)
- Add an enum to encode BTI variants in function arguments.
- Remove updateBTIVariant as createBTI can be used for the same
purpose.
- Remove a test case that checked against invalid BTI variants, as
those are now unrepresentable.
2025-12-22 10:11:41 +01:00