2838 Commits

Author SHA1 Message Date
Amir Ayupov
a8cf1a0352
[BOLT] Allow empty buildid in pre-aggregated profile addresses (#190675)
Allow `parseString()` to return an empty `StringRef` when the delimiter
appears at position 0. This enables parsing pre-aggregated profile
addresses with an omitted buildid but preserved colon (`:addr` format),
where the empty buildid corresponds to the main binary.

Previously, `parseString()` rejected zero-length fields by treating
`StringEnd == 0` the same as `StringRef::npos` (delimiter not found).
These are distinct situations: `npos` means no delimiter exists, while
`0` means the field before the delimiter is empty. The fix removes the
`StringEnd == 0` sub-condition so only the missing-delimiter case
errors.
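
A minimal sketch of the distinction, assuming a simplified stand-in for the field parser. `parseField` is a hypothetical name and uses `std::string_view` rather than LLVM's `StringRef`; it is not BOLT's actual `parseString()`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: returns the text before the first delimiter, or
// std::nullopt when the input contains no delimiter at all.
std::optional<std::string_view> parseField(std::string_view Input,
                                           char Delimiter) {
  const std::size_t FieldEnd = Input.find(Delimiter);
  // Only a missing delimiter is an error. FieldEnd == 0 is a valid, empty
  // field, such as the omitted buildid in ":addr".
  if (FieldEnd == std::string_view::npos)
    return std::nullopt;
  return Input.substr(0, FieldEnd);
}

void usageExample() {
  assert(parseField(":4005d0", ':')->empty());           // empty buildid
  assert(*parseField("abc123:4005d0", ':') == "abc123"); // explicit buildid
  assert(!parseField("4005d0", ':'));                    // no delimiter
}
```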

The existing test for buildid-prefixed addresses is extended to also
verify that `:addr` input produces identical output to the plain-address
and non-empty-buildid variants.

Test Plan:
Added empty-buildid input file and extended
`pre-aggregated-perf-buildid.test` to run perf2bolt with `:addr` format
and diff the fdata output against the existing buildid-prefixed result.
2026-04-06 14:41:21 -07:00
Yashwant Singh
5e14916fa6
Early exit llvm-bolt when coming across empty data files (#176859)
perf2bolt generates empty fdata files for small binaries. Right now
BOLT detects this only while parsing, by checking `!hasBranchData() &&
!hasMemData()`. Instead, exit early with an error message as soon as the
buffer finishes reading the data file.
2026-04-06 09:37:05 +05:30
Brian Cain
98ced6cfd0
[BOLT] Template patchELFPHDRTable and rewriteNoteSections for ELF32 (#189715)
Template patchELFPHDRTable, rewriteNoteSections, markGnuRelroSections,
and discoverStorage to support both ELF32LE and ELF64LE binaries.
Previously these functions were hardcoded for ELF64LE, causing crashes
when processing 32-bit ELF binaries.

The RewriteInstance constructor now accepts ELF32LE objects in addition
to ELF64LE. The ELF_FUNCTION macro is reused (and moved earlier in the
header) to dispatch to the correct template instantiation.

These changes are preparation for adding Hexagon architecture support
to BOLT.
2026-04-03 15:16:31 -05:00
Rafael Auler
7da3a66c06
[BOLT] Check for write errors before keeping output file (#190359)
Summary:
When the disk runs out of space during output file writing, BOLT would
crash with SIGSEGV/SIGABRT because raw_fd_ostream silently records write
errors and only reports them via abort() in its destructor. This made it
difficult to distinguish real BOLT bugs from infrastructure issues in
production monitoring.

Add an explicit error check on the output stream before calling
Out->keep(), so BOLT exits cleanly with exit code 1 and a clear error
message instead.
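
The idea can be sketched with a standard-library stream standing in for `raw_fd_ostream`. `outputIsIntact` is a hypothetical helper; the actual patch checks BOLT's own output stream before calling `Out->keep()`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical helper: flush pending data and report whether every write so
// far succeeded. A caller would keep the output file only when this returns
// true, and otherwise print an error and exit with code 1.
bool outputIsIntact(std::ostream &OS) {
  OS.flush();
  return !OS.fail(); // fail() is true when failbit or badbit is set
}

void usageExample() {
  std::ostringstream Good;
  Good << "payload";
  assert(outputIsIntact(Good));

  std::ostringstream Broken;
  Broken.setstate(std::ios::badbit); // simulate a failed write
  assert(!outputIsIntact(Broken));
}
```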

Test: manually verified with a full filesystem that BOLT now prints
"BOLT-ERROR: failed to write output file: No space left on device" and
exits with code 1.
2026-04-03 10:02:36 -07:00
Harald van Dijk
7c1d91c435
[BOLT] Move extern "C" out of unnamed namespace (#190282)
GCC 15 changes how it interprets extern "C" in unnamed namespaces and
gives the variable internal linkage.
2026-04-03 09:51:55 +01:00
Alexandros Lamprineas
64b728128d
[BOLT][AArch64] Add minimal support for liveness analysis. (#183298)
In this patch I am adding the missing target hooks required for the
liveness analysis to run on AArch64. These are
 - getFlagsReg()
 - getRegsUsedAsParams()
 - getDefaultLiveOut()
 - getGPRegs()
 - isCleanRegXOR()

I am also introducing the following API in LivenessAnalysis
 - BitVector getLiveIn/Out(const MCInst &)
 - MCPhysReg scavengeRegFromState(BitVector &)
 
My intention is to allow the LongJmp pass to scavenge usable registers
when injecting code.
2026-04-02 11:59:59 +01:00
Alexandros Lamprineas
4c9a739c5e
[BOLT][AArch64] Strip unneeded labels from FEAT_CMPBR tests. (#189931)
Eliminates the temporary labels so that BOLT does not recognize them as
secondary entry points.
2026-04-02 10:16:41 +01:00
wangjue
8c2feea2f7
[BOLT] Delete unnecessary instructions (#189297)
2026-04-02 06:48:38 +03:00
Alexandros Lamprineas
abc0674f83
[BOLT][AArch64] Handle irreversible branches in compact-code-model (#186850)
When the compact-code-model is used, LongJmpPass::relaxLocalBranches
calls reverseBranchCondition without first calling isReversibleBranch,
resulting in a runtime error. With this patch I am adding an additional
trampoline to handle irreversible FEAT_CMPBR branches.

In the future the plan is to use liveness analysis and replace the
irreversible branch with compare followed by branch (see #185731) as
long as the condition flags are dead, or emit the additional trampoline
otherwise.
2026-03-27 13:41:58 +00:00
Amir Ayupov
2fafeb0509
[BOLT] Support buildid in pre-aggregated profile (#186931)
Sample addresses belonging to external DSOs (buildid doesn't match the
current file) are treated as external (0).

Buildid for the main binary is expected to be omitted.

Test Plan:
added pre-aggregated-perf-buildid.test
2026-03-24 15:15:08 -07:00
Amir Ayupov
2e247a1d54
Revert "[BOLT] Support buildid in pre-aggregated profile"
Accidentally pushed unreviewed version.

This reverts commit fce6895804e596f18765c4db0f76931dac8df9f8.
2026-03-24 15:13:14 -07:00
Amir Ayupov
fce6895804
[BOLT] Support buildid in pre-aggregated profile
Sample addresses belonging to external DSOs (buildid doesn't match the
current file) are treated as external (0).

Buildid for the main binary is expected to be omitted.

Test Plan: added pre-aggregated-perf-buildid.test

Reviewers:
paschalis-mpeis, maksfb, yavtuk, ayermolo, yozhu, rafaelauler, yota9

Reviewed By: paschalis-mpeis

Pull Request: https://github.com/llvm/llvm-project/pull/186931
2026-03-24 15:05:33 -07:00
Amir Ayupov
31b17c4789
[BOLT] Add profile format documentation (#186685)
Create bolt/docs/profiles.md documenting all accepted profile formats:
perf.data, fdata, YAML, and pre-aggregated. Covers collection methods,
format syntax, examples, and known limitations.

Add reference from bolt/docs/index.rst.
2026-03-24 23:04:52 +01:00
Fangrui Song
d1b9b4c548
[MC] Remove unused NoExecStack parameter from MCStreamer::initSections. NFC (#188184)
Unused after commit 34bc5d580b73c0ca79653bb03e5c50419be2c634
2026-03-24 07:42:09 +00:00
Ádám Kallai
733bc3409b
[BOLT][Perf2bolt] Add support to generate pre-parsed perf data (#171144)
Adding a generator to perf2bolt is the initial step toward supporting
large end-to-end tests for Arm SPE. This functionality provides a unified
pre-parsed profile format that perf2bolt is able to consume.

Why does the test need to have a textual format SPE profile?

* Collecting an Arm SPE profile with Linux perf requires an Arm
development device with SPE support.
* Decoding SPE data also requires a suitable version of Linux perf.
* The minimum required version of Linux perf is v6.15.

To bypass these technical difficulties, it is easier to provide
a pre-generated textual profile.

The generator relies on the aggregator to spawn the required
perf-script jobs based on the aggregation type, and merges the
results of the perf-script jobs into a single file.
This hybrid profile contains all events required for the aggregation,
such as BuildID, MMAP, TASK, BRSTACK, or MEM events.

The two examples below show how to generate pre-parsed perf data.
As input for Arm SPE aggregation:

`perf2bolt -p perf.data BINARY -o perf.text --spe
--generate-perf-script`

Or for basic aggregation:

`perf2bolt -p perf.data BINARY -o perf.text --ba --generate-perf-script`
2026-03-23 12:03:52 +01:00
Shanzhi Chen
de514fbaba
[BOLT] Remove some unused code (NFC) (#183880)
Remove some unused code in BOLT:
- `RewriteInstance::linkRuntime` is declared but not defined
- `BranchContext` typedef is never used
- `FuncBranchData::getBranch` is defined but never used
- `FuncBranchData::getDirectCallBranch` is defined but never used
2026-03-23 09:13:00 +00:00
YongKang Zhu
b7d97d9e8d
[BOLT] Remove outdated assertion from local symtab update logic (#187409)
The assert condition (function is not split or split
into less than three fragments) is not always true now
that we will emit more local symbols due to #184074.
2026-03-21 13:15:49 -07:00
Vasily Leonenko
51fd033521
[BOLT] Enable compatibility of instrumentation-file-append-pid with instrumentation-sleep-time (#183919)
This commit makes the instrumentation-file-append-pid and
instrumentation-sleep-time options compatible. Doing so requires keeping
the counters mapping between the watcher process and the instrumented
binary process in shared mode. This is useful when we instrument a shared
library that is used by several tasks running on the target system. When
we cannot wait for every task to complete, we must use the sleep-time
option. Without the append-pid option, profiles collected from different
tasks would overwrite each other at the same path, leading to
unexpected or suboptimal optimization effects.

Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
2026-03-18 09:14:03 +03:00
YongKang Zhu
037c2095e6
Add hybrid function ordering support (#186003)
Allow `--function-order` to be combined with `--reorder-functions`
algorithms. Functions listed in the order file are pinned first
(indices 0..N-1), then the selected algorithm orders remaining
functions starting at index N.
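
The index assignment described above can be sketched as follows. `assignHybridIndices` and its inputs are hypothetical: the first vector models the order file, the second models the order produced by the selected algorithm over all functions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of hybrid ordering: pinned functions receive indices
// 0..N-1, then the algorithm's order fills in the rest starting at N.
std::map<std::string, std::size_t>
assignHybridIndices(const std::vector<std::string> &Pinned,
                    const std::vector<std::string> &AlgorithmOrder) {
  std::map<std::string, std::size_t> Index;
  std::size_t Next = 0;
  for (const std::string &Name : Pinned)
    if (Index.emplace(Name, Next).second) // ignore duplicate entries
      ++Next;
  for (const std::string &Name : AlgorithmOrder)
    if (Index.emplace(Name, Next).second) // skip already pinned functions
      ++Next;
  return Index;
}

void usageExample() {
  const auto Index = assignHybridIndices({"c", "a"}, {"a", "b", "c", "d"});
  assert(Index.at("c") == 0 && Index.at("a") == 1); // pinned first
  assert(Index.at("b") == 2 && Index.at("d") == 3); // algorithm order after
}
```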
2026-03-17 11:12:54 -07:00
Anatoly Trosinenko
481da949a4
[BOLT] Gadget scanner: implement finer-grained --scanners=... argument (#176135)
Add separate options to enable each of the available gadget detectors.
Furthermore, add two meta-options enabling all PtrAuth scanners and all
available scanners of any type (which is only PtrAuth for now, though).

This commit renames `pacret` option to `ptrauth-pac-ret` and `pauth` to
`ptrauth-all`.
2026-03-13 15:03:25 +00:00
Alexandros Lamprineas
3a8eabeb3a
[BOLT][AArch64] Support block reordering beyond 1KB for FEAT_CMPBR. (#185443)
Currently LongJmpPass::relaxLocalBranches bails early if the estimated
size of a binary function is less than 32KB assuming that the shortest
branches are 16 bits. Therefore the fixup value for the cold branch
target may go out of range if the function is larger than 1KB.

I am decreasing ShortestJumpSpan from 32KB to 1KB, since FEAT_CMPBR
branches are 11 bits.
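
The arithmetic behind both thresholds, as a small sketch. It assumes the bit counts above describe a signed PC-relative byte offset, so one bit is the sign; `jumpSpan` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Reach in bytes of a signed byte offset that is OffsetBits wide:
// 2^(OffsetBits - 1).
constexpr std::uint64_t jumpSpan(unsigned OffsetBits) {
  assert(OffsetBits >= 1 && OffsetBits <= 64);
  return std::uint64_t{1} << (OffsetBits - 1);
}

static_assert(jumpSpan(16) == 32 * 1024, "16-bit offset reaches 32KB");
static_assert(jumpSpan(11) == 1024, "11-bit offset reaches 1KB");
```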
2026-03-12 11:01:15 +00:00
Ádám Kallai
fd225e296f
[BOLT] Spawn buildid-list perf job at perf2bolt start. NFC (#185865)
Launch this perf job with the others at the beginning of the aggregation
process.

Extracting buildid-list from perf data is not a costly process, so it
can be performed by default. This provides a distinct advantage when
this dataset is required in other perf2bolt stages as well.

Please see PR #171144.
2026-03-12 10:24:09 +01:00
Amina Chabane
498906f2df
[BOLT] Error out on SHF_COMPRESSED debug sections (#185662)
Some binaries are built using `-gz=zstd`, but when using
`--update-debug-sections` on said binaries BOLT crashes.

This patch fixes this issue by recognising compressed debug sections in
binaries via their flag `SHF_COMPRESSED` and appropriately erroring out.
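
A minimal sketch of the flag test, with the `SHF_COMPRESSED` value from the ELF gABI redefined locally so it compiles without `<elf.h>`. `isCompressedSection` is a hypothetical helper, not the function added by the patch.

```cpp
#include <cassert>
#include <cstdint>

// SHF_COMPRESSED as defined by the ELF gABI.
constexpr std::uint64_t ShfCompressed = 0x800;

// Hypothetical helper: true when sh_flags marks the section as compressed.
constexpr bool isCompressedSection(std::uint64_t SectionFlags) {
  return (SectionFlags & ShfCompressed) != 0;
}

void usageExample() {
  constexpr std::uint64_t ShfAlloc = 0x2; // SHF_ALLOC
  assert(!isCompressedSection(ShfAlloc));
  assert(isCompressedSection(ShfAlloc | ShfCompressed));
}
```

Legacy GNU-style sections are conventionally identified by a `.zdebug` name prefix rather than by this flag, so a check like this would not see them.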

Legacy GNU-style compression is not handled.
2026-03-10 10:18:12 -07:00
Fangrui Song
c889454f1d
[MC] Rename PrivateGlobalPrefix to InternalSymbolPrefix. NFC (#185164)
The "private global" terminology, which likely came from
llvm/lib/IR/Mangler.cpp, is misleading: "private" is the opposite of
"global", and these prefixed symbols are not global in the object file
format sense (e.g. ELF has STB_GLOBAL while these symbols are always
STB_LOCAL). The term "internal symbol" better describes their purpose:
symbols for internal use by compilers and assemblers, not meant to be
visible externally.

This rename is a step toward adopting the "internal symbol prefix"
terminology agreed with GNU as
(https://sourceware.org/pipermail/binutils/2026-March/148448.html).
2026-03-10 01:03:27 -07:00
Haibo Jiang
2c2126603c
[BOLT] Speed up dataflow analysis with RPO (#183704)
2026-03-10 08:46:01 +08:00
Nikita Popov
1f84bdeac2
[BOLT] Fix test with -DCLANG_DEFAULT_PIE_ON_LINUX=OFF (#185047)
Use `%cxxflags`, so that `-fPIE -pie` get passed in order to ensure the
test behavior is the same regardless of cmake configuration. We do
similar in many other BOLT tests.
2026-03-09 14:21:00 +01:00
Keith Smiley
f540ad69a8
[bolt][NFC] Remove unused ReorderUtils.h (#184642)
This header has a case-sensitivity syntax error. Delete it since it's
unused.
2026-03-06 16:23:30 -08:00
Asher Dobrescu
7bce678ec1
[BOLT] Check if symbol is in data area of function (#160143)
There are cases in which `getEntryIDForSymbol` is called with a Symbol
that is in a constant island, so BOLT cannot find its function. This
causes BOLT to reach `llvm_unreachable("symbol not found")` and crash.
This patch adds a check that avoids this crash.
2026-03-06 10:37:54 +00:00
YongKang Zhu
95685ca52e
[BOLT] Retain certain local symbols (#184074)
BOLT currently strips all STT_NOTYPE STB_LOCAL zero-sized symbols
that fall inside function bodies. Certain such symbols are named
labels (loop markers and subroutine entry points) or local function
symbols in hand-written assembly. We now keep them in the local symbol
table of BOLT-processed binaries for better symbolication.
2026-03-05 00:34:36 -08:00
YongKang Zhu
14bcb1a009
[BOLT] Make sure IOAddressMap exists before lookup (NFC) (#183184)
`BinaryFunction::translateInputToOutputAddress()` contains fallback
logic in case that querying `IOAddressMap` doesn't yield an output
address. Because this function could be called in scenarios where
`IOAddressMap` won't be set up, we should check if the map actually
exists before lookup.
2026-03-01 23:27:39 -08:00
YongKang Zhu
b4b32e88dd
[BOLT][instr] Disable stderr diagnostic output when targeting Android (#183185)
Disable all stderr diagnostic output on Android since there is typically
no terminal on which to read diagnostic messages. The `noinline`
annotation keeps the same inlining decisions before and after this change.
On AArch64 the `.text` section of the instr runtime library is now
~4.8 KB smaller.
2026-03-01 23:26:48 -08:00
YongKang Zhu
3270bbf04c
[BOLT][instr] Make instrumentation counter reset thread safe (#183186)
Use `GlobalWriteProfileMutex` to synchronize data reset with dump.
To synchronize static counter reset with increment, use an atomic store
in the reset: the counter increment sequence inserted within user
code already takes care of thread safety, so we only need to make sure
the counter reset code is also thread safe (no torn writes to a counter).
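
The two synchronization layers can be sketched with standard-library types for readability. Only the name `GlobalWriteProfileMutex` comes from the commit message; the counter array and both functions are hypothetical, and the real runtime library may not use `std::mutex` or `std::atomic` at all.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

std::mutex GlobalWriteProfileMutex;                   // guards dump and reset
std::array<std::atomic<std::uint64_t>, 4> Counters{}; // hypothetical counters

// Models the increment sequence inserted into user code, which is already
// thread safe.
void incrementCounter(std::size_t Id) {
  assert(Id < Counters.size());
  Counters[Id].fetch_add(1, std::memory_order_relaxed);
}

// Holding the mutex keeps a reset from interleaving with a profile dump.
// Atomic stores keep a concurrent increment from observing a torn write.
void resetCounters() {
  std::lock_guard<std::mutex> Lock(GlobalWriteProfileMutex);
  for (auto &Counter : Counters)
    Counter.store(0, std::memory_order_relaxed);
}
```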
2026-03-01 23:26:08 -08:00
Fangrui Song
bed89970c3
AArch64: Replace @plt/%gotpcrel in data directives with %pltpcrel %gotpcrel (#155776)
Similar to #132569 for RISC-V, replace the unofficial `@plt` and
`@gotpcrel` relocation specifiers, currently only used by clang
-fexperimental-relative-c++-abi-vtables, with %pltpcrel %gotpcrel. The
syntax is not used in human-written assembly code and is not supported
by the GNU assembler.

Also replace the recent `@funcinit` with `%funcinit(x)`.
2026-02-28 05:37:59 +00:00
Alexandros Lamprineas
a71ded3861
[BOLT][AArch64] Add a unittest for compare-and-branch inversion. (#181177)
Checks that isReversibleBranch() returns false
 - when the immediate value is 63 and needs +1 adjustment
 - when the immediate value is 0 and needs -1 adjustment

Checks that reverseBranchCondition() adjusts
 - the opcode
 - the immediate operand if necessary (+/-1)
 - the register operands if necessary (swap)
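
A simplified model of the immediate adjustment these checks exercise. It assumes a reduced condition set and an encodable immediate range of 0..63, and it leaves out the register forms (operand swap). `Cond`, `CmpBranch`, and `reverseCondition` are hypothetical names, not the MCPlusBuilder API.

```cpp
#include <cassert>
#include <optional>

enum class Cond { EQ, NE, GT, LT }; // reduced condition set (assumption)

struct CmpBranch {
  Cond CC;
  int Imm; // assumed encodable range: 0..63
};

constexpr int MinImm = 0, MaxImm = 63;

// Returns the inverted branch, or std::nullopt when the adjusted immediate
// would leave the encodable range (the irreversible case).
std::optional<CmpBranch> reverseCondition(CmpBranch B) {
  assert(B.Imm >= MinImm && B.Imm <= MaxImm);
  switch (B.CC) {
  case Cond::EQ:
    return CmpBranch{Cond::NE, B.Imm};
  case Cond::NE:
    return CmpBranch{Cond::EQ, B.Imm};
  case Cond::GT: // !(x > imm) is x < imm + 1
    if (B.Imm == MaxImm)
      return std::nullopt;
    return CmpBranch{Cond::LT, B.Imm + 1};
  case Cond::LT: // !(x < imm) is x > imm - 1
    if (B.Imm == MinImm)
      return std::nullopt;
    return CmpBranch{Cond::GT, B.Imm - 1};
  }
  return std::nullopt; // not reached for valid Cond values
}
```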
2026-02-27 21:09:16 +00:00
YongKang Zhu
143664fcd3
[BOLT][merge-fdata] Skip truncated lines in raw profile data (#183187)
A raw profile data file may contain lines truncated by an unexpected
app exit. This change makes merge-fdata check the number of fields
in each line of the raw profile data file and ignore any line whose
field count is not the expected one.
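
A minimal sketch of such a field-count filter, assuming whitespace-separated fields. Both helper names are hypothetical, and the expected count is a parameter because the commit message does not state it.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Counts whitespace-separated fields in one profile line.
std::size_t countFields(const std::string &Line) {
  std::istringstream Stream(Line);
  std::string Field;
  std::size_t Count = 0;
  while (Stream >> Field)
    ++Count;
  return Count;
}

// Hypothetical filter: a line is kept only when it has exactly the
// expected number of fields.
bool isCompleteLine(const std::string &Line, std::size_t ExpectedFields) {
  assert(ExpectedFields > 0);
  return countFields(Line) == ExpectedFields;
}
```

A check like this catches lines cut off between fields. A line cut off in the middle of its last field still has the right count and would pass.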
2026-02-25 21:42:30 -08:00
Gergely Bálint
9d762ad279
[BOLT][BTI] Patch ignored functions in place when targeting them with indirect branches (#177165)
When applying BTI fixups to indirect branch targets, ignored functions
are considered as a special case:
- these hold no instructions,
- have no CFG,
- and are not emitted in the new text section.

The solution is to patch the entry points in the original location.

If such a situation occurs in a binary, recompilation using the
-fpatchable-function-entry flag is required. This will place a nop at
all function starts, which BOLT can use to patch the original section.

Without the extra nop, BOLT cannot safely patch the original .text
section.

An alternative solution could be to also ignore the function from which
the stub starts. This has not been tried as LongJmp pass - where most
stubs are inserted - is currently not equipped to ignore functions.

Testing: both the success and failure cases are covered with lit tests.
2026-02-24 11:09:42 +01:00
Maksim Panchenko
7063b22c63
[BOLT] Always place new PT_LOAD after existing ones (#182642)
Insert new PT_LOAD segments right after the last existing PT_LOAD in the
program header table, instead of before PT_DYNAMIC or at the end. This
maintains the ascending p_vaddr order required by the ELF specification.
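
The placement rule can be sketched over a list of segment types. `newLoadSegmentIndex` is a hypothetical helper, and falling back to the end of the table when no PT_LOAD exists is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t PtLoad = 1;    // PT_LOAD in the ELF specification
constexpr std::uint32_t PtDynamic = 2; // PT_DYNAMIC

// Returns the program header index at which a new PT_LOAD goes: directly
// after the last existing PT_LOAD.
std::size_t newLoadSegmentIndex(const std::vector<std::uint32_t> &Types) {
  for (std::size_t I = Types.size(); I > 0; --I)
    if (Types[I - 1] == PtLoad)
      return I;
  return Types.size(); // no PT_LOAD found (assumed fallback)
}

void usageExample() {
  // A PT_LOAD that follows PT_DYNAMIC, the layout that used to break ordering.
  const std::vector<std::uint32_t> Types = {PtLoad, PtLoad, PtDynamic, PtLoad,
                                            PtDynamic};
  assert(newLoadSegmentIndex(Types) == 4);
}
```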

Previously, new segments could end up breaking PT_LOAD p_vaddr order
when PT_LOAD segments followed PT_DYNAMIC or PT_GNU_STACK. This led to
the runtime loader incorrectly assessing the dynamic object size and
silently corrupting memory.
2026-02-21 14:09:36 -08:00
Amir Ayupov
393adaac1d
[BOLT] Mark BOLTReserved segment executable (#181606)
Summary:
When the .bolt_reserved section is defined in the linker script, there's
no way to mark the containing segment executable other than via the PHDRS
command, which overrides the program headers entirely and is impractical.

Since .bolt_reserved contains executable code, mark segment executable
in BOLT.

Test Plan: bolt-reserved.test
2026-02-19 15:07:50 -08:00
Fangrui Song
6f0b0ecaba
[NFC] Ensure MCTargetOptions outlives MCAsmInfo at createMCAsmInfo call sites (#180465)
Preparatory change for storing the MCTargetOptions pointer in MCAsmInfo
(#180464)
2026-02-17 21:48:22 -08:00
Alexey Moksyakov
db19a57597
[bolt][nfc] Exclude Call id verification from instrument-ind-call test (#181655)
The instrument-ind-call test checks the correctness of the instrumented
snippet by the set of registers used. The call id value is
meaningless (platform dependent) and should be excluded from the test.
2026-02-16 17:08:13 +03:00
Alexey Moksyakov
0a3db57e51
[bolt][nfc] fix typo in test (#181611)
Fixed the typo in instrument-ind-call test
2026-02-16 11:54:02 +03:00
Alexey Moksyakov
12b561a5e2
[bolt][aarch64] Change indirect call instrumentation snippet (#180229)
The indirect call instrumentation snippet uses the x16 register in the
exit handler to branch to the destination target:

    __bolt_instr_ind_call_handler_func:
            msr  nzcv, x1
            ldp  x0, x1, [sp], #16
            ldr  x16, [sp], #16
            ldp  x0, x1, [sp], #16
            br   x16	<-----

This patch changes the instrumentation snippet to call the instrumentation
runtime library through an indirect call instruction, and adds a wrapper
that stores and loads the target value and the register used by the
original indirect call instruction.

Example:
            mov x16, foo

    indirectCall:
            adrp x8, Label
            add  x8, x8, #:lo12:Label
            blr x8

Before:

    Instrumented indirect call:
            stp     x0, x1, [sp, #-16]!
            mov     x0, x8
            movk    x1, #0x0, lsl #48
            movk    x1, #0x0, lsl #32
            movk    x1, #0x0, lsl #16
            movk    x1, #0x0
            stp     x0, x1, [sp, #-16]!
            adrp    x0, __bolt_instr_ind_call_handler_func
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x0

    __bolt_instr_ind_call_handler:  (exit snippet)
            msr     nzcv, x1
            ldp     x0, x1, [sp], #16
            ldr     x16, [sp], #16
            ldp     x0, x1, [sp], #16
            br      x16    <- overwrites the original value in X16

    __bolt_instr_ind_call_handler_func:  (entry snippet)
            stp     x0, x1, [sp, #-16]!
            mrs     x1, nzcv
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler


_________________________________________________________________________

After:

            mov     x16, foo
    indirectCall:
            adrp    x8, Label
            add     x8, x8, #:lo12:Label
            blr     x8

    Instrumented indirect call:
            stp     x0, x30, [sp, #-16]!
            mov     x0, callsiteid
            stp     x8, x0, [sp, #-16]!
            adrp    x8, __bolt_instr_ind_call_handler_func
            add     x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x8       <--- call trampoline instr lib
            ldr     x8, [sp], #16
            ldp     x0, x30, [sp], #16
            blr     x8       <--- original indirect call instruction

    // don't touch regs besides x0, x1
    __bolt_instr_ind_call_handler:  (exit snippet)
            ret     <---- return to original function with indirect call

    __bolt_instr_ind_call_handler_func: (entry snippet)
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler
2026-02-16 10:45:08 +03:00
Alexandros Lamprineas
0584699c11
[BOLT][AArch64] Support FEAT_CMPBR branch instructions. (#174972)
The Armv9.6-A compare-and-branch instructions use a short range 9-bit
immediate value. They do not have a corresponding relocation type in the
ABI. For now we only support them in compact code model, with
diagnostics added in the LongJmp pass to ensure this condition. Some
interesting edge cases we cover:
- function splitting works when the target is within or beyond the 1KB
  range of those instructions,
  - but doesn't work beyond the 128MB limit of the compact code model
- branch inversion works with block reordering so long as the immediate
  value adjustments remain in bounds
2026-02-12 15:49:00 +00:00
Gergely Bálint
f7c5316468
[BOLT][BTI] Refactor: move applyBTIFixup under MCPlusBuilder (#177164)
This patch moves the applyBTIFixup from LongJmp pass to MCPlusBuilder.
This refactor allows applyBTIFixup to be called from other passes
inserting indirect branches, such as:
- Hugify,
- PatchEntries.

As different passes have different information about their targets (e.g.
target BasicBlock, target Symbol, target Function), specialized versions
are created (applyBTIFixupToSymbol, applyBTIFixupToTarget), and each
calls applyBTIFixupCommon, which implements the original logic from before.

Names of related lit tests are updated to have the "bti" prefix.
2026-02-12 08:29:16 +01:00
Maksim Panchenko
5129b3c449
[BOLT] Make FoldedIntoFunction always point to root parent (#180855)
After ICF folds functions, FoldedIntoFunction may point to a function
that was also folded. Add a post-processing step at the end of ICF to
flatten all chains so FoldedIntoFunction always points to the ultimate
root parent (a function that is not itself folded).
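
The flattening step can be sketched on a plain index table. Here `FoldedInto[I]` names the function that `I` was folded into, or `NotFolded`; the table, the sentinel, and `flattenFoldChains` are hypothetical and assume the chains contain no cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr std::size_t NotFolded = std::numeric_limits<std::size_t>::max();

// Rewrites every entry so that it points at its ultimate root parent, that
// is, a function that is not itself folded.
void flattenFoldChains(std::vector<std::size_t> &FoldedInto) {
  for (std::size_t I = 0; I < FoldedInto.size(); ++I) {
    if (FoldedInto[I] == NotFolded)
      continue;
    std::size_t Root = FoldedInto[I];
    assert(Root < FoldedInto.size());
    while (FoldedInto[Root] != NotFolded) { // follow the chain to its end
      Root = FoldedInto[Root];
      assert(Root < FoldedInto.size());
    }
    FoldedInto[I] = Root;
  }
}
```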
2026-02-11 11:35:02 -08:00
Maksim Panchenko
f80e3b3d7e
[BOLT] Keep folded functions in BinaryFunctions map. NFC (#180392)
In relocation mode, keep folded functions in the BinaryFunctions map
instead of erasing them. Mark them as folded using setFolded() and skip
emitting them.
2026-02-10 14:56:26 -08:00
Devanshi
8e02d249ba
[Bolt] Replace -1ULL/-2ULL/-3ULL with std::numeric_limits in DataAggregator (#178597)
Replace instances of -1ULL, -2ULL, and -3ULL with std::numeric_limits in
Bolt DataAggregator Trace constants to address C4146 compiler warning.

Changes:
- BR_ONLY: -1ULL → std::numeric_limits<uint64_t>::max()
- FT_ONLY: -1ULL → std::numeric_limits<uint64_t>::max()
- FT_EXTERNAL_ORIGIN: -2ULL → std::numeric_limits<uint64_t>::max() - 1
- FT_EXTERNAL_RETURN: -3ULL → std::numeric_limits<uint64_t>::max() - 2
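
A short sketch showing that the new spellings keep the old values. The constant names follow the list above, declared here as free-standing constants for illustration. Unsigned arithmetic wraps modulo 2^64, so `0 - N` reproduces the old `-NULL` value without applying unary minus to an unsigned literal, which is what MSVC warning C4146 flags.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr std::uint64_t Max = std::numeric_limits<std::uint64_t>::max();

constexpr std::uint64_t BR_ONLY = Max;
constexpr std::uint64_t FT_ONLY = Max;
constexpr std::uint64_t FT_EXTERNAL_ORIGIN = Max - 1;
constexpr std::uint64_t FT_EXTERNAL_RETURN = Max - 2;

static_assert(BR_ONLY == std::uint64_t{0} - 1, "same value as -1ULL");
static_assert(FT_EXTERNAL_ORIGIN == std::uint64_t{0} - 2, "same as -2ULL");
static_assert(FT_EXTERNAL_RETURN == std::uint64_t{0} - 3, "same as -3ULL");

void usageExample() {
  assert(BR_ONLY == FT_ONLY); // both replace -1ULL
}
```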

Fixes part of #147439
2026-02-08 22:04:11 -08:00
Shanzhi Chen
e4674b85e9
[BOLT][NFC] Stop populating unnecessary samples into MemSamples (#179472)
Currently, many unnecessary samples are populated into MemSamples,
including zero-initialized samples and samples in which the PC address
is not contained in any BinaryFunction. But these samples are totally
skipped during processing and the whole MemSamples vector is cleared
immediately after processing. So, we could just stop populating these
samples into MemSamples, which would reduce maximum resident set size
when processing a large perf.data.
2026-02-08 19:27:55 -08:00
Maksim Panchenko
1e5493b1b8
[BOLT] Don't fold hot text mover functions in ICF (#180367)
Hot text mover functions are placed in special sections (e.g.,
.never_hugify) to avoid being placed on hot/huge pages. Folding them
with functions from other sections could defeat this purpose.

Add a check in ICF's isIdenticalWith() to prevent folding when either
function is a hot text mover.
2026-02-07 20:39:24 -08:00
YongKang Zhu
fc89b1c2d8
[BOLT] Get symbol for const island referenced across func by relocation (#178988)
When handling a relocation in one function that references code or
data defined in another function, we should check whether the relocation
target is a constant island and get the referenced symbol
accordingly in either case.
2026-02-02 16:05:40 -08:00