485 Commits

Author SHA1 Message Date
YongKang Zhu
5c4f506cca
[BOLT] Validate extra entry point by querying data marker symbols (#154611)
Look up marker symbols and decide whether candidate is really
extra entry point in `adjustFunctionBoundaries()`.
2025-08-20 14:18:56 -07:00
Maksim Panchenko
0d9b9d1eef
[BOLT] Keep X86 HLT instruction as a terminator in user mode (#154402)
This is a follow-up to #150963. X86 HLT instruction may appear in the
user-level code, in which case we should treat it as a terminator.
Handle it as a non-terminator in the Linux kernel mode.
2025-08-19 14:41:13 -07:00
Maksim Panchenko
1e0edb072a
[BOLT][AArch64] Compensate for missing code markers (#151060)
Code written in assembly can have missing code markers. In BOLT, we can
compensate by recognizing that a function entry point should start a
code sequence.

Seen such code in lua jit library.
2025-07-29 12:01:06 -07:00
Amir Ayupov
a850912de1
[BOLT] Require CFG in BAT mode (#150488)
`getFallthroughsInTrace` requires CFG for functions not covered by BAT,
even in BAT/fdata mode. BAT-covered functions go through special
handling in fdata (`BAT->getFallthroughsInTrace`) and YAML
(`DataAggregator::writeBATYAML`) modes.

Since all modes (BAT/no-BAT, YAML/fdata) now need disassembly/CFG
construction:
- drop special BAT/fdata handling that omitted disassembly/CFG in
  `RewriteInstance::run`, enabling *CFG for all non-BAT functions*,
- switch `getFallthroughsInTrace` to check if a function has CFG,
- which *allows emitting profile for non-simple functions* in all modes.

Previously, traces in non-simple functions were reported as invalid/
mismatching disassembled function contents. This change reduces the
number of such invalid traces and increases the number of profiled
functions. These functions may participate in function reordering via
call graph profile.

Test Plan: updated unclaimed-jt-entries.s
2025-07-25 13:54:37 +02:00
Maksim Panchenko
d5d94ba8bc
[BOLT] More refactoring of PHDR handling. NFC (#148932)
Replace ad-hoc adjustment of the program header count with info from the
new segment list.
2025-07-24 11:47:27 -07:00
Maksim Panchenko
218fd69261
[BOLT] Decouple new segment creation from PHDR rewrite. NFCI (#146111)
Refactor handling of PHDR table rewrite to make modifications easier.
2025-07-02 11:22:12 -07:00
Maksim Panchenko
ad7d675991
[BOLT] Refactor mapCodeSections(). NFC (#146434)
Factor out non-relocation specific code into a separate function.
2025-06-30 17:09:41 -07:00
Maksim Panchenko
fb24b4d46a
[BOLT] Push code to higher addresses under options (#146180)
When --hot-functions-at-end is used in combination with --use-old-text,
allocate code at the highest possible addresses withing old .text.

This feature is mostly useful for HHVM, where it is beneficial to have
hot static code placed as close as possible to jitted code.
2025-06-28 13:53:56 -07:00
Sterling-Augustine
23f1ba3ee4
Reapply "[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#… (#145959) (#146112)
Reapply "[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#…
(#145959)
    
This reapplies cbf781f0bdf2f680abbe784faedeefd6f84c246e, with fixes for
the shared-library build and the unconventional sanitizer-runtime build.

Original Description:

This is the culmination of a series of changes described in [1].
    
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
    
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
2025-06-27 11:05:49 -07:00
Maksim Panchenko
d00c83ef22
[BOLT] Skip creation of new segments (#146023)
When all section contents are updated in-place, we can skip creation of
new segment(s), save disk space, and free up low memory addresses.

Currently, this feature only works with --use-gnu-stack.
2025-06-27 09:12:08 -07:00
Sterling-Augustine
5d03e7a204
Revert "[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#… (#145959)
…145081)"

This reverts commit cbf781f0bdf2f680abbe784faedeefd6f84c246e.

Breaks a couple of buildbots.
2025-06-26 13:09:20 -07:00
Maksim Panchenko
4308292d1e
[BOLT] Refactor NewTextSegmentAddress handling (#145950)
Refactor the code for NewTextSegmentAddress to correctly point at the
true start of the segment when PHDR table is placed at the beginning. We
used to offset NewTextSegmentAddress by PHDR table plus cache line
alignment.

NFC for proper binaries. Some YAML binaries from our tests will diverge
due to bad segment address/offset alignment.
2025-06-26 12:09:11 -07:00
Sterling-Augustine
cbf781f0bd
[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#145081)
This is the culmination of a series of changes described in [1].
    
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
    
I am happy to put it in another location, or to structure it differently
if that makes sense. Some have suggested in BinaryFormat, but it is not
a great fit there. But if that makes more sense to the reviewers, I can
do that.
 
Another possibility would be to use pass-through headers to allow
clients who don't care to depend only on DebugInfo/DWARF. This would be
a much less invasive change, and perhaps easier for clients. But also a
system that hides details.

Either way, I'm open.

1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
2025-06-26 11:23:46 -07:00
Amir Ayupov
f0d32575a1
[BOLT][NFCI] Use FileSymbols for local symbol disambiguation (#89088)
Remove SymbolToFileName mapping from every local symbol to its
containing FILE symbol name, and reuse FileSymbols to disambiguate
local symbols instead.

Also removes the check for `ld-temp.o` file symbol which was added to
prevent LTO build mode from affecting the disambiguated name. This may
cause incompatibility when using the profile collected on a binary built
in a different mode than the input binary.

Addresses #90661.

Speeds up discover file objects by 5-10% for large binaries:
- binary with ~1.2M symbols: 12.6422s -> 12.0297s
- binary with ~4.5M symbols: 48.8851s -> 43.7315s
2025-06-20 14:29:32 -07:00
Amir Ayupov
4959e8a1da
[BOLT][NFCI] Use heuristic for matching split global functions (#90429)
This change speeds up fragment matching for large BOLTed binaries where
all fragments of global parent functions are put under `bolt-pseudo.o`
file symbol:
- before: iterating over symbols under `bolt-pseudo.o` only to fail
  to find a parent,
- after: bail out immediately and use a global parent by name.

Test Plan: NFC, updated register-fragments-bolt-symbols.s
2025-06-20 12:46:56 -07:00
Maksim Panchenko
b8d3efa189
[BOLT][Linux] Fix linux_banner lookup (#144962)
While detecting the Linux kernel version, look for `linux_banner` symbol
with local visibility if the global one was not found.

Fixes #144847
2025-06-19 23:09:34 -07:00
Maksim Panchenko
06f13f8684
[BOLT] Fix references in ignored functions in CFG state (#140678)
When we call setIgnored() on functions that already have CFG built,
these functions are not going to get emitted and we risk missing
external function references being updated.

To mitigate the potential issues, run scanExternalRefs() on such
functions to create patches/relocations.

Since scanExternalRefs() relies on function relocations, we have to
preserve relocations until the function is emitted. As a result, the
memory overhead without debug info update could reach up to 2%.
2025-06-02 12:33:54 -07:00
Maksim Panchenko
c9022a29b4
[BOLT][AArch64] Detect veneers with missing data markers (#142069)
The linker may omit data markers for long absolute veneers causing BOLT
to treat data as code. Detect such veneers and introduce data markers
artificially before BOLT's disassembler kicks in.
2025-05-29 19:24:34 -07:00
Kazu Hirata
d08833f23f
[BOLT] Remove unused local variables (NFC) (#140421)
While I'm at it, this patch removes GetExecutablePath, which becomes
unused after removing the sole use.
2025-05-17 17:43:29 -07:00
Amir Ayupov
9d5d715330
[BOLT][heatmap] Add synthetic hot text section (#139824)
In heatmap mode, report samples and utilization of the section(s)
between hot text markers `[__hot_start, __hot_end)`.

The intended use is with multi-way splitting where there are several
sections that contain "hot" code (e.g. `.text.warm` with CDSplit).

Addresses the comment on #139193

https://github.com/llvm/llvm-project/pull/139193#pullrequestreview-2835274682

Test Plan: updated heatmap-preagg.test
2025-05-14 09:47:14 -07:00
Amir Ayupov
0289ca09be
[BOLT] Print heatmap from perf2bolt (#139194)
Add perf2bolt `--heatmap` option to produce heatmaps during profile
aggregation.

Distinguish exclusive mode (`llvm-bolt-heatmap`) and optional mode 
(`perf2bolt --heatmap`), which impacts perf.data handling:
exclusive mode covers all addresses, whereas optional mode consumes
attached profile only covering function addresses.

Test Plan: updated per2bolt tests:
- pre-aggregated-perf.test: pre-aggregated data,
- bolt-address-translation-yaml.test: pre-aggregated + BOLTed input,
- perf_test.test: no-LBR perf data.
2025-05-13 13:23:18 -07:00
Amir Ayupov
7f4febde10
[BOLT][heatmap] Compute section utilization and partition score (#139193)
Heatmap groups samples into buckets of configurable size (`--block-size`
flag with 64 bytes as the default =X86 cache line size). Buckets are
mapped to containing sections; for buckets that cover multiple sections,
they are attributed to the first overlapping section. Buckets not mapped
to a section are reported as unmapped.

Heatmap reports **section hotness** which is a percentage of samples
attributed to the section.

Define **section utilization** as a percentage of buckets with non-zero
samples relative to the total number of section buckets.

Also define section **partition score** as a product of section hotness
(where total excludes unmapped buckets) and mapped utilization, ranging 
from 0 to 1 (higher is better).

The intended use of new metrics is with **production profile** collected
from BOLT-optimized binary. In this case the partition score of .text
(hot text if function splitting is enabled) reflects **optimization
profile** representativeness and the quality of hot-cold splitting.
Partition score of 1 means that all samples fall into hot text, and all
buckets (cache lines) in hot text are exercised, equivalent to perfect
hot-cold splitting.

Test Plan: updated heatmap-preagg.test
2025-05-13 13:20:13 -07:00
Kazu Hirata
074c420496
[BOLT] Use StringRef::starts_with (NFC) (#139437) 2025-05-11 07:13:02 -07:00
Kazu Hirata
d5b170c39b
[BOLT] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#139403) 2025-05-10 13:39:15 -07:00
Maksim Panchenko
7d6fda4fd3
[BOLT] Run PatchEntries pass before LongJmp (#137236)
With --force-patch option, every original function entry point is
overwritten with a trampoline to a new version of the function to
prevent the execution of the original code.

If the function size is too small for the trampoline code, we are forced
to bail out on rewriting the function. That presented a problem on
AArch64 due to LongJmp pass that assumed the presence of the new copy of
the function. If the new copy was not emitted it could have lead to a
relocation overflow.

Run PatchEntries pass before LongJmp and make the latter aware of the
functions that are not going to be emitted. Make --force-patch option
behavior on AArch64 consistent with other architectures.
2025-05-01 15:09:09 -07:00
YongKang Zhu
316a6ff3d0
[BOLT][RelVTable] Skip special handling on non virtual function pointer relocations (#137406)
Besides virtual function pointers vtable could contain other kinds of
entries like those for RTTI data that also require relocations. We need
to skip special handling on relocations for non virtual function pointers
in relative vtable.

Co-authored-by: Maksim Panchenko <maks@meta.com>
2025-04-29 08:13:44 -07:00
Owen Rodley
d3d856ad84
Clean up external users of GlobalValue::getGUID(StringRef) (#129644)
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.

This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.

This does the following:

* Rename GlobalValue::getGUID(StringRef) to
  getGUIDAssumingExternalLinkage. This is simply making explicit at the
  callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
  GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
  getGUIDAssumingExternalLinkage, to make the assumption explicit.


There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
2025-04-28 11:09:43 +10:00
Rafael Auler
3bcb724903
[BOLT] Add --custom-allocation-vma flag (#136385)
Add an advanced-user flag so we are able to rewrite binaries when we fail
to identify a suitable location to put new code. User then can supply a
custom location via --custom-allocation-vma. This happens more obviously if the
binary has segments mapped to very high addresses.
2025-04-18 21:02:09 -07:00
Rafael Auler
5c4e6c6113
[BOLT] Don't choke on nobits symbols (#136384) 2025-04-18 17:29:24 -07:00
wangjue
dbb79c30c9
[BOLT][Instrumentation] Initial instrumentation support for RISCV64 (#133882)
This patch adds code generation for RISCV64 instrumentation.The work
    involved includes the following three points:

a) Implements support for instrumenting direct function call and jump
    on RISC-V which relies on , Atomic instructions
    (used to increment counters) are only available on RISC-V when the A
    extension is used.

b) Implements support for instrumenting direct function inderect call
    by implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
    need to accurately record the target address and IndCallID to ensure
    the correct recording of the indirect call counters.

c)Implemented the RISCV64 Bolt runtime library, implemented some system
call interfaces through embedded assembly. Get the difference between
runtime addrress of .text section andstatic address in section header
table, which in turn can be used to search for indirect call
description.

However, the community code currently has problems with relocation in
    some scenarios, but this has nothing to do with instrumentation. We
    may continue to submit patches to fix the related bugs.
2025-04-16 23:01:00 -07:00
alekuz01
38faf32d23
[BOLT] Enable hugify for AArch64 (#117158)
Add required hugify instrumentation and runtime libraries support for AArch64.
Fixes #58226
Unblocks #62695
2025-04-15 12:59:05 +01:00
YongKang Zhu
2a83c0cc13
[BOLT] Support relative vtable (#135449)
To handle relative vftable, which is enabled with clang option
`-fexperimental-relative-c++-abi-vtables`, we look for PC relative
relocations whose fixup locations fall in vtable address ranges.
For such relocations, actual target is just virtual function itself,
and the addend is to record the distance between vtable slot for
target virtual function and the first virtual function slot in vtable,
which is to match generated code that calls virtual function. So
we can skip the logic of handling "function + offset" and directly
save such relocations for future fixup after new layout is known.
2025-04-14 10:24:47 -07:00
Maksim Panchenko
e4cbb7780b
[BOLT][AArch64] Fix symbolization of unoptimized TLS access (#134332)
TLS relocations may not have a valid BOLT symbol associated with them.
While symbolizing the operand, we were checking for the symbol value,
and since there was no symbol the check resulted in a crash.

Handle TLS case while performing operand symbolization on AArch64.
2025-04-04 11:42:21 -07:00
Anatoly Trosinenko
c818ae7399
[BOLT] Gadget scanner: detect non-protected indirect calls (#131899)
Implement the detection of non-protected indirect calls and branches
similar to pac-ret scanner.
2025-04-03 16:40:34 +03:00
Maksim Panchenko
96e5ee23a7
[BOLT][AArch64] Add partial support for lite mode (#133014)
In lite mode, we only emit code for a subset of functions while
preserving the original code in .bolt.org.text. This requires updating
code references in non-emitted functions to ensure that:

* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.

On x86-64, we can update code references in-place using "pending
relocations" added in scanExternalRefs(). However, on AArch64, this is
not always possible due to address range limitations and linker address
"relaxation".

There are two types of code-to-code references: control transfer (e.g.,
calls and branches) and function pointer materialization.
AArch64-specific control transfer instructions are covered by #116964.

For function pointer materialization, simply changing the immediate
field of an instruction is not always sufficient. In some cases, we need
to modify a pair of instructions, such as undoing linker relaxation and
converting NOP+ADR into ADRP+ADD sequence.

To achieve this, we use the instruction patch mechanism instead of
pending relocations. Instruction patches are emitted via the regular MC
layer, just like regular functions. However, they have a fixed address
and do not have an associated symbol table entry. This allows us to make
more complex changes to the code, ensuring that function pointers are
correctly updated. Such mechanism should also be portable to RISC-V and
other architectures.

To summarize, for AArch64, we extend the scanExternalRefs() process to
undo linker relaxation and use instruction patches to partially
overwrite unoptimized code.
2025-03-27 21:33:25 -07:00
Paschalis Mpeis
6bbd45dec7
[NFC][BOLT] Refactor ForcePatch option (#127812)
Move force-patch flag to CommandLineOpts and add details on
PatchEntries.
2025-03-21 15:55:09 +00:00
Anatoly Trosinenko
03557169e0
[BOLT] Gadget scanner: streamline issue reporting (#131896)
In preparation for adding more gadget kinds to detect, streamline
issue reporting.

Rename classes representing issue reports. In particular, rename
`Annotation` base class to `Report`, as it has nothing to do with
"annotations" in `MCPlus` terms anymore. Remove references to "return
instructions" from variable names and report messages, use generic
terms instead. Rename NonPacProtectedRetAnalysis to PAuthGadgetScanner.

Remove `GeneralDiagnostic` as a separate class, make `GenericReport`
(former `GenDiag`) store `std::string Text` directly. Remove unused
`operator=` and `operator==` methods, as `Report`s are created on the
heap and referenced via `shared_ptr`s.

Introduce `GadgetKind` class - currently, it only wraps a `const char *`
description to display to the user. This description is intended to be
a per-gadget-kind constant (or a few hard-coded constants), so no need
to store it to `std::string` field in each report instance. To handle
both free-form `GenericReport`s and statically-allocated messages
without unnecessary overhead, move printing of the report header to the
base class (and take the message argument as a `StringRef`).
2025-03-21 11:19:53 +03:00
Ash Dobrescu
3bba268013
[BOLT] Support computed goto and allow map addrs inside functions (#120267)
Create entry points for addresses referenced by dynamic relocations and
allow getNewFunctionOrDataAddress to map addrs inside functions. By
adding addresses referenced by dynamic relocations as entry points. This
patch fixes an issue where bolt fails on code using computing goto's.
This also fixes a mapping issue with the bugfix from this PR:
https://github.com/llvm/llvm-project/pull/117766.
2025-03-19 14:55:59 +00:00
Maksim Panchenko
bac21719a8
[BOLT] Pass unfiltered relocations to disassembler. NFCI (#131202)
Instead of filtering and modifying relocations in readRelocations(),
preserve the relocation info and use it in the symbolizing disassembler.
This change mostly affects AArch64, where we need to look at original
linker relocations in order to properly symbolize instruction operands.
2025-03-14 18:44:33 -07:00
Paschalis Mpeis
2f9d94981c
[BOLT] Change Relocation Type to 32-bit NFCI (#130792) 2025-03-14 18:15:59 +00:00
chrisPyr
038fff3f24
[NFC][BOLT] Make file-local cl::opt global variables static (#126472)
#125983
2025-03-05 22:11:05 -08:00
ShatianWang
7e33bebe7c
[BOLT] Report flow conservation scores (#127954)
Add two additional profile quality stats for CG (call graph) and CFG
(control flow graph) flow conservations besides the CFG discontinuity
stats introduced in #109683. The two new stats quantify how different
"in-flow" is from "out-flow" in the following cases where they should be
equal. The smaller the reported stats, the better the flow conservations
are.

CG flow conservation: for each function that is not a program entry, the
number of times the function is called according to CG ("in-flow")
should be equal to the number of times the transition from an entry
basic block of the function to another basic block within the function
is recorded ("out-flow").

CFG flow conservation: for each basic block that is not a function entry
or exit, the number of times the transition into this basic block from
another basic block within the function is recorded ("in-flow") should
be equal to the number of times the transition from this basic block to
another basic block within the function is recorded ("out-flow").

Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump
functions / basic blocks with bad flow conservations.
2025-02-28 11:06:52 -05:00
YongKang Zhu
5401c675eb
[BOLT][instr] Avoid WX segment (#128982)
BOLT instrumented binary today has a readable (R), writeable (W) and also
executable (X) segment, which Android system won't load due to its WX
attribute. Such RWX segment was produced because BOLT has a two step linking,
first for everything in the updated or rewritten input binary and next for
runtime library. Each linking will layout sections in the order of RX sections
followed by RO sections and then followed by RW sections. So we could end up
having a RW section `.bolt.instr.counters` surrounded by a number of RO and RX
sections, and a new text segment was then formed by including all RX sections
which includes the RW section in the middle, and hence the RWX segment. One
way to fix this is to separate the RW `.bolt.instr.counters` section into its
own segment by a). assigning the starting addresses for section
`.bolt.instr.counters` and its following section with regular page aligned
addresses and b). creating two extra program headers accordingly.
2025-02-27 16:13:57 -08:00
Kristof Beyls
850b492976
[BOLT][binary-analysis] Add initial pac-ret gadget scanner (#122304)
This adds an initial pac-ret gadget scanner to the
llvm-bolt-binary-analysis-tool.

The scanner is taken from the prototype that was published last year at
https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype,
and has been discussed in RFC

https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148
and in the EuroLLVM 2024 keynote "Does LLVM implement security
hardenings correctly? A BOLT-based static analyzer to the rescue?"
[Video](https://youtu.be/Sn_Fxa0tdpY)
[Slides](https://llvm.org/devmtg/2024-04/slides/Keynote/Beyls_EuroLLVM2024_security_hardening_keynote.pdf)

In the spirit of incremental development, this PR aims to add a minimal
implementation that is "fully working" on its own, but has major
limitations, as described in the bolt/docs/BinaryAnalysis.md
documentation in this proposed commit. These and other limitations will
be fixed in follow-on PRs, mostly based on code already existing in the
prototype branch. I hope incrementally upstreaming will make it easier
to review the code.

Note that I believe that this could also form the basis of a scanner to
analyze correct implementation of PAuthABI.
2025-02-24 07:26:28 +00:00
YongKang Zhu
9fa77c1854
[BOLT][Linker][NFC] Remove lookupSymbol() in favor of lookupSymbolInfo() (#128070)
Sometimes we need to know the size of a symbol besides its address, so
maybe we can start using the existing `BOLTLinker::lookupSymbolInfo()`
(that returns symbol address and size) and remove
`BOLTLinker::lookupSymbol()` (that only returns symbol address). And for
both we need to check return value as it is wrapped in `std::optional<>`,
which makes the difference even smaller.
2025-02-20 17:14:33 -08:00
Nikita Popov
e7244d8659
[BOLT][CMake] Don't export bolt libraries in LLVMExports.cmake (#121936)
Bolt makes use of add_llvm_library and as such ends up exporting its
libraries from LLVMExports.cmake, which is not correct.

Bolt doesn't have its own exports file, and I assume that there is no
desire to have one either -- Bolt libraries are not intended to be
consumed as a cmake module, right?

As such, this PR adds a NO_EXPORT option to simplify exclude these
libraries from the exports file.
2025-01-08 09:41:09 +01:00
Franklin
6e8a1a45a7
[BOLT] Detect Linux kernel version if the binary is a Linux kernel (#119088)
This makes it easier to handle differences (e.g. of exception table
entry size) between versions of Linux kernel
2024-12-26 09:54:23 -08:00
Maksim Panchenko
21684e38ee
[BOLT][Linux] Refactor reading of PC-relative addresses. NFCI (#120491)
Fix evaluation order problem identified in
https://github.com/llvm/llvm-project/pull/119088.
2024-12-19 10:40:25 -08:00
Alexander Yermolovich
3c357a49d6
[BOLT] Add support for safe-icf (#116275)
Identical Code Folding (ICF) folds functions that are identical into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems. For example when
function pointers are compared. This can be done either explicitly in
the code or generated IR by optimization passes like Indirect Call
Promotion (ICP). After ICF what used to be two different addresses
become the same address. This can lead to a different code path being
taken.

This is where safe ICF comes in. Linker (LLD) does it using address
significant section generated by clang. If symbol is in it, or an object
doesn't have this section symbols are not folded.

BOLT does not have the information regarding which objects do not have
this section, so can't re-use this mechanism.

This implementation scans code section and conservatively marks
functions symbols as unsafe. It treats symbols as unsafe if they are
used in non-control flow instruction. It also scans through the data
relocation sections and does the same for relocations that reference a
function symbol. The latter handles the case when function pointer is
stored in a local or global variable, etc. If a relocation address
points within a vtable these symbols are skipped.
2024-12-16 21:49:53 -08:00
Paschalis Mpeis
2df48fa78b
[BOLT][AArch64] Enable function print after ADRRelaxation (#119869)
Introduce `--print-adr-relaxation` to print after ADR Relaxation pass.
2024-12-16 12:06:56 +00:00