111 Commits

Author SHA1 Message Date
Alexandros Lamprineas
64b728128d
[BOLT][AArch64] Add minimal support for liveness analysis. (#183298)
In this patch I am adding the missing target hooks required for the
liveness analysis to run on AArch64. These are
 - getFlagsReg()
 - getRegsUsedAsParams()
 - getDefaultLiveOut()
 - getGPRegs()
 - isCleanRegXOR()

I am also introducing the following API in LivenessAnalysis
 - BitVector getLiveIn/Out(const MCInst &)
 - MCPhysReg scavengeRegFromState(BitVector &)
 
My intention is to allow the LongJmp pass to scavenge usable registers when
injecting code.
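The idea behind register scavenging can be pictured as follows: given the set of registers live at a program point, pick one that is dead there and reserve it. This is a minimal standalone sketch, not BOLT's implementation. It uses `std::bitset` in place of `llvm::BitVector`; the candidate order and the `scavengeDeadReg` name are hypothetical. It also assumes the state is updated so that the same register is not handed out twice.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Toy register numbering: index N stands for AArch64 register xN (0..30).
using RegState = std::bitset<31>;
constexpr uint16_t NoReg = UINT16_MAX;

// Hypothetical stand-in for scavengeRegFromState(): return a register
// that is not live in State and mark it live, so a second call hands
// out a different register. Returns NoReg if every candidate is live.
uint16_t scavengeDeadReg(RegState &State) {
  // Assumed preference: the intra-procedure-call scratch registers
  // x16/x17 first, then other caller-saved registers.
  for (uint16_t Reg : {16, 17, 9, 10, 11, 12, 13, 14, 15}) {
    if (!State.test(Reg)) {
      State.set(Reg);
      return Reg;
    }
  }
  return NoReg;
}
```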
2026-04-02 11:59:59 +01:00
Gergely Bálint
9d762ad279
[BOLT][BTI] Patch ignored functions in place when targeting them with indirect branches (#177165)
When applying BTI fixups to indirect branch targets, ignored functions are
considered a special case, because they:
- hold no instructions,
- have no CFG,
- and are not emitted in the new text section.

The solution is to patch the entry points in the original location.

If such a situation occurs in a binary, the binary must be recompiled with
the -fpatchable-function-entry flag. This places a nop at the start of
every function, which BOLT can use to patch the original section.

Without the extra nop, BOLT cannot safely patch the original .text
section.

An alternative solution could be to also ignore the function from which
the stub starts. This has not been tried, as the LongJmp pass (where most
stubs are inserted) is currently not equipped to ignore functions.

Testing: both the success and failure cases are covered with lit tests.
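The in-place patching constraint can be illustrated with raw instruction words. This is a hypothetical sketch, not BOLT's code: it assumes the entry is available as its first 32-bit instruction word and uses the architectural encodings of `nop` and the `bti` variants. Implicit landing pads such as `paciasp` are ignored. A compatible landing pad needs no change, a `nop` left by -fpatchable-function-entry is overwritten, and anything else cannot be patched without moving code.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t NopWord = 0xd503201f;   // hint #0
constexpr uint32_t BtiCWord = 0xd503245f;  // hint #34 (bti c)
constexpr uint32_t BtiJWord = 0xd503249f;  // hint #36 (bti j)
constexpr uint32_t BtiJCWord = 0xd50324df; // hint #38 (bti jc)

enum class PatchResult { AlreadyCovered, Patched, NeedsPatchableEntry };

// Hypothetical helper: make the first instruction of an ignored function
// a landing pad of the required kind without moving any other code.
PatchResult patchEntryInPlace(uint32_t &FirstWord, uint32_t RequiredBTI) {
  // bti jc accepts every kind of indirect branch.
  if (FirstWord == RequiredBTI || FirstWord == BtiJCWord)
    return PatchResult::AlreadyCovered;
  // A different bti variant can be widened to bti jc in place.
  if (FirstWord == BtiCWord || FirstWord == BtiJWord) {
    FirstWord = BtiJCWord;
    return PatchResult::Patched;
  }
  // A nop left by -fpatchable-function-entry can be overwritten safely.
  if (FirstWord == NopWord) {
    FirstWord = RequiredBTI;
    return PatchResult::Patched;
  }
  // Any other instruction would have to be moved, which is not possible
  // for a function whose body is not re-emitted.
  return PatchResult::NeedsPatchableEntry;
}
```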
2026-02-24 11:09:42 +01:00
Alexey Moksyakov
12b561a5e2
[bolt][aarch64] Change indirect call instrumentation snippet (#180229)
The indirect call instrumentation snippet uses the x16 register in the exit
handler to jump to the destination target:

    __bolt_instr_ind_call_handler:
            msr  nzcv, x1
            ldp  x0, x1, [sp], #16
            ldr  x16, [sp], #16
            ldp  x0, x1, [sp], #16
            br   x16	<-----

This patch changes the instrumentation snippet to call the instrumentation
runtime library through an indirect call instruction, and adds a wrapper
that saves and restores the target value and the register used by the
original indirect call instruction.

Example:
            mov x16, foo

    indirectCall:
            adrp x8, Label
            add  x8, x8, #:lo12:Label
            blr x8

Before:

    Instrumented indirect call:
            stp     x0, x1, [sp, #-16]!
            mov     x0, x8
            movk    x1, #0x0, lsl #48
            movk    x1, #0x0, lsl #32
            movk    x1, #0x0, lsl #16
            movk    x1, #0x0
            stp     x0, x1, [sp, #-16]!
            adrp    x0, __bolt_instr_ind_call_handler_func
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x0

    __bolt_instr_ind_call_handler:  (exit snippet)
            msr     nzcv, x1
            ldp     x0, x1, [sp], #16
            ldr     x16, [sp], #16
            ldp     x0, x1, [sp], #16
            br      x16    <- overwrites the original value in X16

    __bolt_instr_ind_call_handler_func:  (entry snippet)
            stp     x0, x1, [sp, #-16]!
            mrs     x1, nzcv
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler


_________________________________________________________________________

After:

            mov     x16, foo
    indirectCall:
            adrp    x8, Label
            add     x8, x8, #:lo12:Label
            blr     x8

    Instrumented indirect call:
            stp     x0, x30, [sp, #-16]!
            mov     x0, callsiteid
            stp     x8, x0, [sp, #-16]!
            adrp    x8, __bolt_instr_ind_call_handler_func
            add     x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x8       <--- call trampoline instr lib
            ldr     x8, [sp], #16
            ldp     x0, x30, [sp], #16
            blr     x8       <--- original indirect call instruction

    // don't touch regs besides x0, x1
    __bolt_instr_ind_call_handler:  (exit snippet)
            ret     <---- return to original function with indirect call

    __bolt_instr_ind_call_handler_func: (entry snippet)
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler
2026-02-16 10:45:08 +03:00
Alexandros Lamprineas
0584699c11
[BOLT][AArch64] Support FEAT_CMPBR branch instructions. (#174972)
The Armv9.6-A compare-and-branch instructions use a short-range 9-bit
immediate value. They do not have a corresponding relocation type in the
ABI. For now we only support them in the compact code model, with
diagnostics added in the LongJmp pass to enforce this condition. Some
interesting edge cases we cover:
- function splitting works when the target is within or beyond the 1KB
  range of those instructions,
  - but doesn't work beyond the 128MB limit of the compact code model;
- branch inversion works with block reordering as long as the adjusted
  immediate values remain in bounds.
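The 1KB figure follows from the encoding. The 9-bit signed immediate counts 4-byte instructions, so the reachable byte offsets are [-1024, +1020]. Below is a minimal sketch of the kind of range check such diagnostics need; the function names are hypothetical. The same helper with a 26-bit immediate gives the +/-128MB reach of `b`/`bl` that bounds the compact code model.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a PC-relative byte offset fits a signed immediate of
// ImmBits bits that counts 4-byte instructions.
bool fitsScaledImm(int64_t Offset, unsigned ImmBits) {
  if (Offset % 4 != 0)
    return false; // branch targets must be instruction aligned
  const int64_t Min = -(int64_t(1) << (ImmBits - 1)) * 4;
  const int64_t Max = ((int64_t(1) << (ImmBits - 1)) - 1) * 4;
  return Offset >= Min && Offset <= Max;
}

// FEAT_CMPBR compare-and-branch: imm9, i.e. roughly +/-1KB.
bool fitsCmpBranch(int64_t Offset) { return fitsScaledImm(Offset, 9); }
```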
2026-02-12 15:49:00 +00:00
Gergely Bálint
f7c5316468
[BOLT][BTI] Refactor: move applyBTIFixup under MCPlusBuilder (#177164)
This patch moves applyBTIFixup from the LongJmp pass to MCPlusBuilder.
This refactor allows applyBTIFixup to be called from other passes that
insert indirect branches, such as:
- Hugify,
- PatchEntries.

As different passes have different information about their targets (e.g.
target BasicBlock, target Symbol, target Function), specialized versions
are created (applyBTIFixupToSymbol, applyBTIFixupToTarget), and each calls
applyBTIFixupCommon, which implements the original logic.

Names of related lit tests are updated to have the "bti" prefix.
2026-02-12 08:29:16 +01:00
Gergely Bálint
de40ef2a3f
[BOLT][BTI] Patch LLD-generated PLTs to contain BTI landing pad (#173245)
This patch adds patchPLTEntryForBTI to enable patching PLT entries
generated by LLD.

## Context:

To keep BTI consistent, targets of stubs inserted in LongJmp need to be
patched. As PLTs are neither optimized nor emitted by BOLT, this patch adds
a helper for patching them in the original .plt section.

For PLTs generated by LLD, this is safe, as LLD pads PLT entries that don't
already contain a BTI with extra nops.

PLT entry before patching:
```
   adrp x16, Page(&(.got.plt[n]))
   ldr  x17, [x16, Offset(&(.got.plt[n]))]
   add  x16, x16, Offset(&(.got.plt[n]))
   br   x17
   nop
   nop
```

PLT entry after patching:
```
   bti c
   adrp x16, Page(&(.got.plt[n]))
   ldr  x17, [x16, Offset(&(.got.plt[n]))]
   add  x16, x16, Offset(&(.got.plt[n]))
   br   x17
   nop
```

## Safety considerations:

A PLT entry can become incorrect if shifting its ADRP moves the ADRP
across a page boundary, since ADRP computes its result from the page of
its own address.

The PLT entry is 24 bytes, and the page size is 4096 (or 16384) bytes.
Their GCD is 8 bytes, so, assuming the PLT itself is 8-byte aligned, every
entry start and every page boundary falls on a multiple of 8. Shifting the
ADRP is therefore safe as long as it is shifted by less than 8 bytes.

The introduced function only shifts the ADRP by one instruction (4 bytes),
so there is no need to recompute the ADRP offset.
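The alignment argument can be checked by brute force. This sketch assumes that the first PLT entry starts at an 8-byte-aligned address, as the argument above does. It walks consecutive 24-byte entries and checks whether moving the ADRP from the entry start to a later offset changes its page. The base address and entry count used in the checks are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Page base that ADRP computes for an instruction at address PC.
uint64_t pageOf(uint64_t PC, uint64_t PageSize) {
  return PC & ~(PageSize - 1);
}

// True if shifting the ADRP at the start of each of NumEntries 24-byte PLT
// entries forward by ShiftBytes never moves it to a different page.
bool shiftKeepsPages(uint64_t FirstEntry, unsigned NumEntries,
                     uint64_t ShiftBytes, uint64_t PageSize) {
  constexpr uint64_t EntrySize = 24;
  for (unsigned I = 0; I < NumEntries; ++I) {
    const uint64_t Adrp = FirstEntry + I * EntrySize;
    if (pageOf(Adrp, PageSize) != pageOf(Adrp + ShiftBytes, PageSize))
      return false;
  }
  return true;
}
```

A 4-byte shift never crosses a page, while an 8-byte shift eventually does (for example, for an entry starting 8 bytes before a 4KB boundary).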
2026-01-16 09:48:54 +01:00
Harald van Dijk
e42f862042
[BOLT][AArch64] Avoid UB due to shift of negative value. (#174994)
A build with LLVM_USE_SANITIZER=Undefined showed:

  bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp:2277:60:
  runtime error: left shift of negative value -32768

This showed up in bolt/test/AArch64/veneer-lite-mode.s.

ADRP's operand may legitimately be negative, but left-shifting a negative
signed value is undefined behavior. To perform the shift reliably, cast the
value to unsigned first.
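Here is a minimal illustration of the fix. It assumes the ADRP immediate is a signed page count that must be scaled by 4096 (a left shift by 12); the helper name is hypothetical. Before C++20, shifting a negative signed value left is undefined, whereas shifting the same bits as an unsigned value is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Convert an ADRP page-count immediate into a byte offset.
int64_t adrpPagesToBytes(int64_t Pages) {
  // Undefined for negative Pages under pre-C++20 rules:
  //   return Pages << 12;
  // Well defined: shift the two's-complement bits as unsigned, then convert
  // back (modular in C++20, implementation-defined but modular on GCC and
  // Clang before that).
  return static_cast<int64_t>(static_cast<uint64_t>(Pages) << 12);
}
```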
2026-01-08 17:06:40 +00:00
Gergely Bálint
76c300c8c7
[BOLT][BTI] Fix assertions checking getNumOperands (#174600)
Several BTI-related functions check that a call MCInst has exactly one
non-annotation operand.

This patch changes these checks to use MCPlus::getNumPrimeOperands
instead of getNumOperands.

Testing: added annotations to existing gtests so they serve as regression
tests. These now also explicitly check getNumOperands and
getNumPrimeOperands on the annotated MCInsts.
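A toy model shows why the two counts diverge once an instruction is annotated. As a simplification, it assumes annotations are attached as one extra trailing operand. `ToyOperand`, `ToyInst`, and `numPrimeOperands` are hypothetical names, not BOLT's types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy operand: either a real instruction operand or the annotation slot.
struct ToyOperand {
  bool IsAnnotation = false;
  int Value = 0;
};

struct ToyInst {
  std::vector<ToyOperand> Operands;
  std::size_t numOperands() const { return Operands.size(); }
};

// Counts only the operands that belong to the instruction itself,
// excluding a trailing annotation slot if one is present.
std::size_t numPrimeOperands(const ToyInst &I) {
  if (!I.Operands.empty() && I.Operands.back().IsAnnotation)
    return I.Operands.size() - 1;
  return I.Operands.size();
}
```

An assertion written against `numOperands()` would start failing as soon as a pass annotates the call, which is the kind of failure the regression tests are meant to catch.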
2026-01-07 10:54:00 +01:00
Maksim Panchenko
3e840d2957
[BOLT] Remove unnecessary dependency. NFC (#174645)
There's no need for a full definition of `BinaryBasicBlock` in
`MCPlusBuilder.h`. Use `InstructionListType::iterator` instead of
`BinaryBasicBlock::iterator` in `findMemcpySizeInBytes()`.
2026-01-06 13:30:28 -08:00
Gergely Bálint
24297bea96
[BOLT][BTI] Refactor BTI helpers (#173000)
- Add an enum to encode BTI variants in function arguments.
- Remove updateBTIVariant as createBTI can be used for the same
purpose.
- Remove a test case that checked against invalid BTI variants, as
those are now unrepresentable.
2025-12-22 10:11:41 +01:00
Alexey Moksyakov
6f748698f7
Revert "[bolt][aarch64] simplify rodata/literal load for X86 & AArch6… (#172822)
A few tests are broken on Ubuntu; the cause still needs to be found.
This reverts commit 999c9382571d6aadf9b786263862bf4085dd2dba.

Co-authored-by: yavtuk <yavtuk@ya.ru>
2025-12-18 14:19:17 +03:00
Alexey Moksyakov
999c938257
[bolt][aarch64] simplify rodata/literal load for X86 & AArch64 (#165723)
This patch fixes an issue with literal loads on AArch64
(bolt/test/AArch64/materialize-constant.s): the address range of a
literal load is limited to +/-1MB, but emitCI places the constants at the
end of the function, where they can fall out of that range.

SimplifyRODataLoads is enabled by default for X86 & AArch64.
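The +/-1MB limit comes from the literal form of LDR, whose 19-bit signed immediate counts 4-byte words. Here is a hypothetical check of whether a constant placed at the end of a function is still reachable from a load. The function name and addresses are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// LDR (literal): signed imm19 scaled by 4, i.e. [-1MiB, 1MiB - 4].
bool literalInRange(uint64_t LoadAddr, uint64_t ConstAddr) {
  // Unsigned subtraction wraps; converting back yields the signed distance.
  const int64_t Offset = static_cast<int64_t>(ConstAddr - LoadAddr);
  constexpr int64_t Min = -(int64_t(1) << 20);
  constexpr int64_t Max = (int64_t(1) << 20) - 4;
  return Offset % 4 == 0 && Offset >= Min && Offset <= Max;
}
```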
2025-12-18 09:59:01 +03:00
YongKang Zhu
429dbce8d7
[BOLT][AArch64] Tweak heuristics for epilogue recognition (#169584)
If a basic block contains a load into LR from the stack and has no
instruction saving LR onto the stack, we just assume the basic block
is an epilogue.

This is not meant to recognize epilogues accurately in all possible
cases, but to make BOLT conservative about treating a basic block as an
epilogue and then converting an indirect branch with unknown control flow
into a tail call.
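The heuristic can be sketched over a simplified per-instruction summary. The `InstSummary` fields are hypothetical stand-ins for the MCInst queries BOLT would perform. The sketch only illustrates the rule itself: restoring LR from the stack without saving it in the same block marks the block as an epilogue.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-instruction facts an analysis would extract.
struct InstSummary {
  bool LoadsLRFromStack = false; // e.g. ldp x29, x30, [sp], #16
  bool StoresLRToStack = false;  // e.g. stp x29, x30, [sp, #-16]!
};

// Conservative epilogue test: the block reloads LR from the stack and
// never saves LR to the stack itself.
bool looksLikeEpilogue(const std::vector<InstSummary> &Block) {
  bool LoadsLR = false;
  for (const InstSummary &I : Block) {
    if (I.StoresLRToStack)
      return false;
    LoadsLR = LoadsLR || I.LoadsLRFromStack;
  }
  return LoadsLR;
}
```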
2025-12-11 14:46:36 -08:00
Gergely Bálint
fbc121ce1e
[BOLT][BTI] Add MCPlusBuilder::insertBTI (#167329)
This function contains most of the logic for BTI:
- it takes the BasicBlock and the instruction used to jump to it,
- then checks whether the first non-pseudo instruction is a sufficient
  landing pad for that branch,
- and if not, generates the correct BTI instruction.

Also introduce the isCallCoveredByBTI helper to simplify the logic.
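The coverage rules such a helper has to encode follow the architectural BTYPE rules for guarded code: `bti c` accepts `blr` and `br x16`/`br x17`, `bti j` accepts any `br`, and `bti jc` accepts both. Below is a minimal sketch of the coverage check and the insertion decision. It uses hypothetical enums in place of MCInst queries and leaves out implicit landing pads such as `paciasp`.

```cpp
#include <cassert>
#include <optional>

enum class BTIKind { C, J, JC };
enum class BranchKind { BLR, BR_X16_X17, BR_Other };

// True if a landing pad of kind Pad accepts an indirect branch of kind Br.
bool isCallCoveredByBTI(BTIKind Pad, BranchKind Br) {
  switch (Br) {
  case BranchKind::BLR:
    return Pad == BTIKind::C || Pad == BTIKind::JC;
  case BranchKind::BR_X16_X17:
    return true; // accepted by bti c, bti j and bti jc
  case BranchKind::BR_Other:
    return Pad == BTIKind::J || Pad == BTIKind::JC;
  }
  return false;
}

// Existing is the BTI at the start of the target block, if any. Returns
// the BTI to insert (or to replace Existing with), or std::nullopt if the
// existing landing pad already suffices.
std::optional<BTIKind> requiredBTI(std::optional<BTIKind> Existing,
                                   BranchKind Br) {
  if (Existing && isCallCoveredByBTI(*Existing, Br))
    return std::nullopt;
  if (Existing) // a different variant is already present: widen it
    return BTIKind::JC;
  return Br == BranchKind::BLR ? BTIKind::C : BTIKind::J;
}
```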
2025-12-11 11:58:24 +01:00
Harald van Dijk
3686ff2b15
[AArch64] Treat NOP as a separate instruction. (#170968)
Previously, nop was treated as just an alias for hint #0. The
consequence of that was that all the general rules for hint instructions
applied to nop too, in particular that during binary analysis, they were
assumed to have unknown effects. This commit adds AArch64::NOP as a
standalone instruction with no side effects.

The scheduling update in A55-load-store-alias.s is probably not entirely
accurate, but should be more accurate than the previous result.
2025-12-09 00:04:25 +00:00
Gergely Bálint
d984125721
Revert "[BOLT][AArch64] Fixed indirect call instrumentation snippet" (#170874)
Reverts llvm/llvm-project#141918

The patch broke a buildbot while applying instrumentation:
https://lab.llvm.org/buildbot/#/builders/128/builds/8866

Reverting until the issue is triaged.

Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>
2025-12-05 19:06:36 +03:00
Alexey Moksyakov
ad605bdad7
[bolt][aarch64] Change indirect call instrumentation snippet
The indirect call instrumentation snippet uses the x16 register in the exit
handler to jump to the destination target:

    __bolt_instr_ind_call_handler:
            msr  nzcv, x1
            ldp  x0, x1, [sp], #16
            ldr  x16, [sp], #16
            ldp  x0, x1, [sp], #16
            br   x16	<-----

This patch changes the instrumentation snippet to call the instrumentation
runtime library through an indirect call instruction, and adds a wrapper
that saves and restores the target value and the register used by the
original indirect call instruction.

Example:
            mov x16, foo

    indirectCall:
            adrp x8, Label
            add  x8, x8, #:lo12:Label
            blr x8

Before:

    Instrumented indirect call:
            stp     x0, x1, [sp, #-16]!
            mov     x0, x8
            movk    x1, #0x0, lsl #48
            movk    x1, #0x0, lsl #32
            movk    x1, #0x0, lsl #16
            movk    x1, #0x0
            stp     x0, x1, [sp, #-16]!
            adrp    x0, __bolt_instr_ind_call_handler_func
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x0

    __bolt_instr_ind_call_handler:  (exit snippet)
            msr     nzcv, x1
            ldp     x0, x1, [sp], #16
            ldr     x16, [sp], #16
            ldp     x0, x1, [sp], #16
            br      x16    <- overwrites the original value in X16

    __bolt_instr_ind_call_handler_func:  (entry snippet)
            stp     x0, x1, [sp, #-16]!
            mrs     x1, nzcv
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler

_________________________________________________________________________

After:

            mov     x16, foo
    indirectCall:
            adrp    x8, Label
            add     x8, x8, #:lo12:Label
            blr     x8

    Instrumented indirect call:
            stp     x0, x1, [sp, #-16]!
            mov     x0, x8
            movk    x1, #0x0, lsl #48
            movk    x1, #0x0, lsl #32
            movk    x1, #0x0, lsl #16
            movk    x1, #0x0
            stp     x0, x30, [sp, #-16]!
            adrp    x8, __bolt_instr_ind_call_handler_func
            add     x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x8       <--- call trampoline instr lib
            ldp     x0, x30, [sp], #16
            mov     x8, x0   <---- restore original target
            ldp     x0, x1, [sp], #16
            blr     x8       <--- original indirect call instruction

    // don't touch regs besides x0, x1
    __bolt_instr_ind_call_handler:  (exit snippet)
            ret     <---- return to original function with indirect call

    __bolt_instr_ind_call_handler_func: (entry snippet)
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler
2025-11-27 23:48:10 +03:00
Gergely Bálint
cca66a21c2
[BOLT][BTI] Add MCPlusBuilder::updateBTIVariant (#167308)
Checks whether an instruction is a BTI, and updates its immediate value to
the newly requested variant.

This can be used when the compiler has already inserted a BTI landing pad
at a location, but BOLT needs to change it to a different variant.
Example: a `br x0` targeting a location that has a `bti c`.
2025-11-26 17:48:34 +01:00
Gergely Bálint
4533699245
[BOLT][BTI] Add MCPlusBuilder::isBTILandingPad (#167306)
- takes both implicit and explicit BTIs into account
- fixes a related comment in
  llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
2025-11-25 18:37:30 +01:00
Gergely Bálint
ed95c4d6ec
[BOLT][BTI] Add MCPlusBuilder::createBTI (#167305)
- creates a BTI j|c landing pad MCInst.
- create getBTIHintNum utility in AArch64/Utils, to make sure BOLT
  generates BTI immediates the same way as LLVM.
- add MCPlusBuilder unittests to cover new function.
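BTI is encoded in the HINT space, so generating it the same way as LLVM comes down to choosing the right hint number: `bti c` is `hint #34`, `bti j` is `hint #36`, and `bti jc` is `hint #38`. Below is a sketch of that mapping and of the resulting 32-bit instruction word. `btiHintNum` and `encodeHint` are hypothetical names, not the signatures of the utility added here. An enum with only these three members also keeps invalid variants unrepresentable.

```cpp
#include <cassert>
#include <cstdint>

enum class BTIKind { C, J, JC };

// Hint immediate for each BTI variant: plain bti is hint #32, and the
// c and j targets add 2 and 4 respectively.
unsigned btiHintNum(BTIKind K) {
  switch (K) {
  case BTIKind::C:
    return 34;
  case BTIKind::J:
    return 36;
  case BTIKind::JC:
    return 38;
  }
  return 32; // not reached for valid enumerators
}

// HINT #Imm: the 7-bit immediate occupies bits [11:5] of the base word.
uint32_t encodeHint(unsigned Imm) {
  assert(Imm < 128 && "hint immediate is 7 bits");
  return 0xd503201fu | (Imm << 5);
}
```

`encodeHint(0)` yields the `nop` encoding, which is why `nop` and `bti` share the same instruction class.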
2025-11-25 09:51:40 +01:00
Gergely Bálint
bab1c2971a
[BOLT] Extend Inliner to work on functions with Pointer Authentication (#162458)
The inliner uses DirectSP to check if a function has instructions that
modify the SP. Exceptions are stack Push and Pop instructions.

We can also allow pointer signing and authenticating instructions.

The inliner removes the Return instructions from the inlined functions.
If it is a fused pointer-authentication-and-return (e.g. RETAA), we have
to generate a new authentication instruction.
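`retaa` authenticates LR against SP with key A before returning. Dropping the return during inlining therefore has to keep that authentication. One way to express the mapping is sketched below with hypothetical opcode enums rather than BOLT's: `retaa` becomes `autiasp` and `retab` becomes `autibsp`, both of which authenticate x30 using SP as the modifier. A plain `ret` needs no replacement.

```cpp
#include <cassert>
#include <optional>

enum class Op { RET, RETAA, RETAB, AUTIASP, AUTIBSP };

// When the inliner removes a return, return the instruction that must
// replace it, or std::nullopt if it can simply be dropped.
std::optional<Op> replacementForInlinedReturn(Op Ret) {
  switch (Ret) {
  case Op::RETAA:
    return Op::AUTIASP; // authenticate x30 with SP as modifier, key A
  case Op::RETAB:
    return Op::AUTIBSP; // authenticate x30 with SP as modifier, key B
  default:
    return std::nullopt; // plain ret (or not a return): nothing to add
  }
}
```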
2025-11-24 18:00:58 +01:00
YongKang Zhu
4cd16f2a0c
[BOLT][AArch64] Add more heuristics on epilogue determination (#167077)
Add more heuristics to check whether a basic block is an AArch64 epilogue.
We treat instructions that load from the stack or adjust the stack pointer
as a valid epilogue sequence if and only if they immediately precede the
branch instruction that ends the basic block.
2025-11-10 09:50:44 -08:00
YongKang Zhu
b0ae054a56
[BOLT][AArch64] Fix LDR relocation type in ADRP+LDR sequence (#166391)
`R_AARCH64_ADD_ABS_LO12_NC` is for the `ADD` instruction in the
`ADRP+ADD` sequence. For `ADRP+LDR` sequence generated in LDR
relaxation, relocation type for `LDR` should be
`R_AARCH64_LDST64_ABS_LO12_NC` if it is 64-bit integer load or
`R_AARCH64_LDST32_ABS_LO12_NC` if 32-bit.

Sorry, this should have been included in #165787.
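The relocation type matters because the three forms encode the low 12 bits of the target differently. `ADD` takes them as-is, while the unsigned-offset `LDR` forms store them divided by the access size. The sketch below shows that arithmetic. The helper and the `LoKind` enum are hypothetical; the relocation names in the comments are the ELF ones cited above. This sketch rejects misaligned targets.

```cpp
#include <cassert>
#include <cstdint>

enum class LoKind { Add, Ldr64, Ldr32 };

// Value placed in the imm12 field of the instruction following ADRP.
// Returns -1 if the target is misaligned for the access size.
int64_t lo12Field(uint64_t Target, LoKind K) {
  const uint64_t Lo12 = Target & 0xfff;
  switch (K) {
  case LoKind::Add: // R_AARCH64_ADD_ABS_LO12_NC: bits [11:0]
    return static_cast<int64_t>(Lo12);
  case LoKind::Ldr64: // R_AARCH64_LDST64_ABS_LO12_NC: bits [11:3]
    return Lo12 % 8 ? -1 : static_cast<int64_t>(Lo12 >> 3);
  case LoKind::Ldr32: // R_AARCH64_LDST32_ABS_LO12_NC: bits [11:2]
    return Lo12 % 4 ? -1 : static_cast<int64_t>(Lo12 >> 2);
  }
  return -1;
}
```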
2025-11-05 12:01:58 -08:00
Elvina Yakubova
a65867ac31
[BOLT][AArch64] Fix search to proceed upwards from memcpy call (#166182)
The search should proceed from CallInst to the beginning of BB since X2
can be rewritten and we need to catch the most recent write before the
call.

Patch by Yafet Beyene alulayafet@gmail.com
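Searching upwards means the first matching write found is the one closest to the call. Below is a toy version over a simplified instruction list. The `ToyInst` record and the function name are hypothetical, and only `mov x2, #imm` is treated as a known size. Any other write to x2 makes the size unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Simplified instruction: which register it writes and, if it is a
// move of an immediate, that immediate.
struct ToyInst {
  unsigned DefReg;             // register written, e.g. 2 for x2
  std::optional<long> ImmValue; // set for "mov xN, #imm"
};

// Walk backwards from the call at CallIdx to the start of the block and
// return the size held in x2 at the call, if it is a known constant.
std::optional<long> memcpySizeBeforeCall(const std::vector<ToyInst> &BB,
                                         std::size_t CallIdx) {
  for (std::size_t I = CallIdx; I-- > 0;) {
    if (BB[I].DefReg != 2)
      continue;
    // The most recent write to x2 decides; a non-constant write means
    // the size is unknown.
    return BB[I].ImmValue;
  }
  return std::nullopt;
}
```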
2025-11-05 10:51:31 +00:00
YongKang Zhu
718a3b268f
[BOLT][AArch64] Run LDR relaxation (#165787)
Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`,
which, besides the existing ADR relaxation, will also run LDR relaxation
that for now only handles these two forms of LDR instructions:
`ldr Xt, [label]` and `ldr Wt, [label]`.
2025-11-04 06:49:04 -08:00
YongKang Zhu
e1ae126401
[BOLT][AArch64] Validate code padding (#164037)
Check whether AArch64 function code padding is valid,
and add an option to treat invalid code padding as error.
2025-10-22 20:25:06 -07:00
Christian Clauss
0fc05aa1c6
[bolt] Fix typos discovered by codespell (#124726)
https://github.com/codespell-project/codespell
```bash
codespell bolt --skip="*.yaml,Maintainers.txt" --write-changes \
    --ignore-words-list=acount,alledges,ans,archtype,defin,iself,mis,mmaped,othere,outweight,vas
```
2025-10-14 14:45:40 +02:00
Gergely Bálint
889bfd9172
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353) (#162435)
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing
binaries with pac-ret hardening (#120064)" (#162353)

This reverts commit c7d776b06897567e2d698e447d80279664b67d47.

#120064 was reverted for breaking builders.

Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.

---

Original message:

OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
whether the current return address has been signed with PAC.

OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.

This patch introduces two new passes:

- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
  instruction based on OpNegateRAState CFIs in the input binary.

- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
  new OpNegateRAState CFIs where RA state changes between instructions.

Design details are described in: `bolt/docs/PacRetDesign.md`.
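The regeneration step can be sketched as a single walk over the final instruction order of one function fragment. The sketch assumes each instruction already carries the signed/unsigned annotation computed before optimization (modelled here as a plain `bool` giving the state at the start of the instruction). It also assumes the state is unsigned at fragment entry. It returns the positions before which a negate CFI is needed. The names are hypothetical, not the passes' actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signed[I] is true if the return address is signed at the start of
// instruction I, in the final (post-layout) order of one fragment.
// Returns the indices before which an OpNegateRAState CFI is emitted.
std::vector<std::size_t> negateRAStatePoints(const std::vector<bool> &Signed) {
  std::vector<std::size_t> Points;
  bool Current = false; // assumed: RA is unsigned at fragment entry
  for (std::size_t I = 0; I < Signed.size(); ++I) {
    if (Signed[I] != Current) {
      Points.push_back(I);
      Current = Signed[I];
    }
  }
  return Points;
}
```

For `paciasp; stp; ldp; autiasp; ret`, the states are unsigned, signed, signed, signed, unsigned, so CFIs go after `paciasp` and after `autiasp`. If reordering places a signed block after the `ret`, that block simply produces another transition point.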
2025-10-08 11:05:41 +02:00
Gergely Bálint
c7d776b068
Revert "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353)
Reverts llvm/llvm-project#120064.

@gulfemsavrun reported that the patch broke toolchain builders.
2025-10-07 21:59:18 +02:00
Gergely Bálint
32eaf5b59c
[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
whether the current return address has been signed with PAC.

OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.

This patch introduces two new passes:

- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
  instruction based on OpNegateRAState CFIs in the input binary.

- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
  new OpNegateRAState CFIs where RA state changes between instructions.

Design details are described in: `bolt/docs/PacRetDesign.md`.
2025-10-07 10:22:14 +02:00
Anatoly Trosinenko
d884b55ea4
[BOLT] Introduce helpers to match MCInsts one at a time (NFC) (#138883)
Introduce a low-level instruction matching DSL to capture and/or match
the operands of an MCInst, one instruction at a time. Unlike the
existing `MCPlusBuilder::MCInstMatcher` machinery, this DSL is intended
for use cases where precise control over the instruction order is
required. For example, when validating PtrAuth hardening, all registers
are usually considered unsafe after a function call, even though
callee-saved registers should preserve their old
values _under normal operation_.

Usage example:

    // Bring the short names into the local scope:
    using namespace LowLevelInstMatcherDSL;
    // Declare the registers to capture:
    Reg Xn, Xm;
    // Capture the 0th and 1st operands, match the 2nd operand against the
    // just captured Xm register, match the 3rd operand against literal 0:
    if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)))
      return AArch64::NoRegister;
    // Match the 0th operand against Xm:
    if (!matchInst(MaybeBr, AArch64::BR, Xm))
      return AArch64::NoRegister;
    // Manually check that Xm and Xn did not match the same register:
    if (Xm.get() == Xn.get())
      return AArch64::NoRegister;
    // Return the matched register:
    return Xm.get();
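The capture-then-match behaviour of `Reg` in the example above can be modelled in a few lines. This is a standalone toy built on stated assumptions, not the DSL's implementation. In it, a register captures on first use and only compares on later uses. Instructions are reduced to an opcode plus integer operands, and literal operands such as `Imm(0)` are left out. `ToyReg`, `ToyInst`, and `toyMatch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct ToyInst {
  unsigned Opcode;
  std::vector<int> Ops; // register numbers
};

// Captures on first use; on later uses, only matches the captured value.
struct ToyReg {
  std::optional<int> Value;
  bool matchOrCapture(int Op) {
    if (!Value) {
      Value = Op;
      return true;
    }
    return *Value == Op;
  }
};

// Matches the opcode and operand count, then each operand against its Reg.
bool toyMatch(const ToyInst &I, unsigned Opcode, std::vector<ToyReg *> Regs) {
  if (I.Opcode != Opcode || I.Ops.size() != Regs.size())
    return false;
  for (std::size_t K = 0; K < Regs.size(); ++K)
    if (!Regs[K]->matchOrCapture(I.Ops[K]))
      return false;
  return true;
}
```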
2025-09-30 21:24:44 +03:00
YongKang Zhu
9e6fa39540
[BOLT][AArch64][instr] Consider targeting ARM64 CPUs without LSE support (#158738)
`stadd` is only available on arm64 CPUs with LSE support (such as the
Armv8.2-A Cortex-A55 and Cortex-A75) and is not available on older
Armv8.0-A CPUs (such as Cortex-A53 and Cortex-A73). Devices could have a
mixture of these two kinds of CPUs, so we need to provide an option for
BOLT to generate an instrumentation sequence that emulates what `stadd`
would do. The implementation puts the counter increment into an injected
helper function, so we don't need to update the CFG of the function being
instrumented, and the instrumentation-induced binary size increase is
smaller.
2025-09-25 13:18:57 -07:00
YongKang Zhu
675b01a4a3
[BOLT][AArch64][instr] Remove instructions on saving and restoring NZCV (#156994)
Remove the `NZCV` save and restore instructions from the instrumentation
sequence, because the instructions used for getting the counter address,
incrementing the counter, and pushing/popping the stack don't affect
`NZCV`. With this, we can use `X1` for the counter increment and remove
the two instructions that save and later restore `X2`.
2025-09-10 10:16:07 -07:00
YafetBeyene
244588b9d7
[BOLT][AArch64] Inlining of Memcpy (#154929)
The pass for inlining memcpy in BOLT was X86-specific and used the
instruction `rep movsb`.

This patch implements a static size analysis for AArch64 memcpy
inlining that extracts the copy size from preceding instructions and
uses it to generate the optimal width-specific load/store sequences.
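Once the size is known statically, a width-specific sequence can be derived by covering it with the widest available accesses. This sketch makes two assumptions: 16/8/4/2/1-byte accesses (for example `ldr`/`str` of q, x and w registers, then `ldrh`/`strh` and `ldrb`/`strb`) and a greedy strategy. The patch itself may choose differently.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Split a constant copy size into access widths in bytes, widest first.
std::vector<unsigned> copyChunks(uint64_t Size) {
  std::vector<unsigned> Chunks;
  for (unsigned Width : {16u, 8u, 4u, 2u, 1u}) {
    while (Size >= Width) {
      Chunks.push_back(Width);
      Size -= Width;
    }
  }
  return Chunks;
}
```

Each chunk then corresponds to one load/store pair at an increasing offset from the source and destination pointers.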
2025-09-09 14:09:23 +01:00
YongKang Zhu
93785ff4a0
[BOLT][AArch64][instr] Remove red zone clobbering protection (#156129)
We can safely remove the red zone clobbering protection from the arm64
instrumentation sequence, since there is no red zone on AArch64
ELF/Linux systems.
2025-09-03 21:43:49 -07:00
Anatoly Trosinenko
58edd27670
[BOLT] Gadget scanner: account for BRK when searching for auth oracles (#137975)
An authenticated pointer can be explicitly checked by the compiler via a
sequence of instructions that executes BRK on failure. It is important
to recognize such a BRK instruction as checking every register (as it is
expected to immediately trigger an abnormal program termination) to
prevent false-positive reports about authentication oracles:

      autia   x2, x3
      autia   x0, x1
      ; neither x0 nor x2 is checked at this point
      eor     x16, x0, x0, lsl #1
      tbz     x16, #62, on_success ; marks x0 as checked
      ; end of BB: for x2 to be checked here, it must be checked in both
      ; successor basic blocks
    on_failure:
      brk     0xc470
    on_success:
      ; x2 is checked
      ldr     x1, [x2] ; marks x2 as checked
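The rule in the comments is a meet over successors: a register is checked at the end of a block only if it is checked on entry to every successor, and a block that executes `brk` counts as checking everything. Below is a toy model with `std::bitset` using the example's blocks. Register numbering and function names are illustrative, and the per-block entry sets are supplied by hand rather than computed.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

using Regs = std::bitset<32>; // bit N stands for xN

// Registers checked on entry to a block: a block that traps via BRK is
// treated as checking every register.
Regs checkedOnEntry(bool TrapsViaBrk, Regs CheckedByBlock) {
  return TrapsViaBrk ? Regs().set() : CheckedByBlock;
}

// Checked at the end of a block = intersection over all successors.
Regs checkedAtEnd(const std::vector<Regs> &Successors) {
  if (Successors.empty())
    return Regs();
  Regs Result = Regs().set();
  for (const Regs &S : Successors)
    Result &= S;
  return Result;
}
```

Without the BRK rule, the `on_failure` successor would contribute an empty set, and nothing would be considered checked at the end of the block.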
2025-08-25 14:24:19 +03:00
Fangrui Song
244e053b6c MC: Remove llvm/MC/MCFixupKindInfo.h
The file used to define `MCFixupKindInfo`, a simple structure,
which is now in MCAsmBackend.h.
2025-07-05 11:24:11 -07:00
Fangrui Song
5b7f1c17d9 BOLT: Replace deprecated MCFixupKindInfo::FKF_IsPCRel with MCFixup::isPCRel
MCFixup::PCRel is now set at creation and the MCFixupKindInfo::FKF_IsPCRel flag
is no longer set.
2025-07-04 17:33:20 -07:00
Fangrui Song
109b7d965c MC: Remove unneeded VK_None argument to MCSymbolRefExpr::create calls
The MCSymbolRefExpr::create overload with the specifier parameter is
discouraged and being phased out. Expressions with relocation specifiers
should use MCSpecifierExpr instead.
2025-06-27 21:22:46 -07:00
Fangrui Song
30922f740e
Move relocation specifier constants to AArch64::
Rename these relocation specifier constants, aligning with the naming
convention used by other targets (`S_` instead of `VK_`).

* ELF/COFF: AArch64MCExpr::VK_ => AArch64::S_ (VK_ABS/VK_PAGE_ABS are
  also used by Mach-O as a hack)
* Mach-O: AArch64MCExpr::M_ => AArch64::S_MACHO_
* shared: AArch64MCExpr::None => AArch64::S_None

Apologies for the churn following the recent rename in #132595. This
change ensures consistency after introducing MCSpecifierExpr to replace
MCTargetSpecifier subclasses.

Pull Request: https://github.com/llvm/llvm-project/pull/144633
2025-06-24 19:06:22 -07:00
Fangrui Song
17e8465a3e
AArch64: Replace AArch64MCExpr with MCSpecifierExpr
Replace AArch64MCExpr, which encodes expressions with relocation
specifiers, with the new generic MCSpecifierExpr interface, aligning
with other targets by phasing out target-specific XXXMCExpr classes.

Temporarily convert AArch64MCExpr to a namespace to avoid renaming
`AArch64MCExpr::VK_` constants in this PR. A follow-up patch will rename
these to `AArch64::S_` to match the convention used by other targets.

Move helper functions to AArch64MCAsmInfo.h, with the goal of eventually
removing AArch64MCExpr.h.

Pull Request: https://github.com/llvm/llvm-project/pull/144632
2025-06-20 20:06:32 -07:00
Fangrui Song
cdd0a6c781 BOLT: Replace MCTargetExpr with MCSpecifierExpr to fix bolt-icf.test on aarch64 host 2025-06-07 22:35:20 -07:00
Anatoly Trosinenko
e1328fd9ad
[BOLT] Gadget scanner: clarify MCPlusBuilder callbacks interface (#136147)
Clarify the semantics of `getAuthenticatedReg` and remove a redundant
`isAuthenticationOfReg` method, as combined auth+something instructions
(such as `retaa` on AArch64) should be handled carefully, especially
when searching for authentication oracles: usually, such instructions
cannot be authentication oracles and only some of them actually write an
authenticated pointer to a register (such as "ldra x0, [x1]!").

Use `std::optional<MCPhysReg>` as the return type instead of a plain
MCPhysReg that uses `getNoRegister()` as a "not applicable" indication.

Document a few existing methods and add information about preconditions.
2025-05-26 18:31:20 +03:00
Anatoly Trosinenko
48a2836b4d
[BOLT] Gadget scanner: detect signing oracles (#134146)
Implement the detection of signing oracles. In this patch, a signing
oracle is defined as a sign instruction that accepts a "non-protected"
pointer, but for a slightly different definition of "non-protected"
compared to control flow instructions.

A second BitVector named TrustedRegs is added to the register state
computed by the data-flow analysis. The difference between the
"safe-to-dereference" and "trusted" register states is that, to make an
unsafe register trusted by authentication, one has to make sure that the
authentication succeeded. For example, on AArch64 without FEAT_PAuth2
and FEAT_EPAC, an authentication instruction produces an invalid pointer
on failure, so that a subsequent memory access triggers an error, but
re-signing such a pointer would "fix" the signature.

Note that while a separate "trusted" register state may be redundant
depending on the specific semantics of auth and sign operations, it is
still important to check signing operations: while code like this

    resign:
      autda x0, x1
      pacda x0, x2
      ret

is probably safe provided `autda` generates an error on authentication
failure, this function

    sign_anything:
      pacda x0, x1
      ret

is inherently unsafe.
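The difference between the two states can be illustrated with a toy per-register model. Its assumptions are hypothetical simplifications:
- `aut*` makes a register safe to dereference, and also trusted only when a failed authentication is guaranteed to trap (for example with FEAT_FPAC);
- an explicit check makes it trusted;
- `pac*` on a register that is not trusted is reported as a signing oracle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RegState {
  bool SafeToDeref = false;
  bool Trusted = false;
};

enum class Kind { Aut, Check, Pac };
struct ToyInst {
  Kind K;
  unsigned Reg;
};

// Returns the indices of pac instructions that sign an untrusted register.
std::vector<std::size_t> findSigningOracles(const std::vector<ToyInst> &Code,
                                            bool AutTrapsOnFailure) {
  std::vector<RegState> State(32); // all registers start untrusted
  std::vector<std::size_t> Oracles;
  for (std::size_t I = 0; I < Code.size(); ++I) {
    RegState &R = State[Code[I].Reg];
    switch (Code[I].K) {
    case Kind::Aut:
      R.SafeToDeref = true;          // a failure yields a faulting pointer
      R.Trusted = AutTrapsOnFailure; // but only a trap proves success
      break;
    case Kind::Check:
      R.Trusted = true; // explicit check that authentication succeeded
      break;
    case Kind::Pac:
      if (!R.Trusted)
        Oracles.push_back(I);
      break;
    }
  }
  return Oracles;
}
```

With this model, `resign` is reported only when authentication failures do not trap, while `sign_anything` is always reported.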
2025-05-20 13:42:53 +03:00
Fangrui Song
c239acb5b6 MCFixup: Make FixupKindInfo smaller and change getFixupKindInfo to return value
We will increase the use of raw relocation types and eliminate fixup
kinds that correspond to relocation types. The getFixupKindInfo
functions will return an rvalue instead. Let's update the return type
from a const reference to a value type.
2025-04-18 20:55:43 -07:00
Anatoly Trosinenko
8521bd2424
[BOLT][AArch64] Handle PAuth call instructions in isIndirectCall (#133227)
Handle `BLRA*` opcodes in AArch64MCPlusBuilder::isIndirectCall, update
getRegUsedAsCallDest accordingly.
2025-04-08 13:23:10 +03:00
Anatoly Trosinenko
0fc7aec349
[BOLT] Gadget scanner: detect address materialization and arithmetic (#132540)
In addition to authenticated pointers, consider the contents of a
register safe if it was
* written by PC-relative address computation
* updated by an arithmetic instruction whose input address is safe
2025-04-07 13:13:11 +03:00
Rodrigo Rocha
b9891715af
[BOLT] Handle generation of compare and jump sequences (#131949)
This patch fixes the following two issues with createCmpJE for
AArch64:
1. It avoids overwriting the value of the input register RegNo by using
XZR as the destination register:
   subs xzr, RegNo, #Imm
   which is equivalent to a simple
   cmp RegNo, #Imm
2. The condition-code operand of the Bcc instruction must be EQ instead of
#Imm.

This patch also adds a new createCmpJNE function and unit tests for
both createCmpJE and createCmpJNE for X86 and AArch64.
2025-04-03 18:34:24 -07:00
Anatoly Trosinenko
c818ae7399
[BOLT] Gadget scanner: detect non-protected indirect calls (#131899)
Implement the detection of non-protected indirect calls and branches,
similar to the pac-ret scanner.
2025-04-03 16:40:34 +03:00
Maksim Panchenko
b2d272ccfb
[BOLT][X86] Fix getTargetSymbol() (#133834)
In 96e5ee2, I inadvertently broke the way non-trivial symbol references
got updated from non-optimized code. The breakage was a consequence of
`getTargetSymbol(MCExpr *)` not returning a symbol when the parameter
was a binary expression. Fix `getTargetSymbol()` to cover such cases.
2025-03-31 18:31:33 -07:00