In this patch I am adding the missing target hooks required for the
liveness analysis to run on AArch64. These are
- getFlagsReg()
- getRegsUsedAsParams()
- getDefaultLiveOut()
- getGPRegs()
- isCleanRegXOR()
I am also introducing the following API in LivenessAnalysis
- BitVector getLiveIn/Out(const MCInst &)
- MCPhysReg scavengeRegFromState(BitVector &)
My intention is to allow the LongJmp pass scavenge usable registers when
injecting code.
When applying BTI fixups to indirect branch targets, ignored functions
are
considered as a special case:
- these hold no instructions,
- have no CFG,
- and are not emitted in the new text section.
The solution is to patch the entry points in the original location.
If such a situation occurs in a binary, recompilation using the
-fpatchable-function-entry flag is required. This will place a nop at
all
function starts, which BOLT can use to patch the original section.
Without the extra nop, BOLT cannot safely patch the original .text
section.
An alternative solution could be to also ignore the function from which
the stub starts. This has not been tried as LongJmp pass - where most
stubs are inserted - is currently not equipped to ignore functions.
Testing: both the success and failure cases are covered with lit tests.
The Armv9.6-A compare-and-branch instructions use a short range 9-bit
immediate value. They do not have a corresponding relocation type in the
ABI. For now we only support them in compact code model, with
diagnostics added in the LongJmp pass to ensure this condition. Some
interesting edge cases we cover:
- function splitting works when target is within or beyond the 1KB range
of those instructions,
- but doesn't work beyond the 128MB limit of the compact code model
- branch inversion works with block reordering so long as the immediate
value adjustments remain in bounds
This patch moves the applyBTIFixup from LongJmp pass to MCPlusBuilder.
This refactor allows applyBTIFixup to be called from other passes
inserting indirect branches, such as:
- Hugify,
- PatchEntries.
As different passes have different information about their targets (e.g.
target BasicBlock, target Symbol, target Function), specialized versions
are created (applyBTIFixupToSymbol, applyBTIFixupToTarget), and each
calls
applyBTIFixupCommon, which implements the original logic from before.
Names of related lit tests are updated to have the "bti" prefix.
This patch adds the patchPLTEntryForBTI to enable patching PLT entries
generated by LLD.
## Context:
To keep BTI consistent, targets of stubs inserted in LongJmp need to be
patched. As PLTs are not optimized and emitted by BOLT, this patch adds
a helper for patching them in the original .plt section.
For PLTs generated by LLD, this is safe as LLD inserts extra nops to
PLTs which don't already contain a BTI.
PLT entry before patching:
```
adrp x16, Page(&(.got.plt[n]))
ldr x17, [x16, Offset(&(.got.plt[n]))]
add x16, x16, Offset(&(.got.plt[n]))
br x17
nop
nop
```
PLT entry after patching:
```
bti c
adrp x16, Page(&(.got.plt[n]))
ldr x17, [x16, Offset(&(.got.plt[n]))]
add x16, x16, Offset(&(.got.plt[n]))
br x17
nop
```
## Safety considerations:
The PLT entry can become incorrect if shifting the ADRP moves it
across a page boundary.
The PLT entry is 24 bytes, and page size is 4096 (or 16384) bytes.
Their GCD is 8 bytes, meaning that shifting the ADRP is safe, as long as
it's shifted by less than 8 bytes.
The introduced function only shifts the ADRP by one instruction (4
bytes),
meaning there is no need to recompute the ADRP offset.
A build with LLVM_USE_SANITIZER=Undefined showed:
bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp:2277:60:
runtime error: left shift of negative value -32768
This showed up in bolt/test/AArch64/veneer-lite-mode.s.
It is valid for ADRP's operand to be negative, and not valid to shift it
like that. To perform this shift reliably, cast the value to unsigned.
Several BTI-related functions are checking that a call MCInst has one
non-annotation operand.
This patch changes these checks to use MCPlus::getNumPrimeOperands,
instead of getNumOperands.
Testing:
added annotations to existing gtests to serve as regression
tests. These now also explicitly check getNumOperands and getNumPrimeOperands
usage on the annotated MCInsts.
There's no need for a full definition of `BinaryBasicBlock` in
`MCPlusBuilder.h`. Use `InstructionListType::iterator` instead of
`BinaryBasicBlock::iterator` in `findMemcpySizeInBytes()`.
- Add an enum to encode BTI variants in function arguments.
- Remove updateBTIVariant as createBTI can be used for the same
purpose.
- Remove a test case that checked against invalid BTI variants, as
those are now unrepresentable.
few tests are broken on ubuntu, need find out the cause
This reverts commit 999c9382571d6aadf9b786263862bf4085dd2dba.
Co-authored-by: yavtuk <yavtuk@ya.ru>
This patch fixed the issue related to load literal
for AArch64 (bolt/test/AArch64/materialize-constant.s),
address range for literal is limited +/- 1MB,
emitCI puts the constants by the end of function and
the one is out of available range.
SimplifyRODataLoads is enabled by default for X86 & AArch64
If a basic block contains a load into LR from stack and has no
instruction saving LR onto stack, we just assume the basic block
is an epilogue.
This is not meant to accurately recognize epilogue in all possible
cases, but to have BOLT be conservative on treating basic block as
epilogue and then turning indirect branch with unknown control flow
to tail call.
This function contains most of the logic for BTI:
- it takes the BasicBlock and the instruction used to jump to it.
- Then it checks if the first non-pseudo instruction is a sufficient
landing pad for the used call.
- if not, it generates the correct BTI instruction.
Also introduce the isCallCoveredByBTI helper to simplify the logic.
Previously, nop was treated as just an alias for hint #0. The
consequence of that was that all the general rules for hint instructions
applied to nop too, in particular that during binary analysis, they were
assumed to have unknown effects. This commit adds AArch64::NOP as a
standalone instruction with no side effects.
The scheduling update in A55-load-store-alias.s is probably not entirely
accurate, but should be more accurate than the previous result.
Checks if an instruction is BTI, and updates the immediate value to the
newly requested variant.
This can be used in situations when the compiler already inserted a BTI
landing pad to a location, but BOLT needs to update it to a different
variant.
Example: br x0 to a location with a BTI c.
- creates a BTI j|c landing pad MCInst.
- create getBTIHintNum utility in AArch64/Utils, to make sure BOLT
generates BTI immediates the same way as LLVM.
- add MCPlusBuilder unittests to cover new function.
The inliner uses DirectSP to check if a function has instructions that
modify the SP. Exceptions are stack Push and Pop instructions.
We can also allow pointer signing and authenticating instructions.
The inliner removes the Return instructions from the inlined functions.
If it is a fused pointer-authentication-and-return (e.g. RETAA), we have
to generate a new authentication instruction.
Add more heuristics to check if a basic block is an AArch64 epilogue. We
assume instructions that load from stack or adjust stack pointer as
valid epilogue code sequence if and only if they immediately precede the
branch instruction that ends the basic block.
`R_AARCH64_ADD_ABS_LO12_NC` is for the `ADD` instruction in the
`ADRP+ADD` sequence. For `ADRP+LDR` sequence generated in LDR
relaxation, relocation type for `LDR` should be
`R_AARCH64_LDST64_ABS_LO12_NC` if it is 64-bit integer load or
`R_AARCH64_LDST32_ABS_LO12_NC` if 32-bit.
Sorry should have included this in #165787.
The search should proceed from CallInst to the beginning of BB since X2
can be rewritten and we need to catch the most recent write before the
call.
Patch by Yafet Beyene alulayafet@gmail.com
Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`,
which, besides the existing ADR relaxation, will also run LDR relaxation
that for now only handles these two forms of LDR instructions:
`ldr Xt, [label]` and `ldr Wt, [label]`.
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing
binaries with pac-ret hardening (#120064)" (#162353)
This reverts commit c7d776b06897567e2d698e447d80279664b67d47.
#120064 was reverted for breaking builders.
Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.
---
Original message:
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
whether the current return address has been signed with PAC.
OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.
This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
new OpNegateRAState CFIs where RA state changes between instructions.
Design details are described in: `bolt/docs/PacRetDesign.md`.
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
if the current return address has been signed with PAC.
OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.
This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
new OpNegateRAState CFIs where RA state changes between instructions.
Design details are described in: `bolt/docs/PacRetDesign.md`.
Introduce a low-level instruction matching DSL to capture and/or match
the operands of MCInst, single instruction at a time. Unlike the
existing `MCPlusBuilder::MCInstMatcher` machinery, this DSL is intended
for the use cases when the precise control over the instruction order is
required. For example, when validating PtrAuth hardening, all registers
are usually considered unsafe after a function call, even though
callee-saved registers should preserve their old
values _under normal operation_.
Usage example:
// Bring the short names into the local scope:
using namespace LowLevelInstMatcherDSL;
// Declare the registers to capture:
Reg Xn, Xm;
// Capture the 0th and 1st operands, match the 2nd operand against the
// just captured Xm register, match the 3rd operand against literal 0:
if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0))
return AArch64::NoRegister;
// Match the 0th operand against Xm:
if (!matchInst(MaybeBr, AArch64::BR, Xm))
return AArch64::NoRegister;
// Manually check that Xm and Xn did not match the same register:
if (Xm.get() == Xn.get())
return AArch64::NoRegister;
// Return the matched register:
return Xm.get();
`stadd` is only available in recent arm64 CPUs that have LSE support
(like Cortex-A73 and Cortex-A75) and is not available on old arm64 CPUs
(like Cortex-A53 and Cortex-A55). Devices could have a mixture of these
two kinds of CPUs, for which we need to provide an option for BOLT to
generate instrumentation sequence that emulates what `stadd` would do.
The implementation puts counter increment into an injected helper function
so we don't need to update CFG in the function that is being instrumented
and instrumentation induced binary size increase will be smaller.
Remove the `NZCV` save and restore instructions from instrumentation
sequence because the instructions used for getting counter address,
counter increment and stack push/pop won't impact `NZCV`. And with
this, we can use `X1` to do counter increment and remove the two
instructions that saves and later restores `X2`.
The pass for inlining memcpy in BOLT was currently X86-specific and was
using the instruction `rep movsb`.
This patch implements a static size analysis system for AArch64 memcpy
inlining that extracts copy sizes from preceding instructions to then
use it to generate the optimal width-specific load/store sequences.
An authenticated pointer can be explicitly checked by the compiler via a
sequence of instructions that executes BRK on failure. It is important
to recognize such BRK instruction as checking every register (as it is
expected to immediately trigger an abnormal program termination) to
prevent false positive reports about authentication oracles:
autia x2, x3
autia x0, x1
; neither x0 nor x2 are checked at this point
eor x16, x0, x0, lsl #1
tbz x16, #62, on_success ; marks x0 as checked
; end of BB: for x2 to be checked here, it must be checked in both
; successor basic blocks
on_failure:
brk 0xc470
on_success:
; x2 is checked
ldr x1, [x2] ; marks x2 as checked
The MCSymbolRefExpr::create overload with the specifier parameter is
discouraged and being phased out. Expressions with relocation specifiers
should use MCSpecifierExpr instead.
Rename these relocation specifier constants, aligning with the naming
convention used by other targets (`S_` instead of `VK_`).
* ELF/COFF: AArch64MCExpr::VK_ => AArch64::S_ (VK_ABS/VK_PAGE_ABS are
also used by Mach-O as a hack)
* Mach-O: AArch64MCExpr::M_ => AArch64::S_MACHO_
* shared: AArch64MCExpr::None => AArch64::S_None
Apologies for the churn following the recent rename in #132595. This
change ensures consistency after introducing MCSpecifierExpr to replace
MCTargetSpecifier subclasses.
Pull Request: https://github.com/llvm/llvm-project/pull/144633
Replace AArch64MCExpr, which encodes expressions with relocation
specifiers, with the new generic MCSpecifierExpr interface, aligning
with other targets by phasing out target-specific XXXMCExpr classes.
Temporarily convert AArch64MCExpr to a namespace to avoid renaming
`AArch64MCExpr::VK_` constants in this PR. A follow-up patch will rename
these to `AArch64::S_` to match the convention used by other targets.
Move helper functions to AArch64MCAsmInfo.h, with the goal of eventually
removing AArch64MCExpr.h.
Pull Request: https://github.com/llvm/llvm-project/pull/144632
Clarify the semantics of `getAuthenticatedReg` and remove a redundant
`isAuthenticationOfReg` method, as combined auth+something instructions
(such as `retaa` on AArch64) should be handled carefully, especially
when searching for authentication oracles: usually, such instructions
cannot be authentication oracles and only some of them actually write an
authenticated pointer to a register (such as "ldra x0, [x1]!").
Use `std::optional<MCPhysReg>` returned type instead of plain MCPhysReg
and returning `getNoRegister()` as a "not applicable" indication.
Document a few existing methods, add information about preconditions.
Implement the detection of signing oracles. In this patch, a signing
oracle is defined as a sign instruction that accepts a "non-protected"
pointer, but for a slightly different definition of "non-protected"
compared to control flow instructions.
A second BitVector named TrustedRegs is added to the register state
computed by the data-flow analysis. The difference between a
"safe-to-dereference" and a "trusted" register states is that to make
an unsafe register trusted by authentication, one has to make sure
that the authentication succeeded. For example, on AArch64 without
FEAT_PAuth2 and FEAT_EPAC, an authentication instruction produces an
invalid pointer on failure, so that subsequent memory access triggers
an error, but re-signing such pointer would "fix" the signature.
Note that while a separate "trusted" register state may be redundant
depending on the specific semantics of auth and sign operations, it is
still important to check signing operations: while code like this
resign:
autda x0, x1
pacda x0, x2
ret
is probably safe provided `autda` generates an error on authentication
failure, this function
sign_anything:
pacda x0, x1
ret
is inherently unsafe.
We will increase the use of raw relocation types and eliminate fixup
kinds that correspond to relocation types. The getFixupKindInfo
functions will return an rvalue instead. Let's update the return type
from a const reference to a value type.
In addition to authenticated pointers, consider the contents of a
register safe if it was
* written by PC-relative address computation
* updated by an arithmetic instruction whose input address is safe
This patch fixes the following two issues with the createCmpJE for
AArch64:
1. Avoids overwriting the value of the input register RegNo by use XZR
as the destination register.
subs xzr, RegNo, #Imm
which is equivalent to a simple
cmp RegNo, #Imm
2. The immediate operand to the Bcc instruction must be EQ instead of
#Imm.
This patch also adds a new function for createCmpJNE and unit tests for
the both createCmpJE and createCmpJNE for X86 and AArch64.
In 96e5ee2, I inadvertently broke the way non-trivial symbol references
got updated from non-optimized code. The breakage was a consequence of
`getTargetSymbol(MCExpr *)` not returning a symbol when the parameter
was a binary expression. Fix `getTargetSymbol()` to cover such cases.