StoreSwiftAsyncContext clobbers X16 & X17. Make sure they are available
in canUseAsPrologue, to avoid shrink wrapping moving the pseudo to a
place where X16 or X17 are live.
Add support for probing for dynamic allocas (variable-size objects and
outgoing stack arguments).
Co-authored-by: Oliver Stannard <oliver.stannard@linaro.org>
This adds code to AArch64 function prologues to protect against stack
clash attacks by probing (writing to) the stack at regular enough
intervals to ensure that the guard page cannot be skipped over.
The patch depends on and maintains the following invariants:
Upon function entry the caller guarantees that it has probed the stack
(e.g. performed a store) at some address [sp, #N], where`0 <= N <=
1024`. This invariant comes from a requirement for compatibility with
GCC. Any address range in the allocated stack, no smaller than
stack-probe-size bytes contains at least one probe At any time the stack
pointer is above or in the guard page Probes are performed in
descreasing address order
The stack-probe-size is a function attribute that can be set by a
platform to correspond to the guard page size.
By default, the stack probe size is 4KiB, which is a safe default as
this is the smallest possible page size for AArch64. Linux uses a 64KiB
guard for AArch64, so this can be overridden by the stack-probe-size
function attribute.
For small frames without a frame pointer (<= 240 bytes), no probes are
needed.
For larger frame sizes, LLVM always stores x29 to the stack. This serves
as an implicit stack probe. Thus, while allocating stack objects the
compiler assumes that the stack has been probed at [sp].
There are multiple probing sequences that can be emitted, depending on
the size of the stack allocation:
A straight-line sequence of subtracts and stores, used when the
allocation size is smaller than 5 guard pages. A loop allocating and
probing one page size per iteration, plus at most a single probe to deal
with the remainder, used when the allocation size is larger but still
known at compile time. A loop which moves the SP down to the target
value held in a register (or a loop, moving a scratch register to the
target value help in SP), used when the allocation size is not known at
compile-time, such as when allocating space for SVE values, or when
over-aligning the stack. This is emitted in AArch64InstrInfo because it
will also be used for dynamic allocas in a future patch. A single probe
where the amount of stack adjustment is unknown, but is known to be less
than or equal to a page size.
---------
Co-authored-by: Oliver Stannard <oliver.stannard@linaro.org>
The base change here is to change getMemOperandWithOffsetWidth to return
a TypeSize Width, which in turn allows areMemAccessesTriviallyDisjoint
to reason about trivially disjoint widths.
This patch extends AArch64FrameLowering::emitProglogue to check if the
inserted prologue clobbers live registers.
It updates `llvm/test/CodeGen/AArch64/framelayout-scavengingslot.mir`
with an extra load to make x9 live before the store, preserving the
original test.
It uses the original
`llvm/test/CodeGen/AArch64/framelayout-scavengingslot.mir` as
`llvm/test/CodeGen/AArch64/emit-prologue-clobber-verification.mir`,
because there x9 is marked as live on entry, but used as scratch reg as
it is not callee saved.
The new assertion catches a mis-compile in
`store-swift-async-context-clobber-live-reg.ll` on
https://github.com/apple/llvm-project/tree/next
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
Factor out some stack allocation in a separate function. This patch
splits out the generic portion of a larger refactoring done as a part of
stack clash protection support.
The patch is almost, but not quite NFC. The only difference should
be that where we have adjacent allocation of stack space
for local SVE objects and non-local SVE objects the order
of `sub sp, ...` and `addvl sp, ...` instructions is reversed, because now
it's done with a single call to `emitFrameOffset` and it happens
add/subtract the fixed part before the scalable part, e.g.
addvl sp, sp, #-2
sub sp, sp, #16, lsl #12
sub sp, sp, #16
becomes
sub sp, sp, #16, lsl #12
sub sp, sp, #16
addvl sp, sp, #-2
The tryMergeAdjacentSTG function tries to merge multiple
stg/st2g/stg_loop instructions. It doesn't verify the liveness of NZCV
flag before moving around STGloop which also alters NZCV flags. This was
not issue before the patch 5e612bc as these stack tag stores does not
alter the NZCV flags. But after the change, this merge function leads to
miscompilation because of control flow change in instructions. Added the
check to to see if the first instruction after insert point reads or
writes to NZCV flag and it's liveout state. This check happens after the
filling of merge list just before merge and bails out if necessary.
Track the live register state immediately before, instead of after,
MBBI. This makes it simple to track the state at the start or end of a
basic block without a separate (and poorly named) Tracking flag.
This changes the API of the backward(MachineBasicBlock::iterator I)
method, which now recedes to the state just before, instead of just
after, *I. Some clients are simplified by this change.
There is one small functional change shown in the lit tests where
multiple spilled registers all need to be reloaded before the same
instruction. The reloads will now be inserted in the opposite order.
This should not affect correctness.
If a function has odd number of same type of registers to save, and the
calling convention also requires odd number of such type of CSRs, an FP
register would be accidentally marked as saved when producePairRegisters
returns true.
This patch also fixes the AArch64LowerHomogeneousPrologEpilog pass not
handling AArch64::NoRegister; actually this pass must be fixed along
with the register pairing so i can write a test for it.
When performing a tail call, check the value of LR register after
authentication to prevent the callee from signing and spilling an
untrusted value. This commit implements a few variants of check,
more can be added later.
If it is safe to assume that executable pages are always readable,
LR can be checked just by dereferencing the LR value via LDR.
As an alternative, LR can be checked as follows:
; lowered AUT* instruction
; <some variant of check that LR contains a valid address>
b.cond break_block
ret_block:
; lowered TCRETURN
break_block:
brk 0xc471
As the existing methods either break the compatibility with execute-only
memory mappings or can degrade the performance, they are disabled by
default and can be explicitly enabled with a command line option.
Individual subtargets can opt-in to use one of the available methods
by updating AArch64FrameLowering::getAuthenticatedLRCheckMethod().
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D156716
…tasync argument
swiftasync introduces a number of frame adjustments which is
incompatible with current implementation of HomogeneousPrologEpilog
pass.
To simplify handling PAuth in the machine outliner, introduce a
separate AArch64PointerAuth pass that is executed after both
Prologue/Epilogue Inserter and Machine Outliner passes.
After moving to AArch64PointerAuth, signLR and authenticateLR are
not used outside of their class anymore, so make them private and
simplify accordingly.
The new pass is added via AArch64PassConfig::addPostBBSections(),
so that it can change the code size before branch relaxation occurs.
AArch64BranchTargets is placed there too, so it can take into account
any PACI(A|B)SP instructions and not excessively add BTIs at the start
of functions.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D159357
Place the architecuture-specific logic to clear registers in a single
place and call it via a TargetInstrInfo method.
This will allow one to add instructions to clear registers holding the
stack protector guard value before return, but do it in
non-architecture-specific code.
If no frame-pointer is available and the compiler has scavenged a
spill-slot in the callee-save area, the compiler may be forced to emit an
'addvl' inside the streaming-mode-changing call sequence when it needs to
fill (reload) an FP register being passed to the call.
We can avoid this entirely by disabling stack-slot scavenging when there
are streaming-mode-changing call-sequences in the function.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D159196
but there are some in the epilogue.
Make a decision whether or not to have a startepilogue/endepilogue
based on whether we actually insert SEH opcodes in the epilogue,
rather than whether we had SEH opcodes in the prologue or not.
This fixes an assert failure when there are no SEH opcodes in the
prologue but there are SEH opcodes in the epilogue (for example, when
there is no stack frame but there are stack arguments) which was not
covered in https://reviews.llvm.org/D88641.
Assertion failed: HasWinCFI == MF.hasWinCFI(), file C:\Users\hiroshi\llvm-project\llvm\lib\Target\AArch64\AArch64FrameLowering.cpp, line 1988
Differential Revision: https://reviews.llvm.org/D159238
In the machine outliner implementation for AArch64, `signOutlinedFunction()`
reimplements signing the LR value in prologue and authenticating it in
epilogue of the outlined function. This patch factors out `signLR()` and
`authenticateLR()` functions from AArch64FrameLowering code and reuses
them in `signOutlinedFunction()`.
The `mergeOutliningCandidateAttributes()` outliner callback is
introduced as well to further unify signing and authentication of the LR
value.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D157320
When generating unwind tables for code which uses return-address
signing, we need to toggle the RA_SIGN_STATE DWARF register around any
tail-calls, because these require the return address to be authenticated
before the call, and could throw an exception. This is done using the
.cfi_negate_ra_state directive before the call, and .cfi_restore_state
at the start of the next basic block.
However, since D153098, the .cfi_restore_state isn't being inserted,
because the CFIFixup pass isn't being run. This re-enables that pass
when return-adress signing is enabled.
Reviewed By: ikudrin, MaskRay
Differential Revision: https://reviews.llvm.org/D156428
This fixes a code gen issue where savings the swift async context
register (x22) accidentally overwrites the saved value of another
callee-saved register, corrupts its value and causes a crash.
Differential Revision: https://reviews.llvm.org/D156391
This reverts commit b1d0bc0f4395c69097bc11b6ba8f821f621272a9.
Builds with expensive checks show that 'sp' isn't a valid register
in ADDXrr - an object file built without exprnsive checks enabled
disassembles as "add x15, xzr, x16", instead of the intended
"add x15, sp, x16".
The instruction-precise, or asynchronous, unwind tables usually take up
much more space than the synchronous ones. If a user is concerned about
the load size of the program and does not need the features provided
with the asynchronous tables, the compiler should be able to generate
the more compact variant.
This patch changes the generation of CFI instructions for these cases so
that they all come in one chunk in the prolog; it emits only one
`.cfi_def_cfa*` instruction followed by `.cfi_offset` ones after all
stack adjustments and register spills, and avoids generating CFI
instructions in the epilog(s) as well as any other exceeding CFI
instructions like `.cfi_remember_state` and `.cfi_restore_state`.
Effectively, it reverses the effects of D111411 and D114545 on functions
with the `uwtable(sync)` attribute. As a side effect, it also restores
the behavior on functions that have neither `uwtable` nor `nounwind`
attributes.
Differential Revision: https://reviews.llvm.org/D153098
Clang accepts preserve_all for AArch64 while it is missing form the backed.
Fixes#58145
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D135652
STG, STZG, ST2G, STZ2G are the exceptions to append 'Offset' to name the
offset format of load/store instructions. All other load/store
instructions use 'i' as the appendix. If there is no special reason to
do so, we should make the naming consistent.
Differential Revision: https://reviews.llvm.org/D141819
Linux kernel sets SCTRL_EL1.BT0 and BT1 to 1 unconditionally, which
makes PACIASP equivalent to BTI C + PACIA LR,SP.
Use the shorter instruction sequence by default.
I'm not aware of anyone who needs the opposite. They are welcome to
revert to the current behavior under a subtarget feature or an
environment check.
This reverts commit 571c8c5263a79293aaadae07b11feb36726eaf53.
Differential Revision: https://reviews.llvm.org/D141978
The most common case for string attributes parses them as integers. We
don't have a convenient way to do this, and as a result we have
inconsistent missing attribute and invalid attribute handling
scattered around. We also have inconsistent radix usage to
getAsInteger; some places use the default 0 and others use base 10.
Update a few of the uses, but there are quite a lot of these.
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
The last use of emitCalleeSavedFrameMoves was removed on March 24,
2022 in commit 50a97aacacf689f838451439d913421d608e1bed.
Differential Revision: https://reviews.llvm.org/D138388
All instructions that can raise fp exceptions also read FPCR, with the
only other instructions that interact with it being the MSR/MRS to
write/read FPCR.
Introducing an FPCR register also requires adjusting
invalidateWindowsRegisterPairing in AArch64FrameLowering.cpp to use
the encoded value of registers instead of their enum value, as the
enum value is based on the alphabetical order of register names and
now FPCR is placed between FP and LR.
This change unfortunately means a large number of mir tests need to
be adjusted due to instructions now requiring an implicit fpcr operand
to be present.
Differential Revision: https://reviews.llvm.org/D121929
Whenever a call to __chkstk was made, the frame lowering previously
omitted the aligning (as NumBytes was reset to zero before doing
alignment).
This fixes https://github.com/llvm/llvm-project/issues/56182.
The initial version of this produced invalid code for small
functions with no local stack allocations, if those functions
were marked with the "stackrealign" attribute. If building
with -mstack-alignment=16 (which otherwise mostly would be a
no-op), this attribute is added on the main function.
Differential Revision: https://reviews.llvm.org/D135687
This reverts commit 50e0aced4521260af842dba73f1d8c50d36314ea.
This could accidentally start producing invalid code in some
cases (in particular, if compiling with -mstack-alignment=16, which
one could expect to be a no-op for a target where the stack always
is aligned to 16 bytes anyway).
If the stack is realigned, we've emitted a frame pointer and
already terminated the SEH prologue, making this dead code since
a07787c9a50c046e45921dd665f5a53a752bbc31.
The immediate to this SEH opcode was entirely bogus - we don't
know how many bytes the AND operation adjusts the SP, and by
doing "NumBytes & andMaskEncoded" (where andMaskEncoded was the
immediate bitpattern for the AND instruction), the immediate to the
opcode was total gibberish.
This hasn't had any practical effect, since the original stack
pointer always was restored from the frame pointer afterwards anyway.
Differential Revision: https://reviews.llvm.org/D135815
Without this, unwinding through functions that does use PAC
would fail, if PAC actually was active.
Differential Revision: https://reviews.llvm.org/D135103
After setting up the FP, the rest of the prologue doesn't need to
be replayed for unwinding the stack frame.
This allows reverting the functional parts of
2f7fbf837625267193351cc334e506a3a9161958 (but fixing inconsistent
duplicate setting of HasWinCFI).
Differential Revision: https://reviews.llvm.org/D135686
When returning from a function with both SCS and PAC-RET enabled, we need to
authenticate the return address from the stack and then load from the SCS,
but this was happening in the reverse order when RETA[AB] were being used.
Fix it by disabling the use of RETA[AB] when SCS is enabled.
Fixes pr58072.
Differential Revision: https://reviews.llvm.org/D134931
Part of initial Arm64EC patchset.
Arm64EC code needs to use functions with a different name, to avoid
using the x64 versions.
Differential Revision: https://reviews.llvm.org/D125417