With -fomit-frame-pointer, even if we set up a frame pointer for other
reasons (e.g. variable-sized or over-aligned stack allocations), we
don't need to create an ABI-compliant frame record. This means that we
can save all of the general-purpose registers in one push, instead of
splitting it to ensure that the frame pointer and link register are
adjacent on the stack, saving two instructions per function.
When using AAPCS-compliant frame chains with PACBTI return address
signing, there ware a number of bugs in the generation of the frame
pointer and function prologues. The most obvious was that we sometimes
would modify r11 before pushing it to the stack, so it wasn't preserved
as required by the PCS. We also sometimes did not push R11 and LR
adjacent to one another on the stack, or used R11 as a frame pointer
without pointing it at the saved value of R11, both of which are
required to have an AAPCS compliant frame chain.
The original work of this patch was done by James Westwood, reviewed as
#82801 and #81249, with some tidy-ups done by Mark Murray and myself.
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
When using AAPCS-compliant frame chains with PACBTI return address
signing, there ware a number of bugs in the generation of the frame
pointer and function prologues. The most obvious was that we sometimes
would modify r11 before pushing it to the stack, so it wasn't preserved
as required by the PCS. We also sometimes did not push R11 and LR
adjacent to one another on the stack, or used R11 as a frame pointer
without pointing it at the saved value of R11, both of which are
required to have an AAPCS compliant frame chain.
The original work of this patch was done by James Westwood, reviewed as
#82801 and #81249, with some tidy-ups done by Mark Murray and myself.
These used a set of callback functions to check which callee-save area a
register is in, refactor them to use the same data as other parts of
ARMFrameLowering. This will make it easier to add a new variant to the
register splitting.
There were multiple loops in ARMFrameLowering which sort the callee
saved registers into spill areas, which were hard to understand and
modify. This splits the information about which register is in which
save area into a separate function.
We have two different ways of splitting the pushes of callee-saved
registers onto the stack, controlled by the confusingly similar names
STI.splitFramePushPop() and STI.splitFramePointerPush(). This removes
those functions and replaces them with a single function which returns
an enum. This is in preparation for adding another value to that enum.
The original work of this patch was done by James Westwood, reviewed as
#82801 and #81249, with some tidy-ups done by Mark Murray and myself.
Rebase of #84114. I've only included the core changes to frame layout
calculation & CFI generation which sidesteps the regressions found after
merging #84114. Since these changes are a necessary precursor to the
overall fix and are themselves slightly beneficial as CFI is now
generated correctly, I think it is reasonable to merge this first step.
---
For very large stack frames, the offset from the stack pointer to a
local can be more than 2^31 which overflows various `int` offsets in the
frame lowering code.
This patch updates the frame lowering code to calculate the offsets as
64-bit values and fixes CFI to use the corrected sizes.
After this patch, additional work is needed to fix offset truncations in
each target's codegen.
MachineFunction's probably should not include a backreference to
the owning MachineModuleInfo. Most of these references were used
just to query the MCContext, which MachineFunction already directly
stores. Other contexts are using it to query the LLVMContext, which
can already be accessed through the IR function reference.
When using the -mframe-chain=aapcs or -mframe-chain=aapcs-leaf options,
we cannot use r11 as an allocatable register, even if
-fomit-frame-pointer is also used. This is so that r11 will always point
to a valid frame record, even if we don't create one in every function.
When compiling for thumbv8.1m with +pacbti and making an indirect tail
call, the compiler was free to put the function pointer into R12.
This is incorrect because R12 is restored to contain authentication code
for the caller's return address.
This patch excludes R12 from the set of registers the compiler can put
the function pointer in.
Fixes https://github.com/llvm/llvm-project/issues/75998
Fixes#82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.
This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.
Fixes#48911
… AAPCS frame chain fix (#82801)"
This reverts commit 00e4a4197137410129d4725ffb82bae9ce44bdde. This patch
was found to cause miscompilations and compilation failures.
When code for M class architecture was compiled with AAPCS and PAC
enabled, the frame pointer, r11, was not pushed to the stack adjacent to
the link register. Due to PAC being enabled, r12 was placed between r11
and lr. This patch fixes this by adding an extra case to the already
existing code that splits the GPR push in two when R11 is the frame
pointer and certain paremeters are met. The differential revision for
this previous change can be found here:
https://reviews.llvm.org/D125649. This now ensures that r11 and lr are
pushed in a separate push instruction to the other GPRs when PAC and
AAPCS are enabled, meaning the frame pointer and link register are now
pushed onto the stack adjacent to each other.
PR #75527 fixed ARMFrameLowering to set the IsRestored flag for LR based
on all of the return instructions in the function, not just one.
However, there is also code in ARMLoadStoreOptimizer which changes
return instructions, but it set IsRestored based on the one instruction
it changed, not the whole function.
The fix is to factor out the code added in #75527, and also call it from
ARMLoadStoreOptimizer if it made a change to return instructions.
Fixes#80287.
emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of there terminators uses LR implicitly, it is not
save to clear the Restored bit.
Check all terminators for the function before clearing Restored.
This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.
Alternative to #73553
R12 is callee-saved in functions with pacbti-m enabled, but this is
done in assignCalleeSavedSpillSlots, meaning that in
determineCalleeSaves we have to manually set CanEliminateFrame.
This fixes a bug where in leaf functions with no other callee-saved
registers the aut instruction wouldn't be emitted and stack offsets
of arguments passed on the stack would be incorrect.
Differential Revision: https://reviews.llvm.org/D157865
When resolving a frame index with a large offset for v6M execute-only,
we emit a tMOVimm32 pseudo-instruction, which later gets lowered to a
sequence of instructions, all of which are flag-setting. However, a
frame index may be generated for a register spill or reload instruction,
which can be inserted at a point where CPSR is live. This patch inserts
MRS and MSR instructions around the tMOVimm32 to save and restore the
value of CPSR, if CPSR is live at that point.
This may need up to two virtual registers (one to build the immediate
value, one to save CPSR) during frame index lowering, which happens
after register allocation, so we need to ensure two spill slots are
avilable to the register scavenger to ensure it can free up enough
registers for this.
There is no test for the emission (or not) of the MRS/MSR pair, because
it requires a spill or reload to be inserted at a point where CPSR is
live, which requires a large, complex function and is fragile enough
that any optimisation changes will break the test. This bug was easily
found by csmith with -verify-machineinstrs, which I now run regularly on
v6M execute-only (and many other combinations).
Patch by John Brawn and myself.
Reviewed By: stuij
Differential Revision: https://reviews.llvm.org/D158404
Using segmented stacks with execute-only mostly works, but we need to
use the correct movi32 opcode in 6-M, and there's one place where for
thumb1 (i.e. 6-M and 8-M.base) a constant pool was unconditionally
used which needed to be fixed.
Differential Revision: https://reviews.llvm.org/D156339
Assuming that the stack grows downwards, it is fine if the stack
pointer is exactly at the stacklet boundary. We should use
less-or-equal condition when deciding whether to skip new memory
allocation.
Differential Revision: https://reviews.llvm.org/D149315
The most common case for string attributes parses them as integers. We
don't have a convenient way to do this, and as a result we have
inconsistent missing attribute and invalid attribute handling
scattered around. We also have inconsistent radix usage to
getAsInteger; some places use the default 0 and others use base 10.
Update a few of the uses, but there are quite a lot of these.
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead.
Differential Revision: https://reviews.llvm.org/D133429
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.
In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D125094
This reverts commit 7625e01d661644a560884057755d48a0da8b77b4 and
dependent cbcce82ef6b512d97e92a319a75a03e997c844e1.
Commit 7625e01d661644a560884057755d48a0da8b77b4 causes some new codegen test
failures under asan, e.g., CodeGen/ARM/execute-only.ll:
https://lab.llvm.org/buildbot/#/builders/5/builds/24659/steps/15/logs/stdio.
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.
In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D125094
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.
In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D125094
We intentionally disable Thumb2SizeReduction for SEH
prologues/epilogues, to avoid needing to guess what will happen with
the instructions in a potential future pass in frame lowering.
But for this specific case, where we know we can express the
intent with a narrow instruction, change to that instruction form
directly in frame lowering.
Differential Revision: https://reviews.llvm.org/D126949
We intentionally disable Thumb2SizeReduction for SEH
prologues/epilogues, to avoid needing to guess what will happen with
the instructions in a potential future pass in frame lowering.
But for this specific case, where we know we can express the
intent with a narrow instruction, change to that instruction form
directly in frame lowering.
Differential Revision: https://reviews.llvm.org/D126948
For functions that require restoring SP from FP (e.g. that need to
align the stack, or that have variable sized allocations), the prologue
and epilogue previously used to look like this:
push {r4-r5, r11, lr}
add r11, sp, #8
...
sub r4, r11, #8
mov sp, r4
pop {r4-r5, r11, pc}
This is problematic, because this unwinding operation (restoring sp
from r11 - offset) can't be expressed with the SEH unwind opcodes
(probably because this unwind procedure doesn't map exactly to
individual instructions; note the detour via r4 in the epilogue too).
To make unwinding work, the GPR push is split into two; the first one
pushing all other registers, and the second one pushing r11+lr, so that
r11 can be set pointing at this spot on the stack:
push {r4-r5}
push {r11, lr}
mov r11, sp
...
mov sp, r11
pop {r11, lr}
pop {r4-r5}
bx lr
For the same setup, MSVC generates code that uses two registers;
r11 still pointing at the {r11,lr} pair, but a separate register
used for restoring the stack at the end:
push {r4-r5, r7, r11, lr}
add r11, sp, #12
mov r7, sp
...
mov sp, r7
pop {r4-r5, r7, r11, pc}
For cases with clobbered float/vector registers, they are pushed
after the GPRs, before the {r11,lr} pair.
Differential Revision: https://reviews.llvm.org/D125649
Skip inserting regular CFI instructions if using WinCFI.
This is based a fair amount on the corresponding ARM64 implementation,
but instead of trying to insert the SEH opcodes one by one where
we generate other prolog/epilog instructions, we try to walk over the
whole prolog/epilog range and insert them. This is done because in
many cases, the exact number of instructions inserted is abstracted
away deeper.
For some cases, we manually insert specific SEH opcodes directly where
instructions are generated, where the automatic mapping of instructions
to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes).
Skip Thumb2SizeReduction for SEH prologs/epilogs, and force
tail calls to wide instructions (just like on MachO), to make sure
that the unwind info actually matches the width of the final
instructions, without heuristics about what later passes will do.
Mark SEH instructions as scheduling boundaries, to make sure that they
aren't reordered away from the instruction they describe by
PostRAScheduler.
Mark the SEH instructions with the NoMerge flag, to avoid doing
tail merging of functions that have multiple epilogs that all end
with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue".
Differential Revision: https://reviews.llvm.org/D125648
When adjusting the function prologue for segmented stacks, only update
the successor edges of the immediate predecessors of the original
prologue.
Differential Revision: https://reviews.llvm.org/D122959
This is used to emit one field in doFinalization for the module. We
can accumulate this when emitting all individual functions directly in
the AsmPrinter, rather than accumulating additional state in
MachineModuleInfo.
Move the special case behavior predicate into MachineFrameInfo to
share it. This now promotes it to generic behavior. I'm assuming this
is fine because no other target implements adjustForSegmentedStacks,
or has tests using the split-stack attribute.
It fixes the overflow of 8-bit immediate field in the emitted
instruction that allocates large stacklet.
For thumb2 targets, load large immediate by a pair of movw and movt
instruction. For thumb1 and ARM targets, load large immediate by reading
from literal pool.
Differential Revision: https://reviews.llvm.org/D118545