This patch aims to reduce the includes used by AArch64ISelLowering, allowing it
to be included by unittests so that they can reference the AArch64ISD nodes.
It:
- Moves the inclusion of AArch64SMEAttributes.h to the uses.
- Moves LowerPtrAuthGlobalAddressStatically to a static function, so that
AArch64PACKey is not required in the header.
- Moves the definition of getExceptionPointerRegister to the cpp file, to
remove the reference to AArch64::X0.
This is defined by the `-aarch64-streaming-hazard-size` option or its
alias `-aarch64-stack-hazard-size` (the original name). It has been
renamed to be more general as this option will (for the time being) be
used to detect if the current target has streaming mode memory hazards.
---------
Co-authored-by: Hari Limaye <hari.limaye@arm.com>
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits:
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.
For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
The iterator passed to `fixupCalleeSaveRestoreStackOffset` may be
incorrect when it tries to skip over the instructions that get the
current value of 'vg', when there is a 'rdsvl' instruction straight
after the prologue. That's because it doesn't check that the instruction
is still a 'frame-setup' instruction.
…me lowering
SME instructions can only be used in streaming mode. PTRUE for
predicated counter and the ld/st pair can be used when:
- sve2.1 is available, or
- sme2 is available and the function is in streaming mode.
Previously, frame lowering only checked whether sme2 was available when
building the machine instruction.
This fix checks that sme2 is available and that the subtarget is in streaming mode.
In https://reviews.llvm.org/D159196 we avoided stackslot scavenging
when there was no FP available. But in the case where FP is available
we need to actually prefer using the FP over the BP.
This change affects more than just SME, but it should be a general
improvement, since any slot above the (address pointed to by) FP
is always closer to FP than BP, so it makes sense to always favour
using the FP to address it when the FP is available.
This also fixes the issue for SME where this is not just preferred
but required.
This patch fixes incorrect usage of scalar+immediate variant of ld1/st1
instructions during stack allocation caused by
[c4bac7f](c4bac7f7dc).
This commit used ld1/st1 even when stack offset was outside of immediate
range for this instruction, producing invalid assembly. This commit was also using incorrect offsets when using ld1/st1.
For the tests I just added +sve instead of what actual hardware has, which is only SME,
since otherwise all the test functions need to be marked as streaming mode.
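The range check that was missing can be sketched roughly as below. This is not the LLVM code: the helper name is hypothetical, and it assumes the scalar+immediate form encodes a signed 4-bit immediate scaled by the number of vectors transferred (1 for a single register, 2 for a pair).
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the LLVM implementation.
// OffsetInVectors: stack offset measured in whole vector lengths (VL).
// NumVecs: registers transferred by one instruction (1, or 2 for a pair).
// Assumption: the immediate is a signed 4-bit field scaled by NumVecs.
bool fitsScalarPlusImmediate(int64_t OffsetInVectors, int64_t NumVecs) {
  assert(NumVecs > 0 && "need at least one register");
  if (OffsetInVectors % NumVecs != 0)
    return false; // not a multiple of the access size
  int64_t Imm = OffsetInVectors / NumVecs;
  return Imm >= -8 && Imm <= 7; // signed 4-bit range
}
```
Under these assumptions a pair at offset 14 is encodable while a pair at offset 16 is not, so the latter needs a different addressing sequence.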
rdar://121864771
On Darwin we don't have any hardware that has SVE support, only SME.
Therefore we don't need to save VG for unwinders and can safely omit it.
This also fixes crashes introduced when this feature landed, since Darwin's
compact unwind code can't handle the presence of VG anyway.
rdar://131072344
Emit an optimization remark when objects in the stack frame may cause
hazards in a streaming mode function. The analysis requires either the
`aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag
to be set by the user, with the former flag taking precedence.
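A minimal sketch of the flag precedence, assuming a value of 0 means the option was not set. The function and struct names are hypothetical, and the proximity test is only one plausible reading of "may cause hazards" (an FPR-accessed and a GPR-accessed object closer together than the hazard size).
```cpp
#include <cassert>

// Hypothetical model of the two options; 0 means "not set by the user".
unsigned hazardSizeForRemarks(unsigned StackHazardSize,
                              unsigned StackHazardRemarkSize) {
  if (StackHazardSize != 0)
    return StackHazardSize;     // the former flag takes precedence
  return StackHazardRemarkSize; // may still be 0: analysis disabled
}

// Toy stack object: byte offset plus whether it is accessed via FPRs.
struct StackObject {
  long Offset;
  bool IsFPR;
};

// Assumption: a hazard is possible when objects of different kinds lie
// closer together than HazardSize bytes.
bool mayCauseHazard(const StackObject &A, const StackObject &B,
                    unsigned HazardSize) {
  if (HazardSize == 0 || A.IsFPR == B.IsFPR)
    return false;
  long Dist = A.Offset > B.Offset ? A.Offset - B.Offset : B.Offset - A.Offset;
  return Dist < static_cast<long>(HazardSize);
}
```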
This patch tries to clean up some of the existing values in
getMemOpInfo. All values should now be in bytes (not bits), and the
MinOffset/MaxOffset are now always represented unscaled (the immediate
that will be present in the final instruction).
Although I could not find a place where it altered codegen, the offset
of a post-index instruction will be 0, not scale*imm. An
IsPostIndexLdStOpcode method has been added to try and make sure that
case is handled properly.
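To illustrate the convention, here is a sketch of how a caller might test a byte offset against such values. The struct and function are hypothetical stand-ins, not the getMemOpInfo signature. Scale and Width are in bytes, and MinOffset/MaxOffset are the raw immediates as they appear in the instruction.
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the values described above.
struct MemOpInfo {
  int64_t Scale;     // bytes per immediate unit
  int64_t Width;     // bytes accessed
  int64_t MinOffset; // smallest immediate, as encoded
  int64_t MaxOffset; // largest immediate, as encoded
};

// A byte offset is legal if it is a multiple of Scale and the resulting
// immediate lies within [MinOffset, MaxOffset].
bool isLegalByteOffset(const MemOpInfo &Info, int64_t ByteOffset) {
  assert(Info.Scale > 0 && "scale must be positive");
  if (ByteOffset % Info.Scale != 0)
    return false;
  int64_t Imm = ByteOffset / Info.Scale;
  return Imm >= Info.MinOffset && Imm <= Info.MaxOffset;
}
```
For example, an 8-byte load with a 12-bit unsigned scaled immediate would be described as `{8, 8, 0, 4095}`, which accepts byte offset 32760 and rejects 32768.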
StackFrameLayoutAnalysis currently calculates SP-relative offsets in a
target-independent way via MachineFrameInfo offsets. This is incorrect
for some Targets, e.g. AArch64, when there are scalable vector stack
slots.
This patch adds a virtual function to TargetFrameLowering to provide
offsets from SP, with a default implementation matching what is
currently used in StackFrameLayoutAnalysis, and refactors
StackFrameLayoutAnalysis to use this function. Only non-zero scalable
offsets are output by the analysis pass.
An implementation of this function is added for AArch64 targets, which
aims to provide correct SP offsets in most cases.
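The shape of the refactoring can be sketched as below. All names are hypothetical toy stand-ins for TargetFrameLowering and its AArch64 subclass. The sketch assumes the default returns only the fixed object offset and that a target override adds a scalable part.
```cpp
#include <cassert>
#include <cstdint>

// Toy offset with a fixed part and a part that scales with vector length.
struct StackOffset {
  int64_t Fixed;
  int64_t Scalable;
};

// Toy stack slot; fields are illustrative only.
struct StackSlot {
  int64_t ObjectOffset;  // target-independent offset
  int64_t ScalableBytes; // scalable component known only to the target
};

// Hypothetical base class: the default hook reports the fixed offset only.
struct FrameLoweringModel {
  virtual ~FrameLoweringModel() = default;
  virtual StackOffset offsetFromSP(const StackSlot &S) const {
    return {S.ObjectOffset, 0};
  }
};

// Hypothetical target override that also reports the scalable component.
struct ScalableAwareModel : FrameLoweringModel {
  StackOffset offsetFromSP(const StackSlot &S) const override {
    return {S.ObjectOffset, S.ScalableBytes};
  }
};

// The analysis prints the scalable part only when it is non-zero.
bool shouldPrintScalable(const StackOffset &O) { return O.Scalable != 0; }
```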
Under some SME contexts, a coprocessor with its own separate cache will
be used for FPR operations. This can create hazards if the CPU and the
SME unit try to access the same area of memory, including if the access
is to an area of the stack.
To try to alleviate that, this patch attempts to introduce extra padding
into the stack frame between FP and GPR accesses, controlled by the
StackHazardSize option. Without changing the layout of the stack frame,
a stack object of the right size is added between GPR and FPR CSRs.
Another is added to the stack objects section, and stack objects are
sorted so that FPR > Hazard padding slot > GPRs (where possible).
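The ordering of stack objects described above can be sketched as a stable sort on access kind. The enum, struct, and function names are hypothetical, and the real implementation has more constraints (hence "where possible").
```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification of how a stack object is accessed.
enum class Access { GPR = 0, HazardPad = 1, FPR = 2 };

struct StackObject {
  std::string Name;
  Access Kind;
};

// Order objects so that FPR > hazard padding > GPR, keeping the relative
// order of objects of the same kind.
std::vector<StackObject> orderForHazards(std::vector<StackObject> Objects) {
  std::stable_sort(Objects.begin(), Objects.end(),
                   [](const StackObject &A, const StackObject &B) {
                     return static_cast<int>(A.Kind) >
                            static_cast<int>(B.Kind);
                   });
  return Objects;
}
```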
Unfortunately, some things are not handled well (the VLA area, FPR
arguments on the stack, objects with both GPR and FPR accesses), but if
those are controlled by the user then the entire stack frame becomes GPR
at the start/end with FPR in the middle, surrounded by hazard padding.
This can greatly reduce hazards, which are otherwise difficult for the
user to control themselves.
The current implementation is opt-in through an
-aarch64-stack-hazard-size flag, and should have no effect if the option
is unset. In the long run the implementation might change (for example
using more base pointers to separate in more cases, re-enabling ldp/stp
using an extra register, etc), but this gets at least something for
people to use in llvm-19 if they need it. The only change whilst the
option is unset will be a fix for making sure the stack increment is
added at the right place when it cannot be converted to postinc
(++MBBI). I believe that without extra padding this cannot normally be
reached.
This re-introduces the effective behaviour that was reverted in
7ad481e76c9bee5b9895ebfa0fdb52f31cb7de77.
This time we're not using the same mechanism, exposing another
reservation feature that prevents only regalloc from using the register,
but not for other required uses like ABIs.
This also fixes a consequent issue with reserving LR, which is that
frame lowering was only adding live-in flags for non-reserved regs. This
would cause issues later since the outliner needs accurate flags to
determine when LR needs to be preserved.
rdar://131313095
These registers include:
- X19, used by LLVM as the base pointer
- X15 on Windows, where it is used for stack allocation. It can still be
used on Linux/Darwin.
- Adjust FrameLowering scratch register code to not assume X9 is
available if the calling convention is preserve_nonecc. The code will
then pick an unused register as scratch, and allow X9 to continue being
used for argument passing.
If a function requires any streaming-mode change, the vector granule
value must be stored to the stack and unwind info must also describe the
save of VG to this location.
This patch adds VG to the list of callee-saved registers and increases
the callee-saved stack size if the function requires streaming-mode
changes.
A new type is added to RegPairInfo, which is also used to skip restoring
the register used to spill the VG value in the epilogue.
See
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
In the reverted change, the order of the IR was dependent on the host
compiler, because we inserted instructions in arguments to functions.
Fix that, and also fix another problem with the test.
This reverts commit 3313f28897a87ec313ec0b52ef71c14d3b9ff652.
https://github.com/llvm/llvm-project/pull/79940 put calls to
recomputeLiveIns into a loop, to repeatedly call the function until the
computation converges. However, this repeats a lot of code. This change
moves the loop into a function to simplify the handling.
Note that this changes the order in which recomputeLiveIns is called.
For example,
```
bool anyChange = false;
do {
  anyChange = recomputeLiveIns(*ExitMBB) || recomputeLiveIns(*LoopMBB);
} while (anyChange);
```
only begins to recompute the live-ins for LoopMBB after the computation
for ExitMBB has converged. With this change, all basic blocks have a
recomputation of the live-ins for each loop iteration. This can result
in fewer or more calls, depending on the situation.
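A self-contained toy model of the new iteration order is sketched below. The block structure, the per-block recompute function, and the helper name are all hypothetical simplifications rather than the LLVM API. The point is that every listed block is visited in each iteration, with no short-circuiting.
```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Toy basic block for a backward liveness computation.
struct Block {
  std::set<int> Uses;     // registers read before being written
  std::set<int> Defs;     // registers written
  std::vector<int> Succs; // successor block indices
  std::set<int> LiveIns;
};

// Recompute one block's live-ins; returns true if they changed.
bool recomputeLiveInsModel(std::vector<Block> &CFG, int Idx) {
  Block &B = CFG[Idx];
  std::set<int> New = B.Uses;
  for (int S : B.Succs)
    for (int R : CFG[S].LiveIns)
      if (!B.Defs.count(R))
        New.insert(R);
  bool Changed = New != B.LiveIns;
  B.LiveIns = std::move(New);
  return Changed;
}

// Visit every listed block in each iteration until no block changes.
void recomputeUntilStable(std::vector<Block> &CFG,
                          const std::vector<int> &Blocks) {
  bool AnyChange;
  do {
    AnyChange = false;
    for (int Idx : Blocks)
      AnyChange |= recomputeLiveInsModel(CFG, Idx); // no short-circuit
  } while (AnyChange);
}
```
With a self-looping block 0 that uses register 1 and falls through to block 1, which uses register 2, calling `recomputeUntilStable(CFG, {0, 1})` converges with block 0 live-in on both registers.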
This patch removes the `-reverse-csr-restore-seq` option from
AArch64FrameLowering, since this is no longer used.
This patch was reverted because of a crash in PR#79623.
Merging it back as it was fixed in PR#82492.
This is needed by PR#77665[1] that uses a P-register while restoring
Z-registers.
The reversed restore order for SVE registers in the epilogue was added to
guarantee performance, but further work has since improved the SVE frame
restore, and scheduling may also change the order of the restores,
undoing the reversal.
This also fixes the problem reported in PR #79623 on Windows with
std::reverse and .base().
[1]https://github.com/llvm/llvm-project/pull/77665
Certain stack probing sequences might clobber flags, so we can't use a
block as a prologue if the flags register is a live-in on entry to that
block.
This is needed by PR#77665[1] that uses a P-register while restoring
Z-registers.
The reversed restore order for SVE registers in the epilogue was added to
guarantee performance, but further work has since improved the SVE frame
restore, and scheduling may also change the order of the restores,
undoing the reversal.
[1]https://github.com/llvm/llvm-project/pull/77665
Inline stack probing code may need a scratch register, hence basic
blocks where such register is not available cannot be used as prologues.
Checking for an available scratch register was incorrectly skipped when
the function uses stack probing.
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498
> Currently, recomputeLiveIns recomputes the live-in registers for the
given MachineBasicBlock, but the order in which recomputeLiveIns is
called matters, which can result in incorrect register allocations down
the line.
Now we do not recompute the entire CFG, but we do ensure that the newly
added MBBs reach convergence.
Currently, recomputeLiveIns recomputes the live-in registers for the
given MachineBasicBlock, but the order in which recomputeLiveIns is
called matters, which can result in incorrect register allocations down
the line.
This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
Change the return type of findScratchNonCalleeSaveRegister to Register
instead of unsigned.
At every call site we already put the returned value in a Register
variable or compare it with another Register.
This fixes some gcc warnings:
../lib/Target/AArch64/AArch64FrameLowering.cpp:744: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
743 | Register TargetReg = RealignmentPadding
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
744 | ? findScratchNonCalleeSaveRegister(&MBB)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
745 | : AArch64::SP;
|
../lib/Target/AArch64/AArch64FrameLowering.cpp:803: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
802 | Register ScratchReg = RealignmentPadding
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
803 | ? findScratchNonCalleeSaveRegister(&MBB)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
804 | : AArch64::SP;
|
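A reduced model of the fix, for illustration only: `Register` here is a simplified stand-in for the LLVM class, and the enum values and function names are hypothetical. When the helper returns the wrapper type rather than unsigned, the conditional no longer mixes a plain integer arm with an enumerator arm.
```cpp
#include <cassert>

// Stand-ins for target register enumerators; the values are illustrative.
namespace Target {
enum : unsigned { X9 = 9, SP = 31 };
}

// Simplified stand-in for a register wrapper class.
class Register {
  unsigned Id;

public:
  constexpr Register(unsigned Val = 0) : Id(Val) {}
  constexpr operator unsigned() const { return Id; }
};

// Returns the wrapper type instead of unsigned.
Register findScratchModel() { return Register(Target::X9); }

// Both arms of the conditional now have type Register.
Register pickTarget(bool NeedsRealignment) {
  return NeedsRealignment ? findScratchModel() : Register(Target::SP);
}
```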
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.
Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of
AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for
other targets, and generate most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.
The other changes are supporting changes, to handle the new constructs
generated by that pass.
There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.
There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.
I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.
Thanks to @dpaoliello for extensive testing and suggestions.
(Originally posted as https://reviews.llvm.org/D157547 .)
StoreSwiftAsyncContext clobbers X16 & X17. Make sure they are available
in canUseAsPrologue, to avoid shrink wrapping moving the pseudo to a
place where X16 or X17 are live.
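The added condition can be sketched as below. The struct and function are hypothetical and model live-ins as a set of register numbers, whereas the real check queries liveness on the MachineBasicBlock.
```cpp
#include <cassert>
#include <set>

// Toy description of a candidate prologue block.
struct PrologueQuery {
  std::set<unsigned> LiveIns;   // register numbers live on entry
  bool HasSwiftAsyncContext;    // whether StoreSwiftAsyncContext is needed
};

// A block cannot host the prologue if the pseudo would clobber a live
// X16 or X17.
bool canUseAsPrologueModel(const PrologueQuery &Q) {
  const unsigned X16 = 16, X17 = 17;
  if (Q.HasSwiftAsyncContext &&
      (Q.LiveIns.count(X16) || Q.LiveIns.count(X17)))
    return false;
  return true;
}
```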