369 Commits

Author SHA1 Message Date
David Green
8274be509e [AArch64] Remove header dependencies of AArch64ISelLowering.h. NFC
This patch aims to reduce the includes used by AArch64ISelLowering, allowing it
to be included by unittests so that they can reference the AArch64ISD nodes.
It:
 - Moves the inclusion of AArch64SMEAttributes.h to the uses.
 - Moves LowerPtrAuthGlobalAddressStatically to a static function, so that
   AArch64PACKey is not required in the header.
 - Moves the definitions of getExceptionPointerRegister to the cpp file, to
   remove the reference to AArch64::X0.
2024-10-28 18:53:37 +00:00
Benjamin Maxwell
ddd463be7e
[AArch64] Add getStreamingHazardSize() to AArch64Subtarget (#113679)
This is defined by the `-aarch64-streaming-hazard-size` option or its
alias `-aarch64-stack-hazard-size` (the original name). It has been
renamed to be more general as this option will (for the time being) be
used to detect if the current target has streaming mode memory hazards.

---------

Co-authored-by: Hari Limaye <hari.limaye@arm.com>
2024-10-28 13:01:22 +00:00
Jack Styles
86f76c3b17
[AArch64][Libunwind] Add Support for FEAT_PAuthLR DWARF Instruction (#112171)
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits:
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Because
`DW_CFA_AARCH64_negate_ra_state_with_pc` must be placed immediately
after the signing instruction, keeping #96377 would have made the CFI
instruction location inconsistent with the location generated when not
using FEAT_PAuthLR. The commit reverses those changes so that the
location is consistent across the different branch protection options.
While this does have a code size effect, it is negligible.

For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
2024-10-28 08:22:38 +00:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
Sander de Smalen
f314e12494
[AArch64][SME] Fix iterator to fixupCalleeSaveRestoreStackOffset (#110855)
The iterator passed to `fixupCalleeSaveRestoreStackOffset` may be
incorrect when it tries to skip over the instructions that get the
current value of 'vg' and there is an 'rdsvl' instruction straight
after the prologue. That's because it doesn't check that the instruction
is still a 'frame-setup' instruction.
2024-10-15 11:56:40 +01:00
CarolineConcatto
a548eded70
[AArch64][SME]Check streaming mode when using SME2 instruction in frame lowering (#109680)

SME instructions can only be used in streaming mode. The PTRUE for
predicate-as-counter and the ld/st pair instructions can be used when:
  SVE2.1 is available, or
  SME2 is available and the function is in streaming mode.
Previously, frame lowering only checked whether SME2 was available when
building the machine instruction.
This fix checks both that SME2 is available and that the subtarget is in streaming mode.
2024-09-30 08:42:41 +01:00
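A minimal sketch of the corrected condition, assuming a hypothetical `ToySubtarget` summary struct (not LLVM's `AArch64Subtarget` API). It shows that SME2 alone is not sufficient unless the function is actually in streaming mode.

```cpp
// Hypothetical feature summary; field names are illustrative only.
struct ToySubtarget {
  bool HasSVE2p1;   // SVE2.1 is available
  bool HasSME2;     // SME2 is available
  bool IsStreaming; // the function executes in streaming mode
};

// Paired ld/st spill/fill (and the predicate-as-counter PTRUE) is usable
// with SVE2.1, or with SME2 only while in streaming mode.
bool canUsePairedSpillFill(const ToySubtarget &ST) {
  return ST.HasSVE2p1 || (ST.HasSME2 && ST.IsStreaming);
}
```

The case fixed by this commit corresponds to `{false, true, false}`: SME2 is present but the function is not streaming, so the paired instructions must not be used.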
Sander de Smalen
db054a1970
[AArch64][SME] Fix ADDVL addressing to scavenged stackslot. (#109674)
In https://reviews.llvm.org/D159196 we avoided stackslot scavenging
when there was no FP available. But in the case where FP is available
we need to actually prefer using the FP over the BP.

This change affects more than just SME, but it should be a general
improvement, since any slot above the (address pointed to by) FP
is always closer to FP than BP, so it makes sense to always favour
using the FP to address it when the FP is available.

This also fixes the issue for SME where this is not just preferred
but required.
2024-09-24 13:29:30 +01:00
Lukacma
7f0c5b0502
[AArch64]Fix invalid use of ld1/st1 in stack alloc (#105518)
This patch fixes incorrect usage of scalar+immediate variant of ld1/st1
instructions during stack allocation caused by
[c4bac7f](c4bac7f7dc).
That commit used ld1/st1 even when the stack offset was outside the immediate
range of the instruction, producing invalid assembly. It also used incorrect offsets with ld1/st1.
2024-09-05 14:47:10 +01:00
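The fix amounts to checking the offset against the instruction's immediate range before choosing the scalar+immediate form. A minimal sketch of such a range check, using a hypothetical `fitsScaledImm` helper. The bounds in the usage note assume the two-register form encodes a signed 4-bit immediate scaled by 2 (offsets of -16..14 vector lengths); treat that as an assumption to confirm against the Arm Architecture Reference Manual.

```cpp
#include <cstdint>

// Hypothetical helper: can an offset, measured in whole vector lengths
// ("MUL VL" units), be encoded as Imm * Scale with Imm in [MinImm, MaxImm]?
bool fitsScaledImm(int64_t OffsetInVL, int64_t Scale, int64_t MinImm,
                   int64_t MaxImm) {
  if (Scale <= 0 || OffsetInVL % Scale != 0)
    return false; // not a multiple of the scale: no exact encoding
  int64_t Imm = OffsetInVL / Scale;
  return Imm >= MinImm && Imm <= MaxImm;
}
```

Under the assumed encoding, `fitsScaledImm(14, 2, -8, 7)` holds while `fitsScaledImm(16, 2, -8, 7)` and `fitsScaledImm(3, 2, -8, 7)` do not, so those offsets would need a different addressing sequence.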
Amara Emerson
39ec1f79b7 [AArch64] Basic SVE PCS support for handling scalable vectors on Darwin.
For the tests I just added +sve instead of what the actual hardware has, which is only SME,
since otherwise all the test functions would need to be marked as streaming mode.

rdar://121864771
2024-08-20 17:10:51 -07:00
Kerry McLaughlin
9211977d13
[AArch64][SME] Return false from produceCompactUnwindFrame if VG save required. (#104588)
The compact unwind format requires all registers to be stored in pairs, so
return false from produceCompactUnwindFrame if we require saving VG.
2024-08-19 10:17:10 +01:00
Amara Emerson
334a366ba7 [AArch64][Darwin][SME] Don't try to save VG to the stack for unwinding.
On Darwin we don't have any hardware that has SVE support, only SME.
Therefore we don't need to save VG for unwinders and can safely omit it.

This also fixes crashes introduced when this feature landed, since Darwin's
compact unwind code can't handle the presence of VG anyway.

rdar://131072344
2024-08-13 01:46:43 -07:00
Hari Limaye
a98a0dcf63
[AArch64] Add streaming-mode stack hazard optimization remarks (#101695)
Emit an optimization remark when objects in the stack frame may cause
hazards in a streaming mode function. The analysis requires either the
`aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag
to be set by the user, with the former flag taking precedence.
2024-08-06 11:39:01 +01:00
David Green
a3cf8642bf
[AArch64] Cleanup existing values in getMemOpInfo (#98196)
This patch tries to clean up some of the existing values in
getMemOpInfo. All values should now be in bytes (not bits), and the
MinOffset/MaxOffset are now always represented unscaled (the immediate
that will be present in the final instruction).

Although I could not find a place where it altered codegen, the offset
of a post-index instruction will be 0, not scale*imm. An
IsPostIndexLdStOpcode method has been added to try and make sure that
case is handled properly.
2024-08-03 12:31:10 +01:00
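To illustrate why a post-index access has offset 0: the memory access uses the base register's old value, and the immediate only affects the writeback. A minimal sketch with hypothetical `AddrMode` and `ToyMemOp` types (not LLVM's `getMemOpInfo` interface):

```cpp
#include <cstdint>

// Hypothetical addressing-mode tags for a load/store with a base register.
enum class AddrMode { UnsignedOffset, PreIndex, PostIndex };

struct ToyMemOp {
  AddrMode Mode;
  int64_t Imm;   // immediate as encoded in the instruction
  int64_t Scale; // bytes per immediate unit (1 for unscaled forms)
};

// Offset of the accessed address from the base register's value
// before the instruction executes.
int64_t accessOffset(const ToyMemOp &Op) {
  if (Op.Mode == AddrMode::PostIndex)
    return 0; // access at the old base; the base is updated afterwards
  return Op.Imm * Op.Scale;
}

// Value of the base register after the instruction executes.
int64_t baseAfter(int64_t Base, const ToyMemOp &Op) {
  if (Op.Mode == AddrMode::UnsignedOffset)
    return Base; // no writeback
  return Base + Op.Imm * Op.Scale;
}
```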
Hari Limaye
dc1c00f6b1
[StackFrameLayoutAnalysis] Use target-specific hook for SP offsets (#100386)
StackFrameLayoutAnalysis currently calculates SP-relative offsets in a
target-independent way via MachineFrameInfo offsets. This is incorrect
for some targets, e.g. AArch64, when there are scalable vector stack
slots.

This patch adds a virtual function to TargetFrameLowering to provide
offsets from SP, with a default implementation matching what is
currently used in StackFrameLayoutAnalysis, and refactors
StackFrameLayoutAnalysis to use this function. Only non-zero scalable
offsets are output by the analysis pass.

An implementation of this function is added for AArch64 targets, which
aims to provide correct SP offsets in most cases.
2024-07-25 09:03:48 +01:00
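A scalable stack slot's SP offset has a fixed part and a part that scales with the runtime vector length, which is why fixed MachineFrameInfo offsets alone cannot describe it. A minimal sketch of evaluating such an offset. It assumes a hypothetical `ToyStackOffset` struct, loosely modelled on the fixed/scalable split in LLVM's `StackOffset`, and AArch64's convention that vscale is the SVE vector length in bits divided by 128.

```cpp
#include <cstdint>

// Hypothetical offset with a fixed byte part and a per-vscale byte part.
struct ToyStackOffset {
  int64_t Fixed;    // bytes, independent of vector length
  int64_t Scalable; // bytes per unit of vscale
};

// vscale = SVE vector length in bits / 128 (assumed AArch64 convention).
int64_t vscaleForVL(int64_t VectorBits) { return VectorBits / 128; }

// Concrete byte offset from SP once the vector length is known.
int64_t resolveOffset(const ToyStackOffset &Off, int64_t VectorBits) {
  return Off.Fixed + Off.Scalable * vscaleForVL(VectorBits);
}
```

The same slot `{-16, -32}` resolves to -48 bytes with 128-bit vectors and -80 bytes with 256-bit vectors, so no single fixed number describes it.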
antangelo
6c9086d13f
[AArch64] Support varargs for preserve_nonecc (#99434)
Adds varargs support for preserve_none by falling back to C argument
passing for the target platform for varargs functions.

Fixes #95093
2024-07-21 00:29:18 -04:00
Matt Arsenault
a8a7d62d04 AArch64: Avoid using MachineFunction::getMMI 2024-07-20 13:11:39 +04:00
David Green
ae2e66b03b [AArch64] Use TargetStackID::ScalableVector instead of hard-coded values. NFC 2024-07-19 08:59:26 +01:00
David Green
4b9bcabdf0
[AArch64] Add streaming-mode stack hazards. (#98956)
Under some SME contexts, a coprocessor with its own separate cache will
be used for FPR operations. This can create hazards if the CPU and the
SME unit try to access the same area of memory, including if the access
is to an area of the stack.

To try to alleviate that, this patch attempts to introduce extra padding
into the stack frame between FPR and GPR accesses, controlled by the
StackHazardSize option. Without changing the layout of the stack frame,
a stack object of the right size is added between GPR and FPR CSRs.
Another is added to the stack objects section, and stack objects are
sorted so that FPR > Hazard padding slot > GPRs (where possible).

Unfortunately some things are not handled well (VLA area, FPR arguments
on the stack, objects with both GPR and FPR accesses), but if those are
controlled by the user then the entire stack frame becomes GPR at the
start/end with FPR in the middle, surrounded by hazard padding. This can
greatly help with something that is difficult for users to control
themselves.

The current implementation is opt-in through an
-aarch64-stack-hazard-size flag, and should have no effect if the option
is unset. In the long run the implementation might change (for example
using more base pointers to separate in more cases, re-enabling ldp/stp
using an extra register, etc), but this gets at least something for
people to use in llvm-19 if they need it. The only change whilst the
option is unset will be a fix for making sure the stack increment is
added at the right place when it cannot be converted to postinc
(++MBBI). I believe that, without extra padding, this cannot normally be
reached.
2024-07-18 08:16:40 +01:00
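The object ordering described above (FPR objects, then the hazard padding slot, then GPR objects) can be sketched as a stable sort on an access-kind rank. This is a toy model with hypothetical `AccessKind` and `ToyStackObject` types; it ignores the cases the message lists as not handled well, does not model actual stack addresses, and is not the patch's implementation.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical classification of how a stack object is accessed.
enum class AccessKind { FPR, HazardPadding, GPR };

struct ToyStackObject {
  std::string Name;
  AccessKind Kind;
};

// Lower rank comes first in the sorted list: FPR, hazard padding, GPR.
static int hazardRank(AccessKind K) {
  switch (K) {
  case AccessKind::FPR:
    return 0;
  case AccessKind::HazardPadding:
    return 1;
  case AccessKind::GPR:
    return 2;
  }
  return 2;
}

// stable_sort keeps the original relative order within each group.
void sortForStackHazards(std::vector<ToyStackObject> &Objs) {
  std::stable_sort(Objs.begin(), Objs.end(),
                   [](const ToyStackObject &A, const ToyStackObject &B) {
                     return hazardRank(A.Kind) < hazardRank(B.Kind);
                   });
}
```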
David Green
0d7403184d [AArch64] Add a AArch64InstrInfo::isFpOrNEON method for checking physical register call. NFC 2024-07-15 08:13:52 +01:00
Amara Emerson
9865171e24
[AArch64] Add -mlr-for-calls-only to replace the now removed -ffixed-x30 flag. (#98073)
This re-introduces the effective behaviour that was reverted in
7ad481e76c9bee5b9895ebfa0fdb52f31cb7de77.

This time we're not using the same mechanism: instead we expose another
reservation feature that prevents only the register allocator from using
the register, while still allowing other required uses like ABIs.

This also fixes a consequent issue with reserving LR, which is that frame
lowering was only adding live-in flags for non-reserved regs. This would
cause issues later, since the outliner needs accurate flags to determine
when LR needs to be preserved.

rdar://131313095
2024-07-10 15:16:51 -07:00
David Green
cb48ad6603 [AArch64] Clean up formatting of AArch64FrameLowering. NFC 2024-07-03 16:48:07 +01:00
antangelo
f05fa6e0cf
[AArch64] Fix argument passing in reserved registers for preserve_nonecc (#96259)
These registers include:
- X19, used by LLVM as the base pointer
- X15 on Windows, where it is used for stack allocation. It can still be
used on Linux/Darwin.

This also adjusts the FrameLowering scratch register code to not assume X9
is available if the calling convention is preserve_nonecc. The code will
then pick an unused register as scratch, and allow X9 to continue being
used for argument passing.
2024-06-23 17:18:35 -04:00
Kerry McLaughlin
93c8e0f2eb
[AArch64][SME] Save VG for unwind info when changing streaming-mode (#83301)
If a function requires any streaming-mode change, the vector granule
value must be stored to the stack and unwind info must also describe the
save of VG to this location.

This patch adds VG to the list of callee-saved registers and increases
the callee-saved stack size if the function requires streaming-mode changes.
A new type is added to RegPairInfo, which is also used to skip restoring
the register used to spill the VG value in the epilogue.

See
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
2024-06-13 17:42:11 +01:00
Sander de Smalen
c63a622ba7
[AArch64] Disable red-zone when lowering Q-reg copy through memory. (#94962)
This was pointed out in PR #93940.
2024-06-11 08:58:28 +01:00
Florian Mayer
4e67f45168 Reapply "[MTE] add stack frame history buffer"
In the reverted change, the order of the IR was dependent on the host
compiler, because we inserted instructions in arguments to functions.
Fix that, and also fix another problem with the test.

This reverts commit 3313f28897a87ec313ec0b52ef71c14d3b9ff652.
2024-05-29 13:02:58 -07:00
Florian Mayer
3313f28897 Revert "[MTE] add stack frame history buffer"
This reverts commit 1f67f34a5cf993f03eca8936bfb7203778c2997a.
2024-05-29 11:21:29 -07:00
Florian Mayer
1f67f34a5c [MTE] add stack frame history buffer
This will allow us to find offending objects in a symbolization step,
like we can do with hwasan.

Needs matching changes in AOSP:
https://android-review.git.corp.google.com/q/topic:%22stackhistorybuffer%22

Pull Request: https://github.com/llvm/llvm-project/pull/86356
2024-05-29 10:57:11 -07:00
Nikita Popov
84314d0ae4 Revert "[AArch64][NFC] Switch to LiveRegUnits (#87313)"
This reverts commit 0f8a74732aa352e5e6dfbf74a53f015b772c5743.

PR merged without approval.
2024-05-27 08:29:59 +02:00
AtariDreams
0f8a74732a
[AArch64][NFC] Switch to LiveRegUnits (#87313) 2024-05-26 19:49:07 -04:00
Jie Fu
5a20a07fce [AArch64] Fix -Wunused-variable in AArch64FrameLowering.cpp (NFC)
llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:3084:31:
error: unused variable 'Subtarget' [-Werror,-Wunused-variable]
      const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
                              ^
llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:3253:31:
error: unused variable 'Subtarget' [-Werror,-Wunused-variable]
      const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
                              ^
2 errors generated.
2024-05-17 16:37:43 +08:00
CarolineConcatto
c4bac7f7dc
[LLVM][AArch64]Use load/store with consecutive registers in SME2 or SVE2.1 for spill/fill (#77665)

When possible, register spill/fill in frame lowering uses the
consecutive-register ld/st pairs available in SME2 or SVE2.1.
2024-05-17 09:25:21 +01:00
Fangrui Song
5a12f2867a LLVM_FALLTHROUGH => [[fallthrough]]. NFC 2024-04-25 17:50:59 -07:00
Kai Nacke
21d177096f
[NFC] Refactor looping over recomputeLiveIns into function (#88040)
https://github.com/llvm/llvm-project/pull/79940 put calls to
recomputeLiveIns into a loop, to repeatedly call the function until the
computation converges. However, this repeats a lot of code. This change
moves the loop into a function to simplify the handling.

Note that this changes the order in which recomputeLiveIns is called.
For example,

```
  bool anyChange = false;
  do {
    anyChange = recomputeLiveIns(*ExitMBB) || recomputeLiveIns(*LoopMBB);
  } while (anyChange);
```

only begins to recompute the live-ins for LoopMBB after the computation
for ExitMBB has converged. With this change, every basic block has its
live-ins recomputed on each loop iteration. This can result in fewer or
more calls, depending on the situation.
2024-04-15 17:12:25 -04:00
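The ordering difference can be seen in a toy model where each block reports a change a fixed number of times before converging. `ToyBlock` and `recomputeToy` are hypothetical stand-ins for `MachineBasicBlock` and `recomputeLiveIns`; real live-in sets also depend on each other, which this model ignores.

```cpp
#include <vector>

// Hypothetical block: reports a change PendingChanges more times, then converges.
struct ToyBlock {
  int PendingChanges;
  int Calls = 0;
};

// Hypothetical stand-in for recomputeLiveIns: true if live-ins "changed".
bool recomputeToy(ToyBlock &B) {
  ++B.Calls;
  if (B.PendingChanges > 0) {
    --B.PendingChanges;
    return true;
  }
  return false;
}

// Old pattern: `||` short-circuits, so Loop is only visited once Exit
// has stopped changing.
void convergeShortCircuit(ToyBlock &Exit, ToyBlock &Loop) {
  bool AnyChange;
  do {
    AnyChange = recomputeToy(Exit) || recomputeToy(Loop);
  } while (AnyChange);
}

// Refactored pattern: every block is recomputed on every iteration, and
// the loop stops after a full pass with no change.
void convergeAll(const std::vector<ToyBlock *> &Blocks) {
  while (true) {
    bool AnyChange = false;
    for (ToyBlock *B : Blocks)
      if (recomputeToy(*B))
        AnyChange = true;
    if (!AnyChange)
      return;
  }
}
```

With `ExitMBB` needing two changes and `LoopMBB` one, the short-circuit loop calls them 4 and 2 times respectively, while the all-blocks loop calls each 3 times.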
Jay Foad
7a0e222a17 Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905)"
This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c.

It was causing test failures on the expensive check builders.
2024-03-07 08:20:26 +00:00
AtariDreams
2a13422b8b
Convert many LivePhysRegs uses to LiveRegUnits (#83905) 2024-03-06 10:38:14 +05:30
Sander de Smalen
1f99a45012 [AArch64] Remove unused ReverseCSRRestoreSeq option. (#82326)
This patch removes the `-reverse-csr-restore-seq` option from
AArch64FrameLowering, since this is no longer used.

This patch was reverted because of a crash in PR#79623.
Merging it back as it was fixed in PR#82492.
2024-02-22 12:01:53 +00:00
CarolineConcatto
c5253aa136
[AArch64] Restore Z-registers before P-registers (#79623) (#82492)
This is needed by PR#77665[1], which uses a P-register while restoring
Z-registers.

The reversed order for SVE register restores in the epilogue was added to
guarantee performance, but further work has since been done to improve SVE
frame restores, and besides that, the scheduler may also change the order
of the restores, undoing the reversal.

This also fixes the problem reported in PR #79623 on Windows with
std::reverse and .base().

[1]https://github.com/llvm/llvm-project/pull/77665
2024-02-22 09:19:48 +00:00
Momchil Velikov
1a7166833d
[AArch64] Fix stack probing clobbering flags (#81879)
Certain stack probing sequences might clobber flags, in which case we can't
use a block as a prologue if the flags register is a live-in on entry to
that block.
2024-02-21 13:58:04 +00:00
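Together with the scratch-register fix in #81878, the prologue check can be pictured as two independent conditions. A minimal sketch using a hypothetical `ToyPrologueQuery` summary of precomputed booleans; the actual hook derives this information from the block's liveness rather than taking it as input.

```cpp
// Hypothetical summary of a candidate prologue block.
struct ToyPrologueQuery {
  bool NeedsScratchReg;      // e.g. inline stack probing needs a temporary
  bool ScratchRegAvailable;  // a free non-callee-saved register exists
  bool ProbingClobbersFlags; // the probe sequence writes the flags
  bool FlagsLiveIn;          // flags are live on entry to the block
};

bool canUseAsPrologueSketch(const ToyPrologueQuery &Q) {
  // #81878: the scratch-register check must not be skipped just because
  // the function uses stack probing.
  if (Q.NeedsScratchReg && !Q.ScratchRegAvailable)
    return false;
  // #81879: a flag-clobbering probe cannot go where flags are live-in.
  if (Q.ProbingClobbersFlags && Q.FlagsLiveIn)
    return false;
  return true;
}
```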
Caroline Concatto
48af281f7a Revert "[AArch64] Restore Z-registers before P-registers (#79623)"
This reverts commit 3f0404aae7ed2f7138526e1bcd100a60dfe08227.

std::reverse is breaking some builds
2024-02-20 18:13:33 +00:00
Caroline Concatto
7af70643ca Revert "[AArch64] Remove unused ReverseCSRRestoreSeq option. (#82326)"
Patch 3f0404aae7ed2 is breaking some debug builds, so we cannot use the reverse here.

This reverts commit 493f10106f7f1799eb67be95058b251e6a3bf0af.
2024-02-20 18:13:33 +00:00
Sander de Smalen
493f10106f
[AArch64] Remove unused ReverseCSRRestoreSeq option. (#82326)
This patch removes the `-reverse-csr-restore-seq` option from
AArch64FrameLowering, since this is no longer used.
2024-02-20 15:08:06 +00:00
CarolineConcatto
3f0404aae7
[AArch64] Restore Z-registers before P-registers (#79623)
This is needed by PR#77665[1], which uses a P-register while restoring
Z-registers.

The reversed order for SVE register restores in the epilogue was added to
guarantee performance, but further work has since been done to improve SVE
frame restores, and besides that, the scheduler may also change the order
of the restores, undoing the reversal.

[1]https://github.com/llvm/llvm-project/pull/77665
2024-02-19 13:39:24 +00:00
Momchil Velikov
658e4763a2
[AArch64] Fix wrong condition in canUseAsPrologue (#81878)
Inline stack probing code may need a scratch register, hence basic
blocks where such a register is not available cannot be used as prologues.

Checking for an available scratch register was incorrectly skipped when
the function uses stack probing.
2024-02-19 10:40:21 +00:00
Hiroshi Yamauchi
692566a8b2
Fix an assert failure with a funclet in a swifttailcc function. (#78806)
The failure happens in the livedebugvalues pass.
2024-02-15 15:54:03 -08:00
Oskar Wirga
ff4636a4ab
Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940)
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498

> Currently, the way that recomputeLiveIns works is that it will
> recompute the livein registers for that MachineBasicBlock but it matters
> what order you call recomputeLiveIn which can result in incorrect
> register allocations down the line.

Now we do not recompute the entire CFG, but we do ensure that the newly
added MBBs reach convergence.
2024-01-30 19:33:04 -08:00
Nikita Popov
07a1925b8b Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)"
This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b.

Introduces a major compile-time regression.
2024-01-26 22:33:17 +01:00
Oskar Wirga
59bf60519f
Refactor recomputeLiveIns to operate on whole CFG (#79498)
Currently, the way that recomputeLiveIns works is that it will recompute
the livein registers for that MachineBasicBlock but it matters what
order you call recomputeLiveIn which can result in incorrect register
allocations down the line.

This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
2024-01-26 11:25:36 -08:00
Mikael Holmen
90c326b198 [AArch64] Fix gcc warning about mix of enumeral and non-enumeral types [NFC]
Change the return type of
 findScratchNonCalleeSaveRegister
to Register instead of unsigned.

Every place the function is called we already put the returned value in a
Register variable or compare it with another Register.

This fixes some gcc warnings:
 ../lib/Target/AArch64/AArch64FrameLowering.cpp:744: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
   743 |     Register TargetReg = RealignmentPadding
       |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   744 |                              ? findScratchNonCalleeSaveRegister(&MBB)
       |                              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   745 |                              : AArch64::SP;
       |
 ../lib/Target/AArch64/AArch64FrameLowering.cpp:803: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
   802 |     Register ScratchReg = RealignmentPadding
       |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   803 |                               ? findScratchNonCalleeSaveRegister(&MBB)
       |                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   804 |                               : AArch64::SP;
       |
2024-01-25 07:56:16 +01:00
Eli Friedman
a6065f0fa5
Arm64EC entry/exit thunks, consolidated. (#79067)
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.

Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of the
AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for
other targets, and generate most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.

The other changes are supporting changes, to handle the new constructs
generated by that pass.

There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.

There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.

I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.

Thanks to @dpaoliello for extensive testing and suggestions.

(Originally posted as https://reviews.llvm.org/D157547 .)
2024-01-22 21:28:07 -08:00
Florian Hahn
58dcac3948
[AArch64] Check X16&X17 in prologue if the fn has an SwiftAsyncContext. (#73945)
StoreSwiftAsyncContext clobbers X16 & X17. Make sure they are available
in canUseAsPrologue, to avoid shrink wrapping moving the pseudo to a
place where X16 or X17 are live.
2023-12-05 11:41:40 +00:00