329 Commits

Author SHA1 Message Date
Jonathan Thackray
cf21ea9c99
[ARM] Fix more typos (NFC)
Fix more typos in the AArch64 codebase using the
https://github.com/crate-ci/typos Rust package.

commit-id:33a1bb8d

Reviewers: davemgreen

Reviewed By: davemgreen

Pull Request: https://github.com/llvm/llvm-project/pull/183087
2026-03-06 22:27:35 +00:00
walkerkd
bb2b957c53
[Thumb2] Use BXAUT instruction if available (#183056)
Generated a

  bxaut r12, lr, sp

instruction rather than

  aut r12, lr, sp
  bx lr

The bxaut instruction is available when for thumb2 code with the
armv8.1m-main architecture and PACBTI is enabled

This change introduces a new pseudo instruction ARM::t2BXAUT_RET which
is similar to the existing pseudo instruction ARM::tBX_RET.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-03 16:37:43 +00:00
Simon Tatham
0921542e3b
[ARM] Count register copies when estimating function size (#175763)
`EstimateFunctionSizeInBytes`, in `ARMFrameLowering.cpp`, provides an
early estimate of the compiled size of a function, in a context that
wants to overestimate rather than underestimate.

In some cases it was underestimating severely, by over 20%. The
discrepancy was entirely accounted for by the fact that `COPY`
operations were not being counted at all, even though each one (or at
least each one that survives any post-regalloc optimizations) takes 2
bytes in Thumb or 4 in Arm. This could lead to a compile failure, if the
underestimated function size led frame lowering to not stack LR, but
later, `ARMConstantIslandsPass` needed to insert an intra-function
branch long enough to require a `bl` instruction, needing LR to have
been stacked.

The result of `EstimateFunctionSizeInBytes` was not directly available
for testing, so I added an `LLVM_DEBUG` at the end of the function. That
way, the test file doesn't need to try to make a >2048 byte function
estimated at <2048 bytes; it just needs to exhibit a function with a
single `COPY` and make sure it's counted.

At the moment, `EstimateFunctionSizeInBytes` is only used at all in
Thumb-1 compilations, to decide whether the function is large enough to
justify stacking LR as a precaution. However, the subroutine
`ARMBaseInstrInfo::getInstSizeInBytes` which counts each individual
`MachineInstr` is called from other contexts too, so I've made it return
a sensible answer for `COPY` nodes in both of Arm and Thumb.
2026-01-26 09:28:38 +00:00
Matt Arsenault
bb993a89a8
RuntimeLibcalls: Add entries for stack probe functions (#167453) 2025-12-19 01:17:20 +01:00
Amara Emerson
18f29a5810
[ARM] Fix not saving FP when required to in frame-pointer=non-leaf. (#163699)
When the stars align to conspire against stack alignment, when we have
frame-pointer=non-leaf we can incorrectly skip preserving fp/r7 in the
prolog.

The fix here first makes sure we're using the right frame pointer
register in the context of preserving the incoming FP, and then make sure that we
save the FP when re-alignment is known to be necessary.

rdar://162462271
2025-11-12 16:31:25 -08:00
Matt Arsenault
55422e804b
CodeGen: Remove TRI argument from getRegClass (#158225)
TargetInstrInfo now directly holds a reference to TargetRegisterInfo
and does not need TRI passed in anywhere.
2025-11-10 15:43:55 -08:00
Matt Arsenault
7289f2cd0c
CodeGen: Remove MachineFunction argument from getRegClass (#158188)
This is a low level utility to parse the MCInstrInfo and should
not depend on the state of the function.
2025-09-12 19:22:02 +09:00
Brad Smith
0d2e11f3e8
Remove Native Client support (#133661)
Remove the Native Client support now that it has finally reached end of life.
2025-07-15 13:22:33 -04:00
Jie Fu
feb61f5b05 [Target] Prevent copying in loop variables (NFC)
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2769:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass) {
                  ^
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2769:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass) {
       ^~~~~~~~~~~~~~~~~~~~~
                  &
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2954:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass)
                  ^
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2954:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass)
       ^~~~~~~~~~~~~~~~~~~~~
                  &
2 errors generated.
2025-06-28 19:29:00 +08:00
Kazu Hirata
094a7087b8
[Target] Use range-based for loops (NFC) (#146198) 2025-06-27 22:07:58 -07:00
Kazu Hirata
d4d8a0fe01
[ARM] Remove unused includes (NFC) (#141377)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 14:48:58 -07:00
Kazu Hirata
d144c13ae5
[Target] Remove unused local variables (NFC) (#138443) 2025-05-04 07:56:38 -07:00
Benson Chu
50320504c8 [ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute
FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.

- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2

FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so different amounts of bytes are
being pushed onto the stack depending on our target, which would
affect alignment for subsequent saves.

DPRCS1 will sum up all previous bytes that were saved, and will emit
extra instructions to ensure that its alignment is correct. My
assumption is that if DPRCS1 is able to correct its alignment to be
correct, then all subsequent saves will also have correct alignment.

Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup.  Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.

Co-authored-by: Jake Vossen <jake@vossen.dev>
Co-authored-by: Alan Phipps <a-phipps@ti.com>
2025-04-22 14:31:29 -05:00
Sergei Barannikov
2afef58e40
[ARM] Use helper class for emitting CFI instructions into MIR (#135994)
Similar to #135845.

PR: https://github.com/llvm/llvm-project/pull/135994
2025-04-17 00:03:34 +03:00
Kazu Hirata
fac8fe9cf9
[Target] Use *Set::insert_range (NFC) (#132879)
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(E);

down to:

  Set.insert_range(Range);

In some cases, we can further fold that into the set declaration.
2025-03-24 22:42:04 -07:00
Benson Chu
3b3356043c Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute"
This reverts commit 1f05703176d43a339b41a474f51c0e8b1a83c9bb.
2025-03-10 10:11:23 -05:00
Benson Chu
1f05703176 [ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute
FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.

- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2

FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so different amounts of bytes are
being pushed onto the stack depending on our target, which would
affect alignment for subsequent saves.

DPRCS1 will sum up all previous bytes that were saved, and will emit
extra instructions to ensure that its alignment is correct. My
assumption is that if DPRCS1 is able to correct its alignment to be
correct, then all subsequent saves will also have correct alignment.

Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup.  Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.

Co-authored-by: Jake Vossen <jake@vossen.dev>
Co-authored-by: Alan Phipps <a-phipps@ti.com>
2025-03-10 10:05:15 -05:00
Craig Topper
af64f0a6c2
[FrameLowering] Use MCRegister instead of Register in CalleeSavedInfo. NFC (#128095)
Callee saved registers should always be phyiscal registers. They are
often passed directly to other functions that take MCRegister like
getMinimalPhysRegClass or TargetRegisterClass::contains.

Unfortunately, sometimes the MCRegister is compared to a Register which
gave an ambiguous comparison error when the MCRegister is on the LHS.
Adding a MCRegister==Register comparison operator created more ambiguous
comparison errors elsewhere. These cases were usually comparing against
a base or frame pointer register that is a physical register in a
Register. For those I added an explicit conversion of Register to
MCRegister to fix the error.
2025-02-20 23:44:05 -08:00
Craig Topper
9d9c5619a5 [ARM] Use MCRegister instead of unsigned. NFC
Primarily around uses of getSubReg/getSuperReg.
2025-01-20 19:36:44 -08:00
Guy David
1a935d7a17
[llvm] Mark scavenging spill-slots as *spilled* stack objects. (#122673)
This seems like an oversight when copying code from other backends.
2025-01-14 10:18:31 +02:00
Kazu Hirata
0611a668d1 [ARM] Fix a warning
This patch fixes:

  llvm/lib/Target/ARM/ARMFrameLowering.cpp:1404:39: error: unused
  variable 'PushPopSplit' [-Werror,-Wunused-variable]
2024-11-19 09:20:03 -08:00
Benson Chu
d37554b69b [ARM] Specifically delineate between different GPRCS2 positions
Currently, the relative position of GPRCS2 (with respect to other
instructions in the prologue of a function) can be different depending
on the type of ARMSubtarget::PushPopSplitVariant.

When the PushPopSpiltVariant is SplitR11WindowsSEH, GPRCS2 comes
after both GPRCS1 and DPRCS2:

GPRCS1
DPRCS1
GPRCS2

However, in all other cases, GPRCS2 comes before DPRCS1, like so:

GPRCS1
GPRCS2
DPRCS1

This makes the MI walking code in ARMFrameLowering::emitPrologue a bit
confusing. If GPRCS2Size is non-zero, we also have to check the
PushPopSplitVariant to know if we will encounter the DPRCS1 push
instruction first or the GPRCS2 push instruction first.

This commit changes to SplitR11WindowsSEH such that the spill area is
as follows:

GPRCS1
DPRCS1
GPRCS3

This disambiguates a lot of the ARMFrameLowering.cpp MI traversal
code.
2024-11-19 10:40:40 -06:00
Kazu Hirata
9571cc2b28
[ARM] Remove unused includes (NFC) (#115995)
Identified with misc-include-cleaner.
2024-11-12 23:15:21 -08:00
Oliver Stannard
dff114b356
[ARM] Optimise non-ABI frame pointers (#110286)
With -fomit-frame-pointer, even if we set up a frame pointer for other
reasons (e.g. variable-sized or over-aligned stack allocations), we
don't need to create an ABI-compliant frame record. This means that we
can save all of the general-purpose registers in one push, instead of
splitting it to ensure that the frame pointer and link register are
adjacent on the stack, saving two instructions per function.
2024-10-28 09:01:06 +00:00
Oliver Stannard
493529fbce Re-land: [ARM] Fix frame chains with M-profile PACBTI (#110285)
When using AAPCS-compliant frame chains with PACBTI return address
signing, there ware a number of bugs in the generation of the frame
pointer and function prologues. The most obvious was that we sometimes
would modify r11 before pushing it to the stack, so it wasn't preserved
as required by the PCS. We also sometimes did not push R11 and LR
adjacent to one another on the stack, or used R11 as a frame pointer
without pointing it at the saved value of R11, both of which are
required to have an AAPCS compliant frame chain.

The original work of this patch was done by James Westwood, reviewed as
 #82801 and #81249, with some tidy-ups done by Mark Murray and myself.
2024-10-24 16:44:16 +01:00
Oliver Stannard
18ac0178ad Revert "[ARM] Fix frame chains with M-profile PACBTI (#110285)"
Reverting because this is causing failures with MSan:
https://lab.llvm.org/buildbot/#/builders/169/builds/4378

This reverts commit e1f8f84acec05997893c305c78fbf7feecf44dd7.
2024-10-18 09:04:28 +01:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
gxlayer
4a2bd78f5b
[ARM] Fix -mno-omit-leaf-frame-pointer flag doesn't works on 32-bit ARM (#109628)
The -mno-omit-leaf-frame-pointer flag works on 32-bit ARM architectures
and addresses the bug reported in #108019
2024-10-17 20:25:06 +08:00
Jie Fu
584e00a316 [ARM] Fix -Wunused-variable in ARMFrameLowering.cpp (NFC)
/llvm-project/llvm/lib/Target/ARM/ARMFrameLowering.cpp:1028:9:
error: unused variable 'FPOffset' [-Werror,-Wunused-variable]
    int FPOffset = MFI.getObjectOffset(FramePtrSpillFI);
        ^
1 error generated.
2024-10-17 18:46:26 +08:00
Oliver Stannard
e1f8f84ace
[ARM] Fix frame chains with M-profile PACBTI (#110285)
When using AAPCS-compliant frame chains with PACBTI return address
signing, there ware a number of bugs in the generation of the frame
pointer and function prologues. The most obvious was that we sometimes
would modify r11 before pushing it to the stack, so it wasn't preserved
as required by the PCS. We also sometimes did not push R11 and LR
adjacent to one another on the stack, or used R11 as a frame pointer
without pointing it at the saved value of R11, both of which are
required to have an AAPCS compliant frame chain.

The original work of this patch was done by James Westwood, reviewed as
 #82801 and #81249, with some tidy-ups done by Mark Murray and myself.
2024-10-17 09:32:44 +01:00
Oliver Stannard
754c1f2170 [ARM] Add debug dump for StackAdjustingInsts (NFC) (#110283) 2024-10-09 09:29:28 +01:00
Oliver Stannard
e817cfde41 [ARM] Refactor generation of push/pop instructions (NFC) (#110283)
These used a set of callback functions to check which callee-save area a
register is in, refactor them to use the same data as other parts of
ARMFrameLowering. This will make it easier to add a new variant to the
register splitting.
2024-10-09 09:29:27 +01:00
Oliver Stannard
2ecf2e242b [ARM] Factor out code to determine spill areas (NFC) (#110283)
There were multiple loops in ARMFrameLowering which sort the callee
saved registers into spill areas, which were hard to understand and
modify. This splits the information about which register is in which
save area into a separate function.
2024-10-09 09:29:27 +01:00
Oliver Stannard
67200f5dc8 [ARM] Tidy up stack frame strategy code (NFC) (#110283)
We have two different ways of splitting the pushes of callee-saved
registers onto the stack, controlled by the confusingly similar names
STI.splitFramePushPop() and STI.splitFramePointerPush(). This removes
those functions and replaces them with a single function which returns
an enum. This is in preparation for adding another value to that enum.

The original work of this patch was done by James Westwood, reviewed as
 #82801 and #81249, with some tidy-ups done by Mark Murray and myself.
2024-10-09 09:29:27 +01:00
Wesley Wiser
ca076f7a63
[LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263)
Rebase of #84114. I've only included the core changes to frame layout
calculation & CFI generation which sidesteps the regressions found after
merging #84114. Since these changes are a necessary precursor to the
overall fix and are themselves slightly beneficial as CFI is now
generated correctly, I think it is reasonable to merge this first step.

---

For very large stack frames, the offset from the stack pointer to a
local can be more than 2^31 which overflows various `int` offsets in the
frame lowering code.

This patch updates the frame lowering code to calculate the offsets as
64-bit values and fixes CFI to use the corrected sizes.

After this patch, additional work is needed to fix offset truncations in
each target's codegen.
2024-07-23 09:43:30 -07:00
Matt Arsenault
0f0cfcff2c
CodeGen: Avoid some references to MachineFunction's getMMI (#99652)
MachineFunction's probably should not include a backreference to
the owning MachineModuleInfo. Most of these references were used
just to query the MCContext, which MachineFunction already directly
stores. Other contexts are using it to query the LLVMContext, which
can already be accessed through the IR function reference.
2024-07-19 22:09:05 +04:00
Kazu Hirata
3e47f6ba4a Rapply "[Target] Use range-based for loops (NFC) (#98844)"
This iteration drops hunks where the loop body adds more elements.
2024-07-17 19:39:04 -07:00
Kazu Hirata
515618e245 Revert "[Target] Use range-based for loops (NFC) (#98844)"
This reverts commit 3614f65a7ba9d925010e3316a1d93bcebc632178.

fixupImmediateBr seems to resize ImmBranches.
2024-07-15 20:39:49 -07:00
Kazu Hirata
3614f65a7b
[Target] Use range-based for loops (NFC) (#98844) 2024-07-15 17:23:11 -07:00
Oliver Stannard
1a5239251e
[ARM] r11 is reserved when using -mframe-chain=aapcs (#86951)
When using the -mframe-chain=aapcs or -mframe-chain=aapcs-leaf options,
we cannot use r11 as an allocatable register, even if
-fomit-frame-pointer is also used. This is so that r11 will always point
to a valid frame record, even if we don't create one in every function.
2024-06-07 10:58:10 +01:00
Eleanor Bonnici
c12bc57e23
Do not use R12 for indirect tail calls with PACBTI (#82661)
When compiling for thumbv8.1m with +pacbti and making an indirect tail
call, the compiler was free to put the function pointer into R12.

This is incorrect because R12 is restored to contain authentication code
for the caller's return address.

This patch excludes R12 from the set of registers the compiler can put
the function pointer in.

Fixes https://github.com/llvm/llvm-project/issues/75998
2024-04-30 15:29:07 +01:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
Simon Pilgrim
78f0871bee Revert rG58de1e2c5eee548a9b365e3b1554d87317072ad9 "Fix stack layout for frames larger than 2gb (#84114)"
This is failing on some EXPENSIVE_CHECKS buildbots
2024-03-27 16:16:15 +00:00
Wesley Wiser
58de1e2c5e
Fix stack layout for frames larger than 2gb (#84114)
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.

This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.

Fixes #48911
2024-03-27 15:05:58 +00:00
James Westwood
b2c16e7ff4
Revert "[ARM] R11 not pushed adjacent to link register with PAC-M and… (#84019)
… AAPCS frame chain fix (#82801)"

This reverts commit 00e4a4197137410129d4725ffb82bae9ce44bdde. This patch
was found to cause miscompilations and compilation failures.
2024-03-05 14:34:43 +00:00
James Westwood
00e4a41971
[ARM] R11 not pushed adjacent to link register with PAC-M and AAPCS frame chain fix (#82801)
When code for M class architecture was compiled with AAPCS and PAC
enabled, the frame pointer, r11, was not pushed to the stack adjacent to
the link register. Due to PAC being enabled, r12 was placed between r11
and lr. This patch fixes this by adding an extra case to the already
existing code that splits the GPR push in two when R11 is the frame
pointer and certain paremeters are met. The differential revision for
this previous change can be found here:
https://reviews.llvm.org/D125649. This now ensures that r11 and lr are
pushed in a separate push instruction to the other GPRs when PAC and
AAPCS are enabled, meaning the frame pointer and link register are now
pushed onto the stack adjacent to each other.
2024-03-04 12:00:36 +00:00
ostannard
749384c08e
[ARM] Update IsRestored for LR based on all returns (#82745)
PR #75527 fixed ARMFrameLowering to set the IsRestored flag for LR based
on all of the return instructions in the function, not just one.
However, there is also code in ARMLoadStoreOptimizer which changes
return instructions, but it set IsRestored based on the one instruction
it changed, not the whole function.

The fix is to factor out the code added in #75527, and also call it from
ARMLoadStoreOptimizer if it made a change to return instructions.

Fixes #80287.
2024-02-26 12:23:25 +00:00
Kazu Hirata
af8d050286 [Target] Use range-based for loops (NFC) 2023-12-24 23:09:55 -08:00
Florian Hahn
b1a5ee1feb
[ARM] Check all terms in emitPopInst when clearing Restored for LR. (#75527)
emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of there terminators uses LR implicitly, it is not
save to clear the Restored bit.

Check all terminators for the function before clearing Restored.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Alternative to #73553
2023-12-20 16:56:15 +01:00
John Brawn
fae3f9ec4f [ARM] Fix prologue/epilogue for pacbti-m leaf functions
R12 is callee-saved in functions with pacbti-m enabled, but this is
done in assignCalleeSavedSpillSlots, meaning that in
determineCalleeSaves we have to manually set CanEliminateFrame.

This fixes a bug where in leaf functions with no other callee-saved
registers the aut instruction wouldn't be emitted and stack offsets
of arguments passed on the stack would be incorrect.

Differential Revision: https://reviews.llvm.org/D157865
2023-09-04 13:46:01 +01:00