llvm-project

Author	SHA1	Message	Date
Oliver Stannard	dff114b356	[ARM] Optimise non-ABI frame pointers (#110286 ) With -fomit-frame-pointer, even if we set up a frame pointer for other reasons (e.g. variable-sized or over-aligned stack allocations), we don't need to create an ABI-compliant frame record. This means that we can save all of the general-purpose registers in one push, instead of splitting it to ensure that the frame pointer and link register are adjacent on the stack, saving two instructions per function.	2024-10-28 09:01:06 +00:00
Oliver Stannard	493529fbce	Re-land: [ARM] Fix frame chains with M-profile PACBTI (#110285 ) When using AAPCS-compliant frame chains with PACBTI return address signing, there were a number of bugs in the generation of the frame pointer and function prologues. The most obvious was that we sometimes would modify r11 before pushing it to the stack, so it wasn't preserved as required by the PCS. We also sometimes did not push R11 and LR adjacent to one another on the stack, or used R11 as a frame pointer without pointing it at the saved value of R11, both of which are required to have an AAPCS compliant frame chain. The original work of this patch was done by James Westwood, reviewed as #82801 and #81249, with some tidy-ups done by Mark Murray and myself.	2024-10-24 16:44:16 +01:00
Oliver Stannard	18ac0178ad	Revert "[ARM] Fix frame chains with M-profile PACBTI (#110285 )" Reverting because this is causing failures with MSan: https://lab.llvm.org/buildbot/#/builders/169/builds/4378 This reverts commit e1f8f84acec05997893c305c78fbf7feecf44dd7.	2024-10-18 09:04:28 +01:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014 ) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
gxlayer	4a2bd78f5b	[ARM] Fix -mno-omit-leaf-frame-pointer flag not working on 32-bit ARM (#109628 ) Makes the -mno-omit-leaf-frame-pointer flag work on 32-bit ARM architectures, addressing the bug reported in #108019	2024-10-17 20:25:06 +08:00
Jie Fu	584e00a316	[ARM] Fix -Wunused-variable in ARMFrameLowering.cpp (NFC) /llvm-project/llvm/lib/Target/ARM/ARMFrameLowering.cpp:1028:9: error: unused variable 'FPOffset' [-Werror,-Wunused-variable] int FPOffset = MFI.getObjectOffset(FramePtrSpillFI); ^ 1 error generated.	2024-10-17 18:46:26 +08:00
Oliver Stannard	e1f8f84ace	[ARM] Fix frame chains with M-profile PACBTI (#110285 ) When using AAPCS-compliant frame chains with PACBTI return address signing, there were a number of bugs in the generation of the frame pointer and function prologues. The most obvious was that we sometimes would modify r11 before pushing it to the stack, so it wasn't preserved as required by the PCS. We also sometimes did not push R11 and LR adjacent to one another on the stack, or used R11 as a frame pointer without pointing it at the saved value of R11, both of which are required to have an AAPCS compliant frame chain. The original work of this patch was done by James Westwood, reviewed as #82801 and #81249, with some tidy-ups done by Mark Murray and myself.	2024-10-17 09:32:44 +01:00
Oliver Stannard	754c1f2170	[ARM] Add debug dump for StackAdjustingInsts (NFC) (#110283 )	2024-10-09 09:29:28 +01:00
Oliver Stannard	e817cfde41	[ARM] Refactor generation of push/pop instructions (NFC) (#110283 ) These used a set of callback functions to check which callee-save area a register is in, refactor them to use the same data as other parts of ARMFrameLowering. This will make it easier to add a new variant to the register splitting.	2024-10-09 09:29:27 +01:00
Oliver Stannard	2ecf2e242b	[ARM] Factor out code to determine spill areas (NFC) (#110283 ) There were multiple loops in ARMFrameLowering which sort the callee saved registers into spill areas, which were hard to understand and modify. This splits the information about which register is in which save area into a separate function.	2024-10-09 09:29:27 +01:00
Oliver Stannard	67200f5dc8	[ARM] Tidy up stack frame strategy code (NFC) (#110283 ) We have two different ways of splitting the pushes of callee-saved registers onto the stack, controlled by the confusingly similar names STI.splitFramePushPop() and STI.splitFramePointerPush(). This removes those functions and replaces them with a single function which returns an enum. This is in preparation for adding another value to that enum. The original work of this patch was done by James Westwood, reviewed as #82801 and #81249, with some tidy-ups done by Mark Murray and myself.	2024-10-09 09:29:27 +01:00
Wesley Wiser	ca076f7a63	[LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263 ) Rebase of #84114. I've only included the core changes to frame layout calculation & CFI generation which sidesteps the regressions found after merging #84114. Since these changes are a necessary precursor to the overall fix and are themselves slightly beneficial as CFI is now generated correctly, I think it is reasonable to merge this first step. --- For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code. This patch updates the frame lowering code to calculate the offsets as 64-bit values and fixes CFI to use the corrected sizes. After this patch, additional work is needed to fix offset truncations in each target's codegen.	2024-07-23 09:43:30 -07:00
Matt Arsenault	0f0cfcff2c	CodeGen: Avoid some references to MachineFunction's getMMI (#99652 ) MachineFunction's probably should not include a backreference to the owning MachineModuleInfo. Most of these references were used just to query the MCContext, which MachineFunction already directly stores. Other contexts are using it to query the LLVMContext, which can already be accessed through the IR function reference.	2024-07-19 22:09:05 +04:00
Kazu Hirata	3e47f6ba4a	Reapply "[Target] Use range-based for loops (NFC) (#98844 )" This iteration drops hunks where the loop body adds more elements.	2024-07-17 19:39:04 -07:00
Kazu Hirata	515618e245	Revert "[Target] Use range-based for loops (NFC) (#98844 )" This reverts commit 3614f65a7ba9d925010e3316a1d93bcebc632178. fixupImmediateBr seems to resize ImmBranches.	2024-07-15 20:39:49 -07:00
Kazu Hirata	3614f65a7b	[Target] Use range-based for loops (NFC) (#98844 )	2024-07-15 17:23:11 -07:00
Oliver Stannard	1a5239251e	[ARM] r11 is reserved when using -mframe-chain=aapcs (#86951 ) When using the -mframe-chain=aapcs or -mframe-chain=aapcs+leaf options, we cannot use r11 as an allocatable register, even if -fomit-frame-pointer is also used. This is so that r11 will always point to a valid frame record, even if we don't create one in every function.	2024-06-07 10:58:10 +01:00
Eleanor Bonnici	c12bc57e23	Do not use R12 for indirect tail calls with PACBTI (#82661 ) When compiling for thumbv8.1m with +pacbti and making an indirect tail call, the compiler was free to put the function pointer into R12. This is incorrect because R12 is restored to contain authentication code for the caller's return address. This patch excludes R12 from the set of registers the compiler can put the function pointer in. Fixes https://github.com/llvm/llvm-project/issues/75998	2024-04-30 15:29:07 +01:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Simon Pilgrim	78f0871bee	Revert rG58de1e2c5eee548a9b365e3b1554d87317072ad9 "Fix stack layout for frames larger than 2gb (#84114 )" This is failing on some EXPENSIVE_CHECKS buildbots	2024-03-27 16:16:15 +00:00
Wesley Wiser	58de1e2c5e	Fix stack layout for frames larger than 2gb (#84114 ) For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code. This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames. Fixes #48911	2024-03-27 15:05:58 +00:00
James Westwood	b2c16e7ff4	Revert "[ARM] R11 not pushed adjacent to link register with PAC-M and… (#84019 ) … AAPCS frame chain fix (#82801)" This reverts commit 00e4a4197137410129d4725ffb82bae9ce44bdde. This patch was found to cause miscompilations and compilation failures.	2024-03-05 14:34:43 +00:00
James Westwood	00e4a41971	[ARM] R11 not pushed adjacent to link register with PAC-M and AAPCS frame chain fix (#82801 ) When code for M class architecture was compiled with AAPCS and PAC enabled, the frame pointer, r11, was not pushed to the stack adjacent to the link register. Due to PAC being enabled, r12 was placed between r11 and lr. This patch fixes this by adding an extra case to the already existing code that splits the GPR push in two when R11 is the frame pointer and certain parameters are met. The differential revision for this previous change can be found here: https://reviews.llvm.org/D125649. This now ensures that r11 and lr are pushed in a separate push instruction to the other GPRs when PAC and AAPCS are enabled, meaning the frame pointer and link register are now pushed onto the stack adjacent to each other.	2024-03-04 12:00:36 +00:00
ostannard	749384c08e	[ARM] Update IsRestored for LR based on all returns (#82745 ) PR #75527 fixed ARMFrameLowering to set the IsRestored flag for LR based on all of the return instructions in the function, not just one. However, there is also code in ARMLoadStoreOptimizer which changes return instructions, but it set IsRestored based on the one instruction it changed, not the whole function. The fix is to factor out the code added in #75527, and also call it from ARMLoadStoreOptimizer if it made a change to return instructions. Fixes #80287.	2024-02-26 12:23:25 +00:00
Kazu Hirata	af8d050286	[Target] Use range-based for loops (NFC)	2023-12-24 23:09:55 -08:00
Florian Hahn	b1a5ee1feb	[ARM] Check all terms in emitPopInst when clearing Restored for LR. (#75527 ) emitPopInst checks a single function exit MBB. If other paths also exit the function and any of their terminators uses LR implicitly, it is not safe to clear the Restored bit. Check all terminators for the function before clearing Restored. This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll where the machine-outliner previously introduced BLs that clobbered LR which in turn is used by the tail call return. Alternative to #73553	2023-12-20 16:56:15 +01:00
John Brawn	fae3f9ec4f	[ARM] Fix prologue/epilogue for pacbti-m leaf functions R12 is callee-saved in functions with pacbti-m enabled, but this is done in assignCalleeSavedSpillSlots, meaning that in determineCalleeSaves we have to manually set CanEliminateFrame. This fixes a bug where in leaf functions with no other callee-saved registers the aut instruction wouldn't be emitted and stack offsets of arguments passed on the stack would be incorrect. Differential Revision: https://reviews.llvm.org/D157865	2023-09-04 13:46:01 +01:00
Oliver Stannard	40614e1c14	[ARM] Save and restore CPSR around tMOVimm32 When resolving a frame index with a large offset for v6M execute-only, we emit a tMOVimm32 pseudo-instruction, which later gets lowered to a sequence of instructions, all of which are flag-setting. However, a frame index may be generated for a register spill or reload instruction, which can be inserted at a point where CPSR is live. This patch inserts MRS and MSR instructions around the tMOVimm32 to save and restore the value of CPSR, if CPSR is live at that point. This may need up to two virtual registers (one to build the immediate value, one to save CPSR) during frame index lowering, which happens after register allocation, so we need to ensure two spill slots are available to the register scavenger to ensure it can free up enough registers for this. There is no test for the emission (or not) of the MRS/MSR pair, because it requires a spill or reload to be inserted at a point where CPSR is live, which requires a large, complex function and is fragile enough that any optimisation changes will break the test. This bug was easily found by csmith with -verify-machineinstrs, which I now run regularly on v6M execute-only (and many other combinations). Patch by John Brawn and myself. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D158404	2023-08-24 14:15:02 +01:00
John Brawn	8336d38be9	[ARM] Correctly handle combining segmented stacks with execute-only Using segmented stacks with execute-only mostly works, but we need to use the correct movi32 opcode in 6-M, and there's one place where for thumb1 (i.e. 6-M and 8-M.base) a constant pool was unconditionally used which needed to be fixed. Differential Revision: https://reviews.llvm.org/D156339	2023-07-28 10:37:40 +01:00
Zhiyao Ma	1d0ccebcd7	[ARM] Don't allocate memory if free space in segmented stack is just enough Assuming that the stack grows downwards, it is fine if the stack pointer is exactly at the stacklet boundary. We should use less-or-equal condition when deciding whether to skip new memory allocation. Differential Revision: https://reviews.llvm.org/D149315	2023-05-02 13:09:49 +01:00
Kazu Hirata	4241d890ae	[Target] Use range-based for loops (NFC)	2023-04-15 14:14:56 -07:00
Simon Pilgrim	b206145323	ARMFrameLowering.cpp - fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC.	2023-03-31 10:34:19 +01:00
Martin Storsjö	c5383536cb	[ARM] Handle generating SEH unwind info for t2STR_PRE/t2LDR_POST This fixes compiling some uncommon cases. Differential Revision: https://reviews.llvm.org/D147212	2023-03-31 10:22:28 +03:00
Matt Arsenault	c16a58b36c	Attributes: Add function getter to parse integer string attributes The most common case for string attributes parses them as integers. We don't have a convenient way to do this, and as a result we have inconsistent missing attribute and invalid attribute handling scattered around. We also have inconsistent radix usage to getAsInteger; some places use the default 0 and others use base 10. Update a few of the uses, but there are quite a lot of these.	2022-12-14 13:12:35 -05:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Lucas Prates	70a5c52534	[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records Currently, an AAPCS compliant frame record is not always created for functions when it should be. Although a consistent frame record might not be required in some cases, there are still scenarios where applications may want to make use of the call hierarchy made available through it. In order to enable the use of AAPCS compliant frame records whilst keeping backwards compatibility, this patch introduces a new command-line option (`-mframe-chain=[none|aapcs|aapcs+leaf]`) for AArch32 and Thumb backends. The option allows users to explicitly select when to use it, and is also useful to ensure the extra overhead introduced by the frame records is only introduced when necessary, in particular for Thumb targets. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125094	2022-06-27 14:08:48 +01:00
Krasimir Georgiev	8f2ba36336	Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records AND [NFC][Thumb] Update frame-chain codegen test to use thumbv6m" This reverts commit 7625e01d661644a560884057755d48a0da8b77b4 and dependent cbcce82ef6b512d97e92a319a75a03e997c844e1. Commit 7625e01d661644a560884057755d48a0da8b77b4 causes some new codegen test failures under asan, e.g., CodeGen/ARM/execute-only.ll: https://lab.llvm.org/buildbot/#/builders/5/builds/24659/steps/15/logs/stdio.	2022-06-15 16:10:02 +02:00
Lucas Prates	7625e01d66	[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records Currently, an AAPCS compliant frame record is not always created for functions when it should be. Although a consistent frame record might not be required in some cases, there are still scenarios where applications may want to make use of the call hierarchy made available through it. In order to enable the use of AAPCS compliant frame records whilst keeping backwards compatibility, this patch introduces a new command-line option (`-mframe-chain=[none|aapcs|aapcs+leaf]`) for AArch32 and Thumb backends. The option allows users to explicitly select when to use it, and is also useful to ensure the extra overhead introduced by the frame records is only introduced when necessary, in particular for Thumb targets. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125094	2022-06-14 13:37:51 +01:00
Lucas Prates	33b9ad647e	Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records" Reverting change due to test failure. This reverts commit 6119053dab67129eb1700dbf36db3524dd3e421f.	2022-06-13 11:00:49 +01:00
Lucas Prates	6119053dab	[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records Currently, an AAPCS compliant frame record is not always created for functions when it should be. Although a consistent frame record might not be required in some cases, there are still scenarios where applications may want to make use of the call hierarchy made available through it. In order to enable the use of AAPCS compliant frame records whilst keeping backwards compatibility, this patch introduces a new command-line option (`-mframe-chain=[none|aapcs|aapcs+leaf]`) for AArch32 and Thumb backends. The option allows users to explicitly select when to use it, and is also useful to ensure the extra overhead introduced by the frame records is only introduced when necessary, in particular for Thumb targets. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125094	2022-06-13 10:21:06 +01:00
Martin Storsjö	485432f3c8	[ARM] Make a narrow tMOVi8 where possible in SEH prologues We intentionally disable Thumb2SizeReduction for SEH prologues/epilogues, to avoid needing to guess what will happen with the instructions in a potential future pass in frame lowering. But for this specific case, where we know we can express the intent with a narrow instruction, change to that instruction form directly in frame lowering. Differential Revision: https://reviews.llvm.org/D126949	2022-06-03 22:33:55 +03:00
Martin Storsjö	bd52506d24	[ARM] Make narrow push/pop in SEH prologues/epilogues where applicable We intentionally disable Thumb2SizeReduction for SEH prologues/epilogues, to avoid needing to guess what will happen with the instructions in a potential future pass in frame lowering. But for this specific case, where we know we can express the intent with a narrow instruction, change to that instruction form directly in frame lowering. Differential Revision: https://reviews.llvm.org/D126948	2022-06-03 22:33:55 +03:00
Martin Storsjö	40c937cba2	[ARM] Fix restoring stack for varargs with SEH split frame pointer push Previously, the "add sp, #12" ended up inserted after "bx lr". Differential Revision: https://reviews.llvm.org/D126872	2022-06-03 09:32:00 +03:00
Martin Storsjö	2ab19bfa41	[ARM] Adjust the frame pointer when it's needed for SEH unwinding For functions that require restoring SP from FP (e.g. that need to align the stack, or that have variable sized allocations), the prologue and epilogue previously used to look like this: push {r4-r5, r11, lr} add r11, sp, #8 ... sub r4, r11, #8 mov sp, r4 pop {r4-r5, r11, pc} This is problematic, because this unwinding operation (restoring sp from r11 - offset) can't be expressed with the SEH unwind opcodes (probably because this unwind procedure doesn't map exactly to individual instructions; note the detour via r4 in the epilogue too). To make unwinding work, the GPR push is split into two; the first one pushing all other registers, and the second one pushing r11+lr, so that r11 can be set pointing at this spot on the stack: push {r4-r5} push {r11, lr} mov r11, sp ... mov sp, r11 pop {r11, lr} pop {r4-r5} bx lr For the same setup, MSVC generates code that uses two registers; r11 still pointing at the {r11,lr} pair, but a separate register used for restoring the stack at the end: push {r4-r5, r7, r11, lr} add r11, sp, #12 mov r7, sp ... mov sp, r7 pop {r4-r5, r7, r11, pc} For cases with clobbered float/vector registers, they are pushed after the GPRs, before the {r11,lr} pair. Differential Revision: https://reviews.llvm.org/D125649	2022-06-02 12:28:46 +03:00
Martin Storsjö	d8e67c1ccc	[ARM] Add SEH opcodes in frame lowering Skip inserting regular CFI instructions if using WinCFI. This is based a fair amount on the corresponding ARM64 implementation, but instead of trying to insert the SEH opcodes one by one where we generate other prolog/epilog instructions, we try to walk over the whole prolog/epilog range and insert them. This is done because in many cases, the exact number of instructions inserted is abstracted away deeper. For some cases, we manually insert specific SEH opcodes directly where instructions are generated, where the automatic mapping of instructions to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes). Skip Thumb2SizeReduction for SEH prologs/epilogs, and force tail calls to wide instructions (just like on MachO), to make sure that the unwind info actually matches the width of the final instructions, without heuristics about what later passes will do. Mark SEH instructions as scheduling boundaries, to make sure that they aren't reordered away from the instruction they describe by PostRAScheduler. Mark the SEH instructions with the NoMerge flag, to avoid doing tail merging of functions that have multiple epilogs that all end with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue". Differential Revision: https://reviews.llvm.org/D125648	2022-06-02 12:28:46 +03:00
Zongwei Lan	ad73ce318e	[Target] use getSubtarget<> instead of static_cast<>(getSubtarget()) Differential Revision: https://reviews.llvm.org/D125391	2022-05-26 11:22:41 -07:00
Zhiyao Ma	bd606afe26	[ARM] Only update the successor edges for immediate predecessors of PrologueMBB When adjusting the function prologue for segmented stacks, only update the successor edges of the immediate predecessors of the original prologue. Differential Revision: https://reviews.llvm.org/D122959	2022-05-03 12:36:35 +01:00
Matt Arsenault	d7938b1a81	MachineModuleInfo: Move HasSplitStack handling to AsmPrinter This is used to emit one field in doFinalization for the module. We can accumulate this when emitting all individual functions directly in the AsmPrinter, rather than accumulating additional state in MachineModuleInfo. Move the special case behavior predicate into MachineFrameInfo to share it. This now promotes it to generic behavior. I'm assuming this is fine because no other target implements adjustForSegmentedStacks, or has tests using the split-stack attribute.	2022-04-20 10:54:29 -04:00
Zhiyao Ma	adc26b4eae	[ARM] Fix 8-bit immediate overflow in the instruction of segmented stack prologue. Fixes the overflow of the 8-bit immediate field in the emitted instruction that allocates a large stacklet. For thumb2 targets, load the large immediate with a pair of movw and movt instructions. For thumb1 and ARM targets, load the large immediate by reading from the literal pool. Differential Revision: https://reviews.llvm.org/D118545	2022-03-10 15:15:24 -08:00

306 Commits