llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	8634e358eb	[AArch64][ARM] Avoid some APFloat copies in tablegen patterns. NFC. (#114416 ) Either the N->getValueAPF() was being unused or we were failing to make use of it returning a const APFloat&	2024-11-01 18:14:59 +00:00
Oliver Stannard	33411d5207	[ARM] Fix CMSE S->NS calls when CONTROL_S.SFPA==0 (CVE-2024-7883) (#114433 ) When doing a call from CMSE secure state to non-secure state for v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and restore the FP registers around the call. These instructions both check the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents of the FP registers are not secret) they execute as no-ops. This causes a problem when CONTROL_S.SFPA==0 before the call, which happens if there are no floating-point instructions executed between entry to secure state and the call. If this is the case, then the VLSTM instruction will do nothing, leaving the save area in the stack uninitialised. If the called function returns a value in floating-point registers, the call sequence includes an instruction to copy the return value from a floating-point register to a GPR, which must be before the VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM will fully execute, and load the uninitialised stack memory into the FP registers. This causes two problems: * The FP register file is clobbered, including all of the callee-saved registers, which might contain live values. * The stack region might contain secret values, which will be leaked to non-secure state through the floating-point registers if/when we return to non-secure state. The fix is to insert a `vmov s0, s0` instruction before the VLSTM instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and VLSTM instruction. CVE: https://www.cve.org/cverecord?id=CVE-2024-7883 Security bulletin: https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability	2024-11-01 09:36:13 +00:00
dnsampaio	28d0718033	[DAGCombiner] Add combine avg from shifts (#113909 ) This teaches dagcombiner to fold: `(asr (add nsw x, y), 1) -> (avgfloors x, y)` `(lsr (add nuw x, y), 1) -> (avgflooru x, y)` as well as to combine them into a ceil variant: `(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` `(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)` iff valid for the target. Removes some of the ARM MVE patterns that are now dead code. It adds the avg opcodes to `IsQRMVEInstruction` so as to preserve the immediate splatting as before.	2024-10-31 10:57:27 +01:00
Craig Topper	50896e7ef5	[ARM] Use getSignedConstant. NFC	2024-10-30 21:43:16 -07:00
Fangrui Song	facdae62b7	[MCInstPrinter] Make printRegName non-const Similar to printInst. printRegName may change states (e.g. #113834).	2024-10-29 19:14:54 -07:00
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825 ) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
Oliver Stannard	dff114b356	[ARM] Optimise non-ABI frame pointers (#110286 ) With -fomit-frame-pointer, even if we set up a frame pointer for other reasons (e.g. variable-sized or over-aligned stack allocations), we don't need to create an ABI-compliant frame record. This means that we can save all of the general-purpose registers in one push, instead of splitting it to ensure that the frame pointer and link register are adjacent on the stack, saving two instructions per function.	2024-10-28 09:01:06 +00:00
Kazu Hirata	60d2feded5	[ARM] Remove a redundant call to StringRef::slice (NFC) (#113783 ) OptStr.slice(0, OptStr.size()) is exactly the same as OptStr.	2024-10-26 22:07:56 -07:00
Kazu Hirata	242c77018f	[ARM] clang-format (NFC) I'm planning to post a patch in this area.	2024-10-26 19:29:49 -07:00
Oliver Stannard	376d7b27fa	[ARM] Optimise byval arguments in tail-calls We don't need to copy byval arguments to tail calls via a temporary, if we can prove that we are not copying from the outgoing argument area. This patch does this when the source of the argument is one of: * Memory in the local stack frame, which can't be used for tail-call arguments. * A global variable. We can also avoid doing the copy completely if the source and destination are the same memory location, which is the case when the caller and callee have the same signature, and pass some arguments through unmodified.	2024-10-25 09:34:09 +01:00
Oliver Stannard	914a3990d1	[ARM] Avoid clobbering byval arguments when passing to tail-calls When passing byval arguments to tail-calls, we need to store them into the stack memory in which the caller received its arguments. If any of the outgoing arguments are forwarded from incoming byval arguments, then the source of the copy is from the same stack memory. This can result in the copy corrupting a value which is still to be read. The fix is to first make a copy of the outgoing byval arguments in local stack space, and then copy them to their final location. This fixes the correctness issue, but results in extra copying, which could be optimised.	2024-10-25 09:34:09 +01:00
Oliver Stannard	78ec2e2ed5	[ARM] Allow tail calls with byval args Byval arguments which are passed partially in registers get stored into the local stack frame, but it is valid to tail-call them because the part which gets spilled is always re-loaded into registers before doing the tail-call, so it's OK for the spill area to be deallocated.	2024-10-25 09:34:08 +01:00
Oliver Stannard	82e6472197	[ARM] Allow functions with sret returns to be tail-called It is valid to tail-call a function which returns through an sret argument, as long as we have an incoming sret pointer to pass on.	2024-10-25 09:34:08 +01:00
Oliver Stannard	c1eb790cd2	[ARM] Tail-calls do not require caller and callee arguments to match The ARM backend was checking that the outgoing values for a tail-call matched the incoming argument values of the caller. This isn't necessary, because the caller can change the values in both registers and the stack before doing the tail-call. The actual limitation is that the callee can't need more stack space for its arguments than the caller does. This is needed for code using the musttail attribute, as well as enabling tail calls as an optimisation in more cases.	2024-10-25 09:34:08 +01:00
Oliver Stannard	246baeb5fe	[ARM] Add debug trace for tail-call optimisation There are lots of reasons a call might not be eligible for tail-call optimisation; this adds debug trace to help understand the compiler's decisions here.	2024-10-25 09:34:08 +01:00
Oliver Stannard	8e289e4fa6	[ARM] Fix comment typo	2024-10-25 09:34:07 +01:00
Oliver Stannard	493529fbce	Re-land: [ARM] Fix frame chains with M-profile PACBTI (#110285 ) When using AAPCS-compliant frame chains with PACBTI return address signing, there were a number of bugs in the generation of the frame pointer and function prologues. The most obvious was that we sometimes would modify r11 before pushing it to the stack, so it wasn't preserved as required by the PCS. We also sometimes did not push R11 and LR adjacent to one another on the stack, or used R11 as a frame pointer without pointing it at the saved value of R11, both of which are required to have an AAPCS compliant frame chain. The original work of this patch was done by James Westwood, reviewed as #82801 and #81249, with some tidy-ups done by Mark Murray and myself.	2024-10-24 16:44:16 +01:00
Nashe Mncube	e37d736def	Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289 ) This is a recommit of #107120 . The original PR was approved but failed buildbot. The newly added tests should only be run for compilers that support the ARM target. This has been resolved by adding a config file for these tests. - Pass optimizes memcpy's by padding out destinations and sources to a full word to make ARM backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant string. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works at the midend level	2024-10-24 10:12:01 +01:00
Benson Chu	0b32769444	[ARM] Apply sign-return-address attribute to outlined function This makes checking for whether PAC is necessary simpler when building the outlined frame.	2024-10-23 08:50:56 -07:00
David Spickett	dd76d9b1bb	[llvm][ARM] Correct the properties of trap instructions (#113287 ) Fixes #113154 The encodings used for llvm.trap() on ARM were all marked as barriers and terminators. This led to stack frame destroy code being inserted before the trap if the trap was the last thing in the function and it had no return statement. ``` void fn() { volatile int i = 0; __builtin_trap(); } ``` Produced: ``` fn: push {r11, lr} << stack frame create <...> mov sp, r11 pop {r11, lr} << stack frame destroy .inst 0xe7ffdefe << trap bx lr ``` All the other targets don't mark them this way, instead they mark them with isTrap. I've changed ARM to do this, which fixes the code generation: ``` fn: push {r11, lr} << stack frame create <...> .inst 0xe7ffdefe << trap mov sp, r11 pop {r11, lr} << stack frame destroy bx lr ``` I've updated the existing trap test to force the need for a stack frame, then check that the instruction immediately after the trap is resetting the stack pointer. debugtrap was already working but I've added the same checks for it anyway.	2024-10-23 09:06:12 +01:00
jofrn	fe480cf923	[ARM] Use proper types for these records. (#113370 ) llvm#112904 will add typechecking to submulticlass arguments, and these ones are currently mistyped.	2024-10-22 18:17:52 -04:00
Jack Styles	a4d6fe54a7	Reland "[llvm][ARM] Add Addend Checks for MOVT and MOVW instructions. (PR #111970 )" (#112877 ) Change relanded after feedback on failures and improvements to the check of the addend. Original PR #111970 Changes from original patch: - The value that is being checked has changed, it is now correctly checking any Addend for the instruction, rather than the Value. The addend is kept within the Target data structure from my investigation. - Removed changes to the following tests due to the original behaviour being correct, and my original patch causing unexpected errors - llvm/test/MC/ARM/Windows/mov32t-range.s - llvm/test/MC/MachO/ARM/thumb2-movw-fixup.s As per the ARM ABI, the MOVT and MOVW instructions should have addends that fall within a 16bit signed range. LLVM does not check this so it is possible to use addends that are beyond the accepted range. These addends are silently truncated. A new check is added to ensure the addend falls within the expected range, rejecting an addend that falls outside with an error. Information relating to the ABI requirements can be found here: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst#addends-and-pc-bias-compensation	2024-10-22 08:18:09 +01:00
Alex Rønne Petersen	5785cbb405	[llvm] Ensure that soft float targets don't emit `fma()` libcalls. (#106615 ) The previous behavior could be harmful in some edge cases, such as emitting a call to `fma()` in the `fma()` implementation itself. Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`. This was already done for PowerPC; this commit just extends that to Arm, z/Arch, and x86. MIPS and SPARC already got it right, but I added tests for them too, for good measure. Note: I don't have commit access.	2024-10-19 06:13:15 -07:00
David Green	0f3ed9c650	[ARM] Use ARM::NoRegister in more places. NFC Similar to #112507, this uses ARM::NoRegister in a few more places, as opposed to the constant 0.	2024-10-18 17:39:21 +01:00
Oliver Stannard	18ac0178ad	Revert "[ARM] Fix frame chains with M-profile PACBTI (#110285 )" Reverting because this is causing failures with MSan: https://lab.llvm.org/buildbot/#/builders/169/builds/4378 This reverts commit e1f8f84acec05997893c305c78fbf7feecf44dd7.	2024-10-18 09:04:28 +01:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014 ) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
Keith Packard	44b020a381	[PowerPC][ISelLowering] Support -mstack-protector-guard=tls (#110928 ) Add support for using a thread-local variable with a specified offset for holding the stack guard canary value. This supports both 32- and 64- bit PowerPC targets. This mirrors changes from #108942 but targeting PowerPC instead of RISCV. Because both of these PRs modify the same driver functions, this series is stacked on top of the RISC-V one. --------- Signed-off-by: Keith Packard <keithp@keithp.com>	2024-10-17 19:06:47 -07:00
Jay Foad	85c17e4092	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706 ) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.	2024-10-17 16:20:43 +01:00
VladiKrapp-Arm	ea796e5237	[ARM] Prefer MUL to MULS on some implementations (#112540 ) MULS adversely affects performance on many implementations. Where this is the case, we prefer not to shrink MUL to MULS.	2024-10-17 13:53:22 +01:00
Nashe Mncube	370fd74361	Revert "[llvm][ARM]Add widen global arrays pass" (#112701 ) Reverts llvm/llvm-project#107120 Unexpected build failures in post-commit pipelines. Needs investigation	2024-10-17 13:38:01 +01:00
gxlayer	4a2bd78f5b	[ARM] Fix -mno-omit-leaf-frame-pointer flag not working on 32-bit ARM (#109628 ) The -mno-omit-leaf-frame-pointer flag now works on 32-bit ARM architectures; this addresses the bug reported in #108019	2024-10-17 20:25:06 +08:00
Simon Pilgrim	bf5cf82dd4	Fix MSVC signed/unsigned mismatch warning. NFC.	2024-10-17 12:50:10 +01:00
Nashe Mncube	ab90d2793c	[llvm][ARM]Add widen global arrays pass (#107120 ) - Pass optimizes memcpy's by padding out destinations and sources to a full word to make backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant array. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works within GlobalOpt but is disabled by default on all targets except ARM.	2024-10-17 11:56:00 +01:00
Jie Fu	584e00a316	[ARM] Fix -Wunused-variable in ARMFrameLowering.cpp (NFC) /llvm-project/llvm/lib/Target/ARM/ARMFrameLowering.cpp:1028:9: error: unused variable 'FPOffset' [-Werror,-Wunused-variable] int FPOffset = MFI.getObjectOffset(FramePtrSpillFI); ^ 1 error generated.	2024-10-17 18:46:26 +08:00
John Brawn	ad45eb4a9c	[ARM] Fix problems with register list in vscclrm (#111825 ) The register list in vscclrm is unusual in three ways: * The encoded size can be zero, meaning the list contains only vpr. * Double-precision registers past d15 are permitted even when the subtarget doesn't have them, they are instead ignored when the instruction executes. * The single-precision variant allows double-precision registers d16 onwards, which are encoded as a pair of single-precision registers. Fixing this also incidentally changes a vlldm/vlstm error message: when the first register is in the range d16-d31 we now get the "operand must be exactly..." error instead of "register expected".	2024-10-17 11:15:08 +01:00
Oliver Stannard	e1f8f84ace	[ARM] Fix frame chains with M-profile PACBTI (#110285 ) When using AAPCS-compliant frame chains with PACBTI return address signing, there were a number of bugs in the generation of the frame pointer and function prologues. The most obvious was that we sometimes would modify r11 before pushing it to the stack, so it wasn't preserved as required by the PCS. We also sometimes did not push R11 and LR adjacent to one another on the stack, or used R11 as a frame pointer without pointing it at the saved value of R11, both of which are required to have an AAPCS compliant frame chain. The original work of this patch was done by James Westwood, reviewed as #82801 and #81249, with some tidy-ups done by Mark Murray and myself.	2024-10-17 09:32:44 +01:00
Nikita Popov	255a99c29f	[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309 ) This fixes all the places that hit the new assertion added in https://github.com/llvm/llvm-project/pull/106524 in tests. That is, cases where the value passed to the APInt constructor is not an N-bit signed/unsigned integer, where N is the bit width and signedness is determined by the isSigned flag. The fixes either set the correct value for isSigned, set the implicitTrunc flag, or perform more calculations inside APInt. Note that the assertion is currently still disabled by default, so this patch is mostly NFC.	2024-10-17 08:48:08 +02:00
Jay Foad	9255850e89	[LLVM] Remove unused variables after #112546	2024-10-16 16:15:34 +01:00
Jay Foad	d9c95efb6c	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112546 ) Convert almost every instance of: CreateCall(Intrinsic::getOrInsertDeclaration(...), ...) to the equivalent CreateIntrinsic call.	2024-10-16 15:43:30 +01:00
Karl-Johan Karlsson	f113a66c29	[ARM] Fix warnings in ARMAsmParser.cpp and ARMDisassembler.cpp (#112507 ) Fix gcc warnings like: ARMAsmParser.cpp:7168:46: warning: enumeral and non-enumeral type in conditional expression [-Wextra]	2024-10-16 13:49:34 +02:00
Albert Huang	aa2c0f35a1	[ARM] [AArch32] Add support for Arm China STAR-MC1 CPU (#110085 ) STAR-MC1 is an Armv8m CPU. Technical specifications available at: https://www.armchina.com/download/Documents/Application-Notes/Technical-Reference-Manual?infoId=160	2024-10-14 15:48:12 +01:00
Kazu Hirata	9c64b5e759	[ARM] Simplify code with std::map::operator[] (NFC) (#112159 )	2024-10-14 06:56:39 -07:00
Jack Styles	6a98c4a160	Revert "[llvm][ARM] Add Addend Checks for MOVT and MOVW instructions.… (#112184 ) … (#111970)" I was made aware of breakages in Windows/ARM, so reverting while I investigate. This reverts commit f3aebe623b49b7ae14d0f0996999114aac052e4b.	2024-10-14 12:31:50 +01:00
Michał Górny	387b37af1a	[LLVM] [Clang] Support for Gentoo `t64` triples (64-bit time_t ABIs) (#111302 ) Gentoo is planning to introduce a `t64` suffix for triples that will be used by 32-bit platforms that use 64-bit `time_t`. Add support for parsing and accepting these triples, and while at it make clang automatically enable the necessary glibc feature macros when this suffix is used. An open question is whether we can backport this to LLVM 19.x. After all, adding new triplets to Triple sounds like an ABI change — though I suppose we can minimize the risk of breaking something if we move new enum values to the very end.	2024-10-14 11:18:04 +00:00
Serge Pavlov	52e5683ddd	[GlobalISel][ARM] Legalization of G_CONSTANT using constant pool (#98308 ) ARM uses a complex encoding of immediate values using a small number of bits. As a result, some values cannot be represented as immediate operands and need to be synthesized in a register. This change implements legalization of such constants by loading the values from the constant pool. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2024-10-14 16:40:21 +07:00
Jack Styles	f3aebe623b	[llvm][ARM] Add Addend Checks for MOVT and MOVW instructions. (#111970 ) Previously, any value could be used for the MOVT and MOVW instructions, however the ARM ABI dictates that the addend should be a signed 16 bit value. To ensure this is followed, the Assembler will now check that when using these instructions, the addend is a 16bit signed value, and throw an error if this is not the case. Information relating to the ABI requirements can be found here: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst#addends-and-pc-bias-compensation	2024-10-14 10:38:58 +01:00
Kazu Hirata	eef6c0926e	[ARM] Avoid repeated hash lookups (NFC) (#111935 )	2024-10-11 08:58:06 -07:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752 ) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).	2024-10-11 05:26:03 -07:00
Kazu Hirata	126ed16525	[ARM] Fix formatting (NFC) I'm about to post a PR in this area.	2024-10-10 20:30:04 -07:00
Jeffrey Byrnes	853c43d04a	[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564 ) Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.	2024-10-09 14:30:09 -07:00
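
The arithmetic behind the folds listed in commit 28d0718033 ([DAGCombiner] Add combine avg from shifts) can be checked with a small standalone C++ sketch. This is not LLVM code: the helper names (`avg_floor_s`, `avg_ceil_s`, `avg_floor_u`, `shift_form_s`, `shift_form_u`, `check_avg_examples`) are hypothetical, and the reference averages are simply computed in a 64-bit type so the sum cannot overflow. It assumes `>>` on a negative signed value is an arithmetic shift (guaranteed from C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Reference "floor average": widen to 64 bits so x + y cannot overflow,
// then shift right by one (rounds toward negative infinity).
constexpr int32_t avg_floor_s(int32_t x, int32_t y) {
    return static_cast<int32_t>((static_cast<int64_t>(x) + y) >> 1);
}

// Reference "ceil average": same as above with a +1 bias before the shift.
constexpr int32_t avg_ceil_s(int32_t x, int32_t y) {
    return static_cast<int32_t>((static_cast<int64_t>(x) + y + 1) >> 1);
}

// Unsigned reference floor average, again computed in 64 bits.
constexpr uint32_t avg_floor_u(uint32_t x, uint32_t y) {
    return static_cast<uint32_t>((static_cast<uint64_t>(x) + y) >> 1);
}

// The shape the combine matches: (asr (add nsw x, y), 1).
// Precondition (the "nsw" flag): x + y must not overflow int32_t.
constexpr int32_t shift_form_s(int32_t x, int32_t y) {
    return (x + y) >> 1;
}

// The unsigned shape: (lsr (add nuw x, y), 1).
// Precondition (the "nuw" flag): x + y must not wrap around.
constexpr uint32_t shift_form_u(uint32_t x, uint32_t y) {
    return (x + y) >> 1;
}

// A few spot checks of the identities named in the commit message.
inline void check_avg_examples() {
    // (asr (add nsw x, y), 1) == avgfloors(x, y) when the add does not overflow.
    assert(shift_form_s(3, 4) == avg_floor_s(3, 4));
    assert(shift_form_s(-3, -4) == avg_floor_s(-3, -4));
    // (lsr (add nuw x, y), 1) == avgflooru(x, y) when the add does not wrap.
    assert(shift_form_u(6u, 9u) == avg_floor_u(6u, 9u));
    // avgfloors(x + y, 1) == avgceils(x, y): the extra 1 supplies the bias.
    assert(avg_floor_s(3 + 4, 1) == avg_ceil_s(3, 4));
    assert(avg_floor_s(-3 + -4, 1) == avg_ceil_s(-3, -4));
}
```

Calling `check_avg_examples()` from any test harness exercises the identities. The no-overflow preconditions mirror why the commit message restricts the folds to `add nsw` and `add nuw`: without them, the narrow `(x + y) >> 1` can differ from the true average.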

12508 Commits