llvm-project

Author	SHA1	Message	Date
Matt Arsenault	831e79adff	DAG: Merge all sincos_stret emission code into legalizer (#166295 ) This avoids AArch64 legality rules depending on libcall availability. ARM, AArch64, and X86 all had custom lowering of fsincos which all were just to emit calls to sincos_stret / sincosf_stret. This messes with the cost heuristics around legality, because really it's an expand/libcall cost and not a favorable custom. This is a bit ugly, because we're emitting code trying to match the C ABI lowered IR type for the aggregate return type. This now also gives an easy way to lift the unhandled x86_32 darwin case, since ARM already handled the return as sret case.	2025-11-04 10:20:00 -08:00
Matt Arsenault	590a2b0a1f	Revert "ARM: Remove unnecessary manual ABI lowering for sincos_stret (#166040 )" (#166262 ) This reverts commit a522ae3ef6e13cb39e7756c151652e03a024b301. The ABI handling doesn't account for matching the C ABI, only implicit sret.	2025-11-03 16:00:29 -08:00
Matt Arsenault	a522ae3ef6	ARM: Remove unnecessary manual ABI lowering for sincos_stret (#166040 ) LowerCallTo handles all of the ABI details, including the load of implicit sret return to the expected result positions.	2025-11-03 14:17:39 -08:00
Erik Enikeev	1523332fbd	[ARM] Mark function calls as possibly changing FPSCR (#160699 ) This patch does the same changes as D143001 for AArch64. This PR is part of the work on adding strict FP support in ARM, which was previously discussed in #137101.	2025-10-30 16:36:55 +00:00
Erik Enikeev	242ebcf13e	[ARM] Add instruction selection for strict FP (#160696 ) This consists of marking the various strict opcodes as legal, and adjusting instruction selection patterns so that 'op' is 'any_op'. The changes are similar to those in D114946 for AArch64. Custom lowering and promotion are set for some FP16 strict ops to work correctly. This PR is part of the work on adding strict FP support in ARM, which was previously discussed in #137101.	2025-10-29 21:43:43 +00:00
Matt Arsenault	28e9a2832f	DAG: Consider __sincos_stret when deciding to form fsincos (#165169 )	2025-10-28 08:28:09 -07:00
Matt Arsenault	f5a2e6bb8f	CodeGen: Remove overrides of getSSPStackGuardCheck (NFC) (#164044 ) All 3 implementations are just checking if this has the windows check function, so merge that as the only implementation.	2025-10-24 21:17:34 +09:00
Kees Cook	d130f40264	[ARM][KCFI] Add backend support for Kernel Control-Flow Integrity (#163698 ) Implement KCFI (Kernel Control Flow Integrity) backend support for ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via Clang's generic KCFI implementation, but this has finally started to [cause problems](https://github.com/ClangBuiltLinux/linux/issues/2124) so it's time to get the KCFI operand bundle lowering working on ARM. Supports patchable-function-prefix with adjusted load offsets. Provides an instruction size worst case estimate of how large the KCFI bundle is so that range-limited instructions (e.g. cbz) know how big the indirect calls can become. ARM implementation notes: - Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte to work within ARM's modified immediate encoding constraints. - Scratch register selection: r12 (IP) is preferred, r3 used as fallback when r12 holds the call target. r3 gets spilled/reloaded if it is being used as a call argument. - UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar to aarch64's trap encoding. Thumb2 implementation notes: - Logically the same as ARM - UDF trap encoding: 0x80 | target_reg_index Thumb1 implementation notes: - Due to register pressure, 2 scratch registers are needed: r3 and r2, which get spilled/reloaded if they are being used as call args. - Instead of EOR, add/lsl sequence to load immediate, followed by a compare. - No trap encoding. Update tests to validate all three sub targets.	2025-10-23 08:27:13 -07:00
Petr Hosek	7b190b79d9	[Clang][LLVM] Support for Fuchsia on ARM (#163848 ) This introduces the support for 32-bit ARM Fuchsia target which uses the aapcs-linux ABI defaulting to thumbv8a as the target.	2025-10-21 11:08:30 -07:00
David Green	6d5dea63ed	[ARM][SDAG] Add llvm.lround half promotion. (#164235 ) Similar to #161088, add llvm.lround and llvm.llround promotion.	2025-10-21 16:56:55 +01:00
Matt Arsenault	0cefd5c3c2	CodeGen: Fix hardcoded libcall names in insertSSPDeclarations (NFC) (#163710 )	2025-10-17 21:50:16 +09:00
AZero13	d95f8ffee4	[ARM][TargetLowering] Combine Level should not be a factor in shouldFoldConstantShiftPairToMask (NFC) (#156949 ) This should be based on the type and instructions, and only thumb uses combine level anyway.	2025-10-11 10:58:48 +09:00
Matt Arsenault	424fa83335	CodeGen: Remove unused IntrinsicLowering includes (#162844 )	2025-10-10 14:34:16 +00:00
David Green	125f0ac757	[ARM][SDAG] Half promote llvm.lrint nodes. (#161088 ) As shown in #137101, fp16 lrint are not handled correctly on Arm. This adds soft-half promotion for them, reusing the function that promotes a value with operands (and can handle strict fp once that is added).	2025-10-07 22:04:39 +01:00
Simon Tatham	2cacf7117b	[ARM] Improve comment on the 'J' inline asm modifier. (#160712 ) An inline asm constraint "Jr", in AArch32, means that if the input value is a compile-time constant in the range -4095 to +4095, then it can be inserted into the assembly language as an immediate operand, and otherwise it will be placed in a register. The comment in the Arm backend said "It is not clear what this constraint is intended for". I believe the answer is that that range of immediate values are the ones you can use in a LDR or STR instruction. So it's suitable for cases like this: asm("str %0,[%1,%2]" : : "r"(data), "r"(base), "Jr"(offset) : "memory"); in the same way that the "Ir" constraint is suitable for the immediate in a data-processing instruction such as ADD or EOR.	2025-09-26 09:18:59 +01:00
paperchalice	3257dc35fe	[ARM] Remove `UnsafeFPMath` uses in code generation part (#160801 ) Factor out from #151275 Remove all UnsafeFPMath uses but ABI tags related part.	2025-09-26 15:54:30 +08:00
AZero13	151a80bbce	[TargetLowering][ExpandABD] Prefer selects over usubo if we do the same for ucmp (#159889 ) Same deal we use for determining ucmp vs scmp. Using selects on platforms that like selects is better than using usubo. Rename function to be more general fitting this new description.	2025-09-25 10:33:05 +09:00
AZero13	733c1aded1	[ARM] Replace ABS and tABS machine nodes with custom lowering (#156717 ) Just do a custom lowering instead. Also copy paste the cmov-neg fold to prevent regressions in nabs.	2025-09-19 19:43:36 +01:00
Nikita Popov	1723f80b08	[ARM] Allow s constraints on half (#157860 ) Fix a regression from https://github.com/llvm/llvm-project/pull/147559.	2025-09-11 08:50:32 +02:00
paperchalice	667f919214	[SelectionDAG][ARM] Propagate fast math flags in visitBRCOND (#156647 ) Factor out from #151275.	2025-09-06 20:44:25 +08:00
woruyu	22fb21a64e	[DAG][ARM] canCreateUndefOrPoisonForTargetNode - ARMISD VORRIMM\VBICIMM nodes can't create poison/undef (#156831 ) ### Summary This PR resolves https://github.com/llvm/llvm-project/issues/156640	2025-09-05 16:40:02 +08:00
woruyu	010f1ea3b3	[DAG][ARM] ComputeKnownBitsForTargetNode - add handling for ARMISD VORRIMM\VBICIMM nodes (#149494 ) ### Summary This PR resolves https://github.com/llvm/llvm-project/issues/147179	2025-09-04 15:56:31 +08:00
Nikita Popov	3f757a39f2	[CodeGen] Remove ExpandInlineAsm hook (#156617 ) This hook replaces inline asm with LLVM intrinsics. It was intended to match inline assembly implementations of bswap in libc headers and replace them with more optimizable implementations. At this point, it has outlived its usefulness (see https://github.com/llvm/llvm-project/issues/156571#issuecomment-3247638412), as libc implementations no longer use inline assembly for this purpose. Additionally, it breaks the "black box" property of inline assembly, which some languages like Rust would like to guarantee. Fixes https://github.com/llvm/llvm-project/issues/156571.	2025-09-04 09:28:11 +02:00
Daniel Paoliello	f99b0f3de4	[NFC] RuntimeLibcalls: Prefix the impls with 'Impl_' (#153850 ) As noted in #153256, TableGen is generating reserved names for RuntimeLibcalls, which resulted in a build failure for Arm64EC since `vcruntime.h` defines `__security_check_cookie` as a macro. To avoid using reserved names, all impl names will now be prefixed with `Impl_`. `NumLibcallImpls` was lifted out as a `constexpr size_t` instead of being an enum field. While I was churning the dependent code, I also removed the TODO to move the impl enum into its own namespace and use an `enum class`: I experimented with using an `enum class` and adding a namespace, but we decided it was too verbose so it was dropped.	2025-09-02 09:57:33 -07:00
AZero13	2259a80c7d	[ARM] Simplify LowerCMP (NFC) (#156198 ) Pass the opcode directly.	2025-08-31 15:45:12 +01:00
Min-Yih Hsu	acaa925cb2	[IA][RISCV] Recognize interleaving stores that could lower to strided segmented stores (#154647 ) This is a sibling patch to #151612: passing gap masks to the renewal TLI hooks for lowering interleaved stores that use shufflevector to do the interleaving.	2025-08-26 13:22:42 -07:00
AZero13	79dfe48865	[ARM] Set isCheapToSpeculateCtlz as true for hasV5TOps and no Thumb 1 (#154848 ) This is so that we don't expand to include unneeded 0 checks. Also fix the logic error in LegalizerInfo so it is NOT legal on Thumb1 in Fast-ISEL. Finally, remove the README entry regarding this issue.	2025-08-25 12:43:48 -07:00
Kazu Hirata	e9045b3cea	[ARM] Remove an unnecessary cast (NFC) (#155206 ) getType() already returns Type *.	2025-08-25 07:33:34 -07:00
Matt Arsenault	65d12622fa	RuntimeLibcalls: Add entries for stackprotector globals (#154930 ) Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie, and __guard_local. As far as I can tell these are all just different names for the same shaped functionality on different systems. These aren't really functions, but special global variable names. They should probably be treated the same way; all the same contexts that need to know about emittable function names also need to know about this. This avoids a special case check in IRSymtab. This isn't a complete change, there's a lot more cleanup which should be done. The stack protector configuration system is a complete mess. There are multiple overlapping controls, used in 3 different places. Some of the target control implementations overlap with conditions used in the emission points, and some use correlated but not identical conditions in different contexts. i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and insertSSPDeclarations are all used in inconsistent ways so I don't know if I've tracked the intention of the system correctly. The PowerPC test change is a bug fix on linux. Previously the manual conditions were based around !isOSOpenBSD, which is not the condition where __stack_chk_guard is used. Now getSDagStackGuard returns the proper global reference, resulting in LOAD_STACK_GUARD getting a MachineMemOperand which allows scheduling.	2025-08-23 10:21:00 +09:00
Nikita Popov	01bc742185	[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817 ) This ensures that the required fields are set, and also makes the construction more convenient.	2025-08-15 18:06:07 +02:00
Matt Arsenault	4aae7bc625	ARM: Move half convert libcall config to tablegen (#153389 )	2025-08-14 17:35:58 +09:00
Matt Arsenault	bbcac029db	ARM: Move more aeabi libcall config into tablegen (#152109 )	2025-08-14 15:43:15 +09:00
Matt Arsenault	32f1fe3770	ARM: Move calling conv config to RuntimeLibcalls (#152065 ) Consolidate module level ABI into RuntimeLibcalls	2025-08-14 08:36:03 +09:00
David Green	06d2d1e156	[ARM] Protect against odd sized vectors in isVTRNMask and friends (#153413 ) Fixes the issue reported on #153138, where odd-sized vectors would cause the checks to iterate off the end of the mask.	2025-08-13 20:57:46 +01:00
Min-Yih Hsu	ca05058b49	[IA][RISCV] Recognize deinterleaved loads that could lower to strided segmented loads (#151612 ) Turn the following deinterleaved load patterns ``` %l = masked.load(%ptr, /mask=/110110110110, /passthru=/poison) %f0 = shufflevector %l, [0, 3, 6, 9] %f1 = shufflevector %l, [1, 4, 7, 10] %f2 = shufflevector %l, [2, 5, 8, 11] ``` into ``` %s = riscv.vlsseg2(/passthru=/poison, %ptr, /mask=/1111) %f0 = extractvalue %s, 0 %f1 = extractvalue %s, 1 %f2 = poison ``` The mask `110110110110` is regarded as 'gap mask' since it effectively skips the entire third field / component. Similarly, turning the following snippet ``` %l = masked.load(%ptr, /mask=/110000110000, /passthru=/poison) %f0 = shufflevector %l, [0, 3, 6, 9] %f1 = shufflevector %l, [1, 4, 7, 10] ``` into ``` %s = riscv.vlsseg2(/passthru=/poison, %ptr, /mask=/1010) %f0 = extractvalue %s, 0 %f1 = extractvalue %s, 1 ``` Right now this patch only tries to detect gap mask from a constant mask supplied to a masked.load/vp.load.	2025-08-12 14:08:18 -07:00
AZero13	6a425f1e54	[ARM] Have custom lowering for ucmp and scmp (#149315 ) Limited to non-thumb1 for scmp at the moment, since there is no good way to do it.	2025-08-08 06:51:18 +01:00
Kazu Hirata	62fc0028bf	[Target] Remove unnecessary casts (NFC) (#152262 ) value() already returns uint64_t.	2025-08-06 07:11:07 -07:00
eleviant	907b7d0f07	[ARM] Fix inline asm register validation for vector types (#152175 ) Patch allows following piece of code to be successfully compiled: ``` register uint8x8_t V asm("d3") = vdup_n_u8(0xff); ```	2025-08-06 10:30:49 +02:00
Matt Arsenault	342bf58f93	RuntimeLibcalls: Add entries for __security_check_cookie (#151843 ) Avoids hardcoding string name based on target, and gets the entry in the centralized list of emitted calls.	2025-08-06 10:26:36 +09:00
Matt Arsenault	d44754c344	ARM: Remove redundant or buggy config of __aeabi_d2h (#152126 ) This was set if `TT.isTargetAEABI()`. This was previously set above if `TM.isAAPCS_ABI() && (TT.isTargetAEABI() || TT.isTargetGNUAEABI() || TT.isTargetMuslAEABI() || TT.isAndroid())`. So this could differ based on a manually specified -target-abi flag due to the `isAAPCS_ABI` part of the original condition. I'm guessing these should be consistent, so either this second group of setLibcallImpl calls should have been guarded by the `isAAPCS_ABI` check, or the first condition should remove it. There doesn't appear to be any meaningful test coverage using the manually specified ABI option, so #152108 tries to remove it	2025-08-06 08:48:01 +09:00
Matt Arsenault	1392edcc07	ARM: Remove idiv runtime call aliases (#152098 ) Really only the i32 variants exist. We don't need synthetic aliases for illegal types which will be promoted.	2025-08-05 17:49:22 +09:00
AZero13	23022a4683	[SelectionDAG] Move sign pattern check from AArch64 and ARM to general SelectionDAG (#151736 ) This works on all cases much like the XOR case above it in SelectionDAG.	2025-08-01 14:46:51 -07:00
Prabhu Rajasekaran	17ccb849f3	[llvm] Extract and propagate callee_type metadata Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds from callee_type metadata attached to indirect call instructions. Reviewers: nikic, ilovepi Reviewed By: ilovepi Pull Request: https://github.com/llvm/llvm-project/pull/87575	2025-07-30 14:56:39 -07:00
Nikita Popov	fe0dbe0f29	[CodeGen] More consistently expand float ops by default (#150597 ) These float operations were expanded for scalar f32/f64/f128, but not for f16 and more problematically, not for vectors. A small subset of them was separately set to expand for vectors. Change these to always expand by default, and adjust targets to mark these as legal where necessary instead. This is a much safer default, and avoids unnecessary legalization failures because a target failed to manually mark them as expand. Fixes https://github.com/llvm/llvm-project/issues/110753. Fixes https://github.com/llvm/llvm-project/issues/121390.	2025-07-28 09:46:00 +02:00
eleviant	a4796b14fc	[ARM] Emit error message when incompatible reg is specified (#147559 ) At the moment the following piece of code causes undefined behavior: ``` int a; void b() { register float d2 asm("d2") = a; asm("" ::"r"(d2)); } ``` This happens because variable and register types are incompatible.	2025-07-24 19:19:25 +02:00
Philip Reames	dbd9eae95a	[IA] Support vp.store in lowerInterleavedStore (#149605 ) Follow up to 28417e64, and the whole line of work started with 4b81dc7. This change merges the handling for VPStore - currently in lowerInterleavedVPStore - into the existing dedicated routine used in the shuffle lowering path. This removes the last use of the dedicated lowerInterleavedVPStore and thus we can remove it. This contains two changes which are functional. First, like in 28417e64, merging support for vp.store exposes the strided store optimization for code using vp.store. Second, it seems the strided store case had a significant missed optimization. We were performing the strided store at the full unit strided store type width (i.e. LMUL) rather than reducing it to match the input width. This became obvious when I tried to use the mask created by the helper routine as it caused a type incompatibility. Normally, I'd try not to include an optimization in an API rework, but structuring the code to both be correct for vp.store and not optimize the existing case turned out to be more involved than seemed worthwhile. I could pull this part out as a pre-change, but it's a bit awkward on its own as it turns out to be somewhat of a half step on the possible optimization; the full optimization is complex with the old code structure. --------- Co-authored-by: Craig Topper <craig.topper@sifive.com>	2025-07-22 15:50:17 -07:00
Philip Reames	28417e6459	[IA] Support vp.load in lowerInterleavedLoad [nfc-ish] (#149174 ) This continues in the direction started by commit 4b81dc7. We essentially merge the handling for VPLoad - currently in lowerInterleavedVPLoad - into the existing dedicated routine. This removes the last use of the dedicated lowerInterleavedVPLoad and thus we can remove it. This isn't quite NFC as the main callback has support for the strided load optimization whereas the VPLoad specific version didn't. So this adds the ability to form a strided load for a vp.load deinterleave with one shuffle used.	2025-07-17 17:29:28 -07:00
Kazu Hirata	2da59287aa	[Target] Remove unnecessary casts (NFC) (#149342 ) getFunction().getParent() already returns Module *.	2025-07-17 15:24:25 -07:00
Brad Smith	0d2e11f3e8	Remove Native Client support (#133661 ) Remove the Native Client support now that it has finally reached end of life.	2025-07-15 13:22:33 -04:00
Simon Pilgrim	82a276e610	[ARM][WebAssembly] Remove unused PatternMatch namespace. NFC. (#147984 ) Avoid file-level "using namespace llvm::PatternMatch" to make it easier to potentially use SDPatternMatch in the future.	2025-07-11 10:24:43 +01:00
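
The UDF trap encodings quoted in the ARM KCFI commit (d130f40264) can be worked through as plain arithmetic. The sketch below is only an illustration of the two formulas as stated in that commit message; the function names and the 0..15 register-index range check are assumptions made for this example and are not taken from the LLVM sources.

```python
# Hypothetical helpers: names and range check are illustrative assumptions,
# not LLVM APIs. Only the two bitwise formulas come from the commit message.

def arm_kcfi_udf_imm(target_reg_index: int) -> int:
    """ARM mode: 0x8000 | (0x1F << 5) | target_reg_index."""
    if not 0 <= target_reg_index <= 15:
        raise ValueError("assumed core register index range is 0..15")
    return 0x8000 | (0x1F << 5) | target_reg_index


def thumb2_kcfi_udf_imm(target_reg_index: int) -> int:
    """Thumb2 mode: 0x80 | target_reg_index."""
    if not 0 <= target_reg_index <= 15:
        raise ValueError("assumed core register index range is 0..15")
    return 0x80 | target_reg_index


if __name__ == "__main__":
    # Print the immediates for a call target held in r3 and in r12.
    for reg in (3, 12):
        print(reg, hex(arm_kcfi_udf_imm(reg)), hex(thumb2_kcfi_udf_imm(reg)))
```

Because the low bits carry the call-target register index, a trap handler could in principle recover that index by masking the immediate; how the kernel actually decodes it is outside what the commit message states.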


2434 Commits