The selection code for the aarch64_sve_ld[nt]1_pn_x{2,4} intrinsics gates
the use of strided load instructions behind the SME2 target feature.
However, these instructions are only available in streaming mode, so the
check must also require that we are in streaming mode.
On most operating systems, the x16 and x17 registers are not special,
so there is no benefit, and only a code size cost, to constraining AUT to
only using them. Therefore, adjust the backend to only use the AUT pseudo
(renamed AUTx16x17 for clarity) on Darwin platforms. All other platforms
use an unconstrained variant of the pseudo, AUTxMxN, for selection.
Reviewers: ahmedbougacha, kovdan01, atrosinenko
Reviewed By: atrosinenko
Pull Request: https://github.com/llvm/llvm-project/pull/132857
When targeting big-endian, we need to use ld1/st1 for vector loads and
stores so that we get the elements in the correct order, but this prevents
post-index addressing from being used. Fix this by adding the appropriate
ISel patterns, plus the relevant changes in ISelLowering and ISelDAGToDAG,
so that post-index addressing is used.
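As a rough illustration (not a test from the patch), a simple vectorizable loop like the one below is the kind of code where the per-iteration pointer increments can now fold into post-indexed ld1/st1 when compiling for a big-endian AArch64 target:
```c
// Hypothetical example: on big-endian targets the vectorized loads/stores
// must use ld1/st1, and with this change the pointer bump each iteration can
// be folded into them as a post-index update.
void scale(float *restrict dst, const float *restrict src, long n) {
  for (long i = 0; i < n; ++i)
    dst[i] = src[i] * 2.0f;
}
```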
When converting ROTR(XOR(a, b)) to XAR(a, b), or ROTR(a, a) to XAR(a, zero)
we were not handling v1i64 types, meaning illegal copies get generated. This
addresses that by generating insert_subreg and extract_subreg for v1i64 to
keep the values with the correct types.
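For illustration (assuming NEON intrinsics and a target where XAR is available, e.g. with SHA3), the ROTR(XOR(a, b)) shape on a single 64-bit lane looks like:
```c
#include <arm_neon.h>

// A v1i64 rotate of an XOR: the ROTR(XOR(a, b)) pattern that can now be
// selected to XAR without generating illegal copies.
uint64x1_t xar64(uint64x1_t a, uint64x1_t b) {
  uint64x1_t x = veor_u64(a, b);
  return vorr_u64(vshr_n_u64(x, 17), vshl_n_u64(x, 47));  // rotate right by 17
}
```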
Fixes #141746
When compiling VLS SVE, the compiler often replaces VL-based offsets
with immediate-based ones. This leads to a mismatch in the allowed
addressing modes due to SVE loads/stores generally expecting immediate
offsets relative to VL. For example, given:
```c
svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x + svcntd());
}
```
When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0, x8, lsl #3]
        ret
```
Instead, we could be generating:
```gas
foo:
        ldr     z0, [x0, #1, mul vl]
        ret
```
Likewise for other types, stores, and other VLS lengths.
This patch achieves the above by extending `SelectAddrModeIndexedSVE`
to let constants through when `vscale` is known.
This patch adds codegen for all Armv9.6-A compare-and-branch
instructions that operate on full w or x registers. The instruction
variants operating on half-words (cbh) and bytes (cbb) will be added in a
subsequent patch.
Since CB doesn't use standard 4-bit Arm condition codes but a reduced
set of conditions, encoded in 3 bits, some conditions are expressed by
modifying operands, namely incrementing or decrementing immediate
operands and swapping register operands. To invert a CB instruction it is
therefore not enough to just modify the condition code, which doesn't
play particularly well with how the backend is currently organized. We
therefore introduce a number of pseudos which operate on the standard
4-bit condition codes and lower them late during codegen.
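A minimal sketch of the operand-rewriting idea, in plain C rather than compiler code (the reduced condition set assumed here is purely for illustration):
```c
#include <assert.h>
#include <stdint.h>

// Suppose only a "greater-than immediate" compare-and-branch condition is
// available. The inverse of "branch if x < imm" is "branch if x >= imm",
// which can be expressed by decrementing the immediate: "branch if x > imm-1".
static int branch_if_ge(uint32_t x, uint32_t imm) {
  return x > imm - 1u;  // equivalent to x >= imm for imm > 0
}

int main(void) {
  for (uint32_t x = 0; x < 10; ++x)
    assert(branch_if_ge(x, 5) == (x >= 5));
  return 0;
}
```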
When the predicate of a destructive operation is known to be all-true,
for example
fabs z0.s, p0/m, z1.s
then the entire output register is written and we can use a zeroing
(instead of a merging) form of the instruction, for example
fabs z0.s, p0/z, z1.s
thus eliminating the dependency on the destination register (which the
merging form also reads), without the need to insert a `movprfx`.
This patch complements (and in the case of
2b3266c170,
fixes a regression) the following:
7f4414b2a1
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (4/11)
(https://github.com/llvm/llvm-project/pull/116830)
2474cf7ad1
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (3/11)
(https://github.com/llvm/llvm-project/pull/116829)
6f285d3115
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (2/11)
(https://github.com/llvm/llvm-project/pull/116828)
2b3266c170
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (1/11)
(https://github.com/llvm/llvm-project/pull/116259)
The `uses()` function is most often used in range-based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than an SDUse *, so users() is a better name.
I've long been annoyed that we can't write a range-based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
This patch implements the following intrinsics:
8-bit floating-point convert to half-precision or BFloat16 (in-order).
```c
// Variant is also available for: _bf16[_mf8]_x2
svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```
In accordance with https://github.com/ARM-software/acle/pull/323.
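As a usage sketch (assuming a compiler and headers with SME2/FP8 ACLE support; the wrapper function is only for illustration), the non-overloaded form of the first intrinsic might be called like this:
```c
#include <arm_sme.h>

// Convert the FP8 elements of zn to two half-precision vectors (in-order),
// using the FP8 mode settings passed in fpm. Must be called in streaming mode.
svfloat16x2_t widen_f16(svmfloat8_t zn, fpm_t fpm) __arm_streaming {
  return svcvt1_f16_mf8_x2_fpm(zn, fpm);
}
```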
Co-authored-by: Marian Lukac <marian.lukac@arm.com>
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
This patch implements the following intrinsics:
8-bit floating-point convert to deinterleaved half-precision or
BFloat16.
```c
// Variant is also available for: _bf16[_mf8]_x2
svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```
Defined in https://github.com/ARM-software/acle/pull/323
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
Co-authored-by: Marian Lukac <marian.lukac@arm.com>
Alter the #ifdef guards from #110986 and #115292 to use _MSC_VER instead of
_WIN32, to stop the pragmas from being used on gcc/mingw builds.
Noticed by @mstorsjo
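The shape of the change is roughly as below; the pragma shown is a stand-in, not necessarily the one used in the affected files. MinGW/gcc builds also define _WIN32, so MSVC-only pragmas have to be keyed on _MSC_VER:
```c
// Illustrative sketch only: guard MSVC-specific pragmas with _MSC_VER so
// that gcc/mingw builds (which define _WIN32 but not _MSC_VER) never see them.
#if defined(_MSC_VER)
#pragma optimize("", off)  /* stand-in for the MSVC-only pragma */
#endif
/* ... code that is excessively slow for CL.EXE to optimize ... */
#if defined(_MSC_VER)
#pragma optimize("", on)
#endif
```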
Similar to #110986 - disabling inlining on MSVC release builds avoids an excessive build time issue affecting all recent versions of CL.EXE
Fixes #114425
When trying to evaluate an expression in a narrower type, the
DAGCombine should propagate the disjoint flag, as it's equally
valid on the narrower expression.
This helps make better use of addressing modes for some
Arm SME instructions, for example.
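As a generic illustration of the property the flag records (not a test from the patch): an OR whose operands share no set bits is equivalent to an ADD, so it can participate in addressing-mode folding, and that remains true after the expression is narrowed.
```c
#include <stdint.h>

// The low bit of (i << 1) is known to be zero, so the OR below is "disjoint"
// and behaves exactly like an ADD when forming the address of base[idx].
int32_t load_odd(const int32_t *base, uint64_t i) {
  uint64_t idx = (i << 1) | 1;
  return base[idx];
}
```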
This patch was previously reverted because of a failing C test.
That failure has now been fixed, so the patch can be merged into main again.
This patch adds these intrinsics:
// Variants are also available for: _s8
svuint8x4_t svluti4_zt_u8_x4(uint64_t zt0, svuint8x2_t zn)
__arm_streaming __arm_in("zt0");
according to PR #324 [1].
[1] ARM-software/acle#324
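A usage sketch, assuming an SME2-capable toolchain (per the ACLE proposal, 0 is the only valid value for the zt0 argument; the wrapper name is illustrative):
```c
#include <arm_sme.h>

// Look up packed 4-bit indices from zn in the ZT0 table, producing four
// u8 result vectors. Requires streaming mode and read access to ZT0.
svuint8x4_t lookup(svuint8x2_t zn) __arm_streaming __arm_in("zt0") {
  return svluti4_zt_u8_x4(0, zn);
}
```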
This patch implements these intrinsics:
FSCALE SINGLE AND MULTI
```
// Variants are also available for:
// [_single_f32_x2], [_single_f64_x2],
// [_single_f16_x4], [_single_f32_x4], [_single_f64_x4]
svfloat16x2_t svscale[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;
// Variants are also available for:
// [_f32_x2], [_f64_x2],
// [_f16_x4], [_f32_x4], [_f64_x4]
svfloat16x2_t svscale[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming
```
(cf. https://github.com/ARM-software/acle/pull/323)
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
This introduces 3 hardening modes in the authentication step of
auth/resign lowering:
- unchecked, which uses the AUT instructions as-is
- poison, which detects authentication failure (using an XPAC+CMP
sequence), explicitly yielding the XPAC result rather than the
AUT result, to avoid leaking
- trap, which additionally traps on authentication failure,
  using BRK #0xC470 + key (IA: 0xC470, IB: 0xC471, DA: 0xC472, DB: 0xC473).
Not all modes are necessarily useful in all contexts, and there
are more performant alternative lowerings in specific contexts
(e.g., when I/D TBI enablement is a target ABI guarantee.)
These will be implemented separately.
This is controlled by the `ptrauth-auth-traps` function attribute,
and can be overridden using `-aarch64-ptrauth-auth-checks=`.
This also adds the FPAC extension, which we haven't needed
before, to improve isel when we can rely on HW checking.
This adds the intrinsics defined in ARM-software/acle#260.
Doing this requires some changes to the GCS instruction definitions, as
these intrinsics rely on the fact that some instructions don't modify the
input register when GCS is disabled, and the instructions need to be
correctly marked with mayLoad/mayStore/hasSideEffects for instruction
selection to work.
According to the specification in
ARM-software/acle#309 this adds the intrinsics
Move and zero multiple ZA single-vector groups to vector registers
// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_za8_s8_vg1x2(uint32_t slice)
__arm_streaming __arm_inout("za");
// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_za8_s8_vg1x4(uint32_t slice)
__arm_streaming __arm_inout("za");
According to the specification in
ARM-software/acle#309 this adds the intrinsics
// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_hor_za8_s8_vg2(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");
// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_hor_za8_s8_vg4(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");
// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_ver_za8_s8_vg2(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");
// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_ver_za8_s8_vg4(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");
I've not added any new tests for these, because the original conditions
were wrong (they did not consider streaming mode) and we have tests for
the positive cases.
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1
It still breaks the EXPENSIVE_CHECKS build. Sorry.
Port SelectionDAG ISel to the new pass manager.
Only `AMDGPU` and `X86` support the new pass manager version so far. In the new pass manager, `-verify-machineinstrs` is part of the verify instrumentation, which is enabled by default.
According to the specification in
https://github.com/ARM-software/acle/pull/309 this adds the intrinsics
```
svfloat32x2_t svcvt_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
svfloat32x2_t svcvtl_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
```
These are available only if __ARM_FEATURE_SME_F16F16 is enabled.
---------
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
According to the specification in
https://github.com/ARM-software/acle/pull/309 this adds the intrinsics
```
svbfloat16x2_t svclamp[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zn,
svbfloat16_t zm) __arm_streaming;
svbfloat16x4_t svclamp[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zn,
svbfloat16_t zm) __arm_streaming;
```
These are available only if __ARM_FEATURE_SME_B16B16 is enabled.
The immediate forms of SQADD and SQSUB interpret their immediate
operand as unsigned and thus effectively only support positive
immediate operands.
The original code was only wrong for the i8 cases, because they
previously accepted all immediate values; however, the new patterns enable
more uses, so I've added tests for the larger element types as well.
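For illustration (assuming SVE2 intrinsics are available; the wrapper name is illustrative), the kind of code these immediate patterns target is a saturating add of a small positive constant, which can use the SQADD (immediate) form directly:
```c
#include <arm_sve.h>

// Saturating add of an immediate; with the new patterns this is expected to
// select the SQADD (immediate) form for 16-bit elements as well.
svint16_t sat_add_five(svint16_t v) {
  return svqadd_n_s16(v, 5);
}
```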
The existing heuristics were assuming that every core behaves like an
Apple A7, where any extend/shift costs an extra micro-op... but in
reality, nothing else behaves like that.
On some older Cortex designs, shifts by 1 or 4 cost extra, but all other
shifts/extensions are free. On all other cores, as far as I can tell,
all shifts/extensions for integer loads are free (i.e. the same cost as
an unshifted load).
To reflect this, this patch:
- Enables aggressive folding of shifts into loads by default.
- Removes the old AddrLSLFast feature, since it applies to everything
except A7 (and even if you are explicitly targeting A7, we want to
assume extensions are free because the code will almost always run on a
newer core).
- Adds a new feature AddrLSLSlow14 that applies specifically to the
Cortex cores where shifts by 1 or 4 cost extra.
I didn't add support for AddrLSLSlow14 on the GlobalISel side because it
would require a bunch of refactoring to work correctly. Someone can pick
this up as a followup.
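For illustration, an indexed load like the one below is the kind of code affected: with aggressive folding the shift by 3 is folded into the load's addressing mode (something like `ldr x0, [x0, x1, lsl #3]`) rather than being computed by a separate instruction.
```c
#include <stdint.h>

// The index scaling (lsl #3 for 8-byte elements) can be folded into the
// load's register-offset addressing mode on cores where that is free.
uint64_t load_elem(const uint64_t *p, uint64_t i) {
  return p[i];
}
```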