llvm-project

Author	SHA1	Message	Date
Djordje Todorovic	0cb7636a46	[RISCV] Add MIPS extensions (#121394 ) Adding two extensions for MIPS p8700 CPU: 1. cmove (conditional move) 2. lsp (load/store pair) The official product page here: https://mips.com/products/hardware/p8700	2025-01-28 08:04:09 +01:00
Philip Reames	a9ad601f7c	[RISCV] Use vrsub for select of add and sub of the same operands (#123400 ) If we have a (vselect c, a+b, a-b), we can combine this to a+(vselect c, b, -b). That by itself isn't hugely profitable, but if we reverse the select, we get a form which matches a masked vrsub.vi with zero. The result is that we can use a masked vrsub before the add instead of a masked add or sub. This doesn't change the critical path (since we already had the pass through on the masked second op), but does reduce register pressure since a, b, and (a+b) don't need to all be alive at once. In addition to the vselect form, we can also see the same pattern with a vector_shuffle encoding the vselect. I explored canonicalizing these to vselects instead, but that exposes several unrelated missing combines.	2025-01-24 10:08:42 -08:00
Sam Elliott	e06b703030	[RISCV][NFC] Remove Redundant Inline Asm Logic (#124202 ) This was left over from 408659c5b5c7d745042ae71db344d1ed10601512.	2025-01-23 17:40:36 -08:00
Sam Elliott	33c4407471	[RISCV] Support cR Inline Asm Constraint (#124174 ) This denotes RVC-compatible GPR Pairs, which are used by the Zclsd extension. C API PR: riscv-non-isa/riscv-c-api-doc#102	2025-01-23 16:19:19 -08:00
Min-Yih Hsu	bc74a1edbe	[IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863 ) Previously, AArch64 used pattern matching to support llvm.vector.(de)interleave of 2 and 4; RISC-V only supported (de)interleave of 2. This patch consolidates the logics in these two targets by factoring out the common factor calculations into the InterleaveAccess Pass.	2025-01-23 15:27:51 -08:00
Philip Reames	e7001061b7	[RISCV] Revise naming and style in matchSplatAsGather [nfc]	2025-01-21 13:09:25 -08:00
Mikhail R. Gadelha	89f119cbda	[RISCV] Update matchSplatAsGather to use the index of extract_elt if in-bounds (#118873 ) This is a follow-up to #117878 and allows the usage of vrgather if the index we are accessing in VT is constant and within bounds. This patch replaces the previous behavior of bailing out if the length of the search vector is greater than the vector of elements we are searching for. Since matchSplatAsGather works on EXTRACT_VECTOR_ELT, and we know the index from which the element is extracted, we only need to check if we are doing an insert from a larger vector into a smaller one, in which we do an extract instead. Co-authored-by: Luke Lau luke_lau@icloud.com Co-authored-by: Philip Reames preames@rivosinc.com	2025-01-21 12:51:41 -08:00
yingopq	754ed95b66	[Mips] Fix compiler crash when returning fp128 after calling a functi… (#117525 ) …on returning { i8, i128 } Fixes https://github.com/llvm/llvm-project/issues/96432.	2025-01-20 16:47:40 +08:00
Philip Reames	143c33c6df	[RISCV] Consider only legally typed splats to be legal shuffles (#123415 ) Given the comment, I'd expected test coverage. There was none so let's do the simple thing which benefits the one thing we have tests for.	2025-01-17 19:13:04 -08:00
Craig Topper	0c6e03eea0	[RISCV] Fold vp.store(vp.reverse(VAL), ADDR, MASK) -> vp.strided.store(VAL, NEW_ADDR, -1, MASK) (#123123 ) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>	2025-01-17 14:22:25 -08:00
Philip Reames	bb6e94a05d	[RISCV] Custom legalize <N x i128>, <4 x i256>, etc.. shuffles (#122352 ) I have a particular user downstream who likes to write shuffles in terms of unions involving _BitInt(128) types. This isn't completely crazy because there's a bunch of code in the wild which was written with SSE in mind, so 128 bits is a common data fragment size. The problem is that generic lowering scalarizes this to ELEN, and we end up with really terrible extract/insert sequences if the i128 shuffle is between other (non-i128) operations. I explored trying to do this via generic lowering infrastructure, and frankly got lost. Doing this a target specific DAG is a bit ugly - really, there's nothing hugely target specific here - but oh well. If reviewers prefer, I could probably phrase this as a generic DAG combine, but I'm not sure that's hugely better. If reviewers have a strong preference on how to handle this, let me know, but I may need a bit of help. A couple notes: * The argument passing weirdness is due to a missing combine to turn a build_vector of adjacent i64 loads back into a vector load. I'm a bit surprised we don't get that, but the isel output clearly has the build_vector at i64. * The splat case I plan to revisit in another patch. That's a relatively common pattern, and the fact I have to scalarize that to avoid an infinite loop is non-ideal.	2025-01-16 14:55:45 -08:00
Raphael Moreira Zinsly	01d7f434d2	[RISCV] Stack clash protection for dynamic alloca (#122508 ) Create a probe loop for dynamic allocation and add the corresponding SelectionDAG support in order to use it.	2025-01-16 11:58:42 -08:00
Craig Topper	fc7a1ed0ba	[RISCV] Fold vp.reverse(vp.load(ADDR, MASK)) -> vp.strided.load(ADDR, -1, MASK). (#123115 ) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>	2025-01-16 08:20:17 -08:00
Alexey Bataev	bab7920fd7	[RISCV][CG]Use processShuffleMasks for per-register shuffles Patch adds usage of processShuffleMasks in in codegen in lowerShuffleViaVRegSplitting. This function is already used for X86 shuffles estimations and in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE functions, unifies the code. Reviewers: topperc, wangpc-pp, lukel97, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/121765	2025-01-13 17:06:25 -05:00
Philip Reames	24bb180e8a	[RISCV] Attempt to widen SEW before generic shuffle lowering (#122311 ) This takes inspiration from AArch64 which does the same thing to assist with zip/trn/etc.. Doing this recursion unconditionally when the mask allows is slightly questionable, but seems to work out okay in practice. As a bit of context, it's helpful to realize that we have existing logic in both DAGCombine and InstCombine which mutates the element width of in an analogous manner. However, that code has two restriction which prevent it from handling the motivating cases here. First, it only triggers if there is a bitcast involving a different element type. Second, the matcher used considers a partially undef wide element to be a non-match. I considered trying to relax those assumptions, but the information loss for undef in mid-level opt seemed more likely to open a can of worms than I wanted.	2025-01-10 07:12:24 -08:00
Craig Topper	b0f11dfc75	[RISCV] Add call preserved regmask to tail calls. (#122181 ) Every call should have regmask operand to indicate what registers are preserved or clobbered by the call. VirtRegRewriter uses this to tell MachineRegisterInfo what registers are clobbered by a function. If the mask isn't present the registers potentially clobbered by a tail called function aren't counted. I have checked ARM, AArch64, and X86 and they all have a regmask operand on their tail calls. I believe this fixes an issue I'm seeing with IPRA.	2025-01-08 16:19:31 -08:00
Mikhail R. Gadelha	b0e05a5f04	[RISCV] Add missing check before accessing pointer C2 can be null here, so we need to check it or clang may crash.	2025-01-07 17:55:37 +08:00
Craig Topper	c8d435f9af	[RISCV] Use ISD::XOR instead of RISCVISD::VMXOR_VL in lowerVectorMaskVecReduction of scalable ISD::VECREDUCE_AND (#121812 ) This allows combining the XOR with earlier ISD::ANDs inserted by type legalization.	2025-01-06 14:26:21 -08:00
Craig Topper	1401703fe4	[RISCV] Add Enum for CSR encodings. (#121674 ) This allows us to use them in C++ code without needing to do a table lookup.	2025-01-06 10:11:29 -08:00
Luke Lau	b359c84f3a	[RISCV] Don't commute with shift if it would break sh{1,2,3}add pattern (#119527 ) This fixes a regression from #101294 by checking if we might be clobbering a sh{1,2,3}add pattern. Only do this is the underlying add isn't going to be folded away into an address offset.	2025-01-06 19:23:21 +08:00
Philip Reames	6840521739	Revert "[RISCV][CG]Use processShuffleMasks for per-register shuffles" This reverts commit b8952d4b1b0c73bf39d6440ad3166a088ced563f. spec x264 fails to build in all VLS configurations, with the assertion failure: clang: ../llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:5246: llvm::SDValue lowerShuffleViaVRegSplitting(llvm::ShuffleVectorSDNode*, llvm::SelectionDAG&, const llvm::RISCVSubtarget&): Assertion `RegCnt == NumOfDestRegs && "Whole vector must be processed"' failed. I can reduce a failing piece of IR, but the failure appears pretty broad, so I suspect any reasonable vls build will hit it.	2025-01-01 10:53:24 -08:00
Alexey Bataev	b8952d4b1b	[RISCV][CG]Use processShuffleMasks for per-register shuffles Patch adds usage of processShuffleMasks in in codegen in lowerShuffleViaVRegSplitting. This function is already used for X86 shuffles estimations and in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE functions, unifies the code. Reviewers: preames, topperc, lukel97, wangpc-pp Reviewed By: wangpc-pp Pull Request: https://github.com/llvm/llvm-project/pull/120803	2024-12-23 11:18:10 -05:00
Sergei Barannikov	9ae92d7056	[SelectionDAG] Virtualize isTargetStrictFPOpcode / isTargetMemoryOpcode (#119969 ) With this change, targets are no longer required to put memory / strict-fp opcodes after special `ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers. This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode (#119709). Pull Request: https://github.com/llvm/llvm-project/pull/119969	2024-12-21 05:29:51 +03:00
Craig Topper	f139bde8d8	[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536 ) This allows us to write more range based for loops because we no longer need the iterator. It also matches IR's Use class.	2024-12-19 09:07:42 -08:00
Craig Topper	e6b2495545	[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531 ) SDNode::use_iterator now returns an SDUse& when dereferenced. SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses work on use_iterator. SDNode::user_begin/user_end/users work on user_iterator. We can now write range based for loops using SDUse& and SDNode::uses(). I've converted many of these in this patch. I didn't update loops that have additional variables updated in their for statement. Some loops use SDNode::use_iterator::getOperandNo() which also prevents using range based for loops. I plan to move this into SDUse in a follow up patch.	2024-12-19 08:35:32 -08:00
Craig Topper	bd261ecc5a	[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509 ) Most of these are just places that want the first user and aren't iterating over the whole list. While there I changed some use_size() == 1 to hasOneUse() which is more efficient. This is part of an effort to rename use_iterator to user_iterator and provide a use_iterator that dereferences to SDUse&. This patch helps reduce the diff on later patches.	2024-12-18 22:13:04 -08:00
Craig Topper	104ad9258a	[SelectionDAG] Rename SDNode::uses() to users(). (#120499 ) This function is most often used in range based loops or algorithms where the iterator is implicitly dereferenced. The dereference returns an SDNode * of the user rather than SDUse * so users() is a better name. I've long beeen annoyed that we can't write a range based loop over SDUse when we need getOperandNo. I plan to rename use_iterator to user_iterator and add a use_iterator that returns SDUse& on dereference. This will make it more like IR.	2024-12-18 20:09:33 -08:00
Craig Topper	dc72ec808d	[RISCV] Custom legalize vp.merge for mask vectors. (#120479 ) The default legalization uses vmslt with a vector of XLen to compute a mask. This doesn't work if the type isn't legal. For fixed vectors it will scalarize. For scalable vectors it crashes the compiler. This patch uses an alternate strategy that promotes the i1 vector to an i8 vector and does the merge. I don't claim this to be the best lowering. I wrote it quickly almost 3 years ago when a crash was reported in our downstream. Fixes #120405.	2024-12-18 19:19:14 -08:00
Philip Reames	984cb791db	[RISCV] Use vmv.v.x to materialize masks in deinterleave2 lowering (#118500 ) This is a follow up to 2af2634 to use vmv.v.x of i8 constants instead of the prior vid/vand/vmsne sequence. The advantage of the vmv.v.x sequence is that it's always m1 (so cheaper at high LMUL), and can be rematerialized by the register allocator if needed to locally reduce register pressure.	2024-12-17 12:50:09 -08:00
Brendan Sweeney	bfe8a21bad	[RISCV][ISEL] Lowering to load-acquire/store-release for RISCV Zalasr (#82914 ) Lowering to load-acquire/store-release for RISCV Zalasr. Currently uses the psABI lowerings for WMO load-acquire/store-release (which are identical to A.7). These are incompatable with the A.6 lowerings currently used by LLVM. This should be OK for now since Zalasr is behind the enable experimental extensions flag, but needs to be fixed before it is removed from that. For TSO, it uses the standard Ztso mappings except for lowering seq_cst loads/store to load-acquire/store-release, I had Andrea review that.	2024-12-17 00:19:45 -08:00
Luke Lau	088db868f3	[RISCV] Merge shuffle sources if lanes are disjoint (#119401 ) In x264, there's a few kernels with shuffles like this: %41 = add nsw <16 x i32> %39, %40 %42 = sub nsw <16 x i32> %39, %40 %43 = shufflevector <16 x i32> %41, <16 x i32> %42, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 26, i32 30, i32 22, i32 18, i32 9, i32 13, i32 5, i32 1, i32 24, i32 28, i32 20, i32 16> Because this is a complex two-source shuffle, this will get lowered as two vrgather.vvs that are blended together. vadd.vv v20, v16, v12 vsub.vv v12, v16, v12 vrgatherei16.vv v24, v20, v10 vrgatherei16.vv v24, v12, v16, v0.t However the indices coming from each source are disjoint, so we can blend the two together and perform a single source shuffle instead: %41 = add nsw <16 x i32> %39, %40 %42 = sub nsw <16 x i32> %39, %40 %43 = select <0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1> %41, %42 %44 = shufflevector <16 x i32> %43, <16 x i32> poison, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 10, i32 14, i32 6, i32 2, i32 9, i32 13, i32 5, i32 1, i32 8, i32 12, i32 4, i32 0> The select will likely get merged into the preceding instruction, and then we only have to do one vrgather.vv: vadd.vv v20, v16, v12 vsub.vv v20, v16, v12, v0.t vrgatherei16.vv v24, v20, v10 This patch bails if either of the sources are a broadcast/splat/identity shuffle, since that will usually already have some sort of cheaper lowering. This improves performance on 525.x264_r by 4.12% with -O3 -flto -march=rva22u64_v on the spacemit-x60.	2024-12-12 13:41:27 +08:00
Luke Lau	2698fc699b	[RISCV] Refactor helper in isDesirableToCommuteWithShift. NFC (#119526 ) Instead of duplicating the loop twice, add arguments to the lambda. I plan on reusing this in #119527	2024-12-12 13:24:28 +08:00
Raphael Moreira Zinsly	708a478d67	[RISCV] Add stack clash protection (#117612 ) Enable `-fstack-clash-protection` for RISCV and stack probe for function prologues. We probe the stack by creating a loop that allocates and probe the stack in ProbeSize chunks. We emit an unrolled probe loop for small allocations and emit a variable length probe loop for bigger ones.	2024-12-10 16:48:26 +00:00
LiqinWeng	3083acc215	[DAGCombine] Remove oneuse restrictions for RISCV in folding (shl (add_nsw x, c1)), c2) and folding (shl(sext(add x, c1)), c2) in some scenarios (#101294 ) This patch remove the restriction for folding (shl (add_nsw x, c1)), c2) and folding (shl(sext(add x, c1)), c2), and test case from dhrystone , see this link: riscv32: https://godbolt.org/z/o8GdMKrae riscv64: https://godbolt.org/z/Yh5bPz56z	2024-12-10 11:17:54 +08:00
David Sherwood	8630a7ba7c	Reapply "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566 )" (#118823 ) [Reverts d57892a2a153ab71a796f07e39d939eae6910c21] For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant. --------- Co-authored-by: Paul Walker <paul.walker@arm.com>	2024-12-09 10:56:44 +00:00
Philip Reames	02ad623bb5	[RISCV] Prefer strided store for interleave store with one lane active (#119027 ) If we're performing a segment store and all but one of the segments are undefined, that's equivalent to performing a strided store of the one active segment. This is the store side of a905203b. As before, this only covers fixed vectors.	2024-12-06 16:45:01 -08:00
Michael Maitland	84efad0b47	[RISCV][MRI] Account for fixed registers when determining callee saved regs (#115756 ) This fixes https://discourse.llvm.org/t/fixed-register-being-spill-and-restored-in-clang/83058. We need to do it in `MachineRegisterInfo::getCalleeSavedRegs` instead of `RISCVRegisterInfo::getCalleeSavedRegs` since the MF argument of `TargetRegisterInfo:::getCalleeSavedRegs` is `const`, so we can't call `MF->getRegInfo().disableCalleeSavedRegister` there. So to put it in `MachineRegisterInfo::getCalleeSavedRegs`, we move `isRegisterReservedByUser` into `TargetSubtargetInfo`.	2024-12-06 14:07:27 -05:00
Pengcheng Wang	35619c791d	[RISCV] Add tune info for mem* expansion (#118439 ) So that CPUs can tune these options.	2024-12-06 14:48:37 +08:00
Philip Reames	e60a939a51	[RISCV] Use zext and shift for spread(4,8) when types allow (#118893 ) For a spread with an element type small enough, we can use a zext and shift to perform the shuffle. For e8, this covers spread(2,4,8), and for e16 covers spread(2,4). Note that spread(2) is already covered by the existing interleave logic, and is simply listed for completeness in the prior description.	2024-12-05 16:34:15 -08:00
Philip Reames	66a0a08133	[RISCV] Extract spread(2,4,8) shuffle lowering from interleave(2) (#118822 ) This is a prep patch for improving spread(4,8) shuffles. I also think it improves the readability of the existing code, but the primary motivation is simply staging work.	2024-12-05 11:32:27 -08:00
Mikhail R. Gadelha	59a9e4d8a4	[RISCV] Update matchSplatAsGather to convert vectors if they have different sizes (#117878 ) This patch updates the matchSplatAsGather function so we can handle vectors of different sizes. The goal is to improve the code gen for @llvm.experimental.vector.match on RISCV. Currently, we use a scalar extract and splat instead of vrgather, and the patch changes that.	2024-12-05 11:47:02 -03:00
Vitaly Buka	d57892a2a1	Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc" (#118693 ) Reverts llvm/llvm-project#117566 Breaks libc++ tests with HWASAN https://lab.llvm.org/buildbot/#/builders/55/builds/3959	2024-12-04 12:36:46 -08:00
Philip Reames	a6e7749ea9	[RISCV] Improve lowering of spread(2) shuffles (#118658 ) A spread(2) shuffle is just a interleave with an undef lane. The existing lowering was reusing the even lane for the undef value. This was entirely legal, but non-optimal.	2024-12-04 12:21:08 -08:00
David Sherwood	4675db5f39	[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566 ) For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.	2024-12-04 10:26:51 +00:00
Craig Topper	b076fbb844	[TargetLowering] Use Type* instead of EVT in shouldSignExtendTypeInLibCall. (#118587 ) I want to use this function for GISel too so Type * is a better common interface. All of the callers already convert EVT to Type * as needed by calling lowering anyway.	2024-12-03 22:06:55 -08:00
Brandon Wu	109e4a147f	[RISCV] Handle zeroinitializer of vector tuple Type (#113995 ) It doesn't make sense to add a new generic ISD to handle riscv tuple type. Instead we use `SPLAT_VECTOR` for ISD and further lower to `VMV_V_X`. Note: If there's `visitSPLAT_VECTOR` in generic DAG combiner, it needs to skip riscv vector tuple type. Stack on https://github.com/llvm/llvm-project/pull/114329	2024-12-04 13:40:02 +08:00
Philip Reames	c1afcaf33b	[RISCV] Match deinterleave(4,8) shuffles to SHL/TRUNC when legal (#118509 ) We can extend the existing SHL+TRUNC lowering used for deinterleave2 for deinterleave4, and deinterleave8 when the result types are small enough to allow the shift to be legal. On RV64, this means i8 and i16 results for deinterleave4 and i8 results for deinterleave8.	2024-12-03 19:35:52 -08:00
Philip Reames	b8805c88ce	[RISCV] Clang-format a few lines to remove diff in a nearby patch	2024-12-03 10:42:17 -08:00
Philip Reames	2af2634c64	[RISCV] Use vcompress in deinterleave2 intrinsic lowering (#118325 ) This is analogous to febbf91 which added shuffle lowering using vcompress; we can do the same thing in the deinterleave2 lowering path which is used for scalable vectors. Note that we can further improve this for high lmul usage by adjusting how we materialize the mask (whose result is at most m1 with a known bit pattern). I am deliberately staging the work so that the changes to reduce register pressure are more easily evaluated on their own merit.	2024-12-02 18:37:32 -08:00
Philip Reames	56cb5cbfcd	[RISCV] Remove RISCVISD::VNSRL_VL and adjust deinterleave lowering to match (#118391 ) Instead of directly lowering to vnsrl_vl and having custom pattern matching for that case, we can just lower to a (legal) shift and truncate, and let generic pattern matching produce the vnsrl. The major motivation for this is that I'm going to reuse this logic to handle e.g. deinterleave4 w/ i8 result. The test changes aren't particularly interesting. They're minor code improvements - I think because we do slightly better with the insert_subvector patterns, but that's mostly irrelevant.	2024-12-02 13:39:12 -08:00

1 2 3 4 5 ...

1839 Commits