llvm-project

Author	SHA1	Message	Date
Philip Reames	9478822f4f	[RISCV] Decompose single source shuffles (without exact VLEN) (#126951 ) (This is a re-apply for what was 8374d42. The bug there was fairly major - despite the comments and review description, the code was using each register in the source register group, not only the first register. This was completely wrong.) This is a continuation of the work started in https://github.com/llvm/llvm-project/pull/125735 to lower selected VLA shuffles in linear m1 components instead of generating O(LMUL^2) or O(LMUL*Log2(LMUL)) high LMUL shuffles. This pattern focuses on shuffles where all the elements being used across the entire destination register group come from a single register in the source register group. Such cases come up fairly frequently via e.g. spread(N), and repeat(N) idioms. One subtlety to this patch is the handling of the index vector for vrgatherei16.vv. Because the index and source registers can have different EEW, the index vector for the Nth chunk of the destination is not guaranteed to be register aligned. In fact, it is common for e.g. an EEW=64 shuffle to have EEW=16 indices which are four chunks per source register. Given this, we have to pay a cost for extracting these chunks into the low position before performing each shuffle. I'd initially expressed this as a naive extract sub-vector for each data parallel piece. However, at high LMUL, this quickly caused register pressure problems since we could at worst need 4x the temporary registers for the index. Instead, this patch uses a repeating slidedown chained from previous iterations. This increases critical path by at worst 3 slides (SEW=64 is the worst case), but reduces register pressure to at worst 2x - and only if the original index vector is reused elsewhere. I view this as arguably a bit of a workaround (since our scheduling should have done better with the plain extract variant), but a probably necessary one.	2025-02-12 12:10:35 -08:00
Philip Reames	c77d202759	Revert "[RISCV] Decompose single source shuffles (without exact VLEN) (#126108 )" This reverts commit 8374d421861cd3d47e21ae7889ba0b4c498e8d85. A miscompile was reported against the review thread, reverting while we investigate.	2025-02-12 08:33:20 -08:00
Philip Reames	ff8f6abe20	Reapply "[RISCV] Allow undef prefix for local repeating VLA shuffle lowering (#126097 )" (With a fix to recently added code.) Implement the first TODO from #125735, and minorly cleanup code using same style as the recently landed strict prefix case.	2025-02-12 07:40:13 -08:00
Philip Reames	a6a5507e36	Revert "[RISCV] Allow undef prefix for local repeating VLA shuffle lowering (#126097 )" This reverts commit ab0006ddba3e977c44e1e761909e09603816b32c. It appears to have rebased badly during web merge.	2025-02-11 15:46:12 -08:00
Philip Reames	ab0006ddba	[RISCV] Allow undef prefix for local repeating VLA shuffle lowering (#126097 ) Implement the first TODO from #125735, and minorly cleanup code using same style as the recently landed strict prefix case.	2025-02-11 15:30:36 -08:00
Philip Reames	8374d42186	[RISCV] Decompose single source shuffles (without exact VLEN) (#126108 ) This is a continuation of the work started in #125735 to lower selected VLA shuffles in linear m1 components instead of generating O(LMUL^2) or O(LMUL*Log2(LMUL)) high LMUL shuffles. This pattern focuses on shuffles where all the elements being used across the entire destination register group come from a single register in the source register group. Such cases come up fairly frequently via e.g. spread(N), and repeat(N) idioms. One subtlety to this patch is the handling of the index vector for vrgatherei16.vv. Because the index and source registers can have different EEW, the index vector for the Nth chunk of the destination is not guaranteed to be register aligned. In fact, it is common for e.g. an EEW=64 shuffle to have EEW=16 indices which are four chunks per source register. Given this, we have to pay a cost for extracting these chunks into the low position before performing each shuffle. I'd initially expressed this as a naive extract sub-vector for each data parallel piece. However, at high LMUL, this quickly caused register pressure problems since we could at worst need 4x the temporary registers for the index. Instead, this patch uses a repeating slidedown chained from previous iterations. This increases critical path by at worst 3 slides (SEW=64 is the worst case), but reduces register pressure to at worst 2x - and only if the original index vector is reused elsewhere. I view this as arguably a bit of a workaround (since our scheduling should have done better with the plain extract variant), but a probably necessary one.	2025-02-11 14:40:36 -08:00
Philip Reames	e4016bf5c3	[DAG] Use ArrayRef to simplify ShuffleVectorSDNode::isSplatMask	2025-02-11 12:47:10 -08:00
Sudharsan Veeravalli	83783e8bec	[RISCV] Fix typos discovered by codespell (NFC) (#126191 ) Found using https://github.com/codespell-project/codespell ``` codespell RISCV --write-changes \ --ignore-words-list=FPR,fpr,VAs,ORE,WorstCase,hart,sie,MIs,FLE,fle,CarryIn,vor,OLT,VILL,vill,bu,pass-thru ```	2025-02-07 13:35:30 +05:30
Craig Topper	932d0ce325	Recommit "[RISCV] Prefer (select (x < 0), y, z) -> x >> (XLEN - 1) & (y - z) + z even with Zicond. (#125772 )" With the test changes. Original message: The Zicond version of this requires an li instruction and an additional register. Without Zicond we match this in a DAGCombine on RISCVISD::SELECT_CC. This PR has 2 commits. I'll pre-commit the test change if this looks good.	2025-02-06 10:29:45 -08:00
Craig Topper	bddc815c61	Revert "[RISCV] Prefer (select (x < 0), y, z) -> x >> (XLEN - 1) & (y - z) + z even with Zicond. (#125772 )" This reverts commit ead88c787c4eba28b2148a5aaf190186bdb40820. I seem to have lost the test updates when rebasing.	2025-02-06 09:22:14 -08:00
Craig Topper	ead88c787c	[RISCV] Prefer (select (x < 0), y, z) -> x >> (XLEN - 1) & (y - z) + z even with Zicond. (#125772 ) The Zicond version of this requires an li instruction and an additional register. Without Zicond we match this in a DAGCombine on RISCVISD::SELECT_CC. This PR has 2 commits. I'll pre-commit the test change if this looks good.	2025-02-06 08:53:40 -08:00
Philip Reames	f2ac265c22	[RISCV] Reduce the LMUL for a vrgather operation if legal (#125768 ) If we're lowering a shuffle to a vrgather (or vcompress), and we know that a prefix of the operation can be done while producing the same (defined) lanes, do the operation with a narrower LMUL.	2025-02-06 07:15:36 -08:00
Philip Reames	fc10ad1a66	[RISCV] Make single source reverse legal in isShuffleMaskLegal (#125949 ) This enables DAG combines to form this mask. Reverse is generally linear in LMUL so this is reasonable, and results in better codegen for the 2 source variants. For <= m1, the change is only slightly profitable if at all. We trade some mask creation and an extract vrsub for a vslideup.vi. This is likely roughly neutral. At >= m2, this is distinctly profitable as generic DAG pushes the reverse into the two operands. We effectively already did this for one operand, but the other was hitting a full O(LMUL^2) shuffle. Moving that to be O(LMUL/2) operation is a big win.	2025-02-05 18:12:59 -08:00
Min-Yih Hsu	5a1e16f6de	[IR][RISCV] Add llvm.vector.(de)interleave3/5/7 (#124825 ) These three intrinsics are similar to llvm.vector.(de)interleave2 but work with 3/5/7 vector operands or results. For RISC-V, it's important to have them in order to support segmented load/store with factor of 2 to 8: factor of 2/4/8 can be synthesized from (de)interleave2; factor of 6 can be synthesized from factor of 2 and 3; factor 5 and 7 have their own intrinsics added by this patch. This patch only adds codegen support for these intrinsics, we still need to teach vectorizer to generate them as well as teaching InterleavedAccessPass to use them. --------- Co-authored-by: Craig Topper <craig.topper@sifive.com>	2025-02-05 15:30:33 -08:00
Craig Topper	0d7ee520d3	[RISCV] Use getSignedConstant for negative values. (#125903 ) The APInt constructor asserts if bits are set past the size of the APInt unless it is signed. This currently fails on RV32 because more than XLen bits are set.	2025-02-05 14:49:01 -08:00
Alexey Bataev	23b6a05ec9	[CG][RISCV]Fix shuffling of odd number of input vectors If the input contains odd number of shuffled vectors, the 2 last shuffles are shuffled with the same first vector. Need to correctly process such situation: when the first vector is requested for the first time - extract it from the source vector, when it is requested the second time - reuse previous result. The second vector should be extracted in both cases. Fixes #125269 Reviewers: topperc, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/125693	2025-02-05 07:13:33 -05:00
Luke Lau	0815b0e7ce	[RISCV] Don't custom lower direct fp_extends where possible (#125644 ) This avoids lowering scalable fp_extends that don't need multiple extends (i.e. f16->f32, f32->f64) to _vl nodes, but converts them back during DAG preprocessing so we don't need to add any more patterns. Keeping the nodes in their generic SDNode form matches more splat patterns	2025-02-05 12:29:26 +08:00
Philip Reames	6b3cbf2a0f	[RISCV] Decompose locally repeating shuffles (without exact VLEN) (#125735 ) High LMUL shuffles are expensive on typical SIMD implementations. Without exact vector length knowledge, we struggle to map elements within the vector to the register within the vector register group. However, there are some patterns where we can perform a vector length agnostic (VLA) shuffle by leveraging knowledge of the pattern performed even without the ability to map individual elements to registers. An existing in tree example is vector reverse. This patch introduces another such case. Specifically, if we have a shuffle where a local rearrangement of elements is happening within a 128b (really zvlNb) chunk, and we're applying the same pattern to each chunk, we can decompose a high LMUL shuffle into a linear number of m1 shuffles. We take advantage of the fact the tail of the operation is undefined, and repeat the pattern for all elements in the source register group - not just the ones the fixed vector type covers. This is an optimization for typical SIMD vrgather designs, but could be a pessimation on hardware for which vrgather's execution cost is not independent of the runtime VL.	2025-02-04 19:10:56 -08:00
Min-Yih Hsu	005b23bb3b	[IA][RISCV] Support VP loads/stores in InterleavedAccessPass (#120490 ) Teach InterleavedAccessPass to recognize the following patterns: - vp.store an interleaved scalable vector - Deinterleaving a scalable vector loaded from vp.load Upon recognizing these patterns, IA will collect the interleaved / deinterleaved operands and delegate them over to their respective newly-added TLI hooks. For RISC-V, these patterns are lowered into segmented loads/stores. Right now we only recognize power-of-two (de)interleave cases, in which (de)interleave4/8 are synthesized from a tree of (de)interleave2. --------- Co-authored-by: Nikolay Panchenko <nicholas.panchenko@gmail.com>	2025-02-04 11:07:34 -08:00
Piotr Fusik	f7aad60cd1	[RISCV] Fold vector shift of sext/zext to widening multiply (#121563 ) (shl (sext X), C) -> (vwmulsu X, 1u << C) (shl (zext X), C) -> (vwmulu X, 1u << C)	2025-02-04 16:59:57 +01:00
Craig Topper	7c5100d36d	[RISCV] Check isFixedLengthVector before calling getVectorNumElements in getSingleShuffleSrc. (#125455 ) I have been unsuccessful at further reducing the test. The failure requires a shuffle with 2 scalable->fixed extracts with the same source. 0 is the only valid index for a scalable->fixed extract so the 2 sources must be the same extract. Shuffles with the same source are aggressively canonicalized to a unary shuffle. So it requires the extracts to become identical through other optimizations without the shuffle being canonicalized before it is lowered. Fixes #125306.	2025-02-03 13:48:42 -08:00
Philip Reames	d841c8842e	[RISCV] Move spread(4,8) shuffle lowering above generic fallbacks [NFC] NFC because the patterns are distinct, but has confused me now twice despite being the person who wrote said code.	2025-01-31 12:43:46 -08:00
Craig Topper	e7e72a9bb4	[RISCV] Add DAG combine for forming VAADDU_VL from VP intrinsics. (#124848 ) This adds a VP version of an existing DAG combine. I've put it in RISCVISelLowering since we would need to add a ISD::VP_AVGCEIL opcode otherwise. This pattern appears in 525.264_r.	2025-01-30 09:03:00 -08:00
Djordje Todorovic	0cb7636a46	[RISCV] Add MIPS extensions (#121394 ) Adding two extensions for MIPS p8700 CPU: 1. cmove (conditional move) 2. lsp (load/store pair) The official product page here: https://mips.com/products/hardware/p8700	2025-01-28 08:04:09 +01:00
Philip Reames	a9ad601f7c	[RISCV] Use vrsub for select of add and sub of the same operands (#123400 ) If we have a (vselect c, a+b, a-b), we can combine this to a+(vselect c, b, -b). That by itself isn't hugely profitable, but if we reverse the select, we get a form which matches a masked vrsub.vi with zero. The result is that we can use a masked vrsub before the add instead of a masked add or sub. This doesn't change the critical path (since we already had the pass through on the masked second op), but does reduce register pressure since a, b, and (a+b) don't need to all be alive at once. In addition to the vselect form, we can also see the same pattern with a vector_shuffle encoding the vselect. I explored canonicalizing these to vselects instead, but that exposes several unrelated missing combines.	2025-01-24 10:08:42 -08:00
Sam Elliott	e06b703030	[RISCV][NFC] Remove Redundant Inline Asm Logic (#124202 ) This was left over from 408659c5b5c7d745042ae71db344d1ed10601512.	2025-01-23 17:40:36 -08:00
Sam Elliott	33c4407471	[RISCV] Support cR Inline Asm Constraint (#124174 ) This denotes RVC-compatible GPR Pairs, which are used by the Zclsd extension. C API PR: riscv-non-isa/riscv-c-api-doc#102	2025-01-23 16:19:19 -08:00
Min-Yih Hsu	bc74a1edbe	[IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863 ) Previously, AArch64 used pattern matching to support llvm.vector.(de)interleave of 2 and 4; RISC-V only supported (de)interleave of 2. This patch consolidates the logic in these two targets by factoring out the common factor calculations into the InterleavedAccess pass.	2025-01-23 15:27:51 -08:00
Philip Reames	e7001061b7	[RISCV] Revise naming and style in matchSplatAsGather [nfc]	2025-01-21 13:09:25 -08:00
Mikhail R. Gadelha	89f119cbda	[RISCV] Update matchSplatAsGather to use the index of extract_elt if in-bounds (#118873 ) This is a follow-up to #117878 and allows the usage of vrgather if the index we are accessing in VT is constant and within bounds. This patch replaces the previous behavior of bailing out if the length of the search vector is greater than the vector of elements we are searching for. Since matchSplatAsGather works on EXTRACT_VECTOR_ELT, and we know the index from which the element is extracted, we only need to check if we are doing an insert from a larger vector into a smaller one, in which we do an extract instead. Co-authored-by: Luke Lau luke_lau@icloud.com Co-authored-by: Philip Reames preames@rivosinc.com	2025-01-21 12:51:41 -08:00
yingopq	754ed95b66	[Mips] Fix compiler crash when returning fp128 after calling a function returning { i8, i128 } (#117525 ) Fixes https://github.com/llvm/llvm-project/issues/96432.	2025-01-20 16:47:40 +08:00
Philip Reames	143c33c6df	[RISCV] Consider only legally typed splats to be legal shuffles (#123415 ) Given the comment, I'd expected test coverage. There was none so let's do the simple thing which benefits the one thing we have tests for.	2025-01-17 19:13:04 -08:00
Craig Topper	0c6e03eea0	[RISCV] Fold vp.store(vp.reverse(VAL), ADDR, MASK) -> vp.strided.store(VAL, NEW_ADDR, -1, MASK) (#123123 ) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>	2025-01-17 14:22:25 -08:00
Philip Reames	bb6e94a05d	[RISCV] Custom legalize <N x i128>, <4 x i256>, etc.. shuffles (#122352 ) I have a particular user downstream who likes to write shuffles in terms of unions involving _BitInt(128) types. This isn't completely crazy because there's a bunch of code in the wild which was written with SSE in mind, so 128 bits is a common data fragment size. The problem is that generic lowering scalarizes this to ELEN, and we end up with really terrible extract/insert sequences if the i128 shuffle is between other (non-i128) operations. I explored trying to do this via generic lowering infrastructure, and frankly got lost. Doing this a target specific DAG is a bit ugly - really, there's nothing hugely target specific here - but oh well. If reviewers prefer, I could probably phrase this as a generic DAG combine, but I'm not sure that's hugely better. If reviewers have a strong preference on how to handle this, let me know, but I may need a bit of help. A couple notes: * The argument passing weirdness is due to a missing combine to turn a build_vector of adjacent i64 loads back into a vector load. I'm a bit surprised we don't get that, but the isel output clearly has the build_vector at i64. * The splat case I plan to revisit in another patch. That's a relatively common pattern, and the fact I have to scalarize that to avoid an infinite loop is non-ideal.	2025-01-16 14:55:45 -08:00
Raphael Moreira Zinsly	01d7f434d2	[RISCV] Stack clash protection for dynamic alloca (#122508 ) Create a probe loop for dynamic allocation and add the corresponding SelectionDAG support in order to use it.	2025-01-16 11:58:42 -08:00
Craig Topper	fc7a1ed0ba	[RISCV] Fold vp.reverse(vp.load(ADDR, MASK)) -> vp.strided.load(ADDR, -1, MASK). (#123115 ) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>	2025-01-16 08:20:17 -08:00
Alexey Bataev	bab7920fd7	[RISCV][CG]Use processShuffleMasks for per-register shuffles Patch adds usage of processShuffleMasks in codegen in lowerShuffleViaVRegSplitting. This function is already used for X86 shuffles estimations and in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE functions, unifies the code. Reviewers: topperc, wangpc-pp, lukel97, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/121765	2025-01-13 17:06:25 -05:00
Philip Reames	24bb180e8a	[RISCV] Attempt to widen SEW before generic shuffle lowering (#122311 ) This takes inspiration from AArch64 which does the same thing to assist with zip/trn/etc.. Doing this recursion unconditionally when the mask allows is slightly questionable, but seems to work out okay in practice. As a bit of context, it's helpful to realize that we have existing logic in both DAGCombine and InstCombine which mutates the element width in an analogous manner. However, that code has two restrictions which prevent it from handling the motivating cases here. First, it only triggers if there is a bitcast involving a different element type. Second, the matcher used considers a partially undef wide element to be a non-match. I considered trying to relax those assumptions, but the information loss for undef in mid-level opt seemed more likely to open a can of worms than I wanted.	2025-01-10 07:12:24 -08:00
Craig Topper	b0f11dfc75	[RISCV] Add call preserved regmask to tail calls. (#122181 ) Every call should have regmask operand to indicate what registers are preserved or clobbered by the call. VirtRegRewriter uses this to tell MachineRegisterInfo what registers are clobbered by a function. If the mask isn't present the registers potentially clobbered by a tail called function aren't counted. I have checked ARM, AArch64, and X86 and they all have a regmask operand on their tail calls. I believe this fixes an issue I'm seeing with IPRA.	2025-01-08 16:19:31 -08:00
Mikhail R. Gadelha	b0e05a5f04	[RISCV] Add missing check before accessing pointer C2 can be null here, so we need to check it or clang may crash.	2025-01-07 17:55:37 +08:00
Craig Topper	c8d435f9af	[RISCV] Use ISD::XOR instead of RISCVISD::VMXOR_VL in lowerVectorMaskVecReduction of scalable ISD::VECREDUCE_AND (#121812 ) This allows combining the XOR with earlier ISD::ANDs inserted by type legalization.	2025-01-06 14:26:21 -08:00
Craig Topper	1401703fe4	[RISCV] Add Enum for CSR encodings. (#121674 ) This allows us to use them in C++ code without needing to do a table lookup.	2025-01-06 10:11:29 -08:00
Luke Lau	b359c84f3a	[RISCV] Don't commute with shift if it would break sh{1,2,3}add pattern (#119527 ) This fixes a regression from #101294 by checking if we might be clobbering a sh{1,2,3}add pattern. Only do this if the underlying add isn't going to be folded away into an address offset.	2025-01-06 19:23:21 +08:00
Philip Reames	6840521739	Revert "[RISCV][CG]Use processShuffleMasks for per-register shuffles" This reverts commit b8952d4b1b0c73bf39d6440ad3166a088ced563f. spec x264 fails to build in all VLS configurations, with the assertion failure: clang: ../llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:5246: llvm::SDValue lowerShuffleViaVRegSplitting(llvm::ShuffleVectorSDNode*, llvm::SelectionDAG&, const llvm::RISCVSubtarget&): Assertion `RegCnt == NumOfDestRegs && "Whole vector must be processed"' failed. I can reduce a failing piece of IR, but the failure appears pretty broad, so I suspect any reasonable vls build will hit it.	2025-01-01 10:53:24 -08:00
Alexey Bataev	b8952d4b1b	[RISCV][CG]Use processShuffleMasks for per-register shuffles Patch adds usage of processShuffleMasks in codegen in lowerShuffleViaVRegSplitting. This function is already used for X86 shuffles estimations and in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE functions, unifies the code. Reviewers: preames, topperc, lukel97, wangpc-pp Reviewed By: wangpc-pp Pull Request: https://github.com/llvm/llvm-project/pull/120803	2024-12-23 11:18:10 -05:00
Sergei Barannikov	9ae92d7056	[SelectionDAG] Virtualize isTargetStrictFPOpcode / isTargetMemoryOpcode (#119969 ) With this change, targets are no longer required to put memory / strict-fp opcodes after special `ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers. This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709). Pull Request: https://github.com/llvm/llvm-project/pull/119969	2024-12-21 05:29:51 +03:00
Craig Topper	f139bde8d8	[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536 ) This allows us to write more range based for loops because we no longer need the iterator. It also matches IR's Use class.	2024-12-19 09:07:42 -08:00
Craig Topper	e6b2495545	[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531 ) SDNode::use_iterator now returns an SDUse& when dereferenced. SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses work on use_iterator. SDNode::user_begin/user_end/users work on user_iterator. We can now write range based for loops using SDUse& and SDNode::uses(). I've converted many of these in this patch. I didn't update loops that have additional variables updated in their for statement. Some loops use SDNode::use_iterator::getOperandNo() which also prevents using range based for loops. I plan to move this into SDUse in a follow up patch.	2024-12-19 08:35:32 -08:00
Craig Topper	bd261ecc5a	[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509 ) Most of these are just places that want the first user and aren't iterating over the whole list. While there I changed some use_size() == 1 to hasOneUse() which is more efficient. This is part of an effort to rename use_iterator to user_iterator and provide a use_iterator that dereferences to SDUse&. This patch helps reduce the diff on later patches.	2024-12-18 22:13:04 -08:00
Craig Topper	104ad9258a	[SelectionDAG] Rename SDNode::uses() to users(). (#120499 ) This function is most often used in range based loops or algorithms where the iterator is implicitly dereferenced. The dereference returns an SDNode * of the user rather than SDUse * so users() is a better name. I've long been annoyed that we can't write a range based loop over SDUse when we need getOperandNo. I plan to rename use_iterator to user_iterator and add a use_iterator that returns SDUse& on dereference. This will make it more like IR.	2024-12-18 20:09:33 -08:00
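
Note on ead88c787c / 932d0ce325 (#125772): the fold named in the commit title, (select (x < 0), y, z) -> ((x >> (XLEN - 1)) & (y - z)) + z, relies on the arithmetic right shift producing an all-ones mask for negative x and zero otherwise. The sketch below models that identity in Python as a sanity check. It is not LLVM code: XLEN = 64 is an assumption, and the helper names (to_signed, select_reference, select_branchless) are made up for illustration.

```python
# Model of the select fold from #125772 on XLEN-bit two's-complement values.
# XLEN and all function names here are illustrative assumptions.
XLEN = 64
MASK = (1 << XLEN) - 1


def to_signed(v):
    """Wrap an arbitrary Python int to a signed XLEN-bit value."""
    v &= MASK
    return v - (1 << XLEN) if v >> (XLEN - 1) else v


def select_reference(x, y, z):
    """Plain form: (x < 0) ? y : z."""
    return to_signed(y) if to_signed(x) < 0 else to_signed(z)


def select_branchless(x, y, z):
    """Folded form: ((x >> (XLEN - 1)) & (y - z)) + z."""
    # Python's >> on a negative int is an arithmetic shift, so the mask
    # is -1 (all ones) when x is negative and 0 otherwise.
    sign_mask = to_signed(x) >> (XLEN - 1)
    # Wrap the final sum back to XLEN bits, as the hardware add would.
    return to_signed((sign_mask & (y - z)) + z)


if __name__ == "__main__":
    samples = [0, 1, -1, 7, -7, 2**63 - 1, -(2**63)]
    for x in samples:
        for y in samples:
            for z in samples:
                assert select_branchless(x, y, z) == select_reference(x, y, z)
    print("fold matches the plain select on all samples")
```

The samples include the extremes of the signed range, so the case where y - z overflows XLEN bits is exercised as well; the final wrap makes the sum come out right in that case.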

1862 Commits