llvm-project

Author	SHA1	Message	Date
Craig Topper	999b9e6ddb	[RISCV] Use vector getConstant instead of getSplatVector+getConstant. NFC	2024-04-10 19:39:41 -07:00
Jianjian Guan	fd50151180	[RISCV] Only support SPLAT_VECTOR for Zvfhmin when the scalar half FP extension is also enabled (#88275 )	2024-04-11 10:23:26 +08:00
Craig Topper	323d3ab257	[RISCV] Optimize undef Even vector in getWideningInterleave. (#88221 ) We recently optimized the code when the Odd vector was undef to fix a poison bug. There are additional optimizations we can do if the even vector is undef. With Zvbb, we can use a single vwsll. Without Zvbb, we can use a vzext.vf2 and a vsll.	2024-04-10 09:08:50 -07:00
Chia	469caa31e7	[RISCV] Use vwadd.vx for splat vector with extension (#87249 ) This patch allows `combineBinOp_VLToVWBinOp_VL` to handle patterns like `(splat_vector (sext op))` or `(splat_vector (zext op))`. Then we can use `vwadd.vx` and `vwadd.w` for such a case. ### Source code ``` define <vscale x 8 x i64> @vwadd_vx_splat_sext(<vscale x 8 x i32> %va, i32 %b) { %sb = sext i32 %b to i64 %head = insertelement <vscale x 8 x i64> poison, i64 %sb, i32 0 %splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer %vc = sext <vscale x 8 x i32> %va to <vscale x 8 x i64> %ve = add <vscale x 8 x i64> %vc, %splat ret <vscale x 8 x i64> %ve } ``` ### Before this patch [Compiler Explorer](https://godbolt.org/z/sq191PsT4) ``` vwadd_vx_splat_sext: sext.w a0, a0 vsetvli a1, zero, e64, m8, ta, ma vmv.v.x v16, a0 vsetvli zero, zero, e32, m4, ta, ma vwadd.wv v16, v16, v8 vmv8r.v v8, v16 ret ``` ### After this patch ``` vwadd_vx_splat_sext vsetvli a1, zero, e32, m4, ta, ma vwadd.vx v16, v8, a0 vmv8r.v v8, v16 ret ```	2024-04-10 15:26:17 +09:00
Luke Lau	9c660362c4	[RISCV] Support vwsll in combineBinOp_VLToVWBinOp_VL (#87620 ) If the subtarget has +zvbb then we can attempt folding shl and shl_vl to vwsll nodes. There are a few test cases where we still don't pick up the vwsll: - For fixed vector vwsll.vi on RV32, see the FIXME for VMV_V_X_VL in fillUpExtensionSupport for supporting implicit sign extension - For scalable vector vwsll.vi we need to support ISD::SPLAT_VECTOR, see #87249	2024-04-09 16:10:35 +08:00
Luke Lau	0f20b9b92f	[RISCV] Don't require mask or VL to be the same in combineBinOp_VLToVWBinOp_VL (#87997 ) In NodeExtensionHelper we keep track of the VL and mask of the operand being extended and check that they are the same as the root node's. However for the nodes that we support, none of them have a passthru operand with the exception of RISCV::VMV_V_X_VL, but we check that its passthru is undef anyway. So it's safe to just discard the extend node's VL and mask and just use the root's instead. (This is the same type of reasoning we use to treat any vmset_vl as an all ones mask) This allows us to match some more cases where we mix VP/non-VP/VL nodes, but these don't seem to appear in practice. The main benefit from this would be to simplify the code.	2024-04-09 16:04:10 +08:00
Luke Lau	8b3b4a92ad	[RISCV] Fix canFoldToVWWithSameExtension allowing different FP extensions (#87978 )	2024-04-08 19:20:36 +08:00
Craig Topper	3b19cd7f80	[RISCV] Slightly simplify RVVArgDispatcher::constructArgInfos. NFC (#87308 ) Use a single insert for the non-mask case instead of a push_back followed by an insert that may contain 0 registers.	2024-04-02 18:34:03 -07:00
Craig Topper	a9af66a90e	[RISCV] Lower (vector_interleave X, undef) to (vzext_vl X). (#87283 ) If the odd vector is undef or poison, the widening add and multiply trick doesn't work unless we freeze the odd vector. Unfortunately, freezing doesn't work when the operand is provably undef/poison. MIR doesn't have a representation for freeze so it just becomes a COPY from IMPLICIT_DEF which freely propagates undef to each operand independently. To work around this, check for undef explicitly and lower to a VZEXT_VL of the even vector. This produces better code than we'd get from a freeze anyway. I've left a FIXME for adding a freeze. I'll do that as a separate patch as it affects other tests and doesn't help with the new test.	2024-04-02 11:58:41 -07:00
Brandon Wu	29e8bfc13c	[RISCV] RISCV vector calling convention (2/2) (#79096 ) This commit handles vector arguments/return for function definition/call. The new class RVVArgDispatcher is added to do all vector register assignment, including mask types, data types and tuple types. It precomputes the register number for each argument as per https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#standard-vector-calling-convention-variant and is passed to the calling convention function to handle all vector arguments. Depends on: #78550	2024-03-30 21:05:33 +08:00
Luke Lau	2a315d800b	[RISCV] Combine (or disjoint ext, ext) -> vwadd (#86929 ) DAGCombiner (or InstCombine) will convert an add to an or if the bits are disjoint, which can prevent what was originally an (add {s,z}ext, {s,z}ext) from being selected as a vwadd. This teaches combineBinOp_VLToVWBinOp_VL to recover it by treating it as an add.	2024-03-29 19:45:24 +08:00
Luke Lau	a3c2d8c072	[RISCV] Combine ({s,u}{div,rem} (zext, zext)) -> (zext ({s,u}{div,rem} (zext, zext))) (#86779 ) This narrows unsigned and signed div and rem nodes via combineBinOpOfZExt. Unlike other binary ops, there are no widening div or rem instructions. So we will end up with an extra vzext.vf2. However I'm assuming that div/rem are expensive enough that by reducing their EMUL we will gain back the cost. Alive2 proof: https://alive2.llvm.org/ce/z/Et_L6y	2024-03-29 05:55:38 +08:00
Brandon Wu	91896607ff	[RISCV] RISCV vector calling convention (1/2) (#77560 ) This is the vector calling convention based on https://github.com/riscv-non-isa/riscv-elf-psabi-doc, the idea is to split between "scalar" callee-saved registers and "vector" callee-saved registers. "scalar" ones remain the original strategy, however, "vector" ones are handled together with RVV objects. The stack layout, from FP (top) down to SP (bottom), would be: callee-allocated save area for register varargs; callee-saved registers (scalar); RVV alignment padding; callee-saved registers (vector); RVV objects; padding before RVV; scalar local variables; BP; variable size objects; SP. Note: This patch doesn't contain "tuple" type, e.g. vint32m1x2. It will be handled in https://github.com/riscv-non-isa/riscv-elf-psabi-doc (2/2). Differential Revision: https://reviews.llvm.org/D154576	2024-03-27 23:03:13 +08:00
Luke Lau	87519a2830	[RISCV] Combine (mul (zext, zext)) -> (zext (mul (zext, zext))) (#86465 ) Building on #86248, we can also narrow the width of a mul of zexts. This is specifically legal because on RVV we always extend to the next power of 2 width, and multiplying two N bit integers produces a maximum value of 2*N bits. So as long as we keep an inner zext of 2*N, we will have enough space for the multiply and won't overflow. Alive2 proof: https://alive2.llvm.org/ce/z/XteYyb	2024-03-26 23:28:04 +08:00
Philip Reames	a6b870db09	[RISCV] Enable sub(max, min) lowering for ABDS and ABDU (#86592 ) We have the ISD nodes for representing signed and unsigned absolute difference. For RISCV, we have vector min/max in the base vector extension, so we can expand to the sub(max,min) lowering. We could almost use the default expansion, but since fixed length min/max are custom (not legal), the default expansion doesn't cover the fixed vector cases. The expansion here is just a copy of the generic code specialized to allow the custom min/max nodes to be created so they can in turn be legalized to the _vl variants. Existing DAG combines handle the recognition of absolute difference idioms and conversion into the respective ISD::ABDS and ISD::ABDU nodes. This change does have the net effect of potentially pushing a free floating zero/sign extend after the expansion, and we don't do a great job of folding that into later expressions. However, since in general narrowing can reduce required work (by reducing LMUL) this seems like the right general tradeoff.	2024-03-25 20:13:53 -07:00
Craig Topper	ce37a7131f	[RISCV] Add integer RISCVISD::SELECT_CC to canCreateUndefOrPoison and isGuaranteedNotToBeUndefOrPoison. (#84693 ) Integer RISCVISD::SELECT_CC doesn't create poison. If none of the operands are poison, the result is not poison. This allows ISD::FREEZE to be hoisted above RISCVISD::SELECT_CC.	2024-03-25 11:10:58 -07:00
Luke Lau	373e77b4c0	[RISCV] Generalize (sub zext, zext) -> (sext (sub zext, zext)) to add (#86248 ) This generalizes the combine added in #82455 to other binary ops, beginning with adds in this patch. Because the two zext operands are always +ve when treated as signed, and we don't get any overflow since the add is carried out in at least N * 2 bits of the narrow type, the result of the add will always be +ve. So we can use a zext for the outer extend, unlike sub which may produce a -ve result from two +ve operands. Although we could still use sext for add, I plan to add support for other binary ops like mul in a later patch, but mul requires zext to be correct (because the maximum value will take up the full N * 2 bits). So I've opted to use zext here too for consistency. Alive2 proof: https://alive2.llvm.org/ce/z/PRNsUM	2024-03-25 13:08:56 +08:00
Harvin Iriawan	57146daeaa	[CodeGen] Update for scalable MemoryType in MMO (#70452 ) Remove getSizeOrUnknown call when MachineMemOperand is created. For Scalable TypeSize, the MemoryType created becomes a scalable_vector. 2 MMOs that have scalable memory access can then use the updated BasicAA that understands scalable LocationSize. Original Patch by Harvin Iriawan Co-authored-by: David Green <david.green@arm.com>	2024-03-23 12:56:25 +00:00
Luke Lau	51d5b65819	[RISCV] Handle scalable ops with < EEW / 2 narrow types in combineBinOp_VLToVWBinOp_VL (#84158 ) We can remove the restriction that the narrow type needs to be exactly EEW / 2 for scalable ISD::{ADD,SUB,MUL} nodes. This allows us to perform the combine even if we can't fully fold the extend into the widening op. VP intrinsics already do this, since they are lowered to _VL nodes which don't have this restriction. The "exactly EEW / 2" narrow type restriction prevented us from emitting V{S,Z}EXT_VL nodes with i1 element types which crash when we try to select them, since no other legal type is double the size of i1, see the test case added in this PR `i1_zext`. So to preserve this, this adds a check for i1 narrow types instead.	2024-03-22 07:26:29 +08:00
Luke Lau	06d245242e	[RISCV] Recursively split concat_vector into smaller LMULs when lowering (#85825 ) This is a reimplementation of the combine added in #83035 but as a lowering instead of a combine, so we don't regress the test case added in e59f120e3a14ccdc55fcb7be996efaa768daabe0 by interfering with the strided load combine. Previously the combine had to concatenate the split vectors with insert_subvector instead of concat_vectors to prevent an infinite combine loop. And the reasoning behind keeping it as a combine was because if we emitted the insert_subvector during lowering then we didn't fold away inserts of undef subvectors. However it turns out we can avoid this if we just do this in lowering and select a concat_vector directly, since we get the undef folding for free with `DAG.getNode(ISD::CONCAT_VECTOR, ...)` via foldCONCAT_VECTORS.	2024-03-22 07:08:51 +08:00
Craig Topper	f5c90f3000	[RISCV] Use BuildPairF64 and SplitF64 for bitcast i64<->f64 on rv32 regardless of Zfa. (#85982 ) Previously we used BuildPairF64 and SplitF64 only if Zfa was supported since they will select register file moves that are only available with Zfa. We recently changed the handling of BuildPairF64/SplitF64 for Zdinx to not go through memory so we should use that for bitcast. That leaves the D without Zfa case that does need to go through memory. Previously we let type legalization expand to loads and stores using a new stack temporary created for each bitcast. After this patch we will create the loads and stores in the custom inserter and share the same stack slot for all. This also allows DAGCombiner to optimize when bitcast is mixed with BuildPairF64/SplitF64.	2024-03-21 08:52:51 -07:00
Craig Topper	d42992e71c	[RISCV] Cleanup setOperationAction for ISD::BITCAST with Zfa and D extension. NFC We only need Custom handling for i64 on RV32. This will be used by type legalization. We don't need to make it custom for f64 to get type legalization to custom split i64. If f64 and i64 are legal types, then ISD::BITCAST should be legal.	2024-03-20 10:43:54 -07:00
Craig Topper	576d81baa5	[RISCV] Use REG_SEQUENCE/EXTRACT_SUBREG to move between individual GPRs and GPRPair. (#85887 ) Previously we used memory like we do to move between GPRs and FPR64 with the D extension on RV32. We can instead use REG_SEQUENCE/EXTRACT_SUBREG to inform register allocation how to do the copy without memory.	2024-03-20 08:44:24 -07:00
Jiahan Xie	4bf06bebb9	[GISEL][RISCV] IRTranslator for scalable vector load (#80006 ) Add IRTranslator for scalable vector load instruction and include corresponding tests with alignment argument included, which can be smaller/equal/larger than element size or smaller/equal/larger than the minimum total vector size.	2024-03-19 20:12:26 -04:00
Luke Lau	ef520ca6b1	Revert "[RISCV] Recursively split concat_vector into smaller LMULs (#83035 )" This reverts commit c59129a7c79448837d665de8f2743ad4b14666f6. This causes regressions in some x264 workloads like pixel_var_8x8 due to it interfering with the strided load combine. Reverting so I can try to rework it as a lowering instead.	2024-03-19 20:59:03 +08:00
Craig Topper	ffa2810f7b	[RISCV] Optimize lowering of VECREDUCE_FMINIMUM/VECREDUCE_FMAXIMUM. (#85165 ) Use a normal min/max reduction that doesn't propagate nans and force the result to nan at the end if any elements were nan.	2024-03-14 12:51:29 -07:00
Kolya Panchenko	aa68e2814d	[RISCV] Support `llvm.masked.compressstore` intrinsic (#83457 ) The changeset enables lowering of `llvm.masked.compressstore(%data, %ptr, %mask)` for RVV for fixed vector types into: ``` %0 = vcompress %data, %mask, %vl %new_vl = vcpop %mask, %vl vse %0, %ptr, %1, %new_vl ``` Such lowering is only possible when `%data` fits into the available LMULs; otherwise `llvm.masked.compressstore` is scalarized by the `ScalarizeMaskedMemIntrin` pass. Even though the RVV spec provides an alternative sequence for compressstore in section `15.8`, `vcompress + vcpop` should be the proper canonical form to lower `llvm.masked.compressstore`. If the RISC-V target finds the sequence from `15.8` better, a peephole optimization can transform `vcompress + vcpop` into that sequence.	2024-03-13 15:18:51 -04:00
Luke Lau	0ef61ed54d	[RISCV] Move NodeExtensionHelper assert to getOrCreateExtendedOp. NFC Move the narrow types assert from the ZERO_EXTEND/SIGN_EXTEND case in fillUpExtensionSupport to getOrCreateExtendedOp so we check the other nodes too.	2024-03-11 18:00:29 +08:00
Luke Lau	58dd59a282	[RISCV] Don't run combineBinOp_VLToVWBinOp_VL until after legalize types. NFCI (#84125 ) I noticed this from a discrepancy in fillUpExtensionSupport between how we apparently need to check for legal types for ISD::{ZERO,SIGN}_EXTEND, but we don't need to for RISCVISD::V{Z,S}EXT_VL. Prior to #72340, combineBinOp_VLToVWBinOp_VL only ran after type legalization because it only operated on _VL nodes. _VL nodes are only emitted during op legalization, which takes place after type legalization, which is presumably why the existing code didn't need to check for legal types. After #72340 we now handle generic ops like ISD::ADD that exist before op legalization and thus before type legalization. This meant that we needed to add extra checks that the narrow type was legal in #76785. I think the easiest thing to do here is to just maintain the invariant that the types are legal and only run the combine after type legalization.	2024-03-11 17:43:02 +08:00
Craig Topper	d8d2dea7fc	[RISCV] Handle FP riscv_masked_strided_load with 0 stride. (#84576 ) Previously, we tried to create an integer extending load. We need to create a non-extending FP load instead. Fixes #84541.	2024-03-10 21:22:37 -07:00
Craig Topper	909ab0e0d1	[RISCV] Insert a freeze before converting select to AND/OR. (#84232 ) Select blocks poison, but AND/OR do not. We need to insert a freeze to block poison propagation. This creates suboptimal codegen which I will try to fix with other patches. I'm prioritizing the correctness fix since we have 2 bug reports. Fixes #84200 and #84350	2024-03-07 15:03:51 -08:00
Michael Maitland	96049fcf4e	[GISEL] Add IRTranslation for shufflevector on scalable vector types (#80378 ) Recommits llvm/llvm-project#80378 which was reverted in llvm/llvm-project#84330. The problem was that the change in llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir used 217 as an opcode instead of a regex.	2024-03-07 09:10:03 -08:00
Michael Maitland	552da24843	Revert "[GISEL] Add IRTranslation for shufflevector on scalable vector types" (#84330 ) Reverts llvm/llvm-project#80378 causing Buildbot failures that did not show up with check-llvm or CI.	2024-03-07 10:16:31 -05:00
Michael Maitland	2b8aaef09e	[GISEL] Add IRTranslation for shufflevector on scalable vector types (#80378 ) This patch is stacked on https://github.com/llvm/llvm-project/pull/80372, https://github.com/llvm/llvm-project/pull/80307, and https://github.com/llvm/llvm-project/pull/80306. ShuffleVector on scalable vector types gets IRTranslate'd to G_SPLAT_VECTOR since a ShuffleVector that operates on scalable vectors is a splat vector where the value of the splat vector is the 0th element of the first operand, because the index mask operand is the zeroinitializer (undef and poison are treated as zeroinitializer here). This is analogous to what happens in SelectionDAG for ShuffleVector. `buildSplatVector` is renamed to `buildBuildVectorSplatVector`. I did not make this a separate patch because it would cause problems to revert that change without reverting this change too.	2024-03-07 09:50:29 -05:00
Luke Lau	c59129a7c7	[RISCV] Recursively split concat_vector into smaller LMULs (#83035 ) This is the concat_vector equivalent of #81312, in that we recursively split concat_vectors with more than two operands into smaller concat_vectors. This allows us to break up the chain of vslideups, as well as perform the vslideups at a smaller LMUL, which in turn reduces register pressure as the previous lowering performed N vslideups at the highest result LMUL. For now, it stops splitting past MF2. This is done as a DAG combine so that any undef operands are combined away: If we do this during lowering then we end up with unnecessary vslideups of undefs.	2024-03-07 16:50:26 +08:00
Luke Lau	0207270494	[RISCV] Don't remove extends for i1 indices in mgather/mscatter (#83951 )	2024-03-06 08:52:27 +08:00
Craig Topper	58d8805ff9	[RISCV] Always use signed APSInt in getExactInteger. (#84070 ) We were setting based on whether the FP value is positive/negative, but we really want to know whether the resulting integer will be treated as a signed or unsigned value. Since we use SINT_TO_FP to convert the integer to FP, we should always use signed here. Without this we convert +2147483648.0 to an integer 0x80000000 and convert it using sint_to_fp which produces -2147483648.0.	2024-03-05 14:36:37 -08:00
Craig Topper	4dd9c2ed32	[RISCV] Use NewVL in splatPartsI64WithVL. (#83690 ) In 7b5cf52f32c09, I added this NewVL and checked that it had been set, but I didn't use it for the VL of the splat.	2024-03-02 17:08:48 -08:00
Yeting Kuo	14d8c4563e	[RISCV] Add more intrinsics into canSplatOperand. (#83106 ) This patch adds smin/smax/umin/umax/sadd_sat/ssub_sat/uadd_sat/usub_sat into canSplatOperand. It can help llvm fold vv instructions with one splat operand to vx instructions.	2024-02-29 12:57:34 +08:00
Luke Lau	9617da88ab	[RISCV] Use a ta vslideup if inserting over end of InterSubVT (#83230 ) The description in #83146 is slightly inaccurate: it relaxes a tail undisturbed vslideup to tail agnostic if we are inserting over the entire tail of the vector and we didn't shrink the LMUL of the vector being inserted into. This handles the case where we did shrink down the LMUL via InterSubVT by checking if we inserted over the entire tail of InterSubVT, the actual type that we're performing the vslideup on, not VecVT.	2024-02-28 15:58:55 +08:00
Luke Lau	91d23370cd	[RISCV] Use a tail agnostic vslideup if possible for scalable insert_subvector (#83146 ) If we know that an insert_subvector inserting a fixed subvector will overwrite the entire tail of the vector, we use a tail agnostic vslideup. This was added in https://reviews.llvm.org/D147347, but we can do the same thing for scalable vectors too. The `Policy` variable is defined in a slightly weird place but this is to mirror the fixed length subvector code path as closely as possible. I think we may be able to deduplicate them in future.	2024-02-28 10:26:54 +08:00
Luke Lau	2e564840e0	[RISCV] Use getVectorIdxConstant in RISCVISelLowering.cpp. NFC (#83019 ) We use getVectorIdxConstant() in some places and getConstant(XLenVT) or getIntPtrConstant() in others, but getVectorIdxTy() == getPointerTy() == XLenVT. This refactors RISCVISelLowering to use the former for nodes that use getVectorIdxTy(), i.e. INSERT_SUBVECTOR, EXTRACT_SUBVECTOR, INSERT_VECTOR_ELT and EXTRACT_VECTOR_ELT, so that we're consistent.	2024-02-27 10:34:04 +08:00
Yeting Kuo	e510fc7753	[VP][RISCV] Introduce vp.lrint/llrint and RISC-V support. (#82627 ) RISC-V implements vector lrint/llrint by vfcvt.x.f.v.	2024-02-26 16:37:41 +08:00
Yeting Kuo	850dde063b	[RISCV][VP] Introduce vp saturating addition/subtraction and RISC-V support. (#82370 ) This patch also moves the MatchContext framework from DAGCombiner into an individual header file so that the framework can be used from other files in llvm/lib/CodeGen/SelectionDAG/.	2024-02-23 14:17:15 +08:00
Luke Lau	2d50703ddd	[RISCV] Use RISCVSubtarget::getRealVLen() in more places. NFC Catching a couple more places where we can use the new query added in 8603a7b2.	2024-02-23 12:47:28 +08:00
Craig Topper	de41eae41f	[SelectionDAG][RISCV] Use FP type for legality query for LRINT/LLRINT in LegalizeVectorOps. (#82728 ) This matches how LRINT/LLRINT is queried for scalar types in LegalizeDAG. It's confusing if they do different things since a "Legal" vector LRINT/LLRINT would get through to LegalizeDAG which would then consider it illegal. This doesn't happen currently because RISC-V uses Custom.	2024-02-22 20:18:52 -08:00
Philip Reames	ac518c7c99	[RISCV] Vector sub (zext, zext) -> sext (sub (zext, zext)) (#82455 ) This is legal as long as the inner zext retains at least one bit of increase so that the sub overflow case (0 - UINT_MAX) can be represented. Alive2 proof: https://alive2.llvm.org/ce/z/BKeV3W For RVV, restrict this to power of two sizes with the operation type being at least e8 to stick to legal extends. We could arguably handle i1 source types with some care if we wanted to. This is likely profitable because it may allow us to perform the sub instruction in a narrow LMUL (equivalently, in fewer DLEN-sized pieces) before widening for the user. We could arguably avoid narrowing below DLEN, but the transform should at worst introduce one extra extend and one extra vsetvli toggle if the source could previously be handled via loads explicit w/EEW.	2024-02-22 16:17:48 -08:00
Yingwei Zheng	0107c8824b	[RISCV][SDAG] Improve codegen of select with constants if zicond is available (#82456 ) This patch uses `add + czero.eqz/nez` to lower select with constants if zicond is available. ``` (select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1) (select c, c1, c2) -> (add (czero_eqz c1 - c2, c), c2) ``` The above code sequence is suggested by [RISCV Optimization Guide](https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html#_avoid_branches_using_conditional_moves).	2024-02-23 00:18:56 +08:00
Luke Lau	edd4aee4dd	[RISCV] Compute integers once in isSimpleVIDSequence. NFCI (#82590 ) We need to iterate through the integers twice in isSimpleVIDSequence, so instead of computing them twice just compute them once at the start. This also replaces the individual checks that each element is constant with a single call to BuildVectorSDNode::isConstant.	2024-02-22 15:57:57 +08:00
Luke Lau	815644b4dd	[RISCV] Fix mgather -> riscv.masked.strided.load combine not extending indices (#82506 ) This fixes the miscompile reported in #82430 by telling isSimpleVIDSequence to sign extend to XLen instead of the width of the indices, since the "sequence" of indices generated by a strided load will be at XLen. This was the simplest way I could think of getting isSimpleVIDSequence to treat the indexes as if they were zero extended to XLenVT. Another way we could do this is by refactoring out the "get constant integers" part from isSimpleVIDSequence and handle them as APInts so we can separately zero extend it. Fixes #82430	2024-02-22 11:50:27 +08:00
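
Note on the narrowing combines logged above (#82455 for sub, #86248 for add, #86465 for mul): the messages argue that an operation on two zero-extended N-bit values can be carried out at 2*N bits and then extended, with sext for sub and zext for add and mul. The following is only a brute-force sanity sketch of that arithmetic, not code from LLVM. The widths (N = 8, a 32-bit wide type) and all helper names are assumptions chosen for illustration.

```python
# Brute-force check of the narrowing identities described in the log
# entries for #82455 (sub), #86248 (add) and #86465 (mul).
# Assumptions for illustration: N = 8 narrow bits, 2*N = 16 intermediate
# bits, and a 32-bit wide type. None of these names come from LLVM.

N = 8          # width of the original narrow elements
MID = 2 * N    # width kept by the inner zext
WIDE = 32      # width the user of the result expects


def trunc(x, bits):
    """Keep the low `bits` bits, i.e. wrap like a fixed-width register."""
    return x & ((1 << bits) - 1)


def zext(x, from_bits, to_bits):
    """Zero extend: the value is unchanged, only the width grows."""
    return trunc(x, from_bits)


def sext(x, from_bits, to_bits):
    """Sign extend a `from_bits`-wide value to `to_bits` bits."""
    sign = 1 << (from_bits - 1)
    return trunc((trunc(x, from_bits) ^ sign) - sign, to_bits)


def narrowing_is_exact():
    """Compare every narrowed result against the directly widened one."""
    for a in range(1 << N):
        for b in range(1 << N):
            # add: zext of the MID-bit sum equals the WIDE-bit sum.
            if zext(trunc(a + b, MID), MID, WIDE) != trunc(a + b, WIDE):
                return False
            # sub: the result may be negative, so the outer extend is sext.
            if sext(trunc(a - b, MID), MID, WIDE) != trunc(a - b, WIDE):
                return False
            # mul: the product can use all 2*N bits, so it must be zext.
            if zext(trunc(a * b, MID), MID, WIDE) != trunc(a * b, WIDE):
                return False
    return True
```

Calling `narrowing_is_exact()` walks all 65536 operand pairs. The mul case is the one that rules out sext as the outer extend: 255 * 255 sets the top bit of the 16-bit intermediate, so sign extending it would give a different 32-bit value than the direct multiply.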


1512 Commits