- [AArch64]: TargetLowering is updated to spot load/store
(de)interleave4-like sequences using PatternMatch, and emit equivalent
sve.ld4 and sve.st4 intrinsics.
It doesn't matter which extend we use to promote the operands. Use
whatever is the most efficient.
The custom handler for RISC-V used SIGN_EXTEND when the Zbb
extension is enabled, so we no longer need that.
This has received no development work in a while and is slowly bit
rotting as new extensions are added.
At the moment, I don't think this is viable without adding a new
invariant that 32-bit values are always in sign-extended form, like
Mips64 does. We are very dependent on computeKnownBits and
ComputeNumSignBits in SelectionDAG to remove sign extends created for
ABI reasons. If we can't propagate sign bit information through 64-bit
values in SelectionDAG, we can't effectively clean up those extends.
We already had a DAG combine for (sra (sext_inreg (shl X, C1), i32), C2)
-> (sra (shl X, C1+32), C2+32) that we used for RV64. This patch
generalizes it to other sext_inregs for both RV32 and RV64.
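For reference, here's a small standalone check of the underlying
identity, assuming C1 and C2 are both below 32 so the adjusted shift
amounts stay in range (an illustration of the algebra, not the
combine's exact legality condition):

```cpp
// before: (sra (sext_inreg (shl X, C1), i32), C2)
// after:  (sra (shl X, C1+32), C2+32)
#include <cassert>
#include <cstdint>

int64_t before(int64_t X, unsigned C1, unsigned C2) {
  int64_t Shl = (int64_t)((uint64_t)X << C1); // shl X, C1
  int64_t Sext = (int64_t)(int32_t)Shl;       // sext_inreg ..., i32
  return Sext >> C2;                          // sra ..., C2 (arithmetic)
}

int64_t after(int64_t X, unsigned C1, unsigned C2) {
  return (int64_t)((uint64_t)X << (C1 + 32)) >> (C2 + 32);
}

int main() {
  for (int64_t X : {0LL, 1LL, -1LL, 0x12345678LL, -0x7ffff000LL})
    for (unsigned C1 = 0; C1 < 32; ++C1)
      for (unsigned C2 = 0; C2 < 32; ++C2)
        assert(before(X, C1, C2) == after(X, C1, C2));
}
```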
Fixes #101040.
Previously, we created a vsetvlimax intrinsic. Using X0 simplifies the
code and enables some optimizations to kick in when the exact value of
vlmax is known.
None of our addressing modes support a scalable offset. I could not
figure out how to get LSR to actually try such a formula, but let's
be defensive and explicitly prevent this case from being considered
a valid address mode match.
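The gist of the defensive check is something like the sketch below;
AddrModeParts stands in for LLVM's addressing-mode description here,
and the field names are my assumption rather than the literal code of
the patch:

```cpp
#include <cstdint>

// Stand-in for the addressing-mode query's description of a candidate
// formula (base register, fixed offset, vscale-scaled offset, scale).
struct AddrModeParts {
  int64_t BaseOffs = 0;       // fixed byte offset
  int64_t ScalableOffset = 0; // offset of the form vscale * ScalableOffset
  bool HasBaseReg = false;
  int64_t Scale = 0;
};

bool isLegalRVAddressingMode(const AddrModeParts &AM) {
  // No RISC-V addressing mode can encode a vscale-scaled offset, so never
  // accept such a formula as a valid address-mode match.
  if (AM.ScalableOffset != 0)
    return false;
  // ... the existing reg + signed 12-bit immediate checks stay as-is ...
  return AM.BaseOffs >= -2048 && AM.BaseOffs <= 2047;
}
```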
The tablegen patterns all have isRV32. I did not check if any of them
could naively support RV64.
Fixes #101067 and probably other bugs like it that we haven't found yet.
At zvl1024b, we may have legal fixed-length vectors where a vid.v would
overflow at i8, e.g. <512 x i8>.
When lowering constant build_vectors, isSimpleVIDSequence used uint64_t
to model the vid.v sequence, which meant it didn't account for the fact
that it could overflow in these larger types.
This patch fixes it by modelling the sequence with an SEW-wide APInt, so
if it does overflow, the loop that checks/calculates the addend will
detect it and bail.
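A standalone illustration of the difference (not the code from the
patch): an APInt sized to the SEW reports the wrap that a uint64_t
model silently ignores.

```cpp
#include "llvm/ADT/APInt.h"
#include <cassert>
using namespace llvm;

int main() {
  // Modelling a vid.v sequence for <512 x i8>: the SEW-wide APInt notices
  // the wrap past 255, while a uint64_t accumulator never would.
  unsigned SEW = 8;
  APInt Elt(SEW, 0);
  APInt Step(SEW, 1);
  bool Overflowed = false;
  for (unsigned I = 0; I < 512 && !Overflowed; ++I)
    Elt = Elt.uadd_ov(Step, Overflowed); // flags the wrap, so we can bail

  assert(Overflowed && "i8 cannot hold all 512 indices");

  uint64_t Wide = 0;
  for (unsigned I = 0; I < 512; ++I)
    Wide += 1; // silently exceeds the 8-bit range the hardware will use
  assert(Wide == 512);
}
```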
Fixes #99729
This patch folds `fmul X, (fcopysign 1.0, Y)` into `fsgnjx X, Y`. This
pattern exists in some graphics applications/math libraries.
Alive2: https://alive2.llvm.org/ce/z/epyL33
Since fpimm +1.0 is lowered to a load from the constant pool after
OpLegalization, I have to introduce a new RISCVISD node FSGNJX and fold
this pattern in DAGCombine.
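The scalar identity being used is just that multiplying by
copysign(1.0, Y) only ever flips X's sign according to Y's sign, i.e. a
sign-injection XOR; a quick standalone illustration (ignoring the sNaN
corner cases the Alive2 proof addresses):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Model of fsgnjx: the result is x with its sign bit XORed with y's sign bit.
double fsgnjx(double x, double y) {
  uint64_t xb, yb;
  std::memcpy(&xb, &x, 8);
  std::memcpy(&yb, &y, 8);
  uint64_t rb = xb ^ (yb & 0x8000000000000000ULL);
  double r;
  std::memcpy(&r, &rb, 8);
  return r;
}

int main() {
  for (double x : {1.5, -2.25, 0.0, -0.0})
    for (double y : {3.0, -7.5, 0.0, -0.0}) {
      double mul = x * std::copysign(1.0, y);
      assert(fsgnjx(x, y) == mul &&
             std::signbit(fsgnjx(x, y)) == std::signbit(mul));
    }
}
```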
Closes https://github.com/dtcxzyw/llvm-opt-benchmark/issues/1072.
We can teach RISCVVectorPeephole to detect when an AVL is equal to the
VLMAX when the exact VLEN is known and use the VLMAX sentinel instead,
and in doing so remove the need for getVLOp in RISCVISelLowering. This
keeps all the VLMAX logic in one place.
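Roughly speaking (the names here are illustrative, not the peephole's
actual code): once the exact VLEN is known, VLMAX for a given SEW/LMUL
is a compile-time constant, so an immediate AVL equal to it can be
replaced with the VLMAX sentinel.

```cpp
#include <cassert>

// VLMAX = (VLEN / SEW) * LMUL, with LMUL expressed as a fraction.
unsigned computeVLMax(unsigned VLenBits, unsigned SEW, unsigned LMulNumer,
                      unsigned LMulDenom) {
  return VLenBits / SEW * LMulNumer / LMulDenom;
}

int main() {
  // With VLEN=128: SEW=32, LMUL=2 gives VLMAX=8, so an AVL of exactly 8
  // can use the X0/VLMAX encoding instead of a register or immediate.
  assert(computeVLMax(128, 32, 2, 1) == 8);
  // Fractional LMUL works the same way: SEW=64, LMUL=1/2 -> VLMAX=1.
  assert(computeVLMax(128, 64, 1, 2) == 1);
}
```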
RISCVGatherScatterLowering is the last user of
riscv_masked_strided_{load,store} after #98131 and #98112, so this patch
changes it to emit the VP equivalent instead. This allows us to remove
the masked_strided intrinsics so we only have one lowering path.
riscv_masked_strided_{load,store} didn't have AVL operands and were
always VLMAX, so this passes in the fixed or scalable element count to
the EVL instead, which RISCVVectorPeephole should now convert to VLMAX
after #97800.
For loads we also use a vp_select to get passthru (mask undisturbed)
behaviour.
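Very roughly, the replacement looks like the sketch below (my
reconstruction, not the patch's literal code): the EVL is the fixed or
scalable element count, and a vp.select on the same mask and EVL
restores the passthru value in the masked-off lanes.

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Sketch: emit llvm.experimental.vp.strided.load with EVL = element count,
// then blend the passthru back in with llvm.vp.select.
Value *emitStridedLoad(IRBuilder<> &B, VectorType *VecTy, Value *Ptr,
                       Value *Stride, Value *Mask, Value *Passthru) {
  // For fixed vectors this is a constant; for scalable vectors it is
  // vscale * MinNumElts, which RISCVVectorPeephole can turn into VLMAX.
  Value *EVL = B.CreateElementCount(B.getInt32Ty(), VecTy->getElementCount());
  Value *Load = B.CreateIntrinsic(
      Intrinsic::experimental_vp_strided_load,
      {VecTy, Ptr->getType(), Stride->getType()}, {Ptr, Stride, Mask, EVL});
  // Masked-off lanes of the VP load are poison, so select the passthru back
  // in to recover the old intrinsic's mask-undisturbed behaviour.
  return B.CreateIntrinsic(Intrinsic::vp_select, {VecTy},
                           {Mask, Load, Passthru, EVL});
}
```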
These new opcodes drop the shift amount, rounding mode, and passthru,
making them exactly like TRUNCATE_VECTOR_VL. The shift amount, rounding
mode, and passthru are added in isel patterns, similar to how we
translate TRUNCATE_VECTOR_VL to vnsrl with a shift of 0.
This should simplify #99418 a little.
With correct test update.
Original message:
We were only checking that the node from the worklist is a supported
root. We weren't checking the strategy or any of its operands unless it
was the original node. For any other node, we just rechecked the
original node's strategy and operands.
The effect of this is that we don't do all of the transformations at
once. Instead, when there were multiple possible nodes to transform we
would only do them as each node was visited by the main DAG combine
worklist.
The test shows a case where we widened an instruction without removing
all of the uses of the vsext. The sext is shared by one node that shares
another sext node with the root and another node that doesn't share
anything with the root.
An nxvXi64 VMV_V_X_VL on RV32 sign extends its 32-bit input to 64 bits.
If that input is positive, the sign extend can also be considered a
zero extend.
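This is the usual observation that sign- and zero-extension agree
whenever the sign bit is known to be clear; a trivial standalone check:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // For non-negative 32-bit inputs, sext and zext to 64 bits are identical,
  // so the implicit sign extend of vmv.v.x can double as a zero extend.
  for (int32_t X : {0, 1, 42, 0x7fffffff}) {
    uint64_t Sext = (uint64_t)(int64_t)X;  // sign extend
    uint64_t Zext = (uint64_t)(uint32_t)X; // zero extend
    assert(Sext == Zext);
  }
  // With the sign bit set they differ, hence the positivity requirement.
  assert((uint64_t)(int64_t)INT32_MIN != (uint64_t)(uint32_t)INT32_MIN);
}
```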
Well, not quite that simple. We can tail-call memset since it returns
its first argument, but bzero doesn't do that, and therefore we can end
up miscompiling.
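Concretely, a made-up example of the difference: a caller returning the
destination pointer can legally tail-call memset because memset hands
the pointer back, but the same rewrite with bzero loses the return
value.

```cpp
#include <cstddef>
#include <cstring>
#include <strings.h>

void *zero_ok(void *p, size_t n) {
  // Fine as a tail call: memset returns its first argument, so the
  // caller's return value is still correct.
  return memset(p, 0, n);
}

void *zero_bad(void *p, size_t n) {
  // Must NOT become a tail call: bzero returns void, so turning this into
  // "jump to bzero" would drop the "return p" below.
  bzero(p, n);
  return p;
}
```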
This patch also refactors the logic out of isInTailCallPosition() into
the callers. As a result, memcpy and memmove are also modified to do the
same thing for consistency.
rdar://131419786
Like #93321, this patch also tries to resolve the conflicting use of x7
between fastcc and Zicfilp, but this patch removes x7 from fastcc
directly. Its purpose is to reduce the code complexity of #93321, and we
also found that it increases instruction count by at most 0.02% on most
benchmarks and might even benefit some of them.
The _VL nodes are only used with scalable vectors so we don't need
to check that.
It doesn't matter if Zvfhmin is enabled. All that really matters is
whether Zvfh is.
Doing so allows one side to fold entirely into the mask applied to the
other recursive call (or a vmerge.vv at worst). This is a generalization
of the existing IsSelect case (both operands are selects), so I removed
that code in the process.
This actually started as an attempt to remove the IsSelect bit, as I'd
thought it was fully redundant with the recursive formulation, but
digging into test deltas revealed that we depended on that to catch the
majority of the identity cases, and that in turn we were missing some
cases where only RHS was an identity.
Summary:
These Libcalls represent which functions are available to the backend.
If a runtime call is not available, the target sets the name to
`nullptr`. Currently, this logic is spread around the various targets.
This patch pulls all of the locations that disable libcalls into the
initializer. This patch is effectively NFC.
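For context, this is the kind of scattered per-target code being
consolidated; an illustrative fragment from a hypothetical target's
TargetLowering constructor (the RTLIB entries are examples, not the
exact set touched by the patch):

```cpp
// Mark runtime calls the target cannot emit as unavailable by clearing
// their names; the patch moves decisions like these into the
// RuntimeLibcalls initializer instead of each target's constructor.
setLibcallName(RTLIB::MULO_I128, nullptr); // e.g. no __muloti4 helper
setLibcallName(RTLIB::SHL_I128, nullptr);  // e.g. no 128-bit shift helper
```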
The motivation behind this patch is that currently the LTO handling uses
the list of all runtime calls to determine which functions cannot be
internalized and must be extracted from static libraries. We do not want
this to happen for libcalls that are not emitted by the backend. A
follow-up patch will move out this logic so the LTO pass can know which
rtlib calls are actually used by the backend.
This combine is a duplication of the transform in
RISCVGatherScatterLowering but at the SelectionDAG level, so similarly
to #98111 we can replace the use of riscv_masked_strided_load with a VP
strided load.
Unlike #98111 we don't require #97800 or #97798 since it only operates
on fixed vectors with a non-zero stride.
Marking SPLAT_VECTOR as Custom enables generic DAGCombine to turn
BUILD_VECTOR into SPLAT_VECTOR. We need to custom type legalize BUILD_VECTOR
without Zfhmin since we don't have the scalar f16 type. If we allow
SPLAT_VECTOR to be formed, we'll need to custom type legalize it too.
The easiest fix is to only enable SPLAT_VECTOR with Zvfhmin+Zfhmin.
There's still an issue that we need to properly support BUILD_VECTOR
with Zvfhmin+Zfhmin.
Should fix the new case reported in #97849.
I've also changed the predicates to Zfhmin instead of ZfhminOrZhinxmin
since Zhinx isn't compatible with Zvfhmin.
In 03d4332, we extended build_vector lowering to pack elements into the
largest size which doesn't exceed either ELEN or XLEN. The zbkb
extension - ratified under scalar crypto, but otherwise not really
connected to crypto per se - adds the packh, packw, and pack
instructions. These instructions are designed for exactly this pairwise
packing.
I ended up choosing to directly lower to machine nodes. A combination of
the slightly non-uniform semantics of these instructions (packw *sign*
extends the result, whereas packh *zero* extends it) and our generic
dag canonicalization (which sinks shl through or nodes) makes pattern
matching these tricky and not particularly robust. Another alternative
was to have an ISD node for them, but that didn't seem to add much in
practice.
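For reference, the pairwise packing these instructions perform, written
out as plain scalar code (semantics as I read them from the Zbkb spec;
RV64 register width assumed):

```cpp
#include <cassert>
#include <cstdint>

// packh: concatenate the low bytes of rs1 (low) and rs2 (high), zero-extended.
uint64_t packh(uint64_t rs1, uint64_t rs2) {
  return (rs1 & 0xff) | ((rs2 & 0xff) << 8);
}

// packw (RV64): concatenate the low 16 bits of rs1 and rs2 into a 32-bit
// value, then *sign*-extend it to 64 bits.
uint64_t packw(uint64_t rs1, uint64_t rs2) {
  uint32_t Lo32 = (uint32_t)(rs1 & 0xffff) | ((uint32_t)(rs2 & 0xffff) << 16);
  return (uint64_t)(int64_t)(int32_t)Lo32;
}

// pack (RV64): concatenate the low 32 bits of rs1 and rs2.
uint64_t pack(uint64_t rs1, uint64_t rs2) {
  return (rs1 & 0xffffffff) | ((rs2 & 0xffffffff) << 32);
}

int main() {
  assert(packh(0x11, 0xff22) == 0x2211);                  // zero extended
  assert(packw(0x1234, 0x8765) == 0xffffffff87651234ULL); // sign extended
  assert(pack(0x89abcdef, 0x01234567) == 0x0123456789abcdefULL);
}
```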
With the tag merging in place, we can safely make +seq-cst-trailing-fence
the default, according to the recommendation in
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-atomic.adoc
This patch changes the default for the feature flag and moves to more
consistent naming with respect to existing features.
This was reverted with https://github.com/llvm/llvm-project/pull/84597
because ld.bfd would segfault on unknown RISC-V attributes. Now that
attribute emission is guarded by a backend flag,
`--riscv-abi-attributes`, this should be safe to reland, since it won't
introduce ABI tags unless the user opts into them.
Our worst case build_vector lowering is a serial chain of vslide1down.vx
operations which creates a serial dependency chain through a relatively
high latency operation. We can instead pack together elements into
ELEN-sized chunks, and move them from the integer domain to the vector
domain in a single operation.
This reduces the length of the serial chain on the vector side, and
costs at most three scalar instructions per element. This is a win for
all cores when the sum of the latencies of the scalar instructions is
less than the vslide1down.vx being replaced, and is particularly
profitable for out-of-order cores which can overlap the scalar
computation.
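As an illustration of the scalar side of this (not the lowering
itself): packing four i16 elements into a single 64-bit chunk costs
roughly a zero-extend, a shift, and an or per element, and the whole
chunk then takes one vslide1down.vx instead of four.

```cpp
#include <cassert>
#include <cstdint>

uint64_t packChunk(int16_t E0, int16_t E1, int16_t E2, int16_t E3) {
  uint64_t Chunk = (uint16_t)E0;         // zext
  Chunk |= (uint64_t)(uint16_t)E1 << 16; // zext + shift + or
  Chunk |= (uint64_t)(uint16_t)E2 << 32; // zext + shift + or
  Chunk |= (uint64_t)(uint16_t)E3 << 48; // zext + shift + or
  return Chunk;
}

int main() {
  assert(packChunk(0x1111, 0x2222, 0x3333, -1) == 0xffff333322221111ULL);
}
```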
This patch is restricted to configurations with zba and zbb. Without
both, the zero extend might require two instructions, which would bring
the total scalar instructions per element to 4. zba and zbb are both
present in the rva22u64 baseline, which is looking to be quite common
for hardware in practice; we could extend this to systems without
bitmanip with a bit of extra effort.
If we don't have Zfhmin, we will call `SoftPromoteHalfOperand` on the
BUILD_VECTOR. This operation is not supported by the generic code.
Instead, custom lower to a vXi16 BUILD_VECTOR using bitcasts.
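The lowering is roughly along these lines (a sketch of the approach
described above, not the exact code): bitcast each f16 operand to i16,
build a vXi16 BUILD_VECTOR, and bitcast the result back.

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: without Zfhmin there is no legal scalar f16 to soft-promote, so
// build the vector in the integer domain and bitcast back at the end.
static SDValue lowerF16BuildVectorViaI16(SDValue Op, SelectionDAG &DAG) {
  SDLoc DL(Op);
  MVT VT = Op.getSimpleValueType();               // e.g. v4f16
  MVT IVT = VT.changeVectorElementType(MVT::i16); // v4i16
  SmallVector<SDValue, 8> Ops;
  for (SDValue Elt : Op->op_values())
    Ops.push_back(DAG.getBitcast(MVT::i16, Elt)); // f16 -> i16 per element
  SDValue IVec = DAG.getBuildVector(IVT, DL, Ops);
  return DAG.getBitcast(VT, IVec);                // v4i16 -> v4f16
}
```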
Fixes #97849.