llvm-project

Author	SHA1	Message	Date
Craig Topper	e64f5d6305	[RISCV] Replace RISCVISD::VP_MERGE_VL with a new node that has a separate passthru operand. (#75682 ) ISD::VP_MERGE treats the false operand as the source for elements past VL. The vmerge instruction encodes 3 registers and treats the vd register as the source for the tail. This patch adds a new ISD opcode that models the tail source explicitly. During lowering we copy the false operand to this operand. I think we can merge RISCVISD::VSELECT_VL with this new opcode by using an UNDEF passthru, but I'll save that for another patch.	2023-12-21 14:34:49 -08:00
Craig Topper	0dcff0db3a	[RISCV] Add codegen support for experimental.vp.splice (#74688 ) IR intrinsics were already defined, but no codegen support had been added. I extracted this code from our downstream. Some of it may have come from https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.	2023-12-21 08:38:32 -08:00
Yeting Kuo	9b561ca044	[RISCV] Make performFP_TO_INTCombine fold with ISD::FRINT. (#76020 ) Fold (fp_to_int (frint X)) to (fcvt X) without rounding mode.	2023-12-21 15:03:36 +08:00
Yeting Kuo	b7376c3196	[RISCV][NFC] Add comments and tests for frint case of performFP_TO_INT_SATCombine. (#76014 ) performFP_TO_INT_SATCombine could also serve pattern (fp_to_int_sat (frint X)).	2023-12-20 14:56:28 +08:00
Yeting Kuo	cdc0392669	[RISCV] Update implies for subtarget feature. (#75824 ) PR #75576 and #75735 update some implies in llvm/lib/Support/RISCVISAInfo.cpp, but both of them miss the subtarget feature part. This patch still preserves the predicates HasStdExtZfhOrZfhmin and HasStdExtZhinxOrZhinxmin, since they can make error messages more readable. (Users might not know that zfh implies zfhmin.)	2023-12-19 09:47:46 +08:00
Jie Fu	b6cce87110	[RISCV] Fix -Wbraced-scalar-init in RISCVISelLowering.cpp (NFC) llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:339:24: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init] 339 \| setOperationAction({ISD::ROTL}, XLenVT, Expand); \| ^~~~~~~~~~~ 1 error generated.	2023-12-17 19:59:42 +08:00
melonedo	3eaed9e6f5	[RISCV] Implement intrinsics for XCVbitmanip extension in CV32E40P (#74993 ) Implement XCVbitmanip intrinsics for CV32E40P according to the specification. This commit is part of a patch-set to upstream the vendor specific extensions of CV32E40P that need LLVM intrinsics to implement Clang builtins. Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill, @NandniJamnadas, @PaoloS02, @simonpcook, @xingmingjie. Spec: `05481cf0ef/specifications/corev-builtin-spec.md (listing-of-pulp-bit-manipulation-builtins-xcvbitmanip)`. Previously reviewed on Phabricator: https://reviews.llvm.org/D157510. Parallel GCC patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635795.html. Co-authored-by: melonedo <funanzeng@gmail.com>	2023-12-17 19:29:40 +08:00
Philip Reames	e8a15eca92	[RISCV] Prefer whole register loads and stores when VL=VLMAX (#75531 ) If we're lowering a fixed length vector load or store which happens to be exactly VLEN in size (when VLEN is exactly known), we can use a whole register load or store instead of the unit strided variants. This doesn't require a vsetvli in some cases, allows additional flexibility of vsetvli cases in others, and doesn't have a runtime dependency on the value of VL.	2023-12-15 09:26:57 -08:00
Philip Reames	632f1c5d18	[RISCV] When VLEN is exactly known, prefer VLMAX encoding for vsetvli (#75412 ) If we know the exact VLEN, then we can tell if the AVL for particular operation is equivalent to the vsetvli xN, zero, <vtype> encoding. Using this encoding is better than having to materialize an immediate in a register, but worse than being able to use the vsetivli zero, imm, <type> encoding.	2023-12-13 17:51:03 -08:00
Philip Reames	12af9c8337	[RISCV] Extract a utility for computing bounds on VLMAX [nfc] Simplifying an upcoming change...	2023-12-13 13:40:18 -08:00
Craig Topper	2c185709bc	[RISCV] Remove setJumpIsExpensive(). (#74647 ) Middle end optimizations can speculate away the short circuit behavior of C/C++ && and \|\|, using i1 and/or or logical select instructions and a single branch. SelectionDAGBuilder can turn i1 and/or/select back into multiple branches, but this is disabled when jump is expensive. RISC-V can use slt(u)(i) to evaluate a condition into any GPR which makes us better than other targets that use a flag register. RISC-V also has single instruction compare and branch. So it's not clear from a code size perspective that using compare+and/or is better. If the full condition is dependent on multiple loads, using logic delays the branch resolution until all the loads are resolved even if there is a cheap condition that makes the loads unnecessary. PowerPC and Lanai are the only CPU targets that use setJumpIsExpensive. NVPTX and AMDGPU also use it but they are GPU targets. PowerPC appears to have a MachineIR pass that turns AND/OR of CR bits into multiple branches. I don't know anything about Lanai and their reason for using setJumpIsExpensive. I think the decision to use logic vs branches is much more nuanced than this big hammer. So I propose to make RISC-V match other CPU targets. Anyone who wants the old behavior can still pass -mllvm -jump-is-expensive=true.	2023-12-13 09:37:25 -08:00
Craig Topper	8227072f5a	[RISCV] Add missing break to last case in switch. NFC	2023-12-12 13:52:52 -08:00
Craig Topper	3c5b42acd3	[RISCV] Allocate the varargs GPR save area as a single object. (#74354 ) Previously we allocated one object for each GPR. We also allocated the same offset twice, once to save for VASTART and then again for the first register in the save loop. This patch uses a single object for all the registers and shares this with VASTART. This is more consistent with other targets like AArch64 and ARM. I've removed the setValue(nullptr) from the memory operand now. Having a single object makes me a lot more comfortable about alias analysis being able to see what is going on. This led to the scheduling changes in push-pop-popret.ll and vararg.ll.	2023-12-05 10:30:01 -08:00
Craig Topper	b73d79fda8	[RISCV] Fix typo in comment. NFC This should say "Assume that VL output is <= 65536".	2023-12-04 14:15:49 -08:00
Craig Topper	47fe9fcaf2	[RISCV] Share ArgGPRs array between SelectionDAG and GISel. (#74152 ) This will allow us to isolate the EABI from D70401 to this new function.	2023-12-04 11:29:54 -08:00
Craig Topper	26fc26c184	[RISCV] Simplify computation of VarArgsSaveSize. NFC (#74209 ) The computation we use for computing the size already returns 0 when all registers are allocated. We don't need an if to set it to 0. Use the size being 0 to check for whether we need to spill registers or not. I have another change I want to make to this code, but this change seemed to stand on its own. I left the curly braces since I need them for the other change.	2023-12-03 20:35:12 -08:00
Philip Reames	e817966718	[RISCV] Collapse fast unaligned access into a single feature [nfc-ish] (#73971 ) When we'd originally added unaligned-scalar-mem and unaligned-vector-mem, they were separated into two parts under the theory that some processor might implement one, but not the other. At the moment, we don't have evidence of such a processor. The C/C++ level interface, and the clang driver command lines have settled on a single unaligned flag which indicates both scalar and vector support unaligned. Given that, let's remove the test matrix complexity for a set of configurations which don't appear useful. Given these are internal feature names, I don't think we need to provide any forward compatibility. Anyone disagree? Note: The immediate trigger for this patch was finding another case where the unaligned-vector-mem wasn't being properly serialized to IR from clang which resulted in problems reproducing assembly from clang's -emit-llvm feature. Instead of fixing this, I decided getting rid of the complexity was the better approach.	2023-12-01 11:00:59 -08:00
Philip Reames	ff5e536b5e	[RISCV] Add combines to form binop from tail insert idioms (#72675 ) This patch contains two related combines: 1) If we have a scalar vector insert into the result of a concat_vector, sink the insert into the operand of the concat. 2) If we have an insert of a scalar binop into a vector binop of the same opcode and the RHS of both are constant, perform the insert and then the binop. The common theme to both is pushing inserts closer to the sources of the computation graph. The goal is to enable forming vector bin ops from inserts of scalar binops at the end of another vector. For RISCV specifically, the concat_vector transform will push inserts to smaller vectors. This will have the effect of reducing lmul for the vslides, and usually doesn't require an additional vsetvli since the source vectors are already working in the narrower VL. I tried that one as a target independent combine first, and it doesn't appear profitable on all targets. This is only one approach to the problem. Another idea would be to aggressively form build_vectors and subvector inserts from the individual scalar inserts, and then have a transform which sunk a subvector_insert down through the concat. The advantage of the alternate approach is that we expose parallelism in the insert sequence, even if the source vector isn't a concat_vector. If reviewers are okay with it, I'd like to start with this approach, and then explore that direction in a follow up patch.	2023-11-30 07:32:42 -08:00
Craig Topper	e3021bdecd	[RISCV] Add RISCVISD::SLLW to computeKnownBitsForTargetNode. Found while investigating whether we still need to stop DAG combiner from turning (i64 (sext (i32 X))) into zext when i32 is known non negative. No test case because I still need to find fixes for some other issues before I can remove the code from DAGCombiner.	2023-11-29 16:21:43 -08:00
Yeting Kuo	f35c0f2f23	[RISCV] Refine pattern (select_cc seteq (and x, C), 0, 0, A) with Zbs. (#73746 ) PR #72978 disabled transformation (select_cc seteq (and x, C), 0, 0, A) -> (and (sra(shl x)), A) for better Zicond codegen. It still enables the combine when C does not fit into 12 bits. This patch disables the combine when Zbs is enabled.	2023-11-29 13:09:47 +08:00
Yeting Kuo	f73844d92b	[RISCV] Generate bexti for (select(setcc eq (and x, c))) where c is power of 2. (#73649 ) Currently, llvm can transform (setcc ne (and x, c)) to (bexti x, log2(c)) where c is a power of 2. This patch transforms (select (setcc ne (and x, c)), T, F) into (select (setcc eq (and x, c)), F, T). This benefits the case where c does not fit into 12 bits.	2023-11-29 11:56:48 +08:00
Philip Reames	02cbae4fe0	[RISCV] Work on subreg for insert_vector_elt when vlen is known (#72666 ) (#73680 ) If we have a constant index and a known vlen, then we can identify which register out of a register group is being accessed. Given this, we can reuse the (slightly generalized) existing handling for working on sub-register groups. This results in all constant index extracts with known vlen becoming m1 operations. One bit of weirdness to highlight and explain: the existing code uses the VL from the original vector type, not the inner vector type. This is correct because the inner register group must be smaller than the original (possibly fixed length) vector type. Overall, this seems to be a reasonable codegen tradeoff as it biases us towards immediate AVLs, which avoids needing the vsetvli form which clobbers a GPR for no real purpose. The downside is that for large fixed length vectors, we end up materializing an immediate in register for little value. We should probably generalize this idea and try to optimize the large fixed length vector case, but that can be done in separate work.	2023-11-28 10:45:22 -08:00
Philip Reames	f3a9dbe7fc	[RISCV] Split build_vector into vreg sized pieces when exact VLEN is known (#73606 ) If we have a high LMUL build_vector and a known exact VLEN, we can decompose the build_vector into one build_vector per register in the register group. Doing so requires exact knowledge of which elements correspond to each register in the register group, and thus an exact VLEN must be known. Since we no longer have operations which are linear (or worse) in LMUL, this also allows us to lower all build_vectors without resorting to going through the stack.	2023-11-28 07:39:58 -08:00
Philip Reames	a3ae7b660a	[RISCV] Minor style cleanup to cf17a24 [nfc] This was suggested in another related review, so backporting it to the existing code as well.	2023-11-28 07:27:30 -08:00
Philip Reames	cf17a24a4b	[RISCV] Use subreg extract for extract_vector_elt when vlen is known (#72666 ) This is the first in a planned patch series to teach our vector lowering how to exploit register boundaries in LMUL>1 types when VLEN is known to be an exact constant. This corresponds to code compiled by clang with the -mrvv-vector-bits=zvl option. For extract_vector_elt, if we have a constant index and a known vlen, then we can identify which register out of a register group is being accessed. Given this, we can do a sub-register extract for that register, and then shift any remaining index. This results in all constant index extracts becoming m1 operations, and thus eliminates the complexity concern for explode-vector idioms at high lmul.	2023-11-27 14:33:16 -08:00
Zi Xuan Wu (Zeson)	e89324219a	[RISCV] Don't combine store of vmv.x.s/vfmv.f.s to vp_store with VL of 1 when it's indexed store (#73219 ) Because we can't support vp_store with indexed address mode by lowering to vse intrinsic later.	2023-11-27 13:39:35 +08:00
Wang Pengcheng	5973272af7	[RISCV] Add MinimumJumpTableEntries to TuneInfo (#72963 ) This is like what AArch64 has done in #71166 except that we don't handle `HasMinSize` case now.	2023-11-23 14:05:23 +08:00
Min Hsu	e096732307	[RISCV][NFC] Rename `RISCVISD::FPCLASS` to `RISCVISD::FCLASS` To be consistent with `fclass.s/d`. Also rename `riscv_fpclass` to `riscv_fclass`. NFC.	2023-11-22 16:24:05 -08:00
Sander de Smalen	81b7f115fb	[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979 ) It seems TypeSize is currently broken in the sense that: TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8) without failing its assert that explicitly tests for this case: assert(LHS.Scalable == RHS.Scalable && ...); The reason this fails is that `Scalable` is a static method of class TypeSize, and LHS and RHS are both objects of class TypeSize. So this is evaluating if the pointer to the function Scalable == the pointer to the function Scalable, which is always true because LHS and RHS have the same class. This patch fixes the issue by renaming `TypeSize::Scalable` -> `TypeSize::getScalable`, as well as `TypeSize::Fixed` to `TypeSize::getFixed`, so that it no longer clashes with the variable in FixedOrScalableQuantity. The new methods now also better match the coding standard, which specifies that: * Variable names should be nouns (as they represent state) * Function names should be verb phrases (as they represent actions)	2023-11-22 08:52:53 +00:00
Zi Xuan Wu (Zeson)	06e733b198	[RISCV] Fix the order of arguments of setTruncStoreAction and setLoadExtAction (#73090 ) The first argument of setTruncStoreAction/setLoadExtAction should be Value VT instead of Memory VT.	2023-11-22 15:32:39 +08:00
Yeting Kuo	a756a6b97e	[TargetLowering][RISCV] Introduce shouldFoldSelectWithSingleBitTest and RISC-V implement. (#72978 ) DAGCombiner folds (select_cc seteq (and x, y), 0, 0, A) to (and (sra (shl x)) A) where y has a single bit set. Previously, DAGCombiner relied on `shouldAvoidTransformToShift` to decide when to do the combine, but `shouldAvoidTransformToShift` is only about shift cost. This patch introduces a specific hook to decide when to do the combine and disables the combine when Zicond is enabled and AndMask <= 1024.	2023-11-22 08:22:14 +08:00
Craig Topper	7a6fd49c8a	[RISCV] Use short forward branch for ISD::ABS. We can use short forward branch to conditionally negate if the value is negative.	2023-11-21 11:00:06 -08:00
Brandon Wu	2749f52ec4	[RISCV] Convert all floating point vector type operands to integer vector type (#69559 )	2023-11-21 23:19:10 +08:00
Liao Chunyu	9166cd2a71	[RISCV] DAG combine (mul (add x, 1), y) -> vmadd (#71495 ) vmadd: (mul (add x, 1), y) -> (add (mul x, y), y) (mul x, add (y, 1)) -> (add x, (mul x, y)) vnmsub: (mul (sub 1, x), y) -> (sub y, (mul x, y)) (mul x, (sub 1, y)) -> (sub x, (mul x, y)) Comparison with gcc: vmadd: https://gcc.godbolt.org/z/xjePx87Y7 vnmsub: https://gcc.godbolt.org/z/b17zG7nT1	2023-11-21 13:43:34 +08:00
Philip Reames	144b2f579e	[RISCV] Start vslide1down sequence with a dependency breaking splat (#72691 ) If we are using entirely vslide1downs to initialize an otherwise undef vector, we end up with an implicit_def as the source of the first vslide1down. This register has to be allocated, and creates false dependencies with surrounding code. Instead, start our sequence with a vmv.v.x in the hopes of creating a dependency breaking idiom. Unfortunately, it's not clear this will actually work: due to the VL=0 special case for T.A., the hardware has to work pretty hard to recognize that the vmv.v.x actually has no source dependence. I don't think we can reasonably expect all hardware to have optimized this case, but I also don't see any downside in preferring it.	2023-11-17 12:02:58 -08:00
Sacha Coppey	aeedc07637	[IR] Add GraalVM calling conventions Adds GraalVM calling conventions. The only difference with the default calling conventions is that GraalVM reserves two registers for the heap base and the thread. Since the registers are then accessed by name, getRegisterByName has to be updated accordingly. This patch implements the calling conventions only for X86, AArch64 and RISC-V. For X86, the reserved registers are X14 and X15. For AArch64, they are X27 and X28. For RISC-V, they are X23 and X27. This patch has been used by the LLVM backend of GraalVM's Native Image project in production for around 4 months with no major issues. Differential Revision: https://reviews.llvm.org/D151107	2023-11-17 16:30:09 +00:00
Nemanja Ivanovic	0765f6451f	[RISCV] Use correct register class for Z[df]inx inline asm (#71872 ) Allocate a register of the correct register class for inline asm constraint "r" when used for FP values with -Zfinx/-Zdinx. --------- Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>	2023-11-17 16:17:48 +01:00
Philip Reames	8f81c605f5	[RISCV] Remove custom instruction selection for VFCVT_RM and friends (#72540 ) We already have the pseudos for lowering these as MI nodes with rounding mode operands, and the generic FRM insertion pass. Doing the insertion later in the backend allows SSA level passes to avoid reasoning about physical register copies, and happens to produce better code in practice. The latter is mostly an accident of our insertion order; we happen to place the frm write after the vsetvli, and it's very common for a register to be killed at the vsetvli. End result is that we get slightly better scalar register allocation. I'm a bit unclear on the history here. I was surprised to find this code in ISEL lowering at all, but am also surprised once I found it that all the patterns and pseudos seem to already exist. My best guess is that maybe we didn't do all the possible cleanup after introducing the HasRoundMode mechanism?	2023-11-17 07:07:37 -08:00
Craig Topper	927f6f1858	[RISCV] Use bset+addi for (not (sll -1, X)). This is an alternative to #71420 that handles i32 on RV64 safely by pre-promoting the pattern in DAG combine.	2023-11-16 11:14:53 -08:00
Craig Topper	84044061e8	[RISCV] Make getDefaultVLOps call getDefaultScalableVLOps instead of the other way around. NFC Previously getDefaultScalableVLOps called getDefaultVLOps. getDefaultVLOps also handles fixed vectors so had to then check if it was fixed or scalable. Since getDefaultScalableVLOps know the type is scalable, it makes sense for it to contain the scalable case directly and have getDefaultVLOps call it for the scalable case.	2023-11-15 19:28:00 -08:00
Michael Maitland	725e599637	[RISCV][GISEL] Add support for scalable vector types in lowerReturnVal (#71587 ) Scalable vector types from LLVM IR are lowered into physical vector registers in MIR based on calling convention for return instructions.	2023-11-15 17:30:53 -05:00
Yingwei Zheng	650026897c	[RISCV][SDAG] Prefer ShortForwardBranch to lower sdiv by pow2 (#67364 ) This patch lowers `sdiv x, +/-2**k` to `add + select + shift` when the short forward branch optimization is enabled. The latter inst seq performs faster than the seq generated by target-independent DAGCombiner. This algorithm is described in Hacker's Delight. This patch also removes duplicate logic in the X86 and AArch64 backend. But we cannot do this for the PowerPC backend since it generates a special instruction `addze`.	2023-11-10 21:38:47 +08:00
Craig Topper	72f30acfed	[RISCV] Disable Zbs special case in performTRUNCATECombine with -riscv-experimental-rv64-legal-i32.	2023-11-09 10:19:28 -08:00
Craig Topper	679cc16c99	[RISCV] Disable early promotion for Zbs in performANDCombine with riscv-experimental-rv64-legal-i32 We can match this directly in isel with the i32 type being legal. The generic DAG combine will unpromote part of the pattern and prevent it from being matched in isel.	2023-11-09 09:51:31 -08:00
Wang Pengcheng	e179b125fb	[RISCV][NFC] Pass MCSubtargetInfo instead of FeatureBitset in RISCVMatInt (#71770 ) The use of `hasFeature` is more descriptive and the callers of `RISCVMatInt` have no need to call `getFeatureBits()` any more.	2023-11-09 15:15:23 +08:00
Jianjian Guan	d36eb79ccc	[RISCV] Support Strict FP arithmetic Op when only have Zvfhmin (#68867 ) Include: STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FSQRT and STRICT_FMA.	2023-11-09 09:55:48 +08:00
Craig Topper	90f768440d	[VP][RISCV] Add llvm.experimental.vp.reverse. (#70405 ) This is similar to vector.reverse, but only reverses the first EVL elements. I extracted this code from our downstream. Some of it may have come from https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.	2023-11-05 22:39:27 -08:00
Craig Topper	ae49bf5a7b	[RISCV] Fix stale comments in lowerShift*Parts. NFC This comment was not updated when we changed from xor back to sub.	2023-11-02 22:20:49 -07:00
Craig Topper	5570d3250f	[RISCV] Don't promote i32 and/or/xor with -riscv-experimental-rv64-legal-i32. Some test improvements, but also some regressions that need to be fixed.	2023-11-01 11:36:46 -07:00
Craig Topper	8912200966	[RISCV] Add experimental support for making i32 a legal type on RV64 in SelectionDAG. (#70357 ) This will select i32 operations directly to W instructions without custom nodes. Hopefully this can allow us to be less dependent on hasAllNBitUsers to recover i32 operations in RISCVISelDAGToDAG.cpp. This support is enabled with a command line option that is off by default. Generated code is still not optimal. I've duplicated many test cases for this, but it's not complete. Enabling this runs all existing lit tests without crashing.	2023-11-01 09:36:41 -07:00
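
For readers unfamiliar with the `add + select + shift` sequence mentioned in commit 650026897c, the following is a minimal scalar sketch of signed division by a power of two that rounds toward zero. It is an illustration only, not the code from the patch: the function names `sdiv_pow2` and `sdiv_neg_pow2` are hypothetical, the width is fixed at 32 bits, and it assumes `0 < k < 31` and that `>>` on a negative `int32_t` is an arithmetic shift (guaranteed since C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cstdint>

// Hypothetical helper: computes x / 2^k with C-style truncation toward zero.
// Assumes 0 < k < 31 and an arithmetic right shift for negative values.
int32_t sdiv_pow2(int32_t x, unsigned k) {
    // 2^k - 1: the amount that turns a floor-rounding shift into truncation.
    int32_t bias = static_cast<int32_t>((uint32_t{1} << k) - 1);
    // "add + select": only negative dividends receive the bias.
    int32_t adjusted = (x < 0) ? x + bias : x;
    // "shift": the arithmetic shift right performs the division.
    return adjusted >> k;
}

// Hypothetical helper: division by -2^k is the negated quotient.
int32_t sdiv_neg_pow2(int32_t x, unsigned k) {
    return -sdiv_pow2(x, k);
}
```

A plain arithmetic shift alone would round negative quotients toward negative infinity; adding `2^k - 1` to negative inputs first is what makes the result agree with `sdiv`.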


1378 Commits