llvm-project

Author	SHA1	Message	Date
Philip Reames	e9311f9c5a	[RISCV] Separate single source and dual source lowering code [nfc] The two single source cases aren't affected by the swap or select matching as those are dual operand specific. Similarly, a two source shuffle can't be a rotate. We can extend this idea for some of the shuffle types above, but some of them are validly either single or dual source. We don't want to lose that and the code complexity of versioning early and having to repeat some shuffle kinds doesn't (currently) seem worth it.	2024-01-24 09:16:50 -08:00
Philip Reames	fd817249f4	[RISCV] Sink code into using branch in shuffle lowering [nfc] Follow up to 396b6bbc, sink code into consuming branch, and fix one comment I realized used the misleading wording. (Permute is a specific sub-type of single source shuffle.)	2024-01-24 08:52:07 -08:00
Philip Reames	396b6bbc5e	[RISCV] Recurse on second operand of two operand shuffles (#79197 ) This builds on bdc41106ee48dce59c500c9a3957af947f30c8c3. This change completes the migration to a recursive shuffle lowering strategy where when we encounter an unknown two argument shuffle, we lower each operand as a single source permute, and then use a vselect (i.e. a vmerge) to combine the results. This relies for code quality on the post-isel combine which will aggressively fold that vmerge back into the materialization of the second operand if possible. Note: The change includes only the most immediately obvious of the stylistic cleanup. There's a bunch of code movement that this enables that I'll do as a separate patch as rolling it into this creates an unreadable diff.	2024-01-24 08:29:28 -08:00
Brandon Wu	33d804c6c2	[RISCV] Allow VCIX with SE to reorder (#77049 ) This patch allows VCIX instructions that have side effects to be reordered with memory and other side-effecting instructions. However, we don't want VCIX instructions to be reordered with each other, so we propose a dummy register called VCIX_STATE and make these instructions implicitly define and use it.	2024-01-24 11:30:12 +08:00
Paul Kirth	03a61d34eb	[RISCV] Support TLSDESC in the RISC-V backend (#66915 ) This patch adds basic TLSDESC support in the RISC-V backend. Specifically, we add new relocation types for TLSDESC, as prescribed in https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a new pseudo instruction to simplify code generation. This patch does not try to optimize the local dynamic case, which can be improved in separate patches. Linker side changes will also be handled separately. The current implementation is only enabled when passing the new `-enable-tlsdesc` codegen flag.	2024-01-23 16:16:07 -08:00
Philip Reames	bdc41106ee	[RISCV] Recurse on first operand of two operand shuffles (#79180 ) This is the first step towards an alternate shuffle lowering design for the general two vector argument case. The goal is to leverage the existing lowering for single vector permutes to avoid as many of the vrgathers as required - even if we do need the other. This patch handles only the first argument, and is arguably a slightly weird half-step. However, the test changes from the full two argument recurse patch are a lot harder to reason about. Taking this half step gives much more easily reviewable changes, and is thus worthwhile. I intend to post the patch for the second argument once this has landed.	2024-01-23 10:49:55 -08:00
Philip Reames	bb8a8770e2	[RISCV] Exploit register boundaries when lowering shuffle with exact vlen (#79072 ) If we have a shuffle which is larger than m1, we may be able to split it into a series of individual m1 shuffles. This patch starts with the subcase where the mask allows a 1-to-1 mapping from source register to destination register - each with a possible permutation of their own. We can potentially extend this later, though in practice this seems to already catch a number of the most interesting cases.	2024-01-23 10:36:22 -08:00
Philip Reames	a0f69be262	[RISCV] Continue with early return for shuffle lowering [nfc] Move two cases where we're not actually going to use any of our computed index vectors or mask values above the computation of the same.	2024-01-23 09:32:04 -08:00
Philip Reames	51f9e982ed	[RISCV] Use early return for select shuffle lowering [nfc] Minor rework of the fallback case for two argument shuffles in lowerVECTOR_SHUFFLE. We had some common code which wasn't actually common, and simplified significantly once specialized for whether we had a select or not.	2024-01-23 09:20:52 -08:00
Simeon K	58cfd56356	[VP][RISCV] Introduce llvm.vp.minimum/maximum intrinsics (#74840 ) Although there are predicated versions of minnum/maxnum, the ones for minimum/maximum are currently missing. This patch introduces these intrinsics and implements their lowering to RISC-V.	2024-01-22 16:46:39 -08:00
Craig Topper	672fb5892e	[RISCV] Remove extra semicolons. NFC	2024-01-22 14:30:34 -08:00
Craig Topper	0ad83bc26c	[RISCV] Don't look through EXTRACT_ELEMENT in lowerScalarInsert if the element types are different. (#78668 ) If the element type of the vector we're extracting from doesn't match the type we're inserting into, we can't directly insert or extract the subvector.	2024-01-18 22:35:24 -08:00
Chia	ba81477e9c	Recommit "[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension." (#76785 ) This patch was originally introduced in PR #72340, but was reverted due to a bug that produced an invalid extension combine. Specifically, we resolve the case in https://github.com/llvm/llvm-project/pull/72340#issuecomment-1874810998 ``` define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i2> %y) { %a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32> %b = zext <vscale x 1 x i2> %y to <vscale x 1 x i32> %c = add <vscale x 1 x i32> %a, %b ret <vscale x 1 x i32> %c } ``` The previous patch didn't check whether the semantics of `ISD::SIGN_EXTEND` and `ISD::ZERO_EXTEND` are equivalent to `vsext.vf2` or `vzext.vf2` (not ensuring the SEW condition on widening Vector Arithmetic Instructions). Thanks to @topperc for pointing out this bug. ## The original description This PR mainly aims at resolving the below missed-optimization case, while it could also be considered as an extension of the previous patch https://reviews.llvm.org/D133739?id= ### Missed-Optimization Case Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh ### Source Code: ``` define <vscale x 2 x i16> @multiple_users(ptr %x, ptr %y, ptr %z) { %a = load <vscale x 2 x i8>, ptr %x %b = load <vscale x 2 x i8>, ptr %y %b2 = load <vscale x 2 x i8>, ptr %z %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16> %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16> %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16> %e = mul <vscale x 2 x i16> %c, %d %f = add <vscale x 2 x i16> %c, %d2 %g = sub <vscale x 2 x i16> %c, %d2 %h = or <vscale x 2 x i16> %e, %f %i = or <vscale x 2 x i16> %h, %g ret <vscale x 2 x i16> %i } ``` ### Before This Patch ``` # %bb.0: vsetvli a3, zero, e16, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vsext.vf2 v11, v8 vsext.vf2 v8, v9 vsext.vf2 v9, v10 vmul.vv v8, v11, v8 vadd.vv v10, v11, v9 vsub.vv v9, v11, v9 vor.vv v8, v8, v10 vor.vv v8, v8, v9 ret ``` ### After This Patch 
``` # %bb.0: vsetvli a3, zero, e8, mf4, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vwmul.vv v11, v8, v9 vwadd.vv v9, v8, v10 vwsub.vv v12, v8, v10 vsetvli zero, zero, e16, mf2, ta, ma vor.vv v8, v11, v9 vor.vv v8, v8, v12 ret ``` We can see Add/Sub/Mul are combined with the Sign Extension. ### Relation to the Patch D133739 The patch D133739 introduced an optimization for folding `ADD_VL` / `SUB_VL` / `MUL_VL` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did not consider the non-fixed-length (scalable) vector case, thus this PR could also be considered as an extension of D133739.	2024-01-17 18:30:27 -08:00
Philip Reames	de423cfe3d	[RISCV] Prefer vsetivli for VLMAX when VLEN is exactly known (#75509 ) If VLEN is exactly known, we may be able to use the vsetivli encoding instead of the vsetvli a0, zero, <vtype> encoding. This slightly reduces register pressure. This builds on 632f1c5, but reverses course a bit. It turns out to be quite complicated to canonicalize from VLMAX to immediate early because the sentinel value is widely used in tablegen patterns without knowledge of LMUL. Instead, we canonicalize towards the VLMAX representation, and then pick the immediate form during insertion since we have the LMUL information there. Within InsertVSETVLI, this could reasonably fit in a couple places. If reviewers want me to e.g. move it to emission, let me know. Doing so may require a bit of extra code to e.g. handle comparisons of the two forms, but shouldn't be too complicated.	2024-01-17 12:40:00 -08:00
Wang Pengcheng	3ac9fe69f7	[RISCV] CodeGen of RVE and ilp32e/lp64e ABIs (#76777 ) This commit includes the necessary changes to clang and LLVM to support codegen of `RVE` and the `ilp32e`/`lp64e` ABIs. The differences between `RVE` and `RVI` are: * `RVE` reduces the integer register count to 16 (x0-x15). * The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits. `RVE` can be combined with all current standard extensions. The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are: * Only 6 integer argument registers (rather than 8). * Only 2 callee-saved registers (rather than 12). * A stack alignment of 32 bits (rather than 128 bits). * ilp32e isn't compatible with D ISA extension. If `ilp32e` or `lp64e` is used with an ISA that has any of the registers x16-x31 and f0-f31, then these registers are considered temporaries. To be compatible with the implementation of ilp32e in GCC, we don't use aligned registers to pass variadic arguments and set stack alignment to 4 bytes for types with length of 2*XLEN. FastCC is also supported on RVE, while GHC isn't since there is only one available register. Differential Revision: https://reviews.llvm.org/D70401	2024-01-16 20:44:30 +08:00
Luke Lau	286a366d05	[RISCV] Remove vmv.s.x and vmv.x.s lmul pseudo variants (#71501 ) vmv.s.x and vmv.x.s ignore LMUL, so we can replace the PseudoVMV_S_X_MX and PseudoVMV_X_S_MX with just one pseudo each. These pseudos use the VR register class (just like the actual instruction), so we now only have TableGen patterns for vectors of LMUL <= 1. We now rely on the existing combines that shrink LMUL down to 1 for vmv_s_x_vl (and vfmv_s_f_vl). We could look into removing these combines later and just inserting the nodes with the correct type in a later patch. The test diff is due to the fact that a PseudoVMV_S_X/PseudoVMV_X_S no longer carries any information about LMUL, so if it's the only vector pseudo instruction in a block then it now defaults to LMUL=1.	2024-01-16 13:36:24 +07:00
Luke Lau	c07a1fe7b4	[RISCV] Lower vfmv.s.f intrinsics to VFMV_S_F_VL first (#76699 ) Currently vfmv.s.f intrinsics are directly selected to their pseudos via a tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their corresponding VL SDNode, then get selected from a pattern in RISCVInstrInfoVVLPatterns.td. This patch brings vfmv.s.f in line with the other move instructions. Split out from #71501, where we did this to preserve the behaviour of selecting vmv_s_x for VFMV_S_F_VL for small enough immediates.	2024-01-15 12:07:29 +07:00
Craig Topper	3378514a4d	[RISCV] Use any_extend for type legalizing atomic_compare_swap with Zacas. (#77669 ) With Zacas we will use amocas.w which doesn't require the input to be sign extended.	2024-01-10 12:41:11 -08:00
jiahanxie353	e42a70afab	[RISCV][GISel] IRTranslate and Legalize some instructions with scalable vector type * Add IRTranslate tests for ADD, SUB, AND, OR, and XOR with scalable vector types to show that they work as expected. * Legalize G_ADD, G_SUB, G_AND, G_OR, and G_XOR of scalable vector type for the RISC-V vector extension.	2024-01-09 21:51:30 -07:00
Chia	a79d13f12a	[RISCV][ISel] Use vaaddu with rounding mode rnu for ISD::AVGCEILU. (#77473 ) Similar to #76550, but for `ISD::AVGCEILU`. Specifically, this patch aims to use `vaaddu` with rounding mode rnu (i.e `vxrm[1:0] = 0b00`) for `ISD::AVGCEILU`. ### Source code ``` define <vscale x 8 x i8> @vaaddu_vv_nxv8i8_ceil(<vscale x 8 x i8> %x, <vscale x 8 x i8> %y) { %xzv = zext <vscale x 8 x i8> %x to <vscale x 8 x i16> %yzv = zext <vscale x 8 x i8> %y to <vscale x 8 x i16> %add = add nuw nsw <vscale x 8 x i16> %xzv, %yzv %one = insertelement <vscale x 8 x i16> poison, i16 1, i32 0 %splat = shufflevector <vscale x 8 x i16> %one, <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer %add1 = add nuw nsw <vscale x 8 x i16> %add, %splat %div = lshr <vscale x 8 x i16> %add1, %splat %ret = trunc <vscale x 8 x i16> %div to <vscale x 8 x i8> ret <vscale x 8 x i8> %ret } ``` ### Before this patch ``` vaaddu_vv_nxv8i8_ceil: vsetvli a0, zero, e8, m1, ta, ma vwaddu.vv v10, v8, v9 vsetvli zero, zero, e16, m2, ta, ma vadd.vi v10, v10, 1 vsetvli zero, zero, e8, m1, ta, ma vnsrl.wi v8, v10, 1 ret ``` ### After this patch ``` vaaddu_vv_nxv8i8_ceil: vsetvli a0, zero, e8, m1, ta, ma csrwi vxrm, 0 vaaddu.vv v8, v8, v9 ret ```	2024-01-10 12:08:16 +09:00
Craig Topper	c9da4dc77f	[RISCV] Refactor GPRF64 register class to make it usable for Zacas. (#77408 ) -Rename to GPRPair. -Rename registers to be named like X10_X11 instead of X10_PD. Except X0 which is now X0_Pair since it is not paired with X1. -Use unknown size and offset for the subreg indices. This might be a functional change, but does not affect any lit tests.	2024-01-09 09:21:27 -08:00
Alex Bradbury	2d54ec36f7	[SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (#77455 ) This is the logical equivalent for #76710 for APInt and uses the same naming scheme. Converted existing users through: `git grep -l "cast<ConstantSDNode>\(.\).getAPIntValueValue" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.*)\)->getAPIntValue/\1->getAsAPIntVal/'`	2024-01-09 14:27:07 +00:00
Alex Bradbury	197214e39b	[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710 ) This follows on from #76708, allowing `cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just `N->getAsZExtVal();` Introduced via `git grep -l "cast<ConstantSDNode>\(.\).getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and then using `git clang-format` on the result.	2024-01-09 12:25:17 +00:00
Chia	0c24c175f2	[RISCV][ISel] Use vaaddu with rounding mode rdn for ISD::AVGFLOORU. (#76550 ) This patch aims to use `vaaddu` with rounding mode rdn (i.e `vxrm[1:0] = 0b10`) for `ISD::AVGFLOORU`. ### Source code ``` define <8 x i8> @vaaddu_auto(ptr %x, ptr %y, ptr %z) { %xv = load <8 x i8>, ptr %x, align 2 %yv = load <8 x i8>, ptr %y, align 2 %xzv = zext <8 x i8> %xv to <8 x i16> %yzv = zext <8 x i8> %yv to <8 x i16> %add = add nuw nsw <8 x i16> %xzv, %yzv %div = lshr <8 x i16> %add, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1> %ret = trunc <8 x i16> %div to <8 x i8> ret <8 x i8> %ret } ``` ### Before this patch ``` vaaddu_auto: vsetivli zero, 8, e8, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vwaddu.vv v10, v8, v9 vnsrl.wi v8, v10, 1 ret ``` ### After this patch ``` vaaddu_auto: vsetivli zero, 8, e8, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) csrwi vxrm, 2 vaaddu.vv v8, v8, v9 ret ``` ### Note on signed averaging addition Based on the rvv spec, there is also a variant for signed averaging addition called `vaadd`. But AFAIU, no matter in which rounding mode, we cannot achieve the semantic of signed averaging addition through `vaadd`. Thus this patch only introduces `vaaddu`.	2024-01-09 15:17:38 +09:00
Craig Topper	a8e9dceb49	[RISCV] Use getELen() instead of hardcoded 64 in lowerBUILD_VECTOR. (#77355 ) This is needed to properly support Zve32x.	2024-01-08 19:36:15 -08:00
Craig Topper	faa326de97	[RISCV] Add branch+c.mv macrofusion for sifive-p450. (#76169 ) sifive-p450 supports a very restricted version of the short forward branch optimization from the sifive-7-series. For sifive-p450, a branch over a single c.mv can be macrofused as a conditional move operation. Due to encoding restrictions on c.mv, we can't conditionally move from X0. That would require c.li instead.	2024-01-08 15:23:26 -08:00
Fangrui Song	360996ac5a	[RISCV] Merge machine operand flag MO_PLT into MO_CALL (#77253 ) Since #72467, `@plt` in assembly output "call foo@plt" is omitted. We can trivially merge MO_PLT and MO_CALL without any functional change to assembly/relocatable file output. Earlier architectures use different call relocation types whether a PLT is potentially needed: R_386_PLT32/R_386_PC32, R_68K_PLT32/R_68K_PC32, R_SPARC_WDISP30/R_SPARC_WPLT30. However, as the PLT property is per-symbol instead of per-call-site and linkers can optimize out a PLT, the distinction has been confusing. Arm made good names R_ARM_CALL/R_AARCH64_CALL. Let's use MO_CALL instead of MO_PLT. As follow-ups, we can merge fixup_riscv_call/fixup_riscv_call_plt and VK_RISCV_CALL/VK_RISCV_CALL_PLT.	2024-01-07 12:43:39 -08:00
Craig Topper	a960703466	[RISCV] Remove incomplete PRE_DEC/POST_DEC code for XTHeadMemIdx. (#76922 ) As far as I can tell if getIndexedAddressParts received an ISD::SUB, the constant would be negated. So `IsInc` should be set to true since the SUB was effectively converted to ADD. This means we should never use PRE_DEC/POST_DEC. No tests are affected because DAGCombine aggressively turns SUB with constant into ADD so no lit test has a SUB reach getIndexedAddressParts.	2024-01-04 09:48:40 -08:00
Shih-Po Hung	475890cd2e	[RISCV][CostModel] Add getRISCVInstructionCost() to TTI for CostKind (#76793 ) Instruction cost for CodeSize and Latency/RecipThroughput can be very different. Considering the diversity of CostKind and vendor-specific cost, and how they are spread across various TTI functions, it's becoming quite a challenge to handle. This patch adds an interface getRISCVInstructionCost to address it.	2024-01-04 21:04:36 +08:00
Craig Topper	80889ae029	[RISCV] Remove RISCVISD::VSELECT_VL. (#76866 ) We can use RISCVISD::VMERGE_VL with an undef passthru operand. I had to rewrite the FMA patterns to handle both undef and non-undef cases so we can get the tail policy.	2024-01-03 21:31:07 -08:00
Craig Topper	4e347b4e38	Revert "[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension (#72340 )" This reverts most of commit 5b155aea0e529b7b5c807e189fef6ea5cd5faec9. I have left the new test file, but regenerated the checks. This causes failures in our downstream testing. The input types to the extends need to be checked so we don't create RISCVISD::VZEXT_VL with illegal or unsupported input type.	2024-01-02 19:49:42 -08:00
Alex Bradbury	80aeb62211	[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708 ) This helper function shortens examples like `cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to `Node->getConstantOperandVal(1);`. Implemented with: `git grep -l "cast<ConstantSDNode>\(.->getOperand\(.\)\)->getZExtValue\(\)" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.)->getOperand\((.)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/'` and `git grep -l "cast<ConstantSDNode>\(.\.getOperand\(.\)\)->getZExtValue\(\)" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.)\.getOperand\((.)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`. With a couple of simple manual fixes needed. Result then processed by `git clang-format`.	2024-01-02 13:14:28 +00:00
Chia	5b155aea0e	[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension (#72340 ) This PR mainly aims at resolving the below missed-optimization case, while it could also be considered as an extension of the previous patch https://reviews.llvm.org/D133739?id= ## Missed-Optimization Case Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh ### Source Code: ``` define <vscale x 2 x i16> @multiple_users(ptr %x, ptr %y, ptr %z) { %a = load <vscale x 2 x i8>, ptr %x %b = load <vscale x 2 x i8>, ptr %y %b2 = load <vscale x 2 x i8>, ptr %z %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16> %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16> %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16> %e = mul <vscale x 2 x i16> %c, %d %f = add <vscale x 2 x i16> %c, %d2 %g = sub <vscale x 2 x i16> %c, %d2 %h = or <vscale x 2 x i16> %e, %f %i = or <vscale x 2 x i16> %h, %g ret <vscale x 2 x i16> %i } ``` ### Before This Patch ``` # %bb.0: vsetvli a3, zero, e16, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vsext.vf2 v11, v8 vsext.vf2 v8, v9 vsext.vf2 v9, v10 vmul.vv v8, v11, v8 vadd.vv v10, v11, v9 vsub.vv v9, v11, v9 vor.vv v8, v8, v10 vor.vv v8, v8, v9 ret ``` ### After This Patch ``` # %bb.0: vsetvli a3, zero, e8, mf4, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vwmul.vv v11, v8, v9 vwadd.vv v9, v8, v10 vwsub.vv v12, v8, v10 vsetvli zero, zero, e16, mf2, ta, ma vor.vv v8, v11, v9 vor.vv v8, v8, v12 ret ``` We can see Add/Sub/Mul are combined with the Sign Extension. ## Relation to the Patch D133739 The patch D133739 introduced an optimization for folding `ADD_VL` / `SUB_VL` / `MUL_VL` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did not consider the non-fixed-length (scalable) vector case, thus this PR could also be considered as an extension of D133739. Furthermore, in the current `SelectionDAG`, we represent scalable vector add (or any binary operator) as a normal `ADD` operation. 
It might be better to use an Opcode like `ADD_VL`, which needs further conversation and decision.	2023-12-29 14:36:38 +08:00
Vitaly Buka	9c39d9bb49	Revert "[RISCV][CostModel] Add getRISCVInstructionCost() to TTI for Cost… (#73651 )" (#76536 ) Fails on bots https://lab.llvm.org/buildbot/#/builders/5/builds/39629 Issue #76535 This reverts commit 3e75dece919511e4a2edada82d783304cc14a9cd.	2023-12-28 13:30:56 -08:00
Shih-Po Hung	3e75dece91	[RISCV][CostModel] Add getRISCVInstructionCost() to TTI for Cost… (#73651 ) …Kind Instruction cost for CodeSize and Latency/RecipThroughput can be very different. Considering the diversity of CostKind and vendor-specific cost, and how they are spread across various TTI functions, it's becoming quite a challenge to handle. This patch adds an interface getRISCVInstructionCost to address it.	2023-12-28 14:36:01 +08:00
Yeting Kuo	af837d44c7	[RISCV][DAG] Teach computeKnownBits to consider SEW/LMUL/AVL for vsetvli. (#76158 ) This patch also adds tests whose masks are too narrow to combine. I think they can help us find bugs caused by too-large known bits.	2023-12-25 11:18:22 +08:00
Craig Topper	e64f5d6305	[RISCV] Replace RISCVISD::VP_MERGE_VL with a new node that has a separate passthru operand. (#75682 ) ISD::VP_MERGE treats the false operand as the source for elements past VL. The vmerge instruction encodes 3 registers and treats the vd register as the source for the tail. This patch adds a new ISD opcode that models the tail source explicitly. During lowering we copy the false operand to this operand. I think we can merge RISCVISD::VSELECT_VL with this new opcode by using an UNDEF passthru, but I'll save that for another patch.	2023-12-21 14:34:49 -08:00
Craig Topper	0dcff0db3a	[RISCV] Add codegen support for experimental.vp.splice (#74688 ) IR intrinsics were already defined, but no codegen support had been added. I extracted this code from our downstream. Some of it may have come from https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.	2023-12-21 08:38:32 -08:00
Yeting Kuo	9b561ca044	[RISCV] Make performFP_TO_INTCombine fold with ISD::FRINT. (#76020 ) Fold (fp_to_int (frint X)) to (fcvt X) without rounding mode.	2023-12-21 15:03:36 +08:00
Yeting Kuo	b7376c3196	[RISCV][NFC] Add comments and tests for frint case of performFP_TO_INT_SATCombine. (#76014 ) performFP_TO_INT_SATCombine could also serve pattern (fp_to_int_sat (frint X)).	2023-12-20 14:56:28 +08:00
Yeting Kuo	cdc0392669	[RISCV] Update implies for subtarget feature. (#75824 ) PR #75576 and #75735 update some implies in llvm/lib/Support/RISCVISAInfo.cpp, but both of them miss the subtarget feature part. This patch still preserves the predicates HasStdExtZfhOrZfhmin and HasStdExtZhinxOrZhinxmin, since they make error messages more readable. (Users might not know that zfh implies zfhmin.)	2023-12-19 09:47:46 +08:00
Jie Fu	b6cce87110	[RISCV] Fix -Wbraced-scalar-init in RISCVISelLowering.cpp (NFC) llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:339:24: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init] 339 \| setOperationAction({ISD::ROTL}, XLenVT, Expand); \| ^~~~~~~~~~~ 1 error generated.	2023-12-17 19:59:42 +08:00
melonedo	3eaed9e6f5	[RISCV] Implement intrinsics for XCVbitmanip extension in CV32E40P (#74993 ) Implement XCVbitmanip intrinsics for CV32E40P according to the specification. This commit is part of a patch-set to upstream the vendor specific extensions of CV32E40P that need LLVM intrinsics to implement Clang builtins. Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill, @NandniJamnadas, @PaoloS02, @simonpcook, @xingmingjie. Spec: `05481cf0ef/specifications/corev-builtin-spec.md (listing-of-pulp-bit-manipulation-builtins-xcvbitmanip)`. Previously reviewed on Phabricator: https://reviews.llvm.org/D157510. Parallel GCC patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635795.html. Co-authored-by: melonedo <funanzeng@gmail.com>	2023-12-17 19:29:40 +08:00
Philip Reames	e8a15eca92	[RISCV] Prefer whole register loads and stores when VL=VLMAX (#75531 ) If we're lowering a fixed length vector load or store which happens to be exactly VLEN in size (when VLEN is exactly known), we can use a whole register load or store instead of the unit strided variants. This doesn't require a vsetvli in some cases, allows additional flexibility of vsetvli cases in others, and doesn't have a runtime dependency on the value of VL.	2023-12-15 09:26:57 -08:00
Philip Reames	632f1c5d18	[RISCV] When VLEN is exactly known, prefer VLMAX encoding for vsetvli (#75412 ) If we know the exact VLEN, then we can tell if the AVL for a particular operation is equivalent to the vsetvli xN, zero, <vtype> encoding. Using this encoding is better than having to materialize an immediate in a register, but worse than being able to use the vsetivli zero, imm, <vtype> encoding.	2023-12-13 17:51:03 -08:00
Philip Reames	12af9c8337	[RISCV] Extract a utility for computing bounds on VLMAX [nfc] Simplifying an upcoming change...	2023-12-13 13:40:18 -08:00
Craig Topper	2c185709bc	[RISCV] Remove setJumpIsExpensive(). (#74647 ) Middle-end optimizations can speculate away the short-circuit behavior of C/C++ && and \|\|, using i1 and/or or logical select instructions and a single branch. SelectionDAGBuilder can turn i1 and/or/select back into multiple branches, but this is disabled when jump is expensive. RISC-V can use slt(u)(i) to evaluate a condition into any GPR, which makes us better than other targets that use a flag register. RISC-V also has single-instruction compare and branch. So it's not clear from a code size perspective that using compare+and/or is better. If the full condition is dependent on multiple loads, using logic delays the branch resolution until all the loads are resolved even if there is a cheap condition that makes the loads unnecessary. PowerPC and Lanai are the only CPU targets that use setJumpIsExpensive. NVPTX and AMDGPU also use it but they are GPU targets. PowerPC appears to have a MachineIR pass that turns AND/OR of CR bits into multiple branches. I don't know anything about Lanai and their reason for using setJumpIsExpensive. I think the decision to use logic vs branches is much more nuanced than this big hammer. So I propose to make RISC-V match other CPU targets. Anyone who wants the old behavior can still pass -mllvm -jump-is-expensive=true.	2023-12-13 09:37:25 -08:00
Craig Topper	8227072f5a	[RISCV] Add missing break to last case in switch. NFC	2023-12-12 13:52:52 -08:00
Craig Topper	3c5b42acd3	[RISCV] Allocate the varargs GPR save area as a single object. (#74354 ) Previously we allocated one object for each GPR. We also allocated the same offset twice, once to save for VASTART and then again for the first register in the save loop. This patch uses a single object for all the registers and shares this with VASTART. This is more consistent with other targets like AArch64 and ARM. I've removed the setValue(nullptr) from the memory operand now. Having a single object makes me a lot more comfortable about alias analysis being able to see what is going on. This led to the scheduling changes in push-pop-popret.ll and vararg.ll.	2023-12-05 10:30:01 -08:00
Craig Topper	b73d79fda8	[RISCV] Fix typo in comment. NFC This should say "Assume that VL output is <= 65536".	2023-12-04 14:15:49 -08:00
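
Note on the two `vaaddu` commits above (a79d13f12a for `ISD::AVGCEILU`, 0c24c175f2 for `ISD::AVGFLOORU`): the following is a minimal scalar sketch, not code from either patch, of why the rounding modes named in those messages (`rnu`, `vxrm = 0`, and `rdn`, `vxrm = 2`) line up with ceiling and floor averaging. The function names `avgFloorU8`, `avgCeilU8`, `vaadduModel`, and `modelMatchesReference` are hypothetical and exist only for this illustration; the model assumes the RVV fixed-point rounding rule in which `rnu` adds back the shifted-out bit and `rdn` truncates.

```cpp
#include <cstdint>

// Reference semantics: do the add in a wider type so the sum cannot overflow,
// mirroring the zext/add/lshr/trunc IR shown in the commit messages.
constexpr std::uint8_t avgFloorU8(std::uint8_t x, std::uint8_t y) {
  return static_cast<std::uint8_t>((static_cast<unsigned>(x) + y) >> 1);
}

constexpr std::uint8_t avgCeilU8(std::uint8_t x, std::uint8_t y) {
  return static_cast<std::uint8_t>((static_cast<unsigned>(x) + y + 1) >> 1);
}

// Hypothetical scalar model of one vaaddu element (assumption, not LLVM code).
// vxrm == 0 (rnu): add the bit shifted out of the sum (round to nearest up).
// vxrm == 2 (rdn): add nothing (truncate).
// Other rounding modes are not modeled here and fall through to truncation.
constexpr std::uint8_t vaadduModel(std::uint8_t x, std::uint8_t y,
                                   unsigned vxrm) {
  return static_cast<std::uint8_t>(
      ((static_cast<unsigned>(x) + y) >> 1) +
      (vxrm == 0 ? ((static_cast<unsigned>(x) + y) & 1u) : 0u));
}

// Exhaustively compare the model with the reference over all 8-bit inputs.
inline bool modelMatchesReference() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      const std::uint8_t a = static_cast<std::uint8_t>(x);
      const std::uint8_t b = static_cast<std::uint8_t>(y);
      if (vaadduModel(a, b, 2) != avgFloorU8(a, b)) return false;
      if (vaadduModel(a, b, 0) != avgCeilU8(a, b)) return false;
    }
  }
  return true;
}
```

Calling `modelMatchesReference()` checks all 65536 input pairs; the sketch covers only the unsigned 8-bit case and says nothing about the signed `vaadd` variant discussed in 0c24c175f2.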


1414 Commits