llvm-project

Author	SHA1	Message	Date
Michael Maitland	3967510032	[RISCV][GISel] First mask argument placed in v0 according to RISCV Vector CC. (#79343 )	2024-01-24 16:03:38 -05:00
Philip Reames	396b6bbc5e	[RISCV] Recurse on second operand of two operand shuffles (#79197 ) This builds on bdc41106ee48dce59c500c9a3957af947f30c8c3. This change completes the migration to a recursive shuffle lowering strategy where when we encounter an unknown two argument shuffle, we lower each operand as a single source permute, and then use a vselect (i.e. a vmerge) to combine the results. This relies for code quality on the post-isel combine which will aggressively fold that vmerge back into the materialization of the second operand if possible. Note: The change includes only the most immediately obvious of the stylistic cleanup. There's a bunch of code movement that this enables that I'll do as a separate patch as rolling it into this creates an unreadable diff.	2024-01-24 08:29:28 -08:00
Brandon Wu	33d804c6c2	[RISCV] Allow VCIX with SE to reorder (#77049 ) This patch allows VCIX instructions that have side effects to be reordered with memory and other side-effecting instructions. However we don't want VCIX instructions to be reordered with each other, so we propose a dummy register called VCIX_STATE and make these instructions implicitly define and use it.	2024-01-24 11:30:12 +08:00
Paul Kirth	03a61d34eb	[RISCV] Support TLSDESC in the RISC-V backend (#66915 ) This patch adds basic TLSDESC support in the RISC-V backend. Specifically, we add new relocation types for TLSDESC, as prescribed in https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a new pseudo instruction to simplify code generation. This patch does not try to optimize the local dynamic case, which can be improved in separate patches. Linker side changes will also be handled separately. The current implementation is only enabled when passing the new `-enable-tlsdesc` codegen flag.	2024-01-23 16:16:07 -08:00
Philip Reames	f05dd29cee	[RISCV] Regenerate autogen test to remove spurious diff	2024-01-23 10:57:54 -08:00
Philip Reames	bdc41106ee	[RISCV] Recurse on first operand of two operand shuffles (#79180 ) This is the first step towards an alternate shuffle lowering design for the general two vector argument case. The goal is to leverage the existing lowering for single vector permutes to avoid as many of the vrgathers as required - even if we do need the other. This patch handles only the first argument, and is arguably a slightly weird half-step. However, the test changes from the full two argument recurse patch are a lot harder to reason about. Taking this half step gives much more easily reviewable changes, and is thus worthwhile. I intend to post the patch for the second argument once this has landed.	2024-01-23 10:49:55 -08:00
Philip Reames	bb8a8770e2	[RISCV] Exploit register boundaries when lowering shuffle with exact vlen (#79072 ) If we have a shuffle which is larger than m1, we may be able to split it into a series of individual m1 shuffles. This patch starts with the subcase where the mask allows a 1-to-1 mapping from source register to destination register - each with a possible permutation of their own. We can potentially extend this later, though in practice this seems to already catch a number of the most interesting cases.	2024-01-23 10:36:22 -08:00
Craig Topper	d360963aaa	[RISCV] Add regalloc hints for Zcb instructions. (#78949 ) This hints the register allocator to use the same register for source and destination to enable more compression.	2024-01-23 09:33:06 -08:00
Simeon K	297b77036e	[RISCV] Fix stack size computation when M extension disabled (#78602 ) Ensure that getVLENFactoredAmount does not fail when the scale amount requires the use of a non-trivial multiplication but the M extension is not enabled. In such case, perform the multiplication using shifts and adds.	2024-01-22 23:10:25 -08:00
Jim Lin	b8e708b9d3	[RISCV] Merge ADDI with X0 into base offset (#78940 ) If offset is `addi rd, x0, imm`, merge imm into base offset.	2024-01-23 09:57:05 +08:00
Simeon K	58cfd56356	[VP][RISCV] Introduce llvm.vp.minimum/maximum intrinsics (#74840 ) Although there are predicated versions of minnum/maxnum, the ones for minimum/maximum are currently missing. This patch introduces these intrinsics and implements their lowering to RISC-V.	2024-01-22 16:46:39 -08:00
Philip Reames	8675952583	[RISCV] Add coverage for shuffles splitable using exact VLEN Test coverage for an upcoming transform.	2024-01-22 14:00:51 -08:00
Wang Pengcheng	5cd8d53cac	[RISCV] Teach RISCVMergeBaseOffset to handle inline asm (#78945 ) For inline asm with memory operands, we can merge the offset into the second operand of memory constraint operands. Differential Revision: https://reviews.llvm.org/D158062	2024-01-22 17:36:32 +08:00
Craig Topper	9396891271	[RISCV] Don't look for sext in RISCVCodeGenPrepare::visitAnd. We want to know the upper 33 bits of the And Input are zero. SExt only guarantees they are the same. We originally checked for SExt or ZExt when we were using isImpliedByDomCondition because a ZExt may have been changed to SExt before we visited the And. We are no longer using isImpliedByDomCondition so we can only look for zext with the nneg flag. While here, switch to PatternMatch to simplify the code. Fixes #78783	2024-01-19 14:44:47 -08:00
Craig Topper	66cea7143a	[RISCV] Add test case for #78783 . NFC	2024-01-19 14:44:47 -08:00
Craig Topper	9ae28fb9d3	[RISCV] Prevent RISCVMergeBaseOffsetOpt from calling getVRegDef on a physical register. (#78762 ) Fixes #78679.	2024-01-19 12:15:08 -08:00
Min-Yih Hsu	5330daad41	[RISCV] Add support for Smepmp 1.0 (#78489 ) Smepmp is a supervisor extension that prevents privileged processes from accessing unprivileged program and data. Spec: https://github.com/riscv/riscv-tee/blob/main/Smepmp/Smepmp.pdf	2024-01-19 11:09:35 -08:00
Craig Topper	0ad83bc26c	[RISCV] Don't look through EXTRACT_ELEMENT in lowerScalarInsert if the element types are different. (#78668 ) If the element type of the vector we're extracting from doesn't match the type we're inserting into, we can't directly insert or extract the subvector.	2024-01-18 22:35:24 -08:00
Luke Lau	8649328060	[RISCV] Add support for new unprivileged extensions defined in profiles spec (#77458 ) This adds minimal support for 7 new unprivileged extensions that were defined as a part of the RISC-V Profiles specification here: https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#7-new-isa-extensions * Ziccif: Main memory supports instruction fetch with atomicity requirement * Ziccrse: Main memory supports forward progress on LR/SC sequences * Ziccamoa: Main memory supports all atomics in A * Zicclsm: Main memory supports misaligned loads/stores * Za64rs: Reservation set size of 64 bytes * Za128rs: Reservation set size of 128 bytes * Zic64b: Cache block size is 64 bytes As stated in the specification, these extensions don't add any new features but describe existing features. So this patch only adds parsing and subtarget features.	2024-01-19 06:57:06 +07:00
Luke Lau	15b0fabb21	[RISCV] Vectorize phi for loop carried @llvm.vector.reduce.fadd (#78244 ) LLVM vector reduction intrinsics return a scalar result, but on RISC-V vector reduction instructions write the result in the first element of a vector register. So when a reduction in a loop uses a scalar phi, we end up with unnecessary scalar moves: loop: vfmv.s.f v10, fa0 vfredosum.vs v8, v8, v10 vfmv.f.s fa0, v8 This mainly affects ordered fadd reductions, which have a scalar accumulator operand. This tries to vectorize any scalar phis that feed into a fadd reduction in RISCVCodeGenPrepare, converting: loop: %phi = phi float [ ..., %entry ], [ %acc, %loop] %acc = call float @llvm.vector.reduce.fadd.nxv2f32(float %phi, <vscale x 2 x float> %vec) to loop: %phi = phi <vscale x 2 x float> [ ..., %entry ], [ %acc.vec, %loop] %phi.scalar = extractelement <vscale x 2 x float> %phi, i64 0 %acc = call float @llvm.vector.reduce.fadd.nxv2f32(float %phi.scalar, <vscale x 2 x float> %vec) %acc.vec = insertelement <vscale x 2 x float> poison, float %acc, i64 0 Which eliminates the scalar -> vector -> scalar crossing during instruction selection.	2024-01-18 16:15:20 +07:00
Chia	ba81477e9c	Recommit "[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension." (#76785 ) This patch was originally introduced in PR #72340, but was reverted due to a bug involving an invalid extension combine. Specifically, we resolve the case in https://github.com/llvm/llvm-project/pull/72340#issuecomment-1874810998 ``` define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i2> %y) { %a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32> %b = zext <vscale x 1 x i2> %y to <vscale x 1 x i32> %c = add <vscale x 1 x i32> %a, %b ret <vscale x 1 x i32> %c } ``` The previous patch didn't check if the semantics of `ISD::SIGN_EXTEND` and `ISD::ZERO_EXTEND` are equivalent to the `vsext.vf2` or `vzext.vf2` (not ensuring the SEW condition on widening Vector Arithmetic Instructions). Thanks to @topperc for pointing out this bug. ## The original description This PR mainly aims at resolving the below missed-optimization case, while it could also be considered as an extension of the previous patch https://reviews.llvm.org/D133739?id= ### Missed-Optimization Case Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh ### Source Code: ``` define <vscale x 2 x i16> @multiple_users(ptr %x, ptr %y, ptr %z) { %a = load <vscale x 2 x i8>, ptr %x %b = load <vscale x 2 x i8>, ptr %y %b2 = load <vscale x 2 x i8>, ptr %z %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16> %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16> %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16> %e = mul <vscale x 2 x i16> %c, %d %f = add <vscale x 2 x i16> %c, %d2 %g = sub <vscale x 2 x i16> %c, %d2 %h = or <vscale x 2 x i16> %e, %f %i = or <vscale x 2 x i16> %h, %g ret <vscale x 2 x i16> %i } ``` ### Before This Patch ``` # %bb.0: vsetvli a3, zero, e16, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vsext.vf2 v11, v8 vsext.vf2 v8, v9 vsext.vf2 v9, v10 vmul.vv v8, v11, v8 vadd.vv v10, v11, v9 vsub.vv v9, v11, v9 vor.vv v8, v8, v10 vor.vv v8, v8, v9 ret ``` ### After This Patch ``` # %bb.0: vsetvli a3, zero, e8, mf4, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vwmul.vv v11, v8, v9 vwadd.vv v9, v8, v10 vwsub.vv v12, v8, v10 vsetvli zero, zero, e16, mf2, ta, ma vor.vv v8, v11, v9 vor.vv v8, v8, v12 ret ``` We can see Add/Sub/Mul are combined with the Sign Extension. ### Relation to the Patch D133739 The patch D133739 introduced an optimization for folding `ADD_VL` / `SUB_VL` / `MUL_VL` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did not consider the non-fixed length vector case, thus this PR could also be considered as an extension for the D133739.	2024-01-17 18:30:27 -08:00
Mikhail Gudim	c1f433849b	[GISel][RISCV] Implement selectShiftMask. (#77572 ) Implement `selectShiftMask` in `GlobalISel`.	2024-01-17 16:25:43 -05:00
Philip Reames	de423cfe3d	[RISCV] Prefer vsetivli for VLMAX when VLEN is exactly known (#75509 ) If VLEN is exactly known, we may be able to use the vsetivli encoding instead of the vsetvli a0, zero, <vtype> encoding. This slightly reduces register pressure. This builds on 632f1c5, but reverses course a bit. It turns out to be quite complicated to canonicalize from VLMAX to immediate early because the sentinel value is widely used in tablegen patterns without knowledge of LMUL. Instead, we canonicalize towards the VLMAX representation, and then pick the immediate form during insertion since we have the LMUL information there. Within InsertVSETVLI, this could reasonably fit in a couple places. If reviewers want me to e.g. move it to emission, let me know. Doing so may require a bit of extra code to e.g. handle comparisons of the two forms, but shouldn't be too complicated.	2024-01-17 12:40:00 -08:00
Simon Pilgrim	d92ce344bf	Revert faecc736e2ac3cd8c77 #74443 [DAG] isSplatValue - node is a splat if all demanded elts have the same whole constant value (#74443 ) Relying on ComputeKnownBits to find a splat is causing miscompilations where a shift of zero is being assumed to give zero, but further simplification leads to a shift of zero by undef, resulting in an unexpected undef value. Fixes #78109	2024-01-17 15:59:33 +00:00
Alex Bradbury	da0755f7b7	[RISCV][test] Test showing missed optimisation for spills/fills of GPR<->FPR moves The fmv can be removed through appropriate logic in RISCVInstrInfo::foldMemoryOperandImpl.	2024-01-17 08:11:06 +00:00
Craig Topper	7fe5269b54	[RISCV] Bump Zfbfmin, Zvfbfmin, and Zvfbfwma to 1.0. (#78021 )	2024-01-16 08:42:21 -08:00
Luke Lau	93d39657f5	[RISCV] Remove -riscv-v-vector-bits-min flag that was left behind. NFC This should have been removed in 74f985b793bf4005e49736f8c2cef8b5cbf7c1ab	2024-01-16 21:30:32 +07:00
Wang Pengcheng	3ac9fe69f7	[RISCV] CodeGen of RVE and ilp32e/lp64e ABIs (#76777 ) This commit includes the necessary changes to clang and LLVM to support codegen of `RVE` and the `ilp32e`/`lp64e` ABIs. The differences between `RVE` and `RVI` are: * `RVE` reduces the integer register count to 16 (x0-x15). * The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits. `RVE` can be combined with all current standard extensions. The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are: * Only 6 integer argument registers (rather than 8). * Only 2 callee-saved registers (rather than 12). * A Stack Alignment of 32bits (rather than 128bits). * ilp32e isn't compatible with D ISA extension. If `ilp32e` or `lp64e` is used with an ISA that has any of the registers x16-x31 and f0-f31, then these registers are considered temporaries. To be compatible with the implementation of ilp32e in GCC, we don't use aligned registers to pass variadic arguments and set stack alignment to 4-bytes for types with length of 2*XLEN. FastCC is also supported on RVE, while GHC isn't since there is only one available register. Differential Revision: https://reviews.llvm.org/D70401	2024-01-16 20:44:30 +08:00
Alex Bradbury	84f7fb6217	[MachineScheduler] Add option to control reordering for store/load clustering (#75338 ) Reordering based on the sort order of the MemOpInfo array was disabled in <https://reviews.llvm.org/D72706>. However, it's not clear this is desirable for all targets. It also makes it more difficult to compare the incremental benefit of enabling load clustering in the selectiondag scheduler as well as the machinescheduler, as the sdag scheduler does seem to allow this reordering. This patch adds a parameter that can control the behaviour on a per-target basis. Split out from #73789.	2024-01-16 07:17:41 +00:00
Luke Lau	286a366d05	[RISCV] Remove vmv.s.x and vmv.x.s lmul pseudo variants (#71501 ) vmv.s.x and vmv.x.s ignore LMUL, so we can replace the PseudoVMV_S_X_MX and PseudoVMV_X_S_MX with just one pseudo each. These pseudos use the VR register class (just like the actual instruction), so we now only have TableGen patterns for vectors of LMUL <= 1. We now rely on the existing combines that shrink LMUL down to 1 for vmv_s_x_vl (and vfmv_s_f_vl). We could look into removing these combines later and just inserting the nodes with the correct type in a later patch. The test diff is due to the fact that a PseudoVMV_S_X/PseudoVMV_X_S no longer carries any information about LMUL, so if it's the only vector pseudo instruction in a block then it now defaults to LMUL=1.	2024-01-16 13:36:24 +07:00
Luke Lau	3b7abf38fb	[RISCV] Add disjoint flag to or ops in RISCVGatherScatterLowering tests. NFC InstCombine will add the disjoint flag to these or instructions. This patch adds them to the tests so that it matches the input RISCVGatherScatterLowering will receive in practice, allowing us to rely on said disjoint flag: https://github.com/llvm/llvm-project/pull/77800#discussion_r1449231844	2024-01-15 14:09:27 +07:00
Luke Lau	0cf768e7f1	[RISCV] Handle disjoint or in RISCVGatherScatterLowering (#77800 ) This patch adds support for the disjoint flag in the non-recursive case, as well as adding an additional check for it in the recursive case. Note that haveNoCommonBitsSet should be equivalent to having the disjoint flag set, and the check can be removed in a follow-up patch. Co-authored-by: Philip Reames <preames@rivosinc.com>	2024-01-15 13:37:09 +07:00
Luke Lau	c07a1fe7b4	[RISCV] Lower vfmv.s.f intrinsics to VFMV_S_F_VL first (#76699 ) Currently vfmv.s.f intrinsics are directly selected to their pseudos via a tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their corresponding VL SDNode, then get selected from a pattern in RISCVInstrInfoVVLPatterns.td. This patch brings vfmv.s.f in line with the other move instructions. Split out from #71501, where we did this to preserve the behaviour of selecting vmv_s_x for VFMV_S_F_VL for small enough immediates.	2024-01-15 12:07:29 +07:00
Min-Yih Hsu	2f2217a8f7	[RISCV] Add missing tests for inttoptr/ptrtoint on scalable vectors (#77857 ) Add missing tests for inttoptr/ptrtoint on scalable vectors. Previously we only had inttoptr/ptrtoint tests for fixed vectors.	2024-01-12 09:52:07 -08:00
Philip Reames	5ce067d592	Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V." This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus re-enables terminator folding for RISCV. The reported miscompile has been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.	2024-01-11 13:20:02 -08:00
Luke Lau	114e6d7ba0	[RISCV] Add test for strided gather with recursive disjoint or. NFC This already gets converted to a strided intrinsic because we currently call haveNoCommonBitsSet when checking or instructions, but an upcoming patch will change this logic and we want to preserve this case. Note that this IR is in the form that comes from instcombine. The splats need to be inline constexprs, otherwise isSplatValue() will fail. (It can't currently handle splats where the shufflevector is an instruction, and the insertelement is a constexpr.)	2024-01-12 00:02:28 +07:00
Luke Lau	3b3ee1f534	[RISCV] Add test for strided gather with disjoint or. NFC	2024-01-11 22:08:57 +07:00
Luke Lau	e8790027b1	[RISCV] Allow vsetvlis with same register AVL in doLocalPostpass (#76801 )	2024-01-11 12:12:46 +07:00
Craig Topper	3378514a4d	[RISCV] Use any_extend for type legalizing atomic_compare_swap with Zacas. (#77669 ) With Zacas we will use amocas.w which doesn't require the input to be sign extended.	2024-01-10 12:41:11 -08:00
Craig Topper	0a1b066bba	[RISCV] Support isel for Zacas for XLen and i32. (#77666 ) This adds new isel patterns for Zacas that take priority over the pseudoinstructions we use for the A extension. Support for 2x XLen types will come in a separate patch since they need to be done differently.	2024-01-10 12:00:40 -08:00
Craig Topper	b788692fa5	[RISCV][NFC] Remove unused CHECK prefixes to fix buildbots. NFC	2024-01-09 23:37:18 -08:00
jiahanxie353	e42a70afab	[RISCV][GISel] IRTranslate and Legalize some instructions with scalable vector type * Add IRTranslate tests for ADD, SUB, AND, OR, and XOR with scalable vector types to show that they work as expected. * Legalize G_ADD, G_SUB, G_AND, G_OR, and G_XOR of scalable vector type for the RISC-V vector extension.	2024-01-09 21:51:30 -07:00
Chia	a79d13f12a	[RISCV][ISel] Use vaaddu with rounding mode rnu for ISD::AVGCEILU. (#77473 ) Similar to #76550, but for `ISD::AVGCEILU`. Specifically, this patch aims to use `vaaddu` with rounding mode rnu (i.e. `vxrm[1:0] = 0b00`) for `ISD::AVGCEILU`. ### Source code ``` define <vscale x 8 x i8> @vaaddu_vv_nxv8i8_ceil(<vscale x 8 x i8> %x, <vscale x 8 x i8> %y) { %xzv = zext <vscale x 8 x i8> %x to <vscale x 8 x i16> %yzv = zext <vscale x 8 x i8> %y to <vscale x 8 x i16> %add = add nuw nsw <vscale x 8 x i16> %xzv, %yzv %one = insertelement <vscale x 8 x i16> poison, i16 1, i32 0 %splat = shufflevector <vscale x 8 x i16> %one, <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer %add1 = add nuw nsw <vscale x 8 x i16> %add, %splat %div = lshr <vscale x 8 x i16> %add1, %splat %ret = trunc <vscale x 8 x i16> %div to <vscale x 8 x i8> ret <vscale x 8 x i8> %ret } ``` ### Before this patch ``` vaaddu_vv_nxv8i8_ceil: vsetvli a0, zero, e8, m1, ta, ma vwaddu.vv v10, v8, v9 vsetvli zero, zero, e16, m2, ta, ma vadd.vi v10, v10, 1 vsetvli zero, zero, e8, m1, ta, ma vnsrl.wi v8, v10, 1 ret ``` ### After this patch ``` vaaddu_vv_nxv8i8_ceil: vsetvli a0, zero, e8, m1, ta, ma csrwi vxrm, 0 vaaddu.vv v8, v8, v9 ret ```	2024-01-10 12:08:16 +09:00
Fangrui Song	6c207ee5d2	[RISCV] Force relocations if initial MCSubtargetInfo contains FeatureRelax (#77436 ) Regarding ``` .option norelax j label .option relax // relaxable instructions // For assembly input, RISCVAsmParser::ParseInstruction will set ForceRelocs (https://reviews.llvm.org/D46423). // For direct object emission, ForceRelocs is not set after https://github.com/llvm/llvm-project/pull/73721 label: ``` The J instruction needs a relocation to ensure the target is correct after linker relaxation. This is related to a limitation in the assembler: RISCVAsmBackend::shouldForceRelocation decides upfront whether a relocation is needed, instead of checking more information (whether there are relaxable fragments in between). Despite the limitation, `j label` produces a relocation in direct object emission mode, but was broken by #73721 due to the shouldForceRelocation limitation. Add a workaround to RISCVTargetELFStreamer to emulate the previous behavior. Link: https://github.com/ClangBuiltLinux/linux/issues/1965	2024-01-09 11:24:21 -08:00
Fangrui Song	7620f03ef7	[MC] Parse SHF_LINK_ORDER argument before section group name (#77407 ) When both SHF_LINK_ORDER | SHF_GROUP flags are set, GNU assembler from 2.35 onwards (https://sourceware.org/PR25381 https://sourceware.org/binutils/docs/as/Section.html) parses the SHF_LINK_ORDER argument before section group name, different from us. This is unfortunate, but does not matter because the `.section` flag `o` is a niche feature only used by compiler instrumentations, not adopted by hand-written assembly, and using both flags is extremely rare. Let's just match GNU assembler. There is another benefit: we now support zero-flag section group with the SHF_LINK_ORDER flag, while previously there wasn't a syntax. While here, print 'G' after 'o' to be clear that the 'G' argument is parsed after the 'o' argument. To make the diff smaller, we don't print 'G' after 'w' in the absence of 'o' for now.	2024-01-09 10:42:34 -08:00
Chia	0c24c175f2	[RISCV][ISel] Use vaaddu with rounding mode rdn for ISD::AVGFLOORU. (#76550 ) This patch aims to use `vaaddu` with rounding mode rdn (i.e. `vxrm[1:0] = 0b10`) for `ISD::AVGFLOORU`. ### Source code ``` define <8 x i8> @vaaddu_auto(ptr %x, ptr %y, ptr %z) { %xv = load <8 x i8>, ptr %x, align 2 %yv = load <8 x i8>, ptr %y, align 2 %xzv = zext <8 x i8> %xv to <8 x i16> %yzv = zext <8 x i8> %yv to <8 x i16> %add = add nuw nsw <8 x i16> %xzv, %yzv %div = lshr <8 x i16> %add, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1> %ret = trunc <8 x i16> %div to <8 x i8> ret <8 x i8> %ret } ``` ### Before this patch ``` vaaddu_auto: vsetivli zero, 8, e8, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vwaddu.vv v10, v8, v9 vnsrl.wi v8, v10, 1 ret ``` ### After this patch ``` vaaddu_auto: vsetivli zero, 8, e8, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) csrwi vxrm, 2 vaaddu.vv v8, v8, v9 ret ``` ### Note on signed averaging addition Based on the rvv spec, there is also a variant for signed averaging addition called `vaadd`. But AFAIU, no matter in which rounding mode, we cannot achieve the semantic of signed averaging addition through `vaadd`. Thus this patch only introduces `vaaddu`.	2024-01-09 15:17:38 +09:00
Craig Topper	a8e9dceb49	[RISCV] Use getELen() instead of hardcoded 64 in lowerBUILD_VECTOR. (#77355 ) This is needed to properly support Zve32x.	2024-01-08 19:36:15 -08:00
Jim Lin	96c4f1034c	[RISCV] Add support predicating for ANDN/ORN/XNOR with short-forward-branch-opt. (#77077 ) ANDN/ORN/XNOR are like other ALU instructions. They should be able to be predicated by CPUs that support short-forward-branch.	2024-01-09 11:12:44 +08:00
Craig Topper	faa326de97	[RISCV] Add branch+c.mv macrofusion for sifive-p450. (#76169 ) sifive-p450 supports a very restricted version of the short forward branch optimization from the sifive-7-series. For sifive-p450, a branch over a single c.mv can be macrofused as a conditional move operation. Due to encoding restrictions on c.mv, we can't conditionally move from X0. That would require c.li instead.	2024-01-08 15:23:26 -08:00
Min-Yih Hsu	478ec63312	[RISCV] Mark VFIRST and VCPOP as SignExtendingOpW (#77022 ) Since their values are small enough ([-1, 65535] & [0, 65535], respectively) to fit into signed 32 bits, any sext (or downcasting + sext) will be redundant. Hence marking them as SignExtendingOpW.	2024-01-08 10:59:06 -08:00
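
Note on the two vaaddu commits listed above (a79d13f12a and 0c24c175f2): the following is a minimal Python sketch, not taken from the LLVM sources, of why a single averaging add can stand in for the widen / add / shift / truncate sequence shown in their "before" output. It assumes the RVV spec's rounding increments for a right shift by d bits (rnu adds the most significant shifted-out bit, rdn adds nothing). The helper names roundoff_unsigned, vaaddu and matches_widened_average are made up for this illustration.

```python
# Illustrative model only: the helper names below are hypothetical and are
# not LLVM or RVV APIs.
RNU = 0b00  # round-to-nearest-up; the vxrm value written for ISD::AVGCEILU
RDN = 0b10  # round-down (truncate); the vxrm value written for ISD::AVGFLOORU


def roundoff_unsigned(v: int, d: int, vxrm: int) -> int:
    """Shift v right by d bits and add the rounding increment for vxrm."""
    if d == 0:
        return v
    if vxrm == RNU:
        r = (v >> (d - 1)) & 1  # most significant bit that gets shifted out
    elif vxrm == RDN:
        r = 0
    else:
        raise ValueError("only rnu and rdn are modelled in this sketch")
    return (v >> d) + r


def vaaddu(x: int, y: int, vxrm: int, sew: int = 8) -> int:
    """Model one element of vaaddu.vv; the sum is formed without overflow."""
    mask = (1 << sew) - 1
    return roundoff_unsigned((x & mask) + (y & mask), 1, vxrm) & mask


def matches_widened_average() -> bool:
    """Compare against the zext / add / lshr / trunc form for every i8 pair."""
    for x in range(256):
        for y in range(256):
            floor_avg = ((x + y) >> 1) & 0xFF      # AVGFLOORU pattern
            ceil_avg = ((x + y + 1) >> 1) & 0xFF   # AVGCEILU pattern
            if vaaddu(x, y, RDN) != floor_avg:
                return False
            if vaaddu(x, y, RNU) != ceil_avg:
                return False
    return True
```

Calling matches_widened_average() walks all 65536 pairs of 8-bit inputs, which is small enough to check exhaustively instead of relying on spot checks.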

3409 Commits