This change adds support for two SiFive vendor-specific interrupt attribute values in clang:
- "SiFive-CLIC-preemptible"
- "SiFive-CLIC-stack-swap"
These can be given together, and can be combined with "machine", but
cannot be combined with any other interrupt attribute values.
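For illustration, a handler using both values might be declared as below (a sketch only; it assumes the clang interrupt attribute accepts the two values together, as described above, and the handler name and body are made up):
```c++
// Hypothetical handler combining both SiFive CLIC attribute values.
__attribute__((interrupt("SiFive-CLIC-preemptible", "SiFive-CLIC-stack-swap")))
void clic_irq_handler(void) {
  // handler body
}
```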
These are handled primarily in RISCVFrameLowering:
- "SiFive-CLIC-stack-swap" entails swapping `sp` with `sf.mscratchcsw`
at function entry and exit, which holds the trap stack pointer.
- "SiFive-CLIC-preemptible" entails saving `mcause` and `mepc` before
re-enabling interrupts using `mstatus`. To save these, `s0` and `s1`
are first spilled to the stack, and then the values are read into
these registers. If these registers are used in the function, their
values will be spilled a second time onto the stack with the generic
callee-saved-register handling. At the end of the function interrupts
are disabled again before `mepc` and `mcause` are restored.
This change also adds support for the following two experimental
extensions, which only contain CSRs:
- XSfsclic - for SiFive's CLIC Supervisor-Mode CSRs
- XSfmclic - for SiFive's CLIC Machine-Mode CSRs
The latter is needed for interrupt support.
The CFI information for this implementation is not correct, but I'd
prefer to correct this in a follow-up. While it's unlikely anyone wants
to unwind through a handler, the CFI information is also used by
debuggers so it would be good to get it right.
Co-authored-by: Ana Pazos <apazos@quicinc.com>
These instructions are included in XRivosVisni. They perform a scalar
insert into a vector and a scalar extract from a vector, each with a
potentially non-zero index. They're very analogous to vmv.s.x and
vmv.x.s, respectively.
The instructions do have a couple of restrictions:
1) Only constant indices are supported, via a uimm5 format.
2) There are no FP variants.
One important property of these instructions is that their throughput
and latency are expected to be LMUL independent.
This handles combining fixed-length disjoint ors to vwadd[u].wv, as was
done for scalable vectors in #86929.
vwadd[u].vv patterns need to be handled separately in a later patch
because the extends are sunk; see #136716.
This is a reland of #99752 with the bug fixed (see test diff in the
third commit in this PR).
All `popcount` libcalls return `int`, but `ISD::CTPOP` returns the type
of the argument, which can be wider than `int`. The fix is to make the
DAG legalizer pass the correct return type to `makeLibCall` and sign-extend
the result afterwards.
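For reference, the intended semantics look roughly like the following sketch (not the legalizer code itself; the `__popcountdi2` prototype is assumed from the usual compiler-rt/libgcc helper):
```c++
#include <cstdint>

// Assumed prototype of the 64-bit popcount libcall; note it returns 'int'.
extern "C" int __popcountdi2(uint64_t x);

// The i64 CTPOP result is the sign-extended 'int' libcall result, which is
// what the fixed legalization now produces.
uint64_t ctpop_i64_via_libcall(uint64_t x) {
  int narrow = __popcountdi2(x);
  return static_cast<uint64_t>(static_cast<int64_t>(narrow));
}
```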
Original commit message:
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.
Pull Request: https://github.com/llvm/llvm-project/pull/101786
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional and dereference the value directly (either
isValid() has been checked previously or the cost is assumed to be valid).
The one case that did, in AMDGPU, used value_or(), which has been replaced
by an isValid() check.
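A sketch of the resulting call-site pattern (the helper below is illustrative, not code from this change, and assumes the post-change getValue() signature):
```c++
#include "llvm/Support/InstructionCost.h"
#include <cstdint>

// Callers that previously wrote Cost.getValue().value_or(0) now guard with
// isValid() and read the value directly.
int64_t costOrZero(llvm::InstructionCost Cost) {
  return Cost.isValid() ? static_cast<int64_t>(Cost.getValue()) : 0;
}
```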
This is a continuation of 22d5890c and adds the necessary logic to
handle SEW!=64 profitably. The interesting case is needing to handle
e.g. a single m1 value which is split via extract_subvector into two
operands, and to form that back into a single m1 operation instead of
letting the vslidedown-by-vlenb/constant sequence be generated. This is
analogous to getSingleShuffleSrc for vnsrl, and we can share a bunch of
code.
If XRivosVizip is available, the ri.vzip2a and ri.vzip2b instructions
can be used to perform an interleave shuffle. This patch only affects the
intrinsic lowering (and thus scalable vectors). Fixed vectors go through
shuffle lowering, and the zip2a (but not zip2b) case is already handled
there.
If XRivosVizip is available, the ri.vunzip2a and ri.vunzip2b instructions
can be used to perform the concatenation-and-register-deinterleave
shuffle. This patch only affects the intrinsic lowering (and thus scalable
vectors, because fixed vectors go through shuffle lowering).
Note that this patch is restricted to e64 for staging purposes only. e64
is obviously profitable (i.e. we remove a vcompress). At e32 and below,
our alternative is a vnsrl instead, and we need a bit more complexity
around lowering with fractional LMUL before the ri.vunzip2a/b versions
become always profitable. I'll post the follow-up change once this
lands.
This change removes the uint64_t constructor on LocationSize,
preventing implicit conversion, and fixes up the APIs that used it to
adapt to the change. Note that I'm adding a couple of explicit conversion
points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.
We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize, which works
just fine for fixed values but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
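A sketch of what an explicit conversion point looks like at a call site (the wrapper function is illustrative):
```c++
#include "llvm/Analysis/MemoryLocation.h"
#include <cstdint>

// With the implicit uint64_t constructor removed, a fixed byte count must be
// wrapped explicitly; previously the raw integer converted silently.
llvm::MemoryLocation fixedSizeLoc(const llvm::Value *Ptr, uint64_t Bytes) {
  return llvm::MemoryLocation(Ptr, llvm::LocationSize::precise(Bytes));
}
```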
If we have a build vector which could be either a splat or a scalar
insert, prefer the scalar insert. At high LMUL, this reduces vector
register pressure (locally, the use will likely still be aligned), and
the amount of work performed for the splat.
This extends the DAG combine introduced in 336b2909 to handle the case
where the prior value is defined by a vmv.s.x instead of a vmv.v.x. If
the vrgather splats the single source element and has no passthru, we
can replace it with a vmv.v.x, which will in turn usually get folded
into a vmerge if a select follows.
There is another branch instruction that also takes an immediate operand,
but its immediate specifies which bit to test for being set or clear. We
only check whether operand2 is an immediate here, so there is no way to
distinguish between the two.
So add new CondCodes COND_CV_BEQIMM/COND_CV_BNEIMM so that we know which
kind of immediate branch instruction was matched in the Select_* pseudos.
Extend the transform introduced in 336b290 to vfmv.v.f. This is fairly
trivial and would have been in the original commit except I hadn't
written the FP tests yet.
If the vrgather.vi is preceded by a vfmv.v.f which writes a superset of
the lanes written by the vrgather, and the vrgather has no passthru, then
the vrgather has no semantic effect.
If the vrgather.vi is preceded by a vmv.v.x which writes a superset of
the lanes written by the vrgather, and the vrgather has no passthru, then
the vrgather has no semantic effect.
This is the start of a mini-series of patches around rewriting
vrgather.vi/vx preceded by vmv.v.x, vfmv.v.f, vmv.s.x, etc., starting
with the simplest, but also lowest-impact, case.
One point I'd like a second opinion on is the out-of-bounds semantic
change. As far as I can tell, all the indices are in bounds by
construction. The doc change is there as much because I couldn't figure
out how to test the alternative as for any other reason.
This can be done with a vrgather.vi/vx, and (possibly) a register move.
The alternative is to do a vrgather.vv with a full width index vector.
We'd already caught the two-operand forms of this shuffle; this patch
specifically handles the single-operand form. Unfortunately that form only
exists in the abstract; it would be nice if we canonicalized shuffles in
some way, wouldn't it?
This is a follow-up to f8ee58a3c and improves code generation for the
XRivosVizip extension.
If we have a slide pair which could be a zipeven or zipodd if the
shuffle was widened, widen the shuffle and then mask the zipeven or
zipodd.
This is basically working around a matching-order issue; we match
the slide-pair variants before trying widening. I considered whether we
should just widen slide pairs without any consideration of the zip
idioms, but the resulting codegen changes look mostly like churn and
show no clear evidence of profitability.
The element type i64 of the BUILD_VECTOR is not legal on RV32, which
means we don't catch the VID pattern once i64 has been legalized.
So try to custom lower it to VID during type legalization.
Fixes https://github.com/llvm/llvm-project/issues/134126.
The matching code was previously written as if we were mutating the
indices to replace undef elements with preferred values, but the actual
lowering code just took a prefix of the index vector. This resulted in
us using undef indices for lanes which should have been defined, and thus
in incorrect codegen.
Longer term, we probably should rewrite the mask, but this seemed like
an easier tactical fix.
InstCombine will combine this zext of an icmp, where the source has a
single bit set, into a lshr plus trunc
(`InstCombinerImpl::transformZExtICmp`):
```llvm
define <vscale x 1 x i8> @f(<vscale x 1 x i64> %x) {
%1 = and <vscale x 1 x i64> %x, splat (i64 8)
%2 = icmp ne <vscale x 1 x i64> %1, splat (i64 0)
%3 = zext <vscale x 1 x i1> %2 to <vscale x 1 x i8>
ret <vscale x 1 x i8> %3
}
```
into:
```llvm
define <vscale x 1 x i8> @reverse_zexticmp_i64(<vscale x 1 x i64> %x) {
%1 = trunc <vscale x 1 x i64> %x to <vscale x 1 x i8>
%2 = lshr <vscale x 1 x i8> %1, splat (i8 2)
%3 = and <vscale x 1 x i8> %2, splat (i8 1)
ret <vscale x 1 x i8> %3
}
```
In a loop, this ends up being unprofitable for RISC-V because the
codegen now goes from:
```asm
f: # @f
.cfi_startproc
# %bb.0:
vsetvli a0, zero, e64, m1, ta, ma
vand.vi v8, v8, 8
vmsne.vi v0, v8, 0
vsetvli zero, zero, e8, mf8, ta, ma
vmv.v.i v8, 0
vmerge.vim v8, v8, 1, v0
ret
```
To a series of narrowing vnsrl.wis:
```asm
f: # @f
.cfi_startproc
# %bb.0:
vsetvli a0, zero, e64, m1, ta, ma
vand.vi v8, v8, 8
vsetvli zero, zero, e32, mf2, ta, ma
vnsrl.wi v8, v8, 3
vsetvli zero, zero, e16, mf4, ta, ma
vnsrl.wi v8, v8, 0
vsetvli zero, zero, e8, mf8, ta, ma
vnsrl.wi v8, v8, 0
ret
```
In the original form, the vmv.v.i is loop invariant and is hoisted out,
and the vmerge.vim usually gets folded away into a masked instruction,
so you usually just end up with a vsetvli + vmsne.vi.
The truncate requires multiple instructions and introduces a vtype
toggle for each one, and is measurably slower on the BPI-F3.
This reverses the transform in RISCVISelLowering for truncations greater
than twice the bitwidth, i.e. it keeps single vnsrl.wis.
Fixes #132245
Previously we only marked fixed-length vector extracts as cheap, so this
extends it to any extract at index 0, which should just be a subreg
extract.
This allows extracts of i1 vectors, and of scalable vectors, to be
considered for DAG combines.
This causes some slight improvements with large legalized fixed-length
vectors, but the underlying motivation for this is to actually prevent
an unprofitable DAG combine on a scalable vector in an upcoming patch.
Fixes #130510.
In RISCV, modify the folding of (X ^ Y == 0) -> (X == Y) to account for
cases where the (X ^ Y) will be re-used.
If a constant is being used for the XOR before a branch, ensure that it
is small enough to fit within a 12-bit immediate field. Otherwise, the
equality check is more efficient than the check against 0; see the
following:
```
# %bb.0:
lui a1, 5
addiw a1, a1, 1365
xor a0, a0, a1
beqz a0, .LBB0_2
# %bb.1:
ret
.LBB0_2:
```
```
# %bb.0:
lui a1, 5
addiw a1, a1, 1365
beq a0, a1, .LBB0_2
# %bb.1:
xor a0, a0, a1
ret
.LBB0_2:
```
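At the source level, the listings above correspond roughly to a pattern like this (illustrative only; 21845 is the 0x5555 constant from the assembly, which does not fit a 12-bit immediate):
```c++
// The XOR result is re-used on the fall-through path, so comparing against
// the constant directly (beq) is cheaper than folding to a beqz against the
// XOR result.
unsigned check(unsigned x) {
  unsigned y = x ^ 21845u; // needs lui+addiw to materialize
  if (y == 0)
    return 1;
  return y; // (X ^ Y) re-used here
}
```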
Similarly, if the XOR is between 1 and a one-bit (i1-sized) value, we
should still fold away the XOR, since that comparison can be optimized as
a comparison against 0.
```
# %bb.0:
slt a0, a0, a1
xor a0, a0, 1
beqz a0, .LBB0_2
# %bb.1:
ret
.LBB0_2:
```
```
# %bb.0:
slt a0, a0, a1
bnez a0, .LBB0_2
# %bb.1:
xor a0, a0, 1
ret
.LBB0_2:
```
One question about my code: I used a hard-coded value for the width of a
RISC-V ALU immediate. Do you know of a way that I can gather this from
the `context`? I was unable to devise one.
For example, in the following situation:
%6:gpr = SLLI %2:gpr, 2
%7:gpr = ADDI killed %6:gpr, 24
%8:gpr = ADD %0:gpr, %7:gpr
If we swap the two add instructions, we can merge the shift and add. The
final code will look something like this:
%7 = SH2ADD %0, %2
%8 = ADDI %7, 24
lib/Target/RISCV/RISCVISelLowering.cpp:4629:26: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]
4629 | for (unsigned i = 0; i != NumElts; ++i) {
| ~ ^ ~~~~~~~
1 error generated.
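For context, the warning and the usual shape of such a fix look like this (a sketch; `NumElts` stands in for the variable from the diagnostic, and the surrounding function is made up):
```c++
// -Wsign-compare fires when an unsigned induction variable is compared with
// a signed bound; giving both sides the same signedness silences it.
void visitElements(int NumElts) {
  for (unsigned i = 0, e = static_cast<unsigned>(NumElts); i != e; ++i) {
    // ... per-element work ...
  }
}
```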
This implements initial code generation support for a subset of the
XRivosVizip extension. Specifically, this adds support for vzipeven,
vzipodd, and vzip2a, but not vzip2b, vunzip2a, or vunzip2b. The others
will follow in separate patches.
One review note: The zipeven/zipodd matchers were recently rewritten to
better match upstream style, so careful review there would be
appreciated. The matchers don't yet support type coercion to wider
types. This will be done in a future patch.
v2048i1 is an MVT, but v2048i8 is not, so we don't support i8 vectors
with more than 1024 elements. Lowering a v2048i1 shufflevector would
require promoting to v2048i8. Since v2048i8 isn't legal and isn't an
MVT, this leads to a crash.
To fix the crash, this patch makes v2048i1 an illegal type.
When we're trying to lower `extractelement + splat` with
vrgather.vi/.vx, we should also check the legality of the source vector
type of the `extractelement`, as the entire transformation assumes legal
types.
Fixes #133020
Given this shuffle:
```
shufflevector <8 x i8> %0, <8 x i8> %1, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 undef, i32 undef, i32 undef, i32 undef>
```
#127272 lowers it with a bunch of vnsrl. If we describe the result in
terms of the shuffle mask, we expect:
```
<0, 4, 8, 12, u, u, u, u>
```
but we actually got:
```
<0, 4, u, u, 8, 12, u, u>
```
for factors larger than 2. This is caused by `CONCAT_VECTORS` on
incorrect (sub)vector types. This patch fixes the issue by
building an aggregate vector with the correct subvector types.
Fixes #132071
The vmv.s.x instruction copies the scalar integer register to element 0
of the destination vector register. If SEW < XLEN, the least-significant
bits are copied and the upper XLEN-SEW bits are ignored.
Co-authored-by: yanming <ming.yan@terapines.com>
The prior logic was reasoning in terms of vsetivli immediates, but using
the vmv.v.x is strongly profitable for high LMUL cases. The key
difference is that the vmv.v.x form is rematerializable during
register allocation, and the vsle form is not.
This change uses the vlmax form of the vsetvli for all cases where the
2 x size can't be encoded as a vsetivli. This has the effect of increasing
VL more than necessary across the vmv.v.x, which could in theory be
problematic performance-wise on some hardware. We can revisit (or
add a tune flag) if this turns out to be noteworthy.
This change adds support for `qci-nest` and `qci-nonest` interrupt
attribute values. Both of these are machine-mode interrupts, which use
instructions in Xqciint to push and pop the A- and T-registers (and a
few others) to and from the stack.
In particular:
- `qci-nonest` uses `qc.c.mienter` to save registers at the start of the
function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qci-nest` uses `qc.c.mienter.nest` to save registers at the start of
the function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qc.c.mienter` and `qc.c.mienter.nest` both push registers ra, s0
(fp), t0-t6, and a0-a7 onto the stack (as well as some CSRs for the
interrupt context). The difference between these is that
`qc.c.mienter.nest` re-enables M-mode interrupts.
- `qc.c.mileaveret` will restore the registers that were saved by
`qc.c.mienter(.nest)`, and return from the interrupt.
These work for both standard M-mode interrupts and the non-maskable
interrupt CSRs added by Xqciint.
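For illustration, a handler using one of these values might be declared as follows (the function name and body are made up):
```c++
// Machine-mode handler: clang saves/restores state with qc.c.mienter.nest and
// qc.c.mileaveret when Xqciint is enabled; use "qci-nonest" to avoid
// re-enabling interrupts.
__attribute__((interrupt("qci-nest")))
void external_irq_handler(void) {
  // handler body
}
```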
The `qc.c.mienter`, `qc.c.mienter.nest` and `qc.c.mileaveret`
instructions are compatible with push and pop instructions, in as much
as they (mostly) only spill the A- and T-registers, so we can use the
`Zcmp` or `Xqccmp` instructions to spill the S-registers. This
combination (`qci-(no)nest` and `Xqccmp`/`Zcmp`) is not implemented in
this change.
The `qc.c.mienter(.nest)` instructions have a specific register storage
order so that, if frame pointers are enabled, the frame-pointer
convention's linked list is preserved past the current interrupt handler
and into the interrupted code and its frames.
Co-authored-by: Pankaj Gode <quic_pgode@quicinc.com>
If we have the shuffle mask <1, u, u, u, 2, u, u, u> with factor 4, we
should have the shuffle mask <1, 2> for lane 0 and <u, u> for lane 1,
and so on. Since we use createSequentialMask to create the shuffle mask,
the shuffle mask for lane 1 would be <u, 0> (derived from <u, u+1>). This
leads to poor code generation.
These were left over from when Craig removed
`__attribute__((interrupt("user")))` support in
05d0caef6081e1a6cb23a5a5afe43dc82e8ca558.
The tests change "interrupt"="user" into "interrupt"="machine" as they
are still intended to be interrupt tests. ISelLowering will now reject
"interrupt"="user". The docs no longer mention "user" as a possible
interrupt attribute argument.
This patch combines (iN vector.reduce.add (zext vXi1 A to vXiN)) into a
vcpop.m instruction (similarly to the bitcast + ctpop pattern). It can be
useful for counting the number of set bits in scalable vector types,
which can't be expressed with bitcast + ctpop (this was previously
discussed here: https://github.com/llvm/llvm-project/pull/74294).
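A scalar reference for why the combine is sound (illustrative; not the DAG combine itself):
```c++
#include <cstddef>
#include <cstdint>

// Summing the zero-extended elements of an i1 vector equals the number of set
// bits in the mask, which is exactly what vcpop.m computes.
uint64_t reduceAddOfZextI1(const bool *Mask, size_t N) {
  uint64_t Sum = 0;
  for (size_t I = 0; I != N; ++I)
    Sum += Mask[I] ? 1u : 0u; // zext i1 -> iN, then add-reduce
  return Sum;
}
```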
This patch adds a function attribute `riscv_vls_cc` for the RISC-V VLS
calling convention, which takes zero or one argument. The argument is the
`ABI_VLEN`, which is the `VLEN` used for passing the fixed-vector
arguments: each fixed-vector argument is wrapped as a scalable (VLA)
vector using the `ABI_VLEN` and handled with the corresponding mechanism.
The range of `ABI_VLEN` is [32, 65536]; if not specified, the default
value is 128.
Here is an example of VLS argument passing:
Non-VLS call:
```
void original_call(__attribute__((vector_size(16))) int arg) {}
=>
define void @original_call(i128 noundef %arg) {
entry:
...
ret void
}
```
VLS call:
```
void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
entry:
...
ret void
}
```
The first, non-VLS call passes the generic 16-byte vector argument as a
flattened integer.
In contrast, the VLS call uses `ABI_VLEN=256`, which wraps the vector to
<vscale x 1 x i32>, where the number of scalable vector elements is
calculated by: `ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.
PsABI PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418
C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/68
With a fix for fully undef masks. These can't reach the lowering code, but
can reach the costing code via e.g. SLP.
This change adds the TTI costing corresponding to the recently added
isMaskedSlidePair lowering for vector shuffles. However, since the
existing costing code hadn't covered slideup, slidedown, or the
(now removed) isElementRotate, the impact is larger in scope than just
that new lowering.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
Co-authored-by: Luke Lau <luke_lau@icloud.com>