1925 Commits

Author SHA1 Message Date
Sam Elliott
cfc5baf6e6
[RISCV] SiFive CLIC Support (#132481)
This change adds support for two SiFive vendor interrupt attribute values in clang:
- "SiFive-CLIC-preemptible"
- "SiFive-CLIC-stack-swap"

These can be given together, and can be combined with "machine", but
cannot be combined with any other interrupt attribute values.
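
For illustration, a sketch of how one of these values might appear at the
IR level, by analogy with the existing `"interrupt"="machine"` string
attribute (the function and the single-value spelling are assumptions; the
encoding of combined values is not shown):

```llvm
; Hypothetical handler using one of the new attribute values.
define void @clic_handler() "interrupt"="SiFive-CLIC-stack-swap" {
  ret void
}
```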

These are handled primarily in RISCVFrameLowering:
- "SiFive-CLIC-stack-swap" entails swapping `sp` with `sf.mscratchcsw`
  at function entry and exit, which holds the trap stack pointer.
- "SiFive-CLIC-preemptible" entails saving `mcause` and `mepc` before
  re-enabling interrupts using `mstatus`. To save these, `s0` and `s1`
  are first spilled to the stack, and then the values are read into
  these registers. If these registers are used in the function, their
  values will be spilled a second time onto the stack with the generic
  callee-saved-register handling. At the end of the function interrupts
  are disabled again before `mepc` and `mcause` are restored.

This change also adds support for the following two experimental
extensions, which only contain CSRs:
- XSfsclic - for SiFive's CLIC Supervisor-Mode CSRs
- XSfmclic - for SiFive's CLIC Machine-Mode CSRs

The latter is needed for interrupt support.

The CFI information for this implementation is not correct, but I'd
prefer to correct this in a follow-up. While it's unlikely anyone wants
to unwind through a handler, the CFI information is also used by
debuggers so it would be good to get it right.

Co-authored-by: Ana Pazos <apazos@quicinc.com>
2025-04-25 17:12:27 -07:00
Matt Arsenault
91865ac9ba
Use isa instead of !dyn_cast (#137344) 2025-04-25 19:11:56 +02:00
Philip Reames
9062a38d5d
[RISCV] Add codegen support for ri.vinsert.v.x and ri.vextract.x.v (#136708)
These instructions are included in XRivosVisni. They perform a scalar
insert into a vector (with a potentially non-zero index) and a scalar
extract from a vector (with a potentially non-zero index) respectively.
They're very analogous to vmv.s.x and vmv.x.s respectively.

The instructions do have a couple restrictions:
1) Only constant indices are supported, with a uimm5 format.
2) There are no FP variants.

One important property of these instructions is that their throughput
and latency are expected to be LMUL independent.
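
For illustration, a minimal sketch (types and indices are hypothetical;
the indices must be constants in uimm5 range) of IR that could select to
these instructions:

```llvm
; Scalar insert at a constant non-zero index -> candidate for ri.vinsert.v.x.
define <vscale x 4 x i32> @ins(<vscale x 4 x i32> %v, i32 %x) {
  %r = insertelement <vscale x 4 x i32> %v, i32 %x, i64 3
  ret <vscale x 4 x i32> %r
}

; Scalar extract at a constant non-zero index -> candidate for ri.vextract.x.v.
define i32 @ext(<vscale x 4 x i32> %v) {
  %r = extractelement <vscale x 4 x i32> %v, i64 3
  ret i32 %r
}
```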
2025-04-25 07:59:40 -07:00
Philip Reames
b278aa3197
[RISCV] Make xrivosvizip interleave2 and deinterleave2 undef safe (#136733)
We're duplicating uses here, so we need to freeze the inputs.

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2025-04-24 08:25:12 -07:00
Sergei Barannikov
7af555e524
[ARM][RISCV] Partially revert #101786 (#137120)
The change as is breaks the Linux kernel build as pointed out in the
comments.
2025-04-24 10:13:05 +03:00
Luke Lau
717efc0a99
[RISCV] Support disjoint RISCVISD::OR_VL in combineOp_VLToVWOp_VL (#136820)
This handles combining fixed-length disjoint ors to vwadd[u].wv, as was
done for scalable vectors in #86929.

vwadd[u].vv patterns need to be handled with a separate pattern in a
separate patch, due to the extends being sunk; see #136716.
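
A sketch of the fixed-length shape this enables (an assumed example, not
taken from the patch):

```llvm
; The disjoint flag lets the or be treated as an add, so it can be
; combined with the zero-extended operand into vwaddu.wv.
define <4 x i16> @f(<4 x i16> %x, <4 x i8> %y) {
  %y.ext = zext <4 x i8> %y to <4 x i16>
  %or = or disjoint <4 x i16> %x, %y.ext
  ret <4 x i16> %or
}
```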
2025-04-23 18:43:55 +08:00
Sergei Barannikov
11a3de7e98
[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#101786)
This is a reland of #99752 with the bug fixed (see test diff in the
third commit in this PR).
All `popcount` libcalls return `int`, but `ISD::CTPOP` returns the type
of the argument, which can be wider than `int`. The fix is to make DAG
legalizer pass the correct return type to `makeLibCall` and sign-extend
the result afterwards.
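
A sketch of the problematic case (assumed, not taken from the test diff):

```llvm
; On a 32-bit target this lowers to a popcount libcall that returns int;
; the legalizer must request the int return type from makeLibCall and
; then sign-extend the result back to i64.
define i64 @popcount(i64 %x) {
  %c = call i64 @llvm.ctpop.i64(i64 %x)
  ret i64 %c
}
```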

Original commit message:
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.

Pull Request: https://github.com/llvm/llvm-project/pull/101786
2025-04-23 12:43:05 +03:00
David Green
98b6f8dc69
[CostModel] Remove optional from InstructionCost::getValue() (#135596)
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does, in AMDGPU, used value_or, which has been replaced
by an isValid() check.
2025-04-23 07:46:27 +01:00
Philip Reames
4dbf67de40
[RISCV] Lower SEW<=32 vector_deinterleave(2) via vunzip2{a,b} (#136463)
This is a continuation of 722d5890 and adds the necessary logic to
handle SEW!=64 profitably. The interesting case is needing to handle
e.g. a single m1 source which is split via extract_subvector into two
operands, and to form that back into a single m1 operation instead of
letting the vslidedown-by-vlenb/constant sequence be generated. This is
analogous to the getSingleShuffleSrc for vnsrl, and we can share a bunch
of code.
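
A sketch of the SEW=32 form in question (types assumed for illustration):

```llvm
; Deinterleave of a scalable vector; now lowered via ri.vunzip2a/b
; instead of a vslidedown sequence when SEW<=32.
define {<vscale x 2 x i32>, <vscale x 2 x i32>} @deint(<vscale x 4 x i32> %v) {
  %res = call {<vscale x 2 x i32>, <vscale x 2 x i32>} @llvm.vector.deinterleave2.nxv4i32(<vscale x 4 x i32> %v)
  ret {<vscale x 2 x i32>, <vscale x 2 x i32>} %res
}
```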
2025-04-22 09:26:36 -07:00
Philip Reames
901ac60db7
[RISCV] Use ri.vzip2{a,b} for interleave2 if available (#136364)
If XRivosVizip is available, the ri.vzip2a and ri.vzip2b instructions
can be used to perform an interleave shuffle. This patch only affects the
intrinsic lowering (and thus scalable vectors). Fixed vectors go through
shuffle lowering, where the zip2a (but not zip2b) case is already handled.
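
The intrinsic form affected, sketched with assumed types:

```llvm
; Lowered using ri.vzip2a/ri.vzip2b when XRivosVizip is available.
define <vscale x 4 x i32> @ileave(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) {
  %res = call <vscale x 4 x i32> @llvm.vector.interleave2.nxv4i32(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b)
  ret <vscale x 4 x i32> %res
}
```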
2025-04-22 08:25:17 -07:00
quic_hchandel
8f8853a574
[RISCV] Add ISel patterns for Xqcia instructions (#136548)
This patch adds instruction selection patterns for generating the
integer arithmetic instructions.
2025-04-22 15:40:03 +05:30
Philip Reames
722d5890cc
[RISCV] Lower e64 vector_deinterleave via ri.vunzip2{a,b} if available (#136321)
If XRivosVizip is available, ri.vunzip2a and ri.vunzip2b can be used to
perform the concatenation and register deinterleave shuffle. This patch
only affects the intrinsic lowering (and thus scalable vectors, because
fixed vectors go through shuffle lowering).

Note that this patch is restricted to e64 for staging purposes only. e64
is obviously profitable (i.e. we remove a vcompress). At e32 and below,
our alternative is a vnsrl instead, and we need a bit more complexity
around lowering with fractional LMUL before the ri.vunzip2a/b versions
become always profitable. I'll post the follow-up change once this
lands.
2025-04-18 10:40:15 -07:00
Philip Reames
f2ecd86e34
[Analysis] Remove implicit LocationSize conversion from uint64_t (#133342)
This change removes the uint64_t constructor on LocationSize,
preventing implicit conversion, and fixes up the affected APIs to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.

We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
2025-04-18 07:46:31 -07:00
Philip Reames
50f9b34b53
[RISCV] Prefer vmv.s.x for build_vector a, undef, ..., undef (#136164)
If we have a build vector which could be either a splat or a scalar
insert, prefer the scalar insert. At high LMUL, this reduces vector
register pressure (locally, the use will likely still be aligned), and
the amount of work performed for the splat.
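
A sketch of the pattern (hypothetical types):

```llvm
; Only lane 0 is defined, so this build_vector can be lowered as a
; scalar insert (vmv.s.x) rather than a splat.
define <8 x i32> @single_lane(i32 %a) {
  %v = insertelement <8 x i32> poison, i32 %a, i64 0
  ret <8 x i32> %v
}
```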
2025-04-17 19:51:35 -07:00
Philip Reames
7866fc2bd9
[RISCV] Rewrite vrgather.vx undef, (vmv.s.x), 0, v0 as vmv.v.x (#136010)
This extends the DAG combine introduced in 336b2909 to handle the case
where the prior value is defined by a vmv.s.x instead of a vmv.v.x. If
the vrgather splats the single source element and has no passthru, we
can replace it with a vmv.v.x - which will in turn usually get folded
into a vmerge if a select follows.
2025-04-17 10:06:43 -07:00
Jim Lin
0439a4eca7
[RISCV] Add new CondCode COND_CV_BEQIMM/COND_CV_BNEIMM for CV immediate branch (#135771)
There is another branch instruction that also takes an immediate operand,
but its immediate is used to specify which bit to test (set or clear).
Since we only check here whether operand2 is an immediate, there is no
way to distinguish between the two.

So add the new CondCodes COND_CV_BEQIMM/COND_CV_BNEIMM so that we know
which kind of immediate branch instruction was matched in the Select_*
pseudos.
2025-04-16 10:16:31 +08:00
Philip Reames
a6b424e1e7
[RISCV] Extend redundant vrgather.vx peephole to vfmv.v.f (#135503)
Extend the transform introduced in 336b290 to vfmv.v.f. This is fairly
trivial and would have been in the original commit except I hadn't
written the FP tests yet.

If the vrgather.vi is preceded by a vfmv.v.f which writes a superset of
the lanes written by the vrgather, and the vrgather has no passthru, then
the vrgather has no semantic effect.
2025-04-14 18:13:12 -07:00
Philip Reames
336b290923
[RISCV] Use a DAG combine to prune pointless vrgather.vi (#135392)
If the vrgather.vi is preceded by a vmv.v.x which writes a superset of
the lanes written by the vrgather, and the vrgather has no passthru, then
the vrgather has no semantic effect.

This is the start of a mini-series of patches around rewriting
vrgather.vi/vx preceded by vmv.v.x, vfmv.v.f, vmv.s.x, etc., starting
with the simplest, but also lowest impact, case.

One point I'd like a second opinion on is the out-of-bounds semantic
change. As far as I can tell, all the indices are in bounds by
construction. The doc change is there as much because I couldn't figure
out how to test the alternative as for any other reason.
2025-04-11 20:02:53 -07:00
Sergey Kachkov
bc2a5b5466
[RISCV] Explicitly set FRM defs as non-dead to prevent their reordering with instructions that may use it (#135176)
Fixes #135172. The proposed solution is to conservatively clear the dead
flag on all $frm defs in AdjustInstrPostInstrSelection.
2025-04-11 15:07:51 +03:00
Philip Reames
f40001372b
[RISCV] Lower a shuffle which is nearly identity except one replicated element (#135292)
This can be done with a vrgather.vi/vx, and (possibly) a register move.
The alternative is to do a vrgather.vv with a full width index vector.

We'd already caught the two-operand forms of this shuffle; this patch
specifically handles the single-operand form.

Unfortunately that's only true in the abstract; it would be nice if we
canonicalized shuffles in some way, wouldn't it?
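
An illustrative shape (mask chosen for illustration, not from the tests):

```llvm
; Identity shuffle except that lane 3 replicates lane 0.
define <8 x i32> @nearly_identity(<8 x i32> %v) {
  %s = shufflevector <8 x i32> %v, <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 4, i32 5, i32 6, i32 7>
  ret <8 x i32> %s
}
```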
2025-04-10 19:45:04 -07:00
Philip Reames
d34437e9e1
[RISCV] Recognize a zipeven/zipodd requiring larger SEW (#134923)
This is a follow up to f8ee58a3c, and improves code generation for the
XRivosVizip extension.

If we have a slide pair which could be a zipeven or zipodd if the
shuffle was widened, widen the shuffle and then mask the zipeven or
zipodd.

This is basically working around an order of matching issue; we match
the slide pair variants before trying widening. I considered whether we
should just widen slide pairs without any consideration of the zip
idioms, but the resulting codegen changes look mostly like churn, and
have no clear evidence of profitability.
2025-04-10 02:29:04 -07:00
Jim Lin
d28b4d8916
[RISCV] Lower BUILD_VECTOR with i64 type to VID on RV32 if possible (#132339)
The element type i64 of the BUILD_VECTOR is not legal on RV32, and the
VID pattern is no longer caught once the i64 elements have been
legalized. So try to custom lower it to VID during type legalization.
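
A sketch of the kind of BUILD_VECTOR in question (values chosen for
illustration):

```llvm
; A linear i64 sequence: recognizable as vid.v before the i64 elements
; are split into 32-bit halves on RV32.
define <4 x i64> @vid() {
  ret <4 x i64> <i64 0, i64 1, i64 2, i64 3>
}
```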
2025-04-09 13:33:58 +08:00
Philip Reames
c1e95b2e5e
[RISCV] Fix matching bug in VLA shuffle lowering (#134750)
Fix https://github.com/llvm/llvm-project/issues/134126.

The matching code was previously written as if we were mutating the
indices to replace undef elements with preferred values, but the actual
lowering code just took a prefix of the index vector. This resulted in
us using undef indices for lanes which should have been defined,
resulting in incorrect codegen.

Longer term, we probably should rewrite the mask, but this seemed like
an easier tactical fix.
2025-04-08 07:20:25 -07:00
Simon Pilgrim
387a8859cf Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI. 2025-04-07 10:29:16 +01:00
Luke Lau
d62d15e298
[RISCV] Undo unprofitable zext of icmp combine (#134306)
InstCombine will combine this zext of an icmp, where the source has a
single bit set, into an lshr plus trunc
(`InstCombinerImpl::transformZExtICmp`):

```llvm
define <vscale x 1 x i8> @f(<vscale x 1 x i64> %x) {
  %1 = and <vscale x 1 x i64> %x, splat (i64 8)
  %2 = icmp ne <vscale x 1 x i64> %1, splat (i64 0)
  %3 = zext <vscale x 1 x i1> %2 to <vscale x 1 x i8>
  ret <vscale x 1 x i8> %3
}
```

```llvm
define <vscale x 1 x i8> @reverse_zexticmp_i64(<vscale x 1 x i64> %x) {
  %1 = trunc <vscale x 1 x i64> %x to <vscale x 1 x i8>
  %2 = lshr <vscale x 1 x i8> %1, splat (i8 2)
  %3 = and <vscale x 1 x i8> %2, splat (i8 1)
  ret <vscale x 1 x i8> %3
}
```

In a loop, this ends up being unprofitable for RISC-V because the
codegen now goes from:

```asm
f:                                      # @f
	.cfi_startproc
# %bb.0:
	vsetvli	a0, zero, e64, m1, ta, ma
	vand.vi	v8, v8, 8
	vmsne.vi	v0, v8, 0
	vsetvli	zero, zero, e8, mf8, ta, ma
	vmv.v.i	v8, 0
	vmerge.vim	v8, v8, 1, v0
	ret
```

To a series of narrowing vnsrl.wis:

```asm
f:                                      # @f
	.cfi_startproc
# %bb.0:
	vsetvli	a0, zero, e64, m1, ta, ma
	vand.vi	v8, v8, 8
	vsetvli	zero, zero, e32, mf2, ta, ma
	vnsrl.wi	v8, v8, 3
	vsetvli	zero, zero, e16, mf4, ta, ma
	vnsrl.wi	v8, v8, 0
	vsetvli	zero, zero, e8, mf8, ta, ma
	vnsrl.wi	v8, v8, 0
	ret
```

In the original form, the vmv.v.i is loop invariant and is hoisted out,
and the vmerge.vim usually gets folded away into a masked instruction,
so you usually just end up with a vsetvli + vmsne.vi.

The truncate requires multiple instructions and introduces a vtype
toggle for each one, and is measurably slower on the BPI-F3.

This reverses the transform in RISCVISelLowering for truncations greater
than twice the bitwidth, i.e. it keeps single vnsrl.wis.

Fixes #132245
2025-04-04 19:05:59 +01:00
Luke Lau
711b15d179
[RISCV] Mark subvector extracts from index 0 as cheap (#134101)
Previously we only marked fixed-length vector extracts as cheap, so this
extends it to any extract at index 0, which should just be a subreg
extract.

This allows extracts of i1 vectors to be considered for DAG combines,
but also scalable vectors too.

This causes some slight improvements with large legalized fixed-length
vectors, but the underlying motivation for this is to actually prevent
an unprofitable DAG combine on a scalable vector in an upcoming patch.
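
A sketch of an index-0 extract now treated as cheap (types assumed):

```llvm
; Extracting the low subvector is just a subregister extract.
define <vscale x 2 x i32> @low_half(<vscale x 4 x i32> %v) {
  %sub = call <vscale x 2 x i32> @llvm.vector.extract.nxv2i32.nxv4i32(<vscale x 4 x i32> %v, i64 0)
  ret <vscale x 2 x i32> %sub
}
```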
2025-04-02 17:57:13 +01:00
Ryan Buchner
fa2a6d68c6
[CodeGenPrepare][RISCV] Combine (X ^ Y) and (X == Y) where appropriate (#130922)
Fixes #130510.

In RISCV, modify the folding of (X ^ Y == 0) -> (X == Y) to account for
cases where the (X ^ Y) will be re-used.

If a constant is being used for the XOR before a branch, ensure that it
is small enough to fit within a 12-bit immediate field. Otherwise, the
equality check is more efficient than the check against 0, see the
following:
```
# %bb.0:
        lui     a1, 5
        addiw   a1, a1, 1365
        xor     a0, a0, a1
        beqz    a0, .LBB0_2
# %bb.1: 
        ret
.LBB0_2: 
```

```
# %bb.0:
        lui     a1, 5
        addiw   a1, a1, 1365
        beq    a0, a1, .LBB0_2
# %bb.1: 
        xor     a0, a0, a1
        ret
.LBB0_2: 
```

Similarly, if the XOR is between 1 and a size-one integer, we should
still fold away the XOR, since that comparison can be optimized as a
comparison against 0.
```
# %bb.0:
        slt a0, a0, a1
        xor  a0, a0, 1
        beqz    a0, .LBB0_2
# %bb.1: 
        ret
.LBB0_2: 
```

```
# %bb.0:
        slt a0, a0, a1
        bnez    a0, .LBB0_2
# %bb.1: 
        xor  a0, a0, 1
        ret
.LBB0_2: 
```

One question about my code: I used a hard-coded value for the width of a
RISCV ALU immediate. Do you know of a way that I can gather this from
the `context`? I was unable to devise one.
2025-04-02 09:56:09 -07:00
Kazu Hirata
86c382514e
[Target] Construct SmallVector with ArrayRef (NFC) (#134019) 2025-04-01 21:59:19 -07:00
Stefan Pintilie
8d69e953b5
[RISCV] Add combine for shadd family of instructions. (#130829)
For example for the following situation:
  %6:gpr = SLLI %2:gpr, 2
  %7:gpr = ADDI killed %6:gpr, 24
  %8:gpr = ADD %0:gpr, %7:gpr

If we swap the two add instructions we can merge the shift and add. The
final code will look something like this:
  %7 = SH2ADD %0, %2
  %8 = ADDI %7, 24
2025-03-31 10:02:12 -04:00
Philip Reames
e8059467ef [RISCV] Fix -Wsign-compare warning from f8ee58a
lib/Target/RISCV/RISCVISelLowering.cpp:4629:26: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]
 4629 |   for (unsigned i = 0; i != NumElts; ++i) {
      |                        ~ ^  ~~~~~~~
1 error generated.
2025-03-29 15:46:56 -07:00
Philip Reames
f8ee58a3cb
[RISCV] Initial codegen support for the XRivosVizip extension (#131933)
This implements initial code generation support for a subset of the
xrivosvizip extension. Specifically, this adds support for vzipeven,
vzipodd, and vzip2a, but not vzip2b, vunzip2a, or vunzip2b. The others
will follow in separate patches.

One review note: The zipeven/zipodd matchers were recently rewritten to
better match upstream style, so careful review there would be
appreciated. The matchers don't yet support type coercion to wider
types. This will be done in a future patch.
2025-03-29 15:25:56 -07:00
Craig Topper
d131b78e06
[RISCV] Disable i1 fixed vectors with more than 1024 elements. (#133267)
v2048i1 is an MVT, but v2048i8 is not, so we don't support i8 vectors
with more than 1024 elements. Lowering a v2048i1 shufflevector would
require promoting to v2048i8. Since v2048i8 isn't legal and isn't an
MVT, this leads to a crash.

To fix the crash, this patch makes v2048i1 an illegal type.
2025-03-27 19:12:21 -07:00
Craig Topper
f4e14e7cf3 [RISCV] Const correct reference argument to isElementRotate. NFC 2025-03-27 17:23:00 -07:00
Min-Yih Hsu
25c5bad2f2
[RISCV] Check the legality of source vector types in matchSplatAsGather (#133028)
When we're trying to lower `extractelement + splat` with
vrgather.vi/.vx, we should also check the legality of the source vector
type of the `extractelement`, as the entire transformation assumes legal
types.

Fixes #133020
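
A sketch of the pattern being matched (types illustrative, not the
reproducer from #133020):

```llvm
; extractelement + splat: only valid to turn into vrgather.vi/.vx if the
; source vector type of the extract is itself legal.
define <4 x i32> @splat_lane0(<vscale x 4 x i32> %src) {
  %e = extractelement <vscale x 4 x i32> %src, i64 0
  %head = insertelement <4 x i32> poison, i32 %e, i64 0
  %splat = shufflevector <4 x i32> %head, <4 x i32> poison, <4 x i32> zeroinitializer
  ret <4 x i32> %splat
}
```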
2025-03-26 12:07:02 -07:00
Piotr Fusik
96925fa84c
[RISCV] Add vector hasAndNot to enable optimizations (#132438)
Enables transforms that emit the VANDN instruction.
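
A minimal sketch of the and-not shape involved (assumed example):

```llvm
; With hasAndNot returning true, combines keep this as a single and-not,
; which selects to vandn.vv.
define <8 x i32> @andn(<8 x i32> %a, <8 x i32> %b) {
  %not = xor <8 x i32> %b, splat (i32 -1)
  %r = and <8 x i32> %a, %not
  ret <8 x i32> %r
}
```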

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2025-03-26 10:40:00 +01:00
MingYan
b75dad090c
[RISCV] Support VP_SPLAT mask operations (#132345)
When val is a constant, it will:
  (vp.splat val, mask, vl)
    -> (select val, (riscv_vmset_vl vl), (riscv_vmclr_vl vl))
Otherwise:
  (vp.splat val, mask, vl)
    -> (vmsne_vl (vmv_v_x_vl (zext val), vl), splat(zero), mask, vl)
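
A sketch of the intrinsic form handled (the signature and types here are
assumptions):

```llvm
declare <vscale x 4 x i1> @llvm.experimental.vp.splat.nxv4i1(i1, <vscale x 4 x i1>, i32)

; Splat of an i1 value into a mask vector under an EVL.
define <vscale x 4 x i1> @mask_splat(i1 %val, <vscale x 4 x i1> %m, i32 %evl) {
  %r = call <vscale x 4 x i1> @llvm.experimental.vp.splat.nxv4i1(i1 %val, <vscale x 4 x i1> %m, i32 %evl)
  ret <vscale x 4 x i1> %r
}
```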

---------

Co-authored-by: yanming <ming.yan@terapines.com>
2025-03-24 15:26:58 +08:00
Kazu Hirata
f3e8e80563
[llvm] Construct SmallVector with ArrayRef (NFC) (#132560) 2025-03-22 13:11:31 -07:00
Philip Reames
8d78b7cc7d
[RISCV] Introduce RISCV::RVVBytesPerBlock to simplify code [nfc] (#132436) 2025-03-21 11:11:54 -07:00
Min-Yih Hsu
03ceb26b55
[RISCV] Fix incorrect slide offset when using vnsrl to de-interleave (#132123)
Given this shuffle:
```
shufflevector <8 x i8> %0, <8 x i8> %1, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 undef, i32 undef, i32 undef, i32 undef>
```
#127272 lowers it with a bunch of vnsrl. If we describe the result in
terms of the shuffle mask, we expect:
```
<0, 4, 8, 12, u, u, u, u>
```
but we actually got:
```
<0, 4, u, u, 8, 12, u, u>
```
for factors larger than 2. This is caused by `CONCAT_VECTORS` on
incorrect (sub)vector types. This patch fixes the issue by building an
aggregate vector with the correct subvector types.

Fix #132071
2025-03-20 09:06:24 -07:00
ming
a274ea1e3a
[RISCV] Call SimplifyDemandedBits on the scalar input of vmv_s_x_vl (#131711)
The vmv.s.x instruction copies the scalar integer register to element 0
of the destination vector register. If SEW < XLEN, the least-significant
bits are copied and the upper XLEN-SEW bits are ignored.

Co-authored-by: yanming <ming.yan@terapines.com>
2025-03-18 19:29:22 -07:00
Philip Reames
22f7897374
[RISCV] Use vmv.v.x for any rv32 e64 splat with equal halves (#130530)
The prior logic was reasoning in terms of vsetivli immediates, but using
the vmv.v.x is strongly profitable for high LMUL cases. The key
difference is that the vmv.v.x form is rematerializable during
register allocation, and the vlse form is not.

This change uses the vlmax form of the vsetvli for all cases where the
2 x size can't be encoded as a vsetivli. This has the effect of increasing
VL more than necessary across the vmv.v.x, which could in theory be
problematic performance-wise on some hardware. We can revisit (or
add a tune flag) if this turns out to be noteworthy.
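
A sketch of a qualifying splat (constant chosen for illustration):

```llvm
; 0x0000000100000001: both 32-bit halves are equal, so on rv32 this can
; be materialized with a single vmv.v.x over e32 at doubled VL.
define <vscale x 4 x i64> @equal_halves() {
  ret <vscale x 4 x i64> splat (i64 4294967297)
}
```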
2025-03-10 11:11:53 -07:00
Sam Elliott
3492245ac0
[RISCV] QCI Interrupt Support (#129957)
This change adds support for `qci-nest` and `qci-nonest` interrupt
attribute values. Both of these are machine-mode interrupts, which use
instructions in Xqciint to push and pop A- and T-registers (and a few
others) to and from the stack.

In particular:
- `qci-nonest` uses `qc.c.mienter` to save registers at the start of the
function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qci-nest` uses `qc.c.mienter.nest` to save registers at the start of
the function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qc.c.mienter` and `qc.c.mienter.nest` both push registers ra, s0
(fp), t0-t6, and a0-a10 onto the stack (as well as some CSRs for the
interrupt context). The difference between these is that
`qc.c.mienter.nest` re-enables M-mode interrupts.
- `qc.c.mileaveret` will restore the registers that were saved by
`qc.c.mienter(.nest)`, and return from the interrupt.

These work for both standard M-mode interrupts and the non-maskable
interrupt CSRs added by Xqciint.

The `qc.c.mienter`, `qc.c.mienter.nest` and `qc.c.mileaveret`
instructions are compatible with push and pop instructions, inasmuch
as they (mostly) only spill the A- and T-registers, so we can use the
`Zcmp` or `Xqccmp` instructions to spill the S-registers. This
combination (`qci-(no)nest` and `Xqccmp`/`Zcmp`) is not implemented in
this change.

The `qc.c.mienter(.nest)` instructions have a specific register storage
order so they preserve the frame pointer convention linked list past the
current interrupt handler and into the interrupted code and frames if
frame pointers are enabled.
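
For illustration, the assumed IR-level spelling, by analogy with the
existing `"interrupt"="machine"` string attribute (the function itself is
hypothetical and requires Xqciint):

```llvm
define void @irq_handler() "interrupt"="qci-nest" {
  ret void
}
```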

Co-authored-by: Pankaj Gode <quic_pgode@quicinc.com>
2025-03-06 13:31:08 -08:00
Jim Lin
a0a904e946
[RISCV] Collect shuffle mask for the lane not by createSequentialMask (#129830)
If we have the shuffle mask <1, u, u, u, 2, u, u, u> with factor 4, we
should have the shuffle mask <1, 2> for lane 0 and <u, u> for lane 1,
and so on. Since we use createSequentialMask to create the shuffle mask,
the shuffle mask for lane 1 would be <u, 0> (derived from <u, u+1>). This
leads to poor code generation.
2025-03-05 15:31:16 +08:00
Sam Elliott
ee4bc5a8ca
[RISCV] Remove Last Traces of User Interrupts (#129300)
These were left over from when Craig removed
`__attribute__((interrupt("user")))` support in
05d0caef6081e1a6cb23a5a5afe43dc82e8ca558.

The tests change "interrupt"="user" into "interrupt"="machine" as they
are still intending to be interrupt tests. ISelLowering will now reject
"interrupt"="user". The docs no longer mention "user" as a possible
interrupt attribute argument.
2025-03-04 11:36:16 -08:00
Petr Penzin
b44fbdee00
[RISCV] Tune flag for fast vrgather.vv (#124664)
Add tune knob for N*Log2(N) vrgather.vv cost.
2025-03-03 16:04:49 -08:00
Sergey Kachkov
3dc799162f
[RISCV] Add DAG combine to convert (iN reduce.add (zext (vXi1 A to vXiN)) into vcpop.m (#127497)
This patch combines (iN vector.reduce.add (zext vXi1 A to vXiN)) into a
vcpop.m instruction (similarly to the bitcast + ctpop pattern). It can be
useful for counting the number of set bits in scalable vector types, which
can't be expressed with bitcast + ctpop (this was previously discussed
here: https://github.com/llvm/llvm-project/pull/74294).
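
A sketch of the pattern being combined (types assumed):

```llvm
; Counts the set bits of a scalable mask; now selected to vcpop.m.
define i32 @popcount_mask(<vscale x 4 x i1> %m) {
  %z = zext <vscale x 4 x i1> %m to <vscale x 4 x i32>
  %r = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> %z)
  ret i32 %r
}
```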
2025-03-03 16:27:52 +03:00
Brandon Wu
c804e86f55
[RISCV][VLS] Support RISCV VLS calling convention (#100346)
This patch adds a function attribute `riscv_vls_cc` for the RISCV VLS
calling convention, which takes 0 or 1 arguments. The argument is the
`ABI_VLEN`, i.e. the `VLEN` used for passing fixed-vector arguments: it
wraps each such argument as a scalable vector (VLA) using the `ABI_VLEN`
and uses the corresponding mechanism to handle it. The range of
`ABI_VLEN` is [32, 65536]; if not specified, the default value is 128.

Here is an example of VLS argument passing:
Non-VLS call:
```
  void original_call(__attribute__((vector_size(16))) int arg) {}
=>
  define void @original_call(i128 noundef %arg) {
  entry:
    ...
    ret void
  }
```
VLS call:
```
  void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
  define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
  entry:
    ...
    ret void
  }
```

The first, non-VLS, call passes the generic 16-byte vector argument as a
flattened integer. By contrast, the VLS call uses `ABI_VLEN=256`, which
wraps the vector as <vscale x 1 x i32>, where the number of scalable
vector elements is calculated by:
`ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.

PsABI PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418
C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/68
2025-03-03 12:39:35 +08:00
Philip Reames
248be98418 Reapply "[RISCV][TTI] Add shuffle costing for masked slide lowering (#128537)"
With a fix for fully undef masks.  These can't reach the lowering code, but
can reach the costing code via e.g. SLP.

This change adds the TTI costing corresponding to the recently added
isMaskedSlidePair lowering for vector shuffles. However, since the
existing costing code hadn't covered slideup, slidedown, or the
(now removed) isElementRotate, the impact is larger in scope than just
that new lowering.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
Co-authored-by: Luke Lau <luke_lau@icloud.com>
2025-02-28 08:02:27 -08:00
Jim Lin
94f6b6d538
[SelectionDAG][RISCV] Promote VECREDUCE_{FMAX,FMIN,FMAXIMUM,FMINIMUM} (#128800)
This patch also adds the tests for VP_REDUCE_{FMAX,FMIN,FMAXIMUM,FMINIMUM}, which have been supported for a while.
2025-02-28 23:13:30 +08:00
Philip Reames
b2152823e0 Revert "[RISCV][TTI] Add shuffle costing for masked slide lowering (#128537)"
This reverts commit 4904728cab8596320a77a895cb712fba07ea7bb1.  Downstream
test failed, reverting during investigation.
2025-02-27 22:03:18 -08:00