6991 Commits

Author SHA1 Message Date
Craig Topper
98e8f01d18
[RISCV] Rename MIPS_PREFETCH->MIPS_PREF. NFC (#154062)
This matches the instruction's assembler mnemonic.
2025-08-18 07:38:10 -07:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
2025-08-18 07:01:29 -07:00
林克
6842cc5562
[RISCV] Add SpacemiT XSMTVDot (SpacemiT Vector Dot Product) extension. (#151706)
The full spec can be found at spacemit-x60 processor support scope:
Section 2.1.2.2 (Features):

https://developer.spacemit.com/documentation?token=BWbGwbx7liGW21kq9lucSA6Vnpb#2.1

This patch only supports assembler.
2025-08-18 18:03:17 +08:00
Jim Lin
127ba533bd
[RISCV] Remove ST->hasVInstructions() from getIntrinsicInstrCost for cttz/ctlz/ctpop. NFC. (#154064)
That isn't necessary if we've checked ST->hasStdExtZvbb().
2025-08-18 15:24:25 +08:00
Kazu Hirata
cbf5af9668
[llvm] Remove unused includes (NFC) (#154051)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-08-17 23:46:35 -07:00
Kazu Hirata
400dde6ca8
[RISCV] Remove an unnecessary cast (NFC) (#154049)
&UncompressedMI is already of MCInst *.
2025-08-17 23:46:27 -07:00
Craig Topper
4a3b69920b
[RISCV] Accept [-128,255] instead of [0, 255] for pli.b (#153913)
pli.h and pli.w both accept signed immediates, so pli.b should too. But
unlike those instructions, pli.b doesn't do any extension so its ok to
accept an unsigned immediate as well.
2025-08-17 21:39:08 -07:00
Brandon Wu
98f4b7797e
[RISCV][llvm] Support fixed-length vector inline assembly constraints (#150724) 2025-08-18 03:36:12 +00:00
Craig Topper
e67ec12640
[RISCV] Remove experimental from Smctr and Ssctr. (#153903)
These extensions were ratified in November 2024.
2025-08-15 17:18:09 -07:00
Sergei Barannikov
b7d6f484c8
[RISCV] Remove non-existent operand of nds.vfwcvt/nds.vfncvt instructions (#153865)
Mask operand is likely a copy-past error, they don't have one.
2025-08-16 00:46:19 +03:00
Craig Topper
c84a43ff3b
[RISCV] Fold (sext_inreg (xor (setcc), -1), i1) -> (add (setcc), -1). (#153855)
This improves all 3 vendor extensions that make sext_inreg i1 legal

Fixes #153781.
2025-08-15 12:55:18 -07:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Craig Topper
e2eaea412a
[RISCV] Add MC support for more P extension instructions. (#153629)
This implements pages 10-14 from
https://jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf

Test cases copied from #123271 with a couple mistakes fixed.

Co-authored-by: realqhc <caiqihan021@hotmail.com>
2025-08-14 23:23:28 -07:00
Luke Lau
e261f2895f
[RISCV] Add TSFlag for reading past VL behaviour. NFCI (#149704)
Currently we have a switch statement that checks if a vector instruction
may read elements past VL. However it currently doesn't account for
instructions in vendor extensions.

Handling all possible vendor instructions will result in quite a lot of
opcodes being added, so I've created a new TSFlag that we can declare in
TableGen, and added it to the existing instruction definitions.

I've tried to be conservative as possible here: All SiFive vendor vector
instructions should be covered by the flag, as well as all of
XRivosVizip, and ri.vextract from XRivosVisni.

For now this should be NFC because coincidentally, these instructions
aren't handled in getOperandInfo, so RISCVVLOptimizer should currently
avoid touching them despite them being liberally handled in
getMinimumVLForUser.

However in an upcoming patch we'll need to also bail in
getMinimumVLForUser, so this prepares for it.
2025-08-15 01:19:03 +00:00
Craig Topper
9465916a61
[RISCV] Stop passing the merge opcode around in RISCVMoveMerger. NFC (#153687)
What most code wants to know is the direction and we have to decode the
opcode to figure that out. Instead pass the direction around as a bool
and convert to opcode when we create the merge instruction.
2025-08-14 18:08:23 -07:00
Craig Topper
defbbf0129
[RISCV][MoveMerge] Don't copy kill flag when moving past an instruction that reads the register. (#153644)
If we're moving the second copy before another instruction that reads
the copied register, we need to clear the kill flag on the combined
move.

Fixes #153598.
2025-08-14 14:52:54 -07:00
Craig Topper
cba5f1b6c1
[RISCV] Add MC support for P extensions with scalar second operands. (#153502)
These are the instructions from page 8 and the second half of
page 9 here in
https://jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf

Co-authored-by: realqhc <caiqihan021@hotmail.com>
2025-08-14 07:03:36 -07:00
Nikita Popov
d1952baa5d [CodeGen] Remove unnecessary setTypeListBeforeSoften() parameter (NFC)
It does not make sense to set the softening type list without
setting IsSoften=true.
2025-08-14 10:04:56 +02:00
Piotr Fusik
18782db4c9
[RISCV] Improve instruction selection for most significant bit extraction (#151687)
(seteq (and X, 1<<XLEN-1), 0) -> (xori (srli X, XLEN-1), 1)
    (seteq (and X, 1<<31), 0) -> (xori (srliw X, 31), 1) // RV64
    (setlt X, 0) -> (srli X, XLEN-1) // SRLI is compressible
    (setlt (sext X), 0) -> (srliw X, 31) // RV64
2025-08-14 09:59:43 +02:00
quic_hchandel
71b066e3a2
[RISCV] Add CodeGen support for qc.insbi and qc.insb insert instructions (#152447)
This patch adds CodeGen support for qc.insbi and qc.insb instructions
defined in the Qualcomm uC Xqcibm extension. qc.insbi and qc.insb
inserts bits into destination register from immediate and register
operand respectively.
A sequence of `xor`, `and` & `xor` depending on appropriate conditions
are converted to `qc.insbi` or `qc.insb` which depends on the
immediate's value.
2025-08-14 12:08:28 +05:30
Craig Topper
ace08d5ccf
[RISCV] Add MC support for more P extension instructions. (#153458)
These instructions are the shift by immediate and saturate by immediate
instructions from the top half of page 9 of
https://jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf

I've also improved the CHECK lines in the invalid tests to check line
and column number from the diagnostic.

Co-authored-by: realqhc <caiqihan021@hotmail.com>
2025-08-13 22:07:03 -07:00
Craig Topper
059e49ceaa [RISCV] Fix typo in comment Interger->Integer. NFC 2025-08-13 11:19:49 -07:00
Craig Topper
57fb38a536
[RISCV] Indent body of let scopes in RISCVInstrInfoP.td. NFC (#153349)
I think this makes the code a little more readable.
2025-08-13 08:41:33 -07:00
Mikhail R. Gadelha
489a41d474
[RISCV][VLOPT] Added support for the zvbc and the remaining zvbb instructions (#153234)
Follow-up PR to #153071, adding the remaining zvbb instructions
(VBREV8_V and VREV8_V), plus the zvbc instruction (VCLMUL_VV, VCLMUL_VX,
VCLMULH_VV, VCLMULH_VX).
2025-08-13 14:43:25 +00:00
Sergey Kachkov
bdddff2488
[RISCV][RVV] Prohibit conversion of scalar store to single-element vse if vmv.x.s has multiple uses (#152112)
Godbolt example: https://godbolt.org/z/ThdfP475a

In the example single-element vse is used to store reduction result
instead of scalar store ([this optimization was introduced by this
patch](https://reviews.llvm.org/D109482)). However, vmv.x.s can't be
eliminated here because it has other uses (e.g. CopyToReg), so it seems
more profitable to use scalar store (we already have store value in a
scalar register, and can save one vsetvli which is likely to be required
for single-element vse). The proposed solution is to this transform only
if vmv.x.s has one use (in store instruction)
2025-08-13 13:10:27 +03:00
Sam Elliott
7317e3c9dd [NFC][RISCV] Correct signed/unsigned in Comment 2025-08-12 16:17:22 -07:00
Min-Yih Hsu
ca05058b49
[IA][RISCV] Recognize deinterleaved loads that could lower to strided segmented loads (#151612)
Turn the following deinterleaved load patterns
```
%l = masked.load(%ptr, /*mask=*/110110110110, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
%f2 = shufflevector %l, [2, 5, 8, 11]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1111)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
%f2 = poison
```
The mask `110110110110` is regarded as 'gap mask' since it effectively skips the entire third field / component.

Similarly,  turning the following snippet
```
%l = masked.load(%ptr, /*mask=*/110000110000, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1010)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
```

Right now this patch only tries to detect gap mask from a constant mask supplied to a masked.load/vp.load.
2025-08-12 14:08:18 -07:00
Sam Elliott
9b93ccbcbe
[RISCV] Fix Immediate Check for Xqcibi UGT (#153141)
The check should be about unsigned 16-bit immediates, not signed ones.

This is not a bug per-se, as the old codegen was correct for the
uint16_max case, it just didn't end up using `qc.e.bgeui`, which we
would prefer it did.
2025-08-12 11:06:00 -07:00
Mikhail R. Gadelha
d455d45654
[RISCV][VLOPT] Added support for several vector crypto instructions (#153071)
This PR adds support for the following instructions to the RISC-V
VLOptimizer: vandn.vx, vandn.vv, vbrev.v, vclz.v, vcpop.v, vctz.v,
vror.vi, vror.vx, vror.vv, vrol.vx, vrol.vv.
2025-08-12 12:05:03 -03:00
Sam Elliott
9e8f7acd2b
[RISCV] Track Linker Relaxable through Assembly Relaxation (#152602)
Span-dependent instructions on RISC-V interact in a complex manner with
linker relaxation. The span-dependent assembler algorithm implemented in
LLVM has to start with the smallest version of an instruction and then
only make it larger, so we compress instructions before emitting them to
the streamer.

When the instruction is streamed, the information that the instruction
(or rather, the fixup on the instruction) is linker relaxable must be
accurate, even though the assembler relaxation process may transform a
not-linker-relaxable instruction/fixup into one that that is linker
relaxable, for instance `c.jal` becoming `qc.e.jal`, or `bne` getting
turned into `beq; jal` (the `jal` is linker relaxable).

In order for this to work, the following things have to happen:
- Any instruction/fixup which might be relaxed to a linker-relaxable
instruction/fixup, gets marked as `RelaxCandidate = true` in
RISCVMCCodeEmitter.
- In RISCVAsmBackend, when emitting the `R_RISCV_RELAX` relocation, we
have to check that the relocation/fixup kind is one that may need a
relax relocation, as well as that it is marked as linker relaxable (the
latter will not be set if relaxation is disabled).
- Linker Relaxable instructions streamed to a Relaxable fragment need to
mark the fragment and its section as linker relaxable.

I also added more debug output for Sections/Fixups which are marked
Linker Relaxable.

This results in more relocations, when these PC-relative fixups cross an
instruction with a fixup that is resolved as not linker-relaxable but
caused the fragment to be marked linker relaxable at streaming time
(i.e. `c.j`).

Fixes: #150071
2025-08-12 09:02:48 +01:00
Luke Lau
81b576e66b
[RISCV] Cost casts with illegal types that can't be legalized (#153030)
If we have a floating point vector and no zve32f/zve64f/zve64d, we can
end up with an invalid type-legalization cost from
getTypeLegalizationCost.

Previously this triggered an assertion that the type must have been
legalized if the "legal" type is a vector, but in this case when it's
not possible to legalize the original type is spat back out.

This fixes it by just checking that the legalization cost is valid.

We don't have much testing for zve64x, so we may have other places in
the cost model with this issue.

Fixes #153008
2025-08-12 00:29:39 +08:00
Craig Topper
f55281ac38
[RISCV] Add a high half PACKW+PACK pattern for RV64. (#152760)
Similar to the PACKH+PACK pattern for RV32. We can end up with the
shift left by 32 neeed by our PACK pattern hidden behind an OR that
packs 2 half words.
2025-08-11 07:37:55 -05:00
Nikita Popov
e92b7e9641
[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from a i128 argument.

The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).

This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet, this just does
the mechanical changes to thread through the new argument.
2025-08-11 08:57:53 +02:00
Craig Topper
7fb8630e71
[RISCV] Add another packh+packw pattern. (#152744)
If the upper 32 bits are demanded, we might have a sext_inreg in
the pattern on the byte shifted by 24. We can also match this case
since packw sign extends from bit 31.
2025-08-09 09:23:44 -05:00
Min-Yih Hsu
c065ed3912
[RISCV] Add intrinsics for strided segment stores with fixed vectors (#152038)
These are the strided versions of `riscv.segN.store.mask` intrinsics.
2025-08-08 14:08:08 -07:00
Mikhail R. Gadelha
e91f68487c
[RISCV] Update SpacemiT-X60 vector fixed-point arithmetic latencies (#150517)
This PR adds hardware-measured latencies for all instructions defined in
Section 12 of the RVV specification: "Vector Fixed-Point Arithmetic
Instructions" to the SpacemiT-X60 scheduling model.
2025-08-08 11:57:35 -03:00
Luke Lau
7074471593
[RISCV] Enable tail folding by default (#151681)
We have been tracking the performance of EVL tail folding in the loop
vectorizer on RISC-V for a while now, and after much hard work from
various contributors we think it should be generally profitable to
enable by default now.

With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU
2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30%
geomean codesize reduction on SPEC and TSVC, with no significant
regressions detected.

Now that we are early into the LLVM 22.x development cycle it seems like
a good time to enable it to catch any issues. There are still more EVL
related items of work being tracked in #123069, which should continue to
improve performance.
2025-08-08 14:26:23 +08:00
Jim Lin
b9ca01b746 [RISCV] Move the decoder table for XCV, Xqci and XRivos from standard section to vendor section. NFC 2025-08-08 11:18:18 +08:00
Fangrui Song
3769ce013b
MC: Refine ALIGN relocation conditions
Each section now tracks the index of the first linker-relaxable
fragment, enabling two changes:

* Delete redundant ALIGN relocations before the first linker-relaxable
  instruction in a section. The primary example is the offset 0
  R_RISCV_ALIGN relocation for a text section aligned by 4.
* For alignments larger than the NOP size after the first
  linker-relaxable instruction, ALIGN relocations are now generated, even in
  norelax regions. This fixes the issue #150159.

The new test llvm/test/MC/RISCV/Relocations/align-after-relax.s
verifies the required ALIGN in a norelax region following
linker-relaxable instructions.
By using a fragment index within the subsection (which is less than or
equal to the section's index), the implementation may generate redundant
ALIGN relocations in lower-numbered subsections before the first
linker-relaxable instruction.

align-option-relax.s demonstrates the ALIGN optimization.
Add an initial `call` to a few tests to prevent the ALIGN optimization.

---

When the alignment exceeds 2, we insert $alignment-2 bytes of NOPs, even
in non-RVC code. This enables non-RVC code following RVC code to handle
a 2-byte adjustment without requiring an additional state in MCSection
or AsmParser.

```
.globl _start
_start:
// GNU ld can relax this to  6505          lui     a0, 0x1
// LLD hasn't implemented this transformation.
  lui a0, %hi(foo)

.option push
.option norelax
.option norvc
// Now we generate R_RISCV_ALIGN with addend 2, even if this is a norvc region.
.balign 4
b0:
  .word 0x3a393837
.option pop
foo:
```

Pull Request: https://github.com/llvm/llvm-project/pull/150816
2025-08-07 19:16:58 -07:00
Sam Elliott
4e11f89904
[RISCV] Basic Objdump Mapping Symbol Support (#151452)
This implements very basic support for RISC-V mapping symbols in
llvm-objdump, sharing the implementation with how Arm/AArch64/CSKY
implement this feature.

This only supports the `$x` (instruction) and `$d` (data) mapping
symbols for RISC-V, and not the version of `$x` which includes an
architecture string suffix.
2025-08-07 11:28:07 -07:00
Mikhail R. Gadelha
f3db0cb4d8
Reland "[RISCV] Refactor X60 scheduling model helper classes. NFC." (#152336)
This PR fixes the issue that caused an ub in PR #151472.

The issue was a shl call taking a negative shift amount (posDiff). The
result was never used, but tablegen would perform the calculation
anyway. The fix was to replace the shl call with just multiplications
with constants.

Original PR description:

This patch improves the helper classes in the SpacemiT-X60 vector
scheduling model and will be used in follow-up PRs:

There are now two functions to map LMUL to values:
* ConstValueUntilLMULThenDoubleBase: returns BaseValue for LMUL values
before startLMUL, Value for startLMUL, then doubles Value for each
subsequent LMUL. Useful for cases where fractional LMULs have constant
cycles, and integer LMULs double as they increase.
* GetLMULValue: takes an ordered list of LMUL cycles and LMUL and
returns the corresponding cycle. Useful for cases we can't easily cover
with ConstValueUntilLMULThenDoubleBase.

This PR also adds some useful simplified versions of
ConstValueUntilLMULThenDoubleBase, e.g.: ConstValueUntilLMULThenDouble
(when BaseValue == Value), or ConstOneUntilMF4ThenDouble (when cycles
start to double after MF2)
2025-08-07 11:27:46 -03:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
2025-08-07 09:12:40 +02:00
Craig Topper
886b2133e3
[RISCV] Relax one of the zexti8 in the PACKH+PACK(W)/SLLI patterns. (#152384)
For RV32 we don't need the byte shifted by 24 to be zero extend
since the extended bits are shifted out.
For RV64, we don't need the byte shifted by 24 to be zero extended
if the upper 32 bits of the result aren't demanded.
2025-08-06 17:46:43 -07:00
Daniel Henrique Barboza
35bd40d321
[RISCV] add more generic macrofusions (#151140)
These are some macrofusions that are used internally in Ventana in an
yet not upstreamed processor. Figured it would be good to contribute
them ahead of the processor to allow the community to also use them in
their own processors, while also alleaviting our own downstream upkeep.

The macrofusions being added are, considering load =
lb,lh,lw,ld,lbu,lhu,lwu:

- bfext (slli+srli)
- auipc+load
- lui+load
- add(.uw)+load 
- addi+load
- shXadd(.uw)+load, where X=1,2,3
2025-08-06 14:46:34 -04:00
Craig Topper
e232f05dfd
[RISCV] Add packw+packh isel pattern for unaligned loads on RV64. (#152159)
This is similar to an existing pattern from RV32 with the
simpliflication proposed by #152045. Instead of pack we need to use
packw and we need to know that the upper 32 bits are being ignored since
packw sign extends from bit 31.

The use of allBinOpWUsers prevents tablegen from automatically
reassociating the pattern so we need to do it manually. Tablegen is
still able to commute operands though.
2025-08-06 09:09:39 -07:00
Daniel Henrique Barboza
8e57689c34
[RISCV] add load/store misched/PostRA subtarget features (#149409)
Some processors benefit more from store clustering than load clustering,
and vice-versa, depending on factors that are exclusive to each one
(e.g. macrofusions implemented).

Likewise, certain optimizations benefits more from misched clustering
than postRA clustering. Macrofusions are again an example: in a
processor with store pair macrofusions, like the veyron-v1, it is
observed that misched clustering increases the amount of macrofusions
more than postRA clustering. This of course isn't necessarily true for
other processors, but it shows that processors can benefit from a more
fine grained control of clustering mutations, and each one is able to do
it differently.

Add 4 new subtarget features that deprecates the existing
riscv-misched-load-store-clustering and
riscv-postmisched-load-store-clustering
options:

- disable-misched-load-clustering and disable-misched-store-clustering:
disable load/store clustering during misched;

- disable-postmisched-load-clustering and
disable-postmisched-store-clustering:
disable load/store clustering during PostRA.

Note that the new subtarget features disables specific stages of the
default
clustering settings. The default per se (load and store clustering for
both
misched and PostRA) is left untouched.

Disable all clustering but misched-store-clustering for the veyron-v1
processor
using the new features.
2025-08-06 09:08:25 -07:00
Kazu Hirata
62fc0028bf
[Target] Remove unnecessary casts (NFC) (#152262)
value() already returns uint64_t.
2025-08-06 07:11:07 -07:00
Craig Topper
6ba6efea84
[RISCV] Simplify one of the RV32 PACK isel patterns. (#152045)
This pattern previously checked a specific variant of 4 bytes being
packed that is generated by unaligned load expansion.

Our individual PACK patterns don't handle that particular case because a
DAG combine turns (or (or A, (shl B, 8)), (shl (or C, (shl D, 8)), 16))
into (or (or A, (shl B, 8)), (or (shl C, 16), (shl D, 24)). After this,
the outer OR doesn't have a shl operand so we needed a pattern that
looks through 2 layers of OR.

To match this pattern we don't need to look at the (or A, (shl B, 8))
part since that part wasn't affected by the DAG combine and can be
matched to PACKH by itself. It's enough to make sure that part of the
pattern has zeros in the upper 16 bits.

This allows tablegen to automatically generate more permutations of this pattern.
The associative variant expansion is limited to 3 children.
2025-08-05 20:11:59 -07:00
Craig Topper
73685583c8
[VP][RISCV] Add a vp.load.ff intrinsic for fault only first load. (#128593)
There's been some interest in supporting early-exit loops recently.
https://discourse.llvm.org/t/rfc-supporting-more-early-exit-loops/84690

This patch was extracted from our downstream where we've been using it
in our vectorizer.
2025-08-05 16:12:42 -07:00
quic_hchandel
d1b363e0b0
[RISCV] Add Tied operands to insert instructions in Qualcomm uC extension Xqcibm (#151339) 2025-08-05 15:02:19 +05:30