7012 Commits

Author SHA1 Message Date
Philip Reames
88ee17c24e [RISCV][IA] Prefer getElementCount over getNumElements [nfc]
Small cleanup, with the eventual goal of making code easier to merge
between the various routines.
2025-07-15 07:39:38 -07:00
Luke Lau
612afab512 [RISCV] Use MachineInstr::isFullCopy in a few places. NFC
Instead of checking that there's no subregisters.
2025-07-15 21:39:59 +08:00
Fangrui Song
5ba458c559 MCFixup: Replace getTargetKind with getKind 2025-07-15 00:21:07 -07:00
Fangrui Song
0b674f4c52 MCFixup: Replace getTargetKind with getKind
MCFixupKind is now a type alias (fixup kinds are inherently
target-specific). getTargetKind is no longer necessary.
2025-07-15 00:08:45 -07:00
quic_hchandel
0be51cff91
[RISCV] Add ISel patterns for Qualcomm uC Xqcicli extension (#148121)
Add CodeGen patterns for conditional load immediate instructions
2025-07-15 12:13:57 +05:30
Kazu Hirata
7c83d66719
[llvm] Remove unused includes (NFC) (#148768)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-14 22:19:14 -07:00
Craig Topper
028dfd7756 [RISCV] Replace tab character. NFC 2025-07-14 21:53:55 -07:00
Craig Topper
9ba45c5c5e [RISCV] Move RISCVDAGToDAGISel::SelectAddrRegRegScale definition later. NFC
This function was placed between some static functions and their
callers. Reorder to keep the related code together.
2025-07-14 21:12:10 -07:00
Sam Elliott
3faaa5cdb0
[RISCV] Fix QC.E.LI -> C.LI with Bare Symbol Compression (#146763)
There's a long comment explaining this approach in RISCVInstrInfoXqci.td

This change also fixes some problems when fixups are able to be resolved for `qc.e.li` and `qc.li`.
2025-07-14 21:00:38 -07:00
Craig Topper
4923313727
[RISCV] Fix typo in comment. NFC (#148754)
'unsigned' was misspelled, but it seemed easier to write uimm9 than to
spell it out.
2025-07-14 20:56:07 -07:00
Craig Topper
31944ac45b
[RISCV] Render P-ext simm10_unsigned as a simm10 after parsing. (#148749)
Instead of allowing a parsed MCInst to have either a uimm10 or a simm10,
always render as simm10. This avoids a mismatch between parsed MCInst
and disassembled MCInst when a uimm10 value is used.
2025-07-14 20:55:10 -07:00
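Note on the simm10_unsigned change above: a minimal sketch of the normalization it describes, assuming the parser accepts any value in either the signed or the unsigned 10-bit range. The helper names `sign_extend` and `render_as_simm10` are hypothetical and do not come from the LLVM sources.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def render_as_simm10(parsed: int) -> int:
    """Normalize an immediate accepted as simm10 or uimm10 to simm10.

    Assumption: the accepted range is the union of [-512, 511] and [0, 1023].
    """
    if not -512 <= parsed <= 1023:
        raise ValueError("immediate does not fit in 10 bits")
    return sign_extend(parsed, 10)
```

With this normalization, the values 1023 and -1 produce the same rendered operand, so a parsed instruction and its disassembled form agree.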
Craig Topper
3265a36c55
[RISCV] Refactor RISCVDAGToDAGISel::selectSimm5Shl2. NFC (#148731)
Return from the for loop body instead of using a break and checking the
shift amount after.
2025-07-14 20:54:06 -07:00
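Note on the selectSimm5Shl2 refactor above: a rough sketch of the control-flow change, written in Python rather than the C++ of the actual function. The shift range 0..3 and the divisibility test are assumptions made for illustration; only the "return from the loop body instead of break plus a check afterwards" shape is taken from the commit message.

```python
def is_int(value: int, bits: int) -> bool:
    """True if `value` fits in a signed field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def select_before(offset: int):
    """Old shape: break out of the loop, then inspect the loop variable."""
    shift = 0
    while shift < 4:
        if is_int(offset >> shift, 5) and offset % (1 << shift) == 0:
            break
        shift += 1
    if shift == 4:  # loop ran to completion: constant cannot be encoded
        return None
    return offset >> shift, shift


def select_after(offset: int):
    """New shape: return directly from the loop body."""
    for shift in range(4):
        if is_int(offset >> shift, 5) and offset % (1 << shift) == 0:
            return offset >> shift, shift
    return None  # constant cannot be encoded
```

Both functions return the same result for every input; the second simply avoids carrying the loop variable out of the loop.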
Jim Lin
96148f9214 [RISCV] Use cond_code instead for PseudoCCNDS_BFOS and PseudoCCNDS_BFOZ. 2025-07-15 11:19:09 +08:00
Jim Lin
22707fd4a5
[RISCV] Add Andes XAndesBFHCvt (Andes Scalar BFLOAT16) extension (#148563)
The spec can be found at:

https://github.com/andestech/andes-v5-isa/releases/tag/ast-v5_4_0-release.

The extension includes only two instructions: one for converting from
f32 to f16, and another for converting from f16 to f32.

This patch only implements MC support for XAndesBFHCvt.
2025-07-15 08:59:00 +08:00
Sudharsan Veeravalli
085e8f1e52
[RISCV] Relax destination instruction dag operand matching in CompressInstEmitter (#148660)
We have some 48-bit instructions in the `Xqci` spec that currently
cannot be compressed to their 32-bit variants due to the constraint in
`CompressInstEmitter` on destination instruction operands not being
allowed to mismatch with the DAG operands.

For example, the `QC_E_ADDI` instruction can be compressed to the `ADDI`
instruction when the immediate is a signed 12-bit value, but this is currently
not possible since the `QC_E_ADDI` instruction has `GPRNoX0` register
operands while the `ADDI` instruction has `GPR` register operands
leading to an operand type validation error.

I think we can remove the check that only source instruction operands
can mismatch with the corresponding DAG operands and rely on the fact
that we check if the DAG register operand type is a subclass of the
instruction register operand type.
2025-07-15 04:52:51 +05:30
Craig Topper
19b2dd9d79
[RISCV] Use emplace_back instead of push_back+make_pair. NFC (#148711) 2025-07-14 13:47:40 -07:00
Craig Topper
d5ac1b5e28
[RISCV] Improve hasAllNBitUsers for SLLIW. (#148344) 2025-07-14 09:02:10 -07:00
Sudharsan Veeravalli
0ae1506847
[RISCV] Add ISel patterns for Xqciac QC_SHLADD instruction (#148256)
Add a couple of patterns to generate the Xqciac QC_SHLADD shift left and
add immediate instruction.
2025-07-14 16:43:41 +05:30
Jim Lin
2886d30dd6
[RISCV] Add short forward branch scheduling for Andes45 (#147890) 2025-07-14 09:26:19 +08:00
Craig Topper
cc9b5c3480 [RISCV] Remove unused Predicates. NFC 2025-07-11 22:51:42 -07:00
Craig Topper
390fbe664c
[RISCV] Use Predicates instead of AddedComplexity to prefer QC_SELECTEQI over QC_MVEQI. NFC (#148312)
IMHO AddedComplexity should be used as a last resort. We should use
other mechanisms like Predicates and PatFrag predicates to give priority.
2025-07-11 22:34:54 -07:00
Craig Topper
d0a0a1ae63
[RISCV] Remove unneeded AddedComplexity from Xqcibi patterns. NFCI (#148301)
We don't have any tests that show why this AddedComplexity is needed.
ImmLeafs are automatically ranked higher than register operands so there
is no ambiguity with the base ISA here.
2025-07-11 22:16:38 -07:00
Craig Topper
5a95ec6dc1
[RISCV] Add riscv_vlm/vsm to RISCVTargetLowering::getTgtMemIntrinsic. (#148265) 2025-07-11 16:59:47 -07:00
Craig Topper
794698031c
[RISCV] Use i32 instead of XLenVT in Xqci patterns. NFC (#148271)
This allows the i64 RV64 patterns to be filtered out of
RISCVGenDAGISel.inc. This saves about 1500 bytes.
2025-07-11 16:59:16 -07:00
Min-Yih Hsu
bf94c8ddb3
[RISCV][NFC] Split InterleavedAccess related TLI hooks into a separate file (#148040)
There have been discussions on splitting RISCVISelLowering.cpp. I think
InterleavedAccess related TLI hooks would be some of the low hanging
fruit as they're relatively isolated and also because X86 is already doing
it.

NFC.
2025-07-11 11:04:41 -07:00
Luke Lau
6563c795cd
[RISCV] Handle implicit defs when ensuring pseudo dominates in peephole (#148181)
Previously we just assumed that no instruction that needed to be moved
would have an implicit def, but vnclip pseudos will.

We can still try to move them but we just need to check that no
instructions between have any reads or writes to the physical register.

Fixes #147986
2025-07-12 01:57:45 +08:00
Craig Topper
6882a30ace
[RISCV] Add BREV8 and ORC_B to hasAllNBitUsers in RISCVOptWInstrs. (#148076)
These were removed in #147830 due to ignoring that these instructions
operate on bytes. This patch adds them back with tests including a test
for the byte boundary issue.

I separated out the commits to show bad optimization if we don't round
Bits to the nearest byte.
2025-07-11 09:23:50 -07:00
Alex Bradbury
798f4c156f Revert "[RISCV] AddEdge between mask producer and user of V0 (#146855)"
This reverts commit aee21c368b41cd5f7765a31b9dbe77f2bffadd4e.

As noted
<https://github.com/llvm/llvm-project/pull/146855#issuecomment-3061784904>
this causes compile errors for several RVV configurations:

    fatal error: error in backend: SmallVector unable to grow. Requested capacity (4294967296) is larger than maximum value for size type (4294967295)
2025-07-11 14:04:43 +01:00
Liao Chunyu
aee21c368b
[RISCV] AddEdge between mask producer and user of V0 (#146855)
If there are multiple mask producers followed by multiple
masked consumers, a move (vmv* v0, vx) may be generated
to save the mask.
By moving the mask's producer after the mask's use,
the spill can be eliminated, and the move can be removed.
2025-07-11 17:57:01 +08:00
Sudharsan Veeravalli
9de657abaf
[RISCV] Add ISel patterns for Xqciac QC.MULIADD instruction (#147661)
Add basic isel patterns for the multiply accumulate QC.MULIADD
instruction.

While most cases work with just the TD file pattern, there are a few cases
which need to be handled in ISelLowering depending on the immediate we
are multiplying with:

- imm + 1 , imm - 1, 1 - imm, -1 - imm are a power of 2 --> these become
slli and add/sub
- immediate is 2^n - 2^m --> this becomes (add/sub (shl X, C1), (shl X,
C2))
- imm - 2, imm - 4, imm - 6 is a power of 2 --> these use shxadd when
zba is enabled

The patch does not decompose mul if Xqciac is present, for the above
conditions. There could be cases where this may not be beneficial which I
plan to address in follow up patches.
2025-07-11 12:16:11 +05:30
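Note on the QC.MULIADD change above: a small sketch of the arithmetic identities behind the first two bullet cases, i.e. the shift-and-add decompositions that the patch skips when Xqciac is present. The function `decompose_mul` is hypothetical and only checks the arithmetic; it does not model ISelLowering, and the shxadd case is left out.

```python
def is_power_of_two(v: int) -> bool:
    return v > 0 and v & (v - 1) == 0


def log2(v: int) -> int:
    """Exact log2 of a power of two."""
    return v.bit_length() - 1


def decompose_mul(x: int, imm: int):
    """Return x * imm computed with shifts and add/sub, or None if no case applies."""
    if is_power_of_two(imm + 1):        # imm = 2^n - 1
        return (x << log2(imm + 1)) - x
    if is_power_of_two(imm - 1):        # imm = 2^n + 1
        return (x << log2(imm - 1)) + x
    if is_power_of_two(1 - imm):        # imm = 1 - 2^n
        return x - (x << log2(1 - imm))
    if is_power_of_two(-1 - imm):       # imm = -(2^n + 1)
        return -((x << log2(-1 - imm)) + x)
    if imm > 0:
        low = imm & -imm                # lowest set bit, 2^m
        if is_power_of_two(imm + low):  # imm = 2^n - 2^m
            return (x << log2(imm + low)) - (x << log2(low))
    return None
```

For example, `decompose_mul(3, 12)` computes `(3 << 4) - (3 << 2)`, while an immediate such as 10 matches none of these cases and returns `None`.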
quic_hchandel
66969c9494
[RISCV] Add ISel patterns for Qualcomm uC Xqcics extension (#146675)
Add CodeGen support for conditional select instructions in this
extension
2025-07-11 10:27:13 +05:30
Ramkumar Ramachandra
19c2fb2325
[ISel/RISCV] Custom-lower vector [l]lround (#147713)
Lower it just like the vector [l]lrint, using vfcvt, with the right
rounding mode. Updating costs to account for this custom-lowering is
left to a companion patch.
2025-07-10 10:33:46 +01:00
Luke Lau
da8d7f49ff
[RISCV] Unify non-vp and vp rounding intrinsic costing (#147872)
Currently we have slightly different costing for the vp and non-vp
version of the rounding intrinsics.

We can delete this code and use the generic BasicTTIImpl code for the vp
intrinsics which falls back to the non-vp versions.

I'm not sure if the zvfh costing is correct; this should probably be
fixed in a follow up patch. At the moment the non-vp cost is more
important since it is what the loop vectorizer will use.
2025-07-10 15:46:05 +08:00
Luke Lau
20becf373e
[TTI] Move vp.{select,merge} costing from RISCV to BasicTTIImpl. NFC (#147870)
Move the costing to the generic implementation in BasicTTIImpl since it
just falls back to the non-vp costing.

Also pass through the OperandValueInfo if using value based costing, but
I don't believe this affects the result for any in-tree target
currently.
2025-07-10 14:30:52 +08:00
Pengcheng Wang
b57df56b48
[RISCV] Add UnsupportedSchedXXX for vendor extensions package (#147666)
There will be more schedule definitions for vendor extensions and
we need to add these `UnsupportedSchedXXX` to existing models every
time we add new schedule definitions.

The fact is that each vendor will barely implement other vendors'
extensions, so we can package these definitions into one.
2025-07-10 14:15:22 +08:00
Boyao Wang
697beb3f17
[TargetLowering] Change getOptimalMemOpType and findOptimalMemOpLowering to take LLVM Context (#147664)
Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering so
that we can use EVT::getVectorVT to generate EVT types in
getOptimalMemOpType.

Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
2025-07-10 11:11:09 +08:00
Craig Topper
574b66f241
[RISCV] Use SelectionDAG::haveNoCommonBitsSet in RISCVDAGToDAGISel::orDisjoint. (#147838) 2025-07-09 16:18:51 -07:00
Craig Topper
20a68c6179
[RISCV] Remove BREV8 and ORC_B from hasAllNBitUsers in RISCVOptWInstrs. (#147830)
These instructions operate on bytes so we need to round the demanded
bits up to the nearest byte which we aren't doing. I think we forgot to
update this when we changed from hasAllWUsers to hasNBitUsers.

We don't have any test case for these instruction so remove them until
we can put together a test.
2025-07-09 16:18:04 -07:00
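Note on the BREV8/ORC_B change above: a minimal model of why the demanded bits have to be rounded up to a whole byte. The functions below follow the usual definitions of `brev8` (reverse the bits within each byte) and `orc.b` (set a byte to 0xff if any of its bits is set); `round_up_to_byte` is a hypothetical helper, not a name from RISCVOptWInstrs.

```python
def brev8(x: int, xlen: int = 64) -> int:
    """Reverse the bit order inside every byte of x."""
    out = 0
    for i in range(xlen // 8):
        byte = (x >> (8 * i)) & 0xFF
        out |= int(f"{byte:08b}"[::-1], 2) << (8 * i)
    return out


def orc_b(x: int, xlen: int = 64) -> int:
    """Set each byte to 0xff if any bit in it is set, else to 0."""
    out = 0
    for i in range(xlen // 8):
        if (x >> (8 * i)) & 0xFF:
            out |= 0xFF << (8 * i)
    return out


def round_up_to_byte(bits: int) -> int:
    """Round a demanded-bit count up to a multiple of 8."""
    return (bits + 7) & ~7
```

If a user reads only the low 4 bits of `brev8(x)`, those bits come from bits 7..4 of `x`, so the instruction still needs the low `round_up_to_byte(4) == 8` bits of its input.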
Philip Reames
7bf439d260 [IA] Partially revert interface change from 4a66ba
As noted in post commit review, the API change here was not required.
I'd apparently confused myself when teasing apart patches from my
development branch.
2025-07-09 12:02:52 -07:00
Philip Reames
4a66ba2a4d
[IA] Support deinterleave intrinsics w/ fewer than N extracts (#147572)
For the fixed vector cases, we already support this, but the
deinterleave intrinsic cases (primarily used by scalable vectors) didn't.
Supporting it requires plumbing through the Factor separately from the
extracts, as there can now be fewer extracts than the Factor. Note that
the fixed vector path handles this slightly differently - it uses the
shuffle and indices scheme to achieve the same thing.
2025-07-09 09:41:07 -07:00
Min-Yih Hsu
c2a818f48b
[RISCV] Add scheduling info for XSfvqmaccdod/qoq and XSfvfwmaccqqq instructions (#147626)
XSfvqmaccdod/qoq and XSfvfwmaccqqq are SiFive's small-size matrix
multiplication extensions. This patch adds scheduling info for their
instructions along with six new SchedReadWrites.
2025-07-09 09:38:44 -07:00
Min-Yih Hsu
d59d2652c8
[RISCV] Add scheduling info for XSfvfnrclipxfqf instructions (#147586)
This patch adds scheduling data for the XSfvfnrclipxfqf instruction,
which narrows / clips FP32 data to INT8 according to value range
specified by a scalar register. Three new SchedReadWrites are
introduced.
2025-07-09 09:06:08 -07:00
Craig Topper
3640a5842b
[RISCV] Add Commutable flag to XNOR. (#147654) 2025-07-09 07:54:07 -07:00
Ryan Buchner
8905b1c38f
[RISCV] Efficiently lower (select %cond, andn (f, x), f) using zicond (#147369)
The following case is now optimized:
(select c, (and f, ~x), f) -> (andn f, (czero_eqz x, c))
2025-07-09 09:32:54 -04:00
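Note on the zicond select lowering above: a short check that the two forms compute the same value. `czero_eqz` follows the Zicond definition (the result is zero when the condition register is zero, otherwise the source value); the 64-bit mask and the function names are assumptions for illustration.

```python
MASK = (1 << 64) - 1  # model 64-bit registers


def czero_eqz(value: int, cond: int) -> int:
    """czero.eqz: 0 if cond == 0, else value."""
    return 0 if cond == 0 else value


def andn(a: int, b: int) -> int:
    """andn: a & ~b."""
    return a & ~b & MASK


def select_form(c: int, f: int, x: int) -> int:
    """(select c, (and f, ~x), f)"""
    return (f & ~x) & MASK if c else f


def zicond_form(c: int, f: int, x: int) -> int:
    """(andn f, (czero_eqz x, c))"""
    return andn(f, czero_eqz(x, c))
```

When `c` is zero, the `czero_eqz` result is zero, and `andn f, 0` leaves `f` unchanged; otherwise the result is `f & ~x`.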
Ramkumar Ramachandra
9c97b38d44
[ISel/RISCV] Custom-promote [b]f16 in [l]lrint (#146507)
Extend lowerVectorXRINT to also do a FP_EXTEND_VL when the source
element type is [b]f16, and wire up this custom-promote. Updating the
cost-model to not give these an invalid cost is left to a companion
patch.
2025-07-09 10:24:38 +01:00
Luke Lau
b02920f369
[RISCV] Don't increase vslide or splat vl if +vl-dependent-latency is present (#147089)
If the subtarget's latency is dependent on vl, then we shouldn't try to
fold away vsetvli toggles if it means increasing vl.
2025-07-09 16:25:22 +08:00
Jim Lin
7a6568dcd5
[RISCV] Support LLVM IR intrinsics for XAndesVSIntLoad (#147493)
This patch adds LLVM IR intrinsic support for XAndesVSIntLoad.

The document for the intrinsics can be found at:
https://github.com/andestech/andes-vector-intrinsic-doc/blob/ast-v5_4_0-release-v5/auto-generated/andes-v5/intrinsic_funcs/04_andes_vector_int4_load_extension.adoc

The clang part will be added in a later patch.

---------

Co-authored-by: Lino Hsing-Yu Peng <linopeng@andestech.com>
2025-07-09 13:02:57 +08:00
Craig Topper
4d0c25f4a6
[RISCV] Select disjoint_or+not as xnor. (#147636)
A disjoint OR can be converted to XOR. And a XOR+NOT is XNOR. Idea
taken from #147279.
    
I changed the existing xnor pattern to have the not on the outside
instead of the inside. These are equivalent for xor since xor is
associative. Tablegen was already generating multiple variants
of the isel pattern using associativity.
    
There are some issues here. The disjoint flag isn't preserved
through type legalization. I was hoping we could recover it
manually for the masked merge cases, but that doesn't work either.
2025-07-08 21:50:23 -07:00
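Note on the disjoint_or+not change above: a small check of the two identities the message relies on, assuming 64-bit values. The function names are illustrative only.

```python
MASK = (1 << 64) - 1  # model 64-bit registers


def xnor(a: int, b: int) -> int:
    """xnor with the NOT on the outside: ~(a ^ b)."""
    return ~(a ^ b) & MASK


def xnor_inner_not(a: int, b: int) -> int:
    """The same value with the NOT on the inside: (~a) ^ b."""
    return ((~a & MASK) ^ b) & MASK


def not_disjoint_or(a: int, b: int) -> int:
    """~(a | b), valid only when a and b share no set bits."""
    assert a & b == 0, "operands of a disjoint OR must not overlap"
    return ~(a | b) & MASK
```

Since `a | b == a ^ b` whenever `a & b == 0`, `not_disjoint_or(a, b)` equals `xnor(a, b)` for all disjoint operands.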
Craig Topper
b7248b5cd4
[RISCV] Use cast instead of dyn_cast to MemSDNode in RISCVISelDAGToDAG.cpp. (#147643)
All of these are guaranteed to be MemSDNode. The only intrinsics that
aren't are vlm and vsm. We should add those to
RISCVTargetLowering::getTgtMemIntrinsic to fix that.
2025-07-08 21:44:39 -07:00
Luke Lau
7c812ea01a
[RISCV] Avoid vl toggles when lowering vector_splice/experimental_vp_splice and add +vl-dependent-latency tuning feature (#146746)
When vectorizing a loop with a fixed-order recurrence we use a splice,
which gets lowered to a vslidedown and vslideup pair.

However with the way we lower it today we end up with extra vl toggles
in the loop, especially with EVL tail folding, e.g:

    .LBB0_5:                                # %vector.body
                                            # =>This Inner Loop Header: Depth=1
    	sub	a5, a2, a3
    	sh2add	a6, a3, a1
    	zext.w	a7, a4
    	vsetvli	a4, a5, e8, mf2, ta, ma
    	vle32.v	v10, (a6)
    	addi	a7, a7, -1
    	vsetivli	zero, 1, e32, m2, ta, ma
    	vslidedown.vx	v8, v8, a7
    	sh2add	a6, a3, a0
    	vsetvli	zero, a5, e32, m2, ta, ma
    	vslideup.vi	v8, v10, 1
    	vadd.vv	v8, v10, v8
    	add	a3, a3, a4
    	vse32.v	v8, (a6)
    	vmv2r.v	v8, v10
    	bne	a3, a2, .LBB0_5

Because the vslideup overwrites all but UpOffset elements from the
vslidedown, we currently set the vslidedown's AVL to said offset.

But in the vslideup we use either VLMAX or the EVL which causes a
toggle.

This increases the AVL of the vslidedown so it matches vslideup, even if
the extra elements are overridden, to avoid the toggle.

A new tuning feature +vl-dependent-latency has been added which keeps
the old behaviour for microarchitectures that dynamically dispatch uops
based on vl, e.g. sifive-x280.

+vl-dependent-latency can be reused for the recently proposed Ovlt
optimization directive if/when it's ratified:
https://lists.riscv.org/g/tech-privileged/message/2487

If we wanted to aggressively optimise for vl at the expense of
introducing more toggles we could probably look at doing this in
RISCVVLOptimizer.
2025-07-09 11:09:13 +08:00
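Note on the vector_splice change above: a toy model of the vslidedown and vslideup pair, showing that raising the vslidedown's AVL to match the vslideup does not change the result. The model assumes VLMAX of 8, treats tail elements as unspecified (`None`), and uses hypothetical function names; it is a sketch of the reasoning, not of the lowering code.

```python
VLMAX = 8


def vslidedown(src, offset, vl):
    """dest[i] = src[i + offset] for i < vl (0 past VLMAX); tail unspecified."""
    out = [None] * VLMAX
    for i in range(vl):
        out[i] = src[i + offset] if i + offset < VLMAX else 0
    return out


def vslideup(dest, src, offset, vl):
    """dest[i] = src[i - offset] for offset <= i < vl; dest[0:offset] kept."""
    out = list(dest[:offset]) + [None] * (VLMAX - offset)
    for i in range(offset, vl):
        out[i] = src[i - offset]
    return out


def splice(prev, cur, down_offset, up_offset, slidedown_vl, vl):
    """Lower a splice as a vslidedown followed by a vslideup."""
    tmp = vslidedown(prev, down_offset, slidedown_vl)
    return vslideup(tmp, cur, up_offset, vl)


prev = list(range(8))        # previous iteration's vector
cur = list(range(10, 18))    # current iteration's vector
# Old lowering: the vslidedown AVL equals the vslideup offset (1 element).
old = splice(prev, cur, down_offset=7, up_offset=1, slidedown_vl=1, vl=6)
# New lowering: the vslidedown AVL matches the vslideup's vl, avoiding a toggle.
new = splice(prev, cur, down_offset=7, up_offset=1, slidedown_vl=6, vl=6)
```

Both `old` and `new` hold the last element of `prev` followed by the leading elements of `cur`, because the vslideup overwrites every element at or above its offset.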