2068 Commits

Author SHA1 Message Date
Ryan Buchner
be762b7b7d
[RISCV] Efficiently lower (select cond, u, rot[r/l](u, rot.amt)) using zicond extension (#143768)
The following lowerings now occur:
(select cond, u, rotr(u, rot.amt)) -> (rotr u, (czero_nez rot.amt, cond))
(select cond, rotr(u, rot.amt), u) -> (rotr u, (czero_eqz rot.amt, cond))
(select cond, u, rotl(u, rot.amt)) -> (rotl u, (czero_nez rot.amt, cond))
(select cond, rotl(u, rot.amt), u) -> (rotl u, (czero_eqz rot.amt, cond))
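A quick Python model (not the actual DAG lowering code; the helper names `rotr` and `czero_nez` are illustrative) checking the equivalence behind the first fold: when `cond` is nonzero, `czero.nez` zeroes the rotate amount, and a rotate by 0 is the identity, so the result is `u`, exactly as the select requires.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def rotr(u, amt):
    # 64-bit rotate right
    amt %= XLEN
    return ((u >> amt) | (u << (XLEN - amt))) & MASK

def czero_nez(rs1, rs2):
    # czero.nez: rd = (rs2 != 0) ? 0 : rs1
    return 0 if rs2 != 0 else rs1

for cond in (0, 1):
    for u in (0x0123456789ABCDEF, MASK):
        for amt in (0, 1, 17):
            expected = u if cond else rotr(u, amt)
            assert rotr(u, czero_nez(amt, cond)) == expected
```

The `czero_eqz` cases are symmetric: the rotate amount is zeroed when `cond` is zero instead.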
2025-07-03 15:27:09 -04:00
UmeshKalappa
032966ff56
[RISCV] Added the MIPS prefetch extensions for MIPS RV64 P8700. (#145647)
The extension is enabled with xmipscbop.

Please refer to the "MIPS RV64 P8700/P8700-F Multiprocessing System Programmer’s Guide" for more info on the extension:
https://mips.com/wp-content/uploads/2025/06/P8700_Programmers_Reference_Manual_Rev1.84_5-31-2025.pdf
2025-07-03 10:59:10 +02:00
Jim Lin
283f53ac6f
[RISCV] Add isel patterns for generating XAndesPerf branch immediate instructions (#145147)
Similar to #139872. This patch adds isel patterns to match
`riscv_brcc` and `riscv_selectcc_frag` to XAndesPerf branch
instructions.
2025-07-03 12:47:53 +08:00
Simon Pilgrim
38200e94f1
[DAG] visitFREEZE - always allow freezing multiple operands (#145939)
Always try to fold freeze(op(....)) -> op(freeze(),freeze(),freeze(),...).

This patch proposes we drop the opt-in list of opcodes that are allowed to push a freeze through the op to freeze all its operands, through the tree towards the roots.

I'm struggling to find a strong reason for this limit apart from the DAG freeze handling being immature for so long - as we've improved coverage in canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison it looks like the regressions are not as severe.

Hopefully this will help some of the regression issues in #143102 etc.
2025-07-02 11:28:37 +01:00
Ramkumar Ramachandra
652630b3c9
[ISel/RISCV] Fix fixed-vector [l]lrint lowering (#145898)
Make the fixed-vector lowering of ISD::[L]LRINT use the custom-lowering
routine, lowerVectorXRINT, and fix issues in lowerVectorXRINT related to
this new functionality.
2025-06-30 13:44:34 +01:00
Ramkumar Ramachandra
7ff9669a2e
[ISel/RISCV] Refactor isPromotedOpNeedingSplit (NFC) (#146059) 2025-06-28 11:41:26 +01:00
Ramkumar Ramachandra
2282d4faa0
[ISel/RISCV] Improve code in lowerFCOPYSIGN (NFC) (#146061) 2025-06-27 17:02:38 +01:00
Craig Topper
375af75efb
[RISCV] Simplify the check for when to call EmitLoweredCascadedSelect. NFC (#145930)
Based on the comments and tests, we only want to call
EmitLoweredCascadedSelect on selects of FP registers.

Every time we add a new branch-with-immediate opcode, we've been
excluding it here.

This patch switches to checking that the comparison operands are both
registers so branch on immediate is automatically excluded.
2025-06-27 08:56:49 -07:00
quic_hchandel
950d281eb2
[RISCV] Add ISel patterns for Qualcomm uC Xqcicm extension (#145643)
Add codegen patterns for the conditional move instructions in this
extension
2025-06-27 12:25:48 +05:30
Craig Topper
c8243251cb
[RISCV] Remove separate immediate condition codes from RISCVCC. NFC (#145762)
This wasn't scalable and made the RISCVCC enum effectively just
a different way of spelling the branch opcodes.
    
This patch reduces RISCVCC back down to 6 enum values. The primary user
is select pseudoinstructions which now share the same encoding across
all
vendor extensions. The select opcode and condition code are used to
determine the branch opcode when expanding the pseudo.
    
The Cond SmallVector returned by analyzeBranch now returns the opcode
instead of the RISCVCC. reverseBranchCondition now works directly on
opcodes. getOppositeBranchCondition is also retained.

Stacked on #145622
2025-06-25 23:09:24 -07:00
Craig Topper
6fd182a3bb
[RISCV] Support fixed vector vp.reverse/splice with Zvfhmin/Zvfbfmin. (#145596)
Fix the names of some tests I accidentally misspelled.
2025-06-25 13:47:00 -07:00
Ming Yan
10edc3df99
[RISCV] Try to optimize vp.splice to vslide1up. (#144871)
Fold (vp.splice (insert_elt poison, scalar, 0), vec, 0, mask, 1, vl)
to (vslide1up vec, scalar, mask, vl).

Fold (vp.splice (splat_vector scalar), vec, 0, mask, 1, vl)
to (vslide1up vec, scalar, mask, vl).
2025-06-25 23:03:20 +08:00
Craig Topper
9702d37062
[RISCV] Support scalable vector vp.reverse/splice with Zvfhmin/Zvfbfmin. (#145588) 2025-06-24 15:40:24 -07:00
Craig Topper
7150b2c76a
[RISCV] Optimize vp.splice with 0 offset. (#145533)
We can skip the slidedown if the offset is 0.
2025-06-24 10:02:28 -07:00
Jim Lin
f6ab1f02ec
[RISCV] Support LLVM IR intrinsics for XAndesVBFHCvt (#145321)
This patch adds LLVM IR intrinsic support for XAndesVBFHCvt.

The document for the intrinsics can be found at:
https://github.com/andestech/andes-vector-intrinsic-doc/blob/ast-v5_4_0-release-v5/auto-generated/andes-v5/intrinsic_funcs.adoc#vector-widening-convert-intrinsicsxandesvbfhcvt
https://github.com/andestech/andes-vector-intrinsic-doc/blob/ast-v5_4_0-release-v5/auto-generated/andes-v5/intrinsic_funcs.adoc#vector-narrowing-convert-intrinsicsxandesvbfhcvt

Vector bf16 load/store intrinsics are also enabled when +xandesvbfhcvt is
specified. The corresponding LLVM IR intrinsic testcases will be added in
follow-up patches.

The clang part will be added in a later patch.

Co-authored-by: Tony Chuan-Yue Yuan <yuan593@andestech.com>
2025-06-24 10:19:04 +08:00
Sudharsan Veeravalli
88b98d3367
[RISCV] Add ISel pattern for generating QC_BREV32 (#145288)
The `QC_BREV32` instruction reverses the bit order of `rs1` and writes
the result to `rd`.
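A minimal Python sketch of the operation `QC_BREV32` performs (a full 32-bit bit reversal; the function name is illustrative, not from the patch):

```python
def brev32(x):
    # Reverse the order of all 32 bits: bit i moves to bit 31-i.
    r = 0
    for i in range(32):
        r |= ((x >> i) & 1) << (31 - i)
    return r

assert brev32(0x00000001) == 0x80000000
assert brev32(brev32(0x12345678)) == 0x12345678  # bit reversal is an involution
```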
2025-06-24 07:11:46 +05:30
Sam Elliott
a6eb5eee38
[RISCV][NFC] Remove hasStdExtCOrZca (#145139)
As of 20b5728b7b1ccc4509a316efb270d46cc9526d69, C always enables Zca, so
the check `C || Zca` is equivalent to just checking for `Zca`.

This replaces any uses of `HasStdExtCOrZca` with a new `HasStdExtZca`
(with the same assembler description, to avoid changes in error
messages), and simplifies everywhere where C++ needed to check for
either C or Zca.

The Subtarget function is just deprecated for the moment.
2025-06-23 10:49:47 -07:00
Matt Arsenault
48155f93dd
CodeGen: Emit error if getRegisterByName fails (#145194)
This avoids using report_fatal_error and standardizes the error
message in a subset of the error conditions.
2025-06-23 16:33:35 +09:00
Craig Topper
0c47628515 Re-commit "[RISCV] Properly support RISCVISD::LLA in getTargetConstantFromLoad. (#145112)"
With proper co-author.

Original message:

We need to pass the operand of LLA to GetSupportedConstantPool.

This replaces #142292 with test from there added as a pre-commit
for both medlow and pic.

Co-authored-by: Carl Nettelblad carl.nettelblad@rapidity-space.com
2025-06-21 10:18:49 -07:00
Craig Topper
fc36e47a49 Revert "[RISCV] Properly support RISCVISD::LLA in getTargetConstantFromLoad. (#145112)"
I missed the Co-authored-by that I tried to add.

This reverts commit 1da864b574f699d5c9be68dca9b3969ad50f4803.
2025-06-21 10:18:34 -07:00
Craig Topper
1da864b574
[RISCV] Properly support RISCVISD::LLA in getTargetConstantFromLoad. (#145112)
We need to pass the operand of LLA to GetSupportedConstantPool.
    
This replaces #142292 with test from there added as a pre-commit
for both medlow and pic.
2025-06-21 10:17:30 -07:00
Philip Reames
5886f0a183
[RISCV] Allow larger offset when matching build_vector as vid sequence (#144756)
I happened to notice that when legalizing get.active.lane.mask with
large vectors we were materializing via constant pool instead of just
shifting by a constant.

We should probably be doing a full cost comparison for the different
lowering strategies as opposed to our current adhoc heuristics, but the
few cases this regresses seem pretty minor. (Given the reduction in vset
toggles, they might not be regressions at all.)

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2025-06-20 14:20:17 -07:00
Craig Topper
04e2e581ac
[RISCV] Treat bf16->f32 as separate ExtKind in combineOp_VLToVWOp_VL. (#144653)
This allows us to better track the narrow type we need and to fix
miscompiles if f16->f32 and bf16->f32 extends are mixed.

Fixes #144651.
2025-06-20 10:44:51 -07:00
Pengcheng Wang
ca29c632f0
[RISCV] Support non-power-of-2 types when expanding memcmp
We can convert non-power-of-2 types into extended value types
and then they will be widened.

Reviewers: lukel97

Reviewed By: lukel97

Pull Request: https://github.com/llvm/llvm-project/pull/114971
2025-06-18 16:11:18 +08:00
Craig Topper
f3af1cd08c
[RISCV] Set the exact flag on the SRL created for converting vscale to a read of vlenb. (#144571)
We know that vlenb is a multiple of RVVBytesPerBlock so we aren't
shifting out any non-zero bits.
2025-06-17 16:24:50 -07:00
Philip Reames
391dafd8af
[RISCV] Consolidate both copies of getLMUL1VT [nfc] (#144568)
Put one copy on RISCVTargetLowering as a static function so that both
locations can use it, and rename the method to getM1VT for slightly
improved readability.
2025-06-17 11:28:43 -07:00
Craig Topper
a3d35b87ea
[RISCV] Use RISCV::RVVBitsPerBlock instead of 64 in getLMUL1VT. NFC (#144401) 2025-06-16 11:24:33 -07:00
Ryan Buchner
a59e4acd75
[RISCV] Lower SELECT's with one constant more efficiently using Zicond (#143581)
See #143580 for MR with the test commit.

Performs the following transformations:
(select c, c1, t) -> (add (czero_nez t - c1, c), c1)
(select c, t, c1) -> (add (czero_eqz t - c1, c), c1)
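A Python model (helper names are illustrative, not from the patch) verifying the first transformation: when `c` is nonzero, `czero.nez` zeroes the `t - c1` term and the add yields `c1`; when `c` is zero, the subtraction and addition cancel and the result is `t`.

```python
MASK = (1 << 64) - 1

def czero_nez(rs1, rs2):
    # czero.nez: rd = (rs2 != 0) ? 0 : rs1
    return 0 if rs2 != 0 else rs1

def select_const_true(c, c1, t):
    # (select c, c1, t) -> (add (czero_nez (t - c1), c), c1)
    return (czero_nez((t - c1) & MASK, c) + c1) & MASK

for c in (0, 1):
    for c1, t in ((5, 42), (0, 7), (100, 3)):
        assert select_const_true(c, c1, t) == (c1 if c else t)
```

The `czero_eqz` variant handles the mirrored case where the constant is in the false arm.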


@mgudim
2025-06-13 08:57:46 -04:00
Pengcheng Wang
4903c11a7e
[RISCV] Support memcmp expansion for vectors
This patch adds the support of generating vector instructions for
`memcmp`. This implementation is inspired by X86's.

We convert integer comparisons (eq/ne only) into vector comparisons
and do a vector reduction to get the result.

The range of supported load sizes is (XLEN, VLEN * LMUL8] and
non-power-of-2 types are not supported.

Fixes #143294.
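The actual lowering uses RVV vector loads, compares, and a reduction; this Python sketch (function name illustrative) only models the eq/ne semantics being expanded — XOR the lanes and OR-reduce the mismatches:

```python
def memcmp_eq(a: bytes, b: bytes) -> bool:
    # eq/ne-only memcmp: any nonzero XOR lane means the buffers differ.
    assert len(a) == len(b)
    mismatch = 0
    for x, y in zip(a, b):
        mismatch |= x ^ y
    return mismatch == 0

assert memcmp_eq(b"hello", b"hello")
assert not memcmp_eq(b"hello", b"hellp")
```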

Reviewers: lukel97, asb, preames, topperc, dtcxzyw

Reviewed By: topperc, lukel97

Pull Request: https://github.com/llvm/llvm-project/pull/114517
2025-06-13 14:31:48 +08:00
Serge Pavlov
953a778fab
[RISCV][FPEnv] Lowering of fpenv intrinsics (#141498)
The change implements custom lowering of `get_fpenv`, `set_fpenv` and
`reset_fpenv` for RISCV target.
2025-06-11 19:08:23 +07:00
Jim Lin
bfe0967603 [RISCV] Remove the TODO for vqdotsu. NFC.
It has been supported by #141267.
2025-06-10 14:08:37 +08:00
Jim Lin
6881c7d5fa
[RISCV] Don't select sh{1,2,3}add if shl doesn't have one use (#143351)
Try to fix https://github.com/llvm/llvm-project/pull/130829#pullrequestreview-2730533158.
There's no benefit if shl doesn't have one use.
2025-06-10 13:34:12 +08:00
Philip Reames
2680afb76b
[RISCV] Migrate zvqdotq reduce matching to use partial_reduce infrastructure (#142212)
This involves a codegen regression at the moment due to the issue
described in 443cdd0b, but this aligns the lowering paths for this case
and makes it less likely future bugs go undetected.
2025-06-09 17:47:08 -07:00
Philip Reames
939666380f
[SDAG] Add partial_reduce_sumla node (#141267)
We have recently added the partial_reduce_smla and partial_reduce_umla
nodes to represent Acc += ext(a) * ext(b), where the two extends have to
have the same source type and the same extend kind.

For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which
correspond to the existing nodes, but we also have vqdotsu, which
represents the case where the two extends are sign and zero respectively
(i.e. not the same kind of extend).

This patch adds a partial_reduce_sumla node which has sign extension for
A, and zero extension for B. The addition is somewhat mechanical.
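A Python sketch of the mixed-extension accumulation the new node models (the helper names are illustrative; the real node operates on vector lanes, this just shows the per-element arithmetic):

```python
def sext8(x):
    # Sign-extend an 8-bit value to a Python int.
    x &= 0xFF
    return x - 256 if x & 0x80 else x

def zext8(x):
    # Zero-extend an 8-bit value.
    return x & 0xFF

def partial_reduce_sumla(acc, a_bytes, b_bytes):
    # Acc += sext(a) * zext(b) for each lane pair (vqdotsu-style).
    for a, b in zip(a_bytes, b_bytes):
        acc += sext8(a) * zext8(b)
    return acc

assert partial_reduce_sumla(0, [0x80], [0x02]) == -256   # (-128) * 2
assert partial_reduce_sumla(10, [0xFF], [0x01]) == 9     # 10 + (-1) * 1
```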
2025-06-09 07:17:45 -07:00
Jesse Huang
893fa06280
[RISC-V] Adjust trampoline code for branch control flow protection (#141949)
The trampoline will use an alternative sequence when branch CFI is on.
The stack in the test is organized as follows:
```
   56 $ra
   44 $a0      f
   36 $a1      p
   32 00038067 jalr  t2
   28 010e3e03 ld    t3, 16(t3)
   24 018e3383 ld    t2, 24(t3)
   20 00000e17 auipc t3, 0
sp+16 00000023 lpad  0
```
2025-06-07 23:51:08 +08:00
Jim Lin
f8df24015a
[RISCV] Don't commute with shift if XAndesPerf is enabled (#142920)
More nds.lea.{h,w,d} are generated, similar to sh{1,2,3}add
2025-06-06 11:08:23 +08:00
Philip Reames
88738a74f0
[RISCV] Optimize two source deinterleave2 via ri.vunzip2{a,b} (#142667)
As done for the existing vnsrl cases, we can split a two source
deinterleave2 into two single source deinterleave2 and a slideup.  
We can also use a concat-then-deinterleave2 tactic. Both are equally
valid (except in the m8 source type case), and the 
concat-then-deinterleave2 saves one instruction for fractional LMUL cases.

Additionally, if we happen to know the exact VLEN and our fixed vectors
are an even number of vector registers, we can avoid the need to split or
concat entirely and just use both register sources.
2025-06-03 20:18:55 -07:00
Sam Elliott
2863c640fa
[RISCV][NFC] Simplify ISD::SELECT Legality (#142650)
ISD::SELECT is legal by default, so this change to the conditional makes
it clearer that XTHeadCondMov and XMipsCMove both leave this operation
legal rather than custom expanding it.
2025-06-03 12:49:42 -07:00
Philip Reames
7ced3281ee
[RISCV] Use ri.vunzip2{a,b} for e64 fixed length deinterleave(2) shuffles (#137217)
If we have xrivosvizip, we can use the vunzip2{a,b} instructions for
these cases *provided* that we can prove the layout in the two registers
matches the fixed length semantics.

The majority of this patch is a straight-forward port of the existing
vnsrl logic which has the same requirement (though for slightly
different reasoning).

The one complicated bit is the addition of the scalable splitting logic
inside lowerVZIP to exploit the independent register operands, and allow
the use of lower LMUL. This bit is annoyingly complicated, and really
"should" be a DAG combine - except that the VL and mask reduction
becomes hard when it's not known to be a constant.
2025-06-03 10:16:53 -07:00
Craig Topper
40e1f7d1e7
[RISCV] Use llvm::is_contained. NFC (#142239) 2025-05-30 22:01:54 -07:00
Kazu Hirata
4a7b53f040 [RISCV] Fix a warning
This patch fixes:

  llvm/lib/Target/RISCV/RISCVISelLowering.cpp:8411:7: error: unused
  variable 'ArgVT' [-Werror,-Wunused-variable]
2025-05-30 14:37:25 -07:00
Philip Reames
443cdd0b48
[RISCV] Fix a bug in partial.reduce lowering for zvqdotq .vx forms (#142185)
I'd missed a bitcast in the lowering. Unfortunately, that bitcast
happens to be semantically required here as the partial_reduce_* source
expects an i8 element type, but the pseudos and patterns expect an i32
element type.

This appears to only influence the .vx matching from the cases I've
found so far, and LV does not yet generate anything which will exercise
this. The reduce path (instead of the partial.reduce one) used by SLP
currently manually constructs the i32 value, and then goes directly to
the pseudo's with their i32 arguments, not the partial_reduce nodes.

We're basically losing the .vx matching on this path until we teach
splat matching to manually splat the i8 value into an i32 via
LUI/ADDI.
2025-05-30 11:05:43 -07:00
Philip Reames
1651aa2943
[SDAG] Split the partial reduce legalize table by opcode [nfc] (#141970)
On its own, this change should be non-functional. This is a preparatory
change for https://github.com/llvm/llvm-project/pull/141267 which adds a
new form of PARTIAL_REDUCE_*MLA. As noted in the discussion on that
review, AArch64 needs a different set of legal and custom types for the
PARTIAL_REDUCE_SUMLA variant than the currently existing
PARTIAL_REDUCE_UMLA/SMLA.
2025-05-29 14:05:31 -07:00
Craig Topper
8e2641a97f
[RISCV] Add ORC_B to SimplifyDemandedBitsForTargetNode. (#141975) 2025-05-29 12:33:36 -07:00
Craig Topper
dce490e529
[RISCV] Custom type legalize MVT::i8 BITREVERSE to BREV8. (#142001)
If we're only reversing a single byte, we can use BREV8 directly.

If we let it type legalize we'll get (srl (bitreverse X), XLen-8). In op
legalization, we'll expand that to (srl (brev8 (bswap X)), XLen - 8).
Then, SimplifyDemandedBits can reduce it to (srl (brev8 (shl X, XLen -
8)), XLen - 8). We could add a DAGCombine to pull the shl through the
brev8 to put it next to the srl which will allow it to become (and
(brev8 X), 255). Unless we can prove the upper XLen-8 bits are 0 or that
they aren't demanded, we can't remove the `and`.

By emitting BREV8 directly when we still know the type is i8, we can
avoid this. We already DAGCombine i16 and i32 (bitreverse (bswap X)) to
BREV8 early for the same reason.

I've added an i7 test case so we can still see the opportunity for
improvement on weird sizes.

Fixes the RISC-V part of #141863.
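A Python model of the brev8 operation being targeted (reversing the bits within each byte independently; the function names are illustrative). An i8 BITREVERSE is exactly brev8 applied to a value living in the low byte, which is why emitting it directly while the type is still i8 avoids the shift dance described above:

```python
def brev_byte(b):
    # Reverse the 8 bits of one byte.
    r = 0
    for i in range(8):
        r |= ((b >> i) & 1) << (7 - i)
    return r

def brev8(x, xlen=64):
    # brev8 reverses the bits within each byte of the register independently.
    out = 0
    for byte in range(xlen // 8):
        out |= brev_byte((x >> (8 * byte)) & 0xFF) << (8 * byte)
    return out

assert brev8(0x01) == 0x80          # i8 bitreverse of 0x01
assert brev8(0xB1) == 0x8D          # 1011_0001 -> 1000_1101
```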
2025-05-29 12:33:16 -07:00
Craig Topper
67a0844812
[RISCV] Add BREV8 to SimplifyDemandedBitsForTargetNode. (#141898) 2025-05-29 08:56:48 -07:00
Marius Kamp
10647685ca
[SDAG] Make Select-with-Identity-Fold More Flexible; NFC (#136554)
This change adds new parameters to the method
`shouldFoldSelectWithIdentityConstant()`. The method now takes the
opcode of the select node and the non-identity operand of the select
node. To gain access to the appropriate arguments, the call of
`shouldFoldSelectWithIdentityConstant()` is moved after all other checks
have been performed. Moreover, this change adjusts the precondition of
the fold so that it would work for `SELECT` nodes in addition to
`VSELECT` nodes.
    
No functional change is intended because all implementations of
`shouldFoldSelectWithIdentityConstant()` are adjusted such that they
restrict the fold to a `VSELECT` node; the same restriction as before.
    
The rationale of this change is to make more fine-grained decisions
possible about when to revert the InstCombine canonicalization of
`(select c (binop x y) y)` to `(binop (select c x idc) y)` in the
backends.
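A small Python check of why the fold is sound (using add with identity constant 0 as the binop; the function names are illustrative): selecting the identity constant into the binop's operand yields the same result as selecting around the whole binop.

```python
def orig_form(c, x, y):
    # (select c, (binop x, y), y)
    return (x + y) if c else y

def folded_form(c, x, y, idc=0):
    # (binop (select c, x, idc), y), with idc the identity for the binop
    return (x if c else idc) + y

for c in (True, False):
    for x, y in ((3, 4), (-1, 7), (0, 0)):
        assert folded_form(c, x, y) == orig_form(c, x, y)
```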
2025-05-29 09:46:39 +02:00
Philip Reames
77a3f81dc4
[RISCV] Custom lower fixed length partial.reduce to zvqdotq (#141180)
This is a follow-on to 9b4de7 which handles the fixed vector cases. In
retrospect, this is simple enough that it probably should have just been
part of the original commit, but oh well.
2025-05-23 13:56:49 -07:00
Rahul Joshi
52c2e45c11
[NFC][CodeGen] Adopt MachineFunctionProperties convenience accessors (#141101) 2025-05-23 08:30:29 -07:00
Harald van Dijk
86d1d4eacb
[RISC-V] Allow intrinsics to be used with any pointer type. (#139634)
RISC-V does not use address spaces and leaves them available for user
code to make use of. Intrinsics, however, required pointer types to use
the default address space, which complicated lowering when non-default
address spaces are used. When the intrinsics are overloaded, this is
handled without extra effort.

This commit does not yet update Clang builtin functions to also permit
pointers to non-default address spaces.
2025-05-23 09:40:27 +01:00