159 Commits

Author SHA1 Message Date
Harrison Hao
8c14d3f44f
[MISched] Use SchedRegion in overrideSchedPolicy and overridePostRASchedPolicy (#149297)
This patch updates `overrideSchedPolicy` and `overridePostRASchedPolicy`
to take a
`SchedRegion` parameter instead of just `NumRegionInstrs`. This provides
access to both the
instruction range and the parent `MachineBasicBlock`, which enables
looking up function-level
attributes.

With this change, targets can select post-RA scheduling direction per
function using a function
attribute. For example:

```cpp
void overridePostRASchedPolicy(MachineSchedPolicy &Policy,
                               const SchedRegion &Region) const {
  const Function &F = Region.RegionBegin->getMF()->getFunction();
  Attribute Attr = F.getFnAttribute("amdgpu-post-ra-direction");
  ...
}
2025-07-22 15:55:12 +08:00
Sam Elliott
a6eb5eee38
[RISCV][NFC] Remove hasStdExtCOrZca (#145139)
As of 20b5728b7b1ccc4509a316efb270d46cc9526d69, C always enables Zca, so
the check `C || Zca` is equivalent to just checking for `Zca`.

This replaces any uses of `HasStdExtCOrZca` with a new `HasStdExtZca`
(with the same assembler description, to avoid changes in error
messages), and simplifies everywhere where C++ needed to check for
either C or Zca.

The Subtarget function is just deprecated for the moment.
2025-06-23 10:49:47 -07:00
Jim Lin
483d19619c
[RISCV] Add tune features for Andes 45 series cpus (#143899)
Add tune features TuneNoDefaultUnroll, TuneShortForwardBranchOpt and 
TunePostRAScheduler for Andes 45 series cpus.
2025-06-13 14:26:50 +08:00
Craig Topper
a0b6cfd975
[RISCV] Add MC layer support for XSfmm*. (#133031)
This adds assembler/disassembler support for XSfmmbase 0.6 and related
SiFive matrix multiplication extensions based on the spec here
https://www.sifive.com/document-file/xsfmm-matrix-extensions-specification

Functionality-wise, this is the same as the Zvma extension proposal that
SiFive shared with the Attached Matrix Extension Task Group. The
extension names and instruction mnemonics have been changed to use
vendor prefixes.

Note this is a non-conforming extension as the opcodes used here are in
the standard opcode space in OP-V or OP-VE.

---------

Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
2025-05-21 08:26:35 -07:00
Pengcheng Wang
95e38baf24
[RISCV] Enable prefetching writes (#130561)
We should prefetch writes since `Zicbop` has `prefetch.w`.
2025-03-11 11:39:21 +08:00
Petr Penzin
b44fbdee00
[RISCV] Tune flag for fast vrgather.vv (#124664)
Add tune knob for N*Log2(N) vrgather.vv cost.
2025-03-03 16:04:49 -08:00
Craig Topper
6e14d75f54 [RISCV] Fix some implicit conversions from Register to unsigned. NFC 2025-02-05 17:06:10 -08:00
Djordje Todorovic
0cb7636a46
[RISCV] Add MIPS extensions (#121394)
Adding two extensions for MIPS p8700 CPU:
  1. cmove (conditional move)
  2. lsp (load/store pair)

The official product page here:
https://mips.com/products/hardware/p8700
2025-01-28 08:04:09 +01:00
Pengcheng Wang
2c782ab271
[RISCV] Add software pipeliner support (#117546)
This patch adds basic support of `MachinePipeliner` and disable
it by default.
    
The functionality should be OK and all llvm-test-suite tests have
passed.
2024-12-19 13:00:08 +08:00
Djordje Todorovic
cedc9bf94a
[RISCV] Add MIPSP8700 RISCVProcFamilyEnum (#120073) 2024-12-16 16:06:53 +01:00
Sergei Barannikov
03847f19f2
[SelectionDAG] Add empty implementation of SelectionDAGInfo to some targets (#119968)
#119969 adds a couple of new methods to this class, which will need to
be overridden by these targets.

Part of #119709.

Pull Request: https://github.com/llvm/llvm-project/pull/119968
2024-12-16 15:13:46 +03:00
Pengcheng Wang
9571d2023b
[RISCV] Add tune info for postra scheduling direction (#115864)
The results differ on different platforms so it is really hard to
determine a common default value.
    
Tune info for postra scheduling direction is added and CPUs can
set their own preferable postra scheduling direction.
2024-12-16 12:18:38 +08:00
Min-Yih Hsu
ea76b2d8d8
[XRay][RISCV] RISCV support for XRay (#117368)
Add RISC-V support for XRay. The RV64 implementation has been tested in
both QEMU and in our hardware environment.

Currently this requires D and C extensions, but since both RV64GC and
RVA22/RVA23 are becoming mainstream, I don't think this requirement will
be a big problem.

Based on the previous work by @a-poduval :
https://reviews.llvm.org/D117929

---------

Co-authored-by: Ashwin Poduval <ashwin.poduval@gmail.com>
2024-12-10 17:57:04 -08:00
Michael Maitland
84efad0b47
[RISCV][MRI] Account for fixed registers when determining callee saved regs (#115756)
This fixes
https://discourse.llvm.org/t/fixed-register-being-spill-and-restored-in-clang/83058.

We need to do it in `MachineRegisterInfo::getCalleeSavedRegs` instead of
`RISCVRegisterInfo::getCalleeSavedRegs` since the MF argument of
`TargetRegisterInfo:::getCalleeSavedRegs` is `const`, so we can't call
`MF->getRegInfo().disableCalleeSavedRegister` there.

So to put it in `MachineRegisterInfo::getCalleeSavedRegs`, we move
`isRegisterReservedByUser` into `TargetSubtargetInfo`.
2024-12-06 14:07:27 -05:00
Pengcheng Wang
35619c791d
[RISCV] Add tune info for mem* expansion (#118439)
So that CPUs can tune these options.
2024-12-06 14:48:37 +08:00
Pengcheng Wang
6633916ef5
[RISCV] Remove getPostRAMutations (#117527)
We are using `PostMachineScheduler` instead of `PostRAScheduler`
since #68696.

The hook `getPostRAMutations` is only used in `PostRAScheduler` so
it is actually dead code for RISC-V now.
2024-11-26 10:55:43 +08:00
Pengcheng Wang
9122c5235e
[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)
This is based on other targets like PPC/AArch64 and some experiments.

This PR will only enable bidirectional scheduling and tracking register
pressure.

Disclaimer: I haven't tested it on many cores, maybe we should make
some options being features. I believe downstreams must have tried
this before, so feedbacks are welcome.
2024-11-15 17:53:14 +08:00
Philip Reames
a905203b9e
[RISCV] Prefer strided load for interleave load with only one lane active (#115069)
If only one of the elements is actually used, then we can legally use a
strided load in place of the segment load. Doing so reduces vector
register pressure, so if both segment and strided are believed to be
element/segment at a time, then prefer the strided load variant.

Note that I've seen the vectorizer emitting wide interleave loads to
represent a strided load, so this does happen in practice. It doesn't
matter much for small LMUL*NF, but at large NF can start causing
problems in register allocation.

Note that this patch only covers the fixed vector formation cases. In
theory, we should do the same patch for scalable, but we can currently
only represent NF2 in scalable IR, and NF2 is assumed to be optimized to
better than segment-at-a-time by default, so there's currently nothing
to do.
2024-11-05 16:15:20 -08:00
Nikita Popov
dfa54298ff
[InitUndef] Enable the InitUndef pass on non-AMDGPU targets (#108353)
The InitUndef pass works around a register allocation issue, where undef
operands can be allocated to the same register as early-clobber result
operands. This may lead to ISA constraint violations, where certain
input and output registers are not allowed to overlap.

Originally this pass was implemented for RISCV, and then extended to ARM
in #77770. I've since removed the target-specific parts of the pass in
#106744 and #107885. This PR reduces the pass to use a single
requiresDisjointEarlyClobberAndUndef() target hook and enables it by
default. The hook is disabled for AMDGPU, because overlapping
early-clobber and undef operands are known to be safe for that target,
and we get significant codegen diffs otherwise.

The motivating case is the one in arm64-ldxr-stxr.ll, where we were
previously incorrectly allocating a stxp input and output to the same
register.
2024-09-16 09:48:25 +02:00
Pengcheng Wang
27b608055f
[RISCV] Increase default tail duplication threshold to 6 at -O3 (#98873)
This is just like AArch64.

Changing the threshold to 6 will increase the code size, but will
also decrease unconditional branches. CPUs with wide fetch/issue units
can benefit from it.

The value 6 may be debatable, we can set it to `SchedModel.IssueWidth`.
2024-08-01 12:24:25 +08:00
Craig Topper
43de4e03a3
[RISCV] Rename hasVInstructionsBF16 to hasVInstructionsBF16Minimal. NFC (#101080)
This makes it more consistent with Zvfhmin since it is not a complete
bf16 implementation.
2024-07-29 21:55:42 -07:00
Craig Topper
1819323781
[RISCV] Store a std::unique_ptr<RISCVRegisterBankInfo> in RISCVSubtarget. NFC (#98375)
Instead of std::unique_ptr<RegisterBankInfo>. This allows us to return a
RISCVRegisterBankInfo* from getRegBankInfo so we can avoid a
static_cast.

This does require an additional header file to be included in
RISCVSubtarget.h, but I don't think it's a big deal.
2024-07-10 14:17:56 -07:00
Michael Maitland
24619f6aaf
[RISCV][GISEL] Do not initialize GlobalISel objects unless needed (#98233)
Prior to this commit, we created the GlobalISel objects in the
RISCVSubtarget constructor, even if we are not running GlobalISel. This
patch moves creation of the GlobalISel objects into their getters, which
ensures that we only create these objects if they are actually needed.
This helps since some of the constructors of the GlobalISel objects have
a significant amount of code.

We make the `unique_ptr`s `mutable` since GlobalISel passes only have
access to `const TargetSubtargetInfo` through `MF.getSubtarget()`.

This patch is tested by the fact that all existing RISC-V GlobalISel
tests remain passing.
2024-07-10 15:12:58 -04:00
Michael Maitland
66b5f16b2f
[RISCV] Do not check PostRAScheduler in enablePostRAScheduler (#92781)
On RISC-V, there are a few ways to control whether the
PostMachineScheduler is enabled. If `-enable-post-misched` is passed or
passed with a value of true, then the PostMachineScheduler is enabled.
If it is passed with a value of false then the PostMachineScheduler is
disabled. If the option is not passed at all, then
`RISCVSubtarget::enablePostRAMachineScheduler` decides whether the pass
should be enabled or not. `TargetSubtargetInfo::enablePostRAScheduler`
and `TargetSubtargetInfo::enablePostRAMachineScheduler` who check the
SchedModel value are not called by RISC-V backend.

`RISCVSubtarget::enablePostRAMachineScheduler` currently checks if the
active scheduler model sets `PostRAScheduler`. If it is set to true by
the scheduler model, then the pass is enabled. If it is not set to true
by the scheduler model, then the value of `UsePostRAScheduler` subtarget
feature is used.

I argue that the RISC-V backend should not use `PostRAScheduler` field
of the scheduler model to control whether the PostMachineScheduler is
enabled for the following reasons:

1. No other targets use this value to control whether
PostMachineScheduler is enabled. They only use it to check whether the
legacy PostRASchedulerList scheduler is enabled.

2. We can add the `UsePostRAScheduler` feature to the processor
definition in RISCVProcessors.td to tie a processor to whether the pass
should be enabled by default. This makes the feature and the sched model
field redundant.

3. Since these options are redundant, we should prefer the feature,
since we can set `+` and `-` on the feature, but the value of the
scheduler cannot be controlled on the command line.

4. Keeping both options allows us to set the feature and the scheduler
model value to conflicting values. Although the scheduler model value
will win out, it feels awkward to allow it.
2024-05-24 14:31:14 -04:00
Luke Lau
f565b79f9f
[RISCV] Handle fixed length vectors with exact VLEN in lowerINSERT_SUBVECTOR (#84107)
This is the insert_subvector equivalent to #79949, where we can avoid
sliding up by the full LMUL amount if we know the exact subregister the
subvector will be inserted into.

This mirrors the lowerEXTRACT_SUBVECTOR changes in that we handle this
in two parts:

- We handle fixed length subvector types by converting the subvector to
a scalable vector. But unlike EXTRACT_SUBVECTOR, we may also need to
convert the vector being inserted into too.

- Whenever we don't need a vslideup because either the subvector fits
exactly into a vector register group *or* the vector is undef, we need
to emit an insert_subreg ourselves because RISCVISelDAGToDAG::Select
doesn't correctly handle fixed length subvectors yet: see d7a28f7ad

A subvector exactly fits into a vector register group if its size is a
known multiple of the size of a vector register, and this adds a new
overload for TypeSize::isKnownMultipleOf for scalable to scalable
comparisons to help reason about this.

I've left RISCVISelDAGToDAG::Select untouched for now (minus relaxing an
invariant), so that the insert_subvector and extract_subvector code
paths are the same.

We should teach it to properly handle fixed length subvectors in a
follow-up patch, so that the "exact subregsiter" logic is handled in one
place instead of being spread across both RISCVISelDAGToDAG.cpp and
RISCVISelLowering.cpp.
2024-05-01 01:35:13 +08:00
Paul Kirth
bffc0b6569
[RISCV][NFC] Add isTargetAndroid API in RISCVSubtarget (#87671)
This is required to set target specific code generation options for
Android,
like using the TLS slot for the stack protector.
2024-04-04 15:21:59 -07:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illeagal behaviour for these intructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber restraints are followed when using the ARM
Architecture. Making the pass target independent will also open up
possibility that support other architectures can be added in the future.
2024-02-26 12:12:31 +00:00
Yeting Kuo
7e97ae35ae
[RISCV] Teach RISCVMakeCompressible handle Zca/Zcf/Zce/Zcd. (#81844)
Make targets which don't have C but have Zca/Zcf/Zce/Zcd benefit from
this pass.
2024-02-22 15:51:19 +08:00
Philip Reames
8603a7b21f [RISCV] Add a query for exact VLEN to RISCVSubtarget [nfc]
We've now got enough of these in tree that we can see which patterns
appear to be idiomatic.  As such, extract a helper for checking
if we know the exact VLEN.
2024-02-20 17:27:47 -08:00
Wang Pengcheng
3fdb431b63
[RISCV] Use TableGen-based macro fusion (#72224)
We convert existed macro fusions to TableGen.
    
Bacause `Fusion` depend on `Instruction` definitions which is defined
below `RISCVFeatures.td`, so we recommend user to add fusion features
when defining new processor.
2024-01-25 17:10:49 +08:00
Craig Topper
faa326de97
[RISCV] Add branch+c.mv macrofusion for sifive-p450. (#76169)
sifive-p450 supports a very restricted version of the short forward
branch optimization from the sifive-7-series.

For sifive-p450, a branch over a single c.mv can be macrofused as a
conditional move operation. Due to encoding restrictions on c.mv, we
can't conditionally move from X0. That would require c.li instead.
2024-01-08 15:23:26 -08:00
Wang Pengcheng
f9c908862a
[RISCV] Split TuneShiftedZExtFusion (#76032)
We split `TuneShiftedZExtFusion` into three fusions to make them
reusable and match the GCC implementation[1].

The zexth/zextw fusions can be reused by XiangShan[2] and other
commercial processors, but shifted zero extension is not so common.

`macro-fusions-veyron-v1.mir` is renamed so it's not relevant to
specific processor.

References:
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637303.html
[2] https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/decode
2023-12-22 14:37:26 +08:00
Yeting Kuo
cdc0392669
[RISCV] Update implies for subtarget feature. (#75824)
PR #75576 and #75735 update some implies in
llvm/lib/Support/RISCVISAInfo.cpp, but both of them miss the subtarget
feature part.
This patch still preserve predicate HasStdExtZfhOrZfhmin and
HasStdExtZhinxOrZhinxmin, since they could make error message more
readable. ( Users might not know that zfh implies zfhmin.)
2023-12-19 09:47:46 +08:00
Mikhail Gudim
29ee66f4a0
[RISCV] Macro-fusion support for veyron-v1 CPU. (#70012)
Support was added for the following fusions:
  auipc-addi, slli-srli, ld-add
Some parts of the code became repetative, so small refactoring of
existing lui-addi fusion was done.
2023-12-11 16:34:13 -05:00
Kazu Hirata
b85f1f9b18 [Target] Include bitset (NFC)
These files are relying on the transitive include of <bitset> from
GIMatchTableExecutor.h, which doesn't actually use std::bitset.
2023-12-09 18:34:57 -08:00
Wang Pengcheng
5973272af7
[RISCV] Add MinimumJumpTableEntries to TuneInfo (#72963)
This is like what AArch64 has done in #71166 except that we don't
handle `HasMinSize` case now.
2023-11-23 14:05:23 +08:00
Wang Pengcheng
9bb69c1d96
[RISCV] Enable LoopDataPrefetch pass (#66201)
So that we can benefit from data prefetch when `Zicbop` extension is
supported.

Tune information for data prefetching are added in `RISCVTuneInfo`.
2023-11-10 15:39:58 +08:00
Mikhail Gudim
ae7f7f2ef2
[RISCV] Add TuneVentanaVeyron subtarget feature. (#70414)
This will be used to add veyron fusions in a later commit.
2023-10-31 00:37:30 -04:00
Wang Pengcheng
f3c92a06b9
[RISCV] Make PostRAScheduler a target feature (#68692)
This is what AArch64 has done in https://reviews.llvm.org/D20762.

Tests are added in macro fusion tests, which uncover a bug that
DAG mutations don't take effect.
2023-10-11 10:51:03 +08:00
Wang Pengcheng
08165c444e
[RISCV] Add searchable table for tune information (#66193)
There are many information that can be used for tuning, like
alignments, cache line size, etc. But we can't make all of them
`SubtargetFeature` because some of them are not with enumerable
value, for example, `PrefetchDistance` used by `LoopDataPrefetch`.

In this patch, a searchable table `RISCVTuneInfoTable` is added,
in which each entry contains the CPU name and all tune information
defined in `RISCVTuneInfo`. Each field of `RISCVTuneInfo` should
have a default value and processor definitions can override the
default value via `let` statements.

We don't need to define a `RISCVTuneInfo` for each processor and
it will use the default value (which is for `generic`) if no
`RISCVTuneInfo` defined.

For processors in the same series, a subclass can inherit from
`RISCVTuneInfo` and override the fields. And we can also override
the fields in processor definitions if there are some differences
in the same processor series.

When initilizing `RISCVSubtarget`, we will use `TuneCPU` as the
key to serach the tune info table. So, the behavior here is if
we don't specify the tune CPU, we will use specified `CPU`, which
is expected I think. 

This patch almost undoes 61ab106, in which I added tune features
of preferred function/loop alignments. More tune information can
be added in the future.
2023-09-26 12:26:35 +08:00
Philip Reames
faed70d38f [RISCV] Remove XLen field from RISCVSubtarget [nfc]
The isRV64 field contains the same information, and we can derive XLen
from that.

Differential Revision: https://reviews.llvm.org/D159306
2023-09-01 07:42:03 -07:00
Philip Reames
3e89aca446 [RISCV] Rename getELEN to getELen [nfc]
Let's follow the naming scheme use for DLen, XLen, and FLen.
2023-08-31 11:27:00 -07:00
Philip Reames
1c43aa44d8 [RISCV] Kill off redundant field XLenVT [nfc]
We're already tracking XLen, we can compute XLenVt from that.  Note that XLen itself should probably be driven from IsRV64 (the processor flag), but I'm leaving that to a separate change (with review).
2023-08-31 11:20:06 -07:00
Jianjian GUAN
759903568f [RISCV] Add Zvfhmin extension support for llvm RISCV backend
This patch supports Zvfhmin for RISCV codegen.

Reviewed By: michaelmaitland

Differential Revision: https://reviews.llvm.org/D151414
2023-08-23 16:47:47 +08:00
Yunze Zhu
5f73d2b780 [RISCV] Enable alias analysis by default
In llvm alias analysis is off by default now.
This patch enable alias analysis on RISCV target during code generation by default,
and this makes more chances for improving performance.
Modified related test cases.

Differential Revision: https://reviews.llvm.org/D157250
2023-08-10 10:48:43 +08:00
Craig Topper
54e8cfe6d6 [RISCV] Simplify some predicate functions in RISCVSubtarget.h. NFC
Remove some redundancy.

HasStdExtZve32f implies HasStdExtF
HasStdExtZve64d implies HasStdExtD
HasStdExtZvfbfwma implies HasStdExtZvfbfmin
2023-07-30 12:37:50 -07:00
Jun Sha (Joshua)
e56bf13317 [RISCV] Remove some instructions from Zvfbfwma by implying Zfbfmin according to the latest spec
According to the latest spec, Zvfbfwma requires Zvfbfmin and Zvfbfmin requires Zfbfmin, with FLH/FSH/FMV.H.X/HMV.X.H removed from Zvfbfwma.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D155916
2023-07-28 15:52:03 +08:00
Jun Sha (Joshua)
2b6df4a336 [RISCV] Add codegen support for bf16 vector
This patch adds codegen support for vector with bfloat16 type in llvm backend.
With this patch, Zvbfmin/Zvbfwma instructions as well as vle16/vse16 can generated from newly added bf16 IR intrinsics.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156287
2023-07-28 09:54:23 +08:00
Craig Topper
6ac2ce7d84 [RISCV] Introduce the concept of DLEN(data path width) into getLMULCost.
SiFive's x280 CPU has a vector unit that VLEN/2 bits wide. This
means that LMUL=1 operations take 2 to process all VLEN bits.

This patch adds a DLenFactor tuning parameter and applies it to
TuneSiFive7. getLMULCost has been updated to use this factor in
its calculations. I've added an x280 command line to one cost
model test to demonstrate the effect.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D152421
2023-06-13 16:09:25 -07:00
Alex Bradbury
c4efcd6970 [RISCV] Generalise shouldExtendTypeInLibcall logic to apply to all <XLEN floats on soft ABIs
This results in improved codegen for half/bf16 libcalls on soft ABIs

Adds a RISCVSubtarget helper method for determining if a soft FP ABI is
being targeted (future bf16 related patches make use of this).

Differential Revision: https://reviews.llvm.org/D151434
2023-05-30 11:04:03 +01:00