This patch updates `overrideSchedPolicy` and `overridePostRASchedPolicy` to take a `SchedRegion` parameter instead of just `NumRegionInstrs`. This provides access to both the instruction range and the parent `MachineBasicBlock`, which enables looking up function-level attributes.
With this change, targets can select the post-RA scheduling direction per function using a function attribute. For example:
```cpp
void overridePostRASchedPolicy(MachineSchedPolicy &Policy,
                               const SchedRegion &Region) const {
  const Function &F = Region.RegionBegin->getMF()->getFunction();
  Attribute Attr = F.getFnAttribute("amdgpu-post-ra-direction");
  ...
}
```
As of 20b5728b7b1ccc4509a316efb270d46cc9526d69, C always enables Zca, so
the check `C || Zca` is equivalent to just checking for `Zca`.
This replaces all uses of `HasStdExtCOrZca` with a new `HasStdExtZca` (with the same assembler description, to avoid changes in error messages), and simplifies the places where C++ code needed to check for either C or Zca.
The Subtarget function is just deprecated for the moment.
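As a rough sketch of the C++ simplification this enables (a hypothetical call site; `emitCompressedSequence` is made up for illustration):
```cpp
// Before: both features had to be checked, since Zca is the common
// subset of C.
if (Subtarget.hasStdExtC() || Subtarget.hasStdExtZca())
  emitCompressedSequence(); // hypothetical helper

// After: C always implies Zca, so a single check covers both.
if (Subtarget.hasStdExtZca())
  emitCompressedSequence();
```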
This adds assembler/disassembler support for XSfmmbase 0.6 and related SiFive matrix multiplication extensions, based on the spec here:
https://www.sifive.com/document-file/xsfmm-matrix-extensions-specification
Functionality-wise, this is the same as the Zvma extension proposal that
SiFive shared with the Attached Matrix Extension Task Group. The
extension names and instruction mnemonics have been changed to use
vendor prefixes.
Note this is a non-conforming extension as the opcodes used here are in
the standard opcode space in OP-V or OP-VE.
---------
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
This patch adds basic support for `MachinePipeliner` and disables it by default.
The functionality should be OK and all llvm-test-suite tests have
passed.
The results differ on different platforms so it is really hard to
determine a common default value.
Tune info for the post-RA scheduling direction is added and CPUs can set their own preferred post-RA scheduling direction.
Add RISC-V support for XRay. The RV64 implementation has been tested in
both QEMU and in our hardware environment.
Currently this requires the D and C extensions, but since both RV64GC and RVA22/RVA23 are becoming mainstream, I don't think this requirement will be a big problem.
Based on the previous work by @a-poduval:
https://reviews.llvm.org/D117929
---------
Co-authored-by: Ashwin Poduval <ashwin.poduval@gmail.com>
This fixes
https://discourse.llvm.org/t/fixed-register-being-spill-and-restored-in-clang/83058.
We need to do it in `MachineRegisterInfo::getCalleeSavedRegs` instead of `RISCVRegisterInfo::getCalleeSavedRegs` since the MF argument of `TargetRegisterInfo::getCalleeSavedRegs` is `const`, so we can't call `MF->getRegInfo().disableCalleeSavedRegister` there.
So to put it in `MachineRegisterInfo::getCalleeSavedRegs`, we move
`isRegisterReservedByUser` into `TargetSubtargetInfo`.
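A minimal sketch of the idea (assumed shape, not the exact upstream change; `filterUserReservedCSRs` is a made-up name for illustration):
```cpp
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
using namespace llvm;

// Sketch: drop user-reserved registers from the callee-saved list.
// This is possible here because MachineRegisterInfo has non-const
// access to itself, unlike TargetRegisterInfo::getCalleeSavedRegs.
void filterUserReservedCSRs(MachineFunction &MF) {
  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo *TRI = STI.getRegisterInfo();
  MachineRegisterInfo &MRI = MF.getRegInfo();
  for (const MCPhysReg *CSR = TRI->getCalleeSavedRegs(&MF); CSR && *CSR; ++CSR)
    if (STI.isRegisterReservedByUser(*CSR))
      MRI.disableCalleeSavedRegister(*CSR);
}
```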
We have been using `PostMachineScheduler` instead of `PostRAScheduler` since #68696.
The hook `getPostRAMutations` is only used by `PostRAScheduler`, so it is actually dead code for RISC-V now.
This is based on other targets like PPC/AArch64 and some experiments.
This PR only enables bidirectional scheduling and register pressure tracking.
Disclaimer: I haven't tested it on many cores; maybe we should turn some of these options into features. I believe downstreams must have tried this before, so feedback is welcome.
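A minimal sketch of what such an override looks like (the policy knobs come from `MachineSchedPolicy`; treat the exact body as an assumption, not the upstream code):
```cpp
// Sketch: enable bidirectional scheduling and register pressure
// tracking in the pre-RA machine scheduler.
void RISCVSubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
                                         unsigned NumRegionInstrs) const {
  Policy.OnlyTopDown = false;        // schedule from both ends
  Policy.OnlyBottomUp = false;
  Policy.ShouldTrackPressure = true; // track register pressure
}
```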
If only one of the elements is actually used, then we can legally use a strided load in place of the segment load. Doing so reduces vector register pressure, so if both the segment and strided forms are believed to be lowered element/segment at a time, then prefer the strided load variant.
Note that I've seen the vectorizer emit wide interleave loads to represent a strided load, so this does happen in practice. It doesn't matter much for small LMUL*NF, but at large NF it can start causing problems in register allocation.
Note that this patch only covers the fixed vector formation cases. In
theory, we should do the same patch for scalable, but we can currently
only represent NF2 in scalable IR, and NF2 is assumed to be optimized to
better than segment-at-a-time by default, so there's currently nothing
to do.
The InitUndef pass works around a register allocation issue, where undef
operands can be allocated to the same register as early-clobber result
operands. This may lead to ISA constraint violations, where certain
input and output registers are not allowed to overlap.
Originally this pass was implemented for RISCV, and then extended to ARM
in #77770. I've since removed the target-specific parts of the pass in
#106744 and #107885. This PR reduces the pass to use a single
requiresDisjointEarlyClobberAndUndef() target hook and enables it by
default. The hook is disabled for AMDGPU, because overlapping
early-clobber and undef operands are known to be safe for that target,
and we get significant codegen diffs otherwise.
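A hedged sketch of the hook (the default returns true; `MyTargetSubtarget` is a made-up example target):
```cpp
// Sketch: a target where early-clobber results may safely overlap
// undef input operands can opt out of the InitUndef workaround.
bool MyTargetSubtarget::requiresDisjointEarlyClobberAndUndef() const {
  return false; // AMDGPU-style opt-out
}
```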
The motivating case is the one in arm64-ldxr-stxr.ll, where we were
previously incorrectly allocating a stxp input and output to the same
register.
This is just like AArch64.
Changing the threshold to 6 will increase code size, but will also decrease the number of unconditional branches. CPUs with wide fetch/issue units can benefit from it.
The value 6 may be debatable; we could set it to `SchedModel.IssueWidth`.
This stores a `std::unique_ptr<RISCVRegisterBankInfo>` instead of a `std::unique_ptr<RegisterBankInfo>`, which allows us to return a `RISCVRegisterBankInfo *` from `getRegBankInfo` so we can avoid a `static_cast`.
This does require an additional header file to be included in
RISCVSubtarget.h, but I don't think it's a big deal.
Prior to this commit, we created the GlobalISel objects in the
RISCVSubtarget constructor, even if we are not running GlobalISel. This
patch moves creation of the GlobalISel objects into their getters, which
ensures that we only create these objects if they are actually needed.
This helps since some of the constructors of the GlobalISel objects have
a significant amount of code.
We make the `unique_ptr`s `mutable` since GlobalISel passes only have
access to `const TargetSubtargetInfo` through `MF.getSubtarget()`.
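A minimal sketch of the lazy-init getter pattern described here (names assumed to mirror the RISC-V subtarget; not an exact copy of upstream):
```cpp
// Sketch: construct the GlobalISel object on first use only. The
// member is a mutable std::unique_ptr so a const getter can fill it.
const CallLowering *RISCVSubtarget::getCallLowering() const {
  if (!CallLoweringInfo)
    CallLoweringInfo =
        std::make_unique<RISCVCallLowering>(*getTargetLowering());
  return CallLoweringInfo.get();
}
```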
This patch is tested by the fact that all existing RISC-V GlobalISel
tests remain passing.
On RISC-V, there are a few ways to control whether the PostMachineScheduler is enabled. If `-enable-post-misched` is passed, or passed with a value of true, then the PostMachineScheduler is enabled. If it is passed with a value of false, then the PostMachineScheduler is disabled. If the option is not passed at all, then `RISCVSubtarget::enablePostRAMachineScheduler` decides whether the pass should be enabled or not. `TargetSubtargetInfo::enablePostRAScheduler` and `TargetSubtargetInfo::enablePostRAMachineScheduler`, which check the SchedModel value, are not called by the RISC-V backend.
`RISCVSubtarget::enablePostRAMachineScheduler` currently checks if the
active scheduler model sets `PostRAScheduler`. If it is set to true by
the scheduler model, then the pass is enabled. If it is not set to true
by the scheduler model, then the value of `UsePostRAScheduler` subtarget
feature is used.
I argue that the RISC-V backend should not use `PostRAScheduler` field
of the scheduler model to control whether the PostMachineScheduler is
enabled for the following reasons:
1. No other targets use this value to control whether
PostMachineScheduler is enabled. They only use it to check whether the
legacy PostRASchedulerList scheduler is enabled.
2. We can add the `UsePostRAScheduler` feature to the processor
definition in RISCVProcessors.td to tie a processor to whether the pass
should be enabled by default. This makes the feature and the sched model
field redundant.
3. Since these options are redundant, we should prefer the feature,
since we can set `+` and `-` on the feature, but the value of the
scheduler cannot be controlled on the command line.
4. Keeping both options allows us to set the feature and the scheduler
model value to conflicting values. Although the scheduler model value
will win out, it feels awkward to allow it.
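A hedged sketch of the feature-based check argued for above (assumed shape, not the exact upstream code):
```cpp
// Sketch: ignore the sched model's PostRAScheduler field and key the
// decision off the UsePostRAScheduler subtarget feature alone, which
// can be toggled on the command line with + and - on the feature.
bool RISCVSubtarget::enablePostRAMachineScheduler() const {
  return UsePostRAScheduler;
}
```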
This is the insert_subvector equivalent to #79949, where we can avoid
sliding up by the full LMUL amount if we know the exact subregister the
subvector will be inserted into.
This mirrors the lowerEXTRACT_SUBVECTOR changes in that we handle this
in two parts:
- We handle fixed length subvector types by converting the subvector to
a scalable vector. But unlike EXTRACT_SUBVECTOR, we may need to convert
the vector being inserted into as well.
- Whenever we don't need a vslideup because either the subvector fits
exactly into a vector register group *or* the vector is undef, we need
to emit an insert_subreg ourselves because RISCVISelDAGToDAG::Select
doesn't correctly handle fixed length subvectors yet: see d7a28f7ad
A subvector exactly fits into a vector register group if its size is a known multiple of the size of a vector register, and this adds a new overload of TypeSize::isKnownMultipleOf for scalable-to-scalable comparisons to help reason about this.
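As a hedged illustration of the new overload in use (the scalable-to-scalable form described above; `fitsRegisterGroupExactly` is made up for the example):
```cpp
#include "llvm/Support/TypeSize.h"
using namespace llvm;

// Sketch: a scalable subvector fits a whole register group exactly if
// its size is a known multiple of one vector register's size.
bool fitsRegisterGroupExactly(TypeSize SubVecSize, TypeSize VecRegSize) {
  return SubVecSize.isKnownMultipleOf(VecRegSize);
}
```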
I've left RISCVISelDAGToDAG::Select untouched for now (minus relaxing an
invariant), so that the insert_subvector and extract_subvector code
paths are the same.
We should teach it to properly handle fixed length subvectors in a follow-up patch, so that the "exact subregister" logic is handled in one place instead of being spread across both RISCVISelDAGToDAG.cpp and RISCVISelLowering.cpp.
When using Greedy Register Allocation, there are times where early-clobber values are ignored and assigned the same register. This is illegal behaviour for these instructions. To get around this, using pseudo instructions for early-clobber registers gives them a definition and allows Greedy to assign them to a different register. This then meets the ARM Architecture Reference Manual and matches the defined behaviour.
This patch takes the existing RISC-V patch and makes it target independent, then adds support for the ARM architecture. Doing this will ensure early-clobber constraints are followed when using the ARM architecture. Making the pass target independent will also open up the possibility of adding support for other architectures in the future.
We've now got enough of these in tree that we can see which patterns
appear to be idiomatic. As such, extract a helper for checking
if we know the exact VLEN.
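A minimal sketch of such a helper (assumed shape; the min/max VLEN accessors already exist on `RISCVSubtarget`):
```cpp
// Sketch: the exact VLEN is known only when the lower and upper
// bounds coincide; otherwise callers must reason about a range.
std::optional<unsigned> RISCVSubtarget::getRealVLen() const {
  unsigned VLen = getRealMinVLen();
  if (VLen != getRealMaxVLen())
    return std::nullopt;
  return VLen;
}
```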
We convert the existing macro fusions to TableGen.
Because `Fusion` depends on `Instruction` definitions, which are defined after `RISCVFeatures.td`, we recommend that users add fusion features when defining a new processor.
sifive-p450 supports a very restricted version of the short forward
branch optimization from the sifive-7-series.
For sifive-p450, a branch over a single c.mv can be macrofused as a
conditional move operation. Due to encoding restrictions on c.mv, we
can't conditionally move from X0. That would require c.li instead.
PRs #75576 and #75735 updated some implications in llvm/lib/Support/RISCVISAInfo.cpp, but both of them missed the subtarget feature part.
This patch still preserves the predicates HasStdExtZfhOrZfhmin and HasStdExtZhinxOrZhinxmin, since they make error messages more readable. (Users might not know that zfh implies zfhmin.)
Support was added for the following fusions: auipc-addi, slli-srli, ld-add.
Some parts of the code became repetitive, so a small refactoring of the existing lui-addi fusion was done.
This is what AArch64 did in https://reviews.llvm.org/D20762.
Tests are added to the macro fusion tests, which uncovered a bug where DAG mutations didn't take effect.
There is a lot of information that can be used for tuning, like alignments, cache line size, etc. But we can't make all of it `SubtargetFeature`s because some of it doesn't have an enumerable value, for example, the `PrefetchDistance` used by `LoopDataPrefetch`.
In this patch, a searchable table `RISCVTuneInfoTable` is added,
in which each entry contains the CPU name and all tune information
defined in `RISCVTuneInfo`. Each field of `RISCVTuneInfo` should
have a default value and processor definitions can override the
default value via `let` statements.
We don't need to define a `RISCVTuneInfo` for each processor; the default value (which is for `generic`) will be used if no `RISCVTuneInfo` is defined.
For processors in the same series, a subclass can inherit from
`RISCVTuneInfo` and override the fields. And we can also override
the fields in processor definitions if there are some differences
in the same processor series.
When initializing `RISCVSubtarget`, we will use `TuneCPU` as the key to search the tune info table. So the behavior here is: if we don't specify the tune CPU, we will use the specified `CPU`, which is expected I think.
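A hedged sketch of that lookup (the table accessor name is assumed from the description above; the generated API may differ):
```cpp
// Sketch: key the tune info lookup on TuneCPU, falling back to the
// generic defaults when the CPU has no dedicated entry.
const RISCVTuneInfo *TuneInfo =
    RISCVTuneInfoTable::getRISCVTuneInfo(TuneCPU);
if (!TuneInfo)
  TuneInfo = RISCVTuneInfoTable::getRISCVTuneInfo("generic");
```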
This patch almost undoes 61ab106, in which I added tune features for preferred function/loop alignments. More tune information can be added in the future.
We're already tracking XLen, so we can compute XLenVT from that. Note that XLen itself should probably be driven from IsRV64 (the processor flag), but I'm leaving that to a separate change (with review).
In LLVM, alias analysis during code generation is currently off by default. This patch enables alias analysis for the RISC-V target during code generation by default, which creates more opportunities for performance improvements.
Related test cases were modified.
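A hedged sketch of the hook involved (assumed shape; the real change may also add an option to toggle it):
```cpp
// Sketch: opting the RISC-V target into alias analysis during codegen.
bool RISCVSubtarget::useAA() const { return true; }
```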
Differential Revision: https://reviews.llvm.org/D157250
According to the latest spec, Zvfbfwma requires Zvfbfmin and Zvfbfmin requires Zfbfmin, with FLH/FSH/FMV.H.X/FMV.X.H removed from Zvfbfwma.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D155916
This patch adds codegen support for vectors with the bfloat16 type in the LLVM backend.
With this patch, Zvfbfmin/Zvfbfwma instructions as well as vle16/vse16 can be generated from the newly added bf16 IR intrinsics.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156287
SiFive's x280 CPU has a vector unit that is VLEN/2 bits wide. This means that LMUL=1 operations take 2 cycles to process all VLEN bits.
This patch adds a DLenFactor tuning parameter and applies it to
TuneSiFive7. getLMULCost has been updated to use this factor in
its calculations. I've added an x280 command line to one cost
model test to demonstrate the effect.
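A hedged sketch of how such a factor scales LMUL-based costs (names made up for illustration; the real logic lives in the RISC-V cost model):
```cpp
// Sketch: a VLEN/2-wide datapath needs DLenFactor == 2 passes per
// LMUL=1 operation, so costs scale linearly with the factor.
unsigned getScaledLMULCost(unsigned LMUL, unsigned DLenFactor) {
  return LMUL * DLenFactor;
}
```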
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D152421
This results in improved codegen for half/bf16 libcalls on soft ABIs.
This adds a RISCVSubtarget helper method for determining if a soft FP ABI is being targeted (future bf16-related patches make use of this).
Differential Revision: https://reviews.llvm.org/D151434