This ensures the feature bits and RISCVSubtarget flags match what
RISCVISAInfo would do.
I'm not excited about the code duplication, but I need to set the
RISCVSubtarget flags along with calling ToggleFeature. I'll think about
how to improve this.
P extension packed SIMD types are passed in GPRs. For types larger than
XLen (e.g. v8i8 on RV32), they are split and passed via the 2XLen
mechanism, similar to i64 on RV32.
FIXME: Need to figure out the mechanism when P and V are enabled at the
same time.
stack on: https://github.com/llvm/llvm-project/pull/176193
The "generic" entry in tablegen is really a dummy entry. We shouldn't
use it for anything. Remap "generic" to either generic-rv32 or
generic-rv64 based on the triple.
This is the initial support of P extension codegen, it only includes
small part of instructions:
PADD_H, PADD_B,
PSADD_H, PSADD_B,
PAADD_H, PAADD_B,
PSADDU_H, PSADDU_B,
PAADDU_H, PAADDU_B,
PSUB_H, PSUB_B,
PDIF_H, PDIF_B,
PSSUB_H, PSSUB_B,
PASUB_H, PASUB_B,
PDIFU_H, PDIFU_B,
PSSUBU_H, PSSUBU_B,
PASUBU_H, PASUBU_B
Both conceptually belong to the same subtarget, so it should not
be necessary to pass in the context TargetRegisterInfo to any
TargetInstrInfo member. Add this reference so those superfluous
arguments can be removed.
Most targets placed their TargetRegisterInfo as a member
in TargetInstrInfo. A few had this owned by the TargetSubtargetInfo,
so unify all targets to look the same.
This patch updates `overrideSchedPolicy` and `overridePostRASchedPolicy`
to take a
`SchedRegion` parameter instead of just `NumRegionInstrs`. This provides
access to both the
instruction range and the parent `MachineBasicBlock`, which enables
looking up function-level
attributes.
With this change, targets can select post-RA scheduling direction per
function using a function
attribute. For example:
```cpp
void overridePostRASchedPolicy(MachineSchedPolicy &Policy,
const SchedRegion &Region) const {
const Function &F = Region.RegionBegin->getMF()->getFunction();
Attribute Attr = F.getFnAttribute("amdgpu-post-ra-direction");
...
}
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Introduce RISCVLoadStoreOptimizer MIR Pass that will do the
optimization. The load/store pairing pass identifies adjacent load/store
instructions operating on consecutive memory locations and merges them
into a single paired instruction.
This is part of MIPS extensions for the p8700 CPU.
Production of ldp/sdp instructions is OFF by default, since it is
beneficial for -Os only in the case of p8700 CPU.
This patch adds basic support of `MachinePipeliner` and disable
it by default.
The functionality should be OK and all llvm-test-suite tests have
passed.
The results differ on different platforms so it is really hard to
determine a common default value.
Tune info for postra scheduling direction is added and CPUs can
set their own preferable postra scheduling direction.
This tune feature will disable latency scheduling heuristic.
This can reduce the number of spills/reloads but will cause some
regressions on some cores.
CPU may add this tune feature if they find it's profitable.
Reviewers: lukel97, michaelmaitland, asb, preames, mshockwave, topperc
Reviewed By: michaelmaitland, mshockwave, topperc
Pull Request: https://github.com/llvm/llvm-project/pull/115858
We are using `PostMachineScheduler` instead of `PostRAScheduler`
since #68696.
The hook `getPostRAMutations` is only used in `PostRAScheduler` so
it is actually dead code for RISC-V now.
This is based on other targets like PPC/AArch64 and some experiments.
This PR will only enable bidirectional scheduling and tracking register
pressure.
Disclaimer: I haven't tested it on many cores, maybe we should make
some options being features. I believe downstreams must have tried
this before, so feedbacks are welcome.
Fixed length vectors use scalable vector containers. With Zve32* and not
Zvl64b, vscale is a 0.5 due RVVBitsPerBlock being 64.
To support this correctly we need to lower RVVBitsPerBlock to 32 and
change our type mapping. But we need to RVVBitsPerBlock to alway be
>= ELEN. This means we need two different mapping depending on ELEN.
That is a non-trivial amount of work so disable fixed lenght vectors
without Zvl64b for now.
We had almost no tests for Zve32x without Zvl64b which is probably why
we never realized that it was broken.
Fixes#102352.
Instead of std::unique_ptr<RegisterBankInfo>. This allows us to return a
RISCVRegisterBankInfo* from getRegBankInfo so we can avoid a
static_cast.
This does require an additional header file to be included in
RISCVSubtarget.h, but I don't think it's a big deal.
Prior to this commit, we created the GlobalISel objects in the
RISCVSubtarget constructor, even if we are not running GlobalISel. This
patch moves creation of the GlobalISel objects into their getters, which
ensures that we only create these objects if they are actually needed.
This helps since some of the constructors of the GlobalISel objects have
a significant amount of code.
We make the `unique_ptr`s `mutable` since GlobalISel passes only have
access to `const TargetSubtargetInfo` through `MF.getSubtarget()`.
This patch is tested by the fact that all existing RISC-V GlobalISel
tests remain passing.
This removes the uses of target flags to disable subreg liveness,
relying on the `-enable-subreg-liveness` flag instead. The
`-enable-subreg-liveness` flag has been changed to take precedence over
the subtarget if set, and one use of `Subtarget->enableSubRegLiveness()`
has been changed to `MRI->subRegLivenessEnabled()` to make sure the
option properly applies.
We convert existed macro fusions to TableGen.
Bacause `Fusion` depend on `Instruction` definitions which is defined
below `RISCVFeatures.td`, so we recommend user to add fusion features
when defining new processor.
There are many information that can be used for tuning, like
alignments, cache line size, etc. But we can't make all of them
`SubtargetFeature` because some of them are not with enumerable
value, for example, `PrefetchDistance` used by `LoopDataPrefetch`.
In this patch, a searchable table `RISCVTuneInfoTable` is added,
in which each entry contains the CPU name and all tune information
defined in `RISCVTuneInfo`. Each field of `RISCVTuneInfo` should
have a default value and processor definitions can override the
default value via `let` statements.
We don't need to define a `RISCVTuneInfo` for each processor and
it will use the default value (which is for `generic`) if no
`RISCVTuneInfo` defined.
For processors in the same series, a subclass can inherit from
`RISCVTuneInfo` and override the fields. And we can also override
the fields in processor definitions if there are some differences
in the same processor series.
When initilizing `RISCVSubtarget`, we will use `TuneCPU` as the
key to serach the tune info table. So, the behavior here is if
we don't specify the tune CPU, we will use specified `CPU`, which
is expected I think.
This patch almost undoes 61ab106, in which I added tune features
of preferred function/loop alignments. More tune information can
be added in the future.
We're already tracking XLen, we can compute XLenVt from that. Note that XLen itself should probably be driven from IsRV64 (the processor flag), but I'm leaving that to a separate change (with review).
In llvm alias analysis is off by default now.
This patch enable alias analysis on RISCV target during code generation by default,
and this makes more chances for improving performance.
Modified related test cases.
Differential Revision: https://reviews.llvm.org/D157250
This patch adds logic for determining RegisterBank size to RegisterBankInfo, which allows accounting for the HwMode of the target. Individual RegisterBanks cannot be constructed with HwMode information as construction is generated by TableGen, but a RegisterBankInfo subclass can provide the HwMode as a constructor argument. The HwMode is used to select the appropriate RegisterBank size from an array relating sizes to RegisterBanks.
Targets simply need to provide the HwMode argument to the <target>GenRegisterBankInfo constructor. The RISC-V RegisterBankInfo constructor has been updated accordingly (plus an unused argument removed).
Reviewed By: simoncook, craig.topper
Differential Revision: https://reviews.llvm.org/D76007
Fuchsia's ABI always reserves the x18 (s2) register for the
ShadowCallStack ABI, even when -fsanitize=shadow-call-stack is
not enabled.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D143355
This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size.
For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware.
The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization.
SLP has been disabled for now, even when fixed vectors are enabled. See a310637 and associated review. There are a few addiitional code quality issues which need worked through before turning SLP on would be reasonable.
Differential Revision: https://reviews.llvm.org/D131508
Saves a heap allocation and avoids an explicit call to the BitVector constructor.
Reviewed By: reames, myhsu
Differential Revision: https://reviews.llvm.org/D132674
We previously enabled subregister liveness by default when compiling
with RVV. This has been shown to cause miscompilations where RVV
register operand constraints are not met. A test was added for this in
D129639 which explains the issue in more detail.
Until this issue is fixed in some way, we should not be enabling
subregister liveness unless the user asks for it.
Reviewed By: craig.topper, rogfer01, kito-cheng
Differential Revision: https://reviews.llvm.org/D129646
This adds the macrofusion plumbing and support fusing LUI+ADDI(W).
This is similar to D73643, but handles a different case. Other cases
can be added in the future.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128393
The failure that caused the previous revert has been fixed
by https://reviews.llvm.org/D126048
Original commit message:
RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.
I've added a command line that can be used to turn it off if it causes compile
time or functional issues. I used the command line to keep the old behavior
for one interesting test case that was testing register allocation.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D128016
This reverts most of ed242b54c9c2aa84a47f66af5b8497d93646b68d
I'm seeing failures in our intrinsic testing on qemu that seem
related to this. Reverting while I investigate.
I've left the command line option in place for directed testing.
It defaults to off.
RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.
I've added a command line that can be used to turn it off if it causes compile
time or functional issues. I used the command line to keep the old behavior
for one interesting test case that was testing register allocation.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D125108
riscv-v-vector-bits-min is primarily used to opt-in to the
autovectorizer. The vector width can be determined from Zvl*b.
This patch adds support treating -1 as meaning use Zvl*b so we can
still opt-in to autovectorization without needing to repeat a
vector width already given by Zvl*b or -mcpu.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D124960
This was added before Zve extensions were defined. I think users
should use Zve32x or Zve32f now. Though we will lose support for limiting
ELEN to 16 or 8, but I hope no one was using that.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D123418
Most other targets support 'generic', but RISCV issues an error.
This can require a special case in tools that use LLVM that aren't
clang.
This patch treats "generic" the same as an empty string and remaps
it to generic-rv/rv64 based on the triple. Unfortunately, it has to
be added to RISCV.td because MCSubtargetInfo is constructed and
parses the CPU before RISCVSubtarget's constructor gets a chance
to remap it. The CPU will then reparsed and the state in the
MCSubtargetInfo subclass will be updated again.
Fixes PR54146.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D121149
In the Zve* extensions, the vlen could be 64. This patch change the vlen constraint of low bound to 64.
Differential Revision: https://reviews.llvm.org/D118217