These are used by both SelectionDAG and GlobalISel and are separate from
RISCVTargetLowering.
Having a separate file matches how other targets are structured, though other
targets generate most of their calling convention code through tablegen.
I moved the `CC_RISCV` functions from the `llvm::RISCV` namespace to
`llvm::`. That's what the tablegen code on other targets does, and the
functions already have RISCV in their name. `RISCVCCAssignFn` is moved
from `RISCVTargetLowering` to the `llvm` namespace.
Resolve https://github.com/llvm/llvm-project/issues/106970
Currently it returns a fixed size of 0 for the `ptr` element type. The `ptr`
element size should depend on `XLen`, which is 64 on riscv64 and 32 on
riscv32, respectively.
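For context, a minimal sketch of the distinction (my illustration; the helper name is hypothetical and this is not the patch itself): `Type::getScalarSizeInBits()` returns 0 for pointer types, so the size of a `ptr` element has to come from the `DataLayout`, which knows that pointers are XLen-sized.

```c++
// Illustrative only: getScalarSizeInBits() is 0 for pointers, so consult the
// DataLayout (where riscv64 pointers are 64 bits and riscv32 pointers are 32).
static unsigned getElementSizeInBits(Type *ElemTy, const DataLayout &DL) {
  if (ElemTy->isPointerTy())
    return DL.getPointerSizeInBits(ElemTy->getPointerAddressSpace());
  return ElemTy->getScalarSizeInBits();
}
```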
This allows odd-sized vector loads/stores to be legalized to a
VP_LOAD/STORE using EVL.
I changed the bf16 tests in fixed-vectors-load.ll and
fixed-vectors-store.ll to use an illegal type to be consistent with the
intent of these files. A legal type is already tested in
fixed-vectors-load-store.ll
The first mask vector operand is supposed to be assigned to V0. No other
vector types will be assigned to V0. We don't need to pre-assign; we can
just try V0 first for any mask vectors in the normal processing.
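A heavily simplified sketch of the idea (the function name and surrounding structure are mine, not the patch's):

```c++
// Sketch: when assigning a mask vector (i1 element type), try V0 first and
// only then fall back to the normal vector register allocation order.
static bool assignRVVArg(unsigned ValNo, MVT ValVT, MVT LocVT,
                         CCValAssign::LocInfo LocInfo, CCState &State) {
  bool IsMask = ValVT.isVector() && ValVT.getVectorElementType() == MVT::i1;
  if (IsMask)
    if (MCRegister Reg = State.AllocateReg(RISCV::V0)) {
      State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
      return false; // assigned
    }
  // ...otherwise allocate from the ordinary vector register lists...
  return true; // not handled here
}
```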
This gives us much better control of the generated code for GISel. I've
tried to closely match the current gisel code, but it looks like we had
2 layers of G_ANYEXT in some cases before.
SelectionDAG now checks needsCustom() instead of detecting the special
cases in the Bitcast handler.
Unfortunately, IRTranslator for bitcast still generates copies between
register classes of different sizes. Because of this we can't handle
i16<->f16 bitcasts without crashing. Not sure if I should teach
RISCVInstrInfo::copyPhysReg to allow copies between FPR16 and GPR or if
I should convert the copies to instructions in GISel.
The fp_extend will canonicalize NaNs, which is not the semantics of
FNEG/FABS/FCOPYSIGN.
For fixed vectors I'm scalarizing due to test changes on other targets
where the scalarization is expected. I will try to address this in a
follow-up.
For scalable vectors, we bitcast to integer and use integer logic ops.
Use isel patterns on regular FP_ROUND. For double->bf16 we need
to emit two instructions. Note the double->bf16 conversion does
double rounding, but I don't know a good way to fix that.
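For context, a worked example of the double rounding hazard (my own numbers, not from the patch): take x = 1 + 2^-8 + 2^-25, which is exactly representable in f64.

```latex
% bf16 spacing near 1.0 is 2^{-7}; f32 spacing is 2^{-23}.
\[
\begin{aligned}
\mathrm{round}_{\mathrm{bf16}}(x)          &= 1 + 2^{-7} && \text{just above the halfway point, rounds up}\\
\mathrm{round}_{\mathrm{f32}}(x)           &= 1 + 2^{-8} && \text{the } 2^{-25} \text{ bit is below f32 precision}\\
\mathrm{round}_{\mathrm{bf16}}(1 + 2^{-8}) &= 1          && \text{exact tie, rounds to even}
\end{aligned}
\]
```

So rounding through f32 gives 1.0 where a single correctly rounded double->bf16 conversion gives 1 + 2^-7.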
Previously, if Zfbfmin/Zfhmin were enabled, we only handled
build_vectors that could be turned into splat_vectors. We promoted them
to f32 splats by extending in the scalar domain and narrowing in the
vector domain.
This patch fixes a crash where we failed to account for whether the f32
vector type fit in LMUL<=8.
Because the new lowering occurs after type legalization, we have to be
careful to use XLenVT for the scalar integer type and use custom cast
nodes.
The LegalizeDAG expansion will go through memory since i16 isn't a legal
type. Avoid this by using FMV nodes.
Similar to what we did for #106886 for FNEG and FABS. Special care is
needed to handle the Sign operand being a different type.
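A rough sketch of the shape this can take (my illustration, not the committed code), assuming the Sign operand is strictly wider than the f16/bf16 magnitude (a same-width sign needs no truncate):

```c++
// Sketch: FCOPYSIGN done with integer ops. Mag and Sign are the FCOPYSIGN
// operands; Mag is f16/bf16, Sign may be a wider FP type such as f32 or f64.
unsigned SignWidth = Sign.getScalarValueSizeInBits();
MVT MagIVT = MVT::i16;
MVT SignIVT = MVT::getIntegerVT(SignWidth);

// Clear the sign bit of the magnitude in the integer domain.
SDValue MagInt = DAG.getNode(ISD::BITCAST, DL, MagIVT, Mag);
SDValue AbsMag = DAG.getNode(ISD::AND, DL, MagIVT, MagInt,
                             DAG.getConstant(0x7FFF, DL, MagIVT));

// Shift the wider sign operand's sign bit down into bit 15 so it lines up
// with the i16 magnitude, then isolate it.
SDValue SignInt = DAG.getNode(ISD::BITCAST, DL, SignIVT, Sign);
SDValue Shifted = DAG.getNode(ISD::SRL, DL, SignIVT, SignInt,
                              DAG.getConstant(SignWidth - 16, DL, SignIVT));
SDValue SignBit = DAG.getNode(ISD::AND, DL, MagIVT,
                              DAG.getNode(ISD::TRUNCATE, DL, MagIVT, Shifted),
                              DAG.getConstant(0x8000, DL, MagIVT));

SDValue Res = DAG.getNode(ISD::BITCAST, DL, Mag.getValueType(),
                          DAG.getNode(ISD::OR, DL, MagIVT, AbsMag, SignBit));
```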
I don't think we need this node. We can isel fp_extend directly.
fp_extend to f64 requires two instructions, but we can emit them with an
isel pattern.
I have not removed RISCVISD::FP_ROUND_BF16 because f64->bf16 needs more
work to fix the double rounding.
This patch handles target lowering and calling convention.
For target lowering, the vector tuple type that was represented as multiple
scalable vectors is now a single `MVT`, and each such `MVT` has a
corresponding register class.
Loads and stores of vector tuples are handled the same way, but additional
vector insert/extract instructions are needed to get each sub-register group.
The inline assembly constraint for a vector tuple type can be modeled
directly as "vr", identical to normal vector registers.
For the calling convention, we no longer need an alternative algorithm to
handle register allocation, which makes the code easier to maintain and
read.
Stacked on https://github.com/llvm/llvm-project/pull/97994
fneg/fabs are not supposed to canonicalize NaNs. Promoting to f32 will
go through an fp_extend, which will canonicalize. The generic Promote
handler needs to be removed from LegalizeDAG.
We need to use integer bit manip to clear the bit instead.
Unfortunately, this is going through the stack due to i16 not being a
legal type. Fixing that will require custom legalization or some other
generic SelectionDAG change.
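For reference, the integer-domain forms look roughly like this (a sketch of the idea, not the exact code):

```c++
// Sketch: fneg/fabs on an f16/bf16 operand Op via integer bit manipulation,
// which leaves the NaN payload alone and therefore never canonicalizes.
SDValue Int = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Op);
SDValue NegInt = DAG.getNode(ISD::XOR, DL, MVT::i16, Int,
                             DAG.getConstant(0x8000, DL, MVT::i16)); // flip sign (fneg)
SDValue AbsInt = DAG.getNode(ISD::AND, DL, MVT::i16, Int,
                             DAG.getConstant(0x7FFF, DL, MVT::i16)); // clear sign (fabs)
SDValue Neg = DAG.getNode(ISD::BITCAST, DL, Op.getValueType(), NegInt);
SDValue Abs = DAG.getNode(ISD::BITCAST, DL, Op.getValueType(), AbsInt);
```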
As far as I'm aware, vrgather.vv is quadratic in LMUL on most
microarchitectures today due to each output register needing to read
from each input register in the group.
For example, the reciprocal throughput for vrgather.vv on the
spacemit-x60 is listed on
https://camel-cdr.github.io/rvv-bench-results/bpi_f3 as:
| LMUL1 | LMUL2 | LMUL4 | LMUL8 |
|-------|-------|-------|-------|
| 4.0   | 16.0  | 64.0  | 256.1 |
Vector reverses are commonly emitted by the loop vectorizer and are
lowered as vrgather.vvs, but since the loop vectorizer uses LMUL 2 by
default they end up being quadratic.
The output registers in a reverse only need to read from one input
register though, so we can decompose this into LMUL * M1 vrgather.vvs to
get linear performance.
This gives a 0.43% runtime improvement on 526.blender_r at rva22u64_v O3
on the Banana Pi F3.
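A simplified sketch of the decomposition (my own shape of it, ignoring the actual lowering's details such as fixed-length handling):

```c++
// Sketch for a scalable vector: reverse each single-register (M1) piece
// independently and concatenate the pieces in the opposite order. Each
// per-piece vrgather.vv reads only one source register, so the cost scales
// linearly with LMUL instead of quadratically.
static SDValue reverseByM1Pieces(SDValue Vec, const SDLoc &DL,
                                 SelectionDAG &DAG) {
  EVT VT = Vec.getValueType();
  // The LMUL=1 type with the same element type (RVVBitsPerBlock is 64).
  unsigned PieceElts = RISCV::RVVBitsPerBlock / VT.getScalarSizeInBits();
  EVT M1VT = EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
                              PieceElts, /*IsScalable=*/true);
  unsigned NumPieces = VT.getVectorMinNumElements() / PieceElts;

  SmallVector<SDValue> Pieces;
  for (unsigned I = 0; I != NumPieces; ++I) {
    // Take pieces from the end of the source toward the start, so both the
    // piece order and the elements within each piece end up reversed.
    unsigned SrcPiece = NumPieces - 1 - I;
    SDValue Piece = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, M1VT, Vec,
                                DAG.getVectorIdxConstant(SrcPiece * PieceElts, DL));
    Pieces.push_back(DAG.getNode(ISD::VECTOR_REVERSE, DL, M1VT, Piece));
  }
  return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Pieces);
}
```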
By default, type legalization will try to promote the build_vector,
but the generic type legalizer doesn't support that. Bitcast to
vXi16 instead, the same as what we do for vXf16 without Zfhmin.
Fixes #100846.
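Roughly the shape of the idea described above (a sketch, not the committed code):

```c++
// Sketch: Op is the vXbf16 BUILD_VECTOR being lowered. Build the vector in
// the i16 domain and bitcast back, since there is no scalar bf16 support to
// build the FP vector directly.
SmallVector<SDValue> IntElts;
for (SDValue Elt : Op->op_values())
  IntElts.push_back(DAG.getNode(ISD::BITCAST, DL, MVT::i16, Elt));
MVT IntVT = Op.getSimpleValueType().changeVectorElementType(MVT::i16);
SDValue IntVec = DAG.getNode(ISD::BUILD_VECTOR, DL, IntVT, IntElts);
return DAG.getNode(ISD::BITCAST, DL, Op.getValueType(), IntVec);
```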
Since 2.2, `fmin.s/fmax.s` instructions follow IEEE754-2019 if the F
extension is available, and `fmin.d/fmax.d` also follow IEEE754-2019
if the D extension is available.
So, let's mark them as Legal.
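Assuming this refers to the IEEE754-2019 minimumNumber/maximumNumber operations (`ISD::FMINIMUMNUM`/`ISD::FMAXIMUMNUM`), a minimal sketch of the change might look like:

```c++
// Sketch under the assumption above: fmin.s/fmax.s and fmin.d/fmax.d already
// implement the 2019 semantics, so the nodes can simply be marked Legal.
if (Subtarget.hasStdExtF()) {
  setOperationAction(ISD::FMINIMUMNUM, MVT::f32, Legal);
  setOperationAction(ISD::FMAXIMUMNUM, MVT::f32, Legal);
}
if (Subtarget.hasStdExtD()) {
  setOperationAction(ISD::FMINIMUMNUM, MVT::f64, Legal);
  setOperationAction(ISD::FMAXIMUMNUM, MVT::f64, Legal);
}
```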
This teaches lowerVECTOR_REVERSE to handle fixed length vectors, and
then lowers reverse vector_shuffles through it.
The motivation for this is to allow fixed length vectors to share a
potential optimization on vector_reverse in an upcoming patch (splitting
up LMUL > 1 vrgather.vvs)
This optimization helps reduce repeated calculation of base addresses
by pulling the type extension out of the address computation when the same
base address is accessed multiple times with constant offsets.
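A hypothetical before/after, just to illustrate the shape (not taken from the patch):

```c++
// Hypothetical illustration: with an extension wrapped around each address
// computation, every access redoes the extension of the sum:
//
//   addr0 = (sext (add base, 16))
//   addr1 = (sext (add base, 32))
//
// Pulling the extension onto the base (where it is known safe to do so) lets
// the extended base be computed once and shared:
//
//   ext   = (sext base)
//   addr0 = (add ext, 16)
//   addr1 = (add ext, 32)
```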
With Zfh and Zfhmin this combine creates a fmv_x_signexth node so we can
remember that the result is sign extended. This becomes an fmv.x.h
instruction, which sign extends its result.
With Zhinx, fmv_x_signexth becomes a COPY_TO_REGCLASS. In order for
this to guarantee the result is properly sign extended we need all
producers of a GPRF16 register class to guarantee the rest of the
GPR is sign extended. I don't think we've done that. Bitcasts from i16
to f16 definitely don't do it.
The safest thing to do is to not do this combine so the sign_extend_inreg
will emit a shift pair. This is also consistent with the code generated
for Zfinx on RV64, where we don't assume the upper 32 bits are sign extended.
A truncate is considered saturated if no additional conversion is required between the target and return values. If a vector truncate is saturated, there is an opportunity to optimize it.
Previously, each architecture had its own attempt at this optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs:

- `ISD::TRUNCATE_SSAT_S`: the operand is a signed value and the range of values matches the range of signed values of the destination type.
- `ISD::TRUNCATE_SSAT_U`: the operand is a signed value and the range of values matches the range of unsigned values of the destination type.
- `ISD::TRUNCATE_USAT_U`: the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type.

These ISDs indicate a saturated truncate.
Fixes https://github.com/llvm/llvm-project/issues/85903
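As an illustration (my own example, truncating i32 to i8), these are the saturation idioms the new nodes stand for:

```c++
// Illustration only:
//   trunc(smin(smax(x, -128), 127))  -->  TRUNCATE_SSAT_S x  // signed in, signed range
//   trunc(smin(smax(x, 0), 255))     -->  TRUNCATE_SSAT_U x  // signed in, unsigned range
//   trunc(umin(x, 255))              -->  TRUNCATE_USAT_U x  // unsigned in, unsigned range
```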
As noted in #102781 we can promote UADDSAT if we use sign extend instead
of zero extend.
The custom handler for RISC-V was using SIGN_EXTEND when the Zbb
extension was enabled. With this change we no longer need the custom
code.
- [AArch64]: TargetLowering is updated to spot load/store (de)interleave4-like sequences using PatternMatch,
and to emit equivalent sve.ld4 and sve.st4 intrinsics.
It doesn't matter which extend we use to promote the operands. Use
whatever is the most efficient.
The custom handler for RISC-V was using SIGN_EXTEND when the Zbb
extension was enabled, so we no longer need that.
This has received no development work in a while and is slowly bit-rotting
as new extensions are added.
At the moment, I don't think this is viable without adding a new
invariant that 32-bit values are always in sign-extended form like
Mips64 does. We are very dependent on computeKnownBits and
ComputeNumSignBits in SelectionDAG to remove sign extends created for
ABI reasons. If we can't propagate sign bit information through 64-bit
values in SelectionDAG, we can't effectively clean up those extends.