Use a setne with 0 instead of a trunc. We know we zero extended the node
so we can get by with a non-zero check only. The truncate lowering
doesn't know that we zero extended so has to mask the lsb.
I don't think DAG combine sees the trunc before we lower it to RISCVISD
nodes so we don't get a chance to use computeKnownBits to remove the
AND.
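The equivalence this relies on can be checked on a tiny model. The sketch below is plain Python, not LLVM code, and both helper names are hypothetical; it only illustrates why a non-zero check is enough once the value is known to be zero extended.

```python
def trunc_to_i1(x: int) -> bool:
    """Generic truncate lowering: only the LSB is meaningful, so mask it."""
    return (x & 1) != 0


def setne_zero(x: int) -> bool:
    """Cheaper check, valid only when the upper bits are known to be zero."""
    return x != 0


# Zero-extending an i1 can only produce 0 or 1.
zero_extended = [0, 1]
agree = all(trunc_to_i1(v) == setne_zero(v) for v in zero_extended)

# With unknown upper bits the two checks can differ, e.g. for 2 (0b10).
differs_on_garbage = trunc_to_i1(2) != setne_zero(2)
```

The generic lowering has to assume the second case, which is why it emits the AND.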
For f16 with zvfhmin, we promote most ops and VP ops to f32. This does
the same for bf16 with zvfbfmin, so the two fp types should now be in
sync.
There are a few places in the custom lowering where we need to check for
an LMUL 8 f16/bf16 vector that can't be promoted and must be split. This
extracts that check out into isPromotedOpNeedingSplit.
In a follow up NFC we can deduplicate the code that sets up the
promotions.
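As a rough illustration of why an LMUL 8 f16/bf16 vector cannot be promoted, the sketch below models the register-group arithmetic in Python. It is not the code in the patch: the function names are hypothetical, and it assumes 64 bits per LMUL=1 block and a maximum LMUL of 8.

```python
from fractions import Fraction

RVV_BITS_PER_BLOCK = 64  # assumption: known-minimum bits of one LMUL=1 register
MAX_LMUL = 8


def lmul(min_elts: int, elt_bits: int) -> Fraction:
    """LMUL needed for a scalable vector of min_elts elements of elt_bits each."""
    return Fraction(min_elts * elt_bits, RVV_BITS_PER_BLOCK)


def promoted_op_needs_split(min_elts: int, elt_bits: int = 16,
                            promoted_bits: int = 32) -> bool:
    """True if the type is legal but its promoted (f32) form exceeds LMUL 8."""
    fits_now = lmul(min_elts, elt_bits) <= MAX_LMUL
    fits_promoted = lmul(min_elts, promoted_bits) <= MAX_LMUL
    return fits_now and not fits_promoted
```

Under these assumptions, a 32-element f16 or bf16 scalable vector sits at LMUL 8 and would need LMUL 16 as f32, so it has to be split in half before promotion; a 16-element one promotes directly.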
This adds zvfhmin test coverage for fceil, ffloor, fnearbyint, frint,
fround and froundeven and splits them at nxv32f16 to avoid crashing,
similarly to what we do for other nodes that we promote.
This also sets ftrunc to promote which was previously missing. We
already promote the VP version of it, vp_froundtozero.
Marking it as promoted affects some of the cost model tests since
they're no longer expanded.
This patch adds support for the missing STRICT_UINT_TO_FP and
STRICT_SINT_TO_FP for riscv and adds a test case for rv32 which was
previously crashing.
The code is in line with how other strict_* nodes are handled
(e.g., getting op(1) instead of op(0) when it's a strict node, as op(0)
in a strict node is the chain, for example the entry token).
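The operand-index convention can be sketched with a toy node type. This is an illustrative Python model only; the `Node` class and `source_operand` helper are hypothetical and are not part of SelectionDAG.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    opcode: str
    operands: list = field(default_factory=list)

    def is_strict(self) -> bool:
        return self.opcode.startswith("STRICT_")


def source_operand(node: Node):
    """Return the value being converted, skipping the leading chain if strict."""
    return node.operands[1] if node.is_strict() else node.operands[0]


plain = Node("SINT_TO_FP", ["x"])
strict = Node("STRICT_SINT_TO_FP", ["chain", "x"])
```

Both `source_operand(plain)` and `source_operand(strict)` yield `"x"`, so the rest of the lowering can be shared between the two forms.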
When folding an op to an f16 widening op, we currently check that we
have zvfh. We need to do the same for bf16 vectors, but with
the further restriction that we can only combine vfmadd_vl to vfwmadd_vl
(to get vfwmaccbf16.v{v,f}).
The added test case currently crashes because we try to fold an add to a
bf16 widening add, which doesn't exist in zvfbfmin or zvfbfwma.
This moves the checks into the extension support checks to keep them in
one place.
We need to insert an insert_subvector or extract_subvector, which feels
pretty custom.
This should make it easier to support fixed vector arguments for GISel.
This handles int->fp/fp->int nodes for zvfbfmin, reusing the same parts
that f16 uses with zvfhmin.
There's quite a bit of replication here that can probably be cleaned up
at some point.
Fortunately f16 and bf16 are always < EEW, so we can always lower via
widening or narrowing. This means we don't need to add patterns for
vrgather_vv_vl just yet.
Most of the constants fli can generate are positive numbers. We can use
fli+fneg to generate their negative versions.
Previously, we considered such negative constants as "legal" and let
isel generate the fli+fneg. However, it is useful to expose the fneg to
DAG combines to fold with fadd to produce fsub or with fma to produce
fnmadd, fnmsub, or fmsub.
This patch moves the fneg creation to lowering so that the fneg will be
visible to the last DAG combine.
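A toy model of the idea, in Python rather than LLVM code: the constant set below is a small illustrative subset rather than the full fli table, and the tuple-based node encoding and both function names are hypothetical.

```python
# Illustrative subset of positive constants; not the complete Zfa fli table.
FLI_POSITIVE = {0.25, 0.5, 1.0, 2.0}


def lower_fp_constant(c: float):
    """Lower a constant, exposing an explicit fneg for negated fli values."""
    if c in FLI_POSITIVE:
        return ("fli", c)
    if -c in FLI_POSITIVE:
        return ("fneg", ("fli", -c))
    return ("load", c)  # fall back to some other materialization


def combine_fadd(lhs, rhs):
    """A later combine can fold fadd(a, fneg(b)) into fsub(a, b)."""
    if rhs[0] == "fneg":
        return ("fsub", lhs, rhs[1])
    return ("fadd", lhs, rhs)


folded = combine_fadd(("reg", "a"), lower_fp_constant(-0.5))
```

Here `folded` is `("fsub", ("reg", "a"), ("fli", 0.5))`. If the negative constant were kept opaque until isel, the combine would never see the fneg.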
I might move the rest of Zfa handling from isel to lowering as a follow
up.
Fixes #107772.
The motivation for this is to start promoting bf16 ops to f32 so that we
can mark bf16 as a supported type in
RISCVTTIImpl::isElementTypeLegalForScalableVector and scalably-vectorize
it.
This starts with expanding the nodes that can't be promoted to f32 due
to canonicalizing NaNs, similarly to f16 in #106652.
Previously they were legal by default, so the truncstore/extload test
cases would get combined and crash during selection.
These are set to expand for f16 so do the same for bf16.
This adds VL patterns for vfwmaccbf16.vv so that we can handle fixed
length vectors.
It does this by teaching combineOp_VLToVWOp_VL to emit
RISCVISD::VFWMADD_VL for bf16. The change in getOrCreateExtendedOp is
needed because getNarrowType is based off of the bitwidth so returns
f16. We need to explicitly check for bf16.
Note that the .vf patterns don't work yet, since the build_vector splat
gets lowered to a (vmv_v_x_vl (fmv_x_anyexth x)) instead of a vfmv.v.f,
which SplatFP doesn't pick up, see #106637.
This is a three deep expression which is deeper than we've otherwise
gone for multiple expansions, but I think it's reasonable to do so. This
covers mul by 50, 100, and 200 which are reasonably common naturally
arising numbers.
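One way to picture a three-deep expansion is shown below. This is a sketch under the assumption that the expansion is two shift-and-add steps (in the style of Zba's sh2add) followed by a left shift, using 50 = 5 * 5 * 2, 100 = 5 * 5 * 4 and 200 = 5 * 5 * 8; the helper names are hypothetical.

```python
def shadd(shift: int, a: int, b: int) -> int:
    """Models a shNadd-style instruction: (a << shift) + b."""
    return (a << shift) + b


def mul_by_25_shl(x: int, n: int) -> int:
    """Compute x * 25 * 2**n with two shift-adds and one shift."""
    t = shadd(2, x, x)   # x * 5
    t = shadd(2, t, t)   # x * 25
    return t << n        # x * 25 * 2**n


results = {50: mul_by_25_shl(7, 1),
           100: mul_by_25_shl(7, 2),
           200: mul_by_25_shl(7, 3)}
```

Each entry of `results` equals 7 times its key, using three dependent instructions and no multiply.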
DataLayout, ABI, and TargetLowering can all be obtained via the
MachineFunction reference in the State object. This is how the targets
that use TableGen for CC handlers get these objects.
This might be a little slower, but it simplifies all the callers in
SelectionDAG and GlobalISel.
These are used by both SelectionDAG and GlobalISel and are separate from
RISCVTargetLowering.
Having a separate file is how other targets are structured, though other
targets generate most of their calling convention code through tablegen.
I moved the `CC_RISCV` functions from the `llvm::RISCV` namespace to
`llvm::`. That's what the tablegen code on other targets does and the
functions already have RISCV in their name. `RISCVCCAssignFn` is moved
from `RISCVTargetLowering` to the `llvm` namespace.
Resolves https://github.com/llvm/llvm-project/issues/106970
Currently it returns a fixed size of 0 for the `ptr` element type. The
`ptr` element size should depend on `XLen`, which is 64 on riscv64 and
32 on riscv32.
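The intended behaviour can be summarized by a small Python model. The function name and the type table are hypothetical and only illustrate that a pointer element takes its width from XLen instead of reporting 0.

```python
# Hypothetical table of fixed-width element types, for illustration only.
FIXED_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64, "f32": 32, "f64": 64}


def element_size_in_bits(elt_type: str, xlen: int) -> int:
    """Pointer elements are XLen bits wide; other types have a fixed width."""
    if elt_type == "ptr":
        return xlen
    return FIXED_BITS[elt_type]
```

With this model, `element_size_in_bits("ptr", 64)` is 64 and `element_size_in_bits("ptr", 32)` is 32, while non-pointer types are unaffected by XLen.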
This allows odd sized vector load/store to be legalized to a
VP_LOAD/STORE using EVL.
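A minimal Python model of the idea, assuming the odd-sized vector is widened to the next power of two and the original element count is used as the EVL. The function names are hypothetical, and lanes past the EVL are modeled as `None` because they are never read from memory.

```python
def vp_load(memory, base: int, width: int, evl: int):
    """Load `width` lanes, but only touch memory for the first `evl` lanes."""
    return [memory[base + i] if i < evl else None for i in range(width)]


def legalize_odd_load(memory, base: int, num_elts: int):
    """Widen an odd-sized load to a power-of-two width with EVL = num_elts."""
    width = 1 << (num_elts - 1).bit_length()
    wide = vp_load(memory, base, width, evl=num_elts)
    return wide[:num_elts], width


# Only three elements exist in memory; the widened load must not read a fourth.
memory = [10, 20, 30]
loaded, width = legalize_odd_load(memory, 0, 3)
```

Here `width` is 4 and `loaded` is `[10, 20, 30]`; the fourth lane is masked off by the EVL, so no out-of-bounds access occurs.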
I changed the bf16 tests in fixed-vectors-load.ll and
fixed-vectors-store.ll to use an illegal type to be consistent with the
intent of these files. A legal type is already tested in
fixed-vectors-load-store.ll.
The first mask vector operand is supposed to be assigned to V0. No other
vector types will be assigned to V0. We don't need to pre-assign, we can
just try V0 first for any mask vectors in the normal processing.
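The simplified scheme can be sketched as a single pass over the arguments. This is an illustrative Python model, not the calling convention code: the function name is hypothetical, and the v8 to v23 argument register range is an assumption of the sketch.

```python
def assign_vector_args(arg_kinds):
    """Assign registers in one pass: the first mask gets v0, the rest v8..v23."""
    free = [f"v{i}" for i in range(8, 24)]  # assumed argument register range
    v0_free = True
    assigned = []
    for kind in arg_kinds:
        if kind == "mask" and v0_free:
            assigned.append("v0")  # try V0 first for any mask vector
            v0_free = False
        else:
            assigned.append(free.pop(0))
    return assigned


regs = assign_vector_args(["vec", "mask", "mask", "vec"])
```

`regs` is `["v8", "v0", "v9", "v10"]`: the first mask lands in V0 regardless of its position, and no separate pre-assignment pass is needed.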
This gives us much better control of the generated code for GISel. I've
tried to closely match the current gisel code, but it looks like we had
2 layers of G_ANYEXT in some cases before.
SelectionDAG now checks needsCustom() instead of detecting the special
cases in the Bitcast handler.
Unfortunately, IRTranslator for bitcast still generates copies between
register classes of different sizes. Because of this we can't handle
i16<->f16 bitcasts without crashing. Not sure if I should teach
RISCVInstrInfo::copyPhysReg to allow copies between FPR16 and GPR or if
I should convert the copies to instructions in GISel.
The fp_extend will canonicalize NaNs which is not the semantics of
FNEG/FABS/FCOPYSIGN.
For fixed vectors I'm scalarizing due to test changes on other targets
where the scalarization is expected. I will try to address in a follow
up.
For scalable vectors, we bitcast to integer and use integer logic ops.
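The integer logic ops in question only touch the sign bit, which is why they preserve NaN payloads where a round trip through fp_extend would not. A small Python sketch on raw 16-bit patterns (the helper names are hypothetical):

```python
SIGN_MASK = 0x8000  # sign bit of a 16-bit f16/bf16 bit pattern


def fneg_bits(b: int) -> int:
    """FNEG: flip the sign bit."""
    return b ^ SIGN_MASK


def fabs_bits(b: int) -> int:
    """FABS: clear the sign bit."""
    return b & 0x7FFF


def fcopysign_bits(mag: int, sign: int) -> int:
    """FCOPYSIGN: magnitude bits from mag, sign bit from sign."""
    return (mag & 0x7FFF) | (sign & SIGN_MASK)


snan = 0x7F81  # a bf16 signaling NaN bit pattern with a non-zero payload
negated = fneg_bits(snan)
```

`negated` is `0xFF81`: the payload and the signaling bit are untouched, and only the sign changed.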
Use isel patterns on regular FP_ROUND. For double->bf16 we need
to emit two instructions. Note the double->bf16 conversion does
double rounding, but I don't know a good way to fix that.
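The double rounding problem can be reproduced numerically. The Python sketch below is a software model, not the instruction sequence itself: it rounds a double to f32 and then to bf16, and compares that with a single round-to-nearest-even step straight to bf16 precision. The helper names are hypothetical, and the helpers ignore NaNs, overflow, and subnormals.

```python
import struct


def f64_to_f32(x: float) -> float:
    """Round a double to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_to_bf16_bits(x: float) -> int:
    """Round an f32 to bf16 (round to nearest even), returning the bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def f64_to_bf16_direct(x: float) -> float:
    """Round a double to 7 mantissa bits in one step (round to nearest even)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    drop = 52 - 7  # mantissa bits discarded going from f64 to bf16
    bias = (1 << (drop - 1)) - 1 + ((bits >> drop) & 1)
    rounded = ((bits + bias) >> drop) << drop
    return struct.unpack("<d", struct.pack("<Q", rounded))[0]


# Just above the midpoint between the bf16 neighbours 1.0 and 1.0078125.
x = 1.0 + 2.0**-8 + 2.0**-40
two_step = bf16_bits_to_float(f32_to_bf16_bits(f64_to_f32(x)))
direct = f64_to_bf16_direct(x)
```

The first rounding discards the 2^-40 term and lands exactly on the midpoint, so the second rounding ties to even and gives 1.0, whereas the single-step result is 1.0078125.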
Previously, if Zfbfmin/Zfhmin were enabled, we only handled
build_vectors that could be turned into splat_vectors. We promoted them
to f32 splats by extending in the scalar domain and narrowing in the
vector domain.
This patch fixes a crash where we failed to account for whether the f32
vector type fit in LMUL<=8.
Because the new lowering occurs after type legalization, we have to be
careful to use XLenVT for the scalar integer type and use custom cast
nodes.
The LegalizeDAG expansion will go through memory since i16 isn't a legal
type. Avoid this by using FMV nodes.
Similar to what we did for #106886 for FNEG and FABS. Special care is
needed to handle the Sign operand being a different type.
I don't think we need this node. We can isel fp_extend directly.
fp_extend to f64 requires two instructions, but we can emit them with an
isel pattern.
I have not removed RISCVISD::FP_ROUND_BF16 because f64->bf16 needs more
work to fix the double rounding.
This patch handles target lowering and calling convention.
For target lowering, the vector tuple type, previously represented as
multiple scalable vectors, is now a single `MVT`, and each such `MVT`
has a corresponding register class.
Loads and stores of vector tuples are handled the same way, but need
additional vector insert/extract instructions to get the sub-register
group.
The inline assembly constraint for vector tuple types can be modeled
directly as "vr", which is identical to normal vector registers.
For the calling convention, an alternative algorithm is no longer needed
to handle register allocation, which makes the code easier to maintain
and read.
Stacked on https://github.com/llvm/llvm-project/pull/97994