If VecVT is an LMUL=8 VT, we can't concatenate the vectors as that
would create an illegal type. Instead we need to split the vectors
and emit two VECTOR_INTERLEAVE operations that can each be lowered.
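The split described above can be sketched on plain Python lists. This is only an illustration of the identity being relied on, not the lowering code itself; the function names are hypothetical.

```python
def interleave(a, b):
    """Reference semantics: a0, b0, a1, b1, ... as one flat list."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out


def split_interleave(a, b):
    """Interleave via two half-width interleaves.

    Returns (lo, hi), the two halves of the interleaved result. No
    operand passed to interleave() is wider than half of an input.
    """
    half = len(a) // 2
    lo = interleave(a[:half], b[:half])   # first half-width operation
    hi = interleave(a[half:], b[half:])   # second half-width operation
    return lo, hi


a = list(range(0, 8))
b = list(range(100, 108))
lo, hi = split_interleave(a, b)
# The two halves, taken together, match the unsplit interleave.
assert lo + hi == interleave(a, b)
```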
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D152411
If VecVT is an LMUL=8 VT, we can't concatenate the vectors as that
would create an illegal type. Instead we need to split the vectors
and emit two VECTOR_DEINTERLEAVE operations that can each be lowered.
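A rough sketch of the same idea for the deinterleave direction, again on plain Python lists. The function names are hypothetical and the code only demonstrates that two half-sized operations give the same result as one operation on the concatenation.

```python
def deinterleave(a, b):
    """Reference semantics: even and odd elements of concat(a, b)."""
    wide = a + b
    return wide[0::2], wide[1::2]


def split_deinterleave(a, b):
    """Same result, but each deinterleave only sees one input's halves."""
    half = len(a) // 2   # assumes an even element count
    even_a, odd_a = deinterleave(a[:half], a[half:])
    even_b, odd_b = deinterleave(b[:half], b[half:])
    return even_a + even_b, odd_a + odd_b


a = list(range(0, 8))
b = list(range(8, 16))
assert split_deinterleave(a, b) == deinterleave(a, b)
```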
Reviewed By: luke, rogfer01
Differential Revision: https://reviews.llvm.org/D152402
It's possible that both multiplicands are being negated. This won't
change the opcode, but we can delete the two negates. Allow this
case to get through negateFMAOpcode.
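To illustrate why the opcode is unchanged, here is a minimal sketch using the scalar FMA mnemonics. The table and the helper strip_negates are hypothetical and do not mirror the real negateFMAOpcode signature.

```python
# Counterpart of each FMA flavour when the sign of the product flips:
#   fmadd  =  a*b + c      fnmsub = -(a*b) + c
#   fmsub  =  a*b - c      fnmadd = -(a*b) - c
FLIP_PRODUCT = {
    "fmadd": "fnmsub",
    "fnmsub": "fmadd",
    "fmsub": "fnmadd",
    "fnmadd": "fmsub",
}


def strip_negates(opcode, neg_a, neg_b):
    """Opcode to use once the fnegs on the multiplicands are deleted."""
    # One negate flips the sign of the product; two negates cancel out.
    return FLIP_PRODUCT[opcode] if neg_a != neg_b else opcode


assert strip_negates("fmadd", True, False) == "fnmsub"
assert strip_negates("fmadd", True, True) == "fmadd"   # both fnegs just go away
```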
I think D152260 will also fix this test case, but in the future
it may be possible for an fneg to appear after we've already converted
to RISCVISD opcodes in which case D152260 won't help.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D152296
Where C is a simm32.
This costs an extra temporary register, but avoids a constant pool.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D152236
Since the vfclass instruction sets exactly one bit in the result, we can use vmseq when we only want to check for a single FP class.
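A small model of the one-hot property, written against 32-bit IEEE-754 bit patterns and the fclass bit layout from the RISC-V spec. The helper names are hypothetical; the point is only that an equality compare and a mask test agree when a single class is queried.

```python
import struct


def fclass32(bits):
    """10-bit one-hot class mask for a 32-bit float bit pattern."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        if frac == 0:
            idx = 0 if sign else 7           # -inf / +inf
        else:
            idx = 9 if frac >> 22 else 8     # quiet NaN / signaling NaN
    elif exp == 0:
        if frac == 0:
            idx = 3 if sign else 4           # -0 / +0
        else:
            idx = 2 if sign else 5           # negative / positive subnormal
    else:
        idx = 1 if sign else 6               # negative / positive normal
    return 1 << idx


def f32_bits(x):
    """Bit pattern of x rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_class_masked(bits, mask):
    """General form: AND with the mask, then compare against zero."""
    return (fclass32(bits) & mask) != 0


def is_class_equal(bits, mask):
    """Single-class form: one equality compare (the vmseq case)."""
    return fclass32(bits) == mask


POS_INF = 1 << 7
for value in (float("inf"), float("-inf"), 1.0, -0.0, 0.0):
    b = f32_bits(value)
    assert is_class_masked(b, POS_INF) == is_class_equal(b, POS_INF)
```

The equality form is only valid for a mask with a single bit set; a query such as "any NaN" still needs the masked form.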
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D151967
This patch lowers to vsetvli when the AVL is i32 or XLenVT and
the VF is a power of 2 in the range [1, 64]. VLEN=32 is not supported
as we don't have a valid type mapping for that. VF=1 is not supported
with Zve32* only.
The element width is used to set the SEW for the vsetvli if possible.
Otherwise we use SEW=8.
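The conditions above can be summarised in a short sketch. Both helpers are hypothetical, and two things are assumptions rather than facts from this patch: that "if possible" means the element width is one of 8/16/32/64, and that the VLEN=32 and Zve32* restrictions (not modelled here) are checked elsewhere.

```python
def can_lower_to_vsetvli(vf, avl_bits, xlen=64):
    """AVL must be i32 or XLenVT; VF must be a power of 2 in [1, 64]."""
    power_of_two = vf > 0 and (vf & (vf - 1)) == 0
    return avl_bits in (32, xlen) and power_of_two and 1 <= vf <= 64


def pick_sew(element_width):
    """Use the element width as SEW when it is a legal SEW, else SEW=8.

    Treating 8/16/32/64 as the "possible" widths is an assumption.
    """
    return element_width if element_width in (8, 16, 32, 64) else 8


assert can_lower_to_vsetvli(vf=4, avl_bits=32)
assert not can_lower_to_vsetvli(vf=3, avl_bits=32)     # not a power of 2
assert not can_lower_to_vsetvli(vf=128, avl_bits=64)   # out of range
assert pick_sew(16) == 16
assert pick_sew(1) == 8
```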
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D150824
Add a DAG combine to form these from FADD_VL/FSUB_VL and FP_EXTEND_VL.
This makes it similar to other widening ops and allows us to handle
using the same FP_EXTEND_VL for both operands.
Differential Revision: https://reviews.llvm.org/D151969
Those combines may change the exception behavior and rounding behavior.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D151992
Such symbols may be undefined at link time and thus resolve to 0, which
may be further than 2GiB away from PC, causing the immediate to be out
of range for PC-relative addressing. Using the GOT avoids this, and is
the approach taken by AArch64.
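As a rough illustration of the range problem, the following sketch checks whether an auipc plus 12-bit immediate pair can reach a target on RV64. The helper name and the example addresses are hypothetical.

```python
def pcrel_reachable(pc, target):
    """True if an auipc + 12-bit-immediate pair at pc can form target."""
    offset = target - pc
    # auipc supplies a signed 20-bit value shifted left by 12. The paired
    # 12-bit immediate is sign-extended, hence the 0x800 rounding term.
    hi20 = (offset + 0x800) >> 12
    return -(1 << 19) <= hi20 < (1 << 19)


code_at = 0x1_0000_0000          # code loaded at 4GiB (made-up address)
assert pcrel_reachable(code_at, code_at + 0x1000)
# An undefined weak symbol resolves to 0, which is 4GiB away from here.
assert not pcrel_reachable(code_at, 0)
```

Loading the address from a GOT slot sidesteps the check entirely, since only the slot itself has to be within PC-relative range.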
Reviewed By: asb, MaskRay, arichardson
Differential Revision: https://reviews.llvm.org/D107280
This mirrors lla and is always GOT-relative, allowing an explicit
request to use the GOT without having to expand the instruction. This
then means la is just defined in terms of lla and lga in the assembler,
based on whether PIC is enabled, and at the codegen level we replace la
entirely with lga since we only ever use la there when we want to load
from the GOT (and assert that to be the case).
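A sketch of how the three pseudoinstructions relate, producing the usual expansions as strings. The function names and the .Lpcrel_hi0 label are placeholders, and this models only the assembler-level choice described above.

```python
def expand_lla(rd, sym):
    """PC-relative address computation (local address)."""
    return [
        f"auipc {rd}, %pcrel_hi({sym})",
        f"addi {rd}, {rd}, %pcrel_lo(.Lpcrel_hi0)",
    ]


def expand_lga(rd, sym, xlen=64):
    """Always loads the address from the GOT."""
    load = "ld" if xlen == 64 else "lw"
    return [
        f"auipc {rd}, %got_pcrel_hi({sym})",
        f"{load} {rd}, %pcrel_lo(.Lpcrel_hi0)({rd})",
    ]


def expand_la(rd, sym, pic, xlen=64):
    """la is just lga when PIC is enabled and lla otherwise."""
    return expand_lga(rd, sym, xlen) if pic else expand_lla(rd, sym)


assert expand_la("a0", "foo", pic=True) == expand_lga("a0", "foo")
assert expand_la("a0", "foo", pic=False) == expand_lla("a0", "foo")
```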
See https://github.com/riscv-non-isa/riscv-asm-manual/issues/50
Reviewed By: asb, MaskRay
Differential Revision: https://reviews.llvm.org/D107278
This is a follow up to D151468 which added the vslide1down case as a sub-case of vslide1down matching. This generalizes that code into generic mask matching - specifically to point out the sub-vector insert restriction in the original patch. Since the matching logic is basically the same, go ahead and support vslide1up at the same time.
Differential Revision: https://reviews.llvm.org/D151742
This is more consistent with how we handle integer widening multiply.
A follow up patch will add support for matching vfwmul when the
multiplicand is being squared.
This is pretty straightforward in the basic form. I did need to move the slideup matching earlier, but that looks generally profitable on its own.
As follow ups, I plan to explore the v(f)slide1down variants, and see what I can do to canonicalize the shuffle then insert pattern (see _inverse tests at the end of the vslide1up.ll test).
Differential Revision: https://reviews.llvm.org/D151468
This results in improved codegen for half/bf16 libcalls on soft ABIs.
Adds a RISCVSubtarget helper method for determining if a soft FP ABI is
being targeted (future bf16 related patches make use of this).
Differential Revision: https://reviews.llvm.org/D151434
After D149063.
This patch adds support for both scalable and fixed-length vectors.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D151176
Even though we only need to write to the bottom NumElts - Rotation
elements for the vslidedown.vi, we can save an extra vsetivli toggle if
we just keep the wide VL.
(I may be missing something here: is there a reason why we want to explicitly keep the vslidedown narrow?)
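A simplified model of the rotate, assuming VLMAX equals the vector length and treating untouched elements as unknown. The helper names are hypothetical. It shows that the extra elements written by the wide vslidedown are overwritten by the vslideup anyway.

```python
def vslidedown(dest, src, offset, vl):
    """dest[i] = src[i + offset] for i < vl (0 when reading past src)."""
    out = list(dest)
    for i in range(vl):
        out[i] = src[i + offset] if i + offset < len(src) else 0
    return out


def vslideup(dest, src, offset, vl):
    """dest[i] = src[i - offset] for offset <= i < vl; lower elements kept."""
    out = list(dest)
    for i in range(offset, vl):
        out[i] = src[i - offset]
    return out


def rotate(v, rotation, narrow):
    n = len(v)
    unknown = ["?"] * n
    down_vl = n - rotation if narrow else n   # narrow VL vs. keeping the wide VL
    tmp = vslidedown(unknown, v, rotation, down_vl)
    return vslideup(tmp, v, n - rotation, n)


v = [0, 1, 2, 3, 4, 5, 6, 7]
assert rotate(v, 3, narrow=True) == [3, 4, 5, 6, 7, 0, 1, 2]
assert rotate(v, 3, narrow=False) == rotate(v, 3, narrow=True)
```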
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D151390
This adds the vfslide1down (and vfslide1up for consistency) nodes. These mostly parallel the existing vslide1down/up nodes. (See note below on instruction semantics.) We then use the vfslide1down in build_vector lowering instead of going through the stack.
The specification is more than a bit vague on the meaning of these instructions. All we're given is "The vfslide1down instruction is defined analogously, but sources its scalar argument from an f register."
We have to combine this with a general note at the beginning of section 10. Vector Arithmetic Instruction Formats which reads: "For floating-point operations, the scalar can be taken from a scalar f register. If FLEN > SEW, the value in the f registers is checked for a valid NaN-boxed value, in which case the least-significant SEW bits of the f register are used, else the canonical NaN value is used. Vector instructions where any floating-point vector operand’s EEW is not a supported floating-point type width (which includes when FLEN < SEW) are reserved.".
Note that floats are NaN-boxed when D is implemented.
Combining that all together, we're fine as long as the element type matches the vector type - which it does by construction. We shouldn't have legal vectors which hit the reserved encoding case. An assert is included, just to be careful.
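The NaN-boxing rule quoted above can be sketched for the FLEN=64, SEW=32 case. The helper name is hypothetical; the canonical single-precision NaN is 0x7fc00000.

```python
CANONICAL_NAN_F32 = 0x7FC00000


def scalar_operand_f32(freg_bits):
    """Value a SEW=32 operation sees for a 64-bit f register."""
    upper = freg_bits >> 32
    if upper == 0xFFFFFFFF:                  # valid NaN-boxed value
        return freg_bits & 0xFFFFFFFF        # use the low SEW bits
    return CANONICAL_NAN_F32                 # otherwise the canonical NaN


ONE_F32 = 0x3F800000
assert scalar_operand_f32(0xFFFFFFFF_00000000 | ONE_F32) == ONE_F32
assert scalar_operand_f32(ONE_F32) == CANONICAL_NAN_F32   # not boxed
```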
Differential Revision: https://reviews.llvm.org/D151347
For stores of small fixed-length vector constants, we can store them
with a sequence of lui/addi/sh/sw to avoid the cost of building the
vector and the vsetivli toggle, provided the constant materialization
cost isn't too high.
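A minimal sketch of the idea: the vector constant is packed into one integer (little-endian, element 0 in the low bits), which can then be materialized and stored with a scalar store. The helper names are hypothetical, and the cost check mentioned above is not modelled.

```python
def pack_vector_constant(elements, elt_bits):
    """Pack elements into one integer, element 0 in the lowest bits."""
    value = 0
    mask = (1 << elt_bits) - 1
    for i, elt in enumerate(elements):
        value |= (elt & mask) << (i * elt_bits)
    return value


def scalar_store_for(elements, elt_bits):
    """Store mnemonic for the packed constant, or None if not handled here."""
    total_bits = len(elements) * elt_bits
    return {16: "sh", 32: "sw"}.get(total_bits)


# <2 x i16> <1, 2> becomes the 32-bit scalar 0x00020001, stored with sw.
assert pack_vector_constant([1, 2], 16) == 0x00020001
assert scalar_store_for([1, 2], 16) == "sw"
```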
This subsumes the optimisation for stores of zeroes in
4dc9a2c5b93682c12d7a80bbe790b14ddb301877
(This is a reapply of 0ca13f9d2701e23af2d000a5d8f48b33fe0878b7)
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D151221
For stores of small fixed-length vector constants, we can store them
with a sequence of lui/addi/sh/sw to avoid the cost of building the
vector and the vsetivli toggle.
Note that this only handles vectors that are 32 bits or smaller, but
could be expanded to 64 bits if we know that the constant
materialization cost isn't too high.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D151221
For small fixed-length vector copies like
    vsetivli zero, 2, e16, m1, ta, ma
    vle16.v v8, (a0)
    vse16.v v8, (a1)
We can scalarize them if the total vector size < XLEN:
    lw a0, 0(a0)
    sw a0, 0(a1)
This patch adds a DAG combine to do so, reusing much of the existing
logic in https://reviews.llvm.org/D150717
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D151103
Add minimal support to lower return, and introduce an OutgoingValueHandler and an OutgoingValueAssigner for returns.
Supports return values with integer, pointer and aggregate types.
(Update of D69808 - avoiding commandeering that revision)
Co-authored By: lewis-revill
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D117318
The term "next stack offset" is misleading because the next argument is
not necessarily allocated at this offset due to alignment constraints.
It also does not make much sense when allocating arguments at negative
offsets (introduced in a follow-up patch), because the returned offset
would be past the end of the next argument.
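A toy allocator shows the alignment point. The class and method names are hypothetical and only model positive offsets.

```python
class StackAllocator:
    """Tracks how many bytes of stack have been handed out so far."""

    def __init__(self):
        self.size = 0

    def allocate(self, size, align):
        # Round the current size up to the requested alignment first.
        offset = (self.size + align - 1) // align * align
        self.size = offset + size
        return offset


stack = StackAllocator()
assert stack.allocate(4, 4) == 0
assert stack.size == 4               # what "next stack offset" would report
assert stack.allocate(8, 8) == 8     # but the next argument lands at 8, not 4
assert stack.size == 16
```

The tracked value is therefore better read as the size of the allocated area than as the location of the next argument.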
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D149566
The direct motivation here is to undo an unprofitable vectorization performed by SLP, but the transform seems generally useful as well. If we are storing a zero to memory, we can use a single scalar store (from X0) for all power of two sizes up to XLen.
Differential Revision: https://reviews.llvm.org/D150717
Define intersectWith and unionWith as two complementary ways of
combining KnownBits. The names are chosen for consistency with
ConstantRange.
Deprecate commonBits as a synonym for intersectWith.
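The two operations can be modelled with a pair of bit masks. This is a small Python sketch of the semantics, not the C++ API; the class and method names are spelled in Python style for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBits:
    zero: int   # bits known to be 0
    one: int    # bits known to be 1

    def intersect_with(self, other):
        """Keep only what both sides know (e.g. merging two possible values)."""
        return KnownBits(self.zero & other.zero, self.one & other.one)

    def union_with(self, other):
        """Combine two facts about the same value (assumes they do not conflict)."""
        return KnownBits(self.zero | other.zero, self.one | other.one)


a = KnownBits(zero=0b1100, one=0b0001)
b = KnownBits(zero=0b0100, one=0b0011)
assert a.intersect_with(b) == KnownBits(zero=0b0100, one=0b0001)
assert a.union_with(b) == KnownBits(zero=0b1100, one=0b0011)
```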
Differential Revision: https://reviews.llvm.org/D150443
Use it to replace isel patterns with a DAG combine of FP_EXTEND_VL+VFMADD_VL.
This makes it similar to how other widening operations are handled.
I plan to use this to make it easier to form tail undisturbed vfwmacc.
Previously, LegalizeVectorOps used the result VT while LegalizeDAG
used the operand VT. This patch makes them both use the operand VT.
This also makes it consistent with how the default cost model works.
I've hacked the AArch64 cost model to maintain old behavior for some
f16 vectors.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149572