This patch adds codegen support for vector with bfloat16 type in llvm backend.
With this patch, Zvbfmin/Zvbfwma instructions as well as vle16/vse16 can generated from newly added bf16 IR intrinsics.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156287
This removes selectSETCC and adds isel patterns for seteq/setne
conditions.
This removes the duplication of selectSETCC between lowering and
isel. This also gets some cases in xaluo.ll that we missed previously.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D156250
Because we have STRICT_FCVT_W_RV64 equal to ISD::FIRST_TARGET_STRICTFP_OPCODE, the check needs to be splitted into 2 parts.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155683
If we are selecting between two setccs that need to be legalized
with xor, the select will be legalized first. Detect this pattern
so we can pull the xor through to expose it to additional
optimizations.
We could generalize this to other operations, but those normally
get handled in DAG combine before select legalization.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D156159
If we're selecting the result of two setccs that have been legalized
by introducing an xor with 1, we can pull the xor with 1 through the
select to enable more optimizations.
We could generalize this to other binary operators with identical
conditions, but those are usually caught before we legalize the select.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D156144
Unlike fmaxnum and fminnum, these operations propagate nan and
consider -0.0 to be less than +0.0.
Without Zfa, we don't have a single instruction for this. The
lowering I've used forces the other input to nan if one input
is a nan. If both inputs are nan, they get swapped. Then use
the fmax or fmin instruction.
New ISD nodes are needed because fmaxnum/fminnum to not define
the order of -0.0 and +0.0.
This lowering ensures the snans are quieted though that is probably not
required in default environment). Also ensures non-canonical nans
are canonicalized, though I'm also not sure that's needed.
Another option could be to use fmax/fmin and then overwrite the
result based on the inputs being nan, but I'm not sure we can do
that with any less code.
Future work will handle nonans FMF, and handling the case where
we can prove the input isn't nan.
This does fix the crash in #64022, but we need to do more work
to avoid scalarization.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D156069
This code was originally added in D134277. This transform is now
available in target independent DAG combine after D153502.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D156075
The +unaligned-scalar-mem and +unaligned-vector-mem features were added in
D126085 and D149375 respectively to allow subtargets to indicate that
they supported misaligned loads/stores with "sufficient" performance.
This is separate from whether or not the target actually supports
misaligned accesses, which could be determined from Zicclsm.
This patch enables the Fast flag under the assumption that any subtarget
that declares support for +unaligned-*-mem will want to opt into
optimisations that take advantage of misaligned scalar accesses, such as
store merging.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D150771
This patch supports register allocation for floating-point types when
`zfinx` and `zdinx` is specified in the architecture for the GHC
calling convention.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155910
The implementation in https://reviews.llvm.org/D151313 is done for the circumstance without Zfbfmin. This patch adds codegen support for the 6 instructions provided in Zfbfmin extension.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153234
This is an alternative to D155288 that can handle other sources of
xori like FP compares. Unfortunately, it misses the i64 setge case
on RV32 in condops.ll.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D155328
Within the AggressiveInstCombine Pass we have
an analysis/optimization that matches that
pattern of the Table Based CTZ. Some Targets do
not support/define ctz(0), but since the
AggressiveInstCombine is just an extension of
InstCombine, it should be a target-independent
canonicalization Pass, and therefore, we decided
to introduce several instructions, such as select
and compare that produce canonical IR, even if
the input is 0. The task for the Targets that do
support that input is to handle such a case and
to produce an optimal assembly.
This patch optimizes the CTTZ/CTLZ instructions
if the input is 0 by performing the`DAG combine`,
by generating the cttz(x) & 0x1f pattern (the
same goes for ctlz as well).
Differential Revision: https://reviews.llvm.org/D151449
In D155502, we added code for the compiler to check GPR-s for f16
under zhinx. This commit adds code to hit the stack when we run out of
GPR-s.
With this patch and D155502, resolves#63922
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155507
Instead of zero extending the inputs by masking. We can shift them
left instead. This is cheaper when we don't zext.w instruction.
This does make the case where the inputs are already zero extended
or freely zero extendable worse though.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D155530
This makes Zicond and XVentanaCondOps use the same code path.
The instructions have identical semantics.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D155391
Resolves#63917.
Also lets the compiler check for available GPR before hitting the stack.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D155502
D111904, D141585 made RISC-V customized lower vector ISD::CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF/CTLZ
by converting to float and using the float result.
Perhaps VP_CTLZ_ZERO_UNDEF/VP_CTTZ_ZERO_UNDEF/VP_CTLZ could use the similar feature.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155150
Previously we returned i32 on RV32 and i64 on RV64. The instructions
only consume 32 bits and only produce 32 bits. For RV64, the result
is sign extended to 64 bits like *W instructions.
This patch removes this detail from the interface to improve
portability and consistency. This matches the proposal for scalar
intrinsics here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44
I've included IR autoupgrade support as well.
I'll be doing this for other builtins/intrinsics that currently use
'long' in other patches.
Reviewed By: VincentWu
Differential Revision: https://reviews.llvm.org/D154647
This follows the pattern of lowering VP nodes to equivalent
RISCVISD::*_VL nodes. The nodes are modelled after the VP ISD nodes rather
than the actual zvbb instructions, and I've included a merge operand to be
consistent with the underlying pseudos that were recently refactored.
I've defined the nodes in RISCVInstrInfoVVLpatterns.td as the nodes aren't Zvk
specific, but the patterns are in RISCVInstrInfoZvk.td.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155229
This allows us to remove some curly braces around the if body.
The code wasn't consistent about it anyway. Comments before is
used in other places in this file already.
Reviewed By: wangpc, MaskRay
Differential Revision: https://reviews.llvm.org/D155390
We can use a null SDValue for the 'false' case. This avoids the
need for an output parameter. This is consistent with other
SelectionDAG code.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D155388
This allows us to remove some curly braces around the if body.
The code wasn't consistent about it anyway. Comments before is
used in other places in this file already.
Differential Revision: https://reviews.llvm.org/D155390
This fixes some bugs in the original commit:
(1) Operands are passed in correct order when creating new constant
and the binary operator. New tests were added to cover these cases.
(2) Check was added to see if it is safe to commute the select and the binary operator.
Reviewed By: Craig Topper
Differential Revision: https://reviews.llvm.org/D152147
We can use an i64 clmul to emulate i32 clmul.
For clmulh and clmulr we need to zero extend the 32 bit input
to 64 bits then extract either bits [63:32] or [62:31].
Unfortunately, without Zba we need to use 2 shifts for the
zero extends. These can be optimized out later if the producing
instruction already zeroed the upper bits or if we can use lwu.
There are alternative sequences we can use for clmulh/clmulr
when the zero extend isn't free, but those are best handled by
a DAG combine to give the best opportunity for removing the extend.
This allows us to implement i32 clmul C intrinsics proposed in
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D154729
This patch is a step towards altering how we handle the emission of
condops. Marking ISD::SELECT as legal is a major change in the codegen
path, and gives few options for maintaining the old codegen path when
it is believed to be better (e.g. a better branchless sequence is
possible using non-zicond instructions, or the branch-based sequence is
preferable).
This removes the existing SelectionDAG patterns and moves the logic into
lowerSELECT. Along some small codegen changes you'll note a few minor
regressions in the generated code quality - this are due to the fact
that by lowering the SELECT node early we miss out on combines that
would kick in later when setcc condcodes that aren't natively supported
have been expanded (thus exposing opportunities for optimisation by
performing logical negation and swapping truev/falsev). I've opted to
split out work that addresses these into follow-on patches (especially
as zicond is still 'experimental').
matchSetCC is a straight-forward translation from the version in
RISCVISelDAGToDAG. Ideally, in the future it can be converted to a
helper shared between both files.
Differential Revision: https://reviews.llvm.org/D155083
(shl (zext to iXLenVec), C) is a possible pattern in auto-vectorized code for
indexed loads/stores. But extending to iXLen might be too aggressive, RVV
indexed load/store instructions zero extend their indexed operand to XLEN.
The patch tries to narrow the type of the zero extension. It's benefit to
decrease register pressure.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154687
This patch adds pseudos and SDNode patterns for vbrev.v, vrev8.v, vclz.v,
vctz.v and vcpop.v.
I've only added them for integer element types so far since we're lacking tests
for floats.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155216
Depends on D154634
For the cover letter of the patch-set, please checkout D154628.
This is the 7th patch of the patch-set. This patch includes change to
vfcvt_x_f, vfcvt_xu_f, vfwcvt_x_f, vfwcvt_xu_f, vfncvt_x_f, vfncvt_xu_f
vfcvt_f_x, vfcvt_f_xu, vfncvt_f_x vfncvt_f_xu, vfncvt_f_f
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154635
Depends on D152879.
Specification PR: riscv-non-isa/rvv-intrinsic-doc#226
This patch adds variant of `vfadd` that models the rounding mode control.
The added variant has suffix `_rm` appended to differentiate from the
existing ones that does not alternate `frm` and uses whatever is inside.
The value `7` is used to indicate no rounding mode change. Reusing the
semantic from the rounding mode encoding for scalar floating-point
instructions.
Additional data member `HasFRMRoundModeOp` is added so we can append
`_rm` suffix for the fadd variants that models rounding mode control.
Additional data member `IsRVVFixedPoint` is added so we can define
pseudo instructions with rounding mode operand and distinguish the
instructions between fixed-point and floating-point.
Reviewed By: craig.topper, kito-cheng
Differential Revision: https://reviews.llvm.org/D152996
Some instructions treat x0 as a special encoding rather than as a
value of 0. Since we don't parse the inline assembly to know what
the instruction is, chooser the safest option of never using x0.
Fixes#63747.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D154744
The orc_b and brev8 intrinsics are type overloaded, but only
i32 and XLen are supported types. The type legalization code in
ReplaceNodeResults only handles the i32 case on RV64. Add some
checks so we will fail type legalization for other types.
vwcvt.f.x doesn't use rounding mode. The integer value fits in
the mantissa of a 2x larger FP type so no rounding is required.
I've remove the Uses = [FRM] that is also not needed.
I deleted the isel patterns. Alternatively, we could keep them and
drop the rounding mode immediate. The patterns are currently untested
so I chose to delete them. If they become needed in the future, we
can decide then if we should have the patterns or teach the node
creation to use the non-RM form for widening.
This reverts part of D142102.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D154653
As noted by @reames, we should be checking that the memory access is aligned to
the element size (or the unaligned vector memory access feature is enabled)
before lowering vlseg/vsseg intrinsics via the interleaved access pass.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D154536
This constructs a proper memory operand for riscv_vsoxei_mask and riscv_vsuxei_mask.
I think they are missed in D147119.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D154694
Following from D153864, this patch implements the lowerDeinterleaveIntrinsic
hook to lower deinterleaves of loads into vlseg2 intrinsics.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153876