303 Commits

Craig Topper
524545317c [RISCV] Remove RISCVISD::BREV8 and use RISCVISD::GREV instead.
We already have an ISD opcode for the more general GREV/GREVI
instruction. We can just use it with the encoding that corresponds
to the behavior of brev8. This is similar to what we do for orc.b
where we use the GORC ISD opcode.
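
For reference, here is a scalar model of GREV based on the bitmanip spec's reference code (an editorial sketch, not part of this commit); it shows why brev8 is simply GREV with control value 7, just as orc.b is GORC with the same encoding:
```
#include <cassert>
#include <cstdint>

uint64_t grev64(uint64_t x, unsigned shamt) {
  if (shamt & 1)
    x = ((x & 0x5555555555555555ULL) << 1) | ((x & 0xAAAAAAAAAAAAAAAAULL) >> 1);
  if (shamt & 2)
    x = ((x & 0x3333333333333333ULL) << 2) | ((x & 0xCCCCCCCCCCCCCCCCULL) >> 2);
  if (shamt & 4)
    x = ((x & 0x0F0F0F0F0F0F0F0FULL) << 4) | ((x & 0xF0F0F0F0F0F0F0F0ULL) >> 4);
  if (shamt & 8)
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) | ((x & 0xFF00FF00FF00FF00ULL) >> 8);
  if (shamt & 16)
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x & 0xFFFF0000FFFF0000ULL) >> 16);
  if (shamt & 32)
    x = (x << 32) | (x >> 32);
  return x;
}

int main() {
  // brev8 reverses the bits within each byte: only the three low swap
  // stages run, i.e. GREV with encoding 7
  assert(grev64(0x01, 7) == 0x80);
  assert(grev64(0x0102, 7) == 0x8040);
}
```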
2022-01-29 22:45:43 -08:00
Craig Topper
d8f929a567 [RISCV] Custom legalize BITREVERSE with Zbkb.
With Zbkb, a bitreverse can be split into a rev8 and a brev8.
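
A quick scalar check of why the split works (an illustrative sketch, not code from the patch): a full 64-bit bitreverse is a per-byte bit reverse (brev8) followed by a byte swap (rev8):
```
#include <cassert>
#include <cstdint>

uint64_t brev8(uint64_t x) {        // reverse the bits within each byte
  uint64_t r = 0;
  for (int b = 0; b < 8; ++b)
    for (int i = 0; i < 8; ++i)
      r |= ((x >> (8 * b + i)) & 1) << (8 * b + 7 - i);
  return r;
}

uint64_t bitreverse64(uint64_t x) { // full 64-bit bit reverse
  uint64_t r = 0;
  for (int i = 0; i < 64; ++i)
    r |= ((x >> i) & 1) << (63 - i);
  return r;
}

int main() {
  uint64_t x = 0x0123456789ABCDEFULL;
  // rev8 is a plain byte swap; the compiler builtin stands in for it here
  assert(bitreverse64(x) == __builtin_bswap64(brev8(x)));
}
```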

Reviewed By: VincentWu

Differential Revision: https://reviews.llvm.org/D118430
2022-01-28 23:11:12 -08:00
Chenbing.Zheng
6d6c44a3f3 [RISCV] Add support for matching vwmulsu from fixed vectors
According to riscv-v-spec-1.0, vwmulsu is a widening signed(vs2)-by-unsigned(vs1) integer multiply:
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar

It is worth noting that the signed operand is always vs2.
For vwmulsu.vv, we can swap the two operands, since it doesn't matter
which one is sign-extended, but for vwmulsu.vx the sign-extended operand
cannot be the vector extended from the scalar (rs1).
I specifically added two functions ending with _swap in the test case.
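
As an illustrative per-element model (an editorial sketch, names hypothetical):
```
#include <cassert>
#include <cstdint>

// vs2 is sign-extended, vs1 (or the scalar rs1) is zero-extended
int64_t vwmulsu_elt(int32_t vs2, uint32_t vs1) {
  return (int64_t)vs2 * (int64_t)(uint64_t)vs1;
}

int main() {
  // .vv: both inputs are vectors, so the matcher may commute which input
  // plays the vs2 role; .vx: the splat of rs1 must stay on the vs1 side
  assert(vwmulsu_elt(-2, 3u) == -6);
  assert(vwmulsu_elt(-1, 0xFFFFFFFFu) == -4294967295LL);
}
```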

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118215
2022-01-28 02:33:30 +00:00
Fraser Cormack
af773a1818 [RISCV][VP] Lower VP_MERGE to RVV instructions
This patch adds lowering of the llvm.vp.merge.* intrinsic
(ISD::VP_MERGE) to RVV vmerge/vfmerge instructions. It introduces a
special pseudo form of vmerge which allows a tied merge operand,
letting us specify the tail elements as being equal to the "on
false" operand, using a tied-def constraint and a "tail undisturbed"
policy.

While this strategy allows us to often lower the intrinsic to just one
instruction, it may be less efficient in fixed-vector types as the
number of tail elements may extend far beyond the length of the fixed
vector. Another strategy could be to use a vmerge/vfmerge instruction
with an AVL equal to the length of the vector type, and manipulate the
condition operand such that mask elements greater than the operation's
EVL are false.

I've also observed inefficient codegen in which our 'VF' patterns don't
match raw floating-point SPLAT_VECTORs, which occur in scalable-vector
code.
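
For reference, a scalar model of the llvm.vp.merge semantics being lowered (an illustrative sketch; names hypothetical):
```
#include <cassert>

// lanes below EVL choose by mask; the tail (lanes >= EVL) equals the
// "on false" operand, matching the tied-def lowering described above
void vp_merge(const bool *m, const int *on_true, const int *on_false,
              int *out, unsigned evl, unsigned vl_max) {
  for (unsigned i = 0; i < vl_max; ++i)
    out[i] = (i < evl && m[i]) ? on_true[i] : on_false[i];
}

int main() {
  bool m[4] = {true, false, true, true};
  int t[4] = {1, 2, 3, 4}, f[4] = {9, 8, 7, 6}, out[4];
  vp_merge(m, t, f, out, /*evl=*/2, /*vl_max=*/4);
  assert(out[0] == 1 && out[1] == 8 && out[2] == 7 && out[3] == 6);
}
```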

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117561
2022-01-24 11:05:05 +00:00
Craig Topper
fa8bb22466 [RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors.
RISCV only has a unary shuffle, which requires placing indices in a
register. For interleaving two vectors this means we need at least
two vrgathers and a vmerge to shuffle two vectors.

This patch teaches shuffle lowering to use a widening addu followed
by a widening vmaccu to implement the interleave. First we extract
the low half of both V1 and V2. Then we implement
(zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)) which
simplifies to (zext(V1) + zext(V2) * 2^eltbits). This further
simplifies to (zext(V1) + zext(V2) << eltbits). Then we bitcast the
result back to the original type splitting the wide elements in half.

We can only do this if we have a type with wider elements available.
Because we're using extends we also have to be careful with fractional
lmuls. Floating point types are supported by bitcasting to/from integer.

The tests cover a varied combination of LMULs split across VLEN>=128 and
VLEN>=512 runs. There are a few tests with commuted shuffle indices as well
as tests for undef indices. There's one test for a vXi64/vXf64 vector which
we can't optimize, but which verifies we don't crash.
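
A scalar sketch of the arithmetic (illustrative only; it assumes the little-endian lane layout RISC-V uses) shows why the widening add plus widening multiply-accumulate interleaves the elements:
```
#include <cassert>
#include <cstdint>
#include <cstring>

int main() {
  uint8_t v1[4] = {1, 2, 3, 4}, v2[4] = {5, 6, 7, 8};
  uint16_t wide[4];
  for (int i = 0; i < 4; ++i) {
    uint16_t sum = (uint16_t)(v1[i] + v2[i]); // widening add: zext(V1) + zext(V2)
    wide[i] = (uint16_t)(sum + v2[i] * 0xFF); // vmaccu: + zext(V2) * (2^8 - 1)
  }
  uint8_t out[8];
  std::memcpy(out, wide, sizeof(out));        // bitcast back to the narrow type
  for (int i = 0; i < 4; ++i) {               // lanes are now interleaved
    assert(out[2 * i] == v1[i]);
    assert(out[2 * i + 1] == v2[i]);
  }
}
```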

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D117743
2022-01-20 14:44:47 -08:00
Craig Topper
aa7fc02feb Recommit "[RISCV] Make the operand order for RISCVISD::FSL(W)/FSR(W) match the instruction register numbering."
This reverts the revert commit e32838573929ac85fc4df3058593798d10ce4cd2.

The accidental demanded-bits change has been removed. The demanded-bits
code itself was removed in a pre-commit since it isn't tested.

Original commit message:
Previously we used the fshl/fshr operand ordering for simplicity. This
made things confusing when D117468 proposed adding intrinsics for
the instructions. We can't just use the generic funnel shifting
intrinsics because fsl/fsr have different functionality that should
be exposed to software.

Now we use rs1, rs3, rs2/shamt order which matches the instruction
printing order and the order used in this intrinsic header
https://github.com/riscv/riscv-bitmanip/blob/main-history/cproofs/rvintrin.h
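
For reference, a scalar sketch of fsl in the new rs1, rs3, rs2/shamt order, modeled on the reference semantics in the linked header (an editorial sketch, not code from this commit):
```
#include <cassert>
#include <cstdint>
#include <utility>

// fsl rd, rs1, rs3, rs2: left funnel shift of the 64-bit value {rs1, rs3}
// by rs2 & 63, keeping the upper 32 bits
uint32_t fsl32(uint32_t rs1, uint32_t rs3, uint32_t rs2) {
  unsigned shamt = rs2 & 63;                  // amount is modulo 2*XLEN
  uint64_t a = rs1, b = rs3;
  if (shamt >= 32) { shamt -= 32; std::swap(a, b); }
  return shamt ? (uint32_t)((a << shamt) | (b >> (32 - shamt)))
               : (uint32_t)a;
}

int main() {
  assert(fsl32(0xAABBCCDD, 0x11223344, 0) == 0xAABBCCDD);
  assert(fsl32(0xAABBCCDD, 0x11223344, 8) == 0xBBCCDD11);
  assert(fsl32(0xAABBCCDD, 0x11223344, 32) == 0x11223344);
}
```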
2022-01-18 10:52:43 -08:00
Craig Topper
e328385739 Revert "[RISCV] Make the operand order for RISCVISD::FSL(W)/FSR(W) match the instruction register numbering."
This reverts commit b634f8a663d56877663f5224a785d9f0263c4176.

I broke the SimplifyDemandedBits code, but we don't have tests.
2022-01-18 10:36:03 -08:00
Craig Topper
b634f8a663 [RISCV] Make the operand order for RISCVISD::FSL(W)/FSR(W) match the instruction register numbering.
Previously we used the fshl/fshr operand ordering for simplicity. This
made things confusing when D117468 proposed adding intrinsics for
the instructions. We can't just use the generic funnel shifting
intrinsics because fsl/fsr have different functionality that should
be exposed to software.

Now we use rs1, rs3, rs2/shamt order which matches the instruction
printing order and the order used in this intrinsic header
https://github.com/riscv/riscv-bitmanip/blob/main-history/cproofs/rvintrin.h
2022-01-18 09:47:28 -08:00
David Sherwood
f4515ab858 Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants"
This reverts commit 197f3c0deb76951315118ef13937b67ea9cbd5aa.

Reverting after miscompilation errors discovered with ffmpeg.
2022-01-18 08:40:20 +00:00
Han-Kuan Chen
ec9cb3a79c [RISCV] Provide VLOperand in td.
Currently, users expect VL to be the last operand. However, since some
intrinsics have a tail policy in the last operand, this rule can no
longer be relied on.

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D117452
2022-01-17 20:25:47 -08:00
Han-Kuan Chen
3fc4b5896a [RISCV] Make SplatOperand start from 0.
Currently, SplatOperand starts from 1 because operand 0 (or 1) is the
intrinsic id in SelectionDAG.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117453
2022-01-17 20:14:59 -08:00
David Sherwood
197f3c0deb [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
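
A small illustration of the difference (an editorial sketch, not from the patch):
```
#include <cassert>
#include <cstdint>

int main() {
  int32_t c = -1;
  uint64_t z = (uint64_t)(uint32_t)c; // zero-extend: 0x00000000FFFFFFFF
  int64_t s = (int64_t)c;             // sign-extend: 0xFFFFFFFFFFFFFFFF, i.e. -1,
                                      // which one mov immediate can materialize
  assert(z == 0xFFFFFFFFULL);
  assert(s == -1);
}
```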

New tests added here:

  CodeGen/AArch64/sve-vector-splat.ll

I believe we see some code quality improvements in these existing
tests too:

  CodeGen/AArch64/reduce-and.ll
  CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.

Differential Revision: https://reviews.llvm.org/D114357
2022-01-17 11:08:57 +00:00
David Sherwood
ba471ba8d2 Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants"
This reverts commit 31009f0b5afb504fc1f30769c038e1b7be6ea45b.

It seems to be causing SVE VLA buildbot failures and has introduced a
genuine regression. Reverting for now.
2022-01-13 15:59:43 +00:00
David Sherwood
31009f0b5a [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.

New tests added here:

  CodeGen/AArch64/sve-vector-splat.ll

I believe we see some code quality improvements in these existing
tests too:

  CodeGen/AArch64/dag-numsignbits.ll
  CodeGen/AArch64/reduce-and.ll
  CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.

Differential Revision: https://reviews.llvm.org/D114357
2022-01-13 09:43:07 +00:00
Lian Wang
16877c5d2c [RISCV] Add bfp and bfpw intrinsic in zbf extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116994
2022-01-13 02:53:00 +00:00
wangpc
c6430fade3 [RISCV] Generate 32 bits jumptable entries when code model is small
In the small code model, the code can only address the whole RV32 address
space or the lower 2 GiB of the RV64 address space, so 32-bit entries are
enough. Both cache hit ratio and code size see some improvement.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116435
2022-01-11 18:20:37 +08:00
wangpc
98d51c2542 [RISCV] Override TargetLowering::BuildSDIVPow2 to generate SELECT
When `Zbt` is enabled, we can generate SELECT for division by power
of 2, so that there is no data dependency.
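
An illustrative scalar model of the select-based expansion (an editorial sketch; names hypothetical):
```
#include <cassert>
#include <cstdint>

// sdiv truncates toward zero, so negative dividends need a bias of 2^k - 1;
// picking the biased or unbiased value with a SELECT avoids the serial
// srai/srli/add chain of the usual expansion
int32_t sdiv_pow2(int32_t x, unsigned k) {
  int32_t bias = (int32_t)(1u << k) - 1;
  int32_t t = (x < 0) ? x + bias : x; // becomes a conditional move with Zbt
  return t >> k;                      // arithmetic shift
}

int main() {
  assert(sdiv_pow2(7, 1) == 3 && sdiv_pow2(-7, 1) == -3);
  assert(sdiv_pow2(8, 2) == 2 && sdiv_pow2(-8, 2) == -2);
}
```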

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D114856
2022-01-11 15:54:35 +08:00
Craig Topper
808c662665 [RISCV] Change RISCVISD::FCVT*RTZ opcodes to take rounding mode as an operand.
Pre-work for a future change that will use these opcodes with other
rounding modes.

Differential Revision: https://reviews.llvm.org/D116724
2022-01-06 08:12:12 -08:00
wangpc
41454ab256 [RISCV] Use constant pool for large integers
For large integers (for example, magic numbers generated by
TargetLowering::BuildSDIV when dividing by a constant), we may
need about 4~8 instructions to build them.
At the same time, it takes just two instructions to load a
constant (with extra cycles to access memory), so it may be
profitable to put these integers into the constant pool.

Reviewed By: asb, craig.topper

Differential Revision: https://reviews.llvm.org/D114950
2021-12-31 14:48:48 +08:00
Victor Perez
10b3675aa9 [RISCV][VP] Lower mask vector VP AND/OR/XOR to RVV instructions
For fixed and scalable vectors, each intrinsic x is lowered to vmx.mm,
dropping the mask, which is safe to do as masked-off elements are
undef anyway.

Differential Revision: https://reviews.llvm.org/D115339
2021-12-23 15:02:32 -06:00
Craig Topper
b7b260e19a [RISCV] Support strict FP conversion operations.
This adds support for strict conversions between fp types and between
integer and fp.

NOTE: RISCV has static rounding mode instructions, but the constrained
intrinsic metadata is not used to select static rounding modes. Dynamic
rounding mode is always used.

Differential Revision: https://reviews.llvm.org/D115997
2021-12-23 09:40:58 -06:00
jacquesguan
28a3e7dea2 [RISCV] Override hasAndNotCompare to use more andn when the Zbb extension is available.
Enable the transforms (X & Y) == Y ---> (~X & Y) == 0 and (X & Y) != Y ---> (~X & Y) != 0 when the Zbb extension is available, to make more use of the andn instruction.

Differential Revision: https://reviews.llvm.org/D115922
2021-12-23 10:42:20 +08:00
Craig Topper
3f1c403a2b [RISCV] Use AdjustInstrPostInstrSelection to insert a FRM dependency for scalar FP instructions with dynamic rounding mode.
In order to support constrained FP intrinsics we need to model FRM
dependency. Whether or not an instruction uses FRM is based on a 3-bit
field in the instruction. Because of this we can't add
'Uses = [FRM]' to the tablegen descriptions.

This patch examines the immediate after isel and adds an implicit
use of FRM. This idea came from Roger Ferrer Ibanez.

Other ideas:
We could be overly conservative and just pretend all instructions with
an frm field read the FRM register. Or we could have pseudoinstructions
for CodeGen with rounding mode.

Reviewed By: asb, frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D115555
2021-12-14 10:17:57 -08:00
David Green
9e8a71caf0 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have less strict semantics
than the fptosi_sat, with fewer values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.
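
For reference, a scalar model of the matched pattern next to the fptosi.sat semantics (an editorial sketch; nan handling follows the ISD definition):
```
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// the pattern being matched: convert wide, then clamp with smin/smax
// (only defined when the plain fptosi itself is defined)
int32_t clamped_fptosi(double x) {
  int64_t wide = (int64_t)x; // fptosi: undefined for NaN / out-of-range x
  return (int32_t)std::clamp<int64_t>(wide, INT32_MIN, INT32_MAX);
}

// fptosi.sat: fully defined, NaN gives 0 and out-of-range saturates
int32_t fptosi_sat_i32(double x) {
  if (std::isnan(x)) return 0;
  if (x < (double)INT32_MIN) return INT32_MIN;
  if (x > (double)INT32_MAX) return INT32_MAX;
  return (int32_t)x;
}

int main() {
  assert(clamped_fptosi(1e15) == INT32_MAX); // in i64 range, clamped to i32
  assert(fptosi_sat_i32(1e18) == INT32_MAX);
  assert(fptosi_sat_i32(NAN) == 0);
}
```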

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 15:29:14 +00:00
Hans Wennborg
a87782c34d Revert "[DAG] Create fptosi.sat from clamped fptosi"
It causes builds to fail with this assert:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

See comment on the code review.

> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have less strict semantics
> than the fptosi_sat, with fewer values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976

This reverts commit 52ff3b009388f1bef4854f1b6470b4ec19d10b0e.
2021-11-30 15:36:56 +01:00
David Green
52ff3b0093 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have less strict semantics
than the fptosi_sat, with fewer values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 11:05:32 +00:00
Philipp Tomsich
af57a71d18 [RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk
On RISC-V, icmp is not sunk, which generates the following suboptimal
branch pattern:
```
  core_list_find:
	lh	a2, 2(a1)
	seqz	a3, a0         <<
	bltz	a2, .LBB0_5
	bnez	a3, .LBB0_9    << should sink the seqz
        [...]
	j	.LBB0_9
  .LBB0_5:
	bnez	a3, .LBB0_9    << should sink the seqz
	lh	a1, 0(a1)
        [...]
```

The blocks after `codegenprepare` look as follows:
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !4
    %cmp = icmp sgt i16 %0, -1
    %tobool.not37 = icmp eq %struct.list_head_s* %list, null
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph
```
where the `%tobool.not37` is the result of the icmp that is not sunk.
Note that it is computed in the basic block leading up to what becomes the
`bltz` instruction, while the `bnez` is in a basic block of its own.

Compare this to what happens on AArch64 (where the icmp is correctly sunk):
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !6
    %cmp = icmp sgt i16 %0, -1
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    %1 = icmp eq %struct.list_head_s* %list, null
    br i1 %1, label %return, label %land.rhs11.lr.ph
```

This is caused by sinkCmpExpression() being skipped if multiple
condition registers are supported.

Given that the check for multiple condition registers affects only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:
 * we no longer signal multiple condition registers (thus changing
   the behaviour of sinkCmpExpression() back to sinking the icmp)
 * we override shouldNormalizeToSelectSequence() to always select
   the preferred normalisation strategy for our backend

With both changes, the test results remain unchanged.  Note that without
the target-specific override to shouldNormalizeToSelectSequence(), there
is worse code (more branches) generated for select-and.ll and select-or.ll.

The original test case changes as expected:
```
  core_list_find:
	lh	a2, 2(a1)
	bltz	a2, .LBB0_5
	beqz	a0, .LBB0_9    <<
        [...]
	j	.LBB0_9
.LBB0_5:
	beqz	a0, .LBB0_9    <<
	lh	a1, 0(a1)
        [...]
```

Differential Revision: https://reviews.llvm.org/D98932
2021-11-19 08:32:59 -08:00
Craig Topper
391b0ba603 [RISCV] Override TargetLowering::hasAndNot for Zbb.
Differential Revision: https://reviews.llvm.org/D113937
2021-11-15 18:44:07 -08:00
Zakk Chen
0649dfebba [RISCV] Rename some assembler mnemonics and intrinsic functions for RVV 1.0.
Rename the vpopc/vmandnot/vmornot assembler mnemonics to vcpop/vmandn/vmorn.

Reviewed By: frasercrmck, jrtc27, craig.topper

Differential Revision: https://reviews.llvm.org/D111062
2021-11-04 10:08:01 -07:00
Fraser Cormack
e7c879a69d [RISCV][VP] Add support for VP_REDUCE_* operations
This patch adds codegen support for lowering the vector-predicated
reduction intrinsics to RVV instructions. The process is similar to that
of the other reduction intrinsics, save for the fact that every VP
reduction has a start value. We reuse the existing custom "VL" nodes,
adding extra patterns where required to handle non-true masks.

To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been
given an explicit "merge" operand. This is to facilitate the VP
reductions, where we must be careful to ensure that even if no operation
is performed (when VL=0) we still produce the start value. The RVV
reductions don't update the destination register under these conditions,
so we tie the splatted start value to the output register.
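
A scalar model of the semantics being preserved (an illustrative sketch):
```
#include <cassert>

// every VP reduction carries a start value; with evl == 0 no lanes are
// touched and the result must still be the start value, which is why the
// splatted start value is tied to the destination register
int vp_reduce_add(int start, const int *v, const bool *m, unsigned evl) {
  int acc = start;
  for (unsigned i = 0; i < evl; ++i)
    if (m[i]) acc += v[i];
  return acc;
}

int main() {
  int v[4] = {1, 2, 3, 4};
  bool m[4] = {true, true, false, true};
  assert(vp_reduce_add(10, v, m, 4) == 17); // 10 + 1 + 2 + 4
  assert(vp_reduce_add(10, v, m, 0) == 10); // VL=0 still yields the start value
}
```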

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107657
2021-09-23 11:11:05 +01:00
Craig Topper
d85e347a28 [RISCV] Add a pass to recognize VLS strided loads/stores from gather/scatter.
For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.

This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.
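
A small sketch of the address algebra the pass relies on (illustrative; names hypothetical):
```
#include <cassert>
#include <cstdint>

int main() {
  int data[64];
  int64_t start = 3, step = 4;     // recovered scalar IV start and step
  int *base = data + start;        // scalar GEP hoisted out of the loop
  int64_t stride = step;           // in elements; times sizeof(int) in bytes
  for (int64_t i = 0; i < 8; ++i)  // each gather lane address collapses to
    assert(&base[i * stride] == &data[start + i * step]); // base + i*stride
}
```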

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107790
2021-09-20 09:39:44 -07:00
Craig Topper
1b736bda3b [RISCV] Enable CGP to sink splat operands of Add/Sub/Mul/Shl/LShr/AShr
LICM may have pulled out a splat, but with .vx instructions we
can fold it into an operation.

This patch enables CGP to reverse the LICM transform and move the
splat back into the loop.

I've started with the commutable integer operations and shifts, but we can
extend this with more operations in future patches.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109394
2021-09-10 09:04:01 -07:00
Fraser Cormack
a823bdf3ab [RISCV][VP] Custom lower VP_STORE and VP_LOAD
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.

One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108999
2021-09-07 10:53:25 +01:00
Fraser Cormack
f4dee8cb82 [RISCV][VP] Custom lower VP_SCATTER and VP_GATHER
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.

One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108987
2021-09-07 10:43:07 +01:00
Ben Shi
f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
Craig Topper
d4ee84ceee [RISCV] Support FP_TO_S/UINT_SAT for i32 and i64.
The fcvt fp-to-integer instructions saturate if their input is
infinity or out of range, but the instructions produce the maximum
integer for nan instead of the 0 required by the ISD opcodes.

This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.

We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.
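
An illustrative scalar model of the fixup (an editorial sketch; the hardware model follows the RISC-V rule that fcvt returns the most-positive integer for nan):
```
#include <cassert>
#include <cmath>
#include <cstdint>

// model of fcvt.w.s with RTZ: saturates out-of-range inputs but returns
// the most-positive integer for nan
int32_t fcvt_w_s_rtz(float x) {
  if (std::isnan(x) || x >= 2147483648.0f) return INT32_MAX;
  if (x < -2147483648.0f) return INT32_MIN;
  return (int32_t)x;                // truncate toward zero
}

// ISD::FP_TO_SINT_SAT wants 0 for nan, so fix that case up afterwards
int32_t fptosi_sat_i32(float x) {
  int32_t r = fcvt_w_s_rtz(x);
  return (x == x) ? r : 0;          // x == x is false only for nan (feq)
}

int main() {
  assert(fptosi_sat_i32(1e10f) == INT32_MAX);
  assert(fptosi_sat_i32(-1e10f) == INT32_MIN);
  assert(fptosi_sat_i32(NAN) == 0);
  assert(fptosi_sat_i32(-3.7f) == -3);
}
```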

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107230
2021-08-07 16:06:00 -07:00
Craig Topper
593059b328 [RISCV] Rename RISCVISD::FCVT_W_RV64 to FCVT_W_RTZ_RV64. NFC
fcvt.w(u) supports multiple rounding modes, but the ISD node
doesn't encode that. So rename it to match the rounding mode it uses.
2021-07-31 11:14:59 -07:00
Fraser Cormack
172487fe4c [RISCV] Add support for vector saturating add/sub operations
This patch adds support for lowering the saturating vector add/sub
intrinsics to RVV instructions, for both fixed-length and
scalable-vector forms alike.

Note that some of the DAG combines are still not triggering for the
scalable-vector tests. These require a bit more work in the DAGCombiner
itself.
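
For reference, scalar models of the saturating semantics (an illustrative sketch; vsaddu/vsadd are the corresponding RVV instructions):
```
#include <cassert>
#include <cstdint>

uint8_t uadd_sat(uint8_t a, uint8_t b) {
  unsigned r = (unsigned)a + b;
  return r > 0xFF ? 0xFF : (uint8_t)r; // clamp to UINT8_MAX (vsaddu)
}

int8_t sadd_sat(int8_t a, int8_t b) {
  int r = (int)a + b;
  if (r > INT8_MAX) return INT8_MAX;   // clamp high (vsadd)
  if (r < INT8_MIN) return INT8_MIN;   // clamp low
  return (int8_t)r;
}

int main() {
  assert(uadd_sat(200, 100) == 255);
  assert(sadd_sat(100, 100) == 127);
  assert(sadd_sat(-100, -100) == -128);
}
```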

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106651
2021-07-27 10:04:14 +01:00
Craig Topper
c63dbd8501 [RISCV] Custom lower (i32 (fptoui/fptosi X)).
I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32)
isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if
the assertzexti32 has an additional user. If we added a one-use check,
it would just cause an fcvt.lu followed by a sext.w when only an
fcvt.wu is needed to satisfy both users.

To mitigate this I've added custom isel and new ISD opcodes for
fcvt.wu. This allows us to know it started life as a conversion
to i32 without needing to match multiple nodes. ComputeNumSignBits
has been taught that this new node produces 33 sign bits. To
prevent regressions when we need to zero extend the result of an
(i32 (fptoui X)), I've added a DAG combine to convert it to an
(i64 (fptoui X)) before type legalization. In most cases this would
happen in InstCombine, but a zero_extend can be created for function
returns or arguments.

To keep everything consistent I've added new nodes for fptosi as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106346
2021-07-24 10:50:43 -07:00
Craig Topper
2b5e53111a [RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors.
This adds a DAG combine to detect sext/zext inputs and emit a
new ISD opcode. The extends will either be removed or replaced
with narrower extends.

Isel patterns are used to match add and widening mul to vwmacc
similar to the recently added vmacc patterns.

There's still some work to be done to match vmulsu.
We should also rewrite splats that were extended as scalars and
then splatted.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D104802
2021-07-06 10:24:31 -07:00
Craig Topper
3b6dfa381e [RISCV] Protect the SHL/SRA/SRL handlers in LowerOperation against being called for an illegal i32 shift amount.
It seems it is possible for DAG combine to create a shl with an
i64 result type and an i32 shift amount. This is ok before type
legalization since the types don't need to match in SelectionDAG.
This results in type legalization calling LowerOperation to
legalize just the amount. We weren't expecting this, so we hit an
assert for not finding a fixed vector shift.

To fix this, I've added a check for the fixed vector case and
returned SDValue() to get the default type legalizer. I've
factored all shifts together and added a fixed vector specific
handler to avoid repeating similar code for each in
LowerOperation.

The particular case I found was exposed by D104581, but the bad
shift is created after that patch triggers.
2021-06-29 09:45:13 -07:00
Fraser Cormack
c75e454cb9 [RISCV] Transform unaligned RVV vector loads/stores to aligned ones
This patch adds support for loading and storing unaligned vectors via an
equivalently-sized i8 vector type, which has support in the RVV
specification for byte-aligned access.

This offers a more optimal path for handling of unaligned fixed-length
vector accesses, which are currently scalarized. It also prevents
crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store
operation.

Future work could be to investigate loading/storing via the largest
vector element type for the given alignment, in case that would be more
optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load
could be loaded as nxv4i32 instead of as nxv16i8.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104032
2021-06-14 18:12:18 +01:00
Nikita Popov
1ffa6499ea [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
Fraser Cormack
4f500c402b [RISCV] Support vector types in combination with fastcc
This patch extends the RISC-V lowering of the 'fastcc' calling
convention to vector types, both fixed-length and scalable. Without this
patch, any function passing or returning vector types by value would
throw a compiler error.

Vectors are handled in 'fastcc' much as they are in the default calling
convention, the noticeable difference being the extended set of scalar
GPR registers that can be used to pass vectors indirectly.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102505
2021-06-01 10:31:18 +01:00
Fraser Cormack
5a80dc4988 [VP][SelectionDAG] Add a target-configurable EVL operand type
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.

This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for that bit longer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D103027
2021-05-27 15:27:36 +01:00
Craig Topper
9065118b64 [RISCV] Optimize SEW=64 shifts by splat on RV32.
SEW=64 shifts only use the log2(64) bits of the shift amount. If we're
splatting a 64-bit value in 2 parts, we can avoid splatting the
upper bits and just let the low bits be sign-extended. They won't
be read anyway.

For the purposes of the SelectionDAG semantics of the generic ISD opcodes,
if hi was non-zero or bit 31 of lo was 1, the shift was already
undefined, so it should be ok to replace hi with the sign extension of lo.

In order to be able to find the split i64 value before it becomes
a stack operation, I added a new ISD opcode that will be expanded
to the stack spill in PreprocessISelDAG. This new node is conceptually
similar to BuildPairF64, but it is expanded earlier so that we can
go through regular isel to get the right VLSE opcode for the LMUL.
BuildPairF64 is expanded in a CustomInserter.
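
A quick scalar check of the underlying claim (an editorial sketch):
```
#include <cassert>
#include <cstdint>

int main() {
  // a SEW=64 shift reads only the low log2(64) = 6 bits of the amount; for
  // any defined shift (amt < 64) hi is 0 and bit 31 of lo is clear, so the
  // sign extension of lo has the same low 6 bits as the original pair
  for (uint64_t amt = 0; amt < 64; ++amt) {
    uint32_t lo = (uint32_t)amt;
    uint64_t replaced = (uint64_t)(int64_t)(int32_t)lo; // sext(lo)
    assert((replaced & 63) == (amt & 63));
  }
}
```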

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102521
2021-05-26 10:23:32 -07:00
Fraser Cormack
7a211ed110 [RISCV] Prevent store combining from infinitely looping
RVV code generation does not successfully custom-lower BUILD_VECTOR in all
cases. When it resorts to default expansion it may, on occasion, be expanded to
scalar stores through the stack. Unfortunately these stores may then be picked
up by the post-legalization DAGCombiner which merges them again. The merged
store uses a BUILD_VECTOR which is then expanded, and so on.

This patch addresses the issue by overriding the `mergeStoresAfterLegalization`
hook. A lack of granularity in this method (being passed the scalar type) means
we opt out in almost all cases when RVV fixed-length vector support is enabled.
The only exception to this rule are mask vectors, which are always either
custom-lowered or are expanded to a load from a constant pool.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102913
2021-05-24 10:19:32 +01:00
Evandro Menezes
3a64b7080d [RISCV] Move instruction information into the RISCVII namespace (NFC)
Move instruction attributes into the `RISCVII` namespace and add associated helper functions.

Differential Revision: https://reviews.llvm.org/D102268
2021-05-11 16:32:42 -05:00
Craig Topper
6660319cef [RISCV] Remove unused RISCV::VLEFF and VLEFF_MASK. NFC
Looks like these got left behind when vleff isel was moved to
RISCVISelDAGToDAG.cpp.
2021-05-06 09:41:29 -07:00
Fraser Cormack
6f17613bfb [RISCV][VP] Lower VP ISD nodes to RVV instructions
This patch supports all of the current set of VP integer binary
intrinsics by lowering them to RVV instructions. It does so by using
the existing RISCVISD *_VL custom nodes as an intermediate layer. Both
scalable and fixed-length vectors are supported by using this method.

One notable change to the existing vector codegen strategy is that
scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to
RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length
BUILD_VECTOR counterparts. This allows them to reuse the existing
"all-ones" VL patterns.

To reduce the size of the phabricator diff, some tests are intentionally
left out and will be added later if the patch is accepted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101826
2021-05-05 12:32:24 +01:00