1414 Commits

Author SHA1 Message Date
Philip Reames
e9311f9c5a [RISCV] Separate single source and dual source lowering code [nfc]
The two single source cases aren't effected by the swap or select matching
as those are dual operand specific.  Similarly, a two source shuffle can't
be a rotate.

We can extend this idea for some of the shuffle types above, but some of
them are validly either single or dual source.  We don't want to loose that
and the code complexity of versioning early and having to repeat some shuffle
kinds doesn't (currently) seem worth it.
2024-01-24 09:16:50 -08:00
Philip Reames
fd817249f4 [RISCV] Sink code into using branch in shuffle lowering [nfc]
Follow up to 396b6bbc, sink code into consuming branch, and fix one
comment I realized used the misleading wording.  (Permute is a specific
sub-type of single source shuffle.)
2024-01-24 08:52:07 -08:00
Philip Reames
396b6bbc5e
[RISCV] Recurse on second operand of two operand shuffles (#79197)
This builds on bdc41106ee48dce59c500c9a3957af947f30c8c3.

This change completes the migration to a recursive shuffle lowering
strategy where when we encounter an unknown two argument shuffle, we
lower each operand as a single source permute, and then use a vselect
(i.e. a vmerge) to combine the results. This relies for code quality on
the post-isel combine which will aggressively fold that vmerge back into
the materialization of the second operand if possible.

Note: The change includes only the most immediately obvious of the
stylistic cleanup. There's a bunch of code movement that this enables
that I'll do as a separate patch as rolling it into this creates an
unreadable diff.
2024-01-24 08:29:28 -08:00
Brandon Wu
33d804c6c2
[RISCV] Allow VCIX with SE to reorder (#77049)
This patch allows VCIX instructions that have side effect to be
reordered
with memory and other side effecting instructions. However we don't want
VCIX instructions to be reordered with each other, so we propose a dummy
register called VCIX_STATE and make these instructions implicitly define
and use
it.
2024-01-24 11:30:12 +08:00
Paul Kirth
03a61d34eb
[RISCV] Support TLSDESC in the RISC-V backend (#66915)
This patch adds basic TLSDESC support in the RISC-V backend.

Specifically, we add new relocation types for TLSDESC, as prescribed in 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a
new pseudo instruction to simplify code generation.

This patch does not try to optimize the local dynamic case, which can be
improved in separate patches. 

Linker side changes will also be handled separately.

The current implementation is only enabled when passing the new
`-enable-tlsdesc` codegen flag.
2024-01-23 16:16:07 -08:00
Philip Reames
bdc41106ee
[RISCV] Recurse on first operand of two operand shuffles (#79180)
This is the first step towards an alternate shuffle lowering design for
the general two vector argument case. The goal is to leverage the
existing lowering for single vector permutes to avoid as many of the
vrgathers as required - even if we do need the other.

This patch handles only the first argument, and is arguably a slightly
weird half-step. However, the test changes from the full two argument
recurse patch are a lot harder to reason about. Taking this half step
gives much more easily reviewable changes, and is thus worthwhile. I
intend to post the patch for the second argument once this has landed.
2024-01-23 10:49:55 -08:00
Philip Reames
bb8a8770e2
[RISCV] Exploit register boundaries when lowering shuffle with exact vlen (#79072)
If we have a shuffle which is larger than m1, we may be able to split it
into a series of individual m1 shuffles. This patch starts with the
subcase where the mask allows a 1-to-1 mapping from source register to
destination register - each with a possible permutation of their own. We
can potentially extend this later, thought in practice this seems to
already catch a number of the most interesting cases.
2024-01-23 10:36:22 -08:00
Philip Reames
a0f69be262 [RISCV] Continue with early return for shuffle lowering [nfc]
Move two cases where we're not actually going to use any of our computed index vectors or mask values above the computation of the same.
2024-01-23 09:32:04 -08:00
Philip Reames
51f9e982ed [RISCV] Use early return for select shuffle lowering [nfc]
Minor rework of the fallback case for two argument shuffles in lowerVECTOR_SHUFFLE.  We had some common code which wasn't actually common, and simplified significantly once specialized for whether we had a select or not.
2024-01-23 09:20:52 -08:00
Simeon K
58cfd56356
[VP][RISCV] Introduce llvm.vp.minimum/maximum intrinsics (#74840)
Although there are predicated versions of minnum/maxnum, the ones for
minimum/maximum are currently missing. This patch introduces these
intrinsics and implements their lowering to RISC-V.
2024-01-22 16:46:39 -08:00
Craig Topper
672fb5892e [RISCV] Remove extra semicolons. NFC 2024-01-22 14:30:34 -08:00
Craig Topper
0ad83bc26c
[RISCV] Don't look through EXTRACT_ELEMENT in lowerScalarInsert if the element types are different. (#78668)
If the element type of the vector we're extracting from doesn't match the type we're
inserting into, we can't directly insert or extract the subvector.
2024-01-18 22:35:24 -08:00
Chia
ba81477e9c
Recommit "[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension." (#76785)
This patch was originally introduced in PR #72340, but was reverted due
to a bug on invalid extension combine.

Specifically, we resolve the case in the
https://github.com/llvm/llvm-project/pull/72340#issuecomment-1874810998

```
define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i2> %y) {     
  %a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32>                           
  %b = zext <vscale x 1 x i1> %y to <vscale x 1 x i32>                           
  %c = add <vscale x 1 x i32> %a, %b                                             
  ret <vscale x 1 x i32> %c                                                      
}
```
The previous patch didn't check if the semantic of `ISD::ZERO_EXTEND`
and `ISD::ZERO_EXTEND` is equivalent to the `vsext.vf2` or `vzext.vf2`
(not ensuring the SEW condition on widening Vector Arithmetic
Instructions).

Thanks for @topperc pointing out this bug.

## The original description 
This PR mainly aims at resolving the below missed-optimization case,
while it could also be considered as an extension of the previous patch
https://reviews.llvm.org/D133739?id=

### Missed-Optimization Case
Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh

### Source Code: 
```
define <vscale x 2 x i16> @multiple_users(ptr  %x, ptr  %y, ptr %z) {
  %a = load <vscale x 2 x i8>, ptr %x
  %b = load <vscale x 2 x i8>, ptr %y
  %b2 = load <vscale x 2 x i8>, ptr %z
  %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16>
  %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16>
  %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16>
  %e = mul <vscale x 2 x i16> %c, %d
  %f = add <vscale x 2 x i16> %c, %d2
  %g = sub <vscale x 2 x i16> %c, %d2
  %h = or <vscale x 2 x i16> %e, %f
  %i = or <vscale x 2 x i16> %h, %g
  ret <vscale x 2 x i16> %i
}
```
### Before This Patch
```
# %bb.0:
        vsetvli a3, zero, e16, mf2, ta, ma
        vle8.v  v8, (a0)
        vle8.v  v9, (a1)
        vle8.v  v10, (a2)
        svf2       v11, v8
        vsext.vf2       v8, v9
        vsext.vf2       v9, v10
        vmul.vv v8, v11, v8
        vadd.vv v10, v11, v9
        vsub.vv v9, v11, v9
        vor.vv  v8, v8, v10
        vor.vv  v8, v8, v9
        ret
```
###  After This Patch 
```
# %bb.0:
	vsetvli	a3, zero, e8, mf4, ta, ma
	vle8.v	v8, (a0)
	vle8.v	v9, (a1)
	vle8.v	v10, (a2)
	vwmul.vv	v11, v8, v9
	vwadd.vv	v9, v8, v10
	vwsub.vv	v12, v8, v10
	vsetvli	zero, zero, e16, mf2, ta, ma
	vor.vv	v8, v11, v9
	vor.vv	v8, v8, v12
	ret
```
We can see Add/Sub/Mul are combined with the Sign Extension.

### Relation to the Patch D133739
The patch D133739 introduced an optimization for folding `ADD_VL`/
`SUB_VL` / `MUL_V` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did
not consider the case of non-fixed length vector case, thus this PR
could also be considered as an extension for the D133739.
2024-01-17 18:30:27 -08:00
Philip Reames
de423cfe3d
[RISCV] Prefer vsetivli for VLMAX when VLEN is exactly known (#75509)
If VLEN is exactly known, we may be able to use the vsetivli encoding
instead of the vsetvli a0, zero, <vtype> encoding. This slightly reduces
register pressure.

This builds on 632f1c5, but reverses course a bit. It turns out to be
quite complicated to canonicalize from VLMAX to immediate early because
the sentinel value is widely used in tablegen patterns without knowledge
of LMUL. Instead, we canonicalize towards the VLMAX representation, and
then pick the immediate form during insertion since we have the LMUL
information there.

Within InsertVSETVLI, this could reasonable fit in a couple places. If
reviewers want me to e.g. move it to emission, let me know. Doing so may
require a bit of extra code to e.g. handle comparisons of the two forms,
but shouldn't be too complicated.
2024-01-17 12:40:00 -08:00
Wang Pengcheng
3ac9fe69f7
[RISCV] CodeGen of RVE and ilp32e/lp64e ABIs (#76777)
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.

The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16(x0-x16).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.

`RVE` can be combined with all current standard extensions.

The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A Stack Alignment of 32bits (rather than 128bits).
* ilp32e isn't compatible with D ISA extension.

If `ilp32e` or `lp64` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.

To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment\
to 4-bytes for types with length of 2*XLEN.

FastCC is also supported on RVE, while GHC isn't since there is only one
avaiable register.

Differential Revision: https://reviews.llvm.org/D70401
2024-01-16 20:44:30 +08:00
Luke Lau
286a366d05
[RISCV] Remove vmv.s.x and vmv.x.s lmul pseudo variants (#71501)
vmv.s.x and vmv.x.s ignore LMUL, so we can replace the PseudoVMV_S_X_MX
and
PseudoVMV_X_S_MX with just one pseudo each. These pseudos use the VR
register
class (just like the actual instruction), so we now only have TableGen
patterns for vectors of LMUL <= 1.
We now rely on the existing combines that shrink LMUL down to 1 for
vmv_s_x_vl (and vfmv_s_f_vl). We could look into removing these combines
later and just inserting the nodes with the correct type in a later
patch.

The test diff is due to the fact that a PseudoVMV_S_X/PsuedoVMV_X_S no
longer
carries any information about LMUL, so if it's the only vector pseudo
instruction in a block then it now defaults to LMUL=1.
2024-01-16 13:36:24 +07:00
Luke Lau
c07a1fe7b4
[RISCV] Lower vfmv.s.f intrinsics to VFMV_S_F_VL first (#76699)
Currently vfmv.s.f intrinsics are directly selected to their pseudos via
a
tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move
instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their
corresponding VL SDNode, then get selected from a pattern in
RISCVInstrInfoVVLPatterns.td

This patch brings vfmv.s.f inline with the other move instructions.

Split out from #71501, where we did this to preserve the behaviour of
selecting
vmv_s_x for VFMV_S_F_VL for small enough immediates.
2024-01-15 12:07:29 +07:00
Craig Topper
3378514a4d
[RISCV] Use any_extend for type legalizing atomic_compare_swap with Zacas. (#77669)
With Zacas we will use amocas.w which doesn't require the input to be
sign extended.
2024-01-10 12:41:11 -08:00
jiahanxie353
e42a70afab [RISCV][GISel] IRTranslate and Legalize some instructions with scalable vector type
* Add IRTranslate tests for ADD, SUB, AND, OR, and XOR with scalable
  vector types to show that they work as expected.
* Legalize G_ADD, G_SUB, G_AND, G_OR, and G_XOR of scalable vector
  type for the RISC-V vector extension.
2024-01-09 21:51:30 -07:00
Chia
a79d13f12a
[RISCV][ISel] Use vaaddu with rounding mode rnu for ISD::AVGCEILU. (#77473)
Similar to #76550, but for `ISD::AVGCEILU`.
Specifically, this patch aims to use `vaaddu` with rounding mode rnu
(i.e `vxrm[1:0] = 0b00`) for `ISD::AVGCEILU`.

### Source code 
```
define <vscale x 8 x i8> @vaaddu_vv_nxv8i8_ceil(<vscale x 8 x i8> %x, <vscale x 8 x i8> %y) {
  %xzv = zext <vscale x 8 x i8> %x to <vscale x 8 x i16>
  %yzv = zext <vscale x 8 x i8> %y to <vscale x 8 x i16>
  %add = add nuw nsw <vscale x 8 x i16> %xzv, %yzv
  %one = insertelement <vscale x 8 x i16> poison, i16 1, i32 0
  %splat = shufflevector <vscale x 8 x i16> %one, <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer
  %add1 = add nuw nsw <vscale x 8 x i16> %add, %splat
  %div = lshr <vscale x 8 x i16> %add1, %splat
  %ret = trunc <vscale x 8 x i16> %div to <vscale x 8 x i8>
  ret <vscale x 8 x i8> %ret
}
```

### Before this patch 
```
vaaddu_vv_nxv8i8_ceil:
        vsetvli a0, zero, e8, m1, ta, ma
        vwaddu.vv       v10, v8, v9
        vsetvli zero, zero, e16, m2, ta, ma
        vadd.vi v10, v10, 1
        vsetvli zero, zero, e8, m1, ta, ma
        vnsrl.wi        v8, v10, 1
        ret
```
### After this patch 
```
vaaddu_vv_nxv8i8_ceil:
        vsetvli a0, zero, e8, m1, ta, ma
        csrwi vxrm, 0
        vaaddu.vv v8, v8, v9
        ret
```
2024-01-10 12:08:16 +09:00
Craig Topper
c9da4dc77f
[RISCV] Refactor GPRF64 register class to make it usable for Zacas. (#77408)
-Rename to GPRPair.
-Rename registers to be named like X10_X11 instead of X10_PD. Except X0
 which is now X0_Pair since it is not paired with X1.
-Use unknown size and offset for the subreg indices. This might
 be a functional change, but does not affect any lit tests.
2024-01-09 09:21:27 -08:00
Alex Bradbury
2d54ec36f7
[SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (#77455)
This is the logical equivalent for #76710 for APInt and uses the same
naming scheme.

Converted existing users through:
`git grep -l "cast<ConstantSDNode>\(.*\).*getAPIntValueValue" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getAPIntValue/\1->getAsAPIntVal/'`
2024-01-09 14:27:07 +00:00
Alex Bradbury
197214e39b
[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710)
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZextVal();`
    
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
2024-01-09 12:25:17 +00:00
Chia
0c24c175f2
[RISCV][ISel] Use vaaddu with rounding mode rdn for ISD::AVGFLOORU. (#76550)
This patch aims to use `vaaddu` with rounding mode rdn (i.e `vxrm[1:0] =
0b10`) for `ISD::AVGFLOORU`.

### Source code 
```
define <8 x i8> @vaaddu_auto(ptr %x, ptr %y, ptr %z) {
  %xv = load <8 x i8>, ptr %x, align 2
  %yv = load <8 x i8>, ptr %y, align 2
  %xzv = zext <8 x i8> %xv to <8 x i16>
  %yzv = zext <8 x i8> %yv to <8 x i16>
  %add = add nuw nsw <8 x i16> %xzv, %yzv
  %div = lshr <8 x i16> %add, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %ret = trunc <8 x i16> %div to <8 x i8>
  ret <8 x i8> %ret 
}
```

### Before this patch 
```
vaaddu_auto: 
        vsetivli        zero, 8, e8, mf2, ta, ma
        vle8.v  v8, (a0)
        vle8.v  v9, (a1)
        vwaddu.vv       v10, v8, v9
        vnsrl.wi        v8, v10, 1
        ret
```
### After this patch 
```
vaaddu_auto: 
	vsetivli	zero, 8, e8, mf2, ta, ma
	vle8.v	v8, (a0)
	vle8.v	v9, (a1)
	csrwi	vxrm, 2
	vaaddu.vv	v8, v8, v9
	ret
```

### Note on signed averaging addition

Based on the rvv spec, there is also a variant for signed averaging
addition called `vaadd`.
But AFAIU, no matter in which rounding mode, we cannot achieve the
semantic of signed averaging addition through `vaadd`.
Thus this patch only introduces `vaaddu`.
2024-01-09 15:17:38 +09:00
Craig Topper
a8e9dceb49
[RISCV] Use getELen() instead of hardcoded 64 in lowerBUILD_VECTOR. (#77355)
This is needed to properly support Zve32x.
2024-01-08 19:36:15 -08:00
Craig Topper
faa326de97
[RISCV] Add branch+c.mv macrofusion for sifive-p450. (#76169)
sifive-p450 supports a very restricted version of the short forward
branch optimization from the sifive-7-series.

For sifive-p450, a branch over a single c.mv can be macrofused as a
conditional move operation. Due to encoding restrictions on c.mv, we
can't conditionally move from X0. That would require c.li instead.
2024-01-08 15:23:26 -08:00
Fangrui Song
360996ac5a
[RISCV] Merge machine operand flag MO_PLT into MO_CALL (#77253)
Since #72467, `@plt` in assembly output "call foo@plt" is omitted. We
can trivially merge MO_PLT and MO_CALL without any functional change to
assembly/relocatable file output.

Earlier architectures use different call relocation types whether a PLT
is potentially needed: R_386_PLT32/R_386_PC32, R_68K_PLT32/R_68K_PC32,
R_SPARC_WDISP30/R_SPARC_WPLT320. However, as the PLT property is
per-symbol instead of per-call-site and linkers can optimize out a PLT,
the distinction has been confusing.

Arm made good names R_ARM_CALL/R_AARCH64_CALL. Let's use MO_CALL instead
of MO_PLT.

As follow-ups, we can merge fixup_riscv_call/fixup_riscv_call_plt and
VK_RISCV_CALL/VK_RISCV_CALL_PLT.
2024-01-07 12:43:39 -08:00
Craig Topper
a960703466
[RISCV] Remove incomplete PRE_DEC/POST_DEC code for XTHeadMemIdx. (#76922)
As far as I can tell if getIndexedAddressParts received an ISD::SUB, the
constant would be negated. So `IsInc` should be set to true since the
SUB was effectively converted to ADD. This means we should never use
PRE_DEC/POST_DEC.

No tests are affected because DAGCombine aggressively turns SUB with
constant into ADD so no lit test has a SUB reach getIndexedAddressParts.
2024-01-04 09:48:40 -08:00
Shih-Po Hung
475890cd2e
[RISCV][CostModel] Add getRISCVInstructionCost() to TTI for CostKind (#76793)
Instruction cost for CodeSize and Latency/RecipThroughput can be very
different. Considering the diversity of CostKind and vendor-specific
cost, and how they are spread across various TTI functions, it's
becoming quite a challenge to handle. This patch adds an interface
getRISCVInstructionCost to address it.
2024-01-04 21:04:36 +08:00
Craig Topper
80889ae029
[RISCV] Remove RISCVISD::VSELECT_VL. (#76866)
We can use RISCVISD::VMERGE_VL with an undef passthru operand.

I had to rewrite the FMA patterns to handle both undef and non-undef
cases so we can get the tail policy.
2024-01-03 21:31:07 -08:00
Craig Topper
4e347b4e38 Revert "[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension (#72340)"
This reverts most of commit 5b155aea0e529b7b5c807e189fef6ea5cd5faec9.
I have left the new test file, but regenerated the checks.

This causes failures in our downstream testing. The input types
to the extends need to be checked so we don't create RISCVISD::VZEXT_VL
with illegal or unsupported input type.
2024-01-02 19:49:42 -08:00
Alex Bradbury
80aeb62211
[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708)
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.

Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
2024-01-02 13:14:28 +00:00
Chia
5b155aea0e
[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension (#72340)
This PR mainly aims at resolving the below missed-optimization case,
while it could also be considered as an extension of the previous patch
https://reviews.llvm.org/D133739?id=

## Missed-Optimization Case
Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh
### Source Code: 
```
define <vscale x 2 x i16> @multiple_users(ptr  %x, ptr  %y, ptr %z) {
  %a = load <vscale x 2 x i8>, ptr %x
  %b = load <vscale x 2 x i8>, ptr %y
  %b2 = load <vscale x 2 x i8>, ptr %z
  %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16>
  %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16>
  %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16>
  %e = mul <vscale x 2 x i16> %c, %d
  %f = add <vscale x 2 x i16> %c, %d2
  %g = sub <vscale x 2 x i16> %c, %d2
  %h = or <vscale x 2 x i16> %e, %f
  %i = or <vscale x 2 x i16> %h, %g
  ret <vscale x 2 x i16> %i
}
```
### Before This Patch
```
# %bb.0:
        vsetvli a3, zero, e16, mf2, ta, ma
        vle8.v  v8, (a0)
        vle8.v  v9, (a1)
        vle8.v  v10, (a2)
        svf2       v11, v8
        vsext.vf2       v8, v9
        vsext.vf2       v9, v10
        vmul.vv v8, v11, v8
        vadd.vv v10, v11, v9
        vsub.vv v9, v11, v9
        vor.vv  v8, v8, v10
        vor.vv  v8, v8, v9
        ret
```
###  After This Patch 
```
# %bb.0:
	vsetvli	a3, zero, e8, mf4, ta, ma
	vle8.v	v8, (a0)
	vle8.v	v9, (a1)
	vle8.v	v10, (a2)
	vwmul.vv	v11, v8, v9
	vwadd.vv	v9, v8, v10
	vwsub.vv	v12, v8, v10
	vsetvli	zero, zero, e16, mf2, ta, ma
	vor.vv	v8, v11, v9
	vor.vv	v8, v8, v12
	ret
```
We can see Add/Sub/Mul are combined with the Sign Extension.

## Relation to the Patch D133739
The patch D133739 introduced an optimization for folding `ADD_VL`/
`SUB_VL` / `MUL_V` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did
not consider the case of non-fixed length vector case, thus this PR
could also be considered as an extension for the D133739.

Furthermore, in the current `SelectionDAG`, we represent scalable vector
add (or any binary operator) as a normal `ADD` operation. It might be
better to use an Opcode like `ADD_VL`, which needs further conversation
and decision.
2023-12-29 14:36:38 +08:00
Vitaly Buka
9c39d9bb49
Revert "[RISCV][CostModel] Add getRISCVInstructionCost() to TTI for Cost… (#73651)" (#76536)
Fails on bots https://lab.llvm.org/buildbot/#/builders/5/builds/39629

Issue #76535

This reverts commit 3e75dece919511e4a2edada82d783304cc14a9cd.
2023-12-28 13:30:56 -08:00
Shih-Po Hung
3e75dece91
[RISCV][CostModel] Add getRISCVInstructionCost() to TTI for Cost… (#73651)
…Kind

Instruction cost for CodeSize and Latency/RecipThroughput can be very
different. Considering the diversity of CostKind and vendor-specific
cost, and how they are spread across various TTI functions, it's
becoming quite a challenge to handle. This patch adds an interface
getRISCVInstructionCost to address it.
2023-12-28 14:36:01 +08:00
Yeting Kuo
af837d44c7
[RISCV][DAG] Teach computeKnownBits consider SEW/LMUL/AVL for vsetvli. (#76158)
This patch also add tests whose masks are too narrow to combine. I think
it can help us to find out bugs caused by too large known bits.
2023-12-25 11:18:22 +08:00
Craig Topper
e64f5d6305
[RISCV] Replace RISCVISD::VP_MERGE_VL with a new node that has a separate passthru operand. (#75682)
ISD::VP_MERGE treats the false operand as the source for elements past
VL. The vmerge instruction encodes 3 registers and treats the vd
register as the source for the tail.

This patch adds a new ISD opcode that models the tail source explicitly.
During lowering we copy the false operand to this operand.

I think we can merge RISCVISD::VSELECT_VL with this new opcode by using
an UNDEF passthru, but I'll save that for another patch.
2023-12-21 14:34:49 -08:00
Craig Topper
0dcff0db3a
[RISCV] Add codegen support for experimental.vp.splice (#74688)
IR intrinsics were already defined, but no codegen support had been
added.

I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
2023-12-21 08:38:32 -08:00
Yeting Kuo
9b561ca044
[RISCV] Make performFP_TO_INTCombine fold with ISD::FRINT. (#76020)
Fold (fp_to_int (frint X)) to (fcvt X) without rounding mode.
2023-12-21 15:03:36 +08:00
Yeting Kuo
b7376c3196
[RISCV][NFC] Add comments and tests for frint case of performFP_TO_INT_SATCombine. (#76014)
performFP_TO_INT_SATCombine could also serve pattern (fp_to_int_sat
(frint X)).
2023-12-20 14:56:28 +08:00
Yeting Kuo
cdc0392669
[RISCV] Update implies for subtarget feature. (#75824)
PR #75576 and #75735 update some implies in
llvm/lib/Support/RISCVISAInfo.cpp, but both of them miss the subtarget
feature part.
This patch still preserve predicate HasStdExtZfhOrZfhmin and
HasStdExtZhinxOrZhinxmin, since they could make error message more
readable. ( Users might not know that zfh implies zfhmin.)
2023-12-19 09:47:46 +08:00
Jie Fu
b6cce87110 [RISCV] Fix -Wbraced-scalar-init in RISCVISelLowering.cpp (NFC)
llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:339:24:
 error: braces around scalar initializer [-Werror,-Wbraced-scalar-init]
  339 |     setOperationAction({ISD::ROTL}, XLenVT, Expand);
      |                        ^~~~~~~~~~~
1 error generated.
2023-12-17 19:59:42 +08:00
melonedo
3eaed9e6f5
[RISCV] Implement intrinsics for XCVbitmanip extension in CV32E40P (#74993)
Implement XCVbitmanip intrinsics for CV32E40P according to the
specification.

This commit is part of a patch-set to upstream the vendor specific
extensions of CV32E40P that need LLVM intrinsics to implement Clang
builtins.

Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill,
@NandniJamnadas, @PaoloS02, @simonpcook, @xingmingjie.

Spec:
05481cf0ef/specifications/corev-builtin-spec.md (listing-of-pulp-bit-manipulation-builtins-xcvbitmanip).

Previously reviewed on Phabricator: https://reviews.llvm.org/D157510.
Parallel GCC patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635795.html.

Co-authored-by: melonedo <funanzeng@gmail.com>
2023-12-17 19:29:40 +08:00
Philip Reames
e8a15eca92
[RISCV] Prefer whole register loads and stores when VL=VLMAX (#75531)
If we're lowering a fixed length vector load or store which happens to
exactly VLEN in size (when VLEN is exactly known), we can use a whole
register load or store instead of the unit strided variants. This
doesn't require a vsetvli in some cases, allows additional flexibility
of vsetvli cases in others, and doesn't have a runtime dependency on the
value of VL.
2023-12-15 09:26:57 -08:00
Philip Reames
632f1c5d18
[RISCV] When VLEN is exactly known, prefer VLMAX encoding for vsetvli (#75412)
If we know the exact VLEN, then we can tell if the AVL for particular
operation is equivalent to the vsetvli xN, zero, <vtype> encoding. Using
this encoding is better than having to materialize an immediate in a
register, but worse than being able to use the vsetivli zero, imm,
<type> encoding.
2023-12-13 17:51:03 -08:00
Philip Reames
12af9c8337 [RISCV] Extract a utility for computing bounds on VLMAX [nfc]
Simplifying an upcoming change...
2023-12-13 13:40:18 -08:00
Craig Topper
2c185709bc
[RISCV] Remove setJumpIsExpensive(). (#74647)
Middle end up optimizations can speculate away the short circuit
behavior of C/C++ && and ||. Using i1 and/or or logical select
instructions and a single branch.

SelectionDAGBuilder can turn i1 and/or/select back into multiple
branches, but this is disabled when jump is expensive.

RISC-V can use slt(u)(i) to evaluate a condition into any GPR which
makes us better than other targets that use a flag register. RISC-V also
has single instruction compare and branch. So its not clear from a code
size perspective that using compare+and/or is better.

If the full condition is dependent on multiple loads, using a logic
delays the branch resolution until all the loads are resolved even if
there is a cheap condition that makes the loads unnecessary.

PowerPC and Lanai are the only CPU targets that use setJumpIsExpensive.
NVPTX and AMDGPU also use it but they are GPU targets. PowerPC appears
to have a MachineIR pass that turns AND/OR of CR bits into multiple
branches. I don't know anything about Lanai and their reason for using
setJumpIsExpensive.

I think the decision to use logic vs branches is much more nuanced than
this big hammer. So I propose to make RISC-V match other CPU targets.

Anyone who wants the old behavior can still pass -mllvm
-jump-is-expensive=true.
2023-12-13 09:37:25 -08:00
Craig Topper
8227072f5a [RISCV] Add missing break to last case in switch. NFC 2023-12-12 13:52:52 -08:00
Craig Topper
3c5b42acd3
[RISCV] Allocate the varargs GPR save area as a single object. (#74354)
Previously we allocated one object for each GPR. We also allocated the
same offset twice, once to save for VASTART and then again for the first
register in the save loop.

This patch uses a single object for all the registers and shares this
with VASTART. This is more consistent with other targets like AArch64
and ARM.

I've removed the setValue(nullptr) from the memory operand now. Having a
single object makes me a lot more comfortable about alias analysis being
able to see what is going on. This led to the scheduling changes in
push-pop-popret.ll and vararg.ll.
2023-12-05 10:30:01 -08:00
Craig Topper
b73d79fda8 [RISCV] Fix typo in comment. NFC
This should say "Assume that VL output is <= 65536".
2023-12-04 14:15:49 -08:00