3409 Commits

Author SHA1 Message Date
Michael Maitland
3967510032
[RISCV][GISel] First mask argument placed in v0 according to RISCV Ve… (#79343)
…ctor CC.
2024-01-24 16:03:38 -05:00
Philip Reames
396b6bbc5e
[RISCV] Recurse on second operand of two operand shuffles (#79197)
This builds on bdc41106ee48dce59c500c9a3957af947f30c8c3.

This change completes the migration to a recursive shuffle lowering
strategy where when we encounter an unknown two argument shuffle, we
lower each operand as a single source permute, and then use a vselect
(i.e. a vmerge) to combine the results. This relies for code quality on
the post-isel combine which will aggressively fold that vmerge back into
the materialization of the second operand if possible.

Note: The change includes only the most immediately obvious of the
stylistic cleanup. There's a bunch of code movement that this enables
that I'll do as a separate patch as rolling it into this creates an
unreadable diff.
2024-01-24 08:29:28 -08:00
Brandon Wu
33d804c6c2
[RISCV] Allow VCIX with SE to reorder (#77049)
This patch allows VCIX instructions that have side effect to be
reordered
with memory and other side effecting instructions. However we don't want
VCIX instructions to be reordered with each other, so we propose a dummy
register called VCIX_STATE and make these instructions implicitly define
and use
it.
2024-01-24 11:30:12 +08:00
Paul Kirth
03a61d34eb
[RISCV] Support TLSDESC in the RISC-V backend (#66915)
This patch adds basic TLSDESC support in the RISC-V backend.

Specifically, we add new relocation types for TLSDESC, as prescribed in 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a
new pseudo instruction to simplify code generation.

This patch does not try to optimize the local dynamic case, which can be
improved in separate patches. 

Linker side changes will also be handled separately.

The current implementation is only enabled when passing the new
`-enable-tlsdesc` codegen flag.
2024-01-23 16:16:07 -08:00
Philip Reames
f05dd29cee [RISCV] Regenerate autogen test to remove spurious diff 2024-01-23 10:57:54 -08:00
Philip Reames
bdc41106ee
[RISCV] Recurse on first operand of two operand shuffles (#79180)
This is the first step towards an alternate shuffle lowering design for
the general two vector argument case. The goal is to leverage the
existing lowering for single vector permutes to avoid as many of the
vrgathers as required - even if we do need the other.

This patch handles only the first argument, and is arguably a slightly
weird half-step. However, the test changes from the full two argument
recurse patch are a lot harder to reason about. Taking this half step
gives much more easily reviewable changes, and is thus worthwhile. I
intend to post the patch for the second argument once this has landed.
2024-01-23 10:49:55 -08:00
Philip Reames
bb8a8770e2
[RISCV] Exploit register boundaries when lowering shuffle with exact vlen (#79072)
If we have a shuffle which is larger than m1, we may be able to split it
into a series of individual m1 shuffles. This patch starts with the
subcase where the mask allows a 1-to-1 mapping from source register to
destination register - each with a possible permutation of their own. We
can potentially extend this later, thought in practice this seems to
already catch a number of the most interesting cases.
2024-01-23 10:36:22 -08:00
Craig Topper
d360963aaa
[RISCV] Add regalloc hints for Zcb instructions. (#78949)
This hints the register allocator to use the same register for source
and destination to enable more compression.
2024-01-23 09:33:06 -08:00
Simeon K
297b77036e
[RISCV] Fix stack size computation when M extension disabled (#78602)
Ensure that getVLENFactoredAmount does not fail when the scale amount
requires the use of a non-trivial multiplication but the M extension is
not enabled. In such case, perform the multiplication using shifts and
adds.
2024-01-22 23:10:25 -08:00
Jim Lin
b8e708b9d3
[RISCV] Merge ADDI with X0 into base offset (#78940)
If offset is `addi rd, x0, imm`, merge imm into base offset.
2024-01-23 09:57:05 +08:00
Simeon K
58cfd56356
[VP][RISCV] Introduce llvm.vp.minimum/maximum intrinsics (#74840)
Although there are predicated versions of minnum/maxnum, the ones for
minimum/maximum are currently missing. This patch introduces these
intrinsics and implements their lowering to RISC-V.
2024-01-22 16:46:39 -08:00
Philip Reames
8675952583 [RISCV] Add coverage for shuffles splitable using exact VLEN
Test coverage for an upcoming transform.
2024-01-22 14:00:51 -08:00
Wang Pengcheng
5cd8d53cac
[RISCV] Teach RISCVMergeBaseOffset to handle inline asm (#78945)
For inline asm with memory operands, we can merge the offset into
the second operand of memory constraint operands.

Differential Revision: https://reviews.llvm.org/D158062
2024-01-22 17:36:32 +08:00
Craig Topper
9396891271 [RISCV] Don't look for sext in RISCVCodeGenPrepare::visitAnd.
We want to know the upper 33 bits of the And Input are zero. SExt
only guarantees they are the same.

We originally checked for SExt or ZExt when we were using
isImpliedByDomCondition because a ZExt may have been changed to SExt
before we visited the And.

We are no longer using isImpliedByDomCondition so we can only look
for zext with the nneg flag.

While here, switch to PatternMatch to simplify the code.

Fixes #78783
2024-01-19 14:44:47 -08:00
Craig Topper
66cea7143a [RISCV] Add test case for #78783. NFC 2024-01-19 14:44:47 -08:00
Craig Topper
9ae28fb9d3
[RISCV] Prevent RISCVMergeBaseOffsetOpt from calling getVRegDef on a physical register. (#78762)
Fixes #78679.
2024-01-19 12:15:08 -08:00
Min-Yih Hsu
5330daad41
[RISCV] Add support for Smepmp 1.0 (#78489)
Smepmp is a supervisor extension that prevents privileged processes from
accessing unprivileged program and data.

Spec: https://github.com/riscv/riscv-tee/blob/main/Smepmp/Smepmp.pdf
2024-01-19 11:09:35 -08:00
Craig Topper
0ad83bc26c
[RISCV] Don't look through EXTRACT_ELEMENT in lowerScalarInsert if the element types are different. (#78668)
If the element type of the vector we're extracting from doesn't match the type we're
inserting into, we can't directly insert or extract the subvector.
2024-01-18 22:35:24 -08:00
Luke Lau
8649328060
[RISCV] Add support for new unprivileged extensions defined in profiles spec (#77458)
This adds minimal support for 7 new unprivileged extensions that were
defined as a part of
the RISC-V Profiles specification here:

https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#7-new-isa-extensions

* Ziccif: Main memory supports instruction fetch with atomicity
requirement
* Ziccrse: Main memory supports forward progress on LR/SC sequences
* Ziccamoa: Main memory supports all atomics in A
* Zicclsm: Main memory supports misaligned loads/stores
* Za64rs: Reservation set size of 64 bytes
* Za128rs: Reservation set size of 128 bytes
* Zic64b: Cache block size isf 64 bytes

As stated in the specification, these extensions don't add any new
features but
describe existing features. So this patch only adds parsing and
subtarget
features.
2024-01-19 06:57:06 +07:00
Luke Lau
15b0fabb21
[RISCV] Vectorize phi for loop carried @llvm.vector.reduce.fadd (#78244)
LLVM vector reduction intrinsics return a scalar result, but on RISC-V
vector reduction instructions write the result in the first element of a
vector register. So when a reduction in a loop uses a scalar phi, we end
up with unnecessary scalar moves:

loop:
    vfmv.s.f v10, fa0
    vfredosum.vs v8, v8, v10
    vfmv.f.s fa0, v8

This mainly affects ordered fadd reductions, which has a scalar accumulator
operand.
This tries to vectorize any scalar phis that feed into a fadd reduction
in RISCVCodeGenPrepare, converting:

loop:
%phi = phi <float> [ ..., %entry ], [ %acc, %loop]
%acc = call float @llvm.vector.reduce.fadd.nxv4f32(float %phi, <vscale x 2 x float> %vec)
```

to

loop:
%phi = phi <vscale x 2 x float> [ ..., %entry ], [ %acc.vec, %loop]
%phi.scalar = extractelement <vscale x 2 x float> %phi, i64 0
%acc = call float @llvm.vector.reduce.fadd.nxv4f32(float %x, <vscale x 2 x float> %vec)
%acc.vec = insertelement <vscale x 2 x float> poison, float %acc.next, i64 0

Which eliminates the scalar -> vector -> scalar crossing during
instruction selection.
2024-01-18 16:15:20 +07:00
Chia
ba81477e9c
Recommit "[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension." (#76785)
This patch was originally introduced in PR #72340, but was reverted due
to a bug on invalid extension combine.

Specifically, we resolve the case in the
https://github.com/llvm/llvm-project/pull/72340#issuecomment-1874810998

```
define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i2> %y) {     
  %a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32>                           
  %b = zext <vscale x 1 x i1> %y to <vscale x 1 x i32>                           
  %c = add <vscale x 1 x i32> %a, %b                                             
  ret <vscale x 1 x i32> %c                                                      
}
```
The previous patch didn't check if the semantic of `ISD::ZERO_EXTEND`
and `ISD::ZERO_EXTEND` is equivalent to the `vsext.vf2` or `vzext.vf2`
(not ensuring the SEW condition on widening Vector Arithmetic
Instructions).

Thanks for @topperc pointing out this bug.

## The original description 
This PR mainly aims at resolving the below missed-optimization case,
while it could also be considered as an extension of the previous patch
https://reviews.llvm.org/D133739?id=

### Missed-Optimization Case
Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh

### Source Code: 
```
define <vscale x 2 x i16> @multiple_users(ptr  %x, ptr  %y, ptr %z) {
  %a = load <vscale x 2 x i8>, ptr %x
  %b = load <vscale x 2 x i8>, ptr %y
  %b2 = load <vscale x 2 x i8>, ptr %z
  %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16>
  %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16>
  %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16>
  %e = mul <vscale x 2 x i16> %c, %d
  %f = add <vscale x 2 x i16> %c, %d2
  %g = sub <vscale x 2 x i16> %c, %d2
  %h = or <vscale x 2 x i16> %e, %f
  %i = or <vscale x 2 x i16> %h, %g
  ret <vscale x 2 x i16> %i
}
```
### Before This Patch
```
# %bb.0:
        vsetvli a3, zero, e16, mf2, ta, ma
        vle8.v  v8, (a0)
        vle8.v  v9, (a1)
        vle8.v  v10, (a2)
        svf2       v11, v8
        vsext.vf2       v8, v9
        vsext.vf2       v9, v10
        vmul.vv v8, v11, v8
        vadd.vv v10, v11, v9
        vsub.vv v9, v11, v9
        vor.vv  v8, v8, v10
        vor.vv  v8, v8, v9
        ret
```
###  After This Patch 
```
# %bb.0:
	vsetvli	a3, zero, e8, mf4, ta, ma
	vle8.v	v8, (a0)
	vle8.v	v9, (a1)
	vle8.v	v10, (a2)
	vwmul.vv	v11, v8, v9
	vwadd.vv	v9, v8, v10
	vwsub.vv	v12, v8, v10
	vsetvli	zero, zero, e16, mf2, ta, ma
	vor.vv	v8, v11, v9
	vor.vv	v8, v8, v12
	ret
```
We can see Add/Sub/Mul are combined with the Sign Extension.

### Relation to the Patch D133739
The patch D133739 introduced an optimization for folding `ADD_VL`/
`SUB_VL` / `MUL_V` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did
not consider the case of non-fixed length vector case, thus this PR
could also be considered as an extension for the D133739.
2024-01-17 18:30:27 -08:00
Mikhail Gudim
c1f433849b
[GISel][RISCV] Implement selectShiftMask. (#77572)
Implement `selectShiftMask` in `GlobalISel`.
2024-01-17 16:25:43 -05:00
Philip Reames
de423cfe3d
[RISCV] Prefer vsetivli for VLMAX when VLEN is exactly known (#75509)
If VLEN is exactly known, we may be able to use the vsetivli encoding
instead of the vsetvli a0, zero, <vtype> encoding. This slightly reduces
register pressure.

This builds on 632f1c5, but reverses course a bit. It turns out to be
quite complicated to canonicalize from VLMAX to immediate early because
the sentinel value is widely used in tablegen patterns without knowledge
of LMUL. Instead, we canonicalize towards the VLMAX representation, and
then pick the immediate form during insertion since we have the LMUL
information there.

Within InsertVSETVLI, this could reasonable fit in a couple places. If
reviewers want me to e.g. move it to emission, let me know. Doing so may
require a bit of extra code to e.g. handle comparisons of the two forms,
but shouldn't be too complicated.
2024-01-17 12:40:00 -08:00
Simon Pilgrim
d92ce344bf Revert faecc736e2ac3cd8c77 #74443 [DAG] isSplatValue - node is a splat if all demanded elts have the same whole constant value (#74443)
Relying on ComputeKnownBits to find a splat is causing miscompilations where a shift of zero is being assumed to give zero, but further simplification leads to a shift of zero by undef, resulting in an unexpected undef value.

Fixes #78109
2024-01-17 15:59:33 +00:00
Alex Bradbury
da0755f7b7 [RISCV][test] Test showing missed optimisation for spills/fills of GPR<->FPR moves
The fmv can be removed through appropriate logic in
RISCVInstrInfo::foldMemoryOperandImpl.
2024-01-17 08:11:06 +00:00
Craig Topper
7fe5269b54
[RISCV] Bump Zfbfmin, Zvfbfmin, and Zvfbfwma to 1.0. (#78021) 2024-01-16 08:42:21 -08:00
Luke Lau
93d39657f5 [RISCV] Remove -riscv-v-vector-bits-min flag that was left behind. NFC
This should have been removed in 74f985b793bf4005e49736f8c2cef8b5cbf7c1ab
2024-01-16 21:30:32 +07:00
Wang Pengcheng
3ac9fe69f7
[RISCV] CodeGen of RVE and ilp32e/lp64e ABIs (#76777)
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.

The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16(x0-x16).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.

`RVE` can be combined with all current standard extensions.

The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A Stack Alignment of 32bits (rather than 128bits).
* ilp32e isn't compatible with D ISA extension.

If `ilp32e` or `lp64` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.

To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment\
to 4-bytes for types with length of 2*XLEN.

FastCC is also supported on RVE, while GHC isn't since there is only one
avaiable register.

Differential Revision: https://reviews.llvm.org/D70401
2024-01-16 20:44:30 +08:00
Alex Bradbury
84f7fb6217
[MachineScheduler] Add option to control reordering for store/load clustering (#75338)
Reordering based on the sort order of the MemOpInfo array was disabled
in <https://reviews.llvm.org/D72706>. However, it's not clear this is
desirable for al targets. It also makes it more difficult to compare the
incremental benefit of enabling load clustering in the selectiondag
scheduler as well was the machinescheduler, as the sdag scheduler does
seem to allow this reordering.

This patch adds a parameter that can control the behaviour on a
per-target basis.

Split out from #73789.
2024-01-16 07:17:41 +00:00
Luke Lau
286a366d05
[RISCV] Remove vmv.s.x and vmv.x.s lmul pseudo variants (#71501)
vmv.s.x and vmv.x.s ignore LMUL, so we can replace the PseudoVMV_S_X_MX
and
PseudoVMV_X_S_MX with just one pseudo each. These pseudos use the VR
register
class (just like the actual instruction), so we now only have TableGen
patterns for vectors of LMUL <= 1.
We now rely on the existing combines that shrink LMUL down to 1 for
vmv_s_x_vl (and vfmv_s_f_vl). We could look into removing these combines
later and just inserting the nodes with the correct type in a later
patch.

The test diff is due to the fact that a PseudoVMV_S_X/PsuedoVMV_X_S no
longer
carries any information about LMUL, so if it's the only vector pseudo
instruction in a block then it now defaults to LMUL=1.
2024-01-16 13:36:24 +07:00
Luke Lau
3b7abf38fb [RISCV] Add disjoint flag to or ops in RISCVGatherScatterLowering tests. NFC
InstCombine will add the disjoint flag to these or instructions. This patch
adds them to the tests so that it matches the input RISCVGatherScatterLowering
will receive in practice, allowing us to rely on said disjoint flag:
https://github.com/llvm/llvm-project/pull/77800#discussion_r1449231844
2024-01-15 14:09:27 +07:00
Luke Lau
0cf768e7f1
[RISCV] Handle disjoint or in RISCVGatherScatterLowering (#77800)
This patch adds support for the disjoint flag in the non-recursive case,
as well as adding an additional check for it in the recursive case. Note
that haveNoCommonBitsSet should be equivalent to having the disjoint
flag set, and the check can be removed in a follow-up patch.

Co-authored-by: Philip Reames <preames@rivosinc.com>

---------

Co-authored-by: Philip Reames <preames@rivosinc.com>
2024-01-15 13:37:09 +07:00
Luke Lau
c07a1fe7b4
[RISCV] Lower vfmv.s.f intrinsics to VFMV_S_F_VL first (#76699)
Currently vfmv.s.f intrinsics are directly selected to their pseudos via
a
tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move
instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their
corresponding VL SDNode, then get selected from a pattern in
RISCVInstrInfoVVLPatterns.td

This patch brings vfmv.s.f inline with the other move instructions.

Split out from #71501, where we did this to preserve the behaviour of
selecting
vmv_s_x for VFMV_S_F_VL for small enough immediates.
2024-01-15 12:07:29 +07:00
Min-Yih Hsu
2f2217a8f7
[RISCV] Add missing tests for inttoptr/ptrtoint on scalable vectors (#77857)
Add missing tests for inttoptr/ptrtoint on scalable vectors. Previously we only had inttoptr/ptrtoint tests for fixed vectors.
2024-01-12 09:52:07 -08:00
Philip Reames
5ce067d592 Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V."
This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus
re-enables terminator folding for RISCV.  The reported miscompile has
been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.
2024-01-11 13:20:02 -08:00
Luke Lau
114e6d7ba0 [RISCV] Add test for strided gather with recursive disjoint or. NFC
This already gets converted to a strided intrinsic because we currently call
haveNoCommonBitsSet when checking or instructions, but an upcoming patch will
change this logic and we want to preserve this case.

Note that this IR is in the form that comes from instcombine. The splats need
to be inline constexprs, otherwise isSplatValue() will fail. (It can't
currently handle splats where the shufflevector is an instruction, and the
insertelement is a constexpr.
2024-01-12 00:02:28 +07:00
Luke Lau
3b3ee1f534 [RISCV] Add test for strided gather with disjoint or. NFC 2024-01-11 22:08:57 +07:00
Luke Lau
e8790027b1
[RISCV] Allow vsetvlis with same register AVL in doLocalPostpass (#76801) 2024-01-11 12:12:46 +07:00
Craig Topper
3378514a4d
[RISCV] Use any_extend for type legalizing atomic_compare_swap with Zacas. (#77669)
With Zacas we will use amocas.w which doesn't require the input to be
sign extended.
2024-01-10 12:41:11 -08:00
Craig Topper
0a1b066bba
[RISCV] Support isel for Zacas for XLen and i32. (#77666)
This adds new isel patterns for Zacas that take priority over the
pseudoinstructions we use for the A extension.

Support for 2x XLen types will come in a separate patch since they need
to be done differently.
2024-01-10 12:00:40 -08:00
Craig Topper
b788692fa5 [RISCV][NFC] Remove unused CHECK prefixes to fix buildbots. NFC 2024-01-09 23:37:18 -08:00
jiahanxie353
e42a70afab [RISCV][GISel] IRTranslate and Legalize some instructions with scalable vector type
* Add IRTranslate tests for ADD, SUB, AND, OR, and XOR with scalable
  vector types to show that they work as expected.
* Legalize G_ADD, G_SUB, G_AND, G_OR, and G_XOR of scalable vector
  type for the RISC-V vector extension.
2024-01-09 21:51:30 -07:00
Chia
a79d13f12a
[RISCV][ISel] Use vaaddu with rounding mode rnu for ISD::AVGCEILU. (#77473)
Similar to #76550, but for `ISD::AVGCEILU`.
Specifically, this patch aims to use `vaaddu` with rounding mode rnu
(i.e `vxrm[1:0] = 0b00`) for `ISD::AVGCEILU`.

### Source code 
```
define <vscale x 8 x i8> @vaaddu_vv_nxv8i8_ceil(<vscale x 8 x i8> %x, <vscale x 8 x i8> %y) {
  %xzv = zext <vscale x 8 x i8> %x to <vscale x 8 x i16>
  %yzv = zext <vscale x 8 x i8> %y to <vscale x 8 x i16>
  %add = add nuw nsw <vscale x 8 x i16> %xzv, %yzv
  %one = insertelement <vscale x 8 x i16> poison, i16 1, i32 0
  %splat = shufflevector <vscale x 8 x i16> %one, <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer
  %add1 = add nuw nsw <vscale x 8 x i16> %add, %splat
  %div = lshr <vscale x 8 x i16> %add1, %splat
  %ret = trunc <vscale x 8 x i16> %div to <vscale x 8 x i8>
  ret <vscale x 8 x i8> %ret
}
```

### Before this patch 
```
vaaddu_vv_nxv8i8_ceil:
        vsetvli a0, zero, e8, m1, ta, ma
        vwaddu.vv       v10, v8, v9
        vsetvli zero, zero, e16, m2, ta, ma
        vadd.vi v10, v10, 1
        vsetvli zero, zero, e8, m1, ta, ma
        vnsrl.wi        v8, v10, 1
        ret
```
### After this patch 
```
vaaddu_vv_nxv8i8_ceil:
        vsetvli a0, zero, e8, m1, ta, ma
        csrwi vxrm, 0
        vaaddu.vv v8, v8, v9
        ret
```
2024-01-10 12:08:16 +09:00
Fangrui Song
6c207ee5d2
[RISCV] Force relocations if initial MCSubtargetInfo contains FeatureRelax (#77436)
Regarding
```
.option norelax
j label
.option relax
// relaxable instructions
// For assembly input, RISCVAsmParser::ParseInstruction will set ForceRelocs (https://reviews.llvm.org/D46423).
// For direct object emission, ForceRelocs is not set after https://github.com/llvm/llvm-project/pull/73721
label:
```

The J instruction needs a relocation to ensure the target is correct
after linker relaxation. This is related a limitation in the assembler:
RISCVAsmBackend::shouldForceRelocation decides upfront whether a
relocation is needed, instead of checking more information (whether
there are relaxable fragments in between).

Despite the limitation, `j label` produces a relocation in direct object
emission mode, but was broken by #73721 due to the shouldForceRelocation
limitation.

Add a workaround to RISCVTargetELFStreamer to emulate the previous
behavior.

Link: https://github.com/ClangBuiltLinux/linux/issues/1965
2024-01-09 11:24:21 -08:00
Fangrui Song
7620f03ef7
[MC] Parse SHF_LINK_ORDER argument before section group name (#77407)
When both SHF_LINK_ORDER | SHF_GROUP flags are set, GNU assembler from
2.35 onwards (https://sourceware.org/PR25381
https://sourceware.org/binutils/docs/as/Section.html) parses the
SHF_LINK_ORDER argument before section group name, different from us.

This is unfortunate, but does not matter because the `.section` flag `o`
is a niche feature only used by compiler instrumentations, not adopted
by hand-written assembly, and using both flags is extremely rare. Let's
just match GNU assembler. There is another benefit: we now support
zero-flag section group with the SHF_LINK_ORDER flag, while previously
there isn't a syntax.

While here, print 'G' after 'o' to be clear that the 'G' argument is
parsed after the 'o' argument. To make the diff smaller, we don't print
'G' after 'w' in the absence of 'o' for now.
2024-01-09 10:42:34 -08:00
Chia
0c24c175f2
[RISCV][ISel] Use vaaddu with rounding mode rdn for ISD::AVGFLOORU. (#76550)
This patch aims to use `vaaddu` with rounding mode rdn (i.e `vxrm[1:0] =
0b10`) for `ISD::AVGFLOORU`.

### Source code 
```
define <8 x i8> @vaaddu_auto(ptr %x, ptr %y, ptr %z) {
  %xv = load <8 x i8>, ptr %x, align 2
  %yv = load <8 x i8>, ptr %y, align 2
  %xzv = zext <8 x i8> %xv to <8 x i16>
  %yzv = zext <8 x i8> %yv to <8 x i16>
  %add = add nuw nsw <8 x i16> %xzv, %yzv
  %div = lshr <8 x i16> %add, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %ret = trunc <8 x i16> %div to <8 x i8>
  ret <8 x i8> %ret 
}
```

### Before this patch 
```
vaaddu_auto: 
        vsetivli        zero, 8, e8, mf2, ta, ma
        vle8.v  v8, (a0)
        vle8.v  v9, (a1)
        vwaddu.vv       v10, v8, v9
        vnsrl.wi        v8, v10, 1
        ret
```
### After this patch 
```
vaaddu_auto: 
	vsetivli	zero, 8, e8, mf2, ta, ma
	vle8.v	v8, (a0)
	vle8.v	v9, (a1)
	csrwi	vxrm, 2
	vaaddu.vv	v8, v8, v9
	ret
```

### Note on signed averaging addition

Based on the rvv spec, there is also a variant for signed averaging
addition called `vaadd`.
But AFAIU, no matter in which rounding mode, we cannot achieve the
semantic of signed averaging addition through `vaadd`.
Thus this patch only introduces `vaaddu`.
2024-01-09 15:17:38 +09:00
Craig Topper
a8e9dceb49
[RISCV] Use getELen() instead of hardcoded 64 in lowerBUILD_VECTOR. (#77355)
This is needed to properly support Zve32x.
2024-01-08 19:36:15 -08:00
Jim Lin
96c4f1034c
[RISCV] Add support predicating for ANDN/ORN/XNOR with short-forward-branch-opt. (#77077)
ANDN/ORN/XNOR are like other ALU instructions. It should be able to be
predicated by the cpu that supports short-forward-branch.
2024-01-09 11:12:44 +08:00
Craig Topper
faa326de97
[RISCV] Add branch+c.mv macrofusion for sifive-p450. (#76169)
sifive-p450 supports a very restricted version of the short forward
branch optimization from the sifive-7-series.

For sifive-p450, a branch over a single c.mv can be macrofused as a
conditional move operation. Due to encoding restrictions on c.mv, we
can't conditionally move from X0. That would require c.li instead.
2024-01-08 15:23:26 -08:00
Min-Yih Hsu
478ec63312
[RISCV] Mark VFIRST and VCPOP as SignExtendingOpW (#77022)
Since their values are small enough ([-1, 65535] & [0, 65535],
respectively) to fit into signed 32 bits, any sext (or downcasting +
sext) will be redundnat. Hence marking them as SignExtendingOpW.
2024-01-08 10:59:06 -08:00