1264 Commits

Author SHA1 Message Date
Philip Reames
13a74d6cc8 [RISCV] Fix crash when legalizing mgather/scatter on rv32
This is a fix for a subset of legalization problems around 64-bit indices on
rv32 targets.  For RV32+V, we were using the wrong mask type for the manual
truncation lowering for fixed length vectors.  Instead, just use the generic
TRUNCATE node, and let it be lowered as needed.

Note that legalization is still broken for rv32+zve32.  That appears to be
a different issue.
2023-09-18 08:36:23 -07:00
Craig Topper
e6a007f6b5 [RISCV] Use getConstantOperandVal. NFC 2023-09-17 10:53:50 -07:00
Craig Topper
d76e96b627 [RISCV] Reuse existing XLenVT variable. NFC 2023-09-17 10:46:53 -07:00
Vettel
ddae50d1e6
[RISCV] Combine trunc (sra sext (x), zext (y)) to sra (x, smin (y, scalarsizeinbits(y) - 1)) (#65728)
For RVV, if we want to perform an i8 or i16 element-wise vector
arithmetic right shift in the C/C++ source program, the value to be
shifted is first sign-extended to i32 and the shift amount is
zero-extended to i32 so that vsra.vv can be performed, followed by a
truncate to get the final result. This pattern is later expanded into a
series of "vsetvli" and "vnsrl" instructions, because the RVV spec only
supports 2 * SEW -> SEW truncation. But for vectors, the shift amount
can instead be computed as smin(Y, ScalarSizeInBits(Y) - 1). Also, for
the vsra instruction, only the low lg2(SEW) bits of the shift amount
matter.

- Alive2: https://alive2.llvm.org/ce/z/u3-Zdr
- C++ Test cases : https://gcc.godbolt.org/z/q1qE7fbha
2023-09-17 17:11:28 +08:00
Yingwei Zheng
e042ff7eef
[SDAG][RISCV] Avoid expanding is-power-of-2 pattern on riscv32/64 with zbb
This patch adjusts the legality check for riscv to use `cpop/cpopw` since `isOperationLegal(ISD::CTPOP, MVT::i32)` returns false on rv64gc_zbb.
Clang vs gcc: https://godbolt.org/z/rc3s4hjPh

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156390
2023-09-17 02:56:09 +08:00
Yingwei Zheng
b423e1f05d
[SDAG][RISCV] Avoid neg instructions when lowering atomic_load_sub with a constant rhs
This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs.
Comparison with GCC: https://godbolt.org/z/c5zPdP7j4

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158673
2023-09-16 17:09:41 +08:00
Philip Reames
c663401f69 [RISCV] Prefer vrgatherei16 for shuffles (#66291)
If the data type is larger than e16, and the type requires more than an LMUL1 register
class, prefer the use of vrgatherei16.  This has three major benefits:
1) Less work needed to evaluate the constant for e.g. vid sequences.  Remember
that arithmetic generally scales linearly with LMUL.
2) Less register pressure.  In particular, the source and indices registers
*can* overlap so using a smaller index can significantly help at m8.
3) Smaller constants.  We've got a bunch of tricks for materializing small
constants, and if needed, can use an EEW=16 load.
2023-09-15 15:57:23 -07:00
Philip Reames
ff2622b5ac [RISCV] Optimize gather/scatter to unit-stride memop + shuffle (#66279)
If we have a gather or a scatter whose index describes a permutation of the
lanes, we can lower this as a shuffle + a unit strided memory operation.  For
RISCV, this replaces an indexed load/store with a unit strided memory operation
and a vrgather (at worst).

I did not bother to implement the vp.scatter and vp.gather variants of these
transforms because they'd only be legal when EVL was VLMAX.  Given that, they
should have been transformed to the non-vp variants anyways.  I haven't checked
to see if they actually are.
2023-09-15 15:54:32 -07:00
Philip Reames
37aa07ad31 [RISCV] Move narrowIndex to be a DAG combine over target independent nodes
In D154687, we added a transform to narrow indexed load/store indices of the
form (shl (zext), C).  We can move this into a generic transform over the
target independent nodes instead, and pick up the fixed vector cases with no
additional work required.  This is an alternative to D158163.

Performing this transform points out that we weren't eliminating zero_extends
via the generic DAG combine.  Adjust the (existing) callbacks so that we
do.

This change *removes* the existing transform on the target specific intrinsic
nodes.  If anyone has a use case this impacts, please speak up.

Note: Reviewed as part of a stack of changes in PR# 66405.
2023-09-15 15:02:14 -07:00
Philip Reames
2ff9175af7 [RISCV] Normalize gather/scatter addressing to UNSIGNED_SCALAR
If the index type is greater or equal to XLEN, then signed and unsigned
are the same.  Canonicalize towards unsigned to simplify an upcoming transform.

Note: Reviewed as part of a stack of changes in PR# 66405.
2023-09-15 14:56:33 -07:00
Philip Reames
09a5aac514 [TLI] Add extend as explicit parameter to shouldRemoveExtendFromGSIndex [nfc]
Note: Reviewed as part of a stack of changes in PR# 66405.
2023-09-15 14:48:02 -07:00
Philip Reames
52b33ff760 [RISCV] Avoid toggling VL for hidden splat case in constant buildvector lowering
We have the analogous case in the single insert path.  The reasoning here is that if the original VL fits in LMUL1, we'd rather clobber a few extra dead lanes than force two VL toggles.  VTYPE toggles are generally cheaper than VL toggles.
2023-09-15 12:33:21 -07:00
Nick Desaulniers
86735a4353
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66264)
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)

This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.

Fix up build failures in targets I missed in #66003

Kept as 3 commits so reviewers can better see what's changed. Will
squash when merging.

- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
2023-09-13 13:31:24 -07:00
Reid Kleckner
ee643b706b Revert "[InlineAsm] wrap ConstraintCode in enum class NFC (#66003)"
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.

Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"

This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.

There were SystemZ and Mips build errors, too many to fix forward.
2023-09-13 09:58:02 -07:00
Nick Desaulniers
2ca4d13612
[InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")

Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
2023-09-13 08:48:09 -07:00
Philip Reames
17b071db6a [RISCV] Rework gather/scatter DAG combine structure [NFC]
Instead of switching on type before and after common code, use a helper function.  This matches the style of DAGCombine.cpp more closely, and makes porting candidate changes from one place to the other much easier.
2023-09-12 10:57:12 -07:00
Luke Lau
b2f1a1b20b [RISCV] Move getSmallestVTForIndex so it can be used by lowerINSERT_VECTOR_ELT. NFC 2023-09-12 15:58:19 +01:00
Philip Reames
5352c79398
[RISCV] Add a combine to form masked.load from unit strided load (#65674)
Add a DAG combine to form a masked.load from a masked_strided_load
intrinsic with stride equal to element size. This covers a couple of
extra test cases, and allows us to simplify and common some existing
code on the concat_vector(load, ...) to strided load transform.

This is the first in a mini-patch series to try and generalize our
strided load and gather matching to handle more cases, and common up
different approaches to the same problems in different places.
2023-09-11 13:01:14 -07:00
Philip Reames
299d710e3d [RISCV] Lower fixed vectors extract_vector_elt through stack at high LMUL
This is the extract side of D159332. The goal is to avoid non-linear costing on patterns where an entire vector is split back into scalars. This is an idiomatic pattern for SLP.

Each vslide operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique extracts, each with a cost linear in LMUL, the overall cost is O(LMUL^2) * VLEN/ETYPE. To avoid the degenerate case, fall back to the stack if we're beyond LMUL2.
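
Spelled out, assuming each extract's slide costs c * LMUL (the linear cost model described above), writing SEW for the element width (ETYPE above), and assuming all VL elements of the register group are extracted:

```latex
\mathrm{VL} = \mathrm{LMUL}\cdot\frac{\mathrm{VLEN}}{\mathrm{SEW}},
\qquad
\text{total cost} \approx \mathrm{VL}\cdot c\cdot\mathrm{LMUL}
  = c\cdot\mathrm{LMUL}^{2}\cdot\frac{\mathrm{VLEN}}{\mathrm{SEW}}
```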

There's a subtlety here. For this to work, we're *relying* on an optimization in LegalizeDAG which tries to reuse the stack slot from a previous extract. In practice, this appears to trigger for patterns within a block, but if we ended up with an explode idiom split across multiple blocks, we'd still be in quadratic territory. I don't think that variant is fixable within SDAG.

It's tempting to think we can do better than going through the stack, but if a better option exists, I haven't found it yet. Here are the results for sifive-x280 on all the variants I wrote (all 16 x i64 with V):

output/sifive-x280/linear_decomp_with_slidedown.mca:Total Cycles:      20703
output/sifive-x280/linear_decomp_with_vrgather.mca:Total Cycles:      23903
output/sifive-x280/naive_linear_with_slidedown.mca:Total Cycles:      21604
output/sifive-x280/naive_linear_with_vrgather.mca:Total Cycles:      22804
output/sifive-x280/recursive_decomp_with_slidedown.mca:Total Cycles:      15204
output/sifive-x280/recursive_decomp_with_vrgather.mca:Total Cycles:      18404
output/sifive-x280/stack_by_vreg.mca:Total Cycles:      12104
output/sifive-x280/stack_element_by_element.mca:Total Cycles:      4304

I am deliberately excluding scalable vectors. It functionally works, but frankly, the code quality for an idiomatic explode loop is so terrible either way that it felt better to leave that for future work.

Differential Revision: https://reviews.llvm.org/D159375
2023-09-11 10:49:17 -07:00
Luke Lau
e33f3f09b8
[RISCV] Shrink vslidedown when lowering fixed extract_subvector (#65598)
As noted in
https://github.com/llvm/llvm-project/pull/65392#discussion_r1316259471,
when lowering an extract of a fixed length vector from another vector,
we don't need to perform the vslidedown on the full vector type. Instead
we can extract the smallest subregister that contains the subvector to
be extracted and perform the vslidedown with a smaller LMUL. E.g, with
+Zvl128b:

v2i64 = extract_subvector nxv4i64, 2

is currently lowered as

vsetivli zero, 2, e64, m4, ta, ma
vslidedown.vi v8, v8, 2

This patch shrinks the vslidedown to LMUL=2:

vsetivli zero, 2, e64, m2, ta, ma
vslidedown.vi v8, v8, 2

This is valid because we know that there are at least 128*2=256 bits in v8
at LMUL=2, and we only need the first 256 bits to extract a v2i64 at index 2.

lowerEXTRACT_VECTOR_ELT already has this logic, so this extracts it out
and reuses it.

I've split this out into a separate PR rather than include it in #65392,
with the hope that we'll be able to generalize it later.
2023-09-11 17:25:12 +01:00
Luke Lau
b46d7011f2
[RISCV] Refactor extract_subvector lowering slightly. NFC (#65391)
This patch refactors extract_subvector lowering to lower to
extract_subreg directly, and to shortcut whenever the index is 0 when
extracting a scalable vector. This doesn't change any of the existing
behaviour, but makes an upcoming patch that extends the scalable path
slightly easier to read.
2023-09-11 16:48:35 +01:00
Philip Reames
463c9f44dc [RISCV] Move slide and gather costing to TLI [NFC] (PR #65396)
As mentioned in TODOs from D159332.  This PR doesn't actually
common up that copy of the code because doing so is not NFC - due to
DLEN.  Fixing that will be a future PR.
2023-09-07 18:28:17 -07:00
Philip Reames
b4a99f1cd6
[RISCV] Lower constant build_vectors with few non-sign bits via vsext (#65648)
If we have a build_vector such as [i64 0, i64 3, i64 1, i64 2], we
instead lower this as vsext([i8 0, i8 3, i8 1, i8 2]). For vectors with
4 or fewer elements, the resulting narrow vector can be generated via
scalar materialization.

For shuffles which get lowered to vrgathers, constant build_vectors of
small constants are idiomatic. As such, this change covers all shuffles
with an output type of 4 or fewer elements.

I deliberately started narrow here. I think it makes sense to expand
this to longer vectors, but we need a more robust profit model on the
recursive expansion. It's questionable whether we want to do the vsext if
we're going to generate a constant pool load for the narrower type
anyways.

One possibility for future exploration is to allow the narrower VT to be
less than 8 bits. We can't use vsext for that, but we could use
something analogous to our widening interleave lowering with some extra
shifts and ands.
2023-09-07 16:01:16 -07:00
Philip Reames
de34d39b66 [RISCV] Cap build vector cost to avoid quadratic cost at high LMULs
Each vslide1down operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique inserts, each with a cost linear in LMUL, the overall cost is O(VL*LMUL).  Since VL is a linear function of LMUL, this means the current lowering is quadratic in both LMUL and VL.  To avoid the degenerate case, fall back to the stack if the cost is more than a fixed (linear) threshold.

For context, here's the sifive-x280 llvm-mca results for the current lowering and stack based lowering for each LMUL (using e64). Assumes code was compiled for V (i.e. zvl128b).
  buildvector_m1_via_stack.mca:Total Cycles: 1904
  buildvector_m2_via_stack.mca:Total Cycles: 2104
  buildvector_m4_via_stack.mca:Total Cycles: 2504
  buildvector_m8_via_stack.mca:Total Cycles: 3304
  buildvector_m1_via_vslide1down.mca:Total Cycles:  804
  buildvector_m2_via_vslide1down.mca:Total Cycles:  1604
  buildvector_m4_via_vslide1down.mca:Total Cycles:  6400
  buildvector_m8_via_vslide1down.mca:Total Cycles: 25599

There are other schemes we could use to cap the cost. The next best is recursive decomposition of the vector into smaller LMULs. That's still quadratic, but with a better constant. However, stack based seems to cost better on all LMULs, so we can just go with the simpler scheme.

Arguably, this patch is fixing a regression introduced with my D149667 as before that change, we'd always fallback to the stack, and thus didn't have the non-linearity.

Differential Revision: https://reviews.llvm.org/D159332
2023-09-05 09:03:26 -07:00
Luke Lau
6098d7d5f6 [RISCV] Lower shuffles as rotates without zvbb
Now that the codegen for the expanded ISD::ROTL sequence has been improved,
it's probably profitable to lower a shuffle that's a rotate to the
vsll+vsrl+vor sequence to avoid a vrgather where possible, even if we don't
have the vror instruction.

This patch relaxes the restriction on ISD::ROTL being legal in
lowerVECTOR_SHUFFLEAsRotate. It also attempts to do the lowering twice: Once
if zvbb is enabled before any of the interleave/deinterleave/vmerge lowerings,
and a second time unconditionally just before it falls back to the vrgather.
This way it doesn't interfere with any of the above patterns that may be more
profitable than the expanded ISD::ROTL sequence.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D159353
2023-09-04 09:35:12 +01:00
Kazu Hirata
e2e68468f5 [RISCV] Use isNullConstant (NFC) 2023-09-04 00:31:38 -07:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Craig Topper
319aba645f [RISCV] Teach MatInt to use (ADD_UW X, (SLLI X, 32)) to materialize some constants.
If the high and low 32 bits are the same, we try to use
(ADD X, (SLLI X, 32)) but that only works if bit 31 is clear since
the low 32 bits will be sign extended.

If we have Zba we can use add.uw to zero the sign extended bits.

Reviewed By: reames, wangpc

Differential Revision: https://reviews.llvm.org/D159253
2023-08-31 20:24:34 -07:00
Luke Lau
1664eb05d0 [RISCV] Fix crash during i1 vector bitreverse lowering
A shuffle of v256i1 with a large enough minimum vlen might make it through type
legalization and into lowering. In this case, zvl1024b was enough. The
bitreverse shuffle lowering would then try to convert this to a v1i256 type
which is invalid (v1i128 exists though, which is why the existing v128i1 tests
were fine).

This patch checks to make sure that the new type is not only legal but also
valid.

Reviewed By: craig.topper, reames

Differential Revision: https://reviews.llvm.org/D159215
2023-08-31 19:39:08 +01:00
Luke Lau
7b33f60f13 [RISCV] Remove vmv_v_x_vl workaround for constant splat. NFC
Now that DAG.getConstant uses splat_vector_parts if needed on RV32, we can use
it directly without having to manually lower to a vmv_v_x_vl.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D159287
2023-08-31 19:36:09 +01:00
Philip Reames
3e89aca446 [RISCV] Rename getELEN to getELen [nfc]
Let's follow the naming scheme used for DLen, XLen, and FLen.
2023-08-31 11:27:00 -07:00
Craig Topper
d1c3784adf [RISCV] Prefer ShortForwardBranch over the fully generic Zicond expansion.
Short forward branch is shorter than (or (czero.eqz), (czero.nez)).

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D159295
2023-08-31 11:07:35 -07:00
Philip Reames
079c968eb9 [RISCV] Form vmv.s.f/x from single element splats via DAG combine
This re-implements the special casing we had in lowerScalarSplat as a DAG combine. As can be seen in the tests, this ends up triggering in a bunch more cases.

The semantically interesting bit of this change is the use of the implicit truncate semantics for when XLEN > SEW. We'd already been doing this for vmv.v.x, but this change extends e.g. the constant matching to make the same assumption about vmv.s.x. Per my reading of the specification, this should be fine, and if anything, is more obviously true of vmv.s.x than vmv.v.x.

Differential Revision: https://reviews.llvm.org/D158874
2023-08-30 12:44:36 -07:00
Philip Reames
fd465f377c [RISCV] Move vmv_s_x and vfmv_s_f special casing to DAG combine
We'd discussed this in the original set of patches months ago, but decided against it. I think we should reverse ourselves here as the code is significantly more readable, and we do pick up cases we'd missed by not calling the appropriate helper routine.

Differential Revision: https://reviews.llvm.org/D158854
2023-08-30 12:04:48 -07:00
Luke Lau
976244bb84 [RISCV] Canonicalize vrot{l,r} to vrev8 when lowering shuffle as rotate
A rotate of 8 bits of an e16 vector in either direction is equivalent to a
byteswap, i.e. vrev8. There is a generic combine on ISD::ROT{L,R} to
canonicalize these rotations to byteswaps, but on fixed vectors they are
legalized before they have the chance to be combined. This patch teaches the
rotate vector_shuffle lowering to emit these rotations as byteswaps to match
the scalable vector behaviour.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158195
2023-08-30 11:01:49 +01:00
Luke Lau
a61c4a0ef6 [RISCV][SelectionDAG] Lower shuffles as bitrotates with vror.vi when possible
Given a shuffle mask like <3, 0, 1, 2, 7, 4, 5, 6> for v8i8, we can
reinterpret it as a shuffle of v2i32 where the two i32s are bit rotated, and
lower it as a vror.vi (if legal with zvbb enabled).
We also need to make sure that the larger element type is a valid SEW, hence
the tests for zve32x.

X86 already did this, so I've extracted the logic for it and put it inside
ShuffleVectorSDNode so it could be reused by RISC-V. I originally tried to add
this as a generic combine in DAGCombiner.cpp, but it ended up causing worse
codegen on X86 and PPC.

Reviewed By: reames, pengfei

Differential Revision: https://reviews.llvm.org/D157417
2023-08-30 11:01:47 +01:00
Craig Topper
7b5cf52f32 [RISCV] Improve splatPartsI64WithVL for fixed vector constants where Hi and Lo are the same and the VL is constant.
If doubling the VL will fit in a vsetivli, use it. It will be cheap
to change and cheap to change back.

This improves codegen from D158896.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158896
2023-08-29 09:27:48 -07:00
Craig Topper
398c855457 [RISCV] Improve splatPartsI64WithVL for vlmax scalable vector constants where Hi and Lo are the same.
We can use a 32-bit splat and bitcast to i64 vector.

This only handles the case where we are using vlmax so that the new
vl is cheap to compute. This could be generalized to double the VL.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158879
2023-08-25 14:15:41 -07:00
Craig Topper
4184bafa9b [RISCV] Refactor lowerSPLAT_VECTOR_PARTS to use splatPartsI64WithVL for scalable vectors.
There was quite a bit of duplication between splatPartsI64WithVL
and the scalable vector handling in lowerSPLAT_VECTOR_PARTS, but
scalable vector had one additional case. Move that case to
splatPartsI64WithVL which improves some fixed vector tests.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158876
2023-08-25 14:15:40 -07:00
LiaoChunyu
1b12427c01 [VP][RISCV] Add vp.is.fpclass and RISC-V support
There was no vp.fpclass after FCLASS_VL (D151176); this patch adds support for vp.fpclass.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152993
2023-08-25 15:40:55 +08:00
Luke Lau
e772c0ecd8 [RISCV] Use vmv.v.x if Hi bits are undef when lowering splat_vector_parts
When lowering a splat_vector_parts, if the hi bits are undefined then we can
splat the lo bits without having to check if it's going to be sign extended or
not, because those bits will be undefined anyway.

I've handled it for both fixed and scalable vectors, but there's no diff
on the scalable vror tests, since the hi bits aren't combined away to
undef in SimplifyDemanded for scalable vectors. I'm not sure why that is.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158625
2023-08-24 12:19:09 +01:00
Luke Lau
06d3ee9603 [RISCV] Fix wrong operand being used for VL in shift combine
At some point a merge operand was added to the binary vl ops, so this combine
was using the mask for the VL. This causes a crash when trying to
select the vmv_v_x_vl, which showed up locally when messing about with
selectVSplat, but thankfully in ToT the vmv_v_x_vl gets pattern matched
away into the .vx and .vi operands every time, so there's no noticeable
change.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158634
2023-08-23 17:44:21 +01:00
Jianjian GUAN
879e801a91 [RISCV] Apply promotion for f16 vector ops when only have zvfhmin
For most fp16 vector ops, we can promote them to fp32 vectors when zvfhmin is enabled but zvfh is not.
But for nxv32f16, we need to split it first since nxv32f32 is not a valid MVT.

Reviewed By: michaelmaitland

Differential Revision: https://reviews.llvm.org/D153848
2023-08-23 16:49:20 +08:00
Jianjian GUAN
759903568f [RISCV] Add Zvfhmin extension support for llvm RISCV backend
This patch supports Zvfhmin for RISCV codegen.

Reviewed By: michaelmaitland

Differential Revision: https://reviews.llvm.org/D151414
2023-08-23 16:47:47 +08:00
Philip Reames
c3b48ec6ff [RISCV] Match strided loads with reversed indexing sequences
This extends the concat_vector of loads to strided_load transform to handle a reversed index pattern. The previous code expected indexing of the form (a0, a0+S, a0+2S, ...). However, we can also see indexing of the form (a0+(n-1)*S, ..., a0+S, a0). This form is a strided load starting at address a0 + S*(n-1) with stride -S.

Note that this also fixes what looks to be a bug in the memory location reasoning for the forward strided case. A strided load with negative stride accesses eltsize bytes past the base ptr, and then bytes *before* the base ptr. (That is, the range should extend from before the base ptr to after it.)

Differential Revision: https://reviews.llvm.org/D157886
2023-08-22 07:59:49 -07:00
Philip Reames
ecb855a5a8 [RISCV] Reduce LMUL for vector extracts
If we have a known (or bounded) index which definitely fits in a smaller LMUL register group size, we can reduce the LMUL of the slide and extract instructions. This loosens constraints on register allocation, and allows the hardware to do less work, at the potential cost of some additional VTYPE toggles. In practice, we appear (after prior patches) to do a decent job of eliminating the additional VTYPE toggles in most cases.

Differential Revision: https://reviews.llvm.org/D158460
2023-08-22 07:36:17 -07:00
Craig Topper
b441fd60b2 [RISCV] Separate hasRoundModeOpNum into separate VXRM and FRM functions.
Preparation for developing a new rounding mode insertion algorithm
that is going to be different between them since VXRM doesn't need
to be save/restored.

This also unifies the FRM handling in RISCVISelLowering.cpp between
scalar and vector.

Fixes outdated comments in RISCVAsmPrinter and sorts the predicate
function by the reverse order of the operands being skipped.

Reviewed By: eopXD

Differential Revision: https://reviews.llvm.org/D158326
2023-08-21 10:00:23 -07:00
Craig Topper
078eb4bd85 [RISCV] Fix a UBSAN failure for passing INT64_MIN to std::abs.
clang recently started checking for INT64_MIN being passed to 64-bit std::abs.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D158304
2023-08-18 12:47:52 -07:00
Craig Topper
42dad521e3 [RISCV] Add RISCVII::getRoundModeOpNum to reduce code duplication. NFC 2023-08-16 12:00:02 -07:00
wangpc
ac00cca3d9 [RISCV] Fix assertion when passing f64 vectors via integer registers
The vector arguments are split but assignments won't be pending.

Fixes #64645

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D157847
2023-08-15 12:11:08 +08:00