1512 Commits

Author SHA1 Message Date
Craig Topper
999b9e6ddb [RISCV] Use vector getConstant instead of getSplatVector+getConstant. NFC 2024-04-10 19:39:41 -07:00
Jianjian Guan
fd50151180
[RISCV] Only support SPLAT_VECTOR for Zvfhmin when also enable the scalar extension of half fp (#88275) 2024-04-11 10:23:26 +08:00
Craig Topper
323d3ab257
[RISCV] Optimize undef Even vector in getWideningInterleave. (#88221)
We recently optimized the code when the Odd vector was undef to fix a
poison bug.

There are additional optimizations we can do if the even vector is
undef. With Zvbb, we can use a single vwsll. Without Zvbb, we can use a
vzext.vf2 and a vsll.
2024-04-10 09:08:50 -07:00
Chia
469caa31e7
[RISCV] Use vwadd.vx for splat vector with extension (#87249)
This patch allows `combineBinOp_VLToVWBinOp_VL` to handle patterns like
`(splat_vector (sext op))` or `(splat_vector (zext op))`. Then we can
use `vwadd.vx` and `vwadd.w` for such a case.

### Source code
```
define <vscale x 8 x i64> @vwadd_vx_splat_sext(<vscale x 8 x i32> %va, i32 %b) {
     %sb = sext i32 %b to i64
     %head = insertelement <vscale x 8 x i64> poison, i64 %sb, i32 0
     %splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
     %vc = sext <vscale x 8 x i32> %va to <vscale x 8 x i64>
     %ve = add <vscale x 8 x i64> %vc, %splat
     ret <vscale x 8 x i64> %ve
}
```

### Before this patch
[Compiler Explorer](https://godbolt.org/z/sq191PsT4)
```
vwadd_vx_splat_sext:
  sext.w a0, a0
  vsetvli a1, zero, e64, m8, ta, ma
  vmv.v.x v16, a0
  vsetvli zero, zero, e32, m4, ta, ma
  vwadd.wv v16, v16, v8
  vmv8r.v v8, v16
  ret
```
### After this patch
```
vwadd_vx_splat_sext
  vsetvli a1, zero, e32, m4, ta, ma
  vwadd.vx v16, v8, a0
  vmv8r.v v8, v16
  ret
```
2024-04-10 15:26:17 +09:00
Luke Lau
9c660362c4
[RISCV] Support vwsll in combineBinOp_VLToVWBinOp_VL (#87620)
If the subtarget has +zvbb then we can attempt folding shl and shl_vl to
vwsll nodes.

There are few test cases where we still don't pick up the vwsll:
- For fixed vector vwsll.vi on RV32, see the FIXME for VMV_V_X_VL in
fillUpExtensionSupport for support implicit sign extension
- For scalable vector vwsll.vi we need to support ISD::SPLAT_VECTOR, see
#87249
2024-04-09 16:10:35 +08:00
Luke Lau
0f20b9b92f
[RISCV] Don't require mask or VL to be the same in combineBinOp_VLToVWBinOp_VL (#87997)
In NodeExtensionHelper we keep track of the VL and mask of the operand
being extended and check that they are the same as the root node's.
However for the nodes that we support, none of them have a passthru
operand with the exception of RISCV::VMV_V_X_VL, but we check that it's
passthru is undef anyway.

So it's safe to just discard the extend node's VL and mask and just use
the root's instead. (This is the same type of reasoning we use to treat
any vmset_vl as an all ones mask)

This allows us to match some more cases where we mix VP/non-VP/VL nodes,
but these don't seem to appear in practice. The main benefit from this
would be to simplify the code.
2024-04-09 16:04:10 +08:00
Luke Lau
8b3b4a92ad
[RISCV] Fix canFoldToVWWithSameExtension allowing different FP extensions (#87978) 2024-04-08 19:20:36 +08:00
Craig Topper
3b19cd7f80
[RISCV] Slightly simplify RVVArgDispatcher::constructArgInfos. NFC (#87308)
Use a single insert for the non-mask case instead of a push_back
followed by an insert that may contain 0 registers.
2024-04-02 18:34:03 -07:00
Craig Topper
a9af66a90e [RISCV] Lower (vector_interleave X, undef) to (vzext_vl X). (#87283)
If the odd vector is undef or poison, the widening add and multiply trick
doesn't work unless we freeze the odd vector.

Unfortunately, freezing doesn't work when the operand is provably
undef/poison. MIR doesn't have a representation for freeze so it
just becomes a COPY from IMPLICIT_DEF which freely propagates undef
to each operand independently.

To work around this, check for undef explicitly and lower to a VZEXT_VL
of the even vector. This produces better code than we'd get from a
freeze anyway.

I've left a FIXME for adding a freeze. I'll do that as a separate patch
as it affects other tests and doesn't help with the new test.
2024-04-02 11:58:41 -07:00
Brandon Wu
29e8bfc13c
[RISCV] RISCV vector calling convention (2/2) (#79096)
This commit handles vector arguments/return for function definition/call,
the new class RVVArgDispatcher is added for doing all vector register
assignment including mask types, data types as well as tuple types.
It precomputes the register number for each argument as per
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#standard-vector-calling-convention-variant
and it's passed to calling convention function to handle all vector arguments.

Depends on: #78550
2024-03-30 21:05:33 +08:00
Luke Lau
2a315d800b
[RISCV] Combine (or disjoint ext, ext) -> vwadd (#86929)
DAGCombiner (or InstCombine) will convert an add to an or if the bits
are disjoint, which can prevent what was originally an (add {s,z}ext,
{s,z}ext) from being selected as a vwadd.

This teaches combineBinOp_VLToVWBinOp_VL to recover it by treating it as
an add.
2024-03-29 19:45:24 +08:00
Luke Lau
a3c2d8c072
[RISCV] Combine ({s,u}{div,rem} (zext, zext)) -> (zext ({s,u}{div,rem} (zext, zext))) (#86779)
This narrows unsigned and signed div and rem nodes via
combineBinOpOfZExt.

Unlike other binary ops, there are no widening div or rem instructions.
So we will end up with an extra vzext.vf2.

However I'm assuming that div/rem are expensive enough that by reducing
their EMUL we will gain back the cost.

Alive2 proof: https://alive2.llvm.org/ce/z/Et_L6y
2024-03-29 05:55:38 +08:00
Brandon Wu
91896607ff
[RISCV] RISCV vector calling convention (1/2) (#77560)
[RISCV] RISCV vector calling convention (1/2)

    This is the vector calling convention based on
    https://github.com/riscv-non-isa/riscv-elf-psabi-doc,
    the idea is to split between "scalar" callee-saved registers
    and "vector" callee-saved registers. "scalar" ones remain the
    original strategy, however, "vector" ones are handled together
    with RVV objects.

    The stack layout would be:

      |--------------------------| <-- FP
      | callee-allocated save    |
      | area for register varargs|
      |--------------------------|
      | callee-saved registers   | <-- scalar callee-saved
      |        (scalar)          |
      |--------------------------|
      | RVV alignment padding    |
      |--------------------------|
      | callee-saved registers   | <-- vector callee-saved
      |        (vector)          |
      |--------------------------|
      | RVV objects              |
      |--------------------------|
      | padding before RVV       |
      |--------------------------|
      | scalar local variables   |
      |--------------------------| <-- BP
      | variable size objects    |
      |--------------------------| <-- SP

    Note: This patch doesn't contain "tuple" type, e.g. vint32m1x2.
          It will be handled in https://github.com/riscv-non-isa/riscv-elf-psabi-doc (2/2).

    Differential Revision: https://reviews.llvm.org/D154576
2024-03-27 23:03:13 +08:00
Luke Lau
87519a2830
[RISCV] Combine (mul (zext, zext)) -> (zext (mul (zext, zext))) (#86465)
Building on #86248, we can also narrow the width of a mul of zexts.

This is specifically legal because on RVV we always extend to the next
power of 2 width, and multiplying two N bit integers produces a maximum
value of 2\*N bits.
So as long as we keep an inner zext of 2\*N, we will have enough space
for the multiply and won't overflow.

Alive2 proof: https://alive2.llvm.org/ce/z/XteYyb
2024-03-26 23:28:04 +08:00
Philip Reames
a6b870db09
[RISCV] Enable sub(max, min) lowering for ABDS and ABDU (#86592)
We have the ISD nodes for representing signed and unsigned absolute
difference. For RISCV, we have vector min/max in the base vector
extension, so we can expand to the sub(max,min) lowering.

We could almost use the default expansion, but since fixed length
min/max are custom (not legal), the default expansion doesn't cover the
fixed vector cases. The expansion here is just a copy of the generic
code specialized to allow the custom min/max nodes to be created so they
can in turn be legalized to the _vl variants.

Existing DAG combines handle the recognition of absolute difference
idioms and conversion into the respective ISD::ABDS and ISD::ABDU nodes.

This change does have the net effect of potentially pushing a free
floating zero/sign extend after the expansion, and we don't do a great
job of folding that into later expressions. However, since in general
narrowing can reduce required work (by reducing LMUL) this seems like
the right general tradeoff.
2024-03-25 20:13:53 -07:00
Craig Topper
ce37a7131f
[RISCV] Add integer RISCVISD::SELECT_CC to canCreateUndefOrPoison and isGuaranteedNotToBeUndefOrPoison. (#84693)
Integer RISCVISD::SELECT_CC doesn't create poison. If none of the,
operands are poison, the result is not poison.

This allows ISD::FREEZE to be hoisted above RISCVISD::SELECT_CC.
2024-03-25 11:10:58 -07:00
Luke Lau
373e77b4c0
[RISCV] Generalize (sub zext, zext) -> (sext (sub zext, zext)) to add (#86248)
This generalizes the combine added in #82455 to other binary ops,
beginning with adds in this patch.

Because the two zext operands are always +ve when treated as signed, and
we don't get any overflow since the add is carried out in at least N * 2
bits of the narrow type, the result of the add will always be +ve. So we
can use a zext for the outer extend, unlike sub which may produce a -ve
result from two +ve operands.

Although we could still use sext for add, I plan to add support for
other binary ops like mul in a later patch, but mul requires zext to be
correct (because the maximum value will take up the full N * 2 bits). So
I've opted to use zext here too for consistency.

Alive2 proof: https://alive2.llvm.org/ce/z/PRNsUM
2024-03-25 13:08:56 +08:00
Harvin Iriawan
57146daeaa
[CodeGen] Update for scalable MemoryType in MMO (#70452)
Remove getSizeOrUnknown call when MachineMemOperand is created.  For Scalable
TypeSize, the MemoryType created becomes a scalable_vector.

2 MMOs that have scalable memory access can then use the updated BasicAA that
understands scalable LocationSize.

Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
2024-03-23 12:56:25 +00:00
Luke Lau
51d5b65819
[RISCV] Handle scalable ops with < EEW / 2 narrow types in combineBinOp_VLToVWBinOp_VL (#84158)
We can remove the restriction that the narrow type needs to be exactly
EEW / 2 for scalable ISD::{ADD,SUB,MUL} nodes. This allows us to perform
the combine even if we can't fully fold the extend into the widening op.

VP intrinsics already do this, since they are lowered to _VL nodes which
don't have this restriction.

The "exactly EEW / 2" narrow type restriction prevented us from emitting
V{S,Z}EXT_VL nodes with i1 element types which crash when we try to
select them, since no other legal type is double the size of i1, see the
test case added in this PR `i1_zext`. So to preserve this, this adds a
check for i1 narrow types instead.
2024-03-22 07:26:29 +08:00
Luke Lau
06d245242e
[RISCV] Recursively split concat_vector into smaller LMULs when lowering (#85825)
This is a reimplementation of the combine added in #83035 but as a
lowering instead of a combine, so we don't regress the test case added
in e59f120e3a14ccdc55fcb7be996efaa768daabe0 by interfering with the
strided load combine

Previously the combine had to concatenate the split vectors with
insert_subvector instead of concat_vectors to prevent an infinite
combine loop. And the reasoning behind keeping it as a combine was
because if we emitted the insert_subvector during lowering then we
didn't fold away inserts of undef subvectors.

However it turns out we can avoid this if we just do this in lowering
and select a concat_vector directly, since we get the undef folding for
free with `DAG.getNode(ISD::CONCAT_VECTOR, ...)` via foldCONCAT_VECTORS.
2024-03-22 07:08:51 +08:00
Craig Topper
f5c90f3000
[RISCV] Use BuildPairF64 and SplitF64 for bitcast i64<->f64 on rv32 regardless of Zfa. (#85982)
Previously we used BuildPairF64 and SplitF64 only if Zfa was supported
since they will select register file moves that are only available with
Zfa.

We recently changed the handling of BuildPairF64/SplitF64 for Zdinx to
not go through memory so we should use that for bitcast.

That leaves the D without Zfa case that does need to go through memory.
Previously we let type legalization expand to loads and stores using a
new stack temporary created for each bitcast. After this patch we will
create the loads ands stores in the custom inserter and share the same
stack slot for all. This also allows DAGCombiner to optimize when
bitcast is mixed with BuildPairF64/SplitF64.
2024-03-21 08:52:51 -07:00
Craig Topper
d42992e71c [RISCV] Cleanup setOperationAction for ISD::BITCAST with Zfa and D extension. NFC
We only need Custom handling for i64 on RV32. This will be used by
type legalization. We don't need to make it custom for f64 to get
type legalization to custom split i64.

If f64 and i64 are legal types, then ISD::BITCAST should be legal.
2024-03-20 10:43:54 -07:00
Craig Topper
576d81baa5
[RISCV] Use REG_SEQUENCE/EXTRACT_SUBREG to move between individual GPRs and GPRPair. (#85887)
Previously we used memory like we do to move between GPRs and FPR64 with
the D extension on RV32.

We can instead use REG_SEQUENCE/EXTRACT_SUBREG to inform register
allocation how to do the copy without memory.
2024-03-20 08:44:24 -07:00
Jiahan Xie
4bf06bebb9
[GISEL][RISCV] IRTranslator for scalable vector load (#80006)
Add IRTranslator for scalable vector load instruction and include
corresponding tests with alignment argument included, which can be
smaller/equal/larger than element size or smaller/equal/larger than the
minimum total vector size.
2024-03-19 20:12:26 -04:00
Luke Lau
ef520ca6b1 Revert "[RISCV] Recursively split concat_vector into smaller LMULs (#83035)"
This reverts commit c59129a7c79448837d665de8f2743ad4b14666f6.

This causes regressions in some x264 workloads like pixel_var_8x8 due to it
interfering with the strided load combine. Reverting so I can try to rework
it as a lowering instead.
2024-03-19 20:59:03 +08:00
Craig Topper
ffa2810f7b
[RISCV] Optimize lowering of VECREDUCE_FMINIMUM/VECREDUCE_FMAXIMUM. (#85165)
Use a normal min/max reduction that doesn't propagate nans and force
the result to nan at the end if any elements were nan.
2024-03-14 12:51:29 -07:00
Kolya Panchenko
aa68e2814d
[RISCV] Support llvm.masked.compressstore intrinsic (#83457)
The changeset enables lowering of `llvm.masked.compressstore(%data,
%ptr, %mask)` for RVV for fixed vector type into:
```
%0 = vcompress %data, %mask, %vl
%new_vl = vcpop %mask, %vl
vse %0, %ptr, %1, %new_vl
```
Such lowering is only possible when `%data` fits into available LMULs
and otherwise `llvm.masked.compressstore` is scalarized by
`ScalarizeMaskedMemIntrin` pass.
Even though RVV spec in the section `15.8` provide alternative sequence
for compressstore, use of `vcompress + vcpop` should be a proper
canonical form to lower `llvm.masked.compressstore`. If RISC-V target
find the sequence from `15.8` better, peephole optimization can
transform `vcompress + vcpop` into that sequence.
2024-03-13 15:18:51 -04:00
Luke Lau
0ef61ed54d [RISCV] Move NodeExtensionHelper assert to getOrCreateExtendedOp. NFC
Move the narrow types assert from the ZERO_EXTEND/SIGN_EXTEND case in
fillUpExtensionSupport to getOrCreateExtendedOp so we check the other nodes
too.
2024-03-11 18:00:29 +08:00
Luke Lau
58dd59a282
[RISCV] Don't run combineBinOp_VLToVWBinOp_VL until after legalize types. NFCI (#84125)
I noticed this from a discrepancy in fillUpExtensionSupport between how
we apparently need to check for legal types for ISD::{ZERO,SIGN}_EXTEND,
but we don't need to for RISCVISD::V{Z,S}EXT_VL.

Prior to #72340, combineBinOp_VLToVWBinOp_VL only ran after type
legalization because it only operated on _VL nodes. _VL nodes are only
emitted during op legalization, which takes place **after** type
legalization, which is presumably why the existing code didn't need to
check for legal types.

After #72340 we now handle generic ops like ISD::ADD that exist before
op legalization and thus **before** type legalization. This meant that
we needed to add extra checks that the narrow type was legal in #76785.

I think the easiest thing to do here is to just maintain the invariant
that the types are legal and only run the combine after type
legalization.
2024-03-11 17:43:02 +08:00
Craig Topper
d8d2dea7fc
[RISCV] Handle FP riscv_masked_strided_load with 0 stride. (#84576)
Previously, we tried to create an integer extending load. We need to a
non-extending FP load instead.

Fixes #84541.
2024-03-10 21:22:37 -07:00
Craig Topper
909ab0e0d1
[RISCV] Insert a freeze before converting select to AND/OR. (#84232)
Select blocks poison, but AND/OR do not. We need to insert a freeze
to block poison propagation.

This creates suboptimal codegen which I will try to fix with other
patches. I'm prioritizing the correctness fix since we have 2 bug reports.

Fixes #84200 and #84350
2024-03-07 15:03:51 -08:00
Michael Maitland
96049fcf4e [GISEL] Add IRTranslation for shufflevector on scalable vector types (#80378)
Recommits llvm/llvm-project#80378 which was reverted in
llvm/llvm-project#84330. The problem was that the change in
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir used
217 as an opcode instead of a regex.
2024-03-07 09:10:03 -08:00
Michael Maitland
552da24843
Revert "[GISEL] Add IRTranslation for shufflevector on scalable vector types" (#84330)
Reverts llvm/llvm-project#80378

causing Buildbot failures that did not show up with check-llvm or CI.
2024-03-07 10:16:31 -05:00
Michael Maitland
2b8aaef09e
[GISEL] Add IRTranslation for shufflevector on scalable vector types (#80378)
This patch is stacked on
https://github.com/llvm/llvm-project/pull/80372,
https://github.com/llvm/llvm-project/pull/80307, and
https://github.com/llvm/llvm-project/pull/80306.

ShuffleVector on scalable vector types gets IRTranslate'd to
G_SPLAT_VECTOR since a ShuffleVector that has operates on scalable
vectors is a splat vector where the value of the splat vector is the 0th
element of the first operand, because the index mask operand is the
zeroinitializer (undef and poison are treated as zeroinitializer here).
This is analogous to what happens in SelectionDAG for ShuffleVector.

`buildSplatVector` is renamed to`buildBuildVectorSplatVector`. I did not
make this a separate patch because it would cause problems to revert
that change without reverting this change too.
2024-03-07 09:50:29 -05:00
Luke Lau
c59129a7c7
[RISCV] Recursively split concat_vector into smaller LMULs (#83035)
This is the concat_vector equivalent of #81312, in that we recursively
split concat_vectors with more than two operands into smaller
concat_vectors.

This allows us to break up the chain of vslideups, as well as perform
the vslideups at a smaller LMUL, which in turn reduces register pressure
as the previous lowering performed N vslideups at the highest result
LMUL. For now, it stops splitting past MF2.

This is done as a DAG combine so that any undef operands are combined
away: If we do this during lowering then we end up with unnecessary
vslideups of undefs.
2024-03-07 16:50:26 +08:00
Luke Lau
0207270494
[RISCV] Don't remove extends for i1 indices in mgather/mscatter (#83951) 2024-03-06 08:52:27 +08:00
Craig Topper
58d8805ff9
[RISCV] Always use signed APSInt in getExactInteger. (#84070)
We were setting based on whether the FP value is positive/negative, but
we really want to know whether the resulting integer will be treated as
a signed or unsigned value. Since we use SINT_TO_FP to convert the
integer to FP, we should always used signed here.

Without this we convert +2147483648.0 to an integer 0x80000000 and
convert it using sint_to_fp which produces -2147483648.0.
2024-03-05 14:36:37 -08:00
Craig Topper
4dd9c2ed32
[RISCV] Use NewVL in splatPartsI64WithVL. (#83690)
In 7b5cf52f32c09, I added this NewVL and checked that it had been set,
but I didn't use it for the VL of the splat.
2024-03-02 17:08:48 -08:00
Yeting Kuo
14d8c4563e
[RISCV] Add more intrinsics into canSplatOperand. (#83106)
This patch adds smin/smax/umin/umax/sadd_sat/ssub_sat/uadd_sat/usub_sat
into canSplatOperand. It can help llvm fold vv instructions with one
splat operand to vx instructions.
2024-02-29 12:57:34 +08:00
Luke Lau
9617da88ab
[RISCV] Use a ta vslideup if inserting over end of InterSubVT (#83230)
The description in #83146 is slightly inaccurate: it relaxes a tail
undisturbed vslideup to tail agnostic if we are inserting over the
entire tail of the vector **and** we didn't shrink the LMUL of the
vector being inserted into.

This handles the case where we did shrink down the LMUL via InterSubVT
by checking if we inserted over the entire tail of InterSubVT, the
actual type that we're performing the vslideup on, not VecVT.
2024-02-28 15:58:55 +08:00
Luke Lau
91d23370cd
[RISCV] Use a tail agnostic vslideup if possible for scalable insert_subvector (#83146)
If we know that an insert_subvector inserting a fixed subvector will
overwrite the entire tail of the vector, we use a tail agnostic
vslideup. This was added in https://reviews.llvm.org/D147347, but we can
do the same thing for scalable vectors too.

The `Policy` variable is defined in a slightly weird place but this is
to mirror the fixed length subvector code path as closely as possible. I
think we may be able to deduplicate them in future.
2024-02-28 10:26:54 +08:00
Luke Lau
2e564840e0
[RISCV] Use getVectorIdxConstant in RISCVISelLowering.cpp. NFC (#83019)
We use getVectorIdxConstant() in some places and getConstant(XLenVT) or
getIntPtrConstant() in others, but getVectorIdxTy() == getPointerTy() ==
XLenVT.

This refactors RISCVISelLowering to use the former for nodes that use
getVectorIdxTy(), i.e. INSERT_SUBVECTOR, EXTRACT_SUBVECTOR,
INSERT_VECTOR_ELT and EXTRACT_VECTOR_ELT, so that we're consistent.
2024-02-27 10:34:04 +08:00
Yeting Kuo
e510fc7753
[VP][RISCV] Introduce vp.lrint/llrint and RISC-V support. (#82627)
RISC-V implements vector lrint/llrint by vfcvt.x.f.v.
2024-02-26 16:37:41 +08:00
Yeting Kuo
850dde063b
[RISCV][VP] Introduce vp saturating addition/subtraction and RISC-V support. (#82370)
This patch also pick the MatchContext framework from DAGCombiner to an
indiviual header file to make the framework be used from other files in
llvm/lib/CodeGen/SelectionDAG/.
2024-02-23 14:17:15 +08:00
Luke Lau
2d50703ddd [RISCV] Use RISCVSubtarget::getRealVLen() in more places. NFC
Catching a couple of more places where we can use the new query added in
8603a7b2.
2024-02-23 12:47:28 +08:00
Craig Topper
de41eae41f
[SelectionDAG][RISCV] Use FP type for legality query for LRINT/LLRINT in LegalizeVectorOps. (#82728)
This matches how LRINT/LLRINT is queried for scalar types in
LegalizeDAG.

It's confusing if they do different things since a "Legal" vector
LRINT/LLRINT would get through to LegalizeDAG which would then consider
it illegal. This doesn't happen currently because RISC-V uses Custom.
2024-02-22 20:18:52 -08:00
Philip Reames
ac518c7c99
[RISCV] Vector sub (zext, zext) -> sext (sub (zext, zext)) (#82455)
This is legal as long as the inner zext retains at least one bit of
increase so that the sub overflow case (0 - UINT_MAX) can be
represented. Alive2 proof: https://alive2.llvm.org/ce/z/BKeV3W

For RVV, restrict this to power of two sizes with the operation type
being at least e8 to stick to legal extends. We could arguably handle i1
source types with some care if we wanted to.

This is likely profitable because it may allow us to perform the sub
instruction in a narrow LMUL (equivalently, in fewer DLEN-sized pieces)
before widening for the user. We could arguably avoid narrowing below
DLEN, but the transform should at worst introduce one extra extend and
one extra vsetvli toggle if the source could previously be handled via
loads explicit w/EEW.
2024-02-22 16:17:48 -08:00
Yingwei Zheng
0107c8824b
[RISCV][SDAG] Improve codegen of select with constants if zicond is available (#82456)
This patch uses `add + czero.eqz/nez` to lower select with constants if
zicond is available.
```
(select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1)
(select c, c1, c2) -> (add (czero_eqz c1 - c2, c), c2)
```
The above code sequence is suggested by [RISCV Optimization
Guide](https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html#_avoid_branches_using_conditional_moves).
2024-02-23 00:18:56 +08:00
Luke Lau
edd4aee4dd
[RISCV] Compute integers once in isSimpleVIDSequence. NFCI (#82590)
We need to iterate through the integers twice in isSimpleVIDSequence, so
instead of computing them twice just compute them once at the start.

This also replaces the individual checks that each element is constant
with a single call to BuildVectorSDNode::isConstant.
2024-02-22 15:57:57 +08:00
Luke Lau
815644b4dd
[RISCV] Fix mgather -> riscv.masked.strided.load combine not extending indices (#82506)
This fixes the miscompile reported in #82430 by telling
isSimpleVIDSequence to sign extend to XLen instead of the width of the
indices, since the "sequence" of indices generated by a strided load
will be at XLen.

This was the simplest way I could think of getting isSimpleVIDSequence
to treat the indexes as if they were zero extended to XLenVT.

Another way we could do this is by refactoring out the "get constant
integers" part from isSimpleVIDSequence and handle them as APInts so we
can separately zero extend it.

Fixes #82430
2024-02-22 11:50:27 +08:00