20 Commits

Author SHA1 Message Date
Yeting Kuo
87af9ee870
[RISCV] Use experimental.vp.splat to splat specific vector length elements. (#101329)
Previously, it was hard to create a scalable vector splat with a
specific vector length in LLVM IR, so we used riscv.vmv.v.x and
riscv.vmv.v.f to do this work. But the two RVV intrinsics have strict
type constraints and cannot support fixed vector types or illegal vector
types. Using vp.splat preserves the old functionality and also generates
more optimized code for vector types and illegal vectors.
This patch also fixes a crash caused by getEVT not handling ptr types.
2024-08-01 09:37:42 +08:00
Piotr Fusik
f0ac8903ea
[RISCV][NFC] Fix intrinsic misspelled in a comment (#98998) 2024-07-17 07:07:33 +02:00
Luke Lau
563ae62095
[RISCV] Don't expand zero stride vp.strided.load if SEW>XLEN (#98924)
A splat of a <n x i64> on RV32 will get lowered as a zero strided load
anyway (and won't match any .vx splat patterns), so don't expand it to a
scalar load + splat to avoid writing it to the stack.
2024-07-16 10:29:53 +08:00
Luke Lau
d5f4f084d2
[RISCV] Always expand zero strided vp.strided.load (#98901)
This patch makes zero strided VP loads always be expanded to a scalar
load and splat even if +optimized-zero-stride-load is present.

Expanding it allows more .vx splat patterns to be matched, which is
needed to prevent regressions in #98111.

If the feature is present, RISCVISelDAGToDAG will combine it back to a
zero strided load.

The RV32 test diff also shows how we need to emit a zero strided load
either way after expanding an SEW=64 strided load. We could maybe fix
this in a later patch by not doing the expand if SEW>XLEN.
2024-07-15 23:54:00 +08:00
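The equivalence that the zero strided load commits rely on can be sketched in plain C++. This is only a scalar model, not the LLVM implementation: `stridedLoad` and `splatOfScalarLoad` are hypothetical names, the element type is fixed to 32 bits, and the mask and EVL operands of vp.strided.load are ignored (the sketch assumes every lane is active).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a strided load of n 32-bit elements.
// strideBytes is the distance in bytes between consecutive elements.
inline std::vector<int32_t> stridedLoad(const unsigned char *base,
                                        std::ptrdiff_t strideBytes,
                                        std::size_t n) {
  std::vector<int32_t> out(n);
  for (std::size_t i = 0; i < n; ++i)
    std::memcpy(&out[i], base + static_cast<std::ptrdiff_t>(i) * strideBytes,
                sizeof(int32_t));
  return out;
}

// Hypothetical model of the expanded form: one scalar load, then a splat.
inline std::vector<int32_t> splatOfScalarLoad(const unsigned char *base,
                                              std::size_t n) {
  int32_t scalar;
  std::memcpy(&scalar, base, sizeof(scalar));
  return std::vector<int32_t>(n, scalar);
}
```

With a stride of zero every lane reads the same address, so both functions return the same vector. That is why the expansion is value-preserving when all lanes are active.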
Yeting Kuo
94279ae4ca
[RISCV] Recommit "Expand vp.stride.load to splat of a scalar load." (#98579)
This is a recommit of #98140. The old commit needed to be rebased on
#98205, which changes the feature for the hardware zero stride
optimization.

It's a patch similar to a214c521f8763b36dd400b89017f74ad5ae4b6c7, for
vp.stride.load. Some targets prefer the pattern (vmv.v.x (load)) over
vlse with zero stride.
2024-07-15 16:09:56 +08:00
Nico Weber
cea7bad732
Revert "[RISCV] Expand vp.stride.load to splat of a scalar load." (#98422)
Reverts llvm/llvm-project#98140

Breaks tests, see comments on the PR.
2024-07-10 20:54:41 -04:00
Yeting Kuo
cda245a339
[RISCV] Expand vp.stride.load to splat of a scalar load. (#98140)
It's a patch similar to a214c521f8763b36dd400b89017f74ad5ae4b6c7, for
vp.stride.load. Some targets prefer the pattern (vmv.v.x (load)) over
vlse with zero stride.

It's the IR version of #97798.
2024-07-11 08:38:31 +08:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Craig Topper
9396891271 [RISCV] Don't look for sext in RISCVCodeGenPrepare::visitAnd.
We want to know the upper 33 bits of the And Input are zero. SExt
only guarantees they are the same.

We originally checked for SExt or ZExt when we were using
isImpliedByDomCondition because a ZExt may have been changed to SExt
before we visited the And.

We are no longer using isImpliedByDomCondition, so we only need to look
for zext with the nneg flag.

While here, switch to PatternMatch to simplify the code.

Fixes #78783
2024-01-19 14:44:47 -08:00
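The distinction drawn in the visitAnd commit, that sext only makes the upper 33 bits equal while the transform needs them to be zero, can be checked with a small standalone C++ sketch. The helper names are hypothetical and model the IR casts with ordinary integer conversions. They are not taken from RISCVCodeGenPrepare.

```cpp
#include <cstdint>

// Models (i64 (sext (i32 X))).
inline uint64_t sext32to64(int32_t x) {
  return static_cast<uint64_t>(static_cast<int64_t>(x));
}

// Models (i64 (zext (i32 X))).
inline uint64_t zext32to64(int32_t x) {
  return static_cast<uint64_t>(static_cast<uint32_t>(x));
}

// True when bits 63..31 are all zero.
inline bool upper33BitsZero(uint64_t v) { return (v >> 31) == 0; }

// True when bits 63..31 are all equal (all zero or all one).
inline bool upper33BitsSame(uint64_t v) {
  const uint64_t top = v >> 31; // the upper 33 bits
  return top == 0 || top == 0x1FFFFFFFFull;
}
```

For a negative input such as -5, `sext32to64` yields a value for which `upper33BitsSame` holds but `upper33BitsZero` does not. A zext of a non-negative input satisfies both.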
Luke Lau
15b0fabb21
[RISCV] Vectorize phi for loop carried @llvm.vector.reduce.fadd (#78244)
LLVM vector reduction intrinsics return a scalar result, but on RISC-V
vector reduction instructions write the result in the first element of a
vector register. So when a reduction in a loop uses a scalar phi, we end
up with unnecessary scalar moves:

loop:
    vfmv.s.f v10, fa0
    vfredosum.vs v8, v8, v10
    vfmv.f.s fa0, v8

This mainly affects ordered fadd reductions, which have a scalar accumulator
operand.
This tries to vectorize any scalar phis that feed into a fadd reduction
in RISCVCodeGenPrepare, converting:

loop:
    %phi = phi float [ ..., %entry ], [ %acc, %loop ]
    %acc = call float @llvm.vector.reduce.fadd.nxv2f32(float %phi, <vscale x 2 x float> %vec)

to

loop:
    %phi = phi <vscale x 2 x float> [ ..., %entry ], [ %acc.vec, %loop ]
    %phi.scalar = extractelement <vscale x 2 x float> %phi, i64 0
    %acc = call float @llvm.vector.reduce.fadd.nxv2f32(float %phi.scalar, <vscale x 2 x float> %vec)
    %acc.vec = insertelement <vscale x 2 x float> poison, float %acc, i64 0

This eliminates the scalar -> vector -> scalar crossing during
instruction selection.
2024-01-18 16:15:20 +07:00
Yingwei Zheng
d64d5ea102
[RISCV][CodeGenPrepare] Remove duplicated transform for zext. NFC. (#72053)
After #71534 and #72052, the transform `zext -> zext nneg` in
`RISCVCodeGenPrepare` is redundant.
2023-11-13 22:45:33 +08:00
Philip Reames
784a2cd561
[RISCV] Rewrite RISCVCodeGenPrepare using zext nneg [nfc-ish] (#70739)
This stacks on #70725. Once we have lowering for zext nneg, we can
rewrite all of the existing RISCVCodeGenPrepare logic in terms of zext
nneg instead of sext. The change isn't NFC from the perspective of the
individual pass, but should be from the perspective of codegen as a
whole.

As noted in the TODO, one piece can be moved to instcombine, but I'll
leave that to a separate commit.
2023-10-30 16:35:30 -07:00
Craig Topper
0f4c9c016c [RISCV] Replace RISCV->RISC-V in strings.
To be consistent with RISC-V branding guidelines
https://riscv.org/about/risc-v-branding-guidelines/
Think we should be using RISC-V where possible.

D146449 already updated comments. Strings may have more user impact.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D146451
2023-03-27 09:50:17 -07:00
Craig Topper
29463612d2 [RISCV] Replace RISCV -> RISC-V in comments. NFC
To be consistent with RISC-V branding guidelines
https://riscv.org/about/risc-v-branding-guidelines/
Think we should be using RISC-V where possible.

More patches will follow.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D146449
2023-03-27 09:50:17 -07:00
Craig Topper
37db283362 [RISCV] isImpliedByDomCondition returns an Optional<bool> not a bool.
We were incorrectly checking that it returned an implication result,
not that the implication result itself was true.
2022-08-12 22:21:05 -07:00
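The bug class fixed by the Optional<bool> commit can be illustrated outside LLVM. This sketch uses `std::optional<bool>` as a stand-in for the `Optional<bool>` returned by isImpliedByDomCondition. Both `buggyCheck` and `fixedCheck` are hypothetical names, not code from the pass.

```cpp
#include <optional>

// Assumed convention for the stand-in result:
//   std::nullopt -> nothing could be proven
//   true         -> the condition is implied to be true
//   false        -> the condition is implied to be false

// Buggy pattern: only tests that *some* answer came back,
// so a proven-false result is treated as success.
inline bool buggyCheck(const std::optional<bool> &implied) {
  return implied.has_value();
}

// Fixed pattern: requires an answer and requires that answer to be true.
inline bool fixedCheck(const std::optional<bool> &implied) {
  return implied.value_or(false);
}
```

The two checks agree for "unknown" and "implied true" and differ only for "implied false", which is the case the commit message describes.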
Craig Topper
f19497f7b0 [RISCV] Use InstVisitor in RISCVCodeGenPrepare. NFC
Makes it easy to add new instructions to look at without dispatching
manually.
2022-08-02 21:19:30 -07:00
Craig Topper
1db6d6dcd8 [RISCV] Teach RISCVCodeGenPrepare to optimize (zext (abs(i32 X, i1 1))).
(abs(i32 X, i1 1)) always produces a non-negative result. The 'i1 1'
means an INT_MIN input produces poison. If the result is sign extended,
InstCombine will convert it to zext. This does not produce ideal
code for RISC-V.

This patch reverses the zext back to sext which can be folded
into a subw or negw. Ideally we'd do this in SelectionDAG, but
we lose the INT_MIN poison flag when llvm.abs becomes ISD::ABS.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130412
2022-07-25 09:36:41 -07:00
Craig Topper
8cc483099a [RISCV] Teach RISCVCodeGenPrepare to optimize (i64 (and (zext/sext (i32 X), C1)))
If X is known positive by a dominating condition, we can fill ones
into the upper bits of C1 if that would allow it to become a
simm12, allowing the use of ANDI.

This pattern often occurs in unrolled loops where the induction
variable has been widened.

To get the best benefit from this, I had to move the pass above
ConstantHoisting which is in addIRPasses. Otherwise the AND constant
is often hoisted away from the AND.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129888
2022-07-17 11:00:56 -07:00
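The constant rewrite described in the ANDI commit can be sketched as standalone arithmetic. This is a minimal model under stated assumptions, not the pass itself: `isSimm12` and `widenAndMask` are hypothetical helpers, C1 is assumed to fit in the low 32 bits, and the caller is assumed to have already proven that X is non-negative.

```cpp
#include <cstdint>
#include <optional>

// True if v fits in a 12-bit signed immediate, the range ANDI accepts.
inline bool isSimm12(int64_t v) { return v >= -2048 && v <= 2047; }

// Hypothetical helper: returns the replacement mask with ones filled into
// the upper 32 bits, or nullopt when the rewrite does not apply or help.
inline std::optional<int64_t> widenAndMask(uint64_t c1) {
  if ((c1 >> 32) != 0)
    return std::nullopt; // only masks confined to the low 32 bits qualify
  if (isSimm12(static_cast<int64_t>(c1)))
    return std::nullopt; // already encodable, nothing to gain
  const uint64_t filled = c1 | 0xFFFFFFFF00000000ull;
  const int64_t asSigned = static_cast<int64_t>(filled);
  if (!isSimm12(asSigned))
    return std::nullopt; // still not encodable after filling ones
  return asSigned;
}
```

For example, the mask 0xFFFFFFF0 becomes -16, which fits in a simm12. Because the extended X has zeros in its upper bits, ANDing with either constant gives the same result.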
Craig Topper
73f766ca9a [RISCV] Remove unnecessary use of IRBuilder from RISCVCodeGenPrepare.
We're creating a single instruction to replace another instruction.
We can insert using the InsertBefore operand of the constructor.
Then copy the debug location.
2022-07-17 10:59:54 -07:00
Craig Topper
1a8468ba61 [RISCV] Add a RISCV specific CodeGenPrepare pass.
The initial optimization is to convert (i64 (zext (i32 X))) to
(i64 (sext (i32 X))) if the dominating condition for the basic block
guarantees the sign bit of X is zero.

This frequently occurs in loop preheaders where a signed induction
variable that can never be negative has been widened. There will be
a dominating check that the 32-bit trip count isn't negative or zero.
The check here is not restricted to that specific case though.

An i32->i64 sext is cheaper than zext on RV64 without the Zba
extension. Later optimizations can often remove the sext from the
preheader basic block because the dominating block also needs a sext to
evaluate the greater than 0 check.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129732
2022-07-14 10:20:59 -07:00
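The soundness condition for the initial zext to sext rewrite can be checked with a short C++ sketch. The function names are hypothetical and model the IR casts with ordinary integer conversions. The sketch only demonstrates the value equivalence, not the dominating-condition analysis the pass performs.

```cpp
#include <cstdint>

// Models (i64 (zext (i32 X))).
inline uint64_t zext32to64(int32_t x) {
  return static_cast<uint64_t>(static_cast<uint32_t>(x));
}

// Models (i64 (sext (i32 X))).
inline uint64_t sext32to64(int32_t x) {
  return static_cast<uint64_t>(static_cast<int64_t>(x));
}

// The two extensions agree exactly when the sign bit of X is zero,
// which is the fact the dominating condition has to establish.
inline bool zextEqualsSext(int32_t x) {
  return zext32to64(x) == sext32to64(x);
}
```

For a negative X the two results differ in the upper 32 bits, so the rewrite must not fire without the non-negativity proof.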