26 Commits

Author SHA1 Message Date
Luke Lau
cc7e83601d
[RISCV] Select mask operands as virtual registers and eliminate uses of vmv0 (#125026)
This is another attempt at #88496 to keep mask operands in SSA after
instruction selection.

Previously we selected the mask operands into vmv0, a singleton register
class with exactly one register, V0.

But the register allocator doesn't really support singleton register
classes and we ran into errors like "ran out of registers during
register allocation in function".

This patch avoids the problem by introducing a pass just before register
allocation that converts any use of vmv0 into a copy to $v0, i.e. what
isel does today.

That way the register allocator doesn't need to deal with the singleton
register class, but we get the benefits of having the mask registers in
SSA throughout the backend:

- This allows RISCVVLOptimizer to reduce the VLs of instructions that
define mask registers
- It enables CSE and code sinking in more places
- It removes the need to peek through mask copies in RISCVISelDAGToDAG
and keep track of V0 defs in RISCVVectorPeephole
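The elimination step described above can be sketched as a toy model. It does not use LLVM's MachineInstr API: `Operand`, `Instr` and `eliminateVMV0Uses` are hypothetical. Each vmv0 use is rewritten into a `$v0 = COPY %m` placed just before its user, and the user then reads $v0 directly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR, not LLVM's MachineInstr.
struct Operand {
  std::string reg;    // e.g. "%m" (virtual) or "$v0" (physical)
  bool vmv0 = false;  // true if this use is constrained to the vmv0 class
};

struct Instr {
  std::string def;  // register defined by the instruction
  std::string opcode;
  std::vector<Operand> uses;
};

// Rewrite each vmv0 use into "$v0 = COPY %m" placed right before the user,
// and make the user read $v0 directly. Masked RVV instructions have at most
// one mask operand, so at most one COPY is emitted per instruction.
std::vector<Instr> eliminateVMV0Uses(const std::vector<Instr> &block) {
  std::vector<Instr> out;
  for (Instr mi : block) {
    for (Operand &op : mi.uses) {
      if (!op.vmv0)
        continue;
      out.push_back(Instr{"$v0", "COPY", {Operand{op.reg}}});
      op = Operand{"$v0"};
    }
    out.push_back(mi);
  }
  return out;
}
```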

This patch initially eliminates uses of vmv0s after RISCVVectorPeephole
to keep the diff to a minimum, and a follow up patch will move it past
the other MachineInstr SSA passes.

Note that it doesn't try to remove any defs of vmv0, as no instruction
should have a vmv0 output.

As a further follow up, we can move the elimination pass to after phi
elimination and outside of SSA, which would unblock the pre-RA scheduler
around masked pseudos. This might also help with the issue that
RISCVVectorMaskDAGMutation tries to solve.
2025-02-12 12:06:55 +08:00
Luke Lau
c8ee1164bd
[RISCV] Fix masked->unmasked peephole handling masked pseudos with no passthru (#122253)
Some masked pseudos like PseudoVCPOP_M_B8_MASK don't have a passthru,
but in the masked->unmasked peephole we assumed the masked pseudo always
had one.

This checks for a passthru first and fixes #122245.
2025-01-09 19:54:37 +08:00
Kazu Hirata
82d5dd28b4
[RISCV] Remove unused includes (NFC) (#115814)
Identified with misc-include-cleaner.
2024-11-11 22:54:54 -08:00
Michael Maitland
ae68d532f8
[RISCV][VLOPT] Allow propagation even when VL isn't VLMAX (#112228)
The original goal of this pass was to focus on vector operations with
VLMAX. However, users often utilize only part of the result, and such
usage may come from the vectorizer.

We found that relaxing this constraint can capture more optimization
opportunities, such as non-power-of-2 code generation and vector
operation sequences with different VLs.

---------

Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
2024-10-16 14:58:00 -04:00
Luke Lau
d15e53c1e4
[RISCV] Check vmerge's true is in same block in vmerge -> vmv.v.v peephole (#110861)
The peepholes in RISCVVectorPeephole need to be local and we were
failing to check if the true operand was in the same block as the
vmerge.

Fixes #110832
2024-10-03 12:02:54 +08:00
Philip Reames
12530015a4
[RISCV] Add reductions to list of roots in tryToReduceVL (#107595)
This allows us to reduce VLs feeding reduction instructions. In
particular, this means that <3 x Ty> reduce(load)-like sequences no
longer require a VL toggle.

This was waiting on 3d72957; now that the latent correctness issue is
fixed, we can expand this transform.
2024-09-10 18:47:20 -07:00
Luke Lau
7ba6768df8 Revert "[RISCV] Update V0Defs after moving Src in peepholes (#107359)"
This fixes #107950 and adds a test case for it. The issue was due to
us incorrectly assuming that we stored a V0Defs entry for every single
instruction.

We actually only store them for instructions that use V0, so when we
updated the V0Def after moving we sometimes ended up copying nullptr
over from an instruction that doesn't use V0 and clearing the V0Def
entry inadvertently.

Because we don't have V0Defs on instructions that don't use V0, the
FIXME was never actually needed in the first place since the
bookkeeping wasn't out of sync to begin with.

That commit also mentioned that a future unmasked to masked pseudo
peephole might need unmasked pseudos to have V0Defs entries, but after
working on this locally it turns out we don't.

This reverts commit ce3648094d44e8c098396a353b215acecb363cda.
2024-09-10 13:26:07 +08:00
Luke Lau
b71d88ca5b [RISCV] Constrain passthru regclass in vmerge -> vmv peephole
In #107827 we now set true's passthru to the false operand if it was
undef. We need to remember to also constrain the regclass in case true
is a masked pseudo, which needs its passthru to be in VR[M*]NoV0.
2024-09-10 13:26:07 +08:00
Luke Lau
111932d5ca
[RISCV] Fix same mask vmerge peephole discarding false operand (#107827)
This fixes the issue raised in
https://github.com/llvm/llvm-project/pull/106108#discussion_r1749677510

True's passthru needs to be equivalent to vmerge's false, but we also
allow true's passthru to be undef.

However if it's undef then we need to replace it with false, otherwise
we end up discarding the false operand entirely.
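The miscompile is easy to see in a small element-wise model. This is a hypothetical sketch, not LLVM code. Lanes are plain `int`s and VL covers the whole vector. An undef passthru is modelled as `std::nullopt`, and its inactive lanes read back as the stand-in value `kUndef`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Vec = std::vector<int>;
using Mask = std::vector<bool>;

constexpr int kUndef = -999;  // stand-in value for an undefined lane

// Masked add: active lanes get a + b; inactive lanes come from the passthru.
Vec maskedAdd(const std::optional<Vec> &passthru, const Vec &a, const Vec &b,
              const Mask &m) {
  Vec r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    r[i] = m[i] ? a[i] + b[i] : (passthru ? (*passthru)[i] : kUndef);
  return r;
}

// vmerge.vvm with VL covering the whole vector: mask picks true, else false.
Vec vmerge(const Vec &falseOp, const Vec &trueOp, const Mask &m) {
  Vec r(falseOp.size());
  for (std::size_t i = 0; i < falseOp.size(); ++i)
    r[i] = m[i] ? trueOp[i] : falseOp[i];
  return r;
}
```

Dropping the vmerge while true keeps its undef passthru turns the masked-off lanes into `kUndef`. Setting true's passthru to false reproduces the vmerge result exactly.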

The changes in fixed-vectors-strided-load-store-asm.ll undo the changes
in #106108 where we introduced this miscompile.
2024-09-09 22:45:44 +08:00
Luke Lau
2949720c2e
[RISCV] Move vmerge same mask peephole to RISCVVectorPeephole (#106108)
We currently fold a vmerge.vvm into its true operand if the true operand
is a masked pseudo with the same mask.

We can move this over to RISCVVectorPeephole by instead splitting it up
into a smaller peephole which converts it to a vmv.v.v first. The
existing foldVMV_V_V peephole will then take care of folding it if
needed.

This is very similar to the existing all-ones mask peephole and we could
potentially do it inside of it. I opted to put it in a separate peephole
to make it easier to reason about, given that the duplication is small,
but I could be persuaded either way.
2024-09-06 08:59:13 +08:00
Luke Lau
ce3648094d
[RISCV] Update V0Defs after moving Src in peepholes (#107359)
If we move a pseudo in tryReduceVL or foldVMV_V_V via ensureDominates,
its V0 definition may have changed so we need to update V0Defs.

This shouldn't have any functional change today since any pseudo which
uses V0 won't be able to move past a new definition.

However this will matter if we add a peephole to convert unmasked
pseudos to masked pseudos and add a use of V0.
2024-09-06 00:31:01 +08:00
Luke Lau
3d729571fd
[RISCV] Model dest EEW and fix peepholes not checking EEW (#105945)
Previously for vector peepholes that fold based on VL, we checked if the
VLMAX is the same as a proxy to check that the EEWs were the same. This
only worked at LMUL >= 1, where the EMULs of the Src output and the
user's input had to be the same since their register classes needed to
match.

At fractional LMULs we would have incorrectly folded something like
this:

    %x:vr = PseudoVADD_VV_MF4 $noreg, $noreg, $noreg, 4, 4 /* e16 */, 0
    %y:vr = PseudoVMV_V_V_MF8 $noreg, %x, 4, 3 /* e8 */, 0

This models the EEW of the destination operands of vector instructions
with a TSFlag, which is enough to fix the incorrect folding.
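To see why the VLMAX proxy accepted the example above, recall that VLMAX = LMUL * VLEN / SEW. The SEW operands `4` and `3` in the MIR are log2 encodings of e16 and e8. In the hypothetical sketch below (`VecOp` is illustrative only), MF4 at e16 and MF8 at e8 both get VLMAX = VLEN / 64, even though their EEWs differ.

```cpp
#include <cassert>

// Hypothetical model of a vector pseudo's destination.
struct VecOp {
  unsigned lmulNum, lmulDen;  // LMUL = lmulNum / lmulDen, e.g. MF4 = 1/4
  unsigned eew;               // destination element width in bits
};

// VLMAX = LMUL * VLEN / SEW, computed without floating point.
unsigned vlmax(const VecOp &op, unsigned vlen) {
  return vlen * op.lmulNum / (op.lmulDen * op.eew);
}

// Old proxy check: equal VLMAX was taken to mean equal EEW.
bool sameVLMAX(const VecOp &a, const VecOp &b, unsigned vlen) {
  return vlmax(a, vlen) == vlmax(b, vlen);
}

// What the fold actually needs: the EEWs themselves must match.
bool sameEEW(const VecOp &a, const VecOp &b) { return a.eew == b.eew; }
```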

There's some overlap with the TargetOverlapConstraintType and
IsRVVWideningReduction. If we model the source operands as well we may
be able to subsume them.
2024-09-05 15:27:48 +08:00
Luke Lau
aad6997764
[RISCV] Fold PseudoVMV_V_V with undef passthru, handling policy (#106943)
If a vmv.v.v has an undef passthru then we can just replace it with its
input operand, since the tail is completely undefined.

This is a reattempt of #106840, but also checks to see if the input was
a pseudo where we can relax its tail policy to undef.

This also means we don't need to check for undef passthrus in
foldVMV_V_V anymore because they will be handled by
foldUndefPassthruVMV_V_V.
2024-09-05 12:04:33 +08:00
Luke Lau
6c607cfb2c
[RISCV] Preserve tail agnostic policy in foldVMV_V_V (#105788)
This patch helps avoid regressions in an upcoming patch by making sure
we don't accidentally lose a tail agnostic policy when folding a vmv.v.v
into its source.

The previous comment about RISCVInsertVSETVLI relaxing the policy didn't
take into account the fact that there's a policy operand on vmv.v.v,
which can be tail agnostic.

If the tail is agnostic (via either the policy operand or the passthru
being undef) and vmv.v.v's VL <= Src's VL, then Src's tail can be made
agnostic.
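The rule in the last paragraph can be written as a predicate. This is a hypothetical sketch over a summary struct, not the actual MachineInstr operands. The intuition is that when vmv.v.v's VL <= Src's VL, every element of Src's tail also lies in vmv.v.v's agnostic tail.

```cpp
#include <cassert>

// Hypothetical summary of the vmv.v.v operands the rule looks at.
struct VmvInfo {
  bool passthruIsUndef;
  bool policyTailAgnostic;  // TA bit of the policy operand
  unsigned vl;
};

// Src's tail [srcVL, VLMAX) lies inside vmv.v.v's tail [vl, VLMAX) when
// vl <= srcVL; if that tail is agnostic, its elements may be left
// undisturbed or overwritten with all 1s, so Src's tail can be agnostic too.
bool canMakeSrcTailAgnostic(const VmvInfo &mv, unsigned srcVL) {
  bool tailAgnostic = mv.policyTailAgnostic || mv.passthruIsUndef;
  return tailAgnostic && mv.vl <= srcVL;
}
```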
2024-09-04 12:57:09 +08:00
Luke Lau
58e1c0e416
[RISCV] Discard the false operand in vmerge.vvm -> vmv.v.v peephole (#106688)
vmerge.vvm needs to have an all ones mask, so nothing is taken from the
false operand. So instead of checking that the passthru is the same as
false, just use the passthru directly for the tail elements.
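A small element-wise model makes the equivalence concrete. It is a hypothetical sketch that assumes tail-undisturbed semantics, so elements at or past VL come from the passthru. With an all-ones mask the false operand is never read. The vmerge.vvm then produces exactly what a vmv.v.v of the true operand over the same passthru produces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// vmerge.vvm: body lanes [0, vl) pick true where the mask is set, else false;
// tail lanes [vl, n) are taken from the passthru (tail undisturbed).
Vec vmergeVVM(const Vec &passthru, const Vec &falseOp, const Vec &trueOp,
              const std::vector<bool> &mask, std::size_t vl) {
  Vec r = passthru;
  for (std::size_t i = 0; i < vl; ++i)
    r[i] = mask[i] ? trueOp[i] : falseOp[i];
  return r;
}

// vmv.v.v: body lanes copy the source; tail lanes come from the passthru.
Vec vmvVV(const Vec &passthru, const Vec &src, std::size_t vl) {
  Vec r = passthru;
  for (std::size_t i = 0; i < vl; ++i)
    r[i] = src[i];
  return r;
}
```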

This supersedes the convertVMergeToVMv part of #105788, as noted in
https://github.com/llvm/llvm-project/pull/105788/files#r1731683971
2024-08-31 13:20:53 +08:00
Luke Lau
0efa38699a
[RISCV] Check VL dominates and potentially move in tryReduceVL (#106753)
Similar to what we do in foldVMV_V_V with the passthru, if we end up
changing the Src's VL in tryReduceVL we need to make sure it dominates.

Fixes #106735
2024-08-31 01:50:24 +08:00
Luke Lau
dbbfc952f0
[RISCV] Separate ActiveElementsAffectResult into VL and Mask flags (#106517)
In #106110 we had to mark v[f]slide1down.vx as
ActiveElementsAffectResult since the elements in the body depend on VL.
However it doesn't depend on the mask, so this was overly conservative
and broke the vmerge peephole.

We can recover this by splitting up ActiveElementsAffectResult into VL
and Mask bits, so we can more accurately model v[f]slide1down.vx and
re-enable the peephole.
2024-08-30 07:46:06 +08:00
Luke Lau
c073821142
[RISCV] Reduce VL of vmerge.vvm's true operand (#105786)
This extends the peephole added in #104689 to also reduce the VL of a
PseudoVMERGE_VVM's true operand.

We could extend this later to reduce the false operand as well, but this
starts with just the true operand since it allows vmerges that are
converted to vmv.v.vs (convertVMergeToVMv) to be potentially further
folded into their source (foldVMV_V_V).
2024-08-27 01:11:46 +08:00
Luke Lau
be5ecc35ef
[RISCV] Don't move source if passthru already dominates in vmv.v.v peephole (#105792)
Currently we move the source down to where the vmv.v.v is to make sure
that the new passthru dominates it, but we do this even if the passthru
already dominates it.

This adds a simple local dominance check (taken from
X86FastPreTileConfig.cpp) and skips the move when it isn't needed.

It also modifies the move to only move it to just past the passthru
definition, and not all the way down to the vmv.v.v.
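Below is a simplified model of the check and the shorter move. It treats a basic block as an ordered list of uniquely named instructions, and the names are hypothetical. The sketch assumes the move is otherwise legal, e.g. nothing between the old and new positions uses Src. Within a single block, A dominates B exactly when A comes first.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: a basic block as an ordered list of unique names.
using Block = std::vector<std::string>;

// Local dominance: within one block, A dominates B iff A comes before B.
bool dominates(const Block &bb, const std::string &a, const std::string &b) {
  return std::find(bb.begin(), bb.end(), a) <
         std::find(bb.begin(), bb.end(), b);
}

// Make the passthru's definition dominate Src. If it already does, nothing
// moves; otherwise Src moves to just past the passthru definition rather
// than all the way down to the vmv.v.v. Both names must be in the block.
// Returns true if Src was moved.
bool ensureDominates(Block &bb, const std::string &passthruDef,
                     const std::string &src) {
  if (dominates(bb, passthruDef, src))
    return false;
  bb.erase(std::find(bb.begin(), bb.end(), src));
  bb.insert(std::find(bb.begin(), bb.end(), passthruDef) + 1, src);
  return true;
}
```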

This allows folding to succeed in some edge cases, which prevents
regressions in an upcoming patch.
2024-08-24 20:14:28 +08:00
Philip Reames
26a8a857dc
[RISCV] Introduce local peephole to reduce VLs based on demanded VL (#104689)
This is a fairly narrow transform (at the moment) to reduce the VLs of
instructions feeding a store with a smaller VL. Note that the goal of
this transform isn't really to reduce VL - it's to reduce VL *toggles*.
To our knowledge, small reductions in VL without also changing LMUL are
generally not profitable on existing hardware.

For a single-use instruction without side effects, FP exceptions, or a
result dependency on VL, reducing VL is legal if only a subset of its
elements are demanded. We'd already implemented this logic for vmv.v.v,
and this patch simply applies it to stores as an alternate root.
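The legality conditions listed above can be summarised as a predicate. This is a hypothetical sketch: `SrcInfo` and `reducedVL` are illustrative names, not RISCVVectorPeephole's API. The demanded VL stands for the VL of the root instruction, e.g. the store.

```cpp
#include <cassert>
#include <optional>

// Hypothetical summary of the properties the legality check cares about.
struct SrcInfo {
  unsigned numUses;
  bool hasSideEffects;
  bool mayRaiseFPException;
  bool resultDependsOnVL;  // e.g. v[f]slide1down.vx, whose body depends on VL
  unsigned vl;
};

// New VL for Src when its only user demands just the first demandedVL
// elements, or nullopt if shrinking VL is illegal or wouldn't change it.
std::optional<unsigned> reducedVL(const SrcInfo &src, unsigned demandedVL) {
  if (src.numUses != 1 || src.hasSideEffects || src.mayRaiseFPException ||
      src.resultDependsOnVL)
    return std::nullopt;
  if (demandedVL >= src.vl)
    return std::nullopt;
  return demandedVL;
}
```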

Longer term, I plan to extend this to other root instructions (e.g.
different kinds of stores, reductions, etc.), and add a more general
recursive walkback through operands.

One risk with the dataflow based approach is that we could be reducing
VL of an instruction scheduled in a region with the wider VL (i.e. mixed
mode computations) forcing an additional VL toggle. An example of this
is the @insert_subvector_dag_loop test case, but it doesn't appear to
happen widely. I think this is a risk we should accept.
2024-08-22 07:34:41 -07:00
Luke Lau
aba3476111
[RISCV] Move vmv.v.v peephole from SelectionDAG to RISCVVectorPeephole (#100367)
This is split off from #71764, and moves only the vmv.v.v part of
performCombineVMergeAndVOps to work on MachineInstrs.

In retrospect trying to handle PseudoVMV_V_V and PseudoVMERGE_VVM in the
same function makes the code quite hard to read, so this just does it in
a separate peephole.

This turns out to be simpler since for PseudoVMV_V_V we don't need to
convert the Src instruction to a masked variant, and we don't need to
create a fake all ones mask.
2024-08-17 00:49:27 +08:00
Luke Lau
b1542afd0b
[RISCV] Rename merge operand -> passthru. NFC (#100330)
We sometimes call the first tied dest operand in vector pseudos the
merge operand, and other times the passthru.

Passthru seems to be more common, and it's what the C intrinsics call
it[^1], so this renames all usages of merge to passthru to be
consistent. It also helps prevent confusion with vmerge.vvm in some of
the peephole optimisations.

[^1]:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc#the-passthrough-vd-argument-in-the-intrinsics
2024-07-30 17:47:00 +08:00
Luke Lau
6f65a39785 [RISCV] Update RISCVVectorPeephole pass name
It was previously called RISCVFoldMasks
2024-07-26 10:33:38 +08:00
Luke Lau
754dc9ff5a
[RISCV] Move exact VLEN VLMAX transform to RISCVVectorPeephole (#100551)
We can teach RISCVVectorPeephole to detect when an AVL is equal to the
VLMAX when the exact VLEN is known and use the VLMAX sentinel instead,
and in doing so remove the need for getVLOp in RISCVISelLowering. This
keeps all the VLMAX logic in one place.
2024-07-26 07:56:12 +08:00
Luke Lau
b91c75fcae
[RISCV] Add unit strided load/store to whole register peephole (#100116)
This adds a new vector peephole that converts unmasked, VLMAX
vleN.v/vseN.v to their whole register equivalents.

It replaces the existing tablegen patterns on ISD::LOAD/ISD::STORE and
is a bit more general since it also catches VP loads and stores and
@llvm.riscv intrinsics.

The heavy lifting of detecting a VLMAX AVL and an all-ones mask is
already taken care of by existing peepholes.
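The mapping this peephole targets can be sketched as follows. In the RVV specification, whole-register loads are named `vl<nf>re<eew>.v` and whole-register stores `vs<nf>r.v`. They exist only for register groups of 1, 2, 4 or 8 registers, so fractional LMULs have no equivalent. The helper below is hypothetical and only computes the mnemonic.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: mnemonic of the whole-register equivalent of a
// unit-stride vleN.v / vseN.v, or nullopt if there isn't one.
// log2LMUL ranges over -3..3 (mf8 .. m8).
std::optional<std::string> wholeRegEquivalent(bool isStore, unsigned eew,
                                              int log2LMUL, bool masked,
                                              bool avlIsVLMAX) {
  // Only an unmasked access covering all of VLMAX touches the whole group.
  if (masked || !avlIsVLMAX)
    return std::nullopt;
  // Whole-register accesses need a group of 1, 2, 4 or 8 registers.
  if (log2LMUL < 0 || log2LMUL > 3)
    return std::nullopt;
  std::string nf = std::to_string(1 << log2LMUL);
  if (isStore)
    return "vs" + nf + "r.v";  // whole-register stores carry no EEW
  return "vl" + nf + "re" + std::to_string(eew) + ".v";
}
```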
2024-07-24 11:52:54 +08:00
Luke Lau
c74ba57e0b
[RISCV] Convert AVLs with vlenb to VLMAX where possible (#97800)
Given an AVL that's computed from vlenb, if it's equal to VLMAX then we
can replace it with the VLMAX sentinel value.
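One way to see when such an AVL equals VLMAX starts from VLEN = 8 * vlenb. That gives VLMAX = VLEN * LMUL / SEW = vlenb * 2^(3 + log2(LMUL) - log2(SEW)). An AVL of the form vlenb * 2^k therefore equals VLMAX for every VLEN exactly when k = 3 + log2(LMUL) - log2(SEW). The sketch below is hypothetical and not necessarily how RISCVVectorPeephole recognises the AVL. It includes concrete evaluators to sanity-check the formula.

```cpp
#include <cassert>

// AVL = vlenb * 2^avlShift (a negative shift means a right shift).
// VLEN = 8 * vlenb, so VLMAX = VLEN * LMUL / SEW
//                            = vlenb * 2^(3 + log2LMUL - log2SEW).
bool avlIsVLMAX(int avlShift, int log2LMUL, int log2SEW) {
  return avlShift == 3 + log2LMUL - log2SEW;
}

// Concrete evaluators, used only to sanity-check the formula.
unsigned shiftBy(unsigned x, int s) { return s >= 0 ? x << s : x >> -s; }

unsigned vlmax(unsigned vlen, int log2LMUL, int log2SEW) {
  return shiftBy(vlen, log2LMUL) >> log2SEW;
}

unsigned avlFromVlenb(unsigned vlen, int avlShift) {
  return shiftBy(vlen / 8, avlShift);
}
```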

The main motivation is to be able to express an EVL of VLMAX in VP
intrinsics whilst emitting vsetvli a0, zero, so that we can replace
llvm.riscv.masked.strided.{load,store} with their VP counterparts.

This is done in RISCVVectorPeephole (previously RISCVFoldMasks, renamed
to account for the fact that it no longer just folds masks) instead of
SelectionDAG since there are multiple places where VP nodes are
lowered that would have needed to be handled.

This also avoids doing it in RISCVInsertVSETVLI as it's much harder to
look up the value of the AVL, and in RISCVVectorPeephole we can take
advantage of DeadMachineInstrElim to remove any leftover
PseudoReadVLENBs.
2024-07-11 14:22:00 +08:00