llvm-project

Author	SHA1	Message	Date
Luke Lau	cc7e83601d	[RISCV] Select mask operands as virtual registers and eliminate uses of vmv0 (#125026 ) This is another attempt at #88496 to keep mask operands in SSA after instruction selection. Previously we selected the mask operands into vmv0, a singleton register class with exactly one register, V0. But the register allocator doesn't really support singleton register classes and we ran into errors like "ran out of registers during register allocation in function". This patch avoids that by introducing a pass just before register allocation that converts any use of vmv0 to a copy to $v0, i.e. what isel does today. That way the register allocator doesn't need to deal with the singleton register class, but we get the benefits of having the mask registers in SSA throughout the backend: - This allows RISCVVLOptimizer to reduce the VLs of instructions that define mask registers - It enables CSE and code sinking in more places - It removes the need to peek through mask copies in RISCVISelDAGToDAG and keep track of V0 defs in RISCVVectorPeephole This patch initially eliminates uses of vmv0s after RISCVVectorPeephole to keep the diff to a minimum, and a follow up patch will move it past the other MachineInstr SSA passes. Note that it doesn't try to remove any defs of vmv0 as we shouldn't have any instructions that have any vmv0 outputs. As a further follow up, we can move the elimination pass to after phi elimination and outside of SSA, which would unblock the pre-RA scheduler around masked pseudos. This might also help the issue that RISCVVectorMaskDAGMutation tries to solve.	2025-02-12 12:06:55 +08:00
Luke Lau	c8ee1164bd	[RISCV] Fix masked->unmasked peephole handling masked pseudos with no passthru (#122253 ) Some masked pseudos like PseudoVCPOP_M_B8_MASK don't have a passthru, but in the masked->unmasked peephole we assumed the masked pseudo always had one. This checks for a passthru first and fixes #122245.	2025-01-09 19:54:37 +08:00
Kazu Hirata	82d5dd28b4	[RISCV] Remove unused includes (NFC) (#115814 ) Identified with misc-include-cleaner.	2024-11-11 22:54:54 -08:00
Michael Maitland	ae68d532f8	[RISCV][VLOPT] Allow propagation even when VL isn't VLMAX (#112228 ) The original goal of this pass was to focus on vector operations with VLMAX. However, users often utilize only part of the result, and such usage may come from the vectorizer. We found that relaxing this constraint can capture more optimization opportunities, such as non-power-of-2 code generation and vector operation sequences with different VLs. --------- Co-authored-by: Kito Cheng <kito.cheng@sifive.com>	2024-10-16 14:58:00 -04:00
Luke Lau	d15e53c1e4	[RISCV] Check vmerge's true is in same block in vmerge -> vmv.v.v peephole (#110861 ) The peepholes in RISCVVectorPeephole need to be local and we were failing to check if the true operand was in the same block as the vmerge. Fixes #110832	2024-10-03 12:02:54 +08:00
Philip Reames	12530015a4	[RISCV] Add reductions to list of roots in tryToReduceVL (#107595 ) This allows us to reduce VLs feeding reduction instructions. In particular, this means that <3 x Ty> reduce(load) like sequences no longer require a VL toggle. This was waiting on 3d72957; now that the latent correctness issue is fixed, we can expand this transform.	2024-09-10 18:47:20 -07:00
Luke Lau	7ba6768df8	Revert "[RISCV] Update V0Defs after moving Src in peepholes (#107359 )" This fixes #107950 and adds a test case for it. The issue was due to us incorrectly assuming that we stored a V0Defs entry for every single instruction. We actually only store them for instructions that use V0, so when we updated the V0Def after moving we sometimes ended up copying nullptr over from an instruction that doesn't use V0 and clearing the V0Def entry inadvertently. Because we don't have V0Defs on instructions that don't use V0, the FIXME was never actually needed in the first place since the bookkeeping wasn't out of sync to begin with. That commit also mentioned that a future unmasked to masked pseudo peephole might need unmasked pseudos to have V0Defs entries, but after working on this locally it turns out we don't. This reverts commit ce3648094d44e8c098396a353b215acecb363cda.	2024-09-10 13:26:07 +08:00
Luke Lau	b71d88ca5b	[RISCV] Constrain passthru regclass in vmerge -> vmv peephole In #107827 we now set true's passthru to the false operand if it was undef. We need to remember to also constrain the regclass in case true is a masked pseudo which needs its passthrus to be in VR[M*]NoV0	2024-09-10 13:26:07 +08:00
Luke Lau	111932d5ca	[RISCV] Fix same mask vmerge peephole discarding false operand (#107827 ) This fixes the issue raised in https://github.com/llvm/llvm-project/pull/106108#discussion_r1749677510 True's passthru needs to be equivalent to vmerge's false, but we also allow true's passthru to be undef. However if it's undef then we need to replace it with false, otherwise we end up discarding the false operand entirely. The changes in fixed-vectors-strided-load-store-asm.ll undo the changes in #106108 where we introduced this miscompile.	2024-09-09 22:45:44 +08:00
Luke Lau	2949720c2e	[RISCV] Move vmerge same mask peephole to RISCVVectorPeephole (#106108 ) We currently fold a vmerge.vvm into its true operand if the true operand is a masked pseudo with the same mask. We can move this over to RISCVVectorPeephole by instead splitting it up into a smaller peephole which converts it to a vmv.v.v first. The existing foldVMV_V_V peephole will then take care of folding it if needed. This is very similar to the existing all-ones mask peephole and we could potentially do it inside of it. I opted to put it in a separate peephole to make it easier to reason about, given that the duplication is small, but I could be persuaded either way.	2024-09-06 08:59:13 +08:00
Luke Lau	ce3648094d	[RISCV] Update V0Defs after moving Src in peepholes (#107359 ) If we move a pseudo in tryReduceVL or foldVMV_V_V via ensureDominates, its V0 definition may have changed so we need to update V0Defs. This shouldn't have any functional change today since any pseudo which uses V0 won't be able to move past a new definition. However this will matter if we add a peephole to convert unmasked pseudos to masked pseudos and add a use of V0.	2024-09-06 00:31:01 +08:00
Luke Lau	3d729571fd	[RISCV] Model dest EEW and fix peepholes not checking EEW (#105945 ) Previously for vector peepholes that fold based on VL, we checked if the VLMAX is the same as a proxy to check that the EEWs were the same. This only worked at LMUL >= 1 because the EMULs of the Src output and user's input had to be the same because the register classes needed to match. At fractional LMULs we would have incorrectly folded something like this: %x:vr = PseudoVADD_VV_MF4 $noreg, $noreg, $noreg, 4, 4 /* e16 */, 0 %y:vr = PseudoVMV_V_V_MF8 $noreg, %x, 4, 3 /* e8 */, 0 This models the EEW of the destination operands of vector instructions with a TSFlag, which is enough to fix the incorrect folding. There's some overlap with the TargetOverlapConstraintType and IsRVVWideningReduction. If we model the source operands as well we may be able to subsume them.	2024-09-05 15:27:48 +08:00
Luke Lau	aad6997764	[RISCV] Fold PseudoVMV_V_V with undef passthru, handling policy (#106943 ) If a vmv.v.v has an undef passthru then we can just replace it with its input operand, since the tail is completely undefined. This is a reattempt of #106840, but also checks to see if the input was a pseudo where we can relax its tail policy to undef. This also means we don't need to check for undef passthrus in foldVMV_V_V anymore because they will be handled by foldUndefPassthruVMV_V_V.	2024-09-05 12:04:33 +08:00
Luke Lau	6c607cfb2c	[RISCV] Preserve tail agnostic policy in foldVMV_V_V (#105788 ) This patch helps avoid regressions in an upcoming patch by making sure we don't accidentally lose a tail agnostic policy when folding a vmv.v.v into its source. The previous comment about RISCVInsertVSETVLI relaxing the policy didn't take into account the fact that there's a policy operand on vmv.v.v, which can be tail agnostic. If the tail is agnostic (via either the policy operand or the passthru being undef) and vmv.v.v's VL <= Src's VL, then Src's tail can be made agnostic.	2024-09-04 12:57:09 +08:00
Luke Lau	58e1c0e416	[RISCV] Discard the false operand in vmerge.vvm -> vmv.v.v peephole (#106688 ) vmerge.vvm needs to have an all ones mask, so nothing is taken from the false operand. So instead of checking that the passthru is the same as false, just use the passthru directly for the tail elements. This supersedes the convertVMergeToVMv part of #105788, as noted in https://github.com/llvm/llvm-project/pull/105788/files#r1731683971	2024-08-31 13:20:53 +08:00
Luke Lau	0efa38699a	[RISCV] Check VL dominates and potentially move in tryReduceVL (#106753 ) Similar to what we do in foldVMV_V_V with the passthru, if we end up changing the Src's VL in tryReduceVL we need to make sure it dominates. Fixes #106735	2024-08-31 01:50:24 +08:00
Luke Lau	dbbfc952f0	[RISCV] Separate ActiveElementsAffectResult into VL and Mask flags (#106517 ) In #106110 we had to mark v[f]slide1down.vx as ActiveElementsAffectResult since the elements in the body depend on VL. However it doesn't depend on the mask, so this was overly conservative and broke the vmerge peephole. We can recover this by splitting up ActiveElementsAffectResult into VL and Mask bits, so we can more accurately model v[f]slide1down.vx and re-enable the peephole.	2024-08-30 07:46:06 +08:00
Luke Lau	c073821142	[RISCV] Reduce VL of vmerge.vvm's true operand (#105786 ) This extends the peephole added in #104689 to also reduce the VL of a PseudoVMERGE_VVM's true operand. We could extend this later to reduce the false operand as well, but this starts with just the true operand since it allows vmerges that are converted to vmv.v.vs (convertVMergeToVMv) to be potentially further folded into their source (foldVMV_V_V).	2024-08-27 01:11:46 +08:00
Luke Lau	be5ecc35ef	[RISCV] Don't move source if passthru already dominates in vmv.v.v peephole (#105792 ) Currently we move the source down to where the vmv.v.v is to make sure that the new passthru dominates, but we do this even if it already does. This adds a simple local dominance check (taken from X86FastPreTileConfig.cpp) and avoids doing the move if it can. It also modifies the move to only move it to just past the passthru definition, and not all the way down to the vmv.v.v. This allows folding to succeed in some edge cases, which prevents regressions in an upcoming patch.	2024-08-24 20:14:28 +08:00
Philip Reames	26a8a857dc	[RISCV] Introduce local peephole to reduce VLs based on demanded VL (#104689 ) This is a fairly narrow transform (at the moment) to reduce the VLs of instructions feeding a store with a smaller VL. Note that the goal of this transform isn't really to reduce VL - it's to reduce VL toggles. To our knowledge, small reductions in VL without also changing LMUL are generally not profitable on existing hardware. For a single use instruction without side effects, fp exceptions, or a result dependency on VL, reducing VL is legal if only a subset of elements are demanded. We'd already implemented this logic for vmv.v.v, and this patch simply applies it to stores as an alternate root. Longer term, I plan to extend this to other root instructions (i.e. different kinds of stores, reduces, etc.), and add a more general recursive walkback through operands. One risk with the dataflow based approach is that we could be reducing VL of an instruction scheduled in a region with the wider VL (i.e. mixed mode computations) forcing an additional VL toggle. An example of this is the @insert_subvector_dag_loop test case, but it doesn't appear to happen widely. I think this is a risk we should accept.	2024-08-22 07:34:41 -07:00
Luke Lau	aba3476111	[RISCV] Move vmv.v.v peephole from SelectionDAG to RISCVVectorPeephole (#100367 ) This is split off from #71764, and moves only the vmv.v.v part of performCombineVMergeAndVOps to work on MachineInstrs. In retrospect trying to handle PseudoVMV_V_V and PseudoVMERGE_VVM in the same function makes the code quite hard to read, so this just does it in a separate peephole. This turns out to be simpler since for PseudoVMV_V_V we don't need to convert the Src instruction to a masked variant, and we don't need to create a fake all ones mask.	2024-08-17 00:49:27 +08:00
Luke Lau	b1542afd0b	[RISCV] Rename merge operand -> passthru. NFC (#100330 ) We sometimes call the first tied dest operand in vector pseudos the merge operand, and other times the passthru. Passthru seems to be more common, and it's what the C intrinsics call it[^1], so this renames all usages of merge to passthru to be consistent. It also helps prevent confusion with vmerge.vvm in some of the peephole optimisations. [^1]: https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc#the-passthrough-vd-argument-in-the-intrinsics	2024-07-30 17:47:00 +08:00
Luke Lau	6f65a39785	[RISCV] Update RISCVVectorPeephole pass name It was previously called RISCVFoldMasks	2024-07-26 10:33:38 +08:00
Luke Lau	754dc9ff5a	[RISCV] Move exact VLEN VLMAX transform to RISCVVectorPeephole (#100551 ) We can teach RISCVVectorPeephole to detect when an AVL is equal to the VLMAX when the exact VLEN is known and use the VLMAX sentinel instead, and in doing so remove the need for getVLOp in RISCVISelLowering. This keeps all the VLMAX logic in one place.	2024-07-26 07:56:12 +08:00
Luke Lau	b91c75fcae	[RISCV] Add unit strided load/store to whole register peephole (#100116 ) This adds a new vector peephole that converts unmasked, VLMAX vleN.v/vseN.v to their whole register equivalents. It replaces the existing tablegen patterns on ISD::LOAD/ISD::STORE and is a bit more general since it also catches VP loads and stores and @llvm.riscv intrinsics. The heavy lifting of detecting a VLMAX AVL and an all-ones mask is already taken care of by existing peepholes.	2024-07-24 11:52:54 +08:00
Luke Lau	c74ba57e0b	[RISCV] Convert AVLs with vlenb to VLMAX where possible (#97800 ) Given an AVL that's computed from vlenb, if it's equal to VLMAX then we can replace it with the VLMAX sentinel value. The main motivation is to be able to express an EVL of VLMAX in VP intrinsics whilst emitting vsetvli a0, zero, so that we can replace llvm.riscv.masked.strided.{load,store} with their VP counterparts. This is done in RISCVVectorPeephole (previously RISCVFoldMasks, renamed to account for the fact that it no longer just folds masks) instead of SelectionDAG since there are multiple places where VP nodes are lowered that would have needed to be handled. This also avoids doing it in RISCVInsertVSETVLI as it's much harder to lookup the value of the AVL, and in RISCVVectorPeephole we can take advantage of DeadMachineInstrElim to remove any leftover PseudoReadVLENBs.	2024-07-11 14:22:00 +08:00
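
The fractional-LMUL case described in 3d729571fd can be checked with a little arithmetic. The sketch below is only a model, not code from LLVM: the helper names (`vlmax`, `same_vlmax`, `same_eew`) and the VLEN of 128 bits are assumptions made for illustration, and it treats the destination EEW of each pseudo as equal to its SEW, which holds for vadd.vv and vmv.v.v. It uses the RVV definition VLMAX = VLEN * LMUL / SEW to show that the two pseudos from that commit message have the same VLMAX but different EEWs, so comparing VLMAX alone cannot stand in for comparing EEW.

```python
from fractions import Fraction


def vlmax(vlen_bits, lmul, sew_bits):
    """VLMAX = VLEN * LMUL / SEW, as defined by the RVV specification."""
    return Fraction(vlen_bits) * lmul / sew_bits


# Assumed VLEN for illustration; the conclusion is the same for any VLEN.
VLEN = 128

# The SEW operand in the pseudos is log2(SEW): 4 means e16, 3 means e8.
src = {"name": "PseudoVADD_VV_MF4", "lmul": Fraction(1, 4), "sew": 1 << 4}
user = {"name": "PseudoVMV_V_V_MF8", "lmul": Fraction(1, 8), "sew": 1 << 3}


def same_vlmax(a, b, vlen_bits=VLEN):
    """The old proxy: treat equal VLMAX as if it implied equal EEW."""
    return vlmax(vlen_bits, a["lmul"], a["sew"]) == vlmax(
        vlen_bits, b["lmul"], b["sew"]
    )


def same_eew(a, b):
    """The direct check: compare the element widths themselves."""
    return a["sew"] == b["sew"]


print("VLMAX matches:", same_vlmax(src, user))
print("EEW matches:  ", same_eew(src, user))
```

At mf4/e16 and mf8/e8 the LMUL/SEW ratio is 1/64 in both cases, which is why the VLMAX comparison passes even though the element widths are 16 and 8 bits.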

26 Commits