Reviving some of the progress on #71764. To recap, we explored removing
the V0 register copies to simplify the pass, but hit a limitation with
the register allocator due to our use of the vmv0 singleton reg class
and early-clobber constraints.
So since we will have to continue to track the definition of V0
ourselves, this patch simplifies it by storing it in a map. It will
allow us to move about copies to V0 in #71764 without having to do extra
bookkeeping.
This change is motived by a point of confusion on
https://github.com/llvm/llvm-project/pull/71764. I hadn't fully
understood why doPeepholeMaskedRVV needed to be part of the same change.
As indicated in the fixme in this patch, the reason is that
performCombineVMergeAndVOps doesn't know how to deal with the true side
of the merge being a all-ones masked instruction.
This change removes one of two calls to the routine in
RISCVISELDAGToDAG, and adds a clarifying comment on the precondition for
the remaining call. The post-ISEL code is tested by the cases where we
can form a unmasked instruction after folding the vmerge back into true.
I don't really care if we actually land this patch, or leave it roled
into https://github.com/llvm/llvm-project/pull/71764. I'm posting it
mostly to clarify the confusion.
We currently have three postprocess peephole optimisations for vector
pseudos:
1) Masked pseudo with all ones mask -> unmasked pseudo
2) Merge vmerge pseudo into operand pseudo's mask
3) vmerge pseudo with all ones mask -> vmv.v.v pseudo
This patch aims to move these peepholes out of SelectionDAG and into a
separate RISCVFoldMasks MachineFunction pass.
There are a few motivations for doing this:
* The current SelectionDAG implementation operates on MachineSDNodes,
which are essentially MachineInstrs but require a bunch of logic to
reason about chain and glue operands. The RISCVII::has*Op helper
functions also don't exactly line up with the SDNode operands. Mutating
these pseudos and their operands in place becomes a good bit easier at
the MachineInstr level. For example, we would no longer need to check
for cycles in the DAG during performCombineVMergeAndVOps.
* Although it's further down the line, moving this code out of
SelectionDAG allows it to be reused by GlobalISel later on.
* In performCombineVMergeAndVOps, it may be possible to commute the
operands to enable folding in more cases (see
test/CodeGen/RISCV/rvv/vmadd-vp.ll). There is existing machinery to
commute operands in TII::commuteInstruction, but it's implemented on
MachineInstrs.
The pass runs straight after ISel, before any of the other machine SSA
optimization passes run. This is so that dead-mi-elimination can mop up
any vmsets that are no longer used (but if preferred we could try and
erase them from inside RISCVFoldMasks itself). This also means that
these peepholes are no longer run at codegen -O0, so this patch isn't
strictly NFC.
Only the performVMergeToVMv peephole is refactored in this patch, the
remaining two would be implemented later. And as noted by @preames, it
should be possible to move doPeepholeSExtW out of SelectionDAG as well.