We have a switch statement that checks whether a vector instruction
may read elements past VL. However, it doesn't account for
instructions in vendor extensions.
Handling all possible vendor instructions will result in quite a lot of
opcodes being added, so I've created a new TSFlag that we can declare in
TableGen, and added it to the existing instruction definitions.
I've tried to be as conservative as possible here: all SiFive vendor vector
instructions should be covered by the flag, as well as all of
XRivosVizip, and ri.vextract from XRivosVisni.
For now this should be NFC because coincidentally, these instructions
aren't handled in getOperandInfo, so RISCVVLOptimizer should currently
avoid touching them despite them being liberally handled in
getMinimumVLForUser.
However in an upcoming patch we'll need to also bail in
getMinimumVLForUser, so this prepares for it.
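
To illustrate why a per-instruction flag scales better than a per-opcode switch, here is a minimal sketch. The bit position and the names `ReadsPastVLMask`, `readsPastVL` and `canShrinkProducerFor` are hypothetical and are not taken from the actual RISCVII definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit position; the real TSFlags layout lives in the backend.
constexpr uint64_t ReadsPastVLShift = 5;
constexpr uint64_t ReadsPastVLMask = 1ULL << ReadsPastVLShift;

// True if the instruction's target-specific flags mark it as possibly
// reading source elements beyond VL.
constexpr bool readsPastVL(uint64_t TSFlags) {
  return (TSFlags & ReadsPastVLMask) != 0;
}

// A user that may read past VL demands every element of its source, so
// the caller has to give up on shrinking the producer's VL.
constexpr bool canShrinkProducerFor(uint64_t UserTSFlags) {
  return !readsPastVL(UserTSFlags);
}
```

With a flag like this, a new vendor instruction only needs the bit set in its TableGen definition; no C++ opcode list has to change.
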
Follow-up PR to #153071, adding the remaining zvbb instructions
(VBREV8_V and VREV8_V), plus the zvbc instructions (VCLMUL_VV,
VCLMUL_VX, VCLMULH_VV, VCLMULH_VX).
This PR adds support for the following instructions to the RISC-V
VLOptimizer: vandn.vx, vandn.vv, vbrev.v, vclz.v, vcpop.v, vctz.v,
vror.vi, vror.vx, vror.vv, vrol.vx, vrol.vv.
Currently when checking to see if two OperandInfos are compatible, we
check to see if the user operand only uses the first scalar and then do
two different checks depending on that.
However whether the user only uses the first scalar or not is already
encoded in OperandInfo, when EMUL is nullopt.
This removes the redundant check and keeps the logic in the OperandInfo
class to make the call site easier to reason about.
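
A minimal sketch of the idea, assuming a simplified stand-in for OperandInfo (the field layout and the `areCompatible` name here are illustrative, not copied from the pass): when the user's EMUL is nullopt it only reads the first scalar, so only EEW needs to match.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Simplified stand-in for the pass's OperandInfo. EMUL is modeled as a
// (multiplier, isFractional) pair; std::nullopt means only element 0 is read.
struct OperandInfo {
  std::optional<std::pair<unsigned, bool>> EMUL;
  unsigned Log2EEW;

  // Def describes the producer's result, User how a consumer reads it.
  static bool areCompatible(const OperandInfo &Def, const OperandInfo &User) {
    if (Def.Log2EEW != User.Log2EEW)
      return false;
    // A scalar-only user places no constraint on EMUL.
    if (!User.EMUL)
      return true;
    return Def.EMUL == User.EMUL;
  }
};
```

The call site then reduces to a single `areCompatible(Def, User)` query instead of branching on whether the user is scalar-only.
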
We don't need to look up the reg class because the MCInstDesc already
gives us this information.
With that we can remove some helper methods, and tighten the assert in
isCandidate because all pseudos at this stage should be defining virtual
registers.
The previous assert wasn't passing the TSFlags but the opcode, so wasn't
working.
Fixing it reveals that it was actually triggering, because we're too
strict with viota and vmsxf.m. We already reduce the VL on these
instructions because the result in each element doesn't depend on VL.
However, it does change if masked, so account for that.
This PR adds support for the vrgather.vi, vrgather.vx, vrgather.vv,
vrgatherei16.vv instructions in the RISC-V VLOptimizer.
To support vrgatherei16.vv I also needed to add support for it in
getOperandLog2EEW.
Similarly to #146710, for vslide1ups, vl only determines the destination
elements written to, so we can safely reduce their AVL.
We cannot do this for vslide1downs as the vl determines which lane the
new element is to be inserted in, so some negative tests have been
added.
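
The difference can be seen in a small scalar model of the two instructions. This is a sketch that ignores masking, tail elements and SEW; `slide1up` and `slide1down` are illustrative helper names, not LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of vslide1up.vx: element 0 takes the scalar X, the rest shift up.
std::vector<int> slide1up(const std::vector<int> &Src, int X, std::size_t VL) {
  assert(VL <= Src.size());
  std::vector<int> Dst(VL);
  for (std::size_t I = 0; I < VL; ++I)
    Dst[I] = (I == 0) ? X : Src[I - 1];
  return Dst;
}

// Model of vslide1down.vx: elements shift down, X lands in lane VL - 1.
std::vector<int> slide1down(const std::vector<int> &Src, int X,
                            std::size_t VL) {
  assert(VL <= Src.size());
  std::vector<int> Dst(VL);
  for (std::size_t I = 0; I < VL; ++I)
    Dst[I] = (I + 1 == VL) ? X : Src[I + 1];
  return Dst;
}
```

For a source of {10, 20, 30, 40} and X = 99, `slide1up` at VL = 2 yields a prefix of its VL = 4 result, whereas `slide1down` at VL = 2 puts 99 in lane 1 where the VL = 4 result holds 30.
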
For vslideup and vslidedown, vl controls the elements which are written
just like other vector instructions. So unless I'm missing something it
should be safe to reduce them. For vslidedown, the specification states
that elements past vl may be read.
We already reduce vslideup and vslidedown in
RISCVVectorPeephole::tryToReduceVL where we just check for
RISCVII::elementsDependOnVL.
Eventually we should replace the whitelist with
RISCVII::elementsDependOnVL once we have test coverage. I've also added
an assert just to double check the instructions we currently support.
This helps reduce vl toggles for fixed-order recurrences vectorized with
EVL tail folding.
This was asserting the raw virtual register class was a scalar
class, instead of computing the net result of the register class
plus the subregister index on the operand. The machine verifier
should be checking this was a valid combination in the first place,
so just drop the assert.
checkUsers currently does two things, a) work out the minimum VL read by
every user and b) check that the operand info of the MI and users match.
getMinimumVLForUser handles most of a), with the exception of the check
for instructions that read past VL, e.g. vrgather, which is still in
checkUsers.
This moves it into getMinimumVLForUser to keep all that logic in one
place and simplifies an upcoming patch.
Currently if a user of an instruction isn't a vector pseudo we bail. For
simple non-subreg virtual COPYs, we can peek through their uses by using
a worklist.
This is extracted from a loop in TSVC2 (s273) that contains a fcmp +
select, which produces a copy that doesn't seem to be coalesced away.
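
A rough sketch of peeking through copies with a worklist, using a toy use graph rather than MachineInstrs. `Kind`, `Node` and `collectUsers` are hypothetical names for illustration, and the subreg and virtual-register checks mentioned above are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

enum class Kind { Copy, VectorPseudo, Other };

struct Node {
  Kind K;
  std::vector<std::size_t> Users; // indices of instructions using this result
};

// Returns the vector-pseudo users reachable from Def through copies, or
// std::nullopt if some user is an instruction we cannot reason about.
std::optional<std::vector<std::size_t>>
collectUsers(const std::vector<Node> &G, std::size_t Def) {
  assert(Def < G.size());
  std::vector<std::size_t> Worklist(G[Def].Users), Result;
  std::vector<bool> Seen(G.size(), false);
  while (!Worklist.empty()) {
    std::size_t U = Worklist.back();
    Worklist.pop_back();
    if (Seen[U])
      continue;
    Seen[U] = true;
    switch (G[U].K) {
    case Kind::Copy: // look through the copy at its own users
      Worklist.insert(Worklist.end(), G[U].Users.begin(), G[U].Users.end());
      break;
    case Kind::VectorPseudo:
      Result.push_back(U);
      break;
    case Kind::Other: // unknown user: bail
      return std::nullopt;
    }
  }
  return Result;
}
```
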
The VLMUL and policy enums originally lived in RISCVBaseInfo.h in the
backend which is where everything else in the RISCVII namespace is
defined.
RISCVTargetParser.h is used by much more of the compiler and it
doesn't really make sense to have 2 different namespaces exposed.
These enums are both associated with VTYPE so using the RISCVVType
namespace seems like a good home for them.
This patch adds the remaining support for fixed-point arithmetic
instructions (we previously had support for averaging adds and
subtracts).
For saturating adds/subs/multiplies/clips, we can't change `vl` if
`vxsat` is used, since changing `vl` may change its value. So this patch
checks to see if it's dead before considering it a candidate.
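
As a worked example of why `vl` affects `vxsat`, here is a small model of a signed 8-bit saturating add. It is only a sketch: `vsaddModel` and `SatResult` are made-up names, and masking and tail handling are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SatResult {
  std::vector<int8_t> Dst;
  bool VXSat; // set if any processed element saturated
};

// Adds the first VL elements with signed saturation to [-128, 127].
SatResult vsaddModel(const std::vector<int8_t> &A,
                     const std::vector<int8_t> &B, std::size_t VL) {
  assert(A.size() == B.size() && VL <= A.size());
  SatResult R{std::vector<int8_t>(VL, 0), false};
  for (std::size_t I = 0; I < VL; ++I) {
    int Sum = int(A[I]) + int(B[I]);
    int Clamped = std::clamp(Sum, -128, 127);
    if (Clamped != Sum)
      R.VXSat = true;
    R.Dst[I] = static_cast<int8_t>(Clamped);
  }
  return R;
}
```

With inputs {1, 2, 100, 3} added to themselves, only lane 2 saturates, so the model reports `VXSat` at `vl` = 4 but not at `vl` = 2. If anything reads `vxsat`, shrinking `vl` would change observable behaviour.
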
After #124066 we started allowing users that are passthrus. However for
widening/narrowing instructions we were returning the wrong operand info
for passthru operands since it originally assumed the operand would
never be a passthru. This fixes it by handling it in IsMODef.
We already had getOperandInfo support, so this marks the instructions as
supported in isCandidate. It also adds support for vfwmaccbf16.v{v,f}
from zvfbfwma.
I was running into failed assertions of `isCandidate(UserMI)` in
`getMinimumVLForUser`, but only occurring with
`-enable-machine-outliner=never`. I believe this is a red herring, and
it just so happens the memory allocation pattern on my machine exposed
the bug with that flag.
DemandedVLs is never cleared, which means it accumulates more
MachineInstr pointer keys over time, and it's possible that when e.g.
running on function 'b', a MachineInstr pointer points to the same
memory location used for a candidate in 'a'. This causes the assertion
to fail.
Comment left on #124530 with more information.
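
The hazard and one way to avoid it can be sketched as follows. This assumes the remedy is to clear the map at the start of each function; `Instr`, `Function` and `VLPass` are toy stand-ins, not the real classes.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct Instr { unsigned VL; };       // stand-in for MachineInstr
using Function = std::vector<Instr>; // stand-in for MachineFunction

struct VLPass {
  // Keyed by instruction address, so entries are only meaningful while
  // the function that owns those instructions is being processed.
  std::unordered_map<const Instr *, unsigned> DemandedVLs;

  void runOnFunction(const Function &F) {
    // Drop keys from the previous function; a recycled address must not
    // look like an already-known candidate.
    DemandedVLs.clear();
    for (const Instr &I : F)
      DemandedVLs[&I] = I.VL;
  }
};
```
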
We don't want OperandInfo to be visible outside of this translation
unit.
getEMULEqualsEEWDivSEWTimesLMUL is local to this file and declared
static. There's no reason to put it in a namespace.
The motivation for this is to allow reducing the vl when a user is a
ternary pseudo, where the third operand is tied and also acts as a
passthru.
When checking the users of an instruction, we currently bail if the user
is used as a passthru because all of its elements past vl will be used
for the tail.
We can allow passthru users if we know the tail of their result isn't
used, which we will have computed beforehand after #124530.
It's worth noting that this is all irrespective of the tail policy,
because tail agnostic still ends up using the passthru.
I've checked that SPEC CPU 2017 + llvm-test-suite pass with this (on
qemu with rvv_ta_all_1s=true)
Fixes #123760
This replaces the worklist by instead computing what VL is demanded by
each instruction's users first, which is done via checkUsers.
The demanded VLs are stored in a DenseMap, and then we can just do a
single forward pass of tryReduceVL where we check if a candidate's
demanded VL is less than its VLOp.
This means the pass should now be linear in complexity, and allows us to
relax the restriction on tied operands more easily, as in #124066.
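
The two-phase shape described above can be sketched on a toy straight-line program. This is a simplification under stated assumptions: every instruction is treated as a candidate, operands always refer to earlier instructions, and `Instr` and `reduceVLs` are illustrative names rather than the pass's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct Instr {
  unsigned VL;
  std::vector<std::size_t> Operands; // indices of earlier instructions
};

void reduceVLs(std::vector<Instr> &Prog) {
  // Phase 1: largest VL any user reads from each instruction.
  std::vector<std::optional<unsigned>> Demanded(Prog.size());
  // Users come after defs, so walking backwards sees every user first.
  for (std::size_t I = Prog.size(); I-- > 0;) {
    unsigned Read = Prog[I].VL;
    if (Demanded[I])
      Read = std::min(Read, *Demanded[I]);
    for (std::size_t Op : Prog[I].Operands) {
      assert(Op < I);
      Demanded[Op] = std::max(Demanded[Op].value_or(0), Read);
    }
  }
  // Phase 2: single forward pass shrinking VL where less is demanded.
  for (std::size_t I = 0; I < Prog.size(); ++I)
    if (Demanded[I] && *Demanded[I] < Prog[I].VL)
      Prog[I].VL = *Demanded[I];
}
```

Each instruction and each operand edge is visited a constant number of times, which is where the linear bound comes from in this model.
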
Whilst adding a cross-block test, I encountered an assertion failure in
the second pass where we check the instruction popped off the worklist
is a candidate.
The leaf instruction %c in this case will be added to the worklist when
its VL is VLMAX, but during the first pass it will have its VL reduced
to 1.
Then in the second pass when it's processed via the worklist, isCandidate
will no longer be true due to its VL == 1.
This fixes it by moving the VL == 1 check to tryReduceVL, keeping it
alongside the other VL check for bailing out early as an optimisation.
We currently check for passthrus in two places, on the instruction to
reduce in isCandidate, and on the users in checkUsers.
We cannot reduce the VL if an instruction has a user that's a passthru,
because the user will read elements past VL in the tail.
However it's fine to reduce an instruction if it itself contains a
non-undef passthru. Since the VL can only be reduced, not increased, the
previous tail will always remain the same.
We already bail if the user is tied in checkUsers, which is true for all
passthrus. Remove the check in getOperandLog2EEW so that it only worries
about computing the OperandInfo, and leaves the passthru correctness to
checkUsers.
This implements a suggestion by Craig in PR #123878. We can move the
worklist management out of the per-instruction work and do it once at
the end of scanning all the instructions. This should reduce repeat
visitation of the same instruction when no changes can be made.
Note that this does not remove the inherent O(N^2) in the algorithm.
We're still potentially visiting every user of every def.
I also included a guard for unreachable blocks since that had been
mentioned as a possible cause. It seems we've ruled that out, but
guarding for this case is still a good idea.
For .wv widening instructions, when checking if the operand is vs1 or
vs2, we take into account whether or not it has a passthru. For tied
pseudos though their passthru is the vs2, and we weren't taking this
into account.