PR #75576 and #75735 update some implies in
llvm/lib/Support/RISCVISAInfo.cpp, but both of them miss the subtarget
feature part.
This patch still preserve predicate HasStdExtZfhOrZfhmin and
HasStdExtZhinxOrZhinxmin, since they could make error message more
readable. ( Users might not know that zfh implies zfhmin.)
This change is motived by a point of confusion on
https://github.com/llvm/llvm-project/pull/71764. I hadn't fully
understood why doPeepholeMaskedRVV needed to be part of the same change.
As indicated in the fixme in this patch, the reason is that
performCombineVMergeAndVOps doesn't know how to deal with the true side
of the merge being a all-ones masked instruction.
This change removes one of two calls to the routine in
RISCVISELDAGToDAG, and adds a clarifying comment on the precondition for
the remaining call. The post-ISEL code is tested by the cases where we
can form a unmasked instruction after folding the vmerge back into true.
I don't really care if we actually land this patch, or leave it roled
into https://github.com/llvm/llvm-project/pull/71764. I'm posting it
mostly to clarify the confusion.
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
The compiler currently abends with an impossible reg-to-reg copy when
producing a negative zero FP immediate on RV32 with -Zdinx. This is
because we emit a negation that uses FP registers. Emit the right node
to produce correct code.
This transformation might be illegal for `PseduoVIOTA_M`. The value of
`viota.m vd, vs2` is the prefix sum of vd2 and adding mask for it may
cause wrong prefix sum.
Take an example, the result of following expression is `{5, 5, 5, 3}`,
```
; v4 = {1, 1, 1, 1}
viota.m v1, v4
; v0 = {0, 0, 0, 1}, v1 = {0, 1, 2, 3}, v8 = {5, 5, 5, 5}
vmerge.vvm v8, v8, v1, v0.t
; v8 = {5, 5, 5, 3}
```
but if we merge them to `viota.m v8, v4, v0.t`, then the result of is
`{5, 5, 5, 0}`.
Also, we still does `performCombineVMergeAndVOps` for `voita.m` when
mask of `vmerge.vvm` is a true mask.
This will select i32 operations directly to W instructions without
custom nodes. Hopefully this can allow us to be less dependent on
hasAllNBitUsers to recover i32 operations in RISCVISelDAGToDAG.cpp.
This support is enabled with a command line option that is off by
default.
Generated code is still not optimal.
I've duplicated many test cases for this, but its not complete. Enabling this runs all existing lit tests without crashing.
Most of the FP constants supported by FLI are positive. For negative FP
constants X whose positive values is supported by FLI, we can use `(FNEG
(FLI -X))` to materialize X.
We currently have three postprocess peephole optimisations for vector
pseudos:
1) Masked pseudo with all ones mask -> unmasked pseudo
2) Merge vmerge pseudo into operand pseudo's mask
3) vmerge pseudo with all ones mask -> vmv.v.v pseudo
This patch aims to move these peepholes out of SelectionDAG and into a
separate RISCVFoldMasks MachineFunction pass.
There are a few motivations for doing this:
* The current SelectionDAG implementation operates on MachineSDNodes,
which are essentially MachineInstrs but require a bunch of logic to
reason about chain and glue operands. The RISCVII::has*Op helper
functions also don't exactly line up with the SDNode operands. Mutating
these pseudos and their operands in place becomes a good bit easier at
the MachineInstr level. For example, we would no longer need to check
for cycles in the DAG during performCombineVMergeAndVOps.
* Although it's further down the line, moving this code out of
SelectionDAG allows it to be reused by GlobalISel later on.
* In performCombineVMergeAndVOps, it may be possible to commute the
operands to enable folding in more cases (see
test/CodeGen/RISCV/rvv/vmadd-vp.ll). There is existing machinery to
commute operands in TII::commuteInstruction, but it's implemented on
MachineInstrs.
The pass runs straight after ISel, before any of the other machine SSA
optimization passes run. This is so that dead-mi-elimination can mop up
any vmsets that are no longer used (but if preferred we could try and
erase them from inside RISCVFoldMasks itself). This also means that
these peepholes are no longer run at codegen -O0, so this patch isn't
strictly NFC.
Only the performVMergeToVMv peephole is refactored in this patch, the
remaining two would be implemented later. And as noted by @preames, it
should be possible to move doPeepholeSExtW out of SelectionDAG as well.
Rematerialization during register allocation is currently limited to a
single instruction with no inputs.
This patch introduces a pseudoinstruction that represents the
materialization of a constant. I've started with a sequence of 2
instructions for now, which covers at least the common LUI+ADDI(W) case.
This instruction will be expanded into real instructions immediately
after register allocation using a new pass. This gives the post-RA
scheduler a chance to separate the 2 instructions to improve ILP.
I believe this matches the approach used by AArch64.
Unfortunately, this loses some CSE opportunies when an LUI value is used
by multiple constants with different LSBs.
This feature is off by default and a new backend command line option is
added to enable it for testing.
This avoids the spill and reloads reported in #69586.
VTs on already selected instructions can be arbitrary. Reviewing
the isel table I see i32 used for instructions that are part of
multiple instruction output patterns. Looks like tblgen to just
picks the lowest numbered MVT that is legal for the destination
register class of the instruction.
Seems better to just not check types for already selected nodes.
RISCVGenDAGISel.inc can call this before it checks the node type.
Ensure the type is scalar before wasting time to do the more
computationally expensive checks.
This also avoids an assertion if we hit a VMV_X_S instruction
which doesn't have a VL operand which vectorPseudoHasAllNBitUsers
expects.
A new ComplexPattern `AddrRegImmLsb00000` is added, which is like
`AddrRegImm` except that if the least significant 5 bits isn't all
zeros, we will fail back to offset 0.
The handling for vector pseudos in hasAllNBitUsers is duplicated across
RISCVISelDAGToDAG and RISCVOptWInstrs. This deduplicates it between the
two,
with the common denominator between the two call sites being the opcode
and
SEW: We need to handle extracting these separately since one operates at
the
SelectionDAG level and the other at the MachineInstr level.
Without this, various CodeGen tests fail because a
RISCV::FCVT_D_W[_IN32X] machine node is created without the rounding
mode operand.
The relevant PR was committed as bf94ba39b65d1212ea84d5783b393280e1ce7478
Vector pseudos with scalar operands only use the lower SEW bits (or less in the
case of shifts and clips). This patch accounts for this in hasAllNBitUsers for
both SDNodes in RISCVISelDAGToDAG. We also need to handle this in
RISCVOptWInstrs otherwise we introduce slliw instructions that are less
compressible than their original slli counterpart.
This is a reland of aff6ffc8760b99cc3d66dd6e251a4f90040c0ab9 with the
refactoring omitted.
This reverts commit aff6ffc8760b99cc3d66dd6e251a4f90040c0ab9. Version landed differs from version reviewed in (stylistic) manner worthy of separate review.
Vector pseudos with scalar operands only use the lower SEW bits (or less
in the
case of shifts and clips). This patch accounts for this in
hasAllNBitUsers for
both SDNodes in RISCVISelDAGToDAG. We also need to handle this in
RISCVOptWInstrs otherwise we introduce slliw instructions that are less
compressible than their original slli counterpart.
…n RISCVDAGToDAGISel::Select.
UNDEF needs to go through isel itself. All of the nodes have been
topologically sorted so that instruction selection precedes from root to
entry node. If we create a new node that needs to go through isel, we
have to insert it into the correct place in the topological sort. If we
don't, it might not get selected at all in some cases.
Some targets have a function like X86's insertDAGNode to sort newly
created nodes.
To avoid introducing such a function on RISC-V, we can directly emit the
IMPLICIT_DEF node that UNDEF would get selected to.
I'd guarded this case in D158874 to avoid regressions, and decided to go investigate what was going on. The solution turns out to be a generic splat matching extension to handle INSERT_SUBVECTOR. In theory, we could see these from other sources as well, but for some reason we only seem to see the i64 extract on rv32 case in practice. Not sure why that is to be honest.
Differential Revision: https://reviews.llvm.org/D159230
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits for reviewers to see better what's changed. Will
squash when
merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
If the high and low 32 bits are the same, we try to use
(ADD X, (SLLI X, 32)) but that only works if bit 31 is clear since
the low 32 bits will be sign extended.
If we have Zba we can use add.uw to zero the sign extended bits.
Reviewed By: reames, wangpc
Differential Revision: https://reviews.llvm.org/D159253
This re-implements the special casing we had in lowerScalarSplat as a DAG combine. As can be seen in the tests, this ends up triggering in a bunch more cases.
The semantically interesting bit of this change is the use of the implicit truncate semantics for when XLEN > SEW. We'd already been doing this for vmv.v.x, but this change extends e.g. the constant matching to make the same assumption about vmv.s.x. Per my reading of the specification, this should be fine, and if anything, is more obviously true of vmv.s.x than vmv.v.x.
Differential Revision: https://reviews.llvm.org/D158874
This patch shares the logic between the various splat ComplexPatterns to help
the diff in some upcoming patches.
It's worth noting that the uimm splat pattern now takes into account the
implicit truncation + sign extend semantics of vmv_v_x_vl, but that doesn't
seem to affect the result since it always took the sext value anyway.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D158741
When folding a vmerge into its operands, if the resulting VL is smaller than
what the vmerge had originally then what was previously in its body then gets
moved to the tail. In that case, we can't relax the tail policy to agnostic
when the merge operand is undefined, since we need to preserve these elements
past the new VL.
Fixes https://github.com/llvm/llvm-project/issues/64754
Reviewed By: craig.topper, reames
Differential Revision: https://reviews.llvm.org/D158161
In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This lead to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282.
This change is a slightly ugly hack to side step the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.
We may end up backporting this into the 17.x release branch. Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.
Differential Revision: https://reviews.llvm.org/D156909
Part of this test file was stolen from D156895. We should merge them
when committing.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D156926
We could use x0 form in vsetvli when we already know the vlmax and avl is equal to it.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156404
Similar to D155698 where the shift amount is extended, this patch extends the
ComplexPattern to handle the case where the shift amount has been truncated.
Truncations are custom lowered to truncate_vector_vl, and in cases like i64 ->
i16 they are truncated by one power of two at a time, so we need to unravel
nested layers of them.
The pattern can also be reused for Zvbb's vwsll.vx in an upcoming patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155928
We're currently only matching scalar shift amounts where the type is the same
as the vector element type. But because only the bottom log2(2*SEW) bits are
used, only 7 bits will be used at most so we can use any scalar type >= i8.
This patch adds patterns for the case above, as well as for when the shift
amount type is the same as the widened element type and doesn't need extended.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155698
A vmv.v.v shares the same encoding as a vmerge that isn't masked, so we can
also fold it into its operands if we treat it as a vmerge with an all-ones
mask. We take care here not to actually transform the existing vmv into a
vmerge, otherwise things like True.hasOneUse() become inaccurate. Instead this
just returns an equivalent list of operands.
This is an alternative to D153351.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D155101
Currently when folding vmerge into its operands, we stop if the VLs aren't
identical. However since the body of (vmerge (vop)) is the intersection of
vmerge and vop's bodies, we can use the smaller of the two VLs if we know it
ahead of time. This patch relaxes the constraint on VL if they are both
constants, or if either of them are VLMAX.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D155071
We can share the code for both the unmasked and masked cases, and add a missing consistency assert in the process.
This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct. This is the last major piece.