This improves opportunities to use bset/bclr/binv. Unfortunately,
there are no W versions of these instrcutions so this isn't always
a clear win. If we use SLLW we get free sign extend and shift masking,
but need to put a 1 in a register and can't remove an or/xor. If
we use bset/bclr/binv we remove the immediate materializationg and
logic op, but might need a mask on the shift amount and sext.w.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D124096
The description of SETCC says
/// SetCC operator - This evaluates to a true value iff the condition is
/// true. If the result value type is not i1 then the high bits conform
/// to getBooleanContents.
Without this patch, we sign extended the i1 to the used larger type
regardless of getBooleanContents. This resulted in miscompiles, as
shown in the attached testcase that ended up returning -1 instead of
1 when using -mattr=+v.
Fixes https://github.com/llvm/llvm-project/issues/55168
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D124618
By clearing the HasDummyMask flag from mask register binary operations
and mask load/store.
HasDummyMask was causing an extra operand to get appended when
converting from MachineInstr to MCInst. This extra operand doesn't
appear in the assembly string so was mostly ignored, but it prevented
the alias instruction printing from working correctly.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D124424
The default implementation of findCommutedOpIndices picks the
first two source operands. That's exactly what we want for the
scalar FMA instructions.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D124463
Following D118810 that reduced the size of ISel table,
this patch optimizes allone-masked RVV pseudos with TU policy and
swap them out to their unmasked TU pseudos.
Since the UNDEF merge operand is not preserved, we turn it into TA
pseudo regardless of the policy operand.
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D121881
vslideup works by leaving elements 0<i<OFFSET undisturbed.
so it need the destination operand as input for correctness
regardless of policy. Add a operand to indicate policy.
We also add policy operand for unmaksed vslidedown to keep the interface consistent with vslideup
because vslidedown have only undisturbed at 0<i<vstart but user have no way to control of vstart.
Reviewed By: rogfer01, craig.topper
Differential Revision: https://reviews.llvm.org/D124186
In D123418 we removed some RUN line (ex. RV32-ELEN16) but their
expected results still exist there.
Remove them and rename prefix for more descriptive.
Reviewed By: frasercrmck, asb, craig.topper
Differential Revision: https://reviews.llvm.org/D124179
Backwards search
The sext.w removal pass (before the new patch) checks if the input to sext.w is already in sign-extended form, so it can eliminate it. It does that by checking every definition/source that reaches the sext.w is an instruction that produces a sign-extended value, either by definition (e.g. ADDW), or it propagates sign-extension (e.g. OR) so we check its sources recursively.
Forward search
Sometimes, one of the sources is an instruction that doesn't always produce a sign-extended value, but it has a W-version that does (e.g. ADD / ADDW). If we transform the ADD to ADDW, the sext.w can be removed (assuming other def paths are satisfied), but this transformation is sound only if every use of this ADD/W only reqruires the lower 32-bits either directly (like sll %x, 32) or they propagate dependency (lower word of output only depends on lower word of input) so we check its uses recursively.
When searching backwards, if an instruction that can be replaced with W-variant is encountered, this pass runs the forward search to verify it can be replaced, then adds it to a list of fixable instructions. After verifying all paths, it replaces the instruction and removes the sext.w.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D119928
This patch adds custom MIR operand comments to VTYPE immediate operands
in VSETVLI instructions and SEW/LMUL operands in vector codegen pseudo
instructions. The result is intended to be more human-readable and
hopefully maintainable when working with MIR, particularly when
writing or reading test cases.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D124187
We saw a failure caused by unwinding with incomplete CFIs, so we
can't outline CFI instructions when they are needed in EH.
This is a recommit of 0d40688, which was reverted in ce83883 as
related precommit test 360d44e caused some errors.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D122634
This is a recommit of 360d44e, which was reverted
in b1620d4 because it caused some errors due to no
`nounwind` attrs in `machine-outliner-cfi.mir`.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D123364
If there are fewer than 12 trailing zeros, we'll try to use an ADDI
at the end of the sequence. If we strip trailing zeros and end the
sequence with a SLLI we might find a shorter sequence.
Differential Revision: https://reviews.llvm.org/D124148
We saw a failure caused by unwinding with incomplete CFIs, so we
can't outline CFI instructions when they are needed in EH.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D122634
We can't shift-right negative numbers to divide them, so avoid emitting
such sequences. Use negative numerators as a proxy for this situation, since
the indices are always non-negative.
An alternative strategy could be to add a compiler flag to emit division
instructions, which would at least allow us to test the VID sequence
matching itself.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123796
This test shows a (contrived) BUILD_VECTOR which is correctly identified
as a sequence of ((vid * -3) / 8) + 5. However, the issue is that using
shift-right for the divide is invalid as the step values are negative.
This patch just adds the test: the fix is added in D123796.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123989
There's an existing generic combine that does this for legal types.
This patch adds a RISCV specific combine for W instructions.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D123983
This patch fixes a bug when lowering BUILD_VECTOR via VID sequences.
After adding support for fractional steps in D106533, elements with zero
steps may be skipped if no step has yet been computed. This allowed
certain sequences to slip through the cracks, being identified as VID
sequences when in fact they are not.
The fix for this is to perform a second loop over the BUILD_VECTOR to
validate the entire sequence once the step has been computed. This isn't
the most efficient, but on balance the code is more readable and
maintainable than doing back-validation during the first loop.
Fixes the tests introduced in D123785.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123786
These tests both use vector constants misidentified as VID sequences.
Because the initial run of elements has a zero step, the elements are
skipped until such a step can be identified. The bug is that the skipped
elements are never validated, even though the computed step is
incompatible across the entire sequence.
A fix will follow in a subseqeuent patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123785
This patch adds rvv codegen support for vp.fptrunc. The lowering of fp_round and vp.fptrunc share most code so use a common lowering function to handle those two, similar to vp.trunc.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123841
This is a replacement for the original fix attempted in
c46aab01c002b7a04135b8b7f1f52d8c9ae23a58.
This fixes "overlapping insert" assertion failures when trying to
unwind an unsuccessful recoloring attempt.
The problem would occur when there are multiple recoloring candidates
which recursively required recoloring. If one recoloring candidate was
successfully recolored at one level, and the next recoloring candidate
was unsuccessful, we would not roll back the first candidates
successful recoloring. The forgotten successful recoloring may have
been assigned to something that conflicts with a register that needs
to be restored in a parent recoloring attempt.
See the testcase added in issue48473 for a more concrete example with
explanation.
Materializing constants on RISCV is simpler if the constant is sign
extended from i32. By default i32 constant operands of phis are
zero extended.
This patch adds a hook to allow RISCV to override this for i32. We
have an existing isSExtCheaperThanZExt, but it operates on EVT which
we don't have at these places in the code.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122951
This was added before Zve extensions were defined. I think users
should use Zve32x or Zve32f now. Though we will lose support for limiting
ELEN to 16 or 8, but I hope no one was using that.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D123418
We're just trying to canonicalize here and won't be using the constant
value returned.
The attached test changes are because we were previously commuting
a seteq X, (splat_vector 0) because we also have (sub 0, X). The
0 is larger than the element type so we don't detect it as a splat
without the AllowTruncation flag. By preventing the commute we are
able to match it to the vmseq.vx instruction during isel. We only
look for constants on the RHS in isel.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D123256
There's an assert in LUI+SH*ADD+ADDI materialization that makes sure the
lower 12 bits aren't zero since that case should have been handled as
LUI+ADDI+SH*ADD. But nothing prevented the LUI+SH*ADD+ADDI checks from
running after the earlier code handled it.
The sequence would be the same length or longer so it wouldn't replace
the earlier sequence, but the assert happened before that was checked.
The vector holding the sequence also wasn't reset before the second
check so that guaranteed the sequence would never be found to be
shorter.
This patch fixes this by only trying the second expansion when the
earlier fails.
Fixes PR54812.
Reviewed By: benshi001
Differential Revision: https://reviews.llvm.org/D123406
SLLI is always compressible to C.SLLI as long as the source and dest
register is the same.
ANDI and SRLI are only compressible if the register is x8-x15. By
using SLLI we have a better chance of generating shorter code.
I had to exclude one exclusion for the BEXTI case so that it's
pattern match could still fire.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D123336
This patch synchronizes the structure of the templates with those
in RISCVInstrInfoVVLPatterns.td so that we get patterns with .vx
on the left hand side.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D123255
This matches VPatIntegerSetCCVL_VI_Swappable. But as noted in the
FIXME this may only be needed due to lack of canonicalization on
VP_SETCC.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D123239
This patch adds the necessary infrastructure to lower vp.fcmp via
ISD::VP_SETCC to RVV instructions.
Most notably this patch adds cond-code legalization for VP_SETCC,
reusing the existing TargetLowering::LegalizeSetCCCondCode by passing in
additional SDValue parameters for the Mask and EVL. This method then
uses VP operations to legalize the condcode.
There is still a general lack of canonicalization on VP_SETCC as opposed
to SETCC which results in worse code than is theoretically possible.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123051
This patch adds the minimum required to successfully lower vp.icmp via
the new ISD::VP_SETCC node to RVV instructions.
Regular ISD::SETCC goes through a lot of canonicalization which targets
may rely on which has not hereto been ported to VP_SETCC. It also
supports expansion of individual condition codes and a non-boolean
return type. Support for all of that will follow in later patches.
In the case of RVV this largely isn't a problem as the vector integer
comparison instructions are plentiful enough that it can lower all
VP_SETCC nodes on legal integer vectors except for boolean vectors,
which regular SETCC folds away immediately into logical operations.
Floating-point VP_SETCC operations aren't as well supported in RVV and
the backend relies on condition code expansion, so support for those
operations will come in later patches.
Portions of this code were taken from the VP reference patches.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D122743