This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
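
A minimal sketch of those two substitutions applied to a string of source text. The helper name `migrate_codegen_enums` is hypothetical, and the list of old enumerator names is an assumption extrapolated from the `Aggressive` example; other spellings under the old namespace would need separate handling.

```python
import re

# Assumed enumerator names of the old CodeGenOpt namespace; adjust as needed.
OPT_LEVELS = ("None", "Less", "Default", "Aggressive")

def migrate_codegen_enums(source, opt_levels=OPT_LEVELS):
    """Apply both renames to a string of C++ source text."""
    names = "|".join(re.escape(n) for n in opt_levels)
    # CodeGenOpt::<level> -> CodeGenOptLevel::<level>
    source = re.sub(rf"\bCodeGenOpt::({names})\b", r"CodeGenOptLevel::\1", source)
    # CGFT_<name> -> CodeGenFileType::<name>
    source = re.sub(r"\bCGFT_", "CodeGenFileType::", source)
    return source
```

Running the helper twice gives the same result as running it once, since neither pattern matches the already-migrated spellings.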
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits so reviewers can better see what's changed. Will squash when
merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
If the high and low 32 bits are the same, we try to use
(ADD X, (SLLI X, 32)) but that only works if bit 31 is clear since
the low 32 bits will be sign extended.
If we have Zba we can use add.uw to zero the sign extended bits.
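
The arithmetic can be checked with a small model of 64-bit registers. This is only a sketch: the function names are hypothetical, and it assumes X holds the low 32 bits sign-extended to 64 bits, as described above.

```python
MASK64 = (1 << 64) - 1

def sext32(x):
    """Sign-extend the low 32 bits of x to 64 bits (returned as an unsigned int)."""
    x &= 0xFFFFFFFF
    return (x | 0xFFFFFFFF00000000) if x & 0x80000000 else x

def add_slli(lo32):
    """Model (ADD X, (SLLI X, 32)) with X = sign-extended low half."""
    x = sext32(lo32)
    return (x + ((x << 32) & MASK64)) & MASK64

def add_uw_slli(lo32):
    """Model add.uw: zero-extend the low 32 bits of X, then add (SLLI X, 32)."""
    x = sext32(lo32)
    return ((x & 0xFFFFFFFF) + ((x << 32) & MASK64)) & MASK64

def repeated(lo32):
    """The constant we want: the same 32-bit pattern in both halves."""
    return ((lo32 << 32) | lo32) & MASK64
```

With bit 31 clear (e.g. 0x12345678) both forms produce the repeated constant. With bit 31 set (e.g. 0x80000001) the plain ADD form is off in the upper half, while the add.uw form is still correct.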
Reviewed By: reames, wangpc
Differential Revision: https://reviews.llvm.org/D159253
This re-implements the special casing we had in lowerScalarSplat as a DAG combine. As can be seen in the tests, this ends up triggering in a bunch more cases.
The semantically interesting bit of this change is the use of the implicit truncate semantics for when XLEN > SEW. We'd already been doing this for vmv.v.x, but this change extends e.g. the constant matching to make the same assumption about vmv.s.x. Per my reading of the specification, this should be fine, and if anything, is more obviously true of vmv.s.x than vmv.v.x.
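
As a rough illustration of the implicit truncation assumed here (a sketch with hypothetical helper names, not the actual matching code): when XLEN > SEW only the low SEW bits of the scalar reach the element, so two scalar constants that agree in those bits are interchangeable.

```python
def element_value(scalar, sew):
    """Value written to an element when an XLEN-wide scalar is truncated to SEW bits."""
    return scalar & ((1 << sew) - 1)

def same_element(a, b, sew):
    """True if scalars a and b produce the same SEW-bit element."""
    return element_value(a, sew) == element_value(b, sew)
```

For example, 0x1FF and 0xFF are the same element at SEW=8 but differ at SEW=16.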
Differential Revision: https://reviews.llvm.org/D158874
This patch shares the logic between the various splat ComplexPatterns to help
the diff in some upcoming patches.
It's worth noting that the uimm splat pattern now takes into account the
implicit truncation + sign extend semantics of vmv_v_x_vl, but that doesn't
seem to affect the result since it always took the sext value anyway.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D158741
When folding a vmerge into its operands, if the resulting VL is smaller than
what the vmerge had originally, then what was previously in its body gets
moved to the tail. In that case, we can't relax the tail policy to agnostic
when the merge operand is undefined, since we need to preserve these elements
past the new VL.
Fixes https://github.com/llvm/llvm-project/issues/64754
Reviewed By: craig.topper, reames
Differential Revision: https://reviews.llvm.org/D158161
In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This led to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282.
This change is a slightly ugly hack to side step the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.
We may end up backporting this into the 17.x release branch. Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.
Differential Revision: https://reviews.llvm.org/D156909
Part of this test file was stolen from D156895. We should merge them
when committing.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D156926
We can use the x0 form of vsetvli when we already know the vlmax and the avl is equal to it.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156404
Similar to D155698 where the shift amount is extended, this patch extends the
ComplexPattern to handle the case where the shift amount has been truncated.
Truncations are custom lowered to truncate_vector_vl, and in cases like i64 ->
i16 they are truncated by one power of two at a time, so we need to unravel
nested layers of them.
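
To make the nesting concrete, here is a small sketch (the function name is hypothetical) that lists the element widths visited when a truncation is split into steps of one power of two each:

```python
def truncate_steps(src_bits, dst_bits):
    """Element widths visited when narrowing by one power of two per step."""
    for bits in (src_bits, dst_bits):
        if bits <= 0 or bits & (bits - 1):
            raise ValueError("widths must be powers of two")
    if dst_bits > src_bits:
        raise ValueError("destination must not be wider than source")
    widths = [src_bits]
    while widths[-1] > dst_bits:
        widths.append(widths[-1] // 2)
    return widths
```

For i64 -> i16 this gives the widths 64, 32, 16, i.e. two nested truncate layers to look through.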
The pattern can also be reused for Zvbb's vwsll.vx in an upcoming patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155928
We're currently only matching scalar shift amounts where the type is the same
as the vector element type. But because only the bottom log2(2*SEW) bits are
used, only 7 bits will be used at most so we can use any scalar type >= i8.
This patch adds patterns for the case above, as well as for when the shift
amount type is the same as the widened element type and doesn't need to be extended.
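
The bit count can be checked directly. This sketch (with a hypothetical helper name) computes log2(2*SEW) for the standard element widths:

```python
def shift_amount_bits(sew):
    """Number of low bits of the shift amount that are used: log2(2 * SEW)."""
    return (2 * sew).bit_length() - 1

# Bits used for each element width 8, 16, 32, 64.
BITS_BY_SEW = {sew: shift_amount_bits(sew) for sew in (8, 16, 32, 64)}
```

The largest value is 7 (at SEW=64), which fits in an i8.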
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155698
A vmv.v.v shares the same encoding as a vmerge that isn't masked, so we can
also fold it into its operands if we treat it as a vmerge with an all-ones
mask. We take care here not to actually transform the existing vmv into a
vmerge, otherwise things like True.hasOneUse() become inaccurate. Instead this
just returns an equivalent list of operands.
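
The equivalence can be seen in a simplified element-wise model. This is a sketch only: it ignores VL, tail elements and the passthru operand, and the function names are hypothetical.

```python
def vmerge(mask, false_vals, true_vals):
    """Element-wise select: take the true value where the mask bit is set."""
    return [t if m else f for m, f, t in zip(mask, false_vals, true_vals)]

def vmv_v_v(src):
    """Plain vector copy."""
    return list(src)

def as_all_ones_vmerge(src, false_vals):
    """View a vmv.v.v of src as a vmerge whose mask is all ones."""
    return vmerge([True] * len(src), false_vals, src)
```

With an all-ones mask the false operand never contributes, so the result matches the copy regardless of what the false operand holds.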
This is an alternative to D153351.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D155101
Currently when folding vmerge into its operands, we stop if the VLs aren't
identical. However since the body of (vmerge (vop)) is the intersection of
vmerge and vop's bodies, we can use the smaller of the two VLs if we know it
ahead of time. This patch relaxes the constraint on VL if they are both
constants, or if either of them is VLMAX.
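
The relaxed rule can be sketched as follows. This is an illustration under assumptions, not the actual implementation: constant VLs are modelled as Python ints, non-constant VLs as register-name strings, and `VLMAX` is a hypothetical sentinel object.

```python
VLMAX = object()  # hypothetical sentinel standing for "VL is VLMAX"

def folded_vl(merge_vl, op_vl):
    """Return the VL to use after folding, or None if the fold must be skipped."""
    if merge_vl is op_vl or merge_vl == op_vl:
        return merge_vl                      # identical VLs: always fine
    if merge_vl is VLMAX:
        return op_vl                         # the other VL cannot be larger
    if op_vl is VLMAX:
        return merge_vl
    if isinstance(merge_vl, int) and isinstance(op_vl, int):
        return min(merge_vl, op_vl)          # both constants: pick the smaller
    return None                              # unknown relation: do not fold
```

For two different non-constant VLs the smaller one is not known ahead of time, so the sketch declines to fold.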
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D155071
We can share the code for both the unmasked and masked cases, and add a missing consistency assert in the process.
This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct. This is the last major piece.
This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.
The code structure here is overly verbose. I'm landing this staging change with the code structure exactly matching the non-masked case to make the following cleanup that commons this all obviously correct.
This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.
This particular change involves a slightly ugly bit of code to match the glue to the mask. I'm staging it this way as I ran into a bit of weirdness when commoning mask operands, and wanted to isolate the complexity.
Very minor change, just making sure each step is obvious and easy to follow.
This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.
We have the SEW operand access repeating in all paths, common it up to make the code easier to read.
This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.
This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.
In this case, we're simplifying based on the assumption that all of our vmerge operands have mask operands. This is a fundamental property of a vmerge.
We are already checking for fp exceptions if VL changes, but I believe we
should also be checking for them if the mask changes as well, since that also
affects the set of active elements. From the spec:
> A vector floating-point exception at any active floating-point element sets
> the standard FP exception flags in the fflags register. Inactive elements do
> not set FP exception flags.
Note that we don't change the mask if IsMasked is true, i.e. True is masked
already, since in that case we keep the existing mask.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D154980
Depends on D152879.
Specification PR: riscv-non-isa/rvv-intrinsic-doc#226
This patch adds a variant of `vfadd` that models the rounding mode control.
The added variant has the suffix `_rm` appended to differentiate it from the
existing ones, which do not alter `frm` and use whatever value it already holds.
The value `7` is used to indicate no rounding mode change, reusing the
semantics of the rounding mode encoding for scalar floating-point
instructions.
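
For reference, a sketch of the scalar rounding mode encoding that the value `7` is borrowed from. The `FRM` class and the helper name are hypothetical; the numeric values follow the RISC-V scalar floating-point encoding, where 7 is the dynamic mode that defers to the `frm` register.

```python
from enum import IntEnum

class FRM(IntEnum):
    RNE = 0  # round to nearest, ties to even
    RTZ = 1  # round towards zero
    RDN = 2  # round down
    RUP = 3  # round up
    RMM = 4  # round to nearest, ties to max magnitude
    DYN = 7  # dynamic: use whatever is in frm

def changes_rounding_mode(rm_operand):
    """True if the operand asks for a specific rounding mode rather than DYN."""
    return FRM(rm_operand) is not FRM.DYN
```

Values 5 and 6 are reserved in that encoding, so `FRM(5)` raises `ValueError` in this sketch.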
An additional data member `HasFRMRoundModeOp` is added so we can append the
`_rm` suffix for the fadd variants that model rounding mode control.
Additional data member `IsRVVFixedPoint` is added so we can define
pseudo instructions with rounding mode operand and distinguish the
instructions between fixed-point and floating-point.
Reviewed By: craig.topper, kito-cheng
Differential Revision: https://reviews.llvm.org/D152996
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This is analogous to other patches in the series, but with one key difference - the resulting pseudo does *not* have a policy operand. We could add one for vmerge, but some of the multiclasses are sufficiently entwined with the mask producing arithmetic instructions that the change delta becomes unmanageable. Note that these instructions are *not* in the RISCVMaskedPseudo table, and thus the difference doesn't complicate other code. The main value of working incrementally here is that we get to eagerly clean up the IsTA logic flowing through the post-ISEL combines.
Differential Revision: https://reviews.llvm.org/D154645
After D154245 lands, we have greatly simplified the possible configurations for an entry in the RISCVMaskedPseudo table. This change goes through and reworks everything which uses that table to exploit the available simplifications.
To justify the correctness here, let me note that we no longer had any use of HasTU=true. We were left with only the HasTU=false, and IsCombined=true|false cases. The only usage of IsCombined=false was for the comparison operations. At the moment, these operations are the only ones in the table without vector policy operands. Instead of switching on the pseudo value, we can just check the VecPolicy flag instead.
It may be worth adding a passthru operand to the comparisons (which is actually needed to represent tail undefined vs tail agnostic), and a vector policy operand (which is strictly unneeded) just for consistency, but we can do that in a follow up patch for some further simplification if desired.
Note that we do have a few _TU pseudos left at this point. It's simply that none of them are in the RISCVMaskedPseudo table, and thus don't participate in our post-ISEL transforms.
Differential Revision: https://reviews.llvm.org/D154620
We try to fold a RISCVISD::VMV_V_X_VL series node + scalar load -> vector load.
But if the scalar load is an indexed load (load update form), the fold is not profitable because the load update node can't be removed after the fold.
Differential Revision: https://reviews.llvm.org/D152222
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This change targets all the pseudos used in loads (unit, strided, segmented, fault first, and their combinations). As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand.
One quirk is that I went ahead and treated the unmasked mask load instruction (vlm) the same way. We need the pass thru operand to model tail undefined, but since the instruction is unconditionally agnostic and the instruction has no mask, the policy operand is arguably unneeded. I kept it mostly for consistency's sake.
Another quirk worth highlighting is that segment loads require a bit of dedicated handling. Surprisingly, we don't have IMPLICIT_DEF nodes of the right types, and attempting to use them results in some odd looking codegen and a few crashes. Instead, I left the REG_SEQUENCE form, and extended InsertVSETVLI to recognize the complex undefs. Arguably, we should probably revisit the handling of undef reg_sequence nodes here, but I'm hoping to side step that in this patch.
As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions. I did have to delete one register allocation regression test as I couldn't figure out how to meaningfully update it. I spent a significant amount of time trying, and finally gave up.
Differential Revision: https://reviews.llvm.org/D154141
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295. In D153155, we started removing the legacy distinction between unsuffixed (TA) and _TU pseudos. This patch continues that effort for the unary instruction families.
The change consists of a few interacting pieces:
* Adding a vector policy operand to VPseudoUnaryNoMaskTU.
* Then using VPseudoUnaryNoMaskTU for all cases where VPseudoUnaryNoMask was previously used and deleting the unsuffixed form.
* Then renaming VPseudoUnaryNoMaskTU to VPseudoUnaryNoMask, and adjusting the RISCVMaskedPseudo table to use the combined pseudo.
* Fixing up two places in C++ code which manually construct VMV_V_* instructions.
Normally, I'd try to factor this into a couple of changes, but in this case, the table structure is tied to naming and thus we can't really separate the otherwise NFC bits.
As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions.
Differential Revision: https://reviews.llvm.org/D153899
There is an issue: https://github.com/llvm/llvm-project/issues/63515
The issue arises because, when expanding a SPLAT_VECTOR_SPLIT_I64_VL node, only the memoperand is used to create the dependency.
However, in ScheduleDAGNodes the dependency is checked via the chain only, which breaks the order of the store/load instructions.
I think that in the llvm.bitreverse.nxv2i64 intrinsic the SPLAT_VECTOR_SPLIT_I64_VL nodes are processed in parallel,
so no chain should be added to these nodes.
Using a temporary when expanding the SPLAT_VECTOR_SPLIT_I64_VL node ensures the vlse instruction gets the correct value
no matter how the order of the store instructions is changed.
Differential Revision: https://reviews.llvm.org/D153743
The code was using the tail policy being "agnostic" to select an instruction whose semantics were "undefined". This was almost always fine (as the pass through operand was usually implicit_def), but could in theory lead to a miscompile. I don't actually have a test case as it requires a later transform to exploit the wrong tail policy state, and I couldn't easily figure out how to get vsetvli insertion to miscompile given the wrong state. This was spotted by inspection, and it may be a miscompile in theory only at the moment.
Note that this may cause regressions if there are instructions for which we either don't have a _TU pseudo form, or the _TU pseudo form is missing a policy operand. When I was first looking at this, I saw exactly that, and D153067 exists to add the missing policy operand I noticed.
As a later follow up, I want to always force the use of _TU, but it seemed good to fix the bug, then drive the _TU transition in a separate patch.
Differential Revision: https://reviews.llvm.org/D153070
We can just explicitly check if the new unmasked pseudo takes a policy
op, rather than implicitly relying on I->UnmaskedTUPseudo ==
I->UnmaskedPseudo. Split out from another patch to make the diff more
readable.
Differential Revision: https://reviews.llvm.org/D152961
We already treat -1 passed to instruction intrinsics as vlmax; this
makes vsetvli consistent.
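
A minimal sketch of the convention, with a hypothetical helper name. It only models how the -1 sentinel is interpreted, not how the hardware picks vl from a given AVL.

```python
def effective_avl(avl, vlmax):
    """Interpret an AVL operand, treating -1 as a request for VLMAX."""
    return vlmax if avl == -1 else avl
```

So `effective_avl(-1, 16)` yields 16, while any other AVL is passed through unchanged.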
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D152954
This is after lowering of undef to IMPLICIT_DEF, so the condition is always false. Rather than fixing the intent (which was to match implicit_def per the comment), just delete it. We're in the process of migrating away from the TA pseudos, so using _TA more often is fine.
This patch teaches performCombineVMergeAndVOps how to handle a True instruction (the one being merged into) which is a _TU pseudo, but with an implicit_def passthrough operand. These are semantically equivalent to the unsuffixed "TA" pseudos, and we can handle them as such.
This is a companion to D152380, and demonstrates the unsuffixed to _TA pseudo transition for a non-VMERGE case. Between the two of them, these should cover all the changes required to the post-ISEL combines, and other arithmetic-like instructions should be just TD changes.
See https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295 for context on the patch series.
Differential Revision: https://reviews.llvm.org/D152740
This is the first patch in a series to change how we represent tail agnostic, tail undefined, and tail undisturbed operations. In current code, we tend to use an unsuffixed pseudo for undefined (despite calling it TA most places in code), and the _TU form for both agnostic and undisturbed (via the policy operand).
The key observation behind this patch is that we can represent tail undefined via a pseudo with a passthrough operand if that operand is IMPLICIT_DEF (aka undef). We already have a few instances of this in tree - see vmv.s.x and vslide* - but we can do this more universally. Once complete, we will be able to delete roughly 1/3 of our vector pseudo classes.
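
The resulting representation can be summarised by a small classification sketch. The names are hypothetical and this mirrors only the description above: an undefined passthrough means the tail is undefined, otherwise the policy operand chooses between agnostic and undisturbed.

```python
from enum import Enum

class Tail(Enum):
    UNDEFINED = "undefined"
    AGNOSTIC = "agnostic"
    UNDISTURBED = "undisturbed"

def tail_semantics(passthru_is_implicit_def, policy_is_tail_agnostic):
    """Classify the tail behaviour of a unified pseudo with passthru + policy."""
    if passthru_is_implicit_def:
        # Nothing to preserve: the tail elements are simply undefined.
        return Tail.UNDEFINED
    # A real passthru value exists; the policy operand decides its fate.
    return Tail.AGNOSTIC if policy_is_tail_agnostic else Tail.UNDISTURBED
```

Under this scheme a single pseudo covers all three cases, which is what allows the separate unsuffixed form to be retired.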
A bit more information on the overall goal can be found in this discourse post: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This patch doesn't actually remove the legacy unsuffixed pseudo as there's still some path from intrinsic lowering which uses it. (I have not yet located it.) This also means we don't have to modify any of the lookup tables which makes the migration simpler. We can defer deleting the tables and pseudos until one final change once all the instructions have been migrated.
There are a couple of regressions in the tests. At first, these concerned me, but it turns out that all of them are differences in expansion of a single source level instruction. I think we can safely ignore this for the moment. I did explore changing the handling of IMPLICIT_DEF in ScheduleDAG, but that causes an absolutely *massive* test diff with minimal profit. I really don't think it's worth doing.
Differential Revision: https://reviews.llvm.org/D152380
This was used to know if we need to insert a dummy operand during
MCInstLowering. We can use the operand info from MCInstrDesc to
figure this out without needing a separate flag.
I'll remove the tablegen bits if there is consensus this is a good
idea.
Differential Revision: https://reviews.llvm.org/D152050
No idea what I was thinking when I suggested vadd.vi.
Reviewed By: reames, frasercrmck, fakepaper56
Differential Revision: https://reviews.llvm.org/D152553