Generate QC_INSB/QC_INSBI from `or (and X, MaskImm), OrImm` iff the
value being inserted only sets known zero bits. This is based on a
similar DAG-to-DAG transform done in `AArch64`.
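A minimal standalone sketch of the legality check, with illustrative names rather than the in-tree code: in `or (and X, MaskImm), OrImm`, the AND clears every bit where MaskImm is 0, so the fold is only safe if OrImm writes exclusively into that cleared region.
```
#include <cstdint>

// The AND forces ~MaskImm bits to zero; OrImm may only set those bits.
static bool orOnlySetsKnownZeroBits(uint64_t maskImm, uint64_t orImm) {
  uint64_t knownZero = ~maskImm;    // bits cleared by the AND
  return (orImm & ~knownZero) == 0; // equivalently, (orImm & maskImm) == 0
}
```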
This patch adds CodeGen support for the qc.insbi and qc.insb instructions
defined in the Qualcomm uC Xqcibm extension. qc.insbi and qc.insb
insert bits into the destination register from an immediate and a register
operand, respectively.
A sequence of `xor`, `and`, and `xor`, when the appropriate conditions
hold, is converted to `qc.insbi` or `qc.insb`, with the choice depending
on the immediate's value.
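For reference, the xor/and/xor sequence is the generic masked-merge idiom; a standalone sketch with illustrative names:
```
#include <cstdint>

// ((x ^ y) & m) ^ y yields x where m is 1 and y where m is 0, i.e. it
// inserts the masked bits of x into y, which is what qc.insb/qc.insbi do.
static uint32_t maskedMerge(uint32_t x, uint32_t y, uint32_t m) {
  return ((x ^ y) & m) ^ y;
}
```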
A disjoint OR can be converted to XOR, and an XOR+NOT is XNOR. The idea is
taken from #147279.
I changed the existing xnor pattern to have the not on the outside
instead of the inside. These are equivalent for xor since xor is
associative. Tablegen was already generating multiple variants
of the isel pattern using associativity.
There are some issues here. The disjoint flag isn't preserved
through type legalization. I was hoping we could recover it
manually for the masked merge cases, but that doesn't work either.
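A tiny standalone check of the two identities in play here (a disjoint OR is an XOR, and the NOT of an XOR can float to either side):
```
#include <cassert>
#include <cstdint>

int main() {
  uint32_t a = 0xF0, b = 0x0F;   // disjoint: a & b == 0
  assert((a | b) == (a ^ b));    // a disjoint OR is an XOR
  assert(~(a ^ b) == (~a ^ b));  // NOT outside == NOT inside for XOR
}
```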
When the immediate to the ORI is a ShiftedMask_32 that does not fit in
12 bits, we can use the QC.INSBI instruction instead. We do not do this
for cases where the ORI can be replaced with a BSETI, since those can be
compressed when the Xqcibm extension (which QC.INSBI is a part of) is
enabled.
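A standalone sketch of the predicate, under the assumption that the BSETI-replaceable case is exactly a single set bit; names are illustrative:
```
#include <cstdint>

// True if v is a contiguous run of ones (a shifted mask).
static bool isShiftedMask32(uint32_t v) {
  if (v == 0) return false;
  uint32_t m = (v - 1) | v;   // fill in the trailing zeros
  return ((m + 1) & m) == 0;  // what remains must be a low mask
}

// Prefer QC.INSBI for shifted masks that neither fit simm12 nor are a
// single bit (the BSETI case).
static bool preferInsbi(int32_t imm) {
  uint32_t u = (uint32_t)imm;
  bool fitsSimm12 = imm >= -2048 && imm <= 2047;
  bool singleBit = u != 0 && (u & (u - 1)) == 0;
  return isShiftedMask32(u) && !fitsSimm12 && !singleBit;
}
```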
This moves the peephole that folds a vmerge into its operands into
RISCVVectorPeephole. This will also allow us to eventually commute
instructions to allow folding; see #141885 and #70042.
Most of the test diffs are due to the slight change in instruction
ordering.
For now doPeepholeMaskedRVV is kept even though it's a duplicate of
RISCVVectorPeephole::convertToUnmasked, to minimize the diff; I plan on
removing it in a separate patch, as it causes some instructions to be
shuffled around.
Similarly, this runs foldVMergeToMask before the other peepholes to
minimize the diff for now.
rvv-peephole-vmerge-vops-mir.ll was replaced with a dedicated
vmerge-peephole.mir test.
This patch is similar to #142737
The XAndesPerf extension includes the signed bitfield extraction
instruction `NDS.BFOS`, which can extract the bits from 0 to Len - 1,
place them starting at bit Lsb, zero-fill the bits from 0 to Lsb - 1,
and sign-extend the result.
When Lsb == Msb, it is a special case where the Lsb will be set to 0
instead of being equal to the Msb.
The XAndesPerf extension includes the unsigned bitfield extraction
instruction `NDS.BFOZ`, which can extract the bits from 0 to Len - 1,
place them starting at bit Msb, and zero-fill the remaining bits.
This patch handles the cases where Msb < Lsb for `NDS.BFOZ`.
Instruction syntax:
  nds.bfoz Rd, Rs1, Msb, Lsb
The operation is:
  if Msb < Lsb:
    Lenm1 = Lsb - Msb;
    Rd[Lsb:Msb] = Rs1[Lenm1:0];
    if (Lsb < (XLen - 1)) Rd[XLen-1:Lsb+1] = 0;
    Rd[Msb-1:0] = 0;
When Len == 1, it is a special case where the Msb is set to 0 instead of
being equal to the Lsb.
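A C model of the Msb < Lsb behavior described above, assuming XLEN = 32 for brevity:
```
#include <cstdint>

// Rs1[Lsb-Msb:0] lands at Rd[Lsb:Msb]; all other bits are zero-filled.
static uint32_t bfozMsbLtLsb(uint32_t rs1, unsigned msb, unsigned lsb) {
  unsigned lenm1 = lsb - msb;                 // Lenm1 = Lsb - Msb
  uint32_t field = rs1 & ((2u << lenm1) - 1); // Rs1[Lenm1:0]
  return field << msb;                        // place at bit Msb
}
// e.g. bfozMsbLtLsb(x, 4, 11) == (x & 0xFF) << 4
```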
This avoids a bunch of complexity around making sure the offset doesn't
exceed 4093 so we can add 4 after splitting later. By splitting early,
the split loads/stores will get selected independently.
There's a bit of follow-up work to do, particularly around splitting a
constant pool load. Overall I think this is cleaner with fewer edge
cases.
This commit moves RISC-V to auto-generate its target-specific SDNode
types. The biggest change is that SDNodes can now be validated against
their expected type profiles, and that we don't need to edit several
different files when declaring a new one.
This takes Sergei's work in #119709 and "finishes" it - by moving the
final five RISCVISD opcodes into tablegen (including defining their
types), and by ensuring the tablegen has expected closing scope
comments.
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
We'd already had this transform for the intrinsics, but hadn't added it
for either fixed length or scalable vectors coming from normal IR.
For the record, the fact we have three different sets of patterns here
really is quite ugly.
llvm-mca needs some of them for #128978.
I'm relying on -ffunction-sections and -fdata-sections allowing these to
be stripped from tools that don't need them like llvm-mc.
This line was previously removed when
12d47247e5046b959af180e12f648c54e2c5e863 moved it to RISCVInstrInfo.h.
But we probably don't want to have a dangling `#define *_DECL`
(RISCVGenSearchableTables.inc will `#undef` these macros), and I think
there is no harm in putting declarations of those search table functions in
RISCVISelDAGToDAG.h.
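For reference, the usual shape of consuming a TableGen searchable table is sketched below; the table name here is illustrative, not necessarily one that exists in-tree:
```
// In a header: request declarations from the generated file. The .inc
// itself #undefs GET_*_DECL after expanding, so the macro cannot be
// left defined for later includes.
#define GET_RISCVVSETable_DECL   // illustrative table name
#include "RISCVGenSearchableTables.inc"
```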
selectFPImm previously handled cases where an FPImm could be
materialized in an integer register.
We can generalize this to cases where a value was in an integer register
and then copied to a scalar FP register to be used by a vector
instruction.
In the affected test, the call lowering code used up all of the FP
argument registers and started using GPRs. Now we use integer vector
instructions to consume those GPRs instead of moving them to scalar FP
first.
This patch handles target lowering and calling convention.
For target lowering, the vector tuple type, previously represented as
multiple scalable vectors, is now changed to a single `MVT`, and each such
`MVT` has a corresponding register class.
Loads and stores of vector tuples are handled the same way, but need
additional vector insert/extract instructions to access sub-register groups.
The inline assembly constraint for vector tuple types can be modeled
directly as "vr", which is identical to normal vector registers.
For the calling convention, it no longer needs an alternative algorithm to
handle register allocation, which makes the code easier to maintain and
read.
Stacked on https://github.com/llvm/llvm-project/pull/97994
In order to support -unaligned-scalar-mem properly, we need to be more
careful with immediates of global variables. We need to guarantee that
adding 4 in RISCVExpandingPseudos won't overflow simm12. Since we don't
know what the simm12 is until link time, the only way to guarantee this
is to make sure the base address is at least 8 byte aligned.
There were also several corner-case bugs in immediate folding where we
would fold an immediate in the range [2044,2047], where adding 4 would
overflow. These are not related to unaligned-scalar-mem.
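A standalone sketch of the overflow hazard; the helper name is made up for illustration:
```
#include <cstdint>

// Expanding a split access later adds 4 to the folded offset, and simm12
// tops out at 2047, so offsets in [2044, 2047] must not be folded.
static bool offsetSafeToFold(int64_t off) {
  bool fitsNow   = off >= -2048 && off <= 2047;
  bool fitsPlus4 = off + 4 >= -2048 && off + 4 <= 2047;
  return fitsNow && fitsPlus4;
}
```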
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so the AArch64 backend can work properly
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1
It still breaks EXPENSIVE_CHECKS build. Sorry.
Port SelectionDAG ISel to the new pass manager.
Only `AMDGPU` and `X86` support the new pass version. In the new pass
manager, `-verify-machineinstrs` belongs to the verify instrumentation,
which is enabled by default.
This follows the same pattern as 20e62658735a1b03ecadc. Although we
can't reduce the number of instructions used, if we are able to use a
sign-extended 6-bit immediate then the 16-bit c.li instruction can be
selected (thus saving code size). Although this _could_ be gated so it
only happens if C is enabled, I've opted not to because at worst it's
neutral and it doesn't seem helpful to add unnecessary divergence
between the RVC and non-RVC paths.
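A sketch of the immediate gate being described, with an illustrative helper name:
```
#include <cstdint>

// c.li takes a sign-extended 6-bit immediate, i.e. [-32, 31]; hitting it
// shrinks the materialization from a 4-byte li to a 2-byte c.li.
static bool fitsSimm6(int64_t imm) { return imm >= -32 && imm <= 31; }
```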
This transformation might be illegal for `PseudoVIOTA_M`. The value of
`viota.m vd, vs2` is the prefix sum of vs2, and adding a mask to it may
produce a wrong prefix sum.
For example, the result of the following sequence is `{5, 5, 5, 3}`,
```
; v4 = {1, 1, 1, 1}
viota.m v1, v4
; v0 = {0, 0, 0, 1}, v1 = {0, 1, 2, 3}, v8 = {5, 5, 5, 5}
vmerge.vvm v8, v8, v1, v0.t
; v8 = {5, 5, 5, 3}
```
but if we merge them to `viota.m v8, v4, v0.t`, then the result is
`{5, 5, 5, 0}`.
Also, we still perform `performCombineVMergeAndVOps` for `viota.m` when the
mask of `vmerge.vvm` is a true mask.
A new ComplexPattern `AddrRegImmLsb00000` is added, which is like
`AddrRegImm` except that if the least significant 5 bits aren't all
zeros, we fall back to offset 0.
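The offset condition the new ComplexPattern encodes, as a standalone sketch (helper name is illustrative):
```
#include <cstdint>

// The folded offset must have its low 5 bits clear; otherwise the
// pattern falls back to offset 0 and keeps the address in the base.
static bool lsb00000(int64_t off) { return (off & 0x1F) == 0; }
```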
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or
s/CGFT_/CodeGenFileType::
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits for reviewers to better see what's changed. Will squash
when merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same-block case, not the cross-block case. This led to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282.
This change is a slightly ugly hack to sidestep the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to undef tied registers.
We may end up backporting this into the 17.x release branch. Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.
Differential Revision: https://reviews.llvm.org/D156909
Similar to D155698 where the shift amount is extended, this patch extends the
ComplexPattern to handle the case where the shift amount has been truncated.
Truncations are custom lowered to truncate_vector_vl, and in cases like i64 ->
i16 they are truncated by one power of two at a time, so we need to unravel
nested layers of them.
The pattern can also be reused for Zvbb's vwsll.vx in an upcoming patch.
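A toy sketch of the unravelling, with an illustrative stand-in for SDNode rather than the real API: each truncate_vector_vl halves the element width, so an i64 -> i16 shift amount arrives as two nested layers that must be peeled off.
```
struct Node { int opcode; Node *op0; };  // toy stand-in for an SDNode
constexpr int TRUNCATE_VECTOR_VL = 1;    // illustrative opcode tag

// Walk through nested truncates until the opcode changes.
static Node *peelTruncates(Node *n) {
  while (n->opcode == TRUNCATE_VECTOR_VL)
    n = n->op0;
  return n;
}
```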
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155928
We're currently only matching scalar shift amounts where the type is the same
as the vector element type. But because only the bottom log2(2*SEW) bits are
used (at most 7 bits, since SEW is at most 64), we can use any scalar type
>= i8.
This patch adds patterns for the case above, as well as for when the shift
amount type is the same as the widened element type and doesn't need to be
extended.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155698
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This is analogous to other patches in the series, but with one key difference - the resulting pseudo does *not* have a policy operand. We could add one for vmerge, but some of the multiclasses are sufficiently entwined with the mask-producing arithmetic instructions that the change delta becomes unmanageable. Note that these instructions are *not* in the RISCVMaskedPseudo table, and thus the difference doesn't complicate other code. The main value of working incrementally here is that we get to eagerly clean up the IsTA logic flowing through the post-ISEL combines.
Differential Revision: https://reviews.llvm.org/D154645
After D154245 lands, we have greatly simplified the possible configurations for an entry in the RISCVMaskedPseudo table. This change goes through and reworks everything which uses that table to exploit the available simplifications.
To justify the correctness here, let me note that we no longer had any use of HasTU=true. We were left with only the HasTU=false and IsCombined=true|false cases. The only use of IsCombined=false was for the comparison operations. At the moment, these operations are the only ones in the table without vector policy operands. Instead of switching on the pseudo value, we can just check the VecPolicy flag.
It may be worth adding a passthru operand to the comparisons (which is actually needed to represent tail undefined vs tail agnostic), and a vector policy operand (which is strictly unneeded) just for consistency, but we can do that in a follow up patch for some further simplification if desired.
Note that we do have a few _TU pseudos left at this point. It's simply that none of them are in the RISCVMaskedPseudo table, and thus don't participate in our post-ISEL transforms.
Differential Revision: https://reviews.llvm.org/D154620
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This change targets all the pseudos used in loads (unit, strided, segmented, fault first, and their combinations). As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand.
One quirk is that I went ahead and treated the unmasked mask load instruction (vlm) the same way. We need the passthru operand to model tail undefined, but since the instruction is unconditionally agnostic and has no mask, the policy operand is arguably unneeded. I kept it mostly for consistency's sake.
Another quirk worth highlighting is that segment loads require a bit of dedicated handling. Surprisingly, we don't have IMPLICIT_DEF nodes of the right types, and attempting to use them results in some odd looking codegen and a few crashes. Instead, I left the REG_SEQUENCE form, and extended InsertVSETVLI to recognize the complex undefs. Arguably, we should probably revisit the handling of undef reg_sequence nodes here, but I'm hoping to side step that in this patch.
As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions. I did have to delete one register allocation regression test as I couldn't figure out how to meaningfully update it. I spent a significant amount of time trying, and finally gave up.
Differential Revision: https://reviews.llvm.org/D154141
No idea what I was thinking when I suggested vadd.vi.
Reviewed By: reames, frasercrmck, fakepaper56
Differential Revision: https://reviews.llvm.org/D152553
To do this we need to remove the always-matching behavior from condop.
This requires us to add more 'select' isel patterns with a bare GPR
as the condition.
Rename condop/invcondop to riscv_setne/riscv_seteq.
This centralizes the ADDI/XORI/XOR tricks into one place.