This transformation might be illegal for `PseudoVIOTA_M`. The result of
`viota.m vd, vs2` is the prefix sum of vs2, and folding a mask into it
may produce the wrong prefix sum.
For example, the result of the following sequence is `{5, 5, 5, 3}`,
```
; v4 = {1, 1, 1, 1}
viota.m v1, v4
; v0 = {0, 0, 0, 1}, v1 = {0, 1, 2, 3}, v8 = {5, 5, 5, 5}
vmerge.vvm v8, v8, v1, v0
; v8 = {5, 5, 5, 3}
```
but if we fold them into `viota.m v8, v4, v0.t`, the result is
`{5, 5, 5, 0}`, since the masked viota.m counts only the set bits of v4
at positions where the mask is active.
Also, we still do performCombineVMergeAndVOps for `viota.m` when the
mask of `vmerge.vvm` is an all-ones mask.
A new ComplexPattern `AddrRegImmLsb00000` is added, which is like
`AddrRegImm` except that if the least significant 5 bits aren't all
zeros, we fall back to offset 0.
This will make it easier for callers to spot and fix up calls to
createTargetMachine after a future change to the parameters of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive/
or s/CGFT_/CodeGenFileType::/
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits so reviewers can better see what's changed. Will
squash when merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This led to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282.
This change is a slightly ugly hack to side step the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.
We may end up backporting this into the 17.x release branch. Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.
Differential Revision: https://reviews.llvm.org/D156909
Similar to D155698 where the shift amount is extended, this patch extends the
ComplexPattern to handle the case where the shift amount has been truncated.
Truncations are custom lowered to truncate_vector_vl, and in cases like i64 ->
i16 they are truncated by one power of two at a time, so we need to unravel
nested layers of them.
The pattern can also be reused for Zvbb's vwsll.vx in an upcoming patch.
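As a hedged illustration (the function and types are mine, not a test from the patch), the scalar amount below is splatted at i64 and truncated to the i16 element type; the i64 -> i16 vector truncate lowers to nested truncate_vector_vl nodes that the ComplexPattern now looks through, which is safe because only the low log2(2*SEW) bits of the amount matter:
```
define <vscale x 4 x i32> @widen_shl_trunc(<vscale x 4 x i16> %x, i64 %amt) {
  %head = insertelement <vscale x 4 x i64> poison, i64 %amt, i32 0
  %splat = shufflevector <vscale x 4 x i64> %head, <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
  ; i64 -> i16 is truncated one power of two at a time (i64 -> i32 -> i16).
  %trunc = trunc <vscale x 4 x i64> %splat to <vscale x 4 x i16>
  %xext = zext <vscale x 4 x i16> %x to <vscale x 4 x i32>
  %aext = zext <vscale x 4 x i16> %trunc to <vscale x 4 x i32>
  %shl = shl <vscale x 4 x i32> %xext, %aext
  ret <vscale x 4 x i32> %shl
}
```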
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155928
We're currently only matching scalar shift amounts where the type is the same
as the vector element type. But because only the bottom log2(2*SEW) bits are
used (at most 7 bits), we can use any scalar type >= i8.
This patch adds patterns for the case above, as well as for when the shift
amount type is the same as the widened element type and doesn't need to be
extended.
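A hedged sketch of the second case (the function and types are mine): the splatted amount already has the widened element type, so no extension is needed, and only log2(2*SEW) = 5 bits of it are used for SEW=16 widened to 32:
```
define <vscale x 4 x i32> @widen_shl_i32_amt(<vscale x 4 x i16> %x, i32 %amt) {
  %head = insertelement <vscale x 4 x i32> poison, i32 %amt, i32 0
  %splat = shufflevector <vscale x 4 x i32> %head, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
  %xext = zext <vscale x 4 x i16> %x to <vscale x 4 x i32>
  %shl = shl <vscale x 4 x i32> %xext, %splat
  ret <vscale x 4 x i32> %shl
}
```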
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155698
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This is analogous to other patches in the series, but with one key difference - the resulting pseudo does *not* have a policy operand. We could add one for vmerge, but some of the multiclasses are sufficiently entwined with the mask-producing arithmetic instructions that the change delta becomes unmanageable. Note that these instructions are *not* in the RISCVMaskedPseudo table, and thus the difference doesn't complicate other code. The main value of working incrementally here is that we get to eagerly clean up the IsTA logic flowing through the post-ISEL combines.
Differential Revision: https://reviews.llvm.org/D154645
After D154245 lands, we have greatly simplified the possible configurations for an entry in the RISCVMaskedPseudo table. This change goes through and reworks everything which uses that table to exploit the available simplifications.
To justify the correctness here, let me note that we no longer had any use of HasTU=true. We were left with only the HasTU=false and IsCombined=true|false cases. The only use of IsCombined=false was for the comparison operations. At the moment, these operations are the only ones in the table without vector policy operands. Instead of switching on the pseudo value, we can just check the VecPolicy flag.
It may be worth adding a passthru operand to the comparisons (which is actually needed to represent tail undefined vs tail agnostic), and a vector policy operand (which is strictly unneeded) just for consistency, but we can do that in a follow up patch for some further simplification if desired.
Note that we do have a few _TU pseudos left at this point. It's simply that none of them are in the RISCVMaskedPseudo table, and thus don't participate in our post-ISEL transforms.
Differential Revision: https://reviews.llvm.org/D154620
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This change targets all the pseudos used in loads (unit, strided, segmented, fault first, and their combinations). As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand.
One quirk is that I went ahead and treated the unmasked mask load instruction (vlm) the same way. We need the passthru operand to model tail undefined, but since the instruction is unconditionally agnostic and the instruction has no mask, the policy operand is arguably unneeded. I kept it mostly for consistency's sake.
Another quirk worth highlighting is that segment loads require a bit of dedicated handling. Surprisingly, we don't have IMPLICIT_DEF nodes of the right types, and attempting to use them results in some odd looking codegen and a few crashes. Instead, I left the REG_SEQUENCE form, and extended InsertVSETVLI to recognize the complex undefs. Arguably, we should probably revisit the handling of undef reg_sequence nodes here, but I'm hoping to side step that in this patch.
As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions. I did have to delete one register allocation regression test as I couldn't figure out how to meaningfully update it. I spent a significant amount of time trying, and finally gave up.
Differential Revision: https://reviews.llvm.org/D154141
No idea what I was thinking when I suggested vadd.vi.
Reviewed By: reames, frasercrmck, fakepaper56
Differential Revision: https://reviews.llvm.org/D152553
To do this we need to remove the always matching behavior from condop.
This requires us to add more 'select' isel patterns with a bare GPR
as the condition.
Rename condop/invcondop to riscv_setne/riscv_seteq.
This centralizes the ADDI/XORI/XOR tricks into one place.
XVentanaCondOps check the condition operand for zero or non-zero.
We use this to optimize seteq/setne that would otherwise become
xor/xori/addi+snez/seqz. These patterns avoid the snez/seqz.
This patch adds two ComplexPatterns to match the various cases and
emit the xor/xori/addi instruction.
These patterns can also be used by D144681.
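For instance (a hedged sketch of my own; the register choices and exact schedule are illustrative), a select whose condition is seteq against a constant can fold the comparison into the ComplexPattern's xori and skip the seqz:
```
define i64 @sel(i64 %a, i64 %b, i64 %c) {
  %cc = icmp eq i64 %a, 3
  %sel = select i1 %cc, i64 %b, i64 %c
  ret i64 %sel
}
; Possible XVentanaCondOps selection, with no seqz:
;   xori      a0, a0, 3     ; a0 == 0 iff %a == 3
;   vt.maskcn a1, a1, a0    ; a1 = (a0 == 0) ? %b : 0
;   vt.maskc  a0, a2, a0    ; a0 = (a0 != 0) ? %c : 0
;   or        a0, a0, a1
```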
Reviewed By: philipp.tomsich
Differential Revision: https://reviews.llvm.org/D144700
The XTHeadBb extension has both signed and unsigned bitfield
extraction instructions (TH.EXT and TH.EXTU, respectively), which were
previously only supported for sign extension on byte, halfword,
and word boundaries.
This adds the infrastructure to use TH.EXT and TH.EXTU for arbitrary
bitfield extraction.
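For example (a hedged sketch of mine, not a test from the patch; TH.EXT is written here as rd, rs1, msb, lsb), an arbitrary sign-extended bitfield extract can now select to a single instruction:
```
define i64 @bfe(i64 %x) {
  ; Extract bits 15:8 of %x, sign-extended.
  %shl = shl i64 %x, 48
  %shr = ashr i64 %shl, 56
  ret i64 %shr
}
; With XTHeadBb:
;   th.ext a0, a0, 15, 8
```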
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144229
There is no compressed form of ORI but there is a compressed form
for ADDI.
This also works for XORI since DAGCombine will turn an Xor with
disjoint bits into an Or.
Note: The compressed forms require a simm6 immediate, but I'm doing
this for the full simm12 range.
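A hedged example of the idea (my own): when the set bits of the immediate are known zero in the operand, OR and ADD compute the same result, and ADDI may compress:
```
define i64 @set_low_bit(i64 %x) {
  ; Bit 0 of %y is known zero after the shift, so the OR behaves like an
  ; ADD and can be selected as addi (compressible to c.addi when the
  ; destination matches the source and the immediate fits in simm6),
  ; whereas ori has no compressed form.
  %y = shl i64 %x, 1
  %z = or i64 %y, 1
  ret i64 %z
}
```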
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D140674
Follow up to the series:
1. https://reviews.llvm.org/D140161
2. https://reviews.llvm.org/D140349
3. https://reviews.llvm.org/D140331
4. https://reviews.llvm.org/D140323
Completes the work from the previous two for remaining targets.
This creates the following named passes that can be run via
`llc -{start|stop}-{before|after}`:
- arc-isel
- arm-isel
- avr-isel
- bpf-isel
- csky-isel
- hexagon-isel
- lanai-isel
- loongarch-isel
- m68k-isel
- msp430-isel
- mips-isel
- nvptx-isel
- ppc-codegen
- riscv-isel
- sparc-isel
- systemz-isel
- ve-isel
- wasm-isel
- xcore-isel
A nice way to write tests for SelectionDAGISel might be to use a RUN:
line like:
```
llc -mtriple=<triple> -start-before=<arch>-isel -stop-after=finalize-isel -o -
```
Fixes: https://github.com/llvm/llvm-project/issues/59538
Reviewed By: asb, zixuan-wu
Differential Revision: https://reviews.llvm.org/D140364
Previously we had a shared ID in SelectionDAGISel. AMDGPU has an
initializePass function for its subclass of SelectionDAGISel. No
other target does.
This causes all target specific SelectionDAGISel passes to be known
as "amdgpu-isel".
I'm not sure what would happen if another target tried to implement
an initializePass function too since the ID is already claimed.
This patch gives all targets their own ID and passes it down through
the SelectionDAGISel constructor to MachineFunctionPass's constructor.
Unfortunately, I think this causes most targets to lose
print-before/after-all support for their SelectionDAGISel pass.
And they probably no longer support start/stop-before/after. We
can add initializePass functions to fix this as a follow up. NOTE:
This was probably also broken if the AMDGPU target isn't compiled in.
Step 1 to fixing PR59538.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D140161
We don't have W versions of AND/OR/XOR/ANDN/ORN/XNOR, so we recursively check their users instead, limiting the recursion to SelectionDAG::MaxRecursionDepth levels.
To do this we add a Depth argument: all existing callers pass 0, and the new recursive calls increment it by 1. At the top of the function we give up and return false if Depth >= SelectionDAG::MaxRecursionDepth.
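A hedged illustration (my own example): here the add can become addw only if we look through the xor, which has no W form, to find that its sole user consumes just the low 32 bits:
```
define i64 @look_through_xor(i64 %a, i64 %b, i64 %c) {
  %add = add i64 %a, %b    ; candidate for addw
  %xor = xor i64 %add, %c  ; no xorw exists: recurse into the xor's users
  %shl = shl i64 %xor, 32  ; only the low 32 bits of %xor are consumed
  ret i64 %shl
}
```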
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139462
This matches what we get for something like:
```
%0 = shl i32 %x, C
%1 = zext i32 %0 to i64
%2 = getelementptr i32, ptr %y, i64 %1
```
The shift before the zext and the shift implied by the GEP get
combined with an AND after them. We need to split it back into
2 shifts so we can fold one into shXadd.uw.
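Concretely (a hedged instance with C = 1; the expected selection is my reading of the patch, not its test output):
```
define ptr @gep_shl(i32 %x, ptr %y) {
  %0 = shl i32 %x, 1
  %1 = zext i32 %0 to i64
  %2 = getelementptr i32, ptr %y, i64 %1
  ret ptr %2
}
; Splitting the combined shift+AND back into two shifts allows:
;   slli      a0, a0, 1
;   sh2add.uw a0, a0, a1
```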
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D137886
The transformation is beneficial because vmerge.vvm always needs a mask
operand but vadd.vi may not.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D133255
The patch uses a peephole method to fold vmerge.vvm and unmasked intrinsics
into masked intrinsics. Using a peephole instead of TableGen patterns avoids
large amounts of auto-generated code.
Note: The patch ignores segment loads since I don't know how to test them.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D130442
InstCombine and DAGCombine prefer to keep shl before binops.
This patch teaches isel to convert (and/or/xor (shl X, C2), C1) to
(shl (and/or/xor X, C1 >> C2), C2) if (C1 >> C2) is a simm12. The idea
was taken from X86's isel code.
There's a special case implemented for a sext_inreg between the
shift and the binop.
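A hedged example of the conversion (constants chosen by me):
```
define i64 @or_shl(i64 %x) {
  %shl = shl i64 %x, 4
  %or = or i64 %shl, 12288  ; 12288 = 0x3000 is not a simm12
  ret i64 %or
}
; 12288 >> 4 = 768 is a simm12, and 12288's low 4 bits are zero, so
; re-forming (shl (or X, 768), 4) avoids materializing 0x3000:
;   ori  a0, a0, 768
;   slli a0, a0, 4
```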
Differential Revision: https://reviews.llvm.org/D130610
RVV doesn't have an immediate field for memory addressing. Currently
we build MachineInstructions in PEI to compute the stack offsets for
RVV load/store instructions. These instructions are added too late to
be optimized by the CSE, LICM, etc. passes.
This patch prevents FrameIndex SDNodes from being matched in RVV
load/store instruction selection patterns, so FrameIndex SDNodes are
instead selected as `ADDI GPR, targetframeindex`.
There are 2 advantages to this change:
1. Stack objects address computing can be optimized by machine function
passes.
2. Since the ADDI instruction's destination register can be used as a
temp register, we can save an emergency spill slot.
Differential Revision: https://reviews.llvm.org/D128187
Some more complex cases require checking the relationship of
operands on different nodes of the match. They also require
additional instructions to be created. Using a ComplexPattern
gives us that flexibility.
I'll be adding another pattern in a future patch.
SelectBaseAddr was a minor convenience to use since it already
existed for vector load/store. D128187 is going to remove the other
uses of SelectBaseAddr, so it has less reason to exist.
This patch removes the dependency on SelectBaseAddr and adds a new
SelectAddrFrameIndex to share some code with SelectFrameAddrRegImm.
Previously we had 3 different isel patterns for every scalar load
store instruction.
This reduces them to a single ComplexPattern that returns the Base
and Offset, or an offset of 0 if no offset was identified.
I've done a similar thing for the 2 isel patterns that match add/or
with FrameIndex and immediate. Using the offset of 0, I was also
able to remove the custom handler for FrameIndex. Happy to split that
to another patch.
We might be able to enhance in the future to remove the post-isel
peephole or the special handling for ADD with constant added by D126576.
A nice side effect is that this removes nearly 3000 bytes from the isel
table.
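A hedged illustration of what the ComplexPattern matches (my example, not a test from the patch): both loads below go through the same pattern, one yielding (Base, Offset=12) and the other falling back to Offset=0:
```
define i32 @load_pair(ptr %p) {
  %q = getelementptr inbounds i8, ptr %p, i64 12
  %a = load i32, ptr %q   ; lw with Base = %p, Offset = 12
  %b = load i32, ptr %p   ; lw with Base = %p, Offset = 0
  %s = add i32 %a, %b
  ret i32 %s
}
```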
Differential Revision: https://reviews.llvm.org/D126932
Originally, `OptLevel` wasn't passed into the `MachineFunctionPass`.
This let the default parameter of `SelectionDAGISel`, which is
`CodeGenOpt::Default`, be passed in. OptLevelChanger captures the
optimization level from the parameter rather than the value within
`TargetMachine`. This let the optimization level be unintentionally
overwritten if a value other than `CodeGenOpt::Default` was passed.
This patch fixes this by passing the optimization level rather
than using the default value.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D126641
The goal is to support tail and mask policies in RVV builtins.
We focus on the IR part first.
If the passthru operand is undef, we use tail agnostic; otherwise we
use tail undisturbed.
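A hedged sketch of the convention (intrinsic name mangling from my reading of the in-tree tests; the first operand is the passthru):
```
declare <vscale x 2 x i32> @llvm.riscv.vadd.nxv2i32.nxv2i32.i64(
  <vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i32>, i64)

define <vscale x 2 x i32> @ta_vs_tu(<vscale x 2 x i32> %merge,
                                    <vscale x 2 x i32> %a,
                                    <vscale x 2 x i32> %b, i64 %vl) {
  ; Passthru is undef: selected with a tail-agnostic (ta) vsetvli.
  %ta = call <vscale x 2 x i32> @llvm.riscv.vadd.nxv2i32.nxv2i32.i64(
            <vscale x 2 x i32> undef, <vscale x 2 x i32> %a,
            <vscale x 2 x i32> %b, i64 %vl)
  ; Passthru is a real value: selected tail-undisturbed (tu).
  %tu = call <vscale x 2 x i32> @llvm.riscv.vadd.nxv2i32.nxv2i32.i64(
            <vscale x 2 x i32> %merge, <vscale x 2 x i32> %ta,
            <vscale x 2 x i32> %b, i64 %vl)
  ret <vscale x 2 x i32> %tu
}
```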
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125323
Following D118810, which reduced the size of the ISel table,
this patch optimizes all-ones-masked RVV pseudos with TU policy and
swaps them out for their unmasked TU pseudos.
Since an UNDEF merge operand is not preserved, we turn such a pseudo
into a TA pseudo regardless of the policy operand.
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D121881
This patch drops TableGen patterns matching all-ones masked RVV pseudos
in the case where there are fallback patterns matching the generic
masked forms to "_MASK" pseudos. This optimization is now performed with
a SelectionDAG post-processing step which peephole-optimizes these same
pseudos with all-ones masks and swaps them out to their unmasked
pseudos.
This cuts our generated ISel table down by around ~5% (~110kB) in exchange
for a far smaller auto-generated table to help with the peephole.
This only targets our custom RISCVISD::*_VL binary operator nodes, which
use the one form for both masked and unmasked variants. A similar
approach could be used for our intrinsics but we'd need to do some work,
e.g., to represent unmasked intrinsics as true-masked intrinsics at the
IR or ISel level. At a rough estimate, this could save us a further 9%
on the size of our ISel table for the binary intrinsic patterns alone.
There is no observable impact on our tests.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118810
The goal is to support tail and mask policies in RVV builtins.
We focus on the IR part first.
If the passthru operand is undef, we use tail agnostic; otherwise we
use tail undisturbed.
Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com>
Reviewers: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D117647
This patch introduces new intrinsics that enable the use of vsetvli in
contexts where only the returned vector length is of interest. The
pre-existing intrinsics are marked with side-effects, which prevents
even trivial optimizations on/across them.
These intrinsics are intended to be used in situations where the vector
length is fed in turn to RVV intrinsics or to vector-predication
intrinsics during loop vectorization, for example. Those codegen paths
ensure that instructions are generated with their own implicit vsetvli,
so the vector length and vtype can be relied upon to be correct.
No corresponding C builtins are planned at this stage, though that is a
possibility for the future if the need arises.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117910
Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list.
For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use an all-ones mask with the masked operations to do it. So, we only add the additional argument to masked operations for most cases. There are exceptions listed below.
In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them.
* Use dest argument to control tail policy
vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument)
vfmerge.vfm (add _t builtins with additional dest argument)
vmv.v.v (add _t builtins with additional dest argument)
vmv.v.x (add _t builtins with additional dest argument)
vmv.v.i (add _t builtins with additional dest argument)
vfmv.v.f (add _t builtins with additional dest argument)
vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument)
vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument)
* Always have the tail argument, for both masked and unmasked intrinsics
Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Reduction Operations (add _t and _mt builtins)
Vector Slideup Instructions (add _t and _mt builtins)
Vector Slidedown Instructions (add _t and _mt builtins)
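To make the new convention concrete, a hedged sketch of a masked intrinsic after this change (type mangling and policy encoding are from my reading, with 0 = tail undisturbed and 1 = tail agnostic):
```
declare <vscale x 2 x i32> @llvm.riscv.vadd.mask.nxv2i32.nxv2i32.i64(
    <vscale x 2 x i32>,   ; maskedoff
    <vscale x 2 x i32>,   ; op1
    <vscale x 2 x i32>,   ; op2
    <vscale x 2 x i1>,    ; mask
    i64,                  ; vl
    i64)                  ; tail policy argument added by this patch
```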
Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101
Differential Revision: https://reviews.llvm.org/D105092
Let the sext_inreg be selected to sext.w. Remove unneeded sext.w
during PostProcessISelDAG.
This gives opportunities for some other isel patterns to match
like the ADDIPair or matching mul with immediate to shXadd.
This becomes possible after D107658 started selecting W instructions
based on users. The sext.w will be considered a W user so isel
will often select a W instruction for the sext.w input and we can
just remove the sext.w. Otherwise we can combine the sext.w with
an ADD/SUB/MUL/SLLI to create a new W instruction in parallel
to the original instruction.
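A hedged example (mine, not from the patch): the W instruction already sign-extends from bit 31, so the sext.w selected from the sext_inreg is redundant and gets removed:
```
define i64 @addw(i32 %x, i32 %y) {
  %add = add i32 %x, %y
  %ext = sext i32 %add to i64
  ret i64 %ext
}
; The sext.w user marks the add as a W candidate, so the add selects to
; addw and the trailing sext.w is deleted in PostProcessISelDAG:
;   addw a0, a0, a1
```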
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107708