This transformation might be illegal for `PseudoVIOTA_M`. The result of
`viota.m vd, vs2` is the prefix sum of vs2, and folding a mask into it
may produce the wrong prefix sum.
For example, the result of the following sequence is `{5, 5, 5, 3}`,
```
; v4 = {1, 1, 1, 1}
viota.m v1, v4
; v0 = {0, 0, 0, 1}, v1 = {0, 1, 2, 3}, v8 = {5, 5, 5, 5}
vmerge.vvm v8, v8, v1, v0
; v8 = {5, 5, 5, 3}
```
but if we fold them into `viota.m v8, v4, v0.t`, the result is
`{5, 5, 5, 0}`, since the masked viota.m counts only the set bits of v4
at positions where the mask is active.
Also, we still do performCombineVMergeAndVOps for `viota.m` when the
mask of `vmerge.vvm` is an all-ones mask.
A new ComplexPattern `AddrRegImmLsb00000` is added, which is like
`AddrRegImm` except that if the least significant 5 bits aren't all
zeros, we fall back to offset 0.
This will make it easier for callers to spot and fix up calls to
createTargetMachine after a future change to the parameters of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive/
or s/CGFT_/CodeGenFileType::/
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits so reviewers can better see what's changed. Will
squash when merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This led to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282.
This change is a slightly ugly hack to side step the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.
We may end up backporting this into the 17.x release branch. Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.
Differential Revision: https://reviews.llvm.org/D156909
Similar to D155698 where the shift amount is extended, this patch extends the
ComplexPattern to handle the case where the shift amount has been truncated.
Truncations are custom lowered to truncate_vector_vl, and in cases like i64 ->
i16 they are truncated by one power of two at a time, so we need to unravel
nested layers of them.
The pattern can also be reused for Zvbb's vwsll.vx in an upcoming patch.
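As a hedged illustration (the function and types are mine, not a test from the patch), the scalar amount below is splatted at i64 and truncated to the i16 element type; the i64 -> i16 vector truncate lowers to nested truncate_vector_vl nodes that the ComplexPattern now looks through, which is safe because only the low log2(2*SEW) bits of the amount matter:
```
define <vscale x 4 x i32> @widen_shl_trunc(<vscale x 4 x i16> %x, i64 %amt) {
  %head = insertelement <vscale x 4 x i64> poison, i64 %amt, i32 0
  %splat = shufflevector <vscale x 4 x i64> %head, <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
  ; i64 -> i16 is truncated one power of two at a time (i64 -> i32 -> i16).
  %trunc = trunc <vscale x 4 x i64> %splat to <vscale x 4 x i16>
  %xext = zext <vscale x 4 x i16> %x to <vscale x 4 x i32>
  %aext = zext <vscale x 4 x i16> %trunc to <vscale x 4 x i32>
  %shl = shl <vscale x 4 x i32> %xext, %aext
  ret <vscale x 4 x i32> %shl
}
```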
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155928
We're currently only matching scalar shift amounts where the type is the same
as the vector element type. But because only the bottom log2(2*SEW) bits are
used (at most 7 bits), we can use any scalar type >= i8.
This patch adds patterns for the case above, as well as for when the shift
amount type is the same as the widened element type and doesn't need to be
extended.
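A hedged sketch of the second case (the function and types are mine): the splatted amount already has the widened element type, so no extension is needed, and only log2(2*SEW) = 5 bits of it are used for SEW=16 widened to 32:
```
define <vscale x 4 x i32> @widen_shl_i32_amt(<vscale x 4 x i16> %x, i32 %amt) {
  %head = insertelement <vscale x 4 x i32> poison, i32 %amt, i32 0
  %splat = shufflevector <vscale x 4 x i32> %head, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
  %xext = zext <vscale x 4 x i16> %x to <vscale x 4 x i32>
  %shl = shl <vscale x 4 x i32> %xext, %splat
  ret <vscale x 4 x i32> %shl
}
```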
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155698
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This is analogous to other patches in the series, but with one key difference - the resulting pseudo does *not* have a policy operand. We could add one for vmerge, but some of the multiclasses are sufficiently entwined with the mask-producing arithmetic instructions that the change delta becomes unmanageable. Note that these instructions are *not* in the RISCVMaskedPseudo table, and thus the difference doesn't complicate other code. The main value of working incrementally here is that we get to eagerly clean up the IsTA logic flowing through the post-ISEL combines.
Differential Revision: https://reviews.llvm.org/D154645
After D154245 lands, we have greatly simplified the possible configurations for an entry in the RISCVMaskedPseudo table. This change goes through and reworks everything which uses that table to exploit the available simplifications.
To justify the correctness here, let me note that we no longer had any use of HasTU=true. We were left with only the HasTU=false and IsCombined=true|false cases. The only use of IsCombined=false was for the comparison operations. At the moment, these operations are the only ones in the table without vector policy operands. Instead of switching on the pseudo value, we can just check the VecPolicy flag.
It may be worth adding a passthru operand to the comparisons (which is actually needed to represent tail undefined vs tail agnostic), and a vector policy operand (which is strictly unneeded) just for consistency, but we can do that in a follow up patch for some further simplification if desired.
Note that we do have a few _TU pseudos left at this point. It's simply that none of them are in the RISCVMaskedPseudo table, and thus don't participate in our post-ISEL transforms.
Differential Revision: https://reviews.llvm.org/D154620
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This change targets all the pseudos used in loads (unit, strided, segmented, fault first, and their combinations). As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand.
One quirk is that I went ahead and treated the unmasked mask load instruction (vlm) the same way. We need the passthru operand to model tail undefined, but since the instruction is unconditionally agnostic and the instruction has no mask, the policy operand is arguably unneeded. I kept it mostly for consistency's sake.
Another quirk worth highlighting is that segment loads require a bit of dedicated handling. Surprisingly, we don't have IMPLICIT_DEF nodes of the right types, and attempting to use them results in some odd looking codegen and a few crashes. Instead, I left the REG_SEQUENCE form, and extended InsertVSETVLI to recognize the complex undefs. Arguably, we should probably revisit the handling of undef reg_sequence nodes here, but I'm hoping to side step that in this patch.
As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions. I did have to delete one register allocation regression test as I couldn't figure out how to meaningfully update it. I spent a significant amount of time trying, and finally gave up.
Differential Revision: https://reviews.llvm.org/D154141
No idea what I was thinking when I suggested vadd.vi.
Reviewed By: reames, frasercrmck, fakepaper56
Differential Revision: https://reviews.llvm.org/D152553
To do this we need to remove the always matching behavior from condop.
This requires us to add more 'select' isel patterns with a bare GPR
as the condition.
Rename condop/invcondop to riscv_setne/riscv_seteq.
This centralizes the ADDI/XORI/XOR tricks into one place.
XVentanaCondOps check the condition operand for zero or non-zero.
We use this to optimize seteq/setne that would otherwise become
xor/xori/addi+snez/seqz. These patterns avoid the snez/seqz.
This patch adds two ComplexPatterns to match the various cases and
emit the xor/xori/addi instruction.
These patterns can also be used by D144681.
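For instance (a hedged sketch of my own; the register choices and exact schedule are illustrative), a select whose condition is seteq against a constant can fold the comparison into the ComplexPattern's xori and skip the seqz:
```
define i64 @sel(i64 %a, i64 %b, i64 %c) {
  %cc = icmp eq i64 %a, 3
  %sel = select i1 %cc, i64 %b, i64 %c
  ret i64 %sel
}
; Possible XVentanaCondOps selection, with no seqz:
;   xori      a0, a0, 3     ; a0 == 0 iff %a == 3
;   vt.maskcn a1, a1, a0    ; a1 = (a0 == 0) ? %b : 0
;   vt.maskc  a0, a2, a0    ; a0 = (a0 != 0) ? %c : 0
;   or        a0, a0, a1
```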
Reviewed By: philipp.tomsich
Differential Revision: https://reviews.llvm.org/D144700
The XTHeadBb extension has both signed and unsigned bitfield
extraction instructions (TH.EXT and TH.EXTU, respectively), which were
previously only supported for sign extension on byte, halfword,
and word boundaries.
This adds the infrastructure to use TH.EXT and TH.EXTU for arbitrary
bitfield extraction.
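For example (a hedged sketch of mine, not a test from the patch; TH.EXT is written here as rd, rs1, msb, lsb), an arbitrary sign-extended bitfield extract can now select to a single instruction:
```
define i64 @bfe(i64 %x) {
  ; Extract bits 15:8 of %x, sign-extended.
  %shl = shl i64 %x, 48
  %shr = ashr i64 %shl, 56
  ret i64 %shr
}
; With XTHeadBb:
;   th.ext a0, a0, 15, 8
```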
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144229
There is no compressed form of ORI but there is a compressed form
for ADDI.
This also works for XORI since DAGCombine will turn an Xor with
disjoint bits into an Or.
Note: The compressed forms require a simm6 immediate, but I'm doing
this for the full simm12 range.
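A hedged example of the idea (my own): when the set bits of the immediate are known zero in the operand, OR and ADD compute the same result, and ADDI may compress:
```
define i64 @set_low_bit(i64 %x) {
  ; Bit 0 of %y is known zero after the shift, so the OR behaves like an
  ; ADD and can be selected as addi (compressible to c.addi when the
  ; destination matches the source and the immediate fits in simm6),
  ; whereas ori has no compressed form.
  %y = shl i64 %x, 1
  %z = or i64 %y, 1
  ret i64 %z
}
```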
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D140674
Follow up to the series:
1. https://reviews.llvm.org/D140161
2. https://reviews.llvm.org/D140349
3. https://reviews.llvm.org/D140331
4. https://reviews.llvm.org/D140323
Completes the work from the previous two for remaining targets.
This creates the following named passes that can be run via
`llc -{start|stop}-{before|after}`:
- arc-isel
- arm-isel
- avr-isel
- bpf-isel
- csky-isel
- hexagon-isel
- lanai-isel
- loongarch-isel
- m68k-isel
- msp430-isel
- mips-isel
- nvptx-isel
- ppc-codegen
- riscv-isel
- sparc-isel
- systemz-isel
- ve-isel
- wasm-isel
- xcore-isel
A nice way to write tests for SelectionDAGISel might be to use a RUN:
line like:
```
llc -mtriple=<triple> -start-before=<arch>-isel -stop-after=finalize-isel -o -
```
Fixes: https://github.com/llvm/llvm-project/issues/59538
Reviewed By: asb, zixuan-wu
Differential Revision: https://reviews.llvm.org/D140364
Previously we had a shared ID in SelectionDAGISel. AMDGPU has an
initializePass function for its subclass of SelectionDAGISel. No
other target does.
This causes all target specific SelectionDAGISel passes to be known
as "amdgpu-isel".
I'm not sure what would happen if another target tried to implement
an initializePass function too since the ID is already claimed.
This patch gives all targets their own ID and passes it down through
the SelectionDAGISel constructor to MachineFunctionPass's constructor.
Unfortunately, I think this causes most targets to lose
print-before/after-all support for their SelectionDAGISel pass.
And they probably no longer support start/stop-before/after. We
can add initializePass functions to fix this as a follow up. NOTE:
This was probably also broken if the AMDGPU target isn't compiled in.
Step 1 to fixing PR59538.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D140161
We don't have W versions of AND/OR/XOR/ANDN/ORN/XNOR, so we recursively check their users instead, limiting the recursion to SelectionDAG::MaxRecursionDepth levels.
To do this we add a Depth argument: all existing callers pass 0, and the new recursive calls increment it by 1. At the top of the function we give up and return false if Depth >= SelectionDAG::MaxRecursionDepth.
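A hedged illustration (my own example): here the add can become addw only if we look through the xor, which has no W form, to find that its sole user consumes just the low 32 bits:
```
define i64 @look_through_xor(i64 %a, i64 %b, i64 %c) {
  %add = add i64 %a, %b    ; candidate for addw
  %xor = xor i64 %add, %c  ; no xorw exists: recurse into the xor's users
  %shl = shl i64 %xor, 32  ; only the low 32 bits of %xor are consumed
  ret i64 %shl
}
```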
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139462
This matches what we get for something like:
```
%0 = shl i32 %x, C
%1 = zext i32 %0 to i64
%2 = getelementptr i32, ptr %y, i64 %1
```
The shift before the zext and the shift implied by the GEP get
combined with an AND after them. We need to split it back into
2 shifts so we can fold one into shXadd.uw.
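Concretely (a hedged instance with C = 1; the expected selection is my reading of the patch, not its test output):
```
define ptr @gep_shl(i32 %x, ptr %y) {
  %0 = shl i32 %x, 1
  %1 = zext i32 %0 to i64
  %2 = getelementptr i32, ptr %y, i64 %1
  ret ptr %2
}
; Splitting the combined shift+AND back into two shifts allows:
;   slli      a0, a0, 1
;   sh2add.uw a0, a0, a1
```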
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D137886
The transformation is beneficial because vmerge.vvm always needs a mask
operand but vadd.vi may not.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D133255
The patch uses a peephole method to fold vmerge.vvm and unmasked intrinsics
into masked intrinsics. Using a peephole instead of TableGen patterns avoids
large amounts of auto-generated code.
Note: The patch ignores segment loads since I don't know how to test them.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D130442
InstCombine and DAGCombine prefer to keep shl before binops.
This patch teaches isel to convert (and/or/xor (shl X, C2), C1) to
(shl (and/or/xor X, C1 >> C2), C2) if (C1 >> C2) is a simm12. The idea
was taken from X86's isel code.
There's a special case implemented for a sext_inreg between the
shift and the binop.
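A hedged example of the conversion (constants chosen by me):
```
define i64 @or_shl(i64 %x) {
  %shl = shl i64 %x, 4
  %or = or i64 %shl, 12288  ; 12288 = 0x3000 is not a simm12
  ret i64 %or
}
; 12288 >> 4 = 768 is a simm12, and 12288's low 4 bits are zero, so
; re-forming (shl (or X, 768), 4) avoids materializing 0x3000:
;   ori  a0, a0, 768
;   slli a0, a0, 4
```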
Differential Revision: https://reviews.llvm.org/D130610
RVV doesn't have an immediate field for memory addressing. Currently
we build MachineInstructions in PEI to compute the stack offsets for
RVV load/store instructions. These instructions are added too late to
be optimized by the CSE, LICM, etc. passes.
This patch prevents FrameIndex SDNodes from being matched in RVV
load/store instruction selection patterns, so FrameIndex SDNodes are
instead selected as `ADDI GPR, targetframeindex`.
There are 2 advantages to this change:
1. Stack objects address computing can be optimized by machine function
passes.
2. Since the ADDI instruction's destination register can be used as a
temp register, we can save an emergency spill slot.
Differential Revision: https://reviews.llvm.org/D128187
Some more complex cases require checking the relationship of
operands on different nodes of the match. They also require
additional instructions to be created. Using a ComplexPattern
gives us that flexibility.
I'll be adding another pattern in a future patch.
SelectBaseAddr was a minor convenience to use since it already
existed for vector load/store. D128187 is going to remove the other
uses of SelectBaseAddr, so it has less reason to exist.
This patch removes the dependency on SelectBaseAddr and adds a new
SelectAddrFrameIndex to share some code with SelectFrameAddrRegImm.
Previously we had 3 different isel patterns for every scalar load
store instruction.
This reduces them to a single ComplexPattern that returns the Base
and Offset, or an offset of 0 if no offset was identified.
I've done a similar thing for the 2 isel patterns that match add/or
with FrameIndex and immediate. Using the offset of 0, I was also
able to remove the custom handler for FrameIndex. Happy to split that
to another patch.
We might be able to enhance in the future to remove the post-isel
peephole or the special handling for ADD with constant added by D126576.
A nice side effect is that this removes nearly 3000 bytes from the isel
table.
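A hedged illustration of what the ComplexPattern matches (my example, not a test from the patch): both loads below go through the same pattern, one yielding (Base, Offset=12) and the other falling back to Offset=0:
```
define i32 @load_pair(ptr %p) {
  %q = getelementptr inbounds i8, ptr %p, i64 12
  %a = load i32, ptr %q   ; lw with Base = %p, Offset = 12
  %b = load i32, ptr %p   ; lw with Base = %p, Offset = 0
  %s = add i32 %a, %b
  ret i32 %s
}
```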
Differential Revision: https://reviews.llvm.org/D126932
Originally, `OptLevel` wasn't passed into the `MachineFunctionPass`.
This let the default parameter of `SelectionDAGISel`, which is
`CodeGenOpt::Default`, be passed in. OptLevelChanger captures the
optimization level from the parameter rather than the value within
`TargetMachine`. This let the optimization level be unintentionally
overwritten if a value other than `CodeGenOpt::Default` was passed.
This patch fixes this by passing the optimization level rather
than using the default value.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D126641
The goal is to support tail and mask policies in RVV builtins.
We focus on the IR part first.
If the passthru operand is undef, we use tail agnostic; otherwise we
use tail undisturbed.
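A hedged sketch of the convention (intrinsic name mangling from my reading of the in-tree tests; the first operand is the passthru):
```
declare <vscale x 2 x i32> @llvm.riscv.vadd.nxv2i32.nxv2i32.i64(
  <vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i32>, i64)

define <vscale x 2 x i32> @ta_vs_tu(<vscale x 2 x i32> %merge,
                                    <vscale x 2 x i32> %a,
                                    <vscale x 2 x i32> %b, i64 %vl) {
  ; Passthru is undef: selected with a tail-agnostic (ta) vsetvli.
  %ta = call <vscale x 2 x i32> @llvm.riscv.vadd.nxv2i32.nxv2i32.i64(
            <vscale x 2 x i32> undef, <vscale x 2 x i32> %a,
            <vscale x 2 x i32> %b, i64 %vl)
  ; Passthru is a real value: selected tail-undisturbed (tu).
  %tu = call <vscale x 2 x i32> @llvm.riscv.vadd.nxv2i32.nxv2i32.i64(
            <vscale x 2 x i32> %merge, <vscale x 2 x i32> %ta,
            <vscale x 2 x i32> %b, i64 %vl)
  ret <vscale x 2 x i32> %tu
}
```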
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125323
Following D118810, which reduced the size of the ISel table,
this patch optimizes all-ones-masked RVV pseudos with TU policy and
swaps them out for their unmasked TU pseudos.
Since an UNDEF merge operand is not preserved, we turn such a pseudo
into a TA pseudo regardless of the policy operand.
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D121881
This patch drops TableGen patterns matching all-ones masked RVV pseudos
in the case where there are fallback patterns matching the generic
masked forms to "_MASK" pseudos. This optimization is now performed with
a SelectionDAG post-processing step which peephole-optimizes these same
pseudos with all-ones masks and swaps them out to their unmasked
pseudos.
This cuts our generated ISel table down by around ~5% (~110kB) in exchange
for a far smaller auto-generated table to help with the peephole.
This only targets our custom RISCVISD::*_VL binary operator nodes, which
use the one form for both masked and unmasked variants. A similar
approach could be used for our intrinsics but we'd need to do some work,
e.g., to represent unmasked intrinsics as true-masked intrinsics at the
IR or ISel level. At a rough estimate, this could save us a further 9%
on the size of our ISel table for the binary intrinsic patterns alone.
There is no observable impact on our tests.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118810
The goal is to support tail and mask policies in RVV builtins.
We focus on the IR part first.
If the passthru operand is undef, we use tail agnostic; otherwise we
use tail undisturbed.
Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com>
Reviewers: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D117647
This patch introduces new intrinsics that enable the use of vsetvli in
contexts where only the returned vector length is of interest. The
pre-existing intrinsics are marked with side-effects, which prevents
even trivial optimizations on/across them.
These intrinsics are intended to be used in situations where the vector
length is fed in turn to RVV intrinsics or to vector-predication
intrinsics during loop vectorization, for example. Those codegen paths
ensure that instructions are generated with their own implicit vsetvli,
so the vector length and vtype can be relied upon to be correct.
No corresponding C builtins are planned at this stage, though that is a
possibility for the future if the need arises.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117910
Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list.
For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use an all-ones mask with the masked operations to do it. So, we only add the additional argument to masked operations for most cases. There are exceptions listed below.
In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them.
* Use dest argument to control tail policy
vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument)
vfmerge.vfm (add _t builtins with additional dest argument)
vmv.v.v (add _t builtins with additional dest argument)
vmv.v.x (add _t builtins with additional dest argument)
vmv.v.i (add _t builtins with additional dest argument)
vfmv.v.f (add _t builtins with additional dest argument)
vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument)
vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument)
* Always have the tail argument, for both masked and unmasked intrinsics
Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Reduction Operations (add _t and _mt builtins)
Vector Slideup Instructions (add _t and _mt builtins)
Vector Slidedown Instructions (add _t and _mt builtins)
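To make the new convention concrete, a hedged sketch of a masked intrinsic after this change (type mangling and policy encoding are from my reading, with 0 = tail undisturbed and 1 = tail agnostic):
```
declare <vscale x 2 x i32> @llvm.riscv.vadd.mask.nxv2i32.nxv2i32.i64(
    <vscale x 2 x i32>,   ; maskedoff
    <vscale x 2 x i32>,   ; op1
    <vscale x 2 x i32>,   ; op2
    <vscale x 2 x i1>,    ; mask
    i64,                  ; vl
    i64)                  ; tail policy argument added by this patch
```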
Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101
Differential Revision: https://reviews.llvm.org/D105092
Let the sext_inreg be selected to sext.w. Remove unneeded sext.w
during PostProcessISelDAG.
This gives opportunities for some other isel patterns to match
like the ADDIPair or matching mul with immediate to shXadd.
This becomes possible after D107658 started selecting W instructions
based on users. The sext.w will be considered a W user so isel
will often select a W instruction for the sext.w input and we can
just remove the sext.w. Otherwise we can combine the sext.w with
an ADD/SUB/MUL/SLLI to create a new W instruction in parallel
to the original instruction.
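A hedged example (mine, not from the patch): the W instruction already sign-extends from bit 31, so the sext.w selected from the sext_inreg is redundant and gets removed:
```
define i64 @addw(i32 %x, i32 %y) {
  %add = add i32 %x, %y
  %ext = sext i32 %add to i64
  ret i64 %ext
}
; The sext.w user marks the add as a W candidate, so the add selects to
; addw and the trailing sext.w is deleted in PostProcessISelDAG:
;   addw a0, a0, a1
```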
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107708