llvm-project

Author	SHA1	Message	Date
David Green	6baaa0afc3	[ARM] Handle roundeven for MVE. (#142557) Now that #141786 handles scalar and neon types, this adds MVE definitions and legalization for llvm.roundeven intrinsics. The existing llvm.arm.mve.vrintn intrinsics are auto-upgraded to llvm.roundeven like other vrint instructions, so they should continue to work.	2025-06-08 18:23:50 +01:00
Sergei Barannikov	6b2232606d	[TableGen] Replace WantRoot/WantParent SDNode properties with flags (#119599) These properties are only valid on ComplexPatterns. Having them as flags is more convenient because one can now use "let = ... in" syntax to set these flags on several patterns at a time. This is also less error-prone as it makes it impossible to specify these properties on records derived from SDPatternOperator. Pull Request: https://github.com/llvm/llvm-project/pull/119599	2024-12-12 00:41:44 +03:00
Oliver Stannard	aba55809e9	[ARM] Fix operand order for MVE predicated VFMAS (#115908) For most MVE predicated FMA instructions, disabled lanes will contain the value in the addend operand. However, the VFMAS instruction takes the addend in a GPR, and the output register is shared with the first multiply operand, so disabled lanes will get that value instead. This means that we can't use the same intrinsic as for the other VFMA instructions. Instead, we can codegen the vfmas intrinsic to a regular FMA and select in clang, which the backend already has the patterns to select VFMAS from.	2024-11-13 12:16:28 +00:00
Oliver Stannard	9b016e3cb2	[ARM] Add early-clobber to MVE VCMLA.f32 (#114995) This instruction (but not the f16 variant) cannot use the same register for the output as either of the inputs, so it needs to be marked as early-clobber.	2024-11-06 14:46:08 +00:00
dnsampaio	28d0718033	[DAGCombiner] Add combine avg from shifts (#113909) This teaches dagcombiner to fold: `(asr (add nsw x, y), 1) -> (avgfloors x, y)` and `(lsr (add nuw x, y), 1) -> (avgflooru x, y)`, as well as combining them into a ceil variant: `(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` and `(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`, iff valid for the target. Removes some of the ARM MVE patterns that are now dead code. It adds the avg opcodes to `IsQRMVEInstruction` so as to preserve the immediate splatting as before.	2024-10-31 10:57:27 +01:00
jofrn	fe480cf923	[ARM] Use proper types for these records. (#113370) llvm#112904 will add typechecking to submulticlass arguments, and these ones are currently mistyped.	2024-10-22 18:17:52 -04:00
Alfie Richards	60c775769b	[ARM] Add missing earlyclobber to sqrshr and uqrshl instructions. (#77782) This avoids possible undefined behavior from using the same register for Rm and Rda. It also adds a check in MC to produce an error upon parsing this case.	2024-01-16 10:30:16 +00:00
David Green	f1961153c2	[ARM] Add predicated shift patterns This uses the patterns defined in MVE_TwoOpPattern to add predicated patterns for vshls/u instructions. Differential Revision: https://reviews.llvm.org/D149366	2023-04-29 20:32:54 +01:00
David Green	4f41a74d82	[ARM] Fold fadd of vcmul into vcmla This adds an extra tablegen combine for folding fadd(a, vcmul(b, c)) into vcmla(a, b, c), so long as the fadd is allowed to contract. Differential Revision: https://reviews.llvm.org/D147201	2023-04-05 11:52:05 +01:00
Simon Tatham	e45cbf9923	[ARM,MVE] Update MVE_VMLA_qr for architecture change. In revision B.q and before of the Armv8-M architecture reference manual, the vector/scalar forms of the `vmla` and `vmlas` instructions came in signed and unsigned integer forms, such as `vmla.s8 q0,q1,r2` or `vmlas.u32 q3,q4,r5`. Revision B.r has changed this. There are no longer signed and unsigned versions of these instructions, since they were functionally identical anyway. Now there is just `vmla.i8` (or `i16` or `i32`, and similarly for `vmlas`). Bit 28 of the instruction encoding, which was previously 0 for signed or 1 for unsigned, is now expected to be 0 always. This change updates LLVM to the new version of the architecture. The obsoleted encodings for unsigned integers are now decoding errors, and only the still-valid encoding is ever emitted. This shouldn't break any existing assembly code, because the old signed and unsigned versions of the mnemonic are still accepted by the assembler (which is standard practice anyway for all signedness-agnostic MVE integer instructions). Reviewed By: dmgreen, lenary Differential Revision: https://reviews.llvm.org/D138827	2022-11-29 08:47:00 +00:00
Alex Richardson	88218d5c52	[SelectionDAG] Remove deprecated MemSDNode->getAlignment() I noticed an assertion error when building MIPS code that loaded from NULL. Loading from NULL ends up being a load with maximum alignment, and due to integer truncation the maximum value was interpreted as 0 and the assertion in MipsDAGToDAGISel::Select() failed. This previously happened to work, but the maximum alignment was increased in df84c1fe78130a86445d57563dea742e1b85156a, so it no longer fits into a 32 bit integer. Instead of just fixing the one MIPS case, this patch removes all uses of the deprecated getAlignment() call and replaces them with getAlign(). Differential Revision: https://reviews.llvm.org/D138420	2022-11-23 09:04:42 +00:00
David Green	cb806ce2aa	[ARM] Guard VMOVH and VINS patterns. These instructions are only available when fp is available, so cannot be used with just +mve. Add predicates to ensure we fall-back under the right circumstances.	2022-07-17 21:26:49 +01:00
David Green	ea6ebbcfb3	[ARM] MVE hadd and rhadd This uses the nodes from D106237 to add MVE HADD and RHADD lowering. Differential Revision: https://reviews.llvm.org/D106238	2022-02-14 11:55:40 +00:00
David Green	6bd8f114c8	[ARM] Handle splats of constants for MVE qr instruction Some MVE instructions have qr variants that take a Q and R register, splatting the R register for each lane. This is usually handled fine for standard splats as we sink the splat into the loop and combine the resulting dup into the qr instruction. It does not work for constant splats though, as we generate a vmovimm or constant pool load instead. This intercepts that, generating a vdup of the constant instead where we can turn the result into a qr instruction variant. Differential Revision: https://reviews.llvm.org/D115242	2021-12-17 09:16:28 +00:00
David Green	ab0c5cea0b	[ARM] Use v2i1 for MVE and CDE intrinsics This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal type, to use a <2 x i1> as opposed to emulating the predicate with a <4 x i1>. The v4i1 workarounds have been removed leaving the natural v2i1 types, notably in vctp64 which now generates a v2i1 type. AutoUpgrade code has been added to upgrade old IR, which needs to convert the old v4i1 to a v2i1 by converting it back and forth to an integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be optimized away in the final assembly. Differential Revision: https://reviews.llvm.org/D114455	2021-12-03 15:27:58 +00:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatters and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in a way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
David Green	d9af9c2c5a	[ARM] Fold floating point select(binop) patterns Similar to D84091 which added extra predicated folds for integer operations using the identity element of the operation, this adds them for floating point operations for the form `BinOp(x, select(p, y, Identity))`. They are folded back to predicated versions of the operator, with fadd having the identity -0.0, fsub using the identity 0.0 and fmul using 1.0. Differential Revision: https://reviews.llvm.org/D113574	2021-11-24 10:22:20 +00:00
David Green	73346f5848	[ARM] Introduce a MQPRCopy Currently when creating tail predicated loops, we need to validate that all the live-outs of a loop will be equivalent with and without tail predication, and if they are not we cannot legally create a tail-predicated loop, leaving expensive vctp and vpst instructions in the loop. These notably can include register-allocation instructions like stack loads and stores, and copies lowered from COPYs to MVE_VORRs. Instead of trying to prove this is valid late in the pipeline, this patch introduces a MQPRCopy pseudo instruction that COPY is lowered to. This can then either be converted to a MVE_VORR where possible, or to a couple of VMOVD instructions if not. This way they do not behave differently within and outside of tail-predication regions, and we can know by construction that they are always valid. The idea is that we can do the same with stack loads and stores, converting them to VLDR/VSTR or VLDM/VSTM where required to prove tail predication is always valid. This does unfortunately mean inserting multiple VMOVD instructions, instead of a single MVE_VORR, but my experiments show it to be an improvement in general. Differential Revision: https://reviews.llvm.org/D111048	2021-10-07 12:52:12 +01:00
David Green	02cd8a6b91	[ARM] Allow smaller VMOVL in tail predicated loops This allows VMOVL in tail predicated loops so long as the vector size the VMOVL is extending into is less than or equal to the size of the VCTP in the tail predicated loop. These cases represent a sign-extend-inreg (or zero-extend-inreg), which needn't block tail predication as in https://godbolt.org/z/hdTsEbx8Y. For this a vecsize has been added to the TSFlag bits of MVE instructions, which stores the size of the elements that the MVE instruction operates on. In the case of multiple sizes (such as an MVE_VMOVLs8bh that extends from i8 to i16), the largest size is chosen. The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 = i64, which often (but not always) comes from the instruction encoding directly. A unit test was added, and although only a subset of the vecsizes are currently used, the rest should be useful for other cases. Differential Revision: https://reviews.llvm.org/D109706	2021-09-22 12:07:52 +01:00
David Green	6b7cdb40da	[ARM] Remove unused tblgen arguments. NFCI As per D109359, this removes or makes use of some of the existing unused MVE tblgen arguments.	2021-09-10 15:06:31 +01:00
David Green	9cb8f4d1ad	[ARM] Add a tail-predication loop predicate register The semantics of tail predication loops mean that the value of LR as an instruction is executed determines the predicate. In other words: `mov r3, #3`; `DLSTP lr, r3` (start tail predication, lr==3); `VADD.s32 q0, q1, q2` (lanes 0, 1 and 2 are updated in q0); `mov lr, #1`; `VADD.s32 q0, q1, q2` (only the first lane is updated). This means that the value of lr cannot be spilled and re-used in tail predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been set up, leading to alignment exceptions. This patch adds a new lr predicate operand to MVE instructions in order to keep a reference to the lr that they use as a tail predicate. It will usually hold the zeroreg, meaning not predicated, being set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent it from being spilled anywhere that it needs to be used. A lot of tests needed updating. Differential Revision: https://reviews.llvm.org/D107638	2021-09-02 13:42:58 +01:00
David Green	49476a4d66	[ARM] Add MVE lowering for fptosi.sat This adds lowering of the llvm.fptosi.sat and llvm.fptoui.sat intrinsics, selecting a VCVT instruction which under MVE will inherently perform the saturate. Differential Revision: https://reviews.llvm.org/D107865	2021-09-01 22:38:47 +01:00
David Green	22c384129e	[ARM] Add missing validForTailPredication for VMINNM/VMAXNM Apparently this was missing, preventing the generation of tail predication loops containing VMINNM, VMAXNM, VMINNMA and VMAXNMA.	2021-08-31 18:19:03 +01:00
David Green	62e892fa2d	[ARM] Add MQQPR and MQQQQPR spill and reload pseudo instructions As a part of D107642, this adds pseudo instructions for MQQPR and MQQQQPR register classes, that can spill and reload entire registers whilst keeping them combined, not splitting them into multiple D subregs that a VLDMIA/VSTMIA would use. This can help certain analyses, and helps to prevent verifier issues with subreg liveness.	2021-08-17 13:51:34 +01:00
David Green	9236dea255	[ARM] Create MQQPR and MQQQQPR register classes Similar to the MQPR register class as the MVE equivalent to QPR, this adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR and QQQQPR registers. The MVE MQPR seemed to have worked out quite well, and adding MQQPR and MQQQQPR allows us to a little more accurately specify the number of registers, calculating register pressure limits a little better. Differential Revision: https://reviews.llvm.org/D107463	2021-08-16 22:58:12 +01:00
David Green	0f83d37a14	[ARM] MVE vabd This adds MVE lowering for VABDS/VABDU, using the code parted from AArch64 in D91937. Differential Revision: https://reviews.llvm.org/D91938	2021-06-26 19:41:32 +01:00
David Green	2cf0e52b85	[ARM] Add patterns for vmulh Now that vmulh can be selected, this adds the MVE patterns to make it legal and generate instructions. Differential Revision: https://reviews.llvm.org/D88011	2021-05-26 09:22:12 +01:00
David Green	11b34e78c1	[ARM] Define CPSR on MEMCPY pseudos These pseudos are converted post-isel into t2WhileLoopStart and t2LoopEnd/LoopDec instructions, which themselves are defined to clobber CPSR. Doing the same with the MEMCPY nodes will make sure they are scheduled correctly to not end up with incorrect uses.	2021-05-14 15:06:59 +01:00
Malhar Jajoo	dfe3ffaa4a	[ARM] Transforming memset to Tail predicated Loop This patch converts llvm.memset intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). The llvm.memset is converted to a TP loop for both constant and non-constant input sizes (of llvm.memset). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100435	2021-05-07 13:35:53 +01:00
Malhar Jajoo	9ff38e2d9d	[ARM] Transforming memcpy to Tail predicated Loop This patch converts the llvm.memcpy intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). From an implementation point of view, the patch - adds an ARM specific SDAG Node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel) - adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter, on matching the above node. - adds a custom inserter function that expands the pseudo instruction into MIR suitable to be converted (by later passes) into a WLSTP loop. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99723	2021-05-06 23:21:28 +01:00
Malhar Jajoo	fc690777fc	Revert "[ARM] Transforming memcpy to Tail predicated Loop" Reverting commit since it causes failure (10462). This reverts commit b856f4a232cbd43476e9b9f75c80aacfc6f5c152.	2021-05-06 12:39:08 +01:00
Malhar Jajoo	b856f4a232	[ARM] Transforming memcpy to Tail predicated Loop This patch converts the llvm.memcpy intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). From an implementation point of view, the patch - adds an ARM specific SDAG Node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel) - adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter, on matching the above node. - adds a custom inserter function that expands the pseudo instruction into MIR suitable to be converted (by later passes) into a WLSTP loop. Note: A cli option is used to control the conversion of memcpy to TP loop and this option is currently disabled by default. It may be enabled in the future after further downstream testing. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99723	2021-05-06 09:34:09 +01:00
David Green	15b5d1a5bf	[ARM] Transfer memory operands for VLDn We create MMOs for the VLDn/VSTn intrinsics in ARMTargetLowering::getTgtMemIntrinsic, but they do not currently make it all the way through ISel. This changes that in the various places it needs changing, making sure that the MMO is propagated through to the final instruction. This can help in scheduling, not treating the VLD2/VST2 as a scheduling barrier. Differential Revision: https://reviews.llvm.org/D101096	2021-05-03 00:04:21 +01:00
David Green	8de7d8b2c2	[ARM] Recognize VIDUP from BUILDVECTORs of additions This adds a pattern to recognize VIDUP from BUILD_VECTOR of incrementing adds. This can come up from either geps or adds, and came up recently in D100550. We are just looking for a BUILD_VECTOR where each lane is an add of the first lane with N*i, where i is the lane and N is one of 1, 2, 4, or 8, supported by the VIDUP instruction. Differential Revision: https://reviews.llvm.org/D101263	2021-04-27 19:33:24 +01:00
David Green	6d9d2049c8	[ARM] VINS f16 pattern This adds an extra pattern for inserting an f16 into an odd vector lane via a VINS. If the dual-insert-lane pattern does not happen to apply, this can help with some simple cases. Differential Revision: https://reviews.llvm.org/D95471	2021-03-21 12:00:06 +00:00
David Green	21a4faab60	[ARM] Move double vector insert patterns using vins to DAG combine This removes the existing patterns for inserting two lanes into an f16/i16 vector register using VINS, instead using a DAG combine to pattern match the same code sequences. The tablegen patterns were already on the large side (foreach LANE = [0, 2, 4, 6]) and were not handling all the cases they could. Moving that to a DAG combine, whilst not less code, allows us to better control and expand the selection of VINSs. Additionally this allows us to remove the AddedComplexity on VCVTT. The extra trick that this has learned in the process is to move two adjacent lanes using a single f32 vmov, allowing some extra inefficiencies to be removed. Differential Revision: https://reviews.llvm.org/D96876	2021-02-22 09:29:47 +00:00
David Green	1e007cf43c	[ARM] Use rGPR for writeback vldrs From what I can tell, a writeback is unpredictable with LR for both loads and stores. This changes the operand from a gprnopc to a rGPR in both cases (which I believe is essentially a NFC due to the tied-def already being a rGPR.) Differential Revision: https://reviews.llvm.org/D96723	2021-02-16 16:44:47 +00:00
David Green	11e415dc90	[ARM] Make v2f64 scalar_to_vector legal Because we mark all operations as expand for v2f64, scalar_to_vector would end up lowering through a stack store/reload. But it is pretty simple to implement, only inserting a D reg into an undef vector. This helps clear up some inefficient codegen from soft calling conventions. Differential Revision: https://reviews.llvm.org/D96153	2021-02-08 11:34:55 +00:00
David Green	1b435eb8f3	[ARM] i16 insert-of-extract to VINS pattern This adds another tablegen fold that converts an i16 odd-lane-insert of an even-lane-extract into a VINS. We extract the existing f32 value from the destination register and VINS the new value into it. The rest of the backend then is able to optimize the INSERT_SUBREG / COPY_TO_REGCLASS / EXTRACT_SUBREG. Differential Revision: https://reviews.llvm.org/D95456	2021-02-08 08:41:07 +00:00
David Green	3e780616c4	[ARM] Correct some tablegen operand types. NFC	2021-02-02 16:55:31 +00:00
David Green	2753722b0f	[ARM] Mark MVE_VMOV_to_lane_32 as isInsertSubregLike This allows the peephole optimizer to know that a MVE_VMOV_to_lane_32 is the same as an insert subreg, allowing it to optimize some redundant lane moves. Differential Revision: https://reviews.llvm.org/D95433	2021-02-02 16:35:47 +00:00
David Green	3a5adf8483	[ARM] Add MVE insert-of-extract pattern A v4i32 insert of an extract can become a simple lane move, as opposed to round-tripping via a GPR. This adds a pattern that turns a v4i32 insert-extract pair into an EXTRACT_SUBREG/INSERT_SUBREG, with the required COPY_TO_REGCLASS. These get better optimized into a simple lane move by the rest of the backend. Differential Revision: https://reviews.llvm.org/D95428	2021-02-02 15:15:04 +00:00
David Green	c722575633	[ARM] Select VINS from vector inserts This patch adds tablegen patterns for pairs of i16/f16 insert/extracts. If we are inserting into two adjacent vector lanes (0 and 1 for example), we can use either a vmov;vins or vmovx;vins to insert the pair together, avoiding a round-trip through GPR registers. This is quite a large pattern with a number of EXTRACT_SUBREG/INSERT_SUBREG/COPY_TO_REGCLASS nodes, but as most of those become copies, hopefully all of that will be cleaned up by further optimizations. The VINS pattern was also adjusted to allow it to represent that it is inserting into the top half of an existing register. Differential Revision: https://reviews.llvm.org/D95381	2021-02-02 13:50:02 +00:00
David Green	024af42c60	[ARM] Custom lower i1 vector truncates The ISel patterns we have for truncating to i1's under MVE do not seem to be correct. Instead custom lower to icmp(ne, and(x, 1), 0). Differential Revision: https://reviews.llvm.org/D94226	2021-01-08 18:21:00 +00:00
David Green	e1c1adf9dc	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: `vmov q0[2], q0[0], r2, r0` or `vmov q0[3], q0[1], r3, r1`. They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector insert elements. Because the insert_elements are known to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: [3 2 1 0] -> vmovqrr 31; vmovqrr 20. [3 2 1] -> vmovqrr 31; vmov 2. [3 1] -> vmovqrr 31. [2 1 0] -> vmovqrr 20; vmov 1. [2 0] -> vmovqrr 20. The top one is the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. This is a recommit of 6cc3d80a84884a79967fffa4596c14001b8ba8a3 after fixing the backward instruction definitions.	2020-12-18 16:13:08 +00:00
David Green	6e913e4451	Revert "[ARM] Match dual lane vmovs from insert_vector_elt" This one needed more testing.	2020-12-18 13:33:40 +00:00
David Green	6cc3d80a84	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: `vmov q0[2], q0[0], r2, r0` or `vmov q0[3], q0[1], r3, r1`. They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector insert elements. Because the insert_elements are known to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: [3 2 1 0] -> vmovqrr 31; vmovqrr 20. [3 2 1] -> vmovqrr 31; vmov 2. [3 1] -> vmovqrr 31. [2 1 0] -> vmovqrr 20; vmov 1. [2 0] -> vmovqrr 20. The top one is the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. Differential Revision: https://reviews.llvm.org/D92553	2020-12-15 15:58:52 +00:00
Craig Topper	4252f7773a	[SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets. X86 was already specially marking fma as commutable which allowed tablegen to autogenerate commuted patterns. This moves it to the target independent definition and fixes up the targets to remove now unneeded patterns. Unfortunately, the tests change because the commuted versions of the patterns are generating operands in a different order than the explicit patterns. Differential Revision: https://reviews.llvm.org/D91842	2020-11-23 10:09:20 -08:00
David Green	c8c3a411c5	[ARM] Ensure MVE_TwoOpPattern is used inside Predicates	2020-11-22 21:38:00 +00:00
David Green	d14db8c8dc	[ARM] Match MVE vqdmulh This adds ISel matching for a form of VQDMULH. There are several ir patterns that we could match to that instruction, this one is for: min(ashr(mul(sext(a), sext(b)), 7), 127) Which is what llvm will optimize to once it has removed the max that usually makes up the min/max saturate pattern, as in this case the compare will always be false. The additional complication to match i32 patterns (which extend into an i64) is that the min will be a vselect/setcc, as vmin is not supported for i64 vectors. Tablegen patterns have also been updated to attempt to reuse the MVE_TwoOpPattern patterns. Differential Revision: https://reviews.llvm.org/D90096	2020-10-30 13:34:27 +00:00
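
The folds in commit 28d0718033 rely on the `nsw`/`nuw` flag ruling out wraparound in the add. The Python sketch below is not LLVM code: it models i8 lanes with hypothetical helpers (`wrap_s8`, `wrap_u8`, `avg_floor`, `avg_ceil`) that mirror the intended semantics of the AVGFLOOR/AVGCEIL nodes, then exhaustively checks all four folds and shows a counterexample when the flag is dropped.

```python
# Illustrative model of the avg folds from commit 28d0718033 on 8-bit lanes.
# Hypothetical helpers that mirror the node semantics; they are not LLVM APIs.


def wrap_s8(v: int) -> int:
    """Two's-complement wrap of v into the signed i8 range [-128, 127]."""
    return ((v + 128) & 0xFF) - 128


def wrap_u8(v: int) -> int:
    """Wrap v into the unsigned i8 range [0, 255]."""
    return v & 0xFF


def avg_floor(x: int, y: int) -> int:
    """floor((x + y) / 2) computed without overflow (AVGFLOORS/AVGFLOORU)."""
    return (x + y) // 2


def avg_ceil(x: int, y: int) -> int:
    """floor((x + y + 1) / 2) computed without overflow (AVGCEILS/AVGCEILU)."""
    return (x + y + 1) // 2


def check_avg_folds() -> int:
    """Exhaustively check the four folds; return how many cases were checked."""
    checked = 0
    for x in range(-128, 128):
        for y in range(-128, 128):
            s = wrap_s8(x + y)
            if s != x + y:
                continue  # add nsw would be poison, so the fold need not hold
            # (asr (add nsw x, y), 1) -> (avgfloors x, y); Python >> is arithmetic
            assert s >> 1 == avg_floor(x, y)
            # (avgfloors (add nsw x, y), 1) -> (avgceils x, y)
            assert avg_floor(s, 1) == avg_ceil(x, y)
            checked += 1
    for x in range(256):
        for y in range(256):
            s = wrap_u8(x + y)
            if s != x + y:
                continue  # add nuw would be poison
            # (lsr (add nuw x, y), 1) -> (avgflooru x, y)
            assert s >> 1 == avg_floor(x, y)
            # (avgflooru (add nuw x, y), 1) -> (avgceilu x, y)
            assert avg_floor(s, 1) == avg_ceil(x, y)
            checked += 1
    return checked


if __name__ == "__main__":
    print("cases checked:", check_avg_folds())
    # Without nsw the first fold breaks: 100 + 100 wraps to -56 in i8.
    print(wrap_s8(100 + 100) >> 1, "vs", avg_floor(100, 100))  # -28 vs 100
```

The ceil folds follow from the floor ones: once the add cannot wrap, avgfloor(x + y, 1) is floor((x + y + 1) / 2), which is exactly the ceil average of x and y.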

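Commit d14db8c8dc matches `min(ashr(mul(sext(a), sext(b)), 7), 127)` to VQDMULH on i8 lanes, noting that the max half of the usual min/max saturate has already been removed because its compare is always false. The Python check below is a hedged sketch, not LLVM code: `vqdmulh_s8` is a hypothetical model of the instruction taken from its architectural description, saturate((2 * a * b) >> 8). It verifies both claims exhaustively.

```python
# Exhaustive check, on 8-bit lanes, of the VQDMULH pattern from commit d14db8c8dc.
# vqdmulh_s8 is a hypothetical model of the instruction, not an LLVM or Arm API.


def sat_s8(v: int) -> int:
    """Clamp v to the signed i8 range."""
    return max(-128, min(127, v))


def vqdmulh_s8(a: int, b: int) -> int:
    """Saturating doubling multiply, returning the high half of an 8-bit lane."""
    return sat_s8((2 * a * b) >> 8)


def matched_pattern(a: int, b: int) -> int:
    """min(ashr(mul(sext(a), sext(b)), 7), 127) evaluated in a wide type."""
    return min((a * b) >> 7, 127)


def check_vqdmulh() -> int:
    """Assert the pattern equals VQDMULH; return the smallest pre-clamp value."""
    lowest = 127
    for a in range(-128, 128):
        for b in range(-128, 128):
            lowest = min(lowest, (a * b) >> 7)
            assert matched_pattern(a, b) == vqdmulh_s8(a, b)
    return lowest


if __name__ == "__main__":
    # The smallest value is -127 (from -128 * 127), so max(..., -128) never fires.
    print(check_vqdmulh())  # -127
```

Only one input pair actually saturates: a = b = -128 gives 16384 >> 7 = 128, which the min clamps to 127.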

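Commit 9cb8f4d1ad explains that the value of LR when an instruction executes selects how many lanes a tail-predicated MVE instruction updates. The Python function below is a deliberately simplified model written from that commit message alone. It assumes that the active lane count is min(LR, 4) for 32-bit elements and that inactive lanes keep their previous value. It shows why LR cannot be freely spilled or rematerialized: the same `VADD.s32` gives different results for different LR values.

```python
# Simplified model of a tail-predicated VADD.s32 (commit 9cb8f4d1ad example).
# Assumption: active lanes = min(lr, 4); inactive lanes keep the old q0 value.
from typing import List

LANES_S32 = 4  # a 128-bit MVE Q register holds four 32-bit lanes


def wrap_s32(v: int) -> int:
    """Two's-complement wrap into the signed 32-bit range."""
    return ((v + 2**31) % 2**32) - 2**31


def tp_vadd_s32(q0: List[int], q1: List[int], q2: List[int], lr: int) -> List[int]:
    """Return the new q0 after `VADD.s32 q0, q1, q2` with LR-driven predication."""
    active = min(lr, LANES_S32)
    return [
        wrap_s32(q1[i] + q2[i]) if i < active else q0[i]
        for i in range(LANES_S32)
    ]


if __name__ == "__main__":
    q0 = [0, 0, 0, 0]
    q1 = [1, 2, 3, 4]
    q2 = [10, 20, 30, 40]
    print(tp_vadd_s32(q0, q1, q2, lr=3))  # [11, 22, 33, 0]: lanes 0-2 updated
    print(tp_vadd_s32(q0, q1, q2, lr=1))  # [11, 0, 0, 0]: only lane 0 updated
```
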
276 Commits
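
Commit d9af9c2c5a folds `BinOp(x, select(p, y, Identity))` into a predicated operation. That is only sound if `x BinOp Identity` returns `x` bit-for-bit. The Python sketch below checks the identities named in that message using IEEE 754 doubles. This rests on an assumption: MVE f16/f32 lanes follow the same signed-zero rules under round-to-nearest. The sketch also shows why fadd needs -0.0 rather than 0.0: `-0.0 + 0.0` is `+0.0`.

```python
# Check the identity elements used by the select(binop) folds in commit d9af9c2c5a.
import math
import operator
import struct

# NaN is left out so that bit-pattern comparisons stay platform independent.
SAMPLES = [0.0, -0.0, 1.5, -2.25, 5e-324, math.inf, -math.inf]


def bits(x: float) -> bytes:
    """Bit pattern of x, so that -0.0 and +0.0 compare as different values."""
    return struct.pack("<d", x)


def is_identity(op, ident: float) -> bool:
    """True if op(x, ident) reproduces x exactly for every sample x."""
    return all(bits(op(x, ident)) == bits(x) for x in SAMPLES)


def fold_is_sound(op, ident: float) -> bool:
    """Compare BinOp(x, select(p, y, ident)) against a predicated BinOp."""
    for x in SAMPLES:
        for y in SAMPLES:
            for p in (True, False):
                unpredicated = op(x, y if p else ident)
                predicated = op(x, y) if p else x  # inactive lanes keep x
                if bits(unpredicated) != bits(predicated):
                    return False
    return True


if __name__ == "__main__":
    print(is_identity(operator.add, -0.0))  # True
    print(is_identity(operator.add, 0.0))   # False: -0.0 + 0.0 == +0.0
    print(is_identity(operator.sub, 0.0))   # True
    print(is_identity(operator.mul, 1.0))   # True
    print(fold_is_sound(operator.add, -0.0), fold_is_sound(operator.add, 0.0))  # True False
```
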