llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	da570ef1b4	[DAG] Match select(icmp(x,y),sub(x,y),sub(y,x)) -> abd(x,y) patterns Pulled out of PowerPC, and added ABDS support as well (hence the additional v4i32 PPC matches) Differential Revision: https://reviews.llvm.org/D144789	2023-03-14 15:10:30 +00:00
Simon Pilgrim	4bf004e07e	[DAG] Fold (bitcast (logicop (bitcast x), (c))) -> (logicop x, (bitcast c)) iff the current logicop type is illegal Try to remove extra bitcasts around logicops if we're dealing with illegal types Fixes the regressions in D145939 Differential Revision: https://reviews.llvm.org/D146032	2023-03-14 14:41:11 +00:00
pvanhout	1f1fea6c38	Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 14:38:45 +01:00
pvanhout	0e79106fc9	Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel" This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.	2023-03-14 11:48:58 +01:00
pvanhout	0022b5803f	[DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 11:18:28 +01:00
Simon Pilgrim	c7d844ea0f	[DAG] Use ISD::isBitwiseLogicOp in AND/OR/XOR checks. NFCI. There's additional cases we can cleanup (mainly in target code), but this tries to cleanup generic code and PPC which had an equivalent helper.	2023-03-13 13:39:02 +00:00
Chen Zheng	4f0ed16a46	Reland rGf35a09daebd0a90daa536432e62a2476f708150d and rG63854f91d3ee1056796a5ef27753648396cac6ec [DAGCombiner] handle more store value forwarding When lowering calls on target like PPC, some stack loads will be generated for by value parameters. Node CALLSEQ_START prevents such loads from being combined. Suggested by @RolandF, this patch removes the unnecessary loads for the byval parameter by extending ForwardStoreValueToDirectLoad Reviewed By: nemanjai, RolandF Differential Revision: https://reviews.llvm.org/D138899	2023-03-12 21:59:18 -04:00
Jun Ma	00eef4f7c3	[SelectionDAG] Fix mismatched truncate when combine BUILD_VECTOR with EXTRACT_SUBVECTOR Just use correct type for truncation. Fixes PR59625 Differential Revision: https://reviews.llvm.org/D145757	2023-03-13 08:59:52 +08:00
Simon Pilgrim	82dc04befd	[DAG] visitZERO_EXTEND - pull out the repeated SDLoc(N) variables	2023-03-12 15:18:46 +00:00
Simon Pilgrim	4d7da0e711	[DAG] Cleanup the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold. NFC. Preliminary cleanup before adding some additional legality and value tracking handling.	2023-03-12 15:01:33 +00:00
Simon Pilgrim	b53ea2b9c5	[DAG] visitAND - fold (and (any_ext V), c) -> (zero_ext (and (trunc V), c)) if profitable. Try to more aggressively narrow masks of extended values. This is mainly for cases where the mask is trying to zero out any_extended upper bits, assuming we can zext/trunc the values for free. This catches a few actual missed folds, as well as helps canonicalize a number of other cases which were being caught in isel etc. Differential Revision: https://reviews.llvm.org/D145866	2023-03-12 13:25:23 +00:00
Simon Pilgrim	fad852efe4	[DAG] combineShiftAnd1ToBitTest - improve support for peeking through truncations Allows us to handle shift amounts that exceed the original bitwidth	2023-03-11 16:37:47 +00:00
Yeting Kuo	b2c48559c8	[IR][DAG][RISCV] Allow scalable vector ISD::STRICT_FP_EXTEND and RISC-V supports for vector ISD::STRICT_FP_EXTEND. The patch mainly does two things. The first is allowing scalable vector ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower strict_fpextend to riscv_strict_fpextend_vl, the strict version of riscv_fpextend_vl. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D145548	2023-03-09 17:37:59 +08:00
Juneyoung Lee	a66bc1c4a3	[DAGCombiner] Avoid converting (x or/xor const) + y to (x + y) + const if benefit is unclear This patch resolves suboptimal code generation reported by https://github.com/llvm/llvm-project/issues/60571 . DAGCombiner currently converts `(x or/xor const) + y` to `(x + y) + const` if this is valid. However, if `.. + const` is broken down into a sequence of adds with carries, the benefit is not clear, introducing two more add(-with-carry) ops (total 6) in the case of the reported issue whereas the optimal sequence must only have 4 add(-with-carry)s. This patch resolves this issue by allowing this conversion only when (1) `.. + const` is legal or promotable, or (2) `const` is a sign bit because it does not introduce more adds. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D144116	2023-03-08 18:13:57 +00:00
Chen Zheng	fc26ab36a2	[DAGCombiner] don't use the pointer info for widen store The merged store touches memory for other underlying objects, so mapping the merged store to the first underlying object is not correct. For example in https://github.com/llvm/llvm-project/issues/60744, the merged store is not correctly analyzed as dependent with memory operations which are also part of the merged store. Fixes #60744 Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144711	2023-03-07 20:31:09 -05:00
Jay Foad	0265dd9925	Fix "compatiable" typos	2023-03-07 12:57:39 +00:00
Noah Goldstein	c1ecd0a3f4	[DAGCombiner] Add fold for `~x + x` -> `-1` This is generally done by the InstCombine, but can be emitted as an intermediate step and is cheap to handle. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D145177	2023-03-06 20:30:27 -06:00
Noah Goldstein	d4b24b4a55	[DAGCombiner] Add fold for `~x & x` -> `0` This is generally done by the InstCombine, but can be emitted as an intermediate step and is cheap to handle. Differential Revision: https://reviews.llvm.org/D145143	2023-03-06 20:30:20 -06:00
Marco Elver	bdb4353ae0	[SelectionDAG] Optimize copyExtraInfo deep copy It turns out that there are relatively trivial, albeit rare, cases that require a MaxDepth of more than 16 (see added test). However, we want to avoid having to rely on a large fixed MaxDepth. Since these cases are relatively rare, apply the following strategy: 1. Start with a low MaxDepth of 16 - if the entry node was not reached, we can return (the common case). 2. If the entry node was reached, exponentially increase MaxDepth up to some large limit that should cover all cases and guard against stack exhaustion. This retains the better performance with a low MaxDepth in the common case, and in complex cases backs off and retries. On a whole, this is preferable vs. starting with a large MaxDepth which would unnecessarily penalize the common case where a low MaxDepth is sufficient. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D145386	2023-03-06 17:29:53 +01:00
Caroline Concatto	204800ad0a	[IR][Legalization] Promote illegal deinterleave and interleave vectors To make legalization easier, the operands and outputs have the same size for these ISD Nodes. When legalizing the results in PromoteIntegerResult the operands are legalized to the same size as the outputs. The ISD Node has two output/results, therefore the legalizing functions update both results/outputs. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D144846	2023-03-03 10:54:52 +00:00
Marco Elver	7ecd2a23f5	[SelectionDAG] Fix missing lambda capture Move MaxDepth into the lambda, since it is not needed outside. This fixes some compilers that complain about missing capture: error C3493: 'MaxDepth' cannot be implicitly captured because no default capture mode has been specified Fixes: f693932fbea7 ("[SelectionDAG] Transitively copy NodeExtraInfo on RAUW")	2023-03-02 23:47:36 +01:00
Marco Elver	f693932fbe	[SelectionDAG] Transitively copy NodeExtraInfo on RAUW During legalization of the SelectionDAG, some nodes are replaced with arch-specific nodes. These may be complex nodes, where the root node no longer corresponds to the node that should carry the extra info. Fix the issue by copying extra info to the new node and all its new transitive operands during RAUW. See code comments for more details. This fixes the remaining pcsections-atomics.ll tests on X86. v2: Optimize copyExtraInfo() deep copy. For now we assume that only NodeExtraInfo that have PCSections set require deep copy. Furthermore, limit the depth of graph search while pre-populating the visited set, assuming the to-be-replaced subgraph 'From' has limited complexity. An assertion catches if the maximum depth needs to be increased. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D144677	2023-03-02 23:07:19 +01:00
Craig Topper	06c6b787b2	[SelectionDAG][AArch64] Constant fold in SelectionDAG::getVScale if VScaleMin==VScaleMax. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D145113	2023-03-02 12:02:38 -08:00
Craig Topper	c546f13f1f	[DAGCombiner] Replace LegalOperations check in visitSIGN_EXTEND with LegalTypes. This is guarding a check for isTypeLegal so it should check is LegalTypes. Fixes PR61111. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D145139	2023-03-02 07:52:53 -08:00
Sander de Smalen	170e7a0ec2	[AArch64][SME2] Add CodeGen support for target("aarch64.svcount"). This patch adds AArch64 CodeGen support such that the type can be passed and returned to/from functions, and also adds support to use this type in load/store operations and PHI nodes. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D136862	2023-03-02 12:07:41 +00:00
J. Ryan Stinnett	22b8e82c12	[DebugInfo] Remove `dbg.addr` from CodeGen As part of this work, removing `SDDbgValue::clearIsEmitted` originally added for `dbg.addr` in 045c67769d7fe577fc38cccb6fb40fd814437447 was attempted, but it appears some tests for `DBG_INSTR_REF` now depend on that behaviour as well, so it was kept and comments were updated instead. Part of `dbg.addr` removal Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898 Differential Revision: https://reviews.llvm.org/D144800	2023-03-02 09:29:43 +00:00
J. Ryan Stinnett	f5b85c02e9	[DebugInfo][NFC] Remove `FuncArgumentDbgValueKind::Addr` from SelectionDAG This removes the unused `FuncArgumentDbgValueKind::Addr` value originally added by e24f5348798605a799c63ff09169d177d262cd37. The intent was to signal the original intrinsic that marked a function argument, but the `Addr` part was never used. Part of `dbg.addr` removal Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898 Differential Revision: https://reviews.llvm.org/D144794	2023-03-02 09:29:42 +00:00
Marco Elver	e0bc779000	Revert "[SelectionDAG] Transitively copy NodeExtraInfo on RAUW" This reverts commit 7f635b90e7bdf1378fd9a65fc62b99e8e07d4aaf. The current implementation causes pathological slowdowns in certain cases: https://github.com/llvm/llvm-project/issues/61108	2023-03-02 09:39:44 +01:00
Simon Pilgrim	73cdccad55	[DAG] expandIntMINMAX - attempt to match existing SETCC node As noticed on D144789, when we have pairs of min/max nodes we often end up with multiple comparisons which we could reuse with commuted select ops, so check to see if a suitable SETCC already exists. This also allowed us to remove a similar X86 peephole. There are other getSETCC cases where we could safely reuse other CondCodes as well - I've been trying to think of how we could reuse this logic in SelectionDAG but haven't found anything that always works well. An alternative would be to have a TLI callback that returns a preferred CondCode from a list of options, I've noticed this helped fpclamptosat tests on some other targets (MVE + WebAssembly), but other tests suffered. Differential Revision: https://reviews.llvm.org/D145065	2023-03-01 19:04:03 +00:00
David Green	337215ddf9	[DAG] ABD is not reassociative I'm not sure how I missed this in the testing, but as far as I understand whilst ABDS and ABDU are commutative they are not associative. This patch disables reassociateOps from visitABD, fixing the problems found in #61069. ABDU: https://alive2.llvm.org/ce/z/eiT5QG ABDS: https://alive2.llvm.org/ce/z/HzE29l Differential Revision: https://reviews.llvm.org/D145064	2023-03-01 16:22:13 +00:00
Caroline Concatto	cb96eba27c	[IR][Legalization] Split illegal deinterleave and interleave vectors To make legalization easier, the operands and outputs have the same size for these ISD Nodes. When legalizing the results in SplitVectorResult the operands are legalized to the same size as the outputs. The ISD Node has two output/results, therefore the legalizing functions update both results/outputs. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144744	2023-03-01 08:30:16 +00:00
David Green	06daa515b2	[AArch64] Don't remove free sext_inreg(vector_extract(x)) if it leads to multiple extracts If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG will try to remove the sext_inreg, using vector_extract(x) directly. This can lead to multiple uses of both sext_inreg(vector_extract(x)) and vector_extract(x), leading to the generation of both umov and smov extracts. This adds a target hook to prevent that under AArch64 where the sext_inreg can be considered free if there are multiple uses of the sext and no uses of the vector_extract. This helps fix a small regression from D144550. Differential Revision: https://reviews.llvm.org/D144850	2023-02-27 19:20:10 +00:00
Marco Elver	7f635b90e7	[SelectionDAG] Transitively copy NodeExtraInfo on RAUW During legalization of the SelectionDAG, some nodes are replaced with arch-specific nodes. These may be complex nodes, where the root node no longer corresponds to the node that should carry the extra info. Fix the issue by copying extra info to the new node and all its new transitive operands during RAUW. See code comments for more details. This fixes the remaining pcsections-atomics.ll tests on X86. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D144677	2023-02-27 12:16:14 +01:00
Noah Goldstein	e981e6d10e	Add transform for `(and/or (icmp eq/ne A,-1),(icmp eq/ne A,-1+C))`->`(and/or (icmp eq/ne (and ~A,-1+C),0))` This works if `-1+C` is a negative power of 2. This can be more useful than the `AddAnd` case as `~A` does not necessarily require materializing a constant. This makes the transform worth it for X86 vector types. Alive2 Links: EQ: https://alive2.llvm.org/ce/z/P6u8cq NE: https://alive2.llvm.org/ce/z/_Kkqp1 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D144284	2023-02-24 15:22:09 -06:00
Noah Goldstein	8c74c5402f	Make `(and/or (icmp eq/ne A,C0), (icmp eq/ne A,C1))` where `IsPow(dif(C0,C1))` work for more patterns. `(and/or (icmp eq/ne A,C0), (icmp eq/ne A,C1))` can be lowered to `(icmp eq/ne (and (sub A, (smin C0, C1)), (not (sub (smax C0, C1), (smin C0, C1)))), 0)` generically if `(sub (smax C0, C1), (smin C0,C1))` is a power of 2. This covers the existing case of `(and/or (icmp eq/ne A, C_Pow2),(icmp eq/ne A, -C_Pow2))` as well as other cases. Alive2 Links: EQ: https://alive2.llvm.org/ce/z/mLJiUW NE: https://alive2.llvm.org/ce/z/TKnzUr Differential Revision: https://reviews.llvm.org/D144283	2023-02-24 15:22:09 -06:00
Serge Pavlov	7f81dd4dd6	[NFC] Make FPClassTest a bitmask enumeration This is recommit of 2e416cdd52, fixed to be acceptable by GCC. The original commit message is below. With this change bitwise operations are allowed for FPClassTest enumeration, it must simplify using this type. Also some functions changed to get argument of type FPClassTest instead of unsigned. Differential Revision: https://reviews.llvm.org/D144241	2023-02-24 15:12:16 +07:00
Samuel Parker	f48d3b6f46	Revert "[DAGCombine] Fold redundant select" This reverts commit c7f9344d0f8f6a00adab138037e2e7b406ef2b69.	2023-02-23 17:59:41 +00:00
Craig Topper	230e61658b	[LegalizeTypes] Add a special case for (add X, 1) to ExpandIntRes_ADDSUB. On targets without ADDCARRY or ADDE, we need to emit a separate SETCC to determine carry from the low half to the high half. Usually we do (setult Lo, LHSLo). If RHSLo is 1 we can instead do (seteq Lo, 0). This can reduce the live range of LHSLo.	2023-02-23 09:47:42 -08:00
Craig Topper	2fc5a5117c	[LegalizeTypes][RISCV] Add a special case to ExpandIntRes_UADDSUBO for (uaddo X, 1). On targets that lack ADDCARRY support we split a wide uaddo into an ADD and a SETCC that both need to be split. For (uaddo X, 1) we can observe that when the add overflows the result will be 0. We can emit (seteq (or Lo, Hi), 0) to detect this. This improves D142071. There is an alternative here. We could use either ~(lo(X) & hi(X)) == 0 or (lo(X) & hi(X)) == -1 before the addition. That would be closer to the code before D142071. Reviewed By: liaolucy Differential Revision: https://reviews.llvm.org/D144614	2023-02-23 09:16:54 -08:00
Nikita Popov	8347ca7dc8	[PatternMatch] Don't require DataLayout for m_VScale() The m_VScale() matcher is unusual in that it requires a DataLayout. It is currently used to determine the size of the GEP type. However, I believe it is sufficient to check for the canonical <vscale x 1 x i8> form here -- I don't think there's a need to recognize exotic variations like <vscale x 1 x i4> as a vscale constant representation as well. Differential Revision: https://reviews.llvm.org/D144566	2023-02-23 15:30:29 +01:00
Yeting Kuo	419948fe67	[VP] Reorder is_int_min_poison/is_zero_poison operand before mask for vp.abs/ctlz/cttz. The patch ensures last two operands of vp.abs/ctlz/cttz are mask and evl. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144536	2023-02-23 13:58:21 +08:00
Serge Pavlov	08a09235b6	Revert "[NFC] Make FPClassTest a bitmask enumeration" This reverts commit e7613c1d9b259bdf2b0b06b4169d9a10dd553406. GCC issues an error: In file included from /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:9: /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: explicit specialization of template<class E, class Enable> struct llvm::is_bitmask_enum outside its namespace must use a nested-name-specifier [-fpermissive] 66 | template <> struct is_bitmask_enum<Enum> : std::true_type {}; \ | ^~~~~~~~~~~~~~~~~~~~~ /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:30:1: note: in expansion of macro LLVM_DECLARE_ENUM_AS_BITMASK 30 | LLVM_DECLARE_ENUM_AS_BITMASK(Flags2, V4); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~	2023-02-23 12:55:58 +07:00
Serge Pavlov	e7613c1d9b	[NFC] Make FPClassTest a bitmask enumeration This is recommit of 2e416cdd52, reverted in 8555ab2fcd, because GCC complains on extra qualification. The macro LLVM_DECLARE_ENUM_AS_BITMASK does not specify llvm:: anymore, so the macro must occur in the namespace llvm. Documentation updated accordingly. The original commit message is below. With this change bitwise operations are allowed for FPClassTest enumeration, it must simplify using this type. Also some functions changed to get argument of type FPClassTest instead of unsigned. Differential Revision: https://reviews.llvm.org/D144241	2023-02-23 12:38:57 +07:00
Cameron McInally	af4c4f4e21	[DAGCombine] Fix an ICE in combineMinNumMaxNum(...) 65420c8041f4 introduced an ICE in combineMinNumMaxNum(...) when combineMinNumMaxNumImpl(...) returns an SDValue(). Make sure to check that a value is returned before trying to perform an FNEG on it. GitHub Issue: #60924 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144571	2023-02-22 11:00:51 -08:00
Ricardo Jesus	272bd573dc	[AArch64] Fix abs(sub nsw) -> absd This partially reverts a regression introduced in 8f25e382c5b1 for AArch64 targets. In particular, we restore the logic of `(abs (sub nsw x, y)) -> abds(x, y)` for all targets except X86, which keeps the logic introduced in 8f25e382c5b1. See also https://reviews.llvm.org/D142288. Differential Revision: https://reviews.llvm.org/D144379	2023-02-22 09:17:25 +00:00
Nikita Popov	8555ab2fcd	Revert "[NFC] Make FPClassTest a bitmask enumeration" This reverts commit 2e416cdd52c1079b8c7cb1f7d7e557c889a4fb56. Breaks the GCC build: In file included from /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h:18, from /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/APFloat.h:20, from /home/npopov/repos/llvm-project/llvm/lib/Support/APFloat.cpp:14: /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: extra qualification not allowed [-fpermissive] 66 | template <> struct llvm::is_bitmask_enum<Enum> : std::true_type {}; \ | ^~~~ /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h:223:1: note: in expansion of macro ‘LLVM_DECLARE_ENUM_AS_BITMASK’ 223 | LLVM_DECLARE_ENUM_AS_BITMASK(FPClassTest, /* LargestValue */ fcPosInf); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:67:22: error: extra qualification not allowed [-fpermissive] 67 | template <> struct llvm::largest_bitmask_enum_bit<Enum> { \ | ^~~~ /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h:223:1: note: in expansion of macro ‘LLVM_DECLARE_ENUM_AS_BITMASK’ 223 | LLVM_DECLARE_ENUM_AS_BITMASK(FPClassTest, /* LargestValue */ fcPosInf); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ [43/4396] Building CXX object lib/Supp...iles/LLVMSupport.dir/CommandLine.cpp.o	2023-02-22 08:56:19 +01:00
Serge Pavlov	2e416cdd52	[NFC] Make FPClassTest a bitmask enumeration With this change bitwise operations are allowed for FPClassTest enumeration, it must simplify using this type. Also some functions changed to get argument of type FPClassTest instead of unsigned. Differential Revision: https://reviews.llvm.org/D144241	2023-02-22 14:20:04 +07:00
Fangrui Song	e4f4f34e7a	[SelectionDAG] Migrate away from soft-deprecated functions. NFC	2023-02-21 11:01:34 -08:00
Caroline Concatto	d515ecca68	[IR] Add new intrinsics interleave and deinterleave vectors This patch adds 2 new intrinsics: ; Interleave two vectors into a wider vector <vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd) ; Deinterleave the odd and even lanes from a wider vector {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec) The main motivator for adding these intrinsics is to support vectorization of complex types using scalable vectors. The intrinsics are kept simple by only supporting a stride of 2, which makes them easy to lower and type-legalize. A stride of 2 is sufficient to handle complex types which only have a real/imaginary component. The format of the intrinsics matches how `shufflevector` is used in LoopVectorize. For example: using cf = std::complex<float>; void foo(cf * dst, int N) { for (int i=0; i<N; ++i) dst[i] += cf(1.f, 2.f); } For this loop, LoopVectorize: (1) Loads a wide vector (e.g. <8 x float>) (2) Extracts odd lanes using shufflevector (leading to <4 x float>) (3) Extracts even lanes using shufflevector (leading to <4 x float>) (4) Performs the addition (5) Interleaves the two <4 x float> vectors into a single <8 x float> using shufflevector (6) Stores the wide vector. In this example, we can 1-1 replace shufflevector in (2) and (3) with the deinterleave intrinsic, and replace the shufflevector in (5) with the interleave intrinsic. The SelectionDAG nodes might be extended to support higher strides (3, 4, etc) as well in the future. Similar to what was done for vector.splice and vector.reverse, the intrinsic is lowered to a shufflevector when the type is fixed width, so to benefit from existing code that was written to recognize/optimize shufflevector patterns. Note that this approach does not prevent us from adding new intrinsics for other strides, or adding a more generic shuffle intrinsic in the future. It just solves the immediate problem of being able to vectorize loops with complex math. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D141924	2023-02-20 12:21:59 +00:00
Kazu Hirata	a28b252d85	Use APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC) Note that getMinSignedBits has been soft-deprecated in favor of getSignificantBits.	2023-02-19 23:56:52 -08:00

12751 Commits
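
Several of the commit messages above rest on small integer identities: `~x + x` -> `-1`, `~x & x` -> `0`, select(icmp(x,y),sub(x,y),sub(y,x)) -> abd(x,y), and the `(seteq Lo, 0)` / `(seteq (or Lo, Hi), 0)` checks used when splitting `(add X, 1)` and `(uaddo X, 1)`. The sketch below brute-forces those identities over 8-bit halves. It is only an illustration of the arithmetic, not LLVM code; the helper names (`abdu`, `abds`, `uaddo_by_one_split`, `check_identities`) and the 8-bit width are assumptions chosen for this example.

```python
BITS = 8                      # width of one "half"; an assumption for this sketch
MASK = (1 << BITS) - 1


def to_signed(v, bits=BITS):
    """Interpret the low `bits` bits of v as a two's-complement integer."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def abdu(x, y):
    """select(x >u y, x - y, y - x): unsigned absolute difference."""
    return (x - y) & MASK if x > y else (y - x) & MASK


def abds(x, y):
    """select(x >s y, x - y, y - x): signed absolute difference."""
    return (x - y) & MASK if to_signed(x) > to_signed(y) else (y - x) & MASK


def uaddo_by_one_split(x):
    """Add 1 to a 2*BITS-wide value using only BITS-wide halves.

    Returns (new_lo, new_hi, overflow).
    """
    lo, hi = x & MASK, (x >> BITS) & MASK
    new_lo = (lo + 1) & MASK
    carry = int(new_lo == 0)             # (seteq Lo, 0) replaces (setult Lo, LHSLo)
    new_hi = (hi + carry) & MASK
    overflow = (new_lo | new_hi) == 0    # (seteq (or Lo, Hi), 0)
    return new_lo, new_hi, overflow


def check_identities():
    """Exhaustively check the identities for every 8-bit (or 16-bit) input."""
    for x in range(1 << BITS):
        assert (~x + x) & MASK == MASK   # ~x + x is all ones, i.e. -1
        assert (~x & x) & MASK == 0
        for y in range(1 << BITS):
            assert abdu(x, y) == abs(x - y)
            assert abds(x, y) == abs(to_signed(x) - to_signed(y)) & MASK
    for x in range(1 << (2 * BITS)):
        new_lo, new_hi, overflow = uaddo_by_one_split(x)
        total = x + 1
        assert new_lo == total & MASK
        assert new_hi == (total >> BITS) & MASK
        assert overflow == bool(total >> (2 * BITS))
    return True
```

Calling `check_identities()` walks every input exhaustively, so a failing assertion would pinpoint a counterexample; the same approach does not scale to 32- or 64-bit types, which is why the commits themselves cite Alive2 proofs.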