llvm-project

Author	SHA1	Message	Date
Dhruv Chawla	a7f3d17de1	[GlobalISel] Add support for interleave and deinterleave intrinsics to IRTranslator (#85199) This patch adds support for the @llvm.experimental.vector.{interleave2, deinterleave2} intrinsics to IRTranslator for fixed-width vector types. They are lowered to vector shuffles, in roughly the same manner as SelectionDAG.	2024-03-15 17:18:17 +05:30
Harvin Iriawan	db158c7c83	[AArch64] Update generic sched model to A510 Refresh of the generic scheduling model to use A510 instead of A55. Main benefits are to the little core, and introducing SVE scheduling information. Changes tested on various OoO cores, no performance degradation is seen. Differential Revision: https://reviews.llvm.org/D156799	2023-08-21 12:25:15 +01:00
David Green	18af853022	[AArch64] Remove 64bit->128bit vector insert lowering The AArch64 backend, during lowering, will convert a 64bit vector insert to a 128bit vector: vector_insert %dreg, %v, %idx => %qreg = insert_subvector undef, %dreg, 0; %ins = vector_insert %qreg, %v, %idx; EXTRACT_SUBREG %ins, dsub. This creates a bit of a mess in the DAG, and the EXTRACT_SUBREG being a machine node makes it difficult to simplify. This patch removes that, treating the 64bit vector insert as legal and handling it with extra tablegen patterns. The end result is a simpler DAG that is easier to write tablegen patterns for. Differential Revision: https://reviews.llvm.org/D144550	2023-03-01 09:39:51 +00:00
Caroline Concatto	d515ecca68	[IR] Add new intrinsics interleave and deinterleave vectors This patch adds 2 new intrinsics: ; Interleave two vectors into a wider vector <vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd) ; Deinterleave the odd and even lanes from a wider vector {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec) The main motivator for adding these intrinsics is to support vectorization of complex types using scalable vectors. The intrinsics are kept simple by only supporting a stride of 2, which makes them easy to lower and type-legalize. A stride of 2 is sufficient to handle complex types, which have only a real and an imaginary component. The format of the intrinsics matches how `shufflevector` is used in LoopVectorize. For example: using cf = std::complex<float>; void foo(cf * dst, int N) { for (int i=0; i<N; ++i) dst[i] += cf(1.f, 2.f); } For this loop, LoopVectorize: (1) Loads a wide vector (e.g. <8 x float>) (2) Extracts odd lanes using shufflevector (leading to <4 x float>) (3) Extracts even lanes using shufflevector (leading to <4 x float>) (4) Performs the addition (5) Interleaves the two <4 x float> vectors into a single <8 x float> using shufflevector (6) Stores the wide vector. In this example, we can 1-1 replace shufflevector in (2) and (3) with the deinterleave intrinsic, and replace the shufflevector in (5) with the interleave intrinsic. The SelectionDAG nodes might be extended to support higher strides (3, 4, etc) as well in the future. Similar to what was done for vector.splice and vector.reverse, the intrinsic is lowered to a shufflevector when the type is fixed width, so as to benefit from existing code that was written to recognize/optimize shufflevector patterns. Note that this approach does not prevent us from adding new intrinsics for other strides, or adding a more generic shuffle intrinsic in the future. It just solves the immediate problem of being able to vectorize loops with complex math. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D141924	2023-02-20 12:21:59 +00:00
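
The lane semantics described in the interleave/deinterleave commit above can be sketched in plain C++ on fixed-size buffers. This is only an illustration of the stride-2 behaviour, not LLVM code: the helper names `interleave2`, `deinterleave2`, and `add_complex_constant` are hypothetical, and `std::vector` stands in for a fixed-width vector register. For two 4-lane inputs, the interleave corresponds to a shuffle mask of <0, 4, 1, 5, 2, 6, 3, 7>, and the deinterleave to the masks <0, 2, 4, 6> and <1, 3, 5, 7>.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: lanes of `even` go to result positions 0, 2, 4, ...
// and lanes of `odd` go to positions 1, 3, 5, ...
template <typename T>
std::vector<T> interleave2(const std::vector<T>& even,
                           const std::vector<T>& odd) {
    assert(even.size() == odd.size());
    std::vector<T> out;
    out.reserve(even.size() * 2);
    for (std::size_t i = 0; i < even.size(); ++i) {
        out.push_back(even[i]);
        out.push_back(odd[i]);
    }
    return out;
}

// Hypothetical helper: splits a wide vector into its even-indexed lanes
// (first) and odd-indexed lanes (second).
template <typename T>
std::pair<std::vector<T>, std::vector<T>>
deinterleave2(const std::vector<T>& vec) {
    assert(vec.size() % 2 == 0);
    std::vector<T> even, odd;
    for (std::size_t i = 0; i < vec.size(); i += 2) {
        even.push_back(vec[i]);
        odd.push_back(vec[i + 1]);
    }
    return {even, odd};
}

// Mirrors steps (1)-(6) for `dst[i] += cf(1.f, 2.f)` on a flat buffer of
// (real, imaginary) pairs: split the lanes, add, and re-interleave.
std::vector<float> add_complex_constant(const std::vector<float>& wide) {
    auto parts = deinterleave2(wide);
    for (float& re : parts.first) re += 1.f;
    for (float& im : parts.second) im += 2.f;
    return interleave2(parts.first, parts.second);
}
```

Deinterleaving and then interleaving returns the original lane order, which is why the two operations can replace the matching shufflevector pairs one for one.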

4 Commits