llvm-project

Author	SHA1	Message	Date
Nikita Popov	580210a0c9	[SLP] Convert some tests to opaque pointers (NFC)	2022-12-23 10:02:57 +01:00
Roman Lebedev	6697140ba1	[NFC] Port all SLPVectorizer tests to `-passes=` syntax	2022-12-07 21:44:09 +03:00
David Green	2de05afc19	[SLP] Peek into loads when hitting the RecursionMaxDepth This patch slightly extends the limit on the RecursionMaxDepth inside the SLP vectorizer. It does it only when it hits a load (or zext/sext of a load), which allows it to peek through in the places where it will be the most valuable, without ballooning out the O(..) by any 2^n factors. Differential Revision: https://reviews.llvm.org/D122148	2022-07-04 14:22:50 +01:00
Alexey Bataev	2ac5ebedea	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-23 07:06:45 -07:00
Florian Hahn	aeb19817d6	Revert "[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly." This reverts commit fc9c59c355cb255446e571b4515b5e41a76503c4. The patch triggers an assertion when building SPEC on X86. Reduced reproducer shared at D107966. Also reverts follow-up commit 11a09af76d11ad5a9f1f95b561112af17ff81f80.	2022-05-21 21:00:01 +01:00
Alexey Bataev	fc9c59c355	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-20 05:58:09 -07:00
David Green	802e15c576	[SLP] Cluster ordering for loads Given a load without a better order, this patch partially sorts the elements to form clusters of adjacent elements in memory. These clusters can potentially be loaded in fewer loads, meaning less overall shuffling (for example loading v4i8 clusters of a v16i8 as a single f32 loads, as opposed to multiple independent bytes loads and inserts). Differential Revision: https://reviews.llvm.org/D122145	2022-05-07 14:38:11 +01:00
David Green	2db46db54d	[SLP] Add tests for awkward laod orders from SLP. NFC	2022-05-07 10:27:32 +01:00

8 Commits