llvm-project

Author	SHA1	Message	Date
Alexey Bataev	434aa2fe56	[SLP]Improve canReuseExtracts for reordering analysis. Improve the analysis in canReuseExtracts for reordering, to better reorder extracts for the ExtractSubvector pattern.	2023-09-15 12:09:45 -07:00
Alexey Bataev	b9ad72ba05	[SLP]Fix PR66176: SLP incorrectly reorders select operands. On the very first iteration for the reductions, when trying to build a reduction for boolean logic operations, there is no need to compare LHS/RHS with the Reduction (VectorizedTree); they need to be compared with the actual parameters of the reduction operations.	2023-09-15 03:57:36 -07:00
Alexey Bataev	d2ab97b00c	[SLP][NFC]Add a test with incorrect reduction of poisoned logical bool.	2023-09-14 17:11:44 -07:00
Alexey Bataev	c15c1e5dd5	[SLP]Do not account non-instructions for external use. If a non-instruction gets vectorized, there is no need to account for its extract cost: it won't be removed and replaced by an extractelement instruction.	2023-09-14 12:40:33 -07:00
Alexey Bataev	1034405486	[SLP][NFC]Add a test for non-instruction with external use.	2023-09-14 12:34:14 -07:00
Thomas	0a7a926007	[NVPTX] Make i16x2 a native type and add supported vec instructions (#65799). Recommit https://github.com/llvm/llvm-project/pull/65432 with a minor bug fix for bitcasts.	2023-09-08 13:44:58 -07:00
Alexey Bataev	5bab59de44	[SLP]Try to vectorize scalars that are already vectorized but do not need to be scheduled. If a scalar does not need to be scheduled and was already vectorized in one of the vector nodes, we can still try to vectorize it in another node. Its cost just must not be counted in the total scalar cost, as it is handled in the main vectorized node. Differential Revision: https://reviews.llvm.org/D159205	2023-09-08 13:34:12 -07:00
Dmitri Gribenko	b3a14cac4f	Revert "[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432)" This reverts commit db5d845c73ee2d64f1a5bab3fc72edece9e3a7ba. As per PR discussion: "Looks like we've missed lowering of bitcasts between v2f16 and v2i16 and it breaks XLA."	2023-09-08 19:28:15 +02:00
Alexey Bataev	30edf1c449	[SLP]Do not early exit if the number of unique elements is non-power-of-2. (#65476) We can still try to vectorize the bundle of instructions even if the number of unique (non-repeated) instructions is non-power-of-2. In this case, the cost needs to be adjusted (calculated only for the unique scalar instructions), and so does the cost of the extracts. Also, when scheduling the bundle, only the unique scalars need to be scheduled, to avoid a compiler crash caused by multiple dependencies. This can be safely applied only if all of the scalars' users are also vectorized and do not require memory accesses (a temporary requirement that can be relaxed later). --------- Co-authored-by: Alexey Bataev <a.bataev@outlook.com>	2023-09-08 10:00:46 -04:00
Ramkumar Ramachandra	a06be8a2e4	SLP/RISCV: add negative test for lrint (#55208) (#65611). Issue #55208 describes a current deficiency of the SLPVectorizer, namely that it doesn't vectorize code written with lrint, while similar code written with rint is vectorized. Add a test corresponding to this issue for the RISC-V target.	2023-09-08 10:58:14 +01:00
Ramkumar Ramachandra	7f499579a8	SLP/RISCV: add test for vectorized ctpop, like in X86 (#65330). Recently, 7f26c27 turned on SLP by default for RISC-V, and although there are quite a few tests for SLP under the X86/ target, it is unclear whether the same constructs would be vectorized on RISC-V. This patch takes a step toward remedying this by noting that ctpop is often vectorized on RISC-V and adding four tests for different integer widths.	2023-09-07 17:02:13 +01:00
Thomas	db5d845c73	[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432). On sm_90, some instructions now support i16x2, which allows the hardware to execute add, min and max instructions more efficiently. To support that, i16x2 needs to be a native type in the backend. This makes the necessary changes to make i16x2 a native type and adds support for the instructions that natively support i16x2. This caused a negative test in the NVPTX SLP tests to start passing; the test was changed to a positive one, as the IR is correctly vectorized.	2023-09-06 21:59:13 -07:00
Alexey Bataev	25fd5e63f8	[SLP][NFC]Update test checks, NFC.	2023-09-06 13:57:49 -07:00
Matt Arsenault	5c0da5839d	InstCombine: Recognize fabs as bitcasted integer. In the past we sort of pretended float might be implementable as a non-IEEE type, but that would never realistically work. Exotic FP types would need to be added to the IR. Turning these into FP operations enables FP tracking optimizations. https://reviews.llvm.org/D151937	2023-08-31 19:03:48 -04:00
Matt Arsenault	50a9b3d8a5	InstCombine: Recognize fneg when performed as bitcasted integer. This is a resurrection of D18874. This was previously wrong because fneg was conflated with fsub, but we now have a proper fneg instruction. Additionally, I think it is now clearer that IR float = IEEE float, and a different bit layout would require adding a different IR type. https://reviews.llvm.org/D151934	2023-08-31 18:59:34 -04:00
Philip Reames	aada8f2e54	[slp] Tweak debug costing output to include VL. This makes it much easier to understand which vector length is being considered when the same set of nodes is evaluated at multiple vector lengths.	2023-08-30 09:13:19 -07:00
Philip Reames	514b38cd7e	[RISCV] Remove mask size restriction on single source and dual src shuffle costing (try 2). Some callers pass in an empty mask to represent "unknown"; we should use the generic costs for these cases. We can add VL=1 costing separately if desired. Reapplying after revert: a new test had been added, and I'd missed updating it when rebasing before. This is a great happy accident, as I hadn't figured out how to get SLP to exercise this case; I'd merely noticed it via inspection.	2023-08-23 14:43:02 -07:00
wangpc	9a82bda9de	[RISCV] Fix assertion of getShuffleCost. This assertion was introduced by D157425. We should calculate the cost iff `Mask` is not empty. Fixes #64901. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D158590	2023-08-23 20:10:50 +08:00
Alexey Bataev	b51195dece	[SLP]Fix PR63854: Add proper sorting of pointers for masked stores. If the masked gathers can be reordered, the reordering may produce a strided access pattern; if it does not affect the common reordering, it is better to try to reorder the masked gathers for better performance. Differential Revision: https://reviews.llvm.org/D157009	2023-08-22 06:14:01 -07:00
Nikita Popov	69bd66b3ce	[Tests] Remove some and/or constant expressions in tests (NFC). In preparation for their removal in D158081.	2023-08-21 12:05:32 +02:00
Alexey Bataev	9a207578ac	[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask(). This improves the cost estimation of shuffle instructions and the vectorization outcome. Differential Revision: https://reviews.llvm.org/D157425	2023-08-18 13:47:01 -07:00
Alexey Bataev	63c7815faf	[SLP]Fix comparator for PHI nodes comparison. Fixed the comparator used for sorting PHI nodes so that it meets the requirements of a strict weak ordering.	2023-08-14 14:05:39 -07:00
Alexey Bataev	4f0bd8f7ac	[SLP]Fix strict weak ordering for Cmp instruction comparator. Sorting algorithms require comparators to provide a strict weak ordering; this is the final fix for the Cmp instruction comparator.	2023-08-14 09:37:46 -07:00
Alexey Bataev	2216507171	[SLP]Fix PR64568: Crash during horizontal reduction. If a reduced value is constant-foldable and was folded to a constant during previous transformations, it needs to be excluded from the list of reduced values/instructions as non-matchable.	2023-08-10 07:33:16 -07:00
Matt Arsenault	25bc999d1f	Intrinsics: Add type overload to stacksave and stackrestore. This allows use with non-0 address space stacks. llvm_ptr_ty should never be used. This could use some more percolation up through MLIR, but this is enough to fix existing tests. https://reviews.llvm.org/D156666	2023-08-09 18:33:11 -04:00
Alexey Bataev	d0e3a571e7	[SLP]Fix PR64519: Unexpected reordering of gathers. The issue is actually related to ScatterVectorize nodes. If such a node gets reordered during bottom-to-top reordering, it may have associated non-empty ReorderIndices. In this case, such nodes need to be handled the same way as regular Vectorize nodes, not NeedToGather nodes: the ReorderIndices array needs to be reordered rather than the scalars.	2023-08-08 08:07:25 -07:00
Alexey Bataev	e894c3d1a9	[SLP]Improve stores vectorization. Use an O(n log n) instead of an O(N^2) (N <= 32) sorting approach, and do not try to revectorize all possible combinations of stores if they definitely cannot be combined because of memory/data dependencies. Compile time (O3 + LTO, skylake_avx512): External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15 120.11 2.5%; External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67 207.42 1.8%; External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43 235.01 1.1%; External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49 207.25 0.9%; External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46 306.23 -1.4%. Link time (O3 + LTO, skylake_avx512): External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69 1475.94 6.7%. Other changes are too small to rely on. size..text (columns: Program, results, results0, diff): test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test 392.00 1439.00 267.1%; test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 394258.00 394818.00 0.1%; test-suite :: MultiSource/Applications/JM/lencod/lencod.test 846355.00 847075.00 0.1%; test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 782816.00 783360.00 0.1%; test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 779667.00 779923.00 0.0%; test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 224398.00 224446.00 0.0%; test-suite :: MultiSource/Applications/oggenc/oggenc.test 185019.00 185035.00 0.0%; test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00 0.0%; test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1051772.00 1051804.00 0.0%; test-suite :: MultiSource/Applications/SPASS/SPASS.test 529586.00 529602.00 0.0%; test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1084684.00 1084716.00 0.0%; test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1014245.00 1014261.00 0.0%; test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 223494.00 223478.00 -0.0%; test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 660843.00 660795.00 -0.0%; test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 660843.00 660795.00 -0.0%; test-suite :: MultiSource/Applications/ClamAV/clamscan.test 568824.00 568760.00 -0.0%. espresso: 2 more stores vectorized. x264: a small number of changes in 3-4 functions, generating slightly more vector stores (two 4x zeroinitializer stores plus some other small variations). clamscan: emitted a 32xi8 store instead of several scalar stores plus several 4x-8x stores. Differential Revision: https://reviews.llvm.org/D155246	2023-08-07 09:17:56 -07:00
Alexey Bataev	48bcaeb997	Revert "[SLP]Improve stores vectorization." This reverts commit 58066edbb05d66e6a7512a675da778475da3bdfb because of the issue reported in https://lab.llvm.org/buildbot/#/builders/252/builds/3389	2023-08-04 07:37:26 -07:00
Alexey Bataev	58066edbb0	[SLP]Improve stores vectorization. Use an O(n log n) instead of an O(N^2) (N <= 32) sorting approach, and do not try to revectorize all possible combinations of stores if they definitely cannot be combined because of memory/data dependencies. Compile time (O3 + LTO, skylake_avx512): External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15 120.11 2.5%; External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67 207.42 1.8%; External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43 235.01 1.1%; External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49 207.25 0.9%; External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46 306.23 -1.4%. Link time (O3 + LTO, skylake_avx512): External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69 1475.94 6.7%. Other changes are too small to rely on. size..text (columns: Program, results, results0, diff): test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test 392.00 1439.00 267.1%; test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 394258.00 394818.00 0.1%; test-suite :: MultiSource/Applications/JM/lencod/lencod.test 846355.00 847075.00 0.1%; test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 782816.00 783360.00 0.1%; test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 779667.00 779923.00 0.0%; test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 224398.00 224446.00 0.0%; test-suite :: MultiSource/Applications/oggenc/oggenc.test 185019.00 185035.00 0.0%; test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00 0.0%; test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1051772.00 1051804.00 0.0%; test-suite :: MultiSource/Applications/SPASS/SPASS.test 529586.00 529602.00 0.0%; test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1084684.00 1084716.00 0.0%; test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1014245.00 1014261.00 0.0%; test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 223494.00 223478.00 -0.0%; test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 660843.00 660795.00 -0.0%; test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 660843.00 660795.00 -0.0%; test-suite :: MultiSource/Applications/ClamAV/clamscan.test 568824.00 568760.00 -0.0%. espresso: 2 more stores vectorized. x264: a small number of changes in 3-4 functions, generating slightly more vector stores (two 4x zeroinitializer stores plus some other small variations). clamscan: emitted a 32xi8 store instead of several scalar stores plus several 4x-8x stores. Differential Revision: https://reviews.llvm.org/D155246	2023-08-04 06:47:16 -07:00
Alexey Bataev	2f6ca38d51	Revert "[SLP]Improve stores vectorization." This reverts commit 58b0d7c34ddd9d2117009a8cd7bd5e34a8276082 to fix crashes reported in https://lab.llvm.org/buildbot/#/builders/85/builds/18117.	2023-08-04 05:25:31 -07:00
Alexey Bataev	58b0d7c34d	[SLP]Improve stores vectorization. Use an O(n log n) instead of an O(N^2) (N <= 32) sorting approach, and do not try to revectorize all possible combinations of stores if they definitely cannot be combined because of memory/data dependencies. Compile time (O3 + LTO, skylake_avx512): External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15 120.11 2.5%; External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67 207.42 1.8%; External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43 235.01 1.1%; External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49 207.25 0.9%; External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46 306.23 -1.4%. Link time (O3 + LTO, skylake_avx512): External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69 1475.94 6.7%. Other changes are too small to rely on. size..text (columns: Program, results, results0, diff): test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test 392.00 1439.00 267.1%; test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 394258.00 394818.00 0.1%; test-suite :: MultiSource/Applications/JM/lencod/lencod.test 846355.00 847075.00 0.1%; test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 782816.00 783360.00 0.1%; test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 779667.00 779923.00 0.0%; test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 224398.00 224446.00 0.0%; test-suite :: MultiSource/Applications/oggenc/oggenc.test 185019.00 185035.00 0.0%; test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00 0.0%; test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1051772.00 1051804.00 0.0%; test-suite :: MultiSource/Applications/SPASS/SPASS.test 529586.00 529602.00 0.0%; test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1084684.00 1084716.00 0.0%; test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1014245.00 1014261.00 0.0%; test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 223494.00 223478.00 -0.0%; test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 660843.00 660795.00 -0.0%; test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 660843.00 660795.00 -0.0%; test-suite :: MultiSource/Applications/ClamAV/clamscan.test 568824.00 568760.00 -0.0%. espresso: 2 more stores vectorized. x264: a small number of changes in 3-4 functions, generating slightly more vector stores (two 4x zeroinitializer stores plus some other small variations). clamscan: emitted a 32xi8 store instead of several scalar stores plus several 4x-8x stores. Differential Revision: https://reviews.llvm.org/D155246	2023-08-03 16:14:21 -07:00
Alexey Bataev	f5edeb2617	[SLP][NFC]Add SVML vectorization tests, NFC.	2023-08-01 06:10:15 -07:00
Nikita Popov	a7b95487c7	[SLP] Avoid branch on undef in test (NFC)	2023-08-01 11:24:30 +02:00
Alexey Bataev	0a68cd2304	[SLP]Fix PR64252: Requesting cost of invalid extending instruction. If the actual bitwidth of the instruction does not match its original size, the casting opcode needs to be re-estimated; the compiler cannot rely on the one provided in the instruction.	2023-07-31 13:37:52 -07:00
David Green	2a859b2014	[AArch64] Change the cost of vector insert/extract to 2. The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but it has always been higher than the correct costs would be for any specific core. This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 CPUs. The ScalarizationOverhead is also overridden for integer vectors at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing). The lower insert/extract cost will reduce the cost of inserts, extracts, shuffling and scalarization. The adjustments of ScalarizationOverhead will increase the cost for integers, especially for small vectors. The end result will be lower costs for float and long-integer types, and somewhat higher costs for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizers. We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done, this is generally an improvement in overall performance, and I've attempted to address the places where it wasn't with other cost model adjustments. Differential Revision: https://reviews.llvm.org/D155459	2023-07-28 21:26:50 +01:00
Alexey Bataev	48bc5b0a29	[SLP][PR64099]Fix unsound undef to poison transformation when handling insertelement instructions. If the original vector has undef (not poison) values that are not overwritten by later insertelement instructions, the shuffle needs to be built with the undef vector (not a poison vector) and the actual indices (not PoisonMaskElem); otherwise the transformation may produce more poison in the output than there was in the input.	2023-07-27 16:09:49 -07:00
Simon Pilgrim	bbfdb8cc2d	[CostModel][X86] Add scalar rotate-by-immediate costs. As noted in #63980, rotating by an immediate amount is much cheaper than rotating by a variable amount. This still needs to be expanded to vector rotate cases, and we need to add reasonable funnel-shift costs as well (very tricky, as there's a huge range in CPU behaviour for these).	2023-07-27 16:54:30 +01:00
Simon Pilgrim	33b00b4949	[SLP][X86] Add basic funnel-shift / rotation test coverage. Includes test coverage for issue #63980.	2023-07-27 15:52:12 +01:00
Alexey Bataev	44eca64224	[SLP]Check scalars before trying scheduling. Need to check whether the scalars can be vectorized before trying to schedule them. This may save compile time and improve vectorization in large functions/basic blocks. Differential Revision: https://reviews.llvm.org/D154891	2023-07-24 09:25:19 -07:00
Alexey Bataev	f2e8b38fa5	[SLP][NFC]Add a test with strided loads, NFC.	2023-07-21 13:15:43 -07:00
Alexey Bataev	aae2eaae2c	[SLP]Fix a crash when trying to cast scalable vector type to fixed. Need to check for FixedVectorType, not just a vector type, since later the compiler performs an unconditional cast to FixedVectorType and gets the number of elements in this type.	2023-07-19 11:53:49 -07:00
Simon Pilgrim	ab6ec66642	[SLP][X86] Regenerate some test checks to reduce diff in D154891	2023-07-19 17:02:11 +01:00
Alexey Bataev	83ba148a8a	[SLP]Include cost of the reshuffling for same nodes with resizing. Need to account for the reshuffling required for reused elements in buildvector nodes that are copies (perfect matches) of other nodes but include reused elements. Differential Revision: https://reviews.llvm.org/D149966	2023-07-18 06:05:15 -07:00
Alexey Bataev	8ab962e411	[SLP]Relax assertion to check if the input scalars were extended to match the size of base node (PR63668). The assertion check needs to be adjusted to take into account the case where the original scalars are reused and were extended to match the vector factor of the reused SLP node.	2023-07-14 07:19:49 -07:00
Alexey Bataev	bc8abb42bb	Revert "[SLP]Relax assertion to check if the input scalars were extended to" This reverts commit 6fdfc81287ecdc2a7f409d08538ec6ce2bd698da to fix the check in the assertion (need to use end, not begin function).	2023-07-14 07:04:06 -07:00
Alexey Bataev	6fdfc81287	[SLP]Relax assertion to check if the input scalars were extended to match the size of base node (PR63668). The assertion check needs to be adjusted to take into account the case where the original scalars are reused and were extended to match the vector factor of the reused SLP node.	2023-07-14 06:48:25 -07:00
Alexey Bataev	ec6b40ab9b	[SLP]Add a test with stores at long distances from each other, NFC.	2023-07-13 15:14:09 -07:00
Anna Thomas	1159266734	[SLP] Add support for fmaximum/fminimum reduction. This patch adds support for vectorized reduction of the maximum/minimum intrinsics, which fall under the corresponding reduction kinds. Differential Revision: https://reviews.llvm.org/D154463	2023-07-12 15:22:38 -04:00
Anna Thomas	a43aebcd91	[SLP] Test for minimum/maximum reduction. Minimum/maximum tests from D154463. This contains tests where we vectorize minimum/maximum, as well as tests where we currently do not identify reduction patterns. Differential Revision: https://reviews.llvm.org/D155096	2023-07-12 15:22:37 -04:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC). Opaque pointers mode is enabled by default, so there is no need to enable it explicitly.	2023-07-12 14:35:55 +02:00
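The InstCombine commits 50a9b3d8a5 and 5c0da5839d rely on IR float being IEEE-754. Under that assumption, flipping the sign bit of a float's integer bit pattern is exactly fneg, and clearing that bit is exactly fabs. The C++ below is a source-level sketch of that equivalence only, assuming an IEEE-754 binary32 float. It is not the InstCombine matcher. That matcher presumably works on IR of the shape bitcast to integer, xor/and with a sign mask, then bitcast back. The helper names (floatToBits, fnegViaInt, fabsViaInt) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// The identities below only hold for IEEE-754 binary32 floats.
static_assert(std::numeric_limits<float>::is_iec559,
              "float must be IEEE-754 binary32");
static_assert(sizeof(float) == sizeof(std::uint32_t), "size mismatch");

// Portable "bitcast" between float and a 32-bit integer
// (std::bit_cast does the same in C++20).
std::uint32_t floatToBits(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof Bits);
  return Bits;
}

float bitsToFloat(std::uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof F);
  return F;
}

constexpr std::uint32_t SignMask = 0x80000000u;

// Integer form of fneg: xor flips only the sign bit.
float fnegViaInt(float F) { return bitsToFloat(floatToBits(F) ^ SignMask); }

// Integer form of fabs: and with ~SignMask clears only the sign bit.
float fabsViaInt(float F) { return bitsToFloat(floatToBits(F) & ~SignMask); }
```

The signed-zero case shows why fneg must not be conflated with fsub. 0.0f - 0.0f yields +0.0, while flipping the sign bit of +0.0 yields -0.0.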

1462 Commits
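Commits 63c7815faf and 4f0bd8f7ac fix SLP comparators so that they provide a strict weak ordering. std::sort and related algorithms require this, and violating it is undefined behavior. The actual PHI-node and Cmp comparators are not reproduced here. As a hedged sketch, the C++ below uses a hypothetical Node key (opcode, operand count) and a brute-force checker. It shows one common way such a comparator breaks, using <= on a tie-break key, and the strict lexicographic fix.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sort key standing in for an instruction.
struct Node {
  unsigned Opcode;
  unsigned NumOperands;
};

// Broken: <= on the tie-break key makes equal nodes compare "less" than
// each other, violating irreflexivity (Less(A, A) must be false).
bool brokenLess(const Node &A, const Node &B) {
  if (A.Opcode != B.Opcode)
    return A.Opcode < B.Opcode;
  return A.NumOperands <= B.NumOperands;
}

// Fixed: strict < at every level of the lexicographic comparison.
bool fixedLess(const Node &A, const Node &B) {
  if (A.Opcode != B.Opcode)
    return A.Opcode < B.Opcode;
  return A.NumOperands < B.NumOperands;
}

// Brute-force check of the strict weak ordering axioms over a sample.
// O(n^3), intended only for small test inputs.
template <typename Cmp>
bool isStrictWeakOrdering(const std::vector<Node> &V, Cmp Less) {
  auto Incomparable = [&](const Node &X, const Node &Y) {
    return !Less(X, Y) && !Less(Y, X);
  };
  for (const Node &A : V) {
    if (Less(A, A))
      return false; // irreflexivity
    for (const Node &B : V) {
      if (Less(A, B) && Less(B, A))
        return false; // asymmetry
      for (const Node &C : V) {
        if (Less(A, B) && Less(B, C) && !Less(A, C))
          return false; // transitivity
        if (Incomparable(A, B) && Incomparable(B, C) && !Incomparable(A, C))
          return false; // transitivity of incomparability
      }
    }
  }
  return true;
}
```

A checker like this is only practical for unit tests. The point is that the fixed comparator compares every key with strict <, so equal elements are never "less" than each other.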