llvm-project

Author	SHA1	Message	Date
Alexey Bataev	cf792f664a	[SLP]Fix a crash for the replaced vectorized value. If two nodes share the same value, which is replaced in one of the nodes, need to automatically replace same value in all nodes. Better to use WeakTrackingVH for this to fix compiler crash.	2023-04-27 09:32:00 -07:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Alexey Bataev	b1abc2beaf	[SLP]Fix PR58616: assert for gep nodes with different basic blocks. Need to relax the assertion check in the FindFirstInst lambda for GEP nodes with non-GEP instruction to avoid compiler crash.	2023-04-24 07:41:06 -07:00
Jay Foad	593e25ffae	[Vectorize] Fix vectorization, scalarization and folding of llvm.is.fpclass llvm.is.fpclass is different from other vectorizable intrinsics in that it is overloaded on an argument type, not on the return type. Differential Revision: https://reviews.llvm.org/D148905	2023-04-24 13:42:08 +01:00
Jay Foad	3237497d01	[Vectorize] Pre-commit tests for D148905 Differential Revision: https://reviews.llvm.org/D149050	2023-04-24 13:42:08 +01:00
Simon Pilgrim	aca5f9aeea	[CostModel][X86] getMemoryOpCost - increase cost of sub-32-bit vector load/stores For 8-bit/16-bit vector loads/stores we scalarize and transfer to/from the vector unit, or use the (usually slow) PINSR/PEXTR instructions. Fixes #59867	2023-04-23 21:48:25 +01:00
Simon Pilgrim	97927c380f	[SLP][X86] Add test coverage for Issue #59867	2023-04-23 21:20:44 +01:00
Alexey Bataev	851a12138a	[SLP]Fix the cost for the extractelements, used in several nodes. Currently the compiler calculates the compensation cost for the extractelements, removed during vectorization. But if the extractelement instruction is used in several nodes, we can calculate the compensation for them several times. Differential Revision: https://reviews.llvm.org/D148806	2023-04-21 09:05:03 -07:00
Alexey Bataev	403bd583a8	[SLP]Fix a crash on scalarized vectors. Need to register in-vector for scalarized types to avoid crash in further analysis.	2023-04-21 08:13:48 -07:00
Alexey Bataev	ecc204b64e	[SLP][NFC]Add a test with an extra cost of the reused extractelement instruction, NFC.	2023-04-20 13:27:48 -07:00
Simon Pilgrim	4060042384	[CostModel][X86] Improve i8 and vXi8 MUL costs We were treating vXi8 multiply as the sum of a trunc(mul(extend(),extend())) which diverged from the costs from llvm-mca once we extended beyond legal types. Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates. Helps address some of the regressions identified in D148806	2023-04-20 19:38:51 +01:00
Alexey Bataev	0e1312fbe0	[SLP][X86]Fix the cost of reused gathers/buildvectors and floats insert. There are 2 problems in the cost estimation for buildvector/gather. 1. If the buildvector/gather node is the same as another one node, need to estimate the cost of this node as 0. 2. The cost of inserting float point register to non-poison vector is not 0, it should not be considered free. Differential Revision: https://reviews.llvm.org/D148801	2023-04-20 09:34:46 -07:00
Vasileios Porpodas	a72bcc1252	[SLP][NFC] Test showing a cost estimation issue caused by f82eb7e066f322a231627383fc80522d98ce6181 The buildvector cost for the case shown in the test should be 0 but it is -1, causing the code to get vectorized when it shouldn't. Differential Revision: https://reviews.llvm.org/D148732	2023-04-19 14:32:16 -07:00
Alexey Bataev	8cf0290c4a	[SLP]Fix cost estimation for buildvectors with extracts and/or constants. If the partial matching is found and some other scalars must be inserted, need to account the cost of the extractelements, transformed to shuffles, and/or reused entries and calculate the cost of inserting constants properly into the non-poison vectors. Also, fixed the cost calculation for final gather/buildvector sequence. Differential Revision: https://reviews.llvm.org/D148362	2023-04-19 05:54:58 -07:00
Alexey Bataev	1ce4b26a21	[SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions. Implemented the reshuffling in finalize member function + add basic support for add member functions, used during vector build. Part of D110978 Differential Revision: https://reviews.llvm.org/D148279	2023-04-18 11:52:04 -07:00
Alexey Bataev	d7a40a447f	Revert "[SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions." This reverts commit cd341f3f4878137d1c9e7a05c4c3a7bd8ff216dc to fix a crash revealed by buildbot https://lab.llvm.org/buildbot#builders/124/builds/7108.	2023-04-18 10:41:00 -07:00
Alexey Bataev	cd341f3f48	[SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions. Implemented the reshuffling in finalize member function + add basic support for add member functions, used during vector build. Part of D110978 Differential Revision: https://reviews.llvm.org/D148279	2023-04-18 05:51:23 -07:00
Alexey Bataev	f82eb7e066	[SLP]Introduce gather cost estimation function. Introduced BoUpSLP::ShuffleCostEstimator::gather function as an initial implementation of the gather/buildvector cost estimation for buildvector nodes. It will allow to use general codegen infrastructure for better cost estimation + it improves the cost estimation for the gathers/buildvectors. Improved part of D110978. Differential Revision: https://reviews.llvm.org/D148174	2023-04-13 10:16:00 -07:00
Simon Pilgrim	b3480d5ede	[SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel By default these will expand back to cmp/sel, but some targets (X86) have optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support). Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel	2023-04-13 17:00:39 +01:00
Alexey Bataev	b28f407df9	[SLP]Improve reduction cost model for scalars. Instead of abstract cost of the scalar reduction ops, try to use the cost of actual reduction operation instructions, where possible. Also, remove the estimation of the vectorized GEPs pointers for reduced loads, since it is already handled in the tree. Differential Revision: https://reviews.llvm.org/D148036	2023-04-12 11:32:51 -07:00
Simon Pilgrim	162284b2e1	[SLP][X86] Add SSE4 test coverage to minmax reduction tests Improve coverage for D148036	2023-04-12 17:41:31 +01:00
Simon Pilgrim	63c3895327	[TTI][X86] getMinMaxCost - use existing integer min/max intrinsic cost values instead of maintaining a duplicate cost table getMinMaxCost has an alternative set of min/max costs to getIntrinsicInstrCost that are only used by getMinMaxReductionCost, but are a lot less thorough and fallback to an expansion in most cases resulting in cost overestimations - we're better off just using getIntrinsicInstrCost. getIntrinsicInstrCost is still missing complete FMINNUM/FMAXNUM costs, so until then getMinMaxCost will still be used for these, after that we can remove getMinMaxCost and have getMinMaxReductionCost call getIntrinsicInstrCost directly. Fixes regression noticed in D148036	2023-04-12 15:33:12 +01:00
Sjoerd Meijer	d827865e9f	Recommit "[AArch64][TTI] Cost model FADD/FSUB/FNEG" Fixed two test cases that relied on Asserts, and added a fallthrough annotation to the switch case.	2023-04-11 12:48:15 +01:00
Sjoerd Meijer	4876f43ea9	Revert "[AArch64][TTI] Cost model FADD/FSUB/FNEG" This reverts commit d0027e0be990df60f6f826123f035286a168f288. Need to look at 2 test failures.	2023-04-11 10:14:40 +01:00
Sjoerd Meijer	d0027e0be9	[AArch64][TTI] Cost model FADD/FSUB/FNEG This lowers the cost for FADD, FSUB, and FNEG. The motivation is to avoid over-eager SLP vectorisation, that makes it look like SLP vectorisation is profitable but results in significant slow downs. Lowering the cost for scalar FADD/FSUB costs helps the profitability decision to favour the scalar version where vectorisation isn't beneficial. Lowering the cost for these floating point operations makes sense because a lot of other instructions including many shuffles have only a cost of 1; these FADD/FSUB/FNEG instructions should not be twice the cost. Performance results show a 7% improvement for Imagick from SPEC FP 2017, a small improvement in Blender, and unchanged results for the other apps in SPEC. RAJAPerf is neutral and mostly shows no changes. Differential Revision: https://reviews.llvm.org/D146033	2023-04-11 09:46:14 +01:00
Alexey Bataev	50af6ab0ab	[SLP]Fix emission of the masks in shuffles for undefs. If the value is used in the expression, need to adjust the mask before applying the mask. Plus, need to fix the analysis of the phi nodes for reused scalars.	2023-04-06 10:16:58 -07:00
Alexey Bataev	cf62adbbd8	[SLP]Fix delete of the extractelement with users. Made the condition for the erasing of the gathered extractelements stricter, remove it only if it has single vectorized use, otherwise leave it for instcombiner/instsimplify analysis.	2023-04-06 09:15:30 -07:00
Alexey Bataev	40105a9933	[SLP]Find reused scalars in buildvector sequences, if any. Patch generalizes analysis of scalars. The main part is outlined into lambda, which can be used to find reused inserted scalars and emit shuffle for them instead of multiple insertelement instructions, if the permutation is found already. I.e. some scalars are transformed by the permutation of previously vectorized nodes, and some are inserted directly. Reworked part of D110978 Differential Revision: https://reviews.llvm.org/D146564	2023-04-05 09:37:05 -07:00
Alexey Bataev	c1660006b2	[SLP]Reorder counters for same values, if the root node is reordered. The counters for the repeated scalars are ordered in the natural order, but the original scalars might be reordered during SLP graph reordering and this order can be dropped. Need to use the scalars after the reordering, not the original ones, to emit correct code for same value counters.	2023-04-03 07:52:49 -07:00
Alexey Bataev	367db8bf6a	[SLP][NFC]Add a test for reordered scalars with not reordered reuse coefficient.	2023-04-03 07:15:58 -07:00
Alexey Bataev	c1bcf5dd0a	[SLP]Fix PR61835: Assertion `I->use_empty() && "trying to erase instruction with users."' failed. If the externally used scalar is part of the tree and is replaced by extractelement instruction, need to add generated extractelement instruction to the list of the ExternallyUsedValues to avoid deletion during vectorization.	2023-03-31 14:21:19 -07:00
Guozhi Wei	a72162cc52	[AARCH64] Enable STORE of v4i8 to help more vectorization opportunities For the attached test case, currently llvm generates instructions to load/or/store the bytes one by one. Although NEON doesn't support v4i8 natively, we can promote it to v4i16 and operate on v4i16 vectors. So this patch overrides getStoreMinimumVF and specifies that the minimum VF for i8 vectors is v4i8. Differential Revision: https://reviews.llvm.org/D145614	2023-03-31 17:03:06 +00:00
Alexey Bataev	9255124a07	[SLP]Fix a crash when trying to shuffle multiple nodes. Need to transform mask after applying shuffle using the mask itself as a base to correctly mark with identity those indices, actually used in previous shuffle. Allows to fix a crash, if different sized vectors are shuffled.	2023-03-30 09:32:11 -07:00
Zain Jaffal	4d7d454334	[SLP][AArch64] Add test to check for the vectorization of fshl Currently the cost for fshl is an overestimate causing SLP to vectorize when it is not necessary. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D147056	2023-03-28 17:46:33 +01:00
Florian Hahn	417fe52e6f	Revert "[SLP] Check with target before vectorizing GEP Indices." This reverts commit 1387a13e1d0bac94457626ef3e7427c84caf6e65. This introduced performance regressions on AArch64, when the cost of a vector GEP + extracts is offset by the benefits of vectorizing the rest of the tree. The test in llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll illustrates the issue. It was extracted from code that regressed a SPEC benchmark by 15%.	2023-03-28 08:06:53 +01:00
Zain Jaffal	984b46e6cc	[SLP] Add test to check for GEP vectorization add a test to check for gep vectorization after the change from D144128 where the gep vectorization is dependent on the target hook `prefersVectorizedAddressing()` Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D146540	2023-03-27 17:52:55 +01:00
Luke Lau	f23ea4cbd4	[RISCV] Model select and insertsubvector shuffle kinds Selects get lowered to a vmerge with a mask, and insertsubvectors get lowered to a vslideup. Differential Revision: https://reviews.llvm.org/D146747	2023-03-24 17:30:32 +00:00
Luke Lau	1c9094a201	[RISCV] Add test case for two equivalent reductions They are functionally equivalent but currently one fails to vectorize because the cost of an insert subvector shuffle is too expensive. D146747 will update the cost of these types of shuffles, so add a test case for it.	2023-03-24 17:30:32 +00:00
Luke Lau	40b408cb05	[RISCV] Enable SLP in RISC-V SLP reduction tests Horizontal reduction can still kick in even when the max VF is set to 0, but strange stuff can happen as it affects the cost model. Enable it for these tests as eventually the goal will be to have SLP enabled.	2023-03-24 17:30:32 +00:00
Luke Lau	8d16c6809a	[RISCV] Increase default vectorizer LMUL to 2 After some discussion and experimentation, we have seen that changing the default number of vector register bits to LMUL=2 strikes a sweet spot. Whilst we could be clever here and make the vectorizer smarter about dynamically selecting an LMUL that a) doesn't affect register pressure and b) is suitable for the microarchitecture, we would need to teach its heuristics about RISC-V register grouping specifics. Instead this just does the easy, pragmatic thing by changing the default to a safe value that doesn't affect register pressure significantly[1], but should increase throughput and unlock more interleaving. [1] Register spilling when compiling sqlite at various levels of `-riscv-v-register-bit-width-lmul`: LMUL=1: 2573 spills; LMUL=2: 2583 spills; LMUL=4: 2819 spills; LMUL=8: 3256 spills. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D143723	2023-03-23 10:33:50 +00:00
Ben Shi	9855fe4568	[RISCV][NFC] Add more tests for SLP vectorization (binops on load/store) Reviewed By: reames Differential Revision: https://reviews.llvm.org/D146025	2023-03-23 09:01:04 +08:00
Luke Lau	e69f8bac42	[RISCV][NFC] Add test case for SLP reduction vectorization failure Horizontal reductions still occur on RISC-V, despite the maximum SLP VF reported back by TTI being 1, to disable SLP. This can cause the cost model to think it can vectorize a gather into smaller, widened loads, when it will actually fail to do so. This should ultimately be fixed whenever SLP is re-enabled for RISC-V at some point. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D146529	2023-03-21 15:57:52 +00:00
Alexey Bataev	59ff9d3777	[SLP]Fix PR61554: use of missing vectorized value in buildvector nodes. If the buildvector node matches the vector node, it reuses the vector value from this vector node, but its VectorizedValue field is not updated. Need to update this field to avoid misses during the analysis of the reused gather/buildvector nodes.	2023-03-20 12:05:26 -07:00
Alexey Bataev	427136dc35	[SLP][NFC]Add a test with missed buildvector node, matching the vectorized node.	2023-03-20 10:58:52 -07:00
Alexey Bataev	0ad87ffdcc	[SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars. Currently compiler does not support mixing of shuffled nodes + gather/buildvector of the remaining scalar values. It may reduce total number of instructions and improve performance of the gather/buildvector sequences. Part of D110978 Differential Revision: https://reviews.llvm.org/D146167	2023-03-17 11:18:36 -07:00
Valery N Dmitriev	4c2299003f	[TTI] Add X86 target specific version of getPointersChainCost. When all the pointers are off the same base address and have known distances to each other these differences can be encoded into displacements in x86 arch. So the only cost that matters is cost of the base GEP. Differential Revision: https://reviews.llvm.org/D146102	2023-03-16 10:26:50 -07:00
Ben Shi	ce455f4434	[RISCV][NFC] Add more floating point tests for SLP vectorization Reviewed By: reames Differential Revision: https://reviews.llvm.org/D146108	2023-03-16 13:30:16 +08:00
Ben Shi	72ce9d1ccd	[RISCV][NFC] Add tests for SLP vectorization of smin/smax/umin/umax Reviewed By: reames Differential Revision: https://reviews.llvm.org/D146015	2023-03-16 13:30:16 +08:00
Alexey Bataev	641939baa9	[SLP]Remove CreateShuffle lambda and reuse ShuffleBuilder functions. After merging main part of the gather/buildvector code, CreateShuffle lambda can be removed and ShuffleBuilder add functions can be used instead. Also, part of the code from CreateShuffle migrated to the BaseShuffleAnalysis::createShuffle function for better code emission. Differential Revision: https://reviews.llvm.org/D145988	2023-03-14 10:15:41 -07:00
Alexey Bataev	874c49f554	[SLP]Fix PR61395: need to adjust vector factor after emitting shuffle operation for combined entries. The vector factor after combining of the shuffle entries is defined by the size of the mask, not by the vector factors of the original entries. So, need to adjust it to emit correct code.	2023-03-14 06:27:08 -07:00


1375 Commits