llvm-project

Author	SHA1	Message	Date
Alexey Bataev	f82eb7e066	[SLP]Introduce gather cost estimation function. Introduced BoUpSLP::ShuffleCostEstimator::gather function as an initial implementation of the gather/buildvector cost estimation for buildvector nodes. It will allow to use general codegen infrastructure for better cost estimation + it improves the cost estimation for the gathers/buildvectors. Improved part of D110978. Differential Revision: https://reviews.llvm.org/D148174	2023-04-13 10:16:00 -07:00
Simon Pilgrim	b3480d5ede	[SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel By default these will expand back to cmp/sel, but some targets (X86) have optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support). Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel	2023-04-13 17:00:39 +01:00
Simon Pilgrim	9e30b87afb	[TTI] getMinMaxReductionCost - add FastMathFlag argument Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>). This will be necessary to correctly recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6). Differential Revision: https://reviews.llvm.org/D148149	2023-04-13 10:42:42 +01:00
Alexey Bataev	b28f407df9	[SLP]Improve reduction cost model for scalars. Instead of abstract cost of the scalar reduction ops, try to use the cost of actual reduction operation instructions, where possible. Also, remove the estimation of the vectorized GEPs pointers for reduced loads, since it is already handled in the tree. Differential Revision: https://reviews.llvm.org/D148036	2023-04-12 11:32:51 -07:00
Alexey Bataev	d00158cd28	[SLP][NFC]Introduce ShuffleCostEstimator and adjustExtracts member function. Added ShuffleCostEstimator class and the first adjustExtracts member, which is just a copy of previous AdjustExtractCost lambda. Differential Revision: https://reviews.llvm.org/D147787	2023-04-11 12:47:07 -07:00
Alexey Bataev	85327f307b	[SLP][NFC]Make clusterSortPtrAccesses static.	2023-04-07 13:24:24 -07:00
Alexey Bataev	6ff177d928	[SLP][NFC]Improve SLP time by precomputing value<->gather nodes dependencies. Improved compile time by precomputing the mapping between gathered scalars and their gather/buildvector nodes for later use in isGatherShuffledEntry to avoid recomputing this map each time this function is called.	2023-04-07 12:12:02 -07:00
Alexey Bataev	52dd72a37a	[SLP][NFC]Make adjustExtracts/needToDelay members of ShuffleInstructionBuilder. Make adjustExtracts/needToDelay lambdas members of ShuffleInstructionBuilder to allow to overload them later for cost model. Differential Revision: https://reviews.llvm.org/D147730	2023-04-06 16:27:19 -07:00
Alexey Bataev	e58a49300e	[SLP][NFC]Evaluate FMF for reductions before the loop, no need to reevaluate it.	2023-04-06 11:57:20 -07:00
Alexey Bataev	50af6ab0ab	[SLP]Fix emission of the masks in shuffles for undefs. If the value is used in the expression, need to adjust the mask before applying the mask. Plus, need to fix the analysis of the phi nodes for reused scalars.	2023-04-06 10:16:58 -07:00
Alexey Bataev	cf62adbbd8	[SLP]Fix delete of the extractelement with users. Made the condition for the erasing of the gathered extractelements stricter, remove it only if it has single vectorized use, otherwise leave it for instcombiner/instsimplify analysis.	2023-04-06 09:15:30 -07:00
Alexey Bataev	40105a9933	[SLP]Find reused scalars in buildvector sequences, if any. Patch generalizes analysis of scalars. The main part is outlined into lambda, which can be used to find reused inserted scalars and emit shuffle for them instead of multiple insertelement instructions, if the permutation is found already. I.e. some scalars are transformed by the permutation of previously vectorized nodes, and some are inserted directly. Reworked part of D110978 Differential Revision: https://reviews.llvm.org/D146564	2023-04-05 09:37:05 -07:00
Alexey Bataev	c1660006b2	[SLP]Reorder counters for same values, if the root node is reordered. The counters for the repeated scalars are ordered in the natural order, but the original scalars might be reordered during SLP graph reordering and this order can be dropped. Need to use the scalars after the reordering, not the original ones, to emit correct code for same value counters.	2023-04-03 07:52:49 -07:00
Alexey Bataev	c1bcf5dd0a	[SLP]Fix PR61835: Assertion `I->use_empty() && "trying to erase instruction with users."' failed. If the externally used scalar is part of the tree and is replaced by extractelement instruction, need to add generated extractelement instruction to the list of the ExternallyUsedValues to avoid deletion during vectorization.	2023-03-31 14:21:19 -07:00
Alexey Bataev	9255124a07	[SLP]Fix a crash when trying to shuffle multiple nodes. Need to transform mask after applying shuffle using the mask itself as a base to correctly mark with identity those indices, actually used in previous shuffle. Allows to fix a crash, if different sized vectors are shuffled.	2023-03-30 09:32:11 -07:00
Florian Hahn	417fe52e6f	Revert "[SLP] Check with target before vectorizing GEP Indices." This reverts commit 1387a13e1d0bac94457626ef3e7427c84caf6e65. This introduced performance regressions on AArch64, when the cost of a vector GEP + extracts is offset by the benefits of vectorizing the rest of the tree. The test in llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll illustrates the issue. It was extracted from code that regressed a SPEC benchmark by 15%.	2023-03-28 08:06:53 +01:00
Alexey Bataev	59ff9d3777	[SLP]Fix PR61554: use of missing vectorized value in buildvector nodes. If the buildvector node matches the vector node, it reuses the vector value from this vector node, but its VectorizedValue field is not updated. Need to update this field to avoid misses during the analysis of the reused gather/buildvector nodes.	2023-03-20 12:05:26 -07:00
Alexey Bataev	0ad87ffdcc	[SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars. Currently compiler does not support mixing of shuffled nodes + gather/buildvector of the remaining scalar values. It may reduce total number of instructions and improve performance of the gather/buildvector sequences. Part of D110978 Differential Revision: https://reviews.llvm.org/D146167	2023-03-17 11:18:36 -07:00
Valery N Dmitriev	f9b438b519	[SLP] Outline GEP chain cost modeling into new TTI interface - NFCI. Cost modeling for GEPs should actually be target dependent but is currently done inside SLP target-independent way. Sinking it into TTI enables target dependent implementation. This patch adds new TTI interface and implementation of the basic functionality trying to retain existing cost modeling. Differential Revision: https://reviews.llvm.org/D144770	2023-03-14 14:01:34 -07:00
Alexey Bataev	641939baa9	[SLP]Remove CreateShuffle lambda and reuse ShuffleBuilder functions. After merging main part of the gather/buildvector code, CreateShuffle lambda can be removed and ShuffleBuilder add functions can be used instead. Also, part of the code from CreateShuffle migrated to createShuffle of the BaseShuffleAnalysis::createShuffle function for better code emission. Differential Revision: https://reviews.llvm.org/D145988	2023-03-14 10:15:41 -07:00
Alexey Bataev	874c49f554	[SLP]Fix PR61395: need to adjust vector factor after emitting shuffle operation for combined entries. The vector factor after combining of the shuffle entries is defined by the size of the mask, not by the vector factors of the original entries. So, need to adjust it to emit correct code.	2023-03-14 06:27:08 -07:00
Jakub Kuderski	b9db89fbcf	[ADT][NFCI] Do not use non-const lvalue-refs with enumerate in llvm/ Replace references to `enumerate` results with either const lvalue references or structured bindings. I did not use structured bindings everywhere as it wasn't clear to me it would improve readability. This is in preparation to the switch to `zip` semantics which won't support non-const lvalue reference to elements: https://reviews.llvm.org/D144503. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D145987	2023-03-13 20:59:06 -04:00
Alexey Bataev	f3a68ac10c	[SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector function. Required for future changes with combining shuffled nodes and buildvector sequences to improve cost/emission of the gather nodes. Part of D110978 Differential Revision: https://reviews.llvm.org/D145732	2023-03-13 06:11:05 -07:00
Arthur Eubanks	7c3c981442	[Passes] Remove some legacy passes DFAJumpThreading JumpThreading LibCallsShrink LoopVectorize SLPVectorizer DeadStoreElimination AggressiveDCE CorrelatedValuePropagation IndVarSimplify These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.	2023-03-10 17:17:00 -08:00
Alexey Bataev	93a9be0cea	[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes. Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes. But the compiler may do the same for other gather/buildvector nodes too, just need to check the dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet. Part of D110978 Differential Revision: https://reviews.llvm.org/D144958	2023-03-10 13:19:43 -08:00
Hans Wennborg	3b3a4c270b	Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes." This caused verifier errors: Instruction does not dominate all uses! %8 = insertelement <2 x i64> %7, i64 %pgocount1330, i64 1 %15 = shufflevector <2 x i64> %8, <2 x i64> poison, <2 x i32> <i32 1, i32 1> in function ?NearestInclusiveAncestorAssignedToSlot@SlotScopedTraversal@blink@@SAPAVElement@2@ABV32@@Z (or register allocator crash when the verifier was disabled). See comment on the code review. > Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes. > But the compiler may do the same for other gather/buildvector nodes too, just need to check the > dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet. > > Part of D110978 > > Differential Revision: https://reviews.llvm.org/D144958 This reverts commit a611b3f3059e4c3b9e7b914091c3edaef099fd5d. It also reverts 7a4061ae372b3262703ffeea3b64db89187db611 which depended on the above.	2023-03-10 14:40:12 +01:00
Alexey Bataev	a611b3f305	[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes. Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes. But the compiler may do the same for other gather/buildvector nodes too, just need to check the dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet. Part of D110978 Differential Revision: https://reviews.llvm.org/D144958	2023-03-07 12:45:40 -08:00
Alexey Bataev	c411965820	[SLP]Fix PR61224: Compiler hits infinite loop. IRBuilder in many cases is able to fold constant code automatically, but in some cases (for some intrinsics) it cannot do it. Need to perform manual calculation, if constant provided in these corner cases, to avoid infinite loop.	2023-03-06 13:46:41 -08:00
Valery N Dmitriev	ec7154fe70	[SLP] Add banner argument to SLP costs debug printer method - NFC. Removed unnecessary warning workaround. Differential Revision: https://reviews.llvm.org/D144992	2023-02-28 11:22:49 -08:00
Alexey Bataev	1d6b5b66bb	[SLP]Fix PR61050: Assertion `I->use_empty() && "trying to erase instruction with users." When gathering the counter for the reused scalars, need to use reduced value, not the original reduced value. Same values counter is gathered for reduced values, not original ones.	2023-02-28 07:51:34 -08:00
Vasileios Porpodas	a700fb3d9b	[SLP] Fixes crash in BoUpSLP::isGatherShuffledEntry() Crash caused by: 708eb1b96d9a36f9c0182b7d53c492059778fa35 Differential Revision: https://reviews.llvm.org/D144895	2023-02-27 12:29:25 -08:00
Alexey Bataev	007177bdde	[SLP]Fix PR61018: Assertion `Mask[I] == UndefMaskElem && "Multiple uses of scalars."' failed. Need to check for the reused indices when checking if 2 insertelement instructions are from the same buildvector. If the indices are reused, better not to match buildvectors and consider them as different, otherwise need to track the order of insertelement operations.	2023-02-27 10:09:48 -08:00
Alexey Bataev	5f53e85f8a	[SLP]Fix a crash when trying to find reduced ops for the reduced value. Need to use original reduced value, not the one the compiler gets after reduction, it may be replaced by the extractelement instruction already.	2023-02-27 07:32:36 -08:00
Alexey Bataev	f1c8b72c13	[SLP]Improve handling gathers/buildvectors with undefs. If have just one non-undef scalar in the buildvector/gather node, we try to put it to be the very first element, which is profitable in most cases. Do the preliminary estimation, if this more profitable during graph rotation and do same for all elements, including extractelements. Differential Revision: https://reviews.llvm.org/D144689	2023-02-24 13:17:40 -08:00
Alexey Bataev	6e30dffe71	[SLP][NFC]Format and improve function, returning std::optional<struct>, NFC.	2023-02-24 11:06:31 -08:00
Jonas Paulsson	1387a13e1d	[SLP] Check with target before vectorizing GEP Indices. The target hook prefersVectorizedAddressing() already exists to check with target if address computations should be vectorized, so it seems like this should be used in SLPVectorizer as well. Reviewed By: ABataev, RKSimon Differential Revision: https://reviews.llvm.org/D144128	2023-02-23 15:31:34 +01:00
Alexey Bataev	cbcdd747e8	[SLP]Do not swap not counted extractelements. No need to swap extractelements, which were not excluded from the list during cost analysis. It leads to incorrect cost calculation and makes vector code more profitable than it actually is.	2023-02-21 13:16:51 -08:00
Alexey Bataev	5f928a223e	[SLP]Properly define incoming block for user PHI nodes. MainOp of the PHI vectorizable entries contains the proper order of incoming blocks, not the last instruction in the block.	2023-02-21 08:01:24 -08:00
Alexey Bataev	708eb1b96d	[SLP]Add shuffling of extractelements to avoid extra costs/data movement. If the scalar must be extracted and then used in the gather node, instead we can emit shuffle instruction to avoid those extra extractelements and vector-to-scalar and back data movement. Part of D110978 Differential Revision: https://reviews.llvm.org/D141940	2023-02-20 06:14:42 -08:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Florian Hahn	f61c9b7569	[SLP] Fix infinite loop in isUndefVector. This fixes an infinite loop if isa<T>(II->getOperand(1)) is true. Update Base at the top of the loop, before the continue. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D144292	2023-02-19 21:42:24 +00:00
Alexey Bataev	e03d254bbd	[SLP]Do not reduce repeated values, use scalar red ops instead. Metric: size..text size..text results results0 diff SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test 445.00 461.00 3.6% SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 428477.00 428445.00 -0.0% External/SPEC/CFP2006/447.dealII/447.dealII.test 618849.00 618785.00 -0.0% For all tests some extra code was optimized, GCC-C-execute has some more inlining after Differential Revision: https://reviews.llvm.org/D132261	2023-02-17 07:19:35 -08:00
Kazu Hirata	7e6e636fb6	Use llvm::has_single_bit<uint32_t> (NFC) This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.	2023-02-15 22:17:27 -08:00
Simon Pilgrim	552e27c521	[SLP] Use allConstant helper. NFCI.	2023-02-05 19:21:48 +00:00
Kazu Hirata	526966d07d	Use llvm::bit_ceil (NFC) Note that: std::has_single_bit(X) ? X : llvm::NextPowerOf2(X); is equivalent to: std::bit_ceil(X) even for input 0.	2023-01-28 16:13:09 -08:00
Kazu Hirata	f20b5071f3	[llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC)	2023-01-28 09:06:31 -08:00
ShihPo Hung	5fb3a57ea7	[Cost] Add CostKind to getVectorInstrCost and its related users LoopUnroll estimates the loop size via getInstructionCost(), but getInstructionCost() cannot pass CostKind to getVectorInstrCost(). And so does getShuffleCost() to getBroadcastShuffleOverhead(), getPermuteShuffleOverhead(), getExtractSubvectorOverhead(), and getInsertSubvectorOverhead(). To address this, this patch adds an argument CostKind to these functions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D142116	2023-01-21 05:29:24 -08:00
Alexey Bataev	9bdcf8778a	[SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars. The compiler may produce better results if it does not look for constants, uses an extra analysis of phi nodes, looks through all tree nodes without skipping the cases, where the very first set of nodes is empty. Also, it tries to reshuffle the nodes if it is profitable for sure, i.e. at least 2 scalars are used for single node permutation and at least 3 scalars are used for the permutation of 2 nodes. Part of D110978 Differential Revision: https://reviews.llvm.org/D141512	2023-01-19 13:46:25 -08:00
Joe Loser	a288d7f937	[llvm][ADT] Replace uses of `makeMutableArrayRef` with deduction guides Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the same for `makeMutableArrayRef`. Once all of the places in-tree are using the deduction guides for `MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated. Differential Revision: https://reviews.llvm.org/D141814	2023-01-16 14:49:37 -07:00
Guillaume Chatelet	8fd5558b29	[NFC] Use TypeSize::getFixedValue() instead of TypeSize::getFixedSize() This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.	2023-01-11 16:49:38 +00:00

1351 Commits