llvm-project

Author	SHA1	Message	Date
Florian Hahn	2db71c9851	[VPlan] Simplify code in createReplicateRegion (NFC). Simplify the code as suggested in D143865.	2023-03-11 11:47:23 +01:00
Arthur Eubanks	7c3c981442	[Passes] Remove some legacy passes: DFAJumpThreading, JumpThreading, LibCallsShrink, LoopVectorize, SLPVectorizer, DeadStoreElimination, AggressiveDCE, CorrelatedValuePropagation, IndVarSimplify. These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.	2023-03-10 17:17:00 -08:00
Alexey Bataev	93a9be0cea	[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes. Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes. But the compiler may do the same for other gather/buildvector nodes too, just need to check the dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet. Part of D110978 Differential Revision: https://reviews.llvm.org/D144958	2023-03-10 13:19:43 -08:00
Florian Hahn	9be8d90e62	[VPlan] Add VPWidenSelectRecipe::getCond() (NFC). Add helper to access condition, as suggested in D144489.	2023-03-10 17:49:23 +01:00
Florian Hahn	54558fd8f3	[VPlan] Replace InvariantCond field from VPWidenSelectRecipe. There is no need to store information about invariance in the recipe. Replace the fields with checks of the operands using isDefinedOutsideVectorRegions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D144489	2023-03-10 15:28:43 +01:00
Hans Wennborg	3b3a4c270b	Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes." This caused verifier errors: Instruction does not dominate all uses! %8 = insertelement <2 x i64> %7, i64 %pgocount1330, i64 1 %15 = shufflevector <2 x i64> %8, <2 x i64> poison, <2 x i32> <i32 1, i32 1> in function ?NearestInclusiveAncestorAssignedToSlot@SlotScopedTraversal@blink@@SAPAVElement@2@ABV32@@Z (or register allocator crash when the verifier was disabled). See comment on the code review. > Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes. > But the compiler may do the same for other gather/buildvector nodes too, just need to check the > dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet. > > Part of D110978 > > Differential Revision: https://reviews.llvm.org/D144958 This reverts commit a611b3f3059e4c3b9e7b914091c3edaef099fd5d. It also reverts 7a4061ae372b3262703ffeea3b64db89187db611 which depended on the above.	2023-03-10 14:40:12 +01:00
Florian Hahn	a8adb38a96	[VPlan] Replace invariance fields from VPWidenGEPRecipe. There is no need to store information about invariance in the recipe. Replace the fields with checks of the operands using isDefinedOutsideVectorRegions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D144487	2023-03-09 17:52:22 +01:00
Florian Hahn	79272ec028	[VPlan] Add predicate to VPReplicateRecipe, expand region later. This patch adds the predicate as additional operand to VPReplicateRecipe during initial construction. The predicated recipes are later moved into replicate regions. This simplifies constructions and some VPlan transformations, like fixed-order recurrence handling. It also improves codegen in some cases (e.g. for in-loop reductions), because the recipes remain in the same block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D143865	2023-03-08 20:11:28 +01:00
Florian Hahn	3b2cf45d6b	[VPlan] Check if recipe is in ReplicateRegion for IfPredicateInstr (NFC) Check whether a replicate recipe is in a replicate region when deciding whether to collect predicated instructions. This allows using IsPredicated for recipes with a mask attached directly in D143865. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D145322	2023-03-08 11:39:44 +01:00
Alexey Bataev	a611b3f305	[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes. Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes. But the compiler may do the same for other gather/buildvector nodes too, just need to check the dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet. Part of D110978 Differential Revision: https://reviews.llvm.org/D144958	2023-03-07 12:45:40 -08:00
Nikita Popov	ffe8f47d72	[IR] Add operator<< overload for CmpInst::Predicate (NFC) I regularly try and fail to use this while debugging.	2023-03-07 15:10:56 +01:00
sgokhale	4f018e54c4	[LV][AArch64] Resolve test failure due to use of unordered container AArch64/reg-usage.ll has an issue with the output ordering due to the use of an unordered container. This was discovered with the -DLLVM_REVERSE_ITERATION:BOOL=ON cmake option. This patch addresses it by using an ordered container. Differential Revision: https://reviews.llvm.org/D145472/	2023-03-07 16:42:21 +05:30
Alexey Bataev	c411965820	[SLP]Fix PR61224: Compiler hits infinite loop. In many cases IRBuilder is able to fold constant code automatically, but for some intrinsics it cannot. In these corner cases, if constants are provided, the result must be calculated manually to avoid an infinite loop.	2023-03-06 13:46:41 -08:00
Florian Hahn	be968dbeee	[VPlan] VPWidenCallRecipe has side-effects if the call has side-effects. Handle VPWidenCallRecipe in VPRecipeBase::mayHaveSideEffects by delegating to the underlying call.	2023-03-05 12:08:56 +01:00
Graham Hunter	a180344589	[LV] Allow scalarization of function calls when masking is required This patch adds support for scalarizing calls to a function when there is a vector variant that cannot be used, either because there isn't a masked variant or because the cost model indicated a VF without a masked variant was better. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D134422	2023-03-03 15:26:04 +00:00
Nikita Popov	f7ca013332	[llvm-c] Remove bindings for creating legacy passes Legacy passes are only supported for codegen, and I don't believe it's possible to write backends using the C API, so we should drop all of those. Reduces the number of places that need to be modified when removing legacy passes. Differential Revision: https://reviews.llvm.org/D144970	2023-03-02 09:53:50 +01:00
Sander de Smalen	c41b41eb11	[LoopVectorize] Use overflow-check analysis to improve tail-folding. This work follows on from D142109 and addresses a possible regression when we know the loop iteration counter cannot overflow. When we know the overflow-check always evaluates to false, it's better to use the other style of tail folding where it assumes a runtime check was added, because that avoids having to calculate a modified trip-count. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D142894	2023-03-01 14:17:58 +00:00
Sander de Smalen	fe1b51ffee	[LoopVectorize] Remove runtime check and scalar tail loop when tail-folding. When using tail-folding and using the predicate for both data and control-flow (the next vector iteration's predicate is generated with the llvm.active.lane.mask intrinsic and then tested for the backedge), the LoopVectorizer still inserts a runtime check to see if the 'i + VF' may at any point overflow for the given trip-count. When it does, it falls back to a scalar epilogue loop. We can get rid of that runtime check in the pre-header and therefore also remove the scalar epilogue loop. This reduces code-size and avoids a runtime check. Consider the following loop: void foo(char * __restrict__ dst, char *src, unsigned long N) { for (unsigned long i=0; i<N; ++i) dst[i] = src[i] + 42; } If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter will overflow when calculating the predicate for the next vector iteration at some point, because LLVM does: vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) vector.body: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ] ... %index.next = add i64 %index, 16 ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed. %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N) %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 The solution: What we can do instead is calculate the predicate before incrementing the loop iteration counter, such that the llvm.active.lane.mask is calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e. 
vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) %N_minus_VF = select %N > 16 ? %N - 16 : 0 vector.body: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ] ... %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF) %index.next = add i64 %index, %4 ; The add above may still overflow, but this time the active.lane.mask is not affected %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 For N = 20, we'd then get: vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> %N_minus_VF = select 20 > 16 ? 20 - 16 : 0 ; %N_minus_VF = 4 vector.body: (1st iteration) ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop ... %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4) ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> %index.next = add i64 0, 16 ; %index.next = 16 %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 ; %8 = 1 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 ; branch to %vector.body vector.body: (2nd iteration) ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop ... 
%active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4) ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> %index.next = add i64 16, 16 ; %index.next = 32 %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 ; %8 = 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 ; branch to %for.cond.cleanup Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D142109	2023-03-01 09:01:19 +00:00
Valery N Dmitriev	ec7154fe70	[SLP] Add banner argument to SLP costs debug printer method - NFC. Removed unnecessary warning workaround. Differential Revision: https://reviews.llvm.org/D144992	2023-02-28 11:22:49 -08:00
Alexey Bataev	1d6b5b66bb	[SLP]Fix PR61050: Assertion `I->use_empty() && "trying to erase instruction with users." When gathering the counter for the reused scalars, the reduced value must be used, not the original reduced value, because the same-values counter is gathered for reduced values, not original ones.	2023-02-28 07:51:34 -08:00
Nikita Popov	4bc254c664	[LoopVectorize] Only fetch BFI if profile summary available BlockFrequencyInfo should generally only be fetched in PGO builds where a PSI profile summary is available. However, LoopVectorize was fetching it unconditionally. This results in a small compile-time improvement for non-PGO builds. Differential Revision: https://reviews.llvm.org/D144953	2023-02-28 14:16:21 +01:00
sgokhale	4f9a5447c6	[LV] Reland "Update logic for calculating register usage due to invariants" Previously, while calculating register usage due to invariants, it was assumed that an invariant would always be part of widening instructions. This resulted in calculating vector register types for vectors which can't be legalized (check the newly added test for more details). An invariant might not always need a vector register. For example, an invariant might just be used for the iteration check. This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493 Differential Revision: https://reviews.llvm.org/D143422	2023-02-28 17:32:39 +05:30
sgokhale	3c8ddbde37	Revert "[LV] Update logic for calculating register usage due to invariants" Observing test failure for llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll This reverts commit d1628266946fdddb44bdad2b3ccf3cd5fc769f42.	2023-02-28 15:46:59 +05:30
sgokhale	d162826694	[LV] Update logic for calculating register usage due to invariants Previously, while calculating register usage due to invariants, it was assumed that an invariant would always be part of widening instructions. This resulted in calculating vector register types for vectors which can't be legalized (check the newly added test for more details). An invariant might not always need a vector register. For example, an invariant might just be used for the iteration check. This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493 Differential Revision: https://reviews.llvm.org/D143422	2023-02-28 11:05:26 +05:30
Vasileios Porpodas	a700fb3d9b	[SLP] Fixes crash in BoUpSLP::isGatherShuffledEntry() Crash caused by: 708eb1b96d9a36f9c0182b7d53c492059778fa35 Differential Revision: https://reviews.llvm.org/D144895	2023-02-27 12:29:25 -08:00
Alexey Bataev	007177bdde	[SLP]Fix PR61018: Assertion `Mask[I] == UndefMaskElem && "Multiple uses of scalars."' failed. When checking whether 2 insertelement instructions are from the same buildvector, the reused indices must be checked too. If the indices are reused, it is better not to match the buildvectors and to treat them as different; otherwise the order of insertelement operations would have to be tracked.	2023-02-27 10:09:48 -08:00
Alexey Bataev	5f53e85f8a	[SLP]Fix a crash when trying to find reduced ops for the reduced value. The original reduced value must be used, not the one the compiler gets after reduction, because the latter may already have been replaced by an extractelement instruction.	2023-02-27 07:32:36 -08:00
Alexey Bataev	f1c8b72c13	[SLP]Improve handling of gathers/buildvectors with undefs. If a buildvector/gather node has just one non-undef scalar, we try to make it the very first element, which is profitable in most cases. During graph rotation, do a preliminary estimate of whether this is more profitable, and do the same for all elements, including extractelements. Differential Revision: https://reviews.llvm.org/D144689	2023-02-24 13:17:40 -08:00
Alexey Bataev	6e30dffe71	[SLP][NFC]Format and improve function, returning std::optional<struct>, NFC.	2023-02-24 11:06:31 -08:00
Sander de Smalen	9449deda12	[LV] NFC: Move logic to query maximum vscale to its own function. To query the maximum value for vscale, the LV queries the vscale_range attribute or a TTI hook. To avoid having to reimplement the same behaviour for multiple uses (such as in D142894), it makes sense to move this code to a separate function.	2023-02-23 15:12:35 +00:00
Jonas Paulsson	1387a13e1d	[SLP] Check with target before vectorizing GEP Indices. The target hook prefersVectorizedAddressing() already exists to check with target if address computations should be vectorized, so it seems like this should be used in SLPVectorizer as well. Reviewed By: ABataev, RKSimon Differential Revision: https://reviews.llvm.org/D144128	2023-02-23 15:31:34 +01:00
OCHyams	620a529760	[Assignment Tracking] Choose better passes for RemoveRedundantDbgInstrs call With assignment tracking enabled and without this patch, a significant amount of additional compiler run time comes from the RemoveRedundantDbgInstrs call in InstCombine. This patch reduces compiler run time by choosing better places to call RemoveRedundantDbgInstrs. In non-assignment-tracking builds, RemoveRedundantDbgInstrs is called by InstCombine if LowerDbgDeclare makes a change (i.e. it is _sometimes_ called). In assignment tracking builds LowerDbgDeclare doesn't do anything. We still need to clean up redundant intrinsics to avoid a large performance hit due to the number of instructions, so the current approach is to have InstCombine _always_ call RemoveRedundantDbgInstrs. Instrumenting the compiler to run RemoveRedundantDbgInstrs after every pass and dump the numbers, then building CTMark/tramp3d-v4, indicates that SROA and LoopVectorize give us a bigger bang (number removed) for buck (times pass is run). The compile time tracker reports that this patch reduces the number of instructions retired building CTMark projects by an average of 1.1%. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D144483	2023-02-22 16:28:06 +00:00
Luke Lau	b02b1e0ed6	[LV][NFC] Use ElementCount for getMaxInterleaveFactor In order to allow targets to disable interleaving for scalable vectors, pass the entire VF's ElementCount to getMaxInterleaveFactor. This is based off of the approach used here: `8d36708507` The plan would then be to disable interleaving on scalable VFs on RISC-V in a follow up patch. See https://reviews.llvm.org/D143723#4132349 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D144474	2023-02-22 10:15:05 +00:00
Liren Peng	529ee9750b	[NFC] Use single quotes for single char output during `printPipeline` Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144365	2023-02-22 02:35:13 +00:00
Alexey Bataev	cbcdd747e8	[SLP]Do not swap not counted extractelements. There is no need to swap extractelements that were not excluded from the list during cost analysis. Doing so leads to incorrect cost calculation and makes vector code look more profitable than it actually is.	2023-02-21 13:16:51 -08:00
Alexey Bataev	5f928a223e	[SLP]Properly define incoming block for user PHI nodes. MainOp of the PHI vectorizable entries contains the proper order of incoming blocks, not the last instruction in the block.	2023-02-21 08:01:24 -08:00
Alexey Bataev	708eb1b96d	[SLP]Add shuffling of extractelements to avoid extra costs/data movement. If the scalar must be extracted and then used in the gather node, we can instead emit a shuffle instruction to avoid those extra extractelements and the vector-to-scalar-and-back data movement. Part of D110978 Differential Revision: https://reviews.llvm.org/D141940	2023-02-20 06:14:42 -08:00
Florian Hahn	c21ccebe6f	[VPlan] Use usesScalars in shouldPack. Suggested by @Ayal as follow-up improvement in D143864. I was unable to find a case where this actually changes generated code, but it unifies the code to use common infrastructure.	2023-02-20 14:11:40 +00:00
Florian Hahn	df016a9525	[VPlan] Move shouldPack outside of DEBUG ifdef. This fixes a build failure with assertions disabled.	2023-02-20 10:53:45 +00:00
Florian Hahn	9333b97763	[VPlan] Replace AlsoPack field with shouldPack() method (NFC). There is no need to update the AlsoPack field when creating VPReplicateRecipes. It can be easily computed based on the VP def-use chains when it is needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D143864	2023-02-20 10:28:26 +00:00
Kazu Hirata	a28b252d85	Use APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC) Note that getMinSignedBits has been soft-deprecated in favor of getSignificantBits.	2023-02-19 23:56:52 -08:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Florian Hahn	f61c9b7569	[SLP] Fix infinite loop in isUndefVector. This fixes an infinite loop if isa<T>(II->getOperand(1)) is true. Update Base at the top of the loop, before the continue. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D144292	2023-02-19 21:42:24 +00:00
Florian Hahn	7737c05696	[VPlan] Make sure properlyDominates(A, A) returns false. At the moment, properlyDominates(A, A) can return true via LocalComesBefore. Add an early exit to ensure it returns false if A == B. Note: no test has been added because the existing test suite covers this case already with libc++ with assertions enabled. Fixes https://github.com/llvm/llvm-project/issues/60850.	2023-02-19 18:01:16 +00:00
Alexey Bataev	e03d254bbd	[SLP]Do not reduce repeated values, use scalar red ops instead. Metric: size..text (columns: results, results0, diff): SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test 445.00, 461.00, 3.6%; SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 428477.00, 428445.00, -0.0%; External/SPEC/CFP2006/447.dealII/447.dealII.test 618849.00, 618785.00, -0.0%. For all tests some extra code was optimized, GCC-C-execute has some more inlining after Differential Revision: https://reviews.llvm.org/D132261	2023-02-17 07:19:35 -08:00
Florian Hahn	a3d1de3e29	[LV] Move invalid cost remark code to separate function (NFC). The code only needs access to InvalidCosts, ORE and TheLoop, so it can easily be moved into a helper to make selectVectorizationFactor more compact. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D143957	2023-02-16 11:28:19 +00:00
Kazu Hirata	7e6e636fb6	Use llvm::has_single_bit<uint32_t> (NFC) This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.	2023-02-15 22:17:27 -08:00
Hongtao Yu	eddec9de44	[Pseudo probe] Duplicate probes in vectorized loop body. Previously, pseudo probes were dropped from a vectorized loop body during loop vectorization. This can result in the samples of the loop entry being used for the loop body, which in turn can cause undercounting of the loop iteration count. The undercounting can further prevent the loop from being vectorized in the next build. I'm fixing this by explicitly allowing pseudo probes to be kept in the vectorized loop body, and by claiming a probe instruction is not "uniform", so that the vectorizer will duplicate it by the number of vector lanes. For one internal service, I'm seeing the change increase the size of the .pseudoprobe section by 0.7%, which should amount to around 0.2% of the whole binary size. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D144066	2023-02-15 10:18:08 -08:00
Graham Hunter	0fa5df1959	[LV] Synthesize all true masks for masked vector function variants When vectorizing code with function calls in it, if we encounter a function which only has vectorized variants requiring a mask we can synthesize an all-true mask to enable us to proceed. Since we want the mask to be represented in vplan, the pointer to the chosen Function is now stored as part of the VPWidenCallRecipe, and mask arguments are added at the appropriate index to the recipe operands. Reviewed By: david-arm, fhahn, reames Differential Revision: https://reviews.llvm.org/D132458	2023-02-14 14:33:18 +00:00
Fangrui Song	1e6921131a	Move global namespace cl::opt inside llvm::	2023-02-14 00:09:44 -08:00
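
The tail-folding change in commit fe1b51ffee (D142109) can be illustrated with a small scalar simulation. This is a sketch, not LoopVectorize code: it assumes an 8-bit induction variable (a hypothetical choice so that wrap-around happens with small trip counts), and it models llvm.get.active.lane.mask by its LangRef meaning, where lane i is active iff base + i < n in exact integer arithmetic. The names Index, activeLaneMask, iterationsOld and iterationsNew are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: an 8-bit unsigned induction variable, so that
// `index + VF` wraps with small trip counts.
using Index = std::uint8_t;

// Models llvm.get.active.lane.mask(base, n): lane i is active iff base + i < n,
// compared without wrapping. 64-bit math suffices because inputs are 8-bit.
std::vector<bool> activeLaneMask(std::uint64_t base, std::uint64_t n, unsigned vf) {
  std::vector<bool> mask(vf);
  for (unsigned i = 0; i < vf; ++i)
    mask[i] = base + i < n;
  return mask;
}

// Old scheme: the next mask is computed from index.next = index + VF, which
// can wrap. `limit` caps the simulation so a non-terminating loop is visible.
unsigned iterationsOld(Index n, unsigned vf, unsigned limit) {
  Index index = 0;
  std::vector<bool> mask = activeLaneMask(0, n, vf);
  unsigned iters = 0;
  while (mask[0] && iters < limit) {  // mirrors the extractelement of lane 0
    ++iters;
    index = static_cast<Index>(index + vf);  // may wrap modulo 256
    mask = activeLaneMask(index, n, vf);
  }
  return iters;
}

// New scheme: the next mask is computed from the current index against
// (n > VF ? n - VF : 0), before the induction variable is incremented.
unsigned iterationsNew(Index n, unsigned vf, unsigned limit) {
  const Index nMinusVF = n > vf ? static_cast<Index>(n - vf) : Index{0};
  Index index = 0;
  std::vector<bool> mask = activeLaneMask(0, n, vf);
  unsigned iters = 0;
  while (mask[0] && iters < limit) {
    ++iters;
    mask = activeLaneMask(index, nMinusVF, vf);
    index = static_cast<Index>(index + vf);  // may still wrap; mask is unaffected
  }
  return iters;
}
```

With n = 20 and VF = 16 both functions report two vector iterations, matching the walkthrough in that commit message. With n = 250, the old scheme's index wraps from 240 to 0, the recomputed mask is all-true again, and the loop only stops at the simulation cap; this is the case the removed runtime check used to guard against. The new scheme stops after 16 iterations because its mask never depends on the wrapped value.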

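Commit f61c9b7569 fixes an infinite loop in isUndefVector by updating Base at the top of the loop, before the `continue`. The sketch below shows that general pattern on a hypothetical model of an insertelement chain; InsertNode and countDefinedInserts are illustrations, not the SLPVectorizer code, and the scalarIsUndef flag merely stands in for the isa<T>(II->getOperand(1)) check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a buildvector chain: each node is an insertelement
// whose "base" is the vector it was inserted into (-1 means no base).
struct InsertNode {
  int prevBase;        // index of the base vector node, or -1
  bool scalarIsUndef;  // stand-in for the operand-1 check that triggers `continue`
};

// Counts inserts with a non-undef scalar, walking from `start` to the root.
// The cursor is advanced before any `continue`; if the update came after the
// `continue`, an undef node would be revisited forever.
int countDefinedInserts(const std::vector<InsertNode> &chain, int start) {
  int defined = 0;
  int base = start;
  while (base != -1) {
    const InsertNode &node = chain[base];
    base = node.prevBase;  // update first, as in the fix
    if (node.scalarIsUndef)
      continue;            // safe: progress was already made
    ++defined;
  }
  return defined;
}
```

For a three-node chain whose middle insert is undef, the walk visits every node once and counts the two defined inserts.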

3626 Commits
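
Commit 7737c05696 adds an early exit so that properlyDominates(A, A) returns false. The following is a hedged model of why that matters, assuming (hypothetically) that the same-block ordering check returns true for identical positions and that blocks form a straight-line chain where a lower block number dominates a higher one. RecipePos, localComesBeforeOrSame and this properlyDominates are illustrative, not the VPlan implementation.

```cpp
#include <cassert>

// Hypothetical recipe position: block number and index within the block.
struct RecipePos {
  int block;
  int index;
};

// Same-block ordering check that (by assumption) also accepts equality.
bool localComesBeforeOrSame(RecipePos a, RecipePos b) { return a.index <= b.index; }

bool properlyDominates(RecipePos a, RecipePos b) {
  // Early exit mirroring the fix: nothing properly dominates itself.
  // Without it, the same-block check below would return true for a == b.
  if (a.block == b.block && a.index == b.index)
    return false;
  if (a.block == b.block)
    return localComesBeforeOrSame(a, b);
  return a.block < b.block;  // straight-line chain of blocks, a model assumption
}
```

Placing the equality test first keeps the local ordering helper simple while guaranteeing the strict ("properly") semantics for every caller.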