llvm-project

Author	SHA1	Message	Date
Florian Hahn	4fc190351e	[VPlan] Remove unneeded NeedsVectorIV from VPWidenIntOrFpInduction. After recent improvements, all instances of VPWidenIntOrFpInductionRecipe should need a vector IV and there's no need for a separate field.	2023-04-17 13:38:00 +01:00
Florian Hahn	668045eb77	[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI). Before this patch, a VPlan contained 2 mappings for Values -> VPValue: 1) Value2VPValue and 2) VPExternalDefs. This duplication is unnecessary and there are already cases where external defs are added to Value2VPValue. This patch replaces all uses of VPExternalDefs with Value2VPValue. It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue) and addVPValue (to addExternalVPValue). At the moment, this is NFC, but will enable additional simplifications in D147783. Depends on D147891. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147892	2023-04-16 15:38:31 +01:00
Florian Hahn	2db031528e	[VPlan] Check VPValue step in isCanonical (NFCI). Update the isCanonical() implementations to check the VPValue step operand instead of the step in the induction descriptor. At the moment this is NFC, but it enables further optimizations if the step is replaced by a constant in D147783. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147891	2023-04-16 14:48:03 +01:00
Craig Topper	4b47d875a1	[LV] Optimize trip count SCEV. To calculate the trip count we need to add 1 to the backedge taken count. If we need to widen the backedge count, it's better to do the add before the widening if we can guarantee it won't overflow. The code here is based on similar code I found in LoopIdiomRecognize. This is the vectorizer version of this InstCombine patch D142783. Looking at the IR diffs, this does look like it gets more cases than the InstCombine patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D147355	2023-04-12 16:17:58 -07:00
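The overflow concern behind this change can be shown with plain integers. This is a minimal sketch, not the SCEV-based LoopVectorize code: it assumes a 32-bit backedge-taken count (BTC) whose trip count (BTC + 1) is needed as a 64-bit value, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: BTC is a 32-bit backedge-taken count and the
// trip count (BTC + 1) is needed as a 64-bit value.

// Widen first, then add 1: always correct, but the add happens on the
// wider type.
uint64_t tripCountWidenThenAdd(uint32_t BTC) {
  return static_cast<uint64_t>(BTC) + 1;
}

// Add 1 first, then widen: only correct when BTC + 1 cannot wrap in
// 32 bits, i.e. when BTC != UINT32_MAX.
uint64_t tripCountAddThenWiden(uint32_t BTC) {
  uint32_t TC = BTC + 1; // wraps to 0 if BTC == UINT32_MAX
  return static_cast<uint64_t>(TC);
}

// Use the add-before-widen form only when it is known not to wrap.
uint64_t tripCount(uint32_t BTC) {
  bool AddCannotWrap = BTC != UINT32_MAX;
  return AddCannotWrap ? tripCountAddThenWiden(BTC)
                       : tripCountWidenThenAdd(BTC);
}
```

In the vectorizer the count is symbolic, so the "cannot wrap" fact has to be proven (the commit mentions logic modeled on LoopIdiomRecognize) rather than checked on a concrete value as this sketch does.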
Florian Hahn	c255eb2c4b	[VPlan] Use VPLiveOut to update FOR live-out users. Instead of iterating over all LCSSA phis in the exit block, collect all LiveOut users of the FOR splice VPInstruction and only update those users. Building on top of D147471, this removes an access to the cost model after VPlan execution. Depends on D147471. Reviewed By: Ayal, michaelmaitland Differential Revision: https://reviews.llvm.org/D147472	2023-04-10 13:02:44 +01:00
Florian Hahn	620e011a25	[VPlan] Don't add live-outs if scalar epilogue is required. Instead of clearing live outs when a scalar epilogue is required late, don't add live outs during VPlan construction if a scalar epilogue is required. This enables more VPlan-based DCE (if the live out would be the only user in the plan) and is a step towards removing an access of the cost model in fixVectorizedLoop (which is after VPlan execution). Depends on D147468. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147471	2023-04-09 09:18:24 +01:00
Florian Hahn	c7a34d355a	[VPlan] Require VFRange.End to be a power-of-2. (NFCI) This removes the need to convert the end of the range to the next power-of-2 for the end iterator after 4bd3fda5124962 and was suggested as follow-up TODO in D147468.	2023-04-08 13:04:08 +01:00
Florian Hahn	4bd3fda512	[VPlan] Add VFRange::begin() and end() iterators. (NFCI) Add an iterator to iterate over all VFs in VFRange. This simplifies some existing code and allows using all_of, any_of and none_of on a VFRange. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147468	2023-04-08 10:22:25 +01:00
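A power-of-2 VF iterator of the kind described here can be sketched as follows. This is a simplified, hypothetical stand-in for VFRange (named VFRangeSketch, with plain unsigned VFs instead of ElementCount); it also shows why the later commit above requires End to be a power of 2: iteration doubles the VF, so a non-power-of-2 End would never compare equal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical, simplified stand-in for VFRange: VFs are plain unsigned
// values (LLVM uses ElementCount). Start is inclusive, End exclusive,
// and both must be powers of 2.
struct VFRangeSketch {
  unsigned Start;
  unsigned End;

  static bool isPowerOf2(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }

  VFRangeSketch(unsigned S, unsigned E) : Start(S), End(E) {
    // If End were not a power of 2, repeated doubling from Start would
    // never compare equal to End and iteration would not terminate.
    assert(isPowerOf2(S) && isPowerOf2(E) && S < E);
  }

  class iterator {
    unsigned VF;

  public:
    using iterator_category = std::input_iterator_tag;
    using value_type = unsigned;
    using difference_type = std::ptrdiff_t;
    using pointer = const unsigned *;
    using reference = unsigned;

    explicit iterator(unsigned VF) : VF(VF) {}
    unsigned operator*() const { return VF; }
    iterator &operator++() { // advance to the next power of 2
      VF *= 2;
      return *this;
    }
    iterator operator++(int) {
      iterator Tmp = *this;
      ++*this;
      return Tmp;
    }
    bool operator==(const iterator &O) const { return VF == O.VF; }
    bool operator!=(const iterator &O) const { return VF != O.VF; }
  };

  iterator begin() const { return iterator(Start); }
  iterator end() const { return iterator(End); }
};
```

With begin()/end() in place, standard algorithms such as std::all_of or std::none_of can be applied to a range directly, e.g. VFRangeSketch(2, 16) visits 2, 4 and 8.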
Florian Hahn	11896357d4	[VPlan] Add VPInterleaveRecipe::NeedsMaskForGaps field (NFCI). This patch adds a NeedsMaskForGaps field to VPInterleaveRecipe to record whether a mask for gaps is needed. This removes a dependence on the cost model in VPlan code-generation. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147467	2023-04-07 13:11:03 +01:00
Michael Maitland	194f3dc8fd	[VPlan] VPWidenIntOrFpInductionRecipe inherits from VPHeaderPHIRecipe Differential Revision: https://reviews.llvm.org/D144125	2023-03-14 17:01:34 -07:00
Kazu Hirata	c8f9555c4d	[Transforms] Use *{Set,Map}::contains (NFC)	2023-03-14 00:24:30 -07:00
Florian Hahn	9be8d90e62	[VPlan] Add VPWidenSelectRecipe::getCond() (NFC). Add helper to access condition, as suggested in D144489.	2023-03-10 17:49:23 +01:00
Florian Hahn	54558fd8f3	[VPlan] Replace InvariantCond field from VPWidenSelectRecipe. There is no need to store information about invariance in the recipe. Replace the fields with checks of the operands using isDefinedOutsideVectorRegions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D144489	2023-03-10 15:28:43 +01:00
Florian Hahn	a8adb38a96	[VPlan] Replace invariance fields from VPWidenGEPRecipe. There is no need to store information about invariance in the recipe. Replace the fields with checks of the operands using isDefinedOutsideVectorRegions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D144487	2023-03-09 17:52:22 +01:00
Florian Hahn	79272ec028	[VPlan] Add predicate to VPReplicateRecipe, expand region later. This patch adds the predicate as additional operand to VPReplicateRecipe during initial construction. The predicated recipes are later moved into replicate regions. This simplifies construction and some VPlan transformations, like fixed-order recurrence handling. It also improves codegen in some cases (e.g. for in-loop reductions), because the recipes remain in the same block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D143865	2023-03-08 20:11:28 +01:00
Sander de Smalen	fe1b51ffee	[LoopVectorize] Remove runtime check and scalar tail loop when tail-folding. When using tail-folding and using the predicate for both data and control flow (the next vector iteration's predicate is generated with the llvm.get.active.lane.mask intrinsic and then tested for the backedge), the LoopVectorizer still inserts a runtime check to see whether 'i + VF' may at any point overflow for the given trip count. When it does, it falls back to a scalar epilogue loop. We can get rid of that runtime check in the pre-header and therefore also remove the scalar epilogue loop. This reduces code size and avoids a runtime check. Consider the following loop:

    void foo(char * __restrict__ dst, char *src, unsigned long N) {
      for (unsigned long i=0; i<N; ++i)
        dst[i] = src[i] + 42;
    }

If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter will at some point overflow when calculating the predicate for the next vector iteration, because LLVM does:

    vector.ph:
      %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
    vector.body:
      %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
      %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
      ...
      %index.next = add i64 %index, 16
      ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed.
      %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N)
      %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

The solution: what we can do instead is calculate the predicate before incrementing the loop iteration counter, such that llvm.get.active.lane.mask is calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e.

    vector.ph:
      %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
      %N_minus_VF = select %N > 16 ? %N - 16 : 0
    vector.body:
      %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
      %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
      ...
      %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF)
      %index.next = add i64 %index, %4
      ; The add above may still overflow, but this time the active.lane.mask is not affected
      %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

For N = 20, we'd then get:

    vector.ph:
      %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
      ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1>
      %N_minus_VF = select 20 > 16 ? 20 - 16 : 0
      ; %N_minus_VF = 4
    vector.body: (1st iteration)
      ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop
      ...
      %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4)
      ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
      %index.next = add i64 0, 16
      ; %index.next = 16
      %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 1
      br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %vector.body
    vector.body: (2nd iteration)
      ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop
      ...
      %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4)
      ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
      %index.next = add i64 16, 16
      ; %index.next = 32
      %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 0
      br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %for.cond.cleanup

Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D142109	2023-03-01 09:01:19 +00:00
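The mask scheme from this commit can be simulated in a few lines to check the N = 20 walkthrough. This is a hedged sketch, not vectorizer code: it assumes a fixed VF of 16 (vscale = 1) and models llvm.get.active.lane.mask(Base, N) as "lane I is active iff Base + I < N"; the helper names are hypothetical. The key step is that lane I of the next mask is active iff index + I < N - VF, which equals index.next + I < N without ever reading the possibly wrapped index.next.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch assumption: fixed VF of 16, i.e. <vscale x 16 x i1> with vscale = 1.
constexpr unsigned VF = 16;
using Mask = std::array<bool, VF>;

// Models llvm.get.active.lane.mask(Base, N): lane I is active iff
// Base + I < N (unsigned). Only called with values where Base + I cannot wrap.
Mask activeLaneMask(uint64_t Base, uint64_t N) {
  Mask M{};
  for (unsigned I = 0; I < VF; ++I)
    M[I] = Base + I < N;
  return M;
}

unsigned countActive(const Mask &M) {
  unsigned C = 0;
  for (bool B : M)
    C += B;
  return C;
}

struct LoopResult {
  unsigned Iterations; // number of vector.body executions
  uint64_t Processed;  // active lanes summed over all iterations
};

// Simulates the tail-folded loop: the next mask is computed from the
// *current* %index against %N_minus_VF, so %index.next never feeds it.
LoopResult simulateTailFoldedLoop(uint64_t N) {
  const uint64_t NMinusVF = N > VF ? N - VF : 0;
  Mask M = activeLaneMask(0, N); // %active.lane.mask.entry
  uint64_t Index = 0;
  LoopResult R{0, 0};
  do {
    R.Processed += countActive(M); // body runs under the current mask
    ++R.Iterations;
    Mask Next = activeLaneMask(Index, NMinusVF);
    Index += VF; // %index.next: may wrap for huge N, Next does not use it
    M = Next;
  } while (M[0]); // masks are prefix-shaped, so lane 0 decides the backedge
  return R;
}
```

For N = 20 this runs two iterations with 16 and then 4 active lanes, matching the walkthrough in the commit message.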
Florian Hahn	9333b97763	[VPlan] Replace AlsoPack field with shouldPack() method (NFC). There is no need to update the AlsoPack field when creating VPReplicateRecipes. It can be easily computed based on the VP def-use chains when it is needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D143864	2023-02-20 10:28:26 +00:00
Graham Hunter	0fa5df1959	[LV] Synthesize all true masks for masked vector function variants When vectorizing code with function calls in it, if we encounter a function which only has vectorized variants requiring a mask we can synthesize an all-true mask to enable us to proceed. Since we want the mask to be represented in vplan, the pointer to the chosen Function is now stored as part of the VPWidenCallRecipe, and mask arguments are added at the appropriate index to the recipe operands. Reviewed By: david-arm, fhahn, reames Differential Revision: https://reviews.llvm.org/D132458	2023-02-14 14:33:18 +00:00
Florian Hahn	31d46ca8aa	[Dominators] Introduce DomTreeNodeTraits to allow customization. (NFC) This patch introduces DomTreeNodeTraits for customization. Clients can implement DomTreeNodeTraitsCustom to provide custom ParentPtr, getEntryNode and getParent. There's also a default specialization if DomTreeNodeTraitsCustom is not implemented, that assumes a Function-like NodeT. This is what is used for the existing DominatorTree and MachineDominatorTree. The main motivation for this patch is using DominatorTreeBase across all regions of a VPlan, see D140513. Reviewed By: kuhar Differential Revision: https://reviews.llvm.org/D142162	2023-01-22 20:22:41 +00:00
Florian Hahn	22c9f4cf2d	[VPlan] Replace VPInterleaveRecipe::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 14:23:22 +00:00
Florian Hahn	f615de7e26	[VPlan] Replace VPBranchOnMaskSC::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 12:14:58 +00:00
Florian Hahn	cdd8fcdbd7	[VPlan] Replace VPExpandSCEVRecipe::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-17 21:11:33 +00:00
Florian Hahn	bf1ba6bb52	[VPlan] Replace VPScalarIVStepsRecipe::classof with VP_CLASSOF_IMPL (NFC)	2023-01-17 20:53:14 +00:00
Florian Hahn	d47bdae28e	[VPlan] Remove duplicated VPValue IDs (NFCI). At the moment, both VPValue and VPDef have an ID used when casting via classof. This duplication is cumbersome, because it requires adding IDs for new recipes twice and also requires setting them twice. In a few cases, there's only a VPDef ID and no VPValue ID, which can cause some confusion. To simplify things, remove the VPValue IDs for different recipes. Instead, only retain the generic VPValue ID (used for VPValues without a corresponding defining recipe) and VPVRecipe for VPValues that are defined by recipes that inherit from VPValue. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140848	2023-01-17 15:11:38 +00:00
Florian Hahn	133f017479	[VPlan] Remove unneeded VPUser::classof(const VPDef *) (NFC). This specialization is not needed any longer as VPRecipeBase inherits from VPUser and getDefiningRecipe returns a VPRecipeBase.	2023-01-17 09:08:33 +00:00
Florian Hahn	56ffd39c3d	[VPlan] Use VPDef prefix for VPDef IDs instead of VPRecipeBase (NFC). Various places in the code were still using the VPRecipeBase:: prefix for VPDef IDs or no prefix at all. Now that the VPDef IDs have been moved to VPDef, use this prefix instead and consistently use it.	2023-01-16 10:23:52 +00:00
Florian Hahn	ce1be13a86	[VPlan] Use VP_CLASSOF_IMPL for VPWidenCanonicalIVRecipe (NFC). Replace VPWidenCanonicalIVRecipe::classof implementation with general VP_CLASSOF_IMPL.	2023-01-02 17:52:13 +00:00
Florian Hahn	64f1d845b3	[VPlan] Use VP_CLASSOF_IMPL for VPWidenMemoryInstructionRecipe (NFC). Replace VPWidenMemoryInstructionRecipe::classof implementation with general VP_CLASSOF_IMPL.	2023-01-02 17:32:31 +00:00
Florian Hahn	2d6d47f807	[VPlan] Use VP_CLASSOF_IMPL for VPPredInstPHI (NFC). Replace VPPredInstPHI::classof implementation with general VP_CLASSOF_IMPL.	2023-01-02 17:22:34 +00:00
Florian Hahn	cd16a3f04c	[VPlan] Move GraphTraits definitions to separate header (NFC). This reduces the size of VPlan.h and avoids future growth of the file when the graph traits are extended in future patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140500	2022-12-31 15:14:57 +00:00
Florian Hahn	36d70a6aea	[VPlan] Remove redundant blocks by merging them into predecessors. Add and run VPlan transform to fold blocks with a single predecessor into the predecessor. This removes redundant blocks and addresses a TODO to replace special handling for the vector latch VPBB. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D139927	2022-12-26 22:47:09 +00:00
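The folding step can be illustrated on a toy CFG. This is a minimal sketch under stated assumptions, not the VPlan transform: Block, addEdge and mergeBlocksIntoPredecessors here are hypothetical, and the sketch additionally requires the predecessor to have exactly one successor, so that the merged code still only executes on the path where it ran before.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, minimal CFG for illustration; these are not the VPlan classes.
struct Block {
  std::vector<std::string> Insts;
  std::vector<Block *> Preds, Succs;
};

void addEdge(Block &From, Block &To) {
  From.Succs.push_back(&To);
  To.Preds.push_back(&From);
}

// Folds each block with a single predecessor into that predecessor.
// Sketch assumption: the predecessor must also have exactly one successor;
// otherwise BB's code, which runs on only one of Pred's outgoing paths,
// would become unconditional after the merge.
unsigned mergeBlocksIntoPredecessors(std::vector<Block *> &Blocks) {
  unsigned Merged = 0;
  for (auto It = Blocks.begin(); It != Blocks.end();) {
    Block *BB = *It;
    Block *Pred = BB->Preds.size() == 1 ? BB->Preds[0] : nullptr;
    if (!Pred || Pred == BB || Pred->Succs.size() != 1) {
      ++It;
      continue;
    }
    // Append BB's instructions to Pred and let Pred take over BB's successors.
    Pred->Insts.insert(Pred->Insts.end(), BB->Insts.begin(), BB->Insts.end());
    Pred->Succs = BB->Succs;
    for (Block *Succ : BB->Succs)
      for (Block *&P : Succ->Preds)
        if (P == BB)
          P = Pred;
    BB->Preds.clear();
    BB->Succs.clear();
    It = Blocks.erase(It); // BB is now dead; its storage belongs to the caller
    ++Merged;
  }
  return Merged;
}
```

A straight-line chain A -> B -> C collapses into A, while a diamond is left untouched because its join block has two predecessors and its branch block has two successors.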
Florian Hahn	e1650c8d52	[LV] Move exit cond simplification to separate transform. This sets the stage for D133017 by moving out the code that performs VPlan based simplifications to a separate transform that takes the chosen VF & UF as arguments. The main advantage is that this transform runs before any changes to the CFG are being made. This allows using SCEV without worrying about making queries while the IR is in an incomplete state. Note that this patch switches the reasoning to use SCEV, but still only simplifies loops with constant trip counts. Using SCEV here is needed to access the backedge taken count, because the trip count IR value has not been created yet. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D135017	2022-12-23 12:51:21 +00:00
Florian Hahn	5df34e971d	[VPlan] Add support for tracking UFs applicable to VPlan (NFC). Explicitly track the UFs supported in a VPlan. This is needed to allow transformations to restrict the UFs which are supported. Discussed as separate improvement in D135017.	2022-12-22 18:58:25 +00:00
Florian Hahn	96296922b6	[VPlan] Move VF and UF string generation to getName() (NFC). The VFs and UFs may be more constrained as the plans are transformed (e.g. see D135017 for an example). To make sure the VFs/UFs included in the VPlan dump are accurate, generate them when accessing a plan's name, rather than include them in the name string set after initial construction.	2022-12-22 13:15:01 +00:00
Florian Hahn	a84064bcda	[LV] Add createTripCountSCEV helper (NFC). Split off helper function in preparation for D135017.	2022-12-21 22:02:31 +00:00
Florian Hahn	f69ac9a22d	[LV] Support widened induction variables in epilogue vectorization. Code generation now uses the start VPValue of induction recipes. This makes it possible to adjust the start value of the epilogue vector loop to use the 'resume' value of the main vector loop. Fixes #59459. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92132	2022-12-21 13:58:50 +00:00
Florian Hahn	08f16a8217	[VPlan] Use macro to define recipe classof implementation (NFC). Add a VP_CLASSOF_IMPL macro to define common classof implementations for recipes. This reduces duplication and also adds missing implementations to existing recipes.	2022-12-16 17:52:15 +00:00
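The idea of generating classof from a single macro can be sketched with LLVM-style RTTI. This is a hedged illustration only: DefSketch, the two recipe classes, SKETCH_CLASSOF_IMPL and the isa/dyn_cast stand-ins are all hypothetical and do not reproduce the actual VP_CLASSOF_IMPL definition from VPlan.h.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for VPDef: each subclass stores an ID
// that classof compares against.
class DefSketch {
public:
  enum DefID : unsigned char { WidenSC, ReplicateSC };
  explicit DefSketch(DefID ID) : SubclassID(ID) {}
  virtual ~DefSketch() = default;
  DefID getDefID() const { return SubclassID; }

private:
  const DefID SubclassID;
};

// One macro expands to the classof used by isa/dyn_cast-style checks,
// instead of hand-writing the same function in every recipe.
#define SKETCH_CLASSOF_IMPL(ID)                                                \
  static bool classof(const DefSketch *D) { return D->getDefID() == ID; }

class WidenSketch : public DefSketch {
public:
  WidenSketch() : DefSketch(WidenSC) {}
  SKETCH_CLASSOF_IMPL(DefSketch::WidenSC)
};

class ReplicateSketch : public DefSketch {
public:
  ReplicateSketch() : DefSketch(ReplicateSC) {}
  SKETCH_CLASSOF_IMPL(DefSketch::ReplicateSC)
};

// Minimal isa/dyn_cast equivalents built on classof
// (llvm/Support/Casting.h provides the real ones).
template <typename To> bool isaSketch(const DefSketch *D) {
  return To::classof(D);
}

template <typename To> To *dynCastSketch(DefSketch *D) {
  return To::classof(D) ? static_cast<To *>(D) : nullptr;
}
```

Adding a new recipe then only needs a new enumerator and one macro line, which is the duplication this commit series removes.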
Kazu Hirata	6eb0b0a045	Don't include Optional.h These files no longer use llvm::Optional.	2022-12-14 21:16:22 -08:00
Florian Hahn	e898479f2b	[VPlan] Sink non-uniform recipes for scalar plans. In scalar plans, replicate recipes will only generate a single value per UF, independent of whether they are uniform or not. So don't consider uniformity for plans with scalar VFs only. This allows us to handle a few additional cases in VPlan sinking instead of non-VPlan sinkScalarOperands. Depends on D133762. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D134218	2022-12-14 17:55:31 +00:00
Fangrui Song	1ec11d2d48	[Transforms/Vectorize] llvm::Optional => std::optional	2022-12-12 08:56:35 +00:00
Florian Hahn	29e8de5de1	[VPlan] Summarize recipes used to model inductions (NFC). Document recipes used to model inductions after introducing VPDerivedIVRecipe in 0c5df7cd2f81c. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D138748	2022-12-11 16:28:43 +00:00
Florian Hahn	0c5df7cd2f	Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit bf15f1e489aa2f1ac13268c9081a992a8963eb5b. The updated version fixes a crash by checking the induction kind instead of the opcode; for integer inductions, the step is always added, but the opcode might not be set.	2022-11-30 17:04:20 +00:00
Florian Hahn	bf15f1e489	Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit 0fa666ecedc3f36471c0fee925d664512e7525a8. This triggers an assertion during AArch64 stage2 builds. Revert while I investigate. See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio	2022-11-28 22:43:11 +00:00
Florian Hahn	0fa666eced	[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe. This patch splits off the logic to transform the canonical IV to a value for an induction with a different start and step. This transformation only needs to be done once (independent of VF/UF) and enables sinking of VPScalarIVStepsRecipe as follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133758	2022-11-28 16:32:31 +00:00
Florian Hahn	12bb5535d2	[VPlan] Move cast codegen to emitTransformedIndex (NFCI). This reduces duplication a bit. Suggested as simplification in D133758.	2022-11-26 22:47:13 +00:00
Florian Hahn	32f1c5531b	[VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name (NFC) The return value of getDef is guaranteed to be a VPRecipeBase and all users can also accept a VPRecipeBase *. Most users actually cast to VPRecipeBase or a specific recipe before using it, so this change removes a number of redundant casts. Also rename it to getDefiningRecipe to make the name a bit clearer. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D136068	2022-11-16 22:12:08 +00:00
Florian Hahn	d72fcee8f4	[VPlan] Add VPValue::isDefinedOutsideVectorRegions helper (NFC). @Ayal suggested a better named helper than using `!getDef()` to check if a value is invariant across all parts. The property we are using here is that the VPValue is defined outside any vector loop region. There's a TODO left to handle recipes defined in pre-header blocks. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133666	2022-10-19 13:20:30 +01:00
Florian Hahn	2c692d891e	[LV] Update handling of scalable pointer inductions after b73d2c8. The dependent code has been changed quite a lot since 151c144 which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre 151c144 any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes #57912.	2022-09-23 18:23:02 +01:00
Florian Hahn	582f8ef19f	[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd. Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before 151c144. This effectively reverts 151c144, but the long-term fix is to properly support widened inductions during epilogue vectorization. Fixes #57712.	2022-09-19 18:14:35 +01:00
Kazu Hirata	5e5a6c5b07	Use std::conditional_t (NFC)	2022-09-18 10:25:06 -07:00

292 Commits