llvm-project

Author	SHA1	Message	Date
Sander de Smalen	fe1b51ffee	[LoopVectorize] Remove runtime check and scalar tail loop when tail-folding. When using tail-folding and using the predicate for both data and control-flow (the next vector iteration's predicate is generated with the llvm.active.lane.mask intrinsic and then tested for the backedge), the LoopVectorizer still inserts a runtime check to see if the 'i + VF' may at any point overflow for the given trip-count. When it does, it falls back to a scalar epilogue loop. We can get rid of that runtime check in the pre-header and therefore also remove the scalar epilogue loop. This reduces code-size and avoids a runtime check. Consider the following loop: void foo(char * __restrict__ dst, char *src, unsigned long N) { for (unsigned long i=0; i<N; ++i) dst[i] = src[i] + 42; } If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter will overflow when calculating the predicate for the next vector iteration at some point, because LLVM does: vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) vector.body: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ] ... %index.next = add i64 %index, 16 ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed. %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N) %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 The solution: What we can do instead is calculate the predicate before incrementing the loop iteration counter, such that the llvm.active.lane.mask is calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e. 
vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) %N_minus_VF = select %N > 16 ? %N - 16 : 0 vector.body: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ] ... %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF) %index.next = add i64 %index, %4 ; The add above may still overflow, but this time the active.lane.mask is not affected %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 For N = 20, we'd then get: vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> %N_minus_VF = select 20 > 16 ? 20 - 16 : 0 ; %N_minus_VF = 4 vector.body: (1st iteration) ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop ... %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4) ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> %index.next = add i64 0, 16 ; %index.next = 16 %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 ; %8 = 1 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 ; branch to %vector.body vector.body: (2nd iteration) ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop ... 
%active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4) ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> %index.next = add i64 16, 16 ; %index.next = 32 %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 ; %8 = 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 ; branch to %for.cond.cleanup Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D142109	2023-03-01 09:01:19 +00:00
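The N = 20 walk-through in the commit above can be replayed in a small scalar model. This is only a sketch under stated assumptions, not the LoopVectorizer's code: it uses a fixed VF of 16, an 8-bit loop counter so that wrap-around is easy to trigger, and it represents a lane mask by its number of leading active lanes. The names `activeLanes`, `runIncrementFirst` and `runPredicateFirst` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr unsigned VF = 16; // assumed fixed vector width for this model

// Models get.active.lane.mask(base, n): lane i is active iff base + i < n,
// with base + i evaluated without wrapping. Returns the count of active lanes.
unsigned activeLanes(unsigned base, unsigned n) {
  return base >= n ? 0u : std::min(n - base, VF);
}

// Original scheme: increment the counter, then build the next mask from it.
// Returns the number of elements processed, capped at maxIters iterations.
unsigned runIncrementFirst(std::uint8_t n, unsigned maxIters) {
  assert(maxIters > 0 && "need at least one iteration");
  unsigned lanes = activeLanes(0, n); // entry mask
  std::uint8_t index = 0;
  unsigned processed = 0;
  for (unsigned it = 0; it < maxIters; ++it) {
    processed += lanes;
    index = static_cast<std::uint8_t>(index + VF); // may wrap to 0
    lanes = activeLanes(index, n);                 // affected by the wrap
    if (lanes == 0)                                // test lane 0 of the mask
      break;
  }
  return processed;
}

// Scheme from the commit: build the next mask from the current counter and
// 'n > VF ? n - VF : 0', then increment. The wrap no longer affects the mask.
unsigned runPredicateFirst(std::uint8_t n) {
  std::uint8_t nMinusVF = n > VF ? static_cast<std::uint8_t>(n - VF) : 0;
  unsigned lanes = activeLanes(0, n); // entry mask
  std::uint8_t index = 0;
  unsigned processed = 0;
  while (true) {
    processed += lanes;
    lanes = activeLanes(index, nMinusVF);
    index = static_cast<std::uint8_t>(index + VF); // may still wrap
    if (lanes == 0)
      break;
  }
  return processed;
}
```

With an 8-bit counter, `runIncrementFirst(250, 64)` keeps iterating past the trip count because the counter wraps to 0, while `runPredicateFirst(250)` processes exactly 250 elements; both agree on the N = 20 case from the commit message.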
Florian Hahn	9333b97763	[VPlan] Replace AlsoPack field with shouldPack() method (NFC). There is no need to update the AlsoPack field when creating VPReplicateRecipes. It can be easily computed based on the VP def-use chains when it is needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D143864	2023-02-20 10:28:26 +00:00
Graham Hunter	0fa5df1959	[LV] Synthesize all true masks for masked vector function variants When vectorizing code with function calls in it, if we encounter a function which only has vectorized variants requiring a mask we can synthesize an all-true mask to enable us to proceed. Since we want the mask to be represented in vplan, the pointer to the chosen Function is now stored as part of the VPWidenCallRecipe, and mask arguments are added at the appropriate index to the recipe operands. Reviewed By: david-arm, fhahn, reames Differential Revision: https://reviews.llvm.org/D132458	2023-02-14 14:33:18 +00:00
Florian Hahn	31d46ca8aa	[Dominators] Introduce DomTreeNodeTraits to allow customization. (NFC) This patch introduces DomTreeNodeTraits for customization. Clients can implement DomTreeNodeTraitsCustom to provide custom ParentPtr, getEntryNode and getParent. There's also a default specialization, used if DomTreeNodeTraitsCustom is not implemented, that assumes a Function-like NodeT. This is what is used for the existing DominatorTree and MachineDominatorTree. The main motivation for this patch is using DominatorTreeBase across all regions of a VPlan, see D140513. Reviewed By: kuhar Differential Revision: https://reviews.llvm.org/D142162	2023-01-22 20:22:41 +00:00
Florian Hahn	22c9f4cf2d	[VPlan] Replace VPInterleaveRecipe::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 14:23:22 +00:00
Florian Hahn	f615de7e26	[VPlan] Replace VPBranchOnMaskSC::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 12:14:58 +00:00
Florian Hahn	cdd8fcdbd7	[VPlan] Replace VPExpandSCEVRecipe::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-17 21:11:33 +00:00
Florian Hahn	bf1ba6bb52	[VPlan] Replace VPScalarIVStepsRecipe::classof with VP_CLASSOF_IMPL (NFC)	2023-01-17 20:53:14 +00:00
Florian Hahn	d47bdae28e	[VPlan] Remove duplicated VPValue IDs (NFCI). At the moment, both VPValue and VPDef have an ID used when casting via classof. This duplication is cumbersome, because it requires adding IDs for new recipes twice and also requires setting them twice. In a few cases, there's only a VPDef ID and no VPValue ID, which can cause some confusion. To simplify things, remove the VPValue IDs for different recipes. Instead, only retain the generic VPValue ID (= used VPValues without a corresponding defining recipe) and VPVRecipe for VPValues that are defined by recipes that inherit from VPValue. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140848	2023-01-17 15:11:38 +00:00
Florian Hahn	133f017479	[VPlan] Remove unneeded VPUser::classof(const VPDef *) (NFC). This specialization is not needed any longer as VPRecipeBase inherits from VPUser and getDefiningRecipe returns a VPRecipeBase.	2023-01-17 09:08:33 +00:00
Florian Hahn	56ffd39c3d	[VPlan] Use VPDef prefix for VPDef IDs instead of VPRecipeBase (NFC). Various places in the code were still using the VPRecipeBase:: prefix for VPDef IDs or no prefix at all. Now that the VPDef IDs have been moved to VPDef, use this prefix instead and consistently use it.	2023-01-16 10:23:52 +00:00
Florian Hahn	ce1be13a86	[VPlan] Use VP_CLASSOF_IMPL for VPWidenCanonicalIVRecipe (NFC). Replace VPWidenCanonicalIVRecipe::classof implementation with general VP_CLASSOF_IMPL.	2023-01-02 17:52:13 +00:00
Florian Hahn	64f1d845b3	[VPlan] Use VP_CLASSOF_IMPL for VPWidenMemoryInstructionRecipe (NFC). Replace VPWidenMemoryInstructionRecipe::classof implementation with general VP_CLASSOF_IMPL.	2023-01-02 17:32:31 +00:00
Florian Hahn	2d6d47f807	[VPlan] Use VP_CLASSOF_IMPL for VPPredInstPHI (NFC). Replace VPPredInstPHI::classof implementation with general VP_CLASSOF_IMPL.	2023-01-02 17:22:34 +00:00
Florian Hahn	cd16a3f04c	[VPlan] Move GraphTraits definitions to separate header (NFC). This reduces the size of VPlan.h and avoids future growth of the file when the graph traits are extended in future patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140500	2022-12-31 15:14:57 +00:00
Florian Hahn	36d70a6aea	[VPlan] Remove redundant blocks by merging them into predecessors. Add and run VPlan transform to fold blocks with a single predecessor into the predecessor. This removes redundant blocks and addresses a TODO to replace special handling for the vector latch VPBB. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D139927	2022-12-26 22:47:09 +00:00
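The block-merging idea in the commit above can be sketched on a toy CFG. This is a minimal illustration, not VPlan's data structures or the actual transform: `DemoBlock` and `mergeIntoPredecessor` are hypothetical names, recipes are plain strings, and the extra requirement that the predecessor has exactly one successor is an assumption added here so the fold does not change control flow.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block holding an ordered recipe list.
struct DemoBlock {
  std::string Name;
  std::vector<std::string> Recipes;
  std::vector<DemoBlock *> Preds, Succs;
};

// Folds B into its predecessor when B has exactly one predecessor and that
// predecessor has B as its only successor. Returns true if B was merged.
bool mergeIntoPredecessor(DemoBlock *B) {
  if (B->Preds.size() != 1)
    return false;
  DemoBlock *Pred = B->Preds.front();
  if (Pred == B || Pred->Succs.size() != 1)
    return false;
  assert(Pred->Succs.front() == B && "inconsistent CFG edges");

  // Append B's recipes to the end of the predecessor, preserving order.
  Pred->Recipes.insert(Pred->Recipes.end(), B->Recipes.begin(),
                       B->Recipes.end());
  B->Recipes.clear();

  // The predecessor inherits B's successors; fix their back-references.
  Pred->Succs = B->Succs;
  for (DemoBlock *S : Pred->Succs)
    for (DemoBlock *&P : S->Preds)
      if (P == B)
        P = Pred;

  // B is now disconnected and empty.
  B->Preds.clear();
  B->Succs.clear();
  return true;
}
```

On a straight-line chain A -> B -> C, calling `mergeIntoPredecessor(&B)` leaves A -> C with A holding the recipes of both blocks.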
Florian Hahn	e1650c8d52	[LV] Move exit cond simplification to separate transform. This sets the stage for D133017 by moving out the code that performs VPlan based simplifications to a separate transform that takes the chosen VF & UF as arguments. The main advantage is that this transform runs before any changes to the CFG are being made. This allows using SCEV without worrying about making queries while the IR is in an incomplete state. Note that this patch switches the reasoning to use SCEV, but still only simplifies loops with constant trip counts. Using SCEV here is needed to access the backedge taken count, because the trip count IR value has not been created yet. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D135017	2022-12-23 12:51:21 +00:00
Florian Hahn	5df34e971d	[VPlan] Add support for tracking UFs applicable to VPlan (NFC). Explicitly track the UFs supported in a VPlan. This is needed to allow transformations to restrict the UFs which are supported. Discussed as separate improvement in D135017.	2022-12-22 18:58:25 +00:00
Florian Hahn	96296922b6	[VPlan] Move VF and UF string generation to getName() (NFC). The VFs and UFs may be more constrained as the plans are transformed (e.g. see D135017 for an example). To make sure the VFs/UFs included in the VPlan dump are accurate, generate them when accessing a plan's name, rather than include them in the name string set after initial construction.	2022-12-22 13:15:01 +00:00
Florian Hahn	a84064bcda	[LV] Add createTripCountSCEV helper (NFC). Split off helper function in preparation for D135017.	2022-12-21 22:02:31 +00:00
Florian Hahn	f69ac9a22d	[LV] Support widened induction variables in epilogue vectorization. Code generation now uses the start VPValue of induction recipes. This makes it possible to adjust the start value of the epilogue vector loop to use the 'resume' value of the main vector loop. Fixes #59459. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92132	2022-12-21 13:58:50 +00:00
Florian Hahn	08f16a8217	[VPlan] Use macro to define recipe classof implementation (NFC). Add a VP_CLASSOF_IMPL macro to define common classof implementations for recipes. This reduces duplication and also adds missing implementations to existing recipes.	2022-12-16 17:52:15 +00:00
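A macro of the kind described in the commit above might look roughly like the following. This is a self-contained sketch of the general LLVM-style RTTI pattern, not the real VP_CLASSOF_IMPL definition: `DemoDef`, `DEMO_CLASSOF_IMPL`, `demoIsa` and `demoCast` are hypothetical names invented for this illustration.

```cpp
#include <cassert>

// Hypothetical base class that records which concrete kind an object is.
class DemoDef {
public:
  enum DemoKind { DK_Widen, DK_Replicate };
  explicit DemoDef(DemoKind K) : Kind(K) {}
  DemoKind getKind() const { return Kind; }

private:
  const DemoKind Kind;
};

// Generates the classof boilerplate once instead of repeating it per class.
#define DEMO_CLASSOF_IMPL(KIND)                                                \
  static bool classof(const DemoDef *D) { return D->getKind() == (KIND); }

struct DemoWiden : DemoDef {
  DemoWiden() : DemoDef(DK_Widen) {}
  DEMO_CLASSOF_IMPL(DK_Widen)
};

struct DemoReplicate : DemoDef {
  DemoReplicate() : DemoDef(DK_Replicate) {}
  DEMO_CLASSOF_IMPL(DK_Replicate)
};

// Simplified analogues of isa<> and cast<> built on top of classof.
template <typename To> bool demoIsa(const DemoDef *D) { return To::classof(D); }

template <typename To> const To *demoCast(const DemoDef *D) {
  assert(demoIsa<To>(D) && "cast to incompatible kind");
  return static_cast<const To *>(D);
}
```

Each new class then needs only one macro line to take part in the kind checks, which is the duplication the commit message says the macro removes.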
Kazu Hirata	6eb0b0a045	Don't include Optional.h These files no longer use llvm::Optional.	2022-12-14 21:16:22 -08:00
Florian Hahn	e898479f2b	[VPlan] Sink non-uniform recipes for scalar plans. In scalar plans, replicate recipes will only generate a single value per UF, independent of whether they are uniform or not. So don't consider uniformity for plans with scalar VFs only. This allows us to handle a few additional cases in VPlan sinking instead of non-VPlan sinkScalarOperands. Depends on D133762. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D134218	2022-12-14 17:55:31 +00:00
Fangrui Song	1ec11d2d48	[Transforms/Vectorize] llvm::Optional => std::optional	2022-12-12 08:56:35 +00:00
Florian Hahn	29e8de5de1	[VPlan] Summarize recipes used to model inductions (NFC). Document recipes used to model inductions after introducing VPDerivedIVRecipe in 0c5df7cd2f81c. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D138748	2022-12-11 16:28:43 +00:00
Florian Hahn	0c5df7cd2f	Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit bf15f1e489aa2f1ac13268c9081a992a8963eb5b. The updated version fixes a crash by checking the induction kind instead of the opcode; for integer inductions, the step is always added, but the opcode might not be set.	2022-11-30 17:04:20 +00:00
Florian Hahn	bf15f1e489	Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit 0fa666ecedc3f36471c0fee925d664512e7525a8. This triggers an assertion during AArch64 stage2 builds. Revert while I investigate. See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio	2022-11-28 22:43:11 +00:00
Florian Hahn	0fa666eced	[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe. This patch splits off the logic to transform the canonical IV to a value for an induction with a different start and step. This transformation only needs to be done once (independent of VF/UF) and enables sinking of VPScalarIVStepsRecipe as follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133758	2022-11-28 16:32:31 +00:00
Florian Hahn	12bb5535d2	[VPlan] Move cast codegen to emitTransformedIndex (NFCI). This reduces duplication a bit. Suggested as simplification in D133758.	2022-11-26 22:47:13 +00:00
Florian Hahn	32f1c5531b	[VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name (NFC). The return value of getDef is guaranteed to be a VPRecipeBase and all users can also accept a VPRecipeBase *. Most users actually cast to VPRecipeBase or a specific recipe before using it, so this change removes a number of redundant casts. Also rename it to getDefiningRecipe to make the name a bit clearer. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D136068	2022-11-16 22:12:08 +00:00
Florian Hahn	d72fcee8f4	[VPlan] Add VPValue::isDefinedOutsideVectorRegions helper (NFC). @Ayal suggested a better named helper than using `!getDef()` to check if a value is invariant across all parts. The property we are using here is that the VPValue is defined outside any vector loop region. There's a TODO left to handle recipes defined in pre-header blocks. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133666	2022-10-19 13:20:30 +01:00
Florian Hahn	2c692d891e	[LV] Update handling of scalable pointer inductions after b73d2c8. The dependent code has been changed quite a lot since 151c144 which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre 151c144 any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes #57912.	2022-09-23 18:23:02 +01:00
Florian Hahn	582f8ef19f	[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd. Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before 151c144. This effectively reverts 151c144, but the long-term fix is to properly support widened inductions during epilogue vectorization. Fixes #57712.	2022-09-19 18:14:35 +01:00
Kazu Hirata	5e5a6c5b07	Use std::conditional_t (NFC)	2022-09-18 10:25:06 -07:00
Florian Hahn	2a78890b7b	[VPlan] Move SCEV expansion for pointer induction to VPExpandSCEV (NFC). Use VPExpandSCEVRecipe to expand the step of pointer inductions. This cleanup addresses a corresponding FIXME. It should be NFC, as steps for pointer induction must be constants, which makes expansion trivial.	2022-09-09 19:20:13 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Philip Reames	033a97a8f3	[LV] Minor code restructure of isUniformAfterVectorization [nfc] Mostly just to make a future patch easier to review.	2022-08-29 12:48:27 -07:00
Florian Hahn	af98b875e8	[VPlan] Use range check in VPHeaderPHIRecipe::classof (NFC). This addresses a suggestion to simplify the check from D131989. This also makes it easier to ensure that VPHeaderPHIRecipe::classof checks for all header phi ids.	2022-08-28 15:54:12 +01:00
Florian Hahn	7743badafa	[VPlan] Verify that header only contains header phi recipes. Add verification that VPHeaderPHIRecipes are only in header VPBBs. Also adds missing checks for VPPointerInductionRecipe to VPHeaderPHIRecipe::classof. Split off from D119661. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D131989	2022-08-27 22:06:12 +01:00
Kazu Hirata	21de2888a4	Use llvm::is_contained (NFC)	2022-08-27 09:53:11 -07:00
Florian Hahn	4e5c44964a	[VPlan] Move isUniformAfterVectorization from VPlan to vputils (NFC). This allows re-using the utility without a VPlan object. The helper also doesn't access any data from VPlan.	2022-08-26 18:26:33 +01:00
Florian Hahn	689895f432	[VPlan] Remove unneeded `struct` prefix for VPTransformState args (NFC).	2022-08-24 17:58:08 +01:00
David Sherwood	03fee6712a	[LoopVectorize] Add option to use active lane mask for loop control flow Currently, for vectorised loops that use the get.active.lane.mask intrinsic we only use the mask for predicated vector operations, such as masked loads and stores, etc. The loop itself is still controlled by comparing the canonical induction variable with the trip count. However, for some targets this is inefficient when it's cheap to use the mask itself to control the loop. This patch adds support for using the active lane mask for control flow by: 1. Generating the active lane mask for the next iteration of the vector loop, rather than the current one. If there are still any remaining iterations then at least the first bit of the mask will be set. 2. Extract the first bit of this mask and use this bit for the conditional branch. I did this by creating a new VPActiveLaneMaskPHIRecipe that sets up the initial PHI values in the vector loop pre-header. I've also made use of the new BranchOnCond VPInstruction for the final instruction in the loop region. Differential Revision: https://reviews.llvm.org/D125301	2022-07-11 13:46:55 +01:00
David Sherwood	02d6950d84	[LoopVectorize][NFC] Add optional Name parameter to VPInstruction This patch is a simple piece of refactoring that now permits users to create VPInstructions and specify the name of the value being generated. This is useful for creating more readable/meaningful names in IR. Differential Revision: https://reviews.llvm.org/D128982	2022-07-11 09:23:24 +01:00
Florian Hahn	b0da3c6fa4	[VPlan] Move setDebugLocFromInst to VPTransformState (NFC). The moved helpers are only used for codegen. It will allow moving the remaining ::execute implementations out of LoopVectorize.cpp. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D128657	2022-07-02 15:18:17 +01:00
Florian Hahn	0dddf04cab	[LV] Don't optimize exit cond during epilogue vectorization. At the moment, the same VPlan can be used for code generation of both the main vector and epilogue vector loop. This can lead to wrong results if the plan is optimized based on the VF of the main vector loop and then re-used for the epilogue loop. One example where this is problematic is if the scalar loops need to execute at least one iteration, e.g. due to interleave groups. To prevent mis-compiles in the short-term, disable optimizing exit conditions for VPlans when using epilogue vectorization. The proper fix is to avoid re-using the same plan for both loops, which will require support for cloning plans first. Fixes #56319.	2022-07-01 13:48:38 +01:00
Florian Hahn	583abd0e36	[VPlan] Move addMetadata to VPTransformState (NFC). The moved helpers are only used for codegen. It will allow moving the remaining ::execute implementations out of LoopVectorize.cpp. Depends on D127966. Depends on D127965. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D127968	2022-07-01 12:03:25 +01:00
Florian Hahn	85983ca42e	[VPlan] Replace remaining use of needsScalarIV. All information is already available in VPlan. Note that there are some test changes, because we now can correctly look through instructions like truncates to analyze the actual users. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123541	2022-06-09 12:05:37 +01:00
Florian Hahn	cedfd7a2e5	Recommit "[VPlan] Remove unneeded needsVectorIV check." This reverts commit 266ea446ab747671eb6c736569c3c9c5f3c53d11. The reasons for the revert have been addressed by cleaning up condition handling in VPlan and properly marking VPBranchOnMaskRecipe as using scalars. The test case for the revert from D123720 has been added in 3d663308a5d.	2022-06-08 14:06:45 +01:00


277 Commits