llvm-project

Author	SHA1	Message	Date
David Green	b65267ca7b	[LV] Invalidate widening decisions after maximizing vector bandwidth When MaximizeVectorBandwidth is enabled, we can end up (via calls to collectUniformsAndScalars/setCostBasedWideningDecision through calculateRegisterUsage) making widening decisions before we have decided whether to fold the tail by masking. These decisions will be wrong if we later decide to fold the tail, for example when the trip count is very low. It will use incorrect costs for loads that should get masked, using standard memory operation costs instead. This still at the moment uses the EmulatedMaskMemRefHack costs (a bit unfortunately), but the old costs without this change were 1, leading to too optimistic vectorization. This slightly changes the way that the MaximizeVectorBandwidth option works to make it easier to test, always honouring the option if it is set. Differential Revision: https://reviews.llvm.org/D120215	2022-03-31 09:19:31 +01:00
Florian Hahn	ecb4171dcb	[LV] Handle zero cost loops in selectInterleaveCount. In some cases, like in the added test case, we can reach selectInterleaveCount with loops that actually have a cost of 0. Unfortunately a loop cost of 0 is also used to communicate that the cost has not been computed yet. To resolve the crash, bail out if the cost remains zero after computing it. This seems like the best option, as there are multiple code paths that return a cost of 0 to force a computation in selectInterleaveCount. Computing the cost at multiple places up front there would unnecessarily complicate the logic. Fixes #54413.	2022-03-29 22:52:43 +01:00
Florian Hahn	d1d3563278	[LV] Move code to place pointer induction increment to VPlan post-processing. This patch moves the code to set the correct incoming block for the backedge value to VPlan::execute. When generating the phi node, the backedge value is temporarily added using the pre-header as incoming block. The invalid phi node will be fixed up during VPlan::execute after main VPlan code generation. At the same time, the backedge value is also moved to the latch. This change removes the requirement to create the latch block up-front for VPWidenInductionPHIRecipe::execute, which in turn will enable modeling the pre-header in VPlan. Depends on D121617. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121618	2022-03-29 20:27:59 +01:00
Florian Hahn	e7bf2ea934	[LV] Move code to place induction increment to VPlan post-processing. This patch moves the code to set the correct incoming block for the backedge value to VPlan::execute. When generating the phi node, the backedge value is temporarily added using the pre-header as incoming block. The invalid phi node will be fixed up during VPlan::execute after main VPlan code generation. At the same time, the backedge value is also moved to the latch. This change removes the requirement to create the latch block up-front for VPWidenIntOrFpInductionRecipe::execute, which in turn will enable modeling the pre-header in VPlan. As an alternative, the increment could be modeled as separate recipe, but that would require more work and a bit of redundant code, as we need to create the step-vector during VPWidenIntOrFpInductionRecipe::execute anyways, to create the values for different parts. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121617	2022-03-28 16:20:02 +01:00
Florian Hahn	e47d220230	[LV] Use getVectorLoopRegion to retrieve header. (NFC) Update all places that currently assume the entry block to the plan is also the vector loop header to use getVectorLoopRegion instead. getVectorLoopRegion will keep doing the right thing when the pre-header is modeled explicitly (and becomes the new entry block in the plan).	2022-03-25 16:57:12 +00:00
Simon Pilgrim	597aefa89c	Fix unused variable warning by embedding inside assertion	2022-03-24 17:41:24 +00:00
Florian Hahn	46432a0088	[VPlan] Add VPWidenPointerInductionRecipe. This patch moves pointer induction handling from VPWidenPHIRecipe to its own recipe. In the process, it adds all information required to generate code for pointer inductions without relying on Legal to access the list of induction phis. Alternatively VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121615	2022-03-24 14:58:45 +00:00
serge-sans-paille	1b89c83254	Cleanup includes: Transforms/Instrumentation & Transforms/Vectorize Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122181	2022-03-23 11:06:13 +01:00
Florian Hahn	50c8588e44	[LV] Remove Loop argument from createInductionResumeValues (NFCI). createInductionResumeValues only uses its loop argument to get the pre-header, but the pre-header is already known (we created/cached it earlier). Remove the unneeded loop argument.	2022-03-22 14:23:12 +00:00
Sophia	72bde608d2	[LV] Fix typo in comment Reviewed by: fhahn (Florian Hahn) Differential Revision: https://reviews.llvm.org/D121781	2022-03-21 20:30:05 +08:00
Florian Hahn	0ebac76e6e	[LV] Remove unneeded Loop argument from completeLoopSkeleton. (NFCI) completeLoopSkeleton only uses its loop argument to get the pre-header, but the pre-header is already known (we created/cached it earlier). Remove the unneeded loop argument.	2022-03-21 10:07:25 +00:00
Florian Hahn	487629cc61	[LV] Remove dead Loop argument from emitMemRuntimeChecks. (NFC)	2022-03-20 21:01:15 +00:00
Florian Hahn	1a820ff039	[LV] Remove unnecessary uses of Loop* (NFC). Update functions that previously took a loop pointer but only to get the pre-header. Instead, pass the block directly. This removes the requirement for the loop object to be created up-front.	2022-03-19 20:18:47 +00:00
Florian Hahn	151c144350	[LV] Use usesScalars in widenPHIInstruction. This uses the existing VPlan helpers to check whether there are scalar uses of a phi recipe. It removes one of the few remaining dependencies on the cost model from VPlan code generation. Depends on D121612. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121613	2022-03-17 13:16:32 +00:00
Malhar Jajoo	a36d269658	[VPlan] Avoid collecting scalars for SVE This patch ensures scalars (except for uniforms) are no longer collected (prior to LVP planning phase) for scalable vectorization. This is to avoid the chances of generating scalarized instructions later (during LVP execute phase) as they are not supported for scalable vectorization. Relevant test has also been added. Differential Revision: https://reviews.llvm.org/D121452	2022-03-16 16:33:34 +00:00
Florian Hahn	ca1b2fc9fb	[LV] Remove LoopVectorBody from InnerLoopVectorizer. (NFCI) Update places still referencing LoopVectorBody to use the vector loop to get the vector loop header. This is needed to move vector loop code-generation to VPlan completely, which in turn is needed to model pre-header & exit blocks in VPlan as well.	2022-03-15 08:22:31 +00:00
Florian Hahn	d621ae30e2	[LV] Remove dead Loop argument from emitMinimumVector... (NFC) The argument is not used, remove it.	2022-03-14 15:47:40 +00:00
Florian Hahn	3ee2d908a9	[LV] Remove dead Loop argument from emitSCEVChecks. (NFC) The argument is not used, remove it.	2022-03-14 13:00:03 +00:00
Florian Hahn	8896c36624	[LV] Do not set insert point in completeLoopSkeleton. (NFCI) The insertion point for the builder used during VPlan code generation is set during code generation. Setting the insert point here is dead code and can be removed.	2022-03-14 12:21:26 +00:00
Florian Hahn	95f76bff1c	[LV] Create & use VPScalarIVSteps for all scalar users. This patch is a follow-up to D115953. It updates optimizeInductions to also introduce new VPScalarIVStepsRecipes if an IV has both vector and scalar uses. It updates all uses that only need scalar values to use the newly created recipe for the scalar steps. This completes untangling of VPWidenIntOrFpInductionRecipe code-generation. Now the recipe only creates the widened vector values, as it says on the tin. The code to generate IR has been moved directly to VPWidenIntOrFpInductionRecipe::execute. Note that the recipe has been updated to hold a reference to ScalarEvolution, which is needed to expand the step, until we can place the corresponding SCEV expansion in the pre-header. Depends on D120827. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D120828	2022-03-13 17:15:24 +00:00
serge-sans-paille	ed98c1b376	Cleanup includes: DebugInfo & CodeGen Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121332	2022-03-12 17:26:40 +01:00
Roman Lebedev	2f80ea7f4f	[NFC][LV] Use different braces in debug output The analysis passes output function name encapsulated in `'` braces, but LV uses `"`. Harmonizing this may help in creating an update script for the LV costmodel test checks. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D121105	2022-03-07 19:32:37 +03:00
Florian Hahn	8777cb66a8	[VPlan] Remove reliance on underlying instr for ScalarIVSteps (NFCI). Instead of relying on underlying instructions, this patch updates VPScalarIVStepsRecipe to only store the required type information. This removes access to unrelated information, as well as avoiding issues with the same underlying instruction being shared by multiple recipes. This change should only change the debug output and not cause any codegen changes, hence NFCI.	2022-03-02 16:23:19 +00:00
Florian Hahn	9e46866c0c	[LV] Remove dead EntryVal argument from buildScalarSteps (NFC). The EntryVal argument is not needed after recent refactoring. Remove it.	2022-03-02 14:59:22 +00:00
Florian Hahn	b3e8ace198	Recommit "[VPlan] Introduce recipe to build scalar steps." This reverts the revert commit ff93260bf6bddfbad1fa65c4d5184988885b900f. The underlying issue causing the PPC bot failures has been fixed in cbaac1473403 and a corresponding test case has been added in ad2cad1c521c. Original message: This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-28 14:12:20 +00:00
Florian Hahn	ff93260bf6	Revert "[VPlan] Introduce recipe to build scalar steps." This reverts commit 49b23f451cf713036c99573a35daed308d2ac894. This appears to break some PPC build bots. Revert while I investigate.	2022-02-27 17:51:19 +00:00
Florian Hahn	49b23f451c	[VPlan] Introduce recipe to build scalar steps. This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-27 17:32:41 +00:00
Florian Hahn	da740492b0	[VPlan] Remove dead header-phi recipes. This patch adds a new transform to remove dead recipes. For now, it only removes dead recipes in the header, to keep the number of tests that require updating manageable. Future patches will extend this to remove dead recipes across the whole plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118051	2022-02-26 16:26:39 +00:00
Kerry McLaughlin	12fb133eba	[LoopVectorize] Support conditional in-loop vector reductions Extends getReductionOpChain to look through Phis which may be part of the reduction chain. adjustRecipesForReductions will now also create a CondOp for VPReductionRecipe if the block is predicated and not only if foldTailByMasking is true. Changes were required in tryToBlend to ensure that we don't attempt to convert the reduction Phi into a select by returning a VPBlendRecipe. The VPReductionRecipe will create a select between the Phi and the reduction. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117580	2022-02-22 12:04:35 +00:00
Florian Hahn	2cd22ce0d0	[LV] Pass start value directly to emitTransformedIndex (NFC).	2022-02-12 19:03:32 +00:00
Philip Reames	5ba115031d	[PSE] Remove assumption that top level predicate is union from public interface [NFC] Note that this doesn't actually cause the top level predicate to become a non-union just yet. The above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.	2022-02-10 16:14:52 -08:00
Simon Pilgrim	6af7c1371a	[LoopVectorize] getStepVector - reduce scope of local variable. NFC.	2022-02-10 20:44:25 +00:00
David Green	b55d4c2ad8	Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`" This reverts commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9 as we've received multiple reports of this significantly impacting performance, in ways that don't seem to just be target specific cost models going wrong. I would offer some reproducers, but the test changes here seem to be full of them! Reverting for now and hopefully we can remove the "hack" more carefully as we go.	2022-02-09 20:02:54 +00:00
Florian Hahn	8aa122081f	[LV] Pass step to emitTransformedIndex (NFC). Move out the induction step creation from emitTransformedIndex to the callers. In some places (e.g. widenIntOrFpInduction) the step is already created. Passing the step in ensures the steps are kept in sync.	2022-02-09 11:12:45 +00:00
Florian Hahn	c9e6678b56	[LV] Move buildScalarSteps out of ILV (NFC). This makes the function independent of shared state in ILV (ensures no new dependencies on things like the cost model are introduced) and allows for use directly in recipe's ::execute functions.	2022-02-08 21:18:40 +00:00
David Green	b4c6d1bb37	[LoopVectorizer] Don't perform interleaving of predicated scalar loops The vectorizer will choose at times to "vectorize" loops with a scalar factor (VF=1) with interleaving (IC > 1). This can occasionally produce better code than the unroller (notably for reductions where it can produce independent reduction chains that are combined after the loop). At times this is not very beneficial though, for example when runtime checks are needed or when the scalar code requires predication. This addresses the second point, preventing the vectorizer from interleaving when the scalar loop will require predication. This prevents it from making a bit of a mess, that is worse than the original and better left for the unroller to unroll if beneficial. It helps reverse some of the regressions from D118090. Differential Revision: https://reviews.llvm.org/D118566	2022-02-07 19:34:28 +00:00
Florian Hahn	5a72357697	[LV] Use IRBuilderBase in VPlan.h, remove IRBuilder.h include (NFC). By using IRBuilderBase instead of IRBuilder<> a forward declaration can be used instead of including IRBuilder.h	2022-02-07 17:46:16 +00:00
Roman Lebedev	77a0da926c	[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model. What it essentially does is prevent scalarized vectorization of masked memory operations: ``` // TODO: Cost model for emulated masked load/store is completely // broken. This hack guides the cost model to use an artificially // high enough value to practically disable vectorization with such // operations, except where previously deployed legality hack allowed // using very low cost values. This is to avoid regressions coming simply // from moving "masked load/store" check from legality to cost model. // Masked Load/Gather emulation was previously never allowed. // Limited number of Masked Store/Scatter emulation was allowed. ``` While i don't really understand about what specifically `is completely broken` was talking about, i believe that at least on X86 with AVX2-or-later, this is no longer true. (or at least, i would like to know what is still broken). So i would like to follow suit after D111460, and likewise disable that hack for AVX2+. But since this was added for X86 specifically, let's just instead completely remove this hack. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114779	2022-02-07 16:08:31 +03:00
Florian Hahn	541ca12dcd	[LV] Use VPReplicateRecipe::isUniform instead isUniformAfterVec (NFCI). In scalarizeInstruction(), isUniformAfterVectorization is used to detect cases where it is sufficient to always access the first lane. This should map directly to checking whether the operand is a uniform replicate recipe. Differential Revision: https://reviews.llvm.org/D116654	2022-02-06 16:37:20 +00:00
Sander de Smalen	eaee477eda	[LV] Use VScaleForTuning to allow wider epilogue VFs. When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot be any smaller, the vectorizer should try to estimate how many lanes are executed at runtime and allow a suitable fixed-width VF to be chosen. It can use VScaleForTuning to figure out what a suitable fixed-width VF could be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8, it could still choose an epilogue VF up to VF=4. This was a bit tricky to test, so this patch also introduces a wrapper function to get 'VScaleForTuning' by also considering vscale_range. If min and max are equal, then that will be the vscale we compile for. It makes little sense to tune for a different width if the code will not be portable for other widths. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118709	2022-02-03 15:40:17 +00:00
Sander de Smalen	2a44eaf20f	[LV] Allow a scalable VF for the epilogue. For some reason we limited the epilogue VF to be fixed-width, but there is not necessarily a reason for doing so. If the main VF=vscale x 16, the epilogue VF could be either fixed-width, or a scalable VF up to vscale x 8. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118688	2022-02-01 22:38:55 +00:00
Florian Hahn	7fe4fa9a0a	[LV] Use onlyFirstLaneDemanded when widening pointer phis (NFCI). This removes another instance of recipe execution still relying on the cost model. Depends on D116554. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D116656	2022-02-01 09:50:47 +00:00
Florian Hahn	8f12175fed	[VPlan] Use VPlan to check if only the first lane is used. This removes the remaining dependence on LoopVectorizationCostModel from buildScalarSteps and is required so it can be moved out of ILV. It also allows us to remove a few unneeded instructions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116554	2022-01-30 13:07:29 +00:00
Florian Hahn	efd4938723	[VPlan] Handle IV vector splat using VPWidenCanonicalIV. This patch tries to use an existing VPWidenCanonicalIVRecipe instead of creating another step-vector for canonical induction recipes in widenIntOrFpInduction. This has the following benefits: 1. First step to avoid setting both vector and scalar values for the same induction def. 2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan 3. Only need to splat the vector IV for block in masks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116123	2022-01-29 16:25:27 +00:00
Florian Hahn	96400f179f	[VPlan] Record whether scalar IVs are need in induction recipe. (NFC) This explicitly records whether a scalar IV is needed in the VPWidenIntOrFpInductionRecipe, to remove a dependence on the cost-model during its ::execute. It will also be used in D116123 to determine if a vector phi will be generated. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118167	2022-01-28 09:34:03 +00:00
Kerry McLaughlin	8082ab2fc3	[LoopVectorize] Support epilogue vectorisation of loops with reductions isCandidateForEpilogueVectorization will currently return false for loops which contain reductions. This patch removes this restriction and makes the following changes to support epilogue vectorisation with reductions: - `fixReduction`: If fixReduction is being called during vectorisation of the epilogue, the phi node it creates will need to additionally carry incoming values from the middle block of the main loop. - `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi created by fixReduction are updated after the vec.epilog.iter.check block is added. The phi is also moved to the preheader of the epilogue. - `processLoop`: The start value of any VPReductionPHIRecipes are updated before vectorising the epilogue loop. The getResumeInstr function added to the ILV will return the resume instruction associated with the recurrence descriptor. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D116928	2022-01-24 12:03:31 +00:00
Florian Hahn	5f2854f1da	[LV] Always create VPWidenCanonicalIVRecipe, optimize away later. This patch updates createBlockInMask to always generate VPWidenCanonicalIVRecipe and adds a transform to optimize it away later, if it is not needed. This is a step towards breaking up VPWidenIntOrFpInductionRecipe and explicitly distinguishing between vector phis and scalarizing. Split off from D116123. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D117140	2022-01-22 15:34:20 +00:00
Florian Hahn	c0cf209076	[VPlan] Add VPWidenIntOrFpInductionRecipe::isCanonical, use it (NFCI). This patch adds VPWidenIntOrFpInductionRecipe::isCanonical to check if an induction recipe is canonical. The code is also updated to use it instead of isCanonicalID. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D117551	2022-01-21 09:35:06 +00:00
Florian Hahn	070d1034da	[LV] Restore metadata to disable runtime unrolling for epilogue loop. After d4a8fc3a87a1 LV stopped adding metadata to disable runtime unrolling to the vectorized epilogue loop. This was missed because 278aa65cc495 removed the relevant test coverage. This patch fixes that by adding the relevant metadata after vector loop generation.	2022-01-16 13:14:16 +00:00
Florian Hahn	62739204d4	[LV] Move AddRuntimeUnrollDisableMetaData so it can be used earlier (NFC) Move up the definition of AddRuntimeUnrollDisableMetaData, so it can be re-used earlier in the file in a follow-up patch.	2022-01-16 10:30:24 +00:00
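The epilogue-VF rules described in eaee477eda and 2a44eaf20f reduce to a small lane-count comparison. The C++ sketch below is illustrative only: `VFSketch`, `getVScaleForTuningSketch`, `isNarrowerThanMainVF` and `pickEpilogueVF` are hypothetical names rather than LoopVectorize's API, and "widest legal candidate wins" is an assumed stand-in for the vectorizer's real profitability comparison.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for LLVM's ElementCount: a known-minimum lane
// count that is multiplied by the runtime vscale when Scalable is set.
struct VFSketch {
  unsigned KnownMin;
  bool Scalable;
};

// Wrapper in the spirit of eaee477eda: when vscale_range pins min == max,
// that is the vscale the code is compiled for, so tune for it; otherwise
// fall back to the target's tuning hint. Parameter names are assumptions.
unsigned getVScaleForTuningSketch(std::optional<unsigned> RangeMin,
                                  std::optional<unsigned> RangeMax,
                                  unsigned TargetTuningVScale) {
  if (RangeMin && RangeMax && *RangeMin == *RangeMax)
    return *RangeMin;
  return TargetTuningVScale;
}

// Estimated number of lanes a VF processes at runtime.
unsigned estimatedLanes(VFSketch V, unsigned VScaleForTuning) {
  assert(VScaleForTuning >= 1 && "vscale is at least 1");
  return V.Scalable ? V.KnownMin * VScaleForTuning : V.KnownMin;
}

// The epilogue must be known to process fewer lanes than the main loop.
bool isNarrowerThanMainVF(VFSketch Main, VFSketch Epi,
                          unsigned VScaleForTuning) {
  // Scalable main loop, fixed-width epilogue: use the runtime estimate.
  if (Main.Scalable && !Epi.Scalable)
    return Epi.KnownMin < estimatedLanes(Main, VScaleForTuning);
  // A scalable epilogue is never known to be narrower than a fixed main VF.
  if (Epi.Scalable && !Main.Scalable)
    return false;
  // Same kind on both sides: compare known-minimum lane counts.
  return Epi.KnownMin < Main.KnownMin;
}

// Among the legal candidates, keep the one with the most estimated lanes
// (a simplification of the real profitability-based choice).
std::optional<VFSketch> pickEpilogueVF(VFSketch Main,
                                       const std::vector<VFSketch> &Candidates,
                                       unsigned VScaleForTuning) {
  std::optional<VFSketch> Best;
  for (const VFSketch &C : Candidates) {
    if (!isNarrowerThanMainVF(Main, C, VScaleForTuning))
      continue;
    if (!Best || estimatedLanes(C, VScaleForTuning) >
                     estimatedLanes(*Best, VScaleForTuning))
      Best = C;
  }
  return Best;
}
```

With a tuning vscale of 8, a main VF of vscale x 1 is estimated at 8 lanes, so fixed-width VF=4 is the widest epilogue that qualifies, matching the example in eaee477eda; a main VF of vscale x 16 admits scalable epilogues up to vscale x 8, as in 2a44eaf20f.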
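The two early exits that ecb4171dcb and b4c6d1bb37 add to selectInterleaveCount can be modelled in a few lines. This is a hypothetical sketch: `LoopSketch`, `selectInterleaveCountSketch`, `powerOf2Floor` and the small-loop threshold of 20 are assumptions made for illustration, and the real function also weighs register pressure, reductions and runtime checks, all omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical inputs; none of these names are LLVM API.
struct LoopSketch {
  unsigned VF;              // vectorization factor; 1 means scalar
  bool RequiresPredication; // the loop body contains predicated blocks
  unsigned MaxIC;           // upper bound (>= 1), e.g. from register pressure
};

// Largest power of two that is <= X, for X >= 1.
unsigned powerOf2Floor(unsigned X) {
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// LoopCost == 0 is a sentinel meaning "not computed yet"; ComputeCost
// performs the cost computation on demand.
unsigned selectInterleaveCountSketch(const LoopSketch &L, unsigned LoopCost,
                                     const std::function<unsigned()> &ComputeCost,
                                     unsigned SmallLoopCost = 20) {
  // b4c6d1bb37: interleaving a scalar (VF=1) loop that needs predication
  // tends to produce worse code than leaving it to the unroller.
  if (L.VF == 1 && L.RequiresPredication)
    return 1;

  // ecb4171dcb: a genuine cost of 0 is indistinguishable from the sentinel,
  // so compute the cost and bail out if it is still 0 (the body is free).
  if (LoopCost == 0) {
    LoopCost = ComputeCost();
    if (LoopCost == 0)
      return 1;
  }

  // Small loops: interleave to amortize loop overhead. The division is
  // only safe because LoopCost is now known to be non-zero.
  if (LoopCost < SmallLoopCost)
    return std::min(powerOf2Floor(SmallLoopCost / LoopCost), L.MaxIC);
  return 1;
}
```

Keeping 0 as the "not computed" marker, as the commit describes, means the zero check has to happen after the on-demand computation rather than only at the call sites.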

1578 Commits