llvm-project

Author	SHA1	Message	Date
Malhar Jajoo	a36d269658	[VPlan] Avoid collecting scalars for SVE This patch ensures scalars (except for uniforms) are no longer collected (prior to LVP planning phase) for scalable vectorization. This is to avoid the chances of generating scalarized instructions later (during LVP execute phase) as they are not supported for scalable vectorization. Relevant test has also been added. Differential Revision: https://reviews.llvm.org/D121452	2022-03-16 16:33:34 +00:00
Florian Hahn	ca1b2fc9fb	[LV] Remove LoopVectorBody from InnerLoopVectorizer. (NFCI) Update places still referencing LoopVectorBody to use the vector loop to get the vector loop header. This is needed to move vector loop code-generation to VPlan completely, which in turn is needed to model pre-header & exit blocks in VPlan as well.	2022-03-15 08:22:31 +00:00
Florian Hahn	d621ae30e2	[LV] Remove dead Loop argument from emitMinimumVector... (NFC) The argument is not used, remove it.	2022-03-14 15:47:40 +00:00
Florian Hahn	3ee2d908a9	[LV] Remove dead Loop argument from emitSCEVChecks. (NFC) The argument is not used, remove it.	2022-03-14 13:00:03 +00:00
Florian Hahn	8896c36624	[LV] Do not set insert point in completeLoopSkeleton. (NFCI) The insertion point for the builder used during VPlan code generation is set during code generation. Setting the insert point here is dead code and can be removed.	2022-03-14 12:21:26 +00:00
Florian Hahn	95f76bff1c	[LV] Create & use VPScalarIVSteps for all scalar users. This patch is a follow-up to D115953. It updates optimizeInductions to also introduce new VPScalarIVStepsRecipes if an IV has both vector and scalar uses. It updates all uses that only need scalar values to use the newly created recipe for the scalar steps. This completes untangling of VPWidenIntOrFpInductionRecipe code-generation. Now the recipe only creates the widened vector values, as it says on the tin. The code to genereate IR has been moved directly to VPWidenIntOrFpInductionRecipe::execute. Note that the recipe has been updated to hold a reference to ScalarEvolution, which is needed to expand the step, until we can place the corresponding SCEV expansion in the pre-header. Depends on D120827. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D120828	2022-03-13 17:15:24 +00:00
serge-sans-paille	ed98c1b376	Cleanup includes: DebugInfo & CodeGen Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121332	2022-03-12 17:26:40 +01:00
Roman Lebedev	2f80ea7f4f	[NFC][LV] Use different braces in debug output The analysis passes output function name encapsulated in `'` braces, but LV uses `"`. Harmonizing this may help in creating an update script for the LV costmodel test checks. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D121105	2022-03-07 19:32:37 +03:00
Florian Hahn	8777cb66a8	[VPlan] Remove reliance on underlying instr for ScalarIVSteps (NFCI). Instead of relying on underlying instructions, this patch updates VPScalarIVStepsRecipe to only store the required type information. This removes access to unrelated information, as well as avoiding issues with the same underlying instruction being shared by multiple recipes. This change should only change the debug output and not cause any codegen changes, hence NFCI.	2022-03-02 16:23:19 +00:00
Florian Hahn	9e46866c0c	[LV] Remove dead EntryVal argument from buildScalarSteps (NFC). The EntryVal argument is not needed after recent refactoring. Remove it.	2022-03-02 14:59:22 +00:00
Florian Hahn	b3e8ace198	Recommit "[VPlan] Introduce recipe to build scalar steps." This reverts the revert commit ff93260bf6bddfbad1fa65c4d5184988885b900f. The underlying issue causing the PPC bot failures has been fixed in cbaac1473403 and a corresponding test case has been added in ad2cad1c521c. Original message: This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-28 14:12:20 +00:00
Florian Hahn	ff93260bf6	Revert "[VPlan] Introduce recipe to build scalar steps." This reverts commit 49b23f451cf713036c99573a35daed308d2ac894. This appears to break some PPC build bots. Revert while I investigate.	2022-02-27 17:51:19 +00:00
Florian Hahn	49b23f451c	[VPlan] Introduce recipe to build scalar steps. This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-27 17:32:41 +00:00
Florian Hahn	da740492b0	[VPlan] Remove dead header-phi recipes. This patch adds a new transform to remove dead recipes. For now, it only removes dead recipes in the header, to keep the number tests that require updating manageable. Future patches will extend this to remove dead recipes across the whole plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118051	2022-02-26 16:26:39 +00:00
Kerry McLaughlin	12fb133eba	[LoopVectorize] Support conditional in-loop vector reductions Extends getReductionOpChain to look through Phis which may be part of the reduction chain. adjustRecipesForReductions will now also create a CondOp for VPReductionRecipe if the block is predicated and not only if foldTailByMasking is true. Changes were required in tryToBlend to ensure that we don't attempt to convert the reduction Phi into a select by returning a VPBlendRecipe. The VPReductionRecipe will create a select between the Phi and the reduction. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117580	2022-02-22 12:04:35 +00:00
Florian Hahn	2cd22ce0d0	[LV] Pass start value directly to emitTransformedIndex (NFC).	2022-02-12 19:03:32 +00:00
Philip Reames	5ba115031d	[PSE] Remove assumption that top level predicate is union from public interface [NFC] Note that this doesn't actually cause the top level predicate to become a non-union just yet. The above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.	2022-02-10 16:14:52 -08:00
Simon Pilgrim	6af7c1371a	[LoopVectorize] getStepVector - reduce scope of local variable. NFC.	2022-02-10 20:44:25 +00:00
David Green	b55d4c2ad8	Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`" This reverts commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9 as we've received multiple reports of this significantly impacting performance, in ways that don't seem to just be target specific cost models going wrong. I would offer some reproducers, but the test changes here seem to be full of them! Reverting for now and hopefully we can remove the "hack" more carefully as we go.	2022-02-09 20:02:54 +00:00
Florian Hahn	8aa122081f	[LV] Pass step to emitTransformedIndex (NFC). Move out the induction step creation from emitTransformedIndex to the callers. In some places (e.g. widenIntOrFpInduction) the step is already created. Passing the step in ensures the steps are kept in sync.	2022-02-09 11:12:45 +00:00
Florian Hahn	c9e6678b56	[LV] Move buildScalarSteps out of ILV (NFC). This makes the function independent of shared state in ILV (ensures no new dependencies on things like the cost model are introduced) and allows for use directly in recipe's ::execute functions.	2022-02-08 21:18:40 +00:00
David Green	b4c6d1bb37	[LoopVectorizer] Don't perform interleaving of predicated scalar loops The vectorizer will choose at times to "vectorize" loops with a scalar factor (VF=1) with interleaving (IC > 1). This can occasionally produce better code than the unroller (notable for reductions where it can produce independent reduction chains that are combined after the loop). At times this is not very beneficial though, for example when runtime checks are needed or when the scalar code requires predication. This addresses the second point, preventing the vectorizer from interleaving when the scalar loop will require predication. This prevents it from making a bit of a mess, that is worse than the original and better left for the unroller to unroll if beneficial. It helps reverse some of the regressions from D118090. Differential Revision: https://reviews.llvm.org/D118566	2022-02-07 19:34:28 +00:00
Florian Hahn	5a72357697	[LV] Use IRBuilderBase in VPlan.h, remove IRBuilder.h include (NFC). By using IRBuilderBase instead of IRBuilder<> a forward declaration can be used instead of including IRBuilder.h	2022-02-07 17:46:16 +00:00
Roman Lebedev	77a0da926c	[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model. What it essentially does is prevents scalarized vectorization of masked memory operations: ``` // TODO: Cost model for emulated masked load/store is completely // broken. This hack guides the cost model to use an artificially // high enough value to practically disable vectorization with such // operations, except where previously deployed legality hack allowed // using very low cost values. This is to avoid regressions coming simply // from moving "masked load/store" check from legality to cost model. // Masked Load/Gather emulation was previously never allowed. // Limited number of Masked Store/Scatter emulation was allowed. ``` While i don't really understand about what specifically `is completely broken` was talking about, i believe that at least on X86 with AVX2-or-later, this is no longer true. (or at least, i would like to know what is still broken). So i would like to follow suit after D111460, and like wise disable that hack for AVX2+. But since this was added for X86 specifically, let's just instead completely remove this hack. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114779	2022-02-07 16:08:31 +03:00
Florian Hahn	541ca12dcd	[LV] Use VPReplicateRecipe::isUniform instead isUniformAfterVec (NFCI). In scalarizeInstruction(), isUniformAfterVectorization is used to detect cases where it is sufficient to always access the first lane. This should map directly checking whether the operand is a uniform replicate recipe. Differential Revision: https://reviews.llvm.org/D116654	2022-02-06 16:37:20 +00:00
Sander de Smalen	eaee477eda	[LV] Use VScaleForTuning to allow wider epilogue VFs. When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot be any smaller, the vectorizer should try to estimate how many lanes are executed at runtime and allow a suitable fixed-width VF to be chosen. It can use VScaleForTuning to figure out what a suitable fixed-width VF could be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8, it could still choose an epilogue VF upto VF=4. This was a bit tricky to test, so this patch also introduces a wrapper function to get 'VScaleForTuning' by also considering vscale_range. If min and max are equal, then that will be the vscale we compile for. It makes little sense to tune for a different width if the code will not be portable for other widths. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118709	2022-02-03 15:40:17 +00:00
Sander de Smalen	2a44eaf20f	[LV] Allow a scalable VF for the epilogue. For some reason we limited the epilogue VF to be fixed-width, but there is not necessarily a reason for doing so. If the main VF=vscale x 16, the epilogue VF could be either fixed-width, or a scalable VF upto vscale x 8. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118688	2022-02-01 22:38:55 +00:00
Florian Hahn	7fe4fa9a0a	[LV] Use onlyFirstLaneDemanded when widening pointer phis (NFCI). This removes another instance of recipe execution still relying on the cost model. Depends on D116554. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D116656	2022-02-01 09:50:47 +00:00
Florian Hahn	8f12175fed	[VPlan] Use VPlan to check if only the first lane is used. This removes the remaining dependence on LoopVectorizationCostModel from buildScalarSteps and is required so it can be moved out of ILV. It also improves allows us to remove a few unneeded instructions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116554	2022-01-30 13:07:29 +00:00
Florian Hahn	efd4938723	[VPlan] Handle IV vector splat using VPWidenCanonicalIV. This patch tries to use an existing VPWidenCanonicalIVRecipe instead of creating another step-vector for canonical induction recipes in widenIntOrFpInduction. This has the following benefits: 1. First step to avoid setting both vector and scalar values for the same induction def. 2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan 3. Only need to splat the vector IV for block in masks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116123	2022-01-29 16:25:27 +00:00
Florian Hahn	96400f179f	[VPlan] Record whether scalar IVs are need in induction recipe. (NFC) This explicitly records whether a scalar IV is needed in the VPWidenIntOrFpInductionRecipe, to remove a dependence on the cost-model during its ::execute. It will also be used in D116123 to determine if a vector phi will be generated. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118167	2022-01-28 09:34:03 +00:00
Kerry McLaughlin	8082ab2fc3	[LoopVectorize] Support epilogue vectorisation of loops with reductions isCandidateForEpilogueVectorization will currently return false for loops which contain reductions. This patch removes this restriction and makes the following changes to support epilogue vectorisation with reductions: - `fixReduction`: If fixReduction is being called during vectorisation of the epilogue, the phi node it creates will need to additionally carry incoming values from the middle block of the main loop. - `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi created by fixReduction are updated after the vec.epilog.iter.check block is added. The phi is also moved to the preheader of the epilogue. - `processLoop`: The start value of any VPReductionPHIRecipes are updated before vectorising the epilogue loop. The getResumeInstr function added to the ILV will return the resume instruction associated with the recurrence descriptor. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D116928	2022-01-24 12:03:31 +00:00
Florian Hahn	5f2854f1da	[LV] Always create VPWidenCanonicalIVRecipe, optimize away later. This patch updates createBlockInMask to always generate VPWidenCanonicalIVRecipe and adds a transform to optimize it away later, if it is not needed. This is a step towards breaking up VPWidenIntOrFpInductionRecipe and explicitly distinguishing between vector phis and scalarizing. Split off from D116123. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D117140	2022-01-22 15:34:20 +00:00
Florian Hahn	c0cf209076	[VPlan] Add VPWidenIntOrFpInductionRecipe::isCanonical, use it (NFCI). This patch adds VPWidenIntOrFpInductionRecipe::isCanonical to check if an induction recipe is canonical. The code is also updated to use it instead of isCanonicalID. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D117551	2022-01-21 09:35:06 +00:00
Florian Hahn	070d1034da	[LV] Restore metadata to disable runtime unrolling for epilogue loop. After d4a8fc3a87a1 LV stopped adding metadata to disable runtime unrolling to the vectorized epilogue loop. This was missed because 278aa65cc495 removed the relevant test coverage. This patch fixes that by adding the relevant metadata after vector loop generation.	2022-01-16 13:14:16 +00:00
Florian Hahn	62739204d4	[LV] Move AddRuntimeUnrollDisableMetaData so it can be used earlier (NFC) Move up the definition of AddRuntimeUnrollDisableMetaData, so it can be re-used earlier in the file in a follow-up patch.	2022-01-16 10:30:24 +00:00
Florian Hahn	42b34facfd	Recommit "[LV] Inline CreateSplatIV call for scalar VFs." This reverts the revert commit 073c27b5e5851f13d99d383e047309299b68827d. A reduced test case has been added in 5e4966cbae7ba5 and the code has been updated to handle the case where getInductionOpcode returns BinaryOpsEnd. In this case, the original code was always using Instruction::Add. Do the same in the patch. Note this commit may slightly change the value naming, because it now also assigns the 'induction' name in the floating point case.	2022-01-14 19:03:49 +00:00
James Y Knight	073c27b5e5	Revert "[LV] Inline CreateSplatIV call for scalar VFs (NFC)." Causes a crash with the following (creduce'd) test-case: clang -O3 '--target=aarch64-grtev4-linux-gnu' -xc - -c -o /dev/null <<EOF int e; int f; int g() { int h; int j = 0; while (&f - j > 0) { int k; k = j; if (e == j && *e) k = 5; h = k; j++; } return h; } EOF This reverts commit 7ce48be0fd83fb4fe3d0104f324bbbcfcc82983c.	2022-01-14 00:00:02 +00:00
Florian Hahn	3f2fb767e3	[VPlan] Make IV operand explicit for VPWidenCanonicalIVRecipe (NFC). This makes the def-use relationship between VPCanonicalIVPHIRecipe and VPWidenCanonicalIVRecipe explicit. Needed for D117140.	2022-01-13 11:13:05 +00:00
Florian Hahn	7ce48be0fd	[LV] Inline CreateSplatIV call for scalar VFs (NFC). This is a NFC change split off from D116123, as suggested there. D116123 will remove the last user of CreateSplatIV.	2022-01-13 09:34:31 +00:00
Florian Hahn	d4a8fc3a87	[VPlan] Introduce and use BranchOnCount VPInstruction. This patch adds a new BranchOnCount VPInstruction opcode with 2 operands. It first compares its 2 operands (increment of canonical induction and vector trip count), followed by a branch to either the exit block or back to the vector header. It must be the last recipe in the exit block of the topmost vector loop region. This extracts parts from D113224 and was discussed in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116479	2022-01-12 13:42:13 +00:00
Rosie Sumpter	552eb372cb	[LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter This is required to query the legality more precisely in the LoopVectorizer. This adds another TTI function named 'forceScalarizeMaskedGather/Scatter' function to work around the hack introduced for MVE, where isLegalMaskedGather/Scatter would return an answer by second-guessing where the function was called from, based on the Type passed in (vector vs scalar). The new interface makes this explicit. It is also used by X86 to check for vector widths where gather/scatters aren't profitable (or don't exist) for certain subtargets. Differential Revision: https://reviews.llvm.org/D115329	2022-01-12 13:34:12 +00:00
David Sherwood	e3c84fb948	[LoopVectorize] Add support for tail folding using scalable vectors This patch fixes up an issue with InnerLoopVectorizer::getOrCreateVectorTripCount whereby we weren't correctly generating the runtime trip count for scalable vectors when tail-folding. It also removes some asserts in the tail-folding path for cases when the VF is not scalable. In this patch I have only permitted tail-folding to be enabled explicitly for scalable vectors when the user has specified one of the following flags: -prefer-predicate-over-epilogue=predicate-dont-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue For now it's best not to enable tail-folding with scalable vectors for low trip counts or when optimising for code size, since there has been no analysis on whether this is worth it. Various tests have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding.ll Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll The tests cannot be target independent because they require masked load/store support, i.e. TTI.isLegalMaskedLoad and TTI.isLegalMaskedStore need to return true. Differential Revision: https://reviews.llvm.org/D113003	2022-01-10 10:55:40 +00:00
Florian Hahn	1ce01b7dfe	[SCEVExpander] Simplify cleanup, skip sorting by dominance. There is no need to sort inserted instructions by dominance, as the deletion loop still requires RAUW with undef before deleting. Removing instructions in reverse insertion order should still insure that the number of uselist updates is kept to a minimum.	2022-01-09 18:38:41 +00:00
Sander de Smalen	9cbe000df2	[LV] Load/store/reduction type must be sized, assert it. This addresses a suggestion by @nikic on D115356.	2022-01-06 12:35:27 +00:00
Florian Hahn	2ee8154816	[LV] Don't use getVPSingleValue for VPWidenMemoryInstRecipe (NFC). VPWidenMemoryInstructionRecipe is a VPValue, so this can be passed directly, instead of relying on getVPSingleValue.	2022-01-05 13:51:50 +00:00
Sander de Smalen	95a93722db	[LV] Remove what seems like stale code in collectElementTypesForWidening. This was originally added in rG22174f5d5af1eb15b376c6d49e7925cbb7cca6be although that patch doesn't really mention any reasons for ignoring the pointer type in this calculation if the memory access isn't consecutive. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D115356	2022-01-05 12:20:59 +00:00
Florian Hahn	65c4d6191f	[VPlan] Add VPCanonicalIVPHIRecipe, partly retire createInductionVariable. At the moment, the primary induction variable for the vector loop is created as part of the skeleton creation. This is tied to creating the vector loop latch outside of VPlan. This prevents from modeling the whole vector loop in VPlan, which in turn is required to model preheader and exit blocks in VPlan as well. This patch introduces a new recipe VPCanonicalIVPHIRecipe to represent the primary IV in VPlan and CanonicalIVIncrement{NUW} opcodes for VPInstruction to model the increment. This allows us to partly retire createInductionVariable. At the moment, a bit of patching up is done after executing all blocks in the plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113223	2022-01-05 10:46:06 +00:00
Rosie Sumpter	961f51fdf0	[LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions without loads/stores For loops that contain in-loop reductions but no loads or stores, large VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes has no element types to check through and so returns the default widths (-1U for the smallest and 8 for the widest). This results in the widest VF being chosen for the following example, float s = 0; for (int i = 0; i < N; ++i) s += (float) i*i; which, for more computationally intensive loops, leads to large loop sizes when the operations end up being scalarized. In this patch, for the case where ElementTypesInLoop is empty, the widest type is determined by finding the smallest type used by recurrences in the loop instead of falling back to a default value of 8 bits. This results in the cost model choosing a more sensible VF for loops like the one above. Differential Revision: https://reviews.llvm.org/D113973	2022-01-04 10:12:57 +00:00
Florian Hahn	791523bae6	[LV] Set loop metadata after VPlan execution (NFC). Setting the loop metadata for the vector loop after VPlan execution allows generating the full loop body during VPlan execution. This is in preparation for D113224.	2022-01-03 09:59:50 +00:00

1 2 3 4 5 ...

1564 Commits