llvm-project

Author	SHA1	Message	Date
Florian Hahn	95fef1dfef	[LV] Improve AnyOf reduction codegen. (#78304) Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block. The patch incorporates feedback from https://reviews.llvm.org/D153697. This fixes #62565, as now there aren't multiple uses of the start/new values. Fixes https://github.com/llvm/llvm-project/issues/62565 PR: https://github.com/llvm/llvm-project/pull/78304	2024-03-14 11:22:06 +00:00
Jeremy Morse	2fe81edef6	[NFC][RemoveDIs] Insert instruction using iterators in Transforms/ As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. Noteworthy changes: * FindInsertedValue now takes an optional iterator rather than an instruction pointer, as we need to always insert with iterators, * I've added a few iterator-taking versions of some value-tracking and DomTree methods -- they just unwrap the iterator. These are purely convenience methods to avoid extra syntax in some passes. * A few calls to getNextNode become std::next instead (to keep in the theme of using iterators for positions), * SeparateConstOffsetFromGEP has its insertion-position field changed. Noteworthy because it's not a purely localised spelling change. All this should be NFC.	2024-03-05 15:12:22 +00:00
Florian Hahn	4d525f2b9a	[VPlan] Remove unneeded InsertPointGuard (NFCI). getBlockInMask now simply returns an already computed mask, hence there's no need to adjust the builder insert point.	2024-02-29 12:37:13 +00:00
Nilanjana Basu	1c211bc76e	[LV] Remove unused configuration option (#82955) A recent set of changes (PR #67725) to the loop interleaving algorithm removed the loop trip count threshold for allowing interleaving. Therefore, the configuration option interleave-small-loop-scalar-reduction is no longer needed.	2024-02-28 10:17:25 -08:00
Florian Hahn	3fac0562f8	[VPlan] Reset trip count when replacing ExpandSCEV recipe. Otherwise accessing the trip count may access freed memory. Fixes https://lab.llvm.org/buildbot/#/builders/74/builds/26239 and others.	2024-02-28 16:31:49 +00:00
Alexey Bataev	80cff27390	[LV][NFC]Fix a misprint, NFC.	2024-02-28 07:56:31 -08:00
Alexey Bataev	fdd60c7b96	[LV][NFC]Preselect folding style while choosing the max VF, NFC. Selects the tail-folding style while choosing the max vector factor and stores it in the data member rather than calculating it on each getTailFoldingStyle call. Part of https://github.com/llvm/llvm-project/pull/76172 Reviewers: ayalz, fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/81885	2024-02-28 10:15:52 -05:00
Florian Hahn	15d9d0fa8f	[VPlan] Also print final VPlan directly before codegen/execute. (#82269) Some optimizations are applied after UF and VF have been chosen. This patch adds an extra print of the final VPlan just before codegen/execution. In the future, there will be additional transforms that are applied later (interleaving for example). PR: https://github.com/llvm/llvm-project/pull/82269	2024-02-28 13:19:43 +00:00
Florian Hahn	911055e34f	[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271) At the moment, some VPInstructions create only a single scalar value, but use VPTransformState's 'vector' storage for this value. Those values are effectively uniform-per-VF (or in some cases uniform-across-VF-and-UF). Using the vector/per-part storage doesn't interact well with other recipes, which more accurately use (Part, Lane) to look up scalar values, and prevents VPInstructions creating scalars from interacting with other recipes working with scalars. This PR tries to unify handling of scalars by using (Part, 0) for scalar values where only the first lane is demanded. This allows using VPInstructions with other recipes like VPScalarCastRecipe and is also needed when using VPInstructions in more cases outside the vector loop region to generate scalars. Depends on https://github.com/llvm/llvm-project/pull/80269	2024-02-26 19:06:43 +00:00
Florian Hahn	9923d29cfa	[VPlan] Merge main VPlan verifier with HCFG verifier. Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for various properties on blocks to the verifier used for VPlans generated by the inner loop vectorizer. It also adds def-use checks for the verifier used in the VPlan native path. This drops the separate flag to enable HCFG verification. Instead, all VPlans are verified once they have been created, if assertions are enabled. This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to model any phi node in the native path.	2024-02-20 16:43:57 +00:00
Florian Hahn	35ee6de966	[VPlan] Simplify addCanonicalIVRecipes by using VPBuilder (NFC). Use VPBuilder to construct VPInstructions, which means there's no need to manually insert recipes.	2024-02-17 18:30:01 +00:00
David Sherwood	1c10821022	[LoopVectorize] Fix divide-by-zero bug (#80836 ) (#81721 ) When attempting to use the estimated trip count to refine the costs of the runtime memory checks we should also check for sane trip counts to prevent divide-by-zero faults on some platforms. Fixes #80836	2024-02-14 16:07:51 +00:00
Florian Hahn	debca7ee43	[VPlan] Move dropping of poison flags to VPlanTransforms. (NFC) Move collectPoisonGeneratingFlags from InnerLoopVectorizer to VPlanTransforms and also update its name. collectPoisonGeneratingFlags already directly drops poison-generating flags, not only collecting them. This means it is more appropriate to integrate it directly into the VPlan transform pipeline. The current implementation still calls back to legal to check if a block needs predication, which should be improved in the future.	2024-02-14 12:28:58 +00:00
Nilanjana Basu	c1c5b854ad	[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725 ) A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.	2024-02-05 17:23:58 -08:00
Florian Hahn	2906f3626b	[VPlan] Update ::onlyScalarsGenerated to take IsScalable bool (NFCI). Instead of passing in a full VF, just pass IsScalable as bool.	2024-02-03 14:51:14 +00:00
Nilanjana Basu	c492eb6b28	[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651 ) Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.	2024-01-29 13:41:15 -08:00
Florian Hahn	743946e8ef	[VPlan] Replace VPRecipeOrVPValue with VP2VP recipe simplification. (#76090 ) Move simplification of VPBlendRecipes from early VPlan construction to VPlan-to-VPlan based recipe simplification. This simplifies initial construction. Note that some in-loop reduction tests are failing at the moment, due to the reduction predicate being created after the reduction recipe. I will provide a patch for that soon. PR: https://github.com/llvm/llvm-project/pull/76090	2024-01-29 09:52:05 +00:00
Florian Hahn	2d0d65b3ba	[VPlan] Create edge masks up front in all cases needed. (NFC) Similarly to how block masks are created up front and later only retrieved, also make sure masks are created up front in cases where edge masks are needed, i.e. blend recipes. Creating block-in masks for all blocks in the loop also ensures edge masks for all relevant edges have been created. Later, the new getEdgeMask can be used to look up cached edge masks. This makes sure edge masks are available in all cases for https://github.com/llvm/llvm-project/pull/76090.	2024-01-28 21:20:18 +00:00
Florian Hahn	7c03d5d41d	[VPlan] Use unique_ptr to clean up duplicated plan.	2024-01-27 20:51:55 +00:00
Florian Hahn	ec402a2e53	[VPlan] Implement cloning of VPlans. (#73158 ) This patch implements cloning for VPlans and recipes. Cloning is used in the epilogue vectorization path, to clone the VPlan for the main vector loop. This means we won't re-use a VPlan when executing the VPlan for the epilogue vector loop, which in turn will enable us to perform optimizations based on UF & VF.	2024-01-27 13:30:52 +00:00
David Sherwood	962fbafecf	[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034 ) When we generate runtime memory checks for an inner loop it's possible that these checks are invariant in the outer loop and so will get hoisted out. In such cases, the effective cost of the checks should reduce to reflect the outer loop trip count. This fixes a 25% performance regression introduced by commit 49b0e6dcc296792b577ae8f0f674e61a0929b99d when building the SPEC2017 x264 benchmark with PGO, where we decided the inner loop trip count wasn't high enough to warrant the (incorrect) high cost of the runtime checks. Also, when runtime memory checks consist entirely of diff checks these are likely to be outer loop invariant.	2024-01-26 14:43:48 +00:00
Florian Hahn	0ab539fd67	[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113 ) Add a new recipe to model scalar cast instructions, without relying on an underlying instruction. This allows creating scalar casts, without relying on an underlying instruction (like the current VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it. Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as follow-up. PR: https://github.com/llvm/llvm-project/pull/78113	2024-01-26 11:13:05 +00:00
Florian Hahn	a04f615291	[LV] Check for innermost loop instead of EnableVPlanNativePath in CM. Replace EnableVPlanNativePath checks in the cost-model by assertions that the code is only called for innermost loops. This ensures that the cost model isn't used in the VPlanNativePath, which is only used for outer-loop vectorization. Even with EnableVPlanNativePath, inner loops are processed by the inner loop vectorization path, not the native path, so checking for EnableVPlanNativePath may impact decisions for inner loops and can cause crashes, like in the attached test case.	2024-01-25 12:49:52 +00:00
Florian Hahn	42fb1fac9e	[VPlan] Use DebugLoc from recipe in VPWidenCallRecipe (NFCI). Instead of using the debug location of the underlying instruction, use the debug location from the recipe. This removes an unneeded dependency of the underlying instruction.	2024-01-19 13:33:03 +00:00
Florian Hahn	abdb61f5fd	[VPlan] Introduce VPSingleDefRecipe. (#77023 ) This patch introduces a new common base class for recipes defining a single result VPValue. This has been discussed/mentioned at various previous reviews as potential follow-up and helps to replace various getVPSingleValue calls. PR: https://github.com/llvm/llvm-project/pull/77023	2024-01-19 10:27:53 +00:00
Paschalis Mpeis	37c87d5689	[LV][AArch64] LoopVectorizer allows scalable frem instructions (#76247) LoopVectorizer is aware when a target can replace a scalable frem instruction with a vector library call for a given VF and it returns the relevant cost. Otherwise, it returns an invalid cost (as previously). Add tests that check costs on AArch64, when there is no vector library available and when there is (with and without tail-folding). NOTE: Invoking CostModel directly (not through LV) would still return invalid costs.	2024-01-18 08:32:53 +00:00
Florian Hahn	e7671bc9d6	[LV] Fix indent for loop in adjustRecipesForReductions (NFC).	2024-01-16 15:28:46 +00:00
Nikita Popov	6c2fbc3a68	[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582 ) This abstracts over the common pattern of creating a gep with i8 element type.	2024-01-12 14:21:21 +01:00
Craig Topper	1c342571b8	[LV] Use value_or to simplify code. NFC (#77030 )	2024-01-10 12:40:26 -08:00
Florian Hahn	8b7bbedec7	[LV] Re-add early exit in VPRecipeBuilder::createBlockInMask. Re-add early exit that was accidentally dropped in 51afb10.	2024-01-10 15:02:14 +00:00
Florian Hahn	51afb10174	[LV] Create block in mask up-front if needed. (#76635 ) At the moment, block and edge masks are created on demand, which means that they are inserted at the point where they are demanded and then cached. It is possible that the mask for a block is looked up later at a point that's not dominated by the point where the mask has been inserted. To avoid this, create masks up front on entry to the corresponding basic block and leave it to VPlan simplification to remove unneeded masks. Note that we need to create masks for all blocks, if any of the blocks in the loop needs predication, as computing the mask of a block depends on the masks of its predecessor. Needed for #76090. https://github.com/llvm/llvm-project/pull/76635	2024-01-09 10:50:08 +00:00
Florian Hahn	18ec3304a9	[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe. As suggested as follow-up in https://github.com/llvm/llvm-project/pull/72164, manage inbounds via VPRecipeWithIRFlags. Note that we can now preserve inbounds in a few more cases.	2024-01-07 13:58:05 +00:00
Florian Hahn	241fe83704	[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253 ) This patch introduces a new ComputeReductionResult opcode to compute the final reduction result in the middle block. The code from fixReduction has been moved to ComputeReductionResult, after some earlier cleanup changes to model parts of fixReduction explicitly elsewhere as needed. The recipe may be broken down further in the future. Note that the phi nodes to merge the reduction result from the trip count check and the middle block, to be used as resume value for the scalar remainder loop are also generated based on ComputeReductionResult. Once we have a VPValue for the reduction result, this can also be modeled explicitly and moved out of the recipe.	2024-01-04 22:53:18 +00:00
Nilanjana Basu	cd28da390f	[LV] Change loops' interleave count computation (#73766) A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose the interleaving count (IC) between TC/VF & TC/(2*VF) (VF = vectorization factor), such that the remainder TC for the epilogue loop is minimal, preferring the larger IC when the remainder TC is the same for both. The initial tests for this change were submitted in PRs: https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.	2024-01-04 12:45:22 +05:30
Florian Hahn	6dda74cc51	[VPlan] Use createSelect in adjustRecipesForReductions (NFCI). Simplify the code and rename Result->NewExitingVPV as suggested by @ayalz in https://github.com/llvm/llvm-project/pull/70253.	2024-01-03 20:54:10 +00:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes in the future and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Florian Hahn	516cc98aff	[LV] Fix typo in comment (NFC).	2023-12-28 21:20:10 +00:00
Florian Hahn	a5891fa4d2	[VPlan] Initial modeling of VF * UF as VPValue. (#74761 ) This patch starts initial modeling of VF * UF in VPlan. Initially, introduce a dedicated VFxUF VPValue, which is then populated during VPlan::prepareToExecute. Initially, the VF * UF applies only to the main vector loop region. Once we extend the scope of VPlan in the future, we may want to associate different VFxUFs with different vector loop regions (e.g. the epilogue vector loop) This allows explicitly parameterizing recipes that rely on the VF * UF, like the canonical induction increment. At the moment, this mainly helps to avoid generating some duplicated calls to vscale with scalable vectors. It should also allow using EVL as induction increments explicitly in D99750. Referring to VF * UF is also needed in other places that we plan to migrate to VPlan, like the minimum trip count check during skeleton creation. The first version creates the value for VF * UF directly in prepareToExecute to limit the scope of the patch. A follow-on patch will model VF * UF computation explicitly in VPlan using recipes. Moved from Phabricator (https://reviews.llvm.org/D157322)	2023-12-08 18:30:30 +00:00
Graham Hunter	d0d5ef8133	[LV] Add support for linear arguments for vector function variants (#73941 ) If we have vectorized variants of a function which take linear parameters, we should be able to vectorize assuming the strides match.	2023-12-08 10:24:05 +00:00
Alexey Bataev	056367bb19	[LV]Support dropping of nneg flag for zext widencast recipes. (#74112) The compiler crashes when an assertion, which checks that the instruction cannot produce poison, is triggered for a zext nneg instruction. Changed the base class for the widencast recipe to handle dropping the nneg flag and avoid the compiler crash.	2023-12-05 09:17:23 -05:00
Florian Hahn	70535f5e60	[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version. This patch replaces the IR based truncateToMinimalBitwidths with a VPlan version. This has 3 benefits: 1) the VPlan-based version is simpler; we don't need to implement special codegen for each supported instruction type like the IR based one. 2) Removes a dependency on the cost-model after VPlan execution and 3) Removes a use of getVPValue that uses underlying values after VPlan execution (See removed FIXME). Depends on D149081. Depends on D149079. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149903	2023-12-02 16:12:38 +00:00
Florian Hahn	cbf7b52a65	[VPlan] Properly update reduction live-out after placing select. After inserting a select for the final value, update the VPlan def-use chains. At the moment, the incorrect live-out doesn't cause a mis-compile, as computing the final reduction value is not yet modeled in VPlan.	2023-12-02 15:22:09 +00:00
Graham Hunter	104b7c624e	[LV] Add support for uniform parameters on vectorized function variants (#72891 ) Parameters marked as uniform take a scalar value, assuming the value is invariant in the scalar loop.	2023-11-28 15:01:32 +00:00
Florian Hahn	906f598263	[VPlan] Remove dead IsEpilogueVec argument from prepareToExecute (NFC).	2023-11-23 16:59:50 +00:00
Matthias Braun	2404477219	LoopVectorize: Add better heuristic for vectorized epilogue skip test (#72589 ) This is a follow-up to PR #72450 correcting the branch_weights used for the test whether the vectorized epilogue loop should be skipped.	2023-11-20 11:02:27 -08:00
Graham Hunter	4d64a2bcd3	[LV] Refactor vector function variant selection to prepare for uniform args (#68879 ) Parameters marked as uniform take a scalar value, assuming the value is invariant in the scalar loop. In order to support this, we need to stop asking for a vector function variant with a default shape assuming that all arguments will become vector arguments, and instead consider all available variants and their parameter types.	2023-11-20 13:30:03 +00:00
Florian Hahn	7fd021a092	[LV] Don't crash on vector masks during scalar VPReductionRecipe::exec. VPReductionRecipe may be executed for scalar VFs. Make sure to access part 0 of the condition, as it could be an active-lane-mask, which is a vector <1 x i1>. Fixes https://github.com/llvm/llvm-project/issues/72720.	2023-11-18 21:52:22 +00:00
Florian Hahn	2a9aed1730	[LV] Retain mask-reversal comment as suggested after e5e71af. Address post-commit comment to retain comment.	2023-11-18 20:53:23 +00:00
Florian Hahn	e5e71affb7	[LV] Reverse mask up front, not when creating vector pointer. (#72163 ) Reverse mask early on when populating BlockInMask. This will enable separating mask management and address computation from the memory recipes in the future and is also needed to enable explicit unrolling in VPlan.	2023-11-17 13:59:35 +00:00
Nikita Popov	de176d8c54	[SCEV][LV] Invalidate LCSSA exit phis more thoroughly (#69909) This is an alternative to #69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts. Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the LCSSA phi node and defined in the loop. We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling. Fixes #69097. Fixes #66616. Fixes #63970.	2023-11-17 09:34:24 +01:00
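
The interleave count selection described in cd28da390f and c492eb6b28 can be illustrated with a small standalone sketch. This is not the code from LoopVectorize.cpp: the function name pickInterleaveCount, its parameters, and the power-of-two rounding are assumptions made for illustration only. It compares the two candidate counts derived from TC/VF and TC/(2*VF), keeps the one that leaves the smaller scalar remainder, and prefers the larger count on a tie. When a scalar epilogue iteration is required, one iteration is subtracted from the available trip count first.

```cpp
#include <algorithm>
#include <cassert>

// Largest power of two that is <= x. Assumes x >= 1.
static unsigned floorPow2(unsigned x) {
  unsigned p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

// Hypothetical helper (not an LLVM API): choose an interleave count for a
// loop with a known trip count, following the rule in the commit messages.
//   tripCount              known trip count of the scalar loop (> 0)
//   vf                     vectorization factor (> 0)
//   maxIC                  assumed target limit on the interleave count (> 0)
//   requiresScalarEpilogue true if at least one iteration must stay scalar
unsigned pickInterleaveCount(unsigned tripCount, unsigned vf, unsigned maxIC,
                             bool requiresScalarEpilogue) {
  assert(tripCount > 0 && vf > 0 && maxIC > 0);

  // One iteration is reserved for the epilogue when it must run.
  unsigned availableTC = requiresScalarEpilogue ? tripCount - 1 : tripCount;

  // Candidate counts: the vector loop runs at least once (upper) or at
  // least twice (lower), both clamped to [1, maxIC].
  unsigned upper = floorPow2(std::max(1u, std::min(availableTC / vf, maxIC)));
  unsigned lower =
      floorPow2(std::max(1u, std::min(availableTC / (2 * vf), maxIC)));

  // Iterations left over for the scalar remainder under each candidate.
  unsigned tailUpper = availableTC % (vf * upper);
  unsigned tailLower = availableTC % (vf * lower);

  // Smaller remainder wins; on a tie, take the larger interleave count.
  return tailUpper <= tailLower ? upper : lower;
}
```

With a trip count of 24, VF 4 and a limit of 8, the candidates are 4 and 2. A count of 4 leaves 8 scalar iterations while a count of 2 leaves none, so the sketch returns 2. With a trip count of 32 both candidates leave no remainder, so the larger count 8 is chosen.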

2030 Commits
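
As an illustration of the AnyOf reduction scheme described in 95fef1dfef, the following is a scalar C++ model, not vectorizer output. The function name anyOfReduce, the fixed width VF = 4, and the "greater than threshold" condition are assumptions chosen for the example. Inside the loop only a boolean per lane is updated. The choice between the start value and the new value happens once, after the loop, which corresponds to the select in the middle block.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumed vector width for this model.
constexpr std::size_t VF = 4;

// Scalar model of the loop
//   result = startValue;
//   for (x : data) if (x > threshold) result = newValue;
// Hypothetical helper for illustration; not part of LLVM.
int anyOfReduce(const std::vector<int> &data, int threshold, int startValue,
                int newValue) {
  // Boolean "vector" carried through the loop, one flag per lane.
  std::array<bool, VF> anyOf{};

  // "Vector loop": process VF elements per iteration, updating flags only.
  std::size_t i = 0;
  for (; i + VF <= data.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      anyOf[lane] = anyOf[lane] || (data[i + lane] > threshold);

  // "Middle block": OR-reduce the lanes, then select a value exactly once.
  bool any = false;
  for (bool flag : anyOf)
    any = any || flag;
  int result = any ? newValue : startValue;

  // Scalar remainder loop for the leftover elements.
  for (; i < data.size(); ++i)
    if (data[i] > threshold)
      result = newValue;

  return result;
}
```

Because the start and new values are read only after the loop, neither has a use inside the vector loop body in this model, which mirrors the property the commit message relies on.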