llvm-project

Author	SHA1	Message	Date
Florian Hahn	2d0d65b3ba	[VPlan] Create edge masks all cases up front needed.(NFC) Similarly to how block masks are created up front and later only retrieved also make sure masks are created in cases where edge masks are needed, i.e. blend recipes. Creating block-in masks for all blocks in the loop also ensures edge masks for all relevant edges have been created. Later, the new getEdgeMask can be used to look up cached edge masks. This makes sure edge masks are available in all cases for https://github.com/llvm/llvm-project/pull/76090.	2024-01-28 21:20:18 +00:00
Florian Hahn	7c03d5d41d	[VPlan] Use unique_ptr to clean up duplicated plan.	2024-01-27 20:51:55 +00:00
Florian Hahn	ec402a2e53	[VPlan] Implement cloning of VPlans. (#73158 ) This patch implements cloning for VPlans and recipes. Cloning is used in the epilogue vectorization path, to clone the VPlan for the main vector loop. This means we won't re-use a VPlan when executing the VPlan for the epilogue vector loop, which in turn will enable us to perform optimizations based on UF & VF.	2024-01-27 13:30:52 +00:00
David Sherwood	962fbafecf	[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034 ) When we generate runtime memory checks for an inner loop it's possible that these checks are invariant in the outer loop and so will get hoisted out. In such cases, the effective cost of the checks should reduce to reflect the outer loop trip count. This fixes a 25% performance regression introduced by commit 49b0e6dcc296792b577ae8f0f674e61a0929b99d when building the SPEC2017 x264 benchmark with PGO, where we decided the inner loop trip count wasn't high enough to warrant the (incorrect) high cost of the runtime checks. Also, when runtime memory checks consist entirely of diff checks these are likely to be outer loop invariant.	2024-01-26 14:43:48 +00:00
Florian Hahn	0ab539fd67	[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113 ) Add a new recipe to model scalar cast instructions, without relying on an underlying instruction. This allows creating scalar casts, without relying on an underlying instruction (like the current VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it. Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as follow-up. PR: https://github.com/llvm/llvm-project/pull/78113	2024-01-26 11:13:05 +00:00
Florian Hahn	a04f615291	[LV] Check for innermost loop instead of EnableVPlanNativePath in CM. Replace EnableVPlanNativePath checks in the cost-model by assertions that the code is only called for innermost loops. This ensures that the cost model isn't used in the VPlanNativePath, which is only used for outer-loop vectorization. Even with EnableVPlanNativePath, inner loops are processed by the inner loop vectorization path, not the native path, so checking for EnableVPlanNativePath may impact decisions for inner loops and can cause crashes, like in the attached test case.	2024-01-25 12:49:52 +00:00
Florian Hahn	42fb1fac9e	[VPlan] Use DebugLoc from recipe in VPWidenCallRecipe (NFCI). Instead of using the debug location of the underlying instruction, use the debug location from the recipe. This removes an unneeded dependency of the underlying instruction.	2024-01-19 13:33:03 +00:00
Florian Hahn	abdb61f5fd	[VPlan] Introduce VPSingleDefRecipe. (#77023 ) This patch introduces a new common base class for recipes defining a single result VPValue. This has been discussed/mentioned at various previous reviews as potential follow-up and helps to replace various getVPSingleValue calls. PR: https://github.com/llvm/llvm-project/pull/77023	2024-01-19 10:27:53 +00:00
Paschalis Mpeis	37c87d5689	[LV][AArch64] LoopVectorizer allows scalable frem instructions (#76247 ) LoopVectorizer is aware when a target can replace a scalable frem instruction with a vector library call for a given VF and it returns the relevant cost. Otherwise, it returns an invalid cost (as previously). Add tests that check costs on AArch64, when there is no vector library available and when there is (with and without tail-folding). NOTE: Invoking CostModel directly (not through LV) would still return invalid costs.	2024-01-18 08:32:53 +00:00
Florian Hahn	e7671bc9d6	[LV] Fix indent for loop in adjustRecipesForReductions (NFC).	2024-01-16 15:28:46 +00:00
Nikita Popov	6c2fbc3a68	[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582 ) This abstracts over the common pattern of creating a gep with i8 element type.	2024-01-12 14:21:21 +01:00
Craig Topper	1c342571b8	[LV] Use value_or to simplify code. NFC (#77030 )	2024-01-10 12:40:26 -08:00
Florian Hahn	8b7bbedec7	[LV] Re-add early exit in VPRecipeBuilder::createBlockInMask. Re-add early exit that was accidentally dropped in 51afb10.	2024-01-10 15:02:14 +00:00
Florian Hahn	51afb10174	[LV] Create block in mask up-front if needed. (#76635 ) At the moment, block and edge masks are created on demand, which means that they are inserted at the point where they are demanded and then cached. It is possible that the mask for a block is looked up later at a point that's not dominated by the point where the mask has been inserted. To avoid this, create masks up front on entry to the corresponding basic block and leave it to VPlan simplification to remove unneeded masks. Note that we need to create masks for all blocks, if any of the blocks in the loop needs predication, as computing the mask of a block depends on the masks of its predecessor. Needed for #76090. https://github.com/llvm/llvm-project/pull/76635	2024-01-09 10:50:08 +00:00
Florian Hahn	18ec3304a9	[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe. As suggested as follow-up in https://github.com/llvm/llvm-project/pull/72164, manage inbounds via VPRecipeWithIRFlags. Note that we can now preserve inbounds in a few more cases.	2024-01-07 13:58:05 +00:00
Florian Hahn	241fe83704	[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253 ) This patch introduces a new ComputeReductionResult opcode to compute the final reduction result in the middle block. The code from fixReduction has been moved to ComputeReductionResult, after some earlier cleanup changes to model parts of fixReduction explicitly elsewhere as needed. The recipe may be broken down further in the future. Note that the phi nodes to merge the reduction result from the trip count check and the middle block, to be used as resume value for the scalar remainder loop are also generated based on ComputeReductionResult. Once we have a VPValue for the reduction result, this can also be modeled explicitly and moved out of the recipe.	2024-01-04 22:53:18 +00:00
Nilanjana Basu	cd28da390f	[LV] Change loops' interleave count computation (#73766 ) A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose the interleaving count (IC) between TC/VF and TC/(2*VF) (VF = vectorization factor), such that the remainder TC for the epilogue loop is minimal, preferring the larger IC when the remainder TC is the same for both. The initial tests for this change were submitted in PRs: https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.	2024-01-04 12:45:22 +05:30
Florian Hahn	6dda74cc51	[VPlan] Use createSelect in adjustRecipesForReductions (NFCI). Simplify the code and rename Result->NewExitingVPV as suggested by @ayalz in https://github.com/llvm/llvm-project/pull/70253.	2024-01-03 20:54:10 +00:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164 ) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Florian Hahn	516cc98aff	[LV] Fix typo in comment (NFC).	2023-12-28 21:20:10 +00:00
Florian Hahn	a5891fa4d2	[VPlan] Initial modeling of VF * UF as VPValue. (#74761 ) This patch starts initial modeling of VF * UF in VPlan. It introduces a dedicated VFxUF VPValue, which is then populated during VPlan::prepareToExecute. For now, the VF * UF applies only to the main vector loop region. Once we extend the scope of VPlan in the future, we may want to associate different VFxUFs with different vector loop regions (e.g. the epilogue vector loop). This allows explicitly parameterizing recipes that rely on the VF * UF, like the canonical induction increment. At the moment, this mainly helps to avoid generating some duplicated calls to vscale with scalable vectors. It should also allow using EVL as induction increments explicitly in D99750. Referring to VF * UF is also needed in other places that we plan to migrate to VPlan, like the minimum trip count check during skeleton creation. The first version creates the value for VF * UF directly in prepareToExecute to limit the scope of the patch. A follow-on patch will model VF * UF computation explicitly in VPlan using recipes. Moved from Phabricator (https://reviews.llvm.org/D157322)	2023-12-08 18:30:30 +00:00
Graham Hunter	d0d5ef8133	[LV] Add support for linear arguments for vector function variants (#73941 ) If we have vectorized variants of a function which take linear parameters, we should be able to vectorize assuming the strides match.	2023-12-08 10:24:05 +00:00
Alexey Bataev	056367bb19	[LV]Support dropping of nneg flag for zext widencast recipes. (#74112 ) The compiler crashes when an assertion triggers for a zext nneg instruction; the assertion checks that the instruction cannot produce poison. Changed the base class of the widencast recipe to handle dropping the nneg flag and avoid the compiler crash.	2023-12-05 09:17:23 -05:00
Florian Hahn	70535f5e60	[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version. This patch replaces the IR based truncateToMinimalBitwidths with a VPlan version. This has 3 benefits: 1) the VPlan-based version is simpler; we don't need to implement special codegen for each supported instruction type like the IR based one. 2) Removes a dependency on the cost-model after VPlan execution and 3) Removes a use of getVPValue that uses underlying values after VPlan execution (See removed FIXME). Depends on D149081. Depends on D149079. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149903	2023-12-02 16:12:38 +00:00
Florian Hahn	cbf7b52a65	[VPlan] Properly update reduction live-out after placing select. After inserting a select for the final value, update the VPlan def-use chains. At the moment, the incorrect live-out doesn't cause a mis-compile, as computing the final reduction value is not yet modeled in VPlan.	2023-12-02 15:22:09 +00:00
Graham Hunter	104b7c624e	[LV] Add support for uniform parameters on vectorized function variants (#72891 ) Parameters marked as uniform take a scalar value, assuming the value is invariant in the scalar loop.	2023-11-28 15:01:32 +00:00
Florian Hahn	906f598263	[VPlan] Remove dead IsEpilogueVec argument from prepareToExecute (NFC).	2023-11-23 16:59:50 +00:00
Matthias Braun	2404477219	LoopVectorize: Add better heuristic for vectorized epilogue skip test (#72589 ) This is a follow-up to PR #72450 correcting the branch_weights used for the test whether the vectorized epilogue loop should be skipped.	2023-11-20 11:02:27 -08:00
Graham Hunter	4d64a2bcd3	[LV] Refactor vector function variant selection to prepare for uniform args (#68879 ) Parameters marked as uniform take a scalar value, assuming the value is invariant in the scalar loop. In order to support this, we need to stop asking for a vector function variant with a default shape assuming that all arguments will become vector arguments, and instead consider all available variants and their parameter types.	2023-11-20 13:30:03 +00:00
Florian Hahn	7fd021a092	[LV] Don't crash on vector masks during scalar VPReductionRecipe::exec. VPReductionRecipe may be executed for scalar VFs. Make sure to access part 0 of the condition, as it could be an active-lane-mask, which is a vector <1 x i1>. Fixes https://github.com/llvm/llvm-project/issues/72720.	2023-11-18 21:52:22 +00:00
Florian Hahn	2a9aed1730	[LV] Retain mask-reversal comment as suggested after e5e71af. Address post-commit comment to retain comment.	2023-11-18 20:53:23 +00:00
Florian Hahn	e5e71affb7	[LV] Reverse mask up front, not when creating vector pointer. (#72163 ) Reverse mask early on when populating BlockInMask. This will enable separating mask management and address computation from the memory recipes in the future and is also needed to enable explicit unrolling in VPlan.	2023-11-17 13:59:35 +00:00
Nikita Popov	de176d8c54	[SCEV][LV] Invalidate LCSSA exit phis more thoroughly (#69909 ) This is an alternative to #69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts. Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the LCSSA phi node and defined in the loop. We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling. Fixes #69097. Fixes #66616. Fixes #63970.	2023-11-17 09:34:24 +01:00
Matthias Braun	a9cc6fc280	LoopVectorize: Set branch_weight for conditional branches (#72450 ) Consistently add `branch_weights` metadata in any condition branch created by `LoopVectorize.cpp`: - Will only add metadata if the original loop-latch branch had metadata assigned. - Most checks should rarely trigger so I am using a 127:1 ratio. - For the middle block we assume an equal distribution of modulo results.	2023-11-16 11:33:46 -08:00
Graham Hunter	b070629c10	[LV] Increase max VF if vectorized function variants exist (#66639 ) If there are function calls in the candidate loop and we have vectorized variants available, try some wider VFs in case the conservative initial maximum based on the widest types in the loop won't actually allow us to make use of those function variants.	2023-11-13 10:27:10 +00:00
Florian Hahn	34c2dcd5ac	[VPlan] Move initial skeleton construction to createInitialVPlan. (NFC) This patch moves creating the middle VPBBs and an initial empty vector loop region for the top-level loop to createInitialVPlan. This consolidates code to create the initial VPlan skeleton and enables adding other bits outside the main region during initial VPlan construction. In particular, D150398 will add the exit check & branch to the middle block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158333	2023-11-12 13:00:44 +00:00
Florian Hahn	ed6f4994d8	[VPlan] Handle conditional ordered reductions with scalar VFs. VPReductionRecipe::execute was not handling predicates for ordered reduction with scalar VFs, which was causing a crash. This patch adds dedicated handling for scalar VFs when dealing with the condition. The other operands are already handled in a similar fashion below. Fixes #70988.	2023-11-11 12:55:40 +00:00
Simon Pilgrim	3ca4fe80d4	[Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC. startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)	2023-11-06 16:50:18 +00:00
Florian Hahn	fd82b5b287	[LV] Support recipes without underlying instr in collectPoisonGenRec. Support recipes without underlying instruction in collectPoisonGeneratingRecipes by directly trying to dyn_cast_or_null the underlying value. Fixes https://github.com/llvm/llvm-project/issues/70590.	2023-11-03 10:21:14 +00:00
Igor Kirillov	70904226e1	[LoopVectorize] Enhance Vectorization decisions for predicate tail-folded loops with low trip counts (#69588 ) * Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known to be predicate tail-folded, delegating to `areRuntimeChecksProfitable` to decide on the profitability of vectorizing loops with runtime checks. * Update the `areRuntimeChecksProfitable` function to consider the `ScalarEpilogueLowering` setting when assessing vectorization of a loop. With this patch, we can make more informed decisions for loops with low trip counts, especially when leveraging Profile-Guided Optimization (PGO) data.	2023-10-30 13:43:26 +00:00
Florian Hahn	b0b88643a1	[VPlan] Add initial analysis to infer scalar type of VPValues. (#69013 ) This patch adds initial type inference for VPValues. It infers the scalar type of a VPValue, by bottom-up traversing through defining recipes until root nodes with known types are reached (e.g. live-ins or load recipes). The types are then propagated top down through operations. This is intended as building block for a VPlan-based cost model, which will need access to type information for VPValues/recipes. Initial testing is done by asserting the inferred type matches the type of the result value generated for widen and replicate recipes.	2023-10-27 14:38:28 +01:00
Florian Hahn	4f56d47d05	[VPlan] Make ExpandedSCEVs argument const (NFC). The argument is only used in const contexts. Simplifies a follow-up diff.	2023-10-22 12:31:55 +01:00
Florian Hahn	ca01f2af78	[LV] Enforce order of reductions with intermediate stores in VPlan (NFC) Reductions with intermediate stores currently need to be fixed in order of their intermediate stores. Instead of doing this at fixup time after code has been generated, sort the reductions in adjustRecipesForReductions. This makes the order explicit in VPlan and will enable removing fixReductions with modeling computing the final reduction result in VPlan, followed by also modeling the intermediate stores explicitly.	2023-10-21 21:26:52 +01:00
Lou Knauer	852bac4439	[VPlan] Support scalable vectors in outer-loop vectorization This patch enables scalable vectors in the VPlan-native path. If a vectorization factor is specified via loop vectorization hints, that factor is used. If no vectorization factor is specified, but the target prefers scalable vectorization, a scalable vectorization factor is selected. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D157484	2023-10-20 23:17:35 +01:00
Florian Hahn	2ec7bba77b	Recommit "[VPlan] Insert Trunc/Exts for reductions directly in VPlan." This reverts commit e4ea0997486000b460c4875a00301b73b3c0d6a7. The recommit fixes a reported crash by adding a missing check to make sure the cast recipes are only introduced when vectorizing. Test coverage added in 3cac608fbd0811b2f5c59c6e13148162ccd8543e. Original commit message: Update the code to create Trunc/Ext recipes directly in adjustRecipesForReductions instead of fixing it up later in fixReductions. This explicitly models the required conversions and also makes sure they are generated at the right place (instead of after the exit condition), hence the changes in a few tests.	2023-10-20 14:30:04 +01:00
Fangrui Song	e4ea099748	Revert "[VPlan] Insert Trunc/Exts for reductions directly in VPlan." This reverts commit fd311126349b8fe1684d62154a9fa5a7bbb0b713. There are two different crash reports on `fd31112634`	2023-10-18 23:25:31 -07:00
Florian Hahn	fd31112634	[VPlan] Insert Trunc/Exts for reductions directly in VPlan. Update the code to create Trunc/Ext recipes directly in adjustRecipesForReductions instead of fixing it up later in fixReductions. This explicitly models the required conversions and also makes sure they are generated at the right place (instead of after the exit condition), hence the changes in a few tests.	2023-10-17 19:17:40 +01:00
Yingwei Zheng	4718b4011f	[LV] Invalidate disposition of SCEV values after loop vectorization (#69230 ) This PR fixes the assertion failure of `SE.verify()` after loop vectorization.	2023-10-17 03:49:39 +08:00
Florian Hahn	b1115f8cce	[LV] Use LatchVPBB directly instead of going through region (NFC). Split off from D158333.	2023-10-13 20:08:31 +01:00
Rin	df8e0d057d	[AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF (#67697 ) This patch is based off of https://github.com/llvm/llvm-project/pull/67543. We are currently using the exact trip count to make decisions regarding the maximum VF. We can instead use the upper bound TC, which will be the same as the constant trip count when that is known.	2023-10-09 16:26:19 +01:00


2013 Commits