llvm-project

Author	SHA1	Message	Date
Florian Hahn	dcef154b5c	[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506 ) Building on top of https://github.com/llvm/llvm-project/pull/114305, replace VPRegionBlocks with explicit CFG before executing. This brings the final VPlan closer to the IR that is generated and helps to simplify codegen. It will also enable further simplifications of phi handling during execution and transformations that do not have to preserve the canonical IV required by loop regions. This for example could include replacing the canonical IV with an EVL based phi while completely removing the original canonical IV. PR: https://github.com/llvm/llvm-project/pull/117506	2025-05-24 19:17:16 +01:00
Florian Hahn	95ba5508e5	Reapply "[VPlan] Move predication to VPlanTransform (NFC). (#128420 )" This reverts commit 793bb6b257fa4d9f4af169a4366cab3da01f2e1f. The recommitted version contains a fix to make sure only the original phis are processed in convertPhisToBlends by collecting them in a vector first. This fixes a crash when no mask is needed, because there is only a single incoming value. Original message: This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform. It mostly ports the existing logic directly. There are a number of follow-ups planned in the near future to further improve on the implementation: * Edge and block masks are cached in VPPredicator, but the block masks are still made available to VPRecipeBuilder, so they can be accessed during recipe construction. As a follow-up, this should be replaced by adding mask operands to all VPInstructions that need them and use that during recipe construction. * The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands. PR: https://github.com/llvm/llvm-project/pull/128420	2025-05-22 08:16:15 +01:00
Florian Hahn	793bb6b257	Revert "[VPlan] Move predication to VPlanTransform (NFC). (#128420 )" This reverts commit b263c08e1a0b54a871915930aa9a1a6ba205b099. Looks like this triggers a crash in one of the Fortran tests. Reverting while I investigate https://lab.llvm.org/buildbot/#/builders/41/builds/6825	2025-05-21 19:24:21 +01:00
Florian Hahn	b263c08e1a	[VPlan] Move predication to VPlanTransform (NFC). (#128420 ) This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform. It mostly ports the existing logic directly. There are a number of follow-ups planned in the near future to further improve on the implementation: * Edge and block masks are cached in VPPredicator, but the block masks are still made available to VPRecipeBuilder, so they can be accessed during recipe construction. As a follow-up, this should be replaced by adding mask operands to all VPInstructions that need them and use that during recipe construction. * The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands. PR: https://github.com/llvm/llvm-project/pull/128420	2025-05-21 15:47:33 +01:00
Elvis Wang	664c937b43	[VPlan] Implement VPExtendedReduction, VPMulAccumulateReductionRecipe and corresponding vplan transformations. (#137746 ) This patch introduces two new recipes. * VPExtendedReductionRecipe - cast + reduction. * VPMulAccumulateReductionRecipe - (cast) + mul + reduction. This patch also implements the transformation that matches the following patterns via VPlan and converts them to abstract recipes for better cost estimation. * VPExtendedReduction - reduce(cast(...)) * VPMulAccumulateReductionRecipe - reduce.add(mul(...)) - reduce.add(mul(ext(...), ext(...))) - reduce.add(ext(mul(ext(...), ext(...)))) The converted abstract recipes will be lowered to the concrete recipes (widen-cast + widen-mul + reduction) just before recipe execution. Note that this patch still relies on the legacy cost model to calculate the cost for these patterns. Will enable vplan-based cost decision in #113903. Split from #113903.	2025-05-16 10:25:38 +08:00
Florian Hahn	2f55123cbb	[VPlan] Handle early exit before forming regions. (NFC) (#138393 ) Move early-exit handling up front to original VPlan construction, before introducing early exits. This builds on https://github.com/llvm/llvm-project/pull/137709, which adds exiting edges to the original VPlan, instead of adding exit blocks later. This retains the exit conditions early, and means we can handle early exits before forming regions, without the reliance on VPRecipeBuilder. Once we retain all exits initially, handling early exits before region construction ensures the regions are valid; otherwise we would leave edges exiting the region from elsewhere than the latch. Removing the reliance on VPRecipeBuilder removes the dependence on mapping IR BBs to VPBBs and unblocks predication as VPlan transform: https://github.com/llvm/llvm-project/pull/128420. Depends on https://github.com/llvm/llvm-project/pull/137709 (included in PR). PR: https://github.com/llvm/llvm-project/pull/138393	2025-05-12 12:53:20 +01:00
Florian Hahn	cfde685e22	[VPlan] Sink VPB2IRBB lookups to VPRecipeBuilder (NFC). This allows migrating some more code to be based on VPBBs in VPRecipeBuilder, in preparation for https://github.com/llvm/llvm-project/pull/128420.	2025-05-10 22:00:58 +01:00
Luke Lau	1484f82cbc	[VPlan] Add VPInstruction::StepVector and use it in VPWidenIntOrFpInductionRecipe (#129508 ) Split off from #118638, this adds VPInstruction::StepVector, which generates integer step vectors (0,1,2,...,VF). This is a step towards eventually modelling all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan. This is then used by VPWidenIntOrFpInductionRecipe, where we materialize it just before unrolling so the operands stay in a fixed position. The need for a separate operand in VPWidenIntOrFpInductionRecipe, as well as the need to update it in optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638 when everything is expanded in convertToConcreteRecipes.	2025-05-08 18:47:44 +08:00
Florian Hahn	edb690dc5b	Reapply "[VPlan] Add canonical IV during construction (NFC)." This reverts commit d431921677ae923d189ff2d6f188f676a2964ed8. Missing gtests have been updated. Original message: This addresses an existing TODO and simply moves the current code to add canonical IV recipes to the initial skeleton construction, at the same place where the corresponding region will be introduced.	2025-05-03 10:54:59 +01:00
Florian Hahn	d431921677	Revert "[VPlan] Add canonical IV during construction (NFC)." This reverts commit e17122fffa8d233fcf9f717354ecda46173f1b8d. Revert as this seems to break some unit tests on some bots.	2025-04-29 22:55:11 +01:00
Florian Hahn	e17122fffa	[VPlan] Add canonical IV during construction (NFC). This addresses an existing TODO and simply moves the current code to add canonical IV recipes to the initial skeleton construction, at the same place where the corresponding region will be introduced.	2025-04-29 22:38:59 +01:00
Florian Hahn	d2ce88a939	[VPlan] Create initial skeleton before creating regions. (NFC) Move out the logic to prepare for vectorization to a separate transform, before creating loop regions. This was discussed as follow-up in https://github.com/llvm/llvm-project/pull/136455. This just moves the existing code around slightly and will simplify follow-up patches to include the exiting edges during initial VPlan construction.	2025-04-28 21:51:32 +01:00
Florian Hahn	7cce38beea	[VPlan] Remove dead SE argument from handleUncountableEarlyExit (NFC). ScalarEvolution is not used by the function, remove the dead arg.	2025-04-24 19:59:05 +01:00
Florian Hahn	e232d28eff	[VPlan] Move plain CFG construction to VPlanConstruction. (NFC) Follow-up as discussed in https://github.com/llvm/llvm-project/pull/129402. After bc03d6cce257, the VPlanHCFGBuilder doesn't actually build a HCFG any longer. Move what remains directly into VPlanConstruction.cpp.	2025-04-18 21:52:05 +01:00
Kazu Hirata	f0621b31f8	[Vectorize] Fix a warning This patch fixes: llvm/lib/Transforms/Vectorize/VPlanTransforms.h:31:1: error: class 'VFRange' was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags]	2025-04-17 16:23:52 -07:00
Elvis Wang	69ade7c090	[LV] Check if the VF is scalar by VFRange in `handleUncountableEarlyExit`. (#135294 ) This patch checks whether the plan contains a scalar VF via VFRange instead of Plan. It also clamps the range to contain either only scalar or only vector VFs to prevent miscompiles. Split from #113903.	2025-04-18 06:51:36 +08:00
Florian Hahn	bc03d6cce2	[VPlan] Introduce all loop regions as VPlan transform. (NFC) (#129402 ) Further simplify VPlan CFG builder by moving introduction of inner regions to a VPlan transform, building on https://github.com/llvm/llvm-project/pull/128419. The HCFG builder now only constructs plain CFGs. I will move it to VPlanConstruction as follow-up. Depends on https://github.com/llvm/llvm-project/pull/128419. PR: https://github.com/llvm/llvm-project/pull/129402	2025-04-16 13:30:45 +02:00
Florian Hahn	54b33eba16	[VPlan] Add opcode to create step for wide inductions. (#119284 ) This patch adds a WideIVStep opcode that can be used to create a vector with the steps to increment a wide induction. The opcode has 2 operands * the vector step * the scale of the vector step The opcode is later converted into a sequence of recipes that convert the scale and step to the target type, if needed, and then multiply vector step by scale. This simplifies code that needs to materialize step vectors, e.g. replacing wide IVs as follow up to https://github.com/llvm/llvm-project/pull/108378 with an increment of the wide IV step. PR: https://github.com/llvm/llvm-project/pull/119284	2025-04-14 23:20:44 +02:00
Florian Hahn	c73ad7ba20	[VPlan] Add transformation to narrow interleave groups. This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. For now it only transforms load interleave groups feeding store groups. Depends on #106431. This lands the main parts of the approved https://github.com/llvm/llvm-project/pull/106441 as suggested to break things up a bit more.	2025-03-20 19:41:37 +00:00
Florian Hahn	2e13ec561c	[VPlan] Bail out on non-intrinsic calls in VPlanNativePath. Update initial VPlan-construction in VPlanNativePath in line with the inner loop path, in that it bails out when encountering constructs it cannot handle, like non-intrinsic calls. Fixes https://github.com/llvm/llvm-project/issues/131071.	2025-03-19 21:35:15 +00:00
Florian Hahn	62994c3291	[VPlan] Also introduce explicit broadcasts for values from entry VPBB. Update and generalize materializeBroadcasts to also introduce explicit broadcasts for VPValues defined in the Plan's entry block. This fixes a crash when trying to insert the broadcasts generated by VPTransformState::get after the generating instruction, which isn't possible after invoke instructions. Fixes https://github.com/llvm/llvm-project/issues/128838.	2025-03-12 22:03:19 +00:00
Florian Hahn	fd267082ee	[VPlan] Refactor VPlan creation, add transform introducing region (NFC). (#128419 ) Create an empty VPlan first, then let the HCFG builder create a plain CFG for the top-level loop (w/o a top-level region). The top-level region is introduced by a separate VPlan-transform. This is instead of creating the vector loop region before building the VPlan CFG for the input loop. This simplifies the HCFG builder (which should probably be renamed) and moves along the roadmap ('buildLoop') outlined in [1]. As follow-up, I plan to also preserve the exit branches in the initial VPlan out of the CFG builder, including connections to the exit blocks. The conversion from plain CFG with potentially multiple exits to a single entry/exit region will be done as VPlan transform in a follow-up. This is needed to enable VPlan-based predication. Currently early exit support relies on building the block-in masks on the original CFG, because exiting branches and conditions aren't preserved in the VPlan. So in order to switch to VPlan-based predication, we will have to preserve them in the initial plain CFG, so the exit conditions are available explicitly when we convert to single entry/exit regions. Another follow-up is updating the outer loop handling to also introduce VPRegionBlocks for nested loops as transform. Currently the existing logic in the builder will take care of creating VPRegionBlocks for nested loops, but not the top-level loop. [1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf PR: https://github.com/llvm/llvm-project/pull/128419	2025-03-09 15:05:35 +00:00
Florian Hahn	4277c21059	[VPlan] Introduce explicit broadcasts for live-ins. (#124644 ) Add a new VPInstruction::Broadcast opcode and use it to materialize explicit broadcasts of live-ins. The initial patch only materializes the broadcasts if the vector preheader dominates all uses that need it. Later patches will pick the best valid insert point, thus retiring implicit hoisting of broadcasts from VPTransformState::get(). PR: https://github.com/llvm/llvm-project/pull/124644	2025-02-26 13:57:51 +00:00
Florian Hahn	6ff8a06de9	[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926 ) Run recipe simplification and dead recipe removal after VPlan-based unrolling and optimizeForVFAndUF, to clean up any redundant or dead recipes introduced by them. Currently this is NFC, as it removes the corresponding removeDeadRecipes run in optimizeForVFAndUF and no additional simplifications kick in after unrolling yet. That is changing with https://github.com/llvm/llvm-project/pull/123655. Note that with this change, pattern-matching is now applied after EVL-based recipes have been introduced. Trying to match VPWidenEVLRecipe when not explicitly requested might apply a pattern with 2 operands to one with 3 due to the extra EVL operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe. To prevent this, update Recipe_match::match to only match VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy). PR: https://github.com/llvm/llvm-project/pull/125926	2025-02-08 13:33:46 +00:00
David Sherwood	3bc2dade36	[LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567 ) This work feeds part of PR https://github.com/llvm/llvm-project/pull/88385, and adds support for vectorising loops with uncountable early exits and outside users of loop-defined variables. When calculating the final value from an uncountable early exit we need to calculate the vector lane that triggered the exit, and hence determine the value at the point we exited. All code for calculating the last value when exiting the loop early now lives in a new vector.early.exit block, which sits between the middle.split block and the original exit block. Doing this required two fixes: 1. The vplan verifier incorrectly assumed that the block containing a definition always dominates the block of the user. That's not true if you can arrive at the use block from multiple incoming blocks. This is possible for early exit loops where both the early exit and the latch jump to the same block. 2. We were adding the new vector.early.exit to the wrong parent loop. It needs to have the same parent as the actual early exit block from the original loop. I've added a new ExtractFirstActive VPInstruction that extracts the first active lane of a vector, i.e. the lane of the vector predicate that triggered the exit. NOTE: The IR generated for dealing with live-outs from early exit loops is unoptimised, as opposed to normal loops. This inevitably leads to poor quality code, but this can be fixed up later.	2025-01-30 10:37:00 +00:00
Florian Hahn	2b55ef187c	[VPlan] Add helper to run VPlan passes, verify after run (NFC). (#123640 ) Add new runPass helpers to run a VPlan transformation. This makes it easier to add additional checks/functionality for each transform run. In this patch, an option is added to run the verifier after each VPlan transform. Follow-ups will use the same helper to also support printing VPlans after each transform. Note that the verifier at the moment requires there to be a canonical IV and vector loop region, so the final lowering transforms aren't run via runPass yet. PR: https://github.com/llvm/llvm-project/pull/123640	2025-01-29 10:50:01 +00:00
Florian Hahn	09a29fcc8d	[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819 ) Live-ins don't need to be handled, other than adding to the exit phi recipe. Do that early and assert that otherwise the exit value is defined in the vector loop region. This should enable simply skipping other exit values that do not need further fixing, e.g. if handling the exit value from the early exit directly in handleUncountableEarlyExit. PR: https://github.com/llvm/llvm-project/pull/123819	2025-01-27 16:12:07 +00:00
Florian Hahn	2c87133c62	Reapply "[VPlan] Update final IV exit value via VPlan. (#112147 )" This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46. The build failure in sanitizer stage2 builds has been fixed with 0d39fe6f5bb3edf0bddec09a8c6417377390aeac. Original commit message: Model updating IV users directly in VPlan, replace fixupIVUsers. Now simple extracts are created for all phis in the exit block during initial VPlan construction. A later VPlan transform (optimizeInductionExitUsers) replaces extracts of inductions with their pre-computed values if possible. This completes the transition towards modeling all live-outs directly in VPlan. There are a few follow-ups: * emit extracts initially also for resume phis, and optimize them together with IV exit users * support for VPlans with multiple exits in optimizeInductionExitUsers. Depends on https://github.com/llvm/llvm-project/pull/110004, https://github.com/llvm/llvm-project/pull/109975 and https://github.com/llvm/llvm-project/pull/112145.	2025-01-19 19:32:03 +00:00
Florian Hahn	58326f1d5b	Revert "[VPlan] Update final IV exit value via VPlan. (#112147 )" This reverts commit c2d15ac4d4432788557e77c15ce572ac655a8fec. Causes build failures on PPC stage2 & fuchsia bots https://lab.llvm.org/buildbot/#/builders/168/builds/7650 https://lab.llvm.org/buildbot/#/builders/11/builds/11248	2025-01-18 13:40:33 +00:00
Florian Hahn	c2d15ac4d4	[VPlan] Update final IV exit value via VPlan. (#112147 ) Model updating IV users directly in VPlan, replace fixupIVUsers. Now simple extracts are created for all phis in the exit block during initial VPlan construction. A later VPlan transform (optimizeInductionExitUsers) replaces extracts of inductions with their pre-computed values if possible. This completes the transition towards modeling all live-outs directly in VPlan. There are a few follow-ups: * emit extracts initially also for resume phis, and optimize them together with IV exit users * support for VPlans with multiple exits in optimizeInductionExitUsers. Depends on https://github.com/llvm/llvm-project/pull/110004, https://github.com/llvm/llvm-project/pull/109975 and https://github.com/llvm/llvm-project/pull/112145.	2025-01-18 13:22:34 +00:00
Florian Hahn	5fae408d3a	[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138 ) A more lightweight variant of https://github.com/llvm/llvm-project/pull/109193, which dispatches to multiple exit blocks via the middle blocks. The patch also introduces a bit of required scaffolding to enable early-exit vectorization, including an option. At the moment, early-exit vectorization doesn't come with legality checks, and is only used if the option is provided and the loop has metadata forcing vectorization. This is only intended to be used for testing during bring-up, with @david-arm enabling auto early-exit vectorization plugging in the changes from https://github.com/llvm/llvm-project/pull/88385. PR: https://github.com/llvm/llvm-project/pull/112138	2024-12-11 21:11:05 +00:00
Florian Hahn	afef545efa	[VPlan] Address post-commit for #114305 . Apply suggested renaming and adjust placement as suggested in https://github.com/llvm/llvm-project/pull/114305. Also drop unneeded RPOT creation.	2024-12-08 21:24:19 +00:00
Florian Hahn	a7fda0e1e4	[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305 ) Introduce a general recipe to generate a scalar phi. Lower VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarPHIRecipe before plan execution, avoiding the need for duplicated ::execute implementations. There are other cases that could benefit, including in-loop reduction phis and pointer induction phis. Builds on a similar idea as https://github.com/llvm/llvm-project/pull/82270. PR: https://github.com/llvm/llvm-project/pull/114305	2024-12-03 14:53:51 +00:00
Florian Hahn	2dfb1c664c	[VPlan] Try to hoist Previous (and operands), if sinking fails for FORs. (#108945 ) In some cases, Previous (and its operands) can be hoisted. This allows supporting additional cases where sinking all users of the FOR fails, e.g. due to having to sink recipes with side-effects. This fixes a crash where we fail to create a scalar VPlan for a first-order recurrence, but can create a vector VPlan, because the trunc instruction of an IV which generates the previous value of the recurrence has been optimized to a truncated induction recipe, thus hoisting it to the beginning. Fixes https://github.com/llvm/llvm-project/issues/106523. PR: https://github.com/llvm/llvm-project/pull/108945	2024-10-23 13:12:03 -07:00
Alexey Bataev	f148d5791b	[LV] Initial support for safe distance in predicated DataWithEVL vectorization mode. Enabled initial support for max safe distance in DataWithEVL mode. If max safe distance is required, special code needs to be emitted while vectorizing the loop in DataWithEVL tail folding mode: CMP = icmp ult AVL, MAX_SAFE_DISTANCE; SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE; EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL). Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/102897	2024-10-18 15:51:49 -04:00
Florian Hahn	7f74651837	[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431 ) Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.	2024-10-06 22:53:13 +01:00
Florian Hahn	53266f73f0	[VPlan] Run DCE after unrolling. This cleans up a number of dead recipes after unrolling if only their first or last parts are used. This simplifies a number of tests. Fixes https://github.com/llvm/llvm-project/issues/109581.	2024-09-22 22:08:46 +01:00
Florian Hahn	8ec406757c	[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842 ) This patch implements explicit unrolling by UF as VPlan transform. In follow up patches this will allow simplifying VPTransformState (no need to store unrolled parts) as well as recipe execution (no need to generate code for multiple parts in each recipe). It also allows for more general optimizations (e.g. avoid generating code for recipes that are uniform-across parts). It also unifies the logic dealing with unrolled parts in a single place, rather than spreading it out across multiple places (e.g. VPlan post processing for header-phi recipes previously.) In the initial implementation, a number of recipes still take the unrolled part as additional, optional argument, if their execution depends on the unrolled part. The computation for start/step values for scalable inductions changed slightly. Previously the step would be computed as scalar and then splatted, now vscale gets splatted and multiplied by the step in a vector mul. This has been split off https://github.com/llvm/llvm-project/pull/94339 which also includes changes to simplify VPTransformState and recipes' ::execute. The current version mostly leaves existing ::execute untouched and instead sets VPTransformState::UF to 1. A follow-up patch will clean up all references to VPTransformState::UF. Another follow-up patch will simplify VPTransformState to only store a single vector value per VPValue. PR: https://github.com/llvm/llvm-project/pull/95842	2024-09-21 19:47:37 +01:00
David Sherwood	f3029b330a	[NFC][LoopVectorize] Avoid passing ScalarEvolution to VPlanTransforms::optimize (#108380 ) Whilst trying to write some VPlan unit tests I realised that we don't need to pass a ScalarEvolution object into VPlanTransforms::optimize because the only thing we actually need is an LLVMContext.	2024-09-13 12:09:00 +01:00
Florian Hahn	16910a21ee	[VPlan] Move logic to create interleave groups to VPlanTransforms (NFC). This is a step towards further breaking up the rather large tryToBuildVPlanWithVPRecipes. It moves the logic to create interleave groups to VPlanTransforms.cpp, where similar replacements for other recipes are defined as well (e.g. EVL-based ones).	2024-08-28 15:56:09 +01:00
Shih-Po Hung	0338c55ea5	[LV, VPlan] Check if plan is compatible to EVL transform (#92092 ) The transform updates all users of inductions to work based on EVL, instead of the VF directly. At the moment, widened inductions cannot be updated, so bail out if the plan contains any. This patch introduces a check before applying EVL transform. If any recipes in loop rely on RuntimeVF, the plan is discarded.	2024-05-25 08:22:49 +08:00
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172 ) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. Having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, lets these architectures make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions. The patch adds a new VPlan transform to replace the wide header predicate compare with EVL and updates codegen for loads/stores to use VP store/load with EVL. Another important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
Florian Hahn	20177c45db	[VPlan] Turn private members of VPlanTransforms to static funcs (NFC) Private members of VPlanTransforms are only used inside VPlanTransforms.cpp, just make them static.	2024-02-17 13:45:23 +00:00
Florian Hahn	debca7ee43	[VPlan] Move dropping of poison flags to VPlanTransforms. (NFC) Move collectPoisonGeneratingFlags from InnerLoopVectorizer to VPlanTransforms and also update its name. collectPoisonGeneratingFlags already directly drops poison-generating flags, rather than only collecting them. This means it is more appropriate to integrate it directly into the VPlan transform pipeline. The current implementation still calls back to legal to check if a block needs predication, which should be improved in the future.	2024-02-14 12:28:58 +00:00
Kazu Hirata	8b1181133d	[Transforms] Remove unused forward declarations (NFC)	2023-12-10 10:07:12 -08:00
Florian Hahn	70535f5e60	[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version. This patch replaces the IR based truncateToMinimalBitwidths with a VPlan version. This has 3 benefits: 1) the VPlan-based version is simpler; we don't need to implement special codegen for each supported instruction type like the IR based one. 2) Removes a dependency on the cost-model after VPlan execution and 3) Removes a use of getVPValue that uses underlying values after VPlan execution (See removed FIXME). Depends on D149081. Depends on D149079. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149903	2023-12-02 16:12:38 +00:00
Florian Hahn	97687b7aea	[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation. This patch updates the mask creation code to always create compares of the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front when tail folding and introduces active-lane-mask as a later transformation. This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count) the canonical form for tail-folding early on. Introducing more specific active-lane-mask recipes is treated as a VPlan-to-VPlan optimization. This has the advantage of keeping the logic (and complexity) of introducing active-lane-mask recipes in a single place, instead of spreading the logic out across multiple functions. It also simplifies initial VPlan construction and enables treating the introduction of EVL as a similar optimization. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158779	2023-09-25 13:34:45 +01:00
Florian Hahn	a6d6730709	[LV] Split off code to optimize initial VPlan (NFC). Split up tryToBuildVPlanWithVPRecipes into initial plan creation and optimizations, by introducing a VPlanTransforms::optimize helper. Depends on D154640. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154644	2023-08-04 13:21:20 +01:00
Florian Hahn	9259f41e62	[VPlan] Clear reduction flags directly as VPlanTransform. After D150027, all relevant recipes should model their IR flags directly. Instead of removing the flags after codegen as part of fixReductions, drop poison generating flags directly from the recipes. Depends on D150027. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D150028	2023-07-09 21:11:51 +01:00
Florian Hahn	6303fa369c	[VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC) The argument isn't used. VPlan-based dead recipe removal can be used instead.	2023-05-01 15:03:29 +01:00
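
The safe-distance commit listed above (f148d5791b) describes an icmp ult / select pair that clamps the AVL before it is passed to @llvm.experimental.get.vector.length. A minimal C++ sketch of that arithmetic follows. The helper names clampAVL and modelEVL are hypothetical and not part of LLVM, and modelEVL assumes the target simply returns min(count, VF), which is a simplification of what the intrinsic is allowed to return.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not an LLVM API): scalar model of the clamp
// described in the commit message.
//   CMP      = icmp ult AVL, MAX_SAFE_DISTANCE
//   SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE
std::uint64_t clampAVL(std::uint64_t AVL, std::uint64_t MaxSafeDistance) {
  bool Cmp = AVL < MaxSafeDistance;   // unsigned less-than
  return Cmp ? AVL : MaxSafeDistance; // i.e. min(AVL, MaxSafeDistance)
}

// Hypothetical stand-in for the get.vector.length intrinsic.
// Assumption: the target returns min(count, VF); real targets may
// legally return other values within the intrinsic's documented bounds.
std::uint32_t modelEVL(std::uint64_t SafeAVL, std::uint32_t VF) {
  return static_cast<std::uint32_t>(
      std::min<std::uint64_t>(SafeAVL, VF));
}
```

Under these assumptions, modelEVL(clampAVL(AVL, MaxSafeDistance), VF) never exceeds the maximum safe distance, so a single vector iteration cannot process more elements than the dependence distance allows.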

83 Commits