llvm-project

Author	SHA1	Message	Date
Florian Hahn	5550d30228	[VPlan] Check captured operand when simplifying redundant OR. Follow-up to 0f607f to actually use the captured operand X instead of Y.	2025-04-13 13:23:27 +01:00
Florian Hahn	0f607f3df5	[VPlan] Simplify 'or x, true' -> true. Add additional OR simplification to fix a divergence between legacy and VPlan-based cost model. This adds a new m_AllOnes matcher by generalizing specific_intval to int_pred_ty, which takes a predicate to check to support matching both specific APInts and other APInt predicates, like isAllOnes. Fixes https://github.com/llvm/llvm-project/issues/131359.	2025-04-13 12:09:40 +01:00
Luke Lau	be6ccc98f3	[VPlan] Split out VPBlendRecipe simplifications from simplifyRecipes. NFC (#134073 ) This is split off from #133977. VPBlendRecipe normalisation is sensitive to the number of users a mask has, so should probably be run after the masks are simplified as much as possible. Note this could be run after removeDeadRecipes but this causes test diffs, some regressions, so this is left to a later patch.	2025-04-07 09:55:52 +01:00
Florian Hahn	464286ba63	[VPlan] Don't narrow interleave groups if there are vector pointers. Do not narrow interleave groups if there are VectorPointer recipes and the plan was unrolled. The recipe implicitly uses VF from VPTransformState.	2025-04-06 22:14:24 +01:00
Florian Hahn	3a859b11e3	[VPlan] Set and use debug location for VPScalarIVStepsRecipe. This adds missing debug location for VPScalarIVStepsRecipe. The location of the corresponding phi is used.	2025-04-04 21:14:36 +01:00
Florian Hahn	5fbd0658a0	[VPlan] Add initial CFG simplification, removing BranchOnCond true. (#106748 ) Add an initial CFG simplification transform, which removes the dead edges for blocks terminated with BranchOnCond true. At the moment, this removes the edge between middle block and scalar preheader when folding the tail. PR: https://github.com/llvm/llvm-project/pull/106748	2025-04-04 15:44:26 +01:00
Florian Hahn	380defd4b3	[VPlan] Update VPInterleaveRecipe to take debug loc directly as arg (NFC)	2025-04-02 22:46:38 +01:00
Luke Lau	8107b430ed	[VPlan] Simplify select c, x, x -> x (#133731 ) As noted in 1a9358c090d0507be21c5e9b2d97a23ef1de8ab0, some simplifications can produce a redundant select where the true and false operands are the same, which this patch removes. The is_fpclass test was changed so the condition wasn't made dead.	2025-04-02 10:26:48 +01:00
Luke Lau	6afe5e5d1a	[LV][EVL] Peek through combination tail-folded + predicated masks (#133430 ) If a recipe was predicated and tail folded at the same time, it will have a mask like: EMIT vp<%header-mask> = icmp ule canonical-iv, backedge-tc; EMIT vp<%mask> = logical-and vp<%header-mask>, vp<%pred-mask>. When converting to an EVL recipe, if the mask isn't exactly just the header-mask we copy the whole logical-and. We can remove this redundant logical-and (because it's now covered by EVL) and just use vp<%pred-mask> instead. This lets us remove the widened canonical IV in more places.	2025-03-31 21:28:39 +01:00
Luke Lau	b739a3cb65	[VPlan] Add m_Deferred. NFC (#133736 ) This copies over the implementation of m_Deferred which allows matching values that were bound in the pattern, and uses it for the (X && Y) || (X && !Y) -> X simplification.	2025-03-31 21:01:28 +01:00
Florian Hahn	809f857d2c	[VPlan] Support early-exit loops in optimizeForVFAndUF. (#131539 ) Update optimizeForVFAndUF to support early-exit loops by handling BranchOnCond(Or(..., CanonicalIV == TripCount)) via SCEV PR: https://github.com/llvm/llvm-project/pull/131539	2025-03-31 07:55:48 +01:00
Florian Hahn	6b98134466	[VPlan] Re-enable narrowing interleave groups with interleaving. Remove the UF = 1 restriction introduced by 577631f0a5 building on top of 783a846507683, which allows updating all relevant users of the VF, VPScalarIVSteps in particular. This restores the full functionality of https://github.com/llvm/llvm-project/pull/106441.	2025-03-29 20:14:10 +00:00
Florian Hahn	783a846507	[VPlan] Add VF as operand to VPScalarIVStepsRecipe. Similarly to other recipes, update VPScalarIVStepsRecipe to also take the runtime VF as argument. This removes some unnecessary runtime VF computations for scalable vectors. It will also allow dropping the UF == 1 restriction for narrowing interleave groups required in 577631f0a528.	2025-03-28 21:48:59 +00:00
Hari Limaye	bf5627c85e	[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828 ) Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the narrowest type necessary, when the trip-count of a loop is known to be constant and the only use of the recipe is the condition used by the vector loop's backedge branch.	2025-03-28 14:47:40 +00:00
Florian Hahn	7b75db5755	[VPlan] Add new VPIRPhi overlay for VPIRInsts wrapping phi nodes (NFC). (#129387 ) Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an overlay, to provide more convenient checking (via directly doing isa/dyn_cast/cast) and specialized execute/print implementations. Both VPIRInstruction and VPIRPhi share the same VPDefID, and are differentiated by the backing IR instruction. This pattern could also be used to provide more specialized interfaces for some VPInstruction opcodes, without introducing new, completely separate recipes. An example would be modeling VPWidenPHIRecipe & VPScalarPHIRecipe using VPInstruction opcodes and providing an interface to retrieve incoming blocks and values through a VPInstruction subclass similar to VPIRPhi. PR: https://github.com/llvm/llvm-project/pull/129387	2025-03-28 08:43:46 +00:00
Florian Hahn	5eccd71ce4	[VPlan] Add assertion ensuring Plan's UF matches BestUF (NFC).	2025-03-27 19:29:55 +00:00
David Sherwood	1c9fe8c8af	[LV] Optimise users of induction variables in early exit blocks (#130766 ) This is the second of two PRs that attempts to improve the IR generated in the exit blocks of vectorised loops with uncountable early exits. It follows on from PR #128880. In this PR I am improving the generated code for users of induction variables in early exit blocks. This required using a newly added VPInstruction called FirstActiveLane, which calculates the index of the first active predicate in the mask operand. I have added a new function optimizeEarlyExitInductionUser that is called from optimizeInductionExitUsers when handling users in early exit blocks.	2025-03-26 12:09:59 +00:00
Florian Hahn	577631f0a5	Reapply "[VPlan] Add transformation to narrow interleave groups. (#106441 )" This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf. The recommitted version limits the transform to cases where no interleaving is taking place, to avoid a mis-compile when interleaving. Original commit message: This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-25 20:57:10 +00:00
Florian Hahn	dfca6c0d3b	[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655 ) After unrolling, there may be additional simplifications that can be applied. One example is removing SCALAR-STEPS for the first part where only the first lane is demanded. This removes redundant adds of 0 from a large number of tests (~200), many of which I am still working on updating. In preparation for removing redundant WideIV steps added in https://github.com/llvm/llvm-project/pull/119284. PR: https://github.com/llvm/llvm-project/pull/123655	2025-03-25 12:57:24 +00:00
Florian Hahn	06fd10f1da	[VPlan] Don't create ExtractElement recipes for scalar plans. (#131604 ) ExtractElements are no-ops for scalar VPlans. Don't introduce them in handleUncountableEarlyExit if the plan has only a scalar VF. This fixes a crash trying to compute the cost of ExtractElement after 26ecf978951b79. PR: https://github.com/llvm/llvm-project/pull/131604	2025-03-23 22:00:02 +00:00
Martin Storsjö	ff3e2ba9eb	Revert "[VPlan] Add transformation to narrow interleave groups. (#106441 )" This reverts commit dfa665f19c52d98b8d833a8e9073427ba5641b19. This commit caused miscompilations in ffmpeg, see https://github.com/llvm/llvm-project/pull/106441 for details.	2025-03-23 23:27:39 +02:00
Kazu Hirata	fae34938f6	[llvm] Use *Set::insert_range (NFC) (#132591 ) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch uses insert_range with iterator ranges. For each case, I've verified that foos is defined as make_range(foo_begin(), foo_end()) or in a similar manner.	2025-03-22 22:14:45 -07:00
Florian Hahn	dfa665f19c	[VPlan] Add transformation to narrow interleave groups. (#106441 ) This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-22 21:40:17 +00:00
Kazu Hirata	3520dc5e7a	[Vectorize] Fix a build This patch fixes: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:2263:19: error: expected ';' after return statement	2025-03-20 12:52:27 -07:00
Florian Hahn	c73ad7ba20	[VPlan] Add transformation to narrow interleave groups. This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. For now it only transforms load interleave groups feeding store groups. Depends on #106431. This lands the main parts of the approved https://github.com/llvm/llvm-project/pull/106441 as suggested to break things up a bit more.	2025-03-20 19:41:37 +00:00
Florian Hahn	2e13ec561c	[VPlan] Bail out on non-intrinsic calls in VPlanNativePath. Update initial VPlan-construction in VPlanNativePath in line with the inner loop path, in that it bails out when encountering constructs it cannot handle, like non-intrinsic calls. Fixes https://github.com/llvm/llvm-project/issues/131071.	2025-03-19 21:35:15 +00:00
Florian Hahn	870f753f1f	[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC). Also include VPlan's BTC in the set of VPValues to materialize broadcasts for, if it is used.	2025-03-18 22:35:18 +00:00
Florian Hahn	d51bc83511	[VPlan] Only skip live-ins with constants in materializeBroadcasts (NFC) Currently this should be NFC, but will be needed in future patches.	2025-03-18 20:23:16 +00:00
Luke Lau	a4dc02c0e7	[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086 ) After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.	2025-03-19 00:09:15 +08:00
David Sherwood	3b6d0093aa	[LV][NFC] Refactor code for extracting first active element (#131118 ) Refactor the code to extract the first active element of a vector in the early exit block, in preparation for PR #130766. I've replaced the VPInstruction::ExtractFirstActive nodes with a combination of a new VPInstruction::FirstActiveLane node and an Instruction::ExtractElement node.	2025-03-14 11:14:09 +00:00
Florian Hahn	02575f887b	[VPlan] Use VPInstruction for VPScalarPHIRecipe. (NFCI) (#129767 ) Now that all phi nodes manage their incoming blocks through the VPlan-predecessors, there should be no need for having a dedicated recipe, it should be sufficient to allow PHI opcodes in VPInstruction. Follow-ups will also migrate VPWidenPHIRecipe and possibly others, building on top of https://github.com/llvm/llvm-project/pull/129388. PR: https://github.com/llvm/llvm-project/pull/129767	2025-03-13 18:35:07 +00:00
Florian Hahn	62994c3291	[VPlan] Also introduce explicit broadcasts for values from entry VPBB. Update and generalize materializeBroadcasts to also introduce explicit broadcasts for VPValues defined in the Plan's Entry block. This fixes a crash when trying to insert the broadcasts generated by VPTransformState::get after the generating instruction, which isn't possible after invoke instructions. Fixes https://github.com/llvm/llvm-project/issues/128838.	2025-03-12 22:03:19 +00:00
Florian Hahn	8132c4f554	[VPlan] Also introduce broadcasts for live-ins used in vec preheader. Slightly generalize materializeLiveInBroadcasts to also introduce broadcasts for live-ins used in the vector preheader. This should cover all live-ins. If the live-in is used in the vector preheader, insert the broadcast at the beginning of the block.	2025-03-11 21:19:14 +00:00
David Sherwood	055db3ec33	[LV] Optimise latch exit induction users for some early exit loops (#128880 ) This is the first of two PRs that attempts to improve the IR generated in the exit blocks of vectorised loops with uncountable early exits. In this PR I am improving the generated code for users of induction variables in early exit loops that have a unique exit block, when exiting via the latch. I have moved some of the code for calculating the exit values in latch exit blocks from `optimizeInductionExitUsers` into a new function `optimizeLatchExitInductionUser`. I intend to follow this up very soon with another patch to optimise the code for induction users in the vector.early.exit block.	2025-03-11 10:13:16 +00:00
Florian Hahn	8dd160f476	Revert "[VPlan] Fold NOT into predicate of wide compares." (#130347 ) Reverts llvm/llvm-project#129430. This seems to have introduced a divergence between legacy and VPlan-based cost model: https://lab.llvm.org/buildbot/#/builders/30/builds/17159	2025-03-07 21:18:49 +00:00
Florian Hahn	cb3ce30ca8	[VPlan] Fold NOT into predicate of wide compares. (#129430 ) Add simplification to fold negation into a compare, if the negation is the only user of the compare. This removes a number of redundant negations. Alive2 Proofs for FPCMP test changes: https://alive2.llvm.org/ce/z/WGDz9U PR: https://github.com/llvm/llvm-project/pull/129430	2025-03-07 20:32:43 +00:00
Florian Hahn	b2d70e8796	[VPlan] Use Builder to create cast recipes in VPlanTransforms (NFC). Use VPBuilder in a few more places. This avoids manual insertions and will make changing the cast recipe easier in the future.	2025-03-04 13:39:12 +00:00
Florian Hahn	15770a1e9d	[VPlan] Remove dead recipes in entry when merging regions. (NFC) Also remove recipes in the entry of the region that will be removed. This makes sure we don't leave any dead users around. NFC at the moment, but avoids causing issues in the future.	2025-03-04 08:26:27 +00:00
Mel Chen	9b4ad2fe50	[LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093 ) This patch converts the llvm.vector.splice intrinsic to llvm.experimental.vp.splice, ensuring that fixed-order recurrences execute correctly when tail folding by EVL is enabled. Due to the non-VFxUF penultimate EVL issue, the EVL from the previous iteration will be preserved and used in llvm.experimental.vp.splice.	2025-03-03 21:27:13 +08:00
Florian Hahn	6ce41db6b0	[VPlan] Preserve DebugLoc for VPBranchOnMaskRecipe. Update code to set and generate debug location for branch recipe	2025-02-27 20:19:42 +00:00
Florian Hahn	1e1b9bccc0	[VPlan] Simplify BLEND %a, %b, NOT(%m) -> BLEND %b, %a, %m. (#128375 ) Avoid negations for normalized blends by reordering operands. PR: https://github.com/llvm/llvm-project/pull/128375	2025-02-27 17:43:24 +00:00
Florian Hahn	4277c21059	[VPlan] Introduce explicit broadcasts for live-ins. (#124644 ) Add a new VPInstruction::Broadcast opcode and use it to materialize explicit broadcasts of live-ins. The initial patch only materializes the broadcasts if the vector preheader dominates all uses that need it. Later patches will pick the best valid insert point, thus retiring implicit hoisting of broadcasts from VPTransformState::get(). PR: https://github.com/llvm/llvm-project/pull/124644	2025-02-26 13:57:51 +00:00
Florian Hahn	522b05afb6	[VPlan] Construct immutable VPIRBBs for exit blocks at construction(NFC) (#128374 ) Construct immutable VPIRBasicBlocks for all exit blocks up front and keep a list of them. Same as the scalar header, they are leaf nodes of the VPlan and won't change. Some exit blocks may be unreachable, e.g. if the scalar epilogue always executes or depending on optimizations. This simplifies both the way we retrieve the exit blocks as well as hooking up the exit blocks. PR: https://github.com/llvm/llvm-project/pull/128374	2025-02-25 14:23:27 +00:00
Florian Hahn	baa77e30f0	[LV] Remove some redundant casts (NFC).	2025-02-24 21:46:29 +00:00
Luke Lau	e23ab73335	[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180 ) This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote. This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform. IIUC we initially did this to avoid `vl` toggles on RISC-V. However we now have the RISCVVLOptimizer pass which mostly makes this redundant. Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get noticeably better code generation.	2025-02-22 19:38:11 +08:00
Florian Hahn	6ff8a06de9	[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926 ) Run recipe simplification and dead recipe removal after VPlan-based unrolling and optimizeForVFAndUF, to clean up any redundant or dead recipes introduced by them. Currently this is NFC, as it removes the corresponding removeDeadRecipes run in optimizeForVFAndUF and no additional simplifications kick in after unrolling yet. That is changing with https://github.com/llvm/llvm-project/pull/123655. Note that with this change, pattern-matching is now applied after EVL-based recipes have been introduced. Trying to match VPWidenEVLRecipe when not explicitly requested might apply a pattern with 2 operands to one with 3 due to the extra EVL operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe. To prevent this, update Recipe_match::match to only match VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy). PR: https://github.com/llvm/llvm-project/pull/125926	2025-02-08 13:33:46 +00:00
Florian Hahn	ee806646ad	[VPlan] Consistently use hasScalarVFOnly (NFC). Consistently use hasScalarVFOnly instead of using hasVF(ElementCount::getFixed(1)). Also add an assert to ensure all cases are covered by hasScalarVFOnly.	2025-02-08 12:19:25 +00:00
Florian Hahn	5008277322	[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104 ) Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (https://github.com/llvm/llvm-project/pull/117138). PR: https://github.com/llvm/llvm-project/pull/124104	2025-02-02 13:44:07 +00:00
David Sherwood	3bc2dade36	[LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567 ) This work feeds part of PR https://github.com/llvm/llvm-project/pull/88385, and adds support for vectorising loops with uncountable early exits and outside users of loop-defined variables. When calculating the final value from an uncountable early exit we need to calculate the vector lane that triggered the exit, and hence determine the value at the point we exited. All code for calculating the last value when exiting the loop early now lives in a new vector.early.exit block, which sits between the middle.split block and the original exit block. Doing this required two fixes: 1. The vplan verifier incorrectly assumed that the block containing a definition always dominates the block of the user. That's not true if you can arrive at the use block from multiple incoming blocks. This is possible for early exit loops where both the early exit and the latch jump to the same block. 2. We were adding the new vector.early.exit to the wrong parent loop. It needs to have the same parent as the actual early exit block from the original loop. I've added a new ExtractFirstActive VPInstruction that extracts the first active lane of a vector, i.e. the lane of the vector predicate that triggered the exit. NOTE: The IR generated for dealing with live-outs from early exit loops is unoptimised, as opposed to normal loops. This inevitably leads to poor quality code, but this can be fixed up later.	2025-01-30 10:37:00 +00:00
Florian Hahn	2b55ef187c	[VPlan] Add helper to run VPlan passes, verify after run (NFC). (#123640 ) Add new runPass helpers to run a VPlan transformation. This makes it easier to add additional checks/functionality for each transform run. In this patch, an option is added to run the verifier after each VPlan transform. Follow-ups will use the same helper to also support printing VPlans after each transform. Note that the verifier at the moment requires there to be a canonical IV and vector loop region, so the final lowering transforms aren't run via runPass yet. PR: https://github.com/llvm/llvm-project/pull/123640	2025-01-29 10:50:01 +00:00
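
Several entries above describe small algebraic simplifications: 'or x, true' -> true (0f607f3df5), select c, x, x -> x (8107b430ed), and (X && Y) || (X && !Y) -> X using m_Deferred (b739a3cb65). The sketch below illustrates the idea on a toy expression tree. It is not LLVM code: Expr, node, var, makeTrue, mkNot, mkAnd, mkOr, mkSelect and simplify are all hypothetical names invented for this illustration, and "same value" is modelled as pointer equality, which is an assumption about how a deferred match compares operands.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR for illustration only; not VPValue/VPRecipeBase.
enum class Op { Var, True, Not, And, Or, Select };

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  Op op;
  std::string name;         // only meaningful for Op::Var
  std::vector<ExprPtr> ops; // operands, in order
};

ExprPtr node(Op op, std::vector<ExprPtr> ops, std::string name = "") {
  return std::make_shared<Expr>(Expr{op, std::move(name), std::move(ops)});
}
ExprPtr var(std::string n) { return node(Op::Var, {}, std::move(n)); }
ExprPtr makeTrue() { return node(Op::True, {}); }
ExprPtr mkNot(ExprPtr a) { return node(Op::Not, {std::move(a)}); }
ExprPtr mkAnd(ExprPtr a, ExprPtr b) { return node(Op::And, {std::move(a), std::move(b)}); }
ExprPtr mkOr(ExprPtr a, ExprPtr b) { return node(Op::Or, {std::move(a), std::move(b)}); }
ExprPtr mkSelect(ExprPtr c, ExprPtr t, ExprPtr f) {
  return node(Op::Select, {std::move(c), std::move(t), std::move(f)});
}

// Applies one simplification step at the root; returns e unchanged if
// no pattern matches.
ExprPtr simplify(const ExprPtr &e) {
  assert(e && "expected a non-null expression");

  // or x, true -> true (either operand order is accepted in this toy)
  if (e->op == Op::Or &&
      (e->ops[0]->op == Op::True || e->ops[1]->op == Op::True))
    return makeTrue();

  // select c, x, x -> x
  if (e->op == Op::Select && e->ops[1] == e->ops[2])
    return e->ops[1];

  // (X && Y) || (X && !Y) -> X
  // X and Y are bound by the left operand, then re-checked ("deferred")
  // against the right operand by pointer identity.
  if (e->op == Op::Or && e->ops[0]->op == Op::And &&
      e->ops[1]->op == Op::And) {
    const ExprPtr &x = e->ops[0]->ops[0];
    const ExprPtr &y = e->ops[0]->ops[1];
    const Expr &rhs = *e->ops[1];
    if (rhs.ops[0] == x && rhs.ops[1]->op == Op::Not &&
        rhs.ops[1]->ops[0] == y)
      return x;
  }
  return e;
}
```

Calling simplify(mkOr(mkAnd(x, y), mkAnd(x, mkNot(y)))) with x and y created once via var returns the same pointer as x, while an expression whose right-hand AND starts with a different value is returned unchanged.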


318 Commits