llvm-project

Author	SHA1	Message	Date
Luke Lau	a4dc02c0e7	[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086 ) After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.	2025-03-19 00:09:15 +08:00
Florian Hahn	166937b49d	[LV] Cleanup after expanding SCEV predicate to constant. In some cases, SCEV isn't able to prove that no wrap checks are needed, while constant folding in SCEVExpander can. In those cases, we may leave around IR for computing the trip count, which is unused at this point but may be re-used later, triggering an assertion when trying to clean up SCEVExp after vectorization. Directly run the cleaner after expanding to a constant predicate to prevent any generated code from being re-used. Fixes https://github.com/llvm/llvm-project/issues/131281.	2025-03-17 21:26:51 +00:00
Luke Lau	eef5ea0c42	[VPlan] Account for dead FOR splice simplification in cost model (#131486 ) Fixes #131359 After #129645, a first-order recurrence will no longer have it's splice costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and is dead. The legacy cost model didn't account for this, so this accounts for it in planContainsAdditionalSimplifications to avoid the "VPlan cost model and legacy cost model disagreed" assertion.	2025-03-18 00:00:54 +08:00
Florian Hahn	17b4be8f63	[VPlan] Move setting name and adding VFs after recipe creation.(NFC) Recipe creation is the only place where the VF range is restricted. Move setting the VFs just after initial recipe creation.	2025-03-17 12:04:09 +00:00
Luke Lau	315c02aa02	[VPlan] Fix crash with inloop fmuladd reductions with blend (#131154 ) When visiting in-loop reduction links, we previously crashed if we had an fmuladd with a blend after it in the chain. This fixes it by lifting the existing blend folding to also handle fmuladd. This also simplifies the code structure slightly for an upcoming patch I want to post to handle in-loop AnyOf reductions. I removed the PhiR->isInLoop() check since it's already guarded at the top of the parent Header->Phis() loop.	2025-03-14 09:08:32 +08:00
Florian Hahn	62994c3291	[VPlan] Also introduce explicit broadcasts for values from entry VPBB. Update and generalize materializeBroadcasts to also introduce explicit broadcasts for VPValues defined in the Plans Entry block. This fixes a crash when trying to insert the broadcasts generated by VPTransformState::get after the generating instruction, which isn't possible after invoke instructions. Fixes https://github.com/llvm/llvm-project/issues/128838.	2025-03-12 22:03:19 +00:00
Ramkumar Ramachandra	93d41d8148	[LV] Use ElementCount::isKnownLT to factor code (NFC) (#130596 )	2025-03-11 15:42:59 +00:00
John Brawn	7129205816	[LoopVectorize] Move checking for OptForSize into the cost model (NFC) (#130752 ) Move OptForSizeBasedOnProfile into the cost model and rename it to OptForSize, as shouldOptimizeForSize checks both the function attribute and profile. This being done in preparation for OptForSize being used in the cost model.	2025-03-11 14:58:04 +00:00
David Sherwood	26ecf97895	[LoopVectorize] Further improve cost model for early exit loops (#126235 ) Following on from #125058, this patch takes into account the work done in the vector early exit block when assessing the profitability of vectorising the loop. I have renamed areRuntimeChecksProfitable to isOutsideLoopWorkProfitable and we now pass in the early exit costs. As part of this, I have added the ExtractFirstActive opcode to VPInstruction::computeCost. It's worth pointing out that when we assess profitability of the loop we calculate a minimum trip count and compare that against the maximum trip count. However, since the loop has an early exit the runtime trip count can still end up being less than the minimum. Alternatively, we may never take the early exit at all at runtime and so we have the opposite problem of over-estimating the cost of the loop. The loop vectoriser cannot simultaneously take two contradictory positions and so I feel the only sensible thing to do is be conservative and assume the loop will be more expensive than loops without early exits. We may find in future that we need to adjust the cost according to the probability of taking the early exit. This will become even more important once we support multiple early exits. However, we have to start somewhere and we can always revisit this later.	2025-03-11 11:48:55 +00:00
Florian Hahn	f84f4e1e05	[LV] Don't crash on inner loops with RT checks in VPlan-native path. Assert that we only generate runtime checks for inner loops in emitMemRuntimeChecks, instead of returning nullptr in the VPlan-native path, which is causing crashes and incorrect code.	2025-03-10 20:28:56 +00:00
Ramkumar Ramachandra	35c0302086	[LV] Clean up unused 'using' definition (NFC) (#130597 )	2025-03-10 17:57:32 +00:00
Ramkumar Ramachandra	7220ab821d	[LV] Mark getMaxVScale static (NFC) (#130598 )	2025-03-10 17:56:50 +00:00
Florian Hahn	fd267082ee	[VPlan] Refactor VPlan creation, add transform introducing region (NFC). (#128419 ) Create an empty VPlan first, then let the HCFG builder create a plain CFG for the top-level loop (w/o a top-level region). The top-level region is introduced by a separate VPlan-transform. This is instead of creating the vector loop region before building the VPlan CFG for the input loop. This simplifies the HCFG builder (which should probably be renamed) and moves along the roadmap ('buildLoop') outlined in [1]. As follow-up, I plan to also preserve the exit branches in the initial VPlan out of the CFG builder, including connections to the exit blocks. The conversion from plain CFG with potentially multiple exits to a single entry/exit region will be done as VPlan transform in a follow-up. This is needed to enable VPlan-based predication. Currently early exit support relies on building the block-in masks on the original CFG, because exiting branches and conditions aren't preserved in the VPlan. So in order to switch to VPlan-based predication, we will have to preserve them in the initial plain CFG, so the exit conditions are available explicitly when we convert to single entry/exit regions. Another follow-up is updating the outer loop handling to also introduce VPRegionBlocks for nested loops as transform. Currently the existing logic in the builder will take care of creating VPRegionBlocks for nested loops, but not the top-level loop. [1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf PR: https://github.com/llvm/llvm-project/pull/128419	2025-03-09 15:05:35 +00:00
Ramkumar Ramachandra	ddffb74afd	[LV] Strip unreachable SCEV-check blocks (#130079 ) emitSCEVChecks checks if SCEVCheckCond matches zero, and returns nullptr. However, it sets SCEVCheckCond as used before it does this, which prevents it from being removed during cleanup, resulting in unreachable blocks being emitted. Fix this.	2025-03-06 19:30:25 +00:00
Ramkumar Ramachandra	00f3089c2e	[LV] Use PatternMatch in emitTransformedIndex (NFC) (#130081 )	2025-03-06 19:23:31 +00:00
Luke Lau	e1cea0d928	[LV][TTI] Remove unused ReductionFlags. NFC (#129858 ) No in-tree targets currently use it in the preferInLoopReduction/preferPredicatedReductionSelect TTI hooks. It looks like it used to be used in LoopUtils, at least in 8ca60db40bd944dc5f67e0f200a403b4e03818ea, but I presume it was replaced by RecurrenceDescriptor.	2025-03-05 18:31:12 +08:00
Ramkumar Ramachandra	80bdfcd411	[LoopUtils] Don't wrap in getLoopEstimatedTripCount (#129080 ) getLoopEstimatedTripCount returns the trip count based on profiling data, and its documentation says that it could return 0 when the trip count is zero, but this is not the case: a valid trip count can never be zero, and it returns 0 when the unsigned ExitCount is incremented by 1 and wraps. Some callers are careful about checking for this zero value in an std::optional, but it makes for an API with footguns, as a std::optional return value indicates that a non-nullopt value would be a valid trip count. Fix this by explicitly returning std::nullopt when the return value would wrap, and strip additional checks in callers. This also fixes a minor bug in LoopVectorize.	2025-03-04 08:43:08 +00:00
Mel Chen	9b4ad2fe50	[LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093 ) This patch converts the llvm.vector.splice intrinsic to llvm.experimental.vp.splice, ensuring that fixed-order recurrences execute correctly when tail folding by EVL is enable. Due to the non-VFxUF penultimate EVL issue, the EVL from the previous iteration will be preserved and used in llvm.experimental.vp.splice.	2025-03-03 21:27:13 +08:00
Florian Hahn	f937b17e85	[LV] Don't query SCEV for non-invariant values in cost model. This fixes a divergence between VPlan and legacy cost model, matching behavior further up in getInstructionCost as well. Fixes https://github.com/llvm/llvm-project/issues/129236.	2025-03-02 10:55:52 +00:00
Florian Hahn	75270e3750	[VPlan] Don't print VPlan DT after VPlan construction. (NFC) Remove unnecessary code to just print VPlan dominator tree.	2025-03-01 21:15:56 +00:00
David Sherwood	65c45bfa7d	[LoopVectorize][NFC] Fix formatting issue with a comment (#129033 )	2025-02-27 12:51:04 +00:00
John Brawn	8150ab93f7	[LoopVectorize] Use CodeSize as the cost kind for minsize (#124119 ) Functions marked with minsize should aim for minimum code size, so the vectorizer should use CodeSize for the cost kind and also the cost we compare should be the cost for the entire loop: it shouldn't be divided by the number of vector elements and block costs shouldn't be divided by the block probability. Possibly we should also be doing this for optsize as well, but there are a lot of tests that assume the current behaviour and the definition of optsize is less clear than minsize (for minsize the goal is to "keep the code size of this function as small as possible" whereas for optsize it's "keep the code size of this function low").	2025-02-27 11:07:02 +00:00
Benjamin Maxwell	3307b0374a	[LV] Teach the loop vectorizer llvm.sincos is trivially vectorizable (#128035 ) Depends on #123210	2025-02-27 09:37:06 +00:00
Florian Hahn	4277c21059	[VPlan] Introduce explicit broadcasts for live-ins. (#124644 ) Add a new VPInstruction::Broadcast opcode and use it to materialize explicit broadcasts of live-ins. The initial patch only materlizes the broadcasts if the vector preheader dominates all uses that need it. Later patches will pick the best valid insert point, thus retiring implicit hoisting of broadcasts from VPTransformsState::get(). PR: https://github.com/llvm/llvm-project/pull/124644	2025-02-26 13:57:51 +00:00
Florian Hahn	522b05afb6	[VPlan] Construct immutable VPIRBBs for exit blocks at construction(NFC) (#128374 ) Constract immutable VPIRBasicBlocks for all exit blocks up front and keep a list of them. Same as the scalar header, they are leaf nodes of the VPlan and won't change. Some exit blocks may be unreachable, e.g. if the scalar epilogue always executes or depending on optimizations. This simplifies both the way we retrieve the exit blocks as well as hooking up the exit blocks. PR: https://github.com/llvm/llvm-project/pull/128374	2025-02-25 14:23:27 +00:00
Elvis Wang	8009c1fd81	[LV][VPlan] Prevent calculate cost for skiped instructions in precomputeCosts(). (#127966 ) Skip calculating instruction costs for exit conditions in precomputeCosts() when it should be skipped. Reported from: https://github.com/llvm/llvm-project/issues/115744#issuecomment-2670479463 Godbolt for reduced test cases: https://godbolt.org/z/fr4YMeqcv	2025-02-25 11:09:09 +08:00
Ramkumar Ramachandra	4ac43b541c	[LV] Restrict widest induction type to be IntegerType (NFC) (#128173 ) As the name of the function suggests, convertPointerToIntegerType should return an IntegerType instead of a Type, and should only ever be called with integer or ptr type. Fix the callers getWiderType, and addInductionPhi to narrow the type of WidestIndTy to IntegerType, stripping unclear casts. While at it, rename convertPointerToIntegerType and getWiderType for clarity.	2025-02-24 19:10:33 +00:00
Florian Hahn	b72bbfc293	[VPlan] Remove fixHeaderPhis (NFC). Removes unneeded code after https://github.com/llvm/llvm-project/pull/124432.	2025-02-23 10:51:20 +00:00
Florian Hahn	0859df4e42	[VPlan] Use operands from initial VPInstructions directly (NFC). Use operands from VPInstructions directly during recipe creation. Follow-up as discussed and planned after https://github.com/llvm/llvm-project/pull/124432.	2025-02-22 22:34:35 +00:00
Ramkumar Ramachandra	2d38be5fd4	[LV] Strip redundant casts (NFC) (#128177 )	2025-02-21 17:37:39 +00:00
Florian Hahn	f5cf04c548	[LV] Remove unused variable after 38376dee92224c66.	2025-02-18 16:23:33 +01:00
Florian Hahn	38376dee92	[VPlan] Build initial VPlan 0 using HCFGBuilder for inner loops. (NFC) (#124432 ) Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes. At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions. This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode. There are multiple follow-ups planned: * Perform predication on the VPlan directly, * Unify code constructing VPlan 0 to be shared by both inner and outer loop code paths. * Construct VPlan 0 once, clone subsequent ones for VFs PR: https://github.com/llvm/llvm-project/pull/124432	2025-02-18 16:12:29 +01:00
Benjamin Maxwell	e0e67a6207	[LV] Add initial support for vectorizing literal struct return values (#109833 ) This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The users of the call also must all be `extractvalue` instructions. The intended use case for this is vectorizing intrinsics such as: ``` declare { float, float } @llvm.sincos.f32(float %x) ``` Mapping them to structure-returning library calls such as: ``` declare { <4 x float>, <4 x float> } @Sleef_sincosf4_u10advsimd(<4 x float>) ``` Or their widened form (such as `@llvm.sincos.v4f32` in this case). Implementing this required two main changes: 1. Supporting widening `extractvalue` 2. Adding support for vectorized struct types in LV * This is mostly limited to parts of the cost model and scalarization Since the supported use case is narrow, the required changes are relatively small.	2025-02-17 09:51:35 +00:00
Florian Hahn	b4f91b007f	[LV] Use IRBuilder::insert to insert VPWidenRecipe (NFC).	2025-02-16 21:25:07 +01:00
Florian Hahn	e5f5517f91	[VPlan] Create IR basic block for middle.block in VPlan. Create a IR BB directly for the middle.block, instead of creating the IR BB during skeleton creation and then replacing the middle VPBB with a VPIRBB. This moves another part of skeleton creation to VPlan and simplififes the code slightly by removing code to disconnect the middle block and vector preheader + the corresponding DT update. NFC modulo IR block naming and block creation order, which changes the IR names for the blocks.	2025-02-15 21:54:16 +01:00
Nicholas Guy	9c89faa62b	[LoopVectorizer][AArch64] Add support for partial reduce subtraction (#123636 )	2025-02-13 10:35:45 +00:00
David Sherwood	3e62321ed9	[LoopVectorize] Make collectInLoopReductions more efficient (#126769 ) We call collectInLoopReductions in multiple places asking the same question with exactly the same answer. For example, this was being called from a loop in calculateRegisterUsage and this patch hoists the call out to above the loop. In addition I've changed collectInLoopReductions so that it bails out if we've already built up a list.	2025-02-12 14:05:34 +00:00
Florian Hahn	3706dfef66	[LV] Forget LCSSA phi with new pred before other SCEV invalidation. (#119897 ) `forgetLcssaPhiWithNewPredecessor` performs additional invalidation if there is an existing SCEV for the phi, but earlier `forgetBlockAndLoopDispositions` or `forgetLoop` may already invalidate the SCEV for the phi. Change the order to first call `forgetLcssaPhiWithNewPredecessor` to ensure it runs before its SCEV gets invalidated too eagerly. Fixes https://github.com/llvm/llvm-project/issues/119665. PR: https://github.com/llvm/llvm-project/pull/119897	2025-02-10 16:29:42 +00:00
Elvis Wang	2e3729bf40	[LV] Prevent query the computeCost() when VF=1 in emitInvalidCostRemarks(). (#117288 ) We should only query the computeCost() when the VF is vector.	2025-02-10 08:40:28 +08:00
Hassnaa Hamdi	e9a20f77ee	Reland "[LV]: Teach LV to recursively (de)interleave." (#125094 ) This patch relands the changes from "[LV]: Teach LV to recursively (de)interleave.#122989" Reason for revert: - The patch exposed an assert in the vectorizer related to VF difference between legacy cost model and VPlan-based cost model because of uncalculated cost for VPInstruction which is created by VPlanTransforms as a replacement to 'or disjoint' instruction. VPlanTransforms do that instructions change when there are memory interleaving and predicated blocks, but that change didn't cause problems because at most cases the cost difference between legacy/new models is not noticeable. - Issue is fixed by #125434 Original patch: https://github.com/llvm/llvm-project/pull/89018 Reviewed-by: paulwalker-arm, Mel-Chen	2025-02-09 19:21:54 +00:00
Florian Hahn	32c4493d5f	[VPlan] Add incoming values for all predecessor to ResumePHI (NFCI). Follow-up as discussed when using VPInstruction::ResumePhi for all resume values (#112147). This patch explicitly adds incoming values for each predecessor in VPlan. This simplifies codegen and allows transformations adjusting the predecessors of blocks with NFC modulo incoming block order in phis.	2025-02-09 11:20:20 +00:00
Florian Hahn	6ff8a06de9	[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926 ) Run recipe simplification and dead recipe removal after VPlan-based unrolling and optimizeForVFAndUF, to clean up any redundant or dead recipes introduced by them. Currently this is NFC, as it removes the corresponding removeDeadRecipes run in optimizeForVFAndUF and no additional simplifications kick in after unrolling yet. That is changing with https://github.com/llvm/llvm-project/pull/123655. Note that with this change, pattern-matching is now applied after EVL-based recipes have been introduced. Trying to match VPWidenEVLRecipe when not explicitly requested might apply a pattern with 2 operands to one with 3 due to the extra EVL operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe. To prevent this, update Recipe_match::match to only match VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy). PR: https://github.com/llvm/llvm-project/pull/125926	2025-02-08 13:33:46 +00:00
Florian Hahn	ee806646ad	[VPlan] Consistently use hasScalarVFOnly (NFC). Consistently use hasScalarVFOnly instead of using hasVF(ElementCount::getFixed(1)). Also add an assert to ensure all cases are covered by hasScalarVFOnly.	2025-02-08 12:19:25 +00:00
David Sherwood	3872e55758	[LoopVectorize] Fix build error (#126218 ) Fixes issue caused by 1930524bbde3cd26ff527bbdb5e1f937f484edd6 Unused variable UsesMask in LoopVectorize.cpp	2025-02-07 10:16:32 +00:00
David Sherwood	1930524bbd	[LoopVectorize] Fix cost model assert when vectorising calls (#125716 ) The legacy and vplan cost models did not agree because VPWidenCallRecipe::computeCost only calculates the cost of the call instruction, whereas LoopVectorizationCostModel::setVectorizedCallDecision in some cases adds on the cost of a synthesised mask argument. However, this mask is always 'splat(i1 true)' which should be hoisted out of the loop during codegen. In order to synchronise the two cost models I have two options: 1) Also add the cost of the splat to the vplan model, or 2) Remove the cost of the splat from the legacy model. I chose 2) because I feel this more closely represents what the final code will look like. There is an argument that we should take account of such broadcast costs in the preheader when deciding if it's profitable to vectorise a loop, however there isn't currently a mechanism to do this. We currently only take account of the runtime checks when assessing profitability and what the minimum trip count should be. However, I don't believe this work needs doing as part of this PR.	2025-02-07 09:36:52 +00:00
James Chesterman	ac158aa13b	[LoopVectorizer] Allow partial reductions to be made in predicated loops (#124268 ) Does a select on the input rather than the output. This way the mask has the same number of lanes as the other operand in the select instruction.	2025-02-07 09:09:10 +00:00
Mel Chen	4d3148d926	[LV][EVL] Fix the check for legality of folding with EVL. (#125678 ) The current legality check for folding with EVL has incomplete verification for VF. This patch fixes the VF check, ensuring that tail folding with EVL is enabled only when a scalable VF is available. This allows loops that prefer tail folding with EVL but cannot use scalable VF vectorization to still be vectorized using a fixed VF, rather than abandoning vectorization entirely.	2025-02-07 12:53:10 +08:00
David Sherwood	f07cd36a5d	[LoopVectorize] Add the cost of VPInstruction::AnyOf to vplan (#125058 ) This patch adds an initial implementation of VPInstruction::computeCost with support for only one instruction so far - VPInstruction::AnyOf. This is only used when vectorising loops with uncountable early exits.	2025-02-05 16:31:14 +00:00
Mel Chen	8d037b9256	[LV][EVL] Skip tryAddExplicitVectorLength for plans with scalar VF. (#125497 ) The plans with scalar VF should not be transformed the plans folded by EVL. TODO: Move the scalar VF checking into `LoopVectorizationCostModel ::foldTailWithEVL()`.	2025-02-05 15:02:33 +08:00
Florian Hahn	7043895911	[VPlan] Remove dead VPBB argument from tryTo[Create]Widen[Recipe] (NFC) The functions now use VPBuilder to insert recipes and the VPBB argument is unused. Clean it up.	2025-02-04 21:40:07 +00:00

1 2 3 4 5 ...

2432 Commits