llvm-project

Author	SHA1	Message	Date
Rahul Joshi	4703195c8d	[NFC][LLVM] Namespace cleanup in SLPVectorizer (#168623 ) - Remove file local functions out of `llvm` or anonymous namespace and make them static. - Use namespace qualifier to define `BoUpSLP` class and several template specializations.	2025-11-19 07:34:09 -08:00
Florian Hahn	e009de26b6	[LV] Use VPlan pattern matching in adjustRecipesForReductions (NFC) Replace the assert checking if CurrentLinkI is a CmpInst with a pattern matching check in the if condition. This uses VPlan-level pattern matching instead of inspecting the underlying instruction type.	2025-11-15 21:45:40 +00:00
Florian Hahn	a6edeedbfa	Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )" This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b. This appears to be causing some runtime failures on RISCV https://lab.llvm.org/buildbot/#/builders/210/builds/5221	2025-11-13 22:34:55 +00:00
Florian Hahn	53a65ba6b9	[VPlan] Don't look up recipe for IV step via RecipeBuilder. (NFC) Directly update induction increments with step value created for wide inductions in createWidenInductionRecipes, which does not require looking up via RecipeBuilder.	2025-11-12 22:08:56 +00:00
Florian Hahn	62d1a080e6	[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 ) Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-12 15:11:00 +00:00
Luke Lau	97d4e96cc5	[VPlan] Perform optimizeMaskToEVL in terms of pattern matching (#155394 ) Currently in optimizeMaskToEVL we convert every widened load, store or reduction to a VP predicated recipe with EVL, regardless of whether or not it uses the header mask. So currently we have to be careful when working on other parts of VPlan to make sure that the EVL transform doesn't break or transform something incorrectly, because it's not a semantics preserving transform. Forgetting to do so has caused miscompiles before, like the case that was fixed in #113667. This PR rewrites it to work in terms of pattern matching, so it now only converts a recipe to a VP predicated recipe if it is exactly masked with the header mask. After this the transform should be a true optimisation and not change any semantics, so it shouldn't miscompile things if other parts of VPlan change. This fixes #152541, and allows us to move addExplicitVectorLength into tryToBuildVPlanWithVPRecipes in #153144. It also splits out the load/store transforms into separate patterns for reversed and non-reversed, which should make #146525 easier to implement and reason about.	2025-11-03 16:53:18 +08:00
Florian Hahn	b9ce7656e9	[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670 ) Add a new Unpack VPInstruction (name to be improved) to explicitly extract scalar values from vectors. Test changes are movements of the extracts: they are now generated together and also directly after the producer. Depends on https://github.com/llvm/llvm-project/pull/155102 (included in PR) PR: https://github.com/llvm/llvm-project/pull/155670	2025-10-19 18:49:05 +00:00
Ramkumar Ramachandra	0a4702407b	[VPlan] Improve code around canConstantBeExtended (NFC) (#161652 ) Follow up on 7c4f188 ([LV] Support multiplies by constants when forming scaled reductions), introducing m_APInt, and improving code around canConstantBeExtended: we change canConstantBeExtended to take an APInt.	2025-10-16 13:03:13 +01:00
Florian Hahn	4f23767852	[VPlan] Add m_FirstActiveLane matcher (NFC). Add m_FirstActiveLane, to slightly simplify pattern matching in preparation for https://github.com/llvm/llvm-project/pull/149042.	2025-10-15 18:55:26 +01:00
Florian Hahn	7f54fccc0e	[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056 ) When narrowing stores of a single-scalar, we currently use ExtractLastElement, which extracts the last element across all parts. This is not correct if the store's address is not uniform across all parts. If it is only uniform-per-part, the last lane per part must be extracted. Add a new ExtractLastLanePerPart opcode to handle this correctly. Most transforms apply to both ExtractLastElement and ExtractLastLanePerPart, with the only difference being their treatment during unrolling. Fixes https://github.com/llvm/llvm-project/issues/162498. PR: https://github.com/llvm/llvm-project/pull/163056	2025-10-15 13:46:09 +01:00
Ramkumar Ramachandra	869c76dda3	[VPlan] Allow zero-operand m_BranchOn(Cond|Count) (NFC) (#162721 )	2025-10-13 08:50:09 +01:00
Ramkumar Ramachandra	b716d35388	[VPlanPatternMatch] Introduce m_ConstantInt (#159558 )	2025-09-21 13:27:46 +01:00
Ramkumar Ramachandra	f1ba44f50a	[VPlan] Strip dead code in cst live-in match (NFC) (#159589 ) A live-in constant can never be of vector type.	2025-09-18 19:28:42 +01:00
Graham Hunter	6b99a7bbed	[LV] Provide utility routine to find uncounted exit recipes (#152530 ) Splitting out just the recipe finding code from #148626 into a utility function (along with the extra pattern matchers). Hopefully this makes reviewing a bit easier. Added a gtest, since this isn't actually used anywhere yet.	2025-09-18 15:45:23 +00:00
Ramkumar Ramachandra	f68f3b9a7e	[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550 )	2025-09-18 12:40:31 +01:00
Ramkumar Ramachandra	0384f6c9db	[VPlanPatternMatch] Introduce match functor (NFC) (#159521 ) Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to introduce the VPlanPatternMatch version of the match functor to shorten some idioms. Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-18 10:36:12 +01:00
Ramkumar Ramachandra	d012642be1	[VPlan] Match more GEP-like in m_GetElementPtr (#158019 ) The m_GetElementPtr matcher is incorrect and incomplete. Fix it to match all possible GEPs to avoid misleading users. It currently just has one use, and the change is non-functional for that use.	2025-09-15 20:06:37 +01:00
Ramkumar Ramachandra	148a83543b	[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419 )	2025-09-15 10:34:06 +00:00
Kerry McLaughlin	f0e9bba024	[LoopVectorize] Generate wide active lane masks (#147535 ) This patch adds a new flag (-enable-wide-lane-mask) which allows LoopVectorize to generate wider-than-VF active lane masks when it is safe to do so (i.e. the mask is used for data and control flow). The transform in extractFromWideActiveLaneMask creates vector extracts from the first active lane mask in the header & loop body, modifying the active lane mask phi operands to use the extracts. An additional operand is passed to the ActiveLaneMask instruction, the value of which is used as a multiplier of VF when generating the mask. By default this is 1, and is updated to UF by extractFromWideActiveLaneMask. The motivation for this change is to improve interleaved loops when SVE2.1 is available, where we can make use of the whilelo instruction which returns a predicate pair. This is based on a PR that was created by @momchil-velikov (#81140) and contains tests which were added there.	2025-09-01 13:53:30 +01:00
Florian Hahn	5ebd59806b	[VPlan] Fold BinaryAnd x, 0 -> 0 in simplifyRecipe. This also fixes a cost-model divergence in the newly added tests in constant-fold.ll	2025-08-27 22:35:08 +01:00
Luke Lau	f3520c538d	[VPlan] Replace EVL branch condition with (branch-on-count AVLNext, 0) (#152167 ) This changes the branch condition to use the AVL's backedge value instead of the EVL-based IV. This allows us to emit bnez on RISC-V and removes a use of the trip count, which should reduce register pressure. To match phis with VPlanPatternMatch I've had to relax the assert that the number of operands must exactly match the pattern for the Phi opcode, and I've copied over m_ZExtOrSelf from the LLVM IR PatternMatch.h. Fixes #151459	2025-08-26 11:19:19 +00:00
Ramkumar Ramachandra	66be00d635	[VPlan] Introduce m_Cmp; match more compares (#154771 ) Extend [Specific]Cmp_match to handle floating-point compares, and introduce m_Cmp that matches both integer and floating-point compares. Use it in simplifyRecipe to match and simplify the general case of compares. The change has necessitated a bugfix in VPReplicateRecipe::execute.	2025-08-24 13:27:06 +01:00
Florian Hahn	9f87cd68a4	[VPlan] Add m_ExtractLastElement matcher. (NFC)	2025-08-23 21:21:03 +01:00
Ramkumar Ramachandra	2975e674ec	[VPlan] Improve style in match_combine_or (NFC) (#154793 )	2025-08-22 12:01:42 +01:00
Ramkumar Ramachandra	de7bac6426	[VPlan/PatternMatch] Strip outdated hdr comment (NFC) (#154794 )	2025-08-21 20:43:03 +01:00
Luke Lau	5ef28e0a88	[VPlan] Add m_c_Add to VPlanPatternMatch. NFC (#154730 ) Same thing as #154705, and useful for simplifying the matching in #152167	2025-08-21 11:26:08 +00:00
Luke Lau	955c475ae6	[VPlan] Add m_Sub to VPlanPatternMatch. NFC (#154705 ) To mirror PatternMatch.h, and we'll also be able to use it in #152167	2025-08-21 09:33:46 +00:00
Luke Lau	af06835483	[VPlan] Use parameter packs to avoid unary/binary/ternary matchers. NFC (#152272 ) Instead of defining unary/binary/ternary/4ary overloads of each matcher, we can use parameter packs to support arbitrary numbers of operands. This allows us to remove the explicit N-ary definitions for each matcher. We need to rewrite Recipe_match's constructor to use a parameter pack too, otherwise we end up with ambiguous overloads.	2025-08-14 11:55:55 +08:00
Ramkumar Ramachandra	092388171f	[VPlan] Introduce m_[Specific]ICmp matcher (#151540 )	2025-08-06 20:35:35 +01:00
Luke Lau	2e5776130b	[VPlan] Simplify select !c, x, y -> select c, y, x (#147268 ) This is split off from #133993. On its own this simplification isn't that useful, but it allows us to make the equivalent VPBlendRecipe optimisation more generic by operating on VPInstructions. In order to actually test this without #133993, I've had to also extend the m_Not pattern matcher to also catch VPWidenRecipes, since I couldn't really think of a straightforward way to create a VPInstruction::Select with a negated condition.	2025-07-08 15:56:04 +08:00
Florian Hahn	aa24029319	[VPlan] Unroll VPReplicateRecipe by VF. (#142433 ) Explicitly unroll VPReplicateRecipes outside replicate regions by VF, replacing them by VF single-scalar recipes. Extracts for operands are added as needed and the scalar results are combined to a vector using a new BuildVector VPInstruction. It also adds a few folds to simplify unnecessary extracts/BuildVectors. It also adds a BuildStructVector opcode for handling of calls that have struct return types. VPReplicateRecipes in replicate regions will be unrolled as a follow-up, turning non-single-scalar VPReplicateRecipes into 'abstract', i.e. not executable. PR: https://github.com/llvm/llvm-project/pull/142433	2025-06-26 11:19:09 +01:00
Florian Hahn	e8be733a3c	[VPlan] Remove redundant ExtractLastElement from vector-to-scalar VPI. Recipes that are vector-to-scalar are guaranteed to generate a scalar value, so the extract is redundant after VPlan unrolling. Remove it. This removes unneeded ExtractLastElement VPInstruction of reduction result computations.	2025-06-20 12:45:20 +01:00
Florian Hahn	f68848015f	[VPlan] Manage Sentinel value for FindLastIV in VPlan. (#142291 ) Similar to modeling the start value as operand, also model the sentinel value as operand explicitly. This makes all required information for code-gen available directly in VPlan. PR: https://github.com/llvm/llvm-project/pull/142291	2025-06-13 19:17:01 +01:00
Florian Hahn	5520ab3d50	[VPlan] Add ComputeAnyOfResult VPInstruction (NFC) (#141932 ) Add a dedicated opcode for any-of reduction, similar to https://github.com/llvm/llvm-project/pull/132689 and https://github.com/llvm/llvm-project/pull/132690. The patch also explicitly adds the start value to not require RecurrenceDescriptor during execute. It also allows freezing the start value to make it poison-safe. PR: https://github.com/llvm/llvm-project/pull/141932	2025-06-03 14:33:53 +01:00
Graham Hunter	5b9246517f	[LV] Fix ScalarIVSteps vplan pattern matcher, remove m_CanonicalIV() (#138298 ) 783a846 changed VPScalarIVStepsRecipe to take 3 arguments (adding VF explicitly) instead of 2, but didn't change the corresponding pattern matcher. This matcher was only used in vputils::isHeaderMask, and no test ever reached that function with a ScalarIVSteps recipe for the value being matched -- it was always a WideCanonicalIV. So the matcher bailed out immediately before checking arguments and asserting that the number of arguments in the recipe was the same provided by the matcher. Since the constructors for ScalarIVSteps take 3 values, we should be safe to update the matcher and guard it with a dedicated gtest. m_CanonicalIV() on the other hand is removed; as a phi recipe it may not have a consistent number of arguments to match, only requiring one (the start value) when being constructed with the assumption that a second incoming value is added for the backedge later. In order to keep the matcher we would need to add multiple matchers with different numbers of arguments for it depending on what phase of vplan construction we were in, and ensure that we never reorder matcher usage vs. vplan transformation. Since the main IR PatternMatch.h doesn't contain any matchers for PHI nodes, I think we can just remove it and match via m_Specific() using the VPValue we get from Plan.getCanonicalIV().	2025-05-14 15:01:03 +01:00
Luke Lau	3883b27ba8	[VPlan] Fix typo in assertion. NFC (#137009 )	2025-04-24 16:36:32 +08:00
Luke Lau	41675fa5b8	[VPlan] Simplify vp.merge true, (or x, y), x -> vp.merge y, true, x (#135017 ) With EVL tail folding an AnyOf reduction will emit an i1 vp.merge like `vp.merge true, (or phi, cond), phi, evl`. We can remove the or and optimise this to `vp.merge cond, true, phi, evl`, which makes it slightly easier to pattern match in #134898. This also adds a pattern matcher for calls to help match this. Blended AnyOf reductions will use an and instead of an or, which we may also be able to simplify in a later patch.	2025-04-17 16:31:14 +02:00
Florian Hahn	0f607f3df5	[VPlan] Simplify 'or x, true' -> true. Add additional OR simplification to fix a divergence between legacy and VPlan-based cost model. This adds a new m_AllOnes matcher by generalizing specific_intval to int_pred_ty, which takes a predicate to check to support matching both specific APInts and other APInt predicates, like isAllOnes. Fixes https://github.com/llvm/llvm-project/issues/131359.	2025-04-13 12:09:40 +01:00
Luke Lau	b739a3cb65	[VPlan] Add m_Deferred. NFC (#133736 ) This copies over the implementation of m_Deferred which allows matching values that were bound in the pattern, and uses it for the (X && Y) || (X && !Y) -> X simplification.	2025-03-31 21:01:28 +01:00
Hari Limaye	bf5627c85e	[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828 ) Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the narrowest type necessary, when the trip-count of a loop is known to be constant and the only use of the recipe is the condition used by the vector loop's backedge branch.	2025-03-28 14:47:40 +00:00
Florian Hahn	8ddbc01295	[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690 ) Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 PR: https://github.com/llvm/llvm-project/pull/132690	2025-03-27 18:34:13 +00:00
Luke Lau	e23ab73335	[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180 ) This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote. This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform. IIUC we initially did this to avoid `vl` toggles on RISC-V. However we now have the RISCVVLOptimizer pass which mostly makes this redundant. Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get noticeably better code generation.	2025-02-22 19:38:11 +08:00
Florian Hahn	6ff8a06de9	[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926 ) Run recipe simplification and dead recipe removal after VPlan-based unrolling and optimizeForVFAndUF, to clean up any redundant or dead recipes introduced by them. Currently this is NFC, as it removes the corresponding removeDeadRecipes run in optimizeForVFAndUF and no additional simplifications kick in after unrolling yet. That is changing with https://github.com/llvm/llvm-project/pull/123655. Note that with this change, pattern-matching is now applied after EVL-based recipes have been introduced. Trying to match VPWidenEVLRecipe when not explicitly requested might apply a pattern with 2 operands to one with 3 due to the extra EVL operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe. To prevent this, update Recipe_match::match to only match VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy). PR: https://github.com/llvm/llvm-project/pull/125926	2025-02-08 13:33:46 +00:00
Florian Hahn	049aa179dc	[VPlan] Simplify operand tuple matching in VPlanPatternMatch (NFC). Remove some indirection when matching recipe and matcher operands by directly using fold over parameter pack.	2025-02-06 21:00:44 +00:00
Florian Hahn	585b75ec9a	[VPlan] Simplify matching recipe ty and opcode in pattern match (NFC). Use parameter pack fold to simplify matching of recipe types and opcodes for RecipeTys parameter pack.	2025-02-05 20:03:42 +00:00
Florian Hahn	df4a615c98	[VPlan] Convert induction increment check to be VPlan-based. Check the VPlan directly to determine if a VPValue is an optimizable IV or IV use instead of checking the underlying IR instructions. Split off from https://github.com/llvm/llvm-project/pull/112147. This refactoring enables moving IV end value creation from the legacy fixupIVUsers to a VPlan-based transform. There is one case we now won't optimize, that is IVs with subtracts and non-constant steps. But as this is a minor optimization and doesn't impact correctness, the benefits of performing the check in VPlan should outweigh the missed case.	2025-01-05 11:16:01 +00:00
Florian Hahn	e1833e3a7e	[VPlan] Simplify redundant VPDerivedIVRecipe (NFC). Split DerivedIV simplification off from https://github.com/llvm/llvm-project/pull/112145 and use to remove the need for extra checks in createScalarIVSteps. Required an extra simplification run after IV transforms.	2024-12-22 09:39:19 +00:00
Mel Chen	4480a22c2b	[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (#101641 ) Following #90184, this patch emits the vp.merge intrinsic, which is used to set the inactive lanes in a select operation to the RHS instead of undef. Currently, it is applied to out-loop reduction for EVL vectorization. This patch converts select(header_mask, LHS, RHS) into vp.merge(all-true, LHS, RHS, EVL), and always uses the predicated reduction select to set the incoming value of the reduction phi, to support out-loop reduction when using tail folding with EVL. TODO: Postpone the adjustment of the predicated reduction select to VPlanTransform. The current adjustment might be too early, which could lead to a situation where the predicated reduction select is adjusted, but the EVL recipes cannot be successfully generated during VPlanTransform.	2024-10-06 14:53:49 +08:00
Florian Hahn	1d9b3222f3	[VPlan] Implement VPWidenSelectRecipe::computeCost. Implement VPlan-based cost computation for VPWidenSelectRecipe.	2024-10-22 03:10:04 +01:00
Florian Hahn	dac0f7e83e	[VPlan] Add general recipe matcher, replace handwritten ones (NFC) The new matcher is more flexible and can be used to build matchers for additional recipe types without unnecessary duplication.	2024-10-21 16:46:45 -07:00
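
Several entries above (af06835483, 049aa179dc, b739a3cb65) describe matchers built from parameter packs, operand matching via a fold over the pack, and m_Deferred for values bound earlier in the same pattern. The following is a minimal standalone sketch of that idea, not the code in VPlanPatternMatch.h: `Node`, `Opcode`, `BindMatch`, `DeferredMatch`, `OpMatch`, `m_Node`, `m_Op` and `matchRedundantOr` are hypothetical names over a toy IR, and only the name `m_Deferred` is borrowed from the log.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical toy IR node; this is not LLVM's VPValue or VPRecipeBase.
enum class Opcode { Leaf, Not, And, Or };

struct Node {
  Opcode Op = Opcode::Leaf;
  std::vector<const Node *> Operands;
};

// Matches any node and records it in a caller-provided slot.
struct BindMatch {
  const Node *&Slot;
  bool match(const Node *N) const {
    Slot = N;
    return true;
  }
};

// Matches only the node that an earlier BindMatch stored in the slot.
struct DeferredMatch {
  const Node *const &Slot;
  bool match(const Node *N) const { return N == Slot; }
};

// Matches an opcode plus an arbitrary number of operand sub-patterns.
template <typename... OpTys> struct OpMatch {
  Opcode Op;
  std::tuple<OpTys...> Ops;

  bool match(const Node *N) const {
    if (N->Op != Op || N->Operands.size() != sizeof...(OpTys))
      return false;
    return matchOperands(N, std::index_sequence_for<OpTys...>{});
  }

private:
  template <std::size_t... Is>
  bool matchOperands(const Node *N, std::index_sequence<Is...>) const {
    // Fold over the pack: operand I must satisfy sub-pattern I. The &&
    // fold evaluates left to right, so binds happen before deferred uses.
    return (std::get<Is>(Ops).match(N->Operands[Is]) && ...);
  }
};

inline BindMatch m_Node(const Node *&Slot) { return {Slot}; }
inline DeferredMatch m_Deferred(const Node *const &Slot) { return {Slot}; }

template <typename... OpTys> OpMatch<OpTys...> m_Op(Opcode Op, OpTys... Ops) {
  return {Op, std::tuple<OpTys...>(Ops...)};
}

// Recognises (X && Y) || (X && !Y) and returns X, or nullptr on no match.
inline const Node *matchRedundantOr(const Node *N) {
  const Node *X = nullptr, *Y = nullptr;
  auto Pattern =
      m_Op(Opcode::Or, m_Op(Opcode::And, m_Node(X), m_Node(Y)),
           m_Op(Opcode::And, m_Deferred(X), m_Op(Opcode::Not, m_Deferred(Y))));
  return Pattern.match(N) ? X : nullptr;
}
```

One `m_Op` template covers unary, binary and wider patterns, which is the kind of duplication the parameter-pack commit says it removes. The sketch does not try commuted operand orders.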

64 Commits
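
Commit 62d1a080e6 above describes lowering LastActiveLane to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). A scalar model of that arithmetic might look like the sketch below. The helper names are hypothetical, and the model assumes that the mask is a prefix of active lanes with at least one active lane, and that a first-active-lane query on a mask with no set lanes yields the lane count; neither assumption is stated in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = std::vector<bool>;

// Lane-wise negation of the mask.
inline Mask notMask(const Mask &M) {
  Mask R(M.size());
  for (std::size_t I = 0; I < M.size(); ++I)
    R[I] = !M[I];
  return R;
}

// Index of the first set lane. Assumption of this model: returns the
// number of lanes when no lane is set.
inline std::size_t firstActiveLane(const Mask &M) {
  for (std::size_t I = 0; I < M.size(); ++I)
    if (M[I])
      return I;
  return M.size();
}

// Last active lane of a prefix mask: the first inactive lane minus one.
// Requires at least one active lane, otherwise the subtraction wraps.
inline std::size_t lastActiveLane(const Mask &M) {
  return firstActiveLane(notMask(M)) - 1;
}
```

For a four-lane mask with the first three lanes active, the first inactive lane is 3, so the last active lane is 2. For a fully active mask, the negated mask has no set lane, the query returns 4, and the result is 3.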