llvm-project

Author	SHA1	Message	Date
Florian Hahn	47258ca470	[VPlan] Use VPPhi instead of dyn_cast + opcode check in isPhi (NFC).	2025-08-05 19:20:12 +01:00
Luke Lau	94a6cd464e	[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274 ) This is the VPWidenPointerInductionRecipe equivalent of #118638, with the motivation of allowing us to use the EVL as the induction step. There is a new VPInstruction added, WidePtrAdd to allow adding the step vector to the induction phi, since VPInstruction::PtrAdd only handles scalars or multiple scalar lanes. Originally this transformation was copied from the original recipe's execute code, but it's since been simplified by teaching `unrollWidenInductionByUF` to unroll the recipe, which brings it in line with VPWidenIntOrFpInductionRecipe.	2025-08-05 16:54:02 +08:00
Florian Hahn	39c30665e9	[VPlan] Update type of cloned instruction in scalarizeInstruction. The operands of the replicate recipe may have been narrowed, resulting in a narrower result type. Update the type of the cloned instruction to the correct type. Fixes https://github.com/llvm/llvm-project/issues/151392.	2025-08-02 19:49:59 +01:00
Mel Chen	86916ff0f0	[LV] Fix gap mask requirement for interleaved access (#151105 ) When interleaved stores contain gaps, a mask is required to skip the gaps, regardless of whether scalar epilogues are allowed. This patch corrects the condition under which a gap mask is needed, ensuring consistency between the legacy and VPlan-based cost models and avoiding assertion failures. Related #149981	2025-08-01 14:24:30 +08:00
Samuel Tebbs	339b0a1d74	[LV][NFCI] Format fcc419b05f62	2025-07-31 14:37:59 +01:00
Samuel Tebbs	fcc419b05f	[LV][NFCI] Swap reduction recipe operand order https://github.com/llvm/llvm-project/pull/147026 will enable sub reductions, which require that the phi value is the first operand since they aren't commutative. This re-orders the operands when executing reductions, which actually matches other existing code in VPReductionRecipe::execute.	2025-07-31 14:35:10 +01:00
David Sherwood	6fbc397964	[IR] Add new CreateVectorInterleave interface (#150931 ) This PR adds a new interface to IRBuilder called CreateVectorInterleave, which can be used to create vector.interleave intrinsics of factors 2-8. For convenience I have also moved getInterleaveIntrinsicID and getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp where it can be used by IRBuilder.	2025-07-29 08:47:07 +01:00
Florian Hahn	4386848776	[VPlan] Add explicit VPUnrollPartAccessor<1> instantiation. This should fix a build-failure with GCC, including https://lab.llvm.org/buildbot/#/builders/105/builds/10685.	2025-07-27 14:05:23 +01:00
Florian Hahn	80c43b6c07	[VPlan] Add ExtractLane VPInst to extract across multiple parts. (#148817 ) This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane. The patch updates early-exit codegen to use it instead of ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops. The patch removes the restrictions added in 6f43754e9 (#145877), but does not yet automatically select interleave counts > 1 for early-exit loops. I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count. PR: https://github.com/llvm/llvm-project/pull/148817	2025-07-27 08:08:25 +01:00
Florian Hahn	1640d51bf8	[VPlan] Mark getUnrollPart argument as const (NFC).	2025-07-25 10:49:33 +01:00
Luke Lau	9563e7a940	[VPlan] Mark VPInstruction::ExplicitVectorLength as single scalar. NFC (#150221 ) This allows it to be broadcasted without an explicit VPInstruction::Broadcast in #150202	2025-07-23 22:38:21 +08:00
Luke Lau	114d74e391	[VPlan] Expand VPBlendRecipes to select instructions. NFC (#133993 ) When looking at some EVL tail folded code in SPEC CPU 2017 I noticed we sometimes have both VPBlendRecipes and select VPInstructions in the same plan: EMIT vp<%active.lane.mask> = active lane mask vp<%5>, vp<%3> EMIT vp<%7> = icmp ... EMIT vp<%8> = logical-and vp<%active.lane.mask>, vp<%7> BLEND ir<%8> = ir<%n.015> ir<%foo>/vp<%8> EMIT vp<%9> = select vp<%active.lane.mask>, ir<%8>, ir<%n.015> Since a blend will ultimately generate a chain of selects, we could fold the blend into the select: EMIT vp<%active.lane.mask> = active lane mask vp<%5>, vp<%3> EMIT vp<%7> = icmp ... EMIT vp<%8> = logical-and vp<%active.lane.mask>, vp<%7> EMIT ir<%8> = select vp<%8>, ir<%foo>, ir<%n.015> So as a first step, this patch expands blends to a series of select instructions, which may allow them to be simplified further with other select instructions.	2025-07-23 20:09:33 +08:00
Mel Chen	6752369139	[LV] Unify interleaved load handling for fixed and scalable VFs. nfc (#146914 ) This patch modifies VPInterleaveRecipe::execute to handle both fixed and scalable VFs using a single loop.	2025-07-22 09:00:10 +08:00
Florian Hahn	004c67ea25	[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239 ) Update LV to vectorize maxnum/minnum reductions without fast-math flags, by adding an extra check in the loop if any inputs to maxnum/minnum are NaN, due to maxnum/minnum behavior w.r.t. signaling NaNs. Signed-zeros are already handled consistently by maxnum/minnum. If any input is NaN, exit the vector loop, compute the reduction result up to the vector iteration that contained NaN inputs, and resume in the scalar loop. New recurrence kinds are added for reductions using maxnum/minnum without fast-math flags. PR: https://github.com/llvm/llvm-project/pull/148239	2025-07-18 21:58:19 +01:00
Florian Hahn	a40dc05898	[VPlan] Mark canonical IV and reduction phis as not writing memory (NFC). Both recipes do not write to memory. Should be NFC at the moment, as they cannot be removed currently due to being in a cycle.	2025-07-15 11:08:54 +01:00
Luke Lau	c8d0e24745	[VPlan] Preserve trunc nuw/nsw in VPRecipeWithIRFlags (#144700 ) This preserves the nuw/nsw flags on widened truncs by checking for TruncInst in the VPIRFlags constructor The motivation for this is to be able to fold away some redundant truncs feeding into uitofps (or potentially narrow the inductions feeding them)	2025-07-15 15:34:14 +08:00
Kazu Hirata	649347e208	[Vectorize] Remove unnecessary casts (NFC) (#148116 ) &Ingredient is already of Instruction *.	2025-07-11 09:52:42 -07:00
Ramkumar Ramachandra	62f8377e40	[LV] Extend FindFirstIV to unsigned case (#146386 ) Extend FindFirstIV vectorization to the unsigned case by introducing and handling FindFirstIVUMin. Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-07-09 15:56:40 +01:00
Florian Hahn	6a9a16da7a	[VPlan] Replace RdxDesc with RecurKind in VPReductionPHIRecipe (NFC). (#142322 ) Replace RdxDesc with RecurKind in VPReductionPHIRecipe, as all VPlan analyses and codegen only require the recurrence kind. This enables creating new VPReductionPHIRecipe directly in LV, without needing to construct a whole RecurrenceDescriptor object. Depends on https://github.com/llvm/llvm-project/pull/141860 https://github.com/llvm/llvm-project/pull/141932 https://github.com/llvm/llvm-project/pull/142290 https://github.com/llvm/llvm-project/pull/142291 PR: https://github.com/llvm/llvm-project/pull/142322	2025-07-06 21:40:42 +01:00
Mel Chen	fcdb91e113	[VPlan] Remove redundant debug location setting in VPInterleaveRecipe::execute. nfc (#146670 ) Remove it since we already set debug loc in VPBasicBlock::executeRecipes.	2025-07-03 15:52:45 +08:00
Mel Chen	3e370452fd	[VPlan] Early assert for unsupported interleaved access features. nfc (#146669 )	2025-07-03 15:31:53 +08:00
David Sherwood	f575b18fdc	[LV] Add support for partial reductions without a binary op (#133922 ) Consider IR such as this: for.body: %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] %accum = phi i32 [ 0, %entry ], [ %add, %for.body ] %gep.a = getelementptr i8, ptr %a, i64 %iv %load.a = load i8, ptr %gep.a, align 1 %ext.a = zext i8 %load.a to i32 %add = add i32 %ext.a, %accum %iv.next = add i64 %iv, 1 %exitcond.not = icmp eq i64 %iv.next, 1025 br i1 %exitcond.not, label %for.exit, label %for.body Conceptually we can vectorise this using partial reductions too, although the current loop vectoriser implementation requires the accumulation of a multiply. For AArch64 this is easily done with a udot or sdot with an identity operand, i.e. a vector of (i16 1). In order to do this I had to teach getScaledReductions that the accumulated value may come from a unary op, hence there is only one extension to consider. Similarly, I updated the vplan and AArch64 TTI cost model to understand the possible unary op. --------- Co-authored-by: Matt Devereau <matthew.devereau@arm.com>	2025-07-02 13:05:51 +01:00
Mel Chen	bc8dad1c7e	[VPlan] Emit VPVectorEndPointerRecipe for reverse interleave pointer adjustment (#144864 ) A reverse interleave access is essentially composed of multiple load/store operations with the same negative stride, and their addresses are based on the last lane address of member 0 in the interleaved group. Currently, we already have VPVectorEndPointerRecipe for computing the last lane address of consecutive reverse memory accesses. This patch extends VPVectorEndPointerRecipe to support constant stride and extracts the reverse interleave group address adjustment from VPInterleaveRecipe::execute, replacing it with a VPVectorEndPointerRecipe. The final goal is to support interleaved accesses with EVL tail folding. Given that VPInterleaveRecipe is large and tightly coupled (combining both load and store, and embedding operations like reverse pointer adjustment (GEP), widen load/store, deinterleave/interleave, and reversal), breaking it down into smaller, dedicated recipes may allow VPlanTransforms::tryAddExplicitVectorLength to lower them into EVL-aware form more effectively. One foreseeable challenge is that VPlanTransforms::convertToConcreteRecipes currently runs after tryAddExplicitVectorLength, so decomposing VPInterleaveRecipe will likely need to happen earlier in the pipeline to be effective.	2025-07-02 18:16:02 +08:00
Simon Pilgrim	651c5208f8	VPlanRecipes.cpp - fix "'llvm::VPExpressionRecipe::computeCost': not all control paths return a value" MSVC warning. NFC.	2025-07-02 09:59:01 +01:00
Florian Hahn	6b3d2b629c	[VPlan] Add VPExpressionRecipe, replacing extended reduction recipes. (#144281 ) This patch adds a new recipe to combine multiple recipes into an 'expression' recipe, which should be considered as single entity for cost-modeling and transforms. The recipe needs to be 'decomposed', i.e. replaced by its individual recipes before execute. This subsumes VPExtendedReductionRecipe and VPMulAccumulateReductionRecipe and should make it easier to extend to include more types of bundled patterns, like e.g. extends folded into loads or various arithmetic instructions, if supported by the target. It allows avoiding re-creating the original recipes when converting to concrete recipes, together with removing the need to record various information. The current version of the patch still retains the original printing matching VPExtendedReductionRecipe and VPMulAccumulateReductionRecipe, but this specialized print could be replaced with printing the bundled recipes directly. PR: https://github.com/llvm/llvm-project/pull/144281	2025-07-01 20:44:50 +01:00
Florian Hahn	59a7185dd9	[VPlan] Truncate/Extend ComputeReductionResult at construction (NFC). (#141860 ) Instead of looking up the narrower reduction type via getRecurrenceType we can generate the needed extend directly at construction and re-use the truncated value from the loop. PR: https://github.com/llvm/llvm-project/pull/141860	2025-06-30 22:39:17 +01:00
Florian Hahn	20fbbd7675	[LV] Add support for cmp reductions with decreasing IVs. (#140451 ) Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y) reductions where one of x or y is a decreasing induction, producing a SMin reduction. It uses signed max as sentinel value. PR: https://github.com/llvm/llvm-project/pull/140451	2025-06-29 11:17:03 +01:00
Florian Hahn	ec62dee703	[VPlan] Handle FirstActiveLane when unrolling. (#145394 ) Currently FirstActiveLane is not handled correctly during unrolling. This is currently causing mis-compiles when vectorizing early-exit loops with interleaving forced. This patch updates handling of FirstActiveLane to be analogous to computing final reduction results: during unrolling, the created copies for its original operand are added as additional operands, and FirstActiveLane will always produce the index of the first active lane across all unrolled iterations. Note that some of the generated code is still incorrect, as we also need to handle ExtractElement with FirstActiveLane operands. I will share patches for those soon as well. PR: https://github.com/llvm/llvm-project/pull/145394	2025-06-27 08:44:57 +01:00
Florian Hahn	5b76cdba5a	[VPlan] Handle AnyOf when unrolling. (#145340 ) Currently AnyOf is not handled correctly during unrolling. This is currently causing mis-compiles when vectorizing early-exit loops with interleaving forced (even though selectInterleaveCount will currently only pick IC = 1, unless forced by the user). This patch updates handling of AnyOf to be analogous to computing final reduction results: during unrolling, the created copies for its original operand are added as additional operands, and AnyOf will always produce the reduced value across all unrolled iterations. Note that the generated code is still incorrect, as we also need to handle FirstActiveLane and ExtractElement with FirstActiveLane operands. I will share patches for those soon as well. PR: https://github.com/llvm/llvm-project/pull/145340	2025-06-26 14:19:38 +01:00
Florian Hahn	aa24029319	[VPlan] Unroll VPReplicateRecipe by VF. (#142433 ) Explicitly unroll VPReplicateRecipes outside replicate regions by VF, replacing them by VF single-scalar recipes. Extracts for operands are added as needed and the scalar results are combined to a vector using a new BuildVector VPInstruction. It also adds a few folds to simplify unnecessary extracts/BuildVectors. It also adds a BuildStructVector opcode for handling of calls that have struct return types. VPReplicateRecipe in replicate regions will be unrolled as a follow-up, turning non-single-scalar VPReplicateRecipes into 'abstract', i.e. not executable. PR: https://github.com/llvm/llvm-project/pull/142433	2025-06-26 11:19:09 +01:00
Florian Hahn	c3e25e7fc4	[VPlan] Add VPInst::getNumOperandsForOpcode, use to verify in ctor (NFC) (#142284 ) Add a new getNumOperandsForOpcode helper to determine the number of operands from the opcode. For now, it is used to verify the number of operands at VPInstruction construction. It returns -1 for a few opcodes where the number of operands cannot be determined (GEP, Switch, PHI, Call). This can also be used in a follow-up to determine if a VPInstruction is masked based on the number of arguments. PR: https://github.com/llvm/llvm-project/pull/142284	2025-06-24 20:39:35 +01:00
Ramkumar Ramachandra	bb8c42e859	[LV] Extend FindLastIV to unsigned case (#141752 ) Split the FindLastIV RecurKind into SMax and UMax variants, depending on the reduction op produced.	2025-06-23 15:27:49 +01:00
Florian Hahn	9f7a155394	[VPlan] Update packScalarIntoVector to take and return wide value (NFC) Make the function more flexible in preparation for new users.	2025-06-21 18:03:14 +01:00
David Green	77941eba7f	[CostModel] Add a DstTy to getShuffleCost (#141634 ) A shuffle will take two input vectors and a mask, to produce a new vector of size <MaskElts x SrcEltTy>. Historically it has been assumed that the SrcTy and the DstTy are the same for getShuffleCost, with that being relaxed in recent years. If the Tp passed to getShuffleCost is the SrcTy, then the DstTy can be calculated from the Mask elts and the src elt size, but the Mask is not always provided and the Tp is not reliably always the SrcTy. This has led to situations notably in the SLP vectorizer but also in the generic cost routines where assumptions about how vectors will be legalized are built into the generic cost routines - for example whether they will widen or promote, with the cost modelling assuming they will widen but the default lowering to promote for integer vectors. This patch attempts to start improving that - it originally tried to alter more of the cost model but that too quickly became too many changes at once, so this patch just plumbs in a DstTy to getShuffleCost so that DstTy and SrcTy can be reliably distinguished. The callers of getShuffleCost have been updated to try and include a DstTy that is more accurate. Otherwise it tries to be fairly non-functional, keeping the SrcTy used as the primary type used in shuffle cost routines, only using DstTy where it was in the past (for InsertSubVector for example). Some asserts have been added that help to check for consistent values when a Mask and a DstTy are provided to getShuffleCost. Some of them took a while to get right, and some non-mask calls might still be incorrect. Hopefully this will provide a useful base to build more shuffles that alter size.	2025-06-21 12:29:29 +01:00
Florian Hahn	2f5d965bb5	[VPlan] Use EMIT-SCALAR when printing casts. Split off EMIT-SCALAR printing changes from already approved https://github.com/llvm/llvm-project/pull/140623. Currently all casts are single scalars, this brings printing in line with printing for other VPInstructions.	2025-06-21 10:23:53 +01:00
Luke Lau	a2b8a93ff9	[VPlan] Pass NumUnrolledElems as operand to VPWidenPointerInductionRecipe. NFC (#119859 ) Similarly to VPWidenIntOrFpInductionRecipe, if we want to support it in EVL tail folding we need to increment the induction by EVL steps instead of VF*UF steps, but currently this is hard-wired in VPWidenPointerInductionRecipe. This adds an operand for the number of elements unrolled and plumbs it through, so that we can swap it out in VPlanTransforms::tryAddExplicitVectorLength further down the line.	2025-06-20 15:46:52 +01:00
Kazu Hirata	64fe323647	[llvm] Migrate away from ArrayRef(std::nullopt) (NFC) (#144967 ) ArrayRef has a constructor that accepts std::nullopt. This constructor dates back to the days when we still had llvm::Optional. Since the use of std::nullopt outside the context of std::optional is kind of abuse and not intuitive to newcomers, I would like to move away from the constructor and eventually remove it. This patch takes care of the llvm side of the migration.	2025-06-19 21:31:26 -07:00
Philip Reames	b96370131d	[TTI] Plumb CostKind through getPartialReductionCost (#144953 ) Purely for the sake of being idiomatic with other TTI costing routines, no direct motivation beyond that.	2025-06-19 15:29:56 -07:00
Mel Chen	ba40a7bc2e	[LoopVectorize] Vectorize fixed-order recurrence with vscale x 1. (#142772 ) When the fixed-order recurrence phi is live-out from the loop, the vectorizer uses VPInstruction::ExtractPenultimateElement to extract the penultimate element from the recurrence vector. However, this is not feasible when the VF is vscale x 1, since vscale could be 1, making the vector contain only one element. This patch changes the behavior for vscale x 1 by extracting the last element from the vector produced by splicing the recurrence phi and the previous value. This ensures we can still determine the correct live-out value of the recurrence phi.	2025-06-18 16:03:20 +08:00
Luke Lau	9dd1c66e8f	[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes (#118638 ) The motivation of this PR is to make #115274 easier to implement, and should allow us to add EVL support by just passing EVL to the VF operand. The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body. The gist in this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example: ``` vector.ph: Successor(s): vector loop <x1> vector loop: { vector.body: WIDEN-INDUCTION %i = phi %start, %step, %vf ... EMIT branch-on-count ... No successors } ``` gets expanded to: ``` vector.ph: ... vp<%induction.start> = ... vp<%induction.increment> = ... Successor(s): vector loop <x1> vector loop: { vector.body: ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next> ... vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment> EMIT branch-on-count ... No successors } ``` This allows us to use a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves. After this #115274 should just become a matter of setting the VF operand to EVL (and building the increment step in the loop body, not the preheader).	2025-06-17 18:24:07 +01:00
Florian Hahn	790df93298	[VPlan] Mark VPFirstOrderRecurrencePHI as not reading/writing memory. First-order recurrence phis don't have side-effects and don't read or write memory. Mark them as such.	2025-06-15 22:00:47 +01:00
Florian Hahn	577199f922	Reapply "[VPlan] Set branch weight metadata on middle term in VPlan (NFC) (#143035 )" This reverts commit 0604dc199c019b23746f4a54885ba0c75569cdae. The recommitted version addresses post-commit comments and adjusts the place the branch weights are added. It now runs before VPlans are optimized for VF and UF, which may remove the vector loop region, causing a crash trying to get the middle block after that. Test case added in 72f99b75afc12bb. Original message: Manage branch weights for the BranchOnCond in the middle block in VPlan. This requires updating VPInstruction to inherit from VPIRMetadata, which in general makes sense as there are a number of opcodes that could take metadata. There are other branches (part of the skeleton) that also need branch weights adding. PR: https://github.com/llvm/llvm-project/pull/143035	2025-06-14 17:20:46 +01:00
Florian Hahn	732ebf803b	[VPlan] Address post-commit comments for f68848015f62. Assign sentinel value to named variable to clarify naming and update comments. Addresses post-commit comments from https://github.com/llvm/llvm-project/pull/142291.	2025-06-14 10:44:20 +01:00
Florian Hahn	f68848015f	[VPlan] Manage Sentinel value for FindLastIV in VPlan. (#142291 ) Similar to modeling the start value as operand, also model the sentinel value as operand explicitly. This makes all required information for code-gen available directly in VPlan. PR: https://github.com/llvm/llvm-project/pull/142291	2025-06-13 19:17:01 +01:00
David Sherwood	541e5118ce	[LV] Use getFixedValue instead of getKnownMinValue when appropriate (#143526 ) There are many places in VPlan and LoopVectorize where we use getKnownMinValue to discover the number of elements in a vector. Where we expect the vector to have a fixed length, I have used the stronger getFixedValue call. I believe this is clearer and adds extra protection in the form of an assert in getFixedValue that the vector is not scalable. While looking at VPFirstOrderRecurrencePHIRecipe::computeCost I also took the liberty of simplifying the code. In theory I believe this patch should be NFC, but I'm reluctant to add that to the title in case we're just missing tests for some of the VPlan changes. I built and ran the LLVM test suite when targeting neoverse-v1 and it seemed ok.	2025-06-13 11:43:50 +01:00
Philip Reames	8ee9646b06	[LV] Simplify creation of vp.load/vp.store/vp.reduce intrinsics (#143804 ) The use of VectorBuilder here was simply obscuring what was actually going on. For vp.load and vp.store, the resulting code is significantly more idiomatic. For the vp.reduce cases, we remove several layers of indirection, including passing parameters via implicit state on the builder. In both cases, the code is significantly easier to follow.	2025-06-12 13:46:06 -07:00
Hans Wennborg	0604dc199c	Revert "[VPlan] Set branch weight metadata on middle term in VPlan (NFC) (#143035 )" This caused assertion failures: llvm/lib/Transforms/Vectorize/VPlan.h:4021: llvm::VPBasicBlock* llvm::VPlan::getMiddleBlock(): Assertion `LoopRegion && "cannot call the function after vector loop region has been removed"' failed. See comment on the PR. > Manage branch weights for the BranchOnCond in the middle block in VPlan. > This requires updating VPInstruction to inherit from VPIRMetadata, which > in general makes sense as there are a number of opcodes that could take > metadata. > > There are other branches (part of the skeleton) that also need branch > weights adding. > > PR: https://github.com/llvm/llvm-project/pull/143035 This reverts commit db8d34db26e9ea92c08d6e813eca9cce40c48478.	2025-06-12 13:52:05 +02:00
Luke Lau	7ef77eb998	[LV] Support scalable interleave groups for factors 3,5,6 and 7 (#141865 ) Currently the loop vectorizer can only vectorize interleave groups for power-of-2 factors at scalable VFs by recursively interleaving [de]interleave2 intrinsics. However after https://github.com/llvm/llvm-project/pull/124825 and #139893, we now have [de]interleave intrinsics for all factors up to 8, which is enough to support all types of segmented loads and stores on RISC-V. Now that the interleaved access pass has been taught to lower these in #139373 and #141512, this patch teaches the loop vectorizer to emit these intrinsics for factors up to 8, which enables scalable vectorization for non-power-of-2 factors. As far as I'm aware, no in-tree target will vectorize a scalable interleave group above factor 8 because the maximum interleave factor is capped at 4 on AArch64 and 8 on RISC-V, and the `-max-interleave-group-factor` CLI option defaults to 8, so the recursive [de]interleaving code has been removed for now. Factors of 3 with scalable VFs are also turned off in AArch64 since there's no lowering for [de]interleave3 just yet either.	2025-06-12 11:09:09 +01:00
Florian Hahn	db8d34db26	[VPlan] Set branch weight metadata on middle term in VPlan (NFC) (#143035 ) Manage branch weights for the BranchOnCond in the middle block in VPlan. This requires updating VPInstruction to inherit from VPIRMetadata, which in general makes sense as there are a number of opcodes that could take metadata. There are other branches (part of the skeleton) that also need branch weights adding. PR: https://github.com/llvm/llvm-project/pull/143035	2025-06-12 10:04:08 +01:00
Florian Hahn	6108d50aed	[VPlan] Add ReductionStartVector VPInstruction. (#142290 ) Add a new VPInstruction::ReductionStartVector opcode to create the start values for wide reductions. This more accurately models the start value creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the line it also allows removing VPReductionPHIRecipe::RdxDesc. PR: https://github.com/llvm/llvm-project/pull/142290	2025-06-09 20:59:12 +01:00
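
Commit 114d74e391 above says that a blend ultimately generates a chain of selects and can therefore be folded into a neighbouring select. The following is a minimal Python sketch of that idea on plain lists, not LLVM code. The helper names `select` and `expand_blend` are hypothetical, and the lane values are made up for illustration.

```python
def select(mask, on_true, on_false):
    """Lane-wise select: take on_true[i] where mask[i] is set, else on_false[i]."""
    return [t if m else f for m, t, f in zip(mask, on_true, on_false)]


def expand_blend(first, masked_incoming):
    """Expand a blend into a chain of selects.

    `first` is the unmasked incoming value; `masked_incoming` is a list of
    (value, mask) pairs, applied in order so later pairs take priority.
    """
    result = list(first)
    for value, mask in masked_incoming:
        result = select(mask, value, result)
    return result


# Values mirroring the names in the commit message (contents are made up).
n_015 = [10, 11, 12, 13]
foo = [20, 21, 22, 23]
active_lane_mask = [True, True, True, False]
cond = [True, False, True, True]

# vp<%8> = logical-and vp<%active.lane.mask>, vp<%7>
mask8 = [a and c for a, c in zip(active_lane_mask, cond)]

# BLEND ir<%8> = ir<%n.015> ir<%foo>/vp<%8>
blend = expand_blend(n_015, [(foo, mask8)])

# EMIT vp<%9> = select vp<%active.lane.mask>, ir<%8>, ir<%n.015>
outer = select(active_lane_mask, blend, n_015)
```

Because `mask8` is only ever set on lanes where `active_lane_mask` is set, the outer select returns the same lanes as the single select produced from the blend, which is why the two can be folded together.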

418 Commits
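
Commit f575b18fdc in the list above describes vectorising a plain sum of zero-extended bytes as a partial reduction by using a dot product with an identity operand. A rough Python model of that arithmetic is sketched below; it is not the vectoriser's implementation. The function names `udot` and `sum_bytes_partial`, the grouping of four inputs per accumulator lane, and the vector width of 16 are all assumptions chosen for illustration.

```python
def udot(acc, a, b):
    """Model of a 4-way unsigned dot product.

    Each accumulator lane receives the sum of the products of four adjacent
    input element pairs, so the result has a quarter as many lanes as the inputs.
    """
    assert len(a) == len(b) == 4 * len(acc)
    return [
        acc[i] + sum(a[4 * i + j] * b[4 * i + j] for j in range(4))
        for i in range(len(acc))
    ]


def sum_bytes_partial(data, vf=16):
    """Sum byte values using a dot product against a vector of ones."""
    acc = [0] * (vf // 4)           # partial accumulators, one per group of 4
    ones = [1] * vf                 # identity operand: x * 1 == x
    vec_end = len(data) - len(data) % vf
    for i in range(0, vec_end, vf):
        acc = udot(acc, data[i:i + vf], ones)
    total = sum(acc)                # final horizontal reduction
    for x in data[vec_end:]:        # scalar remainder loop
        total += x
    return total


# 1025 elements, matching the trip count in the commit's example IR.
data = [i % 256 for i in range(1025)]
result = sum_bytes_partial(data)
```

Multiplying by one leaves every input unchanged, so the dot product degenerates into the widening add that the original loop performs, while still accumulating into fewer, wider lanes.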