llvm-project

Author	SHA1	Message	Date
Florian Hahn	b8eaceb39b	[VPlan] Explicitly replicate VPInstructions by VF. (#155102 ) Extend replicateByVF added in #142433 (aa240293190) to also explicitly unroll replicating VPInstructions. Now the only remaining case where we replicate for all lanes is VPReplicateRecipes in replicate regions. PR: https://github.com/llvm/llvm-project/pull/155102	2025-09-12 17:06:26 +01:00
Florian Hahn	c3e76b2770	[VPlan] Keep common flags during CSE. (#157664 ) During CSE, we don't have to drop all poison-generating flags on mis-match, we can keep the ones common on both recipes. PR: https://github.com/llvm/llvm-project/pull/157664	2025-09-10 10:20:48 +00:00
David Sherwood	ba4ce60f1a	[LV] Add scalar load/stores to VPReplicateRecipe::computeCost (#153218 ) Avoid calling getLegacyCost for single scalar loads and stores where the cost is trivial to calculate.	2025-09-05 11:52:07 +01:00
Sam Tebbs	37127f74f4	[LV] Bundle sub reductions into VPExpressionRecipe (#147255 ) This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. -> https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-09-01 17:25:01 +01:00
Mel Chen	13357e8a12	[LV][EVL] Support interleaved access with tail folding by EVL (#152070 ) The InterleavedAccess pass already supports transforming vector-predicated (vp) load/store intrinsics. With this patch, we start enabling interleaved access under tail folding by EVL. This patch introduces a new base class, VPInterleaveBase, and a concrete class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and the new VPInterleaveEVLRecipe inherit from and implement VPInterleaveBase. Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL operand to emit vp.load/vp.store intrinsics. Currently, tail folding by EVL is only supported for scalable vectorization. Therefore, VPInterleaveEVLRecipe will only emit interleave/deinterleave intrinsics. Reverse accesses are not yet implemented, as masked reverse interleaved access under tail folding is not yet supported. Fixed #123201	2025-09-01 21:20:06 +08:00
Kerry McLaughlin	f0e9bba024	[LoopVectorize] Generate wide active lane masks (#147535 ) This patch adds a new flag (-enable-wide-lane-mask) which allows LoopVectorize to generate wider-than-VF active lane masks when it is safe to do so (i.e. the mask is used for data and control flow). The transform in extractFromWideActiveLaneMask creates vector extracts from the first active lane mask in the header & loop body, modifying the active lane mask phi operands to use the extracts. An additional operand is passed to the ActiveLaneMask instruction, the value of which is used as a multiplier of VF when generating the mask. By default this is 1, and is updated to UF by extractFromWideActiveLaneMask. The motivation for this change is to improve interleaved loops when SVE2.1 is available, where we can make use of the whilelo instruction which returns a predicate pair. This is based on a PR that was created by @momchil-velikov (#81140) and contains tests which were added there.	2025-09-01 13:53:30 +01:00
Florian Hahn	df098796ec	[VPlan] Compute cost of intrinsics directly for VPReplicateRecipe (NFCI). (#154617 ) Handle intrinsic calls in VPReplicateRecipe::computeCost. There are some pseudo intrinsics for which the computed cost is known to be zero, so we handle those up front. Depends on https://github.com/llvm/llvm-project/pull/154291. PR: https://github.com/llvm/llvm-project/pull/154617	2025-08-27 21:40:47 +01:00
Florian Hahn	5e32f728ec	[VPlan] Move logic to compute cost for intrinsic to helper (NFC). Refactor to prepare for https://github.com/llvm/llvm-project/pull/154617.	2025-08-27 19:26:34 +01:00
Florian Hahn	c3470d1cdd	[VPlan] Compute cost of replicating calls in VPlan. (NFCI) (#154291 ) Implement computing the scalarization overhead for replicating calls in VPlan, matching the legacy cost model. Depends on https://github.com/llvm/llvm-project/pull/154126. PR: https://github.com/llvm/llvm-project/pull/154291	2025-08-26 13:37:49 +01:00
David Sherwood	d606eae2ce	[LV] Stop using the legacy cost model for udiv + friends (#152707 ) In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and srem we fall back on the legacy cost unnecessarily. At this point we know that the vplan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either: 1. Scalarised the operation, in which case we wouldn't be using a VPWidenRecipe, or 2. We've inserted a select for the second operand to ensure we don't fault through divide-by-zero. For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in vplan for predicated loops with reductions, which aren't accounted for in the legacy cost model. In order to prevent asserts firing I've also added the selects to precomputeCosts to ensure the legacy costs match the vplan costs for reductions.	2025-08-26 10:17:23 +01:00
Elvis Wang	ed52bdd453	[VPlan] Get Addr computation cost with scalar type if it is uniform for gather/scatter. (NFC) (#150371 ) This patch queries `getAddressComputationCost()` with the scalar type if the address is uniform. This can make the cost for gather/scatter more accurate. In current LV, a non-consecutive VPWidenMemoryRecipe (gather/scatter) will account for the cost of address computation. But there are some cases where the address is uniform across all lanes, which means the address can be calculated with a scalar type and broadcast. I have a followup optimization that tries to convert gather/scatter with uniform memory access to scalar load/store + broadcast (and select if needed). With this optimization, we can remove this temporary change. This patch is preparation for #149955 to prevent regressions.	2025-08-26 09:04:15 +08:00
Florian Hahn	c950a72974	[VPlan] Support scalar VF for ExtractLane and FirstActiveLane. Extend ExtractLane and FirstActiveLane to support scalar VFs. This allows correct handling when interleaving with VF = 1. Alive2 proofs: - Fixed codegen with this patch: https://alive2.llvm.org/ce/z/8Y5_Vc (verifies as correct) - Original codegen: https://alive2.llvm.org/ce/z/twdg3X (doesn't verify) Fixes https://github.com/llvm/llvm-project/issues/154967.	2025-08-25 21:45:21 +01:00
Florian Hahn	f492eb9509	[VPlan] Make VPInstruction::AnyOf poison-safe. (#154156 ) AnyOf reduces multiple input vectors to a single boolean value. When used for early-exit vectorization, we need to consider any lane after the early exit being poison. Any poison lane would result in poison after the AnyOf reduction. To prevent this, freeze all inputs to AnyOf. Fixes https://github.com/llvm/llvm-project/issues/153946. Fixes https://github.com/llvm/llvm-project/issues/155162. https://alive2.llvm.org/ce/z/FD-XxA PR: https://github.com/llvm/llvm-project/pull/154156	2025-08-25 18:55:23 +01:00
Florian Hahn	d84be8a9b4	[VPlan] Get Cmp cost via getCostForRecipeWithOp for VPReplicateR (NFCI). Use common getCostForRecipeWithOpcode to get the cost for ICmp/FCmp.	2025-08-24 19:43:22 +01:00
Ramkumar Ramachandra	66be00d635	[VPlan] Introduce m_Cmp; match more compares (#154771 ) Extend [Specific]Cmp_match to handle floating-point compares, and introduce m_Cmp that matches both integer and floating-point compares. Use it in simplifyRecipe to match and simplify the general case of compares. The change has necessitated a bugfix in VPReplicateRecipe::execute.	2025-08-24 13:27:06 +01:00
Florian Hahn	300d2c6d20	[VPlan] Move SCEV expansion to VPlan transform. (NFCI). Move the logic to expand SCEVs directly to a late VPlan transform that expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an abstract recipe without execute, which clarifies how the recipe is handled, i.e. it is not executed like regular recipes. It also helps to simplify construction, as now scalar evolution isn't required to be passed to the recipe.	2025-08-21 22:03:26 +01:00
Florian Hahn	e41aaf5a64	[VPlan] Use VPIRMetadata for VPInterleaveRecipe. (#153084 ) Use VPIRMetadata for VPInterleaveRecipe to preserve noalias metadata added by versioning. This still uses InterleaveGroup's logic to preserve existing metadata from IR. This can be migrated separately. Fixes https://github.com/llvm/llvm-project/issues/153006. PR: https://github.com/llvm/llvm-project/pull/153084	2025-08-21 18:58:10 +01:00
Luke Lau	955c475ae6	[VPlan] Add m_Sub to VPlanPatternMatch. NFC (#154705 ) To mirror PatternMatch.h, and we'll also be able to use it in #152167	2025-08-21 09:33:46 +00:00
Ramkumar Ramachandra	0db57ab586	[VPlan] Improve code using onlyScalarValuesUsed (NFC) (#154564 )	2025-08-20 22:38:00 +01:00
Florian Hahn	35be64a416	[VPlan] Factor out logic to common compute costs to helper (NFCI). (#153361 ) A number of recipes compute costs for the same opcodes for scalars or vectors, depending on the recipe. Move the common logic out to a helper in VPRecipeWithIRFlags, that is then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction. This makes it easier to cover all relevant opcodes, without duplication. PR: https://github.com/llvm/llvm-project/pull/153361	2025-08-20 16:05:20 +01:00
David Sherwood	13d8ba7dea	[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086 ) There are a couple of places in the loop vectoriser where we want to calculate the cost of extracting the last lane in a vector. However, we wrongly assume that asking for the cost of extracting lane (VF.getKnownMinValue() - 1) is an accurate representation of the cost of extracting the last lane. For SVE at least, this is non-trivial as it requires the use of whilelo and lastb instructions. To solve this problem I have added a new getReverseVectorInstrCost interface where the index is used in reverse from the end of the vector. Suppose a vector has a given ElementCount EC, the extracted/inserted lane would be EC - 1 - Index. For scalable vectors this index is unknown at compile time. I've added an AArch64 hook that better represents the cost, and also a RISCV hook that maintains compatibility with the behaviour prior to this PR. I've also taken the liberty of adding support in vplan for calculating the cost of VPInstruction::ExtractLastElement.	2025-08-19 09:31:37 +01:00
Mel Chen	1dac302ce7	[LV] Explicitly disallow interleaved access requiring gap mask for scalable VFs. nfc (#154122 ) Currently, VPInterleaveRecipe::execute does not support generating LLVM IR for interleaved accesses that require a gap mask for scalable VFs. It would be better to detect and prevent such groups from being vectorized as interleaved accesses in LoopVectorizationCostModel::interleavedAccessCanBeWidened, rather than relying on the TTI function getInterleavedMemoryOpCost to return an invalid cost.	2025-08-19 08:42:39 +08:00
Florian Hahn	79be94c984	[VPlan] Compute cost single-scalar calls in computeCost. (NFC) Compute the cost of non-intrinsic, single-scalar calls directly in VPReplicateRecipe::computeCost. This starts moving call cost computations to VPlan, handling the simplest case first.	2025-08-18 21:56:56 +01:00
Florian Hahn	7e9989390d	[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487 ) Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to serve their users requiring a vector, instead of doing so when unrolling by VF. Now we only need to implicitly build vectors in VPTransformState::get for VPInstructions. Once they are also unrolled by VF we can remove the code-path altogether. PR: https://github.com/llvm/llvm-project/pull/151487	2025-08-18 20:49:42 +01:00
David Sherwood	7ee6cf06c8	[LV] Fix incorrect cost kind in VPReplicateRecipe::computeCost (#153216 ) We were incorrectly using the TTI::TCK_RecipThroughput cost kind and ignoring the kind set in the context.	2025-08-18 09:52:31 +01:00
Florian Hahn	177f27d220	[VPlan] Add incoming_[blocks,values] iterators to VPPhiAccessors (NFC) (#138472 ) Add 3 new iterator ranges to VPPhiAccessors * incoming_values(): returns a range over the incoming values of a phi * incoming_blocks(): returns a range over the incoming blocks of a phi * incoming_values_and_blocks: returns a range over pairs of incoming values and blocks. Depends on https://github.com/llvm/llvm-project/pull/124838. PR: https://github.com/llvm/llvm-project/pull/138472	2025-08-14 16:47:04 +01:00
Elvis Wang	01fac67e2a	[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342 ) This patch adds a cost kind to `getAddressComputationCost()` for #149955. Note that this patch also removes all the default values in `getAddressComputationCost()`.	2025-08-14 16:01:44 +08:00
Florian Hahn	424258947e	[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879 ) Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: https://github.com/llvm/llvm-project/pull/152879	2025-08-12 14:13:13 +01:00
Sam Tebbs	0bfa1718af	[LV] Create in-loop sub reductions (#147026 ) This PR allows the loop vectorizer to handle in-loop sub reductions by forming a normal in-loop add reduction with a negated input. Stacked PRs: 1. -> https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-08-12 10:22:41 +01:00
Luke Lau	acb86fb9e0	[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657 ) In some places we were passing the type of value being accessed, in other cases we were passing the type of the pointer for the access. The most "involved" user is LoopVectorizationCostModel::getMemInstScalarizationCost, which is the only call site that passes in the SCEV, and it passes along the pointer type. This changes call sites to consistently pass the pointer type, and renames the arguments to clarify this. No target actually checks the contents of the type passed, only to see if it's a vector or not, so this shouldn't have an effect.	2025-08-11 18:00:12 +08:00
Elvis Wang	37fe7a9933	[LV] Generate scalar xor for VPInstruction::Not if possible. (#152628 ) `VPInstruction::Not`, which generates an xor instruction, is widely used for the exit condition. This patch makes `VPInstruction::Not` generate a scalar `xor` if possible. This helps remove the (splat true) operand in the `xor` and makes the `xor` scalar.	2025-08-11 16:35:21 +08:00
Florian Hahn	86813aa786	[VPlan] Add dedicated user for resume phi with epilogue vectorization. Epilogue vectorization currently relies on the resume phi for the canonical induction being always available, which is why VPPhi are considered to have side-effects, to prevent their removal. This patch adds a new ResumeForEpilogue opcode to mark the resume phi as used for epilogue vectorization. This allows treating VPPhis in general as not having side-effects, enabling removal of unused VPPhis.	2025-08-10 21:21:16 +01:00
Florian Hahn	47258ca470	[VPlan] Use VPPhi instead of dyn_cast + opcode check in isPhi (NFC).	2025-08-05 19:20:12 +01:00
Luke Lau	94a6cd464e	[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274 ) This is the VPWidenPointerInductionRecipe equivalent of #118638, with the motivation of allowing us to use the EVL as the induction step. There is a new VPInstruction added, WidePtrAdd to allow adding the step vector to the induction phi, since VPInstruction::PtrAdd only handles scalars or multiple scalar lanes. Originally this transformation was copied from the original recipe's execute code, but it's since been simplified by teaching `unrollWidenInductionByUF` to unroll the recipe, which brings it in line with VPWidenIntOrFpInductionRecipe.	2025-08-05 16:54:02 +08:00
Florian Hahn	39c30665e9	[VPlan] Update type of cloned instruction in scalarizeInstruction. The operands of the replicate recipe may have been narrowed, resulting in a narrower result type. Update the type of the cloned instruction to the correct type. Fixes https://github.com/llvm/llvm-project/issues/151392.	2025-08-02 19:49:59 +01:00
Mel Chen	86916ff0f0	[LV] Fix gap mask requirement for interleaved access (#151105 ) When interleaved stores contain gaps, a mask is required to skip the gaps, regardless of whether scalar epilogues are allowed. This patch corrects the condition under which a gap mask is needed, ensuring consistency between the legacy and VPlan-based cost models and avoiding assertion failures. Related #149981	2025-08-01 14:24:30 +08:00
Samuel Tebbs	339b0a1d74	[LV][NFCI] Format fcc419b05f62	2025-07-31 14:37:59 +01:00
Samuel Tebbs	fcc419b05f	[LV][NFCI] Swap reduction recipe operand order https://github.com/llvm/llvm-project/pull/147026 will enable sub reductions, which require that the phi value is the first operand since they aren't commutative. This re-orders the operands when executing reductions, which actually matches other existing code in VPReductionRecipe::execute.	2025-07-31 14:35:10 +01:00
David Sherwood	6fbc397964	[IR] Add new CreateVectorInterleave interface (#150931 ) This PR adds a new interface to IRBuilder called CreateVectorInterleave, which can be used to create vector.interleave intrinsics of factors 2-8. For convenience I have also moved getInterleaveIntrinsicID and getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp where it can be used by IRBuilder.	2025-07-29 08:47:07 +01:00
Florian Hahn	4386848776	[VPlan] Add explicit VPUnrollPartAccessor<1> instantiation. This should fix a build-failure with GCC, including https://lab.llvm.org/buildbot/#/builders/105/builds/10685.	2025-07-27 14:05:23 +01:00
Florian Hahn	80c43b6c07	[VPlan] Add ExtractLane VPInst to extract across multiple parts. (#148817 ) This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane. The patch updates early-exit codegen to use it instead of ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops. The patch removes the restrictions added in 6f43754e9 (#145877), but does not yet automatically select interleave counts > 1 for early-exit loops. I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count. PR: https://github.com/llvm/llvm-project/pull/148817	2025-07-27 08:08:25 +01:00
Florian Hahn	1640d51bf8	[VPlan] Mark getUnrollPart argument as const (NFC).	2025-07-25 10:49:33 +01:00
Luke Lau	9563e7a940	[VPlan] Mark VPInstruction::ExplicitVectorLength as single scalar. NFC (#150221 ) This allows it to be broadcasted without an explicit VPInstruction::Broadcast in #150202	2025-07-23 22:38:21 +08:00
Luke Lau	114d74e391	[VPlan] Expand VPBlendRecipes to select instructions. NFC (#133993 ) When looking at some EVL tail folded code in SPEC CPU 2017 I noticed we sometimes have both VPBlendRecipes and select VPInstructions in the same plan: EMIT vp<%active.lane.mask> = active lane mask vp<%5>, vp<%3> EMIT vp<%7> = icmp ... EMIT vp<%8> = logical-and vp<%active.lane.mask>, vp<%7> BLEND ir<%8> = ir<%n.015> ir<%foo>/vp<%8> EMIT vp<%9> = select vp<%active.lane.mask>, ir<%8>, ir<%n.015> Since a blend will ultimately generate a chain of selects, we could fold the blend into the select: EMIT vp<%active.lane.mask> = active lane mask vp<%5>, vp<%3> EMIT vp<%7> = icmp ... EMIT vp<%8> = logical-and vp<%active.lane.mask>, vp<%7> EMIT ir<%8> = select vp<%8>, ir<%foo>, ir<%n.015> So as a first step, this patch expands blends to a series of select instructions, which may allow them to be simplified further with other select instructions.	2025-07-23 20:09:33 +08:00
Mel Chen	6752369139	[LV] Unify interleaved load handling for fixed and scalable VFs. nfc (#146914 ) This patch modifies VPInterleaveRecipe::execute to handle both fixed and scalable VFs using a single loop.	2025-07-22 09:00:10 +08:00
Florian Hahn	004c67ea25	[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239 ) Update LV to vectorize maxnum/minnum reductions without fast-math flags, by adding an extra check in the loop if any inputs to maxnum/minnum are NaN, due to maxnum/minnum behavior w.r.t. signaling NaNs. Signed-zeros are already handled consistently by maxnum/minnum. If any input is NaN, exit the vector loop, compute the reduction result up to the vector iteration that contained NaN inputs and resume in the scalar loop. New recurrence kinds are added for reductions using maxnum/minnum without fast-math flags. PR: https://github.com/llvm/llvm-project/pull/148239	2025-07-18 21:58:19 +01:00
Florian Hahn	a40dc05898	[VPlan] Mark canonical IV and reduction phis as not writing memory (NFC). Both recipes do not write to memory. Should be NFC at the moment, as they cannot be removed currently due to being in a cycle.	2025-07-15 11:08:54 +01:00
Luke Lau	c8d0e24745	[VPlan] Preserve trunc nuw/nsw in VPRecipeWithIRFlags (#144700 ) This preserves the nuw/nsw flags on widened truncs by checking for TruncInst in the VPIRFlags constructor. The motivation for this is to be able to fold away some redundant truncs feeding into uitofps (or potentially narrow the inductions feeding them).	2025-07-15 15:34:14 +08:00
Kazu Hirata	649347e208	[Vectorize] Remove unnecessary casts (NFC) (#148116 ) &Ingredient is already of Instruction *.	2025-07-11 09:52:42 -07:00
Ramkumar Ramachandra	62f8377e40	[LV] Extend FindFirstIV to unsigned case (#146386 ) Extend FindFirstIV vectorization to the unsigned case by introducing and handling FindFirstIVUMin. Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-07-09 15:56:40 +01:00
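
The wide active lane mask described in f0e9bba024 can be modelled outside of LLVM. The sketch below is plain Python and does not use any LLVM API; the function names are hypothetical. It assumes the usual active-lane-mask semantics (lane i is active when base + i is below the trip count) and treats the extra operand as a multiplier of VF, as the commit message describes. It shows that slicing one mask of VF * UF lanes into UF parts gives the same result as computing a VF-wide mask per unrolled part.

```python
def active_lane_mask(base, trip_count, vf, multiplier=1):
    """Model of an active lane mask with vf * multiplier lanes.

    Lane i is active iff base + i < trip_count. The multiplier mirrors the
    extra operand from the commit message (1 by default, UF for a wide mask).
    """
    return [base + i < trip_count for i in range(vf * multiplier)]


def extract_parts(wide_mask, vf):
    """Slice a wide mask into consecutive vf-wide sub-masks, one per part."""
    return [wide_mask[i:i + vf] for i in range(0, len(wide_mask), vf)]


VF, UF, TRIP_COUNT = 4, 2, 6

# One wide mask covering all unrolled parts, then one extract per part.
wide = active_lane_mask(0, TRIP_COUNT, VF, multiplier=UF)
parts_from_wide = extract_parts(wide, VF)

# Reference: a separate VF-wide mask for each part.
parts_separate = [active_lane_mask(p * VF, TRIP_COUNT, VF) for p in range(UF)]
```

With these values both approaches give a fully active first part and a second part with only its first two lanes active.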
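
The reverse indexing from 13d8ba7dea (lane = EC - 1 - Index) can be worked through with a small example. This is an illustrative Python sketch, not the TTI interface; `reverse_lane` and its parameters are hypothetical names, and the element count of a scalable vector is modelled as the known minimum multiplied by a runtime `vscale`.

```python
def reverse_lane(known_min, vscale, index):
    """Return the lane addressed by a reverse index.

    The element count EC is known_min * vscale; reverse index 0 is the last
    lane, reverse index 1 the one before it, and so on (EC - 1 - index).
    """
    element_count = known_min * vscale
    if not 0 <= index < element_count:
        raise ValueError("reverse index out of range")
    return element_count - 1 - index


# For a fixed-width vector (vscale == 1) the last lane is known_min - 1.
last_fixed = reverse_lane(known_min=4, vscale=1, index=0)

# For a scalable vector the last lane depends on the runtime vscale, so
# known_min - 1 only names the last lane when vscale happens to be 1.
last_scalable = reverse_lane(known_min=4, vscale=4, index=0)
```

Here `last_fixed` is lane 3 while `last_scalable` is lane 15, which is why costing an extract of lane (known minimum - 1) does not represent the last lane of a scalable vector.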
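
The idea behind 0bfa1718af, forming a sub reduction as an in-loop add reduction with a negated input, can be checked on integers. The following is a minimal Python sketch with hypothetical function names; it models a vector iteration as a chunk of VF values that is negated, reduced to a scalar, and added to a scalar accumulator. It is not the vectorizer's actual code.

```python
def sub_reduction_scalar(start, values):
    """Scalar loop: acc = acc - v for every value."""
    acc = start
    for v in values:
        acc -= v
    return acc


def sub_reduction_as_negated_add(start, values, vf):
    """In-loop add reduction over negated inputs, vf values per iteration."""
    acc = start
    for i in range(0, len(values), vf):
        chunk = values[i:i + vf]       # one vector iteration
        negated = [-v for v in chunk]  # negate the input
        acc += sum(negated)            # add-reduce into the scalar accumulator
    return acc


data = [3, 1, 4, 1, 5, 9, 2, 6]
scalar_result = sub_reduction_scalar(100, data)
vector_result = sub_reduction_as_negated_add(100, data, vf=4)
```

Both functions return 69 for this input. The accumulator staying as the first operand matters for the real transform because subtraction is not commutative, which is the reason fcc419b05f gives for reordering the reduction operands.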

450 Commits