llvm-project

Author	SHA1	Message	Date
Florian Hahn	48eb697441	[LV] Count cost of middle block if TC <= VF. (#168949 ) If the expected trip count is less than the VF, the vector loop will only execute a single iteration. When that's the case, the cost of the middle block has the same impact as the cost of the vector loop. Include it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra work in the middle block makes it unprofitable. Note that isOutsideLoopWorkProfitable already scales the cost of blocks outside the vector region, but the patch restricts accounting for the middle block to cases where VF <= ExpectedTC, to initially catch some worst cases and avoid regressions. This initial version should specifically avoid unprofitable tail-folding for loops with low trip counts after re-applying https://github.com/llvm/llvm-project/pull/149042. PR: https://github.com/llvm/llvm-project/pull/168949	2025-11-24 19:23:04 +00:00
Ramkumar Ramachandra	1abb055c57	[IVDesc] Make getCastInsts return an ArrayRef (NFC) (#169021 ) To make it clear that the return value is immutable.	2025-11-24 08:57:55 +00:00
Florian Hahn	080ca902c6	[VPlan] Create resume phis in scalar preheader early. (NFC) (#166099 ) Create phi recipes for scalar resume value up front in addInitialSkeleton during initial construction. This will allow moving the remaining code dealing with resume values to VPlan transforms/construction. PR: https://github.com/llvm/llvm-project/pull/166099	2025-11-22 20:45:41 +00:00
Florian Hahn	7acfbc23a7	[VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289 ) Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and replace it with `onlyScalarValuesUsed`. This removes the need to carry state from the legacy cost model through VPlan, and the VPlan-based analysis gives more accurate results, avoiding a number of extracts. PR: https://github.com/llvm/llvm-project/pull/168289	2025-11-20 18:58:25 +00:00
Florian Hahn	67e35bbebb	[LV] Check full partial reduction chains in order. (#168036 ) https://github.com/llvm/llvm-project/pull/162822 added another validation step to check if entries in a partial reduction chain have the same scale factor. But the validation was still dependent on the order of entries in PartialReductionChains, and would fail to reject some cases (e.g. if the first link matched the scale of the second link, but the second link is invalidated later). To fix that, group chains by their starting phi nodes, then perform the validation for each chain, and if it fails, invalidate the whole chain for the phi. Fixes https://github.com/llvm/llvm-project/issues/167243. Fixes https://github.com/llvm/llvm-project/issues/167867. PR: https://github.com/llvm/llvm-project/pull/168036	2025-11-20 15:54:57 +00:00
Sam Tebbs	3396b4654b	[LV] Allow partial reductions with an extended bin op (#165536 ) A pattern of the form reduce.add(ext(mul)) is valid for a partial reduction as long as the mul and its operands fulfill the requirements of a normal partial reduction. The mul's extend operands will be optimised to the wider extend, and we already have oneUse checks in place to make sure the mul and operands can be modified safely. 1. -> https://github.com/llvm/llvm-project/pull/165536 2. https://github.com/llvm/llvm-project/pull/165543	2025-11-20 10:22:11 +00:00
Florian Hahn	040d9c94be	[VPlan] Collect FMFs for in-loop reduction chain in VPlan. (NFC) Replace retrieving FMFs for in-loop reduction via underlying instruction + legal by collecting the flags during reduction chain traversal in VPlan.	2025-11-19 22:11:21 +00:00
Luke Lau	5da0445420	[LV] Consolidate shouldOptimizeForSize and remove unused BFI/PSI. NFC (#168697 ) #158690 plans on passing BFI as a lazy lambda to avoid computing BlockFrequencyInfo when not needed. In preparation for that, this PR removes BFI and PSI from some constructors that aren't used. It also consolidates the two calls to llvm::shouldOptimizeForSize so that the result is computed once and passed where needed. This also renames OptForSize in LoopVectorizationLegality to clarify that it's to prevent runtime SCEV checks, see https://reviews.llvm.org/D68082	2025-11-19 21:29:26 +08:00
Hassnaa Hamdi	f7f41350b4	[LV]: Skip Epilogue scalable VF greater than RemainingIterations. (#156724 ) Consider skipping scalable epilogue VFs when they are greater than RemainingIterations, the same as for fixed VFs. Exclude scalable RemainingIterations from that comparison, because SCEV currently can't evaluate non-canonical vscale-based expressions.	2025-11-19 05:11:17 +00:00
Shih-Po Hung	961940e1a7	[TTI] Use MemIntrinsicCostAttributes for getMaskedMemoryOpCost (#168029 ) - Split from #165532. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling. - Replace the ad-hoc parameter list with a single attributes object. API change: ``` - InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment, - AddressSpace, CostKind); + InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes, + CostKind); ``` Notes: - NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before. - Follow-up: migrate gather/scatter, strided, and expand/compress cost queries to the same attributes-based entry point.	2025-11-19 09:51:12 +08:00
Florian Hahn	2befda2225	[VPlan] Populate and use VPIRFlags from initial VPInstruction. (#168450 ) Update VPlan to populate VPIRFlags during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRFlags from the underlying IR instruction each time. The VPRecipeWithIRFlags constructor taking an underlying instruction and setting the flags based on it has been removed. This centralizes initial VPIRFlags creation, ensures flags are consistently available throughout VPlan transformations, and makes sure we don't accidentally re-add flags from the underlying instruction that already got dropped during transformations. Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did the same for VPIRMetadata. Should be NFC w.r.t. the generated IR. PR: https://github.com/llvm/llvm-project/pull/168450	2025-11-18 15:15:14 +00:00
Florian Hahn	3cba379e3d	[VPlan] Populate and use VPIRMetadata from VPInstructions (NFC) (#167253 ) Update VPlan to populate VPIRMetadata during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRMetadata from the underlying IR instruction each time. This centralizes VPIRMetadata in VPInstructions and ensures metadata is consistently available throughout VPlan transformations. PR: https://github.com/llvm/llvm-project/pull/167253	2025-11-17 21:28:49 +00:00
Ramkumar Ramachandra	ef023cae38	Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354 ) Changes: The previous patch had to be reverted due to a mismatching-OpType assert in CSE. A reduced test corresponding to an RVV pointer induction has now been added, and the pointer-induction case has been updated to use createOverflowingBinaryOp. While at it, record VPIRFlags in VPWidenInductionRecipe.	2025-11-17 13:44:25 +00:00
Florian Hahn	e009de26b6	[LV] Use VPlan pattern matching in adjustRecipesForReductions (NFC) Replace the assert checking if CurrentLinkI is a CmpInst with a pattern matching check in the if condition. This uses VPlan-level pattern matching instead of inspecting the underlying instruction type.	2025-11-15 21:45:40 +00:00
Alex Bradbury	f2336d4c7e	Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080 ) Reverts llvm/llvm-project#163538 This is causing build failures on the two-stage RVV buildbots. e.g. https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a reproducer and more information at https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822 This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.	2025-11-14 16:11:48 +00:00
Ramkumar Ramachandra	355e0f94af	[VPlan] Expand WidenInt inductions with nuw/nsw (#163538 ) While at it, record VPIRFlags in VPWidenInductionRecipe.	2025-11-14 12:10:55 +00:00
Mel Chen	3277f6caef	[LV] Explicitly disable in-loop reductions for AnyOf and FindIV. nfc (#163541 ) Currently, in-loop reductions for AnyOf and FindIV are not supported. They were implicitly blocked. This happened because RecurrenceDescriptor::getReductionOpChain could not detect their recurrence chain. The reason is that RecurrenceDescriptor::getOpcode was set to Instruction::Or, but the recurrence chains of AnyOf and FindIV do not actually contain an Instruction::Or. This patch explicitly disables in-loop reductions for AnyOf and FindIV instead of relying on getReductionOpChain to implicitly prevent them.	2025-11-14 09:14:07 +00:00
Luke Lau	851f8f7984	[VPlan] Disable partial reductions again with EVL tail folding (#167863 ) VPPartialReductionRecipe doesn't yet support an EVL variant, and we guard against this by not calling convertToAbstractRecipes when we're tail folding with EVL. However recently some things got shuffled around which means we may detect some scaled reductions in collectScaledReductions and store them in ScaledReductionMap, where outside of convertToAbstractRecipes we may look them up and start e.g. adding a scale factor to an otherwise regular VPReductionPHI. This fixes it by skipping collectScaledReductions, and fixes #167861	2025-11-14 06:30:12 +00:00
Florian Hahn	a6edeedbfa	Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )" This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b. This appears to be causing some runtime failures on RISCV https://lab.llvm.org/buildbot/#/builders/210/builds/5221	2025-11-13 22:34:55 +00:00
Ryan Buchner	a04c6b5512	[LV] Update LoopVectorizationPlanner::emitInvalidCostRemarks to handle reduction plans (#165913 ) The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case. Fixes #165359.	2025-11-13 06:12:40 -10:00
Florian Hahn	b6bcfdea40	[VPlan] Get opcode & type from recipe in adjustRecipesForReduction (NFC) Replace direct access to underlying IR instructions with VPlan-level equivalents, i.e. VPTypeAnalysis and pattern matching on the recipe. Removes a few uses of accessing underlying IR.	2025-11-12 22:37:15 +00:00
Florian Hahn	53a65ba6b9	[VPlan] Don't look up recipe for IV step via RecipeBuilder. (NFC) Directly update induction increments with step value created for wide inductions in createWidenInductionRecipes, which does not require looking up via RecipeBuilder.	2025-11-12 22:08:56 +00:00
Florian Hahn	62d1a080e6	[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 ) Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-12 15:11:00 +00:00
Luke Lau	02c68b3ef7	[VPlan] Plumb scalable register size through narrowInterleaveGroups (#167505 ) On RISC-V narrowInterleaveGroups doesn't kick in because the wrong VectorRegWidth is passed to isConsecutiveInterleaveGroup. narrowInterleaveGroups is always passed the RGK_FixedWidthVector register size, but on RISC-V the RGK_ScalableVector size is twice as large because we want to use LMUL 2. This causes the `GroupSize == VectorRegWidth` check to fail. This fixes it by using the scalable register size whenever the VF is scalable and plumbing it through as a potentially scalable TypeSize. Note that this only makes a difference when tail folding is disabled, as narrowInterleaveGroups can't handle EVL based IVs yet.	2025-11-12 11:14:53 +00:00
Florian Hahn	519cf3c2b8	[VPlan] Remove unneeded getDefiningRecipe with isa/cast/dyn_cast. (NFC) Classof for most recipes directly supports VPValue, so there is no need to call getDefiningRecipe when using isa/cast/dyn_cast.	2025-11-11 22:07:48 +00:00
Florian Hahn	c41ef17653	[VPlan] Add getSingleUser helper (NFC). Add helper to make it easier to retrieve the single user of a VPUser.	2025-11-11 21:05:44 +00:00
Kerry McLaughlin	de3de3f143	[LV] Consider interleaving when -enable-wide-lane-mask=true (#163387 ) Currently the only way to enable the use of wide active lane masks is to pass -enable-wide-lane-mask and force both interleaving & tail-folding with additional flags. This patch changes selectInterleaveCount to consider interleaving if wide lane masks were requested, although the feature remains off by default.	2025-11-11 11:46:59 +00:00
Sander de Smalen	517d725463	[LV] Move condition to VPPartialReductionRecipe::execute (#166136 ) This means that VPExpressions will now be constructed for VPPartialReductionRecipes when the loop has tail-folding predication. Note that control-flow (if/else) predication is not yet handled for partial reductions, because of the way partial reductions are recognised and built up.	2025-11-11 09:42:54 +00:00
Luke Lau	bfd4155f23	[VPlan] Don't apply predication discount to non-originally-predicated blocks (#160449 ) Split off from #158690. Currently if an instruction needs to be predicated due to tail folding, it will also have a predicated discount applied to it in multiple places. This is likely inaccurate because we can expect a tail folded instruction to be executed on every iteration bar the last. This fixes it by checking if the instruction/block was originally predicated, and in doing so prevents vectorization with tail folding where we would have had to scalarize the memory op anyway. On llvm-test-suite this causes 4 loops in total to no longer be vectorized with -O3 on arm64-apple-darwin, and there's no observable performance impact.	2025-11-10 12:10:40 +00:00
Florian Hahn	d406c15fc8	[VPlan] Use VPInstructionWithType for casts in VPlan0. (NFC) Use VPInstructionWithType for casts in VPlan0, to enable additional analysis/transforms on VPlan0, and more accurate modeling in VPlan0.	2025-11-09 21:35:50 +00:00
Florian Hahn	8a6d5f68e4	[VPlan] Update more VPRecipeBuilder members to take VPInst directly (NFC) Update VPRecipeBuilder methods to take VPInstruction* directly instead of ArrayRef<> for operands and Instruction * separately. This avoids accessing the underlying instruction in some cases, by using information directly from VPInstruction, like getOpcode(), getDebugLoc(), and getOperand(). It also allows transferring other information directly from VPInstruction in the future.	2025-11-07 21:06:38 +00:00
Florian Hahn	3ad5765e23	[LV] Check all users of partial reductions in chain have same scale. (#162822 ) Check that all partial reductions in a chain are only used by other partial reductions with the same scale factor. Otherwise we end up creating users of scaled reductions where the types of the other operands don't match. A similar issue was addressed in https://github.com/llvm/llvm-project/pull/158603, but misses the chained cases. Fixes https://github.com/llvm/llvm-project/issues/162530. PR: https://github.com/llvm/llvm-project/pull/162822	2025-11-06 21:45:57 +00:00
Mel Chen	d1874047f5	[VPlan] Retrieve alignment from Load/StoreInst in constructors. nfc (#165722 ) This patch removes the explicit Alignment parameter from VPWidenLoadRecipe and VPWidenStoreRecipe constructors. Instead, these recipes now directly retrieve the alignment from their LoadInst/StoreInst.	2025-11-06 09:02:04 +00:00
Florian Hahn	6e83937f39	[VPlan] Add getConstantInt helpers for constant int creation (NFC). Add getConstantInt helper methods to VPlan to simplify the common pattern of creating constant integer live-ins. Suggested as follow-up in https://github.com/llvm/llvm-project/pull/164127.	2025-11-01 04:13:01 +00:00
Florian Hahn	b2d12d6f2b	[VPlan] Extend getSCEVForVPV, use to compute VPReplicateRecipe cost. (#161276 ) Update getSCEVExprForVPValue to handle more complex expressions, to use it in VPReplicateRecipe::computeCost. In particular, it supports constructing SCEV expressions for GetElementPtr VPReplicateRecipes, with operands that are VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we hit a sub-expression we don't support yet, we return SCEVCouldNotCompute. Note that the SCEV expression is valid for VF = 1: we only support constructing AddRecs for VPCanonicalIVRecipe, which is an AddRec starting at 0 and stepping by 1. The returned SCEV expressions could be converted to a VF specific one, by rewriting the AddRecs to ones with the appropriate step. Note that the logic for constructing SCEVs for GetElementPtr was directly ported from ScalarEvolution.cpp. Another thing to note is that we construct SCEV expressions purely by looking at the operation of the recipe and its translated operands, w/o accessing the underlying IR (the exception being getting the source element type for GEPs). PR: https://github.com/llvm/llvm-project/pull/161276	2025-10-30 15:46:19 -07:00
Ramkumar Ramachandra	01fbbda62c	[LV] Strengthen assert: VPlan0 doesn't have WidenPHIs (NFC) (#165715 ) VPWidenCanonicalIV and VPBlend recipes are created by VPPredicator, and VPCanonicalIVPHI and VPInstruction recipes are created by VPlanConstruction. WidenPHIs are never created.	2025-10-30 18:32:33 +00:00
Florian Hahn	4c46ae3948	[LV] Only skip scalarization overhead for members used as address. Refine logic to scalarize interleave group member: only skip scalarization overhead for member being used as addresses. For others, use the regular scalar memory op cost. This currently doesn't trigger in practice as far as I could find, but fixes a potential divergence between VPlan- and legacy cost models. It fixes a concrete divergence with a follow-up patch, https://github.com/llvm/llvm-project/pull/161276.	2025-10-30 05:04:34 +00:00
Mel Chen	6bf948999f	[VPlan] Store memory alignment in VPWidenMemoryRecipe. nfc (#165255 ) Add a member Alignment to VPWidenMemoryRecipe to store memory alignment directly in the recipe. Update constructors, clone(), and relevant methods to use this stored alignment instead of querying the IR instruction. This allows VPWidenLoadRecipe/VPWidenStoreRecipe to be constructed without relying on the original IR instruction in the future.	2025-10-28 15:29:35 +08:00
Hassnaa Hamdi	be29f0dd86	[LV]: Improve accuracy of calculating remaining iterations of MainLoopVF (#156723 ) Transform TC and VF to same numerical space when they are different.	2025-10-26 14:45:44 +00:00
Florian Hahn	bfc322dd72	Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706 )" This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c. There have been reports of mis-compiles in https://github.com/llvm/llvm-project/pull/149706. Revert while I investigate.	2025-10-22 21:27:11 +01:00
Kerry McLaughlin	45c0b29171	[LV] Ignore user-specified interleave count when unsafe. (#153009 ) When a VF is specified via a loop hint, it will be clamped to a safe VF or ignored if it is found to be unsafe. This is not the case for user-specified interleave counts, which can lead to loops such as the following with a memory dependence being vectorised with interleaving: ``` #pragma clang loop interleave_count(4) for (int i = 4; i < LEN; i++) b[i] = b[i - 4] + a[i]; ``` According to [1], loop hints are ignored if they are not safe to apply. This patch adds a check to prevent vectorisation with interleaving if isSafeForAnyVectorWidth() returns false. This is already checked in selectInterleaveCount(). [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave	2025-10-22 15:21:27 +01:00
Florian Hahn	8d29d09309	[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706 ) Move narrowInterleaveGroups to the general VPlan optimization stage. To do so, narrowInterleaveGroups now has to find a suitable VF where all interleave groups are consecutive and saturate the full vector width. If such a VF is found, the original VPlan is split into 2: a) a new clone which contains all VFs of Plan, except VFToOptimize, and b) the original Plan with VFToOptimize as single VF. The original Plan is then optimized. If a new copy for the other VFs has been created, it is returned and the caller has to add it to the list of candidate plans. Together with https://github.com/llvm/llvm-project/pull/149702, this allows taking the narrowed interleave groups into account when computing costs to choose the best VF and interleave count. One example where we currently miss interleaving/unrolling when narrowing interleave groups is https://godbolt.org/z/Yz77zbacz PR: https://github.com/llvm/llvm-project/pull/149706	2025-10-21 11:37:42 +01:00
Florian Hahn	35b9f20449	[LV] Check for TruncInsts in canTruncateToMinimalBitwidth. TruncInst must truncate at most to their destination. Return false if MinBWs contains a destination size > the trunc result type size. Fixes https://github.com/llvm/llvm-project/issues/162688.	2025-10-20 22:31:16 +01:00
Florian Hahn	b9ce7656e9	[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670 ) Add a new Unpack VPInstruction (name to be improved) to explicitly extract scalar values from vectors. Test changes are movements of the extracts: they are now generated together and also directly after the producer. Depends on https://github.com/llvm/llvm-project/pull/155102 (included in PR) PR: https://github.com/llvm/llvm-project/pull/155670	2025-10-19 18:49:05 +00:00
Ramkumar Ramachandra	0a4702407b	[VPlan] Improve code around canConstantBeExtended (NFC) (#161652 ) Follow up on 7c4f188 ([LV] Support multiplies by constants when forming scaled reductions), introducing m_APInt, and improving code around canConstantBeExtended: we change canConstantBeExtended to take an APInt.	2025-10-16 13:03:13 +01:00
Florian Hahn	861519327a	[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020 ) The canonical IV is tied to region blocks; move getCanonicalIV there and update all users. PR: https://github.com/llvm/llvm-project/pull/163020	2025-10-15 12:48:35 +01:00
Florian Hahn	4bf5ab4f9d	[VPlan] Set flags when constructing truncs using VPWidenCastRecipe. VPWidenCastRecipes with Trunc opcodes were missing the correct OpType for IR flags. Update createWidenCast to set the correct flags for truncs, and use it consistently. Fixes https://github.com/llvm/llvm-project/issues/162374.	2025-10-12 14:01:12 +01:00
Florian Hahn	4b8cac2bcc	[VPlan] Don't reset canonical IV start value. (#161589 ) Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop. This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0. PR: https://github.com/llvm/llvm-project/pull/161589	2025-10-11 22:19:05 +01:00
Florian Hahn	ba69e33e13	[LV] Consistently apply address def scalarization across loop. Consistently scalarize loads used as part of address computations across all uses in the loop. This aligns the VPlan and legacy cost model and fixes a divergence crash. It doesn't matter if the load and address users are in different blocks, as long as they are in the same loop, the scalar value can be used. This removes a number of insert/extracts.	2025-10-09 22:04:15 +01:00
Florian Hahn	4d45718b47	[IVDescriptors] Add isFPMinMaxNumRecurrenceKind helper (NFC). Add helper to check for FMinNum and FMaxNum recurrence kinds, as suggested in https://github.com/llvm/llvm-project/pull/161735.	2025-10-08 11:40:46 +01:00
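
Commit 62d1a080e6 above describes lowering LastActiveLane as Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). The following is a minimal scalar sketch of that arithmetic, not the VPlan implementation. The functions `firstActiveLane` and `lastActiveLane` are hypothetical stand-ins operating on a plain boolean vector. The sketch assumes a tail-folding style prefix mask (all active lanes come first) with at least one active lane, and it assumes that a mask with no set lane yields the lane count; that convention belongs to this model only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of FirstActiveLane: index of the first set lane.
// Assumption of this sketch: returns the lane count when no lane is set.
std::size_t firstActiveLane(const std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return I;
  return Mask.size();
}

// Model of the lowering named in the commit message:
//   Not(Mask) -> FirstActiveLane(NotMask) -> Sub(result, 1).
// Assumes a prefix mask with at least one active lane.
std::size_t lastActiveLane(const std::vector<bool> &Mask) {
  std::vector<bool> NotMask(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    NotMask[I] = !Mask[I];
  std::size_t FirstInactive = firstActiveLane(NotMask);
  assert(FirstInactive > 0 && "expected at least one active lane");
  return FirstInactive - 1;
}
```

For a four-lane mask with the first three lanes active, the first inactive lane is 3, so the last active lane is 2. When every lane is active, the inverted mask has no set lane, the model returns the lane count, and subtracting 1 gives the final lane.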

2790 Commits
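
The loop quoted in commit 45c0b29171, `b[i] = b[i - 4] + a[i]`, carries a dependence at distance 4. The sketch below is a simplified model of why a combined width (VF times interleave count) above 4 is unsafe. It is not how the vectorizer is implemented: `runScalar` and `runChunked` are hypothetical helpers, and the model assumes that all loads of one combined iteration are issued before any of its stores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: b[i] = b[i - 4] + a[i] for i in [4, n).
std::vector<int> runScalar(std::vector<int> B, const std::vector<int> &A) {
  for (std::size_t I = 4; I < B.size(); ++I)
    B[I] = B[I - 4] + A[I];
  return B;
}

// Simplified model of a vectorized and interleaved loop: each step handles
// Width = VF * IC elements and performs every load before any store.
std::vector<int> runChunked(std::vector<int> B, const std::vector<int> &A,
                            std::size_t Width) {
  assert(Width > 0 && "width must be positive");
  for (std::size_t Base = 4; Base < B.size(); Base += Width) {
    std::size_t End = std::min(Base + Width, B.size());
    std::vector<int> Loaded;
    for (std::size_t I = Base; I < End; ++I)
      Loaded.push_back(B[I - 4]); // May read a value not yet updated.
    for (std::size_t I = Base; I < End; ++I)
      B[I] = Loaded[I - Base] + A[I];
  }
  return B;
}
```

Under this model, a width of 4 or less reproduces the scalar result, because every load reads an element written in an earlier step. A width of 8 reads `b[4..7]` before the same step has updated them, so the results diverge.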