llvm-project

Author	SHA1	Message	Date
David Sherwood	4e69258bf3	[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565 ) At the moment if we decide to enable tail-folding we do not include the cost of generating the mask per VF. This can mean we make some poor choices of VF, which is definitely true for SVE-enabled AArch64 targets where mask generation for fixed-width vectors is more expensive than for scalable vectors. I've added a VPInstruction::computeCost function to return the costs of the ActiveLaneMask and ExplicitVectorLength operations. Unfortunately, in order to prevent asserts firing I've also had to duplicate the same code in the legacy cost model to make sure the chosen VFs match up. I've wrapped this up in a ifndef NDEBUG for now. The alternative would be to disable the assert completely when tail-folding, which I imagine is just as bad. New tests added: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll Transforms/LoopVectorize/RISCV/tail-folding-cost.ll	2025-03-21 09:24:56 +00:00
Luke Lau	01f04252b6	[LV] Get FMFs from VectorBuilder in createSimpleReduction. NFC (#132017 ) The other createSimpleReduction takes the FMFs from the IRBuilder, so this aligns the VectorBuilder variant to do the same and reduce the possibility of there being a mismatch in flags.	2025-03-20 16:38:56 +08:00
Luke Lau	f536f71580	[LV] Split RecurrenceDescriptor into RecurKind + FastMathFlags in LoopUtils. NFC (#132014 ) Split off from #131300, this splits up RecurrenceDescriptor arguments so that arbitrary recurrence kinds may be used down the line.	2025-03-19 22:56:57 +08:00
Luke Lau	a4dc02c0e7	[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086 ) After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.	2025-03-19 00:09:15 +08:00
Elvis Wang	ed19620b8c	[VPlan] Make VPReductionRecipe a VPRecipeWithIRFlags. NFC (#130881 ) This patch change the parent of the VPReductionRecipe from VPSingleDefRecipe to VPRecipeWithIRFlags and also print/get/drop/control flags by the VPRecipeWithIRFlags. This will remove the dependency of the underlying instruction. This patch also add a new function `setFastMathFlags()` to the VPRecipeWithIRFlags because the entire reduction chain may contains multiple instructions. And the underlying instruction may not contains the corresponding flags for this reduction. Split from #113903.	2025-03-18 10:08:23 +08:00
Luke Lau	67f1c033b8	[VPlan] Remove createReduction. NFCI (#131336 ) This is split off from #131300. A VPReductionRecipe will never have a AnyOf or FindLastIV recurrence, so when it calls createReduction it always calls createSimpleReduction. If we replace the call then it leaves createReduction with one user in VPInstruction::ComputeReductionResult, which we can inline and then remove.	2025-03-18 00:18:15 +08:00
Florian Hahn	4e9894498e	[VPlan] Truncate VFxUF if needed in VPWidenPointerInduction::execute. Create truncate if needed after 56b05a0d6. Note that this preserves the original behavior pre 56b05a0d6. If truncate would strip any set bits, then the explicit computation in the narrower type would wrap.	2025-03-16 11:37:58 +00:00
Florian Hahn	0bd8a75a0c	[VPlan] Fix formatting after 6a8d5f22f.	2025-03-15 21:33:30 +00:00
Florian Hahn	6a8d5f22ff	[VPlan] Don't access canonical IV in VPWidenPointerInduction::execute. This updates VPWidenPointerInductionRecipe::execute to not use the canonical IV to determine the insert point. Instead, it relies on the current recipe position. In cases where this is not sufficient, set the insert point to the first non-phi instruction, to ensure phis are created together.	2025-03-15 21:32:48 +00:00
Florian Hahn	56b05a0d6b	[VPlan] Use VFxUF in VPWidenPointerInductionRecipe. Use VFxUF VPValue instead of computing VF * UF explicitly.	2025-03-15 18:18:53 +00:00
Florian Hahn	8ff27bbd84	[VPlan] Remove unneeded select in VPWidenPointerInductionRecipe (NFC).	2025-03-15 17:34:23 +00:00
David Sherwood	3b6d0093aa	[LV][NFC] Refactor code for extracting first active element (#131118 ) Refactor the code to extract the first active element of a vector in the early exit block, in preparation for PR #130766. I've replaced the VPInstruction::ExtractFirstActive nodes with a combination of a new VPInstruction::FirstActiveLane node and a Instruction::ExtractElement node.	2025-03-14 11:14:09 +00:00
Luke Lau	26324bc1bf	[VPlan] Move FOR splice cost into VPInstruction::FirstOrderRecurrenceSplice (#129645 ) After #124093 we now support fixed-order recurrences with EVL tail folding by replacing VPInstruction::FirstOrderRecurrenceSplice with a VP splice intrinsic. However the costing for the splice is currently done in VPFirstOrderRecurrencePHIRecipe, so when we add the VP splice intrinsic we end up costing it twice. This fixes it by splitting out the cost for the splice into FirstOrderRecurrenceSplice so that it's not duplicated when we replace it. We still have to keep the VF=1 checks in VPFirstOrderRecurrencePHIRecipe since the splice might end up dead and discarded, e.g. in the test @pr97452_scalable_vf1_for.	2025-03-14 15:33:32 +08:00
Florian Hahn	02575f887b	[VPlan] Use VPInstruction for VPScalarPHIRecipe. (NFCI) (#129767 ) Now that all phi nodes manage their incoming blocks through the VPlan-predecessors, there should be no need for having a dedicate recipe, it should be sufficient to allow PHI opcodes in VPInstruction. Follow-ups will also migrate VPWidenPHIRecipe and possibly others, building on top of https://github.com/llvm/llvm-project/pull/129388. PR: https://github.com/llvm/llvm-project/pull/129767	2025-03-13 18:35:07 +00:00
Florian Hahn	dc23234a66	[VPlan] Remove dead code in VPWidenPHIRecipe::print (NFC). All incoming models for VPWidenPHIRecipe are modled in VPlan, remove code trying to print the orignial phi.	2025-03-11 21:44:04 +00:00
David Sherwood	26ecf97895	[LoopVectorize] Further improve cost model for early exit loops (#126235 ) Following on from #125058, this patch takes into account the work done in the vector early exit block when assessing the profitability of vectorising the loop. I have renamed areRuntimeChecksProfitable to isOutsideLoopWorkProfitable and we now pass in the early exit costs. As part of this, I have added the ExtractFirstActive opcode to VPInstruction::computeCost. It's worth pointing out that when we assess profitability of the loop we calculate a minimum trip count and compare that against the maximum trip count. However, since the loop has an early exit the runtime trip count can still end up being less than the minimum. Alternatively, we may never take the early exit at all at runtime and so we have the opposite problem of over-estimating the cost of the loop. The loop vectoriser cannot simultaneously take two contradictory positions and so I feel the only sensible thing to do is be conservative and assume the loop will be more expensive than loops without early exits. We may find in future that we need to adjust the cost according to the probability of taking the early exit. This will become even more important once we support multiple early exits. However, we have to start somewhere and we can always revisit this later.	2025-03-11 11:48:55 +00:00
Ramkumar Ramachandra	809a33d7ab	[VPlan] Clean up unused header (NFC) (#130599 ) We use VPlanPatternMatch, and not PatternMatch: clean up the PatternMatch include to avoid confusion.	2025-03-10 17:57:11 +00:00
Luke Lau	6d89c042e3	[VPlan] Remove dead AnyOf reduction case in VPReductionRecipe. NFCI (#130048 ) From what I understand, we only create VPReductionRecipes for in-loop reductions, and we don't currently support in-loop AnyOf reductions. We only create VPReductionRecipes in the !PhiR->isInLoop() section of adjustRecipesForReductions, and this comment from the initial patch seems to confirm this https://reviews.llvm.org/D108136#anchor-inline-1038338, so I think we can remove this check in the condition logic. I checked compiling SPEC 2017 with -prefer-inloop-predicates and the added assertion doesn't trigger.	2025-03-07 01:05:53 +08:00
Luke Lau	47fb9c4bb9	[VPlan] Add Name argument to VPWidenPHIRecipe. NFC (#129527 ) This allows a different IR name for the generated phi to be used. This is split off from #118638 and helps remove some of the diffs in it.	2025-03-04 16:47:21 +08:00
Florian Hahn	6ce41db6b0	[VPlan] Preserve DebugLoc for VPBranchOnMaskRecipe. Update code to set and generate debug location for branch recipe	2025-02-27 20:19:42 +00:00
Florian Hahn	253d691596	[VPlan] Update VPBranchOnMaskRecipe to always set the mask (NFC). The mask is always available at construction time. Make it non-optional to simlpify code.	2025-02-27 18:53:24 +00:00
Benjamin Maxwell	3307b0374a	[LV] Teach the loop vectorizer llvm.sincos is trivially vectorizable (#128035 ) Depends on #123210	2025-02-27 09:37:06 +00:00
Florian Hahn	4277c21059	[VPlan] Introduce explicit broadcasts for live-ins. (#124644 ) Add a new VPInstruction::Broadcast opcode and use it to materialize explicit broadcasts of live-ins. The initial patch only materlizes the broadcasts if the vector preheader dominates all uses that need it. Later patches will pick the best valid insert point, thus retiring implicit hoisting of broadcasts from VPTransformsState::get(). PR: https://github.com/llvm/llvm-project/pull/124644	2025-02-26 13:57:51 +00:00
Luke Lau	e23ab73335	[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180 ) This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform. IIUC we initially did this to avoid `vl` toggles on RISC-V. However we now have the RISCVVLOptimizer pass which mostly makes this redundant. Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get noticeably better code generation.	2025-02-22 19:38:11 +08:00
Florian Hahn	6c627831f9	[VPlan] Use VPlan predecessors in VPWidenPHIRecipe (NFC). (#126388 ) Update VPWidenPHIRecipe to use the predecessors in VPlan to determine the incoming blocks instead of tracking them separately. This brings VPWidenPHIRecipe in line with the other phi recipes. PR: https://github.com/llvm/llvm-project/pull/126388	2025-02-17 16:40:37 +01:00
Benjamin Maxwell	e0e67a6207	[LV] Add initial support for vectorizing literal struct return values (#109833 ) This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The users of the call also must all be `extractvalue` instructions. The intended use case for this is vectorizing intrinsics such as: ``` declare { float, float } @llvm.sincos.f32(float %x) ``` Mapping them to structure-returning library calls such as: ``` declare { <4 x float>, <4 x float> } @Sleef_sincosf4_u10advsimd(<4 x float>) ``` Or their widened form (such as `@llvm.sincos.v4f32` in this case). Implementing this required two main changes: 1. Supporting widening `extractvalue` 2. Adding support for vectorized struct types in LV * This is mostly limited to parts of the cost model and scalarization Since the supported use case is narrow, the required changes are relatively small.	2025-02-17 09:51:35 +00:00
Nicholas Guy	9c89faa62b	[LoopVectorizer][AArch64] Add support for partial reduce subtraction (#123636 )	2025-02-13 10:35:45 +00:00
Florian Hahn	e258bca950	[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235 ) Update getOrCreateVPValueForSCEVExpr to only skip expansion of SCEVUnknown if the underlying value isn't an instruction. Instructions may be defined in a loop and using them without expansion may break LCSSA form. SCEVExpander will take care of preserving LCSSA if needed. We could also try to pass LoopInfo, but there are some users of the function where it won't be available and main benefit from skipping expansion is slightly more concise VPlans. Note that SCEVExpander is now used to expand SCEVUnknown with floats. Adjust the check in expandCodeFor to only check the types and casts if the type of the value is different to the requested type. Otherwise we crash when trying to expand a float and requesting a float type. Fixes https://github.com/llvm/llvm-project/issues/121518. PR: https://github.com/llvm/llvm-project/pull/125235	2025-02-11 13:03:12 +01:00
Hassnaa Hamdi	e9a20f77ee	Reland "[LV]: Teach LV to recursively (de)interleave." (#125094 ) This patch relands the changes from "[LV]: Teach LV to recursively (de)interleave.#122989" Reason for revert: - The patch exposed an assert in the vectorizer related to VF difference between legacy cost model and VPlan-based cost model because of uncalculated cost for VPInstruction which is created by VPlanTransforms as a replacement to 'or disjoint' instruction. VPlanTransforms do that instructions change when there are memory interleaving and predicated blocks, but that change didn't cause problems because at most cases the cost difference between legacy/new models is not noticeable. - Issue is fixed by #125434 Original patch: https://github.com/llvm/llvm-project/pull/89018 Reviewed-by: paulwalker-arm, Mel-Chen	2025-02-09 19:21:54 +00:00
Florian Hahn	32c4493d5f	[VPlan] Add incoming values for all predecessor to ResumePHI (NFCI). Follow-up as discussed when using VPInstruction::ResumePhi for all resume values (#112147). This patch explicitly adds incoming values for each predecessor in VPlan. This simplifies codegen and allows transformations adjusting the predecessors of blocks with NFC modulo incoming block order in phis.	2025-02-09 11:20:20 +00:00
Florian Hahn	1611059f5d	[VPlan] Compute cost for binary op VPInstruction with underlying values. (#125434 ) As exposed by https://github.com/llvm/llvm-project/pull/125094, we are missing cost computation for some binary VPInstructions we created based on original IR instructions. Their cost should be considered. PR: https://github.com/llvm/llvm-project/pull/125434	2025-02-07 15:27:31 +00:00
James Chesterman	ac158aa13b	[LoopVectorizer] Allow partial reductions to be made in predicated loops (#124268 ) Does a select on the input rather than the output. This way the mask has the same number of lanes as the other operand in the select instruction.	2025-02-07 09:09:10 +00:00
David Sherwood	f07cd36a5d	[LoopVectorize] Add the cost of VPInstruction::AnyOf to vplan (#125058 ) This patch adds an initial implementation of VPInstruction::computeCost with support for only one instruction so far - VPInstruction::AnyOf. This is only used when vectorising loops with uncountable early exits.	2025-02-05 16:31:14 +00:00
Florian Hahn	5008277322	[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104 ) Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (https://github.com/llvm/llvm-project/pull/117138). PR: https://github.com/llvm/llvm-project/pull/124104	2025-02-02 13:44:07 +00:00
David Sherwood	3bc2dade36	[LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567 ) This work feeds part of PR https://github.com/llvm/llvm-project/pull/88385, and adds support for vectorising loops with uncountable early exits and outside users of loop-defined variables. When calculating the final value from an uncountable early exit we need to calculate the vector lane that triggered the exit, and hence determine the value at the point we exited. All code for calculating the last value when exiting the loop early now lives in a new vector.early.exit block, which sits between the middle.split block and the original exit block. Doing this required two fixes: 1. The vplan verifier incorrectly assumed that the block containing a definition always dominates the block of the user. That's not true if you can arrive at the use block from multiple incoming blocks. This is possible for early exit loops where both the early exit and the latch jump to the same block. 2. We were adding the new vector.early.exit to the wrong parent loop. It needs to have the same parent as the actual early exit block from the original loop. I've added a new ExtractFirstActive VPInstruction that extracts the first active lane of a vector, i.e. the lane of the vector predicate that triggered the exit. NOTE: The IR generated for dealing with live-outs from early exit loops is unoptimised, as opposed to normal loops. This inevitably leads to poor quality code, but this can be fixed up later.	2025-01-30 10:37:00 +00:00
Florian Hahn	713482fccf	[VPlan] Use State.get to extract lane mask for BranchOnMask. Simplifies the code slightly and avoids redundant extracts/broadcasts if the operand is live-in or already scalar.	2025-01-27 21:35:36 +00:00
Florian Hahn	09a29fcc8d	[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819 ) Live-ins don't need to be handled, other than adding to the exit phi recipe. Do that early and assert that otherwise the exit value is defined in the vector loop region. This should enable simply skipping other exit values that do not need further fixing, e.g. if handling the exit value from the early exit directly in handleUncountableEarlyExit. PR: https://github.com/llvm/llvm-project/pull/123819	2025-01-27 16:12:07 +00:00
Elvis Wang	aff1242b8e	[LV] Align debug location of the widen-phi to the original phi. (#120338 ) This patch align the debug location of the widen-phi to the debug location of original phi. Split from: #120054	2025-01-24 17:49:54 +08:00
Florian Hahn	6c787ff6cf	Revert "[LV]: Teach LV to recursively (de)interleave. (#122989 )" This reverts commit 9491f75e1d912b277247450d1c7b6d56f7faf885. This triggers an assert when building with SVE enabled. https://lab.llvm.org/buildbot/#/builders/143/builds/4795	2025-01-21 21:36:16 +00:00
John Brawn	edf3a55bce	[LoopVectorize][NFC] Centralize the setting of CostKind (#121937 ) In each class which calculates instruction costs (VPCostContext, LoopVectorizationCostModel, GeneratedRTChecks) set the CostKind once in the constructor instead of in each function that calculates a cost. This is in preparation for potentially changing the CostKind when compiling for optsize.	2025-01-17 15:06:18 +00:00
Hassnaa Hamdi	9491f75e1d	Reland: [LV]: Teach LV to recursively (de)interleave. (#122989 ) This commit relands the changes from "[LV]: Teach LV to recursively (de)interleave. #89018" Reason for revert: - The patch exposed a bug in the IA pass, the bug is now fixed and landed by commit: #122643	2025-01-17 10:34:57 +00:00
LiqinWeng	0294dab79e	[LV][VPlan] Add fast flags for selectRecipe (#121023 ) Change the inheritance of class VPWidenSelectRecipe to class VPRecipeWithIRFlags, which allows recipe of the select to pass the fastmath flags.The patch of #119847 will add the fastmath flag to for recipe	2025-01-15 10:10:11 +08:00
Sam Tebbs	795e35a653	Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744 ) This relands the reverted #120721 with a fix for cases where neither reduction operand are the reduction phi. Only 63114239cc8d26225a0ef9920baacfc7cc00fc58 and 63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted PR. --------- Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>	2025-01-13 11:20:35 +00:00
Luke Lau	f0d5104c94	[VPlan] Handle some VPInstructions in may{Read,Write}FromMemory (#120058 ) This just copies the same conservative definition from mayWriteToMemory, and enables more VPInstructions to be hoisted out in LICM. I think this should give more accurate costs, and I was able to build llvm-test-suite without the legacy-vplan cost model assertion going off.	2025-01-08 15:17:26 +08:00
Yingwei Zheng	a77346bad0	[IRBuilder] Refactor FMF interface (#121657 ) Up to now, the only way to set specified FMF flags in IRBuilder is to use `FastMathFlagGuard`. It makes the code ugly and hard to maintain. This patch introduces a helper class `FMFSource` to replace the original parameter `Instruction *FMFSource` in IRBuilder. To maximize the compatibility, it accepts an instruction or a specified FMF. This patch also removes the use of `FastMathFlagGuard` in some simple cases. Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au	2025-01-06 14:37:04 +08:00
Florian Hahn	f4230b4332	[VPlan] Add and use debug location for VPScalarCastRecipe. Update the recipe it always take a debug location and set it.	2025-01-05 20:08:51 +00:00
Florian Hahn	b06a45c66f	[VPlan] Add all blocks to outer loop if present during ::execute (NFCI). This ensures that all blocks created during VPlan execution are properly added to an enclosing loop, if present. Split off from https://github.com/llvm/llvm-project/pull/108378 and also needed once more of the skeleton blocks are created directly via VPlan. This also allows removing the custom logic for early-exit loop vectorization added as part of https://github.com/llvm/llvm-project/pull/117008.	2024-12-31 19:34:34 +00:00
Muhammad Omair Javaid	332d2647ff	Revert "[LV]: Teach LV to recursively (de)interleave. (#89018 )" This reverts commit ccfe0de0e1e37ed369c9bf89dd0188ba0afb2e9a. This breaks LLVM build on AArch64 SVE Linux buildbots https://lab.llvm.org/buildbot/#/builders/143/builds/4462 https://lab.llvm.org/buildbot/#/builders/17/builds/4902 https://lab.llvm.org/buildbot/#/builders/4/builds/4399 https://lab.llvm.org/buildbot/#/builders/41/builds/4299	2024-12-31 03:12:24 +05:00
Zequan Wu	4d8f9594b2	Revert "Reland "[LoopVectorizer] Add support for partial reductions" (#120721 )" This reverts commit c858bf620c3ab2a4db53e84b9365b553c3ad1aa6 as it casuse optimization crash on -O2, see https://github.com/llvm/llvm-project/pull/120721#issuecomment-2563192057	2024-12-27 11:51:54 -08:00
Hassnaa Hamdi	ccfe0de0e1	[LV]: Teach LV to recursively (de)interleave. (#89018 ) Currently available intrinsics are only ld2/st2, which don't support interleaving factor > 2. This patch teaches the LV to use ld2/st2 recursively to support high interleaving factors.	2024-12-27 12:42:07 +00:00

1 2 3 4 5 ...

300 Commits