llvm-project

Author	SHA1	Message	Date
Florian Hahn	e9834209aa	[VPlan] Move convertToConcreteRecipes to end of VPlan-opt phase (NFCI). Adjust placement as suggested in https://github.com/llvm/llvm-project/pull/114305, after some refactoring to prepare for the move.	2024-12-10 09:13:13 +00:00
Florian Hahn	0e70289f37	[VPlan] Create canonical IV resume value for epilogue in VPlan. (NFCI) Update the code to create induction resume PHIs to also create a resume phi for the canonical induction during epilogue vectorization. This unifies the code for handling induction resume values and removes the need to manually create a resume PHI and return it during epilogue creation. Overall it helps to move the code for updating the canonical induction resume value to the place where all other header phi resume values are updated. This is NFC, modulo order of the created phis.	2024-12-09 23:11:38 +00:00
Florian Hahn	adfe54f7da	[VPlan] Directly check VectorizingEpilogue in ::executePlan (NFC). Directly check VectorizingEpilogue which directly indicates that the epilogue is vectorized.	2024-12-09 22:21:25 +00:00
Florian Hahn	4fd8dbc184	[LV] Move code to prepare VPlan for epilogue vector loop to helper (NFC) Move code to prepare the VPlan for the epilogue vector loop to a helper to reduce size and complexity of processLoop.	2024-12-09 21:56:10 +00:00
Kazu Hirata	9099d694f6	[Vectorize] Fix a warning This patch fixes: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2699:49: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions]	2024-12-09 10:26:48 -08:00
Igor Kirillov	337936a83b	[LV] Ignore some costs when loop gets fully unrolled (#106699 ) When VF has a fixed width and equals the number of iterations, and we are not tail folding by masking, comparison instruction and induction operation will be DCEed later. Ignoring the costs of these instructions improves the cost model.	2024-12-09 18:17:52 +00:00
Florian Hahn	eff0d8103c	[VPlan] Adjust original position of convertToConcreteRecipes. Restore the original position of the call before afef545efab77a8 to fix a number of crashes.	2024-12-08 21:52:51 +00:00
Florian Hahn	afef545efa	[VPlan] Address post-commit for #114305 . Apply suggested renaming and adjust placement as suggested in https://github.com/llvm/llvm-project/pull/114305. Also drop unneeded RPOT creation.	2024-12-08 21:24:19 +00:00
Florian Hahn	156da98683	[VPlan] Move printing final VPlan to ::execute (NFC). This moves printing of the final VPlan to ::execute. This ensures the final VPlan is printed, including recipes that get introduced by late, lowering transforms and skeleton construction. Split off from https://github.com/llvm/llvm-project/pull/114292, to simplify the diff.	2024-12-07 09:39:10 +00:00
Florian Hahn	7f7f540a48	Reapply "[VPlan] Update scalar induction resume values in VPlan. (#110577 )" This reverts commit f09b16e2671cbcdf7cb7dc7ed705db092a9deda1. The crash when building llvm-test-suite with stage2 should have been fixed by 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.	2024-12-06 19:41:51 +00:00
Nikita Popov	f09b16e267	Revert "[VPlan] Update scalar induction resume values in VPlan. (#110577 )" This reverts commit 0678e2058364ec10b94560d27ec7138dfa003287. This reverts commit 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea. Causes crashes in llvm-test-suite when using stage 2 clang.	2024-12-06 18:01:42 +01:00
Florian Hahn	0678e20583	[VPlan] Update scalar induction resume values in VPlan. (#110577 ) Updated ILV.createInductionResumeValues (now createInductionResumeVPValue) to directly update the VPIRInstructions wrapping the original phis with the created resume values. This is the first step towards modeling them completely in VPlan. Subsequent patches will move creation of the resume values completely into VPlan. Depends on https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/110577	2024-12-06 12:26:19 +00:00
Florian Hahn	f081ffe701	[LV] Simplify & clarify bypass handling for IV resume values (NFC) Split off NFC part refactoring from https://github.com/llvm/llvm-project/pull/110577. This simplifies and clarifies induction resume value creation for bypass blocks.	2024-12-06 11:33:30 +00:00
Florian Hahn	a7fda0e1e4	[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305) Introduce a general recipe to generate a scalar phi. Lower VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarPHIRecipe before plan execution, avoiding the need for duplicated ::execute implementations. There are other cases that could benefit, including in-loop reduction phis and pointer induction phis. Builds on a similar idea as https://github.com/llvm/llvm-project/pull/82270. PR: https://github.com/llvm/llvm-project/pull/114305	2024-12-03 14:53:51 +00:00
Florian Hahn	77767986ed	[LV] Use IsaPred in a few more places (NFC). Simplifies the code slightly by removing explicit lambdas.	2024-12-01 18:47:53 +00:00
Florian Hahn	82821254f5	[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758 ) If IVUpdateMayOverflow is false, we proved that the induction increment cannot overflow in the vector loop. This allows setting NUW in some cases when folding the tail. PR: https://github.com/llvm/llvm-project/pull/111758	2024-11-28 10:12:41 +00:00
Elvis Wang	9ea5be639d	Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109)" (#117289) Update the test cases that contain `any-of` printing from precomputeCost(). Original message: The any-of reduction contains phi and select instructions. The select instruction might be optimized and removed in the VPlan, which may cause a VF difference between the legacy and VPlan-based models. But if the select instruction is removed, planContainsAdditionalSimplifications() will catch it and disable the assertion. Therefore, we can just remove the any-of reduction calculation in precomputeCost(). Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109)"	2024-11-28 15:07:36 +08:00
Florian Hahn	590f451b60	[VPlan] Allow setting IR name for VPDerivedIVRecipe (NFCI). Allow setting the name to use for the generated IR value of the derived IV in preparations for https://github.com/llvm/llvm-project/pull/112145. This is analogous to VPInstruction::Name.	2024-11-24 20:39:12 +00:00
Elvis Wang	0e3c791916	Revert "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC" (#117280) Reverts llvm/llvm-project#117109. Some test cases need to be updated.	2024-11-22 11:32:52 +08:00
Elvis Wang	ce66b56865	[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109) The any-of reduction contains phi and select instructions. The select instruction might be optimized and removed in the VPlan, which may cause a VF difference between the legacy and VPlan-based models. But if the select instruction is removed, `planContainsAdditionalSimplifications()` will catch it and disable the assertion. Therefore, we can just remove the any-of reduction calculation in precomputeCost().	2024-11-22 10:48:11 +08:00
Florian Hahn	4d1959b70b	[VPlan] Generalize collectUsersInExitBlocks for multiple exit bbs. (#115066 ) Generalize collectUsersInExitBlock to collecting exit users in multiple exit blocks. Exit blocks are leaf nodes in the VPlan (without successors) except the scalar header. Split off in preparation for https://github.com/llvm/llvm-project/pull/112138 PR: https://github.com/llvm/llvm-project/pull/115066	2024-11-21 21:15:36 +00:00
Finn Plummer	8663b8777e	[NFC][VectorUtils][TargetTransformInfo] Add `isVectorIntrinsicWithOverloadTypeAtArg` api (#114849) This change allows target intrinsics to specify and overwrite overloaded types. - Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case - Updates `SLPVectorizer` to use available TTI - Updates `VPTransformState` to pass down TTI - Updates `VPlanRecipe` to use passed-down TTI This change will let us add scalarization for `asdouble`: #114847	2024-11-21 11:04:25 -08:00
Sjoerd Meijer	9bccf61f5f	[AArch64][LV] Set MaxInterleaving to 4 for Neoverse V2 and V3 (#100385 ) Set the maximum interleaving factor to 4, aligning with the number of available SIMD pipelines. This increases the number of vector instructions in the vectorised loop body, enhancing performance during its execution. However, for very low iteration counts, the vectorised body might not execute at all, leaving only the epilogue loop to run. This issue affects e.g. cam4_r from SPEC FP, which experienced a performance regression. To address this, the patch reduces the minimum epilogue vectorisation factor from 16 to 8, enabling the epilogue to be vectorised and largely mitigating the regression.	2024-11-20 09:33:39 +00:00
David Sherwood	12180717cb	[NFC][LoopVectorize] Introduce new getEstimatedRuntimeVF function (#116247 ) There are lots of places where we try to estimate the runtime vectorisation factor based on the getVScaleForTuning TTI hook. I've added a new getEstimatedRuntimeVF function and taught several places in the vectoriser to use this new function.	2024-11-19 12:38:11 +00:00
David Sherwood	3097c60928	[LoopVectorize][NFC] Rewrite tests to check output of vplan cost model (#113697 ) Currently it's very difficult to improve the cost model for tail-folded loops because as soon as you add a VPInstruction::computeCost function that adds the costs of instructions such as VPInstruction::ActiveLaneMask and VPInstruction::ExplicitVectorLength the assert in LoopVectorizationPlanner::computeBestVF fails for some tests. This is because the VF chosen by the legacy cost model doesn't match the vplan cost model. See PR #90191. This assert is currently making it difficult to improve the cost model. Hopefully we will be in a position to remove the assert soon, however in order to do that we have to fix up a whole bunch of tests that rely upon the legacy cost model output. I've tried my best to update these tests to use vplan output instead. There is still work needed for the VF=1 case because the vplan cost model is not printed out in this case. I've not attempted to fix those in this patch.	2024-11-19 08:55:39 +00:00
Julian Nagele	a8538b9138	[LV] Vectorize Epilogues for loops with small VF but high IC (#108190 ) - Consider MainLoopVF * IC when determining whether Epilogue Vectorization is profitable - Allow the same VF for the Epilogue as for the main loop - Use an upper bound for the trip count of the Epilogue when choosing the Epilogue VF PR: https://github.com/llvm/llvm-project/pull/108190 --------- Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-11-17 19:35:32 +00:00
Luke Lau	050e2d325a	[LV] Remove assertions in IV overflow check (#115705) In #111310 an assert was added that for the IV overflow check used with tail folding, the overflow check is never known. However when applying the loop guards, it looks like it's possible that we might actually know the IV won't overflow: this occurs in 500.perlbench_r from SPEC CPU 2017 and triggers the assertion: Assertion failed: (!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) && !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT), TC2OverflowSCEV, SE.getSCEV(Step)) && "unexpectedly proved overflow check to be known"), function emitIterationCountCheck, file LoopVectorize.cpp, line 2501. There is a discrepancy between `isIndvarOverflowCheckKnownFalse` and the ICMP_ULT check, because the former uses `getSmallConstantMaxTripCount`, which only takes into account trip counts that fit into 32 bits. There doesn't seem to be an easy way to make the assertion aware of this, so this PR just removes it for now. There are two potential follow-ups from this PR: 1. We miss calculating the max trip count in `@trip_count_max_1024`; it looks like we might need to apply loop guards somewhere in `ScalarEvolution::computeExitLimitFromICmp` 2. In `@overflow_at_0`, if `%tc == 0` then the overflow check will always return false, even though it will overflow Fixes https://github.com/llvm/llvm-project/issues/115755	2024-11-14 17:04:49 +09:00
Luke Lau	9e77f59005	[LV] Account for vp_merge in out of loop EVL reductions in legacy cost model (#115903 ) In #101641, support for out of loop reductions with EVL tail folding was added by transforming selects to vp_merges in transformRecipestoEVLRecipes. Whilst the select was previously free, the vp_merge wasn't and incurs a cost on RISC-V with the VPlan cost model. But this diverged from the legacy cost model and caused the "VPlan cost model and legacy cost model disagreed" assertion to trigger when building 502.gcc_r from SPEC CPU 2017. Neither the select nor vp_merge recipes from the VPlan exist in the underlying instructions, so I thought it would make the most sense to fix this by adding the cost to the underlying phi instruction in getInstructionCost. It's worth noting that on RISC-V this vp_merge won't actually generate any instructions because the mask is all true, and will be folded away. So we should update the cost model at some point to reflect that.	2024-11-14 16:55:18 +09:00
Elvis Wang	b4d23cf685	[LV] Fix missing precomputeCosts() in emitInvalidCostRemarks(). (#114918) We should always update the `SkipComputation` which is set in `VPCostContext` before VPlan computes costs. This patch prevents the in-loop reduction assertion in `VPReductionRecipe::computeCost()` and other potential assertions from the partially implemented VPlan-based cost model.	2024-11-14 08:29:55 +08:00
Florian Hahn	98c4f4fce8	[LV] Remove IVEndValues, use resume value directly from fixed phi.(NFC) Use the IV resume/end values from the phis in the scalar header, instead of collecting them in a map. This removes some complexity from the code dealing with induction resume values. Analogous to 1edd22030 which did the same for reduction resume values.	2024-11-13 21:03:54 +00:00
Kazu Hirata	c236dbc343	[Vectorize] Simplify code with MapVector::operator[] (NFC) (#115592 )	2024-11-09 14:36:32 -08:00
Florian Hahn	144bdf3eb7	[VPlan] Also check if plan for best legacy VF contains simplifications. The plan for the VF chosen by the legacy cost model could also contain additional simplifications that cause cost differences. Also check if it contains simplifications. Fixes https://github.com/llvm/llvm-project/issues/114860.	2024-11-08 20:53:03 +00:00
David Sherwood	d77a36e01b	[LoopVectorize] Use new getUniqueLatchExitBlock routine (#108231 ) With PR #88385 I am introducing support for vectorising more loops with early exits that don't require a scalar epilogue. As such, if a loop doesn't have a unique exit block it will not automatically imply we require a scalar epilogue. Also, in all places in the code today where we use the variable LoopExitBlock we actually mean the exit block from the latch. Therefore, it seemed reasonable to add a new getUniqueLatchExitBlock that allows the caller to determine the exit block taken from the latch and use this instead of getUniqueExitBlock. I also renamed LoopExitBlock to be LatchExitBlock. I feel this not only better reflects how the variable is used today, but also prepares the code for PR #88385. While doing this I also noticed that one of the comments in requiresScalarEpilogue is wrong when we require a scalar epilogue, i.e. when we're not exiting from the latch block. This doesn't always imply we have multiple exits, e.g. see the test in Transforms/LoopVectorize/unroll_nonlatch.ll where the latch unconditionally branches back to the only exiting block.	2024-11-06 10:35:35 +00:00
Mel Chen	4480a22c2b	[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (#101641 ) Following #90184, this patch emits vp.merge intrinsic, which is used to set the inactive lanes in a select operation to the RHS instead of undef. Currently, it is applied to out-loop reduction for EVL vectorization. This patch performs transformation to convert select(header_mask, LHS, RHS) into vp.merge(all-true, LHS, RHS, EVL) And always use the predicated reduction select to set the incoming value of the reduction phi to support out-loop reduction when using tail folding with EVL. TODO: Postpone the adjustment of the predicated reduction select to VPlanTransform. The current adjustment might be too early, which could lead to a situation where the predicated reduction select is adjusted, but the EVL recipes cannot be successfully generated during VPlanTransform.	2024-11-06 14:53:49 +08:00
Mel Chen	70de0b8bea	[LV][NFC] Simplify initialization of MinProfitableTripCount (#113445 ) Iteration runtime check confirms whether the trip count is greater than VFxUF at least. Therefore, there is no need to adjust the MinProfitableTripCount to VF if it is zero. Retaining the original MinProfitableTripCount information is also beneficial for supporting more profitable runtime checks in the future.	2024-11-05 15:13:59 +08:00
Florian Hahn	17bad1a9da	[LV] Bail out on header phis in shouldConsiderInvariant. This fixes an infinite recursion in rare cases. Fixes https://github.com/llvm/llvm-project/issues/113794.	2024-11-01 20:51:25 +00:00
David Sherwood	4ed7bcb4a6	[VPlan][NFC] Add new getMiddleBlock interface to VPlan (#113558) This work is in preparation for PRs #112138 and #88385 where the middle block is not guaranteed to be the immediate successor to the region block. I've simply added new getMiddleBlock() interfaces to VPlan that for now just return cast<VPBasicBlock>(VectorRegion->getSingleSuccessor()) Once PR #112138 lands we'll need to do more work to discover the middle block.	2024-11-01 10:50:52 +00:00
Florian Hahn	b021464d35	[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975 ) Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975	2024-10-31 21:36:44 +01:00
Florian Hahn	5bd1af5abc	[LV] Directly store VPlan in InnerLoopVectorizer (NFC). The current VPlan is already passed to multiple functions and more in the future. Store it once directly in InnerLoopVectorizer.	2024-10-30 18:39:50 +00:00
Piotr Fusik	3c02fea737	[LV][NFC] Remove stray semicolons (#114057 )	2024-10-30 04:07:14 +01:00
Florian Hahn	0d0abb351b	[VPlan] Use ResumePhi to create reduction resume phis. (#110004 ) Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a8731e77. This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added. PR: https://github.com/llvm/llvm-project/pull/110004	2024-10-28 20:14:08 +01:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974 ) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Florian Hahn	9648271a3c	[LV] Pass flag indicating epilogue is vectorized to executePlan (NFC) This clarifies the flag, which is now only passed if the epilogue loop is being vectorized.	2024-10-25 20:39:47 +02:00
Florian Hahn	7b9f988a53	[VPlan] Limit stride replacement to vector region and middle VPBB (NFC). At the moment this is NFC, but ensures we only replace uses that are dominated by runtime checks as we model more of the skeleton in VPlan.	2024-10-24 15:08:36 -07:00
Florian Hahn	2437784a17	[LV] Replace unreachable by folding into else with assert (NFC). Simplify code as suggested post-commit in https://github.com/llvm/llvm-project/pull/110576/.	2024-10-21 21:46:48 -07:00
Elvis Wang	b3edc764f7	[VPlan] Implement VPWidenCastRecipe::computeCost(). (NFCI) (#111339) This patch implements `VPWidenCastRecipe::computeCost()` and skips cast recipes in the in-loop reduction.	2024-10-22 12:23:49 +08:00
Florian Hahn	173907b5d7	[LV] Move logic to check if op is invariant to legacy cost model. (NFC) This allows the function to be re-used in other places	2024-10-20 17:26:15 -07:00
Florian Hahn	cba5c77a71	[VPlan] Mark unreachable code path when retrieving the scalar PH. (NFCI)	2024-10-19 19:14:21 -07:00
Florian Hahn	2deb3a26fa	[LV] Fixup IV users only once during epilogue vectorization. (NFC) Induction users only need to be updated when vectorizing the epilogue. Avoid running fixupIVUsers when vectorizing the main loop during epilogue vectorization.	2024-10-19 18:11:06 -07:00
Florian Hahn	2a6b09e0d3	[LV] Use type from InsertPos for cost computation of interleave groups. Previously the legacy cost model would pick the type for the cost computation depending on the order of the members in the input IR. This is incompatible with the VPlan-based cost model (independent of original IR order) and also doesn't match code-gen, which uses the type of the insert position. Update the legacy cost model to use the type (and address space) from the Group's insert position. This brings the legacy cost model in line with the VPlan-based cost model and fixes a divergence between both models. Note that the X86 cost model seems to assign different costs to groups with i64 and double types. Added a TODO to check. Fixes https://github.com/llvm/llvm-project/issues/112922.	2024-10-18 19:12:40 -07:00

2308 Commits