2151 Commits

Author SHA1 Message Date
Alexey Bataev
badf34a063
[LV] Process alloca in isPredicatedInst for tail-folded analysis.
The patch fixes a compiler crash when checking whether an alloca in the
loop is a predicated instruction.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/101743
2024-08-05 14:04:54 -04:00
Florian Hahn
fdb9f96fa2
[LV] Consider earlier stores to invariant reduction address as dead.
For invariant stores to an address of a reduction, only the latest store
will be generated outside the loop. Consider earlier stores as dead.

This fixes a difference between the legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/96294.
2024-08-04 20:54:26 +01:00
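For illustration, a loop of the shape sketched below (hypothetical code, not taken from the patch) stores the running reduction value to a loop-invariant address on every iteration; only the store from the final iteration is observable after the loop, so the earlier stores can be treated as dead when costing:

```c
/* Hypothetical example: sum_out is a loop-invariant address, so only the
 * store performed on the last iteration is visible after the loop; the
 * earlier stores are dead and a single store is emitted outside the
 * vectorized loop. */
void reduce_to_invariant(const int *restrict a, int *restrict sum_out, int n) {
  int sum = 0;
  for (int i = 0; i < n; i++) {
    sum += a[i];
    *sum_out = sum;
  }
}
```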
Kazu Hirata
b7146aed5b
[Transforms] Construct SmallVector with ArrayRef (NFC) (#101851) 2024-08-03 15:33:08 -07:00
Florian Hahn
66ce4f771e
[VPlan] Port invalid cost remarks to VPlan. (#99322)
This patch moves the logic to create remarks for instructions with
invalid costs to work on recipes, decoupling it from
selectVectorizationFactor. This is needed to replace the remaining uses
of selectVectorizationFactor with getBestPlan using the VPlan-based cost
model.

The current implementation iterates over all VPlans and their recipes
again, to find recipes with invalid costs, which is more work but will
only be done when remarks for LV are enabled. Once the remaining uses of
selectVectorizationFactor are retired, we can collect VPlans with
invalid costs as part of getBestPlan if we want to optimize the remarks
case a bit, at the cost of adding additional complexity.

PR: https://github.com/llvm/llvm-project/pull/99322
2024-07-27 12:52:12 +01:00
Florian Hahn
5a9b9ef660
[VPlan] Remove now redundant VF assertion.
The assertion was added in preparation for
https://github.com/llvm/llvm-project/pull/9882. Remove the assertion now
that the PR has landed.
2024-07-26 21:25:33 +01:00
Florian Hahn
67a55e01e3
[VPlan] Replace getBestPlan by getBestVF use also for epilogue vec. (#98821)
Replace getBestPlan with getBestVF, which simply finds the best
VF among the VFs of the available VPlans.

Then use getBestPlan to retrieve the corresponding VPlan.

This allows using getBestVF & getBestPlan for epilogue vectorization
as well. As the same plan may be used to vectorize both the main
and epilogue loop, restricting the VF of the best plan would cause
issues.

PR: https://github.com/llvm/llvm-project/pull/98821
2024-07-26 14:06:46 +01:00
Florian Hahn
8b02f31aea
[VPlan] Consistently use VF.Width when getting the plan for the main loop VF (NFC)
Cleanup to make things consistent in preparation for
https://github.com/llvm/llvm-project/pull/98821.
2024-07-26 11:16:45 +01:00
Florian Hahn
a3092152ac
[VPlan] Don't create live-outs for induction increments.
Follow-up to fc9cd3272b5 to also skip creating live-outs for IV
increments, as those are also generated independently of VPlan for now.
2024-07-25 21:34:55 +01:00
Florian Hahn
72532c9219
[LV] Don't predicate divs with invariant divisor when folding tail (#98904)
When folding the tail, at least one of the lanes must execute
unconditionally. If the divisor is loop-invariant no predication is
needed, as predication would not prevent the divide-by-0 on the executed
lane.

Depends on https://github.com/llvm/llvm-project/pull/98892.

PR: https://github.com/llvm/llvm-project/pull/98904
2024-07-25 12:21:09 +01:00
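As an illustrative sketch (hypothetical code, not from the patch): with a folded tail, at least one lane of every vector iteration executes unconditionally, so if the loop-invariant divisor below were zero, masking the division would not avoid the fault on that lane anyway; the division can therefore be left unpredicated:

```c
/* Hypothetical example: d is loop-invariant. With tail folding at least one
 * lane always executes, so predicating the division would not prevent a
 * divide-by-zero when d == 0; no predication is needed for the division. */
void div_by_invariant(const unsigned *restrict a, unsigned *restrict b,
                      unsigned d, int n) {
  for (int i = 0; i < n; i++)
    b[i] = a[i] / d;
}
```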
Florian Hahn
b72689a5cb
[LV] Ignore live-out users in cost model if scalar epilogue is required.
Follow-up to ba8126b6fef79.

If a scalar epilogue is required, users outside the loop won't use
live-outs from the vector loop but from the scalar epilogue. Ignore them if
that is the case.

This fixes another case where the VPlan-based cost-model more accurately
computes cost.

Fixes https://github.com/llvm/llvm-project/issues/100464.
2024-07-25 11:16:18 +01:00
Florian Hahn
07688d1341
Revert "[LV] Add option to still enable the legacy cost model. (#99536)"
This reverts commit 9ba524427321b931bad156860755adf420aeec6a.

Remove the recently added temporary option vectorize-use-legacy-cost-model
as discussed on the PR adding it, now that we branched for 19.x.
2024-07-24 14:32:36 +01:00
Florian Hahn
ba8126b6fe
[LV] Mark dead instructions in loop as free.
Update collectValuesToIgnore to also ignore dead instructions in the
loop. Such instructions will be removed by VPlan-based DCE and won't be
considered by the VPlan-based cost model.

This closes a gap between the legacy and VPlan-based cost model. In
practice with the default pipelines, there shouldn't be any dead
instructions in loops reaching LoopVectorize, but it is easy to generate
such cases by hand or automatically via fuzzers.

Fixes https://github.com/llvm/llvm-project/issues/99701.
2024-07-24 09:31:32 +01:00
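A hand-written sketch of the kind of input this targets (hypothetical code; a normal -O2 pipeline would delete the dead instruction long before LoopVectorize runs):

```c
/* Hypothetical example: 'dead' has no users, so the corresponding
 * instruction is removed by VPlan-based DCE and should not be costed by the
 * VPlan-based cost model. */
void copy_plus_one(const int *restrict a, int *restrict b, int n) {
  for (int i = 0; i < n; i++) {
    int dead = a[i] * 7; /* result never used */
    (void)dead;          /* silences -Wunused-variable; still dead in IR */
    b[i] = a[i] + 1;
  }
}
```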
Florian Hahn
d89f3e8df3
[VPlan] Remove dead HeaderVPBB argument from addUsersInExitBlock (NFC). 2024-07-23 11:36:43 +01:00
Florian Hahn
a23efcc703
[VPlan] Move VPInterleaveRecipe::execute to VPlanRecipes.cpp (NFC).
Move ::execute and ::print to VPlanRecipes.cpp in line with other recipe
definitions.
2024-07-20 22:23:02 +01:00
Florian Hahn
1f00c42446
[VPlan] Assert masked interleave accesses are allowed if needed (NFC)
Add assertion at interleave group construction.
2024-07-20 21:42:38 +01:00
Craig Topper
be7f1827ff
[LV] Use llvm::all_of in LoopVectorizationCostModel::getMaximizedVFForTarget. NFC (#99585) 2024-07-19 17:13:20 -07:00
Florian Hahn
9ba5244273
[LV] Add option to still enable the legacy cost model. (#99536)
This patch adds a new temporary option to still use the legacy cost
model after https://github.com/llvm/llvm-project/pull/92555. It defaults
to false and the only intended use is to adjust the default to true in
the soon-to-be-cut release branch.

PR: https://github.com/llvm/llvm-project/pull/99536
2024-07-19 18:48:15 +01:00
Florian Hahn
008df3cf85
[LV] Check isPredInst instead of isScalarWithPred in uniform analysis. (#98892)
Any instruction marked as uniform will result in a uniform
VPReplicateRecipe. If it requires predication, it will be placed in a
replicate region, even if isScalarWithPredication returns false.

Check isPredicatedInst instead of isScalarWithPredication to avoid
generating uniform VPReplicateRecipes placed inside a replicate region.
This fixes an assertion when using scalable VFs.

Fixes https://github.com/llvm/llvm-project/issues/80416. 
Fixes https://github.com/llvm/llvm-project/issues/94328.
Fixes https://github.com/llvm/llvm-project/issues/99625.

PR: https://github.com/llvm/llvm-project/pull/98892
2024-07-19 12:02:25 +01:00
Craig Topper
52d947b5c1 [LV] Remove unnecessary variable from InnerLoopVectorizer::createBitOrPointerCast. NFC
DstVTy is already a VectorType; we don't need to cast it again. This
used to be a cast to FixedVectorType that was changed to support
scalable vectors.
2024-07-18 12:54:40 -07:00
Florian Hahn
371777695f
[LV] Assert uniform recipes don't get predicated when vectorizing.
Add an assertion ensuring the invariant on construction, split off as suggested
from https://github.com/llvm/llvm-project/pull/98892.
2024-07-18 17:43:51 +01:00
Alexey Bataev
1a80153ba9
[LV][NFC]Simplify the structure and improve message of safe distance analysis for scalable vectorization. (#99487) 2024-07-18 10:11:39 -04:00
Florian Hahn
2bb65660ae
[LV] Allow re-processing of operands of instrs feeding interleave group
Follow up to d216615518 to update dead interleave group pointer detection
to allow re-processing of operands of instructions determined to only feed
interleave groups.

This is needed because instructions feeding interleave group pointers
can become dead in any order, as per the newly added test case.
2024-07-17 21:37:28 +01:00
Florian Hahn
75b3ddf23b
[VPlan] Use State.VF in vectorizeInterleaveGroup (NFCI).
Update vectorizeInterleaveGroup to use State.VF in preparation to moving
the code directly to the recipe.
2024-07-17 14:30:19 +01:00
Alexey Bataev
8156be684d
[LV][NFC]Introduce isScalableVectorizationAllowed() to refactor getMaxLegalScalableVF().
Adds isScalableVectorizationAllowed() and the corresponding data member
to query whether scalable vectorization is supported, rather than
performing the analysis each time a scalable vector factor is
requested.

Part of https://github.com/llvm/llvm-project/pull/91403

Reviewers: ayalz, fhahn

Reviewed By: fhahn, ayalz

Pull Request: https://github.com/llvm/llvm-project/pull/98916
2024-07-17 07:16:13 -04:00
Florian Hahn
d216615518
[LV] Process dead interleave pointer ops in reverse order.
Process dead interleave pointer ops in reverse order. This also catches
cases where the same base pointer is used by multiple different
interleave groups.

This fixes another case where the legacy cost model inaccurately
estimates cost, surfaced by b841e2eca3b5c8.
2024-07-17 11:43:42 +01:00
Sjoerd Meijer
c5329c827a
[LV][AArch64] Prefer Fixed over Scalable if cost-model is equal (Neoverse V2) (#95819)
For the Neoverse V2 we would like to prefer fixed-width over scalable
vectorization if the cost model assigns an equal cost to both for certain
loops. This improves 7 kernels from TSVC-2 and several production kernels by
about 2x, and does not affect SPEC2017 INT and FP. This also adds a new TTI
hook that can steer the loop vectorizer towards preferring fixed-width
vectorization, which can be set per CPU. For now, this is only enabled for
the Neoverse V2.

There are 3 reasons why preferring NEON might be better when the cost model
is a tie and the SVE vector size is the same as NEON (128-bit):
architectural reasons, micro-architectural reasons, and SVE codegen reasons.
The latter will improve over time, so the more important reasons are the
former two. The (micro-)architectural reason is the use of LDP/STP
instructions, which are not available in SVE2, and the avoidance of
predication.

For what it is worth: this codegen strategy of generating more NEON is in
line with GCC's codegen strategy, which is actually even more aggressive in
generating NEON when no predication is required. We could be smarter about
the decision making, but this seems to be a good first step in the right
direction, and we can always revise this later (for example, make the target
hook more general).
2024-07-17 10:46:28 +01:00
Florian Hahn
cf673604c1
[LV] Use VF from selected plan when creating InnerLoopVectorizer.
This makes sure the same VF is used when executing the plan and in the
functions in InnerLoopVectorizer when the assertion is disabled (e.g.
release builds).

No tests added as they would trigger an assertion.
2024-07-17 10:29:58 +01:00
Florian Hahn
7e2b5e233a
[LV] Move reportVectorizationInfo to LoopVectorize.cpp (NFC)
The function is only used in LoopVectorize.cpp, no need to define it in
header.
2024-07-16 13:47:12 +01:00
Florian Hahn
cc97a0d347
[LV] Use getBestPlan when interleaving only. (NFCI)
Use the getBestPlan() utility added in b841e2eca3 to also get the
scalar plan when interleaving only.
2024-07-16 12:00:49 +01:00
Mel Chen
4eb30cfb34
[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)
Following from #87816, add VPReductionEVLRecipe to describe vector
predication reduction.

Address one of TODOs from #76172.
2024-07-16 16:15:24 +08:00
Florian Hahn
fc9cd3272b
[VPlan] Don't add live-outs for IV phis.
Resume and exit values for inductions are currently still created
outside of VPlan and independently of the induction recipes. Don't add
live-outs for now, as the additional unneeded users can pessimize other
analyses.

Fixes https://github.com/llvm/llvm-project/issues/98660.
2024-07-14 20:49:03 +01:00
Mel Chen
a00754bb2a
[LV] Fix the cost of min/max reductions. (#98453)
This patch updates the function getReductionPatternCost to compute the
cost of min/max reductions via TTI.getMinMaxReductionCost.
2024-07-12 13:47:33 +08:00
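For context, the kind of pattern whose cost is now obtained via TTI.getMinMaxReductionCost looks roughly like the hypothetical loop below (illustrative only, not taken from the patch):

```c
/* Hypothetical example of a max reduction recognized by the vectorizer
 * (assumes n >= 1). */
int max_element(const int *a, int n) {
  int m = a[0];
  for (int i = 1; i < n; i++)
    m = a[i] > m ? a[i] : m;
  return m;
}
```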
Florian Hahn
7a49d80f58
[VPlan] Skip users outside loop in check for exit pre-compute candidates
When collecting candidates to pre-compute cost for operands of exit
conditions, skip users outside the loop when checking if they are in
ExitInstrs. Users outside the loop should be ignored, as they won't
make a value live in the VPlan.

This fixes a failure when building for X86 with sanitizers on macOS
after b841e2eca3b5c
(https://green.lab.llvm.org/job/llvm.org/job/clang-stage2-cmake-RgSan/287/)
2024-07-11 22:04:39 +01:00
Graham Hunter
22a7f6dcc4
Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493)
Reverts llvm/llvm-project#91458 to deal with post-commit reviewer
requests.
2024-07-11 16:39:30 +01:00
Florian Hahn
9a5a8731e7
[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.

The VPInstruction takes 2 operands: 1) the incoming value from the
middle block and 2) a default value to be used for all other incoming
blocks.

In follow-up changes, it will also be used to create phis for reduction
and induction resume values.

Depends on https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/94760
2024-07-11 16:08:04 +01:00
Graham Hunter
1860fd049e
[LV] Autovectorization for the all-in-one histogram intrinsic (#91458)
This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.
2024-07-11 15:33:30 +01:00
Florian Hahn
2267191072
[LV] Add missing check, drop 'then'.
Address post-commit comments for 67f4968a577.
2024-07-11 15:21:33 +01:00
Florian Hahn
67f4968a57
[LV] Skip cost for ZExt/SExts that will be removed by truncating ops.
If an extend is truncated, it will be removed if the result type is <=
the source type, as there is nothing to extend. Return a cost of 0.

This was caught by the first step to perform cost-modeling based on
VPlan (b841e2e), as the legacy cost model would query the cost of an
invalid extend, while the extend has been folded away by VPlan
transforms.

Fixes https://github.com/llvm/llvm-project/issues/98413.
2024-07-11 11:40:14 +01:00
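An illustrative source pattern (hypothetical, not from the linked issue): the i8 operands below are widened for the add and immediately truncated back to i8 by the store, so VPlan narrows the computation and the extends disappear, leaving nothing to cost:

```c
/* Hypothetical example: the implicit i8 -> i32 extends feeding the add are
 * folded away once the arithmetic is truncated back to i8, so their cost
 * should be 0. */
void add_bytes(unsigned char *restrict a, const unsigned char *restrict b,
               int n) {
  for (int i = 0; i < n; i++)
    a[i] = (unsigned char)(a[i] + b[i]);
}
```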
Florian Hahn
88e9c56990
[LV] Don't adjust name of recurrence phi in scalar loop (NFC).
Adjusting the name of the recurrence phi in the scalar loop is a bit
inconsistent, as we do not adjust any other names in the scalar loops
(including other phis).

Remove this adjustment in preparation for
https://github.com/llvm/llvm-project/pull/94760/ and as discussed there.
2024-07-10 18:37:35 +01:00
Florian Hahn
b841e2eca3
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

A number of crashes have been fixed by separate fixes, including
https://github.com/llvm/llvm-project/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds a getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-07-10 14:22:21 +01:00
Florian Hahn
ef89e3efa9
[VPlan] Collect ephemeral values for VPlan.
Port collectEphemeralValues to VPlan as collectEphemeralRecipesForVPlan,
use it in willGenerateVectors. This fixes a regression caused by
29b8b72117 for loops where the only vector values are ephemeral.
2024-07-09 21:34:49 +01:00
Florian Hahn
72937203dd
[VPlan] Create vector header and latch VPBBs in createInitialVPlan (NFC)
The empty header and latch blocks can be created together with the
vector loop region.

This is in preparation for splitting up the very large
tryToBuildVPlanWithVPRecipes into several distinct functions, as
suggested multiple times, including in
https://github.com/llvm/llvm-project/pull/94760
2024-07-09 12:41:12 +01:00
Florian Hahn
a2a0ef567c
[VPlan] Retrieve LatchVPBB from region in adjustRecipesForRed (NFC)
The HeaderVPBB is retrieved in a similar fashion already.

This is in preparation for splitting up the very large
tryToBuildVPlanWithVPRecipes into several distinct functions, as
suggested multiple times, including in
https://github.com/llvm/llvm-project/pull/94760
2024-07-09 10:48:43 +01:00
Florian Hahn
0577cdaa32
[LV] Split checking if tail-folding is possible, collecting masked ops. (#77612)
Introduce a new canFoldTail helper which only checks whether tail-folding is
possible, without modifying MaskedOps.

Just because tail-folding is possible doesn't mean the tail will be
folded; that's up to the cost-model to decide. Separating the check if
tail-folding is possible and preparing for tail-folding makes sure that
MaskedOps is only populated when tail-folding is actually selected.

PR: https://github.com/llvm/llvm-project/pull/77612
2024-07-08 16:34:42 +01:00
Florian Hahn
29b8b72117
[LV] Move check if any vector insts will be generated to VPlan. (#96622)
This patch moves the check if any vector instructions will be generated
from getInstructionCost to be based on VPlan. This simplifies
getInstructionCost, is more accurate as we check the final result and
also allows us to exit early once we visit a recipe that generates
vector instructions.

The helper can then be re-used by the VPlan-based cost model to match
the legacy selectVectorizationFactor behavior, thus fixing a crash and
paving the way to recommit
https://github.com/llvm/llvm-project/pull/92555.

PR: https://github.com/llvm/llvm-project/pull/96622
2024-07-07 20:08:01 +01:00
Florian Hahn
ac03ae30cf
[LV] Preserve LAA in LoopVectorize (NFCI).
LoopVectorize already always preserves DT, LI and SCEV. If any changes
are made to the CFG, cached LAA info for loops is cleared.

LoopAccessAnalysis also implements ::invalidate to clear the analysis if
SE, DT or LI gets invalidated. Hence it should be safe to preserve LAA
and save a small amount of compile-time.
2024-07-05 21:41:31 +01:00
Florian Hahn
eedc2c8cb2
[LV] Remove now obsolete DT updates of scalar exit block.
Remove manual DT updates of scalar exit blocks during legacy skeleton
creation, as they are not needed after 99d6c6d9365.

This fixes DT verification failures with expensive checks, including
https://lab.llvm.org/buildbot/#/builders/16/builds/1270.
2024-07-05 11:20:44 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Florian Hahn
8299bfaf29
[VPlan] Extract reduction result insertion point to variable (NFCI).
Split off from https://github.com/llvm/llvm-project/pull/92651 as
suggested.
2024-07-04 16:25:49 +01:00
Florian Hahn
2b3b405b09
[LV] Don't vectorize first-order recurrence with VF <vscale x 1 x ..>
The assertion added as part of https://github.com/llvm/llvm-project/pull/93395
surfaced cases where first-order recurrences are vectorized with
<vscale x 1 x ..>. If vscale is 1, then we are unable to extract the
penultimate value (second to last lane). Previously this case got
mis-compiled, trying to extract from an invalid lane (-1); see
https://llvm.godbolt.org/z/3adzYYcf9.

Fixes https://github.com/llvm/llvm-project/issues/97452.
2024-07-04 11:44:51 +01:00
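A hypothetical loop with a first-order recurrence, included only to illustrate the shape being discussed: vectorizing it requires extracting the penultimate lane of the recurrence vector for the scalar remainder, which is not possible for <vscale x 1 x ..> when vscale is 1 and the vector has a single lane:

```c
/* Hypothetical example of a first-order recurrence: each iteration uses the
 * value loaded in the previous iteration ('prev'). */
void first_order_diff(const int *restrict a, int *restrict b, int n) {
  int prev = 0;
  for (int i = 0; i < n; i++) {
    b[i] = a[i] - prev;
    prev = a[i];
  }
}
```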