llvm-project

Author	SHA1	Message	Date
Florian Hahn	8796dfdcba	[VPlan] Consolidate logic to update loop metadata and profile info. This patch consolidates updating loop metadata and profile info for both the remainder and vector loops in a single place. This is NFC, modulo consistently applying vectorization specific metadata also in the experimental VPlan-native path. Split off from https://github.com/llvm/llvm-project/pull/154510.	2025-09-04 21:50:40 +01:00
Hassnaa Hamdi	35b22764e2	[LV][AArch64] Prefer epilogue with fixed-width over scalable VF. (#155546 ) In case of equal costs Prefer epilogue with fixed-width over scalable VF. That is helpful in cases like post-LTO vectorization where epilogue with fixed-width VF can be removed when we eventually know that the trip count is less than the epilogue iterations.	2025-09-04 19:31:30 +01:00
Florian Hahn	5faed1ad84	[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643 ) This patch adds a new VPlan-based addMinimumIterationCheck, which replaced the ILV version for the non-epilogue case. The VPlan-based version constructs a SCEV expression to compute the minimum iterations, use that to check if the check is known true or false. Otherwise it creates a VPExpandSCEV recipe and emits a compare-and-branch. When using epilogue vectorization, we still need to create the minimum trip-count-check during the legacy skeleton creation. The patch moves the definitions out of ILV. PR: https://github.com/llvm/llvm-project/pull/153643	2025-08-26 15:52:31 +01:00
Ramkumar Ramachandra	97f554249c	[VPlan] Preserve nusw in createInBoundsPtrAdd (#151549 ) Rename createInBoundsPtrAdd to createNoWrapPtrAdd, and preserve nusw as well as inbounds at the callsite.	2025-08-18 17:48:42 +01:00
Florian Hahn	424258947e	[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879 ) Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: https://github.com/llvm/llvm-project/pull/152879	2025-08-12 14:13:13 +01:00
Luke Lau	94a6cd464e	[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274 ) This is the VPWidenPointerInductionRecipe equivalent of #118638, with the motivation of allowing us to use the EVL as the induction step. There is a new VPInstruction added, WidePtrAdd to allow adding the step vector to the induction phi, since VPInstruction::PtrAdd only handles scalars or multiple scalar lanes. Originally this transformation was copied from the original recipe's execute code, but it's since been simplifed by teaching `unrollWidenInductionByUF` to unroll the recipe, which brings it inline with VPWidenIntOrFpInductionRecipe.	2025-08-05 16:54:02 +08:00
Florian Hahn	c9dd14d1d4	[VPlan] Compute interleave count for VPlan. (#149702 ) Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some information directly from VPlan. Register pressure was already computed for a VPlan, and with this patch we now also check for reductions directly on VPlan, as well as checking how many load and store operations remain in the loop. This should be mostly NFC, but we may compute slightly different interleave counts, except for some edge cases, e.g. where dead loads have been removed. This shouldn't happen in practice, and the patch doesn't cause changes across a large test corpus on AArch64. Computing the interleave count based on VPlan allows for making better decisions in presence of VPlan optimizations, for example when operations on interleave groups are narrowed. Note that there are a few test changes for tests that were still checking the legacy cost-model output when it was computed in selectInterleaveCount. PR: https://github.com/llvm/llvm-project/pull/149702	2025-08-05 09:42:55 +01:00
Ramkumar Ramachandra	20f6ec4b29	[VPlan] Make VPBuilder APIs uniformly take ArrayRef (NFC) (#151484 )	2025-07-31 11:33:04 +01:00
Florian Hahn	004c67ea25	[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239 ) Update LV to vectorize maxnum/minnum reductions without fast-math flags, by adding an extra check in the loop if any inputs to maxnum/minnum are NaN, due to maxnum/minnum behavior w.r.t to signaling NaNs. Signed-zeros are already handled consistently by maxnum/minnum. If any input is NaN, exit the vector loop, compute the reduction result up to the vector iteration that contained NaN inputs and * resume in the scalar loop New recurrence kinds are added for reductions using maxnum/minnum without fast-math flags. PR: https://github.com/llvm/llvm-project/pull/148239	2025-07-18 21:58:19 +01:00
Florian Hahn	64686c59c3	[VPlan] Connect (MemRuntime\|SCEV)Check blocks as VPlan transform (NFC). (#143879 ) Connect SCEV and memory runtime check block directly in VPlan as VPIRBasicBlocks, removing ILV::emitSCEVChecks and ILV::emitMemRuntimeChecks. The new logic is currently split across LoopVectorizationPlanner::addRuntimeChecks which collects a list of {Condition, CheckBlock} pairs and performs some checks and emits remarks if needed. The list of checks is then added to VPlan in VPlanTransforms::connectCheckBlocks. PR: https://github.com/llvm/llvm-project/pull/143879	2025-07-09 14:03:25 +02:00
Ramkumar Ramachandra	f1e1b48023	[LV] Strip redundant fn in VPBuilder (NFC) (#147499 )	2025-07-08 13:41:29 +01:00
Ramkumar Ramachandra	b334ffd4f4	[VPlan] Refine return types in VPBuilder (NFC) (#108858 )	2025-06-20 14:01:15 +01:00
Philip Reames	53ea522d1b	[LV] Introduce and use VPBuilder::createScalarZExtOrTrunc [nfc] (#144946 ) Reduce redundant code, make the flow slightly easier to read.	2025-06-19 14:12:14 -07:00
Stephen Tozer	aa8a1fa6f5	[DLCov][NFC] Annotate intentionally-blank DebugLocs in existing code (#136192 ) Following the work in PR #107279, this patch applies the annotative DebugLocs, which indicate that a particular instruction is intentionally missing a location for a given reason, to existing sites in the compiler where their conditions apply. This is NFC in ordinary LLVM builds (each function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the instruction in coverage-tracking builds so that it will be ignored by Debugify, allowing only real errors to be reported. From a developer standpoint, it also communicates the intentionality and reason for a missing DebugLoc. Some notes for reviewers: - The difference between `I->dropLocation()` and `I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide to keep some debug info alive, while the latter will always be empty; in this patch, I always used the latter (even if the former could technically be correct), because the former could result in some (barely) different output, and I'd prefer to keep this patch purely NFC. - I've generally documented the uses of `DebugLoc::getUnknown()`, with the exception of the vectorizers - in summary, they are a huge cause of dropped source locations, and I don't have the time or the domain knowledge currently to solve that, so I've plastered it all over them as a form of "fixme".	2025-06-11 17:42:10 +01:00
Florian Hahn	567b3172da	[VPlan] Construct initial once and pass clones to tryToBuildVPlan (NFC). (#141363 ) Update to only build an initial, plain-CFG VPlan once, and then transform & optimize clones. This requires changes to ::clone() for VPInstruction and VPWidenPHIRecipe to allow for proper cloning of the recipes in the initial VPlan. PR: https://github.com/llvm/llvm-project/pull/141363	2025-05-26 13:42:47 +01:00
Florian Hahn	c0506a11f4	[VPlan] Separate out logic to manage IR flags to VPIRFlags (NFC). (#140621 ) This patch moves the logic to manage IR flags to a separate VPIRFlags class. For now, VPRecipeWithIRFlags is the only class that inherits VPIRFlags. The new class allows for simpler passing of flags when constructing recipes, simplifying the constructors for various recipes (VPInstruction in particular, which now just has 2 constructors, one taking an extra VPIRFlags argument. This mirrors the approach taken for VPIRMetadata and makes it easier to extend in the future. The patch also adds a unified flagsValidForOpcode to check if the flags in a VPIRFlags match the provided opcode. PR: https://github.com/llvm/llvm-project/pull/140621	2025-05-25 11:13:11 +01:00
Florian Hahn	f2e62cfca5	[VPlan] Add VPPhi subclass for VPInstruction with PHI opcodes.(NFC) (#139151 ) Similarly to VPInstructionWithType and VPIRPhi, add VPPhi as a subclass for VPInstruction. This allows implementing the VPPhiAccessors trait, making available helpers for generic printing of incoming values / blocks and accessors for incoming blocks and values. It will also allow properly verifying def-uses for values used by VPInstructions with PHI opcodes via https://github.com/llvm/llvm-project/pull/124838. PR: https://github.com/llvm/llvm-project/pull/139151	2025-05-10 11:08:00 +01:00
Florian Hahn	e854c381c6	[VPlan] Manage noalias/alias_scope metadata in VPlan. (#136450 ) Use VPIRMetadata added in https://github.com/llvm/llvm-project/pull/135272 to also manage no-alias metadata added by versioning. Note that this means we have to build the no-alias metadata up-front once. If it is not used, it will be discarded automatically. This also fixes a case where incorrect metadata was added to wide loads/stores that got converted from an interleave group. Compile-time impact is neutral: https://llvm-compile-time-tracker.com/compare.php?from=38bf1af41c5425a552a53feb13c71d82873f1c18&to=2fd7844cfdf5ec0f1c2ce0b9b3ae0763245b6922&stat=instructions:u	2025-05-09 11:19:12 +01:00
Florian Hahn	7f4e36ebf6	[VPlan] Create PHI VPInstruction using VPBuilder (NFC). Use builder to create scalar PHI VPInstructions.	2025-05-07 20:47:37 +01:00
Luke Lau	b0f2bfc7e4	[VPlan] Use correct non-FMF constructor in VPInstructionWithType createNaryOp (#137632 ) Currently if we try to create a VPInstructionWithType without a FMF via VPBuilder::createNaryOp we will use the constructor that asserts `assert(isFPMathOp() && "this op can't take fast-math flags");`. This fixes it by checking if FMFs have a value, similar to the other createNaryOp overloads. This is needed by #129508	2025-04-29 20:35:19 +08:00
Florian Hahn	54b33eba16	[VPlan] Add opcode to create step for wide inductions. (#119284 ) This patch adds a WideIVStep opcode that can be used to create a vector with the steps to increment a wide induction. The opcode has 2 operands * the vector step * the scale of the vector step The opcode is later converted into a sequence of recipes that convert the scale and step to the target type, if needed, and then multiply vector step by scale. This simplifies code that needs to materialize step vectors, e.g. replacing wide IVs as follow up to https://github.com/llvm/llvm-project/pull/108378 with an increment of the wide IV step. PR: https://github.com/llvm/llvm-project/pull/119284	2025-04-14 23:20:44 +02:00
Florian Hahn	e27a21f6a7	[VPlan] Add hasScalarTail, use instead of !CM.foldTailByMasking() (NFC). (#134674 ) Now that VPlan is able to fold away redundant branches to the scalar preheader, we can directly check in VPlan if the scalar tail may execute. hasScalarTail returns true if the tail may execute. We know that the scalar tail won't execute if the scalar preheader doesn't have any predecessors, i.e. is not reachable. This removes some late uses of the legacy cost model. PR: https://github.com/llvm/llvm-project/pull/134674	2025-04-11 12:50:59 +01:00
Florian Hahn	6a9e8fc50c	[VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast(NFC) (#129706 ) There are some opcodes that currently require specialized recipes, due to their result type not being implied by their operands, including casts. This leads to duplication from defining multiple full recipes. This patch introduces a new VPInstructionWithType subclass that also stores the result type. The general idea is to have opcodes needing to specify a result type to use this general recipe. The current patch replaces VPScalarCastRecipe with VInstructionWithType, a similar patch for VPWidenCastRecipe will follow soon. There are a few proposed opcodes that should also benefit, without the need of workarounds: * https://github.com/llvm/llvm-project/pull/129508 * https://github.com/llvm/llvm-project/pull/119284 PR: https://github.com/llvm/llvm-project/pull/129706	2025-04-10 22:30:40 +01:00
Florian Hahn	3a859b11e3	[VPlan] Set and use debug location for VPScalarIVStepsRecipe. This adds missing debug location for VPscalarIVStepsRecipe. The location of the corresponding phi is used.	2025-04-04 21:14:36 +01:00
Florian Hahn	783a846507	[VPlan] Add VF as operand to VPScalarIVStepsRecipe. Similarly to other recipes, update VPScalarIVStepsRecipe to also take the runtime VF as argument. This removes some unnecessary runtime VF computations for scalable vectors. It will also allow dropping the UF == 1 restriction for narrowing interleave groups required in 577631f0a528.	2025-03-28 21:48:59 +00:00
Ramkumar Ramachandra	e8d882a95b	[LV] Audit and fix nits in cl::opts (NFC) (#130601 ) Non-static cl::opts should be under the llvm namespace.	2025-03-25 10:19:45 +00:00
Florian Hahn	0d3ba087f7	[LV] Move IV bypass value creation out of ILV (NFC) createInductionAdditionalBypassValues is only used for epilogue vectorization now. Move it out of ILV, which means we do not have to thread through ExpandedSCEVs and also don't have to track the bypass values in ILV. Instead, directly create them if needed after executing the epilogue plan. This moves more the epilogue specific logic out of the generic executePlan.	2025-03-22 20:36:45 +00:00
Florian Hahn	2e13ec561c	[VPlan] Bail out on non-intrinsic calls in VPlanNativePath. Update initial VPlan-construction in VPlanNativePath in line with the inner loop path, in that it bails out when encountering constructs it cannot handle, like non-intrinsic calls. Fixes https://github.com/llvm/llvm-project/issues/131071.	2025-03-19 21:35:15 +00:00
Florian Hahn	f8fa93193b	[LV] Add VPBuilder::insert, use to insert created vector pointer (NFC). Split off from https://github.com/llvm/llvm-project/pull/124432 as suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.	2025-02-03 22:20:40 +00:00
Florian Hahn	5008277322	[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104 ) Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (https://github.com/llvm/llvm-project/pull/117138). PR: https://github.com/llvm/llvm-project/pull/124104	2025-02-02 13:44:07 +00:00
Florian Hahn	f4230b4332	[VPlan] Add and use debug location for VPScalarCastRecipe. Update the recipe it always take a debug location and set it.	2025-01-05 20:08:51 +00:00
Florian Hahn	7f3428d3ed	[VPlan] Compute induction end values in VPlan. (#112145 ) Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on https://github.com/llvm/llvm-project/pull/110004 and https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/112145	2024-12-29 19:05:08 +00:00
Nikita Popov	1157187496	[VPlan] Propagate all GEP flags (#119899 ) Store GEPNoWrapFlags instead of only InBounds and propagate them.	2024-12-17 13:48:50 +01:00
Florian Hahn	590f451b60	[VPlan] Allow setting IR name for VPDerivedIVRecipe (NFCI). Allow setting the name to use for the generated IR value of the derived IV in preparations for https://github.com/llvm/llvm-project/pull/112145. This is analogous to VPInstruction::Name.	2024-11-24 20:39:12 +00:00
Julian Nagele	a8538b9138	[LV] Vectorize Epilogues for loops with small VF but high IC (#108190 ) - Consider MainLoopVF * IC when determining whether Epilogue Vectorization is profitable - Allow the same VF for the Epilogue as for the main loop - Use an upper bound for the trip count of the Epilogue when choosing the Epilogue VF PR: https://github.com/llvm/llvm-project/pull/108190 --------- Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-11-17 19:35:32 +00:00
Florian Hahn	b021464d35	[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975 ) Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975	2024-10-31 21:36:44 +01:00
Florian Hahn	9648271a3c	[LV] Pass flag indicating epilogue is vectorized to executePlan (NFC) This clarifies the flag, which is now only passed if the epilogue loop is being vectorized.	2024-10-25 20:39:47 +02:00
Florian Hahn	7f74651837	[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431 ) Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.	2024-10-06 22:53:13 +01:00
Florian Hahn	1edd22030c	[LV] Retrieve reduction resume values directly for epilogue vec. (NFC) Use the reduction resume values from the phis in the scalar header, instead of collecting them in a map. This removes some complexity from the general executePlan code paths and pushes it to only the epilogue vectorization part.	2024-09-29 10:37:58 +01:00
Florian Hahn	c95583f15f	[VPlan] Add createPtrAdd helper (NFC). Preparation for https://github.com/llvm/llvm-project/pull/106431.	2024-09-24 19:55:35 +01:00
Florian Hahn	8ec406757c	[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842 ) This patch implements explicit unrolling by UF as VPlan transform. In follow up patches this will allow simplifying VPTransform state (no need to store unrolled parts) as well as recipe execution (no need to generate code for multiple parts in an each recipe). It also allows for more general optimziations (e.g. avoid generating code for recipes that are uniform-across parts). It also unifies the logic dealing with unrolled parts in a single place, rather than spreading it out across multiple places (e.g. VPlan post processing for header-phi recipes previously.) In the initial implementation, a number of recipes still take the unrolled part as additional, optional argument, if their execution depends on the unrolled part. The computation for start/step values for scalable inductions changed slightly. Previously the step would be computed as scalar and then splatted, now vscale gets splatted and multiplied by the step in a vector mul. This has been split off https://github.com/llvm/llvm-project/pull/94339 which also includes changes to simplify VPTransfomState and recipes' ::execute. The current version mostly leaves existing ::execute untouched and instead sets VPTransfomState::UF to 1. A follow-up patch will clean up all references to VPTransformState::UF. Another follow-up patch will simplify VPTransformState to only store a single vector value per VPValue. PR: https://github.com/llvm/llvm-project/pull/95842	2024-09-21 19:47:37 +01:00
Florian Hahn	c3fda44147	[VPlan] Use VPBuilder to create scalar IV steps and derived IV (NFCI). Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe and VPScalarIVStepsRecipe. Use them to simplify the code to create scalar IV steps slightly.	2024-09-13 22:19:36 +01:00
Ramkumar Ramachandra	75a57edadc	VPlan/Builder: inline VPBuilder::createICmp (NFC) (#105650 ) Inline VPBuilder::createICmp in the header, in line with the other VPBuilder functions.	2024-09-13 20:08:11 +01:00
Florian Hahn	08d294df55	[VPlan] Simplify VPBuilder insert point when adding users in exit block. Simplifies setting the insert point, addressing a TODO.	2024-09-12 22:47:03 +01:00
Florian Hahn	e454d31037	[VPlan] Factor out precomputing costs from LVP::cost (NFC). Move the logic for pre-computing costs of certain instructions to a separate helper function, allowing re-use in a follow-up patch.	2024-08-22 20:40:38 +01:00
Florian Hahn	4e04286d61	[VPlan] Only use selectVectorizationFactor for cross-check (NFCI). (#103033 ) Use getBestVF to select VF up-front and only use selectVectorizationFactor to get the VF legacy VF to check the vectorization decision matches the VPlan-based cost model. PR: https://github.com/llvm/llvm-project/pull/103033	2024-08-21 13:09:01 +02:00
Florian Hahn	f2fcd9cb97	[VPlan] Rename getBestPlanFor -> getPlanFor (NFC). As suggested in https://github.com/llvm/llvm-project/pull/103033, more accurately rename to getPlanFor , as it simplify returns the VPlan for VF, relying on the fact that there is a single VPlan for each VF at the moment.	2024-08-19 13:05:19 +01:00
Florian Hahn	740f055451	[VPlan] Rename getBestVF -> computeBestVF (NFC). As suggested in https://github.com/llvm/llvm-project/pull/103033, more accurately rename to computeBestVF, as it now does not simply return the best VF, but directly computes it.	2024-08-19 10:44:50 +01:00
Florian Hahn	7024cecf03	[LV] Collect profitable VFs in ::getBestVF. (NFCI) Move collectig profitable VFs to ::getBestVF, in preparation for retiring selectVectorizationFactor.	2024-08-11 14:45:54 +01:00
Florian Hahn	66ce4f771e	[VPlan] Port invalid cost remarks to VPlan. (#99322 ) This patch moves the logic to create remarks for instructions with invalid costs to work on recipes and decoupling it from selectVectorizationFactor. This is needed to replace the remaining uses of selectVectorizationFactor with getBestPlan using the VPlan-based cost model. The current implementation iterates over all VPlans and their recipes again, to find recipes with invalid costs, which is more work but will only be done when remarks for LV are enabled. Once the remaining uses of selectVectorizationFactor are retired, we can collect VPlans with invalid costs as part of getBestPlan if we want to optimize the remarks case a bit, at the cost of adding additional complexity. PR: https://github.com/llvm/llvm-project/pull/99322	2024-07-27 12:52:12 +01:00

1 2 3 4

165 Commits