Any VPlan we generate that contains a replicator region will result in
replicated blocks in the output, causing a large code size increase.
Reject such VPlans when optimizing for size, as the code size impact is
usually worse than having a scalar epilogue, which we already forbid
with optsize.
This change requires a lot of test changes. For tests that specifically
target optsize I've updated the tests with the new output; otherwise the
tests have been adjusted to not rely on optsize.
Fixes #66652
Fixed-order recurrence phis cannot be forced to be scalar; they will
always be widened at the moment.
Make sure we don't add them to ForcedScalars; otherwise the legacy cost
model will compute incorrect costs.
This fixes an assertion reported with
https://github.com/llvm/llvm-project/pull/129645.
This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 2 operands:
* the vector step
* the scale of the vector step
The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply the
vector step by the scale.
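As a rough sketch of that lowering (based purely on the description above;
castToTargetTypeIfNeeded is a hypothetical helper, not an upstream
function):

  // The two operands of the WideIVStep opcode.
  VPValue *Step = WideIVStep->getOperand(0);
  VPValue *Scale = WideIVStep->getOperand(1);
  // Convert both to the target type Ty if needed, then multiply the
  // vector step by the scale.
  VPValue *StepTy = castToTargetTypeIfNeeded(Step, Ty);   // hypothetical
  VPValue *ScaleTy = castToTargetTypeIfNeeded(Scale, Ty); // hypothetical
  auto *Mul = new VPInstruction(Instruction::Mul, {StepTy, ScaleTy});
  WideIVStep->replaceAllUsesWith(Mul);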
This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as a follow-up to
https://github.com/llvm/llvm-project/pull/108378 with an increment of the
wide IV step.
PR: https://github.com/llvm/llvm-project/pull/119284
This PR accounts for scaled reductions in `calculateRegisterUsage` to
reflect the fact that the number of lanes in their output is smaller
than the VF.
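A minimal sketch of the adjustment (getScaledReductionFactor is a
hypothetical helper standing in for however the scale is obtained):

  // A reduction scaled by a factor K produces only VF / K lanes, so its
  // register footprint is computed with the narrower element count.
  ElementCount RegVF = VF;
  if (unsigned Scale = getScaledReductionFactor(R)) // hypothetical
    RegVF = VF.divideCoefficientBy(Scale);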
Depends on https://github.com/llvm/llvm-project/pull/126437
Now that VPlan is able to fold away redundant branches to the scalar
preheader, we can directly check in VPlan if the scalar tail may
execute. hasScalarTail returns true if the tail may execute.
We know that the scalar tail won't execute if the scalar preheader
doesn't have any predecessors, i.e. is not reachable.
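A minimal sketch of that check, assuming the accessor names used here (the
upstream implementation may differ in detail):

  bool VPlan::hasScalarTail() const {
    // The scalar tail can only execute if the scalar preheader is
    // reachable, i.e. has at least one predecessor.
    return !getScalarPreheader()->getPredecessors().empty();
  }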
This removes some late uses of the legacy cost model.
PR: https://github.com/llvm/llvm-project/pull/134674
There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.
This leads to duplication from defining multiple full recipes.
This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is for opcodes that need to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VPInstructionWithType; a similar patch
for VPWidenCastRecipe will follow soon.
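A simplified sketch of the new recipe (not the exact upstream
declaration):

  class VPInstructionWithType : public VPInstruction {
    // The result type, which cannot be inferred from the operands.
    Type *ResultTy;

  public:
    VPInstructionWithType(unsigned Opcode, ArrayRef<VPValue *> Operands,
                          Type *ResultTy, DebugLoc DL)
        : VPInstruction(Opcode, Operands, DL), ResultTy(ResultTy) {}

    Type *getResultType() const { return ResultTy; }
  };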
There are a few proposed opcodes that should also benefit, without the
need for workarounds:
* https://github.com/llvm/llvm-project/pull/129508
* https://github.com/llvm/llvm-project/pull/119284
PR: https://github.com/llvm/llvm-project/pull/129706
Add a version of calculateRegisterUsage that estimates register usage for
a VPlan. This mostly just ports the existing code, with some updates to
figure out which recipes will generate vectors vs scalars.
There are a number of changes in the computed register usages, but they
should be more accurate w.r.t. the generated vector code.
There are the following changes:
* Scalar usage increases in most cases by 1, as we always create a
scalar canonical IV, which is alive across the loop and is not
considered by the legacy implementation
* Output is ordered by insertion; scalar registers are now added first
due to the canonical IV phi.
* Using the VPlan, we now also more precisely know if an induction will
be vectorized or scalarized.
Depends on https://github.com/llvm/llvm-project/pull/126415
PR: https://github.com/llvm/llvm-project/pull/126437
Add a dedicated function to check if a plan is for a loop with an early
exit. This can easily be determined by checking the exit blocks.
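For illustration, such a check could look like this (a sketch; the
accessor names are assumptions rather than the exact VPlan API):

  static bool planHasEarlyExit(const VPlan &Plan) {
    // A plan for a loop without an early exit has a single exit block;
    // any additional exit block implies an (uncountable) early exit.
    return Plan.getExitBlocks().size() > 1;
  }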
This allows removing a use of Legal->hasUncountableEarlyExit() from
InnerLoopVectorizer.
PR: https://github.com/llvm/llvm-project/pull/134720
This patch changes the preferInLoopReduction function to take a
RecurKind instead of an unsigned Opcode.
This makes it possible to distinguish non-arithmetic reductions such as
min/max, AnyOf, and FindLastIV, and also helps unify IAnyOf with FAnyOf
and IFindLastIV with FFindLastIV.
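Roughly, the interface change is (a sketch; any additional parameters are
elided):

  // Before: only reductions describable by an IR opcode.
  bool preferInLoopReduction(unsigned Opcode, Type *Ty) const;
  // After: RecurKind also covers min/max, AnyOf and FindLastIV
  // reductions, and no longer distinguishes integer and FP variants of
  // the same kind.
  bool preferInLoopReduction(RecurKind Kind, Type *Ty) const;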
Related patches: #118393, #131830
This reverts commit 46a2f4174a051f29a09dbc3844df763571c67309.
Recommits 2fd6f8fb5e3a with corresponding VPlan change to ensure
LoopInfo is updated for all blocks during VPlan execution if needed.
Add an initial CFG simplification transform, which removes the dead
edges for blocks terminated with BranchOnCond true.
At the moment, this removes the edge between middle block and scalar
preheader when folding the tail.
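The transform is conceptually as follows (a sketch; isConstantTrue stands
in for however the condition is recognized as always-true):

  // If a block's terminator is BranchOnCond on an always-true condition,
  // the second successor (the false edge) is dead and can be removed.
  auto *Term = dyn_cast_or_null<VPInstruction>(VPBB->getTerminator());
  if (Term && Term->getOpcode() == VPInstruction::BranchOnCond &&
      isConstantTrue(Term->getOperand(0))) { // hypothetical helper
    VPBlockBase *DeadSucc = VPBB->getSuccessors()[1];
    VPBlockUtils::disconnectBlocks(VPBB, DeadSucc);
  }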
PR: https://github.com/llvm/llvm-project/pull/106748
If ScalarPH has predecessors, we may need to update its reduction resume
values. If the middle block branches to the scalar preheader, it must be
the first predecessor. Note that the first predecessor will not be the
middle block if the middle block doesn't branch to the scalar preheader;
in that case, fixReductionScalarResumeWhenVectorizingEpilog will be a
no-op.
In preparation for https://github.com/llvm/llvm-project/pull/106748.
Update both VPInterleaveRecipe and VPReplicateRecipe codegen to use
debug location directly from the recipe, not the underlying instruction.
This removes another dependency on underlying instructions.
In order to facilitate targets that only support masked loads/stores
on certain address spaces (AMDGPU will support them in an upcoming
patch, but only for address space 7), add an AddressSpace parameter
to isLegalMaskedLoad and isLegalMaskedStore
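The updated TTI hooks then look roughly like this (sketch, simplified):

  // The address space of the access lets targets answer per address
  // space, e.g. AMDGPU supporting masked loads/stores only for address
  // space 7.
  bool isLegalMaskedLoad(Type *DataType, Align Alignment,
                         unsigned AddressSpace) const;
  bool isLegalMaskedStore(Type *DataType, Align Alignment,
                          unsigned AddressSpace) const;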
The patch splits the store-load forwarding distance analysis from other
dependency analysis in LAA. Currently it supports only power-of-2
distances; the split is required to support non-power-of-2 distances in
the future.
Part of #100755
Avoid the pattern of always calling collectInstsToScalarize after
collectUniformsAndScalars, and call it in collectUniformsAndScalars
instead. Also strengthen checks for early exits in the function.
Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an
overlay, to provide more convenient checking (via directly doing
isa/dyn_cast/cast) and specialized execute/print implementations.
Both VPIRInstruction and VPIRPhi share the same VPDefID, and are
differentiated by the backing IR instruction.
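The overlay essentially boils down to a classof that inspects the wrapped
IR instruction; a simplified sketch (not the exact upstream declaration):

  struct VPIRPhi : public VPIRInstruction {
    // Shares VPIRInstruction's VPDefID; isa/dyn_cast/cast discriminate
    // purely on the backing IR instruction being a PHINode.
    static inline bool classof(const VPRecipeBase *R) {
      auto *IRI = dyn_cast<VPIRInstruction>(R);
      return IRI && isa<PHINode>(IRI->getInstruction());
    }

    void execute(VPTransformState &State) override;
    void print(raw_ostream &O, const Twine &Indent,
               VPSlotTracker &SlotTracker) const override;
  };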
This pattern could also be used to provide more specialized interfaces
for some VPInstruction opcodes, without introducing new, completely
separate recipes. An example would be modeling VPWidenPHIRecipe &
VPScalarPHIRecipe using VPInstruction opcodes and providing an interface
to retrieve incoming blocks and values through a VPInstruction subclass
similar to VPIRPhi.
PR: https://github.com/llvm/llvm-project/pull/129387
This moves the checks of MinTripCountTailFoldingThreshold later, during the
calculation of whether to tail fold. This allows it to check beforehand whether
tail predication is required, either for scalable or fixed-width vectors.
This option is only specified for AArch64, where it returns a minimum of 5.
This patch aims to allow the vectorization of TC=4 loops, preventing them from
performing slower when SVE is present.
VPReductionRecipes take a RecurrenceDescriptor, but only use the
RecurKind and FastMathFlags in it when executing. This patch makes the
recipe more lightweight by stripping it to only take the latter two.
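Conceptually, the constructor goes from the former to the latter (a sketch
with the remaining parameters abridged):

  // Before: carried a whole RecurrenceDescriptor.
  VPReductionRecipe(const RecurrenceDescriptor &RdxDesc, Instruction *I,
                    VPValue *ChainOp, VPValue *VecOp, VPValue *CondOp,
                    bool IsOrdered);
  // After: only the two pieces actually used when executing.
  VPReductionRecipe(RecurKind Kind, FastMathFlags FMFs, Instruction *I,
                    VPValue *ChainOp, VPValue *VecOp, VPValue *CondOp,
                    bool IsOrdered);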
The motivation for this is to simplify an upcoming patch to support
in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to
create an Or reduction, and by using RecurKind we can create an
arbitrary reduction without needing a full RecurrenceDescriptor.
Instead of executing the whole entry VPIRBB twice, first only execute
the VPExpandSCEVRecipes and replace their uses with the expanded
VPValue, which will be a live-in. This allows removing special logic in
VPExpandSCEVRecipe to support executing twice and allows moving the
ExpandedSCEVs map out of VPTransformState.
It will also allow adding other recipes to the entry VPBB in the future.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch uses insert_range with
iterator ranges. For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
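One illustrative instance of the pattern (the names are generic; each call
site uses its own accessor pair):

  // Before:
  Set.insert(R->op_begin(), R->op_end());
  // After, given that operands() is defined as
  // make_range(op_begin(), op_end()):
  Set.insert_range(R->operands());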
createInductionAdditionalBypassValues is only used for epilogue
vectorization now. Move it out of ILV, which means we do not have to
thread through ExpandedSCEVs and also don't have to track the bypass
values in ILV. Instead, directly create them if needed after executing
the epilogue plan. This moves more of the epilogue-specific logic out of
the generic executePlan.
At the moment if we decide to enable tail-folding we do not include
the cost of generating the mask per VF. This can mean we make some
poor choices of VF, which is definitely true for SVE-enabled AArch64
targets where mask generation for fixed-width vectors is more
expensive than for scalable vectors.
I've added a VPInstruction::computeCost function to return the costs
of the ActiveLaneMask and ExplicitVectorLength operations.
Unfortunately, in order to prevent asserts firing I've also had to
duplicate the same code in the legacy cost model to make sure the
chosen VFs match up. I've wrapped this up in an ifndef NDEBUG for
now. The alternative would be to disable the assert completely when
tail-folding, which I imagine is just as bad.
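The ActiveLaneMask case, for example, is costed roughly like this (a
sketch; the field names of the cost context are assumptions and the
handling of other opcodes is omitted):

  // Inside VPInstruction::computeCost:
  case VPInstruction::ActiveLaneMask: {
    // Cost the llvm.get.active.lane.mask intrinsic producing a <VF x i1>
    // mask from two scalar i64 bounds.
    Type *RetTy = VectorType::get(Type::getInt1Ty(Ctx.LLVMCtx), VF);
    Type *ArgTy = Type::getInt64Ty(Ctx.LLVMCtx);
    IntrinsicCostAttributes ICA(Intrinsic::get_active_lane_mask, RetTy,
                                {ArgTy, ArgTy});
    return Ctx.TTI.getIntrinsicInstrCost(ICA, TTI::TCK_RecipThroughput);
  }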
New tests added:
Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
Transforms/LoopVectorize/RISCV/tail-folding-cost.ll
This patch adds a new narrowInterleaveGroups transform, which tries to
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.
This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.
This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms. For now
it only transforms load interleave groups feeding store groups.
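For illustration, the kind of candidate this targets looks like the
following in source form (the transform itself operates on the VPlan, not
on C code):

  // An interleaved load group feeding an interleaved store group, with
  // each member forwarded unchanged. With a VF matching the group width,
  // the groups can be replaced by one wide load and one wide store per
  // iteration.
  void copy_pairs(int *dst, const int *src, int n) {
    for (int i = 0; i < n; i += 2) {
      dst[i] = src[i];
      dst[i + 1] = src[i + 1];
    }
  }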
Depends on #106431.
This lands the main parts of the approved
https://github.com/llvm/llvm-project/pull/106441 as suggested to break
things up a bit more.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin functions like succ_begin for now.
Update initial VPlan-construction in VPlanNativePath in line with the
inner loop path, in that it bails out when encountering constructs it
cannot handle, like non-intrinsic calls.
Fixes https://github.com/llvm/llvm-project/issues/131071.
calculateRegisterUsage adds end points for each user of an instruction
to Ends and ignores instructions not added to it, i.e. instructions with
no users.
This means things like stores aren't included, which in turn means values
that are only used by stores are also not considered. As a result, we
underestimate the register usage in cases where the only users are things
like stores.
Update the code to not skip instructions without users (i.e. not in Ends)
if they have side-effects.
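The gist of the change, as a sketch (the exact guard may differ; the
side-effect check is illustrative):

  // Previously instructions not in Ends (i.e. without users) were skipped
  // entirely; now keep them if they may have side-effects, e.g. stores,
  // so their operands still count toward register pressure.
  if (!Ends.count(I) && !I->mayHaveSideEffects())
    continue;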
PR: https://github.com/llvm/llvm-project/pull/126415