llvm-project

Author	SHA1	Message	Date
Luke Lau	6e723d2de8	[VPlan] Remove loop region in simplifyBranchConditionForVFAndUF with EVL PHI (#150016 ) Previously we fell back to just simplifying the branch cond to true since one of the phis was a VPEVLBasedIVPHIRecipe. However this should be fine to replace with its start value.	2025-07-22 22:30:34 +08:00
Florian Hahn	3fd53db858	[VPlan] Remove unneeded VPVectorPointer after narrowing to replicate. The replicate recipes created when narrowing interleave groups don't need a VPVectorPointer, they can simply use the existing pointer.	2025-07-19 20:18:04 +01:00
Florian Hahn	89193640f4	[VPlan] Remove unused argument from canNarrowLoad (NFC). The WideMember argument is unused, remove it.	2025-07-11 21:10:58 +01:00
Florian Hahn	c452de1715	Reapply "[VPlan] Allow derived IVs and scalar-steps in narrowing interleave." This reverts commit f5ed863176dd286462cd5558723dfe445967fedf. Recommit patch now that the crash exposed by the change has been fixed.	2025-07-10 20:48:19 +01:00
Florian Hahn	253f8b6873	[VPlan] Support single-scalar VPReplicateRecipes when narrowing IGs. When narrowing interleave groups, we can treat single scalar VPReplicateRecipes as already narrowed.	2025-07-09 21:30:44 +01:00
Luke Lau	2e5776130b	[VPlan] Simplify select !c, x, y -> select c, y, x (#147268 ) This is split off from #133993 On its own this simplification isn't that useful, but it allows us to make the equivalent VPBlendRecipe optimisation more generic by operating on VPInstructions. In order to actually test this without #133993, I've had to also extend the m_Not pattern matcher to also catch VPWidenRecipes, since I couldn't really think of a straightforward way to create a VPInstruction::Select with a negated condition.	2025-07-08 15:56:04 +08:00
Florian Hahn	6a9a16da7a	[VPlan] Replace RdxDesc with RecurKind in VPReductionPHIRecipe (NFC). (#142322 ) Replace RdxDesc with RecurKind in VPReductionPHIRecipe, as all VPlan analyses and codegen only require the recurrence kind. This enables creating new VPReductionPHIRecipe directly in LV, without needing to construct a whole RecurrenceDescriptor object. Depends on https://github.com/llvm/llvm-project/pull/141860 https://github.com/llvm/llvm-project/pull/141932 https://github.com/llvm/llvm-project/pull/142290 https://github.com/llvm/llvm-project/pull/142291 PR: https://github.com/llvm/llvm-project/pull/142322	2025-07-06 21:40:42 +01:00
Florian Hahn	1f3f9874b0	[VPlan] Fix crash when narrowing interleave-groups with reuse. If a wide load is used multiple times in an expression, it will be narrowed the first time. Re-use the already narrowed op in that case to fix crash.	2025-07-04 21:32:24 +01:00
Florian Hahn	6efa3dfb7b	[VPlan] Handle interleave groups with trivially narrow operands. If all operands to an interleave group are already trivially narrow, narrow the interleave group itself as well.	2025-07-03 21:02:10 +01:00
Luke Lau	61a0653cc6	[VPlan] Fix first-order splices without header mask not using EVL (#146672 ) This fixes a buildbot failure with EVL tail folding after #144666: https://lab.llvm.org/buildbot/#/builders/132/builds/1653 For a first-order recurrence to be correct with EVL tail folding we need to convert splices to vp splices with the EVL operand. Originally we did this by looking for users of the header mask and its users, and converting it in createEVLRecipe. However after #144666 a FOR splice might not actually use the header mask if it's based off e.g. an induction variable, and so we wouldn't pick it up in createEVLRecipe. This fixes this by converting FOR splices separately in a loop over all recipes in the plan, regardless of whether or not it uses the header mask. I think there was some conflation in createEVLRecipe between what was an optimisation and what was needed for correctness. Most of the transforms in it just exist to optimize the mask away and we should still emit correct code without them. So I've renamed it to make the separation clearer.	2025-07-03 16:55:00 +01:00
Luke Lau	ec25a0568c	[VPlan] Don't convert VPWidenSelectRecipes to vp.select in EVL transform (#146695 ) createEVLRecipe tries to optimise recipes that use the header mask by replacing them with their VP equivalents and setting the EVL, allowing the mask to be removed. However we currently also convert widened selects to vp.select even though they don't necessarily use the header mask. Unlike vp.merge a vp.select only makes the "unused" lanes past EVL poison, so it's not needed for correctness. In the same vein as #127180, this patch removes the transform for VPWidenSelectRecipes and keeps them as plain select instructions to allow for more optimisations. RISCVVLOptimizer will still be able to optimise away any VL toggles and we end up with better code generation across llvm-test-suite and SPEC CPU 2017.	2025-07-03 11:50:25 +01:00
Mel Chen	bc8dad1c7e	[VPlan] Emit VPVectorEndPointerRecipe for reverse interleave pointer adjustment (#144864 ) A reverse interleave access is essentially composed of multiple load/store operations with the same negative stride, and their addresses are based on the last lane address of member 0 in the interleaved group. Currently, we already have VPVectorEndPointerRecipe for computing the last lane address of consecutive reverse memory accesses. This patch extends VPVectorEndPointerRecipe to support constant stride and extracts the reverse interleave group address adjustment from VPInterleaveRecipe::execute, replacing it with a VPVectorEndPointerRecipe. The final goal is to support interleaved accesses with EVL tail folding. Given that VPInterleaveRecipe is large and tightly coupled (combining both load and store, and embedding operations like reverse pointer adjustment (GEP), widen load/store, deinterleave/interleave, and reversal), breaking it down into smaller, dedicated recipes may allow VPlanTransforms::tryAddExplicitVectorLength to lower them into EVL-aware form more effectively. One foreseeable challenge is that VPlanTransforms::convertToConcreteRecipes currently runs after tryAddExplicitVectorLength, so decomposing VPInterleaveRecipe will likely need to happen earlier in the pipeline to be effective.	2025-07-02 18:16:02 +08:00
Florian Hahn	6b3d2b629c	[VPlan] Add VPExpressionRecipe, replacing extended reduction recipes. (#144281 ) This patch adds a new recipe to combine multiple recipes into an 'expression' recipe, which should be considered as single entity for cost-modeling and transforms. The recipe needs to be 'decomposed', i.e. replaced by its individual recipes before execute. This subsumes VPExtendedReductionRecipe and VPMulAccumulateReductionRecipe and should make it easier to extend to include more types of bundled patterns, like e.g. extends folded into loads or various arithmetic instructions, if supported by the target. It allows avoiding re-creating the original recipes when converting to concrete recipes, together with removing the need to record various information. The current version of the patch still retains the original printing matching VPExtendedReductionRecipe and VPMulAccumulateReductionRecipe, but this specialized print could be replaced with printing the bundled recipes directly. PR: https://github.com/llvm/llvm-project/pull/144281	2025-07-01 20:44:50 +01:00
David Sherwood	9b13dfdfbc	[LV] Use vscale for tuning to improve branch weight estimates (#144733 ) In addBranchWeightToMiddleTerminator we attempt to add branch weights to the middle block terminator. We pessimistically assume vscale=1, whereas we can improve the estimate by using the value of vscale used for tuning.	2025-07-01 13:23:38 +01:00
Luke Lau	4a2fa0847f	[VPlan] Support VPWidenIntOrFpInductionRecipes with EVL tail folding (#144666 ) Following on from #118638, this handles widened induction variables with EVL tail folding by setting the VF operand to be EVL, calculated in the vector body. We need to do this for correctness since with EVL tail folding the number of elements processed in the penultimate iteration may not be VF, but the runtime EVL, and we need to take this into account when updating the backedge value. - Because the VF may now not be a live-in we need to move the insertion point to just after the VF's definition - We also need to avoid truncating it when it's the same size as the step type, previously this wasn't a problem for live-ins. - Also because the VF may be smaller than the IV type, since the EVL is always i32, we may need to zext it. On -march=rva23u64 -O3 we get 87.1% more loops vectorized on TSVC, and 42.8% more loops vectorized on SPEC CPU 2017	2025-07-01 12:29:24 +01:00
Luke Lau	f01a7936be	[VPlan] Replace all uses of VF when EVL tail folding. NFCI (#146339 ) With EVL tail folding, any use of the VF live in should be replaced by the EVL. Otherwise, it should likely be directly emitted as a constant via VPTransformState::VF. This strengthens the EVL transformation by replacing all uses of VF with EVL and asserting that the only users are VPVectorEndPointerRecipe and VPScalarIVStepsRecipe, the latter of which is new. This should be NFC because even though we didn't previously replace the EVL of VPScalarIVStepsRecipe, it's only used when unrolling which we don't allow with EVL tail folding yet.	2025-06-30 13:47:38 +01:00
Florian Hahn	b822a32659	[VPlan] Fix crash when trying to narrow interleave group storing const. Use dyn_cast_null to handle the case where an interleave group stores a constant in any of its lanes.	2025-06-29 21:29:12 +01:00
Florian Hahn	f5ed863176	Revert "[VPlan] Allow derived IVs and scalar-steps in narrowing interleave." This reverts commit 2787759ef2e41b19f8bfde06fe9a26b25d1f5834. This exposed a crash on some build bots. Revert to investigate.	2025-06-29 14:40:03 +01:00
Florian Hahn	2787759ef2	[VPlan] Allow derived IVs and scalar-steps in narrowing interleave. Both VPDerivedIVRecipe and VPScalarIVSteps recipe should be supported in narrowInterleaveGroups: * VPDerivedIVRecipe is based on the canonical IV and independent of VF, * VPScalarIVSteps takes the VF as operand, so it will be updated by narrowInterleaveGroup.	2025-06-29 13:18:51 +01:00
Florian Hahn	bdb299a67e	[VPlan] Simplify code in single scalar transform code (NFC). Adjust code as suggested post-commit 3b7b95f78e2. `3b7b95f78e (r160997427)`	2025-06-28 22:53:14 +01:00
Florian Hahn	3b7b95f78e	[VPlan] Support VPWidenSelectRecipe in narrowToSingleScalar. VPWidenSelectRecipes are single scalars if all their operands are. Add support for narrowing them to a single scalar VPReplicateRecipe. This fixes a crash after https://github.com/llvm/llvm-project/pull/142433 (aa24029319083) when due to a replicate recipe not being converted to single-scalar being hoisted to the vector preheader.	2025-06-27 15:42:42 +01:00
Florian Hahn	aa24029319	[VPlan] Unroll VPReplicateRecipe by VF. (#142433 ) Explicitly unroll VPReplicateRecipes outside replicate regions by VF, replacing them by VF single-scalar recipes. Extracts for operands are added as needed and the scalar results are combined to a vector using a new BuildVector VPInstruction. It also adds a few folds to simplify unnecessary extracts/BuildVectors. It also adds a BuildStructVector opcode for handling of calls that have struct return types. VPReplicateRecipes in replicate regions will be unrolled as a follow-up, turning non-single-scalar VPReplicateRecipes into 'abstract', i.e. not executable. PR: https://github.com/llvm/llvm-project/pull/142433	2025-06-26 11:19:09 +01:00
Florian Hahn	58b939abe5	[VPlan] Support matching constants in narrowInterleaveGroups. Matching constants can trivially be broadcasted, allow them if the same constant is used for all recipes in a bundle.	2025-06-22 08:45:40 +01:00
Florian Hahn	60d1276b0e	[VPlan] Pass operand index to canNarrowLoad. (NFC) Explicitly pass the operand we are checking to canNarrowLoad. This simplifies the check if the operands match across recipes and enables future optimizations.	2025-06-21 15:41:26 +01:00
Florian Hahn	f8ffb4e7cd	[VPlan] Simplify ExtractLastElement(Broadcast(A)) -> A. Remove trivial ExtractLastElement VPInstructions.	2025-06-20 21:08:14 +01:00
Luke Lau	521adc9fa2	[VPlan] Use createScalarZExtOrTrunc when expanding expandVPWidenIntOrFpInduction Split off from #144666	2025-06-20 19:18:49 +01:00
Florian Hahn	e8be733a3c	[VPlan] Remove redundant ExtractLastElement from vector-to-scalar VPI. Recipes that are vector-to-scalar are guaranteed to generate a scalar value, so the extract is redundant after VPlan unrolling. Remove it. This removes unneeded ExtractLastElement VPInstruction of reduction result computations.	2025-06-20 12:45:20 +01:00
Philip Reames	53ea522d1b	[LV] Introduce and use VPBuilder::createScalarZExtOrTrunc [nfc] (#144946 ) Reduce redundant code, make the flow slightly easier to read.	2025-06-19 14:12:14 -07:00
Florian Hahn	23b8f11b27	[VPlan] Remove redundant VPWidenRecipe constructors (NFC) Since the removal of VPWidenEVLRecipe, the constructors taking a VPDefOpcode are not needed any more. Remove them.	2025-06-18 20:59:16 +01:00
Luke Lau	9dd1c66e8f	[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes (#118638 ) The motivation of this PR is to make #115274 easier to implement, and should allow us to add EVL support by just passing EVL to the VF operand. The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body. The gist in this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example: ``` vector.ph: Successor(s): vector loop <x1> vector loop: { vector.body: WIDEN-INDUCTION %i = phi %start, %step, %vf ... EMIT branch-on-count ... No successors } ``` gets expanded to: ``` vector.ph: ... vp<%induction.start> = ... vp<%induction.increment> = ... Successor(s): vector loop <x1> vector loop: { vector.body: ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next> ... vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment> EMIT branch-on-count ... No successors } ``` This allows us to use a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves. After this #115274 should just become a matter of setting the VF operand to EVL (and building the increment step in the loop body, not the preheader).	2025-06-17 18:24:07 +01:00
Florian Hahn	30b16ec341	[VPlan] Simplify trivial VPFirstOrderRecurrencePHI recipes. VPFirstOrderRecurrencePHIRecipes where the incoming values are the same can be simplified and removed. Fixes https://github.com/llvm/llvm-project/issues/144212. The new test is added together with other related tests from first-order-recurrence.ll	2025-06-16 22:54:26 +01:00
Florian Hahn	577199f922	Reapply "[VPlan] Set branch weight metadata on middle term in VPlan (NFC) (#143035 )" This reverts commit 0604dc199c019b23746f4a54885ba0c75569cdae. The recommitted version addresses post-commit comments and adjusts the place the branch weights are added. It now runs before VPlans are optimized for VF and UF, which may remove the vector loop region, causing a crash trying to get the middle block after that. Test case added in 72f99b75afc12bb. Original message: Manage branch weights for the BranchOnCond in the middle block in VPlan. This requires updating VPInstruction to inherit from VPIRMetadata, which in general makes sense as there are a number of opcodes that could take metadata. There are other branches (part of the skeleton) that also need branch weights adding. PR: https://github.com/llvm/llvm-project/pull/143035	2025-06-14 17:20:46 +01:00
Florian Hahn	6108d50aed	[VPlan] Add ReductionStartVector VPInstruction. (#142290 ) Add a new VPInstruction::ReductionStartVector opcode to create the start values for wide reductions. This more accurately models the start value creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the line it also allows removing VPReductionPHIRecipe::RdxDesc. PR: https://github.com/llvm/llvm-project/pull/142290	2025-06-09 20:59:12 +01:00
Florian Hahn	2eab83f618	[VPlan] Remove CanonicalIV when dissolving loop regions (NFC). (#142372 ) Directly replace the canonical IV when we dissolve the containing region. That ensures that it won't get removed before the region gets removed, which would result in an invalid region. This removes the current ordering constraint between convertToConcreteRecipes and dissolving regions. PR: https://github.com/llvm/llvm-project/pull/142372	2025-06-03 10:05:28 +01:00
Ramkumar Ramachandra	b8c4eea3d8	[VPlan] Simplify PredPHI LiveIn -> LiveIn (#142271 ) 5f39be5 ([VPlan] Use InstSimplifyFolder instead of TargetFolder) updated simplifyRecipe to fold live-ins to Values that are not necessarily Constant, but forgot to update the corresponding PredPHI folder, which still folds PredPHI constant -> constant. Update it to fold PredPHI LiveIn -> LiveIn. Fixes #141968.	2025-06-02 14:56:35 +01:00
Florian Hahn	3b474bc510	[VPlan] Use VPSingleDef in simplifyRecipe (NFC). All simplifications are applied to VPSingleDefRecipes. Check for them early to skip unnecessary work and remove a number of getVPSingleValue calls.	2025-06-01 15:32:02 +01:00
Florian Hahn	33bbce5e34	[VPlan] Get plan once in simplifyRecipe (NFC). Also check once if the plan is unrolled at the end, to make it easier to add more transforms that apply after unrolling.	2025-06-01 12:46:08 +01:00
Kazu Hirata	c0bf51e3ad	[Vectorize] Fix a warning This patch fixes: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1865:17: error: unused variable 'Preds' [-Werror,-Wunused-variable]	2025-05-31 14:40:19 -07:00
Florian Hahn	0f00a96fed	[VPlan] Simplify branch on False in VPlan transform (NFC). (#140409 ) Simplify branch on false, starting with the branch from the middle block to the scalar preheader. Initially this helps simplifying the initial VPlan construction. Depends on https://github.com/llvm/llvm-project/pull/140405. PR: https://github.com/llvm/llvm-project/pull/140409	2025-05-31 20:32:45 +01:00
Florian Hahn	78eafb14f7	[VPlan] Add getIndexFor(Predecessor|Successor) helpers (NFC). Move code to get the index of a predecessor and successor to helpers in VPBlockBase, to avoid duplication and enable future reuse. Split off from https://github.com/llvm/llvm-project/pull/140409.	2025-05-31 12:53:05 +01:00
Florian Hahn	10bd4cd9cd	[VPlan] Remove ResumePhi opcode, use regular PHI instead (NFC). (#140405 ) Use regular VPPhi instead of a separate opcode for resume phis. This removes an unneeded specialized opcode and unifies the code (verification, printing, updating when CFG is changed). Depends on https://github.com/llvm/llvm-project/pull/140132. PR: https://github.com/llvm/llvm-project/pull/140405	2025-05-30 12:50:08 +01:00
Ramkumar Ramachandra	5f39be5917	[VPlan] Use InstSimplifyFolder instead of TargetFolder (#141222 ) For more powerful folding with operands that are not necessarily all-constant, use InstSimplifyFolder instead of TargetFolder in tryToConstantFold, and rename the function tryToFoldLiveIns.	2025-05-28 11:00:14 +02:00
Florian Hahn	d56deea1e4	[VPlan] Connect Entry to scalar preheader during initial construction. (#140132 ) Update initial construction to connect the Plan's entry to the scalar preheader during initial construction. This moves a small part of the skeleton creation out of ILV and will also enable replacing VPInstruction::ResumePhi with regular VPPhi recipes. Resume phis need 2 incoming values to start with, the second being the bypass value from the scalar ph (and used to replicate the incoming value for other bypass blocks). Adding the extra edge ensures the incoming values for resume phis match the incoming blocks. PR: https://github.com/llvm/llvm-project/pull/140132	2025-05-27 16:07:56 +01:00
Kazu Hirata	89308de4b0	[llvm] Value-initialize values with *Map::try_emplace (NFC) (#141522 ) try_emplace value-initializes values, so we do not need to pass nullptr to try_emplace when the value types are raw pointers or std::unique_ptr<T>.	2025-05-26 15:13:02 -07:00
Florian Hahn	c0506a11f4	[VPlan] Separate out logic to manage IR flags to VPIRFlags (NFC). (#140621 ) This patch moves the logic to manage IR flags to a separate VPIRFlags class. For now, VPRecipeWithIRFlags is the only class that inherits VPIRFlags. The new class allows for simpler passing of flags when constructing recipes, simplifying the constructors for various recipes (VPInstruction in particular, which now just has 2 constructors, one taking an extra VPIRFlags argument). This mirrors the approach taken for VPIRMetadata and makes it easier to extend in the future. The patch also adds a unified flagsValidForOpcode to check if the flags in a VPIRFlags match the provided opcode. PR: https://github.com/llvm/llvm-project/pull/140621	2025-05-25 11:13:11 +01:00
Florian Hahn	dcef154b5c	[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506 ) Building on top of https://github.com/llvm/llvm-project/pull/114305, replace VPRegionBlocks with explicit CFG before executing. This brings the final VPlan closer to the IR that is generated and helps to simplify codegen. It will also enable further simplifications of phi handling during execution and transformations that do not have to preserve the canonical IV required by loop regions. This for example could include replacing the canonical IV with an EVL based phi while completely removing the original canonical IV. PR: https://github.com/llvm/llvm-project/pull/117506	2025-05-24 19:17:16 +01:00
Florian Hahn	bf15aadcbc	[VPlan] Don't try to narrow predicated VPReplicateRecipe. We cannot convert predicated recipes to uniform ones at the moment. This fixes a crash reported for https://github.com/llvm/llvm-project/pull/139150.	2025-05-21 22:13:55 +01:00
Ramkumar Ramachandra	cf1f116f78	[VPlan] Introduce constant folder in simplifyRecipe (#125365 ) Introduce a VPlan-level constant folder in simplifyRecipe that tries to fold a recipe to a constant using TargetFolder.	2025-05-20 14:16:01 +01:00
Florian Hahn	07c085af3e	[VPlan] Add narrowToSingleScalarRecipe transform. (#139150 ) Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes. There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups. By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis. Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (https://github.com/llvm/llvm-project/pull/125365/) PR: https://github.com/llvm/llvm-project/pull/139150	2025-05-18 09:32:27 +01:00
Florian Hahn	04fde85057	[VPlan] Rename isUniform(AfterVectorization) to isSingleScalar (NFC). (#140134 ) Update the naming in VPReplicateRecipe and vputils to the more accurate isSingleScalar, as the functions check for cases where only a single scalar is needed, either because it produces the same value for all lanes or has only their first lane used. Discussed in https://github.com/llvm/llvm-project/pull/139150. PR: https://github.com/llvm/llvm-project/pull/140134	2025-05-16 16:38:39 +01:00

389 Commits