llvm-project

Author	SHA1	Message	Date
Florian Hahn	8cdab07aaa	Reapply "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9. Recommit with a fix for the verifier error caused for EVL recipes. Extra test coverage added in 6f939da60e.	2025-08-12 22:09:30 +01:00
Florian Hahn	1c7c8e3ad3	Revert "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2. This seems to be breaking some RISCV bots, reverting for now https://lab.llvm.org/buildbot/#/builders/210/builds/1266	2025-08-11 22:05:30 +01:00
Florian Hahn	1f17bb133f	[VPlan] Remove trivial dead VPPhi cycles. Update removeDeadRecipes to remove trivial dead VPPhi cycles. Should effectively be NFC end-to-end.	2025-08-11 21:29:49 +01:00
Florian Hahn	86813aa786	[VPlan] Add dedicated user for resume phi with epilogue vectorization. Epilogue vectorization currently relies on the resume phi for the canonical induction being always available, which is why VPPhi are considered to have side-effects, to prevent their removal. This patch adds a new ResumeForEpilogue opcode to mark the resume phi as used for epilogue vectorization. This allows treating VPPhis in general as not having side-effects, enabling removal of unused VPPhis.	2025-08-10 21:21:16 +01:00
Florian Hahn	82d633e9ff	[VPlan] Materialize vector trip count using VPInstructions. (#151925 ) Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation. PR: https://github.com/llvm/llvm-project/pull/151925	2025-08-08 11:44:32 +01:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Florian Hahn	25d1285eec	[VPlan] Replace single-entry VPPhis with their incoming values. Replace trivial, single-entry VPPhis with their incoming values.	2025-08-06 20:03:31 +01:00
Ramkumar Ramachandra	5dfc2d4535	[LV] Regen some tests with UTC (#152128 )	2025-08-05 18:01:02 +01:00
Luke Lau	94a6cd464e	[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274 ) This is the VPWidenPointerInductionRecipe equivalent of #118638, with the motivation of allowing us to use the EVL as the induction step. There is a new VPInstruction added, WidePtrAdd to allow adding the step vector to the induction phi, since VPInstruction::PtrAdd only handles scalars or multiple scalar lanes. Originally this transformation was copied from the original recipe's execute code, but it's since been simplified by teaching `unrollWidenInductionByUF` to unroll the recipe, which brings it in line with VPWidenIntOrFpInductionRecipe.	2025-08-05 16:54:02 +08:00
Florian Hahn	eee9755881	[LV] Refine check to find epilogue IV resume value. Make sure to check that the vector trip count is contained in the list of incoming values to serve as tie-breaker with phis with all-zero incoming values. Fixes https://github.com/llvm/llvm-project/issues/151686.	2025-08-01 20:54:39 +01:00
Nikita Popov	0a41e7c87e	[LICM] Do not reassociate constant offset GEP (#151492 ) LICM tries to reassociate GEPs in order to hoist an invariant GEP. Currently, it also does this in the case where the GEP has a constant offset. This is usually undesirable. From a back-end perspective, constant GEPs are usually free because they can be folded into addressing modes, so this just increases register pressure. From a middle-end perspective, keeping constant offsets last in the chain makes it easier to analyze the relationship between multiple GEPs on the same base, especially after CSE. The worst that can happen here is if we start with something like ``` loop { p + 4x p + 4x + 1 p + 4x + 2 p + 4x + 3 } ``` And LICM converts it into: ``` p.1 = p + 1 p.2 = p + 2 p.3 = p + 3 loop { p + 4x p.1 + 4x p.2 + 4x p.3 + 4x } ``` Which is much worse than leaving it for CSE to convert to: ``` loop { p2 = p + 4*x p2 + 1 p2 + 2 p2 + 3 } ```	2025-08-01 09:43:15 +02:00
Joel E. Denny	37e03b56b8	Revert "[PGO] Add `llvm.loop.estimated_trip_count` metadata" (#151585 ) Reverts llvm/llvm-project#148758 [As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)	2025-07-31 15:56:31 -04:00
Joel E. Denny	f7b65011de	[PGO] Add `llvm.loop.estimated_trip_count` metadata (#148758 ) This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As [suggested in the RFC comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4), it adds the new metadata to all loops at the time of profile ingestion and estimates each trip count from the loop's `branch_weights` metadata. As [suggested in the PR #128785 review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036), it does so via a new `PGOEstimateTripCountsPass` pass, which creates the new metadata for each loop but omits the value if it cannot estimate a trip count due to the loop's form. An important observation not previously discussed is that `PGOEstimateTripCountsPass` often cannot estimate a loop's trip count, but later passes can sometimes transform the loop in a way that makes it possible. Currently, such passes do not necessarily update the metadata, but eventually that should be fixed. Until then, if the new metadata has no value, `llvm::getLoopEstimatedTripCount` disregards it and tries again to estimate the trip count from the loop's current `branch_weights` metadata.	2025-07-31 12:28:25 -04:00
Florian Hahn	89ae085859	[VPlan] Remove VPVectorPointer for part 0 after unrolling. (#149735 ) VPVectorPointer for part 0 is just the pointer operand. Simplify it after unrolling. This removes a large number of redundant GEPs with index 0. PR: https://github.com/llvm/llvm-project/pull/149735	2025-07-27 13:53:26 +01:00
Florian Hahn	fa3ec0c17c	[VPlan] Materialize constant vector trip counts before final opts. (#142309 ) Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required. This enables removing a number of redundant branches from the middle block. For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together. PR: https://github.com/llvm/llvm-project/pull/142309	2025-07-26 17:16:36 +01:00
Luke Lau	feb77c0fea	[VPlan] Handle VPWidenSelectRecipe in tryToFoldLiveIns (#150357 ) This helps simplify VPBlendRecipes that are expanded to selects in another patch.	2025-07-25 09:46:19 +08:00
Florian Hahn	3fd53db858	[VPlan] Remove unneeded VPVectorPointer after narrowing to replicate. The replicate recipes created when narrowing interleave groups don't need a VPVectorPointer, they can simply use the existing pointer.	2025-07-19 20:18:04 +01:00
Florian Hahn	afe8150780	[VPlan] Simplify exituser handling by generating all extracts first (NFCI) Simplify the handling of exit users by generating all extracts first (safe option), and have FOR handling optimize the extracts, similar to what is already done for reductions and inductions. NFC modulo first-order recurrence extract order in middle block.	2025-07-16 08:14:12 +01:00
James Y Knight	093afed969	[VPlan] Fix miscompile after PR #142433 . (#147398 ) This fixes a bug introduced by aa2402931908317f5cc19b164ef17c5a74f2ae67, "[VPlan] Unroll VPReplicateRecipe by VF", which cloned a VPReplicateRecipe without transferring the flags from the original. That can cause incorrect nsw/nuw flags to be emitted on the new instructions, which may result in miscompiles. It turns out there were no test-cases in the repo which end up hitting the situation where the recipe requires instruction clones to have different flags from the underlying instruction. The existing tests covered the flags being correct when the replacement instruction is a vectorized version of the initial instruction, but not when it required clones. A new test is added covering this.	2025-07-08 14:51:27 -04:00
Florian Hahn	59a7185dd9	[VPlan] Truncate/Extend ComputeReductionResult at construction (NFC). (#141860 ) Instead of looking up the narrower reduction type via getRecurrenceType we can generate the needed extend directly at construction and re-use the truncated value from the loop. PR: https://github.com/llvm/llvm-project/pull/141860	2025-06-30 22:39:17 +01:00
Florian Hahn	aa24029319	[VPlan] Unroll VPReplicateRecipe by VF. (#142433 ) Explicitly unroll VPReplicateRecipes outside replicate regions by VF, replacing them by VF single-scalar recipes. Extracts for operands are added as needed and the scalar results are combined to a vector using a new BuildVector VPInstruction. It also adds a few folds to simplify unnecessary extracts/BuildVectors. It also adds a BuildStructVector opcode for handling of calls that have struct return types. VPReplicateRecipes in replicate regions will be unrolled as a follow-up, turning non-single-scalar VPReplicateRecipes into 'abstract', i.e. not executable. PR: https://github.com/llvm/llvm-project/pull/142433	2025-06-26 11:19:09 +01:00
Florian Hahn	fd97dfbb78	[LV] Don't mark ptrs as safe to speculate if fed by UB/poison op. (#143204 ) Add additional checks before marking pointers safe to load speculatively. If some computations feeding the pointer may trigger UB, we cannot load the pointer speculatively, because we cannot compute the address speculatively. The UB triggering instructions will be predicated, but if the predicated block does not execute the result is poison. Similarly, we also cannot load the pointer speculatively if it may be poison. The patch also checks if any of the operands defined outside the loop may be poison when entering the loop. We don't need to check if any operation inside the loop may produce poison due to flags, as those will be dropped if needed. There are some types of instructions inside the loop that can produce poison independent of flags. Currently loads are also checked, not sure if there's a convenient API to check for all such operands. Fixes https://github.com/llvm/llvm-project/issues/142957. PR: https://github.com/llvm/llvm-project/pull/143204	2025-06-20 13:05:19 +01:00
Luke Lau	9dd1c66e8f	[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes (#118638 ) The motivation of this PR is to make #115274 easier to implement, and should allow us to add EVL support by just passing EVL to the VF operand. The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body. The gist in this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example: ``` vector.ph: Successor(s): vector loop <x1> vector loop: { vector.body: WIDEN-INDUCTION %i = phi %start, %step, %vf ... EMIT branch-on-count ... No successors } ``` gets expanded to: ``` vector.ph: ... vp<%induction.start> = ... vp<%induction.increment> = ... Successor(s): vector loop <x1> vector loop: { vector.body: ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next> ... vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment> EMIT branch-on-count ... No successors } ``` This allows us to use a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves. After this #115274 should just become a matter of setting the VF operand to EVL (and building the increment step in the loop body, not the preheader).	2025-06-17 18:24:07 +01:00
Florian Hahn	790df93298	[VPlan] Mark VPFirstOrderRecurrencePHI as not reading/writing memory. First-order recurrence phis don't have side-effects and don't read or write memory. Mark them as such.	2025-06-15 22:00:47 +01:00
Florian Hahn	6108d50aed	[VPlan] Add ReductionStartVector VPInstruction. (#142290 ) Add a new VPInstruction::ReductionStartVector opcode to create the start values for wide reductions. This more accurately models the start value creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the line it also allows removing VPReductionPHIRecipe::RdxDesc. PR: https://github.com/llvm/llvm-project/pull/142290	2025-06-09 20:59:12 +01:00
Florian Hahn	11713e86b0	[LV] Move VPlan-based calculateRegisterUsage to VPlanAnalysis (NFC). (#135673 ) Move VPlan-based calculateRegisterUsage from LoopVectorize to VPlanAnalysis.cpp. It is a VPlan-based analysis and this helps to reduce the size of LoopVectorize. PR: https://github.com/llvm/llvm-project/pull/135673	2025-06-02 17:40:50 +01:00
Ramkumar Ramachandra	b8c4eea3d8	[VPlan] Simplify PredPHI LiveIn -> LiveIn (#142271 ) 5f39be5 ([VPlan] Use InstSimplifyFolder instead of TargetFolder) updated simplifyRecipe to fold live-ins to Values that are not necessarily Constant, but forgot to update the corresponding PredPHI folder, which still folds PredPHI constant -> constant. Update it to fold PredPHI LiveIn -> LiveIn. Fixes #141968.	2025-06-02 14:56:35 +01:00
Ramkumar Ramachandra	5f39be5917	[VPlan] Use InstSimplifyFolder instead of TargetFolder (#141222 ) For more powerful folding with operands that are not necessarily all-constant, use InstSimplifyFolder instead of TargetFolder in tryToConstantFold, and rename the function tryToFoldLiveIns.	2025-05-28 11:00:14 +02:00
Florian Hahn	a9b2998e31	[VPlan] Skip cost assert if VPlan converted to single-scalar recipes. Check if a VPlan transform converted recipes to single-scalar VPReplicateRecipes (after 07c085af3efcd67503232f99a1652efc6e54c1a9). If that's the case, the legacy cost model incorrectly overestimates the cost. Fixes https://github.com/llvm/llvm-project/issues/141237.	2025-05-24 11:09:27 +01:00
Ramkumar Ramachandra	cf1f116f78	[VPlan] Introduce constant folder in simplifyRecipe (#125365 ) Introduce a VPlan-level constant folder in simplifyRecipe that tries to fold a recipe to a constant using TargetFolder.	2025-05-20 14:16:01 +01:00
Sam Tebbs	70501ed2f0	[LoopVectorizer] Prune VFs based on plan register pressure (#132190 ) This PR moves the register usage checking to after the plans are created, so that any recipes that optimise register usage (such as partial reductions) can be properly costed and not have their VF pruned unnecessarily. Depends on https://github.com/llvm/llvm-project/pull/137746	2025-05-19 13:27:17 +01:00
Florian Hahn	07c085af3e	[VPlan] Add narrowToSingleScalarRecipe transform. (#139150 ) Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes. There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups. By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis. Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (https://github.com/llvm/llvm-project/pull/125365/) PR: https://github.com/llvm/llvm-project/pull/139150	2025-05-18 09:32:27 +01:00
Florian Hahn	8c6c525a6b	[LV] Don't consider FORs as profitable to scalarize. Fixed-order recurrence phis cannot be scalarized, they will always be widened at the moment. Make sure they are not incorrectly considered profitable to scalarize, similar to 41c1a7be3f1a2556e. Fixes https://github.com/llvm/llvm-project/issues/139060. Fixes https://github.com/llvm/llvm-project/issues/139065.	2025-05-09 20:29:22 +01:00
Ramkumar Ramachandra	f058333941	[LV] Regen a test with UTC (#139235 )	2025-05-09 14:26:20 +01:00
Paul Walker	01813e8929	[LLVM][VecLib] Refactor LIBMVEC integration to be target neutral. (#138262 ) Renames LIBMVEC-X86 to LIBMVEC and updates TLI to only add the existing x86 specific mapping when targeting x86.	2025-05-07 11:05:25 +01:00
Florian Hahn	043b04acff	Reapply "[VPlan] Fold NOT into predicate of wide compares." (#130347 ) This reverts commit 8dd160f4767f971572eac065c8650d9202ff5bf9. The recommit contains an adjustment to planContainsAdditionalSimplifications, which considers changes to the original predicate for compares. Original commit message: Add simplification to fold negation into a compare, if the negation is the only user of the compare. This removes a number of redundant negations. Alive2 Proofs for FPCMP test changes: https://alive2.llvm.org/ce/z/WGDz9U PR: https://github.com/llvm/llvm-project/pull/129430	2025-04-28 20:01:37 +01:00
Florian Hahn	15bb1db4a9	[VPlan] Remove ILV::sinkScalarOperands. (#136023 ) Remove legacy ILV sinkScalarOperands, which is superseded by the sinkScalarOperands VPlan transforms. There are a few cases that aren't handled by VPlan's sinkScalarOperands, because the recipes don't support replicating. Those are pointer inductions and blends. We could probably improve this further, by allowing replication for more recipes, but I don't think the extra complexity is warranted. Depends on https://github.com/llvm/llvm-project/pull/136021. PR: https://github.com/llvm/llvm-project/pull/136023	2025-04-24 08:37:49 +01:00
Florian Hahn	5739a22fbb	[VPlan] Also duplicated scalar-steps when it enables sinking scalars. (#136021 ) Extend sinking logic to duplicate scalar steps recipe if it enables sinking, that is if all users in a destination block require all lanes. This should be the last step before removing legacy sinkScalarOperands. PR: https://github.com/llvm/llvm-project/pull/136021	2025-04-21 18:36:43 +01:00
John Brawn	eafbb879f6	[LoopVectorize] Don't replicate blocks with optsize (#129265 ) Any VPlan we generate that contains a replicator region will result in replicated blocks in the output, causing a large code size increase. Reject such VPlans when optimizing for size, as the code size impact is usually worse than having a scalar epilogue, which we already forbid with optsize. This change requires a lot of test changes. For tests of optsize specifically I've updated the test with the new output, otherwise the tests have been adjusted to not rely on optsize. Fixes #66652	2025-04-17 11:50:49 +01:00
Florian Hahn	41c1a7be3f	[LV] Don't add fixed-order recurrence phis to forced scalars. Fixed-order recurrence phis cannot be forced to be scalar, they will always be widened at the moment. Make sure we don't add them to ForcedScalars, otherwise the legacy cost model will compute incorrect costs. This fixes an assertion reported with https://github.com/llvm/llvm-project/pull/129645.	2025-04-16 22:58:10 +02:00
YunQiang Su	fe9e2090be	Vectorize: Support fminimumnum and fmaximumnum (#131781 ) Support auto-vectorization of fminimum_num and fmaximum_num. Scalable vectors on ARM64 with SVE are not supported yet. --------- Co-authored-by: Your Name <you@example.com>	2025-04-15 08:08:45 +08:00
Björn Pettersson	092b6e73e6	[InstCombine] Handle "add like" in ADD+GEP->GEP+GEP rewrites (#135156 ) Considering that "or disjoint" is the canonical form for certain add operations, I think we want to support such "add like" operations when doing ADD+GEP->GEP+GEP rewrites to make things more consistent. Problem was found when improving ValueTracking, which turned an ADD into OR, and then suddenly optimizations got worse due to these rewrites no longer triggering.	2025-04-14 17:11:13 +02:00
Florian Hahn	5550d30228	[VPlan] Check captured operand when simplifying redundant OR. Follow-up to 0f607f to actually use the captured operand X instead of Y.	2025-04-13 13:23:27 +01:00
Florian Hahn	0f607f3df5	[VPlan] Simplify 'or x, true' -> true. Add additional OR simplification to fix a divergence between legacy and VPlan-based cost model. This adds a new m_AllOnes matcher by generalizing specific_intval to int_pred_ty, which takes a predicate to check to support matching both specific APInts and other APInt predicates, like isAllOnes. Fixes https://github.com/llvm/llvm-project/issues/131359.	2025-04-13 12:09:40 +01:00
Sam Tebbs	b658a2e74a	[LV] Reduce register usage for scaled reductions (#133090 ) This PR accounts for scaled reductions in `calculateRegisterUsage` to reflect the fact that the number of lanes in their output is smaller than the VF. Depends on https://github.com/llvm/llvm-project/pull/126437	2025-04-11 14:31:08 +01:00
Florian Hahn	6f92339d9e	[LV] Compute register usage for interleaving on VPlan. (#126437 ) Add a version of calculateRegisterUsage that estimates register usage for a VPlan. This mostly just ports the existing code, with some updates to figure out what recipes will generate vectors vs scalars. There are a number of changes in the computed register usages, but they should be more accurate w.r.t. the generated vector code. There are the following changes: * Scalar usage increases in most cases by 1, as we always create a scalar canonical IV, which is alive across the loop and is not considered by the legacy implementation * Output is ordered by insertion, now scalar registers are added first due to the canonical IV phi. * Using the VPlan, we now also more precisely know if an induction will be vectorized or scalarized. Depends on https://github.com/llvm/llvm-project/pull/126415 PR: https://github.com/llvm/llvm-project/pull/126437	2025-04-08 20:52:50 +01:00
Florian Hahn	5fbd0658a0	[VPlan] Add initial CFG simplification, removing BranchOnCond true. (#106748 ) Add an initial CFG simplification transform, which removes the dead edges for blocks terminated with BranchOnCond true. At the moment, this removes the edge between middle block and scalar preheader when folding the tail. PR: https://github.com/llvm/llvm-project/pull/106748	2025-04-04 15:44:26 +01:00
Luke Lau	8107b430ed	[VPlan] Simplify select c, x, x -> x (#133731 ) As noted in 1a9358c090d0507be21c5e9b2d97a23ef1de8ab0, some simplifications can produce a redundant select where the true and false operands are the same, which this patch removes. The is_fpclass test was changed so the condition wasn't made dead.	2025-04-02 10:26:48 +01:00
YunQiang Su	e25187bc3e	LLVM/Test: Add vectorizing testcases for fminimumnum and fmaximumnum (#133843 ) Vectorizing of fminimumnum and fmaximumnum is not supported yet. Let's add the testcases for it now, and we will update them when we support it.	2025-04-02 08:46:02 +08:00
Ramkumar Ramachandra	3a66760d9b	[LV] Improve a test, regen with UTC (#130092 )	2025-04-01 14:11:20 +01:00
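
Commit fa3ec0c17c above describes the constant vector trip count as (TC / (VF * UF)) * (VF * UF), i.e. the original trip count rounded down to a multiple of VF * UF. The arithmetic can be sketched in Python as below. This is not LLVM code: the function names `vector_trip_count` and `scalar_remainder` are hypothetical, and the sketch assumes no tail folding and no required scalar epilogue, which are the cases that commit says it covers.

```python
def vector_trip_count(tc: int, vf: int, uf: int) -> int:
    """Round the original trip count down to a multiple of VF * UF.

    Hypothetical helper for illustration only; assumes the tail is not
    folded and no scalar epilogue is required.
    """
    if tc < 0 or vf <= 0 or uf <= 0:
        raise ValueError("tc must be non-negative; vf and uf must be positive")
    step = vf * uf  # iterations consumed by one trip through the vector loop
    return (tc // step) * step


def scalar_remainder(tc: int, vf: int, uf: int) -> int:
    """Iterations left over for the scalar loop after the vector loop."""
    return tc - vector_trip_count(tc, vf, uf)
```

For a trip count of 100 with VF = 4 and UF = 2, the vector loop covers 96 iterations and 4 remain for the scalar loop. When the trip count is already a multiple of VF * UF the remainder is 0.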

936 Commits