llvm-project

Author	SHA1	Message	Date
Florian Hahn	3c5b05427d	[VPlan] Pass underlying instr to getMemoryOpCost in ::computeCost. Pass underlying instruction to getMemoryOpCost in VPReplicateRecipe::computeCost if UsedByLoadStoreAddress is true. Some targets use the underlying instruction to improve costs, and this is needed to match the legacy cost model. Fixes https://github.com/llvm/llvm-project/issues/177780. Fixes https://github.com/llvm/llvm-project/issues/177772.	2026-02-08 16:15:39 +00:00
Ramkumar Ramachandra	d12e99376f	Reland [VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) (#174581) The original patch, landed as a2db31b0 ([VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr), #172477), had a critical commutative matcher bug, which has now been fixed. An assert has also been strengthened, following a post-commit review.	2026-01-06 20:36:26 +00:00
Alex Bradbury	5a456c17d9	Revert "[VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)" (#174559) Reverts llvm/llvm-project#172477. This is causing failures for RVA23 (including some tests running away in their execution causing OOM, hence the builder dying). I will attempt to follow up on the PR with a reproducer of some kind. https://lab.llvm.org/buildbot/#/builders/210/builds/7243	2026-01-06 10:26:51 +00:00
Ramkumar Ramachandra	a2db31b06f	[VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) (#172477)	2026-01-06 08:27:48 +00:00
Florian Hahn	12ec050b9b	[LV] Remove some unnecessary uses of poison from tests.	2025-10-17 21:20:44 +01:00
Florian Hahn	8907b6d393	[VPlan] Remove original loop blocks if dead. (#155497 ) Build on top of https://github.com/llvm/llvm-project/pull/154510 to completely remove the blocks of dead scalar loops. Depends on https://github.com/llvm/llvm-project/pull/154510. PR: https://github.com/llvm/llvm-project/pull/155497	2025-10-01 16:53:59 +00:00
Florian Hahn	50b9ca4dda	[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510) After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510	2025-09-18 19:25:05 +01:00
Florian Hahn	351d398a37	[VPlan] Run final VPlan simplifications before codegen. Dissolving the hierarchical VPlan CFG and converting abstract to concrete recipes can expose additional simplification opportunities. Do a final run of simplifyRecipes before executing the VPlan.	2025-08-16 18:54:27 +01:00
Florian Hahn	86813aa786	[VPlan] Add dedicated user for resume phi with epilogue vectorization. Epilogue vectorization currently relies on the resume phi for the canonical induction being always available, which is why VPPhi are considered to have side-effects, to prevent their removal. This patch adds a new ResumeForEpilogue opcode to mark the resume phi as used for epilogue vectorization. This allows treating VPPhis in general as not having side-effects, enabling removal of unused VPPhis.	2025-08-10 21:21:16 +01:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Florian Hahn	25d1285eec	[VPlan] Replace single-entry VPPhis with their incoming values. Replace trivial, single-entry VPPhis with their incoming values.	2025-08-06 20:03:31 +01:00
Florian Hahn	c9dd14d1d4	[VPlan] Compute interleave count for VPlan. (#149702 ) Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some information directly from VPlan. Register pressure was already computed for a VPlan, and with this patch we now also check for reductions directly on VPlan, as well as checking how many load and store operations remain in the loop. This should be mostly NFC, but we may compute slightly different interleave counts, except for some edge cases, e.g. where dead loads have been removed. This shouldn't happen in practice, and the patch doesn't cause changes across a large test corpus on AArch64. Computing the interleave count based on VPlan allows for making better decisions in presence of VPlan optimizations, for example when operations on interleave groups are narrowed. Note that there are a few test changes for tests that were still checking the legacy cost-model output when it was computed in selectInterleaveCount. PR: https://github.com/llvm/llvm-project/pull/149702	2025-08-05 09:42:55 +01:00
Florian Hahn	89ae085859	[VPlan] Remove VPVectorPointer for part 0 after unrolling. (#149735 ) VPVectorPointer for part 0 is just the pointer operand. Simplify it after unrolling. This removes a large number of redundant GEPs with index 0. PR: https://github.com/llvm/llvm-project/pull/149735	2025-07-27 13:53:26 +01:00
Florian Hahn	fa3ec0c17c	[VPlan] Materialize constant vector trip counts before final opts. (#142309 ) Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required. This enables removing a number of redundant branches from the middle block. For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together. PR: https://github.com/llvm/llvm-project/pull/142309	2025-07-26 17:16:36 +01:00
Ramkumar Ramachandra	cf1f116f78	[VPlan] Introduce constant folder in simplifyRecipe (#125365 ) Introduce a VPlan-level constant folder in simplifyRecipe that tries to fold a recipe to a constant using TargetFolder.	2025-05-20 14:16:01 +01:00
Florian Hahn	07c085af3e	[VPlan] Add narrowToSingleScalarRecipe transform. (#139150 ) Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes. There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups. By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis. Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (https://github.com/llvm/llvm-project/pull/125365/) PR: https://github.com/llvm/llvm-project/pull/139150	2025-05-18 09:32:27 +01:00
Björn Pettersson	092b6e73e6	[InstCombine] Handle "add like" in ADD+GEP->GEP+GEP rewrites (#135156 ) Considering that "or disjoint" is the canonical for certain add operations, then I think we want to support such "add like" operations when doing ADD+GEP->GEP+GEP rewrites to make things more consistent. Problem was found when improving ValueTracking, which turned an ADD into OR, and then suddenly optimizations got worse due to these rewrites no longer triggering.	2025-04-14 17:11:13 +02:00
Florian Hahn	5550d30228	[VPlan] Check captured operand when simplifying redundant OR. Follow-up to 0f607f to actually use the captured operand X instead of Y.	2025-04-13 13:23:27 +01:00
Florian Hahn	0f607f3df5	[VPlan] Simplify 'or x, true' -> true. Add additional OR simplification to fix a divergence between legacy and VPlan-based cost model. This adds a new m_AllOnes matcher by generalizing specific_intval to int_pred_ty, which takes a predicate to check to support matching both specific APInts and other APInt predicates, like isAllOnes. Fixes https://github.com/llvm/llvm-project/issues/131359.	2025-04-13 12:09:40 +01:00
Florian Hahn	5fbd0658a0	[VPlan] Add initial CFG simplification, removing BranchOnCond true. (#106748 ) Add an initial CFG simplification transform, which removes the dead edges for blocks terminated with BranchOnCond true. At the moment, this removes the edge between middle block and scalar preheader when folding the tail. PR: https://github.com/llvm/llvm-project/pull/106748	2025-04-04 15:44:26 +01:00
Hari Limaye	bf5627c85e	[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828 ) Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the narrowest type necessary, when the trip-count of a loop is known to be constant and the only use of the recipe is the condition used by the vector loop's backedge branch.	2025-03-28 14:47:40 +00:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Florian Hahn	f48884ded8	[VPlan] Remove loop region in optimizeForVFAndUF. (#108378 ) Update optimizeForVFAndUF to completely remove the vector loop region when possible. At the moment, we cannot remove the region if it contains * widened IVs: the recipe is needed to generate the step vector * reductions: ComputeReductionResults requires the reduction phi recipe for codegen. Both cases can be addressed by more explicit modeling. The patch also includes a number of updates to allow executing VPlans without a vector loop region. Depends on https://github.com/llvm/llvm-project/pull/110004	2025-01-05 15:50:42 +00:00
Florian Hahn	7f3428d3ed	[VPlan] Compute induction end values in VPlan. (#112145 ) Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on https://github.com/llvm/llvm-project/pull/110004 and https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/112145	2024-12-29 19:05:08 +00:00
Florian Hahn	82821254f5	[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758 ) If IVUpdateMayOverflow is false, we proved that the induction increment cannot overflow in the vector loop. This allows setting NUW in some cases when folding the tail. PR: https://github.com/llvm/llvm-project/pull/111758	2024-11-28 10:12:41 +00:00
David Sherwood	3097c60928	[LoopVectorize][NFC] Rewrite tests to check output of vplan cost model (#113697 ) Currently it's very difficult to improve the cost model for tail-folded loops because as soon as you add a VPInstruction::computeCost function that adds the costs of instructions such as VPInstruction::ActiveLaneMask and VPInstruction::ExplicitVectorLength the assert in LoopVectorizationPlanner::computeBestVF fails for some tests. This is because the VF chosen by the legacy cost model doesn't match the vplan cost model. See PR #90191. This assert is currently making it difficult to improve the cost model. Hopefully we will be in a position to remove the assert soon, however in order to do that we have to fix up a whole bunch of tests that rely upon the legacy cost model output. I've tried my best to update these tests to use vplan output instead. There is still work needed for the VF=1 case because the vplan cost model is not printed out in this case. I've not attempted to fix those in this patch.	2024-11-19 08:55:39 +00:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548 )	2024-11-06 11:53:33 +00:00
Florian Hahn	4eb9838409	[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions. Update isDefinedOutsideLoopRegions to check if a recipe is defined outside any region. Split off already approved https://github.com/llvm/llvm-project/pull/95842 now that this can be tested separately after landing VPlan-based LICM https://github.com/llvm/llvm-project/issues/107501	2024-09-20 15:34:00 +01:00
Florian Hahn	a861ed411a	[VPlan] Add initial loop-invariant code motion transform. (#107894 ) Add initial transform to move out loop-invariant recipes. This also helps to fix a divergence between legacy and VPlan-based cost model due to legacy using ScalarEvolution::isLoopInvariant in some cases. Fixes https://github.com/llvm/llvm-project/issues/107501. PR: https://github.com/llvm/llvm-project/pull/107894	2024-09-20 11:22:03 +01:00
Florian Hahn	ea83e1c05a	[LV] Assign cost to all interleave members when not interleaving. At the moment, the full cost of all interleave group members is assigned to the instruction at the group's insert position, even if the decision was to not form an interleave group. This can lead to inaccurate cost estimates, e.g. if the instruction at the insert position is dead. If the decision is to not vectorize but scalarize or scatter/gather, then the cost will be the total cost for all members. In those cases, assign the cost per member individually, to more closely reflect the choice per instruction. This fixes a divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/108098.	2024-09-11 21:04:34 +01:00
Florian Hahn	bb60dd391f	[VPlan] Only use force-target-instruction-cost for recipes with insts. To match the behavior of the legacy cost model, only apply -force-target-instruction-cost to recipes with underlying instructions for now, as only original IR instructions are considered by the legacy cost model. This fixes a difference between legacy and VPlan based cost model, triggering the verification assertion, reported by @JonPsson1.	2024-07-23 21:05:10 +01:00
Florian Hahn	c8c0b18b5d	[LV] Update tests to not have dead interleave groups. Update existing tests with dead interleave groups by adding users. This ensures the tests keep testing what they were intended to test with a planned change to skip unused instructions in cost computations.	2024-07-21 14:03:40 +01:00
Florian Hahn	b8741cc185	[VPlan] Relax assertion retrieving a scalar from VPTransformState::get. The current assertion in VPTransformState::get when retrieving only a single scalar does not account for cases where a def has multiple users, some demanding all scalar lanes, some demanding only a single scalar. For an example, see the modified test case. Relax the assertion by also allowing requesting scalar lanes only when the Def doesn't have only its first lane used. Fixes https://github.com/llvm/llvm-project/issues/88849.	2024-07-19 11:33:57 +01:00
Florian Hahn	17f98baf70	[LV] Add test with users both demanding all lanes and first-lane-only. Add a test case where scalar steps are used by both a VPReplicateRecipe (demands all scalar lanes) and a VPInstruction that only demands the first lane. Test case for https://github.com/llvm/llvm-project/issues/88849.	2024-07-19 10:29:43 +01:00
Florian Hahn	9a5a8731e7	[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760) This patch introduces a new ResumePhi VPInstruction which creates a phi in a leaf block of a VPlan. The first use is to create the phi node for fixed-order recurrence resume values in the scalar preheader. The VPInstruction takes 2 operands: 1) the incoming value from the middle-block and 2) a default value to be used for all other incoming blocks. In follow-up changes, it will also be used to create phis for reduction and induction resume values. Depends on https://github.com/llvm/llvm-project/pull/92651 PR: https://github.com/llvm/llvm-project/pull/94760	2024-07-11 16:08:04 +01:00
Florian Hahn	29b8b72117	[LV] Move check if any vector insts will be generated to VPlan. (#96622) This patch moves the check if any vector instructions will be generated from getInstructionCost to be based on VPlan. This simplifies getInstructionCost, is more accurate as we check the final result and also allows us to exit early once we visit a recipe that generates vector instructions. The helper can then be re-used by the VPlan-based cost model to match the legacy selectVectorizationFactor behavior, thus fixing a crash and paving the way to recommit https://github.com/llvm/llvm-project/pull/92555. PR: https://github.com/llvm/llvm-project/pull/96622	2024-07-07 20:08:01 +01:00
Florian Hahn	959ff45bda	[LV] Regenerate test checks for zero_unroll.ll (NFC). Regenerate test checks to better show impact of https://github.com/llvm/llvm-project/pull/96622.	2024-07-05 11:37:13 +01:00
Florian Hahn	99d6c6d936	[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651 ) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on https://github.com/llvm/llvm-project/pull/92525 PR: https://github.com/llvm/llvm-project/pull/92651	2024-07-05 10:08:42 +01:00
David Green	352a836176	[InstCombine] Canonicalize non-i8 gep of mul to i8 (#96606 ) This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep i8, p, (mul x, C*4)`, so that the mul can combine both of the constant multiplications, and we take a small step towards canonicalizing more geps to i8. It currently doesn't attempt to check for multiple uses on the mul, but that should be possible if it sounds better. Let me know what you think of the idea in general.	2024-06-26 14:25:54 +01:00
Ramkumar Ramachandra	bb0d29a72d	[LV] fix logical error in trunc cost (#91136 ) In LoopVectorizationCostModel::getInstructionCost(), when the condition canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source type is computed as the smallest type of the source vector and the destination vector, and the destination type is computed as the largest type of the instruction and destination type. This is clearly a logical error, as the original source vector type could be smaller than the original destination vector type, and the trunc semantics are broken because we're attempting to widen. Fixes #47665.	2024-05-24 18:01:58 +01:00
Ramkumar Ramachandra	dc148c9fb8	[LV] add test for #47665, #88802 (#91135)	2024-05-24 10:50:43 +01:00
Nilanjana Basu	c1c5b854ad	[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725 ) A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.	2024-02-05 17:23:58 -08:00
Jonas Paulsson	62b7e35f10	[SystemZ] Don't assert for i128 vectors in getInterleavedMemoryOpCost() (#78009 ) This assert does not seem justified given that the LoopVectorizer can form interleave groups containing i128 elements where the number of elements per vector is indeed just one.	2024-01-15 17:31:18 +01:00
Craig Topper	7ec4f6094e	[InstCombine] Infer disjoint flag on Or instructions. (#72912) The disjoint flag was recently added to IR in #72583. We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.	2023-12-02 14:11:12 -08:00
Craig Topper	03d4a9d94d	[InstCombine] Set disjoint flag when turning Add into Or. (#72702 ) The disjoint flag was recently added to IR in #72583	2023-11-27 12:54:11 -08:00
Nikita Popov	2b7c347c7f	[LoopVectorize] Convert test to opaque pointers (NFC) I'm keeping the bitcast in the input here, because without it we end up introducing a stride 1 assumption and end up testing a different case.	2023-06-12 14:49:45 +02:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Florian Hahn	35af27c30a	[VPlan] Only create extracts for recurrence exits if there are live-outs. Move the code to collect live-out earlier and only generate extracts for exit values if there are any live-outs that use them. Depends on D147472. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147567	2023-04-10 21:08:34 +01:00
Nikita Popov	9ed2f14c87	[AsmParser] Remove typed pointer auto-detection IR is now always parsed in opaque pointer mode, unless -opaque-pointers=0 is explicitly given. There is no automatic detection of typed pointers anymore. The -opaque-pointers=0 option is added to any remaining IR tests that haven't been migrated yet. Differential Revision: https://reviews.llvm.org/D141912	2023-01-18 09:58:32 +01:00
Nikita Popov	2fab927546	[LoopVectorize] Convert some tests to opaque pointers (NFC) Check lines for some of these tests were regenerated. The difference is that with opaque pointers SCEVExpander always emits i8 GEPs, making the address calculation explicit. This is a known problem that will be solved long term by making all address calculations explicit.	2023-01-04 17:25:42 +01:00

83 Commits