llvm-project

Author	SHA1	Message	Date
Paul Walker	7b8fd8f31b	[LLVM][SCEV] Look through common vscale multiplicand when simplifying compares. (#141798 ) My use case is simplifying the control flow generated by LoopVectorize when vectorising loops whose trip count is a function of the runtime vector length. This can be problematic because: * CSE is a post-LoopVectorize transform, so it's common for an IR function to include several calls to llvm.vscale(). (NOTE: Code generation will typically remove the duplicates.) * Pre-LoopVectorize instcombines will rewrite some multiplies as shifts. This leads to a mismatch between the VL-based maths of the scalar loop and that created for the vector loop, which prevents some obvious simplifications. SCEV does not suffer these issues because it effectively does CSE during construction and represents shifts as multiplies.	2025-09-19 12:57:13 +01:00
Florian Hahn	0c028bbf33	[LV] Always add uniform pointers to uniforms list. Always add pointers proved to be uniform via legal/SCEV to worklist. This extends the existing logic to handle a few more pointers known to be uniform.	2025-09-18 22:56:19 +01:00
Florian Hahn	70a7ffdc29	[LV] Add missing test cover for replicating load/store costs.	2025-09-18 19:47:06 +01:00
Florian Hahn	50b9ca4dda	[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510 ) After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with a constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510	2025-09-18 19:25:05 +01:00
Hassnaa Hamdi	e8aa0b688a	[LV]: Ensure fairness when selecting epilogue VF. (#155547 ) Consider IC when deciding whether the epilogue is profitable for scalable vectors, the same as for fixed-width vectors.	2025-09-17 14:48:10 +01:00
Sander de Smalen	17e008db17	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637 ) The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.	2025-09-17 11:44:47 +01:00
Graham Hunter	666e4313eb	[NFC][LV] Improve ee with sideeffects legality test (#158275 ) Addressing postcommit comments for 54fc5367f63cca8e011d93bbd55764b0a7ecbbd5	2025-09-16 13:04:51 +00:00
Ramkumar Ramachandra	46fcece2a8	[VPlan] Extend CSE to eliminate GEPs (#156699 ) The motivation for this patch is to close the gap between the VPlan-based CSE and the legacy CSE, to make it easier to remove the legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22 test failures, and after this patch, there are only 12 failures, and all of them seem to have a single root cause: VPlanTransforms::createInterleaveGroups() and VPInterleaveGroup::execute(). The improvements from this patch are of course welcome. While developing the patch, a miscompile was found when GEP source-element-types differ, and this has been fixed. Co-authored-by: Florian Hahn <flo@fhahn.com> Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-16 10:14:32 +00:00
Florian Hahn	1858532c48	[VPlan] Handle predicated UDiv in VPReplicateRecipe::computeCost. Account for predicated UDiv,SDiv,URem,SRem in VPReplicateRecipe::computeCost: compute costs of extra phis and apply getPredBlockCostDivisor. Fixes https://github.com/llvm/llvm-project/issues/158660	2025-09-15 21:46:50 +01:00
Florian Hahn	4949cb4a5e	[VPlan] Track VPValues instead of VPRecipes in calculateRegisterUsage. (#155301 ) Update calculateRegisterUsageForPlan to track the liveness of VPValues instead of recipes. This gives slightly more accurate results for recipes that define multiple values (i.e. VPInterleaveRecipe). When tracking the liveness of recipes, all VPValues defined by a VPInterleaveRecipe are considered alive until the last use of any of them. When tracking the liveness of individual VPValues, we can accurately track the individual values until their last use. Note the changes in large-loop-rdx.ll and pr47437.ll. This patch restores the original behavior before introducing VPlan-based liveness tracking. PR: https://github.com/llvm/llvm-project/pull/155301	2025-09-15 20:55:11 +01:00
Florian Hahn	985dc69a2d	[LV] Add test for missed interleaving after narrowing interleave groups. Add extra test coverage for https://github.com/llvm/llvm-project/pull/149706. The added loop should be interleaved, after narrowing interleave groups, which requires moving the transform earlier.	2025-09-15 17:33:59 +01:00
Florian Hahn	2848e28012	[LV] Add test with partial reduction without narrowing.	2025-09-15 11:56:58 +01:00
Florian Hahn	ef7e03a2d1	[VPlan] Limit ExtractLastElem fold to recipes guaranteed single-scalar. vputils::isSingleScalar(A) may return true for recipes that produce only a single scalar value, but they could still end up as vector instructions, because the recipe could not be converted to a single-scalar VPInstruction/VPReplicateRecipe. For now, only apply the fold for recipes guaranteed to produce a single value, i.e. single-scalar VPInstructions and VPReplicateRecipes. Fixes https://github.com/llvm/llvm-project/issues/158319.	2025-09-13 18:15:38 +01:00
Antonio Frighetto	370607065d	[llvm] Regenerate test checks including TBAA semantics (NFC) Tests exercising TBAA metadata (both purposefully and not), and previously generated via UTC, have been regenerated and updated to version 6.	2025-09-12 20:01:17 +02:00
Florian Hahn	b8eaceb39b	[VPlan] Explicitly replicate VPInstructions by VF. (#155102 ) Extend replicateByVF added in #142433 (aa240293190) to also explicitly unroll replicating VPInstructions. Now the only remaining case where we replicate for all lanes is VPReplicateRecipes in replicate regions. PR: https://github.com/llvm/llvm-project/pull/155102	2025-09-12 17:06:26 +01:00
Graham Hunter	54fc5367f6	[LV] Fix crash in uncountable exit with side effects checking Fixes an ICE reported on PR #145663, as an assert was found to be reachable with a specific combination of unreachable blocks.	2025-09-12 10:41:05 +00:00
Luke Lau	4bb250d6a3	[VPlan] Always consider register pressure on RISC-V (#156951 ) Stacked on #156923 In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because, whilst the largest element type is i8, we generate a bunch of pointer vectors for gathers and scatters. This means the VF chosen is quite high, e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x i64> m8 registers for the pointers. This was briefly fixed by #132190, where we computed register pressure in VPlan and used it to prune VFs that were likely to spill. The legacy cost model wasn't able to do this pruning because it didn't have visibility into the pointer vectors that were needed for the gathers/scatters. However, VF pruning was restricted again to just the case when max bandwidth was enabled in #141736 to avoid an AArch64 regression, and restricted again in #149056 to only prune VFs that had max bandwidth enabled. On RISC-V we take advantage of register grouping for performance and choose a default of LMUL 2, which means there are 16 registers to work with, half as many as SVE, so we encounter high register pressure more frequently. As such, we likely want to always consider pruning VFs with high register pressure and not just the VFs from max bandwidth. This adds a TTI hook to opt into this behaviour for RISC-V, which fixes the motivating godbolt example above. When last checked, this significantly reduces the number of spills on SPEC CPU 2017, by up to 80% on 538.imagick_r.	2025-09-12 06:21:54 +00:00
Joel E. Denny	0e3c5566c0	[PGO] Add llvm.loop.estimated_trip_count metadata (#152775 ) This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As the RFC explains, that metadata enables future patches, such as PR #128785, to fix block frequency issues without losing estimated trip counts.	2025-09-11 15:55:18 -04:00
Graham Hunter	e285602fda	[LV] Enforce addrec in current loop for uncountable exit load address check Addresses post-commit review raised for #145663	2025-09-11 11:18:22 +00:00
Elvis Wang	3e898bc40f	[LV] Fix cost misaligned when gather/scatter w/ addr is uniform. (#157387 ) This patch fixes the assertion failure when `isUniform` (from the legacy model) and `isSingleScalar` (from the VPlan-based model) mismatch. The simplified test that triggers the assertion: ``` loop: %loadA = load %a => %a is loop invariant. %loadB = load %loadA ... ``` The legacy cost model cannot determine that the address of `%loadB` is uniform, but in the VPlan-based cost model the addresses of both `%loadA` and `%loadB` are single scalars. Full test that caused the crash: https://llvm.godbolt.org/z/zEG8YKjqh. --------- Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-11 07:49:54 +08:00
Florian Hahn	1efa997317	[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars. Move handling of stores to single-scalar/uniform address from replicateByVF to narrowToSingleScalars.	2025-09-10 21:58:29 +01:00
Florian Hahn	055e4ff35a	[VPlan] Don't narrow op multiple times in narrowInterleaveGroups. Track which ops have already been narrowed, to avoid narrowing the same operation multiple times. Repeated narrowing will lead to incorrect results, because we could first narrow from an interleave group -> wide load, and then narrow the wide load -> single-scalar load. Fixes https://github.com/llvm/llvm-project/issues/156190.	2025-09-10 19:22:42 +01:00
Florian Hahn	7b828738c6	[LV] Add tests with multiple store groups re-using widened ops. Test coverage for https://github.com/llvm/llvm-project/issues/156190.	2025-09-10 17:10:46 +01:00
Nikita Popov	a301e1a895	[InstCombine] Split GEPs with multiple non-zero offsets (#151333 ) Split GEPs that have more than one non-zero offset into two GEPs. This is in preparation for the ptradd migration, which can only represent such GEPs. This also enables CSE and LICM of the common base.	2025-09-10 16:51:58 +02:00
Graham Hunter	3c810b76b9	[LV] Add initial legality checks for early exit loops with side effects (#145663 ) This adds initial support to LoopVectorizationLegality to analyze loops with side effects (particularly stores to memory) and an uncountable exit. This patch alone doesn't enable any new transformations, but does give clearer reasons for rejecting vectorization for such a loop. The intent is for a loop like the following to pass the specific checks, and only be rejected at the end until the transformation code is committed: ``` // Assume a is marked restrict // Assume b is known to be large enough to access up to b[N-1] for (int i = 0; i < N; ++i) { a[i]++; if (b[i] > threshold) break; } ```	2025-09-10 13:54:52 +01:00
Hassnaa Hamdi	5739142345	[LV][AArch64][NFC]: Change TC in a test case. (#157512 ) - The sve-epilog-vscale-fixed.ll test checks the preference for a fixed-width epilogue VF over a scalable one when costs are equal. This NFC patch changes the TC in the test case to be unknown, to avoid folding the epilogue in future LV changes.	2025-09-10 12:41:49 +01:00
Florian Hahn	c3e76b2770	[VPlan] Keep common flags during CSE. (#157664 ) During CSE, we don't have to drop all poison-generating flags on mismatch; we can keep the ones common to both recipes. PR: https://github.com/llvm/llvm-project/pull/157664	2025-09-10 10:20:48 +00:00
Mel Chen	4d9a7fa9ba	[VPlan] Remove dead recipes before simplifying blends (#157622 ) In simplifyBlends, when normalizing a blend recipe, the first mask that is used only by the blend and is not all-false is chosen, and its corresponding incoming value becomes the initial value, with the others blended into it. At the same time, the mask that is chosen can be eliminated. However, a multi-user mask might be used by a dead recipe, which prevents this optimization. This patch moves removeDeadRecipes before simplifyBlends to eliminate dead recipes, allowing simplifyBlends to remove more dead masks.	2025-09-10 08:03:18 +00:00
Florian Hahn	c4b17bf9ed	[VPlan] Slightly extend ExtractLastElement fold to single-scalars. Update ExtractLastElement fold to support single scalar recipes, if all their users only use scalars.	2025-09-09 22:08:08 +01:00
Nikita Popov	dbdac9f3ab	[LoopVectorize] Generate test checks (NFC)	2025-09-09 17:43:17 +02:00
Florian Hahn	6bcb172bd6	[LV] Add test for preserving common GEP flags. Add additional test coverage for preserving poison generating flags. Modernize the existing flags tests with auto-generated check lines.	2025-09-09 13:54:53 +01:00
David Green	204917ea97	[LoopVectorizer][AArch64] Add a -sve-vscale-for-tuning override option. (#156916 ) It can be useful for debugging and tuning to be able to alter the VScaleForTuning. This adds a quick option to the aarch64 subtarget for altering it.	2025-09-09 10:46:12 +01:00
Florian Hahn	9b1b93766d	Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9. Recommit with a fix for the MSan failure ( https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a set to track deleted values. Using the InsertedInstructions set is not sufficient, as it uses asserting value handles as keys, which may dereference the value at construction. Original message: Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-09 09:47:41 +01:00
Florian Hahn	132bacde22	[VPlan] Also allow extracts as users when converting to single scalars. Extracts technically use vectors rather than scalars, but if the operand is a single scalar we do not need a vector, so they should not block forming single scalars.	2025-09-08 22:11:39 +01:00
Florian Hahn	eeb43806eb	Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f. Triggers MSan errors in some configurations, e.g. https://lab.llvm.org/buildbot/#/builders/169/builds/14799	2025-09-08 14:52:28 +01:00
Florian Hahn	408a2e7cee	[LV] Remove instcombine,simplifycfg and dce from some tests. Remove instcombine, simplifycfg and dce from some tests, as they make it a bit more difficult to see the codegen coming out of LV and most simplifications are already done on the VPlan-level. Also modernizes some check lines.	2025-09-08 12:01:37 +01:00
Florian Hahn	528b13df57	[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 ) Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-08 10:53:20 +01:00
Luke Lau	fe6e178401	[VPlan] Don't build recipes for unconditional switches (#157323 ) In #157322 we crash because we try to infer a type for a VPReplicate switch recipe. My understanding was that these switches should be removed by VPlanPredicator, but this switch survived through it because it was unconditional, i.e. had no cases other than the default case. This fixes #157322 by not emitting any recipes for unconditional switches to begin with, similar to how we treat unconditional branches.	2025-09-08 09:01:43 +00:00
Florian Hahn	b50ad945dd	[InstSimplify] Simplify extractvalue (umul_with_overflow(x, 1)). (#157307 ) Look through extractvalue to simplify umul_with_overflow where one of the operands is 1. This removes some redundant instructions when expanding SCEVs, which in turn makes the runtime check cost estimate more accurate, reducing the minimum iterations for which vectorization is profitable. PR: https://github.com/llvm/llvm-project/pull/157307	2025-09-07 18:32:40 +01:00
Florian Hahn	2654690006	[LV] Add additional test with SCEV predicate. The SCEV predicate in the existing tests for optimizing for size is known to be false. Add additional test with a predicate that cannot be proven true/false. Also generate checks with latest version of script.	2025-09-07 16:14:52 +01:00
Florian Hahn	afa0e70cc6	[LV] Remove instcombine,simplifycfg and dce from some tests. Remove instcombine, simplifycfg and dce from some tests, as they make it a bit more difficult to see the codegen coming out of LV and most simplifications are already done on the VPlan-level. Also modernizes some check lines.	2025-09-07 10:28:25 +01:00
Florian Hahn	59d72b57b0	[LV] Modernize and regenerate checks for some tests.	2025-09-06 20:52:29 +01:00
Florian Hahn	724a63ba8b	[LV] Use more accurate getSCEV/MemChecks in GeneratedRTCheck::hasChecks. Update hasChecks to use getSCEV/MemRuntimeChecks(), which automatically handles checking for known-false checks. This improves a few cases where we previously did not add metadata to disable runtime unrolling, due to runtime checks, even though no runtime checks are needed.	2025-09-06 19:21:11 +01:00
Florian Hahn	cd8c3e5053	[LV] Add test showing missing metadata to disable runtime unrolling.	2025-09-06 16:42:02 +01:00
Florian Hahn	e0f00bd645	[LV] Don't consider second op as invariant in getDivRemSpeculationCost. The second operand when using a safe divisor will always be a select in the loop, so won't be invariant; don't treat it as such. This fixes a divergence with legacy and VPlan based cost model. Fixes https://github.com/llvm/llvm-project/issues/156066.	2025-09-06 14:06:04 +01:00
Florian Hahn	b9b0ea5f62	[LV] Pass DT to isGuaranteedNotToBePoison in canVectorizeWithIfCvt. Pass DT to slightly improve analysis results. Note that the context instruction is already passed.	2025-09-05 20:42:56 +01:00
Florian Hahn	f8972c8280	[SCEVExp] Fix early exit in ComputeEndCheck. (#156910 ) ComputeEndCheck incorrectly returned false for unsigned predicates starting at zero and a positive step. The AddRec could still wrap if Step * trunc ExitCount wraps or trunc ExitCount strips leading 1s. Fixes https://github.com/llvm/llvm-project/issues/156849. PR: https://github.com/llvm/llvm-project/pull/156910	2025-09-05 15:13:11 +00:00
Phoebe Wang	94b164c218	[X86][AVX10] Remove EVEX512 and AVX10-256 implementations (#157034 ) The 256-bit maximum vector register size control was removed from the AVX10 whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343 We have warned about these options in LLVM 21 through #132542. This patch removes the underlying implementations in LLVM 22.	2025-09-05 14:08:59 +00:00
Florian Hahn	74ec38fad0	[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730 ) Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9 PR: https://github.com/llvm/llvm-project/pull/156730	2025-09-05 08:45:13 +00:00
Luke Lau	3f9e0736ac	[VPlan] Move findCommonEdgeMask optimization to simplifyBlends (#156304 ) Following up from #150368, this moves folding common edge masks into simplifyBlends. One test in uniform-blend.ll ended up regressing, but after looking at it closely, it came from a weird (x && !x) edge mask. So I've just included a simplification in this PR to fold that to false.	2025-09-05 01:29:22 +00:00


3445 Commits