llvm-project

Author	SHA1	Message	Date
Florian Hahn	b8eaceb39b	[VPlan] Explicitly replicate VPInstructions by VF. (#155102 ) Extend replicateByVF added in #142433 (aa240293190) to also explicitly unroll replicating VPInstructions. Now the only remaining case where we replicate for all lanes is VPReplicateRecipes in replicate regions. PR: https://github.com/llvm/llvm-project/pull/155102	2025-09-12 17:06:26 +01:00
Graham Hunter	54fc5367f6	[LV] Fix crash in uncountable exit with side effects checking Fixes an ICE reported on PR #145663, as an assert was found to be reachable with a specific combination of unreachable blocks.	2025-09-12 10:41:05 +00:00
Luke Lau	4bb250d6a3	[VPlan] Always consider register pressure on RISC-V (#156951 ) Stacked on #156923 In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because whilst the largest element type is i8, we generate a bunch of pointer vectors for gathers and scatters. This means the VF chosen is quite high e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x i64> m8 registers for the pointers. This was briefly fixed by #132190 where we computed register pressure in VPlan and used it to prune VFs that were likely to spill. The legacy cost model wasn't able to do this pruning because it didn't have visibility into the pointer vectors that were needed for the gathers/scatters. However VF pruning was restricted again to just the case when max bandwidth was enabled in #141736 to avoid an AArch64 regression, and restricted again in #149056 to only prune VFs that had max bandwidth enabled. On RISC-V we take advantage of register grouping for performance and choose a default of LMUL 2, which means there are 16 registers to work with – half the number as SVE, so we encounter higher register pressure more frequently. As such, we likely want to always consider pruning VFs with high register pressure and not just the VFs from max bandwidth. This adds a TTI hook to opt into this behaviour for RISC-V which fixes the motivating godbolt example above. When last checked this significantly reduces the number of spills on SPEC CPU 2017, up to 80% on 538.imagick_r.	2025-09-12 06:21:54 +00:00
Joel E. Denny	0e3c5566c0	[PGO] Add llvm.loop.estimated_trip_count metadata (#152775 ) This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As the RFC explains, that metadata enables future patches, such as PR #128785, to fix block frequency issues without losing estimated trip counts.	2025-09-11 15:55:18 -04:00
Graham Hunter	e285602fda	[LV] Enforce addrec in current loop for uncountable exit load address check Addresses post-commit review raised for #145663	2025-09-11 11:18:22 +00:00
Elvis Wang	3e898bc40f	[LV] Fix cost misaligned when gather/scatter w/ addr is uniform. (#157387 ) This patch fixes the assertion triggered when `isUniform` (from the legacy model) and `isSingleScalar` (from the VPlan-based model) mismatch. The simplified test that causes the assertion: ``` loop: loadA = load %a => %a is loop invariant. loadB = load %LoadA ... ``` The legacy cost model cannot determine that the address of `%loadB` is uniform, but in the VPlan-based cost model the addresses of both `%loadA` and `%loadB` are single scalars. Full test that caused the crash: https://llvm.godbolt.org/z/zEG8YKjqh. --------- Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-11 07:49:54 +08:00
Florian Hahn	1efa997317	[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars. Move handling of stores to single-scalar/uniform address from replicateByVF to narrowToSingleScalar.	2025-09-10 21:58:29 +01:00
Florian Hahn	055e4ff35a	[VPlan] Don't narrow op multiple times in narrowInterleaveGroups. Track which ops already have been narrowed, to avoid narrowing the same operation multiple times. Repeated narrowing will lead to incorrect results, because we could first narrow from an interleave group -> wide load, and then narrow the wide load -> single-scalar load. Fixes https://github.com/llvm/llvm-project/issues/156190.	2025-09-10 19:22:42 +01:00
Florian Hahn	7b828738c6	[LV] Add tests with multiple store groups re-using widened ops. Test coverage for https://github.com/llvm/llvm-project/issues/156190.	2025-09-10 17:10:46 +01:00
Nikita Popov	a301e1a895	[InstCombine] Split GEPs with multiple non-zero offsets (#151333 ) Split GEPs that have more than one non-zero offset into two GEPs. This is in preparation for the ptradd migration, which can only represent such GEPs. This also enables CSE and LICM of the common base.	2025-09-10 16:51:58 +02:00
Graham Hunter	3c810b76b9	[LV] Add initial legality checks for early exit loops with side effects (#145663 ) This adds initial support to LoopVectorizationLegality to analyze loops with side effects (particularly stores to memory) and an uncountable exit. This patch alone doesn't enable any new transformations, but does give clearer reasons for rejecting vectorization for such a loop. The intent is for a loop like the following to pass the specific checks, and only be rejected at the end until the transformation code is committed: ``` // Assume a is marked restrict // Assume b is known to be large enough to access up to b[N-1] for (int i = 0; i < N; ++i) { a[i]++; if (b[i] > threshold) break; } ```	2025-09-10 13:54:52 +01:00
Hassnaa Hamdi	5739142345	[LV][AArch64][NFC]: Change TC in a test case. (#157512 ) - In sve-epilog-vscale-fixed.ll file, it tests the preference of fixed-width epilogue VF vs scalable when costs are equal. This NFC patch is changing the TC in the test case to be unknown to avoid folding the epilogue in future LV changes.	2025-09-10 12:41:49 +01:00
Florian Hahn	c3e76b2770	[VPlan] Keep common flags during CSE. (#157664 ) During CSE, we don't have to drop all poison-generating flags on mis-match, we can keep the ones common on both recipes. PR: https://github.com/llvm/llvm-project/pull/157664	2025-09-10 10:20:48 +00:00
Mel Chen	4d9a7fa9ba	[VPlan] Remove dead recipes before simplifying blends (#157622 ) In simplifyBlends, when normalizing a blend recipe, the first mask that is used only by the blend and is not all-false is chosen, and its corresponding incoming value becomes the initial value, with the others blended into it. At the same time, the mask that is chosen can be eliminated. However, a multi-user mask might be used by a dead recipe, which prevents this optimization. This patch moves removeDeadRecipes before simplifyBlends to eliminate dead recipes, allowing simplifyBlends to remove more dead masks.	2025-09-10 08:03:18 +00:00
Florian Hahn	c4b17bf9ed	[VPlan] Slightly extend ExtractLastElement fold to single-scalars. Update ExtractLastElement fold to support single scalar recipes, if all their users only use scalars.	2025-09-09 22:08:08 +01:00
Nikita Popov	dbdac9f3ab	[LoopVectorize] Generate test checks (NFC)	2025-09-09 17:43:17 +02:00
Florian Hahn	6bcb172bd6	[LV] Add test for preserving common GEP flags. Add additional test coverage for preserving poison generating flags. Modernize the existing flags tests with auto-generated check lines.	2025-09-09 13:54:53 +01:00
David Green	204917ea97	[LoopVectorizer][AArch64] Add a -sve-vscale-for-tuning override option. (#156916 ) It can be useful for debugging and tuning to be able to alter the VScaleForTuning. This adds a quick option to the aarch64 subtarget for altering it.	2025-09-09 10:46:12 +01:00
Florian Hahn	9b1b93766d	Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9. Recommit with a fix for MSan failure ( https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a set to track deleted values. Using the InsertedInstructions set is not sufficient, as it uses asserting value handles as keys, which may dereference the value at construction. Original message: Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-09 09:47:41 +01:00
Florian Hahn	132bacde22	[VPlan] Also allow extracts as users when converting to single scalars. Extracts technically do not use scalars, but vectors, but if the operand is a single scalar we do not need a vector and they should not block forming single scalars.	2025-09-08 22:11:39 +01:00
Florian Hahn	eeb43806eb	Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f. Triggers MSan errors in some configurations, e.g. https://lab.llvm.org/buildbot/#/builders/169/builds/14799	2025-09-08 14:52:28 +01:00
Florian Hahn	408a2e7cee	[LV] Remove instcombine,simplifycfg and dce from some tests. Remove instcombine, simplifycfg and dce from some tests, as they make it a bit more difficult to see the codegen coming out of LV and most simplifications are already done on the VPlan-level. Also modernizes some check lines.	2025-09-08 12:01:37 +01:00
Florian Hahn	528b13df57	[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 ) Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-08 10:53:20 +01:00
Luke Lau	fe6e178401	[VPlan] Don't build recipes for unconditional switches (#157323 ) In #157322 we crash because we try to infer a type for a VPReplicate switch recipe. My understanding was that these switches should be removed by VPlanPredicator, but this switch survived through it because it was unconditional, i.e. had no cases other than the default case. This fixes #157322 by not emitting any recipes for unconditional switches to begin with, similar to how we treat unconditional branches.	2025-09-08 09:01:43 +00:00
Florian Hahn	b50ad945dd	[InstSimplify] Simplify extractvalue (umul_with_overflow(x, 1)). (#157307 ) Look through extractvalue to simplify umul_with_overflow where one of the operands is 1. This removes some redundant instructions when expanding SCEVs, which in turn makes the runtime check cost estimate more accurate, reducing the minimum iterations for which vectorization is profitable. PR: https://github.com/llvm/llvm-project/pull/157307	2025-09-07 18:32:40 +01:00
Florian Hahn	2654690006	[LV] Add additional test with SCEV predicate. The SCEV predicate in the existing tests for optimizing for size is known to be false. Add additional test with a predicate that cannot be proven true/false. Also generate checks with latest version of script.	2025-09-07 16:14:52 +01:00
Florian Hahn	afa0e70cc6	[LV] Remove instcombine,simplifycfg and dce from some tests. Remove instcombine, simplifycfg and dce from some tests, as they make it a bit more difficult to see the codegen coming out of LV and most simplifications are already done on the VPlan-level. Also modernizes some check lines.	2025-09-07 10:28:25 +01:00
Florian Hahn	59d72b57b0	[LV] Modernize and regenerate checks for some tests.	2025-09-06 20:52:29 +01:00
Florian Hahn	724a63ba8b	[LV] Use more accurate getSCEV/MemChecks in GeneratedRTCheck::hasChecks. Update hasChecks to use getSCEV/MemRuntimeChecks(), which automatically handles checking for known-false checks. This improves a few cases where we previously did not add metadata to disable runtime unrolling, due to runtime checks, even though no runtime checks are needed.	2025-09-06 19:21:11 +01:00
Florian Hahn	cd8c3e5053	[LV] Add test showing missing metadata to disable runtime unrolling.	2025-09-06 16:42:02 +01:00
Florian Hahn	e0f00bd645	[LV] Don't consider second op as invariant in getDivRemSpeculationCost. The second operand when using a safe divisor will always be a select in the loop, so won't be invariant; don't treat it as such. This fixes a divergence with legacy and VPlan based cost model. Fixes https://github.com/llvm/llvm-project/issues/156066.	2025-09-06 14:06:04 +01:00
Florian Hahn	b9b0ea5f62	[LV] Pass DT to isGuaranteedNotToBePoison in canVectorizeWithIfCvt. Pass DT to slightly improve analysis results. Note that the context instruction is already passed.	2025-09-05 20:42:56 +01:00
Florian Hahn	f8972c8280	[SCEVExp] Fix early exit in ComputeEndCheck. (#156910 ) ComputeEndCheck incorrectly returned false for unsigned predicates starting at zero and a positive step. The AddRec could still wrap if Step * trunc ExitCount wraps or trunc ExitCount strips leading 1s. Fixes https://github.com/llvm/llvm-project/issues/156849. PR: https://github.com/llvm/llvm-project/pull/156910	2025-09-05 15:13:11 +00:00
Phoebe Wang	94b164c218	[X86][AVX10] Remove EVEX512 and AVX10-256 implementations (#157034 ) The 256-bit maximum vector register size control was removed from AVX10 whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343 We have warned these options in LLVM21 through #132542. This patch removes underlying implementations in LLVM22.	2025-09-05 14:08:59 +00:00
Florian Hahn	74ec38fad0	[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730 ) Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9 PR: https://github.com/llvm/llvm-project/pull/156730	2025-09-05 08:45:13 +00:00
Luke Lau	3f9e0736ac	[VPlan] Move findCommonEdgeMask optimization to simplifyBlends (#156304 ) Following up from #150368, this moves folding common edge masks into simplifyBlends. One test in uniform-blend.ll ended up regressing but after looking at it closely, it came from a weird (x && !x) edge mask. So I've just included a simplification in this PR to fold that to false.	2025-09-05 01:29:22 +00:00
Luke Lau	4e5e65e55d	[VPlan] Only compute reg pressure if considered. NFCI (#156923 ) In #149056 VF pruning was changed so that it only pruned VFs that stemmed from MaxBandwidth being enabled. However we always compute register pressure regardless of whether or not max bandwidth is permitted for any VFs (via `MaxPermissibleVFWithoutMaxBW`). This skips the computation if not needed and renames the method for clarity. The diff in reg-usage.ll is due to the scalable VPlan not actually having any maxbandwidth VFs, so I've changed it to check the fixed-length VF instead, which is affected by maxbandwidth.	2025-09-05 00:23:47 +00:00
Shih-Po Hung	9876b06bc7	[LV] Add initial legality checks for loops with unbound loads. (#152422 ) This patch splits out the legality checks from PR #151300, following the landing of PR #128593. It is a step toward supporting vectorization of early-exit loops that contain potentially faulting loads. In this commit, an early-exit loop is considered legal for vectorization if it satisfies the following criteria: 1. it is a read-only loop. 2. all potentially faulting loads are unit-stride, which is the only type currently supported by vp.load.ff.	2025-09-05 08:20:16 +08:00
Florian Hahn	8796dfdcba	[VPlan] Consolidate logic to update loop metadata and profile info. This patch consolidates updating loop metadata and profile info for both the remainder and vector loops in a single place. This is NFC, modulo consistently applying vectorization specific metadata also in the experimental VPlan-native path. Split off from https://github.com/llvm/llvm-project/pull/154510.	2025-09-04 21:50:40 +01:00
Hassnaa Hamdi	35b22764e2	[LV][AArch64] Prefer epilogue with fixed-width over scalable VF. (#155546 ) In case of equal costs, prefer an epilogue with a fixed-width VF over a scalable VF. This helps in cases like post-LTO vectorization, where an epilogue with a fixed-width VF can be removed once we eventually know that the trip count is less than the epilogue iterations.	2025-09-04 19:31:30 +01:00
Florian Hahn	ec581e460a	[LV] Don't run instcombine for interleaved-accesses test. Drop instcombine from the run-line to make test independent and make it easier to follow the generated code for SCEV predicate checks.	2025-09-04 16:08:52 +01:00
Florian Hahn	a614807130	[LV] Add more tests for interleave groups requiring predicates. Adds tests for https://github.com/llvm/llvm-project/issues/156849. Also tidies up the existing related test a bit.	2025-09-04 15:45:15 +01:00
Florian Hahn	b400fd1151	[LAA] Support assumptions with non-constant deref sizes. (#156758 ) Update evaluatePtrAddrecAtMaxBTCWillNotWrap to support non-constant sizes in dereferenceable assumptions. Apply loop-guards in a few places needed to reason about expressions involving trip counts of the form (BTC - 1). PR: https://github.com/llvm/llvm-project/pull/156758	2025-09-04 11:32:33 +01:00
Ramkumar Ramachandra	c14052e20b	[VPlan] Let Not preserve uniformity in isSingleScalar (#156676 ) LogicalAnd and WidePtrAdd should also preserve uniformity, but we don't have test coverage to enable adding them.	2025-09-04 11:27:14 +01:00
Ramkumar Ramachandra	e4c0b3e111	[VPlan] Simplify x && false -> false, x | 0 -> x (#156345 ) The OR x, 0 -> x simplification has been introduced to avoid regressions.	2025-09-04 10:29:59 +01:00
Florian Hahn	f1e91bff42	[LV] Regenerate more checks for missing branch weights.	2025-09-03 22:18:04 +01:00
Florian Hahn	ce5a1158b8	[LV] Regenerate checks for missing branch weights.	2025-09-03 21:37:52 +01:00
Florian Hahn	2729284db1	[LV] Add early-exit tests with deref assumptions and scaled sizes. Add tests where the size of dereferenceable assumption is multiplied by a constant.	2025-09-03 20:30:46 +01:00
Florian Hahn	a434a7a4f1	Reapply "[LAA,Loads] Use loop guards and max BTC if needed when checking deref. (#155672 )" This reverts commit f0df1e3dd4ec064821f673ced7d83e5a2cf6afa1. Recommit with extra check for SCEVCouldNotCompute. Test has been added in b16930204b. Original message: Remove the fall-back to constant max BTC if the backedge-taken-count cannot be computed. The constant max backedge-taken count is computed considering loop guards, so to avoid regressions we need to apply loop guards as needed. Also remove the special handling for Mul in willNotOverflow, as this should no longer be needed after 914374624f (https://github.com/llvm/llvm-project/pull/155300). PR: https://github.com/llvm/llvm-project/pull/155672	2025-09-03 12:45:28 +01:00
Mel Chen	2f5500e4cf	[LV] Improve the test coverage for strided access. nfc (#155981 ) Add tests for strided access with UF > 1, and introduce a new test case @constant_stride_reinterpret.	2025-09-03 10:19:36 +00:00

3431 Commits