llvm-project

Author	SHA1	Message	Date
Florian Hahn	10a6fd70d6	[LV] Regenerate checks for test (NFC). Auto-generate check lines for scalable-loop-unpredicated-body-scalar-tail.ll, while also updating the input to be more compact and avoid unnecessary checks to keep auto-generated checks compact without loss of generality.	2025-08-13 12:20:50 +01:00
Mel Chen	b9138bde35	[LV][EVL] More lit tests for interleaved access. nfc (#152959 ) Add test cases for reverse interleaved access and interleaved access with gap.	2025-08-13 15:43:39 +08:00
Florian Hahn	8cdab07aaa	Reapply "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9. Recommit with a fix for the verifier error caused for EVL recipes. Extra test coverage added in 6f939da60e.	2025-08-12 22:09:30 +01:00
Florian Hahn	6f939da60e	[LV] Add additional test for backedge elimination with EVL.	2025-08-12 21:58:19 +01:00
Florian Hahn	424258947e	[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879 ) Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: https://github.com/llvm/llvm-project/pull/152879	2025-08-12 14:13:13 +01:00
David Sherwood	8140779a9a	[LV] Improve accuracy of branch weights in epilogue iteration check block (#152980 ) When one of the vector loops (main or epilogue) is scalable and the other isn't, we can use the estimated value of vscale to improve the accuracy.	2025-08-12 10:37:47 +01:00
Sam Tebbs	0bfa1718af	[LV] Create in-loop sub reductions (#147026 ) This PR allows the loop vectorizer to handle in-loop sub reductions by forming a normal in-loop add reduction with a negated input. Stacked PRs: 1. -> https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-08-12 10:22:41 +01:00
Florian Hahn	3cad3de6ea	[LV] Add more tests for handling IR metadata for interleave groups. Includes a test case for https://github.com/llvm/llvm-project/issues/153006	2025-08-11 22:09:07 +01:00
Florian Hahn	1c7c8e3ad3	Revert "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2. This seems to be breaking some RISCV bots, reverting for now https://lab.llvm.org/buildbot/#/builders/210/builds/1266	2025-08-11 22:05:30 +01:00
Florian Hahn	1f17bb133f	[VPlan] Remove trivial dead VPPhi cycles. Update removeDeadRecipes to remove trivial dead VPPhi cycles. Should effectively be NFC end-to-end.	2025-08-11 21:29:49 +01:00
Ramkumar Ramachandra	95c525b1db	[VPlan] Preserve nusw on VectorEndPointer (#151558 ) In createInterleaveGroups, get the nusw in addition to inbounds from the existing GEP, and set them on the VPVectorEndPointerRecipe.	2025-08-11 10:38:25 +01:00
David Sherwood	9181a7e294	[LV] Fix branch weights in epilogue min iteration check block (#152534 ) I've changed how we construct the EpilogueVectorizerEpilogueLoop and EpilogueVectorizerMainLoop classes so that we construct the parent class with an additional boolean parameter indicating whether we're vectorising the main or epilogue loop. The InnerLoopAndEpilogueVectorizer class uses this new argument in combination with the EpilogueLoopVectorizationInfo struct to set the right UF and VF values. This then allows EpilogueVectorizerEpilogueLoop to access the correct values of VF and UF for the main loop, which are required when setting branch weights in the minimum iteration check block.	2025-08-11 09:52:54 +01:00
Elvis Wang	37fe7a9933	[LV] Generate scalar xor for VPInstruction::Not if possible. (#152628 ) `VPInstruction::Not`, which generates an xor instruction, is widely used for the exit condition. This patch makes `VPInstruction::Not` generate a scalar `xor` if possible. This helps remove the (splat true) operand from the `xor` and makes the `xor` scalar.	2025-08-11 16:35:21 +08:00
Florian Hahn	86813aa786	[VPlan] Add dedicated user for resume phi with epilogue vectorization. Epilogue vectorization currently relies on the resume phi for the canonical induction being always available, which is why VPPhi are considered to have side-effects, to prevent their removal. This patch adds a new ResumeForEpilogue opcode to mark the resume phi as used for epilogue vectorization. This allows treating VPPhis in general as not having side-effects, enabling removal of unused VPPhis.	2025-08-10 21:21:16 +01:00
Florian Hahn	d9199a85e1	[LV] Add missing check lines for tests. Add stray missing check lines for 2 tests.	2025-08-09 21:33:36 +01:00
Luke Lau	723de7f231	[LV][RISCV] Try fixing Windows buildbot failure in force-vect-msg.ll. NFC The clang-x64-windows-msvc buildbot is failing after 707447159341f7b5678dee4f47731af50524b9ae due to this test failing: https://lab.llvm.org/buildbot/#/builders/63/builds/8528 This is a stab in the dark, but my first thought is that it may be due to the handling of floats with MSVC or something. So this removes the floating point part of the check. I don't have access to a Windows machine handy to debug this just yet, so pushing this to see if it can quickly return the buildbot to green.	2025-08-09 21:09:36 +08:00
Florian Hahn	82d633e9ff	[VPlan] Materialize vector trip count using VPInstructions. (#151925 ) Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation. PR: https://github.com/llvm/llvm-project/pull/151925	2025-08-08 11:44:32 +01:00
Graham Hunter	de72cca671	[CostModel] Provide a default model for histogram intrinsics (#149348 ) Since we scalarize these intrinsics when the target does not support them, we should model that for costing purposes.	2025-08-08 11:00:00 +01:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Luke Lau	7074471593	[RISCV] Enable tail folding by default (#151681 ) We have been tracking the performance of EVL tail folding in the loop vectorizer on RISC-V for a while now, and after much hard work from various contributors we think it should be generally profitable to enable by default now. With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU 2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30% geomean codesize reduction on SPEC and TSVC, with no significant regressions detected. Now that we are early into the LLVM 22.x development cycle it seems like a good time to enable it to catch any issues. There are still more EVL related items of work being tracked in #123069, which should continue to improve performance.	2025-08-08 14:26:23 +08:00
Luke Lau	0720af8c24	[LV][RISCV] Precommit RUN line changes from #151681 . NFC In preparation for enabling EVL tail folding by default.	2025-08-08 12:40:27 +08:00
Ramkumar Ramachandra	edeee824f0	Reland [VectorUtils] Trivially vectorize ldexp, [l]lround (#152476 ) Changes: The original patch, landed as 1336675, was reverted due to a bug in LoopVectorize resulting in a crash. The bug has now been fixed by 95c32bf ([VPlan] Return invalid cost if any skeleton block has invalid costs), and this reland is identical to the original patch.	2025-08-07 12:07:29 +01:00
Florian Hahn	47944d071f	[LV] Auto-generate checks for sve-low-trip-count.ll. Auto-generate checks for https://github.com/llvm/llvm-project/pull/151925. Also update some naming to make more consistent with other tests.	2025-08-07 10:50:20 +01:00
Florian Hahn	95c32bf2d4	[VPlan] Return invalid cost if any skeleton block has invalid costs. (#151940 ) We need to reject plans that contain recipes with invalid costs. LICM can move recipes with invalid costs out of the loop region, which then get missed by the main cost computation. Extend the logic to check recipes for invalid cost currently only covering the middle block to include all skeleton blocks. Fixes https://github.com/llvm/llvm-project/issues/144358 Fixes https://github.com/llvm/llvm-project/issues/151664 PR: https://github.com/llvm/llvm-project/pull/151940	2025-08-07 10:45:27 +01:00
Ties Stuij	b9e133d5b6	[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A320 (#152156 ) With this new A320 in-order core, we follow adding the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal. So when we have: ``` void foo(float* a, float* b, float* dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` When compiling without the feature enabled, we get: ``` ... ld1b { z0.b }, p0/z, [x0, x10] ld1b { z2.b }, p0/z, [x1, x10] add x12, x0, x10 ldr z1, [x12, #1, mul vl] add x12, x1, x10 ldr z3, [x12, #1, mul vl] fadd z0.s, z2.s, z0.s add x12, x2, x10 fadd z1.s, z3.s, z1.s dech x11 st1b { z0.b }, p0, [x2, x10] incb x10, all, mul #2 str z1, [x12, #1, mul vl] ... ``` When compiling with, we get: ``` ... ldp q0, q1, [x12, #-16] ldp q2, q3, [x11, #-16] subs x13, x13, #8 fadd v0.4s, v2.4s, v0.4s fadd v1.4s, v3.4s, v1.4s add x11, x11, #32 add x12, x12, #32 stp q0, q1, [x10, #-16] add x10, x10, #32 ... ```	2025-08-07 09:48:09 +01:00
Luke Lau	a04142f11f	[LV][RISCV] Add check lines for scalable interleave costs. NFC Previously we could only scalably vectorize interleave groups with factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now support all factors (available on RISC-V). So this adds the remaining check lines for the scalable VFs.	2025-08-07 12:28:12 +08:00
Luke Lau	44af26ea2e	[LV] Fix EVL test after merge. NFC Test was modified in both 25d1285eecbab731eaf418c8aab44e4eb5f9e538 and df8da2ff8370fda479b5c118704af4f50e0d3536	2025-08-07 11:12:43 +08:00
Luke Lau	df8da2ff83	[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110 ) Now that VPWidenPointerInductionRecipes are modelled in VPlan in #148274, we can support them in EVL tail folding. We need to replace their VFxUF operand with EVL as the increment is not guaranteed to always be VF on the penultimate iteration, and UF is always 1 with EVL tail folding. We also need to move the creation of the backedge value to the latch so that EVL dominates it. With this we will no longer fail to convert a VPlan to EVL tail folding, so adjust tryAddExplicitVectorLength to account for this. This brings us to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail folding vs no tail folding. The test in only-compute-cost-for-vplan-vfs.ll previously relied on widened pointer inductions with EVL tail folding to end up in a scenario with no vector VPlans, so this also replaces it with an unvectorizable fixed-order recurrence test from first-order-recurrence-multiply-recurrences.ll that also gets discarded.	2025-08-07 10:54:24 +08:00
Anna Thomas	59231115b0	[Loads] Precommit tests for #149551 . NFC Add these tests that currently require predicated loads due to variable start SCEV.	2025-08-06 15:43:51 -04:00
Florian Hahn	25d1285eec	[VPlan] Replace single-entry VPPhis with their incoming values. Replace trivial, single-entry VPPhis with their incoming values.	2025-08-06 20:03:31 +01:00
Florian Hahn	e80e7e717e	[VPlan] Use scalar VPPhi instead of VPWidenPHIRecipe in createPlainCFG. (#150847 ) The initial VPlan closely reflects the original scalar loop, so using VPWidenPHIRecipe here is premature. Widened phi recipes should only be introduced together with other widened recipes. PR: https://github.com/llvm/llvm-project/pull/150847	2025-08-06 14:43:03 +01:00
Florian Hahn	d478502a42	[VPlan] Ensure that IV resume phi for epilogue is always first. (NFCI) Update handling of canonical IV resume phi for the epilogue loop to make sure the resume phi for the canonical IV is always the first phi in the scalar preheader. This makes it easier to retrieve it in preparePlanForEpilogueVectorLoop. For now, we keep an assert to make sure we use the same resume phi as before. This will be removed in the future.	2025-08-05 21:06:41 +01:00
Florian Hahn	e3ededa0f1	[LV] Add tests with canonical widen IV, reductions in different order. Add missing test coverage for re-using the resume value from the main vector loop for the canonical IV in the epilogue.	2025-08-05 19:19:13 +01:00
Ramkumar Ramachandra	f03345a07a	[LV] Improve a test; get rid of runtime checks (#152182 )	2025-08-05 18:48:10 +01:00
Ramkumar Ramachandra	5dfc2d4535	[LV] Regen some tests with UTC (#152128 )	2025-08-05 18:01:02 +01:00
Luke Lau	94a6cd464e	[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274 ) This is the VPWidenPointerInductionRecipe equivalent of #118638, with the motivation of allowing us to use the EVL as the induction step. There is a new VPInstruction added, WidePtrAdd, to allow adding the step vector to the induction phi, since VPInstruction::PtrAdd only handles scalars or multiple scalar lanes. Originally this transformation was copied from the original recipe's execute code, but it has since been simplified by teaching `unrollWidenInductionByUF` to unroll the recipe, which brings it in line with VPWidenIntOrFpInductionRecipe.	2025-08-05 16:54:02 +08:00
Florian Hahn	c9dd14d1d4	[VPlan] Compute interleave count for VPlan. (#149702 ) Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some information directly from VPlan. Register pressure was already computed for a VPlan, and with this patch we now also check for reductions directly on VPlan, as well as checking how many load and store operations remain in the loop. This should be mostly NFC, but we may compute slightly different interleave counts, except for some edge cases, e.g. where dead loads have been removed. This shouldn't happen in practice, and the patch doesn't cause changes across a large test corpus on AArch64. Computing the interleave count based on VPlan allows for making better decisions in presence of VPlan optimizations, for example when operations on interleave groups are narrowed. Note that there are a few test changes for tests that were still checking the legacy cost-model output when it was computed in selectInterleaveCount. PR: https://github.com/llvm/llvm-project/pull/149702	2025-08-05 09:42:55 +01:00
Mel Chen	76d98cfcc4	[RISCV][TTI] Enable masked interleave access (#151665 ) Now that support for masked loads/stores of interleave groups has landed, we can enable the loop vectorizer to generate masked interleave access where applicable. This improves vectorization in several ways: * Internal predication support: This enables interleave group vectorization for loops with internal control flow predication, provided all members of the group share the same predicate. Gaps in interleave groups are still not efficiently handled by masking, so masking for gaps remains disabled for now. * Tail folding: This allows tail folding of loops with interleave groups by using masking. Without this, vectorized loops with interleaves would fall back to using separate gather/scatter accesses, which can be significantly less efficient. "[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)" was reverted by 5294793bdcf6ca142f7a0df897638bd4e85ed1a7 due to triggering an assertion. The issue has been addressed in the patch "[LV] Fix gap mask requirement for interleaved access (#151105)". In addition, this patch also enables fixed-length masked interleave access (#150624), since support for fixed-length has also landed in 992118cb4deab139ae384bb85f03225a9a21b008. --------- Co-authored-by: Philip Reames <preames@rivosinc.com>	2025-08-05 16:08:13 +08:00
Mel Chen	8761b6cf8f	[VPlan] Use VPTypeAnalysis to get the step type of widen pointer induction (#147925 ) This patch uses VPTypeAnalysis to determine its type since the induction step is not always a live-in value in the VPlan and may be defined by a recipe.	2025-08-05 09:13:44 +08:00
Ramkumar Ramachandra	9d151897ba	[LV] Improve a test, regen with UTC (#151947 )	2025-08-04 18:41:47 +01:00
Florian Hahn	66a8341f6d	[VPlan] Skip disconnected exit blocks in hasEarlyExit. (#151718 ) Currently hasEarlyExit returns true, if there are multiple exit blocks. ExitBlocks contains the wrapped original IR exit blocks. Without checking the predecessors we incorrectly return true for loops with multiple countable exits, that have been vectorized by requiring a scalar epilogue. In that case, the exit blocks will get disconnected. Fix this by filtering out disconnected exit blocks. Currently this should only impact the 'early-exit vectorized' statistic. PR: https://github.com/llvm/llvm-project/pull/151718	2025-08-04 11:31:00 +01:00
Ramkumar Ramachandra	8a55d46ebc	[LV] Add missing CHECK lines for a test from UTC (#151871 )	2025-08-04 09:42:59 +01:00
Florian Hahn	559d1dff89	[VPlan] Materialize BackedgeTakenCount using VPInstructions. Explicitly compute the backedge-taken count using VPInstruction. This is needed to model the full skeleton in VPlan. NFC modulo some instruction re-ordering.	2025-08-03 12:21:28 +01:00
Florian Hahn	39c30665e9	[VPlan] Update type of cloned instruction in scalarizeInstruction. The operands of the replicate recipe may have been narrowed, resulting in a narrower result type. Update the type of the cloned instruction to the correct type. Fixes https://github.com/llvm/llvm-project/issues/151392.	2025-08-02 19:49:59 +01:00
Florian Hahn	08f50e9665	[VPlan] Use vector tripcount if computable when simplifying conds. (#151034 ) Update isConditionTrueViaVFAndUF to use the vector trip count if computable. This is the case when it has been materialized to a constant. Otherwise fall back to the trip count. PR: https://github.com/llvm/llvm-project/pull/151034	2025-08-02 16:31:31 +01:00
Florian Hahn	eee9755881	[LV] Refine check to find epilogue IV resume value. Make sure to check that the vector trip count is contained in the list of incoming values to serve as tie-breaker with phis with all-zero incoming values. Fixes https://github.com/llvm/llvm-project/issues/151686.	2025-08-01 20:54:39 +01:00
Ramkumar Ramachandra	78c57c9b02	[LV] Fix missing REQUIRES: asserts in test (#151737 ) e7200c7 ([LV] Pre-commit test for #151664) forgot to require asserts in the test, and stripped a CHECK line in error. Fix this.	2025-08-01 18:59:20 +01:00
Ramkumar Ramachandra	e7200c734d	[LV] Pre-commit test for #151664 (#151671 ) Hoisted vector instructions are costed incorrectly.	2025-08-01 17:09:11 +01:00
Florian Hahn	d204fdcc23	[LV] Add countable multi-exit test to vec.stats.ll Currently, multi-exit loops whose exits are all countable are counted as vectorized with early exit, although that term is meant for loops where we performed early-exit vectorization rather than requiring a scalar epilogue.	2025-08-01 16:39:28 +01:00
Florian Hahn	2ae996cbbe	[LAA] Support assumptions in evaluatePtrAddRecAtMaxBTCWillNotWrap (#147047 ) This patch extends the logic added in https://github.com/llvm/llvm-project/pull/128061 to support dereferenceability information from assumptions as well. Unfortunately both assumption cache and the dominator tree need to be threaded through multiple layers to make them available where needed. PR: https://github.com/llvm/llvm-project/pull/147047	2025-08-01 14:18:07 +01:00


3307 Commits