llvm-project

Author	SHA1	Message	Date
Elvis Wang	d611a9ca15	[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. (#154482) `VPEVLBasedIVPHIRecipe` will lower to a VPInstruction scalar phi and generate a scalar phi. This recipe will only occupy a scalar register, just like other phi recipes. This patch fixes the register usage for `VPEVLBasedIVPHIRecipe` from vector to scalar, which is closer to the generated vector IR. https://godbolt.org/z/6Mzd6W6ha shows that there are no register spills when choosing `<vscale x 16>`. Note that this test is basically copied from AArch64.	2025-08-21 07:39:01 +08:00
Shih-Po Hung	cf0e86118d	[VPlan] Handle canonical VPWidenIntOrFpInduction in branch-condition simplification (#153539 ) SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a few PHI recipes in the loop header. With more IV-step optimizations, the canonical widen-canonical-iv can be replaced by a canonical VPWidenIntOrFpInduction, which the pass did not handle, causing regressions (missed simplifications). This patch replaces canonical VPWidenIntOrFpInduction with a StepVector in the vector preheader since the vector loop region only executes once.	2025-08-21 07:34:54 +08:00
Florian Hahn	7d33743324	[LV] Add tests for narrowing interleave groups with scalable vectors.	2025-08-20 22:31:24 +01:00
Florian Hahn	b0d0e04693	[LV] Add test where we choose VF * IC is larger than trip count.	2025-08-20 20:40:49 +01:00
Florian Hahn	dc23869f98	[LV] Handle vector trip count being zero in preparePlanForEpiVectorLoop. After a485e0e, we may not set the vector trip count in preparePlanForEpilogueVectorLoop if it is zero. We should not choose a VF * UF that makes the main vector loop dead (i.e. vector trip count is zero), but there are some cases where this can happen currently. In those cases, set EPI.VectorTripCount to zero.	2025-08-20 11:54:22 +01:00
Florian Hahn	23ea79de61	[LV] Add more tests for costs of predicated udivs and calls. Adds missing test coverage for the cost model. Also reduce the size of check lines a bit, by using a common prefix and filtering out after scalar.ph.	2025-08-19 20:04:31 +01:00
David Sherwood	13d8ba7dea	[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086) There are a couple of places in the loop vectoriser where we want to calculate the cost of extracting the last lane in a vector. However, we wrongly assume that asking for the cost of extracting lane (VF.getKnownMinValue() - 1) is an accurate representation of the cost of extracting the last lane. For SVE at least, this is non-trivial as it requires the use of whilelo and lastb instructions. To solve this problem I have added a new getReverseVectorInstrCost interface where the index is used in reverse from the end of the vector. For a vector with a given ElementCount EC, the extracted/inserted lane would be EC - 1 - Index. For scalable vectors this index is unknown at compile time. I've added an AArch64 hook that better represents the cost, and also a RISCV hook that maintains compatibility with the behaviour prior to this PR. I've also taken the liberty of adding support in vplan for calculating the cost of VPInstruction::ExtractLastElement.	2025-08-19 09:31:37 +01:00
Luke Lau	144736b07e	[VPlan] Don't fold live ins with both scalar and vector operands (#154067) If we end up with an extract_element VPInstruction where both operands are live-ins, we will try to fold the live-ins even though the first operand is a vector whilst the live-in is scalar. This fixes it by just returning the vector live-in instead of calling the folder, and removes the handling for insertelement where we aren't able to do the fold. From some quick testing we previously never hit this fold anyway, and were probably just missing test coverage. Fixes #154045	2025-08-19 04:10:53 +00:00
Tobias Stadler	8135b7c1ab	[LV] Emit all remarks for unvectorizable instructions (#153833 ) If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first. This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.	2025-08-18 18:04:53 +01:00
Ramkumar Ramachandra	97f554249c	[VPlan] Preserve nusw in createInBoundsPtrAdd (#151549 ) Rename createInBoundsPtrAdd to createNoWrapPtrAdd, and preserve nusw as well as inbounds at the callsite.	2025-08-18 17:48:42 +01:00
Florian Hahn	73775a0f27	[LV] Add test for #153946 . Add test for miscompile from https://github.com/llvm/llvm-project/issues/153946, caused by poison propagation.	2025-08-16 21:19:20 +01:00
Florian Hahn	351d398a37	[VPlan] Run final VPlan simplifications before codegen. Dissolving the hierarchical VPlan CFG and converting abstract to concrete recipes can expose additional simplification opportunities. Do a final run of simplifyRecipes before executing the VPlan.	2025-08-16 18:54:27 +01:00
Florian Hahn	2b1e06598f	[LV] Regenerate some more check lines. (NFC)	2025-08-15 15:53:19 +01:00
Florian Hahn	36be0bba2a	[SCEV] Check if predicate is known false for predicated AddRecs. (#151134 ) Similarly to https://github.com/llvm/llvm-project/pull/131538, we can also try and check if a predicate is known to wrap given the backedge taken count. For now, this just checks directly when we try to create predicated AddRecs. This both helps to avoid spending compile-time on optimizations where we know the predicate is false, and can also help to allow additional vectorization (e.g. by deciding to scalarize memory accesses when otherwise we would try to create a predicated AddRec with a predicate that's always false). The initial version is quite restricted, but can be extended in follow-ups to cover more cases. PR: https://github.com/llvm/llvm-project/pull/151134	2025-08-15 09:30:25 +01:00
David Green	5836bae463	[AArch64] Change the cost of fma and fmuladd to match fmul. (#152963 ) As fmul and fmadd are so similar, their performance characteristics tend to be the same on most platforms, at least in terms of reciprocal throughputs. Processors capable of performing a given number of fmul per cycle can usually perform the same number of fma, with the extra add being relatively simple on top. This patch makes the scores of the two operations the same, which brings the throughput cost of a fma/fmuladd to 2, and the latency to 3, which are the defaults for fmul. Note that we might also want to change the throughput cost of a fmul to 1, as most processors have ample bandwidth for them, but they should still stay in-line with one another.	2025-08-14 21:53:45 +01:00
Florian Hahn	8a0c7e9b32	[LV] Regenerate some more tests.	2025-08-14 21:21:03 +01:00
Florian Hahn	db98ac43ec	[LV] Use shl for ((VF * Step) * vscale) in createStepForVF. (#153495) Directly emit shl instead of a multiply if VF * Step is a power-of-2. The main motivation here is to prepare the code and test for directly generating and expanding a SCEV expression of the minimum iteration count. SCEVExpander will directly emit shl for multiplies with powers-of-2. InstCombine also performs this combine, so end-to-end this should effectively be NFC. PR: https://github.com/llvm/llvm-project/pull/153495	2025-08-14 19:27:51 +01:00
Florian Hahn	10a6fd70d6	[LV] Regenerate checks for test (NFC). Auto-generate check lines for scalable-loop-unpredicated-body-scalar-tail.ll, while also updating the input to be more compact and avoid unnecessary checks to keep auto-generated checks compact without loss of generality.	2025-08-13 12:20:50 +01:00
Mel Chen	b9138bde35	[LV][EVL] More lit tests for interleaved access. nfc (#152959 ) Add test cases for reverse interleaved access and interleaved access with gap.	2025-08-13 15:43:39 +08:00
Florian Hahn	8cdab07aaa	Reapply "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9. Recommit with a fix for the verifier error caused for EVL recipes. Extra test coverage added in 6f939da60e.	2025-08-12 22:09:30 +01:00
Florian Hahn	6f939da60e	[LV] Add additional test for backedge elimination with EVL.	2025-08-12 21:58:19 +01:00
Florian Hahn	424258947e	[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879 ) Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: https://github.com/llvm/llvm-project/pull/152879	2025-08-12 14:13:13 +01:00
David Sherwood	8140779a9a	[LV] Improve accuracy of branch weights in epilogue iteration check block (#152980 ) When one of the vector loops (main or epilogue) is scalable and the other isn't, we can use the estimated value of vscale to improve the accuracy.	2025-08-12 10:37:47 +01:00
Sam Tebbs	0bfa1718af	[LV] Create in-loop sub reductions (#147026 ) This PR allows the loop vectorizer to handle in-loop sub reductions by forming a normal in-loop add reduction with a negated input. Stacked PRs: 1. -> https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-08-12 10:22:41 +01:00
Florian Hahn	3cad3de6ea	[LV] Add more tests for handling IR metadata for interleave groups. Includes a test case for https://github.com/llvm/llvm-project/issues/153006	2025-08-11 22:09:07 +01:00
Florian Hahn	1c7c8e3ad3	Revert "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2. This seems to be breaking some RISCV bots, reverting for now https://lab.llvm.org/buildbot/#/builders/210/builds/1266	2025-08-11 22:05:30 +01:00
Florian Hahn	1f17bb133f	[VPlan] Remove trivial dead VPPhi cycles. Update removeDeadRecipes to remove trivial dead VPPhi cycles. Should effectively be NFC end-to-end.	2025-08-11 21:29:49 +01:00
Ramkumar Ramachandra	95c525b1db	[VPlan] Preserve nusw on VectorEndPointer (#151558 ) In createInterleaveGroups, get the nusw in addition to inbounds from the existing GEP, and set them on the VPVectorEndPointerRecipe.	2025-08-11 10:38:25 +01:00
David Sherwood	9181a7e294	[LV] Fix branch weights in epilogue min iteration check block (#152534 ) I've changed how we construct the EpilogueVectorizerEpilogueLoop and EpilogueVectorizerMainLoop classes so that we construct the parent class with an additional boolean parameter indicating whether we're vectorising the main or epilogue loop. The InnerLoopAndEpilogueVectorizer class uses this new argument in combination with the EpilogueLoopVectorizationInfo struct to set the right UF and VF values. This then allows EpilogueVectorizerEpilogueLoop to access the correct values of VF and UF for the main loop, which are required when setting branch weights in the minimum iteration check block.	2025-08-11 09:52:54 +01:00
Elvis Wang	37fe7a9933	[LV] Generate scalar xor for VPInstruction::Not if possible. (#152628) `VPInstruction::Not`, which generates an xor instruction, is widely used for the exit condition. This patch makes `VPInstruction::Not` generate a scalar `xor` if possible. This can help remove the (splat true) in the `xor` and make the `xor` scalar.	2025-08-11 16:35:21 +08:00
Florian Hahn	86813aa786	[VPlan] Add dedicated user for resume phi with epilogue vectorization. Epilogue vectorization currently relies on the resume phi for the canonical induction being always available, which is why VPPhi are considered to have side-effects, to prevent their removal. This patch adds a new ResumeForEpilogue opcode to mark the resume phi as used for epilogue vectorization. This allows treating VPPhis in general as not having side-effects, enabling removal of unused VPPhis.	2025-08-10 21:21:16 +01:00
Florian Hahn	d9199a85e1	[LV] Add missing check lines for tests. Add stray missing check lines for 2 tests.	2025-08-09 21:33:36 +01:00
Luke Lau	723de7f231	[LV][RISCV] Try fixing Windows buildbot failure in force-vect-msg.ll. NFC The clang-x64-windows-msvc buildbot is failing after 707447159341f7b5678dee4f47731af50524b9ae due to this test failing: https://lab.llvm.org/buildbot/#/builders/63/builds/8528 This is a stab in the dark, but my first thought is that it may be due to the handling of floats with MSVC or something. So this removes the floating point part of the check. I don't have access to a Windows machine handy to debug this just yet, so pushing this to see if it can quickly return the buildbot to green.	2025-08-09 21:09:36 +08:00
Florian Hahn	82d633e9ff	[VPlan] Materialize vector trip count using VPInstructions. (#151925 ) Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation. PR: https://github.com/llvm/llvm-project/pull/151925	2025-08-08 11:44:32 +01:00
Graham Hunter	de72cca671	[CostModel] Provide a default model for histogram intrinsics (#149348 ) Since we scalarize these intrinsics when the target does not support them, we should model that for costing purposes.	2025-08-08 11:00:00 +01:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Luke Lau	7074471593	[RISCV] Enable tail folding by default (#151681 ) We have been tracking the performance of EVL tail folding in the loop vectorizer on RISC-V for a while now, and after much hard work from various contributors we think it should be generally profitable to enable by default now. With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU 2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30% geomean codesize reduction on SPEC and TSVC, with no significant regressions detected. Now that we are early into the LLVM 22.x development cycle it seems like a good time to enable it to catch any issues. There are still more EVL related items of work being tracked in #123069, which should continue to improve performance.	2025-08-08 14:26:23 +08:00
Luke Lau	0720af8c24	[LV][RISCV] Precommit RUN line changes from #151681 . NFC In preparation for enabling EVL tail folding by default.	2025-08-08 12:40:27 +08:00
Ramkumar Ramachandra	edeee824f0	Reland [VectorUtils] Trivially vectorize ldexp, [l]lround (#152476 ) Changes: The original patch, landed as 1336675, was reverted due to a bug in LoopVectorize resulting in a crash. The bug has now been fixed by 95c32bf ([VPlan] Return invalid cost if any skeleton block has invalid costs), and this reland is identical to the original patch.	2025-08-07 12:07:29 +01:00
Florian Hahn	47944d071f	[LV] Auto-generate checks for sve-low-trip-count.ll. Auto-generate checks for https://github.com/llvm/llvm-project/pull/151925. Also update some naming to make more consistent with other tests.	2025-08-07 10:50:20 +01:00
Florian Hahn	95c32bf2d4	[VPlan] Return invalid cost if any skeleton block has invalid costs. (#151940 ) We need to reject plans that contain recipes with invalid costs. LICM can move recipes with invalid costs out of the loop region, which then get missed by the main cost computation. Extend the logic to check recipes for invalid cost currently only covering the middle block to include all skeleton blocks. Fixes https://github.com/llvm/llvm-project/issues/144358 Fixes https://github.com/llvm/llvm-project/issues/151664 PR: https://github.com/llvm/llvm-project/pull/151940	2025-08-07 10:45:27 +01:00
Ties Stuij	b9e133d5b6	[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A320 (#152156 ) With this new A320 in-order core, we follow adding the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal. So when we have: ``` void foo(float* a, float* b, float* dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` When compiling without the feature enabled, we get: ``` ... ld1b { z0.b }, p0/z, [x0, x10] ld1b { z2.b }, p0/z, [x1, x10] add x12, x0, x10 ldr z1, [x12, #1, mul vl] add x12, x1, x10 ldr z3, [x12, #1, mul vl] fadd z0.s, z2.s, z0.s add x12, x2, x10 fadd z1.s, z3.s, z1.s dech x11 st1b { z0.b }, p0, [x2, x10] incb x10, all, mul #2 str z1, [x12, #1, mul vl] ... ``` When compiling with, we get: ``` ... ldp q0, q1, [x12, #-16] ldp q2, q3, [x11, #-16] subs x13, x13, #8 fadd v0.4s, v2.4s, v0.4s fadd v1.4s, v3.4s, v1.4s add x11, x11, #32 add x12, x12, #32 stp q0, q1, [x10, #-16] add x10, x10, #32 ... ```	2025-08-07 09:48:09 +01:00
Luke Lau	a04142f11f	[LV][RISCV] Add check lines for scalable interleave costs. NFC Previously we could only scalably vectorize interleave groups with factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now support all factors (available on RISC-V). So this adds the remaining check lines for the scalable VFs.	2025-08-07 12:28:12 +08:00
Luke Lau	44af26ea2e	[LV] Fix EVL test after merge. NFC Test was modified in both 25d1285eecbab731eaf418c8aab44e4eb5f9e538 and df8da2ff8370fda479b5c118704af4f50e0d3536	2025-08-07 11:12:43 +08:00
Luke Lau	df8da2ff83	[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110 ) Now that VPWidenPointerInductionRecipes are modelled in VPlan in #148274, we can support them in EVL tail folding. We need to replace their VFxUF operand with EVL as the increment is not guaranteed to always be VF on the penultimate iteration, and UF is always 1 with EVL tail folding. We also need to move the creation of the backedge value to the latch so that EVL dominates it. With this we will no longer fail to convert a VPlan to EVL tail folding, so adjust tryAddExplicitVectorLength to account for this. This brings us to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail folding vs no tail folding. The test in only-compute-cost-for-vplan-vfs.ll previously relied on widened pointer inductions with EVL tail folding to end up in a scenario with no vector VPlans, so this also replaces it with an unvectorizable fixed-order recurrence test from first-order-recurrence-multiply-recurrences.ll that also gets discarded.	2025-08-07 10:54:24 +08:00
Anna Thomas	59231115b0	[Loads] Precommit tests for #149551 . NFC Add these tests that currently require predicated loads due to variable start SCEV.	2025-08-06 15:43:51 -04:00
Florian Hahn	25d1285eec	[VPlan] Replace single-entry VPPhis with their incoming values. Replace trivial, single-entry VPPhis with their incoming values,	2025-08-06 20:03:31 +01:00
Florian Hahn	e80e7e717e	[VPlan] Use scalar VPPhi instead of VPWidenPHIRecipe in createPlainCFG. (#150847) The initial VPlan closely reflects the original scalar loop, so using VPWidenPHIRecipe here is premature. Widened phi recipes should only be introduced together with other widened recipes. PR: https://github.com/llvm/llvm-project/pull/150847	2025-08-06 14:43:03 +01:00
Florian Hahn	d478502a42	[VPlan] Ensure that IV resume phi for epilogue is always first. (NFCI) Update handling of canonical IV resume phi for the epilogue loop to make sure the resume phi for the canonical IV is always the first phi in the scalar preheader. This makes it easier to retrieve it in preparePlanForEpilogueVectorLoop. For now, we keep an assert to make sure we use the same resume phi as before. This will be removed in the future.	2025-08-05 21:06:41 +01:00
Florian Hahn	e3ededa0f1	[LV] Add tests with canonical widen IV, reductions in different order. Add missing test coverage for re-using the resume value from the main vector loop for the canonical IV in the epilogue.	2025-08-05 19:19:13 +01:00
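
Commit 0bfa1718af above describes handling an in-loop sub reduction by forming a normal in-loop add reduction with a negated input. The following is only a scalar C++ sketch of that arithmetic identity, not the vectorizer's implementation: the function names and the chunk width `vf` are hypothetical and chosen for illustration, and the inner loop merely stands in for a per-iteration vector add reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference: a "sub reduction" that subtracts every element from acc.
int64_t subReduceScalar(int64_t start, const std::vector<int64_t> &xs) {
  int64_t acc = start;
  for (int64_t x : xs)
    acc -= x;
  return acc;
}

// The same result computed as an add reduction over negated inputs.
// Elements are processed in chunks of `vf` lanes (hypothetical parameter);
// each chunk is summed on its own and then added into the accumulator.
int64_t subReduceAsNegatedAdd(int64_t start, const std::vector<int64_t> &xs,
                              std::size_t vf) {
  assert(vf > 0 && "chunk width must be positive");
  int64_t acc = start;
  for (std::size_t i = 0; i < xs.size(); i += vf) {
    std::size_t end = std::min(i + vf, xs.size());
    int64_t partial = 0;
    for (std::size_t lane = i; lane < end; ++lane)
      partial += -xs[lane]; // negate the input, then add
    acc += partial;         // ordinary add reduction into the accumulator
  }
  return acc;
}
```

Both functions should agree for any input whose intermediate sums do not overflow, since `acc - x` equals `acc + (-x)` in that range.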


3324 Commits