llvm-project

Author	SHA1	Message	Date
Florian Hahn	4c399b27c3	[LV] Add select cost test with negated condition. (NFC) Add additional test coverage for select with negated condition. Currently we overestimate the cost, because the negation can be folded in the compare.	2025-12-18 22:07:06 +00:00
Mel Chen	f196b1d66f	[VPlan] Extract reverse operation for reverse accesses (#146525 ) This patch introduces VPInstruction::Reverse and extracts the reverse operations of loaded/stored values from reverse memory accesses. This extraction facilitates future support for permutation elimination within VPlan.	2025-12-18 14:57:48 +00:00
Mel Chen	e655317cf1	[LV][EVL] Add test case for checking debug info when tail folding by EVL. nfc (#172429 )	2025-12-18 08:59:37 +00:00
Florian Hahn	bab0dc4d48	Reapply "[LV] Mark checks as never succeeding for high cost cutoff." Reapply 8a115b6934a90441 with an update to tests handling remarks. The patch now directly emits a clear remark when we bail out due to the memory check threshold. Original message: When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-17 20:21:49 +00:00
Mingjie Xu	159f1c048e	[IR] Optimize PHINode::removeIncomingValue() by swapping removed incoming value with the last incoming value. (#171963 ) Current implementation uses `std::copy` to shift all incoming values after the removed index. This patch optimizes `PHINode::removeIncomingValue()` by replacing the linear shift of incoming values with a swap-with-last strategy. After this change, the relative order of incoming values after removal is not preserved. This improves compile-time for PHI nodes with many predecessors. Depends: https://github.com/llvm/llvm-project/pull/171955 https://github.com/llvm/llvm-project/pull/171956 https://github.com/llvm/llvm-project/pull/171960 https://github.com/llvm/llvm-project/pull/171962	2025-12-17 19:44:01 +08:00
Florian Hahn	eb0c7e752f	[VPlan] Replace BranchOnCount with Compare + BranchOnCond (NFC). (#172181 ) Expand BranchOnCount to BranchOnCond + ICmp in convertToConcreteRecipes to simplify codegen. PR: https://github.com/llvm/llvm-project/pull/172181	2025-12-16 19:19:31 +00:00
Luke Lau	67d0e21a62	Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" (#172261 ) This reapplies #171846 with a test case and fix for a legacy cost-model mismatch assertion. In the previous version of the patch, we only considered the plan to contain simplifications when it had a VPBlendRecipe and VF.isScalar() was true. However for some VPlans we may have a blend with only the first lane used: BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c> CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi> vp<%5> = vector-pointer ir<%gep> And in the legacy cost model we cost a blend as a phi if it's uniform: // If we know that this instruction will remain uniform, check the cost of // the scalar version. if (isUniformAfterVectorization(I, VF)) VF = ElementCount::getFixed(1); So this replaces the VF.isScalar() check with vputils::onlyFirstLaneUsed, which matches how the VPlan cost model mirrored the legacy model beforehand. A VPInstruction::Select will also emit a scalar select for a vector VF if only the first lane is used, so this also updates VPBlendRecipe::computeCost to reflect that too.	2025-12-16 06:30:54 +00:00
Elvis Wang	1eba2cbe72	[LV] Convert uniform-address unmasked scatters to scalar store. (#166114 ) This patch optimizes vector scatters that have a uniform (single-scalar) address by replacing them with "extract-last-lane + scalar store" when the scatter is unmasked. Notes: - The legacy cost model can scalarize a store if both the address and the value are uniform. In VPlan we materialize the stored value via ExtractLastLane, so only the address must be uniform. - Some of the loops won't be vectorized any more since no vector instructions will be generated.	2025-12-16 12:24:22 +08:00
Ramkumar Ramachandra	0636225b93	[VPlan] Directly unroll VectorPointerRecipe (#168886 ) In an effort to get rid of VPUnrollPartAccessor and directly unroll recipes, start by directly unrolling VectorPointerRecipe, allowing for VPlan-based simplifications and simplification of the corresponding execute.	2025-12-15 10:54:06 +00:00
Florian Hahn	53cf22f3a1	[VPlan] Simplify live-ins early using SCEV. (#155304 ) Use SCEV to simplify all live-ins during VPlan0 construction. This enables us to remove special SCEV queries when constructing VPWidenRecipes and improves results in some cases. This leads to simplifications in a number of cases in real-world applications (~250 files changed across LLVM, SPEC, ffmpeg) PR: https://github.com/llvm/llvm-project/pull/155304	2025-12-14 20:15:05 +00:00
Florian Hahn	a99a982440	[LV] Add test coverage for remark for unprofitable RT checks. Add test coverage for remark when runtime checks are not profitable with threshold provided. Also make sure that X86 remark tests actually passes an X86 triple, which is needed for the threshold remark. Also clean up the tests a bit.	2025-12-13 22:44:09 +00:00
Luke Lau	4ea8157773	Revert "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" This reverts commit fd5f53aa9b21060063484fc6c346316a34a6464c. It's triggering legacy cost model assertions reported in https://github.com/llvm/llvm-project/pull/171846#issuecomment-3647640019	2025-12-13 20:05:34 +08:00
Florian Hahn	0171e881b5	[VPlan] Strip stray whitespace when printing VPWidenIntOrFpInduction. printFlags takes care of inserting the needed spaces, remove unneeded extra stray whitespace	2025-12-12 21:28:50 +00:00
Florian Hahn	4e05d702f0	[LV] Always include middle block cost in isOutsideLoopWorkProfitable. (#171102 ) Always include the cost of the middle block in isOutsideLoopWorkProfitable. This addresses the TODO from https://github.com/llvm/llvm-project/pull/168949 and removes the temporary restriction. isOutsideLoopWorkProfitable already scales the cost outside loops according to the expected trip counts. In practice this increases the minimum iteration threshold in a few cases. On a large IR corpus based on C/C++ workloads, ~50 out of 179450 vector loops have their thresholds increased slightly. PR: https://github.com/llvm/llvm-project/pull/171102	2025-12-11 21:41:47 +00:00
Luke Lau	fd5f53aa9b	[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 ) A VPBlendRecipe always emits selects, even when the VF is scalar. However the legacy cost model always costs all scalar non-header phis as a phi, and the VPlan cost model has to account for this. This can cause the cost to be a little off, for example not including the cost of the select in @smax_call_uniform leading to unprofitable vectorization. This removes this from the VPlan cost model and handles checks for the case in planContainsAdditionalSimplifications instead. I considered trying to make the legacy cost model more accurate but I'm not sure if it's possible. We need information as to whether or not the scalar VF we are costing is the original loop in which case it's actually a phi, or if it's a VPBlendRecipe that emits a select, potentially from a VF=1, UF>=1 VPlan.	2025-12-12 00:25:58 +08:00
Florian Hahn	5a1299b196	[VPlan] Strip stray whitespace when printing VPWidenSelectRecipe. (NFCI) printFlags takes care of inserting the correct amount of spaces, depending on whether there are flags to print or not.	2025-12-10 22:15:35 +00:00
Luke Lau	efda519a90	[LV] Use branch_weights metadata in getPredBlockCostDivisor test. NFC (#171299 ) This is more reliable in the event that the trivial fcmp gets folded away.	2025-12-10 06:13:32 +00:00
Aiden Grossman	f29d06029f	Revert "[LV] Mark checks as never succeeding for high cost cutoff." This reverts commit 8a115b6934a90441d77ea54af73e7aaaa1394b38. This broke premerge. https://lab.llvm.org/staging/#/builders/192/builds/13326 /home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'	2025-12-09 21:32:09 +00:00
Florian Hahn	8a115b6934	[LV] Mark checks as never succeeding for high cost cutoff. When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-09 20:37:21 +00:00
Florian Hahn	7a5e2c9358	[LV] Add test with threshold=0 and metadata forcing vectorization. Test case for the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588 The issue is that we don't generate a runtime check even though it is required to vectorize.	2025-12-09 20:06:38 +00:00
Florian Hahn	c61a481a23	[VPlan] Use SCEV to prove non-aliasing for stores at different offsets. (#170347 ) Extend the logic added in https://github.com/llvm/llvm-project/pull/168771 to also allow sinking stores past stores in the same noalias set by checking if we can prove no-alias via the distance between accesses, checked via SCEV. PR: https://github.com/llvm/llvm-project/pull/170347	2025-12-09 16:19:13 +00:00
Florian Hahn	0768068ff0	[VPlan] Remove ExtractLastLane for plans with scalar VFs. (#171145 ) ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart. PR: https://github.com/llvm/llvm-project/pull/171145	2025-12-09 11:59:40 +00:00
Luke Lau	0fbb45e7d6	[LV] Return getPredBlockCostDivisor in uint64_t When the probability of a block is extremely low, HeaderFreq / BBFreq may be larger than 32 bits. Previously this got truncated to uint32_t which could cause division by zero exceptions on x86. Widen the return type to uint64_t which should fit the entire range of BlockFrequency values. It's also worth noting that a frequency can never be zero according to BlockFrequency.h, so we shouldn't need to worry about divide by zero in getPredBlockCostDivisor itself.	2025-12-09 15:43:13 +08:00
Florian Hahn	de53b1a4ef	[LV] Simplify IR for gather-cost.ll, auto-generate checks. (NFC) Simplify tests and auto-generate check in preparation for further updates.	2025-12-08 19:19:51 +00:00
Ramkumar Ramachandra	c5b90103da	[VPlan] Use nuw when computing {VF,VScale}xUF (#170710 ) These quantities should never unsigned-wrap. This matches the behavior if only VFxUF is used (and not VF): when computing both VF and VFxUF, nuw should hold for each step separately.	2025-12-08 15:46:02 +00:00
Luke Lau	e8219e5ce8	[VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690 ) In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we unprofitably loop vectorize on RISC-V. The loop looks something like: ```c for (int i = 0; i < n; i++) { if (x0[i] == a) if (x1[i] == b) if (x2[i] == c) // do stuff... } ``` Because it's so deeply nested the actual inner level of the loop rarely gets executed. However we still deem it profitable to vectorize, which due to the if-conversion means we now always execute the body. This stems from the fact that `getPredBlockCostDivisor` currently assumes that blocks have 50% chance of being executed as a heuristic. We can fix this by using BlockFrequencyInfo, which gives a more accurate estimate of the innermost block being executed 12.5% of the time. We can then calculate the probability as `HeaderFrequency / BlockFrequency`. Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V. Whilst there's a lot of changes in the in-tree tests, this doesn't affect llvm-test-suite or SPEC CPU 2017 that much: - On armv9-a -flto -O3 there's 0.0%/0.2% more geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. - On x86-64 -flto -O3 with PGO there's 0.9%/0% less geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO: https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au	2025-12-08 14:28:26 +00:00
Florian Hahn	3fc7419236	[VPlan] Replace ExtractLast(Elem\|LanePerPart) with ExtractLast(Lane/Part) (#164124 ) Replace ExtractLastElement and ExtractLastLanePerPart with more generic and specific ExtractLastLane and ExtractLastPart, which model distinct parts of extracting across parts and lanes. ExtractLastElement == ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart == ExtractLastLane, the latter clarifying the name of the opcode. A new m_ExtractLastElement matcher is provided for convenience. The patch should be NFC modulo printing changes. PR: https://github.com/llvm/llvm-project/pull/164124	2025-12-07 15:15:43 +00:00
Florian Hahn	ba836dc5ed	[VPlan] Remove stray space before ops when printing vector-ptr (NFC)	2025-12-06 13:07:07 +00:00
Florian Hahn	d8e52c0360	[VPlan] Use strict whitespace checks for VPlan printing test. Use --strict-whitespace for vplan-printing.ll to catch stray whitespaces. The test updates show a few places where we currently emit those.	2025-12-05 21:17:29 +00:00
Florian Hahn	f02dc4d198	[VPlan] Don't try to hoist multi-defs for first-order recurrences. Currently the hoisting implementation expects single-defs. Bail out on multi-defs (VPInterleaveRecipe), to fix an assertion. Fixes https://github.com/llvm/llvm-project/issues/170666	2025-12-04 21:09:16 +00:00
Florian Hahn	0e11a92447	[VPlan] Implement printing VPIRMetadata. (#168385 ) Implement printing for VPIRMetadata, using generic dyn_cast to VPIRMetadata. Depends on https://github.com/llvm/llvm-project/pull/166245 PR: https://github.com/llvm/llvm-project/pull/168385	2025-12-04 10:56:16 +00:00
Florian Hahn	1054a6e9de	[SCEV] Handle non-constant start values in AddRec UDiv canonicalization. (#170474 ) Follow-up to https://github.com/llvm/llvm-project/pull/169576 to enable UDiv canonicalization if the start of the AddRec is not constant. The fold is not restricted to constant start values, as long as we are able to compute a constant remainder. The fold is only applied if the subtraction of the remainder can be folded into the start expression, but that is just to avoid creating more complex AddRecs. For reference, the proof from #169576 is https://alive2.llvm.org/ce/z/iu2tav PR: https://github.com/llvm/llvm-project/pull/170474	2025-12-03 21:13:11 +00:00
Florian Hahn	095f8e0793	[LV] Add more tests for finding the first-iv of argmin. Adds more test coverage for https://github.com/llvm/llvm-project/pull/170223.	2025-12-03 21:06:36 +00:00
Florian Hahn	50916a4adc	[VPlan] Use predicate in VPInstruction::computeCost for selects. (#170278 ) In some cases, the lowering a select depends on the predicate. If the condition of a select is a compare instruction, thread the predicate through to the TTI hook. PR: https://github.com/llvm/llvm-project/pull/170278	2025-12-03 19:48:23 +00:00
Yingwei Zheng	6af1c3f3a9	[ValueTracking] Support scalable vector splats in computeKnownBits (#170345 ) Similar to https://github.com/llvm/llvm-project/pull/170325, this patch adds support for scalable vector splats in computeKnownBits.	2025-12-03 20:37:30 +08:00
Florian Hahn	f0e1254bce	[LV] Use forced cost once for whole interleave group in legacy costmodel (#168270 ) The VPlan-based cost model assigns the forced cost once for a whole VPInterleaveRecipe. Update the legacy cost model to match this behavior. This fixes a cost-model divergence, and assigns the cost in a way that matches the generated code more accurately. PR: https://github.com/llvm/llvm-project/pull/168270	2025-12-02 21:39:54 +00:00
Florian Hahn	1e6476ddb7	[LV] Add predicated store sinking tests requiring further noalias checks Add additional tests where extra no-alias checks are needed, as future extensions of https://github.com/llvm/llvm-project/pull/168771.	2025-12-02 17:39:02 +00:00
Florian Hahn	5d876093b7	[SCEV] Allow udiv canonicalization of potentially-wrapping AddRecs (#169576 ) Extend the {X,+,N}/C => {(X - X%N),+,N}/C canonicalization to handle AddRecs that may wrap, when X < N <= C and both N,C are powers of 2. The alignment and power-of-2 properties ensure division results remain equivalent for all offsets [(X - X%N), X). Alive2 Proof: https://alive2.llvm.org/ce/z/iu2tav Fixes https://github.com/llvm/llvm-project/issues/168709 PR: https://github.com/llvm/llvm-project/pull/169576	2025-12-02 14:09:53 +00:00
Tibor Győri	e8bf011085	[LV] Emit better debug and opt-report messages when vectorization is disallowed in the LoopVectorizer (#158513 ) While looking into fixing #158499, I found some other cases where the messages emitted could be improved. This PR improves both the messages printed to the debug output and the missed-optimization messages in cases where: - loop vectorization is explicitly disabled - loop vectorization is implicitly disabled by disabling all loop transformations - loop vectorization is set to happen only where explicitly enabled A branch that should currently be unreachable is also added. If the related logic ever breaks (eg. due to changes to getForce() or the ForceKind enum) this should alert devs and users. New test cases are also added to verify that the correct messages (and only them) are outputted. --------- Co-authored-by: GYT <tiborgyri@gmail.com> Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-12-02 11:46:41 +00:00
Florian Hahn	4b6ad11876	[VPlan] Sink predicated stores with complementary masks. (#168771 ) Extend the logic to hoist predicated loads (https://github.com/llvm/llvm-project/pull/168373) to sink predicated stores with complementary masks in a similar fashion. The patch refactors some of the existing logic for legality checks to be shared between hoisting and sinking, and adds a new sinking transform on top. With respect to the legality checks, for sinking stores the code also checks whether there are any stores that may alias, not only loads. PR: https://github.com/llvm/llvm-project/pull/168771	2025-12-02 11:43:37 +00:00
Florian Hahn	2864afbe4d	[LV] Add more tests for argmin finding the first index. Add more test coverage for supporting argmin/argmax with strict predicates, in preparation for follow up to 99addbf73db596403a17.	2025-12-01 22:40:05 +00:00
Florian Hahn	25ab47bd40	[VPlan] Use wide IV if scalar lanes > 0 are used with scalable vectors. (#169796 ) For scalable vectors, VPScalarIVStepsRecipe cannot create all scalar step values. At the moment, it creates a vector, in addition to the first lane. The only supported case for this is when only the last lane is used. A recipe should not set both scalar and vector values. Instead, we can simply use a vector induction. It would also be possible to preserve the current vector code-gen, by creating VPInstructions based on the first lane of VPScalarIVStepsRecipe, but using a vector induction seems simpler. PR: https://github.com/llvm/llvm-project/pull/169796	2025-12-01 17:33:36 +00:00
David Sherwood	17677ad7eb	[LV] Don't create WidePtrAdd recipes for scalar VFs (#169344 ) While attempting to remove the use of undef from more loop vectoriser tests I discovered a bug where this assert was firing: ``` llvm::Constant* llvm::Constant::getSplatValue(bool) const: Assertion `this->getType()->isVectorTy() && "Only valid for vectors!"' failed. ... #8 0x0000aaaab9e2fba4 llvm::Constant::getSplatValue #9 0x0000aaaab9dfb844 llvm::ConstantFoldBinaryInstruction ``` This seems to be happening because we are incorrectly generating WidePtrAdd recipes for scalar VFs. The PR fixes this by checking whether a plan has a scalar VF only in legalizeAndOptimizeInductions. This PR also removes the use of undef from the test `both` in Transforms/LoopVectorize/iv_outside_user.ll, which is what started triggering the assert. Fixes #169334	2025-12-01 08:12:41 +00:00
Luke Lau	dc5ce79cc1	[LV] Regenerate some check lines. NFC The scalar loop doesn't exist anymore after 8907b6d39371d439461cdd3475d5590f87821377	2025-12-01 15:25:08 +08:00
Florian Hahn	113e0c95a8	[LV] Add additional tests for argmin with find-first wrapping IV ranges. Add test cases for upcoming argmin vectorization changes that have wrapping IV ranges.	2025-11-30 21:07:28 +00:00
Florian Hahn	24b87b8d48	[VPlan] Skip cost verification for loops with EVL gather/scatter. The VPlan-based cost model uses vp_gather/vp_scatter for gather/scatter costs, which is different to the legacy cost model and cannot be matched there. Don't verify the costs match for plans containing gather/scatters with EVL. Fixes https://github.com/llvm/llvm-project/issues/169948.	2025-11-29 22:00:30 +00:00
Florian Hahn	cd3192a2c9	[VPlan] Turn IVOp assertion into early exit. Turn assertion added in 99addbf73 [0] into an early exit. There are cases where the operand may not be a VPWidenIntOrFpInductionRecipe, e.g. if the IV increment is selected, as in the test cases. [0] https://github.com/llvm/llvm-project/pull/141431	2025-11-29 20:49:22 +00:00
Florian Hahn	66d33cec99	[LV] Extend test coverage for inductions depending on complex SCEVs. Re-generate check lines, add test with complex SCEV as induction start value and add stores to existing loops to make them not trivial.	2025-11-29 12:26:51 +00:00
Florian Hahn	99addbf73d	[LV] Vectorize selecting last IV of min/max element. (#141431 ) Add support for vectorizing loops that select the index of the minimum or maximum element. The patch implements vectorizing those patterns by combining Min/Max and FindFirstIV reductions. It extends matching Min/Max reductions to allow in-loop users that are FindLastIV reductions. It records a flag indicating that the Min/Max reduction is used by another reduction. The extra user is then checked as part of the new `handleMultiUseReductions` VPlan transformation. It processes any reduction that has other reduction users. The reduction using the min/max reduction currently must be a FindLastIV reduction, which needs adjusting to compute the correct result: 1. We need to find the last IV for which the condition based on the min/max reduction is true, 2. Compare the partial min/max reduction result to its final value and, 3. Select the lanes of the partial FindLastIV reductions which correspond to the lanes matching the min/max reduction result. Depends on https://github.com/llvm/llvm-project/pull/140451 PR: https://github.com/llvm/llvm-project/pull/141431	2025-11-28 22:26:19 +00:00
Florian Hahn	4dc29b8a5d	[LV] Add additional argmin/argmax tests for #141431 . Apply suggestions for tests from https://github.com/llvm/llvm-project/pull/141431 and add additional missing coverage.	2025-11-28 19:15:31 +00:00
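
Note on 159f1c048e: the swap-with-last strategy described in that message can be sketched on a plain `std::vector`. This is an illustration only, not the `PHINode` implementation; `Incoming` and `removeIncomingSwapLast` are hypothetical names introduced here, and the pair of integers merely stands in for an incoming value and its predecessor block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one PHI operand: an incoming value and the
// predecessor block it comes from (both modelled as plain integers).
struct Incoming {
  int Value;
  int Block;
};

// Remove the entry at Idx in O(1): overwrite it with the last entry and
// shrink the vector by one. Unlike a std::copy-based shift, this does not
// preserve the relative order of the remaining entries.
inline Incoming removeIncomingSwapLast(std::vector<Incoming> &Ops,
                                       std::size_t Idx) {
  assert(Idx < Ops.size() && "index out of range");
  Incoming Removed = Ops[Idx];
  Ops[Idx] = Ops.back(); // Harmless self-assignment when Idx is the last slot.
  Ops.pop_back();
  return Removed;
}
```

Removing index 1 from four entries leaves three entries, with the former last entry now sitting at index 1. Code that relied on operand order surviving a removal would need to be checked, which is presumably why the message calls the ordering change out.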


3688 Commits
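
Note on e8219e5ce8 and 0fbb45e7d6: the divisor described there is `HeaderFrequency / BlockFrequency`, and the follow-up widens its type because the quotient may not fit in 32 bits. A rough sketch of that arithmetic follows, using raw integers in place of `BlockFrequency` values. The function names are hypothetical, and the clamp to a minimum of 1 is an assumption made here to keep the sketch safe, not something the commit messages state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: frequencies are raw 64-bit integers. The messages
// above note that a block frequency is never zero, so that is asserted.
inline uint64_t predBlockCostDivisor(uint64_t HeaderFreq, uint64_t BBFreq) {
  assert(BBFreq != 0 && "block frequency is assumed to be non-zero");
  uint64_t Divisor = HeaderFreq / BBFreq;
  // Assumption for this sketch only: never return 0, so callers can divide.
  return Divisor == 0 ? 1 : Divisor;
}

// Scale the cost of a predicated block by how rarely it is expected to run.
inline uint64_t scaledPredicatedCost(uint64_t BlockCost, uint64_t HeaderFreq,
                                     uint64_t BBFreq) {
  return BlockCost / predBlockCostDivisor(HeaderFreq, BBFreq);
}
```

With a header frequency of 8 and a block frequency of 1 (the 12.5% case from the message), the divisor is 8. With frequencies of 2^40 and 2^8 the divisor is 2^32, which truncates to 0 when narrowed to `uint32_t`; that is the failure mode the widening to `uint64_t` avoids.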