llvm-project

Author	SHA1	Message	Date
Mingjie Xu	fac9472593	[IR] Reland Optimize PHINode::removeIncomingValue() and PHINode::removeIncomingValueIf() to use the swapping strategy. (#174274 ) Reland #171963, #172639 and #173444, they are reverted in 86b9f90b9574b3a7d15d28a91f6316459dcfa046 because of introducing non-determinism in compiles. The non-determinism has been fixed in 9b8addffa70cee5b2acc5454712d9cf78ce45710.	2026-01-04 09:24:53 +08:00
Florian Hahn	b4d833135a	[VPlan] Handle non-free bitcasts in getCostForRecipeWithOpcode. Update bitcast cost handling to match the legacy cost model.	2026-01-02 18:04:13 +00:00
David Green	75d4812532	[AArch64] Turn MaxInterleaveFactor into a subtarget feature (#171088 ) The default value for MaxInterleaveFactor is 2, but some CPUs prefer a wider factor of 4. This adds a subtarget feature so that cpus can override the default in their tuning features, keeping more of the options together in one place.	2026-01-02 15:45:27 +00:00
Florian Hahn	351f933d03	[LV] Add FindFirstIV test with IV truncated to i1 (NFC). Adds test case for https://github.com/llvm/llvm-project/issues/173459.	2026-01-02 15:10:03 +00:00
Florian Hahn	cd470dd817	[LV] Add test selecting negated IVs (NFC). Extend test coverage for selecting inductions.	2026-01-01 21:05:25 +00:00
Florian Hahn	5ee82dffc6	[VPlan] Handle addrspacecast/ptrtoaddr in VPlan-based cost model. Also handle missing PtrToAddrs and AddrSpaceCast in getCostForRecipeWithOpcode. This makes sure all cast opcodes are handled, fixing a crash on loops replicating addrspacecast and ptrtoaddrs.	2026-01-01 10:35:35 +00:00
Florian Hahn	0db04963d3	[VPlan] Fix use-after-free when iterating over live-ins directly. getLiveIns returns an iterator to members of a dense map. The loop may create new live-ins, which can trigger re-allocation of the underlying dense map, causing use-after-free accesses for the iterator. Make sure we iterate over a copy of the live-ins to avoid use-after-free. Fixes https://github.com/llvm/llvm-project/issues/173222.	2025-12-31 22:12:53 +00:00
Florian Hahn	2d60f87111	[VPlan] Only use legacy cost for instructions only used by exit conds. (#174029 ) Currently we need to precompute costs for exit conditions, to match the legacy cost, as they will get replaced by a compare against the canonical IV (or others, like active-lane-mask or EVL based) and the original compare will get removed. This is not true for instructions with users other than the exit condition. Those will remain, and we can just use the VPlan-based cost model to get more accurate results. This improves results in some cases, like @test_value_in_exit_compare_chain_used_outside because the IV increment user outside the loop is replaced by computing the final value outside the loop. It also fixes a crash introduced by f196b1d66ff (#146525). PR: https://github.com/llvm/llvm-project/pull/174029	2025-12-31 13:34:54 +00:00
Florian Hahn	746eced47d	[LV] Add extra tests for computing replicating cast costs (NFC)	2025-12-30 22:08:04 +00:00
Florian Hahn	0f3a9f658a	[LV] Add tail-folded test for fmax reductions without fast-math flags. Adds missing tail-folding test.	2025-12-30 20:49:32 +00:00
Florian Hahn	0b46cf7dcd	[VPlan] Handle BranchOnTwoConds in simplifyBranchCondition. This fixes a crash after introducing BranchOnTwoConds (524b1788, https://github.com/llvm/llvm-project/pull/172750) when trying to replace BranchOnTwoConds with a VPBranchOnCond, without dissolving the region. In that case, we need to update the appropriate condition operand.	2025-12-30 18:47:22 +00:00
Walter Lee	86b9f90b95	Revert 159f1c048e08a8780d92858cfc80e723c90235e3 (#173893 ) This causes non-determinism in compiles. From nikic: "FYI the non-determinism is also visible on llvm-opt-benchmark. Maybe repeatedly running test cases from `299446d99f` could reproduce the issue..." Also revert dependent 796fafeff92fe5d2d20594859e92607116e30a16 and e135447bda617125688b71d33480d131d1076a72.	2025-12-29 20:23:13 -05:00
Florian Hahn	524b1788c4	[VPlan] Add BranchOnTwoConds, use for early exit plans. (#172750 ) This PR introduces a new BranchOnTwoConds VPInstruction, that takes 2 boolean operands and must be placed in a block with 3 successors. If condition I is true, branches to successor I, otherwise falls through to check the next condition. If both conditions are false, branch to the third successor. This new branch recipe is used for early-exit loops, to simplify the representation in VPlan initially, by avoiding the need for splitting the middle block early on, in a way that preserves the single-exit block property of regions. All exits still go through the latch block, but they can go to more than 2 successors. This idea was part of one of the original proposals for how to model early exits in VPlan, but at that point in time, there was no good way to handle this during code-gen, and we went with the early split-middle block approach initially. Now that we dissolve regions before ::execute, the new recipe can be lowered nicely after regions have been removed, to a set of VPBBs and BranchOnCond recipes. The initial lowering preserves the original structure with the split middle blocks. Follow-ups will improve the lowering to avoid this splitting, providing performance gains. PR: https://github.com/llvm/llvm-project/pull/172750	2025-12-29 19:39:38 +00:00
陈子昂	c9eb572b14	[LoopVectorize] Support vectorization of frexp intrinsic (#172957 ) This patch enables the vectorization of the llvm.frexp intrinsic. Following the suggestion in #112408, frexp is moved from isTriviallyScalarizable to isTriviallyVectorizable. Fixes #112408	2025-12-26 21:57:57 +00:00
Nikita Popov	6d1e7d4982	[LV][IRBuilder] Allow implicit truncation of step vector (#173229 ) LV can create step vectors that wrap around, e.g. `step-vector i1` with VF>2. Allow truncation when creating the vector constant to avoid an assertion failure with https://github.com/llvm/llvm-project/pull/171456. After https://github.com/llvm/llvm-project/pull/173494 the definition of the llvm.stepvector intrinsic has been changed to make it have wrapping semantics, so the semantics for the fixed and scalable case match now.	2025-12-25 12:38:53 +01:00
Matt Arsenault	5020e0ff14	ValueTracking: Improve computeKnownFPClass fmul handling (#173247 ) Improve known non-nan sign bit tracking. Handle cases with a known 0 or inf input of indeterminate sign. The tails of some library functions have sign management for special cases.	2025-12-24 22:17:58 +00:00
Florian Hahn	44a8d9c135	Reapply "[VPlan] Use predicate from VPValue VPWidenSelectR::computeCost." (#173170 ) This reverts commit f42af14073228 and re-applies https://github.com/llvm/llvm-project/pull/172915. It has an additional check if the condition is a live-in, which makes sure we preserve the original behavior in that case. This should fix the crash that caused the revert. Original commit message: Look up the predicate from the VPValue condition instead of the underlying IR. This improves cost modeling in some cases, e.g. when we can fold operations like negations in compares. On AArch64, this leads to additional vectorization in a few cases in practice. Example lowering for the modified test case: https://llvm.godbolt.org/z/6nc6jo5eG	2025-12-22 22:38:31 +00:00
Florian Hahn	d8ddfd9c09	[LV] Add additional select cost test with live-in compare cond (NFC). Add test case that triggered revert f42af1407322865.	2025-12-22 22:13:34 +00:00
Florian Hahn	f42af14073	Revert "[VPlan] Use predicate from VPValue VPWidenSelectR::computeCost." (#173170 ) Reverts llvm/llvm-project#172915 Looks like this may be causing https://lab.llvm.org/buildbot/#/builders/128/builds/9590 to fail. Revert while I confirm.	2025-12-20 22:54:21 +00:00
Florian Hahn	e77246dbf4	[VPlan] Use predicate from VPValue VPWidenSelectR::computeCost. (#172915 ) Look up the predicate from the VPValue condition instead of the underlying IR. This improves cost modeling in some cases, e.g. when we can fold operations like negations in compares. On AArch64, this leads to additional vectorization in a few cases in practice. Example lowering for the modified test case: https://llvm.godbolt.org/z/6nc6jo5eG PR: https://github.com/llvm/llvm-project/pull/172915	2025-12-20 22:09:58 +00:00
Florian Hahn	1f78f6a2d6	[LV] Check Addr in getAddressAccessSCEV in terms of SCEV expressions. (#171204 ) getAddressAccessSCEV previously had some restrictive checks that limited pointer SCEV expressions passed to TTI to GEPs with operands that must either be invariant or marked as inductions. As a consequence, the check rejected things like `GEP %base, (%iv + 1)`, while the SCEV for the GEP should be as easily analyzable as for `GEP %base, %v`, with the only difference being the AddRec start adjusted by 1. This patch changes the code to use a SCEV-based check, limiting the address SCEV to be loop invariant, an affine AddRec (i.e. induction), or an add expression of such operands or a sign-extended AddRec. This catches all existing cases getAddressAccessSCEV caught, plus additional ones like the cases mentioned above. This means we pass address SCEVs in more cases, giving the backends a better chance to make informed decisions. It also unifies the decision when to use an address SCEV between the legacy and VPlan-based cost model. An illustrative example showing the impact is the gather-cost.ll tests. Previously they were considered not profitable to vectorize because we failed to determine that %gep.src_data = getelementptr inbounds [1536 x float], ptr @src_data, i64 0, i64 %mul has a relatively small constant stride. There may be some rough edges in the cost models, where not passing pointer SCEVs hid some incorrect modeling, but those issues should be fixed in the target cost models if they surface. PR: https://github.com/llvm/llvm-project/pull/171204	2025-12-19 22:05:27 +00:00
Florian Hahn	4c399b27c3	[LV] Add select cost test with negated condition. (NFC) Add additional test coverage for select with negated condition. Currently we overestimate the cost, because the negation can be folded in the compare.	2025-12-18 22:07:06 +00:00
Mel Chen	f196b1d66f	[VPlan] Extract reverse operation for reverse accesses (#146525 ) This patch introduces VPInstruction::Reverse and extracts the reverse operations of loaded/stored values from reverse memory accesses. This extraction facilitates future support for permutation elimination within VPlan.	2025-12-18 14:57:48 +00:00
Mel Chen	e655317cf1	[LV][EVL] Add test case for checking debug info when tail folding by EVL. nfc (#172429 )	2025-12-18 08:59:37 +00:00
Florian Hahn	bab0dc4d48	Reapply "[LV] Mark checks as never succeeding for high cost cutoff." Reapply 8a115b6934a90441 with an update to tests handling remarks. The patch now directly emits a clear remark when we bail out due to the memory check threshold. Original message: When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-17 20:21:49 +00:00
Mingjie Xu	159f1c048e	[IR] Optimize PHINode::removeIncomingValue() by swapping removed incoming value with the last incoming value. (#171963 ) Current implementation uses `std::copy` to shift all incoming values after the removed index. This patch optimizes `PHINode::removeIncomingValue()` by replacing the linear shift of incoming values with a swap-with-last strategy. After this change, the relative order of incoming values after removal is not preserved. This improves compile-time for PHI nodes with many predecessors. Depends: https://github.com/llvm/llvm-project/pull/171955 https://github.com/llvm/llvm-project/pull/171956 https://github.com/llvm/llvm-project/pull/171960 https://github.com/llvm/llvm-project/pull/171962	2025-12-17 19:44:01 +08:00
Florian Hahn	eb0c7e752f	[VPlan] Replace BranchOnCount with Compare + BranchOnCond (NFC). (#172181 ) Expand BranchOnCount to BranchOnCond + ICmp in convertToConcreteRecipes to simplify codegen. PR: https://github.com/llvm/llvm-project/pull/172181	2025-12-16 19:19:31 +00:00
Luke Lau	67d0e21a62	Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" (#172261 ) This reapplies #171846 with a test case and fix for a legacy cost-model mismatch assertion. In the previous version of the patch, we only considered the plan to contain simplifications when it had a VPBlendRecipe and VF.isScalar() was true. However for some VPlans we may have a blend with only the first lane used: BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c> CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi> vp<%5> = vector-pointer ir<%gep> And in the legacy cost model we cost a blend as a phi if it's uniform: // If we know that this instruction will remain uniform, check the cost of // the scalar version. if (isUniformAfterVectorization(I, VF)) VF = ElementCount::getFixed(1); So this replaces the VF.isScalar() check with vputils::onlyFirstLaneUsed, which matches how the VPlan cost model mirrored the legacy model beforehand. A VPInstruction::Select will also emit a scalar select for a vector VF if only the first lane is used, so this also updates VPBlendRecipe::computeCost to reflect that too.	2025-12-16 06:30:54 +00:00
Elvis Wang	1eba2cbe72	[LV] Convert uniform-address unmasked scatters to scalar store. (#166114 ) This patch optimizes vector scatters that have a uniform (single-scalar) address by replacing them with "extract-last-lane + scalar store" when the scatter is unmasked. Notes: - The legacy cost model can scalarize a store if both the address and the value are uniform. In VPlan we materialize the stored value via ExtractLastLane, so only the address must be uniform. - Some of the loops won't be vectorized anymore since no vector instructions will be generated.	2025-12-16 12:24:22 +08:00
Ramkumar Ramachandra	0636225b93	[VPlan] Directly unroll VectorPointerRecipe (#168886 ) In an effort to get rid of VPUnrollPartAccessor and directly unroll recipes, start by directly unrolling VectorPointerRecipe, allowing for VPlan-based simplifications and simplification of the corresponding execute.	2025-12-15 10:54:06 +00:00
Florian Hahn	53cf22f3a1	[VPlan] Simplify live-ins early using SCEV. (#155304 ) Use SCEV to simplify all live-ins during VPlan0 construction. This enables us to remove special SCEV queries when constructing VPWidenRecipes and improves results in some cases. This leads to simplifications in a number of cases in real-world applications (~250 files changed across LLVM, SPEC, ffmpeg) PR: https://github.com/llvm/llvm-project/pull/155304	2025-12-14 20:15:05 +00:00
Florian Hahn	a99a982440	[LV] Add test coverage for remark for unprofitable RT checks. Add test coverage for remark when runtime checks are not profitable with threshold provided. Also make sure that X86 remark tests actually pass an X86 triple, which is needed for the threshold remark. Also clean up the tests a bit.	2025-12-13 22:44:09 +00:00
Luke Lau	4ea8157773	Revert "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" This reverts commit fd5f53aa9b21060063484fc6c346316a34a6464c. It's triggering legacy cost model assertions reported in https://github.com/llvm/llvm-project/pull/171846#issuecomment-3647640019	2025-12-13 20:05:34 +08:00
Florian Hahn	0171e881b5	[VPlan] Strip stray whitespace when printing VPWidenIntOrFpInduction. printFlags takes care of inserting the needed spaces, remove unneeded extra stray whitespace	2025-12-12 21:28:50 +00:00
Florian Hahn	4e05d702f0	[LV] Always include middle block cost in isOutsideLoopWorkProfitable. (#171102 ) Always include the cost of the middle block in isOutsideLoopWorkProfitable. This addresses the TODO from https://github.com/llvm/llvm-project/pull/168949 and removes the temporary restriction. isOutsideLoopWorkProfitable already scales the cost outside loops according to the expected trip counts. In practice this increases the minimum iteration threshold in a few cases. On a large IR corpus based on C/C++ workloads, ~50 out of 179450 vector loops have their thresholds increased slightly. PR: https://github.com/llvm/llvm-project/pull/171102	2025-12-11 21:41:47 +00:00
Luke Lau	fd5f53aa9b	[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 ) A VPBlendRecipe always emits selects, even when the VF is scalar. However the legacy cost model always costs all scalar non-header phis as a phi, and the VPlan cost model has to account for this. This can cause the cost to be a little off, for example not including the cost of the select in @smax_call_uniform leading to unprofitable vectorization. This removes this from the VPlan cost model and handles checks for the case in planContainsAdditionalSimplifications instead. I considered trying to make the legacy cost model more accurate but I'm not sure if it's possible. We need information as to whether or not the scalar VF we are costing is the original loop in which case it's actually a phi, or if it's a VPBlendRecipe that emits a select, potentially from a VF=1, UF>=1 VPlan.	2025-12-12 00:25:58 +08:00
Florian Hahn	5a1299b196	[VPlan] Strip stray whitespace when printing VPWidenSelectRecipe. (NFCI) printFlags takes care of inserting the correct amount of spaces, depending on whether there are flags to print or not.	2025-12-10 22:15:35 +00:00
Luke Lau	efda519a90	[LV] Use branch_weights metadata in getPredBlockCostDivisor test. NFC (#171299 ) This is more reliable in the event that the trivial fcmp gets folded away.	2025-12-10 06:13:32 +00:00
Aiden Grossman	f29d06029f	Revert "[LV] Mark checks as never succeeding for high cost cutoff." This reverts commit 8a115b6934a90441d77ea54af73e7aaaa1394b38. This broke premerge. https://lab.llvm.org/staging/#/builders/192/builds/13326 /home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'	2025-12-09 21:32:09 +00:00
Florian Hahn	8a115b6934	[LV] Mark checks as never succeeding for high cost cutoff. When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-09 20:37:21 +00:00
Florian Hahn	7a5e2c9358	[LV] Add test with threshold=0 and metadata forcing vectorization. Test case for the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588 The issue is that we don't generate a runtime check even though it is required to vectorize.	2025-12-09 20:06:38 +00:00
Florian Hahn	c61a481a23	[VPlan] Use SCEV to prove non-aliasing for stores at different offsets. (#170347 ) Extend the logic added in https://github.com/llvm/llvm-project/pull/168771 to also allow sinking stores past stores in the same noalias set by checking if we can prove no-alias via the distance between accesses, checked via SCEV. PR: https://github.com/llvm/llvm-project/pull/170347	2025-12-09 16:19:13 +00:00
Florian Hahn	0768068ff0	[VPlan] Remove ExtractLastLane for plans with scalar VFs. (#171145 ) ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart. PR: https://github.com/llvm/llvm-project/pull/171145	2025-12-09 11:59:40 +00:00
Luke Lau	0fbb45e7d6	[LV] Return getPredBlockCostDivisor in uint64_t When the probability of a block is extremely low, HeaderFreq / BBFreq may be larger than 32 bits. Previously this got truncated to uint32_t which could cause division by zero exceptions on x86. Widen the return type to uint64_t which should fit the entire range of BlockFrequency values. It's also worth noting that a frequency can never be zero according to BlockFrequency.h, so we shouldn't need to worry about divide by zero in getPredBlockCostDivisor itself.	2025-12-09 15:43:13 +08:00
Florian Hahn	de53b1a4ef	[LV] Simplify IR for gather-cost.ll, auto-generate checks. (NFC) Simplify tests and auto-generate check in preparation for further updates.	2025-12-08 19:19:51 +00:00
Ramkumar Ramachandra	c5b90103da	[VPlan] Use nuw when computing {VF,VScale}xUF (#170710 ) These quantities should never unsigned-wrap. This matches the behavior if only VFxUF is used (and not VF): when computing both VF and VFxUF, nuw should hold for each step separately.	2025-12-08 15:46:02 +00:00
Luke Lau	e8219e5ce8	[VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690 ) In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we unprofitably loop vectorize on RISC-V. The loop looks something like: ```c for (int i = 0; i < n; i++) { if (x0[i] == a) if (x1[i] == b) if (x2[i] == c) // do stuff... } ``` Because it's so deeply nested the actual inner level of the loop rarely gets executed. However we still deem it profitable to vectorize, which due to the if-conversion means we now always execute the body. This stems from the fact that `getPredBlockCostDivisor` currently assumes that blocks have 50% chance of being executed as a heuristic. We can fix this by using BlockFrequencyInfo, which gives a more accurate estimate of the innermost block being executed 12.5% of the time. We can then calculate the probability as `HeaderFrequency / BlockFrequency`. Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V. Whilst there's a lot of changes in the in-tree tests, this doesn't affect llvm-test-suite or SPEC CPU 2017 that much: - On armv9-a -flto -O3 there's 0.0%/0.2% more geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. - On x86-64 -flto -O3 with PGO there's 0.9%/0% less geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO: https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au	2025-12-08 14:28:26 +00:00
Florian Hahn	3fc7419236	[VPlan] Replace ExtractLast(Elem|LanePerPart) with ExtractLast(Lane/Part) (#164124 ) Replace ExtractLastElement and ExtractLastLanePerPart with more generic and specific ExtractLastLane and ExtractLastPart, which model distinct parts of extracting across parts and lanes. ExtractLastElement == ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart == ExtractLastLane, the latter clarifying the name of the opcode. A new m_ExtractLastElement matcher is provided for convenience. The patch should be NFC modulo printing changes. PR: https://github.com/llvm/llvm-project/pull/164124	2025-12-07 15:15:43 +00:00
Florian Hahn	ba836dc5ed	[VPlan] Remove stray space before ops when printing vector-ptr (NFC)	2025-12-06 13:07:07 +00:00
Florian Hahn	d8e52c0360	[VPlan] Use strict whitespace checks for VPlan printing test. Use --strict-whitespace for vplan-printing.ll to catch stray whitespaces. The test updates show a few places where we currently emit those.	2025-12-05 21:17:29 +00:00
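
The swap-with-last strategy described in 159f1c048e (relanded in fac9472593) can be illustrated on a plain std::vector. This is only a minimal sketch under stated assumptions, not the LLVM implementation: removeByShift and removeBySwap are hypothetical helper names, and a real PHINode keeps incoming values and incoming blocks in parallel, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not LLVM API): remove element `idx` by shifting every
// later element one slot to the left. Order is preserved; cost is linear in
// the number of elements after `idx`.
template <typename T>
void removeByShift(std::vector<T> &values, std::size_t idx) {
  assert(idx < values.size() && "index out of range");
  values.erase(values.begin() + static_cast<std::ptrdiff_t>(idx));
}

// Hypothetical helper (not LLVM API): remove element `idx` by moving the last
// element into its slot and dropping the tail. Constant time, but the
// relative order of the remaining elements is not preserved.
template <typename T>
void removeBySwap(std::vector<T> &values, std::size_t idx) {
  assert(idx < values.size() && "index out of range");
  if (idx != values.size() - 1)
    values[idx] = std::move(values.back());
  values.pop_back();
}
```

Removing index 1 from {10, 20, 30, 40} leaves {10, 30, 40} with the shifting version and {10, 40, 30} with the swapping version. Any consumer that depends on the position of an element after a removal therefore sees different results, which is why the commit message calls out that the relative order is no longer preserved.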

3709 Commits