llvm-project

Author	SHA1	Message	Date
Mel Chen	f196b1d66f	[VPlan] Extract reverse operation for reverse accesses (#146525) This patch introduces VPInstruction::Reverse and extracts the reverse operations of loaded/stored values from reverse memory accesses. This extraction facilitates future support for permutation elimination within VPlan.	2025-12-18 14:57:48 +00:00
Simon Pilgrim	24d9550b27	[VectorCombine] foldShuffleOfBinops - if both operands are the same don't duplicate the total new cost (#172719) If we're shuffling/concatenating the same operands then ensure we don't duplicate the total cost, ensure we reuse the final shuffle and recognise that we reduce the total instruction count (so fold even when NewCost == OldCost, not just NewCost < OldCost).	2025-12-18 07:03:06 +00:00
Florian Hahn	9cc1585b13	[VPlan] Add VPBlockUtils::transferSuccessors (NFCI). Add a new helper to transfer successors to a new, unconnected VPBB. Helps to simplify existing code, and prepare for upcoming changes.	2025-12-17 22:48:22 +00:00
Florian Hahn	bab0dc4d48	Reapply "[LV] Mark checks as never succeeding for high cost cutoff." Reapply 8a115b6934a90441 with an update to tests handling remarks. The patch now directly emits a clear remark when we bail out due to the memory check threshold. Original message: When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-17 20:21:49 +00:00
Florian Hahn	eb0c7e752f	[VPlan] Replace BranchOnCount with Compare + BranchOnCond (NFC). (#172181) Expand BranchOnCount to BranchOnCond + ICmp in convertToConcreteRecipes to simplify codegen. PR: https://github.com/llvm/llvm-project/pull/172181	2025-12-16 19:19:31 +00:00
Ramkumar Ramachandra	1c6e5b2d04	[LV] Improve code using VPlan::get{ConstantInt,True} (NFC) (#172471)	2025-12-16 13:03:43 +00:00
Luke Lau	67d0e21a62	Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)" (#172261) This reapplies #171846 with a test case and fix for a legacy cost-model mismatch assertion. In the previous version of the patch, we only considered the plan to contain simplifications when it had a VPBlendRecipe and VF.isScalar() was true. However for some VPlans we may have a blend with only the first lane used: BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c> CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi> vp<%5> = vector-pointer ir<%gep> And in the legacy cost model we cost a blend as a phi if it's uniform: // If we know that this instruction will remain uniform, check the cost of // the scalar version. if (isUniformAfterVectorization(I, VF)) VF = ElementCount::getFixed(1); So this replaces the VF.isScalar() check with vputils::onlyFirstLaneUsed, which matches how the VPlan cost model mirrored the legacy model beforehand. A VPInstruction::Select will also emit a scalar select for a vector VF if only the first lane is used, so this also updates VPBlendRecipe::computeCost to reflect that too.	2025-12-16 06:30:54 +00:00
Elvis Wang	1eba2cbe72	[LV] Convert uniform-address unmasked scatters to scalar store. (#166114) This patch optimizes vector scatters that have a uniform (single-scalar) address by replacing them with "extract-last-lane + scalar store" when the scatter is unmasked. Notes: - The legacy cost model can scalarize a store if both the address and the value are uniform. In VPlan we materialize the stored value via ExtractLastLane, so only the address must be uniform. - Some of the loops won't be vectorized anymore since no vector instructions will be generated.	2025-12-16 12:24:22 +08:00
Florian Hahn	83eea87a36	[VPlan] Create header phis once, after constructing VPlan0 (NFC). (#168291) Together with https://github.com/llvm/llvm-project/pull/168289 & https://github.com/llvm/llvm-project/pull/166099 we can construct header phis once up front, after creating VPlan0, as the induction/reduction/first-order-recurrence classification applies across all VFs. Depends on https://github.com/llvm/llvm-project/pull/168289 & https://github.com/llvm/llvm-project/pull/166099 PR: https://github.com/llvm/llvm-project/pull/168291	2025-12-15 22:12:10 +00:00
Florian Hahn	dbb4f5c2dd	[VPlan] Set VF scale factor in tryToCreatePartialReduction (NFCI). Split off unrelated change from approved https://github.com/llvm/llvm-project/pull/168291/ to land separately as suggested.	2025-12-15 21:18:07 +00:00
Nicolai Hähnle	88bd56597c	VectorCombine: Improve the insert/extract fold in the narrowing case (#168820) Keeping the extracted element in a natural position in the narrowed vector has two beneficial effects: 1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which allows the insert/extract fold to trigger. 2. It makes the narrowing shuffles in a chain of extract/insert compatible, which allows foldLengthChangingShuffles to successfully recognize a chain that can be folded. There are minor X86 test changes that look reasonable to me. The IR change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2` at all.	2025-12-15 11:25:51 -08:00
Alexey Bataev	b988555812	[SLP]Check if the extractelement is part of other buildvector node before marking for erasing. Need to check if the extractelement instruction is part of another buildvector node before trying to mark it for deletion; otherwise the compiler may reuse the deleted instruction. Fixes #172221	2025-12-15 09:54:05 -08:00
Bala_Bhuvan_Varma	0b2fe07e6b	[VectorCombine] Prevent redundant cost computation for repeated operand pairs in foldShuffleOfIntrinsics (#171965) This PR resolves [#170867](https://github.com/llvm/llvm-project/issues/170867). Existing code recomputes the cost of creating a shuffle instruction even for repeated intrinsic operand pairs. This results in a higher newCost, so the cost model decides not to fold. This PR addresses the issue: when calculating newCost, it skips the cost of an operand pair that was already considered, and when creating the transformed code, it reuses the shuffle instruction already created for a repeated operand pair.	2025-12-15 14:42:41 +00:00
Ramkumar Ramachandra	0636225b93	[VPlan] Directly unroll VectorPointerRecipe (#168886) In an effort to get rid of VPUnrollPartAccessor and directly unroll recipes, start by directly unrolling VectorPointerRecipe, allowing for VPlan-based simplifications and simplification of the corresponding execute.	2025-12-15 10:54:06 +00:00
Florian Hahn	bcbbe2c2bc	[VPlan] Pass backedge value directly to FOR and reduction phis (NFC). Pass backedge values directly to VPFirstOrderRecurrencePHIRecipe and VPReductionPHIRecipe, as they must be provided and available. Split off from https://github.com/llvm/llvm-project/pull/168291.	2025-12-14 20:59:22 +00:00
Florian Hahn	53cf22f3a1	[VPlan] Simplify live-ins early using SCEV. (#155304) Use SCEV to simplify all live-ins during VPlan0 construction. This enables us to remove special SCEV queries when constructing VPWidenRecipes and improves results in some cases. This leads to simplifications in a number of cases in real-world applications (~250 files changed across LLVM, SPEC, ffmpeg) PR: https://github.com/llvm/llvm-project/pull/155304	2025-12-14 20:15:05 +00:00
Luke Lau	4ea8157773	Revert "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)" This reverts commit fd5f53aa9b21060063484fc6c346316a34a6464c. It's triggering legacy cost model assertions reported in https://github.com/llvm/llvm-project/pull/171846#issuecomment-3647640019	2025-12-13 20:05:34 +08:00
Nicolai Hähnle	54ae1222ef	VectorCombine: Fold chains of shuffles fed by length-changing shuffles (#168819) Such chains can arise from folding insert/extract chains.	2025-12-12 13:53:03 -08:00
Florian Hahn	e6e3f94b5c	[VPlan] Re-add clarifying comment regarding part to extract. (NFC) Re-add and emphasize comment regarding extracting from the last part, as suggested post-commit in https://github.com/llvm/llvm-project/pull/171145.	2025-12-12 21:51:33 +00:00
Florian Hahn	333ee931df	[LV] Update stale comment after 4e05d702f02a. (NFC) Address post-commit suggestion, update stale comment after 4e05d702f.	2025-12-12 21:36:56 +00:00
Florian Hahn	0171e881b5	[VPlan] Strip stray whitespace when printing VPWidenIntOrFpInduction. printFlags takes care of inserting the needed spaces; remove the unneeded stray whitespace.	2025-12-12 21:28:50 +00:00
Florian Hahn	65deac0872	[VPlan] Remove vector type checking in inferScalarType (NFC). inferScalarTypeForRecipe always infers a scalar type, so BaseTy must be a scalar type. Remove unneeded cast.	2025-12-11 22:10:31 +00:00
Florian Hahn	4e05d702f0	[LV] Always include middle block cost in isOutsideLoopWorkProfitable. (#171102) Always include the cost of the middle block in isOutsideLoopWorkProfitable. This addresses the TODO from https://github.com/llvm/llvm-project/pull/168949 and removes the temporary restriction. isOutsideLoopWorkProfitable already scales the cost outside loops according to the expected trip counts. In practice this increases the minimum iteration threshold in a few cases. On a large IR corpus based on C/C++ workloads, ~50 out of 179450 vector loops have their thresholds increased slightly. PR: https://github.com/llvm/llvm-project/pull/171102	2025-12-11 21:41:47 +00:00
Nikita Popov	8a9d9e4853	[LV] Use getSigned() for stride The stride may be negative.	2025-12-11 17:30:37 +01:00
Luke Lau	fd5f53aa9b	[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846) A VPBlendRecipe always emits selects, even when the VF is scalar. However the legacy cost model always costs all scalar non-header phis as a phi, and the VPlan cost model has to account for this. This can cause the cost to be a little off, for example not including the cost of the select in @smax_call_uniform leading to unprofitable vectorization. This removes this from the VPlan cost model and handles checks for the case in planContainsAdditionalSimplifications instead. I considered trying to make the legacy cost model more accurate but I'm not sure if it's possible. We need information as to whether or not the scalar VF we are costing is the original loop in which case it's actually a phi, or if it's a VPBlendRecipe that emits a select, potentially from a VF=1, UF>=1 VPlan.	2025-12-12 00:25:58 +08:00
Ramkumar Ramachandra	85fafd5db0	[SCEVExp] Get DL from SE, strip constructor arg (NFC) (#171823)	2025-12-11 14:26:47 +00:00
Luke Lau	2967815249	[VPlan] Don't emit VPBlendRecipes with only one incoming value. NFC (#171804) We can just directly use the incoming value. These single value blends would get optimized later on in simplifyBlends, but by doing it early it removes the notion of an "immediately normalized" blend, and simplifies an upcoming patch.	2025-12-11 12:55:56 +00:00
Florian Hahn	5a1299b196	[VPlan] Strip stray whitespace when printing VPWidenSelectRecipe. (NFCI) printFlags takes care of inserting the correct amount of spaces, depending on whether there are flags to print or not.	2025-12-10 22:15:35 +00:00
Ramkumar Ramachandra	1c7126d8db	[VPlan] Combine LiveIns fields into MapVector (NFC) (#170220) Combine Value2VPValue and VPLiveIns into a single MapVector LiveIns field, simplifying users.	2025-12-10 07:09:21 +00:00
Ramkumar Ramachandra	3310c0be58	[VPlan] Strip TODO to consolidate (ActiveLaneMask|Widen)PHI (#171392) They cannot be consolidated, as WidenPHI is not a header PHI, while ActiveLaneMaskPHI is.	2025-12-09 21:38:58 +00:00
Aiden Grossman	f29d06029f	Revert "[LV] Mark checks as never succeeding for high cost cutoff." This reverts commit 8a115b6934a90441d77ea54af73e7aaaa1394b38. This broke premerge. https://lab.llvm.org/staging/#/builders/192/builds/13326 /home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'	2025-12-09 21:32:09 +00:00
Florian Hahn	8a115b6934	[LV] Mark checks as never succeeding for high cost cutoff. When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-09 20:37:21 +00:00
Florian Hahn	c61a481a23	[VPlan] Use SCEV to prove non-aliasing for stores at different offsets. (#170347) Extend the logic added in https://github.com/llvm/llvm-project/pull/168771 to also allow sinking stores past stores in the same noalias set by checking if we can prove no-alias via the distance between accesses, checked via SCEV. PR: https://github.com/llvm/llvm-project/pull/170347	2025-12-09 16:19:13 +00:00
Fabrice de Gans	d478baa238	Add more missing `LLVM_ABI` annotations (#168765) This patch updates various LLVM headers to properly add the `LLVM_ABI` and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows. This effort is tracked in #109483.	2025-12-09 09:03:15 -05:00
Florian Hahn	0768068ff0	[VPlan] Remove ExtractLastLane for plans with scalar VFs. (#171145) ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart. PR: https://github.com/llvm/llvm-project/pull/171145	2025-12-09 11:59:40 +00:00
Pengcheng Wang	1ef0a56b55	[LV][NFC] Use foldTailWithEVL() (#171282)	2025-12-09 16:58:53 +08:00
Luke Lau	0fbb45e7d6	[LV] Return getPredBlockCostDivisor in uint64_t When the probability of a block is extremely low, HeaderFreq / BBFreq may be larger than 32 bits. Previously this got truncated to uint32_t which could cause division by zero exceptions on x86. Widen the return type to uint64_t which should fit the entire range of BlockFrequency values. It's also worth noting that a frequency can never be zero according to BlockFrequency.h, so we shouldn't need to worry about divide by zero in getPredBlockCostDivisor itself.	2025-12-09 15:43:13 +08:00
Drew Kersnar	5c8c7f3d21	[LoadStoreVectorizer] Fill gaps in load/store chains to enable vectorization (#159388) This change introduces Gap Filling, an optimization that aims to fill in holes in otherwise contiguous load/store chains to enable vectorization. It also introduces Chain Extending, which extends the end of a chain to the closest power of 2. This was originally motivated by the NVPTX target, but I tried to generalize it to be universally applicable to all targets that may use the LSV. I'm more than willing to make adjustments to improve the target-agnostic-ness of this change. I fully expect there are some issues and encourage feedback on how to improve things. For both loads and stores we only perform the optimization when we can generate a legal llvm masked load/store intrinsic, masking off the "extra" elements. Determining legality for stores is a little tricky from the NVPTX side, because these intrinsics are only supported for 256-bit vectors. See the other PR I opened for the implementation of the NVPTX lowering of masked store intrinsics, which includes NVPTX TTI changes that return true for isLegalMaskedStore under certain conditions: https://github.com/llvm/llvm-project/pull/159387. This change is dependent on that backend change, but I predict this change will require more discussion, so I am putting them both up at the same time. The backend change will be merged first assuming both are approved. Edited: both stores _and loads_ must use masked intrinsics for this optimization to be legal. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-12-08 15:57:17 -06:00
Florian Hahn	65dd29b335	[LV] Compare induction start values via SCEV in assertion (NFCI). Instead of comparing plain VPValues in the assertion checking the start values, directly compare the SCEVs. This future-proofs the code in preparation for performing more simplifications/canonicalizations for live-ins.	2025-12-08 21:31:53 +00:00
Alexey Bataev	f8d0c355f5	[SLP]Prefer instructions, used outside the block, as the initial main copyable instructions. Instructions used outside the block must be considered the first choice for the main instructions in the copyable nodes, to avoid use-before-def. Fixes #171055	2025-12-08 09:46:15 -08:00
Ramkumar Ramachandra	c5b90103da	[VPlan] Use nuw when computing {VF,VScale}xUF (#170710) These quantities should never unsigned-wrap. This matches the behavior if only VFxUF is used (and not VF): when computing both VF and VFxUF, nuw should hold for each step separately.	2025-12-08 15:46:02 +00:00
Luke Lau	e8219e5ce8	[VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690) In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we unprofitably loop vectorize on RISC-V. The loop looks something like: ```c for (int i = 0; i < n; i++) { if (x0[i] == a) if (x1[i] == b) if (x2[i] == c) /* do stuff... */ } ``` Because it's so deeply nested the actual inner level of the loop rarely gets executed. However we still deem it profitable to vectorize, which due to the if-conversion means we now always execute the body. This stems from the fact that `getPredBlockCostDivisor` currently assumes that blocks have a 50% chance of being executed as a heuristic. We can fix this by using BlockFrequencyInfo, which gives a more accurate estimate of the innermost block being executed 12.5% of the time. We can then calculate the divisor (the reciprocal of that probability) as `HeaderFrequency / BlockFrequency`. Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V. Whilst there are a lot of changes in the in-tree tests, this doesn't affect llvm-test-suite or SPEC CPU 2017 that much: - On armv9-a -flto -O3 there's 0.0%/0.2% more geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. - On x86-64 -flto -O3 with PGO there's 0.9%/0% fewer geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO: https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au	2025-12-08 14:28:26 +00:00
Aiden Grossman	7bfdaa51f1	[VPlan] Fix unused variable warning llvm-project/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp:312:19: warning: unused variable 'EB' [-Wunused-variable] 312 | VPBasicBlock *EB = Plan.getExitBlocks().front(); | ^~ This showed up in a non-assertions build.	2025-12-07 18:07:52 +00:00
Florian Hahn	3fc7419236	[VPlan] Replace ExtractLast(Elem|LanePerPart) with ExtractLast(Lane/Part) (#164124) Replace ExtractLastElement and ExtractLastLanePerPart with more generic and specific ExtractLastLane and ExtractLastPart, which model distinct parts of extracting across parts and lanes. ExtractLastElement == ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart == ExtractLastLane, the latter clarifying the name of the opcode. A new m_ExtractLastElement matcher is provided for convenience. The patch should be NFC modulo printing changes. PR: https://github.com/llvm/llvm-project/pull/164124	2025-12-07 15:15:43 +00:00
Florian Hahn	ba836dc5ed	[VPlan] Remove stray space before ops when printing vector-ptr (NFC)	2025-12-06 13:07:07 +00:00
Jerry Dang	23f09fd3e9	[VectorCombine] Fold permute of intrinsics into intrinsic of permutes: shuffle(intrinsic, poison/undef) -> intrinsic(shuffle) (#170052) Add foldPermuteOfIntrinsic to transform: shuffle(intrinsic(args), poison) -> intrinsic(shuffle(args)) when the shuffle is a permute (operates on a single vector) and the cost model determines the transformation is profitable. This optimization is particularly beneficial for subvector extractions where we can avoid computing unused elements. For example: %fma = call <8 x float> @llvm.fma.v8f32(<8 x float> %a, %b, %c) %result = shufflevector <8 x float> %fma, poison, <4 x i32> <0,1,2,3> transforms to: %a_low = shufflevector <8 x float> %a, poison, <4 x i32> <0,1,2,3> %b_low = shufflevector <8 x float> %b, poison, <4 x i32> <0,1,2,3> %c_low = shufflevector <8 x float> %c, poison, <4 x i32> <0,1,2,3> %result = call <4 x float> @llvm.fma.v4f32(%a_low, %b_low, %c_low) The transformation creates one shuffle per vector argument and calls the intrinsic with smaller vector types, reducing computation when only a subset of elements is needed. The existing foldShuffleOfIntrinsics handled the blend case (two intrinsic inputs); this adds support for the permute case (single intrinsic input). Fixes #170002	2025-12-05 15:54:53 +00:00
Florian Hahn	f02dc4d198	[VPlan] Don't try to hoist multi-defs for first-order recurrences. Currently the hoisting implementation expects single-defs. Bail out on multi-defs (VPInterleaveRecipe), to fix an assertion. Fixes https://github.com/llvm/llvm-project/issues/170666	2025-12-04 21:09:16 +00:00
Ramkumar Ramachandra	ef58670f03	Revert [VPlan] Consolidate logic for narrowToSingleScalars (#170720) This reverts commit 7b3ec51, as a crash was reported: https://llvm.godbolt.org/z/dK6ff5zvr -- this will give us time to investigate a re-land.	2025-12-04 19:14:51 +00:00
Alexey Bataev	a2a3d89e08	[SLP][NFC]Hoist invariant request for user nodes out of the loop, NFC	2025-12-04 06:57:54 -08:00
Alexey Bataev	e502dce8b5	[SLP][NFC]Simplify analysis of the scalars, NFC. Just an attempt to simplify some checks, remove extra calls and reorder checks to make code simpler and faster Reviewers: RKSimon, hiraditya Reviewed By: hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/170382	2025-12-04 08:28:38 -05:00
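
Commit 1eba2cbe72 replaces an unmasked scatter whose address is uniform with "extract-last-lane + scalar store". The C sketch below models why that is sound, assuming (as we understand the scatter semantics) that lanes writing to the same address are applied in order from the lowest to the highest lane, so the last lane's value is what remains in memory. The function names are hypothetical; this models the memory effect only, not the VPlan transform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an unmasked scatter: lane i stores vals[i] to
 * addrs[i]. Lanes are applied in increasing order (assumed ordering for
 * overlapping addresses). */
void scatter(int **addrs, const int *vals, size_t lanes) {
  for (size_t i = 0; i < lanes; i++)
    *addrs[i] = vals[i];
}

/* Rewritten form when every lane uses the same address:
 * extract the last lane and perform a single scalar store. */
void extract_last_lane_store(int *addr, const int *vals, size_t lanes) {
  assert(lanes > 0);
  *addr = vals[lanes - 1];
}
```

If all four addresses point at the same int and the lanes hold 10, 20, 30, 40, both forms leave 40 in memory; the scalar form just skips the three stores that are overwritten anyway.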
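
Commit 8a9d9e4853 switches the stride constant to getSigned() because the stride may be negative. One way to see why signedness matters: when a 64-bit stride is materialized in a narrower integer type, a negative value such as -4 read as unsigned does not fit in 32 bits, while read as signed it does. The helpers below are hypothetical and only loosely modeled on the value-fits checks that recent APInt constructors perform (an assumption on our part); they are not LLVM's ConstantInt API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does v, read as unsigned, fit in `bits` bits?
 * Valid for 1 <= bits <= 63. */
bool fits_unsigned(uint64_t v, unsigned bits) {
  assert(bits >= 1 && bits < 64);
  return (v >> bits) == 0;
}

/* Hypothetical check: does v, read as signed, fit in `bits` bits?
 * Valid for 1 <= bits <= 63, so the shifts below never overflow. */
bool fits_signed(int64_t v, unsigned bits) {
  assert(bits >= 1 && bits < 64);
  int64_t lo = -((int64_t)1 << (bits - 1));
  int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
  return v >= lo && v <= hi;
}
```

A stride of -4 passed through an unsigned path becomes 0xFFFFFFFFFFFFFFFC, which fails the 32-bit unsigned check, whereas the signed path accepts it; a positive stride such as 4 passes both.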
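
Commit 0fbb45e7d6 widens the return type of getPredBlockCostDivisor from uint32_t to uint64_t because HeaderFreq / BBFreq can exceed 32 bits. The C sketch below reproduces only that integer arithmetic; the function names are hypothetical and plain uint64_t values stand in for LLVM's BlockFrequency, so it illustrates the failure mode under those assumptions rather than reproducing the LoopVectorize code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: plain uint64_t values stand in for BlockFrequency. */

/* Before the fix (sketch): the ratio is truncated to 32 bits. */
uint32_t pred_divisor_u32(uint64_t header_freq, uint64_t bb_freq) {
  /* The commit notes a block frequency is never zero. */
  assert(bb_freq != 0);
  return (uint32_t)(header_freq / bb_freq);
}

/* After the fix (sketch): the full 64-bit ratio is returned. */
uint64_t pred_divisor_u64(uint64_t header_freq, uint64_t bb_freq) {
  assert(bb_freq != 0);
  return header_freq / bb_freq;
}
```

With header_freq = 1ULL << 32 and bb_freq = 1, the 32-bit version wraps to 0, so a later cost / divisor would divide by zero, while the 64-bit version returns 4294967296.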
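
Commit e8219e5ce8 replaces the fixed 50% assumption in getPredBlockCostDivisor with the ratio HeaderFrequency / BlockFrequency from BlockFrequencyInfo. The sketch below works through the three nested ifs from 531.deepsjeng_r, assuming (hypothetically) that each branch is taken half the time and that the divisor scales a predicated block's cost by plain integer division. All names are illustrative, not LLVM API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each nested `if` is taken with probability 1/2, so a
 * block nested `depth` levels deep runs header_freq >> depth times. */
uint64_t nested_block_freq(uint64_t header_freq, unsigned depth) {
  return header_freq >> depth;
}

/* Divisor as described in the commit: HeaderFrequency / BlockFrequency. */
uint64_t bfi_divisor(uint64_t header_freq, uint64_t block_freq) {
  assert(block_freq != 0);
  return header_freq / block_freq;
}

/* Previous heuristic per the commit: every predicated block runs 50%
 * of the time, i.e. a divisor of 2 regardless of nesting depth. */
uint64_t fixed_divisor(void) { return 2; }

/* Expected per-iteration cost of a predicated block (sketch). */
uint64_t scaled_cost(uint64_t block_cost, uint64_t divisor) {
  assert(divisor != 0);
  return block_cost / divisor;
}
```

For header_freq = 1024 and depth 3, the innermost block runs 128 times, giving a divisor of 8 (12.5%) instead of 2, so a block cost of 80 scales to 10 rather than 40.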
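
Commit 23f09fd3e9 rewrites shuffle(intrinsic(args), poison) into intrinsic(shuffle(args)). For a lane-wise intrinsic like llvm.fma, both orders produce the same lanes, but the second computes only the lanes that survive the shuffle. The C sketch below checks that equivalence on arrays; mul_add is a hypothetical, non-fused stand-in for llvm.fma, and the same expression is used on both paths so the results compare exactly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lane-wise multiply-add standing in for llvm.fma; it is NOT
 * a fused operation, but both paths below use the same expression. */
static float mul_add(float a, float b, float c) { return a * b + c; }

/* Apply mul_add across n lanes; returns the number of lanes computed. */
size_t vec_mul_add(const float *a, const float *b, const float *c,
                   float *out, size_t n) {
  for (size_t i = 0; i < n; i++)
    out[i] = mul_add(a[i], b[i], c[i]);
  return n;
}

/* Single-source permute: out[i] = in[mask[i]].
 * Every mask entry must be a valid index into `in`. */
void permute(const float *in, const int *mask, float *out, size_t n) {
  for (size_t i = 0; i < n; i++)
    out[i] = in[mask[i]];
}
```

Computing all 8 lanes and then extracting <0,1,2,3> yields the same 4 values as extracting the low halves of a, b and c first and computing only 4 lanes; the cost model then decides whether the extra input shuffles are worth the narrower intrinsic.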


6896 Commits