llvm-project

Author	SHA1	Message	Date
Florian Hahn	90b3712d8a	Reapply "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)" This reverts commit d1e477b00b49c63ff4dd513eeb14a5b18bc055d7. Recommit with extra checks making sure extends are VPWidenCastRecipes, rejecting VPReplicateRecipes. Original message: As a first step, move the existing partial reduction detection logic to VPlan, trying to preserve the existing code structure & behavior as closely as possible. With this, partial reductions are detected and created together in a single step. This allows forming partial reductions and bundling them up if profitable together in a follow-up. PR: https://github.com/llvm/llvm-project/pull/167851	2026-02-01 16:27:27 +00:00
Martin Storsjö	d1e477b00b	Revert "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851 )" This reverts commit f4e8cc1a2229dca76d21c8d37439c4c194b06b86. This change wasn't NFC; it causes failed asserts when building ffmpeg for i686 windows, see https://github.com/llvm/llvm-project/pull/167851 for details.	2026-02-01 14:35:02 +02:00
Florian Hahn	f4e8cc1a22	[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851 ) As a first step, move the existing partial reduction detection logic to VPlan, trying to preserve the existing code structure & behavior as closely as possible. With this, partial reductions are detected and created together in a single step. This allows forming partial reductions and bundling them up if profitable together in a follow-up. PR: https://github.com/llvm/llvm-project/pull/167851	2026-01-31 19:44:46 +00:00
Florian Hahn	a0b99e32d3	[LV] Add additional partial reduction test coverage for #167851. Add test cases for which earlier versions of https://github.com/llvm/llvm-project/pull/167851 were not NFC. Test chained_sext_adds is moved to a new file.	2026-01-30 20:31:32 +00:00
Andrei Elovikov	d8621d665d	Reapply "[VPlan] Add hidden `-vplan-print-after-all` option" (#178547 ) Re-commit of https://github.com/llvm/llvm-project/pull/175839 after fixing build without `LLVM_ENABLE_DUMP`. This consists of the following changes: * Merge several overloads of `VPlanTransforms::runPass` into a single function to avoid code duplication. * Add helper macro `RUN_VPLAN_PASS` to capture the transformation name and pass it to the helper above for printing. * Add new `-vplan-print-after-all` option (somewhat similar to existing `-vplan-verify-each`). * Add two empty passes `printAfterInitialConstruction`/`printFinalVPlan` so that initial/final VPlans would be supported in `-vplan-print-after-all` This follows the original future plans in https://github.com/llvm/llvm-project/pull/123640.	2026-01-30 19:55:09 +00:00
Florian Hahn	abfd56293c	[VPlan] Mark VPActiveLaneMaskPHIRecipe as readnone. (#177886 ) VPWidenActiveLaneMaskPHIRecipe does not have side-effects and also does not access memory. Mark accordingly. This allows hoisting of some invariant loads out of loops and also removing unused phi recipes in the future. In llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll, the hoisting makes vectorization profitable. PR: https://github.com/llvm/llvm-project/pull/177886	2026-01-30 16:12:30 +00:00
Sander de Smalen	a726b1907a	NFC: Cleanup AArch64/partial-reduce-chained.ll This had some loop attributes that were unused. Also cleaned up the flags a little bit.	2026-01-30 14:59:38 +00:00
Sander de Smalen	b4c7518a0f	[LV] Add support for extended fadd reductions (#178447 ) This makes use of the llvm.vector.partial.reduce.fadd intrinsics added in #163975 to handle the following with FDOT: ``` float32_t fdot(float16_t *src, int N) { float32_t sum = 0.0f; for (int i=0; i<N; ++i) sum += src[i]; return sum; } ```	2026-01-30 08:27:57 +00:00
Florian Hahn	eabcdb572b	Revert "[VPlan] Add hidden `-vplan-print-after-all` option (#175839 )" (#178544 ) This reverts commit 97e1df149de213b760aae4060ee9e25dc9908125. It looks like the commit caused some build bot failures. Revert back to green so the failures can be investigated. https://lab.llvm.org/buildbot/#/builders/159/builds/39803 https://lab.llvm.org/buildbot/#/builders/2/builds/43204	2026-01-28 23:49:24 +00:00
Andrei Elovikov	97e1df149d	[VPlan] Add hidden `-vplan-print-after-all` option (#175839 ) This consists of the following changes: * Merge several overloads of `VPlanTransforms::runPass` into a single function to avoid code duplication. * Add helper macro `RUN_VPLAN_PASS` to capture the transformation name and pass it to the helper above for printing. * Add new `-vplan-print-after-all` option (somewhat similar to existing `-vplan-verify-each`). * Add two empty passes `printAfterInitialConstruction`/`printFinalVPlan` so that initial/final VPlans would be supported in `-vplan-print-after-all` This follows the original future plans in https://github.com/llvm/llvm-project/pull/123640.	2026-01-28 22:25:54 +00:00
Jay Foad	c75d371f57	[LLVM] Fix typo "LABLE" in test checks (#178451 )	2026-01-28 17:31:05 +00:00
Florian Hahn	e36cd26618	[VPlan] Remove non-reductions after simplifications. (#176795 ) In some cases, we identify patterns as reductions, even though they can be simplified to a non-reduction. Mark VPReductionPHIRecipe as not reading from memory & not having side-effects, to clean them up. We also need to remove ComputeReductionResult VPInstructions with live-in arguments. This means there is actually no reduction, and we need to fold it to the live in. Otherwise we would incorrectly reduce the live-in. PR: https://github.com/llvm/llvm-project/pull/176795	2026-01-28 15:51:08 +00:00
Damian Heaton	762ba885f9	[LV] Add support for llvm.vector.partial.reduce.fadd (#163975 ) Allows the Loop Vectorizer to generate `llvm.vector.partial.reduce.fadd` intrinsics when sequences which match its requirements are found.	2026-01-28 15:05:34 +00:00
Mel Chen	2f92d44043	[LV] Pre-commit test for sinking the recipe into vector early exit block. nfc (#177954 ) Pre-commit for #168031	2026-01-28 04:17:12 +00:00
Jim Lin	0ed8e7230f	[VPlan] Create SCEV before any VPIRInstructions to check for overflow (#177911) This PR fixes the assertion failure at VPlanTransforms.cpp:4862, which occurred because SCEV was created after VPIRInstructions. The trip count in scalable-predication.ll was changed from the constant value 256 to the non-constant value %n so that the VPIRInstructions are not optimized out; with them optimized out, the test cannot trigger the assertion failure. The order in ir-bb<entry> changes from: ir-bb<entry>: EMIT vp<%2> = EXPAND SCEV (1 umax %n) EMIT vp<%3> = sub ir<-1>, vp<%2> EMIT vp<%4> = EXPAND SCEV (4 * vscale)<nuw> EMIT vp<%5> = icmp ult vp<%3>, vp<%4> EMIT branch-on-cond vp<%5> Successor(s): scalar.ph, vector.ph to: ir-bb<entry>: EMIT vp<%2> = EXPAND SCEV (1 umax %n) EMIT vp<%3> = EXPAND SCEV (4 * vscale)<nuw> EMIT vp<%4> = sub ir<-1>, vp<%2> EMIT vp<%5> = icmp ult vp<%4>, vp<%3> EMIT branch-on-cond vp<%5> Successor(s): scalar.ph, vector.ph	2026-01-28 03:16:50 +00:00
Tomer Shafir	4239e858fe	[AArch64] Align nontemporal store/load little-endian checks (#177468) This patch aims to align all nontemporal store/load handling to systematically enforce a little-endian target. This has been the effective support LLVM had for NT store/load lowering (there has been no effective support for big-endian, even with the inconsistencies). The change in `llvm/lib/Target/AArch64/AArch64InstrInfo.td` is effectively NFC, because the only lowering of LDNP, in `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp`, already checks for `isLittleEndian`. The change in `llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h` affects its single caller `llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp`. The previous logic was wrong, enabling vectorization of effectively illegal nontemporal store/load instructions on big-endian.	2026-01-27 21:37:21 +02:00
Ryan Buchner	2753e1dedf	[RISCV] Set the reciprocal throughput cost for division to TTI::TCC_Expensive (#177516) Fixes #176208. Scaled back version of #176515 that only affects the RISCV backend. Only modifies the cost for cases when DIV is a legal operation. Updates the cost for both Scalar and Vector types. Used `TTI::TCC_Expensive` as suggested by https://github.com/llvm/llvm-project/issues/176208#issuecomment-3760902537. --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>	2026-01-27 11:01:19 -08:00
Florian Hahn	537c648fc0	[LV] Precommit extra argmin/argmax tests for #170223 . Precommit extra tests for https://github.com/llvm/llvm-project/pull/170223	2026-01-26 21:32:50 +00:00
Florian Hahn	7b445ddcd2	[LV] Add additional tests for early-exit loops loads not known deref. Add additional test coverage for loops with loads that are not known to be dereferenceable.	2026-01-25 23:09:58 +00:00
Mircea Trofin	b39568d782	[LV] capture branch weights for constant trip counts (#175096) When a vectorized loop has a constant trip count, it's important to update the profile information accordingly. Hotness analysis will only look at profile info. For example, in the `tripcount.ll` test, without producing the profile info, in the `const_trip_over_profile` function, the BFI of the `vector.body` would be 32 (this is the expected value when synthetic branch weights are used, in loops). The real value is 250. The `for.body` value was _very_ incorrect before, too (and detrimentally so, as it would have appeared as "very hot" when it wasn't): The table below was obtained by printing BFI in the RUN: command, i.e. `build/bin/opt < llvm/test/Transforms/LoopVectorize/tripcount.ll -passes="loop-vectorize,print<block-freq>" -loop-vectorize-with-block-frequency -S -o /dev/null`. Showing only the `float` value, i.e. the BFI relative to the function entry BB. ``` Printing analysis results of BFI for function 'const_trip_over_profile': block-frequency-info: const_trip_over_profile ``` \| Block \| Before \| After \| \| ----- \| ------ \| ----- \| \| `entry` \| float = 1.0 \| float = 1.0 \| \| `vector.ph` \| float = 1.0 \| float = 1.0 \| \| `vector.body` \| float = 32.0 \| float = 250.0 \| \| `middle.block` \| float = 1.0 \| float = 1.0 \| \| `scalar.ph` \| float = 1.0 \| float = 1.0 \| \| `for.body` \| float = 2147483647.8 \| float = 1.0 \| \| `for.end` \| float = 1.0 \| float = 1.0 \|	2026-01-23 14:31:05 -08:00
Mel Chen	149c76538e	[LV] Separate runtime check cost from total overhead in profitability check (#176754 ) In isOutsideLoopWorkProfitable function, there are two places where only the runtime check cost (RtC) should be used, but incorrectly included the costs of middle blocks and early-exit blocks. 1. VectorizeMemoryCheckThreshold comparison for interleaving-only 2. Minimum trip count that bounds runtime check overhead, i.e. MinTC2 calculation This results in an overly conservative minimum profitable trip count. This patch separates the runtime check cost from the total overhead cost, and uses only RtC for VectorizeMemoryCheckThreshold comparison and the MinTC2 calculation.	2026-01-23 07:29:56 +00:00
Florian Hahn	7ea1fa591a	[LV] Skip FindLast reductions in collectInLoopReductions. FindLast in-loop reductions are not supported, similarly to FindLastIV reductions. Skip them in collectInLoopReductions, to avoid a crash for loops with FindLast reductions and in-loop reductions preferred.	2026-01-22 21:49:52 +00:00
Florian Hahn	14a209f852	[VPlan] Replace ComputeFindIVRes with ComputeRdxRes + cmp + sel (NFC) (#176672) Replace ComputeFindIVResult with ComputeReductionResult + explicit compare + select, to model finding the first/last induction more explicitly and more simply; it boils down to a min/max reduction + compare and select of the sentinel value. PR: https://github.com/llvm/llvm-project/pull/176672	2026-01-22 19:28:47 +00:00
Florian Hahn	f41767db9d	[LV] Add replicating load/store cost tests for Apple CPUs. Add dedicated tests to check replicating load/store costs on Apple CPUs.	2026-01-21 16:37:12 +00:00
Florian Hahn	d3f2f1366d	[LV] Consider UserIC when limiting VF. (#174573) If a UserIC is provided, the vector loop will process VF * UserIC iterations per vector iteration. Pass UserIC through to computeFeasibleMaxVF and use it to limit the max VF to factors where VF * UserIC <= MaxTripCount. This avoids creating dead vector loops with user provided interleave counts. PR: https://github.com/llvm/llvm-project/pull/174573	2026-01-20 14:19:11 +00:00
David Sherwood	abc924356e	[LV][NFC] Update low trip count tail-folding tests (#176898 ) Whilst reviewing PR #176754 I realised there seemed to be some odd cost model issues for the tests in file LoopVectorize/AArch64/fold-tail-low-trip-count.ll where we seemed to be vectorising loops that aren't worth it. It turns out the tests were not targeting AArch64 despite being in the AArch64 directory. I fixed the RUN line for the file and also added a new file for RISCV so we get more test coverage.	2026-01-20 12:15:50 +00:00
Florian Hahn	69fbab2b72	[VPlan] Fall back to legacy cost if operands may be force-scalarized. If any of the operands of a VPReplicateRecipe have been force-scalarized, then the legacy cost model skips the scalarization overhead, but we cannot match this in the VPlan cost model. Bail out for now in those very rare cases. Fixes https://github.com/llvm/llvm-project/issues/176720.	2026-01-19 21:25:54 +00:00
Florian Hahn	e34fefdb35	[LV] Add extra tests with sink-able recipes. Add extra test coverage for https://github.com/llvm/llvm-project/pull/168031.	2026-01-19 18:33:55 +00:00
Florian Hahn	c6f3bc888b	[LV] Add single-iteration epilogue test with degenerate reduction. While at it, also modernize the check lines to reduce diff of future changes.	2026-01-19 17:43:40 +00:00
Florian Hahn	5e5d6389f6	[LV] Allow loops with multiple early exits in legality checks. (#176403 ) This patch removes the single uncountable exit constraint, allowing loops with multiple early exits, if the exits form a dominance chain and all other constraints hold for all uncountable early exits. While legality now accepts such loops, vectorization is not yet supported. VPlan support will be added in a follow up: https://github.com/llvm/llvm-project/pull/174864 PR: https://github.com/llvm/llvm-project/pull/176403	2026-01-19 12:32:04 +00:00
Florian Hahn	497a6d6722	Recommit "[VPlan] Only use isAddressSCEVForCost in legacy getAddressAccSCEV" This reverts commit ed004cf42bf57ca79b57bc3076ef83a8477426ea. The original commit exposed an independent cost issue, triggering an assertion. That issue has been fixed in 3457e7efc3. Reland the patch now that the assertion has been fixed.	2026-01-18 19:55:46 +00:00
Florian Hahn	3457e7efc3	[VPlan] Match inverted logical AND/OR for select costs. VPlan transforms may invert logical AND/OR selects, which can impact costs on targets where the select is not cheap but the boolean AND/OR is. Also match the inverted logical AND/OR; this improves the accuracy of the cost estimation and fixes the underlying issue for the cost divergence between legacy and VPlan-based cost model that caused the revert of 01d34eb38fa058 in ed004cf42bf57c.	2026-01-18 16:15:42 +00:00
Florian Hahn	123acb24da	[LV] Add missing coverage for LV cost model code paths. Add a set of tests that expose crashes with some upcoming and pending patches.	2026-01-17 21:59:50 +00:00
Florian Hahn	5995fe951f	[VPlan] Normalize selects to always select the data op when cond is true. Fix a miscompile in the FindLast handling by normalizing selects with the phi node as the first op to ones that select the data value when the condition is true, by swapping operands and inverting the condition. This should ensure correct codegen for both cases. Select normalization: https://alive2.llvm.org/ce/z/yFdivK Fixes a miscompile reported for 2abd6d6d7ac (#158088).	2026-01-17 18:30:52 +00:00
Florian Hahn	421d50b5bf	[LV] Add additional tests for miscompile caused by 2abd6d6d7a. Add tests showing mis-compiles caused by 2abd6d6d7a (#158088).	2026-01-17 16:30:54 +00:00
Florian Hahn	459990dcf7	[VPlan] Replace PhiR operand of ComputeFindIVResult with VPIRFlags. #174026 (#175461 ) Replace the Phi recipe operand of ComputeFindIVResult with VPIRFlags, building on top of https://github.com/llvm/llvm-project/pull/174026. PR: https://github.com/llvm/llvm-project/pull/175461	2026-01-17 16:23:33 +00:00
Florian Hahn	1056e32785	[LV] Precommit additional early-exit tests from #174864 . Pre-commit tests from https://github.com/llvm/llvm-project/pull/174864.	2026-01-16 15:18:20 +00:00
Florian Hahn	3fb914d851	[SCEV] Add initial support for ptrtoaddr. (#158032 ) Add initial support for PtrToAddr to SCEV, including a new SCEVPtrToAddrExpr and SCEV expansion support for it. PR: https://github.com/llvm/llvm-project/pull/158032	2026-01-16 11:58:04 +00:00
Vishruth Thimmaiah	04baf1105f	[LoopVectorize] Support vectorization of overflow intrinsics (#174835 ) Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`, `ssub`, `umul` and `smul` as trivially vectorizable. Fixes #174617 --------- Signed-off-by: vishruth-thimmaiah <vishruththimmaiah@gmail.com>	2026-01-16 10:09:42 +00:00
Elvis Wang	aa11629192	[LV] Prevent `extract-lane` generate unused IRs with single vector operand. (#172798) When `extract-lane` has only a single vector operand, it can be simplified to `extractelement`. This patch makes `extract-lane` generate a plain `extractelement` in that case, so that no unused IR is generated. This patch is mostly NFC, since the unused IR should be removed by subsequent IR passes.	2026-01-16 13:59:51 +08:00
Mircea Trofin	88d3078e81	[NFC] use UTC for LoopVectorize/tripcount.ll (#175095 )	2026-01-15 15:01:26 -08:00
Florian Hahn	f14577fa6f	[VPlan] Fold boolean select to xor if possible. Fold select c, false, true -> not c. This allows for more accurate cost estimation and fixes the underlying issue for the cost divergence between legacy and VPlan-based cost model that caused the revert of 01d34eb38fa058 in ed004cf42bf57c. https://alive2.llvm.org/ce/z/yVuSgW.	2026-01-15 22:13:47 +00:00
Florian Hahn	808a6ba345	[VPlan] Bail out when rdx result cannot be found in handleFindLast. Turn assertion from 2abd6d6d7ac (https://github.com/llvm/llvm-project/pull/158088) into a bail out to prevent crash when tail-folding. Fixes https://github.com/llvm/llvm-project/issues/175990.	2026-01-15 09:09:19 +00:00
Luke Lau	d023577ef9	[VPlan] Explicitly test EVL recipe has "evl" name. NFC Addresses the comment in https://github.com/llvm/llvm-project/pull/175493#pullrequestreview-3651607778	2026-01-15 15:25:01 +08:00
Florian Hahn	8c5352cf3e	[LV] Add additional cost and folding test coverage. (NFC)	2026-01-14 22:19:11 +00:00
Florian Hahn	f61aab79ce	[VPlan] Handle min/max recur kinds in ::printFlags. Following up to d5c11b9a24c84f, also handle min/max recurrence kinds in ::printFlags, so the proper kind is printed instead of icmp. NFC modulo debug printing changes.	2026-01-14 18:16:11 +00:00
Florian Hahn	b59a3dfaf1	[VPlan] Add printing test for UMax reduction (NFC). Currently compute-reduction-result prints (icmp) instead of the correct min/max kind.	2026-01-14 17:44:02 +00:00
Graham Hunter	8866af03c2	Require asserts for a debug printing test	2026-01-14 16:07:35 +00:00
Graham Hunter	2abd6d6d7a	[LV] Vectorize conditional scalar assignments (#158088 ) Based on Michael Maitland's previous work: https://github.com/llvm/llvm-project/pull/121222 This PR uses the existing recurrences code instead of introducing a new pass just for CSA autovec. I've also made recipes that are more generic.	2026-01-14 14:59:18 +00:00
Florian Hahn	d5c11b9a24	[VPlan] Replace PhiR operand of ComputeRdxResult with VPIRFlags. (#174026 ) Remove the artificial PhiR operand of ComputeReductionResult, which was only used to look up recurrence kind, in-loop and ordered properties. Instead, encode them as VPIRFlags as suggested by @ayalz in https://github.com/llvm/llvm-project/pull/170223. This addresses a TODO to make codegen for ComputeReductionResult independent of looking up information from other recipes. This is NFC w.r.t. codegen, the printing has been improved to include the reduction type, and whether it is in-loop/ordered. PR: https://github.com/llvm/llvm-project/pull/174026	2026-01-14 07:45:44 +00:00

3791 Commits