llvm-project

Author	SHA1	Message	Date
vporpo	8d442bc5b5	[SandboxVec][LoadStoreVec] Add support for constants (#189769 ) Up until now the pass would only vectorize load-store pairs. This patch implements vectorization of constant-store pairs.	2026-04-06 11:25:20 -07:00
Florian Hahn	0403639667	[VPlan] Skip successors outside any loop when updating LoopInfo. (#190553 ) Successors outside of any loop do not contribute to the innermost loop, skip them to avoid incorrect results due to getSmallestCommonLoop(nullptr, X) returning nullptr.	2026-04-06 12:58:41 +01:00
Florian Hahn	64a0bd1227	[LV] Return best VPlan together with VF from computeBestVF (NFC). (#190385 ) computeBestVF iterates over all VPlans and picks the VF of the most profitable VPlan. This VPlan is later needed for execution and additional checks. Instead of retrieving it multiple times later, just directly return it from computeBestVF. This removes some redundant lookups. PR: https://github.com/llvm/llvm-project/pull/190385	2026-04-06 11:01:18 +01:00
Florian Hahn	f7cdebb478	[VPlan] Mark unary ops as not having side-effects (NFC). (#190554 ) Mark unary ops (currently only FNeg) as neither reading nor writing memory, similar to binary and cast ops. Should currently be NFC end-to-end.	2026-04-06 09:05:38 +01:00
Florian Hahn	c109dd1e9a	[VPlan] Refactor FindLastSelect matching to use m_Specific(PhiR) (NFC). (#190547 ) Match the select operands directly against PhiR using m_Specific, binding only the non-phi IV expression. This replaces the generic TrueVal/FalseVal matching followed by an assert and conditional extraction. Split off from approved https://github.com/llvm/llvm-project/pull/183911/ as suggested.	2026-04-05 20:07:34 +00:00
Florian Hahn	36e495dd90	[VPlan] Use APSInt in CheckSentinel directly (NFC). (#190534 ) Simplify the sentinel checking logic by using APSInt and checking for both a signed and unsigned sentinel in a single call. Removes the IsSigned argument. Split off from approved https://github.com/llvm/llvm-project/pull/183911/ as suggested.	2026-04-05 16:43:59 +00:00
Florian Hahn	a2c16bb59f	[VPlan] Rename CondSelect to FindLastSelect (NFC). (#190536 ) …ns (NFC). Use the more descriptive name FindLastSelect for the conditional select that picks between the reduction phi and the IV value. Split off from approved https://github.com/llvm/llvm-project/pull/183911/ as suggested.	2026-04-05 16:39:34 +00:00
Hassnaa Hamdi	c5a904946a	[LV][NFC] remove dead code in canFoldTailByMasking() (#190263 ) Remove unused ReductionLiveOuts variable in `canFoldTailByMasking()`. The set was being populated with reduction loop exit instructions but was never actually used anywhere in the function.	2026-04-05 12:59:32 +01:00
Hassnaa Hamdi	6bf8279dc2	[LV][NFC] correct comment for isScalarEpilogueAllowed() (#190254 ) The comment had the opposite meaning of what the function actually does.	2026-04-05 12:55:36 +01:00
vporpo	94545a7c63	[SandboxVec][Legality][NFC] Outline differentBlock() and areUnique() (#190024 ) And reuse them in LoadStoreVec.	2026-04-03 12:14:55 -07:00
Sander de Smalen	730a07f225	[LV] Only create partial reductions when profitable. (#181706 ) We want the LV cost-model to make the best possible decision of VF and whether or not to use partial reductions. At the moment, when the LV can use partial reductions for a given VF range, it assumes those are always preferred. After transforming the plan to use partial reductions, it then chooses the most profitable VF. It is possible for a different VF to have been more profitable, if it wouldn't have chosen to use partial reductions. This PR changes that, to first decide whether partial reductions are more profitable for a given chain. If not, then it won't do the transform. This causes some regressions for AArch64 which are addressed in a follow-up PR to keep this one simple.	2026-04-03 17:42:51 +01:00
Florian Hahn	c963092b0c	[VPlan] Mark VPCanonicalIVPHI as not reading memory (NFCI). (#190338 ) The canonical IV does not access any memory. Mark accordingly. This should be NFC end-to-end. PR: https://github.com/llvm/llvm-project/pull/190338	2026-04-03 13:12:20 +00:00
Ramkumar Ramachandra	e09d1e3ff1	[VPlan] Use not_equal_to to improve code (NFC) (#190262 )	2026-04-03 07:32:34 +01:00
Ramkumar Ramachandra	d0e265f20d	[VPlan] Cleanup and generalize VPIRMetadata CastInfo (NFC) (#190162 ) Similar to b0230f59 ([VPlan] Cleanup and generalize VPPhiAccessors CastInfo, #190027).	2026-04-02 19:00:23 +01:00
Graham Hunter	deaef1c1b7	[LV] Adjust exit recipe detection to run on early vplan (#183318 ) Splitting out some work from #178454; this covers the enums for early exit loop type (none, readonly, readwrite) and the style used (readonly with multiple exit blocks, or masking with the last iteration done in scalar code), along with changing the early exit recipe detection to suit moving the transform for handling early exit readwrite loops earlier in the vplan pipeline.	2026-04-02 17:25:35 +01:00
Ramkumar Ramachandra	bb2a63a673	[VPlan] Use m_Isa to improve code (NFC) (#190149 )	2026-04-02 15:53:05 +01:00
Alexey Bataev	c2f97c5917	[SLP] Do not skip tiny trees with gathered loads to vectorize The isTreeTinyAndNotFullyVectorizable check for 2-node trees (insertelement root + gather child) was too aggressive: it rejected trees even when LoadEntriesToVectorize was non-empty, preventing gathered loads from being vectorized into masked loads/strided loads, etc. Reviewers: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/190181	2026-04-02 09:47:01 -04:00
Alexey Bataev	dc2d25f80b	Revert "[SLP] Do not skip tiny trees with gathered loads to vectorize" This reverts commit 94ec7ffa46d351b86fbbe3a445ceef37f331c4a2 to fix reported issue https://github.com/llvm/llvm-project/pull/190040#issuecomment-4177827078 Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/190176	2026-04-02 09:26:31 -04:00
Ramkumar Ramachandra	b0230f5996	[VPlan] Cleanup and generalize VPPhiAccessors CastInfo (NFC) (#190027 )	2026-04-02 13:47:44 +01:00
Alexey Bataev	94ec7ffa46	[SLP] Do not skip tiny trees with gathered loads to vectorize The isTreeTinyAndNotFullyVectorizable check for 2-node trees (insertelement root + gather child) was too aggressive: it rejected trees even when LoadEntriesToVectorize was non-empty, preventing gathered loads from being vectorized into masked loads/strided loads, etc. Reviewers: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/190040	2026-04-02 06:47:53 -04:00
Ramkumar Ramachandra	d835dd2b43	[LV] Strip createStepForVF (NFC) (#185668 ) The mul -> shl simplification is already done in VPlan.	2026-04-02 10:04:37 +01:00
Alexey Bataev	c6669c4993	[SLP] Guard FMulAdd conversion to require single-use/non-reordered FMul operands The FMulAdd (CombinedVectorize) transformation in transformNodes() marks an FMul child entry with zero cost, assuming it is fully absorbed into the fmuladd intrinsic. However, when any FMul scalar has multiple uses (e.g., also stored separately), the FMul must survive as a separate node. Reviewers: hiraditya, RKSimon, bababuck Pull Request: https://github.com/llvm/llvm-project/pull/189692	2026-04-01 17:14:52 -04:00
Ramkumar Ramachandra	934438ad86	[VPlanPatternMatch] Unify and clarify m_Isa (NFC) (#189941 )	2026-04-01 20:48:46 +01:00
Ramkumar Ramachandra	44979bedf0	[VPlan] Strip dead code in isUniformAcrossVFsAndUFs (NFC) (#189687 ) Checking a VPInstruction for scalar-cast is equivalent to checking opcode against Instruction::isCast via preservesUniformity.	2026-04-01 17:38:41 +01:00
Ramkumar Ramachandra	82e8494070	[VPlan] Avoid unnecessary BTC SymbolicValue creation (NFC) (#189929 ) Don't unnecessarily create a backedge-taken-count SymbolicValue. This allows us to simplify some code.	2026-04-01 16:25:48 +00:00
Alexey Bataev	1e06cd634e	[SLP][NFC] Fix uninitialized ReductionRoot in getTreeCost ReductionRoot was initialized to nullptr instead of the RdxRoot parameter. This caused two ScaleCost calls (for MinBWs cast cost and ReductionBitWidth resize cost) to pass nullptr as the user instruction, and suppressed the "Reduction Cost" line in debug output. In practice the scale factor is the same because the tree root's main op and the reduction root share the same basic block, so this is NFC. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/189994	2026-04-01 12:22:02 -04:00
Florian Hahn	0b61cd39e4	[LV] Add epilogue minimum iteration check in VPlan as well. (#189372 ) Update LV to also use the VPlan-based addMinimumIterationCheck for the iteration count check for the epilogue. As the VPlan-based addMinimumIterationCheck uses VPExpandSCEV, those need to be placed in the entry block for now, moving vscale * VF * IC to the entry for scalable vectors. The new logic also fails to simplify some checks involving PtrToInt, because they were only simplified when going through generated IR, then folding some PtrToInt in IR, then constructing SCEVs again. But those should be cleaned up by later combines, and there is not really much we can do other than trying to go through IR. PR: https://github.com/llvm/llvm-project/pull/189372	2026-04-01 15:47:41 +01:00
David Green	fd40c60665	[VectorCombine] Fix transitive Uses in foldShuffleToIdentity (#188989 ) The Uses in foldShuffleToIdentity is intended to detect where an operand is used to distinguish between splats, identities and concats of the same value. When looking through multiple unsimplified shuffles the same Use could be both a splat and an identity though. This patch changes the Use to a Value and an original Use, so that even if we are looking through multiple vectors we recognise the splat vs identity vs concat of each use correctly. Fixes #180338	2026-04-01 14:53:04 +01:00
Ramkumar Ramachandra	3068132e32	[LV] Use bind_front in tryToOptimizeInductionTruncate (NFC) (#189763 )	2026-04-01 08:19:49 +01:00
Alexey Bataev	c20e233020	[SLP] Replace TrackedToOrig DenseMap with parallel SmallVector in reduction Replace the DenseMap<Value, Value> TrackedToOrig with a SmallVector<Value*> indexed in parallel with Candidates. This avoids hash-table overhead for the tracked-value-to-original-value mapping in horizontal reduction processing. Fixes #189686	2026-03-31 16:22:57 -07:00
Henry Jiang	5d624b5b93	[VPlan] Stop outerloop vectorization from vectorizing nonvector intrinsics (#185347 ) In outer-loop VPlan, avoid emitting vector intrinsic calls for intrinsics without a vector form. In VPRecipeBuilder, detect missing vector intrinsic mapping and emit scalar handling instead of a vector call. Also fix assertion when `llvm.pseudoprobe` in VPlan's native path is being treated as a `WIDEN-INTRINSIC`. Reproducer: https://godbolt.org/z/GsPYobvYs	2026-03-31 16:01:39 -07:00
vporpo	d8e9e0af1c	[SandboxVec][LoadStoreVec] Initial pass implementation (#188308 ) This patch implements a new simple region pass that can vectorize store-load chains.	2026-03-31 15:15:43 -07:00
Florian Hahn	ff4e229f8c	Revert "[VPlan] Extract reverse mask from reverse accesses" (#189637 ) Reverts llvm/llvm-project#155579 Assertion added triggers on some buildbots clang: /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840: virtual InstructionCost llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &) const: Assertion `!IsReverse() && "Inconsecutive memory access should not have reverse order"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang -DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3 -std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o -MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o CMakeFiles/timberwolfmc.dir/finalpin.c.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c	2026-03-31 15:53:01 +01:00
Ramkumar Ramachandra	c592eba498	[VPlan] Use RPOT in CSE, fixing potential crash (#187548 ) A CSE crash is observed arising from outdated hash values unless we forbid replacements in successor phis in blocks that are not dominated by the def: the crash is observed when there is a block with CSE'able phis with CSE'able incoming values, with incoming values coming from a non-dominating block, under the condition that the block with the phis is visited before the non-dominating block. It is unfortunately impossible to write a test case showing a crash at present, but crashes do occur when attempting to CSE DerivedIV recipes. The root cause of the crash is visiting a non-dominated use before a def, and hence would be fixed by a reverse post-order traversal. Fixes #187499. Co-authored-by: Luke Lau <luke@igalia.com>	2026-03-31 10:40:03 +01:00
Mel Chen	f76f41f702	[VPlan] Extract reverse mask from reverse accesses (#155579 ) Following #146525, separate the reverse mask from reverse access recipes. At the same time, remove the unused member variable `Reverse` from `VPWidenMemoryRecipe`. This will help to reduce redundant reverse mask computations by VPlan-based common subexpression elimination.	2026-03-31 08:51:15 +00:00
Vasileios Porpodas	47e3f42bc7	Reapply "[SandboxVec][VecUtils] Lane Enumerator (#188355 )" This reverts commit c93049ef504f942af0f884ce8a5efc21df21d131.	2026-03-31 00:19:25 +00:00
Demetrius Kanios	96bd7b6e15	[CodeGen] Add additional params to `TargetLoweringBase::getTruncStoreAction` (#187422 ) The truncating store analogue of #181104. Adds `Alignment` and `AddrSpace` parameters to `TargetLoweringBase::getTruncStoreAction` and dependents, and introduces a `getCustomTruncStoreAction` hook for targets to customize legalization behavior using this new information. This change is fully backwards compatible from the target's point of view, with `setTruncStoreAction` having identical functionality. The change is purely additive.	2026-03-30 16:52:45 -07:00
Vasileios Porpodas	c93049ef50	Revert "[SandboxVec][VecUtils] Lane Enumerator (#188355 )" This reverts commit 02402beefec61c5947c9d3bec60626a4afd860a8.	2026-03-30 22:12:02 +00:00
vporpo	02402beefe	[SandboxVec][VecUtils] Lane Enumerator (#188355 ) This patch introduces an iterator that helps us iterate over lane-value pairs in a range. For example, given a container `(i32 %v0, <2 x i32> %v1, i32 %v2)` we get the lane-value pairs `(0, %v0)`, `(1, %v1)`, `(3, %v2)`. We use this iterator to replace the lane counting logic in BottomUpVec.cpp.	2026-03-30 15:08:16 -07:00
Alexey Bataev	26e0d15eaa	[SLP] Prefer to trim equal-cost alternate-shuffle subtrees If the trimming candidate subtree is rooted at an alternate-shuffle node with binary ops, and this subtree has the same cost as the buildvector node cost, better to stick with the buildvector node to avoid runtime perf regressions from shuffle/extra operations overhead that the cost model may underestimate. Skip trimming if the subtree contains ExtractElement nodes, since those operate on already-materialized vectors, which may reduce vector-to-scalar code movement and have better perf. Reviewers: hiraditya, bababuck, fhahn, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/188272	2026-03-30 16:03:18 -04:00
Hassnaa Hamdi	1d48c3bc01	[NFC][LV] Separate control-flow masking from tail-folding masking. (#169509 ) - Differentiate between operations that need masking because they are in a conditionally-executed block, and operations that need masking because the loop is tail-folded (predicated). - This is needed for future work when we need to support a predicated vector epilogue in combination with an unpredicated vector body. - This is the first patch in a series. - See #181401 for the follow-on work.	2026-03-30 17:09:34 +01:00
Florian Hahn	713c70d7ef	[VPlan] Handle regions with live-outs and scalar VF when replicating. (#186252 ) Extend initial unrolling of replicate regions (https://github.com/llvm/llvm-project/pull/170212) to support live-outs, if the VF is scalar. This allows adding the logic needed to explicitly unroll, and replacing VPPredPhiInsts with regular scalar VPPhi, without yet having to worry about packing values into vector phis. This will be done in a follow-up change, which means all replicate regions will be fully dissolved. PR: https://github.com/llvm/llvm-project/pull/186252	2026-03-30 13:23:23 +01:00
Florian Hahn	53e7f9ada4	[VPlan] Generalize header-phi detection in VPPhi::execute. (NFC) (#189352 ) Generalize the header-phi detection in VPPhi::execute to use VPDT. This is currently NFC, but is needed to use VPPhi also for dissolving replicate regions (https://github.com/llvm/llvm-project/pull/186252). Split off from approved https://github.com/llvm/llvm-project/pull/186252 as suggested. PR: https://github.com/llvm/llvm-project/pull/189352	2026-03-30 12:21:22 +01:00
Alexey Bataev	c7908d3320	[SLP][NFC]Use passing-by-ref in the range based loop to prevent warnings/errors	2026-03-30 03:47:00 -07:00
Ramkumar Ramachandra	8a4f21048f	[VPlan] Generalize noalias-licm-check to replicate regions (NFC) (#187017 ) In order to use the cannotHoistOrSinkWithNoAlias check in use-sites after replicate regions are created, generalize it to work with replicate regions.	2026-03-30 09:17:39 +01:00
Florian Hahn	b5d43f7794	[VPlan] Use transferSuccessors in mergeBlocksIntoPredecessors (NFC). (#189275 ) transferSuccessors is more compact and is guaranteed to preserve the predecessor/successor order properly in all cases. This is not an issue today, but will when used in more places, including #186252. Split off from approved https://github.com/llvm/llvm-project/pull/186252. PR: https://github.com/llvm/llvm-project/pull/189275	2026-03-29 20:20:23 +01:00
Florian Hahn	c467d38090	[LV] Fix offset handling for epilogue resume values. (NFCI) (#189259 ) Instead of replacing all uses of the canonical IV with an add of the resume value and then relying on the fold to simplify, directly create offset versions of both the canonical IV and its increment. The original offset computations were incorrect, but did not result in miscompiles because of the corresponding fold. Split off from approved https://github.com/llvm/llvm-project/pull/156262.	2026-03-29 17:04:50 +00:00
Alexey Bataev	4450891580	[SLP] Check if potential bitcast/bswap candidate is a root of reduction Need to check if the potential bitcast/bswap-like construct is a root of the reduction, otherwise it cannot represent a bitcast/bswap construct. Fixes #189184	2026-03-28 13:58:22 -07:00
Ryan Buchner	a125d9b5ef	[SLP][NFC] Reapply "Refactor to prepare for constant stride stores" (#188689 ) Refactor to precede #185964. Much of this is a refactor to address these issues. Instead of iterating over one chain at a time, attempting all VFs for that given chain, we now iterate over VFs, trying each chain for the current VF. Includes a fix for a use-after-free bug.	2026-03-27 10:11:49 -07:00
Ramkumar Ramachandra	840e9a4ddd	[VPlan] Fix wrap-flags on WidenInduction unroll (#187710 ) Due to a somewhat recent change, IntOrFpInduction recipes have associated VPIRFlags. The VPlanUnroll logic for WidenInduction recipes predates this change, and computes incomplete wrap-flags: update it to simply use the flags on IntOrFpInduction recipes; PointerInduction recipes have no associated flags, and indeed, no flags should be used.	2026-03-27 13:26:04 +00:00
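
The lane numbering described in commit 02402beefe ([SandboxVec][VecUtils] Lane Enumerator) can be illustrated with a small standalone sketch. This is not the code from that patch: `LaneValue` and `enumerateLanes` are hypothetical names invented here, and values are modelled as plain strings rather than Sandbox IR values. It only shows the counting rule implied by the commit message, where each value starts at the lane following the last lane of the previous value.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR value: a name plus the number of lanes it
// occupies (1 for a scalar, N for an <N x T> vector).
struct LaneValue {
  std::string Name;
  unsigned NumLanes;
};

// Returns (first lane, value name) pairs. The lane counter advances by the
// number of lanes each value occupies, so a vector value skips ahead.
std::vector<std::pair<unsigned, std::string>>
enumerateLanes(const std::vector<LaneValue> &Values) {
  std::vector<std::pair<unsigned, std::string>> Result;
  unsigned Lane = 0;
  for (const LaneValue &V : Values) {
    assert(V.NumLanes > 0 && "every value occupies at least one lane");
    Result.emplace_back(Lane, V.Name);
    Lane += V.NumLanes;
  }
  return Result;
}
```

Under these assumptions, calling `enumerateLanes` on `{"%v0", 1}, {"%v1", 2}, {"%v2", 1}` pairs `%v0` with lane 0, `%v1` with lane 1, and `%v2` with lane 3, matching the example in the commit message.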

7433 Commits