llvm-project

Author	SHA1	Message	Date
Florian Hahn	b76089c7f3	[VPlan] Skip uses-scalars restriction if one of ops needs broadcast. (#168246 ) Update the logic in narrowToSingleScalar to allow narrowing even if not all users use scalars, if at least one of the operands already needs broadcasting. In that case, there won't be any additional broadcasts introduced. This should allow removing the special handling for stores, which can introduce additional broadcasts currently. Fixes https://github.com/llvm/llvm-project/issues/169668. PR: https://github.com/llvm/llvm-project/pull/168246	2025-11-28 10:26:27 +00:00
Shih-Po Hung	b9bdec3021	[TTI][Vectorize] Migrate masked/gather-scatter/strided/expand-compress costing (NFCI) (#165532 ) In #160470, there is a discussion about the possibility of exploring a general approach for handling memory intrinsics. API changes: - Remove getMaskedMemoryOpCost, getGatherScatterOpCost, getExpandCompressMemoryOpCost, getStridedMemoryOpCost from Analysis/TargetTransformInfo. - Add getMemIntrinsicInstrCost. In BasicTTIImpl, map intrinsic IDs to existing target implementation until the legacy TTI hooks are retired. - masked_load/store → getMaskedMemoryOpCost - masked_/vp_gather/scatter → getGatherScatterOpCost - masked_expandload/compressstore → getExpandCompressMemoryOpCost - experimental_vp_strided_{load,store} → getStridedMemoryOpCost TODO: add support for vp_load_ff. No functional change intended; costs continue to route to the same target-specific hooks.	2025-11-28 05:14:37 +00:00
Florian Hahn	8459508227	[VPlan] Handle scalar VPWidenPointerInd in convertToConcreteRecipes. (#169338 ) In some cases, VPWidenPointerInductions become only used by scalars after legalizeAndOptimizeInductions was already run, for example due to some VPlan optimizations. Move the code to scalarize VPWidenPointerInductions to a helper and use it if needed. This fixes a crash after #148274 in the added test case. Fixes https://github.com/llvm/llvm-project/issues/169780	2025-11-27 21:52:15 +00:00
Florian Hahn	8f36135aea	[VPlan] Add m_Intrinsic matcher that takes a variable intrinsic ID (NFC) Add a variant of m_Intrinsic that matches a variable runtime ID.	2025-11-27 21:23:29 +00:00
Florian Hahn	db85babddd	[VPlan] Use m_Intrinsic to match assumes/noalias_scope_decl (NFC). Use pattern matching to check for intrinsics to slightly simplify code.	2025-11-27 18:50:34 +00:00
Luke Lau	1c7ec06b16	[VPlan] Optimize LastActiveLane to EVL - 1 (#169766 ) With EVL tail folding, the LastActiveLane can be computed with EVL - 1. This removes the need for a header mask and vfirst.m for loops with live outs on RISC-V: # %bb.5: # %for.cond.cleanup7 - vsetvli zero, zero, e32, m2, ta, ma - vmv.v.x v8, s1 - vmsleu.vv v10, v8, v22 - vfirst.m a0, v10 - srli a1, a0, 63 - czero.nez a0, a0, a1 - czero.eqz a1, s8, a1 - or a0, a0, a1 - addi a0, a0, -1 - vsetvli zero, zero, e64, m4, ta, ma - vslidedown.vx v8, v12, a0 + addi s1, s1, -1 + vslidedown.vx v8, v12, s1	2025-11-27 17:03:08 +08:00
Gang Chen	ceba82f862	[LoadStoreVectorizer] Fix one-element vector handling (#169671 ) This is the followup of https://github.com/llvm/llvm-project/pull/168135	2025-11-26 17:34:33 -08:00
Florian Hahn	f8eca64a28	Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )" This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6. The following fixes have landed, addressing issues causing the original revert: * https://github.com/llvm/llvm-project/pull/169298 * https://github.com/llvm/llvm-project/pull/167897 * https://github.com/llvm/llvm-project/pull/168949 Original message: Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-26 20:03:55 +00:00
Florian Hahn	d58ebe339c	Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )"" This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43. Missed some test updates.	2025-11-26 19:41:39 +00:00
Florian Hahn	72e51d389f	Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )" This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6. The following fixes have landed, addressing issues causing the original revert: * https://github.com/llvm/llvm-project/pull/169298 * https://github.com/llvm/llvm-project/pull/167897 * https://github.com/llvm/llvm-project/pull/168949 Original message: Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-26 19:31:25 +00:00
Sam Tebbs	071d1fb8be	[LV] Use VPReductionRecipe for partial reductions (#147513 ) Partial reductions can easily be represented by the VPReductionRecipe class by setting their scale factor to something greater than 1. This PR merges the two together and gives VPReductionRecipe a VFScaleFactor so that it can choose to generate the partial reduction intrinsic at execute time. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/156976 4. https://github.com/llvm/llvm-project/pull/160154 5. https://github.com/llvm/llvm-project/pull/147302 6. https://github.com/llvm/llvm-project/pull/162503 7. -> https://github.com/llvm/llvm-project/pull/147513 Replaces https://github.com/llvm/llvm-project/pull/146073 .	2025-11-26 16:18:22 +00:00
Florian Hahn	4cc8cc81e3	[VPlan] Hoist predicated loads with complementary masks. (#168373 ) This patch adds a new VPlan transformation to hoist predicated loads, if we can prove they execute unconditionally, i.e. there are 2 predicated loads to the same address with complementary masks. Then we are guaranteed to execute one of them on each iteration, allowing us to remove the mask. The transform groups masked replicating loads by their address SCEV, then checks if there are 2 loads with complementary masks. If that is the case, we check if there are any writes that may alias the load address in the blocks between the first and last load with the same address. The transform operates after linearizing the CFG, but before introducing replicate regions, which means this is just checking a chain of consecutive blocks. Currently this only uses noalias metadata to check for no-alias (using the helpers added in https://github.com/llvm/llvm-project/pull/166247). Then we create an unpredicated VPReplicateRecipe at the position of the first load, then replace all users of the grouped loads with it. Small Alive2 proof for hoisting with complementary masks: https://alive2.llvm.org/ce/z/kUx742 PR: https://github.com/llvm/llvm-project/pull/168373	2025-11-26 13:55:14 +00:00
Ramkumar Ramachandra	2d4a8dadba	[VPlan] Use DL index type consistently for GEPs (#169396 ) In preparation to strip VPUnrollPartAccessor and unroll recipes directly, strip unnecessary complication in getGEPIndexTy, as the unroll part will no longer be available in follow-ups (see #168886 for instance). The patch also helps by doing a mass test update up-front. Narrowing the GEP index type conditionally does not yield any benefit, and the change is non-functional in terms of emitted assembly. While at it, avoid hard-coding address-space 0, and use the pointer operand's address space to get the GEP index type.	2025-11-26 12:25:55 +00:00
Florian Hahn	091aece72b	[VPlan] Remove redundant transferFlags call from replicateByVF (NFC). Flags are now passed on construction/cloning. Remove unnecessary transferFlags call, and make code independent of VPRecipeWithIRFlags, to support additional recipes in the future.	2025-11-25 20:57:42 +00:00
Florian Hahn	a51e2ef0fe	[VPlan] Treat VPVector(End)PointerRecipe as single-scalar, if ops are. (#169249 ) VPVector(End)PointerRecipes are single-scalar if all their operands are. This should be effectively NFC currently, but it should re-enable cost checking for some more VPWidenMemoryRecipe after https://github.com/llvm/llvm-project/pull/157387 as discovered by John Brawn.	2025-11-25 14:46:30 +00:00
Ramkumar Ramachandra	cb63e99e58	[VPlan] Include flags in VectorPointerRecipe::printRecipe (#169466 ) The change is non-functional with respect to emitted IR.	2025-11-25 10:26:51 +00:00
Ramkumar Ramachandra	c25e0d3e29	[VPlan] Simplify x + 0 -> x (#169394 )	2025-11-25 05:58:41 +00:00
Florian Hahn	48eb697441	[LV] Count cost of middle block if TC <= VF. (#168949 ) If the expected trip count is less than the VF, the vector loop will only execute a single iteration. When that's the case, the cost of the middle block has the same impact as the cost of the vector loop. Include it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra work in the middle block makes it unprofitable. Note that isOutsideLoopWorkProfitable already scales the cost of blocks outside the vector region, but the patch restricts accounting for the middle block to cases where VF <= ExpectedTC, to initially catch some worst cases and avoid regressions. This initial version should specifically avoid unprofitable tail-folding for loops with low trip counts after re-applying https://github.com/llvm/llvm-project/pull/149042. PR: https://github.com/llvm/llvm-project/pull/168949	2025-11-24 19:23:04 +00:00
Ramkumar Ramachandra	37f7b3128d	Reland [VPlan] Handle WidenGEP in narrowToSingleScalars (#167880 ) Changes: Fix a missed update to WidenGEP::usesFirstLaneOnly, and include reduced-case test that was previously hitting the new assert: the underlying reason was that VPWidenGEP::usesScalars was too weak, and the single-scalar WidenGEP was not narrowed by narrowToSingleScalarRecipes. This allows us to strip a special case in VPWidenGEP::execute.	2025-11-24 18:11:58 +00:00
Luke Lau	456b0512c9	[VPlan] Set ZeroIsPoison=false for FirstActiveLane (#169298 ) When interleaving a loop with an early exit, the parts before the active lane will be all zero. Currently we emit @llvm.experimental.cttz.elts with ZeroIsPoison=true for these parts, which means that they will produce poison. We don't see any miscompiles today on AArch64 because it has the same lowering for cttz.elts regardless of ZeroIsPoison, but this may cause issues on RISC-V when interleaving. This fixes it by setting ZeroIsPoison=false. The codegen is slightly worse on RISC-V when ZeroIsPoison=false and we could potentially recover it by enabling it again when UF=1, but this is left to another PR. This is split off from #168738, where LastActiveLane can get expanded to a FirstActiveLane with an all-zeroes mask.	2025-11-24 14:39:26 +00:00
Ramkumar Ramachandra	1abb055c57	[IVDesc] Make getCastInsts return an ArrayRef (NFC) (#169021 ) To make it clear that the return value is immutable.	2025-11-24 08:57:55 +00:00
hstk30-hw	13a39eaa0b	[Sema] Fix Wunused-but-set-variable warning(NFC) (#169220 ) Fix warning: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1455:23: warning: variable 'Store' set but not used [-Wunused-but-set-variable]	2025-11-24 12:49:07 +08:00
Florian Hahn	996213c6ea	[VPlan] Refine mayRead/WriteFromMemory for VPInst, fix VPlan SLP check. Fix VPlan SLP check incorrectly bailing out for non-VPInstructions. Starting from the beginning of the block will include canonical IVs, which in turn are not VPInstructions. If we hit a non-VPInstruction, we should conservatively treat it as potentially unvectorizable. To keep the tests working as expected, refine mayRead/WriteFromMemory for Load and GEP VPInstructions.	2025-11-23 21:12:24 +00:00
Florian Hahn	21378fb75a	[VPlan] Merge `fcmp uno` feeding AnyOf. (#166823 ) Fold any-of (fcmp uno %A, %A), (fcmp uno %B, %B), ... -> any-of (fcmp uno %A, %B), ... This pattern is generated to check if any vector lane is NaN, and combining multiple compares is beneficial on architectures that have dedicated instructions. Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM Combine suggested as part of https://github.com/llvm/llvm-project/pull/161735 PR: https://github.com/llvm/llvm-project/pull/166823	2025-11-23 15:52:19 +00:00
Florian Hahn	a2231af5dd	[VPlan] Share PreservesUniformity logic between isSingleScalar and isUniformAcrossVFsAndUFs Extract the PreservesUniformity logic from isSingleScalar into a shared static helper function. Update isUniformAcrossVFsAndUFs to use this logic for VPWidenRecipe and VPInstruction, so that any opcode that preserves uniformity is considered uniform-across-vf-and-uf if its operands are. This unifies the uniformity checking logic and makes it easier to extend in the future. This should effectively be NFC currently.	2025-11-22 22:11:01 +00:00
Florian Hahn	080ca902c6	[VPlan] Create resume phis in scalar preheader early. (NFC) (#166099 ) Create phi recipes for scalar resume value up front in addInitialSkeleton during initial construction. This will allow moving the remaining code dealing with resume values to VPlan transforms/construction. PR: https://github.com/llvm/llvm-project/pull/166099	2025-11-22 20:45:41 +00:00
Ramkumar Ramachandra	b98f6a54f6	[VPlan] Cast to VPIRMetadata in getMemoryLocation (NFC) (#169028 ) This allows us to strip an unnecessary TypeSwitch.	2025-11-21 14:23:17 +00:00
Florian Hahn	31711c908f	[VPlan] Only apply forced cost to recipes with underlying values. (#168372 ) Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost. This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs. PR: https://github.com/llvm/llvm-project/pull/168372	2025-11-21 14:21:16 +00:00
Ramkumar Ramachandra	299ea95747	[VPlan] Drop poison-generating flags on induction trunc (#168922 ) After truncating an integer-induction, neither nuw nor nsw hold. Fixes #168902. Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-11-21 08:14:46 +00:00
Florian Hahn	7acfbc23a7	[VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289 ) Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and replace it with `onlyScalarValuesUsed`. This removes the need to carry state from the legacy cost model through VPlan, and the VPlan-based analysis gives more accurate results, avoiding a number of extracts. PR: https://github.com/llvm/llvm-project/pull/168289	2025-11-20 18:58:25 +00:00
Alexey Bataev	54d9d4d868	[SLP]Check if the non-schedulable phi parent node has unique operands Need to check if the non-schedulable phi parent node has unique operands, if the incoming node has copyables, and the node is commutative. Otherwise, there might be issues with the correct calculation of the dependencies. Fixes #168589	2025-11-20 10:51:31 -08:00
Florian Hahn	67e35bbebb	[LV] Check full partial reduction chains in order. (#168036 ) https://github.com/llvm/llvm-project/pull/162822 added another validation step to check if entries in a partial reduction chain have the same scale factor. But the validation was still dependent on the order of entries in PartialReductionChains, and would fail to reject some cases (e.g. if the first link matched the scale of the second link, but the second link is invalidated later). To fix that, group chains by their starting phi nodes, then perform the validation for each chain, and if it fails, invalidate the whole chain for the phi. Fixes https://github.com/llvm/llvm-project/issues/167243. Fixes https://github.com/llvm/llvm-project/issues/167867. PR: https://github.com/llvm/llvm-project/pull/168036	2025-11-20 15:54:57 +00:00
Sam Tebbs	3396b4654b	[LV] Allow partial reductions with an extended bin op (#165536 ) A pattern of the form reduce.add(ext(mul)) is valid for a partial reduction as long as the mul and its operands fulfill the requirements of a normal partial reduction. The mul's extend operands will be optimised to the wider extend, and we already have oneUse checks in place to make sure the mul and operands can be modified safely. 1. -> https://github.com/llvm/llvm-project/pull/165536 2. https://github.com/llvm/llvm-project/pull/165543	2025-11-20 10:22:11 +00:00
Gang Chen	9e9fe08b16	Re-land [Transform][LoadStoreVectorizer] allow redundant in Chain (#168135 ) This is the fixed version of https://github.com/llvm/llvm-project/pull/163019	2025-11-19 17:39:10 -08:00
Alexey Bataev	2c3aa92089	[SLP]Fix insertion point for setting for the nodes Many of the def-use chain problems in the SLP vectorizer are related to the fact that some nodes reuse the same instruction as insertion point. An insertion point is not an instruction, but the place between instructions. To set it correctly, it is better to generate a pseudo instruction immediately after the last instruction and use it as the insertion point. This resolves the issues in most cases. Fixes #168512 #168576	2025-11-19 17:15:24 -08:00
Florian Hahn	040d9c94be	[VPlan] Collect FMFs for in-loop reduction chain in VPlan. (NFC) Replace retrieving FMFs for in-loop reduction via underlying instruction + legal by collecting the flags during reduction chain traversal in VPlan.	2025-11-19 22:11:21 +00:00
Mikhail Gudim	12131d5cd3	[SLPVectorizer] Widen constant strided loads. (#162324 ) Given a set of pointers, check if they can be rearranged as follows (%s is a constant): %b + 0 * %s + 0 %b + 0 * %s + 1 %b + 0 * %s + 2 ... %b + 0 * %s + w %b + 1 * %s + 0 %b + 1 * %s + 1 %b + 1 * %s + 2 ... %b + 1 * %s + w ... If the pointers can be rearranged in the above pattern, it means that the memory can be accessed with strided loads of width `w` and stride `%s`.	2025-11-19 15:11:09 -05:00
Rahul Joshi	4703195c8d	[NFC][LLVM] Namespace cleanup in SLPVectorizer (#168623 ) - Remove file local functions out of `llvm` or anonymous namespace and make them static. - Use namespace qualifier to define `BoUpSLP` class and several template specializations.	2025-11-19 07:34:09 -08:00
Luke Lau	5da0445420	[LV] Consolidate shouldOptimizeForSize and remove unused BFI/PSI. NFC (#168697 ) #158690 plans on passing BFI as a lazy lambda to avoid computing BlockFrequencyInfo when not needed. In preparation for that, this PR removes BFI and PSI from some constructors that aren't used. It also consolidates the two calls to llvm::shouldOptimizeForSize so that the result is computed once and passed where needed. This also renames OptForSize in LoopVectorizationLegality to clarify that it's to prevent runtime SCEV checks, see https://reviews.llvm.org/D68082	2025-11-19 21:29:26 +08:00
Florian Hahn	7b94dd336e	[VPLan] Reduce duplication in VPHeaderPHIRecipe::classof. (NFCI) Implement VPHeaderPHIRecipe::classof(const VPValue *V) in terms of the variant taking VPRecipeBase. Reduces some duplication, split off from https://github.com/llvm/llvm-project/pull/141431.	2025-11-19 12:46:53 +00:00
Florian Hahn	0730913529	[VPlan] Print debug info for all recipes. (#168454 ) Use the recently refactored VPRecipeBase::print to print debug location for all recipes. PR: https://github.com/llvm/llvm-project/pull/168454	2025-11-19 10:10:08 +00:00
Hassnaa Hamdi	f7f41350b4	[LV]: Skip Epilogue scalable VF greater than RemainingIterations. (#156724 ) Consider skipping epilogue scalable VF when they are greater than RemainingIterations same as fixed VF. And skip scalable RemainingIterations from that comparison because SCEV ATM can't evaluate non-canonical vscale-based expressions.	2025-11-19 05:11:17 +00:00
Shih-Po Hung	961940e1a7	[TTI] Use MemIntrinsicCostAttributes for getMaskedMemoryOpCost (#168029 ) - Split from #165532. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling. - Replace the ad-hoc parameter list with a single attributes object. API change: ``` - InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment, - AddressSpace, CostKind); + InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes, + CostKind); ``` Notes: - NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before. - Follow-up: migrate gather/scatter, strided, and expand/compress cost queries to the same attributes-based entry point.	2025-11-19 09:51:12 +08:00
Florian Hahn	1e3ea03293	[VPlan] VPIRFlags kind for FCmp with predicate + fast-math flags (NFCI). FCmp instructions have both a predicate and fast-math flags. Introduce a new FCmp kind, that combines both to model this correctly in the current system. This should be NFC modulo VPlan printing which now includes the correct fast-math flags.	2025-11-18 22:09:53 +00:00
Ramkumar Ramachandra	507f236f5e	[VPlan] Fix OpType-mismatch in getFlagsFromIndDesc (#168560 ) Follow up on a cse OpType-mismatch crash reported due to ef023cae388d (Reland [VPlan] Expand WidenInt inductions with nuw/nsw), setting the OpType correctly when returning from getFlagsFromIndDesc.	2025-11-18 20:41:57 +00:00
Florian Hahn	2befda2225	[VPlan] Populate and use VPIRFlags from initial VPInstruction. (#168450 ) Update VPlan to populate VPIRFlags during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRFlags from the underlying IR instruction each time. The VPRecipeWithIRFlags constructor taking an underlying instruction and setting the flags based on it has been removed. This centralizes initial VPIRFlags creation and ensures flags are consistently available throughout VPlan transformations and makes sure we don't accidentally re-add flags from the underlying instruction that already got dropped during transformations. Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did the same for VPIRMetadata. Should be NFC w.r.t. the generated IR. PR: https://github.com/llvm/llvm-project/pull/168450	2025-11-18 15:15:14 +00:00
Florian Hahn	2432465d99	[VPlan] Support isa/dyn_cast from VPRecipeBase to VPIRMetadata (NFC). (#166245 ) Implement CastInfo from VPRecipeBase to VPIRMetadata to support isa/dyn_cast. This is similar to CastInfoVPPhiAccessors, supporting dyn_cast by down-casting to the concrete recipe types inheriting from VPIRMetadata. Can be used for more generalized VPIRMetadata printing following https://github.com/llvm/llvm-project/pull/165825. PR: https://github.com/llvm/llvm-project/pull/166245	2025-11-18 11:31:11 +00:00
Florian Hahn	7c34848ae1	[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247 ) This patch implements a transform to hoist single-scalar replicated loads with invariant addresses out of the vector loop to the preheader when scoped noalias metadata proves they cannot alias with any stores in the loop. This enables hoisting of loads we can prove do not alias any stores in the loop due to memory runtime checks added during vectorization. PR: https://github.com/llvm/llvm-project/pull/166247	2025-11-18 09:35:48 +00:00
Michael Bedy	a61889580e	[SLP] Invariant loads cannot have a memory dependency on stores. (#167929 )	2025-11-18 09:35:29 +01:00
Florian Hahn	3cba379e3d	[VPlan] Populate and use VPIRMetadata from VPInstructions (NFC) (#167253 ) Update VPlan to populate VPIRMetadata during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRMetadata from the underlying IR instruction each time. This centralizes VPIRMetadata in VPInstructions and ensures metadata is consistently available throughout VPlan transformations. PR: https://github.com/llvm/llvm-project/pull/167253	2025-11-17 21:28:49 +00:00
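
Several of the entries above (f8eca64a28, 1c7ec06b16, 456b0512c9) describe how LastActiveLane is lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1), and how it reduces to EVL - 1 under EVL tail folding. The following is a minimal Python sketch of that arithmetic only, not LLVM code. It assumes the mask is a prefix mask (active lanes are contiguous from lane 0), and that FirstActiveLane returns the number of lanes when no lane is active, as with ZeroIsPoison=false. The function names and the `header_mask` helper are hypothetical and chosen for illustration.

```python
def first_active_lane(mask):
    """Return the index of the first true lane.

    Assumption: when no lane is active, return len(mask), modelling a
    count-trailing-zero-elements operation with ZeroIsPoison=false.
    """
    for lane, active in enumerate(mask):
        if active:
            return lane
    return len(mask)


def last_active_lane(mask):
    """Model the lowering Not(Mask) -> FirstActiveLane(NotMask) -> Sub(result, 1).

    Only meaningful for prefix masks, where the first inactive lane
    directly follows the last active one.
    """
    not_mask = [not active for active in mask]
    return first_active_lane(not_mask) - 1


def header_mask(vf, evl):
    """Hypothetical helper: a tail-folding mask with the first `evl` of `vf` lanes active."""
    return [lane < evl for lane in range(vf)]


if __name__ == "__main__":
    vf = 8
    for evl in range(1, vf + 1):
        # With a prefix mask of evl active lanes, the result matches EVL - 1.
        print(evl, last_active_lane(header_mask(vf, evl)))
```

When every lane is active, the negated mask is all zeros, so the sketch relies on `first_active_lane` returning the lane count rather than an undefined value. That all-zeroes case is the one the ZeroIsPoison=false entry refers to.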

6825 Commits