llvm-project

Author	SHA1	Message	Date
Ramkumar Ramachandra	6bbdc70066	[LV] Use getCallWideningDecision in more places (NFC) (#134236 )	2025-04-03 14:53:19 +01:00
Florian Hahn	4b67c53e20	[VPlan] Use recipe debug loc instead of instr DLs in more cases (NFC) Update both VPInterleaveRecipe and VPReplicateRecipe codegen to use debug location directly from the recipe, not the underlying instruction. This removes another dependency on underlying instructions.	2025-04-02 21:51:17 +01:00
Krzysztof Drewniak	554859c736	[TTI] Make isLegalMasked{Load,Store} take an address space (#134006 ) In order to facilitate targets that only support masked loads/stores on certain address spaces (AMDGPU will support them in an upcoming patch, but only for address space 7), add an AddressSpace parameter to isLegalMaskedLoad and isLegalMaskedStore	2025-04-02 15:38:10 -05:00
Ningning Shi(史宁宁)	6b647de031	[NFC] Remove the unused hasMinSize() (#133838 ) The 'hasOptSize()' is 'hasFnAttribute(Attribute::OptimizeForSize) \|\| hasMinSize()', so we don't need another 'hasMinSize()'.	2025-04-01 15:23:34 +08:00
Alexey Bataev	78777a204a	[LV]Split store-load forward distance analysis from other checks, NFC (#121156 ) The patch splits the store-load forwarding distance analysis from other dependency analysis in LAA. Currently it supports only power-of-2 distances, required to support non-power-of-2 distances in future. Part of #100755	2025-03-31 07:28:44 -04:00
Kazu Hirata	2fc08d4c31	[Vectorize] Use DenseMap::insert_range (NFC) (#133656 )	2025-03-30 22:57:45 -07:00
Florian Hahn	424c8f9217	[VPlan] Remove dead UF argument from VPTransformState ctor (NFC).	2025-03-30 17:31:00 +01:00
Kazu Hirata	d8b078d550	[Transforms] Use llvm::append_range (NFC) (#133607 )	2025-03-29 18:57:50 -07:00
Ramkumar Ramachandra	4c4e4e4299	[LV] Strengthen calls to collectInstsToScalarize (NFC) (#130642 ) Avoid the pattern of always calling collectInstsToScalarize after collectUniformsAndScalars, and call it in collectUniformsAndScalars instead. Also strengthen checks for early exits in the function.	2025-03-28 17:27:57 +00:00
Florian Hahn	7b75db5755	[VPlan] Add new VPIRPhi overlay for VPIRInsts wrapping phi nodes (NFC). (#129387 ) Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an overlay, to provide more convenient checking (via directly doing isa/dyn_cast/cast) and specialied execute/print implementations. Both VPIRInstruction and VPIRPhi share the same VPDefID, and are differentiated by the backing IR instruction. This pattern could alos be used to provide more specialized interfaces for some VPInstructions ocpodes, without introducing new, completely spearate recipes. An example would be modeling VPWidenPHIRecipe & VPScalarPHIRecip using VPInstructions opcodes and providing an interface to retrieve incoming blocks and values through a VPInstruction subclass similar to VPIRPhi. PR: https://github.com/llvm/llvm-project/pull/129387	2025-03-28 08:43:46 +00:00
Florian Hahn	8ddbc01295	[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690 ) Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 PR: https://github.com/llvm/llvm-project/pull/132690	2025-03-27 18:34:13 +00:00
Kazu Hirata	cde58bfc16	[Transforms] Use range constructors of *Set (NFC) (#133203 )	2025-03-27 07:51:58 -07:00
Simon Pilgrim	1715386e80	Fix MSVC signed/unsigned comparison warning. NFC.	2025-03-27 08:56:21 +00:00
David Green	de1c2f24bc	[LoopVectorizer][AArch64] Move getMinTripCountTailFoldingThreshold later. (#132170 ) This moves the checks of MinTripCountTailFoldingThreshold later, during the calculation of whether to tail fold. This allows it to check beforehand whether tail predication is required, either for scalable or fixed-width vectors. This option is only specified for AArch64, where it returns the minimum of 5. This patch aims to allow the vectorization of TC=4 loops, preventing them from performing slower when SVE is present.	2025-03-26 19:35:08 +00:00
Florian Hahn	420c056f85	[VPlan] Add ComputeFindLastIVResult opcode (NFC). (#132689 ) This moves the logic for computing the FindLastIV reduction result to its own opcode. A follow-up patch will update the new opcode to also take the start value, to fix https://github.com/llvm/llvm-project/issues/126836. PR: https://github.com/llvm/llvm-project/pull/132689	2025-03-26 10:49:09 +00:00
Kazu Hirata	e87921304b	[Vectorize] Avoid repeated hash lookups (NFC) (#132661 ) Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-03-25 15:18:15 -07:00
Ramkumar Ramachandra	8fb802e995	[LV] Improve code in collectInstsToScalarize (NFC) (#130643 )	2025-03-25 16:52:13 +00:00
Ramkumar Ramachandra	e8d882a95b	[LV] Audit and fix nits in cl::opts (NFC) (#130601 ) Non-static cl::opts should be under the llvm namespace.	2025-03-25 10:19:45 +00:00
Luke Lau	6a8606e99e	[VPlan] Only store RecurKind + FastMathFlags in VPReductionRecipe. NFCI (#131300 ) VPReductionRecipes take a RecurrenceDescriptor, but only use the RecurKind and FastMathFlags in it when executing. This patch makes the recipe more lightweight by stripping it to only take the latter two. The motiviation for this is to simplify an upcoming patch to support in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to create an Or reduction, and by using RecurKind we can create an arbitrary reduction without needing a full RecurrenceDescriptor.	2025-03-24 19:18:54 +08:00
Florian Hahn	206b42c98e	[LV] Use VPBuilder to create ComputeReductionResult. (NFC) Update code to use VPBuilder, to simplify follow-up changes.	2025-03-23 16:10:07 +00:00
Florian Hahn	c482b8faea	[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC). Instead of executing the whole entry VPIRBB twice, first only execute the VPExpandSCEVRecipes and replace their uses with the expanded VPValue, which will be a live-in. This allows removing special logic in VPExpandSCEVRecipe to support executing twice and allows moving the ExpandedSCEVs map out of VPTransformState. It will also allow adding other recipes to the entry VPBB in the future.	2025-03-23 09:06:01 +00:00
Kazu Hirata	fae34938f6	[llvm] Use *Set::insert_range (NFC) (#132591 ) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch uses insert_range with iterator ranges. For each case, I've verified that foos is defined as make_range(foo_begin(), foo_end()) or in a similar manner.	2025-03-22 22:14:45 -07:00
Florian Hahn	0d3ba087f7	[LV] Move IV bypass value creation out of ILV (NFC) createInductionAdditionalBypassValues is only used for epilogue vectorization now. Move it out of ILV, which means we do not have to thread through ExpandedSCEVs and also don't have to track the bypass values in ILV. Instead, directly create them if needed after executing the epilogue plan. This moves more the epilogue specific logic out of the generic executePlan.	2025-03-22 20:36:45 +00:00
David Sherwood	4e69258bf3	[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565 ) At the moment if we decide to enable tail-folding we do not include the cost of generating the mask per VF. This can mean we make some poor choices of VF, which is definitely true for SVE-enabled AArch64 targets where mask generation for fixed-width vectors is more expensive than for scalable vectors. I've added a VPInstruction::computeCost function to return the costs of the ActiveLaneMask and ExplicitVectorLength operations. Unfortunately, in order to prevent asserts firing I've also had to duplicate the same code in the legacy cost model to make sure the chosen VFs match up. I've wrapped this up in a ifndef NDEBUG for now. The alternative would be to disable the assert completely when tail-folding, which I imagine is just as bad. New tests added: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll Transforms/LoopVectorize/RISCV/tail-folding-cost.ll	2025-03-21 09:24:56 +00:00
Florian Hahn	c73ad7ba20	[VPlan] Add transformation to narrow interleave groups. This patch adds a new narrowInterleaveGroups transfrom, which tries convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. For now it only transforms load interleave groups feeding store groups. Depends on #106431. This lands the main parts of the approved https://github.com/llvm/llvm-project/pull/106441 as suggested to break things up a bit more.	2025-03-20 19:41:37 +00:00
Kazu Hirata	2df51fd9c4	[Transforms] Avoid repeated hash lookups (NFC) (#132146 )	2025-03-20 09:11:56 -07:00
Kazu Hirata	0dcc201ac4	[Transforms] Use *Set::insert_range (NFC) (#132056 ) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.	2025-03-19 15:35:01 -07:00
Florian Hahn	2e13ec561c	[VPlan] Bail out on non-intrinsic calls in VPlanNativePath. Update initial VPlan-construction in VPlanNativePath in line with the inner loop path, in that it bails out when encountering constructs it cannot handle, like non-intrinsic calls. Fixes https://github.com/llvm/llvm-project/issues/131071.	2025-03-19 21:35:15 +00:00
Florian Hahn	11b8699572	[LV] Don't skip instrs with side-effects in reg pressure computation. (#126415 ) calculateRegisterUsage adds end points for each user of an instruction to Ends and ignores instructions not added to it, i.e. instructions with no users. This means things like stores aren't included, which in turn means values that are only used in stores are also not included for consideration. This means we underestimate the register usage in cases where the only users are things like stores. Update the code to don't skip instructions without users (i.e. not in Ends) if they have side-effects. PR: https://github.com/llvm/llvm-project/pull/126415	2025-03-19 15:13:43 +00:00
Kazu Hirata	9705010b5e	[Vectorize] Avoid repeated hash lookups (NFC) (#131962 )	2025-03-19 07:14:54 -07:00
Luke Lau	a4dc02c0e7	[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086 ) After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.	2025-03-19 00:09:15 +08:00
Florian Hahn	166937b49d	[LV] Cleanup after expanding SCEV predicate to constant. In some cases, SCEV isn't able to prove that no wrap checks are needed, while constant folding in SCEVExpander can. In those cases, we may leave around IR for computing the trip count, which is unused at this point but may be re-used later, triggering an assertion when trying to clean up SCEVExp after vectorization. Directly run the cleaner after expanding to a constant predicate to prevent any generated code from being re-used. Fixes https://github.com/llvm/llvm-project/issues/131281.	2025-03-17 21:26:51 +00:00
Luke Lau	eef5ea0c42	[VPlan] Account for dead FOR splice simplification in cost model (#131486 ) Fixes #131359 After #129645, a first-order recurrence will no longer have it's splice costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and is dead. The legacy cost model didn't account for this, so this accounts for it in planContainsAdditionalSimplifications to avoid the "VPlan cost model and legacy cost model disagreed" assertion.	2025-03-18 00:00:54 +08:00
Florian Hahn	17b4be8f63	[VPlan] Move setting name and adding VFs after recipe creation.(NFC) Recipe creation is the only place where the VF range is restricted. Move setting the VFs just after initial recipe creation.	2025-03-17 12:04:09 +00:00
Luke Lau	315c02aa02	[VPlan] Fix crash with inloop fmuladd reductions with blend (#131154 ) When visiting in-loop reduction links, we previously crashed if we had an fmuladd with a blend after it in the chain. This fixes it by lifting the existing blend folding to also handle fmuladd. This also simplifies the code structure slightly for an upcoming patch I want to post to handle in-loop AnyOf reductions. I removed the PhiR->isInLoop() check since it's already guarded at the top of the parent Header->Phis() loop.	2025-03-14 09:08:32 +08:00
Florian Hahn	62994c3291	[VPlan] Also introduce explicit broadcasts for values from entry VPBB. Update and generalize materializeBroadcasts to also introduce explicit broadcasts for VPValues defined in the Plans Entry block. This fixes a crash when trying to insert the broadcasts generated by VPTransformState::get after the generating instruction, which isn't possible after invoke instructions. Fixes https://github.com/llvm/llvm-project/issues/128838.	2025-03-12 22:03:19 +00:00
Ramkumar Ramachandra	93d41d8148	[LV] Use ElementCount::isKnownLT to factor code (NFC) (#130596 )	2025-03-11 15:42:59 +00:00
John Brawn	7129205816	[LoopVectorize] Move checking for OptForSize into the cost model (NFC) (#130752 ) Move OptForSizeBasedOnProfile into the cost model and rename it to OptForSize, as shouldOptimizeForSize checks both the function attribute and profile. This being done in preparation for OptForSize being used in the cost model.	2025-03-11 14:58:04 +00:00
David Sherwood	26ecf97895	[LoopVectorize] Further improve cost model for early exit loops (#126235 ) Following on from #125058, this patch takes into account the work done in the vector early exit block when assessing the profitability of vectorising the loop. I have renamed areRuntimeChecksProfitable to isOutsideLoopWorkProfitable and we now pass in the early exit costs. As part of this, I have added the ExtractFirstActive opcode to VPInstruction::computeCost. It's worth pointing out that when we assess profitability of the loop we calculate a minimum trip count and compare that against the maximum trip count. However, since the loop has an early exit the runtime trip count can still end up being less than the minimum. Alternatively, we may never take the early exit at all at runtime and so we have the opposite problem of over-estimating the cost of the loop. The loop vectoriser cannot simultaneously take two contradictory positions and so I feel the only sensible thing to do is be conservative and assume the loop will be more expensive than loops without early exits. We may find in future that we need to adjust the cost according to the probability of taking the early exit. This will become even more important once we support multiple early exits. However, we have to start somewhere and we can always revisit this later.	2025-03-11 11:48:55 +00:00
Florian Hahn	f84f4e1e05	[LV] Don't crash on inner loops with RT checks in VPlan-native path. Assert that we only generate runtime checks for inner loops in emitMemRuntimeChecks, instead of returning nullptr in the VPlan-native path, which is causing crashes and incorrect code.	2025-03-10 20:28:56 +00:00
Ramkumar Ramachandra	35c0302086	[LV] Clean up unused 'using' definition (NFC) (#130597 )	2025-03-10 17:57:32 +00:00
Ramkumar Ramachandra	7220ab821d	[LV] Mark getMaxVScale static (NFC) (#130598 )	2025-03-10 17:56:50 +00:00
Florian Hahn	fd267082ee	[VPlan] Refactor VPlan creation, add transform introducing region (NFC). (#128419 ) Create an empty VPlan first, then let the HCFG builder create a plain CFG for the top-level loop (w/o a top-level region). The top-level region is introduced by a separate VPlan-transform. This is instead of creating the vector loop region before building the VPlan CFG for the input loop. This simplifies the HCFG builder (which should probably be renamed) and moves along the roadmap ('buildLoop') outlined in [1]. As follow-up, I plan to also preserve the exit branches in the initial VPlan out of the CFG builder, including connections to the exit blocks. The conversion from plain CFG with potentially multiple exits to a single entry/exit region will be done as VPlan transform in a follow-up. This is needed to enable VPlan-based predication. Currently early exit support relies on building the block-in masks on the original CFG, because exiting branches and conditions aren't preserved in the VPlan. So in order to switch to VPlan-based predication, we will have to preserve them in the initial plain CFG, so the exit conditions are available explicitly when we convert to single entry/exit regions. Another follow-up is updating the outer loop handling to also introduce VPRegionBlocks for nested loops as transform. Currently the existing logic in the builder will take care of creating VPRegionBlocks for nested loops, but not the top-level loop. [1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf PR: https://github.com/llvm/llvm-project/pull/128419	2025-03-09 15:05:35 +00:00
Ramkumar Ramachandra	ddffb74afd	[LV] Strip unreachable SCEV-check blocks (#130079 ) emitSCEVChecks checks if SCEVCheckCond matches zero, and returns nullptr. However, it sets SCEVCheckCond as used before it does this, which prevents it from being removed during cleanup, resulting in unreachable blocks being emitted. Fix this.	2025-03-06 19:30:25 +00:00
Ramkumar Ramachandra	00f3089c2e	[LV] Use PatternMatch in emitTransformedIndex (NFC) (#130081 )	2025-03-06 19:23:31 +00:00
Luke Lau	e1cea0d928	[LV][TTI] Remove unused ReductionFlags. NFC (#129858 ) No in-tree targets currently use it in the preferInLoopReduction/preferPredicatedReductionSelect TTI hooks. It looks like it used to be used in LoopUtils, at least in 8ca60db40bd944dc5f67e0f200a403b4e03818ea, but I presume it was replaced by RecurrenceDescriptor.	2025-03-05 18:31:12 +08:00
Ramkumar Ramachandra	80bdfcd411	[LoopUtils] Don't wrap in getLoopEstimatedTripCount (#129080 ) getLoopEstimatedTripCount returns the trip count based on profiling data, and its documentation says that it could return 0 when the trip count is zero, but this is not the case: a valid trip count can never be zero, and it returns 0 when the unsigned ExitCount is incremented by 1 and wraps. Some callers are careful about checking for this zero value in an std::optional, but it makes for an API with footguns, as a std::optional return value indicates that a non-nullopt value would be a valid trip count. Fix this by explicitly returning std::nullopt when the return value would wrap, and strip additional checks in callers. This also fixes a minor bug in LoopVectorize.	2025-03-04 08:43:08 +00:00
Mel Chen	9b4ad2fe50	[LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093 ) This patch converts the llvm.vector.splice intrinsic to llvm.experimental.vp.splice, ensuring that fixed-order recurrences execute correctly when tail folding by EVL is enable. Due to the non-VFxUF penultimate EVL issue, the EVL from the previous iteration will be preserved and used in llvm.experimental.vp.splice.	2025-03-03 21:27:13 +08:00
Florian Hahn	f937b17e85	[LV] Don't query SCEV for non-invariant values in cost model. This fixes a divergence between VPlan and legacy cost model, matching behavior further up in getInstructionCost as well. Fixes https://github.com/llvm/llvm-project/issues/129236.	2025-03-02 10:55:52 +00:00
Florian Hahn	75270e3750	[VPlan] Don't print VPlan DT after VPlan construction. (NFC) Remove unnecessary code to just print VPlan dominator tree.	2025-03-01 21:15:56 +00:00

1 2 3 4 5 ...

2462 Commits