`VPEVLBasedIVPHIRecipe` lowers to a scalar-phi VPInstruction and
generates a scalar phi, so it occupies only a scalar register, just
like other phi recipes.
This patch fixes the register usage for `VPEVLBasedIVPHIRecipe` from
vector to scalar, which matches the generated vector IR more closely.
https://godbolt.org/z/6Mzd6W6ha shows that there are no register spills
when choosing `<vscale x 16>`.
Note that this test is basically copied from AArch64.
SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a
few PHI
recipes in the loop header. With more IV-step optimizations,
the canonical widen-canonical-iv can be replaced by a canonical
VPWidenIntOrFpInduction,
which the pass did not handle, causing regressions (missed
simplifications).
This patch replaces canonical VPWidenIntOrFpInduction with a StepVector
in the vector preheader
since the vector loop region only executes once.
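For intuition, here is a minimal plain-C++ sketch (illustrative only, not VPlan code) of what a step vector contains:
```cpp
#include <vector>

// A "step vector" for VF lanes holds lane index i in lane i. Because the
// vector loop region executes exactly once here, the widened canonical IV
// takes only this single value, so it can be materialized once in the
// vector preheader.
std::vector<unsigned> stepVector(unsigned VF) {
  std::vector<unsigned> V(VF);
  for (unsigned I = 0; I < VF; ++I)
    V[I] = I;
  return V;
}
```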
Remove the ArrayRef<const Value*> Args operand from
getOperandsScalarizationOverhead and require that the callers
de-duplicate arguments and filter constant operands.
Removing the Value *-based Args argument enables callers that have no
Value * operands available to use the function in a follow-up: computing
the scalarization cost directly for a VPlan recipe.
It also allows more accurate cost estimates in the future: for example,
when vectorizing a loop, we could also skip operands that are live-ins,
as those do not require scalarization either.
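As a rough sketch of the caller-side preparation this requires (stand-in types, not the actual LLVM code):
```cpp
#include <set>
#include <vector>

// Stand-in for an operand: Key identifies it, IsConstant marks operands
// that need no scalarization.
struct Operand {
  const void *Key;
  bool IsConstant;
};

// Callers now de-duplicate arguments and filter out constant operands
// before querying the scalarization overhead.
std::vector<Operand> prepareOperands(const std::vector<Operand> &Ops) {
  std::set<const void *> Seen;
  std::vector<Operand> Unique;
  for (const Operand &Op : Ops) {
    if (Op.IsConstant)
      continue; // constants do not require scalarization
    if (Seen.insert(Op.Key).second)
      Unique.push_back(Op); // keep first occurrence only
  }
  return Unique;
}
```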
PR: https://github.com/llvm/llvm-project/pull/154126
A number of recipes compute costs for the same opcodes for scalars or
vectors, depending on the recipe.
Move the common logic out to a helper in VPRecipeWithIRFlags, which is
then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction.
This makes it easier to cover all relevant opcodes, without duplication.
PR: https://github.com/llvm/llvm-project/pull/153361
In setVectorizedCallDecision we attempt to calculate the scalar costs
of calls being vectorised, even for scalable VFs where we already know
the answer is Invalid. We can avoid doing unnecessary work by skipping
this completely for scalable vectors.
After a485e0e, we may not set the vector trip count in
preparePlanForEpilogueVectorLoop if it is zero. We should not choose a
VF * UF that makes the main vector loop dead (i.e. vector trip count is
zero), but there are some cases where this can happen currently.
In those cases, set EPI.VectorTripCount to zero.
There are a couple of places in the loop vectoriser where we
want to calculate the cost of extracting the last lane in a
vector. However, we wrongly assume that asking for the cost
of extracting lane (VF.getKnownMinValue() - 1) is an accurate
representation of the cost of extracting the last lane. For
SVE at least, this is non-trivial as it requires the use of
whilelo and lastb instructions.
To solve this problem I have added a new
getReverseVectorInstrCost interface where the index counts
backwards from the end of the vector: given a vector with
ElementCount EC, the extracted/inserted lane is EC - 1 - Index.
For scalable vectors this lane is unknown at compile time. I've
added an AArch64 hook that better represents the cost, and also
a RISCV hook that maintains compatibility with the behaviour
prior to this PR.
I've also taken the liberty of adding support in vplan for
calculating the cost of VPInstruction::ExtractLastElement.
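A standalone sketch of the index arithmetic (assumed semantics as described above, not the actual hook):
```cpp
#include <cstdint>
#include <optional>

// For a vector with ElementCount EC, reverse index I addresses lane
// EC - 1 - I. For scalable vectors the runtime element count is
// vscale * KnownMin, unknown at compile time, so the concrete lane
// cannot be computed statically.
std::optional<uint64_t> reverseLane(uint64_t KnownMinEC, bool Scalable,
                                    uint64_t Index) {
  if (Scalable)
    return std::nullopt; // lane is vscale * KnownMinEC - 1 - Index
  return KnownMinEC - 1 - Index;
}
```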
This is the first step in untangling the variable step transform and
header mask optimizations as described in #152541.
Currently we replace all VF users globally in the plan, including
VPVectorEndPointerRecipe. However, this leaves reversed loads and stores
in an incorrect state until they are adjusted in optimizeMaskToEVL.
This moves the VPVectorEndPointerRecipe transform so that it is updated
in lockstep with the actual load/store recipe.
One thought that crossed my mind was that VPInterleaveRecipe could also
use VPVectorEndPointerRecipe, in which case we would have also been
computing the wrong address because we don't transform it to an EVL
recipe which accounts for the reversed address.
If we end up with an extract_element VPInstruction where both operands
are live-ins, we will try to fold the live-ins even though the first
operand is a vector whilst the live-in is scalar.
This fixes it by just returning the vector live-in instead of calling
the folder, and removes the handling for insertelement where we aren't
able to do the fold. From some quick testing we previously never hit
this fold anyway, and were probably just missing test coverage.
Fixes #154045
Currently, VPInterleaveRecipe::execute does not support generating LLVM
IR for interleaved accesses that require a gap mask for scalable VFs.
It would be better to detect and prevent such groups from being
vectorized as interleaved accesses in
LoopVectorizationCostModel::interleavedAccessCanBeWidened, rather than
relying on the TTI function getInterleavedMemoryOpCost to return an
invalid cost.
Compute the cost of non-intrinsic, single-scalar calls directly in
VPReplicateRecipe::computeCost.
This starts moving call cost computations to VPlan, handling the
simplest case first.
Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to
serve their users requiring a vector, instead of doing so when unrolling
by VF.
Now we only need to implicitly build vectors in VPTransformState::get
for VPInstructions. Once they are also unrolled by VF we can remove the
code-path altogether.
PR: https://github.com/llvm/llvm-project/pull/151487
If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first.
This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.
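Schematically the control flow looks like this (a sketch with hypothetical helpers, not the actual LoopVectorize code):
```cpp
#include <vector>

struct Instr {};                                           // instruction stand-in
static bool isVectorizable(const Instr &) { return true; } // stand-in check
static void emitRemark(const Instr &) {}                   // stand-in remark sink

// With DoExtraAnalysis we keep scanning and report every blocking
// instruction instead of bailing out at the first one.
bool canVectorize(const std::vector<Instr> &LoopInsts, bool DoExtraAnalysis) {
  bool CanVectorize = true;
  for (const Instr &I : LoopInsts) {
    if (isVectorizable(I))
      continue;
    emitRemark(I); // one remark per offending instruction
    if (!DoExtraAnalysis)
      return false; // old behaviour: stop at the first
    CanVectorize = false;
  }
  return CanVectorize;
}
```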
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
```cpp
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};
```
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
The vector combiner processes all instructions as it first loops
through the function, adding any newly added and deleted instructions to
a worklist which is then processed when all nodes are done. This leaves
extra uses in the graph while the initial processing is performed,
leading to sub-optimal decisions being made for other combines. This
changes it so that trivially dead instructions are removed immediately.
The main change this requires is to make sure iterator invalidation does
not occur.
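The iterator-safety point can be shown with a standalone sketch (std::list as a stand-in for the instruction list):
```cpp
#include <list>

// Advance the iterator before erasing the current element so the erase
// cannot invalidate the iterator still in use; the same care is needed
// when deleting trivially dead instructions during the walk.
void pruneDead(std::list<int> &Worklist) {
  for (auto It = Worklist.begin(); It != Worklist.end();) {
    auto Cur = It++; // advance first; erasing Cur leaves It valid
    if (*Cur == 0)   // stand-in for "trivially dead"
      Worklist.erase(Cur);
  }
}
```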
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Dissolving the hierarchical VPlan CFG and converting abstract to
concrete recipes can expose additional simplification opportunities.
Do a final run of simplifyRecipes before executing the VPlan.
If the copyable schedule data is created and the user is used several
times in the user node, there is no need to count the same data for the
same user several times; it needs to be included only once.
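The counting rule amounts to de-duplicating per (user, data) pair, roughly (plain C++ sketch, not the SLP vectorizer's code):
```cpp
#include <set>
#include <utility>

// Count each (user, data) pair once, even if the user references the
// same schedule data several times within its node.
struct OnceCounter {
  std::set<std::pair<const void *, const void *>> Seen;
  unsigned Count = 0;

  void add(const void *User, const void *Data) {
    if (Seen.insert({User, Data}).second)
      ++Count; // first occurrence of this user/data pair
  }
};
```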
Fixes #153754
Fixes #153012
As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we
may fold
```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```
to
```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```
And it would be folded back in `foldInsExtBinop`, resulting in an
infinite loop.
This patch only performs the scalarization if InstSimplify can fold the
constant expression.
When matching integers, `m_ConstantInt` is a convenient alternative to
`m_APInt` for matching unsigned 64-bit integers, allowing one to
simplify
```cpp
const APInt *IntC;
if (match(V, m_APInt(IntC))) {
  if (IntC->ule(UINT64_MAX)) {
    uint64_t Int = IntC->getZExtValue();
    // ...
  }
}
```
to
```cpp
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // ...
}
```
However, this simplification only holds if `V` has a scalar type.
Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt`
does not.
This patch ensures that the matching behaviour of `m_ConstantInt`
parallels that of `m_APInt`, and also uses it in some obvious places.
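A standalone sketch of the intended semantics (lanes as a stand-in representation, not LLVM's PatternMatch):
```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Model a constant as its lanes; a scalar has one lane. The match
// succeeds for a scalar constant or a vector splat, mirroring how
// m_APInt treats splats.
std::optional<uint64_t> matchConstantInt(const std::vector<uint64_t> &Lanes) {
  if (Lanes.empty())
    return std::nullopt;
  for (uint64_t L : Lanes)
    if (L != Lanes.front())
      return std::nullopt; // not a splat
  return Lanes.front();
}
```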
Added support for LShr instructions as the base for copyable elements.
Also added a simple analysis for selecting the best base instruction
when multiple candidates are available.
Fixed scheduling after cancellation.
Reviewers: hiraditya, RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/153393
Directly emit shl instead of a multiply if VF * Step is a power-of-2. The
main motivation here is to prepare the code and test for directly
generating and expanding a SCEV expression of the minimum iteration
count. SCEVExpander will directly emit shl for multiplies with
powers-of-2.
InstCombine also performs this combine, so end-to-end this should
effectively be NFC.
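The strength reduction itself is straightforward; a standalone sketch of the power-of-two case:
```cpp
#include <bit>
#include <cstdint>

// Multiply by a power-of-two constant via a left shift, mirroring what
// SCEVExpander emits for such multiplies.
uint64_t mulByConst(uint64_t X, uint64_t C) {
  if (std::has_single_bit(C))        // C is a non-zero power of two
    return X << std::countr_zero(C); // shl by log2(C)
  return X * C;
}
```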
PR: https://github.com/llvm/llvm-project/pull/153495
Add 3 new iterator ranges to VPPhiAccessors:
* incoming_values(): returns a range over the incoming values of a phi
* incoming_blocks(): returns a range over the incoming blocks of a phi
* incoming_values_and_blocks(): returns a range over pairs of incoming
values and blocks.
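A standalone sketch of the shape of these ranges (stand-in types, not VPlan's implementation):
```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Incoming values and blocks are parallel sequences, as in a phi; the
// zipped view pairs them positionally.
struct PhiSketch {
  std::vector<int> Values; // stand-ins for incoming values
  std::vector<int> Blocks; // stand-ins for incoming blocks

  const std::vector<int> &incoming_values() const { return Values; }
  const std::vector<int> &incoming_blocks() const { return Blocks; }
  std::vector<std::pair<int, int>> incoming_values_and_blocks() const {
    std::vector<std::pair<int, int>> Zipped;
    for (std::size_t I = 0; I < Values.size(); ++I)
      Zipped.emplace_back(Values[I], Blocks[I]);
    return Zipped;
  }
};
```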
Depends on https://github.com/llvm/llvm-project/pull/124838.
PR: https://github.com/llvm/llvm-project/pull/138472
This patch adds a cost kind to `getAddressComputationCost()` for #149955.
Note that this patch also removes all the default values in `getAddressComputationCost()`.
Instead of defining unary/binary/ternary/4-ary overloads of each matcher,
we can use parameter packs to support arbitrary numbers of operands.
This allows us to remove the explicit N-ary definitions for each
matcher.
We need to rewrite Recipe_match's constructor to use a parameter pack
too, otherwise we end up with ambiguous overloads.
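In self-contained form, the idea looks like this (illustrative names, not the actual Recipe_match):
```cpp
#include <tuple>

// One variadic template covers what previously needed separate
// unary/binary/ternary/4-ary definitions.
template <typename... OpTys> struct MatchN {
  std::tuple<OpTys...> Ops;
  explicit MatchN(OpTys... Operands) : Ops(Operands...) {}
  static constexpr unsigned NumOperands = sizeof...(OpTys);
};

// A single factory replaces the per-arity overloads; MatchN's constructor
// takes a pack itself so the factories do not become ambiguous.
template <typename... OpTys> MatchN<OpTys...> m_Node(OpTys... Operands) {
  return MatchN<OpTys...>(Operands...);
}

// m_Node(1), m_Node(1, 2) and m_Node(1, 2, 3, 4) all share one definition.
```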
After clearing the dependencies in copyable data, we need to recalculate
dependencies for the original ScheduleData if it can be marked as
control dependent.
Fixes #153289
Shift the replacement of the regular VPBB for vector.ph with the VPIRBB
wrapping the created IR block directly to skeleton creation, to be
consistent with how the scalar preheader is handled.
We almost always have only one header mask, except with the data
tail-folding style, i.e. with VPInstruction::ActiveLaneMask.
All we need to do is make sure to erase the old icmp-based header mask
when replacing it.
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.
Recommit with a fix for the verifier error caused by EVL recipes.
Extra test coverage added in 6f939da60e.
Attempt to narrow a phi of shufflevector instructions where the two
incoming values have the same operands but different masks.
Related to #128938.
Co-authored-by: Leon Clark <leoclark@amd.com>