llvm-project

Author	SHA1	Message	Date
Florian Hahn	300d2c6d20	[VPlan] Move SCEV expansion to VPlan transform. (NFCI). Move the logic to expand SCEVs directly to a late VPlan transform that expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an abstract recipe without execute, which clarifies how the recipe is handled, i.e. it is not executed like regular recipes. It also helps to simplify construction, as now scalar evolution isn't required to be passed to the recipe.	2025-08-21 22:03:26 +01:00
Ramkumar Ramachandra	a96b78cf41	[SCEVPatternMatch] Add signed cst match; use in LV (NFC) (#154568 ) Add a m_scev_SpecificSInt for matching a sign-extended value, and use it to improve some code in LoopVectorize.	2025-08-21 15:46:53 +00:00
Florian Hahn	4e6c88be7c	[TTI] Remove Args argument from getOperandsScalarizationOverhead (NFC). (#154126 ) Remove the ArrayRef<const Value> Args operand from getOperandsScalarizationOverhead and require that the callers de-duplicate arguments and filter constant operands. Removing the Value based Args argument enables callers where no Value * operands are available to use the function in a follow-up: computing the scalarization cost directly for a VPlan recipe. It also allows more accurate cost-estimates in the future: for example, when vectorizing a loop, we could also skip operands that are live-ins, as those also do not require scalarization. PR: https://github.com/llvm/llvm-project/pull/154126	2025-08-20 21:09:08 +01:00
David Sherwood	e172110d12	[LV] Don't calculate scalar costs for scalable VFs in setVectorizedCallDecision (#152713 ) In setVectorizedCallDecision we attempt to calculate the scalar costs for vectorisation calls, even for scalable VFs where we already know the answer is Invalid. We can avoid doing unnecessary work by skipping this completely for scalable vectors.	2025-08-20 15:00:31 +01:00
Florian Hahn	dc23869f98	[LV] Handle vector trip count being zero in preparePlanForEpiVectorLoop. After a485e0e, we may not set the vector trip count in preparePlanForEpilogueVectorLoop if it is zero. We should not choose a VF * UF that makes the main vector loop dead (i.e. vector trip count is zero), but there are some cases where this can happen currently. In those cases, set EPI.VectorTripCount to zero.	2025-08-20 11:54:22 +01:00
David Sherwood	13d8ba7dea	[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086 ) There are a couple of places in the loop vectoriser where we want to calculate the cost of extracting the last lane in a vector. However, we wrongly assume that asking for the cost of extracting lane (VF.getKnownMinValue() - 1) is an accurate representation of the cost of extracting the last lane. For SVE at least, this is non-trivial as it requires the use of whilelo and lastb instructions. To solve this problem I have added a new getReverseVectorInstrCost interface where the index is used in reverse from the end of the vector. Suppose a vector has a given ElementCount EC, the extracted/inserted lane would be EC - 1 - Index. For scalable vectors this index is unknown at compile time. I've added an AArch64 hook that better represents the cost, and also a RISCV hook that maintains compatibility with the behaviour prior to this PR. I've also taken the liberty of adding support in vplan for calculating the cost of VPInstruction::ExtractLastElement.	2025-08-19 09:31:37 +01:00
Mel Chen	1dac302ce7	[LV] Explicitly disallow interleaved access requiring gap mask for scalable VFs. nfc (#154122 ) Currently, VPInterleaveRecipe::execute does not support generating LLVM IR for interleaved accesses that require a gap mask for scalable VFs. It would be better to detect and prevent such groups from being vectorized as interleaved accesses in LoopVectorizationCostModel::interleavedAccessCanBeWidened, rather than relying on the TTI function getInterleavedMemoryOpCost to return an invalid cost.	2025-08-19 08:42:39 +08:00
Florian Hahn	7e9989390d	[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487 ) Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to serve their users requiring a vector, instead of doing so when unrolling by VF. Now we only need to implicitly build vectors in VPTransformState::get for VPInstructions. Once they are also unrolled by VF we can remove the code-path altogether. PR: https://github.com/llvm/llvm-project/pull/151487	2025-08-18 20:49:42 +01:00
Kazu Hirata	07eb7b7692	[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068 ) This patch replaces SmallSet<T , N> with SmallPtrSet<T , N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.	2025-08-18 07:01:29 -07:00
Florian Hahn	351d398a37	[VPlan] Run final VPlan simplifications before codegen. Dissolving the hierarchical VPlan CFG and converting abstract to concrete recipes can expose additional simplification opportunities. Do a final run of simplifyRecipes before executing the VPlan.	2025-08-16 18:54:27 +01:00
Florian Hahn	2ed727f3f6	[VPlan] Move SCEV invalidation to ::executePlan. (NFCI) Move SCEV invalidation from legacy ILV code-path directly to ::executePlan.	2025-08-15 20:32:41 +01:00
Florian Hahn	db98ac43ec	[LV] Use shl for ((VF * Step) * vscale) in createStepForVF. (#153495 ) Directly emit shl instead of a multiply if VF * Step is a power-of-2. The main motivation here is to prepare the code and test for directly generating and expanding a SCEV expression of the minimum iteration count. SCEVExpander will directly emit shl for multiplies with powers-of-2. InstCombine will also perform this combine, so end-to-end this should effectively be NFC. PR: https://github.com/llvm/llvm-project/pull/153495	2025-08-14 19:27:51 +01:00
Florian Hahn	ff0ce74be8	[VPlan] Replace scalar preheader with VPIRBB at single place (NFC). Replace the scalar preheader VPBB with a VPIRBB wrapping the IR basic block created by createVectorizedLoopSkeleton.	2025-08-14 19:11:34 +01:00
Florian Hahn	177f27d220	[VPlan] Add incoming_[blocks,values] iterators to VPPhiAccessors (NFC) (#138472 ) Add 3 new iterator ranges to VPPhiAccessors * incoming_values(): returns a range over the incoming values of a phi * incoming_blocks(): returns a range over the incoming blocks of a phi * incoming_values_and_blocks: returns a range over pairs of incoming values and blocks. Depends on https://github.com/llvm/llvm-project/pull/124838. PR: https://github.com/llvm/llvm-project/pull/138472	2025-08-14 16:47:04 +01:00
Elvis Wang	01fac67e2a	[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342 ) This patch adds cost kind to `getAddressComputationCost()` for #149955. Note that this patch also removes all the default values in `getAddressComputationCost()`.	2025-08-14 16:01:44 +08:00
Florian Hahn	9400490a3c	[LV] Remove unused ILV state (NFC). Remove unused member variables from InnerLoopVectorizer.	2025-08-13 21:28:50 +01:00
Ramkumar Ramachandra	d107c29fef	[VPlan] Strip unused CanonicalIVTy arg (NFC) (#153418 )	2025-08-13 15:53:56 +01:00
Florian Hahn	48bfaa4c06	[VPlan] Replace VPBB for vector.ph during skeleton creation (NFC) Shift replacement of regular VPBB for vector.ph with the VPIRBB wrapping the created IR block directly to skeleton creation, to be consistent with how the scalar preheader is handled.	2025-08-13 08:30:18 +01:00
Florian Hahn	8cdab07aaa	Reapply "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9. Recommit with a fix for the verifier error caused for EVL recipes. Extra test coverage added in 6f939da60e.	2025-08-12 22:09:30 +01:00
Florian Hahn	424258947e	[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879 ) Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: https://github.com/llvm/llvm-project/pull/152879	2025-08-12 14:13:13 +01:00
David Sherwood	8140779a9a	[LV] Improve accuracy of branch weights in epilogue iteration check block (#152980 ) When one of the vector loops (main or epilogue) is scalable and the other isn't, we can use the estimated value of vscale to improve the accuracy.	2025-08-12 10:37:47 +01:00
Sam Tebbs	0bfa1718af	[LV] Create in-loop sub reductions (#147026 ) This PR allows the loop vectorizer to handle in-loop sub reductions by forming a normal in-loop add reduction with a negated input. Stacked PRs: 1. -> https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-08-12 10:22:41 +01:00
Florian Hahn	1c7c8e3ad3	Revert "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2. This seems to be breaking some RISCV bots, reverting for now https://lab.llvm.org/buildbot/#/builders/210/builds/1266	2025-08-11 22:05:30 +01:00
Florian Hahn	1f17bb133f	[VPlan] Remove trivial dead VPPhi cycles. Update removeDeadRecipes to remove trivial dead VPPhi cycles. Should effectively be NFC end-to-end.	2025-08-11 21:29:49 +01:00
Luke Lau	aea82a780a	[VPlan] Remove some getCanonicalIV() uses. NFC (#152969 ) A lot of the time getCanonicalIV() is used to get the canonical IV type, e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext. However VPTypeAnalysis has a constructor that takes the VPlan directly and there's a method on VPlan to get the LLVMContext directly, so use those instead where possible. This lets us remove a constructor on VPTypeAnalysis. Also remove an unused LLVMContext argument in UnrollState whilst we're here.	2025-08-11 18:12:05 +08:00
Luke Lau	acb86fb9e0	[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657 ) In some places we were passing the type of value being accessed, in other cases we were passing the type of the pointer for the access. The most "involved" user is LoopVectorizationCostModel::getMemInstScalarizationCost, which is the only call site that passes in the SCEV, and it passes along the pointer type. This changes call sites to consistently pass the pointer type, and renames the arguments to clarify this. No target actually checks the contents of the type passed, only to see if it's a vector or not, so this shouldn't have an effect.	2025-08-11 18:00:12 +08:00
David Sherwood	aba0ce10c7	[LV] Add new line to interleaving disabled message (#152722 )	2025-08-11 09:53:20 +01:00
David Sherwood	9181a7e294	[LV] Fix branch weights in epilogue min iteration check block (#152534 ) I've changed how we construct the EpilogueVectorizerEpilogueLoop and EpilogueVectorizerMainLoop classes so that we construct the parent class with an additional boolean parameter indicating whether we're vectorising the main or epilogue loop. The InnerLoopAndEpilogueVectorizer class uses this new argument in combination with the EpilogueLoopVectorizationInfo struct to set the right UF and VF values. This then allows EpilogueVectorizerEpilogueLoop to access the correct values of VF and UF for the main loop, which are required when setting branch weights in the minimum iteration check block.	2025-08-11 09:52:54 +01:00
Florian Hahn	86813aa786	[VPlan] Add dedicated user for resume phi with epilogue vectorization. Epilogue vectorization currently relies on the resume phi for the canonical induction being always available, which is why VPPhi are considered to have side-effects, to prevent their removal. This patch adds a new ResumeForEpilogue opcode to mark the resume phi as used for epilogue vectorization. This allows treating VPPhis in general as not having side-effects, enabling removal of unused VPPhis.	2025-08-10 21:21:16 +01:00
Florian Hahn	06fd0f9d65	[VPlan] Move initial skeleton construction earlier (NFC). (#150848 ) Split up the not clearly named prepareForVectorization transform into buildVPlan0, which adds the vector preheader, middle and scalar preheader blocks, as well as the canonical induction recipes and sets the trip count. The new transform is run directly after building the plain CFG VPlan initially. The remaining code handling early exits and adding the branch in the middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still runs at the original position. With the code movement, we only have to add the skeleton once to the initial VPlan, and cloning will take care of the rest. It will also enable moving other construction steps to work directly on VPlan0, like adding resume phis. PR: https://github.com/llvm/llvm-project/pull/150848	2025-08-09 20:54:42 +01:00
Florian Hahn	82d633e9ff	[VPlan] Materialize vector trip count using VPInstructions. (#151925 ) Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation. PR: https://github.com/llvm/llvm-project/pull/151925	2025-08-08 11:44:32 +01:00
Florian Hahn	a485e0eae0	[VPlan] Retrieve vector TC for epilogue from resume phi (NFC). Instead of relying on getOrCreateVectorTripCount to initialize EPI.VectorTripCount, delay initialization after we retrieved the resume phi and get the trip count from there. This makes the code independent of legacy vector trip count creation.	2025-08-07 07:52:35 +01:00
Luke Lau	df8da2ff83	[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110 ) Now that VPWidenPointerInductionRecipes are modelled in VPlan in #148274, we can support them in EVL tail folding. We need to replace their VFxUF operand with EVL as the increment is not guaranteed to always be VF on the penultimate iteration, and UF is always 1 with EVL tail folding. We also need to move the creation of the backedge value to the latch so that EVL dominates it. With this we will no longer fail to convert a VPlan to EVL tail folding, so adjust tryAddExplicitVectorLength to account for this. This brings us to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail folding vs no tail folding. The test in only-compute-cost-for-vplan-vfs.ll previously relied on widened pointer inductions with EVL tail folding to end up in a scenario with no vector VPlans, so this also replaces it with an unvectorizable fixed-order recurrence test from first-order-recurrence-multiply-recurrences.ll that also gets discarded.	2025-08-07 10:54:24 +08:00
Florian Hahn	e80e7e717e	[VPlan] Use scalar VPPhi instead of VPWidenPHIRecipe in createPlainCFG. (#150847 ) The initial VPlan closely reflects the original scalar loop, so using VPWidenPHIRecipe here is premature. Widened phi recipes should only be introduced together with other widened recipes. PR: https://github.com/llvm/llvm-project/pull/150847	2025-08-06 14:43:03 +01:00
Florian Hahn	777c320e6c	[VPlan] Address comments missed in #142309 . Address additional comments from https://github.com/llvm/llvm-project/pull/142309.	2025-08-06 11:52:08 +01:00
Florian Hahn	d478502a42	[VPlan] Ensure that IV resume phi for epilogue is always first. (NFCI) Update handling of canonical IV resume phi for the epilogue loop to make sure the resume phi for the canonical IV is always the first phi in the scalar preheader. This makes it easier to retrieve it in preparePlanForEpilogueVectorLoop. For now, we keep an assert to make sure we use the same resume phi as before. This will be removed in the future.	2025-08-05 21:06:41 +01:00
Florian Hahn	c9dd14d1d4	[VPlan] Compute interleave count for VPlan. (#149702 ) Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some information directly from VPlan. Register pressure was already computed for a VPlan, and with this patch we now also check for reductions directly on VPlan, as well as checking how many load and store operations remain in the loop. This should be mostly NFC, but we may compute slightly different interleave counts, except for some edge cases, e.g. where dead loads have been removed. This shouldn't happen in practice, and the patch doesn't cause changes across a large test corpus on AArch64. Computing the interleave count based on VPlan allows for making better decisions in presence of VPlan optimizations, for example when operations on interleave groups are narrowed. Note that there are a few test changes for tests that were still checking the legacy cost-model output when it was computed in selectInterleaveCount. PR: https://github.com/llvm/llvm-project/pull/149702	2025-08-05 09:42:55 +01:00
Florian Hahn	215e6beae0	[LV] Use MapVector for ScalarCostsTy for deterministic iter order (NFC) We iterate over the scalar costs of instructions when printing costs, and currently the iteration order is not deterministic. Currently no tests check the output with multiple instructions in the map, but those will come soon.	2025-08-04 19:31:07 +01:00
Florian Hahn	559d1dff89	[VPlan] Materialize BackedgeTakenCount using VPInstructions. Explicitly compute the backedge-taken count using VPInstruction. This is needed to model the full skeleton in VPlan. NFC modulo some instruction re-ordering.	2025-08-03 12:21:28 +01:00
Florian Hahn	eee9755881	[LV] Refine check to find epilogue IV resume value. Make sure to check that the vector trip count is contained in the list of incoming values to serve as tie-breaker with phis with all-zero incoming values. Fixes https://github.com/llvm/llvm-project/issues/151686.	2025-08-01 20:54:39 +01:00
Florian Hahn	c300a99ea8	[LV] Use MapVector for InstsToScalarize for deterministic iter order (NFC) We iterate over InstsToScalarize when printing costs, and currently the iteration order is not deterministic. Currently no tests check the output with multiple instructions in InstsToScalarize, but those will come soon.	2025-08-01 14:29:53 +01:00
Mel Chen	6752415ce8	[VectorUtils] Simplify the code by new function InterleaveGroup::isFull. nfc (#151112 )	2025-07-31 16:02:53 +08:00
Shih-Po Hung	cc8c941e17	[VPlan] Convert EVL loops to variable-length stepping after dissolution (#147222 ) Loop regions require fixed-length steps and rounded-up trip counts, but after dissolution creates explicit control flow, EVL loops can leverage variable-length stepping with original trip counts. This patch adds a post-dissolution transform pass to convert EVL loops from fixed-length to variable-length stepping.	2025-07-30 16:50:57 +08:00
Florian Hahn	55f9eccee9	[LV] Revert back to use Loop::isLoopInvariant in isPredicatedInst. (#150828 ) This partially reverts https://github.com/llvm/llvm-project/pull/140744, restoring the original TheLoop->isLoopInvariant check instead of the more powerful Legal->isInvariant, which uses SCEV. This causes a mis-compile, because SCEV can prove that the stored value is loop-invariant, which in turn converts the store to a uniform store. But in VPlan, we aren't yet able to determine that the stored value is loop-invariant, so we extract the last lane, which is incorrect, because it does not account for the mask of the store. Restoring the original code is a safe fix and avoids this subtle divergence. Fixes https://github.com/llvm/llvm-project/issues/149347. PR: https://github.com/llvm/llvm-project/pull/150828	2025-07-29 20:32:31 +01:00
Paul Walker	3ede2decbe	[LLVM][LV] Improve UF calculation for vscale based scalar loops. (#146102 ) Update getSmallConstantTripCount() to return scalable ElementCount values that are used to accurately determine the maximum value for UF, namely: TripCount / VF ==> X * VScale / Y * VScale ==> X / Y This improves the chances of being able to remove the scalar loop and also fixes an issue where a UF=2 is chosen for a scalar loop with exactly VF(= X * VScale) iterations.	2025-07-29 12:49:38 +01:00
Luke Lau	92d09245d6	[VPlan] Fall back to scalar epilogue if possible when EVL isn't legal (#150908 ) When enabling predicated vectorization by default on RISC-V, there's a bunch of performance regressions on llvm-test-suite's LoopInterleaving microbenchmarks: https://lnt.lukelau.me/db_default/v4/nts/788?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=791&baseline=730&submit=Update Most of these regressions stem from the interleave_count pragma, which causes EVL tail folding interleaving to be unsupported (since we don't support unrolling with EVL). Currently if DataWithEVL isn't legal we fall back to DataWithoutLaneMask as the tail folding style, but this is very slow on RISC-V. The order of performance roughly is something like: DataWithEVL > None (scalar-epilogue) > Data[WithoutLaneMask] So this patch tries to prevent the regressions by falling back to a scalar epilogue where possible, i.e. the existing vectorization we have today. Note we may still need to fall back to DataWithoutLaneMask, e.g. if the trip count is low etc or it's forced by -prefer-predicate-over-epilogue=predicate-dont-vectorize.	2025-07-28 20:10:36 +08:00
Florian Hahn	2f2df751d4	[LV] Use SCEV::getElementCount in selectEpilogueVectorizationFactor. (#150018 ) Follow-up to https://github.com/llvm/llvm-project/pull/149789 to use getElementCount to compute the remaining iterations in selectEpilogueVectorizationFactor. PR: https://github.com/llvm/llvm-project/pull/150018	2025-07-28 12:12:27 +01:00
Florian Hahn	80c43b6c07	[VPlan] Add ExtractLane VPInst to extract across multiple parts. (#148817 ) This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane. The patch updates early-exit codegen to use it instead of ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops. The patch removes the restrictions added in 6f43754e9 (#145877), but does not yet automatically select interleave counts > 1 for early-exit loops. I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count. PR: https://github.com/llvm/llvm-project/pull/148817	2025-07-27 08:08:25 +01:00
Florian Hahn	fa3ec0c17c	[VPlan] Materialize constant vector trip counts before final opts. (#142309 ) Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required. This enables removing a number of redundant branches from the middle block. For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together. PR: https://github.com/llvm/llvm-project/pull/142309	2025-07-26 17:16:36 +01:00
Florian Hahn	662bede01e	[LV] Handle known-false mem runtime checks in GeneratedRTChecks. Handle mem checks known to be false in getMemRuntimeChecks the same way as SCEV checks known to be false in getSCEVChecks. This ensures such redundant check blocks are not added in the first place.	2025-07-26 15:39:21 +01:00
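
Several of the messages above state small arithmetic rules: the vector trip count (TC / (VF * UF)) * (VF * UF), the reverse lane index EC - 1 - Index, and emitting shl for (VF * Step) * vscale when VF * Step is a power of 2. The following is only a rough sketch of that arithmetic on plain integers. The helper names (vectorTripCount, reverseLane, stepForVF, isPowerOf2, log2Exact) are hypothetical and are not LLVM APIs; the real code operates on VPlan recipes and IR values rather than on constants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): number of iterations covered by the
// vector loop, i.e. TC rounded down to a multiple of VF * UF.
constexpr uint64_t vectorTripCount(uint64_t TC, uint64_t VF, uint64_t UF) {
  const uint64_t Step = VF * UF;
  return (TC / Step) * Step;
}

// Hypothetical helper: lane addressed by a reverse index, counted from the
// end of a vector with EC elements (Index 0 is the last lane).
constexpr uint64_t reverseLane(uint64_t EC, uint64_t Index) {
  return EC - 1 - Index;
}

// True if X is a non-zero power of two.
constexpr bool isPowerOf2(uint64_t X) { return X != 0 && (X & (X - 1)) == 0; }

// Base-2 logarithm of X; only meaningful when X is a power of two.
constexpr unsigned log2Exact(uint64_t X) {
  unsigned Shift = 0;
  while (X > 1) {
    X >>= 1;
    ++Shift;
  }
  return Shift;
}

// Hypothetical helper: computes (VF * Step) * vscale, using a shift when
// VF * Step is a power of two and a multiply otherwise. Both paths yield
// the same value; only the operation used differs.
constexpr uint64_t stepForVF(uint64_t VFTimesStep, uint64_t VScale) {
  return isPowerOf2(VFTimesStep) ? (VScale << log2Exact(VFTimesStep))
                                 : VFTimesStep * VScale;
}
```

With these definitions, a trip count smaller than VF * UF yields a vector trip count of zero, which is the dead main vector loop case that dc23869f98 handles for the epilogue.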


2667 Commits