llvm-project

Author	SHA1	Message	Date
Paul Walker	6955a7d134	[NFC][LLVM][Instrumentation][LoopVectorize] Regenerate test checks.	2025-06-05 11:38:30 +00:00
Luke Lau	5458ea5122	[LV] Regenerate UTC variable names in RISCV/interleaved-accesses.ll. NFC	2025-06-04 01:07:32 +01:00
Florian Hahn	11713e86b0	[LV] Move VPlan-based calculateRegisterUsage to VPlanAnalysis (NFC). (#135673) Move VPlan-based calculateRegisterUsage from LoopVectorize to VPlanAnalysis.cpp. It is a VPlan-based analysis and this helps to reduce the size of LoopVectorize. PR: https://github.com/llvm/llvm-project/pull/135673	2025-06-02 17:40:50 +01:00
Florian Hahn	9ea4924720	[VPlan] Use EMIT-SCALAR for single-scalar VPPhis (NFC). Follow-up to https://github.com/llvm/llvm-project/pull/141428, to also use EMIT-SCALAR for VPPhis that are single scalars.	2025-05-29 11:20:07 +01:00
Florian Hahn	5b85e4b08d	[VPlan] Use EMIT-SCALAR when printing single-scalar VPInstructions. (#141428) By using EMIT-SCALAR when printing, it is clear in the debug output that those VPInstructions only produce a single scalar. Split off in preparation for https://github.com/llvm/llvm-project/pull/140623. PR: https://github.com/llvm/llvm-project/pull/141428	2025-05-29 09:29:06 +01:00
Ramkumar Ramachandra	5f39be5917	[VPlan] Use InstSimplifyFolder instead of TargetFolder (#141222) For more powerful folding with operands that are not necessarily all-constant, use InstSimplifyFolder instead of TargetFolder in tryToConstantFold, and rename the function to tryToFoldLiveIns.	2025-05-28 11:00:14 +02:00
Florian Hahn	d56deea1e4	[VPlan] Connect Entry to scalar preheader during initial construction. (#140132) Update initial construction to connect the Plan's entry to the scalar preheader. This moves a small part of the skeleton creation out of ILV and will also enable replacing VPInstruction::ResumePhi with regular VPPhi recipes. Resume phis need 2 incoming values to start with, the second being the bypass value from the scalar preheader (which is also used to replicate the incoming value for other bypass blocks). Adding the extra edge ensures that the incoming values for resume phis match the incoming blocks. PR: https://github.com/llvm/llvm-project/pull/140132	2025-05-27 16:07:56 +01:00
Luke Lau	841c8d48a6	[LV] Add tests for more interleave group factors on AArch64 and RISC-V. NFC. The plan is to eventually add support for scalably vectorizing these for non-power-of-2 factors; see https://github.com/llvm/llvm-project/pull/139893. Simultaneously, we need a test to make sure we don't generate @llvm.vector.[de]interleave3 for AArch64 while we can't lower it (yet).	2025-05-26 18:21:27 +01:00
Philip Reames	041d189f01	[RISCV][TTI] Adjust costing in getPartialReductionCost for zvqdotq (#141430) Two changes: 1) Handle fixed vector cases now that 77a3f8 has landed. 2) Fix a mistake in the original costing: the VF passed in is the input VF, not the output VF. Given that, we should be costing the accumulator type with VF/4. Note that (2) does not cause any visible test differences, as the vectorizer (outside of maximize-bandwidth mode) does not consider a wide enough VF for the costing difference to matter.	2025-05-26 08:23:56 -07:00
Florian Hahn	dcef154b5c	[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506) Building on top of https://github.com/llvm/llvm-project/pull/114305, replace VPRegionBlocks with explicit CFG before executing. This brings the final VPlan closer to the IR that is generated and helps to simplify codegen. It will also enable further simplifications of phi handling during execution, as well as transformations that do not have to preserve the canonical IV required by loop regions. For example, this could include replacing the canonical IV with an EVL-based phi while completely removing the original canonical IV. PR: https://github.com/llvm/llvm-project/pull/117506	2025-05-24 19:17:16 +01:00
Philip Reames	a21fb74c0c	[RISCV][TTI] Implement getPartialReductionCost for the vqdotq cases (#140974) Doing so tells the loop vectorizer that the partial.reduce intrinsic is profitable to use over the plain extend/multiply/reduce.add sequence.	2025-05-23 07:15:06 -07:00
Philip Reames	c21416d1f9	[RISCV][TTI] Add test coverage for getPartialReductionCost [nfc] Adding testing in advance of a change to cost the zvqdotq instructions such that we emit them from LV.	2025-05-21 15:12:23 -07:00
Ramkumar Ramachandra	cf1f116f78	[VPlan] Introduce constant folder in simplifyRecipe (#125365) Introduce a VPlan-level constant folder in simplifyRecipe that tries to fold a recipe to a constant using TargetFolder.	2025-05-20 14:16:01 +01:00
Sam Tebbs	70501ed2f0	[LoopVectorizer] Prune VFs based on plan register pressure (#132190) This PR moves the register usage checking to after the plans are created, so that any recipes that optimise register usage (such as partial reductions) can be properly costed and not have their VF pruned unnecessarily. Depends on https://github.com/llvm/llvm-project/pull/137746	2025-05-19 13:27:17 +01:00
Min-Yih Hsu	0ab67ec191	[LV][EVL] Introduce the EVLIndVarSimplify Pass for EVL-vectorized loops (#131005) When we enable EVL-based loop vectorization with predicated tail-folding, each vectorized loop effectively has two induction variables: one calculates the step using (VF x vscale) and the other increases the IV by the values returned from experimental.get.vector.length. The former, also known as the canonical IV, is more favorable for analyses as it's "countable" in the sense of SCEV; the latter (the EVL-based IV), however, is more favorable to codegen, at least on targets that support scalable vectors like AArch64 SVE and RISC-V. The idea is that we use the canonical IV all the way until the end of all vectorizers, where we replace it with the EVL-based IV using the EVLIVSimplify pass introduced here, so that we get the best of both worlds. This Pass is enabled by default on RISC-V. However, since we don't yet vectorize loops with predicated tail-folding by default, this Pass is a no-op at the moment.	2025-05-14 13:49:50 -07:00
Florian Hahn	7a9fd62278	[VPlan] Use VPlan operand order for VPBlendRecipes. (#139475) Don't use the order of incoming values of IR phis when creating VPBlendRecipes. Instead, simply use the incoming operands and blocks from the VPWidenPHIRecipe. Note that this changes the order of the incoming operands/masks for some blends. PR: https://github.com/llvm/llvm-project/pull/139475	2025-05-14 14:56:35 +01:00
Florian Hahn	5fa64d65e9	[VPlan] Use printPhiOperands for VPPhi. Split off from https://github.com/llvm/llvm-project/pull/139151 to land printing improvements separately. Updates printing of VPPhi operands to be consistent with VPWidenPHIRecipe.	2025-05-10 12:49:29 +01:00
Luke Lau	1484f82cbc	[VPlan] Add VPInstruction::StepVector and use it in VPWidenIntOrFpInductionRecipe (#129508) Split off from #118638, this adds VPInstruction::StepVector, which generates integer step vectors (0,1,2,...,VF-1). This is a step towards eventually modelling all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan. This is then used by VPWidenIntOrFpInductionRecipe, where we materialize it just before unrolling so the operands stay in a fixed position. The need for a separate operand in VPWidenIntOrFpInductionRecipe, as well as the need to update it in optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638 when everything is expanded in convertToConcreteRecipes.	2025-05-08 18:47:44 +08:00
Min-Yih Hsu	e0537c0768	[LV][EVL] Attach a new metadata on EVL vectorized loops (#131000) This patch attaches new metadata, `llvm.loop.isvectorized.withevl`, to loops vectorized with an explicit vector length. This will help other optimizations further down the pipeline that focus on EVL-vectorized loops. This approach is much safer than, say, IR pattern matching to figure out whether a loop is EVL-vectorized or not.	2025-05-06 10:06:37 -07:00
Florian Hahn	043b04acff	Reapply "[VPlan] Fold NOT into predicate of wide compares." (#130347) This reverts commit 8dd160f4767f971572eac065c8650d9202ff5bf9. The recommit contains an adjustment to planContainsAdditionalSimplifications, which considers changes to the original predicate for compares. Original commit message: Add simplification to fold negation into a compare, if the negation is the only user of the compare. This removes a number of redundant negations. Alive2 Proofs for FPCMP test changes: https://alive2.llvm.org/ce/z/WGDz9U PR: https://github.com/llvm/llvm-project/pull/129430	2025-04-28 20:01:37 +01:00
YunQiang Su	e9a34e4236	[RISCV] Support vectorizing FMINIMUMNUM and FMAXIMUMNUM (#135727) The RISC-V V extension supports vfmax and vfmin, which follow IEEE 754-2019. We can use them directly.	2025-04-27 19:10:02 +08:00
Florian Hahn	df21288247	[VPlan] Replace ExtractFromEnd with Extract(Last|Penultimate)Element (NFC). (#137030) ExtractFromEnd only has 2 uses, extracting the last and penultimate elements. Replace it with 2 separate opcodes, removing the need to materialize and handle a constant argument. PR: https://github.com/llvm/llvm-project/pull/137030	2025-04-25 16:27:29 +01:00
Florian Hahn	5739a22fbb	[VPlan] Also duplicate scalar-steps when it enables sinking scalars. (#136021) Extend the sinking logic to duplicate a scalar-steps recipe if doing so enables sinking, that is, if all users in a destination block require all lanes. This should be the last step before removing the legacy sinkScalarOperands. PR: https://github.com/llvm/llvm-project/pull/136021	2025-04-21 18:36:43 +01:00
Luke Lau	41675fa5b8	[VPlan] Simplify vp.merge true, (or x, y), x -> vp.merge y, true, x (#135017) With EVL tail folding, an AnyOf reduction will emit an i1 vp.merge like `vp.merge true, (or phi, cond), phi, evl`. We can remove the or and optimise this to `vp.merge cond, true, phi, evl`, which makes it slightly easier to pattern match in #134898. This also adds a pattern matcher for calls to help match this. Blended AnyOf reductions will use an and instead of an or, which we may also be able to simplify in a later patch.	2025-04-17 16:31:14 +02:00
Mel Chen	ffd5b14894	[LV] Add test cases for reverse accesses involving irregular types. nfc (#135139) Add a test with an irregular type to ensure that vector load/store instructions are not generated.	2025-04-14 14:17:39 +08:00
Florian Hahn	995fd47944	[LAA] Make sure MaxVF for Store-Load forward safe dep distances is pow2. MaxVF computed in couldPreventStoreLoadForward may not be a power of 2, as CommonStride may not be a power of 2. This can cause crashes after 78777a20. Use bit_floor to make sure it is a suitable power of 2. Fixes https://github.com/llvm/llvm-project/issues/134696.	2025-04-12 20:05:37 +01:00
Florian Hahn	6a9e8fc50c	[VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast (NFC) (#129706) There are some opcodes that currently require specialized recipes, due to their result type not being implied by their operands, including casts. This leads to duplication from defining multiple full recipes. This patch introduces a new VPInstructionWithType subclass that also stores the result type. The general idea is for opcodes that need to specify a result type to use this general recipe. The current patch replaces VPScalarCastRecipe with VPInstructionWithType; a similar patch for VPWidenCastRecipe will follow soon. There are a few proposed opcodes that should also benefit, without the need for workarounds: * https://github.com/llvm/llvm-project/pull/129508 * https://github.com/llvm/llvm-project/pull/119284 PR: https://github.com/llvm/llvm-project/pull/129706	2025-04-10 22:30:40 +01:00
Florian Hahn	6f92339d9e	[LV] Compute register usage for interleaving on VPlan. (#126437) Add a version of calculateRegisterUsage that estimates register usage for a VPlan. This mostly just ports the existing code, with some updates to figure out which recipes will generate vectors vs scalars. There are a number of changes in the computed register usages, but they should be more accurate w.r.t. the generated vector code. There are the following changes: * Scalar usage increases in most cases by 1, as we always create a scalar canonical IV, which is alive across the loop and is not considered by the legacy implementation. * Output is ordered by insertion; scalar registers are now added first due to the canonical IV phi. * Using the VPlan, we now also know more precisely whether an induction will be vectorized or scalarized. Depends on https://github.com/llvm/llvm-project/pull/126415 PR: https://github.com/llvm/llvm-project/pull/126437	2025-04-08 20:52:50 +01:00
Florian Hahn	5fbd0658a0	[VPlan] Add initial CFG simplification, removing BranchOnCond true. (#106748) Add an initial CFG simplification transform, which removes the dead edges for blocks terminated with BranchOnCond true. At the moment, this removes the edge between middle block and scalar preheader when folding the tail. PR: https://github.com/llvm/llvm-project/pull/106748	2025-04-04 15:44:26 +01:00
Luke Lau	79435de8a5	[ConstantFold] Support scalable constant splats in ConstantFoldCastInstruction (#133207) Previously only fixed vector splats were handled. This adds support for scalable vectors too by allowing ConstantExpr splats. We need to add the extra V->getType()->isVectorTy() check because a ConstantExpr might be a scalar to vector bitcast. By allowing ConstantExprs, this also allows fixed vector ConstantExprs to be folded, which causes the diffs in llvm/test/Analysis/ValueTracking/known-bits-from-operator-constexpr.ll and llvm/test/Transforms/InstSimplify/ConstProp/cast-vector.ll. I can remove them from this PR if reviewers would prefer. Fixes #132922	2025-04-03 16:24:56 +01:00
YunQiang Su	e25187bc3e	LLVM/Test: Add vectorizing testcases for fminimumnum and fmaximumnum (#133843) Vectorizing fminimumnum and fmaximumnum is not supported yet. Let's add the testcases for them now, and we will update the testcases when we support them.	2025-04-02 08:46:02 +08:00
Luke Lau	6afe5e5d1a	[LV][EVL] Peek through combination tail-folded + predicated masks (#133430) If a recipe was predicated and tail folded at the same time, it will have a mask like `EMIT vp<%header-mask> = icmp ule canonical-iv, backedge-tc` followed by `EMIT vp<%mask> = logical-and vp<%header-mask>, vp<%pred-mask>`. When converting to an EVL recipe, if the mask isn't exactly just the header-mask, we copy the whole logical-and. We can remove this redundant logical-and (because it's now covered by EVL) and just use vp<%pred-mask> instead. This lets us remove the widened canonical IV in more places.	2025-03-31 21:28:39 +01:00
Florian Hahn	783a846507	[VPlan] Add VF as operand to VPScalarIVStepsRecipe. Similarly to other recipes, update VPScalarIVStepsRecipe to also take the runtime VF as an argument. This removes some unnecessary runtime VF computations for scalable vectors. It will also allow dropping the UF == 1 restriction for narrowing interleave groups required in 577631f0a528.	2025-03-28 21:48:59 +00:00
Pengcheng Wang	f5f4da6db6	[RISCV] Don't vectorize for loops with small trip count (#132176) Inspired by https://reviews.llvm.org/D130755. I don't know the logic behind the value 5; it is copied from AArch64. For some tests, I had to change the trip count so that we don't break what they are testing.	2025-03-28 15:51:29 +08:00
Florian Hahn	2c7d40b2f0	[VPlan] Generalize SCALAR-STEPS removal to any unroll factor. Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled to handle any unrolled VPlan, which means there's a single UF, but it will be > 1 if unrolling took place.	2025-03-26 21:03:50 +00:00
Florian Hahn	dfca6c0d3b	[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655) After unrolling, there may be additional simplifications that can be applied. One example is removing SCALAR-STEPS for the first part where only the first lane is demanded. This removes redundant adds of 0 from a large number of tests (~200), many of which I am still working on updating. In preparation for removing redundant WideIV steps added in https://github.com/llvm/llvm-project/pull/119284. PR: https://github.com/llvm/llvm-project/pull/123655	2025-03-25 12:57:24 +00:00
Florian Hahn	c482b8faea	[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC). Instead of executing the whole entry VPIRBB twice, first only execute the VPExpandSCEVRecipes and replace their uses with the expanded VPValue, which will be a live-in. This allows removing special logic in VPExpandSCEVRecipe to support executing twice and allows moving the ExpandedSCEVs map out of VPTransformState. It will also allow adding other recipes to the entry VPBB in the future.	2025-03-23 09:06:01 +00:00
David Sherwood	4e69258bf3	[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565) At the moment, if we decide to enable tail-folding we do not include the cost of generating the mask per VF. This can mean we make some poor choices of VF, which is definitely true for SVE-enabled AArch64 targets where mask generation for fixed-width vectors is more expensive than for scalable vectors. I've added a VPInstruction::computeCost function to return the costs of the ActiveLaneMask and ExplicitVectorLength operations. Unfortunately, in order to prevent asserts firing I've also had to duplicate the same code in the legacy cost model to make sure the chosen VFs match up. I've wrapped this up in an #ifndef NDEBUG for now. The alternative would be to disable the assert completely when tail-folding, which I imagine is just as bad. New tests added: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll and Transforms/LoopVectorize/RISCV/tail-folding-cost.ll	2025-03-21 09:24:56 +00:00
Florian Hahn	11b8699572	[LV] Don't skip instrs with side-effects in reg pressure computation. (#126415) calculateRegisterUsage adds end points for each user of an instruction to Ends and ignores instructions not added to it, i.e. instructions with no users. This means things like stores aren't included, which in turn means values that are only used in stores are also not included for consideration. As a result, we underestimate the register usage in cases where the only users are things like stores. Update the code to not skip instructions without users (i.e. not in Ends) if they have side-effects. PR: https://github.com/llvm/llvm-project/pull/126415	2025-03-19 15:13:43 +00:00
Florian Hahn	870f753f1f	[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC). Also include VPlan's BTC in the set of VPValues to materialize broadcasts for, if it is used.	2025-03-18 22:35:18 +00:00
Luke Lau	a4dc02c0e7	[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086) After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.	2025-03-19 00:09:15 +08:00
David Sherwood	f6b1b91a3d	[LV][NFC] Regenerate CHECK lines in some tests (#131799) Regenerates CHECK lines in tests that are affected by PR #130565 to aid reviews.	2025-03-18 14:38:01 +00:00
Mel Chen	489d1e764e	[LV][NFC] Pre-commit test for supporting strided accesses. (#130563) Duplicate riscv-vector-reverse.ll as riscv-vector-reverse-output.ll to verify all generated IR, not just debug output. Pre-commit for #128718.	2025-03-18 16:08:42 +08:00
Florian Hahn	6a8d5f22ff	[VPlan] Don't access canonical IV in VPWidenPointerInduction::execute. This updates VPWidenPointerInductionRecipe::execute to not use the canonical IV to determine the insert point. Instead, it relies on the current recipe position. In cases where this is not sufficient, set the insert point to the first non-phi instruction, to ensure phis are created together.	2025-03-15 21:32:48 +00:00
Florian Hahn	56b05a0d6b	[VPlan] Use VFxUF in VPWidenPointerInductionRecipe. Use VFxUF VPValue instead of computing VF * UF explicitly.	2025-03-15 18:18:53 +00:00
Luke Lau	26324bc1bf	[VPlan] Move FOR splice cost into VPInstruction::FirstOrderRecurrenceSplice (#129645) After #124093 we now support fixed-order recurrences with EVL tail folding by replacing VPInstruction::FirstOrderRecurrenceSplice with a VP splice intrinsic. However the costing for the splice is currently done in VPFirstOrderRecurrencePHIRecipe, so when we add the VP splice intrinsic we end up costing it twice. This fixes it by splitting out the cost for the splice into FirstOrderRecurrenceSplice so that it's not duplicated when we replace it. We still have to keep the VF=1 checks in VPFirstOrderRecurrencePHIRecipe since the splice might end up dead and discarded, e.g. in the test @pr97452_scalable_vf1_for.	2025-03-14 15:33:32 +08:00
Florian Hahn	02575f887b	[VPlan] Use VPInstruction for VPScalarPHIRecipe. (NFCI) (#129767) Now that all phi nodes manage their incoming blocks through the VPlan-predecessors, there should be no need for having a dedicated recipe; it should be sufficient to allow PHI opcodes in VPInstruction. Follow-ups will also migrate VPWidenPHIRecipe and possibly others, building on top of https://github.com/llvm/llvm-project/pull/129388. PR: https://github.com/llvm/llvm-project/pull/129767	2025-03-13 18:35:07 +00:00
Mel Chen	5d5e706691	[VPlan] Restrict hoisting of broadcast operations using VPDominatorTree (#117138) This patch restricts broadcast operations from being hoisted to the vector preheader unless the basic block that defines the broadcasted value properly dominates the vector preheader. This prevents potential use-before-definition issues when the broadcasted value is defined within the plan. VPDominatorTree is used to confirm this restriction while still allowing safe hoisting for broadcasted values defined outside the plan. Issue https://github.com/llvm/llvm-project/issues/117139	2025-03-13 07:16:04 -07:00
Mel Chen	ffe202ca00	Revert "[LV] Limits the splat operations be hoisted must not be defined by a recipe. (#117138)" This reverts commit 1ff10fa82fff83bb2f0a5c1ffde6203b52bc9619.	2025-03-13 07:16:04 -07:00
Florian Hahn	62994c3291	[VPlan] Also introduce explicit broadcasts for values from entry VPBB. Update and generalize materializeBroadcasts to also introduce explicit broadcasts for VPValues defined in the Plan's entry block. This fixes a crash when trying to insert the broadcasts generated by VPTransformState::get after the generating instruction, which isn't possible after invoke instructions. Fixes https://github.com/llvm/llvm-project/issues/128838.	2025-03-12 22:03:19 +00:00


370 Commits
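Commit 1484f82cbc describes VPInstruction::StepVector as generating the integer step vector (0,1,2,...,VF-1). The sketch below illustrates that idea for a fixed (non-scalable) VF. `step_vector` and `widen_int_induction` are hypothetical helpers, not LLVM APIs, and the per-part offset (part * VF * step) is an assumption about how a widened integer induction is laid out across unrolled parts.

```python
def step_vector(vf: int) -> list[int]:
    """Integer step vector <0, 1, ..., vf - 1> for a fixed VF."""
    return list(range(vf))


def widen_int_induction(start: int, step: int, vf: int, part: int) -> list[int]:
    """Hypothetical model of one unrolled part of a widened integer induction.

    Lane i of part `part` holds start + (part * vf + i) * step.
    """
    base = start + part * vf * step
    return [base + lane * step for lane in step_vector(vf)]


if __name__ == "__main__":
    print(step_vector(4))                    # [0, 1, 2, 3]
    print(widen_int_induction(10, 2, 4, 1))  # [18, 20, 22, 24]
```

Scalable VFs (vscale x N lanes) are not modelled here; the lane count would only be known at run time.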
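Commit 043b04acff folds a NOT into the predicate of a wide compare when the negation is the only user of the compare. The sketch below checks the underlying identity for signed integer predicates, not (a P b) == (a inverse(P) b), by brute force over a small range. The predicate tables and helper names are hypothetical, and the single-user condition is not modelled. For floating-point compares the inverse also swaps ordered and unordered predicates (for example, the inverse of olt is uge), which is what the Alive2 proofs referenced in the commit cover.

```python
import itertools
import operator

# Signed integer compare predicates, modelled with Python comparison operators.
PREDICATES = {
    "eq": operator.eq,
    "ne": operator.ne,
    "slt": operator.lt,
    "sle": operator.le,
    "sgt": operator.gt,
    "sge": operator.ge,
}

# Inverse predicate: not (a P b) is equivalent to (a INVERSE[P] b).
INVERSE = {
    "eq": "ne", "ne": "eq",
    "slt": "sge", "sge": "slt",
    "sgt": "sle", "sle": "sgt",
}


def fold_not_into_cmp(pred: str) -> str:
    """Return the predicate to use once a NOT of `icmp pred a, b` is folded away."""
    return INVERSE[pred]


def check_inverse_predicates(values=range(-3, 4)) -> bool:
    """Brute-force check that negating a compare equals using the inverse predicate."""
    for pred, cmp in PREDICATES.items():
        inv_cmp = PREDICATES[fold_not_into_cmp(pred)]
        for a, b in itertools.product(values, repeat=2):
            assert (not cmp(a, b)) == inv_cmp(a, b), (pred, a, b)
    return True


if __name__ == "__main__":
    print(fold_not_into_cmp("slt"))    # sge
    print(check_inverse_predicates())  # True
```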
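The rewrite in commit 41675fa5b8 can be checked exhaustively on short i1 vectors. The sketch below models llvm.vp.merge lane by lane (lanes below evl select by the mask, the remaining lanes take the on_false operand) and confirms that `vp.merge true, (or x, y), x` equals `vp.merge y, true, x` for every evl. The helper names are hypothetical, and the model only covers fixed-length vectors.

```python
import itertools


def vp_merge(mask, on_true, on_false, evl):
    """Lane-wise model of llvm.vp.merge on fixed-length vectors.

    Lanes below `evl` select on_true where the mask is set, else on_false;
    lanes at or above `evl` always take on_false.
    """
    return [
        (t if m else f) if lane < evl else f
        for lane, (m, t, f) in enumerate(zip(mask, on_true, on_false))
    ]


def check_merge_or_rewrite(vf: int = 3) -> bool:
    """Check vp.merge true, (or x, y), x == vp.merge y, true, x for all i1 inputs."""
    all_true = [True] * vf
    for x in itertools.product([False, True], repeat=vf):
        for y in itertools.product([False, True], repeat=vf):
            x_or_y = [a or b for a, b in zip(x, y)]
            for evl in range(vf + 1):
                before = vp_merge(all_true, x_or_y, x, evl)
                after = vp_merge(y, all_true, x, evl)
                assert before == after, (x, y, evl)
    return True


if __name__ == "__main__":
    print(vp_merge([True, False], [1, 2], [3, 4], 1))  # [1, 4]
    print(check_merge_or_rewrite())                    # True
```

The identity holds lane by lane: below evl, `y ? true : x` is exactly `x or y`, and at or above evl both forms return x.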
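The fix in commit 995fd47944 rounds MaxVF down to a power of 2 with bit_floor. The sketch below mirrors that rounding in Python, assuming the semantics of C++20 `std::bit_floor` (and `llvm::bit_floor` in llvm/ADT/bit.h): the largest power of 2 not greater than the input, and 0 for an input of 0. It does not reproduce how LAA computes MaxVF from the dependence distance and CommonStride; `clamp_max_vf` is a hypothetical helper and the inputs are illustrative only.

```python
def bit_floor(n: int) -> int:
    """Largest power of 2 that is <= n, or 0 when n == 0 (like std::bit_floor)."""
    if n < 0:
        raise ValueError("bit_floor is only defined for non-negative values")
    return 0 if n == 0 else 1 << (n.bit_length() - 1)


def clamp_max_vf(max_vf: int) -> int:
    """Hypothetical helper: round a computed MaxVF down to a usable power of 2."""
    return bit_floor(max_vf)


if __name__ == "__main__":
    for candidate in (1, 6, 8, 12):
        print(candidate, "->", clamp_max_vf(candidate))
    # 1 -> 1, 6 -> 4, 8 -> 8, 12 -> 8
```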
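The reasoning behind commit 6afe5e5d1a is that, once the header mask is covered by EVL, `logical-and header-mask, pred-mask` selects the same lanes as pred-mask restricted to the lanes below EVL. The sketch below checks that on small examples, assuming EVL equals the remaining trip count capped at VF (llvm.experimental.get.vector.length may return a smaller value, which this model does not cover). All names are hypothetical.

```python
import itertools


def header_mask(iv: int, backedge_tc: int, vf: int) -> list[bool]:
    """Model of `icmp ule (iv + lane), backedge-tc` for each lane."""
    return [iv + lane <= backedge_tc for lane in range(vf)]


def assumed_evl(iv: int, trip_count: int, vf: int) -> int:
    """Assumption: EVL is the remaining trip count, capped at VF."""
    return min(vf, trip_count - iv)


def check_mask_peek_through(vf: int = 4, trip_count: int = 6) -> bool:
    """Check header-mask AND pred-mask selects the same lanes as pred-mask under EVL."""
    backedge_tc = trip_count - 1
    for iv in range(0, trip_count, vf):
        hmask = header_mask(iv, backedge_tc, vf)
        evl = assumed_evl(iv, trip_count, vf)
        for pred in itertools.product([False, True], repeat=vf):
            with_and = [h and p for h, p in zip(hmask, pred)]
            with_evl = [lane < evl and p for lane, p in enumerate(pred)]
            assert with_and == with_evl, (iv, pred)
    return True


if __name__ == "__main__":
    print(header_mask(4, 5, 4))        # [True, True, False, False]
    print(check_mask_peek_through())   # True
```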