llvm-project

Author	SHA1	Message	Date
Florian Hahn	2bdc1a1337	[LV] Use frozen start value for FindLastIV if needed. (#132691 ) FindLastIV introduces multiple uses of the start value, where in the original source there was only a single use, when the epilogue is vectorized. Each use of undef may produce a different result, so introducing multiple uses can produce incorrect results when the input is undef/poison. If the start value may be undef or poison, freeze it and use the frozen value, which will be the same at all uses. See the following scenarios in Alive2: * Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr * Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v * Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV * Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN The latter two show that the resume phi must be frozen, which means we cannot freeze in the preheader. We could move the freeze to the main iteration count check, but that would be a bit fragile to find and other transforms can sink the freeze if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 and https://github.com/llvm/llvm-project/pull/132690. Fixes https://github.com/llvm/llvm-project/issues/126836 PR: https://github.com/llvm/llvm-project/pull/132691	2025-04-04 11:48:01 +01:00
Florian Hahn	0f696c2e86	[LV] Add test where epilogue is vectorized and backedge removed. Adds extra test coverage for https://github.com/llvm/llvm-project/pull/106748.	2025-04-03 22:14:15 +01:00
Florian Hahn	012e574d4d	[LV] Add FindLastIV test with truncated IV and epilogue vectorization. This adds missing test coverage for https://github.com/llvm/llvm-project/pull/132691.	2025-04-03 21:01:58 +01:00
Luke Lau	79435de8a5	[ConstantFold] Support scalable constant splats in ConstantFoldCastInstruction (#133207 ) Previously only fixed vector splats were handled. This adds support for scalable vectors too by allowing ConstantExpr splats. We need to add the extra V->getType()->isVectorTy() check because a ConstantExpr might be a scalar to vector bitcast. Allowing ConstantExprs also allows fixed vector ConstantExprs to be folded, which causes the diffs in llvm/test/Analysis/ValueTracking/known-bits-from-operator-constexpr.ll and llvm/test/Transforms/InstSimplify/ConstProp/cast-vector.ll. I can remove them from this PR if reviewers would prefer. Fixes #132922	2025-04-03 16:24:56 +01:00
Ramkumar Ramachandra	00122bb56b	[LV] Regen a test with UTC (#134076 )	2025-04-02 17:23:00 +01:00
Ramkumar Ramachandra	f7591ee161	[LV] Exercise type-mismatch with RT-check conflict rdx (#130295 ) The test suite of LoopVectorize suffers from a coverage hole when types mismatch, and runtime checks are needed, with a conflict redux. Fix this coverage hole by adding tests.	2025-04-02 17:22:40 +01:00
Luke Lau	8107b430ed	[VPlan] Simplify select c, x, x -> x (#133731 ) As noted in 1a9358c090d0507be21c5e9b2d97a23ef1de8ab0, some simplifications can produce a redundant select where the true and false operands are the same, which this patch removes. The is_fpclass test was changed so the condition wasn't made dead.	2025-04-02 10:26:48 +01:00
YunQiang Su	e25187bc3e	LLVM/Test: Add vectorizing testcases for fminimumnum and fminimumnum (#133843 ) Vectorization of fminimumnum and fminimumnum is not supported yet. Let's add the testcases now, and we will update them when we support it.	2025-04-02 08:46:02 +08:00
Samuel Tebbs	a1e041b646	[NFC][AArch64] Pre-commit high register pressure dot product test	2025-04-01 14:13:30 +01:00
Ramkumar Ramachandra	3a66760d9b	[LV] Improve a test, regen with UTC (#130092 )	2025-04-01 14:11:20 +01:00
YunQiang Su	f9282475b3	Revert "LLVM/Test: Add vectorizing testcases for fminimumnum and fminimumnum (#133690 )" This reverts commit de053bb4b0db64aebdff7719ff6ce75487f6ba5d.	2025-04-01 08:48:10 +08:00
YunQiang Su	de053bb4b0	LLVM/Test: Add vectorizing testcases for fminimumnum and fminimumnum (#133690 ) Vectorization of fminimumnum and fminimumnum is not supported yet. Let's add the testcases now, and we will update them when we support it.	2025-04-01 08:00:22 +08:00
Luke Lau	6afe5e5d1a	[LV][EVL] Peek through combination tail-folded + predicated masks (#133430 ) If a recipe was predicated and tail folded at the same time, it will have a mask like EMIT vp<%header-mask> = icmp ule canonical-iv, backedge-tc EMIT vp<%mask> = logical-and vp<%header-mask>, vp<%pred-mask> When converting to an EVL recipe, if the mask isn't exactly just the header-mask we copy the whole logical-and. We can remove this redundant logical-and (because it's now covered by EVL) and just use vp<%pred-mask> instead. This lets us remove the widened canonical IV in more places.	2025-03-31 21:28:39 +01:00
Florian Hahn	4e8fbc6071	[LV] Add epilogue vectorization tests for FindLastIV reductions. Add missing test coverage for #126836.	2025-03-31 21:23:35 +01:00
Ramkumar Ramachandra	c20bea09c2	[LV] Regen a test with UTC (#133432 )	2025-03-31 16:24:45 +01:00
David Sherwood	f4d25c498a	[LV][NFC] Regenerate some SVE tests using --filter-out-after option (#132174 ) I recently added a new option to update_test_checks.py that can filter out all CHECK lines after a certain point. We usually don't care about checking for the original scalar loop after the vector loop because it doesn't change. Cutting out unnecessary CHECK lines makes the files smaller and hopefully the tests run quicker.	2025-03-31 12:40:41 +01:00
Florian Hahn	809f857d2c	[VPlan] Support early-exit loops in optimizeForVFAndUF. (#131539 ) Update optimizeForVFAndUF to support early-exit loops by handling BranchOnCond(Or(..., CanonicalIV == TripCount)) via SCEV PR: https://github.com/llvm/llvm-project/pull/131539	2025-03-31 07:55:48 +01:00
Florian Hahn	6b98134466	[VPlan] Re-enable narrowing interleave groups with interleaving. Remove the UF = 1 restriction introduced by 577631f0a5 building on top of 783a846507683, which allows updating all relevant users of the VF, VPScalarIVSteps in particular. This restores the full functionality of https://github.com/llvm/llvm-project/pull/106441.	2025-03-29 20:14:10 +00:00
Florian Hahn	783a846507	[VPlan] Add VF as operand to VPScalarIVStepsRecipe. Similarly to other recipes, update VPScalarIVStepsRecipe to also take the runtime VF as argument. This removes some unnecessary runtime VF computations for scalable vectors. It will also allow dropping the UF == 1 restriction for narrowing interleave groups required in 577631f0a528.	2025-03-28 21:48:59 +00:00
David Green	70f083f068	[LV][AArch64] Test cleanup of low_trip_count_predicates.ll. NFC Post commit cleanup from #132170	2025-03-28 19:31:37 +00:00
Hari Limaye	bf5627c85e	[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828 ) Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the narrowest type necessary, when the trip-count of a loop is known to be constant and the only use of the recipe is the condition used by the vector loop's backedge branch.	2025-03-28 14:47:40 +00:00
Pengcheng Wang	f5f4da6db6	[RISCV] Don't vectorize for loops with small trip count (#132176 ) Inspired by https://reviews.llvm.org/D130755. I don't know the logic behind the value 5, it is copied from AArch64. For some tests, I have to change the trip count so that we don't break what they are testing.	2025-03-28 15:51:29 +08:00
Florian Hahn	5c26e80e57	[LV] Make cost model tests independent of VPValue numbers. Update tests to not rely on hard-coded VPValue numbers.	2025-03-27 21:15:32 +00:00
Florian Hahn	8ddbc01295	[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690 ) Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 PR: https://github.com/llvm/llvm-project/pull/132690	2025-03-27 18:34:13 +00:00
Ryotaro Kasuga	6c56a842b7	[clang][CodeGen] Generate follow-up metadata for loops in correct format (#131985 ) When pragma of loop transformations is specified, follow-up metadata for loops is generated after each transformation. On the LLVM side, follow-up metadata is expected to be a list of properties, such as the following: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{"llvm.loop.isvectorized"} ``` However, on the clang side, the generated metadata contains an MDNode that has those properties, as shown below: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !loop_id} !loop_id = distinct !{!loop_id, !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{"llvm.loop.isvectorized"} ``` According to the [LangRef](https://llvm.org/docs/TransformMetadata.html#transformation-metadata-structure), the LLVM side is correct. Due to this inconsistency, follow-up metadata was not interpreted correctly, e.g., only one transformation is applied when multiple pragmas are used. This patch fixes clang side to emit followup metadata in correct format.	2025-03-27 20:29:37 +09:00
Florian Hahn	2c7d40b2f0	[VPlan] Generalize SCALAR-STEPS removal to any unroll factor. Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled to handle any unrolled VPlan, which means there's a single UF, but it will be > 1 if unrolling took place.	2025-03-26 21:03:50 +00:00
David Green	de1c2f24bc	[LoopVectorizer][AArch64] Move getMinTripCountTailFoldingThreshold later. (#132170 ) This moves the checks of MinTripCountTailFoldingThreshold later, during the calculation of whether to tail fold. This allows it to check beforehand whether tail predication is required, either for scalable or fixed-width vectors. This option is only specified for AArch64, where it returns the minimum of 5. This patch aims to allow the vectorization of TC=4 loops, preventing them from performing slower when SVE is present.	2025-03-26 19:35:08 +00:00
David Sherwood	1c9fe8c8af	[LV] Optimise users of induction variables in early exit blocks (#130766 ) This is the second of two PRs that attempt to improve the IR generated in the exit blocks of vectorised loops with uncountable early exits. It follows on from PR #128880. In this PR I am improving the generated code for users of induction variables in early exit blocks. This required using a newly added VPInstruction called FirstActiveLane, which calculates the index of the first active predicate in the mask operand. I have added a new function optimizeEarlyExitInductionUser that is called from optimizeInductionExitUsers when handling users in early exit blocks.	2025-03-26 12:09:59 +00:00
Florian Hahn	420c056f85	[VPlan] Add ComputeFindLastIVResult opcode (NFC). (#132689 ) This moves the logic for computing the FindLastIV reduction result to its own opcode. A follow-up patch will update the new opcode to also take the start value, to fix https://github.com/llvm/llvm-project/issues/126836. PR: https://github.com/llvm/llvm-project/pull/132689	2025-03-26 10:49:09 +00:00
Florian Hahn	577631f0a5	Reapply "[VPlan] Add transformation to narrow interleave groups. (#106441 )" This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf. The recommitted version limits the transform to cases where no interleaving is taking place, to avoid a mis-compile when interleaving. Original commit message: This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-25 20:57:10 +00:00
Florian Hahn	dfca6c0d3b	[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655 ) After unrolling, there may be additional simplifications that can be applied. One example is removing SCALAR-STEPS for the first part where only the first lane is demanded. This removes redundant adds of 0 from a large number of tests (~200), many which I am still working on updating. In preparation for removing redundant WideIV steps added in https://github.com/llvm/llvm-project/pull/119284. PR: https://github.com/llvm/llvm-project/pull/123655	2025-03-25 12:57:24 +00:00
Florian Hahn	9c7e38896f	[VPlan] Split off reduction printing tests, add find-last-IV test. Splits off reduction printing tests, to limit growth and add test case for printing find-last-IV (https://github.com/llvm/llvm-project/pull/132689)	2025-03-25 10:06:28 +00:00
Luke Lau	6a8606e99e	[VPlan] Only store RecurKind + FastMathFlags in VPReductionRecipe. NFCI (#131300 ) VPReductionRecipes take a RecurrenceDescriptor, but only use the RecurKind and FastMathFlags in it when executing. This patch makes the recipe more lightweight by stripping it to only take the latter two. The motivation for this is to simplify an upcoming patch to support in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to create an Or reduction, and by using RecurKind we can create an arbitrary reduction without needing a full RecurrenceDescriptor.	2025-03-24 19:18:54 +08:00
Florian Hahn	06fd10f1da	[VPlan] Don't create ExtractElement recipes for scalar plans. (#131604 ) ExtractElements are no-ops for scalar VPlans. Don't introduce them in handleUncountableEarlyExit if the plan has only a scalar VF. This fixes a crash trying to compute the cost of ExtractElement after 26ecf978951b79. PR: https://github.com/llvm/llvm-project/pull/131604	2025-03-23 22:00:02 +00:00
Martin Storsjö	ff3e2ba9eb	Revert "[VPlan] Add transformation to narrow interleave groups. (#106441 )" This reverts commit dfa665f19c52d98b8d833a8e9073427ba5641b19. This commit caused miscompilations in ffmpeg, see https://github.com/llvm/llvm-project/pull/106441 for details.	2025-03-23 23:27:39 +02:00
Florian Hahn	c482b8faea	[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC). Instead of executing the whole entry VPIRBB twice, first only execute the VPExpandSCEVRecipes and replace their uses with the expanded VPValue, which will be a live-in. This allows removing special logic in VPExpandSCEVRecipe to support executing twice and allows moving the ExpandedSCEVs map out of VPTransformState. It will also allow adding other recipes to the entry VPBB in the future.	2025-03-23 09:06:01 +00:00
Florian Hahn	dfa665f19c	[VPlan] Add transformation to narrow interleave groups. (#106441 ) This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-22 21:40:17 +00:00
Florian Hahn	0d3ba087f7	[LV] Move IV bypass value creation out of ILV (NFC) createInductionAdditionalBypassValues is only used for epilogue vectorization now. Move it out of ILV, which means we do not have to thread through ExpandedSCEVs and also don't have to track the bypass values in ILV. Instead, directly create them if needed after executing the epilogue plan. This moves more of the epilogue-specific logic out of the generic executePlan.	2025-03-22 20:36:45 +00:00
Florian Hahn	2186199d08	[LV] Add test showing missing debug location on VPScalarIVStepsRecipe.	2025-03-22 14:38:14 +00:00
Florian Hahn	2f2100c879	[LV] Add additional tests for #106441 . Further increase test coverage for https://github.com/llvm/llvm-project/pull/106441 Also regenerate checks with -filter-out-after.	2025-03-22 10:07:11 +00:00
David Sherwood	4e69258bf3	[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565 ) At the moment if we decide to enable tail-folding we do not include the cost of generating the mask per VF. This can mean we make some poor choices of VF, which is definitely true for SVE-enabled AArch64 targets where mask generation for fixed-width vectors is more expensive than for scalable vectors. I've added a VPInstruction::computeCost function to return the costs of the ActiveLaneMask and ExplicitVectorLength operations. Unfortunately, in order to prevent asserts firing I've also had to duplicate the same code in the legacy cost model to make sure the chosen VFs match up. I've wrapped this up in an ifndef NDEBUG for now. The alternative would be to disable the assert completely when tail-folding, which I imagine is just as bad. New tests added: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll Transforms/LoopVectorize/RISCV/tail-folding-cost.ll	2025-03-21 09:24:56 +00:00
Florian Hahn	c73ad7ba20	[VPlan] Add transformation to narrow interleave groups. This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. For now it only transforms load interleave groups feeding store groups. Depends on #106431. This lands the main parts of the approved https://github.com/llvm/llvm-project/pull/106441 as suggested to break things up a bit more.	2025-03-20 19:41:37 +00:00
Florian Hahn	2e13ec561c	[VPlan] Bail out on non-intrinsic calls in VPlanNativePath. Update initial VPlan-construction in VPlanNativePath in line with the inner loop path, in that it bails out when encountering constructs it cannot handle, like non-intrinsic calls. Fixes https://github.com/llvm/llvm-project/issues/131071.	2025-03-19 21:35:15 +00:00
Florian Hahn	11b8699572	[LV] Don't skip instrs with side-effects in reg pressure computation. (#126415 ) calculateRegisterUsage adds end points for each user of an instruction to Ends and ignores instructions not added to it, i.e. instructions with no users. This means things like stores aren't included, which in turn means values that are only used in stores are also not included for consideration. As a result, we underestimate the register usage in cases where the only users are things like stores. Update the code to not skip instructions without users (i.e. not in Ends) if they have side-effects. PR: https://github.com/llvm/llvm-project/pull/126415	2025-03-19 15:13:43 +00:00
Florian Hahn	3c554deaaa	[LV] Add reg-usage test with values only used by llvm.assume. Add test checking we are not counting registers that are only used by ephemeral users, like llvm.assume.	2025-03-19 12:17:50 +00:00
Florian Hahn	870f753f1f	[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC). Also include VPlan's BTC in the set of VPValues to materialize broadcasts for, if it is used.	2025-03-18 22:35:18 +00:00
Florian Hahn	1442fe0c89	[LV] Update test to use dereferenceable attribute instead of assumption. Use dereferenceable attribute instead of assumption to make the tests independent of https://github.com/llvm/llvm-project/pull/128061.	2025-03-18 20:28:28 +00:00
Luke Lau	a4dc02c0e7	[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086 ) After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.	2025-03-19 00:09:15 +08:00
David Sherwood	f6b1b91a3d	[LV][NFC] Regenerate CHECK lines in some tests (#131799 ) Regenerates CHECK lines in tests that are affected by PR #130565 to aid reviews.	2025-03-18 14:38:01 +00:00
David Sherwood	2586e7fcd8	[LV][NFC] Tidy up partial reduction tests with filter-out-after option (#129047 ) A few test files seemed to have been edited after using the update_test_checks.py script, which can make life hard for developers when trying to update these tests in future patches. Also, the tests still had this comment at the top ; NOTE: Assertions have been autogenerated by ... which could potentially be confusing, since they've not strictly been auto-generated. I've attempted to keep the spirit of the original tests by excluding all CHECK lines after the scalar.ph IR block, however I've done this by using a new option called --filter-out-after to the update_test_checks.py script.	2025-03-18 11:39:55 +00:00

3019 Commits