llvm-project

Author	SHA1	Message	Date
Ramkumar Ramachandra	c20bea09c2	[LV] Regen a test with UTC (#133432 )	2025-03-31 16:24:45 +01:00
David Sherwood	f4d25c498a	[LV][NFC] Regenerate some SVE tests using --filter-out-after option (#132174 ) I recently added a new option to update_test_checks.py that can filter out all CHECK lines after a certain point. We usually don't care about checking for the original scalar loop after the vector loop because it doesn't change. Cutting out unnecessary CHECK lines makes the files smaller and hopefully the tests run quicker.	2025-03-31 12:40:41 +01:00
Florian Hahn	809f857d2c	[VPlan] Support early-exit loops in optimizeForVFAndUF. (#131539 ) Update optimizeForVFAndUF to support early-exit loops by handling BranchOnCond(Or(..., CanonicalIV == TripCount)) via SCEV PR: https://github.com/llvm/llvm-project/pull/131539	2025-03-31 07:55:48 +01:00
Florian Hahn	6b98134466	[VPlan] Re-enable narrowing interleave groups with interleaving. Remove the UF = 1 restriction introduced by 577631f0a5 building on top of 783a846507683, which allows updating all relevant users of the VF, VPScalarIVSteps in particular. This restores the full functionality of https://github.com/llvm/llvm-project/pull/106441.	2025-03-29 20:14:10 +00:00
Florian Hahn	783a846507	[VPlan] Add VF as operand to VPScalarIVStepsRecipe. Similarly to other recipes, update VPScalarIVStepsRecipe to also take the runtime VF as argument. This removes some unnecessary runtime VF computations for scalable vectors. It will also allow dropping the UF == 1 restriction for narrowing interleave groups required in 577631f0a528.	2025-03-28 21:48:59 +00:00
David Green	70f083f068	[LV][AArch64] Test cleanup of low_trip_count_predicates.ll. NFC Post commit cleanup from #132170	2025-03-28 19:31:37 +00:00
Hari Limaye	bf5627c85e	[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828 ) Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the narrowest type necessary, when the trip-count of a loop is known to be constant and the only use of the recipe is the condition used by the vector loop's backedge branch.	2025-03-28 14:47:40 +00:00
Pengcheng Wang	f5f4da6db6	[RISCV] Don't vectorize for loops with small trip count (#132176 ) Inspired by https://reviews.llvm.org/D130755. I don't know the logic behind the value 5, it is copied from AArch64. For some tests, I have to change the trip count so that we don't break what they are testing.	2025-03-28 15:51:29 +08:00
Florian Hahn	5c26e80e57	[LV] Make cost model tests independent of VPValue numbers. Update tests to not rely on hard-coded VPValue numbers.	2025-03-27 21:15:32 +00:00
Florian Hahn	8ddbc01295	[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690 ) Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 PR: https://github.com/llvm/llvm-project/pull/132690	2025-03-27 18:34:13 +00:00
Ryotaro Kasuga	6c56a842b7	[clang][CodeGen] Generate follow-up metadata for loops in correct format (#131985 ) When pragma of loop transformations is specified, follow-up metadata for loops is generated after each transformation. On the LLVM side, follow-up metadata is expected to be a list of properties, such as the following: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{"llvm.loop.isvectorized"} ``` However, on the clang side, the generated metadata contains an MDNode that has those properties, as shown below: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !loop_id} !loop_id = distinct !{!loop_id, !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{"llvm.loop.isvectorized"} ``` According to the [LangRef](https://llvm.org/docs/TransformMetadata.html#transformation-metadata-structure), the LLVM side is correct. Due to this inconsistency, follow-up metadata was not interpreted correctly, e.g., only one transformation is applied when multiple pragmas are used. This patch fixes clang side to emit followup metadata in correct format.	2025-03-27 20:29:37 +09:00
Florian Hahn	2c7d40b2f0	[VPlan] Generalize SCALAR-STEPS removal to any unroll factor. Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled to handle any unrolled VPlan, which means there's a single UF, but it will be > 1 if unrolling took place.	2025-03-26 21:03:50 +00:00
David Green	de1c2f24bc	[LoopVectorizer][AArch64] Move getMinTripCountTailFoldingThreshold later. (#132170 ) This moves the checks of MinTripCountTailFoldingThreshold later, during the calculation of whether to tail fold. This allows it to check beforehand whether tail predication is required, either for scalable or fixed-width vectors. This option is only specified for AArch64, where it returns the minimum of 5. This patch aims to allow the vectorization of TC=4 loops, preventing them from performing slower when SVE is present.	2025-03-26 19:35:08 +00:00
David Sherwood	1c9fe8c8af	[LV] Optimise users of induction variables in early exit blocks (#130766 ) This is the second of two PRs that attempt to improve the IR generated in the exit blocks of vectorised loops with uncountable early exits. It follows on from PR #128880. In this PR I am improving the generated code for users of induction variables in early exit blocks. This required using a newly added VPInstruction called FirstActiveLane, which calculates the index of the first active predicate in the mask operand. I have added a new function optimizeEarlyExitInductionUser that is called from optimizeInductionExitUsers when handling users in early exit blocks.	2025-03-26 12:09:59 +00:00
Florian Hahn	420c056f85	[VPlan] Add ComputeFindLastIVResult opcode (NFC). (#132689 ) This moves the logic for computing the FindLastIV reduction result to its own opcode. A follow-up patch will update the new opcode to also take the start value, to fix https://github.com/llvm/llvm-project/issues/126836. PR: https://github.com/llvm/llvm-project/pull/132689	2025-03-26 10:49:09 +00:00
Florian Hahn	577631f0a5	Reapply "[VPlan] Add transformation to narrow interleave groups. (#106441 )" This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf. The recommitted version limits the transform to cases where no interleaving is taking place, to avoid a mis-compile when interleaving. Original commit message: This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-25 20:57:10 +00:00
Florian Hahn	dfca6c0d3b	[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655 ) After unrolling, there may be additional simplifications that can be applied. One example is removing SCALAR-STEPS for the first part where only the first lane is demanded. This removes redundant adds of 0 from a large number of tests (~200), many of which I am still working on updating. In preparation for removing redundant WideIV steps added in https://github.com/llvm/llvm-project/pull/119284. PR: https://github.com/llvm/llvm-project/pull/123655	2025-03-25 12:57:24 +00:00
Florian Hahn	9c7e38896f	[VPlan] Split off reduction printing tests, add find-last-IV test. Splits off reduction printing tests, to limit growth and add test case for printing find-last-IV (https://github.com/llvm/llvm-project/pull/132689)	2025-03-25 10:06:28 +00:00
Luke Lau	6a8606e99e	[VPlan] Only store RecurKind + FastMathFlags in VPReductionRecipe. NFCI (#131300 ) VPReductionRecipes take a RecurrenceDescriptor, but only use the RecurKind and FastMathFlags in it when executing. This patch makes the recipe more lightweight by stripping it to only take the latter two. The motivation for this is to simplify an upcoming patch to support in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to create an Or reduction, and by using RecurKind we can create an arbitrary reduction without needing a full RecurrenceDescriptor.	2025-03-24 19:18:54 +08:00
Florian Hahn	06fd10f1da	[VPlan] Don't create ExtractElement recipes for scalar plans. (#131604 ) ExtractElements are no-ops for scalar VPlans. Don't introduce them in handleUncountableEarlyExit if the plan has only a scalar VF. This fixes a crash trying to compute the cost of ExtractElement after 26ecf978951b79. PR: https://github.com/llvm/llvm-project/pull/131604	2025-03-23 22:00:02 +00:00
Martin Storsjö	ff3e2ba9eb	Revert "[VPlan] Add transformation to narrow interleave groups. (#106441 )" This reverts commit dfa665f19c52d98b8d833a8e9073427ba5641b19. This commit caused miscompilations in ffmpeg, see https://github.com/llvm/llvm-project/pull/106441 for details.	2025-03-23 23:27:39 +02:00
Florian Hahn	c482b8faea	[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC). Instead of executing the whole entry VPIRBB twice, first only execute the VPExpandSCEVRecipes and replace their uses with the expanded VPValue, which will be a live-in. This allows removing special logic in VPExpandSCEVRecipe to support executing twice and allows moving the ExpandedSCEVs map out of VPTransformState. It will also allow adding other recipes to the entry VPBB in the future.	2025-03-23 09:06:01 +00:00
Florian Hahn	dfa665f19c	[VPlan] Add transformation to narrow interleave groups. (#106441 ) This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-22 21:40:17 +00:00
Florian Hahn	0d3ba087f7	[LV] Move IV bypass value creation out of ILV (NFC) createInductionAdditionalBypassValues is only used for epilogue vectorization now. Move it out of ILV, which means we do not have to thread through ExpandedSCEVs and also don't have to track the bypass values in ILV. Instead, directly create them if needed after executing the epilogue plan. This moves more of the epilogue-specific logic out of the generic executePlan.	2025-03-22 20:36:45 +00:00
Florian Hahn	2186199d08	[LV] Add test showing missing debug location on VPScalarIVStepsRecipe.	2025-03-22 14:38:14 +00:00
Florian Hahn	2f2100c879	[LV] Add additional tests for #106441 . Further increase test coverage for https://github.com/llvm/llvm-project/pull/106441 Also regenerate checks with -filter-out-after.	2025-03-22 10:07:11 +00:00
David Sherwood	4e69258bf3	[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565 ) At the moment if we decide to enable tail-folding we do not include the cost of generating the mask per VF. This can mean we make some poor choices of VF, which is definitely true for SVE-enabled AArch64 targets where mask generation for fixed-width vectors is more expensive than for scalable vectors. I've added a VPInstruction::computeCost function to return the costs of the ActiveLaneMask and ExplicitVectorLength operations. Unfortunately, in order to prevent asserts firing I've also had to duplicate the same code in the legacy cost model to make sure the chosen VFs match up. I've wrapped this up in an ifndef NDEBUG for now. The alternative would be to disable the assert completely when tail-folding, which I imagine is just as bad. New tests added: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll Transforms/LoopVectorize/RISCV/tail-folding-cost.ll	2025-03-21 09:24:56 +00:00
Florian Hahn	c73ad7ba20	[VPlan] Add transformation to narrow interleave groups. This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. For now it only transforms load interleave groups feeding store groups. Depends on #106431. This lands the main parts of the approved https://github.com/llvm/llvm-project/pull/106441 as suggested to break things up a bit more.	2025-03-20 19:41:37 +00:00
Florian Hahn	2e13ec561c	[VPlan] Bail out on non-intrinsic calls in VPlanNativePath. Update initial VPlan-construction in VPlanNativePath in line with the inner loop path, in that it bails out when encountering constructs it cannot handle, like non-intrinsic calls. Fixes https://github.com/llvm/llvm-project/issues/131071.	2025-03-19 21:35:15 +00:00
Florian Hahn	11b8699572	[LV] Don't skip instrs with side-effects in reg pressure computation. (#126415 ) calculateRegisterUsage adds end points for each user of an instruction to Ends and ignores instructions not added to it, i.e. instructions with no users. This means things like stores aren't included, which in turn means values that are only used in stores are also not included for consideration. This means we underestimate the register usage in cases where the only users are things like stores. Update the code to not skip instructions without users (i.e. not in Ends) if they have side-effects. PR: https://github.com/llvm/llvm-project/pull/126415	2025-03-19 15:13:43 +00:00
Florian Hahn	3c554deaaa	[LV] Add reg-usage test with values only used by llvm.assume. Add test checking we are not counting registers that are only used by ephemeral users, like llvm.assume.	2025-03-19 12:17:50 +00:00
Florian Hahn	870f753f1f	[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC). Also include VPlan's BTC in the set of VPValues to materialize broadcasts for, if it is used.	2025-03-18 22:35:18 +00:00
Florian Hahn	1442fe0c89	[LV] Update test to use dereferenceable attribute instead of assumption. Use dereferenceable attribute instead of assumption to make the tests independent of https://github.com/llvm/llvm-project/pull/128061.	2025-03-18 20:28:28 +00:00
Luke Lau	a4dc02c0e7	[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086 ) After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.	2025-03-19 00:09:15 +08:00
David Sherwood	f6b1b91a3d	[LV][NFC] Regenerate CHECK lines in some tests (#131799 ) Regenerates CHECK lines in tests that are affected by PR #130565 to aid reviews.	2025-03-18 14:38:01 +00:00
David Sherwood	2586e7fcd8	[LV][NFC] Tidy up partial reduction tests with filter-out-after option (#129047 ) A few test files seemed to have been edited after using the update_test_checks.py script, which can make life hard for developers when trying to update these tests in future patches. Also, the tests still had this comment at the top ; NOTE: Assertions have been autogenerated by ... which could potentially be confusing, since they've not strictly been auto-generated. I've attempted to keep the spirit of the original tests by excluding all CHECK lines after the scalar.ph IR block, however I've done this by using a new option called --filter-out-after to the update_test_checks.py script.	2025-03-18 11:39:55 +00:00
Mel Chen	489d1e764e	[LV][NFC] Pre-commit test for supporting strided accesses. (#130563 ) Duplicate riscv-vector-reverse.ll as riscv-vector-reverse-output.ll to verify all generated IR, not just debug output. Pre-commit for #128718.	2025-03-18 16:08:42 +08:00
Florian Hahn	166937b49d	[LV] Cleanup after expanding SCEV predicate to constant. In some cases, SCEV isn't able to prove that no wrap checks are needed, while constant folding in SCEVExpander can. In those cases, we may leave around IR for computing the trip count, which is unused at this point but may be re-used later, triggering an assertion when trying to clean up SCEVExp after vectorization. Directly run the cleaner after expanding to a constant predicate to prevent any generated code from being re-used. Fixes https://github.com/llvm/llvm-project/issues/131281.	2025-03-17 21:26:51 +00:00
Luke Lau	eef5ea0c42	[VPlan] Account for dead FOR splice simplification in cost model (#131486 ) Fixes #131359 After #129645, a first-order recurrence will no longer have its splice costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and is dead. The legacy cost model didn't account for this, so this accounts for it in planContainsAdditionalSimplifications to avoid the "VPlan cost model and legacy cost model disagreed" assertion.	2025-03-18 00:00:54 +08:00
Ryotaro Kasuga	e24e523150	[LoopVectorize] Add test for follow-up metadata for loops (NFC) (#131337 ) When pragmas for loop transformations are encoded in LLVM IR, follow-up metadata is used if multiple transformations are specified. It is used to explicitly express the order of the transformations. However, it is not properly processed by each transformation pass, so currently only the first transformation is attempted. This is a pre-commit to add a test that reproduces the problem. ref: https://github.com/llvm/llvm-project/pull/127474#issuecomment-2717790398	2025-03-17 13:45:09 +09:00
Florian Hahn	40b7034213	[LV] Add tests for vector backedge elimination with early-exit loops.	2025-03-16 19:42:30 +00:00
Florian Hahn	ee29e16135	[LV] Reorganize tests for narrowing interleave group transform. Make test target-dependent, as they will require access to a concrete vector register width. Also add new tests for cost modeling, unrolling and removing the vector loop region.	2025-03-16 19:18:47 +00:00
Florian Hahn	4e9894498e	[VPlan] Truncate VFxUF if needed in VPWidenPointerInduction::execute. Create truncate if needed after 56b05a0d6. Note that this preserves the original behavior pre 56b05a0d6. If truncate would strip any set bits, then the explicit computation in the narrower type would wrap.	2025-03-16 11:37:58 +00:00
Florian Hahn	6a8d5f22ff	[VPlan] Don't access canonical IV in VPWidenPointerInduction::execute. This updates VPWidenPointerInductionRecipe::execute to not use the canonical IV to determine the insert point. Instead, it relies on the current recipe position. In cases where this is not sufficient, set the insert point to the first non-phi instruction, to ensure phis are created together.	2025-03-15 21:32:48 +00:00
Florian Hahn	aadfa9f6c8	[LV] Add additional tests for narrowing interleave groups. Extend test coverage for https://github.com/llvm/llvm-project/pull/106441.	2025-03-15 21:13:49 +00:00
Florian Hahn	37a57ca257	[FMF] Set all bits if needed when setting individual flags. (#131321 ) Currently fast() won't return true if all flags are set via setXXX, which is surprising. Update setters to set all bits if needed to make sure isFast() consistently returns the expected result. PR: https://github.com/llvm/llvm-project/pull/131321	2025-03-15 18:46:26 +00:00
Florian Hahn	56b05a0d6b	[VPlan] Use VFxUF in VPWidenPointerInductionRecipe. Use VFxUF VPValue instead of computing VF * UF explicitly.	2025-03-15 18:18:53 +00:00
Jeremy Morse	792a6f8119	[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298 ) These date back to when the non-intrinsic format of variable locations was still being tested and was behind a compile-time flag, so not all builds / bots would correctly run them. The solution at the time, to get at least some test coverage, was to have tests opt-in to non-intrinsic debug-info if it was built into LLVM. Nowadays, non-intrinsic format is the default and has been on for more than a year, there's no need for this flag to exist. (I've downgraded the flag from "try" to explicitly requesting non-intrinsic format in some places, so that we can deal with tests that are explicitly about non-intrinsic format in their own commit).	2025-03-14 15:50:49 +00:00
David Sherwood	3b6d0093aa	[LV][NFC] Refactor code for extracting first active element (#131118 ) Refactor the code to extract the first active element of a vector in the early exit block, in preparation for PR #130766. I've replaced the VPInstruction::ExtractFirstActive nodes with a combination of a new VPInstruction::FirstActiveLane node and an Instruction::ExtractElement node.	2025-03-14 11:14:09 +00:00
Luke Lau	26324bc1bf	[VPlan] Move FOR splice cost into VPInstruction::FirstOrderRecurrenceSplice (#129645 ) After #124093 we now support fixed-order recurrences with EVL tail folding by replacing VPInstruction::FirstOrderRecurrenceSplice with a VP splice intrinsic. However the costing for the splice is currently done in VPFirstOrderRecurrencePHIRecipe, so when we add the VP splice intrinsic we end up costing it twice. This fixes it by splitting out the cost for the splice into FirstOrderRecurrenceSplice so that it's not duplicated when we replace it. We still have to keep the VF=1 checks in VPFirstOrderRecurrencePHIRecipe since the splice might end up dead and discarded, e.g. in the test @pr97452_scalable_vf1_for.	2025-03-14 15:33:32 +08:00
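
The behaviour described in 37a57ca257 ("[FMF] Set all bits if needed when setting individual flags") can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `MiniFlags`, its three flag bits, and the "all bits set means fast" encoding are hypothetical stand-ins chosen for brevity, not LLVM's actual FastMathFlags implementation, and the real patch may differ in detail.

```cpp
// Hypothetical, simplified flag set (NOT LLVM's FastMathFlags).
// Assumption: "fast" is encoded as every bit of the word being set,
// so setting only the named bits one by one would not count as fast
// unless the setters promote the value to the all-ones pattern.
class MiniFlags {
  static constexpr unsigned NoNaNs = 1u << 0;
  static constexpr unsigned NoInfs = 1u << 1;
  static constexpr unsigned Reassoc = 1u << 2;
  static constexpr unsigned AllNamed = NoNaNs | NoInfs | Reassoc;

  unsigned Flags = 0;

  void setBit(unsigned Bit, bool On) {
    if (!On) {
      // Drop any promoted (unnamed) bits, then clear the requested one.
      Flags = (Flags & AllNamed) & ~Bit;
      return;
    }
    Flags |= Bit;
    // Once every named flag is set, promote to the all-ones "fast" value.
    if ((Flags & AllNamed) == AllNamed)
      Flags = ~0u;
  }

public:
  void setNoNaNs(bool On = true) { setBit(NoNaNs, On); }
  void setNoInfs(bool On = true) { setBit(NoInfs, On); }
  void setReassoc(bool On = true) { setBit(Reassoc, On); }
  void setFast() { Flags = ~0u; }

  bool isFast() const { return Flags == ~0u; }
};

// Returns true when setting every flag individually yields a fast set.
inline bool individualSettersGiveFast() {
  MiniFlags F;
  F.setNoNaNs();
  F.setNoInfs();
  F.setReassoc();
  return F.isFast();
}
```

With the promotion step in `setBit`, a flag set built from individual setters compares equal to one built with `setFast()`. Without it, `isFast()` would stay false even though every named flag is on, which is the surprise the commit message describes.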

3005 Commits