3005 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
c20bea09c2
[LV] Regen a test with UTC (#133432) 2025-03-31 16:24:45 +01:00
David Sherwood
f4d25c498a
[LV][NFC] Regenerate some SVE tests using --filter-out-after option (#132174)
I recently added a new option to update_test_checks.py that can
filter out all CHECK lines after a certain point. We usually don't
care about checking for the original scalar loop after the vector
loop because it doesn't change. Cutting out unnecessary CHECK
lines makes the files smaller and hopefully the tests run quicker.
2025-03-31 12:40:41 +01:00
Florian Hahn
809f857d2c
[VPlan] Support early-exit loops in optimizeForVFAndUF. (#131539)
Update optimizeForVFAndUF to support early-exit loops by handling
BranchOnCond(Or(..., CanonicalIV == TripCount)) via SCEV

PR: https://github.com/llvm/llvm-project/pull/131539
2025-03-31 07:55:48 +01:00
Florian Hahn
6b98134466
[VPlan] Re-enable narrowing interleave groups with interleaving.
Remove the UF = 1 restriction introduced by 577631f0a5 building on top
of 783a846507683, which allows updating all relevant users of the VF,
VPScalarIVSteps in particular.

This restores the full functionality of
https://github.com/llvm/llvm-project/pull/106441.
2025-03-29 20:14:10 +00:00
Florian Hahn
783a846507
[VPlan] Add VF as operand to VPScalarIVStepsRecipe.
Similarly to other recipes, update VPScalarIVStepsRecipe to also take
the runtime VF as argument. This removes some unnecessary runtime VF
computations for scalable vectors. It will also allow dropping the
UF == 1 restriction for narrowing interleave groups required in
577631f0a528.
2025-03-28 21:48:59 +00:00
David Green
70f083f068 [LV][AArch64] Test cleanup of low_trip_count_predicates.ll. NFC
Post commit cleanup from #132170
2025-03-28 19:31:37 +00:00
Hari Limaye
bf5627c85e
[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828)
Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the
narrowest type necessary, when the trip-count of a loop is known to be
constant and the only use of the recipe is the condition used by the
vector loop's backedge branch.
2025-03-28 14:47:40 +00:00
Pengcheng Wang
f5f4da6db6
[RISCV] Don't vectorize for loops with small trip count (#132176)
Inspired by https://reviews.llvm.org/D130755.

I don't know the logic behind the value 5, it is copied from AArch64.

For some tests, I have to change the trip count so that we don't
break what they are testing.
2025-03-28 15:51:29 +08:00
Florian Hahn
5c26e80e57
[LV] Make cost model tests independent of VPValue numbers.
Update tests to not rely on hard-coded VPValue numbers.
2025-03-27 21:15:32 +00:00
Florian Hahn
8ddbc01295
[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690)
Keep the start value as operand of ComputeFindLastIVResult. A follow-up
patch will use this to make sure the start value is frozen if needed.

Depends on https://github.com/llvm/llvm-project/pull/132689

PR: https://github.com/llvm/llvm-project/pull/132690
2025-03-27 18:34:13 +00:00
Ryotaro Kasuga
6c56a842b7
[clang][CodeGen] Generate follow-up metadata for loops in correct format (#131985)
When pragma of loop transformations is specified, follow-up metadata for
loops is generated after each transformation. On the LLVM side,
follow-up metadata is expected to be a list of properties, such as the
following:

```
!followup = !{!"llvm.loop.vectorize.followup_all", !mp, !isvectorized}
!mp = !{!"llvm.loop.mustprogress"}
!isvectorized = !{"llvm.loop.isvectorized"}
```

However, on the clang side, the generated metadata contains an MDNode
that has those properties, as shown below:

```
!followup = !{!"llvm.loop.vectorize.followup_all", !loop_id}
!loop_id = distinct !{!loop_id, !mp, !isvectorized}
!mp = !{!"llvm.loop.mustprogress"}
!isvectorized = !{"llvm.loop.isvectorized"}
```
According to the
[LangRef](https://llvm.org/docs/TransformMetadata.html#transformation-metadata-structure),
the LLVM side is correct. Due to this inconsistency, follow-up metadata
was not interpreted correctly, e.g., only one transformation is applied
when multiple pragmas are used.

This patch fixes clang side to emit followup metadata in correct format.
2025-03-27 20:29:37 +09:00
Florian Hahn
2c7d40b2f0
[VPlan] Generalize SCALAR-STEPS removal to any unroll factor.
Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled handle any unrolled
VPlan, which means there's a single UF, but it will be > 1 if unrolling
took place.
2025-03-26 21:03:50 +00:00
David Green
de1c2f24bc
[LoopVectorizer][AArch64] Move getMinTripCountTailFoldingThreshold later. (#132170)
This moves the checks of MinTripCountTailFoldingThreshold later, during the
calculation of whether to tail fold. This allows it to check beforehand whether
tail predication is required, either for scalable or fixed-width vectors.

This option is only specified for AArch64, where it returns the minimum of 5.
This patch aims to allow the vectorization of TC=4 loops, preventing them from
performing slower when SVE is present.
2025-03-26 19:35:08 +00:00
David Sherwood
1c9fe8c8af
[LV] Optimise users of induction variables in early exit blocks (#130766)
This is the second of two PRs that attempts to improve the IR
generated in the exit blocks of vectorised loops with uncountable
early exits. It follows on from PR #128880. In this PR I am
improving the generated code for users of induction variables in
early exit blocks.

This required using a newly add VPInstruction called
FirstActiveLane, which calculates the index of the first active
predicate in the mask operand.

I have added a new function optimizeEarlyExitInductionUser that
is called from optimizeInductionExitUsers when handling users in
early exit blocks.
2025-03-26 12:09:59 +00:00
Florian Hahn
420c056f85
[VPlan] Add ComputeFindLastIVResult opcode (NFC). (#132689)
This moves the logic for computing the FindLastIV reduction result to
its own opcode. A follow-up patch will update the new opcode to also
take the start value, to fix
https://github.com/llvm/llvm-project/issues/126836.

PR: https://github.com/llvm/llvm-project/pull/132689
2025-03-26 10:49:09 +00:00
Florian Hahn
577631f0a5
Reapply "[VPlan] Add transformation to narrow interleave groups. (#106441)"
This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf.

The recommmitted version limits to transform to cases where no
interleaving is taking place, to avoid a mis-compile when interleaving.

Original commit message:

This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.

Depends on https://github.com/llvm/llvm-project/pull/106431.

Fixes https://github.com/llvm/llvm-project/issues/82936.

PR: https://github.com/llvm/llvm-project/pull/106441
2025-03-25 20:57:10 +00:00
Florian Hahn
dfca6c0d3b
[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655)
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.

This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.

In preparation for removing redundant WideIV steps added in
https://github.com/llvm/llvm-project/pull/119284.

PR: https://github.com/llvm/llvm-project/pull/123655
2025-03-25 12:57:24 +00:00
Florian Hahn
9c7e38896f
[VPlan] Split off reduction printing tests, add find-last-IV test.
Splits off reduction printing tests, to limit growth and add test case
for printing find-last-IV (https://github.com/llvm/llvm-project/pull/132689)
2025-03-25 10:06:28 +00:00
Luke Lau
6a8606e99e
[VPlan] Only store RecurKind + FastMathFlags in VPReductionRecipe. NFCI (#131300)
VPReductionRecipes take a RecurrenceDescriptor, but only use the
RecurKind and FastMathFlags in it when executing. This patch makes the
recipe more lightweight by stripping it to only take the latter two.

The motiviation for this is to simplify an upcoming patch to support
in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to
create an Or reduction, and by using RecurKind we can create an
arbitrary reduction without needing a full RecurrenceDescriptor.
2025-03-24 19:18:54 +08:00
Florian Hahn
06fd10f1da
[VPlan] Don't create ExtractElement recipes for scalar plans. (#131604)
ExtractElements are no-ops for scalar VPlans. Don't introduce them in 
handleUncountableEarlyExit if the plan has only a scalar VF.

This fixes a crash trying to compute the cost of ExtractElement after 26ecf978951b79.

PR: https://github.com/llvm/llvm-project/pull/131604
2025-03-23 22:00:02 +00:00
Martin Storsjö
ff3e2ba9eb Revert "[VPlan] Add transformation to narrow interleave groups. (#106441)"
This reverts commit dfa665f19c52d98b8d833a8e9073427ba5641b19.

This commit caused miscompilations in ffmpeg, see
https://github.com/llvm/llvm-project/pull/106441 for details.
2025-03-23 23:27:39 +02:00
Florian Hahn
c482b8faea
[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC).
Instead of executing the whole entry VPIRBB twice, first only execute
the VPExpandSCEVRecipes and replace their uses with the expanded
VPValue, which will be a live-in. This allows removing special logic in
VPExpandSCEVRecipe to support executing twice and allows moving the
ExpandedSCEVs map out of VPTransformState.

It will also allow adding other recipes to the entry VPBB in the future.
2025-03-23 09:06:01 +00:00
Florian Hahn
dfa665f19c
[VPlan] Add transformation to narrow interleave groups. (#106441)
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.

Depends on https://github.com/llvm/llvm-project/pull/106431.

Fixes https://github.com/llvm/llvm-project/issues/82936.

PR: https://github.com/llvm/llvm-project/pull/106441
2025-03-22 21:40:17 +00:00
Florian Hahn
0d3ba087f7
[LV] Move IV bypass value creation out of ILV (NFC)
createInductionAdditionalBypassValues is only used for epilogue
vectorization now. Move it out of ILV, which means we do not have to
thread through ExpandedSCEVs and also don't have to track the bypass
values in ILV. Instead, directly create them if needed after executing
the epilogue plan. This moves more the epilogue specific logic out of
the generic executePlan.
2025-03-22 20:36:45 +00:00
Florian Hahn
2186199d08
[LV] Add test showing missing debug location on VPScalarIVStepsRecipe. 2025-03-22 14:38:14 +00:00
Florian Hahn
2f2100c879
[LV] Add additional tests for #106441.
Further increase test coverage for
https://github.com/llvm/llvm-project/pull/106441

Also regenerate checks with -filter-out-after.
2025-03-22 10:07:11 +00:00
David Sherwood
4e69258bf3
[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565)
At the moment if we decide to enable tail-folding we do not include
the cost of generating the mask per VF. This can mean we make some
poor choices of VF, which is definitely true for SVE-enabled AArch64
targets where mask generation for fixed-width vectors is more
expensive than for scalable vectors.

I've added a VPInstruction::computeCost function to return the costs
of the ActiveLaneMask and ExplicitVectorLength operations.
Unfortunately, in order to prevent asserts firing I've also had to
duplicate the same code in the legacy cost model to make sure the
chosen VFs match up. I've wrapped this up in a ifndef NDEBUG for
now. The alternative would be to disable the assert completely when
tail-folding, which I imagine is just as bad.

New tests added:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
  Transforms/LoopVectorize/RISCV/tail-folding-cost.ll
2025-03-21 09:24:56 +00:00
Florian Hahn
c73ad7ba20
[VPlan] Add transformation to narrow interleave groups.
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms. For now
it only transforms load interleave groups feeding store groups.

Depends on #106431.

This lands the main parts of the approved
https://github.com/llvm/llvm-project/pull/106441 as suggested to break
things up a bit more.
2025-03-20 19:41:37 +00:00
Florian Hahn
2e13ec561c
[VPlan] Bail out on non-intrinsic calls in VPlanNativePath.
Update initial VPlan-construction in VPlanNativePath in line with the
inner loop path, in that it bails out when encountering constructs it
cannot handle, like non-intrinsic calls.

Fixes https://github.com/llvm/llvm-project/issues/131071.
2025-03-19 21:35:15 +00:00
Florian Hahn
11b8699572
[LV] Don't skip instrs with side-effects in reg pressure computation. (#126415)
calculateRegisterUsage adds end points for each user of an instruction
to Ends and ignores instructions not added to it, i.e. instructions with
no users.

This means things like stores aren't included, which in turn means
values that are only used in stores are also not included for
consideration. This means we underestimate the register usage in cases
where the only users are things like stores.

Update the code to don't skip instructions without users (i.e. not in
Ends) if they have side-effects.

PR: https://github.com/llvm/llvm-project/pull/126415
2025-03-19 15:13:43 +00:00
Florian Hahn
3c554deaaa
[LV] Add reg-usage test with values only used by llvm.assume.
Add test checking we are not counting registers that are only used by
ephemeral users, like llvm.assume.
2025-03-19 12:17:50 +00:00
Florian Hahn
870f753f1f
[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC).
Also include VPlan's BTC in the set of VPValues to materialize
broadcasts for, if it is used.
2025-03-18 22:35:18 +00:00
Florian Hahn
1442fe0c89
[LV] Update test to use dereferenceable attribute instead of assumption.
Use dereferenceable attribute instead of assumption to make the tests
independent of https://github.com/llvm/llvm-project/pull/128061.
2025-03-18 20:28:28 +00:00
Luke Lau
a4dc02c0e7
[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086)
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.

Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.

This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
2025-03-19 00:09:15 +08:00
David Sherwood
f6b1b91a3d
[LV][NFC] Regenerate CHECK lines in some tests (#131799)
Regenerates CHECK lines in tests that are affected by
PR #130565 to aid reviews.
2025-03-18 14:38:01 +00:00
David Sherwood
2586e7fcd8
[LV][NFC] Tidy up partial reduction tests with filter-out-after option (#129047)
A few test files seemed to have been edited after using the
update_test_checks.py script, which can make life hard for
developers when trying to update these tests in future
patches. Also, the tests still had this comment at the top

; NOTE: Assertions have been autogenerated by ...

which could potentially be confusing, since they've not
strictly been auto-generated.

I've attempted to keep the spirit of the original tests by
excluding all CHECK lines after the scalar.ph IR block,
however I've done this by using a new option called
--filter-out-after to the update_test_checks.py script.
2025-03-18 11:39:55 +00:00
Mel Chen
489d1e764e
[LV][NFC] Pre-commit test for supporting strided accesses. (#130563)
Duplicate riscv-vector-reverse.ll as riscv-vector-reverse-output.ll to
verify all generated IR, not just debug output.
Pre-commit for #128718.
2025-03-18 16:08:42 +08:00
Florian Hahn
166937b49d
[LV] Cleanup after expanding SCEV predicate to constant.
In some cases, SCEV isn't able to prove that no wrap checks are needed,
while constant folding in SCEVExpander can. In those cases, we may leave
around IR for computing the trip count, which is unused at this point
but may be re-used later, triggering an assertion when trying to clean
up SCEVExp after vectorization.

Directly run the cleaner after expanding to a constant predicate to
prevent any generated code from being re-used.

Fixes https://github.com/llvm/llvm-project/issues/131281.
2025-03-17 21:26:51 +00:00
Luke Lau
eef5ea0c42
[VPlan] Account for dead FOR splice simplification in cost model (#131486)
Fixes #131359

After #129645, a first-order recurrence will no longer have it's splice
costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and
is dead.

The legacy cost model didn't account for this, so this accounts for it
in planContainsAdditionalSimplifications to avoid the "VPlan cost model
and legacy cost model disagreed" assertion.
2025-03-18 00:00:54 +08:00
Ryotaro Kasuga
e24e523150
[LoopVectorize] Add test for follow-up metadata for loops (NFC) (#131337)
When pragma of loop transformations are encoded in LLVM IR, follow-up
metadata is used if multiple transformations are specified. They are
used to explicitly express the order of the transformations. However,
they are not properly processed on each transformation pass, so now only
the first one is attempted to be applied. This is a pre-commit to add a
test that causes the problem.

ref:
https://github.com/llvm/llvm-project/pull/127474#issuecomment-2717790398
2025-03-17 13:45:09 +09:00
Florian Hahn
40b7034213
[LV] Add tests for vector backedge elimination with early-exit loops. 2025-03-16 19:42:30 +00:00
Florian Hahn
ee29e16135
[LV] Reorganize tests for narrowing interleave group transform.
Make test target-dependent, as they will require access to a concrete
vector register width. Also add new tests for cost modeling, unrolling
and removing the vector loop region.
2025-03-16 19:18:47 +00:00
Florian Hahn
4e9894498e
[VPlan] Truncate VFxUF if needed in VPWidenPointerInduction::execute.
Create truncate if needed after 56b05a0d6. Note that this preserves the
original behavior pre 56b05a0d6. If truncate would strip any set bits,
then the explicit computation in the narrower type would wrap.
2025-03-16 11:37:58 +00:00
Florian Hahn
6a8d5f22ff
[VPlan] Don't access canonical IV in VPWidenPointerInduction::execute.
This updates VPWidenPointerInductionRecipe::execute to not use the
canonical IV to determine the insert point. Instead, it relies on the
current recipe position. In cases where this is not sufficient, set the
insert point to the first non-phi instruction, to ensure phis are
created together.
2025-03-15 21:32:48 +00:00
Florian Hahn
aadfa9f6c8
[LV] Add additional tests for narrowing interleave groups.
Extend test coverage for https://github.com/llvm/llvm-project/pull/106441.
2025-03-15 21:13:49 +00:00
Florian Hahn
37a57ca257
[FMF] Set all bits if needed when setting individual flags. (#131321)
Currently fast() won't return true if all flags are set via setXXX,
which is surprising. Update setters to set all bits if needed to make
sure isFast() consistently returns the expected result.

PR: https://github.com/llvm/llvm-project/pull/131321
2025-03-15 18:46:26 +00:00
Florian Hahn
56b05a0d6b
[VPlan] Use VFxUF in VPWidenPointerInductionRecipe.
Use VFxUF VPValue instead of computing VF * UF explicitly.
2025-03-15 18:18:53 +00:00
Jeremy Morse
792a6f8119
[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298)
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.

Nowadays, non-intrinsic format is the default and has been on for more
than a year, there's no need for this flag to exist.

(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
2025-03-14 15:50:49 +00:00
David Sherwood
3b6d0093aa
[LV][NFC] Refactor code for extracting first active element (#131118)
Refactor the code to extract the first active element of a
vector in the early exit block, in preparation for PR #130766.
I've replaced the VPInstruction::ExtractFirstActive nodes with
a combination of a new VPInstruction::FirstActiveLane node and
a Instruction::ExtractElement node.
2025-03-14 11:14:09 +00:00
Luke Lau
26324bc1bf
[VPlan] Move FOR splice cost into VPInstruction::FirstOrderRecurrenceSplice (#129645)
After #124093 we now support fixed-order recurrences with EVL tail
folding by replacing VPInstruction::FirstOrderRecurrenceSplice with a VP
splice intrinsic.

However the costing for the splice is currently done in
VPFirstOrderRecurrencePHIRecipe, so when we add the VP splice intrinsic
we end up costing it twice.

This fixes it by splitting out the cost for the splice into
FirstOrderRecurrenceSplice so that it's not duplicated when we replace
it.

We still have to keep the VF=1 checks in VPFirstOrderRecurrencePHIRecipe
since the splice might end up dead and discarded, e.g. in the test
@pr97452_scalable_vf1_for.
2025-03-14 15:33:32 +08:00