2462 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
6bbdc70066
[LV] Use getCallWideningDecision in more places (NFC) (#134236) 2025-04-03 14:53:19 +01:00
Florian Hahn
4b67c53e20
[VPlan] Use recipe debug loc instead of instr DLs in more cases (NFC)
Update both VPInterleaveRecipe and VPReplicateRecipe codegen to use
debug location directly from the recipe, not the underlying instruction.
This removes another dependency on underlying instructions.
2025-04-02 21:51:17 +01:00
Krzysztof Drewniak
554859c736
[TTI] Make isLegalMasked{Load,Store} take an address space (#134006)
In order to facilitate targets that only support masked loads/stores
on certain address spaces (AMDGPU will support them in an upcoming
patch, but only for address space 7), add an AddressSpace parameter
to isLegalMaskedLoad and isLegalMaskedStore
2025-04-02 15:38:10 -05:00
Ningning Shi(史宁宁)
6b647de031
[NFC] Remove the unused hasMinSize() (#133838)
The 'hasOptSize()' is 'hasFnAttribute(Attribute::OptimizeForSize) ||
hasMinSize()', so we don't need another 'hasMinSize()'.
2025-04-01 15:23:34 +08:00
Alexey Bataev
78777a204a
[LV]Split store-load forward distance analysis from other checks, NFC (#121156)
The patch splits the store-load forwarding distance analysis from other
dependency analysis in LAA. Currently it supports only power-of-2
distances, required to support non-power-of-2 distances in future.

Part of #100755
2025-03-31 07:28:44 -04:00
Kazu Hirata
2fc08d4c31
[Vectorize] Use DenseMap::insert_range (NFC) (#133656) 2025-03-30 22:57:45 -07:00
Florian Hahn
424c8f9217
[VPlan] Remove dead UF argument from VPTransformState ctor (NFC). 2025-03-30 17:31:00 +01:00
Kazu Hirata
d8b078d550
[Transforms] Use llvm::append_range (NFC) (#133607) 2025-03-29 18:57:50 -07:00
Ramkumar Ramachandra
4c4e4e4299
[LV] Strengthen calls to collectInstsToScalarize (NFC) (#130642)
Avoid the pattern of always calling collectInstsToScalarize after
collectUniformsAndScalars, and call it in collectUniformsAndScalars
instead. Also strengthen checks for early exits in the function.
2025-03-28 17:27:57 +00:00
Florian Hahn
7b75db5755
[VPlan] Add new VPIRPhi overlay for VPIRInsts wrapping phi nodes (NFC). (#129387)
Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an
overlay, to provide more convenient checking (via directly doing
isa/dyn_cast/cast) and specialied execute/print implementations.

Both VPIRInstruction and VPIRPhi share the same VPDefID, and are
differentiated by the backing IR instruction.

This pattern could alos be used to provide more specialized interfaces
for some VPInstructions ocpodes, without introducing new, completely
spearate recipes. An example would be modeling VPWidenPHIRecipe &
VPScalarPHIRecip using VPInstructions opcodes and providing an interface
to retrieve incoming blocks and values through a VPInstruction subclass
similar to VPIRPhi.

PR: https://github.com/llvm/llvm-project/pull/129387
2025-03-28 08:43:46 +00:00
Florian Hahn
8ddbc01295
[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690)
Keep the start value as operand of ComputeFindLastIVResult. A follow-up
patch will use this to make sure the start value is frozen if needed.

Depends on https://github.com/llvm/llvm-project/pull/132689

PR: https://github.com/llvm/llvm-project/pull/132690
2025-03-27 18:34:13 +00:00
Kazu Hirata
cde58bfc16
[Transforms] Use range constructors of *Set (NFC) (#133203) 2025-03-27 07:51:58 -07:00
Simon Pilgrim
1715386e80 Fix MSVC signed/unsigned comparison warning. NFC. 2025-03-27 08:56:21 +00:00
David Green
de1c2f24bc
[LoopVectorizer][AArch64] Move getMinTripCountTailFoldingThreshold later. (#132170)
This moves the checks of MinTripCountTailFoldingThreshold later, during the
calculation of whether to tail fold. This allows it to check beforehand whether
tail predication is required, either for scalable or fixed-width vectors.

This option is only specified for AArch64, where it returns the minimum of 5.
This patch aims to allow the vectorization of TC=4 loops, preventing them from
performing slower when SVE is present.
2025-03-26 19:35:08 +00:00
Florian Hahn
420c056f85
[VPlan] Add ComputeFindLastIVResult opcode (NFC). (#132689)
This moves the logic for computing the FindLastIV reduction result to
its own opcode. A follow-up patch will update the new opcode to also
take the start value, to fix
https://github.com/llvm/llvm-project/issues/126836.

PR: https://github.com/llvm/llvm-project/pull/132689
2025-03-26 10:49:09 +00:00
Kazu Hirata
e87921304b
[Vectorize] Avoid repeated hash lookups (NFC) (#132661)
Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-03-25 15:18:15 -07:00
Ramkumar Ramachandra
8fb802e995
[LV] Improve code in collectInstsToScalarize (NFC) (#130643) 2025-03-25 16:52:13 +00:00
Ramkumar Ramachandra
e8d882a95b
[LV] Audit and fix nits in cl::opts (NFC) (#130601)
Non-static cl::opts should be under the llvm namespace.
2025-03-25 10:19:45 +00:00
Luke Lau
6a8606e99e
[VPlan] Only store RecurKind + FastMathFlags in VPReductionRecipe. NFCI (#131300)
VPReductionRecipes take a RecurrenceDescriptor, but only use the
RecurKind and FastMathFlags in it when executing. This patch makes the
recipe more lightweight by stripping it to only take the latter two.

The motiviation for this is to simplify an upcoming patch to support
in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to
create an Or reduction, and by using RecurKind we can create an
arbitrary reduction without needing a full RecurrenceDescriptor.
2025-03-24 19:18:54 +08:00
Florian Hahn
206b42c98e
[LV] Use VPBuilder to create ComputeReductionResult. (NFC)
Update code to use VPBuilder, to simplify follow-up changes.
2025-03-23 16:10:07 +00:00
Florian Hahn
c482b8faea
[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC).
Instead of executing the whole entry VPIRBB twice, first only execute
the VPExpandSCEVRecipes and replace their uses with the expanded
VPValue, which will be a live-in. This allows removing special logic in
VPExpandSCEVRecipe to support executing twice and allows moving the
ExpandedSCEVs map out of VPTransformState.

It will also allow adding other recipes to the entry VPBB in the future.
2025-03-23 09:06:01 +00:00
Kazu Hirata
fae34938f6
[llvm] Use *Set::insert_range (NFC) (#132591)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range with
iterator ranges.  For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
2025-03-22 22:14:45 -07:00
Florian Hahn
0d3ba087f7
[LV] Move IV bypass value creation out of ILV (NFC)
createInductionAdditionalBypassValues is only used for epilogue
vectorization now. Move it out of ILV, which means we do not have to
thread through ExpandedSCEVs and also don't have to track the bypass
values in ILV. Instead, directly create them if needed after executing
the epilogue plan. This moves more the epilogue specific logic out of
the generic executePlan.
2025-03-22 20:36:45 +00:00
David Sherwood
4e69258bf3
[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565)
At the moment if we decide to enable tail-folding we do not include
the cost of generating the mask per VF. This can mean we make some
poor choices of VF, which is definitely true for SVE-enabled AArch64
targets where mask generation for fixed-width vectors is more
expensive than for scalable vectors.

I've added a VPInstruction::computeCost function to return the costs
of the ActiveLaneMask and ExplicitVectorLength operations.
Unfortunately, in order to prevent asserts firing I've also had to
duplicate the same code in the legacy cost model to make sure the
chosen VFs match up. I've wrapped this up in a ifndef NDEBUG for
now. The alternative would be to disable the assert completely when
tail-folding, which I imagine is just as bad.

New tests added:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
  Transforms/LoopVectorize/RISCV/tail-folding-cost.ll
2025-03-21 09:24:56 +00:00
Florian Hahn
c73ad7ba20
[VPlan] Add transformation to narrow interleave groups.
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms. For now
it only transforms load interleave groups feeding store groups.

Depends on #106431.

This lands the main parts of the approved
https://github.com/llvm/llvm-project/pull/106441 as suggested to break
things up a bit more.
2025-03-20 19:41:37 +00:00
Kazu Hirata
2df51fd9c4
[Transforms] Avoid repeated hash lookups (NFC) (#132146) 2025-03-20 09:11:56 -07:00
Kazu Hirata
0dcc201ac4
[Transforms] Use *Set::insert_range (NFC) (#132056)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-19 15:35:01 -07:00
Florian Hahn
2e13ec561c
[VPlan] Bail out on non-intrinsic calls in VPlanNativePath.
Update initial VPlan-construction in VPlanNativePath in line with the
inner loop path, in that it bails out when encountering constructs it
cannot handle, like non-intrinsic calls.

Fixes https://github.com/llvm/llvm-project/issues/131071.
2025-03-19 21:35:15 +00:00
Florian Hahn
11b8699572
[LV] Don't skip instrs with side-effects in reg pressure computation. (#126415)
calculateRegisterUsage adds end points for each user of an instruction
to Ends and ignores instructions not added to it, i.e. instructions with
no users.

This means things like stores aren't included, which in turn means
values that are only used in stores are also not included for
consideration. This means we underestimate the register usage in cases
where the only users are things like stores.

Update the code to don't skip instructions without users (i.e. not in
Ends) if they have side-effects.

PR: https://github.com/llvm/llvm-project/pull/126415
2025-03-19 15:13:43 +00:00
Kazu Hirata
9705010b5e
[Vectorize] Avoid repeated hash lookups (NFC) (#131962) 2025-03-19 07:14:54 -07:00
Luke Lau
a4dc02c0e7
[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086)
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.

Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.

This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
2025-03-19 00:09:15 +08:00
Florian Hahn
166937b49d
[LV] Cleanup after expanding SCEV predicate to constant.
In some cases, SCEV isn't able to prove that no wrap checks are needed,
while constant folding in SCEVExpander can. In those cases, we may leave
around IR for computing the trip count, which is unused at this point
but may be re-used later, triggering an assertion when trying to clean
up SCEVExp after vectorization.

Directly run the cleaner after expanding to a constant predicate to
prevent any generated code from being re-used.

Fixes https://github.com/llvm/llvm-project/issues/131281.
2025-03-17 21:26:51 +00:00
Luke Lau
eef5ea0c42
[VPlan] Account for dead FOR splice simplification in cost model (#131486)
Fixes #131359

After #129645, a first-order recurrence will no longer have it's splice
costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and
is dead.

The legacy cost model didn't account for this, so this accounts for it
in planContainsAdditionalSimplifications to avoid the "VPlan cost model
and legacy cost model disagreed" assertion.
2025-03-18 00:00:54 +08:00
Florian Hahn
17b4be8f63
[VPlan] Move setting name and adding VFs after recipe creation.(NFC)
Recipe creation is the only place where the VF range is restricted. Move
setting the VFs just after initial recipe creation.
2025-03-17 12:04:09 +00:00
Luke Lau
315c02aa02
[VPlan] Fix crash with inloop fmuladd reductions with blend (#131154)
When visiting in-loop reduction links, we previously crashed if we had
an fmuladd with a blend after it in the chain. This fixes it by lifting
the existing blend folding to also handle fmuladd.

This also simplifies the code structure slightly for an upcoming patch I
want to post to handle in-loop AnyOf reductions.

I removed the PhiR->isInLoop() check since it's already guarded at the
top of the parent Header->Phis() loop.
2025-03-14 09:08:32 +08:00
Florian Hahn
62994c3291
[VPlan] Also introduce explicit broadcasts for values from entry VPBB.
Update and generalize materializeBroadcasts to also introduce explicit
broadcasts for VPValues defined in the Plans Entry block.

This fixes a crash when trying to insert the broadcasts generated by
VPTransformState::get after the generating instruction, which isn't
possible after invoke instructions.

Fixes https://github.com/llvm/llvm-project/issues/128838.
2025-03-12 22:03:19 +00:00
Ramkumar Ramachandra
93d41d8148
[LV] Use ElementCount::isKnownLT to factor code (NFC) (#130596) 2025-03-11 15:42:59 +00:00
John Brawn
7129205816
[LoopVectorize] Move checking for OptForSize into the cost model (NFC) (#130752)
Move OptForSizeBasedOnProfile into the cost model and rename it to
OptForSize, as shouldOptimizeForSize checks both the function attribute
and profile. This being done in preparation for OptForSize being used in
the cost model.
2025-03-11 14:58:04 +00:00
David Sherwood
26ecf97895
[LoopVectorize] Further improve cost model for early exit loops (#126235)
Following on from #125058, this patch takes into account the
work done in the vector early exit block when assessing the
profitability of vectorising the loop. I have renamed
areRuntimeChecksProfitable to isOutsideLoopWorkProfitable and
we now pass in the early exit costs. As part of this, I have
added the ExtractFirstActive opcode to VPInstruction::computeCost.

It's worth pointing out that when we assess profitability of the
loop we calculate a minimum trip count and compare that against
the *maximum* trip count. However, since the loop has an early
exit the runtime trip count can still end up being less than the
minimum. Alternatively, we may never take the early exit at all
at runtime and so we have the opposite problem of over-estimating
the cost of the loop. The loop vectoriser cannot simultaneously
take two contradictory positions and so I feel the only sensible
thing to do is be conservative and assume the loop will be more
expensive than loops without early exits.

We may find in future that we need to adjust the cost according to
the probability of taking the early exit. This will become even
more important once we support multiple early exits. However, we
have to start somewhere and we can always revisit this later.
2025-03-11 11:48:55 +00:00
Florian Hahn
f84f4e1e05
[LV] Don't crash on inner loops with RT checks in VPlan-native path.
Assert that we only generate runtime checks for inner loops in
emitMemRuntimeChecks, instead of returning nullptr in the VPlan-native
path, which is causing crashes and incorrect code.
2025-03-10 20:28:56 +00:00
Ramkumar Ramachandra
35c0302086
[LV] Clean up unused 'using' definition (NFC) (#130597) 2025-03-10 17:57:32 +00:00
Ramkumar Ramachandra
7220ab821d
[LV] Mark getMaxVScale static (NFC) (#130598) 2025-03-10 17:56:50 +00:00
Florian Hahn
fd267082ee
[VPlan] Refactor VPlan creation, add transform introducing region (NFC). (#128419)
Create an empty VPlan first, then let the HCFG builder create a plain
CFG for the top-level loop (w/o a top-level region). The top-level
region is introduced by a separate VPlan-transform. This is instead of
creating the vector loop region before building the VPlan CFG for the
input loop.

This simplifies the HCFG builder (which should probably be renamed) and
moves along the roadmap ('buildLoop') outlined in [1].

As follow-up, I plan to also preserve the exit branches in the initial
VPlan out of the CFG builder, including connections to the exit blocks.

The conversion from plain CFG with potentially multiple exits to a
single entry/exit region will be done as VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit
support relies on building the block-in masks on the original CFG,
because exiting branches and conditions aren't preserved in the VPlan.
So in order to switch to VPlan-based predication, we will have to
preserve them in the initial plain CFG, so the exit conditions are
available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce
VPRegionBlocks for nested loops as transform. Currently the existing
logic in the builder will take care of creating VPRegionBlocks for
nested loops, but not the top-level loop.

[1]
https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf

PR: https://github.com/llvm/llvm-project/pull/128419
2025-03-09 15:05:35 +00:00
Ramkumar Ramachandra
ddffb74afd
[LV] Strip unreachable SCEV-check blocks (#130079)
emitSCEVChecks checks if SCEVCheckCond matches zero, and returns
nullptr. However, it sets SCEVCheckCond as used before it does this,
which prevents it from being removed during cleanup, resulting in
unreachable blocks being emitted. Fix this.
2025-03-06 19:30:25 +00:00
Ramkumar Ramachandra
00f3089c2e
[LV] Use PatternMatch in emitTransformedIndex (NFC) (#130081) 2025-03-06 19:23:31 +00:00
Luke Lau
e1cea0d928
[LV][TTI] Remove unused ReductionFlags. NFC (#129858)
No in-tree targets currently use it in the
preferInLoopReduction/preferPredicatedReductionSelect TTI hooks. It
looks like it used to be used in LoopUtils, at least in
8ca60db40bd944dc5f67e0f200a403b4e03818ea, but I presume it was replaced
by RecurrenceDescriptor.
2025-03-05 18:31:12 +08:00
Ramkumar Ramachandra
80bdfcd411
[LoopUtils] Don't wrap in getLoopEstimatedTripCount (#129080)
getLoopEstimatedTripCount returns the trip count based on profiling
data, and its documentation says that it could return 0 when the trip
count is zero, but this is not the case: a valid trip count can never be
zero, and it returns 0 when the unsigned ExitCount is incremented by 1
and wraps. Some callers are careful about checking for this zero value
in an std::optional, but it makes for an API with footguns, as a
std::optional return value indicates that a non-nullopt value would be a
valid trip count. Fix this by explicitly returning std::nullopt when the
return value would wrap, and strip additional checks in callers. This
also fixes a minor bug in LoopVectorize.
2025-03-04 08:43:08 +00:00
Mel Chen
9b4ad2fe50
[LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093)
This patch converts the llvm.vector.splice intrinsic to
llvm.experimental.vp.splice, ensuring that fixed-order recurrences
execute correctly when tail folding by EVL is enable.
Due to the non-VFxUF penultimate EVL issue, the EVL from the previous
iteration will be preserved and used in llvm.experimental.vp.splice.
2025-03-03 21:27:13 +08:00
Florian Hahn
f937b17e85
[LV] Don't query SCEV for non-invariant values in cost model.
This fixes a divergence between VPlan and legacy cost model, matching
behavior further up in getInstructionCost as well.

Fixes https://github.com/llvm/llvm-project/issues/129236.
2025-03-02 10:55:52 +00:00
Florian Hahn
75270e3750
[VPlan] Don't print VPlan DT after VPlan construction. (NFC)
Remove unnecessary code to just print VPlan dominator tree.
2025-03-01 21:15:56 +00:00