551 Commits

Author SHA1 Message Date
Florian Hahn
5b38fb59df
[VPlan] Remove remaining references to VPScalarPHIRecipe (NFC).
VPScalarPHIRecipe has been replaced by VPInstructions with PHI opcodes.
Strip remaining dead references to VPScalarPHIRecipe.
2025-03-24 19:37:00 +00:00
Luke Lau
6a8606e99e
[VPlan] Only store RecurKind + FastMathFlags in VPReductionRecipe. NFCI (#131300)
VPReductionRecipes take a RecurrenceDescriptor, but only use the
RecurKind and FastMathFlags in it when executing. This patch makes the
recipe more lightweight by stripping it to only take the latter two.

The motiviation for this is to simplify an upcoming patch to support
in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to
create an Or reduction, and by using RecurKind we can create an
arbitrary reduction without needing a full RecurrenceDescriptor.
2025-03-24 19:18:54 +08:00
Nikita Popov
0738f70615
[Intrinsics] Add Intrinsic::getFnAttributes() (NFC) (#132029)
Most places that call Intrinsic::getAttributes() are only interested in
the function attributes, so add a separate function for that.

The motivation for this is that I'd like to add the ability to specify
range attributes on intrinsics, which requires knowing the function
type. This avoids needing to know the type for most attribute queries.
2025-03-20 09:20:39 +01:00
Luke Lau
a4dc02c0e7
[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086)
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.

Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.

This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
2025-03-19 00:09:15 +08:00
Elvis Wang
ed19620b8c
[VPlan] Make VPReductionRecipe a VPRecipeWithIRFlags. NFC (#130881)
This patch change the parent of the VPReductionRecipe from
VPSingleDefRecipe to VPRecipeWithIRFlags and also print/get/drop/control
flags by the VPRecipeWithIRFlags. This will remove the dependency of the
underlying instruction.

This patch also add a new function `setFastMathFlags()` to the
VPRecipeWithIRFlags because the entire reduction chain may contains
multiple instructions. And the underlying instruction may not contains
the corresponding flags for this reduction.

Split from #113903.
2025-03-18 10:08:23 +08:00
Florian Hahn
6a030b3005
[VPlan] Remove unused VPlanIngredient (NFC).
VPlanIngredient is not used anymore, remove it.
2025-03-15 10:52:43 +00:00
David Sherwood
3b6d0093aa
[LV][NFC] Refactor code for extracting first active element (#131118)
Refactor the code to extract the first active element of a
vector in the early exit block, in preparation for PR #130766.
I've replaced the VPInstruction::ExtractFirstActive nodes with
a combination of a new VPInstruction::FirstActiveLane node and
a Instruction::ExtractElement node.
2025-03-14 11:14:09 +00:00
Florian Hahn
02575f887b
[VPlan] Use VPInstruction for VPScalarPHIRecipe. (NFCI) (#129767)
Now that all phi nodes manage their incoming blocks through the
VPlan-predecessors, there should be no need for having a dedicate
recipe, it should be sufficient to allow PHI opcodes in VPInstruction.

Follow-ups will also migrate VPWidenPHIRecipe and possibly others,
building on top of https://github.com/llvm/llvm-project/pull/129388.

PR: https://github.com/llvm/llvm-project/pull/129767
2025-03-13 18:35:07 +00:00
Florian Hahn
aeae3667bd
[VPlan] Make start operand non-optional for VPHeaderPHIRecipe (NFC).
The start value is always available, require it unconditionally.
2025-03-10 21:45:09 +00:00
Florian Hahn
fd267082ee
[VPlan] Refactor VPlan creation, add transform introducing region (NFC). (#128419)
Create an empty VPlan first, then let the HCFG builder create a plain
CFG for the top-level loop (w/o a top-level region). The top-level
region is introduced by a separate VPlan-transform. This is instead of
creating the vector loop region before building the VPlan CFG for the
input loop.

This simplifies the HCFG builder (which should probably be renamed) and
moves along the roadmap ('buildLoop') outlined in [1].

As follow-up, I plan to also preserve the exit branches in the initial
VPlan out of the CFG builder, including connections to the exit blocks.

The conversion from plain CFG with potentially multiple exits to a
single entry/exit region will be done as VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit
support relies on building the block-in masks on the original CFG,
because exiting branches and conditions aren't preserved in the VPlan.
So in order to switch to VPlan-based predication, we will have to
preserve them in the initial plain CFG, so the exit conditions are
available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce
VPRegionBlocks for nested loops as transform. Currently the existing
logic in the builder will take care of creating VPRegionBlocks for
nested loops, but not the top-level loop.

[1]
https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf

PR: https://github.com/llvm/llvm-project/pull/128419
2025-03-09 15:05:35 +00:00
Florian Hahn
8dd160f476
Revert "[VPlan] Fold NOT into predicate of wide compares." (#130347)
Reverts llvm/llvm-project#129430

this seems to have introduced a divergence between legacy and
VPlan-based cost model

https://lab.llvm.org/buildbot/#/builders/30/builds/17159
2025-03-07 21:18:49 +00:00
Florian Hahn
cb3ce30ca8
[VPlan] Fold NOT into predicate of wide compares. (#129430)
Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.

Alive2 Proofs for FPCMP test changes:  https://alive2.llvm.org/ce/z/WGDz9U

PR: https://github.com/llvm/llvm-project/pull/129430
2025-03-07 20:32:43 +00:00
Florian Hahn
72376e19df
[VPlan] Remove unused VPWidenIntrinsicRecipe constructor (NFC) 2025-03-06 20:24:30 +00:00
Luke Lau
47fb9c4bb9
[VPlan] Add Name argument to VPWidenPHIRecipe. NFC (#129527)
This allows a different IR name for the generated phi to be used. This
is split off from #118638 and helps remove some of the diffs in it.
2025-03-04 16:47:21 +08:00
Florian Hahn
dfc5f37e3a
[VPlan] Move onlyFirstLaneUsed to VPWidenInductionRecipe (NFC).
Move onlyFirstLaneUsed from VPWidenIntOrFpInductionRecipe and
VPWidenPointerInduction to VPWidenInductionRecipe. Also mark step value
as having only its first lane used.
2025-03-03 22:43:47 +00:00
Florian Hahn
87f837cb26
[VPlan] Remove unneeded classof with VPHeaderRecipe args (NFC).
The extra classof implementation is not needed any longer.
2025-03-03 20:52:28 +00:00
Mel Chen
9b4ad2fe50
[LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093)
This patch converts the llvm.vector.splice intrinsic to
llvm.experimental.vp.splice, ensuring that fixed-order recurrences
execute correctly when tail folding by EVL is enable.
Due to the non-VFxUF penultimate EVL issue, the EVL from the previous
iteration will be preserved and used in llvm.experimental.vp.splice.
2025-03-03 21:27:13 +08:00
Florian Hahn
ba7e27381f
[VPlan] Use VP_CLASSOF_IMPL in VPWidenRecipe. (NFC) 2025-03-02 11:40:23 +00:00
Florian Hahn
6ce41db6b0
[VPlan] Preserve DebugLoc for VPBranchOnMaskRecipe.
Update code to set and generate debug location for branch recipe
2025-02-27 20:19:42 +00:00
Florian Hahn
253d691596
[VPlan] Update VPBranchOnMaskRecipe to always set the mask (NFC).
The mask is always available at construction time. Make it non-optional
to simlpify code.
2025-02-27 18:53:24 +00:00
Florian Hahn
4277c21059
[VPlan] Introduce explicit broadcasts for live-ins. (#124644)
Add a new VPInstruction::Broadcast opcode and use it to materialize
explicit broadcasts of live-ins. The initial patch only materlizes the
broadcasts if the vector preheader dominates all uses that need it.
Later patches will pick the best valid insert point, thus retiring
implicit hoisting of broadcasts from VPTransformsState::get().

PR: https://github.com/llvm/llvm-project/pull/124644
2025-02-26 13:57:51 +00:00
Florian Hahn
522b05afb6
[VPlan] Construct immutable VPIRBBs for exit blocks at construction(NFC) (#128374)
Constract immutable VPIRBasicBlocks for all exit blocks up front and
keep a list of them. Same as the scalar header, they are leaf nodes of
the VPlan and won't change. Some exit blocks may be unreachable, e.g. if
the scalar epilogue always executes or depending on optimizations.

This simplifies both the way we retrieve the exit blocks as well as
hooking up the exit blocks.

PR: https://github.com/llvm/llvm-project/pull/128374
2025-02-25 14:23:27 +00:00
Luke Lau
e23ab73335
[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180)
This is a copy of #126177, since it was automatically and permanently
closed because I messed up the source branch on my remote

This patch proposes to avoid converting widening recipes to VP
intrinsics during the EVL transform.

IIUC we initially did this to avoid `vl` toggles on RISC-V. However we
now have the RISCVVLOptimizer pass which mostly makes this redundant.

Emitting regular IR instead of VP intrinsics allows more generic
optimisations, both in the middle end and DAGCombiner, and we generally
have better patterns in the RISC-V backend for non-VP nodes. Sticking to
regular IR instructions is likely a lot less work than reimplementing
all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get
noticeably better code generation.
2025-02-22 19:38:11 +08:00
Florian Hahn
6c627831f9
[VPlan] Use VPlan predecessors in VPWidenPHIRecipe (NFC). (#126388)
Update VPWidenPHIRecipe to use the predecessors in VPlan to determine
the incoming blocks instead of tracking them separately. This brings
VPWidenPHIRecipe in line with the other phi recipes.

PR: https://github.com/llvm/llvm-project/pull/126388
2025-02-17 16:40:37 +01:00
Kazu Hirata
042e860a8a
[Vectorize] Avoid repeated hash lookups (NFC) (#126681) 2025-02-11 09:09:43 -08:00
Florian Hahn
ee806646ad
[VPlan] Consistently use hasScalarVFOnly (NFC).
Consistently use hasScalarVFOnly instead of using
hasVF(ElementCount::getFixed(1)). Also add an assert to ensure all cases
are covered by hasScalarVFOnly.
2025-02-08 12:19:25 +00:00
Florian Hahn
16df836a52
[VPlan] Mark hasVF & hasScalableVF as const (NFC). 2025-02-08 11:32:23 +00:00
Luke Lau
d0f122b9c5
[LV] Update incoming blocks in VPWidenPHIRecipe in reassociateBlocks (#125481)
This is extracted from #118638

After c7ebe4f we will crash in fixNonInductionPHIs if we use a
VPWidenPHIRecipe with the vector preheader as an incoming block, because
the phi will reference the old non-IRBB vector preheader.

This fixes this by updating VPBlockUtils::reassociateBlocks to update
any VPWidenPHIRecipes's incoming blocks.

This assumes that if the VPWidenPHIRecipe is in a VPRegionBlock, it's in
the entry block, and that we are replacing a VPBasicBlock with another
VPBasicBlock.
2025-02-07 08:50:35 +08:00
David Sherwood
f07cd36a5d
[LoopVectorize] Add the cost of VPInstruction::AnyOf to vplan (#125058)
This patch adds an initial implementation of
VPInstruction::computeCost with support for only one
instruction so far - VPInstruction::AnyOf. This is only
used when vectorising loops with uncountable early exits.
2025-02-05 16:31:14 +00:00
Florian Hahn
5008277322
[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104)
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(https://github.com/llvm/llvm-project/pull/117138).

PR: https://github.com/llvm/llvm-project/pull/124104
2025-02-02 13:44:07 +00:00
Florian Hahn
75b922dccf [VPlan] Check VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof.
When VPWidenIntrinsicRecipe was changed to inhert from VPRecipeWithIRFlags,
VPRecipeWithIRFlags::classof wasn't updated accordingly. Also check for
VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof.

Fixes https://github.com/llvm/llvm-project/issues/125301.
2025-02-01 21:42:19 +00:00
David Sherwood
3bc2dade36
[LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567)
This work feeds part of PR
https://github.com/llvm/llvm-project/pull/88385, and adds support for
vectorising
loops with uncountable early exits and outside users of loop-defined
variables. When calculating the final value from an uncountable early
exit we need to calculate the vector lane that triggered the exit,
and hence determine the value at the point we exited.

All code for calculating the last value when exiting the loop early
now lives in a new vector.early.exit block, which sits between the
middle.split block and the original exit block. Doing this required
two fixes:

1. The vplan verifier incorrectly assumed that the block containing
a definition always dominates the block of the user. That's not true
if you can arrive at the use block from multiple incoming blocks.
This is possible for early exit loops where both the early exit and
the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop.
It needs to have the same parent as the actual early exit block from
the original loop.

I've added a new ExtractFirstActive VPInstruction that extracts the
first active lane of a vector, i.e. the lane of the vector predicate
that triggered the exit.

NOTE: The IR generated for dealing with live-outs from early exit
loops is unoptimised, as opposed to normal loops. This inevitably
leads to poor quality code, but this can be fixed up later.
2025-01-30 10:37:00 +00:00
Nicholas Guy
cdea38f91a
Reland "[LoopVectorizer] Add support for chaining partial reductions #120272" (#124282)
Change `getScaledReduction` to take an existing vector, rather than
creating and returning a new one each call.
Rename `getScaledReduction` to `getScaledReductions` to more accurately
reflect what it's now doing.

---------

Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>
2025-01-28 10:40:35 +00:00
Florian Hahn
09a29fcc8d
[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819)
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.

This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.

PR: https://github.com/llvm/llvm-project/pull/123819
2025-01-27 16:12:07 +00:00
Elvis Wang
aff1242b8e
[LV] Align debug location of the widen-phi to the original phi. (#120338)
This patch align the debug location of the widen-phi to the debug
location of original phi.

Split from: #120054
2025-01-24 17:49:54 +08:00
Vitaly Buka
0e213834df
Revert "[LoopVectorizer] Add support for chaining partial reductions (#120272)" (#124198)
Introduced stack buffer overflow, see #120272.

`getScaledReduction` can return empty vector, and there is not check for
that.

This reverts commit c9b7303b9b18129c4ee6b56aaa2a0a9f59be2d09.
This reverts commit caf0540b91b0fee31353dc7049ae836e0f814cff.
2025-01-23 14:00:33 -08:00
Florian Hahn
7a831eb924 [VPlan] Remove unused VPLane::getNumCachedLanes. (NFC)
The function isn't used, remove it.
2025-01-23 20:38:47 +00:00
Karlo Basioli
c9b7303b9b
Add [[maybe_unused]] to a variable used only in assert in VPlan.h (#124173) 2025-01-23 18:52:53 +00:00
Nicholas Guy
caf0540b91
[LoopVectorizer] Add support for chaining partial reductions (#120272)
Chaining partial reductions, where multiple partial reductions share an
accumulator, allow for more values to be combined together as part of
the reduction without discarding the semantics of the partial reduction
itself.
2025-01-23 17:24:57 +00:00
Nicholas Guy
26b61e143b
[LoopVectorizer] Propagate underlying instruction to the cloned instances of VPPartialReductionRecipes (#123638) 2025-01-23 14:57:31 +00:00
Florian Hahn
05fbc3830d [VPlan] Move VPBlockUtils to VPlanUtils.h (NFC)
Nothing in VPlan.h directly uses VPBlockUtils.h. Move it out to the more
appropriate VPlanUtils.h to reduce the size of the widely included VPlan.h.
2025-01-23 11:28:11 +00:00
Florian Hahn
2c87133c62
Reapply "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.

The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.

Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-19 19:32:03 +00:00
Florian Hahn
58326f1d5b
Revert "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts commit c2d15ac4d4432788557e77c15ce572ac655a8fec.

Causes build failures on PPC stage2 & fuchsia bots
    https://lab.llvm.org/buildbot/#/builders/168/builds/7650
    https://lab.llvm.org/buildbot/#/builders/11/builds/11248
2025-01-18 13:40:33 +00:00
Florian Hahn
c2d15ac4d4
[VPlan] Update final IV exit value via VPlan. (#112147)
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform 
(optimizeInductionExitUsers) replaces extracts of inductions with 
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them 
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.


Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-18 13:22:34 +00:00
John Brawn
edf3a55bce
[LoopVectorize][NFC] Centralize the setting of CostKind (#121937)
In each class which calculates instruction costs (VPCostContext,
LoopVectorizationCostModel, GeneratedRTChecks) set the CostKind once in
the constructor instead of in each function that calculates a cost. This
is in preparation for potentially changing the CostKind when compiling
for optsize.
2025-01-17 15:06:18 +00:00
Luke Lau
5c15caa83f
[VPlan] Verify scalar types in VPlanVerifier. NFCI (#122679)
VTypeAnalysis contains some assertions which can be useful for reasoning
that the types of various operands match.

This patch teaches VPlanVerifier to invoke VTypeAnalysis to check them,
and catches some issues with VPInstruction types that are also fixed
here:

* Handles the missing cases for CalculateTripCountMinusVF,
CanonicalIVIncrementForPart and AnyOf
* Fixes ICmp and ActiveLaneMask to return i1 (to align with `icmp` and
`@llvm.get.active.lane.mask` in the LangRef)

The VPlanVerifier unit tests also need to be fleshed out a bit more to
satisfy the stricter assertions
2025-01-16 18:57:08 +08:00
Florian Hahn
ef1260acc0
[VPlan] Make VPBlock constructors private (NFC).
16d19aaed moved to manage block creation via VPlan directly, with VPlan
owning the created blocks. Follow up to make the VPBlock constructors
private, to require creation via VPlan helpers and thus preventing
issues due to manually constructing blocks.
2025-01-15 21:34:24 +00:00
LiqinWeng
0294dab79e
[LV][VPlan] Add fast flags for selectRecipe (#121023)
Change the inheritance of class VPWidenSelectRecipe to class
VPRecipeWithIRFlags, which allows recipe of the select to pass the
fastmath flags.The patch of #119847 will add the fastmath flag to for
recipe
2025-01-15 10:10:11 +08:00
Sam Tebbs
795e35a653
Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744)
This relands the reverted #120721 with a fix for cases where neither
reduction operand are the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2025-01-13 11:20:35 +00:00
Luke Lau
f0d5104c94
[VPlan] Handle some VPInstructions in may{Read,Write}FromMemory (#120058)
This just copies the same conservative definition from mayWriteToMemory,
and enables more VPInstructions to be hoisted out in LICM.

I think this should give more accurate costs, and I was able to build
llvm-test-suite without the legacy-vplan cost model assertion going off.
2025-01-08 15:17:26 +08:00