331 Commits

Author SHA1 Message Date
Kazu Hirata
5cfd81b0cc
[llvm] Use range constructors of *Set (NFC) (#137552) 2025-04-27 15:59:57 -07:00
Florian Hahn
826f237cb4
[VPlan] Don't added separate vector latch block (NFC).
Simplify initial VPlan construction by not creating a separate
vector.latch block, which isn't needed and will get folded away later.
This has been suggested as independent clean-up multiple times.
2025-04-26 22:03:18 +01:00
Florian Hahn
df21288247
[VPlan] Replace ExtractFromEnd with Extract(Last|Penultimate)Element (NFC). (#137030)
ExtractFromEnd only has 2 uses, extracting the last and penultimate
elements. Replace it with 2 separate opcodes, removing the need to
materialize and handle a constant argument.

PR: https://github.com/llvm/llvm-project/pull/137030
2025-04-25 16:27:29 +01:00
Florian Hahn
7cce38beea
[VPlan] Remove dead SE argument from handleUncountableEarlyExit (NFC).
ScalarEvolution is not used by the function, remove the dead arg.
2025-04-24 19:59:05 +01:00
Florian Hahn
06d4876982
[VPlan] Replace checking IR loop with checking VPlan predecessors (NFC).
Update check to use VPEarlyExitBlock's predecessors, which removes a
dependence on underlying IR and is more in line with the comment below.
2025-04-24 12:29:34 +01:00
Florian Hahn
3fbbe9b8d0
[VPlan] Add exit phi operands during initial construction (NFC). (#136455)
Add incoming exit phi operands during the initial VPlan construction.
This ensures all users are added to the initial VPlan and is also needed
in preparation to retaining exiting edges during initial construction.

PR: https://github.com/llvm/llvm-project/pull/136455
2025-04-23 20:40:42 +01:00
Florian Hahn
8c83355d5b
[VPlan] Handle VPIRPhi in VPRecipeBase::isPhi (NFC).
Also handle VPIRPhi in VPRecipeBase::isPhi, to simplify existing code
dealing with VPIRPhis.

Suggested as part of https://github.com/llvm/llvm-project/pull/136455.
2025-04-21 21:04:20 +01:00
Florian Hahn
5739a22fbb
[VPlan] Also duplicated scalar-steps when it enables sinking scalars. (#136021)
Extend sinking logic to duplicate scalar steps recipe if it enables
sinking, that is if all users in a destination block require all lanes.

This should be the last step before removing legacy sinkScalarOperands.

PR: https://github.com/llvm/llvm-project/pull/136021
2025-04-21 18:36:43 +01:00
Elvis Wang
69ade7c090
[LV] Check if the VF is scalar by VFRange in handleUncountableEarlyExit. (#135294)
This patch check if the plan contains scalar VF by VFRange instead of
Plan.
This patch also clamp the range to contains either only scalar or only
vector VFs to prevent mis-compile.

Split from #113903.
2025-04-18 06:51:36 +08:00
Luke Lau
41675fa5b8
[VPlan] Simplify vp.merge true, (or x, y), x -> vp.merge y, true, x (#135017)
With EVL tail folding an AnyOf reduction will emit an i1 vp.merge like

vp.merge true, (or phi, cond), phi, evl

We can remove the or and optimise this to

vp.merge cond, true, phi, evl

Which makes it slightly easier to pattern match in #134898.

This also adds a pattern matcher for calls to help match this.

Blended AnyOf reductions will use an and instead of an or, which we may
also be able to simplify in a later patch.
2025-04-17 16:31:14 +02:00
Sterling-Augustine
820a9b72b2
Fix build by marking possibly unused variable such. (#135689) [NFC] 2025-04-14 15:24:36 -07:00
Kazu Hirata
888b3ed5b4 [Vectorize] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:2417:13: error:
  unused variable 'ConstStep' [-Werror,-Wunused-variable]
2025-04-14 15:03:07 -07:00
Florian Hahn
54b33eba16
[VPlan] Add opcode to create step for wide inductions. (#119284)
This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 2 operands
* the vector step
* the scale of the vector step

The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply
vector step by scale.

This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as follow up to
https://github.com/llvm/llvm-project/pull/108378 with an increment of
the wide IV step.

PR: https://github.com/llvm/llvm-project/pull/119284
2025-04-14 23:20:44 +02:00
Florian Hahn
5550d30228
[VPlan] Check captured operand when simplifying redundant OR.
Follow-up to 0f607f to actually use the captured operand X instead of Y.
2025-04-13 13:23:27 +01:00
Florian Hahn
0f607f3df5
[VPlan] Simplify 'or x, true' -> true.
Add additional OR simplification to fix a divergence between legacy and
VPlan-based cost model.

This adds a new m_AllOnes matcher by generalizing specific_intval to
int_pred_ty, which takes a predicate to check to support matching both
specific APInts and other APInt predices, like isAllOnes.

Fixes https://github.com/llvm/llvm-project/issues/131359.
2025-04-13 12:09:40 +01:00
Luke Lau
be6ccc98f3
[VPlan] Split out VPBlendRecipe simplifications from simplifyRecipes. NFC (#134073)
This is split off from #133977

VPBlendRecipe normalisation is sensitive to the number of users a mask
has, so should probably be run after the masks are simplified as much as
possible.

Note this could be run after removeDeadRecipes but this causes test
diffs, some regressions, so this is left to a later patch.
2025-04-07 09:55:52 +01:00
Florian Hahn
464286ba63
[VPlan] Don't narrow interleave groups if there are vector pointers.
Do not narrow interleave groups if there are VectorPointer recipes and
the plan was unrolled. The recipe implicitly uses VF from VPTransformState.
2025-04-06 22:14:24 +01:00
Florian Hahn
3a859b11e3
[VPlan] Set and use debug location for VPScalarIVStepsRecipe.
This adds missing debug location for VPscalarIVStepsRecipe. The location
of the corresponding phi is used.
2025-04-04 21:14:36 +01:00
Florian Hahn
5fbd0658a0
[VPlan] Add initial CFG simplification, removing BranchOnCond true. (#106748)
Add an initial CFG simplification transform, which removes the dead
edges for blocks terminated with BranchOnCond true.

At the moment, this removes the edge between middle block and scalar
preheader when folding the tail.

PR: https://github.com/llvm/llvm-project/pull/106748
2025-04-04 15:44:26 +01:00
Florian Hahn
380defd4b3
[VPlan] Update VPInterleaveRecipe to take debug loc directly as arg (NFC) 2025-04-02 22:46:38 +01:00
Luke Lau
8107b430ed
[VPlan] Simplify select c, x, x -> x (#133731)
As noted in 1a9358c090d0507be21c5e9b2d97a23ef1de8ab0, some
simplifications can produce a redundant select where the true and false
operands are the same, which this patch removes.

The is_fpclass test was changed so the condition wasn't made dead.
2025-04-02 10:26:48 +01:00
Luke Lau
6afe5e5d1a
[LV][EVL] Peek through combination tail-folded + predicated masks (#133430)
If a recipe was predicated and tail folded at the same time, it will
have a mask like

    EMIT vp<%header-mask> = icmp ule canonical-iv, backedge-tc
    EMIT vp<%mask> = logical-and vp<%header-mask>, vp<%pred-mask>

When converting to an EVL recipe, if the mask isn't exactly just the
header-mask we copy the whole logical-and.
We can remove this redundant logical-and (because it's now covered by
EVL) and just use vp<%pred-mask> instead.

This lets us remove the widened canonical IV in more places.
2025-03-31 21:28:39 +01:00
Luke Lau
b739a3cb65
[VPlan] Add m_Deferred. NFC (#133736)
This copies over the implementation of m_Deferred which allows matching
values that were bound in the pattern, and uses it for the (X && Y) ||
(X && !Y) -> X simplifcation.
2025-03-31 21:01:28 +01:00
Florian Hahn
809f857d2c
[VPlan] Support early-exit loops in optimizeForVFAndUF. (#131539)
Update optimizeForVFAndUF to support early-exit loops by handling
BranchOnCond(Or(..., CanonicalIV == TripCount)) via SCEV

PR: https://github.com/llvm/llvm-project/pull/131539
2025-03-31 07:55:48 +01:00
Florian Hahn
6b98134466
[VPlan] Re-enable narrowing interleave groups with interleaving.
Remove the UF = 1 restriction introduced by 577631f0a5 building on top
of 783a846507683, which allows updating all relevant users of the VF,
VPScalarIVSteps in particular.

This restores the full functionality of
https://github.com/llvm/llvm-project/pull/106441.
2025-03-29 20:14:10 +00:00
Florian Hahn
783a846507
[VPlan] Add VF as operand to VPScalarIVStepsRecipe.
Similarly to other recipes, update VPScalarIVStepsRecipe to also take
the runtime VF as argument. This removes some unnecessary runtime VF
computations for scalable vectors. It will also allow dropping the
UF == 1 restriction for narrowing interleave groups required in
577631f0a528.
2025-03-28 21:48:59 +00:00
Hari Limaye
bf5627c85e
[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828)
Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the
narrowest type necessary, when the trip-count of a loop is known to be
constant and the only use of the recipe is the condition used by the
vector loop's backedge branch.
2025-03-28 14:47:40 +00:00
Florian Hahn
7b75db5755
[VPlan] Add new VPIRPhi overlay for VPIRInsts wrapping phi nodes (NFC). (#129387)
Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an
overlay, to provide more convenient checking (via directly doing
isa/dyn_cast/cast) and specialied execute/print implementations.

Both VPIRInstruction and VPIRPhi share the same VPDefID, and are
differentiated by the backing IR instruction.

This pattern could alos be used to provide more specialized interfaces
for some VPInstructions ocpodes, without introducing new, completely
spearate recipes. An example would be modeling VPWidenPHIRecipe &
VPScalarPHIRecip using VPInstructions opcodes and providing an interface
to retrieve incoming blocks and values through a VPInstruction subclass
similar to VPIRPhi.

PR: https://github.com/llvm/llvm-project/pull/129387
2025-03-28 08:43:46 +00:00
Florian Hahn
5eccd71ce4
[VPlan] Add assertion ensuring Plan's UF matches BestUF (NFC). 2025-03-27 19:29:55 +00:00
David Sherwood
1c9fe8c8af
[LV] Optimise users of induction variables in early exit blocks (#130766)
This is the second of two PRs that attempts to improve the IR
generated in the exit blocks of vectorised loops with uncountable
early exits. It follows on from PR #128880. In this PR I am
improving the generated code for users of induction variables in
early exit blocks.

This required using a newly add VPInstruction called
FirstActiveLane, which calculates the index of the first active
predicate in the mask operand.

I have added a new function optimizeEarlyExitInductionUser that
is called from optimizeInductionExitUsers when handling users in
early exit blocks.
2025-03-26 12:09:59 +00:00
Florian Hahn
577631f0a5
Reapply "[VPlan] Add transformation to narrow interleave groups. (#106441)"
This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf.

The recommmitted version limits to transform to cases where no
interleaving is taking place, to avoid a mis-compile when interleaving.

Original commit message:

This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.

Depends on https://github.com/llvm/llvm-project/pull/106431.

Fixes https://github.com/llvm/llvm-project/issues/82936.

PR: https://github.com/llvm/llvm-project/pull/106441
2025-03-25 20:57:10 +00:00
Florian Hahn
dfca6c0d3b
[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655)
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.

This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.

In preparation for removing redundant WideIV steps added in
https://github.com/llvm/llvm-project/pull/119284.

PR: https://github.com/llvm/llvm-project/pull/123655
2025-03-25 12:57:24 +00:00
Florian Hahn
06fd10f1da
[VPlan] Don't create ExtractElement recipes for scalar plans. (#131604)
ExtractElements are no-ops for scalar VPlans. Don't introduce them in 
handleUncountableEarlyExit if the plan has only a scalar VF.

This fixes a crash trying to compute the cost of ExtractElement after 26ecf978951b79.

PR: https://github.com/llvm/llvm-project/pull/131604
2025-03-23 22:00:02 +00:00
Martin Storsjö
ff3e2ba9eb Revert "[VPlan] Add transformation to narrow interleave groups. (#106441)"
This reverts commit dfa665f19c52d98b8d833a8e9073427ba5641b19.

This commit caused miscompilations in ffmpeg, see
https://github.com/llvm/llvm-project/pull/106441 for details.
2025-03-23 23:27:39 +02:00
Kazu Hirata
fae34938f6
[llvm] Use *Set::insert_range (NFC) (#132591)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range with
iterator ranges.  For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
2025-03-22 22:14:45 -07:00
Florian Hahn
dfa665f19c
[VPlan] Add transformation to narrow interleave groups. (#106441)
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.

Depends on https://github.com/llvm/llvm-project/pull/106431.

Fixes https://github.com/llvm/llvm-project/issues/82936.

PR: https://github.com/llvm/llvm-project/pull/106441
2025-03-22 21:40:17 +00:00
Kazu Hirata
3520dc5e7a [Vectorize] Fix a build
This patch fixes:

  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:2263:19: error:
  expected ';' after return statement
2025-03-20 12:52:27 -07:00
Florian Hahn
c73ad7ba20
[VPlan] Add transformation to narrow interleave groups.
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms. For now
it only transforms load interleave groups feeding store groups.

Depends on #106431.

This lands the main parts of the approved
https://github.com/llvm/llvm-project/pull/106441 as suggested to break
things up a bit more.
2025-03-20 19:41:37 +00:00
Florian Hahn
2e13ec561c
[VPlan] Bail out on non-intrinsic calls in VPlanNativePath.
Update initial VPlan-construction in VPlanNativePath in line with the
inner loop path, in that it bails out when encountering constructs it
cannot handle, like non-intrinsic calls.

Fixes https://github.com/llvm/llvm-project/issues/131071.
2025-03-19 21:35:15 +00:00
Florian Hahn
870f753f1f
[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC).
Also include VPlan's BTC in the set of VPValues to materialize
broadcasts for, if it is used.
2025-03-18 22:35:18 +00:00
Florian Hahn
d51bc83511
[VPlan] Only skip live-ins with constants in materializeBroadccast (NFC)
Currently this should be NFC, but will be needed in future patches.
2025-03-18 20:23:16 +00:00
Luke Lau
a4dc02c0e7
[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086)
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.

Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.

This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
2025-03-19 00:09:15 +08:00
David Sherwood
3b6d0093aa
[LV][NFC] Refactor code for extracting first active element (#131118)
Refactor the code to extract the first active element of a
vector in the early exit block, in preparation for PR #130766.
I've replaced the VPInstruction::ExtractFirstActive nodes with
a combination of a new VPInstruction::FirstActiveLane node and
a Instruction::ExtractElement node.
2025-03-14 11:14:09 +00:00
Florian Hahn
02575f887b
[VPlan] Use VPInstruction for VPScalarPHIRecipe. (NFCI) (#129767)
Now that all phi nodes manage their incoming blocks through the
VPlan-predecessors, there should be no need for having a dedicate
recipe, it should be sufficient to allow PHI opcodes in VPInstruction.

Follow-ups will also migrate VPWidenPHIRecipe and possibly others,
building on top of https://github.com/llvm/llvm-project/pull/129388.

PR: https://github.com/llvm/llvm-project/pull/129767
2025-03-13 18:35:07 +00:00
Florian Hahn
62994c3291
[VPlan] Also introduce explicit broadcasts for values from entry VPBB.
Update and generalize materializeBroadcasts to also introduce explicit
broadcasts for VPValues defined in the Plans Entry block.

This fixes a crash when trying to insert the broadcasts generated by
VPTransformState::get after the generating instruction, which isn't
possible after invoke instructions.

Fixes https://github.com/llvm/llvm-project/issues/128838.
2025-03-12 22:03:19 +00:00
Florian Hahn
8132c4f554
[VPlan] Also introduce broadcasts for live-ins used in vec preheader.
Slightly generalize materializeLiveInBroadcasts to also introduce
broadcasts for live-ins used in the vector preheader. This should cover
all live-ins.

If the live-in is used in the vector preheader, insert the broadcast at
the beginning of the block.
2025-03-11 21:19:14 +00:00
David Sherwood
055db3ec33
[LV] Optimise latch exit induction users for some early exit loops (#128880)
This is the first of two PRs that attempts to improve the IR
generated in the exit blocks of vectorised loops with uncountable
early exits. In this PR I am improving the generated code for
users of induction variables in early exit loops that have a
unique exit block, when exiting via the latch.

I have moved some of the code for calculating the exit values in
latch exit blocks from `optimizeInductionExitUsers` into a new
function `optimizeLatchExitInductionUser`.

I intend to follow this up very soon with another patch to
optimise the code for induction users in the vector.early.exit
block.
2025-03-11 10:13:16 +00:00
Florian Hahn
8dd160f476
Revert "[VPlan] Fold NOT into predicate of wide compares." (#130347)
Reverts llvm/llvm-project#129430

this seems to have introduced a divergence between legacy and
VPlan-based cost model

https://lab.llvm.org/buildbot/#/builders/30/builds/17159
2025-03-07 21:18:49 +00:00
Florian Hahn
cb3ce30ca8
[VPlan] Fold NOT into predicate of wide compares. (#129430)
Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.

Alive2 Proofs for FPCMP test changes:  https://alive2.llvm.org/ce/z/WGDz9U

PR: https://github.com/llvm/llvm-project/pull/129430
2025-03-07 20:32:43 +00:00
Florian Hahn
b2d70e8796
[VPlan] Use Builder to create cast recipes in VPlanTransforms (NFC).
Use VPBuilder in a few more places. This avoids manual insertions and
will make changing the cast recipe easier in the future.
2025-03-04 13:39:12 +00:00