5570 Commits

Author SHA1 Message Date
vporpo
1c207f1b6e
[SandboxVec][DAG] Fix DAG when old interval is mem free (#126983)
This patch fixes a bug in `DependencyGraph::extend()` when the old
interval contains no memory instructions. When this is the case we
should do a full dependency scan of the new interval.
2025-02-12 15:06:30 -08:00
vporpo
31cb807537
[SanbdoxVec][BottomUpVec] Fix diamond shuffle with multiple vector inputs (#126965)
When the operand comes from multiple inputs then we need additional
packing code. When the operands are scalar then we can use a single
InsertElementInst. But when the operands are vectors then we need a
chain of ExtractElementInst and InsertElementInst instructions to insert
the vector value into the destination vector. This is what this patch
implements.
2025-02-12 14:33:05 -08:00
Vasileios Porpodas
e75e61728e [SandboxVec] Fix warnings introduced by 7a7f9190d03e 2025-02-12 12:43:24 -08:00
vporpo
7a7f9190d0
[SandboxVec][Legality] Fix mask on diamond reuse with shuffle (#126963)
This patch fixes a bug in the creation of shuffle masks when vectorizing
vectors in case of a diamond reuse with shuffle. The mask needs to
enumerate all elements of a vector, not treat the original vector value
as a single element. That is: if vectorizing two <2 x float> vectors
into a <4 x float> the mask needs to have 4 indices, not just 2.
2025-02-12 12:29:09 -08:00
vporpo
6d7a84d72b
[SandboxVec][Scheduler] Fix top of schedule (#126820)
This patch fixes the way the top-of-schedule variable gets set and
updated. Before this patch it used to get updated whenever we scheduled
a bundle, which is wrong, as the top-of-schedule needs to be maintained
across scheduling attempts.

It should get reset only when we clear the schedule or when we destroy
the current schedule and re-schedule.
2025-02-12 11:52:01 -08:00
Alexey Bataev
bb3d789dfe [SLP][NFC]Improve dump of the ScheduleData, NFC 2025-02-12 06:51:30 -08:00
Alexey Bataev
e1935a2b15 Revert "[SLP][NFC]Improve dump of the ScheduleData, NFC"
This reverts commit 108e6bca693e5f44d2d17da5a6e06203a0290de7 to fix
error revealed by buildbots https://lab.llvm.org/buildbot/#/builders/159/builds/15888.
2025-02-12 06:34:27 -08:00
Alexey Bataev
108e6bca69 [SLP][NFC]Improve dump of the ScheduleData, NFC 2025-02-12 06:25:04 -08:00
David Sherwood
3e62321ed9
[LoopVectorize] Make collectInLoopReductions more efficient (#126769)
We call collectInLoopReductions in multiple places asking
the same question with exactly the same answer. For
example, this was being called from a loop in
calculateRegisterUsage and this patch hoists the call out
to above the loop. In addition I've changed
collectInLoopReductions so that it bails out if we've
already built up a list.
2025-02-12 14:05:34 +00:00
Alexey Bataev
10844fb9b0 [SLP]Fix attempt to build the reorder mask for non-adjusted reuse mask
When building the reorder for non-single use reuse mask, need to check
if the size of the mask is multiple of the number of unique scalars.
  Otherwise, the compiler may crash when trying to reorder nodes.

Fixes #126304
2025-02-11 13:41:25 -08:00
Kazu Hirata
042e860a8a
[Vectorize] Avoid repeated hash lookups (NFC) (#126681) 2025-02-11 09:09:43 -08:00
Florian Hahn
e258bca950
[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235)
Update getOrCreateVPValueForSCEVExpr to only skip expansion of
SCEVUnknown if the underlying value isn't an instruction. Instructions
may be defined in a loop and using them without expansion may break
LCSSA form. SCEVExpander will take care of preserving LCSSA if needed.

We could also try to pass LoopInfo, but there are some users of the
function where it won't be available and main benefit from skipping
expansion is slightly more concise VPlans.

Note that SCEVExpander is now used to expand SCEVUnknown with floats.
Adjust the check in expandCodeFor to only check the types and casts if
the type of the value is different to the requested type. Otherwise we
crash when trying to expand a float and requesting a float type.

Fixes https://github.com/llvm/llvm-project/issues/121518.

PR: https://github.com/llvm/llvm-project/pull/125235
2025-02-11 13:03:12 +01:00
Florian Hahn
3706dfef66
[LV] Forget LCSSA phi with new pred before other SCEV invalidation. (#119897)
`forgetLcssaPhiWithNewPredecessor` performs additional invalidation if
there is an existing SCEV for the phi, but earlier
`forgetBlockAndLoopDispositions` or `forgetLoop` may already invalidate
the SCEV for the phi.

Change the order to first call `forgetLcssaPhiWithNewPredecessor` to
ensure it runs before its SCEV gets invalidated too eagerly.

Fixes https://github.com/llvm/llvm-project/issues/119665.

PR: https://github.com/llvm/llvm-project/pull/119897
2025-02-10 16:29:42 +00:00
Ricardo Jesus
5f84b6edd9
[AArch64] Add MATCH loops to LoopIdiomVectorizePass (#101976)
This patch adds a new loop to LoopIdiomVectorizePass, enabling it to
recognise and vectorise loops such as:
```cpp
template<class InputIt, class ForwardIt>
InputIt find_first_of(InputIt first, InputIt last,
                      ForwardIt s_first, ForwardIt s_last)
{
  for (; first != last; ++first)
    for (ForwardIt it = s_first; it != s_last; ++it)
      if (*first == *it)
        return first;
  return last;
}
```

These loops match the C++ standard library function `std::find_first_of`.
2025-02-10 08:23:34 +00:00
Elvis Wang
2e3729bf40
[LV] Prevent query the computeCost() when VF=1 in emitInvalidCostRemarks(). (#117288)
We should only query the computeCost() when the VF is vector.
2025-02-10 08:40:28 +08:00
Hassnaa Hamdi
e9a20f77ee
Reland "[LV]: Teach LV to recursively (de)interleave." (#125094)
This patch relands the changes from "[LV]: Teach LV to recursively
(de)interleave.#122989"
    Reason for revert:
- The patch exposed an assert in the vectorizer related to VF difference
between
legacy cost model and VPlan-based cost model because of uncalculated
cost for
VPInstruction which is created by VPlanTransforms as a replacement to
'or disjoint'
       instruction.
VPlanTransforms do that instructions change when there are memory
interleaving and
predicated blocks, but that change didn't cause problems because at most
cases the cost
      difference between legacy/new models is not noticeable.
    - Issue is fixed by #125434

Original patch: https://github.com/llvm/llvm-project/pull/89018
Reviewed-by: paulwalker-arm, Mel-Chen
2025-02-09 19:21:54 +00:00
Florian Hahn
32c4493d5f
[VPlan] Add incoming values for all predecessor to ResumePHI (NFCI).
Follow-up as discussed when using VPInstruction::ResumePhi for all resume
values (#112147). This patch explicitly adds incoming values for each
predecessor in VPlan. This simplifies codegen and allows transformations
adjusting the predecessors of blocks with

NFC modulo incoming block order in phis.
2025-02-09 11:20:20 +00:00
vporpo
69b8cf4f06
[SandboxVec][BottomUpVec] Add cost estimation and tr-accept-or-revert pass (#126325)
The TransactionAcceptOrRevert pass is the final pass in the Sandbox
Vectorizer's default pass pipeline. It's job is to check the cost
before/after vectorization and accept or revert the IR to its original
state.

Since we are now starting the transaction in BottomUpVec, tests that run
a custom pipeline need to accept the transaction. This is done with the
help of the TransactionAlwaysAccept pass (tr-accept).
2025-02-08 08:34:18 -08:00
Florian Hahn
6ff8a06de9
[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926)
Run recipe simplification and dead recipe removal after VPlan-based
unrolling and optimizeForVFAndUF, to clean up any redundant or dead
recipes introduced by them. Currently this is NFC, as it removes the
corresponding removeDeadRecipes run in optimizeForVFAndUF and no
additional simplifications kick in after unrolling yet. That is changing
with https://github.com/llvm/llvm-project/pull/123655.

Note that with this change, pattern-matching is now applied after
EVL-based recipes have been introduced.

Trying to match VPWidenEVLRecipe when not explicitly requested might
apply a pattern with 2 operands to one with 3 due to the extra EVL
operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe.

To prevent this, update Recipe_match::match to only match
VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy).

PR: https://github.com/llvm/llvm-project/pull/125926
2025-02-08 13:33:46 +00:00
Florian Hahn
ee806646ad
[VPlan] Consistently use hasScalarVFOnly (NFC).
Consistently use hasScalarVFOnly instead of using
hasVF(ElementCount::getFixed(1)). Also add an assert to ensure all cases
are covered by hasScalarVFOnly.
2025-02-08 12:19:25 +00:00
Florian Hahn
16df836a52
[VPlan] Mark hasVF & hasScalableVF as const (NFC). 2025-02-08 11:32:23 +00:00
Kazu Hirata
5901bda5a0
[Vectorize] Avoid repeated hash lookups (NFC) (#126345) 2025-02-08 00:48:51 -08:00
Florian Hahn
1611059f5d
[VPlan] Compute cost for binary op VPInstruction with underlying values. (#125434)
As exposed by https://github.com/llvm/llvm-project/pull/125094, we are
missing cost computation for some binary VPInstructions we created based
on original IR instructions. Their cost should be considered.

PR: https://github.com/llvm/llvm-project/pull/125434
2025-02-07 15:27:31 +00:00
David Sherwood
3872e55758
[LoopVectorize] Fix build error (#126218)
Fixes issue caused by 1930524bbde3cd26ff527bbdb5e1f937f484edd6

Unused variable UsesMask in LoopVectorize.cpp
2025-02-07 10:16:32 +00:00
David Sherwood
1930524bbd
[LoopVectorize] Fix cost model assert when vectorising calls (#125716)
The legacy and vplan cost models did not agree because
VPWidenCallRecipe::computeCost only calculates the cost of the
call instruction, whereas
LoopVectorizationCostModel::setVectorizedCallDecision in some
cases adds on the cost of a synthesised mask argument. However,
this mask is always 'splat(i1 true)' which should be hoisted out
of the loop during codegen. In order to synchronise the two cost
models I have two options:

1) Also add the cost of the splat to the vplan model, or
2) Remove the cost of the splat from the legacy model.

I chose 2) because I feel this more closely represents what the
final code will look like. There is an argument that we should
take account of such broadcast costs in the preheader when
deciding if it's profitable to vectorise a loop, however there
isn't currently a mechanism to do this. We currently only take
account of the runtime checks when assessing profitability and
what the minimum trip count should be. However, I don't believe
this work needs doing as part of this PR.
2025-02-07 09:36:52 +00:00
James Chesterman
ac158aa13b
[LoopVectorizer] Allow partial reductions to be made in predicated loops (#124268)
Does a select on the input rather than the output. This way the mask has
the same number of lanes as the other operand in the select instruction.
2025-02-07 09:09:10 +00:00
Mel Chen
4d3148d926
[LV][EVL] Fix the check for legality of folding with EVL. (#125678)
The current legality check for folding with EVL has incomplete
verification for VF.
This patch fixes the VF check, ensuring that tail folding with EVL is
enabled only when a scalable VF is available. This allows loops that
prefer tail folding with EVL but cannot use scalable VF vectorization to
still be vectorized using a fixed VF, rather than abandoning
vectorization entirely.
2025-02-07 12:53:10 +08:00
Luke Lau
d0f122b9c5
[LV] Update incoming blocks in VPWidenPHIRecipe in reassociateBlocks (#125481)
This is extracted from #118638

After c7ebe4f we will crash in fixNonInductionPHIs if we use a
VPWidenPHIRecipe with the vector preheader as an incoming block, because
the phi will reference the old non-IRBB vector preheader.

This fixes this by updating VPBlockUtils::reassociateBlocks to update
any VPWidenPHIRecipes's incoming blocks.

This assumes that if the VPWidenPHIRecipe is in a VPRegionBlock, it's in
the entry block, and that we are replacing a VPBasicBlock with another
VPBasicBlock.
2025-02-07 08:50:35 +08:00
vporpo
a0d86b23c0
[SandboxVec][Scheduler] Notify scheduler about instruction creation (#126141)
This patch implements the vectorizer's callback for getting notified
about new instructions being created. This updates the scheduler state,
which may involve removing dependent instructions from the ready list
and update the "scheduled" flag.
Since we need to remove elements from the ready list, this patch also
implements the `remove()` operation.
2025-02-06 15:45:44 -08:00
vporpo
166b2e8837
[SandboxVec][DAG] Update DAG when a new instruction is created (#126124)
The DAG will now receive a callback whenever a new instruction is
created and will update itself accordingly.
2025-02-06 14:12:03 -08:00
Florian Hahn
049aa179dc
[VPlan] Simplify operand tuple matching in VPlanPatternMatch (NFC).
Remove some indirection when matching recipe and matcher operands by
directly using fold over parameter pack.
2025-02-06 21:00:44 +00:00
vporpo
788c88e2f6
[SandboxVec][DependencyGraph] Fix dependency node iterators (#125616)
This patch fixes a bug in the dependency node iterators that would
incorrectly not skip nodes that are not in the current DAG. This
resulted in iterators returning nullptr when dereferenced.

The fix is to update the existing "skip" function to not only skip
non-instruction values but also to skip instructions not in the DAG.
2025-02-06 12:30:49 -08:00
Simon Pilgrim
eb2b453eb7 [VectorCombine] foldInsExtVectorToShuffle - ensure we call getShuffleCost with the input operand type, not the result
Typo in #121216

Fixes #126085
2025-02-06 17:41:24 +00:00
hanbeom
8c1dbac304
[VectorCombine] Allow shuffling between vectors the same type but different element sizes (#121216)
`foldInsExtVectorToShuffle` function combines the extract/insert of a vector into a vector through a shuffle. However, we only supported coupling between vectors of the same size.

This commit allows combining extract/insert for vectors of the same type but with different sizes by converting the length of the vectors.

Proof: https://alive2.llvm.org/ce/z/ELNLr7
Fixed https://github.com/llvm/llvm-project/issues/120772
2025-02-06 10:38:50 +00:00
Florian Hahn
585b75ec9a
[VPlan] Simplify matching recipe ty and opcode in pattern match (NFC).
Use parameter pack fold to simplify matching of recipe types and opcodes
for RecipeTys parameter pack.
2025-02-05 20:03:42 +00:00
David Sherwood
f07cd36a5d
[LoopVectorize] Add the cost of VPInstruction::AnyOf to vplan (#125058)
This patch adds an initial implementation of
VPInstruction::computeCost with support for only one
instruction so far - VPInstruction::AnyOf. This is only
used when vectorising loops with uncountable early exits.
2025-02-05 16:31:14 +00:00
Mel Chen
8d037b9256
[LV][EVL] Skip tryAddExplicitVectorLength for plans with scalar VF. (#125497)
The plans with scalar VF should not be transformed the plans folded by
EVL.

TODO: Move the scalar VF checking into `LoopVectorizationCostModel
::foldTailWithEVL()`.
2025-02-05 15:02:33 +08:00
Alexey Bataev
7dca2c628c
[SLP]Gather scalarized calls
If the calls won't be vectorized, but will be scalarized after
vectorization, they should be build as buildvector nodes, not vector
nodes. Vectorization of such calls leads to incorrect cost estimation,
does not allow to calculate correctly spills costs.

Reviewers: lukel97, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/125070
2025-02-04 19:09:57 -05:00
Alexey Bataev
88e7b8b81c
[SLP]Use TTI::getScalarizationOverhead where possible
Better to use TTI::getScalarizationOverhead instead of
TTI::getVectorInstrCost to correctly calculate the costs of
buildvectors/extracts.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/125725
2025-02-04 18:49:43 -05:00
Florian Hahn
7043895911
[VPlan] Remove dead VPBB argument from tryTo[Create]Widen[Recipe] (NFC)
The functions now use VPBuilder to insert recipes and the VPBB argument
is unused. Clean it up.
2025-02-04 21:40:07 +00:00
Alexey Bataev
fe7e280820 [SLP][NFC]Move functions definitions, NFC
Move functions to use them later in the following patches
2025-02-04 07:19:18 -08:00
Min-Yih Hsu
635ab515d5
[VectorCombine] Fold vector.interleave2 with two constant splats (#125144)
If we're interleaving 2 constant splats, for instance `<vscale x 8 x
i32> <splat of 666>` and `<vscale x 8 x i32> <splat of 777>`, we can
create a larger splat `<vscale x 8 x i64> <splat of ((777 << 32) |
666)>` first before casting it back into `<vscale x 16 x i32>`.
2025-02-03 19:05:49 -08:00
vporpo
c5f99e1bd4
[SandboxVec][Legality] Fix legality of SelectInst (#125005)
SelectInsts need special treatment because they are not always
straightforward to vectorize. This patch disables vectorization unless
they are trivially vectorizable.
2025-02-03 17:33:09 -08:00
Florian Hahn
f8fa93193b [LV] Add VPBuilder::insert, use to insert created vector pointer (NFC).
Split off from https://github.com/llvm/llvm-project/pull/124432 as
suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.
2025-02-03 22:20:40 +00:00
Florian Hahn
30f3752e54
[VPlan] Only use SCEV for live-ins in tryToWiden. (#125436)
Replacing a recipe with a live-in may not be correct in all cases,
e.g. when replacing recipes involving header-phi recipes, like
reductions.

For now, only use SCEV to simplify live-ins.

More powerful input simplification can be built in top of
https://github.com/llvm/llvm-project/pull/124432 in the future.


Fixes https://github.com/llvm/llvm-project/issues/119173.
Fixes https://github.com/llvm/llvm-project/issues/125374.

PR: https://github.com/llvm/llvm-project/pull/125436
2025-02-03 17:01:02 +00:00
Alexey Bataev
0c70a26f46 [SLP]Clear root node reordering only if the root node is not re-used in graph
The reordering of the root node can be safely cleared only if the root
node is not reused, otherwise the graph might be broken

Fixes #125357
2025-02-03 06:05:19 -08:00
Simon Pilgrim
e3fbf19eb4 [SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis. (#124129) (REAPPLIED)
We were only constructing the IntrinsicCostAttributes with the arg type info, and not the args themselves, preventing more detailed cost analysis (constant / uniform args etc.)

Just pass the whole IntrinsicInst to the constructor and let it resolve everything it can.

Noticed while having yet another attempt at #63980

Reapplied cleanup now that #125223 and #124984 have landed.
2025-02-03 09:55:41 +00:00
David Sherwood
50d5d06d38
[LoopVectorize][NFC] Cache the result of getVScaleForTuning (#124732)
We currently call getVScaleForTuning in many places, doing a
lot of work asking the same question with the same answer.
I've refactored the code to cache the value if the max
scalable VF != 0 and pull out the cached value from
LoopVectorizationCostModel.
2025-02-03 09:49:26 +00:00
Martin Storsjö
d00579be39 Revert "[SLP]Reduce number of alternate instruction, where possible"
This reverts commit d5a7a483a65f830a0c7a931781bc90046dc67ff4.

That commit triggers failed asserts, see
https://github.com/llvm/llvm-project/pull/123360 for details.
2025-02-02 15:56:08 +02:00
Florian Hahn
5008277322
[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104)
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(https://github.com/llvm/llvm-project/pull/117138).

PR: https://github.com/llvm/llvm-project/pull/124104
2025-02-02 13:44:07 +00:00