5436 Commits

Author SHA1 Message Date
Hassnaa Hamdi
9491f75e1d
Reland: [LV]: Teach LV to recursively (de)interleave. (#122989)
This commit relands the changes from "[LV]: Teach LV to recursively
(de)interleave. #89018"

Reason for revert:
- The patch exposed a bug in the IA pass, the bug is now fixed and landed by commit: #122643
2025-01-17 10:34:57 +00:00
Mel Chen
9720be95d6
[LV][EVL] Disable fixed-order recurrence idiom with EVL tail folding. (#122458)
The currently llvm.splice may occurs unexpected behavior if the evl of
the second-to-last iteration is not VF*UF.

Issue #122461
2025-01-17 16:55:35 +08:00
vporpo
e902c6960c
[SandboxVec][BottomUpVec] Implement InstrMaps (#122848)
InstrMaps is a helper data structure that maps scalars to vectors and
the reverse. This is used by the vectorizer to figure out which vectors
it can extract scalar values from.
2025-01-16 15:26:35 -08:00
Luke Lau
5c15caa83f
[VPlan] Verify scalar types in VPlanVerifier. NFCI (#122679)
VTypeAnalysis contains some assertions which can be useful for reasoning
that the types of various operands match.

This patch teaches VPlanVerifier to invoke VTypeAnalysis to check them,
and catches some issues with VPInstruction types that are also fixed
here:

* Handles the missing cases for CalculateTripCountMinusVF,
CanonicalIVIncrementForPart and AnyOf
* Fixes ICmp and ActiveLaneMask to return i1 (to align with `icmp` and
`@llvm.get.active.lane.mask` in the LangRef)

The VPlanVerifier unit tests also need to be fleshed out a bit more to
satisfy the stricter assertions
2025-01-16 18:57:08 +08:00
Vasileios Porpodas
acf6072fae Reapply "[SandboxVec][Interval][NFC] Move a few definitions from header to .cpp"
This reverts commit 069fbeb82f56f0ce7c0382dfd5d4fa4dc1983a13.
2025-01-15 16:38:37 -08:00
Vasileios Porpodas
069fbeb82f Revert "[SandboxVec][Interval][NFC] Move a few definitions from header to .cpp"
This reverts commit 24c603505f91b2979d13e0b963fbd3c0174a005f.
2025-01-15 15:30:19 -08:00
Vasileios Porpodas
24c603505f [SandboxVec][Interval][NFC] Move a few definitions from header to .cpp 2025-01-15 15:23:28 -08:00
Florian Hahn
ef1260acc0
[VPlan] Make VPBlock constructors private (NFC).
16d19aaed moved to manage block creation via VPlan directly, with VPlan
owning the created blocks. Follow up to make the VPBlock constructors
private, to require creation via VPlan helpers and thus preventing
issues due to manually constructing blocks.
2025-01-15 21:34:24 +00:00
David Sherwood
edc02351dd
[NFC][LoopVectorize] Add more loop early exit asserts (#122732)
This patch is split off #120567, adding asserts in
addScalarResumePhis and addExitUsersForFirstOrderRecurrences
that the loop does not contain an uncountable early exit,
since the code cannot yet handle them correctly.
2025-01-15 14:29:06 +08:00
LiqinWeng
0294dab79e
[LV][VPlan] Add fast flags for selectRecipe (#121023)
Change the inheritance of class VPWidenSelectRecipe to class
VPRecipeWithIRFlags, which allows recipe of the select to pass the
fastmath flags.The patch of #119847 will add the fastmath flag to for
recipe
2025-01-15 10:10:11 +08:00
Florian Hahn
1de3dc7d23 [LV] Bail out early if BTC+1 wraps.
Currently we fail to detect the case where BTC + 1 wraps, i.e. the
vector trip count is 0, In those cases, the minimum iteration count
check will fail, and the vector code will never be executed.

Explicitly check for this condition in computeMaxVF and avoid trying to
vectorize alltogether.

Note that a number of tests needed to be updated, because the vector
loop would never be executed given the input IR.

Fixes https://github.com/llvm/llvm-project/issues/122558.
2025-01-14 22:07:38 +00:00
Simon Pilgrim
87750c9de4 [VectorCombine] foldPermuteOfBinops - match identity shuffles only if they match the destination type
Fixes regression identified after #122118
2025-01-14 16:09:50 +00:00
Luke Lau
f925e54554
[VPlan] Fix mutating whilst iterating over users in EVL transform (#122885)
This fixes a miscompilation extracted from 525.x264_r, where we were
failing to update the runtime VF of a VPReverseVectorPointerRecipe.

We were removing a use of VF whilst iterating over the users() iterator,
which messed up the iterator in-flight and caused us to miss some
recipes. This fixes it by copying the users into a SmallVector first.

Fixes #122681
Fixes #122682
2025-01-14 22:17:51 +08:00
Simon Pilgrim
6a9e9878a2 [VectorCombine] foldPermuteOfBinops - ensure potential identity mask isn't length changing. 2025-01-14 12:17:21 +00:00
Ramkumar Ramachandra
e409204a89
VectorCombine: teach foldExtractedCmps about samesign (#122883)
Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of one of the FIXMEs it introduced by replacing a predicate comparison
with CmpPredicate::getMatching.
2025-01-14 12:04:14 +00:00
Ramkumar Ramachandra
0fe8469e08
SLPVectorizer: strip bad FIXME (NFC) (#122888)
Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of the FIXME it introduced in SLPVectorizer: the FIXME is bad, and we'd
get no testable impact by using CmpPredicate::getMatching here.
2025-01-14 11:27:55 +00:00
Simon Pilgrim
0bf1591d01
[VectorCombine] foldPermuteOfBinops - fold "shuffle (binop (shuffle, other)), undef" --> "binop (shuffle), (shuffle)". (#122118)
foldPermuteOfBinops currently requires both binop operands to be oneuse shuffles to fold the shuffles across the binop, but there will be cases where its still profitable to fold across the binop with only one foldable shuffle.
2025-01-14 10:43:22 +00:00
Luke Lau
cb2560d33b
[VPlan] Verify plan before optimizations. NFC (#122678)
I've been exploring verifying the VPlan before and after the EVL
transformation steps, and noticed that the VPlan comes out in an invalid
state between construction and optimisation.

In adjustRecipesForReductions, we leave behind some dead recipes which
are invalid:

1) When we replace a link with a reduction recipe, the old link ends up
becoming a use-before-def:

    WIDEN ir<%l7> = add ir<%sum.02>, ir<%indvars.iv>.1
    WIDEN ir<%l8> = add ir<%l7>.1, ir<%l3>
    WIDEN ir<%l9> = add ir<%l8>.1, ir<%l5>
    ...
    REDUCE ir<%l7>.1 = ir<%sum.02> + reduce.add (ir<%indvars.iv>.1)
    REDUCE ir<%l8>.1 = ir<%l7>.1 + reduce.add (ir<%l3>)
    REDUCE ir<%l9>.1 = ir<%l8>.1 + reduce.add (ir<%l5>)

2) When transforming an AnyOf reduction phi to a boolean, we leave
behind a select with mismatching operand types, which will trigger the
assertions in VTypeAnalysis after #122679

This adds an extra verification step and deletes the dead recipes
eagerly to keep the plan valid.
2025-01-14 12:44:24 +08:00
vporpo
7c51c310ad
[SandboxVec][BottomUpVec] Clean up dead address instrs (#122536)
When we vectorize loads or stores we only keep the address of the first
lane. The rest may become dead. This patch adds the address operands of
vectorized loads or stores to the dead candidates set, such that they
get erased if dead.
2025-01-13 18:25:25 -08:00
offsake
83be69cf9a
[VPlan][Coverity] Fix coverity CID1579964. (#121805)
Fix for the Coverity hit with CID1579964 in VPlan.cpp.

Coverity message with some context follows.

[Cov] var_compare_op: Comparing TermBr to null implies that TermBr might
be null.
434    } else if (TermBr && !TermBr->isConditional()) {
435      TermBr->setSuccessor(0, NewBB);
436    } else {
437 // Set each forward successor here when it is created, excluding
438 // backedges. A backward successor is set when the branch is
created.
439      unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
     	
[Cov] CID 1579964: (#1 of 1): Dereference after null check
(FORWARD_NULL)
[Cov] var_deref_model: Passing null pointer TermBr to getSuccessor,
which dereferences it.
2025-01-13 20:29:51 +00:00
Alexey Bataev
066b88879a [SLP]Correctly set vector operand for extracts with poisons
When extracts are vectorized and it has some poison values instead of
instructions, need to correctly set the vectorized operand not as
poison, but as a main vector operand of the main extract instruction.

Fixes #122583
2025-01-13 10:57:07 -08:00
Alexey Bataev
092d628383 [SLP]Check for div/rem instructions before extending with poisons
Need to check if the instructions can be safely extended with poison
before actually doing this to avoid incorrect transformations.

Fixes #122691
2025-01-13 09:28:27 -08:00
Alexey Bataev
af524de1fa [SLP]Do not include subvectors for fully matched buildvectors
If the buildvector node fully matched another node, need to exclude
subvectors, when building final shuffle, just a shuffle of the original
node must be emitted.

Fixes #122584
2025-01-13 07:24:16 -08:00
Mel Chen
3397950f2d
[LV] Fix FindLastIV reduction for epilogue vectorization. (#120395)
Following 0e528ac404e13ed2d952a2d83aaf8383293c851e, this patch adjusts
the resume value of VPReductionPHIRecipe for FindLastIV reductions.
Replacing the resume value with:

  ResumeValue = ResumeValue == StartValue ? SentinelValue : ResumeValue;

This addressed the correctness issue when the start value might not be
less than the minimum value of a monotonically increasing induction
variable.

Thanks Florian Hahn for the help.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-01-13 20:58:38 +08:00
Sam Tebbs
795e35a653
Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744)
This relands the reverted #120721 with a fix for cases where neither
reduction operand are the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2025-01-13 11:20:35 +00:00
Mel Chen
56a37a3c76
[SLPVectorizer] Refactor HorizontalReduction::createOp (NFC) (#121549)
This patch simplifies select-based integer min/max reductions by
utilizing `llvm::getMinMaxReductionPredicate`, and generates
intrinsic-based min/max reductions by utilizing
`llvm::getMinMaxReductionIntrinsicOp`.
2025-01-13 16:11:31 +08:00
Florian Hahn
8df64ed777 [LV] Don't consider IV increments uniform if exit value is used outside.
In some cases, there might be a chain of uniform instructions producing
the exit value. To generate correct code in all cases, consider the IV
increment not uniform, if there are users outside the loop.

Instead, let VPlan narrow the IV, if possible using the logic from
3ff1d01985752.

Test case from #122602 verified with Alive2:
    https://alive2.llvm.org/ce/z/bA4EGj

Fixes https://github.com/llvm/llvm-project/issues/122496.
Fixes https://github.com/llvm/llvm-project/issues/122602.
2025-01-12 22:03:21 +00:00
Florian Hahn
3ff1d01985 Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67.

Re-applies commit with typos fixed.
2025-01-12 20:10:28 +00:00
Florian Hahn
0ebb3ac7c9 Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac.

Typo breaking the build
2025-01-12 19:37:45 +00:00
Florian Hahn
1afba19913 [VPlan] Try to narrow wide and replicating recipes to uniform recipes.
Use the existing VPlan-based analysis to identify recipes that only have
their first lane demanded and transform them to uniform recpliate
recipes. This simplifies the generated code in some places and prepares
for fixing https://github.com/llvm/llvm-project/issues/122496.
2025-01-12 19:32:01 +00:00
Florian Hahn
7f59b4e998
[VPlan] Skip non-induction phi recipes in legalizeAndOptimizeInductions.
The body of the loop only applies to wide induction recipes, skip any other
header phi recipes up-frond
2025-01-11 20:33:02 +00:00
Vasileios Porpodas
25b90c4ef6 [SandboxVec][SeedCollector][NFC] Remove redundant 'else' and move the assertion within the 'if' 2025-01-10 14:54:44 -08:00
vporpo
9248428db7
[SandboxVec][DAG][NFC] Refactor setNextNode() and setPrevNode() (#122363)
This patch updates DAG's `setNextNode()` and `setPrevNode()` to update
both nodes of the link.
2025-01-10 13:32:33 -08:00
Han-Kuan Chen
35e76b6a4f Revert "[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198)"
This reverts commit f3d6cdc5aebafac3961d4fccbd2ca0e302c6082c.
2025-01-10 10:09:54 -08:00
Alexey Bataev
681c83a2f9 [SLP]Fix mask generation after cost estimation
When estimating the cost of entries shuffles for buildvectors, need to
rebuild original mask, not a generated submask, used for subregisters
analysis.

Fixes #122430
2025-01-10 09:32:35 -08:00
Alex MacLean
986f2ac48f
[SLPVectorizer] minor tweaks around lambdas for compatibility with older compilers (#122348)
Older version of msvc do not have great lambda support and are not able
to handle uses of class data or lambdas with implicit return types in
some cases. These minor changes improve the sources compatibility with
older msvc and don't hurt readability either.
2025-01-10 09:18:28 -08:00
Alexey Bataev
3c9c94a24f Revert "[SLP]Fix mask generation after cost estimation"
This reverts commit 547ba9730bf05df3383150f730a689f2c8336206 to fix
buildbots reported in
https://lab.llvm.org/buildbot/#/builders/123/builds/11370, https://lab.llvm.org/buildbot/#/builders/133/builds/9492
2025-01-10 08:46:42 -08:00
Alexey Bataev
547ba9730b [SLP]Fix mask generation after cost estimation
When estimating the cost of entries shuffles for buildvectors, need to
rebuild original mask, not a generated submask, used for subregisters
analysis.

Fixes #122430
2025-01-10 08:17:56 -08:00
Mel Chen
e0f14e11c7
[SLPVectorizer] Refine the scope of RdxOpcode in HorizontalReduction::createOp (NFC) (#122239)
This patch is one part of unifying IAnyOf and FAnyOf reduction. #118393
The related patch is #118777.
2025-01-10 16:01:36 +08:00
Han-Kuan Chen
f3d6cdc5ae [SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198)
Add TreeEntry::hasState.
Add assert for getTreeEntry.
Remove the OpValue parameter from the canReuseExtract function.
Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.
2025-01-09 23:41:52 -08:00
Han-Kuan Chen
5454ac28b3 Revert "[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198)"
This reverts commit 760f550de25792db83cd39c88ef57ab6d80a41a0.
2025-01-09 18:41:47 -08:00
Han-Kuan Chen
36b423e0f8
[SLP] NFC. Refactor getSameOpcode and reduce for loop iterations. (#122241)
Replace Cnt and AltIndex with MainOp and AltOp.
Reduce the number of iterations in the for loop.
2025-01-10 09:06:07 +08:00
Han-Kuan Chen
760f550de2
[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198)
Add TreeEntry::hasState.
Add assert for getTreeEntry.
Remove the OpValue parameter from the canReuseExtract function.
Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.
2025-01-10 09:05:39 +08:00
Florian Hahn
7ffb691595
[VPlan] Remove dead ToRemove (NFC). 2025-01-09 22:02:32 +00:00
vporpo
6312beef78
[SandboxVec][BottomUpVec] Use SeedCollector and slice seeds (#120826)
With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
2025-01-09 11:53:48 -08:00
Alexey Bataev
5ff36748cf [SLP]Fix mask processing for reused gathered scalars
Need to sync the mask between cost and actual emission to avoid bugs in
mask calculation

Fixes #122324
2025-01-09 11:24:48 -08:00
Florian Hahn
b0697dc1de
[LV] Only check isVectorizableEarlyExitLoop with multiple exits. (#121994)
Currently we emit early-exit related debug messages/remarks even when
there is a single exit. Update to only check isVectorizableEarlyExitLoop
if there isn't a single exit block.

PR: https://github.com/llvm/llvm-project/pull/121994
2025-01-09 12:05:19 +00:00
Benjamin Maxwell
f88ef1bd1b
[LV] Teach LoopVectorizationLegality about struct vector calls (#119221)
This is a split-off from #109833 and only adds code relating to checking
if a struct-returning call can be vectorized.

This initial patch only allows the case where all users of the struct
return are `extractvalue` operations that can be widened.

```
%call = tail call { float, float } @foo(float %in_val)
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```

Note: The tests require the VFABI changes from #119000 to pass.
2025-01-09 09:27:29 +00:00
Alexey Bataev
5b76a2e51b [SLP]Correctly calculate mask for the inserted vector 2025-01-08 15:18:06 -08:00
Alexey Bataev
0d921f96d4 [SLP][NFC]Introduce and use createInsertVector helper function, NFC 2025-01-08 14:26:13 -08:00