5473 Commits

Author SHA1 Message Date
vporpo
d2234ca163
[SandboxVec][BottomUpVec] Fix packing when PHIs are present (#124206)
Before this patch we might have emitted pack instructions in between PHI
nodes. This patch fixes it by fixing the insert point of the new packs.
2025-01-23 16:29:01 -08:00
vporpo
c7053ac202
[SandboxVec][BottomUpVec] Disable crossing BBs (#124039)
Crossing BBs is not currently supported by the structures of the
vectorizer. This patch fixes instances where this was happening,
including:
- a walk of use-def operands that updates the UnscheduledSuccs counter,
- the dead instruction removal is now done per BB,
- the scheduler, which will reject bundles that cross BBs.
2025-01-23 15:08:13 -08:00
Vitaly Buka
0e213834df
Revert "[LoopVectorizer] Add support for chaining partial reductions (#120272)" (#124198)
Introduced stack buffer overflow, see #120272.

`getScaledReduction` can return empty vector, and there is not check for
that.

This reverts commit c9b7303b9b18129c4ee6b56aaa2a0a9f59be2d09.
This reverts commit caf0540b91b0fee31353dc7049ae836e0f814cff.
2025-01-23 14:00:33 -08:00
Florian Hahn
7a831eb924 [VPlan] Remove unused VPLane::getNumCachedLanes. (NFC)
The function isn't used, remove it.
2025-01-23 20:38:47 +00:00
Karlo Basioli
c9b7303b9b
Add [[maybe_unused]] to a variable used only in assert in VPlan.h (#124173) 2025-01-23 18:52:53 +00:00
Alexey Bataev
c7e6ca76cb [SLP][NFC]Add dump() method for ScheduleData struct type for better debugging 2025-01-23 09:49:37 -08:00
Nicholas Guy
caf0540b91
[LoopVectorizer] Add support for chaining partial reductions (#120272)
Chaining partial reductions, where multiple partial reductions share an
accumulator, allow for more values to be combined together as part of
the reduction without discarding the semantics of the partial reduction
itself.
2025-01-23 17:24:57 +00:00
Simon Pilgrim
d8cd8d56ea
[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis. (#124129)
We were only constructing the IntrinsicCostAttributes with the arg type info, and not the args themselves, preventing more detailed cost analysis (constant / uniform args etc.)

Just pass the whole IntrinsicInst to the constructor and let it resolve everything it can.

Noticed while having yet another attempt at #63980
2025-01-23 16:57:13 +00:00
Alexey Bataev
fa299294c0 [SLP][NFC]Modernize code base in several places 2025-01-23 08:43:07 -08:00
Nicholas Guy
26b61e143b
[LoopVectorizer] Propagate underlying instruction to the cloned instances of VPPartialReductionRecipes (#123638) 2025-01-23 14:57:31 +00:00
Florian Hahn
05fbc3830d [VPlan] Move VPBlockUtils to VPlanUtils.h (NFC)
Nothing in VPlan.h directly uses VPBlockUtils.h. Move it out to the more
appropriate VPlanUtils.h to reduce the size of the widely included VPlan.h.
2025-01-23 11:28:11 +00:00
Han-Kuan Chen
d3aea77f50
[SLP] Move transformMaskAfterShuffle into BaseShuffleAnalysis and use it as much as possible. (#123896) 2025-01-23 09:47:38 +08:00
vporpo
8110af75b1
[SandboxVec][BottomUpVec] Fix codegen when packing constants. (#124033)
Before this patch packing a bundle of constants would crash because
`getInsertPointAfterInstrs()` expected instructions. This patch fixes
this.
2025-01-22 16:38:10 -08:00
vporpo
2dc1c95595
[SandboxVec][VecUtils] Implement VecUtils::getLowest() (#124024)
VecUtils::getLowest(Valse) returns the lowest instruction in the BB among Vals.
If the instructions are not in the same BB, or if none of them is an
instruction it returns nullptr.
2025-01-22 16:08:15 -08:00
vporpo
fd087135ef
[SandboxVec][Legality] Diamond reuse multi input (#123426)
This patch implements the diamond pattern where we are vectorizing
toward the top of the diamond from both edges, but the second edge may
use elements from a different vector or just scalar values. This
requires some additional packing code (see lit test).
2025-01-22 15:23:47 -08:00
David Sherwood
4a2ebd6661
[LV][NFC] Refactor structures used to maintain uncountable exit info (#123219)
I've removed the HasUncountableEarlyExit variable, since we can
already determine whether or not a loop has an early exit by seeing
if we found an uncountable exit.

I have also deleted the old UncountableExitingBlocks and
UncountableExitBlocks lists and replaced them with a single
uncountable edge. This means we don't need to worry about keeping the
list entries in sync and makes it clear which exiting block
corresponds to which exit block.
2025-01-22 09:40:08 +00:00
Vasileios Porpodas
4089314907 [SandboxVec][DAG][NFC] Remove early return in notifyMoveInstr()
It used to early return when destination is same as origin. But it's redundant
because in that case the callback won't get called in the first place.
2025-01-21 18:38:39 -08:00
Florian Hahn
6c787ff6cf
Revert "[LV]: Teach LV to recursively (de)interleave. (#122989)"
This reverts commit 9491f75e1d912b277247450d1c7b6d56f7faf885.

This triggers an assert when building with SVE enabled.
https://lab.llvm.org/buildbot/#/builders/143/builds/4795
2025-01-21 21:36:16 +00:00
Florian Hahn
8770126382
[VPlan] Remove stale comment for collectUsersInExitBlocks.
Remove stale section about wide inductions. Since 2c87133c6212d4bd0
all live-outs are modeled in VPlan.
2025-01-21 19:27:45 +00:00
Alexey Bataev
5deb4ef9ab
[SLP]Initial non-power-of-2 (but still whole register) for remaining nodes
Added non-power-of-2 (but still whole registers) vectorization support
for nodes other than stores and reductions.

Reviewers: preames, RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/113356
2025-01-21 10:33:03 -05:00
Alexey Bataev
7d01a8f2b9 [SLP]Fix vector factor for repeated node for bv
When adding a node vector, when it is used already in the shuffle for
buildvector, need to calculate vector factor from all vector, not only
this single vector, to avoid incorrect result. Also, need to increase
stability of the reused entries detection to avoid mismatch in cost
estimation/codegen.

Fixes #123639
2025-01-20 14:22:20 -08:00
Florian Hahn
bd5e12e6a0
[VPlan] Don't retrieve Def unnecessarily in isUniformAfterVector (NFC).
dyn_cast for recipes take VPValues, avoid calling getDefiningRecipe
unnecessarily.
2025-01-20 22:07:58 +00:00
David Sherwood
fcec8756e2
[LoopVectorize][NFC] Simplify ScaledReductionExitInstrs map (#123368)
For the following variable

DenseMap<const Instruction *, std::pair<PartialReductionChain,
unsigned>>
  ScaledReductionExitInstrs;

we never actually need the PartialReductionChain when using the map.
I've cleaned this up so that this now becomes

  DenseMap<const Instruction *, unsigned> ScaledReductionMap;
2025-01-20 14:41:27 +00:00
Mel Chen
84c89d0aa4
[LV][EVL] Address post-commit comments for 9720be9. (NFC) (#123311) 2025-01-20 14:20:40 +08:00
Florian Hahn
2c87133c62
Reapply "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.

The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.

Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-19 19:32:03 +00:00
Florian Hahn
0d39fe6f5b
[VPlan] Handle VPDerivedIV and more VPInsts in isUniformAfterVector.
In preparation for re-landing
https://github.com/llvm/llvm-project/pull/112147, also consider
VPDerivedIVRecipe and VPInstructions with binary opcodes and PtrAdd with
all uniform operands as uniform themselves.

Effectively NFC, but will be exercised once #112147 re-lands.
2025-01-19 14:42:52 +00:00
Alexey Bataev
2b1e037adb [SLP]Fix createInsertVector mask emission 2025-01-18 11:48:53 -08:00
Florian Hahn
58326f1d5b
Revert "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts commit c2d15ac4d4432788557e77c15ce572ac655a8fec.

Causes build failures on PPC stage2 & fuchsia bots
    https://lab.llvm.org/buildbot/#/builders/168/builds/7650
    https://lab.llvm.org/buildbot/#/builders/11/builds/11248
2025-01-18 13:40:33 +00:00
Florian Hahn
c2d15ac4d4
[VPlan] Update final IV exit value via VPlan. (#112147)
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform 
(optimizeInductionExitUsers) replaces extracts of inductions with 
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them 
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.


Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-18 13:22:34 +00:00
Han-Kuan Chen
07d496538f
[SLP] Replace MainOp and AltOp in TreeEntry with InstructionsState. (#122443)
Add TreeEntry::hasState.
Add assert for getTreeEntry.
Remove the OpValue parameter from the canReuseExtract function.
Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.
2025-01-18 10:23:20 +08:00
vporpo
87e4b68195
[SandboxVec][Legality] Implement ShuffleMask (#123404)
This patch implements a helper ShuffleMask data structure that helps
describe shuffles of elements across lanes.
2025-01-17 15:48:24 -08:00
Florian Hahn
65cd9e4c2f
[VPlan] Make VPValue constructors protected. (NFC)
Tighten access to constructors similar to ef1260acc0. VPValues should
either be constructed by constructors of recipes defining them or should
be live-ins created by VPlan (via getOrAddLiveIn).
2025-01-17 22:17:12 +00:00
vporpo
d6315afff0
[SandboxVec][InstrMaps] EraseInstr callback (#123256)
This patch hooks up InstrMaps to the Sandbox IR callbacks such that it
gets updated when instructions get erased.
2025-01-17 13:36:42 -08:00
George Chaltas
b1bf95c081
ReduxWidth check for 0 (#123257)
Added assert to check for underflow of ReduxWidth

	modified:   llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


Source code analysis flagged the operation (ReduxWwidth - 1) as
potential underflow, since ReduxWidth is unsigned.
Realize that this should never happen if everything is working right,
but added an assert to check for it just in case.
2025-01-17 15:56:58 -05:00
Vasileios Porpodas
128e2e446e [SandboxVec][VecUtils][NFC] Move functions to VecUtils.cpp and add a VecUtils::dump() 2025-01-17 11:56:07 -08:00
Alexey Bataev
fec503d1a3 [SLP][NFC]Add safe createExtractVector and use instead Builder.CreateExtractVector 2025-01-17 11:46:46 -08:00
John Brawn
edf3a55bce
[LoopVectorize][NFC] Centralize the setting of CostKind (#121937)
In each class which calculates instruction costs (VPCostContext,
LoopVectorizationCostModel, GeneratedRTChecks) set the CostKind once in
the constructor instead of in each function that calculates a cost. This
is in preparation for potentially changing the CostKind when compiling
for optsize.
2025-01-17 15:06:18 +00:00
Hassnaa Hamdi
9491f75e1d
Reland: [LV]: Teach LV to recursively (de)interleave. (#122989)
This commit relands the changes from "[LV]: Teach LV to recursively
(de)interleave. #89018"

Reason for revert:
- The patch exposed a bug in the IA pass, the bug is now fixed and landed by commit: #122643
2025-01-17 10:34:57 +00:00
Mel Chen
9720be95d6
[LV][EVL] Disable fixed-order recurrence idiom with EVL tail folding. (#122458)
The currently llvm.splice may occurs unexpected behavior if the evl of
the second-to-last iteration is not VF*UF.

Issue #122461
2025-01-17 16:55:35 +08:00
vporpo
e902c6960c
[SandboxVec][BottomUpVec] Implement InstrMaps (#122848)
InstrMaps is a helper data structure that maps scalars to vectors and
the reverse. This is used by the vectorizer to figure out which vectors
it can extract scalar values from.
2025-01-16 15:26:35 -08:00
Luke Lau
5c15caa83f
[VPlan] Verify scalar types in VPlanVerifier. NFCI (#122679)
VTypeAnalysis contains some assertions which can be useful for reasoning
that the types of various operands match.

This patch teaches VPlanVerifier to invoke VTypeAnalysis to check them,
and catches some issues with VPInstruction types that are also fixed
here:

* Handles the missing cases for CalculateTripCountMinusVF,
CanonicalIVIncrementForPart and AnyOf
* Fixes ICmp and ActiveLaneMask to return i1 (to align with `icmp` and
`@llvm.get.active.lane.mask` in the LangRef)

The VPlanVerifier unit tests also need to be fleshed out a bit more to
satisfy the stricter assertions
2025-01-16 18:57:08 +08:00
Vasileios Porpodas
acf6072fae Reapply "[SandboxVec][Interval][NFC] Move a few definitions from header to .cpp"
This reverts commit 069fbeb82f56f0ce7c0382dfd5d4fa4dc1983a13.
2025-01-15 16:38:37 -08:00
Vasileios Porpodas
069fbeb82f Revert "[SandboxVec][Interval][NFC] Move a few definitions from header to .cpp"
This reverts commit 24c603505f91b2979d13e0b963fbd3c0174a005f.
2025-01-15 15:30:19 -08:00
Vasileios Porpodas
24c603505f [SandboxVec][Interval][NFC] Move a few definitions from header to .cpp 2025-01-15 15:23:28 -08:00
Florian Hahn
ef1260acc0
[VPlan] Make VPBlock constructors private (NFC).
16d19aaed moved to manage block creation via VPlan directly, with VPlan
owning the created blocks. Follow up to make the VPBlock constructors
private, to require creation via VPlan helpers and thus preventing
issues due to manually constructing blocks.
2025-01-15 21:34:24 +00:00
David Sherwood
edc02351dd
[NFC][LoopVectorize] Add more loop early exit asserts (#122732)
This patch is split off #120567, adding asserts in
addScalarResumePhis and addExitUsersForFirstOrderRecurrences
that the loop does not contain an uncountable early exit,
since the code cannot yet handle them correctly.
2025-01-15 14:29:06 +08:00
LiqinWeng
0294dab79e
[LV][VPlan] Add fast flags for selectRecipe (#121023)
Change the inheritance of class VPWidenSelectRecipe to class
VPRecipeWithIRFlags, which allows recipe of the select to pass the
fastmath flags.The patch of #119847 will add the fastmath flag to for
recipe
2025-01-15 10:10:11 +08:00
Florian Hahn
1de3dc7d23 [LV] Bail out early if BTC+1 wraps.
Currently we fail to detect the case where BTC + 1 wraps, i.e. the
vector trip count is 0, In those cases, the minimum iteration count
check will fail, and the vector code will never be executed.

Explicitly check for this condition in computeMaxVF and avoid trying to
vectorize alltogether.

Note that a number of tests needed to be updated, because the vector
loop would never be executed given the input IR.

Fixes https://github.com/llvm/llvm-project/issues/122558.
2025-01-14 22:07:38 +00:00
Simon Pilgrim
87750c9de4 [VectorCombine] foldPermuteOfBinops - match identity shuffles only if they match the destination type
Fixes regression identified after #122118
2025-01-14 16:09:50 +00:00
Luke Lau
f925e54554
[VPlan] Fix mutating whilst iterating over users in EVL transform (#122885)
This fixes a miscompilation extracted from 525.x264_r, where we were
failing to update the runtime VF of a VPReverseVectorPointerRecipe.

We were removing a use of VF whilst iterating over the users() iterator,
which messed up the iterator in-flight and caused us to miss some
recipes. This fixes it by copying the users into a SmallVector first.

Fixes #122681
Fixes #122682
2025-01-14 22:17:51 +08:00