5505 Commits

Author SHA1 Message Date
vporpo
e094c0fa67
[SandboxVec][Legality] Don't vectorize when instructions repeat (#124479)
This patch adds a legality check that checks for repeated instrs in a
bundle and won't vectorize if such pattern is found.
2025-01-29 15:54:15 -08:00
Simon Pilgrim
5921295dca
Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962)
Reverts llvm/llvm-project#124129 as its currently causing a regression at #124499 - avoids the regression until a proper fix can be added to getSpillCost
2025-01-29 22:17:53 +00:00
Alexey Bataev
4a1a697427
[SLP][NFC]Unify ScalarToTreeEntries and MultiNodeScalars, NFC
Currently, SLP has 2 distinct storages to manage mapping between
vectorized instructions and their corresponding vectorized TreeEntry
nodes. It leads to inefficient lookup for the matching TreeEntries and
makes it harder to correctly track instructions, associated with
multiple nodes.
There is a plan to extend this support for instructions, that require
scheduling, to allow support for copyable elements. Merging
ScalarToTreeEntry and MultiNodeScalars will allow reduce maintenance of
the feature

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/124914
2025-01-29 09:05:54 -05:00
Florian Hahn
2b55ef187c
[VPlan] Add helper to run VPlan passes, verify after run (NFC). (#123640)
Add new runPass helpers to run a VPlan transformation. This makes it
easier to add additional checks/functionality for each transform run. In
this patch, an option is added to run the verifier after each VPlan
transform.

Follow-ups will use the same helper to also support printing VPlans
after each transform.

Note that the verifier at the moment requires there to be a canonical IV
and vector loop region, so the final lowering transforms aren't run via
runPass yet.

PR: https://github.com/llvm/llvm-project/pull/123640
2025-01-29 10:50:01 +00:00
vporpo
79cbad188a
[SandboxVec] Clear Context's state within runOnFunction() (#124842)
`sandboxir::Context` is defined at a pass-level scope with the
`SandboxVectorizerPass` class because the function pass manager `FPM`
object depends on it, and that is in pass-level scope to avoid
recreating the pass pipeline every single time `runOnFunction()` is
called.

This means that the Context's state lives on across function passes. The
problem is twofold:
(i) the LLVM IR to Sandbox IR map can grow very large including objects
from different functions, which is of no use to the vectorizer, as it's
a function-level pass.
(ii) this can result in stale data in the LLVM IR to Sandbox IR object
map, as other passes may delete LLVM IR objects.

To fix both issues this patch introduces a `Context::clear()` function
that clears the `LLVMValueToValueMap`.
2025-01-28 18:28:08 -08:00
Florian Hahn
6338bde568
[VPlan] Use cast<VPRecipeBase> in verifier (NFC).
All users of VPValue must be a VPRecipeBase, use cast.
2025-01-28 21:01:02 +00:00
Alexey Bataev
947d8ebbf3
[SLP]Unify getNumberOfParts use
Adds getNumberOfParts and uses it instead of similar code across code
base, fixes analysis of non-vectorizable types in
computeMinimumValueSizes.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/124774
2025-01-28 12:16:44 -05:00
Alexey Bataev
a1ab5b4c87 [SLP]Check the MainOp matches the requirements for the instructions
Need to include MainOp into the analysis of the instructions in
getSameOpcode to be sure that it is checked for the requirements to
prevent crashes during further analysis.
2025-01-28 06:00:52 -08:00
Alexey Bataev
1d5fbe83c3 [SLP]Adjust NumberOfParts value for adjusted number of buildvector scalars
Need to adjust NumParts value, when GatheredScalars scalars are adjusted
after extractelements analysis, to fix compiler crash
2025-01-28 05:45:13 -08:00
Jeremy Morse
304a99091c [NFC][DebugInfo] Use iterators for insertion at some final callsites
These are the callsites that have materialised in the last three weeks
since I last built with deprecation warnings.
2025-01-28 11:37:11 +00:00
Nicholas Guy
cdea38f91a
Reland "[LoopVectorizer] Add support for chaining partial reductions #120272" (#124282)
Change `getScaledReduction` to take an existing vector, rather than
creating and returning a new one each call.
Rename `getScaledReduction` to `getScaledReductions` to more accurately
reflect what it's now doing.

---------

Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>
2025-01-28 10:40:35 +00:00
David Sherwood
0f61558b97
[LoopVectorize][NFC] Remove unused variable in addUsersInExitBlocks (#124553)
We were allocating a VPTypeAnalysis object on the stack,
but never using it for anything.
2025-01-28 09:11:27 +00:00
vporpo
334a1cdbfa
[SandboxIR] createFunction() should always create a function (#124665)
This patch removes the assertion that checks for an existing function.
If one exists it will remove it and create a new one. This helps remove
a crash when a function declaration object already exists and we are
about to create a SandboxIR object for the definition.
2025-01-27 20:16:30 -08:00
Han-Kuan Chen
08d14e10ca
[SLP] Fix CommonMask will be transformed into an incorrect mask if createShuffle is called multiple times. (#124244)
We have two types of mask in SLP: a scalar mask and a vector mask.
When vectorizing four i32 additions into <4 x i32>, SLP creates a mask
of length 4.
When vectorizing four <2 x i32> additions into <8 x i32>, SLP also
creates a mask of length 4.
We refer to the first case as a scalar mask (because the mask element
represents a scalar, i32), and the second case as a vector mask (because
the mask element represents a vector, <4 x i32>).
At some point, we must convert the scalar mask into a vector mask
(otherwise, calling TTI cost functions or IRBuilderBase functions may
yield incorrect results).
Since both ShuffleCostEstimator and ShuffleInstructionBuilder can modify
the CommonMask, we have decided to perform the mask transformation only
within createShuffle. However, we do not store the transformed result,
as createShuffle may be called multiple times.
2025-01-28 12:02:37 +08:00
Florian Hahn
713482fccf [VPlan] Use State.get to extract lane mask for BranchOnMask.
Simplifies the code slightly and avoids redundant extracts/broadcasts if
the operand is live-in or already scalar.
2025-01-27 21:35:36 +00:00
Jeremy Morse
34b139594a
[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283)
To finalise the "RemoveDIs" work removing debug intrinsics, we're
updating call sites that insert instructions to use iterators instead.
This set of changes are those where it's not immediately obvious that
just calling getIterator to fetch an iterator is correct, and one or two
places where more than one line needs to change.

Overall the same rule holds though: iterators generated for the start of
a block such as getFirstNonPHIIt need to be passed into insert/move
methods without being unwrapped/rewrapped, everything else can use
getIterator.
2025-01-27 16:44:14 +00:00
Florian Hahn
09a29fcc8d
[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819)
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.

This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.

PR: https://github.com/llvm/llvm-project/pull/123819
2025-01-27 16:12:07 +00:00
Alexey Bataev
f1d5e70a00 [SLP][NFC]Do not check poison values for corresponding vectorized entries
No need to check poison values if they have been vectorized and/or mark
them as vectorized, it should work only for instructions.
2025-01-27 06:38:23 -08:00
Vasileios Porpodas
1c4341d176 [SandboxVec][DAG] Fix interval check without Node
This patch moves the check of whether a node exists before the check of
whether it is contained in the interval.
2025-01-26 11:54:09 -08:00
Florian Hahn
1395cd015f [VPlan] Support multi-exit loops in HCFG builder.
Update HCFG construction to support multi-exit loops. If there is no
unique exit block, map the middle block of the initial plan to the exit
block from the latch.

This further unifies HCFG construction and prepares for use to also
build an initial VPlan (VPlan0) for inner loops.

Effectively NFC as this isn't used on the default code path yet.
2025-01-25 21:55:15 +00:00
Vasileios Porpodas
b178c2d63e [SandboxVec][DAG] Fix trim schedule
Fix trimSchedule by skipping instructions without a DAG Node.
2025-01-25 09:42:14 -08:00
vporpo
5cb2db3b51
[SandboxVec][Scheduler] Forbid crossing BBs (#124369)
This patch updates the scheduler to forbid scheduling across BBs. It
should eventually be able to handle this, but we disable it for now.
2025-01-25 08:19:27 -08:00
Florian Hahn
6383a12e3b
[VPlan] Refactor HCFG builder to preserve original vector latch (NFC).
Update HCFG builder to preserve the original latch block of the initial
VPlan, ensuring there is always a latch.

It also skips creating the BranchOnCond for the latch of the top-level
loop, instead of removing it later. Exiting via the latch is controlled
by later recipes.

This further unifies HCFG construction and prepares for use to also
build an initial VPlan (VPlan0) for inner loops.
2025-01-25 13:32:01 +00:00
vporpo
6409799bdc
[SandboxVec][Legality] Pack from different BBs (#124363)
When the inputs of the pack come from different BBs we need to make sure
we emit the pack instructions at the correct place.
2025-01-24 15:39:37 -08:00
vporpo
ac75d32280
[SandboxVec][VecUtils] Filter out instructions not in BB in VecUtils:getLowest() (#124360)
This patch changes the functionality of `VecUtils::getLowest(Vals, BB)`
such that it filters out any instructions in `Vals` that are not in BB.
This is useful when Vals contains instructions from different BBs,
because in that case we are only interested in one BB.
2025-01-24 14:52:57 -08:00
vporpo
cff7ad56ba
[SandboxVec][Utils] Implement Utils::verifyFunction() (#124356)
This patch implements a wrapper function for the LLVM IR verifier for
functions, and calls it (flag-guarded) within the bottom-up-vectorizer
for finding IR bugs as soon as they happen.
2025-01-24 14:35:20 -08:00
vporpo
4b209c5d87
[SandboxIR][Region] Add cost modeling to the region (#124354)
This patch implements cost modeling for Region. All instructions that
are added or removed get their cost counted in the Scoreboard. This is
used for checking if the region before or after a transformation is more
profitable.
2025-01-24 14:28:55 -08:00
vporpo
b41987beae
[SandboxVec][DAG] Fix MemDGNode chain maintenance when move destination is non-mem (#124227)
This patch fixes a bug in the maintenance of the MemDGNode chain of the
DAG. Whenever we move a memory instruction, the DAG gets notified about
the move and maintains the chain of memory nodes. The bug was that if
the destination of the move was not a memory instruction, then the
memory node's next node would end up pointing to itself.
2025-01-24 13:59:32 -08:00
Simon Pilgrim
a12d7e4b61
[SLP] getVectorCallCosts - don't provide scalar argument data for vector IntrinsicCostAttributes (#124254)
getVectorCallCosts determines the cost of a vector intrinsic, based off
an existing scalar intrinsic call - but we were including the scalar
argument data to the IntrinsicCostAttributes, which meant that not only
was the cost calculation not type-only based, it was making incorrect
assumptions about constant values etc.

This also exposed an issue that x86 relied on fallback calculations for
funnel shift costs - this is great when we have the argument data as
that improves the accuracy of uniform shift amounts etc., but meant that
type-only costs would default to Cost=2 for all custom lowered funnel
shifts, which was far too cheap.

This is the reverse of #124129 where we weren't including argument data
when we could.

Fixes #63980
2025-01-24 15:13:13 +00:00
Jeremy Morse
6292a808b3
[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.

This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).

---------

Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2025-01-24 13:27:56 +00:00
Jeremy Morse
8e70273509
[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
2025-01-24 10:53:11 +00:00
Elvis Wang
aff1242b8e
[LV] Align debug location of the widen-phi to the original phi. (#120338)
This patch align the debug location of the widen-phi to the debug
location of original phi.

Split from: #120054
2025-01-24 17:49:54 +08:00
vporpo
d2234ca163
[SandboxVec][BottomUpVec] Fix packing when PHIs are present (#124206)
Before this patch we might have emitted pack instructions in between PHI
nodes. This patch fixes it by fixing the insert point of the new packs.
2025-01-23 16:29:01 -08:00
vporpo
c7053ac202
[SandboxVec][BottomUpVec] Disable crossing BBs (#124039)
Crossing BBs is not currently supported by the structures of the
vectorizer. This patch fixes instances where this was happening,
including:
- a walk of use-def operands that updates the UnscheduledSuccs counter,
- the dead instruction removal is now done per BB,
- the scheduler, which will reject bundles that cross BBs.
2025-01-23 15:08:13 -08:00
Vitaly Buka
0e213834df
Revert "[LoopVectorizer] Add support for chaining partial reductions (#120272)" (#124198)
Introduced stack buffer overflow, see #120272.

`getScaledReduction` can return empty vector, and there is not check for
that.

This reverts commit c9b7303b9b18129c4ee6b56aaa2a0a9f59be2d09.
This reverts commit caf0540b91b0fee31353dc7049ae836e0f814cff.
2025-01-23 14:00:33 -08:00
Florian Hahn
7a831eb924 [VPlan] Remove unused VPLane::getNumCachedLanes. (NFC)
The function isn't used, remove it.
2025-01-23 20:38:47 +00:00
Karlo Basioli
c9b7303b9b
Add [[maybe_unused]] to a variable used only in assert in VPlan.h (#124173) 2025-01-23 18:52:53 +00:00
Alexey Bataev
c7e6ca76cb [SLP][NFC]Add dump() method for ScheduleData struct type for better debugging 2025-01-23 09:49:37 -08:00
Nicholas Guy
caf0540b91
[LoopVectorizer] Add support for chaining partial reductions (#120272)
Chaining partial reductions, where multiple partial reductions share an
accumulator, allow for more values to be combined together as part of
the reduction without discarding the semantics of the partial reduction
itself.
2025-01-23 17:24:57 +00:00
Simon Pilgrim
d8cd8d56ea
[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis. (#124129)
We were only constructing the IntrinsicCostAttributes with the arg type info, and not the args themselves, preventing more detailed cost analysis (constant / uniform args etc.)

Just pass the whole IntrinsicInst to the constructor and let it resolve everything it can.

Noticed while having yet another attempt at #63980
2025-01-23 16:57:13 +00:00
Alexey Bataev
fa299294c0 [SLP][NFC]Modernize code base in several places 2025-01-23 08:43:07 -08:00
Nicholas Guy
26b61e143b
[LoopVectorizer] Propagate underlying instruction to the cloned instances of VPPartialReductionRecipes (#123638) 2025-01-23 14:57:31 +00:00
Florian Hahn
05fbc3830d [VPlan] Move VPBlockUtils to VPlanUtils.h (NFC)
Nothing in VPlan.h directly uses VPBlockUtils.h. Move it out to the more
appropriate VPlanUtils.h to reduce the size of the widely included VPlan.h.
2025-01-23 11:28:11 +00:00
Han-Kuan Chen
d3aea77f50
[SLP] Move transformMaskAfterShuffle into BaseShuffleAnalysis and use it as much as possible. (#123896) 2025-01-23 09:47:38 +08:00
vporpo
8110af75b1
[SandboxVec][BottomUpVec] Fix codegen when packing constants. (#124033)
Before this patch packing a bundle of constants would crash because
`getInsertPointAfterInstrs()` expected instructions. This patch fixes
this.
2025-01-22 16:38:10 -08:00
vporpo
2dc1c95595
[SandboxVec][VecUtils] Implement VecUtils::getLowest() (#124024)
VecUtils::getLowest(Valse) returns the lowest instruction in the BB among Vals.
If the instructions are not in the same BB, or if none of them is an
instruction it returns nullptr.
2025-01-22 16:08:15 -08:00
vporpo
fd087135ef
[SandboxVec][Legality] Diamond reuse multi input (#123426)
This patch implements the diamond pattern where we are vectorizing
toward the top of the diamond from both edges, but the second edge may
use elements from a different vector or just scalar values. This
requires some additional packing code (see lit test).
2025-01-22 15:23:47 -08:00
David Sherwood
4a2ebd6661
[LV][NFC] Refactor structures used to maintain uncountable exit info (#123219)
I've removed the HasUncountableEarlyExit variable, since we can
already determine whether or not a loop has an early exit by seeing
if we found an uncountable exit.

I have also deleted the old UncountableExitingBlocks and
UncountableExitBlocks lists and replaced them with a single
uncountable edge. This means we don't need to worry about keeping the
list entries in sync and makes it clear which exiting block
corresponds to which exit block.
2025-01-22 09:40:08 +00:00
Vasileios Porpodas
4089314907 [SandboxVec][DAG][NFC] Remove early return in notifyMoveInstr()
It used to early return when destination is same as origin. But it's redundant
because in that case the callback won't get called in the first place.
2025-01-21 18:38:39 -08:00
Florian Hahn
6c787ff6cf
Revert "[LV]: Teach LV to recursively (de)interleave. (#122989)"
This reverts commit 9491f75e1d912b277247450d1c7b6d56f7faf885.

This triggers an assert when building with SVE enabled.
https://lab.llvm.org/buildbot/#/builders/143/builds/4795
2025-01-21 21:36:16 +00:00