2296 Commits

Author SHA1 Message Date
Alexey Bataev
0bcf45ea34
[SLP]Initial FMAD support (#149102)
Added initial check for potential fmad conversion in reductions and
operands vectorization.
2025-08-07 09:51:43 -04:00
Alexey Bataev
4784ce9ebc [SLP][NFC]Check an external user before trying to address it in debug dump, NFC 2025-08-06 08:58:16 -07:00
Alexey Bataev
e27831ff9b [SLP] Fix a check for main/alternate interchanged instruction
If the instruction is checked for matching the main instruction, need to
check if the opcode of the main instruction is compatible with the
operands of the instruction. If they are not, need to check the
alternate instruction and its operands for compatibility and return
alternate instruction as a match.

Fixes #151699

Fixed check for non-supported binary operations.
2025-08-04 11:20:54 -07:00
Michael Halkenhäuser
70af09e3a1
Revert "[SLP] Fix a check for main/alternate interchanged instruction" (#151997)
This reverts commit 3ee8d047109ea4bb479095f4b153c2120a8d726c.

Revert reason: FAILED build for openmp-offload-amdgpu-runtime-2 
https://lab.llvm.org/buildbot/#/builders/10/builds/10827
2025-08-04 12:57:20 -04:00
Alexey Bataev
3ee8d04710 [SLP] Fix a check for main/alternate interchanged instruction
If the instruction is checked for matching the main instruction, need to
check if the opcode of the main instruction is compatible with the
operands of the instruction. If they are not, need to check the
alternate instruction and its operands for compatibility and return
alternate instruction as a match.

Fixes #151699
2025-08-04 08:31:35 -07:00
Alexey Bataev
7cd1ce3aa0 [SLP]Check vector-like instruction for dominance in copyables
Need to check if the vector-like instruction is dominated by main
operation in the copyables to prevent broken def-use chain

Fixes #151456
2025-08-04 06:14:19 -07:00
Kazu Hirata
3549134836
[Vectorize] Remove an unnecessary cast (NFC) (#151850)
getNumElements() already returns unsigned.
2025-08-03 08:44:50 -07:00
Alexey Bataev
ef98e248c7 [SLP]Initial support for copyable elements (non-schedulable only)
Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Fixed compile time regressions, reported crashes, updated release notes

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/140279
2025-07-25 10:55:07 -07:00
Martin Storsjö
936ee35dcc Revert "[SLP]Initial support for copyable elements (non-schedulable only)"
This reverts commit 898bba311f180ed54de33dc09e7071c279a4942a.

This change caused hangs and crashes, see
https://github.com/llvm/llvm-project/pull/140279#issuecomment-3115051063.
2025-07-25 01:22:20 +03:00
Martin Storsjö
bd170b78bb Revert "[SLP] Check if the user node has state before trying getting main instruction/opcode"
This reverts commit c9cea24fe68e24750b2d479144f839e1c2ec9d2b.

This is being reverted as it is intermixed with another commit
(898bba311f180ed54de33dc09e7071c279a4942a) that needs to be reverted.
2025-07-25 01:22:19 +03:00
Alexey Bataev
c9cea24fe6 [SLP] Check if the user node has state before trying getting main instruction/opcode
Need to check if the parent node has state to prevent compiler crashes.
Fixes #150479
2025-07-24 12:00:43 -07:00
Alexey Bataev
898bba311f [SLP]Initial support for copyable elements (non-schedulable only)
Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Fixed compile time regressions, updated release notes

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/140279
2025-07-23 13:38:34 -07:00
Alexey Bataev
a415d68e48 Revert "[SLP]Initial support for copyable elements (non-schedulable only)"
This reverts commit e202dba288edd47f1b370cc43aa8cd36a924e7c1 to try to
resolve compile time issues, reported in https://llvm-compile-time-tracker.com/compare.php?from=36089e5d983fe9ae00f497c2d500f37227f82db1&to=e202dba288edd47f1b370cc43aa8cd36a924e7c1&stat=instructions%3Au&details=on
2025-07-22 07:39:32 -07:00
Alexey Bataev
e202dba288
[SLP]Initial support for copyable elements (non-schedulable only)
Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/140279
2025-07-21 14:07:28 -04:00
Florian Hahn
004c67ea25
[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239)
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop if any inputs to maxnum/minnum are
NaN, due to maxnum/minnum behavior w.r.t to signaling NaNs. Signed-zeros 
are already handled consistently by maxnum/minnum.

If any input is NaN,
 *exit the vector loop,
 *compute the reduction result up to the vector iteration that contained
   NaN inputs and
 * resume in the scalar loop


New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.

PR: https://github.com/llvm/llvm-project/pull/148239
2025-07-18 21:58:19 +01:00
Alexey Bataev
60ae9c9c63
[SLP]Do not consider non-profitable loads slices
If all slices are small and end up with strided or even vectorization
states, better to not consider these candidates for the vectorization
and try to vectorize the whole bunch as gathered loads.

Reviewers: hiraditya, RKSimon, HanKuanChen

Reviewed By: RKSimon, HanKuanChen

Pull Request: https://github.com/llvm/llvm-project/pull/149209
2025-07-17 08:00:02 -04:00
Piotr Fusik
ade2f1023d
[SLP][NFCI] Don't trim indexes, reuse a variable (#149074) 2025-07-16 14:09:27 +02:00
Piotr Fusik
7674566c96
[SLP][NFC] Simplify count_if to count (#149072) 2025-07-16 14:09:09 +02:00
Piotr Fusik
949103b45c
[SLP][NFC] Use range-based for in matchAssociativeReduction (#149029) 2025-07-16 14:08:41 +02:00
Jeremy Morse
57a5f9c47e
[DebugInfo][RemoveDIs] Suppress getNextNonDebugInfoInstruction (#144383)
There are no longer debug-info instructions, thus we don't need this
skipping. Horray!
2025-07-15 15:34:10 +01:00
Gaëtan Bossu
adb6efeac9
[SLP] Fix cost estimation of external uses with wrong VF (#148185)
It assumed that the VF remains constant throughout the tree. That's not
always true. This meant that we could query the extraction cost for a
lane that is out of bounds.

While experimenting with re-vectorisation for AArch64, we ran into this
issue. We cannot add a proper AArch64 test as more changes would need to
be brought in.

This commit is only fixing the computation of VF and adding an assert.
Some tests were failing after adding the assert:
 - foo() in llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll
- test() in
llvm/test/Transforms/SLPVectorizer/X86/reduction-with-removed-extracts.ll
- test_with_extract() in
llvm/test/Transforms/SLPVectorizer/RISCV/segmented-loads.ll
2025-07-15 11:39:09 +01:00
Alexey Bataev
a999a1b88c
[SLP]Remove emission of vector_insert/vector_extract intrinsics
Replaced by the regular shuffles.

Fixes #145512

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/148007
2025-07-11 15:26:45 -04:00
Gaëtan Bossu
d386b3b0b5
[SLP] Harmonise findLaneForValue() return type (NFC) (#148232)
The lane is computed as an unsigned, so let's return it as unsigned.
2025-07-11 14:05:22 +01:00
Alexey Bataev
dd60663b9b [SLP] Emit reduction instead of 2 extracts + scalar op, when vectorizing operands (#147583)
Added emission of the 2-element reduction instead of 2 extracts + scalar
op, when trying to vectorize operands of the instruction, if it is more
profitable.
2025-07-10 12:50:52 -07:00
Alex Bradbury
18627e995c Revert "[SLP] Emit reduction instead of 2 extracts + scalar op, when vectorizing operands (#147583)"
This reverts commit ac4a38e9bd573a173432b89cbef7cce7a48e7907.

This breaks the RVV builders
(MicroBenchmarks/ImageProcessing/Blur/blur.test and
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test from llvm-test-suite)
and reportedly SPEC Accel2023
<https://github.com/llvm/llvm-project/pull/147583#issuecomment-3057183138>.
2025-07-10 14:55:22 +01:00
Alexey Bataev
ac4a38e9bd
[SLP] Emit reduction instead of 2 extracts + scalar op, when vectorizing operands (#147583)
Added emission of the 2-element reduction instead of 2 extracts + scalar
op, when trying to vectorize operands of the instruction, if it is more
profitable.
2025-07-09 19:52:09 -04:00
Ramkumar Ramachandra
62f8377e40
[LV] Extend FindFirstIV to unsigned case (#146386)
Extend FindFirstIV vectorization to the unsigned case by introducing and
handling FindFirstIVUMin.

Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-07-09 15:56:40 +01:00
Alexey Bataev
9e132f5068 [SLP][NFC]Move function SLPVectorizerPass::tryToVectorize around, NFC 2025-07-09 05:34:36 -07:00
Gaëtan Bossu
50facad7fc
[SLP][REVEC] Fix insertelement legality checks (#146921)
The current code assumes that all the values in VL are valid
instructions, while it is possible to get poison.
2025-07-09 10:28:50 +01:00
Rahul Joshi
b38de6c18e
[NFCI][LLVM] Adopt ArrayRef::consume_front() in a few places (#146793) 2025-07-04 10:42:14 -07:00
Austin
a550fef906
[llvm] Use llvm::fill instead of std::fill(NFC) (#146911)
Use llvm::fill instead of std::fill
2025-07-04 14:10:28 +08:00
Florian Hahn
20fbbd7675
[LV] Add support for cmp reductions with decreasing IVs. (#140451)
Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y)
reductions where one of x or y is a decreasing induction, producing a SMin
 reduction. It uses signed max as sentinel value.

PR: https://github.com/llvm/llvm-project/pull/140451
2025-06-29 11:17:03 +01:00
Ramkumar Ramachandra
bb8c42e859
[LV] Extend FindLastIV to unsigned case (#141752)
Split the FindLastIV RecurKind into SMax and UMax variants, depending on
the reduction op produced.
2025-06-23 15:27:49 +01:00
David Green
77941eba7f
[CostModel] Add a DstTy to getShuffleCost (#141634)
A shuffle will take two input vectors and a mask, to produce a new
vector of size <MaskElts x SrcEltTy>. Historically it has been assumed
that the SrcTy and the DstTy are the same for getShuffleCost, with that
being relaxed in recent years. If the Tp passed to getShuffleCost is the
SrcTy, then the DstTy can be calculated from the Mask elts and the src
elt size, but the Mask is not always provided and the Tp is not reliably
always the SrcTy. This has led to situations notably in the SLP
vectorizer but also in the generic cost routines where assumption about
how vectors will be legalized are built into the generic cost routines -
for example whether they will widen or promote, with the cost modelling
assuming they will widen but the default lowering to promote for integer
vectors.

This patch attempts to start improving that - it originally tried to
alter more of the cost model but that too quickly became too many
changes at once, so this patch just plumbs in a DstTy to getShuffleCost
so that DstTy and SrcTy can be reliably distinguished. The callers of
getShuffleCost have been updated to try and include a DstTy that is more
accurate. Otherwise it tries to be fairly non-functional, keeping the
SrcTy used as the primary type used in shuffle cost routines, only using
DstTy where it was in the past (for InsertSubVector for example).

Some asserts have been added that help to check for consistent values
when a Mask and a DstTy are provided to getShuffleCost. Some of them
took a while to get right, and some non-mask calls might still be
incorrect. Hopefully this will provide a useful base to build more
shuffles that alter size.
2025-06-21 12:29:29 +01:00
Sander de Smalen
874773635d
[SLP] NFC: Simplify CandidateVFs initialization (#144882)
Also adds a comment to clarify the meaning of MaxRegVF.
2025-06-20 10:00:55 +01:00
Kazu Hirata
64fe323647
[llvm] Migrate away from ArrayRef(std::nullopt) (NFC) (#144967)
ArrayRef has a constructor that accepts std::nullopt.  This
constructor dates back to the days when we still had llvm::Optional.

Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.

This patch takes care of the llvm side of the migration.
2025-06-19 21:31:26 -07:00
Alexey Bataev
0108a5908c [SLP]Fix a crash on an subvector size calculation for non-power-of-2 vector
Patch fixes cost estimation for the extractelements from non-power-of-2
vectors, defined as subvector extracts. In this case the subvector size
might be not adjusted to a whole register size, need to get the minimum
between whole vector size and the actual difference to prevent compiler
crash.

Fixes #143513
2025-06-17 08:58:07 -07:00
Jeffrey Byrnes
c9a87a50ae
[SLPVectorizer] Use accurate cost for external users of resize shuffles (#137419)
When implementing the vectorization, we potentially need to add shuffles
for external users. In such cases, we may be shuffling a smaller vector
into a larger vector. When this happens `ResizeToVF` will just build a
poison padded identity vector. Then the to build the final shuffle, we
just use the `SK_InsertSubvector` mask.

This is possibly clearer by looking at the included test in
SLPVectorizer/AMDGPU/external-shuffle.ll

In the exit block we have a bunch of shuffles to glue the vectorized
tree match the `InsertElement` users. `TMP25` holds the result of
resizing the v2i16 vectorized sequence to match the `InsertElement` size
v16i16. Then `TMP26` is the final shuffle which replaces the
`InsertElement` sequence. This is just an insertsubvector.

However, when calculating the cost for these shuffles, we aren't
modelling this correctly. `ResizeToVF` will indicate to
`performExtractsShuffleAction` that we cannot use the original mask due
to the resize shuffle. The consequence is that the cost calculation uses
a different shuffle mask than what is ultimately used.

Going back to the included test, we can consider again `TMP26`. Clearly
we can see the shuffle uses a mask {0, 1, 2, 3, 16, 17, poison ..}.
However, we will currently calculate the cost with a mask {0, 1, 2, 3,
20, 21, ...} we have replaced 16 and 17 with 20 and 21 (Index + Vector
Size). Queries like BasicTTImpl::improveShuffleKindFromMask will not
recognize this as an `SK_InsertSubvector` mask, and targets which have
reduced costs for `SK_InsertSubvector` will not accurately calculate the
cost.
2025-06-17 08:14:05 -07:00
Jeremy Morse
9eb0020555
[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389)
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.

This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.

Co-authored-by: Nikita Popov <github@npopov.com>
2025-06-17 15:55:14 +01:00
Han-Kuan Chen
414710c753
[SLP] Fix isCommutative to check uses of the original instruction instead of the converted instruction. (#143094) 2025-06-17 22:03:14 +08:00
Gaëtan Bossu
087d83e0c6
[SLP] vectorizeStores: Name things a bit more clearly (NFC) (#144511)
I believe the new variable names better convey their purpose. However, I
also believe that function is more complex than it needs to be, and this
tiny patch should be seen as a first step towards (maybe) further
refactoring.

The previous names were very generic (Size, Sz, Cnt, StartIdx). This
made it easy to get confused given that the vecotrizeStores() function
is already complex enough.

My hope would be to eventually have a function concise enough to clearly
see what are the different strategies being attempted to vectorise a
group of related store instructions.
2025-06-17 13:20:52 +01:00
Stephen Tozer
aa8a1fa6f5
[DLCov][NFC] Annotate intentionally-blank DebugLocs in existing code (#136192)
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.

Some notes for reviewers:

- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
2025-06-11 17:42:10 +01:00
Kazu Hirata
9ea3972cd1
[Vectorize] Strip away lambdas (NFC) (#143279)
We don't need lambdas here.
2025-06-08 01:34:09 -07:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
2025-06-03 17:12:24 +01:00
Alexey Bataev
cb648ba970 [SLP]Check if the user node has instructions, used only outside
Gather nodes with parents, which scalar instructions are used only
outside, are generated before the whole tree vectorization. Need to
teach isGatherShuffledSingleRegisterEntry to check that such nodes are
emitted first and they cannot depend on other nodes, which are emitted
later.

Fixes #141628
2025-05-29 10:09:49 -07:00
Alexey Bataev
aa452b65fc [SLP]Restore insertion points after gathers vectorization
Restore insertion points after gathers vectorization to avoid a crash in
a root node vectorization.

Fixes #141265
2025-05-24 07:25:20 -07:00
Ramkumar Ramachandra
0240129218
[IVDesc] Unify RecurKinds [I|F]AnyOf (#118393)
Co-authored-by: Mel Chen <mel.chen@sifive.com>
2025-05-23 11:57:30 +01:00
Ramkumar Ramachandra
b81170ecff
[IVDesc] Unify RecurKinds [I|F]FindLastIV (NFC) (#141082) 2025-05-22 22:48:01 +01:00
Alexey Bataev
2318491432 [SLP][NFC]Do the analysis first and then actual codegen, NFC 2025-05-20 08:12:53 -07:00
Alexey Bataev
a0058d1851
[SLP][NFC]Make TreeEntry a class and store "need-to-schedule" state
TreeEntry should be a class, not a struct, since it has private members.
Also, do no repeat Does-Not-Need-To-Schedule analysis during codegen,
codegen may affect the result of the analysis in future patches.

Reviewers: hiraditya, HanKuanChen, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/140734
2025-05-20 10:33:59 -04:00