2255 Commits

Author SHA1 Message Date
Stephen Tozer
aa8a1fa6f5
[DLCov][NFC] Annotate intentionally-blank DebugLocs in existing code (#136192)
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.

Some notes for reviewers:

- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
2025-06-11 17:42:10 +01:00
Kazu Hirata
9ea3972cd1
[Vectorize] Strip away lambdas (NFC) (#143279)
We don't need lambdas here.
2025-06-08 01:34:09 -07:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
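
A minimal sketch of the signature pattern described above, using hypothetical names (`ExprSketch`, `isKnownNonZeroSketch`, `MaxDepthSketch`) rather than the real ValueTracking API. With the recursion limit as the trailing defaulted parameter, callers that used to pass 0 can omit it, so removing the parameter later should touch fewer call sites:

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the real ValueTracking API.
constexpr unsigned MaxDepthSketch = 6; // assumed recursion limit

struct ExprSketch {
  const ExprSketch *Op; // single operand, or nullptr for a leaf
  bool LeafIsNonZero;   // only meaningful for leaves
};

// Depth is the last parameter and defaults to 0, so most callers omit it.
bool isKnownNonZeroSketch(const ExprSketch &E, unsigned Depth = 0) {
  if (!E.Op)
    return E.LeafIsNonZero;
  if (Depth >= MaxDepthSketch)
    return false; // give up conservatively at the recursion limit
  return isKnownNonZeroSketch(*E.Op, Depth + 1);
}

void usageExample() {
  ExprSketch Leaf{nullptr, true};
  ExprSketch Wrap{&Leaf, false};
  assert(isKnownNonZeroSketch(Wrap));                  // Depth omitted
  assert(!isKnownNonZeroSketch(Wrap, MaxDepthSketch)); // explicit Depth
}
```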
2025-06-03 17:12:24 +01:00
Alexey Bataev
cb648ba970 [SLP]Check if the user node has instructions, used only outside
Gather nodes with parents whose scalar instructions are used only
outside are generated before the whole tree vectorization. Need to
teach isGatherShuffledSingleRegisterEntry to check that such nodes are
emitted first and that they cannot depend on other nodes that are
emitted later.

Fixes #141628
2025-05-29 10:09:49 -07:00
Alexey Bataev
aa452b65fc [SLP]Restore insertion points after gathers vectorization
Restore insertion points after gathers vectorization to avoid a crash in
a root node vectorization.

Fixes #141265
2025-05-24 07:25:20 -07:00
Ramkumar Ramachandra
0240129218
[IVDesc] Unify RecurKinds [I|F]AnyOf (#118393)
Co-authored-by: Mel Chen <mel.chen@sifive.com>
2025-05-23 11:57:30 +01:00
Ramkumar Ramachandra
b81170ecff
[IVDesc] Unify RecurKinds [I|F]FindLastIV (NFC) (#141082) 2025-05-22 22:48:01 +01:00
Alexey Bataev
2318491432 [SLP][NFC]Do the analysis first and then actual codegen, NFC 2025-05-20 08:12:53 -07:00
Alexey Bataev
a0058d1851
[SLP][NFC]Make TreeEntry a class and store "need-to-schedule" state
TreeEntry should be a class, not a struct, since it has private members.
Also, do not repeat the Does-Not-Need-To-Schedule analysis during codegen,
since codegen may affect the result of the analysis in future patches.

Reviewers: hiraditya, HanKuanChen, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/140734
2025-05-20 10:33:59 -04:00
Alexey Bataev
3918ef3688
[SLP]Fix the analysis for masked compress loads
Need to remove the check for Orders in interleaved loads analysis and
estimate shuffle cost without the reordering to correctly handle the
costs of masked compress loads.

Reviewers: hiraditya, HanKuanChen, RKSimon

Reviewed By: HanKuanChen, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/140647
2025-05-20 07:31:16 -04:00
Alexey Bataev
30ebcf6280
[SLP][NFC]Store operand entries in the map
Instead of looking through the whole vectorizable tree to find the operand
entry, it is better to store it in a separate map and perform a quick lookup
based on the user tree entry and the operand index.
This allows removing lots of duplicated code, simplifies processing and
fixes potential future issues with the analysis being affected by the codegen.
It also improves compile time.
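
A rough sketch of the lookup scheme described above, assuming a simplified node type. `TreeEntrySketch`, `OperandMap`, `recordOperand` and `getOperandEntry` are hypothetical names for illustration and do not mirror the actual SLPVectorizer data structures:

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical, simplified stand-in for an SLP tree node.
struct TreeEntrySketch {
  unsigned Idx; // position of the node in the tree
};

// Key: (user entry index, operand index). Value: the operand's entry.
using OperandKey = std::pair<unsigned, unsigned>;
using OperandMap = std::map<OperandKey, const TreeEntrySketch *>;

// Record the operand entry once, while the tree is being built.
void recordOperand(OperandMap &M, const TreeEntrySketch &User, unsigned OpIdx,
                   const TreeEntrySketch *Operand) {
  assert(Operand && "operand entry must exist");
  M[{User.Idx, OpIdx}] = Operand;
}

// Look the entry up by key instead of scanning the whole tree.
// Returns nullptr when nothing was recorded for this operand.
const TreeEntrySketch *getOperandEntry(const OperandMap &M,
                                       const TreeEntrySketch &User,
                                       unsigned OpIdx) {
  auto It = M.find({User.Idx, OpIdx});
  return It == M.end() ? nullptr : It->second;
}
```

Keying on the pair keeps the analysis and the codegen reading from the same table, which is the property the message relies on.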

Reviewers: HanKuanChen, RKSimon, hiraditya

Reviewed By: hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/140549
2025-05-19 19:53:47 -04:00
Alexey Bataev
bb8e2a8937 [SLP]Relax assertion to avoid compiler crash
Need to relax the assertion to fix a compiler crash when the
reordered compress loads are more profitable than the ordered ones.

Fixes #140334
2025-05-18 14:26:36 -07:00
Alexey Bataev
fb86b3d96b [SLP]Change the insertion point for outside-block-used nodes and prevec phi operand gathers
Need to set the insertion point for (non-schedulable) vector node after
the last instruction in the node to avoid def-use breakage. But it also
causes miscompilation with gather/buildvector operands of the phi nodes,
used in the same phi only in the block.
These nodes are supposed to be inserted at the end of the block and after
changing the insertion point for the non-schedulable vec block, it also
may break def-use dependencies. Need to prevector such nodes, to emit
them as early as possible, so the vectorized nodes are inserted before
these nodes.

Fixes #139728

Recommit after revert 60fb92179291e848eb7b04913bdc818d081db296

Reviewers: hiraditya, HanKuanChen, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/139917
2025-05-18 12:59:36 -07:00
Alexey Bataev
60fb921792 Revert "[SLP]Change the insertion point for outside-block-used nodes and prevec phi operand gathers"
This reverts commit d79d9b8fbfc7e8411aeaf2f5e1be9d4247594fee to fix
a bug reported in https://github.com/llvm/llvm-project/pull/139917#issuecomment-2888216404
2025-05-17 11:06:37 -07:00
Alexey Bataev
d79d9b8fbf
[SLP]Change the insertion point for outside-block-used nodes and prevec phi operand gathers
Need to set the insertion point for (non-schedulable) vector node after
the last instruction in the node to avoid def-use breakage. But it also
causes miscompilation with gather/buildvector operands of the phi nodes,
used in the same phi only in the block.
These nodes are supposed to be inserted at the end of the block and after
changing the insertion point for the non-schedulable vec block, it also
may break def-use dependencies. Need to prevector such nodes, to emit
them as early as possible, so the vectorized nodes are inserted before
these nodes.

Fixes #139728

Reviewers: hiraditya, HanKuanChen, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/139917
2025-05-16 12:52:27 -04:00
Ramkumar Ramachandra
c807395011
[LAA/SLP] Don't truncate APInt in getPointersDiff (#139941)
Change getPointersDiff to return a std::optional<int64_t>, and fill
this value using APInt::trySExtValue. This simple change requires
changes to other functions in LAA, and major changes in SLPVectorizer
changing types from 32-bit to 64-bit.
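
A minimal sketch of the idea, using only standard types: return an empty optional when the distance is not representable instead of handing back a silently truncated value. `pointersDiffSketch` is a hypothetical helper for illustration, not the LAA function, and it works on plain 64-bit offsets rather than on `APInt`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: distance OffsetB - OffsetA, or std::nullopt if the
// result does not fit in a 64-bit signed integer. The caller has to handle
// the empty case explicitly.
std::optional<std::int64_t> pointersDiffSketch(std::int64_t OffsetA,
                                               std::int64_t OffsetB) {
  constexpr std::int64_t Min = std::numeric_limits<std::int64_t>::min();
  constexpr std::int64_t Max = std::numeric_limits<std::int64_t>::max();
  if (OffsetA < 0 && OffsetB > Max + OffsetA)
    return std::nullopt; // OffsetB - OffsetA would exceed Max
  if (OffsetA > 0 && OffsetB < Min + OffsetA)
    return std::nullopt; // OffsetB - OffsetA would fall below Min
  return OffsetB - OffsetA;
}

void usageExample() {
  // A distance of 2^32 does not fit in 32 bits but is representable here.
  const std::int64_t Big = std::int64_t{1} << 32;
  assert(pointersDiffSketch(0, Big).value() == Big);
  // An unrepresentable distance yields an empty optional.
  assert(!pointersDiffSketch(std::numeric_limits<std::int64_t>::min(), 1));
}
```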

Fixes #139202.
2025-05-15 10:08:05 +01:00
Kazu Hirata
690a30f3fd
[llvm] Construct SmallVector with ArrayRef (NFC) (#139992) 2025-05-14 22:30:38 -07:00
Alexey Bataev
a05cf2927a [SLP][NFC]Use WeakTrackVH instead of Instruction in EntryToLastInstruction
Use WeakTrackVH to prevent instability in the vectorizer.

Fixes #139729
2025-05-14 11:19:54 -07:00
Alexey Bataev
e1ea86e849 [SLP]Do not try to use interleaved loads, if reordering is required
If the interleaved loads require reordering, it is better to avoid
generating a load + shuffle sequence, which in this case cannot be recognized
as an interleaved load. This also fixes the issue with the incorrect codegen.

Fixes #138923
2025-05-12 14:12:51 -07:00
Han-Kuan Chen
53df6400af
[SLP] Fix incorrect operand order in interchangeable instruction. (#139225) 2025-05-12 20:03:45 +08:00
Alexey Bataev
c870b675db [SLP][NFC]Extract values state/operands analysis into separate class
Extract values state and operands analysis/building into a separate
class. This class allows localizing instruction state and operands
building for future support of copyable elements vectorization.

Recommit after revert 10f512074fb13ab5da9f49c25965508f51c8452a

Recommit after revert 6a2a8ebe27c1941f5b952313239fc6d155f58e9d

Reviewers: HanKuanChen, RKSimon

Reviewed By: HanKuanChen

Pull Request: https://github.com/llvm/llvm-project/pull/138724
2025-05-11 08:14:05 -07:00
Alex Bradbury
6a2a8ebe27 Revert "[SLP][NFC]Extract values state/operands analysis into separate class"
This reverts commit 512a5d0b8aa82749995204f4852e93757192288a.

It broke RISC-V vector code generation on some inputs (oggenc.c from
llvm-test-suite), as found by our CI. Reduced test case and more
information posted in #138274.
2025-05-10 16:02:47 +01:00
Alexey Bataev
512a5d0b8a [SLP][NFC]Extract values state/operands analysis into separate class
Extract values state and operands analysis/building into a separate
class. This class allows localizing instruction state and operands
building for future support of copyable elements vectorization.

Recommit after revert 10f512074fb13ab5da9f49c25965508f51c8452a

Reviewers: HanKuanChen, RKSimon

Reviewed By: HanKuanChen

Pull Request: https://github.com/llvm/llvm-project/pull/138724
2025-05-09 07:37:37 -07:00
Alexey Bataev
10f512074f Revert "[SLP][NFC]Extract values state/operands analysis into separate class"
This reverts commit 3954e9d6235d4e90c3f786594e877ab83fab3bf1 to fix
a buildbot failure: https://lab.llvm.org/buildbot/#/builders/46/builds/16518.
2025-05-09 06:52:55 -07:00
Alexey Bataev
3954e9d623
[SLP][NFC]Extract values state/operands analysis into separate class
Extract values state and operands analysis/building into a separate
class. This class allows localizing instruction state and operands
building for future support of copyable elements vectorization.

Reviewers: HanKuanChen, RKSimon

Reviewed By: HanKuanChen

Pull Request: https://github.com/llvm/llvm-project/pull/138724
2025-05-09 09:38:49 -04:00
Gaëtan Bossu
19174126cf
[SLP] Simplify buildTree() legality checks (NFC) (#138833)
This NFC aims to simplify the interfaces used in `buildTree()` to make
it easier to understand where decisions for legality are made.

In particular, there is now a single point of definition for legality
decisions. This makes it clear where all those decisions are made.
Previously, multiple variables with a large scope were passed by
reference.
2025-05-08 08:34:53 +01:00
Alexey Bataev
3aecbbcbf6 [SLP]Do not match nodes if schedulability of parent nodes is different
If one user node is non-schedulable and another one is schedulable, such
nodes should not be considered matched. The selection of the actual insert
point in this case differs and the insert points may match, which may
cause a compiler crash because of the broken def-use chain.

Fixes #137797
2025-05-06 07:52:49 -07:00
Kazu Hirata
6ab7cb7899
[Transforms] Remove unused local variables (NFC) (#138442) 2025-05-04 00:35:22 -07:00
Craig Topper
123758b1f4
[IRBuilder] Add versions of createInsertVector/createExtractVector that take a uint64_t index. (#138324)
Most callers want a constant index. Instead of making every caller
create a ConstantInt, we can do it in IRBuilder. This is similar to
createInsertElement/createExtractElement.
2025-05-02 16:10:18 -07:00
Kazu Hirata
4ec473e0e1
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#138236) 2025-05-02 08:53:53 -07:00
Alexey Bataev
9400270449 [SLP]Fix comparator for vector operands of extractelements in PHICompare
Need to make the comparator follow a strict weak ordering to fix compiler
crashes.
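
For context, a small sketch of what the strict weak ordering requirement means for a sort comparator. `ExtractKey`, `badLess` and `goodLess` are hypothetical and are not the PHICompare code; they only illustrate the property that `comp(a, a)` must be false, which `std::sort` relies on:

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical key, loosely modelled on ordering by (vector operand, lane).
struct ExtractKey {
  int VecId;
  int Lane;
};

// Violates strict weak ordering: badLess(A, A) is true because of <=.
// Passing such a comparator to std::sort is undefined behaviour.
bool badLess(const ExtractKey &A, const ExtractKey &B) {
  return A.VecId <= B.VecId;
}

// Lexicographic comparison via std::tie is irreflexive and transitive.
bool goodLess(const ExtractKey &A, const ExtractKey &B) {
  return std::tie(A.VecId, A.Lane) < std::tie(B.VecId, B.Lane);
}

void sortKeys(std::vector<ExtractKey> &Keys) {
  std::sort(Keys.begin(), Keys.end(), goodLess);
}
```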

Fixes #138178
2025-05-01 14:28:20 -07:00
Jonas Paulsson
f5c8c1eedb
[SLPVectorizer] Move X86 specific handling into X86TTIImpl. (#137830)
`ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars"`
changed SLPVectorizer getScalarizationOverhead() to call
TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some
cases. This was due to X86-specific handling in these (overridden) methods,
and unfortunately the general preference of TTI.getScalarizationOverhead()
was dropped. If VL is available it should always be preferred to use
getScalarizationOverhead(), and this is indeed the case for SystemZ which
has a special insertion instruction that can insert two GPR64s.

Then `33af951 "[SLP]Synchronize cost of gather/buildvector nodes with
codegen"` reworked SLPVectorizer getGatherCost() which together with
ad9909d caused the SystemZ test vec-elt-insertion.ll to fail.

This patch restores the SystemZ test and reverts the change in SLPVectorizer
getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always
called again. The ForPoisonSrc argument is now passed on to the TTI method
so that X86 can handle this as required.

Fixes: #135346
2025-04-30 17:11:27 +02:00
Gaëtan Bossu
c5c4f0d11c
[SLP] Simplify tryToFindDuplicates() (NFC) (#135766)
This NFC aims to simplify the control-flow and interfaces used in tryToFindDuplicates(). The point is to make it easier to understand where decisions for scalar de-duplication are made.

In particular:
- Limit indentation
- Rename some variables to better match their use case
- Always give consistent outputs for VL and ReuseShuffleIndices. This makes it possible to use the same code for building gather TreeEntry everywhere. This also allows removing the TryToFindDuplicates lambda.
2025-04-29 14:47:22 +01:00
Florian Hahn
d68b446933
[IR] Add matchers for remaining FP min/max intrinsics (NFC). (#137612)
Add dedicated matchers for minimum, maximum, minimumnum and maximumnum
intrinsics, similar to the existing matchers for maxnum and minnum.

As suggested in https://github.com/llvm/llvm-project/pull/137335.

PR: https://github.com/llvm/llvm-project/pull/137612
2025-04-29 12:20:00 +01:00
Alexey Bataev
73d90ec825 [SLP][NFC]Consider non-profitable trees with only phis, gathers, splits and small nodes with reuses
Improves compile time for non-profitable cases.
Fixes #135965
2025-04-28 03:56:08 -07:00
Florian Hahn
ec1016f7ef
[IVDescriptors] Support reductions with minimumnum/maximumnum. (#137335)
Add a new reduction recurrence kind for reductions with
minimumnum/maximumnum. Such reductions can be vectorized without
nsz/nnans, same as reductions with maximum/minimum intrinsics.

Note that a new reduction kind is needed to make sure partial reductions
are also combined with minimumnum/maximumnum.

Note that the final reduction to a scalar value is performed with
vector.reduce.fmin/fmax. This should be fine, as the results of the
partial reductions with maximumnum/minimumnum silence any sNaNs.

In-loop reductions and reductions in SLP are not supported yet, as there's no
reduction version of maximumnum/minimumnum yet and fmax may be
incorrect.

PR: https://github.com/llvm/llvm-project/pull/137335
2025-04-28 11:16:36 +01:00
Kazu Hirata
5cfd81b0cc
[llvm] Use range constructors of *Set (NFC) (#137552) 2025-04-27 15:59:57 -07:00
Matt Arsenault
4ea2278e39
SLPVectorizer: Use use_empty instead of hasNUses(0) (#137336) 2025-04-25 17:27:01 +02:00
Alexey Bataev
a7a74b349d
[SLP]Improve reordering of the alternate nodes
Better to preserve the original order of the alternate nodes to avoid
inter-lane shuffling, since select/insert subvector patterns provide better
perf.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/136329
2025-04-24 14:33:10 -04:00
Alexey Bataev
f427890a1d [SLP]Fix PHI comparator to make it follow strict weak ordering restriction
Fixes #137164
2025-04-24 11:08:17 -07:00
Alexey Bataev
f52b01b6cf [SLP][NFC]Rename functions/variables, limit visibility to meet the coding standards, NFC 2025-04-22 09:56:31 -07:00
Alexey Bataev
9c388f1f05
[SLP]Prefer segmented/deinterleaved loads to strided and fix codegen
Need to estimate which one is preferable: deinterleaved/segmented
loads or strided. Segmented loads can be combined, improving
the overall performance.

Reviewers: RKSimon, hiraditya

Reviewed By: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/135058
2025-04-22 12:11:01 -04:00
Alexey Bataev
0252d338fa
[SLP]Model single unique value insert + shuffle as splat + select, where profitable
When the remaining unique scalar has to be inserted into a
non-poison vector at a non-zero position:
```
%vec1 = insertelement %vec, %v, pos1
%res = shuffle %vec1, poison, <0, 1, 2,..., pos1, pos1 + 1, ..., pos1,
...>
```
it is better to estimate whether it is profitable to model it as is, or to model it as:
```
%bv = insertelement poison, %v, 0
%splat = shuffle %bv, poison, <poison, ..., 0, ..., 0, ...>
%res = shuffle %vec, %splat, <0, 1, 2,..., pos1 + VF, pos1 + 1, ...>
```
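
To see why the two forms produce the same lanes, here is a small simulation under simplifying assumptions: integer lanes, `std::nullopt` standing in for poison, a mask of length VF, and a shuffle helper that only models index selection. All names (`shuffleSketch`, `insertThenShuffle`, `splatThenSelect`) are hypothetical and this is not the cost-model code:

```cpp
#include <cassert>
#include <optional>
#include <vector>

using Lane = std::optional<int>; // std::nullopt models a poison lane
using Vec = std::vector<Lane>;
constexpr int PoisonIdx = -1;    // models a poison mask element

// Simplified two-input shuffle: mask indices [0, VF) read A, [VF, 2*VF) read B.
Vec shuffleSketch(const Vec &A, const Vec &B, const std::vector<int> &Mask) {
  assert(A.size() == B.size() && "both inputs must have the same length");
  const int VF = static_cast<int>(A.size());
  Vec R;
  for (int Idx : Mask) {
    if (Idx == PoisonIdx)
      R.push_back(std::nullopt);
    else
      R.push_back(Idx < VF ? A[Idx] : B[Idx - VF]);
  }
  return R;
}

// Form 1: insert V at Pos1, then reuse that lane via a single-input shuffle.
Vec insertThenShuffle(Vec V0, int V, int Pos1, const std::vector<int> &Mask) {
  V0[Pos1] = V;
  const Vec Poison(V0.size(), std::nullopt);
  return shuffleSketch(V0, Poison, Mask);
}

// Form 2: splat V, then select it into every lane whose mask index is Pos1.
Vec splatThenSelect(const Vec &V0, int V, int Pos1,
                    const std::vector<int> &Mask) {
  const int VF = static_cast<int>(V0.size());
  assert(static_cast<int>(Mask.size()) == VF && "sketch assumes mask size VF");
  const Vec Poison(V0.size(), std::nullopt);
  Vec BV = Poison;
  BV[0] = V; // insertelement poison, V, 0
  std::vector<int> SplatMask, SelectMask;
  for (int I = 0; I < VF; ++I) {
    const bool UsesV = Mask[I] == Pos1;
    SplatMask.push_back(UsesV ? 0 : PoisonIdx);
    SelectMask.push_back(UsesV ? I + VF : Mask[I]);
  }
  const Vec Splat = shuffleSketch(BV, Poison, SplatMask);
  return shuffleSketch(V0, Splat, SelectMask);
}
```

With VF = 4, `Pos1 = 1` and mask `<0, 1, 2, 1>`, both functions return the same vector; which form is cheaper is exactly what the patch leaves to the cost estimate.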

Reviewers: preames, hiraditya, RKSimon

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/136590
2025-04-22 11:30:29 -04:00
David Green
d20604e5b6
[CostModel] Plumb CostKind into getExtractWithExtendCost (#135523)
This will likely not affect much with the current uses of the function,
but if we have getExtractWithExtendCost we can plumb CostKind through it
in the same way as other costmodel functions.
2025-04-22 15:09:43 +01:00
Kazu Hirata
b01e25deba
[llvm] Call hash_combine_range with ranges (NFC) (#136511) 2025-04-20 16:36:03 -07:00
Matt Arsenault
e2886705f0
SLPVectorizer: Use use_empty instead of getNumUses (#136336) 2025-04-18 21:14:06 +02:00
Alexey Bataev
fdcee2dd36
[SLP]Reorder tree, if the reorder indices are non empty
Need to consider the ordering for all nodes with the specified ordering,
not only loads/stores/extracts.

Reviewers: hiraditya, RKSimon

Reviewed By: hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/136185
2025-04-18 13:37:08 -04:00
Kazu Hirata
5e1b0f9773
[llvm] Use llvm::less_first and llvm::less_second (NFC) (#136272) 2025-04-18 10:05:55 -07:00
Alexander Kornienko
85110ccee9
[SLP] Replace most uses of for_each with range-for loops. NFC (#136146)
This removes a bit of complexity from the code, where it doesn't seem to
be justified.
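
As a generic illustration of the kind of rewrite meant here (hypothetical functions, not code from SLPVectorizer), the two loops below compute the same sum; the range-for version avoids the lambda and the explicit capture:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Before: algorithm call with a capturing lambda.
int sumWithForEach(const std::vector<int> &Values) {
  int Sum = 0;
  std::for_each(Values.begin(), Values.end(), [&Sum](int V) { Sum += V; });
  return Sum;
}

// After: plain range-for loop with the same behaviour.
int sumWithRangeFor(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```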
2025-04-17 21:38:18 +02:00
Alexey Bataev
5fe91f1b59 [SLP]Check for catchswitch block before doing the analysis of the instructions
Need to skip the analysis of the catchswitch blocks to avoid a compiler
crash when trying to get the first instruction in the block.
2025-04-17 09:10:15 -07:00