2451 Commits

Author SHA1 Message Date
Alexey Bataev
78490acb32 [SLP]Support for zext i1 %x modeling as select %x, 1, 0
Model zext i1 %x to iN as select i1 %x, iN 1, iN 0 if there are
other select instructions that can be combined with it into a bundle.

Fixes #178403

Recommit after revert in 993e1f66afcfe9da03bd813e669eada341b11d2f

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/180635
2026-02-10 12:54:12 -08:00
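
A minimal sketch of why the modeling above is value-preserving, written in Python rather than LLVM IR. The helper names `zext_i1` and `select` are illustrative only and are not part of the patch.

```python
def zext_i1(x: bool, bits: int) -> int:
    """Model `zext i1 %x to iN`: the single bit becomes the low bit."""
    return int(x) & ((1 << bits) - 1)


def select(cond: bool, if_true: int, if_false: int) -> int:
    """Model `select i1 %cond, iN %if_true, iN %if_false`."""
    return if_true if cond else if_false


# Both forms agree for every possible i1 input, so a zext can join a
# bundle of selects without changing the computed value.
for x in (False, True):
    assert zext_i1(x, 32) == select(x, 1, 0)
```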
Alexey Bataev
993e1f66af Revert "[SLP]Support for zext i1 %x modeling as select %x, 1, 0"
This reverts commit 70aebae2a13114f4e3d5e2460c052d8f3de295be to fix
buildbots https://lab.llvm.org/buildbot/#/builders/85/builds/18614
2026-02-10 06:49:53 -08:00
Alexey Bataev
70aebae2a1
[SLP]Support for zext i1 %x modeling as select %x, 1, 0
Model zext i1 %x to iN as select i1 %x, iN 1, iN 0 if there are
other select instructions that can be combined with it into a bundle.

Fixes #178403

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/180635
2026-02-10 08:59:44 -05:00
Alexey Bataev
10a0f40083 [SLP][NFC]Add another shl-to-add modeling case 2026-02-07 19:18:13 -08:00
Alexey Bataev
f52f97b30b [SLP][NFC]Add another shl-to-add modeling test, NFC 2026-02-07 07:13:26 -08:00
Alexey Bataev
f227efef34 [SLP][NFC]Add another shl-to-add modeling test case, NFC 2026-02-05 14:24:02 -08:00
Alexey Bataev
fe754dff6d [SLP]Remove LoadCombine workaround after handling of the copyables
LoadCombine pattern handling was added as a workaround for the cases,
where the SLP vectorizer could not vectorize the code effectively. With
the copyables support, it can handle it directly.

Also, the patch adds support for the scalar loads[ + bswap] pattern for
byte-sized loads (+ reverse bytes for bswap)

Recommit after revert in 6377c86d718232fe60c548dfd7ab439f7ff84df7

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/174205
2026-02-05 11:16:08 -08:00
Alexey Bataev
c0fe938dfd [SLP][NFC]Add a test with aliasing between bswap memory operations, NFC 2026-02-05 10:34:22 -08:00
Alexey Bataev
6377c86d71 Revert "[SLP]Remove LoadCombine workaround after handling of the copyables"
This reverts commit 8dbb9f66e8b14a8a06f1873a2c1b7dce366ed2d6 to fix
buildbot issues https://lab.llvm.org/buildbot/#/builders/224/builds/2795
2026-02-05 09:57:00 -08:00
Alexey Bataev
8dbb9f66e8
[SLP]Remove LoadCombine workaround after handling of the copyables
LoadCombine pattern handling was added as a workaround for the cases,
where the SLP vectorizer could not vectorize the code effectively. With
the copyables support, it can handle it directly.

Also, the patch adds support for the scalar loads[ + bswap] pattern for
byte-sized loads (+ reverse bytes for bswap)

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/174205
2026-02-05 10:42:08 -05:00
Alexey Bataev
28878137f0 [SLP][NFC]Add another test for shl-to-add transformation, NFC 2026-02-04 15:22:28 -08:00
Alexey Bataev
569a9b4925 [SLP][NFC]Add another shl-to-add transformation test, NFC 2026-02-04 08:08:12 -08:00
Alexey Bataev
46a38488a4 [SLP]Disable modeling disjoint reduction or as bitcast for big endian
On big endian targets the reduction cannot be modeled as a bitcast; it
needs to be supported as a reverse/bswap instead. Just disable it for now.
2026-02-03 06:16:25 -08:00
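
A small Python sketch of the endianness issue, assuming byte-sized elements and modeling a vector-to-integer bitcast as a store followed by a wide load. The function names are hypothetical and only illustrate the arithmetic.

```python
def disjoint_or_reduction(byte_elems):
    """or-reduce zext(b_i) << (8 * i); the shifted bytes never overlap."""
    acc = 0
    for i, b in enumerate(byte_elems):
        acc |= b << (8 * i)
    return acc


def bitcast_to_int(byte_elems, byteorder):
    """Model bitcast <N x i8> to i(8*N) as store + wide load."""
    return int.from_bytes(bytes(byte_elems), byteorder)


elems = [0x11, 0x22, 0x33, 0x44]
reduced = disjoint_or_reduction(elems)

# Element 0 lands in the low byte, which is what a little endian
# bitcast produces, but not what a big endian bitcast produces.
assert reduced == bitcast_to_int(elems, "little")
assert reduced != bitcast_to_int(elems, "big")
```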
Alexey Bataev
9411f5db60 [SLP][NFC]Add another test for shl-to-add transformation, NFC 2026-02-03 03:56:28 -08:00
Alexey Bataev
63c1615e7e [SLP][NFC]Add a case for missed shl-to-add transformation, NFC 2026-02-02 10:13:52 -08:00
Alexey Bataev
0b4fbc5aea [SLP][NFC]Add a test to check modeling shl x, 1 as add x,x, NFC 2026-02-01 11:26:48 -08:00
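
For reference, the identity behind the shl-to-add modeling can be checked with a short Python sketch. It assumes N-bit wraparound arithmetic; `shl1` and `add_self` are illustrative names.

```python
def shl1(x: int, bits: int) -> int:
    """Model `shl iN %x, 1` with N-bit wraparound."""
    return (x << 1) & ((1 << bits) - 1)


def add_self(x: int, bits: int) -> int:
    """Model `add iN %x, %x` with N-bit wraparound."""
    return (x + x) & ((1 << bits) - 1)


# The two forms agree for every 8-bit input, including overflow cases.
assert all(shl1(x, 8) == add_self(x, 8) for x in range(256))
```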
Alexey Bataev
b73122d5b7 [SLP]Cast incoming value to a proper type for int nodes, bitcasted to fp
Before casting the value to the FP type, need to check if the type was
reduced during minbitwidth analysis and restore the original
source type to generate a correct bitcast operation.

Fixes #178884
2026-01-30 08:51:03 -08:00
Alexey Bataev
ac5dc54d50 [SLP][NFC]Add disjoint or forms of tests, which actually should lead to scalar identity/bswap, NFC 2026-01-29 11:58:13 -08:00
Alexey Bataev
2ea77ed013
[SLP]Support for bswap pattern for bytes-based disjoint or reductions
If the reduction forms a reversed bitcast, we can represent it as
a bitcast + bswap, if the source elements are byte sized.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/178513
2026-01-29 11:28:01 -05:00
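
A hedged Python sketch of the bitcast + bswap equivalence, assuming a little endian target and byte-sized elements. The helpers below are illustrative and do not mirror the vectorizer's code.

```python
def reversed_disjoint_or(byte_elems):
    """or-reduce with element 0 shifted into the most significant byte."""
    n = len(byte_elems)
    acc = 0
    for i, b in enumerate(byte_elems):
        acc |= b << (8 * (n - 1 - i))
    return acc


def bitcast_le(byte_elems):
    """Model a little endian bitcast <N x i8> to i(8*N)."""
    return int.from_bytes(bytes(byte_elems), "little")


def bswap(value, num_bytes):
    """Reverse the byte order of a num_bytes-wide integer."""
    return int.from_bytes(value.to_bytes(num_bytes, "little"), "big")


elems = [0x11, 0x22, 0x33, 0x44]
# The reversed reduction equals a plain bitcast followed by a bswap.
assert reversed_disjoint_or(elems) == bswap(bitcast_le(elems), len(elems))
```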
Alexey Bataev
f6fe6fc86a [SLP]Do not vectorize subtrees of the split node, marked as gathers.
If the split node was marked as gather/buildvector nodes, the vectorizer
should not vectorize its subtrees, which are marked as deleted.
2026-01-28 17:44:38 -08:00
Alexey Bataev
5413a22e79
[SLP] Reordered disjoint or reduction of shl(zext, (0, stride, 2* stride)) modelled as bitcast
Added support for reordered reductions of shl(zext)-like constructs. Such
constructs are currently modelled as shuffle + bitcast.

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/178292
2026-01-28 11:15:54 -05:00
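
The shuffle + bitcast modeling can be pictured with the following Python sketch. It assumes byte-sized elements, a stride of 8 bits and a little endian bitcast; all names are hypothetical.

```python
def reduce_shl_zext(elems, shifts):
    """or-reduce shl(zext(e_i), shift_i) for non-overlapping shifts."""
    acc = 0
    for e, s in zip(elems, shifts):
        acc |= e << s
    return acc


def shuffle_then_bitcast(elems, shifts, stride_bits=8):
    """Move each element to lane shift / stride, then bitcast the vector."""
    lanes = [0] * len(elems)
    for e, s in zip(elems, shifts):
        lanes[s // stride_bits] = e
    return int.from_bytes(bytes(lanes), "little")


elems = [0xAA, 0xBB, 0xCC, 0xDD]
shifts = [16, 0, 24, 8]  # a permutation of (0, stride, 2 * stride, ...)
assert reduce_shl_zext(elems, shifts) == shuffle_then_bitcast(elems, shifts)
```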
Alexey Bataev
68450ba210 [SLP]Support for tree throttling in SLP graphs with gathered loads
Gathered loads form a DAG instead of a tree in the SLP vectorizer. When
doing the throttling analysis for such graphs, need to consider partially
matched gathered loads DAG nodes and consider extract and/or gather
operations and their costs.
The patch adds this analysis and allows cutting off the expensive
sub-graphs with gathered loads.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/177855

Recommit after revert in d733771113339608aff6002d1fa89aaf4a51c502, which
was related to a crash in SelectionDAG
2026-01-28 07:19:09 -08:00
Simon Pilgrim
11081eb1d3
[CostModel][X86] reduce_add(vXi1) will lower as a scalar ctpop (#178400)
Fixes #176906
2026-01-28 11:43:30 +00:00
Soumik15630m
51845a53fd
[SLP] Fix crash on extractelement with out-of-bounds index.....Fixes … (#176918)
…The cost modeling logic was attempting to set a bit in APInt for an
out-of-bounds index, causing an assertion failure. This patch ignores
OOB indices as they produce poison, which is already handled.
Fixes #176780
2026-01-28 06:08:03 -05:00
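
A rough Python sketch of the guard described above, assuming the cost model collects the extracted lanes into a bitmask. `demanded_elements_mask` is a hypothetical name, not the function touched by the patch.

```python
def demanded_elements_mask(num_elts, extract_indices):
    """Build a bitmask of extracted lanes, skipping out-of-bounds indices."""
    mask = 0
    for idx in extract_indices:
        if not 0 <= idx < num_elts:
            # An out-of-bounds extractelement yields poison; setting this
            # bit would be the equivalent of the APInt assertion failure.
            continue
        mask |= 1 << idx
    return mask


# Lane 7 does not exist in a 4-element vector and is ignored.
assert demanded_elements_mask(4, [0, 2, 7]) == 0b0101
```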
Simon Pilgrim
482a3bc861
[SLP][X86] Add test case for #176906 (#178386) 2026-01-28 10:14:50 +00:00
Nico Weber
d733771113 Revert "[SLP]Support for tree throttling in SLP graphs with gathered loads"
This reverts commit 0666a777ec8138f58ebc7fc41a2fb8097328308a.

Makes clang assert, see repro at
https://github.com/llvm/llvm-project/pull/177855#issuecomment-3808529832
2026-01-27 21:01:56 -05:00
Ryan Buchner
2753e1dedf
[RISCV] Set the reciprocal throughput cost for division to TTI::TCC_Expensive (#177516)
Fixes #176208. Scaled back version of #176515 that only affects the RISCV backend.

Only modifies the cost for cases when DIV is a legal operation.

Updates the cost for both Scalar and Vector types.

Used `TTI::TCC_Expensive` as suggested by
https://github.com/llvm/llvm-project/issues/176208#issuecomment-3760902537.

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2026-01-27 11:01:19 -08:00
Alexey Bataev
ae24a90ce6 [SLP][NFC]Add disjoint reduction or tests, which can be represented as bitcasts/bswaps 2026-01-27 08:20:21 -08:00
Alexey Bataev
0666a777ec
[SLP]Support for tree throttling in SLP graphs with gathered loads
Gathered loads form a DAG instead of a tree in the SLP vectorizer. When
doing the throttling analysis for such graphs, need to consider partially
matched gathered loads DAG nodes and consider extract and/or gather
operations and their costs.
The patch adds this analysis and allows cutting off the expensive
sub-graphs with gathered loads.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/177855
2026-01-25 12:22:47 -05:00
Alexey Bataev
d64d3735ab [SLP]Correctly handle vector nodes, coming from same incoming blocks in PHI nodes
If multiple nodes are generated from same PHI node for the same block,
still need to vectorize vector nodes, even if the value for the incoming block was already emitted.

Fixes #177124
2026-01-21 12:59:21 -08:00
Alexey Bataev
3dc5259bc8 [SLP]Do not build bundle for copyables, with parents used in PHI node
If the copyables have parents used in PHI nodes, this causes complex
schedulable/non-schedulable dependencies, which require complex
processing with little profitability. Cut such cases early for now to
prevent compiler crashes and compile time blow up.

Fixes #176658
2026-01-18 13:37:51 -08:00
Gabriel Baraldi
72a20b8e29
[SLPVectorizer] Check std::optional coming out of getPointersDiff (#175784)
Fixes https://github.com/llvm/llvm-project/issues/175768 
There are other unchecked uses of std::optional in this pass, but I couldn't
figure out a test that triggers them.
2026-01-15 09:07:13 -06:00
Alexey Bataev
c322a0c462 [SLP]Do not throttle nodes with split parents, if any of scalars is used in more than one split nodes
If the node to throttle is a vector node, which is used in a split
node, and at least one scalar of such a node is used in many split
nodes, such a vector node should be throttled. Otherwise there might be
a wrong def-use chain, which crashes the compiler.

Fixes #175967
2026-01-15 03:50:45 -08:00
Alexey Bataev
a96cda0e33 [SLP]Update deps for copyables operands, if the user is used several times in node
If the user instruction is used several times in the node, and in one
case its operand is copyable, but in another it is not, need to check all
operands to be sure we do not miss scheduling
2026-01-09 15:18:31 -08:00
Alexey Bataev
125a53ce59 Revert "[SLP]Update deps for copyables operands, if the user is used several times in node"
This reverts commit 6e1acd061e74f44df6d53d54c78d1e50790456a8 to fix
crashes detected in  https://lab.llvm.org/buildbot/#/builders/25/builds/14678.
2026-01-08 14:15:25 -08:00
Alexey Bataev
6e1acd061e [SLP]Update deps for copyables operands, if the user is used several times in node
If the user instruction is used several times in the node, and in one
case its operand is copyable, but in another it is not, need to check all
operands to be sure we do not miss scheduling
2026-01-08 12:50:32 -08:00
Alex Bradbury
3ae71d30be
[SLP] Use ConstantInt::getSigned for stride argument to strided load/store intrinsics (#175007)
strided-stores-vectorized.ll crashes for RV32 without fixing the
relevant logic in vectorizeTree, because the argument can't be
represented as a 32-bit unsigned value:
```
llvm::APInt::APInt(unsigned int, uint64_t, bool, bool): Assertion `llvm::isUIntN(BitWidth, val) && "Value is not an N-bit unsigned value"' failed.
```

It is intended to be signed, so we simply use ConstantInt::getSigned
instead. This fixes other stride-related instances in the file as well.
For further context, this change is part of unblocking rv32gcv
llvm-test-suite in CI.
2026-01-08 16:45:02 +00:00
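
The range check behind that assertion can be illustrated with a short Python sketch. It assumes a negative stride such as -4 reaching a 32-bit constant; `is_uint_n` and `is_int_n` are local stand-ins written for this example, not calls into LLVM.

```python
def is_uint_n(bits: int, val: int) -> bool:
    """True if val fits in an unsigned integer of the given width."""
    return 0 <= val < (1 << bits)


def is_int_n(bits: int, val: int) -> bool:
    """True if val fits in a signed integer of the given width."""
    return -(1 << (bits - 1)) <= val < (1 << (bits - 1))


stride = -4                      # assumed example value
as_u64 = stride & (2**64 - 1)    # what the value looks like as uint64_t

# Treated as unsigned, the stride does not fit in 32 bits ...
assert not is_uint_n(32, as_u64)
# ... but treated as signed it does, and truncates to a valid i32 pattern.
assert is_int_n(32, stride)
assert stride & 0xFFFFFFFF == 0xFFFFFFFC
```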
Alexey Bataev
9fb45c5959 [SLP]Do not generate extractelement subnodes with the same indices
The compiler should not generate subvectors with the same extractelement
instructions; it may cause a crash and lead to inefficient
vectorization.

Fixes #174773
2026-01-08 07:23:06 -08:00
Alexey Bataev
39456e4226 [SLP]Do not increment dep count for non-schedulable nodes with non-schedulable parents
If the node is non-schedulable, all instructions are used outside only
and parent is non-schedulable non-phi node, the dependency count should be
increased for such nodes

Fixes #174599
2026-01-07 10:26:19 -08:00
Ryan Buchner
f180d4bb46
[SLP] Report the correct operand to getArithmeticInstrCost() when duplicated scalars (#174442)
Before, we were selecting the wrong operand in cases when Scalars
contained duplicate values. Stems from #135797.

Using:
`opt -passes=slp-vectorizer -mtriple=riscv64 -mattr=+v t.ll`
```
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64"

define void @foo(ptr noalias %A, ptr noalias %B) {
entry:
  %0 = load i32, ptr %B
  %add = add nsw i32 %0, 1
  store i32 %add, ptr %A
  %arrayidx.1 = getelementptr inbounds nuw i8, ptr %B, i64 4
  %1 = load i32, ptr %arrayidx.1
  %add.1 = add nsw i32 %1, 1
  %arrayidx2.1 = getelementptr inbounds nuw i8, ptr %A, i64 4
  store i32 %add.1, ptr %arrayidx2.1
  %arrayidx.2 = getelementptr inbounds nuw i8, ptr %B, i64 8
  %2 = load i32, ptr %arrayidx.2
  %add.2 = add nsw i32 %2, 1
  %arrayidx2.2 = getelementptr inbounds nuw i8, ptr %A, i64 8
  store i32 %add.2, ptr %arrayidx2.2
  %arrayidx.3 = getelementptr inbounds nuw i8, ptr %B, i64 12

  %arrayidx2.3 = getelementptr inbounds nuw i8, ptr %A, i64 12

  store i32 %add, ptr %arrayidx2.3
  %arrayidx.4 = getelementptr inbounds nuw i8, ptr %B, i64 16
  %4 = load i32, ptr %arrayidx.4
  %add.4 = add nsw i32 %4, 1
  %arrayidx2.4 = getelementptr inbounds nuw i8, ptr %A, i64 16
  store i32 %add.4, ptr %arrayidx2.4
  %arrayidx.5 = getelementptr inbounds nuw i8, ptr %B, i64 20
  %5 = load i32, ptr %arrayidx.5
  %add.5 = add nsw i32 %5, 1
  %arrayidx2.5 = getelementptr inbounds nuw i8, ptr %A, i64 20
  store i32 %add.5, ptr %arrayidx2.5
  %arrayidx.6 = getelementptr inbounds nuw i8, ptr %B, i64 24
  %6 = load i32, ptr %arrayidx.6
  %add.6 = add nsw i32 %6, 1
  %arrayidx2.6 = getelementptr inbounds nuw i8, ptr %A, i64 24
  store i32 %add.6, ptr %arrayidx2.6
  %arrayidx.7 = getelementptr inbounds nuw i8, ptr %B, i64 28
  %7 = load i32, ptr %arrayidx.7
  %add.7 = add nsw i32 %7, 1
  %arrayidx2.7 = getelementptr inbounds nuw i8, ptr %A, i64 28
  store i32 %add.7, ptr %arrayidx2.7
  ret void
}
```

The following trace is produced; note that the wrong operand is used for
`Idx > 2`.

Before:
```
GetScalarCost(), Idx=0
UniqueValues[Idx]:   %add = add nsw i32 %0, 1
Op1:   %0 = load i32, ptr %B, align 4
GetScalarCost(), Idx=1
UniqueValues[Idx]:   %add.1 = add nsw i32 %1, 1
Op1:   %1 = load i32, ptr %arrayidx.1, align 4
GetScalarCost(), Idx=2
UniqueValues[Idx]:   %add.2 = add nsw i32 %2, 1
Op1:   %2 = load i32, ptr %arrayidx.2, align 4
GetScalarCost(), Idx=3
UniqueValues[Idx]:   %add.4 = add nsw i32 %3, 1
Op1:   %0 = load i32, ptr %B, align 4
GetScalarCost(), Idx=4
UniqueValues[Idx]:   %add.5 = add nsw i32 %4, 1
Op1:   %3 = load i32, ptr %arrayidx.4, align 4
GetScalarCost(), Idx=5
UniqueValues[Idx]:   %add.6 = add nsw i32 %5, 1
Op1:   %4 = load i32, ptr %arrayidx.5, align 4
GetScalarCost(), Idx=6
UniqueValues[Idx]:   %add.7 = add nsw i32 %6, 1
Op1:   %5 = load i32, ptr %arrayidx.6, align 4
```

After:
```
GetScalarCost(), Idx=0
UniqueValues[Idx]:   %add = add nsw i32 %0, 1
Op1:   %0 = load i32, ptr %B, align 4
GetScalarCost(), Idx=1
UniqueValues[Idx]:   %add.1 = add nsw i32 %1, 1
Op1:   %1 = load i32, ptr %arrayidx.1, align 4
GetScalarCost(), Idx=2
UniqueValues[Idx]:   %add.2 = add nsw i32 %2, 1
Op1:   %2 = load i32, ptr %arrayidx.2, align 4
GetScalarCost(), Idx=3
UniqueValues[Idx]:   %add.4 = add nsw i32 %3, 1
Op1:   %3 = load i32, ptr %arrayidx.4, align 4
GetScalarCost(), Idx=4
UniqueValues[Idx]:   %add.5 = add nsw i32 %4, 1
Op1:   %4 = load i32, ptr %arrayidx.5, align 4
GetScalarCost(), Idx=5
UniqueValues[Idx]:   %add.6 = add nsw i32 %5, 1
Op1:   %5 = load i32, ptr %arrayidx.6, align 4
GetScalarCost(), Idx=6
UniqueValues[Idx]:   %add.7 = add nsw i32 %6, 1
Op1:   %6 = load i32, ptr %arrayidx.7, align 4
```
2026-01-05 22:25:25 +00:00
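
One way to picture the mismatch in the traces above is the following Python sketch. It assumes operands are stored per original lane while the cost callback iterates over deduplicated values; the lists and function names are illustrative and do not reproduce the actual patch.

```python
# Scalars of the node, with %add duplicated in lane 3 (as in the test).
scalars = ["add", "add.1", "add.2", "add", "add.4", "add.5", "add.6", "add.7"]
# First operand per lane, using the value numbering printed in the trace.
operand0 = ["%0", "%1", "%2", "%0", "%3", "%4", "%5", "%6"]

# Deduplicate while preserving order.
unique_values = list(dict.fromkeys(scalars))


def operand_buggy(idx):
    """Index the per-lane operand list with the unique-value index."""
    return operand0[idx]


def operand_fixed(idx):
    """Map the unique value back to its lane before looking up the operand."""
    return operand0[scalars.index(unique_values[idx])]


# Idx=3 is %add.4: the buggy lookup reports %0, the fixed one reports %3.
assert unique_values[3] == "add.4"
assert operand_buggy(3) == "%0"
assert operand_fixed(3) == "%3"
```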
Alexey Bataev
f985e1a113
[SLP]Better copyable vectorization for stores with non-instructions (#174249) 2026-01-03 17:05:55 -05:00
Mikhail Gudim
3572e62991
[SLPVectorizer] Widen rt stride loads (#162336)
Suppose we are given pointers of the form: `%b + x * %s + y * %c_i`
where the `%c_i` are constants and `%s` is a fixed run-time value.
If the pointers can be rearranged as follows:

```
 %b + 0 * %s + 0
 %b + 0 * %s + 1
 %b + 0 * %s + 2
 ...
 %b + 0 * %s + w

 %b + 1 * %s + 0
 %b + 1 * %s + 1
 %b + 1 * %s + 2
 ...
 %b + 1 * %s + w
 ...
```

It means that the memory can be accessed with strided loads of width `w`
and stride `%s`.

This is motivated by x264 benchmark.
2026-01-02 17:06:11 -05:00
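
A simplified Python sketch of the rearrangement check. It assumes each pointer has already been decomposed into a pair `(x, y)` meaning `%b + x * %s + y`; `match_strided_rows` is a hypothetical helper, and here "width" means the number of consecutive elements per row.

```python
def match_strided_rows(coeffs):
    """Return (rows, width) if coeffs form a full rows-by-width grid, else None.

    coeffs: list of (x, y) pairs, one per pointer, in any order.
    """
    points = sorted(set(coeffs))
    if not points or len(points) != len(coeffs):
        return None  # empty input or duplicate pointers
    rows = sorted({x for x, _ in points})
    cols = sorted({y for _, y in points})
    # Row multipliers and in-row offsets must both be 0, 1, 2, ...
    if rows != list(range(len(rows))) or cols != list(range(len(cols))):
        return None
    # Unique points inside the grid with matching count fill it completely.
    if len(points) != len(rows) * len(cols):
        return None
    return len(rows), len(cols)


# Two rows of three consecutive elements, given out of order.
ptrs = [(1, 2), (0, 0), (1, 0), (0, 1), (0, 2), (1, 1)]
assert match_strided_rows(ptrs) == (2, 3)
```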
Alexey Bataev
8ac911b90d [SLP][NFC]Add a test with the leading non-instruction in sores, which cannot be handled as copyable 2026-01-02 13:27:07 -08:00
Alexey Bataev
8d75f97662 [SLP]Consider split node as potential reduction root
Need to check the first split node as a potential reduction root to
prevent compiler crash
2026-01-02 06:42:44 -08:00
Alexey Bataev
a0be4724a9 [SLP] Support for copyables in the reduced values (#153589)
Currently reductions can handle only same/alternate instructions,
skipping potential support for copyables. The patch adds support for
copyables in the reduced values.

Recommit after revert in 1febc3f088ef444af378c0a90aaba2195c30472b
2026-01-01 13:31:13 -08:00
Alexey Bataev
1febc3f088 Revert "[SLP] Support for copyables in the reduced values (#153589)"
This reverts commit 831bb12a30dbbbf69930c11846a7b62b33e0f0db to fix
buildbot https://lab.llvm.org/buildbot/#/builders/224/builds/1205
2026-01-01 08:48:40 -08:00
Alexey Bataev
831bb12a30
[SLP] Support for copyables in the reduced values (#153589)
Currently reductions can handle only same/alternate instructions,
skipping potential support for copyables. The patch adds support for
copyables in the reduced values.
2026-01-01 11:31:28 -05:00
Alexey Bataev
27cf32dafd [SLP]Fix def-after-use crash for gathered split nodes
If the split node is marked as a gather node after non-profitable
analysis, need to exclude it from the list of split nodes and include
into the list of gather/buildvector nodes

Fixes report from https://github.com/llvm/llvm-project/pull/162018#issuecomment-3701928745
2025-12-31 14:12:09 -08:00
Alexey Bataev
55e0b928b5 [SLP]Consider deleted/gathered nodes, when deciding to erase extractelement
If any user of the extractelement instruction is part of the node to be
deleted/gathered, such extractelements instructions should not be
considered for deletion.

Fixes #174020
2025-12-31 12:58:42 -08:00
Alexey Bataev
2541b1870e [SLP]Mark and incompatible for 'xor %a, 0' operations
Xor with 0 is incompatible with and, which results in all zeros instead
of %a

https://alive2.llvm.org/ce/z/oEVETS

Fixes #174041
2025-12-31 08:30:50 -08:00
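
The incompatibility can be checked with a small Python sketch over 32-bit values. The function names are illustrative; the point is only that 0 is the identity operand for xor, while the identity operand for and is all ones.

```python
MASK32 = 0xFFFFFFFF


def xor_zero(a: int) -> int:
    """Model `xor i32 %a, 0`, which returns %a unchanged."""
    return (a ^ 0) & MASK32


def and_zero(a: int) -> int:
    """Model `and i32 %a, 0`, reusing the same constant operand."""
    return (a & 0) & MASK32


def and_all_ones(a: int) -> int:
    """Model `and i32 %a, -1`, the identity form for and."""
    return (a & MASK32) & MASK32


a = 0x1234
assert xor_zero(a) == a
assert and_zero(a) == 0          # not equal to %a
assert and_all_ones(a) == a
```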