2542 Commits

Author SHA1 Message Date
Alexey Bataev
eaf0135b77
[SLP][NFC]Fix run line for the test, fix test name, NFC
Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/190537
2026-04-05 12:20:43 -04:00
Alexey Bataev
c2f97c5917
[SLP] Do not skip tiny trees with gathered loads to vectorize
The isTreeTinyAndNotFullyVectorizable check for 2-node trees
(insertelement root + gather child) was too aggressive: it rejected
trees even when LoadEntriesToVectorize was non-empty, preventing
gathered loads from being vectorized into masked loads/strided loads, etc.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/190181
2026-04-02 09:47:01 -04:00
Alexey Bataev
dc2d25f80b
Revert "[SLP] Do not skip tiny trees with gathered loads to vectorize"
This reverts commit 94ec7ffa46d351b86fbbe3a445ceef37f331c4a2 to fix
reported issue https://github.com/llvm/llvm-project/pull/190040#issuecomment-4177827078

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/190176
2026-04-02 09:26:31 -04:00
Alexey Bataev
94ec7ffa46
[SLP] Do not skip tiny trees with gathered loads to vectorize
The isTreeTinyAndNotFullyVectorizable check for 2-node trees
(insertelement root + gather child) was too aggressive: it rejected
trees even when LoadEntriesToVectorize was non-empty, preventing
gathered loads from being vectorized into masked loads/strided loads, etc.

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/190040
2026-04-02 06:47:53 -04:00
Alexey Bataev
c6669c4993
[SLP] Guard FMulAdd conversion to require single-use/non-reordered FMul operands
The FMulAdd (CombinedVectorize) transformation in transformNodes() marks
an FMul child entry with zero cost, assuming it is fully absorbed into
the fmuladd intrinsic. However, when any FMul scalar has multiple uses
(e.g., also stored separately), the FMul must survive as a separate
node.

Reviewers: hiraditya, RKSimon, bababuck

Pull Request: https://github.com/llvm/llvm-project/pull/189692
2026-04-01 17:14:52 -04:00
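The single-use guard described in c6669c4993 above can be illustrated with a small, self-contained sketch. It is not the code from SLPVectorizer.cpp: `FMulScalar` and `canAbsorbIntoFMulAdd` are hypothetical names, and the sketch models only the "single use" half of the condition, not the "non-reordered" half.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for one scalar fmul that feeds an fadd.
struct FMulScalar {
  unsigned NumUses; // how many instructions consume the fmul result
};

// Returns true only if every fmul is consumed solely by its fadd. Only then
// can the whole fmul node be treated as absorbed into fmuladd at zero cost;
// otherwise the fmul has to survive as a separate node.
bool canAbsorbIntoFMulAdd(const std::vector<FMulScalar> &FMuls) {
  return std::all_of(FMuls.begin(), FMuls.end(),
                     [](const FMulScalar &M) { return M.NumUses == 1; });
}
```

If any scalar fmul in the bundle is also stored or otherwise reused, the check fails and the fmul keeps its own cost.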
Alexey Bataev
c20e233020 [SLP] Replace TrackedToOrig DenseMap with parallel SmallVector in reduction
Replace the DenseMap<Value*, Value*> TrackedToOrig with a SmallVector<Value*>
indexed in parallel with Candidates. This avoids hash-table overhead for the
tracked-value-to-original-value mapping in horizontal reduction processing.

Fixes #189686
2026-03-31 16:22:57 -07:00
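The data-structure change in c20e233020 above amounts to replacing a keyed lookup with a positional one. A minimal sketch follows, using `std::vector` and `std::string` in place of `SmallVector<Value*>`; the `ReductionCandidates` type and its members are illustrative assumptions, not the actual fields in the pass.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: Candidates[I] is a tracked value and
// TrackedToOrig[I] is the original value it was derived from.
struct ReductionCandidates {
  std::vector<std::string> Candidates;
  std::vector<std::string> TrackedToOrig; // indexed in parallel with Candidates

  // Keep both vectors the same length by always appending to both.
  void add(std::string Tracked, std::string Orig) {
    Candidates.push_back(std::move(Tracked));
    TrackedToOrig.push_back(std::move(Orig));
  }

  // Lookup by position: no hashing, just an index into the parallel vector.
  const std::string &origFor(std::size_t Idx) const {
    return TrackedToOrig[Idx];
  }
};
```

The trade-off is that callers must already know the candidate's index, which is the case when the mapping is only consulted while iterating over the candidates.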
Alexey Bataev
38c0f53a14 [SLP][NFC] Add a test for incorrect fma-conversion for fmuls with multi uses 2026-03-31 08:00:21 -07:00
Alexey Bataev
26e0d15eaa
[SLP] Prefer to trim equal-cost alternate-shuffle subtrees
If the trimming candidate subtree is rooted at an alternate-shuffle node
with binary ops, and this subtree has the same cost as the buildvector
node, it is better to stick with the buildvector node to avoid runtime
perf regressions from shuffle/extra-operation overhead that the cost model may
underestimate. Skip trimming if the subtree contains ExtractElement
nodes, since those operate on already-materialized vectors, which may
reduce vector-to-scalar code movement and give better perf.

Reviewers: hiraditya, bababuck, fhahn, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/188272
2026-03-30 16:03:18 -04:00
Alexey Bataev
4450891580 [SLP] Check if potential bitcast/bswap candidate is a root of reduction
Need to check if the potential bitcast/bswap-like construct is a root of
the reduction, otherwise it cannot represent a bitcast/bswap construct.

Fixes #189184
2026-03-28 13:58:22 -07:00
Alexey Bataev
1759b81de9
[SLP]Improve analysis of copyables operands for commutative main instruction
For commutative copyables, instruction operands are always LHS and the others
are RHS. But if some instruction is main, has 2 instruction operands, and
its RHS is more compatible with the LHS operands than its LHS is, such
operands need to be swapped for better analysis.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/185320
2026-03-26 16:03:58 -04:00
Alexey Bataev
d9a44c818f
[SLP]Initial support for vector register spills/reloads estimation
Adds initial support for spill/reload estimation. Currently, it just
runs over the operands and calculates the number of registers used by the
operands. If this number is greater than the total number of available
registers, it considers the first (full) groups as the candidates for spills/reloads.

Reviewers: hiraditya, RKSimon, bababuck

Pull Request: https://github.com/llvm/llvm-project/pull/187594
2026-03-26 14:27:27 -04:00
Alexey Bataev
1cb9a78b5a
[SLP] Fix incorrect operand info for select in getCmpSelInstrCost
The operand info passed to getCmpSelInstrCost for Select instructions
was using operands 0 and 1 (condition and true value), but the API
expects info about the data operands (true and false values). For
selects, the data operands are at indices 1 and 2, not 0 and 1.
This led to the cost model receiving the condition's operand info
instead of the false arm's, potentially producing inaccurate cost
estimates.

Reviewers: bababuck, hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/188506
2026-03-26 07:09:28 -04:00
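The indexing issue fixed in 1cb9a78b5a above is easy to see in a toy model of a select. The sketch below is illustrative only: `OperandKind`, `SelectModel`, and `dataOperandInfo` are hypothetical names and do not mirror the real TargetTransformInfo types.

```cpp
#include <array>
#include <utility>

// Simplified operand classification, standing in for real operand info.
enum class OperandKind { AnyValue, UniformConstant };

// Hypothetical model of a select: operand 0 is the condition,
// operand 1 is the true value, operand 2 is the false value.
struct SelectModel {
  std::array<OperandKind, 3> Ops;
};

// The cost query wants info about the two data operands, so for a select
// it must read indices 1 and 2, not 0 and 1.
std::pair<OperandKind, OperandKind> dataOperandInfo(const SelectModel &S) {
  return {S.Ops[1], S.Ops[2]};
}
```

Reading indices 0 and 1 instead would report the condition's kind in place of the true value's and silently ignore a constant false arm.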
Alexey Bataev
2181f74905 [SLP][NFC] Add a test with the worse vectorization because of missing operands matching 2026-03-25 15:29:23 -07:00
Alexey Bataev
ce6d3a49ee [SLP][NFC]Add a test with incorrect cost model for short-circuit or/and, modeled via select 2026-03-25 09:12:58 -07:00
Alexey Bataev
34889601a9
[SLP]Mark candidate instruction as reduced value, if it is the operand of another reduced value
If the next candidate is the operand of one of the reduced value
candidates, such instructions also should be marked as a reduced value,
not a reduction operation, even if all other requirements are met.
This helps reduce compile time.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/188103
2026-03-25 11:23:51 -04:00
Jinsong Ji
642bde76c3
[SLP] Fix infinite loop in ordered reduction worklist processing (#188342)
The ordered reduction support introduced in 94e366ef2060 can cause an
infinite loop when processing complex reduction chains. The worklist
algorithm re-adds instructions from PossibleOrderedReductionOps when
switching to ordered mode, but doesn't track which instructions have
already been processed. This allows instructions to be re-added and
processed multiple times, creating cycles.

Add a Visited set to track processed instructions and skip any that
have already been handled, preventing the infinite loop.
2026-03-24 21:19:53 +00:00
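The Visited-set fix in 642bde76c3 above follows a common worklist pattern. A minimal sketch, assuming nodes are plain integers and that `Succs[N]` lists the nodes that get re-added when `N` is handled; `processWorklist` is a hypothetical helper, not the function in the pass.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Drains a worklist starting at Start and returns how many distinct nodes
// were processed. Succs[N] lists the nodes pushed when N is handled.
std::size_t processWorklist(const std::vector<std::vector<int>> &Succs,
                            int Start) {
  std::vector<int> Worklist{Start};
  std::unordered_set<int> Visited;
  std::size_t Processed = 0;
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    // insert() reports whether N is new; skip nodes already handled so a
    // cycle in the re-add relation cannot keep the loop running forever.
    if (!Visited.insert(N).second)
      continue;
    ++Processed;
    for (int S : Succs[static_cast<std::size_t>(N)])
      Worklist.push_back(S);
  }
  return Processed;
}
```

Without the `Visited` check, any cycle in `Succs` would push the same nodes back onto the worklist indefinitely.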
Alexey Bataev
d422ffe19f [SLP][NFC]Add a test with the non-profitable alternate vectorization, not throttled 2026-03-24 07:50:44 -07:00
Alexey Bataev
af37ac8aee [SLP]Use reduction root explicitly from reduction analysis to avoid non-determinism
Initially, the reduction root was detected using the last member of the UserIgnoreList set, which is unordered. Better to use the reduction root explicitly to avoid non-determinism in the reduction parent block, which may cause incorrect scale factor estimation for the reduction cost.
2026-03-23 09:46:33 -07:00
Jasmine Tang
d69c670934
[WebAssembly] Add initial shuffle cost capabilities (#187596)
Fixes #178940

Fixes the case where a manual splat of i16x8 or i8x16 was not recognized; the i32x4 case still remains.
2026-03-23 09:28:37 -07:00
Alexey Bataev
5b7ad38d6b [SLP]Fix codegen of compares with consts, being trunced
If the constant values have more active bits than requested by the other
operand of the compare, such constants should not be truncated, to avoid
miscompilation
2026-03-23 07:49:19 -07:00
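The active-bits condition from 5b7ad38d6b above can be sketched for the unsigned case. `activeBits` and `canTruncateConstant` are hypothetical helpers written for illustration; the real check works on LLVM's arbitrary-precision integers and also has to account for signedness, which this sketch ignores.

```cpp
#include <cstdint>

// Number of bits needed to represent V as an unsigned value
// (position of the highest set bit plus one; 0 for V == 0).
unsigned activeBits(std::uint64_t V) {
  unsigned Bits = 0;
  while (V != 0) {
    ++Bits;
    V >>= 1;
  }
  return Bits;
}

// A constant may be narrowed to NarrowBits only if no set bit would be lost.
bool canTruncateConstant(std::uint64_t C, unsigned NarrowBits) {
  return activeBits(C) <= NarrowBits;
}
```

For example, comparing an 8-bit value against 256 must not narrow the constant to 8 bits, since 256 would become 0 and change the result of the compare.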
Alexey Bataev
df00c1c00b [SLP][NFC]Add a test with the icmp miscompilation, NFC 2026-03-23 06:49:25 -07:00
Alexey Bataev
85f529dda1 Revert "[SLP]Fix codegen of compares with consts, being trunced"
This reverts commit 16e0cc8308379857ecd69e6fe1aaf71e15b94910 to add
a new test case for the miscompile
2026-03-23 06:27:08 -07:00
Alexey Bataev
16e0cc8308 [SLP]Fix codegen of compares with consts, being trunced
If the constant values have more active bits than requested by the other
operand of the compare, such constants should not be truncated, to avoid
miscompilation
2026-03-23 06:04:08 -07:00
Alexey Bataev
b2ba79578b [SLP]Fix patterns for compile time blow up with ordered reductions
Excluded patterns, leading to compile time blow up for integer ordered
reductions.
2026-03-22 13:42:54 -07:00
Alexey Bataev
88f830aed8 [SLP]Do not try to reduce instruction, marked for deletion in previous attempts
Need to skip instructions, which were vectorized and marked for deletion
to prevent a compiler crash
2026-03-22 10:10:48 -07:00
Alexey Bataev
616240369e [SLP]Do not consider copyable node with SplitVectorize parent
If the copyables are schedulable and the parent node is a split vectorize node,
need to skip the scheduling analysis for such nodes to avoid a compiler
crash
2026-03-21 06:56:59 -07:00
Alexey Bataev
b260861b38 [SLP]Update values after ordered vectorization
Need to update matching between the original reduced values and their
vectorized matches after ordered reduction vectorization to avoid
a compiler crash
2026-03-20 13:33:40 -07:00
Alexey Bataev
94e366ef20
[SLP] Initial support for ordered reductions
Patch models ordered reductions as a series of extractelements for the
cases which cannot be modeled as unordered reductions.

Fixes #50590

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/182644
2026-03-20 13:45:14 -04:00
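As a rough picture of what "a series of extractelements" means in 94e366ef20 above, an ordered reduction can be written as a strict left-to-right fold over the vector lanes. This is a sketch under that reading, using `std::array` as a stand-in for a vector register; `orderedFAdd` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

// Ordered floating-point add reduction: lanes are consumed strictly in
// index order, so the result matches the original scalar evaluation order.
template <std::size_t N>
float orderedFAdd(float Start, const std::array<float, N> &Vec) {
  float Acc = Start;
  for (std::size_t I = 0; I < N; ++I)
    Acc += Vec[I]; // conceptually: extractelement Vec, I followed by fadd
  return Acc;
}
```

The order matters because floating-point addition is not associative: summing 1e20f, -1e20f, 1.0f left to right gives 1.0f, while adding the last two lanes first would lose the 1.0f to rounding.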
Alexey Bataev
2bb0fa46a8
[SLP]Prefer copyable over alternate
If the instructions state is alternate and/or contains non-directly
matching instructions, need to check if it is better to represent such
operations as non-alternate with copyables.
To do this, we need to compare operands between the instructions in their
different representations and choose the best one for optimal
vectorization.

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/183777
2026-03-20 11:59:59 -04:00
Alexey Bataev
7d76a3122d
[SLP]Improve analysis for the shl-based reduced values with copyables (#185485)
shl-based reduced values in many cases serve as a bitcast/bswap-based
transformation root, but need to improve analysis for better matching.
This patch merges reduction candidates into a single reduced value
array, if there are only 2 different candidate arrays, one of them has
only single element, the second is a list of shl instructions. Also,
sorts these shl instructions by their shift amount and merges with the
single candidate, if it is profitable to have a copyable reduction.
2026-03-19 14:16:53 -04:00
Alexey Bataev
9050794e06
[SLP]Improve reductions for copyables/split nodes
The original support for copyables leads to a regression in x264 on
RISC-V. This patch improves detection of the copyable candidates by more
precise checking of the profitability and adds an extra check for
splitnode reduction, if it is profitable.

Fixes #184313

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/185697
2026-03-19 12:03:05 -04:00
Alexey Bataev
582fa78753
[SLP]Do not match buildvector node, if current node is part of its combined nodes
If current buildvector node is part of the combined nodes of the
matching candidate node, this matching candidate must be considered as
non-matching to prevent wrong def-use chain

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/187491
2026-03-19 08:15:32 -04:00
Alexey Bataev
abdcde9bbc
[SLP] Loop aware cost model/tree building
Currently, the SLP vectorizer does not account for loops and their trip counts.
This may lead to inefficient vectorization in some cases. The patch adds loop
nest-aware tree building and cost estimation.
When it comes to tree building, it now checks that the tree does not span
across different loop nests. The nodes from other loop nests are
immediate buildvector nodes.
The cost model adds the knowledge about loop trip count. If it is
unknown, the default value is used, controlled by the
-slp-cost-loop-min-trip-count=<value> option. The cost of the vector
nodes in the loop is multiplied by the number of iterations (trip count),
because each vector node will be executed the trip count number of
times. This allows better cost estimation.

Original Reviewers:
jdenny-ornl, vporpo, hiraditya, RKSimon

Original PR: https://github.com/llvm/llvm-project/pull/150450

Recommit after revert in c7bd3062f1dac975cf9b706f457b3c55b4bf57ff and in 4e500bd0015042b0cd4b7c87b81caeea06072d24

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/187391
2026-03-18 17:54:01 -04:00
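The trip-count scaling described in abdcde9bbc above can be sketched as a small cost helper. Everything here is an assumption made for illustration: `NodeLoopContext` and `scaledNodeCost` are hypothetical, `DefaultMinTripCount` stands in for whatever value -slp-cost-loop-min-trip-count supplies, and the sketch does not model loop nests or how scalar costs are treated.

```cpp
#include <cstdint>

// Hypothetical description of where a vector node lives.
struct NodeLoopContext {
  bool InLoop;             // node is inside some loop
  bool TripCountKnown;     // trip count is statically known
  std::uint64_t TripCount; // meaningful only if TripCountKnown is true
};

// Scales a node's cost by the number of times the node executes. Nodes
// outside any loop keep their cost; nodes with an unknown trip count fall
// back to the configurable default.
std::int64_t scaledNodeCost(std::int64_t NodeCost, const NodeLoopContext &L,
                            std::uint64_t DefaultMinTripCount) {
  if (!L.InLoop)
    return NodeCost;
  std::uint64_t TC = L.TripCountKnown ? L.TripCount : DefaultMinTripCount;
  return NodeCost * static_cast<std::int64_t>(TC);
}
```

Because the scaling is multiplicative, a node that saves 2 units per execution saves 16 over 8 iterations, which can outweigh a one-off buildvector cost paid outside the loop.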
Alexey Bataev
4e500bd001 Revert "[SLP] Loop aware cost model/tree building"
This reverts commit 6261cb4487f153c599a040d7a77524561b520240 to try to
fix compile time regressions
2026-03-18 09:46:39 -07:00
Alexey Bataev
6261cb4487 [SLP] Loop aware cost model/tree building
Currently, the SLP vectorizer does not account for loops and their trip counts.
This may lead to inefficient vectorization in some cases. The patch adds loop
nest-aware tree building and cost estimation.
When it comes to tree building, it now checks that the tree does not span
across different loop nests. The nodes from other loop nests are
immediate buildvector nodes.
The cost model adds the knowledge about loop trip count. If it is
unknown, the default value is used, controlled by the
-slp-cost-loop-min-trip-count=<value> option. The cost of the vector
nodes in the loop is multiplied by the number of iterations (trip count),
because each vector node will be executed the trip count number of
times. This allows better cost estimation.

Reviewers: jdenny-ornl, vporpo, hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/150450

Recommit after revert in c7bd3062f1dac975cf9b706f457b3c55b4bf57ff
2026-03-18 07:33:07 -07:00
Alexey Bataev
d117f98ff6 [SLP]Fix legality checks for bswap-based transformations
Fix the checks for the non-power-of-2 base bswaps by checking the
power-of-2 of the source type, not the target scalar type. Plus, add
cost estimation for zext if the source type does not match the scalar type, and fix the final bitcasting for the reduced values.

Fixes https://github.com/llvm/llvm-project/pull/184018#issuecomment-4053477562
2026-03-16 11:56:24 -07:00
Alexey Bataev
61a9e30045 Revert "[SLP]Fix legality checks for bswap-based transformations"
This reverts commit 2d4daea3b66469420fc164e76c15558b34e44c75 to fix
a buildbot https://lab.llvm.org/buildbot/#/builders/164/builds/19737
2026-03-16 09:01:49 -07:00
Alexey Bataev
2d4daea3b6 [SLP]Fix legality checks for bswap-based transformations
Fix the checks for the non-power-of-2 base bswaps by checking the
power-of-2 of the source type, not the target scalar type. Plus, add
cost estimation for zext, if the source type does not match the scalar type.

Fixes https://github.com/llvm/llvm-project/pull/184018#issuecomment-4053477562
2026-03-16 08:40:44 -07:00
Jim Lin
3ca069c471
[CodeGen][TTI] Reduce funnel shift cost for constant shift amounts (#184942)
The Sub instruction cost and the shift-by-zero handling costs (ICmp +
Select) are only needed when the shift amount is non-constant. Move them
inside the `!OpInfoZ.isConstant()` guard to avoid overestimating cost
for constant shift amounts.

The overestimated scalar cost caused SLP vectorizer to incorrectly
prefer vectorizing funnel shifts with constant shift amounts, since SLP
compares vector cost against scalar cost and a falsely high scalar cost
makes vectorization appear more profitable than it actually is.

Fixes #181308.
2026-03-12 09:50:03 +08:00
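The guard described in 3ca069c471 above can be sketched as follows. This is a simplified model, not the TTI implementation: `FunnelShiftCosts` and `funnelShiftCost` are hypothetical, the unit costs are placeholders, and other parts of the real expansion (such as any modulo on the shift amount) are left out.

```cpp
// Hypothetical per-instruction costs; real values come from the target.
struct FunnelShiftCosts {
  int Or, Shl, LShr, Sub, ICmp, Select;
};

// Cost of expanding a funnel shift. The Sub and the shift-by-zero handling
// (ICmp + Select) are charged only when the shift amount is not a constant,
// since a constant amount lets them be folded away at compile time.
int funnelShiftCost(const FunnelShiftCosts &C, bool ShiftAmountIsConstant) {
  int Cost = C.Or + C.Shl + C.LShr;
  if (!ShiftAmountIsConstant)
    Cost += C.Sub + C.ICmp + C.Select;
  return Cost;
}
```

With unit costs of 1, a constant-amount funnel shift costs 3 instead of 6, so the scalar side of the SLP comparison is no longer inflated.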
Ryan Buchner
c56410fdc7
[SLP] Pre-commit tests for constant strided stores (#185990)
Tests for #185964
2026-03-11 17:30:09 -07:00
Alexey Bataev
50822d6b25 [SLP]Do not request the last instruction for first buildvector nodes with no state
If looking for the match of the gather/buildvector node and its root is
a first node, which is also a buildvector/gather and has no state, we
should skip the analysis for such nodes to prevent a compiler crash

Fixes #185851
2026-03-11 10:11:09 -07:00
Alexey Bataev
aa90add989 [SLP]Track vectorized values in reductions for correct handling between vectorization
Need to use WeakTrackingVH handler instead of the Value * to correctly
track modified/replaced vectorized instructions

Fixes https://github.com/llvm/llvm-project/pull/182760#issuecomment-4036706233
2026-03-11 06:05:08 -07:00
Alexey Bataev
c7bd3062f1 Revert "[SLP] Loop aware cost model/tree building"
This reverts commit 8963edb534e28d548d8381675bb18af1770c3041 to fix
miscompilations/compile time regressions, reported in https://github.com/llvm/llvm-project/pull/150450#issuecomment-4037224288, https://github.com/llvm/llvm-project/pull/150450#issuecomment-4037481719 and https://github.com/llvm/llvm-project/pull/150450#issuecomment-4038134121
2026-03-11 04:37:54 -07:00
Alexey Bataev
8963edb534
[SLP] Loop aware cost model/tree building
Currently, the SLP vectorizer does not account for loops and their trip counts.
This may lead to inefficient vectorization in some cases. The patch adds loop
nest-aware tree building and cost estimation.
When it comes to tree building, it now checks that the tree does not span
across different loop nests. The nodes from other loop nests are
immediate buildvector nodes.
The cost model adds the knowledge about loop trip count. If it is
unknown, the default value is used, controlled by the
-slp-cost-loop-min-trip-count=<value> option. The cost of the vector
nodes in the loop is multiplied by the number of iterations (trip count),
because each vector node will be executed the trip count number of
times. This allows better cost estimation.

Reviewers: jdenny-ornl, vporpo, hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/150450
2026-03-10 16:14:57 -04:00
Vigneshwar Jayakumar
0d67b2edac
[AMDGPU] Add f32 cost model for exp/exp2/exp10 intrinsics (#185369)
exp2 maps to a single v_exp_f32 (quarter rate), while exp and exp10 need
additional full-rate costs for the base conversion. Also added cost based on 
non-afn and denorm mode.

This enables SLP vectorization of exp2 on targets with packed f32 ops.
2026-03-10 11:08:39 -05:00
Alexey Bataev
38a3de6936 [SLP][NFC]Add RISC_V test with a regression in reduction vectorization, NFC 2026-03-10 08:20:42 -07:00
tudinhh
c192e8c9e3
[SLP] Fix misvectorization in commutative to non-commutative conversion (#185230)
**Summary**
Fixes a miscompilation where commutative operations (e.g., or, and, mul)
with a left-hand side constant were incorrectly transformed into
non-commutative operations (e.g., shl, sub).

**The Problem**
In `BinOpSameOpcodeHelper::getOperand`, when a constant is at `Pos ==
0`, the helper was failing to swap operand order for new non-commutative
target opcodes. This resulted in inverted logic, such as transforming
`or 0, %x` into `shl 0, %x` (resulting in 0) instead of the correct `%x
<< 0`.

**The Fix**
The existing logic only protected the Sub opcode. This patch generalizes
the fix to all non-commutative instructions by using
`!Instruction::isCommutative(ToOpcode)`. This ensures that for any
directional operation, the variable is correctly placed on the LHS and
the constant on the RHS.

**Changes**
SLPVectorizer.cpp: Replaced the specific Sub check with a general
isCommutative check.

Regression Test: Added lhs-constant-non-cummutative.ll to cover shl,
sub, and ashr targets.

Fixes #185186
2026-03-09 16:17:39 -04:00
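The operand-order rule from c192e8c9e3 above can be illustrated with a toy model in which opcodes and operands are strings. The helper names and the opcode list are assumptions made for this sketch; the real logic lives in `BinOpSameOpcodeHelper::getOperand` and uses `Instruction::isCommutative`.

```cpp
#include <string>
#include <utility>

// Simplified commutativity table for a handful of integer opcodes.
bool isCommutativeOpcode(const std::string &Opcode) {
  return Opcode == "add" || Opcode == "mul" || Opcode == "and" ||
         Opcode == "or" || Opcode == "xor";
}

// Returns (LHS, RHS) for the converted instruction. ConstPos is the position
// of the constant in the original commutative instruction. For any
// non-commutative target opcode the variable must end up on the LHS.
std::pair<std::string, std::string>
operandsFor(const std::string &ToOpcode, const std::string &Var,
            const std::string &Const, unsigned ConstPos) {
  if (ConstPos == 0 && isCommutativeOpcode(ToOpcode))
    return {Const, Var}; // order is irrelevant, keep the original layout
  return {Var, Const};
}
```

Under this rule, converting `or 0, %x` to a shift yields `shl %x, 0` rather than `shl 0, %x`.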
Alexey Bataev
0da2aecb01
[SLP]Invalid cost for non-power-of-2 bswaps (#185407)
bswaps are supported only for power-of-2 types, need to disable it for
the default cost model to fix a compiler crash.

Fixes
https://github.com/llvm/llvm-project/pull/184018#issuecomment-4022697189
2026-03-09 11:07:09 -04:00
Alexey Bataev
e25e010b96 [SLP][NFC]Add a test for bswap of i64 by 2 i32 bswaps 2026-03-08 13:02:15 -07:00
Alexey Bataev
95919ecd57
[SLP]Allow bitcast/bswap based reductions for types, larger than the total strided size
Added support for zero extending the bitcasted/bswapped type to the
original type, if it is larger than the original scalar type

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/184018
2026-03-08 10:37:09 -04:00