2487 Commits

Author SHA1 Message Date
Alexis Engelke
d4c22859db
[DomTree] Assert non-null block for pre-dom tree (#186790)
In a pre-dominator tree, blocks should never be null.
2026-03-16 16:07:49 +01:00
Alexey Bataev
50822d6b25 [SLP]Do not request the last instruction for first buildvector nodes with no state
When looking for a match for a gather/buildvector node whose root is the
first node, which is also a buildvector/gather node and has no state, skip
the analysis for such nodes to prevent a compiler crash.

Fixes #185851
2026-03-11 10:11:09 -07:00
Alexey Bataev
aa90add989 [SLP]Track vectorized values in reductions for correct handling between vectorization
Use a WeakTrackingVH handle instead of a plain Value * to correctly
track vectorized instructions that are modified or replaced.

Fixes https://github.com/llvm/llvm-project/pull/182760#issuecomment-4036706233
2026-03-11 06:05:08 -07:00
Alexey Bataev
c7bd3062f1 Revert "[SLP] Loop aware cost model/tree building"
This reverts commit 8963edb534e28d548d8381675bb18af1770c3041 to fix
miscompilations and compile-time regressions reported in https://github.com/llvm/llvm-project/pull/150450#issuecomment-4037224288, https://github.com/llvm/llvm-project/pull/150450#issuecomment-4037481719 and https://github.com/llvm/llvm-project/pull/150450#issuecomment-4038134121
2026-03-11 04:37:54 -07:00
Alexey Bataev
8963edb534
[SLP] Loop aware cost model/tree building
Currently, the SLP vectorizer does not take loops and their trip counts
into account. This may lead to inefficient vectorization in some cases.
The patch adds loop nest-aware tree building and cost estimation.
During tree building, it now checks that the tree does not span
different loop nests. Nodes from other loop nests immediately become
buildvector nodes.
The cost model now takes the loop trip count into account. If the trip
count is unknown, a default value is used, controlled by the
-slp-cost-loop-min-trip-count=<value> option. The cost of the vector
nodes in the loop is multiplied by the trip count, because each vector
node is executed trip-count times. This gives a more accurate cost
estimate.
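
A minimal sketch of the trip-count scaling described above, written as standalone C++ rather than the actual SLPVectorizer code. It assumes each tree node has a known cost delta (vector cost minus scalar cost) and that nodes outside the loop, such as buildvectors fed from another loop nest, execute once. `NodeCost`, `treeCost`, and the fallback `kAssumedDefaultTripCount` are hypothetical; in the patch the fallback comes from `-slp-cost-loop-min-trip-count`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-node cost record: (vector cost - scalar cost) and loop placement.
struct NodeCost {
  int64_t delta;  // negative means vectorizing this node saves cost
  bool inLoop;    // true if the node executes once per loop iteration
};

// Hypothetical fallback used when the trip count is unknown.
constexpr uint64_t kAssumedDefaultTripCount = 100;

// Sums node deltas, scaling only in-loop nodes by the trip count.
int64_t treeCost(const std::vector<NodeCost>& nodes,
                 std::optional<uint64_t> tripCount) {
  const int64_t tc =
      static_cast<int64_t>(tripCount.value_or(kAssumedDefaultTripCount));
  int64_t total = 0;
  for (const NodeCost& n : nodes)
    total += n.inLoop ? n.delta * tc : n.delta;  // in-loop nodes run tc times
  return total;
}
```

With one in-loop node saving 2 per iteration and one out-of-loop node costing 10, the tree is unprofitable for a trip count of 1 (total 8) but profitable for a trip count of 100 (total -190).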

Reviewers: jdenny-ornl, vporpo, hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/150450
2026-03-10 16:14:57 -04:00
tudinhh
c192e8c9e3
[SLP] Fix misvectorization in commutative to non-commutative conversion (#185230)
**Summary**
Fixes a miscompilation where commutative operations (e.g., or, and, mul)
with a left-hand side constant were incorrectly transformed into
non-commutative operations (e.g., shl, sub).

**The Problem**
In `BinOpSameOpcodeHelper::getOperand`, when a constant is at
`Pos == 0`, the helper failed to swap the operand order for new
non-commutative target opcodes. This resulted in inverted logic, such as
transforming `or 0, %x` into `shl 0, %x` (resulting in 0) instead of the
correct `%x << 0`.

**The Fix**
The existing logic only protected the Sub opcode. This patch generalizes
the fix to all non-commutative instructions by using
`!Instruction::isCommutative(ToOpcode)`. This ensures that for any
directional operation, the variable is correctly placed on the LHS and
the constant on the RHS.
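
A simplified model of the fixed operand ordering, assuming the constant originally sits at position 0 or 1 of a commutative instruction. `Opcode`, `isCommutative`, and `orderOperands` are hypothetical stand-ins, not the real `BinOpSameOpcodeHelper` API.

```cpp
#include <string>
#include <utility>

// Hypothetical opcode subset; not LLVM's Instruction::BinaryOps.
enum class Opcode { Or, And, Mul, Add, Shl, Sub, AShr };

// Mirrors the intent of Instruction::isCommutative for this subset.
bool isCommutative(Opcode op) {
  switch (op) {
  case Opcode::Or:
  case Opcode::And:
  case Opcode::Mul:
  case Opcode::Add:
    return true;
  default:
    return false;
  }
}

using Operands = std::pair<std::string, std::string>;

// constPos: position (0 = LHS, 1 = RHS) of the constant in the original
// commutative instruction; toOpcode: opcode the bundle is converted to.
Operands orderOperands(Opcode toOpcode, int constPos, const std::string& var,
                       const std::string& cst) {
  if (constPos == 1)
    return {var, cst};  // already "variable op constant"
  // Constant on the LHS: any directional opcode must see the variable first
  // (or 0, %x must become shl %x, 0, not shl 0, %x).
  if (!isCommutative(toOpcode))
    return {var, cst};
  return {cst, var};  // commutative target: the original order is fine
}
```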

**Changes**
SLPVectorizer.cpp: Replaced the specific Sub check with a general
isCommutative check.

Regression Test: Added lhs-constant-non-cummutative.ll to cover shl,
sub, and ashr targets.

Fixes #185186
2026-03-09 16:17:39 -04:00
Alexey Bataev
95919ecd57
[SLP]Allow bitcast/bswap based reductions for types, larger than the total strided size
Adds support for zero-extending the bitcasted/bswapped value to the
original type when the original type is larger than the total strided size.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/184018
2026-03-08 10:37:09 -04:00
Alexey Bataev
e0e5000ea7
[SLP]Remove Alternate early profitability checks in favor of throttling
Removes the early check, which may prevent some further optimizations,
in favor of tree throttling.

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/182760
2026-03-08 09:37:51 -04:00
Alexey Bataev
d8b718a3fa [SLP]Match the mask size, when copying mask for full match
When filling the mask for fully matched nodes, take care: the masks may
differ in size.

Fixes a crash reported in test/Transforms/SLPVectorizer/X86/mask-size-less-common-mask.ll
2026-03-08 05:33:30 -07:00
Alexey Bataev
a96a0ded25 [SLP]Fix the matching of the nodes with the same scalars, but reused
If the scalars are reused and ReuseShuffleIndices is set, we may miss a
match for buildvector/gather nodes and add extra cost.
2026-03-07 10:29:34 -08:00
Alexey Bataev
2714583317 [SLP]Do not consider split vectorize nodes as vector phi nodes
Split vectorize nodes should not be considered vector PHI nodes when
trying to find the insertion point for the postponed nodes.

Fixes #184585
2026-03-04 14:03:20 -08:00
Alexey Bataev
789bf51f0c [SLP]Do not consider condition with multiple uses and negate predicate as a candidate for inversed select
If the select/zext comparison has a negated predicate and is used in
several places, it should not be considered a candidate for the inversed
zext/select pattern: it would be replaced by a negated vector predicate,
leading to incorrect codegen for its other uses.
2026-03-01 12:01:19 -08:00
Alexey Bataev
9730d31284 [SLP]Fix types for reductions in revec
Consider vector inputs when building casts for the reduced values.

Fixes #170828
2026-03-01 07:54:13 -08:00
Alexey Bataev
a6e7c38ea6 [SLP]Do not vectorize select nodes with scalar and vector conditions
If a select node contains selects with mixed scalar/vector conditions,
it should not be revectorized.

Fixes #170836
2026-03-01 07:01:04 -08:00
Alexey Bataev
e317f42455 [SLP]Recalculate dependencies for the buildvector schedule node, if they have copyable node
Recalculate the dependencies for all buildvector nodes with copyable
dependencies to prevent a compiler crash during instruction scheduling.
2026-02-28 12:29:47 -08:00
Alexey Bataev
12e1075b64 [SLP]Fix operand reordering when estimating profitability of operands
Swap the operands of a single instruction, not the same lane of the
first and second instructions in the list.
2026-02-27 16:16:22 -08:00
Akash Dutta
cf28f23f10
[SLP] Reject duplicate shift amounts in matchesShlZExt reorder path (#183627)
In the reordered RHS path of matchesShlZExt, the code never checked that
each shift amount (0, Stride, 2×Stride, …) appears at most once. When
the same shift appeared in multiple lanes, it still filled Order,
producing a non-permutation (e.g. Order = [0,0,0,1]). That led to bad
shuffle masks and miscompilation (e.g. shuffles with poison).

The patch adds an explicit duplicate check: before setting Order[Idx] =
Pos, it ensures Pos has not been seen before, using a SmallBitVector
SeenPositions(VF). If a position is seen twice, the function returns
false and the optimization is not applied.
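
A standalone sketch of the duplicate check, assuming lane `Idx` with shift amount `k * Stride` maps to `Order[Idx] = k`. `std::vector<bool>` stands in for `SmallBitVector`, and `orderFromShifts` is a hypothetical name, not the actual `matchesShlZExt` signature.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Builds a lane order from per-lane shift amounts (shift k*stride -> position k).
// Returns std::nullopt if a shift is not a multiple of stride, is out of range,
// or repeats, so the result is always a permutation.
std::optional<std::vector<unsigned>> orderFromShifts(
    const std::vector<unsigned>& shifts, unsigned stride) {
  const std::size_t vf = shifts.size();
  std::vector<unsigned> order(vf);
  std::vector<bool> seenPositions(vf, false);  // like SmallBitVector SeenPositions(VF)
  for (std::size_t idx = 0; idx < vf; ++idx) {
    if (stride == 0 || shifts[idx] % stride != 0)
      return std::nullopt;
    const unsigned pos = shifts[idx] / stride;
    if (pos >= vf || seenPositions[pos])
      return std::nullopt;  // duplicate shift amount: not a permutation
    seenPositions[pos] = true;
    order[idx] = pos;
  }
  return order;
}
```

With shifts `{0, 0, 0, 8}` and a stride of 8, the unchecked logic would produce `Order = [0,0,0,1]`; the sketch rejects it instead.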
2026-02-27 13:00:58 -06:00
Alexey Bataev
c08079d8e7 [SLP]Add single-use check for the bitcasted reduction
If the reduced value to be bitcast is used multiple times, it requires
emitting an extractelement instruction. Such nodes should not be
bitcast; they should be vectorized as regular vector instructions.

Fixes https://github.com/llvm/llvm-project/pull/181940#issuecomment-3950734168
2026-02-24 05:27:38 -08:00
Alexey Bataev
95a960daa0 [SLP]Do not convert inversed cmp nodes, if they reordered/reused
If a cmp node with inversed compares must be reordered or shuffled for
reuses, disable the transformation for such nodes for now; they require
special processing.

Fixes https://github.com/llvm/llvm-project/pull/181580#issuecomment-3933026221
2026-02-20 06:04:51 -08:00
Alexey Bataev
29d4fea59b [SLP]Handle mixed select-to-bicasts and general reductions
If the reduction tree represents mixed select-to-bitcasts and general
reductions, need to handle them correctly to avoid a compiler crash

Fixes https://github.com/llvm/llvm-project/pull/181940#issuecomment-3929220929
2026-02-19 13:38:34 -08:00
Alexey Bataev
38d804725f [SLP]Do not mark for transforming to buildvector inversed compares
Inversed compares must remain vector nodes; they should not be converted
to gathers, so that correct code is generated.

Fixes issue reported in https://github.com/llvm/llvm-project/pull/181580#issuecomment-3926951332
2026-02-19 09:37:50 -08:00
Alexey Bataev
c6425aa9ae
[SLP]Support reduced or selects of bitmask as cmp bitcast
Converts reduced or(select %cmp, bitmask, 0) to zext(bitcast %vector_cmp to
i<num_reduced_values>) to iN.
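
A scalar C++ model of the equivalence this transform relies on, assuming eight reduced values, a bitmask of `1 << i` for lane `i`, and that bitcasting `<8 x i1>` to `i8` places lane `i` in bit `i` (the little-endian layout). The function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

constexpr int N = 8;  // assumed number of reduced values

// Scalar pattern: or-reduction of select(a[i] > t, 1 << i, 0) in i32.
uint32_t scalarOrOfSelects(const std::array<int, N>& a, int t) {
  uint32_t r = 0;
  for (int i = 0; i < N; ++i)
    r |= (a[i] > t) ? (uint32_t{1} << i) : 0u;
  return r;
}

// Vectorized model: one <8 x i1> compare, bitcast to i8 (lane i -> bit i),
// then zext to i32.
uint32_t vectorCmpBitcast(const std::array<int, N>& a, int t) {
  std::array<bool, N> cmp{};
  for (int i = 0; i < N; ++i)
    cmp[i] = a[i] > t;  // the vector compare
  uint8_t bits = 0;
  for (int i = 0; i < N; ++i)
    bits |= static_cast<uint8_t>(cmp[i] << i);  // bitcast <8 x i1> to i8
  return static_cast<uint32_t>(bits);           // zext i8 to i32
}
```

For `a = {5, 1, 7, 0, 9, 2, 8, 3}` and `t = 4`, both functions return 85 (bits 0, 2, 4, and 6 set).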

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/181940
2026-02-18 18:01:42 -05:00
Alexey Bataev
a7c25ba33c [SLP][NFC]Fix reorered -> reordered 2026-02-18 13:54:39 -08:00
Ryan Buchner
a6e8de7407
[SLP][NFC] Fix MainOp/AltOp assertion to check the correct value (#182093)
Previously both assertions were checking MainOp.

Initial assertion added incorrectly in d41e517748e2d.
2026-02-18 13:08:10 -08:00
Alexey Bataev
a5aaa9dc63
[SLP]Convert compares from zexts, promoted to selects, to inversed op, if improves codegen
Some zext i1 (cmp) + select sequences can be transformed by inverting
the compare predicates to remove extra shuffles. For example,
zext i1 (cmp ne) + select (cmp eq), 0, 2 can be modeled as
select <2 x i1> (cmp ne), <1, 2>, zeroinitializer.
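
A small C++ check of the example above, modeling lane 0 as `zext i1 (a0 != b0)` and lane 1 as `select (a1 == b1), 0, 2`. Inverting lane 1's predicate lets both lanes share one `ne` compare and a single select with constant vectors. The function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Original scalar lanes: zext i1 (a0 != b0) and select (a1 == b1), 0, 2.
std::array<int32_t, 2> originalLanes(int a0, int b0, int a1, int b1) {
  return {static_cast<int32_t>(a0 != b0), (a1 == b1) ? 0 : 2};
}

// Transformed form: select <2 x i1> (cmp ne), <1, 2>, zeroinitializer.
std::array<int32_t, 2> invertedSelect(int a0, int b0, int a1, int b1) {
  const std::array<bool, 2> ne = {a0 != b0, a1 != b1};  // lane 1: eq inverted to ne
  const std::array<int32_t, 2> trueVals = {1, 2};
  std::array<int32_t, 2> out{};
  for (int i = 0; i < 2; ++i)
    out[i] = ne[i] ? trueVals[i] : 0;
  return out;
}
```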

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/181580
2026-02-17 13:26:06 -05:00
Alexey Bataev
26f944bb50 [SLP]Fix an ArrayRef out-of-bounds access in slice
If revec is enabled, the number of parts (registers) may be computed for
the combined node rather than a single-element node, so check for
potential out-of-bounds access.

Fixes #181798
2026-02-17 10:00:13 -08:00
Alexey Bataev
ef52df4365
[SLP]Do not increase depth for type-changing nodes and NotProfitableForVectorization removal
The patch changes the maximum tree size analysis:
1. Do not increase the depth for type-changing nodes (like casts and
compares), allowing deeper trees to be built.
2. Remove the NotProfitableForVectorization workaround, which is no
longer needed now that throttling is enabled.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/180950
2026-02-17 09:52:23 -05:00
Alexey Bataev
2ff4ec172a [SLP]Fix revec in split nodes
Split nodes previously did not support vector entries in revec mode; the
patch fixes the issue by adding analysis for the scale factor.

Fixes #181546
2026-02-16 14:09:28 -08:00
Alexey Bataev
7ec7907b80 [SLP] Fix a very long loads offset, being stored in DenseMap
Added a check for a very long offset to avoid a crash in the compiler

Fixes #181682
2026-02-16 11:07:22 -08:00
Alexey Bataev
255b493673 [SLP]Do not overflow number of the reduced values
Truncate the total number of reduced values in case the number is too
large.

Fixes #181520
2026-02-15 11:02:32 -08:00
Ryan Buchner
f2903793de
[SLP][NFC] Use static_assert to confirm SupportedOps is sorted (#181397)
Can be checked at compile time.
2026-02-13 11:30:59 -08:00
Alexey Bataev
e93829e807 [SLP]Fix crash with deleted non-copyable node in scheduling copyables
If the copyables are part of deleted nodes, check the actual tree to
correctly handle the scheduling of copyables.
2026-02-12 11:42:02 -08:00
Ryan Buchner
95ef1a5c31
[SLP] Use the correct identity when combining binary opcodes with AND/MUL (#180457)
Fixes #180456

Fixes a bug in the following SLP lowering:
```
define void @sub_mul(ptr %p, ptr %s) {
entry:
  %p1 = getelementptr i16, ptr %p, i64 1

  %l0 = load i16, ptr %p
  %l1 = load i16, ptr %p1

  %mul0 = sub i16 %l0, 0
  %mul1 = mul i16 %l1, 5

  %s1 = getelementptr i16, ptr %s, i64 1

  store i16 %mul0, ptr %s
  store i16 %mul1, ptr %s1
  ret void
}
```
to
```
define void @sub_mul(ptr %p, ptr %s) {
entry:
  %tmp0 = load <2 x i16>, ptr %p, align 2
  ; with this fix, the constant becomes <i16 1, i16 5>
  %tmp1 = mul <2 x i16> %tmp0, <i16 0, i16 5>
  store <2 x i16> %tmp1, ptr %s, align 2
  ret void
}
```
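
A sketch of why the identity matters, assuming a lane whose original operation is a no-op (here `sub %l0, 0`) is re-expressed with the bundle's opcode and that opcode's right identity. `BinOp`, `rhsIdentity`, and `apply` are hypothetical helpers on 16-bit values, not SLPVectorizer code.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical opcode subset.
enum class BinOp { Add, Sub, Or, Xor, Shl, And, Mul };

// Right identity c such that (x op c) == x for every 16-bit x.
uint16_t rhsIdentity(BinOp op) {
  switch (op) {
  case BinOp::Add: case BinOp::Sub: case BinOp::Or:
  case BinOp::Xor: case BinOp::Shl:
    return 0;
  case BinOp::And:
    return 0xFFFF;  // all ones
  case BinOp::Mul:
    return 1;
  }
  throw std::invalid_argument("unknown opcode");
}

// Evaluates x op c with 16-bit wraparound (shift amount assumed < 16).
uint16_t apply(BinOp op, uint16_t x, uint16_t c) {
  switch (op) {
  case BinOp::Add: return static_cast<uint16_t>(x + c);
  case BinOp::Sub: return static_cast<uint16_t>(x - c);
  case BinOp::Or:  return static_cast<uint16_t>(x | c);
  case BinOp::Xor: return static_cast<uint16_t>(x ^ c);
  case BinOp::Shl: return static_cast<uint16_t>(uint32_t{x} << c);
  case BinOp::And: return static_cast<uint16_t>(x & c);
  case BinOp::Mul: return static_cast<uint16_t>(uint32_t{x} * c);
  }
  throw std::invalid_argument("unknown opcode");
}
```

Using 0 as the `mul` constant for lane 0 (the bug) zeroes that lane; using 1 preserves `%l0`.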
2026-02-12 09:34:44 -08:00
Alexey Bataev
fc648683cd
[SLP]Add external uses estimations into tree throttling
Adds basic estimates of external uses when calculating the cost of
non-profitable trees. Stores and insertelements are excluded, as they
are very good candidates for vectorization. Also tunes the
buildvector/gather cost using minimum-bitwidth analysis data.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/178024
2026-02-11 16:14:34 -05:00
Alexey Bataev
601364f1ba [SLP]Correctly process deleted gathered loads and short trees
If gathered-load nodes are marked for deletion, actually delete them
from the tree. Also, if the remaining tree is too short (buildvector +
gather node), skip such trees to avoid hanging.

Fixes #180846
2026-02-11 10:27:01 -08:00
Alexey Bataev
54cdd903b8 [SLP]Skip operands comparing on non-matching (but compatible) instructions
If the instructions are compatible but non-matching (for example, a
zext-select pair), skip the operand analysis and simply report them as
matching.
2026-02-11 04:55:29 -08:00
David Sherwood
6f0b8a7ebc
[SLP] Use the correct calling convention for vector math routines (#180759)
When vectorising calls to math intrinsics such as llvm.pow we
correctly detect and generate calls to the corresponding vector
math variant. However, we don't pick up and use the calling
convention for the vector math function. This matters for veclibs
such as ArmPL where the aarch64_vector_pcs calling convention
can improve codegen by reducing the number of registers that
need saving across calls.
2026-02-11 10:52:58 +00:00
Alexey Bataev
78490acb32 [SLP]Support for zext i1 %x modeling as select %x, 1, 0
Model zext i1 %x to iN as select i1 %x, iN 1, iN 0 if there are other
select instructions that can be combined into a bundle.

Fixes #178403

Recommit after revert in 993e1f66afcfe9da03bd813e669eada341b11d2f

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/180635
2026-02-10 12:54:12 -08:00
Alexey Bataev
993e1f66af Revert "[SLP]Support for zext i1 %x modeling as select %x, 1, 0"
This reverts commit 70aebae2a13114f4e3d5e2460c052d8f3de295be to fix
buildbots https://lab.llvm.org/buildbot/#/builders/85/builds/18614
2026-02-10 06:49:53 -08:00
Alexey Bataev
70aebae2a1
[SLP]Support for zext i1 %x modeling as select %x, 1, 0
Model zext i1 %x to iN as select i1 %x, iN 1, iN 0 if there are other
select instructions that can be combined into a bundle.

Fixes #178403

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/180635
2026-02-10 08:59:44 -05:00
Alexey Bataev
fe754dff6d [SLP]Remove LoadCombine workaround after handling of the copyables
LoadCombine pattern handling was added as a workaround for cases where
the SLP vectorizer could not vectorize the code effectively. With
copyables support, it can now handle them directly.

The patch also adds support for the scalar loads [+ bswap] pattern for
byte-sized loads (with reversed bytes for bswap).

Recommit after revert in 6377c86d718232fe60c548dfd7ab439f7ff84df7

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/174205
2026-02-05 11:16:08 -08:00
Alexey Bataev
6377c86d71 Revert "[SLP]Remove LoadCombine workaround after handling of the copyables"
This reverts commit 8dbb9f66e8b14a8a06f1873a2c1b7dce366ed2d6 to fix
buildbot issues https://lab.llvm.org/buildbot/#/builders/224/builds/2795
2026-02-05 09:57:00 -08:00
Alexey Bataev
8dbb9f66e8
[SLP]Remove LoadCombine workaround after handling of the copyables
LoadCombine pattern handling was added as a workaround for cases where
the SLP vectorizer could not vectorize the code effectively. With
copyables support, it can now handle them directly.

The patch also adds support for the scalar loads [+ bswap] pattern for
byte-sized loads (with reversed bytes for bswap).

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/174205
2026-02-05 10:42:08 -05:00
Florian Hahn
05a2b146fb
[LV] Optimize FindLast recurrences to FindIV (NFCI). (#177870)
This patch restructures Find(First|Last)IV handling. Instead of
differentiating between FindLast, FindFirstIV and FindLastIV up front,
this patch simplifies the logic in IVDescriptor to just identify the
FindLast pattern up-front.

It then adds a new VPlan transformation to optimize FindLast reductions
to FindIV reductions if there is a suitable sentinel value.
It also merges the Find(Last|First)IV recurrence kinds into a single FindIV kind.

This is simpler and more accurate, given that selecting the first/last
induction of the final IV reduction is directly controlled by the
corresponding recurrence kind of the ComputeReductionResult.

The new structure also allows further optimizations, like vectorizing
FindLastIV with another boolean reduction that tracks if the condition
in the loop was ever true, if there is no suitable sentinel value.
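
A scalar C++ illustration of the FindLast-to-FindIV idea, not the VPlan transform itself. It assumes the induction variable increases monotonically from 0 and that `INT64_MIN` can serve as the sentinel because no IV value equals it; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar FindLast: remember the last index where the condition held.
int64_t findLastScalar(const std::vector<int>& a, int threshold, int64_t start) {
  int64_t last = start;
  for (int64_t i = 0; i < static_cast<int64_t>(a.size()); ++i)
    if (a[i] > threshold)
      last = i;
  return last;
}

// FindIV form: since the IV only increases, the last matching index is the
// max over iterations of (cond ? i : sentinel). The sentinel must be below
// every IV value; the final select maps "never matched" back to start.
int64_t findLastAsMaxReduction(const std::vector<int>& a, int threshold,
                               int64_t start) {
  constexpr int64_t kSentinel = std::numeric_limits<int64_t>::min();
  int64_t red = kSentinel;
  for (int64_t i = 0; i < static_cast<int64_t>(a.size()); ++i)
    red = std::max(red, a[i] > threshold ? i : kSentinel);  // order-independent
  return red == kSentinel ? start : red;
}
```

Because `max` can combine partial results in any order, the second form can be split across vector lanes, which is what a suitable sentinel buys.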

PR: https://github.com/llvm/llvm-project/pull/177870
2026-02-05 13:57:20 +00:00
Alexey Bataev
46a38488a4 [SLP]Disable modeling disjoint reduction or as bitcast for big endian
On big-endian targets, the reduction cannot be modeled as a bitcast; it
would need to be supported as a reversal/bswap instead, so it is
disabled for now.
2026-02-03 06:16:25 -08:00
Ryan Buchner
e5b99502d7
[SLP] Avoid adding duplicate VFs into vectorizeStores()::CandidateVFs (#179296)
Small compile time improvement:
```
stage1-O3: (-0.01%)
stage1-ReleaseThinLTO (-0.00%)
stage1-ReleaseLTO-g (-0.01%)
stage1-O0-g (-0.00%)
stage1-aarch64-O3 (+0.01%)
stage1-aarch64-O0-g (-0.02%)
stage2-O3 (-0.00%)
stage2-O0-g (-0.03%)
stage2-clang (+0.00%)
```

Also changes/removes a few comments for clarity.
2026-02-02 11:03:27 -08:00
Ryan Buchner
b936771eea
[SLP][NFC] Refactor vectorizeStores::RangeSizes (#177241)
Currently `RangeSizes` is used to allow us to skip trying to vectorize
clearly unprofitable trees by caching the `TreeSizes` of prior attempts. This
PR refactors that logic to simplify and improve readability. This will
make it easier to handle the strided stores.

Switches RangeSizes to use `first` as the location to lookup values from, and `second` as the location to store values to. `first` gets updated by `second` at the appropriate times to match the behavior prior to this change.
2026-01-30 10:25:15 -08:00
Alexey Bataev
b73122d5b7 [SLP]Cast incoming value to a propr type for int nodes, bitcasted to fp
Before casting the value to FP type, need to check, if the type for
reduced during minbitwidth analysis and need to restore the original
source type to generate correct bitcast operation.

Fixes #178884
2026-01-30 08:51:03 -08:00
Alexey Bataev
2ea77ed013
[SLP]Support for bswap pattern for bytes-based disjoint or reductions
If the reduction forms a reversed bitcast, it can be represented as a
bitcast + bswap when the source elements are byte-sized.
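
A sketch of the byte-level identity behind this pattern, assuming four `i8` sources and a little-endian target where `bitcast <4 x i8> to i32` puts byte `i` in bits `[8i, 8i + 8)`. The function names are hypothetical; `bswap32` is a portable stand-in for `llvm.bswap.i32`.

```cpp
#include <array>
#include <cstdint>

// Disjoint-or reduction placing byte i at the most-significant end
// (a "reversed bitcast"); the shifted ranges never overlap.
uint32_t reversedOrReduction(const std::array<uint8_t, 4>& b) {
  uint32_t r = 0;
  for (int i = 0; i < 4; ++i)
    r |= static_cast<uint32_t>(b[i]) << (8 * (3 - i));
  return r;
}

// Model of bitcast <4 x i8> to i32 on a little-endian target.
uint32_t littleEndianBitcast(const std::array<uint8_t, 4>& b) {
  uint32_t r = 0;
  for (int i = 0; i < 4; ++i)
    r |= static_cast<uint32_t>(b[i]) << (8 * i);
  return r;
}

// Portable 32-bit byte swap.
uint32_t bswap32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}
```

For bytes `{0x11, 0x22, 0x33, 0x44}`, the reversed reduction yields `0x11223344`, the bitcast yields `0x44332211`, and byte-swapping the bitcast recovers the reduction.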

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/178513
2026-01-29 11:28:01 -05:00
Alexey Bataev
f6fe6fc86a [SLP]Do not vectorize subtrees of the split node, marked as gathers.
If a split node was marked as a gather/buildvector node, the vectorizer
should not vectorize its subtrees, which are marked as deleted.
2026-01-28 17:44:38 -08:00