2326 Commits

Author SHA1 Message Date
Alexey Bataev
8c41859a21 [SLP]Clear the operands deps of non-schedulable nodes, if previously all operands were copyable
If all operands of the non-schedulable nodes were previously only
copyables, need to clear the dependencies of the original schedule data
for such copyable operands and recalculate them to correctly handle
  number of dependecies.

Fixes #159406
2025-09-18 12:11:33 -07:00
Alexey Bataev
f2301be0e8 [SLP]Add a check if the user itself is commutable
If the commutable instruction can be represented as a non-commutable
vector instruction (like add 0, %v can be represented as a part of sub
nodes with operation sub %v, 0), its operands might still be reordered
and this should be accounted when checking for copyables in operands

Fixes #158293
2025-09-15 12:50:03 -07:00
Mikhail Gudim
ee3a4f4c94
[SLPVectorizer] Test -1 stride loads. (#158358)
Add a test to generate -1 stride load and flags to force this behaviour.
2025-09-14 15:29:28 -04:00
Antonio Frighetto
370607065d
[llvm] Regenerate test checks including TBAA semantics (NFC)
Tests exercizing TBAA metadata (both purposefully and not), and
previously generated via UTC, have been regenerated and updated
to version 6.
2025-09-12 20:01:17 +02:00
Alexey Bataev
0dddfab54c [SLP]Recalculate deps if the original instruction scheduled after being copyable
If the original instruction is going to be scheduled after same
instruction being scheduled as copyable, need to recalculate
dependencies. Otherwise, the dependencies maybe calculated incorrectly.
2025-09-10 10:18:45 -07:00
Alexey Bataev
d0ea176cce [SLP]Do not consider SExt/ZExt profitable for demotion, if the user is a bitcast to float
If the user node of the SExt/ZExt node is a bitcast to a float point
type, the node itself should not be considered legal to demote, since
still the casting is required to match the size of the float point type.

Fixes #157277
2025-09-08 07:59:01 -07:00
Alexey Bataev
fd93dc5ac5 [SLP]Correctly schedule standalone schedule data, which is part of tree entry
If a standalone schedule data relates to a vectorized instruction, still
need to schedule it as a part of pseudo-bundle to correctly handle
dependencies between its child nodes.
2025-09-07 17:08:37 -07:00
Alexey Bataev
c4d927ce09 Revert "[SLP]Correctly schedule standalone schedule data, which is part of tree entry"
This reverts commit 57cae2b6a275a8eb3bc8935973263ed84535fb81 to fix
a buildbot https://lab.llvm.org/buildbot/#/builders/169/builds/14776
2025-09-07 13:27:12 -07:00
Alexey Bataev
57cae2b6a2 [SLP]Correctly schedule standalone schedule data, which is part of tree entry
If a standalone schedule data relates to a vectorized instruction, still
need to schedule it as a part of pseudo-bundle to correctly handle
dependencies between its child nodes.
2025-09-07 10:54:40 -07:00
Phoebe Wang
94b164c218
[X86][AVX10] Remove EVEX512 and AVX10-256 implementations (#157034)
The 256-bit maximum vector register size control was removed from AVX10
whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343

We have warned these options in LLVM21 through #132542. This patch
removes underlying implementations in LLVM22.
2025-09-05 14:08:59 +00:00
Alexey Bataev
9a3aedb093 [SLP]Do not try to schedule bundle with non-schedulable parent with commutable instructions
Commutable instruction can be reordering during tree building, and if
the parent node is not scheduled, its ScheduleData elements are
considered independent and compiler do not looks for reordered operands.
Need to cancel scheduling of copyables in this case.
2025-09-04 12:57:14 -07:00
Alexey Bataev
005f0fa40e [SLP]Improved/fixed FMAD support in reductions
In the initial patch for FMAD, potential FMAD nodes were completely
excluded from the reduction analysis for the smaller patch. But it may
cause regressions.

This patch adds better detection of scalar FMAD reduction operations and
tries to correctly calculate the costs of the FMAD reduction operations
(also, excluding the costs of the scalar fmuls) and split reduction
operations, combined with regular FMADs.

Fixed the handling for reduced values with many uses.

Reviewers: RKSimon, gregbedwell, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/152787
2025-09-02 13:09:57 -07:00
Alexey Bataev
6d902b67cd Revert "[SLP]Improved/fixed FMAD support in reductions"
This reverts commit 74230ff2791384fb3285c9e9ab202056959aa095 to fix the
bugs found during local testing.
2025-09-02 07:58:29 -07:00
Alexey Bataev
74230ff279
[SLP]Improved/fixed FMAD support in reductions
In the initial patch for FMAD, potential FMAD nodes were completely
excluded from the reduction analysis for the smaller patch. But it may
cause regressions.

This patch adds better detection of scalar FMAD reduction operations and
tries to correctly calculate the costs of the FMAD reduction operations
(also, excluding the costs of the scalar fmuls) and split reduction
operations, combined with regular FMADs.

Reviewers: RKSimon, gregbedwell, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/152787
2025-09-01 17:01:36 -04:00
Alexey Bataev
a80a1988f7
[SLP]Better support for copyable values in stores
Currently stores are sorted by the stored values instruction types,
which do not include analysis for copyables. The compiler may miss some
potential vectorization opportunities because of that. Patch adds
detection of the copyables in stored values.

Reviewers: hiraditya, HanKuanChen, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/153213
2025-09-01 16:09:52 -04:00
Sam Tebbs
37127f74f4
[LV] Bundle sub reductions into VPExpressionRecipe (#147255)
This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. -> https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-09-01 17:25:01 +01:00
Alexey Bataev
7730ebce8e [SLP]Do not to try to revectorize previously vectorized phis in loops
No need to try to revectorize previously vectorized phis in loops, it leads to
a compile time blow-up.

Fixes #155998
2025-08-31 10:54:20 -07:00
Alexey Bataev
e5a4ea20c5 [SLP]Do not remove reduced value, if it is a copyable
If the value is checked for the reduction and it is a copyable element
in a root node, it should not be deleted, since it may still be used
after vectorization.

Fixes #155512
2025-08-31 09:09:39 -07:00
Alexey Bataev
eb39605192 [SLP]Do not schedule terminate copyable from main op basic block
If the copyable instruction is a terminate instruction from the same
block, as the potential main instruction, such instruction cannot be
copyable and the value list cannot be modeled as instructions with same
(and copyables) opcodes.

Fixes #155183
2025-08-30 18:05:08 -07:00
Alexey Bataev
2824b3c00e [SLP] Try to recalculate deps only for nodes with previously valid deps
Need to recalculate the dependencies only for nodes, which have valid
deps before they gets cleared because of the copyable nodes. Otherwise,
no need to recaculate the dependencies to prevent a crash.
2025-08-30 14:20:50 -07:00
Mikhail Gudim
fe6b611d58
[RISCV] Unaligned vec mem => prefer alt opc vec
Return `true` in `RISCVTTIImpl::preferAlternateOpcodeVectorization` if
subtarget supports unaligned memory accesses.
2025-08-30 04:56:01 -04:00
Mikhail Gudim
fda67dc5b7
[RISCV][NFC] Precommit a test for SLP behavior...
when the subtarget has unaligned-vector-mem feature.
2025-08-29 22:09:46 +00:00
Alexey Bataev
b157599156 [SLP]Do not include copyable data to the same user twice
If the copyable schedule data is created and the user is used several
times in the user node, no need to count same data for the same user
several times, need to include it only ones.

Fixes #153754
2025-08-15 12:36:45 -07:00
Alexey Bataev
09f5b9ab0a Revert "[SLP]Do not include copyable data to the same user twice"
This reverts commit 758c6852c3ffe6b5e259cafadd811e60d8c276fb to fix
buildbot  https://lab.llvm.org/buildbot/#/builders/195/builds/13298
2025-08-15 12:08:31 -07:00
Alexey Bataev
758c6852c3 [SLP]Do not include copyable data to the same user twice
If the copyable schedule data is created and the user is used several
times in the user node, no need to count same data for the same user
several times, need to include it only ones.

Fixes #153754
2025-08-15 11:47:35 -07:00
Alexey Bataev
13b54f7dc1 [SLP] Recalculate dependencies for potential control dependencies if cleared
If the control dependecies are cleared after calcellation of the
copyables, need to reclculate them unconditionally.

Fixes #153754 #153676
2025-08-15 07:52:10 -07:00
Alexey Bataev
bf2f241458 [SLP]Support LShr as base for copyable elements
Added support for LShr instructions as base for copyable elements. Also,
added simple analysis for best base instruction selection, if multiple
candidates are available.

Fixed scheduling after cancellation

Reviewers: hiraditya, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/153393
2025-08-14 19:12:27 -07:00
Alex Bradbury
db5f7dc374 Revert "[SLP]Support LShr as base for copyable elements"
This reverts commit ca4ebf95172d24f8c47655709b2c9eb85bda5cb2.

Causes compile-time crashes for some inputs with RVV zvl512b/zvl1024b
configurations. See here for a minimal reproducer:
https://github.com/llvm/llvm-project/pull/153393#issuecomment-3189898813
2025-08-14 22:18:24 +01:00
David Green
5836bae463
[AArch64] Change the cost of fma and fmuladd to match fmul. (#152963)
As fmul and fmadd are so similar, their performance characteristics tend
to be the same on most platforms, at least in terms of reciprocal
throughputs. Processors capable of performing a given number of fmul per
cycle can usually perform the same number of fma, with the extra add
being relatively simple on top. This patch makes the scores of the two
operations the same, which brings the throughput cost of a fma/fmuladd
to 2, and the latency to 3, which are the defaults for fmul.

Note that we might also want to change the throughput cost of a fmul to
1, as most processors have ample bandwidth for them, but they should
still stay in-line with one another.
2025-08-14 21:53:45 +01:00
Alexey Bataev
ca4ebf9517
[SLP]Support LShr as base for copyable elements
Added support for LShr instructions as base for copyable elements. Also,
added simple analysis for best base instruction selection, if multiple
candidates are available.

Reviewers: hiraditya, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/153393
2025-08-14 12:35:28 -04:00
Alexey Bataev
d57ab276b6 [SLP] Recalculate cleared deps for potential control schedule data nodes
Need to recalculate the dependencies for all potential control data
schedule nodes to prevent compiler crash.

Fixes #153571
2025-08-14 09:00:42 -07:00
Alexey Bataev
dd5ba694bd [SLP]Recalculate deps for potential control-dependent schedule data
After clearing the dependencies in copyable data, need to recalculate
dependencies for the original ScheduleData, if it can be marked as
control dependent.

Fixes #153289
2025-08-13 08:18:26 -07:00
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
Alexey Bataev
2d7b55a028
[SLP]Initial support for copyable elements
Adds initial support for copyable elements, both schedulable and
non-schedulable.
Adds support only for add for now, other opcodes will added in future.
Still some cases are not handled, e.g. stores do not include this,
because currently do not check for copyable elements.

Reviewers: hiraditya, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/147366
2025-08-11 09:41:19 -04:00
Alexey Bataev
67af2f6c5c [SLP]Initial FMAD support (#149102)
Added initial check for potential fmad conversion in reductions and
operands vectorization.

Added the check for instruction to fix #152683

Skipped the code for reduction to avoid regressions.
2025-08-11 05:53:55 -07:00
David Green
cfe190979e Revert "[SLP]Initial FMAD support (#149102)"
This reverts commit 0fffb9f9ed81f4c2084b8fe040c88b60bb6c372a due to major
performance regressions.
2025-08-10 15:16:01 +01:00
Alexey Bataev
0fffb9f9ed [SLP]Initial FMAD support (#149102)
Added initial check for potential fmad conversion in reductions and
operands vectorization.

Added the check for instruction to fix #152683
2025-08-08 10:30:23 -07:00
Alexey Bataev
0419b459be Revert "[SLP]Initial FMAD support (#149102)"
This reverts commit 0bcf45ea3458ba79eb4257afcfd6af954292c9ce to fix the
regresions, reported in https://github.com/llvm/llvm-project/issues/152683
2025-08-08 09:17:59 -07:00
Alexey Bataev
adae370805 [SLP][NFC]Cleanup undefs and the whole test, NFC 2025-08-07 13:41:22 -07:00
Alexey Bataev
0bcf45ea34
[SLP]Initial FMAD support (#149102)
Added initial check for potential fmad conversion in reductions and
operands vectorization.
2025-08-07 09:51:43 -04:00
Ramkumar Ramachandra
edeee824f0
Reland [VectorUtils] Trivially vectorize ldexp, [l]lround (#152476)
Changes: The original patch, landed as 1336675, was reverted due to a
bug in LoopVectorize resulting in a crash. The bug has now been fixed by
95c32bf ([VPlan] Return invalid cost if any skeleton block has invalid
costs), and this reland is identical to the original patch.
2025-08-07 12:07:29 +01:00
Mikhail Gudim
3404c0b013
Slp basic test (#152355)
Add a basic test for SLPVectorizer to make sure that upcoming
refactoring patches don't break anything. Also, record a test for a
missed opportunity.
2025-08-06 14:54:50 -04:00
Alexey Bataev
e27831ff9b [SLP] Fix a check for main/alternate interchanged instruction
If the instruction is checked for matching the main instruction, need to
check if the opcode of the main instruction is compatible with the
operands of the instruction. If they are not, need to check the
alternate instruction and its operands for compatibility and return
alternate instruction as a match.

Fixes #151699

Fixed check for non-supported binary operations.
2025-08-04 11:20:54 -07:00
Michael Halkenhäuser
70af09e3a1
Revert "[SLP] Fix a check for main/alternate interchanged instruction" (#151997)
This reverts commit 3ee8d047109ea4bb479095f4b153c2120a8d726c.

Revert reason: FAILED build for openmp-offload-amdgpu-runtime-2 
https://lab.llvm.org/buildbot/#/builders/10/builds/10827
2025-08-04 12:57:20 -04:00
Alexey Bataev
3ee8d04710 [SLP] Fix a check for main/alternate interchanged instruction
If the instruction is checked for matching the main instruction, need to
check if the opcode of the main instruction is compatible with the
operands of the instruction. If they are not, need to check the
alternate instruction and its operands for compatibility and return
alternate instruction as a match.

Fixes #151699
2025-08-04 08:31:35 -07:00
Alexey Bataev
7cd1ce3aa0 [SLP]Check vector-like instruction for dominance in copyables
Need to check if the vector-like instruction is dominated by main
operation in the copyables to prevent broken def-use chain

Fixes #151456
2025-08-04 06:14:19 -07:00
David Green
b30d5315b7
[AArch64] Add better fcmp costs for expanded predicates (#147940)
Certain fcmp predicates need to be expanded into multiple operations and
or'd together. This adds some more accurate cost modelling for them
based on the predicate. Unsupported operations are given the cost of a
libcall and the latency is set to 2 as that seemed to be fairly common
between different CPUs.
2025-08-04 13:42:57 +01:00
Muhammad Omair Javaid
176d54aa33 Revert "[VectorUtils] Trivially vectorize ldexp, [l]lround (#145545)"
This reverts commit 13366759c3b9db9366659d870cc73c938422b020.

This broke various LLVM testsuite buildbots for AArch64 SVE, but the
problem got masked because relevant buildbots were already failing
due to other breakage.

It has broken llvm-test-suite test:
gfortran-regression-compile-regression__vect__pr106253_f.test

https://lab.llvm.org/buildbot/#/builders/4/builds/8164
https://lab.llvm.org/buildbot/#/builders/17/builds/9858
https://lab.llvm.org/buildbot/#/builders/41/builds/8067
https://lab.llvm.org/buildbot/#/builders/143/builds/9607
2025-08-01 01:24:52 +05:00
Ramkumar Ramachandra
13366759c3
[VectorUtils] Trivially vectorize ldexp, [l]lround (#145545) 2025-07-29 19:23:09 +01:00
Simon Pilgrim
0fa0ce1f3a
[CostModel][X86] Update SK_Broadcast based on cost kinds (#150620)
When these were converted to CostKindTblEntry the throughput was mainly copied to all cost kinds

Regenerated with my check_cost_tables.py helper script
2025-07-26 13:52:47 +01:00