2379 Commits

Author SHA1 Message Date
Alexey Bataev
b988555812 [SLP]Check if the extractelement is part of other buildvector node before marking for erasing
Need to check if the extractelement instruction is part of another
buildvector node before trying to mark it for deletion; otherwise,
the compiler may reuse the deleted instruction.

Fixes #172221
2025-12-15 09:54:05 -08:00
Philip Ginsbach-Chen
1d821b0c6b
[AArch64] use isTRNMask to calculate shuffle costs (#171524)
This builds on #169858 to fix the divergence in codegen
(https://godbolt.org/z/a9az3h6oq) between two very similar
functions initially observed in #137447 (represented in the diff by test
cases `@transpose_splat_constants` and `@transpose_constants_splat`):
```
int8x16_t f(int8_t x)
{
  return (int8x16_t) { x, 0, x, 1, x, 2, x, 3,
                       x, 4, x, 5, x, 6, x, 7 };
}

int8x16_t g(int8_t x)
{
  return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
                       4, x, 5, x, 6, x, 7, x };
}
```

The PR uses an additional `isTRNMask` call in
`AArch64TTIImpl::getShuffleCost` to ensure that we treat shuffle masks
as transpose masks even if `isTransposeMask` fails to recognise them
(meaning that `Kind == TTI::SK_Transpose` cannot be relied upon).

Follow-up work could consider modifying `isTransposeMask`, but that
would also affect backends other than AArch64.
2025-12-15 05:34:30 +00:00
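To make the mask shapes concrete, here is a small C sketch of TRN1/TRN2 mask recognition, including the operand-swapped form that distinguishes `f` from `g` above. The names `is_trn_mask` and `matches_trn` are hypothetical and this is not LLVM's `isTRNMask`; it assumes mask entries index the concatenation of two n-element inputs (0..n-1 is the first input, n..2n-1 the second) and that -1 marks an undef lane.

```c
#include <stdbool.h>
#include <stddef.h>

/* Does `mask` match TRN1 (w == 0) or TRN2 (w == 1)? If `swap` is true the
   operands are commuted, so the even result lanes read the second input. */
static bool matches_trn(const int *mask, size_t n, int w, bool swap) {
  int even_base = swap ? (int)n : 0; /* input feeding even result lanes */
  int odd_base = swap ? 0 : (int)n;  /* input feeding odd result lanes */
  for (size_t i = 0; i < n; i += 2) {
    int even = even_base + (int)i + w;
    int odd = odd_base + (int)i + w;
    if (mask[i] != -1 && mask[i] != even)
      return false;
    if (mask[i + 1] != -1 && mask[i + 1] != odd)
      return false;
  }
  return true;
}

/* Hypothetical classifier: returns true for a transpose mask, reporting
   TRN1/TRN2 in *which and whether the operands are commuted in *swapped. */
static bool is_trn_mask(const int *mask, size_t n, int *which, bool *swapped) {
  if (n < 2 || n % 2 != 0)
    return false;
  for (int s = 0; s < 2; ++s)
    for (int w = 0; w < 2; ++w)
      if (matches_trn(mask, n, w, s != 0)) {
        *which = w;
        *swapped = s != 0;
        return true;
      }
  return false;
}
```

For four lanes, `{0, 4, 2, 6}` is TRN1 and `{4, 0, 6, 2}` is the same transpose with the operands commuted; checking both orders is what lets two mirror-image inputs be costed alike.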
Alexey Bataev
f8d0c355f5 [SLP]Prefer instructions, ued outside the block, as the initial main copyable instructions
Instructions used outside the block must be considered the first
choice for the main instructions in the copyable nodes, to avoid
use-before-def.

Fixes #171055
2025-12-08 09:46:15 -08:00
Nikita Popov
d0f5a49fb6
[Support] Support debug counters in non-assertion builds (#170468)
This enables the use of debug counters in (non-assertion) release
builds. This is useful to enable debugging without having to switch to
an assertion-enabled build, which may not always be easy.

After some recent improvements, always supporting debug counters no
longer has measurable overhead.
2025-12-03 16:21:47 +01:00
Alexey Bataev
216b9fa227 [SLP][NFC]Add another test with the user with multiple copyable operands, NFC 2025-11-26 14:28:16 -08:00
Alexey Bataev
66e18b86b8 [SLP][NFC]Add a test with single op inst, used in many nodes, NFC. 2025-11-26 11:15:48 -08:00
Alexey Bataev
00ffc70ba1 [SLP][NFC]Add a test with commutative instruction with non-commutative op, NFC 2025-11-25 12:58:20 -08:00
Alexey Bataev
eb1ff56e26 [SLP][NFC]Add a test for copyable operands, used multiple times, NFC 2025-11-25 08:40:00 -08:00
Nicolai Hähnle
69589dd2c0
AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818)
These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"

The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.
2025-11-21 19:33:13 +00:00
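The "how many v_perm_b32s per result register" framing can be illustrated with a deliberately simplified model. The sketch below is an assumption, not the cost function from the patch: it handles i8 shuffles only, relies on v_perm_b32 selecting arbitrary bytes from two 32-bit sources, and charges k - 1 perms for a result dword that draws on k >= 2 source dwords (0 if a single source is already in place, 1 if it must be permuted). `perm_count_i8` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* mask[i] indexes the bytes of the concatenated inputs; -1 means undef. */
static unsigned perm_count_i8(const int *mask, size_t n) {
  unsigned total = 0;
  for (size_t base = 0; base < n; base += 4) { /* one 32-bit result register */
    int srcs[4];
    unsigned k = 0;       /* distinct source dwords used by this register */
    bool in_place = true; /* every defined byte already at its final slot */
    for (size_t j = 0; j < 4 && base + j < n; ++j) {
      int m = mask[base + j];
      if (m < 0)
        continue;
      int dw = m / 4;
      if (m % 4 != (int)j)
        in_place = false;
      unsigned t = 0;
      while (t < k && srcs[t] != dw)
        ++t;
      if (t == k)
        srcs[k++] = dw;
    }
    if (k == 1 && !in_place)
      total += 1;     /* permute bytes within one source */
    else if (k >= 2)
      total += k - 1; /* each perm merges one more source */
  }
  return total;
}
```

Under this model, a byte reversal within one register costs one perm, and gathering bytes from four different registers costs three.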
Alexey Bataev
54d9d4d868 [SLP]Check if the non-schedulable phi parent node has unique operands
If the incoming node has copyables and the node is commutative, need
to check whether the non-schedulable PHI parent node has unique
operands. Otherwise, the dependencies might be calculated incorrectly.

Fixes #168589
2025-11-20 10:51:31 -08:00
Nicolai Hähnle
13ed14f47e
AMDGPU: Autogenerate checks in a test (#168815) 2025-11-20 03:51:32 +00:00
Alexey Bataev
2c3aa92089 [SLP]Fix insertion point for setting for the nodes
Many of the def-use chain problems in the SLP vectorizer are caused by
some nodes reusing the same instruction as their insertion point. An
insertion point is not an instruction but a position between
instructions. To set it correctly, it is better to generate a pseudo
instruction immediately after the last instruction and use it as the
insertion point. This resolves the issues in most cases.

Fixes #168512 #168576
2025-11-19 17:15:24 -08:00
Mikhail Gudim
12131d5cd3
[SLPVectorizer] Widen constant strided loads. (#162324)
Given a set of pointers, check if they can be rearranged as follows (%s is a constant):
%b + 0 * %s + 0
%b + 0 * %s + 1
%b + 0 * %s + 2
...
%b + 0 * %s + (w - 1)

%b + 1 * %s + 0
%b + 1 * %s + 1
%b + 1 * %s + 2
...
%b + 1 * %s + (w - 1)
...

If the pointers can be rearranged into the above pattern, the memory
can be accessed with strided loads of width `w` and stride `%s`.
2025-11-19 15:11:09 -05:00
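The rearrangement check can be sketched in C over plain element offsets relative to `%b`. This is a hypothetical helper (`as_widened_strided`), not the SLPVectorizer implementation: it assumes width means the number of consecutive elements per group, requires at least two groups, and rejects duplicate or overlapping offsets by requiring the stride to exceed the width.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static int cmp_long(const void *a, const void *b) {
  long x = *(const long *)a, y = *(const long *)b;
  return (x > y) - (x < y);
}

/* Can `offs` be rearranged as groups of `w` consecutive elements whose starts
   are `stride` apart? On success, stores the width and stride. */
static bool as_widened_strided(const long *offs, size_t n, size_t *width,
                               long *stride) {
  if (n < 2)
    return false;
  long *s = malloc(n * sizeof *s);
  if (!s)
    return false;
  memcpy(s, offs, n * sizeof *s);
  qsort(s, n, sizeof *s, cmp_long);

  /* The group width is the length of the first run of consecutive offsets. */
  size_t w = 1;
  while (w < n && s[w] == s[w - 1] + 1)
    ++w;

  bool ok = w < n && n % w == 0;
  long st = ok ? s[w] - s[0] : 0;
  ok = ok && st > (long)w; /* groups must not overlap */
  /* Group k, lane j must sit at s[0] + k * st + j. */
  for (size_t i = 0; ok && i < n; ++i)
    ok = s[i] == s[0] + (long)(i / w) * st + (long)(i % w);
  free(s);
  if (ok) {
    *width = w;
    *stride = st;
  }
  return ok;
}
```

For example, the offsets `{100, 1, 0, 101, 3, 2, 103, 102}` sort into two runs of four starting 100 apart, so they could be loaded as width-4 strided accesses with stride 100.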
Michael Bedy
a61889580e
[SLP] Invariant loads cannot have a memory dependency on stores. (#167929) 2025-11-18 09:35:29 +01:00
Alexey Bataev
306b5a3d64 [SLP]Do not consider split nodes, when checking parent PHI-based nodes
The compiler should not consider split vectorize nodes when checking
for non-schedulable PHI-based parent nodes. Only pure PHI nodes must be
considered, since only they can be treated as explicit users; split
nodes cannot.

Fixes #168268
2025-11-16 12:39:58 -08:00
Alexey Bataev
326d4e9033 [SLP]Check if the copyable element is a sub instruciton with abs in isCommutable
Need to check if the non-copyable element is an instruction before actually
trying to check its NSW attribute.
2025-11-14 16:09:50 -08:00
Alexey Bataev
e8cc0d2207 Revert "[SLP]Check if the copyable element is a sub instruciton with abs in isCommutable"
This reverts commit ddf5bb0a2e2d2dd77bce66173387d62ab7174d9f to fix
buildbots https://lab.llvm.org/buildbot/#/builders/11/builds/28083.
2025-11-14 15:22:55 -08:00
Alexey Bataev
ddf5bb0a2e [SLP]Check if the copyable element is a sub instruciton with abs in isCommutable
Need to check if the non-copyable element is an instruction before actually
trying to check its NSW attribute.
2025-11-14 14:53:42 -08:00
Alexey Bataev
0a5be0f997
[SLP]Enable Sub as a base instruction in copyables
The patch adds support for sub instructions as the main instruction in
copyable elements. It also adds a check that treats the base instruction
as unprofitable for selection if at least one instruction with the main
opcode is used as an immediate operand.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/163231
2025-11-14 12:30:38 -05:00
Alexey Bataev
75ef0be0c3 [SLP]Be careful when trying match/vectorize copyable nodes with external uses only
Need to be careful when trying to match and/or build a copyable node
from instructions that are used only outside the block and whose
operands immediately precede them. In this case, the insertion point
might be the same, which may break the def-use chain.

Fixes #167366
2025-11-11 12:05:56 -08:00
Alexey Bataev
96806a7ec3 [SLP]Gather copyable node, if its parent is copyable, but this node is still used outside of the block only
If the current node is a copyable node, its parent is copyable too, and
the current node is still used only outside the block, it is better to
cancel scheduling for this node; otherwise, a wrong def-use chain might
be built during vectorization.

Fixes #166775
2025-11-06 11:16:55 -08:00
David Green
6ad25c5912
[AArch64] Improve the cost model for extending mull (#125651)
We already have cost model code for detecting extending mull multiplies
of the form `mul(ext, ext)`. Since it was added, the codegen for mull
has been improved; this patch attempts to catch the cost model up.

The main idea is to incorporate extends of larger sizes. A vector
`v8i32 mul(zext(v8i8), zext(v8i8))` will be code-generated as
`zext(v8i16 mul(zext(v8i8), zext(v8i8)))`, i.e. umull+ushll+ushll2.

So the total cost should be about 3 if each instruction costs 1. Where
exactly the costs are attributed is debatable; this patch opts to set
the cost of the extend to 0 (or to the cost of the extend not included
in the mull), and the mul gets the cost of the mull plus the extra
extends.

isWideningInstruction is split into two functions for the two types of
operands it supports. isSingleExtWideningInstruction now handles addw
instructions that extend the second operand, isBinExtWideningInstruction
is for instructions like addl that extend both operands.
2025-11-04 07:50:51 +00:00
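The lowering above is only valid because an 8-bit by 8-bit product always fits in 16 bits (255 * 255 = 65025), so a narrow mull followed by extends yields the same value as the wide multiply. The hypothetical helper below checks this exhaustively in C for the unsigned case from the example; the signed (smull) case is included as an extra check beyond what the message states.

```c
#include <stdbool.h>
#include <stdint.h>

static bool mull_then_extend_is_exact(void) {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b) {
      /* Unsigned: mul(zext, zext) in i32 vs. zext of the i16 umull result. */
      uint32_t wide = (uint32_t)a * (uint32_t)b;
      uint16_t narrow = (uint16_t)(a * b);
      if (wide != (uint32_t)narrow)
        return false;
      /* Signed: products lie in [-16256, 16384], so int16_t is enough. */
      int32_t swide = (int32_t)(int8_t)a * (int32_t)(int8_t)b;
      int16_t snarrow = (int16_t)((int8_t)a * (int8_t)b);
      if (swide != (int32_t)snarrow)
        return false;
    }
  return true;
}
```

Since the narrow product is exact, the only extra work is the final widening from i16 to i32, which is where the ushll/ushll2 pair in the cost comes from.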
Alexey Bataev
7d5659083c [SLP]Do not create copyable node, if parent node is non-schedulable and has a use in binop.
If the parent node is non-schedulable (contains only externally used
instructions) and at least one instruction has multiple uses and is used
in the binop, such a copyable node should not be created. Otherwise, it
may contain a wrong def-use chain model, which cannot be detected
effectively.

Fixes #166035
2025-11-03 08:00:22 -08:00
Alexey Bataev
66b69d518a [SLP][NFC]Fix UB and constant folded ops in test, NFC 2025-11-01 11:50:39 -07:00
Alexey Bataev
964c7711f4 [SLP]Fix the minbitwidth analysis for slternate opcodes
If the alternate operation is stricter than the main operation, we
cannot rely on the analysis of the main operation. In such a case, it is
better to avoid doing the analysis at all, since it may affect the
overall result and lead to incorrect optimization.

Fixes #165878
2025-10-31 15:25:13 -07:00
Alexey Bataev
4ac74fc614 [SLP][NFC]Add a test with the incorrect minbitwidth in alternate nodes, NFC 2025-10-31 14:52:19 -07:00
Aiden Grossman
c528f60573 Revert "[SLP][NFC]Add a test with the incorrect minbitwidth in alternate nodes, NFC"
This reverts commit 0dca7ee4480f11cd0230d316ccc5d2c7234a4b31.

This broke check-llvm, including on premerge.

https://lab.llvm.org/buildbot/#/builders/137/builds/28194
https://lab.llvm.org/staging/#/builders/21/builds/7649
2025-10-31 21:02:55 +00:00
Alexey Bataev
0dca7ee448 [SLP][NFC]Add a test with the incorrect minbitwidth in alternate nodes, NFC 2025-10-31 13:19:02 -07:00
Alexey Bataev
db6ba82acc [SLP] Do not match the gather node with copyable parent, containing insert instruction
If the gather/buildvector node has a match, this matching node has a
scheduled copyable parent, and the parent node of the original node has
a last instruction that is non-schedulable and is part of the scheduled
copyable parent, such a matching node should be excluded as
non-matching, since it produces a wrong def-use chain.

Fixes #165435
2025-10-29 11:50:47 -07:00
Alexey Bataev
cf1f4896a7 [SLP]Check only instructions with unique parent instruction user
Need to re-check the instruction with the non-schedulable parent only
if this parent has a user PHI node (i.e. it is used only outside the
block) and the user instruction has a unique parent instruction.

Fixes issue reported in 20675ee67d (commitcomment-168863594)
2025-10-28 11:14:18 -07:00
Alexey Bataev
a7b188983f [SLP]Consider non-inst operands, when checking insts, used outside only
If the instructions in the node do not require scheduling and are used
only outside the basic block, we still need to check whether their
operands are non-instructions too. Such nodes should be emitted at the
beginning of the block.

Fixes #165151
2025-10-26 12:53:48 -07:00
paperchalice
f8b81b45ba
[test][Transforms] Remove unsafe-fp-math uses part 3 (NFC) (#164787)
Post cleanup for #164534.
2025-10-24 18:49:45 +08:00
Alexey Bataev
20675ee67d [SLP] Check all copyable children for non-schedulable parent nodes
If the parent node is non-schedulable and it includes several copies of
the same instruction, its operand might be replaced by the copyable
nodes in multiple children nodes, and if the instruction is commutative,
they can be used in different operands. The compiler shall consider this
opportunity, taking into account that non-copyable children are
scheduled only once for the same parent instruction.

Fixes #164242
2025-10-21 06:39:49 -07:00
Alexey Bataev
8521ffdfaa Revert "[SLP] Check all copyable children for non-schedulable parent nodes"
This reverts commit e7f370f910701b6c67d41dab80e645227692c58b to fix
buildbots https://lab.llvm.org/buildbot/#/builders/213/builds/1056.
2025-10-20 17:37:32 -07:00
Alexey Bataev
e7f370f910 [SLP] Check all copyable children for non-schedulable parent nodes
If the parent node is non-schedulable and it includes several copies of
the same instruction, its operand might be replaced by the copyable
nodes in multiple children nodes, and if the instruction is commutative,
they can be used in different operands. The compiler shall consider this
opportunity, taking into account that non-copyable children are
scheduled only once for the same parent instruction.

Fixes #164242
2025-10-20 15:52:28 -07:00
Alexey Bataev
154138c25f [SLP]Do not pack div-like copyable values
If the main instruction in the copyables is a div-like instruction, the
compiler cannot pack duplicates by padding with poisons: once vectorized,
these instructions would result in undefined behavior.

Fixes #164185
2025-10-20 05:19:42 -07:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Alexey Bataev
e6b0be3764 [SLP]Correctly calculate number of copyable operands
The compiler shall not check for overflow of the number of copyable
operands counter; otherwise, a non-copyable operand can be counted as
copyable, leading to a compiler crash.

Fixes #164164
2025-10-19 12:14:39 -07:00
Alexey Bataev
0fdfad37d8 [SLP]Fix insert point for copyable node with the last inst, used only outside the block
If the copyable entry has the last instruction, used only outside the
block, the insertion point for the vector code should be the last
instruction itself, not the following one. This prevents wrong def-use
sequences, which might be generated for the buildvector nodes.

Fixes #163404
2025-10-17 05:59:48 -07:00
Alexey Bataev
739bfdeb91
[SLP]Enable support for logical ops in copyables (#162945)
Allows And, Or and Xor instructions to be used as the base for copyables.
2025-10-13 08:01:32 -04:00
Alexey Bataev
d81ffd4ebb [SLP]INsert postponed vector value after all uses, if the parent node is PHI
Need to insert the vector value for the postponed gather/buildvector
node after all uses not only if the vector value of the user node is a
PHI, but also if the user node itself is a PHI node, which may produce
vector phi + shuffle.

Fixes #162799
2025-10-12 13:41:08 -07:00
Alexey Bataev
8f168376c1 [SLP]Support non-ordered copyable argument in non-commutative instructions
If the non-commutative user has several identical operands and at least
one of them (but not the first) is copyable, need to account for this
when calculating the number of dependencies. Otherwise, the schedule
bundle might not be scheduled correctly, causing a compiler crash.

Fixes #162925
2025-10-12 10:28:19 -07:00
Alexey Bataev
d3233e806e [SLP]Do not allow undefs being combined with divs
Undefs/poisons combined with divs in vector operations lead to undefined
behavior, so this combination is disabled.

Fixes #162663
2025-10-10 16:59:05 -07:00
Mikhail Gudim
004270d247
[RISCV][SLP][NFC]Add a test for satd-8x4 from x264 benchmark. (#162542)
Precommit a test.
2025-10-10 21:25:37 +00:00
Alexey Bataev
7f03b22dce
[SLP]Enable SDiv/UDiv support as main op in copyables (#161892)
Allow SDiv/UDiv as the main operation in copyables.
2025-10-08 07:28:06 -04:00
Alexey Bataev
5d7f324614
[SLP]Enable Shl as a base opcode in copyables (#156766)
Enables Shl matching for the nodes where a copyable element can be
modelled as shl %v, 0.
2025-10-06 07:07:37 -04:00
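The `shl %v, 0` modelling means a bundle that mixes real shifts with plain values can still be emitted as one vector shift. The C sketch below shows that lane-wise effect; `shl_bundle` and its parameters are hypothetical names, and it assumes all real shift amounts are below 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each copyable lane `v` is treated as `shl v, 0`, so every lane performs
   the same operation and the bundle maps to a single vector shift. */
static void shl_bundle(const uint32_t vals[], const unsigned amts[],
                       const bool is_copy[], uint32_t out[], int lanes) {
  for (int i = 0; i < lanes; ++i) {
    unsigned amt = is_copy[i] ? 0u : amts[i]; /* copyable lane: shift by 0 */
    out[i] = vals[i] << amt;
  }
}
```

For instance, the scalar lanes `{a << 1, b, c << 3, d}` become one shift with per-lane amounts `{1, 0, 3, 0}`, and the copyable lanes pass through unchanged.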
Alexey Bataev
2e67f5ceb8 [SLP][NFC]Add udiv/srem test cases, NFC 2025-10-03 10:49:16 -07:00
Mikhail Gudim
15dc80fda7
[SLPVectorizer][NFC] A test for widening constant strided loads. (#160552)
Precommit a test.
2025-10-01 16:09:45 -04:00
Mikhail Gudim
e485d5e77a
[SLPVectorizer] Clear TreeEntryToStridedPtrInfoMap. (#160544)
We need to clear `TreeEntryToStridedPtrInfoMap` in `deleteTree`.
2025-09-30 09:25:32 -04:00
Mikhail Gudim
5e4eb334af
[SLPVectorizer] Remove align 16 in a test. (#161251)
It is not necessary.
2025-09-30 09:23:16 -04:00