llvm-project

Author	SHA1	Message	Date
David Green	6ad25c5912	[AArch64] Improve the cost model for extending mull (#125651) We already have cost model code for detecting extending mull multiplies of the form `mul(ext, ext)`. Since it was added, the codegen for mull has been improved; this patch attempts to catch the cost model up. The main idea is to incorporate extends of larger sizes. A vector `v8i32 mul(zext(v8i8), zext(v8i8))` will be code-generated as `zext(v8i16 mul(zext(v8i8), zext(v8i8)))`, or umull+ushll+ushll2. So the total cost should be about 3 if each instruction costs 1. Where exactly we attribute the costs is debatable; this patch opts to set the cost of the extend to 0 (or the cost of the extend not included in the mull) and the mul gets the cost of the mull+extra extends. isWideningInstruction is split into two functions for the two types of operands it supports. isSingleExtWideningInstruction now handles addw instructions that extend the second operand, and isBinExtWideningInstruction is for instructions like addl that extend both operands.	2025-11-04 07:50:51 +00:00
Alexey Bataev	7d5659083c	[SLP]Do not create copyable node, if parent node is non-schedulable and has a use in binop. If the parent node is non-schedulable (only externally used instructions), and at least one instruction has multiple uses and is used in the binop, such a copyable node should not be created. Otherwise, it may contain a wrong def-use chain model, which cannot be effectively detected. Fixes #166035	2025-11-03 08:00:22 -08:00
Alexey Bataev	66b69d518a	[SLP][NFC]Fix UB and constant folded ops in test, NFC	2025-11-01 11:50:39 -07:00
Alexey Bataev	964c7711f4	[SLP]Fix the minbitwidth analysis for alternate opcodes If the alternate operation is stricter than the main operation, we cannot rely on the analysis of the main operation. In such a case, it is better to avoid doing the analysis at all, since it may affect the overall result and lead to incorrect optimization. Fixes #165878	2025-10-31 15:25:13 -07:00
Alexey Bataev	4ac74fc614	[SLP][NFC]Add a test with the incorrect minbitwidth in alternate nodes, NFC	2025-10-31 14:52:19 -07:00
Aiden Grossman	c528f60573	Revert "[SLP][NFC]Add a test with the incorrect minbitwidth in alternate nodes, NFC" This reverts commit 0dca7ee4480f11cd0230d316ccc5d2c7234a4b31. This broke check-llvm, including on premerge. https://lab.llvm.org/buildbot/#/builders/137/builds/28194 https://lab.llvm.org/staging/#/builders/21/builds/7649	2025-10-31 21:02:55 +00:00
Alexey Bataev	0dca7ee448	[SLP][NFC]Add a test with the incorrect minbitwidth in alternate nodes, NFC	2025-10-31 13:19:02 -07:00
Alexey Bataev	db6ba82acc	[SLP] Do not match the gather node with copyable parent, containing insert instruction If the gather/buildvector node has a match and this matching node has a scheduled copyable parent, and the parent node of the original node has a last instruction which is non-schedulable and is part of the scheduled copyable parent, such a matching node should be excluded as non-matching, since it produces a wrong def-use chain. Fixes #165435	2025-10-29 11:50:47 -07:00
Alexey Bataev	cf1f4896a7	[SLP]Check only instructions with unique parent instruction user The instruction with the non-schedulable parent needs to be re-checked only if this parent has a user phi node (i.e. it is used only outside the block) and the user instruction has a unique parent instruction. Fixes issue reported in `20675ee67d (commitcomment-168863594)`	2025-10-28 11:14:18 -07:00
Alexey Bataev	a7b188983f	[SLP]Consider non-inst operands, when checking insts, used outside only If the instructions in the node do not require scheduling and are used only outside the basic block, we still need to check whether their operands are also non-instructions. Such nodes should be emitted at the beginning of the block. Fixes #165151	2025-10-26 12:53:48 -07:00
paperchalice	f8b81b45ba	[test][Transforms] Remove unsafe-fp-math uses part 3 (NFC) (#164787) Post cleanup for #164534.	2025-10-24 18:49:45 +08:00
Alexey Bataev	20675ee67d	[SLP] Check all copyable children for non-schedulable parent nodes If the parent node is non-schedulable and it includes several copies of the same instruction, its operand might be replaced by the copyable nodes in multiple child nodes, and if the instruction is commutative, they can be used in different operands. The compiler shall consider this possibility, taking into account that non-copyable children are scheduled only once for the same parent instruction. Fixes #164242	2025-10-21 06:39:49 -07:00
Alexey Bataev	8521ffdfaa	Revert "[SLP] Check all copyable children for non-schedulable parent nodes" This reverts commit e7f370f910701b6c67d41dab80e645227692c58b to fix buildbots https://lab.llvm.org/buildbot/#/builders/213/builds/1056.	2025-10-20 17:37:32 -07:00
Alexey Bataev	e7f370f910	[SLP] Check all copyable children for non-schedulable parent nodes If the parent node is non-schedulable and it includes several copies of the same instruction, its operand might be replaced by the copyable nodes in multiple child nodes, and if the instruction is commutative, they can be used in different operands. The compiler shall consider this possibility, taking into account that non-copyable children are scheduled only once for the same parent instruction. Fixes #164242	2025-10-20 15:52:28 -07:00
Alexey Bataev	154138c25f	[SLP]Do not pack div-like copyable values If the main instruction in the copyables is a div-like instruction, the compiler cannot pack duplicates by extending them with poisons: once vectorized, these instructions would result in undefined behavior. Fixes #164185	2025-10-20 05:19:42 -07:00
Nikita Popov	573ca36753	[IR] Replace alignment argument with attribute on masked intrinsics (#163802) The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).	2025-10-20 08:50:09 +00:00
Alexey Bataev	e6b0be3764	[SLP]Correctly calculate number of copyable operands The compiler shall not check for overflow of the number of copyable operands counter, otherwise non-copyable operand can be counted as copyable and lead to a compiler crash. Fixes #164164	2025-10-19 12:14:39 -07:00
Alexey Bataev	0fdfad37d8	[SLP]Fix insert point for copyable node with the last inst, used only outside the block If the copyable entry has the last instruction used only outside the block, the insertion point for the vector code should be the last instruction itself, not the following one. This prevents wrong def-use sequences, which might be generated for the buildvector nodes. Fixes #163404	2025-10-17 05:59:48 -07:00
Alexey Bataev	739bfdeb91	[SLP]Enable support for logical ops in copyables (#162945) Allows And, Or and Xor instructions to be used as the base for copyables.	2025-10-13 08:01:32 -04:00
Alexey Bataev	d81ffd4ebb	[SLP]Insert postponed vector value after all uses, if the parent node is PHI The vector value for the postponed gather/buildvector node must be inserted after all uses not only if the vector value of the user node is a phi, but also if the user node itself is a PHI node, which may produce a vector phi + shuffle. Fixes #162799	2025-10-12 13:41:08 -07:00
Alexey Bataev	8f168376c1	[SLP]Support non-ordered copyable argument in non-commutative instructions If the non-commutative user has several identical operands and at least one of them (but not the first) is copyable, this possibility must be considered when calculating the number of dependencies. Otherwise, the schedule bundle might not be scheduled correctly and may cause a compiler crash. Fixes #162925	2025-10-12 10:28:19 -07:00
Alexey Bataev	d3233e806e	[SLP]Do not allow undefs being combined with divs Combining undefs/poisons with divs in vector operations leads to undefined behavior, so this combination is disabled. Fixes #162663	2025-10-10 16:59:05 -07:00
Mikhail Gudim	004270d247	[RISCV][SLP][NFC]Add a test for satd-8x4 from x264 benchmark. (#162542) Precommit a test.	2025-10-10 21:25:37 +00:00
Alexey Bataev	7f03b22dce	[SLP]Enable SDiv/UDiv support as main op in copyables (#161892) Allow SDiv/UDiv as a main operation in copyables support	2025-10-08 07:28:06 -04:00
Alexey Bataev	5d7f324614	[SLP]Enable Shl as a base opcode in copyables (#156766) Enables Shl matching for nodes where a copyable can be modelled as shl %v, 0	2025-10-06 07:07:37 -04:00
Alexey Bataev	2e67f5ceb8	[SLP][NFC]Add udiv/srem test cases, NFC	2025-10-03 10:49:16 -07:00
Mikhail Gudim	15dc80fda7	[SLPVectorizer][NFC] A test for widening constant strided loads. (#160552) Precommit a test.	2025-10-01 16:09:45 -04:00
Mikhail Gudim	e485d5e77a	[SLPVectorizer] Clear `TreeEntryToStridedPtrInfoMap`. (#160544) We need to clear `TreeEntryToStridedPtrInfoMap` in `deleteTree`.	2025-09-30 09:25:32 -04:00
Mikhail Gudim	5e4eb334af	[SLPVectorizer] Remove `align 16` in a test. (#161251) It is not necessary.	2025-09-30 09:23:16 -04:00
Alexey Bataev	1f82553e38	[SLP]Fix mixing xor instructions in the same opcode analysis Xor with a 0 operand should not be compatible with multiplication-based instructions, only with or/xor/add/sub. Fixes #161140	2025-09-29 11:14:06 -07:00
Alexey Bataev	0457644dfb	[SLP][NFC]Add a test with the incorrect combination of Xor/Mul vector instructions, NFC	2025-09-29 10:37:39 -07:00
Alexey Bataev	57947ace14	[SLP]Correctly set the insert point for insertelements with copyable arguments Need to find the last insertelement instruction in the list for the copyable arguments; otherwise, a wrong def-use chain may be built. Fixes #160671	2025-09-25 15:09:23 -07:00
Alexey Bataev	8c41859a21	[SLP]Clear the operands deps of non-schedulable nodes, if previously all operands were copyable If all operands of the non-schedulable nodes were previously only copyables, the dependencies of the original schedule data for such copyable operands need to be cleared and recalculated to correctly handle the number of dependencies. Fixes #159406	2025-09-18 12:11:33 -07:00
Alexey Bataev	f2301be0e8	[SLP]Add a check if the user itself is commutable If the commutable instruction can be represented as a non-commutable vector instruction (like add 0, %v can be represented as a part of sub nodes with operation sub %v, 0), its operands might still be reordered and this should be accounted for when checking for copyables in operands. Fixes #158293	2025-09-15 12:50:03 -07:00
Mikhail Gudim	ee3a4f4c94	[SLPVectorizer] Test -1 stride loads. (#158358) Add a test to generate -1 stride load and flags to force this behaviour.	2025-09-14 15:29:28 -04:00
Antonio Frighetto	370607065d	[llvm] Regenerate test checks including TBAA semantics (NFC) Tests exercising TBAA metadata (both purposefully and not), and previously generated via UTC, have been regenerated and updated to version 6.	2025-09-12 20:01:17 +02:00
Alexey Bataev	0dddfab54c	[SLP]Recalculate deps if the original instruction scheduled after being copyable If the original instruction is going to be scheduled after the same instruction has been scheduled as copyable, the dependencies need to be recalculated. Otherwise, the dependencies may be calculated incorrectly.	2025-09-10 10:18:45 -07:00
Alexey Bataev	d0ea176cce	[SLP]Do not consider SExt/ZExt profitable for demotion, if the user is a bitcast to float If the user node of the SExt/ZExt node is a bitcast to a floating-point type, the node itself should not be considered legal to demote, since the cast is still required to match the size of the floating-point type. Fixes #157277	2025-09-08 07:59:01 -07:00
Alexey Bataev	fd93dc5ac5	[SLP]Correctly schedule standalone schedule data, which is part of tree entry If a standalone schedule data relates to a vectorized instruction, it still needs to be scheduled as part of a pseudo-bundle to correctly handle dependencies between its child nodes.	2025-09-07 17:08:37 -07:00
Alexey Bataev	c4d927ce09	Revert "[SLP]Correctly schedule standalone schedule data, which is part of tree entry" This reverts commit 57cae2b6a275a8eb3bc8935973263ed84535fb81 to fix a buildbot https://lab.llvm.org/buildbot/#/builders/169/builds/14776	2025-09-07 13:27:12 -07:00
Alexey Bataev	57cae2b6a2	[SLP]Correctly schedule standalone schedule data, which is part of tree entry If a standalone schedule data relates to a vectorized instruction, it still needs to be scheduled as part of a pseudo-bundle to correctly handle dependencies between its child nodes.	2025-09-07 10:54:40 -07:00
Phoebe Wang	94b164c218	[X86][AVX10] Remove EVEX512 and AVX10-256 implementations (#157034) The 256-bit maximum vector register size control was removed from the AVX10 whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343 We have warned about these options in LLVM21 through #132542. This patch removes the underlying implementations in LLVM22.	2025-09-05 14:08:59 +00:00
Alexey Bataev	9a3aedb093	[SLP]Do not try to schedule bundle with non-schedulable parent with commutable instructions Commutable instructions can be reordered during tree building, and if the parent node is not scheduled, its ScheduleData elements are considered independent and the compiler does not look for reordered operands. Scheduling of copyables needs to be cancelled in this case.	2025-09-04 12:57:14 -07:00
Alexey Bataev	005f0fa40e	[SLP]Improved/fixed FMAD support in reductions In the initial patch for FMAD, potential FMAD nodes were completely excluded from the reduction analysis to keep the patch smaller. But this may cause regressions. This patch adds better detection of scalar FMAD reduction operations and tries to correctly calculate the costs of the FMAD reduction operations (also, excluding the costs of the scalar fmuls) and split reduction operations, combined with regular FMADs. Fixed the handling for reduced values with many uses. Reviewers: RKSimon, gregbedwell, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/152787	2025-09-02 13:09:57 -07:00
Alexey Bataev	6d902b67cd	Revert "[SLP]Improved/fixed FMAD support in reductions" This reverts commit 74230ff2791384fb3285c9e9ab202056959aa095 to fix the bugs found during local testing.	2025-09-02 07:58:29 -07:00
Alexey Bataev	74230ff279	[SLP]Improved/fixed FMAD support in reductions In the initial patch for FMAD, potential FMAD nodes were completely excluded from the reduction analysis to keep the patch smaller. But this may cause regressions. This patch adds better detection of scalar FMAD reduction operations and tries to correctly calculate the costs of the FMAD reduction operations (also, excluding the costs of the scalar fmuls) and split reduction operations, combined with regular FMADs. Reviewers: RKSimon, gregbedwell, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/152787	2025-09-01 17:01:36 -04:00
Alexey Bataev	a80a1988f7	[SLP]Better support for copyable values in stores Currently, stores are sorted by the instruction types of the stored values, a sort that does not account for copyables. The compiler may miss some potential vectorization opportunities because of that. This patch adds detection of the copyables in stored values. Reviewers: hiraditya, HanKuanChen, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/153213	2025-09-01 16:09:52 -04:00
Sam Tebbs	37127f74f4	[LV] Bundle sub reductions into VPExpressionRecipe (#147255) This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. -> https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-09-01 17:25:01 +01:00
Alexey Bataev	7730ebce8e	[SLP]Do not try to revectorize previously vectorized phis in loops There is no need to try to revectorize previously vectorized phis in loops; it leads to a compile-time blow-up. Fixes #155998	2025-08-31 10:54:20 -07:00
Alexey Bataev	e5a4ea20c5	[SLP]Do not remove reduced value, if it is a copyable If the value is checked for the reduction and it is a copyable element in a root node, it should not be deleted, since it may still be used after vectorization. Fixes #155512	2025-08-31 09:09:39 -07:00
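
Many of the SLP commits above revolve around "copyable" elements: a lane that does not actually perform the bundle's main operation is modelled as that operation with a neutral operand, as in the `shl %v, 0` and `sub %v, 0` forms quoted in the messages. The C++ sketch below is a toy model of that idea under stated assumptions, not code taken from SLPVectorizer.cpp: the `Opcode` enum and the `neutralRHS`, `applyLane`, `Lane` and `evalBundle` names are hypothetical, and the neutral constants used for mul, sdiv/udiv and the logical ops (1 for the multiplicative ops, all-ones for and, 0 for or/xor) are our own choice rather than values confirmed by the commits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical opcode set: the main ops the commits above allow for
// copyables (add/sub, shl, sdiv/udiv, and/or/xor), plus mul for contrast.
enum class Opcode { Add, Sub, Mul, Shl, SDiv, UDiv, And, Or, Xor };

// Assumed neutral right-hand operand C such that (X op C) == X for every X.
int32_t neutralRHS(Opcode Op) {
  switch (Op) {
  case Opcode::Add:
  case Opcode::Sub:
  case Opcode::Shl:
  case Opcode::Or:
  case Opcode::Xor:
    return 0;
  case Opcode::Mul:
  case Opcode::SDiv:
  case Opcode::UDiv:
    return 1; // A zero, undef or poison divisor here would be UB.
  case Opcode::And:
    return -1; // All-ones mask.
  }
  return 0; // Not reached for valid opcodes.
}

// Scalar semantics of one 32-bit lane. Wrapping ops go through uint32_t to
// avoid signed-overflow UB; the shift amount is masked only to keep this toy
// well-defined (in LLVM IR an oversized shl yields poison instead).
int32_t applyLane(Opcode Op, int32_t X, int32_t C) {
  const uint32_t UX = static_cast<uint32_t>(X);
  const uint32_t UC = static_cast<uint32_t>(C);
  switch (Op) {
  case Opcode::Add:  return static_cast<int32_t>(UX + UC);
  case Opcode::Sub:  return static_cast<int32_t>(UX - UC);
  case Opcode::Mul:  return static_cast<int32_t>(UX * UC);
  case Opcode::Shl:  return static_cast<int32_t>(UX << (UC & 31u));
  case Opcode::SDiv: return X / C; // Caller must avoid C == 0 and MIN / -1.
  case Opcode::UDiv: return static_cast<int32_t>(UX / UC);
  case Opcode::And:  return X & C;
  case Opcode::Or:   return X | C;
  case Opcode::Xor:  return X ^ C;
  }
  return X; // Not reached for valid opcodes.
}

// One lane of a candidate bundle: either a real "Lhs op Rhs" or a copyable
// lane that should simply pass Lhs through unchanged.
struct Lane {
  int32_t Lhs;
  int32_t Rhs;
  bool IsCopyable;
};

// Evaluates the bundle with a single opcode, as one vector instruction would,
// substituting the neutral operand for every copyable lane.
std::array<int32_t, 4> evalBundle(Opcode Op, const std::array<Lane, 4> &Lanes) {
  std::array<int32_t, 4> Out{};
  for (std::size_t I = 0; I < Lanes.size(); ++I) {
    const int32_t Rhs = Lanes[I].IsCopyable ? neutralRHS(Op) : Lanes[I].Rhs;
    Out[I] = applyLane(Op, Lanes[I].Lhs, Rhs);
  }
  return Out;
}
```

For instance, scalars `a0 + b0`, `a1`, `a2 + b2`, `a3` can be evaluated as a single add by feeding 0 to the two copyable lanes. For the div opcodes the neutral divisor is 1; padding those lanes with poison or undef instead would make the vector division undefined behavior, which is the hazard the div-related fixes above guard against.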


2358 Commits