llvm-project

Author	SHA1	Message	Date
Alexey Bataev	cb648ba970	[SLP]Check if the user node has instructions, used only outside Gather nodes with parents, which scalar instructions are used only outside, are generated before the whole tree vectorization. Need to teach isGatherShuffledSingleRegisterEntry to check that such nodes are emitted first and they cannot depend on other nodes, which are emitted later. Fixes #141628	2025-05-29 10:09:49 -07:00
Alexey Bataev	aa452b65fc	[SLP]Restore insertion points after gathers vectorization Restore insertion points after gathers vectorization to avoid a crash in a root node vectorization. Fixes #141265	2025-05-24 07:25:20 -07:00
Alexey Bataev	3918ef3688	[SLP]Fix the analysis for masked compress loads Need to remove the check for Orders in interleaved loads analysis and estimate shuffle cost without the reordering to correctly handle the costs of masked compress loads. Reviewers: hiraditya, HanKuanChen, RKSimon Reviewed By: HanKuanChen, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/140647	2025-05-20 07:31:16 -04:00
Alexey Bataev	bb8e2a8937	[SLP]Relax assertion to avoid compiler crash Need to relax the assertion to fix a compiler crash in case if the reordered compress loads are more profitable than the ordered ones. Fixes #140334	2025-05-18 14:26:36 -07:00
Alexey Bataev	fb86b3d96b	[SLP]Change the insertion point for outside-block-used nodes and prevec phi operand gathers Need to set the insertion point for (non-schedulable) vector node after the last instruction in the node to avoid def-use breakage. But it also causes miscompilation with gather/buildvector operands of the phi nodes, used in the same phi only in the block. These nodes supposed to be inserted at the end of the block and after changing the insertion point for the non-schedulable vec block, it also may break def-use dependencies. Need to prevector such nodes, to emit them as early as possible, so the vectorized nodes are inserted before these nodes. Fixes #139728 Recommit after revert 60fb92179291e848eb7b04913bdc818d081db296 Reviewers: hiraditya, HanKuanChen, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/139917	2025-05-18 12:59:36 -07:00
Alexey Bataev	60fb921792	Revert "[SLP]Change the insertion point for outside-block-used nodes and prevec phi operand gathers" This reverts commit d79d9b8fbfc7e8411aeaf2f5e1be9d4247594fee to fix a bug reported in https://github.com/llvm/llvm-project/pull/139917#issuecomment-2888216404	2025-05-17 11:06:37 -07:00
Alexey Bataev	d79d9b8fbf	[SLP]Change the insertion point for outside-block-used nodes and prevec phi operand gathers Need to set the insertion point for (non-schedulable) vector node after the last instruction in the node to avoid def-use breakage. But it also causes miscompilation with gather/buildvector operands of the phi nodes, used in the same phi only in the block. These nodes supposed to be inserted at the end of the block and after changing the insertion point for the non-schedulable vec block, it also may break def-use dependencies. Need to prevector such nodes, to emit them as early as possible, so the vectorized nodes are inserted before these nodes. Fixes #139728 Reviewers: hiraditya, HanKuanChen, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/139917	2025-05-16 12:52:27 -04:00
Ramkumar Ramachandra	c807395011	[LAA/SLP] Don't truncate APInt in getPointersDiff (#139941) Change getPointersDiff to return an std::optional<int64_t>, and fill this value using APInt::trySExtValue. This simple change requires changes to other functions in LAA, and major changes in SLPVectorizer changing types from 32-bit to 64-bit. Fixes #139202.	2025-05-15 10:08:05 +01:00
Alexey Bataev	c632ac3506	[SLP][NFC]Add a test with the gather operand in phi node in gathered loads	2025-05-14 08:18:23 -07:00
Alexey Bataev	e1ea86e849	[SLP]Do not try to use interleaved loads, if reordering is required If the interleaved loads require reordering, better to avoid generate load + shuffle sequence, which in this case cannot be recognized as interleaved load. Also, it fixes the issue with the incorrect codegen. Fixes #138923	2025-05-12 14:12:51 -07:00
Alexey Bataev	fa985b5f1e	[SLP][NFC]Add a test with missed reordering of the interleaved loads	2025-05-12 13:48:11 -07:00
Alexey Bataev	2e13f7ab01	[SLP][NFC]Add a test with the incorrect vectorization for the pointers with distance difference > 2^32	2025-05-12 06:30:05 -07:00
Han-Kuan Chen	53df6400af	[SLP] Fix incorrect operand order in interchangeable instruction. (#139225)	2025-05-12 20:03:45 +08:00
Alexey Bataev	49042f2bee	[SLP][NFC]Add a test with ordering of the operands of unordered loads	2025-05-11 08:09:51 -07:00
David Green	3b4d5638b3	[AArch64] Limit vector splitting to vectors of size larger than 128bit The intent of this code is to split larger vectors into smaller shuffles, but it is currently triggering on some small vector types. Limit it to vectors of size >128bit.	2025-05-09 22:17:28 +01:00
Gheorghe-Teodor Bercea	25a031947a	[AMDGPU][NFC] Add tests in preparation for i8 vectorization (#138801) Precommit tests for PR: https://github.com/llvm/llvm-project/pull/134934	2025-05-09 10:32:49 -04:00
David Green	8b41551651	[AArch64] Add a slp vectorization test for extract and shuffle costs. NFC	2025-05-06 18:09:29 +01:00
Alexey Bataev	3aecbbcbf6	[SLP]Do not match nodes if schedulability of parent nodes is different If one user node is non-schedulable and another one is schedulable, such nodes should not be considered matched. The selection of the actual insert point in this case differs and the insert points may match, which may cause a compiler crash because of the broken def-use chain. Fixes #137797	2025-05-06 07:52:49 -07:00
Alexey Bataev	9400270449	[SLP]Fix comparator for vector operands of extractelements in PHICompare Need to make comparator to follow strict-weak ordering to fix compiler crashes. Fixes #138178	2025-05-01 14:28:20 -07:00
Alexander Richardson	ee13638362	[AMDGPU] Remove explicit datalayout from tests where not needed Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the correct datalayout when given a triple. Avoid explicitly specifying it in tests that depend on the AMDGPU target being present to avoid the string becoming out of sync with the TargetInfo value. Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg were updated to ensure that tests for non-target-specific passes that happen to use the AMDGPU layout still pass when building with a limited set of targets. Reviewed By: shiltian, arsenm Pull Request: https://github.com/llvm/llvm-project/pull/137921	2025-04-30 10:58:17 -07:00
Jonas Paulsson	f5c8c1eedb	[SLPVectorizer] Move X86 specific handling into X86TTIImpl. (#137830) `ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars"` changed SLPVectorizer getScalarizationOverhead() to call TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some cases. This was due to X86 specific handlings in these (overridden) methods, and unfortunately the general preference of TTI.getScalarizationOverhead() was dropped. If VL is available it should always be preferred to use getScalarizationOverhead(), and this is indeed the case for SystemZ which has a special insertion instruction that can insert two GPR64s. Then `33af951 "[SLP]Synchronize cost of gather/buildvector nodes with codegen"` reworked SLPVectorizer getGatherCost() which together with ad9909d caused the SystemZ test vec-elt-insertion.ll to fail. This patch restores the SystemZ test and reverts the change in SLPVectorizer getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always called again. The ForPoisonSrc argument is now passed on to the TTI method so that X86 can handle this as required. Fixes: #135346	2025-04-30 17:11:27 +02:00
Florian Hahn	ec1016f7ef	[IVDescriptors] Support reductions with minimumnum/maximumnum. (#137335) Add a new reduction recurrence kind for reductions with minimumnum/maximumnum. Such reductions can be vectorized without nsz/nnans, same as reductions with maximum/minimum intrinsics. Note that a new reduction kind is needed to make sure partial reductions are also combined with minimumnum/maximumnum. Note that the final reduction to a scalar value is performed with vector.reduce.fmin/fmax. This should be fine, as the results of the partial reductions with maximumnum/minimumnum silences any sNaNs. In-loop and reductions in SLP are not supported yet, as there's no reduction version of maximumnum/minimumnum yet and fmax may be incorrect. PR: https://github.com/llvm/llvm-project/pull/137335	2025-04-28 11:16:36 +01:00
YunQiang Su	e9a34e4236	[RISCV] Support vectorizing FMINIMUMNUM and FMAXIMUMNUM (#135727) RISC-V V extension supports vfmax and vfmin, which follow IEEE754-2019. We can use them directly.	2025-04-27 19:10:02 +08:00
Alexey Bataev	a7a74b349d	[SLP]Improve reordering of the alternate nodes Better to preserve the original order of the alternate nodes to avoid inter-lane shuffling, select/insert subvector patterns provide better perf. Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/136329	2025-04-24 14:33:10 -04:00
Alexey Bataev	f427890a1d	[SLP]Fix PHI comparator to make it follow strict weak ordering restriction Fixes #137164	2025-04-24 11:08:17 -07:00
Philip Reames	1c722fc8f5	[RISCV][TTI] Use processShuffleMask for shuffle legalization estimate (#136191) We had some code which tried to estimate legalization costs for illegally typed shuffles, but it only handled the case of a widening shuffle, and used a somewhat adhoc heuristic. We can reuse the processShuffleMask utility (which we already use for individual vector register splitting when exact VLEN is known) to perform the same splitting given the legal vector type as the unit of split instead. This makes the costing both simpler and more robust. Note that this swings costs for illegal shuffles pretty wildly as we were previously sometimes hitting the adhoc code, and sometimes falling through into generic scalarization costing. I don't know that any of the costs for the individual tests in tree are significant, but the test which triggered me finding this was reported to me by Alexey reduced from something triggering a bad choice in SLP for x264. So this has the potential to be somewhat high impact.	2025-04-22 10:50:20 -07:00
Alexey Bataev	9c388f1f05	[SLP]Prefer segmented/deinterleaved loads to strided and fix codegen Need to estimate, which one is preferable, deinterleaved/segmented loads or strided. Segmented loads can be combined, improving the overall performance. Reviewers: RKSimon, hiraditya Reviewed By: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/135058	2025-04-22 12:11:01 -04:00
Alexey Bataev	0252d338fa	[SLP]Model single unique value insert + shuffle as splat + select, where profitable When we have the remaining unique scalar, that should be inserted into non-poison vector and into non-zero position: `%vec1 = insertelement %vec, %v, pos1`, `%res = shuffle %vec1, poison, <0, 1, 2,..., pos1, pos1 + 1, ..., pos1, ...>`; better to estimate if it is profitable to model it as is or model it as: `%bv = insertelement poison, %v, 0`, `%splat = shuffle %bv, poison, <poison, ..., 0, ..., 0, ...>`, `%res = shuffle %vec, %splat, <0, 1, 2,..., pos1 + VF, pos1 + 1, ...>`. Reviewers: preames, hiraditya, RKSimon Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/136590	2025-04-22 11:30:29 -04:00
Alexey Bataev	fdcee2dd36	[SLP]Reorder tree, if the reorder indices are non empty Need to consider the ordering for all nodes with the specified ordering, not only loads/store/extracts. Reviewers: hiraditya, RKSimon Reviewed By: hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/136185	2025-04-18 13:37:08 -04:00
Alexey Bataev	5fe91f1b59	[SLP]Check for catchswitch block before doing the analysis of the instructions Need to skip the analysis of the catchswitch blocks to avoid a compiler crash when trying to get the first instruction in the block.	2025-04-17 09:10:15 -07:00
Alexey Bataev	1fcf78d153	[SLP]Cache data for compressed loads before codegen Need to cache and use cached data for compressed loads before codegen to avoid side-effects, caused by the earlier vectorization, which may affect the analysis.	2025-04-17 08:43:44 -07:00
Alexey Bataev	4aca20c8b6	[SLP]Pre-cache the last instruction for all entries before vectorization Need to pre-cache last instruction to avoid unexpected changes in the last instruction detection during the vectorization, caused by adding the new vector instructions, which add new uses and may affect the analysis.	2025-04-16 11:44:55 -07:00
Alexey Bataev	913dcf1aa3	[SLP]Fix type promotion for smax reduction with unsigned reduced operands Need to add an extra bit for sign info for unsigned reduced values to generate correct code.	2025-04-16 10:14:29 -07:00
Alexey Bataev	51fa6cde7d	[SLP][NFC]Add a test with missing unsigned promotion for smax reduction, NFC	2025-04-16 09:55:34 -07:00
Alexey Bataev	af28c9c65a	[SLP]Do not reorder split node operand with reuses, if not possible Need to check if the operand node of the split vectorize node has reuses and check if it is possible to build the order for this node to reorder it correctly. Fixes #135912	2025-04-16 06:23:44 -07:00
Alexey Bataev	ddb1267430	[SLP]Insert vector instruction after landingpad If the node must be emitted in the landingpad block, need to insert the instructions after the landingpad instruction to avoid a crash. Fixes #135781	2025-04-15 13:57:53 -07:00
Alexey Bataev	85eb44e304	[SLP]Fix number of operands for the split node For the split node number of operands should be requested via getNumOperands() function, even if the main op is CallInst.	2025-04-15 13:33:36 -07:00
Alexey Bataev	2271f0bebd	[SLP]Check for perfect/shuffled match for the split node If the potential split node is a perfect/shuffled match of another split node, need to skip creation of the another split node with the same scalars, it should be a buildvector. Fixes #135800	2025-04-15 13:17:46 -07:00
Han-Kuan Chen	d41e517748	[SLP] Make getSameOpcode support interchangeable instructions. (#135797) We use the term "interchangeable instructions" to refer to different operators that have the same meaning (e.g., `add x, 0` is equivalent to `mul x, 1`). Non-constant values are not supported, as they may incur high costs with little benefit. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2025-04-16 00:08:59 +08:00
Han-Kuan Chen	bcfc9f4529	[SLP][REVEC] VectorValuesAndScales should be supported by REVEC. (#135762) We should align REVEC with the SLP algorithm as closely as possible. For example, by applying REVEC-specific handling when calling IRBuilder's Create methods, performing cost analysis via TTI, and expanding shuffle masks using transformScalarShuffleIndicesToVector. reference commit: 3b18d47ecbaba4e519ebf0d1bc134a404a56a9da	2025-04-15 23:03:55 +08:00
Alexey Bataev	57025b42c4	[SLP]Mark smin reduction as signed compare Reduction signed min must be marked as signed compare, fixing the analysis for the cases, where the incoming arguments are unsigned. Fixes #133943	2025-04-15 07:24:17 -07:00
Alexey Bataev	7f2587a239	[SLP][NFC]Add a test with missing zext on signed minimum reduction, NFC	2025-04-15 07:14:36 -07:00
Han-Kuan Chen	e1382b3b45	Revert "[SLP] Make getSameOpcode support interchangeable instructions. (#133888)" This reverts commit 123993fd974629ca0a094918db4c21ad1c2624d0.	2025-04-15 06:02:42 -07:00
YunQiang Su	fe9e2090be	Vectorize: Support fminimumnum and fmaximumnum (#131781) Support auto-vectorize for fminimum_num and fmaximum_num. For ARM64 with SVE, scalable vector cannot support yet. --------- Co-authored-by: Your Name <you@example.com>	2025-04-15 08:08:45 +08:00
Han-Kuan Chen	123993fd97	[SLP] Make getSameOpcode support interchangeable instructions. (#133888) We use the term "interchangeable instructions" to refer to different operators that have the same meaning (e.g., `add x, 0` is equivalent to `mul x, 1`). Non-constant values are not supported, as they may incur high costs with little benefit. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2025-04-14 19:23:18 +08:00
Alexey Bataev	38e64b1a84	[SLP]Fix minbiwidth analysis for gather nodes with SIToFP users If the buildvector node has cast to float user, it cannot be considered as safe for truncation, need to use the original bitwidth here. Fixes #135410	2025-04-11 11:40:41 -07:00
Alexey Bataev	c9ad5bed7f	[SLP][NFC]Add a test with the incorrect type promotion after bitwidth analysis, NFC	2025-04-11 11:10:01 -07:00
Alexey Bataev	a2d129b792	[SLP]Fix a crash when trying to reduce in revec after minbitwidth analysis Need to use the original scalar type, when building the reduction, and use the scalar type, when performing casting, to avoid compiler crash.	2025-04-11 10:58:39 -07:00
Alexey Bataev	33af951f3f	[SLP]Synchronize cost of gather/buildvector nodes with codegen If the buildvector node contains constants and non-constants, need to consider shuffling of the constant vec and insertion of unique elements into the vector. Also, if there is an input vector, need to consider the cost of shuffling source vector and constant vector and then insertion and shuffling of the non-constant elements. Reviewers: hiraditya, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/135245	2025-04-11 09:42:34 -04:00
Han-Kuan Chen	b99a2b6221	[SLP][REVEC] Update test. The test is affected by commit aaaa2a325bd1abb8c87e0171384fd2c42da5e38a.	2025-04-11 03:01:09 -07:00


2243 Commits
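Two of the commits above (9400270449 and f427890a1d) fix PHI comparators in the SLP vectorizer so that they follow strict weak ordering. Sorting routines such as `std::sort` have undefined behavior when the comparator breaks that requirement, which is one way a bad comparator can surface as a compiler crash. The C++ sketch below is a standalone illustration, not code from SLPVectorizer: the `Key` struct (a hypothetical block number plus in-block position), both comparators, and the `isStrictWeakOrdering` checker are assumptions made for this example. It brute-forces the strict weak ordering axioms on a small sample and shows how OR-ing per-field comparisons breaks asymmetry, while a lexicographic comparison does not.

```cpp
#include <cassert>
#include <vector>

// Brute-force check that `comp` is a strict weak ordering over `items`:
// irreflexive, asymmetric, transitive, and with a transitive
// "equivalent" (incomparable) relation. O(n^3), meant for small samples.
template <typename T, typename Compare>
bool isStrictWeakOrdering(const std::vector<T>& items, Compare comp) {
  // Two elements are equivalent when neither compares less than the other.
  auto equiv = [&comp](const T& x, const T& y) {
    return !comp(x, y) && !comp(y, x);
  };
  for (const T& a : items) {
    if (comp(a, a)) return false;  // irreflexivity
    for (const T& b : items) {
      if (comp(a, b) && comp(b, a)) return false;  // asymmetry
      for (const T& c : items) {
        // Transitivity of the "less" relation.
        if (comp(a, b) && comp(b, c) && !comp(a, c)) return false;
        // Transitivity of equivalence.
        if (equiv(a, b) && equiv(b, c) && !equiv(a, c)) return false;
      }
    }
  }
  return true;
}

// Hypothetical sort key: (basic-block number, position inside the block).
struct Key {
  int block;
  int index;
};

// Buggy: ORs the per-field comparisons, so {0, 1} and {1, 0} each
// compare "less" than the other, which violates asymmetry.
inline bool buggyLess(const Key& a, const Key& b) {
  return a.block < b.block || a.index < b.index;
}

// Fixed: compare by block first, then by position (lexicographic).
inline bool lexLess(const Key& a, const Key& b) {
  if (a.block != b.block) return a.block < b.block;
  return a.index < b.index;
}

// Usage check: the buggy comparator is rejected, the lexicographic one
// is accepted.
inline void checkComparators() {
  const std::vector<Key> keys = {{0, 1}, {1, 0}, {0, 0}, {1, 1}};
  assert(!isStrictWeakOrdering(keys, buggyLess));
  assert(isStrictWeakOrdering(keys, lexLess));
}
```

Because the checker is cubic in the sample size, it only suits small hand-picked inputs in unit tests rather than production sorting paths.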