llvm-project

Author	SHA1	Message	Date
Han-Kuan Chen	a795a18bba	[SLP][REVEC] VF should be scaled when ScalarTy is FixedVectorType. (#114551)	2024-11-02 03:03:52 +08:00
Alexey Bataev	cb5046da26	[SLP]Do not ignore undefs when trying to replace with "poisonous" shuffles Need to consider undefs correctly, when trying to replace them with potentially poisonous values in shuffles. Such elements should not be silently replaced by poison values, instead complex analysis should be implemented to see if it is safe to do it. Fixes #113425	2024-10-24 07:47:23 -07:00
Han-Kuan Chen	12bcea3292	[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#111459) Reference: https://github.com/llvm/llvm-project/pull/110457	2024-10-18 20:16:56 +07:00
Alexey Bataev	f9bc00e4bb	[SLP]Initial support for interleaved loads Adds initial support for interleaved loads, which allows emission of segmented loads for RISCV RVV. Vectorizes extra code for RISCV CFP2006/447.dealII, CFP2006/453.povray, CFP2017rate/510.parest_r, CFP2017rate/511.povray_r, CFP2017rate/526.blender_r, CFP2017rate/538.imagick_r, CINT2006/403.gcc, CINT2006/473.astar, CINT2017rate/502.gcc_r, CINT2017rate/525.x264_r Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/112042	2024-10-14 09:12:33 -04:00
Alexey Bataev	4b5018d231	[SLP]Track repeated reduced value as it might be vectorized Need to track changes with the repeated reduced value, since it might be vectorized in the next attempt for reduction vectorization, to correctly generate the code and avoid compiler crash. Fixes #111887	2024-10-10 13:41:56 -07:00
Alexey Bataev	a65a5feb1a	[SLP]Improve masked loads vectorization, attempting gathered loads If the vector of loads can be vectorized as masked gather and there are several other masked gather nodes, the compiler can check whether it is possible to gather such nodes into a big consecutive/strided loads node, which provides better performance. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/110151	2024-10-08 16:43:10 -04:00
Philip Reames	f11568bcb0	Revert "[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#110457)" This reverts commit 554eaec63908ed20c35c8cc85304a3d44a63c634. Change was not approved when landed.	2024-10-07 11:31:57 -07:00
Luke Lau	20864d2cf6	[ValueTypes][RISCV] Add v1bf16 type (#111112) When trying to add RISC-V fadd reduction cost model tests for bf16, I noticed a crash when the vector was of <1 x bfloat>. It turns out that this was being scalarized because unlike f16/f32/f64, there's no v1bf16 value type, and the existing cost model code assumed that the legalized type would always be a vector. This adds v1bf16 to bring bf16 in line with the other fp types. It also adds some more RISC-V bf16 reduction tests which previously crashed, including tests to ensure that SLP won't emit fadd/fmul reductions for bf16 or f16 w/ zvfhmin after #111000.	2024-10-06 22:20:51 +08:00
Han-Kuan Chen	554eaec639	[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#110457)	2024-10-05 14:58:44 +08:00
Alexey Bataev	b16e694948	[SLP]Try to keep operand of external casts as scalars, if profitable If the cost of original scalar instruction + cast is better than the extractelement from the vector cast instruction, better to keep original scalar instructions, where possible Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/110537	2024-10-01 13:35:42 -04:00
Han-Kuan Chen	061762933b	[SLP][REVEC] Fix cost model for getBuildVectorCost with FixedVectorType ScalarTy. (#110073) BoUpSLP::gather always uses CreateInsertVector for FixedVectorType ScalarTy.	2024-09-30 21:51:12 +08:00
Philip Reames	556ec4a726	[SLP] Pass operand info to getCmpSelInstrInfo (#109998) Depending on the constant, selects with constant arms can have highly varying cost. This adjusts SLP to use the new API introduced in d2885743. Fixes https://github.com/llvm/llvm-project/issues/109466.	2024-09-25 08:17:55 -07:00
Philip Reames	bcbdf7ad6b	[RISCV][TTI/SLP] Add test coverage for select of constants costing Provides coverage for an upcoming change which accounts for the cost of materializing the vector constants in the vector select.	2024-09-24 08:15:40 -07:00
Han-Kuan Chen	00629752e6	[SLP][REVEC] Fix cost model for getGatherCost with FixedVectorType ScalarTy. (#109369)	2024-09-24 08:08:35 +08:00
Philip Reames	7f6bbb3c4f	[RISCV][TTI] Reduce cost of a build_vector pattern (#108419) This change is actually two related changes, but they're very hard to meaningfully separate as the second balances the first, and yet doesn't do much good on its own. First, we can reduce the cost of a build_vector pattern. Our current costing for this defers to generic insertelement costing which isn't unreasonable, but also isn't correct. While inserting N elements requires N-1 slides and N vmv.s.x, doing the full build_vector only requires N vslide1down. (Note there are other cases that our build vector lowering can do more cheaply, this is simply the easiest upper bound which appears to be "good enough" for SLP costing purposes.) Second, we need to tell SLP that calls don't preserve vector registers. Without this, SLP will vectorize scalar code which performs e.g. 4 x float @exp calls as two <2 x float> @exp intrinsic calls. Oddly, the costing works out that this is in fact the optimal choice - except that we don't actually have a <2 x float> @exp, and unroll during DAG. This would be fine (or at least cost neutral) except that the libcall for the scalar @exp blows all vector registers. So the net effect is we added a bunch of spills that SLP had no idea about. Thankfully, AArch64 has a similar problem, and has taught SLP how to reason about spill cost once the right TTI hook is implemented. Now, for some implications... The SLP solution for spill costing has some inaccuracies. In particular, it basically just guesses whether an intrinsic will be lowered to a call or not, and can be wrong in both directions. It also has no mechanism to differentiate on calling convention. This has the effect of making partial vectorization (i.e. starting in scalar) more profitable. In practice, the major effect of this is to make it more likely that SLP will vectorize part of a tree in an intersecting forest, and then vectorize the remaining tree once those uses have been removed. This has the effect of biasing us slightly away from strided or indexed loads during vectorization - because the scalar cost is more accurately modeled, and these instructions look relatively less profitable.	2024-09-20 08:34:36 -07:00
Philip Reames	fa8b737a81	[SLP][RISCV] Add test for 3 element build vector feeding reduce Our costs for build vectors are currently a bit off which inhibits vectorization. Fix forthcoming.	2024-09-11 15:56:53 -07:00
Philip Reames	7910812414	[SLP] Regen a test to pick up naming changes	2024-09-11 15:46:57 -07:00
Philip Reames	247d3ea843	[SLP] Expand non-power-of-two bailout in TryToFindDuplicates This fixes a crash noticed when doing a downstream merge. The test case has been reduced, and is included in this commit. The existing bailout for non-power-of-two vectors in TryToFindDuplicates did not consider the case where the list being vectorized had no root node. This allowed reshuffled scalars to slip through to code which does not yet expect to handle it. This was an existing bug (likely introduced by my ed03070e), but made easier to hit by 63e8a1b1	2024-09-05 13:51:11 -07:00
Philip Reames	63e8a1b16f	[SLP] Enable reordering for non-power-of-two vectors (#106638) This change tries to enable vector reordering during vectorization for non-power-of-two vectors. Specifically, my goal is to be able to vectorize reductions whose operands appear in other than identity order. (i.e. a[1] + a[0] + a[2]). In our standard pass pipeline, Reassociation effectively canonicalizes towards this form. So for reduction vectorization to be widely applicable, we need this feature. This change enables the use of a non-empty ReorderIndices structure - which is effectively required for out of order loads or gathers - while leaving the ReuseShuffleIndices mechanism unused and disabled. If I've understood the code structure, the former is used when describing implicit shuffles required by the vectorization strategy (i.e. loading elements 0,1,3,2 in the order 0,1,2,3 and then shuffling later), while the latter is used when trying to optimize explode/buildvectors (called gathers in this code). I audited all the code enabled by this change, but can't claim to deeply understand most of it. I added a couple of bailouts in places which appeared to be difficult to audit and optional optimizations. I've tried to do so in the least risky way I can, but am not completely confident in this change. Careful review appreciated.	2024-09-05 07:52:27 -07:00
Alexey Bataev	a724f9a7e5	[SLP][NFC]Make whole reg non-power-2 test for x86 and aarch64 along with risc-v	2024-09-04 10:11:05 -07:00
Alexey Bataev	571c8c2c88	Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands." This reverts commit a3ea90ffbbe47d9a1b3eab03324f09d7b8e0dcb3 after the post commit review. The number of parts is calculated incorrectly.	2024-09-03 11:02:07 -07:00
Alexey Bataev	884d7c137a	Revert "[SLP]Check for the whole vector vectorization in unique scalars analysis" This reverts commit b74e09cb20e6218320013b54c9ba2f5c069d44b9 after post-commit review. The number of parts is calculated incorrectly.	2024-09-03 11:02:07 -07:00
Philip Reames	2c7786e94a	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reassoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.	2024-09-03 09:16:37 -07:00
Alexey Bataev	b74e09cb20	[SLP]Check for the whole vector vectorization in unique scalars analysis Need to check that a whole number of registers is being vectorized before actually trying to build the node, to avoid a compiler crash.	2024-09-03 06:19:21 -07:00
Alexey Bataev	a3ea90ffbb	[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands. Patch adds basic support for non-power-of-2 number of elements in operands. The patch still requires that this number addresses whole registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/106449	2024-08-31 08:14:49 -07:00
Martin Storsjö	9e86d4f2ed	Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands." This reverts commit 6ab07d71174982e5cb95420ee4df01347333c342. This commit caused failed asserts, see https://github.com/llvm/llvm-project/pull/106449.	2024-08-31 14:53:08 +03:00
Alexey Bataev	6ab07d7117	[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands. Patch adds basic support for non-power-of-2 number of elements in operands. The patch still requires that this number addresses whole registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/106449	2024-08-30 14:50:34 -04:00
Simon Pilgrim	ceb613a8be	[RISCV] Add full test coverage for acos/asin/atan and cosh/sinh/tanh intrinsics to support #106584	2024-08-30 14:01:15 +01:00
Philip Reames	22ba351108	[RISCV][SLP] Test for <3 x Ty> reductions which require reordering These tests show a vectorizable reduction where the order of the reduction has been adjusted so that profitable vectorization requires a reordering of the computation. We currently have no reordering in SLP for non-power-of-two vectors, so this doesn't work. Note that due to reassociation performed in the standard pipeline, this is actually the canonical form for a reduction reaching SLP.	2024-08-29 11:42:09 -07:00
Alexey Bataev	be7014e95a	[SLP][NFC]Add a test with non-power-of-2 (but still whole vector) operands.	2024-08-28 10:08:20 -07:00
Philip Reames	ed03070eb3	[SLP] Support vectorizing 2^N-1 reductions (#106266) Build on the -slp-vectorize-non-power-of-2 experimental option, and support vectorizing reductions with 2^N-1 sized vector. Specifically, two related changes: 1) When searching for a profitable VL, start with the 2^N-1 reduction width. If cost model does not select that VL, return to power of two boundaries when halving the search VL. The latter is mostly for simplicity. 2) Reduce the minimum reduction width from 4 to 3 when supporting non-power of two vectors. This is required to support <3 x Ty> cases. One thing which isn't directly related to this change, but I want to note for clarity is that the non-power-of-two vectorization appears to be sensitive to operand order of reduction. I haven't yet fully figured out why, but I suspect this is non-power-of-two specific.	2024-08-27 12:27:03 -07:00
Philip Reames	acb33a0c9b	[RISCV][SLP] Add test coverage for 2^N-1 vector sizes w/FP types Our cost modeling for FP and integer differs in enough cases that having both is useful for exercising different logic in SLP.	2024-08-27 10:56:32 -07:00
Philip Reames	4dda564c72	[RISCV][SLP] Add test coverage for 2^N-1 vector sizes Mostly copied from the AArch64 coverage for same, but also added a couple tests for reductions which aren't currently supported.	2024-08-27 10:24:46 -07:00
Han-Kuan Chen	3d1c63ee2c	[SLP][REVEC] Expand getelementptr into vector form. (#103704)	2024-08-27 16:11:52 +08:00
Alexey Bataev	2a50dac9fb	[RISCV][TTI]Fix the cost estimation for long select shuffle. The code was broken completely. Need to iterate over the whole mask and process the submasks correctly, check if they form full identity and adjust indices correctly. Fixes https://github.com/llvm/llvm-project/issues/106126	2024-08-26 17:27:52 -07:00
Alexey Bataev	b9d3da8c8d	[SLP]Fix PR105904: the root node might be a gather node without user for reductions. Before checking the user components of the gather/buildvector nodes, need to check if the node has users at all. Root nodes might not have users, if it is a node for the reduction. Fixes https://github.com/llvm/llvm-project/issues/105904	2024-08-26 07:09:05 -07:00
Alexey Bataev	dab19dac94	[SLP]Fix a crash for the strided nodes with reversed order and externally used pointer. If the strided node is reversed, need to check for the last instruction, not the first one in the list of scalars, when checking if the root pointer must be extracted.	2024-08-23 07:35:48 -07:00
Alexey Bataev	f3d2609af3	[SLP]Improve/fix subvectors in gather/buildvector nodes handling SLP vectorizer has an estimation for gather/buildvector nodes, which contain some scalar loads. SLP vectorizer performs pretty similar (but large in SLOCs) estimation, which is not always correct. Instead, this patch implements clustering analysis and actual node allocation with the full analysis for the vectorized clustered scalars (not only loads, but also some other instructions) with the correct cost estimation and vector insert instructions. Improves overall vectorization quality and simplifies analysis/estimations. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/104144	2024-08-23 06:45:22 -07:00
Vitaly Buka	96b3166602	Revert "[SLP]Improve/fix subvectors in gather/buildvector nodes handling" (#105780) with "[Vectorize] Fix warnings" It introduced compiler crashes, see #104144. This reverts commit 69332bb8995aef60d830406de12cb79a50390261 and 351f4a5593f1ef507708ec5eeca165b20add3340.	2024-08-22 22:21:20 -07:00
Alexey Bataev	69332bb899	[SLP]Improve/fix subvectors in gather/buildvector nodes handling SLP vectorizer has an estimation for gather/buildvector nodes, which contain some scalar loads. SLP vectorizer performs pretty similar (but large in SLOCs) estimation, which is not always correct. Instead, this patch implements clustering analysis and actual node allocation with the full analysis for the vectorized clustered scalars (not only loads, but also some other instructions) with the correct cost estimation and vector insert instructions. Improves overall vectorization quality and simplifies analysis/estimations. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/104144	2024-08-22 11:24:08 -04:00
Alexey Bataev	b10ecfa914	[SLP]Represent externally used values as original scalars, if profitable. Currently SLP vectorizer tries to keep only GEPs as scalar, if they are vectorized but used externally. Same approach can be used for all scalar values. This patch tries to keep original scalars if all its operands remain scalar or externally used, the cost of the original scalar is lower than the cost of the extractelement instruction, or if the number of externally used scalars in the same entry is power of 2. Last criterion allows better revectorization for multiply used scalars. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/100904	2024-08-12 10:15:02 -04:00
Alexey Bataev	5ab7b0de0b	[SLP][NFC]Add a test for segmented load, NFC.	2024-08-08 10:26:58 -07:00
Simon Pilgrim	32c69faa6c	[SLP] Regenerate test checks to reduce NFC diff in #100904	2024-08-08 17:08:53 +01:00
Alexey Bataev	daf4a06e5c	[SLP]Try detect strided loads, if any pointer op require extraction. If any pointer operand of the non-consecutive loads is an instruction with a user which is not part of the current graph, and thus requires emission of an extractelement instruction, it is better to try to detect whether the load sequence can be represented as a strided load, so that extractelement instructions for pointers are not required. Reviewers: preames, RKSimon, topperc Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/101668	2024-08-06 09:20:50 -04:00
Alexey Bataev	834ad102c3	[SLP][NFC]Add a test version with threshold=-15, NFC.	2024-08-05 10:14:01 -07:00
Alexey Bataev	799fd3d87b	[SLP]Support vectorization of small strided loads only graph. If the graph includes only strided loads node, the compiler should still try to vectorize it. Reviewers: RKSimon, preames, topperc Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/101659	2024-08-05 12:51:10 -04:00
Alexey Bataev	deb3ecf09f	[SLP][NFC]Add a test with the reduction of strided load, NFC.	2024-08-02 05:06:44 -07:00
Alexey Bataev	f6e01b9ece	[SLP]Do not trunc bv nodes, if the user is vectorized and requires wider type. If at least a single user of the gathered trunc'ed instruction is vectorized and requires a wider type than the trunc node, such gathers/buildvectors should not be optimized for better bitwidth.	2024-07-19 07:28:04 -07:00
Alexey Bataev	4c28494e78	[SLP][NFC]Add a test for incorrect minbitwidth analysis for trunc'ed bv, NFC.	2024-07-19 07:14:03 -07:00
Alexey Bataev	8ff233f4f1	[SLP]Correctly detect minnum/maxnum patterns for select/cmp operations on floats. The patch enables detection of minnum/maxnum patterns for floating-point instructions, represented as select/cmp. Also, enables better cost estimation for integer min/max patterns since the compiler starts to estimate the scalars separately. Reviewers: nikic, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/98570	2024-07-16 09:42:08 -07:00

162 Commits