llvm-project

Author	SHA1	Message	Date
Alexey Bataev	1160994602	[SLP]Fix a crash for very long GEP chains Need to check if the GEP bases are equal and return false early. Also, need to return false if the lookup is too deep, considering bases equal too. Fixes a crash in the assertion.	2025-01-08 06:47:41 -08:00
Alexey Bataev	07d284d4eb	[SLP]Add cost estimation for gather node reshuffling Adds cost estimation for the variants of the permutations of the scalar values, used in gather nodes. Currently, SLP just unconditionally emits shuffles for the reused buildvectors, but in some cases it is better to leave them as buildvectors rather than shuffles, if the cost of such buildvectors is better. X86, AVX512, -O3+LTO Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 912998.00 913238.00 0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 203070.00 203102.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1396320.00 1396448.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1396320.00 1396448.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 309790.00 309678.00 -0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12477607.00 12470807.00 -0.1% CINT2006/445.gobmk - extra code vectorized MiBench/consumer-lame - small variations CFP2017speed/638.imagick_s CFP2017rate/538.imagick_r - extra vectorized code Benchmarks/Bullet - extra code vectorized CFP2017rate/526.blender_r - extra vector code RISC-V, sifive-p670, -O3+LTO CFP2006/433.milc - regressions, should be fixed by https://github.com/llvm/llvm-project/pull/115173 CFP2006/453.povray - extra vectorized code CFP2017rate/508.namd_r - better vector code CFP2017rate/510.parest_r - extra vectorized code SPEC/CFP2017rate - extra/better vector code CFP2017rate/526.blender_r - extra vectorized code CFP2017rate/538.imagick_r - extra vectorized code CINT2006/403.gcc - extra vectorized code CINT2006/445.gobmk - extra vectorized code CINT2006/464.h264ref - extra vectorized code CINT2006/483.xalancbmk - small variations CINT2017rate/525.x264_r - better vectorization Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115201	2024-12-24 15:35:29 -05:00
Han-Kuan Chen	3133acf1fb	Revert "[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)" This reverts commit 82204154b7bd1f8c487c94c7ef00399d776b29f0.	2024-12-12 20:38:31 -08:00
Han-Kuan Chen	82204154b7	[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)	2024-12-13 12:06:10 +08:00
Han-Kuan Chen	2546ae4ed0	[SLP][REVEC] Fix the number of elements in the mask of a ShuffleVectorInst is not a power of 2. (#119689) The following shufflevector should not be vectorized when slp-vectorize-non-power-of-2 is enabled. shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 0, i32 1, i32 2> shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 4, i32 5, i32 6>	2024-12-13 02:22:41 +08:00
Alexey Bataev	7523086a05	[SLP]Use getExtendedReduction cost and fix reduction cost calculations Patch uses getExtendedReduction for reductions of ext-based nodes + adds cost estimation for ctpop-kind reductions into basic implementation and RISCV-V specific vcpop cost estimation. Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/117350	2024-11-22 16:12:53 -05:00
Alexey Bataev	9c9e030fba	[SLP][NFC]Add a test with the RISCV ctpop-based reduction	2024-11-22 09:25:00 -08:00
Alexey Bataev	58c8d73172	[SLP][NFC]Add a test with multi reductions, NFC	2024-11-21 09:48:19 -08:00
Alexey Bataev	f6e1d64458	[SLP]Enable interleaved stores support Enables interleaved stores, which results in better estimation for segmented stores for RISC-V Reviewers: preames, topperc, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115354	2024-11-15 11:01:57 -05:00
Alexey Bataev	af3295bd3d	[SLP]Enable splat ordering for loads Enables splat support for loads with lanes > 2 or number of operands > 2. Allows better detection of splats of loads and reduces the number of shuffles in some cases. X86, AVX512, -O3+LTO Metric: size..text results results0 diff test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 154867.00 156723.00 1.2% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12467735.00 12468023.00 0.0% Better vectorization quality Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115173	2024-11-15 10:29:43 -05:00
Han-Kuan Chen	3cdd86bb47	[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#115417)	2024-11-10 13:53:15 +08:00
Alexey Bataev	62db1c8a07	[SLP]Better decision making on whether to try stores packs for vectorization Since the stores are sorted by distance, comparing the indices in the original array and exiting early if the index is less than the index of the last store is not always the best strategy. It is better to remove such stores explicitly, to better check for the vectorization opportunity. Fixes #115008	2024-11-07 14:23:15 -08:00
Alexey Bataev	dec3839979	[SLP][NFC]Add a test with the missed vectorization opportunity for stores with same address	2024-11-07 13:53:23 -08:00
Kazu Hirata	22b4b1ab10	Revert "[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946)" This reverts commit f58757b8dc167809b69ec00f9b5ab59281df0902. Failing buildbots: https://lab.llvm.org/buildbot/#/builders/174/builds/8058 https://lab.llvm.org/buildbot/#/builders/127/builds/1357	2024-11-07 10:43:11 -08:00
Han-Kuan Chen	f58757b8dc	[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946)	2024-11-08 00:52:59 +08:00
Alexey Bataev	79fd615759	[SLP][NFC]Add a test with the segmented loads, NFC	2024-11-07 07:08:24 -08:00
Luke Lau	343a810725	[RISCV] Allow f16/bf16 with zvfhmin/zvfbfmin as legal strided access (#115264) This is also split off from the zvfhmin/zvfbfmin isLegalElementTypeForRVV work. Enabling this will cause SLP and RISCVGatherScatterLowering to emit @llvm.experimental.vp.strided.{load,store} intrinsics, and codegen support for this was added in #109387 and #114750.	2024-11-07 14:40:15 +08:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)	2024-11-06 11:53:33 +00:00
Alexey Bataev	c1cec8c0dc	[SLP][NFC]Add a test with missed splat ordering for loads, NFC	2024-11-05 14:08:17 -08:00
Alexey Bataev	0c18def2c1	[SLP]Allow interleaving check only if it is less than number of elements Need to check if the interleaving factor is less than total number of elements in loads slice to handle it correctly and avoid compiler crash. Fixes report https://github.com/llvm/llvm-project/pull/112361#issuecomment-2457227670	2024-11-05 07:06:15 -08:00
Han-Kuan Chen	a795a18bba	[SLP][REVEC] VF should be scaled when ScalarTy is FixedVectorType. (#114551)	2024-11-02 03:03:52 +08:00
Alexey Bataev	cb5046da26	[SLP]Do not ignore undefs when trying to replace with "poisonous" shuffles Need to consider undefs correctly, when trying to replace them with potentially poisonous values in shuffles. Such elements should not be silently replaced by poison values, instead complex analysis should be implemented to see if it is safe to do it. Fixes #113425	2024-10-24 07:47:23 -07:00
Han-Kuan Chen	12bcea3292	[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#111459) reference: https://github.com/llvm/llvm-project/pull/110457	2024-10-18 20:16:56 +07:00
Alexey Bataev	f9bc00e4bb	[SLP]Initial support for interleaved loads Adds initial support for interleaved loads, which allows emission of segmented loads for RISCV RVV. Vectorizes extra code for RISCV CFP2006/447.dealII, CFP2006/453.povray, CFP2017rate/510.parest_r, CFP2017rate/511.povray_r, CFP2017rate/526.blender_r, CFP2017rate/538.imagick_r, CINT2006/403.gcc, CINT2006/473.astar, CINT2017rate/502.gcc_r, CINT2017rate/525.x264_r Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/112042	2024-10-14 09:12:33 -04:00
Alexey Bataev	4b5018d231	[SLP]Track repeated reduced value as it might be vectorized Need to track changes with the repeated reduced value, since it might be vectorized in the next attempt for reduction vectorization, to correctly generate the code and avoid compiler crash. Fixes #111887	2024-10-10 13:41:56 -07:00
Alexey Bataev	a65a5feb1a	[SLP]Improve masked loads vectorization, attempting gathered loads If the vector of loads can be vectorized as a masked gather and there are several other masked gather nodes, the compiler can check whether it is possible to gather such nodes into one big consecutive/strided loads node, which provides better performance. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/110151	2024-10-08 16:43:10 -04:00
Philip Reames	f11568bcb0	Revert "[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#110457)" This reverts commit 554eaec63908ed20c35c8cc85304a3d44a63c634. Change was not approved when landed.	2024-10-07 11:31:57 -07:00
Luke Lau	20864d2cf6	[ValueTypes][RISCV] Add v1bf16 type (#111112) When trying to add RISC-V fadd reduction cost model tests for bf16, I noticed a crash when the vector was of <1 x bfloat>. It turns out that this was being scalarized because unlike f16/f32/f64, there's no v1bf16 value type, and the existing cost model code assumed that the legalized type would always be a vector. This adds v1bf16 to bring bf16 in line with the other fp types. It also adds some more RISC-V bf16 reduction tests which previously crashed, including tests to ensure that SLP won't emit fadd/fmul reductions for bf16 or f16 w/ zvfhmin after #111000.	2024-10-06 22:20:51 +08:00
Han-Kuan Chen	554eaec639	[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#110457)	2024-10-05 14:58:44 +08:00
Alexey Bataev	b16e694948	[SLP]Try to keep operand of external casts as scalars, if profitable If the cost of original scalar instruction + cast is better than the extractelement from the vector cast instruction, better to keep original scalar instructions, where possible Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/110537	2024-10-01 13:35:42 -04:00
Han-Kuan Chen	061762933b	[SLP][REVEC] Fix cost model for getBuildVectorCost with FixedVectorType ScalarTy. (#110073) BoUpSLP::gather always uses CreateInsertVector for FixedVectorType ScalarTy.	2024-09-30 21:51:12 +08:00
Philip Reames	556ec4a726	[SLP] Pass operand info to getCmpSelInstrInfo (#109998) Depending on the constant, selects with constant arms can have highly varying cost. This adjusts SLP to use the new API introduced in d2885743. Fixes https://github.com/llvm/llvm-project/issues/109466.	2024-09-25 08:17:55 -07:00
Philip Reames	bcbdf7ad6b	[RISCV][TTI/SLP] Add test coverage for select of constants costing Provides coverage for an upcoming change which accounts for the cost of materializing the vector constants in the vector select.	2024-09-24 08:15:40 -07:00
Han-Kuan Chen	00629752e6	[SLP][REVEC] Fix cost model for getGatherCost with FixedVectorType ScalarTy. (#109369)	2024-09-24 08:08:35 +08:00
Philip Reames	7f6bbb3c4f	[RISCV][TTI] Reduce cost of a build_vector pattern (#108419) This change is actually two related changes, but they're very hard to meaningfully separate as the second balances the first, and yet doesn't do much good on its own. First, we can reduce the cost of a build_vector pattern. Our current costing for this defers to generic insertelement costing which isn't unreasonable, but also isn't correct. While inserting N elements requires N-1 slides and N vmv.s.x, doing the full build_vector only requires N vslide1down. (Note there are other cases that our build vector lowering can do more cheaply, this is simply the easiest upper bound which appears to be "good enough" for SLP costing purposes.) Second, we need to tell SLP that calls don't preserve vector registers. Without this, SLP will vectorize scalar code which performs e.g. 4 x float @exp calls as two <2 x float> @exp intrinsic calls. Oddly, the costing works out that this is in fact the optimal choice - except that we don't actually have a <2 x float> @exp, and unroll during DAG. This would be fine (or at least cost neutral) except that the libcall for the scalar @exp blows all vector registers. So the net effect is we added a bunch of spills that SLP had no idea about. Thankfully, AArch64 has a similar problem, and has taught SLP how to reason about spill cost once the right TTI hook is implemented. Now, for some implications... The SLP solution for spill costing has some inaccuracies. In particular, it basically just guesses whether an intrinsic will be lowered to a call or not, and can be wrong in both directions. It also has no mechanism to differentiate on calling convention. This has the effect of making partial vectorization (i.e. starting in scalar) more profitable. In practice, the major effect of this is to make it more likely that SLP will vectorize part of a tree in an intersecting forest, and then vectorize the remaining tree once those uses have been removed. This has the effect of biasing us slightly away from strided, or indexed loads during vectorization - because the scalar cost is more accurately modeled, and these instructions look relatively less profitable.	2024-09-20 08:34:36 -07:00
Philip Reames	fa8b737a81	[SLP][RISCV] Add test for 3 element build vector feeding reduce Our costs for build vectors are currently a bit off which inhibits vectorization. Fix forthcoming.	2024-09-11 15:56:53 -07:00
Philip Reames	7910812414	[SLP] Regen a test to pick up naming changes	2024-09-11 15:46:57 -07:00
Philip Reames	247d3ea843	[SLP] Expand non-power-of-two bailout in TryToFindDuplicates This fixes a crash noticed when doing a downstream merge. The test case has been reduced, and is included in this commit. The existing bailout for non-power-of-two vectors in TryToFindDuplicates did not consider the case where the list being vectorized had no root node. This allowed reshuffled scalars to slip through to code which does not yet expect to handle it. This was an existing bug (likely introduced by my ed03070e), but made easier to hit by 63e8a1b1	2024-09-05 13:51:11 -07:00
Philip Reames	63e8a1b16f	[SLP] Enable reordering for non-power-of-two vectors (#106638) This change tries to enable vector reordering during vectorization for non-power-of-two vectors. Specifically, my goal is to be able to vectorize reductions whose operands appear in other than identity order. (i.e. a[1] + a[0] + a[2]). In our standard pass pipeline, Reassociation effectively canonicalizes towards this form. So for reduction vectorization to be widely applicable, we need this feature. This change enables the use of a non-empty ReorderIndices structure - which is effectively required for out of order loads or gathers - while leaving the ReuseShuffleIndices mechanism unused and disabled. If I've understood the code structure, the former is used when describing implicit shuffles required by the vectorization strategy (i.e. loading elements 0,1,3,2 in the order 0,1,2,3 and then shuffling later), while the latter is used when trying to optimize explode/buildvectors (called gathers in this code). I audited all the code enabled by this change, but can't claim to deeply understand most of it. I added a couple of bailouts in places which appeared to be difficult to audit and optional optimizations. I've tried to do so in the least risky way I can, but am not completely confident in this change. Careful review appreciated.	2024-09-05 07:52:27 -07:00
Alexey Bataev	a724f9a7e5	[SLP][NFC]Make whole reg non-power-2 test for x86 and aarch64 along with risc-v	2024-09-04 10:11:05 -07:00
Alexey Bataev	571c8c2c88	Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands." This reverts commit a3ea90ffbbe47d9a1b3eab03324f09d7b8e0dcb3 after the post commit review. The number of parts is calculated incorrectly.	2024-09-03 11:02:07 -07:00
Alexey Bataev	884d7c137a	Revert "[SLP]Check for the whole vector vectorization in unique scalars analysis" This reverts commit b74e09cb20e6218320013b54c9ba2f5c069d44b9 after post-commit review. The number of parts is calculated incorrectly.	2024-09-03 11:02:07 -07:00
Philip Reames	2c7786e94a	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reassoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.	2024-09-03 09:16:37 -07:00
Alexey Bataev	b74e09cb20	[SLP]Check for the whole vector vectorization in unique scalars analysis Need to check that the whole number of registers is attempted to vectorize before actually trying to build the node to avoid compiler crash.	2024-09-03 06:19:21 -07:00
Alexey Bataev	a3ea90ffbb	[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands. Patch adds basic support for non-power-of-2 number of elements in operands. The patch still requires that this number addresses whole registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/106449	2024-08-31 08:14:49 -07:00
Martin Storsjö	9e86d4f2ed	Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands." This reverts commit 6ab07d71174982e5cb95420ee4df01347333c342. This commit caused failed asserts, see https://github.com/llvm/llvm-project/pull/106449.	2024-08-31 14:53:08 +03:00
Alexey Bataev	6ab07d7117	[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands. Patch adds basic support for non-power-of-2 number of elements in operands. The patch still requires that this number addresses whole registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/106449	2024-08-30 14:50:34 -04:00
Simon Pilgrim	ceb613a8be	[RISCV] Add full test coverage for acos/asin/atan and cosh/sinh/tanh intrinsics to support #106584	2024-08-30 14:01:15 +01:00
Philip Reames	22ba351108	[RISCV][SLP] Test for <3 x Ty> reductions which require reordering These tests show a vectorizable reduction where the order of the reduction has been adjusted so that profitable vectorization requires a reordering of the computation. We currently have no reordering in SLP for non-power-of-two vectors, so this doesn't work. Note that due to reassociation performed in the standard pipeline, this is actually the canonical form for a reduction reaching SLP.	2024-08-29 11:42:09 -07:00
Alexey Bataev	be7014e95a	[SLP][NFC]Add a test with non-power-of-2 (but still whole vector) operands.	2024-08-28 10:08:20 -07:00


182 Commits