llvm-project

Author	SHA1	Message	Date
Han-Kuan Chen	ead3a2f598	[SLP][REVEC] getScalarizationOverhead should not be used when ScalarTy is FixedVectorType. (#117536)	2024-11-26 22:05:54 +08:00
Alexey Bataev	76f0ff8210	[SLP]Add an extra check to avoid infinite vectorization attempts Added extra check for the cost of the buildvector if the -slp-threshold option is used. Prevents infinite vectorization attempts.	2024-11-25 14:27:44 -08:00
Alexey Bataev	f953b5eb72	[SLP]Relax assertion about subvectors mask size SubVectorsMask might be less than CommonMask, if the vectors with larger number of elements are permuted or reused elements are used. Need to consider this when estimating/building the vector to avoid a compiler crash. Fixes #117518	2024-11-25 08:31:42 -08:00
Alexey Bataev	57bbdbd7ae	[SLP]Relax assertion in mask combine for non-power-of-2 number of elements The nodes may contain non-power-of-2 number of elements. Need to relax the assertion to avoid possible compiler crash Fixes #117517	2024-11-25 07:58:19 -08:00
Alexey Bataev	7523086a05	[SLP]Use getExtendedReduction cost and fix reduction cost calculations Patch uses getExtendedReduction for reductions of ext-based nodes + adds cost estimation for ctpop-kind reductions into basic implementation and RISCV-V specific vcpop cost estimation. Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/117350	2024-11-22 16:12:53 -05:00
Alexey Bataev	b8703369da	[SLP] Match poison as instruction with the same opcode Patch allows vectorizing scalar instructions + poison values as if poisons are instructions with the same opcode. It allows better vectorization of the repeated values, reduces number of insertelement instructions and serves as a base ground for copyable elements vectorization AVX512, -O3 + LTO JM/ldecod - better vector code Applications/oggenc - better vectorization CINT2017speed/625.x264_s CINT2017rate/525.x264_r - better vector code CFP2017rate/526.blender_r - better vector code CFP2006/447.dealII - small variations Benchmarks/Bullet - extra vector code CFP2017rate/510.parest_r - better vectorization CINT2017rate/502.gcc_r CINT2017speed/602.gcc_s - extra vector code Benchmarks/tramp3d-v4 - small variations CFP2006/453.povray - extra vector code JM/lencod - better vector code CFP2017rate/511.povray_r - extra vector code MemFunctions/MemFunctions - extra vector code LoopVectorization/LoopVectorizationBenchmarks - extra vector code XRay/FDRMode - extra vector code XRay/ReturnReference - extra vector code LCALS/SubsetCLambdaLoops - extra vector code LCALS/SubsetCRawLoops - extra vector code LCALS/SubsetARawLoops - extra vector code LCALS/SubsetALambdaLoops - extra vector code DOE-ProxyApps-C++/miniFE - extra vector code LoopVectorization/LoopInterleavingBenchmarks - extra vector code LCALS/SubsetBLambdaLoops - extra vector code MicroBenchmarks/harris - extra vector code ImageProcessing/Dither - extra vector code MicroBenchmarks/SLPVectorization - extra vector code ImageProcessing/Blur - extra vector code ImageProcessing/Dilate - extra vector code Builtins/Int128 - extra vector code ImageProcessing/Interpolation - extra vector code ImageProcessing/BilateralFiltering - extra vector code ImageProcessing/AnisotropicDiffusion - extra vector code MicroBenchmarks/LoopInterchange - extra code vectorized LCALS/SubsetBRawLoops - extra code vectorized CINT2006/464.h264ref - extra vectorization with wider vectors CFP2017rate/508.namd_r - small variations, extra phis vectorized CFP2006/444.namd - 2 2 x phi replaced by 4 x phi DOE-ProxyApps-C/SimpleMOC - extra code vectorized CINT2017rate/541.leela_r CINT2017speed/641.leela_s - the function better vectorized and inlined Benchmarks/Misc/oourafft - 2 4 x bit reductions replaced by 2 x vector code FreeBench/fourinarow - better vectorization Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115946	2024-11-22 16:10:17 -05:00
Alexey Bataev	9c9e030fba	[SLP][NFC]Add a test with the RISCV ctpop-based reduction	2024-11-22 09:25:00 -08:00
Han-Kuan Chen	39913ae095	[SLP][REVEC] Make reorderTopToBottom support ShuffleVectorInst. (#117310) We don't want reorderTopToBottom to reorder ShuffleVectorInst (because ShuffleVectorInst currently supports only a limited set of patterns). Either we make ShuffleVectorInst support more patterns, or we let ReorderIndices reorder the result of the vectorization of ShuffleVectorInst. We choose the latter solution.	2024-11-23 01:20:57 +08:00
Alexey Bataev	14bdcefbd8	[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n)) Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector extensions + reduction add, but later instcombiner transforms it into ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for this in SLP vectorizer, which enables better cost estimation. AVX512, -O3+LTO CINT2006/445.gobmk - extra vector code Prolangs-C/bison - extra vector code Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24 x reduction Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/116875	2024-11-22 06:50:25 -08:00
Alexey Bataev	07507cb591	[SLP]Fix shuffling of entries of the different sizes Need to choose the size of vector factor for mask based on the entries vector factors, not mask size, to generate correct code. Fixes #117170	2024-11-21 13:08:27 -08:00
Alexey Bataev	b62557aaeb	Revert "[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))" This reverts commit 0298c5921d3b9fbeb5fefc2555321ea82ade6090 to fix a buildbot crash reported by https://lab.llvm.org/buildbot/#/builders/113/builds/4079.	2024-11-21 12:52:55 -08:00
Alexey Bataev	0298c5921d	[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n)) Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector extensions + reduction add, but later instcombiner transforms it into ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for this in SLP vectorizer, which enables better cost estimation. AVX512, -O3+LTO CINT2006/445.gobmk - extra vector code Prolangs-C/bison - extra vector code Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24 x reduction Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/116875	2024-11-21 13:21:00 -05:00
Alexey Bataev	58c8d73172	[SLP][NFC]Add a test with multi reductions, NFC	2024-11-21 09:48:19 -08:00
Sushant Gokhale	197fb270cc	[AArch64][NFC] NFC for const vector as Instruction operand (#116790) Current cost-modelling does not take into account the cost of materializing a const vector. As a result, some cases, as the test shows, are vectorized even though this may not always be profitable. A future patch will try to address this issue.	2024-11-21 10:23:05 +05:30
Han-Kuan Chen	a62c5497c9	[SLP][REVEC] The vectorized result for ShuffleVector may not be ShuffleVectorInst. (#116940)	2024-11-20 23:59:23 +08:00
Alexey Bataev	79682c4d57	[SLP]Check if the buildvector root is not a part of the graph before deletion If the buildvector root has no uses, it might be still needed as a part of the graph, so need to check that it is not a part of the graph before deletion. Fixes #116852	2024-11-19 11:31:40 -08:00
Sushant Gokhale	7e85cb8a8a	[AArch64][NFC] Add test as a representative of scalarizing a vector integer division (#114107) Scalarizing is considered the last resort for vectorizing a bundle of integer divisions. Currently, the cost estimates for scalarizing a vector division can be considerably overestimated, as is the case with this motivating test case, i.e. the vector cost should not deviate much from the scalar cost. A future patch will try to improve the scalarization cost.	2024-11-19 13:52:56 +05:30
Alexey Bataev	ad9c0b369e	[SLP]Check if the gathered loads form full vector before attempting to build it Need to check that the number of gathered loads in the slice forms the build vector to avoid compiler crash. Fixes #116691	2024-11-18 14:09:31 -08:00
Alexey Bataev	f6e1d64458	[SLP]Enable interleaved stores support Enables interleaved stores, results in better estimation for segmented stores for RISC-V Reviewers: preames, topperc, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115354	2024-11-15 11:01:57 -05:00
Alexey Bataev	af3295bd3d	[SLP]Enable splat ordering for loads Enables splat support for loads with lanes > 2 or number of operands > 2. Allows better detection of splats of loads and reduces the number of shuffles in some cases. X86, AVX512, -O3+LTO Metric: size..text results results0 diff test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 154867.00 156723.00 1.2% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12467735.00 12468023.00 0.0% Better vectorization quality Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115173	2024-11-15 10:29:43 -05:00
Alexey Bataev	058ac837bc	[SLP]Use generic createShuffle for buildvector Use generic createShuffle function, which knows how to adjust the vectors correctly, to avoid compiler crash when trying to build a buildvector as a shuffle. Fixes #115732	2024-11-11 10:49:39 -08:00
Han-Kuan Chen	3cdd86bb47	[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#115417)	2024-11-10 13:53:15 +08:00
Tex Riddell	818d715989	[Analysis] atan2: isTriviallyVectorizable; add to massv and accelerate veclibs (#113637) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - Return true for atan2 from isTriviallyVectorizable - Add atan2 to VecFuncs.def for massv and accelerate libraries. - Add atan2 to hasOptimizedCodeGen - Add atan2 support in llvm/lib/Analysis/ValueTracking.cpp llvm::getIntrinsicForCallSite and update vectorization tests - Add atan2 name check to isLoweredToCall in llvm/include/llvm/Analysis/TargetTransformInfoImpl.h - Note: there's no test coverage for these names in isLoweredToCall, except that Transforms/TailCallElim/inf-recursion.ll is impacted by the "fabs" case Thanks to @jroelofs for the atan2 accelerate veclib and associated test additions, plus the hasOptimizedCodeGen addition. Part of: Implement the atan2 HLSL Function #70096.	2024-11-08 16:07:38 -08:00
Alexey Bataev	77bec78878	[SLP]Do not look for last instruction in schedule block for buildvectors If looking for the insertion point for the node and the node is a buildvector node, the compiler should not use scheduling info for such nodes, they may contain only partial info, which is not fully correct and may cause compiler crash. Fixes #114082	2024-11-08 06:55:29 -08:00
Alexey Bataev	62db1c8a07	[SLP]Better decision making on whether to try stores packs for vectorization Since the stores are sorted by distance, comparing the indices in the original array and exiting early if the index is less than the index of the last store is not always the best strategy. Better to remove such stores explicitly to better check for the vectorization opportunity. Fixes #115008	2024-11-07 14:23:15 -08:00
Alexey Bataev	dec3839979	[SLP][NFC]Add a test with the missed vectorization opportunity for stores with same address	2024-11-07 13:53:23 -08:00
Kazu Hirata	22b4b1ab10	Revert "[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946)" This reverts commit f58757b8dc167809b69ec00f9b5ab59281df0902. Failing buildbots: https://lab.llvm.org/buildbot/#/builders/174/builds/8058 https://lab.llvm.org/buildbot/#/builders/127/builds/1357	2024-11-07 10:43:11 -08:00
Han-Kuan Chen	f58757b8dc	[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946)	2024-11-08 00:52:59 +08:00
Alexey Bataev	79fd615759	[SLP][NFC]Add a test with the segmented loads, NFC	2024-11-07 07:08:24 -08:00
Luke Lau	343a810725	[RISCV] Allow f16/bf16 with zvfhmin/zvfbfmin as legal strided access (#115264) This is also split off from the zvfhmin/zvfbfmin isLegalElementTypeForRVV work. Enabling this will cause SLP and RISCVGatherScatterLowering to emit @llvm.experimental.vp.strided.{load,store} intrinsics, and codegen support for this was added in #109387 and #114750.	2024-11-07 14:40:15 +08:00
Han-Kuan Chen	c6091cdbed	[SLP][REVEC] Allow shufflevector to be vectorized with ReorderIndices and ReuseShuffleIndices. (#114965)	2024-11-07 11:04:34 +08:00
Alexey Bataev	76422385c3	[SLP]Support reordered buildvector nodes for better clustering Patch adds reordering of the buildvector nodes for better clustering of the compatible operations and future vectorization. Includes basic cost estimation and if the transformation is not profitable - reverts it. AVX512, -O3+LTO Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test 74565.00 75701.00 1.5% test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 75773.00 76397.00 0.8% test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 75773.00 76397.00 0.8% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2014462.00 2024494.00 0.5% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 395219.00 396979.00 0.4% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 857795.00 859667.00 0.2% test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 800472.00 802440.00 0.2% test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test 590699.00 591403.00 0.1% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 203006.00 203102.00 0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 42408.00 42424.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12451575.00 12451927.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1396480.00 1396448.00 -0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1396480.00 1396448.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1047708.00 1047580.00 -0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111344.00 111328.00 -0.0% test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1087660.00 1087500.00 -0.0% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 280664.00 280616.00 -0.0% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 502646.00 502006.00 -0.1% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1033135.00 1031567.00 -0.2% test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 2070917.00 2065845.00 -0.2% test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 2070917.00 2065845.00 -0.2% test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test 33893.00 33797.00 -0.3% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 39677.00 39549.00 -0.3% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 39674.00 39546.00 -0.3% test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test 11560.00 11512.00 -0.4% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 653867.00 649275.00 -0.7% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 653867.00 649275.00 -0.7% CINT2006/401.bzip2 - extra code vectorized CINT2017rate/541.leela_r CINT2017speed/641.leela_s - function _ZN9FastBoard25get_pattern3_augment_specEiib not inlined anymore, better vectorization CFP2017rate/510.parest_r - better vectorization JM/ldecod - better vectorization JM/lencod - same CINT2006/464.h264ref - extra code vectorized CFP2006/447.dealII - extra vector code MiBench/consumer-lame - vectorized 2 loops previously scalar DOE-ProxyApps-C/miniGMG - small changes Benchmarks/7zip - extra code vectorized, better vectorization CFP2017rate/526.blender_r - extra vectorization CFP2017speed/638.imagick_s CFP2017rate/538.imagick_r - extra vectorization MiBench/consumer-jpeg - extra vectorization CINT2006/400.perlbench - extra vectorization Prolangs-C/TimberWolfMC - small variations Applications/sqlite3 - extra function vectorized and inlined Benchmarks/tramp3d-v4 - extra code vectorized CINT2017rate/500.perlbench_r CINT2017speed/600.perlbench_s - extra code vectorized, function digcpy gets vectorized and inlined CINT2006/473.astar - extra code vectorized MiBench/telecomm-gsm - extra code vectorized, better vector code mediabench/gsm - same MiBench/security-blowfish - extra code vectorized CINT2017speed/625.x264_s CINT2017rate/525.x264_r - sub4x4_dct function vectorized and gets inlined RISCV-V, SiFive-p670, O3+LTO CFP2017rate/510.parest_r - extra vectorization CFP2017rate/526.blender_r - extra vectorization MiBench/consumer-lame - extra vectorized code Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/114284	2024-11-06 10:51:15 -05:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)	2024-11-06 11:53:33 +00:00
Alexey Bataev	c1cec8c0dc	[SLP][NFC]Add a test with missed splat ordering for loads, NFC	2024-11-05 14:08:17 -08:00
Alexey Bataev	0c18def2c1	[SLP]Allow interleaving check only if it is less than number of elements Need to check if the interleaving factor is less than total number of elements in loads slice to handle it correctly and avoid compiler crash. Fixes report https://github.com/llvm/llvm-project/pull/112361#issuecomment-2457227670	2024-11-05 07:06:15 -08:00
Alexey Bataev	899336735a	[SLP]Be more pessimistic about poisonous reductions Consider all possible reductions ops as being non-poisoning boolean logical operations, which require freeze to be fully correct. https://alive2.llvm.org/ce/z/TKWDMP Fixes #114738	2024-11-04 06:13:52 -08:00
Alexey Bataev	a15bf88d53	[SLP][NFC]Add a test with missing freeze instruction before reduction, NFC	2024-11-04 04:38:09 -08:00
Simon Pilgrim	ac1869aa70	[CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680) Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs struggle to handle this and have to fall back to worst case shuffles that reference elements from any lane. This patch detects shuffle masks that we know are "inlane" and enables us to assume a cheaper shuffle cost.	2024-11-04 10:19:02 +00:00
Han-Kuan Chen	a795a18bba	[SLP][REVEC] VF should be scaled when ScalarTy is FixedVectorType. (#114551)	2024-11-02 03:03:52 +08:00
Han-Kuan Chen	e4aeeba84c	[SLP][REVEC] When ScalarTy is FixedVectorType, the insertion index should consider the number of elements of ScalarTy. (#114526)	2024-11-01 21:17:57 +08:00
Alexey Bataev	e05def081e	[SLP]Do not vectorize code in EH and non-returning blocks The code in EH and non-returning blocks can be skipped by the vectorizer, since it does not add to the performance, just consumes compile/link time. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112221	2024-10-31 13:50:02 -04:00
Alexey Bataev	19a34dded7	[SLP]Do not account external uses in EH block and in non-returning blocks No need to account the cost of the external uses in EH and non-returning basic blocks. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112045	2024-10-31 13:23:43 -04:00
Alexey Bataev	e7080fd735	[SLP]Extra check if the instruction marked for removal must be replaced in reduction ops If the instruction is vectorized and it is a part of the reduced values gather/buildvector node, it should be replaced in reduced operation instructions before removal properly, to avoid compiler crash. Fixes #114371	2024-10-31 09:59:35 -07:00
Matthias Braun	255e441613	X86: Do not return invalid cost for fp16 conversion (#114128) Returning invalid instruction costs when converting from/to fp16 in `X86TTIImpl::getCastInstrCost` when there is no hardware support available was triggering asserts. This changes the code to return a large (arbitrary) number to model the fact that libcalls are used to implement the conversion. This also simplifies the code by only reporting costs for the scalar fp16 conversion; vectorized costs being left to the fallback assuming scalarization. This is a follow-up to assertion issues reported for the changes in #113195	2024-10-29 17:16:17 -07:00
Sushant Gokhale	c9f01f699c	[SLP][AArch64][NFC] Add more tests for SLP vectorization of div (#113876) Currently, we don't have many tests that show the SLP outcome for integer divisions. This patch adds tests for the same. In certain scenarios, for Neon, vectorization is profitable. An attempt will be made in the future to improve the cost-model for the same.	2024-10-28 20:37:41 +05:30
Alexey Bataev	7152bf3bc8	[SLP]Do not create new vector node if scalars fully overlap with the existing one If the list of scalars is vectorized as part of the same vector node, there is no need to generate the vector node again; it will be handled as part of overlapping matching. Fixes #113810	2024-10-28 06:59:41 -07:00
Matthias Braun	054c23d78f	X86: Improve cost model of fp16 conversion (#113195) Improve cost-modeling for x86 __fp16 conversions so the SLPVectorizer transforms the patterns: - Override `X86TTIImpl::getStoreMinimumVF` to report a minimum VF of 4 (an SSE register can hold 4xfloat converted/stored to 4xf16); this is necessary as fp16 stores are neither modeled as trunc-stores, nor can we mark direct Xxfp16 stores as legal, as we generally expand fp16 operations. - Add missing cost entries to `X86TTIImpl::getCastInstrCost` conversion from/to fp16. Note that conversion from f64 to f16 is not supported by an X86 instruction.	2024-10-25 16:22:24 -07:00
Jonas Paulsson	aba39c3974	[System] Precommit of test for #112491 (#113704)	2024-10-25 17:40:00 +02:00
Alexey Bataev	e914421d7f	[SLP]Do correct signedness analysis for externally used scalars If an externally used scalar is in the root node, it may have incorrect signedness info because of the conflict with the demanded bits analysis. Need to perform exact signedness analysis and compute it rather than rely on the precomputed value, which might be incorrect for alternate zext/sext nodes. Fixes #113520	2024-10-24 08:59:24 -07:00
Alexey Bataev	d2e7ee77d3	[SLP]Do not check for clustered loads only Since SLP supports "clusterization" of the non-load instructions, the restriction for reduced values for loads only should be removed to avoid compiler crash. Fixes #113516	2024-10-24 08:16:42 -07:00

2019 Commits