llvm-project

Author	SHA1	Message	Date
DianQK	e7a4d78ad3	[SLP] Check if instructions exist after vectorization (#120434 ) Fixes #120433.	2024-12-19 06:21:57 +08:00
Alexander Kornienko	23a239267e	Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460 ) Reverts llvm/llvm-project#119225 due to the lack of sanitizer support, large potential of breaking code containing latent UB, non-trivial localization and investigation, and what seems to be a bad interaction with msan (a test is in the works). Related discussions: https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822 https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255	2024-12-18 19:06:34 +01:00
Alexey Bataev	0e11e19416	[SLP][NFC]Remove undef and update tests	2024-12-17 11:45:20 -08:00
Alexey Bataev	d1a7225076	[SLP]Check if the node must keep its original bitwidth Need to check if during previous analysis the node has requested to keep its original bitwidth to avoid incorrect codegen. Fixes #120076	2024-12-16 08:01:22 -08:00
Alexey Bataev	c53901405a	[SLP][NFC]Add a test with incorrect bitwidth for the node, previously identified as non-shrinkable	2024-12-16 07:50:49 -08:00
Han-Kuan Chen	3133acf1fb	Revert "[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181 )" This reverts commit 82204154b7bd1f8c487c94c7ef00399d776b29f0.	2024-12-12 20:38:31 -08:00
Han-Kuan Chen	82204154b7	[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181 )	2024-12-13 12:06:10 +08:00
Han-Kuan Chen	2546ae4ed0	[SLP][REVEC] Fix the number of elements in the mask of a ShuffleVectorInst is not a power of 2. (#119689 ) The following shufflevector should not be vectorized when slp-vectorize-non-power-of-2 is enabled. shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 0, i32 1, i32 2> shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 4, i32 5, i32 6>	2024-12-13 02:22:41 +08:00
Han-Kuan Chen	51a0c1bf25	[SLP] NFC. Replace TreeEntry::setOperandsInOrder with VLOperands. (#118949 ) To reduce repeated code, TreeEntry::setOperandsInOrder will be replaced by VLOperands. Arg_size will be provided to make sure other operands will not be reordered when VL[0] is IntrinsicInst (because APO is a boolean value). In addition, BoUpSLP::reorderInputsAccordingToOpcode will also be removed since it is simple.	2024-12-11 10:09:23 +08:00
Alexey Bataev	a42aa8f265	[SLP]Fix adjusting of the mask for the fully matched nodes. When checking for the poison elements in the matched node, need to consider the register number when clearing the corresponding mask element. Fixes #119393	2024-12-10 09:47:16 -08:00
Nikita Popov	e21ab4d16b	[InstCombine] Infer nuw for gep inbounds from base of object (#119225 ) When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well. The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices. Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see https://github.com/AliveToolkit/alive2/issues/1138).	2024-12-10 10:00:50 +01:00
Nikita Popov	10f315dc9c	[ConstantFolding] Infer getelementptr nuw flag (#119214 ) Infer nuw from nusw and nneg. This is the constant expression variant of https://github.com/llvm/llvm-project/pull/111144. Proof: https://alive2.llvm.org/ce/z/ihztLy	2024-12-09 16:44:05 +01:00
Alexey Bataev	376dad72ab	[SLP]Move resulting vector before insert point, if the late generated buildvector fully matched If the perfect diamond match was detected for the postponed buildvectors and the vector for the previous node comes after the current node, need to move the vector register before the current inserting point to prevent compiler crash. Fixes #119002	2024-12-06 13:54:48 -08:00
Nikita Popov	f7685af4a5	[InstCombine] Move gep of phi fold into separate function This makes sure that an early return during this fold doesn't end up skipping later gep folds.	2024-12-05 15:20:56 +01:00
Nikita Popov	462cb3cd6c	[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144 ) If the gep is nusw (usually via inbounds) and the offset is non-negative, we can infer nuw. Proof: https://alive2.llvm.org/ce/z/ihztLy	2024-12-05 14:36:40 +01:00
Simon Pilgrim	85d15bd130	[TTI][X86] getMemoryOpCost - reduced costs when loading uniform values due to value reuse (#118642 ) Similar to what we do for broadcast shuffles, when legalising load costs, if the value is known to be uniform, then we will only load a single vector and reuse this across the split legalised registers. Fixes #111126	2024-12-04 16:36:00 +00:00
Simon Pilgrim	140df02aa2	[SLP][X86] Update test coverage for #111126 I'd copied the test case from #118016 instead of the original #111126 test case	2024-12-04 12:28:55 +00:00
Simon Pilgrim	2202f0e093	[SLP][X86] Add test coverage for #111126 This needs to be expanded to a wider range of tests but for now just focus on #111126	2024-12-04 10:03:43 +00:00
Lee Wei	9bf6365237	[llvm] Remove `br i1 undef` from some regression tests [NFC] (#118419 ) This PR removes tests with `br i1 undef` under `llvm/tests/Transforms/ObjCARC, Reassociate, SCCP, SLPVectorizer...`. After this PR, I'll continue to fix tests under `llvm/tests/CodeGen`, which has more UB tests than `llvm/tests/Transforms`.	2024-12-03 20:54:36 +00:00
Dominik Steenken	866b9f43a0	[SystemZ] Add realistic cost estimates for vector reduction intrinsics (#118319 ) This PR adds more realistic cost estimates for these reduction intrinsics - `llvm.vector.reduce.umax` - `llvm.vector.reduce.umin` - `llvm.vector.reduce.smax` - `llvm.vector.reduce.smin` - `llvm.vector.reduce.fadd` - `llvm.vector.reduce.fmul` - `llvm.vector.reduce.fmax` - `llvm.vector.reduce.fmin` - `llvm.vector.reduce.fmaximum` - `llvm.vector.reduce.fminimum` - `llvm.vector.reduce.mul` The pre-existing cost estimates for `llvm.vector.reduce.add` are moved to `getArithmeticReductionCosts` to reduce complexity in `getVectorIntrinsicInstrCost` and enable other passes, like the SLP vectorizer, to benefit from these updated calculations. These are not expected to provide noticeable performance improvements and are rather provided for the sake of completeness and correctness. This PR is in draft mode pending benchmark confirmation of this. This also provides and/or updates cost tests for all of these intrinsics. This PR was co-authored by me and @JonPsson1 .	2024-12-03 17:08:51 +01:00
Han-Kuan Chen	f71ea4bc1b	[SLP][REVEC] reorderNodeWithReuses should not be called if all users of a TreeEntry are ShuffleVectorInst. (#118260 )	2024-12-03 09:04:04 +08:00
Jonas Paulsson	0ad6be1927	[SLPVectorizer, TargetTransformInfo, SystemZ] Improve SLP getGatherCost(). (#112491 ) As vector element loads are free on SystemZ, this patch improves the cost computation in getGatherCost() to reflect this. getScalarizationOverhead() gets an optional parameter which can hold the actual Values so that they in turn can be passed (by BasicTTIImpl) to getVectorInstrCost(). SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and typically return a 0 cost for it, with some exceptions.	2024-11-29 21:19:45 +01:00
Alexey Bataev	f4974e0931	[SLP] Add a check for poison value in AShrChecker Need to check if the value in AShrChecker is a poison before casting it to instruction to avoid compiler crash Fixes #118030	2024-11-29 06:51:19 -08:00
Han-Kuan Chen	ead3a2f598	[SLP][REVEC] getScalarizationOverhead should not be used when ScalarTy is FixedVectorType. (#117536 )	2024-11-26 22:05:54 +08:00
Alexey Bataev	76f0ff8210	[SLP]Add an extra check to avoid infinite vectorization attempts Added extra check for the cost of the buildvector if the -slp-threshold option is used. Prevents infinite vectorization attempts.	2024-11-25 14:27:44 -08:00
Alexey Bataev	f953b5eb72	[SLP]Relax assertion about subvectors mask size SubVectorsMask might be less than CommonMask, if the vectors with larger number of elements are permuted or reused elements are used. Need to consider this when estimation/building the vector to avoid compiler crash Fixes #117518	2024-11-25 08:31:42 -08:00
Alexey Bataev	57bbdbd7ae	[SLP]Relax assertion in mask combine for non-power-of-2 number of elements The nodes may contain non-power-of-2 number of elements. Need to relax the assertion to avoid possible compiler crash Fixes #117517	2024-11-25 07:58:19 -08:00
Alexey Bataev	7523086a05	[SLP]Use getExtendedReduction cost and fix reduction cost calculations Patch uses getExtendedReduction for reductions of ext-based nodes + adds cost estimation for ctpop-kind reductions into basic implementation and RISCV-V specific vcpop cost estimation. Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/117350	2024-11-22 16:12:53 -05:00
Alexey Bataev	b8703369da	[SLP] Match poison as instruction with the same opcode Patch allows vectorizing scalar instructions + poison values as if poisons are instructions with the same opcode. It allows better vectorization of the repeated values, reduces number of insertelement instructions and serves as a base ground for copyable elements vectorization AVX512, -O3 + LTO JM/ldecod - better vector code Applications/oggenc - better vectorization CINT2017speed/625.x264_s CINT2017rate/525.x264_r - better vector code CFP2017rate/526.blender_r - better vector code CFP2006/447.dealII - small variations Benchmarks/Bullet - extra vector code CFP2017rate/510.parest_r - better vectorization CINT2017rate/502.gcc_r CINT2017speed/602.gcc_s - extra vector code Benchmarks/tramp3d-v4 - small variations CFP2006/453.povray - extra vector code JM/lencod - better vector code CFP2017rate/511.povray_r - extra vector code MemFunctions/MemFunctions - extra vector code LoopVectorization/LoopVectorizationBenchmarks - extra vector code XRay/FDRMode - extra vector code XRay/ReturnReference - extra vector code LCALS/SubsetCLambdaLoops - extra vector code LCALS/SubsetCRawLoops - extra vector code LCALS/SubsetARawLoops - extra vector code LCALS/SubsetALambdaLoops - extra vector code DOE-ProxyApps-C++/miniFE - extra vector code LoopVectorization/LoopInterleavingBenchmarks - extra vector code LCALS/SubsetBLambdaLoops - extra vector code MicroBenchmarks/harris - extra vector code ImageProcessing/Dither - extra vector code MicroBenchmarks/SLPVectorization - extra vector code ImageProcessing/Blur - extra vector code ImageProcessing/Dilate - extra vector code Builtins/Int128 - extra vector code ImageProcessing/Interpolation - extra vector code ImageProcessing/BilateralFiltering - extra vector code ImageProcessing/AnisotropicDiffusion - extra vector code MicroBenchmarks/LoopInterchange - extra code vectorized LCALS/SubsetBRawLoops - extra code vectorized CINT2006/464.h264ref - extra vectorization with wider vectors CFP2017rate/508.namd_r - small variations, extra phis vectorized CFP2006/444.namd - 2 2 x phi replaced by 4 x phi DOE-ProxyApps-C/SimpleMOC - extra code vectorized CINT2017rate/541.leela_r CINT2017speed/641.leela_s - the function better vectorized and inlined Benchmarks/Misc/oourafft - 2 4 x bit reductions replaced by 2 x vector code FreeBench/fourinarow - better vectorization Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115946	2024-11-22 16:10:17 -05:00
Alexey Bataev	9c9e030fba	[SLP][NFC]Add a test with the RISCV ctpop-based reduction	2024-11-22 09:25:00 -08:00
Han-Kuan Chen	39913ae095	[SLP][REVEC] Make reorderTopToBottom support ShuffleVectorInst. (#117310 ) We don't want reorderTopToBottom to reorder ShuffleVectorInst (because ShuffleVectorInst currently supports only a limited set of patterns). Either we make ShuffleVectorInst support more patterns, or we let ReorderIndices reorder the result of the vectorization of ShuffleVectorInst. We choose the latter solution.	2024-11-23 01:20:57 +08:00
Alexey Bataev	14bdcefbd8	[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n)) Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector extensions + reduction add, but later instcombiner transforms it into ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for this in SLP vectorizer, which enables better cost estimation. AVX512, -O3+LTO CINT2006/445.gobmk - extra vector code Prolangs-C/bison - extra vector code Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24 x reduction Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/116875	2024-11-22 06:50:25 -08:00
Alexey Bataev	07507cb591	[SLP]Fix shuffling of entries of the different sizes Need to choose the size of vector factor for mask based on the entries vector factors, not mask size, to generate correct code. Fixes #117170	2024-11-21 13:08:27 -08:00
Alexey Bataev	b62557aaeb	Revert "[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))" This reverts commit 0298c5921d3b9fbeb5fefc2555321ea82ade6090 to fix a buildbot crash reported by https://lab.llvm.org/buildbot/#/builders/113/builds/4079.	2024-11-21 12:52:55 -08:00
Alexey Bataev	0298c5921d	[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n)) Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector extensions + reduction add, but later instcombiner transforms it into ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for this in SLP vectorizer, which enables better cost estimation. AVX512, -O3+LTO CINT2006/445.gobmk - extra vector code Prolangs-C/bison - extra vector code Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24 x reduction Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/116875	2024-11-21 13:21:00 -05:00
Alexey Bataev	58c8d73172	[SLP][NFC]Add a test with multi reductions, NFC	2024-11-21 09:48:19 -08:00
Sushant Gokhale	197fb270cc	[AArch64][NFC] NFC for const vector as Instruction operand (#116790 ) Current cost-modelling does not take into account cost of materializing const vector. This results in some cases, as the test shows, being vectorized but this may not always be profitable. Future patch will try to address this issue.	2024-11-21 10:23:05 +05:30
Han-Kuan Chen	a62c5497c9	[SLP][REVEC] The vectorized result for ShuffleVector may not be ShuffleVectorInst. (#116940 )	2024-11-20 23:59:23 +08:00
Alexey Bataev	79682c4d57	[SLP]Check if the buildvector root is not a part of the graph before deletion If the buildvector root has no uses, it might be still needed as a part of the graph, so need to check that it is not a part of the graph before deletion. Fixes #116852	2024-11-19 11:31:40 -08:00
Sushant Gokhale	7e85cb8a8a	[AArch64][NFC] Add test as a representative of scalarizing a vector integer division (#114107 ) The last resort to vectorize a bundle of integer divisions is considered scalarizing it. Currently, the cost estimates for scalarizing a vector division can be considerably overestimated as is the scenario with this motivating test case i.e. vector cost should not deviate much from the scalar cost. Future patch will try to improve the scalarization cost.	2024-11-19 13:52:56 +05:30
Alexey Bataev	ad9c0b369e	[SLP]Check if the gathered loads form full vector before attempting build it Need to check that the number of gathered loads in the slice forms the build vector to avoid compiler crash. Fixes #116691	2024-11-18 14:09:31 -08:00
Alexey Bataev	f6e1d64458	[SLP]Enable interleaved stores support Enables interleaved stores, results in better estimation for segmented stores for RISC-V Reviewers: preames, topperc, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115354	2024-11-15 11:01:57 -05:00
Alexey Bataev	af3295bd3d	[SLP]Enable splat ordering for loads Enables splat support for loads with lanes > 2 or number of operands > 2. Allows better detection of splats of loads and reduces number of shuffles in some cases. X86, AVX512, -O3+LTO Metric: size..text (results / results0 / diff): test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 154867.00 / 156723.00 / 1.2%; test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12467735.00 / 12468023.00 / 0.0%. Better vectorization quality Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115173	2024-11-15 10:29:43 -05:00
Alexey Bataev	058ac837bc	[SLP]Use generic createShuffle for buildvector Use generic createShuffle function, which knows how to adjust the vectors correctly, to avoid compiler crash when trying to build a buildvector as a shuffle Fixes #115732	2024-11-11 10:49:39 -08:00
Han-Kuan Chen	3cdd86bb47	[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#115417 )	2024-11-10 13:53:15 +08:00
Tex Riddell	818d715989	[Analysis] atan2: isTriviallyVectorizable; add to massv and accelerate veclibs (#113637 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - Return true for atan2 from isTriviallyVectorizable - Add atan2 to VecFuncs.def for massv and accelerate libraries. - Add atan2 to hasOptimizedCodeGen - Add atan2 support in llvm/lib/Analysis/ValueTracking.cpp llvm::getIntrinsicForCallSite and update vectorization tests - Add atan2 name check to isLoweredToCall in llvm/include/llvm/Analysis/TargetTransformInfoImpl.h - Note: there's no test coverage for these names in isLoweredToCall, except that Transforms/TailCallElim/inf-recursion.ll is impacted by the "fabs" case Thanks to @jroelofs for the atan2 accelerate veclib and associated test additions, plus the hasOptimizedCodeGen addition. Part of: Implement the atan2 HLSL Function #70096.	2024-11-08 16:07:38 -08:00
Alexey Bataev	77bec78878	[SLP]Do not look for last instruction in schedule block for buildvectors If looking for the insertion point for the node and the node is a buildvector node, the compiler should not use scheduling info for such nodes, they may contain only partial info, which is not fully correct and may cause compiler crash. Fixes #114082	2024-11-08 06:55:29 -08:00
Alexey Bataev	62db1c8a07	[SLP]Better decision making on whether to try stores packs for vectorization Since the stores are sorted by distance, comparing the indices in the original array and early exit, if the index is less than the index of the last store, not always the best strategy. Better to remove such stores explicitly to try better to check for the vectorization opportunity. Fixes #115008	2024-11-07 14:23:15 -08:00
Alexey Bataev	dec3839979	[SLP][NFC]Add a test with the missed vectorization opportunity for stores with same address	2024-11-07 13:53:23 -08:00
Kazu Hirata	22b4b1ab10	Revert "[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946 )" This reverts commit f58757b8dc167809b69ec00f9b5ab59281df0902. Failing buildbots: https://lab.llvm.org/buildbot/#/builders/174/builds/8058 https://lab.llvm.org/buildbot/#/builders/127/builds/1357	2024-11-07 10:43:11 -08:00


2042 Commits
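Commit 2546ae4ed0 quotes two shufflevector instructions, each extracting a three-lane subvector from an `<8 x float>`. The Python model of shufflevector semantics below is a rough illustration, not LLVM code; `shufflevector`, `is_power_of_2` and `POISON` are hypothetical helpers. It shows what those masks select and why a 3-element mask is not a power of 2.

```python
# Hypothetical stand-in for a poison mask element / poison lane.
POISON = None

def shufflevector(v1, v2, mask):
    """Emulate LLVM IR shufflevector: result lane i is concat(v1, v2)[mask[i]]."""
    if len(v1) != len(v2):
        raise ValueError("both vector operands must have the same type")
    both = list(v1) + list(v2)
    # A poison mask element produces a poison result lane.
    return [POISON if m is POISON else both[m] for m in mask]

def is_power_of_2(n):
    """True for 1, 2, 4, 8, ...; the 3-element masks in the commit fail this."""
    return n > 0 and (n & (n - 1)) == 0

src = [float(i) for i in range(8)]   # models %1 : <8 x float>
poison_vec = [POISON] * 8            # models the poison second operand
low3 = shufflevector(src, poison_vec, [0, 1, 2])
high3 = shufflevector(src, poison_vec, [4, 5, 6])
```

Each result has three lanes. The commit says such shuffles should not be vectorized by REVEC when `slp-vectorize-non-power-of-2` is enabled.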
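Commits 462cb3cd6c and 10f315dc9c infer `nuw` for a getelementptr that is `nusw` and has a non-negative offset. Commit e21ab4d16b (later reverted by 23a239267e) obtains the non-negative offset from a gep inbounds off the base of an object. The brute-force check below is a minimal sketch of that reasoning, not the Alive2 proofs linked in the commits. It rests on these assumptions:
- a single, already-scaled offset;
- a deliberately tiny 6-bit index type;
- `nusw` read as "unsigned base + signed offset stays in range";
- `nuw` read as "unsigned base + unsigned offset stays in range".

```python
N = 6          # assumed tiny index width so the check is exhaustive
MOD = 1 << N

def as_signed(bits):
    """Interpret an N-bit pattern as a two's-complement integer."""
    return bits - MOD if bits >= MOD // 2 else bits

def nusw_holds(base, off_bits):
    """Unsigned base plus signed offset stays inside [0, 2^N)."""
    return 0 <= base + as_signed(off_bits) < MOD

def nuw_holds(base, off_bits):
    """Unsigned base plus unsigned offset stays below 2^N."""
    return base + off_bits < MOD

def nusw_and_nneg_imply_nuw():
    """Check every (base, offset) pair: nusw + non-negative offset => nuw."""
    for base in range(MOD):
        for off_bits in range(MOD):
            if as_signed(off_bits) >= 0 and nusw_holds(base, off_bits):
                if not nuw_holds(base, off_bits):
                    return False
    return True
```

When the offset is non-negative, its signed and unsigned readings coincide, so the two conditions are the same. Without `nneg` the inference fails. For example, base 10 with offset bits `0b111111` (signed -1) satisfies `nusw_holds` but not `nuw_holds`.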
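Commit 85d15bd130 reduces load costs when the loaded value is known to be uniform. In that case one vector is loaded and reused across the split, legalised registers. The toy cost function below is a hypothetical illustration of that idea only. It is not `X86TTIImpl::getMemoryOpCost`, and its unit costs and parameters are assumptions.

```python
import math

def toy_load_cost(vector_bits, register_bits, is_uniform):
    """Hypothetical cost model (not LLVM's).

    A vector wider than a legal register is split into num_parts registers.
    Normally each part needs its own load; if the value is uniform, one
    loaded register is reused for every part, so only one load is counted.
    """
    num_parts = math.ceil(vector_bits / register_bits)
    return 1 if is_uniform else num_parts
```

For instance, a 512-bit value legalised into 128-bit registers counts four loads in general but one when uniform. Real TTI costs also depend on alignment, address mode and element type, none of which this sketch models.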
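Commit 0ad6be1927 makes `getGatherCost()` aware of which scalars are loads, because vector element loads are free on SystemZ. The function below is a hypothetical simplification of that idea, not the `SystemZTTIImpl::getVectorInstrCost` logic. Lanes fed by a load cost nothing, and every other lane pays an assumed insert cost. It ignores the "some exceptions" the commit mentions.

```python
def toy_gather_cost(scalar_kinds, insert_cost=1):
    """Hypothetical gather-cost model.

    scalar_kinds lists how each lane's scalar is produced, e.g. "load" or "add".
    Inserting a computed scalar costs insert_cost; a scalar coming straight
    from a load is modeled as free (loaded directly into its vector element).
    """
    return sum(0 if kind == "load" else insert_cost for kind in scalar_kinds)
```

This is why the commit passes the actual Values down to the per-element cost hook: the cost depends on what produced each scalar, not only on the vector type.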
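Commits 0298c5921d and 14bdcefbd8 model `reduction_add(ext(<n x i1>))` as `ext(ctpop(bitcast <n x i1> to int n))`. The exhaustive check below is an illustration, not LLVM code, and covers the zero-extend case only. Under sign extension each true lane contributes -1, so the sum would be the negated population count.

```python
from itertools import product

def reduce_add_zext(lanes):
    """reduction_add(zext(<n x i1>)): widen each i1 lane to 0/1, then sum."""
    return sum(1 if lane else 0 for lane in lanes)

def bitcast_to_int(lanes):
    """bitcast <n x i1> to iN, mapping lane k to bit k (order does not affect ctpop)."""
    value = 0
    for k, lane in enumerate(lanes):
        if lane:
            value |= 1 << k
    return value

def ctpop(x):
    """Population count of a non-negative integer."""
    return bin(x).count("1")

def check_identity(n):
    """Compare both forms for every possible <n x i1> value."""
    return all(
        reduce_add_zext(lanes) == ctpop(bitcast_to_int(lanes))
        for lanes in product([False, True], repeat=n)
    )
```

Modeling the reduction as one `ctpop` on a scalar, rather than a vector extension plus a vector reduction, matches the form InstCombine later produces. The commit message gives this as the reason for the better cost estimation.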
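Commit f6e1d64458 enables interleaved stores, which improves estimation for segmented stores on RISC-V. As an illustration only, the sketch below shows the memory layout an interleaved (segmented) store with factor F produces: element j of field i is written to index `j * F + i`. It also shows that this layout equals the scalar store sequence it replaces. The helper names are hypothetical.

```python
def interleaved_store(fields):
    """Write `factor` equal-length vectors into one flat buffer so that
    element j of field i lands at index j * factor + i."""
    factor = len(fields)
    vf = len(fields[0])
    if any(len(f) != vf for f in fields):
        raise ValueError("all fields must have the same number of elements")
    memory = [None] * (factor * vf)
    for i, field in enumerate(fields):
        for j, value in enumerate(field):
            memory[j * factor + i] = value
    return memory

def scalar_stores(fields):
    """The scalar sequence being replaced: for each j, store field0[j], field1[j], ..."""
    out = []
    for j in range(len(fields[0])):
        for field in fields:
            out.append(field[j])
    return out
```

A typical source of this pattern is storing the members of an array of structs, such as the `x` and `y` of consecutive points.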