llvm-project

Author	SHA1	Message	Date
Florian Hahn	a7fda0e1e4	[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305 ) Introduce a general recipe to generate a scalar phi. Lower VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarIVPHIrecipe before plan execution, avoiding the need for duplicated ::execute implementations. There are other cases that could benefit, including in-loop reduction phis and pointer induction phis. Builds on a similar idea as https://github.com/llvm/llvm-project/pull/82270. PR: https://github.com/llvm/llvm-project/pull/114305	2024-12-03 14:53:51 +00:00
Ramkumar Ramachandra	2a0ee090db	IVDesc: strip redundant arg in getOpcode call (NFC) (#118476 )	2024-12-03 13:40:51 +00:00
Han-Kuan Chen	f71ea4bc1b	[SLP][REVEC] reorderNodeWithReuses should not be called if all users of a TreeEntry are ShuffleVectorInst. (#118260 )	2024-12-03 09:04:04 +08:00
Florian Hahn	77767986ed	[LV] Use IsaPred in a few more places (NFC). Simplifies the code slightly by removing explicit lambdas.	2024-12-01 18:47:53 +00:00
Jonas Paulsson	0ad6be1927	[SLPVectorizer, TargetTransformInfo, SystemZ] Improve SLP getGatherCost(). (#112491 ) As vector element loads are free on SystemZ, this patch improves the cost computation in getGatherCost() to reflect this. getScalarizationOverhead() gets an optional parameter which can hold the actual Values so that they in turn can be passed (by BasicTTIImpl) to getVectorInstrCost(). SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and typically return a 0 cost for it, with some exceptions.	2024-11-29 21:19:45 +01:00
Alexey Bataev	f4974e0931	[SLP] Add a check for poison value in AShrChecker Need to check if the value in AShrChecker is a poison before casting it to instruction to avoid compiler crash Fixes #118030	2024-11-29 06:51:19 -08:00
Luke Lau	d9c269577e	[VPlan] Remove manual constant fold in VPWidenIntOrFpInductionRecipe. NFC (#118028 ) This manual constant folding was added in 2017 in https://reviews.llvm.org/D29956, but since then it looks like IRBuilder has learnt to fold it away itself. I'm not sure at what point this happened, I just verified this by stepping through the call to CreateVectorSplat in the debugger.	2024-11-29 00:21:53 +01:00
Florian Hahn	82821254f5	[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758 ) If IVUpdateMayOverflow is false, we proved that the induction increment cannot overflow in the vector loop. This allows setting NUW in some cases when folding the tail. PR: https://github.com/llvm/llvm-project/pull/111758	2024-11-28 10:12:41 +00:00
Elvis Wang	9ea5be639d	Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109 )" (#117289 ) Update the test cases that contain `any-of` printings from the precomputeCost(). Origin message: The any-of reduction contains phi and select instructions. The select instruction might be optimized and removed in the vplan which may cause VF difference between legacy and VPlan-based model. But if the select instruction is removed, planContainsAdditionalSimplifications() will catch it and disable the assertion. Therefore, we can just remove the any-of reduction calculation in the precomputeCost(). Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109)"	2024-11-28 15:07:36 +08:00
LiqinWeng	4a3f46de50	[LV][EVL] Support call instruction with EVL-vectorization (#110412 )	2024-11-28 10:05:08 +08:00
Han-Kuan Chen	ead3a2f598	[SLP][REVEC] getScalarizationOverhead should not be used when ScalarTy is FixedVectorType. (#117536 )	2024-11-26 22:05:54 +08:00
Alexey Bataev	76f0ff8210	[SLP]Add an extra check to avoid infinite vectorization attempts Added extra check for the cost of the buildvector if the -slp-threshold option is used. Prevents infinite vectorization attempts.	2024-11-25 14:27:44 -08:00
Florian Hahn	30af6fb163	[VPlan] Group together helpers for retrieving various VPBlocks (NFCI). Group together functions to retrieve various blocks of a VPlan, as suggested in https://github.com/llvm/llvm-project/pull/114292.	2024-11-25 21:22:45 +00:00
Florian Hahn	466ff3ed70	[VPlan] Mark VPIRInstruction::getInstruction) as const (NFCI). Split off from https://github.com/llvm/llvm-project/pull/114292.	2024-11-25 21:20:56 +00:00
Alexey Bataev	f953b5eb72	[SLP]Relax assertion about subvectors mask size SubVectorsMask might be less than CommonMask, if the vectors with larger number of elements are permuted or reused elements are used. Need to consider this when estimation/building the vector to avoid compiler crash Fixes #117518	2024-11-25 08:31:42 -08:00
Alexey Bataev	57bbdbd7ae	[SLP]Relax assertion in mask combine for non-power-of-2 number of elements The nodes may contain non-power-of-2 number of elements. Need to relax the assertion to avoid possible compiler crash Fixes #117517	2024-11-25 07:58:19 -08:00
Florian Hahn	590f451b60	[VPlan] Allow setting IR name for VPDerivedIVRecipe (NFCI). Allow setting the name to use for the generated IR value of the derived IV in preparations for https://github.com/llvm/llvm-project/pull/112145. This is analogous to VPInstruction::Name.	2024-11-24 20:39:12 +00:00
Florian Hahn	0dbdc6dc35	[VPlan] Simplify code to re-use existing basic blocks (NFCI). Restructure and slightly simplify code to re-use existing basic blocks.	2024-11-24 19:14:29 +00:00
LiqinWeng	042a1cc553	[VPlan] Generalize type inference for binary/cast/shift/logic. NFC (#116173 )	2024-11-24 09:14:14 +08:00
Florian Hahn	e2519b674c	[VPlan] Print incoming VPBB for Phi VPIRInstruction (NFC). Print the incoming block for Phi VPIRInstructions, for better debugging & testing.	2024-11-23 19:06:58 +00:00
Florian Hahn	590913983c	[VPlan] Simplify and unify code in verifyEVLRecipe using all_of. (NFCI) Use all_of instead of explicit loop to reduce indentation, also properly check VPScalarCastRecipe operand.	2024-11-23 11:12:33 +00:00
Alexey Bataev	7523086a05	[SLP]Use getExtendedReduction cost and fix reduction cost calculations Patch uses getExtendedReduction for reductions of ext-based nodes + adds cost estimation for ctpop-kind reductions into basic implementation and RISCV-V specific vcpop cost estimation. Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/117350	2024-11-22 16:12:53 -05:00
Alexey Bataev	b8703369da	[SLP] Match poison as instruction with the same opcode Patch allows to vector scalar instruction + poison values as if poisons are instructions with the same opcode. It allows better vectorization of the repeated values, reduces number of insertelement instructions and serves as a base ground for copyable elements vectorization AVX512, -O3 + LTO JM/ldecod - better vector code Applications/oggenc - better vectorization CINT2017speed/625.x264_s CINT2017rate/525.x264_r - better vector code CFP2017rate/526.blender_r - better vector code CFP2006/447.dealII - small variations Benchmarks/Bullet - extra vector code CFP2017rate/510.parest_r - better vectorization CINT2017rate/502.gcc_r CINT2017speed/602.gcc_s - extra vector code Benchmarks/tramp3d-v4 - small variations CFP2006/453.povray - extra vector code JM/lencod - better vector code CFP2017rate/511.povray_r - extra vector code MemFunctions/MemFunctions - extra vector code LoopVectorization/LoopVectorizationBenchmarks - extra vector code XRay/FDRMode - extra vector code XRay/ReturnReference - extra vector code LCALS/SubsetCLambdaLoops - extra vector code LCALS/SubsetCRawLoops - extra vector code LCALS/SubsetARawLoops - extra vector code LCALS/SubsetALambdaLoops - extra vector code DOE-ProxyApps-C++/miniFE - extra vector code LoopVectorization/LoopInterleavingBenchmarks - extra vector code LCALS/SubsetBLambdaLoops - extra vector code MicroBenchmarks/harris - extra vector code ImageProcessing/Dither - extra vector code MicroBenchmarks/SLPVectorization - extra vector code ImageProcessing/Blur - extra vector code ImageProcessing/Dilate - extra vector code Builtins/Int128 - extra vector code ImageProcessing/Interpolation - extra vector code ImageProcessing/BilateralFiltering - extra vector code ImageProcessing/AnisotropicDiffusion - extra vector code MicroBenchmarks/LoopInterchange - extra code vectorized LCALS/SubsetBRawLoops - extra code vectorized CINT2006/464.h264ref - extra vectorization with wider vectors CFP2017rate/508.namd_r - small variations, extra phis vectorized CFP2006/444.namd - 2 2 x phi replaced by 4 x phi DOE-ProxyApps-C/SimpleMOC - extra code vectorized CINT2017rate/541.leela_r CINT2017speed/641.leela_s - the function better vectorized and inlined Benchmarks/Misc/oourafft - 2 4 x bit reductions replaced by 2 x vector code FreeBench/fourinarow - better vectorization Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115946	2024-11-22 16:10:17 -05:00
Han-Kuan Chen	39913ae095	[SLP][REVEC] Make reorderTopToBottom support ShuffleVectorInst. (#117310 ) We don't want reorderTopToBottom to reorder ShuffleVectorInst (because ShuffleVectorInst currently supports only a limited set of patterns). Either we make ShuffleVectorInst support more patterns, or we let ReorderIndices reorder the result of the vectorization of ShuffleVectorInst. We choose the latter solution.	2024-11-23 01:20:57 +08:00
Alexey Bataev	14bdcefbd8	[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n)) Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector extensions + reduction add, but later instcombiner transforms it into ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for this in SLP vectorizer, which enables better cost estimation. AVX512, -O3+LTO CINT2006/445.gobmk - extra vector code Prolangs-C/bison - extra vector code Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24 x reduction Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/116875	2024-11-22 06:50:25 -08:00
Han-Kuan Chen	68aa6ac58c	[SLP] NFC. Remove redundant computation in getReorderingData. (#117295 )	2024-11-22 18:54:41 +08:00
Han-Kuan Chen	55e9afab6e	[SLP] NFC. Remove the useless check for alternate instruction. (#117293 ) Only BinaryOperator and CastInst support alternate instruction. It always returns false for TreeEntry::isAltShuffle if an instruction is ExtractElementInst, ExtractValueInst, LoadInst, StoreInst or InsertElementInst.	2024-11-22 18:53:40 +08:00
Shih-Po Hung	632c5d2991	[VPlan] Support VPReverseVectorPointer in DataWithEVL vectorization (#113667 ) VPReverseVectorPointer relies on the runtime VF, but in DataWithEVL tail-folding, EVL (which can be less than VF at runtime) should be used instead. This patch updates the logic to check the users of VF and replaces the second operand if the user is VPReverseVectorPointer.	2024-11-22 17:18:39 +08:00
Elvis Wang	0e3c791916	Revert "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC" (#117280 ) Reverts llvm/llvm-project#117109 Some test cases need to update.	2024-11-22 11:32:52 +08:00
Elvis Wang	ce66b56865	[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109 ) The any-of reduction contains phi and select instructions. The select instruction might be optimized and removed in the vplan which may cause VF difference between legacy and VPlan-based model. But if the select instruction is removed, `planContainsAdditionalSimplifications()` will catch it and disable the assertion. Therefore, we can just remove the any-of reduction calculation in the precomputeCost().	2024-11-22 10:48:11 +08:00
Han-Kuan Chen	6b22e39f26	[SLP] NFC. Remove the useless check for alternate instruction. (#117116 ) Only BinaryOperator and CastInst support alternate instruction. It always returns false for TreeEntry::isAltShuffle if an instruction is ExtractElementInst, ExtractValueInst, LoadInst, StoreInst or InsertElementInst.	2024-11-22 10:39:41 +08:00
Alexey Bataev	68ce528def	[SLP]Fix vector factor calculation for adjusted mask Need to choose max vector factor as max(Mask.size(), prev-val-size). Fixes build errors in https://lab.llvm.org/buildbot/#/builders/95/builds/6504	2024-11-21 14:30:20 -08:00
Florian Hahn	4d1959b70b	[VPlan] Generalize collectUsersInExitBlocks for multiple exit bbs. (#115066 ) Generalize collectUsersInExitBlock to collecting exit users in multiple exit blocks. Exit blocks are leaf nodes in the VPlan (without successors) except the scalar header. Split off in preparation for https://github.com/llvm/llvm-project/pull/112138 PR: https://github.com/llvm/llvm-project/pull/115066	2024-11-21 21:15:36 +00:00
Florian Hahn	320038579d	[VPlan] Return cost of PHI for scalar VFs in computeCost for FORs. This fixes a crash when the VF is scalar. Fixes https://github.com/llvm/llvm-project/issues/116375.	2024-11-21 21:11:21 +00:00
Alexey Bataev	07507cb591	[SLP]Fix shuffling of entries of the different sizes Need to choose the size of vector factor for mask based on the entries vector factors, not mask size, to generate correct code. Fixes #117170	2024-11-21 13:08:27 -08:00
Alexey Bataev	b62557aaeb	Revert "[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))" This reverts commit 0298c5921d3b9fbeb5fefc2555321ea82ade6090 to fix a buildbot crash reported by https://lab.llvm.org/buildbot/#/builders/113/builds/4079.	2024-11-21 12:52:55 -08:00
Finn Plummer	8663b8777e	[NFC][VectorUtils][TargetTransformInfo] Add `isVectorIntrinsicWithOverloadTypeAtArg` api (#114849 ) This changes allows target intrinsics to specify and overwrite overloaded types. - Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case - Updates `SLPVectorizer` to use available TTI - Updates `VPTransformState` to pass down TTI - Updates `VPlanRecipe` to use passed-down TTI This change will let us add scalarization for `asdouble`: #114847	2024-11-21 11:04:25 -08:00
Alexey Bataev	0298c5921d	[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n)) Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector extensions + reduction add, but later instcombiner transforms it into ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for this in SLP vectorizer, which enables better cost estimation. AVX512, -O3+LTO CINT2006/445.gobmk - extra vector code Prolangs-C/bison - extra vector code Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24 x reduction Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/116875	2024-11-21 13:21:00 -05:00
Han-Kuan Chen	75b8f98ef6	[SLP] NFC. Change the comment to match the code execution. (#116022 ) Make code execute like the comment will modify many tests and affect the performance. As a result, we change the comment instead of the code.	2024-11-21 12:42:20 +08:00
Sterling-Augustine	c0ee8e22f4	[SandboxVec][SeedCollector] Reject non-simple memory ops for memory seeds (#116891 ) Load/Store isSimple is a necessary condition for VectorSeeds, but not sufficient, so reverse the condition and return value, and continue the check. Add relevant tests.	2024-11-20 11:53:41 -08:00
Han-Kuan Chen	a62c5497c9	[SLP][REVEC] The vectorized result for ShuffleVector may not be ShuffleVectorInst. (#116940 )	2024-11-20 23:59:23 +08:00
Alexey Bataev	b17f607703	[SLP][NFC]Remove unnecessary std::optional around Factor value	2024-11-20 05:54:15 -08:00
Sjoerd Meijer	9bccf61f5f	[AArch64][LV] Set MaxInterleaving to 4 for Neoverse V2 and V3 (#100385 ) Set the maximum interleaving factor to 4, aligning with the number of available SIMD pipelines. This increases the number of vector instructions in the vectorised loop body, enhancing performance during its execution. However, for very low iteration counts, the vectorised body might not execute at all, leaving only the epilogue loop to run. This issue affects e.g. cam4_r from SPEC FP, which experienced a performance regression. To address this, the patch reduces the minimum epilogue vectorisation factor from 16 to 8, enabling the epilogue to be vectorised and largely mitigating the regression.	2024-11-20 09:33:39 +00:00
vporpo	6e4821487f	[SandboxVec][DAG] Register callback for erase instr (#116742 ) This patch adds the callback registration logic in the DAG's constructor and the corresponding deregistration logic in the destructor. It also implements the code that makes sure that SchedBundle and DGNodes can be safely destroyed in any order.	2024-11-19 16:20:38 -08:00
Alexey Bataev	79682c4d57	[SLP]Check if the buildvector root is not a part of the graph before deletion If the buildvector root has no uses, it might be still needed as a part of the graph, so need to check that it is not a part of the graph before deletion. Fixes #116852	2024-11-19 11:31:40 -08:00
David Sherwood	12180717cb	[NFC][LoopVectorize] Introduce new getEstimatedRuntimeVF function (#116247 ) There are lots of places where we try to estimate the runtime vectorisation factor based on the getVScaleForTuning TTI hook. I've added a new getEstimatedRuntimeVF function and taught several places in the vectoriser to use this new function.	2024-11-19 12:38:11 +00:00
David Sherwood	3097c60928	[LoopVectorize][NFC] Rewrite tests to check output of vplan cost model (#113697 ) Currently it's very difficult to improve the cost model for tail-folded loops because as soon as you add a VPInstruction::computeCost function that adds the costs of instructions such as VPInstruction::ActiveLaneMask and VPInstruction::ExplicitVectorLength the assert in LoopVectorizationPlanner::computeBestVF fails for some tests. This is because the VF chosen by the legacy cost model doesn't match the vplan cost model. See PR #90191. This assert is currently making it difficult to improve the cost model. Hopefully we will be in a position to remove the assert soon, however in order to do that we have to fix up a whole bunch of tests that rely upon the legacy cost model output. I've tried my best to update these tests to use vplan output instead. There is still work needed for the VF=1 case because the vplan cost model is not printed out in this case. I've not attempted to fix those in this patch.	2024-11-19 08:55:39 +00:00
Alexey Bataev	ad9c0b369e	[SLP]Check if the gathered loads form full vector before attempting build it Need to check that the number of gathered loads in the slice forms the build vector to avoid compiler crash. Fixes #116691	2024-11-18 14:09:31 -08:00
Julian Nagele	a8538b9138	[LV] Vectorize Epilogues for loops with small VF but high IC (#108190 ) - Consider MainLoopVF * IC when determining whether Epilogue Vectorization is profitable - Allow the same VF for the Epilogue as for the main loop - Use an upper bound for the trip count of the Epilogue when choosing the Epilogue VF PR: https://github.com/llvm/llvm-project/pull/108190 --------- Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-11-17 19:35:32 +00:00
vporpo	1be9827754	[SandboxVec][BottomUpVec] Implement packing of vectors (#116447 ) Up until now we could only support packing of scalar elements. This patch fixes this by implementing packing of vector elements, by generating extractelement and insertelement instruction pairs.	2024-11-15 16:12:22 -08:00


5223 Commits