llvm-project

Author	SHA1	Message	Date
Pietro Ghiglio	83d9aa2768	[VPlan] Add scalar inferencing support for addrspace cast (#92107) Fixes https://github.com/llvm/llvm-project/issues/91434 PR: https://github.com/llvm/llvm-project/pull/92107	2024-05-15 14:03:21 +01:00
Jay Foad	1650f1b3d7	Fix typo "indicies" (#92232)	2024-05-15 13:10:16 +01:00
Florian Hahn	d187005cad	[VPlan] Update VPBlendRecipe codegen for first-lane only. Update VPBlendRecipe::execute to support generating code for first-lane only. This fixes a crash in the newly added test @test_not_first_lane_only_wide_compare_incoming_order_swapped.	2024-05-15 11:00:15 +01:00
Florian Hahn	67d840b60f	[VPlan] Relax over-aggressive assertion in VPTransformState::get(). There are cases where a vector value has some users that demand the single scalar value only (NeedsScalar), while other users demand the vector value (see attached test cases). In those cases, the NeedsScalar users should only demand the first lane. Fixes https://github.com/llvm/llvm-project/issues/91883.	2024-05-14 19:10:49 +01:00
Florian Hahn	b1e99a699d	[LV] Drop redundant comment from createEdgeMask (NFC). Follow-up to remove a redundant comment post-commit https://github.com/llvm/llvm-project/pull/91897	2024-05-14 12:43:47 +01:00
Ramkumar Ramachandra	d7ef34bfe3	[LV] update comment following 63d8058 (NFC) (#91120) Address a review comment post landing 63d8058 (LoopVectorize: guard appending InstsToScalarize; fix bug) to update a comment.	2024-05-14 10:59:26 +01:00
Florian Hahn	632317e9ab	[VPlan] Add non-poison propagating LogicalAnd VPInstruction opcode. (#91897) Add a new opcode to model non-poison propagating logical AND operations used when generating edge masks. This follows the similar decision to model Not as a dedicated opcode as well, to improve clarity. This also helps to simplify the matchers for https://github.com/llvm/llvm-project/pull/89386. PR: https://github.com/llvm/llvm-project/pull/91897	2024-05-14 09:42:49 +01:00
Florian Hahn	e122380445	[LV] Use VPBuilder to create Select (NFCI).	2024-05-13 20:44:39 +01:00
David Green	b7ed097f29	[VectorCombine] Add intrinsics handling to shuffleToIdentity (#91000) This is probably the most involved addition, as it tries to make use of isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a number of different intrinsics that are all lane-wise. Additional tests have been added for some of the different intrinsics from isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.	2024-05-12 20:31:11 +01:00
Florian Hahn	c3d2af0f4e	[VPlan] VPEVLBasedIVPHI is a VPSingleDefRecipe. VPEVLBasedIVPHIRecipe inherits from VPSingleDefRecipe. Add VPEVLBasedIVPHISC to VPSingleDefRecipe::classof to make isa/dyn_cast & co work as expected. Split off https://github.com/llvm/llvm-project/pull/67934.	2024-05-09 19:18:37 +01:00
Alexey Bataev	58a94b1d0a	[SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type. Need to look through the SExt/ZExt scalars to be gathered, when trying to reduce their width after minbitwidth analysis to prevent permanent attempts to revectorize such gathered instructions.	2024-05-09 04:19:43 -07:00
Arthur Eubanks	2fb3774321	Revert "[SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type." This reverts commit 2475efa91d8b4fa8f1a2d16052cb6d14be7d5dc6. Causes crashes, see comments on `2475efa91d`.	2024-05-08 23:01:47 +00:00
Alexey Bataev	2475efa91d	[SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type. Need to look through the SExt/ZExt scalars to be gathered, when trying to reduce their width after minbitwidth analysis to prevent permanent attempts to revectorize such gathered instructions.	2024-05-08 07:25:19 -07:00
Ramkumar Ramachandra	57b9c15227	VectorCombine: fix logical error after m_Trunc match (#91201) The matcher m_Trunc() matches an Operator with a given Opcode, which could either be an Instruction or ConstExpr. VectorCombine::foldTruncFromReductions() incorrectly assumes that the pattern matched is always an Instruction, and attempts a cast. Fix this. Fixes #88796.	2024-05-08 09:47:55 +01:00
Florian Hahn	082c81ae4a	[LV] Properly extend versioned constant strides. We only version unknown strides to 1. If the original type is i1, then the sign of the extension matters. Properly extend the stride value before replacing it. Fixes https://github.com/llvm/llvm-project/issues/91369.	2024-05-07 21:31:42 +01:00
Alexey Bataev	f00f294130	[SLP]Fix PR91309: Do not consider SExt as always producing signed result. Still need to do the full analysis of the signedness of the values rather than rely on Instruction opcode, if the opcode is SExt. Still may produce unsigned result.	2024-05-07 08:57:52 -07:00
Alexey Bataev	c144157f3d	[SLP]Use last pointer instead of first for reversed strided stores. Need to use the last address of the vectorized stores for the strided stores, not the first one, to correctly store the data.	2024-05-06 10:16:28 -07:00
Alexey Bataev	a476032101	[SLP]Fix PR91025: correctly handle smin/smax of signed operands. Need to check that the signed operand has an extra sign bit to be sure that we do not skip signedness, when trying to minimize bitwidth for smin/smax intrinsics.	2024-05-06 08:10:20 -07:00
David Green	d145f40963	[VectorCombine] shuffleToIdentity - guard against call instructions. The shuffleToIdentity fold needs to be a bit more careful about the difference between call instructions and intrinsics. The second can be handled, but the first should result in bailing out. This patch also adds some extra intrinsic tests from #91000. Fixes #91078	2024-05-05 10:47:11 +01:00
Florian Hahn	b54a78d69b	[LV,LAA] Don't vectorize loops with load and store to invar address. Code checking stores to invariant addresses and reductions made an incorrect assumption that the case of both a load & store to the same invariant address does not need to be handled. In some cases when vectorizing with runtime checks, there may be dependences with a load and store to the same address, storing a reduction value. Update LAA to separately track if there was a store-store and a load-store dependence with an invariant address. Bail out early if there was a load-store dependence with an invariant address. If there was a store-store one, still apply the logic checking if they all store a reduction.	2024-05-04 20:53:54 +01:00
Alexey Bataev	c7910ee1f0	[SLP][NFC]Use std::optional::value_or.	2024-05-04 11:47:41 -07:00
Alexey Bataev	03972261a9	[SLP]Fix PR90892: do a correct sign analysis of the entries elements in gather shuffles. Need to do extra analysis of the scalar elements of the tree entry to be shuffled instead of the vectorized value to correctly deduce signedness info.	2024-05-03 14:01:25 -07:00
David Green	a4d10266d2	[VectorCombine] Add foldShuffleToIdentity (#88693) This patch adds a basic version of a combine that attempts to remove shuffles that when combined simplify away to an identity shuffle. For example: %ab = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0> %at = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4> %abt = fneg <4 x half> %at %abb = fneg <4 x half> %ab %r = shufflevector <4 x half> %abt, <4 x half> %abb, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0> By looking through the shuffles and fneg, it can be simplified to: %r = fneg <8 x half> %a The code tracks each lane starting from the original shuffle, keeping track of a vector of {src, idx}. As we propagate up through the instructions we will either look through intermediate instructions (binops and unops) or see a collection of lanes that all have the same src and incrementing idx (an identity). We can also see a single value with identical lanes, which we can treat like a splat. Only the basic version is added here, handling identities, splats, binops and unops. In follow-up patches other instructions can be added such as constants, intrinsics, cmp/sel and zext/sext/trunc.	2024-05-03 19:14:38 +01:00
Florian Hahn	40cc96e7ec	[VPlan] Remove unused VPWidenCanonicalIVRecipe::getScalarType (NFCI). After a48ebb8276408fa88cf7060ddc68f4eda1b62def, the function is no longer used. Remove it.	2024-05-03 15:49:20 +01:00
Alexey Bataev	6517c5b068	[LV][NFC]Address last comments from https://github.com/llvm/llvm-project/pull/88025 .	2024-05-03 06:51:01 -07:00
Florian Hahn	bccb7ed8ac	Reapply "[LV] Improve AnyOf reduction codegen. (#78304)" This reverts the revert commit c6e01627acf859. This patch includes a fix for any-of reductions and epilogue vectorization. Extra test coverage for the issue that caused the revert has been added in bce3bfced5fe0b019 and an assertion has been added in c7209cbb8be7a3c65813. -------------------------------- Original commit message: Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block. The patch incorporates feedback from https://reviews.llvm.org/D153697. This fixes #62565, as now there aren't multiple uses of the start/new values. Fixes https://github.com/llvm/llvm-project/issues/62565 PR: https://github.com/llvm/llvm-project/pull/78304	2024-05-03 14:40:49 +01:00
Florian Hahn	a48ebb8276	[VPlan] Check type directly in ::isCanonical (NFC). Directly check the type of the wide induction matches the canonical induction. Refactor suggested in and in preparation for https://github.com/llvm/llvm-project/pull/89603	2024-05-03 13:12:33 +01:00
Alexey Bataev	1d43cdc9f5	[LV][EVL]Support reversed loads/stores. Support for predicated vector reverse intrinsic was added some time ago. Adds support for predicated reversed loads/stores in the loop vectorizer. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/88025	2024-05-03 07:28:56 -04:00
Florian Hahn	c7209cbb8b	[LV] Assert that there's a resume phi for epilogue loops (NFC). This patch adds an assert to createAndCollectMergePhiForReduction to make sure there is a resume phi when vectorizing the epilogue loop. This is needed to set the resume value from the main vector loop. This assertion guards against the issue that caused the revert of https://github.com/llvm/llvm-project/pull/78304.	2024-05-02 19:20:28 +01:00
Alexey Bataev	5e67c41a93	[SLP]Fix PR90780: insert cast instruction for PHI nodes after all phi nodes. Need to check if the vectorized value is a PHINode before insert casting instruction and insert it after all phis to generate the code correctly.	2024-05-02 06:30:14 -07:00
Alexey Bataev	fc382db239	[SLP]Improve comparison of shuffled loads/masked gathers by adding GEP cost. In some cases masked gather is less profitable than insert-subvector of consecutive/strided stores. SLP has this kind of analysis, but need to improve it by adding the cost of the GEP analysis. Also, the GEP cost estimation for masked gather is fixed. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/90737	2024-05-01 15:53:25 -04:00
Alexey Bataev	59ef94d7cf	[SLP]Do not include the cost of and -1, <v> and emit just <v> after MinBitWidth. After minbitwidth analysis, and <v>, (power_of_2 - 1 const) can be transformed into just an <v>, (all_ones const), which can be ignored at the cost estimation and at the codegen. x264 benchmark has this pattern. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/90739	2024-05-01 15:52:23 -04:00
Florian Hahn	e846778e52	[VPlan] Make CallInst optional for VPWidenCallRecipe (NFCI). Replace relying on the underlying CallInst for looking up the called function and its types by instead adding the called function as operand, in line with how called functions are handled in CallInst. Operand bundles, metadata and fast-math flags are optionally used if there's an underlying CallInst. This enables creating VPWidenCallRecipes without requiring an underlying IR instruction.	2024-05-01 20:48:22 +01:00
Alexey Bataev	576261ac8f	[SLP]Improve reordering for consts, splats and ops from same nodes + improved analysis. Improved detection of const/splat candidates, their matching and analysis of instructions from same nodes. Metric: size..text Program size..text results results0 diff results results0 diff test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 92952.00 93096.00 0.2% test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 779832.00 780136.00 0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 839923.00 840179.00 0.0% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 392708.00 392740.00 0.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1171131.00 1171147.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1391089.00 1391073.00 -0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1391089.00 1391073.00 -0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12352780.00 12352636.00 -0.0% MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE - small reordering External/SPEC/CINT2006/464.h264ref/464.h264ref - small better code after reordering MultiSource/Applications/JM/lencod/lencod - smaller code with less shuffles MultiSource/Applications/JM/ldecod/ldecod - same External/SPEC/CFP2017rate/511.povray_r/511.povray_r - 2 extra loads vectorized, smaller code External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - better code, size increased because of more constant vectors. External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same External/SPEC/CFP2017rate/526.blender_r/526.blender_r - small change in the vectorized code, some code a bit better, some a bit worse. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87091	2024-05-01 07:34:06 -04:00
Alexey Bataev	67e726a2f7	[SLP]Transform stores + reverse to strided stores with stride -1, if profitable. Adds transformation of consecutive vector store + reverse to strided stores with stride -1, if it is profitable Reviewers: RKSimon, preames Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/90464	2024-05-01 07:32:33 -04:00
Alexey Bataev	51aac5b043	[SLP][NFCI]Improve compile time for phis with large number of incoming values. Added a limit of 128 incoming values at max for PHIs nodes to be vectorized plus improved performance by using logarithmic search instead of linear if the number of incoming values is > 4.	2024-04-30 14:42:49 -07:00
Florian Hahn	9c3f5fe88f	[LV] Don't consider the latch block as ScalarPredicatedBB. The conditional branch from the loop latch will be replaced by a single branch controlling the loop, so there is no extra overhead from scalarization. This improves the cost estimates in some cases.	2024-04-29 19:15:46 +01:00
Alexey Bataev	37ae4ad0ee	[SLP]Support minbitwidth analysis for buildvector nodes. Metric: size..text Program size..text exp ref diff test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 42906.00 42986.00 0.2% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 42909.00 42989.00 0.2% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664581.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664581.00 664661.00 0.0% Less is better. Replaces `buildvector <p x in> + trunc <p x in> to <p x im>` sequences with `buildvector <p x im> of { trunc in to im }` scalars, which is free in most cases and results in better code. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88504	2024-04-29 09:57:37 -04:00
Alexey Bataev	040b5a1255	[SLP]Fix PR90211: vectorized node must match completely to be reused. If the gather node matches the vectorized node, it must also match with the scalars completely. Otherwise, need to revectorize the gather node to generate correct code.	2024-04-29 06:51:11 -07:00
Maciej Gabka	bfc0317153	Move several vector intrinsics out of experimental namespace (#88748) This patch moves the following intrinsics: * vector.interleave2/deinterleave2 * vector.reverse * vector.splice out of the experimental namespace. All these intrinsics have existed in LLVM for more than a year now, and are widely used, so should not be considered experimental.	2024-04-29 10:16:45 +01:00
Florian Hahn	aafed3408e	[VPlan] Make createScalarIVSteps return VPScalarIVStepsRecipe (NFC). This avoids the need for using getVPSingleValue/getDefiningRecipe at the place the return value is used.	2024-04-28 21:56:55 +01:00
Florian Hahn	b6a8f5486b	[LV] Consider all exit branch conditions uniform. If we vectorize a loop with multiple exits, all exiting branches should be considered uniform, as the resulting loop will be controlled by the canonical IV only. Previously we were overestimating the cost of values contributing to the other exits.	2024-04-28 13:15:55 +01:00
Florian Hahn	9ee8e38cdc	[VPlan] Also propagate versioned strides to users via sext/zext. The versioned value may not be used in the loop directly but through a sext/zext. Add new live-ins in those cases.	2024-04-26 21:29:43 +01:00
Alexey Bataev	79314c64d0	[SLP]Fix PR90224: check that users of gep are all vectorized. Before deleting the extractelement instruction for a vectorized GEP with external users, need to check that all users are vectorized.	2024-04-26 11:49:12 -07:00
Alexey Bataev	d74e42acd2	[SLP]Attempt to vectorize long stores, if short one failed. We can try to vectorize long store sequences, if short ones were unsuccessful because of the non-profitable vectorization. It should not increase compile time significantly (stores are sorted already, complexity is n x log n), but vectorize extra code. Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1088012.00 1088236.00 0.0% test-suite :: SingleSource/UnitTests/matrix-types-spec.test 480396.00 480476.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2041105.00 2040961.00 -0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 836563.00 836387.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035100.00 1032140.00 -0.3% In all benchmarks extra code gets vectorized Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88563	2024-04-26 06:53:44 -07:00
Troy Butler	468fecfc39	Fix mismatches between function parameter definitions and declarations (#89512) Addresses issue #88716. Some function parameter names in the affected header files did not match the parameter names in the definitions, or were listed in a different order. --------- Signed-off-by: Troy-Butler <squintik@outlook.com>	2024-04-26 13:00:31 +02:00
Alexey Bataev	f758bb66e8	[SLP]Fix PR89988: do extra analysis of the icmp args to correctly handle signed/unsigned comparison. If operands of icmp have different signedness, need to consider extending unsigned operands to correctly handle comparison with the signed operands.	2024-04-25 16:10:24 -07:00
Simon Pilgrim	282b56f43d	[VectorCombine] foldShuffleOfBinops - add support for length changing shuffles (#88899) Refactor to be closer to foldShuffleOfCastops - sibling patch to #88743 that can be used to address some of the issues identified in #88693	2024-04-24 10:18:49 +01:00
Patrick O'Neill	adb0126ef1	[VPlan] Add scalar inferencing support for Not and Or insns (#89160) Fixes #87394. PR: https://github.com/llvm/llvm-project/pull/89160	2024-04-23 15:48:43 +01:00
Alexey Bataev	b4a0fd40f1	[SLP]Fix PR89635: do not try to vectorize single-gather alternate node. No need to try to vectorize single gather/buildvector with alternate opcode graph, it is not profitable. In other cases, need to use last instruction for inserting the vectorized code.	2024-04-23 06:45:43 -07:00
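
The lane tracking described in the foldShuffleToIdentity commit (a4d10266d2) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not LLVM's implementation: the classes `Arg`, `Shuffle` and `UnOp` and the function `fold_to_identity` are hypothetical names invented here, masks are assumed to hold only non-negative indices, and only unops and identities are handled (splats and binops from the commit are left out).

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Arg:
    """A vector function argument with a fixed number of lanes (hypothetical model)."""
    name: str
    width: int


@dataclass(frozen=True)
class Shuffle:
    """A shufflevector; rhs=None models a poison second operand."""
    lhs: "Value"
    rhs: Optional["Value"]
    mask: Tuple[int, ...]


@dataclass(frozen=True)
class UnOp:
    """A lane-wise unary operation such as fneg."""
    opcode: str
    operand: "Value"


Value = Union[Arg, Shuffle, UnOp]


def width(v: Value) -> int:
    if isinstance(v, Arg):
        return v.width
    if isinstance(v, Shuffle):
        return len(v.mask)
    return width(v.operand)


def look_through_shuffles(v: Value, idx: int):
    """Follow lane idx of v through shuffles to a non-shuffle {src, idx} pair."""
    while isinstance(v, Shuffle):
        m = v.mask[idx]
        n = width(v.lhs)
        if m < n:
            v, idx = v.lhs, m
        elif v.rhs is None:
            return None  # lane comes from poison; give up in this sketch
        else:
            v, idx = v.rhs, m - n
    return v, idx


def fold_to_identity(root: Value) -> Optional[Value]:
    """Return a shuffle-free equivalent of root, or None if none is found."""
    lanes = [look_through_shuffles(root, i) for i in range(width(root))]
    opcodes = []
    while True:
        if any(lane is None for lane in lanes):
            return None
        src0 = lanes[0][0]
        same_src = all(src == src0 for src, _ in lanes)
        in_order = all(idx == i for i, (_, idx) in enumerate(lanes))
        if same_src and in_order and width(src0) == len(lanes):
            # Identity reached: re-apply the unops that were looked through.
            result = src0
            for opcode in reversed(opcodes):
                result = UnOp(opcode, result)
            return result
        if all(isinstance(src, UnOp) and src.opcode == getattr(src0, "opcode", None)
               for src, _ in lanes):
            # Every lane comes from the same kind of unop: look through it.
            opcodes.append(src0.opcode)
            lanes = [look_through_shuffles(src.operand, idx) for src, idx in lanes]
            continue
        return None


# The example from the commit message, expressed in the toy model.
a = Arg("a", 8)
ab = Shuffle(a, None, (3, 2, 1, 0))
at = Shuffle(a, None, (7, 6, 5, 4))
r = Shuffle(UnOp("fneg", at), UnOp("fneg", ab), (7, 6, 5, 4, 3, 2, 1, 0))
print(fold_to_identity(r))
```

In this model the eight lanes of `r` first resolve to the two `fneg` values, and after looking through `fneg` they resolve to lanes 0..7 of `a` in order, so the result is a single `fneg` of `a`, matching the simplification given in the commit message.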


4494 Commits