llvm-project

Author	SHA1	Message	Date
Florian Hahn	617398e5e2	[SLP] Collect candidate VFs in vector in vectorizeStores (NFC). (#82793) This is in preparation for https://github.com/llvm/llvm-project/pull/77790 and makes it easy to add other, non-power-of-2 VFs for processing. PR: https://github.com/llvm/llvm-project/pull/82793	2024-03-01 20:13:49 +00:00
Florian Hahn	6fe60bd89f	[SLP] Exit early if MaxVF < MinVF (NFCI). (#83283) Exit early if MaxVF < MinVF. In that case, the loop body below will never get entered. Note that this adjusts the condition from MaxVF <= MinVF. If MaxVF == MinVF, vectorization may still be feasible (and the loop below gets entered). PR: https://github.com/llvm/llvm-project/pull/83283	2024-03-01 19:43:06 +00:00
Alexey Bataev	2ab6d1e18e	[SLP][NFC]Move some checks to the outer if to simplify inner checks.	2024-03-01 10:58:41 -08:00
Alexey Bataev	3a30d8e9e5	[SLP]Check if masked gather can be emitted as a series of loads/insert subvector. Masked gather is a very expensive operation, and it is sometimes better to represent it as a series of consecutive/strided loads + insertsubvector sequences. The patch adds some basic estimation and, if loads+insertsubvector is cheaper, decides to represent it this way rather than as a masked gather. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83481	2024-03-01 13:49:29 -05:00
Alexey Bataev	df0fd3a80e	[SLP]Try to vectorize small graph with extractelements, used in buildvector. If the graph includes only a single "gather" node with only extractelements/undefs, which are used only in insertelement-based buildvector sequences, it still might be profitable to vectorize it. Need to rely on the cost model, not throw this graph away immediately. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83581	2024-03-01 12:48:45 -05:00
Benjamin Kramer	3fc277f665	[SLPVectorizer] Make the insert/extractvector PHICompare a strict-weak ordering (#83571) This was tripping up STL implementations that check for it (like libc++ with debug checking). The goal of this sort is to cluster operations on the same values, so preserve that property but sort everything else based on the existing numbering.	2024-03-01 15:37:54 +01:00
Alexey Bataev	5bafb8d952	[SLP][NFC]Add/use single UsesLimit constant, NFC.	2024-03-01 06:37:08 -08:00
Florian Hahn	6ecd26132b	[SLP] Use ScopeExit to update Operands/PrevDist on all paths. (NFC) (#83490) Use ScopeExit to make sure Operands/PrevDist are updated on all paths in the loop. This makes it easier to ensure they are updated correctly if new early continues are added. Split off from https://github.com/llvm/llvm-project/pull/83283 PR: https://github.com/llvm/llvm-project/pull/83490	2024-03-01 14:30:01 +00:00
Alexey Bataev	f28c4b4bac	[SLP]Fix/improve potential masked gather loads analysis. When doing the analysis for the (potential) masked gather node, we check that no more than half of the pointer operands are loop invariants or potentially vectorizable. We actually need to check first that there is a loop, and do a better check for the potentially vectorizable pointers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83472	2024-03-01 07:38:18 -05:00
Alexey Bataev	2d98d763a8	[SLP]Fix the cost model for extracts combined with later shuffle. If the buildvector node contains an extract that should later be combined with some other nodes by shuffling, we need to estimate the cost of this shuffle before building the mask after the shuffle. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83442	2024-03-01 07:12:07 -05:00
Alexey Bataev	45d82f33af	[SLP]Fix miscompilation, caused by incorrect final node reordering. Need to use the regular reordering from the correct node for the final store/insertelement node to avoid miscompilation.	2024-02-29 11:21:06 -08:00
Florian Hahn	4d525f2b9a	[VPlan] Remove unneeded InsertPointGuard (NFCI). getBlockInMask now simply returns an already computed mask, hence there's no need to adjust the builder insert point.	2024-02-29 12:37:13 +00:00
Nilanjana Basu	1c211bc76e	[LV] Remove unused configuration option (#82955) A recent set of changes (PR #67725) in the loop interleaving algorithm removed the loop trip count threshold for allowing interleaving. Therefore, the configuration option interleave-small-loop-scalar-reduction is no longer needed.	2024-02-28 10:17:25 -08:00
Florian Hahn	3fac0562f8	[VPlan] Reset trip count when replacing ExpandSCEV recipe. Otherwise accessing the trip count may access freed memory. Fixes https://lab.llvm.org/buildbot/#/builders/74/builds/26239 and others.	2024-02-28 16:31:49 +00:00
Alexey Bataev	80cff27390	[LV][NFC]Fix a misprint, NFC.	2024-02-28 07:56:31 -08:00
Alexey Bataev	fdd60c7b96	[LV][NFC]Preselect folding style while choosing the max VF, NFC. Selects the tail-folding style while choosing the max vector factor and stores it in the data member rather than calculating it on each getTailFoldingStyle call. Part of https://github.com/llvm/llvm-project/pull/76172 Reviewers: ayalz, fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/81885	2024-02-28 10:15:52 -05:00
Alexey Bataev	c89d51112d	[SLP]Use It->second.first for BWSz, NFC.	2024-02-28 06:38:41 -08:00
Florian Hahn	15d9d0fa8f	[VPlan] Also print final VPlan directly before codegen/execute. (#82269) Some optimizations are applied after UF and VF have been chosen. This patch adds an extra print of the final VPlan just before codegen/execution. In the future, there will be additional transforms that are applied later (interleaving for example). PR: https://github.com/llvm/llvm-project/pull/82269	2024-02-28 13:19:43 +00:00
Florian Hahn	911055e34f	[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271) At the moment, some VPInstructions create only a single scalar value, but use VPTransformState's 'vector' storage for this value. Those values are effectively uniform-per-VF (or in some cases uniform-across-VF-and-UF). Using the vector/per-part storage doesn't interact well with other recipes, which more accurately use (Part, Lane) to look up scalar values, and prevents VPInstructions creating scalars from interacting with other recipes working with scalars. This PR tries to unify handling of scalars by using (Part, 0) for scalar values where only the first lane is demanded. This allows using VPInstructions with other recipes like VPScalarCastRecipe and is also needed when using VPInstructions in more cases outside the vector loop region to generate scalars. Depends on https://github.com/llvm/llvm-project/pull/80269	2024-02-26 19:06:43 +00:00
Florian Hahn	85da9f80b8	[VPlan] Remove unused VPTransformState::VPValue2Value (NFCI). Clean up unused member variable.	2024-02-25 12:14:44 +00:00
Florian Hahn	0b01320d28	[VPlan] Remove unused VPTransformState::CanonicalIV (NFCI). Clean up unused member variable.	2024-02-23 16:54:30 +00:00
Alexey Bataev	32994cc0d6	[SLP]Improve findReusedOrderedScalars and graph rotation. The patch syncs the code in findReusedOrderedScalars with cost estimation/codegen. It tries to use similar logic to better determine the best order. Before, it just tried to find a previously vectorized node without checking whether the vectorized value could be used in the shuffle. Now it relies on the more generalized version. If it determines that a single vector must be reordered (using the same mechanism as codegen and cost estimation), it generates a better order. The comparison between new/ref ordering, metric SLP.NumVectorInstructions (program: results, results0, diff): test-suite :: MultiSource/Benchmarks/nbench/nbench.test: 139.00, 140.00, 0.7%; test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test: 344.00, 346.00, 0.6%; test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test: 1293.00, 1292.00, -0.1%; test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test: 5176.00, 5170.00, -0.1%; test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test: 5173.00, 5167.00, -0.1%; test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test: 11692.00, 11660.00, -0.3%; test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test: 1621.00, 1615.00, -0.4%; test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test: 795.00, 792.00, -0.4%; test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test: 26499.00, 26338.00, -0.6%; test-suite :: MultiSource/Benchmarks/Bullet/bullet.test: 7343.00, 7281.00, -0.8%; test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test: 1104.00, 1094.00, -0.9%; test-suite :: MultiSource/Applications/JM/lencod/lencod.test: 2216.00, 2180.00, -1.6%; test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test: 787.00, 637.00, -19.1%. Less than 0% is better. Most of the benchmarks see more vectorized code. The first ones just have shuffles removed. The ordering analysis may still require some improvements (e.g. for alternate nodes), but this one should produce better results. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/77529	2024-02-22 14:32:15 -05:00
Florian Hahn	cd160a6e98	[VPlan] Do not add call results with void type to State (NFC). With vector libraries, we may vectorize calls with void return types. Do not add those values to the state; they can never be accessed.	2024-02-21 20:36:17 +00:00
Florian Hahn	3d66d6932e	[VPlan] Support live-ins without underlying IR in type analysis. (#80723) A VPlan contains multiple live-ins without underlying IR, like VFxUF or VectorTripCount. Trying to infer the scalar type of those causes a crash at the moment. Update VPTypeAnalysis to take a VPlan in its constructor and assign types to those live-ins up front. All those live-ins share the type of the canonical IV. PR: https://github.com/llvm/llvm-project/pull/80723	2024-02-21 19:37:15 +00:00
Florian Hahn	9923d29cfa	[VPlan] Merge main VPlan verifier with HCFG verifier. Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for various properties on blocks to the verifier used for VPlans generated by the inner loop vectorizer. It also adds def-use checks for the verifier used in the VPlan native path. This drops the separate flag to enable HCFG verification. Instead, all VPlans are verified once they have been created, if assertions are enabled. This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to model any phi node in the native path.	2024-02-20 16:43:57 +00:00
Florian Hahn	44b17679e3	[VPlan] Remove stale comment from VPTransformState::get (NFC) All values accessed via get are now part of VPTransformState, the ILV reference in the comment has been removed a long time ago. Remove the stale comment.	2024-02-19 22:11:51 +00:00
Alexey Bataev	35f45926eb	[SLP][NFC]Add asserts for undef handling in PHIComparator, NFC.	2024-02-19 12:57:56 -08:00
Simon Pilgrim	769c22f25b	[VectorCombine] Fold reduce(trunc(x)) -> trunc(reduce(x)) iff cost effective (#81852) Vector truncations can be pretty expensive, especially on X86, whilst scalar truncations are often free. If the cost of performing the add/mul/and/or/xor reduction is cheap enough on the pre-truncated type, then avoid the vector truncation entirely. Fixes https://github.com/llvm/llvm-project/issues/81469	2024-02-19 11:32:23 +00:00
Florian Hahn	536d78c213	[VPlan] Remove VPInstruction::setUnderlyingInstr (NFCI). VPInstruction doesn't rely on the underlying instruction any longer for codegen, remove the unneeded setUnderlyingInstr.	2024-02-18 18:50:01 +00:00
Florian Hahn	35ee6de966	[VPlan] Simplify addCanonicalIVRecipes by using VPBuilder (NFC). Use VPBuilder to construct VPInstructions, which means there's no need to manually insert recipes.	2024-02-17 18:30:01 +00:00
Florian Hahn	20177c45db	[VPlan] Turn private members of VPlanTransforms to static funcs (NFC) Private members of VPlanTransforms are only used inside VPlanTransforms.cpp, just make them static.	2024-02-17 13:45:23 +00:00
Florian Hahn	0dacba3ad1	[VPlan] Handle truncating ICMPs in truncateToMinimalBWs. Update truncateToMinimalBitwidths to handle truncating ICMPs. For ICMPs, the new target type will be the same as the original type. In that case, only truncate the operands, but skip the extend. This is in line with what the original truncateToMinimalBitwidths did for compares. Fixes https://github.com/llvm/llvm-project/issues/81415.	2024-02-16 12:58:56 +00:00
David Sherwood	1c10821022	[LoopVectorize] Fix divide-by-zero bug (#80836) (#81721) When attempting to use the estimated trip count to refine the costs of the runtime memory checks, we should also check for sane trip counts to prevent divide-by-zero faults on some platforms. Fixes #80836	2024-02-14 16:07:51 +00:00
Florian Hahn	debca7ee43	[VPlan] Move dropping of poison flags to VPlanTransforms. (NFC) Move collectPoisonGeneratingFlags from InnerLoopVectorizer to VPlanTransforms and also update its name. collectPoisonGeneratingFlags already directly drops poison-generating flags rather than only collecting them. This means it is more appropriate to integrate it directly into the VPlan transform pipeline. The current implementation still calls back to legal to check if a block needs predication, which should be improved in the future.	2024-02-14 12:28:58 +00:00
Florian Hahn	ca56966684	[VPlan] Properly retain flags when cloning VPReplicateRecipe. This makes sure the correct flags are used for the clone (i.e. the ones present on the recipe), instead of the ones on the original IR instruction. At the moment, this should not change anything, as flags of replicate recipes should not be dropped before they are cloned. But that will change in a follow-up patch.	2024-02-14 11:11:46 +00:00
Alexey Bataev	b04dd5d187	[SLP]Fix PR81403: compiler crash because of wrongly resized vector value. The mask for the reshuffling/resizing might be calculated incorrectly, fixed.	2024-02-12 10:27:25 -08:00
Alexey Bataev	833a1cadeb	[SLP]Add support for strided loads. Added basic support for strided loads in SLP vectorizer. Supports constant strides only. If the strided load must be reversed, applies -stride to avoid extra reverse shuffle. Reviewers: preames, lukel97 Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/80310	2024-02-12 09:43:54 -08:00
Alexey Bataev	6a3a5cad2e	Revert "[SLP]Add support for strided loads." This reverts commit 0940f9083e68bda78bcbb323c2968a4294092e21 to fix issues reported in https://github.com/llvm/llvm-project/pull/80310.	2024-02-12 08:47:28 -08:00
Alexey Bataev	0940f9083e	[SLP]Add support for strided loads. Added basic support for strided loads in SLP vectorizer. Supports constant strides only. If the strided load must be reversed, applies -stride to avoid extra reverse shuffle. Reviewers: preames, lukel97 Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/80310	2024-02-12 07:41:42 -05:00
Alexey Bataev	df856e4977	[SLP]Add GEP cost estimation for gathered loads. When estimating the vectorization of gathered loads, we need to estimate the cost of the pointer (vectorization), as is done for the actual vectorized loads. Otherwise, we may be too optimistic about the cost of the gathered loads. Reviewers: preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/80867	2024-02-07 07:30:41 -05:00
Alexey Bataev	299e5fef9d	[SLP][NFC]Simplify/unify vectors for scattered/vectorized loads from gathers, NFC.	2024-02-06 08:18:11 -08:00
Alexey Bataev	36e8db7d8c	[SLP][NFC]Extract main part of GetGEPCostDiff to a function, NFC.	2024-02-06 08:05:42 -08:00
Nilanjana Basu	c1c5b854ad	[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725) A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. The loop interleaving count computation was updated accordingly in prior patches, while this patch removes the loop trip count threshold for interleaving.	2024-02-05 17:23:58 -08:00
Florian Hahn	8cb2de7fae	[VPlan] Implement type inference for ICmp. This fixes a crash in the attached test case due to missing type inference for ICmp VPInstructions.	2024-02-05 15:42:07 +00:00
Florian Hahn	47abbf4fe9	[VPlan] Update VPInst::onlyFirstLaneUsed to check users. (#80269) A VPInstruction only has its first lane used if all users use its first lane only. Use vputils::onlyFirstLaneUsed to continue checking the recipe's users to handle more cases. Besides allowing additional introduction of scalar steps when interleaving in some cases, this also enables using an Add VPInstruction to model the increment, as a follow-up.	2024-02-03 16:19:10 +00:00
Florian Hahn	3444240540	[VPlan] Mark vputils::onlyFirstPartUsed arg as const (NFC) Split off https://github.com/llvm/llvm-project/pull/80269 as suggested.	2024-02-03 15:59:09 +00:00
Florian Hahn	6936479020	[VPlan] Mark vputils::onlyFirstLaneUsed arg as const (NFC) Split off https://github.com/llvm/llvm-project/pull/80269 as suggested.	2024-02-03 15:56:40 +00:00
Florian Hahn	2906f3626b	[VPlan] Update ::onlyScalarsGenerated to take IsScalable bool (NFCI). Instead of passing in a full VF, just pass IsScalable as bool.	2024-02-03 14:51:14 +00:00
Alexey Bataev	ef7f6aca14	[SLP][NFC]Add some extra checks/reorganize the code to improve compile time, NFC.	2024-02-01 10:53:39 -08:00
Alexey Bataev	15295d0135	[SLP][NFC]Introduce and use computeCommonAlignment function, NFC.	2024-02-01 06:13:39 -08:00
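
Note on 769c22f25b (reduce(trunc(x)) -> trunc(reduce(x))): the fold is value-preserving for add because truncation commutes with wrapping addition. The following is a minimal standalone sketch of that equivalence only, assuming a 32-bit to 8-bit truncation and an add reduction. The function names are hypothetical and are not LLVM APIs, and the cost check that gates the real transform is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models reduce(trunc(x)): truncate every lane to 8 bits, then add-reduce
// in the narrow type. Hypothetical helper, not part of LLVM.
uint8_t reduceOfTrunc(const std::vector<uint32_t> &Lanes) {
  uint8_t Sum = 0;
  for (uint32_t V : Lanes)
    Sum = static_cast<uint8_t>(Sum + static_cast<uint8_t>(V));
  return Sum;
}

// Models trunc(reduce(x)): add-reduce in the wide type, then truncate the
// single scalar result. Hypothetical helper, not part of LLVM.
uint8_t truncOfReduce(const std::vector<uint32_t> &Lanes) {
  uint32_t Sum = 0;
  for (uint32_t V : Lanes)
    Sum += V; // unsigned overflow wraps, which is well defined
  return static_cast<uint8_t>(Sum);
}

// Asserts that both forms agree for the given lanes.
void checkFold(const std::vector<uint32_t> &Lanes) {
  assert(reduceOfTrunc(Lanes) == truncOfReduce(Lanes));
}
```

Both functions compute the sum modulo 256, so they agree for any input. The second form needs one scalar truncation instead of one per lane, which is the saving the commit message describes.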

4261 Commits