llvm-project

Author	SHA1	Message	Date
Benjamin Kramer	3fc277f665	[SLPVectorizer] Make the insert/extractvector PHICompare a strict-weak ordering (#83571) This was tripping up STL implementations that check for it (like libc++ with debug checking). The goal of this sort is to cluster operations on the same values, so preserve that property but sort everything else based on the existing numbering.	2024-03-01 15:37:54 +01:00
Alexey Bataev	5bafb8d952	[SLP][NFC]Add/use single UsesLimit constant, NFC.	2024-03-01 06:37:08 -08:00
Florian Hahn	6ecd26132b	[SLP] Use ScopeExit to update Operands/PrevDist on all paths. (NFC) (#83490 ) Use ScopeExit to make sure Operands/PrevDist are updated on all paths in the loop. This makes it easier to ensure they are updated correctly if new early continues are added. Split off from https://github.com/llvm/llvm-project/pull/83283 PR: https://github.com/llvm/llvm-project/pull/83490	2024-03-01 14:30:01 +00:00
Alexey Bataev	f28c4b4bac	[SLP]Fix/improve potential masked gather loads analysis. When analyzing the (potential) masked gather node, we check that no more than half of the pointer operands are loop invariants or potentially vectorizable. We actually need to check first that we have a loop, and do a better check for the potentially vectorizable pointers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83472	2024-03-01 07:38:18 -05:00
Alexey Bataev	2d98d763a8	[SLP]Fix the cost model for extracts combined with later shuffle. If the buildvector node contains extract, which later should be combined with some other nodes by shuffling, need to estimate the cost of this shuffle before building the mask after shuffle. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83442	2024-03-01 07:12:07 -05:00
Alexey Bataev	45d82f33af	[SLP]Fix miscompilation, caused by incorrect final node reordering. Need to use the regular reordering from the correct node for the final store/insertelement node to avoid miscompilation.	2024-02-29 11:21:06 -08:00
Alexey Bataev	c89d51112d	[SLP]Use It->second.first for BWSz, NFC.	2024-02-28 06:38:41 -08:00
Alexey Bataev	32994cc0d6	[SLP]Improve findReusedOrderedScalars and graph rotation. Patch syncs the code in findReusedOrderedScalars with cost estimation/codegen. It tries to use similar logic to better determine best order. Before, it just tried to find previously vectorized node without checking if it is possible to use the vectorized value in the shuffle. Now it relies on the more generalized version. If it determines, that a single vector must be reordered (using same mechanism, as codegen and cost estimation), it generates better order. The comparison between new/ref ordering: Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: MultiSource/Benchmarks/nbench/nbench.test 139.00 140.00 0.7% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 344.00 346.00 0.6% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 1293.00 1292.00 -0.1% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 5176.00 5170.00 -0.1% test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 5173.00 5167.00 -0.1% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11692.00 11660.00 -0.3% test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 1621.00 1615.00 -0.4% test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test 795.00 792.00 -0.4% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26499.00 26338.00 -0.6% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 7343.00 7281.00 -0.8% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1104.00 1094.00 -0.9% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 2216.00 2180.00 -1.6% test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 787.00 637.00 -19.1% Less 0% is better. Most of the benchmarks see more vectorized code. The first ones just have shuffles removed. The ordering analysis still may require some improvements (e.g. for alternate nodes), but this one should produce better results. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/77529	2024-02-22 14:32:15 -05:00
Alexey Bataev	35f45926eb	[SLP][NFC]Add asserts for undef handling in PHIComparator, NFC.	2024-02-19 12:57:56 -08:00
Alexey Bataev	b04dd5d187	[SLP]Fix PR81403: compiler crash because of wrongly resized vector value. The mask for the reshuffling/resizing might be calculated incorrectly, fixed.	2024-02-12 10:27:25 -08:00
Alexey Bataev	833a1cadeb	[SLP]Add support for strided loads. Added basic support for strided loads in SLP vectorizer. Supports constant strides only. If the strided load must be reversed, applies -stride to avoid extra reverse shuffle. Reviewers: preames, lukel97 Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/80310	2024-02-12 09:43:54 -08:00
Alexey Bataev	6a3a5cad2e	Revert "[SLP]Add support for strided loads." This reverts commit 0940f9083e68bda78bcbb323c2968a4294092e21 to fix issues reported in https://github.com/llvm/llvm-project/pull/80310.	2024-02-12 08:47:28 -08:00
Alexey Bataev	0940f9083e	[SLP]Add support for strided loads. Added basic support for strided loads in SLP vectorizer. Supports constant strides only. If the strided load must be reversed, applies -stride to avoid extra reverse shuffle. Reviewers: preames, lukel97 Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/80310	2024-02-12 07:41:42 -05:00
Alexey Bataev	df856e4977	[SLP]Add GEP cost estimation for gathered loads. When doing estimation for vectorization of gathered loads, need to estimate the cost of the pointer (vectorization), as it is done for the actual vectorized loads. Otherwise may be too optimistic about the cost of the gathered loads. Reviewers: preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/80867	2024-02-07 07:30:41 -05:00
Alexey Bataev	299e5fef9d	[SLP][NFC]Simplify/unify vectors for scattered/vectorized loads from gathers, NFC.	2024-02-06 08:18:11 -08:00
Alexey Bataev	36e8db7d8c	[SLP][NFC]Extract main part of GetGEPCostDiff to a function, NFC.	2024-02-06 08:05:42 -08:00
Alexey Bataev	ef7f6aca14	[SLP][NFC]Add some extra checks/reorganize the code to improve compile time, NFC.	2024-02-01 10:53:39 -08:00
Alexey Bataev	15295d0135	[SLP][NFC]Introduce and use computeCommonAlignment function, NFC.	2024-02-01 06:13:39 -08:00
Alexey Bataev	285bc69846	[SLP]Fix PR80027: Fix costs processing for minbitwidth types. Need to switch the types, the destination is first in getCastInstrCost function.	2024-01-30 10:32:55 -08:00
Alexey Bataev	976374d982	[SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl&, NFC.	2024-01-30 06:21:47 -08:00
Alexey Bataev	8d89dd4a58	[SLP]Fix PR79743: Check that all users are demoted before trying to demote the tree entry. Need to check if all user nodes are marked for demotion before demoting the node. Otherwise, some data info might be lost after vectorization.	2024-01-29 10:51:20 -08:00
Alexey Bataev	92ae2ca12b	[SLP][NFC]Improve BottomTopTop reordering of orders for multi-iterations attempts, NFC. If several iterations of reordering of orders are required, need to use different algorithm.	2024-01-25 13:04:01 -08:00
Alexey Bataev	6fe21bc1da	[SLP]Fix PR79229: Do not erase extractelement, if it is used in multiregister node. If the node can span several registers and the same extractelement instruction is used in several parts, it may be required to keep such extractelement instruction to avoid compiler crash.	2024-01-25 06:20:53 -08:00
Alexey Bataev	36e4a7ecca	[SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict weak ordering. Try to make PHICompare meet strict weak ordering criteria.	2024-01-24 13:46:05 -08:00
Alexey Bataev	48bbd76587	[SLP]Fix PR79229: Check that extractelement is used only in a single node before erasing. Before trying to erase the extractelement instruction, not enough to check for single use, need to check that it is not used in several nodes because of the preliminary nodes reordering.	2024-01-24 11:22:22 -08:00
Alexey Bataev	ca654acc16	[SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict weak ordering. Compared NumUses to meet the requirements of the strict weak ordering.	2024-01-24 09:36:25 -08:00
Alexey Bataev	bb3e0d7fc3	[SLP]Fix PR79193: skip analysis of gather nodes for minbitwidth. No need in trying to analyze small graphs with gather node only to avoid crash.	2024-01-23 12:44:49 -08:00
Stephen Tozer	632f44e5ed	[RemoveDIs][DebugInfo] Handle DPVAssign in most transforms (#78986 ) This patch trivially updates various opt passes to handle DPVAssigns. In all cases, this means some combination of generifying existing code to handle DPValues and DbgAssignIntrinsics, iterating over DPValues where previously we did not, or duplicating code for DbgAssignIntrinsics to the equivalent DPValue function (in inlining and salvageDebugInfo).	2024-01-23 16:16:59 +00:00
Alexey Bataev	093206bb7e	[SLP]Fix PR78298: Assertion `GEP->getNumIndices() == 1 && !isa<Constant>(GEPIdx)' failed. The non-constant index might be folded to constant during earlier stages of vectorization. Need to consider this option and filter out GEPs with constant indices from the candidates list.	2024-01-16 09:17:35 -08:00
Alexey Bataev	d79fdb2749	[SLP]Fix PR78236: correctly track external values, replaced several times during reduction vectorization. If the external value was replaced in the vectorizer several times during reduction vectorization, need to find the original value to correctly handle external uses and emit extractelement instructions properly.	2024-01-16 06:52:43 -08:00
Alexey Bataev	6fdc2ce8c5	[SLP]Fix PR77916: transform the whole mask, not only the elements for the second vector. Need to transform all elements in the long mask, if we decided to produce shorter version, some elements may still have incorrect indices after transformation for the first vector in the permutation.	2024-01-12 07:07:43 -08:00
Alexey Bataev	39b2104b4a	[SLP]Fix a crash for reduced values with minbitwidth, which are reused. If the reduced values are additionally affected by minbitwidth analysis, need to cast them to a proper type before doing any math, if they are reused.	2024-01-12 04:49:48 -08:00
Alexey Bataev	18473eb108	[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679) After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that there is no non-vectorized insertelement instruction, which may require widening. Review: https://github.com/llvm/llvm-project/pull/72679	2024-01-11 06:59:57 -08:00
Martin Storsjö	1de3f46938	Revert "[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679 )" This reverts commit 408dce82016463dcb5026b2ddfc62174970a88e9. This triggered failed asserts with code like this: char a[]; short b; int c, d, e, f; void g() { char h; for (;;) { for (; f; ++f) { h[f] = b[0] * a[e] + b[c] * a[1] >> 7; ++b; } h += d; } } Compiled like this: $ clang -target x86_64-linux-gnu -c repro.c -O2 clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value, llvm::Type, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.	2024-01-11 12:15:35 +02:00
Alexey Bataev	408dce8201	[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679) After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that there is no non-vectorized insertelement instruction, which may require widening.	2024-01-10 14:06:29 -05:00
Alexey Bataev	73ce13d79b	[SLP][TTI]Improve detection of the insert-subvector pattern for SLP. (#74749 ) SLP vectorizer passes the type of the subvector and the mask, which size determines the size of the resulting vector. TTI should support this pattern to improve cost estimation of the insert_subvector shuffle pattern.	2024-01-10 10:39:34 -05:00
Alexey Bataev	036e48e2f5	[SLP]Fix PR76850: do the analysis of the submask. Need to limit the transformation of the VecMask by the corresponding part of the mask of SliceSize size to avoid compiler crash during further cost analysis.	2024-01-08 07:51:02 -08:00
Alexey Bataev	79e62315be	[SLP]Use revectorized value for extracts from buildvector, being vectorized. When trying to reuse the extractelement instruction, emitted for the insertelement instruction, need to check if this insertelement instruction was vectorized. In this case, need to use vectorized value, not the original insertelement.	2024-01-04 06:45:26 -08:00
Alexey Bataev	7c963fde16	[SLP]Use revectorized value for extracts from buildvector, being vectorized. If the insertelement instruction is vectorized, and the extractelement instruction from such insertelement is also vectorized as part of the same tree, need to extract from the vectorized value corresponding to the insertelement rather than from the original insertelement instruction.	2024-01-03 10:38:09 -08:00
Alexandros Lamprineas	e512df3ecc	[LV] Fix crash when vectorizing function calls with linear args. (#76274 ) llvm/lib/IR/Type.cpp:694: Assertion `isValidElementType(ElementType) && "Element type of a VectorType must be an integer, floating point, or pointer type."' failed. Stack dump: llvm::FixedVectorType::get(llvm::Type, unsigned int) llvm::VPWidenCallRecipe::execute(llvm::VPTransformState&) llvm::VPBasicBlock::execute(llvm::VPTransformState) llvm::VPRegionBlock::execute(llvm::VPTransformState) llvm::VPlan::execute(llvm::VPTransformState) ... Happens with function calls of void return type.	2024-01-02 18:14:16 +00:00
Enna1	9943d33997	[SLP][NFC] Fix assertion in vectorizeGEPIndices() (#76660 ) The index constraints for the collected getelementptr instructions should be single and non-constant.	2024-01-02 21:32:18 +08:00
Enna1	a51c2f39f5	[SLP] no need to generate extract for in-tree uses for original scalar instruction. (#76077) Before `77a609b556`, we always skipped in-tree uses of the vectorized scalars in `buildExternalUses()`. That commit handles the case where the in-tree use is a scalar operand in a vectorized instruction, for which we need to generate an extract. In-tree uses that remain scalar in vectorized instructions come in 3 cases: - The pointer operand of vectorized LoadInst uses an in-tree scalar - The pointer operand of vectorized StoreInst uses an in-tree scalar - The scalar argument of vector form intrinsic uses an in-tree scalar Generating extracts for in-tree uses for vectorized instructions is implemented in `BoUpSLP::vectorizeTree()`: - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 However, `77a609b556` not only generates extracts for vectorized instructions, but also for the original scalar instructions. There is no need to generate extracts for the original scalar instructions, as these will be replaced by vector instructions and erased later. This patch marks in-tree scalars that remain scalar in vectorized instructions as having no exact user when building external uses, so that all uses of such a scalar are automatically replaced by extractelement, and removes the extracts at: - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667	2023-12-30 10:45:26 +08:00
Alexey Bataev	5096501082	[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461) SLP/TTI do not know about the cost estimation for addsub pattern, supported by X86. Previously the support for pattern detection was added (see TTI::isLegalAltInstr), but the cost still was not estimated properly.	2023-12-28 05:04:04 -08:00
Douglas Yung	fb981e6b4b	Revert "[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461 )" This reverts commit bc8c4bbd7973ab9527a78a20000aecde9bed652d. Change is failing to build on several bots: - https://lab.llvm.org/buildbot/#/builders/127/builds/60184 - https://lab.llvm.org/buildbot/#/builders/123/builds/23709 - https://lab.llvm.org/buildbot/#/builders/216/builds/32302	2023-12-27 23:52:04 -08:00
Alexey Bataev	bc8c4bbd79	[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461) SLP/TTI do not know about the cost estimation for addsub pattern, supported by X86. Previously the support for pattern detection was added (see TTI::isLegalAltInstr), but the cost still was not estimated properly.	2023-12-27 15:57:21 -05:00
Kazu Hirata	03dc806b12	[Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC)	2023-12-22 14:51:22 -08:00
Alexey Bataev	a13148a880	[SLP]Fix PR75995: drop wrapping flags for resized wrapped binops. If decided to resize the instruction, need to drop wrapping flags from the resulting vector instructions to avoid incorrect optimizations/assumptions later. Fixes PR75995.	2023-12-20 06:51:39 -08:00
Arthur Eubanks	71a9292298	Revert "[SLP]Improve findReusedOrderedScalars processing, NFCI." This reverts commit 44dc1e0baae7c4b8a02ba06dcf396d3d452aa873. Causes non-determinism, see #75987.	2023-12-19 16:14:04 -08:00
Alexey Bataev	00edad17c2	[SLP][NFC]Check for equal opcode preliminary to meet weak strict order requirement, NFC. This change does not affect functionality, just fixes the assertions in some standard c++ library implementations.	2023-12-18 14:12:33 -08:00
Alexey Bataev	a7e10e6603	Revert "[SLP][NFC]Check for equal opcode preliminary to meet weak strict order" This reverts commit 58a2c4e2f24ffce3966c3988d1a4ca7b04c52244 to fix the issue detected by https://lab.llvm.org/buildbot/#/builders/233/builds/5424.	2023-12-18 12:35:52 -08:00
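
Several of the commits above (3fc277f665, 36e4a7ecca, ca654acc16, 00edad17c2) repair comparators so that they form a strict weak ordering, which std::sort requires. The sketch below is not the SLPVectorizer's PHICompare; it is a minimal illustration under assumed names. `Item`, `Group`, `Number`, `brokenLess`, `strictLess` and `clusterByGroup` are all hypothetical. It shows the general idea named in 3fc277f665: cluster on one key, then fall back to an existing numbering so that the comparison stays irreflexive and transitive.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a value being sorted: Group is the clustering
// key, Number is the position in some pre-existing numbering.
struct Item {
  int Group;
  int Number;
};

// Not a strict weak ordering: brokenLess(A, A) is true because of <=,
// so irreflexivity is violated. Passing this to std::sort is undefined
// behavior, and checked STL builds may assert on it.
inline bool brokenLess(const Item &A, const Item &B) {
  return A.Group <= B.Group;
}

// A strict weak ordering: lexicographic comparison on (Group, Number).
// Items with the same Group stay adjacent, and ties inside a group are
// broken by the existing numbering.
inline bool strictLess(const Item &A, const Item &B) {
  return std::tie(A.Group, A.Number) < std::tie(B.Group, B.Number);
}

// Sorts a copy of the input so that equal groups are clustered together.
inline std::vector<Item> clusterByGroup(std::vector<Item> Items) {
  std::sort(Items.begin(), Items.end(), strictLess);
  return Items;
}
```

Comparing tuples built with std::tie is one common way to get the required properties for free, since lexicographic comparison over strict orderings is itself a strict weak ordering.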

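
Commit 6ecd26132b uses a scope-exit helper so that loop state is updated on every path, including early continues. The sketch below illustrates that pattern only; it does not reproduce the SLP code or LLVM's own ScopeExit utility. `ScopeGuard` is a hypothetical minimal RAII class written for this example, and `sumOddsTrackingPrev` is an invented function.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical minimal scope guard: runs the stored callable when the
// guard object is destroyed, on whichever path the scope is left.
template <typename F> class ScopeGuard {
  F Fn;

public:
  explicit ScopeGuard(F Fn) : Fn(std::move(Fn)) {}
  ScopeGuard(const ScopeGuard &) = delete;
  ScopeGuard &operator=(const ScopeGuard &) = delete;
  ~ScopeGuard() { Fn(); }
};

// Sums the odd values into OddSum and returns the last value seen.
// Prev is updated by the guard at the end of every iteration, so the
// early `continue` for even values cannot skip the update.
inline int sumOddsTrackingPrev(const std::vector<int> &Values, int &OddSum) {
  int Prev = -1;
  OddSum = 0;
  for (int V : Values) {
    auto Update = [&Prev, V] { Prev = V; };
    ScopeGuard<decltype(Update)> Guard(Update);
    if (V % 2 == 0)
      continue; // Guard's destructor still runs and sets Prev.
    OddSum += V;
  }
  return Prev;
}
```

The benefit is the one the commit message states: adding another early continue later does not require remembering to repeat the update before it.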

1605 Commits