llvm-project

Author	SHA1	Message	Date
Florian Hahn	2ab5c47c87	[VPlan] Don't replace scalarizing recipe with VPWidenCastRecipe. Don't replace a scalarizing recipe with a VPWidenCastRecipe. This would introduce wide (vectorizing) recipes when interleaving only. Fixes https://github.com/llvm/llvm-project/issues/76986	2024-01-04 20:39:44 +00:00
Alexey Bataev	79e62315be	[SLP]Use revectorized value for extracts from buildvector, being vectorized. When trying to reuse the extractelement instruction emitted for the insertelement instruction, need to check whether this insertelement instruction was vectorized. In this case, need to use the vectorized value, not the original insertelement.	2024-01-04 06:45:26 -08:00
Jannik Silvanus	7954c57124	[IR] Fix GEP offset computations for vector GEPs (#75448 ) Vectors are always bit-packed and don't respect the elements' alignment requirements. This is different from arrays. This means offsets of vector GEPs need to be computed differently than offsets of array GEPs. This PR fixes many places that rely on an incorrect pattern that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`. We replace these by usages of `GTI.getSequentialElementStride(DL)`, which is a new helper function added in this PR. This changes behavior for GEPs into vectors with element types for which the (bit) size and alloc size is different. This includes two cases: * Types with a bit size that is not a multiple of a byte, e.g. i1. GEPs into such vectors are questionable to begin with, as some elements are not even addressable. * Overaligned types, e.g. i16 with 32-bit alignment. Existing tests are unaffected, but a miscompilation of a new test is fixed. --------- Co-authored-by: Nikita Popov <github@npopov.com>	2024-01-04 10:08:21 +01:00
Nilanjana Basu	cd28da390f	[LV] Change loops' interleave count computation (#73766) A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose an interleave count (IC) between TC/VF and TC/(2*VF) (VF = vectorization factor), such that the remainder TC for the epilogue loop is minimal, preferring the larger IC when the remainder TC is the same for both. The initial tests for this change were submitted in PRs: https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.	2024-01-04 12:45:22 +05:30
Florian Hahn	6dda74cc51	[VPlan] Use createSelect in adjustRecipesForReductions (NFCI). Simplify the code and rename Result->NewExitingVPV as suggested by @ayalz in https://github.com/llvm/llvm-project/pull/70253.	2024-01-03 20:54:10 +00:00
Alexey Bataev	7c963fde16	[SLP]Use revectorized value for extracts from buildvector, being vectorized. If the insertelement instruction is vectorized, and the extractelement instruction from such insertelement is also vectorized as part of the same tree, need to extract from the vectorized value corresponding to the insertelement rather than from the original insertelement instruction.	2024-01-03 10:38:09 -08:00
Alexandros Lamprineas	e512df3ecc	[LV] Fix crash when vectorizing function calls with linear args. (#76274 ) llvm/lib/IR/Type.cpp:694: Assertion `isValidElementType(ElementType) && "Element type of a VectorType must be an integer, floating point, or pointer type."' failed. Stack dump: llvm::FixedVectorType::get(llvm::Type, unsigned int) llvm::VPWidenCallRecipe::execute(llvm::VPTransformState&) llvm::VPBasicBlock::execute(llvm::VPTransformState) llvm::VPRegionBlock::execute(llvm::VPTransformState) llvm::VPlan::execute(llvm::VPTransformState) ... Happens with function calls of void return type.	2024-01-02 18:14:16 +00:00
Enna1	9943d33997	[SLP][NFC] Fix assertion in vectorizeGEPIndices() (#76660 ) The index constraints for the collected getelementptr instructions should be single and non-constant.	2024-01-02 21:32:18 +08:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164 ) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes future and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Enna1	a51c2f39f5	[SLP] no need to generate extract for in-tree uses for original scalar instruction. (#76077) Before `77a609b556`, we always skipped in-tree uses of the vectorized scalars in `buildExternalUses()`. That commit handles the case where the in-tree use is a scalar operand in a vectorized instruction, for which we need to generate an extract. In-tree uses that remain scalar in vectorized instructions fall into 3 cases: - The pointer operand of a vectorized LoadInst uses an in-tree scalar - The pointer operand of a vectorized StoreInst uses an in-tree scalar - The scalar argument of a vector form intrinsic uses an in-tree scalar Generating extracts for in-tree uses in vectorized instructions is implemented in `BoUpSLP::vectorizeTree()`: - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 However, `77a609b556` not only generates extracts for vectorized instructions, but also generates extracts for the original scalar instructions. There is no need to generate extracts for the original scalar instructions, as these scalar instructions will be replaced by vector instructions and erased later. This patch marks in-tree scalars that remain scalar in vectorized instructions as having no exact user when building external uses, so that all uses of such a scalar are automatically replaced by extractelement, and removes the - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 extracts.	2023-12-30 10:45:26 +08:00
Florian Hahn	516cc98aff	[LV] Fix typo in comment (NFC).	2023-12-28 21:20:10 +00:00
Alexey Bataev	5096501082	[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461) SLP/TTI do not know about the cost estimation for the addsub pattern supported by X86. Support for pattern detection was added previously (see TTI::isLegalAltInstr), but the cost was still not estimated properly.	2023-12-28 05:04:04 -08:00
Douglas Yung	fb981e6b4b	Revert "[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461 )" This reverts commit bc8c4bbd7973ab9527a78a20000aecde9bed652d. Change is failing to build on several bots: - https://lab.llvm.org/buildbot/#/builders/127/builds/60184 - https://lab.llvm.org/buildbot/#/builders/123/builds/23709 - https://lab.llvm.org/buildbot/#/builders/216/builds/32302	2023-12-27 23:52:04 -08:00
Alexey Bataev	bc8c4bbd79	[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461) SLP/TTI do not know about the cost estimation for the addsub pattern supported by X86. Support for pattern detection was added previously (see TTI::isLegalAltInstr), but the cost was still not estimated properly.	2023-12-27 15:57:21 -05:00
Kazu Hirata	03dc806b12	[Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC)	2023-12-22 14:51:22 -08:00
Alexey Bataev	a13148a880	[SLP]Fix PR75995: drop wrapping flags for resized wrapped binops. If decided to resize the instruction, need to drop wrapping flags from the resulting vector instructions to avoid incorrect optimizations/assumptions later. Fixes PR75995.	2023-12-20 06:51:39 -08:00
Arthur Eubanks	71a9292298	Revert "[SLP]Improve findReusedOrderedScalars processing, NFCI." This reverts commit 44dc1e0baae7c4b8a02ba06dcf396d3d452aa873. Causes non-determinism, see #75987.	2023-12-19 16:14:04 -08:00
Alexey Bataev	00edad17c2	[SLP][NFC]Check for equal opcode preliminary to meet weak strict order requirement, NFC. This change does not affect functionality, just fixes the assertions in some standard c++ library implementations.	2023-12-18 14:12:33 -08:00
Alexey Bataev	a7e10e6603	Revert "[SLP][NFC]Check for equal opcode preliminary to meet weak strict order" This reverts commit 58a2c4e2f24ffce3966c3988d1a4ca7b04c52244 to fix the issue detected by https://lab.llvm.org/buildbot/#/builders/233/builds/5424.	2023-12-18 12:35:52 -08:00
Alexey Bataev	58a2c4e2f2	[SLP][NFC]Check for equal opcode preliminary to meet weak strict order requirement, NFC. This change does not affect functionality, just fixes the assertions in some standard c++ library implementations.	2023-12-18 06:42:03 -08:00
Florian Hahn	cb56ba6350	[VPlan] Unswitch cond in replaceUsesWithIf in optimizeInductions (NFC) As suggested post-commit for a00227197, unswitch the condition in replaceUsesWithIf to simplify the check.	2023-12-15 20:26:36 +00:00
Florian Hahn	9277ef12c0	[VPlan] Remove stale comment from optimizeInductions (NFC). As suggested post-commit for a00227197, remove the stale comment, SetVector is no longer used here.	2023-12-15 17:35:13 +00:00
Reid Kleckner	3e16152ebc	[SLP] Fix OOB GEP index access for a no-op GEP Issue is covered by existing test llvm/test/Transforms/SLPVectorizer/RISCV/phi-const.ll See issue #75632 for ideas for how we could catch these more easily in the future.	2023-12-15 17:33:06 +00:00
Florian Hahn	b1bfe221e6	[VPlan] Remove unneeded getNumUsers calls in replaceAllUsesWith (NFC). As suggested post-commit for a00227197, replace unnecessary getNumUsers calls by boolean variable to indicate if users changed. Note that this also requires an early exit to detect the case where a value is replaced by itself.	2023-12-15 13:43:15 +00:00
Shih-Po Hung	3d422a9859	[VPlan] Implement mayHaveSideEffects/mayWriteToMemory for VPInterleaveRecipe (#71360) This helps VPlanTransforms::removeDeadRecipes to work on VPInterleaveRecipe	2023-12-15 00:23:14 +08:00
Maurice Heumann	f42b930af9	[SLP] Pessimistically handle unknown vector entries in SLP vectorizer (#75438 ) SLP Vectorizer can discard vector entries at unknown positions. This example shows the behaviour: https://godbolt.org/z/or43EM594 The following instruction inserts an element at an unknown position: ``` %2 = insertelement <3 x i64> poison, i64 %value, i64 %position ``` The position depends on an argument that is unknown at compile time. After running SLP, one can see there is no more instruction present referencing `%position`. This happens as SLP parallelizes the two adds in the example. It then needs to merge the original vector with the new vector. Within `isUndefVector`, the SLP vectorizer constructs a bitmap indicating which elements of the original vector are poison values. It does this by walking the insertElement instructions. If it encounters an insert with a non-constant position, it is ignored. This will result in poison values to be used for all entries, where there are no inserts with constant positions. However, as the position is unknown, the element could be anywhere. Therefore, I think it is only safe to assume none of the entries are poison values and to simply take them all over when constructing the shuffleVector instruction. This fixes #75437	2023-12-14 09:48:23 -05:00
Florian Hahn	173032902c	Revert "[VPlan] Mark Select VPInstructions as not having sideeffects." This reverts commit 19918ac34dc5d304ec6ad413ceae1d4394abe28f. Fixes #75298. There is still a case where we miss the correct users outside the main vector loop for reductions, and that is tail-folded loops with reductions where the final value is stored after the loop. This should be handled explicitly in #70253	2023-12-13 21:05:24 +00:00
Florian Hahn	19918ac34d	[VPlan] Mark Select VPInstructions as not having sideeffects. Select VPInstructions don't have sideeffects, mark them accordingly.	2023-12-11 12:26:32 +00:00
Shao-Ce SUN	d860710905	[NFC][VPlan] Simplify VPValue::removeUser (#74708 ) Replaced explicit loops with find + erase.	2023-12-11 10:55:27 +08:00
Kazu Hirata	8b1181133d	[Transforms] Remove unused forward declarations (NFC)	2023-12-10 10:07:12 -08:00
Kazu Hirata	a16429365c	[Transforms] Remove unnecessary includes (NFC)	2023-12-09 18:23:06 -08:00
Alexey Bataev	44dc1e0baa	[SLP]Improve findReusedOrderedScalars processing, NFCI. Tries to simplify structural complexity of the findReusedOrderedScalars function.	2023-12-08 14:27:55 -08:00
Florian Hahn	a5891fa4d2	[VPlan] Initial modeling of VF * UF as VPValue. (#74761 ) This patch starts initial modeling of VF * UF in VPlan. Initially, introduce a dedicated VFxUF VPValue, which is then populated during VPlan::prepareToExecute. Initially, the VF * UF applies only to the main vector loop region. Once we extend the scope of VPlan in the future, we may want to associate different VFxUFs with different vector loop regions (e.g. the epilogue vector loop) This allows explicitly parameterizing recipes that rely on the VF * UF, like the canonical induction increment. At the moment, this mainly helps to avoid generating some duplicated calls to vscale with scalable vectors. It should also allow using EVL as induction increments explicitly in D99750. Referring to VF * UF is also needed in other places that we plan to migrate to VPlan, like the minimum trip count check during skeleton creation. The first version creates the value for VF * UF directly in prepareToExecute to limit the scope of the patch. A follow-on patch will model VF * UF computation explicitly in VPlan using recipes. Moved from Phabricator (https://reviews.llvm.org/D157322)	2023-12-08 18:30:30 +00:00
Florian Hahn	5ea6a3fc6d	[VPlan] Compute scalable VF in preheader for induction increment. (#74762 ) UF * VF is loop invariant and can be computed directly in the preheader. This prepares the code for #74761 and reduces the test changes.	2023-12-08 12:18:31 +00:00
Florian Hahn	633fe60149	[VPlan] Print flags for VPWidenCastRecipe. Update VPWidenCastRecipe to also print flags. Simplify nneg printing test and replace hard-coded value number references with patterns.	2023-12-08 10:48:54 +00:00
Graham Hunter	d0d5ef8133	[LV] Add support for linear arguments for vector function variants (#73941 ) If we have vectorized variants of a function which take linear parameters, we should be able to vectorize assuming the strides match.	2023-12-08 10:24:05 +00:00
Alexey Bataev	fb35bb48c6	[SLP][NFC]Build value-to-gather-nodes map during nodes building, NFC.	2023-12-07 13:41:19 -08:00
Alexey Bataev	58785ebd24	[SLP][NFC]Check for ephemeral values beforehand, NFC.	2023-12-07 13:25:15 -08:00
Alexey Bataev	0e1a9e3084	[SLP]Fix PR74607: Fix dependency between buildvector nodes with user nodes, having same last instruction. If the user nodes have the same last instruction, used as the insert point for the buildvector nodes, finding the proper dependency is crucial. Before, it depended on the indices of the buildvectors themselves, but it looks like it should depend on the indices of the user nodes, because those identify the vectorization order and thus properly order the buildvector nodes in terms of the def-use chain.	2023-12-06 10:15:01 -08:00
Paschalis Mpeis	7b83f69db4	[NFC] Replace CallInst with FunctionType in VFABI, VFShape API (#74569 ) Minor simplification applied to VFShape::getScalarShape, VFShape::get, and VFABI::tryDemangleForVFABI methods. Also, remove unnecessary `static_cast` in `SLPVectorizer.cpp`	2023-12-06 17:14:58 +00:00
Florian Hahn	bbd1941a38	[VPlan] Add disjoint flag to VPRecipeWithIRFlags. (#74364 ) A new disjoint flag was added for OR instructions in #72583. Update VPRecipeWithIRFlags to also support the new flag. This allows printing and preserving the disjoint flag in vectorized code.	2023-12-05 15:21:59 +00:00
Alexey Bataev	056367bb19	[LV]Support dropping of nneg flag for zext widencast recipes. (#74112) The compiler crashes on a zext nneg instruction when the assertion that checks that the instruction cannot produce poison is triggered. Changed the base class for the widencast recipe to handle dropping the nneg flag and avoid the compiler crash.	2023-12-05 09:17:23 -05:00
Florian Hahn	cd4348349a	[VPlan] Sink cases where no truncate is needed in truncateMinimalBWs. MinBWs contains entries that specify the minimum required bitwidth. In some cases, the old and new bitwidths can be equal (see test case) and in those cases no truncations are needed, so skip those cases. Fixes #74307.	2023-12-04 15:35:54 +00:00
Florian Hahn	99aa5311ee	[VPlan] Add missing output of live-ins to VPlan dot printing. Split off live-in printing to VPlan::printLiveIns and use it to print Live-ins when printing in the DOT format.	2023-12-04 13:41:28 +00:00
Florian Hahn	c890582912	[VPlan] Account for live-in entries in MinBW used by replicate recipes. In some cases MinBWs may contain entries for live-ins that are not used by VPWidenRecipe or VPWidenSelectRecipes. In those cases, the live-ins won't get processed, so make sure we include them in the count when used as operands in VPWidenCast and VPWidenSelectRecipe. Fixes https://github.com/llvm/llvm-project/issues/74231	2023-12-03 11:15:29 +00:00
Kazu Hirata	0008b9c0ac	[Vectorize] Fix an unused variable warning This patch fixes: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:912:16: error: unused variable 'OldResSizeInBits' [-Werror,-Wunused-variable]	2023-12-02 11:20:57 -08:00
Florian Hahn	70535f5e60	[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version. This patch replaces the IR based truncateToMinimalBitwidths with a VPlan version. This has 3 benefits: 1) the VPlan-based version is simpler; we don't need to implement special codegen for each supported instruction type like the IR based one. 2) Removes a dependency on the cost-model after VPlan execution and 3) Removes a use of getVPValue that uses underlying values after VPlan execution (See removed FIXME). Depends on D149081. Depends on D149079. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149903	2023-12-02 16:12:38 +00:00
Florian Hahn	cbf7b52a65	[VPlan] Properly update reduction live-out after placing select. After inserting a select for the final value, update the VPlan def-use chains. At the moment, the incorrect live-out doesn't cause a mis-compile, as computing the final reduction value is not yet modeled in VPlan.	2023-12-02 15:22:09 +00:00
Alexey Bataev	279b1ea65f	[SLP]Improve gathering of the scalars used in the graph. Currently we emit gathers for scalars being vectorized in the tree as a pair of extractelement/insertelement instructions. Instead we can try to find all required vectors and emit shuffle vector instructions directly, improving the code and reducing compile time. Part of non-power-of-2 vectorization. Differential Revision: https://reviews.llvm.org/D110978	2023-12-01 11:23:57 -08:00
Alexey Bataev	ba52310657	[SLP][NFC] Unify code for cost estimation/codegen for buildvector, NFC. (#73182 ) This just moves towards reusing same function for both cost estimation/codegen for buildvector.	2023-11-30 10:04:57 -05:00
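
The distinction drawn in the 7954c57124 entry above (array elements are spaced by alloc size, vector elements are bit-packed) can be illustrated with a small sketch. This is a toy model in Python, not LLVM code: `ScalarType`, `store_size`, `alloc_size`, `element_stride` and `gep_offset` are hypothetical names introduced here, and rejecting non-byte-sized vector elements is an assumption of the sketch, loosely mirroring the commit's remark that such GEPs are questionable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalarType:
    """Toy stand-in for an element type: bit width and ABI alignment in bytes."""
    bits: int
    abi_align: int


def store_size(ty: ScalarType) -> int:
    # Bytes needed to hold the value, rounded up to a whole byte.
    return (ty.bits + 7) // 8


def alloc_size(ty: ScalarType) -> int:
    # Store size rounded up to the ABI alignment; this is the array stride.
    size = store_size(ty)
    return (size + ty.abi_align - 1) // ty.abi_align * ty.abi_align


def element_stride(ty: ScalarType, in_vector: bool) -> int:
    if in_vector:
        # Vectors are bit-packed, so alignment padding is ignored.
        # Assumption of this sketch: refuse elements that are not byte-sized.
        if ty.bits % 8 != 0:
            raise ValueError("vector element is not byte-addressable")
        return store_size(ty)
    return alloc_size(ty)


def gep_offset(ty: ScalarType, index: int, in_vector: bool) -> int:
    # Byte offset of element `index` within the sequential type.
    return index * element_stride(ty, in_vector)


# An overaligned i16 (32-bit alignment), one of the cases named in the commit.
overaligned_i16 = ScalarType(bits=16, abi_align=4)
print(gep_offset(overaligned_i16, 3, in_vector=False))  # array: 3 * 4 = 12
print(gep_offset(overaligned_i16, 3, in_vector=True))   # vector: 3 * 2 = 6
```

For ordinary types whose size already matches their alignment (for example a 32-bit integer with 4-byte alignment), both strides agree, which is consistent with the commit's note that existing tests were unaffected.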


4163 Commits