llvm-project

Author	SHA1	Message	Date
Alexey Bataev	36e4a7ecca	[SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict weak ordering. Try to make PHICompare meet strict weak ordering criteria.	2024-01-24 13:46:05 -08:00
Alexey Bataev	48bbd76587	[SLP]Fix PR79229: Check that extractelement is used only in a single node before erasing. Before trying to erase the extractelement instruction, not enough to check for single use, need to check that it is not used in several nodes because of the preliminary nodes reordering.	2024-01-24 11:22:22 -08:00
Alexey Bataev	ca654acc16	[SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict weak ordering. Compared NumUses to meet the requirements of the strict weak ordering.	2024-01-24 09:36:25 -08:00
Alexey Bataev	bb3e0d7fc3	[SLP]Fix PR79193: skip analysis of gather nodes for minbitwidth. No need in trying to analyze small graphs with gather node only to avoid crash.	2024-01-23 12:44:49 -08:00
Stephen Tozer	632f44e5ed	[RemoveDIs][DebugInfo] Handle DPVAssign in most transforms (#78986 ) This patch trivially updates various opt passes to handle DPVAssigns. In all cases, this means some combination of generifying existing code to handle DPValues and DbgAssignIntrinsics, iterating over DPValues where previously we did not, or duplicating code for DbgAssignIntrinsics to the equivalent DPValue function (in inlining and salvageDebugInfo).	2024-01-23 16:16:59 +00:00
Florian Hahn	3683852d49	[VPlan] Use replaceUsesWithIf in replaceAllUseswith and add comment (NFCI). Follow-up to post-commit comments for b1bfe221e6.	2024-01-21 12:56:16 +00:00
Florian Hahn	42fb1fac9e	[VPlan] Use DebugLoc from recipe in VPWidenCallRecipe (NFCI). Instead of using the debug location of the underlying instruction, use the debug location from the recipe. This removes an unneeded dependency of the underlying instruction.	2024-01-19 13:33:03 +00:00
Florian Hahn	abdb61f5fd	[VPlan] Introduce VPSingleDefRecipe. (#77023 ) This patch introduces a new common base class for recipes defining a single result VPValue. This has been discussed/mentioned at various previous reviews as potential follow-up and helps to replace various getVPSingleValue calls. PR: https://github.com/llvm/llvm-project/pull/77023	2024-01-19 10:27:53 +00:00
Paschalis Mpeis	37c87d5689	[LV][AArch64] LoopVectorizer allows scalable frem instructions (#76247 ) LoopVectorizer is aware when a target can replace a scalable frem instruction with a vector library call for a given VF and it returns the relevant cost. Otherwise, it returns an invalid cost (as previously). Add tests that check costs on AArch64, when there is no vector library available and when there is (with and without tail-folding). NOTE: Invoking CostModel directly (not through LV) would still return invalid costs.	2024-01-18 08:32:53 +00:00
Alexey Bataev	093206bb7e	[SLP]Fix PR78298: Assertion `GEP->getNumIndices() == 1 && !isa<Constant>(GEPIdx)' failed. The non-constant index might be folded to constant during earlier stages of vectorization. Need to consider this option and filter out GEPs with constant indices from the candidates list.	2024-01-16 09:17:35 -08:00
Florian Hahn	9a402d6fbb	[LV] Make DL optional argument for VPBuilder member functions (NFCI).	2024-01-16 15:50:09 +00:00
Florian Hahn	e7671bc9d6	[LV] Fix indent for loop in adjustRecipesForReductions (NFC).	2024-01-16 15:28:46 +00:00
Alexey Bataev	d79fdb2749	[SLP]Fix PR78236: correctly track external values, replaced several times during reduction vectorization. If the external value was replaced in the vectorizer several times during reduction vectorization, need to find the original value to correctly handle external uses and emit extractelement instructions properly.	2024-01-16 06:52:43 -08:00
Florian Hahn	6011d6b2cc	[VPlan] Use start value of reduction phi to determine type (NFCI). Instead of accessing the underlying original IR value, check the type of the start value from the recipe directly.	2024-01-16 14:39:51 +00:00
Mel Chen	b6e8f6604c	[LV] Skipping all debug instructions when native vplan is enabled (#77413 ) The following internal error occurred when using native vplan to vectorize a program with debug info generation. Assertion `!isa<DbgInfoIntrinsic>(CI) && "DbgInfoIntrinsic should have been dropped during VPlan construction"' failed. This patch ignores all debug instructions to fix the error when native vplan is enabled.	2024-01-16 11:08:10 +08:00
Alexey Bataev	6fdc2ce8c5	[SLP]Fix PR77916: transform the whole mask, not only the elements for the second vector. Need to transform all elements in the long mask, if we decided to produce shorter version, some elements may still have incorrect indices after transformation for the first vector in the permutation.	2024-01-12 07:07:43 -08:00
Nikita Popov	6c2fbc3a68	[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582 ) This abstracts over the common pattern of creating a gep with i8 element type.	2024-01-12 14:21:21 +01:00
Florian Hahn	59d6f033a2	[VPlan] Support narrowing widened loads in truncateToMinimimalBitwidths. MinBWs may also contain widened load instructions, handle them by only narrowing their result. Fixes https://github.com/llvm/llvm-project/issues/77468	2024-01-12 13:14:13 +00:00
Alexey Bataev	39b2104b4a	[SLP]Fix a crash for reduced values with minbitwidth, which are reused. If the reduced values are additionally affected by minbitwidth analysis, need to cast them to a proper type before doing any math, if they are reused.	2024-01-12 04:49:48 -08:00
Alexey Bataev	18473eb108	[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679 ) After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that there is no non-vectorized insertelement instruction, which may require widening. Review: https://github.com/llvm/llvm-project/pull/72679	2024-01-11 06:59:57 -08:00
Martin Storsjö	1de3f46938	Revert "[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679 )" This reverts commit 408dce82016463dcb5026b2ddfc62174970a88e9. This triggered failed asserts with code like this: char a[]; short b; int c, d, e, f; void g() { char h; for (;;) { for (; f; ++f) { h[f] = b[0] * a[e] + b[c] * a[1] >> 7; ++b; } h += d; } } Compiled like this: $ clang -target x86_64-linux-gnu -c repro.c -O2 clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value, llvm::Type, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.	2024-01-11 12:15:35 +02:00
Craig Topper	1c342571b8	[LV] Use value_or to simplify code. NFC (#77030 )	2024-01-10 12:40:26 -08:00
Alexey Bataev	408dce8201	[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679 ) After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that there is no non-vectorized insertelement instruction, which may require widening.	2024-01-10 14:06:29 -05:00
Alexey Bataev	73ce13d79b	[SLP][TTI]Improve detection of the insert-subvector pattern for SLP. (#74749 ) SLP vectorizer passes the type of the subvector and the mask, which size determines the size of the resulting vector. TTI should support this pattern to improve cost estimation of the insert_subvector shuffle pattern.	2024-01-10 10:39:34 -05:00
Florian Hahn	8b7bbedec7	[LV] Re-add early exit in VPRecipeBuilder::createBlockInMask. Re-add early exit that was accidentally dropped in 51afb10.	2024-01-10 15:02:14 +00:00
Florian Hahn	51afb10174	[LV] Create block in mask up-front if needed. (#76635 ) At the moment, block and edge masks are created on demand, which means that they are inserted at the point where they are demanded and then cached. It is possible that the mask for a block is looked up later at a point that's not dominated by the point where the mask has been inserted. To avoid this, create masks up front on entry to the corresponding basic block and leave it to VPlan simplification to remove unneeded masks. Note that we need to create masks for all blocks, if any of the blocks in the loop needs predication, as computing the mask of a block depends on the masks of its predecessor. Needed for #76090. https://github.com/llvm/llvm-project/pull/76635	2024-01-09 10:50:08 +00:00
Alexey Bataev	036e48e2f5	[SLP]Fix PR76850: do the analysis of the submask. Need to limit the transformation of the VecMask by the corresponding part of the mask of SliceSize size to avoid compiler crash during further cost analysis.	2024-01-08 07:51:02 -08:00
Florian Hahn	18ec3304a9	[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe. As suggested as follow-up in https://github.com/llvm/llvm-project/pull/72164, manage inbounds via VPRecipeWithIRFlags. Note that we can now preserve inbounds in a few more cases.	2024-01-07 13:58:05 +00:00
Florian Hahn	3fb0d8dc80	Recommit "[VPlan] Mark Select VPInstructions as not having sideeffects." With #70253 landed, selects for reduction results are explicitly used by ComputeReductionResult and Selects can be marked as not having side-effects again. This reverts the revert commit 173032902c960d4d0d67b521d8c149553d8e8ba3.	2024-01-06 12:08:06 +00:00
Florian Hahn	241fe83704	[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253 ) This patch introduces a new ComputeReductionResult opcode to compute the final reduction result in the middle block. The code from fixReduction has been moved to ComputeReductionResult, after some earlier cleanup changes to model parts of fixReduction explicitly elsewhere as needed. The recipe may be broken down further in the future. Note that the phi nodes to merge the reduction result from the trip count check and the middle block, to be used as resume value for the scalar remainder loop are also generated based on ComputeReductionResult. Once we have a VPValue for the reduction result, this can also be modeled explicitly and moved out of the recipe.	2024-01-04 22:53:18 +00:00
Florian Hahn	2ab5c47c87	[VPlan] Don't replace scalarizing recipe with VPWidenCastRecipe. Don't replace a scalarizing recipe with a VPWidenCastRecipe. This would introduce wide (vectorizing) recipes when interleaving only. Fixes https://github.com/llvm/llvm-project/issues/76986	2024-01-04 20:39:44 +00:00
Alexey Bataev	79e62315be	[SLP]Use revectorized value for extracts from buildvector, being vectorized. When trying to reuse the extractelement instruction, emitted for the insertelement instruction, need to check if this insertelement instruction was vectorized. In this case, need to use vectorized value, not the original insertelement.	2024-01-04 06:45:26 -08:00
Jannik Silvanus	7954c57124	[IR] Fix GEP offset computations for vector GEPs (#75448 ) Vectors are always bit-packed and don't respect the elements' alignment requirements. This is different from arrays. This means offsets of vector GEPs need to be computed differently than offsets of array GEPs. This PR fixes many places that rely on an incorrect pattern that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`. We replace these by usages of `GTI.getSequentialElementStride(DL)`, which is a new helper function added in this PR. This changes behavior for GEPs into vectors with element types for which the (bit) size and alloc size is different. This includes two cases: * Types with a bit size that is not a multiple of a byte, e.g. i1. GEPs into such vectors are questionable to begin with, as some elements are not even addressable. * Overaligned types, e.g. i16 with 32-bit alignment. Existing tests are unaffected, but a miscompilation of a new test is fixed. --------- Co-authored-by: Nikita Popov <github@npopov.com>	2024-01-04 10:08:21 +01:00
Nilanjana Basu	cd28da390f	[LV] Change loops' interleave count computation (#73766 ) A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose interleaving count (IC) between TC/VF & TC/2*VF (VF = vectorization factor), such that remainder TC for the epilogue loop is minimum while the IC is maximum in case the remainder TC is the same for both. The initial tests for this change were submitted in PRs: https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.	2024-01-04 12:45:22 +05:30
Florian Hahn	6dda74cc51	[VPlan] Use createSelect in adjustRecipesForReductions (NFCI). Simplify the code and rename Result->NewExitingVPV as suggested by @ayalz in https://github.com/llvm/llvm-project/pull/70253.	2024-01-03 20:54:10 +00:00
Alexey Bataev	7c963fde16	[SLP]Use revectorized value for extracts from buildvector, being vectorized. If the insertelement instruction is vectorized, and the extractelement instruction from such insertelement is also vectorized as part of the same tree, need to extract from the vectorized value corresponding to the insertelement rather than from the original insertelement instruction.	2024-01-03 10:38:09 -08:00
Alexandros Lamprineas	e512df3ecc	[LV] Fix crash when vectorizing function calls with linear args. (#76274 ) llvm/lib/IR/Type.cpp:694: Assertion `isValidElementType(ElementType) && "Element type of a VectorType must be an integer, floating point, or pointer type."' failed. Stack dump: llvm::FixedVectorType::get(llvm::Type, unsigned int) llvm::VPWidenCallRecipe::execute(llvm::VPTransformState&) llvm::VPBasicBlock::execute(llvm::VPTransformState) llvm::VPRegionBlock::execute(llvm::VPTransformState) llvm::VPlan::execute(llvm::VPTransformState) ... Happens with function calls of void return type.	2024-01-02 18:14:16 +00:00
Enna1	9943d33997	[SLP][NFC] Fix assertion in vectorizeGEPIndices() (#76660 ) The index constraints for the collected getelementptr instructions should be single and non-constant.	2024-01-02 21:32:18 +08:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164 ) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes further and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Enna1	a51c2f39f5	[SLP] no need to generate extract for in-tree uses for original scalar instruction. (#76077 ) Before `77a609b556`, we always skipped in-tree uses of the vectorized scalars in `buildExternalUses()`; that commit handles the case where the in-tree use is a scalar operand in a vectorized instruction, for which we need to generate an extract. In-tree uses that remain scalar in vectorized instructions fall into 3 cases: - The pointer operand of vectorized LoadInst uses an in-tree scalar - The pointer operand of vectorized StoreInst uses an in-tree scalar - The scalar argument of vector form intrinsic uses an in-tree scalar Generating extracts for in-tree uses for vectorized instructions is implemented in `BoUpSLP::vectorizeTree()`: - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 However, `77a609b556` not only generates extracts for vectorized instructions, but also generates extracts for original scalar instructions. There is no need to generate extracts for original scalar instructions, as these scalar instructions will be replaced by vector instructions and get erased later. This patch marks in-tree scalars that remain scalar in vectorized instructions as having no exact user when building external uses, so that all uses of such a scalar are automatically replaced by extractelement, and removes the extracts at - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667.	2023-12-30 10:45:26 +08:00
Florian Hahn	516cc98aff	[LV] Fix typo in comment (NFC).	2023-12-28 21:20:10 +00:00
Alexey Bataev	5096501082	[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461 ) SLP/TTI do not know about the cost estimation for addsub pattern, supported by X86. Previously the support for pattern detection was added (see TTI::isLegalAltInstr), but the cost still was not estimated properly.	2023-12-28 05:04:04 -08:00
Douglas Yung	fb981e6b4b	Revert "[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461 )" This reverts commit bc8c4bbd7973ab9527a78a20000aecde9bed652d. Change is failing to build on several bots: - https://lab.llvm.org/buildbot/#/builders/127/builds/60184 - https://lab.llvm.org/buildbot/#/builders/123/builds/23709 - https://lab.llvm.org/buildbot/#/builders/216/builds/32302	2023-12-27 23:52:04 -08:00
Alexey Bataev	bc8c4bbd79	[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461 ) SLP/TTI do not know about the cost estimation for addsub pattern, supported by X86. Previously the support for pattern detection was added (see TTI::isLegalAltInstr), but the cost still was not estimated properly.	2023-12-27 15:57:21 -05:00
Kazu Hirata	03dc806b12	[Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC)	2023-12-22 14:51:22 -08:00
Alexey Bataev	a13148a880	[SLP]Fix PR75995: drop wrapping flags for resized wrapped binops. If decided to resize the instruction, need to drop wrapping flags from the resulting vector instructions to avoid incorrect optimizations/assumptions later. Fixes PR75995.	2023-12-20 06:51:39 -08:00
Arthur Eubanks	71a9292298	Revert "[SLP]Improve findReusedOrderedScalars processing, NFCI." This reverts commit 44dc1e0baae7c4b8a02ba06dcf396d3d452aa873. Causes non-determinism, see #75987.	2023-12-19 16:14:04 -08:00
Alexey Bataev	00edad17c2	[SLP][NFC]Check for equal opcode preliminary to meet weak strict order requirement, NFC. This change does not affect functionality, just fixes the assertions in some standard c++ library implementations.	2023-12-18 14:12:33 -08:00
Alexey Bataev	a7e10e6603	Revert "[SLP][NFC]Check for equal opcode preliminary to meet weak strict order" This reverts commit 58a2c4e2f24ffce3966c3988d1a4ca7b04c52244 to fix the issue detected by https://lab.llvm.org/buildbot/#/builders/233/builds/5424.	2023-12-18 12:35:52 -08:00
Alexey Bataev	58a2c4e2f2	[SLP][NFC]Check for equal opcode preliminary to meet weak strict order requirement, NFC. This change does not affect functionality, just fixes the assertions in some standard c++ library implementations.	2023-12-18 06:42:03 -08:00
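
Several of the SLP entries above (PR79321 and the equal-opcode check) concern comparators that must provide a strict weak ordering for sorting. The following is a minimal sketch of that requirement, not the actual PHICompare code: `Candidate`, its fields, `lessCandidate`, `brokenLess`, `isStrictWeakOrderOn`, and `sortCandidates` are all hypothetical names introduced for illustration. It assumes the sort keys can be reduced to a tuple and compared lexicographically, which is one common way to satisfy the criteria.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the data a comparator might inspect per value.
struct Candidate {
  int TypeRank;     // assumed: some rank derived from the value's type
  int OpcodeRank;   // assumed: some rank derived from an operand's opcode
  unsigned NumUses; // tie-breaker, so unequal elements are still ordered
};

// Lexicographic comparison over every key. std::tie's operator< is
// irreflexive, asymmetric, and transitive, so this is a strict weak ordering.
inline bool lessCandidate(const Candidate &A, const Candidate &B) {
  return std::tie(A.TypeRank, A.OpcodeRank, A.NumUses) <
         std::tie(B.TypeRank, B.OpcodeRank, B.NumUses);
}

// Deliberately broken: "<=" returns true for equal elements, which violates
// irreflexivity and can trip assertions in checked standard libraries.
inline bool brokenLess(const Candidate &A, const Candidate &B) {
  return A.NumUses <= B.NumUses;
}

// Brute-force check of the strict weak ordering axioms on a sample set.
template <typename Cmp>
bool isStrictWeakOrderOn(const std::vector<Candidate> &S, Cmp Less) {
  auto Equiv = [&](const Candidate &A, const Candidate &B) {
    return !Less(A, B) && !Less(B, A);
  };
  for (const Candidate &A : S) {
    if (Less(A, A))
      return false; // irreflexivity
    for (const Candidate &B : S) {
      if (Less(A, B) && Less(B, A))
        return false; // asymmetry
      for (const Candidate &C : S) {
        if (Less(A, B) && Less(B, C) && !Less(A, C))
          return false; // transitivity of "less"
        if (Equiv(A, B) && Equiv(B, C) && !Equiv(A, C))
          return false; // transitivity of equivalence
      }
    }
  }
  return true;
}

// Sort only after confirming the comparator behaves on this input.
inline void sortCandidates(std::vector<Candidate> &V) {
  assert(isStrictWeakOrderOn(V, lessCandidate) &&
         "comparator must be a strict weak ordering");
  std::stable_sort(V.begin(), V.end(), lessCandidate);
}
```

The brute-force checker is cubic in the sample size, so it is only suitable for small test inputs or debug builds; it can confirm a violation on a given sample but cannot prove a comparator correct in general.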


4193 Commits