llvm-project

Author	SHA1	Message	Date
Florian Hahn	cec24f0d7e	[VPlan] Update stale test after 9536a6286, fix formatting.	2024-01-31 13:45:38 +00:00
Florian Hahn	9536a6286e	[VPlan] Preserve original induction order when creating scalar steps. Update createScalarIVSteps to take an insert point as parameter. This ensures that the inserted scalar steps are in the same order as the recipes they replace (vs in reverse order as currently). This helps to reduce the diff for follow-up changes.	2024-01-31 13:31:28 +00:00
Alexey Bataev	285bc69846	[SLP]Fix PR80027: Fix costs processing for minbitwidth types. Need to switch the types, the destination is first in getCastInstrCost function.	2024-01-30 10:32:55 -08:00
Alexey Bataev	976374d982	[SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl&, NFC.	2024-01-30 06:21:47 -08:00
Nilanjana Basu	c492eb6b28	[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651 ) Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.	2024-01-29 13:41:15 -08:00
Alexey Bataev	8d89dd4a58	[SLP]Fix PR79743: Check that all users are demoted before trying to demote the tree entry. Need to check if all user nodes are marked for demotion before demoting the node. Otherwise, some data info might be lost after vectorization.	2024-01-29 10:51:20 -08:00
Florian Hahn	743946e8ef	[VPlan] Replace VPRecipeOrVPValue with VP2VP recipe simplification. (#76090 ) Move simplification of VPBlendRecipes from early VPlan construction to VPlan-to-VPlan based recipe simplification. This simplifies initial construction. Note that some in-loop reduction tests are failing at the moment, due to the reduction predicate being created after the reduction recipe. I will provide a patch for that soon. PR: https://github.com/llvm/llvm-project/pull/76090	2024-01-29 09:52:05 +00:00
Florian Hahn	2d0d65b3ba	[VPlan] Create edge masks up front in all cases where they are needed. (NFC) Similarly to how block masks are created up front and later only retrieved, also make sure masks are created up front in cases where edge masks are needed, i.e. blend recipes. Creating block-in masks for all blocks in the loop also ensures edge masks for all relevant edges have been created. Later, the new getEdgeMask can be used to look up cached edge masks. This makes sure edge masks are available in all cases for https://github.com/llvm/llvm-project/pull/76090.	2024-01-28 21:20:18 +00:00
Florian Hahn	1b37e8087e	[VPlan] use getVPValueOrAddLiveIn in VPlan::duplicate. Instead of creating live-ins manually, use getOrAddLiveIn which automatically takes care of adding them to VPLiveInsToFree. Also use it to create the VPValue for the trip-count. This fixes a leak: https://lab.llvm.org/buildbot/#/builders/168/builds/18308/steps/10/logs/stdio	2024-01-28 12:39:39 +00:00
Florian Hahn	7c03d5d41d	[VPlan] Use unique_ptr to clean up duplicated plan.	2024-01-27 20:51:55 +00:00
Florian Hahn	ec402a2e53	[VPlan] Implement cloning of VPlans. (#73158 ) This patch implements cloning for VPlans and recipes. Cloning is used in the epilogue vectorization path, to clone the VPlan for the main vector loop. This means we won't re-use a VPlan when executing the VPlan for the epilogue vector loop, which in turn will enable us to perform optimizations based on UF & VF.	2024-01-27 13:30:52 +00:00
David Sherwood	962fbafecf	[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034 ) When we generate runtime memory checks for an inner loop it's possible that these checks are invariant in the outer loop and so will get hoisted out. In such cases, the effective cost of the checks should reduce to reflect the outer loop trip count. This fixes a 25% performance regression introduced by commit 49b0e6dcc296792b577ae8f0f674e61a0929b99d when building the SPEC2017 x264 benchmark with PGO, where we decided the inner loop trip count wasn't high enough to warrant the (incorrect) high cost of the runtime checks. Also, when runtime memory checks consist entirely of diff checks these are likely to be outer loop invariant.	2024-01-26 14:43:48 +00:00
Florian Hahn	731c2049a4	[VPlan] Relax IV user assertion after 0ab539f for epilogue vec. After 0ab539fd6748adf2f638e10514dd9419597d8863, the canonical IV in the epilogue vector loop may be used by a trunc. Relax the corresponding assert. This should fix some build-bot failures, including https://lab.llvm.org/buildbot/#/builders/187/builds/14113 https://lab.llvm.org/buildbot/#/builders/98/builds/32350 https://lab.llvm.org/buildbot/#/builders/239/builds/5473	2024-01-26 13:19:25 +00:00
Graham Hunter	d4c0171423	[LV] Fix handling of interleaving linear args (#78725 ) Currently when interleaving vector calls with linear arguments, the Part is ignored and all vector calls use the initial value from the first lane of the current iteration. Fix this to extract from the correct part of the linear vector.	2024-01-26 11:30:35 +00:00
Florian Hahn	0ab539fd67	[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113 ) Add a new recipe to model scalar cast instructions, without relying on an underlying instruction. This allows creating scalar casts, without relying on an underlying instruction (like the current VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it. Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as follow-up. PR: https://github.com/llvm/llvm-project/pull/78113	2024-01-26 11:13:05 +00:00
Alexey Bataev	92ae2ca12b	[SLP][NFC]Improve BottomToTop reordering of orders for multi-iteration attempts, NFC. If several iterations of reordering of orders are required, a different algorithm needs to be used.	2024-01-25 13:04:01 -08:00
Alexey Bataev	6fe21bc1da	[SLP]Fix PR79229: Do not erase extractelement, if it is used in a multiregister node. If the node can span several registers and the same extractelement instruction is used in several parts, it may be necessary to keep that extractelement instruction to avoid a compiler crash.	2024-01-25 06:20:53 -08:00
Florian Hahn	a04f615291	[LV] Check for innermost loop instead of EnableVPlanNativePath in CM. Replace EnableVPlanNativePath checks in the cost-model by assertions that the code is only called for innermost loops. This ensures that the cost model isn't used in the VPlanNativePath, which is only used for outer-loop vectorization. Even with EnableVPlanNativePath, inner loops are processed by the inner loop vectorization path, not the native path, so checking for EnableVPlanNativePath may impact decisions for inner loops and can cause crashes, like in the attached test case.	2024-01-25 12:49:52 +00:00
Alexey Bataev	36e4a7ecca	[SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict weak ordering. Try to make PHICompare meet the strict weak ordering criteria.	2024-01-24 13:46:05 -08:00
Alexey Bataev	48bbd76587	[SLP]Fix PR79229: Check that extractelement is used only in a single node before erasing. Before trying to erase the extractelement instruction, it is not enough to check for a single use; need to also check that it is not used in several nodes because of the preliminary node reordering.	2024-01-24 11:22:22 -08:00
Alexey Bataev	ca654acc16	[SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict weak ordering. Compare NumUses to meet the requirements of the strict weak ordering.	2024-01-24 09:36:25 -08:00
Alexey Bataev	bb3e0d7fc3	[SLP]Fix PR79193: skip analysis of gather nodes for minbitwidth. There is no need to analyze small graphs that contain only a gather node; skipping them avoids a crash.	2024-01-23 12:44:49 -08:00
Stephen Tozer	632f44e5ed	[RemoveDIs][DebugInfo] Handle DPVAssign in most transforms (#78986 ) This patch trivially updates various opt passes to handle DPVAssigns. In all cases, this means some combination of generifying existing code to handle DPValues and DbgAssignIntrinsics, iterating over DPValues where previously we did not, or duplicating code for DbgAssignIntrinsics to the equivalent DPValue function (in inlining and salvageDebugInfo).	2024-01-23 16:16:59 +00:00
Florian Hahn	3683852d49	[VPlan] Use replaceUsesWithIf in replaceAllUsesWith and add comment (NFCI). Follow-up to post-commit comments for b1bfe221e6.	2024-01-21 12:56:16 +00:00
Florian Hahn	42fb1fac9e	[VPlan] Use DebugLoc from recipe in VPWidenCallRecipe (NFCI). Instead of using the debug location of the underlying instruction, use the debug location from the recipe. This removes an unneeded dependency of the underlying instruction.	2024-01-19 13:33:03 +00:00
Florian Hahn	abdb61f5fd	[VPlan] Introduce VPSingleDefRecipe. (#77023 ) This patch introduces a new common base class for recipes defining a single result VPValue. This has been discussed/mentioned at various previous reviews as potential follow-up and helps to replace various getVPSingleValue calls. PR: https://github.com/llvm/llvm-project/pull/77023	2024-01-19 10:27:53 +00:00
Paschalis Mpeis	37c87d5689	[LV][AArch64] LoopVectorizer allows scalable frem instructions (#76247) LoopVectorizer is aware when a target can replace a scalable frem instruction with a vector library call for a given VF and it returns the relevant cost. Otherwise, it returns an invalid cost (as previously). Add tests that check costs on AArch64, when there is no vector library available and when there is (with and without tail-folding). NOTE: Invoking CostModel directly (not through LV) would still return invalid costs.	2024-01-18 08:32:53 +00:00
Alexey Bataev	093206bb7e	[SLP]Fix PR78298: Assertion `GEP->getNumIndices() == 1 && !isa<Constant>(GEPIdx)' failed. The non-constant index might be folded to a constant during earlier stages of vectorization. Need to consider this option and filter out GEPs with constant indices from the candidates list.	2024-01-16 09:17:35 -08:00
Florian Hahn	9a402d6fbb	[LV] Make DL optional argument for VPBuilder member functions (NFCI).	2024-01-16 15:50:09 +00:00
Florian Hahn	e7671bc9d6	[LV] Fix indent for loop in adjustRecipesForReductions (NFC).	2024-01-16 15:28:46 +00:00
Alexey Bataev	d79fdb2749	[SLP]Fix PR78236: correctly track external values, replaced several times during reduction vectorization. If the external value was replaced in the vectorizer several times during reduction vectorization, need to find the original value to correctly handle external uses and emit extractelement instructions properly.	2024-01-16 06:52:43 -08:00
Florian Hahn	6011d6b2cc	[VPlan] Use start value of reduction phi to determine type (NFCI). Instead of accessing the underlying original IR value, check the type of the start value from the recipe directly.	2024-01-16 14:39:51 +00:00
Mel Chen	b6e8f6604c	[LV] Skipping all debug instructions when native vplan is enabled (#77413) The following internal error occurred when using the native VPlan path to vectorize a program with debug info generation enabled: Assertion `!isa<DbgInfoIntrinsic>(CI) && "DbgInfoIntrinsic should have been dropped during VPlan construction"' failed. This patch ignores all debug instructions to fix the error when native VPlan is enabled.	2024-01-16 11:08:10 +08:00
Alexey Bataev	6fdc2ce8c5	[SLP]Fix PR77916: transform the whole mask, not only the elements for the second vector. Need to transform all elements in the long mask if we decide to produce the shorter version; otherwise some elements may still have incorrect indices after the transformation for the first vector in the permutation.	2024-01-12 07:07:43 -08:00
Nikita Popov	6c2fbc3a68	[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582 ) This abstracts over the common pattern of creating a gep with i8 element type.	2024-01-12 14:21:21 +01:00
Florian Hahn	59d6f033a2	[VPlan] Support narrowing widened loads in truncateToMinimimalBitwidths. MinBWs may also contain widened load instructions, handle them by only narrowing their result. Fixes https://github.com/llvm/llvm-project/issues/77468	2024-01-12 13:14:13 +00:00
Alexey Bataev	39b2104b4a	[SLP]Fix a crash for reduced values with minbitwidth, which are reused. If the reduced values are additionally affected by minbitwidth analysis, need to cast them to a proper type before doing any math, if they are reused.	2024-01-12 04:49:48 -08:00
Alexey Bataev	18473eb108	[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679) After the changes that removed the need for support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions; just check that there is no non-vectorized insertelement instruction, which may require widening. Review: https://github.com/llvm/llvm-project/pull/72679	2024-01-11 06:59:57 -08:00
Martin Storsjö	1de3f46938	Revert "[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)" This reverts commit 408dce82016463dcb5026b2ddfc62174970a88e9. This triggered failed asserts with code like this: char a[]; short *b; int c, d, e, f; void g() { char *h; for (;;) { for (; f; ++f) { h[f] = b[0] * a[e] + b[c] * a[1] >> 7; ++b; } h += d; } } Compiled like this: $ clang -target x86_64-linux-gnu -c repro.c -O2 clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.	2024-01-11 12:15:35 +02:00
Craig Topper	1c342571b8	[LV] Use value_or to simplify code. NFC (#77030 )	2024-01-10 12:40:26 -08:00
Alexey Bataev	408dce8201	[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679) After the changes that removed the need for support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions; just check that there is no non-vectorized insertelement instruction, which may require widening.	2024-01-10 14:06:29 -05:00
Alexey Bataev	73ce13d79b	[SLP][TTI]Improve detection of the insert-subvector pattern for SLP. (#74749) SLP vectorizer passes the type of the subvector and the mask, whose size determines the size of the resulting vector. TTI should support this pattern to improve cost estimation of the insert_subvector shuffle pattern.	2024-01-10 10:39:34 -05:00
Florian Hahn	8b7bbedec7	[LV] Re-add early exit in VPRecipeBuilder::createBlockInMask. Re-add early exit that was accidentally dropped in 51afb10.	2024-01-10 15:02:14 +00:00
Florian Hahn	51afb10174	[LV] Create block in mask up-front if needed. (#76635 ) At the moment, block and edge masks are created on demand, which means that they are inserted at the point where they are demanded and then cached. It is possible that the mask for a block is looked up later at a point that's not dominated by the point where the mask has been inserted. To avoid this, create masks up front on entry to the corresponding basic block and leave it to VPlan simplification to remove unneeded masks. Note that we need to create masks for all blocks, if any of the blocks in the loop needs predication, as computing the mask of a block depends on the masks of its predecessor. Needed for #76090. https://github.com/llvm/llvm-project/pull/76635	2024-01-09 10:50:08 +00:00
Alexey Bataev	036e48e2f5	[SLP]Fix PR76850: do the analysis of the submask. Need to limit the transformation of the VecMask by the corresponding part of the mask of SliceSize size to avoid compiler crash during further cost analysis.	2024-01-08 07:51:02 -08:00
Florian Hahn	18ec3304a9	[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe. As suggested as follow-up in https://github.com/llvm/llvm-project/pull/72164, manage inbounds via VPRecipeWithIRFlags. Note that we can now preserve inbounds in a few more cases.	2024-01-07 13:58:05 +00:00
Florian Hahn	3fb0d8dc80	Recommit "[VPlan] Mark Select VPInstructions as not having sideeffects." With #70253 landed, selects for reduction results are explicitly used by ComputeReductionResult and Selects can be marked as not having side-effects again. This reverts the revert commit 173032902c960d4d0d67b521d8c149553d8e8ba3.	2024-01-06 12:08:06 +00:00
Florian Hahn	241fe83704	[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253 ) This patch introduces a new ComputeReductionResult opcode to compute the final reduction result in the middle block. The code from fixReduction has been moved to ComputeReductionResult, after some earlier cleanup changes to model parts of fixReduction explicitly elsewhere as needed. The recipe may be broken down further in the future. Note that the phi nodes to merge the reduction result from the trip count check and the middle block, to be used as resume value for the scalar remainder loop are also generated based on ComputeReductionResult. Once we have a VPValue for the reduction result, this can also be modeled explicitly and moved out of the recipe.	2024-01-04 22:53:18 +00:00
Florian Hahn	2ab5c47c87	[VPlan] Don't replace scalarizing recipe with VPWidenCastRecipe. Don't replace a scalarizing recipe with a VPWidenCastRecipe. This would introduce wide (vectorizing) recipes when interleaving only. Fixes https://github.com/llvm/llvm-project/issues/76986	2024-01-04 20:39:44 +00:00
Alexey Bataev	79e62315be	[SLP]Use revectorized value for extracts from buildvector, being vectorized. When trying to reuse the extractelement instruction emitted for the insertelement instruction, need to check if this insertelement instruction was vectorized. In this case, need to use the vectorized value, not the original insertelement.	2024-01-04 06:45:26 -08:00

4261 Commits