llvm-project

Author	SHA1	Message	Date
Sander de Smalen	9449deda12	[LV] NFC: Move logic to query maximum vscale to its own function. To query the maximum value for vscale, the LV queries the vscale_range attribute or a TTI hook. To avoid having to reimplement the same behaviour for multiple uses (such as in D142894), it makes sense to move this code to a separate function.	2023-02-23 15:12:35 +00:00
Jonas Paulsson	1387a13e1d	[SLP] Check with target before vectorizing GEP Indices. The target hook prefersVectorizedAddressing() already exists to check with target if address computations should be vectorized, so it seems like this should be used in SLPVectorizer as well. Reviewed By: ABataev, RKSimon Differential Revision: https://reviews.llvm.org/D144128	2023-02-23 15:31:34 +01:00
OCHyams	620a529760	[Assignment Tracking] Choose better passes for RemoveRedundantDbgInstrs call. Without this patch, when assignment tracking is enabled, a significant amount of additional compiler run time comes from the RemoveRedundantDbgInstrs call in InstCombine. This patch reduces compiler run time by choosing better places to call RemoveRedundantDbgInstrs. In non-assignment-tracking builds, RemoveRedundantDbgInstrs is called by InstCombine if LowerDbgDeclare makes a change (i.e. it is _sometimes_ called). In assignment tracking builds LowerDbgDeclare doesn't do anything. We still need to clean up redundant intrinsics to avoid a large performance hit due to the number of instructions, so the current approach is to have InstCombine _always_ call RemoveRedundantDbgInstrs. Instrumenting the compiler to run RemoveRedundantDbgInstrs after every pass, dumping the numbers, and building CTMark/tramp3d-v4 indicates that SROA and LoopVectorize give us a bigger bang (number removed) for buck (times pass is run). The compile time tracker reports that this patch reduces the number of instructions retired building CTMark projects by an average of 1.1%. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D144483	2023-02-22 16:28:06 +00:00
Luke Lau	b02b1e0ed6	[LV][NFC] Use ElementCount for getMaxInterleaveFactor. In order to allow targets to disable interleaving for scalable vectors, pass the entire VF's ElementCount to getMaxInterleaveFactor. This is based on the approach used here: `8d36708507`. The plan would then be to disable interleaving on scalable VFs on RISC-V in a follow-up patch. See https://reviews.llvm.org/D143723#4132349 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D144474	2023-02-22 10:15:05 +00:00
Liren Peng	529ee9750b	[NFC] Use single quotes for single char output during `printPipeline`. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144365	2023-02-22 02:35:13 +00:00
Alexey Bataev	cbcdd747e8	[SLP]Do not swap not counted extractelements. There is no need to swap extractelements that were not excluded from the list during cost analysis. Swapping them leads to incorrect cost calculation and makes vector code look more profitable than it actually is.	2023-02-21 13:16:51 -08:00
Alexey Bataev	5f928a223e	[SLP]Properly define incoming block for user PHI nodes. MainOp of the PHI vectorizable entries contains the proper order of incoming blocks, not the last instruction in the block.	2023-02-21 08:01:24 -08:00
Alexey Bataev	708eb1b96d	[SLP]Add shuffling of extractelements to avoid extra costs/data movement. If a scalar must be extracted and then used in a gather node, we can instead emit a shuffle instruction to avoid the extra extractelements and the vector-to-scalar-and-back data movement. Part of D110978 Differential Revision: https://reviews.llvm.org/D141940	2023-02-20 06:14:42 -08:00
Florian Hahn	c21ccebe6f	[VPlan] Use usesScalars in shouldPack. Suggested by @Ayal as a follow-up improvement in D143864. I was unable to find a case where this actually changes generated code, but it unifies the code to use common infrastructure.	2023-02-20 14:11:40 +00:00
Florian Hahn	df016a9525	[VPlan] Move shouldPack outside of DEBUG ifdef. This fixes a build failure with assertions disabled.	2023-02-20 10:53:45 +00:00
Florian Hahn	9333b97763	[VPlan] Replace AlsoPack field with shouldPack() method (NFC). There is no need to update the AlsoPack field when creating VPReplicateRecipes. It can be easily computed based on the VP def-use chains when it is needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D143864	2023-02-20 10:28:26 +00:00
Kazu Hirata	a28b252d85	Use APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC) Note that getMinSignedBits has been soft-deprecated in favor of getSignificantBits.	2023-02-19 23:56:52 -08:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Florian Hahn	f61c9b7569	[SLP] Fix infinite loop in isUndefVector. This fixes an infinite loop if isa<T>(II->getOperand(1)) is true. Update Base at the top of the loop, before the continue. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D144292	2023-02-19 21:42:24 +00:00
Florian Hahn	7737c05696	[VPlan] Make sure properlyDominates(A, A) returns false. At the moment, properlyDominates(A, A) can return true via LocalComesBefore. Add an early exit to ensure it returns false if A == B. Note: no test has been added because the existing test suite covers this case already with libc++ with assertions enabled. Fixes https://github.com/llvm/llvm-project/issues/60850.	2023-02-19 18:01:16 +00:00
Alexey Bataev	e03d254bbd	[SLP]Do not reduce repeated values, use scalar red ops instead. Metric: size..text (columns: results, results0, diff): SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test 445.00, 461.00, 3.6%; SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 428477.00, 428445.00, -0.0%; External/SPEC/CFP2006/447.dealII/447.dealII.test 618849.00, 618785.00, -0.0%. For all tests some extra code was optimized; GCC-C-execute has some more inlining afterwards. Differential Revision: https://reviews.llvm.org/D132261	2023-02-17 07:19:35 -08:00
Florian Hahn	a3d1de3e29	[LV] Move invalid cost remark code to separate function (NFC). The code only needs access to InvalidCosts, ORE and TheLoop, so it can easily be moved into a helper to make selectVectorizationFactor more compact. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D143957	2023-02-16 11:28:19 +00:00
Kazu Hirata	7e6e636fb6	Use llvm::has_single_bit<uint32_t> (NFC) This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.	2023-02-15 22:17:27 -08:00
Hongtao Yu	eddec9de44	[Pseudo probe] Duplicate probes in vectorized loop body. Previously, pseudo probes were dropped from a vectorized loop body during loop vectorization. This can result in the samples of the loop entry being used for the loop body, which in turn can cause undercounting of the loop iteration count. The undercounting can further prevent the loop from being vectorized in the next build. I'm fixing this by explicitly allowing pseudo probes to be kept in the vectorized loop body, and by claiming a probe instruction is not "uniform", so that the vectorizer duplicates it once per vector lane. For one internal service, I'm seeing the change increase the size of the .pseudoprobe section by 0.7%, which should amount to around 0.2% of the whole binary size. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D144066	2023-02-15 10:18:08 -08:00
Graham Hunter	0fa5df1959	[LV] Synthesize all true masks for masked vector function variants When vectorizing code with function calls in it, if we encounter a function which only has vectorized variants requiring a mask we can synthesize an all-true mask to enable us to proceed. Since we want the mask to be represented in vplan, the pointer to the chosen Function is now stored as part of the VPWidenCallRecipe, and mask arguments are added at the appropriate index to the recipe operands. Reviewed By: david-arm, fhahn, reames Differential Revision: https://reviews.llvm.org/D132458	2023-02-14 14:33:18 +00:00
Fangrui Song	1e6921131a	Move global namespace cl::opt inside llvm::	2023-02-14 00:09:44 -08:00
Florian Hahn	807d43239a	[VPlan] Use properlyDominates predicate for ordering FOR users. The current implementation may return true for both A < B and B < A, which may cause issues if the sort implementation asserts that the comparator never does this. This should fix a crash with MSVC.	2023-02-13 21:24:58 +00:00
Florian Hahn	af3c25dc3d	[VPlan] Fix iterator invalidation in adjustFixedOrderRecurrences. adjustFixedOrderRecurrences may insert instructions immediately after the PHI nodes in the block. This invalidates the phis() iterator. To avoid crashing or accessing invalid recipes, first collect all first-order recurrence phi recipes. This should fix a crash reported by @dmgreen after D142589 landed.	2023-02-13 13:51:14 +00:00
Florian Hahn	2e6430666c	[LV] Update recipe builder functions to pass VPlan directly (NFC). Passing VPlanPtr requires a dereference of std::unique_ptr on each access, which is unnecessary. Just pass the plan by reference.	2023-02-12 22:35:14 +00:00
Sanjay Patel	af39acda88	[VectorCombine] fix insertion point of shuffles As shown in issue #60649, the new shuffles were being inserted before a phi, and that is invalid. It seems like most test coverage for this fold (foldSelectShuffle) lives in the AArch64 dir, but this doesn't repro there for a base target.	2023-02-10 10:57:11 -05:00
Sander de Smalen	5a115452c4	Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. Fixed issue where 'ConstantInt::get(IndexTy, -Part)' was executed with the wrong type for Part, e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292', which was obviously wrong. Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this. This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.	2023-02-09 09:42:29 +00:00
Florian Hahn	c83fdc905a	[LV] Perform recurrence sinking directly on VPlan. This patch updates LV to sink recipes directly using the VPlan use chains. The initial patch only moves sinking to be purely VPlan-based. Follow-up patches will move legality checks to VPlan as well. At the moment, there's a single test failure remaining. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142589	2023-02-08 15:49:29 +00:00
Sander de Smalen	1f01cdda68	Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices." This patch causes a regression, so reverting it while I investigate the issue. This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.	2023-02-08 15:46:52 +00:00
Florian Hahn	32efff591a	[VPlan] Mark load VPWidenMemoryInstruction as not having side-effects. Also add an assert using the underlying instruction to catch any potential violations.	2023-02-07 22:02:50 +00:00
Arthur Eubanks	15977742d3	Reland [LegacyPM] Remove some legacy passes. These are part of the optimization pipeline, of which the legacy pass manager version is deprecated, namely: Internalize, StripSymbols, StripNonDebugSymbols, StripDeadDebugInfo, StripDeadPrototypes, VectorCombine, WarnMissedTransformations. Fixed previously failing ocaml tests (one of them seems to already be failing?)	2023-02-07 12:56:05 -08:00
Arthur Eubanks	1b254022b2	Revert "[LegacyPM] Remove some legacy passes" This reverts commit a4b4f62beb0bf40123181e5f5bdf32ef54f87166. Ocaml bindings tests failing.	2023-02-07 10:17:45 -08:00
Arthur Eubanks	a4b4f62beb	[LegacyPM] Remove some legacy passes. These are part of the optimization pipeline, of which the legacy pass manager version is deprecated, namely: Internalize, StripSymbols, StripNonDebugSymbols, StripDeadDebugInfo, StripDeadPrototypes, VectorCombine, WarnMissedTransformations.	2023-02-07 09:57:48 -08:00
Sander de Smalen	e6eb84a191	[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. This is specifically relevant for loops that vectorize using a scalable VF, where the code results in: `%vscale = call i32 @llvm.vscale.i32()`; `%vf.part1 = mul i32 %vscale, 4`; `%gep = getelementptr ..., i32 %vf.part1`. InstCombine then changes this into: `%vscale = call i32 @llvm.vscale.i32()`; `%vf.part1 = mul i32 %vscale, 4`; `%vf.part1.zext = sext i32 %vf.part1 to i64`; `%gep = getelementptr ..., i64 %vf.part1.zext`. D143016 tried to remove these extends, but that only works when the call to llvm.vscale.i32() has a single use. After doing any kind of CSE on these calls the combine no longer kicks in. It seems more sensible to ask DataLayout what type to use, rather than relying on InstCombine to insert the extend and hoping it can fold it away. I've only changed this for indices that are not constant, because I vaguely remember there was a reason for sticking with i32. It would also mean patching up loads more tests. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D143267	2023-02-07 11:47:51 +00:00
Simon Pilgrim	552e27c521	[SLP] Use allConstant helper. NFCI.	2023-02-05 19:21:48 +00:00
Sander de Smalen	005311399e	[LoopVectorize][TTI] NFCI: Clarify enum for the tail folding style. This NFC (intended) patch has several small changes: * It renames PredicationStyle to TailFoldingStyle. * It renames TTI.emitActiveLaneMask() to TTI.getPreferredTailFoldingStyle() * Simplifies some of its uses in the LoopVectorizer Rationale: To my surprise PredicationStyle::None did not mean 'no predication', but rather 'no active lane mask intrinsic', such that the predicate is created using a splat + compare with stepvector. The enum is also highly specific to tail folding, so it seems better to name this around that feature, i.e. 'tail folding style'. This also makes it more amenable to extend it to other tail folding styles, such as the one added in D142109. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D142887	2023-02-03 14:59:57 +00:00
Florian Hahn	cf2d436b31	[VPlan] VPPredInstPHIRecipe does not read from memory. VPPredInstPHIRecipe just merges the incoming values and does not read from memory.	2023-01-31 21:51:03 +00:00
Florian Hahn	5368536cf1	[VPlan] VPPredInstPHIRecipe does not write to memory. VPPredInstPHIRecipe just merges the incoming values and does not write to memory.	2023-01-30 10:29:27 +00:00
Kazu Hirata	526966d07d	Use llvm::bit_ceil (NFC) Note that: std::has_single_bit(X) ? X : llvm::NextPowerOf2(X); is equivalent to: std::bit_ceil(X) even for input 0.	2023-01-28 16:13:09 -08:00
Kazu Hirata	f20b5071f3	[llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC)	2023-01-28 09:06:31 -08:00
Florian Hahn	b6b3d20d06	[VPlan] Use VPDominatorTree in VPlanVerifier . Use VPDominatorTree to generalize def-use verification. Depends on D140513. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140514	2023-01-25 16:32:40 +00:00
Florian Hahn	bf9e0da1a5	[VPlan] Switch default graph traits to be recursive, update VPDomTree. This updates the GraphTraits specialization for VPBlockBase to recurse through VPRegionBlocks. This in turn enables using VPDominatorTree to query dominance between any block in a plan. This should enable additional use cases, including improvements to def-use verification and porting IR-based transforms that rely on the dominator tree. Specifically, this change means that for regions, the entry and exit blocks dominate the successors of the region. Depends on D140512 and D142162. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140513	2023-01-23 14:00:43 +00:00
Florian Hahn	31d46ca8aa	[Dominators] Introduce DomTreeNodeTraits to allow customization. (NFC) This patch introduces DomTreeNodeTraits for customization. Clients can implement DomTreeNodeTraitsCustom to provide custom ParentPtr, getEntryNode and getParent. There's also a default specialization, used if DomTreeNodeTraitsCustom is not implemented, that assumes a Function-like NodeT. This is what is used for the existing DominatorTree and MachineDominatorTree. The main motivation for this patch is using DominatorTreeBase across all regions of a VPlan, see D140513. Reviewed By: kuhar Differential Revision: https://reviews.llvm.org/D142162	2023-01-22 20:22:41 +00:00
Florian Hahn	fb40c34b8f	[VPlan] Consider all recipes in replicate blocks as sink candidates. Update sinkScalarOperands to consider all operands of recipes in replicate blocks as sink candidates. This enables additional sinking opportunities and is another step towards retiring LLVM IR-based sinkScalarOperands. It also enables iterative sinking of operands for successive calls of sinkScalarOperands. Depends on D139788. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D139790	2023-01-21 17:14:13 +00:00
ShihPo Hung	5fb3a57ea7	[Cost] Add CostKind to getVectorInstrCost and its related users. LoopUnroll estimates the loop size via getInstructionCost(), but getInstructionCost() cannot pass CostKind to getVectorInstrCost(). The same applies to getShuffleCost() when it calls getBroadcastShuffleOverhead(), getPermuteShuffleOverhead(), getExtractSubvectorOverhead(), and getInsertSubvectorOverhead(). To address this, this patch adds a CostKind argument to these functions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D142116	2023-01-21 05:29:24 -08:00
Alexey Bataev	9bdcf8778a	[SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars. The compiler may produce better results if it does not look for constants, uses an extra analysis of phi nodes, and looks through all tree nodes without skipping the cases where the very first set of nodes is empty. Also, it tries to reshuffle the nodes if that is certainly profitable, i.e. at least 2 scalars are used for a single-node permutation and at least 3 scalars are used for the permutation of 2 nodes. Part of D110978 Differential Revision: https://reviews.llvm.org/D141512	2023-01-19 13:46:25 -08:00
Florian Hahn	e2c43a547b	[VPlan] Add vp_depth_first_deep (NFC) Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to make existing code clearer and more compact. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142055	2023-01-19 20:34:23 +00:00
Florian Hahn	655c88ca36	[VPlan] Add vp_depth_first_shallow + graph traits for wrapper(NFC) This patch adds a new VPBlockShallowTraversalWrapper struct to provide graph traits specialization that do not traverse through VPRegionBlocks. This matches the behavior of the existing traits for plain VPBlockBase and is a step before moving the graph traits for VPBlockBase to traverse through VPRegionBlocks to enable cross region support in VPDominatorTree. Depends on D140511. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140512	2023-01-19 12:07:27 +00:00
Florian Hahn	feee22db52	[VPlan] Disconnect VPRegionBlock from successors in graph iterator(NFCI) This updates the VPAllSuccessorsIterator to not connect the VPRegionBlock itself to its successors. The successors are connected to the exit block of the region. At the moment, this doesn't change any existing functionality. But the new schema ensures the following property when used for VPDominatorTree: 1. Entry & exit blocks of regions dominate the successors of the region. This allows for convenient checking of dominance between defs and uses that are not defined in the same region. I will share a follow-up patch to use it for the VPDominatorTree soon. Depends on D140500. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140511	2023-01-18 15:02:41 +00:00
Florian Hahn	22c9f4cf2d	[VPlan] Replace VPInterleaveRecipe::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 14:23:22 +00:00
Florian Hahn	f615de7e26	[VPlan] Replace VPBranchOnMaskSC::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 12:14:58 +00:00


3597 Commits