llvm-project

Author	SHA1	Message	Date
Florian Hahn	f61c9b7569	[SLP] Fix infinite loop in isUndefVector. This fixes an infinite loop if isa<T>(II->getOperand(1)) is true. Update Base at the top of the loop, before the continue. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D144292	2023-02-19 21:42:24 +00:00
Florian Hahn	7737c05696	[VPlan] Make sure properlyDominates(A, A) returns false. At the moment, properlyDominates(A, A) can return true via LocalComesBefore. Add an early exit to ensure it returns false if A == B. Note: no test has been added because the existing test suite covers this case already with libc++ with assertions enabled. Fixes https://github.com/llvm/llvm-project/issues/60850.	2023-02-19 18:01:16 +00:00
Alexey Bataev	e03d254bbd	[SLP]Do not reduce repeated values, use scalar red ops instead. Metric: size..text size..text results results0 diff SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test 445.00 461.00 3.6% SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 428477.00 428445.00 -0.0% External/SPEC/CFP2006/447.dealII/447.dealII.test 618849.00 618785.00 -0.0% For all tests some extra code was optimized, GCC-C-execute has some more inlining after Differential Revision: https://reviews.llvm.org/D132261	2023-02-17 07:19:35 -08:00
Florian Hahn	a3d1de3e29	[LV] Move invalid cost remark code to separate function (NFC). The code only needs access to InvalidCosts, ORE and TheLoop, so it can easily be moved into a helper to make selectVectorizationFactor more compact. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D143957	2023-02-16 11:28:19 +00:00
Kazu Hirata	7e6e636fb6	Use llvm::has_single_bit<uint32_t> (NFC) This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.	2023-02-15 22:17:27 -08:00
Hongtao Yu	eddec9de44	[Pseudo probe] Duplicate probes in vectorized loop body. Previously, pseudo probes were dropped from a vectorized loop body during loop vectorization. This can result in the samples of the loop entry being used for the loop body, which in turn can cause undercounting of the loop iteration count. The undercounting can further prevent the loop from being vectorized in the next build. I'm fixing this by explicitly allowing pseudo probes to be kept in the vectorized loop body, and by claiming a probe instruction is not "uniform", the vectorizer will duplicate it by the number of vector lanes. For one internal service, I'm seeing the change increase the size of the .pseudoprobe section by 0.7%, which should account for around 0.2% of the whole binary size. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D144066	2023-02-15 10:18:08 -08:00
Graham Hunter	0fa5df1959	[LV] Synthesize all true masks for masked vector function variants When vectorizing code with function calls in it, if we encounter a function which only has vectorized variants requiring a mask we can synthesize an all-true mask to enable us to proceed. Since we want the mask to be represented in vplan, the pointer to the chosen Function is now stored as part of the VPWidenCallRecipe, and mask arguments are added at the appropriate index to the recipe operands. Reviewed By: david-arm, fhahn, reames Differential Revision: https://reviews.llvm.org/D132458	2023-02-14 14:33:18 +00:00
Fangrui Song	1e6921131a	Move global namespace cl::opt inside llvm::	2023-02-14 00:09:44 -08:00
Florian Hahn	807d43239a	[VPlan] Use properlyDominates predicate for ordering FOR users. The current implementation may return true for A < B and B < A, which may cause issues if the sort implementation assures this property of the comparator. This should fix a crash with MSVC.	2023-02-13 21:24:58 +00:00
Florian Hahn	af3c25dc3d	[VPlan] Fix iterator invalidation in adjustFixedOrderRecurrences. adjustFixedOrderRecurrences may insert instructions immediately after the PHI nodes in the block. This invalidates the phis() iterator. To avoid crashing/accessing invalid recipes, first collect all first-order recurrence phi recipes. This should fix a crash reported by @dmgreen after D142589 landed.	2023-02-13 13:51:14 +00:00
Florian Hahn	2e6430666c	[LV] Update recipe builder functions to pass VPlan directly (NFC). Passing VPlanPtr requires a dereference of std::unique_ptr on each access, which is unnecessary. Just pass the plan by reference.	2023-02-12 22:35:14 +00:00
Sanjay Patel	af39acda88	[VectorCombine] fix insertion point of shuffles As shown in issue #60649, the new shuffles were being inserted before a phi, and that is invalid. It seems like most test coverage for this fold (foldSelectShuffle) lives in the AArch64 dir, but this doesn't repro there for a base target.	2023-02-10 10:57:11 -05:00
Sander de Smalen	5a115452c4	Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. Fixed issue where 'ConstantInt::get(IndexTy, -Part)' was executed with the wrong type for Part, e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292', which was obviously wrong. Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this. This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.	2023-02-09 09:42:29 +00:00
Florian Hahn	c83fdc905a	[LV] Perform recurrence sinking directly on VPlan. This patch updates LV to sink recipes directly using the VPlan use chains. The initial patch only moves sinking to be purely VPlan-based. Follow-up patches will move legality checks to VPlan as well. At the moment, there's a single test failure remaining. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142589	2023-02-08 15:49:29 +00:00
Sander de Smalen	1f01cdda68	Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices." This patch causes a regression, so reverting it while I investigate the issue. This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.	2023-02-08 15:46:52 +00:00
Florian Hahn	32efff591a	[VPlan] Mark load VPWidenMemoryInstruction as not having side-effects. Also add an assert using the underlying instruction to catch any potential violations.	2023-02-07 22:02:50 +00:00
Arthur Eubanks	15977742d3	Reland [LegacyPM] Remove some legacy passes These are part of the optimization pipeline, of which the legacy pass manager version is deprecated. Namely * Internalize * StripSymbols * StripNonDebugSymbols * StripDeadDebugInfo * StripDeadPrototypes * VectorCombine * WarnMissedTransformations Fixed previously failing ocaml tests (one of them seems to already be failing?)	2023-02-07 12:56:05 -08:00
Arthur Eubanks	1b254022b2	Revert "[LegacyPM] Remove some legacy passes" This reverts commit a4b4f62beb0bf40123181e5f5bdf32ef54f87166. Ocaml bindings tests failing.	2023-02-07 10:17:45 -08:00
Arthur Eubanks	a4b4f62beb	[LegacyPM] Remove some legacy passes These are part of the optimization pipeline, of which the legacy pass manager version is deprecated. Namely * Internalize * StripSymbols * StripNonDebugSymbols * StripDeadDebugInfo * StripDeadPrototypes * VectorCombine * WarnMissedTransformations	2023-02-07 09:57:48 -08:00
Sander de Smalen	e6eb84a191	[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. This is specifically relevant for loops that vectorize using a scalable VF, where the code results in: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %gep = getelementptr ..., i32 %vf.part1 Which InstCombine then changes into: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %vf.part1.zext = sext i32 %vf.part1 to i64 %gep = getelementptr ..., i32 %vf.part1.zext D143016 tried to remove these extends, but that only works when the call to llvm.vscale.i32() has a single use. After doing any kind of CSE on these calls the combine no longer kicks in. It seems more sensible to ask DataLayout what type to use, rather than relying on InstCombine to insert the extend and hoping it can fold it away. I've only changed this for indices that are not constant, because I vaguely remember there was a reason for sticking with i32. It would also mean patching up loads more tests. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D143267	2023-02-07 11:47:51 +00:00
Simon Pilgrim	552e27c521	[SLP] Use allConstant helper. NFCI.	2023-02-05 19:21:48 +00:00
Sander de Smalen	005311399e	[LoopVectorize][TTI] NFCI: Clarify enum for the tail folding style. This NFC (intended) patch has several small changes: * It renames PredicationStyle to TailFoldingStyle. * It renames TTI.emitActiveLaneMask() to TTI.getPreferredTailFoldingStyle() * Simplifies some of its uses in the LoopVectorizer Rationale: To my surprise PredicationStyle::None did not mean 'no predication', but rather 'no active lane mask intrinsic', such that the predicate is created using a splat + compare with stepvector. The enum is also highly specific to tail folding, so it seems better to name this around that feature, i.e. 'tail folding style'. This also makes it more amenable to extend it to other tail folding styles, such as the one added in D142109. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D142887	2023-02-03 14:59:57 +00:00
Florian Hahn	cf2d436b31	[VPlan] VPPredInstPHIRecipe does not read from memory. VPPredInstPHIRecipe just merges the incoming values and does not read from memory.	2023-01-31 21:51:03 +00:00
Florian Hahn	5368536cf1	[VPlan] VPPredInstPHIRecipes does not write to memory. VPPredInstPHIRecipe just merges the incoming values and does not write to memory.	2023-01-30 10:29:27 +00:00
Kazu Hirata	526966d07d	Use llvm::bit_ceil (NFC) Note that: std::has_single_bit(X) ? X : llvm::NextPowerOf2(X); is equivalent to: std::bit_ceil(X) even for input 0.	2023-01-28 16:13:09 -08:00
Kazu Hirata	f20b5071f3	[llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC)	2023-01-28 09:06:31 -08:00
Florian Hahn	b6b3d20d06	[VPlan] Use VPDominatorTree in VPlanVerifier . Use VPDominatorTree to generalize def-use verification. Depends on D140513. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140514	2023-01-25 16:32:40 +00:00
Florian Hahn	bf9e0da1a5	[VPlan] Switch default graph traits to be recursive, update VPDomTree. This updates the GraphTraits specialization for VPBlockBase to recurse through VPRegionBlocks. This in turn enables using VPDominatorTree to query dominance between any block in a plan. This should enable additional use cases, including improvements to def-use verification and porting IR-based transforms that rely on the dominator tree. Specifically, this change means that for regions, the entry and exit blocks dominate the successors of the region. Depends on D140512 and D142162. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140513	2023-01-23 14:00:43 +00:00
Florian Hahn	31d46ca8aa	[Dominators] Introduce DomTreeNodeTraits to allow customization. (NFC) This patch introduces DomTreeNodeTraits for customization. Clients can implement DomTreeNodeTraitsCustom to provide custom ParentPtr, getEntryNode and getParent. There's also a default specialization if DomTreeNodeTraitsCustom is not implemented, that assume a Function-like NodeT. This is what is used for the existing DominatorTree and MachineDominatorTree. The main motivation for this patch is using DominatorTreeBase across all regions of a VPlan, see D140513. Reviewed By: kuhar Differential Revision: https://reviews.llvm.org/D142162	2023-01-22 20:22:41 +00:00
Florian Hahn	fb40c34b8f	[VPlan] Consider all recipes in replicate blocks as sink candidates. Update sinkScalarOperands to consider all operands of recipes in replicate blocks as sink candidates. This enables additional sinking opportunities and is another step towards retiring LLVM IR-based sinkScalarOperands. This enables iterative sinking of operands for successive calls of sinkScalarOperands. Depends on D139788. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D139790	2023-01-21 17:14:13 +00:00
ShihPo Hung	5fb3a57ea7	[Cost] Add CostKind to getVectorInstrCost and its related users LoopUnroll estimates the loop size via getInstructionCost(), but getInstructionCost() cannot pass CostKind to getVectorInstrCost(). And so does getShuffleCost() to getBroadcastShuffleOverhead(), getPermuteShuffleOverhead(), getExtractSubvectorOverhead(), and getInsertSubvectorOverhead(). To address this, this patch adds an argument CostKind to these functions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D142116	2023-01-21 05:29:24 -08:00
Alexey Bataev	9bdcf8778a	[SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars. The compiler may produce better results if it does not look for constants, uses an extra analysis of phi nodes, looks through all tree nodes without skipping the cases, where the very first set of nodes is empty. Also, it tries to reshuffle the nodes if it is profitable for sure, i.e. at least 2 scalars are used for single node permutation and at least 3 scalars are used for the permutation of 2 nodes. Part of D110978 Differential Revision: https://reviews.llvm.org/D141512	2023-01-19 13:46:25 -08:00
Florian Hahn	e2c43a547b	[VPlan] Add vp_depth_first_deep (NFC) Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to make existing code clearer and more compact. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142055	2023-01-19 20:34:23 +00:00
Florian Hahn	655c88ca36	[VPlan] Add vp_depth_first_shallow + graph traits for wrapper(NFC) This patch adds a new VPBlockShallowTraversalWrapper struct to provide graph traits specialization that do not traverse through VPRegionBlocks. This matches the behavior of the existing traits for plain VPBlockBase and is a step before moving the graph traits for VPBlockBase to traverse through VPRegionBlocks to enable cross region support in VPDominatorTree. Depends on D140511. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140512	2023-01-19 12:07:27 +00:00
Florian Hahn	feee22db52	[VPlan] Disconnect VPRegionBlock from successors in graph iterator(NFCI) This updates the VPAllSuccessorsIterator to not connect the VPRegionBlock itself to its successors. The successors are connected to the exit block of the region. At the moment, this doesn't change any existing functionality. But the new schema ensures the following property when used for VPDominatorTree: 1. Entry & exit blocks of regions dominate the successors of the region. This allows for convenient checking of dominance between defs and uses that are not defined in the same region. I will share a follow-up patch to use it for the VPDominatorTree soon. Depends on D140500. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140511	2023-01-18 15:02:41 +00:00
Florian Hahn	22c9f4cf2d	[VPlan] Replace VPInterleaveRecipe::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 14:23:22 +00:00
Florian Hahn	f615de7e26	[VPlan] Replace VPBranchOnMaskSC::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 12:14:58 +00:00
Florian Hahn	cdd8fcdbd7	[VPlan] Replace VPExpandSCEVRecipe::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-17 21:11:33 +00:00
Florian Hahn	bf1ba6bb52	[VPlan] Replace VPScalarIVStepsRecipe::classof with VP_CLASSOF_IMPL(NFC)	2023-01-17 20:53:14 +00:00
Florian Hahn	d47bdae28e	[VPlan] Remove duplicated VPValue IDs (NFCI). At the moment, both VPValue and VPDef have an ID used when casting via classof. This duplication is cumbersome, because it requires adding IDs for new recipes twice and also requires setting them twice. In a few cases, there's only a VPDef ID and no VPValue ID, which can cause some confusion. To simplify things, remove the VPValue IDs for different recipes. Instead, only retain the generic VPValue ID (= used VPValues without a corresponding defining recipe) and VPVRecipe for VPValues that are defined by recipes that inherit from VPValue. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140848	2023-01-17 15:11:38 +00:00
Florian Hahn	c95138392a	[VPlan] Remove unnecessary getNumSuccessors call (NFC). If ParentWithSuccs is nullptr, the number of successors is guaranteed to be 0. Simplify the code as suggested by @Ayal in D140511.	2023-01-17 11:44:50 +00:00
Florian Hahn	133f017479	[VPlan] Remove unneeded VPUser::classof(const VPDef *) (NFC). This specialization is not needed any longer as VPRecipeBase inherits from VPUser and getDefiningRecipe returns a VPRecipeBase.	2023-01-17 09:08:33 +00:00
Joe Loser	a288d7f937	[llvm][ADT] Replace uses of `makeMutableArrayRef` with deduction guides Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the same for `makeMutableArrayRef`. Once all of the places in-tree are using the deduction guides for `MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated. Differential Revision: https://reviews.llvm.org/D141814	2023-01-16 14:49:37 -07:00
Florian Hahn	56ffd39c3d	[VPlan] Use VPDef prefix for VPDef IDs instead of VPRecipeBase (NFC). Various places in the code were still using the VPRecipeBase:: prefix for VPDef IDs or no prefix at all. Now that the VPDef IDs have been moved to VPDef, use this prefix instead and consistently use it.	2023-01-16 10:23:52 +00:00
Florian Hahn	2b054d5dd4	[VPlan] Use to_vector when iterating over a temporary vector. (NFC)	2023-01-13 17:51:00 +00:00
Guillaume Chatelet	8fd5558b29	[NFC] Use TypeSize::getFixedValue() instead of TypeSize::getFixedSize() This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.	2023-01-11 16:49:38 +00:00
Guillaume Chatelet	48f5d77eee	[NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize() This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.	2023-01-11 16:36:39 +00:00
Miguel Saldivar	3c5e0d87f8	[LoopVectorize] Clear cache of `LoopAccessInfoManager` LAI is cached during the LoopDistribute pass, and is later re-used during LoopVectorize. The problem is that LoopVectorize changes SCEV, and the cached LAI does not get updated. Hence, when re-using the cached LAI, it references an invalid SCEV. Fixes #59319 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D139601	2023-01-11 09:03:40 +00:00
Valery N Dmitriev	fd7273359a	[SLP] Do not ignore ordering for root node when it has in-tree uses. When rooted with PHIs, a vectorization tree may have another node with PHIs which have roots as their operands. We cannot ignore ordering information for root in such a case. Differential Revision: https://reviews.llvm.org/D141309	2023-01-10 10:12:51 -08:00
Alexey Bataev	755282ec1e	[SLP][NFC]Move getExtractIndex function for future changes, NFC.	2023-01-09 09:53:01 -08:00
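
The reland note for 5a115452c4 attributes the regression to negating an `unsigned` Part before building an i64 constant, which gave 4294967292 where -4 was intended. A minimal standalone C++ sketch of that arithmetic follows. It is not LLVM code: the helper names `negateThenWiden` and `widenThenNegate` are hypothetical, and `std::uint32_t` stands in for `unsigned` on the assumption that `unsigned` is 32 bits wide.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM API): negate in the unsigned 32-bit domain,
// then widen. The subtraction wraps modulo 2^32, and widening an unsigned
// value zero-extends it, so the result is a large positive number.
std::uint64_t negateThenWiden(std::uint32_t Part) {
  std::uint32_t Negated = static_cast<std::uint32_t>(0u - Part);
  return static_cast<std::uint64_t>(Negated);
}

// Hypothetical helper (not LLVM API): widen to a signed 64-bit value first,
// then negate, which preserves the intended negative index.
std::int64_t widenThenNegate(std::uint32_t Part) {
  return -static_cast<std::int64_t>(Part);
}
```

With Part equal to 4, `negateThenWiden` yields 4294967292 (2^32 - 4), the multiplier quoted in the commit message, while `widenThenNegate` yields -4.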

3584 Commits