llvm-project

Author	SHA1	Message	Date
Alexey Bataev	9bdcf8778a	[SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars. The compiler may produce better results if it does not look for constants, uses an extra analysis of phi nodes, looks through all tree nodes without skipping the cases, where the very first set of nodes is empty. Also, it tries to reshuffle the nodes if it is profitable for sure, i.e. at least 2 scalars are used for single node permutation and at least 3 scalars are used for the permutation of 2 nodes. Part of D110978 Differential Revision: https://reviews.llvm.org/D141512	2023-01-19 13:46:25 -08:00
Florian Hahn	e2c43a547b	[VPlan] Add vp_depth_first_deep (NFC) Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to make existing code clearer and more compact. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142055	2023-01-19 20:34:23 +00:00
Florian Hahn	655c88ca36	[VPlan] Add vp_depth_first_shallow + graph traits for wrapper(NFC) This patch adds a new VPBlockShallowTraversalWrapper struct to provide graph traits specialization that do not traverse through VPRegionBlocks. This matches the behavior of the existing traits for plain VPBlockBase and is a step before moving the graph traits for VPBlockBase to traverse through VPRegionBlocks to enable cross region support in VPDominatorTree. Depends on D140511. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140512	2023-01-19 12:07:27 +00:00
Florian Hahn	feee22db52	[VPlan] Disconnect VPRegionBlock from successors in graph iterator(NFCI) This updates the VPAllSuccessorsIterator to not connect the VPRegionBlock itself to its successors. The successors are connected to the exit block of the region. At the moment, this doesn't change any existing functionality. But the new schema ensures the following property when used for VPDominatorTree: 1. Entry & exit blocks of regions dominate the successors of the region. This allows for convenient checking of dominance between defs and uses that are not defined in the same region. I will share a follow-up patch to use it for the VPDominatorTree soon. Depends on D140500. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140511	2023-01-18 15:02:41 +00:00
Florian Hahn	22c9f4cf2d	[VPlan] Replace VPInterleaveRecipe::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 14:23:22 +00:00
Florian Hahn	f615de7e26	[VPlan] Replace VPBranchOnMaskSC::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-18 12:14:58 +00:00
Florian Hahn	cdd8fcdbd7	[VPlan] Replace VPExpandSCEVRecipe::classof with VP_CLASSOF_IMPL. (NFC)	2023-01-17 21:11:33 +00:00
Florian Hahn	bf1ba6bb52	[VPlan] Replace VPScalarIVStepsRecipe::classof with VP_CLASSOF_IMPL(NFC)	2023-01-17 20:53:14 +00:00
Florian Hahn	d47bdae28e	[VPlan] Remove duplicated VPValue IDs (NFCI). At the moment, both VPValue and VPDef have an ID used when casting via classof. This duplication is cumbersome, because it requires adding IDs for new recipes twice and also requires setting them twice. In a few cases, there's only a VPDef ID and no VPValue ID, which can cause some confusion. To simplify things, remove the VPValue IDs for different recipes. Instead, only retain the generic VPValue ID (= used VPValues without a corresponding defining recipe) and VPVRecipe for VPValues that are defined by recipes that inherit from VPValue. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140848	2023-01-17 15:11:38 +00:00
Florian Hahn	c95138392a	[VPlan] Remove unnecessary getNumSuccessors call (NFC). If ParentWithSuccs is nullptr, the number of successors is guaranteed to be 0. Simplify the code as suggested by @Ayal in D140511.	2023-01-17 11:44:50 +00:00
Florian Hahn	133f017479	[VPlan] Remove unneeded VPUser::classof(const VPDef *) (NFC). This specialization is not needed any longer as VPRecipeBase inherits from VPUser and getDefiningRecipe returns a VPRecipeBase.	2023-01-17 09:08:33 +00:00
Joe Loser	a288d7f937	[llvm][ADT] Replace uses of `makeMutableArrayRef` with deduction guides Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the same for `makeMutableArrayRef`. Once all of the places in-tree are using the deduction guides for `MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated. Differential Revision: https://reviews.llvm.org/D141814	2023-01-16 14:49:37 -07:00
Florian Hahn	56ffd39c3d	[VPlan] Use VPDef prefix for VPDef IDs instead of VPRecipeBase (NFC). Various places in the code were still using the VPRecipeBase:: prefix for VPDef IDs or no prefix at all. Now that the VPDef IDs have been moved to VPDef, use this prefix instead and consistently use it.	2023-01-16 10:23:52 +00:00
Florian Hahn	2b054d5dd4	[VPlan] Use to_vector when iterating over a temporary vector. (NFC)	2023-01-13 17:51:00 +00:00
Guillaume Chatelet	8fd5558b29	[NFC] Use TypeSize::getFixedValue() instead of TypeSize::getFixedSize() This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.	2023-01-11 16:49:38 +00:00
Guillaume Chatelet	48f5d77eee	[NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize() This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.	2023-01-11 16:36:39 +00:00
Miguel Saldivar	3c5e0d87f8	[LoopVectorize] Clear cache of `LoopAccessInfoManager` LAI is cached during the LoopDistribute pass, and is later re-used during LoopVectorize. The problem is that LoopVectorize changes SCEV, and the cached LAI does not get updated. Hence, when re-using the cached LAI, it references an invalid SCEV. Fixes #59319 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D139601	2023-01-11 09:03:40 +00:00
Valery N Dmitriev	fd7273359a	[SLP] Do not ignore ordering for root node when it has in-tree uses. When rooted with PHIs, a vectorization tree may have another node with PHIs which have roots as their operands. We cannot ignore ordering information for root in such a case. Differential Revision: https://reviews.llvm.org/D141309	2023-01-10 10:12:51 -08:00
Alexey Bataev	755282ec1e	[SLP][NFC]Move getExtractIndex function for future changes, NFC.	2023-01-09 09:53:01 -08:00
Benjamin Kramer	b6942a2880	[NFC] Hide implementation details in anonymous namespaces	2023-01-08 17:37:02 +01:00
Florian Hahn	78914e8c32	[VPlan] Keep entries in worklist in sinkScalarOperands. Not removing the entries ensures that duplicates are avoided, reducing the number of iterations.	2023-01-08 15:52:57 +00:00
Alexey Bataev	996ad44b97	[SLP][NFC]Fix compile build by declaring ArrayRef, NFC. Fix compiler build reported in https://lab.llvm.org/buildbot#builders/243/builds/218	2023-01-06 17:01:48 -08:00
Alexey Bataev	cc17e93178	[SLP][NFC]Remove unused variables, NFC.	2023-01-06 16:55:54 -08:00
Alexey Bataev	7439e1b2de	[SLP]Fix incorrect reordering of clustered scalars. The new mask represents the order, not the mask itself. At first, need to treat as the order, convert to mask and only after that reorder gathered scalars to build correct clustered order. Differential Revision: https://reviews.llvm.org/D141161	2023-01-06 16:04:09 -08:00
Alexey Bataev	9b5f62685a	[SLP]Fix cost of the broadcast buildvector/gather. Need to include the cost of the initial insertelement to the cost of the broadcasts. Also, need to adjust the cost of the gather/buildvector if the element is inserted into poison/undef vector. Differential Revision: https://reviews.llvm.org/D140498	2023-01-06 09:25:05 -08:00
Florian Hahn	68469a80cb	[LV] Disable runtime unrolling for vectorized loops. This patch adds metadata to disable runtime unrolling to the vectorized loop. If runtime unrolling/interleaving is considered profitable, LV will interleave the loop directly. There should be no need to perform runtime unrolling at a later stage. Note that we already add metadata to disable runtime unrolling to the scalar loop after vectorization. The additional unrolling unnecessarily increases code size and compile time. In addition to that we have several bug reports of unnecessary runtime unrolling for vectorized loops, e.g. PR40961 Compile-time improvements: NewPM-O3: -1.04% NewPM-ReleaseThinLTO: -0.59% NewPM-ReleaseLTO-g: -0.97% https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u Fixes #40306. Reviewed By: lebedev.ri, nikic Differential Revision: https://reviews.llvm.org/D115261	2023-01-06 10:56:17 +00:00
Valery N Dmitriev	6d677c0b3d	[SLP] Unify GEP cost modeling for load, store and GEP nodes. Make a separate routine for GEPs cost calculation and make the approach uniform across load, store and GEP tree nodes. Additional issue fixed is GEP cost savings were applied twice for ScatterVectorize nodes (aka gather load) making them look unrealistically profitable for vectorization. Differential Revision: https://reviews.llvm.org/D140789	2023-01-05 10:11:36 -08:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
David Green	586fd86b0a	[LoopVectorizer] Fix inloop reductions mask placement The validation of vplans could fail if an inloop reduction was created with a block-in mask that did not dominate the reduction. This makes sure that the insert point is set when creating the mask, to ensure it dominates the reduction. Differential Revision: https://reviews.llvm.org/D141003	2023-01-05 11:37:37 +00:00
Augie Fackler	0676156f81	Revert "[VPlan] Also consider operands of sink candidates in same block." This reverts commit aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d. Previously-valid IR from a tensorflow test case (as shown on the Diffusion revision for aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d) started hanging in the loop-vectorize pass. Reverting to keep everyone working.	2023-01-04 16:17:13 -05:00
Alexey Bataev	a1b18946f9	[SLP]Fix incorrect shuffle results because of missing shuffle mask analysis. Missed the analysis of the shuffle mask when trying to analyze the operands of the shuffle instruction during peeking through shuffle instructions.	2023-01-04 13:10:40 -08:00
Dinar Temirbulatov	55c600819f	[SLP][AArch64] Incorrectly estimated intrinsic as a function call. We incorrectly treat an intrinsic as a function call, which prevents us from vectorizing. On AArch64 Cortex-A53 we think that llvm.fmuladd.f64 is a function call, which is wrong. Differential Revision: https://reviews.llvm.org/D140392	2023-01-03 19:45:24 +00:00
Alexey Bataev	26fec4e845	[SLP]Fix crash on casting non-instruction extractelement. Need to check if the extractelement operation is an extraction before trying to move it around the buildblocks to avoid crash on cast.	2023-01-03 09:45:57 -08:00
Florian Hahn	ce1be13a86	[VPlan] Use VP_CLASSOF_IMPL for VPWidenCanonicalIVRecipe(NFC). Replace VPWidenCanonicalIVRecipe::classof implementation with general VP_CLASSOF_IMPL.	2023-01-02 17:52:13 +00:00
Florian Hahn	64f1d845b3	[VPlan] Use VP_CLASSOF_IMPL for VPWidenMemoryInstructionRecipe (NFC). Replace VPWidenMemoryInstructionRecipe::classof implementation with general VP_CLASSOF_IMPL.	2023-01-02 17:32:31 +00:00
Florian Hahn	2d6d47f807	[VPlan] Use VP_CLASSOF_IMPL for VPPredInstPHI (NFC). Replace VPPredInstPHI::classof implementation with general VP_CLASSOF_IMPL.	2023-01-02 17:22:34 +00:00
Florian Hahn	89718815c6	[VPlan] Adjust mergeReplicateRegions to be in line with mergeBlock (NFC) Adjust mergeReplicateRegions to be in line with mergeBlocksIntoPredecessors added in 36d70a6aea6b by collecting only the valid candidates first. Also rename to mergeReplicateRegionsIntoSuccessors and add missing doc-comment. This addresses post-commit suggestions by @Ayal.	2023-01-01 19:48:49 +00:00
Florian Hahn	cd16a3f04c	[VPlan] Move GraphTraits definitions to separate header (NFC). This reduces the size of VPlan.h and avoids future growth of the file when the graph traits are extended in future patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D140500	2022-12-31 15:14:57 +00:00
Florian Hahn	aa2414729e	[VPlan] Also consider operands of sink candidates in same block. Even if the sink candidate is already in the target block, its operands can be candidates for sinking. Queue them up as well. Also moves the queuing logic to a helper.	2022-12-30 18:24:35 +00:00
Alexey Bataev	5dccea5a68	[SLP]Do not emit many extractelements, reuse the single one emitted. We do not need to emit many extractelements for each particular use, we can reuse the only one, just need to adjust it to make it dominate on all uses. Differential Revision: https://reviews.llvm.org/D140580	2022-12-30 06:38:06 -08:00
Valery N Dmitriev	ad956ed568	[SLP] Fix debug print for cost in tryToVectorizeList - NFC. Actual VF was confused with local variable named "VF".	2022-12-29 11:30:10 -08:00
Valery N Dmitriev	8eb3698b94	[SLP] A couple of minor improvements for slp graph view - NFC. Show ScatterVectorize nodes in frames of blue color and print vectorize tree indices.	2022-12-29 11:02:36 -08:00
Alexey Bataev	ac01ae71f0	[SLP]Use ShuffleInstructionBuilder for vector shrinking. We can use ShuffleInstructionBuilder now for shrinking shuffle emission. It allows to remove extra shuffle from the emitted code and reuse original vector. Part of D110978 Differential Revision: https://reviews.llvm.org/D140499	2022-12-28 06:09:04 -08:00
Michael Maitland	396b0b2b13	[LV] Remove duplicate name set of vector header basic block. NFC The preheader was named explicitly in 256c6b0ba14e8a7ab6373b61b7193ea8c0a3651c which makes setting the name in prior commit 95b2aa511eea1f31e183a2a3aed4d2aa852d089c unnecessary. Differential Revision: https://reviews.llvm.org/D140246	2022-12-27 17:19:08 -08:00
Florian Hahn	e91e62db14	[LV] Sink scalar operands and merge regions repeatedly. Merging regions can enable new sinking opportunities (e.g. if users of a scalar value are moved from different VPBBs into the same VPBB). Sinking in turn can also enable new merging opportunities (e.g. if a recipe between two merge-able regions is moved). To enable more sinking opportunities, repeat sinking & merging if regions could be merged. Also fix mergeReplicateRegions to return the correct Changed status. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D139788	2022-12-27 18:08:32 +00:00
Alexey Bataev	a9b052e2ef	[SLP]Fix PR59693: Do not crash trying to set insert point for buildvector of extractvalues. No need to get the last instruction only for vectorized extractvalues, for gathered(buildvector sequence) still need to get the insertion point.	2022-12-27 06:01:38 -08:00
Florian Hahn	36d70a6aea	[VPlan] Remove redundant blocks by merging them into predecessors. Add and run VPlan transform to fold blocks with a single predecessor into the predecessor. This removes redundant blocks and addresses a TODO to replace special handling for the vector latch VPBB. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D139927	2022-12-26 22:47:09 +00:00
Florian Hahn	435e220ba6	[VPlan] Use VPBB in sinkScalarOperands directly. (NFC) Suggested by @Ayal in D139790.	2022-12-25 21:34:59 +00:00
Florian Hahn	9758242046	[LV] Use SCEV to check if the trip count <= VF * UF. Just comparing constant trip counts causes LV to miss cases where the vector loop body only executes once. The motivation for this is to remove the need for unrolling to remove vector loop back-edges, if the body only executes once in more cases. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133017	2022-12-24 18:34:54 +00:00
Florian Hahn	e1650c8d52	[LV] Move exit cond simplification to separate transform. This sets the stage for D133017 by moving out the code that performs VPlan based simplifications to a separate transform that takes the chosen VF & UF as arguments. The main advantage is that this transform runs before any changes to the CFG are being made. This allows using SCEV without worrying about making queries while the IR is in an incomplete state. Note that this patch switches the reasoning to use SCEV, but still only simplifies loops with constant trip counts. Using SCEV here is needed to access the backedge taken count, because the trip count IR value has not been created yet. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D135017	2022-12-23 12:51:21 +00:00

3553 Commits