llvm-project

Author	SHA1	Message	Date
Florian Hahn	654cd94629	[VPlan] Unconditionally run optimizeForVFAndUF. Now that the VPlan for the main vector loop gets cloned in the epilogue vectorization code path, there optimizeForVFAndUF can be applied unconditionally.	2024-05-31 06:32:49 -07:00
Florian Hahn	5785048321	[VPlan] Add VPIRBasicBlock, use to model pre-preheader. (#93398) This patch adds a new special type of VPBasicBlock that wraps an existing IR basic block. Recipes of the block get added before the terminator of the wrapped IR basic block. Making it a subclass of VPBasicBlock avoids duplicating various APIs to manage recipes in a block, as well as makes sure the traversals filtering VPBasicBlocks automatically apply as well. Initially VPIRBasicBlocks are only used for the pre-preheader (wrapping the original preheader of the scalar loop). As a follow-up, this will be used to move more parts of the skeleton inside VPlan, starting with the branch and condition in the middle block. Separated out of https://github.com/llvm/llvm-project/pull/92651 PR: https://github.com/llvm/llvm-project/pull/93398	2024-05-30 11:23:32 -07:00
Ramkumar Ramachandra	43100766f2	LV: generalize profitability criterion over TC (#93300) Generalize LoopVectorizationPlanner::isMoreProfitable smoothly across the fixed-vector and scalable-vector cases, taking the trip-count into account, and fixing logical pitfalls that arise from a lack of generality.	2024-05-30 10:54:32 +01:00
Florian Hahn	8b037862b6	[VPlan] Preserve DT (and SCEV) in VPlan-native path (#93287) As a follow-up to b2f65e80, use the DTU to also update and preserve the DT in the native path. This should also allow preserving SCEV in the native path. PR: https://github.com/llvm/llvm-project/pull/93287	2024-05-27 17:03:53 -07:00
Florian Hahn	83646590af	[VPlan] Remove unused Range arg from createWidenInductionRecipe (NFC). The Range argument is not used by createWidenInductionRecipe; induction classification applies across the whole range of VFs. Remove the argument.	2024-05-25 09:11:36 +01:00
Shih-Po Hung	0338c55ea5	[LV, VPlan] Check if plan is compatible to EVL transform (#92092) The transform updates all users of inductions to work based on EVL, instead of the VF directly. At the moment, widened inductions cannot be updated, so bail out if the plan contains any. This patch introduces a check before applying EVL transform. If any recipes in the loop rely on RuntimeVF, the plan is discarded.	2024-05-25 08:22:49 +08:00
Ramkumar Ramachandra	bb0d29a72d	[LV] fix logical error in trunc cost (#91136) In LoopVectorizationCostModel::getInstructionCost(), when the condition canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source type is computed as the smallest type of the source vector and the destination vector, and the destination type is computed as the largest type of the instruction and destination type. This is clearly a logical error, as the original source vector type could be smaller than the original destination vector type, and the trunc semantics are broken because we're attempting to widen. Fixes #47665.	2024-05-24 18:01:58 +01:00
Florian Hahn	b2f65e809e	[VPlan] Use DomTreeUpdater to automatically update DT for vector loop. (#92525) Use DTU to queue DominatorTree updates directly when connecting basic blocks during VPlan execution. This simplifies DT updates and will also automatically allow updating the DT for the VPlan-native path as an additional benefit in a follow-up. PR: https://github.com/llvm/llvm-project/pull/92525	2024-05-24 10:10:12 +01:00
Florian Hahn	352dc7d4bb	[LV] Propagate PredicatedBBsAfterVectorization to predecessors. This fixes some cases where predicated BBs were missed previously, leading to under-estimating the cost of those blocks.	2024-05-21 10:27:32 +01:00
Florian Hahn	1e7d047c71	[VPlan] Mark LoopInfo preserved in native-path as well (NFC). LoopInfo is updated during VPlan execution now, so it will also be updated correctly in the native path.	2024-05-17 12:18:01 +01:00
Florian Hahn	b1e99a699d	[LV] Drop redundant comment from createEdgeMask (NFC). Follow-up to remove a redundant comment post-commit https://github.com/llvm/llvm-project/pull/91897	2024-05-14 12:43:47 +01:00
Ramkumar Ramachandra	d7ef34bfe3	[LV] update comment following 63d8058 (NFC) (#91120) Address a review comment post landing 63d8058 (LoopVectorize: guard appending InstsToScalarize; fix bug) to update a comment.	2024-05-14 10:59:26 +01:00
Florian Hahn	632317e9ab	[VPlan] Add non-poison propagating LogicalAnd VPInstruction opcode. (#91897) Add a new opcode to model non-poison propagating logical AND operations used when generating edge masks. This follows the similar decision to model Not as a dedicated opcode as well, to improve clarity. This also helps to simplify the matchers for https://github.com/llvm/llvm-project/pull/89386. PR: https://github.com/llvm/llvm-project/pull/91897	2024-05-14 09:42:49 +01:00
Florian Hahn	e122380445	[LV] Use VPBuilder to create Select (NFCI).	2024-05-13 20:44:39 +01:00
Florian Hahn	082c81ae4a	[LV] Properly extend versioned constant strides. We only version unknown strides to 1. If the original type is i1, then the sign of the extension matters. Properly extend the stride value before replacing it. Fixes https://github.com/llvm/llvm-project/issues/91369.	2024-05-07 21:31:42 +01:00
Alexey Bataev	6517c5b068	[LV][NFC]Address last comments from https://github.com/llvm/llvm-project/pull/88025.	2024-05-03 06:51:01 -07:00
Florian Hahn	bccb7ed8ac	Reapply "[LV] Improve AnyOf reduction codegen. (#78304)" This reverts the revert commit c6e01627acf859. This patch includes a fix for any-of reductions and epilogue vectorization. Extra test coverage for the issue that caused the revert has been added in bce3bfced5fe0b019 and an assertion has been added in c7209cbb8be7a3c65813. -------------------------------- Original commit message: Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block. The patch incorporates feedback from https://reviews.llvm.org/D153697. This fixes #62565, as now there aren't multiple uses of the start/new values. Fixes https://github.com/llvm/llvm-project/issues/62565 PR: https://github.com/llvm/llvm-project/pull/78304	2024-05-03 14:40:49 +01:00
Alexey Bataev	1d43cdc9f5	[LV][EVL]Support reversed loads/stores. Support for predicated vector reverse intrinsic was added some time ago. Adds support for predicated reversed loads/stores in the loop vectorizer. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/88025	2024-05-03 07:28:56 -04:00
Florian Hahn	c7209cbb8b	[LV] Assert that there's a resume phi for epilogue loops (NFC). This patch adds an assert to createAndCollectMergePhiForReduction to make sure there is a resume phi when vectorizing the epilogue loop. This is needed to set the resume value from the main vector loop. This assertion guards against the issue that caused the revert of https://github.com/llvm/llvm-project/pull/78304.	2024-05-02 19:20:28 +01:00
Florian Hahn	e846778e52	[VPlan] Make CallInst optional for VPWidenCallRecipe (NFCI). Replace relying on the underlying CallInst for looking up the called function and its types by instead adding the called function as an operand, in line with how called functions are handled in CallInst. Operand bundles, metadata and fast-math flags are optionally used if there's an underlying CallInst. This enables creating VPWidenCallRecipes without requiring an underlying IR instruction.	2024-05-01 20:48:22 +01:00
Florian Hahn	9c3f5fe88f	[LV] Don't consider the latch block as ScalarPredicatedBB. The conditional branch from the loop latch will be replaced by a single branch controlling the loop, so there is no extra overhead from scalarization. This improves the cost estimates in some cases.	2024-04-29 19:15:46 +01:00
Maciej Gabka	bfc0317153	Move several vector intrinsics out of experimental namespace (#88748) This patch moves the following intrinsics out of the experimental namespace: * vector.interleave2/deinterleave2 * vector.reverse * vector.splice. All these intrinsics have existed in LLVM for more than a year and are widely used, so they should not be considered experimental.	2024-04-29 10:16:45 +01:00
Florian Hahn	b6a8f5486b	[LV] Consider all exit branch conditions uniform. If we vectorize a loop with multiple exits, all exiting branches should be considered uniform, as the resulting loop will be controlled by the canonical IV only. Previously we were overestimating the cost of values contributing to the other exits.	2024-04-28 13:15:55 +01:00
Florian Hahn	9ee8e38cdc	[VPlan] Also propagate versioned strides to users via sext/zext. The versioned value may not be used in the loop directly but through a sext/zext. Add new live-ins in those cases.	2024-04-26 21:29:43 +01:00
David Green	a8105026ff	[LV] Fix warning about Mask being set twice. NFC	2024-04-20 16:40:08 +01:00
Florian Hahn	e2a72fa583	[VPlan] Introduce recipes for VP loads and stores. (#87816) Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from https://github.com/llvm/llvm-project/pull/76172. Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask. With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle that, first collect all VPValues representing header masks (by looking at users of both the canonical IV and widened inductions that are canonical) and then check all users (recursively) of those header masks. Depends on https://github.com/llvm/llvm-project/pull/87411. PR: https://github.com/llvm/llvm-project/pull/87816	2024-04-19 09:44:23 +01:00
Ramkumar Ramachandra	73e7f2ff70	LoopVectorize: guard marking iv as scalar; fix bug (#88730) When collecting loop scalars, LoopVectorize over-eagerly marks the induction variable and its update as scalars after vectorization, even if the induction variable update is a first-order recurrence. Guard the process with this check, fixing a crash. Fixes #72969.	2024-04-18 14:41:07 +01:00
Ramkumar Ramachandra	63d8058ef5	LoopVectorize: guard appending InstsToScalarize; fix bug (#88720) In the process of collecting instructions to scalarize, LoopVectorize uses faulty reasoning whereby it also adds instructions that will be scalar after vectorization. If an instruction satisfies isScalarAfterVectorization() for the given VF, it should not be appended to InstsToScalarize. Add this extra guard, fixing a crash. Fixes #55096.	2024-04-18 10:03:07 +01:00
Florian Hahn	a9bafe91dd	[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411) This patch introduces a new VPWidenMemoryRecipe base class and distinct sub-classes to model loads and stores. This is a first step in an effort to simplify and modularize code generation for widened loads and stores and enable adding further more specialized memory recipes. PR: https://github.com/llvm/llvm-project/pull/87411	2024-04-17 11:00:58 +01:00
Mel Chen	cbe148b730	[LV][NFC] Remove the declaration of function `fixReduction`. (#88491)	2024-04-17 17:59:52 +08:00
Arthur Eubanks	c6e01627ac	Revert "Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"" This reverts commit c6e38b928c56f562aea68a8e90f02dbdf0eada85. Causes miscompiles, see comments on #78304.	2024-04-16 20:40:21 +00:00
Alexey Bataev	e84b2fb48d	[LV][NFCI]Use integer for cost/trip count calculations instead of double, fix possible UB. Using an fp type in the compiler is not the best idea; here it is used in a comparison for equality with 0 and may cause undefined behavior in some cases. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/87241	2024-04-16 09:48:13 -04:00
Florian Hahn	c836983671	[VPlan] Remove unused first mask op from VPBlendRecipe. (#87770) VPBlendRecipe does not use the first mask operand. Removing it allows VPlan-based DCE to remove unused mask computations. This also fixes #87410, where unused Not VPInstructions are considered to have only their first lane demanded, but some of their operands provide a vector value due to other users. Fixes https://github.com/llvm/llvm-project/issues/87410 PR: https://github.com/llvm/llvm-project/pull/87770	2024-04-09 11:14:05 +01:00
Florian Hahn	9430a4b9d2	[VPlan] Use getEdgeMask when constructing VPBlendRecipe (NFCI). After 2d0d65b3babe, block-in and edge masks are created up-front. Only retrieve the cached edge-mask here.	2024-04-09 09:32:40 +01:00
Florian Hahn	15d11a4de9	[VPlan] Track IsOrdered in VPReductionRecipe, remove use of ILV (NFCI). Instead of using ILV.useOrderedReductions during ::execute, store the information at recipe construction. Another step towards making recipe ::execute methods independent of legacy ILV.	2024-04-07 20:33:22 +01:00
Florian Hahn	c6e38b928c	Reapply "[LV] Improve AnyOf reduction codegen. (#78304)" This reverts the revert commit 589c7abb03448. This patch includes a fix for any-of reductions and epilogue vectorization. Extra test coverage for the issue that caused the revert has been added in 399ff08e29d. -------------------------------- Original commit message: Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block. The patch incorporates feedback from https://reviews.llvm.org/D153697. This fixes #62565, as now there aren't multiple uses of the start/new values. Fixes https://github.com/llvm/llvm-project/issues/62565 PR: https://github.com/llvm/llvm-project/pull/78304	2024-04-05 13:45:13 +01:00
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for loads/stores to use VP store/load with EVL. Another important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, which can be lowered to architecture-specific instruction(s) to compute EVL. The patch also adds a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
Florian Hahn	e701c1a653	[VPlan] Use recipe's debug loc for VPWidenMemoryInstructionRecipe (NFCI) Now that VPRecipeBase manages debug locations for recipes, use it in VPWidenMemoryInstructionRecipe.	2024-04-01 12:07:30 +01:00
Florian Hahn	8a614c1d31	[VPlan] Rename getVPValueOrAddLiveIn -> getOrAddLiveIn (NFCI). The helper now only deals with live-ins, clarify the name.	2024-03-28 21:02:15 +00:00
Florian Hahn	06bb8c9f20	[VPlan] Explicitly handle scalar pointer inductions. (#83068) Add a new PtrAdd opcode to VPInstruction that corresponds to IRBuilder::CreatePtrAdd, which creates a GEP with source element type i8. This is then used to model scalarizing VPWidenPointerInductionRecipe by introducing scalar-steps to model the index increment followed by a PtrAdd. Note that PtrAdd needs to be able to generate code for only the first lane or for all lanes. This may warrant introducing a separate recipe for scalarizing that can be created without relying on the underlying IR. Depends on https://github.com/llvm/llvm-project/pull/80271 PR: https://github.com/llvm/llvm-project/pull/83068	2024-03-26 16:01:57 +01:00
Florian Hahn	39c8e87717	[VPlan] Move recording of Inst->VPValue to VPRecipeBuilder (NFCI). (#84464) Instead of keeping a mapping of Inst->VPValues (of their corresponding recipes) in VPlan's Value2VPValue mapping, keep it in VPRecipeBuilder instead. After recently replacing the last user of this mapping after initial construction, this mapping is only needed for recipe construction (to map IR operands to VPValue operands). By moving the mapping, VPlan's VPValue tracking can be simplified and limited only to live-ins. It also allows removing disableValue2VPValue and associated machinery & asserts. PR: https://github.com/llvm/llvm-project/pull/84464	2024-03-23 18:43:14 +01:00
Florian Hahn	8578b6e912	[VPlan] Store VPlan directly in VPRecipeBuilder (NFCI). Instead of passing VPlan in a number of places, just store it directly in VPRecipeBuilder. A single instance is only used for a single VPlan. This simplifies the code and was suggested by @nikolaypanchenko in https://github.com/llvm/llvm-project/pull/84464.	2024-03-18 19:23:37 +00:00
Paschalis Mpeis	f795d1a8b1	[AArch64][LV][SLP] Vectorizers use call cost for vectorized frem (#82488) getArithmeticInstrCost is used by both LoopVectorizer and SLPVectorizer to compute the cost of frem, which becomes a call cost on AArch64 when TLI has a vector library function. Add tests that do SLP vectorization for code that contains 2x double and 4x float frem instructions.	2024-03-14 17:20:29 +00:00
Kirill Stoimenov	589c7abb03	Revert "[LV] Improve AnyOf reduction codegen. (#78304 )" Broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/74/builds/26697 This reverts commit 95fef1dfefd5467206e74c089d29806fcd82889b.	2024-03-14 14:57:01 +00:00
Florian Hahn	95fef1dfef	[LV] Improve AnyOf reduction codegen. (#78304) Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block. The patch incorporates feedback from https://reviews.llvm.org/D153697. This fixes #62565, as now there aren't multiple uses of the start/new values. Fixes https://github.com/llvm/llvm-project/issues/62565 PR: https://github.com/llvm/llvm-project/pull/78304	2024-03-14 11:22:06 +00:00
Jeremy Morse	2fe81edef6	[NFC][RemoveDIs] Insert instruction using iterators in Transforms/ As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block needs to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. Noteworthy changes: * FindInsertedValue now takes an optional iterator rather than an instruction pointer, as we need to always insert with iterators, * I've added a few iterator-taking versions of some value-tracking and DomTree methods -- they just unwrap the iterator. These are purely convenience methods to avoid extra syntax in some passes. * A few calls to getNextNode become std::next instead (to keep in the theme of using iterators for positions), * SeparateConstOffsetFromGEP has its insertion-position field changed. Noteworthy because it's not a purely localised spelling change. All this should be NFC.	2024-03-05 15:12:22 +00:00
Florian Hahn	4d525f2b9a	[VPlan] Remove unneeded InsertPointGuard (NFCI). getBlockInMask now simply returns an already computed mask, hence there's no need to adjust the builder insert point.	2024-02-29 12:37:13 +00:00
Nilanjana Basu	1c211bc76e	[LV] Remove unused configuration option (#82955) A recent set of changes (PR #67725) in the loop interleaving algorithm removed the loop trip count threshold for allowing interleaving. Therefore the configuration option interleave-small-loop-scalar-reduction is no longer needed.	2024-02-28 10:17:25 -08:00
Florian Hahn	3fac0562f8	[VPlan] Reset trip count when replacing ExpandSCEV recipe. Otherwise accessing the trip count may access freed memory. Fixes https://lab.llvm.org/buildbot/#/builders/74/builds/26239 and others.	2024-02-28 16:31:49 +00:00
Alexey Bataev	80cff27390	[LV][NFC]Fix a misprint, NFC.	2024-02-28 07:56:31 -08:00
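
The AnyOf reduction change (95fef1dfef, reapplied as c6e38b928c and again as bccb7ed8ac) describes keeping only a boolean "any" property per vector lane inside the loop and selecting between the start and new values once, in the middle block. The following is a minimal C++ sketch of those semantics only, not the vectorizer's code or the IR it emits: it assumes a fixed VF of 4, models vector lanes with std::array, uses a hypothetical `x > threshold` condition, and the function names are illustrative.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Scalar reference for the any-of pattern: the result is newVal if any
// element satisfies the (hypothetical) condition, otherwise start.
int anyOfScalar(const std::vector<int>& a, int threshold, int start, int newVal) {
  int r = start;
  for (int x : a)
    if (x > threshold)
      r = newVal;
  return r;
}

// Illustrative model of the improved codegen, assuming VF = 4.
int anyOfVectorized(const std::vector<int>& a, int threshold, int start, int newVal) {
  constexpr std::size_t VF = 4;
  std::array<bool, VF> anyLane{};  // one boolean per lane, all false initially

  // Vector loop: only the boolean property is updated; start/newVal are not used here.
  std::size_t i = 0;
  for (; i + VF <= a.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      anyLane[lane] = anyLane[lane] || (a[i + lane] > threshold);

  // Middle block: OR-reduce the lanes, then select once between newVal and start.
  bool any = false;
  for (bool b : anyLane)
    any = any || b;
  int r = any ? newVal : start;

  // Scalar remainder loop resumes from the selected value.
  for (; i < a.size(); ++i)
    if (a[i] > threshold)
      r = newVal;
  return r;
}
```

In this model start and newVal each appear only in the single select after the vector loop (plus the scalar remainder), which mirrors the commit message's point that there are no longer multiple in-loop uses of the start/new values.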

2074 Commits