llvm-project

Author	SHA1	Message	Date
Florian Hahn	c836983671	[VPlan] Remove unused first mask op from VPBlendRecipe. (#87770 ) VPBlendRecipe does not use the first mask operand. Removing it allows VPlan-based DCE to remove unused mask computations. This also fixes #87410, where unused Not VPInstructions are considered having only their first lane demanded, but some of their operands providing a vector value due to other users. Fixes https://github.com/llvm/llvm-project/issues/87410 PR: https://github.com/llvm/llvm-project/pull/87770	2024-04-09 11:14:05 +01:00
Florian Hahn	15d11a4de9	[VPlan] Track IsOrdered in VPReductionRecipe, remove use of ILV (NFCI). Instead of using ILV.useOrderedReductions during ::execute, store the information at recipe construction. Another step towards making recipes' ::execute independent of legacy ILV.	2024-04-07 20:33:22 +01:00
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172 ) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for load/stores to use VP store/load with EVL. Another important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
Florian Hahn	e5abd963c7	[VPlan] Remove VPTransformState::addMetadata with ArrayRef arg (NFCI). addMetadata is only ever called with a single element, clean up the variant that takes multiple values.	2024-04-03 09:43:12 +01:00
Florian Hahn	6261c53c6f	[VPlan] Make sure OR VPInstructions are treated as disjoint ops. Make sure that VPInstructions with OR opcodes are properly registered as disjoint ops. Fixes https://github.com/llvm/llvm-project/issues/87378.	2024-04-02 21:48:51 +01:00
Florian Hahn	e701c1a653	[VPlan] Use recipe's debug loc for VPWidenMemoryInstructionRecipe (NFCI) Now that VPRecipeBase manages debug locations for recipes, use it in VPWidenMemoryInstructionRecipe.	2024-04-01 12:07:30 +01:00
Florian Hahn	a34834138a	[VPlan] Inline addVPValue into single caller (NFCI). Inline the function into its single caller.	2024-04-01 11:12:35 +01:00
Florian Hahn	8d9cb6b016	[VPlan] Inline getVPValue in only caller (NFCI).	2024-03-30 20:38:40 +00:00
Florian Hahn	8a614c1d31	[VPlan] Rename getVPValueOrAddLiveIn -> getOrAddLiveIn (NFCI). The helper now only deals with live-ins, clarify the name.	2024-03-28 21:02:15 +00:00
Florian Hahn	6ef829941b	Recommit "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 )" Recommit with a fix for the use-after-free causing the revert. This reverts the revert commit f872043e055f4163c3c4b1b86ca0354490174987. Original commit message: Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add. This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise. Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked. There are other places that may drop poison-generating flags; those likely need similar treatment. Fixes https://github.com/llvm/llvm-project/issues/81872 PR: https://github.com/llvm/llvm-project/pull/83821	2024-03-27 19:11:18 +00:00
Florian Hahn	06bb8c9f20	[VPlan] Explicitly handle scalar pointer inductions. (#83068 ) Add a new PtrAdd opcode to VPInstruction that corresponds to IRBuilder::CreatePtrAdd, which creates a GEP with source element type i8. This is then used to model scalarizing VPWidenPointerInductionRecipe by introducing scalar-steps to model the index increment followed by a PtrAdd. Note that PtrAdd needs to be able to generate code for only the first lane or for all lanes. This may warrant introducing a separate recipe for scalarizing that can be created without relying on the underlying IR. Depends on https://github.com/llvm/llvm-project/pull/80271 PR: https://github.com/llvm/llvm-project/pull/83068	2024-03-26 16:01:57 +01:00
Florian Hahn	1081d3a0a7	[VPlan] Mark CanonicalIVIncrementForPart as only using part 0 of IV. CanonicalIVIncrementForPart uses VPIteration(0, 0) of the IV (first operand), mark it as only using part 0. This avoids generating redundant IV increments per part.	2024-03-25 11:27:17 +00:00
Florian Hahn	39c8e87717	[VPlan] Move recording of Inst->VPValue to VPRecipeBuilder (NFCI). (#84464 ) Instead of keeping a mapping of Inst->VPValues (of their corresponding recipes) in VPlan's Value2VPValue mapping, keep it in VPRecipeBuilder instead. After recently replacing the last user of this mapping after initial construction, this mapping is only needed for recipe construction (to map IR operands to VPValue operands). By moving the mapping, VPlan's VPValue tracking can be simplified and limited only to live-ins. It also allows removing disableValue2VPValue and associated machinery & asserts. PR: https://github.com/llvm/llvm-project/pull/84464	2024-03-23 18:43:14 +01:00
Benjamin Kramer	f872043e05	Revert "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 )" This reverts commit c2c1e6ee4ce0df3d000ba880fa6cf58441da6462. It creates a use after free. ==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08 READ of size 1 at 0x50f000001760 thread T0 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Can be reproduced with asan on Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll Transforms/LoopVectorize/X86/pr81872.ll Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll	2024-03-20 15:14:58 +01:00
Florian Hahn	c2c1e6ee4c	[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 ) Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add. This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise. Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked. There are other places that may drop poison-generating flags; those likely need similar treatment. Fixes https://github.com/llvm/llvm-project/issues/81872 PR: https://github.com/llvm/llvm-project/pull/83821	2024-03-19 20:16:18 +01:00
Florian Hahn	fd93a5e3c0	[VPlan] Support match unary and binary recipes in pattern matcher (NFC). Generalize pattern matchers to take recipe types to match as template arguments and use it to provide matchers for unary and binary recipes with specific opcodes and a list of recipe types (VPWidenRecipe, VPReplicateRecipe, VPWidenCastRecipe, VPInstruction) The new matchers are used to simplify and generalize the code in simplifyRecipes.	2024-03-18 14:24:52 +00:00
Florian Hahn	5ab86ef7c1	[VPlan] Remove unused OverrideAllowed arg from getVPValue (NFCI).	2024-03-06 20:14:25 +00:00
Craig Topper	ac783addc4	[LV] Use SmallVector::resize instead of push_back/emplace_back in a loop. NFC (#83696 ) This should be more efficient since the vector can know how much additional space to reserve before creating the new elements.	2024-03-04 10:01:24 -08:00
Florian Hahn	3fac0562f8	[VPlan] Reset trip count when replacing ExpandSCEV recipe. Otherwise accessing the trip count may access freed memory. Fixes https://lab.llvm.org/buildbot/#/builders/74/builds/26239 and others.	2024-02-28 16:31:49 +00:00
Florian Hahn	911055e34f	[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271 ) At the moment, some VPInstructions create only a single scalar value, but use VPTransformState's 'vector' storage for this value. Those values are effectively uniform-per-VF (or in some cases uniform-across-VF-and-UF). Using the vector/per-part storage doesn't interact well with other recipes that more accurately use (Part, Lane) to look up scalar values, and prevents VPInstructions creating scalars from interacting with other recipes working with scalars. This PR tries to unify handling of scalars by using (Part, 0) for scalar values where only the first lane is demanded. This allows using VPInstructions with other recipes like VPScalarCastRecipe and is also needed when using VPInstructions in more cases outside the vector loop region to generate scalars. Depends on https://github.com/llvm/llvm-project/pull/80269	2024-02-26 19:06:43 +00:00
Florian Hahn	85da9f80b8	[VPlan] Remove unused VPTransformState::VPValue2Value (NFCI). Clean up unused member variable.	2024-02-25 12:14:44 +00:00
Florian Hahn	0b01320d28	[VPlan] Remove unused VPTransformState::CanonicalIV (NFCI). Clean up unused member variable.	2024-02-23 16:54:30 +00:00
Florian Hahn	3d66d6932e	[VPlan] Support live-ins without underlying IR in type analysis. (#80723 ) A VPlan contains multiple live-ins without underlying IR, like VFxUF or VectorTripCount. Trying to infer the scalar type of those causes a crash at the moment. Update VPTypeAnalysis to take a VPlan in its constructor and assign types to those live-ins up front. All those live-ins share the type of the canonical IV. PR: https://github.com/llvm/llvm-project/pull/80723	2024-02-21 19:37:15 +00:00
Florian Hahn	9923d29cfa	[VPlan] Merge main VPlan verifier with HCFG verifier. Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for various properties on blocks to the verifier used for VPlans generated by the inner loop vectorizer. It also adds def-use checks for the verifier used in the VPlan native path. This drops the separate flag to enable HCFG verification. Instead, all VPlans are verified once they have been created, if assertions are enabled. This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to model any phi node in the native path.	2024-02-20 16:43:57 +00:00
Florian Hahn	44b17679e3	[VPlan] Remove stale comment from VPTransformState::get (NFC) All values accessed via get are now part of VPTransformState, the ILV reference in the comment has been removed a long time ago. Remove the stale comment.	2024-02-19 22:11:51 +00:00
Florian Hahn	536d78c213	[VPlan] Remove VPInstruction::setUnderlyingInstr (NFCI). VPInstruction doesn't rely on the underlying instruction any longer for codegen, remove the unneeded setUnderlyingInstr.	2024-02-18 18:50:01 +00:00
Florian Hahn	ca56966684	[VPlan] Properly retain flags when cloning VPReplicateRecipe. This makes sure the correct flags are used for the clone (i.e. the ones present on the recipe), instead of the ones on the original IR instruction. At the moment, this should not change anything, as flags of replicate recipes should not be dropped before they are cloned. But that will change in a follow-up patch.	2024-02-14 11:11:46 +00:00
Florian Hahn	47abbf4fe9	[VPlan] Update VPInst::onlyFirstLaneUsed to check users. (#80269 ) A VPInstruction only has its first lane used if all users use its first lane only. Use vputils::onlyFirstLaneUsed to continue checking the recipe's users to handle more cases. Besides allowing additional introduction of scalar steps when interleaving in some cases, this also enables using an Add VPInstruction to model the increment - as a follow up.	2024-02-03 16:19:10 +00:00
Florian Hahn	3444240540	[VPlan] Mark vputils::onlyFirstPartUsed arg as const (NFC) Split off https://github.com/llvm/llvm-project/pull/80269 as suggested.	2024-02-03 15:59:09 +00:00
Florian Hahn	6936479020	[VPlan] Mark vputils::onlyFirstLaneUsed arg as const (NFC) Split off https://github.com/llvm/llvm-project/pull/80269 as suggested.	2024-02-03 15:56:40 +00:00
Florian Hahn	2906f3626b	[VPlan] Update ::onlyScalarsGenerated to take IsScalable bool (NFCI). Instead of passing in a full VF, just pass IsScalable as bool.	2024-02-03 14:51:14 +00:00
Florian Hahn	ec402a2e53	[VPlan] Implement cloning of VPlans. (#73158 ) This patch implements cloning for VPlans and recipes. Cloning is used in the epilogue vectorization path, to clone the VPlan for the main vector loop. This means we won't re-use a VPlan when executing the VPlan for the epilogue vector loop, which in turn will enable us to perform optimizations based on UF & VF.	2024-01-27 13:30:52 +00:00
Florian Hahn	0ab539fd67	[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113 ) Add a new recipe to model scalar cast instructions, without relying on an underlying instruction. This allows creating scalar casts, without relying on an underlying instruction (like the current VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it. Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as follow-up. PR: https://github.com/llvm/llvm-project/pull/78113	2024-01-26 11:13:05 +00:00
Florian Hahn	42fb1fac9e	[VPlan] Use DebugLoc from recipe in VPWidenCallRecipe (NFCI). Instead of using the debug location of the underlying instruction, use the debug location from the recipe. This removes an unneeded dependency of the underlying instruction.	2024-01-19 13:33:03 +00:00
Florian Hahn	abdb61f5fd	[VPlan] Introduce VPSingleDefRecipe. (#77023 ) This patch introduces a new common base class for recipes defining a single result VPValue. This has been discussed/mentioned at various previous reviews as potential follow-up and helps to replace various getVPSingleValue calls. PR: https://github.com/llvm/llvm-project/pull/77023	2024-01-19 10:27:53 +00:00
Florian Hahn	18ec3304a9	[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe. As suggested as follow-up in https://github.com/llvm/llvm-project/pull/72164, manage inbounds via VPRecipeWithIRFlags. Note that we can now preserve inbounds in a few more cases.	2024-01-07 13:58:05 +00:00
Florian Hahn	241fe83704	[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253 ) This patch introduces a new ComputeReductionResult opcode to compute the final reduction result in the middle block. The code from fixReduction has been moved to ComputeReductionResult, after some earlier cleanup changes to model parts of fixReduction explicitly elsewhere as needed. The recipe may be broken down further in the future. Note that the phi nodes to merge the reduction result from the trip count check and the middle block, to be used as resume value for the scalar remainder loop are also generated based on ComputeReductionResult. Once we have a VPValue for the reduction result, this can also be modeled explicitly and moved out of the recipe.	2024-01-04 22:53:18 +00:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164 ) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes further and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Florian Hahn	a5891fa4d2	[VPlan] Initial modeling of VF * UF as VPValue. (#74761 ) This patch starts initial modeling of VF * UF in VPlan. Initially, introduce a dedicated VFxUF VPValue, which is then populated during VPlan::prepareToExecute. Initially, the VF * UF applies only to the main vector loop region. Once we extend the scope of VPlan in the future, we may want to associate different VFxUFs with different vector loop regions (e.g. the epilogue vector loop) This allows explicitly parameterizing recipes that rely on the VF * UF, like the canonical induction increment. At the moment, this mainly helps to avoid generating some duplicated calls to vscale with scalable vectors. It should also allow using EVL as induction increments explicitly in D99750. Referring to VF * UF is also needed in other places that we plan to migrate to VPlan, like the minimum trip count check during skeleton creation. The first version creates the value for VF * UF directly in prepareToExecute to limit the scope of the patch. A follow-on patch will model VF * UF computation explicitly in VPlan using recipes. Moved from Phabricator (https://reviews.llvm.org/D157322)	2023-12-08 18:30:30 +00:00
Florian Hahn	bbd1941a38	[VPlan] Add disjoint flag to VPRecipeWithIRFlags. (#74364 ) A new disjoint flag was added for OR instructions in #72583. Update VPRecipeWithIRFlags to also support the new flag. This allows printing and preserving the disjoint flag in vectorized code.	2023-12-05 15:21:59 +00:00
Alexey Bataev	056367bb19	[LV]Support dropping of nneg flag for zext widencast recipes. (#74112 ) The compiler crashes when the assertion that checks that an instruction cannot produce poison is triggered for a zext nneg instruction. Changed the base class for the widencast recipe to handle dropping the nneg flag and avoid the compiler crash.	2023-12-05 09:17:23 -05:00
Florian Hahn	99aa5311ee	[VPlan] Add missing output of live-ins to VPlan dot printing. Split off live-in printing to VPlan::printLiveIns and use it to print Live-ins when printing in the DOT format.	2023-12-04 13:41:28 +00:00
Florian Hahn	70535f5e60	[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version. This patch replaces the IR based truncateToMinimalBitwidths with a VPlan version. This has 3 benefits: 1) the VPlan-based version is simpler; we don't need to implement special codegen for each supported instruction type like the IR based one. 2) Removes a dependency on the cost-model after VPlan execution and 3) Removes a use of getVPValue that uses underlying values after VPlan execution (See removed FIXME). Depends on D149081. Depends on D149079. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149903	2023-12-02 16:12:38 +00:00
Florian Hahn	906f598263	[VPlan] Remove dead IsEpilogueVec argument from prepareToExecute (NFC).	2023-11-23 16:59:50 +00:00
Florian Hahn	34c2dcd5ac	[VPlan] Move initial skeleton construction to createInitialVPlan. (NFC) This patch moves creating the middle VPBBs and an initial empty vector loop region for the top-level loop to createInitialVPlan. This consolidates code to create the initial VPlan skeleton and enables adding other bits outside the main region during initial VPlan construction. In particular, D150398 will add the exit check & branch to the middle block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158333	2023-11-12 13:00:44 +00:00
Florian Hahn	b0b88643a1	[VPlan] Add initial analysis to infer scalar type of VPValues. (#69013 ) This patch adds initial type inference for VPValues. It infers the scalar type of a VPValue, by bottom-up traversing through defining recipes until root nodes with known types are reached (e.g. live-ins or load recipes). The types are then propagated top down through operations. This is intended as building block for a VPlan-based cost model, which will need access to type information for VPValues/recipes. Initial testing is done by asserting the inferred type matches the type of the result value generated for widen and replicate recipes.	2023-10-27 14:38:28 +01:00
Florian Hahn	97687b7aea	[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation. This patch updates the mask creation code to always create compares of the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front when tail folding and introduce active-lane-mask as later transformation. This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count) the canonical form for tail-folding early on. Introducing more specific active-lane-mask recipes is treated as a VPlan-to-VPlan optimization. This has the advantage of keeping the logic (and complexity) of introducing active-lane-mask recipes in a single place, instead of spreading the logic out across multiple functions. It also simplifies initial VPlan construction and enables treating introducing EVL as similar optimization. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158779	2023-09-25 13:34:45 +01:00
Florian Hahn	541e88dbc2	[VPlan] Simplify HCFG construction of region blocks (NFC). Update the logic to update the successors and predecessors of region blocks directly. This adds special handling for header and latch blocks in place, and removes the separate loop to fix up the region blocks. Helps to simplify D158333. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D159136	2023-09-24 21:53:35 +01:00
Florian Hahn	3e2d564c3d	[VPlan] Use VPRecipeWithFlags for VPScalarIVStepsRecipe (NFC). This directly models the flags as part of the recipe, which allows dropping them using the VPlan infrastructure when required. It also allows removing the full reference to InductionDescriptor and limiting it to only the opcode.	2023-09-08 15:46:12 +01:00
Florian Hahn	785e7063b9	[VPlan] Don't rely on underlying instr in VPWidenRecipe (NFCI). VPWidenRecipe only needs the opcode to widen, all other information (flags, debug loc and operands) is already modeled directly via the recipe. This removes the remaining uses of the underlying instruction from VPWidenRecipe::execute.	2023-09-06 16:27:09 +01:00
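
Note on c2c1e6ee4c / 6ef829941b (replace disjoint or with add): the commit message relies on the fact that an OR whose operands share no set bits produces the same value as an ADD of those operands, because no bit position can generate a carry. The sketch below is not LLVM code; it is a small standalone illustration of that arithmetic identity, and the helper names `is_disjoint`, `or_as_add` and `check_all` are hypothetical names chosen for this example.

```python
def is_disjoint(a: int, b: int) -> bool:
    """Return True when a and b have no set bit in common."""
    return (a & b) == 0


def or_as_add(a: int, b: int, bits: int = 32) -> int:
    """Compute a | b as a + b for disjoint operands of the given bit width.

    With no common set bits, no bit position produces a carry, so the sum
    has exactly the bits of the bitwise OR.
    """
    if not is_disjoint(a, b):
        raise ValueError("operands share set bits; OR and ADD may differ")
    mask = (1 << bits) - 1
    return (a + b) & mask


def check_all(bits: int = 4) -> bool:
    """Exhaustively compare OR and ADD over all disjoint pairs of `bits`-wide values."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            if is_disjoint(a, b) and (a | b) != (a + b):
                return False
    return True
```

When the operands are not disjoint the two operations can differ (for example 1 | 1 is 1 while 1 + 1 is 2), which is why the rewrite described in the commit applies only to ORs carrying the disjoint flag.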


381 Commits