llvm-project

Author	SHA1	Message	Date
Florian Hahn	dadf6f2c5a	[VPlan] Ignore incoming values with constant false mask. (#89384 ) Ignore incoming values with constant false masks when trying to simplify VPBlendRecipes. As a follow-on optimization, we should also be able to drop all incoming values with false masks by creating a new VPBlendRecipe with those operands dropped. PR: https://github.com/llvm/llvm-project/pull/89384	2024-04-23 13:59:01 +01:00
Florian Hahn	17fb3e82f6	[VPlan] Skip extending ICmp results in trunateToMinimalBitwidth. Results of icmp don't need extending after truncating their operands, as the result will always be i1. Skip them during extending. Fixes https://github.com/llvm/llvm-project/issues/79742 Fixes https://github.com/llvm/llvm-project/issues/85185	2024-04-23 11:50:26 +01:00
Florian Hahn	e2a72fa583	[VPlan] Introduce recipes for VP loads and stores. (#87816 ) Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from https://github.com/llvm/llvm-project/pull/76172 Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask. With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle that, first collect all VPValues representing header masks (by looking at users of both the canonical IV and widened inductions that are canonical) and then checking all users (recursively) of those header masks. Depends on https://github.com/llvm/llvm-project/pull/87411. PR: https://github.com/llvm/llvm-project/pull/87816	2024-04-19 09:44:23 +01:00
Florian Hahn	5d314353fb	[VPlan] Check for VPWidenLoadRecipe directly in truncateToMinBW. (NFCI). Since ne After a separate recipe has been introduced for wide loads in a9bafe91dd0, we can directly check for load recipes in the early bail-out and remove the redundant bail out for stores.	2024-04-17 15:53:32 +01:00
Florian Hahn	41b7341d6b	[VPlan] Factor out helper to recursively collect all users (NFCI). Factor out logic to collect all users recursively to be re-used in https://github.com/llvm/llvm-project/pull/87816.	2024-04-17 14:56:47 +01:00
Florian Hahn	a9bafe91dd	[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411 ) This patch introduces a new VPWidenMemoryRecipe base class and distinct sub-classes to model loads and stores. This is a first step in an effort to simplify and modularize code generation for widened loads and stores and enable adding further more specialized memory recipes. PR: https://github.com/llvm/llvm-project/pull/87411	2024-04-17 11:00:58 +01:00
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172 ) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions. These architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for loads/stores instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for load/stores to use VP store/load with EVL. Other important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
Florian Hahn	7bd163d0a4	[VPlan] Clean up dead recipes after UF & VF specific simplification. Recursively remove dead recipes after simplifying vector loop exit branch.	2024-04-04 12:05:08 +01:00
Florian Hahn	e329b68413	[VPlan] Factor out logic to check if recipe is dead (NFCI). In preparation to use the helper in more places.	2024-04-03 14:22:41 +01:00
Florian Hahn	e701c1a653	[VPlan] Use recipe's debug loc for VPWidenMemoryInstructionRecipe (NFCI) Now that VPRecipeBase manages debug locations for recipes, use it in VPWidenMemoryInstructionRecipe.	2024-04-01 12:07:30 +01:00
Florian Hahn	8a614c1d31	[VPlan] Rename getVPValueOrAddLiveIn -> getOrAddLiveIn (NFCI). The helper now only deals with live-ins, clarify the name.	2024-03-28 21:02:15 +00:00
Florian Hahn	6ef829941b	Recommit "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 )" Recommit with a fix for the use-after-free causing the revert. This reverts the revert commit f872043e055f4163c3c4b1b86ca0354490174987. Original commit message: Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add. This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise. Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked. There are other places that may drop poison-generating flags; those likely need similar treatment. Fixes https://github.com/llvm/llvm-project/issues/81872 PR: https://github.com/llvm/llvm-project/pull/83821	2024-03-27 19:11:18 +00:00
Florian Hahn	06bb8c9f20	[VPlan] Explicitly handle scalar pointer inductions. (#83068 ) Add a new PtrAdd opcode to VPInstruction that corresponds to IRBuilder::CreatePtrAdd, which creates a GEP with source element type i8. This is then used to model scalarizing VPWidenPointerInductionRecipe by introducing scalar-steps to model the index increment followed by a PtrAdd. Note that PtrAdd needs to be able to generate code for only the first lane or for all lanes. This may warrant introducing a separate recipe for scalarizing that can be created without relying on the underlying IR. Depends on https://github.com/llvm/llvm-project/pull/80271 PR: https://github.com/llvm/llvm-project/pull/83068	2024-03-26 16:01:57 +01:00
Florian Hahn	39c8e87717	[VPlan] Move recording of Inst->VPValue to VPRecipeBuilder (NFCI). (#84464 ) Instead of keeping a mapping of Inst->VPValues (of their corresponding recipes) in VPlan's Value2VPValue mapping, keep it in VPRecipeBuilder instead. After recently replacing the last user of this mapping after initial construction, this mapping is only needed for recipe construction (to map IR operands to VPValue operands). By moving the mapping, VPlan's VPValue tracking can be simplified and limited only to live-ins. It also allows removing disableValue2VPValue and associated machinery & asserts. PR: https://github.com/llvm/llvm-project/pull/84464	2024-03-23 18:43:14 +01:00
Benjamin Kramer	f872043e05	Revert "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 )" This reverts commit c2c1e6ee4ce0df3d000ba880fa6cf58441da6462. It creates a use after free. ==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08 READ of size 1 at 0x50f000001760 thread T0 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock)>)::$_0::operator()(llvm::VPRecipeBase) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Can be reproduced with asan on Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll Transforms/LoopVectorize/X86/pr81872.ll Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll	2024-03-20 15:14:58 +01:00
Florian Hahn	c2c1e6ee4c	[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 ) Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add. This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise. Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked. There are other places that may drop poison-generating flags; those likely need similar treatment. Fixes https://github.com/llvm/llvm-project/issues/81872 PR: https://github.com/llvm/llvm-project/pull/83821	2024-03-19 20:16:18 +01:00
Florian Hahn	fd93a5e3c0	[VPlan] Support match unary and binary recipes in pattern matcher (NFC). Generalize pattern matchers to take recipe types to match as template arguments and use it to provide matchers for unary and binary recipes with specific opcodes and a list of recipe types (VPWidenRecipe, VPReplicateRecipe, VPWidenCastRecipe, VPInstruction) The new matchers are used to simplify and generalize the code in simplifyRecipes.	2024-03-18 14:24:52 +00:00
Florian Hahn	c07c1c47d3	[VPlan] Remove redundant cast (NFCI). SinkCandidate is a VPSingleDefRecipe now, so no cast is needed to access getUnderlyingInstr directly.	2024-03-18 08:58:23 +00:00
Florian Hahn	f1015d1701	[VPlan] Use VPBuilder to create ActiveLaneMask (NFC).	2024-03-13 16:08:02 +00:00
Philip Reames	df9ba13579	[LV] Handle scalable VFs in optimizeForVFAndUF (#82669 ) Given a scalable VF of the form <NumElts * VScale>, this patch adds the ability to discharge a backedge test for a loop whose trip count is between (NumElts, MinVScaleNumElts). A couple of notes on this: Annoyingly, I could not figure out to write a test for this case. My attempt is checked in as test32_i8 in f67ef1a, but LV uses a fixed vector in that case, and ignored the force flags. * This depends on 9eb5f94f to avoid appearing like a regression. Since SCEV doesn't know any upper bound on vscale without the vscale_range attribute (it doesn't query TTI), the ranges overflow on the multiply. Arguably, this is fixing a bug in the current LV code since in theory vscale can be large enough to overflow for real, but no actual target is going to see that case.	2024-03-04 13:49:35 -08:00
Florian Hahn	2435dcd83a	[VPlan] Add initial pattern match implementation for VPInstruction. (#80563 ) Add an initial version of a pattern match for VPValues and recipes, starting with VPInstruction. PR: https://github.com/llvm/llvm-project/pull/80563	2024-03-03 21:48:58 +00:00
Florian Hahn	3d66d6932e	[VPlan] Support live-ins without underlying IR in type analysis. (#80723 ) A VPlan contains multiple live-ins without underlying IR, like VFxUF or VectorTripCount. Trying to infer the scalar type of those causes a crash at the moment. Update VPTypeAnalysis to take a VPlan in its constructor and assign types to those live-ins up front. All those live-ins share the type of the canonical IV. PR: https://github.com/llvm/llvm-project/pull/80723	2024-02-21 19:37:15 +00:00
Florian Hahn	20177c45db	[VPlan] Turn private members of VPlanTransforms to static funcs (NFC) Private members of VPlanTransforms are only used inside VPlanTransforms.cpp, just make them static.	2024-02-17 13:45:23 +00:00
Florian Hahn	0dacba3ad1	[VPlan] Handle truncating ICMPs in truncateToMinimalBWs. Update truncateToMinimalBitwidths to handle truncating ICMPs. For ICMPs, the new target type will be the same as the original type. In that case, only truncate the operands, but skip the extend. This is in line with what the original truncateToMinimalBitwidths did for compares. Fixes https://github.com/llvm/llvm-project/issues/81415.	2024-02-16 12:58:56 +00:00
Florian Hahn	debca7ee43	[VPlan] Move dropping of poison flags to VPlanTransforms. (NFC) Move collectPoisonGeneratingFlags from InnerLoopVectorizer to VPlanTransforms and also update its name. collectPoisonGeneratingFlags already directly drops poison-generating flags, not only collecting it. This means it is more appropriate to integerate it directly into the VPlan transform pipeline. The current implementation still calls back to legal to check if a block needs predication, which should be improved in the future.	2024-02-14 12:28:58 +00:00
Florian Hahn	cec24f0d7e	[VPlan] Update stale test after 9536a6286, fix formatting.	2024-01-31 13:45:38 +00:00
Florian Hahn	9536a6286e	[VPlan] Preserve original induction order when creating scalar steps. Update createScalarIVSteps to take an insert point as parameter. This ensures that the inserted scalar steps are in the same order as the recipes they replace (vs in reverse order as currently). This helps to reduce the diff for follow-up changes.	2024-01-31 13:31:28 +00:00
Florian Hahn	743946e8ef	[VPlan] Replace VPRecipeOrVPValue with VP2VP recipe simplification. (#76090 ) Move simplification of VPBlendRecipes from early VPlan construction to VPlan-to-VPlan based recipe simplification. This simplifies initial construction. Note that some in-loop reduction tests are failing at the moment, due to the reduction predicate being created after the reduction recipe. I will provide a patch for that soon. PR: https://github.com/llvm/llvm-project/pull/76090	2024-01-29 09:52:05 +00:00
Florian Hahn	0ab539fd67	[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113 ) Add a new recipe to model scalar cast instructions, without relying on an underlying instruction. This allows creating scalar casts, without relying on an underlying instruction (like the current VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it. Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as follow-up. PR: https://github.com/llvm/llvm-project/pull/78113	2024-01-26 11:13:05 +00:00
Florian Hahn	42fb1fac9e	[VPlan] Use DebugLoc from recipe in VPWidenCallRecipe (NFCI). Instead of using the debug location of the underlying instruction, use the debug location from the recipe. This removes an unneeded dependency of the underlying instruction.	2024-01-19 13:33:03 +00:00
Florian Hahn	abdb61f5fd	[VPlan] Introduce VPSingleDefRecipe. (#77023 ) This patch introduces a new common base class for recipes defining a single result VPValue. This has been discussed/mentioned at various previous reviews as potential follow-up and helps to replace various getVPSingleValue calls. PR: https://github.com/llvm/llvm-project/pull/77023	2024-01-19 10:27:53 +00:00
Florian Hahn	59d6f033a2	[VPlan] Support narrowing widened loads in truncateToMinimimalBitwidths. MinBWs may also contain widened load instructions, handle them by only narrowing their result. Fixes https://github.com/llvm/llvm-project/issues/77468	2024-01-12 13:14:13 +00:00
Florian Hahn	2ab5c47c87	[VPlan] Don't replace scalarizing recipe with VPWidenCastRecipe. Don't replace a scalarizing recipe with a VPWidenCastRecipe. This would introduce wide (vectorizing) recipes when interleaving only. Fixes https://github.com/llvm/llvm-project/issues/76986	2024-01-04 20:39:44 +00:00
Florian Hahn	cb56ba6350	[VPlan] Unswitch cond in replaceUsesWithIf in optimizeInductions (NFC) As suggested post-commit for a00227197, unswitch the condition in replaceUsesWithIf to simplify the check.	2023-12-15 20:26:36 +00:00
Florian Hahn	9277ef12c0	[VPlan] Remove stale comment from optimizeInductions (NFC). As suggested post-commit for a00227197, remove the stale comment, SetVector is no longer used here.	2023-12-15 17:35:13 +00:00
Alexey Bataev	056367bb19	[LV]Support dropping of nneg flag for zext widencast recipes. (#74112 ) Compiler crashes when the assertion triggered for zext nneg instruction, that checks that the instruction cannot produce poison. Changed the base class for widencast recipe to handle dropping nneg flag to avoid compiler crash.	2023-12-05 09:17:23 -05:00
Florian Hahn	cd4348349a	[VPlan] Sink cases where no truncate is needed in truncateMinimalBWs. MinBWs contains entries that specify the minimum required bitwidth. In some cases, the old and new bitwidths can be equal (see test case) and in those cases no truncations are needed, so skip those cases. Fixes #74307.	2023-12-04 15:35:54 +00:00
Florian Hahn	c890582912	[VPlan] Account for live-in entries in MinBW used by replicate recipes. In some cases MinBWs may contain entries for live-ins that are not used by VPWidenRecipe or VPWidenSelectRecipes. In those cases, the live-ins won't get processed, so make sure we include them in the count when used as operands in VPWidenCast and VPWidenSelectRecipe. Fixes https://github.com/llvm/llvm-project/issues/74231	2023-12-03 11:15:29 +00:00
Kazu Hirata	0008b9c0ac	[Vectorize] Fix an unused variable warning This patch fixes: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:912:16: error: unused variable 'OldResSizeInBits' [-Werror,-Wunused-variable]	2023-12-02 11:20:57 -08:00
Florian Hahn	70535f5e60	[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version. This patch replaces the IR based truncateToMinimalBitwidths with a VPlan version. This has 3 benefits: 1) the VPlan-based version is simpler; we don't need to implement special codegen for each supported instruction type like the IR based one. 2) Removes a dependency on the cost-model after VPlan execution and 3) Removes a use of getVPValue that uses underlying values after VPlan execution (See removed FIXME). Depends on D149081. Depends on D149079. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149903	2023-12-02 16:12:38 +00:00
Florian Hahn	6f3b88baa2	[VPlan] Move trunc ([s\|z]ext A) simplifications to simplifyRecipe. Split off simplification from D149903 as suggested. This should be effectively NFC until D149903 lands.	2023-11-16 21:17:10 +00:00
Florian Hahn	097ba5366c	[VPlan] Use VPTypeInfo in simplifyRecipes. Replace getTypeForVPValue with the recently added, more general VPTypeAnalysis.	2023-11-15 15:28:51 +00:00
Florian Hahn	a002271972	[VPlan] Add VPValue::replaceUsesWithIf (NFCI). Add replaceUsesWithIf helper and use it in a few places.	2023-11-06 16:08:22 +00:00
Florian Hahn	cff6652129	[VPlan] Handle VPValues without underlying values in getTypeForVPValue. Fixes a crash after 0c8e5be6fa08. Full type inference will be added in https://github.com/llvm/llvm-project/pull/69013	2023-10-27 13:34:54 +01:00
Florian Hahn	0c8e5be6fa	[VPlan] Simplify redundant trunc (zext A) pairs to A. Add simplification for redundant trunc(zext A) pairs. Generally apply a transform from D149903. Depends on D159200. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D159202	2023-10-22 11:41:38 +01:00
Mikael Holmen	9cecee97a0	[VPlan] Silence gcc Wparentheses warning [NFC] Without the fix gcc warns about ../lib/Transforms/Vectorize/VPlanTransforms.cpp:968:42: warning: suggest parentheses around '&&' within '\|\|' [-Wparentheses] 968 \| UseActiveLaneMaskForControlFlow && \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ 969 \| "DataAndControlFlowWithoutRuntimeCheck implies " \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 970 \| "UseActiveLaneMaskForControlFlow"); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	2023-09-28 12:04:26 +02:00
Florian Hahn	97687b7aea	[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation. This patch updates the mask creation code to always create compares of the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front when tail folding and introduce active-lane-mask as later transformation. This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count) the canonical form for tail-folding early on. Introducing more specific active-lane-mask recipes is treated as a VPlan-to-VPlan optimization. This has the advantage of keeping the logic (and complexity) of introducing active-lane-mask recipes in a single place, instead of spreading the logic out across multiple functions. It also simplifies initial VPlan construction and enables treating introducing EVL as similar optimization. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158779	2023-09-25 13:34:45 +01:00
Florian Hahn	f108c6cdc1	[VPlan] Fold (MUL A, 1) -> A as VPlan2VPlan transform. Add first VPlan-based recipe simplification to fold (MUL A, 1) -> A. Among other things, this enables additional simplifications after applying versioned strides, as follow up to D147783. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D159200	2023-09-18 21:45:34 +01:00
Florian Hahn	aacaf3d580	[VPlan] Simplify VPDerivedIV truncation handling (NFCI). Address post-commit simplification suggestion for 8a56179bcd8c: Replace IsTruncated by conditionally setting TruncResultTy only if truncation is required.	2023-08-14 17:33:10 +01:00
Florian Hahn	b223229e2c	[VPlan] Re-use existing step again after 34accad1feae. This fixes a failing RISCV test case that was missed originally.	2023-08-08 21:42:56 +01:00

1 2 3 4

172 Commits