llvm-project

Author	SHA1	Message	Date
Ramkumar Ramachandra	3fbae10faa	[VPlan] Improve code using m_APInt (NFC) (#161683)	2025-10-21 10:27:03 +01:00
Ramkumar Ramachandra	cc850b830c	[VPlan] Use VPlan::getRegion to shorten code (NFC) (#164287)	2025-10-21 10:25:07 +01:00
Florian Hahn	b4dbb1cdc4	[VPlan] Be more careful with CSE in replicate regions. (#162110) Recipes in replicate regions implicitly depend on the region's predicate. Limit CSE to recipes in the same block, when either recipe is in a replicate region. This allows handling VPPredInstPHIRecipe during CSE. If we perform CSE on recipes inside a replicate region, we may end up with 2 VPPredInstPHIRecipes sharing the same operand. This is incompatible with current VPPredInstPHIRecipe codegen, which re-sets the current value of its operand in VPTransformState. This can cause crashes in the added test cases. Note that this patch only modifies ::isEqual to check for replicating regions and not getHash, as CSE across replicating regions should be uncommon. Fixes https://github.com/llvm/llvm-project/issues/157314. Fixes https://github.com/llvm/llvm-project/issues/161974. PR: https://github.com/llvm/llvm-project/pull/162110	2025-10-20 10:53:47 +00:00
Ramkumar Ramachandra	086666de83	[VPlan] Improve code using drop_begin, append_range (NFC) (#163934)	2025-10-20 09:07:18 +01:00
Florian Hahn	b9ce7656e9	[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670) Add a new Unpack VPInstruction (name to be improved) to explicitly extract scalar values from vectors. Test changes are movements of the extracts: they are now generated together and also directly after the producer. Depends on https://github.com/llvm/llvm-project/pull/155102 (included in PR) PR: https://github.com/llvm/llvm-project/pull/155670	2025-10-19 18:49:05 +00:00
Florian Hahn	8769119027	[VPlan] Add VPRecipeBase::getRegion helper (NFC). Multiple places retrieve the region for a recipe. Add a helper to make the code more compact and clearer.	2025-10-18 21:25:34 +01:00
Ramkumar Ramachandra	b71515cc76	[VPlan] Extend licm to hoist assumes (#162636) Assumes are safe to hoist if they're guaranteed to execute, since they don't alias, and don't throw. This mirrors what the IR-LICM does.	2025-10-16 13:59:32 +00:00
Ramkumar Ramachandra	8f04f074c9	[VPlan] Clarify legality check in licm (NFC) (#162486) Recipes in licm are safe to hoist if the legality check passes, and the recipe is guaranteed to execute; the single successor of the vector preheader is the vector loop region. Clarify this in the code structure and comments.	2025-10-16 12:36:39 +01:00
Florian Hahn	4f23767852	[VPlan] Add m_FirstActiveLane matcher (NFC). Add m_FirstActiveLane, to slightly simplify pattern matching in preparation for https://github.com/llvm/llvm-project/pull/149042.	2025-10-15 18:55:26 +01:00
Florian Hahn	7f54fccc0e	[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056) When narrowing stores of a single-scalar, we currently use ExtractLastElement, which extracts the last element across all parts. This is not correct if the store's address is not uniform across all parts. If it is only uniform-per-part, the last lane per part must be extracted. Add a new ExtractLastLanePerPart opcode to handle this correctly. Most transforms apply to both ExtractLastElement and ExtractLastLanePerPart, with the only difference being their treatment during unrolling. Fixes https://github.com/llvm/llvm-project/issues/162498. PR: https://github.com/llvm/llvm-project/pull/163056	2025-10-15 13:46:09 +01:00
Florian Hahn	861519327a	[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020) The canonical IV is tied to region blocks; move getCanonicalIV there and update all users. PR: https://github.com/llvm/llvm-project/pull/163020	2025-10-15 12:48:35 +01:00
Florian Hahn	9bb0eedb59	[VPlan] Assign custom opcodes to recipes not mapping to IR opcodes. (#162267) We can perform CSE on recipes that do not directly map to Instruction opcodes. One example is VPVectorPointerRecipe. Currently this is handled by supporting them in ::canHandle, but currently that means that we return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently only works, because the only case we return std::nullopt and perform CSE is VPVectorPointerRecipe. But that does not work if we support more such recipes, like VPPredInstPHIRecipe (https://github.com/llvm/llvm-project/pull/162110). To fix this, return a custom opcode from getOpcodeOrIntrinsicID for recipes like VPVectorPointerRecipe, using the VPDefID after all regular instruction opcodes. PR: https://github.com/llvm/llvm-project/pull/162267	2025-10-13 11:16:14 +01:00
Ramkumar Ramachandra	946238e748	[VPlan] Strip VPDT's default constructor (NFC) (#162692)	2025-10-13 10:16:05 +00:00
Ramkumar Ramachandra	869c76dda3	[VPlan] Allow zero-operand m_BranchOn(Cond|Count) (NFC) (#162721)	2025-10-13 08:50:09 +01:00
Florian Hahn	4bf5ab4f9d	[VPlan] Set flags when constructing truncs using VPWidenCastRecipe. VPWidenCastRecipes with Trunc opcodes were missing the correct OpType for IR flags. Update createWidenCast to set the correct flags for truncs, and use it consistently. Fixes https://github.com/llvm/llvm-project/issues/162374.	2025-10-12 14:01:12 +01:00
Florian Hahn	4b8cac2bcc	[VPlan] Don't reset canonical IV start value. (#161589) Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide the canonical IV value, adjusted by the resume value from the main loop. This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0. PR: https://github.com/llvm/llvm-project/pull/161589	2025-10-11 22:19:05 +01:00
Ramkumar Ramachandra	107940f3be	[VPlan] Improve binary matchers in two places (NFC) (#162268)	2025-10-07 14:56:43 +01:00
Ramkumar Ramachandra	f7f49ee40e	[VPlan] Improve code around WidenPHI's constructor (NFC) (#162277)	2025-10-07 14:56:20 +01:00
Ramkumar Ramachandra	93073af121	[LV] Move 3 functions into VPlanTransforms (NFC) (#158644) Two of them are actually transforms, and the third is a dependent static.	2025-10-06 11:13:01 +01:00
Ramkumar Ramachandra	ff4aec5d3c	[VPlan] Deref VPlanPtr when passing to transform (NFC) (#161369) For uniformity with other transforms.	2025-10-03 09:47:35 +01:00
Ramkumar Ramachandra	1f225676f4	[VPlan] Improve code using VPlan::getFalse (NFC) (#161681)	2025-10-02 19:21:27 +01:00
Ramkumar Ramachandra	5843ffb149	[VPlan] Improve code using m_One (NFC) (#161686)	2025-10-02 18:14:43 +01:00
Nicolai Hähnle	11a4b2d950	Cleanup the LLVM exported symbols namespace (#161240) There's a pattern throughout LLVM of cl::opts being exported. That in itself is probably a bit unfortunate, but what's especially bad about it is that a lot of those symbols are in the global namespace. Move them into the llvm namespace. While doing this, I noticed some other variables in the global namespace and moved them as well.	2025-10-01 15:32:07 -07:00
Ramkumar Ramachandra	280abaf9da	[VPlan] Handle scalar-VF in transforms (NFC) (#161365)	2025-09-30 19:35:12 +01:00
Sam Tebbs	88658dbbc5	[LV] Add ExtNegatedMulAccReduction expression type (#160154) This PR adds the ExtNegatedMulAccReduction expression type for VPExpressionRecipe so that extend-multiply-accumulate reductions with a negated multiply can be bundled. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/156976 2. -> https://github.com/llvm/llvm-project/pull/160154 3. https://github.com/llvm/llvm-project/pull/147302	2025-09-30 10:10:37 +01:00
Florian Hahn	71be13a6f0	[VPlan] Rewrite VPExpandSCEVExprs in replaceSymbolicStrides. Extend replaceSymbolicStrides to also replace SCEVUnknowns in VPExpandSCEVExprs using the information from StridesMaps. This results in simpler SCEV expansions in some cases.	2025-09-28 21:55:31 +01:00
Florian Hahn	70a26da639	[VPlan] Set correct flags when creating and cloning VPWidenCastRecipe. Make sure that we set the correct wrap flags when creating new VPWidenCastRecipes for truncs and preserve the flags from the recipe directly when cloning, to make sure they are not dropped. Fixes https://github.com/llvm/llvm-project/issues/160396	2025-09-25 09:00:47 +01:00
Ramkumar Ramachandra	019913e4fa	[VPlan] Add WidenGEP::getSourceElementType (NFC) (#159029)	2025-09-22 10:02:08 +01:00
Ramkumar Ramachandra	b716d35388	[VPlanPatternMatch] Introduce m_ConstantInt (#159558)	2025-09-21 13:27:46 +01:00
Florian Hahn	50b9ca4dda	[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510) After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510	2025-09-18 19:25:05 +01:00
Ramkumar Ramachandra	0384f6c9db	[VPlanPatternMatch] Introduce match functor (NFC) (#159521) Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to introduce the VPlanPatternMatch version of the match functor to shorten some idioms. Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-18 10:36:12 +01:00
Ramkumar Ramachandra	46fcece2a8	[VPlan] Extend CSE to eliminate GEPs (#156699) The motivation for this patch is to close the gap between the VPlan-based CSE and the legacy CSE, to make it easier to remove the legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22 test failures, and after this patch, there are only 12 failures, and all of them seem to have a single root cause: VPlanTransforms::createInterleaveGroups() and VPInterleaveGroup::execute(). The improvements from this patch are of course welcome. While developing the patch, a miscompile was found when GEP source-element-types differ, and this has been fixed. Co-authored-by: Florian Hahn <flo@fhahn.com> Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-16 10:14:32 +00:00
Ramkumar Ramachandra	148a83543b	[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419)	2025-09-15 10:34:06 +00:00
Florian Hahn	ef7e03a2d1	[VPlan] Limit ExtractLastElem fold to recipes guaranteed single-scalar. vputils::isSingleScalar(A) may return true for recipes that produce only a single scalar value, but they could still end up as a vector instruction, because the recipe could not be converted to a single-scalar VPInstruction/VPReplicateRecipe. For now, only apply the fold for recipes guaranteed to produce a single value, i.e. single-scalar VPInstructions and VPReplicateRecipes. Fixes https://github.com/llvm/llvm-project/issues/158319.	2025-09-13 18:15:38 +01:00
Florian Hahn	b8eaceb39b	[VPlan] Explicitly replicate VPInstructions by VF. (#155102) Extend replicateByVF added in #142433 (aa240293190) to also explicitly unroll replicating VPInstructions. Now the only remaining case where we replicate for all lanes is VPReplicateRecipes in replicate regions. PR: https://github.com/llvm/llvm-project/pull/155102	2025-09-12 17:06:26 +01:00
Florian Hahn	1efa997317	[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars. Move handling of stores to single-scalar/uniform address from replicateByVF to narrowToSingleScalar.	2025-09-10 21:58:29 +01:00
Florian Hahn	055e4ff35a	[VPlan] Don't narrow op multiple times in narrowInterleaveGroups. Track which ops already have been narrowed, to avoid narrowing the same operation multiple times. Repeated narrowing will lead to incorrect results, because we could first narrow from an interleave group -> wide load, and then narrow the wide load -> single-scalar load. Fixes https://github.com/llvm/llvm-project/issues/156190.	2025-09-10 19:22:42 +01:00
Florian Hahn	c3e76b2770	[VPlan] Keep common flags during CSE. (#157664) During CSE, we don't have to drop all poison-generating flags on mis-match, we can keep the ones common on both recipes. PR: https://github.com/llvm/llvm-project/pull/157664	2025-09-10 10:20:48 +00:00
Stephen Tozer	d4f7995488	[VPlan] Use Unknown instead of empty location in VPlanTransforms (#157702) The default values for DebugLocs in LoopVectorizer/VPlan were recently updated from empty DebugLocs to DebugLoc::getUnknown, as part of the DebugLoc Coverage Tracking work. However, there are some cases where we also pass an explicit empty DebugLoc, in many cases as a filler argument. This patch updates all of these to `getUnknown` for now, until either valid locations or a suitable categorization can be assigned to each instead. This change is NFC outside of DebugLoc coverage tracking builds.	2025-09-10 10:33:58 +01:00
Mel Chen	4d9a7fa9ba	[VPlan] Remove dead recipes before simplifying blends (#157622) In simplifyBlends, when normalizing a blend recipe, the first mask that is used only by the blend and is not all-false is chosen, and its corresponding incoming value becomes the initial value, with the others blended into it. At the same time, the mask that is chosen can be eliminated. However, a multi-user mask might be used by a dead recipe, which prevents this optimization. This patch moves removeDeadRecipes before simplifyBlends to eliminate dead recipes, allowing simplifyBlends to remove more dead masks.	2025-09-10 08:03:18 +00:00
Florian Hahn	c4b17bf9ed	[VPlan] Slightly extend ExtractLastElement fold to single-scalars. Update ExtractLastElement fold to support single scalar recipes, if all their users only use scalars.	2025-09-09 22:08:08 +01:00
Florian Hahn	132bacde22	[VPlan] Also allow extracts as users when converting to single scalars. Extracts technically do not use scalars, but vectors, but if the operand is a single scalar we do not need a vector and they should not block forming single scalars.	2025-09-08 22:11:39 +01:00
Luke Lau	3f9e0736ac	[VPlan] Move findCommonEdgeMask optimization to simplifyBlends (#156304) Following up from #150368, this moves folding common edge masks into simplifyBlends. One test in uniform-blend.ll ended up regressing but after looking at it closely, it came from a weird (x && !x) edge mask. So I've just included a simplification in this PR to fold that to false.	2025-09-05 01:29:22 +00:00
Ramkumar Ramachandra	e4c0b3e111	[VPlan] Simplify x && false -> false, x | 0 -> x (#156345) The OR x, 0 -> x simplification has been introduced to avoid regressions.	2025-09-04 10:29:59 +01:00
Luke Lau	c33ccfa52b	[VPlan] Reassociate (x & y) & z -> x & (y & z) (#155383) This PR reassociates logical ands in order to enable more simplifications. The driving motivation for this is that with tail folding all blocks inside the loop body will end up using the header mask. However this can end up nestled deep within a chain of logical ands from other edges. Typically the header mask will be a leaf nested in the LHS, e.g. (headermask & y) & z. So pulling it out allows it to be simplified further, e.g. allows it to be optimised away to VP intrinsics with EVL tail folding.	2025-09-03 01:09:19 +00:00
Ramkumar Ramachandra	d8fd511480	[VPlan] Introduce CSE pass (#151872) Introduce a simple common-subexpression-elimination pass at the VPlan-level, running late during the execution of the VPlan. The long-term vision is to get rid of the legacy non-VPlan-based cse routine in LV, but this patch doesn't yet fully subsume it.	2025-09-02 12:23:29 +01:00
Sam Tebbs	37127f74f4	[LV] Bundle sub reductions into VPExpressionRecipe (#147255) This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. -> https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-09-01 17:25:01 +01:00
Mel Chen	13357e8a12	[LV][EVL] Support interleaved access with tail folding by EVL (#152070) The InterleavedAccess pass already supports transforming vector-predicated (vp) load/store intrinsics. With this patch, we start enabling interleaved access under tail folding by EVL. This patch introduces a new base class, VPInterleaveBase, and a concrete class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and the new VPInterleaveEVLRecipe inherit from and implement VPInterleaveBase. Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL operand to emit vp.load/vp.store intrinsics. Currently, tail folding by EVL is only supported for scalable vectorization. Therefore, VPInterleaveEVLRecipe will only emit interleave/deinterleave intrinsics. Reverse accesses are not yet implemented, as masked reverse interleaved access under tail folding is not yet supported. Fixed #123201	2025-09-01 21:20:06 +08:00
Luke Lau	eb7f6a5f8a	[VPlan] Simplify (x && y) || (x && z) -> x && (y || z) (#156308) Split off from #155383, since it turns out this has a diff on its own.	2025-09-01 21:12:23 +08:00
Kerry McLaughlin	f0e9bba024	[LoopVectorize] Generate wide active lane masks (#147535) This patch adds a new flag (-enable-wide-lane-mask) which allows LoopVectorize to generate wider-than-VF active lane masks when it is safe to do so (i.e. the mask is used for data and control flow). The transform in extractFromWideActiveLaneMask creates vector extracts from the first active lane mask in the header & loop body, modifying the active lane mask phi operands to use the extracts. An additional operand is passed to the ActiveLaneMask instruction, the value of which is used as a multiplier of VF when generating the mask. By default this is 1, and is updated to UF by extractFromWideActiveLaneMask. The motivation for this change is to improve interleaved loops when SVE2.1 is available, where we can make use of the whilelo instruction which returns a predicate pair. This is based on a PR that was created by @momchil-velikov (#81140) and contains tests which were added there.	2025-09-01 13:53:30 +01:00
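
Several entries in this log (3f9e0736ac, e4c0b3e111, c33ccfa52b, eb7f6a5f8a) add mask folds that are stated as Boolean rewrites in their subjects. As a rough sanity check, the sketch below exhaustively compares both sides of each rewrite over plain two-valued Booleans. It is not VPlan code: the names `identities` and `check_identities` are hypothetical, and the model deliberately ignores poison, which the real logical-and in LLVM IR has to account for.

```python
from itertools import product


def identities(x, y, z):
    """Return (name, lhs, rhs) triples for the folds named in the commit subjects."""
    return [
        ("(x && y) || (x && z) -> x && (y || z)",
         (x and y) or (x and z), x and (y or z)),
        ("(x & y) & z -> x & (y & z)",
         (x and y) and z, x and (y and z)),
        ("x && !x -> false", x and (not x), False),
        ("x && false -> false", x and False, False),
        ("x | 0 -> x", x or False, x),
    ]


def check_identities():
    """Try every assignment of x, y, z and collect any fold whose sides differ."""
    failures = []
    for x, y, z in product([False, True], repeat=3):
        for name, lhs, rhs in identities(x, y, z):
            if lhs != rhs:
                failures.append((name, x, y, z))
    return failures
```

An empty list from `check_identities()` means every rewrite holds for all eight input combinations in this simplified model.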


503 Commits
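
The "Keep common flags during CSE" entry (c3e76b2770) says that when two otherwise identical recipes disagree on poison-generating flags, only the flags common to both need to survive. A minimal sketch of that idea, assuming flags can be modeled as a set of strings such as "nuw" and "nsw" (the function name `merge_flags_for_cse` is hypothetical and this is not the LLVM API):

```python
def merge_flags_for_cse(kept_flags, removed_flags):
    """Return the flags that are safe on the surviving recipe.

    A flag is kept only if both the surviving recipe and the recipe being
    replaced carried it, i.e. the result is the set intersection.
    """
    return frozenset(kept_flags) & frozenset(removed_flags)


# The surviving add has nuw+nsw, the duplicate only has nsw: keep nsw alone.
merged = merge_flags_for_cse({"nuw", "nsw"}, {"nsw"})
```

Intersecting rather than clearing is the conservative choice described in the commit message: the surviving value then makes no promise that one of the original recipes did not already make.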