llvm-project

Author	SHA1	Message	Date
Florian Hahn	7c34848ae1	[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247 ) This patch implements a transform to hoist single-scalar replicated loads with invariant addresses out of the vector loop to the preheader when scoped noalias metadata proves they cannot alias with any stores in the loop. This enables hoisting of loads we can prove do not alias any stores in the loop due to memory runtime checks added during vectorization. PR: https://github.com/llvm/llvm-project/pull/166247	2025-11-18 09:35:48 +00:00
Ramkumar Ramachandra	ef023cae38	Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354 ) Changes: The previous patch had to be reverted due to a mismatching-OpType assert in cse. The reduced-test has now been added corresponding to a RVV pointer-induction, and the pointer-induction case has been updated to use createOverflowingBinaryOp. While at it, record VPIRFlags in VPWidenInductionRecipe.	2025-11-17 13:44:25 +00:00
Alex Bradbury	f2336d4c7e	Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080 ) Reverts llvm/llvm-project#163538 This is causing build failures on the two-stage RVV buildbots. e.g. https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a reproducer and more information at https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822 This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.	2025-11-14 16:11:48 +00:00
Ramkumar Ramachandra	355e0f94af	[VPlan] Expand WidenInt inductions with nuw/nsw (#163538 ) While at it, record VPIRFlags in VPWidenInductionRecipe.	2025-11-14 12:10:55 +00:00
Luke Lau	851f8f7984	[VPlan] Disable partial reductions again with EVL tail folding (#167863 ) VPPartialReductionRecipe doesn't yet support an EVL variant, and we guard against this by not calling convertToAbstractRecipes when we're tail folding with EVL. However recently some things got shuffled around which means we may detect some scaled reductions in collectScaledReductions and store them in ScaledReductionMap, where outside of convertToAbstractRecipes we may look them up and start e.g. adding a scale factor to an otherwise regular VPReductionPHI. This fixes it by skipping collectScaledReductions, and fixes #167861	2025-11-14 06:30:12 +00:00
Florian Hahn	a6edeedbfa	Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )" This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b. This appears to be causing some runtime failures on RISCV https://lab.llvm.org/buildbot/#/builders/210/builds/5221	2025-11-13 22:34:55 +00:00
Luke Lau	c0f7d51e8a	[VPlan] Simplify ExplicitVectorLength(%AVL) -> %AVL when AVL <= VF (#167647 ) [`llvm.experimental.get.vector.length`](https://llvm.org/docs/LangRef.html#id2399) has the property that if the AVL (%cnt) is less than or equal to VF (%max_lanes) then the return value is just AVL. This patch uses SCEV to simplify this in optimizeForVFAndUF, and adds `ExplicitVectorLength` to `VPInstruction::opcodeMayReadOrWriteFromMemory` so it gets removed once dead.	2025-11-13 13:17:01 +00:00
Florian Hahn	62d1a080e6	[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 ) Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-12 15:11:00 +00:00
Luke Lau	02c68b3ef7	[VPlan] Plumb scalable register size through narrowInterleaveGroups (#167505 ) On RISC-V narrowInterleaveGroups doesn't kick in because the wrong VectorRegWidth is passed to isConsecutiveInterleaveGroup. narrowInterleaveGroups is always passed the RGK_FixedWidthVector register size, but on RISC-V the RGK_ScalableVector size is twice as large because we want to use LMUL 2. This causes the `GroupSize == VectorRegWidth` check to fail. This fixes it by using the scalable register size whenever the VF is scalable and plumbing it through as a potentially scalable TypeSize. Note that this only makes a difference when tail folding is disabled, as narrowInterleaveGroups can't handle EVL based IVs yet.	2025-11-12 11:14:53 +00:00
Mel Chen	68a4af6acc	[LV][EVL] Replace VPInstruction::Select with vp.merge for predicated div/rem (#154072 ) Since div/rem operations don’t support a mask operand, the lanes of the divisor that are masked out are currently replaced with 1 using VPInstruction::Select before the predicated div/rem operation. This patch replaces ``` VPInstruction::Select(logical_and(header_mask, conditional_mask), LHS, RHS) ``` with ``` vp.merge(conditional_mask, LHS, RHS, EVL) ``` so that the header mask can be replaced by EVL in this usage scenario when tail folding with EVL.	2025-11-12 08:03:57 +00:00
Ramkumar Ramachandra	c8c328406c	Revert "[VPlan] Handle WidenGEP in narrowToSingleScalars" (#167509 ) This reverts commit fdd52f5fe130fb8b98f4aed3d15aa0789cce6b40, as it causes buildbot failures. This will give us time to investigate the failure. https://lab.llvm.org/buildbot/#/builders/210/builds/5160	2025-11-11 14:29:28 +00:00
Ramkumar Ramachandra	fdd52f5fe1	[VPlan] Handle WidenGEP in narrowToSingleScalars (#166740 ) This allows us to strip a special case in VPWidenGEP::execute.	2025-11-11 10:33:55 +00:00
Ramkumar Ramachandra	c2d4c7c18b	[VPlan] Permit more users in narrowToSingleScalars (#166559 ) narrowToSingleScalarRecipes can permit users that are WidenStore, or a VPInstruction that has a suitable opcode. This is a generalization and extension of the existing code.	2025-11-10 17:03:14 +00:00
Ramkumar Ramachandra	2d1d5fe78e	[VPlan] Simplify branch-cond with getVectorTripCount (#155604 ) Call getVectorTripCount first, and call getTripCount failing that, in simplifyBranchConditionForVFAndUF, to simplify missed cases. While at it, strip the dead check for a zero TC.	2025-11-10 10:43:37 +00:00
Luke Lau	bac427a0f6	[VPlan] Remove no-longer-needed EVL VPlan debug output tests. NFC (#166158 ) These VPlan debug output tests were added in https://github.com/llvm/llvm-project/pull/108351 and https://github.com/llvm/llvm-project/pull/110412, whenever we used to convert regular widening recipes to VP intrinsics during EVL tail folding. Nowadays we don't convert these recipes so there's nothing really to be gained from testing them. This removes the VPlan tests since an upcoming patch slightly perturbs these VPlans and removing them seems easier than manually going through and updating them all. I've kept behind the LLVM IR/UTC counterparts in `tail-folding-{cast,call}-intrinsics.ll`, since even though they also aren't really testing anything useful at least they're easy to update.	2025-11-07 16:34:10 +08:00
Florian Hahn	b0b4616790	[VPlan] Handle single-scalar conds in VPWidenSelectRecipe. (#165506 ) Generalize VPWidenSelectRecipe codegen to consider single-scalar conditions instead of just loop-invariant ones. If the condition is a single-scalar, we can simply use a scalar condition. PR: https://github.com/llvm/llvm-project/pull/165506	2025-11-05 22:11:29 +00:00
Florian Hahn	af9a4263a1	[LAA] Only use inbounds/nusw in isNoWrap if the GEP is dereferenced. (#161445 ) Update isNoWrap to only use the inbounds/nusw flags from GEPs that are guaranteed to be dereferenced on every iteration. This fixes a case where we incorrectly determine no dependence. I think the issue is isolated to code that evaluates the resulting AddRec at BTC, just using it to compute the distance between accesses should still be fine; if the access does not execute in a given iteration, there's no dependence in that iteration. But isolating the code is not straight-forward, so be conservative for now. The practical impact should be very minor (only one loop changed across a corpus with 27k modules from large C/C++ workloads). Fixes https://github.com/llvm/llvm-project/issues/160912. PR: https://github.com/llvm/llvm-project/pull/161445	2025-11-04 17:08:12 +00:00
Sam Tebbs	6b19a546aa	[LV] Bundle partial reductions inside VPExpressionRecipe (#147302 ) This PR bundles partial reductions inside the VPExpressionRecipe class. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/156976 4. https://github.com/llvm/llvm-project/pull/160154 5. -> https://github.com/llvm/llvm-project/pull/147302 6. https://github.com/llvm/llvm-project/pull/162503 7. https://github.com/llvm/llvm-project/pull/147513	2025-10-23 11:18:55 +00:00
Florian Hahn	35b9f20449	[LV] Check for TruncInsts in canTruncateToMinimalBitwidth. TruncInst must truncate at most to their destination. Return false if MinBWs contains a destination size > the trunc result type size. Fixes https://github.com/llvm/llvm-project/issues/162688.	2025-10-20 22:31:16 +01:00
Nikita Popov	573ca36753	[IR] Replace alignment argument with attribute on masked intrinsics (#163802 ) The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).	2025-10-20 08:50:09 +00:00
Florian Hahn	b9ce7656e9	[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670 ) Add a new Unpack VPInstruction (name to be improved) to explicitly extract scalar values from vectors. Test changes are movements of the extracts: they are now generated together and also directly after the producer. Depends on https://github.com/llvm/llvm-project/pull/155102 (included in PR) PR: https://github.com/llvm/llvm-project/pull/155670	2025-10-19 18:49:05 +00:00
Nikita Popov	8fa4a1029c	[LoopVectorize] Regenerate test checks (NFC)	2025-10-16 18:21:42 +02:00
Florian Hahn	4bf5ab4f9d	[VPlan] Set flags when constructing truncs using VPWidenCastRecipe. VPWidenCastRecipes with Trunc opcodes were missing the correct OpType for IR flags. Update createWidenCast to set the correct flags for truncs, and use it consistently. Fixes https://github.com/llvm/llvm-project/issues/162374.	2025-10-12 14:01:12 +01:00
Ramkumar Ramachandra	2a02d57efb	[IR] Mark vector intrinsics speculatable (#162334 ) The vector intrinsics in question have no undefined behavior, and have no other effect besides returning the result: they should hence be marked speculatable.	2025-10-09 09:41:59 +01:00
Florian Hahn	8907b6d393	[VPlan] Remove original loop blocks if dead. (#155497 ) Build on top of https://github.com/llvm/llvm-project/pull/154510 to completely remove the blocks of dead scalar loops. Depends on https://github.com/llvm/llvm-project/pull/154510. PR: https://github.com/llvm/llvm-project/pull/155497	2025-10-01 16:53:59 +00:00
Florian Hahn	41a2dfc0d7	[VPlan] Allow multiple users of (broadcast %evl). CSE may replace multiple redundant broadcasts of EVL with a single broadcast which may have more than 1 user. Adjust the verifier to allow this. Fixes a crash when building llvm-test-suite with EVL: https://lab.llvm.org/buildbot/#/builders/210/builds/3303	2025-09-27 21:47:54 +01:00
Florian Hahn	8460dbb450	[VPlan] Mark VPInstruction::Broadcast as not reading/writing memory. This enables additional DCE/CSE opportunities and ensures that we don't end up with multiple redundant users of a VPInstruction using EVL. It fixes a verifier error in the added test_3_inductions test.	2025-09-27 20:48:42 +01:00
Florian Hahn	78af056137	[VPlan] Run CSE closer to VPlan::execute. (#160572 ) Additional CSE opportunities are exposed after converting to concrete recipes/dissolving regions and materializing various expressions. Run CSE later, to capitalize on some of the late opportunities. PR: https://github.com/llvm/llvm-project/pull/160572	2025-09-26 09:38:58 +00:00
Luke Lau	aa6a33ae65	[LV] Remove EVLIndVarSimplify pass (#160454 ) Initially this was needed to replace the fixed-step canonical IV with the variable-step EVL IV, but this was eventually superseded by the loop vectorizer doing this transform itself in #147222. The pass was then removed from the RISC-V pipeline in #151483 and the loop vectorizer stopped emitting the metadata used by the pass in #155760, so now there's no users of it.	2025-09-25 12:18:20 +08:00
Shih-Po Hung	0d22f8344a	[LV][EVL] Remove metadata on EVL vectorized loops (#155760 ) This patch removes the metadata emission for EVL‑vectorized loops, since there is no current in-tree consumer: 1) after VPlan performs canonical IV replacement #147222 and 2) RISCV dropped EVLIndVarSimplifyPass #151483, which was the only user of this metadata.	2025-09-23 07:39:33 +08:00
Florian Hahn	addfdb5273	[LV] Skip select cost for invariant divisors in legacy cost model. For UDiv/SDiv with invariant divisors, the created selects will be hoisted out. Don't compute their cost for each iteration, to match the more accurate VPlan-based cost modeling. Fixes https://github.com/llvm/llvm-project/issues/159402.	2025-09-21 15:08:50 +01:00
Florian Hahn	50b9ca4dda	[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510 ) After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510	2025-09-18 19:25:05 +01:00
Sander de Smalen	17e008db17	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637 ) The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.	2025-09-17 11:44:47 +01:00
Ramkumar Ramachandra	46fcece2a8	[VPlan] Extend CSE to eliminate GEPs (#156699 ) The motivation for this patch is to close the gap between the VPlan-based CSE and the legacy CSE, to make it easier to remove the legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22 test failures, and after this patch, there are only 12 failures, and all of them seem to have a single root cause: VPlanTransforms::createInterleaveGroups() and VPInterleaveGroup::execute(). The improvements from this patch are of course welcome. While developing the patch, a miscompile was found when GEP source-element-types differ, and this has been fixed. Co-authored-by: Florian Hahn <flo@fhahn.com> Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-16 10:14:32 +00:00
Luke Lau	4bb250d6a3	[VPlan] Always consider register pressure on RISC-V (#156951 ) Stacked on #156923 In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because whilst the largest element type is i8, we generate a bunch of pointer vectors for gathers and scatters. This means the VF chosen is quite high e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x i64> m8 registers for the pointers. This was briefly fixed by #132190 where we computed register pressure in VPlan and used it to prune VFs that were likely to spill. The legacy cost model wasn't able to do this pruning because it didn't have visibility into the pointer vectors that were needed for the gathers/scatters. However VF pruning was restricted again to just the case when max bandwidth was enabled in #141736 to avoid an AArch64 regression, and restricted again in #149056 to only prune VFs that had max bandwidth enabled. On RISC-V we take advantage of register grouping for performance and choose a default of LMUL 2, which means there are 16 registers to work with – half the number as SVE, so we encounter higher register pressure more frequently. As such, we likely want to always consider pruning VFs with high register pressure and not just the VFs from max bandwidth. This adds a TTI hook to opt into this behaviour for RISC-V which fixes the motivating godbolt example above. When last checked this significantly reduces the number of spills on SPEC CPU 2017, up to 80% on 538.imagick_r.	2025-09-12 06:21:54 +00:00
Elvis Wang	3e898bc40f	[LV] Fix cost misaligned when gather/scatter w/ addr is uniform. (#157387 ) This patch fixes the assertion triggered when `isUniform` (from the legacy model) and `isSingleScalar` (from the VPlan-based model) mismatch. The simplified test that causes the assertion: ``` loop: loadA = load %a => %a is loop invariant. loadB = load %loadA ... ``` The legacy cost model cannot determine that the address of `%loadB` is uniform, but in the VPlan-based cost model the addresses of both `%loadA` and `%loadB` are single scalars. Full test that caused the crash: https://llvm.godbolt.org/z/zEG8YKjqh. --------- Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-11 07:49:54 +08:00
Florian Hahn	9b1b93766d	Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9. Recommit with a fix for MSan failure ( https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a set to track deleted values. Using the InsertedInstructions set is not sufficient, as it uses asserting value handles as keys, which may dereference the value at construction. Original message: Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-09 09:47:41 +01:00
Florian Hahn	eeb43806eb	Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f. Triggers MSan errors in some configurations, e.g. https://lab.llvm.org/buildbot/#/builders/169/builds/14799	2025-09-08 14:52:28 +01:00
Florian Hahn	528b13df57	[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 ) Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-08 10:53:20 +01:00
Florian Hahn	b50ad945dd	[InstSimplify] Simplify extractvalue (umul_with_overflow(x, 1)). (#157307 ) Look through extractvalue to simplify umul_with_overflow where one of the operands is 1. This removes some redundant instructions when expanding SCEVs, which in turn makes the runtime check cost estimate more accurate, reducing the minimum iterations for which vectorization is profitable. PR: https://github.com/llvm/llvm-project/pull/157307	2025-09-07 18:32:40 +01:00
Luke Lau	3f9e0736ac	[VPlan] Move findCommonEdgeMask optimization to simplifyBlends (#156304 ) Following up from #150368, this moves folding common edge masks into simplifyBlends. One test in uniform-blend.ll ended up regressing but after looking at it closely, it came from a weird (x && !x) edge mask. So I've just included a simplification in this PR to fold that to false.	2025-09-05 01:29:22 +00:00
Ramkumar Ramachandra	e4c0b3e111	[VPlan] Simplify x && false -> false, x | 0 -> x (#156345 ) The OR x, 0 -> x simplification has been introduced to avoid regressions.	2025-09-04 10:29:59 +01:00
Mel Chen	2f5500e4cf	[LV] Improve the test coverage for strided access. nfc (#155981 ) Add tests for strided access with UF > 1, and introduce a new test case @constant_stride_reinterpret.	2025-09-03 10:19:36 +00:00
Luke Lau	c33ccfa52b	[VPlan] Reassociate (x & y) & z -> x & (y & z) (#155383 ) This PR reassociates logical ands in order to enable more simplifications. The driving motivation for this is that with tail folding all blocks inside the loop body will end up using the header mask. However this can end up nestled deep within a chain of logical ands from other edges. Typically the header mask will be a leaf nested in the LHS, e.g. (headermask & y) & z. So pulling it out allows it to be simplified further, e.g. allows it to be optimised away to VP intrinsics with EVL tail folding.	2025-09-03 01:09:19 +00:00
Ramkumar Ramachandra	d8fd511480	[VPlan] Introduce CSE pass (#151872 ) Introduce a simple common-subexpression-elimination pass at the VPlan-level, running late during the execution of the VPlan. The long-term vision is to get rid of the legacy non-VPlan-based cse routine in LV, but this patch doesn't yet fully subsume it.	2025-09-02 12:23:29 +01:00
Elvis Wang	8fdae0c7da	[Reland] "[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. #149955 " (#156386 ) This patch implements `getAddressComputationCost()` in RISCV TTI, which makes gather/scatter with address calculation more expensive than the strided cost. Note that the only user of `getAddressComputationCost()` with a vector type is `VPWidenMemoryRecipe::computeCost()`, so this patch changes some LV tests. I've checked the test changes in LV and they seem to fall into two groups. * gather/scatter with a uniform vector ptr, which seems optimizable to masked.load. * accesses that can be optimized to strided load/store. ---- After #155739 landed, the assertion (cost mis-aligned) is fixed. I've tested llvm-test-suite w/ rva23u64 and rva23u64_zvl1024b locally and no assertion occurred.	2025-09-02 11:43:27 +08:00
Elvis Wang	7997a79be6	[LV] Align legacy cost model to vplan-based model for gather/scatter w/ uniform addr. (#155739 ) This patch checks whether the addr is uniform in the legacy cost model, to align it with the VPlan-based cost model after #150371. This patch fixes an llvm-test-suite assertion (https://lab.llvm.org/buildbot/#/builders/210/builds/1935) caused by the cost models being misaligned after #149955 under RISCV. I've tested this patch (on top of #149955) on the llvm-test-suite locally with the options that crashed, `rva23u64` and `rva23u64_zvl1024b`, and it builds successfully. Since this fix changes LV, I think it is better to create a separate PR for it.	2025-09-02 09:11:45 +08:00
Mel Chen	13357e8a12	[LV][EVL] Support interleaved access with tail folding by EVL (#152070 ) The InterleavedAccess pass already supports transforming vector-predicated (vp) load/store intrinsics. With this patch, we start enabling interleaved access under tail folding by EVL. This patch introduces a new base class, VPInterleaveBase, and a concrete class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and the new VPInterleaveEVLRecipe inherit from and implement VPInterleaveBase. Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL operand to emit vp.load/vp.store intrinsics. Currently, tail folding by EVL is only supported for scalable vectorization. Therefore, VPInterleaveEVLRecipe will only emit interleave/deinterleave intrinsics. Reverse accesses are not yet implemented, as masked reverse interleaved access under tail folding is not yet supported. Fixed #123201	2025-09-01 21:20:06 +08:00
Luke Lau	eb7f6a5f8a	[VPlan] Simplify (x && y) || (x && z) -> x && (y || z) (#156308 ) Split off from #155383, since it turns out this has a diff on its own.	2025-09-01 21:12:23 +08:00
Luke Lau	c9faedd760	[VPlan] Fold common edges away in convertPhisToBlends (#150368 ) If a phi is widened with tail folding, all of its predecessors will have a mask of the form %x = logical-and %active-lane-mask, %foo %y = logical-and %active-lane-mask, %bar %z = logical-and %active-lane-mask, %baz ... We can remove the common %active-lane-mask from all of these edge masks, which in turn allows us to simplify a lot of VPBlendRecipes. In particular, it allows the header mask to be removed in selects with EVL tail folding, improving RISC-V codegen on SPEC CPU 2017 for 525.x264_r, and supersedes #147243. This also allows us to remove VPBlendRecipe and directly emit VPInstruction::Select in another patch.	2025-09-01 07:03:33 +00:00
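
Several of the mask-simplification commits listed above (#155383, #156304, #156308, #156345) rest on ordinary Boolean identities. The following is a minimal sketch, not taken from the LLVM sources, that checks those identities exhaustively. It assumes plain Python booleans as a stand-in for individual mask lanes and ignores poison semantics, which VPlan's logical-and has to respect; the helper name `holds_for_all` and the `IDENTITIES` table are hypothetical.

```python
from itertools import product


def holds_for_all(identity, arity):
    """Return True if identity(*bits) is True for every boolean assignment."""
    return all(identity(*bits) for bits in product([False, True], repeat=arity))


# Each entry maps a rewrite quoted from a commit subject to a predicate that
# compares the left-hand side with the right-hand side, plus its arity.
# Plain booleans model a single mask lane (an assumption of this sketch).
IDENTITIES = {
    "(x & y) & z -> x & (y & z)": (
        lambda x, y, z: ((x and y) and z) == (x and (y and z)), 3),
    "(x && y) || (x && z) -> x && (y || z)": (
        lambda x, y, z: ((x and y) or (x and z)) == (x and (y or z)), 3),
    "x && !x -> false": (
        lambda x: (x and not x) == False, 1),
    "x && false -> false": (
        lambda x: (x and False) == False, 1),
    "x | 0 -> x": (
        lambda x: (x or False) == x, 1),
}

# Evaluate every identity over its full truth table.
results = {name: holds_for_all(fn, arity) for name, (fn, arity) in IDENTITIES.items()}

for name, ok in results.items():
    print(f"{name}: {'holds' if ok else 'FAILS'}")
```

A truth-table check like this only confirms the rewrites at the level of two-valued logic; whether a given rewrite is legal inside VPlan also depends on details the commit messages do not spell out here.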

495 Commits