llvm-project

Author	SHA1	Message	Date
Kerry McLaughlin	de3de3f143	[LV] Consider interleaving when -enable-wide-lane-mask=true (#163387 ) Currently the only way to enable the use of wide active lane masks is to pass -enable-wide-lane-mask and force both interleaving & tail-folding with additional flags. This patch changes selectInterleaveCount to consider interleaving if wide lane masks were requested, although the feature remains off by default.	2025-11-11 11:46:59 +00:00
Ramkumar Ramachandra	fdd52f5fe1	[VPlan] Handle WidenGEP in narrowToSingleScalars (#166740 ) This allows us to strip a special case in VPWidenGEP::execute.	2025-11-11 10:33:55 +00:00
Florian Hahn	8b1cc2d5f5	[VPlan] Update canNarrowLoad to check WidenMember0's op first (NFCI). This hardens the code to check based on WideMember0's operands. This ensures each call will go through the same check. Should be NFC currently but needed when generalizing in follow-up patches.	2025-11-10 22:18:34 +00:00
Ramkumar Ramachandra	c2d4c7c18b	[VPlan] Permit more users in narrowToSingleScalars (#166559 ) narrowToSingleScalarRecipes can permit users that are WidenStore, or a VPInstruction that has a suitable opcode. This is a generalization and extension of the existing code.	2025-11-10 17:03:14 +00:00
Ramkumar Ramachandra	2d1d5fe78e	[VPlan] Simplify branch-cond with getVectorTripCount (#155604 ) Call getVectorTripCount first, and call getTripCount failing that, in simplifyBranchConditionForVFAndUF, to simplify missed cases. While at it, strip the dead check for a zero TC.	2025-11-10 10:43:37 +00:00
Florian Hahn	17ad8480f8	[VPlan] Convert redundant isSingleScalar check into assert (NFC). Follow-up to post-commit suggestion in https://github.com/llvm/llvm-project/pull/165506. C must be a single-scalar, turn check into assert.	2025-11-07 20:04:25 +00:00
Ramkumar Ramachandra	eab44600fb	[VPlan] Rename onlyFirst(Lane|Part)Used (NFC) (#166562 ) Rename onlyFirst(Lane|Part)Used to usesFirst(Lane|Part)Only, in line with usesScalars, for clarity.	2025-11-06 10:07:58 +00:00
Mel Chen	d1874047f5	[VPlan] Retrieve alignment from Load/StoreInst in constructors. nfc (#165722 ) This patch removes the explicit Alignment parameter from VPWidenLoadRecipe and VPWidenStoreRecipe constructors. Instead, these recipes now directly retrieve the alignment from their LoadInst/StoreInst.	2025-11-06 09:02:04 +00:00
Florian Hahn	9fc8ddd2c8	[VPlan] Move code narrowing ops feeding an interleave group to helper (NFCI) Move and combine the code to narrow ops feeding interleave groups to a single unified static helper. NFC, as legalization logic has not changed.	2025-11-05 22:54:52 +00:00
Florian Hahn	b0b4616790	[VPlan] Handle single-scalar conds in VPWidenSelectRecipe. (#165506 ) Generalize VPWidenSelectRecipe codegen to consider single-scalar conditions instead of just loop-invariant ones. If the condition is a single-scalar, we can simply use a scalar condition. PR: https://github.com/llvm/llvm-project/pull/165506	2025-11-05 22:11:29 +00:00
Ramkumar Ramachandra	1de55c9693	[VPlan] Avoid sinking allocas in sinkScalarOperands (#166135 ) Use cannotHoistOrSinkRecipe to forbid sinking allocas.	2025-11-05 13:06:24 +00:00
Ramkumar Ramachandra	0a95a86634	[VPlan] Fix first-lane comment in sinkScalarOperands (NFC) (#166347 ) To follow-up on a post-commit review.	2025-11-04 12:02:58 +00:00
Ramkumar Ramachandra	0cae0af520	[VPlan] Shorten insert-idiom in sinkScalarOperands (NFC) (#166343 ) To follow-up on a post-commit review.	2025-11-04 10:04:57 +00:00
Mel Chen	40a042e49c	[VPlanTransform] Specialize simplifyRecipe for VPSingleDefRecipe pointer. nfc (#165568 ) The function simplifyRecipe now takes a VPSingleDefRecipe pointer since it only simplifies single-def recipes for now.	2025-11-03 09:00:54 +00:00
Luke Lau	97d4e96cc5	[VPlan] Perform optimizeMaskToEVL in terms of pattern matching (#155394 ) Currently in optimizeMaskToEVL we convert every widened load, store or reduction to a VP predicated recipe with EVL, regardless of whether or not it uses the header mask. So currently we have to be careful when working on other parts of VPlan to make sure that the EVL transform doesn't break or transform something incorrectly, because it's not a semantics preserving transform. Forgetting to do so has caused miscompiles before, like the case that was fixed in #113667. This PR rewrites it to work in terms of pattern matching, so it now only converts a recipe to a VP predicated recipe if it is exactly masked with the header mask. After this the transform should be a true optimisation and not change any semantics, so it shouldn't miscompile things if other parts of VPlan change. This fixes #152541, and allows us to move addExplicitVectorLength into tryToBuildVPlanWithVPRecipes in #153144. It also splits out the load/store transforms into separate patterns for reversed and non-reversed, which should make #146525 easier to implement and reason about.	2025-11-03 16:53:18 +08:00
Ramkumar Ramachandra	03eb3cdaaa	[VPlan] Rewrite sinkScalarOperands (NFC) (#151696 ) Rewrite sinkScalarOperands in VPlanTransforms for clarity, in preparation for follow-up work to extend it to handle more recipes.	2025-11-03 06:43:42 +00:00
Florian Hahn	1c727baf69	[VPlan] Mark BranchOnCount and BranchOnCond as having side effects (NFC) BranchOnCount and BranchOnCond do not read memory, but cannot be moved. Mark them as having side-effects, but not reading/writing memory, which more accurately models that above. This allows removing some special checking for branches both in the current code and future patches.	2025-11-02 21:14:37 +00:00
Florian Hahn	b7e922a3da	[VPlan] Convert BuildVector with all-equal values to Broadcast. (#165826 ) Fold BuildVector where all operands are equal to Broadcast of the first operand. This will subsequently make it easier to remove additional buildvectors/broadcasts, e.g. via https://github.com/llvm/llvm-project/pull/165506. PR: https://github.com/llvm/llvm-project/pull/165826	2025-11-01 17:28:42 -07:00
Florian Hahn	6e83937f39	[VPlan] Add getConstantInt helpers for constant int creation (NFC). Add getConstantInt helper methods to VPlan to simplify the common pattern of creating constant integer live-ins. Suggested as follow-up in https://github.com/llvm/llvm-project/pull/164127.	2025-11-01 04:13:01 +00:00
Florian Hahn	a943132761	[VPlan] Add VPRegionBlock::getCanonicalIVType (NFC). (#164127 ) Split off from https://github.com/llvm/llvm-project/pull/156262. Similar to VPRegionBlock::getCanonicalIV, add helper to get the type of the canonical IV, in preparation for removing VPCanonicalIVPHIRecipe. PR: https://github.com/llvm/llvm-project/pull/164127	2025-10-31 20:05:02 -07:00
Florian Hahn	317b42ef5c	[VPlan] Remove original recipe after narrowing to single-scalar. Directly remove RepOrWidenR after replacing all uses. Removing the dead user early unlocks additional opportunities for further narrowing.	2025-10-31 04:38:16 +00:00
Florian Hahn	98d3a25f74	[VPlan] Don't preserve LCSSA in expandSCEVs. (#165505 ) This follows similar reasoning as 45ce88758d24 (https://github.com/llvm/llvm-project/pull/159556): LV does not preserve LCSSA, it constructs it just before processing a loop to vectorize. Runtime check expressions are invariant to that loop, so expanding them should not break LCSSA form for the loop we are about to vectorize. LV creates SCEV and memory runtime checks early on and then disconnects the blocks temporarily. The patch fixes a mis-compile, where previously LCSSA construction during SCEV expand may replace uses in currently unreachable SCEV/memory check blocks. Fixes https://github.com/llvm/llvm-project/issues/162512 PR: https://github.com/llvm/llvm-project/pull/165505	2025-10-29 18:25:46 +00:00
Sam Tebbs	22f860a55d	[LV] Bundle (partial) reductions with a mul of a constant (#162503 ) A reduction (including partial reductions) with a multiply of a constant value can be bundled by first converting it from `reduce.add(mul(ext, const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to extend the constant. This PR adds such bundling by first truncating the constant to the source type of the other extend, then extending it to the destination type of the extend. The first truncate is necessary so that the types of each extend's operand are then the same, and the call to canConstantBeExtended proves that the extend following a truncate is safe to do. The truncate is removed by optimisations. This is a stacked PR, 1a and 1b can be merged in any order: 1a. https://github.com/llvm/llvm-project/pull/147302 1b. https://github.com/llvm/llvm-project/pull/163175 2. -> https://github.com/llvm/llvm-project/pull/162503	2025-10-28 16:59:53 +00:00
Ramkumar Ramachandra	a2d873fb87	[VPlan] Introduce cannotHoistOrSinkRecipe, fix miscompile (#162674 ) Factor out common code to determine legality of hoisting and sinking. The patch has the side-effect of fixing an underlying bug, where a load/store pair is reordered.	2025-10-28 09:36:17 +00:00
Mel Chen	6bf948999f	[VPlan] Store memory alignment in VPWidenMemoryRecipe. nfc (#165255 ) Add a member Alignment to VPWidenMemoryRecipe to store memory alignment directly in the recipe. Update constructors, clone(), and relevant methods to use this stored alignment instead of querying the IR instruction. This allows VPWidenLoadRecipe/VPWidenStoreRecipe to be constructed without relying on the original IR instruction in the future.	2025-10-28 15:29:35 +08:00
Ramkumar Ramachandra	2c6c2689c5	[VPlan] Extend tryToFoldLiveIns to fold binary intrinsics (#161703 ) InstSimplifyFolder can fold binary intrinsics, so take the opportunity to unify code with getOpcodeOrIntrinsicID, and handle the case. The additional handling of WidenGEP is non-functional, as the GEP is simplified before it is widened, as the included test shows.	2025-10-24 10:21:39 +00:00
Florian Hahn	301fa24671	[VPlan] Limit narrowInterleaveGroups to single block regions for now. Currently only regions with a single block are supported by the legality checks.	2025-10-23 23:55:59 +01:00
Sam Tebbs	6b19a546aa	[LV] Bundle partial reductions inside VPExpressionRecipe (#147302 ) This PR bundles partial reductions inside the VPExpressionRecipe class. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/156976 4. https://github.com/llvm/llvm-project/pull/160154 5. -> https://github.com/llvm/llvm-project/pull/147302 6. https://github.com/llvm/llvm-project/pull/162503 7. https://github.com/llvm/llvm-project/pull/147513	2025-10-23 11:18:55 +00:00
Florian Hahn	bfc322dd72	Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706 )" This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c. There have been reports of mis-compiles in https://github.com/llvm/llvm-project/pull/149706. Revert while I investigate.	2025-10-22 21:27:11 +01:00
Florian Hahn	aca53f4375	[VPlan] Skip masked interleave groups in narrowInterleaveGroups. 8d29d09309 exposed a crash due to incorrectly trying to handle masked interleave recipes. For now, the current code does not support masked interleave recipes. Bail out for them.	2025-10-22 14:10:01 +01:00
Florian Hahn	82b59345fe	[VPlan] Clarify naming for helpers to create loop&replicate regions (NFC) Split off to clarify naming, as suggested in https://github.com/llvm/llvm-project/pull/156262.	2025-10-21 20:41:54 +01:00
Florian Hahn	8d29d09309	[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706 ) Move narrowInterleaveGroups to the general VPlan optimization stage. To do so, narrowInterleaveGroups now has to find a suitable VF where all interleave groups are consecutive and saturate the full vector width. If such a VF is found, the original VPlan is split into 2: a) a new clone which contains all VFs of Plan, except VFToOptimize, and b) the original Plan with VFToOptimize as single VF. The original Plan is then optimized. If a new copy for the other VFs has been created, it is returned and the caller has to add it to the list of candidate plans. Together with https://github.com/llvm/llvm-project/pull/149702, this allows to take the narrowed interleave groups into account when computing costs to choose the best VF and interleave count. One example where we currently miss interleaving/unrolling when narrowing interleave groups is https://godbolt.org/z/Yz77zbacz PR: https://github.com/llvm/llvm-project/pull/149706	2025-10-21 11:37:42 +01:00
Ramkumar Ramachandra	3fbae10faa	[VPlan] Improve code using m_APInt (NFC) (#161683 )	2025-10-21 10:27:03 +01:00
Ramkumar Ramachandra	cc850b830c	[VPlan] Use VPlan::getRegion to shorten code (NFC) (#164287 )	2025-10-21 10:25:07 +01:00
Florian Hahn	b4dbb1cdc4	[VPlan] Be more careful with CSE in replicate regions. (#162110 ) Recipes in replicate regions implicitly depend on the region's predicate. Limit CSE to recipes in the same block, when either recipe is in a replicate region. This allows handling VPPredInstPHIRecipe during CSE. If we perform CSE on recipes inside a replicate region, we may end up with 2 VPPredInstPHIRecipes sharing the same operand. This is incompatible with current VPPredInstPHIRecipe codegen, which re-sets the current value of its operand in VPTransformState. This can cause crashes in the added test cases. Note that this patch only modifies ::isEqual to check for replicating regions and not getHash, as CSE across replicating regions should be uncommon. Fixes https://github.com/llvm/llvm-project/issues/157314. Fixes https://github.com/llvm/llvm-project/issues/161974. PR: https://github.com/llvm/llvm-project/pull/162110	2025-10-20 10:53:47 +00:00
Ramkumar Ramachandra	086666de83	[VPlan] Improve code using drop_begin, append_range (NFC) (#163934 )	2025-10-20 09:07:18 +01:00
Florian Hahn	b9ce7656e9	[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670 ) Add a new Unpack VPInstruction (name to be improved) to explicitly extract scalar values from vectors. Test changes are movements of the extracts: they are now generated together and also directly after the producer. Depends on https://github.com/llvm/llvm-project/pull/155102 (included in PR) PR: https://github.com/llvm/llvm-project/pull/155670	2025-10-19 18:49:05 +00:00
Florian Hahn	8769119027	[VPlan] Add VPRecipeBase::getRegion helper (NFC). Multiple places retrieve the region for a recipe. Add a helper to make the code more compact and clearer.	2025-10-18 21:25:34 +01:00
Ramkumar Ramachandra	b71515cc76	[VPlan] Extend licm to hoist assumes (#162636 ) Assumes are safe to hoist if they're guaranteed to execute, since they don't alias, and don't throw. This mirrors what the IR-LICM does.	2025-10-16 13:59:32 +00:00
Ramkumar Ramachandra	8f04f074c9	[VPlan] Clarify legality check in licm (NFC) (#162486 ) Recipes in licm are safe to hoist if the legality check passes, and the recipe is guaranteed to execute; the single successor of the vector preheader is the vector loop region. Clarify this in the code structure and comments.	2025-10-16 12:36:39 +01:00
Florian Hahn	4f23767852	[VPlan] Add m_FirstActiveLane matcher (NFC). Add m_FirstActiveLane, to slightly simplify pattern matching in preparation for https://github.com/llvm/llvm-project/pull/149042.	2025-10-15 18:55:26 +01:00
Florian Hahn	7f54fccc0e	[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056 ) When narrowing stores of a single-scalar, we currently use ExtractLastElement, which extracts the last element across all parts. This is not correct if the store's address is not uniform across all parts. If it is only uniform-per-part, the last lane per part must be extracted. Add a new ExtractLastLanePerPart opcode to handle this correctly. Most transforms apply to both ExtractLastElement and ExtractLastLanePerPart, with the only difference being their treatment during unrolling. Fixes https://github.com/llvm/llvm-project/issues/162498. PR: https://github.com/llvm/llvm-project/pull/163056	2025-10-15 13:46:09 +01:00
Florian Hahn	861519327a	[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020 ) The canonical IV is tied to region blocks; move getCanonicalIV there and update all users. PR: https://github.com/llvm/llvm-project/pull/163020	2025-10-15 12:48:35 +01:00
Florian Hahn	9bb0eedb59	[VPlan] Assign custom opcodes to recipes not mapping to IR opcodes. (#162267 ) We can perform CSE on recipes that do not directly map to Instruction opcodes. One example is VPVectorPointerRecipe. Currently this is handled by supporting them in ::canHandle, but currently that means that we return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently only works, because the only case we return std::nullopt and perform CSE is VPVectorPointerRecipe. But that does not work if we support more such recipes, like VPPredInstPHIRecipe (https://github.com/llvm/llvm-project/pull/162110). To fix this, return a custom opcode from getOpcodeOrIntrinsicID for recipes like VPVectorPointerRecipe, using the VPDefID after all regular instruction opcodes. PR: https://github.com/llvm/llvm-project/pull/162267	2025-10-13 11:16:14 +01:00
Ramkumar Ramachandra	946238e748	[VPlan] Strip VPDT's default constructor (NFC) (#162692 )	2025-10-13 10:16:05 +00:00
Ramkumar Ramachandra	869c76dda3	[VPlan] Allow zero-operand m_BranchOn(Cond|Count) (NFC) (#162721 )	2025-10-13 08:50:09 +01:00
Florian Hahn	4bf5ab4f9d	[VPlan] Set flags when constructing truncs using VPWidenCastRecipe. VPWidenCastRecipes with Trunc opcodes were missing the correct OpType for IR flags. Update createWidenCast to set the correct flags for truncs, and use it consistently. Fixes https://github.com/llvm/llvm-project/issues/162374.	2025-10-12 14:01:12 +01:00
Florian Hahn	4b8cac2bcc	[VPlan] Don't reset canonical IV start value. (#161589 ) Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop. This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0. PR: https://github.com/llvm/llvm-project/pull/161589	2025-10-11 22:19:05 +01:00
Ramkumar Ramachandra	107940f3be	[VPlan] Improve binary matchers in two places (NFC) (#162268 )	2025-10-07 14:56:43 +01:00
Ramkumar Ramachandra	f7f49ee40e	[VPlan] Improve code around WidenPHI's constructor (NFC) (#162277 )	2025-10-07 14:56:20 +01:00

535 Commits