llvm-project

Author	SHA1	Message	Date
Sam Tebbs	31a0ebb840	[NFCI] Address post-merge review of #162503 (#165582)	2025-10-31 10:23:03 +00:00
Florian Hahn	317b42ef5c	[VPlan] Remove original recipe after narrowing to single-scalar. Directly remove RepOrWidenR after replacing all uses. Removing the dead user early unlocks additional opportunities for further narrowing.	2025-10-31 04:38:16 +00:00
Florian Hahn	683b00bb50	[VPlan] Limit VPScalarIVSteps to step == 1 in getSCEVExprForVPValue. For now, just support VPScalarIVSteps with step == 1 in getSCEVExprForVPValue. This fixes a crash when the step would be != 1.	2025-10-31 02:22:56 +00:00
Vigneshwar Jayakumar	469702c5d5	[LICM] Sink unused l-invariant loads in preheader. (#157559) Unused loop-invariant loads were not sunk from the preheader to the exit block, increasing their live range. This commit moves the sinkUnusedInvariant logic from indvarsimplify to LICM and also adds functionality to sink unused loads that are not clobbered by the loop body.	2025-10-30 09:23:04 -05:00
Florian Hahn	98d3a25f74	[VPlan] Don't preserve LCSSA in expandSCEVs. (#165505) This follows similar reasoning as 45ce88758d24 (https://github.com/llvm/llvm-project/pull/159556): LV does not preserve LCSSA, it constructs it just before processing a loop to vectorize. Runtime check expressions are invariant to that loop, so expanding them should not break LCSSA form for the loop we are about to vectorize. LV creates SCEV and memory runtime checks early on and then disconnects the blocks temporarily. The patch fixes a mis-compile, where previously LCSSA construction during SCEV expand may replace uses in currently unreachable SCEV/memory check blocks. Fixes https://github.com/llvm/llvm-project/issues/162512 PR: https://github.com/llvm/llvm-project/pull/165505	2025-10-29 18:25:46 +00:00
Sam Tebbs	22f860a55d	[LV] Bundle (partial) reductions with a mul of a constant (#162503) A reduction (including partial reductions) with a multiply of a constant value can be bundled by first converting it from `reduce.add(mul(ext, const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to extend the constant. This PR adds such bundling by first truncating the constant to the source type of the other extend, then extending it to the destination type of the extend. The first truncate is necessary so that the types of each extend's operand are then the same, and the call to canConstantBeExtended proves that the extend following a truncate is safe to do. The truncate is removed by optimisations. This is a stacked PR, 1a and 1b can be merged in any order: 1a. https://github.com/llvm/llvm-project/pull/147302 1b. https://github.com/llvm/llvm-project/pull/163175 2. -> https://github.com/llvm/llvm-project/pull/162503	2025-10-28 16:59:53 +00:00
Ramkumar Ramachandra	a2d873fb87	[VPlan] Introduce cannotHoistOrSinkRecipe, fix miscompile (#162674) Factor out common code to determine legality of hoisting and sinking. The patch has the side-effect of fixing an underlying bug, where a load/store pair is reordered.	2025-10-28 09:36:17 +00:00
Florian Hahn	0e28c9bc9d	[LAA] Skip undef/poison strides in collectStridedAccess. The map returned by collectStridedAccess is used to replace strides with their versioned values. This does not work for Undef/Poison, which don't have use-lists. Don't try to version them, as versioning won't be useful in practice. Fixes https://github.com/llvm/llvm-project/issues/162922.	2025-10-27 05:01:17 +00:00
Florian Hahn	57ba58d558	[LV] Modernize version-mem-access.ll tests. Auto-generate CHECK lines and simplify tests a bit.	2025-10-27 03:37:59 +00:00
Hassnaa Hamdi	be29f0dd86	[LV]: Improve accuracy of calculating remaining iterations of MainLoopVF (#156723) Transform TC and VF to the same numerical space when they are different.	2025-10-26 14:45:44 +00:00
Ramkumar Ramachandra	2c6c2689c5	[VPlan] Extend tryToFoldLiveIns to fold binary intrinsics (#161703) InstSimplifyFolder can fold binary intrinsics, so take the opportunity to unify code with getOpcodeOrIntrinsicID, and handle the case. The additional handling of WidenGEP is non-functional, as the GEP is simplified before it is widened, as the included test shows.	2025-10-24 10:21:39 +00:00
Florian Hahn	301fa24671	[VPlan] Limit narrowInterleaveGroups to single block regions for now. Currently only regions with a single block are supported by the legality checks.	2025-10-23 23:55:59 +01:00
Florian Hahn	4ec5852c1d	[LV] Add tests for narrowing interleave groups with multiple blocks. Add additional test coverage for narrowInterleaveGroups with loops with multiple blocks.	2025-10-23 22:54:03 +01:00
paperchalice	249883d0c5	[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786) Post cleanup for #164534.	2025-10-23 20:31:31 +08:00
Sam Tebbs	6b19a546aa	[LV] Bundle partial reductions inside VPExpressionRecipe (#147302) This PR bundles partial reductions inside the VPExpressionRecipe class. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/156976 4. https://github.com/llvm/llvm-project/pull/160154 5. -> https://github.com/llvm/llvm-project/pull/147302 6. https://github.com/llvm/llvm-project/pull/162503 7. https://github.com/llvm/llvm-project/pull/147513	2025-10-23 11:18:55 +00:00
Florian Hahn	bfc322dd72	Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706 )" This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c. There have been reports of mis-compiles in https://github.com/llvm/llvm-project/pull/149706. Revert while I investigate.	2025-10-22 21:27:11 +01:00
Kerry McLaughlin	45c0b29171	[LV] Ignore user-specified interleave count when unsafe. (#153009) When a VF is specified via a loop hint, it will be clamped to a safe VF or ignored if it is found to be unsafe. This is not the case for user-specified interleave counts, which can lead to loops such as the following with a memory dependence being vectorised with interleaving: ``` #pragma clang loop interleave_count(4) for (int i = 4; i < LEN; i++) b[i] = b[i - 4] + a[i]; ``` According to [1], loop hints are ignored if they are not safe to apply. This patch adds a check to prevent vectorisation with interleaving if isSafeForAnyVectorWidth() returns false. This is already checked in selectInterleaveCount(). [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave	2025-10-22 15:21:27 +01:00
Florian Hahn	aca53f4375	[VPlan] Skip masked interleave groups in narrowInterleaveGroups. 8d29d09309 exposed a crash due to incorrectly trying to handle masked interleave recipes. For now, the current code does not support masked interleave recipes. Bail out for them.	2025-10-22 14:10:01 +01:00
Sam Parker	20340accf2	[NFC][WebAssembly] FP conversion interleave tests (#164576)	2025-10-22 11:43:44 +01:00
Florian Hahn	8d29d09309	[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706) Move narrowInterleaveGroups to the general VPlan optimization stage. To do so, narrowInterleaveGroups now has to find a suitable VF where all interleave groups are consecutive and saturate the full vector width. If such a VF is found, the original VPlan is split into 2: a) a new clone which contains all VFs of Plan, except VFToOptimize, and b) the original Plan with VFToOptimize as single VF. The original Plan is then optimized. If a new copy for the other VFs has been created, it is returned and the caller has to add it to the list of candidate plans. Together with https://github.com/llvm/llvm-project/pull/149702, this allows taking the narrowed interleave groups into account when computing costs to choose the best VF and interleave count. One example where we currently miss interleaving/unrolling when narrowing interleave groups is https://godbolt.org/z/Yz77zbacz PR: https://github.com/llvm/llvm-project/pull/149706	2025-10-21 11:37:42 +01:00
David Sherwood	822c291aac	[LV][NFC] Remove undef from phi incoming values (#163762) Split off from PR #163525, this standalone patch replaces use of undef as incoming PHI values with zero, in order to reduce the likelihood of contributors hitting the `undef deprecator` warning in github.	2025-10-21 10:49:27 +01:00
Sushant Gokhale	005ec78b71	[AArch64][CostModel] Add constraints on which partial reductions are natively supported on Neon and SVE (#163728) PR #158641 refined and refactored the cost model for partial reductions. While doing so, it missed out on certain constraints. Specifically, cases like i32 -> i64 partial reduce are not natively supported. This patch adds back the condition/constraint that was present before PR #158641.	2025-10-20 17:36:44 -07:00
Florian Hahn	35b9f20449	[LV] Check for TruncInsts in canTruncateToMinimalBitwidth. TruncInst must truncate at most to their destination. Return false if MinBWs contains a destination size > the trunc result type size. Fixes https://github.com/llvm/llvm-project/issues/162688.	2025-10-20 22:31:16 +01:00
Florian Hahn	b4dbb1cdc4	[VPlan] Be more careful with CSE in replicate regions. (#162110) Recipes in replicate regions implicitly depend on the region's predicate. Limit CSE to recipes in the same block, when either recipe is in a replicate region. This allows handling VPPredInstPHIRecipe during CSE. If we perform CSE on recipes inside a replicate region, we may end up with 2 VPPredInstPHIRecipes sharing the same operand. This is incompatible with current VPPredInstPHIRecipe codegen, which re-sets the current value of its operand in VPTransformState. This can cause crashes in the added test cases. Note that this patch only modifies ::isEqual to check for replicating regions and not getHash, as CSE across replicating regions should be uncommon. Fixes https://github.com/llvm/llvm-project/issues/157314. Fixes https://github.com/llvm/llvm-project/issues/161974. PR: https://github.com/llvm/llvm-project/pull/162110	2025-10-20 10:53:47 +00:00
Luke Lau	9fe1f29541	[VPlan] Set flags when constructing zexts using VPWidenCastRecipe (#164198) createWidenCast doesn't set the flag type, so when we simplify trunc (zext nneg x) -> zext x we would hit an assertion in CSE that the flag types don't match with other VPWidenCastRecipes that weren't simplified. This fixes it the same way trunc flags are handled too. As an aside I think it should be correct to preserve the nneg flag in this case since the input operand is still non-negative after the transform. But that's left to another PR. Fixes https://github.com/llvm/llvm-project/issues/164171	2025-10-20 10:39:16 +00:00
Ramkumar Ramachandra	9bfaf12c07	[VPlan] Handle more replicates in isUniformAcrossVFsAndUFs (#162342) A single-scalar replicate without side-effects, and with uniform operands, is uniform. Special-case assumes and stores.	2025-10-20 10:26:23 +00:00
Florian Hahn	9317975a7a	[VPlan] Match legacy behavior w.r.t. using pointer phis as scalar addrs. When the legacy cost model scalarizes loads that are used as addresses for other loads and stores, it looks to phi nodes, if they are direct address operands of loads/stores. Match this behavior in isUsedByLoadStoreAddress, to fix a divergence between legacy and VPlan-based cost model.	2025-10-20 11:09:25 +01:00
Florian Hahn	eb17a8d599	[SCEV] Preserve divisor info when adding guard info for ICMP_NE via Sub. (#163250) Follow-up to https://github.com/llvm/llvm-project/pull/160500 to preserve divisibility info when creating the UMax. PR: https://github.com/llvm/llvm-project/pull/163250	2025-10-20 10:20:41 +01:00
Nikita Popov	573ca36753	[IR] Replace alignment argument with attribute on masked intrinsics (#163802) The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).	2025-10-20 08:50:09 +00:00
Florian Hahn	445415709e	[LV] Move test for incomplete partial reduction chains to separate file. Move test to new file, to prepare for adding similar tests in https://github.com/llvm/llvm-project/pull/162822.	2025-10-19 22:23:53 +01:00
Florian Hahn	b9ce7656e9	[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670) Add a new Unpack VPInstruction (name to be improved) to explicitly extract scalar values from vectors. Test changes are movements of the extracts: they are now generated together and also directly after the producer. Depends on https://github.com/llvm/llvm-project/pull/155102 (included in PR) PR: https://github.com/llvm/llvm-project/pull/155670	2025-10-19 18:49:05 +00:00
Florian Hahn	12ec050b9b	[LV] Remove some unnecessary uses of poison from tests.	2025-10-17 21:20:44 +01:00
Nikita Popov	8fa4a1029c	[LoopVectorize] Regenerate test checks (NFC)	2025-10-16 18:21:42 +02:00
Ramkumar Ramachandra	b71515cc76	[VPlan] Extend licm to hoist assumes (#162636) Assumes are safe to hoist if they're guaranteed to execute, since they don't alias, and don't throw. This mirrors what the IR-LICM does.	2025-10-16 13:59:32 +00:00
Ramkumar Ramachandra	34fdd7472b	[LV] Add coverage for operand-bundles (#163417)	2025-10-16 12:22:03 +00:00
David Sherwood	c48aa54656	[LV][NFC] Remove undef from function return values (#163578) Split off from PR #163525, this standalone patch replaces `ret * undef` returns with `ret void` in order to reduce the likelihood of contributors hitting the `undef deprecator` warning in github.	2025-10-16 09:49:38 +01:00
Florian Hahn	7f54fccc0e	[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056) When narrowing stores of a single-scalar, we currently use ExtractLastElement, which extracts the last element across all parts. This is not correct if the store's address is not uniform across all parts. If it is only uniform-per-part, the last lane per part must be extracted. Add a new ExtractLastLanePerPart opcode to handle this correctly. Most transforms apply to both ExtractLastElement and ExtractLastLanePerPart, with the only difference being their treatment during unrolling. Fixes https://github.com/llvm/llvm-project/issues/162498. PR: https://github.com/llvm/llvm-project/pull/163056	2025-10-15 13:46:09 +01:00
David Sherwood	4f2c867756	[LV][NFC] Fix "cpu" attribute in some partial-reduce*.ll tests (#163518)	2025-10-15 09:26:04 +01:00
Sushant Gokhale	778d3c8ccc	[NFC] Partial reduce test to demonstrate regression post commit #cc9c64d (#162681) We have seen performance regressions for several instances of the Numba benchmark, some around 70%, on Neoverse-v2 post #158641. The mentioned case is a short reproducer of the same. See https://godbolt.org/z/j9Mj5WM7c for the IR differences. A future patch will address this.	2025-10-14 23:51:36 -07:00
Florian Hahn	0fefa56b03	[LV] Add additional min/max reduction tests. Add test coverage for min/max reductions with various combinations of users (in and outside loops, used by stores) and predicated variants. This adds missing test coverage for min/max reductions.	2025-10-14 22:26:59 +01:00
Ramkumar Ramachandra	4ec78f56c2	[LV] Increase coverage of uniformity-rewriter (#161219) Add a test with a non-uniform load of an argument (SCEVUnknown), showing that SCEVUnknown cannot always be considered uniform.	2025-10-13 10:15:34 +00:00
Florian Hahn	5e3ac2a6f2	[LV] Bail out on loops with switch as latch terminator. Currently we cannot vectorize loops with latch blocks terminated by a switch. In the future this could be handled by materializing appropriate compares. Fixes https://github.com/llvm/llvm-project/issues/156894.	2025-10-12 21:20:35 +01:00
Florian Hahn	4bf5ab4f9d	[VPlan] Set flags when constructing truncs using VPWidenCastRecipe. VPWidenCastRecipes with Trunc opcodes were missing the correct OpType for IR flags. Update createWidenCast to set the correct flags for truncs, and use it consistently. Fixes https://github.com/llvm/llvm-project/issues/162374.	2025-10-12 14:01:12 +01:00
Florian Hahn	5db774a822	[LV] Add additional test for narrowing to single scalars. Add extra test coverage for narrowing stores to single scalars, with the store address being uniform-per-part, not uniform-across-all-parts. Test for https://github.com/llvm/llvm-project/issues/162498.	2025-10-12 10:27:20 +01:00
Florian Hahn	ae7b15f2e2	[VPlan] Return invalid for scalable VF in VPReplicateRecipe::computeCost. Replication is currently not supported for scalable VFs. Make sure VPReplicateRecipe::computeCost returns an invalid cost early, for scalable VFs if the recipe is not a single-scalar. Note that this moves the existing invalid-costs.ll out of the AArch64 subdirectory, as it does not use a target triple. Fixes https://github.com/llvm/llvm-project/issues/160792.	2025-10-11 19:28:02 +01:00
Yingwei Zheng	9e63b7ae4c	[InstCombine] Fix flag propagation in `foldSelectIntoOp` (#162003) Consider the following transform: ``` C = binop float A, nnan OOp D = select ninf, i1 cond, float C, float A -> E = select ninf, i1 cond, float OOp, float Identity F = binop float A, E ``` We cannot propagate ninf from the original select, because OOp may be inf, and the flag only guarantees that FalseVal (op OOp) is never infinity. Examples: -inf + +inf = NaN, -inf - -inf = NaN, 0 * inf = NaN. Specifically, if the original select has both ninf and nnan, we can safely propagate the flag. Alive2: + fadd: https://alive2.llvm.org/ce/z/TWfktv + fsub: https://alive2.llvm.org/ce/z/RAsjJb + fmul: https://alive2.llvm.org/ce/z/8eg4ND Closes https://github.com/llvm/llvm-project/issues/161634.	2025-10-11 10:04:55 +08:00
Florian Hahn	ba69e33e13	[LV] Consistently apply address def scalarization across loop. Consistently scalarize loads used as part of address computations across all uses in the loop. This aligns the VPlan and legacy cost model and fixes a divergence crash. It doesn't matter if the load and address users are in different blocks: as long as they are in the same loop, the scalar value can be used. This removes a number of insert/extracts.	2025-10-09 22:04:15 +01:00
Florian Hahn	6d905e41bc	[SCEV] Use getConstantMultiple to get divisibility info from guards. (#162617) Simplify and generalize the code to get a common constant multiple for expressions when collecting guards, replacing the manual implementation. Split off from https://github.com/llvm/llvm-project/pull/160012. PR: https://github.com/llvm/llvm-project/pull/162617	2025-10-09 10:51:36 +01:00
Ramkumar Ramachandra	2a02d57efb	[IR] Mark vector intrinsics speculatable (#162334) The vector intrinsics in question have no undefined behavior, and have no other effect besides returning the result: they should hence be marked speculatable.	2025-10-09 09:41:59 +01:00
Florian Hahn	98ce434870	[VPlan] Skip VPBlendRecipe in isUsedByLoadStoreAddress. VPBlendRecipes are introduced as part of if-conversion, potentially adding a def-use chain from a load used in a compare to another load/store. In the scalar IR, there is no connection via def-use chains, so the legacy cost model won't consider the load as used by a memory operation. Skipping blends brings the VPlan-based cost computation in line with the legacy cost model after https://github.com/llvm/llvm-project/pull/162157.	2025-10-08 18:43:23 +01:00


3549 Commits