SCCP can use PredicateInfo to constrain ranges based on assume and
branch conditions. Currently, this is only enabled during IPSCCP.
This enables it for SCCP as well, which runs after functions have
already been simplified, while IPSCCP runs pre-inline. To a large
degree, CVP already handles range-based optimizations, but SCCP is more
reliable for the cases it can handle. In particular, SCCP works reliably
inside loops, which is something that CVP struggles with due to LVI
cycles.
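A minimal sketch of the kind of fold this enables (hypothetical IR, not taken from the patch): the branch condition constrains %x inside the taken block, so the second comparison folds to true.
```
define i32 @f(i32 %x) {
entry:
  %cmp = icmp ult i32 %x, 8
  br i1 %cmp, label %then, label %else

then:
  ; PredicateInfo constrains %x to [0, 8) here, so SCCP folds %cmp2 to true.
  %cmp2 = icmp ult i32 %x, 16
  %r = zext i1 %cmp2 to i32
  ret i32 %r

else:
  ret i32 0
}
```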
I have made various optimizations to make PredicateInfo more efficient,
but unfortunately this still has significant compile-time cost (around
0.1-0.2%).
This reverts commit e9de32fd159d30cfd6fcc861b57b7e99ec2742ab due to
multiple performance regressions observed across downstream Numba
benchmarks (https://github.com/llvm/llvm-project/issues/138509#issuecomment-3193855772).
While avoiding non-trivial unswitches on newly-cloned loops helps
mitigate the pathological case reported in https://github.com/llvm/llvm-project/issues/138509,
it may also make the IR less friendly to vectorization and loop
canonicalization (in the reported test, the new specialized loops
previously contained no select with a loop-carried dependence),
prompting us to reconsider the abovementioned approach.
The vector combiner will process all instructions as it first loops
through the function, adding any newly added and deleted instructions to
a worklist which is then processed when all nodes are done. This leaves
extra uses in the graph while the initial processing is performed, leading
to sub-optimal decisions being made for other combines. This changes it
so that trivially dead instructions are removed immediately. The main
change this requires is to make sure iterator invalidation does not
occur.
Directly emit shl instead of a multiply if VF * Step is a power-of-2. The
main motivation here is to prepare the code and test for directly
generating and expanding a SCEV expression of the minimum iteration
count. SCEVExpander will directly emit shl for multiplies with
powers-of-2.
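For example (hypothetical operand %n, assuming VF * Step is 8), the emitted check would use a shift rather than a multiply:
```
; instead of %0 = mul i64 %n, 8
%0 = shl i64 %n, 3
```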
InstCombine will also perform this combine, so end-to-end this should
effectively be NFC.
PR: https://github.com/llvm/llvm-project/pull/153495
Reopen #128938.
Attempt to shrink the size of vector loads where only some of the
incoming lanes are used for rebroadcasts in shufflevector instructions.
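A minimal sketch of the idea (hypothetical IR): only lane 0 of the wide load feeds the broadcast, so the load can be narrowed.
```
%v  = load <8 x float>, ptr %p, align 32
%bc = shufflevector <8 x float> %v, <8 x float> poison, <8 x i32> zeroinitializer
; ->
%v1  = load <1 x float>, ptr %p, align 32
%bc1 = shufflevector <1 x float> %v1, <1 x float> poison, <8 x i32> zeroinitializer
```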
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
We can derive and upgrade the alignment of loads/stores using other
well-aligned loads/stores. This optimization does a single forward pass through
each basic block and uses the alignment and offset of loads/stores to
derive the best possible alignment for a base pointer, caching the
result. If it encounters another load/store based on that pointer, it
tries to upgrade the alignment. The optimization must be a forward pass within a basic
block because control flow and exception throwing can impact alignment guarantees.
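A minimal sketch (hypothetical IR): the store proves %p is 16-aligned, so the load at offset 4 from the same base can be upgraded.
```
store i32 0, ptr %p, align 16
%q = getelementptr inbounds i8, ptr %p, i64 4
%v = load i32, ptr %q, align 1    ; can be upgraded to align 4
```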
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.
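Roughly, the computation being materialized is the following (sketch with a hypothetical trip count %n and a fixed VF x UF of 8):
```
%n.mod.vf = urem i64 %n, 8
%vec.tc   = sub i64 %n, %n.mod.vf
```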
PR: https://github.com/llvm/llvm-project/pull/151925
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
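For illustration (assuming the updated intrinsics simply drop the leading size operand):
```
%a = alloca [16 x i8]
; before:
call void @llvm.lifetime.start.p0(i64 16, ptr %a)
; after:
call void @llvm.lifetime.start.p0(ptr %a)
```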
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
LICM tries to reassociate GEPs in order to hoist an invariant GEP.
Currently, it also does this in the case where the GEP has a constant
offset.
This is usually undesirable. From a back-end perspective, constant GEPs
are usually free because they can be folded into addressing modes, so
this just increases register pressure. From a middle-end perspective,
keeping constant offsets last in the chain makes it easier to analyze
the relationship between multiple GEPs on the same base, especially
after CSE.
The worst that can happen here is if we start with something like
```
loop {
  p + 4*x
  p + 4*x + 1
  p + 4*x + 2
  p + 4*x + 3
}
```
And LICM converts it into:
```
p.1 = p + 1
p.2 = p + 2
p.3 = p + 3
loop {
  p + 4*x
  p.1 + 4*x
  p.2 + 4*x
  p.3 + 4*x
}
```
Which is much worse than leaving it for CSE to convert to:
```
loop {
  p2 = p + 4*x
  p2 + 1
  p2 + 2
  p2 + 3
}
```
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
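A sketch of what the ingested metadata might look like (assuming the metadata carries a single integer operand; latch branch weights of 1 for the exit and 31 for the back edge give an estimate of about 32):
```
  br i1 %exit.cond, label %exit, label %loop.body, !prof !0, !llvm.loop !1

!0 = !{!"branch_weights", i32 1, i32 31}
!1 = distinct !{!1, !2}
!2 = !{!"llvm.loop.estimated_trip_count", i32 32}
```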
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
Similar to #150639, this fixes the AggressiveInstCombine fold that converts
lookup tables to cttz instructions when the GEP types are not array types, i.e.
`gep i16 @glob, i64 %idx` instead of `gep [64 x i16] @glob, i64 0, i64 %idx`.
https://github.com/llvm/llvm-project/pull/147026 will enable sub
reductions, which require that the phi value is the first operand since
they aren't commutative. This re-orders the operands when executing
reductions, which actually matches other existing code in
VPReductionRecipe::execute.
After PR #136329, shuffle indices may differ, which can cause the
existing cost-based logic to miss optimisation opportunities for
binop/shuffle sequences.
This patch improves the cost model in foldSelectShuffle to more
accurately assess costs, recognising when certain duplicate shuffles do
not require actual instructions.
Additionally, in break-even cases, this change introduces a check for
whether the pattern ultimately feeds into a vector reduction, allowing
the transform to proceed when it is likely to be profitable overall.
Track newly-cloned loops coming from unswitching non-trivial invariant
conditions, so as to prevent conditions in such cloned blocks from
being unswitched again.
Fixes: https://github.com/llvm/llvm-project/issues/138509.
Since the costs were added, the codegen for i8/i16 and/or/xor reductions
has improved. This updates the cost model to produce the same costs in
terms of the number of instructions.
This preserves the nuw/nsw flags on widened truncs by checking for
TruncInst in the VPIRFlags constructor.
The motivation for this is to be able to fold away some redundant truncs
feeding into uitofps (or potentially narrow the inductions feeding them).
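For example (hypothetical IR), keeping nuw on the widened trunc allows folds like:
```
%t = trunc nuw i64 %iv to i32
%f = uitofp i32 %t to float
; nuw guarantees no set bits are dropped, so this can become:
%f.wide = uitofp i64 %iv to float
```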
Add a new fold to instcombine to move SExt/ZExt across identity
shuffles, applying the cast after the shuffle. This sinks extends and
can enable more general additional folding of both shuffles (and
related instructions) and extends. If backends prefer to split things up
and perform the casts first, the extends can be hoisted again, for example
in VectorCombine.
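A rough sketch of the shape of the fold (hypothetical IR, using a length-changing identity mask):
```
%e = zext <2 x i8> %x to <2 x i32>
%s = shufflevector <2 x i32> %e, <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
; ->
%s.narrow = shufflevector <2 x i8> %x, <2 x i8> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
%e.wide   = zext <4 x i8> %s.narrow to <4 x i32>
```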
A larger example is included in the load_i32_zext_to_v4i32 test. The wider
extend is easier to compute an accurate cost for and targets (like
AArch64) can lower a single wider extend more efficiently than multiple
separate extends.
This is a generalization of a VectorCombine version
(https://github.com/llvm/llvm-project/pull/141109) as suggested by
@preames.
PR: https://github.com/llvm/llvm-project/pull/146901
More closely match improveShuffleKindFromMask's shuffle ordering by
trying to match SK_InsertSubvector shuffle patterns before SK_Select
- both can match many of the same patterns, but it's much easier to
recognise when a SK_InsertSubvector can be converted to SK_Select than
vice-versa.
Another step towards #145335 - which I'm hoping will allow us to
generalise improveShuffleKindFromMask and remove getInstructionCost's
shuffle matching entirely.
Currently, LLVM fails to convert certain pblendvb intrinsics into select
instructions when the blend mask is derived from complex boolean logic
operations. This occurs even when the mask is ultimately based on
sign-extended comparison results, preventing further optimization
opportunities.
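An illustrative case (hypothetical IR; per the SSE4.1 blendv semantics, lanes whose mask sign bit is set take the second source):
```
%m1 = sext <16 x i1> %c1 to <16 x i8>
%m2 = sext <16 x i1> %c2 to <16 x i8>
%mask = and <16 x i8> %m1, %m2
%blend = call <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8> %a, <16 x i8> %b, <16 x i8> %mask)
; can become:
%c = and <16 x i1> %c1, %c2
%sel = select <16 x i1> %c, <16 x i8> %b, <16 x i8> %a
```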
Fixes #66513
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
This puts the onus on the caller to ensure the result type is big enough.
In the unlikely event a cropped result is required, explicitly
truncate a safe value.
This patch fixes cost estimation for extractelements from non-power-of-2
vectors, defined as subvector extracts. In this case the subvector size
might not be adjusted to a whole register size, so we need to take the
minimum of the whole vector size and the actual difference to prevent a
compiler crash.
Fixes #143513
It will run twice in the non-LTO pipeline at `O1` or higher. In the LTO post-link pipeline, it will run once at `O2` or higher, since inlining and SROA don't run at `O1`.
Better to preserve the original order of the alternate nodes to avoid
inter-lane shuffling; select/insert subvector patterns provide better
perf.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/136329
While sinking loop-invariant instructions from the preheader to
the exit block, we were skipping instructions because the instruction
iterator was decremented twice.
getSameOpcode may in some cases consider 2 compares as having the same
opcode, even though previously they were considered alternate. This can
happen because getSameOpcode loses info about previous instructions
and their states. We need to use the isAlternateInstruction function
instead for the correct analysis.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/133769
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.
This removes redundant adds of 0 from a large number of tests (~200),
many of which I am still working on updating.
In preparation for removing redundant WideIV steps added in
https://github.com/llvm/llvm-project/pull/119284.
PR: https://github.com/llvm/llvm-project/pull/123655
Closes #115683.
An overflow arithmetic instruction plus an extractvalue are usually generated
when a division is being replaced, but the zero check may still be
there. In that case, hoist these two instructions out of the basic
block, and let later optimizations take care of the unnecessary zero
checks.
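A sketch of the pattern (hypothetical IR): the intrinsic and its extractvalue sit in a block guarded by a divisor zero check that the replaced division no longer needs; hoisting them into the predecessor is safe because the intrinsic is speculatable, and later passes can then clean up the check.
```
  %b.zero = icmp eq i64 %b, 0
  br i1 %b.zero, label %cont, label %guarded

guarded:
  %mul = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %ov  = extractvalue { i64, i1 } %mul, 1
  br label %cont
```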
In the previous code (#128032), the destination vector was passed as the
getShuffleCost argument. Because the shuffle mask indexes into the
elements of the two source vectors, its maximum value is twice the
size of the source vector. This causes a problem if the destination
vector is smaller than the source vector and the mask specifies an index
that exceeds the size of the destination vector.
Fix the problem by correcting the previous code, which was using the wrong
argument in the cost calculation.
Fixes #130250
Relative to the previous attempt this includes two fixes:
* Adjust callCapturesBefore() to not skip captures(ret: address,
provenance) arguments, as these will not count as a capture
at the call-site.
* When visiting uses during stack slot optimization, don't skip
the ModRef check for passthru captures. Calls can both modref
and be passthru for captures.
------
This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.
The key API changes here are:
* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
  UseCaptureInfo as well and can then decide what to do with it by
  returning an Action, which is one of:
  * Stop: stop traversal.
  * ContinueIgnoringReturn: continue traversal but don't follow the
    instruction return value.
  * Continue: continue traversal and follow the instruction return
    value if it has additional CaptureComponents.
For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.
This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.