llvm-project

Author	SHA1	Message	Date
Ramkumar Ramachandra	86482dffba	[VPlan] Use m_Broadcast to improve a match (NFC) (#153607 )	2025-08-14 18:10:58 +01:00
Vedant Paranjape	44df9826f3	[InstCombine] Propagate invariant.load metadata across unpacked loads (#152186 ) For loads that operate on aggregate type, instcombine unpacks the loads. It does not preserve the invariant.load metadata. This patch fixes that, it looks for the metadata in the parent load and attaches the metadata to the unpacked loads. ``` %struct.double2 = type { double, double } %struct.double1 = type { double } define %struct.double2 @func1(ptr %a) { %1 = load %struct.double2, ptr %a, align 16, !invariant.load !1 ret %struct.double2 %1 } !1 = !{} ``` Reproducer: https://godbolt.org/z/hcY8MMvYh	2025-08-14 10:08:26 -07:00
Alexey Bataev	ca4ebf9517	[SLP]Support LShr as base for copyable elements Added support for LShr instructions as base for copyable elements. Also, added simple analysis for best base instruction selection, if multiple candidates are available. Reviewers: hiraditya, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/153393	2025-08-14 12:35:28 -04:00
Alexey Bataev	d57ab276b6	[SLP] Recalculate cleared deps for potential control schedule data nodes Need to recalculate the dependencies for all potential control data schedule nodes to prevent compiler crash. Fixes #153571	2025-08-14 09:00:42 -07:00
Florian Hahn	177f27d220	[VPlan] Add incoming_[blocks,values] iterators to VPPhiAccessors (NFC) (#138472 ) Add 3 new iterator ranges to VPPhiAccessors * incoming_values(): returns a range over the incoming values of a phi * incoming_blocks(): returns a range over the incoming blocks of a phi * incoming_values_and_blocks: returns a range over pairs of incoming values and blocks. Depends on https://github.com/llvm/llvm-project/pull/124838. PR: https://github.com/llvm/llvm-project/pull/138472	2025-08-14 16:47:04 +01:00
Theodoros Theodoridis	d15b7a83a7	[llvm][LICM] Limit multi-use BOAssociation to FP and Vector (#149829 ) Limit the re-association of BOps with multiple users to FP and Vector arithmetic.	2025-08-14 11:56:55 +01:00
Elvis Wang	01fac67e2a	[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342 ) This patch add cost kind to `getAddressComputationCost()` for #149955. Note that this patch also remove all the default value in `getAddressComputationCost()`.	2025-08-14 16:01:44 +08:00
Pavel Skripkin	30144226a4	[llvm] [InstCombine] fold "icmp eq (X + (V - 1)) & -V, X" to "icmp eq (and X, V - 1), 0" (#152851 ) This fold optimizes ```llvm define i1 @src(i32 %num, i32 %val) { %mask = add i32 %val, -1 %neg = sub nsw i32 0, %val %num.biased = add i32 %num, %mask %_2.sroa.0.0 = and i32 %num.biased, %neg %_0 = icmp eq i32 %_2.sroa.0.0, %num ret i1 %_0 } ``` to ```llvm define i1 @tgt(i32 %num, i32 %val) { %mask = add i32 %val, -1 %tmp = and i32 %num, %mask %ret = icmp eq i32 %tmp, 0 ret i1 %ret } ``` For power-of-two `val`. Observed in real life for following code ```rust pub fn is_aligned(num: usize) -> bool { num.next_multiple_of(1 << 12) == num } ``` which verifies that num is aligned to 4096. Alive2 proof https://alive2.llvm.org/ce/z/QisECm	2025-08-14 10:23:03 +03:00
Luke Lau	af06835483	[VPlan] Use parameter packs to avoid unary/binary/ternary matchers. NFC (#152272 ) Instead of defining unary/binary/ternary/4ary overloads of each matcher, we can use parameter packs to support arbitrary numbers of operands. This allows us to remove the explicit N-ary definitions for each matcher. We need to rewrite Recipe_match's constructor to use a parameter pack too, otherwise we end up with ambiguous overloads.	2025-08-14 11:55:55 +08:00
Florian Hahn	9400490a3c	[LV] Remove unused ILV state (NFC). Remove unused member variables from InnerLoopVectorizer.	2025-08-13 21:28:50 +01:00
Kazu Hirata	1f04b15c56	[Vectorize] Remove a redundant call to std::unique_ptr<T>::get (NFC) (#153359 )	2025-08-13 10:37:31 -07:00
Arnold Schwaighofer	9871adc54a	[coro] [async] Make sure to reprocess non-split async functions (#153419 ) We do this to inline the coro.end.async's tail called function into the non-split async coroutine. rdar://136296219	2025-08-13 10:10:35 -07:00
Sumanth Gundapaneni	77044f944c	[SeparateConstOffsetFromGEP] Decompose constant xor operand if possible (#150438 ) Try to transform XOR(A, B+C) in to XOR(A,C) + B where XOR(A,C) is part of base for memory operations. This transformation can map these Xors in to better addressing mode and eventually decompose them in to geps.	2025-08-13 11:49:44 -05:00
Alexey Bataev	dd5ba694bd	[SLP]Recalculate deps for potential control-dependent schedule data After clearing the dependencies in copyable data, need to recalculate dependencies for the original ScheduleData, if it can be marked as control dependent. Fixes #153289	2025-08-13 08:18:26 -07:00
Ramkumar Ramachandra	d107c29fef	[VPlan] Strip unused CanonicalIVTy arg (NFC) (#153418 )	2025-08-13 15:53:56 +01:00
Orlando Cazalet-Hyams	d13341db26	[RemoveDIs][NFC] Remove getAssignmentMarkers (#153214 ) getAssignmentMarkers was for debug intrinsics. getDVRAssignmentMarkers is used for DbgRecords.	2025-08-13 10:56:19 +01:00
Florian Hahn	48bfaa4c06	[VPlan] Replace VPBB for vector.ph during skeleton creation (NFC) Shift replacement of regular VPBB for vector.ph with the VPIRBB wrapping the created IR block directly to skeleton creation, to be consistent with how the scalar preheader is handled.	2025-08-13 08:30:18 +01:00
Thurston Dang	cf002847a4	Revert "[msan] Improve packed multiply-add instrumentation" (#153343 ) Reverts llvm/llvm-project#152941 Buildbot breakage: https://lab.llvm.org/buildbot/#/builders/66/builds/17843	2025-08-12 21:32:07 -07:00
Luke Lau	9217b6ab2e	[VPlan] Enforce that there is only ever one header mask. NFC (#152489 ) We almost only ever have one header mask, except with the data tail folding style, i.e. with VPInstruction::ActiveLaneMask. All we need to do is to make sure to erase the old header icmp based header mask when replacing it.	2025-08-13 02:39:04 +00:00
Thurston Dang	ba603b5e4d	[msan] Improve packed multiply-add instrumentation (#152941 ) The current instrumentation has false positives: if there is a single uninitialized bit in any of the operands, the entire output is poisoned. This does not take into account that multiplying an uninitialized value with zero results in an initialized zero value. This step allows elements that are zero to clear the corresponding shadow during the multiplication step. The horizontal add step and accumulation step (if any) are modeled using bitwise OR. Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX VNNI intrinsics.	2025-08-12 19:13:48 -07:00
Mircea Trofin	374cbfd327	[licm] clone `MD_prof` when hoisting conditional branch (#152232 ) The profiling - related metadata information for the hoisted conditional branch should be copied from the original branch, not from the current terminator of the block it's hoisted to. The patch adds a way to disable the fix just so we can do an ablation test, after which the flag will be removed. The same flag will be reused for other similar fixes. (This was identified through `profcheck` (see Issue #147390), and this PR addresses most of the test failures (when running under profcheck) under `Transforms/LICM`.)	2025-08-13 02:01:00 +02:00
Florian Hahn	8cdab07aaa	Reapply "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9. Recommit with a fix for the verifier error caused for EVL recipes. Extra test coverage added in 6f939da60e.	2025-08-12 22:09:30 +01:00
Leon Clark	e2bbd6d287	[VectorCombine][AMDGPU] Narrow Phi of Shuffles. (#140188 ) Attempt to narrow a phi of shufflevector instructions where the two incoming values have the same operands but different masks. Related to #128938. --------- Co-authored-by: Leon Clark <leoclark@amd.com>	2025-08-12 18:45:11 +01:00
Farzon Lotfi	544562ebc2	[DirectX] Remove lifetime intrinsics and run Dead Store Elimination (#152636 ) fixes #151764 This fix has two parts first we track all lifetime intrinsics and if they are users of an alloca of a target extention like dx.RawBuffer then we eliminate those memory intrinsics when we visit the alloca. We do step one to allow us to use the Dead Store Elimination Pass. This removes the alloca and simplifies the use of the target extention back to using just the global. That keeps things in a form the DXILBitcodeWriter is expecting. Obviously to pull this off we needed to bring back the legacy pass manager plumbing for the DSE pass and hook it up into the DirectX backend. The net impact of this change is that DML shader pass rate went from 89.72% (4268 successful compilations) to 90.98% (4328 successful compilations).	2025-08-12 12:42:08 -04:00
Florian Hahn	424258947e	[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879 ) Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: https://github.com/llvm/llvm-project/pull/152879	2025-08-12 14:13:13 +01:00
Leon Clark	9115bef8ee	[VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138 ) Reopen #128938. Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions. --------- Co-authored-by: Leon Clark <leoclark@amd.com> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-08-12 14:08:37 +01:00
Andreas Jonson	1840106ddf	[SCCP] Add support for trunc nuw range. (#152990 ) proof: https://alive2.llvm.org/ce/z/_7PVxq	2025-08-12 13:48:55 +02:00
Nikita Popov	ab323eb0c6	[SCCP][PredicateInfo] Do not predicate argument of lifetime intrinsic Replacing the argument with a no-op bitcast violates a verifier constraint, even if only temporarily. Any replacement based on it would result in a violation even after the copy has been removed. Fixes https://github.com/llvm/llvm-project/issues/153013.	2025-08-12 12:56:08 +02:00
David Sherwood	8140779a9a	[LV] Improve accuracy of branch weights in epilogue iteration check block (#152980 ) When one of the vector loops (main or epilogue) is scalable and the other isn't, we can use the estimated value of vscale to improve the accuracy.	2025-08-12 10:37:47 +01:00
Sam Tebbs	0bfa1718af	[LV] Create in-loop sub reductions (#147026 ) This PR allows the loop vectorizer to handle in-loop sub reductions by forming a normal in-loop add reduction with a negated input. Stacked PRs: 1. -> https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-08-12 10:22:41 +01:00
Jie Fu	2fc1b3dd9f	[MemorySanitizer] Fix an unused-variable warning (NFC) /llvm-project/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:2752:22: error: unused variable 'ParamType' [-Werror,-Wunused-variable] FixedVectorType *ParamType = ^ 1 error generated.	2025-08-12 07:51:53 +08:00
Thurston Dang	ef5022745c	[NFCI][msan] Refactor into 'horizontalReduce' (#152961 ) The functionality is used by two helper functions, and will be used even more in the future (e.g., https://github.com/llvm/llvm-project/pull/152941).	2025-08-11 15:48:20 -07:00
Florian Hahn	1c7c8e3ad3	Revert "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2. This seems to be breaking some RISCV bots, reverting for now https://lab.llvm.org/buildbot/#/builders/210/builds/1266	2025-08-11 22:05:30 +01:00
Florian Hahn	1f17bb133f	[VPlan] Remove trivial dead VPPhi cycles. Update removeDeadRecipes to remove trivial dead VPPhi cycles. Should effectively be NFC end-to-end.	2025-08-11 21:29:49 +01:00
XChy	df75b4b942	Revert "[DFAJumpThreading] Prevent pass from using too much memory." (#153075 ) Reverts llvm/llvm-project#145482	2025-08-12 04:26:47 +08:00
Alexey Bataev	2d7b55a028	[SLP]Initial support for copyable elements Adds initial support for copyable elements, both schedulable and non-schedulable. Adds support only for add for now, other opcodes will added in future. Still some cases are not handled, e.g. stores do not include this, because currently do not check for copyable elements. Reviewers: hiraditya, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/147366	2025-08-11 09:41:19 -04:00
Alexey Bataev	67af2f6c5c	[SLP]Initial FMAD support (#149102 ) Added initial check for potential fmad conversion in reductions and operands vectorization. Added the check for instruction to fix #152683 Skipped the code for reduction to avoid regressions.	2025-08-11 05:53:55 -07:00
Weibo He	13cd725857	[CoroSplit] Remove lifetime marker checks for subranges of allocas (#152886 ) #150248 starts to drop size argument of lifetime markers. Then lifetime markers cannot refer to subrange of allocas and we can remove this check.	2025-08-11 13:02:07 +02:00
Andreas Jonson	330a589450	[PredicateInfo] Handle trunc nuw i1 condition. (#152988 ) proof: https://alive2.llvm.org/ce/z/mxtn4L	2025-08-11 13:00:54 +02:00
Luke Lau	aea82a780a	[VPlan] Remove some getCanonicalIV() uses. NFC (#152969 ) A lot of time getCanonicalIV() is used to get the canonical IV type, e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext. However VPTypeAnalysis has a constructor that takes the VPlan directly and there's a method on VPlan to get the LLVMContext directly, so use those instead where possible. This lets us remove a constructor on VPTypeAnalysis. Also remove an unused LLVMContext argument in UnrollState whilst we're here.	2025-08-11 18:12:05 +08:00
Luke Lau	acb86fb9e0	[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657 ) In some places we were passing the type of value being accessed, in other cases we were passing the type of the pointer for the access. The most "involved" user is LoopVectorizationCostModel::getMemInstScalarizationCost, which is the only call site that passes in the SCEV, and it passes along the pointer type. This changes call sites to consistently pass the pointer type, and renames the arguments to clarify this. No target actually checks the contents of the type passed, only to see if it's a vector or not, so this shouldn't have an effect.	2025-08-11 18:00:12 +08:00
Ramkumar Ramachandra	95c525b1db	[VPlan] Preserve nusw on VectorEndPointer (#151558 ) In createInterleaveGroups, get the nusw in addition to inbounds from the existing GEP, and set them on the VPVectorEndPointerRecipe.	2025-08-11 10:38:25 +01:00
David Sherwood	aba0ce10c7	[LV] Add new line to interleaving disabled message (#152722 )	2025-08-11 09:53:20 +01:00
David Sherwood	9181a7e294	[LV] Fix branch weights in epilogue min iteration check block (#152534 ) I've changed how we construct the EpilogueVectorizerEpilogueLoop and EpilogueVectorizerMainLoop classes so that we construct the parent class with an additional boolean parameter indicating whether we're vectorising the main or epilogue loop. The InnerLoopAndEpilogueVectorizer class uses this new argument in combination with the EpilogueLoopVectorizationInfo struct to set the right UF and VF values. This then allows EpilogueVectorizerEpilogueLoop to access the correct values of VF and UF for the main loop, which are required when setting branch weights in the minimum iteration check block.	2025-08-11 09:52:54 +01:00
Elvis Wang	37fe7a9933	[LV] Generate scalar xor for VPInstruction::Not if possible. (#152628 ) `VPInstruction::Not` which will generate xor instruction is widely used for the exit condition. This patch make `VPInstruction::Not` generate scalar `xor` if possible. This can help reducing the (splat true) in the `xor` and make `xor` be scalar.	2025-08-11 16:35:21 +08:00
Yingwei Zheng	84b31581f8	Revert "[PatternMatch] Add `m_[Shift]OrSelf` matchers." (#152953 ) Reverts llvm/llvm-project#152924 According to `f67668b586`, it is not an NFC change.	2025-08-11 09:35:16 +02:00
hanbeom	a750fcb52b	[GVN] Check IndirectBr in Predecessor Terminators (#151188 ) Critical edges with an IndirectBr terminator cannot be split. Add a check it to prevent assertion failures. Fixes: #150229	2025-08-11 09:25:52 +02:00
Nikita Popov	35bad229c1	[PredicateInfo] Use bitcast instead of ssa.copy (#151174 ) PredicateInfo needs some no-op to which the predicate can be attached. Currently this is an ssa.copy intrinsic. This PR replaces it with a no-op bitcast. Using a bitcast is more efficient because we don't have the overhead of an overloaded intrinsic. It also makes things slightly simpler overall.	2025-08-11 09:25:01 +02:00
David Green	6ca6d45b29	[VectorCombine] Use hasOneUser in shuffle-to-identity fold (#152675 ) We need to check that the node is part of the graph being converted, so will not contain external uses when transformed.	2025-08-11 07:45:15 +01:00
Mel Chen	6db3776f9b	[LV][EVL] Simplify EVL recipe transformation by using a single EVL mask. nfc (#152479 ) The EVL mask is always defined as `icmp ult (step-vector, EVL)`, so we only need to generate it once per plan in the header. Then, we replace all uses of the header mask with the EVL mask, and recursively optimize the users of EVL mask into EVL recipes. This way, the transformation to EVL recipes can be done with just a single loop.	2025-08-11 11:09:01 +08:00

1 2 3 4 5 ...

40745 Commits