llvm-project

Author	SHA1	Message	Date
Florian Hahn	7e9989390d	[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487 ) Materialze Build(Struct)Vectors explicitly for VPRecplicateRecipes, to serve their users requiring a vector, instead of doing so when unrolling by VF. Now we only need to implicitly build vectors in VPTransformState::get for VPInstructions. Once they are also unrolled by VF we can remove the code-path alltogether. PR: https://github.com/llvm/llvm-project/pull/151487	2025-08-18 20:49:42 +01:00
Thurston Dang	ade755d62b	[msan] Add Instrumentation for Avx512 Instructions: pmaddw, pmaddubs (#153919 ) This applies the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to the Avx512 equivalent of the pmaddw and pmaddubs intrinsics: <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>) <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)	2025-08-18 11:31:15 -07:00
Kyle Wang	064f02dac0	[VectorCombine] Preserve scoped alias metadata (#153714 ) Right now if a load op is scalarized, the `!alias.scope` and `!noalias` metadata are dropped. This PR is to keep them if exist.	2025-08-18 18:16:32 +00:00
Tobias Stadler	8135b7c1ab	[LV] Emit all remarks for unvectorizable instructions (#153833 ) If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first. This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.	2025-08-18 18:04:53 +01:00
Ramkumar Ramachandra	97f554249c	[VPlan] Preserve nusw in createInBoundsPtrAdd (#151549 ) Rename createInBoundsPtrAdd to createNoWrapPtrAdd, and preserve nusw as well as inbounds at the callsite.	2025-08-18 17:48:42 +01:00
Andreas Jonson	1b60236200	[SimplifyCFG] Avoid redundant calls in gather. (NFC) (#154133 ) Split out from https://github.com/llvm/llvm-project/pull/154007 as it showed compile time improvements NFC as there needs to be at least two icmps that is part of the chain.	2025-08-18 18:45:52 +02:00
Antonio Frighetto	33761df961	Revert "[SimpleLoopUnswitch] Record loops from unswitching non-trivial conditions" This reverts commit e9de32fd159d30cfd6fcc861b57b7e99ec2742ab due to multiple performance regressions observed across downstream Numba benchmarks (https://github.com/llvm/llvm-project/issues/138509#issuecomment-3193855772). While avoiding non-trivial unswitches on newly-cloned loops helps mitigate the pathological case reported in https://github.com/llvm/llvm-project/issues/138509, it may as well make the IR less friendly to vectorization / loop- canonicalization (in the test reported, previously no select with loop-carried dependence existed in the new specialized loops), leading the abovementioned approach to be reconsidered.	2025-08-18 17:40:08 +02:00
Kazu Hirata	07eb7b7692	[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068 ) This patch replaces SmallSet<T , N> with SmallPtrSet<T , N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.	2025-08-18 07:01:29 -07:00
Arne Stenkrona	ea2f5395b1	[SimplifyCFG] Avoid threading for loop headers (#151142 ) Updates SimplifyCFG to avoid jump threading through loop headers if -keep-loops is requested. Canonical loop form requires a loop header that dominates all blocks in the loop. If we thread through a header, we risk breaking its domination of the loop. This change avoids this issue by conservatively avoiding threading through headers entirely. Fixes: https://github.com/llvm/llvm-project/issues/151144	2025-08-18 09:46:55 +00:00
David Sherwood	7ee6cf06c8	[LV] Fix incorrect cost kind in VPReplicateRecipe::computeCost (#153216 ) We were incorrectly using the TTI::TCK_RecipThroughput cost kind and ignoring the kind set in the context.	2025-08-18 09:52:31 +01:00
David Green	790bee99de	[VectorCombine] Remove dead node immediately in VectorCombine (#149047 ) The vector combiner will process all instructions as it first loops through the function, adding any newly added and deleted instructions to a worklist which is then processed when all nodes are done. These leaves extra uses in the graph as the initial processing is performed, leading to sub-optimal decisions being made for other combines. This changes it so that trivially dead instructions are removed immediately. The main changes that this requires is to make sure iterator invalidation does not occur.	2025-08-18 07:55:21 +01:00
Kazu Hirata	cbf5af9668	[llvm] Remove unused includes (NFC) (#154051 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-08-17 23:46:35 -07:00
Mel Chen	145e8aadca	[LV][EVL] Add dead EVL mask into ToErase for consistency. nfc (#153761 )	2025-08-18 14:11:50 +08:00
Owen Anderson	69e4514978	[GlobalOpt] Do not fold away addrspacecasts which may be runtime operations (#153753 ) Specifically in the context of the once-stored transformation, GlobalOpt would strip all pointer casts unconditionally, even though addrspacecasts might be runtime operations. This manifested particularly on CHERI targets. This patch was inspired by an existing change in CHERI LLVM (`91afa60f17`), but has been reimplemented with updated conventions, and a testcase constructed from scratch.	2025-08-18 02:11:51 +00:00
Florian Hahn	5892a2beec	[VPlan] Remove dead code from GetBroadCastInstr (NFCI). All relevant places should already explicitly materialize broadcasts. Remove dead code from VPTransformState::get	2025-08-17 21:51:14 +01:00
Andreas Jonson	5ae8a9b8ce	[SimplifyCfg] Handle trunc nuw i1 condition in Equality comparison. (#153051 ) proof: https://alive2.llvm.org/ce/z/WVt4-F	2025-08-17 09:53:40 +02:00
Florian Hahn	351d398a37	[VPlan] Run final VPlan simplifications before codegen. Dissolving the hierarchical VPlan CFG and converting abstract to concrete recipes can expose additional simplification opportunities. Do a final run of simplifyRecipes before executing the VPlan.	2025-08-16 18:54:27 +01:00
Mircea Trofin	c971c25544	[licm] don't drop `MD_prof` when dropping other metadata (#152420 ) Part of Issue #147390	2025-08-16 07:26:13 -07:00
Thurston Dang	638bd11c13	[msan] Handle SSE/AVX pshuf intrinsic by applying to shadow (#153895 ) llvm.x86.sse.pshuf.w(<1 x i64>, i8) and llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>) are currently handled strictly, which is suboptimal. llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>) llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>) and llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>) are currently heuristically handled using maybeHandleSimpleNomemIntrinsic, which is incorrect. Since the second argument is the shuffle order, we instrument all these intrinsics using `handleIntrinsicByApplyingToShadow(..., /trailingVerbatimArgs=/1)` (https://github.com/llvm/llvm-project/pull/114490).	2025-08-15 20:28:30 -07:00
Matt Arsenault	3e5d8a1439	Reapply "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864 ) This reverts commit 334e9bf2dd01fbbfe785624c0de477b725cde6f2. Check if llvm-nm exists before building the benchmark.	2025-08-16 09:53:50 +09:00
Thurston Dang	2b75ff192d	[msan] Reland with even more improvement: Improve packed multiply-add instrumentation (#153353 ) This reverts commit cf002847a464c004a57ca4777251b1aafc33d958 i.e., relands ba603b5e4d44f1a25207a2a00196471d2ba93424. It was reverted because it was subtly wrong: multiplying an uninitialized zero should not result in an initialized zero. This reland fixes the issue by using instrumentation analogous to visitAnd (bitwise AND of an initialized zero and an uninitialized value results in an initialized value). Additionally, this reland expands a test case; fixes the commit message; and optimizes the change to avoid the need for horizontalReduce. The current instrumentation has false positives: it does not take into account that multiplying an initialized zero value with an uninitialized value results in an initialized zero value This change fixes the issue during the multiplication step. The horizontal add step is modeled using bitwise OR. Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX VNNI intrinsics.	2025-08-15 16:35:42 -07:00
gulfemsavrun	334e9bf2dd	Revert "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864 ) …210)" This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3. Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)" This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de. Revert "TableGen: Emit statically generated hash table for runtime libcalls (#150192)" This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05. Reverted three changes because of a CMake error while building llvm-nm as reported in the following PR: https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073	2025-08-15 13:32:27 -07:00
Alexey Bataev	b157599156	[SLP]Do not include copyable data to the same user twice If the copyable schedule data is created and the user is used several times in the user node, no need to count same data for the same user several times, need to include it only ones. Fixes #153754	2025-08-15 12:36:45 -07:00
Florian Hahn	2ed727f3f6	[VPlan] Move SCEV invalidation to ::executePlan. (NFCI) Move SCEV invalidation from legacy ILV code-path directly to ::executePlan.	2025-08-15 20:32:41 +01:00
Alexey Bataev	09f5b9ab0a	Revert "[SLP]Do not include copyable data to the same user twice" This reverts commit 758c6852c3ffe6b5e259cafadd811e60d8c276fb to fix buildbot https://lab.llvm.org/buildbot/#/builders/195/builds/13298	2025-08-15 12:08:31 -07:00
zGoldthorpe	82caa251d4	[InstCombine] Fold integer unpack/repack patterns through ZExt (#153583 ) This patch explicitly enables the InstCombiner to fold integer unpack/repack patterns such as ```llvm define i64 @src_combine(i32 %lower, i32 %upper) { %base = zext i32 %lower to i64 %u.0 = and i32 %upper, u0xff %z.0 = zext i32 %u.0 to i64 %s.0 = shl i64 %z.0, 32 %o.0 = or i64 %base, %s.0 %r.1 = lshr i32 %upper, 8 %u.1 = and i32 %r.1, u0xff %z.1 = zext i32 %u.1 to i64 %s.1 = shl i64 %z.1, 40 %o.1 = or i64 %o.0, %s.1 %r.2 = lshr i32 %upper, 16 %u.2 = and i32 %r.2, u0xff %z.2 = zext i32 %u.2 to i64 %s.2 = shl i64 %z.2, 48 %o.2 = or i64 %o.1, %s.2 %r.3 = lshr i32 %upper, 24 %u.3 = and i32 %r.3, u0xff %z.3 = zext i32 %u.3 to i64 %s.3 = shl i64 %z.3, 56 %o.3 = or i64 %o.2, %s.3 ret i64 %o.3 } ; => define i64 @tgt_combine(i32 %lower, i32 %upper) { %base = zext i32 %lower to i64 %upper.zext = zext i32 %upper to i64 %s.0 = shl nuw i64 %upper.zext, 32 %o.3 = or disjoint i64 %s.0, %base ret i64 %o.3 } ``` Alive2 proofs: [YAy7ny](https://alive2.llvm.org/ce/z/YAy7ny)	2025-08-15 12:48:32 -06:00
Alexey Bataev	758c6852c3	[SLP]Do not include copyable data to the same user twice If the copyable schedule data is created and the user is used several times in the user node, no need to count same data for the same user several times, need to include it only ones. Fixes #153754	2025-08-15 11:47:35 -07:00
XChy	3a4a60deff	[VectorCombine] Apply InstSimplify in scalarizeOpOrCmp to avoid infinite loop (#153069 ) Fixes #153012 As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we may fold ```llvm define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 { entry: %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0 %159 = or disjoint <2 x i64> splat (i64 2), %158 store <2 x i64> %159, ptr %ptr2 ret void } ``` to ```llvm define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) { entry: %.scalar = or disjoint i64 2, %idx %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)> %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0 store <2 x i64> %1, ptr %ptr2, align 16 ret void } ``` And it would be folded back in `foldInsExtBinop`, resulting in an infinite loop. This patch forces scalarization iff InstSimplify can fold the constant expression.	2025-08-15 18:38:04 +00:00
Kaitlin Peng	0bb1af478a	[DirectX] Add GlobalDCE pass after finalize linkage pass in DirectX backend (#151071 ) Fixes #139023. This PR essentially removes unused global variables: - Restores the `GlobalDCE` Legacy pass and adds it to the DirectX backend after the finalize linkage pass - Converts external global variables with no usage to internal linkage in the finalize linkage pass - (so they can be removed by `GlobalDCE`) - Makes the `dxil-finalize-linkage` pass usable using the new pass manager flag syntax - Adds tests to `finalize_linkage.ll` that make sure unused global variables are removed - Adds a use for variable `@CBV` in `opaque-value_as_metadata.ll` so it isn't removed - Changes the `scalar-data.ll` run command to avoid removing its global variables --------- Co-authored-by: Farzon Lotfi <farzonlotfi@microsoft.com>	2025-08-15 10:45:34 -07:00
zGoldthorpe	a8d25683ee	[PatternMatch] Allow `m_ConstantInt` to match integer splats (#153692 ) When matching integers, `m_ConstantInt` is a convenient alternative to `m_APInt` for matching unsigned 64-bit integers, allowing one to simplify ```cpp const APInt *IntC; if (match(V, m_APInt(IntC))) { if (IntC->ule(UINT64_MAX)) { uint64_t Int = IntC->getZExtValue(); // ... } } ``` to ```cpp uint64_t Int; if (match(V, m_ConstantInt(Int))) { // ... } ``` However, this simplification is only true if `V` is a scalar type. Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt` does not. This patch ensures that the matching behaviour of `m_ConstantInt` parallels that of `m_APInt`, and also incorporates it in some obvious places.	2025-08-15 10:43:54 -06:00
Ramkumar Ramachandra	f34326dac8	[VPlan] Introduce vputils::onlyScalarValuesUsed (NFC) (#153577 )	2025-08-15 15:55:59 +00:00
Alexey Bataev	13b54f7dc1	[SLP] Recalculate dependencies for potential control dependencies if cleared If the control dependecies are cleared after calcellation of the copyables, need to reclculate them unconditionally. Fixes #153754 #153676	2025-08-15 07:52:10 -07:00
Stephen Tozer	bc216b057d	[Debugify] Improve reduction of debugify coverage build output (#150212 ) In current DebugLoc coverage builds, the output for any reasonably large build can become very large if any missing DebugLocs are present; this happens because single errors in LLVM may result in many errors being reported in the output report. The main cause of this is that the empty locations attached to instructions may be propagated to other instructions in later passes, which will each be reported as new errors. This patch prevents this by adding an "unknown" annotation to instructions after reporting them once, ensuring that any other DebugLocs copied or derived from the original empty location will not be marked as new errors. As a separate but related change, this patch updates the report generation script to deduplicate results using the recorded stacktrace if they are available, instead of the pass+instruction combination. This reduces the size of the reduction, but makes the reduction highly reliable, as the stacktrace allows us to very precisely identify when two bugs have originated from the same place.	2025-08-15 14:01:04 +01:00
Tobias Stadler	d803a93f55	[Inliner] Report inlining decision before deleting Callee contents (#153616 ) Call `recordInliningWithCalleeDeleted` before dropping the contents of the Callee. Otherwise the handlers don't have access to e.g. the DebugLoc, so the Callee DebugLoc was missing in inlining remarks for functions with internal linkage. The test is the same as `optimization-remarks-passed-yaml.ll` except that the function `foo` has internal linkage instead of external linkage.	2025-08-15 12:00:34 +01:00
Mircea Trofin	b8d74ad2b3	[JTS] Use common branch weight downscaling (#153738 ) This also fixes a bug introduced accidentally in #153651, whereby the `JumpTableToSwitch` would convert all the branch weights to 0 except for one. It didn't trip the test because `update_test_checks` wasn't run with `-check-globals`. It is now. This also made noticeable that the direct calls promoted from the indirect call inherited the `VP`metadata, which should be dropped as it makes no more sense now.	2025-08-15 07:30:43 +00:00
Mircea Trofin	3b4775d31d	[NFC][PGO] Factor downscaling of branch weights out of `Instrumentation` into `ProfileData` (#153735 ) The logic isn’t instrumentation-specific, and the refactoring allows users avoid a dependency on `Instrumentation` and just take one on `ProfileData` (which a fairly low-level dependency)	2025-08-14 22:44:36 -07:00
Mircea Trofin	93d24b6b7b	[NFC][PGO] Drop unused `Module` parameter in `setProfMetadata` (#153733 )	2025-08-14 19:55:39 -07:00
Alexey Bataev	bf2f241458	[SLP]Support LShr as base for copyable elements Added support for LShr instructions as base for copyable elements. Also, added simple analysis for best base instruction selection, if multiple candidates are available. Fixed scheduling after cancellation Reviewers: hiraditya, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/153393	2025-08-14 19:12:27 -07:00
Matt Arsenault	cb1228fbd5	RuntimeLibcalls: Return StringRef for libcall names (#153209 ) Does not yet fully propagate this down into the TargetLowering uses, many of which are relying on null checks on the returned value.	2025-08-15 09:55:39 +09:00
Alex Bradbury	db5f7dc374	Revert "[SLP]Support LShr as base for copyable elements" This reverts commit ca4ebf95172d24f8c47655709b2c9eb85bda5cb2. Causes compile-time crashes for some inputs with RVV zvl512b/zvl1024b configurations. See here for a minimal reproducer: https://github.com/llvm/llvm-project/pull/153393#issuecomment-3189898813	2025-08-14 22:18:24 +01:00
Michael Berg	334a046a3c	[LoopDist] Consider reads and writes together for runtime checks (#145623 ) Emit safety guards for ptr accesses when cross partition loads exist which have a corresponding store to the same address in a different partition. This will emit the necessary ptr checks for these accesses. The test case was obtained from SuperTest, which SiFive runs regularly. We enabled LoopDistribution by default in our downstream compiler, this change was part of that enablement.	2025-08-14 12:50:17 -07:00
Mircea Trofin	a508ea2ad7	Add dependency on `ProfileData` from ScalarOpts (#153651 ) Fixing buildbot failures after PR #153305, e.g. https://lab.llvm.org/buildbot/#/builders/203/builds/19861 Analysis already depends on `ProfileData`, so the transitive closure of the dependencies of `ScalarOpts` doesn't change. Also avoided an extra dependency (and very unnecessary) on `Instrumentation`. The API previously used doesn't need to live in Instrumentation to begin with, but that's something to address in a follow-up.	2025-08-14 12:37:17 -07:00
Mircea Trofin	016c301d30	[NFC] Use `[[maybe_unused]]` for variable used in assertion (#153639 )	2025-08-14 18:52:56 +00:00
Florian Hahn	db98ac43ec	[LV] Use shl for ((VF * Step) * vscale) in createStepForVF. (#153495 ) Directly emit shl instead of a multiply if VF * Step is a power-of-2. The main motivation here is to prepare the code and test for directly generating and expanding a SCEV expression of the minimum iteration count. SCEVExpander will directly emit shl for multiplies with powers-of-2. InstCombine will also performs this combine, so end-to-end this should effectively by NFC. PR: https://github.com/llvm/llvm-project/pull/153495	2025-08-14 19:27:51 +01:00
Mircea Trofin	f5d284309f	[JTS] Propagate profile info (#153305 ) If the indirect call target being recognized as a jump table has profile info, we can accurately synthesize the branch weights of the switch that replaces the indirect call. Otherwise we insert the "unknown" `MD_prof` to indicate this is the best we can do here. Part of Issue #147390	2025-08-14 11:17:57 -07:00
Florian Hahn	ff0ce74be8	[VPlan] Replace scalar preheader with VPIRBB at single place (NFC). Replace the scalar preheader VPBB with an VPIRBB wrapping the IR basic block created by createVectorizedLoopSkeleton.	2025-08-14 19:11:34 +01:00
Ramkumar Ramachandra	86482dffba	[VPlan] Use m_Broadcast to improve a match (NFC) (#153607 )	2025-08-14 18:10:58 +01:00
Vedant Paranjape	44df9826f3	[InstCombine] Propagate invariant.load metadata across unpacked loads (#152186 ) For loads that operate on aggregate type, instcombine unpacks the loads. It does not preserve the invariant.load metadata. This patch fixes that, it looks for the metadata in the parent load and attaches the metadata to the unpacked loads. ``` %struct.double2 = type { double, double } %struct.double1 = type { double } define %struct.double2 @func1(ptr %a) { %1 = load %struct.double2, ptr %a, align 16, !invariant.load !1 ret %struct.double2 %1 } !1 = !{} ``` Reproducer: https://godbolt.org/z/hcY8MMvYh	2025-08-14 10:08:26 -07:00
Alexey Bataev	ca4ebf9517	[SLP]Support LShr as base for copyable elements Added support for LShr instructions as base for copyable elements. Also, added simple analysis for best base instruction selection, if multiple candidates are available. Reviewers: hiraditya, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/153393	2025-08-14 12:35:28 -04:00
Alexey Bataev	d57ab276b6	[SLP] Recalculate cleared deps for potential control schedule data nodes Need to recalculate the dependencies for all potential control data schedule nodes to prevent compiler crash. Fixes #153571	2025-08-14 09:00:42 -07:00

1 2 3 4 5 ...

40741 Commits