llvm-project

Author	SHA1	Message	Date
Florian Hahn	8bc038daf2	[InstComb] Allow more user for (add (ptrtoint %B), %O) to GEP transform. (#153566 ) Generalize the logic from https://github.com/llvm/llvm-project/pull/153421 to support additional cases where the pointer is only used as integer. Alive2 Proof: https://alive2.llvm.org/ce/z/po58pP This enables vectorizing std::find for some cases, if additional assumptions are provided: https://godbolt.org/z/94oq3576E Depends on https://github.com/llvm/llvm-project/pull/15342. PR: https://github.com/llvm/llvm-project/pull/153566	2025-08-22 10:17:12 +01:00
Bjorn Pettersson	2d3167f8d8	[SeparateConstOffsetFromGEP] Avoid miscompiles related to trunc nuw/nsw (#154582 ) Drop poison generating flags on trunc when distributing trunc over add/sub/or. We need to do this since for example (add (trunc nuw A), (trunc nuw B)) is more poisonous than (trunc nuw (add A, B))). In some situations it is pessimistic to drop the flags. Such as if the add in the example above also has the nuw flag. For now we keep it simple and always drop the flags. Worth mentioning is that we drop the flags when cloning instructions and rebuilding the chain. This is done after the "allowsPreservingNUW" checks in ConstantOffsetExtractor::Extract. So we still take the "trunc nuw" into consideration when determining if nuw can be preserved in the gep (which should be ok since that check also require that all the involved binary operations has nuw). Fixes #154116	2025-08-22 10:27:57 +02:00
Bjorn Pettersson	4ff7ac2330	[SeparateConstOffsetFromGEP] Add test case with trunc nuw/nsw showing miscompile Pre commit a test case for issue #154116. When redistributing trunc over add/sub/or we may need to drop poison generating flags from the trunc.	2025-08-22 10:26:09 +02:00
Matt Arsenault	b1b5102624	AMDGPU: Start considering new atomicrmw metadata on integer operations (#122138 ) Start considering !amdgpu.no.remote.memory.access and !amdgpu.no.fine.grained.host.memory metadata when deciding to expand integer atomic operations. This does not yet attempt to accurately handle fadd/fmin/fmax, which are trickier and require migrating the old "amdgpu-unsafe-fp-atomics" attribute.	2025-08-22 05:29:36 +00:00
Matt Arsenault	01f785cac4	AMDGPU: Expand remaining system atomic operations (#122137 ) System scope atomics need to use cmpxchg loops if we know nothing about the allocation the address is from. aea5980e26e6a87dab9f8acb10eb3a59dd143cb1 started this, this expands the set to cover the remaining integer operations. Don't expand xchg and add, those theoretically should work over PCIe. This is a pre-commit which will introduce performance regressions. Subsequent changes will add handling of new atomicrmw metadata, which will avoid the expansion. Note this still isn't conservative enough; we do need to expand some device scope atomics if the memory is in fine-grained remote memory.	2025-08-22 13:55:04 +09:00
Luke Lau	c97c6869b6	[VPlan] Allow folding not (cmp eq) -> icmp ne with other select users (#154497 ) Currently we only allow folding not (cmp eq) -> icmp ne if the not is the only user of the compare. However a common scenario is that some select might also use the compare. We can still fold the not if we also swizzle the arms of the selects. This helps avoid regressions in #150368	2025-08-22 07:59:14 +08:00
Peter Collingbourne	ff85dbdf6b	ThinLTOBitcodeWriter: Emit __cfi_check to full LTO part of bitcode file. The CrossDSOCFI pass runs on the full LTO module and fills in the body of __cfi_check. This function must have the correct attributes in order to be compatible with the rest of the program. For example, when building with -mbranch-protection=standard, the function must have the branch-target-enforcement attribute, which is normally added by Clang. When __cfi_check is missing, CrossDSOCFI will give it the default set of attributes, which are likely incorrect. Therefore, emit __cfi_check to the full LTO part, where CrossDSOCFI will see it. Reviewers: efriedma-quic, vitalybuka, fmayer Reviewed By: efriedma-quic Pull Request: https://github.com/llvm/llvm-project/pull/154833	2025-08-21 16:31:32 -07:00
Florian Hahn	ba5d487ac4	[LV] Add test with interleave groups separated by offset. Adds extra test coverage for https://github.com/llvm/llvm-project/pull/91196.	2025-08-21 21:40:47 +01:00
Alex MacLean	4a9d2187ee	[SeparateConstOffsetFromGEP] propagate const offset through GEP chains (#143470 ) When separating the constant offset from a GEP, if the pointer operand is a constant ptradd (likely generated when we performed this transform on that GEP), we accumulate the offset into the current offset. This ensures that when there is a chain of GEPs the constant offset reaches the final memory instruction where it can likely be folded into the addressing.	2025-08-21 13:37:24 -07:00
Florian Hahn	e41aaf5a64	[VPlan] Use VPIRMetadata for VPInterleaveRecipe. (#153084 ) Use VPIRMetadata for VPInterleaveRecipe to preserve noalias metadata added by versioning. This still uses InterleaveGroup's logic to preserve existing metadata from IR. This can be migrated separately. Fixes https://github.com/llvm/llvm-project/issues/153006. PR: https://github.com/llvm/llvm-project/pull/153084	2025-08-21 18:58:10 +01:00
Florian Hahn	1b0b59ae43	[InstComb] Fold inttoptr (add (ptrtoint %B), %O) -> GEP for ICMP users. (#153421 ) Replace inttoptr (add (ptrtoint %B), %O) with (getelementptr i8, %B, %o) if all users are ICmp instruction, which in turn means only the address value is compared. We should be able to do this, if the src pointer, the integer type and the destination pointer types have the same bitwidth and address space. A common source of such (inttoptr (add (ptrtoint %B), %O)) is from various iterations in libc++. In practice this triggers in a number of files in Clang and various open source projects, including cppcheck, diamond, llama and more. Alive2 Proof with constant offset: https://alive2.llvm.org/ce/z/K_5N_B PR: https://github.com/llvm/llvm-project/pull/153421	2025-08-21 16:36:25 +01:00
Florian Hahn	cfef05e69c	[InstCombine] Add tests for (inttoptr (add (ptrtoint %Base), %Offset)). Precommit tests for https://github.com/llvm/llvm-project/pull/153421.	2025-08-21 15:42:39 +01:00
Luke Lau	42cf9c60d7	[RISCV] Mark Sub/AddChainWithSubs as legal reduction types (#154753 ) We used to vectorize these scalably but after #147026 they were split out from RecurKind::Add into their own RecurKinds, and we didn't mark them as supported in isLegalToVectorizeReduction. This caused the loop vectorizer to drop the scalable VPlan because it thinks the reductions will be scalarized. This fixes it by just marking them as supported. Fixes #154554	2025-08-21 21:43:48 +08:00
Florian Hahn	d67dba5e88	[VPlan] Check Def2LaneDefs first in cloneForLane. (NFC) If we have entries in Def2LaneDefs, we always have to use it. Move the check before. Otherwise we may not pick the correct operand, e.g. if Op was a replicate recipe that got single-scalar after replicating it. Fixes https://github.com/llvm/llvm-project/issues/154330.	2025-08-21 11:34:49 +01:00
Elvis Wang	d611a9ca15	[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. (#154482 ) `VPEVLBasedIVPHIRecipe` will lower to VPInstruction scalar phi and generate scalar phi. This recipe will only occupy a scalar register just like other phi recipes. This patch fix the register usage for `VPEVLBasedIVPHIRecipe` from vector to scalar which is close to generated vector IR. https://godbolt.org/z/6Mzd6W6ha shows that no register spills when choosing `<vscale x 16>`. Note that this test is basically copied from AArch64.	2025-08-21 07:39:01 +08:00
Shih-Po Hung	cf0e86118d	[VPlan] Handle canonical VPWidenIntOrFpInduction in branch-condition simplification (#153539 ) SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a few PHI recipes in the loop header. With more IV-step optimizations, the canonical widen-canonical-iv can be replaced by a canonical VPWidenIntOrFpInduction, which the pass did not handle, causing regressions (missed simplifications). This patch replaces canonical VPWidenIntOrFpInduction with a StepVector in the vector preheader since the vector loop region only executes once.	2025-08-21 07:34:54 +08:00
Florian Hahn	7d33743324	[LV] Add tests for narrowing interleave groups with scalable vectors.	2025-08-20 22:31:24 +01:00
Florian Hahn	b0d0e04693	[LV] Add test where we choose VF * IC is larger than trip count.	2025-08-20 20:40:49 +01:00
Ryotaro Kasuga	2330fd2f73	[LoopPeel] Add new option to peeling loops to convert PHI into IV (#121104 ) LoopPeel currently considers PHI nodes that become loop invariants through peeling. However, in some cases, peeling transforms PHI nodes into induction variables (IVs), potentially enabling further optimizations such as loop vectorization. For example: ```c // TSVC s292 int im = N-1; for (int i=0; i<N; i++) { a[i] = b[i] + b[im]; im = i; } ``` In this case, peeling one iteration converts `im` into an IV, allowing it to be handled by the loop vectorizer. This patch adds a new feature to peel loops when to convert PHIs into IVs. At the moment this feature is disabled by default. Enabling it allows to vectorize the above example. I have measured on neoverse-v2 and observed a speedup of more than 60% (options: `-O3 -ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`). This PR is taken over from #94900 Related #81851	2025-08-20 13:44:56 +00:00
Florian Hahn	dc23869f98	[LV] Handle vector trip count being zero in preparePlanForEpiVectorLoop. After a485e0e, we may not set the vector trip count in preparePlanForEpilogueVectorLoop if it is zero. We should not choose a VF * UF that makes the main vector loop dead (i.e. vector trip count is zero), but there are some cases where this can happen currently. In those cases, set EPI.VectorTripCount to zero.	2025-08-20 11:54:22 +01:00
Nikita Popov	5ae749b77d	[FunctionAttr] Invalidate callers with mismatching signature (#154289 ) If FunctionAttrs infers additional attributes on a function, it also invalidates analysis on callers of that function. The way it does this right now limits this to calls with matching signature. However, the function attributes will also be used when the signatures do not match. Use getCalledOperand() to avoid a signature check. This is not a correctness fix, just improves analysis quality. I noticed this due to https://github.com/llvm/llvm-project/pull/144497#issuecomment-3199330709, where LICM ends up with a stale MemoryDef that could be a MemoryUse (which is a bug in LICM, but still non-optimal).	2025-08-20 11:38:31 +02:00
Kerry McLaughlin	c34cba0413	[AArch64][SME] Lower aarch64.sme.cnts* to vscale when in streaming mode (#154305 ) In streaming mode, both the @llvm.aarch64.sme.cnts and @llvm.aarch64.sve.cnt intrinsics are equivalent. For SVE, cnt* is lowered in instCombineIntrinsic to @llvm.sme.vscale(). This patch lowers the SME intrinsic similarly when in streaming-mode.	2025-08-20 09:48:36 +01:00
AZero13	f94290cbff	[ValueTracking][GlobalISel] UCMP and SCMP cannot create undef or poison (#154404 ) They cannot make poison or undef, same for IR. They can only make -1, 0, or 1 Alive 2: https://alive2.llvm.org/ce/z/--Jd78	2025-08-20 08:41:27 +09:00
Florian Hahn	23ea79de61	[LV] Add more tests for costs of predicated udivs and calls. Adds missing test coverage for the cost model. Also reduce the size of check lines a bit, by using a common prefix and filtering out after scalar.ph.	2025-08-19 20:04:31 +01:00
Florian Hahn	1217c8226b	[LoopIdiom] Add test for simplifying SCEV during expansion with flags.	2025-08-19 13:22:45 +01:00
Nuno Lopes	d0029b87d8	remove UB from test [NFC]	2025-08-19 11:18:27 +01:00
David Green	a7df02f83c	[InstCombine] Make strlen optimization more resilient to different gep types. (#153623 ) This makes the optimization in optimizeStringLength for strlen(gep @glob, %x) -> sub endof@glob, %x a little more resilient, and maybe a bit more correct for geps with non-array types.	2025-08-19 10:37:17 +01:00
Nikita Popov	4ab87ffd1e	[SCCP] Enable PredicateInfo for non-interprocedural SCCP (#153003 ) SCCP can use PredicateInfo to constrain ranges based on assume and branch conditions. Currently, this is only enabled during IPSCCP. This enables it for SCCP as well, which runs after functions have already been simplified, while IPSCCP runs pre-inline. To a large degree, CVP already handles range-based optimizations, but SCCP is more reliable for the cases it can handle. In particular, SCCP works reliably inside loops, which is something that CVP struggles with due to LVI cycles. I have made various optimizations to make PredicateInfo more efficient, but unfortunately this still has significant compile-time cost (around 0.1-0.2%).	2025-08-19 10:59:38 +02:00
David Sherwood	13d8ba7dea	[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086 ) There are a couple of places in the loop vectoriser where we want to calculate the cost of extracting the last lane in a vector. However, we wrongly assume that asking for the cost of extracting lane (VF.getKnownMinValue() - 1) is an accurate representation of the cost of extracting the last lane. For SVE at least, this is non-trivial as it requires the use of whilelo and lastb instructions. To solve this problem I have added a new getReverseVectorInstrCost interface where the index is used in reverse from the end of the vector. Suppose a vector has a given ElementCount EC, the extracted/inserted lane would be EC - 1 - Index. For scalable vectors this index is unknown at compile time. I've added a AArch64 hook that better represents the cost, and also a RISCV hook that maintains compatibility with the behaviour prior to this PR. I've also taken the liberty of adding support in vplan for calculating the cost of VPInstruction::ExtractLastElement.	2025-08-19 09:31:37 +01:00
Nikita Popov	5753ee2434	[LICM] Avoid assertion failure on stale MemoryDef It can happen that the call is originally created as a MemoryDef, and then later transforms show it is actually read-only and could be a MemoryUse -- however, this is not guaranteed to be reflected in MSSA.	2025-08-19 10:25:45 +02:00
Nikita Popov	e6d9542b77	[X86][Inline] Check correct function for target feature check (#152515 ) The check for ABI differences for inlined calls involves the caller, the callee and the nested callee. Before inlining, the ABI is determined by the target features of the callee. After inlining it is determined by the caller. The features of the nested callee should never actually matter.	2025-08-19 09:44:00 +02:00
Pierre van Houtryve	6f7c77fe90	[AMDGPU] Check noalias.addrspace in mayAccessScratchThroughFlat (#151319 ) PR #149247 made the MD accessible by the backend so we can now leverage it in the memory model. The first use case here is detecting if a flat op can access scratch memory. Benefits both the MemoryLegalizer and InsertWaitCnt.	2025-08-19 07:42:59 +02:00
Luke Lau	144736b07e	[VPlan] Don't fold live ins with both scalar and vector operands (#154067 ) If we end up with a extract_element VPInstruction where both operands are live-ins, we will try to fold the live-ins even though the first operand is a vector whilst the live-in is scalar. This fixes it by just returning the vector live-in instead of calling the folder, and removes the handling for insertelement where we aren't able to do the fold. From some quick testing we previously never hit this fold anyway, and were probably just missing test coverage. Fixes #154045	2025-08-19 04:10:53 +00:00
Kyle Wang	064f02dac0	[VectorCombine] Preserve scoped alias metadata (#153714 ) Right now if a load op is scalarized, the `!alias.scope` and `!noalias` metadata are dropped. This PR is to keep them if exist.	2025-08-18 18:16:32 +00:00
Justin Fargnoli	58de8f2c25	[Inliner] Add option (default off) to inline all calls regardless of the cost (#152365 ) Add a default off option to the inline cost calculation to always inline all viable calls regardless of the cost/benefit and cost/threshold calculations. For performance reasons, some users require that all calls be inlined. Rather than forcing them to adjust the inlining threshold to an arbitrarily high value, offer an option to inline all calls.	2025-08-18 17:48:49 +00:00
Tobias Stadler	8135b7c1ab	[LV] Emit all remarks for unvectorizable instructions (#153833 ) If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first. This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.	2025-08-18 18:04:53 +01:00
Ramkumar Ramachandra	97f554249c	[VPlan] Preserve nusw in createInBoundsPtrAdd (#151549 ) Rename createInBoundsPtrAdd to createNoWrapPtrAdd, and preserve nusw as well as inbounds at the callsite.	2025-08-18 17:48:42 +01:00
Antonio Frighetto	33761df961	Revert "[SimpleLoopUnswitch] Record loops from unswitching non-trivial conditions" This reverts commit e9de32fd159d30cfd6fcc861b57b7e99ec2742ab due to multiple performance regressions observed across downstream Numba benchmarks (https://github.com/llvm/llvm-project/issues/138509#issuecomment-3193855772). While avoiding non-trivial unswitches on newly-cloned loops helps mitigate the pathological case reported in https://github.com/llvm/llvm-project/issues/138509, it may as well make the IR less friendly to vectorization / loop- canonicalization (in the test reported, previously no select with loop-carried dependence existed in the new specialized loops), leading the abovementioned approach to be reconsidered.	2025-08-18 17:40:08 +02:00
Arne Stenkrona	ea2f5395b1	[SimplifyCFG] Avoid threading for loop headers (#151142 ) Updates SimplifyCFG to avoid jump threading through loop headers if -keep-loops is requested. Canonical loop form requires a loop header that dominates all blocks in the loop. If we thread through a header, we risk breaking its domination of the loop. This change avoids this issue by conservatively avoiding threading through headers entirely. Fixes: https://github.com/llvm/llvm-project/issues/151144	2025-08-18 09:46:55 +00:00
David Green	790bee99de	[VectorCombine] Remove dead node immediately in VectorCombine (#149047 ) The vector combiner will process all instructions as it first loops through the function, adding any newly added and deleted instructions to a worklist which is then processed when all nodes are done. These leaves extra uses in the graph as the initial processing is performed, leading to sub-optimal decisions being made for other combines. This changes it so that trivially dead instructions are removed immediately. The main changes that this requires is to make sure iterator invalidation does not occur.	2025-08-18 07:55:21 +01:00
Owen Anderson	69e4514978	[GlobalOpt] Do not fold away addrspacecasts which may be runtime operations (#153753 ) Specifically in the context of the once-stored transformation, GlobalOpt would strip all pointer casts unconditionally, even though addrspacecasts might be runtime operations. This manifested particularly on CHERI targets. This patch was inspired by an existing change in CHERI LLVM (`91afa60f17`), but has been reimplemented with updated conventions, and a testcase constructed from scratch.	2025-08-18 02:11:51 +00:00
Andreas Jonson	0561ff6a12	[LVI] Add support for trunc nuw range. (#154021 ) Proof: https://alive2.llvm.org/ce/z/a5Yjb8	2025-08-17 20:24:09 +02:00
Andreas Jonson	5ae8a9b8ce	[SimplifyCfg] Handle trunc nuw i1 condition in Equality comparison. (#153051 ) proof: https://alive2.llvm.org/ce/z/WVt4-F	2025-08-17 09:53:40 +02:00
Florian Hahn	73775a0f27	[LV] Add test for #153946 . Add test for miscompile from https://github.com/llvm/llvm-project/issues/153946, caused by poison propagation.	2025-08-16 21:19:20 +01:00
Florian Hahn	351d398a37	[VPlan] Run final VPlan simplifications before codegen. Dissolving the hierarchical VPlan CFG and converting abstract to concrete recipes can expose additional simplification opportunities. Do a final run of simplifyRecipes before executing the VPlan.	2025-08-16 18:54:27 +01:00
Mircea Trofin	c971c25544	[licm] don't drop `MD_prof` when dropping other metadata (#152420 ) Part of Issue #147390	2025-08-16 07:26:13 -07:00
Alexey Bataev	b157599156	[SLP]Do not include copyable data to the same user twice If the copyable schedule data is created and the user is used several times in the user node, no need to count same data for the same user several times, need to include it only ones. Fixes #153754	2025-08-15 12:36:45 -07:00
Alexey Bataev	09f5b9ab0a	Revert "[SLP]Do not include copyable data to the same user twice" This reverts commit 758c6852c3ffe6b5e259cafadd811e60d8c276fb to fix buildbot https://lab.llvm.org/buildbot/#/builders/195/builds/13298	2025-08-15 12:08:31 -07:00
zGoldthorpe	82caa251d4	[InstCombine] Fold integer unpack/repack patterns through ZExt (#153583 ) This patch explicitly enables the InstCombiner to fold integer unpack/repack patterns such as ```llvm define i64 @src_combine(i32 %lower, i32 %upper) { %base = zext i32 %lower to i64 %u.0 = and i32 %upper, u0xff %z.0 = zext i32 %u.0 to i64 %s.0 = shl i64 %z.0, 32 %o.0 = or i64 %base, %s.0 %r.1 = lshr i32 %upper, 8 %u.1 = and i32 %r.1, u0xff %z.1 = zext i32 %u.1 to i64 %s.1 = shl i64 %z.1, 40 %o.1 = or i64 %o.0, %s.1 %r.2 = lshr i32 %upper, 16 %u.2 = and i32 %r.2, u0xff %z.2 = zext i32 %u.2 to i64 %s.2 = shl i64 %z.2, 48 %o.2 = or i64 %o.1, %s.2 %r.3 = lshr i32 %upper, 24 %u.3 = and i32 %r.3, u0xff %z.3 = zext i32 %u.3 to i64 %s.3 = shl i64 %z.3, 56 %o.3 = or i64 %o.2, %s.3 ret i64 %o.3 } ; => define i64 @tgt_combine(i32 %lower, i32 %upper) { %base = zext i32 %lower to i64 %upper.zext = zext i32 %upper to i64 %s.0 = shl nuw i64 %upper.zext, 32 %o.3 = or disjoint i64 %s.0, %base ret i64 %o.3 } ``` Alive2 proofs: [YAy7ny](https://alive2.llvm.org/ce/z/YAy7ny)	2025-08-15 12:48:32 -06:00
Alexey Bataev	758c6852c3	[SLP]Do not include copyable data to the same user twice If the copyable schedule data is created and the user is used several times in the user node, no need to count same data for the same user several times, need to include it only ones. Fixes #153754	2025-08-15 11:47:35 -07:00

1 2 3 4 5 ...

32676 Commits