llvm-project

Author	SHA1	Message	Date
paperchalice	c53acf0443	[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904 ) Replaced by checking fast-math flags or value tracking results.	2026-02-09 09:48:07 +08:00
Nicolai Hähnle	af836ff60c	[CodeGen] Add getTgtMemIntrinsic overload for multiple memory operands (NFC) (#175843 ) There are target intrinsics that logically require two MMOs, such as llvm.amdgcn.global.load.lds, which is a copy from global memory to LDS, so there's both a load and a store to different addresses. Add an overload of getTgtMemIntrinsic that produces intrinsic info in a vector, and implement it in terms of the existing (now protected) overload. GlobalISel and SelectionDAG paths are updated to support multiple MMOs. The main part of this change is supporting multiple MMOs in MemIntrinsicNodes. Converting the backends to using the new overload is a fairly mechanical step that is done in a separate change in the hope that that allows reducing merging pains during review and for downstreams. A later change will then enable using multiple MMOs in AMDGPU.	2026-02-02 21:58:42 +00:00
Philip Ginsbach-Chen	5d5b4aaa0e	[SelectionDAG][NFC] Rename isConstantSequence to isArithmeticSequence (#179108 ) The previous name was misleading: the method checks for an arithmetic progression `(start, start+stride, start+2*stride, ...)`, not just any constant sequence. The new name uses precise mathematical terminology. https://github.com/llvm/llvm-project/pull/176671#discussion_r2735571479	2026-02-02 17:19:57 +00:00
zhijian lin	dc520ea4af	[PowerPC] using milicode call for strcmp instead of lib call (#177009 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strcmp routine is a millicode implementation; we use millicode for the strcmp function instead of a library call to improve performance.	2026-02-02 09:34:53 -05:00
Simon Pilgrim	a372152cb5	[DAG] visitVECTOR_SHUFFLE - ensure correct resno when folding shuffle(bop(shuffle(x,y),shuffle(z,w)) (#179124 ) TLI.isBinOp recognises some opcodes that have multiple results, including UADDO etc. In most cases we currently just bail if a binop has multiple results, but shuffle combining was missing the check and it's pretty trivial to add handling in this case. I've added add/sub-overflow opcodes to verifyNode to help catch these cases in the future - IIRC there was a plan to autogen these, but there isn't anything at the moment. Fixes #179112	2026-02-02 09:22:48 +00:00
Benjamin Maxwell	1818b23a99	[SDAG] Check for `nsz` in DAG.canIgnoreSignBitOfZero() (#178905 ) Follow up to #174423	2026-02-01 15:58:38 +00:00
Philip Ginsbach-Chen	e345976e04	[SelectionDAG] Handle undef at any position in isConstantSequence (#176671 ) This patch extends `BuildVectorSDNode::isConstantSequence` to recognize constant sequences that contain undef elements at any position. The new implementation finds the first two non-undef constant elements, computes the stride from their difference, then verifies all other defined elements match the sequence. This enables SVE's INDEX instruction to be used in more cases. This change particularly benefits ZIP1/ZIP2 patterns where one operand is a constant sequence. When a smaller constant vector like `<0, 1, 2, 3>` is used in a ZIP1 shuffle producing a wider result, it gets expanded with trailing undefs. Similarly, for ZIP2 patterns, the DAG combiner transforms the constant to have leading undefs since ZIP2 only uses the upper half of its operands. In particular, these patterns arise naturally from `VectorCombine`'s `compactShuffleOperands` optimization (see #176074) that I am suggesting as a fix for #137447.	2026-01-30 19:57:11 +00:00
Osama Abdelkader	aad7259ff6	[AArch64] Optimize memset to use NEON DUP instruction for more sizes (#166030 ) This change improves memset code generation for non-zero values on AArch64 by using NEON's DUP instruction instead of the less efficient multiplication with 0x01010101 pattern. For small sizes, the value is extracted from a larger DUP. For non-power-of-two sizes, overlapping stores are used in some cases. TargetLowering::findOptimalMemOpLowering is modified to allow explicitly specifying the size of the constant in cases where the constant is larger than the store operations. Fixes #165949	2026-01-29 13:03:38 -08:00
Craig Topper	53ec484ebf	[SelectionDAG] Add CTLS to FoldConstantArithmetic and optimize i1 CTLS to 0. (#178552 ) Since we don't have a CTLS intrinsic, it likely gets constant folded while it is still a CTLZ pattern so I'm using a unittest to test it.	2026-01-29 08:00:10 -08:00
serge-sans-paille	adbbe856d7	[perf] Replace copy-assign by move-assign in llvm/lib/CodeGen/* (#178172 )	2026-01-28 06:57:50 +00:00
Sander de Smalen	0e84f659b8	Support EXTRACT_SUBVECTOR in computeKnownBits for scalable vectors (#177163 ) Rather than not supporting this case it would just be more conservative as it will need to prove known bits for all elements. Follows on from #176883	2026-01-27 12:53:00 +00:00
Craig Topper	896a667473	[KnownBits][SelectionDAG] Add KnownBits::clmul. Support trailing bits. NFC (#177517 ) Borrow the known trailing bits logic from KnownBits::mul, but using APIntOps::clmul.	2026-01-23 11:11:38 -08:00
Craig Topper	53b0a64e98	[SelectionDAG] Add very basic computeKnownBits support for ISD::CLMUL. (#177445 ) This implements leading zero count support so we can remove some unnecessary ANDs.	2026-01-22 14:49:34 -08:00
Cheng Lingfei	711e8e5694	[AArch64] Optimize memcpy for non-power of two sizes (#168890 ) The previous getMemcpyLoadsAndStores implementation would chain load/store instructions from "NumLdStInMemcpy - GlueIter - GluedLdStLimit" to "NumLdStInMemcpy - GlueIter". This approach caused issues when copying non-power-of-two sizes, as it would chain leading load/stores with subsequent instructions at non-power-of-two aligned offsets. This chaining pattern prevented optimal optimizations in aarch64-ldst-opt pass for these load/store instructions. This commit modifies the chaining range to be from GlueIter to GlueIter + GluedLdStLimit, enabling proper optimization of load/store instructions in aarch64-ldst-opt. Closes https://github.com/llvm/llvm-project/issues/165947	2026-01-22 15:47:50 +00:00
Sander de Smalen	e807c6f89d	[AArch64] Fold sext-in-reg for predicate -> fixed-length conversions. (#176883 )	2026-01-21 13:15:28 +00:00
Matt Arsenault	aca2783840	DAG: Get libcall info from LibcallLowering in more places (#176836 ) Avoid using TargetLowering functions	2026-01-20 12:47:22 +01:00
Sander de Smalen	3eed0511c0	[SelectionDAG] NFC: Remove redundant assert in ComputeNumSignBits. This assert should not have existed, because just below it the code bails out for that same condition. The case of the vector being a scalable vector also shouldn't cause the compiler to crash with an assertion failure, and instead it should just avoid analysing the expression.	2026-01-20 09:09:15 +00:00
Jerry Dang	d2c5892c22	[SelectionDAG] Add TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U to canCreateUndefOrPoison and computeKnownBits (#152143 ) (#168809 ) 1. Implement `SelectionDAG::computeKnownBits` for TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U 2. Saturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations. Fixes #152143	2026-01-19 10:25:08 +00:00
fbrv	dd29183f33	[DAG] Allow MIN/MAX signedness flip when operands are known-negative (#174469 ) Extend the existing DAGCombine logic in visitIMINMAX so that signed and unsigned MIN/MAX can be flipped not only when both operands are known non-negative but also when both operands are known negative. This replaces the old SignBitIsZero checks with computeKnownBits and explicit tests for non-negative or negative operands while keeping all existing legality and saturation gating in place. Add regression tests to cover both the known-negative case and the known-non-negative case. Fixes #174325	2026-01-16 18:48:54 +00:00
Matt Arsenault	01e6245af4	DAG: Avoid querying libcall info from TargetLowering (#176268 ) Libcall lowering decisions should come from the LibcallLoweringInfo analysis. Query this through the DAG, so eventually the source can be the analysis. For the moment this is just a wrapper around the TargetLowering information.	2026-01-16 09:02:49 +00:00
zhijian lin	7b90f426a6	[PowerPC] using milicode call for strstr instead of lib call (#176002 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strstr routine is a millicode implementation; we use millicode for the strstr function instead of a library call to improve performance. I add a helper function `getRuntimeCallSDValueHelper` in the patch. I will refactor the function `SelectionDAG::getStrlen` `SelectionDAG::getStrcpy` etc later in another patch.	2026-01-15 14:58:17 -05:00
Manasij Mukherjee	2fa1ba62ac	[SelectionDAG] Fix zext assertion check for scalable vectors (#176064 ) Use element type comparisons in getZeroExtendInReg to avoid comparing scalable and fixed types. Fixes #176037	2026-01-14 22:00:26 -08:00
Gergo Stomfai	5f31b9c381	[DAG] computeKnownBits - add CTLS handling (#174824 ) Add handling for CTLS using the same method as in https://github.com/llvm/llvm-project/pull/174636. Added tests to AArch64 and RISCV, but it seems that ARM is actually resolving `llvm.arm.cls` to `clz`, so no tests were added there.	2026-01-14 15:04:40 +00:00
actink	ad3e3d809e	[SDAG] fix miss opt: shl nuw + zext adds unnecessary masking (#172046 ) close: #171750	2026-01-13 22:03:47 +08:00
zhijian lin	b983b0e92a	[PowerPC] using milicode call for strcpy instead of lib call (#174782 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strcpy routine is a millicode implementation; we use millicode for the strcpy function instead of a library call to improve performance. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2026-01-12 08:58:45 -05:00
Yingwei Zheng	b8892b9a9b	[SDAG] Add freeze when simplifying select with undef arms (#175199 ) Consider the following pattern: `%trunc = trunc nuw i64 %x to i48` followed by `%sel = select i1 %cmp, i48 %trunc, i48 undef`. We cannot simplify `%sel` to `%trunc` as `%trunc` may be poison, which cannot be refined into undef. This patch checks whether the replacement may be poison. If so, it will insert a freeze. We may need SDAG's version of `impliesPoison` if it causes significant regressions. Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=ded109c0cff41714ebf9bd60b073aaab07fa4ca8&to=103e605ce6b33bc9145526faf805ee38b972c215&stat=instructions%3Au Closes https://github.com/llvm/llvm-project/issues/175018.	2026-01-10 13:49:53 +08:00
Craig Topper	6c5535bd71	[SelectionDAG] Unify ISD::LOAD handling in ComputeNumSignBits. NFC (#175060 ) Range metadata was handled in a ISD::LOAD case in the main opcode switch. Extending loads and constant pools were handled with special code after the main switch. Move this code into the ISD::LOAD case of the main switch. There is one slight change here, I put the Op.getResNo() == 0 check before the range handling. This should be more correct.	2026-01-08 14:12:47 -08:00
Craig Topper	d81a0e7a18	[SelectionDAG] Add ISD::CTLS to canCreateUndefOrPoison. (#174709 )	2026-01-07 10:38:43 -08:00
Jay Foad	e5623b1a9e	Revert "SelectionDAG: Do not propagate divergence through glue (#174766 )" This reverts commit 47a0d0e42832558f999b149b22cfd48c46ef2a57. Reverted due to test failures in LLVM_ENABLE_EXPENSIVE_CHECKS builds.	2026-01-07 14:23:15 +00:00
Jay Foad	47a0d0e428	SelectionDAG: Do not propagate divergence through glue (#174766 ) Glue does not carry any value (in the LLVM IR Value sense) that could be considered uniform or divergent.	2026-01-07 14:04:36 +00:00
Luke Lau	ad4bfac732	[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796 ) This PR implements the first change outlined in https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel In order to allow non-immediate offsets in the llvm.vector.splice intrinsic, we need to separate out the "shift left" and "shift right" modes into two separate intrinsics, which were previously determined by whether the offset is positive or negative. The description in the LangRef has also been reworded in terms of sliding elements left or right and extracting either the upper or lower half as opposed to extracting from a certain index, which brings it in line with the definition of `llvm.fshr.`/`llvm.fshl.`. This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into their new equivalent one based on their offset, so existing uses of vector.splice should still work. Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced in this PR to keep the diff small and kick the tyres on the AutoUpgrader a bit. I planned to do this in a follow up NFC but can include it in this PR if reviewers prefer. Similarly the shuffle costing kind `SK_Splice` has just been kept the same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight` later.	2026-01-06 15:41:26 +08:00
Ramkumar Ramachandra	9e5e267a03	[ISel] Introduce llvm.clmul intrinsic (#168731 ) In line with a std proposal to introduce the llvm.clmul family of intrinsics corresponding to carry-less multiply operations. This work builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives), and follow-up patches will introduce custom-lowering on supported targets, replacing target-specific clmul intrinsics. Testing is done on the RISC-V target, which should be sufficient to prove that the intrinsics work, since no RISC-V specific lowering has been added. Ref: https://isocpp.org/files/papers/P3642R3.html Co-authored-by: Craig Topper <craig.topper@sifive.com>	2026-01-05 20:24:06 +00:00
Sergei Barannikov	501aa3740f	[SelectionDAG] Fix return type of JUMP_TABLE_DEBUG_INFO node (#174228 ) The node has a chain result, not a glue. Extracted from #168421.	2026-01-02 18:50:51 +00:00
Shilei Tian	2f6a630aae	[SelectionDAG] Skip chain node when updating divergence (#173885 ) Fixes #173785.	2025-12-29 14:54:40 -05:00
Craig Topper	877df9e4b9	[SelectionDAG] Make SSHLSAT/USHLSAT obey getShiftAmountTy(). (#173216 ) Treat these like other shift operations by allowing the shift amount to be a different type than the result. The PromoteIntOp_Shift and LegalizeDAG code are not tested due to lack of target support. I'm looking at adding SSHLSAT for the RISC-V P extension. I don't need this support for that since RISC-V only has one legal type. I just thought it was odd that they weren't like other shifts.	2025-12-22 10:28:04 -08:00
Matt Arsenault	68aea8e202	AMDGPU: Avoid introducing unnecessary fabs in fast fdiv lowering (#172553 ) If the sign bit of the denominator is known 0, do not emit the fabs. Also, extend this to handle min/max with fabs inputs. I originally tried to do this as the general combine on fabs, but it proved to be too much trouble at this time. This is mostly complexity introduced by expanding the various min/maxes into canonicalizes, and then not being able to assume the sign bit of canonicalize (fabs x) without nnan. This defends against future code size regressions in the atan2 and atan2pi library functions.	2025-12-17 00:22:12 +01:00
Matt Arsenault	eb1876c960	DAG: Fix arith_fence handling in SignBitIsZeroFP (#172537 )	2025-12-16 20:10:38 +00:00
Matt Arsenault	b2d9356719	DAG: Make more use of the LibcallImpl overload of getExternalSymbol (#172171 ) Also add a new copy for TargetExternalSymbol that AArch64 needs.	2025-12-13 19:16:47 +00:00
Guy David	29611f4cbe	[DAGCombiner] Relax nsz constraint for FP optimizations (#165011 ) Some floating-point optimizations don't trigger because they can produce incorrect results around signed zeros, and rely on the existence of the nsz flag which commonly appears when fast-math is enabled. However, this flag is not a hard requirement when all of the users of the combined value are either guaranteed to overwrite the sign-bit or simply ignore it (comparisons, etc.). The optimizations affected: `fadd x, +0.0 -> x`; `fsub x, -0.0 -> x`; `fsub +0.0, x -> fneg x`; `fdiv(x, sqrt(x)) -> sqrt(x)`; frem lowering with power-of-2 divisors.	2025-12-09 12:07:46 +02:00
Matt Arsenault	27bf5fdcc6	DAG: Add overload of getExternalSymbol using RTLIB::LibcallImpl (#170587 )	2025-12-05 22:39:57 +00:00
David Green	4c6b8825e8	[DAG] Fold mul 0 -> 0 when expanding mul into parts. (#168780 ) If the upper bits are zero, but we expand the multiply and then immediately convert the multiply into a libcall, there is no opportunity to optimize away the mul. Do so in getNode to make sure extending multiplies optimise cleanly.	2025-12-05 07:58:28 +00:00
Matt Arsenault	8d6c5cddf2	DAG: Use LibcallImpl in various getLibFunc helpers (#170400 ) Avoid using getLibcallName in favor of querying the libcall impl, and getting the ABI details from that.	2025-12-03 13:00:45 -05:00
Lewis Crawford	ea3fdc5972	Avoid maxnum(sNaN, x) optimizations / folds (#170181 ) The behaviour of constant-folding `maxnum(sNaN, x)` and `minnum(sNaN, x)` has become controversial, and there are ongoing discussions about which behaviour we want to specify in the LLVM IR LangRef. See: - https://github.com/llvm/llvm-project/issues/170082 - https://github.com/llvm/llvm-project/pull/168838 - https://github.com/llvm/llvm-project/pull/138451 - https://github.com/llvm/llvm-project/pull/170067 - https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006 This patch removes optimizations and constant-folding support for `maxnum(sNaN, x)` but keeps it folded/optimized for `qNaN`. This should allow for some more flexibility so the implementation can conform to either the old or new version of the semantics specified without any changes. As far as I am aware, optimizations involving constant `sNaN` should generally be edge-cases that rarely occur, so there should hopefully be very little real-world performance impact from disabling these optimizations.	2025-12-02 12:43:03 +00:00
Paul Walker	8478de3d00	[LLVM][CodeGen] Remove failure cases when widening EXTRACT/INSERT_SUBVECTOR. (#162308 ) This PR implements catch all handling for widening the scalable subvector operand (INSERT_SUBVECTOR) or result (EXTRACT_SUBVECTOR). It does this via the stack using masked memory operations. With general handling available we can add optimizations for specific cases.	2025-12-01 12:32:58 +00:00
Luke Lau	d1500d12be	[SelectionDAG] Add SelectionDAG::getTypeSize. NFC (#169764 ) Similar to how getElementCount avoids the need to reason about fixed and scalable ElementCounts separately, this patch adds getTypeSize to do the same for TypeSize. It also goes through and replaces some of the manual uses of getVScale with getTypeSize/getElementCount where possible.	2025-12-01 10:33:50 +00:00
Peter Collingbourne	6227eb90da	Add IR and codegen support for deactivation symbols. Deactivation symbols are a mechanism for allowing object files to disable specific instructions in other object files at link time. The initial use case is for pointer field protection. For more information, see the RFC: https://discourse.llvm.org/t/rfc-deactivation-symbols/85556 Reviewers: ojhunt, nikic, fmayer, arsenm, ahmedbougacha Reviewed By: fmayer Pull Request: https://github.com/llvm/llvm-project/pull/133536	2025-11-26 12:37:09 -08:00
陈子昂	e38529ddbb	[DAG] Update canCreateUndefOrPoison to handle ISD::VECTOR_COMPRESS (#168010 ) Fixes #167710	2025-11-19 10:21:05 +00:00
Craig Topper	96e58b83a3	[RISCV] Legalize misaligned unmasked vp.load/vp.store to vle8/vse8. (#167745 ) If vector-unaligned-mem support is not enabled, we should not generate loads/stores that are not aligned to their element size. We already do this for non-VP vector loads/stores. This code has been in our downstream for about a year and a half after finding the vectorizer generating misaligned loads/stores. I don't think that is unique to our downstream. Doing this for masked vp.load/store requires widening the mask as well which is harder to do. NOTE: Because we have to scale the VL, this will introduce additional vsetvli and the VL optimizer will not be effective at optimizing any arithmetic that is consumed by the store.	2025-11-18 11:13:54 -08:00
Sander de Smalen	f369a53d82	[DAGCombiner] Fold select into partial.reduce.add operands. (#167857 ) This generates more optimal codegen when using partial reductions with predication. ``` partial_reduce_mla(acc, sel(p, mul(ext(a), ext(b)), splat(0)), splat(1)) -> partial_reduce_mla(acc, sel(p, a, splat(0)), b) partial.reduce.mla(acc, sel(p, ext(op), splat(0)), splat(1)) -> partial.reduce.*mla(acc, sel(p, op, splat(0)), splat(trunc(1))) ```	2025-11-18 09:49:42 +00:00
Matt Arsenault	0385a182da	DAG: exp opcodes cannotBeOrderedNegativeFP (#167604 )	2025-11-12 19:50:46 +00:00
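
The undef-tolerant sequence check described in the `isConstantSequence` entry (#176671) can be sketched as follows. This is a minimal illustration of the steps named there (find the first two defined elements, derive the stride, verify the rest), not the LLVM implementation: `Element` and `findArithmeticSequence` are hypothetical names, plain `int64_t` stands in for `APInt`, wraparound is ignored, and requiring at least two defined elements is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a build-vector operand: std::nullopt models undef.
using Element = std::optional<int64_t>;

// Returns {start, stride} if every defined element equals start + index * stride.
// Assumptions of this sketch: plain int64_t arithmetic without wraparound
// handling, and at least two defined elements are required.
std::optional<std::pair<int64_t, int64_t>>
findArithmeticSequence(const std::vector<Element> &Elts) {
  // Locate the first two defined (non-undef) elements.
  std::optional<size_t> First, Second;
  for (size_t I = 0; I < Elts.size(); ++I) {
    if (!Elts[I])
      continue;
    if (!First) {
      First = I;
    } else {
      Second = I;
      break;
    }
  }
  if (!First || !Second)
    return std::nullopt;

  // The value difference must divide evenly by the index distance,
  // otherwise no integer stride can connect the two elements.
  int64_t Diff = *Elts[*Second] - *Elts[*First];
  int64_t Dist = static_cast<int64_t>(*Second - *First);
  if (Diff % Dist != 0)
    return std::nullopt;
  int64_t Stride = Diff / Dist;

  // Extrapolate back to index 0 so leading undefs are covered too.
  int64_t Start = *Elts[*First] - static_cast<int64_t>(*First) * Stride;

  // Every defined element must sit on the sequence; undefs match anything.
  for (size_t I = 0; I < Elts.size(); ++I)
    if (Elts[I] && *Elts[I] != Start + static_cast<int64_t>(I) * Stride)
      return std::nullopt;

  return std::make_pair(Start, Stride);
}
```

Under these assumptions, both `<0, 1, 2, 3, undef, undef>` (the trailing-undef shape mentioned for ZIP1) and `<undef, undef, 2, 3>` (the leading-undef shape mentioned for ZIP2) yield start 0 and stride 1, while `<0, undef, 3>` is rejected because no integer stride fits.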

2829 Commits