2829 Commits

Author SHA1 Message Date
paperchalice
c53acf0443
[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904)
Replaced by checking fast-math flags or value tracking results.
2026-02-09 09:48:07 +08:00
Nicolai Hähnle
af836ff60c
[CodeGen] Add getTgtMemIntrinsic overload for multiple memory operands (NFC) (#175843)
There are target intrinsics that logically require two MMOs, such as
llvm.amdgcn.global.load.lds, which is a copy from global memory to LDS,
so there's both a load and a store to different addresses.

Add an overload of getTgtMemIntrinsic that produces intrinsic info in a
vector, and implement it in terms of the existing (now protected)
overload.

GlobalISel and SelectionDAG paths are updated to support multiple MMOs.
The main part of this change is supporting multiple MMOs in
MemIntrinsicNodes.

Converting the backends to using the new overload is a fairly mechanical step
that is done in a separate change in the hope that that allows reducing merging
pains during review and for downstreams. A later change will then enable
using multiple MMOs in AMDGPU.
2026-02-02 21:58:42 +00:00
Philip Ginsbach-Chen
5d5b4aaa0e
[SelectionDAG][NFC] Rename isConstantSequence to isArithmeticSequence (#179108)
The previous name was misleading: the method checks for an arithmetic
progression `(start, start+stride, start+2*stride, ...)`, not just any
constant sequence. The new name uses precise mathematical terminology.

https://github.com/llvm/llvm-project/pull/176671#discussion_r2735571479
2026-02-02 17:19:57 +00:00
zhijian lin
dc520ea4af
[PowerPC] using milicode call for strcmp instead of lib call (#177009)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strcmp routine is a millicode implementation;
we use millicode for the strcmp function instead of a library call to
improve performance.
2026-02-02 09:34:53 -05:00
Simon Pilgrim
a372152cb5
[DAG] visitVECTOR_SHUFFLE - ensure correct resno when folding shuffle(bop(shuffle(x,y),shuffle(z,w)) (#179124)
TLI.isBinOp recognises some opcodes that have multiple results,
including UADDO etc.

In most cases we currently just bail if a binop has multiple results,
but shuffle combining was missing the check and it's pretty trivial to
add handling in this case.

I've added add/sub-overflow opcodes to verifyNode to help catch these
cases in the future - IIRC there was a plan to autogen these, but there
isn't anything at the moment.

Fixes #179112
2026-02-02 09:22:48 +00:00
Benjamin Maxwell
1818b23a99
[SDAG] Check for nsz in DAG.canIgnoreSignBitOfZero() (#178905)
Follow up to #174423
2026-02-01 15:58:38 +00:00
Philip Ginsbach-Chen
e345976e04
[SelectionDAG] Handle undef at any position in isConstantSequence (#176671)
This patch extends `BuildVectorSDNode::isConstantSequence` to recognize
constant sequences that contain undef elements at any position.

The new implementation finds the first two non-undef constant elements,
computes the stride from their difference, then verifies all other
defined elements match the sequence. This enables SVE's INDEX
instruction to be used in more cases.
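
The search described above can be sketched in Python as follows. This is a minimal illustration, not the LLVM implementation: `arithmetic_sequence` is a hypothetical helper, `None` stands in for an undef lane, and plain Python integers are used, so fixed-width wraparound is ignored. It also assumes that the stride must be a whole number when the first two defined lanes are separated by undef lanes.

```python
from typing import Optional, Sequence, Tuple


def arithmetic_sequence(
    elems: Sequence[Optional[int]],
) -> Optional[Tuple[int, int]]:
    """Return (start, stride) if every defined lane i equals start + i * stride.

    Hypothetical helper for illustration; None models an undef lane.
    """
    defined = [(i, v) for i, v in enumerate(elems) if v is not None]
    if len(defined) < 2:
        return None  # Too few defined lanes to determine a stride.

    (i0, v0), (i1, v1) = defined[0], defined[1]
    diff, gap = v1 - v0, i1 - i0
    # Assumption: undef lanes between the two anchors are allowed only if
    # the value difference divides evenly by the index distance.
    if diff % gap != 0:
        return None
    stride = diff // gap
    start = v0 - i0 * stride

    # Verify every other defined lane against the inferred sequence.
    for i, v in defined:
        if v != start + i * stride:
            return None
    return start, stride
```

With this sketch, `[0, 1, 2, 3, None, None, None, None]` (trailing undefs) and `[None, None, None, None, 4, 5, 6, 7]` (leading undefs) both yield `(0, 1)`.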

This change particularly benefits ZIP1/ZIP2 patterns where one operand
is a constant sequence. When a smaller constant vector like `<0, 1, 2,
3>` is used in a ZIP1 shuffle producing a wider result, it gets expanded
with trailing undefs. Similarly, for ZIP2 patterns, the DAG combiner
transforms the constant to have leading undefs since ZIP2 only uses the
upper half of its operands.

In particular, these patterns arise naturally from `VectorCombine`'s
`compactShuffleOperands` optimization (see #176074) that I am suggesting
as a fix for #137447.
2026-01-30 19:57:11 +00:00
Osama Abdelkader
aad7259ff6
[AArch64] Optimize memset to use NEON DUP instruction for more sizes (#166030)
This change improves memset code generation for non-zero values on
AArch64 by using NEON's DUP instruction instead of
the less efficient multiplication with 0x01010101 pattern.

For small sizes, the value is extracted from a larger DUP. For
non-power-of-two sizes, overlapping stores are used in some cases.
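
For readers unfamiliar with the two ideas mentioned here, the following Python sketch shows the arithmetic behind the multiplication-based byte splat that DUP replaces, and one way overlapping stores can cover a non-power-of-two size. Both function names are hypothetical, and the store plan is only an illustration of the overlapping idea, not the sequence the backend actually emits.

```python
from typing import List


def splat_byte_mul(value: int, width_bytes: int = 4) -> int:
    """Replicate the low byte of `value` across `width_bytes` bytes.

    Multiplying by 0x01010101 (for 4 bytes) copies the byte into each lane.
    """
    ones = int.from_bytes(b"\x01" * width_bytes, "little")
    return (value & 0xFF) * ones


def overlapping_stores(size: int, store_bytes: int) -> List[int]:
    """Offsets of fixed-width stores covering `size` bytes.

    Illustrative plan: the final store is moved back so that it overlaps
    the previous one instead of running past the end of the buffer.
    """
    if size < store_bytes:
        raise ValueError("size must be at least one store wide")
    offsets = list(range(0, size - store_bytes + 1, store_bytes))
    if offsets[-1] + store_bytes < size:
        offsets.append(size - store_bytes)
    return offsets
```

For example, `splat_byte_mul(0xAB)` gives `0xABABABAB`, and a 7-byte memset with 4-byte stores can be covered by stores at offsets 0 and 3.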

TargetLowering::findOptimalMemOpLowering is modified to allow explicitly
specifying the size of the constant in cases where the constant is
larger than the store operations.

Fixes #165949
2026-01-29 13:03:38 -08:00
Craig Topper
53ec484ebf
[SelectionDAG] Add CTLS to FoldConstantArithmetic and optimize i1 CTLS to 0. (#178552)
Since we don't have a CTLS intrinsic, it likely gets constant folded
while it is still a CTLZ pattern so I'm using a unittest to test it.
2026-01-29 08:00:10 -08:00
serge-sans-paille
adbbe856d7
[perf] Replace copy-assign by move-assign in llvm/lib/CodeGen/* (#178172) 2026-01-28 06:57:50 +00:00
Sander de Smalen
0e84f659b8
Support EXTRACT_SUBVECTOR in computeKnownBits for scalable vectors (#177163)
Rather than leaving this case unsupported, handle it conservatively:
known bits must be proven for all elements.

Follows on from #176883
2026-01-27 12:53:00 +00:00
Craig Topper
896a667473
[KnownBits][SelectionDAG] Add KnownBits::clmul. Support trailing bits. NFC (#177517)
Borrow the known trailing bits logic from KnownBits::mul, but using
APIntOps::clmul.
2026-01-23 11:11:38 -08:00
Craig Topper
53b0a64e98
[SelectionDAG] Add very basic computeKnownBits support for ISD::CLMUL. (#177445)
This implements leading zero count support so we can remove some
unnecessary ANDs.
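
The reason a leading-zero bound exists can be seen from the mathematics of carry-less multiplication: the product of two nonzero operands has exactly `len(a) + len(b) - 1` significant bits. The Python sketch below states that bound; `clmul` and `min_leading_zeros` are hypothetical helpers, and the bound shown is the mathematical one, which is not necessarily the exact rule the patch implements.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less multiply: XOR shifted copies of `a` for each set bit of `b`."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift
        shift += 1
    return result


def min_leading_zeros(a: int, b: int, width: int) -> int:
    """Lower bound on leading zeros of clmul(a, b) truncated to `width` bits."""
    if a == 0 or b == 0:
        return width  # The product is zero.
    product_bits = a.bit_length() + b.bit_length() - 1
    return max(0, width - product_bits)
```

For instance, if both 8-bit operands are known to fit in 4 bits, the product fits in 7 bits, so a following AND with `0x7F` or `0xFF` changes nothing.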
2026-01-22 14:49:34 -08:00
Cheng Lingfei
711e8e5694
[AArch64] Optimize memcpy for non-power of two sizes (#168890)
The previous getMemcpyLoadsAndStores implementation would chain
load/store instructions from "NumLdStInMemcpy - GlueIter -
GluedLdStLimit" to "NumLdStInMemcpy - GlueIter". This approach caused
issues when copying non-power-of-two sizes, as it would chain leading
load/stores with subsequent instructions at non-power-of-two aligned
offsets.

This chaining pattern prevented optimal optimizations in
aarch64-ldst-opt pass for these load/store instructions.

This commit modifies the chaining range to be from GlueIter to GlueIter
+ GluedLdStLimit, enabling proper optimization of load/store
instructions in aarch64-ldst-opt.


Closes https://github.com/llvm/llvm-project/issues/165947
2026-01-22 15:47:50 +00:00
Sander de Smalen
e807c6f89d
[AArch64] Fold sext-in-reg for predicate -> fixed-length conversions. (#176883) 2026-01-21 13:15:28 +00:00
Matt Arsenault
aca2783840
DAG: Get libcall info from LibcallLowering in more places (#176836)
Avoid using TargetLowering functions
2026-01-20 12:47:22 +01:00
Sander de Smalen
3eed0511c0
[SelectionDAG] NFC: Remove redundant assert in ComputeNumSignBits.
This assert should not have existed, because just below it the code
bails out for that same condition. The case of the vector being a
scalable vector also shouldn't cause the compiler to crash with an
assertion failure, and instead it should just avoid analysing the
expression.
2026-01-20 09:09:15 +00:00
Jerry Dang
d2c5892c22
[SelectionDAG] Add TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U to canCreateUndefOrPoison and computeKnownBits (#152143) (#168809)
1. Implement `SelectionDAG::computeKnownBits` for TRUNCATE_SSAT_S/U and
TRUNCATE_USAT_U
2. Saturating truncation operations are well-defined for all inputs and
cannot create poison or undef values. This allows the optimizer to
eliminate unnecessary freeze instructions after these operations.
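
A scalar model makes it easier to see why these operations are defined for every input. The Python sketch below assumes the usual reading of the opcode names (signed input to signed result, signed input to unsigned result, unsigned input to unsigned result); the function names are hypothetical stand-ins and the code is not taken from LLVM.

```python
def truncate_ssat_s(x: int, bits: int) -> int:
    """Signed input clamped to the signed range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))


def truncate_ssat_u(x: int, bits: int) -> int:
    """Signed input clamped to the unsigned range of `bits` bits."""
    return max(0, min((1 << bits) - 1, x))


def truncate_usat_u(x: int, bits: int) -> int:
    """Unsigned input clamped to the unsigned range of `bits` bits."""
    if x < 0:
        raise ValueError("unsigned input expected")
    return min((1 << bits) - 1, x)
```

Every input maps to some in-range result, so in this model there is no input for which the operation itself would have to produce poison or undef.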

Fixes #152143
2026-01-19 10:25:08 +00:00
fbrv
dd29183f33
[DAG] Allow MIN/MAX signedness flip when operands are known-negative (#174469)
Extend the existing DAGCombine logic in visitIMINMAX so that signed and
unsigned MIN/MAX can be flipped not only when both operands are known
non-negative but also when both operands are known negative. This
replaces the old SignBitIsZero checks with computeKnownBits and explicit
tests for non-negative or negative operands while keeping all existing
legality and saturation gating in place. Add regression tests to cover
both the known-negative case and the known-non-negative case.
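
The underlying fact is that two's-complement signed and unsigned orderings agree whenever both operands have the same sign bit. The Python sketch below checks this for 8-bit values; the helper names are hypothetical and the code only demonstrates the arithmetic, not the DAGCombine logic.

```python
BITS = 8
MASK = (1 << BITS) - 1


def to_unsigned(x: int) -> int:
    """Reinterpret a signed BITS-wide value as unsigned (two's complement)."""
    return x & MASK


def same_sign(a: int, b: int) -> bool:
    """True if both operands are non-negative or both are negative."""
    return (a < 0) == (b < 0)


def smin_matches_umin(a: int, b: int) -> bool:
    """Does signed min give the same bit pattern as unsigned min?"""
    return to_unsigned(min(a, b)) == min(to_unsigned(a), to_unsigned(b))


def smax_matches_umax(a: int, b: int) -> bool:
    """Does signed max give the same bit pattern as unsigned max?"""
    return to_unsigned(max(a, b)) == max(to_unsigned(a), to_unsigned(b))
```

With mixed signs the flip is not safe: for `a = -1` and `b = 1`, signed min picks `-1` (bit pattern `0xFF`), while unsigned min picks `1`.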

Fixes #174325
2026-01-16 18:48:54 +00:00
Matt Arsenault
01e6245af4
DAG: Avoid querying libcall info from TargetLowering (#176268)
Libcall lowering decisions should come from the LibcallLoweringInfo
analysis. Query this through the DAG, so eventually the source
can be the analysis. For the moment this is just a wrapper around
the TargetLowering information.
2026-01-16 09:02:49 +00:00
zhijian lin
7b90f426a6
[PowerPC] using milicode call for strstr instead of lib call (#176002)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strstr routine is a millicode implementation;
we use millicode for the strstr function instead of a library call to
improve performance.

I add a helper function `getRuntimeCallSDValueHelper` in the patch. I
will refactor the function `SelectionDAG::getStrlen`
`SelectionDAG::getStrcpy` etc later in another patch.
2026-01-15 14:58:17 -05:00
Manasij Mukherjee
2fa1ba62ac
[SelectionDAG] Fix zext assertion check for scalable vectors (#176064)
Use element type comparisons in getZeroExtendInReg to avoid comparing
scalable and fixed types.

Fixes #176037
2026-01-14 22:00:26 -08:00
Gergo Stomfai
5f31b9c381
[DAG] computeKnownBits - add CTLS handling (#174824)
Add handling for CTLS using the same method as in
https://github.com/llvm/llvm-project/pull/174636.

Added tests to AArch64 and RISCV, but it seems that ARM is actually
resolving `llvm.arm.cls` to `clz`, so no tests were added there.
2026-01-14 15:04:40 +00:00
actink
ad3e3d809e
[SDAG] fix miss opt: shl nuw + zext adds unnecessary masking (#172046)
close: #171750
2026-01-13 22:03:47 +08:00
zhijian lin
b983b0e92a
[PowerPC] using milicode call for strcpy instead of lib call (#174782)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strcpy routine is a millicode implementation;
we use millicode for the strcpy function instead of a library call to
improve performance.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2026-01-12 08:58:45 -05:00
Yingwei Zheng
b8892b9a9b
[SDAG] Add freeze when simplifying select with undef arms (#175199)
Consider the following pattern:
```
%trunc = trunc nuw i64 %x to i48
%sel = select i1 %cmp, i48 %trunc, i48 undef
```
We cannot simplify `%sel` to `%trunc` as `%trunc` may be poison, which
cannot be refined into undef.

This patch checks whether the replacement may be poison. If so, it will
insert a freeze.
We may need SDAG's version of `impliesPoison` if it causes significant
regressions.
Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=ded109c0cff41714ebf9bd60b073aaab07fa4ca8&to=103e605ce6b33bc9145526faf805ee38b972c215&stat=instructions%3Au

Closes https://github.com/llvm/llvm-project/issues/175018.
2026-01-10 13:49:53 +08:00
Craig Topper
6c5535bd71
[SelectionDAG] Unify ISD::LOAD handling in ComputeNumSignBits. NFC (#175060)
Range metadata was handled in a ISD::LOAD case in the main opcode
switch. Extending loads and constant pools were handled with special
code after the main switch. Move this code into the ISD::LOAD case of
the main switch.

There is one slight change here, I put the Op.getResNo() == 0 check
before the range handling. This should be more correct.
2026-01-08 14:12:47 -08:00
Craig Topper
d81a0e7a18
[SelectionDAG] Add ISD::CTLS to canCreateUndefOrPoison. (#174709) 2026-01-07 10:38:43 -08:00
Jay Foad
e5623b1a9e
Revert "SelectionDAG: Do not propagate divergence through glue (#174766)"
This reverts commit 47a0d0e42832558f999b149b22cfd48c46ef2a57.

Reverted due to test failures in LLVM_ENABLE_EXPENSIVE_CHECKS builds.
2026-01-07 14:23:15 +00:00
Jay Foad
47a0d0e428
SelectionDAG: Do not propagate divergence through glue (#174766)
Glue does not carry any value (in the LLVM IR Value sense) that could be
considered uniform or divergent.
2026-01-07 14:04:36 +00:00
Luke Lau
ad4bfac732
[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796)
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel

In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.

The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
in line with the definition of `llvm.fshr.*`/`llvm.fshl.*`.

This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.

Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.

Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
2026-01-06 15:41:26 +08:00
Ramkumar Ramachandra
9e5e267a03
[ISel] Introduce llvm.clmul intrinsic (#168731)
In line with a C++ standard proposal, introduce the llvm.clmul family of
intrinsics corresponding to carry-less multiply operations. This work
builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives),
and follow-up patches will introduce custom-lowering on supported
targets, replacing target-specific clmul intrinsics.
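
As a reminder of what a carry-less multiply computes, here is a minimal Python sketch. It works on unbounded Python integers rather than fixed-width values, and `clmul` is a hypothetical helper name, not the intrinsic or the APInt primitive.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less multiply of two non-negative integers.

    Like schoolbook multiplication, but partial products are combined
    with XOR instead of addition, so no carries propagate.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift
        shift += 1
    return result
```

For example, `clmul(0b11, 0b11)` is `0b101`, whereas ordinary multiplication gives `0b1001`.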

Testing is done on the RISC-V target, which should be sufficient to
prove that the intrinsics work, since no RISC-V specific lowering has
been added.

Ref: https://isocpp.org/files/papers/P3642R3.html

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2026-01-05 20:24:06 +00:00
Sergei Barannikov
501aa3740f
[SelectionDAG] Fix return type of JUMP_TABLE_DEBUG_INFO node (#174228)
The node has a chain result, not a glue.

Extracted from #168421.
2026-01-02 18:50:51 +00:00
Shilei Tian
2f6a630aae
[SelectionDAG] Skip chain node when updating divergence (#173885)
Fixes #173785.
2025-12-29 14:54:40 -05:00
Craig Topper
877df9e4b9
[SelectionDAG] Make SSHLSAT/USHLSAT obey getShiftAmountTy(). (#173216)
Treat these like other shift operations by allowing the shift amount to
be a different type than the result.

The PromoteIntOp_Shift and LegalizeDAG code are not tested due to lack
of target support.

I'm looking at adding SSHLSAT for the RISC-V P extension. I don't need
this support for that since RISC-V only has one legal type. I just thought it
was odd that they weren't like other shifts.
2025-12-22 10:28:04 -08:00
Matt Arsenault
68aea8e202
AMDGPU: Avoid introducing unnecessary fabs in fast fdiv lowering (#172553)
If the sign bit of the denominator is known 0, do not emit the fabs.
Also, extend this to handle min/max with fabs inputs.

I originally tried to do this as the general combine on fabs, but
it proved to be too much trouble at this time. This is mostly
complexity introduced by expanding the various min/maxes into
canonicalizes, and then not being able to assume the sign bit
of canonicalize (fabs x) without nnan.

This defends against future code size regressions in the atan2 and
atan2pi library functions.
2025-12-17 00:22:12 +01:00
Matt Arsenault
eb1876c960
DAG: Fix arith_fence handling in SignBitIsZeroFP (#172537) 2025-12-16 20:10:38 +00:00
Matt Arsenault
b2d9356719
DAG: Make more use of the LibcallImpl overload of getExternalSymbol (#172171)
Also add a new copy for TargetExternalSymbol that AArch64 needs.
2025-12-13 19:16:47 +00:00
Guy David
29611f4cbe
[DAGCombiner] Relax nsz constraint for FP optimizations (#165011)
Some floating-point optimizations don't trigger because they can produce
incorrect results around signed zeros, and rely on the existence of the
nsz flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of
the combined value are either guaranteed to overwrite the sign-bit or
simply ignore it (comparisons, etc.).

The optimizations affected:
- fadd x, +0.0 -> x
- fsub x, -0.0 -> x
- fsub +0.0, x -> fneg x
- fdiv(x, sqrt(x)) -> sqrt(x)
- frem lowering with power-of-2 divisors
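
The signed-zero hazard behind the first and third folds can be reproduced with ordinary IEEE 754 doubles. The Python sketch below is only a demonstration of the arithmetic; the helper names are hypothetical and it assumes the default round-to-nearest mode.

```python
import math


def sign_bit(x: float) -> bool:
    """True if the sign bit of x is set (distinguishes -0.0 from +0.0)."""
    return math.copysign(1.0, x) < 0


def fadd_pos_zero(x: float) -> float:
    """Unfolded form of `fadd x, +0.0`."""
    return x + 0.0


def fsub_from_pos_zero(x: float) -> float:
    """Unfolded form of `fsub +0.0, x`."""
    return 0.0 - x


def fneg(x: float) -> float:
    """Result of the fold `fsub +0.0, x -> fneg x`."""
    return -x
```

For `x = -0.0`, `fadd_pos_zero(x)` is `+0.0`, so replacing it with `x` flips the sign bit. For `x = +0.0`, `fsub_from_pos_zero(x)` is `+0.0` while `fneg(x)` is `-0.0`. A user such as `== 0.0` cannot tell the two results apart, which is the situation the relaxed check relies on.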
2025-12-09 12:07:46 +02:00
Matt Arsenault
27bf5fdcc6
DAG: Add overload of getExternalSymbol using RTLIB::LibcallImpl (#170587) 2025-12-05 22:39:57 +00:00
David Green
4c6b8825e8
[DAG] Fold mul 0 -> 0 when expanding mul into parts. (#168780)
If the upper bits are zero, but we expand multiply then immediately
convert the multiply into a libcall, there is no opportunity to optimize
away the mul. Do so in getNode to make sure extending multiplies
optimise cleanly.
2025-12-05 07:58:28 +00:00
Matt Arsenault
8d6c5cddf2
DAG: Use LibcallImpl in various getLibFunc helpers (#170400)
Avoid using getLibcallName in favor of querying the
libcall impl, and getting the ABI details from that.
2025-12-03 13:00:45 -05:00
Lewis Crawford
ea3fdc5972
Avoid maxnum(sNaN, x) optimizations / folds (#170181)
The behaviour of constant-folding `maxnum(sNaN, x)` and `minnum(sNaN,
x)` has become controversial, and there are ongoing discussions about
which behaviour we want to specify in the LLVM IR LangRef.

See:
  - https://github.com/llvm/llvm-project/issues/170082
  - https://github.com/llvm/llvm-project/pull/168838
  - https://github.com/llvm/llvm-project/pull/138451
  - https://github.com/llvm/llvm-project/pull/170067
  - https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006

This patch removes optimizations and constant-folding support for
`maxnum(sNaN, x)` but keeps it folded/optimized for `qNaN`. This should
allow for some more flexibility so the implementation can conform to
either the old or new version of the semantics specified without any
changes.

As far as I am aware, optimizations involving constant `sNaN` should
generally be edge-cases that rarely occur, so there should hopefully be
very little real-world performance impact from disabling these
optimizations.
2025-12-02 12:43:03 +00:00
Paul Walker
8478de3d00
[LLVM][CodeGen] Remove failure cases when widening EXTRACT/INSERT_SUBVECTOR. (#162308)
This PR implements catch all handling for widening the scalable
subvector operand (INSERT_SUBVECTOR) or result (EXTRACT_SUBVECTOR). It
does this via the stack using masked memory operations. With general
handling available we can add optimizations for specific cases.
2025-12-01 12:32:58 +00:00
Luke Lau
d1500d12be
[SelectionDAG] Add SelectionDAG::getTypeSize. NFC (#169764)
Similar to how getElementCount avoids the need to reason about fixed and
scalable ElementCounts separately, this patch adds getTypeSize to do the
same for TypeSize.

It also goes through and replaces some of the manual uses of getVScale
with getTypeSize/getElementCount where possible.
2025-12-01 10:33:50 +00:00
Peter Collingbourne
6227eb90da
Add IR and codegen support for deactivation symbols.
Deactivation symbols are a mechanism for allowing object files to disable
specific instructions in other object files at link time. The initial use
case is for pointer field protection.

For more information, see the RFC:
https://discourse.llvm.org/t/rfc-deactivation-symbols/85556

Reviewers: ojhunt, nikic, fmayer, arsenm, ahmedbougacha

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/133536
2025-11-26 12:37:09 -08:00
陈子昂
e38529ddbb
[DAG] Update canCreateUndefOrPoison to handle ISD::VECTOR_COMPRESS (#168010)
Fixes #167710
2025-11-19 10:21:05 +00:00
Craig Topper
96e58b83a3
[RISCV] Legalize misaligned unmasked vp.load/vp.store to vle8/vse8. (#167745)
If vector-unaligned-mem support is not enabled, we should not generate
loads/stores that are not aligned to their element size.

We already do this for non-VP vector loads/stores.

This code has been in our downstream for about a year and a half after
finding the vectorizer generating misaligned loads/stores. I don't think
that is unique to our downstream.

Doing this for masked vp.load/store requires widening the mask as well
which is harder to do.

NOTE: Because we have to scale the VL, this will introduce additional
vsetvli and the VL optimizer will not be effective at optimizing any
arithmetic that is consumed by the store.
2025-11-18 11:13:54 -08:00
Sander de Smalen
f369a53d82
[DAGCombiner] Fold select into partial.reduce.add operands. (#167857)
This generates more optimal codegen when using partial reductions with
predication.

```
partial_reduce_*mla(acc, sel(p, mul(*ext(a), *ext(b)), splat(0)), splat(1))
-> partial_reduce_*mla(acc, sel(p, a, splat(0)), b)

partial.reduce.*mla(acc, sel(p, *ext(op), splat(0)), splat(1))
-> partial.reduce.*mla(acc, sel(p, op, splat(0)), splat(trunc(1)))
```
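
The first rewrite is sound because, lane by lane, `(p ? a*b : 0) * 1` equals `(p ? a : 0) * b`. The Python sketch below checks this on a scalar model. `partial_reduce_mla` and `select` are hypothetical helpers, extension is ignored because Python integers do not overflow, and the lane-to-accumulator mapping (`i % len(acc)`) is an assumption made only so the model is concrete.

```python
from typing import List


def select(pred: List[bool], on_true: List[int], on_false: List[int]) -> List[int]:
    """Lane-wise select, as in sel(p, x, y)."""
    return [t if p else f for p, t, f in zip(pred, on_true, on_false)]


def partial_reduce_mla(acc: List[int], lhs: List[int], rhs: List[int]) -> List[int]:
    """Accumulate lane products lhs[i] * rhs[i] into a narrower accumulator.

    Assumed mapping for illustration: lane i adds into acc[i % len(acc)].
    """
    out = list(acc)
    for i, (l, r) in enumerate(zip(lhs, rhs)):
        out[i % len(out)] += l * r
    return out


def before_fold(acc, p, a, b):
    """sel(p, mul(a, b), splat(0)) multiplied by splat(1)."""
    n = len(a)
    products = [x * y for x, y in zip(a, b)]
    return partial_reduce_mla(acc, select(p, products, [0] * n), [1] * n)


def after_fold(acc, p, a, b):
    """sel(p, a, splat(0)) multiplied by b."""
    return partial_reduce_mla(acc, select(p, a, [0] * len(a)), b)
```

Both forms produce the same accumulator for any predicate, since a masked-off lane contributes zero in either case.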
2025-11-18 09:49:42 +00:00
Matt Arsenault
0385a182da
DAG: exp opcodes cannotBeOrderedNegativeFP (#167604) 2025-11-12 19:50:46 +00:00