2685 Commits

Author SHA1 Message Date
Alexander Peskov
cffb8aee14
[DEBUGINFO] Propagate debug metadata for sext SDNode. (#135971)
In some cases of chained `sext` operators the debug metadata can be
lost. This patch propagates the proper metadata to the resulting node.

A particular case of the issue is NVPTX codegen for a function with a
bool local variable:
```
void test(int i) {
  bool xyz = i == 0;
  foo(i);
}
```

---------

Signed-off-by: Alexander Peskov <apeskov@nvidia.com>
2025-05-02 08:31:41 -04:00
Jonathan Thackray
6e49f73825
Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-30 22:06:37 +01:00
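The NaN handling that the `atomicrmw fminimum`/`fmaximum` message above contrasts with `fmin`/`fmax` can be sketched in plain C++. This is only a model under stated assumptions, not LLVM code: `ieeeMinimum` and `atomicFMinimum` are hypothetical helper names, and the compare-exchange loop is one possible expansion rather than what any target actually emits.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <limits>

// IEEE 754-2019 minimum: propagates NaN and orders -0.0 below +0.0.
// Hypothetical helper that models the behaviour of llvm.minimum.*.
inline double ieeeMinimum(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<double>::quiet_NaN();
  if (A == 0.0 && B == 0.0)
    return std::signbit(A) ? A : B; // prefer the negative zero
  return A < B ? A : B;
}

// Hypothetical model of `atomicrmw fminimum`: stores minimum(old, Val)
// and returns the old value, implemented here as a compare-exchange loop.
inline double atomicFMinimum(std::atomic<double> &Mem, double Val) {
  double Old = Mem.load();
  while (!Mem.compare_exchange_weak(Old, ieeeMinimum(Old, Val))) {
    // A failed exchange refreshed Old; retry with the new value.
  }
  return Old;
}

// Contrast with std::fmin, which returns the non-NaN operand.
inline void checkNaNHandling() {
  const double NaN = std::numeric_limits<double>::quiet_NaN();
  assert(std::fmin(NaN, 1.0) == 1.0);
  assert(std::isnan(ieeeMinimum(NaN, 1.0)));
}
```

Calling `atomicFMinimum(Mem, 1.0)` on a location holding `3.0` returns `3.0` and leaves `1.0` in memory; if either value is NaN, the stored result is NaN.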
YunQiang Su
db859db74d Revert "CodeGen: Add ISD::AssertNoFPClass (#135946)"
This reverts commit f0c61d2242bbc7576ca5e4137a5ea8f63e4859a9.
2025-04-30 16:16:26 +08:00
Jonathan Thackray
7ee0097b48
Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657)
Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4
2025-04-28 16:53:36 +01:00
Jonathan Thackray
ba420d8122
[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-28 15:31:44 +01:00
Craig Topper
e17f07c4de
[SelectionDAG] Reduce code duplication between getStore, getTruncStore, and getIndexedStore. (#137435)
Create an extra overload of getStore that can handle all 3 types of
stores. This is similar to how getLoad/getExtLoad/getIndexedLoad are
structured.
2025-04-27 22:32:53 -07:00
Jie Fu
46f91173c5 [CodeGen] Fix -Wunused-variable in SelectionDAG.cpp (NFC)
/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7502:17:
 error: unused variable 'NoFPClass' [-Werror,-Wunused-variable]
    FPClassTest NoFPClass = static_cast<FPClassTest>(N2->getAsZExtVal());
                ^
1 error generated.
2025-04-25 14:03:09 +08:00
YunQiang Su
f0c61d2242
CodeGen: Add ISD::AssertNoFPClass (#135946)
It is used to mark a value that is known not to belong to some FP class
(`FPClassTest`). Examples include:
  * A function argument that is marked with nofpclass
  * The result of an intrinsic that is known not to be of some class

Subsequent operations can then make assumptions based on this.

---------

Co-authored-by: Your Name <you@example.com>
2025-04-25 09:12:41 +08:00
Craig Topper
f261f1406d
[SelectionDAG][RISCV] Teach computeKnownBits to use range metadata for atomic_load. (#137119)
And teach SelectionDAGBuilder to get the range metadata in
visitAtomicLoad.

This allows us to recognize that sign extending a byte load of a
boolean value from memory will produce zeros for the extended bits.
This allows us to remove an AND on RISC-V.

Tests copied from #136502 with range metadata added to i1 cases.
Some of the test effects overlap with #136502, but that patch can't
handle the acquire or seq_cst cases with the Zalasr extension. We
only have sign extending versions of those loads.
2025-04-24 12:14:05 -07:00
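The reasoning in the range-metadata commit above (a boolean byte load has range [0, 2), so its sign extension has zero upper bits and a following AND is redundant) can be sketched as follows. This is a simplified 8-bit model: `KnownBits8`, `knownBitsFromRange` and `andAfterSextIsRedundant` are hypothetical names, not the LLVM `KnownBits` API, and only non-wrapping unsigned ranges are handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-in for llvm::KnownBits.
struct KnownBits8 {
  uint8_t Zero; // bits known to be 0
  uint8_t One;  // bits known to be 1
};

// Known bits for the unsigned half-open range [Lo, Hi): the leading bits
// shared by Lo and Hi - 1 are shared by every value in the range.
inline KnownBits8 knownBitsFromRange(uint8_t Lo, uint8_t Hi) {
  assert(Lo < Hi && "empty or wrapped ranges are not modelled");
  uint8_t Max = static_cast<uint8_t>(Hi - 1);
  uint8_t Diff = static_cast<uint8_t>(Lo ^ Max);
  uint8_t Prefix = 0xFF;
  while (Diff & Prefix) // drop low bits until no differing bit remains
    Prefix = static_cast<uint8_t>(Prefix << 1);
  return {static_cast<uint8_t>(~Lo & Prefix),
          static_cast<uint8_t>(Lo & Prefix)};
}

// Sign-extend an i8 value to i32.
inline int32_t signExtend8(uint8_t V) { return static_cast<int8_t>(V); }

// If the sign bit is known zero, sext fills the upper bits with zeros.
// An AND is then redundant when Mask covers every bit that may be one.
inline bool andAfterSextIsRedundant(KnownBits8 Known, uint8_t Mask) {
  bool SignKnownZero = (Known.Zero & 0x80) != 0;
  bool MaskCoversOnes = (static_cast<uint8_t>(~Known.Zero) & ~Mask) == 0;
  return SignKnownZero && MaskCoversOnes;
}
```

With `knownBitsFromRange(0, 2)` only bit 0 is unknown, so `andAfterSextIsRedundant(Known, 1)` holds. The width of the known-bits value matches the width of the range here, which is also the invariant the "right size" KnownBits commit further down insists on.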
Craig Topper
dbb0605f87 [SelectionDAG] Add NewSDValueDbgMsg to getAtomic. 2025-04-23 22:56:52 -07:00
Peter Collingbourne
dbb8434ff7
SelectionDAG: Add missing AddNodeIDCustom case for MDNodeSDNode.
Without this we ended up never deduplicating MDNodeSDNodes.

Reviewers: arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/136805
2025-04-23 11:00:48 -07:00
zhijian lin
afda4c295b
Reland [SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#136701)
This patch addresses the signed/zero extension of poison by using a
poison value of the extended type instead of a constant zero of the
extended type.
2025-04-22 17:36:41 -04:00
Craig Topper
f6178cdad0
[SelectionDAG] Pass LoadExtType when ATOMIC_LOAD is created. (#136653)
Rename one signature of getAtomic to getAtomicLoad and pass LoadExtType.
Previously we had to set the extension type after the node was created,
but we don't usually modify SDNodes once they are created. It's possible
the node already existed and has been CSEd. If that happens, modifying
the node may affect the other users. It's therefore safer to add the
extension type at creation so that it is part of the CSE information.

I don't know of any failures related to the current implementation. I
only noticed that it doesn't match how we usually do things.
2025-04-22 09:11:46 -07:00
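Why the extension type belongs in the CSE key, as argued in the `getAtomicLoad` commit above, can be illustrated with a toy node cache. This is a sketch with hypothetical types (`TinyDAG`, `AtomicLoadNode`, `ExtType`); the real SelectionDAG uses a `FoldingSet` with a much richer profile.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <tuple>
#include <utility>

enum class ExtType { NonExt, SExt, ZExt }; // hypothetical

struct AtomicLoadNode {
  int Address;
  ExtType Ext;
};

// Toy DAG that deduplicates nodes by (address, extension type).
class TinyDAG {
  std::map<std::tuple<int, ExtType>, std::unique_ptr<AtomicLoadNode>> CSEMap;

public:
  AtomicLoadNode *getAtomicLoad(int Address, ExtType Ext) {
    auto Key = std::make_tuple(Address, Ext);
    auto It = CSEMap.find(Key);
    if (It != CSEMap.end())
      return It->second.get(); // an identical node already exists
    auto Node = std::make_unique<AtomicLoadNode>(AtomicLoadNode{Address, Ext});
    AtomicLoadNode *Raw = Node.get();
    CSEMap.emplace(Key, std::move(Node));
    return Raw;
  }
};

inline void demoCSE() {
  TinyDAG DAG;
  AtomicLoadNode *A = DAG.getAtomicLoad(0x10, ExtType::ZExt);
  AtomicLoadNode *B = DAG.getAtomicLoad(0x10, ExtType::ZExt);
  AtomicLoadNode *C = DAG.getAtomicLoad(0x10, ExtType::SExt);
  assert(A == B);               // same key, same node
  assert(A != C);               // different extension, distinct node
  assert(A->Ext == ExtType::ZExt); // A's users are unaffected by C
}
```

Had the extension type been set after lookup instead, the request for a sign-extending load would have returned the shared node and then mutated it underneath its existing users.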
Craig Topper
497382ee07
[SelectionDAG] Make the FoldingSet profile in getAtomic match AddNodeIDCustom. (#136651)
In theory, the mismatch would have made CSE of AtomicSDNodes not work,
but I don't know how to test it.
2025-04-21 22:39:31 -07:00
Craig Topper
704fc6542c
[SelectionDAG] Prefer to use ATOMIC_LOAD extension type over getExtendForAtomicOps() in computeKnownBits/ComputeNumSignBits. (#136600)
If an ATOMIC_LOAD has ZEXTLOAD/SEXTLOAD extension type we should trust
that over getExtendForAtomicOps().

SystemZ is the only target that uses setAtomicLoadExtAction and they
return ANY_EXTEND from getExtendForAtomicOps(). So I'm not sure there's
a way to get a contradiction currently.

Note, type legalization uses getExtendForAtomicOps() when promoting
ATOMIC_LOAD so we may not need to check getExtendForAtomicOps() for
ATOMIC_LOAD. I have not done much investigating of this.
2025-04-21 13:49:49 -07:00
Jim Lin
b95ec24ff0
[SDAG] Handle insert_subvector in isKnownNeverNaN (#131989)
Propagate nnan across insert_subvector.
2025-04-22 01:19:56 +08:00
Nico Weber
e18a77cfbe Revert "[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741)"
This reverts commit f12078e72601e7c03e5d66afab034313caf8f791.

Breaks `check-llvm`, see comments on https://github.com/llvm/llvm-project/pull/122741
2025-04-21 10:51:03 -04:00
zhijian lin
f12078e726
[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741)
The PR will fix the issue
https://github.com/llvm/llvm-project/issues/122728

This patch addresses the signed/zero extension of poison by using a
poison value of the extended type instead of a constant zero of the
extended type.
2025-04-21 10:02:21 -04:00
Simon Pilgrim
64ffecfc43
[DAG] isKnownNeverNaN - add DemandedElts element mask to isKnownNeverNaN calls (#135952)
Matches what we've done for computeKnownBits etc. to improve vector handling
2025-04-18 09:24:02 +01:00
zhijian lin
3dfdb4dad5
[SelectionDAG] Propagate poison in getNode with two operands if the input is poison. (#135387)
Propagate poison in `SDValue
SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue
N1, SDValue N2, const SDNodeFlags Flags)` if one of the inputs is
poison.

The patch also reverts the test cases
 llvm/test/CodeGen/X86/pr119158.ll
 llvm/test/CodeGen/X86/half.ll

which are mentioned in
https://github.com/llvm/llvm-project/pull/125883#discussion_r2021390919

---------

Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>
2025-04-17 09:23:14 -04:00
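The poison rule in the two-operand `getNode` commit above, together with the ZERO-EXTEND/SIGN_EXTEND fold listed earlier in this log, can be modelled with `std::optional`, where an empty value stands for poison. This is a sketch under assumptions: `Value32`, `Value64`, `Opcode` and the free functions are hypothetical, and the real patch may restrict which opcodes propagate poison.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// A value is either a concrete constant or poison (std::nullopt).
using Value32 = std::optional<uint32_t>;
using Value64 = std::optional<uint64_t>;

enum class Opcode { Add, And, Xor }; // hypothetical subset

// Binary node creation: poison in either operand yields poison.
inline Value32 getNode(Opcode Op, Value32 N1, Value32 N2) {
  if (!N1 || !N2)
    return std::nullopt;
  switch (Op) {
  case Opcode::Add:
    return *N1 + *N2;
  case Opcode::And:
    return *N1 & *N2;
  case Opcode::Xor:
    return *N1 ^ *N2;
  }
  assert(false && "unknown opcode");
  return std::nullopt;
}

// Extending poison gives poison of the wider type, not constant zero.
inline Value64 getZeroExtend(Value32 N) {
  if (!N)
    return std::nullopt;
  return static_cast<uint64_t>(*N);
}
```

Note the contrast with undef in the `isSplatValue` commits below: in this model `and(poison, 0)` is poison, whereas `(and undef, 0)` folds to 0.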
Simon Pilgrim
bb5f53c727
[DAG] isSplatValue - only treat binop splats with repeated undef elements as undef (#135945)
#135597 didn't correctly fix the issue of binops with an undef element
from only one operand - only reporting the common undef elements could
incorrectly recognise splats where the (binop X, undef) fold might
actually be different - we need to ensure both operands have the same
demanded undefs for certainty.

Fixes #135917
2025-04-16 12:34:11 +01:00
Simon Pilgrim
17b4cacbd4
[DAG] isSplatValue - only treat binop splats shared undef elements as undef (#135597)
#134602 demonstrated an issue where an AND node always had at least one demanded UNDEF element in either operand, and this was incorrectly reported as an all-undef result, despite the other element being 0 (so it would correctly fold to 0).

This fix only assumes a binop's splat element is undefined if both operands are undef.

Fixes #134602
2025-04-15 08:33:42 +01:00
zhijian lin
378ac572ac
Reland "[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR." (#135056)
A new ISD::POISON SDNode is introduced to represent the poison value in
the IR, replacing the previous use of ISD::UNDEF
2025-04-10 11:29:14 -04:00
Jakub Kuderski
ef1088f703
Revert "[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR." (#135060)
Reverts llvm/llvm-project#125883

This PR causes crashes in RISC-V codegen around f16/f64 poison values:
https://github.com/llvm/llvm-project/pull/125883#issuecomment-2787048206
2025-04-09 14:40:56 -04:00
zhijian lin
8fddef8483
[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR. (#125883)
A new ISD::POISON SDNode is introduced to represent the `poison value`
in the IR, replacing the previous use of ISD::UNDEF.
2025-04-07 10:03:05 -04:00
Sergei Barannikov
0a1742708d
[SelectionDAG] Wire up -gen-sdnode-info TableGen backend (#125358)
This patch introduces SelectionDAGGenTargetInfo and SDNodeInfo classes,
which provide methods for accessing the generated SDNode descriptions.

Pull Request: https://github.com/llvm/llvm-project/pull/125358
Draft PR: https://github.com/llvm/llvm-project/pull/119709
RFC: https://discourse.llvm.org/t/rfc-tablegen-erating-sdnode-descriptions
2025-04-06 13:14:37 +03:00
LU-JOHN
6a46c6c865
Ensure KnownBits passed when calculating from range md has right size (#132985)
KnownBits passed to computeKnownBitsFromRangeMetadata must have the same
bit width as the range metadata bit width. Otherwise the calculated
results will be incorrect.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-04-03 10:17:14 +07:00
Philip Reames
c90a536bcf [CodeGen] Simplify code using TypeSize overloads of getMachineMemOperand [nfc]
These were added in d584cea.  This change runs through existing uses and
simplifies where obvious.
2025-03-27 11:47:51 -07:00
LU-JOHN
70aeb89094
Calculate KnownBits from Metadata correctly for vector loads (#128908)
Calculate KnownBits correctly from metadata for vector loads.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-03-25 22:46:30 +07:00
Simon Pilgrim
0237216f16
[DAG] canCreateUndefOrPoison - add EXTRACT_SUBVECTOR handling (#132745)
Similar to INSERT_SUBVECTOR - the index is constant and will be inbounds
2025-03-24 16:03:47 +00:00
David Green
bd1be8a242
[CodeGen][GlobalISel] Add a getVectorIdxWidth and getVectorIdxLLT. (#131526)
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
2025-03-18 08:31:11 +00:00
Jim Lin
00cad3ed22
[SDAG] Handle extract_subvector in isKnownNeverNaN (#131581)
Propagate nnan across extract_subvector.
2025-03-18 09:37:16 +08:00
Benjamin Maxwell
55fdeccc45
[SDAG][X86] Remove hack needed to avoid missing x87 FPU stack pops (#128055)
If a (two-result) node like `FMODF` or `FFREXP` is expanded to a library
call whose prototype looks like `float(float, float*)` (that is, it
returns one float by value and a second via an output pointer), the
first result of the node maps to the value returned by value and the
second result maps to the value returned via the output pointer.

If only the second result is used after the expansion, we hit an issue
on x87 targets:

```
// Before expansion: 
t0, t1 = fmodf x
return t1  // t0 is unused
```

Expanded result:
```
ptr = alloca
ch0 = call modf ptr
t0, ch1 = copy_from_reg, ch0 // t0 unused
t1, ch2 = ldr ptr, ch1
return t1
```

So far things are alright, but the DAGCombiner optimizes this to:
```
ptr = alloca
ch0 = call modf ptr
// copy_from_reg optimized out
t1, ch1 = ldr ptr, ch0
return t1
```

On most targets this is fine. The optimized out `copy_from_reg` is
unused and is a NOP. However, x87 uses a floating-point stack, and if
the `copy_from_reg` is optimized out it won't emit a pop needed to
remove the unused result.

The prior solution for this was to attach the chain from the
`copy_from_reg` to the root, which did work, however, the root is not
always available (it's set to null during legalize types). So the
alternate solution in this patch is to replace the `copy_from_reg` with
an `X86ISD::POP_FROM_X87_REG` within the X86 call lowering. This node is
the same as `copy_from_reg` except this node makes it explicit that it
may lower to an x87 FPU stack pop. Optimizations should be more cautious
when handling this node than a normal CopyFromReg to avoid removing a
required FPU stack pop.

```
ptr = alloca
ch0 = call modf ptr
t0, ch1 = pop_from_x87_reg, ch0 // t0 unused
t1, ch2 = ldr ptr, ch1
return t1
```

Using this node ensures a required x87 FPU pop is not removed due to the
DAGCombiner.

This is an alternate solution for #127976.
2025-03-03 12:23:28 +00:00
Craig Topper
7bd2be4266 [SelectionDAG] Use Register and MCRegister. NFC
Add operators to Register to support adding an offset to get
another Register.
2025-03-02 22:33:25 -08:00
Simon Pilgrim
7de64925da
[DAG] shouldReduceLoadWidth - hasOneUse should check just the loaded value - not the chain (#128167)
The hasOneUse check was failing in any case where the load was part of a chain - we should only be checking if the loaded value has one use, and any updates to the chain should be handled by the fold calling shouldReduceLoadWidth.

I've updated the x86 implementation to match, although it has no effect here yet (I'm still looking at how to improve the x86 implementation) as the inner for loop was discarding chain uses anyway.

By using SDValue::hasOneUse instead this patch exposes a missing dependency on the LLVMSelectionDAG library in a lot of tools + unittests, which resulted in having to make SDNode::hasNUsesOfValue inline.

Noticed while fighting the x86 regressions in #122671
2025-02-24 11:09:41 +00:00
Piotr Fusik
8b58cb853a
[SelectionDAG][NFC] Refactor duplicate code into SDNode::bitcastToAPInt() (#127503) 2025-02-20 13:23:00 +07:00
zhijian lin
1ac0db44fd
[NFC] using isUndef() instead of getOpcode() == ISD::UNDEF (#127713)
[NFC] using isUndef() instead of getOpcode() == ISD::UNDEF
2025-02-19 08:42:38 -05:00
James Chesterman
d4a0848dc6
[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207)
Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line
argument (aarch64-enable-partial-reduce-nodes) that indicates whether the
intrinsic experimental_vector_partial_reduce_add will be transformed
into the new ISD node. Lowering with the new ISD nodes will, for now,
always be done as an expand.
2025-02-18 09:08:47 +00:00
Cullen Rhodes
9b2fc66830
[SDAG] Harden assumption in getMemsetStringVal (#126207)
In 5235973ee03aca4148ecabe5eff64da2af1e034e, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.

This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.

For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.

For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.

To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.

For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.

Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973ee03aca4148ecabe5eff64da2af1e034e to defend ICE on f128, but at
some point it stopped hitting this code.

AArch64TargetLowering::getOptimalMemOpType was updated in
29200611055f49a0d37243caa5f8bba1df9d57a6, so I suspect this is when it
happened, although I haven't verified this. I did find, however, that when
the test is updated to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.

For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.

Fixes: https://github.com/llvm/llvm-project/issues/20521 [1]
2025-02-13 08:48:06 +00:00
Philip Reames
e4016bf5c3 [DAG] Use ArrayRef to simplify ShuffleVectorSDNode::isSplatMask 2025-02-11 12:47:10 -08:00
Simon Pilgrim
b7c8271601
[DAG] getNode - convert scalar i1 arithmetic calls to bitwise instructions (#125486)
We already do this for vector vXi1 types - this patch removes the vector constraint to handle it for all bool types.
2025-02-03 16:36:01 +00:00
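The i1 conversion in the commit above rests on arithmetic wrapping modulo 2. A minimal check, assuming the folds in question are ADD/SUB to XOR and MUL to AND (the message does not list the opcodes, so this set is an assumption), with hypothetical helper names:

```cpp
#include <cassert>
#include <initializer_list>

// Arithmetic on i1 wraps modulo 2; values are modelled as 0 or 1.
constexpr unsigned addI1(unsigned A, unsigned B) { return (A + B) & 1u; }
constexpr unsigned subI1(unsigned A, unsigned B) { return (A - B) & 1u; }
constexpr unsigned mulI1(unsigned A, unsigned B) { return (A * B) & 1u; }

// Exhaustively compare each arithmetic op with its bitwise counterpart.
inline void checkI1Folds() {
  for (unsigned A : {0u, 1u})
    for (unsigned B : {0u, 1u}) {
      assert(addI1(A, B) == (A ^ B)); // add -> xor
      assert(subI1(A, B) == (A ^ B)); // sub -> xor
      assert(mulI1(A, B) == (A & B)); // mul -> and
    }
}
```

Since there are only four input combinations, the exhaustive loop is a complete proof for the scalar case.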
David Green
cae0d67cba
[AArch64][SDAG] Detect non-zeroes in truncating buildvectors in fshl lowering (#123597)
A BUILD_VECTOR can implicitly shrink the bits of the operands if the
operand types are not legal. For example a v8i16 constant BUILD_VECTOR
might be represented as v8i16 BUILDVECTOR(i32 1, i32 2, ...).
Unfortunately this means that the constants are not accepted by
matchUnaryPredicateImpl, preventing in this case funnel shifts detecting
that all the operands are non-zero. Add a flag to help it match.
2025-02-03 10:47:45 +00:00
Pierre van Houtryve
8ea018ce1d
[DAGISel] Fix MMRA Handling in copyExtraInfo (#124730)
#78569 did not implement this correctly and an edge case breaks it by
triggering `Assertion `!Leafs.empty()' failed.`

Fixes SWDEV-507698
2025-01-28 13:27:26 +01:00
Benjamin Maxwell
778138114e
[SDAG] Use BatchAAResults for querying alias analysis (AA) results (#123934)
Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.

This should be an NFC change for targets that enable AA during codegen
(such as AArch64), but should also give a nice compile-time improvement
in some cases. See:
https://github.com/llvm/llvm-project/pull/123787#issuecomment-2606797041

Note: This follows Nikita's suggestion on #123787.
2025-01-23 09:16:09 +00:00
Alex MacLean
3606876b67
[SDAG] Fix CSE for ADDRSPACECAST nodes (#122912)
Correct CSE in SelectionDAG can make DAG combining more effective and
reduce the size of the DAG, and thus should improve compile time.
2025-01-20 09:09:22 -08:00
Min-Yih Hsu
2291d0aba9
[DAGCombiner] Turn (neg (max x, (neg x))) into (min x, (neg x)) (#120666)
This pattern was originally spotted in 429.mcf by @topperc.

We already have a DAGCombiner pattern to turn `(neg (abs x))` into `(min
x, (neg x))`. But in some cases `(neg (max x, (neg x)))` is formed by an
expanded `abs` followed by a `neg` that is generated only after the
`abs` expansion. This patch adds a separate pattern to match cases like
this, as well as its inverse pattern: `(neg (min X, (neg X))) --> (max
X, (neg X))`.

This pattern is applicable to both signed and unsigned min/max.
2025-01-02 16:28:55 -08:00
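The claim in the DAGCombiner commit above, that `(neg (max x, (neg x)))` equals `(min x, (neg x))` for both signed and unsigned min/max, along with the inverse pattern, can be checked exhaustively on 8-bit values with wrap-around negation. The helper names below are hypothetical; the sketch only verifies the algebra and says nothing about how the combine is implemented.

```cpp
#include <cassert>
#include <cstdint>

// Two's-complement negation with wrap-around on 8 bits.
inline uint8_t neg8(uint8_t X) { return static_cast<uint8_t>(0u - X); }

// Signed comparison of two 8-bit patterns.
inline bool slt8(uint8_t A, uint8_t B) {
  return static_cast<int8_t>(A) < static_cast<int8_t>(B);
}
inline uint8_t smax8(uint8_t A, uint8_t B) { return slt8(A, B) ? B : A; }
inline uint8_t smin8(uint8_t A, uint8_t B) { return slt8(B, A) ? B : A; }
inline uint8_t umax8(uint8_t A, uint8_t B) { return A < B ? B : A; }
inline uint8_t umin8(uint8_t A, uint8_t B) { return B < A ? B : A; }

// Check both patterns, signed and unsigned, for every 8-bit input.
inline void checkNegOfMinMax() {
  for (unsigned I = 0; I < 256; ++I) {
    uint8_t X = static_cast<uint8_t>(I);
    uint8_t NegX = neg8(X);
    assert(neg8(smax8(X, NegX)) == smin8(X, NegX));
    assert(neg8(smin8(X, NegX)) == smax8(X, NegX));
    assert(neg8(umax8(X, NegX)) == umin8(X, NegX));
    assert(neg8(umin8(X, NegX)) == umax8(X, NegX));
  }
}
```

The identity holds because max picks one element of the pair {x, -x} and negating it yields the other element, which is the min; the cases where x equals -x (0 and the minimum signed value) are trivially consistent.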
Sergei Barannikov
9ae92d7056
[SelectionDAG] Virtualize isTargetStrictFPOpcode / isTargetMemoryOpcode (#119969)
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709).

Pull Request: https://github.com/llvm/llvm-project/pull/119969
2024-12-21 05:29:51 +03:00
Craig Topper
ecd59f802f [SelectionDAG] Use SmallVectorImpl& to avoid repeating SmallVector size. NFC 2024-12-19 22:03:42 -08:00
Craig Topper
e6b2495545
[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531)
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.

We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.

Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
2024-12-19 08:35:32 -08:00
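The difference between iterating uses and iterating users, as described in the iterator split above, can be sketched with a toy graph. `Node` and `Use` here are hypothetical stand-ins for `SDNode` and `SDUse`; storing the operand number directly in `Use` reflects the follow-up the message mentions as a plan, not the state of the code at this commit.

```cpp
#include <cassert>
#include <vector>

struct Node;

// One edge from a user back to the node it consumes (models SDUse).
struct Use {
  Node *User;
  unsigned OperandNo;
};

struct Node {
  std::vector<Node *> Operands;
  std::vector<Use> UseList;

  // Record Def as the next operand of this node.
  void addOperand(Node &Def) {
    Def.UseList.push_back(Use{this, static_cast<unsigned>(Operands.size())});
    Operands.push_back(&Def);
  }

  // uses(): one entry per edge, with the operand index available.
  const std::vector<Use> &uses() const { return UseList; }

  // users(): only the consuming nodes, repeated once per edge.
  std::vector<Node *> users() const {
    std::vector<Node *> Result;
    for (const Use &U : UseList)
      Result.push_back(U.User);
    return Result;
  }
};

inline void demoUsesAndUsers() {
  Node X, Add;
  Add.addOperand(X); // models (add X, X)
  Add.addOperand(X);
  unsigned Expected = 0;
  for (const Use &U : X.uses()) { // range-based loop over edges
    assert(U.User == &Add);
    assert(U.OperandNo == Expected);
    ++Expected;
  }
  for (Node *N : X.users()) // range-based loop over consumers
    assert(N == &Add);
}
```

A loop over `users()` is enough when only the consuming node matters; a loop over `uses()` is needed when the code must know which operand slot refers to the value.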
Craig Topper
104ad9258a
[SelectionDAG] Rename SDNode::uses() to users(). (#120499)
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.

I've long been annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
2024-12-18 20:09:33 -08:00