In some cases of chained `sext` operators, the debug metadata can be
dropped. This patch propagates the proper metadata to the resulting node.
A particular case of this issue is NVPTX codegen for a function with a
bool local variable:
```
void test(int i) {
bool xyz = i == 0;
foo(i);
}
```
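A minimal sketch of the idea (illustrative only, not the exact code of this
patch): when a chain of sign extensions is folded into a single node, building
the replacement with the `SDLoc` of the original node keeps its debug location
attached.
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Illustrative sketch: reuse SDLoc(N) so the folded sign extension keeps the
// debug location of the node it replaces.
static SDValue foldChainedSExt(SelectionDAG &DAG, SDNode *N) {
  SDValue Inner = N->getOperand(0);
  if (Inner.getOpcode() != ISD::SIGN_EXTEND)
    return SDValue();
  // SDLoc(N) carries the debug location; passing it here propagates the
  // metadata to the resulting node.
  return DAG.getNode(ISD::SIGN_EXTEND, SDLoc(N), N->getValueType(0),
                     Inner.getOperand(0));
}
```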
---------
Signed-off-by: Alexander Peskov <apeskov@nvidia.com>
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.
These mirror the `llvm.maximum.*` and `llvm.minimum.*` intrinsics, but
are atomic and use IEEE 754-2019 handling for NaNs, which differs from
`fmax` and `fmin`. See:
https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.
Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
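A minimal sketch of building one of these operations through IRBuilder; the
`AtomicRMWInst::FMaximum` enumerator name is assumed here to follow the
existing `FMax`/`FMin` naming convention.
```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Sketch: emit an atomicrmw fmaximum with seq_cst ordering. The FMaximum
// enumerator name is assumed from the description above.
static Value *emitAtomicFMaximum(IRBuilder<> &B, Value *Ptr, Value *Val) {
  return B.CreateAtomicRMW(AtomicRMWInst::FMaximum, Ptr, Val,
                           /*Align=*/MaybeAlign(),
                           AtomicOrdering::SequentiallyConsistent);
}
```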
`nofpclass` is used to mark a value that is known not to be of certain
floating-point classes (fcTypes). Examples include:
* a function argument marked with nofpclass
* the output value of an intrinsic that is known not to be of some class
Subsequent operations can then make assumptions based on this
information, as sketched below.
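A hedged sketch of how such information can be queried at the IR level (not
necessarily the code of this patch): the classes excluded by `nofpclass` on an
argument or on a call's return value are classes the value is known not to
have.
```cpp
#include "llvm/ADT/FloatingPointMode.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Sketch: collect the FP classes a value is known NOT to be, based on the
// nofpclass attribute of a function argument or of a call's return value.
static FPClassTest classesKnownAbsent(const Value *V) {
  if (const auto *Arg = dyn_cast<Argument>(V))
    return Arg->getNoFPClass();
  if (const auto *CB = dyn_cast<CallBase>(V))
    return CB->getRetNoFPClass();
  return fcNone; // nothing known
}
```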
---------
Co-authored-by: Your Name <you@example.com>
Also teach SelectionDAGBuilder to get the range metadata in
visitAtomicLoad.
This allows us to recognize that sign extending a byte load of a
boolean value from memory will produce zeros for the extended bits.
This allows us to remove an AND on RISC-V.
Tests copied from #136502 with range metadata added to i1 cases.
Some of the test effects overlap with #136502, but that patch can't
handle the acquire or seq_cst cases with the Zalasr extension. We
only have sign extending versions of those loads.
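A small illustration of the known-bits reasoning (a sketch, not the patch
itself): with `!range [0, 2)` the loaded byte has every bit above bit 0 known
zero, so its sign bit is zero and sign extension also produces known-zero high
bits, making a later AND with 1 redundant.
```cpp
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Sketch: model a boolean byte (range [0, 2)) and check that sign-extending
// it leaves every bit above bit 0 known zero.
static bool sextOfBoolHasZeroHighBits(unsigned WideBits) {
  KnownBits Byte(8);
  Byte.Zero.setBitsFrom(1); // bits 1..7 known zero, bit 0 unknown
  KnownBits Wide = Byte.sext(WideBits);
  return Wide.Zero.countl_one() == WideBits - 1;
}
```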
Rename one signature of getAtomic to getAtomicLoad and pass LoadExtType.
Previously we had to set the extension type after the node was created,
but we don't usually modify SDNodes once they are created. It's possible
the node already existed and has been CSEd. If that happens, modifying
the node may affect the other users. It's therefore safer to add the
extension type at creation so that it is part of the CSE information.
I don't know of any failures related to the current implementation. I
only noticed that it doesn't match how we usually do things.
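A hedged sketch of the new creation path (the exact parameter order of
getAtomicLoad shown here is illustrative): the extension type is supplied when
the node is built, so it becomes part of the CSE information.
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: create an extending ATOMIC_LOAD with the extension type up front
// instead of mutating the node afterwards. Parameter order is illustrative.
static SDValue buildZExtAtomicLoad(SelectionDAG &DAG, const SDLoc &DL,
                                   EVT MemVT, EVT VT, SDValue Chain,
                                   SDValue Ptr, MachineMemOperand *MMO) {
  return DAG.getAtomicLoad(ISD::ZEXTLOAD, DL, MemVT, VT, Chain, Ptr, MMO);
}
```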
If an ATOMIC_LOAD has ZEXTLOAD/SEXTLOAD extension type we should trust
that over getExtendForAtomicOps().
SystemZ is the only target that uses setAtomicLoadExtAction, and it
returns ANY_EXTEND from getExtendForAtomicOps(), so I'm not sure there's
a way to get a contradiction currently.
Note, type legalization uses getExtendForAtomicOps() when promoting
ATOMIC_LOAD so we may not need to check getExtendForAtomicOps() for
ATOMIC_LOAD. I have not done much investigating of this.
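A minimal sketch of the priority described above (the helper and its
parameters are hypothetical, for illustration only):
```cpp
#include "llvm/CodeGen/ISDOpcodes.h"
using namespace llvm;

// Sketch: an explicit ZEXTLOAD/SEXTLOAD recorded on the ATOMIC_LOAD is
// trusted; getExtendForAtomicOps() is only consulted as a fallback.
static unsigned extendForAtomicLoad(ISD::LoadExtType RecordedExt,
                                    unsigned TLIExtendOpcode) {
  if (RecordedExt == ISD::ZEXTLOAD)
    return ISD::ZERO_EXTEND;
  if (RecordedExt == ISD::SEXTLOAD)
    return ISD::SIGN_EXTEND;
  return TLIExtendOpcode; // result of TLI.getExtendForAtomicOps()
}
```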
This PR fixes the issue
https://github.com/llvm/llvm-project/issues/122728
This patch addresses the signed/zero extension of poison by using a
poison value of the extended type instead of a constant zero of the
extended type.
It also propagates poison in `SDValue SelectionDAG::getNode(unsigned
Opcode, const SDLoc &DL, EVT VT, SDValue N1, SDValue N2, const
SDNodeFlags Flags)` when one of the inputs is poison.
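A hedged sketch of the intent (ISD::POISON and SelectionDAG::getPOISON are
assumed here as the poison counterparts of ISD::UNDEF and getUNDEF):
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: if either input of a two-operand node is poison, fold the node to
// poison of the result type; likewise, sign/zero-extending poison yields
// poison of the extended type rather than a zero constant.
static SDValue foldBinopWithPoison(SelectionDAG &DAG, EVT VT, SDValue N1,
                                   SDValue N2) {
  if (N1.getOpcode() == ISD::POISON || N2.getOpcode() == ISD::POISON)
    return DAG.getPOISON(VT);
  return SDValue(); // no fold
}
```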
The patch also reverts the test cases
llvm/test/CodeGen/X86/pr119158.ll and
llvm/test/CodeGen/X86/half.ll,
which are mentioned in
https://github.com/llvm/llvm-project/pull/125883#discussion_r2021390919
---------
Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>
#135597 didn't correctly fix the issue of binops with an undef element
from only one operand - only reporting the common undef elements could
incorrectly recognise splats where the (binop X, undef) fold might
actually be different - we need to ensure both operands have the same
demanded undefs for certainty.
Fixes #135917
#134602 demonstrated an issue where an AND node always had at least one demanded UNDEF element in either operand, and incorrectly reported this as an all-undef result, despite the other element being 0 (so it would correctly fold to 0).
This fix only treats a binop splat's element as undefined if both operands are undef.
Fixes #134602
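A minimal sketch of the rule from this fix: an element may only be reported
undef for the binop if it is undef in both operands; taking the union could
claim undef for a lane the other operand forces to a concrete value (e.g. AND
with 0 folds to 0, not undef).
```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

// Sketch: combine per-operand undef-element masks for a binop splat. Only
// lanes undef in BOTH operands may be treated as undef in the result.
static APInt binopSplatUndefElts(const APInt &UndefLHS, const APInt &UndefRHS) {
  return UndefLHS & UndefRHS;
}
```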
KnownBits passed to computeKnownBitsFromRangeMetadata must have the same
bit width as the range metadata bit width. Otherwise the calculated
results will be incorrect.
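A small sketch of the constraint (helper name hypothetical): build the
KnownBits in the metadata's width, then widen or narrow it to the node's width
afterwards.
```cpp
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Metadata.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Sketch: query the range metadata at its own bit width, then adjust the
// result to the width the caller needs. Zero-extension is shown for
// illustration; the right choice depends on how the value is extended.
static KnownBits knownBitsFromRange(const MDNode &Ranges, unsigned MDBits,
                                    unsigned NodeBits) {
  KnownBits Known(MDBits);
  computeKnownBitsFromRangeMetadata(Ranges, Known);
  return Known.zextOrTrunc(NodeBits);
}
```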
---------
Signed-off-by: John Lu <John.Lu@amd.com>
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
If a (two-result) node like `FMODF` or `FFREXP` is expanded to a library
call whose prototype has the form `float(float, float*)` -- that is, it
returns one value directly from the call and another via an output
pointer -- then the first result of the node maps to the value returned
by value and the second result maps to the value returned via the output
pointer.
If only the second result is used after the expansion, we hit an issue
on x87 targets:
```
// Before expansion:
t0, t1 = fmodf x
return t1 // t0 is unused
```
Expanded result:
```
ptr = alloca
ch0 = call modf ptr
t0, ch1 = copy_from_reg, ch0 // t0 unused
t1, ch2 = ldr ptr, ch1
return t1
```
So far things are alright, but the DAGCombiner optimizes this to:
```
ptr = alloca
ch0 = call modf ptr
// copy_from_reg optimized out
t1, ch1 = ldr ptr, ch0
return t1
```
On most targets this is fine. The optimized out `copy_from_reg` is
unused and is a NOP. However, x87 uses a floating-point stack, and if
the `copy_from_reg` is optimized out it won't emit a pop needed to
remove the unused result.
The prior solution for this was to attach the chain from the
`copy_from_reg` to the root. That did work; however, the root is not
always available (it is set to null during type legalization). So the
alternate solution in this patch is to replace the `copy_from_reg` with
an `X86ISD::POP_FROM_X87_REG` within the X86 call lowering. This node is
the same as `copy_from_reg` except this node makes it explicit that it
may lower to an x87 FPU stack pop. Optimizations should be more cautious
when handling this node than a normal CopyFromReg to avoid removing a
required FPU stack pop.
```
ptr = alloca
ch0 = call modf ptr
t0, ch1 = pop_from_x87_reg, ch0 // t0 unused
t1, ch2 = ldr ptr, ch1
return t1
```
Using this node ensures a required x87 FPU pop is not removed due to the
DAGCombiner.
This is an alternate solution for #127976.
The hasOneUse check was failing in any case where the load was part of a chain - we should only be checking if the loaded value has one use, and any updates to the chain should be handled by the fold calling shouldReduceLoadWidth.
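A minimal sketch of the corrected check: only uses of the loaded value (result 0) are counted, not uses of the chain result.
```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Sketch: equivalent to Ld->hasNUsesOfValue(1, 0) -- the chain result may have
// any number of users without blocking the load-width reduction.
static bool loadedValueHasOneUse(LoadSDNode *Ld) {
  return SDValue(Ld, 0).hasOneUse();
}
```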
I've updated the x86 implementation to match, although it has no effect here yet (I'm still looking at how to improve the x86 implementation) as the inner for loop was discarding chain uses anyway.
By using SDValue::hasOneUse instead this patch exposes a missing dependency on the LLVMSelectionDAG library in a lot of tools + unittests, which resulted in having to make SDNode::hasNUsesOfValue inline.
Noticed while fighting the x86 regressions in #122671
Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add a command line
argument (aarch64-enable-partial-reduce-nodes) that indicates whether the
intrinsic experimental_vector_partial_reduce_add will be transformed
into the new ISD node. Lowering with the new ISD nodes will, for now,
always be done as an expand.
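A hedged sketch of how such a node might be created (the PARTIAL_REDUCE_SMLA /
PARTIAL_REDUCE_UMLA opcode names are assumed from the description above, and
the operand order shown is illustrative):
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: accumulate a partial reduction Acc += MulLHS * MulRHS, where the
// multiply operands are wider vectors reduced into the accumulator type.
static SDValue buildPartialReduceMLA(SelectionDAG &DAG, const SDLoc &DL,
                                     EVT AccVT, SDValue Acc, SDValue MulLHS,
                                     SDValue MulRHS, bool IsSigned) {
  unsigned Opc =
      IsSigned ? ISD::PARTIAL_REDUCE_SMLA : ISD::PARTIAL_REDUCE_UMLA;
  return DAG.getNode(Opc, DL, AccVT, Acc, MulLHS, MulRHS);
}
```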
In 5235973ee03aca4148ecabe5eff64da2af1e034e, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.
This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.
For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.
For other targets, ARM and x86 are the only ones that return an FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.
To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.
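A sketch of the approach (helper name hypothetical): let getConstantFP build
the zero for any FP or FP-vector type rather than special-casing f32/f64
element types.
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: produce a zero of the requested VT for a memcpy from a zero
// constant. getConstantFP already handles FP scalars and FP vector types.
static SDValue getZeroForMemset(SelectionDAG &DAG, const SDLoc &DL, EVT VT) {
  if (VT.isFloatingPoint()) // true for FP scalars and FP vectors
    return DAG.getConstantFP(0.0, DL, VT);
  return DAG.getConstant(0, DL, VT);
}
```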
For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.
Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973ee03aca4148ecabe5eff64da2af1e034e to defend ICE on f128, but at
some point it stopped hitting this code.
AArch64TargetLowering::getOptimalMemOpType was updated in
29200611055f49a0d37243caa5f8bba1df9d57a6, so I suspect this is when it
happened, although I haven't verified this. I did find, though, that by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.
For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.
Fixes: https://github.com/llvm/llvm-project/issues/20521 [1]
A BUILD_VECTOR can implicitly shrink the bits of its operands if the
operand types are not legal. For example, a v8i16 constant BUILD_VECTOR
might be represented as v8i16 BUILD_VECTOR(i32 1, i32 2, ...).
Unfortunately this means that the constants are not accepted by
matchUnaryPredicateImpl, preventing, in this case, funnel shifts from
detecting that all the operands are non-zero. Add a flag to help it match.
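A hedged sketch of using the new flag (the AllowTruncation parameter name is
taken from the intent described above and may differ):
```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Sketch: check that every BUILD_VECTOR element is a non-zero constant, even
// when the constants are implicitly truncated to the element type (e.g. a
// v8i16 BUILD_VECTOR with i32 operands).
static bool allElementsNonZero(SDValue V) {
  return ISD::matchUnaryPredicate(
      V, [](ConstantSDNode *C) { return !C->isZero(); },
      /*AllowUndefs=*/false, /*AllowTruncation=*/true);
}
```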
Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.
This should be a NFC change for targets that enable AA during codegen
(such as AArch64), but also give a nice compile-time improvement in some
cases. See:
https://github.com/llvm/llvm-project/pull/123787#issuecomment-2606797041
Note: This follows Nikita's suggestion on #123787.
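A brief sketch of the difference in use (illustrative, not the patch itself):
BatchAAResults wraps an AAResults and caches query results, which is valid as
long as the IR does not change.
```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

// Sketch: create one BatchAAResults and reuse it for many queries, so repeated
// alias checks over the same locations hit the cache.
static bool anyPairMayAlias(AAResults &AA, ArrayRef<MemoryLocation> Locs) {
  BatchAAResults BatchAA(AA);
  for (size_t I = 0; I < Locs.size(); ++I)
    for (size_t J = I + 1; J < Locs.size(); ++J)
      if (!BatchAA.isNoAlias(Locs[I], Locs[J]))
        return true;
  return false;
}
```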
This pattern was originally spotted in 429.mcf by @topperc.
We already have a DAGCombiner pattern to turn `(neg (abs x))` into `(min
x, (neg x))`. But in some cases `(neg (max x, (neg x)))` is formed by an
expanded `abs` followed by a `neg` that is generated only after the
`abs` expansion. This patch adds a separate pattern to match cases like
this, as well as its inverse pattern: `(neg (min X, (neg X))) --> (max
X, (neg X))`.
This pattern is applicable to both signed and unsigned min/max.
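A minimal sketch of the signed form of the rewrite (helper name hypothetical;
the unsigned form uses UMIN/UMAX analogously):
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: (sub 0, (smax X, (sub 0, X)))  -->  (smin X, (sub 0, X)), and the
// inverse (sub 0, (smin X, (sub 0, X)))  -->  (smax X, (sub 0, X)).
static SDValue buildNegAbsAsMin(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                                SDValue X) {
  SDValue NegX = DAG.getNegative(X, DL, VT);
  return DAG.getNode(ISD::SMIN, DL, VT, X, NegX);
}
```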
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
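A short sketch of the two loop styles this enables:
```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Sketch: uses() now yields SDUse& (the operand edges), while users() yields
// SDNode* (the nodes using N).
static unsigned countUsesOfResult(SDNode *N, unsigned ResNo) {
  unsigned Count = 0;
  for (const SDUse &U : N->uses())
    if (U.getResNo() == ResNo)
      ++Count;
  return Count;
}

static bool isOnlyUsedByCopyToReg(SDNode *N) {
  for (const SDNode *User : N->users())
    if (User->getOpcode() != ISD::CopyToReg)
      return false;
  return true;
}
```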
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.
I've long been annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.