This allows truncated splats / build vectors in isBoolConstant, so that
certain NOT instructions can be recognized post-legalization and vselect
can be optimized.
An override for x86 AVX512 predicated vectors is required to avoid
infinite recursion from the code that detects zero vectors. From:
```
// Check if the first operand is all zeros and Cond type is vXi1.
// If this an avx512 target we can improve the use of zero masking by
// swapping the operands and inverting the condition.
```
Add an LLVMContext parameter to getOptimalMemOpType and
findOptimalMemOpLowering, so that we can use EVT::getVectorVT to build
vector EVTs in getOptimalMemOpType.
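As a rough sketch, a target override could now construct a vector EVT
directly (the signature below is an assumption based on this
description, not the exact upstream one, and MyTargetLowering is
hypothetical):
```
// Hypothetical target override; assumes the updated signature that now
// receives an LLVMContext (parameter order and names are illustrative).
EVT MyTargetLowering::getOptimalMemOpType(
    LLVMContext &Context, const MemOp &Op,
    const AttributeList &FuncAttributes) const {
  // With the context in hand we can build arbitrary vector EVTs,
  // e.g. v16i8 for operations of 16 bytes or more.
  if (Op.size() >= 16)
    return EVT::getVectorVT(Context, MVT::i8, 16);
  return MVT::Other;
}
```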
Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
DAGCombiner can already constant fold build vectors of constants/undefs
to a new vector type, but it has to be incredibly careful after
legalization to not affect a target's canonicalized constants.
This patch proposes we move the implementation inside SelectionDAG to
make it easier for targets to manually use the constant folding whenever
they deem it safe to do so.
I've also altered the method to take the BuildVectorSDNode input
directly and consistently use the same SDLoc.
This reverts commit 8ac7210b7f0ad49ae7809bf6a9faf2f7433384b0.
This breaks building the AArch64 backend, e.g. see
https://github.com/llvm/llvm-project/pull/144947
Revert to unbreak the build.
Also reverts the follow-up commit 1e76f012db3ccfaa05e238812e572b5b6d12c17e.
If a kernel is known to be executing only a single lane, IR
UniformityAnalysis will take note of that (via
GCNTTIImpl::hasBranchDivergence) and report that all values are uniform.
SelectionDAG's built-in divergence tracking should do the same.
CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF nodes can only create poison if the source value is zero, so check with isKnownNeverZero.
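A minimal sketch of the corresponding case in
SelectionDAG::canCreateUndefOrPoison (exact placement and surrounding
code assumed):
```
case ISD::CTTZ_ZERO_UNDEF:
case ISD::CTLZ_ZERO_UNDEF:
  // Only a zero input yields poison, so a known-never-zero input
  // cannot create poison.
  return !isKnownNeverZero(Op.getOperand(0), Depth + 1);
```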
Pulled out of #146361 and reapplied now that #146490 has landed.
This keeps getting forgotten (e.g. #66603), so make a point of adding
it here to make it clear, instead of relying on the implicit default of
returning true.
Neither the arithmetic value nor the overflow result can create undef/poison from regular operand values.
We have complete test coverage for all ADDO/SUBO nodes. 32-bit codegen handles the _CARRY variants, but until #145939 lands and DAGCombiner::visitFREEZE handles multiple results, we won't see any codegen change.
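A sketch of the kind of cases this adds (node list and placement
assumed):
```
case ISD::SADDO:
case ISD::UADDO:
case ISD::SSUBO:
case ISD::USUBO:
  // Neither the arithmetic result nor the overflow bit can introduce
  // undef/poison from well-defined operands.
  return false;
```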
Pulled out of #145939.
When CSEing a load with an existing load with different range
metadata, clear the range metadata on the existing
load.
This is conservative; alternatively we could calculate new range
metadata using MDNode::getMostGenericRange, but without a test case I
wasn't sure it was worth it.
MDNode::getMostGenericRange takes a non-const MDNode*, but all of
SelectionDAG uses const MDNode*. A const_cast will need to be used
somewhere, or we need to make the codebase consistent about whether
MDNode pointers should be const or not.
I'm sure this isn't the only place that needs to be updated to handle
the CSE.
Fixes #145363.
Follow-up to #143760, which handled ISD::VSELECT.
I've moved ISD::SELECT/VSELECT under the "No poison except from flags
(which is handled above)" subgroup to try to remind people that these
can have poison-generating FMFs (NINF/NNAN), even though this hasn't
been well explained anywhere I can find :(
Helps with regressions from #145939.
Instead of reporting ___memmove as an implementation of memcpy,
make it unavailable and let the lowering logic consider memmove as
a fallback path.
This avoids a special-case 1:N mapping for libcall implementations.
If X is known non-negative, that's still true if we fold the truncate
to create a smaller zext.
In the i128 tests, SelectionDAGBuilder aggressively truncates the
`zext nneg` to i64 to match `getShiftAmountTy`. If we don't preserve
the `nneg`, we can't see that the shift amount argument being `signext`
means we don't need to do any extension.
We have recently added the partial_reduce_smla and partial_reduce_umla
nodes to represent Acc += ext(a) * ext(b), where the two extends have
to have the same source type and the same extend kind.
For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which
correspond to the existing nodes, but we also have vqdotsu, which
covers the case where the two extends are sign and zero respectively
(i.e. not the same kind of extend).
This patch adds a partial_reduce_sumla node which has sign extension for
A, and zero extension for B. The addition is somewhat mechanical.
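As an illustrative scalar model (not the DAG node itself), the new
semantics pair a sign-extended operand with a zero-extended one:
```
#include <cstdint>

// Scalar model of partial_reduce_sumla: Acc += sext(a[i]) * zext(b[i]).
// int8_t widens with sign extension and uint8_t with zero extension,
// matching the vqdotsu pairing described above.
int32_t partialReduceSUMLA(int32_t Acc, const int8_t *A,
                           const uint8_t *B, int N) {
  for (int I = 0; I < N; ++I)
    Acc += static_cast<int32_t>(A[I]) * static_cast<int32_t>(B[I]);
  return Acc;
}
```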
This reverts commit 58cc1675ec7b4aa5bc2dab56180cb7af1b23ade5.
I also made the incorrect assumption here that we know both values are
+/-0.0. Revert for now.
FMAXIMUM is currently legalized via IS_FPCLASS for the signed-zero
handling. This is problematic because it assumes the equivalent integer
type is legal. Many targets have legal fp128 but illegal i128, so this
results in legalization failures.
Fix this by replacing IS_FPCLASS with a check of the bitcast to integer
instead. In that case it is sufficient to use any legal integer type, as
we're only interested in the sign bit. This can be obtained via a stack
temporary cast. There is existing FloatSignAsInt functionality, used for
legalization of FABS and similar, that we can reuse for this purpose.
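As a standalone illustration of the idea (not the legalizer code
itself): distinguishing -0.0 from +0.0 only needs the sign bit of the
value's integer representation, so any wide-enough integer type will do.
```
#include <bit>
#include <cstdint>

// C++20 sketch: the values compare equal as floats, so break the
// 0.0 vs -0.0 tie by inspecting the sign bit of the raw bits.
bool isNegativeZero(double D) {
  return D == 0.0 && (std::bit_cast<uint64_t>(D) >> 63);
}
```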
Fixes https://github.com/llvm/llvm-project/issues/139380.
Fixes https://github.com/llvm/llvm-project/issues/139381.
Fixes https://github.com/llvm/llvm-project/issues/140445.
This opcode represents the addition of a pointer value (first operand)
and an integer offset (second operand). PTRADD nodes are only generated
if the TargetMachine opts in by overriding
TargetMachine::shouldPreservePtrArith().
The PTRADD node and respective visitPTRADD() function were adapted by
@rgwott from the CHERI/Morello LLVM tree.
Original authors: @davidchisnall, @jrtc27, @arichardson.
The changes in this PR were extracted from PR #105669.
---------
Co-authored-by: David Chisnall <github@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
Co-authored-by: Alexander Richardson <alexrichardson@google.com>
Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
The load not being included in the chain meant that it could materialize
after a `@llvm.lifetime.end` annotation on the pointer. This could
result in miscompiles if the stack slot is reused for another value.
Fixes https://github.com/llvm/llvm-project/issues/140491
Added APInt::clearBits(unsigned loBit, unsigned hiBit) that clears bits within a certain range.
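A minimal usage sketch, assuming the same half-open [loBit, hiBit)
convention as the existing APInt::setBits:
```
#include "llvm/ADT/APInt.h"
using namespace llvm;

APInt demo() {
  APInt Mask = APInt::getAllOnes(32); // 0xFFFFFFFF
  Mask.clearBits(8, 16);              // clears bits 8..15 -> 0xFFFF00FF
  return Mask;
}
```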
Fixes #136550
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
It is used to mark a value that we are sure is not of some fcType, so
that subsequent operations can make assumptions. Examples include:
* a function argument marked with nofpclass
* the output value of an intrinsic that is known not to be of some type
As with the recently added subvector variants, provide the unsigned
index operand to simplify a bunch of code.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
Mechanical change to introduce the new wrappers, and add enough users to
make the usage pattern clear. Once this lands, I'm going to do a further
pass to adjust more callsites as separate changes.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
In some cases of chained `sext` operators the debug metadata can be
missed. This patch propagates the proper metadata to the resulting node.
A particular case of the issue is NVPTX codegen for a function with a
bool local variable:
```
void test(int i) {
  bool xyz = i == 0;
  foo(i);
}
```
---------
Signed-off-by: Alexander Peskov <apeskov@nvidia.com>
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.
These mirror the `llvm.maximum.*` and `llvm.minimum.*` intrinsics, but
are atomic and use IEEE 754-2019 handling for NaNs, which is different
to `fmax` and `fmin`. See
https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.
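As a scalar model of that NaN and signed-zero behaviour (illustrative
only; the function name is ours):
```
#include <cmath>
#include <limits>

// Model of IEEE 754-2019 maximum: NaN propagates (unlike fmax, which
// ignores a NaN operand), and +0.0 is treated as greater than -0.0.
double ieeeMaximum(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<double>::quiet_NaN();
  if (A == 0.0 && B == 0.0)
    return std::signbit(A) ? B : A;
  return A > B ? A : B;
}
```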
Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
And teach SelectionDAGBuilder to get the range metadata in
visitAtomicLoad.
This allows us to recognize that sign extending a byte load of a
boolean value from memory will produce zeros for the extended bits.
This allows us to remove an AND on RISC-V.
Tests copied from #136502 with range metadata added to i1 cases.
Some of the test effects overlap with #136502, but that patch can't
handle the acquire or seq_cst cases with the Zalasr extension. We
only have sign extending versions of those loads.
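A scalar model of why the fold is sound (illustrative only): for a
value known to be 0 or 1, sign extension already produces zeros in the
high bits, so the masking AND is redundant.
```
#include <cstdint>

// V is known to be 0 or 1 (e.g. from !range [0, 2) metadata).
int64_t extendBool(int8_t V) {
  int64_t S = V;  // sign extension; high bits are zero when V is 0 or 1
  return S & 1;   // this AND is redundant and can be removed
}
```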
Rename one signature of getAtomic to getAtomicLoad and pass LoadExtType.
Previously we had to set the extension type after the node was created,
but we don't usually modify SDNodes once they are created. It's possible
the node already existed and has been CSEd. If that happens, modifying
the node may affect the other users. It's therefore safer to add the
extension type at creation so that it is part of the CSE information.
I don't know of any failures related to the current implementation. I
only noticed that it doesn't match how we usually do things.