Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.
This should be a NFC change for targets that enable AA during codegen
(such as AArch64), but also give a nice compile-time improvement in some
cases. See:
https://github.com/llvm/llvm-project/pull/123787#issuecomment-2606797041
Note: This follows Nikita's suggestion on #123787.
This pattern was originally spotted in 429.mcf by @topperc.
We already have a DAGCombiner pattern to turn `(neg (abs x))` into `(min
x, (neg x))`. But in some cases `(neg (max x, (neg x)))` is formed by an
expanded `abs` followed by a `neg` that is generated only after the
`abs` expansion. This patch adds a separate pattern to match cases like
this, as well as its inverse pattern: `(neg (min X, (neg X))) --> (max
X, (neg X))`.
This pattern is applicable to both signed and unsigned min/max.
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.
I've long beeen annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
This adds a new helper `canFoldStoreIntoLibCallOutputPointers()` to
check that it is safe to fold a store into a node that will expand to a
library call that takes output pointers. This requires checking for two
(independent) properties:
1. The store is not within a CALLSEQ_START..CALLSEQ_END pair
* If it is, the expansion would lead to nested call sequences (which is
invalid)
2. The node does not appear as a predecessor to the store
* If it does, attempting to merge the store into the call would result
in a cycle in the DAG
These two properties are checked as part of the same traversal in
`canFoldStoreIntoLibCallOutputPointers()`
EVTs potentially contain a Type * that points into memory owned by an
LLVMContext. Storing them in a function scoped static means they may
outlive the LLVMContext they point to.
This std::set is used to unique single element VT lists containing a
single extended EVT. Single element VT list with a simple EVT are
uniqued by a separate cache indexed by the MVT::SimpleValueType enum. VT
lists with more than one element are uniqued by a FoldingSet owned by
the SelectionDAG object.
This patch moves the single element cache into SelectionDAG so that it
will be destroyed when SelectionDAG is destroyed.
Fixes#88233
Assert that the passed value is a valid unsigned integer value for the
specified type.
For signed values getSignedConstant() / getSignedTargetConstant() should
be used instead.
Fix all the places I could find that did't do this. We were already
mostly correct for FP_ROUND after
9a976f36615dbe15e76c12b22f711b2e597a8e51, but not STRICT_FP_ROUND.
When the chain is not the entry node there is a risk the stores are
within a (CALLSEQ_START, CALLSEQ_END), which when the node is expanded
will lead to nested call sequences.
It should be possible to check for this and allow more cases, but for
now, let's limit this to cases where it's definitely safe.
Fixes#115323
We already support computing known bits for extending loads, but not for
masked loads. For now I've only added support for zero-extends because
that's the only thing currently tested. Even when the passthru value is
poison we still know the top X bits are zero.
This merges the logic for expanding both FFREXP and FSINCOS into one
method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication
and also allows FFREXP to benefit from the stack slot elimination
implemented for FSINCOS. This method will also be used in future to
implement more multiple-result intrinsics (such as modf and sincospi).
This shares most of its code with the scalar sincos expansion. It allows
expanding vector FSINCOS nodes to a library call from the specified
`-vector-library`. The upside of this is it will mean the vectorizer
only needs to handle the sincos intrinsic, which has no memory effects,
and this can handle lowering the intrinsic to a call that takes output
pointers.
This adds the `llvm.sincos` intrinsic, legalization, and lowering.
The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
In SelectionDAG, `TargetTransformInfo::hasBranchDivergence()` can be
called when both `NDEBUG` and `LLVM_ENABLE_ABI_BREAKING_CHECKS` are
enabled. In that case, the class member `TTI` is still initialized to
`nullptr`, causing a segfault.
Fix this by ensuring that all the calls to `hasBranchDivergence` and
`VerifyDAGDivergence` only occur when `NDEBUG` is disabled, and
`LLVM_ENABLE_ABI_BREAKING_CHECKS` is enabled.
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.
Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.
X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type
Minor cleanup that helps with #107423
Reapplied after regression fix ba1255def64a9c3c68d97ace051eec76f546eeb0
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.
Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.
X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type
Minor cleanup that helps with #107423
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.
The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.
Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
Based on example PR #96222 and fix PR #101268, with some differences due
to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp).
- Add llvm.experimental.constrained.atan2 - Intrinsics.td,
ConstrainedOps.def, LangRef.rst
- Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in
BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp
- Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp,
and LegalizeVectorTypes.cpp
- Update isKnownNeverNaN in SelectionDAG.cpp
- Update SelectionDAGDumper.cpp
- Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp
- TargetLoweringBase.cpp - Expand for vectors, promote f16
- X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC
Part 4 for Implement the atan2 HLSL Function #70096.
This macros is always defined: either 0 or 1. The correct pattern is to
use #if.
Re-apply #110185 with more fixes for debug build with the ABI breaking
checks disabled.
…ead of #ifdef (#110883)"
This reverts commit 1905cdbf4ef15565504036c52725cb0622ee64ef, which
causes lots of failures where LLVM doesn't have the right header guards.
The errors can be seen on
[BuildKite](https://buildkite.com/llvm-project/upstream-bazel/builds/112362#01924eae-231c-4d06-ba87-2c538cf40e04),
where the source uses `#ifndef NDEBUG`, but the content in question is
defined when `LLVM_ENABLE_ABI_BREAKING_CHECKS == 1`.
For example, `llvm/include/llvm/Support/GenericDomTreeConstruction.h`
has the following:
```cpp
// Helper struct used during edge insertions.
struct InsertionInfo {
// ...
#ifdef LLVM_ENABLE_ABI_BREAKING_CHECKS
SmallVector<TreeNodePtr, 8> VisitedUnaffected;
#endif
};
// ...
InsertionInfo II;
// ...
#ifndef NDEBUG
II.VisitedUnaffected.push_back(SuccTN);
#endif
```
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
When we build a node with illegal type which has a user, it's possible
that it can end up being processed by the DAG combiner later before it's
removed, which can trigger an assert expecting the types to be legalized
already.
This patch introduces lowering of the partial add reduction intrinsic to
a udot or svdot for AArch64. This also involves adding a
`shouldExpandPartialReductionIntrinsic` target hook, which AArch64 will
return false from in the cases that it can be lowered.
When getting a neutral value, we can prefer using a positive zero over a
negative zero if nsz is set on the FADD (or reduction). A positive zero
should be cheaper to materialize on basically all targets.
Arguably, we should be doing this kind of canonicalization in
DAGCombine, but we don't do that for any of the other reduction
variants, so this seems like path of least resistance. This does mean
that we can only do this for "fast" reductions. Just nsz isn't enough,
as that goes through the SEQ_FADD path where the IR level start value
isn't folded away.
If folks think this is to RISCV specific, let me know. There's a trivial
RISCV specific implementation. I went with the generic one as I through
this might benefit other targets.
These lists are quite static and several of the parameters are actually
constant across all users. Heavy use of macros is undesirable, and not
idiomatic in LLVM, so let's just use the naive switch cases.
I'll probably continue with removing the other property macros. These
two just happened to be the two I actually had to figure out for an
unrelated change.
Currently, `getStackAlignment` asserts if the stack alignment wasn't
specified. This makes it inconvenient to use and complicates testing.
This change also makes `exceedsNaturalStackAlignment` method redundant.
This allows the use a single wider operation with a restricted EVL
instead of having to split and cover via decreasing powers-of-two sizes.
On RISCV, this avoids the need for a bunch of vslidedown and vslideup
instructions to extract subvectors, and VL toggles to switch between the
various widths.
Note there is a potential downside of using vp nodes; we loose any
generic DAG combines which might have applied to the split form.
PR #80309 proposes to have users of APInt's uint64_t
constructor opt-in to implicit truncation. Currently, that patch
requires SelectionDAG::getConstant to opt-in.
This patch adds getSignedConstant so we can start fixing some of the
cases that require implicit truncation.
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.
This patch introduces support only support for scalar values. The
support of
vector (vp, vp.reduce, vector.reduce),
experimental.constrained
will be added in future patches.
With this patch, MIPSr6 and LoongArch can work out of box with
fcanonical and fmax/fmin.
Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.
The FMIN/FMAX of RISC-V instructions follows the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in
future patch.
Background
https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platform for NUM vs sNaN:
1) Fallback to fmin(3)/fmax(3): return qNaN.
2) ARM64/ARM32+Neon: same as libc.
3) MIPSr6/LoongArch/RISC-V: return NUM.
And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will submit as separated patches.