When we build a node with illegal type which has a user, it's possible
that it can end up being processed by the DAG combiner later before it's
removed, which can trigger an assert expecting the types to be legalized
already.
This patch introduces lowering of the partial add reduction intrinsic to
a udot or svdot for AArch64. This also involves adding a
`shouldExpandPartialReductionIntrinsic` target hook, which AArch64 will
return false from in the cases that it can be lowered.
When getting a neutral value, we can prefer using a positive zero over a
negative zero if nsz is set on the FADD (or reduction). A positive zero
should be cheaper to materialize on basically all targets.
Arguably, we should be doing this kind of canonicalization in
DAGCombine, but we don't do that for any of the other reduction
variants, so this seems like path of least resistance. This does mean
that we can only do this for "fast" reductions. Just nsz isn't enough,
as that goes through the SEQ_FADD path where the IR level start value
isn't folded away.
If folks think this is to RISCV specific, let me know. There's a trivial
RISCV specific implementation. I went with the generic one as I through
this might benefit other targets.
These lists are quite static and several of the parameters are actually
constant across all users. Heavy use of macros is undesirable, and not
idiomatic in LLVM, so let's just use the naive switch cases.
I'll probably continue with removing the other property macros. These
two just happened to be the two I actually had to figure out for an
unrelated change.
Currently, `getStackAlignment` asserts if the stack alignment wasn't
specified. This makes it inconvenient to use and complicates testing.
This change also makes `exceedsNaturalStackAlignment` method redundant.
This allows the use a single wider operation with a restricted EVL
instead of having to split and cover via decreasing powers-of-two sizes.
On RISCV, this avoids the need for a bunch of vslidedown and vslideup
instructions to extract subvectors, and VL toggles to switch between the
various widths.
Note there is a potential downside of using vp nodes; we loose any
generic DAG combines which might have applied to the split form.
PR #80309 proposes to have users of APInt's uint64_t
constructor opt-in to implicit truncation. Currently, that patch
requires SelectionDAG::getConstant to opt-in.
This patch adds getSignedConstant so we can start fixing some of the
cases that require implicit truncation.
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.
This patch introduces support only support for scalar values. The
support of
vector (vp, vp.reduce, vector.reduce),
experimental.constrained
will be added in future patches.
With this patch, MIPSr6 and LoongArch can work out of box with
fcanonical and fmax/fmin.
Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.
The FMIN/FMAX of RISC-V instructions follows the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in
future patch.
Background
https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platform for NUM vs sNaN:
1) Fallback to fmin(3)/fmax(3): return qNaN.
2) ARM64/ARM32+Neon: same as libc.
3) MIPSr6/LoongArch/RISC-V: return NUM.
And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will submit as separated patches.
The following functions have an early exit for scalable vectors:
SelectionDAG::canCreateUndefOrPoison
SelectionDAG:isGuaranteedNotToBeUndefOrPoison
The implementations of these don't look to be sensitive to the
vector type other than some uses of demanded elts analysis that
doesn't fully support scalable types. That said the initial
calculation demands all elements and so I've followed the same
scheme as used by TargetLowering::SimplifyDemandedBits.
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.
abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.
abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
This addresses a TODO where we can fold a vselect to it's true operand
if the boolean is known to be all trues, by factoring out the logic from
extractBooleanFlip which checks TLI.getBooleanContents.
This fixes DAG divergence mishandling inline asm.
This was considering the glue nodes for divergence, when the
divergence should only come from the individual CopyFromRegs
that are glued. As a result, having any VGPR CopyFromRegs would
taint any uniform SGPR copies as divergent, resulting in SGPR
copies to VGPR virtual registers later.
This patch ports the commit a6614ec5b7c1dbfc4b847884c5de780cf75e8e9c to
SelectionDAG TypeLegalization.
Fixes#98044
Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
Handle cases where the subo_carry is subtracting the same operand (=zero) - so only the subtraction of the 0/1 carry bit is affecting the result, giving a 0/-1 allsignbits value.
Noticed while improving ABDS/ABDU expansion.
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.
I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.
The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many target as it
avoids branches.
In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.
See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up
miscompiling.
This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.
rdar://131419786
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevent internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts-out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.
This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but does not want the overhead of
uncalled functions. (This adds like a second to the link time trivially)
This patch fixes a problem that existed before where in some situations
a `UCMP`/`SCMP` node which operated on 1-element vectors had a legal
result type (i.e. `v1i64` on AArch64), but illegal operands (i.e.
`v1i65`). This meant that operand scalarization was performed on the
node and the operands were changed to a legal scalar type, but the
result wasn't. This then led to `UCMP`/`SCMP` nodes with different
vector-ness of operands and result appearing in the SDAG. This patch
addresses this issue by fully scalarizing the `UCMP`/`SCMP` node and
then turning its result back into a 1-element vector using a
`SCALAR_TO_VECTOR` node.
It also adds several assertions to `SelectionDAG::getNode()` to avoid
this or a similar issue arising in the future. I wasn't sure if these
two changes are unrelated enough to warrant two small separate PRs, but
I'm happy to split this PR into two if that's deemed more appropriate.
As pointed out by @preames, ConstantRange can wrap so it's possible
for zero to be in a range without zero being the minimum. This fixes
this by checking contains instead.
Add simple support for looking through ZEXT/ANYEXT/SEXT when doing
ComputeKnownSignBits for SHL. This is valid for the case when all
extended bits are shifted out, because then the number of sign bits
can be found by analysing the EXT operand.
A future improvement could be to pass along the "shifted left by"
information in the recursive calls to ComputeKnownSignBits. Allowing
us to handle this more generically.
VSCALE is by definition greater than zero, but this checks it via
getVScaleRange anyway.
The motivation for this is to be able to check if the EVL for a VP
strided load is non-zero in #97394.
I added the tests to the RISC-V backend since the existing X86
known-never-zero.ll test crashed when trying to lower vscale for the
+sse2 RUN line.
#97645 proposed to remove LegalTypes from getShiftAmountTy. This patches
removes it from getShiftAmountConstant which is one of the callers of
getShiftAmountTy.
This patch fixes a miscompilation when `N` gets CSEed to `Existing`:
```
Existing: t5: i32 = sub nuw Constant:i32<0>, t3
N: t30: i32 = sub Constant:i32<0>, t3
```
Fixes https://github.com/llvm/llvm-project/issues/96366.
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
`SelectionDAG::getBitcastedAnyExtOrTrunc` assumes that there is always a
valid
integer type corresponding to another type, which is not always true
when it
comes to vector type. For example, `<3 x i8>` doesn't have a
corresponding
integer type.
Fix SWDEV-464698.
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:
When we compare sNaN vs NUM:
ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
+0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.
So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.
Half-fix: #93033
We currently only constant fold binop(bitcast(c1),bitcast(c2)) if c1 and c2 are both bitcasted and from the same type.
This patch relaxes this assumption to allow the constant build vector to originate from different types (and allow cases where only one operand was bitcasted).
We still ensure we bitcast back to one of the original types if both operand were bitcasted (we assume that if we have a non-bitcasted constant then its legal to keep using that type).