For the purpose of verifying proper argument extension per the target's ABI,
introduce the NoExt attribute that a target may use when neither sign nor
zero extension is required (e.g. with a struct in a register). The purpose of
doing so is to be able to verify that exactly one of these attributes is
always present, thereby detecting cases where sign/zero extension is actually
missing.
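A minimal IR sketch of the intended invariant (assuming the IR spelling
`noext`; the callee is illustrative):
```
; Every integer argument narrower than a GPR carries exactly one of the
; three extension attributes, so a missing extension becomes detectable.
declare void @callee(i8 signext, i8 zeroext, i8 noext)
```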
As a first step, this patch enables the verification for the SystemZ
backend only, but leaves it off by default until all known issues have been
addressed.
Other targets/front-ends can now also add the NoExt attribute where needed
and perform this check in the backend.
Unlike the scalar case, where AArch64 prefers expanding scmp/ucmp with selects,
under Neon we can use the arithmetic expansion to generate fewer
instructions. Notably it also prevents the scalarization of vselect
during vector-legalization.
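Roughly, the arithmetic expansion for vectors looks like this in IR (a
sketch; vector compares produce 0/-1 masks, which supply the +1/-1 values
directly):
```
%lt  = icmp slt <4 x i32> %x, %y
%gt  = icmp sgt <4 x i32> %x, %y
%ltm = sext <4 x i1> %lt to <4 x i32>   ; -1 where x < y
%gtm = sext <4 x i1> %gt to <4 x i32>   ; -1 where x > y
%r   = sub <4 x i32> %ltm, %gtm         ; -1 / 0 / +1 per lane
```
No per-lane selects remain, so there is nothing for vselect scalarization
to break apart.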
This is a follow-up to #92289 that adds lowering of the new
`@llvm.experimental.vector.compress` intrinsic on x86 with AVX512
instructions. This intrinsic maps directly to `vpcompress`.
GCC supports code like "asm volatile ("" : "=r" (i) : "0" (f))", where i
has integer type and f has floating-point type. Currently this code
produces an error with Clang. This change allows mixed scalar types
between input and output constraints.
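The same tied constraint can be written directly in LLVM IR (a sketch; the
empty asm template just moves the value):
```
define i32 @bitmove(float %f) {
  ; "=r,0" ties the float input to the i32 output register
  %i = call i32 asm sideeffect "", "=r,0"(float %f)
  ret i32 %i
}
```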
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
PR #80309 proposes to have users of APInt's uint64_t
constructor opt in to implicit truncation. Currently, that patch
requires SelectionDAG::getConstant to opt in.
This patch adds getSignedConstant so we can start fixing some of the
cases that require implicit truncation.
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.
This patch introduces support only for scalar values. Support for the
vector variants (vp, vp.reduce, vector.reduce) and the
experimental.constrained variants will be added in future patches.
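A scalar sketch of the new intrinsics (the clamp function is illustrative):
```
declare float @llvm.minimumnum.f32(float, float)
declare float @llvm.maximumnum.f32(float, float)

; IEEE754-2019 minimumNumber/maximumNumber return the NUM operand even
; when the other operand is an sNaN, so the clamp is NaN-safe.
define float @clamp01(float %x) {
  %lo = call float @llvm.maximumnum.f32(float %x, float 0.0)
  %hi = call float @llvm.minimumnum.f32(float %lo, float 1.0)
  ret float %hi
}
```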
With this patch, MIPSr6 and LoongArch work out of the box with
fcanonical and fmax/fmin.
AArch64/PowerPC64 can use the same logic as MIPSr6 and LoongArch, but
they have no fcanonical support yet; I will add it in future patches.
RISC-V's FMIN/FMAX instructions follow the
minimumNumber/maximumNumber of IEEE754-2019, so we can simply add support
in a future patch.
Background
https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platforms for NUM vs sNaN:
1) Fallback to fmin(3)/fmax(3): return qNaN.
2) ARM64/ARM32+Neon: same as libc.
3) MIPSr6/LoongArch/RISC-V: return NUM.
The fix to make fminnum/fmaxnum follow minNUM/maxNUM of IEEE754-2008
will be submitted as separate patches.
This fold applies only if the upper bits of the shr aren't demanded.
This helps with cases where the outer srl was originally an sra and was
converted to a srl by SimplifyDemandedBits before it had a chance to
combine with the inner sra. This can occur when the inner sra was part
of a sign_extend_inreg expansion.
There are some regressions in ARM and Thumb2.
In TargetLowering::expandFixedPointMul, when expanding fixed-point
multiplication using a widened MUL as the lowering strategy, there was a
bug resulting in assertion failures like this:
Assertion `VT.isVector() == N1.getValueType().isVector() &&
"SIGN_EXTEND result type type should be vector iff the operand "
"type is vector!"' failed.
The problem was that we did not consider that VT could be a vector type
when setting up the WideVT. This patch fixes that bug.
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.
abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
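In IR terms the expansion is roughly (a sketch; the sign-extended overflow
bit stands in for the usub_overflow mask above, and the names are
illustrative):
```
define i32 @abdu_expanded(i32 %a, i32 %b) {
  %d  = sub i32 %a, %b
  %o  = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
  %ov = extractvalue { i32, i1 } %o, 1
  %m  = sext i1 %ov to i32   ; all-ones iff %a u< %b
  %x  = xor i32 %d, %m       ; conditionally complement the difference...
  %r  = sub i32 %x, %m       ; ...and add one, i.e. negate: |a - b|
  ret i32 %r
}
```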
REAPPLIED: Fix regression with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
For a __thread variable x, when emulated TLS is enabled and there is an
access to x, the compiler first looks up the symbol __emutls_v.x within
the module. However, an issue arises with an alias y of x: the compiler
still tries to look up __emutls_v.y instead of __emutls_v.x. As a
result, the lookup returns a nullptr, causing the compiler to crash. The
purpose of this MR (Merge Request) is to ensure that in emulated TLS,
before checking __emutls_v.y, the compiler first identifies which global
value y is an alias of.
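A minimal reproducer sketch: the load through @y must resolve to
__emutls_v.x, not a non-existent __emutls_v.y:
```
@x = thread_local global i32 0
@y = thread_local alias i32, ptr @x

define i32 @get() {
  %v = load i32, ptr @y   ; with emulated TLS this goes via __emutls_v.x
  ret i32 %v
}
```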
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.
abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
This produces better code on x86_64, but only in the unordered case. I'm
not sure what the exact condition should be to avoid the regression; a
free fabs might do it, or it may require legality checks for the
alternative integer expansion.
Have a simpler lowering for exact udivs in both SelectionDAG and
GlobalISel.
The algorithm is the same for unsigned exact divs as for signed divs,
save for using a logical rather than arithmetic shift for even divisors,
according to Hacker's Delight, 2nd Edition, page 242.
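For example, a divide that is known exact by 6 (a sketch; the constants
are for 32 bits):
```
define i32 @div6(i32 %x) {
  %q = udiv exact i32 %x, 6
  ret i32 %q
}
; lowers to the equivalent of:
;   %s = lshr exact i32 %x, 1       ; divide out the even factor
;   %q = mul i32 %s, -1431655765    ; 0xAAAAAAAB == 3^-1 (mod 2^32)
; the signed form uses ashr in place of lshr.
```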
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask: it moves
all selected values (i.e., where `mask[i] == 1`) to consecutive lanes in
the result vector. A `passthru` vector can be provided, from which the
remaining lanes are filled.
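For example (a sketch with a constant mask):
```
; Lanes 0 and 2 are selected, so they move to result lanes 0 and 1;
; result lanes 2 and 3 are filled from %passthru.
%r = call <4 x i32> @llvm.experimental.vector.compress.v4i32(
         <4 x i32> %v,
         <4 x i1> <i1 true, i1 false, i1 true, i1 false>,
         <4 x i32> %passthru)
```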
The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers, so an additional store is needed to write the values to
memory. But this combination is likely significantly faster on many
targets, as it avoids branches.
In follow-up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISC-V. I also want to combine this with a store
instruction, as that is probably a common case and would avoid some
memory writes.
See the [discussion in the
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for the initial design discussion.
The previous expansion of [US]CMP was done using two selects and two
compares. It produced decent code, but on many platforms it is better to
implement [US]CMP nodes by performing the following operation:
```
[us]cmp(x, y) = (x [us]> y) - (x [us]< y)
```
This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to keep using the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the select-based approach, but other targets may adopt it as well if it provides better codegen.
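For contrast, a sketch of the select-based form that the hook preserves
(names are illustrative):
```
define i8 @scmp_selects(i32 %x, i32 %y) {
  %lt = icmp slt i32 %x, %y
  %gt = icmp sgt i32 %x, %y
  %t  = select i1 %gt, i8 1, i8 0    ; +1 if x > y, else 0
  %r  = select i1 %lt, i8 -1, i8 %t  ; -1 if x < y
  ret i8 %r
}
```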
When looking for the largest legal integer type for a target,
`TargetLowering::findOptimalMemOpLowering` assumes that `MVT::i64` is
the largest possible integer type. The patch removes this assumption and
uses `MVT::LAST_INTEGER_VALUETYPE` instead.
#97645 proposed to remove LegalTypes from getShiftAmountTy. This patch
removes it from getShiftAmountConstant, which is one of the callers of
getShiftAmountTy.
As far as I can tell, this pull request was not approved and
did not go through an RFC on Discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, the behavior of llvm.minnum differs across platforms if one
operand is sNaN. When we compare sNaN vs NUM:
1) ARM/AArch64/PowerPC: follow IEEE754-2008's minNUM: return qNaN.
2) RISC-V/Hexagon: follow IEEE754-2019's minimumNumber: return NUM.
3) X86: returns NUM, but does not match IEEE754-2019's minimumNumber,
as +0.0 is not always treated as greater than -0.0.
4) MIPS/LoongArch/Generic: return NUM.
5) LIBCALL: returns qNaN.
So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.
Half-fix: #93033
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison
intrinsics in the SelectionDAG. Some of the expansions/lowerings
are not optimal yet.
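A sketch of their use (the result may be any integer type of at least two
bits):
```
declare i8 @llvm.scmp.i8.i32(i32, i32)

; returns -1 if %a s< %b, 0 if equal, +1 if %a s> %b
define i8 @cmp3(i32 %a, i32 %b) {
  %c = call i8 @llvm.scmp.i8.i32(i32 %a, i32 %b)
  ret i8 %c
}
```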
Converting to avgfloor and then expanding it back to shift+add later is likely to prevent other folds (re-association and value-tracking in particular) in the meantime.
Fixes #95284