As pointed out in #187518 , currently, `__builtin_isnormal` returns
`true` for subnormal half precision floating point numbers on `s390x.
This is because there is a custom lowering defined which lowers an `f16`
`IS_FPCLASS` ISD node by extending the `f16` value to `f32`, and then
using SystemZ's "test data class" instruction to determine whether the
number is subnormal. However, a number that is subnormal in 16 bits of
precision will no longer be subnormal in 32 bits of precision, and so
the test always returns true, i.e. all subnormal numbers are classified
as normal.
This PR addresses this by removing the custom lowering and instead
relying on the generic expansion of `IS_FPCLASS`, which does not have
this error.
Fixes#187518 .
The recursion here has potentially exponential complexity. Avoid this by
limiting the depth of recursion.
An alternative would be to memoize the results. I went with the simpler
depth limit on the assumption that we don't particularly care about very
deep value chains here.
Fixes https://github.com/llvm/llvm-project/issues/185905.
In M=4 mode, the behavior matches IEEE 754-2019 minimumNumber, except
that if both operands are sNaN, the result will be sNaN rather than
qNaN. However, this is explicitly allowed for LLVM's minimumnum
intrinsic, as canonicalization can be omitted for non-constrainted FP.
As such, mark fminimumnum/fmaximumnum as legal, and lower them the same
way as fminnum/fmaxnum. In the future, we may wish to switch those to
use M=0 instead, to match IEEE 754-2008 maxNum/minNum instead.
This change improves memset code generation for non-zero values on
AArch64 by using NEON's DUP instruction instead of
the less efficient multiplication with 0x01010101 pattern.
For small sizes, the value is extracted from a larger DUP. For
non-power-of-two sizes, overlapping stores are used in some cases.
TargetLowering::findOptimalMemOpLowering is modified to allow explicitly
specifying the size of the constant in cases where the constant is
larger than the store operations.
Fixes#165949
- Make v8f16 a legal type so that arguments can be passed in vector
registers. Handle fp16 vectors so that they have the same ABI as other
fp vectors.
- Set the preferred vector action for fp16 vectors to "split". This will
scalarize all operations, which is not always necessary (like with
memory operations), but it avoids the superfluous operations that result
after first widening and then scalarizing a narrow vector (like v4f16).
Fixes#168992
Splits out change from https://github.com/llvm/llvm-project/pull/176015
Changes shouldExpandAtomicRMWInIR to take a constant argument: This is
to allow some other TargetLowering constant-argument functions to call
it. This change touches several backends. An alternative solution
exists, but to me, this seems the "right" way.
LowerCallTo() and LowerArguments() are both providing the PartOffset field for
each split argument part. As these two methods are intended to work together,
they should both provide the same offsets. However, LowerArguments() has been
providing the offset from the beginning of the struct while LowerCallTo() sets it
relative to the first split part.
This patch removes the PartBase variable in LowerArguments() so that the behavior
matches LowerCallTo(): offsets to split parts of an argument are relative to the first
part of the argument.
The existing condition for checking whether or not to expand an frem
instruction in expand-fp is not sufficiently precise.
The expansion on other targets than AMDGPU - which is the only intended
user right now - is only prevented due to the interaction with the
MaxLegalFpConvertBitWidth check. Relying on this is conceptually wrong
and limits the use of the pass for other targets and further expansions
(e.g. merging with the similar ExpandLargeDivRem pass).
Change the expansion criterion to always expand frem of a given type
for targets that use "Expand" as the legalization action for the
underlying scalar type and use this to exit the pass early for targets
which do not require any expansions. This requires to change the
frem legalization action for all targets which do not want frem to
be expanded in this pass from "Expand" to "LibCall".
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This commit addresses a shortcoming in the implementation of
`combineBR_CCMASK` and `combineSELECT_CCMASK`. In cases where
`combineCCMask` was able to reduce the ccmask going into the select or
branch to either true (`ccvalid`) or false (`0`), a trivial instruction
would be emitted (i.e. either a select that would only ever select one
side, or a conditional branch with `true` or `false` as the branch
condition).
This led under certain circumstances to, e.g., `BRC` instructions being
emitted that triggered an assert in the AsmPrinter meant to exclude such
branch conditions.
For the select case, this commit introduces an early bailout that simply
returns the value that would "always" be selected. For the branch case,
the commit introduces an additional guard that prevents the DAGCombine
from taking effect, thereby preventing the illegal instruction from
being emitted.
- The size of the stack slot was previously computed in LowerCall() by using
the original type, but that didn't work for a struct. Compute the size
by looking at the VT of each part and the number of them instead.
- All the members of a struct have the same OrigArgIndex, so it doesn't work
to assume that following parts belong to a split argument until another
OrigArgIndex is encountered. Use the isSplit() and isSplitEnd() flags
instead.
- Detect any scalar integer argumet >64 bits in CanLowerReturn() instead of
just i128, in order to let all of them be passed on stack.
Fixes#168460
This test has long call chain in recursion. Search tree can be pruned
early by swapping CC test and recursive simplifyAssumingCCVal.
Fixes: https://github.com/llvm/llvm-project/issues/168088
Co-authored-by: anoopkg6 <anoopkg6@github.com>
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.
There is only one node that is missing a description -- `GET_CCMASK`,
others were successfully imported.
Part of #119709.
Pull Request: https://github.com/llvm/llvm-project/pull/168113
Linux kernel build fails for SystemZ as output of INLINEASM was GR32Bit
general-purpose register instead of SystemZ::CC.
---------
Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Added Support for flag output operand "=@cc", inline assembly constraint
for
SystemZ.
- Clang now accepts "=@cc" assembly operands, and sets 2-bits condition
code
for output operand for SyatemZ.
- Clang currently emits an assertion that flag output operands are
boolean
values, i.e. in the range [0, 2). Generalize this mechanism to allow
targets to specify arbitrary range assertions for any inline assembly
output operand. This will be used to assert that SystemZ two-bit
condition-code values are in the range [0, 4).
- SystemZ backend lowers "@cc" targets by using ipm sequence to extract
condition code from PSW.
- DAGCombine tries to optimize lowered ipm sequence by combining
CCReg and computing effective CCMask and CCValid in combineCCMask for
select_ccmask and br_ccmask.
- Cost computation is done for merging conditionals for branch
instruction
in SelectionDAG, as split may cause branches conditions evaluation goes
across basic block and difficult to combine.
---------
Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Turn a funnel shift by N in the range `121..128` into a funnel shift in
the opposite direction by `128 - N`. Because there are dedicated
instructions for funnel shifts by values smaller than 8, this emits
fewer instructions.
This additional rule is useful because LLVM appears to canonicalize
`fshr` into `fshl`, meaning that the rules for `fshr` on values less
than 8 would not match on organic input.
Simplify min/max instruction matching by making the related
SelectionDAG operations legal.
Add patterns to match (signed and unsigned) saturated
truncation based on open-coded min/max patterns.
Fixes https://github.com/llvm/llvm-project/issues/153655
Commit cdc7864 has an error which would wrongly fold widening
multiplications into an even/odd widening operation.
This PR fixes it and adds tests to check scenarios which should not be
folded into an even/odd widening operation are actually not.
Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering. So
that we can use EVT::getVectorVT to generate EVT type in
getOptimalMemOpType.
Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
/data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2390:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
for (const auto [Reg, N] : RegsToPass) {
^
/data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2390:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
for (const auto [Reg, N] : RegsToPass) {
^~~~~~~~~~~~~~~~~~~~~
&
/data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2402:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
for (const auto [Reg, N] : RegsToPass)
^
/data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2402:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
for (const auto [Reg, N] : RegsToPass)
^~~~~~~~~~~~~~~~~~~~~
&
2 errors generated.
Add a DAGCombine for FCOPYSIGN that removes the rounding which is never
needed as the sign bit is already in the correct place. This helps in particular the
rounding to f16 case which needs a libcall.
Also remove the roundings for other FP VTs and simplify the CPSDR
patterns correspondingly.
fp-copysign-03.ll test updated, now also covering the other FP VT
combinations.
Rename one signature of getAtomic to getAtomicLoad and pass LoadExtType.
Previously we had to set the extension type after the node was created,
but we don't usually modify SDNodes once they are created. It's possible
the node already existed and has been CSEd. If that happens, modifying
the node may affect the other users. It's therefore safer to add the
extension type at creation so that it is part of the CSE information.
I don't know of any failures related to the current implementation. I
only noticed that it doesn't match how we usually do things.
- _Float16 is now accepted by Clang.
- The half IR type is fully handled by the backend.
- These values are passed in FP registers and converted to/from float around
each operation.
- Compiler-rt conversion functions are now built for s390x including the missing
extendhfdf2 which was added.
Fixes#50374
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.
This patch also add the scheduler description for the z17 processor,
provided by Jonas Paulsson.
Generate code using the VECTOR ADD COMPUTE CARRY and
VECTOR SUBTRACT COMPUTE BORROW INDICATION instructions
to implement open-coded IR with those semantics.
Handles integer vector types as well as i128.
Fixes: https://github.com/llvm/llvm-project/issues/129608
Generate more efficient code for zero or sign extensions where
the source is a subvector generated via SHUFFLE_VECTOR.
Specifically, recognize patterns corresponding to (series of)
VECTOR UNPACK instructions, or the VECTOR SIGN EXTEND TO
DOUBLEWORD instruction.
As a special case, also handle zero or sign extensions of a
vector element to i128.
Fixes: https://github.com/llvm/llvm-project/issues/129576
Fixes: https://github.com/llvm/llvm-project/issues/129899
Detect (non-intrinsic) IR patterns corresponding to the semantics
of the various widening and high-word multiplication instructions.
Specifically, this is done by:
- Recognizing even/odd widening multiplication patterns in DAGCombine
- Recognizing widening multiply-and-add on top during ISel
- Implementing the standard MULHS/MUHLU IR opcodes
- Detecting high-word multiply-and-add (which common code does not)
Depending on architecture level, this can support all integer
vector types as well as the scalar i128 type.
Fixes: https://github.com/llvm/llvm-project/issues/129705
Generate efficient code using the condition code set by the
VECTOR (FP) COMPARE family of instructions to implement
vector comparison reductions, e.g. as resulting from
__builtin_reduce_and/or of some vector comparsion.
Fixes: https://github.com/llvm/llvm-project/issues/129434
It has found to be quite a slowdown to traverse the users of a
function from each call site when it is called many (~70k)
times. This patch fixes this for now as long as this verification
is disabled by default, but there is still a need to eventually
cache the results to avoid recomputation.
Fixes#130541
This is straightforward as we already had all the necessary
instructions, they simply were not wired up.
Also allows implementing the vec_round intrinsic via the
standard llvm.roundeven IR instead of a platform intrinsic now.