645 Commits

Author SHA1 Message Date
Dominik Steenken
6eb5ac52ca
[SystemZ] Remove custom lowering of f16 IS_FPCLASS (#187532)
As pointed out in #187518 , currently, `__builtin_isnormal` returns
`true` for subnormal half precision floating point numbers on `s390x`.

This is because there is a custom lowering defined which lowers an `f16`
`IS_FPCLASS` ISD node by extending the `f16` value to `f32`, and then
using SystemZ's "test data class" instruction to determine whether the
number is subnormal. However, a number that is subnormal in 16 bits of
precision will no longer be subnormal in 32 bits of precision, and so
the test always returns true, i.e. all subnormal numbers are classified
as normal.

This PR addresses this by removing the custom lowering and instead
relying on the generic expansion of `IS_FPCLASS`, which does not have
this error.

Fixes #187518 .
2026-03-22 17:19:24 +01:00
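The widening problem described in this commit can be reproduced in portable C++ without any SystemZ instruction. The sketch below is an illustration only: the helper names are hypothetical and it decodes binary16 bit patterns by hand rather than using the backend's lowering code. It shows that every subnormal half value becomes a normal `float` once extended.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Classify an IEEE 754 binary16 value directly from its bit pattern.
// Subnormal: the 5-bit exponent field is zero and the fraction is nonzero.
bool isSubnormalHalfBits(uint16_t Bits) {
  uint16_t Exp = (Bits >> 10) & 0x1F;
  uint16_t Frac = Bits & 0x3FF;
  return Exp == 0 && Frac != 0;
}

// Exactly convert a zero or subnormal binary16 bit pattern to float.
// Such values are Frac * 2^-24, which float can represent exactly.
float extendSubnormalHalf(uint16_t Bits) {
  assert(((Bits >> 10) & 0x1F) == 0 && "only handles zero/subnormal inputs");
  float Magnitude = std::ldexp(static_cast<float>(Bits & 0x3FF), -24);
  return (Bits & 0x8000) ? -Magnitude : Magnitude;
}

// The flawed approach (hypothetical helper): classify after widening.
bool isNormalAfterExtend(uint16_t Bits) {
  return std::isnormal(extendSubnormalHalf(Bits));
}
```

The smallest half subnormal is 2^-24, far above the smallest normal `float` (2^-126), so `isNormalAfterExtend` returns true for every input that `isSubnormalHalfBits` accepts.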
Nikita Popov
11e0d6ae4b
[SystemZ] Limit depth of findCCUse() (#185922)
The recursion here has potentially exponential complexity. Avoid this by
limiting the depth of recursion.

An alternative would be to memoize the results. I went with the simpler
depth limit on the assumption that we don't particularly care about very
deep value chains here.

Fixes https://github.com/llvm/llvm-project/issues/185905.
2026-03-12 09:00:38 +01:00
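A depth limit of the kind this commit describes can be sketched as follows. This is not the code of `findCCUse()`: the `Node` type, the `findTarget` function, and the limit of 6 are all hypothetical, chosen only to show how a cutoff bounds a recursive walk over operands.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a node in a value chain.
struct Node {
  bool IsTarget = false;
  std::vector<const Node *> Operands;
};

// Recursive search with a depth cutoff. Without the cutoff, shared
// operands can be revisited along every path, which may blow up
// exponentially on deep DAG-shaped inputs.
bool findTarget(const Node *N, unsigned Depth = 0) {
  constexpr unsigned MaxDepth = 6; // assumed value, for illustration only
  if (N->IsTarget)
    return true;
  if (Depth >= MaxDepth)
    return false; // give up conservatively instead of recursing further
  for (const Node *Op : N->Operands)
    if (findTarget(Op, Depth + 1))
      return true;
  return false;
}
```

The trade-off matches the one named in the commit message: a target deeper than the limit is simply not found, whereas memoization would keep full precision at the cost of extra bookkeeping.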
Nikita Popov
d5378dafa2
[SystemZ] Mark fminimumnum/fmaximumnum as legal (#184595)
In M=4 mode, the behavior matches IEEE 754-2019 minimumNumber, except
that if both operands are sNaN, the result will be sNaN rather than
qNaN. However, this is explicitly allowed for LLVM's minimumnum
intrinsic, as canonicalization can be omitted for non-constrained FP.

As such, mark fminimumnum/fmaximumnum as legal, and lower them the same
way as fminnum/fmaxnum. In the future, we may wish to switch those to
use M=0 instead, to match IEEE 754-2008 maxNum/minNum instead.
2026-03-05 09:03:55 +01:00
Osama Abdelkader
aad7259ff6
[AArch64] Optimize memset to use NEON DUP instruction for more sizes (#166030)
This change improves memset code generation for non-zero values on
AArch64 by using NEON's DUP instruction instead of
the less efficient multiplication with 0x01010101 pattern.

For small sizes, the value is extracted from a larger DUP. For
non-power-of-two sizes, overlapping stores are used in some cases.

TargetLowering::findOptimalMemOpLowering is modified to allow explicitly
specifying the size of the constant in cases where the constant is
larger than the store operations.

Fixes #165949
2026-01-29 13:03:38 -08:00
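For readers unfamiliar with the two ideas mentioned in this message, here is a scalar C++ sketch of the multiplication pattern and of overlapping stores. It is only an analogy: it does not use NEON, and the function names and the 4 to 8 byte size range are assumptions made for the example, not the logic in `findOptimalMemOpLowering`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Replicate a byte into all four bytes of a word by multiplying with
// 0x01010101 (the pattern the commit replaces with DUP on AArch64).
uint32_t splatByte32(uint8_t Value) {
  return static_cast<uint32_t>(Value) * 0x01010101u;
}

// Fill 4..8 bytes with two 4-byte stores. For sizes that are not a power
// of two the second store overlaps the first instead of using narrower
// stores for the tail.
void memsetSmall(uint8_t *Dst, uint8_t Value, std::size_t Size) {
  assert(Size >= 4 && Size <= 8 && "sketch only covers 4..8 bytes");
  uint32_t Pattern = splatByte32(Value);
  std::memcpy(Dst, &Pattern, 4);            // bytes [0, 4)
  std::memcpy(Dst + Size - 4, &Pattern, 4); // bytes [Size - 4, Size)
}
```

For `Size == 7` the two stores cover bytes 0..3 and 3..6, so byte 3 is written twice with the same value.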
Jonas Paulsson
c999e9a4fe
[SystemZ] Support fp16 vector ABI and basic codegen. (#171066)
- Make v8f16 a legal type so that arguments can be passed in vector
registers. Handle fp16 vectors so that they have the same ABI as other
fp vectors.

- Set the preferred vector action for fp16 vectors to "split". This will
scalarize all operations, which is not always necessary (like with
memory operations), but it avoids the superfluous operations that result
after first widening and then scalarizing a narrow vector (like v4f16).

Fixes #168992
2026-01-26 13:42:25 -06:00
Matt Arsenault
24be429c8e
SystemZ: Use correctly offset MachinePointerInfo in CC lowering (#177793)
Previously this was just using the original base address as
the pointer info.
2026-01-25 00:02:39 +01:00
Jonas Paulsson
e0a132691f
[SystemZ] Precommit for moving some functions around. (#177441)
In preparation for #171066 (FP16 vector support).
2026-01-22 13:18:36 -06:00
Akshay Deodhar
3860147a7f
[NFC][TargetLowering] Make shouldExpandAtomicRMWInIR and shouldExpandAtomicCmpXchgInIR take a const Instruction pointer (#176073)
Splits out change from https://github.com/llvm/llvm-project/pull/176015

Changes shouldExpandAtomicRMWInIR to take a constant argument: This is
to allow some other TargetLowering constant-argument functions to call
it. This change touches several backends. An alternative solution
exists, but to me, this seems the "right" way.
2026-01-15 14:22:57 -08:00
Jonas Paulsson
100077dbff
[SelectionDAGBuilder] Don't add base offset in LowerFormalArguments(). (#170732)
LowerCallTo() and LowerArguments() are both providing the PartOffset field for
each split argument part. As these two methods are intended to work together,
they should both provide the same offsets. However, LowerArguments()  has been
providing the offset from the beginning of the struct while LowerCallTo() sets it
relative to the first split part.

This patch removes the PartBase variable in LowerArguments() so that the behavior
matches LowerCallTo(): offsets to split parts of an argument are relative to the first
part of the argument.
2025-12-19 11:27:07 -06:00
Frederik Harwath
6ad41bcc49
[CodeGen] expand-fp: Change frem expansion criterion (#158285)
The existing condition for checking whether or not to expand an frem
instruction in expand-fp is not sufficiently precise.
The expansion on other targets than AMDGPU - which is the only intended
user right now - is only prevented due to the interaction with the
MaxLegalFpConvertBitWidth check.  Relying on this is conceptually wrong
and limits the use of the pass for other targets and further expansions
(e.g. merging with the similar ExpandLargeDivRem pass).

Change the expansion criterion to always expand frem of a given type
for targets that use "Expand" as the legalization action for the 
underlying scalar type and use this to exit the pass early for targets 
which do not require any expansions. This requires changing the
frem legalization action for all targets which do not want frem to 
be expanded in this pass from "Expand" to "LibCall".

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-12-16 17:31:26 +01:00
Dominik Steenken
ca12d1d8f1
[SystemZ] Improve CCMask optimization (#171137)
This commit addresses a shortcoming in the implementation of
`combineBR_CCMASK` and `combineSELECT_CCMASK`. In cases where
`combineCCMask` was able to reduce the ccmask going into the select or
branch to either true (`ccvalid`) or false (`0`), a trivial instruction
would be emitted (i.e. either a select that would only ever select one
side, or a conditional branch with `true` or `false` as the branch
condition).
This led under certain circumstances to, e.g., `BRC` instructions being
emitted that triggered an assert in the AsmPrinter meant to exclude such
branch conditions.
For the select case, this commit introduces an early bailout that simply
returns the value that would "always" be selected. For the branch case,
the commit introduces an additional guard that prevents the DAGCombine
from taking effect, thereby preventing the illegal instruction from
being emitted.
2025-12-09 11:20:40 +01:00
Jonas Paulsson
0b252daf64
[SystemZ] Handle IR struct arguments correctly. (#169583)
- The size of the stack slot was previously computed in LowerCall() by using
  the original type, but that didn't work for a struct. Compute the size
  by looking at the VT of each part and the number of them instead.

- All the members of a struct have the same OrigArgIndex, so it doesn't work
  to assume that following parts belong to a split argument until another
  OrigArgIndex is encountered. Use the isSplit() and isSplitEnd() flags
  instead.

- Detect any scalar integer argument >64 bits in CanLowerReturn() instead of
  just i128, in order to let all of them be passed on stack.
  
Fixes #168460
2025-12-04 13:14:31 -06:00
anoopkg6
7e85b790b0
[SystemZ] Fix linux s390x main can't bootstrap itself on SanitizerSpecialCaseList.cpp #168088 (#168779)
This test has a long call chain in recursion. The search tree can be pruned
early by swapping the CC test and the recursive simplifyAssumingCCVal.

Fixes: https://github.com/llvm/llvm-project/issues/168088
Co-authored-by: anoopkg6 <anoopkg6@github.com>
2025-11-20 00:07:43 +01:00
Matt Arsenault
a757c4e74e
CodeGen: Add subtarget to TargetLoweringBase constructor (#168620)
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
2025-11-19 19:18:13 +00:00
Sergei Barannikov
320c18a066
[SystemZ] TableGen-erate node descriptions (#168113)
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.

There is only one node that is missing a description -- `GET_CCMASK`,
others were successfully imported.

Part of #119709.

Pull Request: https://github.com/llvm/llvm-project/pull/168113
2025-11-17 23:03:45 +03:00
Kazu Hirata
c04e57d133
[llvm] Use StringRef::contains (NFC) (#165397)
Identified with readability-container-contains
2025-10-28 16:15:08 -07:00
anoopkg6
242c716c68
Fix Linux kernel build failure for SystemZ. (#165274)
The Linux kernel build fails for SystemZ because the output of INLINEASM
was a GR32Bit general-purpose register instead of SystemZ::CC.

---------

Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
2025-10-27 18:22:01 +01:00
anoopkg6
6712e20c52
Add support for flag output operand "=@cc" for SystemZ. (#125970)
Added support for the flag output operand "=@cc" inline assembly
constraint for SystemZ.

- Clang now accepts "=@cc" assembly operands, and sets a 2-bit condition
  code for the output operand for SystemZ.

- Clang currently emits an assertion that flag output operands are
  boolean values, i.e. in the range [0, 2). Generalize this mechanism to
  allow targets to specify arbitrary range assertions for any inline
  assembly output operand. This will be used to assert that SystemZ
  two-bit condition-code values are in the range [0, 4).

- The SystemZ backend lowers "@cc" targets by using an ipm sequence to
  extract the condition code from the PSW.

- DAGCombine tries to optimize the lowered ipm sequence by combining
  CCReg and computing the effective CCMask and CCValid in combineCCMask
  for select_ccmask and br_ccmask.

- Cost computation is done for merging conditionals for branch
  instructions in SelectionDAG, as a split may cause the evaluation of
  branch conditions to cross basic blocks, making them difficult to
  combine.

---------

Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
2025-10-14 11:53:42 +02:00
Folkert de Vries
8a9e3333dd
s390x: optimize 128-bit fshl and fshr by high values (#154919)
Turn a funnel shift by N in the range `121..128` into a funnel shift in
the opposite direction by `128 - N`. Because there are dedicated
instructions for funnel shifts by values smaller than 8, this emits
fewer instructions.

This additional rule is useful because LLVM appears to canonicalize
`fshr` into `fshl`, meaning that the rules for `fshr` on values less
than 8 would not match on organic input.
2025-08-27 09:31:49 +02:00
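The rewrite relies on the identity that a left funnel shift by N equals a right funnel shift by W - N for a W-bit value and 0 < N < W. The sketch below checks this on 64-bit values for portability (the commit itself concerns 128-bit shifts); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift left: the high 64 bits of (A:B) << N.
uint64_t fshl64(uint64_t A, uint64_t B, unsigned N) {
  N %= 64;
  return N == 0 ? A : (A << N) | (B >> (64 - N));
}

// Funnel shift right: the low 64 bits of (A:B) >> N.
uint64_t fshr64(uint64_t A, uint64_t B, unsigned N) {
  N %= 64;
  return N == 0 ? B : (A << (64 - N)) | (B >> N);
}

// Express a left funnel shift through a right one in the opposite
// direction, as the commit does for large shift amounts.
uint64_t fshlViaFshr(uint64_t A, uint64_t B, unsigned N) {
  assert(N > 0 && N < 64 && "identity holds only for 0 < N < width");
  return fshr64(A, B, 64 - N);
}
```

In the 64-bit analogue, a left shift by 57..63 becomes a right shift by 7..1, which is the small-amount case the message says has dedicated instructions.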
Folkert de Vries
558657298a
s390x: pattern match saturated truncation (#155377)
Simplify min/max instruction matching by making the related
SelectionDAG operations legal.

Add patterns to match (signed and unsigned) saturated
truncation based on open-coded min/max patterns.

Fixes https://github.com/llvm/llvm-project/issues/153655
2025-08-26 17:19:58 +02:00
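An open-coded min/max clamp followed by a truncation is the kind of source pattern this commit refers to. The following C++ sketch shows a signed and an unsigned variant for a 32-bit to 16-bit truncation; the function names and the chosen widths are assumptions for illustration, and the exact patterns matched by the backend are not listed in the message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Signed saturating truncation: clamp to [INT16_MIN, INT16_MAX], then
// truncate. The clamp guarantees the cast does not change the value.
int16_t truncSSat(int32_t X) {
  const int32_t Lo = std::numeric_limits<int16_t>::min();
  const int32_t Hi = std::numeric_limits<int16_t>::max();
  int32_t Clamped = std::min(std::max(X, Lo), Hi);
  assert(Clamped >= Lo && Clamped <= Hi);
  return static_cast<int16_t>(Clamped);
}

// Unsigned saturating truncation: only an upper clamp is needed.
uint16_t truncUSat(uint32_t X) {
  const uint32_t Hi = std::numeric_limits<uint16_t>::max();
  return static_cast<uint16_t>(std::min(X, Hi));
}
```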
Nikita Popov
9d37e80d3c
[SystemZ] Remove custom CCState pre-analysis (#154091)
The calling convention lowering now has access to OrigTy, so use that to
detect short vectors.
2025-08-19 09:28:09 +02:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
sujianIBM
fc12fc635b
[SystemZ] Fix code in widening vector multiplication (#150836)
Commit cdc7864 has an error which would wrongly fold widening
multiplications into an even/odd widening operation.
This PR fixes it and adds tests to check that scenarios which should not
be folded into an even/odd widening operation are in fact not folded.
2025-07-31 13:18:23 -04:00
Boyao Wang
697beb3f17
[TargetLowering] Change getOptimalMemOpType and findOptimalMemOpLowering to take LLVM Context (#147664)
Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering. So
that we can use EVT::getVectorVT to generate EVT type in
getOptimalMemOpType.

Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
2025-07-10 11:11:09 +08:00
MangalaPG
dd54b8e462
Clang-Tidy issues fixed in file SystemZISelLowering.cpp (#147251)
Corrected variable names according to the clang-tidy
standards.

---------

Signed-off-by: MangalaPG <mangala.P.G@ibm.com>
2025-07-09 20:26:42 +02:00
Matt Arsenault
d8ef156379
DAG: Remove verifyReturnAddressArgumentIsConstant (#147240)
The intrinsic argument is already marked with immarg so non-constant
values are rejected by the IR verifier.
2025-07-07 16:28:47 +09:00
Jie Fu
842f4f711d [Target] Prevent copying in loop variables (NFC)
/data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2390:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass) {
                  ^
/data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2390:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass) {
       ^~~~~~~~~~~~~~~~~~~~~
                  &
/data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2402:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass)
                  ^
/data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2402:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass)
       ^~~~~~~~~~~~~~~~~~~~~
                  &
2 errors generated.
2025-06-29 14:24:41 +08:00
Kazu Hirata
9cf251d9d8
[Target] Use range-based for loops (NFC) (#146253) 2025-06-28 20:41:39 -07:00
Matt Arsenault
48155f93dd
CodeGen: Emit error if getRegisterByName fails (#145194)
This avoids using report_fatal_error and standardizes the error
message in a subset of the error conditions.
2025-06-23 16:33:35 +09:00
Iris Shi
24d730b380
Reland "[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets" (#143651) 2025-06-11 15:56:37 +08:00
Iris Shi
8c890eaa3f
Revert "[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets" (#143648) 2025-06-11 10:19:12 +08:00
Iris Shi
bfb48363b0
[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets (#137641) 2025-06-09 17:57:15 +08:00
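The identity in this commit title, `(a & x) | (~a & y) == (a & (x ^ y)) ^ y`, can be checked exhaustively on 8-bit values. The sketch below is a standalone verification with hypothetical function names, not the DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise select written with AND/OR: take X where A is set, else Y.
uint8_t selectOrForm(uint8_t A, uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>((A & X) | (~A & Y));
}

// The same select written with one AND and two XORs.
uint8_t selectXorForm(uint8_t A, uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>((A & (X ^ Y)) ^ Y);
}

// Check that both forms agree for a given mask over all 8-bit X and Y.
bool formsAgreeForMask(uint8_t A) {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      if (selectOrForm(A, static_cast<uint8_t>(X), static_cast<uint8_t>(Y)) !=
          selectXorForm(A, static_cast<uint8_t>(X), static_cast<uint8_t>(Y)))
        return false;
  return true;
}
```

Per bit, the reasoning is short: where `a` is 1 the XOR form gives `(x ^ y) ^ y == x`, and where `a` is 0 it gives `0 ^ y == y`.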
Matt Arsenault
0a3e9aa336
SystemZ: Move runtime libcall setting out of TargetLowering (#142622)
RuntimeLibcallInfo needs to be correct outside of codegen contexts.
2025-06-04 06:21:46 +09:00
Rahul Joshi
52c2e45c11
[NFC][CodeGen] Adopt MachineFunctionProperties convenience accessors (#141101) 2025-05-23 08:30:29 -07:00
Craig Topper
dcd62f3674
[SelectionDAG] Rename MemSDNode::getOriginalAlign to getBaseAlign. NFC (#139930)
This matches the underlying function in MachineMemOperand and how it is
printed when BaseAlign differs from Align.
2025-05-16 09:37:02 -07:00
Jonas Paulsson
94a14f9f0d
[SystemZ] Add DAGCombine for FCOPYSIGN to remove rounding. (#136131)
Add a DAGCombine for FCOPYSIGN that removes the rounding which is never
needed as the sign bit is already in the correct place. This helps in particular the
rounding to f16 case which needs a libcall.

Also remove the roundings for other FP VTs and simplify the CPSDR
patterns correspondingly.

fp-copysign-03.ll test updated, now also covering the other FP VT
combinations.
2025-04-24 11:05:51 +02:00
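The reason the rounding can be dropped is that copysign only reads the sign bit of its second operand. A small C++ sketch of that observation follows; the function names are hypothetical and the code illustrates the arithmetic fact only, not the SystemZ combine or its CPSDR patterns.

```cpp
#include <cassert>
#include <cmath>

// Copysign with the sign operand first rounded to the narrower type.
float copySignWithRound(float Mag, double Sign) {
  return std::copysign(Mag, static_cast<float>(Sign));
}

// Copysign that reads the sign bit of the wider operand directly.
// Rounding cannot flip a sign bit, so the result is the same.
float copySignNoRound(float Mag, double Sign) {
  float Abs = std::fabs(Mag);
  return std::signbit(Sign) ? -Abs : Abs;
}
```

Even a sign operand such as `-1e-300`, which rounds to a negative zero or tiny negative value as a `float`, keeps its sign, so both functions return the same result.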
Jonas Paulsson
1ec22fae7e
[SystemZ] Handle f16 load positive/negative/complement without libcalls. (#136286)
This can be done directly with the (64-bit) target instruction as only the sign bit
is changed.
2025-04-24 10:49:40 +02:00
Craig Topper
f6178cdad0
[SelectionDAG] Pass LoadExtType when ATOMIC_LOAD is created. (#136653)
Rename one signature of getAtomic to getAtomicLoad and pass LoadExtType.
Previously we had to set the extension type after the node was created,
but we don't usually modify SDNodes once they are created. It's possible
the node already existed and has been CSEd. If that happens, modifying
the node may affect the other users. It's therefore safer to add the
extension type at creation so that it is part of the CSE information.

I don't know of any failures related to the current implementation. I
only noticed that it doesn't match how we usually do things.
2025-04-22 09:11:46 -07:00
Kazu Hirata
8a00efd26d [SystemZ] Fix warnings
This patch fixes:

  llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:6916:7: error:
  unused variable 'RegVT' [-Werror,-Wunused-variable]

  llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp:1265:30: error: unused
  variable 'RC' [-Werror,-Wunused-variable]
2025-04-16 11:25:55 -07:00
Jonas Paulsson
6d03f51f0c
[SystemZ] Add support for 16-bit floating point. (#109164)
- _Float16 is now accepted by Clang.

- The half IR type is fully handled by the backend.

- These values are passed in FP registers and converted to/from float around
  each operation.

- Compiler-rt conversion functions are now built for s390x including the missing
  extendhfdf2 which was added.

Fixes #50374
2025-04-16 20:02:56 +02:00
Ulrich Weigand
80267f8148
Support z17 processor name and scheduler description (#135254)
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.

This patch also adds the scheduler description for the z17 processor,
provided by Jonas Paulsson.
2025-04-11 00:20:58 +02:00
Jonas Paulsson
b13373db25
[SystemZ] Use hasAddressTaken() with verifyNarrowIntegerArgs (NFC). (#131039)
Use hasAddressTaken() in SystemZ instead of doing this computation in
isFullyInternal(), and make sure to only do this once per Function.
2025-03-21 19:07:46 +01:00
Ulrich Weigand
f4ea1055ad [SystemZ] Implement i128 funnel shifts
These can be handled via the VECTOR SHIFT LEFT/RIGHT DOUBLE
family of instructions, depending on architecture level.

Fixes: https://github.com/llvm/llvm-project/issues/129955
2025-03-15 18:28:44 +01:00
Ulrich Weigand
4155cc0fb3 [SystemZ] Recognize carry/borrow computation
Generate code using the VECTOR ADD COMPUTE CARRY and
VECTOR SUBTRACT COMPUTE BORROW INDICATION instructions
to implement open-coded IR with those semantics.

Handles integer vector types as well as i128.

Fixes: https://github.com/llvm/llvm-project/issues/129608
2025-03-15 18:28:44 +01:00
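"Open-coded" carry and borrow computations typically look like the comparisons below. This is a scalar C++ sketch on 32-bit values with hypothetical function names; the commit message does not list the exact IR patterns that are matched, and it covers vector types and i128 rather than this scalar case.

```cpp
#include <cassert>
#include <cstdint>

// Carry out of an unsigned addition: the wrapped sum is smaller than an
// operand exactly when the addition overflowed.
uint32_t addCarry(uint32_t A, uint32_t B) {
  uint32_t Sum = A + B; // wraps modulo 2^32
  return Sum < A ? 1u : 0u;
}

// Borrow out of an unsigned subtraction: a borrow occurs exactly when
// the subtrahend is larger than the minuend.
uint32_t subBorrow(uint32_t A, uint32_t B) {
  uint32_t Borrow = A < B ? 1u : 0u;
  assert(Borrow == (static_cast<uint32_t>(A - B) > A ? 1u : 0u));
  return Borrow;
}
```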
Ulrich Weigand
4a4987be36 [SystemZ] Optimize vector zero/sign extensions
Generate more efficient code for zero or sign extensions where
the source is a subvector generated via SHUFFLE_VECTOR.

Specifically, recognize patterns corresponding to (series of)
VECTOR UNPACK instructions, or the VECTOR SIGN EXTEND TO
DOUBLEWORD instruction.

As a special case, also handle zero or sign extensions of a
vector element to i128.

Fixes: https://github.com/llvm/llvm-project/issues/129576
Fixes: https://github.com/llvm/llvm-project/issues/129899
2025-03-15 18:28:44 +01:00
Ulrich Weigand
cdc7864986 [SystemZ] Optimize widening and high-word vector multiplication
Detect (non-intrinsic) IR patterns corresponding to the semantics
of the various widening and high-word multiplication instructions.

Specifically, this is done by:
- Recognizing even/odd widening multiplication patterns in DAGCombine
- Recognizing widening multiply-and-add on top during ISel
- Implementing the standard MULHS/MULHU IR opcodes
- Detecting high-word multiply-and-add (which common code does not)

Depending on architecture level, this can support all integer
vector types as well as the scalar i128 type.

Fixes: https://github.com/llvm/llvm-project/issues/129705
2025-03-15 18:28:44 +01:00
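The "high-word" multiplication semantics can be written out in scalar C++ as a widening multiply followed by a shift. The sketch below uses 32-bit inputs and hypothetical function names; it illustrates what MULHU and MULHS compute, not how the backend recognizes the pattern.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned high-word multiply: widen, multiply, keep the upper 32 bits.
uint32_t mulhu32(uint32_t A, uint32_t B) {
  uint64_t Wide = static_cast<uint64_t>(A) * static_cast<uint64_t>(B);
  return static_cast<uint32_t>(Wide >> 32);
}

// Signed high-word multiply. The right shift of a negative value is
// arithmetic on mainstream compilers and guaranteed so since C++20.
int32_t mulhs32(int32_t A, int32_t B) {
  int64_t Wide = static_cast<int64_t>(A) * static_cast<int64_t>(B);
  int64_t High = Wide >> 32;
  assert(High >= INT32_MIN && High <= INT32_MAX);
  return static_cast<int32_t>(High);
}
```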
Ulrich Weigand
7af3d3929e [SystemZ] Optimize vector comparison reductions
Generate efficient code using the condition code set by the
VECTOR (FP) COMPARE family of instructions to implement
vector comparison reductions, e.g. as resulting from
__builtin_reduce_and/or of some vector comparison.

Fixes: https://github.com/llvm/llvm-project/issues/129434
2025-03-15 18:28:44 +01:00
Jonas Paulsson
378739f182
[SystemZ] Move disabling of arg verification to before isFullyInternal(). (#130693)
It has been found to be quite a slowdown to traverse the users of a
function from each call site when it is called many (~70k)
times. This patch fixes this for now as long as this verification
is disabled by default, but there is still a need to eventually
cache the results to avoid recomputation.

Fixes #130541
2025-03-12 18:33:12 +01:00
Ulrich Weigand
adacbf68eb [SystemZ] Add codegen support for llvm.roundeven
This is straightforward as we already had all the necessary
instructions, they simply were not wired up.

Also allows implementing the vec_round intrinsic via the
standard llvm.roundeven IR instead of a platform intrinsic now.
2025-02-14 00:10:37 +01:00
Kazu Hirata
5a056f91be
[SystemZ] Avoid repeated hash lookups (NFC) (#126005)
Co-authored-by: Nikita Popov <github@npopov.com>
2025-02-06 16:22:31 -08:00