llvm-project

Author	SHA1	Message	Date
Dominik Steenken	6eb5ac52ca	[SystemZ] Remove custom lowering of f16 IS_FPCLASS (#187532 ) As pointed out in #187518 , currently, `__builtin_isnormal` returns `true` for subnormal half precision floating point numbers on `s390x`. This is because there is a custom lowering defined which lowers an `f16` `IS_FPCLASS` ISD node by extending the `f16` value to `f32`, and then using SystemZ's "test data class" instruction to determine whether the number is subnormal. However, a number that is subnormal in 16 bits of precision will no longer be subnormal in 32 bits of precision, and so the test always returns true, i.e. all subnormal numbers are classified as normal. This PR addresses this by removing the custom lowering and instead relying on the generic expansion of `IS_FPCLASS`, which does not have this error. Fixes #187518 .	2026-03-22 17:19:24 +01:00
Nikita Popov	11e0d6ae4b	[SystemZ] Limit depth of findCCUse() (#185922 ) The recursion here has potentially exponential complexity. Avoid this by limiting the depth of recursion. An alternative would be to memoize the results. I went with the simpler depth limit on the assumption that we don't particularly care about very deep value chains here. Fixes https://github.com/llvm/llvm-project/issues/185905.	2026-03-12 09:00:38 +01:00
Nikita Popov	d5378dafa2	[SystemZ] Mark fminimumnum/fmaximumnum as legal (#184595 ) In M=4 mode, the behavior matches IEEE 754-2019 minimumNumber, except that if both operands are sNaN, the result will be sNaN rather than qNaN. However, this is explicitly allowed for LLVM's minimumnum intrinsic, as canonicalization can be omitted for non-constrained FP. As such, mark fminimumnum/fmaximumnum as legal, and lower them the same way as fminnum/fmaxnum. In the future, we may wish to switch those to use M=0 instead, to match IEEE 754-2008 maxNum/minNum instead.	2026-03-05 09:03:55 +01:00
Osama Abdelkader	aad7259ff6	[AArch64] Optimize memset to use NEON DUP instruction for more sizes (#166030 ) This change improves memset code generation for non-zero values on AArch64 by using NEON's DUP instruction instead of the less efficient multiplication with 0x01010101 pattern. For small sizes, the value is extracted from a larger DUP. For non-power-of-two sizes, overlapping stores are used in some cases. TargetLowering::findOptimalMemOpLowering is modified to allow explicitly specifying the size of the constant in cases where the constant is larger than the store operations. Fixes #165949	2026-01-29 13:03:38 -08:00
Jonas Paulsson	c999e9a4fe	[SystemZ] Support fp16 vector ABI and basic codegen. (#171066 ) - Make v8f16 a legal type so that arguments can be passed in vector registers. Handle fp16 vectors so that they have the same ABI as other fp vectors. - Set the preferred vector action for fp16 vectors to "split". This will scalarize all operations, which is not always necessary (like with memory operations), but it avoids the superfluous operations that result after first widening and then scalarizing a narrow vector (like v4f16). Fixes #168992	2026-01-26 13:42:25 -06:00
Matt Arsenault	24be429c8e	SystemZ: Use correctly offset MachinePointerInfo in CC lowering (#177793 ) Previously this was just using the original base address as the pointer info.	2026-01-25 00:02:39 +01:00
Jonas Paulsson	e0a132691f	[SystemZ] Precommit for moving some functions around. (#177441 ) In preparation for #171066 (FP16 vector support).	2026-01-22 13:18:36 -06:00
Akshay Deodhar	3860147a7f	[NFC][TargetLowering] Make shouldExpandAtomicRMWInIR and shouldExpandAtomicCmpXchgInIR take a const Instruction pointer (#176073 ) Splits out change from https://github.com/llvm/llvm-project/pull/176015 Changes shouldExpandAtomicRMWInIR to take a constant argument: This is to allow some other TargetLowering constant-argument functions to call it. This change touches several backends. An alternative solution exists, but to me, this seems the "right" way.	2026-01-15 14:22:57 -08:00
Jonas Paulsson	100077dbff	[SelectionDAGBuilder] Don't add base offset in LowerFormalArguments(). (#170732 ) LowerCallTo() and LowerArguments() are both providing the PartOffset field for each split argument part. As these two methods are intended to work together, they should both provide the same offsets. However, LowerArguments() has been providing the offset from the beginning of the struct while LowerCallTo() sets it relative to the first split part. This patch removes the PartBase variable in LowerArguments() so that the behavior matches LowerCallTo(): offsets to split parts of an argument are relative to the first part of the argument.	2025-12-19 11:27:07 -06:00
Frederik Harwath	6ad41bcc49	[CodeGen] expand-fp: Change frem expansion criterion (#158285 ) The existing condition for checking whether or not to expand an frem instruction in expand-fp is not sufficiently precise. The expansion on other targets than AMDGPU - which is the only intended user right now - is only prevented due to the interaction with the MaxLegalFpConvertBitWidth check. Relying on this is conceptually wrong and limits the use of the pass for other targets and further expansions (e.g. merging with the similar ExpandLargeDivRem pass). Change the expansion criterion to always expand frem of a given type for targets that use "Expand" as the legalization action for the underlying scalar type and use this to exit the pass early for targets which do not require any expansions. This requires to change the frem legalization action for all targets which do not want frem to be expanded in this pass from "Expand" to "LibCall". --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-12-16 17:31:26 +01:00
Dominik Steenken	ca12d1d8f1	[SystemZ] Improve CCMask optimization (#171137 ) This commit addresses a shortcoming in the implementation of `combineBR_CCMASK` and `combineSELECT_CCMASK`. In cases where `combineCCMask` was able to reduce the ccmask going into the select or branch to either true (`ccvalid`) or false (`0`), a trivial instruction would be emitted (i.e. either a select that would only ever select one side, or a conditional branch with `true` or `false` as the branch condition). This led under certain circumstances to, e.g., `BRC` instructions being emitted that triggered an assert in the AsmPrinter meant to exclude such branch conditions. For the select case, this commit introduces an early bailout that simply returns the value that would "always" be selected. For the branch case, the commit introduces an additional guard that prevents the DAGCombine from taking effect, thereby preventing the illegal instruction from being emitted.	2025-12-09 11:20:40 +01:00
Jonas Paulsson	0b252daf64	[SystemZ] Handle IR struct arguments correctly. (#169583 ) - The size of the stack slot was previously computed in LowerCall() by using the original type, but that didn't work for a struct. Compute the size by looking at the VT of each part and the number of them instead. - All the members of a struct have the same OrigArgIndex, so it doesn't work to assume that following parts belong to a split argument until another OrigArgIndex is encountered. Use the isSplit() and isSplitEnd() flags instead. - Detect any scalar integer argument >64 bits in CanLowerReturn() instead of just i128, in order to let all of them be passed on stack. Fixes #168460	2025-12-04 13:14:31 -06:00
anoopkg6	7e85b790b0	[SystemZ] Fix linux s390x main can't bootstrap itself on SanitizerSpecialCaseList.cpp #168088 (#168779 ) This test has long call chain in recursion. Search tree can be pruned early by swapping CC test and recursive simplifyAssumingCCVal. Fixes: https://github.com/llvm/llvm-project/issues/168088 Co-authored-by: anoopkg6 <anoopkg6@github.com>	2025-11-20 00:07:43 +01:00
Matt Arsenault	a757c4e74e	CodeGen: Add subtarget to TargetLoweringBase constructor (#168620 ) Currently LibcallLoweringInfo is defined inside of TargetLowering, which is owned by the subtarget. Pass in the subtarget so we can construct LibcallLoweringInfo with the subtarget. This is a temporary step that should be revertable in the future, after LibcallLoweringInfo is moved out of TargetLowering.	2025-11-19 19:18:13 +00:00
Sergei Barannikov	320c18a066	[SystemZ] TableGen-erate node descriptions (#168113 ) This allows SDNodes to be validated against their expected type profiles and reduces the number of changes required to add a new node. There is only one node that is missing a description -- `GET_CCMASK`, others were successfully imported. Part of #119709. Pull Request: https://github.com/llvm/llvm-project/pull/168113	2025-11-17 23:03:45 +03:00
Kazu Hirata	c04e57d133	[llvm] Use StringRef::contains (NFC) (#165397 ) Identified with readability-container-contains	2025-10-28 16:15:08 -07:00
anoopkg6	242c716c68	Fix Linux kernel build failure for SystemZ. (#165274 ) The Linux kernel build failed for SystemZ because the output of INLINEASM was a GR32Bit general-purpose register instead of SystemZ::CC. --------- Co-authored-by: anoopkg6 <anoopkg6@github.com> Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>	2025-10-27 18:22:01 +01:00
anoopkg6	6712e20c52	Add support for flag output operand "=@cc" for SystemZ. (#125970 ) Added support for the flag output operand "=@cc" inline assembly constraint for SystemZ. - Clang now accepts "=@cc" assembly operands, and sets a 2-bit condition code for the output operand on SystemZ. - Clang currently emits an assertion that flag output operands are boolean values, i.e. in the range [0, 2). Generalize this mechanism to allow targets to specify arbitrary range assertions for any inline assembly output operand. This will be used to assert that SystemZ two-bit condition-code values are in the range [0, 4). - The SystemZ backend lowers "@cc" targets by using an ipm sequence to extract the condition code from the PSW. - DAGCombine tries to optimize the lowered ipm sequence by combining CCReg and computing the effective CCMask and CCValid in combineCCMask for select_ccmask and br_ccmask. - A cost computation is done for merging conditionals for branch instructions in SelectionDAG, as splitting may spread the evaluation of branch conditions across basic blocks, which makes them difficult to combine. --------- Co-authored-by: anoopkg6 <anoopkg6@github.com> Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>	2025-10-14 11:53:42 +02:00
Folkert de Vries	8a9e3333dd	s390x: optimize 128-bit fshl and fshr by high values (#154919 ) Turn a funnel shift by N in the range `121..128` into a funnel shift in the opposite direction by `128 - N`. Because there are dedicated instructions for funnel shifts by values smaller than 8, this emits fewer instructions. This additional rule is useful because LLVM appears to canonicalize `fshr` into `fshl`, meaning that the rules for `fshr` on values less than 8 would not match on organic input.	2025-08-27 09:31:49 +02:00
Folkert de Vries	558657298a	s390x: pattern match saturated truncation (#155377 ) Simplify min/max instruction matching by making the related SelectionDAG operations legal. Add patterns to match (signed and unsigned) saturated truncation based on open-coded min/max patterns. Fixes https://github.com/llvm/llvm-project/issues/153655	2025-08-26 17:19:58 +02:00
Nikita Popov	9d37e80d3c	[SystemZ] Remove custom CCState pre-analysis (#154091 ) The calling convention lowering now has access to OrigTy, so use that to detect short vectors.	2025-08-19 09:28:09 +02:00
Nikita Popov	01bc742185	[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817 ) This ensures that the required fields are set, and also makes the construction more convenient.	2025-08-15 18:06:07 +02:00
sujianIBM	fc12fc635b	[SystemZ] Fix code in widening vector multiplication (#150836 ) Commit cdc7864 has an error which would wrongly fold widening multiplications into an even/odd widening operation. This PR fixes it and adds tests to check scenarios which should not be folded into an even/odd widening operation are actually not.	2025-07-31 13:18:23 -04:00
Boyao Wang	697beb3f17	[TargetLowering] Change getOptimalMemOpType and findOptimalMemOpLowering to take LLVM Context (#147664 ) Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering. So that we can use EVT::getVectorVT to generate EVT type in getOptimalMemOpType. Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).	2025-07-10 11:11:09 +08:00
MangalaPG	dd54b8e462	Clang-Tidy issues fixed in file SystemZISelLowering.cpp (#147251 ) Corrected variable names according to the clang-tidy standards. --------- Signed-off-by: MangalaPG <mangala.P.G@ibm.com>	2025-07-09 20:26:42 +02:00
Matt Arsenault	d8ef156379	DAG: Remove verifyReturnAddressArgumentIsConstant (#147240 ) The intrinsic argument is already marked with immarg so non-constant values are rejected by the IR verifier.	2025-07-07 16:28:47 +09:00
Jie Fu	842f4f711d	[Target] Prevent copying in loop variables (NFC) /data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2390:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct] for (const auto [Reg, N] : RegsToPass) { ^ /data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2390:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying for (const auto [Reg, N] : RegsToPass) { ^~~~~~~~~~~~~~~~~~~~~ & /data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2402:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct] for (const auto [Reg, N] : RegsToPass) ^ /data/llvm-project/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:2402:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying for (const auto [Reg, N] : RegsToPass) ^~~~~~~~~~~~~~~~~~~~~ & 2 errors generated.	2025-06-29 14:24:41 +08:00
Kazu Hirata	9cf251d9d8	[Target] Use range-based for loops (NFC) (#146253 )	2025-06-28 20:41:39 -07:00
Matt Arsenault	48155f93dd	CodeGen: Emit error if getRegisterByName fails (#145194 ) This avoids using report_fatal_error and standardizes the error message in a subset of the error conditions.	2025-06-23 16:33:35 +09:00
Iris Shi	24d730b380	Reland "[SelectionDAG] Make `(a & x) \| (~a & y) -> (a & (x ^ y)) ^ y` available for all targets" (#143651 )	2025-06-11 15:56:37 +08:00
Iris Shi	8c890eaa3f	Revert "[SelectionDAG] Make `(a & x) \| (~a & y) -> (a & (x ^ y)) ^ y` available for all targets" (#143648 )	2025-06-11 10:19:12 +08:00
Iris Shi	bfb48363b0	[SelectionDAG] Make `(a & x) \| (~a & y) -> (a & (x ^ y)) ^ y` available for all targets (#137641 )	2025-06-09 17:57:15 +08:00
Matt Arsenault	0a3e9aa336	SystemZ: Move runtime libcall setting out of TargetLowering (#142622 ) RuntimeLibcallInfo needs to be correct outside of codegen contexts.	2025-06-04 06:21:46 +09:00
Rahul Joshi	52c2e45c11	[NFC][CodeGen] Adopt MachineFunctionProperties convenience accessors (#141101 )	2025-05-23 08:30:29 -07:00
Craig Topper	dcd62f3674	[SelectionDAG] Rename MemSDNode::getOriginalAlign to getBaseAlign. NFC (#139930 ) This matches the underlying function in MachineMemOperand and how it is printed when BaseAlign differs from Align.	2025-05-16 09:37:02 -07:00
Jonas Paulsson	94a14f9f0d	[SystemZ] Add DAGCombine for FCOPYSIGN to remove rounding. (#136131 ) Add a DAGCombine for FCOPYSIGN that removes the rounding which is never needed as the sign bit is already in the correct place. This helps in particular the rounding to f16 case which needs a libcall. Also remove the roundings for other FP VTs and simplify the CPSDR patterns correspondingly. fp-copysign-03.ll test updated, now also covering the other FP VT combinations.	2025-04-24 11:05:51 +02:00
Jonas Paulsson	1ec22fae7e	[SystemZ] Handle f16 load positive/negative/complement without libcalls. (#136286 ) This can be done directly with the (64-bit) target instruction as only the sign bit is changed.	2025-04-24 10:49:40 +02:00
Craig Topper	f6178cdad0	[SelectionDAG] Pass LoadExtType when ATOMIC_LOAD is created. (#136653 ) Rename one signature of getAtomic to getAtomicLoad and pass LoadExtType. Previously we had to set the extension type after the node was created, but we don't usually modify SDNodes once they are created. It's possible the node already existed and has been CSEd. If that happens, modifying the node may affect the other users. It's therefore safer to add the extension type at creation so that it is part of the CSE information. I don't know of any failures related to the current implementation. I only noticed that it doesn't match how we usually do things.	2025-04-22 09:11:46 -07:00
Kazu Hirata	8a00efd26d	[SystemZ] Fix warnings This patch fixes: llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:6916:7: error: unused variable 'RegVT' [-Werror,-Wunused-variable] llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp:1265:30: error: unused variable 'RC' [-Werror,-Wunused-variable]	2025-04-16 11:25:55 -07:00
Jonas Paulsson	6d03f51f0c	[SystemZ] Add support for 16-bit floating point. (#109164 ) - _Float16 is now accepted by Clang. - The half IR type is fully handled by the backend. - These values are passed in FP registers and converted to/from float around each operation. - Compiler-rt conversion functions are now built for s390x including the missing extendhfdf2 which was added. Fixes #50374	2025-04-16 20:02:56 +02:00
Ulrich Weigand	80267f8148	Support z17 processor name and scheduler description (#135254 ) The recently announced IBM z17 processor implements the architecture already supported as "arch15" in LLVM. This patch adds support for "z17" as an alternate architecture name for arch15. This patch also adds the scheduler description for the z17 processor, provided by Jonas Paulsson.	2025-04-11 00:20:58 +02:00
Jonas Paulsson	b13373db25	[SystemZ] Use hasAddressTaken() with verifyNarrowIntegerArgs (NFC). (#131039 ) Use hasAddressTaken() in SystemZ instead of doing this computation in isFullyInternal(), and make sure to only do this once per Function.	2025-03-21 19:07:46 +01:00
Ulrich Weigand	f4ea1055ad	[SystemZ] Implement i128 funnel shifts These can be handled via the VECTOR SHIFT LEFT/RIGHT DOUBLE family of instructions, depending on architecture level. Fixes: https://github.com/llvm/llvm-project/issues/129955	2025-03-15 18:28:44 +01:00
Ulrich Weigand	4155cc0fb3	[SystemZ] Recognize carry/borrow computation Generate code using the VECTOR ADD COMPUTE CARRY and VECTOR SUBTRACT COMPUTE BORROW INDICATION instructions to implement open-coded IR with those semantics. Handles integer vector types as well as i128. Fixes: https://github.com/llvm/llvm-project/issues/129608	2025-03-15 18:28:44 +01:00
Ulrich Weigand	4a4987be36	[SystemZ] Optimize vector zero/sign extensions Generate more efficient code for zero or sign extensions where the source is a subvector generated via SHUFFLE_VECTOR. Specifically, recognize patterns corresponding to (series of) VECTOR UNPACK instructions, or the VECTOR SIGN EXTEND TO DOUBLEWORD instruction. As a special case, also handle zero or sign extensions of a vector element to i128. Fixes: https://github.com/llvm/llvm-project/issues/129576 Fixes: https://github.com/llvm/llvm-project/issues/129899	2025-03-15 18:28:44 +01:00
Ulrich Weigand	cdc7864986	[SystemZ] Optimize widening and high-word vector multiplication Detect (non-intrinsic) IR patterns corresponding to the semantics of the various widening and high-word multiplication instructions. Specifically, this is done by: - Recognizing even/odd widening multiplication patterns in DAGCombine - Recognizing widening multiply-and-add on top during ISel - Implementing the standard MULHS/MULHU IR opcodes - Detecting high-word multiply-and-add (which common code does not) Depending on architecture level, this can support all integer vector types as well as the scalar i128 type. Fixes: https://github.com/llvm/llvm-project/issues/129705	2025-03-15 18:28:44 +01:00
Ulrich Weigand	7af3d3929e	[SystemZ] Optimize vector comparison reductions Generate efficient code using the condition code set by the VECTOR (FP) COMPARE family of instructions to implement vector comparison reductions, e.g. as resulting from __builtin_reduce_and/or of some vector comparison. Fixes: https://github.com/llvm/llvm-project/issues/129434	2025-03-15 18:28:44 +01:00
Jonas Paulsson	378739f182	[SystemZ] Move disabling of arg verification to before isFullyInternal(). (#130693 ) Traversing the users of a function from each call site has been found to cause quite a slowdown when the function is called many (~70k) times. This patch fixes this for now as long as this verification is disabled by default, but there is still a need to eventually cache the results to avoid recomputation. Fixes #130541	2025-03-12 18:33:12 +01:00
Ulrich Weigand	adacbf68eb	[SystemZ] Add codegen support for llvm.roundeven This is straightforward as we already had all the necessary instructions, they simply were not wired up. Also allows implementing the vec_round intrinsic via the standard llvm.roundeven IR instead of a platform intrinsic now.	2025-02-14 00:10:37 +01:00
Kazu Hirata	5a056f91be	[SystemZ] Avoid repeated hash lookups (NFC) (#126005 ) Co-authored-by: Nikita Popov <github@npopov.com>	2025-02-06 16:22:31 -08:00
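
Three of the commit messages above rest on small arithmetic facts that can be checked directly: a half-precision subnormal stops being subnormal once extended to f32 (6eb5ac52ca), a 128-bit funnel shift left by N equals a funnel shift right by 128 - N (8a9e3333dd), and `(a & x) | (~a & y)` equals `(a & (x ^ y)) ^ y` (bfb48363b0). The following is a minimal Python sketch of those checks, not code from LLVM. All function names are hypothetical, and the funnel-shift helpers assume the shift amount is taken modulo the bit width, so only amounts 121 to 127 are exercised.

```python
import struct

MASK128 = (1 << 128) - 1


def f16_is_subnormal(bits: int) -> bool:
    """True if a 16-bit pattern encodes a nonzero subnormal binary16 value."""
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    return exponent == 0 and mantissa != 0


def extend_f16(bits: int) -> float:
    """Decode a binary16 bit pattern into a Python float (exact)."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


def f32_is_subnormal(value: float) -> bool:
    """True if value, stored as binary32, is a nonzero subnormal."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0 and mantissa != 0


def fshl128(hi: int, lo: int, n: int) -> int:
    """Funnel shift left: high 128 bits of (hi:lo) << (n mod 128)."""
    n %= 128
    return ((((hi << 128) | lo) << n) >> 128) & MASK128


def fshr128(hi: int, lo: int, n: int) -> int:
    """Funnel shift right: low 128 bits of (hi:lo) >> (n mod 128)."""
    n %= 128
    return (((hi << 128) | lo) >> n) & MASK128


def select_or(a: int, x: int, y: int, width: int = 64) -> int:
    """Bitwise select written as (a & x) | (~a & y)."""
    mask = (1 << width) - 1
    return ((a & x) | (~a & y)) & mask


def select_xor(a: int, x: int, y: int, width: int = 64) -> int:
    """The same select written as (a & (x ^ y)) ^ y."""
    mask = (1 << width) - 1
    return ((a & (x ^ y)) ^ y) & mask


if __name__ == "__main__":
    smallest = 0x0001  # smallest positive binary16 subnormal
    print(f16_is_subnormal(smallest), f32_is_subnormal(extend_f16(smallest)))
    hi, lo = 0x0123456789ABCDEF0FEDCBA987654321, 0xDEADBEEFCAFEBABE0123456789ABCDEF
    print(all(fshl128(hi, lo, n) == fshr128(hi, lo, 128 - n) for n in range(121, 128)))
    print(select_or(0b1100, 0b1010, 0b0101, 4) == select_xor(0b1100, 0b1010, 0b0101, 4))
```

The f16 check shows why a subnormal test applied after extending to f32 cannot succeed: every nonzero binary16 subnormal is far above the smallest normal binary32 value, so the extended number always has a nonzero exponent field.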


645 Commits