llvm-project

Author	SHA1	Message	Date
Jonas Paulsson	16b7cc69ef	[SystemZ] Eliminate call sequence instructions early. (#77812) On SystemZ, the outgoing argument area which is big enough for all calls in the function is created once during the prolog, as opposed to adjusting the stack around each call. The call-sequence instructions are therefore no longer useful for anything other than computing the maximum call frame size, which has so far been done by PEI, but can just as well be done at an earlier point. This patch removes the mapping of the CallFrameSetupOpcode and CallFrameDestroyOpcode and instead computes the MaxCallFrameSize directly after instruction selection and then removes the ADJCALLSTACK pseudos. This removes the confusing pseudos and also avoids the problem of having to keep the call frame size accurate when creating new MBBs. This fixes #76618 which exposed the need to maintain the call frame size when splitting blocks (which was not done).	2024-03-28 18:26:38 +01:00
Ulrich Weigand	4b907414d2	[SystemZ] Add support for llvm.readcyclecounter The llvm.readcyclecounter intrinsic can be implemented via the STORE CLOCK FAST (STCKF) instruction.	2024-03-22 20:01:02 +01:00
Jonas Paulsson	8b8e1adbde	[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879) - Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE nodes to regular LOAD/STORE nodes, make them legal and select those nodes properly instead. This avoids exposing them to the DAGCombiner. - AtomicExpand pass no longer casts float/double atomic load/stores to integer (FP128 is still cast).	2024-03-18 17:21:50 -04:00
Yusra Syeda	0768253c20	[SystemZ][z/OS] Add exception handling for XPLINK (#74638) Adds emitting the exception table and the EH registers for XPLINK. --------- Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>	2023-12-19 13:58:33 -05:00
Ulrich Weigand	59f7f35a90	[SystemZ] ABI support for single-element vector types Support passing and returning values of single-element vector types (i.e. <1 x i128> and <1 x fp128>). Now that i128 is a legal type, supporting these types can be done simply by providing a getRegisterTypeForCallingConv implementation that handles them. Fixes https://github.com/llvm/llvm-project/issues/61291	2023-12-15 19:31:00 +01:00
Ulrich Weigand	a65ccc1b9f	[SystemZ] Support i128 as legal type in VRs (#74625) On processors supporting vector registers and SIMD instructions, enable i128 as legal type in VRs. This allows many operations to be implemented via native instructions directly in VRs (including add, subtract, logical operations and shifts). For a few other operations (e.g. multiply and divide, as well as atomic operations), we need to move the i128 value back to a GPR pair to use the corresponding instruction there. Overall, this is still beneficial. The patch includes the following LLVM changes: - Enable i128 as legal type - Set up legal operations (in SystemZInstrVector.td) - Custom expansion for i128 add/subtract with carry - Custom expansion for i128 comparisons and selects - Support for moving i128 to/from GPR pairs when required - Handle 128-bit integer constant values everywhere - Use i128 as intrinsic operand type where appropriate - Updated and new test cases In addition, clang builtins are updated to reflect the intrinsic operand type changes (which also improves compatibility with GCC).	2023-12-15 12:55:15 +01:00
Jonas Paulsson	435ba72afd	[SystemZ] Simplify handling of AtomicRMW instructions. (#74789) Let the AtomicExpand pass do more of the job of expanding AtomicRMWInst:s in order to simplify the handling in the backend. The only cases that the backend needs to handle itself are those of subword size (8/16 bits) and those directly corresponding to a target instruction.	2023-12-08 17:19:17 +01:00
Ulrich Weigand	c61eb44005	[SystemZ] Implement vector rotate in terms of funnel shift Clang currently implements a set of vector rotate builtins (__builtin_s390_verll) in terms of platform-specific LLVM intrinsics. To simplify the IR (and allow for common code optimizations if applicable), this patch removes those LLVM intrinsics and implements the builtins in terms of the platform-independent funnel shift intrinsics instead. Also, fix the prototype of the __builtin_s390_verll builtins for full compatibility with GCC.	2023-12-04 16:52:00 +01:00
Ilya Leoshkevich	03934e70ef	[SystemZ] Enable AtomicExpand pass (#70398) The upcoming OpenMP support for SystemZ requires handling of IR insns like `atomicrmw fadd`. Normally atomic float operations are expanded by Clang and such insns do not occur, but OpenMP generates them directly. Other architectures handle this using the AtomicExpand pass, which SystemZ did not need so far. Enable it. Currently AtomicExpand treats atomic load and stores of floats pessimistically: it casts them to integers, which SystemZ does not need, since the floating point load and store instructions are already atomic. However, the way Clang currently expands them is pessimistic as well, so this change does not make things worse. Optimizing operations on atomic floats can be a separate change in the future. This change does not create any differences in the Linux kernel build.	2023-10-31 09:51:06 +01:00
Nick Desaulniers	330fa7d2a4	[TargetLowering] Deduplicate choosing InlineAsm constraint between ISels (#67057) Given a list of constraints for InlineAsm (ex. "imr") I'm looking to modify the order in which they are chosen. Before doing so, I noticed a fair amount of logic is duplicated between SelectionDAGISel and GlobalISel for this. That is because SelectionDAGISel is also trying to lower immediates during selection. If we detangle these concerns into: 1. choose the preferred constraint 2. attempt to lower that constraint Then we can slide down the list of constraints until we find one that can be lowered. That allows the implementation to be shared between instruction selection frameworks. This makes it so that later I might only need to adjust the priority of constraints in one place, and have both selectors behave the same.	2023-09-25 08:53:03 -07:00
Nick Desaulniers	86735a4353	reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66264) reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003) This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5. Fix up build failures in targets I missed in #66003 Kept as 3 commits for reviewers to see better what's changed. Will squash when merging. - reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003) - fix all the targets I missed in #66003 - fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll	2023-09-13 13:31:24 -07:00
Yusra Syeda	163aad6bcb	[SystemZ][z/OS] z/OS ADA codegen and emission This patch adds support for the ADA (associated data area), doing the following: -Creates the ADA table to handle displacements -Emits the ADA section in the SystemZAsmPrinter -Lowers the ADA_ENTRY node into the appropriate load instruction Differential Revision: https://reviews.llvm.org/D153788	2023-07-05 13:21:52 -04:00
Yusra Syeda	1bfdc534aa	Revert "[SystemZ][z/OS] This patch adds support for the ADA (associated data area), doing the following:" This reverts commit 9df0f66af5462e23216eae31aedbd4d2f459cc3d.	2023-06-28 11:18:12 -04:00
Yusra Syeda	9df0f66af5	[SystemZ][z/OS] This patch adds support for the ADA (associated data area), doing the following: - Creates the ADA table to handle displacements - Emits the ADA section in the SystemZAsmPrinter - Lowers the ADA_ENTRY node into the appropriate load instruction Differential Revision: https://reviews.llvm.org/D153788	2023-06-28 10:13:10 -04:00
Sergei Barannikov	e744e51b12	[SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC) This will make them consistent with other overflow-aware nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D148196	2023-04-29 21:59:58 +03:00
Craig Topper	219ff07f72	[Targets] Rename Flag->Glue. NFC Long long ago Glue was called Flag, and it was never completely renamed.	2023-04-02 19:28:51 -07:00
Matt Arsenault	09dd4d870e	DAG: Remove hasBitPreservingFPLogic This doesn't make sense as an option. fneg and fabs are bit preserving by definition. If a target has some fneg or fabs instructions that are not bit preserving, it's incorrect to lower fneg/fabs to use them.	2023-02-14 10:25:24 -04:00
Jonas Paulsson	0ece2050da	[SystemZ] Implement isGuaranteedNotToBeUndefOrPoisonForTargetNode(). Returning true from this method for PCREL_WRAPPER and PCREL_OFFSET avoids problems when a PCREL_OFFSET node ends up with a freeze operand, which is not handled or expected by the backend. Fixes #60107 Reviewed By: uweigand, RKSimon Differential Revision: https://reviews.llvm.org/D142971	2023-02-01 13:28:18 +01:00
Tulio Magno Quites Machado Filho	1136cf1721	[SystemZ] Implement lowering of GET_ROUNDING Add support for _FLT_ROUNDS_ in SystemZ. Patch by Tulio Magno Quites Machado Filho. Reviewed By: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D140988	2023-01-18 14:41:19 -06:00
Philip Reames	037636e695	[SDAG] Introduce a common MEMBARRIER node [nfc] We have multiple targets which have defined custom instructions and sdag nodes to represent a compiler memory barrier. This patch consolidates the sdag node definition into common code. This is a companion to D92842, but a bit different in focus. This change consolidates the existing sdag node definitions; that patch skipped defining a sdag node by instead going straight to a target node. That patch is also not NFC - as being so is quite hard for commoning up the instruction definitions. I started with two backends to ensure the new common code was reusable while not having a massive diff. Once this lands, I'll submit a series of NFCs for backends where the changes are obvious, or reviews if more discussion is needed. Differential Revision: https://reviews.llvm.org/D141317	2023-01-09 15:20:08 -08:00
Krzysztof Parzyszek	864aaa21b4	TargetLowering: convert Optional to std::optional	2022-12-01 16:19:10 -08:00
Stanislav Mekhanoshin	bcaf31ec3f	[AMDGPU] Allow finer grain control of an unaligned access speed A target can return if a misaligned access is 'fast' as defined by the target or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed. A target can still define it as it wants and the direct translation of the current code uses 0 and 1 for current false and true. This makes the change an NFC. A subsequent patch will start using an actual value of speed in the load/store vectorizer to check whether a vectorized access is going to be not just fast, but not slower than before. Differential Revision: https://reviews.llvm.org/D124217	2022-11-17 09:23:53 -08:00
Josh Stone	4dcfb09e40	[NFC][CodeGen] Use const MF in TargetLowering stack probe functions This makes them callable from places like canUseAsPrologue. Differential Revision: https://reviews.llvm.org/D134492	2022-09-23 09:30:32 -07:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Kazu Hirata	d66cbc565a	Don't use Optional::hasValue (NFC)	2022-06-20 20:26:05 -07:00
Jonas Paulsson	eaa78035c6	[SystemZ] Patchset for expanding memcpy/memset using at most two stores. * Set MaxStoresPerMemcpy and MaxStoresPerMemset to 2. * Optimize stores of replicated values in SystemZ::combineSTORE(). This handles the now expanded memory operations and as well some other pre-existing cases. * Reject a big displacement in isLegalAddressingMode() for a vector type. * Return true from shouldConsiderGEPOffsetSplit(). Reviewed By: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D122105	2022-05-13 15:31:09 +02:00
Serge Pavlov	c96cc500f0	[SystemZ] Custom lowering of llvm.is_fpclass Differential Revision: https://reviews.llvm.org/D114695	2022-04-29 13:27:36 +07:00
Jonas Paulsson	4aa5dc15f0	[SystemZ] Handle SystemZ specific inline assembly address operands. Handle ZQ, ZR, ZS and ZT inline assembly operand constraints. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D110267	2022-04-19 16:55:45 +02:00
Kai Nacke	30053c1445	[SystemZ/z/OS] Add va intrinsics for XPLINK Add support for va intrinsics for the XPLINK ABI. Only the extended vararg variant, which uses a pointer to next argument, is supported. The standard variant will build on this. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D120148	2022-02-22 14:35:05 -05:00
Kai Nacke	713496d9c9	[SystemZ/z/OS] Add XPLINK dynamic stack allocation With XPLINK, dynamic stack allocation requires calling a runtime function, which allocates the stack memory, moves the register save area, and returns the new stack pointer. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D119732	2022-02-14 13:35:28 -05:00
Jonas Paulsson	9ca9fee6e8	[SystemZ] Don't shrink 64-bit FP constants. Return false from ShouldShrinkFPConstant(), so that these constants are stored in their full size on the constant pool, even if they could have been shrunk and used with an extending load. This is better since LD is faster than LDE, and it also enables reg/mem opcodes. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D117927	2022-01-27 16:14:53 -06:00
Kazu Hirata	41bfac6aed	[Target] Remove unused forward declarations (NFC)	2022-01-02 10:20:15 -08:00
Jonas Paulsson	cbf682cb1c	[SystemZ] Improve codegen for memset. Memset with a constant length was implemented with a single store followed by a series of MVC:s. This patch changes this so that one store of the byte is emitted for each MVC, which avoids data dependencies between the MVCs. An MVI/STC + MVC(len-1) is done for each block. In addition, memset with a variable length is now also handled without a libcall. Since the byte is first stored and then MVC is used from that address, a length of two must now be subtracted instead of one for the loop and EXRL. This requires an extra check for the one-byte case, which is handled in a special block with just a single MVI/STC (like GCC). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D112004	2021-12-06 12:10:58 -06:00
Jonas Paulsson	c0d88613f2	[SystemZ] Remove some now unused ISD XXX_LOOP opcodes.	2021-10-14 14:55:44 +02:00
Jonas Paulsson	8b32e25bc2	[SystemZ] Return true from convertSetCCLogicToBitwiseLogic for scalar integer. Review: Ulrich Weigand	2021-06-08 16:27:28 -05:00
Jonas Paulsson	d5e4f28c0a	[SystemZ] Return true from isMaskAndCmp0FoldingBeneficial(). Return true if the mask is a constant uint of 2 bytes, in which case TMLL is available. Review: Ulrich Weigand	2021-06-08 15:42:46 -05:00
Jonas Paulsson	9ee3f16919	[SystemZ] Return true from hasBitPreservingFPLogic(). This is currently NFC on benchmarks and tests. Review: Ulrich Weigand	2021-06-01 11:52:50 -05:00
Ulrich Weigand	c123c178b2	[SystemZ] Set getExtendForAtomicOps to ISD::ANY_EXTEND The implementation of subword atomics does not actually guarantee the result is zero-extended, which now caused build bot failures after https://reviews.llvm.org/D101342 was landed.	2021-05-29 12:15:18 +02:00
Jonas Paulsson	d058262b14	[SystemZ] Support i128 inline asm operands. Support virtual, physical and tied i128 register operands in inline assembly. i128 is on SystemZ not really supported and is not a legal type and generally such a value will be split into two i64 parts. There are however some instructions that require a pair of two GPR64 registers contained in the GR128 bit reg class, which is untyped. For inline assembly operands, it proved to be very cumbersome to first follow the general behavior of splitting an i128 operand into two parts and then later rebuild the INLINEASM MI to have one GR128 register. Instead, some minor common code changes were made to SelectionDAGBuilder to only create one GR128 register part to begin with. In particular: - getNumRegisters() now has an optional parameter "RegisterVT" which is passed by AddInlineAsmOperands() and GetRegistersForValue(). - The bitcasting in GetRegistersForValue is not performed if RegVT is Untyped. - The RC for a tied use in AddInlineAsmOperands() is now computed either from the tied def (virtual register), or by getMinimalPhysRegClass() (physical register). - InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register class (DstRC) can also be computed for an illegal type. In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and joinRegisterPartsIntoValue() have been implemented to handle i128 operands. Differential Revision: https://reviews.llvm.org/D100788 Review: Ulrich Weigand	2021-05-26 10:08:32 -05:00
Jonas Paulsson	e77cb4ae63	[SystemZ] Return true from preferZeroCompareBranch(). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D103057	2021-05-25 10:24:14 -05:00
Jonas Paulsson	7334b3dc3e	[SystemZ] Reimplement the i8/i16 compare-and-swap logic. Even though the implementation in emitAtomicCmpSwapW() was correct, it made Valgrind report an error. Instead of using a RISBG on CmpVal, an LL[CH]R can be made on the OldVal, and the problem is avoided. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D97604	2021-03-03 14:04:32 -06:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Jonas Paulsson	653b97690f	[SystemZ] Improve handling of backchain offset. - New function SDValue getBackchainAddress() used by lowerDYNAMIC_STACKALLOC() and lowerSTACKRESTORE() to properly handle the backchain offset also with packed-stack. - Make a common function getBackchainOffset() for the computation of the backchain offset and use in some places (NFC). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D93171	2020-12-14 12:39:38 -06:00
Jonas Paulsson	45b8e37afc	[SystemZ] Use ISD::ABS opcode during isel. The SystemZISD::IABS node is no longer needed since ISD::ABS can be used instead. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D91697	2020-11-18 14:43:55 +01:00
Jonas Paulsson	ef7aad0db4	[SystemZ] Improve handling of ZERO_EXTEND_VECTOR_INREG. Instead of doing multiple unpacks when zero extending vectors (e.g. v2i16 -> v2i64), benchmarks have shown that it is better to do a VPERM (vector permute) since that is only one sequential instruction on the critical path. This patch achieves this by 1. Expand ZERO_EXTEND_VECTOR_INREG into a vector shuffle with a zero vector instead of (multiple) unpacks. 2. Improve SystemZ::GeneralShuffle to perform a single unpack as the last operation if Bytes matches it. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D78486	2020-06-30 09:08:10 +02:00
Jonas Paulsson	515bfc66ea	[SystemZ] Implement -fstack-clash-protection Probing of allocated stack space is now done when this option is passed. The purpose is to protect against the stack clash attack (see https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D78717	2020-06-06 18:38:36 +02:00
Ulrich Weigand	947f78ac27	[SystemZ] Fix/optimize vec_load_len and related intrinsics When using vec_load/store_len_r with an immediate length operand of 16 or larger, LLVM will currently emit a VLRL/VSTRL instruction with that immediate. This creates a valid encoding (which should be supported by the assembler), but always traps at runtime. This patch fixes this by not creating VLRL/VSTRL in those cases. This would result in loading the length into a register and calling VLRLR/VSTRLR instead. However, these operations with a length of 15 or larger are in fact simply equivalent to a full vector load or store. And in fact the same holds true for vec_load/store_len as well. Therefore, add a DAGCombine rule to replace those operations with plain vector loads or stores if the length is known at compile time and equal to or larger than 15.	2020-05-06 21:15:58 +02:00
Matt Arsenault	84aa58cbe2	CodeGen: Use Register in TargetLowering	2020-04-08 12:10:58 -04:00
Jonas Paulsson	132f25bcca	[SystemZ] Avoid scalarization of [SU]INT_TO_FP ISD-nodes. The type legalizer will scalarize vector conversions from integer to floating point if the source element size is less than that of the result. This is avoided now by inserting a zero/sign-extension of the source vector before type legalization. Review: Ulrich Weigand Differential revision: https://reviews.llvm.org/D75978	2020-03-16 13:07:42 +01:00
Jonas Paulsson	cdcce3cabf	[SystemZ] Also accept ISD::USUBO in shouldFormOverflowOp(). Forming subtract with overflow is beneficial on SystemZ, just like additions. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D75290	2020-03-03 14:38:57 +01:00


206 Commits