206 Commits

Author SHA1 Message Date
Jonas Paulsson
16b7cc69ef
[SystemZ] Eliminate call sequence instructions early. (#77812)
On SystemZ, the outgoing argument area which is big enough for all calls
in the function is created once during the prolog, as opposed to
adjusting the stack around each call. The call-sequence instructions are
therefore useful only for computing the maximum call frame size, which
has so far been done by PEI but can just as well be done at an earlier
point.

This patch removes the mapping of the CallFrameSetupOpcode and
CallFrameDestroyOpcode and instead computes the MaxCallFrameSize
directly after instruction selection and then removes the ADJCALLSTACK
pseudos. This removes the confusing pseudos and also avoids the problem
of having to keep the call frame size accurate when creating new MBBs.

This fixes #76618 which exposed the need to maintain the call frame size
when splitting blocks (which was not done).
2024-03-28 18:26:38 +01:00
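The idea behind 16b7cc69ef can be sketched in a few lines. This is a toy model in Python, not LLVM's C++: the `Instr` type, the opcode strings, and `eliminate_call_sequences` are hypothetical stand-ins for illustration. Since the outgoing argument area is allocated once in the prolog, the only information the call-sequence pseudos carry is the largest frame size any call needs, so they can be dropped as soon as that maximum is recorded.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: str          # hypothetical opcode name, not an LLVM enum
    frame_size: int = 0  # outgoing-argument bytes; only used by ADJCALLSTACKDOWN


def eliminate_call_sequences(block):
    """Return (max_call_frame_size, block without the ADJCALLSTACK pseudos)."""
    max_size = 0
    kept = []
    for instr in block:
        if instr.opcode == "ADJCALLSTACKDOWN":
            # Record the size this call needs, then drop the pseudo.
            max_size = max(max_size, instr.frame_size)
        elif instr.opcode == "ADJCALLSTACKUP":
            # Carries no extra information in this model; drop it.
            continue
        else:
            kept.append(instr)
    return max_size, kept
```

With the pseudos gone right after instruction selection, a later pass that splits a block has no call frame size to keep in sync.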
Ulrich Weigand
4b907414d2 [SystemZ] Add support for llvm.readcyclecounter
The llvm.readcyclecounter intrinsic can be implemented via the
STORE CLOCK FAST (STCKF) instruction.
2024-03-22 20:01:02 +01:00
Jonas Paulsson
8b8e1adbde
[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879)
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.

- AtomicExpand pass no longer casts float/double atomic load/stores to integer
  (FP128 is still cast).
2024-03-18 17:21:50 -04:00
Yusra Syeda
0768253c20
[SystemZ][z/OS] Add exception handling for XPLINK (#74638)
Adds emitting the exception table and the EH registers for XPLINK.

---------

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-12-19 13:58:33 -05:00
Ulrich Weigand
59f7f35a90 [SystemZ] ABI support for single-element vector types
Support passing and returning values of single-element vector
types (i.e. <1 x i128> and <1 x fp128>).

Now that i128 is a legal type, supporting these types can be
done simply by providing a getRegisterTypeForCallingConv
implementation that handles them.

Fixes https://github.com/llvm/llvm-project/issues/61291
2023-12-15 19:31:00 +01:00
Ulrich Weigand
a65ccc1b9f
[SystemZ] Support i128 as legal type in VRs (#74625)
On processors supporting vector registers and SIMD instructions, enable
i128 as legal type in VRs. This allows many operations to be implemented
via native instructions directly in VRs (including add, subtract,
logical operations and shifts). For a few other operations (e.g.
multiply and divide, as well as atomic operations), we need to move the
i128 value back to a GPR pair to use the corresponding instruction
there. Overall, this is still beneficial.

The patch includes the following LLVM changes:
- Enable i128 as legal type
- Set up legal operations (in SystemZInstrVector.td)
- Custom expansion for i128 add/subtract with carry
- Custom expansion for i128 comparisons and selects
- Support for moving i128 to/from GPR pairs when required
- Handle 128-bit integer constant values everywhere
- Use i128 as intrinsic operand type where appropriate
- Updated and new test cases

In addition, clang builtins are updated to reflect the intrinsic operand
type changes (which also improves compatibility with GCC).
2023-12-15 12:55:15 +01:00
Jonas Paulsson
435ba72afd
[SystemZ] Simplify handling of AtomicRMW instructions. (#74789)
Let the AtomicExpand pass do more of the job of expanding
AtomicRMWInst:s in order to simplify the handling in the backend.

The only cases that the backend needs to handle itself are those of
subword size (8/16 bits) and those directly corresponding to a target
instruction.
2023-12-08 17:19:17 +01:00
Ulrich Weigand
c61eb44005 [SystemZ] Implement vector rotate in terms of funnel shift
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics.  To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.

Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
2023-12-04 16:52:00 +01:00
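The reason a rotate can be expressed as a funnel shift (c61eb44005) is that a funnel shift left with both inputs equal is exactly a rotate left. A small Python model, assuming 32-bit elements; the helper names `fshl` and `rotl_vector` are made up for this sketch and are not the Clang builtins themselves:

```python
def fshl(hi, lo, shift, bits=32):
    """Funnel shift left: concatenate hi:lo, shift left, keep the top `bits` bits."""
    mask = (1 << bits) - 1
    shift %= bits  # the shift amount is taken modulo the element width
    concat = ((hi & mask) << bits) | (lo & mask)
    return ((concat << shift) >> bits) & mask


def rotl_vector(vec, shift, bits=32):
    """Rotate every element left by `shift`, expressed as fshl(x, x, shift)."""
    return [fshl(x, x, shift, bits) for x in vec]
```

For example, `rotl_vector([0x80000001], 1)` moves the top bit around to the bottom of the element.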
Ilya Leoshkevich
03934e70ef
[SystemZ] Enable AtomicExpand pass (#70398)
The upcoming OpenMP support for SystemZ requires handling of IR insns
like `atomicrmw fadd`. Normally atomic float operations are expanded by
Clang and such insns do not occur, but OpenMP generates them directly.
Other architectures handle this using the AtomicExpand pass, which
SystemZ did not need so far. Enable it.

Currently AtomicExpand treats atomic load and stores of floats
pessimistically: it casts them to integers, which SystemZ does not need,
since the floating point load and store instructions are already atomic.
However, the way Clang currently expands them is pessimistic as well, so
this change does not make things worse. Optimizing operations on atomic
floats can be a separate change in the future.

This change does not create any differences in the Linux kernel build.
2023-10-31 09:51:06 +01:00
Nick Desaulniers
330fa7d2a4
[TargetLowering] Deduplicate choosing InlineAsm constraint between ISels (#67057)
Given a list of constraints for InlineAsm (ex. "imr") I'm looking to
modify the order in which they are chosen. Before doing so, I noticed a
fair amount of logic is duplicated between SelectionDAGISel and GlobalISel
for this.

That is because SelectionDAGISel is also trying to lower immediates
during selection. If we detangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint

Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.

This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
2023-09-25 08:53:03 -07:00
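The two-step split described in 330fa7d2a4 (choose the preferred constraint, then attempt to lower it, sliding down the list on failure) can be sketched as follows. This is a Python illustration only: `choose_and_lower`, the `priority` table, and the `try_lower` callback are hypothetical and do not mirror the actual TargetLowering interface or its real priority order.

```python
def choose_and_lower(constraints, priority, try_lower):
    """Try constraints in priority order; return the first one that lowers.

    constraints: constraint codes, e.g. "imr"
    priority:    maps a code to a rank, lower rank is tried first (assumed)
    try_lower:   returns a lowered operand, or None if lowering fails
    """
    for code in sorted(constraints, key=lambda c: priority[c]):
        lowered = try_lower(code)
        if lowered is not None:
            return code, lowered
    return None  # no constraint in the list could be lowered


# Example: the operand is not a compile-time constant, so "i" cannot be
# lowered and the selection slides down to "r".
priority = {"i": 0, "r": 1, "m": 2}  # illustrative order only


def try_lower_non_constant(code):
    return None if code == "i" else f"operand-as-{code}"


result = choose_and_lower("imr", priority, try_lower_non_constant)
```

Because the ordering lives in one table, changing the priority changes the behavior of every caller that shares this routine.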
Nick Desaulniers
86735a4353
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66264)
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)

This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.

Fix up build failures in targets I missed in #66003

Kept as 3 commits for reviewers to see better what's changed. Will
squash when merging.

- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
2023-09-13 13:31:24 -07:00
Yusra Syeda
163aad6bcb [SystemZ][z/OS] z/OS ADA codegen and emission
This patch adds support for the ADA (associated data area), doing the following:

- Creates the ADA table to handle displacements
- Emits the ADA section in the SystemZAsmPrinter
- Lowers the ADA_ENTRY node into the appropriate load instruction

Differential Revision: https://reviews.llvm.org/D153788
2023-07-05 13:21:52 -04:00
Yusra Syeda
1bfdc534aa Revert "[SystemZ][z/OS] This patch adds support for the ADA (associated data area), doing the following:"
This reverts commit 9df0f66af5462e23216eae31aedbd4d2f459cc3d.
2023-06-28 11:18:12 -04:00
Yusra Syeda
9df0f66af5 [SystemZ][z/OS] This patch adds support for the ADA (associated data area), doing the following:
- Creates the ADA table to handle displacements
- Emits the ADA section in the SystemZAsmPrinter
- Lowers the ADA_ENTRY node into the appropriate load instruction

Differential Revision: https://reviews.llvm.org/D153788
2023-06-28 10:13:10 -04:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00
Craig Topper
219ff07f72 [Targets] Rename Flag->Glue. NFC
Long long ago Glue was called Flag, and it was never completely
renamed.
2023-04-02 19:28:51 -07:00
Matt Arsenault
09dd4d870e DAG: Remove hasBitPreservingFPLogic
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instruction that is not bit-preserving, it's incorrect to lower
fneg/fabs to use it.
2023-02-14 10:25:24 -04:00
Jonas Paulsson
0ece2050da [SystemZ] Implement isGuaranteedNotToBeUndefOrPoisonForTargetNode().
Returning true from this method for PCREL_WRAPPER and PCREL_OFFSET avoids
problems when a PCREL_OFFSET node ends up with a freeze operand, which is not
handled or expected by the backend.

Fixes #60107

Reviewed By: uweigand, RKSimon

Differential Revision: https://reviews.llvm.org/D142971
2023-02-01 13:28:18 +01:00
Tulio Magno Quites Machado Filho
1136cf1721 [SystemZ] Implement lowering of GET_ROUNDING
Add support for _FLT_ROUNDS_ in SystemZ.

Patch by Tulio Magno Quites Machado Filho.

Reviewed By: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D140988
2023-01-18 14:41:19 -06:00
Philip Reames
037636e695 [SDAG] Introduce a common MEMBARRIER node [nfc]
We have multiple targets which have defined custom instructions and sdag nodes to represent a compiler memory barrier. This patch consolidates the sdag node definition into common code.

This is a companion to D92842, but a bit different in focus. This change consolidates the existing sdag node definitions; that patch skipped defining a sdag node by instead going straight to a target node. That patch is also not NFC, since staying NFC is quite hard when commoning up the instruction definitions.

I started with two backends to ensure the new common code was reusable while not having a massive diff. Once this lands, I'll submit a series of NFCs for backends where the changes are obvious, or reviews if more discussion is needed.

Differential Revision: https://reviews.llvm.org/D141317
2023-01-09 15:20:08 -08:00
Krzysztof Parzyszek
864aaa21b4 TargetLowering: convert Optional to std::optional 2022-12-01 16:19:10 -08:00
Stanislav Mekhanoshin
bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

A subsequent patch will start using the actual speed value in
the load/store vectorizer to check that a vectorized access is
going to be not just fast, but also not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Josh Stone
4dcfb09e40 [NFC][CodeGen] Use const MF in TargetLowering stack probe functions
This makes them callable from places like canUseAsPrologue.

Differential Revision: https://reviews.llvm.org/D134492
2022-09-23 09:30:32 -07:00
Simon Pilgrim
f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Kazu Hirata
d66cbc565a Don't use Optional::hasValue (NFC) 2022-06-20 20:26:05 -07:00
Jonas Paulsson
eaa78035c6 [SystemZ] Patchset for expanding memcpy/memset using at most two stores.
* Set MaxStoresPerMemcpy and MaxStoresPerMemset to 2.

* Optimize stores of replicated values in SystemZ::combineSTORE(). This
  handles the now expanded memory operations as well as some other
  pre-existing cases.

* Reject a big displacement in isLegalAddressingMode() for a vector type.

* Return true from shouldConsiderGEPOffsetSplit().

Reviewed By: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D122105
2022-05-13 15:31:09 +02:00
Serge Pavlov
c96cc500f0 [SystemZ] Custom lowering of llvm.is_fpclass
Differential Revision: https://reviews.llvm.org/D114695
2022-04-29 13:27:36 +07:00
Jonas Paulsson
4aa5dc15f0 [SystemZ] Handle SystemZ specific inline assembly address operands.
Handle ZQ, ZR, ZS and ZT inline assembly operand constraints.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D110267
2022-04-19 16:55:45 +02:00
Kai Nacke
30053c1445 [SystemZ/z/OS] Add va intrinsics for XPLINK
Add support for va intrinsics for the XPLINK ABI.
Only the extended vararg variant, which uses a pointer to next
argument, is supported. The standard variant will build on this.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D120148
2022-02-22 14:35:05 -05:00
Kai Nacke
713496d9c9 [SystemZ/z/OS] Add XPLINK dynamic stack allocation
With XPLINK, dynamic stack allocation requires calling
a runtime function, which allocates the stack memory,
moves the register save area, and returns the new
stack pointer.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D119732
2022-02-14 13:35:28 -05:00
Jonas Paulsson
9ca9fee6e8 [SystemZ] Don't shrink 64-bit FP constants.
Return false from ShouldShrinkFPConstant(), so that these constants are stored
in their full size on the constant pool, even if they could have been shrunk
and used with an extending load.

This is better since LD is faster than LDE, and it also enables reg/mem opcodes.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D117927
2022-01-27 16:14:53 -06:00
Kazu Hirata
41bfac6aed [Target] Remove unused forward declarations (NFC) 2022-01-02 10:20:15 -08:00
Jonas Paulsson
cbf682cb1c [SystemZ] Improve codegen for memset.
Memset with a constant length was implemented with a single store followed by
a series of MVC:s. This patch changes this so that one store of the byte is
emitted for each MVC, which avoids data dependencies between the MVCs. An
MVI/STC + MVC(len-1) is done for each block.

In addition, memset with a variable length is now also handled without a
libcall. Since the byte is first stored and then MVC is used from that
address, a length of two must now be subtracted instead of one for the loop
and EXRL. This requires an extra check for the one-byte case, which is
handled in a special block with just a single MVI/STC (like GCC).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D112004
2021-12-06 12:10:58 -06:00
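The per-block scheme from cbf682cb1c relies on MVC copying bytes left to right, so an overlapping copy from `dst` to `dst + 1` propagates the freshly stored byte through the block. A Python model of that behavior, assuming a 256-byte block size; `mvc` and `memset_blocks` are illustrative names, not backend functions:

```python
def mvc(mem, dst, src, length):
    """Model MVC: copy `length` bytes left to right, one byte at a time."""
    for i in range(length):
        mem[dst + i] = mem[src + i]


def memset_blocks(mem, dst, byte, length, block=256):
    """Fill mem[dst:dst+length] with one byte store plus one MVC per block."""
    done = 0
    while done < length:
        n = min(block, length - done)
        start = dst + done
        mem[start] = byte  # MVI/STC: store the byte at the start of the block
        if n > 1:
            # Overlapping MVC(len-1): each byte is copied from its left neighbor.
            mvc(mem, start + 1, start, n - 1)
        done += n
```

Each block starts with its own store, so no MVC has to wait for the result of the MVC before it, which is the data dependency the commit removes.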
Jonas Paulsson
c0d88613f2 [SystemZ] Remove some now unused ISD XXX_LOOP opcodes. 2021-10-14 14:55:44 +02:00
Jonas Paulsson
8b32e25bc2 [SystemZ] Return true from convertSetCCLogicToBitwiseLogic for scalar integer.
Review: Ulrich Weigand
2021-06-08 16:27:28 -05:00
Jonas Paulsson
d5e4f28c0a [SystemZ] Return true from isMaskAndCmp0FoldingBeneficial().
Return true if the mask is a constant uint of 2 bytes, in which case TMLL is
available.

Review: Ulrich Weigand
2021-06-08 15:42:46 -05:00
Jonas Paulsson
9ee3f16919 [SystemZ] Return true from hasBitPreservingFPLogic().
This is currently NFC on benchmarks and tests.

Review: Ulrich Weigand
2021-06-01 11:52:50 -05:00
Ulrich Weigand
c123c178b2 [SystemZ] Set getExtendForAtomicOps to ISD::ANY_EXTEND
The implementation of subword atomics does not actually
guarantee the result is zero-extended, which now caused
build bot failures after https://reviews.llvm.org/D101342
was landed.
2021-05-29 12:15:18 +02:00
Jonas Paulsson
d058262b14 [SystemZ] Support i128 inline asm operands.
Support virtual, physical and tied i128 register operands in inline assembly.

On SystemZ, i128 is not really supported: it is not a legal type, and
generally such a value will be split into two i64 parts. There are however
some instructions that require a pair of two GPR64 registers contained in
the GR128 bit reg class, which is untyped.

For inline assembly operands, it proved to be very cumbersome to first follow
the general behavior of splitting an i128 operand into two parts and then
later rebuild the INLINEASM MI to have one GR128 register. Instead, some
minor common code changes were made to SelectionDAGBuilder to only create one
GR128 register part to begin with. In particular:

- getNumRegisters() now has an optional parameter "RegisterVT" which is
  passed by AddInlineAsmOperands() and GetRegistersForValue().

- The bitcasting in GetRegistersForValue is not performed if RegVT is
  Untyped.

- The RC for a tied use in AddInlineAsmOperands() is now computed either from
  the tied def (virtual register), or by getMinimalPhysRegClass() (physical
  register).

- InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register
  class (DstRC) can also be computed for an illegal type.

In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and
joinRegisterPartsIntoValue() have been implemented to handle i128 operands.

Differential Revision: https://reviews.llvm.org/D100788

Review: Ulrich Weigand
2021-05-26 10:08:32 -05:00
Jonas Paulsson
e77cb4ae63 [SystemZ] Return true from preferZeroCompareBranch().
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D103057
2021-05-25 10:24:14 -05:00
Jonas Paulsson
7334b3dc3e [SystemZ] Reimplement the i8/i16 compare-and-swap logic.
Even though the implementation in emitAtomicCmpSwapW() was correct, it made
Valgrind report an error. Instead of using a RISBG on CmpVal, an LL[CH]R can
be made on the OldVal, and the problem is avoided.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D97604
2021-03-03 14:04:32 -06:00
Craig Topper
11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Jonas Paulsson
653b97690f [SystemZ] Improve handling of backchain offset.
- New function SDValue getBackchainAddress() used by
  lowerDYNAMIC_STACKALLOC() and lowerSTACKRESTORE() to properly handle the
  backchain offset also with packed-stack.

- Make a common function getBackchainOffset() for the computation of the
  backchain offset and use in some places (NFC).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D93171
2020-12-14 12:39:38 -06:00
Jonas Paulsson
45b8e37afc [SystemZ] Use ISD::ABS opcode during isel.
The SystemZISD::IABS node is no longer needed since ISD::ABS can be used
instead.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D91697
2020-11-18 14:43:55 +01:00
Jonas Paulsson
ef7aad0db4 [SystemZ] Improve handling of ZERO_EXTEND_VECTOR_INREG.
Instead of doing multiple unpacks when zero extending vectors (e.g. v2i16 ->
v2i64), benchmarks have shown that it is better to do a VPERM (vector
permute) since that is only one sequential instruction on the critical path.

This patch achieves this by

1. Expand ZERO_EXTEND_VECTOR_INREG into a vector shuffle with a zero vector
   instead of (multiple) unpacks.

2. Improve SystemZ::GeneralShuffle to perform a single unpack as the last
   operation if Bytes matches it.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D78486
2020-06-30 09:08:10 +02:00
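The shuffle used in ef7aad0db4 can be pictured as a byte permute over the concatenation of the source vector and a zero vector. The following Python sketch builds such a mask for big-endian 16-byte vectors; `zext_inreg_mask` and `permute` are hypothetical helpers, and the choice of index 16 for every zero byte is an assumption (any byte of the zero vector would do).

```python
def zext_inreg_mask(src_bytes, dst_bytes, vec_bytes=16):
    """Byte mask that zero-extends the first elements of a big-endian vector.

    Indices 0..vec_bytes-1 select from the source vector, and indices from
    vec_bytes upward select from the zero vector.
    """
    mask = []
    for elt in range(vec_bytes // dst_bytes):
        # High-order bytes of the widened element come from the zero vector.
        mask += [vec_bytes] * (dst_bytes - src_bytes)
        # Low-order bytes are the original element's bytes, in order.
        mask += [elt * src_bytes + b for b in range(src_bytes)]
    return mask


def permute(a, b, mask):
    """Model a byte permute: pick bytes from the concatenation a + b."""
    both = bytes(a) + bytes(b)
    return bytes(both[i] for i in mask)
```

For v2i16 -> v2i64, `zext_inreg_mask(2, 8)` yields one mask, so the whole extension is a single permute instead of a chain of unpacks.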
Jonas Paulsson
515bfc66ea [SystemZ] Implement -fstack-clash-protection
Probing of allocated stack space is now done when this option is passed. The
purpose is to protect against the stack clash attack (see
https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D78717
2020-06-06 18:38:36 +02:00
Ulrich Weigand
947f78ac27 [SystemZ] Fix/optimize vec_load_len and related intrinsics
When using vec_load/store_len_r with an immediate length operand
of 16 or larger, LLVM will currently emit a VLRL/VSTRL instruction
with that immediate.  This creates a valid encoding (which should be
supported by the assembler), but always traps at runtime.  This patch
fixes this by not creating VLRL/VSTRL in those cases.

This would result in loading the length into a register and
calling VLRLR/VSTRLR instead.  However, these operations with
a length of 15 or larger are in fact simply equivalent to a
full vector load or store.  And in fact the same holds true for
vec_load/store_len as well.

Therefore, add a DAGCombine rule to replace those operations with
plain vector loads or stores if the length is known at compile
time and is equal to or larger than 15.
2020-05-06 21:15:58 +02:00
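The equivalence that 947f78ac27 exploits can be checked with a small model. This Python sketch assumes the length operand of vec_load_len is the index of the highest byte to load, with the remaining bytes of the 16-byte vector zeroed; `vec_load_len` here is a model of that behavior and `can_fold_to_full_load` is a hypothetical name for the combine's condition.

```python
def vec_load_len(mem, addr, length):
    """Model of a load-with-length: load bytes 0..min(length, 15), zero the rest."""
    n = min(length, 15) + 1
    return bytes(mem[addr:addr + n]) + bytes(16 - n)


def can_fold_to_full_load(length):
    """Condition from the commit: a constant length of 15 or more is a full load."""
    return length >= 15
```

Under this model, any length of 15 or more returns the same 16 bytes as a plain vector load from the same address.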
Matt Arsenault
84aa58cbe2 CodeGen: Use Register in TargetLowering 2020-04-08 12:10:58 -04:00
Jonas Paulsson
132f25bcca [SystemZ] Avoid scalarization of [SU]INT_TO_FP ISD-nodes.
The type legalizer will scalarize vector conversions from integer to floating
point if the source element size is less than that of the result.

This is avoided now by inserting a zero/sign-extension of the source vector
before type legalization.

Review: Ulrich Weigand

Differential revision: https://reviews.llvm.org/D75978
2020-03-16 13:07:42 +01:00
Jonas Paulsson
cdcce3cabf [SystemZ] Also accept ISD::USUBO in shouldFormOverflowOp().
Forming subtract with overflow is beneficial on SystemZ, just like additions.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D75290
2020-03-03 14:38:57 +01:00