774 Commits

Author SHA1 Message Date
Neumann Hon
eb3e09c9bf [SystemZ] [z/OS] Add support for generating huge (1 MiB) stack frames in XPLINK64
This patch extends support for generating huge stack frames on 64-bit XPLINK by implementing the ABI-mandated call to the stack extension routine.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D120450
2022-02-25 02:37:08 -05:00
Kai Nacke
30053c1445 [SystemZ/z/OS] Add va intrinsics for XPLINK
Add support for va intrinsics for the XPLINK ABI.
Only the extended vararg variant, which uses a pointer to next
argument, is supported. The standard variant will build on this.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D120148
2022-02-22 14:35:05 -05:00
Jonas Paulsson
cf426100d6 [SystemZ] Improve emission of alignment hints.
Handle multiple memoperands in lowerAlignmentHint().

Review: Ulrich Weigand
2022-02-17 12:30:43 -06:00
Mubariz Afzal
1a5b881d4c Revert [SystemZ][z/OS] Fix f32 variadic argument assertion
This reverts ea0676f97d734196b15da7553cd407e6a36cef2d
2022-02-15 23:28:40 -05:00
Mubariz Afzal
ea0676f97d [SystemZ][z/OS] Fix f32 variadic argument assertion
The tablegen lines that specify the XPLINK64 calling convention for promoting an f32 vararg to an f64 are effectively overwritten by the following tablegen line which bitcast an f64 vararg to an i64 (so that it can be used in the GPRs). It becomes a bitcast from f32 to i64.

Since we don't handle a bitcast for f32s this caused an assertion.
2022-02-15 18:11:57 -05:00
Kai Nacke
713496d9c9 [SystemZ/z/OS] Add XPLINK dynamic stack allocation
With XPLINK, dynamic stack allocations requires calling
a runtime function, which allocates the stack memory,
moves the register save area, and returns the new
stack pointer.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D119732
2022-02-14 13:35:28 -05:00
Kai Nacke
62ba528a68 [Systemz/z/OS] Centralize emitting the call type information
With XPLINK, a no-op with information about the call type is emitted
after each call instruction. Centralizing it has the advantage that it is
easy to document all cases, and it makes it easier to extend it later
(e.g. dynamic stack allocation, 32 bit mode).
Also add a test checking the call types emitted so far.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D119557
2022-02-14 12:00:50 -05:00
Jonas Paulsson
9ca9fee6e8 [SystemZ] Don't shrink 64-bit FP constants.
Return false from ShouldShrinkFPConstant(), so that these constants are stored
in their full size on the constant pool, even if they could have been shrunk
and used with an extending load.

This is better since LD is faster than LDE, and it also enables reg/mem opcodes.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D117927
2022-01-27 16:14:53 -06:00
Jonas Paulsson
f541a5048a [SystemZ] Implement orderFrameObjects().
By reordering the objects on the stack frame after looking at the users, a
better utilization of displacement operands will result. This means less
needed Load Address instructions for the accessing of these objects.

This is important for very large functions where otherwise small changes
could cause a lot more/less accesses go out of range.

Note: this is not yet enabled for SystemZXPLINKFrameLowering, but should be.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D115690
2022-01-27 16:09:19 -06:00
Nick Desaulniers
79ebc3b0dd [llvm][test] rewrite callbr to use i rather than X constraint NFC
In D115311, we're looking to modify clang to emit i constraints rather
than X constraints for callbr's indirect destinations. Prior to doing
so, update all of the existing tests in llvm/ to match.

Reviewed By: void, jyknight

Differential Revision: https://reviews.llvm.org/D115410
2022-01-11 11:31:08 -08:00
Nikita Popov
291660e62f [SystemZ] Add missing elementtype in python test (NFC)
Missed this originally because it does not run locally.
2022-01-07 09:08:38 +01:00
Nikita Popov
f430c1eb64 [Tests] Add elementtype attribute to indirect inline asm operands (NFC)
This updates LLVM tests for D116531 by adding elementtype attributes
to operands that correspond to indirect asm constraints.
2022-01-06 14:23:51 +01:00
Neumann Hon
9a35844990 [z/OS] Implement prologue and epilogue generation for z/OS target.
This patch adds support for prologue and epilogue generation for the z/OS target under the XPLINK64 ABI for functions with a stack size of less than 1048576 bytes (huge stack frames).

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D114457
2021-12-16 09:04:05 -05:00
Muiez Ahmed
ebf5497b26 Revert "[z/OS] Implement prologue and epilogue generation for z/OS target."
This reverts commit ffad4d777b227f91be04020e2cd86ab38e969e39 because it introduced buildbot failures.
2021-12-14 14:22:11 -05:00
Neumann Hon
ffad4d777b [z/OS] Implement prologue and epilogue generation for z/OS target.
This patch adds support for prologue and epilogue generation for
the z/OS target under the XPLINK64 ABI for functions with a stack
size of less than 1048576 bytes (huge stack frames).

Reviewed by: uweigand, Kai

Differential Revision: https://reviews.llvm.org/D114457
2021-12-13 17:03:23 -05:00
Jonas Paulsson
cbf682cb1c [SystemZ] Improve codegen for memset.
Memset with a constant length was implemented with a single store followed by
a series of MVC:s. This patch changes this so that one store of the byte is
emitted for each MVC, which avoids data dependencies between the MVCs. An
MVI/STC + MVC(len-1) is done for each block.

In addition, memset with a variable length is now also handled without a
libcall. Since the byte is first stored and then MVC is used from that
address, a length of two must now be subtracted instead of one for the loop
and EXRL. This requires an extra check for the one-byte case, which is
handled in a special block with just a single MVI/STC (like GCC).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D112004
2021-12-06 12:10:58 -06:00
Guozhi Wei
f1d8345a2a [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB
Currently we create register mappings for registers used only once in current
MBB. For registers with multiple uses, when all the uses are in the current MBB,
we can also create mappings for them similarly according to the last use.
For example

    %reg101 = ...
            = ... reg101
    %reg103 = ADD %reg101, %reg102

We can create mapping between %reg101 and %reg103.

Differential Revision: https://reviews.llvm.org/D113193
2021-11-29 19:01:59 -08:00
Jonas Paulsson
bb506938be [SystemZ] Improvement of emitMemMemWrapper()
It was discovered that an extra register COPY remained when expanding a
(variable length) memory operation with a loop and there was another use of
the involved address register(s) afterwards.

A simple fix for this is to COPY the address registers before the loop and
use that new vreg instead.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D112065
2021-10-26 17:03:01 +02:00
Jonas Paulsson
9f8872779a [SystemZ] Provide size values for PATCHPOINT, STACKMAP and FENTRY_CALL.
All instructions must have a correct size value close to emission when
SystemZLongBranch runs, or a necessary branch relaxation may be missed.

This patch also adds an assert for instruction sizes in SystemZLongBranch.

Review: Ulrich Weigand
2021-10-26 12:07:22 +02:00
Anirudh Prasad
aa3519f178 [SystemZ][z/OS] Initial implementation for lowerCall on z/OS
- This patch provides the initial implementation for lowering a call on z/OS according to the XPLINK64 calling convention
- A series of changes have been made to SystemZCallingConv.td to account for these additional XPLINK64 changes including adding a new helper function to shadow the stack along with allocation of a register wherever appropriate
- For the cases of copying a f64 to a gr64 and a f128 / 128-bit vector type to a gr64, a `CCBitConvertToType` has been added and has been bitcasted appropriately in the lowering phase
- Support for the ADA register (R5) will be provided in a later patch.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D111662
2021-10-21 09:48:59 -04:00
Jonas Paulsson
ccbfcfda1e [SystemZ] Handle huge immediates in SystemZInstrInfo::loadImmediate().
This is needed during isel pseudo expansion in order not to crash on huge
immediates.

Review: Ulrich Weigand
2021-10-15 19:08:45 +02:00
Jonas Paulsson
a33e4c8ae9 [SystemZ] Reapply memcmp and memcpy patches.
This reverts 3562076 and includes some refactoring as well.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D111733
2021-10-14 10:37:33 +02:00
Jonas Paulsson
00baad35b2 [SystemZ] Bugfix and refactorization of mem-mem operations
This patch fixes the bug that consisted of treating variable / immediate
length mem operations (such as memcpy, memset, ...) differently. The variable
length case needs to have the length minus 1 passed due to the use of EXRL
target instructions. However, the DAGCombiner can convert a register length
argument into a constant one, and whenever that happened one byte too little
would end up being performed.

This is also a refactorization by reducing the number of opcodes and variants
involved. For any opcode (variable or constant length), only the length minus
one is passed on to the ISD node. The rest of the logic is now instead
handled during isel pseudo expansion.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D111729
2021-10-14 10:37:33 +02:00
Kai Nacke
0a950a2e94 [SystemZ/z/OS] Implement save of non-volatile registers on z/OS XPLINK
This PR implements the save of the XPLINK callee-saved registers
on z/OS.

Reviewed By: uweigand, Kai

Differential Revision: https://reviews.llvm.org/D111653
2021-10-13 12:57:57 -04:00
Guozhi Wei
6599961c17 [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation
This patch contains following enhancements to SrcRegMap and DstRegMap:

  1 In findOnlyInterestingUse not only check if the Reg is two address usage,
    but also check after commutation can it be two address usage.

  2 If a physical register is clobbered, remove SrcRegMap entries that are
    mapped to it.

  3 In processTiedPairs, when create a new COPY instruction, add a SrcRegMap
    entry only when the COPY instruction is coalescable. (The COPY src is
    killed)

With these enhancements isProfitableToCommute can do better commute decision,
and finally more register copies are removed.

Differential Revision: https://reviews.llvm.org/D108731
2021-10-11 15:28:31 -07:00
Jay Foad
df2d4bc4cb [TwoAddressInstruction] Fix ReplacedAllUntiedUses in processTiedPairs
Fix the calculation of ReplacedAllUntiedUses when any of the tied defs
are early-clobber. The effect of this is to fix the placement of kill
flags on an instruction like this (from @f2 in
 test/CodeGen/SystemZ/asm-18.ll):

  INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], killed %4:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit

After TwoAddressInstruction without this patch:

  %3:grh32bit = COPY killed %4:grh32bit
  INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit

Note that the COPY kills %4, even though there is a later use of %4 in
the INLINEASM. This fails machine verification if you force it to run
after TwoAddressInstruction (currently it is disabled for other
reasons).

After TwoAddressInstruction with this patch:

  %3:grh32bit = COPY %4:grh32bit
  INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit

Differential Revision: https://reviews.llvm.org/D110848
2021-10-07 10:10:11 +01:00
Jay Foad
85abedd750 [TwoAddressInstruction] Pre-commit a test case for D110848 2021-10-07 10:10:11 +01:00
Jonas Paulsson
3562076dfc [SystemZ] Temporarily revert memcmp and memcpy patches
Seem to cause test failures in compiler-rt.

Revert "[SystemZ] Implement memcmp of variable length with CLC."
This reverts commit 7a4e9a0c73667cb80e4572d41535a9e48f1ed9ef.

Revert "[SystemZ] Implement memcpy of variable length with MVC."
This reverts commit c6c13c58eebda605a9a05f1f13cac1e46407afc7.
2021-10-06 11:05:18 +02:00
Jonas Paulsson
7a4e9a0c73 [SystemZ] Implement memcmp of variable length with CLC.
Following the same pattern of memset/memcpy, this patch implements a variable
length memcmp with a CLC loop followed by an EXRL instruction.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D107380
2021-10-05 18:20:36 +02:00
Jonas Paulsson
c6c13c58ee [SystemZ] Implement memcpy of variable length with MVC.
Instead of making a memcpy libcall, emit an MVC loop and an EXRL instruction
the same way as is already done for memset 0.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D106874
2021-10-05 17:14:41 +02:00
Jay Foad
20c0280733 [LiveIntervals] Repair subreg ranges in processTiedPairs
In TwoAddressInstructionPass::processTiedPairs, update subranges of the
live interval for RegB as well as the main range.

This is a small step towards switching TwoAddressInstructionPass over
from LiveVariables to LiveIntervals. Currently this path is only tested
if you explicitly enable -early-live-intervals.

Differential Revision: https://reviews.llvm.org/D110526
2021-09-28 08:10:16 +01:00
Jonas Paulsson
ea92283449 [SystemZ] Implement ISD::BITCAST for fp128 -> i128.
The type legalizer has by default no method of doing this bitcast other than
storing and reloading the value from stack.

This patch implements a custom lowering of this operation using extractions
of subregs (z13 and earlier using FP128 register pairs), or of vector
elements (with 'vector enhancements 1' using VR128 FP registers).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D110346
2021-09-24 10:26:45 +02:00
Jonas Paulsson
a48b43f981 [SystemZ] Emit EXRL target instructions before text section is ended.
SystemZ adds the EXRL target instructions in the end of each file. This must
be done before debug info emission since that may end the text section, and
therefore this is now done in emitConstantPools() (instead of in
emitEndOfAsmFile).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D109513
2021-09-21 14:32:28 +02:00
Matt Arsenault
57dfa12e4c SystemZ: Tidy up a mir test 2021-08-10 13:56:54 -04:00
Roman Lebedev
0b094c06f4
[NFC][Codegen][SystemZ] Autogenerate checklines in int-cmp-47.ll 2021-08-04 01:47:39 +03:00
Anirudh Prasad
a8cfa4b9bd [SystemZ][z/OS] Initial code to generate assembly files on z/OS
- This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target.
- Only the .text and the .bss sections are added for now.
- The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections.
- This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target
- Further improvements and additions will be made in future patches.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D106380
2021-07-27 11:29:15 -04:00
Ulrich Weigand
8cd8120a7b [SystemZ] Add support for new cpu architecture - arch14
This patch adds support for the next-generation arch14
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Detection of arch14 as host processor.
- Assembler/disassembler support for new instructions.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining  __VEC__ == 10304.

Note: No currently available Z system supports the arch14
architecture.  Once new systems become available, the
official system name will be added as supported -march name.
2021-07-26 16:57:28 +02:00
Jonas Paulsson
6c0e6895d0 [SystemZ] Handle NoRegister in SystemZTargetLowering::emitMemMemWrapper().
Bugfix: The compiler should be able to generate a memset to nullptr.

Review: Ulrich Weigand
2021-07-19 20:04:44 +02:00
Jonas Paulsson
96421af5f8 [SystemZ] Bugfix for the 'N' code for inline asm operand.
Don't use a local MachineOperand copy in SystemZAsmPrinter::PrintAsmOperand()
and change the register as it may break the MRI tracking of register
uses. Use an MCOperand instead.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D105757
2021-07-12 15:04:08 +02:00
David Green
4ce26deac2 [DAG] Reassociate Add with Or
We already have reassociation code for Adds and Ors separately in DAG
combiner, this adds it for the combination of the two where Ors act like
Adds. It reassociates (add (or (x, c), y) -> (add (add (x, y), c)) where
we know that the Ors operands have no common bits set, and the Or has
one use.

Differential Revision: https://reviews.llvm.org/D104765
2021-07-07 10:21:07 +01:00
Tom Stellard
7f1c077c30 tests/CodeGen: Use %python lit substitution when invoking python
This will use the python that LLVM was configured to use rather than
python from PATH.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D105224
2021-07-06 18:46:36 -07:00
David Green
be0924ad17 [Tests] Update some tests for D104765. NFC 2021-07-06 19:23:52 +01:00
Jonas Paulsson
458eac2573 [SystemZ] Support the 'N' code for the odd register in inline-asm.
The odd register of a (128 bit) register pair is accessed with the 'N' code
with an inline assembly operand.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D105502
2021-07-06 19:46:49 +02:00
Jonas Paulsson
37a92f3b03 [SystemZ] Generate XC loop for memset 0 of variable length.
Benchmarking has shown that it is worthwhile to implement a variable length
memset of 0 with XC (exclusive or) like gcc does, instead of using a libcall.

This requires the use of the EXecute Relative Long (EXRL) instruction which
can now be done in a framework that can also be used with other target
instructions (not just XC).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D103865
2021-07-06 18:07:31 +02:00
Matt Arsenault
fae05692a3 CodeGen: Print/parse LLTs in MachineMemOperands
This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).

Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.

This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
2021-06-30 16:54:13 -04:00
Bjorn Pettersson
4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705e82a4fe20e2,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Jonas Paulsson
8b32e25bc2 [SystemZ] Return true from convertSetCCLogicToBitwiseLogic for scalar integer.
Review: Ulrich Weigand
2021-06-08 16:27:28 -05:00
Jonas Paulsson
d5e4f28c0a [SystemZ] Return true from isMaskAndCmp0FoldingBeneficial().
Return true if the mask is a constant uint of 2 bytes, in which case TMLL is
available.

Review: Ulrich Weigand
2021-06-08 15:42:46 -05:00
Florian Hahn
126f90b252
[DAGCombine] Poison-prove scalarizeExtractedVectorLoad.
extractelement is poison if the index is out-of-bounds, so just
scalarizing the load may introduce an out-of-bounds load, which is UB.

To avoid introducing new UB, we can mask the index so it only contains
valid indices.

Fixes PR50382.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D103077
2021-05-30 11:40:55 +01:00
Jonas Paulsson
d058262b14 [SystemZ] Support i128 inline asm operands.
Support virtual, physical and tied i128 register operands in inline assembly.

i128 is on SystemZ not really supported and is not a legal type and generally
such a value will be split into two i64 parts. There are however some
instructions that require a pair of two GPR64 registers contained in the GR128
bit reg class, which is untyped.

For inline assmebly operands, it proved to be very cumbersome to first follow
the general behavior of splitting an i128 operand into two parts and then
later rebuild the INLINEASM MI to have one GR128 register. Instead, some
minor common code changes were made to SelectionDAGBUilder to only create one
GR128 register part to begin with. In particular:

- getNumRegisters() now has an optional parameter "RegisterVT" which is
  passed by AddInlineAsmOperands() and GetRegistersForValue().

- The bitcasting in GetRegistersForValue is not performed if RegVT is
  Untyped.

- The RC for a tied use in AddInlineAsmOperands() is now computed either from
  the tied def (virtual register), or by getMinimalPhysRegClass() (physical
  register).

- InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register
  class (DstRC) can also be computed for an illegal type.

In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and
joinRegisterPartsIntoValue() have been implemented to handle i128 operands.

Differential Revision: https://reviews.llvm.org/D100788

Review: Ulrich Weigand
2021-05-26 10:08:32 -05:00