893 Commits

Author SHA1 Message Date
Jonas Paulsson
74a09bd1ec
[SystemZ] Test improvements for atomic load/store instructions (NFC). (#75630)
Improve tests for atomic loads and stores, mainly by testing 128-bit atomic load and store instructions both with and w/out natural alignment.
2023-12-21 20:48:00 +01:00
Yusra Syeda
0768253c20
[SystemZ][z/OS] Add exception handling for XPLINK (#74638)
Adds emitting the exception table and the EH registers for XPLINK.

---------

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-12-19 13:58:33 -05:00
Ulrich Weigand
82a1bffd34
[SelectionDAG] Do not crash on large integers in CheckInteger (#75787)
The CheckInteger routine called from TableGen-generated selection logic
uses getSExtValue - which will abort if the underlying APInt does not
fit into an int64_t.

This case is now triggered by the SystemZ back-end since i128 is a legal
type on certain machines. While we do not have any regular instructions
that take 128-bit immediates (like most other platforms), there are
patterns in the .td files that recognize an i128 "xor ..., -1" as a
"not".

These patterns cause code to be generated that calls the CheckInteger
routine on some i128-valued integer, which may trigger the assert.

Fix by using trySExtValue instead.

Fixes https://github.com/llvm/llvm-project/issues/75710
2023-12-18 14:03:57 +01:00
Ulrich Weigand
a00c4220be [SystemZ] Fix complex address matching when i128 is legal
Complex address matching currently handles truncations, under
the assumption that those are no-ops.  This is no longer true
when i128 is legal.  Change the code to only handle actual
no-op truncations.

Fixes https://github.com/llvm/llvm-project/issues/75708
Fixes https://github.com/llvm/llvm-project/issues/75714
2023-12-18 12:47:45 +01:00
Ulrich Weigand
59f7f35a90 [SystemZ] ABI support for single-element vector types
Support passing and returning values of single-element vector
types (i.e. <1 x i128> and <1 x fp128>).

Now that i128 is a legal type, supporting these types can be
done simply by providing a getRegisterTypeForCallingConv
implementation that handles them.

Fixes https://github.com/llvm/llvm-project/issues/61291
2023-12-15 19:31:00 +01:00
Ulrich Weigand
a65ccc1b9f
[SystemZ] Support i128 as legal type in VRs (#74625)
On processors supporting vector registers and SIMD instructions, enable
i128 as legal type in VRs. This allows many operations to be implemented
via native instructions directly in VRs (including add, subtract,
logical operations and shifts). For a few other operations (e.g.
multiply and divide, as well as atomic operations), we need to move the
i128 value back to a GPR pair to use the corresponding instruction
there. Overall, this is still beneficial.

The patch includes the following LLVM changes:
- Enable i128 as legal type
- Set up legal operations (in SystemZInstrVector.td)
- Custom expansion for i128 add/subtract with carry
- Custom expansion for i128 comparisons and selects
- Support for moving i128 to/from GPR pairs when required
- Handle 128-bit integer constant values everywhere
- Use i128 as intrinsic operand type where appropriate
- Updated and new test cases

In addition, clang builtins are updated to reflect the intrinsic operand
type changes (which also improves compatibility with GCC).
2023-12-15 12:55:15 +01:00
Jonas Paulsson
01061ed370
[SystemZ] Improve shouldCoalesce() for i128. (#74942)
The SystemZ implementation of shouldCoalesce() is merely a workaround
for the fact that regalloc can run out of registers when extending 128-bit
intervals with subreg (GPR64/GPR32) COPYs. 

This patch adds more freedom to the coalescer as it now only checks that
the subreg interval is local to MBB and does not have too many physreg
clobbers.
2023-12-14 15:55:27 +01:00
Jonas Paulsson
07056c2274
[SystemZ] Use LCGR/AGHI for i64 XOR with -1 (#74882)
LCGR/AGHI is a more compact way of implementing a 64-bit NOT.
2023-12-11 17:28:12 +01:00
Jonas Paulsson
435ba72afd
[SystemZ] Simplify handling of AtomicRMW instructions. (#74789)
Let the AtomicExpand pass do more of the job of expanding
AtomicRMWInst:s in order to simplify the handling in the backend.

The only cases that the backend needs to handle itself are those of
subword size (8/16 bits) and those directly corresponding to a target
instruction.
2023-12-08 17:19:17 +01:00
Jonas Paulsson
c568927f3e
[SystemZ] Properly support 16 byte atomic int/fp types and ops. (#73134)
- Clang FE now has MaxAtomicPromoteWidth / MaxAtomicInlineWidth set to 128, and now produces IR
  instead of calls to __atomic instrinsics for 16 bytes as well.
- Atomic __int128 (and long double) variables are now aligned to 16 bytes by default (like gcc 14).
- AtomicExpand pass now expands 16 byte operations as well.
- tests for __atomic builtins for all integer widths, and __atomic_is_lock_free with friends.
- TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s. As a next step
  smaller integer types should also be possible to handle this way instead of by the backend.
2023-12-05 17:17:21 +01:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Ulrich Weigand
c61eb44005 [SystemZ] Implement vector rotate in terms of funnel shift
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics.  To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.

Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
2023-12-04 16:52:00 +01:00
Ulrich Weigand
0c568c2535 [SystemZ] Auto-generate vec-intrinsics tests 2023-12-04 16:36:38 +01:00
Yusra Syeda
9a38a72f1d
[SystemZ][z/OS] This change adds support for the PPA2 section in zOS (#68926)
This PR adds support for the PPA2 fields.

---------

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-11-27 16:30:12 -05:00
Ilya Leoshkevich
03934e70ef
[SystemZ] Enable AtomicExpand pass (#70398)
The upcoming OpenMP support for SystemZ requires handling of IR insns
like `atomicrmw fadd`. Normally atomic float operations are expanded by
Clang and such insns do not occur, but OpenMP generates them directly.
Other architectures handle this using the AtomicExpand pass, which
SystemZ did not need so far. Enable it.

Currently AtomicExpand treats atomic load and stores of floats
pessimistically: it casts them to integers, which SystemZ does not need,
since the floating point load and store instructions are already atomic.
However, the way Clang currently expands them is pessimistic as well, so
this change does not make things worse. Optimizing operations on atomic
floats can be a separate change in the future.

This change does not create any differences the Linux kernel build.
2023-10-31 09:51:06 +01:00
Hans Wennborg
e2fc68c3db Typos: 'maxium', 'minium' 2023-10-23 10:42:28 +02:00
Ilya Leoshkevich
8e810dc7d9
[SystemZ] Support builtin_{frame,return}_address() with non-zero argument (#69405)
When the code is built with -mbackchain, it is possible to retrieve the
caller's frame and return addresses. GCC already can do this, add this
support to Clang as well. Use RISCVTargetLowering and GCC's
s390_return_addr_rtx() as inspiration. Add tests based on what GCC is
emitting.
2023-10-18 19:05:31 +02:00
Yusra Syeda
6cf41ada44
[SystemZ][z/OS] Add vararg support to z/OS (#68834)
This PR adds vararg support to z/OS and updates the call-zos-vararg.ll
lit test.

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-10-12 12:42:55 +02:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Kai Nacke
42de2b7e99
[SystemZ/z/OS] Add library names for intrinsics (#68114)
On z/OS, many library functions have a non-standard name. This change
initializes the table of runtime function which results from lowering
intrinsics to library calls.
2023-10-03 18:53:52 +03:00
Noah Goldstein
de7881ebf5 [DAGCombiner] Combine (select c, (and X, 1), 0) -> (and (zext c), X)
The middle end canonicalizes:
`(and (zext c), X)`
    -> `(select c, (and X, 1), 0)`

But the `and` + `zext` form gets better codegen.
2023-09-28 13:46:46 -05:00
Jay Foad
44e997a158
[TwoAddressInstruction] Use isPlainlyKilled in processTiedPairs (#65976)
Calling isPlainlyKilled instead of directly checking for a kill flag
should make processTiedPairs behave the same with LiveIntervals
(i.e. when compiling with -early-live-intervals) as it does with
LiveVariables.
2023-09-19 16:44:20 +01:00
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
Jay Foad
0528dbfe5c
Add some -early-live-intervals RUN lines (#66058)
This adds test coverage for an upcoming change to
TwoAddressInstructionPass::processTiedPairs.
2023-09-12 13:06:10 +01:00
Neumann Hon
d00f59893e [SystemZ][z/OS] Fix the entry point marker for leaf functions
The function emitFunctionEntryLabel does not look at whether or not a function is a leaf when setting the entry flags, and instead blindly marks all functions as non-leaf routines.

Differential Revision: https://reviews.llvm.org/D157701

Reviewed By: uweigand
2023-08-23 09:50:01 -04:00
Neumann Hon
43207225b6 Revert "[SystemZ][z/OS] Fix the entry point marker for leaf functions"
This reverts commit 8af297bbb8e97de8908b857eae1a44f46a0d5afe.

Testcase LLVM :: MC/GOFF/ppa1.ll needs to be updated to account for this.
2023-08-20 22:04:02 -04:00
Neumann Hon
8af297bbb8 [SystemZ][z/OS] Fix the entry point marker for leaf functions
The function emitFunctionEntryLabel does not look at whether or not a function is a leaf when setting the entry flags,
and instead blindly marks all functions as non-leaf routines. Change it to check if a function is a leaf function and
mark it accordingly.
2023-08-20 21:53:13 -04:00
Neumann Hon
3e139be29f [SystemZ][z/OS] Add support for function name field of PPA1
This PR causes the PPA1 to emit the function's name if it exists. This field is not emitted for unnamed functions.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D157494
2023-08-10 04:40:19 -04:00
Jay Foad
56d92c1758 [MachineScheduler] Track physical register dependencies per-regunit
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Recommit after fixing AggressiveAntiDepBreaker in D156880.

Differential Revision: https://reviews.llvm.org/D156552
2023-08-07 15:41:40 +01:00
Jay Foad
e2e3f06813 Revert "[MachineScheduler] Track physical register dependencies per-regunit"
This reverts commit 1a54671d5405a39de362e9692ce963c0638023bc.

It was causing lit test failures in a LLVM_ENABLE_EXPENSIVE_CHECKS
build.
2023-07-29 18:05:25 +01:00
Jay Foad
1a54671d54 [MachineScheduler] Track physical register dependencies per-regunit
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Differential Revision: https://reviews.llvm.org/D156552
2023-07-29 15:34:53 +01:00
Kevin P. Neal
58ad5699e7 [FPEnv][SystemZ] Correct strictfp tests.
Correct a SystemZ strictfp test to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

This test, like many others, just needed the function definition corrected.

Test changes verified with D146845.
2023-07-26 09:08:46 -04:00
Amaury Séchet
872276de4b [NFC] Autogenerate CodeGen/SystemZ/int-{uadd,sub}-0*.ll 2023-07-05 20:14:43 +00:00
Yusra Syeda
163aad6bcb [SystemZ][z/OS] z/OS ADA codegen and emission
This patch adds support for the ADA (associated data area), doing the following:

-Creates the ADA table to handle displacements
-Emits the ADA section in the SystemZAsmPrinter
-Lowers the ADA_ENTRY node into the appropriate load instruction

Differential Revision: https://reviews.llvm.org/D153788
2023-07-05 13:21:52 -04:00
Yusra Syeda
1bfdc534aa Revert "[SystemZ][z/OS] This patch adds support for the ADA (associated data area), doing the following:"
This reverts commit 9df0f66af5462e23216eae31aedbd4d2f459cc3d.
2023-06-28 11:18:12 -04:00
Yusra Syeda
9df0f66af5 [SystemZ][z/OS] This patch adds support for the ADA (associated data area), doing the following:
- Creates the ADA table to handle displacements
- Emits the ADA section in the SystemZAsmPrinter
- Lowers the ADA_ENTRY node into the appropriate load instruction

Differential Revision: https://reviews.llvm.org/D153788
2023-06-28 10:13:10 -04:00
Matt Arsenault
80e2c26dfd RegisterCoalescer: Fix name of pass
I finally snapped and fixed this inconsistency.
2023-06-21 10:30:43 -04:00
Amaury Séchet
0a76f7d9d8 [NFC] Autogenerate numerous SystemZ tests 2023-06-14 21:47:31 +00:00
Neumann Hon
8a7a2da18f [SystemZ][z/OS] Correct value of length/4 of params field in PPA1.
The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function.

Differential Revision: https://reviews.llvm.org/D152920

Reviewed By: uweigand
2023-06-14 13:37:46 -04:00
Neumann Hon
049324ac5e Revert "[SystemZ][z/OS] Correct value of length/4 of params field in PPA1."
This reverts commit e0f7b0e0f704dc3759925602e474b9e669270fcb.
2023-06-14 13:34:16 -04:00
Neumann Hon
e0f7b0e0f7 [SystemZ][z/OS] Correct value of length/4 of params field in PPA1.
The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function.

Differential revision: https://reviews.llvm.org/D119049

Reviewed By: uweigand
2023-06-14 13:20:45 -04:00
Amaury Séchet
a70d5e25f3 [DAGCombine] Make sure combined nodes are added back to the worklist in topological order.
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127115
2023-06-13 09:14:37 +00:00
JP Lehr
c9998ec145 Revert "[DAGCombine] Make sure combined nodes are added back to the worklist in topological order."
This reverts commit e69fa03ddd85812be3143d79a0359c3e8d43bd45.

This patch lead to build time outs on the AMDGPU OpenMP runtime
buildbot.
2023-06-05 10:55:58 -04:00
Amaury Séchet
e69fa03ddd [DAGCombine] Make sure combined nodes are added back to the worklist in topological order.
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127115
2023-06-05 11:09:18 +00:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Tobias Hieta
b71edfaa4e
[NFC][Py Reformat] Reformat python files in llvm
This is the first commit in a series that will reformat
all the python files in the LLVM repository.

Reformatting is done with `black`.

See more information here:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: jhenderson, JDevlieghere, MatzeB

Differential Revision: https://reviews.llvm.org/D150545
2023-05-17 10:48:52 +02:00
Jonas Paulsson
64599ac97e [MachineSink] Don't reject sinking because of dead def in isProfitableToSinkTo().
An instruction should be sunk (if otherwise legal and profitable) regardless
of if it has a dead def of a physreg or not. Physreg defs are checked in other
places and sinking is only done with dead defs of regs that are not live into
the target MBB.

Differential Revision: https://reviews.llvm.org/D150447

Reviewed By: sebastian-ne, arsenm
2023-05-16 10:00:44 +02:00
Neumann Hon
80c643c464 [SystemZ][z/OS] Save (and restore) R3 to avoid clobbering parameter when call stack frame extension is invoked
When the stack frame extension routine is used, the contents of r3 is overwritten.
However, if r3 is live in the prologue (ie. one of the function's parameters
resides in r3), it needs to be saved. We save r3 in r0 if r0 is available
(ie. r0 is not used as temporary storage for r4), and in the corresponding
stack slot for the third parameter otherwise.

Differential Revision: https://reviews.llvm.org/D150332

Reviewed By: uweigand
2023-05-12 09:32:04 -04:00
Neumann Hon
39b8af47fc Revert "[SystemZ][z/OS] Save (and restore) R3 to avoid clobbering parameter when call stack frame extension is invoked"
This reverts commit 1aec3d15aaa25c39fae026688708d7353d488974.
2023-05-11 22:32:16 -04:00