921 Commits

Author SHA1 Message Date
Jonas Paulsson
16b7cc69ef
[SystemZ] Eliminate call sequence instructions early. (#77812)
On SystemZ, the outgoing argument area which is big enough for all calls
in the function is created once during the prolog, as opposed to
adjusting the stack around each call. The call-sequence instructions are
therefore not really useful any more than to compute the maximum call
frame size, which has so far been done by PEI, but can just as well be
done at an earlier point.

This patch removes the mapping of the CallFrameSetupOpcode and
CallFrameDestroyOpcode and instead computes the MaxCallFrameSize
directly after instruction selection and then removes the ADJCALLSTACK
pseudos. This removes the confusing pseudos and also avoids the problem
of having to keep the call frame size accurate when creating new MBBs.

This fixes #76618 which exposed the need to maintain the call frame size
when splitting blocks (which was not done).
2024-03-28 18:26:38 +01:00
Ulrich Weigand
4b907414d2 [SystemZ] Add support for llvm.readcyclecounter
The llvm.readcyclecounter intrinsic can be implemented via the
STORE CLOCK FAST (STCKF) instruction.
2024-03-22 20:01:02 +01:00
Jonas Paulsson
7564566779 Reapply "Move assertion for AdjustsStack from PEI to MachineVerifier (#85698)"
- The check is now actually done in both PEI and the MachineVerifier.
- More .mir tests trivially updated with "adjustsStack: true" as needed.
2024-03-21 20:24:57 -04:00
Jonas Paulsson
b4b5e8277a
Check for all frame instructions in finalize isel. (#85945)
Check for all frame instructions in finalize isel, not just for the
frame setup opcode. This was proven necessary, see #78001 
for discussion.
2024-03-21 11:00:08 -04:00
Jonas Paulsson
9ebd329ad8 Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)"
This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92.

Reverting due to verifier complaints with expensive checks on build-bot.
2024-03-20 11:48:30 -04:00
Neumann Hon
5fb2797f23
[GOFF][z/OS] Change PrivateGlobalPrefix and PrivateLabelPrefix to be L# (#85730)
The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@
and @ respectively) are, in hindsight, poor choices for multiple
reasons:

First, there exist externally visible routines from the language
environment that begin with @@. These functions are certainly not
local/private by any means and they should not share a prefix with
private globals.

Secondly, both private globals and private labels should be handled the
same way by GOFF, so it doesn't make much sense for them to have
separate prefixes. GOFF remains the only file format where these are
different and there is no reason for that to be the case
2024-03-20 10:30:30 -04:00
Jonas Paulsson
05bde30585
Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)
Have the verifier report a missing AdjustsStack flag rather than waiting until
PEI asserts.
2024-03-20 10:29:12 -04:00
Ulrich Weigand
335f365982 Reapply: [SystemZ] Fix overflow flag for i128 USUBO
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type.  However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicated borrow).  This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.

Fix this by generating code to explicitly invert the flag.  These
cancel out of the result of USUBO feeds into an USUBO_CARRY.

To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.

Fixes: https://github.com/llvm/llvm-project/issues/83268
2024-03-19 14:07:08 +01:00
Ulrich Weigand
d1c3795968 Revert "Fix overflow flag for i128 USUBO"
This reverts commit d9c31ee9568277e4303715736b40925e41503596.
2024-03-19 11:43:05 +01:00
Ulrich Weigand
d9c31ee956 Fix overflow flag for i128 USUBO
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type.  However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicated borrow).  This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.

Fix this by generating code to explicitly invert the flag.  These
cancel out of the result of USUBO feeds into an USUBO_CARRY.

To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.

Fixes: https://github.com/llvm/llvm-project/issues/83268
2024-03-19 11:20:52 +01:00
Jonas Paulsson
8b8e1adbde
[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879)
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.

- AtomicExpand pass no longer casts float/double atomic load/stores to integer
  (FP128 is still casted).
2024-03-18 17:21:50 -04:00
Jonas Paulsson
09bc6abba6
[MachineFrameInfo] Refactoring around computeMaxcallFrameSize() (NFC) (#78001)
- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.

- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
2024-03-18 10:37:59 -04:00
Kevin P. Neal
3e9e5e2771 [FPEnv][SystemZ] Correct strictfp test.
Correct llvm-reduce strictfp test to follow the rules documented in the
LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

This test needed the strictfp attribute added to function definitions.

Test changes verified with D146845.
2024-02-23 13:00:38 -05:00
Jonas Paulsson
9c0e45d7f0
[SystemZ] Use VT (not ArgVT) for SlotVT in LowerCall(). (#82475)
When an integer argument is promoted and *not* split (like i72 -> i128 on
a new machine with vector support), the SlotVT should be i128, which is
stored in VT - not ArgVT.

Fixes #81417
2024-02-21 16:26:16 +01:00
Ilya Leoshkevich
9c75a98155
[SystemZ] Implement A, O and R inline assembly format flags (#80685)
Implement the following assembly format flags, which are already
supported by GCC:

	'A': On z14 or higher: If operand is a mem print the alignment
         hint usable with vl/vst prefixed by a comma.
	'O': print only the displacement of a memory reference or address.
	'R': print only the base register of a memory reference or address.

Implement 'A' conservatively, since the memory operand alignment
information is not available for INLINEASM at the moment.
2024-02-07 20:41:40 +01:00
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC) 2024-02-05 14:07:09 +01:00
Oskar Wirga
ff4636a4ab
Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940)
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498

> Currently, the way that recomputeLiveIns works is that it will
recompute the livein registers for that MachineBasicBlock but it matters
what order you call recomputeLiveIn which can result in incorrect
register allocations down the line.

Now we do not recompute the entire CFG but we do ensure that the newly
added MBB do reach convergence.
2024-01-30 19:33:04 -08:00
Nikita Popov
07a1925b8b Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)"
This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b.

Introduces a major compile-time regression.
2024-01-26 22:33:17 +01:00
Oskar Wirga
59bf60519f
Refactor recomputeLiveIns to operate on whole CFG (#79498)
Currently, the way that recomputeLiveIns works is that it will recompute
the livein registers for that MachineBasicBlock but it matters what
order you call recomputeLiveIn which can result in incorrect register
allocations down the line.

This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
2024-01-26 11:25:36 -08:00
dyung
45f883ed06
Change check for embedded llvm version number to a regex to make test more flexible. (#79528)
This test started to fail when LLVM created the release/18.x branch and
the main branch subsequently had the version number increased from 18 to
19.

I investigated this failure (it was blocking our internal automation)
and discovered that the CHECK statement on line 27 seemed to have the
compiler version number (1800) encoded in octal that it was checking
for. I don't know if this is something that explicitly needs to be
checked, so I am leaving it in, but it should be more flexible so the
test doesn't fail anytime the version number is changed. To accomplish
that, I changed the check for the 4-digit version number to be a regex.

I originally updated this test for the 18->19 transition in
a01195ff5cc3d7fd084743b1f47007645bb385f4. This change makes the CHECK
line more flexible so it doesn't need to be continually updated.
2024-01-26 09:36:20 -08:00
Jonas Paulsson
84dcf3d35b
[SystemZ] Require D12 for i128 accesses in isLegalAddressingMode() (#79221)
Machines with vector support handle i128 in vector registers and
therefore only have the small displacement available for memory
accesses. Update isLegalAddressingMode() to reflect this.
2024-01-24 20:16:05 +01:00
Douglas Yung
a01195ff5c Update compiler version expected that seems to be embedded in CHECK line of test at llvm/test/CodeGen/SystemZ/zos-ppa2.ll.
The test contains a CHECK line which verifies an .ascii line which originally checks
for 18001970010100000000. After the bump of the compiler version to 19, the test
started to fail with the string now being 19001970010100000000.

This should fix this failing test on bots.
2024-01-23 22:05:17 -08:00
Jonas Paulsson
1d1893097a
[SystemZ] Don't use FP Load and Test as comparisons to same reg (#78074)
The usage of FP Load and Test instructions as a comparison against zero
with the assumption that the dest reg will always reflect the source reg is
actually incorrect: Unfortunately, a SNaN will be converted to a QNaN, so the
instruction may actually change the value as opposed to being a pure register
move with a test.

This patch
- changes instruction selection to always emit FP LT with a scratch def
  reg, which will typically be allocated to the same reg if dead.
- Removes the conversions into FP LT in SystemZElimcompare.
2024-01-15 19:36:40 +01:00
Jonas Paulsson
e2ce91f48c Fix test output for 3b16d8c 2024-01-15 12:04:00 -06:00
Jonas Paulsson
3b16d8c8ea
[SystemZ] Don't crash on undef source in shouldCoalesce() (#78056)
SystemZRegisterInfo::shouldCoalesce() has to be able to handle an undef
source.
2024-01-15 17:24:38 +01:00
Ulrich Weigand
9aa8c82748 [SystemZ] Fix 256-bit shifts when i128 is legal
When i128 is a legal type, SelectionDAG now attempts to use
SRL_PARTS etc. with type i128, which is not implemented.  Fix
by marking those as Expand, just like we do for i64.

Fixes https://github.com/llvm/llvm-project/issues/77132
2024-01-10 15:12:19 +01:00
Simon Pilgrim
d460c1de3b
[DAG] SimplifyDemandedBits - don't fold sext(x) -> aext(x) if we lose an 0/-1 allsignbits mask (#77296)
For targets that use 0/-1 boolean results, we want to keep this pattern through extensions/truncations as much as possible - so avoid simplifying to any_extend even if we don't demand the upper bits.

Noticed in triage for https://reviews.llvm.org/D152928
2024-01-08 18:01:41 +00:00
Simon Pilgrim
070ac1dcd5 [SystemZ] vec-perm-14.ll - partially regenerate checks so we can see all the vperm codegen
We can't use the script as we need to keep the shuffle mask constant pool checks, but do more than just check that a second vperm isn't generated
2024-01-05 18:00:08 +00:00
Jonas Paulsson
74a09bd1ec
[SystemZ] Test improvements for atomic load/store instructions (NFC). (#75630)
Improve tests for atomic loads and stores, mainly by testing 128-bit atomic load and store instructions both with and w/out natural alignment.
2023-12-21 20:48:00 +01:00
Yusra Syeda
0768253c20
[SystemZ][z/OS] Add exception handling for XPLINK (#74638)
Adds emitting the exception table and the EH registers for XPLINK.

---------

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-12-19 13:58:33 -05:00
Ulrich Weigand
82a1bffd34
[SelectionDAG] Do not crash on large integers in CheckInteger (#75787)
The CheckInteger routine called from TableGen-generated selection logic
uses getSExtValue - which will abort if the underlying APInt does not
fit into an int64_t.

This case is now triggered by the SystemZ back-end since i128 is a legal
type on certain machines. While we do not have any regular instructions
that take 128-bit immediates (like most other platforms), there are
patterns in the .td files that recognize an i128 "xor ..., -1" as a
"not".

These patterns cause code to be generated that calls the CheckInteger
routine on some i128-valued integer, which may trigger the assert.

Fix by using trySExtValue instead.

Fixes https://github.com/llvm/llvm-project/issues/75710
2023-12-18 14:03:57 +01:00
Ulrich Weigand
a00c4220be [SystemZ] Fix complex address matching when i128 is legal
Complex address matching currently handles truncations, under
the assumption that those are no-ops.  This is no longer true
when i128 is legal.  Change the code to only handle actual
no-op truncations.

Fixes https://github.com/llvm/llvm-project/issues/75708
Fixes https://github.com/llvm/llvm-project/issues/75714
2023-12-18 12:47:45 +01:00
Ulrich Weigand
59f7f35a90 [SystemZ] ABI support for single-element vector types
Support passing and returning values of single-element vector
types (i.e. <1 x i128> and <1 x fp128>).

Now that i128 is a legal type, supporting these types can be
done simply by providing a getRegisterTypeForCallingConv
implementation that handles them.

Fixes https://github.com/llvm/llvm-project/issues/61291
2023-12-15 19:31:00 +01:00
Ulrich Weigand
a65ccc1b9f
[SystemZ] Support i128 as legal type in VRs (#74625)
On processors supporting vector registers and SIMD instructions, enable
i128 as legal type in VRs. This allows many operations to be implemented
via native instructions directly in VRs (including add, subtract,
logical operations and shifts). For a few other operations (e.g.
multiply and divide, as well as atomic operations), we need to move the
i128 value back to a GPR pair to use the corresponding instruction
there. Overall, this is still beneficial.

The patch includes the following LLVM changes:
- Enable i128 as legal type
- Set up legal operations (in SystemZInstrVector.td)
- Custom expansion for i128 add/subtract with carry
- Custom expansion for i128 comparisons and selects
- Support for moving i128 to/from GPR pairs when required
- Handle 128-bit integer constant values everywhere
- Use i128 as intrinsic operand type where appropriate
- Updated and new test cases

In addition, clang builtins are updated to reflect the intrinsic operand
type changes (which also improves compatibility with GCC).
2023-12-15 12:55:15 +01:00
Jonas Paulsson
01061ed370
[SystemZ] Improve shouldCoalesce() for i128. (#74942)
The SystemZ implementation of shouldCoalesce() is merely a workaround
for the fact that regalloc can run out of registers when extending 128-bit
intervals with subreg (GPR64/GPR32) COPYs. 

This patch adds more freedom to the coalescer as it now only checks that
the subreg interval is local to MBB and does not have too many physreg
clobbers.
2023-12-14 15:55:27 +01:00
Jonas Paulsson
07056c2274
[SystemZ] Use LCGR/AGHI for i64 XOR with -1 (#74882)
LCGR/AGHI is a more compact way of implementing a 64-bit NOT.
2023-12-11 17:28:12 +01:00
Jonas Paulsson
435ba72afd
[SystemZ] Simplify handling of AtomicRMW instructions. (#74789)
Let the AtomicExpand pass do more of the job of expanding
AtomicRMWInst:s in order to simplify the handling in the backend.

The only cases that the backend needs to handle itself are those of
subword size (8/16 bits) and those directly corresponding to a target
instruction.
2023-12-08 17:19:17 +01:00
Jonas Paulsson
c568927f3e
[SystemZ] Properly support 16 byte atomic int/fp types and ops. (#73134)
- Clang FE now has MaxAtomicPromoteWidth / MaxAtomicInlineWidth set to 128, and now produces IR
  instead of calls to __atomic instrinsics for 16 bytes as well.
- Atomic __int128 (and long double) variables are now aligned to 16 bytes by default (like gcc 14).
- AtomicExpand pass now expands 16 byte operations as well.
- tests for __atomic builtins for all integer widths, and __atomic_is_lock_free with friends.
- TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s. As a next step
  smaller integer types should also be possible to handle this way instead of by the backend.
2023-12-05 17:17:21 +01:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Ulrich Weigand
c61eb44005 [SystemZ] Implement vector rotate in terms of funnel shift
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics.  To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.

Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
2023-12-04 16:52:00 +01:00
Ulrich Weigand
0c568c2535 [SystemZ] Auto-generate vec-intrinsics tests 2023-12-04 16:36:38 +01:00
Yusra Syeda
9a38a72f1d
[SystemZ][z/OS] This change adds support for the PPA2 section in zOS (#68926)
This PR adds support for the PPA2 fields.

---------

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-11-27 16:30:12 -05:00
Ilya Leoshkevich
03934e70ef
[SystemZ] Enable AtomicExpand pass (#70398)
The upcoming OpenMP support for SystemZ requires handling of IR insns
like `atomicrmw fadd`. Normally atomic float operations are expanded by
Clang and such insns do not occur, but OpenMP generates them directly.
Other architectures handle this using the AtomicExpand pass, which
SystemZ did not need so far. Enable it.

Currently AtomicExpand treats atomic load and stores of floats
pessimistically: it casts them to integers, which SystemZ does not need,
since the floating point load and store instructions are already atomic.
However, the way Clang currently expands them is pessimistic as well, so
this change does not make things worse. Optimizing operations on atomic
floats can be a separate change in the future.

This change does not create any differences the Linux kernel build.
2023-10-31 09:51:06 +01:00
Hans Wennborg
e2fc68c3db Typos: 'maxium', 'minium' 2023-10-23 10:42:28 +02:00
Ilya Leoshkevich
8e810dc7d9
[SystemZ] Support builtin_{frame,return}_address() with non-zero argument (#69405)
When the code is built with -mbackchain, it is possible to retrieve the
caller's frame and return addresses. GCC already can do this, add this
support to Clang as well. Use RISCVTargetLowering and GCC's
s390_return_addr_rtx() as inspiration. Add tests based on what GCC is
emitting.
2023-10-18 19:05:31 +02:00
Yusra Syeda
6cf41ada44
[SystemZ][z/OS] Add vararg support to z/OS (#68834)
This PR adds vararg support to z/OS and updates the call-zos-vararg.ll
lit test.

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-10-12 12:42:55 +02:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Kai Nacke
42de2b7e99
[SystemZ/z/OS] Add library names for intrinsics (#68114)
On z/OS, many library functions have a non-standard name. This change
initializes the table of runtime function which results from lowering
intrinsics to library calls.
2023-10-03 18:53:52 +03:00
Noah Goldstein
de7881ebf5 [DAGCombiner] Combine (select c, (and X, 1), 0) -> (and (zext c), X)
The middle end canonicalizes:
`(and (zext c), X)`
    -> `(select c, (and X, 1), 0)`

But the `and` + `zext` form gets better codegen.
2023-09-28 13:46:46 -05:00