945 Commits

Author SHA1 Message Date
Jonas Paulsson
d6ee7e8481
[SystemZ] Handle address clobbering in splitMove(). (#92105)
When expanding an L128 (which is used to reload i128) it is
possible that the quadword destination register clobbers an
address register. This patch adds an assertion against the case
where both of the expanded parts clobber the address, and in the
case where only one of the expanded parts does so, emits that part last.

Fixes #91437
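
The ordering rule can be sketched with a small Python model. This is not the actual splitMove() code: the function name `order_halves`, the register names, and the 0/8 byte offsets are assumptions made for illustration.

```python
def order_halves(dest_hi, dest_lo, base, index=None):
    """Return the two 64-bit loads of a split 128-bit reload in a safe order.

    Each load is modelled as (destination register, byte offset). A load
    whose destination is also an address register must be issued last,
    because it overwrites the address the other load still needs.
    """
    addr_regs = {r for r in (base, index) if r is not None}
    hi = (dest_hi, 0)
    lo = (dest_lo, 8)
    hi_clobbers = dest_hi in addr_regs
    lo_clobbers = dest_lo in addr_regs
    # Mirrors the assertion described in the commit message: if both
    # halves clobber the address, reordering cannot help.
    assert not (hi_clobbers and lo_clobbers), "both halves clobber the address"
    return [lo, hi] if hi_clobbers else [hi, lo]
```

With `base="r2"` and destinations `r2`/`r3`, the model emits the load into `r3` first and the load into `r2` last.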
2024-05-15 08:36:26 +02:00
Ulrich Weigand
de117dd533 [SystemZ] Add some more atomic load/store tests
Verify atomic load/store of f128 on z14 where the type
lives in VRs.
2024-05-07 16:57:17 +02:00
Ulrich Weigand
0a0cac6dbd
[SystemZ] Simplify f128 atomic load/store (#90977)
Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128
to allow for simplified use in atomic load/store.

Update logic to split 128-bit loads and stores in DAGCombine to also
handle the f128 case where appropriate. This fixes the regressions
introduced by recent atomic load/store patches.
2024-05-06 12:17:19 +02:00
Matt Arsenault
eb75af223f Reapply "SystemZ: Fold copy of vector immediate to gr128" (#91099)
This reverts commit a415b4dfcc02e3e82b8c8a7836f7c04b9d65dc9b.

Modify the instruction in place to transform it into a REG_SEQUENCE,
which is what other implementations of foldImmediate do. Also start
erasing the def instruction if there are no other uses.

Fixes #91110.
2024-05-06 10:00:20 +02:00
Matt Arsenault
4b61d04645 SystemZ: Remove unnecessary REQUIRES asserts from tests 2024-05-06 09:52:35 +02:00
Matt Arsenault
181e82143e SystemZ: Remove redundant REQUIRES systemz from test 2024-05-06 09:52:35 +02:00
Vitaly Buka
a415b4dfcc
Revert "SystemZ: Fold copy of vector immediate to gr128" (#91099)
Fails here:
https://lab.llvm.org/buildbot/#/builders/239/builds/6893
https://lab.llvm.org/buildbot/#/builders/5/builds/43113
https://lab.llvm.org/buildbot/#/builders/168/builds/20228

Reverts llvm/llvm-project#90706
2024-05-04 23:59:49 -07:00
Simon Pilgrim
caacf8685a
[DAG] Fold freeze(shuffle(x,y,m)) -> shuffle(freeze(x),freeze(y),m) (#90952)
If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node.

This requires special case handling to create a new ShuffleVectorSDNode.

Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.
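
A rough Python model of the precondition, assuming undef mask elements are represented as -1. The helper names are hypothetical and this is not the DAGCombiner code. It shows why an undef mask element blocks the fold: that lane stays undefined even when both inputs are frozen.

```python
UNDEF = -1  # assumed sentinel for an undef shuffle-mask element


def can_push_freeze_through_shuffle(mask):
    """The fold is only valid when no mask element is undef."""
    return all(m != UNDEF for m in mask)


def shuffle(x, y, mask):
    """Select lanes from the concatenation of x and y; None models an undef lane."""
    src = list(x) + list(y)
    return [None if m == UNDEF else src[m] for m in mask]
```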
2024-05-04 12:03:10 +01:00
Matt Arsenault
49c5f4d56a
SystemZ: Fold copy of vector immediate to gr128 (#90706)
If materializing a constant in a vector register that is just
going to be copied to general registers, directly materialize
the immediate in the gpr. This will avoid a few lit test regressions
in a future commit.
2024-05-03 18:40:11 +02:00
Matt Arsenault
edbe6ebb4d
SystemZ: Don't promote atomic store in IR (#90899)
This is the mirror to the recent atomic load change. The same
bitcast-back-to-integer case is a small code quality regression for the
same reason. This would disappear with a bitcastable legal 128-bit type.
2024-05-03 10:04:12 +02:00
Matt Arsenault
6535e7a400 SystemZ: Remove redundant copy tests from 75f4baa70 2024-05-03 10:03:05 +02:00
Matt Arsenault
38f9c013a0
SystemZ: Stop casting fp typed atomic loads in the IR (#90768)
shouldCastAtomicLoadInIR is a hack that should be removed. Simple
bitcasting of operations should be in the domain of ordinary type
legalization and does not need to be done in the IR.

This introduces a code quality regression due to the hack currently used
to avoid using 128-bit values in the case where the floating point value
is ultimately used as an integer. This would be avoidable if there were
always a legal 128-bit type (like v2i64). This is a pretty niche
situation so I assume it's not important.

I implemented about 85% of the work necessary to make v2i64 legal, but
it was taking too long and I lack the necessary familiarity with SystemZ
to complete it. I've pushed it here for someone to pick up:
https://github.com/arsenm/llvm-project/pull/new/systemz-legal-v2i64

Depends #90861
2024-05-02 21:31:29 +02:00
Matt Arsenault
d11afe1c74
SystemZ: Handle gr128 to fp128 copies in copyPhysReg (#90861) 2024-05-02 17:46:43 +02:00
Matt Arsenault
3a1e55904b
SystemZ: Add some tests for fp128 atomics with soft-float (#90826) 2024-05-02 15:22:34 +02:00
Matt Arsenault
15027be6a5 SystemZ: Fix test failing the verifier 2024-05-02 08:37:27 +02:00
Matt Arsenault
376bc73b34 SystemZ: Fix accidentally commented out run line in test 2024-05-02 08:37:27 +02:00
Matt Arsenault
75f4baa705
SystemZ: Implement copyPhysReg between vr128 and gr128 (#90616)
I have no idea if this is correct and I probably swapped the element
ordering somewhere.
2024-04-30 23:02:54 +02:00
Jonas Paulsson
6c32a1fdf7
[SystemZ] Enable MachineCombiner for FP reassociation (#83546)
Enable MachineCombining for FP add, sub and mul.

In order for this to work, the default instruction selection of reg/mem opcodes is disabled for ISD nodes that carry the flags that allow reassociation. The reg/mem folding is instead done after MachineCombiner by PeepholeOptimizer. This patch also implements SystemZInstrInfo::optimizeLoadInstr() and foldMemoryOperandImpl() (the "LoadMI version") for this purpose.
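
As an illustration of why reassociation helps and why it needs explicit flags, here is a small Python sketch. The tuple-based expression trees and the `depth` helper are assumptions for the example and have nothing to do with the MachineCombiner data structures.

```python
def depth(expr):
    """Length of the longest dependency chain in a binary expression tree."""
    if not isinstance(expr, tuple):
        return 0  # a leaf operand is available immediately
    _, lhs, rhs = expr
    return 1 + max(depth(lhs), depth(rhs))


# ((a + b) + c) + d: every add waits for the previous one.
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")
# (a + b) + (c + d): the two inner adds are independent.
balanced = ("+", ("+", "a", "b"), ("+", "c", "d"))
```

The balanced form has a shorter critical path (depth 2 instead of 3). Because floating-point addition is not associative, the result may differ, which is why the transformation is limited to nodes whose flags allow reassociation.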
2024-04-30 17:09:54 +02:00
Matt Arsenault
738c135ee0
SystemZ: Add more tests for fp128 atomics (#90269)
These did not have proper floating point uses so weren't representative
samples. The bitcast inserted by lowering could be absorbed by the
load/store on the source/use.
2024-04-27 20:26:09 +02:00
Kai Nacke
d5022d9ad4
[SystemZ][z/OS] Make z/OS personality function known (#89679)
This change adds the z/OS personality function to the list of known EH
personality functions. It enables removal of the EH data/labels if the
personality function is not invoked.
2024-04-23 10:39:03 -04:00
Kai Nacke
cce4dc7b7a
[SystemZ][z/OS] Implement llvm.returnaddress for XPLINK (#89440)
The implementation follows the ELF implementation.
2024-04-22 11:01:22 -04:00
Kai Nacke
7e2c2981fb
[SystemZ][z/OS] Implement llvm.frameaddress for XPLINK (#89284)
The implementation follows the ELF implementation.
2024-04-19 08:09:49 -04:00
Jonas Paulsson
7e4c6e98fa
[SystemZ] Bugfix in getDemandedSrcElements(). (#88623)
For the intrinsic s390_vperm, all of the elements are demanded, so use
an APInt with the value of '-1' for them (not '1').

Fixes https://github.com/llvm/llvm-project/issues/88397
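
The difference between the two values can be shown with a minimal Python model of a fixed-width bit mask. The helper `demanded_mask` and the element count of 16 are assumptions for illustration.

```python
def demanded_mask(num_elts, value):
    """Model a num_elts-bit mask built from a signed integer value."""
    return value & ((1 << num_elts) - 1)


all_demanded = demanded_mask(16, -1)  # every bit set: all elements demanded
only_first = demanded_mask(16, 1)     # only bit 0 set: just element 0 demanded
```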
2024-04-15 16:32:14 +02:00
Dominik Steenken
b794dc2325
[SystemZ] Add custom handling of legal vectors with reduce-add. (#88495)
This commit skips the expansion of the `vector.reduce.add` intrinsic on
vector-enabled SystemZ targets in order to introduce custom handling of
`vector.reduce.add` for legal vector types using the VSUM instructions.
This is limited to full vectors with scalar types up to `i32` due to
performance concerns.

It also adds testing for the generation of such custom handling, and
adapts the related cost computation, as well as the testing for that.

The expected result is a performance boost in certain benchmarks that
make heavy use of `vector.reduce.add` with other benchmarks remaining
constant.

For instance, the assembly for `vector.reduce.add<4 x i32>` changes from
```hlasm
        vmrlg   %v0, %v24, %v24
        vaf     %v0, %v24, %v0
        vrepf   %v1, %v0, 1
        vaf     %v0, %v0, %v1
        vlgvf   %r2, %v0, 0
```
to
```hlasm
        vgbm    %v0, 0
        vsumqf  %v0, %v24, %v0
        vlgvf   %r2, %v0, 3
```
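
A Python sketch of what the shorter sequence computes, based on my reading of the instruction semantics (VSUMQF adds the four word elements of its second operand to the rightmost word of its third operand and produces a 128-bit sum). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK128 = (1 << 128) - 1


def vsumqf(v2, v3):
    """Sum the four 32-bit words of v2 plus the rightmost word of v3."""
    return (sum(w & MASK32 for w in v2) + (v3[3] & MASK32)) & MASK128


def reduce_add_v4i32(v):
    zero = [0, 0, 0, 0]   # vgbm   %v0, 0
    q = vsumqf(v, zero)   # vsumqf %v0, %v24, %v0
    return q & MASK32     # vlgvf  %r2, %v0, 3 (word 3 holds the low 32 bits)
```

The i32 result wraps modulo 2^32, which matches the wrapping behaviour of the integer add reduction.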
2024-04-12 18:05:30 +02:00
Jonas Paulsson
16b7cc69ef
[SystemZ] Eliminate call sequence instructions early. (#77812)
On SystemZ, the outgoing argument area which is big enough for all calls
in the function is created once during the prolog, as opposed to
adjusting the stack around each call. The call-sequence instructions are
therefore useful only for computing the maximum call frame size, which
has so far been done by PEI but can just as well be done at an earlier
point.

This patch removes the mapping of the CallFrameSetupOpcode and
CallFrameDestroyOpcode and instead computes the MaxCallFrameSize
directly after instruction selection and then removes the ADJCALLSTACK
pseudos. This removes the confusing pseudos and also avoids the problem
of having to keep the call frame size accurate when creating new MBBs.

This fixes #76618 which exposed the need to maintain the call frame size
when splitting blocks (which was not done).
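
The idea can be sketched in Python as a single pass over an instruction list. The `(opcode, operand)` tuples and the function name are assumptions for illustration. Only the pseudo names are taken from the backend.

```python
CALL_SEQ_PSEUDOS = ("ADJCALLSTACKDOWN", "ADJCALLSTACKUP")


def eliminate_call_sequences(instrs):
    """Return (max call frame size, instructions without call-sequence pseudos).

    Each instruction is modelled as (opcode, operand); for the pseudos the
    operand is the size of the outgoing argument area of that call.
    """
    max_size = 0
    kept = []
    for opcode, operand in instrs:
        if opcode in CALL_SEQ_PSEUDOS:
            max_size = max(max_size, operand)
            continue  # the pseudo itself is dropped
        kept.append((opcode, operand))
    return max_size, kept
```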
2024-03-28 18:26:38 +01:00
Ulrich Weigand
4b907414d2 [SystemZ] Add support for llvm.readcyclecounter
The llvm.readcyclecounter intrinsic can be implemented via the
STORE CLOCK FAST (STCKF) instruction.
2024-03-22 20:01:02 +01:00
Jonas Paulsson
7564566779 Reapply "Move assertion for AdjustsStack from PEI to MachineVerifier (#85698)"
- The check is now actually done in both PEI and the MachineVerifier.
- More .mir tests trivially updated with "adjustsStack: true" as needed.
2024-03-21 20:24:57 -04:00
Jonas Paulsson
b4b5e8277a
Check for all frame instructions in finalize isel. (#85945)
Check for all frame instructions in finalize isel, not just for the
frame setup opcode. This proved necessary; see #78001 for discussion.
2024-03-21 11:00:08 -04:00
Jonas Paulsson
9ebd329ad8 Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)"
This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92.

Reverting due to verifier complaints with expensive checks on build-bot.
2024-03-20 11:48:30 -04:00
Neumann Hon
5fb2797f23
[GOFF][z/OS] Change PrivateGlobalPrefix and PrivateLabelPrefix to be L# (#85730)
The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@
and @ respectively) are, in hindsight, poor choices for multiple
reasons:

First, there exist externally visible routines from the language
environment that begin with @@. These functions are certainly not
local/private by any means and they should not share a prefix with
private globals.

Secondly, both private globals and private labels should be handled the
same way by GOFF, so it doesn't make much sense for them to have
separate prefixes. GOFF remains the only file format where these are
different and there is no reason for that to be the case.
2024-03-20 10:30:30 -04:00
Jonas Paulsson
05bde30585
Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)
Have the verifier report a missing AdjustsStack flag rather than waiting until
PEI asserts.
2024-03-20 10:29:12 -04:00
Ulrich Weigand
335f365982 Reapply: [SystemZ] Fix overflow flag for i128 USUBO
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type.  However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicates borrow).  This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.

Fix this by generating code to explicitly invert the flag.  These
cancel out if the result of USUBO feeds into a USUBO_CARRY.

To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.

Fixes: https://github.com/llvm/llvm-project/issues/83268
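
A minimal Python model of the flag inversion. The function names are hypothetical; the hardware flag is modelled only as described above (1 means no borrow, 0 means borrow).

```python
MASK128 = (1 << 128) - 1


def hw_borrow_indication(a, b):
    """Model of the hardware flag: 1 means no borrow, 0 means borrow."""
    return 1 if a >= b else 0


def usubo_i128(a, b):
    """Return (difference, overflow) with USUBO-style overflow semantics."""
    diff = (a - b) & MASK128
    overflow = hw_borrow_indication(a, b) ^ 1  # explicit inversion
    return diff, overflow
```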
2024-03-19 14:07:08 +01:00
Ulrich Weigand
d1c3795968 Revert "Fix overflow flag for i128 USUBO"
This reverts commit d9c31ee9568277e4303715736b40925e41503596.
2024-03-19 11:43:05 +01:00
Ulrich Weigand
d9c31ee956 Fix overflow flag for i128 USUBO
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type.  However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicates borrow).  This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.

Fix this by generating code to explicitly invert the flag.  These
cancel out if the result of USUBO feeds into a USUBO_CARRY.

To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.

Fixes: https://github.com/llvm/llvm-project/issues/83268
2024-03-19 11:20:52 +01:00
Jonas Paulsson
8b8e1adbde
[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879)
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.

- AtomicExpand pass no longer casts float/double atomic load/stores to integer
  (FP128 is still cast).
2024-03-18 17:21:50 -04:00
Jonas Paulsson
09bc6abba6
[MachineFrameInfo] Refactoring around computeMaxCallFrameSize() (NFC) (#78001)
- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.

- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
2024-03-18 10:37:59 -04:00
Kevin P. Neal
3e9e5e2771 [FPEnv][SystemZ] Correct strictfp test.
Correct llvm-reduce strictfp test to follow the rules documented in the
LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

This test needed the strictfp attribute added to function definitions.

Test changes verified with D146845.
2024-02-23 13:00:38 -05:00
Jonas Paulsson
9c0e45d7f0
[SystemZ] Use VT (not ArgVT) for SlotVT in LowerCall(). (#82475)
When an integer argument is promoted and *not* split (like i72 -> i128 on
a new machine with vector support), the SlotVT should be i128, which is
stored in VT - not ArgVT.

Fixes #81417
2024-02-21 16:26:16 +01:00
Ilya Leoshkevich
9c75a98155
[SystemZ] Implement A, O and R inline assembly format flags (#80685)
Implement the following assembly format flags, which are already
supported by GCC:

	'A': On z14 or higher: If operand is a mem print the alignment
         hint usable with vl/vst prefixed by a comma.
	'O': print only the displacement of a memory reference or address.
	'R': print only the base register of a memory reference or address.

Implement 'A' conservatively, since the memory operand alignment
information is not available for INLINEASM at the moment.
2024-02-07 20:41:40 +01:00
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC) 2024-02-05 14:07:09 +01:00
Oskar Wirga
ff4636a4ab
Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940)
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498

> Currently, recomputeLiveIns recomputes the live-in registers for a single
MachineBasicBlock, but the result depends on the order in which
recomputeLiveIns is called, which can lead to incorrect register
allocations down the line.

Now we do not recompute the entire CFG, but we do ensure that the newly
added MBBs reach convergence.
2024-01-30 19:33:04 -08:00
Nikita Popov
07a1925b8b Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)"
This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b.

Introduces a major compile-time regression.
2024-01-26 22:33:17 +01:00
Oskar Wirga
59bf60519f
Refactor recomputeLiveIns to operate on whole CFG (#79498)
Currently, recomputeLiveIns recomputes the live-in registers for a single
MachineBasicBlock, but the result depends on the order in which
recomputeLiveIns is called, which can lead to incorrect register
allocations down the line.

This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
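
Iterating to convergence can be sketched as a standard backward liveness fixed point. This Python model is an assumption-laden illustration (block names, `use`/`defs` sets and the function name are invented) and not the LLVM implementation.

```python
def compute_live_ins(blocks, succs, use, defs):
    """Recompute live-in sets for all blocks until nothing changes.

    live_in(B) = use(B) | (live_out(B) - defs(B)), where live_out(B) is the
    union of the live-in sets of B's successors.
    """
    live_in = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            live_out = set()
            for s in succs.get(b, ()):
                live_out |= live_in[s]
            new_in = use[b] | (live_out - defs[b])
            if new_in != live_in[b]:
                live_in[b] = new_in
                changed = True
    return live_in
```

Because the loop runs until no set changes, the result does not depend on the order in which the blocks are visited, which is the property the single-block approach lacked.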
2024-01-26 11:25:36 -08:00
dyung
45f883ed06
Change check for embedded llvm version number to a regex to make test more flexible. (#79528)
This test started to fail when LLVM created the release/18.x branch and
the main branch subsequently had the version number increased from 18 to
19.

I investigated this failure (it was blocking our internal automation)
and discovered that the CHECK statement on line 27 seemed to have the
compiler version number (1800) encoded in octal that it was checking
for. I don't know if this is something that explicitly needs to be
checked, so I am leaving it in, but it should be more flexible so the
test doesn't fail anytime the version number is changed. To accomplish
that, I changed the check for the 4-digit version number to be a regex.

I originally updated this test for the 18->19 transition in
a01195ff5cc3d7fd084743b1f47007645bb385f4. This change makes the CHECK
line more flexible so it doesn't need to be continually updated.
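
The effect of replacing the fixed version digits with a pattern can be illustrated with Python's `re` module. The strings are the decoded values quoted in a01195ff5c. The real CHECK line uses FileCheck syntax and may look different, so the pattern here is an assumption.

```python
import re

# Four arbitrary digits for the version, followed by the fixed remainder.
VERSION_LINE = re.compile(r"^[0-9]{4}1970010100000000$")


def matches(text):
    """True if text has a 4-digit version prefix and the expected suffix."""
    return VERSION_LINE.match(text) is not None
```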
2024-01-26 09:36:20 -08:00
Jonas Paulsson
84dcf3d35b
[SystemZ] Require D12 for i128 accesses in isLegalAddressingMode() (#79221)
Machines with vector support handle i128 in vector registers and
therefore only have the small displacement available for memory
accesses. Update isLegalAddressingMode() to reflect this.
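
A Python sketch of the displacement check, assuming D12 means an unsigned 12-bit displacement and that other accesses may use a signed 20-bit displacement. The function names and parameters are hypothetical and do not mirror the isLegalAddressingMode() signature.

```python
def is_uint(bits, value):
    """True if value fits in an unsigned field of the given width."""
    return 0 <= value < (1 << bits)


def is_int(bits, value):
    """True if value fits in a signed two's-complement field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def is_legal_displacement(offset, is_i128_access, has_vector):
    # i128 lives in vector registers on vector-capable machines, and the
    # vector load/store forms only offer the small displacement.
    if is_i128_access and has_vector:
        return is_uint(12, offset)
    return is_int(20, offset)
```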
2024-01-24 20:16:05 +01:00
Douglas Yung
a01195ff5c Update compiler version expected that seems to be embedded in CHECK line of test at llvm/test/CodeGen/SystemZ/zos-ppa2.ll.
The test contains a CHECK line which verifies an .ascii line which originally checks
for 18001970010100000000. After the bump of the compiler version to 19, the test
started to fail with the string now being 19001970010100000000.

This should fix this failing test on bots.
2024-01-23 22:05:17 -08:00
Jonas Paulsson
1d1893097a
[SystemZ] Don't use FP Load and Test as comparisons to same reg (#78074)
The usage of FP Load and Test instructions as a comparison against zero
with the assumption that the dest reg will always reflect the source reg is
actually incorrect: Unfortunately, a SNaN will be converted to a QNaN, so the
instruction may actually change the value as opposed to being a pure register
move with a test.

This patch
- changes instruction selection to always emit FP LT with a scratch def
  reg, which will typically be allocated to the same reg if dead.
- removes the conversions into FP LT in SystemZElimCompare.
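
Why the instruction is not a pure move can be seen from the bit patterns. The following Python sketch models quieting of an IEEE-754 binary64 NaN on raw 64-bit integers. The `quiet` helper is hypothetical and only models the value change described above.

```python
QUIET_BIT = 1 << 51          # most significant fraction bit of a binary64
FRACTION_MASK = (1 << 52) - 1


def quiet(bits64):
    """Return the bit pattern with the quiet bit set if it encodes a NaN."""
    exponent = (bits64 >> 52) & 0x7FF
    fraction = bits64 & FRACTION_MASK
    is_nan = exponent == 0x7FF and fraction != 0
    return bits64 | QUIET_BIT if is_nan else bits64


SNAN = 0x7FF0000000000001    # a signaling NaN: quiet bit clear, fraction nonzero
ONE = 0x3FF0000000000000     # the value 1.0
```

For `SNAN` the output differs from the input, so the destination register no longer equals the source; for ordinary values such as `ONE` the pattern is unchanged.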
2024-01-15 19:36:40 +01:00
Jonas Paulsson
e2ce91f48c Fix test output for 3b16d8c 2024-01-15 12:04:00 -06:00
Jonas Paulsson
3b16d8c8ea
[SystemZ] Don't crash on undef source in shouldCoalesce() (#78056)
SystemZRegisterInfo::shouldCoalesce() has to be able to handle an undef
source.
2024-01-15 17:24:38 +01:00
Ulrich Weigand
9aa8c82748 [SystemZ] Fix 256-bit shifts when i128 is legal
When i128 is a legal type, SelectionDAG now attempts to use
SRL_PARTS etc. with type i128, which is not implemented.  Fix
by marking those as Expand, just like we do for i64.

Fixes https://github.com/llvm/llvm-project/issues/77132
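
For reference, the operation that an SRL_PARTS node on i128 halves would have to compute can be modelled in Python. This is only a sketch of the arithmetic, with a hypothetical function name, and not the code the legalizer emits when the node is expanded.

```python
MASK128 = (1 << 128) - 1


def srl_parts(lo, hi, amt):
    """Logical right shift of a 256-bit value held as two 128-bit halves."""
    assert 0 <= amt < 256
    if amt == 0:
        return lo, hi
    if amt < 128:
        # Bits shifted out of the high half move into the top of the low half.
        new_lo = ((lo >> amt) | (hi << (128 - amt))) & MASK128
        new_hi = hi >> amt
    else:
        # The whole low half is shifted out; the high half supplies the result.
        new_lo = hi >> (amt - 128)
        new_hi = 0
    return new_lo, new_hi
```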
2024-01-10 15:12:19 +01:00