967 Commits

Author SHA1 Message Date
tltao
0aec4d2b78
[SystemZ] Update large ada tests after HLASM syntax change (#113578)
Fix buildbot failures seen on:
https://lab.llvm.org/buildbot/#/builders/42/builds/1597

caused by:  https://github.com/llvm/llvm-project/pull/113369

Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
2024-10-24 11:02:36 -04:00
tltao
c170405996
[SystemZ] Introduce GNU and HLASM differences to asmwriter and update tests (#113369)
Now that the GNU and HLASM `InstPrinter` paths are separated in
https://github.com/llvm/llvm-project/pull/112975, differentiate between
them in `SystemZInstrFormats.td`.

The main difference are:
- Tabs converted to space
- Remove space after comma for instruction operands

---------

Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
2024-10-23 13:06:48 -04:00
Alex Rønne Petersen
5785cbb405
[llvm] Ensure that soft float targets don't emit fma() libcalls. (#106615)
The previous behavior could be harmful in some edge cases, such as
emitting a call to `fma()` in the `fma()` implementation itself.

Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`.
This was already done for PowerPC; this commit just extends that to Arm,
z/Arch, and x86. MIPS and SPARC already got it right, but I added tests
for them too, for good measure.

Note: I don't have commit access.
2024-10-19 06:13:15 -07:00
Jay Foad
922992a22f
Fix typo "instrinsic" (#112899) 2024-10-18 15:58:33 +01:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
Jonas Paulsson
76aa370f44
[SystemZ] Remove inlining threshold multiplier. (#106058)
Due to recently reported problems with having the inlining threshold multiplier
set fairly high (x3), this patch removes the multiplier while addressing
the regressions seen by doing so in adjustInliningThreshold().

The specific cases that benefit from inlining that were now found to be in need
of handling contain a considerable number of memory accesses to the same
memory in both caller and callee.
2024-10-07 10:59:45 +02:00
Jonas Paulsson
f9fbfc587d
[SystemZ] Dump function signature on missing arg extension. (#109699)
Make it easier to handle detected problems by providing the function
signature(s) involved in cases of missing argument extensions.
2024-09-30 17:03:18 +02:00
Jonas Paulsson
0ef24aa549
Fix for logic in combineExtract() (#108208)
A (csmith) test case appeared where combineExtract() crashed when the
input vector was a bitcast into a vector of i1:s. Fix this by adding a check
with canTreatAsByteVector() before the call.
2024-09-25 12:12:27 +02:00
Jonas Paulsson
14120227a3
Target ABI: improve call parameters extensions handling (#100757)
For the purpose of verifying proper arguments extensions per the target's ABI,
introduce the NoExt attribute that may be used by a target when neither sign-
or zeroextension is required (e.g. with a struct in register). The purpose of
doing so is to be able to verify that there is always one of these attributes
present and by this detecting cases where sign/zero extension is actually
missing.

As a first step, this patch has the verification step done for the SystemZ
backend only, but left off by default until all known issues have been
addressed.

Other targets/front-ends can now also add NoExt attribute where needed and do
this check in the backend.
2024-09-19 16:59:31 +02:00
Simon Pilgrim
4baf29e81e [DAG] Handle cases where a shift amount is larger than the pre-extended value bitwidth
In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds.

Fixes #106202
2024-08-27 18:12:24 +01:00
Abhina Sree
a0be7053d7
[SystemZ][z/OS] Continuation of __ptr32 support (#103393)
This is a continuation of the __ptr32 support added here
135fecd444
2024-08-14 13:26:30 -04:00
tltao
bc747c3e13
[SystemZ][z/OS] Fix incorrect codegen for ADA_ENTRY pseudo instruction (#101415)
The current MCInstBuilder for generating an ALGFI when loading something
from the ADA is incorrect and will crash the compiler.

r0 must also be excluded from the registers returned as the result,
since it is treated as the value "0" on z/OS.

Also add some tests to properly test the paths where LLILF and ALGFI are
generated.

---------

Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
2024-08-01 13:23:49 -04:00
Jonas Paulsson
22bc9db92b
[SystemZ] Use the EVT version of getVectorVT() in combineTruncateExtract(). (#100150)
A test case showed up where the new vector type is v24i16, which is not a simple
MVT. In order to get an extended value type for cases like this, EVT::getVectorVT()
needs to be called instead of MVT::getVectorVT(), otherwise the following call
to getVectorElementType() in combineExtract() will fail.
2024-07-26 14:33:40 +02:00
Björn Pettersson
2b78303e3f
[DAGCombiner] Freeze maybe poison operands when folding select to logic (#84924)
Just like for regular IR we need to treat SELECT as conditionally
blocking poison in SelectionDAG. So (unless the condition itself is
poison) the result is only poison if the selected true/false value is
poison.
Thus, when doing DAG combines that turn SELECT into arithmetic/logical
operations (e.g. AND/OR) we need to make sure that the new operations
aren't more poisonous. One way to do that is to use FREEZE to make
sure the operands aren't posion.

This patch aims at fixing the kind of miscompiles reported in
  https://github.com/llvm/llvm-project/issues/84653
and
  https://github.com/llvm/llvm-project/issues/85190

Solution is to make sure that we insert FREEZE, if needed to make
the fold sound, when using the foldBoolSelectToLogic and
foldVSelectToSignBitSplatMask DAG combines.
2024-07-22 17:19:46 +02:00
Volodymyr Vasylkun
e094abde42
[SelectionDAG] Expand [US]CMP using arithmetic on boolean values instead of selects (#98774)
The previous expansion of [US]CMP was done using two selects and two
compares. It produced decent code, but on many platforms it is better to
implement [US]CMP nodes by performing the following operation:

  ```
[us]cmp(x, y) = (x [us]> y) - (x [us]< y)
```

This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.
2024-07-16 20:56:18 +01:00
Ulrich Weigand
e8e406041e Fix sext_in_reg from i1 to i128
The combineSIGN_EXTEND_INREG routine was using
DAG.getConstant(-1, DL, VT), which does not result in
the expected value when VT has more than 64 bits.

Fix this by using DAG.getAllOnesConstant(DL, VT) instead.

Also add test cases for v1i128 comparisons (which triggers
the bug).
2024-07-15 11:26:37 +02:00
Manish Kausik H
69192e0193
[LegalizeDAG] Optimize CodeGen for ISD::CTLZ_ZERO_UNDEF (#83039)
Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case.

The details of the optimization are outlined in #82075

Fixes #82075

Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
2024-07-08 14:01:32 +01:00
Zibi Sarbinowski
1de1818fab
[SystemZ] Address issue with supper large stack frames (#96318)
This PR fixes the following failure by adjusting the calculation of
maximum displacement from Stack Pointer.

`LLVM ERROR: Error while trying to spill R5D from class ADDR64Bit:
Cannot scavenge register without an emergency spill slot!
`
2024-06-27 09:23:35 -04:00
Farzon Lotfi
189d471191
[clang] Reland Add tanf16 builtin and support for tan constrained intrinsic (#94559)
Relanding this PR now that
https://github.com/llvm/llvm-project/pull/90503 has merged. with `FTAN`
landing in
[TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63
) There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm
backends.

In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.

-  `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
-   `ConstrainedOps.def` map tan and constrained_tan  to an ISDOpcode.

resolves  #91421

---------

Co-authored-by: Farzon Lotfi <farzon@farzon.com>
2024-06-10 20:46:26 -04:00
paperchalice
1bc8b3258e
[NewPM][CodeGen] Port regallocfast to new pass manager (#94426)
This pull request port `regallocfast` to new pass manager. It exposes
the parameter `filter` to handle different register classes for AMDGPU.
IIUC AMDGPU need to allocate different register classes separately so it
need implement its own `--<reg-class>-regalloc`. Now users can use e.g.
`-passe=regallocfast<filter=sgpr>` to allocate specific register class.
The command line option `--regalloc-npm` is still in work progress, plan
to reuse the syntax of passes, e.g. use
`--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to
replace `--sgpr-regalloc` and `--vgpr-regalloc`.
2024-06-07 12:22:42 +08:00
paperchalice
9b0e1c2ca2
[NewPM][CodeGen] Port finalize-isel to new pass manager (#94214)
It should preserve more analysis results, but it happens immediately
after instruction selection.
2024-06-04 09:23:52 +08:00
Matt Arsenault
ddb87e0f96
SystemZ: Use REG_SEQUENCE for PAIR128 (#90640)
PAIR128 should probably just be removed entirely

Depends #90638
2024-05-17 13:16:34 +02:00
Jonas Paulsson
d6ee7e8481
[SystemZ] Handle address clobbering in splitMove(). (#92105)
When expanding an L128 (which is used to reload i128) it is
possible that the quadword destination register clobbers an
address register. This patch adds an assertion against the case
where both of the expanded parts clobber the address, and in the
case where one of the expanded parts do so puts it last.

Fixes #91437
2024-05-15 08:36:26 +02:00
Ulrich Weigand
de117dd533 [SystemZ] Add some more atomic load/store tests
Verify atomic load/store of f128 on z14 where the type
lives in VRs.
2024-05-07 16:57:17 +02:00
Ulrich Weigand
0a0cac6dbd
[SystemZ] Simplify f128 atomic load/store (#90977)
Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128
to allow for simplified use in atomic load/store.

Update logic to split 128-bit loads and stores in DAGCombine to also
handle the f128 case where appropriate. This fixes the regressions
introduced by recent atomic load/store patches.
2024-05-06 12:17:19 +02:00
Matt Arsenault
eb75af223f Reapply "SystemZ: Fold copy of vector immediate to gr128" (#91099)
This reverts commit a415b4dfcc02e3e82b8c8a7836f7c04b9d65dc9b.

Modify the instruction in place to transform it into a REG_SEQUENCE,
which is what other implementations of foldImmediate do. Also start
erasing the def instruction if there are no other uses.

Fixes #91110.
2024-05-06 10:00:20 +02:00
Matt Arsenault
4b61d04645 SystemZ: Remove unnecessary REQUIRES asserts from tests 2024-05-06 09:52:35 +02:00
Matt Arsenault
181e82143e SystemZ: Remove redundant REQUIRES systemz from test 2024-05-06 09:52:35 +02:00
Vitaly Buka
a415b4dfcc
Revert "SystemZ: Fold copy of vector immediate to gr128" (#91099)
Fails here:
https://lab.llvm.org/buildbot/#/builders/239/builds/6893
https://lab.llvm.org/buildbot/#/builders/5/builds/43113
https://lab.llvm.org/buildbot/#/builders/168/builds/20228

Reverts llvm/llvm-project#90706
2024-05-04 23:59:49 -07:00
Simon Pilgrim
caacf8685a
[DAG] Fold freeze(shuffle(x,y,m)) -> shuffle(freeze(x),freeze(y),m) (#90952)
If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node.

This requires special case handling to create a new ShuffleVectorSDNode.

Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison  / canCreateUndefOrPoison.
2024-05-04 12:03:10 +01:00
Matt Arsenault
49c5f4d56a
SystemZ: Fold copy of vector immediate to gr128 (#90706)
If materializing a constant in a vector register that is just
going to be copied to general registers, directly materialize
the immediate in the gpr. This will avoid a few lit test regressions
in a future commit.
2024-05-03 18:40:11 +02:00
Matt Arsenault
edbe6ebb4d
SystemZ: Don't promote atomic store in IR (#90899)
This is the mirror to the recent atomic load change. The same
bitcast-back-to-integer case is a small code quality regression for the
same reason. This would disappear with a bitcastable legal 128-bit type.
2024-05-03 10:04:12 +02:00
Matt Arsenault
6535e7a400 SystemZ: Remove redundant copy tests from 75f4baa70 2024-05-03 10:03:05 +02:00
Matt Arsenault
38f9c013a0
SystemZ: Stop casting fp typed atomic loads in the IR (#90768)
shouldCastAtomicLoadInIR is a hack that should be removed. Simple
bitcasting of operations should be in the domain of ordinary type
legalization and does not need to be done in the IR.

This introduces a code quality regression due to the hack currently used
to avoid using 128-bit values in the case where the floating point value
is ultimately used as an integer. This would be avoidable if there were
always a legal 128-bit type (like v2i64). This is a pretty niche
situation so I assume it's not important.

I implemented about 85% of the work necessary to make v2i64 legal, but
it was taking too long and I lack the necessary familiarity with systemz
to complete it. I've pushed it here for someone to pick up:
https://github.com/arsenm/llvm-project/pull/new/systemz-legal-v2i64

Depends #90861
2024-05-02 21:31:29 +02:00
Matt Arsenault
d11afe1c74
SystemZ: Handle gr128 to fp128 copies in copyPhysReg (#90861) 2024-05-02 17:46:43 +02:00
Matt Arsenault
3a1e55904b
SystemZ: Add some tests for fp128 atomics with soft-float (#90826) 2024-05-02 15:22:34 +02:00
Matt Arsenault
15027be6a5 SystemZ: Fix test failing the verifier 2024-05-02 08:37:27 +02:00
Matt Arsenault
376bc73b34 SystemZ: Fix accidentally commented out run line in test 2024-05-02 08:37:27 +02:00
Matt Arsenault
75f4baa705
SystemZ: Implement copyPhysReg between vr128 and gr128 (#90616)
I have no idea if this is correct and I probably swapped the element
ordering somewhere.
2024-04-30 23:02:54 +02:00
Jonas Paulsson
6c32a1fdf7
[SystemZ] Enable MachineCombiner for FP reassociation (#83546)
Enable MachineCombining for FP add, sub and mul.

In order for this to work, the default instruction selection of reg/mem opcodes is disabled for ISD nodes that carry the flags that allow reassociation. The reg/mem folding is instead done after MachineCombiner by PeepholeOptimizer. SystemZInstrInfo optimizeLoadInstr() and foldMemoryOperandImpl() ("LoadMI version") have been implemented for this purpose also by this patch.
2024-04-30 17:09:54 +02:00
Matt Arsenault
738c135ee0
SystemZ: Add more tests for fp128 atomics (#90269)
These did not have proper floating point uses so weren't representative
samples. The bitcast inserted by lowering could be absorbed by the
load/store on the source/use.
2024-04-27 20:26:09 +02:00
Kai Nacke
d5022d9ad4
[SystemZ][z/OS] Make z/OS personality function known (#89679)
This change adds the z/OS personality function to the list of known EH
personality functions. It enables removing of the EH data/labels if the
personality function is not invoked.
2024-04-23 10:39:03 -04:00
Kai Nacke
cce4dc7b7a
[SystemZ][z/OS] Implement llvm.returnaddress for XPLINK (#89440)
The implementation follows the ELF implementation.
2024-04-22 11:01:22 -04:00
Kai Nacke
7e2c2981fb
[SystemZ][z/OS] Implement llvm.frameaddr for XPLINK (#89284)
The implementation follows the ELF implementation.
2024-04-19 08:09:49 -04:00
Jonas Paulsson
7e4c6e98fa
[SystemZ] Bugfix in getDemandedSrcElements(). (#88623)
For the intrinsic s390_vperm, all of the elements are demanded, so use
an APInt with the value of '-1' for them (not '1').

Fixes https://github.com/llvm/llvm-project/issues/88397
2024-04-15 16:32:14 +02:00
Dominik Steenken
b794dc2325
[SystemZ] Add custom handling of legal vectors with reduce-add. (#88495)
This commit skips the expansion of the `vector.reduce.add` intrinsic on
vector-enabled SystemZ targets in order to introduce custom handling of
`vector.reduce.add` for legal vector types using the VSUM instructions.
This is limited to full vectors with scalar types up to `i32` due to
performance concerns.

It also adds testing for the generation of such custom handling, and
adapts the related cost computation, as well as the testing for that.

The expected result is a performance boost in certain benchmarks that
make heavy use of `vector.reduce.add` with other benchmarks remaining
constant.

For instance, the assembly for `vector.reduce.add<4 x i32>` changes from
```hlasm
        vmrlg   %v0, %v24, %v24
        vaf     %v0, %v24, %v0
        vrepf   %v1, %v0, 1
        vaf     %v0, %v0, %v1
        vlgvf   %r2, %v0, 0
```
to
```hlasm
        vgbm    %v0, 0
        vsumqf  %v0, %v24, %v0
        vlgvf   %r2, %v0, 3
```
2024-04-12 18:05:30 +02:00
Jonas Paulsson
16b7cc69ef
[SystemZ] Eliminate call sequence instructions early. (#77812)
On SystemZ, the outgoing argument area which is big enough for all calls
in the function is created once during the prolog, as opposed to
adjusting the stack around each call. The call-sequence instructions are
therefore not really useful any more than to compute the maximum call
frame size, which has so far been done by PEI, but can just as well be
done at an earlier point.

This patch removes the mapping of the CallFrameSetupOpcode and
CallFrameDestroyOpcode and instead computes the MaxCallFrameSize
directly after instruction selection and then removes the ADJCALLSTACK
pseudos. This removes the confusing pseudos and also avoids the problem
of having to keep the call frame size accurate when creating new MBBs.

This fixes #76618 which exposed the need to maintain the call frame size
when splitting blocks (which was not done).
2024-03-28 18:26:38 +01:00
Ulrich Weigand
4b907414d2 [SystemZ] Add support for llvm.readcyclecounter
The llvm.readcyclecounter intrinsic can be implemented via the
STORE CLOCK FAST (STCKF) instruction.
2024-03-22 20:01:02 +01:00
Jonas Paulsson
7564566779 Reapply "Move assertion for AdjustsStack from PEI to MachineVerifier (#85698)"
- The check is now actually done in both PEI and the MachineVerifier.
- More .mir tests trivially updated with "adjustsStack: true" as needed.
2024-03-21 20:24:57 -04:00
Jonas Paulsson
b4b5e8277a
Check for all frame instructions in finalize isel. (#85945)
Check for all frame instructions in finalize isel, not just for the
frame setup opcode. This was proven necessary, see #78001 
for discussion.
2024-03-21 11:00:08 -04:00