1075 Commits

Author SHA1 Message Date
Tony Tao
019ecdf7bb
[SystemZ][GOFF] Implement emitGlobalAlias for GOFF/HLASM (#180041)
HLASM has a requirement where aliasing labels need to be emitted at the
same time as the aliasee label, similar to AIX. I used their
implementation for reference with some modifications as we can only
alias functions and we must emit all symbol attributes before the label
is emitted to ensure the XATTR instruction contains the correct
attributes.

---------

Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
2026-02-06 14:49:25 -05:00
Kai Nacke
f3bd1b9526
[SystemZ][z/OS] Use the text section for jump tables (#179793)
Jump tables are read only data, and the text section is the best choice
for them.
2026-02-05 08:18:17 -05:00
sujianIBM
8b28f5229e
[SystemZ][z/OS] Reverse the order of instructions to save and restore CSRs (#179540)
Reverse the order of the instructions that save and restore CSRs so that
the instruction on the lower-numbered register goes first.
2026-02-04 11:48:09 -05:00
sujianIBM
bc80d1ac0c
[SystemZ][z/OS] Set R5 as not restored. (#179666)
R5 (environment register) should not be restored. This is missing in the
code.
Add it back and also add a test to verify it.
2026-02-04 10:57:24 -05:00
Tony Tao
637a038c04
[SystemZ][GOFF] Implement lowerConstant (#179394)
Implement lowerConstant for SystemZ and handle special cases where
entries need to be created in the ADA for static functions or VCon for
externals.

---------

Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
2026-02-04 10:03:34 -05:00
Kai Nacke
8a895b3151
[GOFF] Add emission of debug sections (#178677)
This PR adds the definition of the debug sections for emission into GOFF
files. Currently, there is no debugger available which supports all the
sections. However, they all must be defined to avoid regressions in LIT
test cases.
2026-02-03 14:38:24 -05:00
Jonas Paulsson
09f9a2892a
[SystemZ] Bugfix: Add VLR16 to SystemZInstrInfo::copyPhysReg(). (#178932)
Support COPYs involving higher FP16 regs (like F24H) with a new pseudo
instruction 'VLR16'.

This is needed with -O0/regalloc=fast, and probably in more cases as
well.

Fixes #178788.
2026-01-30 14:55:07 -06:00
Anikesh Parashar
fd45140ed6
[DAG] SimplifyDemandedBits - ICMP_SLT(X,0) - only sign mask of X is required (#164946)
Resolves #164589
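
The title's claim can be checked with a small model: a signed `X < 0` comparison gives the same answer after every bit of `X` except the sign bit is discarded. This is only an illustrative sketch on 32-bit patterns; the helper names are hypothetical and are not taken from the patch.

```python
SIGN_MASK = 1 << 31  # sign bit of a 32-bit value


def as_signed_i32(bits: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & SIGN_MASK else bits


def slt_zero(bits: int) -> bool:
    """Signed `X < 0` evaluated on the full 32-bit pattern."""
    return as_signed_i32(bits) < 0


def slt_zero_sign_only(bits: int) -> bool:
    """The same comparison after masking away everything but the sign bit."""
    return as_signed_i32(bits & SIGN_MASK) < 0
```

Because both functions agree on every input, a demanded-bits analysis only needs to request the sign bit of `X` for this comparison.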
2026-01-28 17:30:23 +00:00
Dominik Steenken
355898a6ce
[SystemZ] Enable -fpatchable-function-entry=M,N (#178191)
This PR enables the option `-fpatchable-function-entry` for SystemZ. It
utilizes existing common code and just adds the emission of nops after
the function label in the backend.

SystemZ provides multiple nop options of varying length, making the
semantics of this option somewhat ambiguous. In order to align with what
`gcc` does with that same option, we're choosing `nopr` as the
canonical nop for this purpose.

For testing, this adapts an existing test file from aarch64.
2026-01-28 10:42:54 +01:00
Jonas Paulsson
c999e9a4fe
[SystemZ] Support fp16 vector ABI and basic codegen. (#171066)
- Make v8f16 a legal type so that arguments can be passed in vector
registers. Handle fp16 vectors so that they have the same ABI as other
fp vectors.

- Set the preferred vector action for fp16 vectors to "split". This will
scalarize all operations, which is not always necessary (like with
memory operations), but it avoids the superfluous operations that result
after first widening and then scalarizing a narrow vector (like v4f16).

Fixes #168992
2026-01-26 13:42:25 -06:00
Amy Kwan
41567d8ec2
[SystemZ] Implement ctor/dtor emission via @@SQINIT and .xtor sections (#171476)
This patch implements support for constructors/destructors by
introducing the `@@SQINIT` section and emitting `.xtor.<priority>`
sections within the SystemZ AsmPrinter and in the GOFF object lowering
layer.
2026-01-23 13:29:08 -05:00
Jonas Paulsson
8eccda10d2
[SystemZ] Add SP alignment to the DataLayout string. (#176041)
Add '-S64' to the SystemZ datalayout string, to avoid overalignment of
stack objects.

Fixes #173402
2026-01-20 09:54:47 -06:00
Mikhail Gudim
40a28769a4
[ReachingDefAnalysis] Print basic blocks. (#175568)
2026-01-14 06:29:31 -08:00
Kai Nacke
84bbaa097c
[SystemZ][z/OS] Handle labels for parts (#175665)
Global data is emitted into parts, which are modelled as a MCSection. A
label (symbol of type LD) is not allowed in a part, which requires
special handling. The approach is to not emit the label at all, and to
use the part symbol in relocations instead.
2026-01-13 09:15:27 -05:00
Trevor Gross
e7f23b410b
[SystemZ] Remove the softPromoteHalfType override (#175410)
`softPromoteHalfType` is being phased out because it is prone to
miscompilations (further context at [1]). SystemZ is one of the few
remaining platforms to override the default, so remove it here.

This only affects SystemZ when the `soft-float` option is used.

[1]: https://github.com/llvm/llvm-project/pull/175149
2026-01-11 16:43:40 +01:00
Amy Kwan
1671bb67e7
[SystemZ] Change default backend to ASCII (#174470)
The current (and default) backend on z/OS is EBCDIC.

This patch updates the default backend to be ASCII, which is beneficial
when porting new languages. With this change, ASCII is the default when
no special metadata nodes (such as `zos_le_char_mode`) are present.
2026-01-07 14:27:25 -05:00
Matt Arsenault
56ce7ed72b
llvm: Convert some assorted lit tests to opaque pointers (#174564)
Some of the MIR tests hit a bug that produces an error if there is a
raw global reference as the referenced value. Worked around some
of those by just keeping a no-op bitcast constant expression.
2026-01-06 11:41:27 +00:00
Kai Nacke
611a271e8d
[GOFF] Write out relocations in the GOFF writer (#167054)
Add support for writing relocations. Since the symbol numbering is only
available after the symbols are written, the relocations are collected
in a vector. At write time, the relocations are converted using the
symbols ids, compressed and written out. A relocation data record is
limited to 32K-1 bytes, which requires making sure that larger
relocation data is written into multiple records.
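
The record-size limit described above can be sketched as a simple chunking step. This is a minimal illustration, not the GOFF writer's code: the function name is hypothetical, and it splits on raw byte offsets, whereas the real writer may have to respect relocation-entry boundaries.

```python
MAX_RECORD_DATA = 32 * 1024 - 1  # 32K-1 bytes, the limit quoted in the message


def split_into_records(data: bytes, limit: int = MAX_RECORD_DATA) -> list:
    """Split a relocation data blob into chunks of at most `limit` bytes."""
    return [data[i:i + limit] for i in range(0, len(data), limit)]


blob = bytes(70000)  # 70000 bytes of (zeroed) relocation data
records = split_into_records(blob)
print([len(r) for r in records])  # [32767, 32767, 4466]
```

Concatenating the chunks in order yields the original blob, so no relocation data is lost by the split.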
2025-12-20 15:51:46 -05:00
Kai Nacke
37545b80f7
[GOFF] Emit symbols for functions. (#144437)
A function entry is mapped to a LD symbol with an offset to the beginning of
the section. HLASM syntax is emitted accordingly.
2025-12-20 13:22:04 -05:00
Anikesh Parashar
7b101d2198
[SystemZ] Update CodeGen/SystemZ/tdc-05.ll test file (#172437)
This PR updates `llvm/test/CodeGen/SystemZ/tdc-05.ll` using
`llvm/utils/update_llc_test_checks.py` to refresh the expected output.
The updated checks reflect the current output of llc and reduce noise in
future diffs.
2025-12-19 21:33:32 +01:00
KRM7
c9aea6248a
[RegisterCoalescer] Don't commute two-address instructions which only define a subregister (#169031)
Currently, the register coalescer may try to commute an instruction
like:
```
%0.sub_lo32:gpr64 = AND %0.sub_lo32:gpr64(tied-def 0), %1.sub_lo32:gpr64
USE %0:gpr64
```
resulting in:
```
%1.sub_lo32:gpr64 = AND %1.sub_lo32:gpr64(tied-def 0), %0.sub_lo32:gpr64
USE %1:gpr64
```
However, this is not correct if the instruction doesn't define the
entire register, as the value of the upper 32-bits
of the register used in `USE` will not be the same.
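
The problem can be reproduced with a small model of a 64-bit register whose low 32 bits are written by a tied-def `AND`. The register values and the helper name are made up for illustration; the code only mimics the MIR above and is not taken from the coalescer.

```python
MASK32 = 0xFFFFFFFF


def and_lo32(dst: int, src: int) -> int:
    """Model `dst.sub_lo32 = AND dst.sub_lo32, src.sub_lo32` on a 64-bit
    register: only the low 32 bits of dst change, the high 32 bits stay."""
    hi = dst & (MASK32 << 32)
    lo = (dst & src) & MASK32
    return hi | lo


r0 = 0xAAAAAAAA_0000FFFF  # hypothetical contents of %0
r1 = 0xBBBBBBBB_00FF00FF  # hypothetical contents of %1

original = and_lo32(r0, r1)  # value seen by `USE %0:gpr64`
commuted = and_lo32(r1, r0)  # value seen by `USE %1:gpr64` after commuting

print(hex(original))  # 0xaaaaaaaa000000ff
print(hex(commuted))  # 0xbbbbbbbb000000ff
```

The low halves match because `AND` is commutative, but the untouched upper halves come from different registers, so the full-register use observes a different value.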
2025-12-18 23:24:44 +01:00
Folkert de Vries
a587ccd87d
fix llvm.fma.f16 double rounding issue when there is no native support (#171904)
fixes https://github.com/llvm/llvm-project/issues/98389

As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does
not work, because there is not enough precision to handle the repeated
rounding. `f64` does have sufficient space. So this PR explicitly
promotes the 16-bit fma to a 64-bit fma.

I could not find examples of a libcall being used for fma, but that's
something that could be looked into separately to work around code size
issues.
2025-12-17 22:03:01 +01:00
Nikita Popov
5a24dfa339
[SDAG] Remove most non-canonical libcall handling (#171288)
This is a followup to https://github.com/llvm/llvm-project/pull/171114,
removing the handling for most libcalls that are already canonicalized
to intrinsics in the middle-end. The only remaining one is fabs, which
has more test coverage than the others.
2025-12-10 11:45:26 +01:00
Dominik Steenken
ca12d1d8f1
[SystemZ] Improve CCMask optimization (#171137)
This commit addresses a shortcoming in the implementation of
`combineBR_CCMASK` and `combineSELECT_CCMASK`. In cases where
`combineCCMask` was able to reduce the ccmask going into the select or
branch to either true (`ccvalid`) or false (`0`), a trivial instruction
would be emitted (i.e. either a select that would only ever select one
side, or a conditional branch with `true` or `false` as the branch
condition).
This led under certain circumstances to, e.g., `BRC` instructions being
emitted that triggered an assert in the AsmPrinter meant to exclude such
branch conditions.
For the select case, this commit introduces an early bailout that simply
returns the value that would "always" be selected. For the branch case,
the commit introduces an additional guard that prevents the DAGCombine
from taking effect, thereby preventing the illegal instruction from
being emitted.
2025-12-09 11:20:40 +01:00
Nikita Popov
c15a3dd932
[SystemZ] Generate test checks (NFC)
2025-12-09 10:49:49 +01:00
Jonas Paulsson
0b252daf64
[SystemZ] Handle IR struct arguments correctly. (#169583)
- The size of the stack slot was previously computed in LowerCall() by using
  the original type, but that didn't work for a struct. Compute the size
  by looking at the VT of each part and the number of them instead.

- All the members of a struct have the same OrigArgIndex, so it doesn't work
  to assume that following parts belong to a split argument until another
  OrigArgIndex is encountered. Use the isSplit() and isSplitEnd() flags
  instead.

- Detect any scalar integer argument >64 bits in CanLowerReturn() instead of
  just i128, in order to let all of them be passed on the stack.
  
Fixes #168460
2025-12-04 13:14:31 -06:00
Kai Nacke
66ca3f1367
[SystemZ] Serialize ada entry flags (#169395)
Adding support for serializing the ADA entry flags helps with MIR-based
test cases. Without this change, the flags are simply displayed as being
"unknown".
2025-11-27 08:14:43 -05:00
Kai Nacke
47efff777d
[SystemZ] Emit optional argument area length field (#169679)
The Language Environment (LE) reserves 128 bytes for the argument area
when the optional field is not present. If the argument area is larger,
then the field must be present to guarantee that the space is reserved
on stack extension. Creating this field when alloca() is used may reduce
the needed stack space in case alloca() causes a stack extension.
2025-11-26 16:16:13 -05:00
Matt Arsenault
dfdada1b78
CodeGen: Remove target hook for terminal rule (#165962)
Enables the terminal rule for remaining targets
2025-11-12 21:12:19 +00:00
Nicolai Hähnle
d1387ed272
CodeGen: More accurate mayAlias for instructions with multiple MMOs (#166211)
There can only be meaningful aliasing between the memory accesses of
different instructions if at least one of the accesses modifies memory.

This check is applied at the instruction-level earlier in the method.
This change merely extends the check on a per-MMO basis.

This affects a SystemZ test because PFD instructions are both mayLoad
and mayStore but may carry a load-only MMO which is now no longer
treated as aliasing loads. The PFD instructions are from llvm.prefetch
generated by loop-data-prefetch.
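
The per-MMO refinement can be sketched with a toy model. Everything here is hypothetical: the `MemOperand` class, the address-range overlap test, and the function names stand in for LLVM's real machine memory operands and alias queries, which work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOperand:
    """Toy stand-in for a machine memory operand (MMO)."""
    is_store: bool
    start: int
    size: int


def overlaps(a: MemOperand, b: MemOperand) -> bool:
    """True if the two byte ranges intersect."""
    return a.start < b.start + b.size and b.start < a.start + a.size


def may_alias(mmos_a, mmos_b) -> bool:
    """Two instructions may alias only through a pair of MMOs where at
    least one of the two accesses modifies memory."""
    for a in mmos_a:
        for b in mmos_b:
            if not (a.is_store or b.is_store):
                continue  # load vs. load: no meaningful aliasing
            if overlaps(a, b):
                return True
    return False
```

In this model, a prefetch that carries only a load MMO no longer aliases an overlapping load, while it still aliases an overlapping store.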
2025-11-06 09:19:37 -08:00
Vigneshwar Jayakumar
b5f200129a
[CodeGen] Register-coalescer remat fix subreg liveness (#165662)
This is a bugfix for rematerialization, where the liveness of the subreg
mask was incorrectly updated, causing a crash in the scheduler.
2025-11-04 22:40:40 -06:00
Craig Topper
d310693bde
[SelectionDAG] Use GetPromotedInteger when promoting integer operands of PATCHPOINT/STACKMAP. (#165926)
This is consistent with other promotion, but causes negative constants
to be sign extended instead of zero extended in some cases.

I guess getNode and type legalizer are inconsistent about what
ANY_EXTEND of a constant does.
2025-10-31 22:11:13 +00:00
anoopkg6
242c716c68
Fix Linux kernel build failure for SystemZ. (#165274)
The Linux kernel build fails for SystemZ because the output of INLINEASM
was a GR32Bit general-purpose register instead of SystemZ::CC.

---------

Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
2025-10-27 18:22:01 +01:00
paperchalice
3656f6f226
[CodeGen] Remove -enable-unsafe-fp-math option (#164559)
Hope this can unblock #105746.
2025-10-22 15:40:31 +08:00
Simon Pilgrim
1360aecb01
[SystemZ] Avoid trunc(add(X,X)) patterns (#164378)
Replace with trunc(add(X,Y)) to avoid premature folding in upcoming patch #164227
2025-10-21 09:35:16 +00:00
anoopkg6
6712e20c52
Add support for flag output operand "=@cc" for SystemZ. (#125970)
Added support for the flag output operand "=@cc" inline assembly
constraint for SystemZ.

- Clang now accepts "=@cc" assembly operands, and sets the 2-bit
  condition code for the output operand on SystemZ.

- Clang currently emits an assertion that flag output operands are
  boolean values, i.e. in the range [0, 2). Generalize this mechanism to
  allow targets to specify arbitrary range assertions for any inline
  assembly output operand. This will be used to assert that SystemZ
  two-bit condition-code values are in the range [0, 4).

- The SystemZ backend lowers "@cc" targets by using an ipm sequence to
  extract the condition code from the PSW.

  - DAGCombine tries to optimize the lowered ipm sequence by combining
    CCReg and computing the effective CCMask and CCValid in
    combineCCMask for select_ccmask and br_ccmask.

- Cost computation is done for merging conditionals for branch
  instructions in SelectionDAG, as a split may cause the evaluation of
  branch conditions to cross basic blocks, which makes them difficult
  to combine.

---------

Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
2025-10-14 11:53:42 +02:00
Luke Lau
795a115d19
[RegAlloc] Remove default restriction on non-trivial rematerialization (#159211)
In the register allocator we define non-trivial rematerialization as the
rematerialization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while,
but it has been prevented by default unless specifically overridden by
the target in `TargetTransformInfo::isReMaterializableImpl`. The
original reasoning for this given by the comment in the default
implementation is because we might increase a live range of the virtual
register, but we don't actually do this.
LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize
instructions whose virtual registers are already live at the use sites.

https://reviews.llvm.org/D106408 had originally tried to remove this
restriction but it was reverted after some performance regressions were
reported. We think it is likely that the regressions were caused by the
fact that the old isTriviallyReMaterializable API sometimes returned
true for non-trivial rematerializations.

However https://github.com/llvm/llvm-project/pull/160377 recently split
the API out into a separate non-trivial and trivial version and updated
the call-sites accordingly, and
https://github.com/llvm/llvm-project/pull/160709 and #159180 fixed
heuristics which weren't accounting for the difference between
non-trivial and trivial.

With these fixes in place, this patch proposes to again allow
non-trivial rematerialization by default which reduces a significant
amount of spills and reloads across various targets.

For llvm-test-suite built with -O3 -flto, we get the following geomean
reduction in reloads:

- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
2025-10-04 22:50:44 +00:00
Matt Arsenault
3537e8abfa
RegAllocGreedy: Check if copied lanes are live in trySplitAroundHintReg (#160424)
For subregister copies, do a subregister live check instead of checking
the main range. Doesn't do much yet, the split analysis still does not
track live ranges.
2025-10-02 12:21:02 +00:00
Mikhail Gudim
562146499c
[CodeGen][NewPM] Port ReachingDefAnalysis to new pass manager. (#159572)
In this commit:
  (1) Added new pass manager support for `ReachingDefAnalysis`.
  (2) Added printer pass.
  (3) Make old pass manager use `ReachingDefInfoWrapperPass`
2025-09-19 09:38:34 -04:00
Folkert de Vries
8a9e3333dd
s390x: optimize 128-bit fshl and fshr by high values (#154919)
Turn a funnel shift by N in the range `121..128` into a funnel shift in
the opposite direction by `128 - N`. Because there are dedicated
instructions for funnel shifts by values smaller than 8, this emits
fewer instructions.

This additional rule is useful because LLVM appears to canonicalize
`fshr` into `fshl`, meaning that the rules for `fshr` on values less
than 8 would not match on organic input.
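
The identity behind this rewrite can be checked with arbitrary-precision integers. The sketch below models the 128-bit funnel shifts with the shift amount taken modulo 128, and treats the range `121..128` as covering 121 through 127; the function names simply mirror the intrinsics and are not code from the patch.

```python
BITS = 128
MASK = (1 << BITS) - 1


def fshl(a: int, b: int, n: int) -> int:
    """Funnel shift left: shift the 256-bit value a:b left by n (mod 128)
    and return the high 128 bits."""
    n %= BITS
    concat = ((a & MASK) << BITS) | (b & MASK)
    return ((concat << n) >> BITS) & MASK


def fshr(a: int, b: int, n: int) -> int:
    """Funnel shift right: shift the 256-bit value a:b right by n (mod 128)
    and return the low 128 bits."""
    n %= BITS
    concat = ((a & MASK) << BITS) | (b & MASK)
    return (concat >> n) & MASK


a = 0x0123456789ABCDEF_FEDCBA9876543210
b = 0x0F1E2D3C4B5A6978_8796A5B4C3D2E1F0
for n in range(121, 128):
    # A large shift in one direction equals a small shift in the other.
    assert fshl(a, b, n) == fshr(a, b, BITS - n)
    assert fshr(a, b, n) == fshl(a, b, BITS - n)
```

Both forms select the same 128-bit window of the concatenated value, which is why a shift by `N` can be replaced by the opposite shift by `128 - N`, an amount smaller than 8.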
2025-08-27 09:31:49 +02:00
Folkert de Vries
558657298a
s390x: pattern match saturated truncation (#155377)
Simplify min/max instruction matching by making the related
SelectionDAG operations legal.

Add patterns to match (signed and unsigned) saturated
truncation based on open-coded min/max patterns.
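
As a rough illustration of what an open-coded saturated truncation looks like, the sketch below clamps a value with min/max before narrowing it. The function names and bit widths are chosen for the example only and do not describe the exact patterns the patch matches.

```python
def sat_trunc_signed(x: int, bits: int) -> int:
    """Signed saturating truncation: clamp x to the signed range of `bits`."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return min(max(x, lo), hi)


def sat_trunc_unsigned(x: int, bits: int) -> int:
    """Unsigned saturating truncation of a non-negative x to `bits` bits."""
    return min(x, (1 << bits) - 1)


print(sat_trunc_signed(300, 8))        # 127
print(sat_trunc_signed(-300, 8))       # -128
print(sat_trunc_unsigned(70000, 16))   # 65535
```

Values already inside the target range pass through unchanged, so the clamp followed by a plain truncation never loses information.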

Fixes https://github.com/llvm/llvm-project/issues/153655
2025-08-26 17:19:58 +02:00
Nikita Popov
63e7766047
[SystemZ] Allow forming overflow op for i128 (#153557)
Allow matching i128 overflow pattern into UADDO, which then allows use
of vaccq.
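
One common open-coded form of an unsigned overflow check is `sum < a` after a wrapping add. The sketch below models that form for 128-bit values; whether this is exactly the pattern matched by the patch is an assumption, and the function name is only borrowed from the UADDO node for readability.

```python
MASK128 = (1 << 128) - 1


def uaddo_i128(a: int, b: int):
    """Wrapping 128-bit add plus an overflow flag computed as `sum < a`."""
    total = (a + b) & MASK128
    overflow = total < a
    return total, overflow


print(uaddo_i128(1, 2))        # (3, False)
print(uaddo_i128(MASK128, 1))  # (0, True)
```

The flag agrees with the carry out of bit 127, which is the information a dedicated carry instruction can produce directly.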
2025-08-14 16:15:22 +02:00
KRM7
ee47427386
[RegisterCoalescer] Fix subrange update when rematerialization widens a def (#151974)
Currently, when an instruction rematerialized by the register coalescer
defines more subregs of the destination register
than the original COPY instruction did, we only add dead defs for the
newly defined subregs if they were not defined anywhere
else. For example, consider something like this before
rematerialization:
```
 %0:reg64 = CONSTANT 1
 %1:reg128.sub_lo64_lo32 = COPY %0.lo32
 %1:reg128.sub_lo64_hi32 = ...
 ...
```
that would look like this after rematerializing `%0`:
```
 %0:reg64 = CONSTANT 2
 %1:reg128.sub_lo64 = CONSTANT 2
 %1:reg128.sub_lo64_hi32 = ...
 ...
```
A dead def would not be added for `%1.sub_lo64_hi32` at the 2nd
instruction because its subrange wasn't empty beforehand.
2025-08-05 22:32:31 +09:00
Matt Arsenault
12568b6a4f
SystemZ: Add sincos intrinsic test (#147473)
The ZOS run line is mostly broken. update_test_checks seems
to not work on it and I have no idea what I'm looking at here.
It's not obvious to me what the calls are. I added some checks
for the references to the libcalls printed at the end of the module,
but didn't check anything in the function body. half also just
asserts somewhere.
2025-08-05 12:55:26 +09:00
sujianIBM
fc12fc635b
[SystemZ] Fix code in widening vector multiplication (#150836)
Commit cdc7864 has an error which would wrongly fold widening
multiplications into an even/odd widening operation.
This PR fixes it and adds tests to check that scenarios which should not
be folded into an even/odd widening operation are in fact not folded.
2025-07-31 13:18:23 -04:00
Simon Pilgrim
c37942df00
[DAG] visitFREEZE - limit freezing of multiple operands (#149797)
This is a partial revert of #145939 (I've kept the BUILD_VECTOR(FREEZE(UNDEF), FREEZE(UNDEF), elt2, ...) canonicalization) as we're getting reports of infinite loops (#148084).

The issue appears to be due to deep chains of nodes and how visitFREEZE replaces all instances of an operand with a common frozen version - other users of the original frozen node then get added back to the worklist but might no longer be able to confirm a node isn't poison due to recursion depth limits on isGuaranteedNotToBeUndefOrPoison.

The issue still exists with the old implementation but by only allowing a single frozen operand it helps prevent cases of interdependent frozen nodes.

I'm still working on supporting multiple operands as it's critical for topological DAG handling but need to get a fix in for trunk and 21.x.

Fixes #148084
2025-07-22 15:40:55 +01:00
Trevor Gross
0db197adef
[Test] Mark a number of libcall tests nounwind (#148329)
Many tests for floating point libcalls include CFI directives, which
aren't needed for the purpose of these tests. Mark some of the relevant
test functions `nounwind` in order to remove this noise.
2025-07-12 11:57:28 +02:00
Vikram Hegde
fcd4a2fe7a
[CodeGen][NewPM] Port "PostRAMachineSink" pass to NPM (#129690)
2025-07-10 13:10:46 +05:30
Fangrui Song
68494ae072
[XRay] xray_fn_idx: fix alignment directive
Use `emitValueToAlignment` as the section does not contain code.
`emitCodeAlignment` would lead to ALIGN relocations on RISC-V and
LoongArch with linker relaxation.

In addition, change the alignment to wordsize, sufficient for the
runtime requirement (`XRayFunctionSledIndex`).

Related to #147322
2025-07-08 21:52:53 -07:00
Matt Arsenault
026307958b
SystemZ: Remove unnecessary requires asserts from test (#147477)
2025-07-09 09:28:57 +09:00