HLASM has a requirement where aliasing labels need to be emitted at the
same time as the aliasee label, similar to AIX. I used their
implementation for reference with some modifications as we can only
alias functions and we must emit all symbol attributes before the label
is emitted to ensure the XATTR instruction contains the correct
attributes.
---------
Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
Implement lowerConstants for SystemZ and handle special cases where
entries need to be created in the ADA for static functions or VCon for
externals.
---------
Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
This PR adds the definition of the debug sections for emission into GOFF
files. Currently, there is no debugger available which supports all the
sections. However, they must all be defined to avoid regressions in LIT
test cases.
Support COPYs involving higher FP16 regs (like F24H) with a new pseudo
instruction 'VLR16'.
This is needed with -O0/regalloc=fast, and probably in more cases as
well.
Fixes #178788.
This PR enables the option `-fpatchable-function-entry` for SystemZ. It
utilizes existing common code and just adds the emission of nops after
the function label in the backend.
SystemZ provides multiple nop options of varying length, making the
semantics of this option somewhat ambiguous. In order to align with what
`gcc` does with that same option, we're choosing `nopr` as the
canonical nop for this purpose.
For testing, this adapts an existing test file from aarch64.
- Make v8f16 a legal type so that arguments can be passed in vector
registers. Handle fp16 vectors so that they have the same ABI as other
fp vectors.
- Set the preferred vector action for fp16 vectors to "split". This will
scalarize all operations, which is not always necessary (like with
memory operations), but it avoids the superfluous operations that result
after first widening and then scalarizing a narrow vector (like v4f16).
Fixes #168992
This patch implements support for constructors/destructors by
introducing the `@@SQINIT` section and emitting `.xtor.<priority>`
sections within the SystemZ AsmPrinter and in the GOFF object lowering
layer.
Global data is emitted into parts, which are modelled as a MCSection. A
label (symbol of type LD) is not allowed in a part, which requires
special handling. The approach is to not emit the label at all, and to
use the part symbol in relocations instead.
`softPromoteHalfType` is being phased out because it is prone to
miscompilations (further context at [1]). SystemZ is one of the few
remaining platforms to override the default, so remove it here.
This only affects SystemZ when the `soft-float` option is used.
[1]: https://github.com/llvm/llvm-project/pull/175149
The current (and default) backend on z/OS is EBCDIC.
This patch updates the default backend to be ASCII, which is beneficial
when porting new languages. With this change, ASCII is the default when
no special metadata nodes (such as `zos_le_char_mode`) are present.
Some of the MIR tests hit a bug that raises an error when a raw global
reference is the referenced value. Worked around some of those by
keeping a no-op bitcast constant expression.
Add support for writing relocations. Since the symbol numbering is only
available after the symbols are written, the relocations are collected
in a vector. At write time, the relocations are converted using the
symbol IDs, compressed, and written out. A relocation data record is
limited to 32K-1 bytes, which requires making sure that larger
relocation data is written into multiple records.
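As a rough illustration of the record-size constraint only, the splitting step can be sketched as plain chunking. The function name and the flat `bytes` payload are hypothetical; the real GOFF writer has its own record headers and continuation handling, which this sketch does not model.

```python
MAX_RECORD_PAYLOAD = 32 * 1024 - 1  # 32K-1 bytes, the limit quoted above


def split_into_records(data: bytes, limit: int = MAX_RECORD_PAYLOAD) -> list:
    """Split serialized relocation data into chunks of at most `limit` bytes.

    Hypothetical helper: it only shows the size constraint, not the actual
    GOFF record layout.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [data[i:i + limit] for i in range(0, len(data), limit)]
```

For a 70,000-byte payload this yields two full chunks and one shorter trailing chunk; an empty payload yields no chunks at all.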
This PR updates `llvm/test/CodeGen/SystemZ/tdc-05.ll` using
`llvm/utils/update_llc_test_checks.py` to refresh the expected output.
The updated checks reflect the current output of llc and reduce noise in
future diffs.
Currently, the register coalescer may try to commute an instruction
like:
```
%0.sub_lo32:gpr64 = AND %0.sub_lo32:gpr64(tied-def 0), %1.sub_lo32:gpr64
USE %0:gpr64
```
resulting in:
```
%1.sub_lo32:gpr64 = AND %1.sub_lo32:gpr64(tied-def 0), %0.sub_lo32:gpr64
USE %1:gpr64
```
However, this is not correct if the instruction doesn't define the
entire register, as the value of the upper 32 bits of the register used
in `USE` will not be the same.
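The effect can be modelled outside of MIR with a small sketch that treats each virtual register as a 64-bit integer and the `sub_lo32` def as a write to the low half only. The helper names are made up for illustration; this is not how the coalescer represents registers.

```python
MASK32 = 0xFFFF_FFFF


def write_lo32(reg: int, value: int) -> int:
    """Return 64-bit `reg` with only its low 32 bits replaced by `value`."""
    return ((reg >> 32) << 32) | (value & MASK32)


def original(r0: int, r1: int) -> int:
    # %0.sub_lo32 = AND %0.sub_lo32, %1.sub_lo32 ; USE %0
    return write_lo32(r0, r0 & r1)


def commuted(r0: int, r1: int) -> int:
    # %1.sub_lo32 = AND %1.sub_lo32, %0.sub_lo32 ; USE %1
    return write_lo32(r1, r1 & r0)
```

The low halves of both results agree, but the value seen by `USE` keeps the upper half of whichever register was written, so the two forms differ whenever the upper halves of `%0` and `%1` differ.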
fixes https://github.com/llvm/llvm-project/issues/98389
As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does
not work, because there is not enough precision to handle the repeated
rounding. `f64` does have sufficient space. So this PR explicitly
promotes the 16-bit fma to a 64-bit fma.
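The double-rounding problem can be reproduced with a hand-picked input. The sketch below uses Python's `struct` formats `e` and `f` as stand-ins for rounding to half and single precision; the operands are chosen for illustration and are not taken from the issue. For these operands the product and sum are exact in double precision, so `exact` is the true fused result.

```python
import struct


def round_to_f16(x: float) -> float:
    """Round a double to IEEE half precision (round-to-nearest-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_f32(x: float) -> float:
    """Round a double to IEEE single precision (round-to-nearest-even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# All three operands are exactly representable as f16.
a = 1417 / 2**10
b = 1480 / 2**22
c = 1.0
assert round_to_f16(a) == a and round_to_f16(b) == b

# a*b + c == 1 + 2**-11 + 2**-29, which a double holds exactly.
exact = a * b + c

via_f32 = round_to_f16(round_to_f32(exact))  # promote to f32, then truncate
via_f64 = round_to_f16(exact)                # promote to f64, then truncate
```

Here the f32 intermediate drops the `2**-29` term and lands exactly on the midpoint between two f16 values, so the second rounding ties to even and returns `1.0`, while the f64 path returns the correctly rounded `1.0009765625`.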
I could not find examples of a libcall being used for fma, but that's
something that could be looked into separately to work around code size
issues.
This is a followup to https://github.com/llvm/llvm-project/pull/171114,
removing the handling for most libcalls that are already canonicalized
to intrinsics in the middle-end. The only remaining one is fabs, which
has more test coverage than the others.
This commit addresses a shortcoming in the implementation of
`combineBR_CCMASK` and `combineSELECT_CCMASK`. In cases where
`combineCCMask` was able to reduce the ccmask going into the select or
branch to either true (`ccvalid`) or false (`0`), a trivial instruction
would be emitted (i.e. either a select that would only ever select one
side, or a conditional branch with `true` or `false` as the branch
condition).
This led under certain circumstances to, e.g., `BRC` instructions being
emitted that triggered an assert in the AsmPrinter meant to exclude such
branch conditions.
For the select case, this commit introduces an early bailout that simply
returns the value that would "always" be selected. For the branch case,
the commit introduces an additional guard that prevents the DAGCombine
from taking effect, thereby preventing the illegal instruction from
being emitted.
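A minimal sketch of the two guards, assuming the mask convention described above (a mask equal to `ccvalid` means "always true", `0` means "always false"). The function names are hypothetical and the real combines operate on SelectionDAG nodes rather than plain integers.

```python
def fold_select_ccmask(true_val, false_val, cc_valid: int, cc_mask: int):
    """Return the operand that is always selected, or None to keep the select.

    Hypothetical model of the early bailout for the select case.
    """
    cc_mask &= cc_valid
    if cc_mask == cc_valid:
        return true_val   # every valid CC value selects the true operand
    if cc_mask == 0:
        return false_val  # no valid CC value selects the true operand
    return None


def may_combine_br_ccmask(cc_valid: int, cc_mask: int) -> bool:
    """Guard for the branch case: refuse masks that degenerate to a constant."""
    cc_mask &= cc_valid
    return cc_mask != 0 and cc_mask != cc_valid
```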
- The size of the stack slot was previously computed in LowerCall() by using
the original type, but that didn't work for a struct. Compute the size
by looking at the VT of each part and the number of them instead.
- All the members of a struct have the same OrigArgIndex, so it doesn't work
to assume that following parts belong to a split argument until another
OrigArgIndex is encountered. Use the isSplit() and isSplitEnd() flags
instead.
- Detect any scalar integer argument >64 bits in CanLowerReturn() instead of
just i128, in order to let all of them be passed on stack.
Fixes #168460
Adding support for serializing the ADA entry flags helps with MIR-based
test cases. Without this change, the flags are simply displayed as being
"unknown".
The Language Environment (LE) reserves 128 bytes for the argument area
when the optional field is not present. If the argument area is larger,
then the field must be present to guarantee that the space is reserved
on stack extension. Creating this field when alloca() is used may reduce
the needed stack space in case alloca() causes a stack extension.
There can only be meaningful aliasing between the memory accesses of
different instructions if at least one of the accesses modifies memory.
This check is applied at the instruction-level earlier in the method.
This change merely extends the check on a per-MMO basis.
This affects a SystemZ test because PFD instructions are both mayLoad
and mayStore but may carry a load-only MMO which is now no longer
treated as aliasing loads. The PFD instructions are from llvm.prefetch
generated by loop-data-prefetch.
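The per-MMO check can be sketched as follows. `MemOperand` and the `overlaps` callback are hypothetical simplifications; the actual code works on `MachineMemOperand` and queries alias analysis.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class MemOperand:
    """Hypothetical stand-in for a memory operand of one instruction."""
    name: str
    is_store: bool


def may_alias(
    mmos_a: Sequence[MemOperand],
    mmos_b: Sequence[MemOperand],
    overlaps: Callable[[MemOperand, MemOperand], bool],
) -> bool:
    """Report aliasing only for operand pairs where at least one side stores."""
    for a in mmos_a:
        for b in mmos_b:
            if not a.is_store and not b.is_store:
                continue  # two loads never conflict, whatever their addresses
            if overlaps(a, b):
                return True
    return False
```

With this rule, a prefetch carrying a load-only operand is no longer reported as aliasing an ordinary load, even if the two address ranges overlap.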
This is consistent with other promotion, but causes negative constants
to be sign extended instead of zero extended in some cases.
I guess getNode and type legalizer are inconsistent about what
ANY_EXTEND of a constant does.
Linux kernel build fails for SystemZ as output of INLINEASM was GR32Bit
general-purpose register instead of SystemZ::CC.
---------
Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Added support for the flag output operand "=@cc" inline assembly
constraint for SystemZ.
- Clang now accepts "=@cc" assembly operands and sets a 2-bit condition
code for the output operand on SystemZ.
- Clang currently emits an assertion that flag output operands are
boolean values, i.e. in the range [0, 2). Generalize this mechanism to
allow targets to specify arbitrary range assertions for any inline
assembly output operand. This will be used to assert that SystemZ
two-bit condition-code values are in the range [0, 4).
- The SystemZ backend lowers "@cc" targets by using an ipm sequence to
extract the condition code from the PSW.
- DAGCombine tries to optimize the lowered ipm sequence by combining
CCReg and computing the effective CCMask and CCValid in combineCCMask
for select_ccmask and br_ccmask.
- Cost computation is done for merging conditionals for branch
instructions in SelectionDAG, as splitting may cause the evaluation of
branch conditions to cross basic blocks, making them difficult to
combine.
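For orientation, the extraction step and the [0, 4) range can be sketched as below. The shift amount of 28 is an assumption based on the architectural description of IPM (condition code in the two bits just below the top two bits of the 32-bit result, program mask in the four bits below that); the function name is made up.

```python
IPM_CC_SHIFT = 28  # assumed position of the condition code in the IPM result


def cc_from_ipm(ipm_word: int) -> int:
    """Extract the two-bit condition code from a 32-bit IPM result."""
    cc = (ipm_word >> IPM_CC_SHIFT) & 0x3
    assert 0 <= cc < 4  # the range assertion attached to the "=@cc" output
    return cc
```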
---------
Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
In the register allocator we define non-trivial rematerialization as the
rematerialization of an instruction with virtual register uses.
We have been able to perform non-trivial rematerialization for a while,
but it has been prevented by default unless specifically overridden by
the target in `TargetInstrInfo::isReMaterializableImpl`. The
original reasoning for this, given by the comment in the default
implementation, is that we might increase a live range of the virtual
register, but we don't actually do this.
LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize
instructions whose virtual registers are already live at the use sites.
https://reviews.llvm.org/D106408 had originally tried to remove this
restriction but it was reverted after some performance regressions were
reported. We think it is likely that the regressions were caused by the
fact that the old isTriviallyReMaterializable API sometimes returned
true for non-trivial rematerializations.
However https://github.com/llvm/llvm-project/pull/160377 recently split
the API out into a separate non-trivial and trivial version and updated
the call-sites accordingly, and
https://github.com/llvm/llvm-project/pull/160709 and #159180 fixed
heuristics which weren't accounting for the difference between
non-trivial and trivial.
With these fixes in place, this patch proposes to again allow
non-trivial rematerialization by default which reduces a significant
amount of spills and reloads across various targets.
For llvm-test-suite built with -O3 -flto, we get the following geomean
reduction in reloads:
- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
For subregister copies, do a subregister live check instead of checking
the main range. This doesn't do much yet, because the split analysis
still does not track live ranges.
In this commit:
(1) Added new pass manager support for `ReachingDefAnalysis`.
(2) Added a printer pass.
(3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
Turn a funnel shift by N in the range `121..128` into a funnel shift in
the opposite direction by `128 - N`. Because there are dedicated
instructions for funnel shifts by values smaller than 8, this emits
fewer instructions.
This additional rule is useful because LLVM appears to canonicalize
`fshr` into `fshl`, meaning that the rules for `fshr` on values less
than 8 would not match on organic input.
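The identity behind this rule can be checked with arbitrary-precision integers. The sketch follows the usual funnel-shift semantics (concatenate the two operands, shift, take one half); the helper names mirror the intrinsics but are otherwise just illustrative.

```python
WIDTH = 128
MASK = (1 << WIDTH) - 1


def fshl(hi: int, lo: int, n: int) -> int:
    """Funnel shift left: upper WIDTH bits of (hi:lo) << n."""
    n %= WIDTH
    concat = (hi << WIDTH) | lo
    return (concat >> (WIDTH - n)) & MASK


def fshr(hi: int, lo: int, n: int) -> int:
    """Funnel shift right: lower WIDTH bits of (hi:lo) >> n."""
    n %= WIDTH
    concat = (hi << WIDTH) | lo
    return (concat >> n) & MASK


def fshl_via_fshr(hi: int, lo: int, n: int) -> int:
    """Rewrite a left funnel shift by n (0 < n < WIDTH) as a right shift."""
    return fshr(hi, lo, WIDTH - n)
```

For shift amounts 121 through 127, the rewritten form uses amounts 7 down to 1, which is the range covered by the dedicated instructions mentioned above.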
Simplify min/max instruction matching by making the related
SelectionDAG operations legal.
Add patterns to match (signed and unsigned) saturated
truncation based on open-coded min/max patterns.
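The open-coded shape being matched is a clamp followed by a truncate. A minimal sketch on plain integers, with hypothetical helper names and without reference to the specific instructions selected:

```python
def sat_trunc_signed(x: int, bits: int) -> int:
    """Signed saturating truncation: clamp with max/min, then narrow."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return min(max(x, lo), hi)


def sat_trunc_unsigned(x: int, bits: int) -> int:
    """Unsigned saturating truncation of a non-negative value: a single umin."""
    if x < 0:
        raise ValueError("x is interpreted as unsigned and must be >= 0")
    return min(x, (1 << bits) - 1)
```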
Fixes https://github.com/llvm/llvm-project/issues/153655
Currently, when an instruction rematerialized by the register coalescer
defines more subregs of the destination register
than the original COPY instruction did, we only add dead defs for the
newly defined subregs if they were not defined anywhere
else. For example, consider something like this before
rematerialization:
```
%0:reg64 = CONSTANT 1
%1:reg128.sub_lo64_lo32 = COPY %0.lo32
%1:reg128.sub_lo64_hi32 = ...
...
```
that would look like this after rematerializing `%0`:
```
%0:reg64 = CONSTANT 1
%1:reg128.sub_lo64 = CONSTANT 1
%1:reg128.sub_lo64_hi32 = ...
...
```
A dead def would not be added for `%1.sub_lo64_hi32` at the 2nd
instruction because its subrange wasn't empty beforehand.
The ZOS run line is mostly broken. update_test_checks seems
to not work on it and I have no idea what I'm looking at here.
It's not obvious to me what the calls are. I added some checks
for the references to the libcalls printed at the end of the module,
but didn't check anything in the function body. half also just
asserts somewhere.
Commit cdc7864 has an error which would wrongly fold widening
multiplications into an even/odd widening operation.
This PR fixes it and adds tests to check that scenarios which should not
be folded into an even/odd widening operation are in fact not folded.
This is a partial revert of #145939 (I've kept the BUILD_VECTOR(FREEZE(UNDEF), FREEZE(UNDEF), elt2, ...) canonicalization) as we're getting reports of infinite loops (#148084).
The issue appears to be due to deep chains of nodes and how visitFREEZE replaces all instances of an operand with a common frozen version - other users of the original frozen node then get added back to the worklist but might no longer be able to confirm a node isn't poison due to recursion depth limits on isGuaranteedNotToBeUndefOrPoison.
The issue still exists with the old implementation but by only allowing a single frozen operand it helps prevent cases of interdependent frozen nodes.
I'm still working on supporting multiple operands as it's critical for topological DAG handling but need to get a fix in for trunk and 21.x.
Fixes #148084
Many tests for floating point libcalls include CFI directives, which
aren't needed for the purpose of these tests. Mark some of the relevant
test functions `nounwind` in order to remove this noise.
Use `emitValueToAlignment` as the section does not contain code.
`emitCodeAlignment` would lead to ALIGN relocations on RISC-V and
LoongArch with linker relaxation.
In addition, change the alignment to wordsize, sufficient for the
runtime requirement (`XRayFunctionSledIndex`).
Related to #147322