- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.
- AtomicExpand pass no longer casts float/double atomic load/stores to integer
(FP128 is still casted).
The usage of FP Load and Test instructions as a comparison against zero
with the assumption that the dest reg will always reflect the source reg is
actually incorrect: Unfortunately, a SNaN will be converted to a QNaN, so the
instruction may actually change the value as opposed to being a pure register
move with a test.
This patch
- changes instruction selection to always emit FP LT with a scratch def
reg, which will typically be allocated to the same reg if dead.
- Removes the conversions into FP LT in SystemZElimcompare.
Let the AtomicExpand pass do more of the job of expanding
AtomicRMWInst:s in order to simplify the handling in the backend.
The only cases that the backend needs to handle itself are those of
subword size (8/16 bits) and those directly corresponding to a target
instruction.
Prepare for removing the MemOpsEmitted workaround for symbolic
displacements by letting TableGen know about the offsets of the
displacement sub-operands within the instruction.
There are alternative ways to do this that were tried and rejected:
- Creating encoders and decoders for each possible displacement offset.
This is too repetitive.
- Use VarLenCodeEmitter [1]. The resulting diff is quite large.
Instead, use the named sub-operand support introduced by commit
a538d1f13a13 ("[TableGen][CodeEmitterGen] Allow local names for
sub-operands in a operand list.").
Describe instruction encodings in terms of sub-operands instead of
operands (e.g. B, D, L vs BDL) - this also better matches the pictures
from the Principles of Operation. Decompose operands into sub-operands
using the new (bdaddr12only $B1, $D1):$BD1 syntax. Replace the
encoders and the decoders of the operands with these of the
sub-operands.
Since DecodeADDR64BitRegisterClass() is now used for bases and indices,
change it to return NoRegister when decoding 0. This also changes the
disassembly of some instructions, e.g., br %r0 becomes br 0. Since this
better captures the instruction semantics, namely, that the value of
%r0 is not used, keep this change and update the tests.
[1] https://m680x0.github.io/blog/2022/02/varlen-encoder.html
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D155194
For all RotateSelect* instructions, PoP says:
Bits 0-1 of the I5 field (bits 32-33 of the instruction) are
ignored.
LLVM, however, completely prohibits using them, e.g.:
error: invalid operand for instruction
asm("rxsbg %[r1],%[r2],177,43,228\n"
Lift this unnecessary restriction.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D146185
Memset with a constant length was implemented with a single store followed by
a series of MVC:s. This patch changes this so that one store of the byte is
emitted for each MVC, which avoids data dependencies between the MVCs. An
MVI/STC + MVC(len-1) is done for each block.
In addition, memset with a variable length is now also handled without a
libcall. Since the byte is first stored and then MVC is used from that
address, a length of two must now be subtracted instead of one for the loop
and EXRL. This requires an extra check for the one-byte case, which is
handled in a special block with just a single MVI/STC (like GCC).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D112004
This patch fixes the bug that consisted of treating variable / immediate
length mem operations (such as memcpy, memset, ...) differently. The variable
length case needs to have the length minus 1 passed due to the use of EXRL
target instructions. However, the DAGCombiner can convert a register length
argument into a constant one, and whenever that happened one byte too little
would end up being performed.
This is also a refactorization by reducing the number of opcodes and variants
involved. For any opcode (variable or constant length), only the length minus
one is passed on to the ISD node. The rest of the logic is now instead
handled during isel pseudo expansion.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D111729
Seem to cause test failures in compiler-rt.
Revert "[SystemZ] Implement memcmp of variable length with CLC."
This reverts commit 7a4e9a0c73667cb80e4572d41535a9e48f1ed9ef.
Revert "[SystemZ] Implement memcpy of variable length with MVC."
This reverts commit c6c13c58eebda605a9a05f1f13cac1e46407afc7.
Following the same pattern of memset/memcpy, this patch implements a variable
length memcmp with a CLC loop followed by an EXRL instruction.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D107380
This patch adds support for the next-generation arch14
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Detection of arch14 as host processor.
- Assembler/disassembler support for new instructions.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10304.
Note: No currently available Z system supports the arch14
architecture. Once new systems become available, the
official system name will be added as supported -march name.
Benchmarking has shown that it is worthwhile to implement a variable length
memset of 0 with XC (exclusive or) like gcc does, instead of using a libcall.
This requires the use of the EXecute Relative Long (EXRL) instruction which
can now be done in a framework that can also be used with other target
instructions (not just XC).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D103865
- This patch adds in the distinction between jg[*] and jl[*] pc-relative
mnemonics based on the variant/dialect.
- Under the hlasm variant, we use the jl[*] family of mnemonics and under
the att (GNU as) variant, we use the jg[*] family of mnemonics.
- jgnop which was added in https://reviews.llvm.org/D92185, is now restricted
to att variant. jlnop is introduced and restricted to hlasm variant.
- The br[*]l additional mnemonics are mapped to either jl[*]/jg[*] based on
the variant.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D97581
- This patch introduces a different assembler dialect ("hlasm") for z/OS.
The default dialect has now been given the "att" dialect name. For this
appropriate changes have been added to SystemZ.td.
- This patch also makes a few changes to SystemZInstrFormats.td which
restrict a few condition code mnemonics to just the "att" dialect
variant (he, le, lh, nhe, nle, nlh). These extended condition code
mnemonics are not available in HLASM.
- A new private function has been introduced in SystemZAsmParser.cpp to
return the assembler dialect set in SystemZMCAsmInfo.cpp. The reason we
couldn't/haven't explicitly queried the overriden getAssemblerDialect
function from AsmParser is outlined in this thread here. This returned
dialect is directly passed onto the relevant matcher functions which taken
in a variantID, so that the matcher functions can appropriately choose an
instruction based on the variant.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D94250
This patch consists of the addition of some common additional
extended mnemonics to the SystemZ target.
- These are jnop, jct, jctg, jas, jasl, jxh, jxhg, jxle,
jxleg, bru, brul, br*, br*l.
- These mnemonics and the instructions they map to are
defined here, Chapter 4 - Branching with extended
mnemonic codes.
- Except for jnop (which is a variant of brc 0, label), every
other mnemonic is marked as a MnemonicAlias since there is
already a "defined" instruction with the same encoding
and/or condition mask values.
- brc 0, label doesn't have a defined extended mnemonic, thus
jnop is defined using as an InstAlias. Furthermore, the
applyMnemonicAliases function is called in the overridden
parseInstruction function in SystemZAsmParser.cpp to ensure
any mnemonic aliases are applied before any further
processing on the instruction is done.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D92185
Support VRI, VRR, VRS, VRV, VRX, VSI instruction formats with the .insn
directive.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D88357
Use FP-mem instructions when folding reloads into single lane (W..) vector
instructions.
Only do this when all other operands of the instruction have already been
allocated to an FP (F0-F15) register.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D76705
Swap the compare operands if LHS is spilled while updating the CCMask:s of
the CC users. This is relatively straight forward since the live-in lists for
the CC register can be assumed to be correct during register allocation
(thanks to 659efa2).
Also fold a spilled operand of an LOCR/SELR into an LOC(G).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D67437
The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux)
instructions whenever one of the sources are the same reg as dest. By adding
this mapping in getTwoOperandOpcode(), we get:
- Two-address hints in getRegAllocationHints() for select register
instructions.
- No need anymore for special handling in SystemZShortenInst.cpp -
shortenSelect() removed.
The two-address hints are now added before the GRX32 hints, which should be
preferred.
Review: Ulrich Weigand
https://reviews.llvm.org/D68870
It was recently discovered that the handling of CC values was actually broken
since overflow was not properly handled ('nsw' flag not checked for).
Add and sub instructions now have a new target specific instruction flag
named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used
instead of a compare with 0, but only if the instruction has the 'nsw' flag
set.
This patch also adds the improvements of conversion to logical instructions
and the analyzing of add with immediates, to be able to eliminate more
compares.
Review: Ulrich Weigand
https://reviews.llvm.org/D66868
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Assembler/disassembler support for new instructions.
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of arch13 as host processor.
Note: No currently available Z system supports the arch13
architecture. Once new systems become available, the
official system name will be added as supported -march name.
llvm-svn: 365932
Vector load/store instructions support an optional alignment field
that the compiler can use to provide known alignment info to the
hardware. If the field is used (and the information is correct),
the hardware may be able (on some models) to perform faster memory
accesses than otherwise.
This patch adds support for alignment hints in the assembler and
disassembler, and fills in known alignment during codegen.
llvm-svn: 363806
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
There are several vector instructions which may or may not set the
condition code register, depending on the value of an argument.
For codegen, we use two versions of the instruction, one that sets
CC and one that doesn't, which hard-code appropriate values of that
argument. But we also have a "generic" version of the instruction
that is used for the assembler/disassembler. These generic versions
should always be considered to clobber CC just to be safe.
llvm-svn: 349761
Currently, an instruction setting the condition code is linked to
the instruction using the condition code via a "glue" link in the
SelectionDAG. This has a number of drawbacks; in particular, it
means the same CC cannot be used by multiple users. It also makes
it more difficult to efficiently implement SADDO et. al.
This patch changes the back-end to represent CC dependencies as
normal values during SelectionDAG matching, along the lines of
how this is handled in the X86 back-end already.
In addition to the core mechanics of updating all relevant patterns,
this requires a number of additional changes:
- We now need to be able to spill/restore a CC value into a GPR
if necessary. This means providing a copyPhysReg implementation
for moves involving CC, and defining getCrossCopyRegClass.
- Since we still prefer to avoid such spills, we provide an override
for IsProfitableToFold to avoid creating a merged LOAD / ICMP if
this would result in multiple users of the CC.
- combineCCMask no longer requires a single CC user, and no longer
need to be careful about preventing invalid glue/chain cycles.
- emitSelect needs to be more careful in marking CC live-in to
the basic block it generates. Also, we can now optimize the
case of multiple subsequent selects with the same condition
just like X86 does.
llvm-svn: 331202
In patterns where we need to specify a result VT, prefer
[(set (tr.vt tr.op:$V1), (operator ...))]
over
[(set tr.op:$V1, (tr.vt (operator ...)))]
This is NFC now, but simplifies some future changes.
llvm-svn: 331192
If we have LOCR instructions, select them directly from SelectionDAG
instead of first going through a pseudo instruction and then using
the custom inserter to emit the LOCR.
Provide Select pseudo-instructions for VR32/VR64 if we have vector
instructions, to avoid having to go through the first 16 FPRs
unnecessarily.
If we do not have LOCFHR, prefer using LOCR followed by a move
over a conditional branch.
llvm-svn: 331191
If the MachineInstr uses a custom inserter and is then erased after
instruction selection, there is no use for mapping it to a sched class.
Review: Ulrich Weigand
llvm-svn: 331040
This has proven a healthy exercise, as many cases of incorrect instruction
flags were corrected in the process. As part of this, IntrWriteMem was added
to several SystemZ instrinsics.
Furthermore, a bug was exposed in TwoAddress with this change (as incorrect
hasSideEffects flags were removed and instructions could now be sunk), and
the test case for that bugfix (r319646) is included here as
test/CodeGen/SystemZ/twoaddr-sink.ll.
One temporary test regression (one extra copy) which will hopefully go away
in upcoming patches for similar cases:
test/CodeGen/SystemZ/vec-trunc-to-i1.ll
Review: Ulrich Weigand.
https://reviews.llvm.org/D40437
llvm-svn: 319756
This adds support for the new 128-bit vector float instructions of z14.
Note that these instructions actually only operate on the f128 type,
since only each 128-bit vector register can hold only one 128-bit
float value. However, this is still preferable to the legacy 128-bit
float instructions, since those operate on pairs of floating-point
registers (so we can hold at most 8 values in registers), while the
new instructions use single vector registers (so we hold up to 32
value in registers).
Adding support includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions. This includes allocating the f128
type now to the VR128BitRegClass instead of FP128BitRegClass.
- Scheduler description support for the instructions.
Note that for a small number of operations, we have no new vector
instructions (like integer <-> 128-bit float conversions), and so
we use the legacy instruction and then reformat the operand
(i.e. copy between a pair of floating-point registers and a
vector register).
llvm-svn: 308196
This patch series adds support for the IBM z14 processor. This part includes:
- Basic support for the new processor and its features.
- Support for new instructions (except vector 32-bit float and 128-bit float).
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of z14 as host processor.
Support for the new 32-bit vector float and 128-bit vector float
instructions is provided by separate patches.
llvm-svn: 308194
This adds all remaining instructions that were still missing, mostly
privileged and semi-privileged system-level instructions. These are
provided for use with the assembler and disassembler only.
This brings the LLVM assembler / disassembler to parity with the
GNU binutils tools.
llvm-svn: 306876
This adds assembler / disassembler support for the decimal
floating-point instructions. Since LLVM does not yet have
support for decimal float types, these cannot be used for
codegen at this point.
llvm-svn: 304203
This adds assembler / disassembler support for the hexadecimal
floating-point instructions. Since the Linux ABI does not use
any hex float data types, these are not useful for codegen.
llvm-svn: 304202
This adds a few missing instructions for the assembler and
disassembler. Those should be the last missing general-
purpose (Chapter 7) instructions for the z10 ISA.
llvm-svn: 302667
This adds the remaining general arithmetic instructions
for assembler / disassembler use. Most of these are not
useful for codegen; a few might be, and those are listed
in the README.txt for future improvements.
llvm-svn: 302665
Add assembler support for all atomic instructions that weren't already
supported. Some of those could be used to implement codegen for 128-bit
atomic operations, but this isn't done here yet.
llvm-svn: 288526
Add assembler support for instructions manipulating the FPC.
Also add codegen support via the GCC compatibility builtins:
__builtin_s390_sfpc
__builtin_s390_efpc
llvm-svn: 288525