1951 Commits

Author SHA1 Message Date
David Green
4ac2721e51
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added, LD2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
Kazu Hirata
17c3f102be [SystemZ] Fix an unused variable warning
This patch fixes:

  llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:8181:9: error:
  unused variable 'TFL' [-Werror,-Wunused-variable]
2024-03-28 14:19:39 -07:00
Jonas Paulsson
16b7cc69ef
[SystemZ] Eliminate call sequence instructions early. (#77812)
On SystemZ, the outgoing argument area which is big enough for all calls
in the function is created once during the prolog, as opposed to
adjusting the stack around each call. The call-sequence instructions are
therefore not really useful any more than to compute the maximum call
frame size, which has so far been done by PEI, but can just as well be
done at an earlier point.

This patch removes the mapping of the CallFrameSetupOpcode and
CallFrameDestroyOpcode and instead computes the MaxCallFrameSize
directly after instruction selection and then removes the ADJCALLSTACK
pseudos. This removes the confusing pseudos and also avoids the problem
of having to keep the call frame size accurate when creating new MBBs.

This fixes #76618 which exposed the need to maintain the call frame size
when splitting blocks (which was not done).
2024-03-28 18:26:38 +01:00
Jonas Paulsson
94b5c118b3
[ISel] Move handling of atomic loads from SystemZ to DAGCombiner (NFC). (#86484)
The folding of sign/zero extensions into an atomic load by specifying an
extension type is not target specific, and therefore belongs in the
DAGCombiner rather than in the SystemZ backend.

- Handle atomic loads similarly to regular loads by adding
AtomicLoadExtActions with set/get methods.
- Move SystemZ extendAtomicLoad() to DagCombiner.cpp.
2024-03-28 16:14:35 +01:00
Il-Capitano
308ed0233a
[Intrinsics] Make patchpoint.i64 generic on its return type (#85911)
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
2024-03-26 19:08:52 +05:30
Sergei Barannikov
5e5b656102
[MC] Make MCParsedAsmOperand::getReg() return MCRegister (#86444) 2024-03-25 05:13:48 +03:00
Ulrich Weigand
4b907414d2 [SystemZ] Add support for llvm.readcyclecounter
The llvm.readcyclecounter intrinsic can be implemented via the
STORE CLOCK FAST (STCKF) instruction.
2024-03-22 20:01:02 +01:00
Ulrich Weigand
335f365982 Reapply: [SystemZ] Fix overflow flag for i128 USUBO
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type.  However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicated borrow).  This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.

Fix this by generating code to explicitly invert the flag.  These
cancel out of the result of USUBO feeds into an USUBO_CARRY.

To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.

Fixes: https://github.com/llvm/llvm-project/issues/83268
2024-03-19 14:07:08 +01:00
Ulrich Weigand
d1c3795968 Revert "Fix overflow flag for i128 USUBO"
This reverts commit d9c31ee9568277e4303715736b40925e41503596.
2024-03-19 11:43:05 +01:00
Ulrich Weigand
d9c31ee956 Fix overflow flag for i128 USUBO
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type.  However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicated borrow).  This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.

Fix this by generating code to explicitly invert the flag.  These
cancel out of the result of USUBO feeds into an USUBO_CARRY.

To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.

Fixes: https://github.com/llvm/llvm-project/issues/83268
2024-03-19 11:20:52 +01:00
Jonas Paulsson
8b8e1adbde
[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879)
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.

- AtomicExpand pass no longer casts float/double atomic load/stores to integer
  (FP128 is still casted).
2024-03-18 17:21:50 -04:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Arthur Eubanks
94c988bcfd [NFC] Remove unused parameter from shouldAssumeDSOLocal() 2024-03-11 19:48:17 +00:00
Dominik Steenken
718962f53b
[SystemZ] Provide improved cost estimates (#83873)
This commit provides better cost estimates for
the llvm.vector.reduce.add intrinsic on SystemZ. These apply to all
vector lengths and integer types up to i128. For integer types larger
than i128, we fall back to the default cost estimate.

This has the effect of lowering the estimated costs of most common
instances of the intrinsic. The expected performance impact of this is
minimal with a tendency to slightly improve performance of some
benchmarks.

This commit also provides a test to check the proper computation of the
new estimates, as well as the fallback for types larger than i128.
2024-03-11 10:40:59 +01:00
AtariDreams
7c21495fee
Reapply "Convert many LivePhysRegs uses to LiveRegUnits" (#84338)
This only converts the instances where all that is needed is to change
the variable type name.

Basically, anything that involves a function that LiveRegUnits does not
directly have was skipped to play it safe.

Reverts
7a0e222a17
2024-03-08 19:05:00 +05:30
Jay Foad
7a0e222a17 Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905)"
This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c.

It was causing test failures on the expensive check builders.
2024-03-07 08:20:26 +00:00
AtariDreams
2a13422b8b
Convert many LivePhysRegs uses to LiveRegUnits (#83905) 2024-03-06 10:38:14 +05:30
Neumann Hon
eccc71783c
[SystemZ] [z/OS] Emit offset to PPA2 in separate MCSection (#84043)
The ppa2list section isn't really part of the ppa2 section. The ppa2list
section contains the offset to the ppa2, and must be created with a
special section name (specifically, C_@@QPPA2). The binder searches for
a section with this name, then uses this value to locate the ppa2.

In GOFF terms, these are entirely separate sections; the PPA2 section
isn't even really a section but rather belongs to the code section. On
the other hand, the ppa2list section is a section in its own right and
resides in a separate TXT record.
2024-03-05 15:29:07 -05:00
Ulrich Weigand
a8cb9db5f5
[SystemZ] Use proper relocation for TLS variable debug info (#83975)
Debug info refering to a TLS variable via DW_OP_GNU_push_tls_address
needs to use a R_390_TLS_LDO64 relocation instead of R_390_64.

Fixed by adding a SystemZELFTargetObjectFile override class and proving
a getDebugThreadLocalSymbol implementation.
2024-03-05 17:48:06 +01:00
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
Jonas Paulsson
9c0e45d7f0
[SystemZ] Use VT (not ArgVT) for SlotVT in LowerCall(). (#82475)
When an integer argument is promoted and *not* split (like i72 -> i128 on
a new machine with vector support), the SlotVT should be i128, which is
stored in VT - not ArgVT.

Fixes #81417
2024-02-21 16:26:16 +01:00
Michael Liao
ea226d6693 [LoongArch|Mips|SystemZ|VE] Fix shared build. NFC 2024-02-16 11:41:52 -05:00
Ilya Leoshkevich
9c75a98155
[SystemZ] Implement A, O and R inline assembly format flags (#80685)
Implement the following assembly format flags, which are already
supported by GCC:

	'A': On z14 or higher: If operand is a mem print the alignment
         hint usable with vl/vst prefixed by a comma.
	'O': print only the displacement of a memory reference or address.
	'R': print only the base register of a memory reference or address.

Implement 'A' conservatively, since the memory operand alignment
information is not available for INLINEASM at the moment.
2024-02-07 20:41:40 +01:00
Philip Reames
3ff7caea33
[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339) 2024-02-01 17:52:35 -08:00
Kazu Hirata
39fa304866 [llvm] Use StringRef::starts_with (NFC) 2024-01-31 23:54:07 -08:00
Oskar Wirga
ff4636a4ab
Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940)
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498

> Currently, the way that recomputeLiveIns works is that it will
recompute the livein registers for that MachineBasicBlock but it matters
what order you call recomputeLiveIn which can result in incorrect
register allocations down the line.

Now we do not recompute the entire CFG but we do ensure that the newly
added MBB do reach convergence.
2024-01-30 19:33:04 -08:00
Jonas Paulsson
34dd8ec8ae
[clang, SystemZ] Support -munaligned-symbols (#73511)
When this option is passed to clang, external (and/or weak) symbols
are not assumed to have the minimum ABI alignment normally required.
Symbols defined locally that are not weak are however still given the
minimum alignment.

This is implemented by passing a new parameter to getMinGlobalAlign()
named HasNonWeakDef that is used to return the right alignment value.

This is needed when external symbols created from a linker script may
not get the ABI minimum alignment and must therefore be treated as
unaligned by the compiler.
2024-01-27 18:29:37 +01:00
Nikita Popov
07a1925b8b Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)"
This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b.

Introduces a major compile-time regression.
2024-01-26 22:33:17 +01:00
Oskar Wirga
59bf60519f
Refactor recomputeLiveIns to operate on whole CFG (#79498)
Currently, the way that recomputeLiveIns works is that it will recompute
the livein registers for that MachineBasicBlock but it matters what
order you call recomputeLiveIn which can result in incorrect register
allocations down the line.

This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
2024-01-26 11:25:36 -08:00
Shengchen Kan
550f0eb2ce [NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86 2024-01-26 20:50:58 +08:00
Jonas Paulsson
84dcf3d35b
[SystemZ] Require D12 for i128 accesses in isLegalAddressingMode() (#79221)
Machines with vector support handle i128 in vector registers and
therefore only have the small displacement available for memory
accesses. Update isLegalAddressingMode() to reflect this.
2024-01-24 20:16:05 +01:00
Jonas Paulsson
964565f42e
[SystemZ] i128 cost model (#78528)
Update SystemZTTI to reflect the recent change of handling i128 as a
legal type in vector registers.
2024-01-18 18:23:57 +01:00
Jonas Paulsson
1d1893097a
[SystemZ] Don't use FP Load and Test as comparisons to same reg (#78074)
The usage of FP Load and Test instructions as a comparison against zero
with the assumption that the dest reg will always reflect the source reg is
actually incorrect: Unfortunately, a SNaN will be converted to a QNaN, so the
instruction may actually change the value as opposed to being a pure register
move with a test.

This patch
- changes instruction selection to always emit FP LT with a scratch def
  reg, which will typically be allocated to the same reg if dead.
- Removes the conversions into FP LT in SystemZElimcompare.
2024-01-15 19:36:40 +01:00
Jonas Paulsson
62b7e35f10
[SystemZ] Don't assert for i128 vectors in getInterleavedMemoryOpCost() (#78009)
This assert does not seem justified given that the LoopVectorizer can
form interleave groups containing i128 elements where the number of
elements per vector is indeed just one.
2024-01-15 17:31:18 +01:00
Jonas Paulsson
3b16d8c8ea
[SystemZ] Don't crash on undef source in shouldCoalesce() (#78056)
SystemZRegisterInfo::shouldCoalesce() has to be able to handle an undef
source.
2024-01-15 17:24:38 +01:00
Kazu Hirata
7528cf5ef2 [Target] Use getConstantOperandVal (NFC) 2024-01-14 00:53:29 -08:00
Ulrich Weigand
9aa8c82748 [SystemZ] Fix 256-bit shifts when i128 is legal
When i128 is a legal type, SelectionDAG now attempts to use
SRL_PARTS etc. with type i128, which is not implemented.  Fix
by marking those as Expand, just like we do for i64.

Fixes https://github.com/llvm/llvm-project/issues/77132
2024-01-10 15:12:19 +01:00
Alex Bradbury
2d54ec36f7
[SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (#77455)
This is the logical equivalent for #76710 for APInt and uses the same
naming scheme.

Converted existing users through:
`git grep -l "cast<ConstantSDNode>\(.*\).*getAPIntValueValue" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getAPIntValue/\1->getAsAPIntVal/'`
2024-01-09 14:27:07 +00:00
Alex Bradbury
197214e39b
[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710)
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZextVal();`
    
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
2024-01-09 12:25:17 +00:00
Alex Bradbury
80aeb62211
[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708)
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.

Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
2024-01-02 13:14:28 +00:00
Yusra Syeda
0768253c20
[SystemZ][z/OS] Add exception handling for XPLINK (#74638)
Adds emitting the exception table and the EH registers for XPLINK.

---------

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-12-19 13:58:33 -05:00
Ulrich Weigand
a00c4220be [SystemZ] Fix complex address matching when i128 is legal
Complex address matching currently handles truncations, under
the assumption that those are no-ops.  This is no longer true
when i128 is legal.  Change the code to only handle actual
no-op truncations.

Fixes https://github.com/llvm/llvm-project/issues/75708
Fixes https://github.com/llvm/llvm-project/issues/75714
2023-12-18 12:47:45 +01:00
Ulrich Weigand
59f7f35a90 [SystemZ] ABI support for single-element vector types
Support passing and returning values of single-element vector
types (i.e. <1 x i128> and <1 x fp128>).

Now that i128 is a legal type, supporting these types can be
done simply by providing a getRegisterTypeForCallingConv
implementation that handles them.

Fixes https://github.com/llvm/llvm-project/issues/61291
2023-12-15 19:31:00 +01:00
Ulrich Weigand
a65ccc1b9f
[SystemZ] Support i128 as legal type in VRs (#74625)
On processors supporting vector registers and SIMD instructions, enable
i128 as legal type in VRs. This allows many operations to be implemented
via native instructions directly in VRs (including add, subtract,
logical operations and shifts). For a few other operations (e.g.
multiply and divide, as well as atomic operations), we need to move the
i128 value back to a GPR pair to use the corresponding instruction
there. Overall, this is still beneficial.

The patch includes the following LLVM changes:
- Enable i128 as legal type
- Set up legal operations (in SystemZInstrVector.td)
- Custom expansion for i128 add/subtract with carry
- Custom expansion for i128 comparisons and selects
- Support for moving i128 to/from GPR pairs when required
- Handle 128-bit integer constant values everywhere
- Use i128 as intrinsic operand type where appropriate
- Updated and new test cases

In addition, clang builtins are updated to reflect the intrinsic operand
type changes (which also improves compatibility with GCC).
2023-12-15 12:55:15 +01:00
Jonas Paulsson
01061ed370
[SystemZ] Improve shouldCoalesce() for i128. (#74942)
The SystemZ implementation of shouldCoalesce() is merely a workaround
for the fact that regalloc can run out of registers when extending 128-bit
intervals with subreg (GPR64/GPR32) COPYs. 

This patch adds more freedom to the coalescer as it now only checks that
the subreg interval is local to MBB and does not have too many physreg
clobbers.
2023-12-14 15:55:27 +01:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Jonas Paulsson
07056c2274
[SystemZ] Use LCGR/AGHI for i64 XOR with -1 (#74882)
LCGR/AGHI is a more compact way of implementing a 64-bit NOT.
2023-12-11 17:28:12 +01:00
Jonas Paulsson
435ba72afd
[SystemZ] Simplify handling of AtomicRMW instructions. (#74789)
Let the AtomicExpand pass do more of the job of expanding
AtomicRMWInst:s in order to simplify the handling in the backend.

The only cases that the backend needs to handle itself are those of
subword size (8/16 bits) and those directly corresponding to a target
instruction.
2023-12-08 17:19:17 +01:00
Craig Topper
e87f33d9ce
[RISCV][MC] Pass MCSubtargetInfo down to shouldForceRelocation and evaluateTargetFixup. (#73721)
Instead of using the STI stored in RISCVAsmBackend, try to get it from
the MCFragment.

This addresses the issue raised here
https://discourse.llvm.org/t/possible-problem-related-to-subtarget-usage/75283
2023-12-07 13:17:58 -08:00
Jonas Paulsson
c568927f3e
[SystemZ] Properly support 16 byte atomic int/fp types and ops. (#73134)
- Clang FE now has MaxAtomicPromoteWidth / MaxAtomicInlineWidth set to 128, and now produces IR
  instead of calls to __atomic instrinsics for 16 bytes as well.
- Atomic __int128 (and long double) variables are now aligned to 16 bytes by default (like gcc 14).
- AtomicExpand pass now expands 16 byte operations as well.
- tests for __atomic builtins for all integer widths, and __atomic_is_lock_free with friends.
- TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s. As a next step
  smaller integer types should also be possible to handle this way instead of by the backend.
2023-12-05 17:17:21 +01:00