81 Commits

Author SHA1 Message Date
Matt Arsenault
143ca74ed3 AtomicExpand: Convert tests to opaque pointers 2022-11-28 08:43:16 -05:00
Manuel Brito
f408635b26 [CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]
Differential Revision: https://reviews.llvm.org/D138483
2022-11-24 08:42:28 +00:00
Nuno Lopes
b50e1bd605 Revert "[CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]"
This reverts commit f50423c1a4422900aa1240fed643f5920451a88d.
2022-11-22 12:41:22 +00:00
Manuel Brito
f50423c1a4 [CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]
Differential Revision: https://reviews.llvm.org/D138483
2022-11-22 11:40:25 +00:00
gonglingqin
19ae5391e3 [LoongArch] Expand atomicrmw fadd/fsub/fmin/fmax with CmpXChg
Differential Revision: https://reviews.llvm.org/D137311
2022-11-14 10:11:37 +08:00
Matt Arsenault
3cfa03856f AtomicExpand: Support cmpxchg expansion for small FP types
Handles f16 atomics for AMDGPU.
2022-11-10 22:16:11 -08:00
Bjorn Pettersson
893e351f2f [test] Avoid legacy PM default pipelines (O0,O1 etc) when running opt
Two lit tests were found running something like this:
  opt -O<n> -pass-locked-to-legacy-PM ...

The expand-atomicrmw-xchg-fp.ll seem to have used -O1 just to ensure
that the -atomic-expand pass were thinking that it wasn't running at
O0 level. Same thing can be ensured by using the -codegen-opt-level=1
option, making it possible to avoid using O1 in that test case.

In the vector-reductions-expanded.ll test case it was possible to
split the RUN line into using two opt invocations. First running
"opt -O2" using the new PM, and then running "opt -expand-reductions"
using the legacy PM.

I think that given this patch we get closer to removing code related
to 'AddOptimizationPasses' in opt.cpp.

Differential Revision: https://reviews.llvm.org/D137626
2022-11-09 09:57:57 +01:00
Shilei Tian
1186e9d59f [LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space
The 32-bit floating-point atomic add instructions on AMDGPUs does not support a
"flat" or "generic" address space. So, if the address space cannot be determined
statically, the AMDGPU backend will fall back to a CAS loop (which does support
"flat" addressing). Instead, this patch emits runtime address-space checks to
allow native FP atomic add instructions for global and LDS memory (and non-atomic
FP add instructions for private/scratch memory).

In order to do that, this patch introduces a new interface function
`emitExpandAtomicRMW`. It is expected to be called when a common atomic expand
doesn't work for a specific target, such as the case we discussed here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129690
2022-11-04 14:11:05 -04:00
Matt Arsenault
b60a9ccd02 AtomicExpand: Use InstSimplifyFolder
Automatically cleanup operations if we know the atomic has higher
alignment.
2022-10-31 23:31:42 -07:00
Matt Arsenault
07f12170a2 AtomicExpand: Don't create unused instructions for some atomicrmw
This wasn't used by every atomicrmw expansion.
2022-10-31 18:34:36 -07:00
Matt Arsenault
d0750ec475 AtomicExpand: Avoid some operations if the atomic is overaligned
Let some of the pointer bithacking fold away if we know the LSB are 0.
2022-10-13 23:31:00 -07:00
Matt Arsenault
01adf1f3e5 AtomicExpand: Add some more overaligned atomic tests 2022-09-28 12:51:30 -04:00
Matt Arsenault
a61c3455c0 AtomicExpand: Use llvm.ptrmask instead of ptrtoint
This removes the ptrtoint from the load's pointer operand, although we
can't entirely eliminate these to get the LSB shift. In a future
patch, this will avoid ptrtoint in the case where the atomic is
overaligned to the word size.
2022-09-28 12:51:30 -04:00
Petar Avramovic
5cee9047d5 AMDGPU: Improve atomicrmw fadd selection
Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11
as for gfx90a. Add missing globalisel legalizer support for flat
atomicrmw fadd f32 on gfx940 and gfx11.
Isel support for gfx11 will be added in D130579.

Differential Revision: https://reviews.llvm.org/D131560
2022-09-23 17:52:10 +02:00
Petar Avramovic
48968c47b0 AMDGPU: Add detailed buffer, global and flat atomic fadd tests
Precommit for D130579 that will remove manual selection and use
patterns from td files. Tests are grouped based on target features.

All patterns have rtn and no-rtn versions.

buffer atomics patterns are selected based on the intrinsic used
(raw or struct) and the offset operand (imm or vgpr):
_offset raw with imm offset
_offen raw with vgpr offset (or large imm offset)
_idxen struct with imm offset
_bothen struct with vgpr offset (or large imm offset)

global and flat atomics are selected via intrinsic or the atomicrmw fadd.
atomicrmw tests have amdgpu-unsafe-fp-atomics=true and non-system scope
since they get expanded otherwise. atomicrmw fadd does not support vector
type, test float and double.

global atomics patterns are selected based on address type via (global or
flat) intrinsic or atomicrmw fadd with global address(addrspace(1)*).
'no suffix' vgpr addrspace(1)* address
_saddr sgpr addrspace(1)* address

flat atomics patterns are selected via (flat)intrinsic or atomicrmw fadd
with flat address (* - address space 0).

Differential Revision: https://reviews.llvm.org/D131561
2022-09-23 17:52:10 +02:00
Matt Arsenault
b9a371f6d1 AtomicExpand: Use correct pointer size for integer
This was using the default address space.
2022-09-20 16:51:05 -04:00
Matt Arsenault
4d322ba77b AMDGPU: Add baseline test for expansion of 16-bit local atomics
The expansion is currently using the wrong pointer size.
2022-09-20 16:51:05 -04:00
Matt Arsenault
784d2930c0 AtomicExpand: Switch test to generated checks 2022-09-20 16:51:05 -04:00
Matt Arsenault
28e03692ae AMDGPU: Fix expansion of 16-bit atomicrmw
Fixes issue 57830
2022-09-20 14:47:40 -04:00
Matt Arsenault
a4b1f7a8b5 AMDGPU: Add some tests for atomics with excess alignment 2022-09-19 19:27:21 -04:00
Matt Arsenault
3f77df8e29 AMDGPU: Update baseline test checks 2022-09-19 18:57:33 -04:00
Marco Elver
f0d6709e4a [AtomicExpandPass] Always copy pcsections Metadata to expanded atomics
When expanding IR atomics to target-specific atomics, copy all
!pcsections Metadata to expanded atomics automatically.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130885
2022-09-07 11:36:01 +02:00
Kai Luo
ad2f7fd286 [AtomicExpand] Make floating point conversion happens before fence insertion
IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations.
This also fixes atomic load of floating point values which requires fence on PowerPC.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D127609
2022-08-31 09:54:58 +08:00
gonglingqin
e9a4b8e397 [LoongArch] Optimize the atomic store with amswap_db.[w/d]
When AtomicOrdering is release or stronger, use
    amswap_db.[w/d] $zero, $a1, $a0
instead of
    dbar 0
    st.[w/d] $a0, $a1, 0

Thanks to @xry111 for the suggestion: https://reviews.llvm.org/D128901#3626635

Differential Revision: https://reviews.llvm.org/D129838
2022-08-23 17:11:57 +08:00
gonglingqin
47f3dc6d49 [LoongArch] Add codegen support for atomic fence, atomic load and atomic store
Differential Revision: https://reviews.llvm.org/D128901
2022-07-13 15:25:45 +08:00
Kai Luo
6710b21d46 [PowerPC] Allow llvm.ppc.cfence to accept pointer types
In the context of atomic load, integer, pointer and float point types are allowed, thus we should allow llvm.ppc.cfence to accept any type mentioned.

Fixes https://github.com/llvm/llvm-project/issues/55983.

Reviewed By: shchenz, vchuravy

Differential Revision: https://reviews.llvm.org/D127554
2022-06-24 10:55:32 +08:00
Kai Luo
8091f7120c [PowerPC] Correct test RUN line. NFC. 2022-06-14 14:56:00 +08:00
Kai Luo
029fc37270 [PowerPC][AtomicExpand] Precommit IR tests for D127609. NFC. 2022-06-14 14:24:21 +08:00
Kai Luo
18679ac0d7 [PowerPC] Adjust MaxAtomicSizeInBitsSupported on PPC64
AtomicExpandPass uses this variable to determine emitting libcalls or not. The default value is 1024 and if we don't specify it for PPC64 explicitly, AtomicExpandPass won't emit `__atomic_*` libcalls for those target unable to inline atomic ops and finally the backend emits `__sync_*` libcalls. Thanks @efriedma for pointing it out.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122868
2022-04-09 00:03:09 +00:00
Kai Luo
dc77769fc6 [PowerPC] Add cmpxchg test for pwr7 in atomic expand pass. NFC. 2022-04-01 13:27:54 +08:00
Kai Luo
31906a6090 [AtomicExpand][PowerPC] Fix all-one mask value
When generating a all-one mask value whose bitwidth is larger than 64, signed extension should be used rather then zero extension.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D120865
2022-03-18 13:35:54 +08:00
Arthur Eubanks
2371c5a0e0 [OpaquePtr][ARM] Use elementtype on ldrex/ldaex/stlex/strex
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Basically the same as D120527.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D121847
2022-03-16 14:11:53 -07:00
Arthur Eubanks
250620f76e [OpaquePtr][AArch64] Use elementtype on ldxr/stxr
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D120527
2022-03-14 10:09:59 -07:00
Kai Luo
1cfcbf197c [PowerPC][atomics] Precommit test cases for i128 cmpxchg. NFC. 2022-03-03 10:47:52 +08:00
Kai Luo
1453f048cf [PowerPC] Add lit.local.cfg in AtomicExpand tests
Fixed build errors on other platforms.
2021-07-20 09:13:50 +00:00
Kai Luo
e2ee27b20b [PowerPC] Fallback to base's implementation of shouldExpandAtomicCmpXchgInIR and shouldExpandAtomicCmpXchgInIR
If we can't decide `shouldExpandAtomicCmpXchgInIR` or `shouldExpandAtomicCmpXchgInIR` in PPC's implementation after https://reviews.llvm.org/rGb9c3941cd61de1e1b9e4f3311ddfa92394475f4b, resort to base's implementation.

This fixes internal build of OpenMP which uses atomic operations on float.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106234
2021-07-20 06:14:24 +00:00
LemonBoy
b577ec4956 [AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer ones
Follow the same strategy used for atomic loads/stores by converting the operands to equally-sized integer types.
This change prevents the atomic expansion pass from generating illegal LL/SC pairs when targeting AArch64: `expand-atomicrmw-xchg-fp.ll` would previously instantiate intrinsics such as `llvm.aarch64.ldaxr.p0f32` that cannot be lowered.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D103232
2021-05-29 08:57:27 +02:00
Tomas Matheson
9d86095ff8 Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 753185031d939711f8733639a77a6fdc3bdbad22.
2021-05-03 21:48:20 +01:00
Tomas Matheson
753185031d [CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Differential Revision: https://reviews.llvm.org/D101164

Originally submitted as 3338290c187b254ad071f4b9cbf2ddb2623cefc0.
Reverted in c7df6b1223d88dfd15248fbf7b7b83dacad22ae3.
2021-05-03 20:25:15 +01:00
LemonBoy
4751cadcca [AArch64] Prevent spilling between ldxr/stxr pairs
Apply the same logic used to check if CMPXCHG nodes should be expanded
at -O0: the register allocator may end up spilling some register in
between the atomic load/store pairs, breaking the atomicity and possibly
stalling the execution.

Fixes PR48017

Reviewed By: efriedman

Differential Revision: https://reviews.llvm.org/D101163
2021-05-01 17:17:05 +02:00
Tomas Matheson
c7df6b1223 Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 3338290c187b254ad071f4b9cbf2ddb2623cefc0.

Broke expensive checks on debian.
2021-04-30 16:53:14 +01:00
Tomas Matheson
3338290c18 [CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Differential Revision: https://reviews.llvm.org/D101164
2021-04-30 16:40:33 +01:00
Stanislav Mekhanoshin
30b3aab329 Copy syncscope when expanding atomicrmw into cmpxchg loop
Fixes: SWDEV-280070

Differential Revision: https://reviews.llvm.org/D99902
2021-04-05 17:29:38 -07:00
Konstantin Zhuravlyov
6054a456da AMDGPU: Add support for amdgpu-unsafe-fp-atomics attribute
If amdgpu-unsafe-fp-atomics is specified, allow {flat|global}_atomic_add_f32 even if atomic modes don't match.

Differential Revision: https://reviews.llvm.org/D95391
2021-02-04 08:09:34 -05:00
Pavel Iliin
4d7df43ffd [AArch64] Out-of-line atomics (-moutline-atomics) implementation.
This patch implements out of line atomics for LSE deployment
mechanism. Details how it works can be found in llvm/docs/Atomics.rst
Options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to clang driver. This is clang and llvm part of out-of-line atomics
interface, library part is already supported by libgcc. Compiler-rt
support is provided in separate patch.

Differential Revision: https://reviews.llvm.org/D91157
2020-11-20 13:30:12 +00:00
Alex Richardson
5bc438efcf [AtomicExpand] Avoid creating an unnamed libcall
I recently modified this pass to better support CHERI-RISC-V and while
doing so I noticed that this pass was calling M->getOrInsertFunction()
with the result of TLI->getLibcallName(RTLibType). However, AMDGPU fills
the libcalls array with nullptr, so this creates an anonymous function
instead. This patch changes expandAtomicOpToLibcall to return false in
case the libcall does not exist and changes the assert() in the callees to
a report_fatal_error() instead.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88800
2020-11-02 17:52:37 +00:00
Matt Arsenault
af0207f2ba AMDGPU: Check global FP atomics match default FP mode
We would always select global FP atomics from atomicrmw fadd, although
they have a hardcoded FP mode.
2020-09-23 09:07:50 -04:00
Krzysztof Parzyszek
25a4b1904c Handle part-word LL/SC in atomic expansion pass
Differential Revision: https://reviews.llvm.org/D77213
2020-04-28 10:07:39 -05:00
Jonathan Roelofs
7c5d2bec76 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
Matt Arsenault
32137699f7 AMDGPU: Fix copy-pasted test name error 2019-12-11 19:44:47 +05:30