28 Commits

Author SHA1 Message Date
Matt Arsenault
143ca74ed3 AtomicExpand: Convert tests to opaque pointers 2022-11-28 08:43:16 -05:00
Manuel Brito
f408635b26 [CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]
Differential Revision: https://reviews.llvm.org/D138483
2022-11-24 08:42:28 +00:00
Matt Arsenault
3cfa03856f AtomicExpand: Support cmpxchg expansion for small FP types
Handles f16 atomics for AMDGPU.
2022-11-10 22:16:11 -08:00
Shilei Tian
1186e9d59f [LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space
The 32-bit floating-point atomic add instructions on AMDGPUs do not support a
"flat" or "generic" address space. So, if the address space cannot be determined
statically, the AMDGPU backend will fall back to a CAS loop (which does support
"flat" addressing). Instead, this patch emits runtime address-space checks to
allow native FP atomic add instructions for global and LDS memory (and non-atomic
FP add instructions for private/scratch memory).

In order to do that, this patch introduces a new interface function
`emitExpandAtomicRMW`. It is expected to be called when a common atomic expand
doesn't work for a specific target, such as the case we discussed here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129690
2022-11-04 14:11:05 -04:00
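
The CAS-loop fallback mentioned in the commit message above can be sketched in portable C++. This is a minimal illustration of the general technique, not the code AtomicExpandPass emits; the helper name `atomic_fadd_cas` is hypothetical, and the memory orderings are an assumption chosen for the example.

```cpp
#include <atomic>

// Hypothetical helper: emulates an atomic float add with a compare-exchange
// loop. Like atomicrmw, it returns the value that was stored before the add.
float atomic_fadd_cas(std::atomic<float>& target, float operand) {
    float expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the current value back into
    // `expected`, so the sum is recomputed from fresh data on every retry.
    while (!target.compare_exchange_weak(expected, expected + operand,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
    }
    return expected;
}
```

The loop only needs a compare-and-swap on the underlying bits, which is why it still works where a native FP add instruction is unavailable for the address space.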
Matt Arsenault
b60a9ccd02 AtomicExpand: Use InstSimplifyFolder
Automatically clean up operations if we know the atomic has higher
alignment.
2022-10-31 23:31:42 -07:00
Matt Arsenault
07f12170a2 AtomicExpand: Don't create unused instructions for some atomicrmw
This wasn't used by every atomicrmw expansion.
2022-10-31 18:34:36 -07:00
Matt Arsenault
d0750ec475 AtomicExpand: Avoid some operations if the atomic is overaligned
Let some of the pointer bithacking fold away if we know the LSBs are 0.
2022-10-13 23:31:00 -07:00
Matt Arsenault
01adf1f3e5 AtomicExpand: Add some more overaligned atomic tests 2022-09-28 12:51:30 -04:00
Matt Arsenault
a61c3455c0 AtomicExpand: Use llvm.ptrmask instead of ptrtoint
This removes the ptrtoint from the load's pointer operand, although we
can't entirely eliminate these to get the LSB shift. In a future
patch, this will avoid ptrtoint in the case where the atomic is
overaligned to the word size.
2022-09-28 12:51:30 -04:00
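
The pointer arithmetic described in the commit message above (mask the address down to the containing word, then use the low bits to find the shift) can be sketched with plain integers. This is a rough illustration under stated assumptions: a 4-byte word, little-endian byte order, and hypothetical names `PartwordLocation` and `locate_partword`; it is not the pass's actual implementation.

```cpp
#include <cstdint>

// Hypothetical result type for the sketch.
struct PartwordLocation {
    std::uintptr_t aligned_addr;  // address of the containing 32-bit word
    unsigned shift_bits;          // bit offset of the value inside that word
};

// Assumes a 4-byte word and little-endian layout.
PartwordLocation locate_partword(std::uintptr_t addr) {
    constexpr std::uintptr_t kWordBytes = 4;
    constexpr std::uintptr_t kLowMask = kWordBytes - 1;
    PartwordLocation loc;
    // Clearing the low bits is the step a pointer mask can do directly on
    // the pointer, without converting it to an integer first.
    loc.aligned_addr = addr & ~kLowMask;
    // The low bits themselves are still needed to compute the shift.
    loc.shift_bits = static_cast<unsigned>(addr & kLowMask) * 8;
    return loc;
}
```

If the address is already known to be word-aligned, the low bits are zero, so the mask is a no-op and the shift is 0; that is the case the message says a later patch will exploit.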
Petar Avramovic
5cee9047d5 AMDGPU: Improve atomicrmw fadd selection
Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11
as for gfx90a. Add missing globalisel legalizer support for flat
atomicrmw fadd f32 on gfx940 and gfx11.
Isel support for gfx11 will be added in D130579.

Differential Revision: https://reviews.llvm.org/D131560
2022-09-23 17:52:10 +02:00
Petar Avramovic
48968c47b0 AMDGPU: Add detailed buffer, global and flat atomic fadd tests
Precommit for D130579 that will remove manual selection and use
patterns from td files. Tests are grouped based on target features.

All patterns have rtn and no-rtn versions.

buffer atomics patterns are selected based on the intrinsic used
(raw or struct) and the offset operand (imm or vgpr):
  _offset  raw with imm offset
  _offen   raw with vgpr offset (or large imm offset)
  _idxen   struct with imm offset
  _bothen  struct with vgpr offset (or large imm offset)

global and flat atomics are selected via intrinsic or the atomicrmw fadd.
atomicrmw tests have amdgpu-unsafe-fp-atomics=true and non-system scope
since they get expanded otherwise. atomicrmw fadd does not support vector
types, so only float and double are tested.

global atomics patterns are selected based on address type via (global or
flat) intrinsic or atomicrmw fadd with global address(addrspace(1)*).
  'no suffix'  vgpr addrspace(1)* address
  _saddr       sgpr addrspace(1)* address

flat atomics patterns are selected via (flat)intrinsic or atomicrmw fadd
with flat address (* - address space 0).

Differential Revision: https://reviews.llvm.org/D131561
2022-09-23 17:52:10 +02:00
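
The buffer-atomic suffix rules listed in the commit message above amount to a two-input lookup. The sketch below only restates that table; the function name `buffer_atomic_suffix` and its parameters are hypothetical and do not correspond to anything in the LLVM sources.

```cpp
#include <string>

// Hypothetical helper restating the suffix table from the commit message.
// `is_struct` is true for the struct intrinsic and false for the raw one.
// `offset_in_vgpr` is true for a vgpr offset or a large immediate offset.
std::string buffer_atomic_suffix(bool is_struct, bool offset_in_vgpr) {
    if (is_struct) {
        return offset_in_vgpr ? "_bothen" : "_idxen";
    }
    return offset_in_vgpr ? "_offen" : "_offset";
}
```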
Matt Arsenault
b9a371f6d1 AtomicExpand: Use correct pointer size for integer
This was using the default address space.
2022-09-20 16:51:05 -04:00
Matt Arsenault
4d322ba77b AMDGPU: Add baseline test for expansion of 16-bit local atomics
The expansion is currently using the wrong pointer size.
2022-09-20 16:51:05 -04:00
Matt Arsenault
28e03692ae AMDGPU: Fix expansion of 16-bit atomicrmw
Fixes issue 57830
2022-09-20 14:47:40 -04:00
Matt Arsenault
a4b1f7a8b5 AMDGPU: Add some tests for atomics with excess alignment 2022-09-19 19:27:21 -04:00
Matt Arsenault
3f77df8e29 AMDGPU: Update baseline test checks 2022-09-19 18:57:33 -04:00
Stanislav Mekhanoshin
30b3aab329 Copy syncscope when expanding atomicrmw into cmpxchg loop
Fixes: SWDEV-280070

Differential Revision: https://reviews.llvm.org/D99902
2021-04-05 17:29:38 -07:00
Konstantin Zhuravlyov
6054a456da AMDGPU: Add support for amdgpu-unsafe-fp-atomics attribute
If amdgpu-unsafe-fp-atomics is specified, allow {flat|global}_atomic_add_f32 even if atomic modes don't match.

Differential Revision: https://reviews.llvm.org/D95391
2021-02-04 08:09:34 -05:00
Alex Richardson
5bc438efcf [AtomicExpand] Avoid creating an unnamed libcall
I recently modified this pass to better support CHERI-RISC-V and while
doing so I noticed that this pass was calling M->getOrInsertFunction()
with the result of TLI->getLibcallName(RTLibType). However, AMDGPU fills
the libcalls array with nullptr, so this creates an anonymous function
instead. This patch changes expandAtomicOpToLibcall to return false in
case the libcall does not exist and changes the assert() in the callees to
a report_fatal_error() instead.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88800
2020-11-02 17:52:37 +00:00
Matt Arsenault
af0207f2ba AMDGPU: Check global FP atomics match default FP mode
We would always select global FP atomics from atomicrmw fadd, although
they have a hardcoded FP mode.
2020-09-23 09:07:50 -04:00
Matt Arsenault
32137699f7 AMDGPU: Fix copy-pasted test name error 2019-12-11 19:44:47 +05:30
Matt Arsenault
e16a71382d AMDGPU: Select global atomicrmw fadd
This only works if there is no use of the return value.
2019-11-06 16:06:38 -08:00
Matt Arsenault
c5830f5f05 AtomicExpand: Don't crash on non-0 alloca
This now produces garbage on AMDGPU with a call to a nonexistent,
anonymous libcall but won't assert.

llvm-svn: 363022
2019-06-11 01:35:07 +00:00
Matt Arsenault
383e72fcfe AMDGPU: Expand < 32-bit atomics
Also fix AtomicExpand asserting on atomicrmw fadd/fsub.

llvm-svn: 363021
2019-06-11 01:35:00 +00:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Matt Arsenault
a5840c3c39 Codegen support for atomicrmw fadd/fsub
llvm-svn: 351851
2019-01-22 18:36:06 +00:00
Matt Arsenault
ab41193312 AMDGPU: Expand atomicrmw nand in IR
llvm-svn: 343559
2018-10-02 03:50:56 +00:00