The 32-bit floating-point atomic add instructions on AMDGPUs does not support a
"flat" or "generic" address space. So, if the address space cannot be determined
statically, the AMDGPU backend will fall back to a CAS loop (which does support
"flat" addressing). Instead, this patch emits runtime address-space checks to
allow native FP atomic add instructions for global and LDS memory (and non-atomic
FP add instructions for private/scratch memory).
In order to do that, this patch introduces a new interface function
`emitExpandAtomicRMW`. It is expected to be called when a common atomic expand
doesn't work for a specific target, such as the case we discussed here.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D129690
This removes the ptrtoint from the load's pointer operand, although we
can't entirely eliminate these to get the LSB shift. In a future
patch, this will avoid ptrtoint in the case where the atomic is
overaligned to the word size.
Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11
as for gfx90a. Add missing globalisel legalizer support for flat
atomicrmw fadd f32 on gfx940 and gfx11.
Isel support for gfx11 will be added in D130579.
Differential Revision: https://reviews.llvm.org/D131560
Precommit for D130579 that will remove manual selection and use
patterns from td files. Tests are grouped based on target features.
All patterns have rtn and no-rtn versions.
buffer atomics patterns are selected based on the intrinsic used
(raw or struct) and the offset operand (imm or vgpr):
_offset raw with imm offset
_offen raw with vgpr offset (or large imm offset)
_idxen struct with imm offset
_bothen struct with vgpr offset (or large imm offset)
global and flat atomics are selected via intrinsic or the atomicrmw fadd.
atomicrmw tests have amdgpu-unsafe-fp-atomics=true and non-system scope
since they get expanded otherwise. atomicrmw fadd does not support vector
type, test float and double.
global atomics patterns are selected based on address type via (global or
flat) intrinsic or atomicrmw fadd with global address(addrspace(1)*).
'no suffix' vgpr addrspace(1)* address
_saddr sgpr addrspace(1)* address
flat atomics patterns are selected via (flat)intrinsic or atomicrmw fadd
with flat address (* - address space 0).
Differential Revision: https://reviews.llvm.org/D131561
If amdgpu-unsafe-fp-atomics is specified, allow {flat|global}_atomic_add_f32 even if atomic modes don't match.
Differential Revision: https://reviews.llvm.org/D95391
I recently modified this pass to better support CHERI-RISC-V and while
doing so I noticed that this pass was calling M->getOrInsertFunction()
with the result of TLI->getLibcallName(RTLibType). However, AMDGPU fills
the libcalls array with nullptr, so this creates an anonymous function
instead. This patch changes expandAtomicOpToLibcall to return false in
case the libcall does not exist and changes the assert() in the callees to
a report_fatal_error() instead.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D88800
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546