When expanding an atomicrmw with a cmpxchg, preserve any metadata
attached to it. This will avoid unwanted double expansions
in a future commit.
The initial load should also probably receive the same metadata
(which for some reason is not emitted as an atomic).
If the runtime flat address resolves to a scratch address,
64-bit atomics do not work correctly. Insert a runtime address
space check (which is quite likely to be uniform) and select between
the non-atomic and real atomic cases.
Consider noalias.addrspace metadata and avoid this expansion when
possible (we also need to consider it to avoid infinitely expanding
after adding the predication code).
This is the most complex atomicrmw support case. Note we don't have
accurate remarks for all of the cases, which I'm planning on fixing
in a later change with more precise wording.
Continue respecting amdgpu-unsafe-fp-atomics until it's eventual removal.
Also seems to fix a few cases not interpreting amdgpu-unsafe-fp-atomics
appropriately aaggressively.
Define subtarget features for atomic fmin/fmax support.
The flat/global support is a real messe. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.
gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.
Unlike the existing fadd cases, choose to ignore the requirement for
amdgpu-unsafe-fp-atomics in case of fine-grained memory access. This
is to minimize migration pain to the new atomic control metadata. This
should not break any users, as the atomic intrinsics are still
directly consumed, and clang does not yet produce vector FP atomicrmw.
Because most of tests assume target-abi=`lp64d`, adding the
corresponding feature is reasonable.
rg -l loongarch -g '!*.s' | xargs sed -i '/mtriple=loongarch/ {/-mattr=/!{/target-abi/! s/mtriple=loongarch.. /&-mattr=+d /}}'
This is the first step to eliminating shouldCastAtomicRMWIInIR. This and
the other atomic expand casting hooks should be removed. This adds
duplicate legalization machinery and interfaces. This is already what
codegen is supposed to do, and already does for the promotion case.
In the case of atomicrmw xchg, there seems to be some benefit to having
the bitcasts moved outside of the cmpxchg loop on targets with separate
int and FP registers, which we should be able to deal with by directly
checking for the legality of the underlying operation.
The casting path was also losing metadata when it recreated the
instruction.
These hooks should be removed. This is a trivial legalization transform
the legalizer needs to support. The IR just complicates things, and it
was losing metadata. Implement the DAG promotion support, and switch
AMDGPU over to using it.
Really we'd be a lot better off merging ATOMIC_LOAD and LOAD like
GlobalISel does.
DS atomic fadd F32 does respect the denormal mode, so we do not need to
consider the expected FP mode or unsafe-fp-atomics attribute. They don't
respect the rounding mode, but we don't care outside of strictfp. This
also reveals the fp-mode-is-flush check has been missing in the cases
that should be considering it alongside amdgpu-unsafe-fp-atomics.
This also stops considering the case where flushing is enabled for f64,
as flushing isn't mandated and we barely handle this case.
Add baseline tests which should comprehensively test the new atomic
metadata. Test codegen / expansion, and preservation in a few
transforms.
New metadata defined in #85052
We try to interpret unknown address space numbers as aliases of global,
but this wasn't applied here. Also improve test coverage for the
buffer fat pointer address space.
InstCombine transforms add of 0 to or of 0. For system atomics, this is
problematic because while PCIe supports add, it does not support the
other operations. Undo this for system scope atomics.
Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of
floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x
bfloat> on some targets and address spaces.
Note this only supports the proper floating-point operations; float
vector typed xchg is still not supported. cmpxchg still only supports
integers, so this inserts bitcasts for the loop expansion.
I have support for fp vector typed xchg, and vector of int/ptr
separately implemented but I don't have an immediate need for those
beyond feature consistency.
The AtomicExpand pass was lowering function calls with the strictfp
attribute to sequences that included function calls incorrectly lacking
the attribute. This patch corrects that.
The pass now also emits the correct constrained fp call instead of
normal FP instructions when in a function with the strictfp attribute.
Test changes verified with D146845.
This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass, which matches what Clang
already does in the frontend.
While AMDGPU currently disables the use of all libcalls, I've changed it
to instead disable all of them _except_ the atomic ones. Those are
already be emitted by the Clang frontend, and enabling them in the
backend allows the same behavior there.
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
On AIX, a libatomic supporting inline quadword atomic operations has been released, so that compatibility is not an issue now, we can enable quadword atomics by default.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D151312