143 Commits

Author SHA1 Message Date
Matt Arsenault
9cc298108a
AtomicExpand: Copy metadata from atomicrmw to cmpxchg (#109409)
When expanding an atomicrmw with a cmpxchg, preserve any metadata
attached to it. This will avoid unwanted double expansions
in a future commit.

The initial load should also probably receive the same metadata
(which for some reason is not emitted as an atomic).
2024-10-31 11:54:07 -07:00
Matt Arsenault
e3222e6f80
AMDGPU: Add baseline tests for cmpxchg custom expansion (#109408)
We need a non-atomic path if flat may access private.
2024-10-31 11:46:13 -07:00
Matt Arsenault
1d0370872f
AMDGPU: Expand flat atomics that may access private memory (#109407)
If the runtime flat address resolves to a scratch address,
64-bit atomics do not work correctly. Insert a runtime address
space check (which is quite likely to be uniform) and select between
the non-atomic and real atomic cases.

Consider noalias.addrspace metadata and avoid this expansion when
possible (we also need to consider it to avoid infinitely expanding
after adding the predication code).
2024-10-31 08:08:48 -07:00
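The predicated expansion described in 1d0370872f can be pictured with a small host-side model. This is only a sketch under stated assumptions, not the code the backend emits: `AddrSpace` and `predicated_add` are hypothetical names, the address space is passed in as a parameter rather than derived from a flat pointer, and the non-atomic path is modeled as a separate relaxed load and store.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the result of the runtime address space check.
// The real expansion tests the flat pointer itself; here the caller says
// which space the address resolved to.
enum class AddrSpace { Global, Private };

// Model of a 64-bit "atomicrmw add" that selects between a non-atomic path
// and a real atomic path. Returns the old value, as atomicrmw does.
std::uint64_t predicated_add(std::atomic<std::uint64_t> &cell,
                             std::uint64_t operand, AddrSpace space) {
  if (space == AddrSpace::Private) {
    // Non-atomic case: a plain load, add, and store. This models the path
    // taken when the address turns out to be scratch memory.
    std::uint64_t old_val = cell.load(std::memory_order_relaxed);
    cell.store(old_val + operand, std::memory_order_relaxed);
    return old_val;
  }
  // Real atomic case: a single read-modify-write operation.
  return cell.fetch_add(operand, std::memory_order_seq_cst);
}
```

Both paths return the previous value and leave the same result in memory; only the mechanism differs, which is why the check can be inserted without changing the observable result for a single thread.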
Matt Arsenault
b0a25468fa
AMDGPU: Add baseline tests for flat-may-alias private atomic expansions (#109406) 2024-10-15 22:29:24 +04:00
Matt Arsenault
0edd07770f
AMDGPU: Preserve alignment when custom expanding atomicrmw (#103768) 2024-08-14 17:16:59 +04:00
Matt Arsenault
edded8d7b5
AMDGPU: Stop handling legacy amdgpu-unsafe-fp-atomics attribute (#101699)
This is now autoupgraded to annotate atomicrmw instructions in
old bitcode.
2024-08-13 22:02:25 +04:00
Matt Arsenault
80c51fad3b
AtomicExpand: Regenerate baseline checks (#103063) 2024-08-13 20:43:39 +04:00
Matt Arsenault
1ae507d109
AMDGPU: Do not create phi user for atomicrmw with no uses (#103061) 2024-08-13 19:24:52 +04:00
Matt Arsenault
42b5540211
AMDGPU: Preserve atomicrmw name when specializing address space (#102470) 2024-08-09 00:43:04 +04:00
Matt Arsenault
bb7143f666
AMDGPU: Avoid creating unnecessary block split in atomic expansion (#102440)
This was creating a new block to insert the is.shared check, but we
can just do that in the original block.
2024-08-09 00:39:12 +04:00
Matt Arsenault
dfda9c5b9e
AMDGPU: Handle new atomicrmw metadata for fadd case (#96760)
This is the most complex atomicrmw support case. Note we don't have
accurate remarks for all of the cases, which I'm planning on fixing
in a later change with more precise wording.

Continue respecting amdgpu-unsafe-fp-atomics until its eventual removal.
Also seems to fix a few cases not interpreting amdgpu-unsafe-fp-atomics
appropriately aggressively.
2024-08-02 19:41:33 +04:00
Matt Arsenault
41439d5bb7
AMDGPU: Handle remote/fine-grained memory in atomicrmw fmin/fmax lowering (#96759)
Consider the new atomic metadata when choosing to expand as cmpxchg
instead.
2024-08-01 22:08:01 +04:00
Matt Arsenault
a2a73d892a
AMDGPU: Fix no return atomicrmw fadd v2f16 selection for gfx908 (#96948)
We previously would always expand this with a cmpxchg loop, while
it should be the same conditions as the f32 case (except for the
denormal concern).
2024-06-27 21:17:16 +02:00
Matt Arsenault
a440a96ec2
AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (#95592)
Define subtarget features for atomic fmin/fmax support.

The flat/global support is a real mess. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.

gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.
2024-06-23 10:10:41 +02:00
Matt Arsenault
8520061281
AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590)
This has always been supported. Somehow, we ended up with 2
copies of clang builtins for this case, and the newer one
erroneously requires gfx8-insts.
2024-06-18 18:34:34 +02:00
Matt Arsenault
cf5ce8cdf1 AMDGPU: Add some tests for i128 and fp128 atomic expansion
These produce garbage libcalls, so the result is not useful but
this at least shows we don't assert.
2024-06-18 10:33:25 +02:00
Matt Arsenault
4cf1a19b7e Reapply "AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394)"
This reverts commit 95b77d90aae10725ea692e120aac083ef1c1297d.
2024-06-17 16:34:35 +02:00
Nico Weber
95b77d90aa Revert "AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394)"
This reverts commit 5021e6dd548323e1169be3d466d440009e6d1f8e.
Breaks tests, see https://github.com/llvm/llvm-project/pull/95394#issuecomment-2169394503
2024-06-15 12:33:13 -04:00
Matt Arsenault
5021e6dd54
AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394)
Unlike the existing fadd cases, choose to ignore the requirement for
amdgpu-unsafe-fp-atomics in case of fine-grained memory access. This
is to minimize migration pain to the new atomic control metadata. This
should not break any users, as the atomic intrinsics are still
directly consumed, and clang does not yet produce vector FP atomicrmw.
2024-06-15 09:58:12 +02:00
Matt Arsenault
0a9a5f989f
AMDGPU: Legalize atomicrmw fadd for v2f16/v2bf16 for local memory (#95393)
Make this legal for gfx940 and gfx12
2024-06-15 09:55:04 +02:00
Matt Arsenault
f3afdc4ad9
AtomicExpand: Fix creating invalid ptrmask for fat pointers (#94955)
The ptrmask intrinsic requires the integer mask to be the index size,
not the pointer size.
2024-06-12 10:45:42 +02:00
Matt Arsenault
a2bc50aa8b AMDGPU: Add more tests for vector typed atomicrmw fadd
Some cases should be legal for gfx940.
2024-06-11 14:44:28 +02:00
Matt Arsenault
d81170873c
AtomicExpand: Preserve metadata when expanding partword RMW (#89769)
This will be important for AMDGPU in a future patch.
2024-05-23 10:04:47 +02:00
wanglei
9d4f7f44b6 [test][LoongArch] Add -mattr=+d option. NFC
Because most of tests assume target-abi=`lp64d`, adding the
corresponding feature is reasonable.

rg -l loongarch -g '!*.s' | xargs sed -i '/mtriple=loongarch/ {/-mattr=/!{/target-abi/! s/mtriple=loongarch.. /&-mattr=+d /}}'
2024-05-14 20:23:04 +08:00
Matt Arsenault
82bb2534d4
AMDGPU: Don't bitcast float typed atomic store in IR (#90116)
Implement the promotion in the DAG.

Depends #90113
2024-05-07 21:43:22 +02:00
Matt Arsenault
7927bcdb8a
AMDGPU: Do not bitcast atomicrmw in IR (#90045)
This is the first step to eliminating shouldCastAtomicRMWIInIR. This and
the other atomic expand casting hooks should be removed. This adds
duplicate legalization machinery and interfaces. This is already what
codegen is supposed to do, and already does for the promotion case.

In the case of atomicrmw xchg, there seems to be some benefit to having
the bitcasts moved outside of the cmpxchg loop on targets with separate
int and FP registers, which we should be able to deal with by directly
checking for the legality of the underlying operation.

The casting path was also losing metadata when it recreated the
instruction.
2024-05-07 18:26:32 +02:00
Matt Arsenault
4e67b5058e AMDGPU: Add more tests for atomicrmw handling
Add agent scope copies of atomicrmw atomics tests.
Expand testing for the undo identity atomicrmw case.
Test 16-bit atomic expansions.
2024-05-03 11:50:59 +02:00
Matt Arsenault
9f9856d623 AMDGPU: Update name for amdgpu.no.remote.memory metadata 2024-05-03 11:50:59 +02:00
Matt Arsenault
f1112ebe07
AMDGPU: Do not bitcast atomic load in IR (#90060)
These hooks should be removed. This is a trivial legalization transform
the legalizer needs to support. The IR just complicates things, and it
was losing metadata. Implement the DAG promotion support, and switch
AMDGPU over to using it.

Really we'd be a lot better off merging ATOMIC_LOAD and LOAD like
GlobalISel does.
2024-04-26 12:20:40 +02:00
Matt Arsenault
76a3be7c76 AMDGPU: Add baseline tests for bad bitcasting of atomic load/store 2024-04-25 16:08:11 +02:00
Matt Arsenault
a45eb62877 AtomicExpand: Fix dropping a syncscope when bitcasting atomicrmw 2024-04-24 19:09:34 +02:00
Pierre van Houtryve
cf328ff96d
[IR] Memory Model Relaxation Annotations (#78569)
Implements the core/target-agnostic components of Memory Model
Relaxation Annotations.

RFC:
https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5
2024-04-24 08:52:25 +02:00
Matt Arsenault
31af5e9001 AtomicExpand: Emit or with constant on RHS
This will save later code from commuting it.
2024-04-23 15:00:31 +02:00
Matt Arsenault
5b6db43f29
AMDGPU: Simplify DS atomicrmw fadd handling (#89468)
DS atomic fadd F32 does respect the denormal mode, so we do not need to
consider the expected FP mode or unsafe-fp-atomics attribute. They don't
respect the rounding mode, but we don't care outside of strictfp. This
also reveals the fp-mode-is-flush check has been missing in the cases
that should be considering it alongside amdgpu-unsafe-fp-atomics.

This also stops considering the case where flushing is enabled for f64,
as flushing isn't mandated and we barely handle this case.
2024-04-22 12:22:54 +02:00
Matt Arsenault
f433c3b380
AMDGPU: Add tests for atomicrmw handling of new metadata (#89248)
Add baseline tests which should comprehensively test the new atomic
metadata. Test codegen / expansion, and preservation in a few
transforms.

New metadata defined in #85052
2024-04-20 00:43:36 +02:00
Matt Arsenault
c8db069253 AMDGPU: Use common check prefix in atomic expand test 2024-04-19 15:46:01 +02:00
Matt Arsenault
db2f64ee1f AMDGPU: Fix not handling atomicrmw fadd in exotic address spaces correctly
We try to interpret unknown address space numbers as aliases of global,
but this wasn't applied here. Also improve test coverage for the
buffer fat pointer address space.
2024-04-17 21:39:26 +02:00
Matt Arsenault
9bd10853e5
AMDGPU: Undo atomicrmw add/sub/xor 0 -> atomicrmw or canonicalization (#87533)
InstCombine transforms add of 0 to or of 0. For system atomics, this is
problematic because while PCIe supports add, it does not support the
other operations. Undo this for system scope atomics.
2024-04-13 00:24:12 +02:00
Matt Arsenault
4cb110a84f
[RFC] IR: Support atomicrmw FP ops with vector types (#86796)
Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of
floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x
bfloat> on some targets and address spaces.

Note this only supports the proper floating-point operations; float
vector typed xchg is still not supported. cmpxchg still only supports
integers, so this inserts bitcasts for the loop expansion.

I have support for fp vector typed xchg, and vector of int/ptr
separately implemented but I don't have an immediate need for those
beyond feature consistency.
2024-04-06 15:27:45 -04:00
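The loop expansion mentioned in 4cb110a84f (cmpxchg only handles integers, so the FP value is bitcast around it) can be illustrated on the host with `std::atomic`. This is a minimal sketch for a scalar `float`, not the IR that AtomicExpand produces; `float_to_bits`, `bits_to_float`, and `atomic_fadd_via_cmpxchg` are hypothetical helper names introduced here.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// "Bitcast" helpers: reinterpret a float as a 32-bit integer and back.
static std::uint32_t float_to_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

static float bits_to_float(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Emulate an FP add RMW using only an integer compare-exchange.
// Returns the old value, as atomicrmw does.
float atomic_fadd_via_cmpxchg(std::atomic<std::uint32_t> &cell, float operand) {
  // Initial load of the current bit pattern.
  std::uint32_t expected = cell.load(std::memory_order_relaxed);
  for (;;) {
    float old_val = bits_to_float(expected);                   // int -> float
    std::uint32_t desired = float_to_bits(old_val + operand);  // float -> int
    // On failure, compare_exchange_weak refreshes `expected` with the value
    // currently in memory, and the loop recomputes the sum from it.
    if (cell.compare_exchange_weak(expected, desired,
                                   std::memory_order_seq_cst,
                                   std::memory_order_relaxed))
      return old_val;
  }
}
```

The comparison is done on the integer bit pattern rather than on the floating-point value, which is what lets an integer-only cmpxchg drive the loop.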
Kevin P. Neal
fe893c93b7
[FPEnv][AtomicExpand] Correct strictfp attribute handling in AtomicExpandPass (#87082)
The AtomicExpand pass was lowering function calls with the strictfp
attribute to sequences that included function calls incorrectly lacking
the attribute. This patch corrects that.

The pass now also emits the correct constrained fp call instead of
normal FP instructions when in a function with the strictfp attribute.

Test changes verified with D146845.
2024-03-29 14:54:51 -04:00
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
James Y Knight
137f785fa6
[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185)
This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass, which matches what Clang
already does in the frontend.

While AMDGPU currently disables the use of all libcalls, I've changed it
to instead disable all of them _except_ the atomic ones. Those are
already emitted by the Clang frontend, and enabling them in the
backend allows the same behavior there.
2023-12-18 16:51:06 -05:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible is to differentiate a missing
datalayout from `target datalayout = ""` in the IR file, since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Alex Richardson
e86d6a43f0 Regenerate test checks for tests affected by D141060 2023-10-04 10:51:35 -07:00
Pravin Jagtap
5f8fd68672 [AMDGPU] Pre-commit test for D157495
Reviewed By: yassingh

Differential Revision: https://reviews.llvm.org/D158243
2023-08-18 06:52:32 -04:00
Matt Arsenault
7575ee7167 AMDGPU: Add more test coverage for FP-typed atomicrmw xchg 2023-08-10 17:38:25 -04:00
Matt Arsenault
35be9e2903 AtomicExpand: Preserve syncscope when expanding partword atomics 2023-08-08 14:38:06 -04:00
Matt Arsenault
3371849194 AMDGPU: Round out system atomics tests
There were system scope tests only for integer min/max. Expand this to
cover all of the integer operations.
2023-08-08 14:38:05 -04:00
Matt Arsenault
b97e9a9a03 AMDGPU: Fix some typed pointers in atomic expand test 2023-08-08 14:38:05 -04:00
Kai Luo
f26af16e2c [PowerPC][AIX] Enable quadword atomics by default for AIX
On AIX, a libatomic supporting inline quadword atomic operations has been released, so compatibility is no longer an issue and we can enable quadword atomics by default.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D151312
2023-07-25 08:21:07 +08:00