160 Commits

Author SHA1 Message Date
Pierre van Houtryve
6f7c77fe90
[AMDGPU] Check noalias.addrspace in mayAccessScratchThroughFlat (#151319)
PR #149247 made the MD accessible by the backend so we can now leverage
it in the memory model. The first use case here is detecting if a flat op
can access scratch memory.
Benefits both the MemoryLegalizer and InsertWaitCnt.
2025-08-19 07:42:59 +02:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Andrei Safronov
3ffaaf6db6
[Xtensa] Implement Xtensa S32C1I Option and atomics lowering. (#137134)
Implement Xtensa S32C1I Option. Implement atomic_cmp_swap_32 operation
using s32c1i instruction. Use atomic_cmp_swap_32 operation and AtomicExpand
pass to implement atomics operations.
2025-08-06 17:43:27 +03:00
Matt Arsenault
ed1ee9a9bf
AtomicExpand: Stop using report_fatal_error (#147300)
Emit a context error and delete the instruction. This
allows removing the AMDGPU hack where some atomic libcalls
are falsely added. NVPTX also later copied the same hack,
so remove it there too.

For now just emit the generic error, which is not good. It's
missing any useful context information (despite taking the instruction).
It's also confusing in the failed atomicrmw case, since it's reporting
failure at the intermediate failed cmpxchg instead of the original
atomicrmw.
2025-07-09 15:28:10 +09:00
Matt Arsenault
67076dd79f
AMDGPU: Fix atomic expand tests accidentally underaligning (#147299) 2025-07-08 23:58:51 +09:00
zhijian lin
85a9f2e148
[PowerPC] enable AtomicExpandImpl::expandAtomicCmpXchg for powerpc (#142395)
In PowerPC, the AtomicCmpXchgInst is lowered to
ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS. However, this node does not handle
the weak attribute of AtomicCmpXchgInst. As a result, when compiling C++
atomic_compare_exchange_weak_explicit, the generated assembly includes a
"reservation lost" loop — i.e., it branches back and retries if the
stwcx. (store-conditional) fails. This differs from GCC’s codegen, which
does not include that loop for weak compare-exchange.

Since PowerPC uses LL/SC-style atomic instructions, the patch enables
AtomicExpandImpl::expandAtomicCmpXchg for PowerPC. With this, the weak
attribute is properly respected, and the "reservation lost" loop is
removed for weak operations.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-13 09:14:48 -04:00
Shilei Tian
8cd5604f59
[AMDGPU][AtomicExpand] Use full flat emulation if a target supports f64 global atomic add instruction (#142859)
If a target supports f64 global atomic add instruction, we can also use
full flat emulation.
2025-06-05 00:45:42 -04:00
Alexander Richardson
2d2d753e01
[AtomicExpand] Drop explicit datalayout from test
Also remove the R600 checks this test as well as a duplicate RUN line.

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137923
2025-05-09 19:43:26 -07:00
Jonathan Thackray
6e49f73825
Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-30 22:06:37 +01:00
Alexander Richardson
ee13638362
[AMDGPU] Remove explicit datalayout from tests where not needed
Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present to avoid the
string becoming out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated to ensure that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.

Reviewed By: shiltian, arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137921
2025-04-30 10:58:17 -07:00
Jonathan Thackray
7ee0097b48
Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657)
Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4
2025-04-28 16:53:36 +01:00
Jonathan Thackray
ba420d8122
[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-28 15:31:44 +01:00
Fabian Ritter
a33a84ee63
[AMDGPU][NFC] Replace gfx940 and gfx941 with gfx942 in llvm/test (#125711)
[AMDGPU][NFC] Replace gfx940 and gfx941 with gfx942 in llvm/test

gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base.

This PR uses gfx942 instead of gfx940 and gfx941 in the test RUN-lines (unless there is already a RUN-line for gfx942).

The only notable difference in the test output is that gfx942 does not force the use of sc0 and sc1 on stores while gfx940 and gfx941 do (cf. https://reviews.llvm.org/D149986).

For SWDEV-512631
2025-02-13 15:17:12 +01:00
Lei Huang
a13ec9cd54
[PowerPC] Update data layout aligment of i128 to 16 (#118004)
Fix 64-bit PowerPC part of
https://github.com/llvm/llvm-project/issues/102783.
2024-12-09 18:02:24 -05:00
tangaac
427be07675
[LoongArch] Support amcas[_db].{b/h/w/d} instructions. (#114189)
Two options for clang: -mlamcas & -mno-lamcas.
Enable or disable amcas[_db].{b/h} instructions.
The default is -mno-lamcas.
Only works on LoongArch64.
2024-11-27 17:36:13 +08:00
Matt Arsenault
04d450fd8d
AtomicExpand: Preserve metadata when bitcasting fp atomicrmw xchg (#115240) 2024-11-13 12:51:18 -08:00
Matt Arsenault
30dd1297fa
AMDGPU: Custom expand flat cmpxchg which may access private (#109410)
64-bit flat cmpxchg instructions do not work correctly for scratch
addresses, and need to be expanded as non-atomic.

Allow custom expansion of cmpxchg in AtomicExpand, as is
already the case for atomicrmw.
2024-11-04 09:29:38 -08:00
Matt Arsenault
9cc298108a
AtomicExpand: Copy metadata from atomicrmw to cmpxchg (#109409)
When expanding an atomicrmw with a cmpxchg, preserve any metadata
attached to it. This will avoid unwanted double expansions
in a future commit.

The initial load should also probably receive the same metadata
(which for some reason is not emitted as an atomic).
2024-10-31 11:54:07 -07:00
Matt Arsenault
e3222e6f80
AMDGPU: Add baseline tests for cmpxchg custom expansion (#109408)
We need a non-atomic path if flat may access private.
2024-10-31 11:46:13 -07:00
Matt Arsenault
1d0370872f
AMDGPU: Expand flat atomics that may access private memory (#109407)
If the runtime flat address resolves to a scratch address,
64-bit atomics do not work correctly. Insert a runtime address
space check (which is quite likely to be uniform) and select between
the non-atomic and real atomic cases.

Consider noalias.addrspace metadata and avoid this expansion when
possible (we also need to consider it to avoid infinitely expanding
after adding the predication code).
2024-10-31 08:08:48 -07:00
Matt Arsenault
b0a25468fa
AMDGPU: Add baseline tests for flat-may-alias private atomic expansions (#109406) 2024-10-15 22:29:24 +04:00
Matt Arsenault
0edd07770f
AMDGPU: Preserve alignment when custom expanding atomicrmw (#103768) 2024-08-14 17:16:59 +04:00
Matt Arsenault
edded8d7b5
AMDGPU: Stop handling legacy amdgpu-unsafe-fp-atomics attribute (#101699)
This is now autoupgraded to annotate atomicrmw instructions in
old bitcode.
2024-08-13 22:02:25 +04:00
Matt Arsenault
80c51fad3b
AtomicExpand: Regenerate baseline checks (#103063) 2024-08-13 20:43:39 +04:00
Matt Arsenault
1ae507d109
AMDGPU: Do not create phi user for atomicrmw with no uses (#103061) 2024-08-13 19:24:52 +04:00
Matt Arsenault
42b5540211
AMDGPU: Preserve atomicrmw name when specializing address space (#102470) 2024-08-09 00:43:04 +04:00
Matt Arsenault
bb7143f666
AMDGPU: Avoid creating unnecessary block split in atomic expansion (#102440)
This was creating a new block to insert the is.shared check, but we
can just do that in the original block.
2024-08-09 00:39:12 +04:00
Matt Arsenault
dfda9c5b9e
AMDGPU: Handle new atomicrmw metadata for fadd case (#96760)
This is the most complex atomicrmw support case. Note we don't have
accurate remarks for all of the cases, which I'm planning on fixing
in a later change with more precise wording.

Continue respecting amdgpu-unsafe-fp-atomics until it's eventual removal.
Also seems to fix a few cases not interpreting amdgpu-unsafe-fp-atomics
appropriately aaggressively.
2024-08-02 19:41:33 +04:00
Matt Arsenault
41439d5bb7
AMDGPU: Handle remote/fine-grained memory in atomicrmw fmin/fmax lowering (#96759)
Consider the new atomic metadata when choosing to expand as cmpxchg
instead.
2024-08-01 22:08:01 +04:00
Matt Arsenault
a2a73d892a
AMDGPU: Fix no return atomicrmw fadd v2f16 selection for gfx908 (#96948)
We previously would always expand this with a cmpxchg loop, while
it should be the same conditions as the f32 case (except for the
denormal concern).
2024-06-27 21:17:16 +02:00
Matt Arsenault
a440a96ec2
AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (#95592)
Define subtarget features for atomic fmin/fmax support.

The flat/global support is a real messe. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.

gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.
2024-06-23 10:10:41 +02:00
Matt Arsenault
8520061281
AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590)
This has always been supported. Somehow, we ended up with 2
copies of clang builtins for this case, and the newer one
erroneously requires gfx8-insts.
2024-06-18 18:34:34 +02:00
Matt Arsenault
cf5ce8cdf1 AMDGPU: Add some tests for i128 and fp128 atomic expansion
These produce garbage libcalls, so the result is not useful but
this at least shows we don't assert.
2024-06-18 10:33:25 +02:00
Matt Arsenault
4cf1a19b7e Reapply "AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394)"
This reverts commit 95b77d90aae10725ea692e120aac083ef1c1297d.
2024-06-17 16:34:35 +02:00
Nico Weber
95b77d90aa Revert "AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394)"
This reverts commit 5021e6dd548323e1169be3d466d440009e6d1f8e.
Breaks tests, see https://github.com/llvm/llvm-project/pull/95394#issuecomment-2169394503
2024-06-15 12:33:13 -04:00
Matt Arsenault
5021e6dd54
AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394)
Unlike the existing fadd cases, choose to ignore the requirement for
amdgpu-unsafe-fp-atomics in case of fine-grained memory access. This
is to minimize migration pain to the new atomic control metadata. This
should not break any users, as the atomic intrinsics are still
directly consumed, and clang does not yet produce vector FP atomicrmw.
2024-06-15 09:58:12 +02:00
Matt Arsenault
0a9a5f989f
AMDGPU: Legalize atomicrmw fadd for v2f16/v2bf16 for local memory (#95393)
Make this legal for gfx940 and gfx12
2024-06-15 09:55:04 +02:00
Matt Arsenault
f3afdc4ad9
AtomicExpand: Fix creating invalid ptrmask for fat pointers (#94955)
The ptrmask intrinsic requires the integer mask to be the index size,
not the pointer size.
2024-06-12 10:45:42 +02:00
Matt Arsenault
a2bc50aa8b AMDGPU: Add more tests for vector typed atomicrmw fadd
Some cases should be legal for gfx940.
2024-06-11 14:44:28 +02:00
Matt Arsenault
d81170873c
AtomicExpand: Preserve metadata when expanding partword RMW (#89769)
This will be important for AMDGPU in a future patch.
2024-05-23 10:04:47 +02:00
wanglei
9d4f7f44b6 [test][LoongArch] Add -mattr=+d option. NFC
Because most of tests assume target-abi=`lp64d`, adding the
corresponding feature is reasonable.

rg -l loongarch -g '!*.s' | xargs sed -i '/mtriple=loongarch/ {/-mattr=/!{/target-abi/! s/mtriple=loongarch.. /&-mattr=+d /}}'
2024-05-14 20:23:04 +08:00
Matt Arsenault
82bb2534d4
AMDGPU: Don't bitcast float typed atomic store in IR (#90116)
Implement the promotion in the DAG.

Depends #90113
2024-05-07 21:43:22 +02:00
Matt Arsenault
7927bcdb8a
AMDGPU: Do not bitcast atomicrmw in IR (#90045)
This is the first step to eliminating shouldCastAtomicRMWIInIR. This and
the other atomic expand casting hooks should be removed. This adds
duplicate legalization machinery and interfaces. This is already what
codegen is supposed to do, and already does for the promotion case.

In the case of atomicrmw xchg, there seems to be some benefit to having
the bitcasts moved outside of the cmpxchg loop on targets with separate
int and FP registers, which we should be able to deal with by directly
checking for the legality of the underlying operation.

The casting path was also losing metadata when it recreated the
instruction.
2024-05-07 18:26:32 +02:00
Matt Arsenault
4e67b5058e AMDGPU: Add more tests for atomicrmw handling
Add agent scope copies of atomicrmw atomics tests.
Expand testing for the undo identity atomicrmw case.
Test 16-bit atomic expansions.
2024-05-03 11:50:59 +02:00
Matt Arsenault
9f9856d623 AMDGPU: Update name for amdgpu.no.remote.memory metadata 2024-05-03 11:50:59 +02:00
Matt Arsenault
f1112ebe07
AMDGPU: Do not bitcast atomic load in IR (#90060)
These hooks should be removed. This is a trivial legalization transform
the legalizer needs to support. The IR just complicates things, and it
was losing metadata. Implement the DAG promotion support, and switch
AMDGPU over to using it.

Really we'd be a lot better off merging ATOMIC_LOAD and LOAD like
GlobalISel does.
2024-04-26 12:20:40 +02:00
Matt Arsenault
76a3be7c76 AMDGPU: Add baseline tests for bad bitcasting of atomic load/store 2024-04-25 16:08:11 +02:00
Matt Arsenault
a45eb62877 AtomicExpand: Fix dropping a syncscope when bitcasting atomicrmw 2024-04-24 19:09:34 +02:00
Pierre van Houtryve
cf328ff96d
[IR] Memory Model Relaxation Annotations (#78569)
Implements the core/target-agnostic components of Memory Model
Relaxation Annotations.

RFC:
https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5
2024-04-24 08:52:25 +02:00
Matt Arsenault
31af5e9001 AtomicExpand: Emit or with constant on RHS
This will save later code from commuting it.
2024-04-23 15:00:31 +02:00