PR #149247 made the MD accessible to the backend, so we can now leverage
it in the memory model. The first use case is detecting whether a flat op
can access scratch memory.
This benefits both the MemoryLegalizer and InsertWaitCnt.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to mark only a prefix of an alloca alive/dead.
We never used that capability, so removing it eliminates the need to
handle that possibility everywhere (though many key places, including
stack coloring, did not actually respect it anyway).
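A minimal before/after sketch in LLVM IR (the %buf name is made up for
illustration):

    ; Before: an explicit size argument, which in principle allowed
    ; marking only a prefix of the alloca alive/dead.
    %buf = alloca [16 x i8]
    call void @llvm.lifetime.start.p0(i64 16, ptr %buf)
    call void @llvm.lifetime.end.p0(i64 16, ptr %buf)

    ; After: no size argument; the size is implied by the alloca.
    call void @llvm.lifetime.start.p0(ptr %buf)
    call void @llvm.lifetime.end.p0(ptr %buf)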
Implement the Xtensa S32C1I option. Implement the atomic_cmp_swap_32
operation using the s32c1i instruction, and use the atomic_cmp_swap_32
operation together with the AtomicExpand pass to implement atomic operations.
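Roughly, AtomicExpand rewrites the remaining read-modify-write operations
into a compare-exchange loop built on that node; a sketch in LLVM IR, with
hypothetical block and value names (shown for atomicrmw add):

    %init = load i32, ptr %p
    br label %loop
  loop:
    %old = phi i32 [ %init, %entry ], [ %prev, %loop ]
    %new = add i32 %old, %val            ; the desired RMW operation
    %pair = cmpxchg ptr %p, i32 %old, i32 %new seq_cst seq_cst
    %prev = extractvalue { i32, i1 } %pair, 0
    %ok = extractvalue { i32, i1 } %pair, 1
    br i1 %ok, label %done, label %loop
  done: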
Emit a context error and delete the instruction. This
allows removing the AMDGPU hack where some atomic libcalls
are falsely added. NVPTX also later copied the same hack,
so remove it there too.
For now just emit the generic error, which is not good. It's
missing any useful context information (despite taking the instruction).
It's also confusing in the failed atomicrmw case, since it's reporting
failure at the intermediate failed cmpxchg instead of the original
atomicrmw.
In PowerPC, the AtomicCmpXchgInst is lowered to
ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS. However, this node does not handle
the weak attribute of AtomicCmpXchgInst. As a result, when compiling C++
atomic_compare_exchange_weak_explicit, the generated assembly includes a
"reservation lost" loop — i.e., it branches back and retries if the
stwcx. (store-conditional) fails. This differs from GCC’s codegen, which
does not include that loop for weak compare-exchange.
Since PowerPC uses LL/SC-style atomic instructions, the patch enables
AtomicExpandImpl::expandAtomicCmpXchg for PowerPC. With this, the weak
attribute is properly respected, and the "reservation lost" loop is
removed for weak operations.
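For illustration, a weak cmpxchg in LLVM IR (operand names hypothetical)
is allowed to fail spuriously, so the expanded LL/SC sequence does not
need to loop when the reservation is lost:

    %res = cmpxchg weak ptr %p, i64 %expected, i64 %desired acquire monotonic
    %success = extractvalue { i64, i1 } %res, 1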
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.
These mirror the `llvm.maximum.*` and `llvm.minimum.*` intrinsics, but
are atomic and use IEEE 754-2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.
Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
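For example (a sketch; float shown, other FP types are analogous):

    %old0 = atomicrmw fmaximum ptr %p, float %val seq_cst
    %old1 = atomicrmw fminimum ptr %p, float %val seq_cst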
Since e39f6c1844fab59c638d8059a6cf139adb42279a, opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present, so that the
string cannot fall out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated to ensure that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.
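A hypothetical before/after RUN line (the datalayout string and pass name
here are placeholders, not copied from any real test):

    ; Before: the layout string had to be kept in sync by hand
    ; RUN: opt -mtriple=amdgcn-- -data-layout="A5-G1-..." -passes=instcombine -S %s
    ; After: the layout is inferred from the triple
    ; RUN: opt -mtriple=amdgcn-- -passes=instcombine -S %s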
Reviewed By: shiltian, arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/137921
[AMDGPU][NFC] Replace gfx940 and gfx941 with gfx942 in llvm/test
gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base.
This PR uses gfx942 instead of gfx940 and gfx941 in the test RUN-lines (unless there is already a RUN-line for gfx942).
The only notable difference in the test output is that gfx942 does not force the use of sc0 and sc1 on stores while gfx940 and gfx941 do (cf. https://reviews.llvm.org/D149986).
For SWDEV-512631
64-bit flat cmpxchg instructions do not work correctly for scratch
addresses, and need to be expanded as non-atomic.
Allow custom expansion of cmpxchg in AtomicExpand, as is
already the case for atomicrmw.
When expanding an atomicrmw with a cmpxchg, preserve any metadata
attached to it. This will avoid unwanted double expansions
in a future commit.
The initial load should also probably receive the same metadata
(which for some reason is not emitted as an atomic).
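For instance (a sketch; the metadata node contents are illustrative),
metadata such as !noalias.addrspace attached to the original atomicrmw
would be carried over onto the cmpxchg that replaces it:

    %pair = cmpxchg ptr %p, i64 %old, i64 %new seq_cst seq_cst, !noalias.addrspace !0

    !0 = !{i32 5, i32 6}   ; access known not to be in address space 5 (private/scratch)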
If the runtime flat address resolves to a scratch address,
64-bit atomics do not work correctly. Insert a runtime address
space check (which is quite likely to be uniform) and select between
the non-atomic and real atomic cases.
Consider noalias.addrspace metadata and avoid this expansion when
possible (we also need to consider it to avoid infinitely expanding
after adding the predication code).
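A hedged sketch of the predicated expansion, using llvm.amdgcn.is.private
(block and value names hypothetical):

    %is.priv = call i1 @llvm.amdgcn.is.private(ptr %flat)
    br i1 %is.priv, label %private, label %global
  private:                   ; scratch: perform the operation non-atomically
    %ld = load i64, ptr %flat
    %eq = icmp eq i64 %ld, %cmp
    %st = select i1 %eq, i64 %new, i64 %ld
    store i64 %st, ptr %flat
    br label %done
  global:                    ; anything else: the real 64-bit atomic works
    %pair = cmpxchg ptr %flat, i64 %cmp, i64 %new seq_cst seq_cst
    %val = extractvalue { i64, i1 } %pair, 0
    br label %done
  done:
    %old = phi i64 [ %ld, %private ], [ %val, %global ]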
This is the most complex atomicrmw support case. Note we don't have
accurate remarks for all of the cases, which I'm planning on fixing
in a later change with more precise wording.
Continue respecting amdgpu-unsafe-fp-atomics until its eventual removal.
This also seems to fix a few cases that were not interpreting
amdgpu-unsafe-fp-atomics as aggressively as intended.
Define subtarget features for atomic fmin/fmax support.
The flat/global support is a real mess. We had float/double support at
the beginning, in gfx6 and gfx7; gfx8 removed it, and gfx10 reintroduced it.
gfx11 removed the f64 versions again.
gfx9 partially reintroduced it, in gfx90a and gfx940, but only for f64.
Unlike the existing fadd cases, choose to ignore the requirement for
amdgpu-unsafe-fp-atomics in case of fine-grained memory access. This
is to minimize migration pain to the new atomic control metadata. This
should not break any users, as the atomic intrinsics are still
directly consumed, and clang does not yet produce vector FP atomicrmw.
Because most of the tests assume target-abi=`lp64d`, adding the
corresponding feature is reasonable. The update was scripted: for every
RUN line with an mtriple=loongarch triple that specifies neither -mattr=
nor target-abi, insert -mattr=+d after the triple:
rg -l loongarch -g '!*.s' | xargs sed -i '/mtriple=loongarch/ {/-mattr=/!{/target-abi/! s/mtriple=loongarch.. /&-mattr=+d /}}'
This is the first step toward eliminating shouldCastAtomicRMWIInIR. This
and the other atomic expand casting hooks should be removed: they add
duplicate legalization machinery and interfaces for work that codegen is
already supposed to do, and already does for the promotion case.
In the case of atomicrmw xchg, there seems to be some benefit to having
the bitcasts moved outside of the cmpxchg loop on targets with separate
int and FP registers, which we should be able to deal with by directly
checking for the legality of the underlying operation.
The casting path was also losing metadata when it recreated the
instruction.
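For reference, the IR casting path for an FP xchg looks roughly like this
(a sketch):

    ; original
    %old = atomicrmw xchg ptr %p, float %v seq_cst
    ; after casting through integer
    %iv = bitcast float %v to i32
    %io = atomicrmw xchg ptr %p, i32 %iv seq_cst
    %oldf = bitcast i32 %io to float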
These hooks should be removed. This is a trivial legalization transform
the legalizer needs to support. The IR just complicates things, and it
was losing metadata. Implement the DAG promotion support, and switch
AMDGPU over to using it.
Really we'd be a lot better off merging ATOMIC_LOAD and LOAD like
GlobalISel does.