llvm-project

Author	SHA1	Message	Date
Shilei Tian	ef715849d7	[NFC][AMDGPU] Add some debug prints to SIMemoryLegalizer (#190658 )	2026-04-06 17:17:33 -04:00
vporpo	40ea2f3513	[MIR][MachineInstr] Update MachineInstr::eraseFromParent() to return an iterator (#179787 ) Unlike LLVM IR `Instruction::eraseFromParent()`, `MachineInstr::eraseFromParent()` is void and does not return the iterator following the erased instruction. Returning an iterator can be very helpful for example when we are erasing MachineInstrs while iterating, as it provides a convenient way to get a valid iterator. This patch updates `MachineInstr::eraseFromParent()` to return a `MachineBlock::iterator` (which is a `MachineInstrBundleIterator<MachineInstr>`). If the erased instruction is the head of a bundle, then the returned iterator points to the next bundle (see unittest).	2026-03-13 13:39:13 -07:00
Pierre van Houtryve	b738491d2f	[AMDGPU][GFX12.5] Add support for emitting memory operations with nv bit set (#179413 ) - Add `MONonVolatile` MachineMemOperand flag. - Set nv=1 on memory operations on GFX12.5 if the operation accesses a constant address space, is an invariant load, or has the `MONonVolatile` flag set.	2026-02-06 11:35:46 +01:00
Mariusz Sikora	3c0f5045e1	[AMDGPU] Add FeatureGFX13 and SMEM encoding for gfx13 (#177567 ) For now, the list of features is based on gfx12 and gfx1250. --------- Co-authored-by: Jay Foad <jay.foad@amd.com>	2026-01-26 14:16:36 +01:00
Stanislav Mekhanoshin	dd947ebcf3	[AMDGPU] Update gfx1250 memory model for global acquire/release (#175865 ) Inserts required waits around GLOBAL_INV/GLOBAL_WBINV for agent scope and above.	2026-01-15 03:25:03 -08:00
LU-JOHN	49381c3000	[NFC][AMDGPU] Declare variables initialized with getDebugLoc as const ref (#174434 ) Declare variables initialized with getDebugLoc as a const reference. Signed-off-by: John Lu <John.Lu@amd.com>	2026-01-05 12:37:47 -06:00
Pierre van Houtryve	a086fb2fbb	[AMDGPU][gfx1250] Add wait_xcnt before any access that cannot be repeated (#168852 ) The xcnt wait is actually required before any memory access that can only be done once, so atomic stores and volatile accesses are affected. This patch also ensures buffer instructions are handled.	2025-11-25 10:11:04 +01:00
Pierre van Houtryve	20795e06ed	[AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (#168058 ) Also breaks the long inheritance chains by making both `SIGfx10CacheControl` and `SIGfx12CacheControl` inherit from `SICacheControl` directly. With this patch, we now just have 3 `SICacheControl` implementations that each do their own thing, and there is no more code hidden 3 superclasses above (which made this code harder to read and maintain than it needed to be).	2025-11-18 10:12:56 +01:00
Pierre van Houtryve	b07bfdb6fb	[AMDGPU][SIMemoryLegalizer] Combine all GFX6-9 CacheControl Classes (#168052 ) Merge the following classes into `SIGfx6CacheControl`: - SIGfx7CacheControl - SIGfx90ACacheControl - SIGfx940CacheControl They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same. This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions. There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.	2025-11-17 10:07:20 +01:00
Jay Foad	72c69aefba	[AMDGPU] Make use of getFunction and getMF. NFC. (#167872 )	2025-11-14 11:00:57 +00:00
Jay Foad	60f20ea465	[AMDGPU] Add target feature for waits before system scope stores. NFC. (#164993 )	2025-10-27 10:31:37 +00:00
Kazu Hirata	84857775b7	[Target] Add "override" where appropriate (NFC) (#165083 ) Note that "override" makes "virtual" redundant. Identified with modernize-use-override.	2025-10-25 06:23:43 -07:00
Pierre van Houtryve	07d47c792b	[AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (#161638 ) They were previously optimized to not emit any waitcnt, which is technically correct because there is no reordering of operations at workgroup scope in CU mode for GFX10+. This breaks transitivity however, for example if we have the following sequence of events in one thread: - some stores - store atomic release syncscope("workgroup") - barrier then another thread follows with - barrier - load atomic acquire - store atomic release syncscope("agent") It does not work because, while the other thread sees the stores, it cannot release them at the wider scope. Our release fences aren't strong enough to "wait" on stores from other waves. We also cannot strengthen our release fences any further to allow for releasing other wave's stores because only GFX12 can do that with `global_wb`. GFX10-11 do not have the writeback instruction. It'd also add yet another level of complexity to code sequences, with both acquire/release having CU-mode only alternatives. Lastly, acq/rel are always used together. The price for synchronization has to be paid either at the acq, or the rel. Strengthening the releases would just make the memory model more complex but wouldn't help performance. So the choice here is to streamline the code sequences by making CU and WGP mode emit almost identical (vL0 inv is not needed in CU mode) code for release (or stronger) atomic ordering. This also removes the `vm_vsrc(0)` wait before barriers. Now that the release fence in CU mode is strong enough, it is no longer needed. Supersedes #160501 Solves SC1-6454	2025-10-21 09:23:46 +02:00
Krzysztof Drewniak	d37141776f	[AMDGPU] Enable volatile and non-temporal for loads to LDS (#153244 ) The primary purpose of this commit is to enable marking loads to LDS (global.load.lds, buffer.*.load.lds) volatile (using bit 31 of the aux as with normal buffer loads) and to ensure that their !nontemporal annotations translate to appropriate settings of the cache control bits. However, in the process of implementing this feature, we also fixed - Incorrect handling of buffer loads to LDS in GlobalISel - Updating the handling of volatile on buffers in SIMemoryLegalizer: previously, the mapping of address spaces would cause volatile on buffer loads to be silently dropped on at least gfx10. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-10-20 12:42:22 -05:00
Pierre van Houtryve	c1852afa4b	[AMDGPU][SIMemoryLegalizer][GFX12] Correctly insert sample/bvhcnt (#161637 ) The check used was not strong enough to prevent the insertion of sample/bvhcnt when they were not needed. I assume SIInsertWaitCnts was trimming those away anyway, but this was a bug nonetheless. We were inserting SAMPLE/BVHcnt waits in places where we only needed to wait on the previous atomic operation. Neither of these counters has any atomics associated with it.	2025-10-20 12:03:47 +02:00
Stanislav Mekhanoshin	1becadeebc	[AMDGPU] Update comments in memory legalizer. NFC (#160453 )	2025-09-24 10:04:06 -07:00
Fabian Ritter	825c956c64	[AMDGPU] SIMemoryLegalizer: Factor out check if memory operations can affect the global AS (#160129 ) Mostly NFC, and adds an assertion for gfx12 to ensure that no atomic scratch instructions are present in the case of GloballyAddressableScratch. This should always hold because of #154710.	2025-09-24 13:22:03 +02:00
Fabian Ritter	3f8c7e9fa3	[AMDGPU] Insert waitcnt for non-global fence release in GFX12 (#159282 ) A fence release could be followed by a barrier, so it should wait for the relevant memory accesses to complete, even if it is mmra-limited to LDS. So far, that would be skipped for non-global fence releases. Fixes SWDEV-554932.	2025-09-23 11:52:38 +02:00
Sameer Sahasrabuddhe	4b03252ad6	[NFC][AMDGPU][SIMemoryLegalizer] remove effectively empty function (#156806 ) The removed function SIGfx90ACacheControl::enableLoadCacheBypass() does not actually do anything except one assert and one unreachable.	2025-09-12 09:30:12 +00:00
Pierre van Houtryve	49a898f9b5	[AMDGPU][gfx1250] Support "cluster" syncscope (#157641 ) Defaults to "agent" for targets that do not support it. - Add documentation - Register it in MachineModuleInfo - Add MemoryLegalizer support	2025-09-10 11:41:43 +02:00
Pierre van Houtryve	d6d0f4f156	[AMDGPU][gfx1250] Remove SCOPE_SE for scratch stores (#157640 )	2025-09-10 11:03:58 +02:00
Pierre van Houtryve	dcaa29c8ed	Revert "[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588 )" (#157639 ) This reverts commit be17791f2624f22b3ed24a2539406164a379125d. This is not necessary for gfx1250 anymore.	2025-09-10 10:20:59 +02:00
Pierre van Houtryve	bed9be954d	[AMDGPU][gfx1250] Implement SIMemoryLegalizer (#154726 ) Implements the base of the MemoryLegalizer for a roughly correct GFX1250 memory model. Documentation will come later, and some remaining changes still have to be added, but this is the backbone of the model.	2025-09-10 10:18:11 +02:00
Pierre van Houtryve	e2bd10cf16	[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418 ) - Add clang built-ins + sema/codegen - Add IR Intrinsic + verifier - Add DAG/GlobalISel codegen for the intrinsics - Add lowering in SIMemoryLegalizer using a MMO flag.	2025-09-04 09:19:25 +00:00
Pierre van Houtryve	d6edc1a96f	[AMDGPU] Reenable BackOffBarrier on GFX11/12 (#155370 ) Re-enable it by adding a wait on vm_vsrc before every barrier "start" instruction in GFX10/11/12 CU mode. This is a less strong wait than what we do without BackOffBarrier, thus this shouldn't introduce any new guarantees that can be abused, instead it relaxes the guarantees we have now to the bare minimum needed to support the behavior users want (fence release + barrier works). There is an exact memory model in the works which will be documented separately.	2025-09-02 09:37:43 +02:00
Sameer Sahasrabuddhe	8f187c74b3	[AMDGPU] introduce S_WAITCNT_LDS_DIRECT in the memory legalizer (#150887 ) The new instruction represents the unknown number of waitcnts needed at a release operation to ensure that prior direct loads to LDS (formerly called LDS DMA) are completed. The instruction is replaced in SIInsertWaitcnts with a suitable value for vmcnt(). Co-authored-by: Austin Kerbow <austin.kerbow@amd.com>.	2025-07-30 11:23:28 +05:30
Pierre van Houtryve	be17791f26	[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588 ) Determines whether we can use `SCOPE_CU` stores (on by default), or whether all stores must be done at `SCOPE_SE` minimum.	2025-07-29 11:38:43 +02:00
Pierre van Houtryve	a6532c2ada	[AMDGPU][gfx12] Clean-up implementation of waits before SCOPE_SYS stores (#150587 ) We can do it all in finalizeStore if we ensure it always sees the stores. For that, I needed to fix a hidden bug where finalizeStore wouldn't see all stores because sometimes the iterator got out-of-sync and didn't point to the store anymore. This also removes the waits before volatile LDS stores which never needed it, that was a bug until now.	2025-07-28 15:38:46 +02:00
Pierre van Houtryve	2ad4e93ded	[AMDGPU][gfx1250] Use SCOPE_SE for stores that may hit scratch (#150586 )	2025-07-28 11:40:56 +02:00
Pierre van Houtryve	9c5f8ec561	[NFC][AMDGPU] Refactor handling of `amdgpu-synchronize-as` MD on fences (#148630 ) Directly plug it into the MMO instead, which is much cleaner.	2025-07-24 12:45:50 +02:00
Pierre van Houtryve	cd1b84caa8	[NFC][AMDGPU] Rename "amdgpu-as" to "amdgpu-synchronize-as" (#148627 ) "amdgpu-as" is way too vague and doesn't give enough context. We may want to support it on normal atomics too, to control the synchronized (ordered) AS. If we do that, the name has to be less vague.	2025-07-24 12:41:57 +02:00
Stanislav Mekhanoshin	958dc86026	[AMDGPU] Don't insert wait instructions that are not supported by gfx1250 (#145084 ) No tests yet, but it will allow further tests not to be polluted with these waits.	2025-06-20 12:21:45 -07:00
Justin Bogner	b7bb256703	Warn on misuse of DiagnosticInfo classes that hold Twines (#137397 ) This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.	2025-05-28 12:26:39 -07:00
Akshat Oke	c22c5643db	[AMDGPU][NPM] Port SIMemoryLegalizer to NPM (#130060 )	2025-03-12 14:30:35 +05:30
Fabian Ritter	2260d59257	[AMDGPU] Remove FeatureForceStoreSC0SC1 (#126878 ) This was only used for gfx940 and gfx941, which have since been removed. For SWDEV-512631	2025-02-19 10:26:09 +01:00
Fabian Ritter	8615f9aaff	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631	2025-02-19 10:20:48 +01:00
Pierre van Houtryve	924a64a348	[AMDGPU] Only emit SCOPE_SYS global_wb (#110636 ) global_wb with scopes lower than SCOPE_SYS is unnecessary for correctness. I was initially optimistic they would be very cheap no-ops but they can actually be quite expensive so let's avoid them.	2024-10-07 07:35:31 +02:00
Pierre van Houtryve	eaac4a2613	[AMDGPU] Document & Finalize GFX12 Memory Model (#98599 ) Documents the memory model implemented as of #98591, with some fixes/optimizations to the implementation.	2024-09-09 15:35:28 +02:00
Matt Arsenault	7b28cc0c59	AMDGPU: Query MachineModuleInfo from PM instead of MachineFunction (#99679 )	2024-07-22 08:55:39 +04:00
Jay Foad	74b87b02d2	[AMDGPU] Fix and add namespace closing comments. NFC.	2024-07-16 16:56:31 +01:00
Pierre van Houtryve	b3a446650c	[AMDGPU] Implement GFX12 Memory Model (#98591 ) - Emit GLOBAL_WB instructions - Reflect syncscope on instructions' `scope:` operand Fixes SWDEV-468508 Fixes SWDEV-470735 Fixes SWDEV-468392 Fixes SWDEV-469622	2024-07-16 10:53:06 +02:00
Pierre van Houtryve	c1ac6d2dd4	[AMDGPU] Add amdgpu-as MMRA for fences (#78572 ) Using MMRAs, allow `__builtin_amdgcn_fence` to emit fences that only target one or more address spaces, instead of fencing all address spaces at once. This is done through an `amdgpu-as` MMRA. Currently focused on OpenCL fences, but can very easily support more AS names and codegen on more than just fences.	2024-05-27 12:17:04 +02:00
Mirko Brkušanin	1fd1f4c0e1	[AMDGPU] Handle amdgpu.last.use metadata (#83816 ) Convert !amdgpu.last.use metadata into MachineMemOperand for last use and handle it in SIMemoryLegalizer similar to nontemporal and volatile.	2024-03-06 16:33:52 +01:00
Mirko Brkušanin	27ce5121ee	[AMDGPU] Fix setting nontemporal in memory legalizer (#83815 ) Iterator MI can advance in insertWait() but we need original instruction to set temporal hint. Just move it before handling volatile.	2024-03-04 15:05:31 +01:00
Petar Avramovic	3e35ba53e2	AMDGPU/GFX12: Insert waitcnts before stores with scope_sys (#82996 ) Insert waitcnts for loads and atomics before stores with system scope. Scope is a field in the instruction encoding and corresponds to the desired coherence level in the cache hierarchy. Intrinsic stores can set scope in the cache policy operand. If the volatile keyword is used on generic stores, the memory legalizer will set scope to system. Generic stores, by default, get the lowest scope level. Waitcnts are not required if it is guaranteed that memory is cached. For example, Vulkan shaders can guarantee this. TODO: implement a flag for frontends to give us a hint not to insert waits. Expecting the Vulkan flag to be implemented as a vulkan:private MMRA.	2024-02-28 16:18:04 +01:00
Pierre van Houtryve	87d7711934	[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450 ) Fixes SWDEV-443292	2024-02-13 09:07:51 +01:00
Jay Foad	ba52f06f9d	[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438 ) Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.	2024-01-18 10:47:45 +00:00
Mirko Brkušanin	7ca4473dd9	[AMDGPU] Add new cache flushing instructions for GFX12 (#76944 ) Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>	2024-01-08 14:06:58 +00:00
Pierre van Houtryve	ef067f5204	[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830 ) Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>	2023-12-15 12:33:32 +01:00
Konstantin Zhuravlyov	42bd81410e	AMDGPU: Force sc0 and sc1 on stores for gfx940 and gfx941 Differential Revision: https://reviews.llvm.org/D149986	2023-05-12 11:53:19 -04:00

122 Commits