llvm-project

Author	SHA1	Message	Date
Sameer Sahasrabuddhe	8f187c74b3	[AMDGPU] introduce S_WAITCNT_LDS_DIRECT in the memory legalizer (#150887 ) The new instruction represents the unknown number of waitcnts needed at a release operation to ensure that prior direct loads to LDS (formerly called LDS DMA) are completed. The instruction is replaced in SIInsertWaitcnts with a suitable value for vmcnt(). Co-authored-by: Austin Kerbow <austin.kerbow@amd.com>.	2025-07-30 11:23:28 +05:30
Pierre van Houtryve	be17791f26	[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588 ) Determines whether we can use `SCOPE_CU` stores (on by default), or whether all stores must be done at `SCOPE_SE` minimum.	2025-07-29 11:38:43 +02:00
Pierre van Houtryve	a6532c2ada	[AMDGPU][gfx12] Clean-up implementation of waits before SCOPE_SYS stores (#150587 ) We can do it all in finalizeStore if we ensure it always sees the stores. For that, I needed to fix a hidden bug where finalizeStore wouldn't see all stores because sometimes the iterator got out-of-sync and didn't point to the store anymore. This also removes the waits before volatile LDS stores, which never needed them; inserting them was a bug until now.	2025-07-28 15:38:46 +02:00
Pierre van Houtryve	2ad4e93ded	[AMDGPU][gfx1250] Use SCOPE_SE for stores that may hit scratch (#150586 )	2025-07-28 11:40:56 +02:00
Pierre van Houtryve	9c5f8ec561	[NFC][AMDGPU] Refactor handling of `amdgpu-synchronize-as` MD on fences (#148630 ) Directly plug it into the MMO instead, which is much cleaner.	2025-07-24 12:45:50 +02:00
Pierre van Houtryve	cd1b84caa8	[NFC][AMDGPU] Rename "amdgpu-as" to "amdgpu-synchronize-as" (#148627 ) "amdgpu-as" is way too vague and doesn't give enough context. We may want to support it on normal atomics too, to control the synchronized (ordered) AS. If we do that, the name has to be less vague.	2025-07-24 12:41:57 +02:00
Stanislav Mekhanoshin	958dc86026	[AMDGPU] Don't insert wait instructions that are not supported by gfx1250 (#145084 ) No tests yet, but it will allow further tests not to be polluted with these waits.	2025-06-20 12:21:45 -07:00
Justin Bogner	b7bb256703	Warn on misuse of DiagnosticInfo classes that hold Twines (#137397 ) This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.	2025-05-28 12:26:39 -07:00
Akshat Oke	c22c5643db	[AMDGPU][NPM] Port SIMemoryLegalizer to NPM (#130060 )	2025-03-12 14:30:35 +05:30
Fabian Ritter	2260d59257	[AMDGPU] Remove FeatureForceStoreSC0SC1 (#126878 ) This was only used for gfx940 and gfx941, which have since been removed. For SWDEV-512631	2025-02-19 10:26:09 +01:00
Fabian Ritter	8615f9aaff	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631	2025-02-19 10:20:48 +01:00
Pierre van Houtryve	924a64a348	[AMDGPU] Only emit SCOPE_SYS global_wb (#110636 ) global_wb with scopes lower than SCOPE_SYS is unnecessary for correctness. I was initially optimistic they would be very cheap no-ops but they can actually be quite expensive so let's avoid them.	2024-10-07 07:35:31 +02:00
Pierre van Houtryve	eaac4a2613	[AMDGPU] Document & Finalize GFX12 Memory Model (#98599 ) Documents the memory model implemented as of #98591, with some fixes/optimizations to the implementation.	2024-09-09 15:35:28 +02:00
Matt Arsenault	7b28cc0c59	AMDGPU: Query MachineModuleInfo from PM instead of MachineFunction (#99679 )	2024-07-22 08:55:39 +04:00
Jay Foad	74b87b02d2	[AMDGPU] Fix and add namespace closing comments. NFC.	2024-07-16 16:56:31 +01:00
Pierre van Houtryve	b3a446650c	[AMDGPU] Implement GFX12 Memory Model (#98591 ) - Emit GLOBAL_WB instructions - Reflect synscope on instructions' `scope:` operand Fixes SWDEV-468508 Fixes SWDEV-470735 Fixes SWDEV-468392 Fixes SWDEV-469622	2024-07-16 10:53:06 +02:00
Pierre van Houtryve	c1ac6d2dd4	[AMDGPU] Add amdgpu-as MMRA for fences (#78572 ) Using MMRAs, allow `builtin_amdgcn_fence` to emit fences that only target one or more address spaces, instead of fencing all address spaces at once. This is done through an `amdgpu-as` MMRA. Currently focused on OpenCL fences, but can very easily support more AS names and codegen on more than just fences.	2024-05-27 12:17:04 +02:00
Mirko Brkušanin	1fd1f4c0e1	[AMDGPU] Handle amdgpu.last.use metadata (#83816 ) Convert !amdgpu.last.use metadata into MachineMemOperand for last use and handle it in SIMemoryLegalizer similar to nontemporal and volatile.	2024-03-06 16:33:52 +01:00
Mirko Brkušanin	27ce5121ee	[AMDGPU] Fix setting nontemporal in memory legalizer (#83815 ) Iterator MI can advance in insertWait() but we need original instruction to set temporal hint. Just move it before handling volatile.	2024-03-04 15:05:31 +01:00
Petar Avramovic	3e35ba53e2	AMDGPU/GFX12: Insert waitcnts before stores with scope_sys (#82996 ) Insert waitcnts for loads and atomics before stores with system scope. Scope is a field in the instruction encoding and corresponds to the desired coherence level in the cache hierarchy. Intrinsic stores can set scope in the cache policy operand. If the volatile keyword is used on generic stores, the memory legalizer will set scope to system. Generic stores, by default, get the lowest scope level. Waitcnts are not required if it is guaranteed that memory is cached. For example, Vulkan shaders can guarantee this. TODO: implement a flag for frontends to give us a hint not to insert waits. Expecting the Vulkan flag to be implemented as a vulkan:private MMRA.	2024-02-28 16:18:04 +01:00
Pierre van Houtryve	87d7711934	[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450 ) Fixes SWDEV-443292	2024-02-13 09:07:51 +01:00
Jay Foad	ba52f06f9d	[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438 ) Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.	2024-01-18 10:47:45 +00:00
Mirko Brkušanin	7ca4473dd9	[AMDGPU] Add new cache flushing instructions for GFX12 (#76944 ) Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>	2024-01-08 14:06:58 +00:00
Pierre van Houtryve	ef067f5204	[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830 ) Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>	2023-12-15 12:33:32 +01:00
Konstantin Zhuravlyov	42bd81410e	AMDGPU: Force sc0 and sc1 on stores for gfx940 and gfx941 Differential Revision: https://reviews.llvm.org/D149986	2023-05-12 11:53:19 -04:00
Stanislav Mekhanoshin	59162e3859	[AMDGPU] Skip buffer_wbl2 before atomic fence acquire Memory models for gfx90a and gfx940 do not require buffer_wbl2 before the fence for acquire ordering, but we do insert the full release. Fixes: SWDEV-386785 Differential Revision: https://reviews.llvm.org/D145524	2023-03-08 01:24:20 -08:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Fangrui Song	21c4dc7997	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.	2022-12-17 00:42:05 +00:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Kazu Hirata	8a7cbea525	[llvm] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-08 23:22:00 -08:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Juan Manuel MARTINEZ CAAMAÑO	ee761374f7	[AMDGPU][NFC] Fix typo in comment: replace SiMemOpInfo by SIMemOpInfo	2022-09-02 16:45:10 +02:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
Kazu Hirata	7a47ee51a1	[llvm] Don't use Optional::getValue (NFC)	2022-06-20 22:45:45 -07:00
Jay Foad	b0a3849439	[AMDGPU] Update dlc usage for GFX11 In GFX10 dlc controlled L1 cache bypass. In GFX11 it has been repurposed to control MALL NOALLOC, and glc controls L1 as well as L0 cache bypass. Update the documentation and SIMemoryLegalizer accordingly. Set dlc for nontemporal and volatile accesses. Differential Revision: https://reviews.llvm.org/D127405	2022-06-10 08:10:34 +01:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Stanislav Mekhanoshin	47bac63d3f	[AMDGPU] gfx940 memory model Differential Revision: https://reviews.llvm.org/D121242	2022-03-14 15:01:46 -07:00
Stanislav Mekhanoshin	6181458662	[AMDGPU] gfx940 MUBUF format changes Differential Revision: https://reviews.llvm.org/D121234	2022-03-11 11:36:49 -08:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
Sebastian Neubauer	6527b2a4d5	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235	2022-02-18 15:05:21 +01:00
Carl Ritson	8967d044fc	[AMDGPU] Add SIMemoryLegalizer comments to clarify bit usage Attempt to further document the intended cache policies requested by different combinations of GLC, SLC and DLC bits. GFX10 non-temporal stores are updated to set GLC. Reviewed By: t-tye Differential Revision: https://reviews.llvm.org/D114351	2021-11-26 21:05:58 +09:00
Jay Foad	c93bf53a3e	[AMDGPU] NFC formatting fixes in SIMemoryLegalizer	2021-11-05 09:10:24 +00:00
Tony Tye	53eb469195	[AMDGPU] Support non-strictly stronger memory orderings in SIMemoryLegalizer C++20 no longer requires the failure memory ordering to be no stronger than the success memory ordering. Adjust assert in AMD GPU SIMemoryLegalizer, and merge instruction memory orderings. Add common operation to merge memory orders that allows non-strict memory orderings to be combined. Use it in SIMemoryLegalizer and MachineMemOperand::getMergedOrdering. Reviewed By: efriedma, rampitec Differential Revision: https://reviews.llvm.org/D106729	2021-08-10 08:43:03 +00:00
Tony Tye	7f19aa73c2	[AMDGPU] Update gfx90a memory model support Update AMDGPU gfx90a memory model to make coarse grain memory allocations consistent when fine grained system scope atomic acquire and release is performed. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D105137	2021-06-30 04:05:22 +00:00
Eli Friedman	74909e4b6e	Rename MachineMemOperand::getOrdering -> getSuccessOrdering. Since this method can apply to cmpxchg operations, make sure it's clear what value we're actually retrieving. This will help ensure we don't accidentally ignore the failure ordering of cmpxchg in the future. We could potentially introduce a getOrdering() method on AtomicSDNode that asserts the operation isn't cmpxchg, but not sure that's worthwhile. Differential Revision: https://reviews.llvm.org/D103338	2021-06-21 16:49:27 -07:00
Tony Tye	4658cd4c18	[AMDGPU] Update gfx90a memory model support Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100070	2021-04-07 22:17:58 +00:00

97 Commits