122 Commits

Author SHA1 Message Date
Shilei Tian
ef715849d7
[NFC][AMDGPU] Add some debug prints to SIMemoryLegalizer (#190658) 2026-04-06 17:17:33 -04:00
vporpo
40ea2f3513
[MIR][MachineInstr] Update MachineInstr::eraseFromParent() to return an iterator (#179787)
Unlike LLVM IR `Instruction::eraseFromParent()`,
`MachineInstr::eraseFromParent()` is void and does not return the
iterator following the erased instruction. Returning an iterator can be
very helpful for example when we are erasing MachineInstrs while
iterating, as it provides a convenient way to get a valid iterator.

This patch updates `MachineInstr::eraseFromParent()` to return a
`MachineBlock::iterator` (which is a
`MachineInstrBundleIterator<MachineInstr>`). If the erased instruction
is the head of a bundle, then the returned iterator points to the next
bundle (see unittest).
2026-03-13 13:39:13 -07:00
Pierre van Houtryve
b738491d2f
[AMDGPU][GFX12.5] Add support for emitting memory operations with nv bit set (#179413)
- Add `MONonVolatile` MachineMemOperand flag.
- Set nv=1 on memory operations on GFX12.5 if the operation accesses a
constant address space,
  is an invariant load, or has the `MONonVolatile` flag set.
2026-02-06 11:35:46 +01:00
Mariusz Sikora
3c0f5045e1
[AMDGPU] Add FeatureGFX13 and SMEM encoding for gfx13 (#177567)
For now list of features is based on gfx12 and gfx1250

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
2026-01-26 14:16:36 +01:00
Stanislav Mekhanoshin
dd947ebcf3
[AMDGPU] Update gfx1250 memory model for global acquire/release (#175865)
Inserts required waits around GLOBAL_INV/GLOBAL_WBINV for
agent scope and above.
2026-01-15 03:25:03 -08:00
LU-JOHN
49381c3000
[NFC][AMDGPU] Declare variables initialized with getDebugLoc as const ref (#174434)
Declare variables initialized with getDebugLoc as a const reference.

Signed-off-by: John Lu <John.Lu@amd.com>
2026-01-05 12:37:47 -06:00
Pierre van Houtryve
a086fb2fbb
[AMDGPU][gfx1250] Add wait_xcnt before any access that cannot be repeated (#168852)
The xcnt wait is actually required before any memory access that can
only be done once, so atomic stores and volatile accesses are affected.
This patch also ensures buffer instructions are handled.
2025-11-25 10:11:04 +01:00
Pierre van Houtryve
20795e06ed
[AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (#168058)
Also breaks the long inheritance chains by making both
`SIGfx10CacheControl` and
`SIGfx12CacheControl` inherit from `SICacheControl` directly.

With this patch, we now just have 3 `SICacheControl` implementations
that each
do their own thing, and there is no more code hidden 3 superclasses
above (which made this code harder to read and maintain than it needed
to be).
2025-11-18 10:12:56 +01:00
Pierre van Houtryve
b07bfdb6fb
[AMDGPU][SIMemoryLegalizer] Combine all GFX6-9 CacheControl Classes (#168052)
Merge the following classes into `SIGfx6CacheControl`:
- SIGfx7CacheControl
- SIGfx90ACacheControl
- SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just
to implement one or two codegen differences. GFX90A/GFX940 have a bit
more differences, but they're still manageable under one class because
the general behavior is the same.

This removes 500 lines of code and puts everything into a single place
which I think makes it a lot easier to maintain, at the cost of a slight
increase in complexity for some functions.

There is still a lot of room for improvement but I think this patch is
already big enough as is and I don't want to bundle too much into one
review.
2025-11-17 10:07:20 +01:00
Jay Foad
72c69aefba
[AMDGPU] Make use of getFunction and getMF. NFC. (#167872) 2025-11-14 11:00:57 +00:00
Jay Foad
60f20ea465
[AMDGPU] Add target feature for waits before system scope stores. NFC. (#164993) 2025-10-27 10:31:37 +00:00
Kazu Hirata
84857775b7
[Target] Add "override" where appropriate (NFC) (#165083)
Note that "override" makes "virtual" redundant.

Identified with modernize-use-override.
2025-10-25 06:23:43 -07:00
Pierre van Houtryve
07d47c792b
[AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (#161638)
They were previously optimized to not emit any waitcnt, which is
technically correct because there is no reordering of operations at
workgroup scope in CU mode for GFX10+.

This breaks transitivity however, for example if we have the following
sequence of events in one thread:

- some stores
- store atomic release syncscope("workgroup")
- barrier

then another thread follows with

- barrier
- load atomic acquire
- store atomic release syncscope("agent")

It does not work because, while the other thread sees the stores, it
cannot release them at the wider scope. Our release fences aren't strong
enough to "wait" on stores from other waves.

We also cannot strengthen our release fences any further to allow for
releasing other wave's stores because only GFX12 can do that with
`global_wb`. GFX10-11 do not have the writeback instruction.
It'd also add yet another level of complexity to code sequences, with
both acquire/release having CU-mode only alternatives.
Lastly, acq/rel are always used together. The price for synchronization
has to be paid either at the acq, or the rel. Strengthening the releases
would just make the memory model more complex but wouldn't help
performance.

So the choice here is to streamline the code sequences by making CU and
WGP mode emit almost identical (vL0 inv is not needed in CU mode) code
for release (or stronger) atomic ordering.

This also removes the `vm_vsrc(0)` wait before barriers. Now that the
release fence in CU mode is strong enough, it is no longer needed.

Supersedes #160501
Solves SC1-6454
2025-10-21 09:23:46 +02:00
Krzysztof Drewniak
d37141776f
[AMDGPU] Enable volatile and non-temporal for loads to LDS (#153244)
The primary purpose of this commit is to enable marking loads to LDS
(global.load.lds, buffer.*.load.lds) volatile (using bit 31 of the aux
as with normal buffer loads) and to ensure that their !nontemporal
annotations translate to appropriate settings of te cache control bits.

However, in the process of implementing this feature, we also fixed
- Incorrect handling of buffer loads to LDS in GlobalISel
- Updating the handling of volatile on buffers in SIMemoryLegalizer:
previously, the mapping of address spaces would cause volatile on buffer
loads to be silently dropped on at least gfx10.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-10-20 12:42:22 -05:00
Pierre van Houtryve
c1852afa4b
[AMDGPU][SIMemoryLegalizer][GFX12] Correctly insert sample/bvhcnt (#161637)
The check used was not strong enough to prevent the insertion of sample/bvhcnt when they were not needed.
I assume SIInsertWaitCnts was trimming those away anyway, but this was a bug nonetheless.

We were inserting SAMPLE/BVHcnt waits in places where we only needed to wait on the previous atomic operation. Neither of these counter have any atomics associated with them.
2025-10-20 12:03:47 +02:00
Stanislav Mekhanoshin
1becadeebc
[AMDGPU] Update comments in memory legalizer. NFC (#160453) 2025-09-24 10:04:06 -07:00
Fabian Ritter
825c956c64
[AMDGPU] SIMemoryLegalizer: Factor out check if memory operations can affect the global AS (#160129)
Mostly NFC, and adds an assertion for gfx12 to ensure that no atomic scratch
instructions are present in the case of GloballyAddressableScratch. This should
always hold because of #154710.
2025-09-24 13:22:03 +02:00
Fabian Ritter
3f8c7e9fa3
[AMDGPU] Insert waitcnt for non-global fence release in GFX12 (#159282)
A fence release could be followed by a barrier, so it should wait for
the relevant memory accesses to complete, even if it is mmra-limited to
LDS. So far, that would be skipped for non-global fence releases.

Fixes SWDEV-554932.
2025-09-23 11:52:38 +02:00
Sameer Sahasrabuddhe
4b03252ad6
[NFC][AMDGPU][SIMemoryLegalizer] remove effectively empty function (#156806)
The removed function SIGfx90ACacheControl::enableLoadCacheBypass() does
not actually do anything except one assert and one unreachable.
2025-09-12 09:30:12 +00:00
Pierre van Houtryve
49a898f9b5
[AMDGPU][gfx1250] Support "cluster" syncscope (#157641)
Defaults to "agent" for targets that do not support it.

- Add documentation
- Register it in MachineModuleInfo
- Add MemoryLegalizer support
2025-09-10 11:41:43 +02:00
Pierre van Houtryve
d6d0f4f156
[AMDGPU][gfx1250] Remove SCOPE_SE for scratch stores (#157640) 2025-09-10 11:03:58 +02:00
Pierre van Houtryve
dcaa29c8ed
Revert "[AMDGPU][gfx1250] Add cu-store subtarget feature (#150588)" (#157639)
This reverts commit be17791f2624f22b3ed24a2539406164a379125d.

This is not necessary for gfx1250 anymore.
2025-09-10 10:20:59 +02:00
Pierre van Houtryve
bed9be954d
[AMDGPU][gfx1250] Implement SIMemoryLegalizer (#154726)
Implements the base of the MemoryLegalizer for a roughly correct GFX1250 memory model.
Documentation will come later, and some remaining changes still have to be added, but this is the backbone of the model.
2025-09-10 10:18:11 +02:00
Pierre van Houtryve
e2bd10cf16
[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418)
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
2025-09-04 09:19:25 +00:00
Pierre van Houtryve
d6edc1a96f
[AMDGPU] Reenable BackOffBarrier on GFX11/12 (#155370)
Re-enable it by adding a wait on vm_vsrc before every barrier "start"
instruction in GFX10/11/12 CU mode.

This is a less strong wait than what we do without BackOffBarrier, thus
this shouldn't introduce
any new guarantees that can be abused, instead it relaxes the guarantees
we have now to the bare
minimum needed to support the behavior users want (fence release +
barrier works).

There is an exact memory model in the works which will be documented
separately.
2025-09-02 09:37:43 +02:00
Sameer Sahasrabuddhe
8f187c74b3
[AMDGPU] introduce S_WAITCNT_LDS_DIRECT in the memory legalizer (#150887)
The new instruction represents the unknown number of waitcnts needed at a
release operation to ensure that prior direct loads to LDS (formerly called LDS
DMA) are completed. The instruction is replaced in SIInsertWaitcnts with a
suitable value for vmcnt().

Co-authored-by: Austin Kerbow <austin.kerbow@amd.com>.
2025-07-30 11:23:28 +05:30
Pierre van Houtryve
be17791f26
[AMDGPU][gfx1250] Add cu-store subtarget feature (#150588)
Determines whether we can use `SCOPE_CU` stores (on by default), or
whether all stores must be done at `SCOPE_SE` minimum.
2025-07-29 11:38:43 +02:00
Pierre van Houtryve
a6532c2ada
[AMDGPU][gfx12] Clean-up implementation of waits before SCOPE_SYS stores (#150587)
We can do it all in finalizeStore if we ensure it always sees the
stores.
For that, I needed to fix a hidden bug where finalizeStore wouldn't see
all stores
because sometimes the iterator got out-of-sync and didn't point to the
store anymore.

This also removes the waits before volatile LDS stores which never
needed it, that was a bug until now.
2025-07-28 15:38:46 +02:00
Pierre van Houtryve
2ad4e93ded
[AMDGPU][gfx1250] Use SCOPE_SE for stores that may hit scratch (#150586) 2025-07-28 11:40:56 +02:00
Pierre van Houtryve
9c5f8ec561
[NFC][AMDGPU] Refactor handling of amdgpu-synchronize-as MD on fences (#148630)
Directly plug it into the MMO instead, which is much cleaner.
2025-07-24 12:45:50 +02:00
Pierre van Houtryve
cd1b84caa8
[NFC][AMDGPU] Rename "amdgpu-as" to "amdgpu-synchronize-as" (#148627)
"amdgpu-as" is way too vague and doesn't give enough context.
We may want to support it on normal atomics too, to control the synchronized (ordered) AS.
If we do that, the name has to be less vague.
2025-07-24 12:41:57 +02:00
Stanislav Mekhanoshin
958dc86026
[AMDGPU] Don't insert wait instructions that are not supported by gfx1250 (#145084)
No tests yet, but it will allow further tests not to be
polluted with these waits.
2025-06-20 12:21:45 -07:00
Justin Bogner
b7bb256703
Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.

We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
2025-05-28 12:26:39 -07:00
Akshat Oke
c22c5643db
[AMDGPU][NPM] Port SIMemoryLegalizer to NPM (#130060) 2025-03-12 14:30:35 +05:30
Fabian Ritter
2260d59257
[AMDGPU] Remove FeatureForceStoreSC0SC1 (#126878)
This was only used for gfx940 and gfx941, which have since been removed.

For SWDEV-512631
2025-02-19 10:26:09 +01:00
Fabian Ritter
8615f9aaff
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.

Documentation changes will follow.

For SWDEV-512631
2025-02-19 10:20:48 +01:00
Pierre van Houtryve
924a64a348
[AMDGPU] Only emit SCOPE_SYS global_wb (#110636)
global_wb with scopes lower than SCOPE_SYS is unnecessary for
correctness.

I was initially optimistic they would be very cheap no-ops but they can
actually be quite expensive so let's avoid them.
2024-10-07 07:35:31 +02:00
Pierre van Houtryve
eaac4a2613
[AMDGPU] Document & Finalize GFX12 Memory Model (#98599)
Documents the memory model implemented as of #98591, with some
fixes/optimizations to the implementation.
2024-09-09 15:35:28 +02:00
Matt Arsenault
7b28cc0c59
AMDGPU: Query MachineModuleInfo from PM instead of MachineFunction (#99679) 2024-07-22 08:55:39 +04:00
Jay Foad
74b87b02d2 [AMDGPU] Fix and add namespace closing comments. NFC. 2024-07-16 16:56:31 +01:00
Pierre van Houtryve
b3a446650c
[AMDGPU] Implement GFX12 Memory Model (#98591)
- Emit GLOBAL_WB instructions
- Reflect synscope on instructions's `scope:` operand

Fixes SWDEV-468508
Fixes SWDEV-470735
Fixes SWDEV-468392
Fixes SWDEV-469622
2024-07-16 10:53:06 +02:00
Pierre van Houtryve
c1ac6d2dd4
[AMDGPU] Add amdgpu-as MMRA for fences (#78572)
Using MMRAs, allow `builtin_amdgcn_fence` to emit fences that only
target one or more address spaces, instead of fencing all address spaces
at once.

This is done through a `amdgpu-as` MMRA. Currently focused on OpenCL
fences, but can very easily support more AS names and codegen on more
than just fences.
2024-05-27 12:17:04 +02:00
Mirko Brkušanin
1fd1f4c0e1
[AMDGPU] Handle amdgpu.last.use metadata (#83816)
Convert !amdgpu.last.use metadata into MachineMemOperand for last use
and handle it in SIMemoryLegalizer similar to nontemporal and volatile.
2024-03-06 16:33:52 +01:00
Mirko Brkušanin
27ce5121ee
[AMDGPU] Fix setting nontemporal in memory legalizer (#83815)
Iterator MI can advance in insertWait() but we need original instruction
to set temporal hint. Just move it before handling volatile.
2024-03-04 15:05:31 +01:00
Petar Avramovic
3e35ba53e2
AMDGPU/GFX12: Insert waitcnts before stores with scope_sys (#82996)
Insert waitcnts for loads and atomics before stores with system scope.
Scope is field in instruction encoding and corresponds to desired
coherence level in cache hierarchy.
Intrinsic stores can set scope in cache policy operand.
If volatile keyword is used on generic stores memory legalizer will set
scope to system. Generic stores, by default, get lowest scope level.
Waitcnts are not required if it is guaranteed that memory is cached.
For example vulkan shaders can guarantee this.
TODO: implement flag for frontends to give us a hint not to insert
waits.
Expecting vulkan flag to be implemented as vulkan:private MMRA.
2024-02-28 16:18:04 +01:00
Pierre van Houtryve
87d7711934
[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450)
Fixes SWDEV-443292
2024-02-13 09:07:51 +01:00
Jay Foad
ba52f06f9d
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
separate LOADCNT, SAMPLECNT and BVHCNT counters.
2024-01-18 10:47:45 +00:00
Mirko Brkušanin
7ca4473dd9
[AMDGPU] Add new cache flushing instructions for GFX12 (#76944)
Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>
2024-01-08 14:06:58 +00:00
Pierre van Houtryve
ef067f5204
[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830)
Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
2023-12-15 12:33:32 +01:00
Konstantin Zhuravlyov
42bd81410e AMDGPU: Force sc0 and sc1 on stores for gfx940 and gfx941
Differential Revision: https://reviews.llvm.org/D149986
2023-05-12 11:53:19 -04:00