240 Commits

Author SHA1 Message Date
Changpeng Fang
6184ef1c2f
[AMDGPU] Support f64 atomics on gfx1250 (#151172)
- BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64
- DS_ADD_F64

Co-authored-by: Konstantin Zhuravlyov <Konstantin Zhuravlyov@amd.com>
2025-07-29 09:41:00 -07:00
Jay Foad
28b85502eb
[AMDGPU] Remove some duplicated lines. NFC. (#128029) 2025-07-21 17:28:31 +01:00
Stanislav Mekhanoshin
6d8e53d4af
[AMDGPU] Support nv memory instructions modifier on gfx1250 (#149582) 2025-07-18 14:38:46 -07:00
Changpeng Fang
fe8a26263a
AMDGPU: Remove Formatted MUBUF instructions from gfx1250 support (#145590) 2025-06-24 14:17:13 -07:00
Changpeng Fang
ce4d214947
AMDGPU: Remove MTBUF instructions from gfx1250 support (#145563) 2025-06-24 11:59:13 -07:00
Craig Topper
ca21508080
[Targets] Migrate from atomic_load_8/16/32/64 to atomic_load_nonext_8/16/32/64. NFC (#137428)
This makes them more consistent with the checks performed by regular loads. We can't simply add IsNonExtLoad to the existing atomic_load_8/16/32/64 as that would affect out of tree targets.
2025-04-28 09:26:34 -07:00
Craig Topper
5dc2d668e6
[SelectionDAG][Targets] Replace atomic_load_8/atomic_load_16 with atomic_load_*ext_8/atomic_load_*ext_16 where possible. (#137279)
isAnyExtLoad/isZExtLoad/isSignExtLoad are able to emit predicate checks
from tablegen now so we should use them.

The next step would be to add isNonExtLoad versions and migrate all
remaining uses of atomic_load_8/16/32/64 to that.
2025-04-25 09:01:00 -07:00
Juan Manuel Martinez Caamaño
db33978c46
[AMDGPU][GFX11] buffer_load_lds_{size} instructions do not exist (#132916)
According to the shader manual, there are no buffer load lds
instructions on gfx11.

The tests for the regular `buffer_load ... lds` instructions for gfx11
are already present in AMDGPU/gfx11_asm_mubuf.s, where the compiler
fails to encode the instructions for this target.
2025-03-25 15:24:06 +01:00
Jay Foad
457f302473
[AMDGPU] Disallow null for more resource operands (#121941)
Following on from #115200, disallow the null sgpr as a resource operand
in some instructions that were missed.
2025-01-08 08:02:10 +00:00
Jun Wang
b2adeae865
[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.
2025-01-03 11:49:51 -08:00
Sergei Barannikov
6b2232606d
[TableGen] Replace WantRoot/WantParent SDNode properties with flags (#119599)
These properties are only valid on ComplexPatterns. Having them as flags
is more convenient because one can now use "let = ... in" syntax to set
these flags on several patterns at a time. This is also less error-prone
as it makes it impossible to specify these properties on records derived
from SDPatternOperator.
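
As a rough illustration of the convenience described above, the sketch below
sets one flag on several patterns with a single top-level let. It is not taken
from the patch: MyComplexPattern, its field names, and the SelectAddr* defs are
hypothetical stand-ins chosen so the snippet is self-contained.

```tablegen
// Hypothetical stand-in for a ComplexPattern-like class; the class name
// and the flag field names are assumptions made for this sketch.
class MyComplexPattern<string fn> {
  string SelectFunc = fn;
  bit WantsRoot = false;
  bit WantsParent = false;
}

// One "let" overrides the flag on every record defined inside the braces.
let WantsRoot = true in {
  def SelectAddrA : MyComplexPattern<"SelectAddrA">;
  def SelectAddrB : MyComplexPattern<"SelectAddrB">;
}

// Records defined outside the "let" keep the default (false).
def SelectAddrC : MyComplexPattern<"SelectAddrC">;
```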

Pull Request: https://github.com/llvm/llvm-project/pull/119599
2024-12-12 00:41:44 +03:00
Matt Arsenault
7fc71f7909
AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:54:50 -08:00
Jay Foad
1b792252e3
[AMDGPU] Remove hasPostISelHook for atomics. NFC. (#116791)
This is not required since 2147b6c89d44 changed the way that no-ret
atomic ops are selected.
2024-11-20 10:38:35 +00:00
Matt Arsenault
927032807d
AMDGPU: Handle gfx950 96/128-bit buffer_load_lds (#116681)
Enforcing this limit in the clang builtin will come later.
2024-11-18 22:01:56 -08:00
Jay Foad
550501f21c
[AMDGPU] Simplify GFX12 VBUFFER definitions. NFC. (#114403)
For GFX12 hasTFE is always true because it does not have the buffer load
to LDS instructions.
2024-11-01 10:06:45 +00:00
Matt Arsenault
12409024d3
AMDGPU/GlobalISel: Handle atomic sextload and zextload (#111721)
Atomic loads are handled differently from the DAG, and have separate opcodes
and explicit control over the extensions, like ordinary loads. Add
new patterns for these.

There's room for cleanup and improvement. d16 cases aren't handled.

Fixes #111645
2024-10-31 07:44:52 -07:00
Jun Wang
5927c6745c
[AMDGPU][MC] Instructions not to be supported in GFX940 (#109225)
Buffer_store_lds_dword, buffer_wbinvl1, and buffer_wbinvl1_vol are
obsolete in GFX940 and should not be supported.
2024-09-23 10:38:27 -07:00
Jay Foad
935b9f6274 [AMDGPU] Make use of multiclass inheritance. NFC. 2024-09-11 10:39:48 +01:00
Acim Maravic
9398cc2ec5
[LLVM][AMDGPU] Copy isConvergent from Pseudo to Real instructions (#99658)
This patch copies the flag isConvergent from pseudo instructions to the
corresponding real instructions, so that isConvergent flag is also
defined for real instructions.

Flags are not required by the compiler, but for consistency it would be
nice to have them.

Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>
2024-07-25 18:01:07 +02:00
Matt Arsenault
2ef4f863f3
AMDGPU: Add subtarget feature for memory atomic fadd f64 (#96444) 2024-07-10 16:55:06 +04:00
Matt Arsenault
889f3c5741
AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 (#95930)
Annoyingly gfx90a/940 support this for global/flat but not buffer.
2024-06-25 17:45:34 +02:00
Matt Arsenault
a440a96ec2
AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (#95592)
Define subtarget features for atomic fmin/fmax support.

The flat/global support is a real mess. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.

gfx9 partially reintroduced them in gfx90a and gfx940, but only for f64.
2024-06-23 10:10:41 +02:00
Matt Arsenault
b9c7d60a2f
AMDGPU: Start fixing inconsistencies in usage of SubtargetPredicate (#96337)
SubtargetPredicate should be the primary "does this instruction exist"
predicate, with OtherPredicates used for other side pieces of information.

Changes like 856d1c4410 were backwards. The problematic usage is how
GFX12 is using HasRestrictedOffset. The multiclasses for buffers
should probably be split up instead of hiding OtherPredicates inside
the buffer atomic multiclasses. The two cases are mutually exclusive
and really need a negated predicate for the not-gfx12 case.

It's pretty terrible we have to manage this in the first place.
TableGen should be able to figure out the required predicates
from any instructions that appear in the pattern output.
2024-06-21 23:09:36 +02:00
Matt Arsenault
5d6d2fc080
AMDGPU: Fix overriding SubtargetPredicate in MUBUF_Real_gfx90a (#96351) 2024-06-21 23:07:20 +02:00
Matt Arsenault
9f8e7c3a01
AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (#95591)
The global/flat/buffer atomic fmin/fmax situation is a mess. These
instructions have been renamed 3 times. We currently have
separate pseudos defined for the same opcodes with the different names
(e.g. GLOBAL_ATOMIC_MIN_F64 from gfx90a and GLOBAL_ATOMIC_FMIN_X2 from gfx10).

Use the _FMIN versions as the canonical name for the f32 versions. Use the
_MIN_F64 style as the canonical name for the f64 case. This is because
gfx90a has the most sensible names, but does not have the f32 versions.

Wire through the pseudo to use for the instruction properties vs. the assembly
name like in other cases. This will simplify directly selecting these from atomicrmw.
2024-06-18 10:34:09 +02:00
Matt Arsenault
8930ac1bbe
AMDGPU: Cleanup selection patterns for buffer loads (#95378)
We should just support these for all register types.
2024-06-17 21:51:25 +02:00
Matt Arsenault
3b997294d6
AMDGPU: Remove .v2bf16 buffer atomic fadd intrinsics (#95783)
These are redundant with the unsuffixed versions, and have a name
collision with surprising behavior when the base intrinsic is used with
v2bf16.

The global and flat variants should be removed too, but those are complicated
due to using v2i16 in place of the natural v2bf16. Those cases can soon be
completely deleted in favor of atomicrmw.

The GlobalISel codegen change is broken and substitutes handling as bf16
for handling as f16, but it's a bug that this passed the IRTranslator in the first
place.
2024-06-17 21:44:52 +02:00
Joe Nash
7e3e9d4308
[AMDGPU] Change getLdStRegisterOperand to !cond for better diagnostic (#95475)
If you hit the unexpected case in these !if trees, you'd get an
error message like "error: Not a known RegisterClass! def VReg_1..."
This can happen when changing code quite indirectly related to these
class definitions. We can use !cond here, which has a builtin facility
to throw an error if no case in the !cond statement is hit.
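
A minimal sketch of the difference, assuming a simplified selector: RegClass,
PickRegClass, and the two register-class defs below are hypothetical stand-ins
for illustration, not the actual getLdStRegisterOperand definition.

```tablegen
// Hypothetical, simplified register-class records for this sketch.
class RegClass<string n> { string Name = n; }
def VGPR_32 : RegClass<"VGPR_32">;
def VReg_64 : RegClass<"VReg_64">;

// With !cond, every supported size is listed explicitly. There is no
// fall-through value, so an unsupported size cannot silently pick a
// wrong register class.
class PickRegClass<int size> {
  RegClass ret = !cond(!eq(size, 32) : VGPR_32,
                       !eq(size, 64) : VReg_64);
}

def Pick32 : PickRegClass<32>;  // ret = VGPR_32
def Pick64 : PickRegClass<64>;  // ret = VReg_64
// def Pick48 : PickRegClass<48>;  // no case matches, so TableGen reports an error
```

A nested !if tree, by contrast, always needs a final else value, which is
where a misleading placeholder class can leak into later error messages.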

NFC.
2024-06-14 09:33:03 -04:00
Matt Arsenault
c0ff36ea23
AMDGPU: Fix buffer intrinsic handling for various 16-bit elements. (#95376)
Mostly fixes handling of bfloat vectors, but also some missing
i16 cases.
2024-06-13 12:33:18 +02:00
Matt Arsenault
5c9352eb02
DAG: Replace bitwidth with type in suffix in atomic tablegen ops (#94845) 2024-06-13 11:52:22 +02:00
Matt Arsenault
935d377350
AMDGPU: Fix using wrong memory type for non-image resource intrinsics (#94911)
An 8 x i16 raw load was incorrectly using a 64-bit memory type, which
would assert in the MachineMemOperand constructor.

This is preparation for a cleanup which will make the buffer intrinsics
work for all legal types.
2024-06-13 11:10:28 +02:00
Ivan Kosarev
9890f94343
[AMDGPU][GFX12] Support disassembling MUBUF instructions with arbitrary FORMAT values. (#95243)
Some tools generate such instructions with the FORMAT field set to 0,
which corresponds to buf_fmt_invalid, but that should not prevent them
from being recognised on decoding.
2024-06-13 08:16:06 +01:00
Matt Arsenault
dd7540f3da AMDGPU: Handle buffer load/store for 64-bit element types
Note pointers still don't work correctly.
2024-06-12 10:26:16 +02:00
Fabian Ritter
0821b7937c
[AMDGPU] Copy Defs and Uses from Pseudo to Real Instructions (#93004)
Currently, the tablegen files that generate the instruction definitions
in lib/Target/AMDGPU/AMDGPUGenInstrInfo.inc often only include implicit
operands for the architecture-independent pseudo instructions, but not
for the corresponding real instructions. The missing implicit operands
(most prominently: the EXEC mask) do not affect code generation, since
that operates on pseudo instructions, but they are problematic when
working with real instructions, e.g., as a decoding result from the MC
layer.

This patch copies the implicit Defs and Uses from pseudo instructions to
the corresponding real instructions, so that implicit operands are also
defined for real instructions.

Addresses issue #89830.
2024-05-31 08:40:54 +02:00
Mirko Brkušanin
1e6a82b8ef
[AMDGPU] Legalize and select raw/struct_buffer_load with tfe (#93310) 2024-05-27 14:09:17 +02:00
Joe Nash
fe0b7983a2
[AMDGPU] Create AMDGPUMnemonicAlias tablegen class (#89288)
AMDGPUMnemonicAlias is a MnemonicAlias that inherits from
GCNPredicateControl, so that we can set predicates on the alias the same
way as Instructions.
Use AssemblerPredicate instead of Requires on aliases.

NFC.
2024-05-09 11:37:56 -04:00
Jay Foad
856d1c4410
[AMDGPU] Fix predicates for BUFFER_ATOMIC_FMIN/FMAX patterns (#89066)
Use OtherPredicates to avoid interfering with other uses of
SubtargetPredicate for GFX12.
2024-04-17 14:58:13 +01:00
David Stuttard
75e528fdd9
[AMDGPU] Extend zero initialization of return values for TFE (#85759)
buffer_load instructions that use TFE also need to zero initialize
return values, similar to how the image instructions currently work. Add
support for this with the standard zero init of all results, plus zero init
of just the TFE flag when the enable-prt-strict-null subtarget feature is
disabled.
2024-03-25 09:01:46 +00:00
Jay Foad
7cd61f888c [AMDGPU] Remove unneeded MnemonicAlias. NFC.
This is unneeded because MUBUF_Real_Atomic_gfx11_gfx12 on the line above
generates it automatically.
2024-03-13 12:03:41 +00:00
Jay Foad
36dece0013
[AMDGPU] Add missing GFX10 buffer format d16 hi instructions (#84809) 2024-03-12 08:20:08 +00:00
Jay Foad
074fe3bac6
[AMDGPU] Simplify and refactor VBUFFER_Real class definitions. NFC. (#84521)
Abstracting out a new base class VBUFFER_Real_gfx12 just highlights that
the only difference between the MUBUF and MTBUF forms is in the handling
of the "format" field.
2024-03-08 19:22:08 +00:00
Jay Foad
e460da14ec
[AMDGPU] Use get_BUF_ps to default real_name of BUF instructions. NFC. (#84524) 2024-03-08 19:21:27 +00:00
Jay Foad
0456a32a2a
[AMDGPU] Simplify renamed BUF instruction definitions. NFC. (#84503)
Use optional arguments instead of separate (multi)classes for renamed
instructions.
2024-03-08 16:08:09 +00:00
Jay Foad
bf7f62ab92 [AMDGPU] Make use of Mnem_gfx11_gfx12. NFC. 2024-03-07 10:06:51 +00:00
Jay Foad
e49479b881
[AMDGPU] Remove unneeded BUF _impl multiclasses. NFC. (#84034)
Remove MUBUF_Real_gfx11_impl and others. By converting the underlying
class MUBUF_Real_gfx11 into a multiclass, the _impl wrapper is no longer
needed.
2024-03-05 16:35:29 +00:00
Jay Foad
894f52fc0d
[AMDGPU] Use BUF multiclasses to reduce repetition. NFC. (#84003)
Define BUF Real instructions with this general pattern for all
architectures (not just GFX11):

  multiclass Something_Real_gfx11<...> {
    defvar ps = !cast<Pseudo>(NAME);
    def _gfx11 : ...;
  }

This allows removing a huge amount of repetition in the definitions of
individual Real instructions, where they would have to !cast their own
name to a Pseudo and pass that in as a class argument.
2024-03-05 13:27:51 +00:00
Jay Foad
67a7a5e89d [AMDGPU] Only use the BUF Base_ prefix for multiple architectures. NFC.
The Base_ prefix seems redundant on a class that is only used for GFX11.
2024-03-05 12:01:45 +00:00
Jay Foad
4693efe19c
[AMDGPU] Remove Base_MUBUF_Real_Atomic_gfx11. NFC. (#83994)
This class only existed to set the dlc bit for GFX11 atomics. It is
simpler to set dlc for all loads/stores/atomics in the base class.
2024-03-05 11:58:17 +00:00
Jay Foad
762f762504
[AMDGPU] Rename get_MUBUF_ps and use it for MTBUF too. NFC. (#83991)
This allows removing a couple of MTBUF helper (multi)classes.
2024-03-05 11:21:38 +00:00
Jay Foad
53f89a0bb7
[AMDGPU] Remove AtomicNoRet class and getAtomicNoRetOp table (#83593) 2024-03-01 17:18:55 +00:00