llvm-project

Author	SHA1	Message	Date
Changpeng Fang	6184ef1c2f	[AMDGPU] Support f64 atomics on gfx1250 (#151172 ) - BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64 - DS_ADD_F64 Co-authored-by: Konstantin Zhuravlyov <Konstantin Zhuravlyov@amd.com>	2025-07-29 09:41:00 -07:00
Jay Foad	28b85502eb	[AMDGPU] Remove some duplicated lines. NFC. (#128029 )	2025-07-21 17:28:31 +01:00
Stanislav Mekhanoshin	6d8e53d4af	[AMDGPU] Support nv memory instructions modifier on gfx1250 (#149582 )	2025-07-18 14:38:46 -07:00
Changpeng Fang	fe8a26263a	AMDGPU: Remove Formatted MUBUF instructions from gfx1250 support (#145590 )	2025-06-24 14:17:13 -07:00
Changpeng Fang	ce4d214947	AMDGPU: Remove MTBUF instructions from gfx1250 support (#145563 )	2025-06-24 11:59:13 -07:00
Craig Topper	ca21508080	[Targets] Migrate from atomic_load_8/16/32/64 to atomic_load_nonext_8/16/32/64. NFC (#137428 ) This makes them more consistent with the checks performed by regular loads. We can't simply add IsNonExtLoad to the existing atomic_load_8/16/32/64 as that would affect out of tree targets.	2025-04-28 09:26:34 -07:00
Craig Topper	5dc2d668e6	[SelectionDAG][Targets] Replace atomic_load_8/atomic_load_16 with atomic_load_ext_8/atomic_load_ext_16 where possible. (#137279 ) isAnyExtLoad/isZExtLoad/isSignExtLoad are able to emit predicate checks from tablegen now so we should use them. The next step would be to add isNonExtLoad versions and migrate all remaining uses of atomic_load_8/16/32/64 to that.	2025-04-25 09:01:00 -07:00
Juan Manuel Martinez Caamaño	db33978c46	[AMDGPU][GFX11] buffer_load_lds_{size} instructions do not exist (#132916 ) According to the shader manual there are no buffer load lds instructions on gfx11. The tests for the regular `buffer_load ... lds` instructions for gfx11 are already present in AMDGPU/gfx11_asm_mubuf.s, where the compiler fails to encode the instructions for this target.	2025-03-25 15:24:06 +01:00
Jay Foad	457f302473	[AMDGPU] Disallow null for more resource operands (#121941 ) Following on from #115200, disallow the null sgpr as a resource operand in some instructions that were missed.	2025-01-08 08:02:10 +00:00
Jun Wang	b2adeae865	[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200 ) For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem while ensuring null cannot be used as S#, T#, or V#.	2025-01-03 11:49:51 -08:00
Sergei Barannikov	6b2232606d	[TableGen] Replace WantRoot/WantParent SDNode properties with flags (#119599 ) These properties are only valid on ComplexPatterns. Having them as flags is more convenient because one can now use "let = ... in" syntax to set these flags on several patterns at a time. This is also less error-prone as it makes it impossible to specify these properties on records derived from SDPatternOperator. Pull Request: https://github.com/llvm/llvm-project/pull/119599	2024-12-12 00:41:44 +03:00
Matt Arsenault	7fc71f7909	AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599 ) Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:54:50 -08:00
Jay Foad	1b792252e3	[AMDGPU] Remove hasPostISelHook for atomics. NFC. (#116791 ) This is not required since 2147b6c89d44 changed the way that no-ret atomic ops are selected.	2024-11-20 10:38:35 +00:00
Matt Arsenault	927032807d	AMDGPU: Handle gfx950 96/128-bit buffer_load_lds (#116681 ) Enforcing this limit in the clang builtin will come later.	2024-11-18 22:01:56 -08:00
Jay Foad	550501f21c	[AMDGPU] Simplify GFX12 VBUFFER definitions. NFC. (#114403 ) For GFX12 hasTFE is always true because it does not have the buffer load to LDS instructions.	2024-11-01 10:06:45 +00:00
Matt Arsenault	12409024d3	AMDGPU/GlobalISel: Handle atomic sextload and zextload (#111721 ) Atomic loads are handled differently from the DAG, and have separate opcodes and explicit control over the extensions, like ordinary loads. Add new patterns for these. There's room for cleanup and improvement. d16 cases aren't handled. Fixes #111645	2024-10-31 07:44:52 -07:00
Jun Wang	5927c6745c	[AMDGPU][MC] Instructions not to be supported in GFX940 (#109225 ) Buffer_store_lds_dword, buffer_wbinvl1, and buffer_wbinvl1_vol are obsolete in GFX940 and should not be supported.	2024-09-23 10:38:27 -07:00
Jay Foad	935b9f6274	[AMDGPU] Make use of multiclass inheritance. NFC.	2024-09-11 10:39:48 +01:00
Acim Maravic	9398cc2ec5	[LLVM][AMDGPU] Copy isConvergent from Pseudo to Real instructions (#99658 ) This patch copies the flag isConvergent from pseudo instructions to the corresponding real instructions, so that isConvergent flag is also defined for real instructions. Flags are not required by the compiler, but for consistency it would be nice to have them. Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>	2024-07-25 18:01:07 +02:00
Matt Arsenault	2ef4f863f3	AMDGPU: Add subtarget feature for memory atomic fadd f64 (#96444 )	2024-07-10 16:55:06 +04:00
Matt Arsenault	889f3c5741	AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 (#95930 ) Annoyingly gfx90a/940 support this for global/flat but not buffer.	2024-06-25 17:45:34 +02:00
Matt Arsenault	a440a96ec2	AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (#95592 ) Define subtarget features for atomic fmin/fmax support. The flat/global support is a real mess. We had float/double support at the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them. gfx11 removed the f64 versions again. gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.	2024-06-23 10:10:41 +02:00
Matt Arsenault	b9c7d60a2f	AMDGPU: Start fixing inconsistencies in usage of SubtargetPredicate (#96337 ) SubtargetPredicate should be the primary "does this instruction exist" predicate, with OtherPredicates used for other side pieces of information. Changes like 856d1c4410 were backwards. The problematic usage is how GFX12 is using HasRestrictedOffset. The multiclasses for buffers should probably be split up instead of hiding OtherPredicates inside the buffer atomic multiclasses. The two cases are mutually exclusive and really need a negated predicate for the not-gfx12 case. It's pretty terrible we have to manage this in the first place. TableGen should be able to figure out the required predicates from any instructions that appear in the pattern output.	2024-06-21 23:09:36 +02:00
Matt Arsenault	5d6d2fc080	AMDGPU: Fix overriding SubtargetPredicate in MUBUF_Real_gfx90a (#96351 )	2024-06-21 23:07:20 +02:00
Matt Arsenault	9f8e7c3a01	AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (#95591 ) The global/flat/buffer atomic fmin/fmax situation is a mess. These instructions have been renamed 3 times. We currently have separate pseudos defined for the same opcodes with the different names (e.g. GLOBAL_ATOMIC_MIN_F64 from gfx90a and GLOBAL_ATOMIC_FMIN_X2 from gfx10). Use the _FMIN versions as the canonical name for the f32 versions. Use the _MIN_F64 style as the canonical name for the f64 case. This is because gfx90a has the most sensible names, but does not have the f32 versions. Wire through the pseudo to use for the instruction properties vs. the assembly name like in other cases. This will simplify directly selecting these from atomicrmw.	2024-06-18 10:34:09 +02:00
Matt Arsenault	8930ac1bbe	AMDGPU: Cleanup selection patterns for buffer loads (#95378 ) We should just support these for all register types.	2024-06-17 21:51:25 +02:00
Matt Arsenault	3b997294d6	AMDGPU: Remove .v2bf16 buffer atomic fadd intrinsics (#95783 ) These are redundant with the unsuffixed versions, and have a name collision with surprising behavior when the base intrinsic is used with v2bf16. The global and flat variants should be removed too, but those are complicated due to using v2i16 in place of the natural v2bf16. Those cases can soon be completely deleted in favor of atomicrmw. The GlobalISel codegen change is broken and substitutes handling as bf16 for handling as f16, but it's a bug that this passed the IRTranslator in the first place.	2024-06-17 21:44:52 +02:00
Joe Nash	7e3e9d4308	[AMDGPU] Change getLdStRegisterOperand to !cond for better diagnostic (#95475 ) If you would hit the unexpected case in these !if trees, you'd get an error message like "error: Not a known RegisterClass! def VReg_1..." This can happen when changing code quite indirectly related to these class definitions. We can use !cond here, which has a builtin facility to throw an error if no case in the !cond statement is hit. NFC.	2024-06-14 09:33:03 -04:00
Matt Arsenault	c0ff36ea23	AMDGPU: Fix buffer intrinsic handling for various 16-bit elements. (#95376 ) Mostly fixes handling of bfloat vectors, but also some missing i16 cases.	2024-06-13 12:33:18 +02:00
Matt Arsenault	5c9352eb02	DAG: Replace bitwidth with type in suffix in atomic tablegen ops (#94845 )	2024-06-13 11:52:22 +02:00
Matt Arsenault	935d377350	AMDGPU: Fix using wrong memory type for non-image resource intrinsics (#94911 ) An 8 x i16 raw load was incorrectly using a 64-bit memory type, which would assert in the MachineMemOperand constructor. This is preparation for a cleanup which will make the buffer intrinsics work for all legal types.	2024-06-13 11:10:28 +02:00
Ivan Kosarev	9890f94343	[AMDGPU][GFX12] Support disassembling MUBUF instructions with arbitrary FORMAT values. (#95243 ) Some tools generate such instructions with the FORMAT field set to 0, which corresponds to buf_fmt_invalid, but that should not prevent them from being recognised on decoding.	2024-06-13 08:16:06 +01:00
Matt Arsenault	dd7540f3da	AMDGPU: Handle buffer load/store for 64-bit element types Note pointers still don't work correctly.	2024-06-12 10:26:16 +02:00
Fabian Ritter	0821b7937c	[AMDGPU] Copy Defs and Uses from Pseudo to Real Instructions (#93004 ) Currently, the tablegen files that generate the instruction definitions in lib/Target/AMDGPU/AMDGPUGenInstrInfo.inc often only include implicit operands for the architecture-independent pseudo instructions, but not for the corresponding real instructions. The missing implicit operands (most prominently: the EXEC mask) do not affect code generation, since that operates on pseudo instructions, but they are problematic when working with real instructions, e.g., as a decoding result from the MC layer. This patch copies the implicit Defs and Uses from pseudo instructions to the corresponding real instructions, so that implicit operands are also defined for real instructions. Addresses issue #89830.	2024-05-31 08:40:54 +02:00
Mirko Brkušanin	1e6a82b8ef	[AMDGPU] Legalize and select raw/struct_buffer_load with tfe (#93310 )	2024-05-27 14:09:17 +02:00
Joe Nash	fe0b7983a2	[AMDGPU] Create AMDGPUMnemonicAlias tablegen class (#89288 ) AMDGPUMnemonicAlias is a MnemonicAlias that inherits from GCNPredicateControl, so that we can set predicates on the alias the same way as Instructions. Use AssemblerPredicate instead of Requires on aliases NFC.	2024-05-09 11:37:56 -04:00
Jay Foad	856d1c4410	[AMDGPU] Fix predicates for BUFFER_ATOMIC_FMIN/FMAX patterns (#89066 ) Use OtherPredicates to avoid interfering with other uses of SubtargetPredicate for GFX12.	2024-04-17 14:58:13 +01:00
David Stuttard	75e528fdd9	[AMDGPU] Extend zero initialization of return values for TFE (#85759 ) buffer_load instructions that use TFE also need to zero initialize return values similar to how the image instructions currently work. Add support for this with standard zero init of all results + zero init of just TFE flag when enable-prt-strict-null subtarget feature is disabled.	2024-03-25 09:01:46 +00:00
Jay Foad	7cd61f888c	[AMDGPU] Remove unneeded MnemonicAlias. NFC. This is unneeded because MUBUF_Real_Atomic_gfx11_gfx12 on the line above generates it automatically.	2024-03-13 12:03:41 +00:00
Jay Foad	36dece0013	[AMDGPU] Add missing GFX10 buffer format d16 hi instructions (#84809 )	2024-03-12 08:20:08 +00:00
Jay Foad	074fe3bac6	[AMDGPU] Simplify and refactor VBUFFER_Real class definitions. NFC. (#84521 ) Abstracting out a new base class VBUFFER_Real_gfx12 just highlights that the only difference between the MUBUF and MTBUF forms is in the handling of the "format" field.	2024-03-08 19:22:08 +00:00
Jay Foad	e460da14ec	[AMDGPU] Use get_BUF_ps to default real_name of BUF instructions. NFC. (#84524 )	2024-03-08 19:21:27 +00:00
Jay Foad	0456a32a2a	[AMDGPU] Simplify renamed BUF instruction definitions. NFC. (#84503 ) Use optional arguments instead of separate (multi)classes for renamed instructions.	2024-03-08 16:08:09 +00:00
Jay Foad	bf7f62ab92	[AMDGPU] Make use of Mnem_gfx11_gfx12. NFC.	2024-03-07 10:06:51 +00:00
Jay Foad	e49479b881	[AMDGPU] Remove unneeded BUF _impl multiclasses. NFC. (#84034 ) Remove MUBUF_Real_gfx11_impl and others. By converting the underlying class MUBUF_Real_gfx11 into a multiclass, the _impl wrapper is no longer needed.	2024-03-05 16:35:29 +00:00
Jay Foad	894f52fc0d	[AMDGPU] Use BUF multiclasses to reduce repetition. NFC. (#84003 ) Define BUF Real instructions with this general pattern for all architectures (not just GFX11): multiclass Something_Real_gfx11<...> { defvar ps = !cast<Pseudo>(NAME); def _gfx11 : ...; } This allows removing a huge amount of repetition in the definitions of individual Real instructions, where they would have to !cast their own name to a Pseudo and pass that in as a class argument.	2024-03-05 13:27:51 +00:00
Jay Foad	67a7a5e89d	[AMDGPU] Only use the BUF Base_ prefix for multiple architectures. NFC. The Base_ prefix seems redundant on a class that is only used for GFX11.	2024-03-05 12:01:45 +00:00
Jay Foad	4693efe19c	[AMDGPU] Remove Base_MUBUF_Real_Atomic_gfx11. NFC. (#83994 ) This class only existed to set the dlc bit for GFX11 atomics. It is simpler to set dlc for all loads/stores/atomics in the base class.	2024-03-05 11:58:17 +00:00
Jay Foad	762f762504	[AMDGPU] Rename get_MUBUF_ps and use it for MTBUF too. NFC. (#83991 ) This allows removing a couple of MTBUF helper (multi)classes.	2024-03-05 11:21:38 +00:00
Jay Foad	53f89a0bb7	[AMDGPU] Remove AtomicNoRet class and getAtomicNoRetOp table (#83593 )	2024-03-01 17:18:55 +00:00

240 Commits