llvm-project

Author	SHA1	Message	Date
Krzysztof Drewniak	52dfcab327	[NFC][mlir][AMDGPU] Partition dialect .td into multiple files (#178562 ) Follow the style of other dialects by having a distinct .td file for each category of thing (type, attribute, operation, enum) generated for the AMDGPU dialect. Nothing has changed, but a lot of things have been copy-pasted.	2026-01-29 15:10:20 -08:00
Ivan Butygin	ac62f12192	[mlir][amdgpu] Remove redundant barriers (#175436 )	2026-01-12 14:47:58 +03:00
Jorn Tuyls	5ff486d08e	[NFC][AMDGPU] Use getMixedSize in FatRawBufferCastOp dim reification (#174548 ) After https://github.com/llvm/llvm-project/pull/174477, I found similar logic that can be replaced by `memref::getMixedSize` in the FatRawBufferCastOp dimension reification function.	2026-01-06 08:32:25 -05:00
Ivan Butygin	9afbcde1d2	[mlir][amdgpu] Fix `gather_to_lds` for 0d memrefs (#173421 ) `dstType.areTrailingDimsContiguous(1)` asserts for memref of rank 0.	2025-12-24 02:00:55 +03:00
Eric Feng	24c7b4ea48	[mlir][amdgpu] implement amdgpu.sparse_mfma wrapper for smfmac instructions (#171968 ) Signed-off-by: Eric Feng <Eric.Feng@amd.com>	2025-12-18 20:16:14 -06:00
Justin Rosner	3a88bb90bb	[mlir][AMDGPU] Add scaled wmma ops for gfx1250 (#169854 ) This PR adds scaled WMMA ops (available on gfx1250) and the lowering to the AMDGPU dialect, wrapping the underlying intrinsics.	2025-12-15 15:44:36 -08:00
Erick Ochoa Lopez	5123d36c02	[mlir][amdgpu] Lower make_gather_dma_descriptor. (#172083 ) * Makes `MakeDescriptorOp` a template for `make_dma_descriptor` and `make_gather_dma_descriptor`. * Makes verification and folder for `make_dma_descriptor` a template. * Adds custom verification and folder for `make_gather_dma_descriptor` based on template. * Adds `make_gather_dma_descriptor` op. * Lowers `make_gather_dma_descriptor` to ROCDL.	2025-12-15 13:32:57 -05:00
Zhewen Yu	d107b3c82a	[MLIR][AMDGPU] Implement reifyDimOfResult for FatRawBufferCastOp (#171839 ) Since `FatRawBufferCastOp` preserves the shape of its source operand, the result dimensions can be reified by querying the source's dimensions. --------- Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>	2025-12-12 11:39:00 -08:00
Erick Ochoa Lopez	5ebb928532	[mlir][amdgpu] Adds make_dma_gather_base (#171857 ) * Adds `tdm_gather_base` type. * Adds `make_dma_gather_base` op. * Adds `make_dma_gather_base` lowering to ROCDL.	2025-12-12 09:20:38 -05:00
Ivan Butygin	f88d060c41	[mlir][amdgpu] `memory_counter_wait` tensor counter support (#171153 )	2025-12-08 20:02:40 +03:00
Ivan Butygin	ca8419d6cc	[mlir][amdgpu] Fuse adjacent `MemoryCounterWaitOp` (#171148 ) Taking the minimum value.	2025-12-08 18:52:26 +03:00
Tim Gymnich	0487154588	[mlir][amdgpu] Add workgroup_mask to MakeDmaDescriptorOp (#171103 ) - add `workgroup_mask` and `early_timeout`	2025-12-08 16:02:18 +01:00
Erick Ochoa Lopez	5dfd9c4f84	[mlir][amdgpu] Add lowering for make_dma_descriptor (#169955 ) * Adds initial lowering for make_dma_descriptor supporting tensors of rank 2. * Adds folders for make_dma_descriptor allowing statically known operands to be folded into attributes. * Add AllElementTypesMatch<["lds", "global"]> to make_dma_base. * Rename pad to pad_amount * Rename pad_every to pad_interval	2025-12-05 14:24:23 -05:00
Krzysztof Drewniak	e209b8bc2f	[mlir][AMDGPU] Rename gfx1250 packed extension ops, change firstScaleLane (#170718 ) The current name of scaled_ext_packed816 was, in retrospect, bothering me, since it just has a bunch of numbers on the end and doesn't really reflect the wave-wide nature of the operation. On top of that, the fact that firstScaleLane was 0 or 1, which might be read as the first lane being 1 (and not what it actually was, 16), also seemed weird. Therefore, before this op sees any use, 1. Rename it to scaled_ext_packed_matrix 2. Change the semantics of firstScaleLane to actually point at the lane where the scales start (valid options currently are 0 or 16, the two halves of a wave32 wave). (Disclaimer: the mechanical updates were done via AI.) --------- Co-authored-by: Erick Ochoa Lopez <eochoalo@amd.com>	2025-12-04 14:35:16 -08:00
Erick Ochoa Lopez	73979c1df9	[mlir][amdgpu] Lower amdgpu.make_dma_base (#169817 ) * Adds lowering for `amdgpu.make_dma_base`	2025-12-02 13:48:31 -05:00
Erick Ochoa Lopez	df3e1b59d8	[mlir][amdgpu] Add amdgpu.make_dma_descriptor (#169407 ) Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-12-01 15:05:02 -05:00
Erick Ochoa Lopez	9af00e62ec	[mlir][amdgpu] Add make_dma_base operation (#169086 )	2025-11-26 19:43:38 +00:00
Erick Ochoa Lopez	909c9aacea	[mlir][amdgpu] Add lowerings for ScaledExtPacked816 (#168123 ) * Adds lowerings for amdgpu.scaled_ext_packed816 * Updates verifiers	2025-11-17 16:51:52 -05:00
Erick Ochoa Lopez	e468ea3f40	[mlir][amdgpu] Fix documentation and verifiers (#167369 )	2025-11-17 08:34:21 -05:00
Jakub Kuderski	466c526714	[mlir][amdgpu][rocdl] Add gfx1250 wmma ops (#165064 ) Update `amdgpu.wmma` op definition and implement amdgpu to rocdl conversion for new variants.	2025-10-28 12:42:39 -04:00
Jakub Kuderski	f248010a52	[mlir][amdgpu] Update mfma assembly format with intrinsic shape (#165037 ) Use the same format as introduced for wmma by https://github.com/llvm/llvm-project/pull/164920. Also make `blocks` default to 1.	2025-10-25 05:58:43 -04:00
Jakub Kuderski	dc5f274560	[mlir][amdgpu] Add explicit intrinsic shape to wmma (#164920 ) This is in preparation for adding support for gfx1250 wmma intrinsics that include many more possible shapes. Instead of guessing the wave32/wave64 mode based on element types and vector sizes, require the intrinsic shapes to be set explicitly as attributes.	2025-10-24 12:21:33 -04:00
Erick Ochoa Lopez	a76c71b205	[mlir][amdgpu] Add scaled_ext_packed{8,16} operations (#159830 )	2025-10-17 12:58:03 -04:00
Muzammil	5a6756d2a0	[mlir][AMDGPU] Replace use of SmallVector with ArrayRef, NFC (#163770 ) Improving choice of class used, from SmallVector to ArrayRef (https://llvm.org/docs/ProgrammersManual.html#llvm-adt-arrayref-h). Also infer template types when possible. Leftover from https://github.com/llvm/llvm-project/pull/155951. --------- Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>	2025-10-16 10:41:22 -04:00
Ivan Butygin	6ad662d322	[mlir][amdgpu] Add Inliner interface (#162873 ) All the `amdgpu` dialect ops can be inlined. --------- Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>	2025-10-10 21:34:00 +03:00
Muzammil	9628061e05	[mlir][AMDGPU] Add canonicalization pattern to pack scales for ScaledMFMAOp (#155951 ) The ScaledMFMAOp accepts scales as a vector of 4 bytes (`vector<4xf8E8M0FNU>`) that can be stored in a single register with a particular scale accessed using the `OpSel` attribute. Currently, we only use one byte in this 4-byte vector, resulting in 3 wasted registers. This is fixed by identifying when single byte extractions are performed and rewriting them into extractions of 4-byte vectors. Example: ``` %unit = vector.extract %ScaleSrc[offsets] : f8E8M0FNU from vector<?x?x?xf8E8M0FNU> %scale = vector.insert %unit, ... : f8E8M0FNU into vector<4xf8E8M0FNU> amdgpu.scaled_mfma(%scale[0] * ... ``` to ``` %reshaped = vector.shape_cast %ScaleSrc : vector<?x?x?xf8E8M0FNU> to vector<?x4xf8E8M0FNU> %scale = vector.extract %reshaped[?] : vector<4xf8E8M0FNU> from vector<?x4xf8E8M0FNU> amdgpu.scaled_mfma(%scale[0-3] * ... ``` --------- Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>	2025-09-18 19:25:14 +00:00
Mehdi Amini	6ec08132ee	[MLIR] Apply clang-tidy fixes for readability-identifier-naming in AMDGPUDialect.cpp (NFC)	2025-09-18 10:28:46 -07:00
Tim Gymnich	e20fa4f412	[mlir][AMDGPU] Add PermlaneSwapOp (#154345 ) - Add PermlaneSwapOp that lowers to `rocdl.permlane16.swap` and `rocdl.permlane32.swap` --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-08-21 18:21:43 +02:00
Quinn Dawkins	72bc1bea7a	[mlir][AMDGPU] Allow non-contiguous destination memrefs for gather_to_lds (#152559 ) The requirement that the LDS operand is contiguous is overly restrictive because it's perfectly valid to have a subview depend on subgroup IDs that is still subgroup contiguous. We could continue trying to do this verification based on the number of copied elements, but instead this change just opts to clarify the semantics on the op definition.	2025-08-07 17:24:15 -04:00
Quinn Dawkins	b7f889a29c	[mlir][AMDGPU] Add canonicalizer for folding casts into gather_to_lds (#150503 )	2025-07-24 19:58:30 -04:00
Krzysztof Drewniak	9052a85da8	[mlir][AMDGPU] Infer canonical layouts for fat_raw_buffer_cast resetOffset (#149867 ) When inferring the return type of amdgpu.fat_raw_buffer_cast with the offset reset, we would sometimes use a strided layout, like strided<[1]>, in cases where, after stripping the offset, the memref had the identity layout. This would cause issues with EmulateNarrowTypes, which does perform this layout canonicalization. Now, the return type inference will put in an identity layout after offset stripping for 1. Statically-shaped memrefs of any rank where the strides match the suffix product of the shape, and 2. Memrefs of rank <= 1 whose strides are [1] (or []) that just had their offset removed by resetOffset.	2025-07-21 15:18:19 -05:00
Ivan Butygin	6b29ee9d9a	[mlir][amdgpu] Properly handle mismatching memref ranks in `amdgpu.gather_to_lds` (#149407 ) This op doesn't have any rank or indices restrictions on src/dst memrefs, but was using `SameVariadicOperandSize` which was causing issues. Also fix some other issues while we're at it.	2025-07-18 00:42:25 +03:00
Daniel Hernandez-Juarez	668c964282	[AMDGPU] [MLIR] Add 96 and 128 bit GatherToLDS for gfx950 (#147496 ) This PR adds 96- and 128-bit gather_to_lds support for gfx950, updating the lowering, verifier, and tests.	2025-07-09 11:53:26 -04:00
Alan Li	3f3282cee8	[AMDGPU] Adding AMDGPU dialect wrapper for ROCDL transpose loads. (#145395 ) * 1-to-1 mapping wrapper op. * Direct lowering from AMDGPU wrapper to ROCDL intrinsics.	2025-06-25 22:58:14 -04:00
Tim Gymnich	67c590004d	[mlir][AMDGPU] Add scaled floating point conversion ops (#141554 ) implement `ScaledExtPackedOp` and `PackedScaledTruncOp`	2025-06-13 11:09:11 +02:00
Krzysztof Drewniak	4bdd116b80	[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425 ) This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to LDS from global (address space 1) pointers and buffer fat pointers (address space 7), since they use the same API and "gather from a pointer to LDS" is something of an abstract operation. This commit adds the intrinsic and its lowerings for addrspaces 1 and 7, and updates the MLIR wrappers to use it (loosening up the restrictions on loads to LDS along the way to match the ground truth from target features). It also plumbs the intrinsic through to clang.	2025-05-19 07:15:04 -07:00
Christian Sigg	3a6b9b3a87	[mlir][bazel] Fix after dae0ef53a0b99c6c2b74143baee5896e8bc5c8e7 Remove unnecessary include.	2025-04-08 15:47:14 +02:00
Alan Li	dae0ef53a0	[MLIR][AMDGPU] Add a wrapper for global LDS load intrinsics in AMDGPU (#133498 ) Defining a new `amdgpu.global_load` op, which is a thin wrapper around the ROCDL `global_load_lds` intrinsic, along with its lowering logic to `rocdl.global.load.lds`.	2025-04-08 09:18:30 -04:00
Krzysztof Drewniak	25622aa745	[mlir][AMDGPU] Add gfx950 MFMAs to the amdgpu.mfma op (#133553 ) This commit extends the lowering of amdgpu.mfma to handle the new double-rate MFMAs in gfx950 and adds tests for these operations. It also adds support for MFMAs on small floats (f6 and f4), which are implemented using the "scaled" MFMA intrinsic with a scale value of 0 in order to have an unscaled MFMA. This commit does not add an `amdgpu.scaled_mfma` operation, as that is future work. --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-04-01 11:59:09 -05:00
Mirza Halilčević	1fc49ff593	[MLIR][AMDGPU] Add OCP FP8 support for new hardware (#127728 ) (Continuing from #106160) This PR addresses remaining review comments from the original PR. Original PR Description --- Upcoming hardware (gfx12 and some future gfx9) will support the OCP 8-bit float formats for their matrix multiplication intrinsics and conversion operations, retaining existing opcodes and compiler builtins. This commit adds support for these types to the MLIR wrappers around such operations, ensuring that the OCP types aren't used to generate those builtins on hardware that doesn't expect that format and, conversely, to ensure that the pre-OCP formats aren't used on new hardware. --------- Signed-off-by: Mirza Halilcevic <mirza.halilcevic@amd.com> Co-authored-by: Paul Fuqua <pf@acm.org> Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	2025-03-03 14:10:31 -06:00
Krzysztof Drewniak	b31175a33a	[mlir][AMDGPU] Add int4 intrinsics, mixed-type fp8 to handle gfx12 (#128963 ) 1. Extend the gfx12 FP8 support to allow mixed-type intrinsics (since they've been added), creating limited mixed-type support that mirrors MFMA 2. Extend the `amdgpu.wmma` intrinsic lowering to correctly handle shorter vectors because gfx12 now has instructions that logically take a 4xi8, or, as far as LLVM's concerned, an i32. Similarly, there are 4xi4 inputs, which are an i16 (that must be zero-extended to i32). 3. Correctly handle the ambiguities in the int4 intrinsics on gfx12, which can either be 16x16x16 or 16x16x32 4. Add tests showing all WMMAs being lowered the way gfx12 expects (mirroring LLVM's tests) 5. Add a verifier to prevent emitting illegal instructions on gfx12.	2025-02-27 14:48:58 -06:00
Krzysztof Drewniak	42526d240c	[mlir][AMDGPU] Plumb address space 7 through MLIR, add address_space attr. (#125594 ) This commit adds support for casting memrefs into fat raw buffer pointers to the AMDGPU dialect. Fat raw buffer pointers (or, in LLVM terms, ptr addrspace(7)) allow encapsulating a buffer descriptor (as produced by the make.buffer.rsrc intrinsic or provided from some API) into a pointer that supports ordinary pointer operations like load or store. This allows people to take advantage of the additional semantics that buffer_load and similar instructions provide without forcing the use of entirely separate amdgpu.raw_buffer_* operations. Operations on fat raw buffer pointers are translated to the corresponding LLVM intrinsics by the backend. This commit also defines a #amdgpu.address_space<> attribute so that AMDGPU-specific memory spaces can be represented. Only #amdgpu.address_space<fat_raw_buffer> will work correctly with the memref dialect, but the other possible address spaces are included for completeness. --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Co-authored-by: Prashant Kumar <pk5561@gmail.com>	2025-02-26 16:02:39 -06:00
Matthias Springer	6aaa8f25b6	[mlir][IR][NFC] Move free-standing functions to `MemRefType` (#123465 ) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.	2025-01-21 08:48:09 +01:00
Matthias Springer	7a77f14c0a	[mlir][IR] Remove `isF...()` type API for low-precision FP types (#123326 ) Remove `type.isFloat4E2M1FN()` etc. Use `isa<Float4E2M1FNType>(type)` instead. For details, see: https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361/28	2025-01-20 09:22:53 +01:00
Frank Schlimbach	d5746d73ce	eliminating g++ warnings (#105520 ) Eliminating g++ warnings. Mostly declaring "[[maybe_unused]]", adding return statements where missing and fixing casts. @rengolin --------- Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech> Co-authored-by: Renato Golin <rengolin@systemcall.eu>	2024-10-18 21:20:47 +01:00
Giuseppe Rossini	a8e1c6f99a	[MLIR][AMDGPU] Add support for fp8 ops on gfx12 (#106388 ) This PR is adding support for `fp8` and `bfp8` on gfx12	2024-09-03 17:47:08 +01:00
Giuseppe Rossini	1387ba48a3	[MLIR][AMDGPU] Introduce fp16 packed arithmetic (#105688 ) This PR is introducing rocdl.cvt.pkrtz in the ROCDL dialect and it is using that instruction when lowering `arith::TruncFOp`.	2024-08-26 12:48:57 -05:00
stefankoncarevic	1164e4aef2	[mlir][AMDGPU] Implement AMDGPU DPP operation in MLIR. (#89233 ) Defined AMDGPU DPP operation in mlir to represent semantics. Introduced a new enumeration attribute for different permutations and allowed for different types of arguments. Implemented constant attribute handling for ROCDL::DPPMovOp operation. The operation now correctly accepts constant attributes for dppCtrl, rowMask, bankMask, boundCtrl, and passes them to the corresponding LLVM intrinsic.	2024-08-16 11:19:39 -05:00
Christian Sigg	a5757c5b65	Switch member calls to `isa/dyn_cast/cast/...` to free function calls. (#89356 ) This change cleans up call sites. Next step is to mark the member functions deprecated. See https://mlir.llvm.org/deprecation and https://discourse.llvm.org/t/preferred-casting-style-going-forward.	2024-04-19 15:58:27 +02:00
Jakub Kuderski	44718311de	[mlir][amdgpu] Remove shared memory optimization pass (#88225 ) This implementation has a number of issues and ultimately does not work on gfx9. * It does not reduce bank conflicts with wide memory accesses. * It does not correctly account for when LDS bank conflicts occur on amdgpu. * The implementation is too fragile to be used on real-world code. For example, the code bails out on any `memref.subview` in the root op, even when the subview is not a user of any of the `memref.alloc` ops. I do not see how these can be easily fixed, therefore I think it's better to delete this code.	2024-04-11 11:07:17 -04:00

70 Commits