llvm-project

Author	SHA1	Message	Date
Erick Ochoa Lopez	1fcfd5c67b	[mlir][amdgpu] Sink op creation in scaled conversion intrinsics (NFC) (#168542 ) Where possible: * notifyMatchFailure calls happen first * then op.emitOpError * finally assertions / op creation. --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-11-18 10:35:05 -05:00
Erick Ochoa Lopez	909c9aacea	[mlir][amdgpu] Add lowerings for ScaledExtPacked816 (#168123 ) * Adds lowerings for amdgpu.scaled_ext_packed816 * updates verifiers	2025-11-17 16:51:52 -05:00
Muzammiluddin Syed	b1262d13e0	[mlir][ROCDL] Refactor wmma intrinsics to use attributes not operands where possible (#167041 ) The current implementation of the WMMA intrinsic ops as they are defined in the ROCDL tablegen is incorrect. They represent as operands what should be attributes such as `clamp`, `opsel`, `signA/signB`. This change performs a refactoring to bring it in line with what we expect. --------- Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>	2025-11-13 19:50:02 -05:00
Jakub Kuderski	ba0be89cd2	[mlir] Simplify Default cases in type switches. NFC. (#165767 ) Use default values instead of lambdas when possible. `std::nullopt` and `nullptr` can be used now because of https://github.com/llvm/llvm-project/pull/165724.	2025-10-30 15:10:59 -04:00
Jakub Kuderski	3167752f29	[mlir][amdgpu][rocdl] Allow for graceful wmma conversion failures (#165616 )	2025-10-29 16:09:59 -04:00
Jakub Kuderski	466c526714	[mlir][amdgpu][rocdl] Add gfx1250 wmma ops (#165064 ) Update `amdgpu.wmma` op definition and implement amdgpu to rocdl conversion for new variants.	2025-10-28 12:42:39 -04:00
Jakub Kuderski	dc5f274560	[mlir][amdgpu] Add explicit intrinsic shape to wmma (#164920 ) This is in preparation for adding support for gfx1250 wmma intrinsics that include much more possible shapes. Instead of guessing the wave32/wave64 mode based on element types and vector sizes, require the intrinsic shapes to be set explicitly as attributes.	2025-10-24 12:21:33 -04:00
Jakub Kuderski	ae11c5c2c4	[mlir] Switch uses of deprecated .create methods to free function. NFC. (#164635 ) See https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.	2025-10-22 14:51:03 +00:00
Shilei Tian	2195fe7e01	[AMDGPU] Add the support for 45-bit buffer resource (#159702 ) On new targets like `gfx1250`, the buffer resource (V#) now uses this format: ``` base (57-bit): resource[56:0] num_records (45-bit): resource[101:57] reserved (6-bit): resource[107:102] stride (14-bit): resource[121:108] ``` This PR changes the type of `num_records` from `i32` to `i64` in both builtin and intrinsic, and also adds the support for lowering the new format. Fixes SWDEV-554034. --------- Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	2025-09-24 11:12:02 -04:00
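The new V# layout above can be sanity-checked with a small Python sketch that packs the four fields into one integer. It models only the bit positions listed in the commit message, not LLVM's actual lowering, and the names `V_SHARP_FIELDS`, `pack_v_sharp` and `unpack_v_sharp` are hypothetical. It also shows why `num_records` needed to grow past `i32`: a 45-bit field can hold counts an `i32` cannot.

```python
# Bit layout of the gfx1250 buffer resource (V#) as given in the commit
# message: (field name, lowest bit, width in bits).
V_SHARP_FIELDS = [
    ("base", 0, 57),          # resource[56:0]
    ("num_records", 57, 45),  # resource[101:57]
    ("reserved", 102, 6),     # resource[107:102]
    ("stride", 108, 14),      # resource[121:108]
]


def pack_v_sharp(**fields: int) -> int:
    """Pack named fields into one integer; fields not given are zero."""
    word = 0
    for name, lo, width in V_SHARP_FIELDS:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lo
    return word


def unpack_v_sharp(word: int) -> dict:
    """Extract every field back out of a packed descriptor."""
    return {name: (word >> lo) & ((1 << width) - 1)
            for name, lo, width in V_SHARP_FIELDS}


# 2**40 records would overflow an i32 num_records but fits in 45 bits.
desc = pack_v_sharp(base=0x1000, num_records=2**40, stride=16)
print(unpack_v_sharp(desc))
```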
Krzysztof Drewniak	5ecc6d1951	[mlir][AMDGPU] Use LDS-only MMRA fences for lds_barrier (#157919 ) amdgpu.lds_barrier is an operation whose semantics are "s.barrier, and all LDS operations before this happen-before LDS operations after this, and there must not be an inherent fence/forcing-to-completion of global memory (for performance)". It was previously lowered through manual calls to waitcnt() intrinsics and the s_barrier intrinsic(s). The lack of explicit fencing enabled miscompiles (where LDS accesses were reordered with the barrier) on gfx12. Since LLVM now allows MMRA annotations to ensure that only LDS accesses are fenced by a pair of fences, we can use these fences to explicitly represent the semantics we want instead of trying to prescribe the method of their implementation. Note that the gfx908 workaround of hiding the s_barrier in inline assembly to prevent spurious vmem barriers remains in place, but it is removed for gfx11 because the fences were recently changed to give us the effect we want.	2025-09-23 14:00:09 -05:00
Gaurav Verma	a2a9601ea4	[mlir][AMDGPU] Updated `PermlaneSwapOp` to select correct val (#157586 ) * as per the instruction description, updated `PermlaneSwapOp` to select correct val * updated corresponding lit tests Issue it resolves: the block reduction was failing because we were always selecting `{0}`. --------- Signed-off-by: xintin <gaurav.verma@amd.com>	2025-09-12 13:45:56 +02:00
Tim Gymnich	003cbbd4ca	[mlir][amdgpu] Promote gpu.shuffle to amdgpu.permlane_swap (#154933 ) - promote `gpu.shuffle %src xor {16,32} 64` to `amdgpu.permlane_swap %src {16,32}`	2025-08-24 12:41:09 +02:00
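The pattern being promoted is an xor shuffle with offset 16 or 32 across a 64-lane wave. A minimal Python model of just the `gpu.shuffle xor` semantics makes the data movement concrete: lane `i` reads from lane `i ^ mask`, so mask 16 swaps adjacent 16-lane rows and mask 32 swaps the two halves. This sketch assumes one value per lane and a power-of-two wave width; it does not model `rocdl.permlane16.swap` / `rocdl.permlane32.swap` themselves, and `shuffle_xor` is a hypothetical helper.

```python
def shuffle_xor(values: list, mask: int) -> list:
    """Model gpu.shuffle xor: lane i receives the value held by lane (i ^ mask).

    Assumes one value per lane and a power-of-two wave width.
    """
    width = len(values)
    assert width & (width - 1) == 0 and 0 <= mask < width
    return [values[lane ^ mask] for lane in range(width)]


lanes = list(range(64))  # each lane holds its own id in a 64-wide wave
xor16 = shuffle_xor(lanes, 16)  # swaps rows 0..15 <-> 16..31, 32..47 <-> 48..63
xor32 = shuffle_xor(lanes, 32)  # swaps lanes 0..31 <-> 32..63
print(xor16[:4], xor16[16:20], xor32[:4])
```

Because xor with a fixed mask is its own inverse, applying the same shuffle twice returns every lane to its original value.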
Tim Gymnich	e20fa4f412	[mlir][AMDGPU] Add PermlaneSwapOp (#154345 ) - Add PermlaneSwapOp that lowers to `rocdl.permlane16.swap` and `rocdl.permlane32.swap` --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-08-21 18:21:43 +02:00
Maksim Levental	c610b24493	[mlir][NFC] update `mlir/Dialect` create APIs (27/n) (#150638 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-25 11:48:32 -05:00
Maksim Levental	8e8f195322	[mlir][amd] fix LLVM::InsertValueOp::create failure to disambiguate (#150605 ) fixes https://github.com/llvm/llvm-project/pull/149879#issuecomment-3117145615 Note this happens because ADL can't disambiguate between `mlir::DenseI64ArrayAttr` and `llvm::ArrayRef<int64_t>` for the value 0 which I guess is equal to nullptr on some (most?) systems. Note, this only occurs with the value 0.	2025-07-25 07:56:27 -04:00
Maksim Levental	b0434925c9	[mlir][NFC] update `Conversion` create APIs (4/n) (#149879 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-23 10:49:35 -05:00
Ivan Butygin	4977100624	[mlir][amdgpu] Add `rocdl.s.waitcnt` wrapper (#149670 ) The main motivation is to pass vmcnt/expcnt/lgkmcnt values directly (similar to the asm format) and delegate architecture-dependent bitpacking to the amdgpu->rocdl lowering. --------- Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>	2025-07-22 23:37:56 +03:00
Daniel Hernandez-Juarez	668c964282	[AMDGPU] [MLIR] Add 96 and 128 bit GatherToLDS for gfx950 (#147496 ) This PR adds 96- and 128-bit gather_to_lds support for gfx950, updating the lowering, verifier and tests.	2025-07-09 11:53:26 -04:00
Alan Li	3f3282cee8	[AMDGPU] Adding AMDGPU dialect wrapper for ROCDL transpose loads. (#145395 ) * 1-to-1 mapping wrapper op. * Direct lowering from AMDGPU wrapper to ROCDL intrinsics.	2025-06-25 22:58:14 -04:00
Umang Yadav	836201f117	Allow bf16 operands on new MFMAs (#144925 ) New gfx950 MFMA allows bf16 operands. `c0cc81cdc0/llvm/include/llvm/IR/IntrinsicsAMDGPU.td (L3434)` When running `amdgpu-to-rocdl`, the current logic always converts bf16 to i16, which fails to compile for newer bf16 MFMAs, e.g. `v_mfma_f32_16x16x32bf16`. The backend expects bf16 operand types for those newer MFMAs. This patch fixes it. CC: @krzysz00 @dhernandez0 @giuseros @antiagainst @kuhar	2025-06-19 12:52:31 -05:00
Daniel Hernandez-Juarez	68b6f392ed	[MLIR][AMDGPU] Fix bug in GatherToLDSOpLowering, get the correct MemRefType for destination (#142915 ) This PR fixes a bug in GatherToLDSOpLowering, we were getting the MemRefType of source for the destination. Additionally, some related typos are corrected. CC: @krzysz00 @umangyadav @lialan	2025-06-13 11:33:51 -05:00
Tim Gymnich	67c590004d	[mlir][AMDGPU] Add scaled floating point conversion ops (#141554 ) implement `ScaledExtPackedOp` and `PackedScaledTruncOp`	2025-06-13 11:09:11 +02:00
Bruno Cardoso Lopes	05494f3bad	[MLIR][LLVM] Tail call support for inline asm op (#140826 )	2025-05-22 15:30:31 -07:00
Krzysztof Drewniak	6c813e8a3c	[mlir][ROCDL] Add fp4 and fp6 conversion intrinsics, fix fp8 immargs (#140801 ) This PR adds support for the scaled conversion intrinsics for fp4 and fp6 types so that they can be targeted by a future amdgpu dialect op or used directly. Additionally, this patch refactors the copy-paste-heavy fp8 versions of these scaled conversion intrinsics with tablegen `foreach` loops, and fixes the fact that certain immargs weren't being stored as attributes. Note that some of the MLIR-level tests for those scaled fp8 intrinsics had incorrect return types, which have been fixed. (Note that while the operations have a known return type, the IR format still prints that type for clarity).	2025-05-21 13:50:02 -07:00
Peiyong Lin	04ad8d4900	Emit inbounds and nuw attributes in memref. (#138984 ) Now that MLIR accepts nuw and nusw in getelementptr, this patch emits the inbounds and nuw attributes when lowering memref load and store operations to LLVM. This patch also strengthens the memref.load and memref.store spec about undefined behaviour during lowering. This patch also lifts the |rewriter| parameter in getStridedElementPtr ahead so that LLVM::GEPNoWrapFlags can be added at the end with a default value and grouped together with other operators' parameters. Signed-off-by: Lin, Peiyong <linpyong@gmail.com>	2025-05-20 14:16:22 -07:00
Krzysztof Drewniak	4bdd116b80	[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425 ) This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to LDS from global (address space 1) pointers and buffer fat pointers (address space 7), since they use the same API and "gather from a pointer to LDS" is something of an abstract operation. This commit adds the intrinsic and its lowerings for addrspaces 1 and 7, and updates the MLIR wrappers to use it (loosening up the restrictions on loads to LDS along the way to match the ground truth from target features). It also plumbs the intrinsic through to clang.	2025-05-19 07:15:04 -07:00
Muzammil	105ce585d3	[mlir][amdgpu] Define an amdgpu.scaling_mfma wrapper (#137498 ) Create a wrapper around the new scaled MFMAs that operate on specific element types and tile sizes. See [Issue](https://github.com/iree-org/iree/issues/20616). --------- Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>	2025-05-02 18:07:17 -05:00
Ivan Butygin	dda4b968e7	[mlir] AMDGPUToROCDL: lower `amdgpu.swizzle_bitmode` (#136223 ) Repack `amdgpu.swizzle_bitmode` arguments and lower it to `rocdl.ds_swizzle`. Repacking logic is as follows: * `sizeof(arg) < sizeof(i32)`: bitcast to integer and zext to i32 and then trunc and bitcast back. * `sizeof(arg) == sizeof(i32)`: just bitcast to i32 and back if not i32 * `sizeof(arg) > sizeof(i32)`: bitcast to `vector<Nxi32>`, extract individual elements and do a series of `rocdl.ds_swizzle` and then compose vector and bitcast back. Added repacking logic to LLVM utils so it can be used elsewhere. I'm planning to use it for `gpu.shuffle` later.	2025-04-18 17:19:04 +03:00
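The three repacking cases above can be modeled on plain bit patterns in Python. In this sketch, Python integers stand in for raw bits, so the bitcasts are no-ops; `op32` is a hypothetical stand-in for the per-`i32` `rocdl.ds_swizzle`, which in reality moves data between lanes rather than acting as a pure function. Only the split/apply/recompose structure is what the sketch aims to show, and it assumes wide types bitcast cleanly to `vector<Nxi32>` (a multiple of 32 bits).

```python
MASK32 = 0xFFFF_FFFF


def repack_i32(value: int, bits: int, op32) -> int:
    """Run a 32-bit-only operation `op32` on a `bits`-wide bit pattern."""
    assert 0 <= value < (1 << bits)
    if bits < 32:
        # zext to i32 (the pattern already has zero upper bits), op, trunc back
        return op32(value) & ((1 << bits) - 1)
    if bits == 32:
        return op32(value) & MASK32
    # wider than i32: view as vector<Nxi32>, one op per element, recompose
    assert bits % 32 == 0, "assumes the type bitcasts cleanly to vector<Nxi32>"
    result = 0
    for i in range(bits // 32):
        word = (value >> (32 * i)) & MASK32
        result |= (op32(word) & MASK32) << (32 * i)
    return result


def rotl8(w: int) -> int:
    """Hypothetical per-word operation: rotate a 32-bit word left by 8."""
    return ((w << 8) | (w >> 24)) & MASK32


print(hex(repack_i32(0x11223344_55667788, 64, rotl8)))
```

Each 32-bit element of the 64-bit value is transformed independently, which is why the wide case needs one operation per element.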
Alan Li	dae0ef53a0	[MLIR][AMDGPU] Add a wrapper for global LDS load intrinsics in AMDGPU (#133498 ) Defining a new `amdgpu.global_load` op, which is a thin wrapper around ROCDL `global_load_lds` intrinsic, along with its lowering logic to `rocdl.global.load.lds`.	2025-04-08 09:18:30 -04:00
Krzysztof Drewniak	25622aa745	[mlir][AMDGPU] Add gfx950 MFMAs to the amdgpu.mfma op (#133553 ) This commit extends the lowering of amdgpu.mfma to handle the new double-rate MFMAs in gfx950 and adds tests for these operations. It also adds support for MFMAs on small floats (f6 and f4), which are implemented using the "scaled" MFMA intrinsic with a scale value of 0 in order to have an unscaled MFMA. This commit does not add an `amdgpu.scaled_mfma` operation, as that is future work. --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-04-01 11:59:09 -05:00
Yi Qian	0ea4fb9264	[AMD][ROCDL] Add packed conversions fp8/bf8->bf16 and fp8/bf8->fp32 in ROCDL dialect (#131850 ) - Add packed conversions fp8/bf8->bf16 for gfx950 and fp8/bf8->fp32 for gfx942 in ROCDL dialect - Update amdgpu.ext_packed_fp8 lowering to use ROCDL packed fp8/bf8->f32 conversions for vector target types and ROCDL scalar fp8/bf8->fp32 for scalar target type. --------- Co-authored-by: Jungwook Park <jungwook.park@amd.com>	2025-03-21 14:49:50 +00:00
Mirza Halilčević	1fc49ff593	[MLIR][AMDGPU] Add OCP FP8 support for new hardware (#127728 ) (Continuing from #106160) This PR addresses remaining review comments from the original PR. Original PR Description --- Upcoming hardware (gfx12 and some future gfx9) will support the OCP 8-bit float formats for their matrix multiplication intrinsics and conversion operations, retaining existing opcodes and compiler builtins. This commit adds support for these types to the MLIR wrappers around such operations, ensuring that the OCP types aren't used to generate those builtins on hardware that doesn't expect that format and, conversely, to ensure that the pre-OCP formats aren't used on new hardware. --------- Signed-off-by: Mirza Halilcevic <mirza.halilcevic@amd.com> Co-authored-by: Paul Fuqua <pf@acm.org> Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	2025-03-03 14:10:31 -06:00
Krzysztof Drewniak	b31175a33a	[mlir][AMDGPU] Add int4 intrinsics, mixed-type fp8 to handle gfx12 (#128963 ) 1. Extend the gfx12 FP8 support to allow mixed-type intrinsics (since they've been added), creating limited mixed-type support that mirrors MFMA 2. Extend the `amdgpu.wmma` intrinsic lowering to correctly handle shorter vectors because gfx12 now has instructions that logically take a 4xi8, or, as far as LLVM's concerned, an i32. Similarly, there are 4xi4 inputs, which are an i16 (that must be zero-extended to i32). 3. Correctly handle the ambiguities in the int4 intrinsics on gfx12, which can either be 16x16x16 or 16x16x32 4. Add tests showing all WMMAs being lowered the way gfx12 expects (mirroring LLVM's tests) 5. Add a verifier to prevent emitting illegal instructions on gfx12.	2025-02-27 14:48:58 -06:00
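Point 2 above notes that a 4xi4 operand is an i16 that must be zero-extended to i32. A short Python sketch shows why the extension kind matters: when the last lane is negative, bit 15 of the packed i16 is set, and sign extension would fill the upper 16 bits with ones. The lane order (lane 0 in bits [3:0], i.e. little-endian lane numbering) is an assumption of this sketch, and `pack_i4x4`, `zext_i16_to_i32` and `sext_i16_to_i32` are hypothetical helpers.

```python
def pack_i4x4(lanes: list) -> int:
    """Bitcast vector<4xi4> to an i16 bit pattern, lane 0 in bits [3:0]."""
    assert len(lanes) == 4 and all(-8 <= v <= 7 for v in lanes)
    packed = 0
    for i, v in enumerate(lanes):
        packed |= (v & 0xF) << (4 * i)  # two's-complement nibble of each lane
    return packed


def zext_i16_to_i32(x: int) -> int:
    return x & 0xFFFF  # upper 16 bits become zero


def sext_i16_to_i32(x: int) -> int:
    return x | 0xFFFF_0000 if x & 0x8000 else x  # copies bit 15 upward


packed = pack_i4x4([-1, 2, -8, -1])
print(hex(packed), hex(zext_i16_to_i32(packed)), hex(sext_i16_to_i32(packed)))
```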
Krzysztof Drewniak	42526d240c	[mlir][AMDGPU] Plumb address space 7 through MLIR, add address_space attr. (#125594 ) This commit adds support for casting memrefs into fat raw buffer pointers to the AMDGPU dialect. Fat raw buffer pointers - or, in LLVM terms, ptr addrspace(7), allow encapsulating a buffer descriptor (as produced by the make.buffer.rsrc intrinsic or provided from some API) into a pointer that supports ordinary pointer operations like load or store. This allows people to take advantage of the additional semantics that buffer_load and similar instructions provide without forcing the use of entirely separate amdgpu.raw_buffer_* operations. Operations on fat raw buffer pointers are translated to the corresponding LLVM intrinsics by the backend. This commit also defines a #amdgpu.address_space<> attribute so that AMDGPU-specific memory spaces can be represented. Only #amdgpu.address_space<fat_raw_buffer> will work correctly with the memref dialect, but the other possible address spaces are included for completeness. --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Co-authored-by: Prashant Kumar <pk5561@gmail.com>	2025-02-26 16:02:39 -06:00
Ivan Butygin	6e611264c6	[mlir] AMDGPUToROCDL: handle 1-element vectors (#128266 ) Buffer intrinsics don't support 1-element vectors, so cast them to scalars.	2025-02-23 03:50:59 +03:00
lorenzo chelini	0b63dfb066	[MLIR][NFC] Use base alias for constructor inheritance (#127756 ) During my previous cleanup (#127403), I did not notice that we defined a type alias for the base class. This type alias allows us to use the shorter form Base::Base, and this PR switches to that.	2025-02-19 16:05:25 +01:00
Fabian Ritter	8900e412ae	[AMDGPU][MLIR] Replace gfx940 and gfx941 with gfx942 in MLIR (#125836 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631	2025-02-19 10:05:45 +01:00
lorenzo chelini	c1a2292526	[MLIR][NFC] Retire `let constructor` for passes in Conversion directory (part1) (#127403 ) `let constructor` is deprecated since the table gen backend emits most of the glue logic to build a pass. This PR retires the td method for most (I need another pass) passes in the Conversion directory.	2025-02-17 10:55:27 +01:00
Krzysztof Drewniak	efd0a7f446	[mlir][ROCDL][~NFC] Migrate to LLVM dialect default builders (#125609 ) There were a bunch of spots in ROCDL.td where we were defining our own llvmBuilder call which could have been generated using the default built-in one on LLVM_IntrOpBase. This commit cleans up such usages in the interests of potentially enabling ROCDL import in the future and of making best practices more obvious. The one breaking change is renaming WaitcntOp to SWaitcntOp, which should have minimal impact.	2025-02-06 11:38:43 -06:00
Matthias Springer	6aaa8f25b6	[mlir][IR][NFC] Move free-standing functions to `MemRefType` (#123465 ) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.	2025-01-21 08:48:09 +01:00
Matthias Springer	7a77f14c0a	[mlir][IR] Remove `isF...()` type API for low-precision FP types (#123326 ) Remove `type.isFloat4E2M1FN()` etc. Use `isa<Float4E2M1FNType>(type)` instead. For details, see: https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361/28	2025-01-20 09:22:53 +01:00
Fabian Mora	0c1c49f0ff	[mlir][AMDGPU] Fix raw buffer ptr ops lowering (#122293 ) This patch fixes several bugs in the lowering of AMDGPU raw buffer operations. These bugs include: - Incorrectly handling the offset of the memref, causing errors when using subviews. - Using the MaximumOp (float specific op) to calculate the number of records. - The number of records in the static shape case. - The lowering when index bitwidth=i64. Furthermore this patch also switches to use MLIR's data layout to get the type size. --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-01-13 16:11:33 -05:00
Ivan Butygin	953b07febc	[mlir] AMDGPUToROCDL: RawBufferOpLowering fixes (#120642 ) 1. We can use `getNumElements()` only for memrefs with trivial layout. 2. Buffer ops expect sizes in i32, but descriptor values can be either i32 or i64, so add appropriate casts. This implementation is not ideal as it can overflow, but it's still better than generating broken IR.	2024-12-20 18:09:01 +03:00
Krzysztof Drewniak	3452149c05	[mlir][AMDGPU] Support vector<2xbf16> packed atomic fadd (#113929 ) Now that we use LLVM's native bfloat types in the AMDGPU lowering, enable vector<2xbf16> for AMDGPU.	2024-10-31 10:52:53 -05:00
Benoit Jacob	d8a656ffaf	[MLIR] AMDGPUToROCDL: Use a bitcast op to reinterpret a vector of i8 as single integer. (#111400 ) Found by inspecting AMDGPU assembly - so the arithmetic ops created there were definitely making their way into the target ISA. A `LLVM::BitcastOp` seems equivalent, and evaporates as expected in the target asm. Along the way, I thought that this helper function `mfmaConcatIfNeeded` could be renamed to `convertMFMAVectorOperand` to better convey its contract; so I don't need to think about whether a bitcast is a legitimate "concat" :-) --------- Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>	2024-10-07 14:14:18 -04:00
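The equivalence claimed above can be checked on bit patterns: concatenating four i8 lanes with a zext/shl/or chain gives the same i32 as reinterpreting their four bytes in place, provided lane 0 lands in the low byte (little-endian, which this sketch assumes). The arithmetic route has to zero-extend each possibly negative lane first; the bitcast needs no arithmetic at all. Both helper names are hypothetical, and the arithmetic version is only an illustration of "concat via arithmetic", not the code that was removed.

```python
import struct


def concat_with_arith(lanes: list) -> int:
    """zext/shl/or chain: build an i32 from four i8 lanes, lane 0 lowest."""
    out = 0
    for i, v in enumerate(lanes):
        out |= (v & 0xFF) << (8 * i)  # zext the (possibly negative) i8 first
    return out


def concat_with_bitcast(lanes: list) -> int:
    """Reinterpret the same four bytes as one little-endian 32-bit word."""
    return struct.unpack("<I", struct.pack("<4b", *lanes))[0]


lanes = [1, -1, 127, -128]
print(hex(concat_with_arith(lanes)), hex(concat_with_bitcast(lanes)))
```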
Matthias Springer	206fad0e21	[mlir][NFC] Mark type converter in `populate...` functions as `const` (#111250 ) This commit marks the type converter in `populate...` functions as `const`. This is useful for debugging. Patterns already take a `const` type converter. However, some `populate...` functions do not only add new patterns, but also add additional type conversion rules. That makes it difficult to find the place where a type conversion was added in the code base. With this change, all `populate...` functions that only populate patterns now have a `const` type converter. Programmers can then conclude from the function signature that these functions do not register any new type conversion rules. Also some minor cleanups around the 1:N dialect conversion infrastructure, which did not always pass the type converter as a `const` object internally.	2024-10-05 21:32:40 +02:00
Daniel Hernandez-Juarez	b014265d99	[mlir][AMDGPU] New gfx12 barrier instructions and update lowering LDSBarrierOp (#109273 ) New gfx12 barrier instructions: s.barrier.signal, s.barrier.wait and s.wait.dscnt. And update lowering LDSBarrierOp accordingly. CC: @krzysz00 @manupak @giuseros	2024-09-20 17:41:36 -05:00
Krzysztof Drewniak	6292ea6879	[mlir][AMDGPU] Remove an old bf16 workaround (#108409 ) The AMDGPU backend now implements LLVM's `bfloat` type. Therefore, we no longer need to type convert MLIR's `bf16` to `i16` during lowerings to ROCDL. As a result of this change, we discovered that, while the code for MFMA and WMMA intrinsics was mainly prepared for this change, we were failing to bitcast the bf16 results of WMMA operations out from the i16 they're natively represented as. This commit also fixes that issue. --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2024-09-12 17:45:39 -05:00
Krzysztof Drewniak	9596e83b2a	[mlir][AMDGPU] Enable emulating vector buffer_atomic_fadd on gfx11 (#108312 ) * Fix a bug introduced by the Chipset refactoring in #107720 where atomics emulation for adds was mistakenly applied to gfx11+ * Add the case needed for gfx11+ atomic emulation, namely that gfx11 doesn't support atomically adding a v2f16 or v2bf16, thus requiring MLIR-level legalization for buffer intrinsics that attempt to do such an addition * Add tests, including tests for gfx11 atomic emulation Co-authored-by: Manupa Karunaratne <manupa.karunaratne@amd.com>	2024-09-12 09:47:52 -05:00
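One common way to legalize an atomic add the hardware lacks, such as a packed v2f16 add, is a compare-and-swap loop: load the old word, compute the new word, and retry the CAS until no other writer intervened. The Python sketch below assumes that technique; it is not a transcription of the MLIR legalization, and `Memory`, `add_v2f16` and `emulated_atomic_fadd_v2f16` are hypothetical. Python's `struct` half-precision format (`e`) stands in for f16, with rounding to f16 happening when the sum is packed. In this single-threaded model the loop always succeeds on the first try.

```python
import struct


class Memory:
    """Toy 32-bit word memory with a compare-and-swap primitive."""

    def __init__(self):
        self.words = {}

    def load(self, addr: int) -> int:
        return self.words.get(addr, 0)

    def cmpswap(self, addr: int, expected: int, desired: int) -> int:
        """If mem[addr] == expected, store desired; always return the old value."""
        old = self.load(addr)
        if old == expected:
            self.words[addr] = desired
        return old


def add_v2f16(word: int, a: float, b: float) -> int:
    """Treat a 32-bit word as two f16 lanes and add (a, b) lane-wise."""
    lo, hi = struct.unpack("<2e", struct.pack("<I", word))
    return struct.unpack("<I", struct.pack("<2e", lo + a, hi + b))[0]


def emulated_atomic_fadd_v2f16(mem: Memory, addr: int, a: float, b: float) -> int:
    """CAS loop: retry until no other writer changed the word in between."""
    old = mem.load(addr)
    while True:
        seen = mem.cmpswap(addr, old, add_v2f16(old, a, b))
        if seen == old:
            return old  # like an atomic RMW, return the previous value
        old = seen


mem = Memory()
initial = struct.unpack("<I", struct.pack("<2e", 1.0, 2.0))[0]
mem.words[0] = initial
previous = emulated_atomic_fadd_v2f16(mem, 0, 0.5, 0.25)
print(struct.unpack("<2e", struct.pack("<I", mem.load(0))))
```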
Krzysztof Drewniak	aa60a3e4d0	[mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic fadd (#108286 ) Extend the lowering of atomic.fadd to support the v2f16 variant available on some AMDGPU chips. Re-lands #108238 (and addresses review comments from there) Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>	2024-09-11 17:51:07 -05:00


93 Commits