llvm-project

Author	SHA1	Message	Date
Dmitry Chigarev	a9fb8b0622	[MLIR][XeGPU] Support vector.contract transpose_a/transpose_b via 'vector-to-gpu' patterns (#182885 ) The PR adds [`vector.contract(transpose_a/transpose_b)` decomposition patterns](`3215645b8d/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp (L1263)`) from `vector-to-gpu` to the `vector-to-xegpu` pass. The `populatePrepareVectorToMMAPatterns` adds two patterns: 1. `PrepareContractToGPUMMA` that splits `vector.contract(transpose)` into `vector.transpose + vector.contract` 2. `CombineTransferReadOpTranspose` that fuses `vector.transpose` into the permutation map of `vector.transfer_read` The second pattern doesn't always bring us to the desired result (`xegpu.load_nd + vector.transpose + xegpu.dpas`) since [not all data types are supported](`1237bd6df0/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp (L570-L575)`) for the transposed-read case. There's a second PR (#182875) on this matter that adds a decomposition pattern for unsupported types (it might seem strange that we first fuse and then decompose transfer_read+transpose, but this way we don't have code duplication between the vector-to-gpu and vector-to-xegpu passes and still cover all functional cases) --------- Signed-off-by: dchigarev <dmitry.chigarev@intel.com>	2026-03-05 17:08:55 +01:00
Dmitry Chigarev	3925d112c4	[MLIR][XeGPU] Decompose unsupported 'vector.transfer_read'-transpose-permutations (#182875 ) The PR adds a pattern to `vector-to-xegpu` pass that decomposes `vector.transfer_read` with unsupported transpose-permutations (unsupported element-type) into `vector.transfer_read + vector.transpose`: Example: ```mlir // input-ir: %0 = vector.transfer_read %source[%offset, %offset], %c0 {permutation_map = affine_map<(d0, d1) -> (d1, d0)>, in_bounds = [true, true]} : memref<32x64xf16>, vector<8x16xf16> // mlir-opt %s --convert-vector-to-xegpu // before PR (no conversion because of unsupported type): %0 = vector.transfer_read %source[%offset, %offset], %c0 {permutation_map = affine_map<(d0, d1) -> (d1, d0)>, in_bounds = [true, true]} : memref<32x64xf16>, vector<8x16xf16> // mlir-opt %s --convert-vector-to-xegpu // after PR (decomposed + converted): %0 = xegpu.load_nd %source[%offset, %offset] %1 = vector.transpose %0 ``` --------- Signed-off-by: dchigarev <dmitry.chigarev@intel.com>	2026-03-05 17:08:22 +01:00
Mehdi Amini	bbd5b1d3bd	[mlir][VectorToXeGPU] Fix crash on memref with non-scalar element type (#183905 ) The vector.store and vector.load lowering in --convert-vector-to-xegpu would crash when the source memref had a non-integer/float element type (e.g. memref<?xvector<4xf32>>). The crash occurred inside createNdDescriptor() when computing the byte offset for dynamic memrefs: srcTy.getElementTypeBitWidth() internally calls getIntOrFloatBitWidth() which asserts on non-scalar types such as vector<4xf32>. Fix by adding a check for the memref's element type in storeLoadPreconditions(). If the element type is not an integer or float, the pattern returns notifyMatchFailure() instead of proceeding and crashing. The same guard is applied to TransferReadLowering and TransferWriteLowering which share the same helper and can hit the same path. Fixes #181463	2026-03-03 11:33:03 +00:00
Jakub Kuderski	59e44799bd	[mlir] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178487 ) Pre-committing this before landing the new check in https://github.com/llvm/llvm-project/pull/177892	2026-01-28 19:13:47 +00:00
Stefan Weigl-Bosker	db2f0f87b9	[MLIR][XeGPU]: Reject `tensor_desc` types with unknown bitwidth (#173922 ) Fixes https://github.com/llvm/llvm-project/issues/173851 1. Only allow XeGPU_ScalarType element types in `xegpu::TensorDescType` (via verifier, keeping mlir::Type params in api) 2. Fix `VectorToXeGPU` to prevent vectors with invalid TensorDescType element types from lowering	2026-01-20 13:12:33 -08:00
Sang Ik Lee	8b0a24a50d	[MLIR] Vector to XeGPU conversion: Use proper source variant for create_nd_tdesc op creation. (#171216 ) If source strided memref is not fully static - at least one of shape, strides, offset is kDynamic - use i64 source variant. With this change, xegpu.create_nd_tdesc created by lowering from vector dialect, can rely on getMixedOffsets, getMixedSize and getMixedStrides to get relevant values.	2025-12-18 11:03:51 -08:00
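The rule in the commit above ("not fully static" means at least one of shape, strides, or offset is dynamic) can be sketched in Python. This is only an illustration of the decision, not the pass's C++ code: `K_DYNAMIC` is a stand-in sentinel, the function names are hypothetical, and the `"memref"` label for the fully static case is an assumption, since the commit only names the i64 variant.

```python
# Stand-in sentinel for a dynamic extent (assumption for this sketch).
K_DYNAMIC = -(2**63)


def is_fully_static(shape, strides, offset):
    """True when no shape entry, stride, or offset is dynamic."""
    return K_DYNAMIC not in (*shape, *strides, offset)


def pick_source_variant(shape, strides, offset):
    """Choose the descriptor source kind as described in the commit.

    Returns "i64" as soon as anything is dynamic; the "memref" label for
    the static case is a hypothetical name used only in this sketch.
    """
    return "memref" if is_fully_static(shape, strides, offset) else "i64"
```

For example, a `memref<8x16xf16>` with strides `[16, 1]` and offset `0` stays on the static path, while the same memref with a dynamic offset switches to the i64 variant.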
Nishant Patel	5fc8e87fe2	[MLIR][XeGPU] Retain anchor op layouts for XeGPU nD ops (#170934 ) This PR adds support to retain the anchor op layouts (after dropping what's not required) for xegpu nD ops during workgroup to subgroup & unroll transformation	2025-12-05 21:49:13 -08:00
Dmitry Chigarev	d90bc3bc60	[mlir][XeGPU][VectorToXeGPU] Use 'xegpu.load' to lower 1D 'vector.transfer_read' for PVC & BMG (#168910 ) The PR changes the `TransferReadLowering` to always use `xegpu.load` (and not `xegpu.load_nd`) for 1D cases as it has a more developed interface (e.g. layout capabilities). Signed-off-by: dchigarev <dmitry.chigarev@intel.com>	2025-11-24 13:01:57 +01:00
Dmitry Chigarev	6c563dc6a2	[mlir][XeGPU] Add optional layout attribute to LoadGather StoreScatter ops (#163414 ) As [suggested here](https://github.com/llvm/llvm-project/pull/163071#discussion_r2427229637) the PR adds an optional layout attribute for `LoadGather` and `StoreScatter` ops. For the load-op the attribute describes the layout of the result (e.g. `layout_result_0`), and for the store-op it describes the layout for the vector-to-store operand (e.g. `layout_operand_0`). The PR also reworks the `propagate-layout` pass to consider permanent layout attributes and back-propagate them accordingly. The helper utility function `getDistributeLayoutAttr` is reworked to return either `layout_operand/result_0` or `layout` for load/store ops (depending on which one is set). After an offline discussion, it was decided that the overall layout utilities API is confusing since it tries to mix permanent and temporary layouts; it would need to change in the future. --------- Signed-off-by: dchigarev <dmitry.chigarev@intel.com>	2025-11-04 08:19:47 -08:00
Dmitry Chigarev	747050bcce	[MLIR][XeGPU][VectorToXeGPU] Lower vector.load/store/transfer_read/transfer_write to new offsets syntax (#162095 ) Changes the `VectorToXeGPU` pass to generate `xegpu.load_nd/store_nd` ops using the new syntax, where offsets are specified at the load/store op level. ```mlir // from this %desc = xegpu.create_nd_tdesc %src[%off1, %off2]: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16> %res = xegpu.load_nd %desc : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16> // to this %desc = xegpu.create_nd_tdesc %src: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16> %res = xegpu.load_nd %desc[%off1, %off2] : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16> ``` In order to support cases with dimension reduction at the `create_nd_tdesc` level (e.g. `memref<8x8x16xf16> -> tensor_desc<8x16xf16>`), it was decided to insert a memref.subview that collapses the source shape to 2D, for example: ```mlir // input: %0 = vector.load %source[%off0, %off1, %off2] : memref<8x16x32xf32>, vector<8x16xf32> // --vector-to-xegpu (old) %tdesc = xegpu.create_nd_tdesc %source[%off0, %off1, %off2] : memref<8x16x32xf32> -> tdesc<8x32xf32> %vec = xegpu.load_nd %tdesc // --vector-to-xegpu (new) %collapsed = memref.subview %source[%off0, 0, 0] [1, 16, 32] [1, 1, 1] : memref<8x16x32xf32> -> memref<16x32xf32, strided<[32, 1], offset: ?>> %tdesc = xegpu.create_nd_tdesc %collapsed : memref<16x32xf32, ...> -> tdesc<8x32xf32> %vec = xegpu.load_nd %tdesc[%off1, %off2] ``` <details><summary>Why do we need to change that?</summary> ```mlir // reduce dim and apply all 3 offsets at load_nd %desc = xegpu.create_nd_tdesc %source : memref<8x16x32xf32> -> !xegpu.tensor_desc<16x32xf32> // error: xegpu.load_nd len(offsets) != desc.rank %res = xegpu.load_nd %desc[%off, %off, %off] : !xegpu.tensor_desc<16x32xf32> -> vector<8x16xf32> ``` </details> --------- Signed-off-by: dchigarev <dmitry.chigarev@intel.com>	2025-11-04 13:52:23 +01:00
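The offset split in the commit above (leading offsets go to `memref.subview`, the trailing two go to `load_nd`) can be modelled in a few lines of Python. This is a sketch of the bookkeeping only, assuming a rank-2 descriptor as in the commit's example; `split_offsets` and the dictionary keys are hypothetical names, not part of the pass.

```python
def split_offsets(shape, offsets, desc_rank=2):
    """Split access offsets between a collapsing subview and load_nd.

    Leading dimensions are pinned by the subview (size 1 at the given
    offset); the trailing `desc_rank` offsets are applied at the load.
    """
    if len(shape) != len(offsets):
        raise ValueError("one offset per source dimension is required")
    if len(shape) < desc_rank:
        raise ValueError("source rank is smaller than the descriptor rank")
    lead = len(shape) - desc_rank
    return {
        # e.g. [%off0, 0, 0]
        "subview_offsets": list(offsets[:lead]) + [0] * desc_rank,
        # e.g. [1, 16, 32]
        "subview_sizes": [1] * lead + list(shape[lead:]),
        # e.g. [1, 1, 1]
        "subview_strides": [1] * len(shape),
        # e.g. [%off1, %off2]
        "load_offsets": list(offsets[lead:]),
    }
```

With `shape=(8, 16, 32)` and offsets `(3, 4, 5)` this yields subview offsets `[3, 0, 0]`, sizes `[1, 16, 32]`, strides `[1, 1, 1]`, and load offsets `[4, 5]`, which mirrors the `memref.subview %source[%off0, 0, 0] [1, 16, 32] [1, 1, 1]` form in the commit message.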
Jakub Kuderski	ba0be89cd2	[mlir] Simplify Default cases in type switches. NFC. (#165767 ) Use default values instead of lambdas when possible. `std::nullopt` and `nullptr` can be used now because of https://github.com/llvm/llvm-project/pull/165724.	2025-10-30 15:10:59 -04:00
Jakub Kuderski	6ee362e1b5	[mlir][vector] Simplify rewrite pattern inheriting constructors. NFC. (#161966 ) Use the `Base` type alias from https://github.com/llvm/llvm-project/pull/158433.	2025-10-04 15:49:25 -04:00
Dmitry Chigarev	c4617bcae1	[MLIR][XeGPU][VectorToXeGPU] Add lowering from vector.gather/scatter to xegpu.load/store (#158024 ) Lowering for `vector.gather`/`vector.scatter` into `xegpu.load`/`xegpu.store`. High level steps to lower vector.gather/scatter: ``` %0 = vector.gather %source[%off1, %off2, %off3][%indices], %mask, %pass_thru : memref<8x16x32xf32>, vector<8xindex>, vector<8xi1>, vector<8xf32> into vector<8xf32> ``` 1. Compute strides and a memref offset for the `%source` memref using `computeMemrefMeta` func from the transfer_read/write lowering 2. Compute a linear offset like `%lin_off = %base_offset + %off1 * strides#0 + %off2 * strides#1 + %off3 * strides#2` 3. Combine the linear offset with `%indices`: `%off = (broadcast %lin_off : index to vector<8xindex>) + %indices * strides#2` 4. Convert memref to an i64: `%flat_memref = memref.extract_aligned_pointer_as_index %source + arith.index_cast` 5. Perform load/store: `%vec = xegpu.load %flat_memref[%off], %mask` 6. Apply selection to propagate values from the pass_thru vector: `%res = arith.select %mask, %vec, %pass_thru`	2025-09-19 11:12:14 +02:00
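Steps 2, 3, and 6 of the gather lowering above are plain index arithmetic, sketched here in Python on integers and lists. This is not the pass's code: `gather_offsets` and `select_pass_thru` are hypothetical helper names, and the strides and base offset are assumed to be already known (step 1).

```python
def gather_offsets(base_offset, offsets, strides, indices):
    """Per-lane linear element offsets for a gather/scatter."""
    if len(offsets) != len(strides):
        raise ValueError("one stride per memref dimension is required")
    # Step 2: linearize the scalar start position.
    lin_off = base_offset + sum(o * s for o, s in zip(offsets, strides))
    # Step 3: broadcast lin_off to every lane and add the per-lane
    # indices, scaled by the innermost stride.
    return [lin_off + i * strides[-1] for i in indices]


def select_pass_thru(mask, loaded, pass_thru):
    """Step 6: keep loaded values where the mask is set, else pass_thru."""
    return [v if m else p for m, v, p in zip(mask, loaded, pass_thru)]
```

For a contiguous `memref<8x16x32xf32>` the strides are `[512, 32, 1]`, so offsets `(1, 2, 3)` linearize to `579` and indices `[0, 1, 5]` give lane offsets `[579, 580, 584]`.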
Dmitry Chigarev	40e85fcaaa	[MLIR][XeGPU][VectorToXeGPU] Fix transfer_read/write cases with non-contiguous memrefs (#158126 ) This PR fixes a case where a source memref in `vector.transfer_read/write` is not contiguous, which violates the `memref.collapse_shape` semantics that the lowering relies on. <details><summary>An example of a failing test</summary> ```mlir gpu.module @xevm_module { gpu.func @load_from_subview(%source: memref<4096x4096xf16>, %off1: index, %off2: index) -> vector<8xf16> { %c0 = arith.constant 0.0 : f16 %subview = memref.subview %source[%off1, %off2] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>> %0 = vector.transfer_read %subview[%off2, %off2], %c0 {in_bounds = [true]} : memref<256x256xf16, strided<[4096, 1], offset: ?>>, vector<8xf16> gpu.return %0 : vector<8xf16> } } ``` Fails with: ``` /home/user/llvm/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir:404:8: error: 'memref.collapse_shape' op invalid source layout map or collapsing non-contiguous dims %0 = vector.transfer_read %subview[%off2, %off2], %c0 ^ /home/user/llvm/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir:404:8: note: see current operation: %8 = "memref.collapse_shape"(%2) <{reassociation = [[0, 1]]}> : (memref<256x256xf16, strided<[4096, 1], offset: ?>>) -> memref<65536xf16> ``` </details> A suggestion was to replace `memref.collapse_shape` with `memref.extract_aligned_pointer_as_index`, which is done in this PR. Since `extract_aligned_pointer` applied to a subview returns the original pointer without subview offsets, this PR also adds logic to use the offset obtained from `memref.extract_strided_metadata` in the `baseOffset` calculation in `computeOffsets`. --------- Signed-off-by: dchigarev <dmitry.chigarev@intel.com>	2025-09-11 18:25:51 -07:00
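A small Python model can show why the subview in the commit above cannot be collapsed, and why the subview offset has to be folded into the base offset once the raw pointer is used. This is a sketch, not the pass's code: `is_contiguous` and `linear_offset` are hypothetical helpers, and the contiguity check is simplified (it ignores the special case of size-1 dimensions).

```python
def is_contiguous(shape, strides):
    """Simplified row-major contiguity check on static shape/strides."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if stride != expected:
            return False
        expected *= size
    return True


def linear_offset(base_offset, strides, indices):
    """Element offset of `indices` relative to the aligned pointer."""
    return base_offset + sum(i * s for i, s in zip(indices, strides))


# The 256x256 subview keeps the parent's row stride of 4096, so its rows
# are not adjacent in memory and it is not contiguous.
subview_is_contiguous = is_contiguous([256, 256], [4096, 1])

# Subview taken at [%off1, %off2] = [2, 3] of the 4096x4096 parent.
subview_base = linear_offset(0, [4096, 1], [2, 3])

# Read at [3, 3] inside the subview, with and without the subview offset.
with_base = linear_offset(subview_base, [4096, 1], [3, 3])
without_base = linear_offset(0, [4096, 1], [3, 3])
```

Here `with_base` and `without_base` differ by exactly `subview_base`, which is the error that would result from addressing through the original pointer while dropping the subview's offset.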
Jianhui Li	98728d9dc8	[MLIR][XeGPU] Add lowering from transfer_read/transfer_write to load_gather/store_scatter (#152429 ) Lowers transfer_read/transfer_write to load_gather/store_scatter when the target uArch doesn't support load_nd/store_nd. The high level steps: 1. compute strides; 2. compute offsets; 3. collapseMemrefTo1D; 4. create the load_gather or store_scatter op	2025-08-14 11:27:07 -07:00
Maksim Levental	38976a03cd	[mlir][NFC] update `Conversion` create APIs (7/n) (#149889 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-22 10:41:06 -04:00
Adam Siemieniuk	06ae0c2a10	[mlir][xegpu] Remove vector contract to dpas size restriction (#147470 ) Removes contraction shape check to allow representing large workgroup-level workloads in preparation for distribution.	2025-07-09 22:37:06 +02:00
Kazu Hirata	fa9adbfda9	[mlir] Remove unused includes (NFC) (#147101 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-04 13:30:21 -07:00
Andrzej Warzyński	c45cc3e420	[mlir][vector] Standardize `base` Naming Across Vector Ops (NFC) (#137859 ) This change standardizes the naming convention for the argument representing the value to read from or write to in Vector ops that interface with Tensors or MemRefs. Specifically, it ensures that all such ops use the name `base` (i.e., the base address or location to which offsets are applied). Updated operations: * `vector.transfer_read`, * `vector.transfer_write`. For reference, these ops already use `base`: * `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`, `vector.expandload`, `vector.compressstore`, `vector.maskedstore`, `vector.maskedload`. This is a non-functional change (NFC) and does not alter the semantics of these operations. However, it does require users of the XFer ops to switch from `op.getSource()` to `op.getBase()`. To ease the transition, this PR temporarily adds a `getSource()` interface method for compatibility. This is intended for downstream use only and should not be relied on upstream. The method will be removed prior to the LLVM 21 release. Implements #131602	2025-05-12 09:44:50 +01:00
Adam Siemieniuk	a16c225b40	[mlir][xegpu] Convert Vector contraction to XeGPU (#122115 ) Adds pattern to lower vector.contract to XeGPU operation.	2025-03-13 19:41:53 +01:00
lorenzo chelini	c1a2292526	[MLIR][NFC] Retire `let constructor` for passes in Conversion directory (part1) (#127403 ) `let constructor` is deprecated since the table gen backend emits most of the glue logic to build a pass. This PR retires the td method for most (I need another pass) passes in the Conversion directory.	2025-02-17 10:55:27 +01:00
Jay Foad	aa2952165c	Fix typo "tranpose" (#124929 )	2025-01-29 17:49:54 +00:00
Han-Chung Wang	9cbc1f29ca	[mlir][NFC] Avoid using braced initializer lists to call a constructor. (#123714 ) In the LLVM style guide, we prefer not using braced initializer lists to call a constructor. Also, we prefer using an equals sign before the open curly brace if we use a braced initializer list when initializing a variable. See https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor for more details. The style guide does not explain the reason well. There is an article from abseil, which mentions a few benefits. E.g., we can avoid the most vexing parse, etc. See https://abseil.io/tips/88 for more details. Signed-off-by: hanhanW <hanhan0912@gmail.com>	2025-01-21 21:23:32 -08:00
Matthias Springer	6aaa8f25b6	[mlir][IR][NFC] Move free-standing functions to `MemRefType` (#123465 ) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.	2025-01-21 08:48:09 +01:00
Jacques Pienaar	09dfc5713d	[mlir] Enable decoupling two kinds of greedy behavior. (#104649 ) The greedy rewriter is used in many different flows and offers a lot of conveniences (work list management, debugging actions, tracing, etc.). But it combines two kinds of greedy behavior: 1) how ops are matched, 2) folding wherever it can. These are independent forms of greediness, and coupling them leads to inefficiency, e.g., in cases where one needs to create different phases in a lowering and is required to apply patterns in a specific order split across different passes. Using the driver, one ends up needlessly retrying folding or having multiple rounds of folding attempts where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own driver, but this is also a commonly requested feature that folks keep working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated API should just be a find and replace (e.g., of the `find ./ -type f -exec sed -i 's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety), as the API arguments haven't changed between the two.	2024-12-20 08:15:48 -08:00
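The find-and-replace migration mentioned in the commit above can also be sketched in Python for a single source string. The two function names come from the commit message; `migrate_source` is a hypothetical helper. Unlike the plain `sed` substitution, this version matches only whole identifiers, which is a deliberate (and stricter) choice of this sketch.

```python
import re

OLD_NAME = "applyPatternsAndFoldGreedily"
NEW_NAME = "applyPatternsGreedily"


def migrate_source(text: str) -> str:
    """Rename whole-identifier uses of the deprecated entry point."""
    return re.sub(rf"\b{OLD_NAME}\b", NEW_NAME, text)
```

Applied to `(void)applyPatternsAndFoldGreedily(op, std::move(patterns));` it rewrites only the function name and leaves the arguments untouched, which matches the commit's note that the arguments are the same for both entry points.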
Adam Siemieniuk	4c597d42dc	[mlir][xegpu] Support boundary checks only for block instructions (#119380 ) Constrains Vector lowering to apply boundary checks only to data transfers operating on block shapes. This further aligns lowering with the current Xe instructions' restrictions.	2024-12-13 10:01:13 +01:00
Adam Siemieniuk	ec450b1900	[mlir][xegpu] Allow out-of-bounds writes (#110811 ) Relaxes vector.transfer_write lowering to allow out-of-bound writes. This aligns lowering with the current hardware specification which does not update bytes in out-of-bound locations during block stores.	2024-10-09 18:59:14 +02:00
Adam Siemieniuk	6c25604df2	[mlir][xegpu] Convert Vector load and store to XeGPU (#110826 ) Adds patterns to lower vector.load|store to XeGPU operations.	2024-10-03 08:59:39 +02:00
Kazu Hirata	b52885bc23	[mlir] Use std::optional::value_or (NFC) (#109893 )	2024-09-26 09:53:43 -07:00
Chao Chen	8b5e841487	[MLIR][XeGPU] Updates XeGPU TensorDescAttr and Refine Gather/Scatter definition (#109675 ) Bring back #109144 with fixes to VectorToXeGPU	2024-09-24 10:14:13 -05:00
Adam Siemieniuk	02d34d800b	[mlir][vector][xegpu] Vector to XeGPU conversion pass (#107419 ) Add pass for Vector to XeGPU dialect conversion and initial conversion patterns for vector.transfer_read|write operations.	2024-09-19 15:16:23 -05:00

31 Commits