31 Commits

Author SHA1 Message Date
Dmitry Chigarev
a9fb8b0622
[MLIR][XeGPU] Support vector.contract transpose_a/transpose_b via 'vector-to-gpu' patterns (#182885)
The PR adds [`vector.contract(transpose_a/transpose_b)` decomposition
patterns](3215645b8d/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp (L1263))
from `vector-to-gpu` to `vector-to-xegpu` pass.

The `populatePrepareVectorToMMAPatterns` adds two patterns:
1. `PrepareContractToGPUMMA` that splits `vector.contract(transpose)`
into `vector.transpose + vector.contract`
2. `CombineTransferReadOpTranspose` that fuses `vector.transpose` into
the permutation map of `vector.transfer_read`

The second pattern doesn't always bring us to the desired result
(`xegpu.load_nd + vector.transpose + xegpu.dpas`) since [not all data
types are supported
](1237bd6df0/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp (L570-L575))
for the transposed-read case. There's a second PR (#182875) on this
matter that adds a decomposition-pattern for unsupported types (it might
seem strange that we first fuse and then decompose
transfer_read+transpose but this way we don't have code duplication
between vector-to-gpu&to-xegpu passes and cover all functional cases)

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2026-03-05 17:08:55 +01:00
Dmitry Chigarev
3925d112c4
[MLIR][XeGPU] Decompose unsupported 'vector.transfer_read'-transpose-permutations (#182875)
The PR adds a pattern to `vector-to-xegpu` pass that decomposes
`vector.transfer_read` with unsupported transpose-permutations
(unsupported element-type) into `vector.transfer_read +
vector.transpose`:

Example:

```mlir
// input-ir:
  %0 = vector.transfer_read %source[%offset, %offset], %c0
    {permutation_map = affine_map<(d0, d1) -> (d1, d0)>,
    in_bounds = [true, true]} : memref<32x64xf16>, vector<8x16xf16>

// mlir-opt %s --convert-vector-to-xegpu
// before PR (no conversion because of unsupported type):
  %0 = vector.transfer_read %source[%offset, %offset], %c0
    {permutation_map = affine_map<(d0, d1) -> (d1, d0)>,
    in_bounds = [true, true]} : memref<32x64xf16>, vector<8x16xf16>

// mlir-opt %s --convert-vector-to-xegpu
// after PR (decomposed + converted):
  %0 = xegpu.load_nd %source[%offset, %offset]
  %1 = vector.transpose %0
```

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2026-03-05 17:08:22 +01:00
Mehdi Amini
bbd5b1d3bd
[mlir][VectorToXeGPU] Fix crash on memref with non-scalar element type (#183905)
The vector.store and vector.load lowering in --convert-vector-to-xegpu
would crash when the source memref had a non-integer/float element type
(e.g. memref<?xvector<4xf32>>).

The crash occurred inside createNdDescriptor() when computing the byte
offset for dynamic memrefs: srcTy.getElementTypeBitWidth() internally
calls getIntOrFloatBitWidth() which asserts on non-scalar types such as
vector<4xf32>.

Fix by adding a check for the memref's element type in
storeLoadPreconditions(). If the element type is not an integer or
float, the pattern returns notifyMatchFailure() instead of proceeding
and crashing.

The same guard is applied to TransferReadLowering and
TransferWriteLowering which share the same helper and can hit the same
path.

Fixes #181463
2026-03-03 11:33:03 +00:00
Jakub Kuderski
59e44799bd
[mlir] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178487)
Pre-commiting this before landing the new check in
https://github.com/llvm/llvm-project/pull/177892
2026-01-28 19:13:47 +00:00
Stefan Weigl-Bosker
db2f0f87b9
[MLIR][XeGPU]: Reject tensor_desc types with unknown bitwidth (#173922)
Fixes https://github.com/llvm/llvm-project/issues/173851

1. Only allow XeGPU_ScalarType element types in `xegpu::TensorDescType`
(via verifier, keeping mlir::Type params in api)
2. Fix `VectorToXeGPU` to prevent vectors with invalid TensorDescType
element types from lowering
2026-01-20 13:12:33 -08:00
Sang Ik Lee
8b0a24a50d
[MLIR] Vector to XeGPU conversion: Use proper source variant for create_nd_tdesc op creation. (#171216)
If source strided memref is not fully static - at least one of shape,
strides, offset is kDynamic - use i64 source variant.
With this change, xegpu.create_nd_tdesc created by lowering from vector
dialect, can rely on getMixedOffsets, getMixedSize and getMixedStrides
to get relevant values.
2025-12-18 11:03:51 -08:00
Nishant Patel
5fc8e87fe2
[MLIR][XeGPU] Retain anchor op layouts for XeGPU nD ops (#170934)
This PR adds support to retain the anchor op layouts (after dropping
what's not required) for xegpu nD ops during workgroup to subgroup &
unroll transformation
2025-12-05 21:49:13 -08:00
Dmitry Chigarev
d90bc3bc60
[mlir][XeGPU][VectorToXeGPU] Use 'xegpu.load' to lower 1D 'vector.transfer_read' for PVC & BMG (#168910)
The PR changes the `TransferReadLowering` to always use `xegpu.load`
(and not `xegpu.load_nd`) for 1D cases as it has more developed
interface (e.g. layouts capabilites).

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-24 13:01:57 +01:00
Dmitry Chigarev
6c563dc6a2
[mlir][XeGPU] Add optional layout attribute to LoadGather StoreScatter ops (#163414)
As [suggested
here](https://github.com/llvm/llvm-project/pull/163071#discussion_r2427229637)
the PR adds an optional layout attribute for `LoadGather` and
`StoreScatter` ops.

For the load-op the attribute describes the layout of the result (ex
`layout_result_0`), and for store-op it describes the layout for the
vector-to-store operand (ex `layout_operand_0`).

The PR also reworks `propagate-layout` pass to consider perm layout
attributes and back-propagate them accordingly.

The helper utility function `getDistributeLayoutAttr` is reworked to
return either `layout_operand/result_0` or `layout` for load/store ops
(denepding on which one is set). After an offline discussion decided
that the overall utilities layouts API is confusing since it tries to
mix permament and temporary layouts. Would need to change it in the
future.

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-04 08:19:47 -08:00
Dmitry Chigarev
747050bcce
[MLIR][XeGPU][VectorToXeGPU] Lower vector.load/store/transfer_read/transfer_write to new offsets syntax (#162095)
Changes the `VectorToXeGPU` pass to generate `xegpu.load_nd/store_nd`
ops using new syntax with where offsets are specified at the load/store
ops level.
```mlir
// from this
%desc = xegpu.create_nd_tdesc %src[%off1, %off2]: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>

// to this
%desc = xegpu.create_nd_tdesc %src: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc[%off1, %off2] : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>
```

In order to support cases with dimension reduction at the
`create_nd_tdesc` level (e.g. `memref<8x8x16xf16> ->
tensor_desc<8x16xf16>` it was decided to insert a memref.subview that
collapses the source shape to 2d, for example:

```mlir
// input:
%0 = vector.load %source[%off0, %off1, %off2] : memref<8x16x32xf32>, vector<8x16xf32>

// --vector-to-xegpu (old)
%tdesc = xegpu.create_nd_tdesc %source[%off0, %off1, %off2] : memref<8x16x32xf32> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc

// --vector-to-xegpu (new)
%collapsed = memref.subview %source[%off0, 0, 0] [1, 16, 32] [1, 1, 1] :
    memref<8x16x32xf32> -> memref<16x32xf32, strided<[32, 1], offset: ?>>
%tdesc = xegpu.create_nd_tdesc %collapsed : memref<16x32xf32, ...> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc[%off1, %off2]
```

<details><summary>Why we need to change that?</summary>

```mlir
// reduce dim and apply all 3 offsets at load_nd
%desc = xegpu.create_nd_tdesc %source : memref<8x16x32xf32> -> !xegpu.tensor_desc<16x32xf32>
// error: xegpu.load_nd len(offsets) != desc.rank
%res = xegpu.load_nd %desc[%off, %off, %off] : !xegpu.tensor_desc<16x32xf32> -> vector<8x16xf32>
```

</details>

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-04 13:52:23 +01:00
Jakub Kuderski
ba0be89cd2
[mlir] Simplify Default cases in type switches. NFC. (#165767)
Use default values instead of lambdas when possible. `std::nullopt` and
`nullptr` can be used now because of
https://github.com/llvm/llvm-project/pull/165724.
2025-10-30 15:10:59 -04:00
Jakub Kuderski
6ee362e1b5
[mlir][vector] Simplify rewrite pattern inheriting constructors. NFC. (#161966)
Use the `Base` type alias from
https://github.com/llvm/llvm-project/pull/158433.
2025-10-04 15:49:25 -04:00
Dmitry Chigarev
c4617bcae1
[MLIR][XeGPU][VectorToXeGPU] Add lowering from vector.gather/scatter to xegpu.load/store (#158024)
Lowering for `vector.gather`/`vector.scatter` into `xegpu.load`/`xegpu.store`.

High level steps to lower vector.gather/scatter:
```
%0 = vector.gather %source[%off1, %off2, %off3][%indices], %mask,
       %pass_thru : memref<8x16x32xf32>, vector<8xindex>, vector<8xi1>, vector<8xf32> into vector<8xf32>
```

1. Compute strides and a memref offset for the `%source` memref using
`computeMemrefMeta` func from the transfer_read/write lowering
2. Compute a linear offset like `%lin_off = %base_offset + %off1 *
strides#0 + %off2 * strides#1 + %off3 * strides#2`
3. Combine the linear offset with `%indices`: `%off = (broadcast
%lin_off : index to vector<8xindex>) + %indices * strides#2`
4. Convert memref to an i64: `%flat_memref =
memref.extract_aligned_pointer_as_index %source + arith.index_cast`
5. Perform load/store: `%vec = xegpu.load %flat_memref[%off], %mask`
6. Apply selection to propagate values from the pass_thru vector: `%res
= arith.select %mask, %vec, %pass_thru`
2025-09-19 11:12:14 +02:00
Dmitry Chigarev
40e85fcaaa
[MLIR][XeGPU][VectorToXeGPU] Fix transfer_read/write cases with non-contiguous memrefs (#158126)
This PR fixes a case where a source memref in
`vector.transfer_read/write` is not contiguous, which violates the
`memref.collapse_shape` semantic that is used in the lowering.

<details><summary>An example of a failing test</summary>

```mlir
gpu.module @xevm_module {
gpu.func @load_from_subview(%source: memref<4096x4096xf16>, %off1: index, %off2: index) -> vector<8xf16> {
  %c0 = arith.constant 0.0 : f16
  %subview = memref.subview %source[%off1, %off2] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>>
  %0 = vector.transfer_read %subview[%off2, %off2], %c0
    {in_bounds = [true]} : memref<256x256xf16, strided<[4096, 1], offset: ?>>, vector<8xf16>
  gpu.return %0 : vector<8xf16>
}
}
```

Fails with:
```
/home/user/llvm/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir:404:8: error: 'memref.collapse_shape' op invalid source layout map or collapsing non-contiguous dims
  %0 = vector.transfer_read %subview[%off2, %off2], %c0
       ^
/home/user/llvm/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir:404:8: note: see current operation: %8 = "memref.collapse_shape"(%2) <{reassociation = [[0, 1]]}> : (memref<256x256xf16, strided<[4096, 1], offset: ?>>) -> memref<65536xf16>
```

</details>

A suggestion was to replace `memref.collapse_shape` with
`memref.extract_aligned_pointer_as_index` which is done in this PR.
Since `extract_aligned_pointer` applied to a subview returns an original
pointer without subview offsets, this PR also adds a logic to use an
offset obtained from `memref.extract_strided_metadata` in `baseOffset`
calculation in `computeOffsets`.

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-09-11 18:25:51 -07:00
Jianhui Li
98728d9dc8
[MLIR][XeGPU] Add lowering from transfer_read/transfer_write to load_gather/store_scatter (#152429)
Lowering transfer_read/transfer_write to load_gather/store_scatter in
case the target uArch doesn't support load_nd/store_nd. The high level
steps:
  1. compute Strides;
  2. compute Offsets;
  3. collapseMemrefTo1D;
  4. create Load gather or store_scatter op
2025-08-14 11:27:07 -07:00
Maksim Levental
38976a03cd
[mlir][NFC] update Conversion create APIs (7/n) (#149889)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-22 10:41:06 -04:00
Adam Siemieniuk
06ae0c2a10
[mlir][xegpu] Remove vector contract to dpas size restriction (#147470)
Removes contraction shape check to allow representing large
workgroup-level workloads in preparation for distribution.
2025-07-09 22:37:06 +02:00
Kazu Hirata
fa9adbfda9
[mlir] Remove unused includes (NFC) (#147101)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-04 13:30:21 -07:00
Andrzej Warzyński
c45cc3e420
[mlir][vector] Standardize base Naming Across Vector Ops (NFC) (#137859)
[mlir][vector] Standardize base Naming Across Vector Ops (NFC)

This change standardizes the naming convention for the argument
representing the value to read from or write to in Vector ops that
interface with Tensors or MemRefs. Specifically, it ensures that all
such ops use the name `base` (i.e., the base address or location to
which offsets are applied).

Updated operations:

* `vector.transfer_read`,
* `vector.transfer_write`.

For reference, these ops already use `base`:

* `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`,
  `vector.expandload`, `vector.compressstore`, `vector.maskedstore`,
  `vector.maskedload`.

This is a non-functional change (NFC) and does not alter the semantics of these
operations. However, it does require users of the XFer ops to switch from
`op.getSource()` to `op.getBase()`.

To ease the transition, this PR temporarily adds a `getSource()` interface
method for compatibility. This is intended for downstream use only and should
not be relied on upstream. The method will be removed prior to the LLVM 21
release.

Implements #131602
2025-05-12 09:44:50 +01:00
Adam Siemieniuk
a16c225b40
[mlir][xegpu] Convert Vector contraction to XeGPU (#122115)
Adds pattern to lower vector.contract to XeGPU operation.
2025-03-13 19:41:53 +01:00
lorenzo chelini
c1a2292526
[MLIR][NFC] Retire let constructor for passes in Conversion directory (part1) (#127403)
`let constructor` is deprecated since the table gen backend emits most
of the glue logic to build a pass. This PR retires the td method for
most (I need another pass) passes in the Conversion directory.
2025-02-17 10:55:27 +01:00
Jay Foad
aa2952165c
Fix typo "tranpose" (#124929) 2025-01-29 17:49:54 +00:00
Han-Chung Wang
9cbc1f29ca
[mlir][NFC] Avoid using braced initializer lists to call a constructor. (#123714)
In the LLVM style guide, we prefer not using braced initializer lists to
call a constructor. Also, we prefer using an equal before the open curly
brace if we use a braced initializer list when initializing a variable.

See

https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor
for more details.

The style guide does not explain the reason well. There is an article
from abseil, which mentions few benefits. E.g., we can avoid the most
vexing parse, etc. See https://abseil.io/tips/88 for more details.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-01-21 21:23:32 -08:00
Matthias Springer
6aaa8f25b6
[mlir][IR][NFC] Move free-standing functions to MemRefType (#123465)
Turn free-standing `MemRefType`-related helper functions in
`BuiltinTypes.h` into member functions.
2025-01-21 08:48:09 +01:00
Jacques Pienaar
09dfc5713d
[mlir] Enable decoupling two kinds of greedy behavior. (#104649)
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.

These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.

Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.

For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
2024-12-20 08:15:48 -08:00
Adam Siemieniuk
4c597d42dc
[mlir][xegpu] Support boundary checks only for block instructions (#119380)
Constrains Vector lowering to apply boundary checks only to data
transfers operating on block shapes.

This further aligns lowering with the current Xe instructions'
restrictions.
2024-12-13 10:01:13 +01:00
Adam Siemieniuk
ec450b1900
[mlir][xegpu] Allow out-of-bounds writes (#110811)
Relaxes vector.transfer_write lowering to allow out-of-bound writes.

This aligns lowering with the current hardware specification which does
not update bytes in out-of-bound locations during block stores.
2024-10-09 18:59:14 +02:00
Adam Siemieniuk
6c25604df2
[mlir][xegpu] Convert Vector load and store to XeGPU (#110826)
Adds patterns to lower vector.load|store to XeGPU operations.
2024-10-03 08:59:39 +02:00
Kazu Hirata
b52885bc23
[mlir] Use std::optional::value_or (NFC) (#109893) 2024-09-26 09:53:43 -07:00
Chao Chen
8b5e841487
[MLIR][XeGPU] Updates XeGPU TensorDescAttr and Refine Gather/Scatter definition (#109675)
Bring back #109144 with fixes to VectorToXeGPU
2024-09-24 10:14:13 -05:00
Adam Siemieniuk
02d34d800b
[mlir][vector][xegpu] Vector to XeGPU conversion pass (#107419)
Add pass for Vector to XeGPU dialect conversion and initial conversion
patterns for vector.transfer_read|write operations.
2024-09-19 15:16:23 -05:00