24 Commits

Author SHA1 Message Date
Dmitry Chigarev
d90bc3bc60
[mlir][XeGPU][VectorToXeGPU] Use 'xegpu.load' to lower 1D 'vector.transfer_read' for PVC & BMG (#168910)
The PR changes the `TransferReadLowering` to always use `xegpu.load`
(and not `xegpu.load_nd`) for 1D cases as it has more developed
interface (e.g. layouts capabilites).

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-24 13:01:57 +01:00
Dmitry Chigarev
6c563dc6a2
[mlir][XeGPU] Add optional layout attribute to LoadGather StoreScatter ops (#163414)
As [suggested
here](https://github.com/llvm/llvm-project/pull/163071#discussion_r2427229637)
the PR adds an optional layout attribute for `LoadGather` and
`StoreScatter` ops.

For the load-op the attribute describes the layout of the result (ex
`layout_result_0`), and for store-op it describes the layout for the
vector-to-store operand (ex `layout_operand_0`).

The PR also reworks `propagate-layout` pass to consider perm layout
attributes and back-propagate them accordingly.

The helper utility function `getDistributeLayoutAttr` is reworked to
return either `layout_operand/result_0` or `layout` for load/store ops
(denepding on which one is set). After an offline discussion decided
that the overall utilities layouts API is confusing since it tries to
mix permament and temporary layouts. Would need to change it in the
future.

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-04 08:19:47 -08:00
Dmitry Chigarev
747050bcce
[MLIR][XeGPU][VectorToXeGPU] Lower vector.load/store/transfer_read/transfer_write to new offsets syntax (#162095)
Changes the `VectorToXeGPU` pass to generate `xegpu.load_nd/store_nd`
ops using new syntax with where offsets are specified at the load/store
ops level.
```mlir
// from this
%desc = xegpu.create_nd_tdesc %src[%off1, %off2]: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>

// to this
%desc = xegpu.create_nd_tdesc %src: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc[%off1, %off2] : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>
```

In order to support cases with dimension reduction at the
`create_nd_tdesc` level (e.g. `memref<8x8x16xf16> ->
tensor_desc<8x16xf16>` it was decided to insert a memref.subview that
collapses the source shape to 2d, for example:

```mlir
// input:
%0 = vector.load %source[%off0, %off1, %off2] : memref<8x16x32xf32>, vector<8x16xf32>

// --vector-to-xegpu (old)
%tdesc = xegpu.create_nd_tdesc %source[%off0, %off1, %off2] : memref<8x16x32xf32> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc

// --vector-to-xegpu (new)
%collapsed = memref.subview %source[%off0, 0, 0] [1, 16, 32] [1, 1, 1] :
    memref<8x16x32xf32> -> memref<16x32xf32, strided<[32, 1], offset: ?>>
%tdesc = xegpu.create_nd_tdesc %collapsed : memref<16x32xf32, ...> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc[%off1, %off2]
```

<details><summary>Why we need to change that?</summary>

```mlir
// reduce dim and apply all 3 offsets at load_nd
%desc = xegpu.create_nd_tdesc %source : memref<8x16x32xf32> -> !xegpu.tensor_desc<16x32xf32>
// error: xegpu.load_nd len(offsets) != desc.rank
%res = xegpu.load_nd %desc[%off, %off, %off] : !xegpu.tensor_desc<16x32xf32> -> vector<8x16xf32>
```

</details>

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-04 13:52:23 +01:00
Jakub Kuderski
ba0be89cd2
[mlir] Simplify Default cases in type switches. NFC. (#165767)
Use default values instead of lambdas when possible. `std::nullopt` and
`nullptr` can be used now because of
https://github.com/llvm/llvm-project/pull/165724.
2025-10-30 15:10:59 -04:00
Jakub Kuderski
6ee362e1b5
[mlir][vector] Simplify rewrite pattern inheriting constructors. NFC. (#161966)
Use the `Base` type alias from
https://github.com/llvm/llvm-project/pull/158433.
2025-10-04 15:49:25 -04:00
Dmitry Chigarev
c4617bcae1
[MLIR][XeGPU][VectorToXeGPU] Add lowering from vector.gather/scatter to xegpu.load/store (#158024)
Lowering for `vector.gather`/`vector.scatter` into `xegpu.load`/`xegpu.store`.

High level steps to lower vector.gather/scatter:
```
%0 = vector.gather %source[%off1, %off2, %off3][%indices], %mask,
       %pass_thru : memref<8x16x32xf32>, vector<8xindex>, vector<8xi1>, vector<8xf32> into vector<8xf32>
```

1. Compute strides and a memref offset for the `%source` memref using
`computeMemrefMeta` func from the transfer_read/write lowering
2. Compute a linear offset like `%lin_off = %base_offset + %off1 *
strides#0 + %off2 * strides#1 + %off3 * strides#2`
3. Combine the linear offset with `%indices`: `%off = (broadcast
%lin_off : index to vector<8xindex>) + %indices * strides#2`
4. Convert memref to an i64: `%flat_memref =
memref.extract_aligned_pointer_as_index %source + arith.index_cast`
5. Perform load/store: `%vec = xegpu.load %flat_memref[%off], %mask`
6. Apply selection to propagate values from the pass_thru vector: `%res
= arith.select %mask, %vec, %pass_thru`
2025-09-19 11:12:14 +02:00
Dmitry Chigarev
40e85fcaaa
[MLIR][XeGPU][VectorToXeGPU] Fix transfer_read/write cases with non-contiguous memrefs (#158126)
This PR fixes a case where a source memref in
`vector.transfer_read/write` is not contiguous, which violates the
`memref.collapse_shape` semantic that is used in the lowering.

<details><summary>An example of a failing test</summary>

```mlir
gpu.module @xevm_module {
gpu.func @load_from_subview(%source: memref<4096x4096xf16>, %off1: index, %off2: index) -> vector<8xf16> {
  %c0 = arith.constant 0.0 : f16
  %subview = memref.subview %source[%off1, %off2] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>>
  %0 = vector.transfer_read %subview[%off2, %off2], %c0
    {in_bounds = [true]} : memref<256x256xf16, strided<[4096, 1], offset: ?>>, vector<8xf16>
  gpu.return %0 : vector<8xf16>
}
}
```

Fails with:
```
/home/user/llvm/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir:404:8: error: 'memref.collapse_shape' op invalid source layout map or collapsing non-contiguous dims
  %0 = vector.transfer_read %subview[%off2, %off2], %c0
       ^
/home/user/llvm/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir:404:8: note: see current operation: %8 = "memref.collapse_shape"(%2) <{reassociation = [[0, 1]]}> : (memref<256x256xf16, strided<[4096, 1], offset: ?>>) -> memref<65536xf16>
```

</details>

A suggestion was to replace `memref.collapse_shape` with
`memref.extract_aligned_pointer_as_index` which is done in this PR.
Since `extract_aligned_pointer` applied to a subview returns an original
pointer without subview offsets, this PR also adds a logic to use an
offset obtained from `memref.extract_strided_metadata` in `baseOffset`
calculation in `computeOffsets`.

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-09-11 18:25:51 -07:00
Jianhui Li
98728d9dc8
[MLIR][XeGPU] Add lowering from transfer_read/transfer_write to load_gather/store_scatter (#152429)
Lowering transfer_read/transfer_write to load_gather/store_scatter in
case the target uArch doesn't support load_nd/store_nd. The high level
steps:
  1. compute Strides;
  2. compute Offsets;
  3. collapseMemrefTo1D;
  4. create Load gather or store_scatter op
2025-08-14 11:27:07 -07:00
Maksim Levental
38976a03cd
[mlir][NFC] update Conversion create APIs (7/n) (#149889)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-22 10:41:06 -04:00
Adam Siemieniuk
06ae0c2a10
[mlir][xegpu] Remove vector contract to dpas size restriction (#147470)
Removes contraction shape check to allow representing large
workgroup-level workloads in preparation for distribution.
2025-07-09 22:37:06 +02:00
Kazu Hirata
fa9adbfda9
[mlir] Remove unused includes (NFC) (#147101)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-04 13:30:21 -07:00
Andrzej Warzyński
c45cc3e420
[mlir][vector] Standardize base Naming Across Vector Ops (NFC) (#137859)
[mlir][vector] Standardize base Naming Across Vector Ops (NFC)

This change standardizes the naming convention for the argument
representing the value to read from or write to in Vector ops that
interface with Tensors or MemRefs. Specifically, it ensures that all
such ops use the name `base` (i.e., the base address or location to
which offsets are applied).

Updated operations:

* `vector.transfer_read`,
* `vector.transfer_write`.

For reference, these ops already use `base`:

* `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`,
  `vector.expandload`, `vector.compressstore`, `vector.maskedstore`,
  `vector.maskedload`.

This is a non-functional change (NFC) and does not alter the semantics of these
operations. However, it does require users of the XFer ops to switch from
`op.getSource()` to `op.getBase()`.

To ease the transition, this PR temporarily adds a `getSource()` interface
method for compatibility. This is intended for downstream use only and should
not be relied on upstream. The method will be removed prior to the LLVM 21
release.

Implements #131602
2025-05-12 09:44:50 +01:00
Adam Siemieniuk
a16c225b40
[mlir][xegpu] Convert Vector contraction to XeGPU (#122115)
Adds pattern to lower vector.contract to XeGPU operation.
2025-03-13 19:41:53 +01:00
lorenzo chelini
c1a2292526
[MLIR][NFC] Retire let constructor for passes in Conversion directory (part1) (#127403)
`let constructor` is deprecated since the table gen backend emits most
of the glue logic to build a pass. This PR retires the td method for
most (I need another pass) passes in the Conversion directory.
2025-02-17 10:55:27 +01:00
Jay Foad
aa2952165c
Fix typo "tranpose" (#124929) 2025-01-29 17:49:54 +00:00
Han-Chung Wang
9cbc1f29ca
[mlir][NFC] Avoid using braced initializer lists to call a constructor. (#123714)
In the LLVM style guide, we prefer not using braced initializer lists to
call a constructor. Also, we prefer using an equal before the open curly
brace if we use a braced initializer list when initializing a variable.

See

https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor
for more details.

The style guide does not explain the reason well. There is an article
from abseil, which mentions few benefits. E.g., we can avoid the most
vexing parse, etc. See https://abseil.io/tips/88 for more details.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-01-21 21:23:32 -08:00
Matthias Springer
6aaa8f25b6
[mlir][IR][NFC] Move free-standing functions to MemRefType (#123465)
Turn free-standing `MemRefType`-related helper functions in
`BuiltinTypes.h` into member functions.
2025-01-21 08:48:09 +01:00
Jacques Pienaar
09dfc5713d
[mlir] Enable decoupling two kinds of greedy behavior. (#104649)
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.

These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.

Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.

For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
2024-12-20 08:15:48 -08:00
Adam Siemieniuk
4c597d42dc
[mlir][xegpu] Support boundary checks only for block instructions (#119380)
Constrains Vector lowering to apply boundary checks only to data
transfers operating on block shapes.

This further aligns lowering with the current Xe instructions'
restrictions.
2024-12-13 10:01:13 +01:00
Adam Siemieniuk
ec450b1900
[mlir][xegpu] Allow out-of-bounds writes (#110811)
Relaxes vector.transfer_write lowering to allow out-of-bound writes.

This aligns lowering with the current hardware specification which does
not update bytes in out-of-bound locations during block stores.
2024-10-09 18:59:14 +02:00
Adam Siemieniuk
6c25604df2
[mlir][xegpu] Convert Vector load and store to XeGPU (#110826)
Adds patterns to lower vector.load|store to XeGPU operations.
2024-10-03 08:59:39 +02:00
Kazu Hirata
b52885bc23
[mlir] Use std::optional::value_or (NFC) (#109893) 2024-09-26 09:53:43 -07:00
Chao Chen
8b5e841487
[MLIR][XeGPU] Updates XeGPU TensorDescAttr and Refine Gather/Scatter definition (#109675)
Bring back #109144 with fixes to VectorToXeGPU
2024-09-24 10:14:13 -05:00
Adam Siemieniuk
02d34d800b
[mlir][vector][xegpu] Vector to XeGPU conversion pass (#107419)
Add pass for Vector to XeGPU dialect conversion and initial conversion
patterns for vector.transfer_read|write operations.
2024-09-19 15:16:23 -05:00