35 Commits

Author SHA1 Message Date
Dmitry Chigarev
cd5d5b31bf
[mlir][XeGPU] Use DistributeLayoutAttr instead of LayoutAttr for load gather/scatter ops (#167850)
The PR changes the layout attribute type for
`xegpu::LoadGatherOp/StoreScatterOp` from `LayoutAttr` to
`DistributeLayoutAttr` to also support `xegpu.slice` layouts.

Initially we [wanted to restrict slice
layouts](https://github.com/llvm/llvm-project/pull/163414#discussion_r2478978798)
from the attribute, but now it turns out there are actually valid use
cases for that:
```mlir
gpu.func @distribute_load_slice_attr() {
  %2 = memref.alloca() {alignment = 1024} : memref<4096xf32>
  %offset =  arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8], sg_data = [32], inst_data = [16]> } dense<0> : vector<256xindex>
  %mask = arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8], sg_data = [32], inst_data = [16]> } dense<1> : vector<256xi1>

  %3 = xegpu.load %2[%offset], %mask <{chunk_size = 1, layout = #xegpu.slice<#xegpu.layout<sg_layout = [8, 8], sg_data = [32, 32], inst_data = [8, 16]>, dims = [0]>>} {
      layout_result_0 = #xegpu.slice<#xegpu.layout<sg_layout = [8, 8], sg_data = [32, 32], inst_data = [8, 16]>, dims = [0]> 
  } : memref<4096xf32>, vector<256xindex>, vector<256xi1> -> vector<256xf32>

  %4 = vector.broadcast %3 {layout_result_0 =
      #xegpu.layout<sg_layout = [8, 8], sg_data = [32, 32], inst_data = [8, 16]>} : vector<256xf32> to vector<256x256xf32>
  gpu.return
}
```

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-17 11:00:03 -08:00
Nishant Patel
f291f335c9
[MLIR][XeGPU] Support order attribute and add pattern for vector.transpose in WgToSg Pass (#165307)
This PR does the following:
1. Handle order attribute during the delinearization from linear
subgroup Id to multi-dim id.
2. Adds a transformation pattern for vector.transpose in wg to sg pass.
3. Updates CHECKS in the wg to sg tests
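
The delinearization in item 1 can be sketched as follows. This is a minimal illustration, not the pass's actual code: the function name `delinearize` is hypothetical, and it assumes that the first entry of `order` names the fastest-varying dimension.

```python
def delinearize(linear_id, sg_layout, order):
    """Map a linear subgroup id to a multi-dimensional id.

    Assumption (not taken from the commit): order[0] is the
    fastest-varying dimension, order[-1] the slowest.
    """
    ids = [0] * len(sg_layout)
    remaining = linear_id
    for dim in order:
        # Peel off the coordinate along the current fastest dimension.
        ids[dim] = remaining % sg_layout[dim]
        remaining //= sg_layout[dim]
    return ids


# Same linear id, two different orders over a 2x4 subgroup grid.
row_major = delinearize(5, [2, 4], order=[1, 0])
col_major = delinearize(5, [2, 4], order=[0, 1])
```

Under these assumptions, the same linear id lands on a different subgroup coordinate depending on `order`, which is why the attribute has to be consulted during distribution.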
2025-11-04 19:37:08 -08:00
Dmitry Chigarev
6c563dc6a2
[mlir][XeGPU] Add optional layout attribute to LoadGather StoreScatter ops (#163414)
As [suggested
here](https://github.com/llvm/llvm-project/pull/163071#discussion_r2427229637)
the PR adds an optional layout attribute for `LoadGather` and
`StoreScatter` ops.

For the load op, the attribute describes the layout of the result (e.g.
`layout_result_0`), and for the store op it describes the layout of the
vector-to-store operand (e.g. `layout_operand_0`).

The PR also reworks the `propagate-layout` pass to consider permanent layout
attributes and back-propagate them accordingly.

The helper utility function `getDistributeLayoutAttr` is reworked to
return either `layout_operand/result_0` or `layout` for load/store ops
(depending on which one is set). After an offline discussion, we decided
that the overall layout utilities API is confusing since it tries to
mix permanent and temporary layouts. It will need to change in the
future.

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-04 08:19:47 -08:00
Artem Kroviakov
68c4c83bcb
[MLIR][XeGPU] Matrix load/store subgroup distribution (#165008)
2025-11-03 21:48:27 +01:00
Jakub Kuderski
ae11c5c2c4
[mlir] Switch uses of deprecated .create methods to free function. NFC. (#164635)
See https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.
2025-10-22 14:51:03 +00:00
Jianhui Li
77cb19d7aa
[MLIR][XeGPU] XeVM lowering support for load_matrix/store_matrix + fix sanitizer issue (#163858)
This PR fixes the sanitizer issue reported post-merge for
https://github.com/llvm/llvm-project/pull/162780
2025-10-16 14:09:48 -07:00
Vitaly Buka
d43581aaee
Revert "[MLIR][XeGPU] XeVM lowering support for load_matrix/store_matrix" (#163684)
Reverts llvm/llvm-project#162780

Breaks build bots, see #162780.
2025-10-16 03:11:42 +00:00
Jianhui Li
6cae29fb3a
[MLIR][XeGPU] XeVM lowering support for load_matrix/store_matrix (#162780)
This PR adds lowering of xegpu.load_matrix/store_matrix to
xevm.blockload/blockstore or llvm.load/store, depending on wi-level
attributes.
It includes a few components:
1. Adds wi-level attributes: subgroup_block_io.
2. Expands the load_matrix/store_matrix op definitions to support scalar data
(besides vector data).
3. Adds a member function to mem_desc to compute the linearized address
for an nd offset.
4. Adds lowering depending on the wi-level attributes:
   a) if the subgroup_block_io attribute is present, lower to
   xevm.blockload/blockstore;
   b) else lower to llvm.load/store. If the result is a vector, lower to
   llvm.load/store with a vector operand.
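
The linearized-address computation mentioned above can be sketched as follows. This is a simplified illustration that assumes a plain strided layout; the real mem_desc member function may handle more (for example blocked layouts), and the name `linearize_offset` is hypothetical.

```python
def linearize_offset(offsets, strides):
    """Collapse an nd offset into a single element offset.

    Assumption (not taken from the commit): the memory descriptor is
    described by one stride per dimension, in elements.
    """
    if len(offsets) != len(strides):
        raise ValueError("offsets and strides must have the same rank")
    return sum(o * s for o, s in zip(offsets, strides))


# Element (2, 3) of a row-major buffer with 64 elements per row.
linear = linearize_offset([2, 3], [64, 1])
```

The resulting element offset is what a lowering would add to the base address before emitting the load or store.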
2025-10-15 16:50:41 -07:00
Nishant Patel
3c7873b75f
[MLIR][XeGPU] Distribute non-splat constant from wg to sg (#161416)
This PR distributes non-splat constant from wg to sg. The current
pattern has limitations and avoids cases which require SLM access.
2025-10-09 10:32:45 -07:00
Nishant Patel
68b143d968
[MLIR][XeGPU] Use operand layouts for store scatter (#161447)
This PR uses the layouts from the operands, since a store does not have
a result.
2025-10-02 11:40:19 -07:00
Nishant Patel
50a7eb6fc2
[MLIR][XeGPU] Add support for vector.multi_reduction in wg to sg pass [1/N] (#157554)
This PR adds a pattern for lowering vector.multi_reduction from workgroup
to subgroup IR. It currently supports only sg-local reductions.
2025-09-25 10:21:54 -07:00
Nishant Patel
8e17f80908
[MLIR][XeGPU] Distribute vector.step & vector.shape_cast op from wg to sg (#155443)
This PR adds patterns to distribute vector.step and vector.shape_cast op
from wg to sg and it also enables constant, broadcast and elementwise
ops to handle the slice attribute
2025-09-12 14:33:52 -07:00
Charitha Saumya
9b0d7ddb04
[mlir][xegpu] Add support for vector.multi_reduction and vector.shape_cast SIMT distribution. (#157560)
Add support for distributing the `vector.multi_reduction` operation
across lanes in a warp. Currently only 2D to 1D reductions are
supported. Given layouts for the source and accumulator vectors,
* If the reduction dimension is distributed across lanes, the reduction
is non-lane-local and the reduction is done using warp shuffles. Here we
simply rewrite the `MultiDimReductionOp` to a sequence of `ReductionOp`s
inside the warp op body. Actual distribution will be done by
`WarpOpReduction` pattern.
* If the reduction dimension is not distributed across lanes, the
reduction is lane-local. In this case, we yield the source and
accumulator vectors from the warp op and perform the lane-local
reduction outside the warp op using a sequence of `ReductionOp`s.
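
The two cases above come down to one check on the source layout. The sketch below is illustrative only: it assumes a dimension counts as "distributed across lanes" when its `lane_layout` entry is greater than 1, and the function name `reduction_strategy` is hypothetical.

```python
def reduction_strategy(lane_layout, reduction_dim):
    """Classify a 2D-to-1D reduction as lane-local or cross-lane.

    Assumption (not taken from the commit): a dimension is distributed
    across lanes exactly when its lane_layout entry is greater than 1.
    """
    if lane_layout[reduction_dim] > 1:
        # Each lane holds only part of the reduced dimension, so the
        # partial results must be combined with warp shuffles.
        return "cross-lane"
    # Every lane owns the whole reduced dimension for its slice.
    return "lane-local"


# 16 lanes spread along dim 1 of the source vector.
along_rows = reduction_strategy([1, 16], reduction_dim=0)
along_cols = reduction_strategy([1, 16], reduction_dim=1)
```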

PR also adds support for distributing `vector.shape_cast` based on
layouts.
2025-09-12 09:37:04 -07:00
Jakub Kuderski
2ed3f49c49
[mlir] Use free op create functions. NFC. (#157374)
The builder create methods are deprecated:
https://mlir.llvm.org/deprecation/. See
https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.
2025-09-07 22:13:20 -04:00
Nishant Patel
fdfc751d39
[MLIR][XeGPU] Distribute load_gather/store_scatter op from Wg To Sg (#154420)
This PR adds distribution patterns for scatter ops (LoadGather and
StoreScatter) with offsets.
2025-09-01 13:56:02 -07:00
Chao Chen
c96e2cdd13
[mlir][XeGPU] Update utils for LayoutAttr and SliceAttr support (#154819)
2025-08-27 12:37:15 -05:00
Chao Chen
68d6866428
[mlir][XeGPU] add WgToSg distribution pattern for load_matrix and store_matrix. (#154403)
2025-08-21 10:02:45 -05:00
Nishant Patel
4a9d038acd
[MLIR][XeGPU] Distribute load_nd/store_nd/prefetch_nd with offsets from Wg to Sg (#153432)
This PR adds a pattern to distribute the load/store/prefetch nd ops with
offsets from workgroup to subgroup IR. This PR is part of the transition
to move offsets from create_nd to the load/store/prefetch nd ops.

Create_nd PR : #152351
2025-08-18 09:45:29 -07:00
Jacques Pienaar
4bf33958da
[mlir] Update builders to use new form. (#154132)
Mechanically applied using clang-tidy.
2025-08-18 15:19:34 +00:00
Chao Chen
9c4e571ae8
[mlir][xegpu] Add definitions of MemDescType and related ops. (#153273)
2025-08-15 18:02:13 -05:00
Nishant Patel
af87214b84
[MLIR][XeGPU] Add pattern for arith.constant for wg to sg distribution (#151977)
2025-08-13 13:52:07 -07:00
Nishant Patel
88ff0f955c
[MLIR][XeGPU] Distribute create_nd_desc op without offset from Wg to Sg (#152351)
This PR adds a pattern to distribute the create_nd_desc op without offsets
from workgroup (Wg) IR to subgroup (Sg) IR.
The round-robin distribution logic (which involves offset calculation) now
happens in the load/store/prefetch nd ops instead of create_nd.
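
The round-robin offset calculation can be sketched as follows. This is an illustration under stated assumptions, not the pass's code: the function name `round_robin_offsets` is hypothetical, and it assumes that `sg_layout[i] * sg_data[i]` evenly divides the workgroup shape in every dimension.

```python
from itertools import product


def round_robin_offsets(wg_shape, sg_layout, sg_data, sg_id):
    """List the tile offsets one subgroup handles within a workgroup tile.

    Assumption (not taken from the commit): along each dimension the
    subgroup starts at sg_id * sg_data and then strides by
    sg_layout * sg_data until it leaves the workgroup shape.
    """
    per_dim = []
    for size, n_sg, tile, idx in zip(wg_shape, sg_layout, sg_data, sg_id):
        stride = n_sg * tile  # extent covered by one round of all subgroups
        start = idx * tile    # this subgroup's tile within the first round
        per_dim.append(range(start, size, stride))
    # Cartesian product of the per-dimension offsets.
    return [list(offset) for offset in product(*per_dim)]


# A 256x128 workgroup tile split over an 8x4 subgroup grid of 16x16 tiles.
offsets = round_robin_offsets([256, 128], [8, 4], [16, 16], sg_id=[1, 2])
```

In this example each subgroup visits two tiles per dimension, so it ends up with four tile offsets in total.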
2025-08-11 21:58:24 -07:00
Chao Chen
c96223434c
[mlir][xegpu] Add definition of SliceAttr (#150146)
---------

Co-authored-by: Charitha Saumya <136391709+charithaintc@users.noreply.github.com>
2025-08-08 11:27:17 -05:00
Maksim Levental
c610b24493
[mlir][NFC] update mlir/Dialect create APIs (27/n) (#150638)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25 11:48:32 -05:00
Nishant Patel
65dec99562
[MLIR][XeGPU] Add support for subgroup_id_range (#148661)
This PR adds a new attribute to the xegpu dialect called xegpu.range.
One use case for this attribute is to attach subgroup_id_range to an
scf.if op to drive the execution.
2025-07-23 11:10:35 -07:00
Chao Chen
317dae1a7e
[mlir][xegpu] Add initial skeleton implementation for lowering ConvertLayoutOp (#146176)
This PR adds initial skeleton implementation for lowering
ConvertLayoutOp. It currently only supports cases where SLM is not
needed.

---------

Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
2025-07-23 11:35:40 -05:00
Jianhui Li
90944b85c5
[MLIR][XeGPU] Add offset operands to load_nd/store_nd/prefetch_nd (#149424)
This PR allows load_nd/store_nd/prefetch_nd to take an additional offset
operand.
It is based on this PR https://github.com/llvm/llvm-project/pull/148335.
Now a user can create an nd_tdesc with no offset, and instead set the
offset with the load_nd operation.
2025-07-23 09:00:51 -07:00
Nishant Patel
56b263b1bd
[MLIR][XeGPU] Add transformation pattern for vector.broadcast in Wg to Sg pass (#144417)
This PR adds a transformation pattern for the vector.broadcast op in the
xegpu-wg-to-sg-distribute pass.
2025-07-23 08:41:53 -07:00
Maksim Levental
7b78796543
[mlir][NFC] update mlir/Dialect create APIs (25/n) (#149932)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-21 19:57:59 -04:00
Chao Chen
5d849d3a90
[mlir][xegpu] Fix seg-fault caused by setting a null attribute (#146002)
2025-07-01 15:42:52 -05:00
Nishant Patel
8063bd153c
[MLIR][XeGPU] Add support for elementwise ops in Wg to Sg distribute pass [1/N] (#142797)
This PR adds support for lowering elementwise operations (unary and
binary) from workgroup to subgroup.
2025-06-17 09:55:02 -07:00
Chao Chen
5578bcbcfd
[mlir][xegpu] add support for structure control flow ops in workgroup to subgroup distribution (#142618)
This PR introduces support for `scf::ForOp`, `scf::WhileOp`, `scf::If`,
and `scf::Condition` within the workgroup-subgroup-distribution pass,
leveraging the `SCFStructuralTypeConversionsAndLegality`.
2025-06-13 12:32:46 -05:00
Nishant Patel
a7ede51b55
[mlir][XeGPU] Add XeGPU Workgroup to Subgroup Distribution Pass (#140805)
This PR adds the XeGPU workgroup (wg) to subgroup (sg) pass. The wg to
sg pass transforms the xegpu wg level operations to subgroup operations
based on the sg_layout and sg_data attribute. The PR adds transformation
patterns for following Ops

1. CreateNdDesc
2. LoadNd
3. StoreNd
4. PrefetchNd
5. UpdateNdOffset
6. Dpas
2025-05-21 08:08:46 -05:00
Jan Patrick Lehr
b99e57583e
Revert "[mlir] [XeGPU] Add XeGPU workgroup to subgroup pass (#139477)" (#140779)
This reverts commit 747620db2a02b889ae3ba3921d6c0e526a3e7677.

Multiple bot failures
2025-05-20 20:31:00 +02:00
Nishant Patel
747620db2a
[mlir] [XeGPU] Add XeGPU workgroup to subgroup pass (#139477)
This PR adds the XeGPU workgroup (wg) to subgroup (sg) pass. The wg to
sg pass transforms the xegpu wg level operations to subgroup operations
based on the sg_layout and sg_data attribute. The PR adds transformation
patterns for following Ops

1. CreateNdDesc
2. LoadNd
3. StoreNd
4. PrefetchNd
5. UpdateNdOffset
6. Dpas
2025-05-20 12:35:50 -05:00