13 Commits

Author SHA1 Message Date
Charitha Saumya
9b0d7ddb04
[mlir][xegpu] Add support for vector.multi_reduction and vector.shape_cast SIMT distribution. (#157560)
Add support for distributing the `vector.multi_reduction` operation
across lanes in a warp. Currently only 2D to 1D reductions are
supported. Given layouts for the source and accumulator vectors,
* If the reduction dimension is distributed across lanes, the reduction
is not lane-local and is performed using warp shuffles. Here we
simply rewrite the `MultiDimReductionOp` to a sequence of `ReductionOp`s
inside the warp op body. The actual distribution is done by the
`WarpOpReduction` pattern.
* If the reduction dimension is not distributed across lanes, the
reduction is lane-local. In this case, we yield the source and
accumulator vectors from the warp op and perform the lane-local
reduction outside the warp op using a sequence of `ReductionOp`s.
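
The two cases above can be illustrated with a small plain-Python simulation. This is only a sketch under stated assumptions and is not the MLIR or XeGPU implementation: the names `WARP_SIZE`, `butterfly_reduce`, and `distributed_reduce` are hypothetical, the warp is assumed to have 16 lanes, and the layout is assumed to give each lane exactly one column of the 2D source (dimension 1 is the distributed one).

```python
# Illustrative simulation only; none of these names come from MLIR or XeGPU.
WARP_SIZE = 16  # assumed number of lanes in a warp/subgroup


def butterfly_reduce(lane_values):
    """Sum one value per lane using xor-shuffle steps.

    Models a cross-lane (non-lane-local) reduction: after log2(n) steps
    every lane holds the total.
    """
    vals = list(lane_values)
    offset = len(vals) // 2
    while offset >= 1:
        # Each lane adds the value held by its partner lane (lane ^ offset).
        vals = [vals[lane] + vals[lane ^ offset] for lane in range(len(vals))]
        offset //= 2
    return vals


def distributed_reduce(source, acc, reduce_dim):
    """Reduce a rows x WARP_SIZE source to 1D, adding the accumulator.

    Assumption: lane `l` owns column `l` of `source` (dim 1 is distributed).
    """
    rows = len(source)
    if reduce_dim == 1:
        # Reduction dim is distributed across lanes: each row needs a
        # cross-lane reduction (shuffles); any lane then has the result.
        return [acc[r] + butterfly_reduce(source[r])[0] for r in range(rows)]
    # Reduction dim is not distributed: each lane reduces its own column
    # locally, with no communication between lanes.
    return [acc[lane] + sum(source[r][lane] for r in range(rows))
            for lane in range(WARP_SIZE)]


src = [[r * WARP_SIZE + c for c in range(WARP_SIZE)] for r in range(4)]
row_sums = distributed_reduce(src, [0] * 4, reduce_dim=1)
col_sums = distributed_reduce(src, [0] * WARP_SIZE, reduce_dim=0)
```

In the first call the result has one element per row and requires lane-to-lane communication; in the second, each lane produces its own element of the result independently.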

This PR also adds support for distributing `vector.shape_cast` based on
layouts.
2025-09-12 09:37:04 -07:00
Chao Chen
6026ca301d
[mlir][XeGPU] add unroll patterns for load_matrix and store_matrix (#154637) 2025-09-03 13:56:41 -05:00
Chao Chen
c96e2cdd13
[mlir][XeGPU] Update utils for LayoutAttr and SliceAttr support (#154819) 2025-08-27 12:37:15 -05:00
Chao Chen
68d6866428
[mlir][XeGPU] add WgToSg distribution pattern for load_matrix and store_matrix. (#154403) 2025-08-21 10:02:45 -05:00
Jacques Pienaar
07967d4af8
[mlir] Switch to new LDBG macro (#150616)
Change local variants to use new central one.
2025-07-25 18:22:46 +02:00
Chao Chen
317dae1a7e
[mlir][xegpu] Add initial skeleton implementation for lowering ConvertLayoutOp (#146176)
This PR adds an initial skeleton implementation for lowering
ConvertLayoutOp. It currently supports only cases where SLM is not
needed.

---------

Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
2025-07-23 11:35:40 -05:00
Kazu Hirata
c06d3a7b72
[mlir] Remove unused includes (NFC) (#148769)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-14 22:19:23 -07:00
Chao Chen
75524dee18
[mlir][xegpu] Relax rank restriction of TensorDescType (#145916) 2025-07-09 19:40:24 -05:00
Jianhui Li
118bfcda46
[MLIR][XEGPU] Add blocking support for scatter ops (#144766)
Add blocking support for scatter ops: Create_tdesc, update, prefetch,
load and store. It also enables load/store with a chunk size.
2025-06-18 14:52:03 -07:00
Jianhui Li
9630d7cb92
[MLIR][XeGPU] add blocking support for reduce, broadcast, and transpose (#143389)
This PR adds blocking support for vector dialect operations (`reduce`,
`broadcast`, and `transpose`) in XeGPU-based IR. It simply assigns
the shape specified by "inst_data" as the target shape of the unrolling
to implement the blocking. It is based on
https://github.com/llvm/llvm-project/pull/140163.
2025-06-10 10:50:26 -05:00
Chao Chen
9e2684e4cf
[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N] (#142477)
Bring back https://github.com/llvm/llvm-project/pull/140163 with fixes
2025-06-02 21:39:30 -05:00
Chao Chen
b88dfb0b23
Revert "[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N]" (#142459)
Reverts llvm/llvm-project#140163
2025-06-02 15:47:21 -04:00
Chao Chen
0210750d5a
[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N] (#140163)
This PR introduces the initial implementation of a blocking pass for
XeGPU programs. The pass leverages unroll patterns from both the XeGPU
and Vector dialects. 

---------

Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
2025-06-02 14:02:45 -05:00