llvm-project

Author	SHA1	Message	Date
Nishant Patel	570055bf97	[MLIR][XeGPU] Propagate layout from anchor ops before Wg To Sg & Blocking Pass (#179490 ) This PR calls recoverTemporaryLayout before the XeGPUWgtoSgDistribute and XeGPUBlocking passes to recover all the temporary operand layouts that the transformation patterns may need for checks and verification.	2026-02-06 15:56:09 -08:00
Jianhui Li	61b8a57839	[MLIR][XeGPU] Refactor layout propagation utilities (#179016 ) This PR refactors layout propagation into two distinct components: result/anchor layout setup and source layout inference from the result. For operations that require a specific result layout due to semantic or hardware constraints, the propagation logic explicitly sets up the result or anchor layout. Otherwise, it infers the source layout from the backward-propagated consumer layout. The result or anchor layout may differ from the backward-propagated consumer layout; any such discrepancies are resolved via the existing layout-conflict mechanism. This PR introduces the following utility functions. Source layout inference: inferBroadcastSourceLayout(), inferMultiReductionSourceLayout(), inferBitCastSourceLayout(), inferShapeCastSourceLayout(), inferInsertStridedSliceSourceLayout(). Result / anchor layout setup: setupMultiReductionResultLayout(), setupBitCastResultLayout(), setupInsertStridedSliceResultLayout(), setupLoadMatrixAnchorLayout(), setupStoreMatrixAnchorLayout(), setupLoadGatherAnchorLayout(), setupStoreScatterAnchorLayout(). Part of the subgroup-distribution-related code changes has been split out into a separate PR: https://github.com/llvm/llvm-project/pull/179018/changes.	2026-02-05 19:26:25 -08:00
Jianhui Li	1b8903aa8e	[MLIR][XeGPU] setUnitDim bug fix and add documentation (#173521 ) This PR fixes a bug in setUnitDimData and setUnitDimLayout, and adds documentation and tests. It also cleans up the shapecast op pattern in the wg distribution to use the local temporary layout instead of getting it from the defining op's result (one TODO item from PR [#172125](https://github.com/llvm/llvm-project/pull/172125)).	2026-01-20 21:17:00 -08:00
Nishant Patel	9a2d3ab6ad	[MLIR][XeGPU] Add support for cross-subgroup reduction from wg to sg (#170936 ) This PR adds support for cross-sg reduction while distributing from workgroup to subgroup. It has the following limitations: 1. It cannot reduce to a scalar. 2. For cross-sg reduction, only 1:1 decomposition (each sg is assigned only one tile of the original WG tile) is supported for now. For example, a WG tile of size 256x128 with sg_layout = [8, 4] and sg_data = [16, 16] is not supported.	2026-01-16 07:19:23 -08:00
Jianhui Li	074740df8a	[MLIR][XeGPU] bug fix: removing temporary slice layout at the pass end (#172589 ) Removes the temporary slice layout (in addition to the regular layout) at the end of the wg distribution and blocking passes. The PR also drops sg_data/inst_data from anchor layouts in every wg-to-sg/blocking/unrolling pattern. --------- Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com> Co-authored-by: Dmitry Chigarev <dmitry.chigarev@intel.com>	2026-01-14 09:57:14 -08:00
Jianhui Li	2b9e47749c	[MLIR][XeGPU] Refactor Layout access interface (#172125 ) This PR builds on the anchor layout mechanism introduced in https://github.com/llvm/llvm-project/pull/169267 and performs the following refactoring: 1. Introduce the getAnchorLayout() and setAnchorLayout() interface for anchor ops to get and set layout attributes. 2. Add getLocalLayout() and setLocalLayout() utility functions, and refactor workgroup/subgroup distribution patterns to use these APIs. These utilities access the layout information directly and locally, without relying on global propagation. 3. Introduce localPropagateLayoutsFromAnchor(), a utility used by subgroup distribution to unify non-anchor layout setup. This function is intended to be invoked upfront by all layout-based passes (including workgroup/subgroup distribution and unrolling) to propagate layouts from anchor ops to non-anchor ops. After this step, patterns within the pass should exclusively use getLocalLayout() / setLocalLayout(). 4. Refactor getDistributeLayoutAttr() and setDistributeLayoutAttr() to remove special-case handling. These APIs now operate in a uniform order: anchor ops first, then non-anchor ops, and finally block arguments. These APIs will be deprecated in the long run. 5. Refactor patterns in the wg/sg distribution and load optimization passes to use get/setAnchorLayout() and get/setLocalLayout(). 6. Update test cases to enforce that anchor ops must use, and only use, anchor layouts.	2025-12-17 12:04:58 -08:00
Jianhui Li	492340aeb1	[MLIR][XeGPU] Add handling for unit-dim expansion in ShapeCast workgroup-to-subgroup distribution (#171758 ) Add special-case handling for ShapeCast when it expands unit dimensions for a succeeding broadcast op. In this scenario, distribution requires the source layout to be a slice layout, and the result layout is first normalized by setting the expanded unit dimensions to 1 before computing the distributed result shape. In all other cases, ShapeCast is distributed as usual. This PR also updates the propagation rule for vectors with expanded unit dimensions, allowing them to share the same layout as the result of a broadcast op. This enables correct layout propagation back to the source of the ShapeCast op, as that layout must ultimately be restored as the parent layout of the slice layout.	2025-12-16 13:13:11 -08:00
Nishant Patel	5fc8e87fe2	[MLIR][XeGPU] Retain anchor op layouts for XeGPU nD ops (#170934 ) This PR adds support to retain the anchor op layouts (after dropping what's not required) for xegpu nD ops during workgroup to subgroup & unroll transformation	2025-12-05 21:49:13 -08:00
Nishant Patel	c8d3b0c8e3	[MLIR][XeGPU] Add distribution for vector.create_mask from Wg to Sg (#169571 )	2025-12-03 16:01:46 -08:00
Jianhui Li	326a1a4bad	[MLIR][XeGPU] Add anchor_layout and update propagation to honor user-specified layouts (#169267 ) Introduce anchor layout for XeGPU anchor ops: load_nd, store_nd, prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and atomic_rmw. Anchor layout is permanent, and is guaranteed to be honored by XeGPU distribution and lowerings once specified. 1. Add anchor_layout for XeGPU anchor ops: load_nd, store_nd, prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and atomic_rmw. 2. Rename layout attributes to anchor_layout for these ops: load, store, load_matrix, store_matrix. 3. Update the layout propagation pass: only when the user doesn't specify an anchor layout does the pass compute a default layout, set it as the anchor op's permanent layout, and use that for propagation. If the user specified an anchor layout, the pass takes the user-specified anchor layout.	2025-11-26 23:02:01 -08:00
Nishant Patel	778e104dee	[MLIR] [XeGPU] Fix dropSgLayoutAndData & dropInstData in SliceAttr (#168618 )	2025-11-21 12:40:16 -08:00
Nishant Patel	310abe0e4b	[MLIR] [XeGPU] Add distribution pattern for vector.constant_mask from Wg To Sg (#168118 )	2025-11-20 15:00:57 -08:00
Dmitry Chigarev	cd5d5b31bf	[mlir][XeGPU] Use DistributeLayoutAttr instead of LayoutAttr for load gather/scatter ops (#167850 ) The PR changes the layout attribute type for `xegpu::LoadGatherOp/StoreScatterOp` from `LayoutAttr` to `DistributeLayoutAttr` to also support `xegpu.slice` layouts. Initially we [wanted to restrict slice layouts](https://github.com/llvm/llvm-project/pull/163414#discussion_r2478978798) from the attribute, but now it turns out there are actually valid use cases for that: ```mlir gpu.func @distribute_load_slice_attr() { %2 = memref.alloca() {alignment = 1024} : memref<4096xf32> %offset = arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8], sg_data = [32], inst_data = [16]> } dense<0> : vector<256xindex> %mask = arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8], sg_data = [32], inst_data = [16]> } dense<1> : vector<256xi1> %3 = xegpu.load %2[%offset], %mask <{chunk_size = 1, layout = #xegpu.slice<#xegpu.layout<sg_layout = [8, 8], sg_data = [32, 32], inst_data = [8, 16]>, dims = [0]>}> { layout_result_0 = #xegpu.slice<#xegpu.layout<sg_layout = [8, 8], sg_data = [32, 32], inst_data = [8, 16]>, dims = [0]> } : memref<4096xf32>, vector<256xindex>, vector<256xi1> -> vector<256xf32> %4 = vector.broadcast %3 {layout_result_0 = #xegpu.layout<sg_layout = [8, 8], sg_data = [32, 32], inst_data = [8, 16]>} : vector<256xf32> to vector<256x256xf32> gpu.return } ``` Signed-off-by: dchigarev <dmitry.chigarev@intel.com>	2025-11-17 11:00:03 -08:00
Nishant Patel	f291f335c9	[MLIR][XeGPU] Support order attribute and add pattern for vector.transpose in WgToSg Pass (#165307 ) This PR does the following: 1. Handles the order attribute during delinearization from the linear subgroup id to a multi-dim id. 2. Adds a transformation pattern for vector.transpose in the wg to sg pass. 3. Updates CHECKS in the wg to sg tests.	2025-11-04 19:37:08 -08:00
Dmitry Chigarev	6c563dc6a2	[mlir][XeGPU] Add optional layout attribute to LoadGather StoreScatter ops (#163414 ) As [suggested here](https://github.com/llvm/llvm-project/pull/163071#discussion_r2427229637), the PR adds an optional layout attribute for `LoadGather` and `StoreScatter` ops. For the load op the attribute describes the layout of the result (e.g. `layout_result_0`), and for the store op it describes the layout of the vector-to-store operand (e.g. `layout_operand_0`). The PR also reworks the `propagate-layout` pass to consider permanent layout attributes and back-propagate them accordingly. The helper utility function `getDistributeLayoutAttr` is reworked to return either `layout_operand/result_0` or `layout` for load/store ops (depending on which one is set). After an offline discussion, we decided that the overall layout utilities API is confusing because it tries to mix permanent and temporary layouts; it will need to change in the future. --------- Signed-off-by: dchigarev <dmitry.chigarev@intel.com>	2025-11-04 08:19:47 -08:00
Artem Kroviakov	68c4c83bcb	[MLIR][XeGPU] Matrix load/store subgroup distribution (#165008 )	2025-11-03 21:48:27 +01:00
Jakub Kuderski	ae11c5c2c4	[mlir] Switch uses of deprecated .create methods to free function. NFC. (#164635 ) See https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.	2025-10-22 14:51:03 +00:00
Jianhui Li	77cb19d7aa	[MLIR][XeGPU] XeVM lowering support for load_matrix/store_matrix + fix sanitizer issue (#163858 ) This PR fixes the sanitizer issue reported post-merge for https://github.com/llvm/llvm-project/pull/162780	2025-10-16 14:09:48 -07:00
Vitaly Buka	d43581aaee	Revert "[MLIR][XeGPU] XeVM lowering support for load_matrix/store_matrix" (#163684 ) Reverts llvm/llvm-project#162780 Breaks build bots, see #162780.	2025-10-16 03:11:42 +00:00
Jianhui Li	6cae29fb3a	[MLIR][XeGPU] XeVM lowering support for load_matrix/store_matrix (#162780 ) This PR adds lowering of xegpu.load_matrix/store_matrix to xevm.blockload/blockstore or llvm.load/store, depending on wi-level attributes. It includes a few components: 1. Adds a wi-level attribute: subgroup_block_io. 2. Expands the load_matrix/store_matrix op definitions to support scalar data (besides vector data). 3. Adds a member function to mem_desc to compute the linearized address for nd offsets. 4. Adds lowering depending on wi-level attributes: a) if the subgroup_block_io attribute is present, lower to xevm.blockload/blockstore; b) otherwise lower to llvm.load/store. If the result is a vector, lower to llvm.load/store with a vector operand.	2025-10-15 16:50:41 -07:00
Nishant Patel	3c7873b75f	[MLIR][XeGPU] Distribute non-splat constant from wg to sg (#161416 ) This PR distributes non-splat constant from wg to sg. The current pattern has limitations and avoids cases which require SLM access.	2025-10-09 10:32:45 -07:00
Nishant Patel	68b143d968	[MLIR][XeGPU] Use operand layouts for store scatter (#161447 ) The PR adds a change to use the layouts from the operands since store doesn't have a result	2025-10-02 11:40:19 -07:00
Nishant Patel	50a7eb6fc2	[MLIR][XeGPU] Add support for vector.multi_reduction in wg to sg pass [1/N] (#157554 ) This PR adds a pattern for lowering vector.multi_reduction from workgroup to subgroup IR. It currently supports only sg-local reductions.	2025-09-25 10:21:54 -07:00
Nishant Patel	8e17f80908	[MLIR][XeGPU] Distribute vector.step & vector.shape_cast op from wg to sg (#155443 ) This PR adds patterns to distribute vector.step and vector.shape_cast op from wg to sg and it also enables constant, broadcast and elementwise ops to handle the slice attribute	2025-09-12 14:33:52 -07:00
Charitha Saumya	9b0d7ddb04	[mlir][xegpu] Add support for `vector.multi_reduction` and `vector.shape_cast` SIMT distribution. (#157560 ) Add support for distributing the `vector.multi_reduction` operation across lanes in a warp. Currently only 2D to 1D reductions are supported. Given layouts for the source and accumulator vectors, * If the reduction dimension is distributed across lanes, the reduction is non-lane-local and the reduction is done using warp shuffles. Here we simply rewrite the `MultiDimReductionOp` to a sequence of `ReductionOp`s inside the warp op body. Actual distribution will be done by `WarpOpReduction` pattern. * If the reduction dimension is not distributed across lanes, the reduction is lane-local. In this case, we yield the source and accumulator vectors from the warp op and perform the lane-local reduction outside the warp op using a sequence of `ReductionOp`s. PR also adds support for distributing `vector.shape_cast` based on layouts.	2025-09-12 09:37:04 -07:00
Jakub Kuderski	2ed3f49c49	[mlir] Use free op create functions. NFC. (#157374 ) The builder create methods are deprecated: https://mlir.llvm.org/deprecation/. See https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.	2025-09-07 22:13:20 -04:00
Nishant Patel	fdfc751d39	[MLIR][XeGPU] Distribute load_gather/store_scatter op from Wg To Sg (#154420 ) This PR adds distribution patterns for scatter ops (LoadGather and StoreScatter) with offsets.	2025-09-01 13:56:02 -07:00
Chao Chen	c96e2cdd13	[mlir][XeGPU] Update utils for LayoutAttr and SliceAttr support (#154819 )	2025-08-27 12:37:15 -05:00
Chao Chen	68d6866428	[mlir][XeGPU] add WgToSg distribution pattern for load_matrix and store_matrix. (#154403 )	2025-08-21 10:02:45 -05:00
Nishant Patel	4a9d038acd	[MLIR][XeGPU] Distribute load_nd/store_nd/prefetch_nd with offsets from Wg to Sg (#153432 ) This PR adds pattern to distribute the load/store/prefetch nd ops with offsets from workgroup to subgroup IR. This PR is part of the transition to move offsets from create_nd to load/store/prefetch nd ops. Create_nd PR : #152351	2025-08-18 09:45:29 -07:00
Jacques Pienaar	4bf33958da	[mlir] Update builders to use new form. (#154132 ) Mechanically applied using clang-tidy.	2025-08-18 15:19:34 +00:00
Chao Chen	9c4e571ae8	[mlir][xegpu] Add definitions of MemDescType and related ops. (#153273 )	2025-08-15 18:02:13 -05:00
Nishant Patel	af87214b84	[MLIR][XeGPU] Add pattern for arith.constant for wg to sg distribution (#151977 )	2025-08-13 13:52:07 -07:00
Nishant Patel	88ff0f955c	[MLIR][XeGPU] Distribute create_nd_desc op without offset from Wg to Sg (#152351 ) This PR adds pattern to distribute the create_nd_desc op without offsets from workgroup (Wg) IR to subgroup (Sg) IR. The round robin distribution logic (involves offset calculation) now will happen in load/store/prefetch nd ops instead of create_nd.	2025-08-11 21:58:24 -07:00
Chao Chen	c96223434c	[mlir][xegpu] Add definition of SliceAttr (#150146 ) --------- Co-authored-by: Charitha Saumya <136391709+charithaintc@users.noreply.github.com>	2025-08-08 11:27:17 -05:00
Maksim Levental	c610b24493	[mlir][NFC] update `mlir/Dialect` create APIs (27/n) (#150638 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-25 11:48:32 -05:00
Nishant Patel	65dec99562	[MLIR][XeGPU] Add support for subgroup_id_range (#148661 ) This PR adds a new attribute to the xegpu dialect called xegpu.range. One use case of this attribute is to attach subgroup_id_range to scf.if to drive the execution.	2025-07-23 11:10:35 -07:00
Chao Chen	317dae1a7e	[mlir][xegpu] Add initial skeleton implementation for lowering ConvertLayoutOp (#146176 ) This PR adds initial skeleton implementation for lowering ConvertLayoutOp. It currently only supports cases where SLM is not needed. --------- Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>	2025-07-23 11:35:40 -05:00
Jianhui Li	90944b85c5	[MLIR][XeGPU] Add offset operands to load_nd/store_nd/prefetch_nd (#149424 ) This PR allows load_nd/store_nd/prefetch_nd to take an additional offset operand. It is based on PR https://github.com/llvm/llvm-project/pull/148335. Users can now create an nd_tdesc with no offset and instead set the offset on the load_nd operation.	2025-07-23 09:00:51 -07:00
Nishant Patel	56b263b1bd	[MLIR][XeGPU] Add transformation pattern for vector.broadcast in Wg to Sg pass (#144417 ) This PR adds transformation pattern for vector.broadcast op in xegpu-wg-to-sg-distribute pass	2025-07-23 08:41:53 -07:00
Maksim Levental	7b78796543	[mlir][NFC] update `mlir/Dialect` create APIs (25/n) (#149932 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-21 19:57:59 -04:00
Chao Chen	5d849d3a90	[mlir][xegpu] Fix seg-fault caused by setting a null attribute (#146002 )	2025-07-01 15:42:52 -05:00
Nishant Patel	8063bd153c	[MLIR][XeGPU] Add support for elementwise ops in Wg to Sg distribute pass [1/N] (#142797 ) This PR adds support for Elementwise operations' (unary & binary) lowering from Workgroup to Subgroup.	2025-06-17 09:55:02 -07:00
Chao Chen	5578bcbcfd	[mlir][xegpu] add support for structure control flow ops in workgroup to subgroup distribution (#142618 ) This PR introduces support for `scf::ForOp`, `scf::WhileOp`, `scf::If`, and `scf::Condition` within the workgroup-subgroup-distribution pass, leveraging the `SCFStructuralTypeConversionsAndLegality`.	2025-06-13 12:32:46 -05:00
Nishant Patel	a7ede51b55	[mlir][XeGPU] Add XeGPU Workgroup to Subgroup Distribution Pass (#140805 ) This PR adds the XeGPU workgroup (wg) to subgroup (sg) pass. The wg to sg pass transforms the xegpu wg level operations to subgroup operations based on the sg_layout and sg_data attribute. The PR adds transformation patterns for following Ops 1. CreateNdDesc 2. LoadNd 3. StoreNd 4. PrefetchNd 5. UpdateNdOffset 6. Dpas	2025-05-21 08:08:46 -05:00
Jan Patrick Lehr	b99e57583e	Revert "[mlir] [XeGPU] Add XeGPU workgroup to subgroup pass (#139477 )" (#140779 ) This reverts commit 747620db2a02b889ae3ba3921d6c0e526a3e7677. Multiple bot failures	2025-05-20 20:31:00 +02:00
Nishant Patel	747620db2a	[mlir] [XeGPU] Add XeGPU workgroup to subgroup pass (#139477 ) This PR adds the XeGPU workgroup (wg) to subgroup (sg) pass. The wg to sg pass transforms the xegpu wg level operations to subgroup operations based on the sg_layout and sg_data attribute. The PR adds transformation patterns for following Ops 1. CreateNdDesc 2. LoadNd 3. StoreNd 4. PrefetchNd 5. UpdateNdOffset 6. Dpas	2025-05-20 12:35:50 -05:00
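
The 1:1 decomposition limitation described for cross-subgroup reduction (#170936) can be illustrated with a little arithmetic. The sketch below is not the pass's implementation: the function names `tiles_per_subgroup` and `is_one_to_one` are hypothetical, and it assumes that `sg_layout[i] * sg_data[i]` evenly divides the workgroup tile in each dimension. It only counts how many `sg_data` tiles each subgroup would be assigned.

```python
from math import prod


def tiles_per_subgroup(wg_shape, sg_layout, sg_data):
    """Count the sg_data tiles assigned to each subgroup (hypothetical helper).

    Assumes sg_layout[i] * sg_data[i] evenly divides wg_shape[i]; other
    cases are outside the scope of this sketch and raise ValueError.
    """
    if not (len(wg_shape) == len(sg_layout) == len(sg_data)):
        raise ValueError("wg_shape, sg_layout and sg_data must have the same rank")
    counts = []
    for dim, num_sg, tile in zip(wg_shape, sg_layout, sg_data):
        span = num_sg * tile  # extent covered by one pass over all subgroups
        if span <= 0 or dim % span != 0:
            raise ValueError("sg_layout * sg_data must evenly divide the WG tile")
        counts.append(dim // span)
    return prod(counts)


def is_one_to_one(wg_shape, sg_layout, sg_data):
    """True when every subgroup is assigned exactly one tile of the WG tile."""
    return tiles_per_subgroup(wg_shape, sg_layout, sg_data) == 1
```

For the example in the commit message, `tiles_per_subgroup([256, 128], [8, 4], [16, 16])` gives 4 tiles per subgroup, so the decomposition is not 1:1; with `sg_data = [32, 32]` the same workgroup tile and `sg_layout` would be 1:1.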

47 Commits