15 Commits

Author SHA1 Message Date
Jianhui Li
401ba6df84
[MLIR][XeGPU] Add Layout Propagation support for multi-reduction/reduction op with scalar result (#189133)
This PR adds layout propagation support for multi-reduction/reduction ops
with a scalar result:
1) Enhance setupMultiReductionResultLayout() and
LayoutInfoPropagation::visitVectorMultiReductionOp() to support a scalar
result.
2) Add propagation support for the vector.reduction op at the lane level,
since the op is only introduced at the lane level.
2026-04-01 13:01:34 -07:00
Andrey Pavlenko
44c6a0acb7
[MLIR][XeGPU] Fix dpas f16 output layout (#184419)
Layout propagation fails if dpas has an f16 accumulator. This fix
resolves the issue by removing the packingSize argument, which does not
seem valid here.
2026-03-20 20:28:26 +00:00
Jianhui Li
f5e2238a3e
[MLIR][XeGPU] Enhance multi-reduction layout propagation rules (#186308)
This PR enhances multi-reduction layout propagation:
1. Improve inst_data and lane_data to support fractional subgroup size.
2. Improve subgroup_layout/data setup to utilize the (nested) slice
layout from the consumer op.

It also removes the restriction in load_matrix/store_matrix layout
propagation to allow nd (n>2) layouts.
2026-03-20 08:12:32 -07:00
Jianhui Li
c6395bb287
[MLIR][XeGPU] Enhance Layout Propagation for broadcasting both leading dimensions and inner unit dimensions (#185583)
This PR enhances the layout propagation rules for broadcast operations.

The source layout is derived from the result layout based on the
broadcast pattern:
1. Broadcast on leading dimensions
  The source layout is the slice layout of the result layout.
2. Broadcast on inner unit dimensions
  The source layout matches the result layout, with sg_data and lane_data
  set to 1.
3. Broadcast on both leading dimensions and inner unit dimensions
  The source layout is derived by combining the above two rules.
2026-03-12 20:23:59 -07:00
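The three broadcast rules described in #185583 can be illustrated with a small toy model. This is only a sketch, not the MLIR implementation: the `Layout` and `SliceLayout` classes, the function name `infer_broadcast_source`, and the choice to reset `sg_data` and `lane_data` only on the broadcast unit dimensions are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


@dataclass(frozen=True)
class Layout:
    """Hypothetical stand-in for an XeGPU layout (per-dimension fields)."""
    sg_layout: Tuple[int, ...]
    sg_data: Tuple[int, ...]
    lane_layout: Tuple[int, ...]
    lane_data: Tuple[int, ...]


@dataclass(frozen=True)
class SliceLayout:
    """Hypothetical stand-in for a slice layout: parent layout minus `dims`."""
    parent: Layout
    dims: Tuple[int, ...]


def infer_broadcast_source(result_layout: Layout,
                           src_shape: Tuple[int, ...],
                           res_shape: Tuple[int, ...]) -> Union[Layout, SliceLayout]:
    # Number of leading dimensions added by the broadcast.
    lead = len(res_shape) - len(src_shape)
    # Result dimensions that are broadcast from a unit dimension of the source.
    unit_dims = [lead + i
                 for i, (s, r) in enumerate(zip(src_shape, res_shape[lead:]))
                 if s == 1 and r != 1]

    layout = result_layout
    if unit_dims:
        # Rule 2: keep the result layout, but set sg_data and lane_data to 1
        # (assumption: only on the broadcast unit dimensions).
        def ones(vals):
            return tuple(1 if i in unit_dims else v for i, v in enumerate(vals))
        layout = replace(layout,
                         sg_data=ones(layout.sg_data),
                         lane_data=ones(layout.lane_data))
    if lead:
        # Rule 1 (and rule 3 when combined with rule 2): slice away the
        # leading broadcast dimensions.
        return SliceLayout(parent=layout, dims=tuple(range(lead)))
    return layout
```

For example, with a 2-D result layout, a source of shape `(64,)` broadcast to `(32, 64)` yields a slice layout over dimension 0, while a source of shape `(32, 1)` yields the same layout with `sg_data` and `lane_data` set to 1 in dimension 1.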
Jianhui Li
fe11a43c60
[MLIR][XeGPU] Enhancing insert_strided_slice layout setup and infer rules (#184742)
This PR enhances the insert_strided_slice layout rules to handle slice
layouts and to adjust the layout to fit the src shape. It also adds
dropDims as a layout utility function.
2026-03-06 17:08:23 -08:00
Jianhui Li
34259b76bf
[MLIR][XeGPU] Refactoring Transpose OP Layout Propagation (#184702)
This PR refactors Transpose Op Layout Propagation: 
1. Add inferTransposeSourceLayout() to the layout utilities, and update
layout propagation and conflict handling to use this function
2. Add Layout utility: TransposeDims()
3. Refactor IsTransposeOf() and fix minor bugs
4. Fix minor issue in dropSgLayoutAndData()
2026-03-05 15:03:49 -08:00
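As a rough illustration of what the transpose utilities in #184702 are for, the sketch below permutes the per-dimension fields of a layout. It is a toy model under stated assumptions: layouts are plain dictionaries of per-dimension lists, the Python names `transpose_dims`, `infer_transpose_source`, and `is_transpose_of` are hypothetical analogues of the C++ utilities, and any `order` attribute handling is ignored. It follows the `vector.transpose` convention that result dimension `i` is source dimension `perm[i]`.

```python
from typing import Dict, List


def transpose_dims(values: List[int], perm: List[int]) -> List[int]:
    """Permute a per-dimension list: output[i] = values[perm[i]]."""
    return [values[p] for p in perm]


def infer_transpose_source(result_layout: Dict[str, List[int]],
                           perm: List[int]) -> Dict[str, List[int]]:
    """Derive the source layout by applying the inverse permutation."""
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return {name: transpose_dims(vals, inverse)
            for name, vals in result_layout.items()}


def is_transpose_of(result_layout: Dict[str, List[int]],
                    source_layout: Dict[str, List[int]],
                    perm: List[int]) -> bool:
    """Check that permuting the source layout by `perm` gives the result layout."""
    return all(transpose_dims(source_layout[name], perm) == vals
               for name, vals in result_layout.items())
```

Under this model, a result layout with `lane_layout=[1, 16]` and permutation `[1, 0]` gives a source layout with `lane_layout=[16, 1]`.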
Nishant Patel
8774da8f2f
[MLIR][XeGPU] Preserve anchor layouts in recoverTemporaryLayout (#182186) 2026-03-01 15:43:01 -08:00
Jianhui Li
77600cbd97
[MLIR][XeGPU] XeGPU Layout adds support for fractional-subgroup-size vector (#183434)
This PR enhances the layout assignment for XeGPU load/store operations
to handle vectors smaller than the subgroup size.
For example, for vector[4] the layout is lane_data=[1], lane_layout=[4],
and inst_data=[4].
Fractional-subgroup-size vector support is required for the
cross-subgroup reduction case. The number of subgroups participating in a
reduction can be small, so each subgroup may need to reduce a small
vector, often a fraction of the subgroup size.
Most layout-based subgroup distribution patterns support
fractional-subgroup-size vectors without any change, except for a few:
reduction, insert/extract, and constant. We don't expect ND operations
(like load_nd/store_nd/dpas) to accept fractional-subgroup-size vectors.
2026-02-26 19:49:33 -08:00
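The vector[4] example from #183434 can be reproduced with a minimal sketch. The function name `fractional_lane_layout`, the default subgroup size of 16, and the behavior for vectors at or above the subgroup size are assumptions for illustration only; the commit message states only the vector[4] case.

```python
from typing import Dict, List


def fractional_lane_layout(vector_len: int,
                           subgroup_size: int = 16) -> Dict[str, List[int]]:
    """Toy 1-D layout assignment for a vector that may be smaller than the subgroup.

    Assumption: each participating lane holds one element, and only
    min(vector_len, subgroup_size) lanes participate.
    """
    lanes = min(vector_len, subgroup_size)
    return {
        "lane_layout": [lanes],  # number of lanes that hold data
        "lane_data": [1],        # elements per lane
        "inst_data": [lanes],    # elements handled per instruction
    }
```

Calling `fractional_lane_layout(4)` gives `lane_layout=[4]`, `lane_data=[1]`, and `inst_data=[4]`, matching the example in the commit message.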
Charitha Saumya
84594d7539
[mlir][xegpu] Add vector layout conflict handling in XeGPU layout propagation pass. (#182402)
This PR adds support for layout conflict handling for vector operands. A
conflict for a vector operand occurs when a value consumed at a given
operand is not in the layout expected by the consumer (for example, the
source of a `vector.multi_reduction` op requires a specific layout
inferred from its current result layout). To resolve this conflict, we
insert an `xegpu.convert_layout` right after the producer (essentially
duplicating the producer with the expected layout) and use the new value
in the consumer.
2026-02-25 12:38:33 -08:00
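The conflict resolution described in #182402 can be sketched on a toy IR. This is not the pass itself: the `Op` class, the string-valued layouts, the `_cvt` naming, and the function `resolve_operand_conflict` are hypothetical and exist only to show where the `xegpu.convert_layout` op lands and how the consumer is rewired.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical single-result op in a flat, ordered op list."""
    name: str
    result: str
    layout: str
    operands: List[str] = field(default_factory=list)


def resolve_operand_conflict(ops: List[Op], consumer: Op, operand_index: int,
                             expected_layout: str) -> Optional[Op]:
    """Insert a convert_layout after the producer if its layout is not the expected one."""
    value = consumer.operands[operand_index]
    pos = next(i for i, op in enumerate(ops) if op.result == value)
    producer = ops[pos]
    if producer.layout == expected_layout:
        return None  # no conflict, nothing to do
    converted = Op("xegpu.convert_layout", value + "_cvt",
                   expected_layout, [value])
    # Place the conversion right after the producer...
    ops.insert(pos + 1, converted)
    # ...and make only this consumer operand use the converted value.
    consumer.operands[operand_index] = converted.result
    return converted
```

Other users of the original value are left untouched, so they keep seeing the producer's original layout.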
Artem Kroviakov
4226250a42
[MLIR][XeGPU] Fix matrix ops layout propagation (#182268) 2026-02-22 13:40:48 +01:00
Nishant Patel
14f20ce795
[MLIR][XeGPU] Remove layout attribute from scf ops after wg to sg (#180771) 2026-02-12 07:26:18 -08:00
Artem Kroviakov
760f70711a
[MLIR][XeGPU] Use the setupDpasLayout utility for dpas layout propagation (#180937) 2026-02-12 13:18:47 +01:00
Nishant Patel
570055bf97
[MLIR][XeGPU] Propagate layout from anchor ops before Wg To Sg & Blocking Pass (#179490)
This PR calls recoverTemporaryLayout before the XeGPUWgtoSgDistribute and
XeGPUBlocking passes to recover all the temporary operand layouts that
might be required by the transformation patterns for checks and
verification.
2026-02-06 15:56:09 -08:00
Jianhui Li
8102ebf6a3
[MLIR][XeGPU] Fixing PR179016 minor issues (#180295)
Fix two issues introduced by PR179016:
1. An unused variable when building with
"-DLLVM_ENABLE_ASSERTIONS=OFF".
2. Restore the modification to recoverTemporaryLayouts() made by
PR176737, which was unintentionally lost during the merging process.
2026-02-06 14:51:40 -08:00
Jianhui Li
61b8a57839
[MLIR][XeGPU] Refactor layout propagation utilities (#179016)
This PR refactors layout propagation into two distinct components:
result/anchor layout setup and source layout inference from the result.

For operations that require a specific result layout due to semantic or
hardware constraints, the propagation logic explicitly sets up the
result or anchor layout. Otherwise, it infers the source layout from the
backward-propagated consumer layout.

The result or anchor layout may differ from the backward-propagated
consumer layout; any such discrepancies are resolved via the existing
layout-conflict mechanism.

**This PR introduces the following utility functions:**

Source layout inference:

> inferBroadcastSourceLayout()
> inferMultiReductionSourceLayout()
> inferBitCastSourceLayout()
> inferShapeCastSourceLayout()
> inferInsertStridedSliceSourceLayout()

Result / anchor layout setup:

> setupMultiReductionResultLayout()
> setupBitCastResultLayout()
> setupInsertStridedSliceResultLayout()
> setupLoadMatrixAnchorLayout()
> setupStoreMatrixAnchorLayout()
> setupLoadGatherAnchorLayout()
> setupStoreScatterAnchorLayout()

Some of the subgroup-distribution-related code changes were split out
into a separate PR: https://github.com/llvm/llvm-project/pull/179018/changes.
2026-02-05 19:26:25 -08:00
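The split described in #179016 between result/anchor layout setup and source layout inference can be pictured with the following sketch. It is a simplified model, not the actual utilities: layouts are plain strings, the rule tables and the `propagate` function are hypothetical, and inferring the source layout from the set-up result layout (rather than from the consumer layout) for ops that have a setup rule is an assumption.

```python
from typing import Callable, Dict, Tuple

Rule = Callable[[str], str]


def propagate(op_kind: str, consumer_layout: str,
              setup_rules: Dict[str, Rule],
              infer_rules: Dict[str, Rule]) -> Tuple[str, str, bool]:
    """Return (result_layout, source_layout, has_conflict) for one op."""
    has_conflict = False
    if op_kind in setup_rules:
        # The op requires a specific result/anchor layout.
        result_layout = setup_rules[op_kind](consumer_layout)
        # A mismatch with the backward-propagated consumer layout is left
        # to the layout-conflict mechanism.
        has_conflict = result_layout != consumer_layout
    else:
        # No constraint: accept the backward-propagated consumer layout.
        result_layout = consumer_layout
    source_layout = infer_rules[op_kind](result_layout)
    return result_layout, source_layout, has_conflict
```

Keeping the two rule tables separate mirrors the setup*/infer* naming of the utility functions listed in the commit message.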