33 Commits

Author SHA1 Message Date
Charitha Saumya
2998c74a1e
[mlir][xegpu] Add SIMT distribution support for GEMM transpose B case. (#155517)
This PR adds the features needed for supporting the GEMM with transpose
B case.

Summary of changes.

1). Add distribution logic for `vector.bitcast`, `vector.transpose` and
`memref.extract_aligned_pointer_as_index` cases.
2). Add layout propagation support for `vector.shape_cast`,
`vector.broadcast` and `vector.bitcast`
3). Incorporate slice attribute and `DistributeLayoutAttr` interface
with the core logic in layout prop.
2025-09-19 10:33:27 -07:00
Charitha Saumya
9b0d7ddb04
[mlir][xegpu] Add support for vector.multi_reduction and vector.shape_cast SIMT distribution. (#157560)
Add support for distributing the `vector.multi_reduction` operation
across lanes in a warp. Currently only 2D to 1D reductions are
supported. Given layouts for the source and accumulator vectors,
* If the reduction dimension is distributed across lanes, the reduction
is non-lane-local and the reduction is done using warp shuffles. Here we
simply rewrite the `MultiDimReductionOp` to a sequence of `ReductionOp`s
inside the warp op body. Actual distribution will be done by
`WarpOpReduction` pattern.
* If the reduction dimension is not distributed across lanes, the
reduction is lane-local. In this case, we yield the source and
accumulator vectors from the warp op and perform the lane-local
reduction outside the warp op using a sequence of `ReductionOp`s.

PR also adds support for distributing `vector.shape_cast` based on
layouts.
2025-09-12 09:37:04 -07:00
Jakub Kuderski
2ed3f49c49
[mlir] Use free op create functions. NFC. (#157374)
The builder create methods are deprecated:
https://mlir.llvm.org/deprecation/. See
https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.
2025-09-07 22:13:20 -04:00
Artem Kroviakov
6c6afdd8c2
[MLIR][XeGPU] Reapply attempt for "Scattered ops sg-to-wi distribution #154949" (#156924)
This PR is a reapply of
https://github.com/llvm/llvm-project/pull/154949, which failed one of
sanitizer checks.
The issue was querying the `warpOp` results in `LoadDistribution` after
calling `moveRegionToNewWarpOpAndAppendReturns()`, which resulted in use
after free. This PR solves the issue by moving the op query before the
call and is otherwise identical to the one linked above.

---------

Co-authored-by: Charitha Saumya <136391709+charithaintc@users.noreply.github.com>
2025-09-04 12:04:30 -07:00
Thurston Dang
c1cc9d2c8a
Revert "[MLIR][XeGPU] Scattered ops sg-to-wi distribution" (#156761)
Reverts llvm/llvm-project#154949 due to suspected buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/55/builds/16630/steps/11/logs/stdio).
Previously commented on the original pull request:
https://github.com/llvm/llvm-project/pull/154949#issuecomment-3250709417

```
******************** TEST 'MLIR :: Dialect/XeGPU/subgroup-distribute.mlir' FAILED ********************
...
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
# | Stack dump:
# | 0.	Program arguments: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/mlir-opt -xegpu-subgroup-distribute -allow-unregistered-dialect -canonicalize -cse -split-input-file /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/test/Dialect/XeGPU/subgroup-distribute.mlir
# |  #0 0x0000c0af4b066df0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13
# |  #1 0x0000c0af4b060e20 llvm::sys::RunSignalHandlers() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Signals.cpp:105:18
# |  #2 0x0000c0af4b0691b4 SignalHandler(int, siginfo_t*, void*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
# |  #3 0x0000ee25a3dcb8f8 (linux-vdso.so.1+0x8f8)
# |  #4 0x0000ee25a36c7608 (/lib/aarch64-linux-gnu/libc.so.6+0x87608)
# |  #5 0x0000ee25a367cb3c raise (/lib/aarch64-linux-gnu/libc.so.6+0x3cb3c)
# |  #6 0x0000ee25a3667e00 abort (/lib/aarch64-linux-gnu/libc.so.6+0x27e00)
# |  #7 0x0000c0af4ae7e4b0 __sanitizer::Atexit(void (*)()) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp:168:10
# |  #8 0x0000c0af4ae7c354 __sanitizer::Die() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5
# |  #9 0x0000c0af4ae66a30 Unlock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:250:16
# | #10 0x0000c0af4ae66a30 ~GenericScopedLock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:386:51
# | #11 0x0000c0af4ae66a30 __hwasan::ScopedReport::~ScopedReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:54:5
# | #12 0x0000c0af4ae661b8 __hwasan::(anonymous namespace)::BaseReport::~BaseReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:477:7
# | #13 0x0000c0af4ae63f5c __hwasan::ReportTagMismatch(__sanitizer::StackTrace*, unsigned long, unsigned long, bool, bool, unsigned long*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:1094:1
# | #14 0x0000c0af4ae4f8e0 Destroy /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:532:31
# | #15 0x0000c0af4ae4f8e0 ~InternalMmapVector /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:642:56
# | #16 0x0000c0af4ae4f8e0 __hwasan::HandleTagMismatch(__hwasan::AccessInfo, unsigned long, unsigned long, void*, unsigned long*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:245:1
# | #17 0x0000c0af4ae51e8c __hwasan_tag_mismatch4 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:764:1
# | #18 0x0000c0af4ae67b30 __interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/interception/interception_linux.cpp:60:0
# | #19 0x0000c0af5641cd24 getNumResults /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:404:37
# | #20 0x0000c0af5641cd24 getOpResultImpl /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:1010:5
# | #21 0x0000c0af5641cd24 getResult /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:407:54
# | #22 0x0000c0af5641cd24 mlir::OpTrait::detail::MultiResultTraitBase<mlir::gpu::WarpExecuteOnLane0Op, mlir::OpTrait::VariadicResults>::getResult(unsigned int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/OpDefinition.h:638:62
# | #23 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:63:33
# | #24 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:105:39
# | #25 0x0000c0af56426b60 (anonymous namespace)::LoadDistribution::matchAndRewrite(mlir::gpu::WarpExecuteOnLane0Op, mlir::PatternRewriter&) const /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:991:55
...
```
2025-09-03 14:40:18 -07:00
Artem Kroviakov
5777f71bce
[MLIR][XeGPU] Scattered ops sg-to-wi distribution (#154949)
This PR adds distribution patterns for scattered load and store ops,
chunk size included.

XeGPU moves toward offsets being part of the load/store ops, so the pass
only supports this case. Manipulating a vector of offsets indirectly
through create_tdesc is complex and soon to become obsolete anyway.
This PR assumes the SIMT-adapted scatter ops verification introduced in
https://github.com/llvm/llvm-project/pull/154653. The distribution
itself can be reviewed in the meantime.
2025-09-03 11:48:55 -07:00
Chao Chen
c96e2cdd13
[mlir][XeGPU] Update utils for LayoutAttr and SliceAttr support (#154819) 2025-08-27 12:37:15 -05:00
Adam Siemieniuk
533ddcd989
[mlir][gpu] Warp execute terminator getter (#154729)
Adds a utility getter to `warp_execute_on_lane_0` which simplifies
access to the op's terminator.

Uses are refactored to utilize the new terminator getter.
2025-08-22 18:24:23 +02:00
Charitha Saumya
06884d0204
[mlir][xegpu] Bug fix in UpdateNdOffset distribution. (#150545)
Reason is UpdateNdOffset source operand not retaining the layouts when
it is yielded by the warp op. `warp_execute_on_lane0` op expects that
TensorDesc type is unchanged during distribution out of its region. we
use UnrealizedCasts to reconcile this mismatch outside the warpOp (via
`resolveDistributedTy`)
2025-08-05 14:42:14 -07:00
Jianhui Li
90944b85c5
[MLIR][XeGPU] Add offset operands to load_nd/store_nd/prefetch_nd (#149424)
This PR allows load_nd/store_nd/prefetch_nd to take an additional offset
operand.
It is based on this PR https://github.com/llvm/llvm-project/pull/148335.
Now user can create a nd_tdesc with no offset, and instead set the
offset with the load_nd operation.
2025-07-23 09:00:51 -07:00
Maksim Levental
7b78796543
[mlir][NFC] update mlir/Dialect create APIs (25/n) (#149932)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-21 19:57:59 -04:00
Charitha Saumya
fc3781853b
[mlir][xegpu] Minor fixes in XeGPU subgroup distribution. (#147846)
This PR addresses the following issues.

1. Add the missing attributes when creating a new GPU funcOp in
`MoveFuncBodyToWarpExecuteOnLane0` pattern.
2. Bug fix in LoadNd distribution to make sure LoadOp is the last op in
warpOp region before it is distributed (needed for preserving the memory
op ordering during distribution).
3. Add utility for removing OpOperand or OpResult layout attributes.
2025-07-17 15:13:20 -07:00
Charitha Saumya
244ebef1dd
Reapply [mlir][vector] Refactor WarpOpScfForOp to support unused or swapped forOp results. (#148313)
Reapply attempt for : https://github.com/llvm/llvm-project/pull/148291
Fix for the build failure reported in :
https://lab.llvm.org/buildbot/#/builders/116/builds/15477

-----

This crash is caused by mismatch of distributed type returned by
`getDistributedType` and intended distributed type for forOp results.

Solution diff:
20c2cf6766

Example:
```
func.func @warp_scf_for_broadcasted_result(%arg0: index) -> vector<1xf32> {
  %c128 = arith.constant 128 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %2 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<1xf32>) {
    %ini = "some_def"() : () -> (vector<1xf32>)
    %0 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini) -> (vector<1xf32>) {
      %1 = "some_op"(%arg4) : (vector<1xf32>) -> (vector<1xf32>)
      scf.yield %1 : vector<1xf32>
    }
    gpu.yield %0 : vector<1xf32>
  }
  return %2 : vector<1xf32>
}
``` 
In this case the distributed type for forOp result is `vector<1xf32>`
(result is not distributed and broadcasted to all lanes instead).
However, in this case `getDistributedType` will return NULL type.

Therefore, if the distributed type can be recovered from warpOp, we
should always do that first before using `getDistributedType`
2025-07-14 15:41:56 -07:00
Charitha Saumya
1d33bbab57
Revert "[mlir][vector] Refactor WarpOpScfForOp to support unused or swapped forOp results." (#148291)
Reverts llvm/llvm-project#147620

Reverting due to build failure:
https://lab.llvm.org/buildbot/#/builders/116/builds/15477
2025-07-11 13:22:54 -07:00
Charitha Saumya
3092b765ba
[mlir][vector] Refactor WarpOpScfForOp to support unused or swapped forOp results. (#147620)
Current implementation generates incorrect code or crashes in the
following valid cases.

1. At least one of the for op results are not yielded by the warpOp.
Example:
```
%0 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<4xf32>) {
    ....
    %3:2 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini, %arg5 = %ini1) -> (vector<128xf32>, vector<128xf32>) {
      
      %1  = ...
      %acc = ....
      scf.yield %acc, %1 : vector<128xf32>, vector<128xf32>
    }
    gpu.yield %3#0 : vector<128xf32> // %3#1 is not used but can not be removed as dead code (loop carried).
  }
  "some_use"(%0) : (vector<4xf32>) -> ()
  return
```
2. Enclosing warpOp yields the forOp results in different order compared
to the forOp results.
Example:
```
  %0:3 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<4xf32>, vector<4xf32>, vector<8xf32>) {
    ....
    %3:3 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini1, %arg5 = %ini2, %arg6 = %ini3) -> (vector<256xf32>, vector<128xf32>, vector<128xf32>) {
      .....
      scf.yield %acc1, %acc2, %acc3 : vector<256xf32>, vector<128xf32>, vector<128xf32>
    }
    gpu.yield %3#2, %3#1, %3#0 : vector<128xf32>, vector<128xf32>, vector<256xf32> // swapped order
  }
  "some_use_1"(%0#0) : (vector<4xf32>) -> ()
  "some_use_2"(%0#1) : (vector<4xf32>) -> ()
  "some_use_3"(%0#2) : (vector<8xf32>) -> ()

```
2025-07-11 13:08:33 -07:00
Benjamin Kramer
3287c1c176 [XeGPU] Move targetinfo constants to their own header file
This breaks the dependency from Dialect to Utils, which would be cyclic.
2025-06-26 13:53:00 +02:00
Charitha Saumya
c8a9579ff9
[mlir][xegpu] Add support for distributing gpu.barrier (#145434) 2025-06-24 09:28:30 -07:00
Charitha Saumya
adc6228ea0
[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. (#142687)
Changes:
* Decouple layout propagation from subgroup distribution and move it to
an independent pass.
* Refine layout assignment to handle control-flow ops correctly (scf.for, scf.while).
* Refine test cases.
2025-06-20 10:43:19 -07:00
Jeremy Kun
b533b0ec34
Define a DataFlowSolver helper that loads sensible default analyses (#143415)
Cf. https://discourse.llvm.org/t/mlir-dead-code-analysis/67568/10

Custom analysis passes will not work properly unless both
DeadCodeAnalysis and SparseConstantPropagation are loaded to the
DataFlowSolver. This is intended behavior, but surprising to many users
as shown in the thread. In lieu of a longer-term fix (which I am not
knowledgeable enough to implement myself, yet), this commit adds a
helper function that loads these two analyses, as well as providing
breadcrumbs for an explanation of the problem. The existing places in
the codebase where these two analyses are loaded for the purpose of
running other unrelated analyses are replaced by the use of the helper.

---------

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
Co-authored-by: Oleksandr "Alex" Zinenko <azinenko@amd.com>
2025-06-20 08:16:52 -07:00
Chao Chen
9e2684e4cf
[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N] (#142477)
Bring back https://github.com/llvm/llvm-project/pull/140163 with fixes
2025-06-02 21:39:30 -05:00
Chao Chen
b88dfb0b23
Revert "[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N]" (#142459)
Reverts llvm/llvm-project#140163
2025-06-02 15:47:21 -04:00
Chao Chen
0210750d5a
[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N] (#140163)
This PR introduces the initial implementation of a blocking pass for
XeGPU programs. The pass leverages unroll patterns from both the XeGPU
and Vector dialects. 

---------

Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
2025-06-02 14:02:45 -05:00
Wang Qiang
cece058191
[llvm][mlir][NFC] Fix typos in comments and test descriptions (#139688)
This patch fixes several typographical errors in comments and test
files:

1. Corrected "achive" to "archive" in archive-update.test. 
2. Fixed "achive" to "achieve" in a comment in
XeGPUSubgroupDistribute.cpp.
3. Corrected "achived" to "achieved" in a test note in
SimpleSIVNoValidityCheckFixedSize.ll.

These changes are non-functional and intended to improve readability and
documentation accuracy.

Signed-off-by: Kane Wang <wangqiang1@kylinos.cn>
Co-authored-by: Kane Wang <wangqiang1@kylinos.cn>
2025-05-13 11:03:51 +01:00
Rahul Joshi
b17f3c63de
[NFC][MLIR] Add {} for else when if body has {} (#139422) 2025-05-12 10:29:03 -07:00
Charitha Saumya
e7dcf1b7e5
[mlir][xegpu] Add SIMT distribution patterns for UpdateNdOffset and PrefetchNd ops. (#138033)
This PR adds support for SIMT distribution of UpdateNdOffset and
PrefetchNd ops.

For both these ops distribution will remove the layout attribute from
the tensor descriptor type. Everything else remains unchanged.

Example 1:

 ```
   #lo0 = #xegpu.layout<wi_layout = [1, 8], wi_data = [1, 1]>
   gpu.warp_execute_on_lane_0(%laneid) -> () {
     ...
     xegpu.prefetch_nd %arg0 : !xegpu.tensor_desc<4x8xf32, #lo0>
   }
 ```
 To
 ```
   %r:2 = gpu.warp_execute_on_lane_0(%laneid) -> (
   !xegpu.tensor_desc<4x8xf32, #lo0>) {
     gpu.yield %arg0: !xegpu.tensor_desc<4x8xf32, #lo0>
   }
   %1 = unrealized_conversion_cast %r#0: !xegpu.tensor_desc<4x8xf32,
     #lo0> -> !xegpu.tensor_desc<4x8xf32>
   xegpu.prefetch_nd %0 : !xegpu.tensor_desc<4x8xf32>

 ```
Example 2:
 ```
   #lo0 = #xegpu.layout<wi_layout = [1, 8], wi_data = [1, 1]>
   %r = gpu.warp_execute_on_lane_0(%laneid) ->
                   (!xegpu.tensor_desc<4x8xf32, #lo0>) {
     ...
     %update = xegpu.update_nd_offset %arg0, [%c32, %c16]:
       !xegpu.tensor_desc<4x8xf32, #lo0>
     gpu.yield %update
   }
   ...
 ```
 To
 ```
   %r:2 = gpu.warp_execute_on_lane_0(%laneid) -> (vector<4x1xf32>,
   !xegpu.tensor_desc<4x8xf32, #lo0>) {
     ...
     %dead = xegpu.update_nd_offset %arg0, [%c32, %c16]:
       !xegpu.tensor_desc<4x8xf32, #lo0> gpu.yield %dead, %arg0
     gup.yield %dead, %arg0, %c32, %c16
   }
%0 = xegpu.unrealized_conversion_cast %r#1: !xegpu.tensor_desc<4x8xf32,
        #lo0> -> !xegpu.tensor_desc<4x8xf32>
   %1 = xegpu.update_nd_offset %0, [%c32, %c16]:
     !xegpu.tensor_desc<4x8xf32>
   ...
 ```
2025-05-08 13:17:38 -07:00
Charitha Saumya
7a66746226
[mlir][xegpu] Handle scalar uniform ops in SIMT distribution. (#138593)
This PR adds support for moving scalar uniform (gpu index ops, constants
etc) outside the `gpu.warp_execute_on_lane0` op. These kinds of ops do
not require distribution and are safe to move out of the warp op. This
also avoid adding separate distribution patterns for these ops.

Example:
```
   %1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) {
     ...
     %block_id_x = gpu.block_id x
     gpu.yield %block_id_x
   }
  // use %1
```
To:
```
   %block_id_x = gpu.block_id x
   %1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) {
     ...
     
     gpu.yield %block_id_x
   }
  // use %1

```
2025-05-08 10:35:32 -07:00
Kazu Hirata
b2e2ae8702 [mlir] Fix warnings
This patch fixes:

  mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:901:12:
  error: variable 'origVecType' set but not used
  [-Werror,-Wunused-but-set-variable]

  mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:908:12:
  error: variable 'origTensorDescTy' set but not used
  [-Werror,-Wunused-but-set-variable]
2025-05-01 12:41:31 -07:00
Charitha Saumya
d30554b19e
[mlir][xegpu] SIMT distribution patterns for XeGPU CreateNdTdesc, LoadNd, StoreNd and Dpas Ops. (#135271)
This PR adds the SIMT distribution patterns for create_nd_tdesc, load_nd, store_nd and dpas XeGPU ops.
2025-04-30 12:16:47 -07:00
Jakub Kuderski
198c5dac37
[mlir][transform] Clean up prints. NFC. (#136401)
Use `llvm::interleaved` from #135517 to simplify printing.
2025-04-19 12:11:06 -04:00
Kazu Hirata
456963de96 [mlir] Fix warnings
This patch fixes:

  mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:54:3:
  error: definition of implicit copy assignment operator for 'Layout'
  is deprecated because it has a user-declared copy constructor
  [-Werror,-Wdeprecated-copy]

  mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:103:3:
  error: definition of implicit copy assignment operator for 'SGMap'
  is deprecated because it has a user-declared copy constructor
  [-Werror,-Wdeprecated-copy]
2025-03-14 13:28:58 -07:00
Charitha Saumya
fd24805c8e
Reapply [mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU SIMT distribution. (#131380)
Originally introduced in #130240 and reverted in #131364 

Reproduced the issue locally in Linux by doing a shared lib build. Fixes
including adding the missing LINK_LIBS.

**Original commit message:**

This PR adds the SG map propagation step of the XeGPU SIMT distribution.
SG map propagation is a sparse backward dataflow analysis that propagate
the sg_map backward starting from the operands of certain operations
(DPAS, store etc.).

This is the first step of XeGPU subgroup distribution. This analysis
result is used to attach layout information to each XeGPU SIMD subgroup
op. The lowering patterns in XeGPUSubgroupDistribute will consume these
layout info to distribute SIMD ops into SIMT ops that work on work-item
level data fragments.

Summary of Lowering XeGPU SIMD -> SIMT
Subgroup map propagation (This PR)
Attach sg_map to each op in move all ops inside
gpu.warp_execute_on_lane0 region.
Distribute each op using sg_map
Additional legalization steps to align more with Xe HW.
2025-03-14 12:38:36 -07:00
Charitha Saumya
3fcd921aa4
Revert "[mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU SIMT distribution." (#131364)
Reverts llvm/llvm-project#130240
2025-03-14 10:36:58 -07:00
Charitha Saumya
5eb557774d
[mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU SIMT distribution. (#130240)
This PR adds the SG map propagation step of the XeGPU SIMT distribution.
SG map propagation is a sparse backward dataflow analysis that propagate
the sg_map backward starting from the operands of certain operations
(DPAS, store etc.).

This is the first step of XeGPU subgroup distribution. This analysis
result is used to attach layout information to each XeGPU SIMD subgroup
op. The lowering patterns in XeGPUSubgroupDistribute will consume these
layout info to distribute SIMD ops into SIMT ops that work on work-item
level data fragments.

### Summary of Lowering XeGPU SIMD -> SIMT

1. Subgroup map propagation (This PR)
2. Attach `sg_map` to each op in move all ops inside
`gpu.warp_execute_on_lane0` region.
3. Distribute each op using `sg_map`
4. Additional legalization steps to align more with Xe HW.
2025-03-14 10:21:22 -07:00