21 Commits

Author SHA1 Message Date
Matthias Springer
0b24580a26
[mlir][Interfaces][NFC] Add RegionBranchOpInterface helper for forwarded values (#173981)
Add a helper function to compute a mapping of successor operands to
successor inputs. This mapping is computed in various places. Also add a
helper function to gather all region branch points.

This commit is in preparation of a bug fix / partial redesign of
`-remove-dead-values`. This commit also removes some duplicate code in
various places.
2026-01-01 11:56:05 +01:00
Jianhui Li
2b9e47749c
[MLIR][XeGPU] Refactor Layout access interface (#172125)
This PR builds on the anchor layout mechanism introduced in
https://github.com/llvm/llvm-project/pull/169267 and performs the
following refactoring:

1. Introduce getAnchorLayout() and setAnchorLayout() interface for
anchor ops to get and set layout attributes.

2. Add getLocalLayout() and setLocalLayout() utility functions, and
refactor workgroup/subgroup distribution patterns to use these APIs.
These utilities access the layout information directly and locally,
without relying on global propagation.

3. Introduce localPropagateLayoutsFromAnchor(), a utility used by
subgroup distribution to unify non-anchor layout setup.
This function is intended to be invoked upfront by all layout-based
passes (including workgroup/subgroup distribution and unrolling) to
propagate layouts from anchor ops to non-anchor ops.
After this step, patterns within the pass should exclusively use
getLocalLayout() / setLocalLayout().

4. Refactor getDistributeLayoutAttr() and setDistributeLayoutAttr() to
remove special-case handling. These APIs now operate in a uniform order:
anchor ops first, then non-anchor ops, and finally block arguments.
These APIs will be deprecated on long run. 

5. Refactor patterns in wg/sg distribution, load optimization passes to
use get/setAnchorLayout() and get/setLocalLayout().

6. Update test cases to enforce that anchor ops must use—and only
use—anchor layouts.
2025-12-17 12:04:58 -08:00
Artem Kroviakov
a6f837e9f8
[MLIR][XeGPU] Add sg layout propagation (#170879) 2025-12-17 11:03:21 +01:00
Jianhui Li
492340aeb1
[MLIR][XeGPU] Add handling for unit-dim expansion in ShapeCast workgroup-to-subgroup distribution (#171758)
Add special-case handling for ShapeCast when it expands unit dimensions
for a succeeding broadcast op. In this scenario, distribution requires
the source layout to be a slice layout, and the result layout is first
normalized by setting the expanded unit dimensions to 1 before computing
the distributed result shape. In all other cases, ShapeCast is
distributed as usual.

This PR also updates the propagation rule for vectors with expanded unit
dimensions, allowing them to share the same layout as the result of a
broadcast op. This enables correct layout propagation back to the source
of the ShapeCast op, as that layout must ultimately be restored as the
parent layout of the slice layout.
2025-12-16 13:13:11 -08:00
Jianhui Li
5236af88e5
[MLIR][XeGPU] Extend propagation and sg_to_lane distribution pass support broadcast with low rank and scalar source input (#170409)
This PR extends XeGPU layout propagation and distribution for
vector.broadcast operation.
It relaxes the restriction of layout propagation to allow low-rank and
scalar source input, and adds a pattern in sg-to-wi distribution to
support the lowering.
2025-12-09 08:48:27 -08:00
Artem Kroviakov
ea00593dd1
[MLIR][XeGPU][Quickfix] Disable block count in propagation (#170304)
One of the previous PRs
https://github.com/llvm/llvm-project/pull/169267/ has reintroduced block
count to layout propagation that was removed in
https://github.com/llvm/llvm-project/pull/168504/. This PR patches the
issue.
2025-12-02 09:49:06 -08:00
Jianhui Li
326a1a4bad
[MLIR][XeGPU] Add anchor_layout and update propagation to honor user-specified layouts (#169267)
Introduce anchor layout for XeGPU anchor ops: load_nd, store_nd,
prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and
atomic_rmw. Anchor layout is permanent, and is guaranteed to be honored
by XeGPU distribution and lowerinngs once specified.
1. Add anchor_layout for XeGPU anchor OPs: load_nd, store_nd,
prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and
atomic_rmw.
2. rename layout attributes to anchor_layout for these ops: load, store,
load_matrix, store_matrix
3. update layout propagation pass: Only when user doesn't specify anchor
layout, the pass computes a default layout and set to anchor op's
permant layout and use that for propagation. if user specified anchor
layout, the pass takes user-specified anchor layout. permant layout and
use that for propagation. if user specified anchor layout, the pass
takes user-specified anchor layout.
2025-11-26 23:02:01 -08:00
Artem Kroviakov
8ea5e20ce4
[MLIR][XeGPU] Disable block count usage in layout propagation (#168504) 2025-11-23 10:34:57 +01:00
Artem Kroviakov
bba40ab4bd
[MLIR][XeGPU] Decouple inst_data and lane_layout in propagation (#166941) 2025-11-10 14:14:11 +01:00
Charitha Saumya
9703bda95b
[mlir][xegpu] Add OptimizeBlockLoads pass. (#165483)
This pass rewrites certain xegpu `CreateNd` and `LoadNd` operations that
feeds into `vector.transpose` to more optimal form to improve
performance. Specifically, low precision (bitwidth < 32) `LoadNd` ops
that feeds into transpose ops are rewritten to i32 loads with a valid
transpose layout such that later passes can use the load with transpose
HW feature to accelerate such load ops.

**Update:**
Pass is renamed to `OptimizeBlockLoads ` because later we plan to add
the array length optimization into this pass as well. This will break
down a larger load (like `32x32xf16`) into more DPAS-favorable array
length loads (`32x16xf16` with array length = 2). Both these
optmizations require rewriting `CreateNd` and `LoadNd` and it makes
sense to have a common pass for both.
2025-11-04 13:15:32 -08:00
Dmitry Chigarev
6c563dc6a2
[mlir][XeGPU] Add optional layout attribute to LoadGather StoreScatter ops (#163414)
As [suggested
here](https://github.com/llvm/llvm-project/pull/163071#discussion_r2427229637)
the PR adds an optional layout attribute for `LoadGather` and
`StoreScatter` ops.

For the load-op the attribute describes the layout of the result (ex
`layout_result_0`), and for store-op it describes the layout for the
vector-to-store operand (ex `layout_operand_0`).

The PR also reworks `propagate-layout` pass to consider perm layout
attributes and back-propagate them accordingly.

The helper utility function `getDistributeLayoutAttr` is reworked to
return either `layout_operand/result_0` or `layout` for load/store ops
(denepding on which one is set). After an offline discussion decided
that the overall utilities layouts API is confusing since it tries to
mix permament and temporary layouts. Would need to change it in the
future.

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-04 08:19:47 -08:00
Artem Kroviakov
ec657d859c
[MLIR][XeGPU] Introduce xegpu::uArch usage in target-sensitive passes (#163801) 2025-10-31 17:33:11 +01:00
Charitha Saumya
2998c74a1e
[mlir][xegpu] Add SIMT distribution support for GEMM transpose B case. (#155517)
This PR adds the features needed for supporting the GEMM with transpose
B case.

Summary of changes.

1). Add distribution logic for `vector.bitcast`, `vector.transpose` and
`memref.extract_aligned_pointer_as_index` cases.
2). Add layout propagation support for `vector.shape_cast`,
`vector.broadcast` and `vector.bitcast`
3). Incorporate slice attribute and `DistributeLayoutAttr` interface
with the core logic in layout prop.
2025-09-19 10:33:27 -07:00
Artem Kroviakov
6c6afdd8c2
[MLIR][XeGPU] Reapply attempt for "Scattered ops sg-to-wi distribution #154949" (#156924)
This PR is a reapply of
https://github.com/llvm/llvm-project/pull/154949, which failed one of
sanitizer checks.
The issue was querying the `warpOp` results in `LoadDistribution` after
calling `moveRegionToNewWarpOpAndAppendReturns()`, which resulted in use
after free. This PR solves the issue by moving the op query before the
call and is otherwise identical to the one linked above.

---------

Co-authored-by: Charitha Saumya <136391709+charithaintc@users.noreply.github.com>
2025-09-04 12:04:30 -07:00
Thurston Dang
c1cc9d2c8a
Revert "[MLIR][XeGPU] Scattered ops sg-to-wi distribution" (#156761)
Reverts llvm/llvm-project#154949 due to suspected buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/55/builds/16630/steps/11/logs/stdio).
Previously commented on the original pull request:
https://github.com/llvm/llvm-project/pull/154949#issuecomment-3250709417

```
******************** TEST 'MLIR :: Dialect/XeGPU/subgroup-distribute.mlir' FAILED ********************
...
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
# | Stack dump:
# | 0.	Program arguments: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/mlir-opt -xegpu-subgroup-distribute -allow-unregistered-dialect -canonicalize -cse -split-input-file /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/test/Dialect/XeGPU/subgroup-distribute.mlir
# |  #0 0x0000c0af4b066df0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13
# |  #1 0x0000c0af4b060e20 llvm::sys::RunSignalHandlers() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Signals.cpp:105:18
# |  #2 0x0000c0af4b0691b4 SignalHandler(int, siginfo_t*, void*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
# |  #3 0x0000ee25a3dcb8f8 (linux-vdso.so.1+0x8f8)
# |  #4 0x0000ee25a36c7608 (/lib/aarch64-linux-gnu/libc.so.6+0x87608)
# |  #5 0x0000ee25a367cb3c raise (/lib/aarch64-linux-gnu/libc.so.6+0x3cb3c)
# |  #6 0x0000ee25a3667e00 abort (/lib/aarch64-linux-gnu/libc.so.6+0x27e00)
# |  #7 0x0000c0af4ae7e4b0 __sanitizer::Atexit(void (*)()) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp:168:10
# |  #8 0x0000c0af4ae7c354 __sanitizer::Die() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5
# |  #9 0x0000c0af4ae66a30 Unlock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:250:16
# | #10 0x0000c0af4ae66a30 ~GenericScopedLock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:386:51
# | #11 0x0000c0af4ae66a30 __hwasan::ScopedReport::~ScopedReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:54:5
# | #12 0x0000c0af4ae661b8 __hwasan::(anonymous namespace)::BaseReport::~BaseReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:477:7
# | #13 0x0000c0af4ae63f5c __hwasan::ReportTagMismatch(__sanitizer::StackTrace*, unsigned long, unsigned long, bool, bool, unsigned long*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:1094:1
# | #14 0x0000c0af4ae4f8e0 Destroy /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:532:31
# | #15 0x0000c0af4ae4f8e0 ~InternalMmapVector /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:642:56
# | #16 0x0000c0af4ae4f8e0 __hwasan::HandleTagMismatch(__hwasan::AccessInfo, unsigned long, unsigned long, void*, unsigned long*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:245:1
# | #17 0x0000c0af4ae51e8c __hwasan_tag_mismatch4 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:764:1
# | #18 0x0000c0af4ae67b30 __interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/interception/interception_linux.cpp:60:0
# | #19 0x0000c0af5641cd24 getNumResults /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:404:37
# | #20 0x0000c0af5641cd24 getOpResultImpl /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:1010:5
# | #21 0x0000c0af5641cd24 getResult /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:407:54
# | #22 0x0000c0af5641cd24 mlir::OpTrait::detail::MultiResultTraitBase<mlir::gpu::WarpExecuteOnLane0Op, mlir::OpTrait::VariadicResults>::getResult(unsigned int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/OpDefinition.h:638:62
# | #23 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:63:33
# | #24 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:105:39
# | #25 0x0000c0af56426b60 (anonymous namespace)::LoadDistribution::matchAndRewrite(mlir::gpu::WarpExecuteOnLane0Op, mlir::PatternRewriter&) const /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:991:55
...
```
2025-09-03 14:40:18 -07:00
Artem Kroviakov
5777f71bce
[MLIR][XeGPU] Scattered ops sg-to-wi distribution (#154949)
This PR adds distribution patterns for scattered load and store ops,
chunk size included.

XeGPU moves toward offsets being part of the load/store ops, so the pass
only supports this case. Manipulating a vector of offsets indirectly
through create_tdesc is complex and soon to become obsolete anyway.
This PR assumes the SIMT-adapted scatter ops verification introduced in
https://github.com/llvm/llvm-project/pull/154653. The distribution
itself can be reviewed in the meantime.
2025-09-03 11:48:55 -07:00
Chao Chen
c96e2cdd13
[mlir][XeGPU] Update utils for LayoutAttr and SliceAttr support (#154819) 2025-08-27 12:37:15 -05:00
Kazu Hirata
c06d3a7b72
[mlir] Remove unused includes (NFC) (#148769)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-14 22:19:23 -07:00
Benjamin Kramer
3287c1c176 [XeGPU] Move targetinfo constants to their own header file
This breaks the dependency from Dialect to Utils, which would be cyclic.
2025-06-26 13:53:00 +02:00
Chao Chen
36fbc6a8d2
[MLIR][XeGPU] Remove the transpose attribute from Gather/Scatter ops and Cleanup the documents (#145389) 2025-06-25 19:43:53 -05:00
Charitha Saumya
adc6228ea0
[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. (#142687)
Changes:
* Decouple layout propagation from subgroup distribution and move it to
an independent pass.
* Refine layout assignment to handle control-flow ops correctly (scf.for, scf.while).
* Refine test cases.
2025-06-20 10:43:19 -07:00