llvm-project

Author	SHA1	Message	Date
Jianhui Li	470c5ca81c	[MLIR][XeGPU] Fix insert_strided_slice op in subgroup distribution (#180604 ) The PR modifies the subgroup distribution pass to sink an insert_strided_slice operation only if it becomes the last op before the yield. This avoids sinking insert_strided_slice multiple times, which could cause issues in the worst case.	2026-02-09 13:29:46 -08:00
Jianhui Li	61b8a57839	[MLIR][XeGPU] Refactor layout propagation utilities (#179016 ) This PR refactors layout propagation into two distinct components: result/anchor layout setup and source layout inference from the result. For operations that require a specific result layout due to semantic or hardware constraints, the propagation logic explicitly sets up the result or anchor layout. Otherwise, it infers the source layout from the backward-propagated consumer layout. The result or anchor layout may differ from the backward-propagated consumer layout; any such discrepancies are resolved via the existing layout-conflict mechanism. This PR introduces the following utility functions: Source layout inference: > inferBroadcastSourceLayout() > inferMultiReductionSourceLayout() > inferBitCastSourceLayout() > inferShapeCastSourceLayout() > inferInsertStridedSliceSourceLayout() Result / anchor layout setup: > setupMultiReductionResultLayout() > setupBitCastResultLayout() > setupInsertStridedSliceResultLayout() > setupLoadMatrixAnchorLayout() > setupStoreMatrixAnchorLayout() > setupLoadGatherAnchorLayout() > setupStoreScatterAnchorLayout() Part of subgroup distribution related code changes are separated and created as PR https://github.com/llvm/llvm-project/pull/179018/changes.	2026-02-05 19:26:25 -08:00
Jianhui Li	983d8663b0	[MLIR] [XeGPU] SG distribution: adding tests for alloca/create_memdesc and remove unnecessary check from shape_cast op lowering (#179018 ) This PR adds subgroup distribution tests for memref.alloca and xegpu.create_memdesc ops. It also removes the slice layout requirement for shape_cast.	2026-02-03 09:17:25 -08:00
Charitha Saumya	d9c65f94b1	[mlir][xegpu] Add `XeGPUSgToWiDistributeExperimental` pass. (#177492 ) Currently the XeGPU lowering pipeline uses the `XeGPUSubgroupDistribute` pass for subgroup to work-item distribution of ops. This pass is well established and relies on vector distribution's `WarpOp` based distribution mechanism. However, recent experiments with larger kernels have shown that this pass is very expensive in terms of compile time (see below). This prompted us to create a new pass that does not rely on `WarpOp` based distribution. This PR adds the initial infra to move away from the old way and align Wg To WI distribution with Wg to Sg distribution. The new pass also uses context-aware type conversion based on XeGPU layouts to distribute vector types from SG to WI. This PR adds the following changes: * SG to WI distribution pass based on context-aware type conversions using `OpConversionPatterns` * Test pass for testing individual patterns (`TestXeGPUSgToWiDistributeExperimental`) * `XeGPUSgToWiDistributeExperimentalPass` which will eventually replace `XeGPUSubgroupDistribute` Flash attention e2e compilation stats: ``` ----Wall Time---- ----Name---- 0.0032 ( 0.2%) Parser 0.0008 ( 0.0%) CSE 0.0000 ( 0.0%) (A) DominanceInfo 0.0002 ( 0.0%) GpuXeVMAttachTarget 1.1427 ( 58.7%) 'gpu.module' Pipeline 0.0019 ( 0.1%) XeGPUWgToSgDistribute 0.0003 ( 0.0%) CSE 0.0000 ( 0.0%) (A) DominanceInfo 0.0002 ( 0.0%) LowerAffinePass 0.0001 ( 0.0%) CSE 0.0000 ( 0.0%) (A) DominanceInfo 0.0008 ( 0.0%) XeGPUPropagateLayout 0.0056 ( 0.3%) XeGPUBlocking 0.0010 ( 0.1%) Canonicalizer 0.0004 ( 0.0%) CSE 0.0000 ( 0.0%) (A) DominanceInfo 0.0015 ( 0.1%) XeGPUPropagateLayout 0.0007 ( 0.0%) XeGPUOptimizeBlockLoads 0.0010 ( 0.0%) Canonicalizer 0.0004 ( 0.0%) CSE 0.0000 ( 0.0%) (A) DominanceInfo 0.0015 ( 0.1%) XeGPUPropagateLayout 1.1274 ( 57.9%) XeGPUSubgroupDistribute 0.7959 ( 40.9%) Output 0.0022 ( 0.1%) Rest 1.9461 (100.0%) Total ```	2026-01-29 09:57:01 -08:00
Jakub Kuderski	9aaf0b89f5	[mlir] Apply clang-tidy check llvm-use-vector-utils. NFC. (#178526 )	2026-01-29 02:19:00 +00:00
Artem Kroviakov	0926743e2e	[MLIR][XeGPU] Add uniform values distribution pattern (#176737 )	2026-01-26 21:23:31 +01:00
Artem Kroviakov	0b7d14e9a8	[MLIR][XeGPU] Add 2D `vector.multi_reduction` optimization (#171154 )	2026-01-14 12:58:30 +01:00
Jianhui Li	2b9e47749c	[MLIR][XeGPU] Refactor Layout access interface (#172125 ) This PR builds on the anchor layout mechanism introduced in https://github.com/llvm/llvm-project/pull/169267 and performs the following refactoring: 1. Introduce getAnchorLayout() and setAnchorLayout() interface for anchor ops to get and set layout attributes. 2. Add getLocalLayout() and setLocalLayout() utility functions, and refactor workgroup/subgroup distribution patterns to use these APIs. These utilities access the layout information directly and locally, without relying on global propagation. 3. Introduce localPropagateLayoutsFromAnchor(), a utility used by subgroup distribution to unify non-anchor layout setup. This function is intended to be invoked upfront by all layout-based passes (including workgroup/subgroup distribution and unrolling) to propagate layouts from anchor ops to non-anchor ops. After this step, patterns within the pass should exclusively use getLocalLayout() / setLocalLayout(). 4. Refactor getDistributeLayoutAttr() and setDistributeLayoutAttr() to remove special-case handling. These APIs now operate in a uniform order: anchor ops first, then non-anchor ops, and finally block arguments. These APIs will be deprecated in the long run. 5. Refactor patterns in wg/sg distribution, load optimization passes to use get/setAnchorLayout() and get/setLocalLayout(). 6. Update test cases to enforce that anchor ops must use, and only use, anchor layouts.	2025-12-17 12:04:58 -08:00
Jianhui Li	492340aeb1	[MLIR][XeGPU] Add handling for unit-dim expansion in ShapeCast workgroup-to-subgroup distribution (#171758 ) Add special-case handling for ShapeCast when it expands unit dimensions for a succeeding broadcast op. In this scenario, distribution requires the source layout to be a slice layout, and the result layout is first normalized by setting the expanded unit dimensions to 1 before computing the distributed result shape. In all other cases, ShapeCast is distributed as usual. This PR also updates the propagation rule for vectors with expanded unit dimensions, allowing them to share the same layout as the result of a broadcast op. This enables correct layout propagation back to the source of the ShapeCast op, as that layout must ultimately be restored as the parent layout of the slice layout.	2025-12-16 13:13:11 -08:00
Charitha Saumya	3ece6626cb	[mlir][xegpu] Add support for `vector.extract_strided_slice` XeGPU SIMT distribution with partial offsets. (#171512 ) `vector.extract_strided_slice` can have two forms when specifying offsets. Case 1: ``` %1 = vector.extract_strided_slice %0 { offsets = [8, 0], sizes = [8, 16], strides = [1, 1]} : vector<24x16xf32> to vector<8x16xf32> ``` Case 2: ``` %1 = vector.extract_strided_slice %0 { offsets = [8], sizes = [8], strides = [1]} : vector<24x16xf32> to vector<8x16xf32> ``` These two ops mean the same thing, but case 2 is syntactic sugar to avoid specifying offsets for fully extracted dims. Currently case 2 fails in XeGPU SIMT distribution. This PR fixes this issue.	2025-12-10 09:53:56 -08:00
Jianhui Li	5236af88e5	[MLIR][XeGPU] Extend propagation and sg_to_lane distribution pass support broadcast with low rank and scalar source input (#170409 ) This PR extends XeGPU layout propagation and distribution for vector.broadcast operation. It relaxes the restriction of layout propagation to allow low-rank and scalar source input, and adds a pattern in sg-to-wi distribution to support the lowering.	2025-12-09 08:48:27 -08:00
Charitha Saumya	c333f7dab9	[mlir][xegpu] Add layout based SIMT distribution support for `vector.extract/insert_strided_slice` (#168626 ) This PR adds general SIMT distribution support for `vector.extract/insert_strided_slice`. Vector distribution already has support for these operations, but with restrictions that avoid requiring layouts during distribution logic. For example, `extract_strided_slice` requires that the distributed dimension is fully extracted. However, more complex cases may require extracting partially from the distributed dimension (e.g. 8x16xf16 extraction from 8x32xf16). These types of cases need the layouts to reason about how the data is spread across SIMT lanes. Currently, we don't have layout access in vector distribution, so these new patterns are placed on the XeGPU side. They have a higher pattern benefit so that they will be tried first before trying regular vector distribution based patterns.	2025-11-26 10:10:36 -06:00
Kazu Hirata	67391fc039	[mlir] Construct SmallVector with initial values (NFC) (#169239 ) Identified with llvm-use-ranges.	2025-11-23 22:32:50 -08:00
Jakub Kuderski	1fd9c02513	[mlir] Adopt cast function objects. NFC. (#168228 ) These were added in https://github.com/llvm/llvm-project/pull/165803.	2025-11-15 14:51:14 -05:00
Artem Kroviakov	68c4c83bcb	[MLIR][XeGPU] Matrix load/store subgroup distribution (#165008 )	2025-11-03 21:48:27 +01:00
Artem Kroviakov	ec657d859c	[MLIR][XeGPU] Introduce `xegpu::uArch` usage in target-sensitive passes (#163801 )	2025-10-31 17:33:11 +01:00
Charitha Saumya	1e8834ea3a	[mlir][vector][xegpu] Accept uniform values in `getDistributedType` (#163887 ) Uniform values should not be distributed during vector distribution. An example would be a reduction result where the reduction happens across lanes. However, the current `getDistributedType` does not accept a zero-result affine map (i.e. no distributed dims) when describing the distributed dimensions. This results in a null type being returned and crashes the vector distribution in some cases. An example case would be a `scf.for` op (about to be distributed) in which one of the for results is a uniform value and it does not have a user outside the warp op. This necessitates querying `getDistributedType` to figure out the distributed type of this value.	2025-10-22 08:41:41 -07:00
Jakub Kuderski	ae11c5c2c4	[mlir] Switch uses of deprecated .create methods to free function. NFC. (#164635 ) See https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.	2025-10-22 14:51:03 +00:00
Charitha Saumya	bd6da1feaa	[mlir][xegpu] Add more tests in XeGPU subgroup distribution. (#162543 ) This PR adds some tests for covering some useful corner cases. 1. more tests for `vector.shape_cast` distribution. 2. testing for `MoveFuncBodyToWarpOp` pattern that was not possible before.	2025-10-10 09:27:36 -07:00
Charitha Saumya	b86fef88c5	[mlir][xegpu] Create a test pass for subgroup distribution. (#161592 ) Current subgroup distribution tests employ the entire `xegpu-subgroup-distribute` pass, which includes multiple steps like layout propagation, moving the func body into a warp op, and distributing to work items. This makes it harder to isolate the testing for xegpu subgroup distribution logic, because certain corner cases may not be supported yet by the other steps mentioned above. This PR introduces a test pass for subgroup distribution logic and isolates the testing of the distribution logic. We plan to add more corner cases (that were not possible before) covering non-xegpu ops (like vector) in the next PRs. This PR also includes: 1. minor bug fixes in gather/scatter distribution. 2. a bug fix in vector multi reduction lowering where it fails to retain some layouts.	2025-10-03 12:35:13 -07:00
Charitha Saumya	ca61a9d960	[mlir][xegpu] Support offset arguments in LoadNd, StoreNd and PrefetchNd subgroup distribution. (#160417 ) Currently offsets are given as operands of the `CreateNd` op. Sg distribution does not support offset arguments at the consumer. This PR adds support for offsets given at the consumer (like LoadNd). With this change, the offsets of the tile must be specified at the consumer op (LoadNd, StoreNd, PrefetchNd); otherwise distribution will fail. This also removes the need for the UpdateNdOffset op, so the PR removes the support for UpdateNdOffset.	2025-09-25 11:25:17 -07:00
Charitha Saumya	2998c74a1e	[mlir][xegpu] Add SIMT distribution support for GEMM transpose B case. (#155517 ) This PR adds the features needed for supporting the GEMM with transpose B case. Summary of changes. 1). Add distribution logic for `vector.bitcast`, `vector.transpose` and `memref.extract_aligned_pointer_as_index` cases. 2). Add layout propagation support for `vector.shape_cast`, `vector.broadcast` and `vector.bitcast` 3). Incorporate slice attribute and `DistributeLayoutAttr` interface with the core logic in layout prop.	2025-09-19 10:33:27 -07:00
Charitha Saumya	9b0d7ddb04	[mlir][xegpu] Add support for `vector.multi_reduction` and `vector.shape_cast` SIMT distribution. (#157560 ) Add support for distributing the `vector.multi_reduction` operation across lanes in a warp. Currently only 2D to 1D reductions are supported. Given layouts for the source and accumulator vectors, * If the reduction dimension is distributed across lanes, the reduction is non-lane-local and the reduction is done using warp shuffles. Here we simply rewrite the `MultiDimReductionOp` to a sequence of `ReductionOp`s inside the warp op body. Actual distribution will be done by `WarpOpReduction` pattern. * If the reduction dimension is not distributed across lanes, the reduction is lane-local. In this case, we yield the source and accumulator vectors from the warp op and perform the lane-local reduction outside the warp op using a sequence of `ReductionOp`s. PR also adds support for distributing `vector.shape_cast` based on layouts.	2025-09-12 09:37:04 -07:00
Jakub Kuderski	2ed3f49c49	[mlir] Use free op create functions. NFC. (#157374 ) The builder create methods are deprecated: https://mlir.llvm.org/deprecation/. See https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.	2025-09-07 22:13:20 -04:00
Artem Kroviakov	6c6afdd8c2	[MLIR][XeGPU] Reapply attempt for "Scattered ops sg-to-wi distribution #154949 " (#156924 ) This PR is a reapply of https://github.com/llvm/llvm-project/pull/154949, which failed one of sanitizer checks. The issue was querying the `warpOp` results in `LoadDistribution` after calling `moveRegionToNewWarpOpAndAppendReturns()`, which resulted in use after free. This PR solves the issue by moving the op query before the call and is otherwise identical to the one linked above. --------- Co-authored-by: Charitha Saumya <136391709+charithaintc@users.noreply.github.com>	2025-09-04 12:04:30 -07:00
Thurston Dang	c1cc9d2c8a	Revert "[MLIR][XeGPU] Scattered ops sg-to-wi distribution" (#156761 ) Reverts llvm/llvm-project#154949 due to suspected buildbot breakage (https://lab.llvm.org/buildbot/#/builders/55/builds/16630/steps/11/logs/stdio). Previously commented on the original pull request: https://github.com/llvm/llvm-project/pull/154949#issuecomment-3250709417 ``` ****************** TEST 'MLIR :: Dialect/XeGPU/subgroup-distribute.mlir' FAILED ****************** ... # | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. # | Stack dump: # | 0. Program arguments: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/mlir-opt -xegpu-subgroup-distribute -allow-unregistered-dialect -canonicalize -cse -split-input-file /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/test/Dialect/XeGPU/subgroup-distribute.mlir # | #0 0x0000c0af4b066df0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13 # | #1 0x0000c0af4b060e20 llvm::sys::RunSignalHandlers() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Signals.cpp:105:18 # | #2 0x0000c0af4b0691b4 SignalHandler(int, siginfo_t, void) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38 # | #3 0x0000ee25a3dcb8f8 (linux-vdso.so.1+0x8f8) # | #4 0x0000ee25a36c7608 (/lib/aarch64-linux-gnu/libc.so.6+0x87608) # | #5 0x0000ee25a367cb3c raise (/lib/aarch64-linux-gnu/libc.so.6+0x3cb3c) # | #6 0x0000ee25a3667e00 abort (/lib/aarch64-linux-gnu/libc.so.6+0x27e00) # | #7 0x0000c0af4ae7e4b0 __sanitizer::Atexit(void ()()) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp:168:10 # | #8 0x0000c0af4ae7c354 __sanitizer::Die() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5 # | #9 0x0000c0af4ae66a30 Unlock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:250:16 # | #10 0x0000c0af4ae66a30 ~GenericScopedLock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:386:51 # | #11 0x0000c0af4ae66a30 __hwasan::ScopedReport::~ScopedReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:54:5 # | #12 0x0000c0af4ae661b8 __hwasan::(anonymous namespace)::BaseReport::~BaseReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:477:7 # | #13 0x0000c0af4ae63f5c __hwasan::ReportTagMismatch(__sanitizer::StackTrace, unsigned long, unsigned long, bool, bool, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:1094:1 # | #14 0x0000c0af4ae4f8e0 Destroy /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:532:31 # | #15 0x0000c0af4ae4f8e0 ~InternalMmapVector /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:642:56 # | #16 0x0000c0af4ae4f8e0 __hwasan::HandleTagMismatch(__hwasan::AccessInfo, unsigned long, unsigned long, void, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:245:1 # | #17 0x0000c0af4ae51e8c __hwasan_tag_mismatch4 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:764:1 # | #18 0x0000c0af4ae67b30 __interception::InterceptFunction(char const, unsigned long*, unsigned long, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/interception/interception_linux.cpp:60:0 # | #19 0x0000c0af5641cd24 getNumResults /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:404:37 # | #20 0x0000c0af5641cd24 getOpResultImpl /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:1010:5 # | #21 0x0000c0af5641cd24 getResult /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:407:54 # | #22 0x0000c0af5641cd24 mlir::OpTrait::detail::MultiResultTraitBase<mlir::gpu::WarpExecuteOnLane0Op, mlir::OpTrait::VariadicResults>::getResult(unsigned int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/OpDefinition.h:638:62 # | #23 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:63:33 # | #24 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:105:39 # | #25 0x0000c0af56426b60 (anonymous namespace)::LoadDistribution::matchAndRewrite(mlir::gpu::WarpExecuteOnLane0Op, mlir::PatternRewriter&) const /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:991:55 ... ```	2025-09-03 14:40:18 -07:00
Artem Kroviakov	5777f71bce	[MLIR][XeGPU] Scattered ops sg-to-wi distribution (#154949 ) This PR adds distribution patterns for scattered load and store ops, chunk size included. XeGPU moves toward offsets being part of the load/store ops, so the pass only supports this case. Manipulating a vector of offsets indirectly through create_tdesc is complex and soon to become obsolete anyway. This PR assumes the SIMT-adapted scatter ops verification introduced in https://github.com/llvm/llvm-project/pull/154653. The distribution itself can be reviewed in the meantime.	2025-09-03 11:48:55 -07:00
Chao Chen	c96e2cdd13	[mlir][XeGPU] Update utils for LayoutAttr and SliceAttr support (#154819 )	2025-08-27 12:37:15 -05:00
Adam Siemieniuk	533ddcd989	[mlir][gpu] Warp execute terminator getter (#154729 ) Adds a utility getter to `warp_execute_on_lane_0` which simplifies access to the op's terminator. Uses are refactored to utilize the new terminator getter.	2025-08-22 18:24:23 +02:00
Charitha Saumya	06884d0204	[mlir][xegpu] Bug fix in UpdateNdOffset distribution. (#150545 ) The cause is that the UpdateNdOffset source operand does not retain its layouts when it is yielded by the warp op. The `warp_execute_on_lane0` op expects the TensorDesc type to be unchanged during distribution out of its region. We use UnrealizedCasts to reconcile this mismatch outside the warpOp (via `resolveDistributedTy`).	2025-08-05 14:42:14 -07:00
Jianhui Li	90944b85c5	[MLIR][XeGPU] Add offset operands to load_nd/store_nd/prefetch_nd (#149424 ) This PR allows load_nd/store_nd/prefetch_nd to take an additional offset operand. It is based on this PR https://github.com/llvm/llvm-project/pull/148335. Now a user can create an nd_tdesc with no offset, and instead set the offset with the load_nd operation.	2025-07-23 09:00:51 -07:00
Maksim Levental	7b78796543	[mlir][NFC] update `mlir/Dialect` create APIs (25/n) (#149932 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-21 19:57:59 -04:00
Charitha Saumya	fc3781853b	[mlir][xegpu] Minor fixes in XeGPU subgroup distribution. (#147846 ) This PR addresses the following issues. 1. Add the missing attributes when creating a new GPU funcOp in `MoveFuncBodyToWarpExecuteOnLane0` pattern. 2. Bug fix in LoadNd distribution to make sure LoadOp is the last op in warpOp region before it is distributed (needed for preserving the memory op ordering during distribution). 3. Add utility for removing OpOperand or OpResult layout attributes.	2025-07-17 15:13:20 -07:00
Charitha Saumya	244ebef1dd	Reapply [mlir][vector] Refactor WarpOpScfForOp to support unused or swapped forOp results. (#148313 ) Reapply attempt for : https://github.com/llvm/llvm-project/pull/148291 Fix for the build failure reported in : https://lab.llvm.org/buildbot/#/builders/116/builds/15477 ----- This crash is caused by mismatch of distributed type returned by `getDistributedType` and intended distributed type for forOp results. Solution diff: `20c2cf6766` Example: ``` func.func @warp_scf_for_broadcasted_result(%arg0: index) -> vector<1xf32> { %c128 = arith.constant 128 : index %c1 = arith.constant 1 : index %c0 = arith.constant 0 : index %2 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<1xf32>) { %ini = "some_def"() : () -> (vector<1xf32>) %0 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini) -> (vector<1xf32>) { %1 = "some_op"(%arg4) : (vector<1xf32>) -> (vector<1xf32>) scf.yield %1 : vector<1xf32> } gpu.yield %0 : vector<1xf32> } return %2 : vector<1xf32> } ``` In this case the distributed type for forOp result is `vector<1xf32>` (result is not distributed and broadcasted to all lanes instead). However, in this case `getDistributedType` will return NULL type. Therefore, if the distributed type can be recovered from warpOp, we should always do that first before using `getDistributedType`	2025-07-14 15:41:56 -07:00
Charitha Saumya	1d33bbab57	Revert "[mlir][vector] Refactor WarpOpScfForOp to support unused or swapped forOp results." (#148291 ) Reverts llvm/llvm-project#147620 Reverting due to build failure: https://lab.llvm.org/buildbot/#/builders/116/builds/15477	2025-07-11 13:22:54 -07:00
Charitha Saumya	3092b765ba	[mlir][vector] Refactor WarpOpScfForOp to support unused or swapped forOp results. (#147620 ) The current implementation generates incorrect code or crashes in the following valid cases. 1. At least one of the for op results is not yielded by the warpOp. Example: ``` %0 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<4xf32>) { .... %3:2 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini, %arg5 = %ini1) -> (vector<128xf32>, vector<128xf32>) { %1 = ... %acc = .... scf.yield %acc, %1 : vector<128xf32>, vector<128xf32> } gpu.yield %3#0 : vector<128xf32> // %3#1 is not used but can not be removed as dead code (loop carried). } "some_use"(%0) : (vector<4xf32>) -> () return ``` 2. The enclosing warpOp yields the forOp results in a different order compared to the forOp results. Example: ``` %0:3 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<4xf32>, vector<4xf32>, vector<8xf32>) { .... %3:3 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini1, %arg5 = %ini2, %arg6 = %ini3) -> (vector<256xf32>, vector<128xf32>, vector<128xf32>) { ..... scf.yield %acc1, %acc2, %acc3 : vector<256xf32>, vector<128xf32>, vector<128xf32> } gpu.yield %3#2, %3#1, %3#0 : vector<128xf32>, vector<128xf32>, vector<256xf32> // swapped order } "some_use_1"(%0#0) : (vector<4xf32>) -> () "some_use_2"(%0#1) : (vector<4xf32>) -> () "some_use_3"(%0#2) : (vector<8xf32>) -> () ```	2025-07-11 13:08:33 -07:00
Benjamin Kramer	3287c1c176	[XeGPU] Move targetinfo constants to their own header file This breaks the dependency from Dialect to Utils, which would be cyclic.	2025-06-26 13:53:00 +02:00
Charitha Saumya	c8a9579ff9	[mlir][xegpu] Add support for distributing `gpu.barrier` (#145434 )	2025-06-24 09:28:30 -07:00
Charitha Saumya	adc6228ea0	[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. (#142687 ) Changes: * Decouple layout propagation from subgroup distribution and move it to an independent pass. * Refine layout assignment to handle control-flow ops correctly (scf.for, scf.while). * Refine test cases.	2025-06-20 10:43:19 -07:00
Jeremy Kun	b533b0ec34	Define a DataFlowSolver helper that loads sensible default analyses (#143415 ) Cf. https://discourse.llvm.org/t/mlir-dead-code-analysis/67568/10 Custom analysis passes will not work properly unless both DeadCodeAnalysis and SparseConstantPropagation are loaded to the DataFlowSolver. This is intended behavior, but surprising to many users as shown in the thread. In lieu of a longer-term fix (which I am not knowledgeable enough to implement myself, yet), this commit adds a helper function that loads these two analyses, as well as providing breadcrumbs for an explanation of the problem. The existing places in the codebase where these two analyses are loaded for the purpose of running other unrelated analyses are replaced by the use of the helper. --------- Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com> Co-authored-by: Oleksandr "Alex" Zinenko <azinenko@amd.com>	2025-06-20 08:16:52 -07:00
Chao Chen	9e2684e4cf	[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N] (#142477 ) Bring back https://github.com/llvm/llvm-project/pull/140163 with fixes	2025-06-02 21:39:30 -05:00
Chao Chen	b88dfb0b23	Revert "[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N]" (#142459 ) Reverts llvm/llvm-project#140163	2025-06-02 15:47:21 -04:00
Chao Chen	0210750d5a	[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N] (#140163 ) This PR introduces the initial implementation of a blocking pass for XeGPU programs. The pass leverages unroll patterns from both the XeGPU and Vector dialects. --------- Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>	2025-06-02 14:02:45 -05:00
Wang Qiang	cece058191	[llvm][mlir][NFC] Fix typos in comments and test descriptions (#139688 ) This patch fixes several typographical errors in comments and test files: 1. Corrected "achive" to "archive" in archive-update.test. 2. Fixed "achive" to "achieve" in a comment in XeGPUSubgroupDistribute.cpp. 3. Corrected "achived" to "achieved" in a test note in SimpleSIVNoValidityCheckFixedSize.ll. These changes are non-functional and intended to improve readability and documentation accuracy. Signed-off-by: Kane Wang <wangqiang1@kylinos.cn> Co-authored-by: Kane Wang <wangqiang1@kylinos.cn>	2025-05-13 11:03:51 +01:00
Rahul Joshi	b17f3c63de	[NFC][MLIR] Add {} for `else` when `if` body has {} (#139422 )	2025-05-12 10:29:03 -07:00
Charitha Saumya	e7dcf1b7e5	[mlir][xegpu] Add SIMT distribution patterns for UpdateNdOffset and PrefetchNd ops. (#138033 ) This PR adds support for SIMT distribution of UpdateNdOffset and PrefetchNd ops. For both these ops distribution will remove the layout attribute from the tensor descriptor type. Everything else remains unchanged. Example 1: ``` #lo0 = #xegpu.layout<wi_layout = [1, 8], wi_data = [1, 1]> gpu.warp_execute_on_lane_0(%laneid) -> () { ... xegpu.prefetch_nd %arg0 : !xegpu.tensor_desc<4x8xf32, #lo0> } ``` To ``` %r:2 = gpu.warp_execute_on_lane_0(%laneid) -> ( !xegpu.tensor_desc<4x8xf32, #lo0>) { gpu.yield %arg0: !xegpu.tensor_desc<4x8xf32, #lo0> } %1 = unrealized_conversion_cast %r#0: !xegpu.tensor_desc<4x8xf32, #lo0> -> !xegpu.tensor_desc<4x8xf32> xegpu.prefetch_nd %0 : !xegpu.tensor_desc<4x8xf32> ``` Example 2: ``` #lo0 = #xegpu.layout<wi_layout = [1, 8], wi_data = [1, 1]> %r = gpu.warp_execute_on_lane_0(%laneid) -> (!xegpu.tensor_desc<4x8xf32, #lo0>) { ... %update = xegpu.update_nd_offset %arg0, [%c32, %c16]: !xegpu.tensor_desc<4x8xf32, #lo0> gpu.yield %update } ... ``` To ``` %r:2 = gpu.warp_execute_on_lane_0(%laneid) -> (vector<4x1xf32>, !xegpu.tensor_desc<4x8xf32, #lo0>) { ... %dead = xegpu.update_nd_offset %arg0, [%c32, %c16]: !xegpu.tensor_desc<4x8xf32, #lo0> gpu.yield %dead, %arg0 gup.yield %dead, %arg0, %c32, %c16 } %0 = xegpu.unrealized_conversion_cast %r#1: !xegpu.tensor_desc<4x8xf32, #lo0> -> !xegpu.tensor_desc<4x8xf32> %1 = xegpu.update_nd_offset %0, [%c32, %c16]: !xegpu.tensor_desc<4x8xf32> ... ```	2025-05-08 13:17:38 -07:00
Charitha Saumya	7a66746226	[mlir][xegpu] Handle scalar uniform ops in SIMT distribution. (#138593 ) This PR adds support for moving scalar uniform ops (gpu index ops, constants etc.) outside the `gpu.warp_execute_on_lane0` op. These kinds of ops do not require distribution and are safe to move out of the warp op. This also avoids adding separate distribution patterns for these ops. Example: ``` %1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) { ... %block_id_x = gpu.block_id x gpu.yield %block_id_x } // use %1 ``` To: ``` %block_id_x = gpu.block_id x %1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) { ... gpu.yield %block_id_x } // use %1 ```	2025-05-08 10:35:32 -07:00
Kazu Hirata	b2e2ae8702	[mlir] Fix warnings This patch fixes: mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:901:12: error: variable 'origVecType' set but not used [-Werror,-Wunused-but-set-variable] mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:908:12: error: variable 'origTensorDescTy' set but not used [-Werror,-Wunused-but-set-variable]	2025-05-01 12:41:31 -07:00
Charitha Saumya	d30554b19e	[mlir][xegpu] SIMT distribution patterns for XeGPU CreateNdTdesc, LoadNd, StoreNd and Dpas Ops. (#135271 ) This PR adds the SIMT distribution patterns for create_nd_tdesc, load_nd, store_nd and dpas XeGPU ops.	2025-04-30 12:16:47 -07:00
Jakub Kuderski	198c5dac37	[mlir][transform] Clean up prints. NFC. (#136401 ) Use `llvm::interleaved` from #135517 to simplify printing.	2025-04-19 12:11:06 -04:00
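The two `vector.extract_strided_slice` forms described in #171512 differ only in whether the trailing, fully extracted dimensions are spelled out. The minimal Python sketch below shows that normalization under that reading. `normalize_strided_slice` is a hypothetical helper written for illustration; it is not an MLIR API, and it is not how the XeGPU pattern is implemented. It pads the short form to the long one by giving each omitted trailing dimension offset 0, its full size, and stride 1.

```python
from typing import List, Sequence, Tuple


def normalize_strided_slice(
    src_shape: Sequence[int],
    offsets: Sequence[int],
    sizes: Sequence[int],
    strides: Sequence[int],
) -> Tuple[List[int], List[int], List[int]]:
    """Expand partial offsets/sizes/strides so they cover every source dim.

    Hypothetical helper: trailing dimensions that are not mentioned are
    treated as fully extracted (offset 0, full size, stride 1).
    """
    if not (len(offsets) == len(sizes) == len(strides)):
        raise ValueError("offsets, sizes and strides must have equal length")
    if len(offsets) > len(src_shape):
        raise ValueError("more offsets than source dimensions")
    rank = len(src_shape)
    given = len(offsets)
    full_offsets = list(offsets) + [0] * (rank - given)
    full_sizes = list(sizes) + list(src_shape[given:])
    full_strides = list(strides) + [1] * (rank - given)
    return full_offsets, full_sizes, full_strides
```

For the Case 2 op on `vector<24x16xf32>`, `normalize_strided_slice([24, 16], [8], [8], [1])` returns `([8, 0], [8, 16], [1, 1])`, which are the Case 1 attributes. Normalizing up front lets later distribution logic assume one offset per dimension.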

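Several commits above (#138033, #157560, #163887) rely on how a lane layout splits a subgroup-level vector across lanes. The Python sketch below uses a deliberately simplified model. Each dimension is divided evenly by the number of lanes along it. `lane_data`/`wi_data` is ignored, and a lane layout of all ones stands in for a uniform, undistributed value. Under this model, a 2D reduction is lane-local when its reduction dimension is not split across lanes. Both function names are hypothetical, and the real XeGPU distribution may produce different (for example, flattened) types.

```python
from typing import List, Sequence


def distributed_shape(shape: Sequence[int], lane_layout: Sequence[int]) -> List[int]:
    """Per-lane shape under a simplified lane-layout model (hypothetical helper).

    Each dimension is split evenly across lane_layout[i] lanes. A layout of
    all ones models a uniform value that every lane holds in full.
    """
    if len(shape) != len(lane_layout):
        raise ValueError("shape and lane_layout must have the same rank")
    result = []
    for dim, lanes in zip(shape, lane_layout):
        # Assumption: only even splits are modeled; lane_data is ignored.
        if lanes <= 0 or dim % lanes != 0:
            raise ValueError(f"dimension {dim} is not divisible by {lanes} lanes")
        result.append(dim // lanes)
    return result


def is_lane_local_reduction(lane_layout: Sequence[int], reduction_dim: int) -> bool:
    """True when the reduced dimension is not split across lanes (hypothetical helper)."""
    return lane_layout[reduction_dim] == 1
```

With the `wi_layout = [1, 8]` layout from #138033, `distributed_shape([4, 8], [1, 8])` gives `[4, 1]`. For a 16x32 source with lane layout `[1, 16]`, reducing dimension 1 is not lane-local and needs cross-lane shuffles, while reducing dimension 0 can be done within each lane.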

54 Commits