llvm-project

Author	SHA1	Message	Date
Artem Kroviakov	2357408dd2	[MLIR][XeGPU] Add layout propagation for `xegpu.store_matrix` (#174952 )	2026-01-23 17:08:30 +01:00
Artem Kroviakov	e6195237a2	[MLIR][XeGPU] Add simple rank-based sg layout creation (#172867 )	2026-01-23 11:43:44 +01:00
Nishant Patel	3150b73dec	[MLIR][XeGPU] Clean up helpers in XeGPUPropagateLayout (#175857 ) In XeGPUPropagateLayout.cpp, the helper getDefaultSIMTLayoutInfo is implemented via multiple overloads that differ significantly in semantics, not just parameter types. Reusing the same function name for these semantically different behaviors makes call sites harder to read and reason about and increases the maintenance burden. This PR improves readability and maintainability of layout propagation logic.	2026-01-15 08:13:38 -08:00
lonely eagle	0394ad1bfa	[mlir][dataflow] Add new visitNonControlFlowArgumentst API to SparseBackwardDataFlowAnalysis and apply it in LivenessAnalysis/RemoveDeadValues (#169816 ) Add the visitNonControlFlowArgumentst API to SparseBackwardDataFlowAnalysis. The current SparseBackwardDataFlowAnalysis cannot access all SSA values, such as a loop's IV; visitNonControlFlowArgumentst can now be used to visit them. Apply it in LivenessAnalysis/RemoveDeadValues, which solves the issue of IV liveness in loops. https://discourse.llvm.org/t/rfc-add-visitbranchregionargument-interface-to-sparsedataflowanalysis/89061	2026-01-02 17:10:05 +08:00
Matthias Springer	0b24580a26	[mlir][Interfaces][NFC] Add `RegionBranchOpInterface` helper for forwarded values (#173981 ) Add a helper function to compute a mapping of successor operands to successor inputs. This mapping is computed in various places. Also add a helper function to gather all region branch points. This commit is in preparation of a bug fix / partial redesign of `-remove-dead-values`. This commit also removes some duplicate code in various places.	2026-01-01 11:56:05 +01:00
Jianhui Li	2b9e47749c	[MLIR][XeGPU] Refactor Layout access interface (#172125 ) This PR builds on the anchor layout mechanism introduced in https://github.com/llvm/llvm-project/pull/169267 and performs the following refactoring: 1. Introduce the getAnchorLayout() and setAnchorLayout() interface for anchor ops to get and set layout attributes. 2. Add getLocalLayout() and setLocalLayout() utility functions, and refactor workgroup/subgroup distribution patterns to use these APIs. These utilities access the layout information directly and locally, without relying on global propagation. 3. Introduce localPropagateLayoutsFromAnchor(), a utility used by subgroup distribution to unify non-anchor layout setup. This function is intended to be invoked upfront by all layout-based passes (including workgroup/subgroup distribution and unrolling) to propagate layouts from anchor ops to non-anchor ops. After this step, patterns within the pass should exclusively use getLocalLayout() / setLocalLayout(). 4. Refactor getDistributeLayoutAttr() and setDistributeLayoutAttr() to remove special-case handling. These APIs now operate in a uniform order: anchor ops first, then non-anchor ops, and finally block arguments. These APIs will be deprecated in the long run. 5. Refactor patterns in the wg/sg distribution and load optimization passes to use get/setAnchorLayout() and get/setLocalLayout(). 6. Update test cases to enforce that anchor ops must use, and only use, anchor layouts.	2025-12-17 12:04:58 -08:00
Artem Kroviakov	a6f837e9f8	[MLIR][XeGPU] Add sg layout propagation (#170879 )	2025-12-17 11:03:21 +01:00
Jianhui Li	492340aeb1	[MLIR][XeGPU] Add handling for unit-dim expansion in ShapeCast workgroup-to-subgroup distribution (#171758 ) Add special-case handling for ShapeCast when it expands unit dimensions for a succeeding broadcast op. In this scenario, distribution requires the source layout to be a slice layout, and the result layout is first normalized by setting the expanded unit dimensions to 1 before computing the distributed result shape. In all other cases, ShapeCast is distributed as usual. This PR also updates the propagation rule for vectors with expanded unit dimensions, allowing them to share the same layout as the result of a broadcast op. This enables correct layout propagation back to the source of the ShapeCast op, as that layout must ultimately be restored as the parent layout of the slice layout.	2025-12-16 13:13:11 -08:00
Jianhui Li	5236af88e5	[MLIR][XeGPU] Extend propagation and sg_to_lane distribution pass support broadcast with low rank and scalar source input (#170409 ) This PR extends XeGPU layout propagation and distribution for vector.broadcast operation. It relaxes the restriction of layout propagation to allow low-rank and scalar source input, and adds a pattern in sg-to-wi distribution to support the lowering.	2025-12-09 08:48:27 -08:00
Artem Kroviakov	ea00593dd1	[MLIR][XeGPU][Quickfix] Disable block count in propagation (#170304 ) One of the previous PRs https://github.com/llvm/llvm-project/pull/169267/ has reintroduced block count to layout propagation that was removed in https://github.com/llvm/llvm-project/pull/168504/. This PR patches the issue.	2025-12-02 09:49:06 -08:00
Jianhui Li	326a1a4bad	[MLIR][XeGPU] Add anchor_layout and update propagation to honor user-specified layouts (#169267 ) Introduce anchor layout for XeGPU anchor ops: load_nd, store_nd, prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and atomic_rmw. Anchor layout is permanent, and is guaranteed to be honored by XeGPU distribution and lowerings once specified. 1. Add anchor_layout for XeGPU anchor ops: load_nd, store_nd, prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and atomic_rmw. 2. Rename layout attributes to anchor_layout for these ops: load, store, load_matrix, store_matrix. 3. Update the layout propagation pass: only when the user doesn't specify an anchor layout does the pass compute a default layout, set it as the anchor op's permanent layout, and use it for propagation. If the user specified an anchor layout, the pass takes the user-specified anchor layout.	2025-11-26 23:02:01 -08:00
Artem Kroviakov	8ea5e20ce4	[MLIR][XeGPU] Disable block count usage in layout propagation (#168504 )	2025-11-23 10:34:57 +01:00
Artem Kroviakov	bba40ab4bd	[MLIR][XeGPU] Decouple `inst_data` and `lane_layout` in propagation (#166941 )	2025-11-10 14:14:11 +01:00
Charitha Saumya	9703bda95b	[mlir][xegpu] Add OptimizeBlockLoads pass. (#165483 ) This pass rewrites certain xegpu `CreateNd` and `LoadNd` operations that feed into `vector.transpose` to a more optimal form to improve performance. Specifically, low-precision (bitwidth < 32) `LoadNd` ops that feed into transpose ops are rewritten to i32 loads with a valid transpose layout such that later passes can use the load-with-transpose HW feature to accelerate such load ops. Update: The pass is renamed to `OptimizeBlockLoads` because later we plan to add the array length optimization into this pass as well. This will break down a larger load (like `32x32xf16`) into more DPAS-favorable array length loads (`32x16xf16` with array length = 2). Both of these optimizations require rewriting `CreateNd` and `LoadNd`, and it makes sense to have a common pass for both.	2025-11-04 13:15:32 -08:00
Dmitry Chigarev	6c563dc6a2	[mlir][XeGPU] Add optional layout attribute to LoadGather StoreScatter ops (#163414 ) As [suggested here](https://github.com/llvm/llvm-project/pull/163071#discussion_r2427229637), the PR adds an optional layout attribute for `LoadGather` and `StoreScatter` ops. For the load op the attribute describes the layout of the result (e.g. `layout_result_0`), and for the store op it describes the layout of the vector-to-store operand (e.g. `layout_operand_0`). The PR also reworks the `propagate-layout` pass to consider permanent layout attributes and back-propagate them accordingly. The helper utility function `getDistributeLayoutAttr` is reworked to return either `layout_operand/result_0` or `layout` for load/store ops (depending on which one is set). After an offline discussion, it was decided that the overall layout utilities API is confusing since it tries to mix permanent and temporary layouts. It would need to change in the future. --------- Signed-off-by: dchigarev <dmitry.chigarev@intel.com>	2025-11-04 08:19:47 -08:00
Artem Kroviakov	ec657d859c	[MLIR][XeGPU] Introduce `xegpu::uArch` usage in target-sensitive passes (#163801 )	2025-10-31 17:33:11 +01:00
Charitha Saumya	2998c74a1e	[mlir][xegpu] Add SIMT distribution support for GEMM transpose B case. (#155517 ) This PR adds the features needed for supporting the GEMM with transpose B case. Summary of changes. 1). Add distribution logic for `vector.bitcast`, `vector.transpose` and `memref.extract_aligned_pointer_as_index` cases. 2). Add layout propagation support for `vector.shape_cast`, `vector.broadcast` and `vector.bitcast` 3). Incorporate slice attribute and `DistributeLayoutAttr` interface with the core logic in layout prop.	2025-09-19 10:33:27 -07:00
Artem Kroviakov	6c6afdd8c2	[MLIR][XeGPU] Reapply attempt for "Scattered ops sg-to-wi distribution #154949 " (#156924 ) This PR is a reapply of https://github.com/llvm/llvm-project/pull/154949, which failed one of the sanitizer checks. The issue was querying the `warpOp` results in `LoadDistribution` after calling `moveRegionToNewWarpOpAndAppendReturns()`, which resulted in a use-after-free. This PR solves the issue by moving the op query before the call and is otherwise identical to the one linked above. --------- Co-authored-by: Charitha Saumya <136391709+charithaintc@users.noreply.github.com>	2025-09-04 12:04:30 -07:00
Thurston Dang	c1cc9d2c8a	Revert "[MLIR][XeGPU] Scattered ops sg-to-wi distribution" (#156761 ) Reverts llvm/llvm-project#154949 due to suspected buildbot breakage (https://lab.llvm.org/buildbot/#/builders/55/builds/16630/steps/11/logs/stdio). Previously commented on the original pull request: https://github.com/llvm/llvm-project/pull/154949#issuecomment-3250709417 ``` ****************** TEST 'MLIR :: Dialect/XeGPU/subgroup-distribute.mlir' FAILED ****************** ... # \| PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. # \| Stack dump: # \| 0. Program arguments: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/mlir-opt -xegpu-subgroup-distribute -allow-unregistered-dialect -canonicalize -cse -split-input-file /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/test/Dialect/XeGPU/subgroup-distribute.mlir # \| #0 0x0000c0af4b066df0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13 # \| #1 0x0000c0af4b060e20 llvm::sys::RunSignalHandlers() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Signals.cpp:105:18 # \| #2 0x0000c0af4b0691b4 SignalHandler(int, siginfo_t, void) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38 # \| #3 0x0000ee25a3dcb8f8 (linux-vdso.so.1+0x8f8) # \| #4 0x0000ee25a36c7608 (/lib/aarch64-linux-gnu/libc.so.6+0x87608) # \| #5 0x0000ee25a367cb3c raise (/lib/aarch64-linux-gnu/libc.so.6+0x3cb3c) # \| #6 0x0000ee25a3667e00 abort (/lib/aarch64-linux-gnu/libc.so.6+0x27e00) # \| #7 0x0000c0af4ae7e4b0 __sanitizer::Atexit(void ()()) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp:168:10 # \| #8 0x0000c0af4ae7c354 __sanitizer::Die() 
/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5 # \| #9 0x0000c0af4ae66a30 Unlock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:250:16 # \| #10 0x0000c0af4ae66a30 ~GenericScopedLock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:386:51 # \| #11 0x0000c0af4ae66a30 __hwasan::ScopedReport::~ScopedReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:54:5 # \| #12 0x0000c0af4ae661b8 __hwasan::(anonymous namespace)::BaseReport::~BaseReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:477:7 # \| #13 0x0000c0af4ae63f5c __hwasan::ReportTagMismatch(__sanitizer::StackTrace, unsigned long, unsigned long, bool, bool, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:1094:1 # \| #14 0x0000c0af4ae4f8e0 Destroy /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:532:31 # \| #15 0x0000c0af4ae4f8e0 ~InternalMmapVector /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:642:56 # \| #16 0x0000c0af4ae4f8e0 __hwasan::HandleTagMismatch(__hwasan::AccessInfo, unsigned long, unsigned long, void, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:245:1 # \| #17 0x0000c0af4ae51e8c __hwasan_tag_mismatch4 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:764:1 # \| #18 0x0000c0af4ae67b30 __interception::InterceptFunction(char const, unsigned long*, unsigned long, unsigned long) 
/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/interception/interception_linux.cpp:60:0 # \| #19 0x0000c0af5641cd24 getNumResults /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:404:37 # \| #20 0x0000c0af5641cd24 getOpResultImpl /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:1010:5 # \| #21 0x0000c0af5641cd24 getResult /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:407:54 # \| #22 0x0000c0af5641cd24 mlir::OpTrait::detail::MultiResultTraitBase<mlir::gpu::WarpExecuteOnLane0Op, mlir::OpTrait::VariadicResults>::getResult(unsigned int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/OpDefinition.h:638:62 # \| #23 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:63:33 # \| #24 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:105:39 # \| #25 0x0000c0af56426b60 (anonymous namespace)::LoadDistribution::matchAndRewrite(mlir::gpu::WarpExecuteOnLane0Op, mlir::PatternRewriter&) const /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:991:55 ... ```	2025-09-03 14:40:18 -07:00
Artem Kroviakov	5777f71bce	[MLIR][XeGPU] Scattered ops sg-to-wi distribution (#154949 ) This PR adds distribution patterns for scattered load and store ops, chunk size included. XeGPU moves toward offsets being part of the load/store ops, so the pass only supports this case. Manipulating a vector of offsets indirectly through create_tdesc is complex and soon to become obsolete anyway. This PR assumes the SIMT-adapted scatter ops verification introduced in https://github.com/llvm/llvm-project/pull/154653. The distribution itself can be reviewed in the meantime.	2025-09-03 11:48:55 -07:00
Chao Chen	c96e2cdd13	[mlir][XeGPU] Update utils for LayoutAttr and SliceAttr support (#154819 )	2025-08-27 12:37:15 -05:00
Kazu Hirata	c06d3a7b72	[mlir] Remove unused includes (NFC) (#148769 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-14 22:19:23 -07:00
Benjamin Kramer	3287c1c176	[XeGPU] Move targetinfo constants to their own header file This breaks the dependency from Dialect to Utils, which would be cyclic.	2025-06-26 13:53:00 +02:00
Chao Chen	36fbc6a8d2	[MLIR][XeGPU] Remove the transpose attribute from Gather/Scatter ops and Cleanup the documents (#145389 )	2025-06-25 19:43:53 -05:00
Charitha Saumya	adc6228ea0	[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. (#142687 ) Changes: * Decouple layout propagation from subgroup distribution and move it to an independent pass. * Refine layout assignment to handle control-flow ops correctly (scf.for, scf.while). * Refine test cases.	2025-06-20 10:43:19 -07:00

25 Commits