llvm-project

Author	SHA1	Message	Date
Durgadoss R	36dc6146b8	[MLIR][NVVM] Update TMA tensor prefetch Op (#153464 ) This patch updates the TMA Tensor prefetch Op to add support for im2col_w/w128 and tile_gather4 modes. This completes support for all modes available in Blackwell. * lit tests are added for all possible combinations. * The invalid tests are moved to a separate file with more coverage. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-08-22 12:51:29 +05:30
Rajat Bajpai	b08b219650	[MLIR][NVVM] Add "blocksareclusters" kernel attribute support (#154519 ) This change adds "nvvm.blocksareclusters" kernel attribute support in NVVM Dialect/MLIR.	2025-08-22 11:32:21 +05:30
Yang Bai	f1f194bf10	[mlir][vector] fix: unroll vector.from_elements in gpu pipelines (#154774 ) ### Problem PR #142944 introduced a new canonicalization pattern which caused failures in the following GPU-related integration tests: - mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir - mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir The issue occurs because the new canonicalization pattern can generate multi-dimensional `vector.from_elements` operations (rank > 1), but the GPU lowering pipelines were not equipped to handle these during the conversion to LLVM. ### Fix This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`: - `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass. - `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass. Co-authored-by: Yang Bai <yangb@nvidia.com>	2025-08-21 21:46:06 -05:00
James Newling	a64e6f4928	[MLIR][Vector] Test to accompany bug fix (#154434 ) The bug was introduced in https://github.com/llvm/llvm-project/pull/93664 and fixed in https://github.com/llvm/llvm-project/pull/152957, but there was no test. This PR adds a test that hits the assertion failure if the fix is reverted (if I change dyn_cast to cast).	2025-08-21 13:00:41 -07:00
Tim Gymnich	e20fa4f412	[mlir][AMDGPU] Add PermlaneSwapOp (#154345 ) - Add PermlaneSwapOp that lowers to `rocdl.permlane16.swap` and `rocdl.permlane32.swap` --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-08-21 18:21:43 +02:00
Guray Ozen	5c36fb3303	[MLIR][NVVM] Improve inline_ptx, add readwrite support (#154358 ) Key Features 1. Multiple SSA returns – no struct packing/unpacking required. 2. Automatic struct unpacking – values are directly usable. 3. Readable register mapping * {$rwN} → read-write * {$roN} → read-only * {$woN} → write-only 4. Full read-write support (+ modifier). 5. Simplified operand specification – avoids cryptic "=r,=r,=f,=f,f,f,0,1" constraints. 6. Predicate support: PTX `@p` predication support IR Example: ``` %wo0, %wo1 = nvvm.inline_ptx """ .reg .pred p; setp.ge.s32 p, {$r0}, {$r1}; selp.s32 {$rw0}, {$r0}, {$r1}, p; selp.s32 {$rw1}, {$r0}, {$r1}, p; selp.s32 {$w0}, {$r0}, {$r1}, p; selp.s32 {$w1}, {$r0}, {$r1}, p; """ ro(%a, %b : f32, f32) rw(%c, %d : i32, i32) -> f32, f32 ``` After lowering ``` %0 = llvm.inline_asm has_side_effects asm_dialect = att "{ .reg .pred p;\ setp.ge.s32 p, $4, $5; \ selp.s32 $0, $4, $5, p;\ selp.s32 $1, $4, $5, p;\ selp.s32 $2, $4, $5, p;\ selp.s32 $3, $4, $5, p;\ }" "=r,=r,=f,=f,f,f,0,1" %c500_i32, %c400_i32, %cst, %cst_0 : (i32, i32, f32, f32) -> !llvm.struct<(i32, i32, f32, f32)> %1 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)> %2 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)> %3 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)> %4 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)> // Unpacked result from nvvm.inline_ptx %5 = arith.addi %1, %2 : i32 // read only %6 = arith.addf %cst, %cst_0 : f32 // write only %7 = arith.addf %3, %4 : f32 ```	2025-08-21 17:42:18 +02:00
Chao Chen	68d6866428	[mlir][XeGPU] add WgToSg distribution pattern for load_matrix and store_matrix. (#154403 )	2025-08-21 10:02:45 -05:00
Renato Golin	32a5adbd42	[MLIR][Linalg] Rename convolution pass (#154400 ) Rename the pass `LinalgNamedOpConversionPass` to `SimplifyDepthwiseConvPass` to avoid conflating it with the new morphisms we are creating between the norms.	2025-08-21 15:57:16 +01:00
Guray Ozen	3d41197d68	[MLIR] Introduce RemarkEngine + pluggable remark streaming (YAML/Bitstream) (#152474 ) This PR implements structured, tooling-friendly optimization remarks with zero cost unless enabled. It implements: - `RemarkEngine` collects finalized remarks within `MLIRContext`. - `MLIRRemarkStreamerBase` abstract class streams them to a backend. - Backends: `MLIRLLVMRemarkStreamer` (bridges to llvm::remarks → YAML/Bitstream) or your own custom streamer. - Optional mirroring to DiagnosticEngine (printAsEmitRemarks + categories). - Off by default; no behavior change unless enabled. Thread-safe; ordering best-effort. ## Overview ``` Passes (reportOptimization*) │ ▼ +-------------------+ \| RemarkEngine \| collects +-------------------+ │ │ │ mirror │ stream ▼ ▼ emitRemark MLIRRemarkStreamerBase (abstract) │ ├── MLIRLLVMRemarkStreamer → llvm::remarks → YAML \| Bitstream └── CustomStreamer → your sink ``` ## Enable Remark engine and Plug LLVM's Remark streamer ``` // Enable once per MLIRContext. This uses `MLIRLLVMRemarkStreamer` mlir::remark::enableOptimizationRemarksToFile( ctx, path, llvm::remarks::Format::YAML, cats); ``` ## API to emit remark ``` // Emit from a pass remark::passed(loc, categoryVectorizer, myPassname1) << "vectorized loop"; remark::missed(loc, categoryUnroll, "MyPass") << remark::reason("not profitable at this size") // Creates structured reason arg << remark::suggest("increase unroll factor to >=4"); // Creates structured suggestion arg remark::passed(loc, categoryVectorizer, myPassname1) << "vectorized loop" << remark::metric("tripCount", 128); // Create structured metric on-the-fly ```	2025-08-21 16:02:31 +02:00
donald chen	5af7263d42	[mlir] add getViewDest method to viewLikeOpInterface (#154524 ) The viewLikeOpInterface abstracts the behavior of an operation that views one buffer as another. However, the current interface only includes a "getViewSource" method and lacks a "getViewDest" method. Previously, it was generally assumed that viewLikeOpInterface operations would have only one return value, which was the view dest. This assumption was broken by memref.extract_strided_metadata, and more operations may break these silent conventions in the future. Calling "viewLikeInterface->getResult(0)" may lead to a core dump at runtime. Therefore, we need a "getViewDest" method to standardize our behavior. This patch adds the getViewDest function to viewLikeOpInterface and modifies the usage points of viewLikeOpInterface to standardize its use.	2025-08-21 20:09:52 +08:00
Mehdi Amini	b20c291bae	[MLIR] Adopt LDBG() debug macro in PatternApplicator.cpp (NFC) (#154724 )	2025-08-21 10:32:21 +00:00
Mehdi Amini	acda808304	[MLIR] Adopt LDBG() macro in BuiltinAttributes.cpp (NFC) (#154723 )	2025-08-21 10:31:18 +00:00
Mehdi Amini	b916df3a08	[MLIR] Adopt LDBG() in Transform/IR/Utils.cpp (NFC) (#154722 )	2025-08-21 10:30:01 +00:00
Mehdi Amini	30f9428f14	[MLIR] Adopt LDBG() macro in LLVM/NVVM/Target.cpp (#154721 )	2025-08-21 10:29:37 +00:00
Mehdi Amini	db0529dca3	[MLIR] Use LDBG() macro in Dialect.cpp (NFC) (#154720 )	2025-08-21 10:28:51 +00:00
Guray Ozen	7439d22970	[MLIR][NVVM] Add nanosleep (#154697 )	2025-08-21 11:30:41 +02:00
Dominik Adamski	b69fd34e76	[Offload] Add oneInterationPerThread param to loop device RTL (#151959 ) Currently, Flang can generate no-loop kernels for all OpenMP target kernels in the program if the flags -fopenmp-assume-teams-oversubscription or -fopenmp-assume-threads-oversubscription are set. If we add an additional parameter, we can choose in the future which OpenMP kernels should be generated as no-loop kernels. This PR doesn't modify current behavior of oversubscription flags. RFC for no-loop kernels: https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517	2025-08-21 09:03:56 +02:00
Mehdi Amini	62b29d9f76	[MLIR] Adopt LDBG() debug macro in BytecodeWriter.cpp (NFC) (#154642 )	2025-08-20 22:45:39 +00:00
Mehdi Amini	908eebcb93	[MLIR] Adopt LDBG() macro in PDL ByteCodeExecutor (NFC) (#154641 )	2025-08-20 22:40:52 +00:00
Mehdi Amini	dbbd3f0d07	[MLIR] Adopt LDBG() macro in Affine/Analysis/Utils.cpp (NFC) (#154626 )	2025-08-20 21:56:03 +00:00
Mehdi Amini	d20a74e631	[MLIR] Adopt LDBG() macro in BasicPtxBuilderInterface.cpp (NFC) (#154625 )	2025-08-20 21:51:17 +00:00
Mehdi Amini	4be19e27b5	[MLIR] Adopt LDBG() debug macros in Affine LoopAnalysis.cpp (NFC) (#154621 )	2025-08-20 21:45:42 +00:00
Mehdi Amini	6445a75c98	[MLIR] Update MLIRContext to use the LDBG() style debug macro (NFC) (#154619 )	2025-08-20 21:30:11 +00:00
Mehdi Amini	ffbc8da8b5	[MLIR] Migrate LICM utils to the LDBG() macro style logging (NFC) (#154615 )	2025-08-20 21:29:50 +00:00
Mehdi Amini	780750bbf9	[MLIR] Adopt LDBG() debug macro in ConvertToLLVMPass (NFC) (#154616 )	2025-08-20 21:29:35 +00:00
Mehdi Amini	5683baea6d	[MLIR] Adopt LDBG() debug macro in bufferization (NFC) (#154614 )	2025-08-20 21:14:02 +00:00
Rolf Morel	cbfa265e98	[MLIR][LLVMIR][DLTI] Add `LLVM::TargetAttrInterface` and `#llvm.target` attr (#145899 ) Adds the `#llvm.target<triple = $TRIPLE, chip = $CHIP, features = $FEATURES>` attribute, along with a `-llvm-target-to-data-layout` pass to derive an MLIR data layout from the LLVM data layout string (using the existing `DataLayoutImporter`). The attribute implements the relevant DLTI-interfaces, to expose the `triple`, `chip` (AKA `cpu`) and `features` on `#llvm.target` and the full `DataLayoutSpecInterface`. The pass combines the generated `#dlti.dl_spec` with an existing `dl_spec` in case one is already present, e.g. a `dl_spec` which is there to specify the size of the `index` type. Adds a `TargetAttrInterface` which can be implemented by all attributes representing LLVM targets. Similar to the Draft PR https://github.com/llvm/llvm-project/pull/78073. RFC on which this PR is based: https://discourse.llvm.org/t/mandatory-data-layout-in-the-llvm-dialect/85875	2025-08-20 22:00:30 +01:00
Michał Górny	d76bb2bb89	[mlir] Fix missing `mlir-capi-global-constructors-test` on standalone build (#154576 ) Add `mlir-capi-global-constructors-test` to `MLIR_TEST_DEPENDS` when `MLIR_ENABLE_EXECUTION_ENGINE` is enabled, to ensure that it is also built during standalone builds, and therefore fix test failure due to the executable being missing. I don't understand the purpose of `LLVM_ENABLE_PIC AND TARGET ${LLVM_NATIVE_ARCH}` block, but the condition is not true in standalone builds. Fixes 7610b1372955da55e3dc4e2eb1440f0304a56ac8.	2025-08-20 19:50:07 +02:00
Akash Banerjee	d69ccded4f	[MLIR] Add cpow support in ComplexToROCDLLibraryCalls (#153183 ) This PR adds support for complex power operations (`cpow`) in the `ComplexToROCDLLibraryCalls` conversion pass, specifically targeting AMDGPU architectures. The implementation optimises complex exponentiation by using mathematical identities and special-case handling for small integer powers. - Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa` target instead of using library calls - Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using mathematical identity	2025-08-20 17:18:30 +00:00
David Tenty	63195d3d7a	[NFC][CMake] quote ${CMAKE_SYSTEM_NAME} consistently (#154537 ) A CMake change included in CMake 4.0 makes `AIX` into a variable (similar to `APPLE`, etc.) `ff03db6657` However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to `AIX` and `if` auto-expands variable names in CMake. That means you get a double expansion if you write: `if (${CMAKE_SYSTEM_NAME} MATCHES "AIX")` which becomes: `if (AIX MATCHES "AIX")` which is as if you wrote: `if (ON MATCHES "AIX")` You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}", due to policy [CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054) which is on by default in 4.0+. Most of the LLVM CMake already does this, but this PR fixes the remaining cases where we do not.	2025-08-20 12:45:41 -04:00
Matthias Springer	6a285cc8e6	[mlir][IR] Fix `Block::without_terminator` for blocks without terminator (#154498 ) Blocks without a terminator are not handled correctly by `Block::without_terminator`: the last operation is excluded, even when it is not a terminator. With this commit, only terminators are excluded. If the last operation is unregistered, it is included for safety.	2025-08-20 18:02:24 +02:00
Matthias Springer	0499d3a8cf	[mlir][Interfaces] Add `hasUnknownEffects` helper function (#154523 ) I have seen misuse of the `hasEffect` API in downstream projects: users sometimes think that `hasEffect == false` indicates that the operation does not have a certain memory effect. That's not necessarily the case. When the op does not implement the `MemoryEffectsOpInterface`, it is unknown whether it has the specified effect. "false" can also mean "maybe". This commit clarifies the semantics in the documentation. Also adds `hasUnknownEffects` and `mightHaveEffect` convenience functions. Also simplifies a few call sites.	2025-08-20 15:24:53 +00:00
Mehdi Amini	6cedf6e604	[MLIR] Add missing handling for LLVM_LIT_TOOLS_DIR in mlir lit config (NFC) (#154542 ) This is helping some windows users, here is the doc: LLVM_LIT_TOOLS_DIR:PATH The path to GnuWin32 tools for tests. Valid on Windows host. Defaults to the empty string, in which case lit will look for tools needed for tests (e.g. ``grep``, ``sort``, etc.) in your ``%PATH%``. If GnuWin32 is not in your ``%PATH%``, then you can set this variable to the GnuWin32 directory so that lit can find tools needed for tests in that directory.	2025-08-20 16:05:44 +02:00
Hank	c075fb8c37	[MLIR] Fix duplicated attribute nodes in MLIR bytecode deserialization (#151267 ) Fixes #150163 MLIR bytecode does not preserve alias definitions, so each attribute encountered during deserialization is treated as a new one. This can generate duplicate `DISubprogram` nodes during deserialization. The patch adds a `StringMap` cache that records attributes and fetches them when encountered again.	2025-08-20 13:03:26 +00:00
Luc Forget	95fbc18a70	[MLIR][Wasm] Extending Wasm binary to WasmSSA dialect importer (#154452 ) This is a cherry pick of #154053 with a fix for bad handling of endianness when loading float and double literals from the binary. --------- Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global> Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global> Co-authored-by: Luc Forget <luc.forget@woven.toyota>	2025-08-20 10:55:55 +02:00
Ian Wood	961b052e98	[mlir][tensor][NFC] Refactor common methods for bubbling extract_slice op (#153675 ) Exposes the `tensor.extract_slice` reshaping logic in `BubbleUpExpandShapeThroughExtractSlice` and `BubbleUpCollapseShapeThroughExtractSlice` through two corresponding utility functions. These compute the offsets/sizes/strides of an extract slice after either collapsing or expanding. This should also make it easier to implement the two other bubbling cases: (1) the `collapse_shape` is a consumer or (2) the `expand_shape` is a consumer. --------- Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>	2025-08-19 19:31:30 +00:00
Renato Golin	5cc8c92268	[NFC][MLIR] Document better linalg morphism (#154313 )	2025-08-19 17:51:03 +01:00
Kazu Hirata	2c4f0e7ac6	[mlir] Replace SmallSet with SmallPtrSet (NFC) (#154265 ) This patch replaces SmallSet<T , N> with SmallPtrSet<T , N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; We only have 30 instances that rely on this "redirection". Since the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types. I'm planning to remove the redirection eventually.	2025-08-19 07:11:47 -07:00
Yang Bai	b4c31dc98d	[mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements (#142944 ) ## Description This change introduces a new canonicalization pattern for the MLIR Vector dialect that optimizes chains of insertions. The optimization identifies when a vector is completely initialized through a series of vector.insert operations and replaces the entire chain with a single `vector.from_elements` operation. Please be aware that the new pattern doesn't work for poison vectors where only some elements are set, as MLIR doesn't support partial poison vectors for now. New Pattern: InsertChainFullyInitialized * Detects chains of vector.insert operations. * Validates that all insertions are at static positions, and all intermediate insertions have only one use. * Ensures the entire vector is completely initialized. * Replaces the entire chain with a single vector.from_elements operation. Refactored Helper Function * Extracted `calculateInsertPosition` from `foldDenseElementsAttrDestInsertOp` to avoid code duplication. ## Example ``` // Before: %v1 = vector.insert %c10, %v0[0] : i64 into vector<2xi64> %v2 = vector.insert %c20, %v1[1] : i64 into vector<2xi64> // After: %v2 = vector.from_elements %c10, %c20 : vector<2xi64> ``` It also works for multidimensional vectors. ``` // Before: %v1 = vector.insert %cv0, %v0[0] : vector<3xi64> into vector<2x3xi64> %v2 = vector.insert %cv1, %v1[1] : vector<3xi64> into vector<2x3xi64> // After: %0:3 = vector.to_elements %arg1 : vector<3xi64> %1:3 = vector.to_elements %arg2 : vector<3xi64> %v2 = vector.from_elements %0#0, %0#1, %0#2, %1#0, %1#1, %1#2 : vector<2x3xi64> ``` --------- Co-authored-by: Yang Bai <yangb@nvidia.com> Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>	2025-08-19 13:43:31 +01:00
Mehdi Amini	dc82b2cc70	Revert "[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer" (#154314 ) Reverts llvm/llvm-project#154053 Seems like an endianness sensitivity failing a big-endian bot.	2025-08-19 14:05:09 +02:00
Luc Forget	df57bb8c49	[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer (#154053 ) This is the continuation of #152131 This PR adds support for parsing the global initializer and function body, and support for decoding scalar numerical instructions and variable related instructions. --------- Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global> Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global> Co-authored-by: Luc Forget <luc.forget@woven.toyota>	2025-08-19 13:42:47 +02:00
Md Asghar Ahmad Shahid	c24c23d9ab	[NFC][mlir][vector] Handle potential static cast assertion. (#152957 ) In FoldArithToVectorOuterProduct pattern, static cast to vector type causes assertion when a scalar type was encountered. It seems the author meant to have a dyn_cast instead. This NFC patch handles it by using dyn_cast.	2025-08-19 09:27:20 +05:30
Jianjian Guan	1eb5b18a04	[mlir][emitc] Support dense as init value for ShapedType (#144826 )	2025-08-19 09:41:15 +08:00
Mehdi Amini	89abccc9a6	[MLIR] Update GreedyRewriter to use the LDBG() debug log mechanism (NFC) (#153961 ) Also slightly improves the LDBG() implementation.	2025-08-18 21:05:34 +00:00
Mehdi Amini	8c605bd1f4	[MLIR] Add logging to eraseUnreachableBlocks (NFC) (#153968 )	2025-08-18 21:02:53 +00:00
Mehdi Amini	dfaebe7f48	[MLIR] Fix Liveness analysis handling of unreachable code (#153973 ) This patch forces all values to be initialized by the LivenessAnalysis, even in dead blocks. The dataflow framework will skip visiting values when it already knows that a block is dynamically unreachable, so this requires specific handling. Downstream code could consider that the absence of liveness is the same as "dead". However, as the code is mutated, new values can be introduced, and a transformation like "RemoveDeadValue" must conservatively consider that the absence of liveness information means that we aren't sure whether a value is dead (it could be a newly introduced value). Fixes #153906	2025-08-18 20:50:36 +00:00
Mehdi Amini	191e7eba93	[MLIR] Stop visiting unreachable blocks in the walkAndApplyPatterns driver (#154038 ) This is similar to the fix to the greedy driver in #153957 ; except that instead of removing unreachable code, we just ignore it. Operations like: ``` %add = arith.addi %add, %add : i64 ``` are legal in unreachable code. Unfortunately many patterns would be unsafe to apply on such IR and can lead to crashes or infinite loops.	2025-08-18 20:46:59 +00:00
Charitha Saumya	9617ce4862	[vector][distribution] Bug fix in `moveRegionToNewWarpOpAndAppendReturns` (#153656 )	2025-08-18 13:26:08 -07:00
Yang Bai	4eb1a07d7d	[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering (#151175 ) This patch introduces a new unrolling-based approach for lowering multi-dimensional `vector.from_elements` operations. Implementation Details: 1. New Transform Pattern: Added `UnrollFromElements` that unrolls an N-D (N>=2) from_elements op to a (N-1)-D from_elements op along the outermost dimension. 2. Utility Functions: Added `unrollVectorOp` to reuse the unroll algo of vector.gather for vector.from_elements. 3. Integration: Added the unrolling pattern to the convert-vector-to-llvm pass as a temporary transformation. 4. Use direct LLVM dialect operations instead of intermediate vector.insert operations for efficiency in `VectorFromElementsLowering`. Example: ```mlir // unroll %v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32> => %poison_2d = ub.poison : vector<2x2xf32> %vec_1d_0 = vector.from_elements %e0, %e1 : vector<2xf32> %vec_2d_0 = vector.insert %vec_1d_0, %poison_2d [0] : vector<2xf32> into vector<2x2xf32> %vec_1d_1 = vector.from_elements %e2, %e3 : vector<2xf32> %result = vector.insert %vec_1d_1, %vec_2d_0 [1] : vector<2xf32> into vector<2x2xf32> // convert-vector-to-llvm %v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32> => %poison_2d = ub.poison : vector<2x2xf32> %poison_2d_cast = builtin.unrealized_conversion_cast %poison_2d : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>> %poison_1d_0 = llvm.mlir.poison : vector<2xf32> %c0_0 = llvm.mlir.constant(0 : i64) : i64 %vec_1d_0_0 = llvm.insertelement %e0, %poison_1d_0[%c0_0 : i64] : vector<2xf32> %c1_0 = llvm.mlir.constant(1 : i64) : i64 %vec_1d_0_1 = llvm.insertelement %e1, %vec_1d_0_0[%c1_0 : i64] : vector<2xf32> %vec_2d_0 = llvm.insertvalue %vec_1d_0_1, %poison_2d_cast[0] : !llvm.array<2 x vector<2xf32>> %poison_1d_1 = llvm.mlir.poison : vector<2xf32> %c0_1 = llvm.mlir.constant(0 : i64) : i64 %vec_1d_1_0 = llvm.insertelement %e2, %poison_1d_1[%c0_1 : i64] : vector<2xf32> 
%c1_1 = llvm.mlir.constant(1 : i64) : i64 %vec_1d_1_1 = llvm.insertelement %e3, %vec_1d_1_0[%c1_1 : i64] : vector<2xf32> %vec_2d_1 = llvm.insertvalue %vec_1d_1_1, %vec_2d_0[1] : !llvm.array<2 x vector<2xf32>> %result = builtin.unrealized_conversion_cast %vec_2d_1 : !llvm.array<2 x vector<2xf32>> to vector<2x2xf32> ``` --------- Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com> Co-authored-by: Yang Bai <yangb@nvidia.com> Co-authored-by: James Newling <james.newling@gmail.com> Co-authored-by: Diego Caballero <dieg0ca6aller0@gmail.com>	2025-08-18 10:09:12 -07:00
Nishant Patel	4a9d038acd	[MLIR][XeGPU] Distribute load_nd/store_nd/prefetch_nd with offsets from Wg to Sg (#153432 ) This PR adds a pattern to distribute the load/store/prefetch nd ops with offsets from workgroup to subgroup IR. This PR is part of the transition to move offsets from create_nd to load/store/prefetch nd ops. Create_nd PR : #152351	2025-08-18 09:45:29 -07:00


23921 Commits