23921 Commits

Author SHA1 Message Date
Durgadoss R
36dc6146b8
[MLIR][NVVM] Update TMA tensor prefetch Op (#153464)
This patch updates the TMA Tensor prefetch Op
to add support for im2col_w/w128 and tile_gather4 modes.
This completes support for all modes available in Blackwell.
* lit tests are added for all possible combinations.
* The invalid tests are moved to a separate file with more coverage.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-08-22 12:51:29 +05:30
Rajat Bajpai
b08b219650
[MLIR][NVVM] Add "blocksareclusters" kernel attribute support (#154519)
This change adds "nvvm.blocksareclusters" kernel attribute support in NVVM Dialect/MLIR.
2025-08-22 11:32:21 +05:30
Yang Bai
f1f194bf10
[mlir][vector] fix: unroll vector.from_elements in gpu pipelines (#154774)
### Problem

PR #142944 introduced a new canonicalization pattern which caused
failures in the following GPU-related integration tests:

-
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
-
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir

The issue occurs because the new canonicalization pattern can generate
multi-dimensional `vector.from_elements` operations (rank > 1), but the
GPU lowering pipelines were not equipped to handle these during the
conversion to LLVM.

### Fix

This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the
GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`:

- `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass.
- `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass.

Co-authored-by: Yang Bai <yangb@nvidia.com>
2025-08-21 21:46:06 -05:00
James Newling
a64e6f4928
[MLIR][Vector] Test to accompany bug fix (#154434)
Bug introduced in 
https://github.com/llvm/llvm-project/pull/93664

The bug was fixed in
https://github.com/llvm/llvm-project/pull/152957

But there was no test. This PR adds a test that hits the assertion
failure if the fix is reverted (if I change dyn_cast to cast).
2025-08-21 13:00:41 -07:00
Tim Gymnich
e20fa4f412
[mlir][AMDGPU] Add PermlaneSwapOp (#154345)
- Add PermlaneSwapOp that lowers to `rocdl.permlane16.swap` and
`rocdl.permlane32.swap`

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2025-08-21 18:21:43 +02:00
Guray Ozen
5c36fb3303
[MLIR][NVVM] Improve inline_ptx, add readwrite support (#154358)
Key Features
1. Multiple SSA returns – no struct packing/unpacking required.
2. Automatic struct unpacking – values are directly usable.
3. Readable register mapping
    * {$rwN} → read-write
    * {$roN} → read-only
    * {$woN} → write-only
4. Full read-write support (+ modifier).
5. Simplified operand specification – avoids cryptic
"=r,=r,=f,=f,f,f,0,1" constraints.
6. Predicate support: PTX `@p` predication support

IR Example:
```
%wo0, %wo1 = nvvm.inline_ptx """
 .reg .pred p;
 setp.ge.s32 p,   {$r0}, {$r1};
 selp.s32 {$rw0}, {$r0}, {$r1}, p;
 selp.s32 {$rw1}, {$r0}, {$r1}, p;
 selp.s32 {$w0},  {$r0}, {$r1}, p;
 selp.s32 {$w1},  {$r0}, {$r1}, p;
""" ro(%a, %b : f32, f32) rw(%c, %d : i32, i32) -> f32, f32
```

After lowering
```
 %0 = llvm.inline_asm has_side_effects asm_dialect = att
 "{
                              .reg .pred p;\
                              setp.ge.s32 p, $4, $5;   \
                              selp.s32   $0, $4, $5, p;\
                              selp.s32   $1, $4, $5, p;\
                              selp.s32   $2, $4, $5, p;\
                              selp.s32   $3, $4, $5, p;\
   }"
   "=r,=r,=f,=f,f,f,0,1"
   %c500_i32, %c400_i32, %cst, %cst_0
   : (i32, i32, f32, f32)
   -> !llvm.struct<(i32, i32, f32, f32)>

 %1 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)>
 %2 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)>
 %3 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)>
 %4 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)>

 // Unpacked result from nvvm.inline_ptx
 %5 = arith.addi %1, %2 : i32
 // read only
 %6 = arith.addf %cst, %cst_0 : f32
 // write only
 %7 = arith.addf %3, %4 : f32
```
2025-08-21 17:42:18 +02:00
Chao Chen
68d6866428
[mlir][XeGPU] add WgToSg distribution pattern for load_matrix and store_matrix. (#154403) 2025-08-21 10:02:45 -05:00
Renato Golin
32a5adbd42
[MLIR][Linalg] Rename convolution pass (#154400)
Rename the pass `LinalgNamedOpConversionPass` to
`SimplifyDepthwiseConvPass` to avoid conflating it with the new
morphisms we are creating between the norms.
2025-08-21 15:57:16 +01:00
Guray Ozen
3d41197d68
[MLIR] Introduce RemarkEngine + pluggable remark streaming (YAML/Bitstream) (#152474)
This PR implements structured, tooling-friendly optimization remarks
with zero cost unless enabled. It implements:
- `RemarkEngine` collects finalized remarks within `MLIRContext`.
- `MLIRRemarkStreamerBase` abstract class streams them to a backend.
- Backends: `MLIRLLVMRemarkStreamer` (bridges to llvm::remarks →
YAML/Bitstream) or your own custom streamer.
- Optional mirroring to DiagnosticEngine (printAsEmitRemarks +
categories).
- Off by default; no behavior change unless enabled. Thread-safe;
ordering best-effort.


## Overview

```
Passes (reportOptimization*)
         │
         ▼
+-------------------+
|  RemarkEngine     |   collects
+-------------------+
     │         │
     │ mirror  │ stream
     ▼         ▼
emitRemark    MLIRRemarkStreamerBase (abstract)
                   │
                   ├── MLIRLLVMRemarkStreamer → llvm::remarks → YAML | Bitstream
                   └── CustomStreamer → your sink
```

## Enable Remark engine and Plug LLVM's Remark streamer
```
// Enable once per MLIRContext. This uses `MLIRLLVMRemarkStreamer`
mlir::remark::enableOptimizationRemarksToFile(
    ctx, path, llvm::remarks::Format::YAML, cats);
```

## API to emit remark
```
// Emit from a pass
 remark::passed(loc, categoryVectorizer, myPassname1)
        << "vectorized loop";

remark::missed(loc, categoryUnroll, "MyPass")
        << remark::reason("not profitable at this size")   // Creates structured reason arg
        << remark::suggest("increase unroll factor to >=4");   // Creates structured suggestion arg

remark::passed(loc, categoryVectorizer, myPassname1)
        << "vectorized loop" 
        << remark::metric("tripCount", 128);                // Create structured metric on-the-fly
```
2025-08-21 16:02:31 +02:00
donald chen
5af7263d42
[mlir] add getViewDest method to viewLikeOpInterface (#154524)
The viewLikeOpInterface abstracts the behavior of an operation view one
buffer as another. However, the current interface only includes a
"getViewSource" method and lacks a "getViewDest" method.

Previously, it was generally assumed that viewLikeOpInterface operations
would have only one return value, which was the view dest. This
assumption was broken by memref.extract_strided_metadata, and more
operations may break these silent conventions in the future. Calling
"viewLikeInterface->getResult(0)" may lead to a core dump at runtime.
Therefore, we need 'getViewDest' method to standardize our behavior.

This patch adds the getViewDest function to viewLikeOpInterface and
modifies the usage points of viewLikeOpInterface to standardize its use.
2025-08-21 20:09:52 +08:00
Mehdi Amini
b20c291bae
[MLIR] Adopt LDBG() debug macro in PatternApplicator.cpp (NFC) (#154724) 2025-08-21 10:32:21 +00:00
Mehdi Amini
acda808304
[MLIR] Adopt LDBG() macro in BuiltinAttributes.cpp (NFC) (#154723) 2025-08-21 10:31:18 +00:00
Mehdi Amini
b916df3a08
[MLIR] Adopt LDBG() in Transform/IR/Utils.cpp (NFC) (#154722) 2025-08-21 10:30:01 +00:00
Mehdi Amini
30f9428f14
[MLIR] Adopt LDBG() macro in LLVM/NVVM/Target.cpp (#154721) 2025-08-21 10:29:37 +00:00
Mehdi Amini
db0529dca3
[MLIR] Use LDBG() macro in Dialect.cpp (NFC) (#154720) 2025-08-21 10:28:51 +00:00
Guray Ozen
7439d22970
[MLIR][NVVM] Add nanosleep (#154697) 2025-08-21 11:30:41 +02:00
Dominik Adamski
b69fd34e76
[Offload] Add oneInterationPerThread param to loop device RTL (#151959)
Currently, Flang can generate no-loop kernels for all OpenMP target
kernels in the program if the flags
-fopenmp-assume-teams-oversubscription or
-fopenmp-assume-threads-oversubscription are set.
If we add an additional parameter, we can choose
in the future which OpenMP kernels should be generated as no-loop
kernels.

This PR doesn't modify current behavior of oversubscription flags.

RFC for no-loop kernels:
https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517
2025-08-21 09:03:56 +02:00
Mehdi Amini
62b29d9f76
[MLIR] Adopt LDBG() debug macro in BytecodeWriter.cpp (NFC) (#154642) 2025-08-20 22:45:39 +00:00
Mehdi Amini
908eebcb93
[MLIR] Adopt LDBG() macro in PDL ByteCodeExecutor (NFC) (#154641) 2025-08-20 22:40:52 +00:00
Mehdi Amini
dbbd3f0d07
[MLIR] Adopt LDBG() macro in Affine/Analysis/Utils.cpp (NFC) (#154626) 2025-08-20 21:56:03 +00:00
Mehdi Amini
d20a74e631
[MLIR] Adopt LDBG() macro in BasicPtxBuilderInterface.cpp (NFC) (#154625) 2025-08-20 21:51:17 +00:00
Mehdi Amini
4be19e27b5
[MLIR] Adopt LDBG() debug macros in Affine LoopAnalysis.cpp (NFC) (#154621) 2025-08-20 21:45:42 +00:00
Mehdi Amini
6445a75c98
[MLIR] Update MLIRContext to use the LDBG() style debug macro (NFC) (#154619) 2025-08-20 21:30:11 +00:00
Mehdi Amini
ffbc8da8b5
[MLIR] Migrate LICM utils to the LDBG() macro style logging (NFC) (#154615) 2025-08-20 21:29:50 +00:00
Mehdi Amini
780750bbf9
[MLIR] Adopt LDBG() debug macro in ConvertToLLVMPass (NFC) (#154616) 2025-08-20 21:29:35 +00:00
Mehdi Amini
5683baea6d
[MLIR] Adopt LDBG() debug macro in bufferization (NFC) (#154614) 2025-08-20 21:14:02 +00:00
Rolf Morel
cbfa265e98
[MLIR][LLVMIR][DLTI] Add LLVM::TargetAttrInterface and #llvm.target attr (#145899)
Adds the `#llvm.target<triple = $TRIPLE, chip = $CHIP, features =
$FEATURES>` attribute and along with a `-llvm-target-to-data-layout`
pass to derive a MLIR data layout from the LLVM data layout string
(using the existing `DataLayoutImporter`). The attribute implements the
relevant DLTI-interfaces, to expose the `triple`, `chip` (AKA `cpu`) and
`features` on `#llvm.target` and the full `DataLayoutSpecInterface`. The
pass combines the generated `#dlti.dl_spec` with an existing `dl_spec`
in case one is already present, e.g. a `dl_spec` which is there to
specify size of the `index` type.

Adds a `TargetAttrInterface` which can be implemented by all attributes
representing LLVM targets.

Similar to the Draft PR https://github.com/llvm/llvm-project/pull/78073.

RFC on which this PR is based:
https://discourse.llvm.org/t/mandatory-data-layout-in-the-llvm-dialect/85875
2025-08-20 22:00:30 +01:00
Michał Górny
d76bb2bb89
[mlir] Fix missing mlir-capi-global-constructors-test on standalone build (#154576)
Add `mlir-capi-global-constructors-test` to `MLIR_TEST_DEPENDS` when
`MLIR_ENABLE_EXECUTION_ENGINE` is enabled, to ensure that it is also
built during standalone builds, and therefore fix test failure due to
the executable being missing.

I don't understand the purpose of `LLVM_ENABLE_PIC AND TARGET
${LLVM_NATIVE_ARCH}` block, but the condition is not true in standalone
builds.

Fixes 7610b1372955da55e3dc4e2eb1440f0304a56ac8.
2025-08-20 19:50:07 +02:00
Akash Banerjee
d69ccded4f
[MLIR] Add cpow support in ComplexToROCDLLibraryCalls (#153183)
This PR adds support for complex power operations (`cpow`) in the
`ComplexToROCDLLibraryCalls` conversion pass, specifically targeting
AMDGPU architectures. The implementation optimises complex
exponentiation by using mathematical identities and special-case
handling for small integer powers.

- Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa`
target instead of using library calls
- Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using
mathematical identity
2025-08-20 17:18:30 +00:00
David Tenty
63195d3d7a
[NFC][CMake] quote ${CMAKE_SYSTEM_NAME} consistently (#154537)
A CMake change included in CMake 4.0 makes `AIX` into a variable
(similar to `APPLE`, etc.)
ff03db6657

However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to
`AIX` and `if` auto-expands variable names in CMake. That means you get
a double expansion if you write:

`if (${CMAKE_SYSTEM_NAME}  MATCHES "AIX")`
which becomes:
`if (AIX  MATCHES "AIX")`
which is as if you wrote:
`if (ON MATCHES "AIX")`

You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}",
due to policy
[CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054)
which is on by default in 4.0+. Most of the LLVM CMake already does
this, but this PR fixes the remaining cases where we do not.
2025-08-20 12:45:41 -04:00
Matthias Springer
6a285cc8e6
[mlir][IR] Fix Block::without_terminator for blocks without terminator (#154498)
Blocks without a terminator are not handled correctly by
`Block::without_terminator`: the last operation is excluded, even when
it is not a terminator. With this commit, only terminators are excluded.
If the last operation is unregistered, it is included for safety.
2025-08-20 18:02:24 +02:00
Matthias Springer
0499d3a8cf
[mlir][Interfaces] Add hasUnknownEffects helper function (#154523)
I have seen misuse of the `hasEffect` API in downstream projects: users
sometimes think that `hasEffect == false` indicates that the operation
does not have a certain memory effect. That's not necessarily the case.
When the op does not implement the `MemoryEffectsOpInterface`, it is
unknown whether it has the specified effect. "false" can also mean
"maybe".

This commit clarifies the semantics in the documentation. Also adds
`hasUnknownEffects` and `mightHaveEffect` convenience functions. Also
simplifies a few call sites.
2025-08-20 15:24:53 +00:00
Mehdi Amini
6cedf6e604
[MLIR] Add missing handling for LLVM_LIT_TOOLS_DIR in mlir lit config (NFC) (#154542)
This is helping some windows users, here is the doc:

**LLVM_LIT_TOOLS_DIR**:PATH
The path to GnuWin32 tools for tests. Valid on Windows host. Defaults to
the empty string, in which case lit will look for tools needed for tests
(e.g. ``grep``, ``sort``, etc.) in your ``%PATH%``. If GnuWin32 is not
in your
``%PATH%``, then you can set this variable to the GnuWin32 directory so
that
  lit can find tools needed for tests in that directory.
2025-08-20 16:05:44 +02:00
Hank
c075fb8c37
[MLIR] Fix duplicated attribute nodes in MLIR bytecode deserialization (#151267)
Fixes #150163 

MLIR bytecode does not preserve alias definitions, so each attribute
encountered during deserialization is treated as a new one. This can
generate duplicate `DISubprogram` nodes during deserialization.

The patch adds a `StringMap` cache that records attributes and fetches
them when encountered again.
2025-08-20 13:03:26 +00:00
Luc Forget
95fbc18a70
[MLIR][Wasm] Extending Wasm binary to WasmSSA dialect importer (#154452)
This is a cherry pick of #154053 with a fix for bad handling of
endianess when loading float and double litteral from the binary.

---------

Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
2025-08-20 10:55:55 +02:00
Ian Wood
961b052e98
[mlir][tensor][NFC] Refactor common methods for bubbling extract_slice op (#153675)
Exposes the `tensor.extract_slice` reshaping logic in
`BubbleUpExpandShapeThroughExtractSlice` and
`BubbleUpCollapseShapeThroughExtractSlice` through two corresponding
utility functions. These compute the offsets/sizes/strides of an extract
slice after either collapsing or expanding.

This should also make it easier to implement the two other bubbling
cases: (1) the `collapse_shape` is a consumer or (2) the `expand_shape`
is a consumer.

---------

Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>
2025-08-19 19:31:30 +00:00
Renato Golin
5cc8c92268
[NFC][MLIR] Document better linalg morphism (#154313) 2025-08-19 17:51:03 +01:00
Kazu Hirata
2c4f0e7ac6
[mlir] Replace SmallSet with SmallPtrSet (NFC) (#154265)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 30 instances that rely on this "redirection".  Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.

I'm planning to remove the redirection eventually.
2025-08-19 07:11:47 -07:00
Yang Bai
b4c31dc98d
[mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements (#142944)
## Description

This change introduces a new canonicalization pattern for the MLIR
Vector dialect that optimizes chains of insertions. The optimization
identifies when a vector is **completely** initialized through a series
of vector.insert operations and replaces the entire chain with a
single `vector.from_elements` operation.

Please be aware that the new pattern **doesn't** work for poison vectors
where only **some** elements are set, as MLIR doesn't support partial
poison vectors for now.

**New Pattern: InsertChainFullyInitialized**

* Detects chains of vector.insert operations.
* Validates that all insertions are at static positions, and all
intermediate insertions have only one use.
* Ensures the entire vector is **completely** initialized.
* Replaces the entire chain with a
single vector.from_elementts operation.

**Refactored Helper Function**

* Extracted `calculateInsertPosition` from
`foldDenseElementsAttrDestInsertOp` to avoid code duplication.

## Example

```
// Before:
%v1 = vector.insert %c10, %v0[0] : i64 into vector<2xi64>
%v2 = vector.insert %c20, %v1[1] : i64 into vector<2xi64>

// After:
%v2 = vector.from_elements %c10, %c20 : vector<2xi64>
```

It also works for multidimensional vectors.

```
// Before:
%v1 = vector.insert %cv0, %v0[0] : vector<3xi64> into vector<2x3xi64>
%v2 = vector.insert %cv1, %v1[1] : vector<3xi64> into vector<2x3xi64>

// After:
%0:3 = vector.to_elements %arg1 : vector<3xi64>
%1:3 = vector.to_elements %arg2 : vector<3xi64>
%v2 = vector.from_elements %0#0, %0#1, %0#2, %1#0, %1#1, %1#2 : vector<2x3xi64>
```

---------

Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
2025-08-19 13:43:31 +01:00
Mehdi Amini
dc82b2cc70
Revert "[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer" (#154314)
Reverts llvm/llvm-project#154053

Seems like an endianness sensitivity failing a big-endian bot.
2025-08-19 14:05:09 +02:00
Luc Forget
df57bb8c49
[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer (#154053)
This is the continuation of  #152131 

This PR adds support for parsing the global initializer and function
body, and support for decoding scalar numerical instructions and
variable related instructions.

---------

Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
2025-08-19 13:42:47 +02:00
Md Asghar Ahmad Shahid
c24c23d9ab
[NFC][mlir][vector] Handle potential static cast assertion. (#152957)
In FoldArithToVectorOuterProduct pattern, static cast to vector type
causes assertion when a scalar type was encountered. It seems the author
meant to have a dyn_cast instead.

This NFC patch handles it by using dyn_cast.
2025-08-19 09:27:20 +05:30
Jianjian Guan
1eb5b18a04
[mlir][emitc] Support dense as init value for ShapedType (#144826) 2025-08-19 09:41:15 +08:00
Mehdi Amini
89abccc9a6
[MLIR] Update GreedyRewriter to use the LDBG() debug log mechanism (NFC) (#153961)
Also improve a bit the LDBG() implementation
2025-08-18 21:05:34 +00:00
Mehdi Amini
8c605bd1f4
[MLIR] Add logging to eraseUnreachableBlocks (NFC) (#153968) 2025-08-18 21:02:53 +00:00
Mehdi Amini
dfaebe7f48
[MLIR] Fix Liveness analysis handling of unreachable code (#153973)
This patch is forcing all values to be initialized by the
LivenessAnalysis, even in dead blocks. The dataflow framework will skip
visiting values when its already knows that a block is dynamically
unreachable, so this requires specific handling.
Downstream code could consider that the absence of liveness is the same
a "dead".
However as the code is mutated, new value can be introduced, and a
transformation like "RemoveDeadValue" must conservatively consider that
the absence of liveness information meant that we weren't sure if a
value was dead (it could be a newly introduced value.

Fixes #153906
2025-08-18 20:50:36 +00:00
Mehdi Amini
191e7eba93
[MLIR] Stop visiting unreachable blocks in the walkAndApplyPatterns driver (#154038)
This is similar to the fix to the greedy driver in #153957 ; except that
instead of removing unreachable code, we just ignore it.

Operations like:

```
%add = arith.addi %add, %add : i64
```

are legal in unreachable code.
Unfortunately many patterns would be unsafe to apply on such IR and can
lead to crashes or infinite loops.
2025-08-18 20:46:59 +00:00
Charitha Saumya
9617ce4862
[vector][distribution] Bug fix in moveRegionToNewWarpOpAndAppendReturns (#153656) 2025-08-18 13:26:08 -07:00
Yang Bai
4eb1a07d7d
[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering (#151175)
This patch introduces a new unrolling-based approach for lowering
multi-dimensional `vector.from_elements` operations.

**Implementation Details:**
1. **New Transform Pattern**: Added `UnrollFromElements` that unrolls a
N-D(N>=2) from_elements op to a (N-1)-D from_elements op align the
outermost dimension.
2. **Utility Functions**: Added `unrollVectorOp` to reuse the unroll
algo of vector.gather for vector.from_elements.
3. **Integration**: Added the unrolling pattern to the
convert-vector-to-llvm pass as a temporal transformation.
4. Use direct LLVM dialect operations instead of intermediate
vector.insert operations for efficiency in `VectorFromElementsLowering`.

**Example:**
```mlir
// unroll
%v = vector.from_elements  %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%vec_1d_0 = vector.from_elements %e0, %e1 : vector<2xf32>
%vec_2d_0 = vector.insert %vec_1d_0, %poison_2d [0] : vector<2xf32> into vector<2x2xf32>
%vec_1d_1 = vector.from_elements %e2, %e3 : vector<2xf32>
%result = vector.insert %vec_1d_1, %vec_2d_0 [1] : vector<2xf32> into vector<2x2xf32>

// convert-vector-to-llvm
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%poison_2d_cast = builtin.unrealized_conversion_cast %poison_2d : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>>
%poison_1d_0 = llvm.mlir.poison : vector<2xf32>
%c0_0 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_0_0 = llvm.insertelement %e0, %poison_1d_0[%c0_0 : i64] : vector<2xf32>
%c1_0 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_0_1 = llvm.insertelement %e1, %vec_1d_0_0[%c1_0 : i64] : vector<2xf32>
%vec_2d_0 = llvm.insertvalue %vec_1d_0_1, %poison_2d_cast[0] : !llvm.array<2 x vector<2xf32>>
%poison_1d_1 = llvm.mlir.poison : vector<2xf32>
%c0_1 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_1_0 = llvm.insertelement %e2, %poison_1d_1[%c0_1 : i64] : vector<2xf32>
%c1_1 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_1_1 = llvm.insertelement %e3, %vec_1d_1_0[%c1_1 : i64] : vector<2xf32>
%vec_2d_1 = llvm.insertvalue %vec_1d_1_1, %vec_2d_0[1] : !llvm.array<2 x vector<2xf32>>
%result = builtin.unrealized_conversion_cast %vec_2d_1 : !llvm.array<2 x vector<2xf32>> to vector<2x2xf32>
```

---------

Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: James Newling <james.newling@gmail.com>
Co-authored-by: Diego Caballero <dieg0ca6aller0@gmail.com>
2025-08-18 10:09:12 -07:00
Nishant Patel
4a9d038acd
[MLIR][XeGPU] Distribute load_nd/store_nd/prefetch_nd with offsets from Wg to Sg (#153432)
This PR adds pattern to distribute the load/store/prefetch nd ops with
offsets from workgroup to subgroup IR. This PR is part of the transition
to move offsets from create_nd to load/store/prefetch nd ops.

Create_nd PR : #152351
2025-08-18 09:45:29 -07:00