17998 Commits

Durgadoss R
36dc6146b8
[MLIR][NVVM] Update TMA tensor prefetch Op (#153464)
This patch updates the TMA Tensor prefetch Op
to add support for im2col_w/w128 and tile_gather4 modes.
This completes support for all modes available in Blackwell.
* lit tests are added for all possible combinations.
* The invalid tests are moved to a separate file with more coverage.
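A hedged sketch of the op's tile and im2col forms (operand names are placeholders and the exact assembly for the new modes is an assumption; the lit tests in this patch are authoritative):

```mlir
// Tile mode: prefetch a 2-D box described by a TMA descriptor.
nvvm.cp.async.bulk.tensor.prefetch %tma_desc, box[%d0, %d1] : !llvm.ptr

// Classic im2col mode on a 3-D tensor; the new im2col_w/w128 and
// tile_gather4 modes extend this same general shape.
nvvm.cp.async.bulk.tensor.prefetch %tma_desc, box[%d0, %d1, %d2] im2col[%off0] : !llvm.ptr
```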

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-08-22 12:51:29 +05:30
Rajat Bajpai
b08b219650
[MLIR][NVVM] Add "blocksareclusters" kernel attribute support (#154519)
This change adds "nvvm.blocksareclusters" kernel attribute support in NVVM Dialect/MLIR.
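A minimal sketch of the attribute on a kernel (the surrounding function and the companion `nvvm.kernel` attribute are assumptions for illustration; only the `nvvm.blocksareclusters` spelling comes from this change):

```mlir
llvm.func @kernel() attributes {nvvm.kernel, nvvm.blocksareclusters} {
  llvm.return
}
```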
2025-08-22 11:32:21 +05:30
Yang Bai
f1f194bf10
[mlir][vector] fix: unroll vector.from_elements in gpu pipelines (#154774)
### Problem

PR #142944 introduced a new canonicalization pattern which caused
failures in the following GPU-related integration tests:

- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir

The issue occurs because the new canonicalization pattern can generate
multi-dimensional `vector.from_elements` operations (rank > 1), but the
GPU lowering pipelines were not equipped to handle these during the
conversion to LLVM.

### Fix

This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the
GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`:

- `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass.
- `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass.
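For reference, a rank-2 `vector.from_elements` of the kind #142944 can now produce, which these passes previously failed to lower (values are placeholders):

```mlir
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>
```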

Co-authored-by: Yang Bai <yangb@nvidia.com>
2025-08-21 21:46:06 -05:00
Tim Gymnich
e20fa4f412
[mlir][AMDGPU] Add PermlaneSwapOp (#154345)
- Add PermlaneSwapOp that lowers to `rocdl.permlane16.swap` and
`rocdl.permlane32.swap`

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2025-08-21 18:21:43 +02:00
Guray Ozen
5c36fb3303
[MLIR][NVVM] Improve inline_ptx, add readwrite support (#154358)
Key Features
1. Multiple SSA returns – no struct packing/unpacking required.
2. Automatic struct unpacking – values are directly usable.
3. Readable register mapping
    * {$rwN} → read-write
    * {$roN} → read-only
    * {$woN} → write-only
4. Full read-write support (+ modifier).
5. Simplified operand specification – avoids cryptic
"=r,=r,=f,=f,f,f,0,1" constraints.
6. Predicate support – PTX `@p` predication

IR Example:
```
%wo0, %wo1 = nvvm.inline_ptx """
 .reg .pred p;
 setp.ge.s32 p,   {$r0}, {$r1};
 selp.s32 {$rw0}, {$r0}, {$r1}, p;
 selp.s32 {$rw1}, {$r0}, {$r1}, p;
 selp.s32 {$w0},  {$r0}, {$r1}, p;
 selp.s32 {$w1},  {$r0}, {$r1}, p;
""" ro(%a, %b : f32, f32) rw(%c, %d : i32, i32) -> f32, f32
```

After lowering:
```
 %0 = llvm.inline_asm has_side_effects asm_dialect = att
 "{
                              .reg .pred p;\
                              setp.ge.s32 p, $4, $5;   \
                              selp.s32   $0, $4, $5, p;\
                              selp.s32   $1, $4, $5, p;\
                              selp.s32   $2, $4, $5, p;\
                              selp.s32   $3, $4, $5, p;\
   }"
   "=r,=r,=f,=f,f,f,0,1"
   %c500_i32, %c400_i32, %cst, %cst_0
   : (i32, i32, f32, f32)
   -> !llvm.struct<(i32, i32, f32, f32)>

 %1 = llvm.extractvalue %0[0] : !llvm.struct<(i32, i32, f32, f32)>
 %2 = llvm.extractvalue %0[1] : !llvm.struct<(i32, i32, f32, f32)>
 %3 = llvm.extractvalue %0[2] : !llvm.struct<(i32, i32, f32, f32)>
 %4 = llvm.extractvalue %0[3] : !llvm.struct<(i32, i32, f32, f32)>

 // Unpacked result from nvvm.inline_ptx
 %5 = arith.addi %1, %2 : i32
 // read only
 %6 = arith.addf %cst, %cst_0 : f32
 // write only
 %7 = arith.addf %3, %4 : f32
```
2025-08-21 17:42:18 +02:00
Chao Chen
68d6866428
[mlir][XeGPU] add WgToSg distribution pattern for load_matrix and store_matrix. (#154403) 2025-08-21 10:02:45 -05:00
Renato Golin
32a5adbd42
[MLIR][Linalg] Rename convolution pass (#154400)
Rename the pass `LinalgNamedOpConversionPass` to
`SimplifyDepthwiseConvPass` to avoid conflating it with the new
morphisms we are creating between the norms.
2025-08-21 15:57:16 +01:00
Guray Ozen
3d41197d68
[MLIR] Introduce RemarkEngine + pluggable remark streaming (YAML/Bitstream) (#152474)
This PR implements structured, tooling-friendly optimization remarks
with zero cost unless enabled. It provides:
- `RemarkEngine` collects finalized remarks within `MLIRContext`.
- `MLIRRemarkStreamerBase` abstract class streams them to a backend.
- Backends: `MLIRLLVMRemarkStreamer` (bridges to llvm::remarks →
YAML/Bitstream) or your own custom streamer.
- Optional mirroring to DiagnosticEngine (printAsEmitRemarks +
categories).
- Off by default; no behavior change unless enabled. Thread-safe;
ordering best-effort.


## Overview

```
Passes (reportOptimization*)
         │
         ▼
+-------------------+
|  RemarkEngine     |   collects
+-------------------+
     │         │
     │ mirror  │ stream
     ▼         ▼
emitRemark    MLIRRemarkStreamerBase (abstract)
                   │
                   ├── MLIRLLVMRemarkStreamer → llvm::remarks → YAML | Bitstream
                   └── CustomStreamer → your sink
```

## Enable the remark engine and plug in LLVM's remark streamer
```
// Enable once per MLIRContext. This uses `MLIRLLVMRemarkStreamer`
mlir::remark::enableOptimizationRemarksToFile(
    ctx, path, llvm::remarks::Format::YAML, cats);
```

## API to emit remarks
```
// Emit from a pass
 remark::passed(loc, categoryVectorizer, myPassname1)
        << "vectorized loop";

remark::missed(loc, categoryUnroll, "MyPass")
        << remark::reason("not profitable at this size")   // Creates structured reason arg
        << remark::suggest("increase unroll factor to >=4");   // Creates structured suggestion arg

remark::passed(loc, categoryVectorizer, myPassname1)
        << "vectorized loop" 
        << remark::metric("tripCount", 128);                // Create structured metric on-the-fly
```
2025-08-21 16:02:31 +02:00
donald chen
5af7263d42
[mlir] add getViewDest method to viewLikeOpInterface (#154524)
The viewLikeOpInterface abstracts the behavior of an operation that views one
buffer as another. However, the current interface only includes a
"getViewSource" method and lacks a "getViewDest" method.

Previously, it was generally assumed that viewLikeOpInterface operations
would have only one result, which was the view destination. This
assumption was broken by memref.extract_strided_metadata, and more
operations may break this silent convention in the future. Calling
"viewLikeInterface->getResult(0)" may lead to a core dump at runtime.
Therefore, we need a getViewDest method to standardize the behavior.

This patch adds the getViewDest method to viewLikeOpInterface and
modifies the usage points of viewLikeOpInterface to standardize its use.
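For illustration, `memref.extract_strided_metadata` is view-like yet returns several values, so result #0 is not a meaningful view destination (a sketch; names are placeholders):

```mlir
%base, %offset, %size, %stride = memref.extract_strided_metadata %m
    : memref<?xf32> -> memref<f32>, index, index, index
```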
2025-08-21 20:09:52 +08:00
Mehdi Amini
b20c291bae
[MLIR] Adopt LDBG() debug macro in PatternApplicator.cpp (NFC) (#154724) 2025-08-21 10:32:21 +00:00
Mehdi Amini
acda808304
[MLIR] Adopt LDBG() macro in BuiltinAttributes.cpp (NFC) (#154723) 2025-08-21 10:31:18 +00:00
Mehdi Amini
b916df3a08
[MLIR] Adopt LDBG() in Transform/IR/Utils.cpp (NFC) (#154722) 2025-08-21 10:30:01 +00:00
Mehdi Amini
30f9428f14
[MLIR] Adopt LDBG() macro in LLVM/NVVM/Target.cpp (#154721) 2025-08-21 10:29:37 +00:00
Mehdi Amini
db0529dca3
[MLIR] Use LDBG() macro in Dialect.cpp (NFC) (#154720) 2025-08-21 10:28:51 +00:00
Mehdi Amini
62b29d9f76
[MLIR] Adopt LDBG() debug macro in BytecodeWriter.cpp (NFC) (#154642) 2025-08-20 22:45:39 +00:00
Mehdi Amini
908eebcb93
[MLIR] Adopt LDBG() macro in PDL ByteCodeExecutor (NFC) (#154641) 2025-08-20 22:40:52 +00:00
Mehdi Amini
dbbd3f0d07
[MLIR] Adopt LDBG() macro in Affine/Analysis/Utils.cpp (NFC) (#154626) 2025-08-20 21:56:03 +00:00
Mehdi Amini
d20a74e631
[MLIR] Adopt LDBG() macro in BasicPtxBuilderInterface.cpp (NFC) (#154625) 2025-08-20 21:51:17 +00:00
Mehdi Amini
4be19e27b5
[MLIR] Adopt LDBG() debug macros in Affine LoopAnalysis.cpp (NFC) (#154621) 2025-08-20 21:45:42 +00:00
Mehdi Amini
6445a75c98
[MLIR] Update MLIRContext to use the LDBG() style debug macro (NFC) (#154619) 2025-08-20 21:30:11 +00:00
Mehdi Amini
ffbc8da8b5
[MLIR] Migrate LICM utils to the LDBG() macro style logging (NFC) (#154615) 2025-08-20 21:29:50 +00:00
Mehdi Amini
780750bbf9
[MLIR] Adopt LDBG() debug macro in ConvertToLLVMPass (NFC) (#154616) 2025-08-20 21:29:35 +00:00
Mehdi Amini
5683baea6d
[MLIR] Adopt LDBG() debug macro in bufferization (NFC) (#154614) 2025-08-20 21:14:02 +00:00
Rolf Morel
cbfa265e98
[MLIR][LLVMIR][DLTI] Add LLVM::TargetAttrInterface and #llvm.target attr (#145899)
Adds the `#llvm.target<triple = $TRIPLE, chip = $CHIP, features =
$FEATURES>` attribute, along with a `-llvm-target-to-data-layout`
pass to derive an MLIR data layout from the LLVM data layout string
(using the existing `DataLayoutImporter`). The attribute implements the
relevant DLTI interfaces, exposing the `triple`, `chip` (AKA `cpu`) and
`features` on `#llvm.target` as well as the full `DataLayoutSpecInterface`. The
pass combines the generated `#dlti.dl_spec` with an existing `dl_spec`
in case one is already present, e.g. a `dl_spec` that is there to
specify the size of the `index` type.

Adds a `TargetAttrInterface` which can be implemented by all attributes
representing LLVM targets.

Similar to the Draft PR https://github.com/llvm/llvm-project/pull/78073.

RFC on which this PR is based:
https://discourse.llvm.org/t/mandatory-data-layout-in-the-llvm-dialect/85875
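A hedged sketch of the attribute in use (attaching it as an `llvm.target` module attribute, and the concrete triple/chip/features values, are assumptions for illustration):

```mlir
module attributes {llvm.target = #llvm.target<triple = "amdgcn-amd-amdhsa",
                                              chip = "gfx90a",
                                              features = "+wavefrontsize64">} {
}
```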
2025-08-20 22:00:30 +01:00
Akash Banerjee
d69ccded4f
[MLIR] Add cpow support in ComplexToROCDLLibraryCalls (#153183)
This PR adds support for complex power operations (`cpow`) in the
`ComplexToROCDLLibraryCalls` conversion pass, specifically targeting
AMDGPU architectures. The implementation optimises complex
exponentiation by using mathematical identities and special-case
handling for small integer powers.

- Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa`
target instead of using library calls
- Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using
the mathematical identity
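A sketch of that identity as it would appear in the complex dialect (value names are placeholders):

```mlir
// complex.pow(%z, %w)  ==>  complex.exp(%w * complex.log(%z))
%ln = complex.log %z : complex<f32>
%wl = complex.mul %w, %ln : complex<f32>
%r = complex.exp %wl : complex<f32>
```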
2025-08-20 17:18:30 +00:00
Matthias Springer
6a285cc8e6
[mlir][IR] Fix Block::without_terminator for blocks without terminator (#154498)
Blocks without a terminator are not handled correctly by
`Block::without_terminator`: the last operation is excluded, even when
it is not a terminator. With this commit, only terminators are excluded.
If the last operation is unregistered, it is included for safety.
2025-08-20 18:02:24 +02:00
Matthias Springer
0499d3a8cf
[mlir][Interfaces] Add hasUnknownEffects helper function (#154523)
I have seen misuse of the `hasEffect` API in downstream projects: users
sometimes think that `hasEffect == false` indicates that the operation
does not have a certain memory effect. That's not necessarily the case.
When the op does not implement the `MemoryEffectsOpInterface`, it is
unknown whether it has the specified effect. "false" can also mean
"maybe".

This commit clarifies the semantics in the documentation, adds
`hasUnknownEffects` and `mightHaveEffect` convenience functions, and
simplifies a few call sites.
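For illustration, an op that does not implement `MemoryEffectsOpInterface` (e.g. an unregistered op) has unknown effects, so a `false` answer from `hasEffect` must not be read as "no effect" (a sketch; the op name is hypothetical):

```mlir
// This op implements no effect interface: whether it writes %buf is unknown,
// so hasEffect<MemoryEffects::Write> == false here only means "maybe not".
"mydialect.opaque_op"(%buf) : (memref<4xf32>) -> ()
```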
2025-08-20 15:24:53 +00:00
Hank
c075fb8c37
[MLIR] Fix duplicated attribute nodes in MLIR bytecode deserialization (#151267)
Fixes #150163 

MLIR bytecode does not preserve alias definitions, so each attribute
encountered during deserialization is treated as a new one. This can
generate duplicate `DISubprogram` nodes during deserialization.

The patch adds a `StringMap` cache that records attributes and fetches
them when encountered again.
2025-08-20 13:03:26 +00:00
Luc Forget
95fbc18a70
[MLIR][Wasm] Extending Wasm binary to WasmSSA dialect importer (#154452)
This is a cherry-pick of #154053 with a fix for incorrect handling of
endianness when loading float and double literals from the binary.

---------

Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
2025-08-20 10:55:55 +02:00
Ian Wood
961b052e98
[mlir][tensor][NFC] Refactor common methods for bubbling extract_slice op (#153675)
Exposes the `tensor.extract_slice` reshaping logic in
`BubbleUpExpandShapeThroughExtractSlice` and
`BubbleUpCollapseShapeThroughExtractSlice` through two corresponding
utility functions. These compute the offsets/sizes/strides of an extract
slice after either collapsing or expanding.

This should also make it easier to implement the two other bubbling
cases: (1) the `collapse_shape` is a consumer or (2) the `expand_shape`
is a consumer.
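A sketch of the `expand_shape` case these utilities cover (shapes chosen for illustration; the exact rewritten IR may differ):

```mlir
// Before: extract a slice of an expanded tensor.
%e = tensor.expand_shape %t [[0, 1]] output_shape [2, 8]
    : tensor<16xf32> into tensor<2x8xf32>
%s = tensor.extract_slice %e[1, 0] [1, 8] [1, 1]
    : tensor<2x8xf32> to tensor<1x8xf32>

// After bubbling: slice the collapsed source, then expand the result.
%s2 = tensor.extract_slice %t[8] [8] [1] : tensor<16xf32> to tensor<8xf32>
%e2 = tensor.expand_shape %s2 [[0, 1]] output_shape [1, 8]
    : tensor<8xf32> into tensor<1x8xf32>
```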

---------

Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>
2025-08-19 19:31:30 +00:00
Kazu Hirata
2c4f0e7ac6
[mlir] Replace SmallSet with SmallPtrSet (NFC) (#154265)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType *, N> : public SmallPtrSet<PointeeType *, N> {};

We only have 30 instances that rely on this "redirection".  Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.

I'm planning to remove the redirection eventually.
2025-08-19 07:11:47 -07:00
Yang Bai
b4c31dc98d
[mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements (#142944)
## Description

This change introduces a new canonicalization pattern for the MLIR
Vector dialect that optimizes chains of insertions. The optimization
identifies when a vector is **completely** initialized through a series
of vector.insert operations and replaces the entire chain with a
single `vector.from_elements` operation.

Please be aware that the new pattern **doesn't** work for poison vectors
where only **some** elements are set, as MLIR doesn't support partial
poison vectors for now.

**New Pattern: InsertChainFullyInitialized**

* Detects chains of vector.insert operations.
* Validates that all insertions are at static positions, and all
intermediate insertions have only one use.
* Ensures the entire vector is **completely** initialized.
* Replaces the entire chain with a single vector.from_elements operation.

**Refactored Helper Function**

* Extracted `calculateInsertPosition` from
`foldDenseElementsAttrDestInsertOp` to avoid code duplication.

## Example

```
// Before:
%v1 = vector.insert %c10, %v0[0] : i64 into vector<2xi64>
%v2 = vector.insert %c20, %v1[1] : i64 into vector<2xi64>

// After:
%v2 = vector.from_elements %c10, %c20 : vector<2xi64>
```

It also works for multidimensional vectors.

```
// Before:
%v1 = vector.insert %cv0, %v0[0] : vector<3xi64> into vector<2x3xi64>
%v2 = vector.insert %cv1, %v1[1] : vector<3xi64> into vector<2x3xi64>

// After:
%0:3 = vector.to_elements %arg1 : vector<3xi64>
%1:3 = vector.to_elements %arg2 : vector<3xi64>
%v2 = vector.from_elements %0#0, %0#1, %0#2, %1#0, %1#1, %1#2 : vector<2x3xi64>
```

---------

Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
2025-08-19 13:43:31 +01:00
Mehdi Amini
dc82b2cc70
Revert "[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer" (#154314)
Reverts llvm/llvm-project#154053

It appears an endianness sensitivity is failing a big-endian bot.
2025-08-19 14:05:09 +02:00
Luc Forget
df57bb8c49
[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer (#154053)
This is the continuation of #152131.

This PR adds support for parsing the global initializer and function
body, and support for decoding scalar numerical instructions and
variable-related instructions.

---------

Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
2025-08-19 13:42:47 +02:00
Md Asghar Ahmad Shahid
c24c23d9ab
[NFC][mlir][vector] Handle potential static cast assertion. (#152957)
In the FoldArithToVectorOuterProduct pattern, a static cast to vector type
causes an assertion when a scalar type is encountered. It seems the author
meant to use a dyn_cast instead.

This NFC patch handles it by using dyn_cast.
2025-08-19 09:27:20 +05:30
Jianjian Guan
1eb5b18a04
[mlir][emitc] Support dense as init value for ShapedType (#144826) 2025-08-19 09:41:15 +08:00
Mehdi Amini
89abccc9a6
[MLIR] Update GreedyRewriter to use the LDBG() debug log mechanism (NFC) (#153961)
Also improves the LDBG() implementation a bit
2025-08-18 21:05:34 +00:00
Mehdi Amini
8c605bd1f4
[MLIR] Add logging to eraseUnreachableBlocks (NFC) (#153968) 2025-08-18 21:02:53 +00:00
Mehdi Amini
dfaebe7f48
[MLIR] Fix Liveness analysis handling of unreachable code (#153973)
This patch forces all values to be initialized by the
LivenessAnalysis, even in dead blocks. The dataflow framework will skip
visiting values when it already knows that a block is dynamically
unreachable, so this requires specific handling.
Downstream code could consider that the absence of liveness information is
the same as "dead". However, as the code is mutated, new values can be
introduced, and a transformation like "RemoveDeadValue" must conservatively
treat the absence of liveness information as meaning we aren't sure whether
a value is dead (it could be a newly introduced value).

Fixes #153906
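For illustration, a dynamically unreachable block whose values the analysis would previously leave uninitialized (a minimal sketch):

```mlir
func.func @f() {
  func.return
^dead:  // no predecessors: the dataflow framework skips this block
  %v = arith.constant 0 : i64
  func.return
}
```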
2025-08-18 20:50:36 +00:00
Mehdi Amini
191e7eba93
[MLIR] Stop visiting unreachable blocks in the walkAndApplyPatterns driver (#154038)
This is similar to the fix to the greedy driver in #153957, except that
instead of removing unreachable code, we just ignore it.

Operations like:

```
%add = arith.addi %add, %add : i64
```

are legal in unreachable code.
Unfortunately many patterns would be unsafe to apply on such IR and can
lead to crashes or infinite loops.
2025-08-18 20:46:59 +00:00
Charitha Saumya
9617ce4862
[vector][distribution] Bug fix in moveRegionToNewWarpOpAndAppendReturns (#153656) 2025-08-18 13:26:08 -07:00
Yang Bai
4eb1a07d7d
[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering (#151175)
This patch introduces a new unrolling-based approach for lowering
multi-dimensional `vector.from_elements` operations.

**Implementation Details:**
1. **New Transform Pattern**: Added `UnrollFromElements`, which unrolls an
N-D (N>=2) from_elements op into (N-1)-D from_elements ops along the
outermost dimension.
2. **Utility Functions**: Added `unrollVectorOp` to reuse the unrolling
algorithm of vector.gather for vector.from_elements.
3. **Integration**: Added the unrolling pattern to the
convert-vector-to-llvm pass as a temporary transformation.
4. Use direct LLVM dialect operations instead of intermediate
vector.insert operations for efficiency in `VectorFromElementsLowering`.

**Example:**
```mlir
// unroll
%v = vector.from_elements  %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%vec_1d_0 = vector.from_elements %e0, %e1 : vector<2xf32>
%vec_2d_0 = vector.insert %vec_1d_0, %poison_2d [0] : vector<2xf32> into vector<2x2xf32>
%vec_1d_1 = vector.from_elements %e2, %e3 : vector<2xf32>
%result = vector.insert %vec_1d_1, %vec_2d_0 [1] : vector<2xf32> into vector<2x2xf32>

// convert-vector-to-llvm
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%poison_2d_cast = builtin.unrealized_conversion_cast %poison_2d : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>>
%poison_1d_0 = llvm.mlir.poison : vector<2xf32>
%c0_0 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_0_0 = llvm.insertelement %e0, %poison_1d_0[%c0_0 : i64] : vector<2xf32>
%c1_0 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_0_1 = llvm.insertelement %e1, %vec_1d_0_0[%c1_0 : i64] : vector<2xf32>
%vec_2d_0 = llvm.insertvalue %vec_1d_0_1, %poison_2d_cast[0] : !llvm.array<2 x vector<2xf32>>
%poison_1d_1 = llvm.mlir.poison : vector<2xf32>
%c0_1 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_1_0 = llvm.insertelement %e2, %poison_1d_1[%c0_1 : i64] : vector<2xf32>
%c1_1 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_1_1 = llvm.insertelement %e3, %vec_1d_1_0[%c1_1 : i64] : vector<2xf32>
%vec_2d_1 = llvm.insertvalue %vec_1d_1_1, %vec_2d_0[1] : !llvm.array<2 x vector<2xf32>>
%result = builtin.unrealized_conversion_cast %vec_2d_1 : !llvm.array<2 x vector<2xf32>> to vector<2x2xf32>
```

---------

Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: James Newling <james.newling@gmail.com>
Co-authored-by: Diego Caballero <dieg0ca6aller0@gmail.com>
2025-08-18 10:09:12 -07:00
Nishant Patel
4a9d038acd
[MLIR][XeGPU] Distribute load_nd/store_nd/prefetch_nd with offsets from Wg to Sg (#153432)
This PR adds patterns to distribute load/store/prefetch nd ops that carry
offsets from workgroup IR to subgroup IR. It is part of the transition
to move offsets from create_nd to the load/store/prefetch nd ops.

Create_nd PR : #152351
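A hedged sketch of that transition (types are placeholders, and the bracketed-offset form on `xegpu.load_nd` is an assumption based on this PR's description):

```mlir
// Before: offsets supplied when creating the descriptor.
%td = xegpu.create_nd_tdesc %src[%x, %y]
    : memref<256x256xf16> -> !xegpu.tensor_desc<32x32xf16>
%v = xegpu.load_nd %td : !xegpu.tensor_desc<32x32xf16> -> vector<32x32xf16>

// After: offsets supplied on the load itself.
%td2 = xegpu.create_nd_tdesc %src
    : memref<256x256xf16> -> !xegpu.tensor_desc<32x32xf16>
%v2 = xegpu.load_nd %td2[%x, %y]
    : !xegpu.tensor_desc<32x32xf16> -> vector<32x32xf16>
```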
2025-08-18 09:45:29 -07:00
Jeremy Kun
c67d27dad0
[mlir][Presburger] NFC: return var index from IntegerRelation::addLocalFloorDiv (#153463)
addLocalFloorDiv currently returns void and requires the caller to know
that the newly added local variable is at a particular index. This
commit returns the index of the newly added variable so that callers
need not tie themselves to this implementation detail.

I found one relevant callsite demonstrating this and updated it. I am
using this API out of tree and wanted to make our out-of-tree code a bit
more resilient to upstream changes.
2025-08-18 08:47:47 -07:00
Jacques Pienaar
4bf33958da
[mlir] Update builders to use new form. (#154132)
Mechanically applied using clang-tidy.
2025-08-18 15:19:34 +00:00
Matthias Springer
f84aaa6eaa
[mlir][Transforms] Dialect conversion: Add flag to dump materialization kind (#119532)
Add a debugging flag to the dialect conversion to dump the
materialization kind. This flag is useful to find out whether a missing
materialization rule is for source or target materializations.

Also add missing test coverage for the `buildMaterializations` flag.
2025-08-18 13:25:18 +00:00
Chaitanya
4a3bf27c69
[OpenMP] Introduce omp.target_allocmem and omp.target_freemem omp dialect ops. (#145464)
This PR introduces two new ops in the omp dialect: omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: allocates heap memory on the device; will be lowered
to an omp_target_alloc call in LLVM.
omp.target_freemem: deallocates heap memory on the device; will be lowered
to an omp_target_free call in LLVM.

Example:
```
%1 = omp.target_allocmem %device : i32, i64
omp.target_freemem %device, %1 : i32, i64
```

The work in this PR is cherry-picked from and inspired by @ivanradanov's
commits from the coexecute implementation:
[Add fir omp target alloc and free
ops](be860ac8ba)
[Lower omp_target_{alloc,free} to
llvm](6e2d584dc9)
2025-08-18 18:15:11 +05:30
Mehdi Amini
cfe5975eaf
[MLIR] Fix SCF verifier crash (#153974)
An operand of the nested yield op can be null and hasn't been verified
yet when processing the enclosing operation. Using `getResultTypes()`
will dereference this null Value and crash in the verifier.
2025-08-18 12:48:55 +02:00
Mehdi Amini
16aa283344
[MLIR] Refactor the walkAndApplyPatterns driver to remove the recursion (#154037)
This is in preparation of a follow-up change to stop traversing
unreachable blocks.

This is not NFC because of a subtlety of the early_inc. On a test case
like:

```
  scf.if %cond {
    "test.move_after_parent_op"() ({
      "test.any_attr_of_i32_str"() {attr = 0 : i32} : () -> ()
    }) : () -> ()
  }
```

We recursively traverse the nested regions, and process an op when the
region is done (post-order).
We need to pre-increment the iterator before processing an operation in
case it gets deleted; however, we can do this before or after processing
the nested region. This implementation does the latter.
2025-08-18 09:07:19 +00:00
Mehdi Amini
87e6fd161a
[MLIR] Erase unreachable blocks before applying patterns in the greedy rewriter (#153957)
Operations like:

    %add = arith.addi %add, %add : i64

are legal in unreachable code. Unfortunately many patterns would be
unsafe to apply on such IR and can lead to crashes or infinite loops. To
avoid this we can remove unreachable blocks before attempting to apply
patterns.
We may also have to do this whenever the CFG is changed by a pattern;
that is left for future work.

Fixes #153732
2025-08-18 10:59:43 +02:00