261 Commits

Author SHA1 Message Date
Longsheng Mou
1723a5137c
[mlir][tensor] Drop unused AffineExpr variable (NFC) (#168651) 2025-11-19 22:33:40 +08:00
Hanumanth
81964597f9
[mlir][tensor] Fix runtime verification for tensor.extract_slice for empty tensor slices (#166569)
I hit another runtime verification issue (similar to
https://github.com/llvm/llvm-project/pull/164878) while working with
TFLite models. The verifier is incorrectly rejecting
`tensor.extract_slice` operations when extracting an empty slice
(size=0) that starts exactly at the tensor boundary.

The current runtime verification unconditionally enforces `offset <
dim_size`. This makes sense for non-empty slices, but it's too strict
for empty slices, causing false positives that lead to spurious runtime
assertions.

**Simple example that demonstrates the issue:**

```mlir
func.func @extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) {
  // When called with: tensor size=10, offset=10, size=0
  // Runtime verification fails: "offset 0 is out-of-bounds"
  %slice = tensor.extract_slice %tensor[%offset] [%size] [1] 
    : tensor<?xf32> to tensor<?xf32>
  return
}
```

For the above example, the check evaluates `10 < 10` which is false, so
verification fails. However, I believe this operation should be valid -
we're extracting zero elements, so there's no actual out-of-bounds
access.

**Real-world repro from the TensorFlow Lite models:**

This issue manifests while lowering TFLite models and a lot of our
system tests are failing due to this. Here's a simplified version
showing the problematic pattern:

In this code, `%extracted_slice_0` becomes an empty tensor when SSA
value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`.
The operation extracts zero elements along dimension 0, which is
semantically valid but fails runtime verification.

```mlir
func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %c10 = arith.constant 10 : index
  %c-1 = arith.constant -1 : index
  
  %0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32>
  %1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32>
  %2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32>
  %3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32>
  %4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32>
  %5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32>
  %c4_1 = tosa.const_shape  {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1>
  
  %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) 
    : (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) {
    %7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1>
    %extracted = tensor.extract %7[] : tensor<i1>
    scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32>
  } do {
  ^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>):
    %7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32>
    
    // First slice
    %8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    
    %extracted_0 = tensor.extract %9[%c0] : tensor<3xi32>
    %10 = index.casts %extracted_0 : i32 to index
    %11 = arith.cmpi eq, %10, %c-1 : index
    %12 = arith.select %11, %c10, %10 : index
    
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] 
      : tensor<10x4x1xf32> to tensor<?x4x1xf32>
    
    // Second slice - this is where the failure occurs
    %13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    
    %extracted_1 = tensor.extract %14[%c0] : tensor<3xi32>
    %15 = index.castu %extracted_1 : i32 to index
    %16 = arith.subi %c10, %15 : index  // size = 10 - offset
    
    %extracted_2 = tensor.extract %14[%c1] : tensor<3xi32>
    %17 = index.castu %extracted_2 : i32 to index
    
    %extracted_3 = tensor.extract %14[%c2] : tensor<3xi32>
    %18 = index.castu %extracted_3 : i32 to index
    
    // On the last loop iteration: %15=10, %16=0
    // %extracted_slice_0 becomes an empty tensor
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] 
      : tensor<10x4x1xf32> to tensor<?x4x1xf32>
    
    %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} 
      : (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32>
    
    scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32>
  }
  
  return %6#1 : tensor<10x4x1xf32>
}
```
**The fix:**

Make the offset check conditional on slice size:
- Empty slice (size == 0): allow `0 <= offset <= dim_size`
- Non-empty slice (size > 0): require `0 <= offset < dim_size`


**Question for reviewers:**
Should we also relax the static verifier to allow this edge case?
Currently, the static verifier rejects the following IR:

```mlir
%tensor = arith.constant dense<1.0> : tensor<10xf32>
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32>
```
Since we're allowing it at runtime for dynamic shapes, it seems
inconsistent to reject it statically. However, I wanted to get feedback
before making that change - this PR focuses only on the runtime
verification fix for dynamic shapes.

P.S. We have a similar issue with `memref.subview`. I will send a
separate patch for the issue.

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
2025-11-12 08:37:15 +09:00
Jakub Kuderski
ba0be89cd2
[mlir] Simplify Default cases in type switches. NFC. (#165767)
Use default values instead of lambdas when possible. `std::nullopt` and
`nullptr` can be used now because of
https://github.com/llvm/llvm-project/pull/165724.
2025-10-30 15:10:59 -04:00
srcarroll
f5e175f06d
[mlir][linalg] Genericize MapOp (#162742)
This PR modifies the definition of `linalg::MapOp` so that it has the
same structure of `linalg::GenericOp` and all other linalg ops. Mainly,
it adds an `out` bbarg for the body of the op. Although the `out` arg is
never used in the body, there doesn't seem to be much benefit in
specializing the op to exclude it. In fact it only makes things more
complicated because it doesn't align with the `GenericOp` structure. For
example, `linalg-generalize-named-ops` avoided converting `linalg.map`
purely because it didn't have the structure to do so. Moreover, although
some fusion patterns are applied explicitly to `GenericOp`, we can
change them to be applied to the base `LinalgOp` which will enable
fusion for any fusion-compatible linalg op, but that requires the op
having a generic structure. So these changes will enable us to use
existing generic transformation patterns on `MapOp` that weren't
possible before. They can either be applied to `MapOp` directly or
applied after converting to `GenericOp`.
2025-10-30 10:20:19 -05:00
Mehdi Amini
4e44298f96 [MLIR] Apply clang-tidy fixes for llvm-qualified-auto in SwapExtractSliceWithProducerPatterns.cpp (NFC) 2025-10-28 17:36:35 -07:00
Hanumanth
a6788b5246
[mlir][tensor] Fix runtime verification for tensor.extract_slice when size dimension value is 0 (#164878)
Previously, the runtime verification pass would insert assertion
statements with conditions that always evaluate to false for
semantically valid `tensor.extract_slice` operations where one of the
dimensions had a size of 0.

The `tensor.extract_slice` runtime verification logic was
unconditionally generating checks for the position of the last element
(`offset + (size - 1) * stride`). When `size` is 0, this causes the
assertion condition to always be false, leading to runtime failures even
though the operation is semantically valid.

This patch fixes the issue by making the `lastPos` check conditional.
The offset is always verified, but the endpoint check is only performed
when `size > 0` to avoid generating spurious assert statements.

This issue was discovered through LiteRT model, where a dynamic shape
calculation resulted in a zero-sized dimension being passed to
`tensor.extract_slice`.

The following is a simplified IR snippet from the model. After running
the runtime verification pass, an assertion that always fails is
generated because the SSA value `%3` becomes 0.

```mlir
func.func @simple_repro_from_liteRT_model(%arg0: tensor<10x4x1xf32>) -> tensor<?x?x?xf32> {
  %cst = arith.constant dense<0> : tensor<1xi32>
  %cst_0 = arith.constant dense<-1> : tensor<2xi32>
  %c-1 = arith.constant -1 : index
  %c0 = arith.constant 0 : index
  %c10 = arith.constant 10 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %c2 = arith.constant 2 : index
  %0 = tensor.empty() : tensor<3xi32>
  %inserted_slice = tensor.insert_slice %cst into %0[0] [1] [1] : tensor<1xi32> into tensor<3xi32>
  %inserted_slice_1 = tensor.insert_slice %cst_0 into %inserted_slice[1] [2] [1] : tensor<2xi32> into tensor<3xi32>
  %extracted = tensor.extract %inserted_slice_1[%c0] : tensor<3xi32>
  %1 = index.casts %extracted : i32 to index
  %2 = arith.cmpi eq, %1, %c-1 : index
  %3 = arith.select %2, %c10, %1 : index
  %extracted_2 = tensor.extract %inserted_slice_1[%c1] : tensor<3xi32>
  %4 = index.casts %extracted_2 : i32 to index
  %5 = arith.cmpi eq, %4, %c-1 : index
  %6 = arith.select %5, %c4, %4 : index
  %extracted_3 = tensor.extract %inserted_slice_1[%c2] : tensor<3xi32>
  %7 = index.casts %extracted_3 : i32 to index
  %8 = arith.cmpi eq, %7, %c-1 : index
  %9 = arith.select %8, %c1, %7 : index
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0] [%3, %6, %9] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return %extracted_slice : tensor<?x?x?xf32>
}
```

The issue can be reproduced more simply with the following test case,
where `dim_0` is `0`. When the runtime verification pass is applied to
this code with `dim_0 = 0`, it generates an assertion that will always
fail at runtime.

```mlir
func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>,
                                      %dim_0: index,
                                      %dim_1: index,
                                      %dim_2: index) {
  %slice = tensor.extract_slice %arg0[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1]
    : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return
}

func.func @test_zero_size_extraction() {
  %input = arith.constant dense<1.0> : tensor<10x4x1xf32>
  // Define slice dimensions: 0x4x1 (zero-size in first dimension)
  %dim_0 = arith.constant 0 : index
  %dim_1 = arith.constant 4 : index
  %dim_2 = arith.constant 1 : index
  func.call @extract_slice_zero_size_dim(%input, %dim_0, %dim_1, %dim_2)
    : (tensor<10x4x1xf32>, index, index, index) -> ()
  return
}
```

P.S. We probably have a similar issue with `memref.subview`. I will
check this and send a separate PR for the issue.

---------

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
2025-10-27 11:43:18 -07:00
Hanchenng Wu
a6d1a52b8d
[MLIR] Reuse AsmState to enable fast generate-runtime-verification pass; add location-only pass option (#160331)
The pass generate-runtime-verification generates additional runtime op
verification checks.

Currently, the pass is extremely expensive. For example, with a
mobilenet v2 ssd network(converted to mlir), running this pass alone in
debug mode will take 30 minutes. The same observation has been made to
other networks as small as 5 Mb.

The culprit is this line "op->print(stream, flags);" in function
"RuntimeVerifiableOpInterface::generateErrorMessage" in File
mlir/lib/Interfaces/RuntimeVerifiableOpInterface.cpp.

As we are printing the op with all the names of the operands in the
middle end, we are constructing a new SSANameState for each
op->print(...) call. Thus, we are doing a new SSA analysis for each
error message printed.

Perf profiling shows that 98% percent of the time is spent in the
constructor of SSANameState.

This change refactored the message generator. We use a toplevel
AsmState, and reuse it with all the op-print(stream, asmState). With a
release build, this change reduces the pass exeuction time from ~160
seconds to 0.3 seconds on my machine.

This change also adds verbose options to generate-runtime-verification
pass.
verbose 0: print only source location with error message.
verbose 1: print the full op, including the name of the operands.
2025-10-08 11:48:34 +01:00
Alan Li
b87f1b22a8
[MLIR] Add InParallelOpInterface for parallel combining operations (#157736)
This commit:
- Introduces a new `InParallelOpInterface`, along with the
`ParallelCombiningOpInterface`, represent the parallel updating
operations we have in a parallel loop of `scf.forall`.
- Change the name of `ParallelCombiningOpInterface` to
`InParallelOpInterface` as the naming was quite confusing.
- `ParallelCombiningOpInterface` now is used to generalize operations
that insert into shared tensors within parallel combining regions.
Previously, only `tensor.parallel_insert_slice` was supported directly
in `scf.InParallelOp` regions.
- `tensor.parallel_insert_slice` now implements
`ParallelCombiningOpInterface`.

This change enables future extensions to support additional parallel
combining operations beyond `tensor.parallel_insert_slice`, which have
different update semantics, so the `in_parallel` region can correctly
and safely represent these kinds of operation without potential mistakes
such as races.

Author credits: @qedawkins
2025-09-12 14:23:00 -07:00
Han-Chung Wang
8eba28bc8c
[mlir][NFC] Correct pattern names to match the behaviors. (#158177)
It is a follow-up for
https://github.com/llvm/llvm-project/pull/131982#discussion_r2286014576
and
https://github.com/llvm/llvm-project/pull/126898#discussion_r2286013250.

The names do not match the behaviors, and the revision updates the
names.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-09-12 10:57:20 -07:00
Ian Wood
961b052e98
[mlir][tensor][NFC] Refactor common methods for bubbling extract_slice op (#153675)
Exposes the `tensor.extract_slice` reshaping logic in
`BubbleUpExpandShapeThroughExtractSlice` and
`BubbleUpCollapseShapeThroughExtractSlice` through two corresponding
utility functions. These compute the offsets/sizes/strides of an extract
slice after either collapsing or expanding.

This should also make it easier to implement the two other bubbling
cases: (1) the `collapse_shape` is a consumer or (2) the `expand_shape`
is a consumer.

---------

Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>
2025-08-19 19:31:30 +00:00
Maksim Levental
c090ed53fb
[mlir][NFC] update mlir/Dialect create APIs (33/n) (#150659)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25 16:13:55 -04:00
Kazu Hirata
0925d7572a
[mlir] Remove unused includes (NFC) (#150266)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-23 15:18:53 -07:00
Maksim Levental
8fff238b2c
[mlir][NFC] update mlir/Dialect create APIs (23/n) (#149930)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-23 10:16:52 -04:00
Kazu Hirata
5e0de68626
[mlir] Remove unused includes (NFC) (#148119)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-11 11:59:26 -07:00
MaheshRavishankar
c22352175e
[mlir][TilingInterface] Allow tile and fuse to work with ReductionTilingStrategy::PartialReductionOuterParallelStrategy. (#147593)
Since `scf::tileUsingSCF` is the core method used for tiling the root
operation within the `scf::tileConsumersAndFuseProducersUsingSCF`, the
latter can fuse into any tiled loop generated using `scf::tileUsingSCF`.
This patch adds a test for tiling a root operation using
`ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and
fusing producers with it.

Since this strategy generates a rank-reducing extract slice
`tensor::replaceExtractSliceWithTiledProducer` which is the core method
used for the fusion was extended to handle the rank-reducing slices.

Also fix a small bug in the computation of the reduction induction
variable (which needs to use `floorDiv` instead of `ceilDiv`)

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-07-09 08:50:01 -07:00
Andrei Golubev
a63f572628
[mlir][bufferization] Return BufferLikeType in BufferizableOpInterface (#144867)
Support custom types (2/N): allow value-owning operations (e.g.
allocation ops) to bufferize custom tensors into custom buffers. This
requires BufferizableOpInterface::getBufferType() to return
BufferLikeType instead of BaseMemRefType.

Affected implementors of the interface are updated accordingly.

Relates to ee070d08163ac09842d9bf0c1315f311df39faf1.
2025-07-02 11:27:35 -07:00
MaheshRavishankar
c873e5f87d
[mlir][TilingInterface] Handle multi operand consumer fusion. (#145193)
For consumer fusion cases of this form

```
%0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg0 = ...) {

  tensor.parallel_insert_slice ... into %arg0
  tensor.parallel_insert_slice ... into %arg1
}
%1 = linalg.generic ... ins(%0#0, %0#1)
```

the current consumer fusion that handles one slice at a time cannot fuse
the consumer into the loop, since fusing along one slice will create and
SSA violation on the other use from the `scf.forall`. The solution is to
allow consumer fusion to allow considering multiple slices at once. This
PR changes the `TilingInterface` methods related to consumer fusion,
i.e.

- `getTiledImplementationFromOperandTile`
- `getIterationDomainFromOperandTile`

to allow fusion while considering multiple operands. It is upto the
`TilingInterface` implementation to return an error if a list of tiles
of the operands cannot result in a consistent implementation of the
tiled operation.

The Linalg operation implementation of `TilingInterface` has been
modified to account for these changes and allow cases where operand
tiles that can result in a consistent tiling implementation are handled.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-25 11:54:38 -07:00
Andrei Golubev
ee070d0816
[mlir][bufferization] Support custom types (1/N) (#142986)
Following the addition of TensorLike and BufferLike type interfaces (see
00eaff3e9c897c263a879416d0f151d7ca7eeaff), introduce minimal changes
required to bufferize a custom tensor operation into a custom buffer
operation.

To achieve this, new interface methods are added to TensorLike type
interface that abstract away the differences between existing (tensor ->
memref) and custom conversions.

The scope of the changes is intentionally limited (for example,
BufferizableOpInterface is untouched) in order to first understand the
basics and reach consensus design-wise.

---
Notable changes:
* mlir::bufferization::getBufferType() returns BufferLikeType (instead
of BaseMemRefType)
* ToTensorOp / ToBufferOp operate on TensorLikeType / BufferLikeType.
Operation argument "memref" renamed to "buffer"
* ToTensorOp's tensor type inferring builder is dropped (users now need
to provide the tensor type explicitly)
2025-06-18 16:18:12 +02:00
Kazu Hirata
85480a4d37
[mlir] Directly call ShapedType::isDynamic without lambdas (NFC) (#142994)
We do not need lambdas in these places.
2025-06-05 16:14:27 -07:00
Matthias Springer
e4c8ff94e7
[mlir][tensor] Add runtime verification for cast/dim/extract/insert/extract_slice (#141332)
Add `RuntimeVerifiableOpInterface` implementations for the following
ops. These were mostly copied from the respective memref
implementations. Only the part that deals with offsets and strides was
removed.
* `tensor.cast`: `memref.cast`
* `tensor.dim`: `memref.dim`
* `tensor.extract`: `memref.load`
* `tensor.insert`: `memref.store`
* `tensor.extract_slice`: `memref.subview`
2025-06-05 12:06:47 +09:00
Michele Scuttari
63cb6af782
[MLIR] Add bufferization state to getBufferType and resolveConflicts interface methods (#141466)
The PR continues the work started in #141019 by adding the `BufferizationState` class also to the `getBufferType` and `resolveConflicts` interface methods, together with the additional support functions that are used throughout the bufferization infrastructure.
2025-05-28 10:35:23 +02:00
Michele Scuttari
61d5fdf50c
[MLIR] Add bufferization state class to OneShotBufferization pass (#141019)
Follow-up on #138143, which was reverted due to a missing update a method signature (more specifically, the bufferization interface for `tensor::ConcatOp`) that was not catched before merging. The old PR description is reported in the next lines.

This PR is a follow-up on https://github.com/llvm/llvm-project/pull/138125, and adds a bufferization state class providing information about the IR. The information currently consists of a cached list of symbol tables, which aims to solve the quadratic scaling of the bufferization task with respect to the number of symbols. The PR breaks API compatibility: the bufferize method of the BufferizableOpInterface has been enriched with a reference to a BufferizationState object.

The bufferization state must be kept in a valid state by the interface implementations. For example, if an operation with the Symbol trait is inserted or replaced, its parent SymbolTable must be updated accordingly (see, for example, the bufferization of arith::ConstantOp, where the symbol table of the module gets the new global symbol inserted). Similarly, the invalidation of a symbol table must be performed if an operation with the SymbolTable trait is removed (this can be performed using the invalidateSymbolTable method, introduced in https://github.com/llvm/llvm-project/pull/138014).
2025-05-23 09:21:35 +02:00
Michele Scuttari
72a8893689
Revert "[MLIR] Add bufferization state class to OneShotBufferization pass" (#141012)
Reverts llvm/llvm-project#138143

The PR for the BufferizationState is temporarily reverted due to API incompatibilities that have been initially missed during the update and were not catched by PR checks.
2025-05-22 09:25:07 +02:00
Michele Scuttari
67fc1660d9
[MLIR] Add bufferization state class to OneShotBufferization pass (#138143)
This PR is a follow-up on #138125, and adds a bufferization state class providing information about the IR. The information currently consists of a cached list of symbol tables, which aims to solve the quadratic scaling of the bufferization task with respect to the number of symbols. The PR breaks API compatibility: the `bufferize` method of the `BufferizableOpInterface` has been enriched with a reference to a `BufferizationState` object.

The bufferization state must be kept in a valid state by the interface implementations. For example, if an operation with the `Symbol` trait is inserted or replaced, its parent `SymbolTable` must be updated accordingly (see, for example, the bufferization of `arith::ConstantOp`, where the symbol table of the module gets the new global symbol inserted). Similarly, the invalidation of a symbol table must be performed if an operation with the `SymbolTable` trait is removed (this can be performed using the `invalidateSymbolTable` method, introduced in #138014).
2025-05-22 08:53:38 +02:00
Han-Chung Wang
c39915fa2e
[mlir][NFC] Simplify constant checks with isOneInteger and renamed isZeroInteger. (#139340)
The revision adds isOneInteger helper, and simplifies the existing code
with the two methods. It removes some lambda, which makes code cleaner.

For downstream users, you can update the code with the below script.

```bash
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.h
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.cpp
```

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-05-20 14:53:02 -07:00
Jeremy Kun
1bc0043467
Restore #140171 with to_memref -> to_buffer (#140355)
https://github.com/llvm/llvm-project/pull/140171 was reverted because an
op's name changed and I neglected to rebase before merging.

---------

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
2025-05-17 18:47:16 -07:00
Kazu Hirata
7b8bc1b3d1 Revert "[mlir][bufferization] implement BufferizableOpInterface for concat op (#140171)"
This reverts commit 6d9ce6767d259a5231ae312a19459f8fea3bd0ca.

Multiple builtbot failures have been reported:
https://github.com/llvm/llvm-project/pull/140171
2025-05-16 20:23:18 -07:00
Jeremy Kun
6d9ce6767d
[mlir][bufferization] implement BufferizableOpInterface for concat op (#140171)
Lowers `tensor.concat` to an alloc with a series of `memref.copy` ops to
copy the operands to the alloc.

Example:

```mlir
func.func @tensor.concat(%f: tensor<8xf32>) -> tensor<16xf32> {
  %t = tensor.concat dim(0) %f, %f : (tensor<8xf32>, tensor<8xf32>) -> tensor<16xf32>
  return %t : tensor<16xf32>
}
```

Produces

```mlir
module {
  func.func @tensor.concat(%arg0: tensor<8xf32>) -> tensor<16xf32> {
    // initialization
    %0 = bufferization.to_memref %arg0 : tensor<8xf32> to memref<8xf32>
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<8xf32>
    memref.copy %0, %alloc : memref<8xf32> to memref<8xf32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<8xf32>
    memref.copy %0, %alloc_0 : memref<8xf32> to memref<8xf32>
    %alloc_1 = memref.alloc() {alignment = 64 : i64} : memref<16xf32>

    // one copy for each operand
    %subview = memref.subview %alloc_1[0] [8] [1] : memref<16xf32> to memref<8xf32, strided<[1]>>
    memref.copy %alloc, %subview : memref<8xf32> to memref<8xf32, strided<[1]>>
    %subview_2 = memref.subview %alloc_1[8] [8] [1] : memref<16xf32> to memref<8xf32, strided<[1], offset: 8>>
    memref.copy %alloc_0, %subview_2 : memref<8xf32> to memref<8xf32, strided<[1], offset: 8>>
    %1 = bufferization.to_tensor %alloc_1 : memref<16xf32> to tensor<16xf32>
    return %1 : tensor<16xf32>
  }
}
```

This is my first time implementing BufferizableOpInterface, so I'm
looking for some advice on how I can:

1. Clean up my implementation.
2. Avoid duplicate `memref.copy` ops in the `// initialization` section
above when handling duplicate `tensor.concat` operands.

---------

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
2025-05-16 20:13:42 -07:00
Andrei Golubev
8f91b108df
[mlir][bufferization][NFC] Rename to_memref to to_buffer (#137180)
As part of the work on transitioning bufferization dialect, ops, and
associated logic to operate on newly added type interfaces (see
00eaff3e9c897c263a879416d0f151d7ca7eeaff), rename the
bufferization.to_memref to highlight the generic nature of the op.

Bufferization process produces buffers while memref is a builtin type
rather than a generic term.

Preserve the current API (to_buffer still produces a memref), however,
as the new type interfaces are not used yet.
2025-05-14 11:17:09 +02:00
lorenzo chelini
61536f2781
[mlir] Retire additional let constructor (NFC) (#139390)
Three main changes:

- The pass createRequestCWrappersPass is renamed as
createLLVMRequestCWrappersPass

- createOptimizeForTargetPass is now under the LLVM namespace. It’s
unclear why the NVVM namespace was used initially, as all passes in
LLVMIR/Transforms/Passes.h consistently reside in the LLVM namespace.

- DuplicateFunctionEliminationPass is now in the func namespace.
2025-05-13 11:15:29 +02:00
Andrzej Warzyński
c45cc3e420
[mlir][vector] Standardize base Naming Across Vector Ops (NFC) (#137859)
[mlir][vector] Standardize base Naming Across Vector Ops (NFC)

This change standardizes the naming convention for the argument
representing the value to read from or write to in Vector ops that
interface with Tensors or MemRefs. Specifically, it ensures that all
such ops use the name `base` (i.e., the base address or location to
which offsets are applied).

Updated operations:

* `vector.transfer_read`,
* `vector.transfer_write`.

For reference, these ops already use `base`:

* `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`,
  `vector.expandload`, `vector.compressstore`, `vector.maskedstore`,
  `vector.maskedload`.

This is a non-functional change (NFC) and does not alter the semantics of these
operations. However, it does require users of the XFer ops to switch from
`op.getSource()` to `op.getBase()`.

To ease the transition, this PR temporarily adds a `getSource()` interface
method for compatibility. This is intended for downstream use only and should
not be relied on upstream. The method will be removed prior to the LLVM 21
release.

Implements #131602
2025-05-12 09:44:50 +01:00
MaheshRavishankar
0f3e460e06
[mlir][Tensor] Generalize the pattern to swap tensor.collapse_shape -> tensor.expand_shape. (#133819)
The current patterns compared the reassocation indices for the two ops
and failed if neither of them were of size 1. This patch relaxes this
restriction by handling a new case where the reassociation indices might
be of the same size.

Also generalizes to cases where when generating the swapped
`tensor.expand_shape` -> `tensor.collapse_shape` if one of them is
degenerate, those are not generated.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-04-15 14:10:18 -07:00
MaheshRavishankar
a1bc979aa8
[mlir][Bufferization] Do not have read semantics for destination of tensor.parallel_insert_slice. (#134169)
`tensor.insert_slice` needs to have read semantics on its destination
operand. Since it has a return value, its semantics are

- Copy dest to result
- Copy source to subview of destination.

`tensor.parallel_insert_slice` though has no result. So it does not need
to have read semantics. The op description
[here](a3ac318e5f/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td (L1524))
also says that it is expected to lower to a `memref.subview`, that does
not have read semantics on the destination (its just a view).

This patch drops the read semantics for destination of
`tensor.parallel_insert_slice` but also makes the `shared_outs` operands
of `scf.forall` have read semantics. Earlier it would rely indirectly on
read semantics of destination operand of `tensor.parallel_insert_slice`
to propagate the read semantics for `shared_outs`. Now that is specified
more directly.

Fixes #133964

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-04-03 09:47:36 -07:00
ofri frishman
6f1347d57b
[MLIR] Bubble up tensor.extract_slice through tensor.collapse_shape (#131982)
Add a pattern that bubbles up tensor.extract_slice through
tensor.collapse_shape.
The pattern is registered in a pattern population function that is used
by the transform op
transform.apply_patterns.tensor.bubble_up_extract_slice and by the
tranform op transform.structured.fuse as a cleanup pattern.
This pattern enables tiling and fusing op chains which contain
tensor.collapse_shape if added as a cleanup pattern of tile and fuse
utility.
Without this pattern that would not be possible, as
tensor.collapse_shape does not implement the tiling interface. This is
an additional pattern to the one added in PR #126898
2025-04-02 21:06:43 +01:00
Christopher Bate
3438dfc7ff
[mlir][tensor] Fix bufferization interface for 'tensor.reshape' (#128590)
Previously, the BufferizableOpInterface implementation for
'tensor.reshape'
listed the 'shape' operand as an alias for the result tensor, causing
unnecessary conflicts with ops that "write" to the shape operand.
2025-03-12 22:19:01 -06:00
Evan Liu
634e25319e
[mlir] Add special case for 0-D tensor when fusing expand from collapse (#130838)
One fusion pattern for collapse_shape -> expand_shape was added in
a95ad2da36,
however if the intermediate tensor between a collapse and expand is a
0-D tensor, then the `reassociation_map` for these two are special cases
and can't be generally fused in this function
`BubbleUpExpandThroughParallelCollapse`.
2025-03-11 15:55:55 -07:00
ofri frishman
6e59282235
[MLIR] Add pattern to bubble up tensor.extract_slice (#126898)
Add a pattern that bubbles up tensor.extract_slice through
tensor.expand_shape, and add a transform op to tensor dialect
to directly use this pattern.
This pattern enables tiling and fusing op chains which contain
tensor.expand_shape if added as a cleanup pattern of tile and fuse
utility.
Without this pattern that would not be possible, as
tensor.expand_shape does not implement the tiling interface.
In addition, registering this pattern as a cleanup pattern for
transform.structured.fuse.
The pattern was first implement in IREE project by
Quinn Dawkins and is being upstreamed.

---------

Co-authored-by: Quinn Dawkins <quinn.dawkins@gmail.com>
2025-03-03 18:20:50 +00:00
Arnab Dutta
3cccb2017f
[MLIR][Tensor] Enhance bufferization of tensor.expand_shape op (#128871)
Instead of inferring the output shape argument of
memref.expand_shape op, use output_shape argument of tensor.expand_shape
op by adding dynamic dimension support for bufferization of
tensor.expand_shape when there are more than one dynamic dim within a
reassociation set.
2025-02-28 10:45:38 +05:30
Andrzej Warzyński
517800e37e
[mlir][tensor][linalg] Move Pack/UnPack Ops to Linalg (#123902)
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change
was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg

This change involves significant churn but only relocates existing code - no new
functionality is added.

**Note for Downstream Users**
Downstream users must update references to `PackOp` and `UnPackOp` as follows:
  * Code: `s/tensor::(Up)PackOp/linalg::(Un)PackOp/g`
  * Tests: `s/tensor.(un)pack/linalg.(un)pack/g`

No other modifications should be required.
2025-02-17 10:44:27 +00:00
Tomás Longeri
5767e4d4ca
[MLIR][NFC] Return MemRefType in memref.subview return type inference functions (#120024)
Avoids the need for cast, and matches the extra build functions, which
take a `MemRefType`
2025-02-14 12:58:20 +00:00
Matthias Springer
6aaa8f25b6
[mlir][IR][NFC] Move free-standing functions to MemRefType (#123465)
Turn free-standing `MemRefType`-related helper functions in
`BuiltinTypes.h` into member functions.
2025-01-21 08:48:09 +01:00
Kazu Hirata
fecf1397e3
[Tensor] Migrate away from PointerUnion::{is,get} (NFC) (#120679)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
2024-12-20 10:41:54 -08:00
Jacques Pienaar
09dfc5713d
[mlir] Enable decoupling two kinds of greedy behavior. (#104649)
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.

These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.

Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.

For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
2024-12-20 08:15:48 -08:00
Christopher Bate
ced2fc7819
[mlir][bufferization] Fix OneShotBufferize when defaultMemorySpaceFn is used (#91524)
As described in issue llvm/llvm-project#91518, a previous PR
llvm/llvm-project#78484 introduced the `defaultMemorySpaceFn` into
bufferization options, allowing one to inform OneShotBufferize that it
should use a specified function to derive the memory space attribute
from the encoding attribute attached to tensor types.

However, introducing this feature exposed unhandled edge cases,
examples of which are introduced by this change in the new test under

`test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`.

Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is
pretty simple. This change:

- Updates the `bufferization.to_memref` and `bufferization.to_tensor`
  operations to explicitly include operand and destination types,
  whereas previously they relied on type inference to deduce the
  tensor types. Since the type inference cannot recover the correct
  tensor encoding/memory space, the operand and result types must be
  explicitly included. This is a small assembly format change, but it
  touches a large number of test files.

- Makes minor updates to other bufferization functions to handle the
  changes in building the above ops.

- Updates bufferization of `tensor.from_elements` to handle memory
  space.


Integration/upgrade guide:

In downstream projects, if you have tests or MLIR files that explicitly
use
`bufferization.to_tensor` or `bufferization.to_memref`, then update
them to the new assembly format as follows:

```
%1 = bufferization.to_memref %0 : memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32>
```

becomes

```
%1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32>
%2 = bufferization.to_tensor %0 : memref<10xf32> to tensor<10xf32> 
```
2024-11-26 09:45:57 -07:00
MaheshRavishankar
de6d48d05d
[mlir][Tensor] Move concat operation decomposition as a method of the concat operation. (#116004)
Currently the implementation is within a pattern that cannot be used
without a pattern rewriter. Move the decomposition as a method of the
operation to make it usable outside of pattern rewrites.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2024-11-13 13:29:04 -08:00
Max191
98e838a890
[mlir] Do not bufferize parallel_insert_slice dest to read for full slices (#112761)
In the insert_slice bufferization interface implementation, the
destination tensor is not considered read if the full tensor is
overwritten by the slice. This PR adds the same check for
tensor.parallel_insert_slice.

Adds two new StaticValueUtils:
- `isAllConstantIntValue` checks if an array of `OpFoldResult` are all
equal to a passed `int64_t` value.
- `areConstantIntValues` checks if an array of `OpFoldResult` are all
equal to a passed array of `int64_t` values.

fixes https://github.com/llvm/llvm-project/issues/112435

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
2024-10-18 16:02:03 -04:00
BARRET
1666d13078
[CMake]: Remove unnecessary dependencies on LLVM/MLIR (#111255)
Previous https://github.com/llvm/llvm-project/pull/110362 (reverted)
caused breakage. Here is the PR with fix.

My build cmdline:

```
cmake ../llvm \
    -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=install \
    -DCMAKE_C_COMPILER=gcc-9 \
    -DCMAKE_CXX_COMPILER=g++-9 \
    -DCMAKE_CUDA_COMPILER=$(which nvcc) \
    -DLLVM_ENABLE_LLD=OFF \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_BUILD_EXAMPLES=ON \
    -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
    -DLLVM_CCACHE_BUILD=ON \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DBUILD_SHARED_LIBS=ON \
    -DLLVM_ENABLE_PROJECTS='llvm;mlir'
```
2024-10-07 15:52:43 +02:00
Rajveer Singh Bharadwaj
760ffa4736
[mlir][tensor] Apply InsertSliceOfTransferWriteOpFolder only when transfer_write overwrites all elements of insert_slice (#108803)
Resolves #101708

The updated logic now correctly checks if `transfer_write` completely
overwrites `insert_slice` and only then applies the rewrite for this
pattern.

This check currently covers static sizes, for dynamic sizes
value bounds analysis is needed (see `TODO:`).
2024-10-01 14:29:37 -07:00
Mehdi Amini
8b47711e84
Revert "CMake: Remove unnecessary dependencies on LLVM/MLIR" (#110594)
Reverts llvm/llvm-project#110362

Multiple bots are broken.
2024-10-01 00:44:21 +02:00
BARRET
4980f2177e
CMake: Remove unnecessary dependencies on LLVM/MLIR (#110362)
There are some spurious libraries which can be removed.

I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We're utilizing cmake function to recursively collect
MLIR/LLVM related dependencies. However, we identified certain library
dependencies as redundant and safe for removal.
2024-09-30 23:57:13 +02:00