llvm-project

Author	SHA1	Message	Date
Longsheng Mou	1723a5137c	[mlir][tensor] Drop unused AffineExpr variable (NFC) (#168651)	2025-11-19 22:33:40 +08:00
Hanumanth	81964597f9	[mlir][tensor] Fix runtime verification for tensor.extract_slice for empty tensor slices (#166569) I hit another runtime verification issue (similar to https://github.com/llvm/llvm-project/pull/164878) while working with TFLite models. The verifier is incorrectly rejecting `tensor.extract_slice` operations when extracting an empty slice (size=0) that starts exactly at the tensor boundary. The current runtime verification unconditionally enforces `offset < dim_size`. This makes sense for non-empty slices, but it's too strict for empty slices, causing false positives that lead to spurious runtime assertions. A simple example that demonstrates the issue: ```mlir func.func @extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) { // When called with: tensor size=10, offset=10, size=0 // Runtime verification fails: "offset 0 is out-of-bounds" %slice = tensor.extract_slice %tensor[%offset] [%size] [1] : tensor<?xf32> to tensor<?xf32> return } ``` For the above example, the check evaluates `10 < 10`, which is false, so verification fails. However, I believe this operation should be valid: we're extracting zero elements, so there's no actual out-of-bounds access. Real-world repro from the TensorFlow Lite models: This issue manifests while lowering TFLite models, and a lot of our system tests are failing because of it. Here's a simplified version showing the problematic pattern: In this code, `%extracted_slice_0` becomes an empty tensor when SSA value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`. The operation extracts zero elements along dimension 0, which is semantically valid but fails runtime verification. 
```mlir func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> { %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %c2 = arith.constant 2 : index %c10 = arith.constant 10 : index %c-1 = arith.constant -1 : index %0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32> %1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32> %2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32> %3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32> %4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32> %5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32> %c4_1 = tosa.const_shape {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1> %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) : (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) { %7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1> %extracted = tensor.extract %7[] : tensor<i1> scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32> } do { ^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>): %7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32> // First slice %8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32> %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32> %extracted_0 = tensor.extract %9[%c0] : tensor<3xi32> %10 = index.casts %extracted_0 : i32 to index %11 = arith.cmpi eq, %10, %c-1 : index %12 = arith.select %11, %c10, %10 : index %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x4x1xf32> // Second slice - this is where the failure occurs %13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32> %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32> 
%extracted_1 = tensor.extract %14[%c0] : tensor<3xi32> %15 = index.castu %extracted_1 : i32 to index %16 = arith.subi %c10, %15 : index // size = 10 - offset %extracted_2 = tensor.extract %14[%c1] : tensor<3xi32> %17 = index.castu %extracted_2 : i32 to index %extracted_3 = tensor.extract %14[%c2] : tensor<3xi32> %18 = index.castu %extracted_3 : i32 to index // On the last loop iteration: %15=10, %16=0 // %extracted_slice_0 becomes an empty tensor // Runtime verification fails: "offset 0 is out-of-bounds" %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x4x1xf32> %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} : (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32> scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32> } return %6#1 : tensor<10x4x1xf32> } ``` The fix: Make the offset check conditional on slice size: - Empty slice (size == 0): allow `0 <= offset <= dim_size` - Non-empty slice (size > 0): require `0 <= offset < dim_size` Question for reviewers: Should we also relax the static verifier to allow this edge case? Currently, the static verifier rejects the following IR: ```mlir %tensor = arith.constant dense<1.0> : tensor<10xf32> %slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32> ``` Since we're allowing it at runtime for dynamic shapes, it seems inconsistent to reject it statically. However, I wanted to get feedback before making that change - this PR focuses only on the runtime verification fix for dynamic shapes. P.S. We have a similar issue with `memref.subview`. I will send a separate patch for the issue. Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>	2025-11-12 08:37:15 +09:00
Jakub Kuderski	ba0be89cd2	[mlir] Simplify Default cases in type switches. NFC. (#165767) Use default values instead of lambdas when possible. `std::nullopt` and `nullptr` can be used now because of https://github.com/llvm/llvm-project/pull/165724.	2025-10-30 15:10:59 -04:00
srcarroll	f5e175f06d	[mlir][linalg] Genericize MapOp (#162742) This PR modifies the definition of `linalg::MapOp` so that it has the same structure as `linalg::GenericOp` and all other linalg ops. Mainly, it adds an `out` bbarg for the body of the op. Although the `out` arg is never used in the body, there doesn't seem to be much benefit in specializing the op to exclude it. In fact, it only makes things more complicated because it doesn't align with the `GenericOp` structure. For example, `linalg-generalize-named-ops` avoided converting `linalg.map` purely because it didn't have the structure to do so. Moreover, although some fusion patterns are applied explicitly to `GenericOp`, we can change them to be applied to the base `LinalgOp`, which will enable fusion for any fusion-compatible linalg op, but that requires the op to have a generic structure. So these changes will enable us to use existing generic transformation patterns on `MapOp` that weren't possible before. They can either be applied to `MapOp` directly or applied after converting to `GenericOp`.	2025-10-30 10:20:19 -05:00
Mehdi Amini	4e44298f96	[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in SwapExtractSliceWithProducerPatterns.cpp (NFC)	2025-10-28 17:36:35 -07:00
Hanumanth	a6788b5246	[mlir][tensor] Fix runtime verification for `tensor.extract_slice` when size dimension value is 0 (#164878) Previously, the runtime verification pass would insert assertion statements with conditions that always evaluate to false for semantically valid `tensor.extract_slice` operations where one of the dimensions had a size of 0. The `tensor.extract_slice` runtime verification logic was unconditionally generating checks for the position of the last element (`offset + (size - 1) * stride`). When `size` is 0, this causes the assertion condition to always be false, leading to runtime failures even though the operation is semantically valid. This patch fixes the issue by making the `lastPos` check conditional. The offset is always verified, but the endpoint check is only performed when `size > 0` to avoid generating spurious assert statements. This issue was discovered through a LiteRT model, where a dynamic shape calculation resulted in a zero-sized dimension being passed to `tensor.extract_slice`. The following is a simplified IR snippet from the model. After running the runtime verification pass, an assertion that always fails is generated because the SSA value `%3` becomes 0. 
```mlir func.func @simple_repro_from_liteRT_model(%arg0: tensor<10x4x1xf32>) -> tensor<?x?x?xf32> { %cst = arith.constant dense<0> : tensor<1xi32> %cst_0 = arith.constant dense<-1> : tensor<2xi32> %c-1 = arith.constant -1 : index %c0 = arith.constant 0 : index %c10 = arith.constant 10 : index %c1 = arith.constant 1 : index %c4 = arith.constant 4 : index %c2 = arith.constant 2 : index %0 = tensor.empty() : tensor<3xi32> %inserted_slice = tensor.insert_slice %cst into %0[0] [1] [1] : tensor<1xi32> into tensor<3xi32> %inserted_slice_1 = tensor.insert_slice %cst_0 into %inserted_slice[1] [2] [1] : tensor<2xi32> into tensor<3xi32> %extracted = tensor.extract %inserted_slice_1[%c0] : tensor<3xi32> %1 = index.casts %extracted : i32 to index %2 = arith.cmpi eq, %1, %c-1 : index %3 = arith.select %2, %c10, %1 : index %extracted_2 = tensor.extract %inserted_slice_1[%c1] : tensor<3xi32> %4 = index.casts %extracted_2 : i32 to index %5 = arith.cmpi eq, %4, %c-1 : index %6 = arith.select %5, %c4, %4 : index %extracted_3 = tensor.extract %inserted_slice_1[%c2] : tensor<3xi32> %7 = index.casts %extracted_3 : i32 to index %8 = arith.cmpi eq, %7, %c-1 : index %9 = arith.select %8, %c1, %7 : index %extracted_slice = tensor.extract_slice %arg0[0, 0, 0] [%3, %6, %9] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32> return %extracted_slice : tensor<?x?x?xf32> } ``` The issue can be reproduced more simply with the following test case, where `dim_0` is `0`. When the runtime verification pass is applied to this code with `dim_0 = 0`, it generates an assertion that will always fail at runtime. 
```mlir func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>, %dim_0: index, %dim_1: index, %dim_2: index) { %slice = tensor.extract_slice %arg0[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32> return } func.func @test_zero_size_extraction() { %input = arith.constant dense<1.0> : tensor<10x4x1xf32> // Define slice dimensions: 0x4x1 (zero-size in first dimension) %dim_0 = arith.constant 0 : index %dim_1 = arith.constant 4 : index %dim_2 = arith.constant 1 : index func.call @extract_slice_zero_size_dim(%input, %dim_0, %dim_1, %dim_2) : (tensor<10x4x1xf32>, index, index, index) -> () return } ``` P.S. We probably have a similar issue with `memref.subview`. I will check this and send a separate PR for the issue. --------- Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>	2025-10-27 11:43:18 -07:00
Hanchenng Wu	a6d1a52b8d	[MLIR] Reuse AsmState to enable fast generate-runtime-verification pass; add location-only pass option (#160331) The pass generate-runtime-verification generates additional runtime op verification checks. Currently, the pass is extremely expensive. For example, with a mobilenet v2 ssd network (converted to mlir), running this pass alone in debug mode takes 30 minutes. The same has been observed for other networks as small as 5 Mb. The culprit is the line "op->print(stream, flags);" in the function "RuntimeVerifiableOpInterface::generateErrorMessage" in the file mlir/lib/Interfaces/RuntimeVerifiableOpInterface.cpp. Because we print the op with all the names of its operands in the middle end, we construct a new SSANameState for each op->print(...) call. Thus, we perform a new SSA analysis for each error message printed. Perf profiling shows that 98% of the time is spent in the constructor of SSANameState. This change refactors the message generator. We use a top-level AsmState and reuse it in all the op->print(stream, asmState) calls. With a release build, this change reduces the pass execution time from ~160 seconds to 0.3 seconds on my machine. This change also adds verbose options to the generate-runtime-verification pass. verbose 0: print only the source location with the error message. verbose 1: print the full op, including the names of the operands.	2025-10-08 11:48:34 +01:00
Alan Li	b87f1b22a8	[MLIR] Add `InParallelOpInterface` for parallel combining operations (#157736) This commit: - Introduces a new `InParallelOpInterface`, which, along with the `ParallelCombiningOpInterface`, represents the parallel updating operations we have in a parallel loop of `scf.forall`. - Changes the name of `ParallelCombiningOpInterface` to `InParallelOpInterface`, as the naming was quite confusing. - `ParallelCombiningOpInterface` is now used to generalize operations that insert into shared tensors within parallel combining regions. Previously, only `tensor.parallel_insert_slice` was supported directly in `scf.InParallelOp` regions. - `tensor.parallel_insert_slice` now implements `ParallelCombiningOpInterface`. This change enables future extensions to support additional parallel combining operations beyond `tensor.parallel_insert_slice`, which have different update semantics, so the `in_parallel` region can correctly and safely represent these kinds of operations without potential mistakes such as races. Author credits: @qedawkins	2025-09-12 14:23:00 -07:00
Han-Chung Wang	8eba28bc8c	[mlir][NFC] Correct pattern names to match the behaviors. (#158177) It is a follow-up for https://github.com/llvm/llvm-project/pull/131982#discussion_r2286014576 and https://github.com/llvm/llvm-project/pull/126898#discussion_r2286013250. The names do not match the behaviors, and the revision updates the names. Signed-off-by: hanhanW <hanhan0912@gmail.com>	2025-09-12 10:57:20 -07:00
Ian Wood	961b052e98	[mlir][tensor][NFC] Refactor common methods for bubbling extract_slice op (#153675) Exposes the `tensor.extract_slice` reshaping logic in `BubbleUpExpandShapeThroughExtractSlice` and `BubbleUpCollapseShapeThroughExtractSlice` through two corresponding utility functions. These compute the offsets/sizes/strides of an extract slice after either collapsing or expanding. This should also make it easier to implement the two other bubbling cases: (1) the `collapse_shape` is a consumer or (2) the `expand_shape` is a consumer. --------- Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>	2025-08-19 19:31:30 +00:00
Maksim Levental	c090ed53fb	[mlir][NFC] update `mlir/Dialect` create APIs (33/n) (#150659) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-25 16:13:55 -04:00
Kazu Hirata	0925d7572a	[mlir] Remove unused includes (NFC) (#150266) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-23 15:18:53 -07:00
Maksim Levental	8fff238b2c	[mlir][NFC] update `mlir/Dialect` create APIs (23/n) (#149930) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-23 10:16:52 -04:00
Kazu Hirata	5e0de68626	[mlir] Remove unused includes (NFC) (#148119) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-11 11:59:26 -07:00
MaheshRavishankar	c22352175e	[mlir][TilingInterface] Allow tile and fuse to work with `ReductionTilingStrategy::PartialReductionOuterParallelStrategy`. (#147593) Since `scf::tileUsingSCF` is the core method used for tiling the root operation within `scf::tileConsumersAndFuseProducersUsingSCF`, the latter can fuse into any tiled loop generated using `scf::tileUsingSCF`. This patch adds a test for tiling a root operation using `ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and fusing producers with it. Since this strategy generates a rank-reducing extract slice, `tensor::replaceExtractSliceWithTiledProducer`, which is the core method used for the fusion, was extended to handle rank-reducing slices. Also fixes a small bug in the computation of the reduction induction variable (which needs to use `floorDiv` instead of `ceilDiv`). Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-07-09 08:50:01 -07:00
Andrei Golubev	a63f572628	[mlir][bufferization] Return BufferLikeType in BufferizableOpInterface (#144867) Support custom types (2/N): allow value-owning operations (e.g. allocation ops) to bufferize custom tensors into custom buffers. This requires BufferizableOpInterface::getBufferType() to return BufferLikeType instead of BaseMemRefType. Affected implementors of the interface are updated accordingly. Relates to ee070d08163ac09842d9bf0c1315f311df39faf1.	2025-07-02 11:27:35 -07:00
MaheshRavishankar	c873e5f87d	[mlir][TilingInterface] Handle multi operand consumer fusion. (#145193) For consumer fusion cases of this form ``` %0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg1 = ...) { tensor.parallel_insert_slice ... into %arg0 tensor.parallel_insert_slice ... into %arg1 } %1 = linalg.generic ... ins(%0#0, %0#1) ``` the current consumer fusion, which handles one slice at a time, cannot fuse the consumer into the loop, since fusing along one slice will create an SSA violation on the other use from the `scf.forall`. The solution is to allow consumer fusion to consider multiple slices at once. This PR changes the `TilingInterface` methods related to consumer fusion, i.e. - `getTiledImplementationFromOperandTile` - `getIterationDomainFromOperandTile` to allow fusion while considering multiple operands. It is up to the `TilingInterface` implementation to return an error if a list of tiles of the operands cannot result in a consistent implementation of the tiled operation. The Linalg operation implementation of `TilingInterface` has been modified to account for these changes and to handle cases where the operand tiles can result in a consistent tiling implementation. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-06-25 11:54:38 -07:00
Andrei Golubev	ee070d0816	[mlir][bufferization] Support custom types (1/N) (#142986) Following the addition of TensorLike and BufferLike type interfaces (see 00eaff3e9c897c263a879416d0f151d7ca7eeaff), introduce minimal changes required to bufferize a custom tensor operation into a custom buffer operation. To achieve this, new interface methods are added to TensorLike type interface that abstract away the differences between existing (tensor -> memref) and custom conversions. The scope of the changes is intentionally limited (for example, BufferizableOpInterface is untouched) in order to first understand the basics and reach consensus design-wise. --- Notable changes: * mlir::bufferization::getBufferType() returns BufferLikeType (instead of BaseMemRefType) * ToTensorOp / ToBufferOp operate on TensorLikeType / BufferLikeType. Operation argument "memref" renamed to "buffer" * ToTensorOp's tensor type inferring builder is dropped (users now need to provide the tensor type explicitly)	2025-06-18 16:18:12 +02:00
Kazu Hirata	85480a4d37	[mlir] Directly call ShapedType::isDynamic without lambdas (NFC) (#142994) We do not need lambdas in these places.	2025-06-05 16:14:27 -07:00
Matthias Springer	e4c8ff94e7	[mlir][tensor] Add runtime verification for `cast`/`dim`/`extract`/`insert`/`extract_slice` (#141332) Add `RuntimeVerifiableOpInterface` implementations for the following ops. These were mostly copied from the respective memref implementations. Only the part that deals with offsets and strides was removed. * `tensor.cast`: `memref.cast` * `tensor.dim`: `memref.dim` * `tensor.extract`: `memref.load` * `tensor.insert`: `memref.store` * `tensor.extract_slice`: `memref.subview`	2025-06-05 12:06:47 +09:00
Michele Scuttari	63cb6af782	[MLIR] Add bufferization state to `getBufferType` and `resolveConflicts` interface methods (#141466) The PR continues the work started in #141019 by adding the `BufferizationState` class also to the `getBufferType` and `resolveConflicts` interface methods, together with the additional support functions that are used throughout the bufferization infrastructure.	2025-05-28 10:35:23 +02:00
Michele Scuttari	61d5fdf50c	[MLIR] Add bufferization state class to OneShotBufferization pass (#141019) Follow-up on #138143, which was reverted due to a missing update of a method signature (more specifically, the bufferization interface for `tensor::ConcatOp`) that was not caught before merging. The original PR description follows. This PR is a follow-up on https://github.com/llvm/llvm-project/pull/138125, and adds a bufferization state class providing information about the IR. The information currently consists of a cached list of symbol tables, which aims to solve the quadratic scaling of the bufferization task with respect to the number of symbols. The PR breaks API compatibility: the bufferize method of the BufferizableOpInterface has been enriched with a reference to a BufferizationState object. The bufferization state must be kept in a valid state by the interface implementations. For example, if an operation with the Symbol trait is inserted or replaced, its parent SymbolTable must be updated accordingly (see, for example, the bufferization of arith::ConstantOp, where the symbol table of the module gets the new global symbol inserted). Similarly, the invalidation of a symbol table must be performed if an operation with the SymbolTable trait is removed (this can be performed using the invalidateSymbolTable method, introduced in https://github.com/llvm/llvm-project/pull/138014).	2025-05-23 09:21:35 +02:00
Michele Scuttari	72a8893689	Revert "[MLIR] Add bufferization state class to OneShotBufferization pass" (#141012) Reverts llvm/llvm-project#138143 The PR for the BufferizationState is temporarily reverted due to API incompatibilities that were initially missed during the update and were not caught by PR checks.	2025-05-22 09:25:07 +02:00
Michele Scuttari	67fc1660d9	[MLIR] Add bufferization state class to OneShotBufferization pass (#138143) This PR is a follow-up on #138125, and adds a bufferization state class providing information about the IR. The information currently consists of a cached list of symbol tables, which aims to solve the quadratic scaling of the bufferization task with respect to the number of symbols. The PR breaks API compatibility: the `bufferize` method of the `BufferizableOpInterface` has been enriched with a reference to a `BufferizationState` object. The bufferization state must be kept in a valid state by the interface implementations. For example, if an operation with the `Symbol` trait is inserted or replaced, its parent `SymbolTable` must be updated accordingly (see, for example, the bufferization of `arith::ConstantOp`, where the symbol table of the module gets the new global symbol inserted). Similarly, the invalidation of a symbol table must be performed if an operation with the `SymbolTable` trait is removed (this can be performed using the `invalidateSymbolTable` method, introduced in #138014).	2025-05-22 08:53:38 +02:00
Han-Chung Wang	c39915fa2e	[mlir][NFC] Simplify constant checks with isOneInteger and renamed isZeroInteger. (#139340) The revision adds the isOneInteger helper and simplifies the existing code with the two methods. It removes some lambdas, which makes the code cleaner. For downstream users, you can update the code with the below script. ```bash sed -i "s/isZeroIndex/isZeroInteger/g" */.h sed -i "s/isZeroIndex/isZeroInteger/g" */.cpp ``` --------- Signed-off-by: hanhanW <hanhan0912@gmail.com>	2025-05-20 14:53:02 -07:00
Jeremy Kun	1bc0043467	Restore #140171 with to_memref -> to_buffer (#140355) https://github.com/llvm/llvm-project/pull/140171 was reverted because an op's name changed and I neglected to rebase before merging. --------- Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>	2025-05-17 18:47:16 -07:00
Kazu Hirata	7b8bc1b3d1	Revert "[mlir][bufferization] implement BufferizableOpInterface for concat op (#140171)" This reverts commit 6d9ce6767d259a5231ae312a19459f8fea3bd0ca. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/140171	2025-05-16 20:23:18 -07:00
Jeremy Kun	6d9ce6767d	[mlir][bufferization] implement BufferizableOpInterface for concat op (#140171) Lowers `tensor.concat` to an alloc with a series of `memref.copy` ops to copy the operands to the alloc. Example: ```mlir func.func @tensor.concat(%f: tensor<8xf32>) -> tensor<16xf32> { %t = tensor.concat dim(0) %f, %f : (tensor<8xf32>, tensor<8xf32>) -> tensor<16xf32> return %t : tensor<16xf32> } ``` Produces ```mlir module { func.func @tensor.concat(%arg0: tensor<8xf32>) -> tensor<16xf32> { // initialization %0 = bufferization.to_memref %arg0 : tensor<8xf32> to memref<8xf32> %alloc = memref.alloc() {alignment = 64 : i64} : memref<8xf32> memref.copy %0, %alloc : memref<8xf32> to memref<8xf32> %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<8xf32> memref.copy %0, %alloc_0 : memref<8xf32> to memref<8xf32> %alloc_1 = memref.alloc() {alignment = 64 : i64} : memref<16xf32> // one copy for each operand %subview = memref.subview %alloc_1[0] [8] [1] : memref<16xf32> to memref<8xf32, strided<[1]>> memref.copy %alloc, %subview : memref<8xf32> to memref<8xf32, strided<[1]>> %subview_2 = memref.subview %alloc_1[8] [8] [1] : memref<16xf32> to memref<8xf32, strided<[1], offset: 8>> memref.copy %alloc_0, %subview_2 : memref<8xf32> to memref<8xf32, strided<[1], offset: 8>> %1 = bufferization.to_tensor %alloc_1 : memref<16xf32> to tensor<16xf32> return %1 : tensor<16xf32> } } ``` This is my first time implementing BufferizableOpInterface, so I'm looking for some advice on how I can: 1. Clean up my implementation. 2. Avoid duplicate `memref.copy` ops in the `// initialization` section above when handling duplicate `tensor.concat` operands. --------- Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>	2025-05-16 20:13:42 -07:00
Andrei Golubev	8f91b108df	[mlir][bufferization][NFC] Rename to_memref to to_buffer (#137180) As part of the work on transitioning the bufferization dialect, ops, and associated logic to operate on newly added type interfaces (see 00eaff3e9c897c263a879416d0f151d7ca7eeaff), rename bufferization.to_memref to highlight the generic nature of the op. The bufferization process produces buffers, while memref is a builtin type rather than a generic term. However, preserve the current API (to_buffer still produces a memref), as the new type interfaces are not used yet.	2025-05-14 11:17:09 +02:00
lorenzo chelini	61536f2781	[mlir] Retire additional `let constructor` (NFC) (#139390) Three main changes: - The pass createRequestCWrappersPass is renamed as createLLVMRequestCWrappersPass - createOptimizeForTargetPass is now under the LLVM namespace. It’s unclear why the NVVM namespace was used initially, as all passes in LLVMIR/Transforms/Passes.h consistently reside in the LLVM namespace. - DuplicateFunctionEliminationPass is now in the func namespace.	2025-05-13 11:15:29 +02:00
Andrzej Warzyński	c45cc3e420	[mlir][vector] Standardize `base` Naming Across Vector Ops (NFC) (#137859) [mlir][vector] Standardize base Naming Across Vector Ops (NFC) This change standardizes the naming convention for the argument representing the value to read from or write to in Vector ops that interface with Tensors or MemRefs. Specifically, it ensures that all such ops use the name `base` (i.e., the base address or location to which offsets are applied). Updated operations: * `vector.transfer_read`, * `vector.transfer_write`. For reference, these ops already use `base`: * `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`, `vector.expandload`, `vector.compressstore`, `vector.maskedstore`, `vector.maskedload`. This is a non-functional change (NFC) and does not alter the semantics of these operations. However, it does require users of the XFer ops to switch from `op.getSource()` to `op.getBase()`. To ease the transition, this PR temporarily adds a `getSource()` interface method for compatibility. This is intended for downstream use only and should not be relied on upstream. The method will be removed prior to the LLVM 21 release. Implements #131602	2025-05-12 09:44:50 +01:00
MaheshRavishankar	0f3e460e06	[mlir][Tensor] Generalize the pattern to swap `tensor.collapse_shape` -> `tensor.expand_shape`. (#133819) The current patterns compared the reassociation indices for the two ops and failed if neither of them was of size 1. This patch relaxes this restriction by handling a new case where the reassociation indices might be of the same size. It also generalizes to cases where, when generating the swapped `tensor.expand_shape` -> `tensor.collapse_shape`, one of them would be degenerate; such degenerate ops are not generated. Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-04-15 14:10:18 -07:00
MaheshRavishankar	a1bc979aa8	[mlir][Bufferization] Do not have read semantics for destination of `tensor.parallel_insert_slice`. (#134169) `tensor.insert_slice` needs to have read semantics on its destination operand. Since it has a return value, its semantics are - Copy dest to result - Copy source to subview of destination. `tensor.parallel_insert_slice`, though, has no result, so it does not need to have read semantics. The op description [here](`a3ac318e5f/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td (L1524)`) also says that it is expected to lower to a `memref.subview`, which does not have read semantics on the destination (it's just a view). This patch drops the read semantics for the destination of `tensor.parallel_insert_slice` but also makes the `shared_outs` operands of `scf.forall` have read semantics. Previously, it relied indirectly on the read semantics of the destination operand of `tensor.parallel_insert_slice` to propagate the read semantics for `shared_outs`. Now that is specified more directly. Fixes #133964 --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-04-03 09:47:36 -07:00
ofri frishman	6f1347d57b	[MLIR] Bubble up tensor.extract_slice through tensor.collapse_shape (#131982) Add a pattern that bubbles up tensor.extract_slice through tensor.collapse_shape. The pattern is registered in a pattern population function that is used by the transform op transform.apply_patterns.tensor.bubble_up_extract_slice and by the transform op transform.structured.fuse as a cleanup pattern. This pattern enables tiling and fusing op chains which contain tensor.collapse_shape if added as a cleanup pattern of the tile and fuse utility. Without this pattern that would not be possible, as tensor.collapse_shape does not implement the tiling interface. This is an additional pattern to the one added in PR #126898	2025-04-02 21:06:43 +01:00
Christopher Bate	3438dfc7ff	[mlir][tensor] Fix bufferization interface for 'tensor.reshape' (#128590) Previously, the BufferizableOpInterface implementation for 'tensor.reshape' listed the 'shape' operand as an alias for the result tensor, causing unnecessary conflicts with ops that "write" to the shape operand.	2025-03-12 22:19:01 -06:00
Evan Liu	634e25319e	[mlir] Add special case for 0-D tensor when fusing expand from collapse (#130838) One fusion pattern for collapse_shape -> expand_shape was added in `a95ad2da36`. However, if the intermediate tensor between a collapse and an expand is a 0-D tensor, then the `reassociation_map`s for these two ops are special cases and can't be fused generically by `BubbleUpExpandThroughParallelCollapse`.	2025-03-11 15:55:55 -07:00
ofri frishman	6e59282235	[MLIR] Add pattern to bubble up tensor.extract_slice (#126898) Add a pattern that bubbles up tensor.extract_slice through tensor.expand_shape, and add a transform op to the tensor dialect to directly use this pattern. This pattern enables tiling and fusing op chains which contain tensor.expand_shape if added as a cleanup pattern of the tile and fuse utility. Without this pattern that would not be possible, as tensor.expand_shape does not implement the tiling interface. In addition, this pattern is registered as a cleanup pattern for transform.structured.fuse. The pattern was first implemented in the IREE project by Quinn Dawkins and is being upstreamed. --------- Co-authored-by: Quinn Dawkins <quinn.dawkins@gmail.com>	2025-03-03 18:20:50 +00:00
Arnab Dutta	3cccb2017f	[MLIR][Tensor] Enhance bufferization of tensor.expand_shape op (#128871) Instead of inferring the output shape argument of memref.expand_shape op, use output_shape argument of tensor.expand_shape op by adding dynamic dimension support for bufferization of tensor.expand_shape when there are more than one dynamic dim within a reassociation set.	2025-02-28 10:45:38 +05:30
Andrzej Warzyński	517800e37e	[mlir][tensor][linalg] Move Pack/UnPack Ops to Linalg (#123902 ) Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change was discussed in the following RFC: * https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg This change involves significant churn but only relocates existing code - no new functionality is added. Note for Downstream Users Downstream users must update references to `PackOp` and `UnPackOp` as follows: * Code: `s/tensor::(Un)PackOp/linalg::(Un)PackOp/g` * Tests: `s/tensor.(un)pack/linalg.(un)pack/g` No other modifications should be required.	2025-02-17 10:44:27 +00:00
Tomás Longeri	5767e4d4ca	[MLIR][NFC] Return MemRefType in memref.subview return type inference functions (#120024 ) Avoids the need for cast, and matches the extra build functions, which take a `MemRefType`	2025-02-14 12:58:20 +00:00
Matthias Springer	6aaa8f25b6	[mlir][IR][NFC] Move free-standing functions to `MemRefType` (#123465 ) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.	2025-01-21 08:48:09 +01:00
Kazu Hirata	fecf1397e3	[Tensor] Migrate away from PointerUnion::{is,get} (NFC) (#120679 ) Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> I'm not touching PointerUnion::dyn_cast for now because it's a bit complicated; we could blindly migrate it to dyn_cast_if_present, but we should probably use dyn_cast when the operand is known to be non-null.	2024-12-20 10:41:54 -08:00
Jacques Pienaar	09dfc5713d	[mlir] Enable decoupling two kinds of greedy behavior. (#104649 ) The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and leads to inefficiency. E.g., cases where one need to create different phases in lowering and is required to apply patterns in specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a common requested feature that folks keep on working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated should just be a find and replace (e.g., `find ./ -type f -exec sed -i 's\|applyPatternsAndFoldGreedily\|applyPatternsGreedily\|g' {} \;` variety) as the API arguments haven't changed between the two.	2024-12-20 08:15:48 -08:00
Christopher Bate	ced2fc7819	[mlir][bufferization] Fix OneShotBufferize when `defaultMemorySpaceFn` is used (#91524 ) As described in issue llvm/llvm-project#91518, a previous PR llvm/llvm-project#78484 introduced the `defaultMemorySpaceFn` into bufferization options, allowing one to inform OneShotBufferize that it should use a specified function to derive the memory space attribute from the encoding attribute attached to tensor types. However, introducing this feature exposed unhandled edge cases, examples of which are introduced by this change in the new test under `test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`. Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is pretty simple. This change: - Updates the `bufferization.to_memref` and `bufferization.to_tensor` operations to explicitly include operand and destination types, whereas previously they relied on type inference to deduce the tensor types. Since the type inference cannot recover the correct tensor encoding/memory space, the operand and result types must be explicitly included. This is a small assembly format change, but it touches a large number of test files. - Makes minor updates to other bufferization functions to handle the changes in building the above ops. - Updates bufferization of `tensor.from_elements` to handle memory space. Integration/upgrade guide: In downstream projects, if you have tests or MLIR files that explicitly use `bufferization.to_tensor` or `bufferization.to_memref`, then update them to the new assembly format as follows: ``` %1 = bufferization.to_memref %0 : memref<10xf32> %2 = bufferization.to_tensor %1 : memref<10xf32> ``` becomes ``` %1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32> %2 = bufferization.to_tensor %1 : memref<10xf32> to tensor<10xf32> ```	2024-11-26 09:45:57 -07:00
MaheshRavishankar	de6d48d05d	[mlir][Tensor] Move concat operation decomposition as a method of the concat operation. (#116004 ) Currently the implementation is within a pattern that cannot be used without a pattern rewriter. Move the decomposition as a method of the operation to make it usable outside of pattern rewrites. Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2024-11-13 13:29:04 -08:00
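Decomposing `tensor.concat` roughly amounts to creating an empty result tensor and inserting each input at a running offset along the concatenation dimension. The sketch below only models that offset bookkeeping in standalone C++; `planConcat` and `ConcatPlan` are hypothetical names, shapes are assumed fully static, and it does not reproduce the method added upstream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical plan for decomposing a concat: the shape of the empty result
// tensor and the offset along `dim` at which each input would be inserted.
struct ConcatPlan {
  Shape resultShape;
  std::vector<int64_t> offsets;
};

ConcatPlan planConcat(const std::vector<Shape>& inputs, std::size_t dim) {
  if (inputs.empty() || dim >= inputs.front().size())
    throw std::invalid_argument("need at least one input and a valid dim");
  ConcatPlan plan;
  plan.resultShape = inputs.front();
  plan.resultShape[dim] = 0;
  for (const Shape& s : inputs) {
    if (s.size() != plan.resultShape.size())
      throw std::invalid_argument("rank mismatch");
    for (std::size_t d = 0; d < s.size(); ++d)
      if (d != dim && s[d] != plan.resultShape[d])
        throw std::invalid_argument("non-concat dims must match");
    // Insert this input right after everything placed so far.
    plan.offsets.push_back(plan.resultShape[dim]);
    plan.resultShape[dim] += s[dim];
  }
  return plan;
}
```

Concatenating `2x3`, `4x3` and `1x3` along dimension 0 gives a `7x3` result with insertion offsets 0, 2 and 6; each offset would become the concat-dimension offset of a `tensor.insert_slice`, with zero offsets in the other dimensions.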
Max191	98e838a890	[mlir] Do not bufferize parallel_insert_slice dest to read for full slices (#112761 ) In the insert_slice bufferization interface implementation, the destination tensor is not considered read if the full tensor is overwritten by the slice. This PR adds the same check for tensor.parallel_insert_slice. Adds two new StaticValueUtils: - `isAllConstantIntValue` checks if an array of `OpFoldResult` are all equal to a passed `int64_t` value. - `areConstantIntValues` checks if an array of `OpFoldResult` are all equal to a passed array of `int64_t` values. fixes https://github.com/llvm/llvm-project/issues/112435 --------- Signed-off-by: Max Dawkins <max.dawkins@gmail.com>	2024-10-18 16:02:03 -04:00
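The semantics described for the two new helpers can be modeled without MLIR by standing in for `OpFoldResult` with `std::optional<int64_t>`, where `std::nullopt` means a dynamic SSA value. This is a sketch under that assumption: `allConstantEqual`, `constantsEqual` and `isFullSlice` are hypothetical names mirroring `isAllConstantIntValue` and `areConstantIntValues` in spirit, not their real signatures, and the full-slice check only approximates the condition the bufferization interface uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for OpFoldResult: a known constant, or std::nullopt for a dynamic value.
using FoldedInt = std::optional<int64_t>;

// True iff every entry is a known constant equal to `value`.
bool allConstantEqual(const std::vector<FoldedInt>& ofrs, int64_t value) {
  for (const FoldedInt& v : ofrs)
    if (!v || *v != value) return false;
  return true;
}

// True iff both arrays have the same length and each entry is a known
// constant equal to the corresponding expected value.
bool constantsEqual(const std::vector<FoldedInt>& ofrs,
                    const std::vector<int64_t>& expected) {
  if (ofrs.size() != expected.size()) return false;
  for (std::size_t i = 0; i < ofrs.size(); ++i)
    if (!ofrs[i] || *ofrs[i] != expected[i]) return false;
  return true;
}

// A slice overwrites the whole destination when offsets are all 0, strides
// are all 1 and sizes equal the destination shape; in that case the
// destination does not need to be treated as read.
bool isFullSlice(const std::vector<FoldedInt>& offsets,
                 const std::vector<FoldedInt>& sizes,
                 const std::vector<FoldedInt>& strides,
                 const std::vector<int64_t>& destShape) {
  return allConstantEqual(offsets, 0) && allConstantEqual(strides, 1) &&
         constantsEqual(sizes, destShape);
}
```

A dynamic size makes the check fail conservatively, since it cannot be proven to match the destination shape.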
BARRET	1666d13078	[CMake]: Remove unnecessary dependencies on LLVM/MLIR (#111255 ) Previous https://github.com/llvm/llvm-project/pull/110362 (reverted) caused breakage. Here is the PR with fix. My build cmdline: ``` cmake ../llvm \ -G Ninja \ -DCMAKE_BUILD_TYPE=Release \ -DCMAKE_INSTALL_PREFIX=install \ -DCMAKE_C_COMPILER=gcc-9 \ -DCMAKE_CXX_COMPILER=g++-9 \ -DCMAKE_CUDA_COMPILER=$(which nvcc) \ -DLLVM_ENABLE_LLD=OFF \ -DLLVM_ENABLE_ASSERTIONS=ON \ -DLLVM_BUILD_EXAMPLES=ON \ -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \ -DLLVM_CCACHE_BUILD=ON \ -DMLIR_ENABLE_BINDINGS_PYTHON=ON \ -DBUILD_SHARED_LIBS=ON \ -DLLVM_ENABLE_PROJECTS='llvm;mlir' ```	2024-10-07 15:52:43 +02:00
Rajveer Singh Bharadwaj	760ffa4736	[mlir][tensor] Apply `InsertSliceOfTransferWriteOpFolder` only when `transfer_write` overwrites all elements of `insert_slice` (#108803 ) Resolves #101708 The updated logic now correctly checks if `transfer_write` completely overwrites `insert_slice` and only then applies the rewrite for this pattern. This check currently covers static sizes, for dynamic sizes value bounds analysis is needed (see `TODO:`).	2024-10-01 14:29:37 -07:00
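The static full-overwrite condition can be sketched as a small predicate. This is a simplified, conservative model, not the upstream check: `TransferWrite` and `overwritesWholeSlice` are hypothetical, the write is assumed to use an identity permutation map with no mask and equal vector and tensor rank, and a dynamic extent is encoded here as a negative number (MLIR uses its own sentinel), which makes the predicate bail out as the static-only check in the commit does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a vector.transfer_write with an identity
// permutation map and no mask.
struct TransferWrite {
  std::vector<int64_t> vectorShape;
  std::vector<int64_t> indices;  // static write offsets into the tensor
};

// True iff the write provably covers every element of a tensor with shape
// `sliceShape` (the insert_slice source), i.e. when folding would be safe.
bool overwritesWholeSlice(const TransferWrite& w,
                          const std::vector<int64_t>& sliceShape) {
  if (w.vectorShape.size() != sliceShape.size() ||
      w.indices.size() != sliceShape.size())
    return false;
  for (std::size_t d = 0; d < sliceShape.size(); ++d) {
    // Dynamic extents cannot be proven covered without value-bounds analysis.
    if (sliceShape[d] < 0) return false;
    // The write must start at 0 and span the full extent of this dim.
    if (w.indices[d] != 0 || w.vectorShape[d] != sliceShape[d]) return false;
  }
  return true;
}
```

Requiring an exact shape match is stricter than necessary for some out-of-bounds writes, but it never accepts a write that leaves elements of the slice untouched.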
Mehdi Amini	8b47711e84	Revert "CMake: Remove unnecessary dependencies on LLVM/MLIR" (#110594 ) Reverts llvm/llvm-project#110362 Multiple bots are broken.	2024-10-01 00:44:21 +02:00
BARRET	4980f2177e	CMake: Remove unnecessary dependencies on LLVM/MLIR (#110362 ) There are some spurious libraries which can be removed. I'm trying to bundle MLIR/LLVM library dependencies for our own libraries. We're utilizing cmake function to recursively collect MLIR/LLVM related dependencies. However, we identified certain library dependencies as redundant and safe for removal.	2024-09-30 23:57:13 +02:00


261 Commits