llvm-project

Author	SHA1	Message	Date
ziereis	d376a7e75f	[mlir] TileUsingInterface bugfix for dominance error (#178190 ) In this PR i move the insertion point in the `yieldReplacementForFusedProducer` because i ran into some issue where a `tensor.extract_slices` tried to use a result of `affine.apply` that was inserted at the end of the block instead of the start of it. This is the full error of the test i added before this change: ```mlir third-party/llvm-project/mlir/test/Interfaces/TilingInterface/tile-fuse-and-yield-using-scfforall.mlir:83:11: error: operand #1 does not dominate this use %pack = linalg.pack %gen#1 ^ third-party/llvm-project/mlir/test/Interfaces/TilingInterface/tile-fuse-and-yield-using-scfforall.mlir:83:11: note: see current operation: %24 = "tensor.extract_slice"(%23, %36, %8) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> third-party/llvm-project/mlir/test/Interfaces/TilingInterface/tile-fuse-and-yield-using-scfforall.mlir:71:12: note: operand defined here (op in the same block) %gen:2 = linalg.generic { ^ // -----// IR Dump After InterpreterPass Failed (transform-interpreter) //----- // #map = affine_map<(d0, d1) -> (d0, d1)> #map1 = affine_map<(d0) -> (d0 * 16)> #map2 = affine_map<(d0) -> (d0 * -16 + 32)> #map3 = affine_map<(d0) -> (16, d0 * -16 + 32)> #map4 = affine_map<(d0) -> (d0 - 1)> "builtin.module"() ({ "func.func"() <{function_type = (tensor<32x1024xf32>) -> (tensor<32x1024xf32>, tensor<2x512x16x2xi8>), sym_name = "fuse_pack_consumer_into_multi_output_generic"}> ({ ^bb0(%arg1: tensor<32x1024xf32>): %2 = "arith.constant"() <{value = 0 : i8}> : () -> i8 %3 = "tensor.empty"() : () -> tensor<32x1024xf32> %4 = "tensor.empty"() : () -> tensor<32x1024xi8> %5 = "tensor.empty"() : () -> tensor<2x512x16x2xi8> %6:2 = "linalg.generic"(%arg1, %3, %4) <{indexing_maps = [#map, #map, #map], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 1, 2>}> ({ ^bb0(%arg9: f32, %arg10: f32, %arg11: i8): %41 = "arith.fptoui"(%arg9) : (f32) -> i8 "linalg.yield"(%arg9, %41) : (f32, i8) -> () }) : (tensor<32x1024xf32>, tensor<32x1024xf32>, tensor<32x1024xi8>) -> (tensor<32x1024xf32>, tensor<32x1024xi8>) %7:3 = "scf.forall"(%5, %3, %4) <{operandSegmentSizes = array<i32: 0, 0, 0, 3>, staticLowerBound = array<i64: 0>, staticStep = array<i64: 1>, staticUpperBound = array<i64: 2>}> ({ ^bb0(%arg2: index, %arg3: tensor<2x512x16x2xi8>, %arg4: tensor<32x1024xf32>, %arg5: tensor<32x1024xi8>): %8 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %9 = "affine.apply"(%arg2) <{map = #map2}> : (index) -> index %10 = "affine.min"(%arg2) <{map = #map3}> : (index) -> index %11 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %12 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %13 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %14 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %15 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %16 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %17 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %18 = "tensor.extract_slice"(%arg1, %12, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %19 = "tensor.empty"() : () -> tensor<32x1024xf32> %20 = "tensor.extract_slice"(%19, %14, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %21 = "tensor.extract_slice"(%3, %14, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %22 = "tensor.empty"() : () -> tensor<32x1024xi8> %23 = "tensor.extract_slice"(%22, %16, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %24 = "tensor.extract_slice"(%4, %16, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %25 = "tensor.empty"() : () -> tensor<32x1024xf32> %26 = "tensor.extract_slice"(%25, %38, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %27 = "tensor.extract_slice"(%arg4, %38, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %28 = "tensor.empty"() : () -> tensor<32x1024xi8> %29 = "tensor.extract_slice"(%28, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %30 = "tensor.extract_slice"(%arg5, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %31:2 = "linalg.generic"(%18, %27, %30) <{indexing_maps = [#map, #map, #map], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 1, 2>}> ({ ^bb0(%arg6: f32, %arg7: f32, %arg8: i8): %40 = "arith.fptoui"(%arg6) : (f32) -> i8 "linalg.yield"(%arg6, %40) : (f32, i8) -> () }) : (tensor<?x1024xf32>, tensor<?x1024xf32>, tensor<?x1024xi8>) -> (tensor<?x1024xf32>, tensor<?x1024xi8>) %32 = "tensor.extract_slice"(%6#1, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %33 = "tensor.empty"() : () -> tensor<2x512x16x2xi8> %34 = "tensor.extract_slice"(%33, %arg2) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 1, 512, 16, 2>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<2x512x16x2xi8>, index) -> tensor<1x512x16x2xi8> %35 = "tensor.extract_slice"(%arg3, %arg2) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 1, 512, 16, 2>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<2x512x16x2xi8>, index) -> tensor<1x512x16x2xi8> %36 = "linalg.pack"(%31#1, %35, %2) <{inner_dims_pos = array<i64: 0, 1>, operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_inner_tiles = array<i64: 16, 2>}> : (tensor<?x1024xi8>, tensor<1x512x16x2xi8>, i8) -> tensor<1x512x16x2xi8> %37 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %38 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %39 = "affine.apply"(%10) <{map = #map4}> : (index) -> index "scf.forall.in_parallel"() ({ "tensor.parallel_insert_slice"(%36, %arg3, %arg2) <{operandSegmentSizes = array<i32: 1, 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 1, 512, 16, 2>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x512x16x2xi8>, tensor<2x512x16x2xi8>, index) -> () "tensor.parallel_insert_slice"(%31#0, %arg4, %38, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<?x1024xf32>, tensor<32x1024xf32>, index, index) -> () "tensor.parallel_insert_slice"(%31#1, %arg5, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<?x1024xi8>, tensor<32x1024xi8>, index, index) -> () }) : () -> () }) : (tensor<2x512x16x2xi8>, tensor<32x1024xf32>, tensor<32x1024xi8>) -> (tensor<2x512x16x2xi8>, tensor<32x1024xf32>, tensor<32x1024xi8>) "func.return"(%7#1, %7#0) : (tensor<32x1024xf32>, tensor<2x512x16x2xi8>) -> () }) : () -> () "builtin.module"() ({ "transform.named_sequence"() <{arg_attrs = [{transform.readonly}], function_type = (!transform.any_op) -> (), sym_name = "__transform_main"}> ({ ^bb0(%arg0: !transform.any_op): %0 = "transform.structured.match"(%arg0) <{ops = ["linalg.pack"]}> : (!transform.any_op) -> !transform.any_op %1:2 = "transform.test.fuse_and_yield"(%0) <{tile_interchange = [], tile_sizes = [1], use_forall = true}> : (!transform.any_op) -> (!transform.any_op, !transform.any_op) "transform.yield"() : () -> () }) : () -> () }) {transform.with_named_sequence} : () -> () }) : () -> () ``` I also noticed that Interface tests are missing from the bazel overlay so i also added this.	2026-02-02 11:09:10 -08:00
Matthias Springer	4f9843b9e2	Revert "[mlir][Interfaces] Add `ExecutionProgressOpInterface` + folding pattern (#179039 )" (#179241 ) This reverts commit 48c664b85f8fb043bc113ed0684e156e0f30f1bd. This reverts commit 1e33b736ff3888b04206acb9fbd9f6623c0723aa.	2026-02-02 14:57:19 +00:00
Matthias Springer	1e33b736ff	[mlir][Interfaces] Add `ExecutionProgressOpInterface` + folding pattern (#179039 ) Add the `ExecutionProgressOpInterface` with an interface method to check if an operation "must progress". Add `mustProgress` attributes to `scf.for` and `scf.while` (default value is "true"). `mustProgress` corresponds to the [`llvm.loop.mustprogress` metadata](https://llvm.org/docs/LangRef.html#langref-llvm-loop-mustprogress). Also add a canonicalization pattern to erase `RegionBranchOpInterface` ops that must progress but loop infinitely (and are non-side-effecting). This canonicalization pattern is enabled for `scf.for` and `scf.while`. RFC: https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530	2026-02-02 08:25:45 +01:00
Jakub Kuderski	9aaf0b89f5	[mlir] Apply clang-tidy check llvm-use-vector-utils. NFC. (#178526 )	2026-01-29 02:19:00 +00:00
Jakub Kuderski	59e44799bd	[mlir] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178487 ) Pre-commiting this before landing the new check in https://github.com/llvm/llvm-project/pull/177892	2026-01-28 19:13:47 +00:00
Matthias Springer	14bdd06fba	[mlir][DialectUtils] Fix 0 step handling in `constantTripCount` (#177329 ) A step size of "zero" does not indicate "zero iterations". It may indicate an infinite number of iterations. This commit makes some transformations more conservative. We used to fold away some loops with step size 0 and that's now no longer the case. Relation discussion: https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530	2026-01-25 19:47:35 +00:00
Prathamesh Tagore	689b8a3ed4	[mlir][scf] Skip ops having results with warning in forall-to-for pass (#175926 ) Avoid converting scf.forall ops that have results in forall-to-for pass. Emit a warning instead of failing the pass, so mlir-opt can still produce output on mixed IR. Fixes https://github.com/llvm/llvm-project/issues/174319	2026-01-19 15:48:57 -08:00
Bangtian Liu	fb0881f628	[mlir][Tensor] Add rank-reducing slice in generatedSlices (#174248 ) When `replaceExtractSliceWithTiledProducer `creates a rank-reducing slice to handle type mismatches, it should be tracked in `generatedSlices `so downstream cleanup patterns (like IREE's FoldExtractSliceOfBroadcast) can process it. This PR also fixes an infinite loop in getUntiledProducerFromSliceSource where adding the slice to generatedSlices caused the fusion worklist to repeatedly try to re-fuse producers already inside the innermost loop; the fix skips producers that are already inside the innermost loop via an isProperAncestor check. Added a lit test (@fuse_through_rank_reducing_slice) demonstrating correct fusion through rank-reducing slices. Note that demonstrating the generatedSlices tracking benefit requires a cleanup pattern (SwapExtractSliceWithFillPatterns) to consume the slice; IREE's full CI suite (iree-org/iree#23012) validates this works correctly in practice with patterns like FoldExtractSliceOfBroadcast. --------- Signed-off-by: Bangtian Liu <liubangtian@gmail.com>	2026-01-14 18:29:41 -05:00
ofri frishman	6559bdab61	[mlir] Remove loop peeling explicit C'tor (#175419 ) The SCF dialect loop peeling pass has an explicit C'tor. This creates an inconvenience to use non default pass options, since they can only be passed as a string after the pass creation. After removing the explicit C'tor, the code auto generation creates 2 C'tors, which one of them can directly receive pass options struct in case non default values are required. The explicit C'tor does not match auto generated C'tor convention, so this change requires any uses of the pass in downstream projects to update to use the auto generated C'tor.	2026-01-11 11:59:05 +02:00
Victor Chernyakin	c438773432	[LLVM][ADT] Migrate users of `make_scope_exit` to CTAD (#174030 ) This is a followup to #173131, which introduced the CTAD functionality.	2026-01-02 20:42:56 -08:00
Ming Yan	2c21790983	Revert "[MLIR][SCF] Sink scf.if from scf.while before region into after region in scf-uplift-while-to-for" (#169888 ) Reverts llvm/llvm-project#165216 It is implemented in #169892 .	2025-12-01 19:02:02 +08:00
Ming Yan	25d027b8ab	[MLIR][SCF] Sink scf.if from scf.while before region into after region in scf-uplift-while-to-for (#165216 ) When a `scf.if` directly precedes an `scf.condition` in the before region of an `scf.while` and both share the same condition, move the if into the after region of the loop. This helps simplify the control flow to enable uplifting `scf.while` to `scf.for`.	2025-11-27 23:30:05 +08:00
MaheshRavishankar	dbeda4f419	[mlir][SCF] Add `scf::tileAndFuseConsumer` that tiles a consumer into a given tiled loop nest. (#167634 ) The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices (and loops they are part of), tries to find the consumer of these slices (all slices are expected to be the same consumer), and then tiles the consumer into the loop nest using the `TilingInterface`. A more natural way of doing consumer fusion is to just start from the consumer, look for operands that are produced by the loop nest passed in as `loops` (presumably these loops are generated by tiling, but that is not a requirement for consumer fusion). Using the consumer you can find the slices of the operands that are accessed within the loop which you can then use to tile and fuse the consumer (using `TilingInterface`). This handles more naturally the case where multiple operands of the consumer come from the loop nest. The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of `scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a single producer for the source of the slice, which makes it a natural way of specifying producer fusion. But for consumers, the result might have multiple users, resulting in multiple candidates for fusion, as well as a fusion candidate using multiple results from the tiled loop nest. This means using slices (`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for consumer fusion turns out to be quite hard to navigate. The use of the consumer directly avoids all those pain points. In time the `scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of `scf::tileAndFuseConsumer`. There is a lot of tech-debt that has accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be cleanedup. So while that gets cleaned up, and required functionality is moved to `scf::tileAndFuseConsumer`, the old path is still maintained. The test for `scf::tileAndFuseConsumerUsingSlices` is copied to `tile-and-fuse-consumer.mlir` to `tile-and-fuse-consumer-using-slices.mlir`. All the tests that were there in this file are now using the `tileAndFuseConsumer` method. The test op `test.tile_and_fuse_consumer` is modified to call `scf::tileAndFuseConsumer`, while a new op `test.tile_and_fuse_consumer_of_slice` is used to keep the old path tested while it is deprecated. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-11-20 14:13:57 -08:00
Kazu Hirata	2844d86734	[mlir] Remove unused local variables (NFC) (#167107 ) Identified with bugprone-unused-local-non-trivial-variable.	2025-11-08 07:41:15 -08:00
Jakub Kuderski	ba0be89cd2	[mlir] Simplify Default cases in type switches. NFC. (#165767 ) Use default values instead of lambdas when possible. `std::nullopt` and `nullptr` can be used now because of https://github.com/llvm/llvm-project/pull/165724.	2025-10-30 15:10:59 -04:00
Mehdi Amini	41f65666f6	[MLIR] Revamp RegionBranchOpInterface (#165429 ) This is still somehow a WIP, we have some issues with this interface that are not trivial to solve. This patch tries to make the concepts of RegionBranchPoint and RegionSuccessor more robust and aligned with their definition: - A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`) op or a `RegionBranchTerminatorOpInterface` operation in a nested region. - A `RegionSuccessor` is either one of the nested region or the parent `RegionBranchOpInterface` Some new methods with reasonnable default implementation are added to help resolving the flow of values across the RegionBranchOpInterface. It is still not trivial in the current state to walk the def-use chain backward with this interface. For example when you have the 3rd block argument in the entry block of a for-loop, finding the matching operands requires to know about the hidden loop iterator block argument and where the iterargs start. The API is designed around forward-tracking of the chain unfortunately. Try to reland #161575 ; I suspect a buildbot incremental build issue.	2025-10-28 09:53:56 -07:00
Mehdi Amini	e3c547179f	Revert " [MLIR] Revamp RegionBranchOpInterface " (#165356 ) Reverts llvm/llvm-project#161575 Broke Windows on ARM buildbot build, needs investigations.	2025-10-28 01:06:14 -07:00
Mehdi Amini	ab1fd21b54	[MLIR] Revamp RegionBranchOpInterface (#161575 ) This is still somehow a WIP, we have some issues with this interface that are not trivial to solve. This patch tries to make the concepts of RegionBranchPoint and RegionSuccessor more robust and aligned with their definition: - A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`) op or a `RegionBranchTerminatorOpInterface` operation in a nested region. - A `RegionSuccessor` is either one of the nested region or the parent `RegionBranchOpInterface` Some new methods with reasonnable default implementation are added to help resolving the flow of values across the RegionBranchOpInterface. It is still not trivial in the current state to walk the def-use chain backward with this interface. For example when you have the 3rd block argument in the entry block of a for-loop, finding the matching operands requires to know about the hidden loop iterator block argument and where the iterargs start. The API is designed around forward-tracking of the chain unfortunately.	2025-10-28 07:47:26 +00:00
ddubov100	af4fbcaed7	[mlir][SCF] Fix UB adjustment during `scf.for` loop peeling Currently when peeling the first iteration, any mentioning of UB within the loop body is replaced with the new UB in the peeled out first iteration. This introduces a bug in the following scenario: Operations inside of the loop that intentionally use the original UB are incorrectly updated.	2025-10-20 18:18:08 +02:00
Jakub Kuderski	8bab6c4e8c	[mlir] Simplify unreachable type switch cases. NFC. (#162032 ) Use `DefaultUnreachable` from https://github.com/llvm/llvm-project/pull/161970.	2025-10-06 09:23:25 -04:00
sebvince	178651ac87	[MLIR][SCF] Add loops as parameter to LoopTerminator callback when using CustomOp. (#161386 ) This PR adds to the generateLoopTerminatorFn callback the loops generated by GenerateLoopHeaderFn. This is needed to correctly set the insertion point with scf.forall ops.	2025-09-30 09:46:05 -07:00
Dor Arad	373a2f1f22	[mlir][scf] ExecuteRegionOp bufferization to consider no_inline attr (#160697 ) Fix a bug where ExecuteRegionOp bufferization dropped the "no_inline" attribute. Co-authored-by: Dor Arad <dor.arad@mobileye.com>	2025-09-25 15:53:00 +02:00
Artemy Skrebkov	72b8073d05	[mlir][SCF] Add scf.index_switch support for populateSCFStructuralTypeConversionsAndLegality (#160344 ) In a downstream project, there is a need for a type conversion pattern for scf.index_switch operation. A test is added into `mlir/test/Dialect/SparseTensor/scf_1_N_conversion.mlir` (not sure this functionality is really required for sparse tensors, but the test showcase that the new conversion pattern is functional)	2025-09-24 13:50:05 +02:00
MaheshRavishankar	fa19a57b87	[mlir][SCF] Allow using a custom operation to generate loops with `mlir::tileUsingSCF`. (#159660 ) This change adds an option to use a custom operation to generate the inter-tile loops during tiling. When the loop type is set to scf::SCFTilingOptions::LoopType::CustomOp, the method mlir::tileUsingSCF provides two callback functions First one to generate the header of the loop. Second one to generate the terminator of the loop. These methods receive the information needed to generate the loops/terminator and expect to return information needed to generate the code for the intra-tile computation. See comments for more details. Presently this is adds support only for tiling. Subsequent commits will update this to add support for fusion as well. The PR is split into two commits. The first commit is an NFC that just refactors the code (and cleans up some naming) to make it easier to add the support for custom loop operations. The second commit adds the support for using a custom loop operation, as well as a test to exercise this path. Note that this is duplicate of https://github.com/llvm/llvm-project/pull/159506 that was accidently committed and was reverted in https://github.com/llvm/llvm-project/pull/159598 to wait for reviews. Signed-off-by: MaheshRavishankar [mahesh.ravishankar@gmail.com](mailto:mahesh.ravishankar@gmail.com) --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-09-22 15:20:02 -07:00
lonely eagle	a25eda2bec	[mlir][scf] Modify the return logic of generateLoopNestUsingForOp (NFC) (#159394 ) When loops is empty, avoid executing yieldTiledValuesFn and Add a test which all tile sizes are set to zero.	2025-09-23 00:40:54 +08:00
Matthias Springer	3c862b4ba3	[mlir] Expose optional `PatternBenefit` to function / SCF populate functions (NFC) (#159752 ) Pattern benefit allows users to give priority to a pattern.	2025-09-19 13:46:04 +02:00
MaheshRavishankar	cfaf23927c	Revert "[mlir][SCF] Allow using a custom operation to generate loops with `mlir::tileUsingSCF`." (#159598 ) Reverts llvm/llvm-project#159506 It was committed by accident. Reverting it for reviews.	2025-09-18 14:52:29 -07:00
MaheshRavishankar	b8649098a7	[mlir][SCF] Allow using a custom operation to generate loops with `mlir::tileUsingSCF`. (#159506 ) This change adds an option to use a custom operation to generate the inter-tile loops during tiling. When the loop type is set to `scf::SCFTilingOptions::LoopType::CustomOp`, the method `mlir::tileUsingSCF` provides two callback functions 1. First one to generate the header of the loop. 2. Second one to generate the terminator of the loop. These methods receive the information needed to generate the loops/terminator and expect to return information needed to generate the code for the intra-tile computation. See comments for more details. Presently this is adds support only for tiling. Subsequent commits will update this to add support for fusion as well. The PR is split into two commits. 1) The first commit is an NFC that just refactors the code (and cleans up some naming) to make it easier to add the support for custom loop operations. 2) The second commit adds the support for using a custom loop operation, as well as a test to exercise this path. Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com> --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-09-18 09:28:44 -07:00
Mehdi Amini	c5062d7d63	Revert "[mlir] move if-condition propagation to a standalone pass" (#159535 ) Reverts llvm/llvm-project#150278 Multiple post-merge comment remained undressed, and some more fundamental issues were also reported in #159165	2025-09-18 09:54:53 +00:00
Mehdi Amini	fba26bc1a5	[MLIR] Fix SCF loop specialization (peeling) to work on scf.for with non-index type (#158707 ) The current code would crash with integer. This is visible on this modified example (the original with index was incorrect)	2025-09-16 14:09:07 +02:00
Alan Li	b87f1b22a8	[MLIR] Add `InParallelOpInterface` for parallel combining operations (#157736 ) This commit: - Introduces a new `InParallelOpInterface`, along with the `ParallelCombiningOpInterface`, represent the parallel updating operations we have in a parallel loop of `scf.forall`. - Change the name of `ParallelCombiningOpInterface` to `InParallelOpInterface` as the naming was quite confusing. - `ParallelCombiningOpInterface` now is used to generalize operations that insert into shared tensors within parallel combining regions. Previously, only `tensor.parallel_insert_slice` was supported directly in `scf.InParallelOp` regions. - `tensor.parallel_insert_slice` now implements `ParallelCombiningOpInterface`. This change enables future extensions to support additional parallel combining operations beyond `tensor.parallel_insert_slice`, which have different update semantics, so the `in_parallel` region can correctly and safely represent these kinds of operation without potential mistakes such as races. Author credits: @qedawkins	2025-09-12 14:23:00 -07:00
Mehdi Amini	8dee9e465b	[MLIR] Apply clang-tidy fixes for readability-identifier-naming in ParallelLoopFusion.cpp (NFC)	2025-09-02 08:27:07 -07:00
Matthias Springer	337707a541	[mlir][Transforms] Dialect conversion: Context-aware type conversions (#140434 ) This commit adds support for context-aware type conversions: type conversion rules that can return different types depending on the IR. There is no change for existing (context-unaware) type conversion rules: ```c++ // Example: Conversion any integer type to f32. converter.addConversion([](IntegerType t) { return Float32Type::get(t.getContext()); } ``` There is now an additional overload to register context-aware type conversion rules: ```c++ // Example: Type conversion rule for integers, depending on the context: // Get the defining op of `v`, read its "increment" attribute and return an // integer with a bitwidth that is increased by "increment". converter.addConversion([](Value v) -> std::optional<Type> { auto intType = dyn_cast<IntegerType>(v.getType()); if (!intType) return std::nullopt; Operation *op = v.getDefiningOp(); if (!op) return std::nullopt; auto incrementAttr = op->getAttrOfType<IntegerAttr>("increment"); if (!incrementAttr) return std::nullopt; return IntegerType::get(v.getContext(), intType.getWidth() + incrementAttr.getInt()); }); ``` For performance reasons, the type converter caches the result of type conversions. This is no longer possible when there context-aware type conversions because each conversion could compute a different type depending on the context. There is no performance degradation when there are only context-unaware type conversions. Note: This commit just adds context-aware type conversions to the dialect conversion framework. There are many existing patterns that still call `converter.convertType(someValue.getType())`. These should be gradually updated in subsequent commits to call `converter.convertType(someValue)`. Co-authored-by: Markus Böck <markus.boeck02@gmail.com>	2025-08-27 09:13:52 +02:00
Shay Kleiman	0cbb6e7d6c	[mlir][scf] Expose isPerfectlyNestedForLoops (#152115 ) The function `isPerfectlyNestedForLoops` is useful on its own and so I'm exposing it for downstream use.	2025-08-26 11:05:19 +03:00
Matthias Springer	21b607adbe	[mlir][SCF] `scf.for`: Add support for unsigned integer comparison (#153379 ) Add a new unit attribute to allow for unsigned integer comparison. Example: ```mlir scf.for unsigned %iv_32 = %lb_32 to %ub_32 step %step_32 : i32 { // body } ``` Discussion: https://discourse.llvm.org/t/scf-should-scf-for-support-unsigned-comparison/84655	2025-08-15 10:59:14 +02:00
Michael Marjieh	12b9c0da04	[mlir][scf] Implement Conversion from scf.parallel to Nested scf.for (#147692 ) Add a utility function/transform operation to convert `scf.parallel` loops to nested `scf.for` loops.	2025-08-04 06:21:50 -07:00
Maksim Levental	c090ed53fb	[mlir][NFC] update `mlir/Dialect` create APIs (33/n) (#150659 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-25 16:13:55 -04:00
Jacques Pienaar	07967d4af8	[mlir] Switch to new LDBG macro (#150616 ) Change local variants to use new central one.	2025-07-25 18:22:46 +02:00
Longsheng Mou	f047b735e9	[mlir][NFC] Use `getDefiningOp<OpTy>()` instead of `dyn_cast<OpTy>(getDefiningOp())` (#150428 ) This PR uses `val.getDefiningOp<OpTy>()` to replace `dyn_cast<OpTy>(val.getDefiningOp())` , `dyn_cast_or_null<OpTy>(val.getDefiningOp())` and `dyn_cast_if_present<OpTy>(val.getDefiningOp())`.	2025-07-25 10:35:51 +08:00
Maksim Levental	588845defd	[mlir][NFC] update `mlir/Dialect` create APIs (20/n) (#149927 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-24 07:09:58 -05:00
Longsheng Mou	3eb49c482c	[mlir][NFC] Use `hasOneBlock` instead of `llvm::hasSingleElement(region)` (#149809 )	2025-07-24 10:11:21 +08:00
Kazu Hirata	0925d7572a	[mlir] Remove unused includes (NFC) (#150266 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-23 15:18:53 -07:00
Oleksandr "Alex" Zinenko	9d11accf95	[mlir] move if-condition propagation to a standalone pass (#150278 ) This offers a significant speedup over running this as a canonicalizaiton pattern, up to 10x improvement when running on large (>100k operations) inputs coming from Polygeist. It is also not clear whether this transformation is a reasonable canonicalization as it performs non-local rewrites.	2025-07-23 21:02:40 +02:00
Kazu Hirata	c06d3a7b72	[mlir] Remove unused includes (NFC) (#148769 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-14 22:19:23 -07:00
MaheshRavishankar	c22352175e	[mlir][TilingInterface] Allow tile and fuse to work with `ReductionTilingStrategy::PartialReductionOuterParallelStrategy`. (#147593 ) Since `scf::tileUsingSCF` is the core method used for tiling the root operation within the `scf::tileConsumersAndFuseProducersUsingSCF`, the latter can fuse into any tiled loop generated using `scf::tileUsingSCF`. This patch adds a test for tiling a root operation using `ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and fusing producers with it. Since this strategy generates a rank-reducing extract slice `tensor::replaceExtractSliceWithTiledProducer` which is the core method used for the fusion was extended to handle the rank-reducing slices. Also fix a small bug in the computation of the reduction induction variable (which needs to use `floorDiv` instead of `ceilDiv`) Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-07-09 08:50:01 -07:00
Hu Yufan	945ed2ed5e	[MLIR][SCF] fix loop pipelining pass use of uninitialized value (#146991 ) fix issue https://github.com/llvm/llvm-project/issues/146990	2025-07-05 12:21:28 -07:00
Andrei Golubev	a63f572628	[mlir][bufferization] Return BufferLikeType in BufferizableOpInterface (#144867 ) Support custom types (2/N): allow value-owning operations (e.g. allocation ops) to bufferize custom tensors into custom buffers. This requires BufferizableOpInterface::getBufferType() to return BufferLikeType instead of BaseMemRefType. Affected implementors of the interface are updated accordingly. Relates to ee070d08163ac09842d9bf0c1315f311df39faf1.	2025-07-02 11:27:35 -07:00
MaheshRavishankar	c873e5f87d	[mlir][TilingInterface] Handle multi operand consumer fusion. (#145193 ) For consumer fusion cases of this form ``` %0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg0 = ...) { tensor.parallel_insert_slice ... into %arg0 tensor.parallel_insert_slice ... into %arg1 } %1 = linalg.generic ... ins(%0#0, %0#1) ``` the current consumer fusion that handles one slice at a time cannot fuse the consumer into the loop, since fusing along one slice will create and SSA violation on the other use from the `scf.forall`. The solution is to allow consumer fusion to allow considering multiple slices at once. This PR changes the `TilingInterface` methods related to consumer fusion, i.e. - `getTiledImplementationFromOperandTile` - `getIterationDomainFromOperandTile` to allow fusion while considering multiple operands. It is upto the `TilingInterface` implementation to return an error if a list of tiles of the operands cannot result in a consistent implementation of the tiled operation. The Linalg operation implementation of `TilingInterface` has been modified to account for these changes and allow cases where operand tiles that can result in a consistent tiling implementation are handled. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-06-25 11:54:38 -07:00
Skrai Pardus	a45fda6aeb	switch type and value ordering for arith `Constant[XX]Op` (#144636 ) This change standardizes the order of the parameters for `Constant[XXX] Ops` to match with all other `Op` `build()` constructors. In all instances of generated code for the MLIR dialects's Ops (that is the TableGen using the .td files to create the .h.inc/.cpp.inc files), the desired result type is always specified before the value. Examples: ``` // ArithOps.h.inc class ConstantOp : public ::mlir::Op<ConstantOp, ::mlir::OpTrait::ZeroRegions, ::mlir::OpTrait::OneResult, ::mlir::OpTrait::OneTypedResult<::mlir::Type>::Impl, ::mlir::OpTrait::ZeroSuccessors, ::mlir::OpTrait::ZeroOperands, ::mlir::OpTrait::OpInvariants, ::mlir::BytecodeOpInterface::Trait, ::mlir::OpTrait::ConstantLike, ::mlir::ConditionallySpeculatable::Trait, ::mlir::OpTrait::AlwaysSpeculatableImplTrait, ::mlir::MemoryEffectOpInterface::Trait, ::mlir::OpAsmOpInterface::Trait, ::mlir::InferIntRangeInterface::Trait, ::mlir::InferTypeOpInterface::Trait> { public: .... static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type result, ::mlir::TypedAttr value); static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypedAttr value); static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::TypedAttr value); static void build(::mlir::OpBuilder &, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::ValueRange operands, ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {}); static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::ValueRange operands, ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {}); ... ``` ``` // ArithOps.h.inc class SubIOp : public ::mlir::Op<SubIOp, ::mlir::OpTrait::ZeroRegions, ::mlir::OpTrait::OneResult, ::mlir::OpTrait::OneTypedResult<::mlir::Type>::Impl, ::mlir::OpTrait::ZeroSuccessors, ::mlir::OpTrait::NOperands<2>::Impl, ::mlir::OpTrait::OpInvariants, ::mlir::BytecodeOpInterface::Trait, ::mlir::ConditionallySpeculatable::Trait, ::mlir::OpTrait::AlwaysSpeculatableImplTrait, ::mlir::MemoryEffectOpInterface::Trait, ::mlir::InferIntRangeInterface::Trait, ::mlir::arith::ArithIntegerOverflowFlagsInterface::Trait, ::mlir::OpTrait::SameOperandsAndResultType, ::mlir::VectorUnrollOpInterface::Trait, ::mlir::OpTrait::Elementwise, ::mlir::OpTrait::Scalarizable, ::mlir::OpTrait::Vectorizable, ::mlir::OpTrait::Tensorizable, ::mlir::InferTypeOpInterface::Trait> { public: ... static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type result, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlagsAttr overflowFlags); static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlagsAttr overflowFlags); static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlagsAttr overflowFlags); static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type result, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlags overflowFlags = ::mlir::arith::IntegerOverflowFlags::none); static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlags overflowFlags = ::mlir::arith::IntegerOverflowFlags::none); static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlags overflowFlags = ::mlir::arith::IntegerOverflowFlags::none); static void build(::mlir::OpBuilder &, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::ValueRange operands, ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {}); static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::ValueRange operands, ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {}); ... ``` In comparison, in the distinct case of `ConstantIntOp` and `ConstantFloatOp`, the ordering of the result type and the value is switched. Thus, this PR corrects the ordering of the aforementioned `Constant[XXX]Ops` to match with other constructors.	2025-06-23 23:35:50 +02:00
MaheshRavishankar	7bc956d3d6	[mlir][PartialReductionTilingInterface] Add support for `ReductionTilingStrategy::PartialReductionOuterParallel` in `tileUsingSCF`. (#143988 ) Following up from https://github.com/llvm/llvm-project/pull/143467, this PR adds support for `ReductionTilingStrategy::PartialReductionOuterParallel` to `tileUsingSCF`. The implementation of `PartialReductionTilingInterface` for `Linalg` ops has been updated to support this strategy as well. This makes the `tileUsingSCF` come on par with `linalg::tileReductionUsingForall` which will be deprecated subsequently. Changes summary - `PartialReductionTilingInterface` changes : - `tileToPartialReduction` method needed to get the induction variables of the generated tile loops. This was needed to keep the generated code similar to `linalg::tileReductionUsingForall`, specifically to create a simplified access for slicing the intermediate partial results tensor when tiled in `num_threads` mode. - `getPartialResultTilePosition` methods needs the induction varialbes for the generated tile loops for the same reason above, and also needs the `tilingStrategy` to be passed in to generate correct code. The tests in `transform-tile-reduction.mlir` testing the `linalg::tileReductionUsingForall` have been moved over to test `scf::tileUsingSCF` with `ReductionTilingStrategy::PartialReductionOuterParallel` strategy. Some of the test that were doing further cyclic distribution of the transformed code from tiling are removed. Those seem like two separate transformation that were merged into one. Ideally that would need to happen when resolving the `scf.forall` rather than during tiling. Please review only the top commit. Depends on https://github.com/llvm/llvm-project/pull/143467 Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-06-23 12:27:26 -07:00

1 2 3 4 5 ...

427 Commits