427 Commits

Author SHA1 Message Date
ziereis
d376a7e75f
[mlir] TileUsingInterface bugfix for dominance error (#178190)
In this PR i move the insertion point in the
`yieldReplacementForFusedProducer` because i ran into some issue where a
`tensor.extract_slices` tried to use a result of `affine.apply` that was
inserted at the end of the block instead of the start of it.

This is the full error of the test i added before this change:

```mlir
third-party/llvm-project/mlir/test/Interfaces/TilingInterface/tile-fuse-and-yield-using-scfforall.mlir:83:11: error: operand #1 does not dominate this use
  %pack = linalg.pack %gen#1
          ^
third-party/llvm-project/mlir/test/Interfaces/TilingInterface/tile-fuse-and-yield-using-scfforall.mlir:83:11: note: see current operation: %24 = "tensor.extract_slice"(%23, %36, %8) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32>
third-party/llvm-project/mlir/test/Interfaces/TilingInterface/tile-fuse-and-yield-using-scfforall.mlir:71:12: note: operand defined here (op in the same block)
  %gen:2 = linalg.generic {
           ^
// -----// IR Dump After InterpreterPass Failed (transform-interpreter) //----- //
#map = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0) -> (d0 * 16)>
#map2 = affine_map<(d0) -> (d0 * -16 + 32)>
#map3 = affine_map<(d0) -> (16, d0 * -16 + 32)>
#map4 = affine_map<(d0) -> (d0 - 1)>
"builtin.module"() ({
  "func.func"() <{function_type = (tensor<32x1024xf32>) -> (tensor<32x1024xf32>, tensor<2x512x16x2xi8>), sym_name = "fuse_pack_consumer_into_multi_output_generic"}> ({
  ^bb0(%arg1: tensor<32x1024xf32>):
    %2 = "arith.constant"() <{value = 0 : i8}> : () -> i8
    %3 = "tensor.empty"() : () -> tensor<32x1024xf32>
    %4 = "tensor.empty"() : () -> tensor<32x1024xi8>
    %5 = "tensor.empty"() : () -> tensor<2x512x16x2xi8>
    %6:2 = "linalg.generic"(%arg1, %3, %4) <{indexing_maps = [#map, #map, #map], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 1, 2>}> ({
    ^bb0(%arg9: f32, %arg10: f32, %arg11: i8):
      %41 = "arith.fptoui"(%arg9) : (f32) -> i8
      "linalg.yield"(%arg9, %41) : (f32, i8) -> ()
    }) : (tensor<32x1024xf32>, tensor<32x1024xf32>, tensor<32x1024xi8>) -> (tensor<32x1024xf32>, tensor<32x1024xi8>)
    %7:3 = "scf.forall"(%5, %3, %4) <{operandSegmentSizes = array<i32: 0, 0, 0, 3>, staticLowerBound = array<i64: 0>, staticStep = array<i64: 1>, staticUpperBound = array<i64: 2>}> ({
    ^bb0(%arg2: index, %arg3: tensor<2x512x16x2xi8>, %arg4: tensor<32x1024xf32>, %arg5: tensor<32x1024xi8>):
      %8 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index
      %9 = "affine.apply"(%arg2) <{map = #map2}> : (index) -> index
      %10 = "affine.min"(%arg2) <{map = #map3}> : (index) -> index
      %11 = "affine.apply"(%10) <{map = #map4}> : (index) -> index
      %12 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index
      %13 = "affine.apply"(%10) <{map = #map4}> : (index) -> index
      %14 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index
      %15 = "affine.apply"(%10) <{map = #map4}> : (index) -> index
      %16 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index
      %17 = "affine.apply"(%10) <{map = #map4}> : (index) -> index
      %18 = "tensor.extract_slice"(%arg1, %12, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32>
      %19 = "tensor.empty"() : () -> tensor<32x1024xf32>
      %20 = "tensor.extract_slice"(%19, %14, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32>
      %21 = "tensor.extract_slice"(%3, %14, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32>
      %22 = "tensor.empty"() : () -> tensor<32x1024xi8>
      %23 = "tensor.extract_slice"(%22, %16, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8>
      %24 = "tensor.extract_slice"(%4, %16, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8>
      %25 = "tensor.empty"() : () -> tensor<32x1024xf32>
      %26 = "tensor.extract_slice"(%25, %38, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32>
      %27 = "tensor.extract_slice"(%arg4, %38, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32>
      %28 = "tensor.empty"() : () -> tensor<32x1024xi8>
      %29 = "tensor.extract_slice"(%28, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8>
      %30 = "tensor.extract_slice"(%arg5, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8>
      %31:2 = "linalg.generic"(%18, %27, %30) <{indexing_maps = [#map, #map, #map], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 1, 2>}> ({
      ^bb0(%arg6: f32, %arg7: f32, %arg8: i8):
        %40 = "arith.fptoui"(%arg6) : (f32) -> i8
        "linalg.yield"(%arg6, %40) : (f32, i8) -> ()
      }) : (tensor<?x1024xf32>, tensor<?x1024xf32>, tensor<?x1024xi8>) -> (tensor<?x1024xf32>, tensor<?x1024xi8>)
      %32 = "tensor.extract_slice"(%6#1, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8>
      %33 = "tensor.empty"() : () -> tensor<2x512x16x2xi8>
      %34 = "tensor.extract_slice"(%33, %arg2) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 1, 512, 16, 2>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<2x512x16x2xi8>, index) -> tensor<1x512x16x2xi8>
      %35 = "tensor.extract_slice"(%arg3, %arg2) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 1, 512, 16, 2>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<2x512x16x2xi8>, index) -> tensor<1x512x16x2xi8>
      %36 = "linalg.pack"(%31#1, %35, %2) <{inner_dims_pos = array<i64: 0, 1>, operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_inner_tiles = array<i64: 16, 2>}> : (tensor<?x1024xi8>, tensor<1x512x16x2xi8>, i8) -> tensor<1x512x16x2xi8>
      %37 = "affine.apply"(%10) <{map = #map4}> : (index) -> index
      %38 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index
      %39 = "affine.apply"(%10) <{map = #map4}> : (index) -> index
      "scf.forall.in_parallel"() ({
        "tensor.parallel_insert_slice"(%36, %arg3, %arg2) <{operandSegmentSizes = array<i32: 1, 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 1, 512, 16, 2>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x512x16x2xi8>, tensor<2x512x16x2xi8>, index) -> ()
        "tensor.parallel_insert_slice"(%31#0, %arg4, %38, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<?x1024xf32>, tensor<32x1024xf32>, index, index) -> ()
        "tensor.parallel_insert_slice"(%31#1, %arg5, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<?x1024xi8>, tensor<32x1024xi8>, index, index) -> ()
      }) : () -> ()
    }) : (tensor<2x512x16x2xi8>, tensor<32x1024xf32>, tensor<32x1024xi8>) -> (tensor<2x512x16x2xi8>, tensor<32x1024xf32>, tensor<32x1024xi8>)
    "func.return"(%7#1, %7#0) : (tensor<32x1024xf32>, tensor<2x512x16x2xi8>) -> ()
  }) : () -> ()
  "builtin.module"() ({
    "transform.named_sequence"() <{arg_attrs = [{transform.readonly}], function_type = (!transform.any_op) -> (), sym_name = "__transform_main"}> ({
    ^bb0(%arg0: !transform.any_op):
      %0 = "transform.structured.match"(%arg0) <{ops = ["linalg.pack"]}> : (!transform.any_op) -> !transform.any_op
      %1:2 = "transform.test.fuse_and_yield"(%0) <{tile_interchange = [], tile_sizes = [1], use_forall = true}> : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
      "transform.yield"() : () -> ()
    }) : () -> ()
  }) {transform.with_named_sequence} : () -> ()
}) : () -> ()
``` 

I also noticed that Interface tests are missing from the bazel overlay
so i also added this.
2026-02-02 11:09:10 -08:00
Matthias Springer
4f9843b9e2
Revert "[mlir][Interfaces] Add ExecutionProgressOpInterface + folding pattern (#179039)" (#179241)
This reverts commit 48c664b85f8fb043bc113ed0684e156e0f30f1bd.
This reverts commit 1e33b736ff3888b04206acb9fbd9f6623c0723aa.
2026-02-02 14:57:19 +00:00
Matthias Springer
1e33b736ff
[mlir][Interfaces] Add ExecutionProgressOpInterface + folding pattern (#179039)
Add the `ExecutionProgressOpInterface` with an interface method to check
if an operation "must progress". Add `mustProgress` attributes to
`scf.for` and `scf.while` (default value is "true").

`mustProgress` corresponds to the [`llvm.loop.mustprogress`
metadata](https://llvm.org/docs/LangRef.html#langref-llvm-loop-mustprogress).

Also add a canonicalization pattern to erase `RegionBranchOpInterface`
ops that must progress but loop infinitely (and are non-side-effecting).
This canonicalization pattern is enabled for `scf.for` and `scf.while`.

RFC: https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530
2026-02-02 08:25:45 +01:00
Jakub Kuderski
9aaf0b89f5
[mlir] Apply clang-tidy check llvm-use-vector-utils. NFC. (#178526) 2026-01-29 02:19:00 +00:00
Jakub Kuderski
59e44799bd
[mlir] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178487)
Pre-commiting this before landing the new check in
https://github.com/llvm/llvm-project/pull/177892
2026-01-28 19:13:47 +00:00
Matthias Springer
14bdd06fba
[mlir][DialectUtils] Fix 0 step handling in constantTripCount (#177329)
A step size of "zero" does not indicate "zero iterations". It may
indicate an infinite number of iterations.

This commit makes some transformations more conservative. We used to
fold away some loops with step size 0 and that's now no longer the case.

Relation discussion:
https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530
2026-01-25 19:47:35 +00:00
Prathamesh Tagore
689b8a3ed4
[mlir][scf] Skip ops having results with warning in forall-to-for pass (#175926)
Avoid converting scf.forall ops that have results in forall-to-for pass.
Emit a warning instead of failing the pass, so mlir-opt can still
produce output on mixed IR.

Fixes https://github.com/llvm/llvm-project/issues/174319
2026-01-19 15:48:57 -08:00
Bangtian Liu
fb0881f628
[mlir][Tensor] Add rank-reducing slice in generatedSlices (#174248)
When `replaceExtractSliceWithTiledProducer `creates a rank-reducing
slice to handle type mismatches, it should be tracked in
`generatedSlices `so downstream cleanup patterns (like IREE's
FoldExtractSliceOfBroadcast) can process it.
 
This PR also fixes an infinite loop in getUntiledProducerFromSliceSource
where adding the slice to generatedSlices caused the fusion worklist to
repeatedly try to re-fuse producers already inside the innermost loop;
the fix skips producers that are already inside the innermost loop via
an isProperAncestor check.

Added a lit test (@fuse_through_rank_reducing_slice) demonstrating
correct fusion through rank-reducing slices. Note that demonstrating the
generatedSlices tracking benefit requires a cleanup pattern
(SwapExtractSliceWithFillPatterns) to consume the slice; IREE's full CI
suite (iree-org/iree#23012) validates this works correctly in practice
with patterns like FoldExtractSliceOfBroadcast.

---------

Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
2026-01-14 18:29:41 -05:00
ofri frishman
6559bdab61
[mlir] Remove loop peeling explicit C'tor (#175419)
The SCF dialect loop peeling pass has an explicit C'tor. This creates an
inconvenience to use non default pass options, since they can only be
passed as a string after the pass creation. After removing the explicit
C'tor, the code auto generation creates 2 C'tors, which one of them can
directly receive pass options struct in case non default values are
required.
The explicit C'tor does not match auto generated C'tor convention, so
this change requires any uses of the pass in downstream projects to
update to use the auto generated C'tor.
2026-01-11 11:59:05 +02:00
Victor Chernyakin
c438773432
[LLVM][ADT] Migrate users of make_scope_exit to CTAD (#174030)
This is a followup to #173131, which introduced the CTAD functionality.
2026-01-02 20:42:56 -08:00
Ming Yan
2c21790983
Revert "[MLIR][SCF] Sink scf.if from scf.while before region into after region in scf-uplift-while-to-for" (#169888)
Reverts llvm/llvm-project#165216
It is implemented in #169892 .
2025-12-01 19:02:02 +08:00
Ming Yan
25d027b8ab
[MLIR][SCF] Sink scf.if from scf.while before region into after region in scf-uplift-while-to-for (#165216)
When a `scf.if` directly precedes an `scf.condition` in the before
region of an `scf.while` and both share the same condition, move the if
into the after region of the loop. This helps simplify the control flow
to enable uplifting `scf.while` to `scf.for`.
2025-11-27 23:30:05 +08:00
MaheshRavishankar
dbeda4f419
[mlir][SCF] Add scf::tileAndFuseConsumer that tiles a consumer into a given tiled loop nest. (#167634)
The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices
(and loops they are part of), tries to find the consumer of these slices
(all slices are expected to be the same consumer), and then tiles the
consumer into the loop nest using the `TilingInterface`. A more natural
way of doing consumer fusion is to just start from the consumer, look
for operands that are produced by the loop nest passed in as `loops`
(presumably these loops are generated by tiling, but that is not a
requirement for consumer fusion). Using the consumer you can find the
slices of the operands that are accessed within the loop which you can
then use to tile and fuse the consumer (using `TilingInterface`). This
handles more naturally the case where multiple operands of the consumer
come from the loop nest.

The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of
`scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a
single producer for the source of the slice, which makes it a natural
way of specifying producer fusion. But for consumers, the result might
have multiple users, resulting in multiple candidates for fusion, as
well as a fusion candidate using multiple results from the tiled loop
nest. This means using slices
(`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for
consumer fusion turns out to be quite hard to navigate. The use of the
consumer directly avoids all those pain points. In time the
`scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of
`scf::tileAndFuseConsumer`. There is a lot of tech-debt that has
accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be
cleanedup. So while that gets cleaned up, and required functionality is
moved to `scf::tileAndFuseConsumer`, the old path is still maintained.

The test for `scf::tileAndFuseConsumerUsingSlices` is copied to
`tile-and-fuse-consumer.mlir` to
`tile-and-fuse-consumer-using-slices.mlir`. All the tests that were
there in this file are now using the `tileAndFuseConsumer` method. The
test op `test.tile_and_fuse_consumer` is modified to call
`scf::tileAndFuseConsumer`, while a new op
`test.tile_and_fuse_consumer_of_slice` is used to keep the old path
tested while it is deprecated.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-11-20 14:13:57 -08:00
Kazu Hirata
2844d86734
[mlir] Remove unused local variables (NFC) (#167107)
Identified with bugprone-unused-local-non-trivial-variable.
2025-11-08 07:41:15 -08:00
Jakub Kuderski
ba0be89cd2
[mlir] Simplify Default cases in type switches. NFC. (#165767)
Use default values instead of lambdas when possible. `std::nullopt` and
`nullptr` can be used now because of
https://github.com/llvm/llvm-project/pull/165724.
2025-10-30 15:10:59 -04:00
Mehdi Amini
41f65666f6
[MLIR] Revamp RegionBranchOpInterface (#165429)
This is still somehow a WIP, we have some issues with this interface
that are not trivial to solve. This patch tries to make the concepts of
RegionBranchPoint and RegionSuccessor more robust and aligned with their
definition:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`)
op or a `RegionBranchTerminatorOpInterface` operation in a nested
region.
- A `RegionSuccessor` is either one of the nested region or the parent
`RegionBranchOpInterface`

Some new methods with reasonnable default implementation are added to
help resolving the flow of values across the RegionBranchOpInterface.

It is still not trivial in the current state to walk the def-use chain
backward with this interface. For example when you have the 3rd block
argument in the entry block of a for-loop, finding the matching operands
requires to know about the hidden loop iterator block argument and where
the iterargs start. The API is designed around forward-tracking of the
chain unfortunately.

Try to reland #161575 ; I suspect a buildbot incremental build issue.
2025-10-28 09:53:56 -07:00
Mehdi Amini
e3c547179f
Revert " [MLIR] Revamp RegionBranchOpInterface " (#165356)
Reverts llvm/llvm-project#161575

Broke Windows on ARM buildbot build, needs investigations.
2025-10-28 01:06:14 -07:00
Mehdi Amini
ab1fd21b54
[MLIR] Revamp RegionBranchOpInterface (#161575)
This is still somehow a WIP, we have some issues with this interface
that are not trivial to solve. This patch tries to make the concepts of
RegionBranchPoint and RegionSuccessor more robust and aligned with their
definition:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`)
op or a `RegionBranchTerminatorOpInterface` operation in a nested
region.
- A `RegionSuccessor` is either one of the nested region or the parent
`RegionBranchOpInterface`

Some new methods with reasonnable default implementation are added to
help resolving the flow of values across the RegionBranchOpInterface.

It is still not trivial in the current state to walk the def-use chain
backward with this interface. For example when you have the 3rd block
argument in the entry block of a for-loop, finding the matching operands
requires to know about the hidden loop iterator block argument and where
the iterargs start. The API is designed around forward-tracking of the
chain unfortunately.
2025-10-28 07:47:26 +00:00
ddubov100
af4fbcaed7
[mlir][SCF] Fix UB adjustment during scf.for loop peeling
Currently when peeling the first iteration, any mentioning of UB within the loop body is replaced with the new UB in the peeled out first iteration. This introduces a bug in the following scenario: Operations inside of the loop that intentionally use the original UB are incorrectly updated.
2025-10-20 18:18:08 +02:00
Jakub Kuderski
8bab6c4e8c
[mlir] Simplify unreachable type switch cases. NFC. (#162032)
Use `DefaultUnreachable` from
https://github.com/llvm/llvm-project/pull/161970.
2025-10-06 09:23:25 -04:00
sebvince
178651ac87
[MLIR][SCF] Add loops as parameter to LoopTerminator callback when using CustomOp. (#161386)
This PR adds to the generateLoopTerminatorFn callback the loops
generated by GenerateLoopHeaderFn. This is needed to correctly set the
insertion point with scf.forall ops.
2025-09-30 09:46:05 -07:00
Dor Arad
373a2f1f22
[mlir][scf] ExecuteRegionOp bufferization to consider no_inline attr (#160697)
Fix a bug where ExecuteRegionOp bufferization dropped the "no_inline"
attribute.

Co-authored-by: Dor Arad <dor.arad@mobileye.com>
2025-09-25 15:53:00 +02:00
Artemy Skrebkov
72b8073d05
[mlir][SCF] Add scf.index_switch support for populateSCFStructuralTypeConversionsAndLegality (#160344)
In a downstream project, there is a need for a type conversion pattern
for scf.index_switch operation. A test is added into
`mlir/test/Dialect/SparseTensor/scf_1_N_conversion.mlir` (not sure this
functionality is really required for sparse tensors, but the test
showcase that the new conversion pattern is functional)
2025-09-24 13:50:05 +02:00
MaheshRavishankar
fa19a57b87
[mlir][SCF] Allow using a custom operation to generate loops with mlir::tileUsingSCF. (#159660)
This change adds an option to use a custom operation to generate the
inter-tile loops during tiling. When the loop type is set to
scf::SCFTilingOptions::LoopType::CustomOp, the method
mlir::tileUsingSCF provides two callback functions

First one to generate the header of the loop.
Second one to generate the terminator of the loop.
These methods receive the information needed to generate the
loops/terminator and expect to return information needed to generate
the code for the intra-tile computation. See comments for more
details.

Presently this is adds support only for tiling. Subsequent commits
will update this to add support for fusion as well.

The PR is split into two commits.

The first commit is an NFC that just refactors the code (and cleans up
some naming) to make it easier to add the support for custom loop
operations.
The second commit adds the support for using a custom loop operation, as
well as a test to exercise this path.

Note that this is duplicate of
https://github.com/llvm/llvm-project/pull/159506 that was accidently
committed and was reverted in
https://github.com/llvm/llvm-project/pull/159598 to wait for reviews.

Signed-off-by: MaheshRavishankar
[mahesh.ravishankar@gmail.com](mailto:mahesh.ravishankar@gmail.com)

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-09-22 15:20:02 -07:00
lonely eagle
a25eda2bec
[mlir][scf] Modify the return logic of generateLoopNestUsingForOp (NFC) (#159394)
When loops is empty, avoid executing yieldTiledValuesFn and Add a test
which all tile sizes are set to zero.
2025-09-23 00:40:54 +08:00
Matthias Springer
3c862b4ba3
[mlir] Expose optional PatternBenefit to function / SCF populate functions (NFC) (#159752)
Pattern benefit allows users to give priority to a pattern.
2025-09-19 13:46:04 +02:00
MaheshRavishankar
cfaf23927c
Revert "[mlir][SCF] Allow using a custom operation to generate loops with mlir::tileUsingSCF." (#159598)
Reverts llvm/llvm-project#159506

It was committed by accident. Reverting it for reviews.
2025-09-18 14:52:29 -07:00
MaheshRavishankar
b8649098a7
[mlir][SCF] Allow using a custom operation to generate loops with mlir::tileUsingSCF. (#159506)
This change adds an option to use a custom operation to generate the
inter-tile loops during tiling. When the loop type is set to
`scf::SCFTilingOptions::LoopType::CustomOp`, the method
`mlir::tileUsingSCF` provides two callback functions

1. First one to generate the header of the loop.
2. Second one to generate the terminator of the loop.

These methods receive the information needed to generate the
loops/terminator and expect to return information needed to generate
the code for the intra-tile computation. See comments for more
details.

Presently this is adds support only for tiling. Subsequent commits
will update this to add support for fusion as well.

The PR is split into two commits.
1) The first commit is an NFC that just refactors the code (and cleans
up some naming) to make it easier to add the support for custom loop
operations.
2) The second commit adds the support for using a custom loop operation,
as well as a test to exercise this path.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-09-18 09:28:44 -07:00
Mehdi Amini
c5062d7d63
Revert "[mlir] move if-condition propagation to a standalone pass" (#159535)
Reverts llvm/llvm-project#150278

Multiple post-merge comment remained undressed, and some more
fundamental issues were also reported in #159165
2025-09-18 09:54:53 +00:00
Mehdi Amini
fba26bc1a5
[MLIR] Fix SCF loop specialization (peeling) to work on scf.for with non-index type (#158707)
The current code would crash with integer. This is visible on this
modified example (the original with index was incorrect)
2025-09-16 14:09:07 +02:00
Alan Li
b87f1b22a8
[MLIR] Add InParallelOpInterface for parallel combining operations (#157736)
This commit:
- Introduces a new `InParallelOpInterface`, along with the
`ParallelCombiningOpInterface`, represent the parallel updating
operations we have in a parallel loop of `scf.forall`.
- Change the name of `ParallelCombiningOpInterface` to
`InParallelOpInterface` as the naming was quite confusing.
- `ParallelCombiningOpInterface` now is used to generalize operations
that insert into shared tensors within parallel combining regions.
Previously, only `tensor.parallel_insert_slice` was supported directly
in `scf.InParallelOp` regions.
- `tensor.parallel_insert_slice` now implements
`ParallelCombiningOpInterface`.

This change enables future extensions to support additional parallel
combining operations beyond `tensor.parallel_insert_slice`, which have
different update semantics, so the `in_parallel` region can correctly
and safely represent these kinds of operation without potential mistakes
such as races.

Author credits: @qedawkins
2025-09-12 14:23:00 -07:00
Mehdi Amini
8dee9e465b [MLIR] Apply clang-tidy fixes for readability-identifier-naming in ParallelLoopFusion.cpp (NFC) 2025-09-02 08:27:07 -07:00
Matthias Springer
337707a541
[mlir][Transforms] Dialect conversion: Context-aware type conversions (#140434)
This commit adds support for context-aware type conversions: type
conversion rules that can return different types depending on the IR.

There is no change for existing (context-unaware) type conversion rules:
```c++
// Example: Conversion any integer type to f32.
converter.addConversion([](IntegerType t) {
  return Float32Type::get(t.getContext());
}
```

There is now an additional overload to register context-aware type
conversion rules:
```c++
// Example: Type conversion rule for integers, depending on the context:
// Get the defining op of `v`, read its "increment" attribute and return an
// integer with a bitwidth that is increased by "increment".
converter.addConversion([](Value v) -> std::optional<Type> {
  auto intType = dyn_cast<IntegerType>(v.getType());
  if (!intType)
    return std::nullopt;
  Operation *op = v.getDefiningOp();
  if (!op)
    return std::nullopt;
  auto incrementAttr = op->getAttrOfType<IntegerAttr>("increment");
  if (!incrementAttr)
    return std::nullopt;
  return IntegerType::get(v.getContext(),
                          intType.getWidth() + incrementAttr.getInt());
});
```

For performance reasons, the type converter caches the result of type
conversions. This is no longer possible when there context-aware type
conversions because each conversion could compute a different type
depending on the context. There is no performance degradation when there
are only context-unaware type conversions.

Note: This commit just adds context-aware type conversions to the
dialect conversion framework. There are many existing patterns that
still call `converter.convertType(someValue.getType())`. These should be
gradually updated in subsequent commits to call
`converter.convertType(someValue)`.

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
2025-08-27 09:13:52 +02:00
Shay Kleiman
0cbb6e7d6c
[mlir][scf] Expose isPerfectlyNestedForLoops (#152115)
The function `isPerfectlyNestedForLoops` is useful on its own and so I'm
exposing it for downstream use.
2025-08-26 11:05:19 +03:00
Matthias Springer
21b607adbe
[mlir][SCF] scf.for: Add support for unsigned integer comparison (#153379)
Add a new unit attribute to allow for unsigned integer comparison.

Example:
```mlir
scf.for unsigned %iv_32 = %lb_32 to %ub_32 step %step_32 : i32 {
  // body
}
```

Discussion:
https://discourse.llvm.org/t/scf-should-scf-for-support-unsigned-comparison/84655
2025-08-15 10:59:14 +02:00
Michael Marjieh
12b9c0da04
[mlir][scf] Implement Conversion from scf.parallel to Nested scf.for (#147692)
Add a utility function/transform operation to convert `scf.parallel`
loops to nested `scf.for` loops.
2025-08-04 06:21:50 -07:00
Maksim Levental
c090ed53fb
[mlir][NFC] update mlir/Dialect create APIs (33/n) (#150659)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25 16:13:55 -04:00
Jacques Pienaar
07967d4af8
[mlir] Switch to new LDBG macro (#150616)
Change local variants to use new central one.
2025-07-25 18:22:46 +02:00
Longsheng Mou
f047b735e9
[mlir][NFC] Use getDefiningOp<OpTy>() instead of dyn_cast<OpTy>(getDefiningOp()) (#150428)
This PR uses `val.getDefiningOp<OpTy>()` to replace `dyn_cast<OpTy>(val.getDefiningOp())` , `dyn_cast_or_null<OpTy>(val.getDefiningOp())` and `dyn_cast_if_present<OpTy>(val.getDefiningOp())`.
2025-07-25 10:35:51 +08:00
Maksim Levental
588845defd
[mlir][NFC] update mlir/Dialect create APIs (20/n) (#149927)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-24 07:09:58 -05:00
Longsheng Mou
3eb49c482c
[mlir][NFC] Use hasOneBlock instead of llvm::hasSingleElement(region) (#149809) 2025-07-24 10:11:21 +08:00
Kazu Hirata
0925d7572a
[mlir] Remove unused includes (NFC) (#150266)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-23 15:18:53 -07:00
Oleksandr "Alex" Zinenko
9d11accf95
[mlir] move if-condition propagation to a standalone pass (#150278)
This offers a significant speedup over running this as a
canonicalizaiton pattern, up to 10x improvement when running on large
(>100k operations) inputs coming from Polygeist.

It is also not clear whether this transformation is a reasonable
canonicalization as it performs non-local rewrites.
2025-07-23 21:02:40 +02:00
Kazu Hirata
c06d3a7b72
[mlir] Remove unused includes (NFC) (#148769)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-14 22:19:23 -07:00
MaheshRavishankar
c22352175e
[mlir][TilingInterface] Allow tile and fuse to work with ReductionTilingStrategy::PartialReductionOuterParallelStrategy. (#147593)
Since `scf::tileUsingSCF` is the core method used for tiling the root
operation within the `scf::tileConsumersAndFuseProducersUsingSCF`, the
latter can fuse into any tiled loop generated using `scf::tileUsingSCF`.
This patch adds a test for tiling a root operation using
`ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and
fusing producers with it.

Since this strategy generates a rank-reducing extract slice
`tensor::replaceExtractSliceWithTiledProducer` which is the core method
used for the fusion was extended to handle the rank-reducing slices.

Also fix a small bug in the computation of the reduction induction
variable (which needs to use `floorDiv` instead of `ceilDiv`)

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-07-09 08:50:01 -07:00
Hu Yufan
945ed2ed5e
[MLIR][SCF] fix loop pipelining pass use of uninitialized value (#146991)
fix issue https://github.com/llvm/llvm-project/issues/146990
2025-07-05 12:21:28 -07:00
Andrei Golubev
a63f572628
[mlir][bufferization] Return BufferLikeType in BufferizableOpInterface (#144867)
Support custom types (2/N): allow value-owning operations (e.g.
allocation ops) to bufferize custom tensors into custom buffers. This
requires BufferizableOpInterface::getBufferType() to return
BufferLikeType instead of BaseMemRefType.

Affected implementors of the interface are updated accordingly.

Relates to ee070d08163ac09842d9bf0c1315f311df39faf1.
2025-07-02 11:27:35 -07:00
MaheshRavishankar
c873e5f87d
[mlir][TilingInterface] Handle multi operand consumer fusion. (#145193)
For consumer fusion cases of this form

```
%0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg0 = ...) {

  tensor.parallel_insert_slice ... into %arg0
  tensor.parallel_insert_slice ... into %arg1
}
%1 = linalg.generic ... ins(%0#0, %0#1)
```

the current consumer fusion that handles one slice at a time cannot fuse
the consumer into the loop, since fusing along one slice will create and
SSA violation on the other use from the `scf.forall`. The solution is to
allow consumer fusion to allow considering multiple slices at once. This
PR changes the `TilingInterface` methods related to consumer fusion,
i.e.

- `getTiledImplementationFromOperandTile`
- `getIterationDomainFromOperandTile`

to allow fusion while considering multiple operands. It is upto the
`TilingInterface` implementation to return an error if a list of tiles
of the operands cannot result in a consistent implementation of the
tiled operation.

The Linalg operation implementation of `TilingInterface` has been
modified to account for these changes and allow cases where operand
tiles that can result in a consistent tiling implementation are handled.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-25 11:54:38 -07:00
Skrai Pardus
a45fda6aeb
switch type and value ordering for arith Constant[XX]Op (#144636)
This change standardizes the order of the parameters for `Constant[XXX]
Ops` to match with all other `Op` `build()` constructors.

In all instances of generated code for the MLIR dialects's Ops (that is
the TableGen using the .td files to create the .h.inc/.cpp.inc files),
the desired result type is always specified before the value.

Examples: 
```
// ArithOps.h.inc
class ConstantOp : public ::mlir::Op<ConstantOp, ::mlir::OpTrait::ZeroRegions, ::mlir::OpTrait::OneResult, ::mlir::OpTrait::OneTypedResult<::mlir::Type>::Impl, ::mlir::OpTrait::ZeroSuccessors, ::mlir::OpTrait::ZeroOperands, ::mlir::OpTrait::OpInvariants, ::mlir::BytecodeOpInterface::Trait, ::mlir::OpTrait::ConstantLike, ::mlir::ConditionallySpeculatable::Trait, ::mlir::OpTrait::AlwaysSpeculatableImplTrait, ::mlir::MemoryEffectOpInterface::Trait, ::mlir::OpAsmOpInterface::Trait, ::mlir::InferIntRangeInterface::Trait, ::mlir::InferTypeOpInterface::Trait> {
public:
....
static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type result, ::mlir::TypedAttr value);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypedAttr value);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::TypedAttr value);
  static void build(::mlir::OpBuilder &, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::ValueRange operands, ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {});
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::ValueRange operands, ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {});
...
```
```
// ArithOps.h.inc
class SubIOp : public ::mlir::Op<SubIOp, ::mlir::OpTrait::ZeroRegions, ::mlir::OpTrait::OneResult, ::mlir::OpTrait::OneTypedResult<::mlir::Type>::Impl, ::mlir::OpTrait::ZeroSuccessors, ::mlir::OpTrait::NOperands<2>::Impl, ::mlir::OpTrait::OpInvariants, ::mlir::BytecodeOpInterface::Trait, ::mlir::ConditionallySpeculatable::Trait, ::mlir::OpTrait::AlwaysSpeculatableImplTrait, ::mlir::MemoryEffectOpInterface::Trait, ::mlir::InferIntRangeInterface::Trait, ::mlir::arith::ArithIntegerOverflowFlagsInterface::Trait, ::mlir::OpTrait::SameOperandsAndResultType, ::mlir::VectorUnrollOpInterface::Trait, ::mlir::OpTrait::Elementwise, ::mlir::OpTrait::Scalarizable, ::mlir::OpTrait::Vectorizable, ::mlir::OpTrait::Tensorizable, ::mlir::InferTypeOpInterface::Trait> {
public:
...
static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type result, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlagsAttr overflowFlags);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlagsAttr overflowFlags);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlagsAttr overflowFlags);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type result, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlags overflowFlags = ::mlir::arith::IntegerOverflowFlags::none);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlags overflowFlags = ::mlir::arith::IntegerOverflowFlags::none);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::Value lhs, ::mlir::Value rhs, ::mlir::arith::IntegerOverflowFlags overflowFlags = ::mlir::arith::IntegerOverflowFlags::none);
  static void build(::mlir::OpBuilder &, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::ValueRange operands, ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {});
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::ValueRange operands, ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {});
...
```
In comparison, in the distinct case of `ConstantIntOp` and
`ConstantFloatOp`, the ordering of the result type and the value is
switched.

Thus, this PR corrects the ordering of the aforementioned
`Constant[XXX]Ops` to match with other constructors.
2025-06-23 23:35:50 +02:00
MaheshRavishankar
7bc956d3d6
[mlir][PartialReductionTilingInterface] Add support for ReductionTilingStrategy::PartialReductionOuterParallel in tileUsingSCF. (#143988)
Following up from https://github.com/llvm/llvm-project/pull/143467,
this PR adds support for
`ReductionTilingStrategy::PartialReductionOuterParallel` to
`tileUsingSCF`. The implementation of
`PartialReductionTilingInterface` for `Linalg` ops has been updated to
support this strategy as well. This makes the `tileUsingSCF` come on
par with `linalg::tileReductionUsingForall` which will be deprecated
subsequently.

Changes summary
- `PartialReductionTilingInterface` changes :
  - `tileToPartialReduction` method needed to get the induction
    variables of the generated tile loops. This was needed to keep the
    generated code similar to `linalg::tileReductionUsingForall`,
    specifically to create a simplified access for slicing the
intermediate partial results tensor when tiled in `num_threads` mode.
  - `getPartialResultTilePosition` methods needs the induction
    varialbes for the generated tile loops for the same reason above,
    and also needs the `tilingStrategy` to be passed in to generate
    correct code.

The tests in `transform-tile-reduction.mlir` testing the
`linalg::tileReductionUsingForall` have been moved over to test
`scf::tileUsingSCF` with
`ReductionTilingStrategy::PartialReductionOuterParallel`
strategy. Some of the test that were doing further cyclic distribution
of the transformed code from tiling are removed. Those seem like two
separate transformation that were merged into one. Ideally that would
need to happen when resolving the `scf.forall` rather than during
tiling.

Please review only the top commit. Depends on
https://github.com/llvm/llvm-project/pull/143467

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-23 12:27:26 -07:00