13 Commits

Author SHA1 Message Date
Quinn Dawkins
45af50d9ca
[MLIR] Add fusability query to TilingInterface (#166502)
This introduces `isOpFusableWithProducer/Consumer` methods to the
TilingInterface that enable querying whether a tilable op can be fused
into a given set of producer slices or consumer slice without generating
IR. This is needed to enable use of the tiling interface in pattern
rewrites, as without this any pattern rewrite that tries to invoke the
method to tile is allowed to generate IR and fail.
2025-12-04 22:17:45 -05:00
MaheshRavishankar
dbeda4f419
[mlir][SCF] Add scf::tileAndFuseConsumer that tiles a consumer into a given tiled loop nest. (#167634)
The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices
(and loops they are part of), tries to find the consumer of these slices
(all slices are expected to be the same consumer), and then tiles the
consumer into the loop nest using the `TilingInterface`. A more natural
way of doing consumer fusion is to just start from the consumer, look
for operands that are produced by the loop nest passed in as `loops`
(presumably these loops are generated by tiling, but that is not a
requirement for consumer fusion). Using the consumer you can find the
slices of the operands that are accessed within the loop which you can
then use to tile and fuse the consumer (using `TilingInterface`). This
handles more naturally the case where multiple operands of the consumer
come from the loop nest.

The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of
`scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a
single producer for the source of the slice, which makes it a natural
way of specifying producer fusion. But for consumers, the result might
have multiple users, resulting in multiple candidates for fusion, as
well as a fusion candidate using multiple results from the tiled loop
nest. This means using slices
(`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for
consumer fusion turns out to be quite hard to navigate. The use of the
consumer directly avoids all those pain points. In time the
`scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of
`scf::tileAndFuseConsumer`. There is a lot of tech-debt that has
accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be
cleanedup. So while that gets cleaned up, and required functionality is
moved to `scf::tileAndFuseConsumer`, the old path is still maintained.

The test for `scf::tileAndFuseConsumerUsingSlices` is copied to
`tile-and-fuse-consumer.mlir` to
`tile-and-fuse-consumer-using-slices.mlir`. All the tests that were
there in this file are now using the `tileAndFuseConsumer` method. The
test op `test.tile_and_fuse_consumer` is modified to call
`scf::tileAndFuseConsumer`, while a new op
`test.tile_and_fuse_consumer_of_slice` is used to keep the old path
tested while it is deprecated.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-11-20 14:13:57 -08:00
MaheshRavishankar
fa19a57b87
[mlir][SCF] Allow using a custom operation to generate loops with mlir::tileUsingSCF. (#159660)
This change adds an option to use a custom operation to generate the
inter-tile loops during tiling. When the loop type is set to
scf::SCFTilingOptions::LoopType::CustomOp, the method
mlir::tileUsingSCF provides two callback functions

First one to generate the header of the loop.
Second one to generate the terminator of the loop.
These methods receive the information needed to generate the
loops/terminator and expect to return information needed to generate
the code for the intra-tile computation. See comments for more
details.

Presently this is adds support only for tiling. Subsequent commits
will update this to add support for fusion as well.

The PR is split into two commits.

The first commit is an NFC that just refactors the code (and cleans up
some naming) to make it easier to add the support for custom loop
operations.
The second commit adds the support for using a custom loop operation, as
well as a test to exercise this path.

Note that this is duplicate of
https://github.com/llvm/llvm-project/pull/159506 that was accidently
committed and was reverted in
https://github.com/llvm/llvm-project/pull/159598 to wait for reviews.

Signed-off-by: MaheshRavishankar
[mahesh.ravishankar@gmail.com](mailto:mahesh.ravishankar@gmail.com)

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-09-22 15:20:02 -07:00
MaheshRavishankar
cfaf23927c
Revert "[mlir][SCF] Allow using a custom operation to generate loops with mlir::tileUsingSCF." (#159598)
Reverts llvm/llvm-project#159506

It was committed by accident. Reverting it for reviews.
2025-09-18 14:52:29 -07:00
MaheshRavishankar
b8649098a7
[mlir][SCF] Allow using a custom operation to generate loops with mlir::tileUsingSCF. (#159506)
This change adds an option to use a custom operation to generate the
inter-tile loops during tiling. When the loop type is set to
`scf::SCFTilingOptions::LoopType::CustomOp`, the method
`mlir::tileUsingSCF` provides two callback functions

1. First one to generate the header of the loop.
2. Second one to generate the terminator of the loop.

These methods receive the information needed to generate the
loops/terminator and expect to return information needed to generate
the code for the intra-tile computation. See comments for more
details.

Presently this is adds support only for tiling. Subsequent commits
will update this to add support for fusion as well.

The PR is split into two commits.
1) The first commit is an NFC that just refactors the code (and cleans
up some naming) to make it easier to add the support for custom loop
operations.
2) The second commit adds the support for using a custom loop operation,
as well as a test to exercise this path.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-09-18 09:28:44 -07:00
MaheshRavishankar
c22352175e
[mlir][TilingInterface] Allow tile and fuse to work with ReductionTilingStrategy::PartialReductionOuterParallelStrategy. (#147593)
Since `scf::tileUsingSCF` is the core method used for tiling the root
operation within the `scf::tileConsumersAndFuseProducersUsingSCF`, the
latter can fuse into any tiled loop generated using `scf::tileUsingSCF`.
This patch adds a test for tiling a root operation using
`ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and
fusing producers with it.

Since this strategy generates a rank-reducing extract slice
`tensor::replaceExtractSliceWithTiledProducer` which is the core method
used for the fusion was extended to handle the rank-reducing slices.

Also fix a small bug in the computation of the reduction induction
variable (which needs to use `floorDiv` instead of `ceilDiv`)

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-07-09 08:50:01 -07:00
MaheshRavishankar
c873e5f87d
[mlir][TilingInterface] Handle multi operand consumer fusion. (#145193)
For consumer fusion cases of this form

```
%0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg0 = ...) {

  tensor.parallel_insert_slice ... into %arg0
  tensor.parallel_insert_slice ... into %arg1
}
%1 = linalg.generic ... ins(%0#0, %0#1)
```

the current consumer fusion that handles one slice at a time cannot fuse
the consumer into the loop, since fusing along one slice will create and
SSA violation on the other use from the `scf.forall`. The solution is to
allow consumer fusion to allow considering multiple slices at once. This
PR changes the `TilingInterface` methods related to consumer fusion,
i.e.

- `getTiledImplementationFromOperandTile`
- `getIterationDomainFromOperandTile`

to allow fusion while considering multiple operands. It is upto the
`TilingInterface` implementation to return an error if a list of tiles
of the operands cannot result in a consistent implementation of the
tiled operation.

The Linalg operation implementation of `TilingInterface` has been
modified to account for these changes and allow cases where operand
tiles that can result in a consistent tiling implementation are handled.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-25 11:54:38 -07:00
MaheshRavishankar
e4172196a7
[mlir][TilingInterface] Make tileAndFuseConsumerOfSlice take surrounding loops as an argument. (#132082)
This gets the consumer fusion method in sync with the corresponding
producer fusion method `tileAndFuseProducerOfSlice`. Not taking this as
input required use of complicated analysis to retrieve the surrounding
loops which are very fragile. Just like the producer fusion method, the
loops need to be taken in as an argument, with typically the loops being
created by the tiling methods.

Some utilities are added to check that the loops passed in are perfectly
nested (in the case of an `scf.for` loop nest.

This is change 1 of N to simplify the implementation of tile and fuse
consumers.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-03-24 11:41:26 -07:00
Yun-Fly
9bc3102bea
[mlir][scf] Extend consumer fusion to multiple tilable users (#111955)
Before, consumer fusion expects single usage(or others are terminator
op). This patch supports multiple tilable consumers fusion. 
E.g.
```
%0 = scf.for {
  ...
  %p = tiledProducer
  ...
}
%1 = tilableConsumer1 ins(%0 : ...)
%2 = tilableConsumer2 ins(%0 : ...)
```
===>
```
%0:3 = scf.for {
  ...
  %p = tiledProducer
  %1 = tiledConsumer1 ins(%p : ...)
  %2 = tiledConsumer2 ins(%p : ...)
  ...
}
```
The key process is ensuring that the first user of loop 
should not dominate any define of consumer operand(s).
2024-11-06 10:03:23 +08:00
Abhishek Varma
2b2ce50fe8
[MLIR][SCF] Add an API to fuse consumer to a producer within scf loop (#88712)
This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse consumer to a producer within
scf.for/scf.forall loop.

To support this two new methods are added to the `TilingInterface`
- `getIterationDomainTileFromOperandTile`
- `getTiledImplementationFromOperandTile`.

Consumer operations that implement this method can be used to be fused with tiled producer operands in a manner similar to (but essentially the inverse of) the fusion of an untiled producer with a tiled consumer.

Note that this only does one `tiled producer` -> `consumer` fusion. This could be called repeatedly for fusing multiple consumers. The current implementation also is conservative in when this kicks in (like single use of the value returned by the inter-tile loops that surround the tiled producer, etc.) These can be relaxed over time.

Signed-off-by: Abhishek Varma <abhvarma@amd.com>

---------

Signed-off-by: Abhishek Varma <abhvarma@amd.com>
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
Co-authored-by: cxy <chenxunyu1993@gmail.com>
2024-06-01 11:23:41 -07:00
Oleksandr "Alex" Zinenko
5a9bdd85ee
[mlir] split transform interfaces into a separate library (#85221)
Transform interfaces are implemented, direction or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.

Split out the interfaces into a separate library. This in turn requires
flipping the dependency from the interface on the dialect that has crept
in because both co-existed in one library. The interface shouldn't
depend on the transform dialect either.

As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
2024-03-20 22:15:17 +01:00
MaheshRavishankar
76ead96c1d
[mlir][TilingInterface] Use LoopLikeOpInterface in tiling using SCF to unify tiling with scf.for and scf.forall. (#77874)
Using `LoopLikeOpInterface` as the basis for the implementation unifies
all the tiling logic for both `scf.for` and `scf.forall`. The only
difference is the actual loop generation. This is a follow up to
https://github.com/llvm/llvm-project/pull/72178
Instead of many entry points for each loop type, the loop type is now
passed as part of the options passed to the tiling method.

This is a breaking change with the following changes

1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF`
2) The `scf::tileUsingSCFForallOp` is deprecated. The same
   functionality is obtained by using `scf::tileUsingSCF` and setting
   the loop type in `scf::SCFTilingOptions` passed into this method to
   `scf::SCFTilingOptions::LoopType::ForallOp` (using the
   `setLoopType` method).
3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is
   renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of
   the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing
   any strategy with the default callback implemeting the greedy fusion.
4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use
   `SmallVector<LoopLikeOpInterface>`.
5) To make `scf::ForallOp` implement the parts of
   `LoopLikeOpInterface` needed, the `getOutputBlockArguments()`
   method is replaced with `getRegionIterArgs()`

These changes now bring the tiling and fusion capabilities using
`scf.forall` on par with what was already supported by `scf.for`
2024-01-25 21:26:23 -08:00
MaheshRavishankar
aa2a96a24a
[mlir][TilingInterface] Move TilingInterface tests to use transform dialect ops. (#77204)
In the process a couple of test transform dialect ops are added just
for testing. These operations are not intended to use as full flushed
out of transformation ops, but are rather operations added for testing.

A separate operation is added to `LinalgTransformOps.td` to convert a
`TilingInterface` operation to loops using the
`generateScalarImplementation` method implemented by the
operation. Eventually this and other operations related to tiling
using the `TilingInterface` need to move to a better place (i.e. out
of `Linalg` dialect)
2024-01-11 21:31:03 -08:00