This introduces `isOpFusableWithProducer/Consumer` methods to the
TilingInterface that enable querying whether a tilable op can be fused
into a given set of producer slices or consumer slice without generating
IR. This is needed to enable use of the tiling interface in pattern
rewrites, as without this any pattern rewrite that tries to invoke the
method to tile is allowed to generate IR and fail.
The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices
(and loops they are part of), tries to find the consumer of these slices
(all slices are expected to be the same consumer), and then tiles the
consumer into the loop nest using the `TilingInterface`. A more natural
way of doing consumer fusion is to just start from the consumer, look
for operands that are produced by the loop nest passed in as `loops`
(presumably these loops are generated by tiling, but that is not a
requirement for consumer fusion). Using the consumer you can find the
slices of the operands that are accessed within the loop which you can
then use to tile and fuse the consumer (using `TilingInterface`). This
handles more naturally the case where multiple operands of the consumer
come from the loop nest.
The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of
`scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a
single producer for the source of the slice, which makes it a natural
way of specifying producer fusion. But for consumers, the result might
have multiple users, resulting in multiple candidates for fusion, as
well as a fusion candidate using multiple results from the tiled loop
nest. This means using slices
(`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for
consumer fusion turns out to be quite hard to navigate. The use of the
consumer directly avoids all those pain points. In time the
`scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of
`scf::tileAndFuseConsumer`. There is a lot of tech-debt that has
accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be
cleanedup. So while that gets cleaned up, and required functionality is
moved to `scf::tileAndFuseConsumer`, the old path is still maintained.
The test for `scf::tileAndFuseConsumerUsingSlices` is copied to
`tile-and-fuse-consumer.mlir` to
`tile-and-fuse-consumer-using-slices.mlir`. All the tests that were
there in this file are now using the `tileAndFuseConsumer` method. The
test op `test.tile_and_fuse_consumer` is modified to call
`scf::tileAndFuseConsumer`, while a new op
`test.tile_and_fuse_consumer_of_slice` is used to keep the old path
tested while it is deprecated.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This change adds an option to use a custom operation to generate the
inter-tile loops during tiling. When the loop type is set to
scf::SCFTilingOptions::LoopType::CustomOp, the method
mlir::tileUsingSCF provides two callback functions
First one to generate the header of the loop.
Second one to generate the terminator of the loop.
These methods receive the information needed to generate the
loops/terminator and expect to return information needed to generate
the code for the intra-tile computation. See comments for more
details.
Presently this is adds support only for tiling. Subsequent commits
will update this to add support for fusion as well.
The PR is split into two commits.
The first commit is an NFC that just refactors the code (and cleans up
some naming) to make it easier to add the support for custom loop
operations.
The second commit adds the support for using a custom loop operation, as
well as a test to exercise this path.
Note that this is duplicate of
https://github.com/llvm/llvm-project/pull/159506 that was accidently
committed and was reverted in
https://github.com/llvm/llvm-project/pull/159598 to wait for reviews.
Signed-off-by: MaheshRavishankar
[mahesh.ravishankar@gmail.com](mailto:mahesh.ravishankar@gmail.com)
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This change adds an option to use a custom operation to generate the
inter-tile loops during tiling. When the loop type is set to
`scf::SCFTilingOptions::LoopType::CustomOp`, the method
`mlir::tileUsingSCF` provides two callback functions
1. First one to generate the header of the loop.
2. Second one to generate the terminator of the loop.
These methods receive the information needed to generate the
loops/terminator and expect to return information needed to generate
the code for the intra-tile computation. See comments for more
details.
Presently this is adds support only for tiling. Subsequent commits
will update this to add support for fusion as well.
The PR is split into two commits.
1) The first commit is an NFC that just refactors the code (and cleans
up some naming) to make it easier to add the support for custom loop
operations.
2) The second commit adds the support for using a custom loop operation,
as well as a test to exercise this path.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Since `scf::tileUsingSCF` is the core method used for tiling the root
operation within the `scf::tileConsumersAndFuseProducersUsingSCF`, the
latter can fuse into any tiled loop generated using `scf::tileUsingSCF`.
This patch adds a test for tiling a root operation using
`ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and
fusing producers with it.
Since this strategy generates a rank-reducing extract slice
`tensor::replaceExtractSliceWithTiledProducer` which is the core method
used for the fusion was extended to handle the rank-reducing slices.
Also fix a small bug in the computation of the reduction induction
variable (which needs to use `floorDiv` instead of `ceilDiv`)
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
For consumer fusion cases of this form
```
%0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg0 = ...) {
tensor.parallel_insert_slice ... into %arg0
tensor.parallel_insert_slice ... into %arg1
}
%1 = linalg.generic ... ins(%0#0, %0#1)
```
the current consumer fusion that handles one slice at a time cannot fuse
the consumer into the loop, since fusing along one slice will create and
SSA violation on the other use from the `scf.forall`. The solution is to
allow consumer fusion to allow considering multiple slices at once. This
PR changes the `TilingInterface` methods related to consumer fusion,
i.e.
- `getTiledImplementationFromOperandTile`
- `getIterationDomainFromOperandTile`
to allow fusion while considering multiple operands. It is upto the
`TilingInterface` implementation to return an error if a list of tiles
of the operands cannot result in a consistent implementation of the
tiled operation.
The Linalg operation implementation of `TilingInterface` has been
modified to account for these changes and allow cases where operand
tiles that can result in a consistent tiling implementation are handled.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This gets the consumer fusion method in sync with the corresponding
producer fusion method `tileAndFuseProducerOfSlice`. Not taking this as
input required use of complicated analysis to retrieve the surrounding
loops which are very fragile. Just like the producer fusion method, the
loops need to be taken in as an argument, with typically the loops being
created by the tiling methods.
Some utilities are added to check that the loops passed in are perfectly
nested (in the case of an `scf.for` loop nest.
This is change 1 of N to simplify the implementation of tile and fuse
consumers.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse consumer to a producer within
scf.for/scf.forall loop.
To support this two new methods are added to the `TilingInterface`
- `getIterationDomainTileFromOperandTile`
- `getTiledImplementationFromOperandTile`.
Consumer operations that implement this method can be used to be fused with tiled producer operands in a manner similar to (but essentially the inverse of) the fusion of an untiled producer with a tiled consumer.
Note that this only does one `tiled producer` -> `consumer` fusion. This could be called repeatedly for fusing multiple consumers. The current implementation also is conservative in when this kicks in (like single use of the value returned by the inter-tile loops that surround the tiled producer, etc.) These can be relaxed over time.
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
---------
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
Co-authored-by: cxy <chenxunyu1993@gmail.com>
Transform interfaces are implemented, direction or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.
Split out the interfaces into a separate library. This in turn requires
flipping the dependency from the interface on the dialect that has crept
in because both co-existed in one library. The interface shouldn't
depend on the transform dialect either.
As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
Using `LoopLikeOpInterface` as the basis for the implementation unifies
all the tiling logic for both `scf.for` and `scf.forall`. The only
difference is the actual loop generation. This is a follow up to
https://github.com/llvm/llvm-project/pull/72178
Instead of many entry points for each loop type, the loop type is now
passed as part of the options passed to the tiling method.
This is a breaking change with the following changes
1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF`
2) The `scf::tileUsingSCFForallOp` is deprecated. The same
functionality is obtained by using `scf::tileUsingSCF` and setting
the loop type in `scf::SCFTilingOptions` passed into this method to
`scf::SCFTilingOptions::LoopType::ForallOp` (using the
`setLoopType` method).
3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is
renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of
the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing
any strategy with the default callback implemeting the greedy fusion.
4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use
`SmallVector<LoopLikeOpInterface>`.
5) To make `scf::ForallOp` implement the parts of
`LoopLikeOpInterface` needed, the `getOutputBlockArguments()`
method is replaced with `getRegionIterArgs()`
These changes now bring the tiling and fusion capabilities using
`scf.forall` on par with what was already supported by `scf.for`
In the process a couple of test transform dialect ops are added just
for testing. These operations are not intended to use as full flushed
out of transformation ops, but are rather operations added for testing.
A separate operation is added to `LinalgTransformOps.td` to convert a
`TilingInterface` operation to loops using the
`generateScalarImplementation` method implemented by the
operation. Eventually this and other operations related to tiling
using the `TilingInterface` need to move to a better place (i.e. out
of `Linalg` dialect)