The implementation of these methods are legacy and they are removed in
favor of using the `scf::tileUsingSCF` methods as replacements. To get
the latter on par with requirements of the deprecated methods, the
tiling allows one to specify the maximum number of tiles to use instead
of specifying the tile sizes. When tiling to `scf.forall` this
specification is used to generate the `num_threads` version of the
operation.
A slight deviation from previous implementation is that the deprecated
method always generated the `num_threads` variant of the `scf.forall`
operation. Instead now this is driven by the tiling options specified.
This reduces the indexing math generated when the tile sizes are
specified.
**Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> numThreads;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result =linalg::tileToForallOp(b, op, numThreads, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setNumThreads(numThreads);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
This generates the `numThreads` version of the `scf.forall` for the
inter-tile loops, i.e.
```
... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...)
```
**Moving from `linalg::tileToForallOpUsingTileSizes` to
`scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> tileSizes;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result =linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setTileSizes(tileSizes);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
Also note that `linalg::tileToForallOpUsingTileSizes` would effectively
call the `linalg::tileToForallOp` by computing the `numThreads` from the
`op` and `tileSizes` and generate the `numThreads` version of the
`scf.forall`. That is not the case anymore. Instead this will directly
generate the `tileSizes` version of the `scf.forall` op
```
... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...)
```
If you actually want to use the `numThreads` version, it is upto the
caller to compute the `numThreads` and set `options.setNumThreads`
instead of `options.setTileSizes`. Note that there is a slight
difference in the num threads version and tile size version. The former
requires an additional `affine.max` on the tile size to ensure
non-negative tile sizes. When lowering to `numThreads` version this
`affine.max` is not needed since by construction the tile sizes are
non-negative. In previous implementations, the `numThreads` version
generated when using the `linalg::tileToForallOpUsingTileSizes` method
would avoid generating the `affine.max` operation. To get the same
state, downstream users will have to additionally normalize the
`scf.forall` operation.
**Changes to `transform.structured.tile_using_forall`**
The transform dialect op that called into `linalg::tileToForallOp` and
`linalg::tileToForallOpUsingTileSizes` have been modified to call
`scf::tileUsingSCF`. The transform dialect op always generates the
`numThreads` version of the `scf.forall` op. So when `tile_sizes` are
specified for the transform dialect op, first the `tile_sizes` version
of the `scf.forall` is generated by the `scf::tileUsingSCF` method which
is then further normalized to get back to the same state. So there is no
functional change to `transform.structured.tile_using_forall`. It always
generates the `numThreads` version of the `scf.forall` op (as it did
before this change).
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This patch is a first pass at making consistent syntax across the
`LinalgTransformOp`s that use dynamic index lists for size parameters.
Previously, there were two different forms: inline types in the list, or
place them in the functional style tuple. This patch goes for the
latter.
In order to do this, the `printPackedOrDynamicIndexList`,
`printDynamicIndexList` and their `parse` counterparts were modified so
that the types can be optionally provided to the corresponding custom
directives.
All affected ops now use tablegen `assemblyFormat`, so custom
`parse`/`print` functions have been removed. There are a couple ops that
will likely add dynamic size support, and once that happens it should be
made sure that the assembly remains consistent with the changes in this
patch.
The affected ops are as follows: `pack`, `pack_greedily`,
`tile_using_forall`. The `tile_using_for` and `vectorize` ops already
used this syntax, but their custom assembly was removed.
---------
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
Using `LoopLikeOpInterface` as the basis for the implementation unifies
all the tiling logic for both `scf.for` and `scf.forall`. The only
difference is the actual loop generation. This is a follow up to
https://github.com/llvm/llvm-project/pull/72178
Instead of many entry points for each loop type, the loop type is now
passed as part of the options passed to the tiling method.
This is a breaking change with the following changes
1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF`
2) The `scf::tileUsingSCFForallOp` is deprecated. The same
functionality is obtained by using `scf::tileUsingSCF` and setting
the loop type in `scf::SCFTilingOptions` passed into this method to
`scf::SCFTilingOptions::LoopType::ForallOp` (using the
`setLoopType` method).
3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is
renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of
the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing
any strategy with the default callback implemeting the greedy fusion.
4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use
`SmallVector<LoopLikeOpInterface>`.
5) To make `scf::ForallOp` implement the parts of
`LoopLikeOpInterface` needed, the `getOutputBlockArguments()`
method is replaced with `getRegionIterArgs()`
These changes now bring the tiling and fusion capabilities using
`scf.forall` on par with what was already supported by `scf.for`
In the process a couple of test transform dialect ops are added just
for testing. These operations are not intended to use as full flushed
out of transformation ops, but are rather operations added for testing.
A separate operation is added to `LinalgTransformOps.td` to convert a
`TilingInterface` operation to loops using the
`generateScalarImplementation` method implemented by the
operation. Eventually this and other operations related to tiling
using the `TilingInterface` need to move to a better place (i.e. out
of `Linalg` dialect)
The current implementation of tiling using `scf.for` is convoluted to
make sure that the destination passing style of the untiled program is
preserved. The addition of support to tile using `scf.forall` (adapted
from the transform operation in Linalg) in
https://github.com/llvm/llvm-project/pull/67083 used cloning of the
tiled operations to better streamline the implementation. This PR adapts
the other tiling methods to use a similar approach, making the
transformations (and handling destination passing style semantics) more
systematic.
---------
Co-authored-by: Abhishek-Varma <avarma094@gmail.com>
tensor.empty/linalg.init_tensor produces an uninititalized tensor that can be used as a destination operand for destination-style ops (ops that implement `DestinationStyleOpInterface`).
This change makes it possible to implement `TilingInterface` for non-destination-style ops without depending on the Linalg dialect.
RFC: https://discourse.llvm.org/t/rfc-add-tensor-from-shape-operation/65101
Differential Revision: https://reviews.llvm.org/D135129
The existing implementation of the TilingInterface for Linalg ops was not
modifying the `linalg.index` ops contained within other Linalg ops (they need
to be summed up with the values of respective tile loop induction variables),
which led to the interface-based tiling being incorrect for any Linalg op with
index semantics.
In the process, fix the function performing the index offsetting to use the
pattern rewriter API instead of RAUW as it is being called from patterns and
may mess up the internal state of the rewriter. Also rename the function to
clearly catch all uses.
Depends On D129365
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D129366
This patch implements tile and fuse transformation for ops that
implement the tiling interface. To do so,
- `TilingInterface` needs a new method that generates a tiled
implementation of the operation based on the tile of the result
needed.
- A pattern is added that replaces a `tensor.extract_slice` whose
source is defined by an operation that implements the
`TilingInterface` with a tiled implementation that produces the
extracted slice in-place (using the method added to
`TilingInterface`).
- A pattern is added that takes a sequence of operations that
implement the `TilingInterface` (for now `LinalgOp`s), tiles the
consumer, and greedily fuses its producers iteratively.
Differential Revision: https://reviews.llvm.org/D127809
This patch implements tile and fuse transformation for ops that
implement the tiling interface. To do so,
- `TilingInterface` needs a new method that generates a tiled
implementation of the operation based on the tile of the result
needed.
- A pattern is added that replaces a `tensor.extract_slice` whose
source is defined by an operation that implements the
`TilingInterface` with a tiled implementation that produces the
extracted slice in-place (using the method added to
`TilingInterface`).
- A pattern is added that takes a sequence of operations that
implement the `TilingInterface` (for now `LinalgOp`s), tiles the
consumer, and greedily fuses its producers iteratively.
Differential Revision: https://reviews.llvm.org/D127809
This patch adds support for tiling operations that implement the
TilingInterface.
- It separates the loop constructs that are used to iterate over tile
from the implementation of the tiling itself. For example, the use
of destructive updates is more related to use of scf.for for
iterating over tiles that are tensors.
- To test the transformation, TilingInterface is implemented for
LinalgOps. The separation of the looping constructs used from the
implementation of tile code generation greatly simplifies the
latter.
- The implementation of TilingInterface for LinalgOp is kept as an
external model for now till this approach can be fully flushed out
to replace the existing tiling + fusion approaches in Linalg.
Differential Revision: https://reviews.llvm.org/D127133