`getDestinationOperands` was almost a duplicate of `DestinationStyleOpInterface::getOutputOperands`. Now that the interface has been moved to mlir/Interfaces, it is no longer needed.
Differential Revision: https://reviews.llvm.org/D136240
The transition to transform dialect based tests dropped several cases of
the split reduction testing. Adding them back.
Differential Revision: https://reviews.llvm.org/D136287
This class adds helper functions similar to `emitError` for the
DiagnosedSilenceableFailure class in both the silenceable and definite
failure cases. These helpers simplify the use of said class and make
tranfsorm op application code idiomatic.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D136072
This revision adds the ability to fuse tileable ops with multiple results to
the transform.fuse_into_containing_op.
Differential Revision: https://reviews.llvm.org/D135955
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785
Uses of `LinalgTilingPattern::returningMatchAndRewrite` are replaced by a top-level `tileWithLinalgTilingOptions` function that is marked obsolete and serves
as a temporary means to transition away from `LinalgTilingOptions`-based tiling.
LinalgTilingOptions supports too many options that have been orthogonalized with the use of the transform dialect.
Additionally, the revision introduces a `transform.structured.tile_to_scf_for` structured transform operation that is needed to properly tile `tensor.pad`
via the TilingInterface. Uses of `transform.structured.tile` will be deprecated and replaced by this new op.
This will achieve the deprecation of `linalg::tileLinalgOp`.
Context: https://discourse.llvm.org/t/psa-retire-tileandfuselinalgops-method/63850
In the process of transitioning, tests that were performing tile and distribute on tensors are retired: transformations should be orthogonalized better in the future.
In particular, tiling to specific loop types and tileAndDistribute behavior are not available via the transform ops.
The behavior is still available as part of the `tileWithLinalgTilingOptions` method to allow downstream clients to transition without breakages but is meant to be retired soon.
As more tests are ported to the transform dialect, it became necessary to introduce a test-transform-dialect-erase-schedule-pass to discard the transform specification
once applied so that e2e lowering and execution is possible.
Lastly, a number of redundant tests that were testing composition of patterns are retired as they are available with a better mechanism via the transform dialect.
Differential Revision: https://reviews.llvm.org/D135573
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785
In the process, also retire `tileConsumerAndFuseProducers` that is now replaced by `tileConsumerAndFuseProducerGreedilyUsingSCFForOp`.
Context: https://discourse.llvm.org/t/psa-retire-tileandfuselinalgops-method/63850
When performing this replacement, a change of behavior appeared: the older `tileConsumerAndFuseProducers` would split the parallel
and non-parallel dimensions automatically and perform a first level of tile-and-fuse on parallel dimensions only and then introduce a
second level of tiling-only on the reduction dimensions. The newer `tileConsumerAndFuseProducerGreedilyUsingSCFForOp` on the other hand
does not perform this breakdown. As a consequence, the transform specification is evolved to produce the same output.
Additionally, replace some uses of `unsigned` by `int64_t` where possible without pulling in larger interface changes (left for a future PR).
Context: https://www.youtube.com/watch?v=Puio5dly9N8
Lastly, tests that were performing tile and fuse and distribute on tensors are retired: the generated IR mixing scf.for, tensors and
distributed processor ids was racy at best ..
Differential Revision: https://reviews.llvm.org/D135559
The transform.split_handles op is useful for ensuring a statically known number of operations are
tracked by the source `handle` and to extract them into individual handles
that can be further manipulated in isolation.
In the process of making the op robust wrt to silenceable errors and the suppress mode, issues were
uncovered and fixed.
The main issue was that silenceable errors were short-circuited too early and the payloads were not
set. This resulted in suppressed silenceable errors not propagating correctly.
Fixing the issue triggered a few test failures: silenceable error returns now must properly set the results state.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D135426
This revision adds GPU transform dialect. It also introduce a prefix such as "transform.gpu" for all ops related to this dialect.
MLIR already had two GPU transform op in linalg. This revision moves these ops into GPUTransformOps. The Ops are as follows:
`transform.structured.map_nested_foreach_thread_to_gpu_blocks` -> `transform.gpu.map_foreach_to_blocks`
This op selects the outermost (toplevel) foreach_thread and parallelize across GPU blocks. It can also generate `gpu_launch`.
`transform.structured.map_nested_foreach_thread_to_gpu_threads` -> `transform.gpu.map_nested_foreach_to_threads`
This op parallelizes nested foreach_thread that are inside `gpu_launch` across GPU threads.
It doesn't add new functionality, but there are some minor refactoring of the code.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D134800
Previously, splitReduction transformation added the split parallel dimension
*before* the reduction dimension, leading to tiling for reduction. This
commit creates an option to create the parallel dimension *after* the
reduction dimension, allowing us to transform the op into vertical reduction
with SIMD parallelism.
Reviewed By: ThomasRaoux, dcaballe
Differential Revision: https://reviews.llvm.org/D134764
This revision refactors gpu block id generator lambda that is used in the transform dialect. It removes the lambda and instead uses a static function that's name generateGpuBlockIds.
It also simplifies arguments that the function takes.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134724
This revision adds a new op `map_nested_foreach_thread_to_gpu_blocks` to transform dialect.
If `generate_gpu_launch` argument is given, the op first generates `gpu_launch`. Otherwise, `target` must be `gpu_launch`. The op searches top level `scf.foreach_threads` inside the `gpu_launch` and distributes them with gpu.block_id attribute.
Loop mapping is explicit and given by the map_nested_foreach_thread_to_gpu_blocks op. Mapping is done one-to-one, therefore the loops disappear.
It also adds `gpu dialect` as dependent since the new op can create `gpu::LaunchOp` for given `scf::ForeachThreadOp`.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134190
This allows downstream uses to use the implementation of the tiling
itself, while performing other transformations that are necessary to
go with it.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134335
This revision adds a new op `map_nested_foreach_thread_to_gpu_threads` to transform dialect. The op searches `scf.foreach_threads` inside the `gpu_launch` and distributes them with `gpu.thread_id` attribute.
Loop mapping is explicit and given by the `map_nested_foreach_thread_to_gpu_threads` op. Mapping is done one-to-one, therefore the loops dissappear.
The dynamic trip count or trip count that are larger than thread size are not supported for the time being. However, we can indeed support them by generating a loop inside with cyclic scheduling.
For the time being, trip counts that are dynamic or bigger than thread sizes are not supported. However, in the future the compiler can indeed generate a loop with static cyclic scheduling to support these cases.
Current mechanism allows `scf.foreach_threads` to be siblings or nested. There cannot be interleaving code between the loops when they are nested.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D133950
This revision revisits the implementation of `transform.fuse_into_containing_op` so that it iterates on
producers one use at a time.
Support is added to fuse a producer through a foreach_thread shared tensor argument, in which case we
tile and fuse the op inside the containing op and update the shared tensor argument to the unique destination operand.
If one cannot find such a unique destination operand the transform fails.
Differential Revision: https://reviews.llvm.org/D134051
This revision retires the LinalgCodegenStrategy vectorization pattern. Please see the context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785.
This revision improves the transform dialect's VectorizeOp in different ways below:
- Adds LinalgDialect as a dependent dialect. When `transform.structured.vectorize` vectorizes `tensor.pad`, it generates `linalg.init_tensor`. In this case, linalg dialect must be registered.
- Inserts CopyVectorizationPattern in order to vectorize `memref.copy`.
- Creates two attributes: `disable_multi_reduction_to_contract_patterns` and `disable_transfer_permutation_map_lowering_patterns`. They are limiting the power of vectorization and are currently intended for testing purposes.
It also removes some of the "CHECK: vector.transfer_write" in the vectorization.mlir test. They are redundant writes, at the end of the code there is a rewrite to the same place. Transform dialect no longer generates them.
Depends on D133684 that retires the LinalgCodegenStrategy vectorization pass.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D133699
This revision revisits the implementation of `transform.fuse_into_containing_op` so that it iterates on
producers one use at a time.
Support is added to fuse a producer through a foreach_thread shared tensor argument, in which case we
tile and fuse the op inside the containing op and update the shared tensor argument to the unique destination operand.
If one cannot find such a unique destination operand the transform fails.
`getTiledImplementation`/`generateResultTileValue` only computes the tiled operation, but does not insert the result into any tensor.
Differential Revision: https://reviews.llvm.org/D133015
TileToForeachThreadOp now accepts mixed SSA value operands / index attributes for tile_sizes and num_threads. (Reusing OperandsOrIntegersSizesList.) In case of an operand, a PDL_Operation must be specified that is mapped to a payload op that returns the tile size or number of threads.
Differential Revision: https://reviews.llvm.org/D131949
This patch adds support for string literals as `custom` directive
arguments. This can be useful for re-using custom parsers and printers
when arguments have a known value. For example:
```
ParseResult parseTypedAttr(AsmParser &parser, Attribute &attr, Type type) {
return parser.parseAttribute(attr, type);
}
void printTypedAttr(AsmPrinter &printer, Attribute attr, Type type) {
return parser.printAttributeWithoutType(attr);
}
```
And in TableGen:
```
def FooOp : ... {
let arguments = (ins AnyAttr:$a);
let assemblyFormat = [{ custom<TypedAttr>($a, "$_builder.getI1Type()")
attr-dict }];
}
def BarOp : ... {
let arguments = (ins AnyAttr:$a);
let assemblyFormat = [{ custom<TypedAttr>($a, "$_builder.getIndexType()")
attr-dict }];
}
```
Instead of writing two separate sets of custom parsers and printers.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D131603
This patch removes the `type` field from `Attribute` along with the
`Attribute::getType` accessor.
Going forward, this means that attributes in MLIR will no longer have
types as a first-class concept. This patch lays the groundwork to
incrementally remove or refactor code that relies on generic attributes
being typed. The immediate impact will be on attributes that rely on
`Attribute` containing a type, such as `IntegerAttr`,
`DenseElementsAttr`, and `ml_program::ExternAttr`, which will now need
to define a type parameter on their storage classes. This will save
memory as all other attribute kinds will no longer contain a type.
Moreover, it will not be possible to generically query the type of an
attribute directly. This patch provides an attribute interface
`TypedAttr` that implements only one method, `getType`, which can be
used to generically query the types of attributes that implement the
interface. This interface can be used to retain the concept of a "typed
attribute". The ODS-generated accessor for a `type` parameter
automatically implements this method.
Next steps will be to refactor the assembly formats of certain operations
that rely on `parseAttribute(type)` and `printAttributeWithoutType` to
remove special handling of type elision until `type` can be removed from
the dialect parsing hook entirely; and incrementally remove uses of
`TypedAttr`.
Reviewed By: lattner, rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D130092
The structured op splitting transformation is conceptually similar to
tiling in the sense that it decomposes the iteration space of the
original op into several parts. Therefore, it is possible to implement
it using the TilingInterface to operate on iteration spaces and their
parts. However, the implementation also requires to pass updated input
operands, which is not supported by the interface, so the implementation
currently remains Linalg-specific.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D129564
The current Parser library is solely focused on providing API for
the textual MLIR format, but MLIR will soon also provide a binary
format. This commit renames the current Parser library to AsmParser to
better correspond to what the library is actually intended for. A new
Parser library is added which will act as a unified parser interface
between both text and binary formats. Most parser clients are
unaffected, given that the unified interface is essentially the same as
the current interface. Only clients that rely on utilizing the
AsmParserState, or those that want to parse Attributes/Types need to be
updated to point to the AsmParser library.
Differential Revision: https://reviews.llvm.org/D129605
In the Transform dialect extensions, provide the separate mechanism to
declare dependent dialects (the dialects the transform IR depends on)
and the generated dialects (the dialects the payload IR may be
transformed into). This allows the Transform dialect clients that are
only constructing the transform IR to avoid loading the dialects
relevant for the payload IR along with the Transform dialect itself,
thus decreasing the build/link time.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D130289
This is useful for building small test cases and will be utilized in a subsequent commit that adds a fusion example.
Differential Revision: https://reviews.llvm.org/D130344
This op fuses a given payload op into a given container op. Inside the container, all uses of the producer are replaced (fused) with the newly inserted op. If the producer is tileable and accessed via a tensor.extract_slice, the new op computes only the requested slice ("tile and fuse"). Otherwise, the entire tensor value is computed inside the container ("clone and fuse").
Differential Revision: https://reviews.llvm.org/D130244
This change modifies `structured.tile_to_foreach_thread_op` so that
it accepts either `tile_sizes` or `num_threads` parameters. If
`tile_sizes` are specified, then the number of threads required is
derived the tile sizes rather than the other way around. In both cases,
more aggressive folding of loop parameters is enabled during the
transformation, allowing for the potential elimination of `affine.min`
and `affine.max` operations in the static shape case when calculating
the final adjusted tile size.
Differential Revision: https://reviews.llvm.org/D130139
This operation is a NavigationOp that simplifies the writing of transform IR.
Since there is no way of refering to an interface by name, the current implementation uses
an EnumAttr and depends on the interfaces it supports.
In the future, it would be worthwhile to remove this dependence and generalize.
Differential Revision: https://reviews.llvm.org/D130267
This revision adds a new transformation to tile a TilingInterface `op` to a tiled `scf.foreach_thread`, applying
tiling by `num_threads`.
If non-empty, the `threadDimMapping` is added as an attribute to the resulting `scf.foreach_thread`.
0-tile sizes (i.e. tile by the full size of the data) are used to encode
that a dimension is not tiled.
Differential Revision: https://reviews.llvm.org/D129577
This revision removes the LinalgPromotion pattern and adds a `transform.structured.promotion` op.
Since the LinalgPromotion transform allows the injection of arbitrary C++ via lambdas, the current
transform op does not handle it.
It is left for future work to decide what the right transform op control is for those cases.
Note the underlying implementation remains unchanged and the mechanism is still controllable by
lambdas from the API.
During this refactoring it was also determined that the `dynamicBuffers` option does not actually
connect to a change of behavior in the algorithm.
This also exhibits that the related test is wrong (and dangerous).
Both the option and the test are therefore removed.
Lastly, a test that connects patterns using the filter-based mechanism is removed: all the independent
pieces are already tested separately.
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785
Differential Revision: https://reviews.llvm.org/D129649
Existing implementation of structured op splitting creates several
affine.apply and affine.min operations in its subshape computation.
As these shapes are further used in data slice extraction, this may lead
to slice shapes being dynamic even when the original shapes and the
splitting point are static. This is particularly visible when splitting
is combined with further subsetting transformations such as tiling. Use
composition and folding more aggressively in splitting to avoid this.
In particular, introduce a `createComposedAffineMin` function that the
affine map used in "min" with the maps used by any `affine.apply` that
may be feeding the operands to the "min". This enables production of
more static shapes. Also introduce a `createComposedFoldedAffineApply`
function that combines the existing `createComposedAffineApply` with
in-place folding to propagate constants produced by zero-input affine
maps. Using these when splitting allows the subsequent canonicalizer
pass to recover static shapes for structured ops.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129379
A recent commit introduced helper functions with semantically meaningful names
to populate the lists of memory effects in transform ops, use them whenever
possible.
Depends On D129287
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D129365
Introduce a structured transform op that emits IR computing the multi-tile
sizes with requested parameters (target size and divisor) for the given
structured op. The sizes may fold to arithmetic constant operations when the
shape is constant. These operations may then be used to call the existing
tiling transformation with a single non-zero dynamic size (i.e. perform
strip-mining) for each of the dimensions separately, thus achieving multi-size
tiling with optional loop interchange. A separate test exercises the entire
script.
Depends On D129217
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129287
Extend the definition of the Tile structured transform op to enable it
accepting handles to operations that produce tile sizes at runtime. This is
useful by itself and prepares for more advanced tiling strategies. Note that
the changes are relevant only to the transform dialect, the tiling
transformation itself already supports dynamic sizes.
Depends On D129216
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129217
This revision revisits the implementation of applyToOne and its handling
of recoverable errors as well as propagation of null handles.
The implementation is simplified to always require passing a vector<Operation*>
in which the results are returned, resulting in less template instantiation magic.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D129185
Introduce a new transformation on structured ops that splits the iteration
space into two parts along the specified dimension. The index at which the
splitting happens may be static or dynamic. This transformation can be seen as
a rudimentary form of index-set splitting that only supports the splitting
along hyperplanes parallel to the iteration space hyperplanes, and is therefore
decomposable into per-dimension application.
It is a key low-level transformation that enables independent scheduling for
different parts of the iteration space of the same op, which hasn't been
possible previously. It may be used to implement, e.g., multi-sized tiling. In
future, peeling can be implemented as a combination of split-off amount
computation and splitting.
The transformation is conceptually close to tiling in its separation of the
iteration and data spaces, but cannot be currently implemented on top of
TilingInterface as the latter does not properly support `linalg.index`
offsetting.
Note that the transformation intentionally bypasses folding of
`tensor.extract_slice` operations when creating them as this folding was found
to prevent repeated splitting of the same operation because due to internal
assumptions about extract/insert_slice combination in dialect utilities.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129090
This revision merges the 2 split_reduction transforms and adds extra control by using attributes.
SplitReduction is known to require a concrete additional buffer to store tempoaray information.
Add an option to introduce a `bufferization.alloc_tensor` instead of `linalg.init_tensor`.
This behaves better with subset-based tiling and bufferization.
Differential Revision: https://reviews.llvm.org/D128722
This revision proposes a different implementation of the SplitReductoin transformation that does
not rely on tensor::ExpandShapeOp.
Previously, a dimension `[k]` would be split into `[k][kk]` via an ExpandShapeOp.
Instead, this revision proposes to rewrite `[k]` into `[factor * k + kk]`.
There are different tradeoffs involved but the proposed implementation is more general because
the affine rewrite is well-defined. In particular, it works naturally with `?` parallel dimensions and
non-trivial indexing maps.
A further rewrite of `[factor * k + kk]` + ExpandShapeOp is possible as a followup.
Differential Revision: https://reviews.llvm.org/D128266