At the moment, they are a part of EmptyOp::getCanonicalizationPatterns. When
extract_slice(tensor.empty) is rewritten as a new tensor.empty, it could
happen that we end up with two tensor.empty ops, since the original
tensor.empty can have two users. After bufferization such cases result in two
allocations.
Differential Revision: https://reviews.llvm.org/D139308
This fixes the case where scf::LoopNest::loops is empty.
Change LoopVector and ValueVector to SmallVector.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D136926
`scf.foreach_thread` defines mapping its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
scf.foreach_thread (%i2, %j2) in (%c1, %c2)
{...} { thread_dim_mapping = [0, 1]}
} { thread_dim_mapping = [0, 1]}
```
It first introduces a `DeviceMappingInterface` which is an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This way gives us a clear validation.
The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x,y,z>` ). After this change, the above code prints as below, as seen here, this way clarifies the loop mappings. The change also implements consuming of these two new attribute by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
scf.foreach_thread (%i2, %j2) in (%c1, %c2)
{...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}
```
Reviewed By: ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137413
Prior to this change, the "ExtractSliceFromReshape" pattern would transform
```
%collapsed = tensor.collapse_shape %input [[0, 1], [2]]
: tensor<1x11x100xf32> into tensor<11x100xf32>
%slice = tensor.extract_slice %collapsed [%offt, 0] [%size, 100] [1, 1]
: tensor<11x100xf32> to tensor<?x100xf32>
```
into a loop that iterated over the range `%size - %offt`, that pieces
together multiple sub-slices of `%input` along the first dimension. This
is correct but obviously inefficient. The technical condition is that
collapsing at-most-one non-unit dimension of `%src` will not result in a
subsequent slice along the corresponding dimension of `%collapsed`
mapping across discontinuities in the index space of `%src`. Thus, the
definition of a "linearized dimension" (from the perspective of
`tensor.collapse_shape`) is updated to reflect this condition.
The transform will now generate
```
%slice = tensor.extract_slice %input [0, %offt, 0][1, %size, 100] [1, 1]
: tensor<1x11x100xf32> to tensor<1x?x100xf32>
%result = tensor.collapse_shape [[0, 1], [2]]
: tensor<1x?x100xf32> to tensor<?x100xf32>
```
which can be further canonicalized.
Additional tests are added to check this family of edge cases.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D135726
tensor.empty/linalg.init_tensor produces an uninititalized tensor that can be used as a destination operand for destination-style ops (ops that implement `DestinationStyleOpInterface`).
This change makes it possible to implement `TilingInterface` for non-destination-style ops without depending on the Linalg dialect.
RFC: https://discourse.llvm.org/t/rfc-add-tensor-from-shape-operation/65101
Differential Revision: https://reviews.llvm.org/D135129
Consecutive tensor.insert_slice/tensor.extract_slice can be
created for the case like tiling convolution and then downsizing
2-D convolutions into 1-D ones. It hinders further transformations.
So adding these patterns to clean it up.
Given that bufferization is sensitive and have requirements over
the IR structure (see https://reviews.llvm.org/D132666),
these patterns are put in Transforms/ with separate entry points
for explicit collection.
Reviewed By: ThomasRaoux, mravishankar
Differential Revision: https://reviews.llvm.org/D133871
The transformation would fail if none of the sliced dimensions were
linearized by the producing `tensor.collapse_shape`. This is a trivial
edge case but it wasn't correctly tested. Fixes the issue and adds a test.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134088
This change adds a set of utilities to replace the result of a
`tensor.collapse_shape -> tensor.extract_slice` chain with the
equivalent result formed by aggregating slices of the
`tensor.collapse_shape` source. In general, it is not possible to
commute `extract_slice` and `collapse_shape` if linearized dimensions
are sliced. The i-th dimension of the `tensor.collapse_shape`
result is a "linearized sliced dimension" if:
1) Reassociation indices of tensor.collapse_shape in the i'th position
is greater than size 1 (multiple dimensions of the input are collapsed)
2) The i-th dimension is sliced by `tensor.extract_slice`.
We can work around this by stitching together the result of
`tensor.extract_slice` by iterating over any linearized sliced dimensions.
This is equivalent to "tiling" the linearized-and-sliced dimensions of
the `tensor.collapse_shape` operation in order to manifest the result
tile (the result of the `tensor.extract_slice`). The user of the
utilities must provide the mechanism to create the tiling (e.g. a loop).
In the tests, it is demonstrated how to apply the utilities using either
`scf.for` or `scf.foreach_thread`.
The below example illustrates the pattern using `scf.for`:
```
%0 = linalg.generic ... -> tensor<3x7x11x10xf32>
%1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<341x10xf32>
%2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : .... tensor<10x10xf32>
```
We can construct %2 by generating the following IR:
```
%dest = linalg.init_tensor() : tensor<10x10xf32>
%2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> {
// Step 1: Map this output idx (%iv) to a multi-index for the input (%3):
%linear_index = affine.apply affine_map<(d0)[]->(d0*2 + 11)>(%iv)
%3:3 = arith.delinearize_index %iv into (3, 7, 11)
// Step 2: Extract the slice from the input
%4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] :
tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32>
%5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] :
tensor<1x1x1x10xf32> into tensor<1x10xf32>
// Step 3: Insert the slice into the destination
%6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] :
tensor<1x10xf32> into tensor<10x10xf32>
scf.yield %6 : tensor<10x10xf32>
}
```
The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129699
This reverts commit 5711957875738c1318f89afd7bf4be388f85a087.
A circular dependency is introduced here from Dialect/Utils/ to the
ViewLikeInterface, but it already depends on Dialect/Utils.
Also this introduces a dependency from lib/Dialect/Tensor to Linalg,
which isn't obviously correct from a layering point of view.
This change adds a set of utilities to replace the result of a
`tensor.collapse_shape -> tensor.extract_slice` chain with the
equivalent result formed by aggregating slices of the
`tensor.collapse_shape` source. In general, it is not possible to
commute `extract_slice` and `collapse_shape` if linearized dimensions
are sliced. The i-th dimension of the `tensor.collapse_shape`
result is a "linearized sliced dimension" if:
1) Reassociation indices of tensor.collapse_shape in the i'th position
is greater than size 1 (multiple dimensions of the input are collapsed)
2) The i-th dimension is sliced by `tensor.extract_slice`.
We can work around this by stitching together the result of
`tensor.extract_slice` by iterating over any linearized sliced dimensions.
This is equivalent to "tiling" the linearized-and-sliced dimensions of
the `tensor.collapse_shape` operation in order to manifest the result
tile (the result of the `tensor.extract_slice`). The user of the
utilities must provide the mechanism to create the tiling (e.g. a loop).
In the tests, it is demonstrated how to apply the utilities using either
`scf.for` or `scf.foreach_thread`.
The below example illustrates the pattern using `scf.for`:
```
%0 = linalg.generic ... -> tensor<3x7x11x10xf32>
%1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<341x10xf32>
%2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : .... tensor<10x10xf32>
```
We can construct %2 by generating the following IR:
```
%dest = linalg.init_tensor() : tensor<10x10xf32>
%2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> {
// Step 1: Map this output idx (%iv) to a multi-index for the input (%3):
%linear_index = affine.apply affine_map<(d0)[]->(d0*2 + 11)>(%iv)
%3:3 = arith.delinearize_index %iv into (3, 7, 11)
// Step 2: Extract the slice from the input
%4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] :
tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32>
%5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] :
tensor<1x1x1x10xf32> into tensor<1x10xf32>
// Step 3: Insert the slice into the destination
%6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] :
tensor<1x10xf32> into tensor<10x10xf32>
scf.yield %6 : tensor<10x10xf32>
}
```
The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129699
This aligns the SCF dialect file layout with the majority of the dialects.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D128049
This commit restructures how TypeID is implemented to ideally avoid
the current problems related to shared libraries. This is done by changing
the "implicit" fallback path to use the name of the type, instead of using
a static template variable (which breaks shared libraries). The major downside to this
is that it adds some additional initialization costs for the implicit path. Given the
use of type names for uniqueness in the fallback, we also no longer allow types
defined in anonymous namespaces to have an implicit TypeID. To simplify defining
an ID for these classes, a new `MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID` macro
was added to allow for explicitly defining a TypeID directly on an internal class.
To help identify when types are using the fallback, `-debug-only=typeid` can be
used to log which types are using implicit ids.
This change generally only requires changes to the test passes, which are all defined
in anonymous namespaces, and thus can't use the fallback any longer.
Differential Revision: https://reviews.llvm.org/D122775
A lot of test passes are currently anchored on FuncOp, but this
dependency
is generally just historical. A majority of these test passes can run on
any operation, or can operate on a specific interface
(FunctionOpInterface/SymbolOpInterface).
This allows for greatly reducing the API dependency on FuncOp, which
is slated to be moved out of the Builtin dialect.
Differential Revision: https://reviews.llvm.org/D121191
Add a pattern matcher for ExtractSliceOp when its source is a constant.
The matching heuristics can be governed by the control function since
generating a new constant is not always beneficial.
Differential Revision: https://reviews.llvm.org/D119605
This commit adds a pattern to wrap a tensor.pad op with
an scf.if op to separate the cases where we don't need padding
(all pad sizes are actually zeros) and where we indeed need
padding.
This pattern is meant to handle padding inside tiled loops.
Under such cases the padding sizes typically depend on the
loop induction variables. Splitting them would allow treating
perfect tiles and edge tiles separately.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D117018