Collapsing / expanding a splatted value can be replaced with a single `tensor.splat` operation. Replace
these cases with a simple `tensor.splat` operation.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D140552
std::optional::value() has undesired exception checking semantics and is
unavailable in older Xcode (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). The
call sites block std::optional migration.
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
The main issue of tiling unpack op is about incomplete tile. Since all
the dimensions are orthogonal, discussing 1-d unpack case is enough. The
core idea is to make the input slice have complete tiles. In this case,
a larger unpacked tile will be created. We'll need an extract_slice op
to shift and truncate the output.
Take Nn_to_N as an example. Say that N=32, n=8, and tiling_size=15. The
coordinates of second tile (i.e., result[15..31]) are [(1, 7), (2, 0,),
(2, 1) ... (3, 6), (3, 7)]. The first row and the last row are
incomplete in terms of inputs. It's impossible to represent an unpack op
using the coordinates. Because the input has higher rank and the math
computation of coordinate is using mod and ceilDiv. That's very tricky.
To represent the unpack op, we have to complete the rows. I.e., the
input coordinates would start with (1, 0); end with (3, 7). In this
context, the tiled unpack produces a (3 * n) elements because there are
3 rows in total. Follow by a tensor.extract_slice op, we can get the
actual result.
If the tiling sizes are multiple of inner tile sizes, it is a perfect
tiling case. In this context, the larger input and output is not needed.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D139362
At the moment, they are a part of EmptyOp::getCanonicalizationPatterns. When
extract_slice(tensor.empty) is rewritten as a new tensor.empty, it could
happen that we end up with two tensor.empty ops, since the original
tensor.empty can have two users. After bufferization such cases result in two
allocations.
Differential Revision: https://reviews.llvm.org/D139308
It introduces a pattern that swaps `linalg.generic + tensor.pack` to
`tensor.pack + linalg.generic`. It requires all the iteration types
being parallel; the indexing map of output operand is identiy. They can
all be relaxed in the future.
The user can decide whether the propagation should be applied or not by
passing a control function.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D138882
This revision adapts the pattern in LinAlg to work on DPS interface, and
adds it to canonicalization patterns of tensor dialect. The
InsertSliceOp is skipped in the pattern because it has its own logic
about folding tensor.cast ops.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D139375
We can compute the offsets and sizes for the slice of input because the
iteration domain is defined over outer loops. If the dimension is tiled,
the i-th index is the product of offset_i and inner_tile_i.
Different from tiling a pad op, we do not have to deal with reading zero
data from input. Because the tiling sizes are indicated to packed outer
dimensions. We will read either the entire tile or partial tile for each
packed tile. The scf.if and tensor.generate ops are not needed in this
context.
Co-authored-by: Lorenzo Chelini <l.chelini@icloud.com>
Reviewed By: rengolin, mravishankar
Differential Revision: https://reviews.llvm.org/D138631
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
The `paddingValue` and `outerDimsPerm` are optional to the op;
`innerTiles` can be variadic in terms of static sizes and dynamic sizes.
Add a custom builder for building pack op easier.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D138860
Do not generate CollapseShapeOps/ExpandShapeOps that have the same source and result shape. Generate casts instead. Such reshapes became invalid with D138498.
Differential Revision: https://reviews.llvm.org/D138557
Avoid duplicate code by using an existing helper function to interchange
a vector based on a permutation. Address comments emerged after landing
D138119.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D138480
Pack and Unpack return new tensors within which the individual elements
are reshuffled according to the packing specification. This has the
consequence of modifying the canonical order in which a given operator
(i.e., Matmul) accesses the individual elements. After bufferization,
this typically translates to increased access locality and cache
behavior improvement, e.g., eliminating cache line splitting.
Co-authored-by: Mahesh Ravishankar <ravishankarm@google.com>
Co-authored-by: Han-Chung Wang <hanchung@google.com>
RFC: https://discourse.llvm.org/t/rfc-tensor-pack-and-tensor-unpack/66408/1
Reviewed By: nicolasvasilache, rengolin, hanchung
Differential Revision: https://reviews.llvm.org/D138119
The `RankedTensorType` can have an optional encoding
attribute. Allowing the builders of `tensor.empty` to accept the
encoding attribute (optionally), allows building empty tensors with
the type having the encoding attribute.
Reviewed By: nicolasvasilache, hanchung, springerm
Differential Revision: https://reviews.llvm.org/D137297
When writing a tensor.extract/tensor.insert, the rank of the tensor is implied by the number of specified indices. When extracting from/inserting into an unranked tensor, it should first be casted to a ranked version.
Differential Revision: https://reviews.llvm.org/D136756
Drop the `createPadScalarOp` from Utils.h since it is a duplicate of
the `build` method added here.
Differential Revision: https://reviews.llvm.org/D136493
`getDestinationOperands` was almost a duplicate of `DestinationStyleOpInterface::getOutputOperands`. Now that the interface has been moved to mlir/Interfaces, it is no longer needed.
Differential Revision: https://reviews.llvm.org/D136240
The assert is misplaced as the result type is allowed to be null. A few
lines below the result type is inferred if it is passed a nullptr.
Besides, this behavior is described in the documentation of the builder.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D136262
These operations have undefined behavior if the index is not less than the rank of the source tensor / memref, so they cannot be freely speculated like they were before this patch. After this patch we speculate them only if we can prove that they don't have UB.
Depends on D135505.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D135748
tensor.empty/linalg.init_tensor produces an uninititalized tensor that can be used as a destination operand for destination-style ops (ops that implement `DestinationStyleOpInterface`).
This change makes it possible to implement `TilingInterface` for non-destination-style ops without depending on the Linalg dialect.
RFC: https://discourse.llvm.org/t/rfc-add-tensor-from-shape-operation/65101
Differential Revision: https://reviews.llvm.org/D135129
This interface is implemented by memref.dim and tensor.dim. This change makes it possible to remove a build dependency of the Affine dialect on the Tensor dialect (and maybe also the MemRef dialect in the future).
Differential Revision: https://reviews.llvm.org/D133595
The utility function should live in `StaticValueUtils.h` as it provides
a convenient way to convert a vector of OpFoldResults into a vector of
Values.
Reviewed By: nicolasvasilache, cota
Differential Revision: https://reviews.llvm.org/D134451
The utility function should live in `StaticValueUtils.h` as it provides
a convenient way to convert a vector of OpFoldResults into a vector of
Values.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134451
Summary:
Use the new enum in TilingIterface and verify that `iterator_type` attribute in
LinalgOp interface is compatible with the enum values. Later IteratorType enum
will be used in LinalgInterface to replace the current `iterator_type` attribute
array of string.
Existing enums in Linalg are moved into a separate td file and tablegen build
target. This is necessary, have one I32EnumAttr in a shared space that generated
enum class definition and EnumAttrs is dialect-specific location. Otherwise
there might be a conflict that I32EnumAttr generates enum definitions in
multiple places.
Differential Revision: https://reviews.llvm.org/D134634
This reverts commit 730ae80d3e1c47f93f725acb2d37f06fcba06953.
It fails with a linking errors: `undefined reference to
`mlir::getValueOrCreateConstantIndexOp` in `libMLIRDialectUtils`.
The utility function should live in `StaticValueUtils.h` as it provides
a convenient way to convert a vector of OpFoldResults into a vector of
Values.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134451
This change adds a set of utilities to replace the result of a
`tensor.collapse_shape -> tensor.extract_slice` chain with the
equivalent result formed by aggregating slices of the
`tensor.collapse_shape` source. In general, it is not possible to
commute `extract_slice` and `collapse_shape` if linearized dimensions
are sliced. The i-th dimension of the `tensor.collapse_shape`
result is a "linearized sliced dimension" if:
1) Reassociation indices of tensor.collapse_shape in the i'th position
is greater than size 1 (multiple dimensions of the input are collapsed)
2) The i-th dimension is sliced by `tensor.extract_slice`.
We can work around this by stitching together the result of
`tensor.extract_slice` by iterating over any linearized sliced dimensions.
This is equivalent to "tiling" the linearized-and-sliced dimensions of
the `tensor.collapse_shape` operation in order to manifest the result
tile (the result of the `tensor.extract_slice`). The user of the
utilities must provide the mechanism to create the tiling (e.g. a loop).
In the tests, it is demonstrated how to apply the utilities using either
`scf.for` or `scf.foreach_thread`.
The below example illustrates the pattern using `scf.for`:
```
%0 = linalg.generic ... -> tensor<3x7x11x10xf32>
%1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<341x10xf32>
%2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : .... tensor<10x10xf32>
```
We can construct %2 by generating the following IR:
```
%dest = linalg.init_tensor() : tensor<10x10xf32>
%2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> {
// Step 1: Map this output idx (%iv) to a multi-index for the input (%3):
%linear_index = affine.apply affine_map<(d0)[]->(d0*2 + 11)>(%iv)
%3:3 = arith.delinearize_index %iv into (3, 7, 11)
// Step 2: Extract the slice from the input
%4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] :
tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32>
%5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] :
tensor<1x1x1x10xf32> into tensor<1x10xf32>
// Step 3: Insert the slice into the destination
%6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] :
tensor<1x10xf32> into tensor<10x10xf32>
scf.yield %6 : tensor<10x10xf32>
}
```
The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129699
This reverts commit 5711957875738c1318f89afd7bf4be388f85a087.
A circular dependency is introduced here from Dialect/Utils/ to the
ViewLikeInterface, but it already depends on Dialect/Utils.
Also this introduces a dependency from lib/Dialect/Tensor to Linalg,
which isn't obviously correct from a layering point of view.
This change adds a set of utilities to replace the result of a
`tensor.collapse_shape -> tensor.extract_slice` chain with the
equivalent result formed by aggregating slices of the
`tensor.collapse_shape` source. In general, it is not possible to
commute `extract_slice` and `collapse_shape` if linearized dimensions
are sliced. The i-th dimension of the `tensor.collapse_shape`
result is a "linearized sliced dimension" if:
1) Reassociation indices of tensor.collapse_shape in the i'th position
is greater than size 1 (multiple dimensions of the input are collapsed)
2) The i-th dimension is sliced by `tensor.extract_slice`.
We can work around this by stitching together the result of
`tensor.extract_slice` by iterating over any linearized sliced dimensions.
This is equivalent to "tiling" the linearized-and-sliced dimensions of
the `tensor.collapse_shape` operation in order to manifest the result
tile (the result of the `tensor.extract_slice`). The user of the
utilities must provide the mechanism to create the tiling (e.g. a loop).
In the tests, it is demonstrated how to apply the utilities using either
`scf.for` or `scf.foreach_thread`.
The below example illustrates the pattern using `scf.for`:
```
%0 = linalg.generic ... -> tensor<3x7x11x10xf32>
%1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<341x10xf32>
%2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : .... tensor<10x10xf32>
```
We can construct %2 by generating the following IR:
```
%dest = linalg.init_tensor() : tensor<10x10xf32>
%2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> {
// Step 1: Map this output idx (%iv) to a multi-index for the input (%3):
%linear_index = affine.apply affine_map<(d0)[]->(d0*2 + 11)>(%iv)
%3:3 = arith.delinearize_index %iv into (3, 7, 11)
// Step 2: Extract the slice from the input
%4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] :
tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32>
%5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] :
tensor<1x1x1x10xf32> into tensor<1x10xf32>
// Step 3: Insert the slice into the destination
%6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] :
tensor<1x10xf32> into tensor<10x10xf32>
scf.yield %6 : tensor<10x10xf32>
}
```
The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129699
`getTiledImplementation`/`generateResultTileValue` only computes the tiled operation, but does not insert the result into any tensor.
Differential Revision: https://reviews.llvm.org/D133015
Update the implementation of `TilingInterface` for `tensor.pad`
operations to allow tiling the op using the existing patterns for the
interface. Verify that tests that pass with existing pad tiling
patterns producer the same results through TilingInterface patterns.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D132720
parallel_insert_slice doesn't return a value therefore we shouldn't try
to fold the result. The insert folding don't apply to this op.
The current folding would cause pattern rewrite to not be able to
converge.
Differential Revision: https://reviews.llvm.org/D132668