This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by
introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This mirrors a
similar update made for `tensor::PackOp` in #114559. Below is the updated
rationale tailored to `tensor::UnPackOp`.
ISSUE DESCRIPTION
Currently, `FoldTensorCastProducerOp` incorrectly folds the following:
```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
// Note: `%c8` and `?`.
%unpack = tensor.unpack %cast
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```
as:
```mlir
// Note: `%c8` and `8`.
%unpack = tensor.unpack %cast
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```
This triggers an Op verification failure because the folder does not
update the inner tile sizes in the unpack Op. This patch addresses the
issue by ensuring proper handling of inner tile sizes.
ADDITIONAL CHANGES
* invalid.mlir: Fixed a typo.
* TensorOps.cpp:
* Removed unnecessary `(void)tileSize`.
* Added comments following the discussion in PR #115772.
* Made minor updates to `FoldTensorCastPackOp` for consistency with the newly
introduced `FoldTensorCastUnPackOp`.
* Tensor/canonicalize.mlir: Ensured consistent usage of `test_attr` (e.g.,
replaced mixed use of `test_attr` and `some_attr`).
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
Adds a new test for invalid `tensor.unpack` operations where the output
rank does not match the expected rank (input rank + num inner tile
sizes). For example:
```mlir
tensor.unpack %output
inner_dims_pos = [0, 1]
inner_tiles = [32, 16]
into %input : tensor<64x32x16xf32> -> tensor<256x128xf32>
```
In addition, updates the corresponding error message to make it more
informative:
BEFORE:
```mlir
error: packed rank must equal unpacked rank + tiling factors}
```
AFTER:
```mlir
error: packed rank != (unpacked rank + num tiling factors), got 3 != 4
```
Currently the implementation is within a pattern that cannot be used
without a pattern rewriter. Move the decomposition as a method of the
operation to make it usable outside of pattern rewrites.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This PR fixes a crash when the tensor of `tensor.extract` is a dense
resource elements attribute.
Fixes#114728.
Co-authored-by: jinzhi <jinzhi6@huawei.com>
Currently, `FoldTensorCastProducerOp` incorrectly folds the following:
```mlir
%pack = tensor.pack %src
padding_value(%pad : i32)
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %cast : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
%res = tensor.cast %pack : tensor<1x1x?x1xi32> to tensor<1x1x8x1xi32>
```
as (note the static trailing dim in the result and dynamic tile
dimension that corresponds to that):
```mlir
%res = tensor.pack %src
padding_value(%pad : i32)
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %cast : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
```
This triggers an Op verification failure and is due to the fact that the
folder does not update the inner tile sizes in the pack Op. This PR
addresses that.
Note, supporting other Ops with size-like attributes is left as a TODO.
Add pattern that converts a `tensor.expand_shape` op to a more static
form.
This matches the pattern: `tensor.cast` -> `tensor.expand_shape` if it
has a foldable `tensor.cast` and some constant foldable `output_shape`
operands for the `tensor.expand_shape`. This makes the
`tensor.expand_shape` more static, as well as allowing the static
information to be propagated further down in the program.
Restricts the verifier for tensor.pack and tensor.unpack Ops so that the
following is no longer allowed:
```mlir
%c8 = arith.constant 8 : index
%0 = tensor.pack %input inner_dims_pos = [0, 1] inner_tiles = [8, %c8] into %output : tensor<?x?xf32> -> tensor<?x?x8x8xf32>
```
Specifically, in line with other Tensor Ops, require:
* a dynamic dimensions for each (dynamic) SSA value,
* a static dimension for each static size (attribute).
In the example above, a static dimension (8) is mixed with a dynamic
size (%c8).
Note that this is mostly deleting existing code - that's because this
change simplifies the logic in verifier.
For more context:
* https://discourse.llvm.org/t/tensor-ops-with-dynamic-sizes-which-behaviour-is-more-correct
This is more efficient to avoid a clone that is immediately removed.
Also guard the insertion of a cast on the result on whether the
destination type changed.
Updates the return type of `getNumDynamicDims` and `getNumScalableDims`
from `int64_t` to `size_t`. This is for consistency with other
helpers/methods that return "size" and to reduce the number of
`static_cast`s in various places.
Implements two helper hooks for PackOp and UnPackOP, `getAllOuterDims`
and `getTiledOuterDims`, and adds them to RelayoutOp (that both PackOp
an UnPackOp inherit from).
This improves code re-use and also clarifies the meaning of "outer dims"
and "tiled outer dims".
`tensor.pad(tensor.pad)` with the same constant padding value can be
combined into a single pad that pads to the sum of the high and low
padding amounts.
This patch add a check for indices of `tensor.gather` and
`tensor.scatter`. For that the length of gather_dims/scatter_dims should
match the size of last dimension of the indices. Fix#94901.
The `getDroppedDims` utility function does not follow the convention of
dropping outermost unit dimensions first when inferring a rank reduction
mask for a slice. This PR updates the implementation to match this
convention.
For patterns where there are multiple results apart from dpsInits, this
fails.
E.g.:
```
%13:2 = iree_codegen.ukernel.generic "iree_uk_unpack"
ins(%extracted_slice : tensor<?x1x16x16xf32>) outs(%11 :
tensor<?x?xf32>) ... -> tensor<?x?xf32>, i32
```
The above op has results apart from dpsInit and hence fails. The PR
assumes that the result has dpsInits followed by nonDpsInits.
Implement folding and rewrite logic to eliminate no-op tensor and memref
operations. This handles two specific cases:
1. tensor.insert_slice operations where the size of the inserted slice
is known to be 0.
2. memref.copy operations where either the source or target memrefs are
known to be emtpy.
Co-authored-by: Spenser Bauman <sabauma@fastmail>
Extends pack/unpack perm attribute checker to account for cases when the
optional outer_dims_perm attribute might be missing in one operation and
the other one has explicit identity permutation. This enables
canonicalizer to fold more unpack(pack(x)) variants.
Previously this was only populated in the create method later. This
resolves some of invalid builder paths. This may also be sufficient that
type inference functions no longer have to consider whether property
conversion has happened (but haven't verified that yet).
This also makes Attributes corresponding to Properties as optional
inside the set from attributes method. Today that is in effect what
happens with Property value initialization and folks use it to define
custom C++ types whose default initialization is what they want. This is
the behavior users get if they use properties directly. Propagating
Attributes without allowing partial setting would require iterating over
the dictionary attribute considering the properties of the op type that
will be created. This could also have been an additional method
generated or optional behavior on the set method. But doing it
consistently seems better. In terms of whats lost, it doesn't seem like
anything compared to the pure Property path where Property is default
value initialized and then partially overwritten (this doesn't seem to
buy anything else verification wise).
Default valued Properties (as specified ODS side rather than C++ side)
triggered error as the containing class was not yet complete but
referenced nested class, so that we couldn't have default initializer
for them in the parent class. Added an additional forwarding builder to
avoid needing to update call sites. This could be split out to separate
change.
Inlined templated function in unit test that was only used once. Moved
initialization earlier where seen.
Windows build of `mlir` with Visual Studio (19.36.32538 for x64) using
with the following command:
`cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release
-DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1
-DLLVM_TARGETS_TO_BUILD=host ../llvm`
is leading to a crash when calling canonicalization on
`tensor.pack`/`tensor.unpack` ops `mlir-opt --canonicalize input.mlir`
where the `input.mlir` is as follows (this is taken from one of the
filecheck tests for `tensor.pack`):
```
func.func @pack_unpack(%arg0: tensor<128x256xf32>) -> tensor<128x256xf32> {
%pack_dest = tensor.empty() : tensor<8x16x8x32xf32>
%unpack_dest = tensor.empty() : tensor<128x256xf32>
%tp = tensor.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %pack_dest : tensor<128x256xf32> -> tensor<8x16x8x32xf32>
%tup = tensor.unpack %tp outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %unpack_dest : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
return %tup : tensor<128x256xf32>
}
```
The crash is seemingly coming from invalid memory access during
iterating over `innerDimsPos` within `getPackOpResultTypeShape`.
This crash is also causing the following tests to fail:
```
MLIR :: Dialect/Linalg/canonicalize.mlir
MLIR :: Dialect/Linalg/data-layout-propagation.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack.mlir
MLIR :: Dialect/Linalg/transform-lower-pack.mlir
MLIR :: Dialect/Linalg/transform-op-fuse.mlir
MLIR :: Dialect/Linalg/transform-op-pack.mlir
MLIR :: Dialect/Linalg/transform-pack-greedily.mlir
MLIR :: Dialect/Tensor/canonicalize.mlir
MLIR :: Dialect/Tensor/fold-into-pack-and-unpack.mlir
MLIR :: Dialect/Tensor/invalid.mlir
MLIR :: Dialect/Tensor/ops.mlir
MLIR :: Dialect/Tensor/simplify-pack-unpack.mlir
MLIR :: Dialect/Tensor/tiling.mlir
```
A general question is: is it possible to support hooks here to infer the
encoding? E.g., when the extracted tensor slice is rank-reduced, the
encoding need to be updated accordingly as well.
In some cases this pattern may ignore static information due to dynamic
operands in the insert_slice sizes operands, e.g.:
```
%0 = tensor.cast %arg0 : tensor<1x?xf32> to tensor<?x?xf32>
%1 = tensor.insert_slice %0 into %arg1[...] [%s0, %s1] [...]
: tensor<?x?xf32> into tensor<?x?xf32>
```
Can be rewritten into:
```
%1 = tensor.insert_slice %arg0 into %arg1[...] [1, %s1] [...]
: tensor<1x?xf32> into tensor<?x?xf32>
```
This PR updates the matching in the pattern to allow rewrites like this.
Unblocking downstream integrate where an expected-to-fail test was
expecting this to be a runtime verifier error, not a compiler crash:
https://github.com/llvm/torch-mlir/pull/3279.
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
---------
Signed-off-by: Gaurav Shukla<gaurav.shukla@amd.com>
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
Canonicalization defaulted to replacement when the input dims were from
unknown source. This is obviously incorrect. Tweaked and included test
to prevent future issue.
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
If `tensor.reshape` occurs with `d0, d1, d2, ...` for the dimensions we
know that the reshape is a no-op. Checking for this case lets us fold
away the computation.
This fixes an "infinite" loop bug, where the incoming IR was repeatedly
rewritten while adding identical cast operations. The test for
compatible types should include the notion of an encoding. If it
differs, then a
naive fusion into the consumer is invalid.
The current canonicalization of `memref.dim` operating on the result of
`memref.reshape` into `memref.load` is incorrect as it doesn't check
whether the `index` operand of `memref.dim` dominates the source
`memref.reshape` op. It always introduces `memref.load` right after
`memref.reshape` to ensure the `memref` is not mutated before the
`memref.load` call. As a result, the following error is observed:
```
$> mlir-opt --canonicalize input.mlir
func.func @reshape_dim(%arg0: memref<*xf32>, %arg1: memref<?xindex>, %arg2: index) -> index {
%c4 = arith.constant 4 : index
%reshape = memref.reshape %arg0(%arg1) : (memref<*xf32>, memref<?xindex>) -> memref<*xf32>
%0 = arith.muli %arg2, %c4 : index
%dim = memref.dim %reshape, %0 : memref<*xf32>
return %dim : index
}
```
results in:
```
dominator.mlir:22:12: error: operand #1 does not dominate this use
%dim = memref.dim %reshape, %0 : memref<*xf32>
^
dominator.mlir:22:12: note: see current operation: %1 = "memref.load"(%arg1, %2) <{nontemporal = false}> : (memref<?xindex>, index) -> index
dominator.mlir:21:10: note: operand defined here (op in the same block)
%0 = arith.muli %arg2, %c4 : index
```
Properly fixing this issue requires a dominator analysis which is
expensive to run within a canonicalization pattern. So, this patch fixes
the canonicalization pattern by being more strict/conservative about the
legality condition in which we perform this canonicalization.
The more general pattern is also added to `tensor.dim`. Since tensors are
immutable we don't need to worry about where to introduce the
`tensor.extract` call after canonicalization.
Before: op verifiers failed if the input and output ranks were the same
(i.e. no expansion or collapse). This behavior requires users of these
shape ops to verify manually that they are not creating identity
versions of these ops every time they build them -- problematic. This PR
removes this strict verification, and introduces folders for the the
identity cases.
The PR also removes the special case handling of rank-0 tensors for
expand_shape and collapse_shape, there doesn't seem to be any reason to
treat them differently.
The revision does not refactor the inferStaticShape for pack and unpack
ops because they can diverge quickly. Because there are more dimensions
can be inferred (i.e., with inner_tile_sizes) if the pack op does not
have padding value.
This is a follow-up of https://github.com/llvm/llvm-project/pull/80848
Previously, the `tensor.pack` verifier detects unconditional runtime
errors only when tile sizes are static. Now, dynamic tiles are
considered and we only require that the input and either corresponding
tile or output size are static to determine if it will unconditionally
produce errors at runtime.
Previously, `InsertSliceOpSourceCastInserter` was incorrectly applied to
a case when tensor types have an encoding attribute attached to them.
The type `newSrcType` was missing that attribute from the old `srcType`,
which made the expression `srcType == newSrcType` false, since
`tensor<2x2xf32, "foo">` is not equal to `tensor<2x2xf32>`. That lead to
an endless back and forth between `InsertSliceOpSourceCastInserter` that
would introduce a cast and `InsertSliceOpCastFolder` that would remove
it right after.
The canonicalization incrementally converts foldable dynamic hi/lo
padding to static hi/lo values. During this canonicalization the
static-fied valued should be removed from the dynamic values.