This PR updates tensor.expand_shape and tensor.collapse_shape ODS
definitions to require ranked tensor operands/results by switching from
AnyTensor to AnyRankedTensor.
Fixes https://github.com/llvm/llvm-project/issues/178228
The `OpWithOffsetSizesAndStridesConstantArgumentFolder` pattern for
tensor.extract_slice was incorrectly using
`inferCanonicalRankReducedResultType()` to compute the result type after
folding constant offsets. This function infers a canonical rank
reduction (dropping the first N unit dimensions), which breaks when the
original operation used a non-canonical rank reduction. This change is
similar to the existing correct behavior in
`SubViewReturnTypeCanonicalizer` for `memref.subview`.
For example, extracting `tensor<1x4xf16>` from `tensor<1x1x8x1xf16>` by
dropping dims 0 and 3 would incorrectly produce tensor<4x1xf16>
(canonical: drop dims 0 and 1).
---------
Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>
In some cases `y = expand(collapse(x))` cannot be folded into x, since x
and y have different types.
In that case, we check if the two types are cast compatible.
If they are, it means the two types have compatible shape and layout and
y can be folded into cast(x).
This causes a change in memref::CastOp::areCastCompatible, where now a
dim of size 1 may have different strides.
Introduces `VerificationUtils` to consolidate common operation
verification patterns in MLIR. This initial implementation provides
`verifyDynamicDimensionCount()` to reduce code duplication across
dialect verifiers.
This is an NFC (No Functional Change) refactoring that improves code
maintainability by extracting reusable verification logic into a shared
utility.
This PR updates `CollapseShapeOp::build` so that when the result type is
not explicitly provided, the inferred result type preserves the encoding
of the source tensor.
This patch fixes a crash occurring in mlir-opt when running
collapse_shape with an invalid index configuration. Instead of crashing,
an error message is returned to the user.
Fixes: #173567
---------
Co-authored-by: Bazinga! <akparmar004>
Adds new builders for `tensor.insert_slice` and `tensor.extract_slice`
Ops for which the _offsets_ and the _strides_ are all 0s and 1s,
respecitvely. This allows us to write:
```cpp
// No offsets and no strides - implicitly set to 0s and 1s,
// respectively.
tensor::InsertSliceOp::create(rewriter, loc, src, dest, writeSizes);
```
instead of:
```cpp
// Strides are initialised explicitly to 1s
Attribute oneIdxAttr = rewriter.getIndexAttr(1);
SmallVector<OpFoldResult> writeStrides(destRank, oneIdxAttr);
// Offsets are initialised explicitly to 0s
Attribute zeroIdxAttr = rewriter.getIndexAttr(0);
SmallVector<OpFoldResult> writeOffsets(destRank, zeroIdxAttr);
tensor::InsertSliceOp::create(rewriter, loc, src, dest, writeOffsets,
writeSizes, writeStrides);
```
Similarly to #152960, this PR fixes `getTiledOuterDims` for `linalg.pack` by
ensuring that the `outer_dims_perm` attributeis properly taken into account.
This enables the main change in this PR: relaxing the constraints in
* `DecomposeOuterUnitDimsPackOpPattern`.
Specifically, the pattern is extended to allow non-unit untiled outer
dimensions. For example:
```mlir
func.func @example(
%src: tensor<2x32x16x8xf32>,
%dest: tensor<2x1x16x8x32xf32>) -> tensor<2x1x16x8x32xf32> {
%pack = linalg.pack %src
inner_dims_pos = [1]
inner_tiles = [32] into %dest
: tensor<2x32x16x8xf32> -> tensor<2x1x16x8x32xf32>
return %pack : tensor<2x1x16x8x32xf32>
}
```
decomposes as:
```mlir
func.func @example(
%src: tensor<2x32x16x8xf32>,
%dest: tensor<2x1x16x8x32xf32>) -> tensor<2x1x16x8x32xf32> {
%0 = tensor.empty() : tensor<2x16x8x32xf32>
%transposed = linalg.transpose
ins(%src : tensor<2x32x16x8xf32>)
outs(%init : tensor<2x16x8x32xf32>)
permutation = [0, 2, 3, 1]
%inserted_slice = tensor.insert_slice %transposed
into %dest[0, 0, 0, 0, 0] [2, 1, 16, 8, 32] [1, 1, 1, 1, 1]
: tensor<2x16x8x32xf32> into tensor<2x1x16x8x32xf32>
return %inserted_slice : tensor<2x1x16x8x32xf32>
}
```
Importantly, this change makes `DecomposeOuterUnitDimsPackOpPattern` (the
decomposition pattern for `linalg.pack`) consistent with the corresponding
pattern for `linalg.unpack`:
* `DecomposeOuterUnitDimsUnPackOpPattern`.
One notable assumption remains: untiled outer dimensions are not permuted. This
was already the case but is now explicitly documented.
Co-authored by: Max Bartel <bartel@roofline.ai>
This commit:
- Introduces a new `InParallelOpInterface`, along with the
`ParallelCombiningOpInterface`, represent the parallel updating
operations we have in a parallel loop of `scf.forall`.
- Change the name of `ParallelCombiningOpInterface` to
`InParallelOpInterface` as the naming was quite confusing.
- `ParallelCombiningOpInterface` now is used to generalize operations
that insert into shared tensors within parallel combining regions.
Previously, only `tensor.parallel_insert_slice` was supported directly
in `scf.InParallelOp` regions.
- `tensor.parallel_insert_slice` now implements
`ParallelCombiningOpInterface`.
This change enables future extensions to support additional parallel
combining operations beyond `tensor.parallel_insert_slice`, which have
different update semantics, so the `in_parallel` region can correctly
and safely represent these kinds of operation without potential mistakes
such as races.
Author credits: @qedawkins
This patch fixes:
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:3205:13: error: unused
function 'printInferType' [-Werror,-Wunused-function]
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:3210:1: error: unused
function 'parseInferType' [-Werror,-Wunused-function]
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
The motivation is to avoid having to negate `isDynamic*` checks, avoid
double negations, and allow for `ShapedType::isStaticDim` to be used in
ADT functions without having to wrap it in a lambda performing the
negation.
Also add the new functions to C and Python bindings.
Just like 1d-tensors, reshapes of 0d-tensors (aka scalars) are always
no-folds as they only have one possible layout. This PR adds logic to
the `fold` implementation to optimize these away as is currently
implemented for 1d tensors.
Following up from https://github.com/llvm/llvm-project/pull/143467,
this PR adds support for
`ReductionTilingStrategy::PartialReductionOuterParallel` to
`tileUsingSCF`. The implementation of
`PartialReductionTilingInterface` for `Linalg` ops has been updated to
support this strategy as well. This makes the `tileUsingSCF` come on
par with `linalg::tileReductionUsingForall` which will be deprecated
subsequently.
Changes summary
- `PartialReductionTilingInterface` changes :
- `tileToPartialReduction` method needed to get the induction
variables of the generated tile loops. This was needed to keep the
generated code similar to `linalg::tileReductionUsingForall`,
specifically to create a simplified access for slicing the
intermediate partial results tensor when tiled in `num_threads` mode.
- `getPartialResultTilePosition` methods needs the induction
varialbes for the generated tile loops for the same reason above,
and also needs the `tilingStrategy` to be passed in to generate
correct code.
The tests in `transform-tile-reduction.mlir` testing the
`linalg::tileReductionUsingForall` have been moved over to test
`scf::tileUsingSCF` with
`ReductionTilingStrategy::PartialReductionOuterParallel`
strategy. Some of the test that were doing further cyclic distribution
of the transformed code from tiling are removed. Those seem like two
separate transformation that were merged into one. Ideally that would
need to happen when resolving the `scf.forall` rather than during
tiling.
Please review only the top commit. Depends on
https://github.com/llvm/llvm-project/pull/143467
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Follow ups from https://github.com/llvm/llvm-project/pull/142458/
In particular concerns that indiscriminately folding tensor constants
can lead to bloating the IR as these can be arbitrarily large.
Signed-off-by: Asra Ali <asraa@google.com>
This patch fixes:
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:1680:37: error: comparison
of integers of different signs: 'int' and 'uint64_t' (aka 'unsigned
long') [-Werror,-Wsign-compare]
Adds a few canonicalizers, folders, and rewrite patterns to tensor ops:
* tensor.insert folder: insert into a constant is replaced with a new
constant
* tensor.extract folder: extract from a parent tensor that was inserted
at the same indices is folded into the inserted value
* rewrite pattern added that replaces an extract of a collapse shape
with an extract of the source tensor (requires static source dimensions)
Signed-off-by: Asra Ali <asraa@google.com>
The revision adds isOneInteger helper, and simplifies the existing code
with the two methods. It removes some lambda, which makes code cleaner.
For downstream users, you can update the code with the below script.
```bash
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.h
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.cpp
```
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
## description
`tensor.concat` requires operands and the result to match on all
dimensions except the concatenation dimension. If one operand is already
static in those dimensions, the other operands and result type may
safely be refined to that same static shape. This PR adds
canonicalization patterns to refine `tensor.concat` types and propagate
static shapes to other canonicalization patterns through casts.
## example
```mlir
%2 = tensor.concat dim(0) %0, %1: (tensor<?x12xi32>, tensor<?x?xi32>) ->tensor<?x12xi32>
```
becomes:
```mlir
%cast = tensor.cast %1 : tensor<?x?xi32> to tensor<?x12xi32>
%2 = tensor.concat dim(0) %0, %cast : (tensor<?x12xi32>,
tensor<?x12xi32>) -> tensor<?x12xi32>
```
---------
Co-authored-by: Ian Wood <ianwood2024@u.northwestern.edu>
These are problematic because the iterative application that locally
resolves the tensor.dim operation introduces
intermediate floor_div, which is losing the information about the exact
division that was carried out in the original
IR, and the iterative algorithm can't converge towards the simplest
form.
Information loss is not acceptable for canonicalization.
Resolving the dimOp can be achieved through
resolve-ranked-shaped-type-result-dims and
resolve-shaped-type-result-dims passes.
---------
Signed-off-by: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
* Improve the verifier of `memref.subview` to detect out-of-bounds
extractions.
* Improve the documentation of `memref.subview` to make clear that
out-of-bounds extractions are not allowed. Rewrite examples to use the
new `strided<>` notation instead of `affine_map` layout maps. Also
remove all unrelated operations (`memref.alloc`) from the examples.
* Fix various test cases where `memref.subview` ops ran out-of-bounds.
* Update canonicalizations patterns to ensure that they do not fold IR
if it would generate IR that no longer verifies.
Related discussion on Discourse:
https://discourse.llvm.org/t/out-of-bounds-semantics-of-memref-subview/85293
This is a re-upload of #131876, which was reverted due to failing GPU
tests. These tests were faulty and fixed in #133051.
Reverts llvm/llvm-project#131876
GPU integration tests get broken by this PR.
E.x.
`mlir/test/Integration/GPU/CUDA/sm90/gemm_f32_f16_f16_128x128x128.mlir`
* Improve the verifier of `memref.subview` to detect out-of-bounds
extractions.
* Improve the documentation of `memref.subview` to make clear that
out-of-bounds extractions are not allowed. Rewrite examples to use the
new `strided<>` notation instead of `affine_map` layout maps. Also
remove all unrelated operations (`memref.alloc`) from the examples.
* Fix various test cases where `memref.subview` ops ran out-of-bounds.
* Update canonicalizations patterns to ensure that they do not fold IR
if it would generate IR that no longer verifies.
Related discussion on Discourse:
https://discourse.llvm.org/t/out-of-bounds-semantics-of-memref-subview/85293
Since #130487, `tensor.extract_slice` and `tensor.insert_slice` ops that
are statically detected to go out of bounds are rejected by the
verifier.
This commit fixes canonicalization patterns that currently fold
dynamically out-of-bounds ops (valid IR) to statically out-of-bounds ops
(invalid IR).
Creates an interface target for `LinalgRelayoutInterface`. This is
primarily to reduce the dependency of `Tensor` on `Linalg` to the
required minimum. For context and rationale, see:
* https://github.com/llvm/llvm-project/issues/127668
Note, I also took the liberty of renaming `LinalgRelayoutInterface` as
`RelayoutOpInterface` (removed `Linalg`, added `Op`).
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change
was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This change involves significant churn but only relocates existing code - no new
functionality is added.
**Note for Downstream Users**
Downstream users must update references to `PackOp` and `UnPackOp` as follows:
* Code: `s/tensor::(Up)PackOp/linalg::(Un)PackOp/g`
* Tests: `s/tensor.(un)pack/linalg.(un)pack/g`
No other modifications should be required.
1. Extract the main logic from `foldTensorCastPrecondition` into a
dedicated helper hook: `hasFoldableTensorCastOperand`. This allows
for reusing the corresponding checks.
2. Rename `getNewOperands` to `getUpdatedOperandsAfterCastOpFolding` for
better clarity and documentation of its functionality.
3. These updated hooks will be reused in:
* https://github.com/llvm/llvm-project/pull/123902. This PR makes
them public.
**Note:** Moving these hooks to `Tensor/Utils` is not feasible because
`MLIRTensorUtils` depends on `MLIRTensorDialect` (CMake targets). If
these hooks were moved to `Utils`, it would create a dependency of
`MLIRTensorDialect` on `MLIRTensorUtils`, leading to a circular
dependency.
The newly introduced `TensorRelayoutOpInterface` is created specifically
for `tensor.pack` + `tensor.unpack`. Although the interface is
currently empty, it enables us to refactor the logic in
`FoldTensorCastProducerOp` within the Tensor dialect as follows:
```cpp
// OLD
// Reject tensor::PackOp - there's dedicated pattern for that instead.
if (!foldTensorCastPrecondition(op) ||
isa<tensor::PackOp, tensor::UnPackOp>(*op))
return failure();
```
is replaced with:
```cpp
// NEW
// Reject tensor::PackOp - there's dedicated pattern for that instead.
if (!foldTensorCastPrecondition(op) ||
isa<tensor::RelayoutOpInterface>(*op))
return failure();
```
This will be crucial once `tensor.pack` + `tensor.pack` are replaced
with `linalg.pack` + `linalg.unpack` (i.e. moved to Linalg):
* https://github.com/llvm/llvm-project/pull/123902,
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg/.
Note that the interface itself will later be moved to the Linalg
dialect. This decoupling ensures that the Tensor dialect does not
require an understanding of Linalg ops, thus keeping the dependency
lightweight.
This PR is effectively a preparatory step for moving PackOp and UnpackOp
to Linalg. Once that's completed, most CMake changes from this PR will
be effectively reverted.
Implement the integer range inference niterface for memref.dim and
tetnor.dim using shared code. The inference will infer the `dim` of
dynamic dimensions to [0, index_max] and take the union of all the
dimensions that the `dim` argument could be validly referring to.
The op carries the output-shape directly. This can be used directly.
Also adds a method to get the shape as a `SmallVector<OpFoldResult>`.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by
introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This mirrors a
similar update made for `tensor::PackOp` in #114559. Below is the updated
rationale tailored to `tensor::UnPackOp`.
ISSUE DESCRIPTION
Currently, `FoldTensorCastProducerOp` incorrectly folds the following:
```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
// Note: `%c8` and `?`.
%unpack = tensor.unpack %cast
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```
as:
```mlir
// Note: `%c8` and `8`.
%unpack = tensor.unpack %cast
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```
This triggers an Op verification failure because the folder does not
update the inner tile sizes in the unpack Op. This patch addresses the
issue by ensuring proper handling of inner tile sizes.
ADDITIONAL CHANGES
* invalid.mlir: Fixed a typo.
* TensorOps.cpp:
* Removed unnecessary `(void)tileSize`.
* Added comments following the discussion in PR #115772.
* Made minor updates to `FoldTensorCastPackOp` for consistency with the newly
introduced `FoldTensorCastUnPackOp`.
* Tensor/canonicalize.mlir: Ensured consistent usage of `test_attr` (e.g.,
replaced mixed use of `test_attr` and `some_attr`).
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.