These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
The motivation is to avoid having to negate `isDynamic*` checks, avoid
double negations, and allow for `ShapedType::isStaticDim` to be used in
ADT functions without having to wrap it in a lambda performing the
negation.
Also add the new functions to C and Python bindings.
…shapes
Commit 6e5a142 changed the behavior of the function when computing
reassociations between tensors (consisting of unit/dynamic dimensions)
and scalars/0d vectors. The IR representation for such reshapes actually
expects an empty reassociation, like so:
```
func.func @example(%arg0 : tensor<?x?x?xf32>) -> tensor<f32> {
%0 = tensor.collapse_shape %arg0 [] : tensor<?x?x?xf32> into tensor<f32>
}
```
Restore the original behavior - the routine should resort to reporting
failures when compile time-known non-unit dimensions are part of the
attempted reassociation.
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
The original PR https://github.com/llvm/llvm-project/pull/137963 had a
nvidia bot failure. This appears to be a flaky test because rerunning
the build was successful.
This change needs commit 6f2ba47 to fix incorrect usage of
`getReassociationIndicesForCollapse`.
Reverts llvm/llvm-project#142639
Co-authored-by: Artem Gindinson <gindinson@roofline.ai>
The main idea behind the change is to allow expand-of-collapse folds for
reshapes like `?x?xk` -> `?` (k>1). The rationale here is that the
expand op must have a coherent index/affine expression specified in its
`output_shape` argument (see example below), and if it doesn't, the IR
has already been invalidated at an earlier stage:
```
%c32 = arith.constant 32 : index
%div = arith.divsi %<some_index>, %c32 : index
%collapsed = tensor.collapse_shape %41#1 [[0], [1, 2], [3, 4]]
: tensor<9x?x32x?x32xf32> into tensor<9x?x?xf32>
%affine = affine.apply affine_map<()[s0] -> (s0 * 32)> ()[%div]
%expanded = tensor.expand_shape %collapsed [[0], [1, 2], [3]] output_shape [9, %div, 32, %affine]
: tensor<9x?x?xf32> into tensor<9x?x32x?xf32>
```
On the above assumption, adjust the routine in
`getReassociationIndicesForCollapse()` to allow dynamic reshapes beyond
just `?x..?x1x1x..x1` -> `?`. Dynamic subshapes introduce two kinds of
issues:
1. n>2 consecutive dynamic dimensions in the source shape cannot be
collapsed together into 1<k<n neighboring dynamic dimensions in the
target shape, since there'd be more than one suitable reassociation
(example: `?x?x10x? into ?x?`)
2. When figuring out static subshape reassociations based on products,
there are cases where a static dimension is collapsed with a dynamic
one, and should therefore be skipped when comparing products of source &
target dimensions (e.g. `?x2x3x4 into ?x12`)
To address 1, we should detect such sequences in the target shape before
assigning multiple dynamic dimensions into the same index set. For 2, we
take note that a static target dimension was preceded by a dynamic one
and allow an "offset" subshape of source static dimensions, as long as
there's an exact sequence for the target size later in the source shape.
This PR aims to address all reshapes that can be determined based purely
on shapes (and original reassociation
maps, as done in
`ComposeExpandOfCollapseOp::findCollapsingReassociation)`. It doesn't
seem possible to fold all qualifying dynamic shape patterns in a
deterministic way without looking into affine expressions
simultaneously. That would be difficult to maintain in a single general
utility, so a path forward would be to provide dialect-specific
implementations for Linalg/Tensor.
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
---------
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
Co-authored-by: Ian Wood <ianwood2024@u.northwestern.edu>
The method does not need to create any operation, so we can use Builder.
It can be reused by any attribute getter implementation, so it does not
need to declare OpBuilder in the implementation.
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change
was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This change involves significant churn but only relocates existing code - no new
functionality is added.
**Note for Downstream Users**
Downstream users must update references to `PackOp` and `UnPackOp` as follows:
* Code: `s/tensor::(Up)PackOp/linalg::(Un)PackOp/g`
* Tests: `s/tensor.(un)pack/linalg.(un)pack/g`
No other modifications should be required.
This patch fixes an off-by-one error in
`mlir::getReassociationIndicesForCollapse()` that occurs when the last
two dims of the source tensor satisfy the while loop.
This would cause an assertion failure due to out-of-bounds-access, which
is now fixed.
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
---------
Signed-off-by: Gaurav Shukla<gaurav.shukla@amd.com>
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
This patch recognizes when tensor.pack/unpack operations are simple
tensor.pad/unpad (a.k.a. tensor.extract_slice) and lowers them in a simpler
sequence of instruction.
For pack, instead of doing:
```
pad
expand_shape
transpose
```
we do
```
pad
insert_slice
```
For unpack, instead of doing:
```
transpose
collapse_shape
extract_slice
```
we do
```
extract_slice
```
Note: returning nullptr for the transform dialect is fine. The related
handles are just ignored by the following transformation.
Differential Revision: https://reviews.llvm.org/D148159
std::optional::value() has undesired exception checking semantics and is
unavailable in older Xcode (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). The
call sites block std::optional migration.
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Prior to this change, the "ExtractSliceFromReshape" pattern would transform
```
%collapsed = tensor.collapse_shape %input [[0, 1], [2]]
: tensor<1x11x100xf32> into tensor<11x100xf32>
%slice = tensor.extract_slice %collapsed [%offt, 0] [%size, 100] [1, 1]
: tensor<11x100xf32> to tensor<?x100xf32>
```
into a loop that iterated over the range `%size - %offt`, that pieces
together multiple sub-slices of `%input` along the first dimension. This
is correct but obviously inefficient. The technical condition is that
collapsing at-most-one non-unit dimension of `%src` will not result in a
subsequent slice along the corresponding dimension of `%collapsed`
mapping across discontinuities in the index space of `%src`. Thus, the
definition of a "linearized dimension" (from the perspective of
`tensor.collapse_shape`) is updated to reflect this condition.
The transform will now generate
```
%slice = tensor.extract_slice %input [0, %offt, 0][1, %size, 100] [1, 1]
: tensor<1x11x100xf32> to tensor<1x?x100xf32>
%result = tensor.collapse_shape [[0, 1], [2]]
: tensor<1x?x100xf32> to tensor<?x100xf32>
```
which can be further canonicalized.
Additional tests are added to check this family of edge cases.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D135726
The transformation would fail if none of the sliced dimensions were
linearized by the producing `tensor.collapse_shape`. This is a trivial
edge case but it wasn't correctly tested. Fixes the issue and adds a test.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134088
This change adds a set of utilities to replace the result of a
`tensor.collapse_shape -> tensor.extract_slice` chain with the
equivalent result formed by aggregating slices of the
`tensor.collapse_shape` source. In general, it is not possible to
commute `extract_slice` and `collapse_shape` if linearized dimensions
are sliced. The i-th dimension of the `tensor.collapse_shape`
result is a "linearized sliced dimension" if:
1) Reassociation indices of tensor.collapse_shape in the i'th position
is greater than size 1 (multiple dimensions of the input are collapsed)
2) The i-th dimension is sliced by `tensor.extract_slice`.
We can work around this by stitching together the result of
`tensor.extract_slice` by iterating over any linearized sliced dimensions.
This is equivalent to "tiling" the linearized-and-sliced dimensions of
the `tensor.collapse_shape` operation in order to manifest the result
tile (the result of the `tensor.extract_slice`). The user of the
utilities must provide the mechanism to create the tiling (e.g. a loop).
In the tests, it is demonstrated how to apply the utilities using either
`scf.for` or `scf.foreach_thread`.
The below example illustrates the pattern using `scf.for`:
```
%0 = linalg.generic ... -> tensor<3x7x11x10xf32>
%1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<341x10xf32>
%2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : .... tensor<10x10xf32>
```
We can construct %2 by generating the following IR:
```
%dest = linalg.init_tensor() : tensor<10x10xf32>
%2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> {
// Step 1: Map this output idx (%iv) to a multi-index for the input (%3):
%linear_index = affine.apply affine_map<(d0)[]->(d0*2 + 11)>(%iv)
%3:3 = arith.delinearize_index %iv into (3, 7, 11)
// Step 2: Extract the slice from the input
%4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] :
tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32>
%5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] :
tensor<1x1x1x10xf32> into tensor<1x10xf32>
// Step 3: Insert the slice into the destination
%6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] :
tensor<1x10xf32> into tensor<10x10xf32>
scf.yield %6 : tensor<10x10xf32>
}
```
The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129699
This reverts commit 5711957875738c1318f89afd7bf4be388f85a087.
A circular dependency is introduced here from Dialect/Utils/ to the
ViewLikeInterface, but it already depends on Dialect/Utils.
Also this introduces a dependency from lib/Dialect/Tensor to Linalg,
which isn't obviously correct from a layering point of view.
This change adds a set of utilities to replace the result of a
`tensor.collapse_shape -> tensor.extract_slice` chain with the
equivalent result formed by aggregating slices of the
`tensor.collapse_shape` source. In general, it is not possible to
commute `extract_slice` and `collapse_shape` if linearized dimensions
are sliced. The i-th dimension of the `tensor.collapse_shape`
result is a "linearized sliced dimension" if:
1) Reassociation indices of tensor.collapse_shape in the i'th position
is greater than size 1 (multiple dimensions of the input are collapsed)
2) The i-th dimension is sliced by `tensor.extract_slice`.
We can work around this by stitching together the result of
`tensor.extract_slice` by iterating over any linearized sliced dimensions.
This is equivalent to "tiling" the linearized-and-sliced dimensions of
the `tensor.collapse_shape` operation in order to manifest the result
tile (the result of the `tensor.extract_slice`). The user of the
utilities must provide the mechanism to create the tiling (e.g. a loop).
In the tests, it is demonstrated how to apply the utilities using either
`scf.for` or `scf.foreach_thread`.
The below example illustrates the pattern using `scf.for`:
```
%0 = linalg.generic ... -> tensor<3x7x11x10xf32>
%1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<341x10xf32>
%2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : .... tensor<10x10xf32>
```
We can construct %2 by generating the following IR:
```
%dest = linalg.init_tensor() : tensor<10x10xf32>
%2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> {
// Step 1: Map this output idx (%iv) to a multi-index for the input (%3):
%linear_index = affine.apply affine_map<(d0)[]->(d0*2 + 11)>(%iv)
%3:3 = arith.delinearize_index %iv into (3, 7, 11)
// Step 2: Extract the slice from the input
%4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] :
tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32>
%5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] :
tensor<1x1x1x10xf32> into tensor<1x10xf32>
// Step 3: Insert the slice into the destination
%6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] :
tensor<1x10xf32> into tensor<10x10xf32>
scf.yield %6 : tensor<10x10xf32>
}
```
The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129699
This change changes the handling of trailing dimensions with unknown
extent. Users of the changessociationIndicesForReshape helper should
see benefits when transforming reshape like operations into
expand/collapse pairs if the higher-rank type has trailing unknown
dimensions.
The motivating example is a reshape from tensor<16x1x?xi32> to
tensor<16xi32> that can be modeled as collapsing the three dimensions.
Differential Revision: https://reviews.llvm.org/D119730
The input type of a linalg.generic can be less dynamic than its output
type. If this is the case moving a reshape across the generic op would
create invalid IR, as expand_shape cannot expand arbitrary dynamic
dimensions.
Check that the reshape is actually valid before creating the
expand_shape. This exposes the existing verification logic in reshape
utils and removes the incomplete custom implementation in fusion.
Differential Revision: https://reviews.llvm.org/D116600
`memref.collapse_shape` has verification logic to make sure
result dim must be static if all the collapsing src dims are static.
Cast folding might add more static information for the src operand
of `memref.collapse_shape` which might change a valid collapsing
operation to be invalid. Add `CollapseShapeOpMemRefCastFolder` pattern
to fix this.
Minor changes to `convertReassociationIndicesToExprs` to use `context`
instead of `builder` to avoid extra steps to construct temporary
builders.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D106670