getMixedOffsets() calls getMixedValues() with `static_offsets` and
`offsets`. It is assumed that the number of dynamic entries in
`static_offsets` equals the size of `offsets`; otherwise, an assertion
fails when the array is accessed out of bounds.
The same applies to getMixedSizes() and getMixedStrides().
A verification of this assumption is added to
verifyOffsetSizeAndStrideOp(), and a clear assertion is added in
getMixedValues().
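For illustration, a minimal sketch of the added check (the helper name and exact signature below are assumptions, not the upstream code):
```cpp
// Hypothetical sketch: every dynamic marker in staticValues must be
// matched by one entry in dynamicValues, otherwise the merge loop below
// would index dynamicValues out of bounds.
SmallVector<OpFoldResult> getMixedValuesSketch(ArrayRef<int64_t> staticValues,
                                               ValueRange dynamicValues,
                                               Builder &b) {
  assert(static_cast<size_t>(
             llvm::count_if(staticValues, ShapedType::isDynamic)) ==
             dynamicValues.size() &&
         "expected one dynamic value per dynamic marker");
  SmallVector<OpFoldResult> result;
  unsigned nextDynamic = 0;
  for (int64_t value : staticValues)
    result.push_back(ShapedType::isDynamic(value)
                         ? OpFoldResult(dynamicValues[nextDynamic++])
                         : OpFoldResult(b.getI64IntegerAttr(value)));
  return result;
}
```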
These unused includes were identified by misc-include-cleaner. I've
filtered out those whose removal breaks builds. I'm also staying away
from llvm-config.h, config.h, and Compiler.h, whose removal would
likely cause platform- or compiler-specific build failures.
The motivation is to avoid having to negate `isDynamic*` checks, avoid
double negations, and allow `ShapedType::isStaticDim` to be used in
ADT functions without wrapping it in a lambda that performs the
negation.
Also add the new functions to C and Python bindings.
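A small usage sketch (assuming the new static predicate mirrors the existing static `isDynamic` helper):
```cpp
ArrayRef<int64_t> shape = shapedType.getShape();
// Before: a lambda is needed to negate the dynamic check.
bool allStatic = llvm::all_of(
    shape, [](int64_t dim) { return !ShapedType::isDynamic(dim); });
// After: the static predicate can be passed to ADT functions directly.
bool allStaticNow = llvm::all_of(shape, ShapedType::isStatic);
```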
Fix the following compile error when building libMLIRDialectUtils.a
only:
In file included from
mlir/include/mlir/Dialect/Utils/ReshapeOpsUtils.h:17,
from mlir/lib/Dialect/Utils/ReshapeOpsUtils.cpp:9:
mlir/include/mlir/Dialect/Arith/IR/Arith.h:28:10:
fatal error: mlir/Dialect/Arith/IR/ArithOpsDialect.h.inc: No such file
or directory
An ArithDialect dependency has been needed since
0515449f6dcb452ea0b089fb3057d469c3cffa3f, which creates an `arith.muli`
op.
Following up from https://github.com/llvm/llvm-project/pull/143467,
this PR adds support for
`ReductionTilingStrategy::PartialReductionOuterParallel` to
`tileUsingSCF`. The implementation of
`PartialReductionTilingInterface` for `Linalg` ops has been updated to
support this strategy as well. This brings `tileUsingSCF` on par with
`linalg::tileReductionUsingForall`, which will be deprecated
subsequently.
Changes summary:
- `PartialReductionTilingInterface` changes:
  - The `tileToPartialReduction` method now takes the induction
    variables of the generated tile loops. This was needed to keep the
    generated code similar to `linalg::tileReductionUsingForall`,
    specifically to create a simplified access for slicing the
    intermediate partial results tensor when tiled in `num_threads`
    mode.
  - The `getPartialResultTilePosition` method needs the induction
    variables of the generated tile loops for the same reason as above,
    and also needs the `tilingStrategy` to be passed in to generate
    correct code.
The tests in `transform-tile-reduction.mlir` exercising
`linalg::tileReductionUsingForall` have been moved over to test
`scf::tileUsingSCF` with the
`ReductionTilingStrategy::PartialReductionOuterParallel` strategy. Some
of the tests that performed a further cyclic distribution of the
transformed code after tiling have been removed; those seem like two
separate transformations that were merged into one. Ideally that would
happen when resolving the `scf.forall` rather than during tiling.
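As a rough usage sketch (option and result names follow the SCF tiling API as I understand it, and `tileSizes` is an assumed variable):
```cpp
// Tile a reduction op with the partial-reduction-outer-parallel
// strategy, distributing the partial reductions over an scf.forall.
scf::SCFTilingOptions options;
options.setTileSizes(tileSizes);
options.setReductionTilingStrategy(
    ReductionTilingStrategy::PartialReductionOuterParallel);
FailureOr<scf::SCFTilingResult> tilingResult =
    scf::tileUsingSCF(rewriter, cast<TilingInterface>(op), options);
```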
Please review only the top commit. Depends on
https://github.com/llvm/llvm-project/pull/143467
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This patch enhances `MemRefType::areTrailingDimsContiguous` to also
handle memrefs with dynamic dimensions.
The implementation is based on a new member function,
`MemRefType::getMaxCollapsableTrailingDims`, that returns the maximum
number of trailing dimensions that can be collapsed: trivially all
dimensions for memrefs with an identity layout, or otherwise determined
by examining the memref strides, stopping at the first discontiguous
or statically unknown stride.
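A simplified sketch of the stride walk (illustrative, not the exact upstream implementation):
```cpp
// Count trailing dims whose stride equals the running product of the
// inner dim sizes; stop at the first dynamic or mismatching stride.
int64_t maxCollapsableTrailingDims(ArrayRef<int64_t> shape,
                                   ArrayRef<int64_t> strides) {
  int64_t numCollapsable = 0;
  int64_t expectedStride = 1;
  for (int64_t i = static_cast<int64_t>(shape.size()) - 1; i >= 0; --i) {
    if (ShapedType::isDynamic(strides[i]) || strides[i] != expectedStride)
      break;
    ++numCollapsable;
    if (ShapedType::isDynamic(shape[i]))
      break; // The product of sizes is unknown past a dynamic dim.
    expectedStride *= shape[i];
  }
  return numCollapsable;
}
```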
Commit 6e5a142 changed the behavior of the function when computing
reassociations between tensors (consisting of unit/dynamic dimensions)
and scalars/0d vectors. The IR representation for such reshapes actually
expects an empty reassociation, like so:
```
func.func @example(%arg0 : tensor<?x?x?xf32>) -> tensor<f32> {
  %0 = tensor.collapse_shape %arg0 [] : tensor<?x?x?xf32> into tensor<f32>
  return %0 : tensor<f32>
}
```
Restore the original behavior: the routine should report failure when
compile-time-known non-unit dimensions are part of the attempted
reassociation.
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
The original PR https://github.com/llvm/llvm-project/pull/137963 hit an
NVIDIA bot failure. This appears to have been a flaky test, because
rerunning the build succeeded.
This change needs commit 6f2ba47 to fix incorrect usage of
`getReassociationIndicesForCollapse`.
Reverts llvm/llvm-project#142639
Co-authored-by: Artem Gindinson <gindinson@roofline.ai>
The main idea behind the change is to allow expand-of-collapse folds for
reshapes like `?x?xk` -> `?` (k>1). The rationale here is that the
expand op must have a coherent index/affine expression specified in its
`output_shape` argument (see example below), and if it doesn't, the IR
has already been invalidated at an earlier stage:
```
%c32 = arith.constant 32 : index
%div = arith.divsi %<some_index>, %c32 : index
%collapsed = tensor.collapse_shape %41#1 [[0], [1, 2], [3, 4]]
: tensor<9x?x32x?x32xf32> into tensor<9x?x?xf32>
%affine = affine.apply affine_map<()[s0] -> (s0 * 32)> ()[%div]
%expanded = tensor.expand_shape %collapsed [[0], [1, 2], [3]] output_shape [9, %div, 32, %affine]
: tensor<9x?x?xf32> into tensor<9x?x32x?xf32>
```
On the above assumption, adjust the routine in
`getReassociationIndicesForCollapse()` to allow dynamic reshapes beyond
just `?x..?x1x1x..x1` -> `?`. Dynamic subshapes introduce two kinds of
issues:
1. n>2 consecutive dynamic dimensions in the source shape cannot be
collapsed together into 1<k<n neighboring dynamic dimensions in the
target shape, since there'd be more than one suitable reassociation
(example: `?x?x10x? into ?x?`)
2. When figuring out static subshape reassociations based on products,
there are cases where a static dimension is collapsed with a dynamic
one, and should therefore be skipped when comparing products of source &
target dimensions (e.g. `?x2x3x4 into ?x12`)
To address 1, we should detect such sequences in the target shape before
assigning multiple dynamic dimensions into the same index set. For 2, we
take note that a static target dimension was preceded by a dynamic one
and allow an "offset" subshape of source static dimensions, as long as
there's an exact sequence for the target size later in the source shape.
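For instance, the ambiguous case from point 1 should make the routine report failure (a hedged illustration, assuming the current `getReassociationIndicesForCollapse` signature):
```cpp
// ?x?x10x? -> ?x? admits both [[0], [1, 2, 3]] and [[0, 1], [2, 3]],
// so no unique reassociation can be returned.
std::optional<SmallVector<ReassociationIndices>> reassociation =
    getReassociationIndicesForCollapse(
        /*sourceShape=*/{ShapedType::kDynamic, ShapedType::kDynamic, 10,
                         ShapedType::kDynamic},
        /*targetShape=*/{ShapedType::kDynamic, ShapedType::kDynamic});
assert(!reassociation.has_value() && "ambiguous reassociations are rejected");
```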
This PR aims to address all reshapes that can be determined based purely
on shapes (and original reassociation maps, as done in
`ComposeExpandOfCollapseOp::findCollapsingReassociation`). It doesn't
seem possible to fold all qualifying dynamic shape patterns in a
deterministic way without also looking into the affine expressions.
That would be difficult to maintain in a single general utility, so a
path forward would be to provide dialect-specific implementations for
Linalg/Tensor.
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
---------
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
Co-authored-by: Ian Wood <ianwood2024@u.northwestern.edu>
The revision adds an `isOneInteger` helper and simplifies the existing
code with the two methods (`isZeroInteger` and `isOneInteger`). It
removes some lambdas, which makes the code cleaner.
Downstream users can update their code with the script below.
```bash
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.h
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.cpp
```
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
In `int64_t r = strides.size() - 2`, the subtraction can wrap around on
32-bit systems when `strides.size()` is 1, because `strides.size()`
returns an unsigned 32-bit type there.
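A minimal sketch of the fix:
```cpp
// Widen to a signed 64-bit type before subtracting so the result can
// become negative instead of wrapping around.
int64_t r = static_cast<int64_t>(strides.size()) - 2;
```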
The method does not need to create any operation, so we can use Builder
instead of OpBuilder. This way it can be reused by any attribute getter
implementation without having to declare an OpBuilder.
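A hedged sketch of the distinction (the getter below is hypothetical):
```cpp
// A Builder can create attributes and types without an insertion
// point, so attribute getters do not need the heavier OpBuilder.
Attribute getUnitStridesAttr(Builder &builder, int64_t rank) {
  return builder.getI64ArrayAttr(SmallVector<int64_t>(rank, /*Value=*/1));
}
```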
Signed-off-by: hanhanW <hanhan0912@gmail.com>
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin iterators like succ_begin for now.
Adds checks in `isPermutationVector` for indices that are out of bounds
and removes the assert.
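A rough sketch of the hardened check (illustrative, not the exact upstream code):
```cpp
// Out-of-range or repeated indices now make the check return false
// instead of tripping an assert deeper in the implementation.
bool isPermutationVectorSketch(ArrayRef<int64_t> interchange) {
  llvm::SmallDenseSet<int64_t> seen;
  int64_t rank = static_cast<int64_t>(interchange.size());
  for (int64_t value : interchange)
    if (value < 0 || value >= rank || !seen.insert(value).second)
      return false;
  return true;
}
```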
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change
was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This change involves significant churn but only relocates existing code - no new
functionality is added.
**Note for Downstream Users**
Downstream users must update references to `PackOp` and `UnPackOp` as follows:
* Code: `s/tensor::(Un)PackOp/linalg::(Un)PackOp/g`
* Tests: `s/tensor.(un)pack/linalg.(un)pack/g`
No other modifications should be required.
The op carries the output shape directly, so it can be used as-is.
Also adds a method to get the shape as a `SmallVector<OpFoldResult>`.
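A small usage sketch (assuming this refers to `tensor.expand_shape`'s `output_shape`; the accessor name here is an assumption):
```cpp
// Read the op's output shape directly as mixed static/dynamic values.
SmallVector<OpFoldResult> outputShape = expandShapeOp.getMixedOutputShape();
```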
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
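The mechanical migration looks like this (illustrative pointee types):
```cpp
#include "llvm/ADT/PointerUnion.h"

void example(llvm::PointerUnion<int *, float *> u) {
  // Before (soft-deprecated member functions):
  if (u.is<int *>())
    (void)u.get<int *>();
  // After (free-function casts):
  if (llvm::isa<int *>(u))
    (void)llvm::cast<int *>(u);
}
```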
This PR *restricts* `GeneralizeOuterUnitDimsPackOpPattern` to follow its
intended purpose (as per the documentation), which is to:
> require all outer dimensions of tensor.pack to be 1.
There was one in-tree test that violated this assumption (and happened
to work) – see `@simple_KCRS_to_KRSCsr` in
"generalize-tensor-pack.mlir". That test has been updated to satisfy the
new requirements of the pattern.
By enforcing the pattern to follow its intended design (i.e., making it
stricter), the calculation of shapes and sizes for various Ops that the
pattern generates (PadOp, ExtractSliceOp, EmptyOp, TransposeOp, and
InsertSliceOp) becomes much simpler and easier to document. This also
helped *generalize* the pattern to support cases like the one below:
```mlir
func.func @simple_pad_and_pack_dynamic_tile_cst(
%src: tensor<5x1xf32>,
%dest: tensor<1x1x?x2xf32>,
%pad: f32) -> tensor<1x1x?x2xf32> {
%tile_dim_0 = arith.constant 8 : index
%0 = tensor.pack %src
padding_value(%pad : f32)
inner_dims_pos = [0, 1]
inner_tiles = [%tile_dim_0, 2]
into %dest : tensor<5x1xf32> -> tensor<1x1x?x2xf32>
return %0 : tensor<1x1x?x2xf32>
}
```
Note that the inner tile size is dynamic, yet its value is a
compile-time constant.
`getPackOpSourceOrPaddedSource`, which is used to generate PadOp,
detects this and generates a PadOp with static shapes. This is a good
optimization, but it means that all shapes/sizes for Ops generated by
`GeneralizeOuterUnitDimsPackOpPattern` also need to be updated to be
constant/static. By restricting the pattern and simplifying the
size/shape calculation, supporting the case above becomes much easier.
Notable implementation changes:
* PadOp processes the original source (no change in dimensions/rank).
ExtractSliceOp extracts the tile to pack and may reduce the rank. All
following ops work on the tile extracted by ExtractSliceOp (possibly
rank-reduced).
* All shape/size calculations assume that trailing dimensions match
inner_tiles from tensor.pack. All leading dimensions (i.e., outer
dimensions) are assumed to be 1.
* Dynamic sizes for ops like ExtractSliceOp are taken from inner_tiles
rather than computed as, for example, `tensor.dim %dest, 2`. It's the
responsibility of the "producers" of tensor.pack to ensure that
dimensions in %dest match the specified tile sizes.
In the insert_slice bufferization interface implementation, the
destination tensor is not considered read if the full tensor is
overwritten by the slice. This PR adds the same check for
tensor.parallel_insert_slice.
Adds two new StaticValueUtils:
- `isAllConstantIntValue` checks whether all elements of an array of
`OpFoldResult` equal a given `int64_t` value.
- `areConstantIntValues` checks whether the elements of an array of
`OpFoldResult` are equal, element-wise, to a given array of `int64_t`
values.
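For example, such helpers can express a "slice covers the whole destination" check concisely (a sketch under assumed variable names):
```cpp
// All offsets are 0, all strides are 1, and the sizes equal the
// destination shape: the slice overwrites the full destination tensor.
bool coversFullDest =
    isAllConstantIntValue(mixedOffsets, /*value=*/0) &&
    isAllConstantIntValue(mixedStrides, /*value=*/1) &&
    areConstantIntValues(mixedSizes, destType.getShape());
```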
fixes https://github.com/llvm/llvm-project/issues/112435
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
This patch fixes an off-by-one error in
`mlir::getReassociationIndicesForCollapse()` that occurs when the last
two dims of the source tensor satisfy the while loop condition.
Previously this caused an assertion failure due to an out-of-bounds
access; that is now fixed.
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
---------
Signed-off-by: Gaurav Shukla<gaurav.shukla@amd.com>
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
It looks like the affine map generated to compute the indices of the
collapsed dimensions used the wrong dim size. For indices
`[idx0][idx1]`, we computed the collapsed index as `idx0*size0 + idx1`
instead of `idx0*size1 + idx1`: linearizing over a `size0 x size1`
shape must multiply the outer index by the inner dimension's size. This
led to correctness issues in convolution tests when enabling this
transformation internally.
They can be simplified to reshape ops if outer_dims_perm is an identity
permutation. The revision adds an `isIdentityPermutation` method to
IndexingUtils.
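A sketch of what such a helper amounts to (illustrative, not the exact upstream code):
```cpp
// A permutation is the identity iff every element equals its position.
bool isIdentityPermutationSketch(ArrayRef<int64_t> permutation) {
  for (auto [index, value] : llvm::enumerate(permutation))
    if (static_cast<int64_t>(index) != value)
      return false;
  return true;
}
```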
This commit fixes a crash in the canonicalizer when slice ops have
offset/size SSA values with a negative constant value. Such ops are
invalid if they are reachable, and their offsets/sizes should not be
folded to static integer values. (But such ops may appear in
unreachable blocks.)
This commit fixes #71150.
This change refactors some of the utilities used to unroll larger vector
computations into smaller vector computations. In fact, the indexing
computations used here are rather generic and are useful in other dialects or
downstream projects. Therefore, a utility for iterating over all possible tile
offsets for a particular pair of static (shape, tiled shape) is introduced in
IndexingUtils and replaces the existing computations in the vector unrolling
transformations. This builds on the refactoring of IndexingUtils introduced
in 203fad476b7e.
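A hedged usage sketch of the new utility (assuming it is the `StaticTileOffsetRange` iterator in IndexingUtils):
```cpp
// Visit each tile offset once; with an 8x8 shape and 4x4 tiles the
// offsets are {0, 0}, {0, 4}, {4, 0}, {4, 4} (order assumed row-major).
for (SmallVector<int64_t> offsets :
     StaticTileOffsetRange(/*shape=*/{8, 8}, /*tileShape=*/{4, 4})) {
  process(offsets); // hypothetical per-tile consumer
}
```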
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D150000
Some GPU backends (SPIR-V) lower memrefs to bare pointers, so lowering
dynamically sized/strided memrefs will fail.
This pass extracts sizes and strides via
`memref.extract_strided_metadata` outside the `gpu.launch` body, does
the index/offset calculation explicitly, and then reconstructs the
memrefs via `memref.reinterpret_cast`.
`memref.reinterpret_cast` is then lowered via
https://reviews.llvm.org/D155011
Differential Revision: https://reviews.llvm.org/D155247
This revision adds support for directly lowering a linalg.copy on
buffers between global and shared memory to a TMA async load plus
synchronization operations.
This uses the recently introduced Hopper NVVM and NVGPU abstractions to
connect things end to end.
Differential Revision: https://reviews.llvm.org/D157087
* Move `foldDynamicIndexList` to `DialectUtils` and simplify function.
* Move `OpWithOffsetSizesAndStridesConstantArgumentFolder` to `ViewLikeInterface` and add documentation.
Differential Revision: https://reviews.llvm.org/D156581