In the linalg ElementwiseOpFusion transform, a prerequisite for the
fusion between a producer and consumer op is that the producer's output
indexing map associated with the result being fused must be invertible
(e.g., a simple permutation).
Before this patch, only the first output indexing map was checked;
this bug caused issues when the operand to fuse was not the first result
of the producer op. For example, this situation arises when the producer
op has multiple results because it is itself the result of previous
fusions in which the original results were preserved: in these cases,
the pass ought to check the indexing map of the result being fused,
which is not necessarily the first one.
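A minimal sketch of the corrected check (`fusedOperand` and the
surrounding setup are assumptions for illustration, not the literal patch):
```
// Look up the indexing map of the result actually being fused, which is
// not necessarily the producer's first result.
auto producerResult = cast<OpResult>(fusedOperand->get());
AffineMap outputMap = producer.getIndexingMapMatchingResult(producerResult);
// Fusion requires this map to be invertible (e.g., a simple permutation).
if (!outputMap.isPermutation())
  return false;
```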
Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>
When collapsing the dimensions of a linalg op, we check whether its
memref operands are guaranteed to be collapsible. However, we previously
assumed that the matching indexing map was the identity map.
This commit changes that behavior and checks whether the memref is
collapsible on the transformed dimensions.
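A hedged sketch of the shape of the new check; `operandType` and
`operandReassociation` (the reassociation translated through the
operand's indexing map) are assumed names:
```
// Instead of assuming an identity indexing map, verify that this memref
// operand is guaranteed to be collapsible along the transformed dimensions.
if (!memref::CollapseShapeOp::isGuaranteedCollapsible(operandType,
                                                      operandReassociation))
  return failure();
```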
When fusing two `linalg.generic` ops where the producer has index
semantics, invalid `affine.apply` ops could be generated in which the
number of indices did not match the number of loops in the fused
generic op.
This patch fixes the issue by directly using the number of loops from
the generated fused op.
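A minimal sketch of the corrected construction, assuming hypothetical
names `fusedOp`, `idxExpr`, and `fusedIndices`:
```
// Size the map by the loop count of the *fused* op, not the original
// producer, so the affine.apply indices line up with the fused loops.
unsigned numFusedLoops = fusedOp.getNumLoops();
AffineMap idxMap =
    AffineMap::get(/*dimCount=*/numFusedLoops, /*symbolCount=*/0, idxExpr);
Value mapped = rewriter.create<affine::AffineApplyOp>(loc, idxMap, fusedIndices);
```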
Similar to `FoldWithProducerReshapeOpByCollapsing`,
`FoldReshapeWithGenericOpByCollapsing` needs to be able to handle
partial fusion of a reshape by collapsing. This means that the source of
the generated `expand_shape` op (aka the collapsed linalg op) might not
match the type of the original `collapse_shape` op. This change instead
replaces the original linalg op with the new `expand_shape` op, which is
guaranteed to have the same type.
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
This is similar to other configuration objects used across MLIR.
Rename some fields to better reflect that they are no longer booleans.
Reland 04d261101b4f229189463136a794e3e362a793af / #132253.
Fixes dominance verifier error with
`FoldReshapeWithGenericOpByCollapsing` by setting the insertion point
after `producer`. The `tensor.collapse_shape` op only has a single
operand (`producer`), so it is safe to insert after the producer.
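In code, the fix is essentially a one-liner (sketch, assuming `producer`
names the defining op):
```
// All operands of the new ops stem from `producer`, so inserting directly
// after it keeps every operand dominating the insertion point.
rewriter.setInsertionPointAfter(producer);
```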
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
```
Dest.insert(Src.begin(), Src.end());
```
with:
```
Dest.insert_range(Src);
```
This patch does not touch custom begin functions like succ_begin for now.
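For illustration, a self-contained example of the new call style
(container and element types chosen arbitrarily):
```
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallVector.h"

void example() {
  llvm::SmallVector<int> Src = {1, 2, 3};
  llvm::DenseSet<int> Dest;
  // C++23-style range insert; equivalent to
  // Dest.insert(Src.begin(), Src.end()).
  Dest.insert_range(Src);
}
```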
With `tensor.expand_shape` now allowing a dynamic dimension to be
expanded into multiple dynamic dimensions, adapt the reshape propagation
through expansion to handle cases where one dynamic dimension is
expanded into multiple dynamic dimensions.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
The pattern to bubble up collapse shapes was missing from
`populateFoldReshapeOpsByCollapsingPatterns`.
Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
In https://github.com/llvm/llvm-project/pull/129128, reshape-as-consumer
fusion handling of linalg.transpose was missed. This PR adds that.
Also, the transpose reshape-as-producer fusion test is updated to static
sizes, as that is more likely to catch issues with the permutation
vector in the verifier if the shapes don't match up.
---------
Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
Removes the condition that checks that the operand is not indexed by
reduction iterators, which allows for more fine-grained control via the
reshape fusion control function. For example, users could allow fusing
reshapes that expand the M/N dims of a matmul but not the K dims (or
preserve the current behavior by not fusing at all).
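A hedged sketch of how a user could keep the old restriction via the
control function (`patterns` is assumed; the reduction check is
illustrative):
```
// Only allow fusing a reshape through an operand whose indexing map does
// not reference any reduction loop.
linalg::ControlFusionFn controlFn = [](OpOperand *fusedOperand) {
  auto linalgOp = dyn_cast<linalg::LinalgOp>(fusedOperand->getOwner());
  if (!linalgOp)
    return true;
  SmallVector<utils::IteratorType> iters = linalgOp.getIteratorTypesArray();
  AffineMap map = linalgOp.getMatchingIndexingMap(fusedOperand);
  for (AffineExpr expr : map.getResults())
    if (auto dim = dyn_cast<AffineDimExpr>(expr))
      if (iters[dim.getPosition()] == utils::IteratorType::reduction)
        return false;
  return true;
};
linalg::populateFoldReshapeOpsByCollapsingPatterns(patterns, controlFn);
```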
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
This PR preserves linalg op types for certain named ops such as Fill,
Copy, and Transpose, instead of fusion always resulting in a generic op.
---------
Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
This PR fixes how the iteration domain of linalg.generic is collapsed
when fusing with tensor.expand_shape. Previously, the output_shape for
tensor.expand_shape was inferred, which only works in some special
cases.
This patch makes the logic explicitly set the bounds of the new
collapsed iteration domain, because we already know them.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
The index computation is meant to be signed. Using unsigned could lead
to subtle errors. Fix places where some index math was using unsigned
operations.
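A generic illustration of the failure mode (not code from the patch):
```
#include <cstdint>

int64_t difference(unsigned a, unsigned b) {
  // Wrong: if b > a, the unsigned subtraction wraps to a huge value.
  //   return a - b;
  // Right: do index math in a signed type.
  return static_cast<int64_t>(a) - static_cast<int64_t>(b);
}
```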
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
The greedy rewriter is used in many different flows and has a lot of
conveniences (worklist management, debugging actions, tracing, etc.).
But it combines two kinds of greedy behavior: 1) how ops are matched,
and 2) folding wherever it can.
These are independent forms of greediness, and combining them leads to
inefficiency, e.g., cases where one needs to create different phases in
lowering and has to apply patterns in a specific order split across
different passes. Using the driver, one ends up needlessly retrying
folding/having multiple rounds of folding attempts, where one final run
would have sufficed.
Of course, folks can locally avoid this behavior by building their own
driver, but this is also a commonly requested feature that folks keep
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated API should just be a find-and-replace (e.g., of the
`find ./ -type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments haven't changed between the two.
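The mechanical update looks like this:
```
// Before (deprecated):
(void)applyPatternsAndFoldGreedily(op, std::move(patterns));
// After (same arguments, new name):
(void)applyPatternsGreedily(op, std::move(patterns));
```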
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
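The replacement pattern, illustrated on a hypothetical
`PointerUnion<Foo *, Bar *>`:
```
llvm::PointerUnion<Foo *, Bar *> PU = getUnion(); // hypothetical source
// Before (soft-deprecated members):
//   if (PU.is<Foo *>()) use(PU.get<Foo *>());
// After (free-standing casts):
if (isa<Foo *>(PU))
  use(cast<Foo *>(PU));
```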
`isOpOperandCanBeDroppedAfterFusedLinalgs` crashes when `indexingMaps`
is empty. This can occur when `producer` only has DPS init operands and
`consumer` only has a single DPS input operand (all operands are
ignored and nothing gets added to `indexingMaps`). This is because
`concatAffineMaps` wasn't properly handling the case where `maps` is empty.
Similar to `canOpOperandsBeDroppedImpl`, I added an early return when
the maps are of size zero. Additionally, `concatAffineMaps`'s
declaration comment says it returns an empty map when `maps` is empty
but it has no way to get the `MLIRContext` needed to construct the empty
affine map when the array is empty. So, I changed this to take the
context.
__NOTE: concatAffineMaps now takes an MLIRContext to be able to
construct an empty map in the case where `maps` is empty.__
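The resulting declaration change, per the note above (a sketch of the
signatures, not the exact diff):
```
// Before: no way to build the empty map when `maps` is empty.
//   AffineMap concatAffineMaps(ArrayRef<AffineMap> maps);
// After: the context makes the empty-map case constructible.
AffineMap concatAffineMaps(ArrayRef<AffineMap> maps, MLIRContext *context);
```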
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Co-authored-by: Quinn Dawkins <quinn.dawkins@gmail.com>
Fusion of reshapes by collapsing patterns was restricted to
single-result operations, but the implementation supports multi-result
ops. This PR removes the restriction, since it is not necessary.
Fix a build failure:
```
/llvm-project/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp:124:7:
error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
  opOperandsToIgnore.pop_back_val();
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
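A minimal sketch of the likely fix, assuming the popped value is
genuinely unused (the actual change may differ):
```
// pop_back() discards the element without returning it, so there is no
// [[nodiscard]] return value to ignore.
opOperandsToIgnore.pop_back(); // instead of pop_back_val()
```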
This commit changes the getPreservedProducerResults function so that it
takes the consumer into account along with the producer, in order to
predict which of the producer’s outputs can be dropped during the fusion
process. It provides a more accurate prediction, considering that the
fusion process also depends on the consumer.
Refactored @Max191's PR https://github.com/llvm/llvm-project/pull/94637
to move it to `Tensor`
From the original PR:
> This PR adds fusion by expansion patterns to push a tensor.expand_shape
> up through a tensor.collapse_shape with non-intersecting reassociations.
> Sometimes parallel collapse_shape ops like this can block propagation of
> expand_shape ops, so this allows them to pass through each other.
I'm not sure if I put the code/tests in the right places, so let me know
where those go if they aren't.
cc @MaheshRavishankar @hanhanW
---------
Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
This PR adds fusion by collapsing and fusion by expansion patterns for
`tensor.pad` ops in ElementwiseOpFusion. Pad ops can be expanded or
collapsed as long as none of the padded dimensions will be expanded or
collapsed.
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
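A hedged sketch of what building such an op could look like, assuming a
builder overload that takes the output sizes as `OpFoldResult`s (`src`,
`resultType`, and `dynamicSize` are illustrative):
```
// Expand tensor<?xf32> into tensor<?x4xf32>: one dynamic output size (an
// SSA value) and one static output size (an attribute).
SmallVector<ReassociationIndices> reassoc = {{0, 1}};
SmallVector<OpFoldResult> outputShape = {dynamicSize, builder.getIndexAttr(4)};
auto expanded = builder.create<tensor::ExpandShapeOp>(loc, resultType, src,
                                                      reassoc, outputShape);
```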
Differential Revision: https://reviews.llvm.org/D140821
---------
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
This adds support for expansion of named linalg ops and linalg ops with
reduction iterators. This improves the ability to make fusion decisions
WRT reduction operations. To recover the previous behavior, users of the
patterns can add a control function to restrict propagation of reshape
by expansion through linalg ops with reduction iterators.
For named linalg ops, this always converts the named op into a generic.
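Such a control function could look like this (a sketch, not part of the
patch; `patterns` is assumed):
```
// Block reshape-by-expansion propagation through ops with reduction loops,
// recovering the previous behavior.
linalg::ControlFusionFn noReductionExpansion = [](OpOperand *fusedOperand) {
  auto linalgOp = dyn_cast<linalg::LinalgOp>(fusedOperand->getOwner());
  return !linalgOp || !llvm::is_contained(linalgOp.getIteratorTypesArray(),
                                          utils::IteratorType::reduction);
};
linalg::populateFoldReshapeOpsByExpansionPatterns(patterns, noReductionExpansion);
```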
A `transform.structured.flatten_elementwise` op is implemented for
flattening the iteration space and (applicable) operands/results to a
single dimension.
This uses the tablegen macros for generating pass constructors, exposing
pass options for fold-unit-extent-dims and linalg-detensorize.
Additionally, this aligns some of the pass names with their textual counterparts.
This includes an API change:
createLinalgGeneralizationPass -> createLinalgGeneralizeNamedOpsPass
When creating a new block in (conversion) rewrite patterns,
`OpBuilder::createBlock` must be used. Otherwise, no
`notifyBlockInserted` notification is sent to the listener.
Note: The dialect conversion relies on listener notifications to keep
track of IR modifications. Creating blocks without the builder API can
lead to memory leaks during rollback.
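Sketch of the required style inside a pattern (`region` is a placeholder):
```
// Wrong: the listener is never notified of the new block.
//   Block *block = new Block();
//   region.push_back(block);
// Right: create the block through the builder so that notifyBlockInserted
// is sent to the listener.
Block *block = rewriter.createBlock(&region, region.end());
```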
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
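The rename in practice (attribute name is illustrative):
```
// Before:
//   rewriter.updateRootInPlace(op, [&] { op->setAttr("tag", rewriter.getUnitAttr()); });
// After (same semantics, clearer name):
rewriter.modifyOpInPlace(op, [&] { op->setAttr("tag", rewriter.getUnitAttr()); });
```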
Rename interface functions as follows:
* `hasTensorSemantics` -> `hasPureTensorSemantics`
* `hasBufferSemantics` -> `hasPureBufferSemantics`
These two functions return "true" if the op has tensor (respectively,
buffer) operands but no buffer (respectively, tensor) operands.
Also drop the "ranked" part from the interface, i.e., do not distinguish
between ranked/unranked types.
The new function names describe the functions more accurately. They also
align their semantics with the notion of "tensor semantics" in the
bufferization framework. (An op is supposed to be bufferized if it has
tensor operands, and we don't care if it also has memref operands.)
This change is in preparation of #75273, which adds
`BufferizableOpInterface::hasTensorSemantics`. By renaming the functions
in the `DestinationStyleOpInterface`, we can avoid name clashes between
the two interfaces.
Declare the `getPreservedProducerResults` function, which helps get the
results of the producer linalg generic operation that are preserved
after elementwise fusion.
Rewrite patterns must return `success` if the IR was modified. This
commit fixes sparse tensor tests such as
`SparseTensor/sparse_fusion.mlir`,
`SparseTensor/CPU/sparse_reduce_custom.mlir`,
`SparseTensor/CPU/sparse_semiring_select.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.
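A generic illustration of the contract (the pattern and attribute are
placeholders, not the sparse tensor fix itself):
```
struct DropTargetAttr : public OpRewritePattern<linalg::GenericOp> {
  using OpRewritePattern<linalg::GenericOp>::OpRewritePattern;
  LogicalResult matchAndRewrite(linalg::GenericOp op,
                                PatternRewriter &rewriter) const override {
    if (!op->hasAttr("target"))
      return failure(); // IR untouched: must not report success.
    rewriter.modifyOpInPlace(op, [&] { op->removeAttr("target"); });
    return success(); // IR modified.
  }
};
```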
When the Powers That Be decided that the name "sparse compiler" should
be changed to "sparsifier", we neglected to change some of the comments
in the code; this pull request completes the name change.
The TableGen code generator now generates C++ code that returns a single
`OpOperand &` for `get...Mutable` of operands that are not variadic and
not optional. `OpOperand::set`/`assign` can be used to set a value (same
as `MutableOperandRange::assign`). This is safer than
`MutableOperandRange` because only single values (and no longer
`ValueRange`) can be assigned.
E.g.:
```
// Assignment of multiple values to non-variadic operand.
// Before: Compiles, but produces invalid op.
// After: Compilation error.
extractSliceOp.getSourceMutable().assign({v1, v2});
```
* "init" operands are specified with `MutableOperandRange` (which gives
access to the underlying `OpOperand *`). No more magic numbers.
* Remove most interface methods and make them helper functions. Only
`getInitsMutable` should be implemented.
* Provide separate helper functions for accessing mutable/immutable
operands (`OpOperand`/`Value`, in line with #66515): `getInitsMutable`
and `getInits` (same naming convention as auto-generated op accessors).
`getInputOperands` was not renamed because this function cannot return a
`MutableOperandRange` (because the operands are not necessarily
consecutive). `OpOperandVector` is no longer needed.
* The new `getDpsInits`/`getDpsInitsMutable` is more efficient than the
old `getDpsInitOperands` because no `SmallVector` is created. The new
functions return a range of operands (see the usage sketch below).
* Fix a bug in `getDpsInputOperands`: out-of-bounds operands were
potentially returned.
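A usage sketch of the new accessors (`dpsOp` and `newInit` are placeholders):
```
// Iterate init operands as OpOperand& and retarget them in place;
// getDpsInits() would yield the same operands as a read-only Value range.
for (OpOperand &initOperand : dpsOp.getDpsInitsMutable())
  initOperand.set(newInit);
```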
`operator[]` returns `OpOperand &` instead of `Value`.
* This allows users to get OpOperands by name instead of "magic" number.
E.g., `extractSliceOp->getOpOperand(0)` can be written as
`extractSliceOp.getSourceMutable()[0]`.
* `OperandRange` provides a read-only API to operands: `operator[]`
returns `Value`. `MutableOperandRange` now provides a mutable API:
`operator[]` returns `OpOperand &`, which can be used to set operands.
Note: The TableGen code generator could be changed to return `OpOperand
&` (instead of `MutableOperandRange`) for non-variadic and non-optional
arguments in a subsequent change. Then the `[0]` part in the above
example would no longer be necessary.
* Remove duplicate functions. `tensor::getMixedSize` and `tensor::getMixedSizes` should be used.
* Use `tensor::getMixedSize` instead of `createOrFold<tensor::DimOp>`. This is more efficient. `createOrFold` will create an op and immediately try to fold it. In case of a static dimension size, an attribute can be used directly (see the sketch below).
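Usage sketch (`tensorValue` is a placeholder):
```
// Returns an IntegerAttr for a static dimension size and a Value (e.g., a
// tensor.dim) for a dynamic one, without going through createOrFold.
OpFoldResult size = tensor::getMixedSize(builder, loc, tensorValue, /*dim=*/0);
SmallVector<OpFoldResult> sizes = tensor::getMixedSizes(builder, loc, tensorValue);
```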
Differential Revision: https://reviews.llvm.org/D153332