Edit the `findValueInReverseUseDefChain` method to accept `OpOperand`
instead of the `Value` type, This change will make sure that the
populated `visitedOpOperands` argument is fully accurate and contains
the opOperand we have started the reverse chain from.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
In many cases the emptyTensorElimination can not transform or eliminate
the empty tensor which is being inserted into the
`SubsetInsertionOpInterface`.
Two major reasons for that:
1- Failing when trying to find a legal/suitable insertion point for the
`subsetExtract` which is about to replace the empty tensor. However, we
may try to handle this issue by moving the needed values which
responsible on building the `subsetExtract` nearby the empty tensor
(which is about to be eliminated). Thus increasing the probability to
find a legal insertion point.
2-The EmptyTensorElimination transform replaces the tensor.empty's uses
all at once in one apply, rather than replacing only the specific use
which was visited in the use-def chain (when traversing from the
tensor.insert_slice). This scenario of replacing all the uses of the
tensor.empty may lead into additional read effects after bufferization
of the specific subset extract/subview which should not be the case.
Both cases may result in many copies in the coming bufferization which
can not be canonicalized.
The first case can be noticed when having a `tensor.empty` followed by
`SubsetInsertionOpInterface` (or in simple words `tensor.insert_slice`),
which have been lowered from `tensor/tosa.concat`.
The second case can be noticed when having a `tensor.empty`, with many
uses and leading to applying the transformation only once, since the
whole uses have been replaced at once.
The first commit in the PR only adds the lit tests for the cases shown
above (NFC), to emphasize how the transform works, in the coming MRs
will upload a slight changes to handle these case.
The second commit in this PR, we want to replace only the specific use
which was visited in the `use-def` chain (when traversing from the
`tensor.insert_slice`'s source).
As described in issue llvm/llvm-project#91518, a previous PR
llvm/llvm-project#78484 introduced the `defaultMemorySpaceFn` into
bufferization options, allowing one to inform OneShotBufferize that it
should use a specified function to derive the memory space attribute
from the encoding attribute attached to tensor types.
However, introducing this feature exposed unhandled edge cases,
examples of which are introduced by this change in the new test under
`test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`.
Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is
pretty simple. This change:
- Updates the `bufferization.to_memref` and `bufferization.to_tensor`
operations to explicitly include operand and destination types,
whereas previously they relied on type inference to deduce the
tensor types. Since the type inference cannot recover the correct
tensor encoding/memory space, the operand and result types must be
explicitly included. This is a small assembly format change, but it
touches a large number of test files.
- Makes minor updates to other bufferization functions to handle the
changes in building the above ops.
- Updates bufferization of `tensor.from_elements` to handle memory
space.
Integration/upgrade guide:
In downstream projects, if you have tests or MLIR files that explicitly
use
`bufferization.to_tensor` or `bufferization.to_memref`, then update
them to the new assembly format as follows:
```
%1 = bufferization.to_memref %0 : memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32>
```
becomes
```
%1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32>
%2 = bufferization.to_tensor %0 : memref<10xf32> to tensor<10xf32>
```
**Description:**
`OneShotModuleBufferize` deals with the bufferization of `FuncOp`,
`CallOp` and `ReturnOp` but they are hard-coded. Any custom
function-like operations will not be handled. The PR replaces a part of
`FuncOp` and `CallOp` with `FunctionOpInterface` and `CallOpInterface`
in `OneShotModuleBufferize` so that custom function ops and call ops can
be bufferized.
**Related Discord Discussion:**
[Link](https://discord.com/channels/636084430946959380/642426447167881246/1280556809911799900)
---------
Co-authored-by: erick-xanadu <110487834+erick-xanadu@users.noreply.github.com>
Tensor copy insertion currently uses memory_space = 0 when creating a
tensor copy using alloc_tensor. This memory space should instead be the
default memory space provided in bufferization options.
This doesn't change functionality, but lets us avoid attaching all the
interfaces after 513cdb82223a106f183b49a40d9acb1f7efbbe7e turned casting
without loading into an error.
Collection of changes with the goal of being able to convert `encoding`
to `memorySpace` during bufferization
- new API for encoder to allow implementation to select destination
memory space
- update existing bufferization implementations to support the new
interface
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
Add a new interface method to `BufferizableOpInterface`:
`hasTensorSemantics`. This method returns "true" if the op has tensor
semantics and should be bufferized.
Until now, we assumed that an op has tensor semantics if it has tensor
operands and/or tensor op results. However, there are ops like
`ml_program.global` that do not have any results/operands but must still
be bufferized (#75103). The new interface method can return "true" for
such ops.
This change also decouples `bufferization::bufferizeOp` a bit from the
func dialect.
One-Shot Bufferize no longer deallocates buffers, so `deallocationFn`
can be removed.
Note: There is a `bufferization.dealloc_tensor` op that now always
bufferizes to `memref.dealloc`. This op will be phased out soon.
Remove the yielded tensor analysis. This analysis was used to detect
cases where One-Shot Bufferize cannot deallocate buffers. Deallocation
has recently been removed from One-Shot Bufferize. Buffers are now
deallocated by the buffer deallocation pass. This analysis is no longer
needed.
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
This reverts commit 6a91dfedeb956dfa092a6a3f411e8b02f0d5d289.
This caused problems in downstream projects. We are reverting to give
them more time for integration.
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases
such as returning memrefs which this series of commits aims to add
support for as well.
This first commit removes the deallocation capabilities of
one-shot-bufferization.One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.
The documentation should w.r.t. these pass option changes should also be
updated in this commit.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156662
One-Shot Bufferize correctly handles RaW conflicts around repetitive regions (loops). Specical handling is needed for parallel regions. These are a special kind of repetitive regions that can have additional RaW conflicts that would not be present if the regions would be executed sequentially.
Example:
```
%0 = bufferization.alloc_tensor()
scf.forall ... {
%1 = linalg.fill ins(...) outs(%0)
...
scf.forall.in_parallel {
tensor.parallel_insert_slice %1 into ...
}
}
```
A separate (private) buffer must be allocated for each iteration of the `scf.forall` loop.
This change adds a new interface method to `BufferizableOpInterface` to detect parallel regions. By default, regions are assumed to be sequential.
A buffer is privatized if an OpOperand bufferizes to a memory read inside a parallel region that is different from the parallel region where operand's value is defined.
Differential Revision: https://reviews.llvm.org/D159286
This revision adds support for unstructured control flow to the bufferization infrastructure. In particular: regions with multiple blocks, `cf.br`, `cf.cond_br`.
Two helper templates are added to `BufferizableOpInterface.h`, which can be implemented by ops that supported unstructured control flow in their regions (e.g., `func.func`) and ops that branch to another block (e.g., `cf.br`).
A block signature is always bufferized together with the op that owns the block.
Differential Revision: https://reviews.llvm.org/D158094
`getBufferType` computes the bufferized type of an SSA value without bufferizing any IR. This is useful for predicting the bufferized type of iter_args of a loop.
To avoid endless recursion (e.g., in the case of "scf.for", the type of the iter_arg depends on the type of init_arg and the type of the yielded value; the type of the yielded value depends on the type of the iter_arg again), `fixedTypes` was used to fall back to "fixed" type. A simpler way is to maintain an "invocation stack". `getBufferType` implementations can then inspect the invocation stack to detect repetitive computations (typically when computing the bufferized type of a block argument).
Also improve error messages in case of inconsistent memory spaces inside of a loop.
Differential Revision: https://reviews.llvm.org/D158060
This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops.
This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block.
If an op does not implement the `BufferizableOpInterface`, the analysis in conservative. It previously assumed that an OpOperand may alias with each OpResult. It now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block.
Differential Revision: https://reviews.llvm.org/D157957
This implication was already done de-facto and there were plenty of users and wrapper functions specifically used to handle the "return-like or RegionBranchTerminatorOpInterface" case. These simply existed due to up until recently missing features in ODS.
With the new capabilities of traits, we can make `ReturnLike` imply `RegionBranchTerminatorOpInterface` and auto generate proper definitions for its methods.
Various occurrences and wrapper methods used for `isa<RegionBranchTerminatorOpInterface>() || hasTrait<ReturnLike>()` have all been removed.
Differential Revision: https://reviews.llvm.org/D157402
EmptyTensorElimination is a pre-bufferization transformation that replaces "tensor.empty" ops with "tensor.extract_slice" ops. This revision adds support for cases where the input IR contains "tensor.cast" ops.
Differential Revision: https://reviews.llvm.org/D156167
The `getEnclosingRepetitiveRegion` functions walk the ancestor regions everytime which can be expensive especially when there are multiple regions inbetween. This commit adds a cache to the bufferization analysis to remember the result of the walk.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D154710
Instead of passing traversal options as a long list of arguments, store them in a TraversalConfig object and pass that object.
Differential Revision: https://reviews.llvm.org/D143927
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
* https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
* Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This follows a previous patch that updated calls
`op.cast<T>()-> cast<T>(op)`. However some cases could not handle an
unprefixed `cast` call due to occurrences of variables named cast, or
occurring inside of class definitions which would resolve to the method.
All C++ files that did not work automatically with `cast<T>()` are
updated here to `llvm::cast` and similar with the intention that they
can be easily updated after the methods are removed through a
find-replace.
See https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
for the clang-tidy check that is used and then update printed
occurrences of the function to include `llvm::` before.
One can then run the following:
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-export-fixes /tmp/cast/casts.yaml mlir/*\
-header-filter=mlir/ -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D150348
/data/llvm-project/mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp:342:2: error: extra ';' outside of a function is incompatible with C++
98 [-Werror,-Wc++98-compat-extra-semi]
}; // namespace
This change adds a new helper function `mlir::reifyResultShapes` that calls the corresponding interface method and also checks the result produced by the implementation when running in debug mode. Bugs due to incorrect interface implementations can be difficult to debug.
This helper function also reduces the amount of code needed at call sites: the cast to `ReifyRankedShapedTypeOpInterface` is done in the helper function.
Differential Revision: https://reviews.llvm.org/D145777
`reifyResultShapes` now returns `OpFoldResult`s instead of `Value`s. This is often more efficient because many transformations immediately attempt to extract a constant from the reified values.
Differential Revision: https://reviews.llvm.org/D145250
`bufferizesToMemoryWrite(OpResult)` looks for OpOperands that bufferize to memory writes inside the region of the defining op (if it has one). Currently, if the reverse use-def chain stops at any value inside of the region, the OpResult is considered to bufferize to a memory write.
It is always safe to have false positives among `bufferizesToMemoryWrite`, so the previous implementation is also correct. However, it can lead to additional buffer copies.
Differential Revision: https://reviews.llvm.org/D142223
`getAliasingOpOperands`/`getAliasingOpResults` now encodes OpOperand/OpResult, buffer relation and a degree of certainty. E.g.:
```
// aliasingOpOperands(%r) = {(%t, EQUIV, DEFINITE)}
// aliasingOpResults(%t) = {(%r, EQUIV, DEFINITE)}
%r = tensor.insert %f into %t[%idx] : tensor<?xf32>
// aliasingOpOperands(%r) = {(%t0, EQUIV, MAYBE), (%t1, EQUIV, MAYBE)}
// aliasingOpResults(%t0) = {(%r, EQUIV, MAYBE)}
// aliasingOpResults(%t1) = {(%r, EQUIV, MAYBE)}
%r = arith.select %c, %t0, %t1 : tensor<?xf32>
```
`BufferizableOpInterface::bufferRelation` is removed, as it is now part of `getAliasingOpOperands`/`getAliasingOpResults`.
This change allows for better analysis, in particular wrt. equivalence. This allows additional optimizations and better error checking (which is sometimes overly conservative). Examples:
* EmptyTensorElimination can eliminate `tensor.empty` inside `scf.if` blocks. This requires a modeling of equivalence: It is not a per-OpResult property anymore. Instead, it can be specified for each OpOperand and OpResult. This is important because `tensor.empty` may be eliminated only if all values on the SSA use-def chain to the final consumer (`tensor.insert_slice`) are equivalent.
* The detection of "returning allocs from a block" can be improved. (Addresses a TODO in `assertNoAllocsReturned`.) This allows us to bufferize IR such as "yielding a `tensor.extract_slice` result from an `scf.if` branch", which currently fails to bufferize because the alloc detection is too conservative.
* Better bufferization of loops. Aliases of the iter_arg can be yielded (even if they are not equivalent) without having to realloc and copy the entire buffer on each iteration.
The above-mentioned examples are not yet implemented with this change. This change just improves the BufferizableOpInterface, its implementations and related helper functions, so that better aliasing information is available for each op.
Differential Revision: https://reviews.llvm.org/D142129
The previous strategy was too complex and faulty. Op dominance cannot be used to rule out RaW conflicts due to op ordering if the reading op and the conflicting writing op are in a sub repetitive region of the closest enclosing repetitive region of the definition of the read value.
Differential Revision: https://reviews.llvm.org/D143087
Reading from tensor.empty or bufferization.alloc_tensor (without copy) cannot cause a conflict because these ops do not specify the contents of their result tensors.
Differential Revision: https://reviews.llvm.org/D143183
* `getAliasingOpOperand` => `getAliasingOpOperands`
* `getAliasingOpResult` => `getAliasingOpResults`
Also a few minor code cleanups and better documentation.
Differential Revision: https://reviews.llvm.org/D142979
Unranked tensors can currently not be copied. They are forced to always bufferize in-place. There is typically some other OpOperand that can bufferize out-of-place instead if needed.
Note: There is IR that cannot be bufferized with One-Shot Bufferize at the moment (see invalid test case). But it is unclear if we need to support such cases. We do not have a use case at the moment. This restriction could be loosened in the future if needed.
This change improves error handling when bufferizing IR where an unranked tensor would be copied. It also disables an optimization where an OpResult was copied instead of an OpOperand in case the OpResult is an unranked tensor (Github #60187).
Differential Revision: https://reviews.llvm.org/D142331
The previous lingo was confusing. There are no writes on tensors. There are only definitions.
Also some minor cleanup and better documentation.
Differential Revision: https://reviews.llvm.org/D141790
The name of the method was confusing. It is bufferizesToMemoryWrite, but from the perspective of OpResults.
`bufferizesToMemoryWrite(OpResult)` now supports ops with regions that do not have aliasing OpOperands (such as `scf.if`). These ops no longer need to implement `isMemoryWrite`.
Differential Revision: https://reviews.llvm.org/D141684
These functions are generally useful and not specific to One-Shot Analysis. Move them to `BufferizableOpInterface.h` and make them public.
Differential Revision: https://reviews.llvm.org/D141685
The patch adds operations to `BlockAndValueMapping` and renames it to `IRMapping`. When operations are cloned, old operations are mapped to the cloned operations. This allows mapping from an operation to a cloned operation. Example:
```
Operation *opWithRegion = ...
Operation *opInsideRegion = &opWithRegion->front().front();
IRMapping map
Operation *newOpWithRegion = opWithRegion->clone(map);
Operation *newOpInsideRegion = map.lookupOrNull(opInsideRegion);
```
Migration instructions:
All includes to `mlir/IR/BlockAndValueMapping.h` should be replaced with `mlir/IR/IRMapping.h`. All uses of `BlockAndValueMapping` need to be renamed to `IRMapping`.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D139665
TensorCopyInsertion inserts bufferization.alloc_tensor ops in case of RaW conflicts. If such a tensor is dynamically shaped, tensor.dim ops are inserted. There is an optimization for ops such as tensor.extract_slice: A copy of the result is created instead of the operand. Afterwards, all uses of the result are updated. E.g.:
```
%0 = tensor.extract_slice ... : tensor<?xf32> to tensor<?xf32>
%1 = tensor.dim %0, %c0 : tensor<?xf32>
%2 = bufferization.alloc_tensor(%dim) : tensor<?xf32>
```
All uses of %0, except for tensor.dim and bufferization.alloc_tensor (if any), should be replaced. Before this change, the use in tensor.dim was also replaced, resulting in IR that had a dominance error.
Note: There is no test case for this fix because the bug cannot be triggered with tensor.extract_slice, which implements an interface to reify result shapes. This bug appeared in an external project with a tensor.extract_slice-like op that does not implement that interface, in which case tensor.dim ops must be created. We do not have such an op in MLIR to trigger this bug.
Differential Revision: https://reviews.llvm.org/D140471
`DialectAnalysisState` is now `OneShotAnalysisState::Extension`.
This state extension mechanism is needed only for One-Shot Analysis, so it is moved from `BufferizableOpInterface.h` to `OneShotAnalysis.h`.
Extensions are now identified via TypeIDs instead of StringRefs. The API of state extensions is cleaned up and follows the same pattern as other extension mechanisms in MLIR (e.g., `transform::TransformState::Extension`).
Also delete some dead code.
Differential Revision: https://reviews.llvm.org/D135051