Multiple `func.return` ops inside of a `func.func` op are now supported
during bufferization. This PR extends the code base in 3 places:
- When inferring function return types, `memref.cast` ops are folded
away only if all `func.return` ops have matching buffer types. (E.g., we
don't fold if two `return` ops have operands with different layout
maps.)
- The alias sets of all `func.return` ops are merged. That's because
aliasing is a "may be" property.
- The equivalence sets of all `func.return` ops are taken only if they
match. If different `func.return` ops have different equivalence sets
for their operands, the equivalence information is dropped. That's
because equivalence is a "must be" property.
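The two merge rules above can be sketched with plain sets. This is a minimal illustration under stated assumptions, not the code from this PR: `ReturnInfo`, `mergeAliasSets` and `mergeEquivalence` are hypothetical names, and function arguments are modeled as integer indices.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Hypothetical model: for one function result, each `func.return` op
// contributes the set of function-argument indices its operand may alias
// with, and (optionally) the single argument index it is equivalent to.
struct ReturnInfo {
  std::set<int> aliasingArgs;
  std::optional<int> equivalentArg;
};

// Aliasing is a "may be" property: take the union over all returns.
std::set<int> mergeAliasSets(const std::vector<ReturnInfo> &returns) {
  std::set<int> merged;
  for (const ReturnInfo &info : returns)
    merged.insert(info.aliasingArgs.begin(), info.aliasingArgs.end());
  return merged;
}

// Equivalence is a "must be" property: keep it only if all returns agree.
std::optional<int> mergeEquivalence(const std::vector<ReturnInfo> &returns) {
  assert(!returns.empty() && "expected at least one return");
  std::optional<int> result = returns.front().equivalentArg;
  for (const ReturnInfo &info : returns)
    if (info.equivalentArg != result)
      return std::nullopt;  // Mismatch: drop the equivalence information.
  return result;
}
```

With one return equivalent to argument 0 and another equivalent to argument 1, the merged alias set contains both arguments, while the merged equivalence is empty.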
This commit is in preparation for removing the deprecated
`func-bufferize` pass. That pass can bufferize functions with multiple
`return` ops.
This commit adds support for recursive function calls to One-Shot
Bufferize.
The analysis itself still does not support recursive function calls. The function
body itself can be analyzed, but we cannot make any assumptions about
the aliasing relation between function result and function arguments.
Similarly, when looking at a `call` op, we do not know whether the
operands will bufferize to a memory read/write. In the absence of such
information, we have to conservatively assume that they do.
This commit is in preparation for removing the deprecated
`func-bufferize` pass. That pass can bufferize recursive functions.
This commit adds support for bufferizing external functions that have no
body. Such functions were previously rejected by One-Shot Bufferize if
they returned a tensor value.
This commit is in preparation for removing the deprecated
`func-bufferize` pass. That pass can bufferize external functions.
Also update a few comments.
**Description:**
`OneShotModuleBufferize` deals with the bufferization of `FuncOp`,
`CallOp` and `ReturnOp` but they are hard-coded. Any custom
function-like operations will not be handled. The PR replaces a part of
`FuncOp` and `CallOp` with `FunctionOpInterface` and `CallOpInterface`
in `OneShotModuleBufferize` so that custom function ops and call ops can
be bufferized.
**Related Discord Discussion:**
[Link](https://discord.com/channels/636084430946959380/642426447167881246/1280556809911799900)
---------
Co-authored-by: erick-xanadu <110487834+erick-xanadu@users.noreply.github.com>
Handle caller/callee type mismatch using `castOrReallocMemRefValue`
instead of just a `CastOp`. The method inserts a reallocation + copy if
it cannot be statically guaranteed that a direct cast would be valid.
Fix #105916.
Collection of changes with the goal of being able to convert `encoding`
to `memorySpace` during bufferization
- new API for encoder to allow implementation to select destination
memory space
- update existing bufferization implementations to support the new
interface
Add a new interface method to `BufferizableOpInterface`:
`hasTensorSemantics`. This method returns "true" if the op has tensor
semantics and should be bufferized.
Until now, we assumed that an op has tensor semantics if it has tensor
operands and/or tensor op results. However, there are ops like
`ml_program.global` that do not have any results/operands but must still
be bufferized (#75103). The new interface method can return "true" for
such ops.
This change also decouples `bufferization::bufferizeOp` a bit from the
func dialect.
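The idea behind `hasTensorSemantics` can be sketched with a toy class hierarchy. This is only an illustration: `ToyOp` and `ToyGlobalOp` are hypothetical stand-ins, not MLIR classes, and the default implementation merely mirrors the "tensor operands and/or tensor op results" assumption described above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an op; each flag says whether the corresponding
// operand or result has a tensor type.
struct ToyOp {
  std::vector<bool> operandIsTensor;
  std::vector<bool> resultIsTensor;

  virtual ~ToyOp() = default;

  // Default: an op has tensor semantics if any operand or result is a tensor.
  virtual bool hasTensorSemantics() const {
    auto isTrue = [](bool flag) { return flag; };
    return std::any_of(operandIsTensor.begin(), operandIsTensor.end(),
                       isTrue) ||
           std::any_of(resultIsTensor.begin(), resultIsTensor.end(), isTrue);
  }
};

// Hypothetical global-like op: no operands or results, but it may still hold
// a tensor-typed value and therefore must be bufferized.
struct ToyGlobalOp : ToyOp {
  bool globalTypeIsTensor = false;

  bool hasTensorSemantics() const override { return globalTypeIsTensor; }
};
```

An op without operands or results returns "false" by default, whereas the overriding op can still report "true".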
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
This reverts commit 6a91dfedeb956dfa092a6a3f411e8b02f0d5d289.
This caused problems in downstream projects. We are reverting to give
them more time for integration.
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases
such as returning memrefs which this series of commits aims to add
support for as well.
This first commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.
The documentation w.r.t. these pass option changes should also be
updated in this commit.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156662
This revision adds support for unstructured control flow to the bufferization infrastructure. In particular: regions with multiple blocks, `cf.br`, `cf.cond_br`.
Two helper templates are added to `BufferizableOpInterface.h`, which can be implemented by ops that support unstructured control flow in their regions (e.g., `func.func`) and ops that branch to another block (e.g., `cf.br`).
A block signature is always bufferized together with the op that owns the block.
Differential Revision: https://reviews.llvm.org/D158094
When bufferizing "func.func", the entry block signature is bufferized. (Only functions with a single block are supported at the moment.) This functionality is moved into a separate function, so that it can be used for bufferizing unstructured control flow in the future.
Differential Revision: https://reviews.llvm.org/D158154
`getBufferType` computes the bufferized type of an SSA value without bufferizing any IR. This is useful for predicting the bufferized type of iter_args of a loop.
To avoid endless recursion (e.g., in the case of "scf.for", the type of the iter_arg depends on the type of init_arg and the type of the yielded value; the type of the yielded value depends on the type of the iter_arg again), `fixedTypes` was used to fall back to a "fixed" type. A simpler way is to maintain an "invocation stack". `getBufferType` implementations can then inspect the invocation stack to detect repetitive computations (typically when computing the bufferized type of a block argument).
Also improve error messages in case of inconsistent memory spaces inside of a loop.
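The invocation-stack idea can be illustrated on a toy dependency graph. This is a sketch under stated assumptions, not the MLIR implementation: `ToyValue`, `ToyIR` and this `getBufferType` signature are hypothetical, and types are modeled as strings.

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy IR: each value either has a known buffer type or derives
// its type from other values (e.g., an iter_arg from init_arg and yield).
struct ToyValue {
  std::optional<std::string> knownType;
  std::vector<std::string> dependsOn;
};
using ToyIR = std::map<std::string, ToyValue>;

// Returns the buffer type of `name`, or std::nullopt if it cannot be
// determined. `stack` holds the values whose type is currently being computed.
std::optional<std::string> getBufferType(const ToyIR &ir,
                                         const std::string &name,
                                         std::vector<std::string> &stack) {
  const ToyValue &value = ir.at(name);
  if (value.knownType)
    return value.knownType;
  // Already on the invocation stack: this would be a repetitive computation.
  if (std::find(stack.begin(), stack.end(), name) != stack.end())
    return std::nullopt;
  stack.push_back(name);
  std::optional<std::string> result;
  for (const std::string &dep : value.dependsOn) {
    std::optional<std::string> depType = getBufferType(ir, dep, stack);
    if (!depType)
      continue;  // Cyclic dependency: it adds no further constraint.
    if (result && *result != *depType) {
      result = std::nullopt;  // Inconsistent types: give up.
      break;
    }
    result = depType;
  }
  stack.pop_back();
  return result;
}
```

For an iter_arg that depends on an init_arg of known type and on a yielded value that depends on the iter_arg again, the cycle is cut at the second visit of the iter_arg and the init_arg type is returned.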
Differential Revision: https://reviews.llvm.org/D158060
This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops.
This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block.
If an op does not implement the `BufferizableOpInterface`, the analysis is conservative. It previously assumed that an OpOperand may alias with each OpResult. It now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block.
Differential Revision: https://reviews.llvm.org/D157957
The bufferization implementation of `func.func` and `func.call` can be simplified. It still contained code that was necessary when One-Shot Bufferize removed return values. This functionality was extracted into a separate pass a while ago.
Differential Revision: https://reviews.llvm.org/D157893
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call, which has been decided to be more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
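The difference between the two styles can be shown on a self-contained toy. This sketch does not use the LLVM/MLIR headers: `ToyType`, `ToyTensorType`, `ToyMemRefType` and `needsBufferization` are hypothetical, and the free function `isa` below only imitates the shape of the real casting utilities.

```cpp
// Hypothetical value class that, like the MLIR classes named above, offers
// the casting functionality as a method.
struct ToyType {
  enum Kind { Tensor, MemRef } kind;

  // Method style (the one being migrated away from).
  template <typename T>
  bool isa() const {
    return T::classof(*this);
  }
};

struct ToyTensorType {
  static bool classof(const ToyType &type) {
    return type.kind == ToyType::Tensor;
  }
};

struct ToyMemRefType {
  static bool classof(const ToyType &type) {
    return type.kind == ToyType::MemRef;
  }
};

// Free-function style (the one being migrated to).
template <typename T>
bool isa(const ToyType &type) {
  return T::classof(type);
}

bool needsBufferization(const ToyType &type) {
  // Before the migration: return type.isa<ToyTensorType>();
  return isa<ToyTensorType>(type);
}
```

Both spellings give the same answer; the migration only changes which one is used at call sites.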
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D151542
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call, which has been decided to be more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
The types of function arguments and results will have the default memory space.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D144539
`getAliasingOpOperands`/`getAliasingOpResults` now encode OpOperand/OpResult, buffer relation and a degree of certainty. E.g.:
```
// aliasingOpOperands(%r) = {(%t, EQUIV, DEFINITE)}
// aliasingOpResults(%t) = {(%r, EQUIV, DEFINITE)}
%r = tensor.insert %f into %t[%idx] : tensor<?xf32>
// aliasingOpOperands(%r) = {(%t0, EQUIV, MAYBE), (%t1, EQUIV, MAYBE)}
// aliasingOpResults(%t0) = {(%r, EQUIV, MAYBE)}
// aliasingOpResults(%t1) = {(%r, EQUIV, MAYBE)}
%r = arith.select %c, %t0, %t1 : tensor<?xf32>
```
`BufferizableOpInterface::bufferRelation` is removed, as it is now part of `getAliasingOpOperands`/`getAliasingOpResults`.
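The triples in the example above can be modeled as a small data structure. This is a toy sketch, not the interface from this change: all `Toy*` names and the helper functions are hypothetical, and SSA values are represented by their names.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one aliasing entry: the aliasing value, the buffer
// relation, and whether the alias is definite or only a "maybe".
enum class ToyBufferRelation { Unknown, Equivalent };

struct ToyAliasingValue {
  std::string value;
  ToyBufferRelation relation;
  bool isDefinite;
};
using ToyAliasingValueList = std::vector<ToyAliasingValue>;

// %r = tensor.insert %f into %t[%idx]: %r definitely aliases %t.
ToyAliasingValueList aliasingOperandsOfInsert() {
  return {{"%t", ToyBufferRelation::Equivalent, true}};
}

// %r = arith.select %c, %t0, %t1: %r may alias either %t0 or %t1.
ToyAliasingValueList aliasingOperandsOfSelect() {
  return {{"%t0", ToyBufferRelation::Equivalent, false},
          {"%t1", ToyBufferRelation::Equivalent, false}};
}

// True if some entry is both equivalent and definite.
bool hasDefiniteEquivalentAlias(const ToyAliasingValueList &aliases) {
  for (const ToyAliasingValue &alias : aliases)
    if (alias.relation == ToyBufferRelation::Equivalent && alias.isDefinite)
      return true;
  return false;
}
```

Keeping the relation and the certainty on each entry is what lets a client distinguish the `tensor.insert` case from the `arith.select` case.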
This change allows for better analysis, in particular wrt. equivalence. This allows additional optimizations and better error checking (which is sometimes overly conservative). Examples:
* EmptyTensorElimination can eliminate `tensor.empty` inside `scf.if` blocks. This requires a modeling of equivalence: It is not a per-OpResult property anymore. Instead, it can be specified for each OpOperand and OpResult. This is important because `tensor.empty` may be eliminated only if all values on the SSA use-def chain to the final consumer (`tensor.insert_slice`) are equivalent.
* The detection of "returning allocs from a block" can be improved. (Addresses a TODO in `assertNoAllocsReturned`.) This allows us to bufferize IR such as "yielding a `tensor.extract_slice` result from an `scf.if` branch", which currently fails to bufferize because the alloc detection is too conservative.
* Better bufferization of loops. Aliases of the iter_arg can be yielded (even if they are not equivalent) without having to realloc and copy the entire buffer on each iteration.
The above-mentioned examples are not yet implemented with this change. This change just improves the BufferizableOpInterface, its implementations and related helper functions, so that better aliasing information is available for each op.
Differential Revision: https://reviews.llvm.org/D142129
* `getAliasingOpOperand` => `getAliasingOpOperands`
* `getAliasingOpResult` => `getAliasingOpResults`
Also a few minor code cleanups and better documentation.
Differential Revision: https://reviews.llvm.org/D142979
The previous name was incorrect. `None` does not mean that there is no buffer relation between two buffers (seems to imply that they do not alias for sure); instead it means that there is no further information available.
Differential Revision: https://reviews.llvm.org/D142870
The analysis previously kept track of OpOperand -> OpResult and OpResult -> OpOperand aliasing mappings. Only one mapping is needed, the other one can be inferred.
Differential Revision: https://reviews.llvm.org/D142128
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
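For illustration, this is what the target spelling looks like with the standard library type. A minimal sketch: `lookupAlignment` and its table are hypothetical and unrelated to the files touched by this patch.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical helper: returns the alignment registered for `name`, if any.
std::optional<int> lookupAlignment(const std::map<std::string, int> &table,
                                   const std::string &name) {
  auto it = table.find(name);
  if (it == table.end())
    return std::nullopt;  // Before the migration this read `return None;`.
  return it->second;
}
```

Callers can test the result in a boolean context or use `value_or` to supply a default.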
`DialectAnalysisState` is now `OneShotAnalysisState::Extension`.
This state extension mechanism is needed only for One-Shot Analysis, so it is moved from `BufferizableOpInterface.h` to `OneShotAnalysis.h`.
Extensions are now identified via TypeIDs instead of StringRefs. The API of state extensions is cleaned up and follows the same pattern as other extension mechanisms in MLIR (e.g., `transform::TransformState::Extension`).
Also delete some dead code.
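An extension mechanism keyed by type rather than by string can be sketched as follows. This is a simplified stand-in, not the MLIR code: it uses `std::type_index` instead of MLIR TypeIDs, and `ToyAnalysisState` and `ToyFuncAnalysisState` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

class ToyAnalysisState {
public:
  // Base class for all state extensions.
  class Extension {
  public:
    virtual ~Extension() = default;
  };

  // Creates and registers an extension; at most one instance per type.
  template <typename Ty, typename... Args>
  Ty &addExtension(Args &&...args) {
    static_assert(std::is_base_of<Extension, Ty>::value,
                  "Ty must derive from Extension");
    std::unique_ptr<Extension> &slot = extensions[std::type_index(typeid(Ty))];
    assert(!slot && "extension of this type already exists");
    slot = std::make_unique<Ty>(std::forward<Args>(args)...);
    return static_cast<Ty &>(*slot);
  }

  // Returns the extension of the given type, or nullptr if there is none.
  template <typename Ty>
  Ty *getExtension() {
    auto it = extensions.find(std::type_index(typeid(Ty)));
    if (it == extensions.end())
      return nullptr;
    return static_cast<Ty *>(it->second.get());
  }

private:
  std::unordered_map<std::type_index, std::unique_ptr<Extension>> extensions;
};

// Hypothetical dialect-specific state attached to the analysis state.
struct ToyFuncAnalysisState : ToyAnalysisState::Extension {
  int numAnalyzedFunctions = 0;
};
```

Because the key is derived from the extension type itself, a lookup cannot silently miss due to a misspelled string.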
Differential Revision: https://reviews.llvm.org/D135051
MemRef has been accepting a general Attribute as memory space for
a long time. This commit updates the bufferization side to catch up,
which allows downstream users to plug in customized symbolic memory
spaces. This also eliminates quite a few `getMemorySpaceAsInt`
calls, which are deprecated.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D138330
Expose `function-boundary-type-conversion` in `OneShotBufferizeOp`. To
reuse options between passes and transform operations, create a
`BufferizationEnums.td`.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D137833
With the recent refactorings, this class is no longer needed. We can use BufferizationOptions in all places where BufferizationState was used.
Differential Revision: https://reviews.llvm.org/D127653
This change makes bufferization use the new TensorCopyInsertion pass. One-Shot Bufferize no longer calls the One-Shot Analysis. Instead, it relies on the TensorCopyInsertion pass to make the entire IR fully inplacable. The `bufferize` implementations of all ops are simplified; they no longer have to account for out-of-place bufferization decisions, which were already materialized in the IR in the form of `bufferization.alloc_tensor` ops during the TensorCopyInsertion pass.
Differential Revision: https://reviews.llvm.org/D127652
Bufferization of the func dialect must go through `OneShotModuleBufferize`. With this change, the analysis interface methods of the BufferizableOpInterface of func dialect ops can be used together with the normal `OneShotBufferize`. (In the absence of analysis information, they will return conservative results.)
Differential Revision: https://reviews.llvm.org/D127299
This simplifies the bufferization itself and is in preparation for connecting with the sparse compiler.
Differential Revision: https://reviews.llvm.org/D126814