This commit adds the `BufferOriginAnalysis`, which can be queried to
check if two buffer SSA values originate from the same allocation. This
new analysis is used in the buffer deallocation pass to fold away or
simplify `bufferization.dealloc` ops more aggressively.
The `BufferOriginAnalysis` is based on the `BufferViewFlowAnalysis`,
which collects buffer SSA value "same buffer" dependencies. E.g., given
IR such as:
```
%0 = memref.alloc()
%1 = memref.subview %0
%2 = memref.subview %1
```
The `BufferViewFlowAnalysis` will report the following "reverse"
dependencies (`resolveReverse`) for `%2`: {`%2`, `%1`, `%0`}. I.e., all
buffer SSA values in the reverse use-def chain that originate from the
same allocation as `%2`. The `BufferOriginAnalysis` is built on top of
that. It handles only simple cases at the moment and may conservatively
return "unknown" around certain IR with branches, memref globals and
function arguments.
This analysis enables additional simplifications during
`-buffer-deallocation-simplification`. In particular, "regular" scf.for
loop nests, that yield buffers (or reallocations thereof) in the same
order as they appear in the iter_args, are now handled much more
efficiently. Such IR patterns are generated by the sparse compiler.
Even when `private-function-dynamic-ownership` is set, ownership should
never be passed to the callee. This can lead to double deallocs (#77096)
or use-after-free in the caller because ownership is currently passed
regardless of whether there are any further uses of the buffer in the
caller or not.
Note: This is consistent with the fact that ownership is never passed to
nested regions.
This commit fixes#77096.
Add write memory effect for the print operation. The exact memory
behavior is implemented in other print-like operations such as
`transform::PrintOp` or `gpu::printf`.
Providing memory behavior allows using the operation in passes like
buffer deallocation instead of emitting an error.
The buffer deallocation pass checks the IR ("operation preconditions")
to make sure that there is no IR that is unsupported. In such a case,
the pass signals a failure.
The pass now rejects all ops with unknown memory effects. We do not know
whether such an op allocates memory or not. Therefore, the buffer
deallocation pass does not know whether a deallocation op should be
inserted or not.
Memory effects are queried from the `MemoryEffectOpInterface` interface.
Ops that do not implement this interface but have the
`RecursiveMemoryEffects` trait do not have any side effects (apart from
the ones that their nested ops may have).
Unregistered ops are now rejected by the pass because they do not
implement the `MemoryEffectOpInterface` and neither do we know if they
have `RecursiveMemoryEffects` or not. All test cases that currently have
unregistered ops are updated to use registered ops.
Add a new attribute `bufferization.manual_deallocation` that can be
attached to allocation and deallocation ops. Buffers that are allocated
with this attribute are assigned an ownership of "false". Such buffers
can be deallocated manually (e.g., with `memref.dealloc`) if the
deallocation op also has the attribute set. Previously, the
ownership-based buffer deallocation pass used to reject IR with existing
deallocation ops. This is no longer the case if such ops have this new
attribute.
This change is useful for the sparse compiler, which currently
deallocates the sparse tensor buffers by itself.
Values that are the result of buffer allocation ops are guaranteed to
*not* be the same allocation as block arguments of containing blocks.
This fact can be used to allow for more aggressive simplification of
`bufferization.dealloc` ops.
This reverts commit aa9eb47da2e501d3505de733240eb89c9a0ea2cf.
It introduced a double free in a test case. Reverting to have some time
for fixing this and relanding later.
Inserting clones requires a lot of assumptions to hold on the input IR, e.g., all writes to a buffer need to dominate all reads. This is not guaranteed by one-shot bufferization and isn't easy to verify, thus it could quickly lead to incorrect results that are hard to debug. This commit changes the mechanism of how an ownership indicator is materialized when there is not already a unique ownership present. Additionally, we don't create copies of returned memrefs anymore when we don't have ownership. Instead, we insert assert operations to make sure we have ownership at runtime, or otherwise report to the user that correctness could not be guaranteed.
The buffer deallocation pipeline now works on modules and functions.
Also add extra test cases that run the buffer deallocation pipeline on
modules and functions. (Test cases that insert a helper function.)
* Properly handle the case where an op is deleted and thus no other
interfaces should be processed anymore.
* Don't add ownership indicator arguments and results to function
declarations
Since ownership based buffer deallocation requires a few passes to be run in a somewhat fixed sequence, it makes sense to have a pipeline for convenience (and to reduce the number of transform ops to represent default deallocation).
Add a method to the BufferDeallocationOpInterface that allows operations to implement the interface and provide custom logic to compute the ownership indicators of values it defines. As a demonstrating example, this new method is implemented by the `arith.select` operation.
This new interface allows operations to implement custom handling of ownership values and insertion of dealloc operations which is useful when an op cannot implement the interfaces supported by default by the buffer deallocation pass (e.g., because they are not exactly compatible or because there are some additional semantics to it that would render the default implementations in buffer deallocation invalid, or because no interfaces exist for this
kind of behavior and it's not worth introducing one plus a default implementation in buffer deallocation). Additionally, it can also be used to provide more efficient handling for a specific op than the interface based default
implementations can.
Add a new Buffer Deallocation pass with the intend to replace the old
one. For now it is added as a separate pass alongside in order to allow
downstream users to migrate over gradually. This new pass has the goal
of inserting fewer clone operations and supporting additional use-cases.
Please refer to the Buffer Deallocation section in the updated
Bufferization.md file for more information on how this new pass works.